[FFmpeg-devel] [PATCH] avformat/mov: Fix decoding fragmented MP4 with multiple sample entries and empty stsc

Dimitry Andric dimitry at unified-streaming.com
Sat May 31 21:16:46 EEST 2025


On 9 May 2025, at 00:15, James Almer <jamrial at gmail.com> wrote:
> 
> On 5/8/2025 7:14 PM, Dimitry Andric wrote:
>> On 28 Apr 2025, at 13:00, Dimitry Andric <dimitry at unified-streaming.com> wrote:
>>> 
>>> On 19 Apr 2025, at 16:27, Dimitry Andric <dimitry at unified-streaming.com> wrote:
>>>> 
>>>> On 10 Apr 2025, at 11:03, Dimitry Andric <dimitry at unified-streaming.com> wrote:
>>>>> 
>>>>> On 3 Apr 2025, at 22:02, Dimitry Andric <dimitry at unified-streaming.com> wrote:
>>>>>> 
>>>>>> When decoding fragmented MP4 files that have an empty stsc box, and
>>>>>> instead contain sample description indexes in their tfhd boxes, the mov
>>>>>> demuxer does not notify the decoder whenever the current sample
>>>>>> description index changes. If the SPS or PPS changed sufficiently, this
>>>>>> can lead to unexpected decoding errors.
>>>>>> 
>>>>>> To fix this, in mov_finalize_packet(), when stsc_data is not available,
>>>>>> use get_frag_stream_info_from_pkt() to get at the current fragment
>>>>>> stream info, and retrieve the current sample description index from
>>>>>> there. Then use that index in a similar manner as the stsc case.
>>>>>> 
>>>>>> Signed-off-by: Dimitry Andric <dimitry at unified-streaming.com>
>>>>>> ---
>>>>>> libavformat/mov.c | 50 ++++++++++++++++++++++++++++-------------------
>>>>>> 1 file changed, 30 insertions(+), 20 deletions(-)
>>>>>> 
>>>>>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>>>>>> index 452690090c..ead89192f4 100644
>>>>>> --- a/libavformat/mov.c
>>>>>> +++ b/libavformat/mov.c
>>>>>> @@ -10756,25 +10756,29 @@ static int mov_switch_root(AVFormatContext *s, int64_t target, int index)
>>>>>>  return 1;
>>>>>> }
>>>>>> 
>>>>>> -static int mov_change_extradata(AVStream *st, AVPacket *pkt)
>>>>>> +static int mov_change_extradata(AVStream *st, AVPacket *pkt, int stsd_id)
>>>>>> {
>>>>>>  MOVStreamContext *sc = st->priv_data;
>>>>>>  uint8_t *side, *extradata;
>>>>>>  int extradata_size;
>>>>>> 
>>>>>> -    /* Save the current index. */
>>>>>> -    sc->last_stsd_index = sc->stsc_data[sc->stsc_index].id - 1;
>>>>>> +    if (stsd_id > 0 &&
>>>>>> +        stsd_id - 1 < sc->stsd_count &&
>>>>>> +        stsd_id - 1 != sc->last_stsd_index) {
>>>>>> +        /* Save the current index. */
>>>>>> +        sc->last_stsd_index = stsd_id - 1;
>>>>>> 
>>>>>> -    /* Notify the decoder that extradata changed. */
>>>>>> -    extradata_size = sc->extradata_size[sc->last_stsd_index];
>>>>>> -    extradata = sc->extradata[sc->last_stsd_index];
>>>>>> -    if (st->discard != AVDISCARD_ALL && extradata_size > 0 && extradata) {
>>>>>> -        side = av_packet_new_side_data(pkt,
>>>>>> -                                       AV_PKT_DATA_NEW_EXTRADATA,
>>>>>> -                                       extradata_size);
>>>>>> -        if (!side)
>>>>>> -            return AVERROR(ENOMEM);
>>>>>> -        memcpy(side, extradata, extradata_size);
>>>>>> +        /* Notify the decoder that extradata changed. */
>>>>>> +        extradata_size = sc->extradata_size[sc->last_stsd_index];
>>>>>> +        extradata = sc->extradata[sc->last_stsd_index];
>>>>>> +        if (st->discard != AVDISCARD_ALL && extradata_size > 0 && extradata) {
>>>>>> +            side = av_packet_new_side_data(pkt,
>>>>>> +                                           AV_PKT_DATA_NEW_EXTRADATA,
>>>>>> +                                           extradata_size);
>>>>>> +            if (!side)
>>>>>> +                return AVERROR(ENOMEM);
>>>>>> +            memcpy(side, extradata, extradata_size);
>>>>>> +        }
>>>>>>  }
>>>>>> 
>>>>>>  return 0;
>>>>>> @@ -10893,13 +10897,10 @@ static int mov_finalize_packet(AVFormatContext *s, AVStream *st, AVIndexEntry *s
>>>>>> 
>>>>>>  /* Multiple stsd handling. */
>>>>>>  if (sc->stsc_data) {
>>>>>> -        if (sc->stsc_data[sc->stsc_index].id > 0 &&
>>>>>> -            sc->stsc_data[sc->stsc_index].id - 1 < sc->stsd_count &&
>>>>>> -            sc->stsc_data[sc->stsc_index].id - 1 != sc->last_stsd_index) {
>>>>>> -            int ret = mov_change_extradata(st, pkt);
>>>>>> -            if (ret < 0)
>>>>>> -                return ret;
>>>>>> -        }
>>>>>> +        int stsd_id = sc->stsc_data[sc->stsc_index].id;
>>>>>> +        int ret = mov_change_extradata(st, pkt, stsd_id);
>>>>>> +        if (ret < 0)
>>>>>> +            return ret;
>>>>>> 
>>>>>>      /* Update the stsc index for the next sample */
>>>>>>      sc->stsc_sample++;
>>>>>> @@ -10908,6 +10909,15 @@ static int mov_finalize_packet(AVFormatContext *s, AVStream *st, AVIndexEntry *s
>>>>>>          sc->stsc_index++;
>>>>>>          sc->stsc_sample = 0;
>>>>>>      }
>>>>>> +    } else {
>>>>>> +        MOVContext *mov = s->priv_data;
>>>>>> +        MOVFragmentStreamInfo *frag_stream_info = get_frag_stream_info_from_pkt(&mov->frag_index, pkt, sc->id);
>>>>>> +        if (frag_stream_info) {
>>>>>> +            int stsd_id = frag_stream_info->stsd_id;
>>>>>> +            int ret = mov_change_extradata(st, pkt, stsd_id);
>>>>>> +            if (ret < 0)
>>>>>> +                return ret;
>>>>>> +        }
>>>>>>  }
>>>>>> 
>>>>>>  return 0;
>>>>>> -- 
>>>>>> 2.43.0
>>>>>> 
>>>>> 
>>>>> Any comments on this patch?
>>>> 
>>>> Ping :)
>>> 
>>> Is there any particular group of persons that "own" the mov muxer?
>> Another ping.
> 
> I'll have a look seeing no one else will.

To provide some backstory here, I will attempt to explain further what
this patch is supposed to fix. It is specifically about AVC (or possibly
HEVC) video that references more than one PPS in the elementary stream.
(One encoder that sometimes produces this kind of video is x264, unless
you use the --stitchable option.)

In an MP4 file this can be represented by multiple sample description
entries in the 'stsd' box, and in a progressive file there is a 'stsc'
box which defines which samples use which sample description index.
FFmpeg handles these just fine.

However, in a fragmented MP4 file the 'stsc' box is usually empty, and
the fragments have a 'tfhd' box with a sample description index field
instead. FFmpeg sometimes cannot decode such files properly, since it
does not call mov_change_extradata() when the sample description index
changes somewhere in the middle of the video. In that case, it will
either complain about a bad PPS ID or, if the ID matches but the PPS
contents do not, produce lots of decoding errors.
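
As a rough illustration of where that index lives, here is a minimal
sketch of pulling the sample description index out of a 'tfhd' payload,
following the box layout in ISO/IEC 14496-12. The helper names
(read_be32, tfhd_sample_description_index) are made up for this example
and are not FFmpeg functions; the real parsing is done by
mov_read_tfhd(). When the flag is absent, the spec falls back to the
default from the 'trex' box, which this sketch does not model and simply
reports as 0.

```c
#include <stddef.h>
#include <stdint.h>

/* tfhd flag bits as defined in ISO/IEC 14496-12. */
#define TFHD_BASE_DATA_OFFSET_PRESENT         0x000001
#define TFHD_SAMPLE_DESCRIPTION_INDEX_PRESENT 0x000002

/* Read a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: 'payload' points just past the 8-byte box header
 * (size + 'tfhd'). Returns the 1-based sample description index, or 0 if
 * the field is absent or the payload is truncated.
 */
static uint32_t tfhd_sample_description_index(const uint8_t *payload,
                                              size_t size)
{
    size_t pos = 8; /* version/flags (4 bytes) + track_ID (4 bytes) */
    uint32_t flags;

    if (size < pos)
        return 0;
    flags = read_be32(payload) & 0xFFFFFF; /* low 24 bits are the flags */

    /* base_data_offset, if present, precedes the field we want. */
    if (flags & TFHD_BASE_DATA_OFFSET_PRESENT)
        pos += 8;
    if (!(flags & TFHD_SAMPLE_DESCRIPTION_INDEX_PRESENT))
        return 0;
    if (size < pos + 4)
        return 0;
    return read_be32(payload + pos);
}
```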

This proposed patch makes it so mov_change_extradata() is also called
when MOVStreamContext's stsc_data field is empty, provided that
get_frag_stream_info_from_pkt() returns a MOVFragmentStreamInfo with a
valid stsd_id. For fragmented files, mov_read_tfhd() already takes care
of reading the stsd_id from the tfhd boxes.
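
Stripped of the FFmpeg specifics, the decision the patch makes per packet
can be sketched roughly as below. SketchStream and
sketch_extradata_changed() are simplified, hypothetical stand-ins for
MOVStreamContext and the check in mov_change_extradata(); the actual
patch also attaches AV_PKT_DATA_NEW_EXTRADATA side data, which is left
out here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the demuxer's per-stream state. */
typedef struct SketchStream {
    const int *stsc_ids;  /* 1-based stsd ids per stsc entry, NULL if stsc is empty */
    int stsc_index;       /* current stsc entry */
    int frag_stsd_id;     /* 1-based id from the current fragment's tfhd, 0 if none */
    int stsd_count;       /* number of sample description entries */
    int last_stsd_index;  /* 0-based index last signalled to the decoder */
} SketchStream;

/*
 * Returns 1 if the decoder would need new extradata for this packet,
 * 0 otherwise. The id comes from stsc when available, and from the
 * fragment info otherwise, mirroring the two branches in the patch.
 */
static int sketch_extradata_changed(SketchStream *sc)
{
    int stsd_id = sc->stsc_ids ? sc->stsc_ids[sc->stsc_index]
                               : sc->frag_stsd_id;

    if (stsd_id > 0 &&
        stsd_id - 1 < sc->stsd_count &&
        stsd_id - 1 != sc->last_stsd_index) {
        /* Remember the index so the change is signalled only once. */
        sc->last_stsd_index = stsd_id - 1;
        return 1;
    }
    return 0;
}
```

Moving the range and change checks into one place is what lets both the
stsc path and the fragment path share them.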

-Dimitry


