[FFmpeg-devel] [PATCH] avformat/movenc: Remove dts delay from duration.

Martin Storsjö martin at martin.st
Sun Dec 20 01:10:02 EET 2020


On Fri, 11 Dec 2020, Josh Allmann wrote:

> On Fri, 11 Dec 2020 at 14:07, Martin Storsjö <martin at martin.st> wrote:
>>
>> On Fri, 11 Dec 2020, Josh Allmann wrote:
>>
>> > A negative start_dts value (eg, delay from edit lists) typically yields
>> > a duration larger than end_pts. During edit list processing, the
>> > delay is removed again, yielding the correct duration within the elst.
>> >
>> > However, other duration-carrying atoms (tkhd, mvhd, mdhd) still have
>> > the delay incorporated into their durations. This is incorrect.
>> >
>> > Fix this by withholding delay from the duration if edit lists are used.
>> > This also simplifies edit-list processing a bit, since the delay
>> > does not need to be removed from the calculated duration again.
>> > ---
>> >
>> >  The mov spec says that the tkhd duration is "derived from the track's
>> >  edits" [1] and the duratons of the other atoms (mvhd, mdhd) are in turn
>> >  taken from the longest track. So it seems that incorporating the delay
>> >  into the track duration is a bug in itself when the edit list has the
>> >  correct duration, and this propagates out tothe other top-level durations.
>> >
>> >  Unsure of how this change interacts with other modes that may expect
>> >  negative timestamps such as CMAF, so the patch errs on the side of
>> >  caution and only takes effect if edit lists are used. Can loosen that
>> >  up if necessary.
>> >
>> >  [1] https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA
>> >
>> > libavformat/movenc.c | 13 ++++++++-----
>> > 1 file changed, 8 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/libavformat/movenc.c b/libavformat/movenc.c
>> > index 7db2e28840..31441a9f6c 100644
>> > --- a/libavformat/movenc.c
>> > +++ b/libavformat/movenc.c
>> > @@ -2831,7 +2831,14 @@ static int64_t calc_pts_duration(MOVMuxContext *mov, MOVTrack *track)
>> >     if (track->end_pts != AV_NOPTS_VALUE &&
>> >         track->start_dts != AV_NOPTS_VALUE &&
>> >         track->start_cts != AV_NOPTS_VALUE) {
>> > -        return track->end_pts - (track->start_dts + track->start_cts);
>> > +        int64_t dur = track->end_pts, delay = track->start_dts + track->start_cts;
>> > +        /* Note, this delay is calculated from the pts of the first sample,
>> > +         * ensuring that we don't reduce the duration for cases with
>> > +         * dts<0 pts=0. */
>>
>> If you have a stream starting with dts<0 pts=0, you'll have start_pts =
>> start_dts + start_cts = 0. That gives delay=0 after your modification. But
>> the comment says "don't reduce the duration for cases with pts=0" - where
>> the delay variable would be zero anyway?
>>
>
> I'm not quite sure what you mean - that the comment is outdated?
> Or that this modification would perhaps not behave as expected?
>
> For what it's worth, the cases I'm concerned with have start_pts < 0.
>
>>
>>
>> I don't manage to follow the reasoning and explanation in the commit
>> message. To be able to concretely reason about this issue at all, we need
>> to look at a concrete example. Can you provide a sample input file and a
>> reproducible command, and point out which exact field in the muxer output
>> of that case that you consider wrong?
>>
>
> Had to create a trac to find somewhere to host the sample. Tried to put
> some details there but the formatting seems messed up and I can't figure
> out how to edit, apologies. So here is some more info -
>
> Input sample:
>
> https://trac.ffmpeg.org/raw-attachment/ticket/9028/test-timecode.mp4
>
> Run the following for a transmuxed clip from 3s for a 5s duration:
>
> ffmpeg -ss 3 -i test-timecode.mp4 -t 5 -c copy out.mp4
>
> Note that the actual cut location is mid-GOP, so there's a 1s pts delay
> at the beginning of the output file with negative pts.
>
> ffprobe shows:
>
> ffprobe -show_streams -show_format out.mp4 2>&1 | grep duration=
>
> duration=5.166992 # stream duration - correct
> duration=6.167000 # format duration - incorrect
>
> mp4dump'ing out.mp4 gives this:
>
> # incorrect: duration should be sum of elst durations
>  [tkhd] size=12+80, flags=3
>      duration = 6167
>
> # correct
>      [elst] size=12+16
>        entry_count = 1
>        entry/segment duration = 5167
>
> # incorrect; derived from track duration (tkhd)
>  [mvhd] size=12+96
>    timescale = 1000
>    duration = 6167

Ok, now I've finally had time to dig into this. It does indeed seem like 
what we produce right now is incorrect.

I don't think your patch does the right thing for cases that start with 
pts > 0. For those cases, the value returned by calc_pts_duration to 
mov_write_edts_tag would need to only cover the sample data itself, but 
for the other header durations would need to include the extra offset 
(adding the extra pts to it). And overall, the patch feels to me as it 
achieves it in a way that doesn't fit with how my mental model for this 
code is set up.

I've made an alternative patch that I'll post momentarily, where I try to 
address the issue, but in a way that fits the way I understand the code.

For your particular example code, it produces the same output as your 
patch, but I believe that it should do the right thing for cases that 
start with pts > 0 as well. (I practically didn't test that, but if 
someone would have time to do it, I'd appreciate it!)

// Martin



More information about the ffmpeg-devel mailing list