[FFmpeg-devel] [PATCH] avformat/movenc: Remove dts delay from duration.

Wed Dec 23 01:03:24 EET 2020

Hi Martin,

On Sat, 19 Dec 2020 at 15:10, Martin Storsjö <martin at martin.st> wrote:
>
> On Fri, 11 Dec 2020, Josh Allmann wrote:
>
> > On Fri, 11 Dec 2020 at 14:07, Martin Storsjö <martin at martin.st> wrote:
> >>
> >> On Fri, 11 Dec 2020, Josh Allmann wrote:
> >>
> >> > A negative start_dts value (eg, delay from edit lists) typically yields
> >> > a duration larger than end_pts. During edit list processing, the
> >> > delay is removed again, yielding the correct duration within the elst.
> >> >
> >> > However, other duration-carrying atoms (tkhd, mvhd, mdhd) still have
> >> > the delay incorporated into their durations. This is incorrect.
> >> >
> >> > Fix this by withholding delay from the duration if edit lists are used.
> >> > This also simplifies edit-list processing a bit, since the delay
> >> > does not need to be removed from the calculated duration again.
> >> > ---
> >> >
> >> >  The mov spec says that the tkhd duration is "derived from the track's
> >> >  edits" [1] and the duratons of the other atoms (mvhd, mdhd) are in turn
> >> >  taken from the longest track. So it seems that incorporating the delay
> >> >  into the track duration is a bug in itself when the edit list has the
> >> >  correct duration, and this propagates out tothe other top-level durations.
> >> >
> >> >  Unsure of how this change interacts with other modes that may expect
> >> >  negative timestamps such as CMAF, so the patch errs on the side of
> >> >  caution and only takes effect if edit lists are used. Can loosen that
> >> >  up if necessary.
> >> >
> >> >  [1] https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA
> >> >
> >> > libavformat/movenc.c | 13 ++++++++-----
> >> > 1 file changed, 8 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/libavformat/movenc.c b/libavformat/movenc.c
> >> > index 7db2e28840..31441a9f6c 100644
> >> > --- a/libavformat/movenc.c
> >> > +++ b/libavformat/movenc.c
> >> > @@ -2831,7 +2831,14 @@ static int64_t calc_pts_duration(MOVMuxContext *mov, MOVTrack *track)
> >> >     if (track->end_pts != AV_NOPTS_VALUE &&
> >> >         track->start_dts != AV_NOPTS_VALUE &&
> >> >         track->start_cts != AV_NOPTS_VALUE) {
> >> > -        return track->end_pts - (track->start_dts + track->start_cts);
> >> > +        int64_t dur = track->end_pts, delay = track->start_dts + track->start_cts;
> >> > +        /* Note, this delay is calculated from the pts of the first sample,
> >> > +         * ensuring that we don't reduce the duration for cases with
> >> > +         * dts<0 pts=0. */
> >>
> >> If you have a stream starting with dts<0 pts=0, you'll have start_pts =
> >> start_dts + start_cts = 0. That gives delay=0 after your modification. But
> >> the comment says "don't reduce the duration for cases with pts=0" - where
> >> the delay variable would be zero anyway?
> >>
> >
> > I'm not quite sure what you mean - that the comment is outdated?
> > Or that this modification would perhaps not behave as expected?
> >
> > For what it's worth, the cases I'm concerned with have start_pts < 0.
> >
> >>
> >>
> >> I don't manage to follow the reasoning and explanation in the commit
> >> message. To be able to concretely reason about this issue at all, we need
> >> to look at a concrete example. Can you provide a sample input file and a
> >> reproducible command, and point out which exact field in the muxer output
> >> of that case that you consider wrong?
> >>
> >
> > Had to create a trac to find somewhere to host the sample. Tried to put
> > some details there but the formatting seems messed up and I can't figure
> > out how to edit, apologies. So here is some more info -
> >
> > Input sample:
> >
> > https://trac.ffmpeg.org/raw-attachment/ticket/9028/test-timecode.mp4
> >
> > Run the following for a transmuxed clip from 3s for a 5s duration:
> >
> > ffmpeg -ss 3 -i test-timecode.mp4 -t 5 -c copy out.mp4
> >
> > Note that the actual cut location is mid-GOP, so there's a 1s pts delay
> > at the beginning of the output file with negative pts.
> >
> > ffprobe shows:
> >
> > ffprobe -show_streams -show_format out.mp4 2>&1 | grep duration=
> >
> > duration=5.166992 # stream duration - correct
> > duration=6.167000 # format duration - incorrect
> >
> > mp4dump'ing out.mp4 gives this:
> >
> > # incorrect: duration should be sum of elst durations
> >  [tkhd] size=12+80, flags=3
> >      duration = 6167
> >
> > # correct
> >      [elst] size=12+16
> >        entry_count = 1
> >        entry/segment duration = 5167
> >
> > # incorrect; derived from track duration (tkhd)
> >  [mvhd] size=12+96
> >    timescale = 1000
> >    duration = 6167
>
> Ok, now I've finally had time to dig into this. It does indeed seem like
> what we produce right now is incorrect.
>
> I don't think your patch does the right thing for cases that start with
> pts > 0. For those cases, the value returned by calc_pts_duration to
> mov_write_edts_tag would need to only cover the sample data itself, but
> for the other header durations would need to include the extra offset
> (adding the extra pts to it). And overall, the patch feels to me as it
> achieves it in a way that doesn't fit with how my mental model for this
> code is set up.
>
> I've made an alternative patch that I'll post momentarily, where I try to
> address the issue, but in a way that fits the way I understand the code.
>
> For your particular example code, it produces the same output as your
> patch, but I believe that it should do the right thing for cases that
> start with pts > 0 as well. (I practically didn't test that, but if
> someone would have time to do it, I'd appreciate it!)
>

Thank you for taking the time to look into this! Tested with your
alternative patch and it does seem to be an improvement. I was able to
find a case where it didn't seem to do the correct thing (described
below), but this is not a regression; master doesn't do the correct
thing, and neither does my patch. I haven't looked much at what the
code is doing beyond running these tests, but could find some time to
dig in early 2021 if it's not fixed by then.

First, a working sample with pts > 0 . This has a 600-frame GOP (20s @
30fps) to show that cutting mid-GOP works correctly.

https://trac.ffmpeg.org/raw-attachment/ticket/9028/gop-600.mp4

Command to generate a clipped sample with edit list. This spans two GOPs.

ffmpeg -ss 17.316666999999999 -i gop-600.mp4 -t 5 -c copy -movflags
+faststart -y cut-mp4-600.mp4

Things look good with ffprobe, and playback works well with ffplay; it
starts right around the 17-second mark. The sample is a timecode
pattern so it is easy to verify.

If the sample is a mpegts (rather than a mp4, all other things
identical), then things are a bit odd:

https://trac.ffmpeg.org/raw-attachment/ticket/9028/gop-600.ts

Cut the sample with:

ffmpeg -ss 17.316666999999999 -i gop-600.ts -t 5 -c copy -movflags
+faststart -y cut-ts-600.mp4

Probing gives the correct duration with a start time of 2.683.
However, playback starts at the 20 second mark. (Same timecode
pattern, so easy to visually verify.) Incidentally, it seems that 20 -
2.683 = 17.316, which is the original cut position.

ffprobe cut-ts-600.mp4 gives "Duration: 00:00:05.12, start: 2.683000"

The trac ticket has a bit more info (including commands for how to
generate the samples) but this is the gist of it.

Thanks again and happy holidays.

Josh