[FFmpeg-devel] [PATCHv2 2/3] avformat/utils: increase detected start_time with skip_samples

wm4 nfxjfg at googlemail.com
Thu Mar 10 09:12:31 CET 2016


On Wed, 9 Mar 2016 23:02:14 +0100 (CET)
Marton Balint <cus at passwd.hu> wrote:

> On Wed, 9 Mar 2016, wm4 wrote:
> > On Tue, 8 Mar 2016 23:44:13 +0100 (CET)
> > Marton Balint <cus at passwd.hu> wrote:
> >  
> >> On Tue, 8 Mar 2016, Hendrik Leppkes wrote:
> >>   
> >> > On Tue, Mar 8, 2016 at 10:54 PM, Marton Balint <cus at passwd.hu> wrote:   
> >> >>
> >> >> On Tue, 8 Mar 2016, wm4 wrote:
> >> >>   
> >> >>> On Tue,  8 Mar 2016 21:29:52 +0100
> >> >>> Marton Balint <cus at passwd.hu> wrote:
> >> >>>   
> >> >>>> Signed-off-by: Marton Balint <cus at passwd.hu>
> >> >>>> ---
> >> >>>>  libavformat/utils.c               | 10 ++++--
> >> >>>>  tests/ref/fate/gapless2-ipod-aac1 | 72
> >> >>>> +++++++++++++++++++--------------------
> >> >>>>  tests/ref/fate/gapless2-ipod-aac2 | 72
> >> >>>> +++++++++++++++++++--------------------
> >> >>>>  3 files changed, 80 insertions(+), 74 deletions(-)
> >> >>>>   
> >> >>   
> >> >>> I'm probably a bit late here, but what's the rationale of increasing the
> >> >>> start time?
> >> >>>   
> >> >>
> >> >> According to docs, start time is supposed to be the pts of the first
> >> >> decoded frame, therefore skipped samples must be taken into account, when
> >> >> the start time is determined based on the first packet pts.
> >> >>   
> >> >
> >> > But the skipping is performed by avcodec, not avformat, isn't it?   
> >> 
> >> Yes.
> >>   
> >> > start_time should be the PTS of the first avpacket coming out of
> >> > avformat, never mind what a decoder might do to that later.   
> >> 
> >> Not according to the docs:
> >> 
> >> "AVStream->start_time: decoding: pts of the first frame of the stream in 
> >> presentation order, in stream time base. Only set this if you are 
> >> absolutely 100% sure that the value you set it to really is the pts of 
> >> the first frame."
> >> 
> >> If "frame" here refers to a packet, why are the docs talking about 
> >> presentation order?
> >> 
> >> Also check the libavformat/mp3dec.c, it does the same kind of start_time 
> >> adjustment based on the skipped samples.  
> >
> > True, with mp3, the first PTS is always 0, and after skipping it will
> > be some positive value, and start_time is set to the skip duration. Is
> > this the same here?  
> 
> Yes. These are iTunes (-compatible) m4a files by the way with iTunes 
> gapless metadata. I don't think you can make any assumptions based on 
> a negative timestamp, even for MOV, elst should contain the proper start 
> time:
> https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
> 
> > Also, what the mp3 demuxer does is not necessarily correct. And
> > also, what works for audio-only formats isn't necessarily right if
> > there are video tracks involved.  
> 
> Only audio stream start times are adjusted, I don't see why it makes 
> a difference if there are other (video) streams.
> 
> > What I'm really worried about is what utils.c will determine as total 
> > stream start time?  
> 
> What do you mean by "total stream start time"? Are you referring to 
> AVFormatContext->start_time? Isn't that simply the minimum of the stream start 
> times?

That's what it seems to be doing.
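To make the "minimum of the stream start times" idea concrete, here is a
minimal sketch. It is not the actual utils.c code: it assumes all stream
start times have already been rescaled to one common time base, and
UNKNOWN_TS and min_start_time() are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "start time not known"; not an FFmpeg symbol. */
#define UNKNOWN_TS INT64_MIN

/* Return the smallest known start time among n streams, or UNKNOWN_TS if
 * no stream has a known start time. All values are assumed to share one
 * time base (the real code would have to rescale them first). */
static int64_t min_start_time(const int64_t *stream_start, size_t n)
{
    int64_t best = UNKNOWN_TS;

    assert(stream_start != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (stream_start[i] == UNKNOWN_TS)
            continue; /* ignore streams without a start time */
        if (best == UNKNOWN_TS || stream_start[i] < best)
            best = stream_start[i];
    }
    return best;
}
```

With only an audio stream whose start_time was raised to the skip
duration, the result is that skip duration. With a video stream starting
at 0 next to it, the result stays 0.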

Anyway, with just an audio track, adjusting start_time is rather
inoffensive.

If there's a video track, it becomes complicated. The audio packets
(after applying delay skipping) will not start at 0 (even if you adjust
AVStream.start_time, obviously). So something else needs to make sure
that they either start at 0, or the video track needs to be offset by
the audio delay.

So I would have thought that the edit list actually changes the audio
track so that it starts exactly at time 0 after
skipping the padding (presumably the video starts at time 0). This
would mean mov.c actually has to adjust the audio packet timestamps so
that the first audio packet PTS is negative (-padding). After skipping
padding, the first sample would have timestamp 0.

This also means the AVFormatContext.start_time should be 0 or unknown,
instead of e.g. the raw audio packet's negative PTS.

Or am I misunderstanding something?
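To spell out the arithmetic I have in mind, here is a rough sketch. It is
not what mov.c does, just the expectation described above. It assumes
timestamps are counted in samples (time base 1/sample_rate) and a constant
frame size; shifted_packet_pts() and first_audible_pts() are hypothetical
helpers, and the padding value used with them is only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg API: PTS a demuxer would give packet
 * number `index` if it shifted the whole track back by `padding` samples.
 * Assumes a constant frame size and a time base of 1/sample_rate. */
static int64_t shifted_packet_pts(int64_t index, int64_t frame_size,
                                  int64_t padding)
{
    assert(index >= 0 && frame_size > 0 && padding >= 0);
    return index * frame_size - padding;
}

/* PTS of the first sample left after the decoder drops `padding` samples
 * from a stream whose first packet has PTS `first_packet_pts`. */
static int64_t first_audible_pts(int64_t first_packet_pts, int64_t padding)
{
    assert(padding >= 0);
    return first_packet_pts + padding;
}
```

For example, with a padding of 2112 samples the first packet would get
PTS -2112, and the first sample that survives skipping lands at 0. Without
the shift, the first packet is at 0 and the first surviving sample is at
2112, which is the case where the video would need an offset.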

> > From your added tests it looks like mov.c does now what I'd expect,
> > though. I just want to be sure this mess doesn't become worse.
> >  
> 
> Sure, ok.
> 
> > Last but not least, why do we have about 3 mechanisms to signal
> > pre-skip? (Skip side data, skip_samples field, and delay field.)  
> 
> As far as I see, avformat code injects skip side data based on the
> AVStream->skip_samples field, avcodec code uses skip side data to 
> skip the samples.
> 
> I guess the delay field is the old way of signalling the 
> number of to-be-skipped frames. As this can only signal whole frames and 
> not samples, it has a very limited use. Generic code does not do anything 
> with this.

The delay field is in samples. Some demuxers are using it for this
purpose (matroskadec and oggparseopus.c come to mind).

Yes, the common code and ffmpeg.c oddly ignore this, but API users can
use it to handle preskip correctly with these formats.

