[FFmpeg-devel] [PATCH] avformat/dv: fix timestamps of audio packets in case of dropped corrupt audio frames

Wed Nov 4 11:09:54 EET 2020

On Mon, Nov 02, 2020 at 09:42:21PM +0100, Marton Balint wrote:
> 
> 
> On Mon, 2 Nov 2020, Michael Niedermayer wrote:
> 
> > > > Please correct me if I am wrong, but
> > > > in cases where no audio is missing or damaged, this would also ignore how much
> > > > audio is in each packet. So you could have, let's say, a timestamp difference
> > > > of exactly 1 second between 2 packets while there is actually not exactly
> > > > 1 second worth of audio samples between them.
> > > 
> > > This is true, by using the frame counter (and the video time base) for
> > > audio, we lose some audio packet timestamp precision inherently. However I
> > > don't consider this a problem, audio timestamps do not have to be sample
> > > accurate, for most formats they are not.
> > 
> > 
> > > Also it is not practical to keep
> > > track of how many samples are there in the packets, for example when you do
> > > seeking, obviously you can't read all the audio data before the seek point
> > > to get a precise sample accurate timestamp.
> > 
> > It's true that with seeking there is not enough information for sample-precise
> > timestamps. But from packet to packet, as long as no seek happened, there is.
> 
> And that timestamp can turn out to be wrong. If the audio clock is running
> at a little more than 48 kHz, there will be A-V desync because after some time
> audio and video timestamps for packets coming from the same DV frame will
> diverge significantly.
> 
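
To put a rough number on that drift, here is a minimal sketch. The 48005 Hz
clock is a made-up value for illustration (about 104 ppm fast), and NTSC DV
at 30000/1001 fps is assumed; neither figure comes from the thread.

```python
from fractions import Fraction

NOMINAL_RATE = 48000                # Hz, the rate sample-counted timestamps assume
ACTUAL_RATE = 48005                 # Hz, hypothetical slightly fast audio clock
FRAME_RATE = Fraction(30000, 1001)  # assumed NTSC DV video frame rate

def drift_seconds(wall_seconds):
    """Sample-counted audio time minus video (frame-counted) time."""
    video_time = Fraction(wall_seconds)  # frame timestamps follow wall time
    # Counting recorded samples at the nominal rate overestimates elapsed time.
    audio_time = Fraction(ACTUAL_RATE, NOMINAL_RATE) * wall_seconds
    return audio_time - video_time

def drift_frames(wall_seconds):
    """The same drift expressed in video frames."""
    return drift_seconds(wall_seconds) * FRAME_RATE

print(float(drift_seconds(3600)), float(drift_frames(3600)))
```

Under these assumptions, one hour of material accumulates 3/8 of a second
of divergence, a bit more than 11 video frames.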

> > My concern was more about something like significant frame to frame
> > differences in audio sample numbers.
> > Because if some hw or sw generates this we would produce packets of
> > identical duration which differ substantially in number of samples and
> > that would not be handled well in any scenario that accepted the timestamps
> > and durations as exact.
> 
> In general, you can't assume that timestamps or packet durations are exact.
> Consider you have a format which stores timestamps and durations in
> milliseconds. Rounding errors will occur.

sure, maybe the distinction of millisecond/rounded timebases and exact
timebases needs a flag somewhere.

> Also, for consumer equipment audio
> and video are rarely locked together, and audio sample rates are rarely very
> precise.

sure, this case is maybe a bit more exceptional than this though
we have "millisecond" based formats, rounded timestamps
we have "exact" cases, maybe the timebase being 1 packet/frame per tick
we have "high precision" where the timebase is so precise it doesn't matter

This here though is a bit of an oddball: the size of 1 PCM frame is 1 sample.
The timebase is not a millisecond-based one, it's not 1 frame either, nor is
it exact or high precision.
It's 1 video frame, and whatever amount of audio there is in the container,
which IIUC can differ from 1 video frame even when rounded.
Maybe this just doesn't occur and each frame has a count of samples always
rounded to the closest integer count for the video frame.
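
As a rough illustration of that "rounded to the closest integer" case, the
sketch below assumes 48 kHz audio and NTSC DV at 30000/1001 fps, which gives
1601.6 samples per video frame. The allocation rule (round the cumulative
sample count to the nearest integer) is an assumption made for this example,
not a claim about what DV hardware actually does.

```python
from fractions import Fraction

SAMPLE_RATE = 48000
FRAME_RATE = Fraction(30000, 1001)            # assumed NTSC DV frame rate
SAMPLES_PER_FRAME = SAMPLE_RATE / FRAME_RATE  # Fraction(8008, 5), i.e. 1601.6

def samples_in_frame(n):
    """Samples in frame n if the cumulative count is rounded to the nearest integer."""
    return round(SAMPLES_PER_FRAME * (n + 1)) - round(SAMPLES_PER_FRAME * n)

def timestamp_error_samples(n):
    """Samples actually stored before frame n, minus the ideal n * 1601.6."""
    actual_start = sum(samples_in_frame(i) for i in range(n))
    return actual_start - SAMPLES_PER_FRAME * n

print([samples_in_frame(i) for i in range(5)])
print(max(abs(timestamp_error_samples(n)) for n in range(50)))
```

With this rule the per-frame counts differ by at most one sample and a
frame-counter timestamp never drifts more than half a sample from the
sample-counted position, so the loss of precision would be negligible.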

But if for example some hardware was using internally a 16 sample buffer
and only put multiples of 16 samples in frames, this would introduce a
considerable amount of jitter in the timestamps in relation to the actual
duration. And using async to fix this without introducing more problems
might require some care.
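
A small sketch of that hypothetical 16-sample case, again assuming 48 kHz
audio and 30000/1001 fps video. The device model (write as many whole
16-sample blocks as have accumulated by the end of each frame) is invented
for illustration only.

```python
from fractions import Fraction

SAMPLE_RATE = 48000
FRAME_RATE = Fraction(30000, 1001)            # assumed NTSC DV frame rate
SAMPLES_PER_FRAME = SAMPLE_RATE / FRAME_RATE  # 1601.6 samples per video frame
BLOCK = 16                                    # hypothetical hardware buffer size

def block_quantized_counts(num_frames):
    """Per-frame sample counts when only whole 16-sample blocks are written."""
    counts, written = [], 0
    for n in range(1, num_frames + 1):
        due = SAMPLES_PER_FRAME * n        # samples that exist by the end of frame n
        total = (due // BLOCK) * BLOCK     # whole blocks available so far
        counts.append(total - written)
        written = total
    return counts

counts = block_quantized_counts(1000)
# Each packet's real duration versus the nominal one-frame duration, in samples.
deviations = sorted({float(c - SAMPLES_PER_FRAME) for c in counts})
print(sorted(set(counts)), deviations)
```

In this model every packet carries either 1600 or 1616 samples while its
timestamp duration is always one video frame, so the real duration is off
by -1.6 or +14.4 samples per packet (the latter is 0.3 ms at 48 kHz).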

> > Maybe this never occurs and in that case your patch should be a good idea
> > but if it does happen then some code would be needed to deal with that.
> > It is detectable when sample counts do not match what is expected.
> 
> Yeah, and we have tools to fix that, like -af aresample=async=1.

> 
> > That said, i would consider a fix for #8762 to produce correct audio in
> > all cases including wav/pcm/mov/... output and not just when the output
> > can store "corrupted"/"sparse" audio.
> 
> I think ffmpeg.c should be smarter about it, and be aware if unlocked or
> sparse audio (or audio not starting at the same time as video) is supported
> by certain muxers or not. And if it is not supported, then maybe
> -af aresample=async=1 or similar should be used automagically.

yes

thx

[....]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

What does censorship reveal? It reveals fear. -- Julian Assange