[FFmpeg-devel] [PATCH] avformat/dv: fix timestamps of audio packets in case of dropped corrupt audio frames

Sun Aug 22 21:38:52 EEST 2021

Hi Marton,

> On Feb 23, 2021, at 3:07 PM, Dave Rice <dave at dericed.com> wrote:
> 
>> On Feb 23, 2021, at 2:42 PM, Marton Balint <cus at passwd.hu> wrote:
>> 
>> On Sat, 20 Feb 2021, Dave Rice wrote:
>> 
>>> Hi,
>>> 
>>>> On Oct 31, 2020, at 5:15 PM, Marton Balint <cus at passwd.hu> wrote:
>>>> On Sat, 31 Oct 2020, Dave Rice wrote:
>>>>>> On Oct 31, 2020, at 3:47 PM, Marton Balint <cus at passwd.hu> wrote:
>>>>>> On Sat, 31 Oct 2020, Dave Rice wrote:
>>>>>>> Hi Marton,
>>>>>>>> On Oct 31, 2020, at 12:56 PM, Marton Balint <cus at passwd.hu> wrote:
>>>>>>>> Fixes out of sync timestamps in ticket #8762.
>>>>>>> Although Michael’s recent patch does address the issue documented in 8762, I haven’t found this patch to fix the issue. I tried with -c:a copy and with -c:a pcm_s16le with some sample files that exhibit this issue, but each output was out of sync. I put an output at https://gist.github.com/dericed/659bd843bd38b6f24a60198b5e345795. That output notes that 3597 packets of video are read and 3586 packets of audio. In the resulting file, at the end of the timeline the audio is 9 frames out of sync, and my output video stream is 00:02:00.020 and output audio stream is 00:01:59.653.
>>>>>>> Beyond copying or encoding the audio, are there other options I should use to test this?
>>>>>> Well, it depends on what you want. After this patch you should get a file which has audio packets synced to video, but the audio stream is sparse, not every video packet has a corresponding audio packet. (It looks like our MOV muxer does not support muxing of sparse audio therefore does not produce proper timestamps. But MKV does, please try that.)
>>>>>> You can also make ffmpeg generate the missing audio based on packet timestamps. Swresample has an async=1 option, so something like this should get you synced audio with continuous audio packets:
>>>>>> ffmpeg -y -i 1670520000_12.dv -c:v copy \
>>>>>> -af aresample=async=1:min_hard_comp=0.01 -c:a pcm_s16le 1670520000_12.mov
>>>>> Thank you for this. With the patch and async, the result is synced and the resulting audio was the same as Michael’s patch.
>>>>> Could you explain why you used min_hard_comp here? IIUC min_hard_comp sets a threshold between the strategies of trim/fill or stretch/squeeze to align the audio to time; however, the async documentation says "Setting this to 1 will enable filling and trimming, larger values represent the maximum amount in samples that the data may be stretched or squeezed" so I thought that async=1 would not permit stretch/squeeze anyway.
>>>> It is documented poorly, but if you check the source code you will see that async=1 implicitly sets min_comp to 0.001 enabling trimming/dropping. min_hard_comp decides the threshold when silence injection actually happens, and the default for that is 0.1, which is more than a frame, therefore not acceptable if we want to maintain <1 frame accuracy. Or at least that is how I think it should work.
>>> 
>> 
>>> I’ve found that aresample=async=1:min_hard_comp=0.01, as discussed here, works well to add audio samples to maintain timestamp accuracy when muxing into a format like mov. However, this approach doesn’t work if the sparseness of the audio stream is at the end of the stream. Is there a way to use min_hard_comp to consider differences between a timestamp and audio data when one of the ends of that range is the end of the file?
>> 
>> I am not aware of a smart method to generate the missing audio at the end, up to the end of the video.
>> 
>> As a possible workaround you may query the video length using
>> ffprobe or mediainfo, and then use a second filter, apad to pad audio:
>> 
>> -af aresample=async=1:min_hard_comp=0.01,apad=whole_dur=<video_length>
>> 
>> This might do what you want, but requires an additional step to query the video length…
> 
> 
> […]
> Perfect, thanks for sharing this idea.

I was hoping I could ask your advice on a related issue. There’s a sample to show it at https://archive.org/download/test_a_202108/test_a.dv. This sample has three frames; frames 1 and 3 have 2 tracks of 32k stereo audio. Frame 2 has nulled audio in the audio DIF blocks and there are no audio metadata packs, so ffmpeg (rightly) does not find any audio in this frame.

Using your advice about the aresample filter above, I was using a command like this to mux the video and audio into a new container as losslessly as feasible:
ffmpeg -y -i test_a.dv -filter_complex "[0:a:0]aresample=async=1:min_hard_comp=0.01[aud1]" -c:v:0 copy -map 0:v:0 -c:a pcm_s16le -map "[aud1]" test_audio.mov

However, here the resulting audio track of test_audio.mov is 84 ms and the video track is 101 ms. The audio and video for frame 3 in test_audio.mov are out of sync, as the audio for frame 3 starts in the middle of frame 2. It looks like frames 1 and 3 each output 1001/30000 sec of audio, but aresample only adds about half of that to fill the gap left by the lack of audio in frame 2.
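
To make the "about half" estimate concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes the figures above (three frames of 1001/30000 sec each, 32 kHz audio, an 84 ms audio track), that 84 ms is a rounded value, and that frames 1 and 3 each contribute a full frame of audio; the variable names are only illustrative.

```python
from fractions import Fraction

SAMPLE_RATE = 32000                      # Hz, per the sample description
FRAME_DURATION = Fraction(1001, 30000)   # seconds per frame
FRAMES = 3
OBSERVED_AUDIO_MS = 84                   # reported audio track length (assumed rounded)

# Audio samples carried by one frame at 32 kHz (not a whole number).
samples_per_frame = SAMPLE_RATE * FRAME_DURATION

# Audio duration if every frame, including frame 2, were fully covered.
expected_audio_ms = float(FRAMES * FRAME_DURATION * 1000)
frame_ms = float(FRAME_DURATION * 1000)

# Assumption: frames 1 and 3 each contribute one full frame of audio,
# so whatever remains of the observed duration is the fill for frame 2.
fill_ms = OBSERVED_AUDIO_MS - 2 * frame_ms
fill_fraction = fill_ms / frame_ms
shortfall_ms = expected_audio_ms - OBSERVED_AUDIO_MS

print(f"samples per frame: {float(samples_per_frame):.2f}")
print(f"expected audio:    {expected_audio_ms:.1f} ms")
print(f"fill for frame 2:  {fill_ms:.1f} ms ({fill_fraction:.0%} of a frame)")
print(f"shortfall:         {shortfall_ms:.1f} ms")
```

Under those assumptions the inserted silence covers roughly half of frame 2, which matches the audio for frame 3 starting mid-way through frame 2.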

Kind Regards,

Dave Rice