[FFmpeg-user] Synchronizing A/V streams from independent sources?

Sun Jun 1 22:52:29 CEST 2014

On Sun, 01 Jun 2014 20:38:10 +0200, Nicolas George <george at nsup.org> 
wrote:

>On Tridi 13 Prairial, year CCXXII, Jeff wrote:
>> For some reason, the source file which has my preferred video stream
>> runs slightly faster (1:20:32) than the source of my audio stream
>> (1:23:58.000431). Obviously, these streams cannot be synchronized by
>> shifting the start time.
>
>Looking at the numbers, it is pretty obvious that the first video has 25
>frames per second, which is the standard for PAL, while the second one has
>24000/1001 (approximately 23.976), which is one of the standards for NTSC.

Thank you for your reply.

Both my video sources are 25 fps (PAL), which is why I am puzzled by 
the fact that they do not have the same running times. Here are 
summaries, as provided by ffprobe:

Source from which I want to extract the video:

  Duration: 01:20:32.04, start: 0.000000, bitrate: 1252 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 720x540 [SAR 1:1 DAR 4:3], 1119 kb/s, 25 fps, 25 tbr, 16k tbn, 50 tbc (default)
    Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)

Source from which I want to extract the audio:

  Duration: 01:24:02.84, start: 0.023220, bitrate: 741 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 608 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)

Please note that this second file displays a logo at the end (the other 
file does not). With this material clipped off so that both files stop 
at the same place with respect to their content, the actual running 
time is 1:23:58.000431, as mentioned previously.
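For reference, the ratio between the two running times can be worked out directly from the figures above. The following is only a minimal Python sketch: the helper name `to_seconds` is made up for illustration, and the two candidate ratios are simply the 25 vs. 24000/1001 pair suggested in the quoted reply, plus 25/24 for comparison. It assumes both durations cover the same content.

```python
from fractions import Fraction

def to_seconds(hms: str) -> float:
    """Convert an H:MM:SS[.fraction] string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

video_dur = to_seconds("01:20:32.04")     # source with the preferred video
audio_dur = to_seconds("1:23:58.000431")  # audio source, end logo clipped off

ratio = audio_dur / video_dur
print(f"audio/video duration ratio: {ratio:.6f}")

# Candidate "standard" ratios to compare against the measured one.
candidates = {
    "25 / (24000/1001)": Fraction(25) / Fraction(24000, 1001),
    "25 / 24": Fraction(25, 24),
}
for name, value in candidates.items():
    # Offset left at the end of the program if this ratio were used
    # instead of the measured one.
    residual = abs(ratio - float(value)) * video_dur
    print(f"{name} = {float(value):.6f}, residual at end: {residual:.2f} s")
```

The residual printed for each candidate is an estimate of how far apart the streams would be at the end of the program, which is the same check as skipping to the end of a test encode.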

>I urge you to use these exact numbers in your computations. Remember that
>even a 0.005% difference will give a noticeable A-V desync over the run of
>your program.

Yes, I have noticed that. When I play my experimental results, I skip 
to the end of the program so that I can see how much the two streams 
have diverged.
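To put a number on how quickly a small error accumulates, here is a short Python sketch. The function name `drift_seconds` is hypothetical, and the only inputs are the 0.005% figure from the quoted reply and the 01:20:32.04 duration reported by ffprobe; the larger percentages are arbitrary examples.

```python
def drift_seconds(relative_error: float, duration_s: float) -> float:
    """A/V offset accumulated by the end of a program of the given length."""
    return relative_error * duration_s

video_dur = 1 * 3600 + 20 * 60 + 32.04  # 01:20:32.04 from ffprobe

for pct in (0.005, 0.05, 0.5):
    offset = drift_seconds(pct / 100, video_dur)
    print(f"{pct}% error -> {offset:.2f} s offset at the end")
```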

>> I have, so far, tried 'setpts=PTS*1.xxxxxxxx' to slow and extend the
>> video, but this method introduces jerkiness regardless of whether I
>> choose '-vsync vfr' or 'cfr'. I get better (smooth, natural
>
>That happens because setpts is too versatile, the framework can not guess
>what you are doing with it.

This is the expression recommended in the ffmpeg documentation for this 
purpose, although I can see from the other examples that setpts can 
indeed accomplish many things.

When vsync is set to 'cfr', the video is slowed and extended by 
duplicating frames. Apparently, in this circumstance, setting vsync to 
'vfr' doubles the display duration of each frame that would have been 
duplicated with 'cfr'. In scenes of simple motion, such as a single 
person walking across a landscape, the motion is periodically arrested 
for an instant while the same image is displayed for the duration of 
two frames.
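The periodic hitch can be reproduced with a simplified model. This Python sketch is not ffmpeg's actual vsync algorithm; it only assumes that each source timestamp is multiplied by a stretch factor and that every slot on a fixed-rate output grid shows the most recent source frame. The function name and the factor 1.042624 (roughly the ratio of the two running times) are illustrative. The frame rate cancels out, so it does not appear in the code.

```python
import math
from fractions import Fraction

def cfr_source_frames(n_frames: int, stretch: Fraction) -> list[int]:
    """For each output slot on a fixed-rate grid, return the source frame shown.

    Source frame i lands at stretched position i * stretch (in frame units);
    slot k shows the last source frame at or before position k.
    """
    total_slots = math.ceil(n_frames * stretch)
    # Largest i with i * stretch <= k is floor(k / stretch).
    return [int(Fraction(k) / stretch) for k in range(total_slots)]

stretch = Fraction(1042624, 1000000)  # illustrative stretch factor
shown = cfr_source_frames(100, stretch)

# A slot is a repeat when it shows the same source frame as the previous slot.
repeats = [k for k in range(1, len(shown)) if shown[k] == shown[k - 1]]
print(f"{len(shown)} output slots, {len(repeats)} repeated frames at slots {repeats}")
```

With a factor near 1.0426, the model repeats one frame roughly every 23 to 24 source frames, which at 25 fps is about once per second. That matches the kind of brief, regular freeze described above.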

>Non-technical note: if you want the best results, you should inquire about
>the original format of your content, in order to produce an output that
>matches it as closely as possible.

From what I have been able to discover, this is a direct-to-video 
European (PAL) release. Given the 4:3 display aspect ratios of both 
videos, they were most likely released on DVDs.

The video streams in both my source files are 25 fps progressive. I 
have stepped through small portions frame by frame, and could find no 
indication that these were telecine'd from (24 fps) film. 25 fps 
appears to be the original and native format.

>> Is there something I am missing, some other approach to take?
>
>There is something you are missing: the notion of time base.
>
>All timestamps are handled as integers, as a multiple of a base interval
>called time base. To optimize things, the time base is selected separately
>for each stream. For streams at constant frame rate, it is usually set to
>the normal interval between frames.

Thank you, that does help clarify some of what I have read on the 
topic. To make matters more complex, while the video stream has a time 
base as you describe, the MP4 container format has its own time base, 
which is different from the stream time base. I do not know if this 
time base, which ffmpeg identifies as 'tbn', is having an impact on 
what I am trying to do.
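As a rough illustration of what integer timestamps imply, the Python sketch below expresses one frame interval in units of 1/tbn seconds for the two 'tbn' values reported by ffprobe (16k and 12800). It assumes 'tbn' means ticks per second; the function name `frame_ticks` and the stretch factor 1.042624 are illustrative only.

```python
from fractions import Fraction

def frame_ticks(fps: Fraction, tbn: int, stretch: Fraction = Fraction(1)) -> Fraction:
    """Duration of one frame, in units of 1/tbn seconds, after stretching."""
    return stretch / fps * tbn

stretch = Fraction(1042624, 1000000)  # illustrative factor from the running times

for tbn in (16000, 12800):
    native = frame_ticks(Fraction(25), tbn)
    stretched = frame_ticks(Fraction(25), tbn, stretch)
    print(
        f"tbn={tbn}: native {native} ticks, "
        f"stretched {float(stretched):.3f} ticks "
        f"(whole number: {stretched.denominator == 1})"
    )
```

At 25 fps, the native frame interval is a whole number of ticks in both time bases, while the stretched interval is not, so stretched timestamps have to be rounded somewhere along the way.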

>In other words, with your 25 FPS that you are trying to convert to 24 FPS,

I am not trying to convert the frame rate. My best solution would be to 
stream-copy the video from my preferred source (or recode it to a lower 
bitrate that preserves the quality and native frame rate), and then 
"shrink" the audio to match it. If I have to "stretch" the video, I may 
use '-vsync vfr'--but not because I am trying to convert the frame 
rate.
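One way the "shrink the audio" approach might be set up is sketched below in Python, which only assembles an ffmpeg argument list and does not run it. The file names are placeholders, the tempo value is the ratio of the two running times, and the choice of the `atempo` filter (which changes speed without changing pitch) and the `aac` encoder are assumptions, not something settled in this thread. The 0.023220 start offset of the audio source is ignored here.

```python
import shlex

def build_command(video_src: str, audio_src: str, out: str, tempo: float) -> list[str]:
    """Assemble an ffmpeg argument list: copy the video, speed up the audio."""
    return [
        "ffmpeg",
        "-i", video_src,
        "-i", audio_src,
        "-map", "0:v:0",   # video stream from the first input
        "-map", "1:a:0",   # audio stream from the second input
        "-c:v", "copy",    # stream-copy the video, no re-encode
        "-filter:a", f"atempo={tempo:.6f}",
        "-c:a", "aac",     # filtered audio must be re-encoded
        "-shortest",       # stop at the end of the shorter stream
        out,
    ]

# Hypothetical file names; tempo = audio running time / video running time.
audio_dur = 1 * 3600 + 23 * 60 + 58.000431
video_dur = 1 * 3600 + 20 * 60 + 32.04
cmd = build_command("video_source.mp4", "audio_source.mp4", "combined.mp4",
                    audio_dur / video_dur)
print(shlex.join(cmd))
```

Because the video is stream-copied, its frame timing is left untouched and only the audio is resampled in time, which avoids the duplicated-frame problem entirely.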

>What you need is to decompose all steps so that ffmpeg gets the computation
>right.

Thank you, I will certainly save this procedure for future reference, 
but I do not know how it can be used for my current situation, since I 
am not trying to convert frame rates.

My apologies for not being clear in my original message. I hope that I 
have described my situation, and what I hope to accomplish, more 
completely. I hope that you will be able to help me. Thanks again for 
writing.

Jeff