[FFmpeg-user] Create an AAC stream matching the Core Media Audio packet format / priming etc?

Mon May 22 17:22:34 EEST 2017

On 15 Apr 2017, at 09:22, Christian Ebert <blacktrash at gmx.net> wrote:
> Somewhat counterintuitive, but you never know:
> 
> -filter:a aresample=async=1:first_pts=0,asetpts=PTS-STARTPTS+1024
> 
> combined with the -t incantation.

Hi Christian,

It seems this issue is not going to garner much attention, which is a little disheartening. I can show how to reproduce it and would love to be able to help.

So I looked back at your -af above and realised that the 1024 should actually be 2112, which is Apple’s chosen fixed encoding delay.
https://developer.apple.com/library/content/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html

-filter:a aresample=async=1:first_pts=0,asetpts=PTS-STARTPTS+2112

...brings the native aac encoder almost perfectly into sync when played by a QuickTime-based decoder. There is a tiny discrepancy, but it's 99.9% better than without the -af line.
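
For a sense of scale, here is a rough Python sketch of what 2112 samples amounts to in time. The two sample rates are assumed example values, not taken from my files, and the sketch assumes the audio filter timebase is 1/sample_rate, so that the +2112 in asetpts counts samples.

```python
from fractions import Fraction

# Apple's fixed AAC encoder delay in samples, per the QTFF appendix linked above.
PRIMING_SAMPLES = 2112


def delay_seconds(sample_rate: int, priming: int = PRIMING_SAMPLES) -> Fraction:
    """Return the encoder delay in seconds for the given sample rate."""
    return Fraction(priming, sample_rate)


# 44100 and 48000 are assumed example rates.
for rate in (44100, 48000):
    print(f"{rate} Hz: {float(delay_seconds(rate)) * 1000:.2f} ms")
```

At 48 kHz that is 44 ms, slightly more than one picture frame at 24fps (about 41.67 ms).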

Further to this, with the AudioToolbox AAC encoder (aac_at, available in ffmpeg on macOS only) and the above -af line, the discrepancy is gone and the encoded file is a perfect sync match for the original source file.
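
To make the comparison easy to repeat, here is a minimal sketch that assembles the full command line for either encoder. The file names and the helper build_cmd are hypothetical, and "-c:v copy" is my assumption for illustration; swap in whatever video settings you actually use.

```python
import shlex


def build_cmd(src, dst, encoder="aac", priming=2112):
    """Assemble an ffmpeg argument list using the -filter:a line from this thread.

    encoder is "aac" (native) or "aac_at" (AudioToolbox, macOS builds only).
    """
    audio_filter = (
        "aresample=async=1:first_pts=0,"
        f"asetpts=PTS-STARTPTS+{priming}"
    )
    return [
        "ffmpeg", "-i", src,
        "-c:v", "copy",            # assumption: the picture is passed through untouched
        "-c:a", encoder,
        "-filter:a", audio_filter,
        dst,
    ]


# Hypothetical file names, for illustration only.
for enc in ("aac", "aac_at"):
    print(shlex.join(build_cmd("source.mov", f"out_{enc}.mov", enc)))
```

The list form can be passed straight to subprocess.run if you want to script the two encodes back to back.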

The outstanding issue is that the remaining samples at the end of the file are not being trimmed, so the audio runs past the end of the picture and we get a black frame. Perhaps the remaining samples are not being flagged in a way that the decoder would expect; I’m really not sure.
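
To put a number on those remaining samples, here is a small sketch of the packet arithmetic, assuming 1024 samples per AAC packet and 2112 priming samples as in the Apple document. The source length of 5389 samples is just an illustrative value, and a real encoder may emit more packets than this minimum.

```python
PRIMING = 2112        # leading encoder delay in samples
PACKET_SAMPLES = 1024  # samples carried by one AAC packet


def packet_layout(source_samples, priming=PRIMING, packet=PACKET_SAMPLES):
    """Return (packet_count, remainder) for a source of the given length.

    remainder is the number of padding samples at the tail of the last
    packet, i.e. the samples a decoder has to trim at the end.
    """
    total = priming + source_samples
    packets = -(-total // packet)  # ceiling division on integers
    remainder = packets * packet - total
    return packets, remainder


packets, remainder = packet_layout(5389)
print(f"{packets} packets, {remainder} remainder samples")
```

If the remainder is not signalled to the decoder in a form it understands, those padding samples would play out as extra audio, which would be consistent with the overlong stream I am seeing.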

I can use the -t option with a value of (total duration of source - 0.041) to trim the end, but it has issues too.
In my case of a 24fps source, 0.041 is roughly the duration of 1 frame. Doing this shortens the overlong audio stream, but it removes the last frame of audio and in some cases does strange things with the last two frames of audio. So it's not really usable in a production environment. Without subtracting 0.041, the audio is still overlong.
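
For reference, the -t value can be worked out like this. It is only a sketch: the 10 second source duration and the 48 kHz sample rate are assumed example values, and trim_duration is a hypothetical helper name.

```python
from fractions import Fraction


def trim_duration(total_seconds, fps=24, frames_to_drop=1):
    """Return the source duration minus a whole number of picture frames."""
    return Fraction(total_seconds) - frames_to_drop * Fraction(1, fps)


# Assumed example: a 10 second source at 24fps.
t_value = trim_duration(10)
print(f"-t {float(t_value):.6f}")

# One frame at 24fps is exactly 1/24 s, so the rounded 0.041 is slightly short.
# At an assumed 48 kHz, the difference in samples is:
gap_samples = (Fraction(1, 24) - Fraction("0.041")) * 48000
print(f"0.041 s is {gap_samples} samples short of one full frame")
```

So subtracting 0.041 rather than exactly 1/24 leaves a small number of samples over the frame boundary, on top of the whole frame of audio that gets cut.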

So in short, this is super close to a solution for QuickTime environments, but still with one big issue. Hmm.

Best
Mark