[MEncoder-users] Sync and Encoding audio: AAC and muxing

Sat May 10 02:11:10 CEST 2008

Nicolas Hesler <nicolas.hesler at sheridanc.on.ca> writes:

> So I did a lot more testing of the audio track.   I grabbed the WAV file 
> and encoded it with command line encoders to AAC using ffmpeg, faac, 
> neroaac.  In the case of ffmpeg I tried encoding to mp3 as well.  In all 
> cases, with no video to worry about, the duration of the encoded audio 
> was always different from the duration of the WAV audio.
>
> Why is this?

While I cannot give you all the technical details, the likely reason
is that those audio compression algorithms are frame-based.

More specifically, if the pulse code modulation (PCM) sample rate is,
e.g., 48000 samples per second, the compression algorithm won't just
take each of those samples and compress them independently. That
wouldn't give much compression. Rather, it takes a chunk (in AAC I
think it's 1024 samples) and compresses that as a unit, resulting in
frames (or blocks) of 1024/48000 seconds each, which is about 21
milliseconds.
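
A minimal sketch of that arithmetic in Python. The 1024-sample frame
size is the guess from above, not something checked against a
particular encoder, and the function name is made up for illustration:

```python
from fractions import Fraction

def frame_duration_ms(samples_per_frame: int, sample_rate: int) -> Fraction:
    """Duration of one compressed audio frame in milliseconds (exact)."""
    return Fraction(samples_per_frame * 1000, sample_rate)

# Assumed values: 1024 samples per AAC frame, 48000 Hz PCM input.
aac_frame_ms = frame_duration_ms(1024, 48000)
print(float(aac_frame_ms))  # roughly 21.33 ms
```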

The result is that, while the video length is bound to be a multiple
of 1/fps, the encoded audio length is a multiple of those 21 ms, and
the two might simply not match up.
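
To make the mismatch concrete, here is a small Python sketch. It
assumes the encoder pads the last partial frame (rather than cutting
it), uses 1024-sample frames at 48000 Hz, and ignores any extra delay
an encoder may add at the start. The helper name is hypothetical:

```python
def padded_audio_samples(num_samples: int, samples_per_frame: int = 1024) -> int:
    """Round a sample count up to a whole number of audio frames."""
    frames = -(-num_samples // samples_per_frame)  # integer ceiling division
    return frames * samples_per_frame

sample_rate = 48000
wav_samples = 10 * sample_rate           # exactly 10 s of PCM audio
encoded_samples = padded_audio_samples(wav_samples)

video_seconds = 250 / 25                 # 250 video frames at 25 fps
audio_seconds = encoded_samples / sample_rate

print(encoded_samples - wav_samples)     # samples of padding added
print(audio_seconds - video_seconds)     # length difference in seconds
```

With these numbers the audio ends up 256 samples (a bit over 5 ms)
longer than both the WAV file and the video.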

To satisfy those constraints, the encoder (I guess) will either cut
the audio slightly short or pad it at the end. None of that affects
sync, of course, because sync only depends on maintaining the rate of
both video and audio during encoding and playback. I.e., if a video
frame is skipped, it might still lead to a sync error on the order of
the PCM sampling resolution, 1/48000 s. But as long as the encoder
keeps track of those errors, they won't accumulate when many frames
are skipped during the encoding process.
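
One way to picture "keeping track" is to derive each timestamp from a
running sample count instead of adding up rounded per-frame durations.
The Python sketch below only illustrates that idea under assumed
values (1024-sample frames, 48000 Hz); it is not a description of
what mencoder does internally, and both function names are made up:

```python
def timestamp_from_count_ms(frame_index: int,
                            samples_per_frame: int = 1024,
                            sample_rate: int = 48000) -> float:
    """Timestamp computed from the total number of samples so far."""
    return frame_index * samples_per_frame * 1000 / sample_rate

def timestamp_from_rounded_steps_ms(frame_index: int,
                                    samples_per_frame: int = 1024,
                                    sample_rate: int = 48000) -> int:
    """Timestamp computed by repeating a per-frame step rounded to whole ms."""
    step_ms = round(samples_per_frame * 1000 / sample_rate)  # 21 instead of 21.33
    return frame_index * step_ms

# After 3000 audio frames (64 s) the rounded variant has drifted a full second.
drift_ms = timestamp_from_count_ms(3000) - timestamp_from_rounded_steps_ms(3000)
print(drift_ms)
```

The first variant never drifts by more than its own rounding error,
while the second one loses a third of a millisecond on every frame.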

That said, I've seen audio streams that mencoder actively adjusted at
the start of the presentation, likely because it considered the audio
input corrupted but, for obvious reasons, wouldn't drop video frames
to keep sync. In my case, that led to start-time delays for the audio
streams, which were recorded by setting a corresponding field in the
AVI header. If you encounter such a case, the remux process we've
discussed so far would of course lose this information, and you would
have to dump the AVI header, read the delay, and then apply it in the
MP4 muxing process:

 MP4Box -add audio.aac:delay=whatever_number

For me that was a rare case, though, and it only happened with TV
captures where the source wasn't that great to start with. While
unrelated to your original question, I thought I'd mention it anyway
for the sake of completeness.