[FFmpeg-devel] One pass volume normalization (ebur128)

Tue Jul 16 11:35:03 CEST 2013

Nicolas George in gmane.comp.video.ffmpeg.devel (Mon, 15 Jul 2013
10:45:36 +0200):
>On septidi 27 Messidor, year CCXXI, Jan Ehrhardt wrote:
>> OK, you've got some points for using volumedetect. The question is if
>> you still get those differences, taken into account that disk speed
>> might be a limiting factor.
>
>I have no idea, and "profile, don't speculate".

That is why I did the test on a 2GB Sony Camcorder file. See the
explanation below.

>It is true that the results of volumedetect go only in ffmpeg's log, but it
>is specifically designed to be easily parsable. We still need a clean
>solution for filters that want to communicate out-of-band information.
>
>Also, calling it "screen output" shows a deep misunderstanding of how things
>work.

No, that is no deep misunderstanding. On an SD card, you really do not
want to use FFmpeg's logs, because they slow things down terribly. For
instance, volume detection on a concatenated stream of two 2GB Sony
Camcorder files takes 17 seconds with no logging and 27 seconds with
logging. Redirecting the screen output (stderr) to a file is 40% faster
than using the log.
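
For what it is worth, the volumedetect lines that end up on stderr can be
picked up with a few lines of script. Below is a minimal Python sketch,
assuming the usual "mean_volume: X dB" and "max_volume: X dB" log lines.
The function name parse_volumedetect and the sample text are made up for
illustration and do not come from a real run.

```python
import re

# Matches lines such as "[Parsed_volumedetect_0 @ 0x...] max_volume: -4.0 dB".
# The prefix varies between runs, so only the "key: value dB" part is matched.
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(log_text):
    """Return the mean_volume and max_volume values (in dB) found in log_text."""
    result = {}
    for key, value in _VOLUME_RE.findall(log_text):
        result[key] = float(value)
    return result


# Hypothetical sample text, shaped like redirected stderr output.
sample = (
    "[Parsed_volumedetect_0 @ 0x1234] n_samples: 1000\n"
    "[Parsed_volumedetect_0 @ 0x1234] mean_volume: -27.0 dB\n"
    "[Parsed_volumedetect_0 @ 0x1234] max_volume: -4.0 dB\n"
)
levels = parse_volumedetect(sample)
```

The input text would typically come from a redirect along the lines of
"ffmpeg -i input -af volumedetect -vn -f null - 2> vol.txt", where the
file names are placeholders.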

>> I did a little test on a 2GB Sony MPEG recording. Transcoding using the
>> R128 input took (on my i5) 178 seconds. Volumedetection on the same file
>> only took 8 seconds and when I applied a volume=-6dB on the source file
>> I was 166 seconds further. Net difference: 8 + 166 = 174 versus 178 is
>> a little bit more than a 2% speed gain.
>
>Since you do not explain what is being measured (I suspect it includes video
>transcoding), this information is mostly useless.

I thought that would be obvious. The Camcorder file of course contains
both video and audio and we want them both. A two-pass transcode using
volumedetect is 2-3% faster than a single-pass transcode using the R128
metadata.
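
To spell out the arithmetic behind that figure, here is a short Python
check that uses only the timings quoted above (the variable names are
mine):

```python
# Timings in seconds, as quoted in the message above.
single_pass = 178   # one-pass transcode using the R128 metadata
detect = 8          # volumedetect pass
transcode = 166     # transcode with a fixed volume=-6dB

two_pass = detect + transcode
# Relative time saved by the two-pass route, in percent.
gain_pct = 100.0 * (single_pass - two_pass) / single_pass
```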

>> I have now expanded that commandline with volume normalization for both
>> the [a0] and [a1] audio tracks. Breaking it up for two pass encoding
>> would in fact mean three pass in that case: (1) voldetect a0, (2)
>> voldetect a1, (3) the transcoding.
>
>I believe you are wrong, volumedetect should be run on the whole
>concatenated audio.

Sometimes you do want volumedetect on concatenated audio, sometimes you
do not. In my case, the first recording is often a short intro on what
the interview with the client will be about. It is very often taken
under different sound conditions than the remainder of the recording(s).
I have seen people recording the introduction inside a car before
entering a house et cetera.

These introductory recordings should be treated separately with respect
to audio normalization. In a single-pass transcode this means minor
changes to the command-line options; in a two-pass transcode it means
breaking up the single command line into at least three steps: (1)
voldetect intro, (2) voldetect remainder, (3) transcoding. That is far
less robust than the single command line.
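
As an illustration of step (3), the per-recording gains found in steps (1)
and (2) could be folded into one filtergraph. This is a rough Python
sketch, not my actual command line: the function name, the [aout] label
and the example gains are assumptions, and video handling is left out.

```python
def build_audio_filtergraph(gains_db):
    """Build a filter_complex string: one volume filter per input, then concat.

    gains_db holds one gain in dB per input file, in input order.
    """
    parts = []
    labels = []
    for i, gain in enumerate(gains_db):
        label = "[a%d]" % i
        # Apply the gain measured for this recording to its audio stream.
        parts.append("[%d:a]volume=%.1fdB%s" % (i, gain, label))
        labels.append(label)
    # Join the adjusted audio streams into a single output stream.
    parts.append("%sconcat=n=%d:v=0:a=1[aout]" % ("".join(labels), len(gains_db)))
    return ";".join(parts)


# Hypothetical gains: -6 dB for the intro, +3.5 dB for the remainder.
graph = build_audio_filtergraph([-6.0, 3.5])
```

The resulting string would be passed to -filter_complex, with [aout]
mapped to the output. The gains still have to be carried over by hand or
by script from the two detection runs, which is the fragile part.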

Jan

PS. With respect to the slowness of writing to SD cards see this patch:
https://github.com/FFmpeg/FFmpeg/commit/f4d9148fe282879b9fcc755767c9c04de9ddbcfa
By simply increasing the copy buffer from 1K to 32K, qt-faststart became
400 times as fast.
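
The idea behind that change is simply to copy in larger chunks, so that
fewer read and write calls hit the slow medium. The sketch below shows the
idea in Python. It is not the qt-faststart code (which is C), and the
function name copy_stream is made up.

```python
import io


def copy_stream(src, dst, buffer_size=32 * 1024):
    """Copy src to dst in chunks of buffer_size bytes; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            # An empty read means end of input.
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for files on the SD card.
src = io.BytesIO(b"x" * 100000)
dst = io.BytesIO()
copied = copy_stream(src, dst)
```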