[FFmpeg-user] Optimal workflow for concatenating (reliably)

Nicolas George george at nsup.org
Thu Apr 10 20:34:11 CEST 2014


On primidi 21 Germinal, year CCXXII, Jan Ehrhardt wrote:
> hdn8 in gmane.comp.video.ffmpeg.user (Thu, 10 Apr 2014 08:07:34 -0700
> (PDT)):
> >Interesting debate over the "1-pass" normalization, I also think that
> >introducing an 'approximated' normalization into the core would be
> >useful in certain scenarios.
> 
> I never really understood the resistance against it. Maybe I stepped on
> somebody's toes.

There is no resistance against a good approximated 1-pass normalization
filter. In fact, there is a need for one, and a desire for it too, so much
so that a reasonably correct one would probably be accepted.

Your proposal was not that, though: it was just wrong. I have already
explained why it is wrong, and I can explain again if necessary.

Using the immediate loudness to normalize will cause audible distortion,
such as background noise becoming suddenly much louder when people stop
talking.
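
A minimal sketch of that effect, assuming "immediate loudness" means a
per-window level in dB (presumably what EBU R 128 calls momentary loudness).
The function name, the -23 dB target and the level values are hypothetical,
chosen only for illustration:

```python
def immediate_gain_db(levels_db, target_db=-23.0):
    """Gain (dB) that brings every window exactly to the target level."""
    return [target_db - level for level in levels_db]

# Hypothetical per-window levels: speech around -20 dB, then a pause in
# which only background noise at -55 dB remains, then speech again.
levels = [-20.0, -21.0, -20.0, -55.0, -55.0, -20.0]

gains = immediate_gain_db(levels)
output = [level + gain for level, gain in zip(levels, gains)]

for level, gain, out in zip(levels, gains, output):
    print(f"in {level:6.1f} dB  gain {gain:+6.1f} dB  out {out:6.1f} dB")
```

During the pause the gain jumps by tens of dB, so the noise floor is played
back as loudly as the speech was.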

The short-term loudness does a bit better on that, but still not well
enough: its reaction to changes is too asymmetrical, very fast at first and
then slower.

The integrated loudness has a completely different issue: its reaction time
is very fast at the beginning and very slow at the end. That means that a
fanfare opening will cause the following content to be much too quiet for a
long time, while a fanfare ending will not be normalized at all.
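
To illustrate, here is a rough sketch that treats the running integrated
loudness as a plain cumulative energy mean. That is a simplification (the
real measurement also applies gating), and the function name and the levels
are hypothetical:

```python
import math

def running_integrated_db(levels_db):
    """Cumulative (ungated) energy mean in dB after each window."""
    total = 0.0
    result = []
    for count, level in enumerate(levels_db, start=1):
        total += 10.0 ** (level / 10.0)   # dB -> linear energy
        result.append(10.0 * math.log10(total / count))
    return result

quiet, loud = -30.0, -10.0
opening = [loud] * 10 + [quiet] * 90   # fanfare first, then quiet content
ending = [quiet] * 90 + [loud] * 10    # quiet content, then fanfare

open_est = running_integrated_db(opening)
end_est = running_integrated_db(ending)

print("fanfare opening, estimate at window 50:", round(open_est[49], 1))
print("fanfare ending, estimate at first loud window:", round(end_est[90], 1))
print("fanfare ending, estimate at last window:", round(end_est[-1], 1))
```

With the fanfare first, the estimate is still far above the -30 dB content
forty windows after the fanfare has ended, so that content is attenuated for
no reason. With the fanfare last, the estimate never gets close to -10 dB,
so the fanfare goes through almost unchanged.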

I do not know why you have not noticed those effects yet in your encodings.
Maybe you are lucky enough to work with content that is already normalized
enough that it does not matter. Or maybe you just did not pay enough
attention. Either way, that does not matter.

The correct way of doing this is to start with the immediate loudness and
apply a de-noising filter on the signal. That requires a certain amount of
look-ahead, but libavfilter is perfectly capable of doing that.

The de-noising is tricky, though: it must remove the random changes, such as
someone stopping talking for half a second for dramatic effect, but the
sharp transitions must remain sharp. Signal processing is not my forte, but
if someone has pointers to a good smoothing algorithm with those properties,
the infrastructure around it is not difficult at all, and I already have
some code.
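
For illustration only, one well-known smoother that removes short outliers
while keeping steps sharp is a centered running median. It is not claimed
here to be good enough for this purpose; the function name, the window
radius and the loudness track below are hypothetical. The centered window
is what makes the look-ahead necessary:

```python
from statistics import median

def median_smooth(values, radius):
    """Centered running median; needs `radius` samples of look-ahead."""
    count = len(values)
    smoothed = []
    for i in range(count):
        lo = max(0, i - radius)
        hi = min(count, i + radius + 1)   # window is clipped at the ends
        smoothed.append(median(values[lo:hi]))
    return smoothed

# Hypothetical loudness track (dB): speech with a two-window pause,
# then a lasting drop to a quiet scene.
track = [-20.0] * 10 + [-50.0] * 2 + [-20.0] * 10 + [-45.0] * 10

smooth = median_smooth(track, radius=3)

print("during the short pause:", smooth[10], smooth[11])
print("around the lasting drop:", smooth[21], smooth[22])
```

In this sketch the two-window pause disappears because it is shorter than
half the window, while the lasting drop stays a one-sample step at the same
position. Whether such a filter behaves well on real loudness curves would
still have to be checked.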

Regards,

-- 
  Nicolas George