[FFmpeg-user] volume normalization: loudnorm versus replaygain; which is better?

Tue Aug 2 18:47:33 EEST 2022

When I record videos with my phone (a Google Pixel 6), the audio track
of those videos tends to be too quiet, especially if I recorded in a
quiet environment.  I want to normalize the volume of those videos
when I play them.

I see basically two ways to do this:

1.  Rewrite the audio track of the video to normalize its loudness.

2.  Calculate the ReplayGain of the audio track of the video and embed
    the ReplayGain information as metadata, so devices that play back
    the video will find the ReplayGain tags and normalize the volume
    for playback.

Accomplishing #1 with ffmpeg is easy:

$ ffmpeg -i too-quiet.mp4 -filter:a loudnorm -vcodec copy better.mp4
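For comparison, here is a sketch of the two-pass form of the same filter. Without prior measurements, loudnorm can only normalize dynamically as it goes. A first pass with `print_format=json` measures the file. A second pass feeds those numbers back through `measured_I`, `measured_TP`, `measured_LRA`, `measured_thresh` and `offset` with `linear=true`, which lets the filter apply one fixed gain when the targets allow it. The values in `TARGETS` are loudnorm's defaults and are only an assumption. The function names are hypothetical. The resampling step assumes loudnorm may output at a higher sample rate than its input.

```shell
#!/bin/sh
# Assumed targets: loudnorm's defaults (integrated, true peak, loudness range).
TARGETS="I=-24:TP=-2:LRA=7"

# Extract one value from the JSON block loudnorm prints with
# print_format=json, e.g. a line such as:  "input_i" : "-27.61",
# $1 = captured ffmpeg stderr, $2 = JSON key
loudnorm_field() {
    printf '%s\n' "$1" | sed -n "s/.*\"$2\" *: *\"\([^\"]*\)\".*/\1/p" | tail -n 1
}

# Build the second-pass filter string from the first-pass measurements.
loudnorm_pass2_filter() {
    printf 'loudnorm=%s:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:linear=true\n' \
        "$TARGETS" \
        "$(loudnorm_field "$1" input_i)" \
        "$(loudnorm_field "$1" input_tp)" \
        "$(loudnorm_field "$1" input_lra)" \
        "$(loudnorm_field "$1" input_thresh)" \
        "$(loudnorm_field "$1" target_offset)"
}

# Hypothetical wrapper: analyse, then re-encode only the audio.
loudnorm_two_pass() {
    src=$1 dst=$2
    # Pass 1: measure only; "-f null -" discards the decoded output.
    stats=$(ffmpeg -hide_banner -nostdin -i "$src" -vn \
        -af "loudnorm=$TARGETS:print_format=json" -f null - 2>&1) || return 1
    # Read the source sample rate so the output can be resampled back to it.
    rate=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=sample_rate -of default=nw=1:nk=1 "$src") || return 1
    # Pass 2: copy video, normalize and re-encode audio at the source rate.
    ffmpeg -hide_banner -nostdin -i "$src" -c:v copy \
        -af "$(loudnorm_pass2_filter "$stats")" -ar "$rate" "$dst"
}
```

Usage would be `loudnorm_two_pass too-quiet.mp4 better.mp4`. It still re-encodes the audio, so the generation-loss concern applies to it just as much as to the one-pass command.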

But: is this the best approach?

I ask because in the audio world, the preferred way to handle songs
of varying loudness (e.g., if you are ripping old CDs) is to apply
ReplayGain tags.  This leaves the original audio intact, and just
applies a volume correction at playback.  Most music players
recognize and obey ReplayGain tags.

To me, it feels like the ReplayGain strategy is better than rewriting
the original audio, for at least two reasons:

1. Re-encoding the audio track is going to introduce additional loss.

2. If the loudnorm filter is improved later, I can’t go back and
   re-apply the new-and-improved version unless I keep a copy of the
   original (unmodified) video, because applying it the first time
   destroyed the original audio track.

If I wanted to use the ReplayGain approach, the calculation part is
easy:

$ ffmpeg -i too-quiet.mp4 -vn -af replaygain -f null -
…
[Parsed_replaygain_0 @ 0x55dc42d5c940] track_gain = +45.88 dB
[Parsed_replaygain_0 @ 0x55dc42d5c940] track_peak = 0.021767
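ReplayGain values are in dB, so a player applies them as an amplitude factor of 10^(gain/20). Multiplying the reported peak by that factor shows whether full gain would push the signal past full scale (1.0). A small sketch that reads the two log lines on stdin; the helper name `rg_check` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: parse the replaygain filter's track_gain/track_peak
# log lines and show what applying the full gain would do to the peak.
# Usage:
#   ffmpeg -hide_banner -i too-quiet.mp4 -vn -af replaygain -f null - 2>&1 | rg_check
rg_check() {
    awk '
        /track_gain =/ { g = $(NF - 1); sub(/^\+/, "", g); gain = g + 0 }
        /track_peak =/ { peak = $NF + 0 }
        END {
            factor = exp(gain * log(10) / 20)           # dB -> linear amplitude
            printf "linear_factor=%.2f\n", factor
            printf "peak_after_gain=%.2f\n", peak * factor  # > 1.0 means clipping
            if (peak > 0)                                # gain that puts peak at 1.0
                printf "max_clip_free_gain_db=%.2f\n", -20 * log(peak) / log(10)
        }'
}
```

Fed the two log lines above, it reports a factor of about 196.79, a post-gain peak of about 4.28 (roughly 12.6 dB over full scale), and a clip-free ceiling of about 33.24 dB. A player honoring the full +45.88 dB would therefore clip unless it uses the stored peak for clipping prevention, which ReplayGain-aware players typically offer as an option.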

But it’s not clear to me from the ffmpeg documentation how I would
actually *add* the ReplayGain metadata to the audio track of the video
once I have calculated it.
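One possible way, sketched under assumptions: ffmpeg's `-metadata:s:a:0 key=value` option attaches a tag to the first audio stream, so the values can be written under the REPLAYGAIN_TRACK_GAIN and REPLAYGAIN_TRACK_PEAK names used for music files. Whether a video player looks for those same names is an assumption. The sketch writes Matroska (`.mkv`) because that container stores arbitrary per-track tags. Whether MP4 would keep these custom keys, and whether any player reads them, is not established here. `rg_tag` is a hypothetical helper. The `FFMPEG` variable exists only so the command can be inspected without running it.

```shell
#!/bin/sh
# Hypothetical helper: copy video and audio unchanged into Matroska and
# tag the first audio stream with ReplayGain values.
# $1 = input, $2 = output (.mkv), $3 = gain (e.g. "+45.88 dB"), $4 = peak
rg_tag() {
    src=$1 dst=$2 gain=$3 peak=$4
    "${FFMPEG:-ffmpeg}" -hide_banner -i "$src" \
        -map 0:v -map 0:a -c copy \
        -metadata:s:a:0 "REPLAYGAIN_TRACK_GAIN=$gain" \
        -metadata:s:a:0 "REPLAYGAIN_TRACK_PEAK=$peak" \
        "$dst"
}

# To check which tags actually ended up in the file:
#   ffprobe -v error -select_streams a:0 -show_entries stream_tags -of default=nw=1 tagged.mkv
```

For example, `rg_tag too-quiet.mp4 tagged.mkv "+45.88 dB" 0.021767`. Because `-c copy` is used, neither the video nor the audio is re-encoded, which is the point of the metadata approach.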

And it’s also not clear to me whether this approach would actually
work in the real world.  If most video playback devices (e.g.
smartphones) and software (e.g. social media sites) ignore ReplayGain
metadata tags in the audio tracks of videos, then while this approach
might be the best from an audio purist perspective, it won’t have the
practical effect of making a quiet video play back at a normalized
volume.

I can’t be the only person to have encountered this situation.  Is
there currently a “best practice” for audio track loudness
normalization in the video world?  Will using the ReplayGain metadata
work?  Or should I just give up and use the loudnorm filter for now,
and save the original videos so that when ReplayGain metadata is
better supported, I can go back and just add ReplayGain metadata to
the original videos and discard the versions that I created with the
loudnorm filter?

Thanks in advance for any advice!