[FFmpeg-devel] [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of illegal

Wed Jan 8 17:10:16 EET 2020

On Wed, Jan 08, 2020 at 16:32:39 +0900, Ted Lee wrote:
> Dear FFmpeg developers,
> I'm glad to have a chance to contribute to FFmpeg.

Welcome.

> *Summary:*
>
>    - Test signal: 790_pomnyun_taebaek2.mp3
>       - MP3 file with the same left channel and right channel except for
>       the first 7 seconds and the last 10 seconds.
>       (download:
>       <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0>
>       https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0
>       )
>    - When decoding the test signal using FFmpeg as below, a difference
>    occurs in the same part of L and R.
>       - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le
>       790_pomnyun_taebaek2.wav
>
>
> Expected result (Adobe Audition or Core Audio):
>
> Decoded PCM
> [image: image.png]
> (L-R)
> [image: image.png]
> (You can reproduce it using Adobe Audition or CoreAudio of MacOS)
>
> Actual result (FFmpeg):
>
> Decoded PCM
> [image: image.png]
>
> (L-R)
> [image: image.png]

To put it in other words:
You expect the monoaural section of this file to be decoded with
identical left and right channels. ffmpeg does not achieve this. In
fact, there seem to be ~50 dB too much "noise".

[See mean/max_volume results below, before and after subtraction.]

Original ffmpeg:
barsnick at sunshine:/usr/new/tools/video/ffmpeg/ffmpeg-snapshot-2020-01-05 > ../ffmpeg-build-2020-01-05/ffmpeg_g -ss 10 -t 10:00 -i ~/tmp/790_pomnyun_taebaek2.mp3 -af volumedetect,"pan=mono|c0=c0-c1",volumedetect -f null -
[...]
[Parsed_volumedetect_0 @ 0xa5bb380] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xa5bb380] mean_volume: -25.1 dB
[Parsed_volumedetect_0 @ 0xa5bb380] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_0db: 594
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_1db: 671
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_2db: 1033
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_3db: 1794
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_4db: 3067
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_5db: 4940
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_6db: 7871
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_7db: 12062
[Parsed_volumedetect_2 @ 0xa5c3b40] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xa5c3b40] mean_volume: -41.2 dB
[Parsed_volumedetect_2 @ 0xa5c3b40] max_volume: -15.5 dB
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_15db: 29
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_16db: 154
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_17db: 159
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_18db: 308
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_19db: 627
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_20db: 1130
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_21db: 1932
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_22db: 3433
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_23db: 5099
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_24db: 8002

I can at least confirm that other decoders achieve a better equality.
Using mpg321 with libmad-0.15.1b (to convert to wav and analyze with
ffmpeg), I get:

[Parsed_volumedetect_0 @ 0xbe50d00] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xbe50d00] mean_volume: -26.9 dB
[Parsed_volumedetect_0 @ 0xbe50d00] max_volume: -0.1 dB
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_0db: 186
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_1db: 310
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_2db: 556
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_3db: 866
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_4db: 1354
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_5db: 2158
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_6db: 3680
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_7db: 6260
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_8db: 8990
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_9db: 13236
[Parsed_volumedetect_2 @ 0xbe4c180] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xbe4c180] mean_volume: -85.5 dB
[Parsed_volumedetect_2 @ 0xbe4c180] max_volume: -62.7 dB
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_62db: 1
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_63db: 9
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_64db: 13
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_65db: 26
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_66db: 37
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_67db: 35
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_68db: 68
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_69db: 54
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_70db: 73
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_71db: 86
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_72db: 118
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_73db: 340
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_74db: 9490
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_75db: 0
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_76db: 105618

With your patch, ffmpeg achieves this (very similar for both mp3 and
mp3float), which is even closer to identity:

[Parsed_volumedetect_0 @ 0xa6e3380] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xa6e3380] mean_volume: -26.9 dB
[Parsed_volumedetect_0 @ 0xa6e3380] max_volume: -0.1 dB
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_0db: 186
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_1db: 310
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_2db: 556
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_3db: 866
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_4db: 1354
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_5db: 2158
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_6db: 3680
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_7db: 6258
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_8db: 8990
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_9db: 13236
[Parsed_volumedetect_2 @ 0xa6ebb40] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xa6ebb40] mean_volume: -91.0 dB
[Parsed_volumedetect_2 @ 0xa6ebb40] max_volume: -59.4 dB
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_59db: 5
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_60db: 3
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_61db: 1
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_62db: 19
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_63db: 29
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_64db: 34
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_65db: 65
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_66db: 91
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_67db: 53
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_68db: 136
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_69db: 74
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_70db: 107
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_71db: 120
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_72db: 139
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_73db: 148
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_74db: 230
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_75db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_76db: 283
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_77db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_78db: 336
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_79db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_80db: 491
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_81db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_82db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_83db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_84db: 988
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_85db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_86db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_87db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_88db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_89db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_90db: 2788
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_91db: 13223860

I can't judge whether what you are doing is correct, but I can confirm
the observations.

> >From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001
> From: Ted <t at gaudiolab.com>
> Date: Wed, 8 Jan 2020 11:53:56 +0900
> Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of
> illegal
>  intensity stereo position for MPEG-2 LSF

The patch is corrupted by newlines, inserted by your mailer. I managed
to fix that manually, but to ease review, please attach a patch created
by "git format-patch" to your e-mail (as an attachment), or use "git
send-email".

> Signed-off-by: Ted <t at gaudiolab.com>

You may want to provide a full real name, as this is what is registered
forever in the repositories.

> ---
>  libavcodec/mpegaudiodec_template.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

Since you are changing the decoder, you must also make sure that fate
testsuite does not detect changed results. Or if it does and such
changes are expected, the fate tests need to be adapted. (If you can't
do this, others may assist you, at least with checking. It's a bit more
of a hassle.)

It might also be worth adding such an MPEG-2 LSF sample to fate, if
those in fate don't cover this yet. (The MP§ decoder source file does
have a big TODO "test lsf / mpeg25 extensively".)

Cheers,
Moritz