[FFmpeg-devel] [PATCH 2/3] aacenc: Remove threshold-in-quiet modification from the 3GPP psymodel.

Tue Oct 26 11:22:30 CEST 2010

On Mon, Oct 25, 2010 at 5:22 PM, Nathan Caldwell <saintdev at gmail.com> wrote:
> On Mon, Oct 25, 2010 at 5:30 PM, Nathan Caldwell <saintdev at gmail.com> wrote:
>> Removing this line vastly improves quality at a slight bitrate cost
>> for some samples. castanets.wav is a good example.
>>
>> The closest equivalent I see to this in the 3GPP spec is a similar
>> modification (over a specific frequency range) when TNS is used.
>

The spec says:

thr_q[n] = max(thr_spr, thr_quiet)
thr[n] = max(rpmin*thr_q[n], min(thr_q[n], rplev*thr_q,-1[n]))
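
A minimal sketch of those two formulas in C, to make the clamping explicit. The function names, the helper names, and the constant values below are assumptions chosen for illustration; they are not taken from the spec text or from the patch.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
#define RPMIN  0.01f
#define RPELEV 2.0f

static inline float maxf(float a, float b) { return a > b ? a : b; }
static inline float minf(float a, float b) { return a < b ? a : b; }

/* thr_q[n] = max(thr_spr, thr_quiet) */
float thr_with_quiet(float thr_spr, float thr_quiet)
{
    return maxf(thr_spr, thr_quiet);
}

/* thr[n] = max(rpmin*thr_q[n], min(thr_q[n], rplev*thr_q,-1[n]))
 * prev_thr_q stands for the previous block's value in the same band. */
float thr_smoothed(float thr_q, float prev_thr_q)
{
    assert(thr_q >= 0.0f && prev_thr_q >= 0.0f);
    return maxf(RPMIN * thr_q, minf(thr_q, RPELEV * prev_thr_q));
}
```

With these assumed constants, a threshold of 100 following a previous value of 10 is capped at 20, while a previous value of 80 leaves it at 100.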

With the patch we have:

421         for (g = 0; g < num_bands; g++) {
422             band[g].thr_quiet = FFMAX(band[g].thr, coeffs->ath[g]);
423             if (wi->num_windows != 8 && wi->window_type[1] != EIGHT_SHORT_SEQUENCE)
424                 band[g].thr_quiet = FFMAX(PSY_3GPP_RPEMIN*band[g].thr_quiet,
425                                           FFMIN(band[g].thr_quiet,
426                                                 PSY_3GPP_RPELEV*pch->prev_band[w+g].thr_quiet));
427             band[g].thr = FFMAX(band[g].thr, band[g].thr_quiet);
428
429             ctx->psy_bands[channel*PSY_MAX_BANDS+w+g].threshold = band[g].thr;
430         }

> This patch is actually wrong.
>
> Really, only the modification needs removed. We still need to take
> MAX(thr, thr_quiet). Fixed patch attached.
>

It seems that we are smoothing against the already smoothed (thr[n]) values
rather than the input values (thr_q[n]).
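
To illustrate why the choice of reference matters, here is a rough sketch in C that runs the same limiter over a sequence of frames for one band, once against the previous input value and once against the previous smoothed value. All names and the constant values are assumptions made for this example, and the first frame simply uses its own value as history.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define RPMIN  0.01f
#define RPELEV 2.0f

static inline float maxf(float a, float b) { return a > b ? a : b; }
static inline float minf(float a, float b) { return a < b ? a : b; }

/* One limiting step: clamp cur between RPMIN*cur and RPELEV*prev_ref. */
static inline float limit(float cur, float prev_ref)
{
    return maxf(RPMIN * cur, minf(cur, RPELEV * prev_ref));
}

/* Reference is the previous frame's unsmoothed input, thr_q. */
void smooth_against_input(const float *thr_q, float *thr, size_t n)
{
    assert(thr_q != NULL && thr != NULL);
    for (size_t i = 0; i < n; i++) {
        float prev = i ? thr_q[i - 1] : thr_q[0]; /* no history on frame 0 */
        thr[i] = limit(thr_q[i], prev);
    }
}

/* Reference is the previous frame's already smoothed output, thr. */
void smooth_against_smoothed(const float *thr_q, float *thr, size_t n)
{
    assert(thr_q != NULL && thr != NULL);
    for (size_t i = 0; i < n; i++) {
        float prev = i ? thr[i - 1] : thr_q[0];   /* no history on frame 0 */
        thr[i] = limit(thr_q[i], prev);
    }
}
```

For the input sequence {1, 100, 100, 100}, the first variant gives {1, 2, 100, 100}, while the second gives {1, 2, 4, 8}: when the reference is the smoothed value, the threshold can only grow by a factor of RPELEV per frame.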

What part of the spec does line 427 map to?

To me it seems like we want something along the lines of...

        for (g = 0; g < num_bands; g++) {
            band[g].thr_quiet = FFMAX(band[g].thr, coeffs->ath[g]);
            if (wi->num_windows != 8 && wi->window_type[1] != EIGHT_SHORT_SEQUENCE)
                /* smoothing on: thr[n] = SMOOTHED(thr_q[n]) */
                band[g].thr = FFMAX(PSY_3GPP_RPEMIN*band[g].thr_quiet,
                                    FFMIN(band[g].thr_quiet,
                                          PSY_3GPP_RPELEV*pch->prev_band[w+g].thr_quiet));
            else
                /* smoothing off: thr[n] = thr_q[n] */
                band[g].thr = band[g].thr_quiet;

            ctx->psy_bands[channel*PSY_MAX_BANDS+w+g].threshold = band[g].thr;
        }

This is based on the assumption that if smoothing is off, thr[n] = thr_q[n],
seeing as when smoothing is on, thr[n] = SMOOTHED(thr_q[n]).

Am I missing something?