[FFmpeg-cvslog] r18860 - in trunk/libavcodec: ac3dec.c ac3dec_data.c ac3dec_data.h eac3dec.c

Justin Ruggles justin.ruggles
Thu May 21 00:35:55 CEST 2009


Michael Niedermayer wrote:
> On Tue, May 19, 2009 at 10:13:34PM -0400, Justin Ruggles wrote:
>> Michael Niedermayer wrote:
>>> On Tue, May 19, 2009 at 06:23:38PM -0400, Justin Ruggles wrote:
>>>> Michael Niedermayer wrote:
>>>>> On Sun, May 17, 2009 at 08:53:27AM +0200, jbr wrote:
>>>>>> Author: jbr
>>>>>> Date: Sun May 17 08:53:24 2009
>>>>>> New Revision: 18860
>>>>>>
>>>>>> Log:
>>>>>> eac3dec: use 16-bit pre-mantissas instead of 24-bit in AHT decoding. it is
>>>>>> simpler and also fixes a bug in GAQ dequantization.
>>>>> are you sure 16-bit precision is enough for that?
>>>>> what effect does this have on the PSNR compared to double-precision
>>>>> floats?
>>>> stddev:    7.95 PSNR: 78.30 bytes:  3999744/  3999744
>>> and what is that?
>>>
>>> we have raw -> encoder -> ac3 -> double fp decoder -> raw2
>>>                               -> 16bit     decoder -> raw3
>>>
>>> PSNR of raw2 - raw vs. PSNR raw3 - raw is what we care about.
>>> That would make 2 PSNR scores though, a before and an after.
>>>
>>> The second part we care about is the difference to a reference decoder;
>>> that comparison should mention which reference decoder is used, though.
>> I can't compare to raw in this case because I don't have access to a
>> professional E-AC-3 encoder which uses the AHT quantization.  But I can
>> compare to decoded output from Nero's E-AC-3 decoder.
>>
> 
>> nero vs. ffac3 svn
>> stddev:  131.16 PSNR: 53.96 bytes:  3998716/  3999744
> 
> why does the output differ so much to begin with?
> 
> 
>> nero vs. ffac3 w/ doubles for AHT dequant & idct
>> stddev:  131.30 PSNR: 53.95 bytes:  3998716/  3999744
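(For reference: figures like these stddev/PSNR lines come from comparing
two raw signed 16-bit PCM files sample by sample, in the spirit of
FFmpeg's tests/tiny_psnr.  The sketch below only illustrates the idea and
is not the actual tool; the file handling, the 65536 full-scale constant
and the little-endian read are my own simplifications.)

/* sketch: compare two raw s16le files and print stddev/PSNR
 * (assumes a little-endian host; not the actual tests/tiny_psnr) */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f[2];
    long size[2];
    double sse = 0.0;
    long n = 0;
    int16_t s[2];
    int i;

    if (argc < 3) {
        fprintf(stderr, "usage: %s ref.raw test.raw\n", argv[0]);
        return 1;
    }
    for (i = 0; i < 2; i++) {
        f[i] = fopen(argv[i + 1], "rb");
        if (!f[i])
            return 1;
        fseek(f[i], 0, SEEK_END);
        size[i] = ftell(f[i]);
        fseek(f[i], 0, SEEK_SET);
    }

    /* accumulate the squared error over the samples both files contain */
    while (fread(&s[0], 2, 1, f[0]) == 1 && fread(&s[1], 2, 1, f[1]) == 1) {
        double d = (double)s[0] - (double)s[1];
        sse += d * d;
        n++;
    }

    if (n) {
        double stddev = sqrt(sse / n);
        double psnr   = stddev ? 20.0 * log10(65536.0 / stddev) : 999.99;
        printf("stddev: %7.2f PSNR: %5.2f bytes: %ld/ %ld\n",
               stddev, psnr, size[0], size[1]);
    }
    return 0;
}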
> 
> what percentage of the file uses AHT? I assume not every sample
> is pushed through it.

I haven't measured it, but I would guess most of them.  Enough that a
previous bug in AHT was very audible in almost all parts of the audio
clip.  I don't think all files which have the feature use it as much
though.  I didn't hear any problems with the other 3 AHT samples we had
before this latest batch.

> 
>>>> I think 16-bit is enough.
>>> as a general rule of thumb, calculations in the frequency domain need
>>> an additional log2(transform length)/2 bits.
>>> I think the transform size is 512, which means that a minimum of 20 bits
>>> is needed for all calculations that work on coefficients.
>>> That is, if the output should be 16 bits instead of 12.
>>>
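Taking that rule of thumb at face value, the arithmetic (just the numbers,
not a claim about what the E-AC-3 spec itself requires) works out to:

\[
\tfrac{1}{2}\log_2 512 = 4.5
\qquad\Rightarrow\qquad
16 + 4.5 \approx 20.5 \text{, i.e. roughly 20--21 bits internally.}
\]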
>>> And independent of that, decoders must be tested when changes are made
>>> that lead to different output; that is especially true if you throw
>>> a huge amount (8 bits) of precision away.
>>>
>>> The idea is not to commit and hope someone notices it.
>>>
>>> and about stddev/PSNR: if you output 16 bits and the low 1-3 bits are
>>> just random, your code is totally broken and should be fixed, removed or
>>> replaced.  It's not that a stddev of 8 is a little off ...
>>> That's just an amount at which I wouldn't want to touch the decoder with
>>> a 10-foot/3-meter pole. We live in a world where people complain about
>>> 16 bits not being enough; chopping the precision down to 12 is definitely
>>> not OK.
>>>
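For a sense of scale, assuming the quoted PSNR is 20*log10 of the 16-bit
full scale over the stddev (which reproduces the 78.30 figure above):

\[
\mathrm{PSNR} = 20\log_{10}\frac{65536}{\sigma}, \qquad
\sigma = 7.95 \approx 2^{3} \;\Rightarrow\; \mathrm{PSNR} \approx 78.3\ \mathrm{dB}
\]

so the 16-bit path differs from the double-precision path by roughly the
lowest 3 bits of each 16-bit sample, i.e. only about 13 bits agree.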
>>> Also, to repeat it again: please test, test and test!
>>> And maybe the AC-3 spec contains some rules about minimum precision for
>>> compliant decoders; if not, please look at MP3 and use its compliance
>>> requirement as a minimum.
>> In the change I made, there are 2 places where precision is lost.
>>
>> One is in the random mantissas.  In that case, the encoder thought it
>> was not worth giving those any bits at all, so I don't see this as a
>> problem.
> 
> what?
> your change is not a problem because an encoder decided not to use the
> feature???

I think you misunderstood my point, which I likely did not explain well
enough.  Random noise is only generated in the decoder for mantissas
that the AC-3 bit allocation model judged insignificant enough, in terms
of the audible effects of bit reduction, that they were not allocated
any bits at all.  It doesn't have anything to do with features.
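To make that concrete, here is a rough sketch of what the zero-bit case
in a mantissa loop looks like.  The function names and the noise
generator are made up for illustration; this is not the actual ac3dec.c
code.

#include <stdint.h>

/* simple LCG standing in for the real dither word generator */
static int16_t dither_int16(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (int16_t)(*state >> 16);
}

/* Fill the mantissas for one channel.  coded[] holds the mantissas that
 * were actually read from the bitstream and dequantized; a coefficient
 * with bap == 0 got no bits from the encoder, so the decoder may
 * substitute low-level pseudo-random noise for it when dithering is
 * enabled for the channel. */
static void fill_mantissas(const uint8_t *bap, const int16_t *coded,
                           int nb_coefs, int dither_flag,
                           uint32_t *dither_state, int16_t *mant)
{
    for (int i = 0; i < nb_coefs; i++) {
        if (bap[i] == 0)
            mant[i] = dither_flag ? dither_int16(dither_state) : 0;
        else
            mant[i] = coded[i];
    }
}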

> 
>> The other is in the 6-point IDCT, which splits a single "pre-mantissa"
>> value into 6 mantissas, one for each of the 6 AC-3 blocks.  The result
>> is shifted to 24-bit before combining with exponents.
>>
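(For reference, the 6-point IDCT in question is essentially a fixed-point
DCT-III of length 6, roughly as sketched below.  The cosine table, the
scaling and the names are illustrative, not the actual eac3dec.c code;
the point under discussion is only how wide the pre-mantissas and the
intermediate sum need to be.)

#include <stdint.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 6

static int32_t idct_cos[N][N]; /* cosine table in Q15 */

static void idct6_init(void)
{
    for (int n = 0; n < N; n++)
        for (int k = 0; k < N; k++)
            idct_cos[n][k] = (int32_t)lrint(
                cos(M_PI * k * (2 * n + 1) / (2.0 * N)) * 32768.0);
}

/* pre_mant: the 16-bit pre-mantissas for one frequency bin,
 * mant:     one output mantissa per AC-3 block, left at roughly
 *           24-bit scale before the exponents are applied */
static void idct6(const int16_t *pre_mant, int32_t *mant)
{
    for (int n = 0; n < N; n++) {
        int64_t sum = (int64_t)pre_mant[0] << 14; /* DC term counts half */
        for (int k = 1; k < N; k++)
            sum += (int64_t)pre_mant[k] * idct_cos[n][k];
        /* Q15 product of a 16-bit input; >>7 leaves ~24 significant bits */
        mant[n] = (int32_t)(sum >> 7);
    }
}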
> 
>> The AC-3 spec does not give requirements regarding precision.  The only
>> clues are in the precision of the tables, and occasionally some vague
>> wording in the text.
> 
> Leaving precision undefined, while surely annoying, does not mean that
> you can use arbitrarily poor precision.
> Moreover, the listener will not say: "ooh, it sounds worse than my 8-bit
> 8 kHz original SoundBlaster, but hey, the spec has no precision
> requirement, so I am happy with it."
> 
> Thus:
> 1. If a spec has a requirement on precision, that should be followed.
> 2. Independent of 1, precision should at least be sufficient to not cause
>    _any_ audible artifacts with high-end equipment.  (There may be inherent
>    artifacts in a codec independent of the precision used, but none should
>    be added through the use of insufficient precision.)
> 
> Now, as testing for lack of "audible artifacts" in all files is not easy,
> simply using the required precision of a similar, better-worded spec, in
> this case MP3, seems like a good compromise.
> 
> Also, leaving precision unspecified might mean that the author had a
> float/double based decoder in mind (though of course it's bad if this
> is not explicitly stated).
> If so, this does not mean that an arbitrarily small number of bits is
> sufficient for a fixed-point implementation of such a float/double decoder.

I completely understand the point with regard to comparable functions
such as the full-block MDCT, but as far as I know, MP3 does not have
anything like this secondary DCT, which in effect encodes the frequency
content of the frequency coefficients themselves.  Given that, I do
understand that it needs more testing, since the effects of precision
reduction in this transform on the audibility of artifacts are not
documented publicly.

>>  For the IDCT, the spec just says, "Any fast
>> technique may be used to invert the DCT in Enhanced AC-3 decoders."
> 
> Yes, it says "any fast", not "any approximate".
> Fast may be correlated with inaccurate in many algorithms, but the spec
> says fast, not approximate.

Yes, I understand the difference.  I was just pointing out that it
doesn't say anything about the accuracy.

>>  The
>> tables for pre-mantissas, which are the input to the IDCT, are all
>> 16-bit.  The output is treated like normal AC-3 mantissas, which are
>> also only 16-bit.  I didn't see the point of shifting the 16-bit
>> pre-mantissas up to 24 bits just for the 6-point IDCT.
> 
> I think you should read about (or experiment with) rounding errors and
> precision, as it does not seem you are listening to me.
>
> Or, for that matter, you could check how many bits the IDCTs for
> video use internally compared to their input and output.
> Or you could try an int16_t MDCT and listen to how that sounds (which,
> based on your reasoning, should sound perfect...).

Our AC-3 encoder does an int16_t MDCT...  But I do get your point.

I will change the IDCT back to 24-bit soon and do more testing of
accuracy, audibility of artifacts, and speed before reconsidering using
16-bit.

Thanks,
Justin



