[FFmpeg-devel] (i)(m)dct help for g.722.1

Thu May 17 01:10:20 CEST 2007

Ramiro Ribeiro Polla wrote:
> Ramiro Ribeiro Polla wrote:
> 
>>Hello,
>>
>>Justin Ruggles wrote:
>>  
>>
>>>Ramiro Ribeiro Polla wrote:
>>>  
>>>    
>>>
>>>>Hello,
>>>>
>>>>When I got to the dct part of g.722.1 (which is currently in 
>>>>cook.c:687), I realized it doesn't use the same transform as cook (at 
>>>>least I think so). After the window, overlap, and add, the output is 
>>>>faster than it should be, making the last half of the 320 samples look 
>>>>like a fade-out.
>>>>
>>>>I wanted to know if FFmpeg already has the same transform used in 
>>>>g.722.1, or if I'm just using it the wrong way.
>>>>
>>>>The standard says it's a "type IV DCT":
>>>>u(n) = \sum_{m=0}^{319} \sqrt{\frac{2}{320}} cos( \frac{\pi}{320} 
>>>>(m+0.5)(n+0.5) ) mlt(m)
>>>>for 0 \leq n \leq 319
>>>>
>>>>Or else, how would I go about implementing it?
>>>>    
>>>>      
>>>
>>>I suspect Benjamin's advice is correct, but in case not, here is an
>>>example of how a type iv dct is related to a standard mdct.  See the
>>>mdct_512() function (and ignore mdct_256() which is an AC-3 oddity).
>>>Also, this example is for the forward transform, but the same concept
>>>applies to the inverse.
>>>
>>>http://aften.svn.sourceforge.net/viewvc/*checkout*/aften/libaften/mdct.c?revision=98
>>>
>>>  
>>>    
>>
>>Thanks for the tips. I tried what Benjamin suggested, and it seems the 
>>problem is because g.722.1 uses 320 samples, and the mdcts in FFmpeg 
>>work only with powers of 2. Cook uses powers of 2. Vorbis also supports 
>>only powers of 2.
>>
>>Any thoughts on a solution besides implementing my own dct? Can the mdct 
>>and fft functions be modified to accept any number of samples?
>>
>>  
> 
> Ok, after reading more on the subject, it seems I'd have to implement 
> another fft algorithm.
> 
> Questions to the math gurus:
> What's the best alternative FFT algorithm in terms of speed and accuracy 
> that supports arbitraty sizes?
> Is there a way to use the current code with some additional math 
> (convolution, re-sampling) and produce accurate results?

Another option would be an MDCT or DCT-IV implementation without FFT.
One way to do this is to recursively split into DCT-III and DCT-II.  I
don't know for sure, but I think this might be how libvorbis does it.
It at least uses a recursive algorithm with lots of butterfly ops, which
is what is done in the DCT-II/III implementation.  I have a pdf document
which covers this method if you're interested.

As far as an arbitrary-size FFT, I would suggest taking a look at FFTW.
 The code is GPL, so using it is off-limits, but the documentation and
source comments might give you some ideas to start with.

-Justin