[FFmpeg-devel] [rfc] qualification task: SSE2 IDCT

Pascal Massimino pascal.massimino
Wed Apr 2 11:01:18 CEST 2008

On Sun, Mar 30, 2008 at 4:53 PM, Michael Niedermayer <michaelni at gmx.at> wrote:

> On Sun, Mar 30, 2008 at 02:25:15PM +0100, Balatoni Denes wrote:
> [...]
> > Also if Alexander gets skal to donate his code under LGPL, will it satisfy the
> > qualification task requirement, as skal's idct iirc is in fact very similar
> > to app note 922/945 ? :)
> Is skal's idct faster/slower/same speed/unknown relative to ap* ?
> My original idea for this qualification was to have an AP922/945 SSE2 idct
> which combines all optimizations from all existing such IDCTs. So the
> question about any single one being ok is not answerable. The code has to
> be compared to see if there are any further improvements possible.
> Also the output must be binary identical to an existing IDCT
> (to minimize the issues with idct drift between the ever growing number
> of idcts)
> It's a little mystery to me why alex apparently thought this was an easy
> task.
> Now I am perfectly fine with a simple SSE2 idct one. This would at least
> skip the binary identical output problem and the work for me comparing
> it against other IDCTs. OTOH it's harder as there is no existing SSE2 code
> to base one's work on ...
> There's also AMD who have promised to implement 2 things for us,
> they have been working (for months) on an SSE float aan dct. I think they might
> be happy if the second task would be an AP945/922 SSE2 IDCT as they already
> have some code for that.

 I think it's important not to introduce a "new" idct with a different
 error landscape than the ones already around (even if it is IEEE-1180
 compliant). We already have the famous Walken-idct and the
 simple-idct. A new one would cause another round of idct-mismatch
 problems (that's why there's only my fdct in xvid, for instance, and not
 the idct).  This being said, as far as I recall, you can turn it
 into a Walken-exact (bitwise) idct by using the following rounding
 constants as replacements:

  Idct_Rnd0: dd  65536, 65536
  Idct_Rnd1: dd   3597,  3597
  Idct_Rnd2: dd   2260,  2260
  Idct_Rnd3: dd   1203,  1203
  Idct_Rnd4: dd      0,     0
  Idct_Rnd5: dd    120,   120
  Idct_Rnd6: dd    512,   512
  Idct_Rnd7: dd    512,   512

 and of course:
  Idct_Sparse_Rnd0: times 4  dw  (65536>>11)
  Idct_Sparse_Rnd1: times 4  dw  ( 3597>>11)
  Idct_Sparse_Rnd2: times 4  dw  ( 2260>>11)
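
 As a rough cross-check of the sparse constants above, here is a small C
 sketch that derives them from the full-precision rounders. The names
 idct_rnd, SPARSE_SHIFT and sparse_rnd are made up for this illustration and
 are not taken from any existing source; the shift of 11 is simply read off
 the ">>11" in the table, and treating it as the precision gap between the
 32-bit and 16-bit paths is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The eight 32-bit row rounders exactly as listed in the table above. */
const int32_t idct_rnd[8] = { 65536, 3597, 2260, 1203, 0, 120, 512, 512 };

/* ASSUMPTION: the sparse path keeps 11 fewer fractional bits than the
 * full path, which is how the ">>11" in the table is interpreted here. */
enum { SPARSE_SHIFT = 11 };

/* Hypothetical helper: scale a 32-bit rounder down to its 16-bit
 * counterpart, mirroring the (x>>11) expressions in the dw lines. */
int16_t sparse_rnd(int32_t rnd)
{
    return (int16_t)(rnd >> SPARSE_SHIFT);
}

/* Self-check: only rows 0..2 have sparse variants in the table. */
void check_sparse_rounders(void)
{
    assert(sparse_rnd(idct_rnd[0]) == 32); /* 65536 / 2048        */
    assert(sparse_rnd(idct_rnd[1]) == 1);  /*  3597 / 2048 = 1.75 */
    assert(sparse_rnd(idct_rnd[2]) == 1);  /*  2260 / 2048 = 1.10 */
}
```

 So the three dw lines assemble to 32, 1 and 1. The shift truncates rather
 than rounds, which matters for row 1: rounding 3597/2048 to nearest would
 give 2, not 1.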