[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2

Thu Aug 23 00:47:28 CEST 2007

Hi

On Wed, Aug 22, 2007 at 08:53:21PM +0200, Balatoni Denes wrote:
[...]
> 
> > > Also, although it's the same algorithm as simple_idct, it might be less
> > > accurate on rare occasions, because of the primitive slow split 8+8 bit
> > > multply operation in VIS. I am hoping the idct won't overflow, too.
> 
> > what happens with the regression tests if its forced to be used where the
> > normal C simple idct normally is?
> >
> > and what does dct-test.c output for the idct?
> 
> I won't try the regression test, as it would be very complicated - ffmpeg 
> doesn't support solaris 8 (I don't have acces to anything else), so I am 
> using an older hacked version of mplayer.

ok

> 
> But I did try dct-test, here is the output:
>  
> ------------
> ednebal mwux119> ./dct-test -i 0
> ffmpeg DCT/IDCT test
> 
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
>     0     0     0     0     0     0     0     0
> IDCT REF-DBL: err_inf=0 err2=0.00000000 syserr=0.00000000 maxout=260 
> blockSumErr=0
> IDCT REF-DBL: 50.2 kdct/s
> 
>    -4   -24    -1   -15     4    24   -29   -47
>   -35   -13    21    -2    24     8    -9    19
>    30   -17   -38     7   -18    43   -28    -5
>    31   -31   -13    -9    23     5    16    11
>   -16    20   -43    18     4   -11    -9     0
>    -3    -6    -9    -3    -5     4   -25    11
>   -43    10    -6    -7    24    22   -12    23
>     4   -23   -11     0    10    11    13    24
> IDCT INT: err_inf=1 err2=0.01606250 syserr=0.00235000 maxout=260 blockSumErr=6
> IDCT INT: 629.0 kdct/s
> 
>    -1    13    15    -2    30     9    10   -11
>   -22   -27     9    11    15    -1    31     8
>     2    -2     4   -21     0     9   -21   -13
>    21   -17   -10    12     4     1     9    -1
>    -8     4    -5    -6    20    -3    12     0
>    21   -15     9    18    -9    19   -14     3
>     1    22   -10    -6    11    -6     6     1
>    21    18    -7   -11     2    17    16    -8
> IDCT SIMPLE-C: err_inf=1 err2=0.00840078 syserr=0.00155000 maxout=260 
> blockSumErr=5
> IDCT SIMPLE-C: 463.2 kdct/s
> 
>   651   363   821  1416  1116   686    64   492
>   247   380   736   237   441   323   534   420
>  1094   716   359   676   442   484   371   426
>   726   296   437   302   399   380   413   411
>   762   622   357   548   476   429   350   463
>   451   386   407   382   437   386   415   391
>   384   579   421   523   482   492   403   454
>   304   455   360   399   344   407   396   375
> IDCT SIMPLE-VIS: err_inf=1 err2=0.06097266 syserr=0.07080000 maxout=259 
> blockSumErr=12
> IDCT SIMPLE-VIS: 1024.6 kdct/s
> 
>  2810  1725  3883  3476  1445  2977  1362  1943
>  1263  1244  3402  2061  3266  2497  2936  2741
>  3763  3555  1640  2765  1864  2615  2039  2391
>  3648  1897  2893  2304  2447  2391  2482  2613
>  1796  3132  2055  2615  2115  2426  2246  2373
>  2690  2483  2701  2527  2544  2556  2554  2520
>  1361  2738  2237  2556  2252  2520  2360  2349
>  1882  2594  2395  2605  2279  2759  2274  2578
> IDCT MLIBidct: err_inf=3 err2=0.38151094 syserr=0.19415000 maxout=260 
> blockSumErr=40
> IDCT MLIBidct: 2380.8 kdct/s
> ------------------
> 
> So it is less accurate than simple_idct, but definietly more accurate than the 
> mlib version. I think I also can confirm that the inaccuracy is from the 
> split two instruction multiply+rounding - because I have an earlier version 
> of simple_idct_vis that uses 32 bit for the calculations (and hence does only 
> two rows parallel, so it's half as fast, only a bit faster than the C 
> version),  which has the same good accuracy as simple_idct. As the main 
> algorithm is the same, I think the conclusion, that the two instruction 
> multiply+rounding is guilty, is correct.

ehhh, thats the first time i see the mlib accuarcy .... uhm
thats miles outside the requirements mpeg&h26x have
i wonder if mpeg4 is even watchable with that ...
also please dont enable either the mlib nor simple_vis idct by default
(at least not for encoding) as the files will show artifacts if decoded
with an accurate IDCT (= they wont be playable on non sparc)

[...]
> > also you permute the input explicitly instead of setting 
> > idct_permutation_type 
> > properly
> >
> > please dont sumbit trash
> 
> I don't think it is trash. Again, there is no left shift. It could be 
> substituted with 4 additions and no fpackfix mess, but than it would only be 
> marginally faster, and there would be more code in this file (as this 
> permutating is still needed for the other half of the idct), in other words it 
> would be more complex for little gain. So I prefer this version.

yes, sorry ...

ill review the code later ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070823/5990c256/attachment.pgp>