[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6

Mon Aug 27 23:25:16 CEST 2007

Hi

On Mon, Aug 27, 2007 at 10:21:40PM +0200, Balatoni Denes wrote:
> Hi!
> 
> I hope I am not filtered to your trash folder yet, Michael ;)
> 
> Here is a new patch, try#6. It is more accurate now (albeit a tiny bit 
> slower), almost - but unfortunatelly only almost - passing ieee-1180.
> Here is dct-test output:
> 
>   144   101   394   286   421   176    11   126
>    88   109   152    79    99    93   166    94
>   233   140    94   134    83   131    57    74
>   185   100   121    74    85   117    84    70
>   207   134    74   129   129    89    66    88
>   112    74    74    94    75    80   124   103
>   126   144   130   111   120    82    87   104
>    96   119    72    83    66   133   146    97
> IDCT SIMPLE-VIS: err_inf=1 err2=0.02248828 syserr=0.02105000 maxout=260 
> blockSumErr=8

i suspect that this can be improved by slightly changing
some coefficients or bias ...

> 
> Btw, I tested Walken's idct, here is the dct-test output:
> 
>   92   156  -255   132  -189   129    -9   110
>  -242    90   -97   100  -105    84  -104    80
>   -89   157  -141    66  -165   107  -108    89
>  -142   116   -92    81   -66   107  -102    67
>  -163   128  -157    87   -92    59  -104    50
>   -93   142   -78   123   -60   111  -107   109
>  -149   148   -49    98   -43    64   -87    82
>  -165   115  -115    66  -115    86  -112    96
> IDCT WALKEN-VIS: err_inf=1 err2=0.02248906 syserr=0.01275000 maxout=260 
> blockSumErr=8

i assume thats just the c code? if its asm id like to see a benchmark ...

[...]
> 1.)
> > > ok, but then you should move the for up so its not immedeatly before
> > > a fcmpd using its result
> >
> > Ok, done.
> 
> Well, I moved them back, because it broke sparse matrices.

well you do have to change the used register of course

> 
> 2.)
> > > there are 32 64bit registers these should be enough to do the idct
> > > without an intermediate store-load
> > > the whole 8x8 block needs 16registers, 7 for the constant coefficients
> > > that leaves 9 available
> >
> > It would be slower. In it's current form of the idct, there are 8
> > independent VIS instructions after each other, so the instruction latency
> > is not a problem. If you only use 9 registers, than good luck with latency.
> 
> Indeed. Calculating the first column part would take at least 30 clocks more 
> because of latency, because there would be only one register for intermediate 
> results. Calculating the second column would take at lest 10 clocks more, and 
> by this time we are slower than before, as the gain from all this wourk would 
> have been about 32 clocks.

well your current code mixes the even and odd calculations thus it would
require twice as many intermediates, a proper implementation would not
and thus would only need 4 registers to accumulate values until the butterfly
also 1 register would become available after each column thus

0. column 9 registers available
2. column 6 registers available
4. column 7 registers available
6. column 8 registers available
1. column 9 registers available
3. column 6 registers available
5. column 7 registers available
7. column 8 registers available

> 
> 3.)
> > > the idct should not store the output in memory but leave it in registers
> > > the ff_simple_idct_put/add then should call the idct (or have it inlined)
> > > and the clamping code should just work with the registers
> > > this avoids another 32 instructions
> 
> Although it could be done, it is quite some work (and as always, relatively 
> little benefit), and more complexity in the code. It's really not worth it, 
> though we might not agree on this point.
> 
> 
> So this was my last attempt at trying to get this code into SVN. If doesn't 
> get in now, than let's be realistic Michael, it never will - because there 
> really are very few people interested in developping SPARC VIS assembly - 
> like there was no original VIS code in ffmpeg before I came, only parts 
> copypasted from libmpeg2, and most of the things are just not optimized for 
> SPARC.
> 
> I do believe this contribution would be beneficial to ffmpeg, because the C 
> idct is much slower, and the mlib idct sometimes makes the picture turn pink 
> (or causes other artifacts).
> 
> Anyhow, do as you wish, I am off to have dinner

this decission is easy, patch rejected

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070827/60707d04/attachment.pgp>