[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2

Michael Niedermayer michaelni
Thu Aug 23 14:00:21 CEST 2007


Hi

On Thu, Aug 23, 2007 at 12:01:34PM +0200, Balatoni Denes wrote:
> Hi!
> 
> On Thursday 23 August 2007 at 03:01, Michael Niedermayer wrote:
> > > Well, you can't move between floating point and integer registers. So
> > > there would be some additional storing to memory, reading from memory,
> > > some masking is still needed, then the shift. All in all it's the same
> > > speed or slower than 4 adds. As I already said, I don't really like
> > > it, because of the marginal speedup and the added complexity.
> >
> > I don't see where you would need masking; the MSBs should be 0.
> > Additionally, there is block_last_index, which allows you to skip
> > 90% of the coeffs (as it tells you after what point all coeffs are 0).
> 
> Why would the MSBs of the four 16-bit numbers be 0? What if they contain -1,
> for example?

True, I am stupid.
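
To illustrate the point about -1, here is a minimal C sketch. It assumes the
four 16-bit coefficients sit in one 64-bit integer with lane 0 in the low
bits. The helper names (pack4, lane, shl4_naive, shl4_masked) are made up for
this example and are not part of the patch. Shifting the whole word lets the
top 4 bits of each lane spill into its neighbour, which is only harmless when
those bits are 0.

```c
#include <stdint.h>

/* Pack four signed 16-bit values into one 64-bit word (lane 0 = low bits). */
static uint64_t pack4(const int16_t c[4])
{
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)(uint16_t)c[i] << (16 * i);
    return w;
}

/* Read lane i back as a signed 16-bit value. */
static int16_t lane(uint64_t w, int i)
{
    return (int16_t)(uint16_t)(w >> (16 * i));
}

/* Naive: shift the whole word; the top 4 bits of each lane leak upward. */
static uint64_t shl4_naive(uint64_t w)
{
    return w << 4;
}

/* Masked: clear the top 4 bits of every lane first, so nothing can leak. */
static uint64_t shl4_masked(uint64_t w)
{
    return (w & 0x0FFF0FFF0FFF0FFFULL) << 4;
}
```

With lanes {-1, 0, 3, 0}, the naive shift turns lane 1 from 0 into 15, while
the masked version gives -16, 0, 48, 0.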


[...]
> > So again:
> > - fix the permutation
> > - shift left by 4 bits using C code or asm, both stopping after
> >   block_last_index
> > - do the 2048*(1<<4) subtraction, if needed, per 64 bits
> >
> > At this point you should have pretty much the same data as in your case,
> > but very significantly faster.
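
A rough C sketch of the shift step above, assuming the coefficients are
int16_t and that a scan table maps scan position to position in the block.
scale_coeffs, scan and last_index are hypothetical names for illustration,
not FFmpeg's actual API. The permutation fix and the 2048*(1<<4) subtraction
are left out, because the message does not spell them out.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not FFmpeg code: scale only the coefficients that
 * can be nonzero. scan[i] is the position in block[] of the i-th
 * coefficient in scan order; everything after last_index is assumed to be
 * zero already, so it is never touched.
 */
static void scale_coeffs(int16_t block[64], const uint8_t scan[64],
                         int last_index)
{
    for (int i = 0; i <= last_index; i++) {
        int pos = scan[i];
        /* Multiply by 16 rather than << 4: well defined for negatives. */
        block[pos] = (int16_t)(block[pos] * 16);
    }
}
```

The loop length depends on last_index rather than on the block size, which
is where the saving comes from when most coefficients are zero.
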
> 
> Okay, so we spent some hours on the problem, and what we came up with is an
> approx. 5% speedup (approx. 2% overall), and longer code (because I still
> think what I had is kind of elegant). I don't think it's a very significant
> speedup, in the sense that what wasn't playable before is still not
> playable (e.g. 720p

It's like leaving 100 euro lying in the street, saying it's not enough to buy
a car ...


> HDTV). Also as the idct is rather inaccurate, 

I've not yet looked at how to make it more accurate :)


> it won't be used by default, so 
> not many people would even be using it, so I think optimizing this even more 
> is somewhat wasted effort. So, to tell the truth, I am not overly 
> enthusiastic about the new solution.

A 2% overall speedup is huge. I've rejected patches which would have
introduced new features because they slowed the code down by 0.1%.

Also, mlib does the IDCT at half the speed, so I think there's more than 5%
of gain possible.


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire