[Ffmpeg-devel] [PATCH] h264 - loopify some get_cabac calls

Sun Mar 25 04:07:46 CEST 2007

On Mar 24, 2007, at 7:46 PM, Guillaume Poirier wrote:

>> There's some more AltiVec code here we'll probably send soon:  
>> http://trac.perian.org/ticket/113
> I had a quick look at http://trac.perian.org/attachment/ticket/113/ 
> altivec_lum.3.diff
>
> Even though I imagine this patch isn't yet ready to be submitted,  
> I'd like to ask whether, in your opinion, the transpose routines can  
> do without accessing memory (do it all in registers).

They actually do; the patch is just messy enough to hide it.
The functions transpose4x4 and readVector are never called.

transpose4x16 and transpose6x16 only do memory operations because the  
initial loads and stores are integrated into them.
I think the stuff in transpose6x16 can be cleaned up; it should be  
able to use vec_ste instead of copying the result array.

But this is my first time studying it too; I didn't write it.

> Also, more cycles could be saved if you take advantage of some known  
> alignments (an 8-byte-aligned load/store can be made faster than a  
> generic unaligned memory access).

Hm, doesn't AltiVec use the same unaligned load method for both?
(load x and 15+x, merge them)
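
To spell out what I mean by "load x and 15+x, merge them", here is a rough
sketch in plain C that only emulates the pattern. It is not taken from the
patch. The names vec16, aligned_load and unaligned_load are made up for
illustration, and memory is modelled as a byte array whose start is assumed
to be 16-byte aligned. aligned_load stands in for vec_ld (which ignores the
low 4 address bits), and the merge loop stands in for vec_perm driven by the
shift that vec_lvsl would produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Emulates an aligned vector load: the low 4 bits of the address are
   dropped, so the load always covers one 16-byte-aligned block.
   'mem' is assumed to start on a 16-byte boundary. */
static vec16 aligned_load(const uint8_t *mem, size_t addr)
{
    vec16 v;
    memcpy(v.b, mem + (addr & ~(size_t)15), 16);
    return v;
}

/* Emulates the generic unaligned load: load the block holding the first
   byte (addr) and the block holding the last byte (addr + 15), then pick
   16 consecutive bytes out of the two blocks laid end to end. */
static vec16 unaligned_load(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);

    vec16 lo = aligned_load(mem, addr);       /* block containing addr      */
    vec16 hi = aligned_load(mem, addr + 15);  /* block containing addr + 15 */
    size_t shift = addr & 15;                 /* byte offset inside 'lo'    */
    vec16 out;

    for (size_t i = 0; i < 16; i++) {
        size_t k = shift + i;                 /* index into lo followed by hi */
        out.b[i] = (k < 16) ? lo.b[k] : hi.b[k - 16];
    }
    return out;
}
```

Using 15 rather than 16 for the second load means that when the address is
already 16-byte aligned, both loads hit the same block and nothing past it
is touched. In this generic form the same two loads and one merge happen
whatever the alignment is, which is why I'm asking.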

>
> I do realize that this patch isn't meant for submission yet, I just  
> wanted to give some kind of feedback.
>
>
> Guillaume