[FFmpeg-devel] [PATCH] Altivec version of h264_idct_add

David Conrad
Sun Jun 3 07:07:52 CEST 2007


On Jun 3, 2007, at 12:38 AM, Luca Barbato wrote:

> David Conrad wrote:
>> On Jun 2, 2007, at 10:15 PM, Luca Barbato wrote:
>>
>>> Loren Merritt wrote:
>>>>
>>>> The switch could be changed to a table if it matters.
>>>
>>> In theory vec_ste is all we need here; sadly, I cannot manage to get it
>>> working right for the unaligned cases.
>>
>> I've never really looked at vec_ste before today, but it seems that
>> vec_ste will always write the first element of the vector to the
>> rounded-down 16-byte address, and to store to an unaligned address you
>> have to move the data in the vector and store that element. The attached
>> patch does this with a permute and uses it instead of the switch. It
>> requires an additional 4 permutes and a constant vector in the aligned
>> case, but it seems to be a bit faster overall on my G4.
>
> vec_splat() should be enough (another perm spared)

Like so?

>> +    vec_u8_t repeatperm = (vec_u8_t)AVV(0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x02, 0x03,
>> +                                        0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x02, 0x03);
>
> lu
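
For readers following the thread, here is a rough scalar model in portable C of why replicating the value across the vector helps with vec_ste. It is only a sketch and is not taken from the attached patch. It assumes the store behaves as the AltiVec documentation describes for 32-bit elements: the address is rounded down to a multiple of 4, and the element written is the one whose position in the vector matches that address's offset within its 16-byte block. The names model_vec_u32, model_vec_ste and model_vec_splat are hypothetical, and the byte buffer stands in for memory whose offset 0 is 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a vector of four 32-bit elements. */
typedef struct { uint32_t e[4]; } model_vec_u32;

/*
 * Assumed model of vec_ste for 32-bit elements. `base` plays the role of
 * memory whose offset 0 is 16-byte aligned; `offset` is the byte address.
 */
static void model_vec_ste(model_vec_u32 v, size_t offset, uint8_t *base)
{
    size_t ea   = offset & ~(size_t)3;  /* round down to a 4-byte boundary */
    size_t elem = (ea & 15) / 4;        /* element picked by the address   */
    assert(elem < 4);
    memcpy(base + ea, &v.e[elem], sizeof(uint32_t));
}

/* Model of vec_splat: copy element n into every element of the result. */
static model_vec_u32 model_vec_splat(model_vec_u32 v, unsigned n)
{
    assert(n < 4);
    model_vec_u32 r = {{ v.e[n], v.e[n], v.e[n], v.e[n] }};
    return r;
}
```

In this model, storing v at offset 8 writes v.e[2] rather than v.e[0]. Storing model_vec_splat(v, 0) at any 4-byte-aligned offset writes v.e[0], because every element holds the same value. That is the effect the replicating permute (or a splat) is after.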

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: h264_idct_add_altivec_ste2.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070603/e6cfdf00/attachment.txt>


