[FFmpeg-soc] [PATCH] 4/4 Split sws_getContext_altivec_alloc_context from sws_getContext
Luca Barbato
lu_zero at gentoo.org
Sun Jun 15 12:26:32 CEST 2008
Michael Niedermayer wrote:
> On Sun, Jun 15, 2008 at 01:55:57AM +0200, Luca Barbato wrote:
>> Keiji Costantini wrote:
>>> Luca Barbato ha scritto:
>>>> Michael Niedermayer wrote:
>>>>> On Wed, Jun 11, 2008 at 02:36:08AM +0200, Keiji Costantini wrote:
>>>>>> - p[j] = c->vLumFilter[i];
>>>>>> - p[j] = c->vChrFilter[i];
>>>>> Whichever way this is done and wherever, it should be done at the
>>>>> same place where lum/chrMmxFilter is initialized.
>>>>> And of course both altivec & mmx should use the same array for the same data.
>>>>>
>>>>> But looking again it seems these arrays are practically unused and the
>>>>> code using them looks like it shouldn't use them in the first place.
>>>>>
>>>>> So, correct cleanup seems to be to remove vCCoeffsBank and vYCoeffsBank.
>>>> The *Banks are just a copy from aligned memory to another, so just using
>>>> the vLumFilter and vChrFilter directly won't cause problems.
>>>>
>>>> lu
>>>>
>>> extract from code:
>>>
>>> for (i=0;i<c->vLumFilterSize*c->dstH;i++) {
>>>     int j;
>>>     short *p = (short *)&c->vYCoeffsBank[i];
>>>     for (j=0;j<8;j++)
>>>         p[j] = c->vLumFilter[i];
>>> }
>>>
>>> I see the *Banks are the *Filter coefficients copied 8 times each...
>> I'm an idiot =P
>
> At least I now know why I didn't understand your earlier reply :)
That happens when I try to read code just after waking up or right before going to sleep ^^;
>> Well, they could go away by adding 2 vec_splats, but I'm pretty sure it
>> would slow things down. I'd consider this later -_-
>
> I wouldn't be so sure that the splats are slower than the cache thrashing the
> array causes.
> Also, if done properly (like in the mmx code), there are rather few splats.
I just woke up, so I may write something stupid again, but:
if I use the original vector directly I'd have either

(dumb way)
- one full unaligned load (2 loads, 1 table lookup, 1 permute)
- a splat

or

(smarter way)
- one simple load
- an address mask to find which element I care about
- a splat

Right now I have a simple load plus what is more or less equivalent to the
address mask (one extra &15), so you are right: I should be able to kill
those vectors without losing much.
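
To make the comparison concrete, here is a minimal portable C sketch of the two options. It is not the libswscale code and uses no AltiVec intrinsics: `vec8s`, `build_bank`, `load_aligned`, `splat`, `coeff_on_the_fly` and `banks_match_splats` are hypothetical names made up for illustration, and the struct only emulates a 16-byte vector of 8 shorts. In real AltiVec code the load would be something like `vec_ld`, and since `vec_splat` wants a literal lane index, a run-time lane would need different handling (an assumption on my side: a permute or some unrolling).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an AltiVec "vector signed short": 8 x 16 bit. */
typedef struct { int16_t v[8]; } vec8s;

/* Bank approach: store each coefficient replicated 8 times, as the quoted
 * loop does for vYCoeffsBank. Costs 16 bytes per coefficient. */
static void build_bank(vec8s *bank, const int16_t *filter, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < 8; j++)
            bank[i].v[j] = filter[i];
}

/* Emulates an aligned 16-byte load: the low 4 address bits are dropped. */
static vec8s load_aligned(const int16_t *p)
{
    vec8s r;
    memcpy(&r, (const void *)((uintptr_t)p & ~(uintptr_t)15), sizeof(r));
    return r;
}

/* Emulates a splat: broadcast one lane to all 8 lanes. */
static vec8s splat(vec8s a, unsigned lane)
{
    vec8s r;
    for (int j = 0; j < 8; j++)
        r.v[j] = a.v[lane];
    return r;
}

/* "Smarter way": one aligned load, an address mask to pick the lane, a splat.
 * The filter array itself must be 16-byte aligned. */
static vec8s coeff_on_the_fly(const int16_t *filter, int i)
{
    const int16_t *p = &filter[i];
    unsigned lane = (unsigned)(((uintptr_t)p & 15) / sizeof(int16_t));

    assert(((uintptr_t)filter & 15) == 0);
    return splat(load_aligned(p), lane);
}

/* Returns 1 if both approaches give identical vectors for every coefficient. */
static int banks_match_splats(void)
{
    enum { N = 24 };                       /* 48 bytes, a multiple of 16 */
    static _Alignas(16) int16_t filter[N];
    static vec8s bank[N];

    for (int i = 0; i < N; i++)
        filter[i] = (int16_t)(100 * i - 700);

    build_bank(bank, filter, N);
    for (int i = 0; i < N; i++) {
        vec8s v = coeff_on_the_fly(filter, i);
        if (memcmp(&v, &bank[i], sizeof(v)) != 0)
            return 0;
    }
    return 1;
}
```

Calling `banks_match_splats()` from a `main()` checks that the load, mask and splat path reproduces every bank entry. The bank costs 16 bytes per coefficient against 2 bytes in the plain filter array, which is where the cache pressure comes from.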
lu - am I insane?
--
Luca Barbato
Gentoo Council Member
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero