[FFmpeg-devel] [PATCH] Ticket #8750 Add inline function for the vec_xl intrinsic in non-VSX environments

Sat Feb 20 13:34:13 EET 2021

Hi!

> On 24 Jan 2021, at 17:35, Andriy Gelman <andriy.gelman at gmail.com> wrote:
> 
> On Sat, 07. Nov 16:31, Andriy Gelman wrote:
>> On Sat, 31. Oct 10:17, Andriy Gelman wrote:
>>> On Fri, 16. Oct 00:02, Andriy Gelman wrote:
>>>> On Fri, 09. Oct 20:17, Andriy Gelman wrote:
>>>>> From: Chip Kerchner <Chip.Kerchner at ibm.com>
>>>>> 
>>>>> ---
>>>>> libswscale/ppc/yuv2rgb_altivec.c | 10 ++++++++++
>>>>> 1 file changed, 10 insertions(+)
>>>>> 
>>>>> diff --git a/libswscale/ppc/yuv2rgb_altivec.c b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> index 536545293d..930ef6b98f 100644
>>>>> --- a/libswscale/ppc/yuv2rgb_altivec.c
>>>>> +++ b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> @@ -283,6 +283,16 @@ static inline void cvtyuvtoRGB(SwsContext *c, vector signed short Y,
>>>>>  * ------------------------------------------------------------------------------
>>>>>  */
>>>>> 
>>>>> +#if !HAVE_VSX
>>>>> +static inline vector unsigned char vec_xl(signed long long offset, const ubyte *addr)
>>>>> +{
>>>>> +    const vector unsigned char *v_addr = (const vector unsigned char *) (addr + offset);
>>>>> +    vector unsigned char align_perm = vec_lvsl(offset, addr);
>>>>> +
>>>>> +    return (vector unsigned char) vec_perm(v_addr[0], v_addr[1], align_perm);
>>>>> +}
>>>>> +#endif /* !HAVE_VSX */
>>>>> +
>>>>> #define DEFCSP420_CVT(name, out_pixels)                                       \
>>>>> static int altivec_ ## name(SwsContext *c, const unsigned char **in,          \
>>>>>                             int *instrides, int srcSliceY, int srcSliceH,     \
>>>> 
>>>> Ping.
>>>> This patch fixes PPC64 build on patchwork.
>>>> 
>>> 
>>> ping
>>> 
>> 
>> ping 
>> 
> 
> I asked Reimar to have a look at the patch. He didn't have a link to original
> post, so I'm forwarding his feedback: 
> 
> - a few comments sure would be good
> - the function likely should be always_inline
> - the offset is always 0 in the code, so that could be optimized
> - I am not sure if the addresses even can be unaligned (the whole magic code is about fixing up unaligned loads since altivec simply ignores the low bits of the address, but it looks like the x86 asm also assumes unaligned)
> - the extra load needed to fix the alignment can read outside the bounds of the buffer I think? Especially for chroma and if the load is aligned. Though chroma seems to have an issue already today...
> 

I had a look at the code before the vec_xl was introduced, and besides the unnecessary extra offset it seems it was pretty much like this.
So worst case this patch would restore the behaviour to before the vec_xl patch, which I
guess is a reasonable argument to merge it whether it’s perfect or not.
Personally I would have added a vecload (dropping the offset argument) or similar function
that either does vec_xl or this, instead of introducing a “fake” vec_xl function,
but that is just nitpicking I guess…

Best regards,
Reimar