[FFmpeg-devel] [PATCH] Ticket #8750 Add inline function for the vec_xl intrinsic in non-VSX environments

Andriy Gelman andriy.gelman at gmail.com
Sun Jan 24 18:35:37 EET 2021


On Sat, 07. Nov 16:31, Andriy Gelman wrote:
> On Sat, 31. Oct 10:17, Andriy Gelman wrote:
> > On Fri, 16. Oct 00:02, Andriy Gelman wrote:
> > > On Fri, 09. Oct 20:17, Andriy Gelman wrote:
> > > > From: Chip Kerchner <Chip.Kerchner at ibm.com>
> > > > 
> > > > ---
> > > >  libswscale/ppc/yuv2rgb_altivec.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/libswscale/ppc/yuv2rgb_altivec.c b/libswscale/ppc/yuv2rgb_altivec.c
> > > > index 536545293d..930ef6b98f 100644
> > > > --- a/libswscale/ppc/yuv2rgb_altivec.c
> > > > +++ b/libswscale/ppc/yuv2rgb_altivec.c
> > > > @@ -283,6 +283,16 @@ static inline void cvtyuvtoRGB(SwsContext *c, vector signed short Y,
> > > >   * ------------------------------------------------------------------------------
> > > >   */
> > > >  
> > > > +#if !HAVE_VSX
> > > > +static inline vector unsigned char vec_xl(signed long long offset, const ubyte *addr)
> > > > +{
> > > > +    const vector unsigned char *v_addr = (const vector unsigned char *) (addr + offset);
> > > > +    vector unsigned char align_perm = vec_lvsl(offset, addr);
> > > > +
> > > > +    return (vector unsigned char) vec_perm(v_addr[0], v_addr[1], align_perm);
> > > > +}
> > > > +#endif /* !HAVE_VSX */
> > > > +
> > > >  #define DEFCSP420_CVT(name, out_pixels)                                       \
> > > >  static int altivec_ ## name(SwsContext *c, const unsigned char **in,          \
> > > >                              int *instrides, int srcSliceY, int srcSliceH,     \
> > > 
> > > Ping.
> > > This patch fixes PPC64 build on patchwork.
> > > 
> > 
> > ping
> > 
> 
> ping 
> 

I asked Reimar to have a look at the patch. He didn't have a link to original
post, so I'm forwarding his feedback: 

- a few comments sure would be good
- the function likely should be always_inline
- the offset is always 0 in the code, so that could be optimized
- I am not sure if the addresses even can be unaligned (the whole magic code is about fixing up unaligned loads since altivec simply ignores the low bits of the address, but it looks like the x86 asm also assumes unaligned)
- the extra load needed to fix the alignment can read outside the bounds of the buffer I think? Especially for chroma and if the load is aligned. Though chroma seems to have an issue already today...

-- 
Andriy


More information about the ffmpeg-devel mailing list