[Ffmpeg-devel] [PATCH] idct8 in Altivec for H.264 decoding

Mon Oct 9 11:34:48 CEST 2006

Guillaume POIRIER wrote:

> +
> +/***********************************************************************
> + * Vector types
> + **********************************************************************/
> +#define vec_u8_t  vector unsigned char
> +#define vec_s8_t  vector signed char
> +#define vec_u16_t vector unsigned short
> +#define vec_s16_t vector signed short
> +#define vec_u32_t vector unsigned int
> +#define vec_s32_t vector signed int
> +
> +/***********************************************************************
> + * Null vector
> + **********************************************************************/
> +#define LOAD_ZERO const vec_u8_t zerov = vec_splat_u8( 0 )
> +
> +#define zero_u8v  (vec_u8_t)  zerov
> +#define zero_s8v  (vec_s8_t)  zerov
> +#define zero_u16v (vec_u16_t) zerov
> +#define zero_s16v (vec_s16_t) zerov
> +#define zero_u32v (vec_u32_t) zerov
> +#define zero_s32v (vec_s32_t) zerov

Move them into a types_altivec.h header.

> +
> +/***********************************************************************
> +* VEC_DIFF_H_8BYTE_ALIGNED
> +***********************************************************************
> +* p1, p2:    u8 *
> +* i1, i2, n: int
> +* d:         s16v
> +*
> +* Loads n bytes from p1 and p2, do the diff of the high elements into
> +* d, increments p1 and p2 by i1 and i2
> +* Slightly faster when we know we are loading/diffing 8bytes which
> +* are 8 byte aligned. Reduces need for two loads and two vec_lvsl()'s
> +**********************************************************************/
> +#define PREP_DIFF_8BYTEALIGNED \
> +LOAD_ZERO;                     \
> +vec_s16_t pix1v, pix2v;        \
> +vec_u8_t permPix1, permPix2;   \
> +permPix1 = vec_lvsl(0, pix1);  \
> +permPix2 = vec_lvsl(0, pix2);  \
> +
> +#define VEC_DIFF_H_8BYTE_ALIGNED(p1,i1,p2,i2,n,d)    \
> +pix1v = vec_perm(vec_ld(0,p1), zero_u8v, permPix1);  \
> +pix2v = vec_perm(vec_ld(0, p2), zero_u8v, permPix2); \
> +pix1v = vec_u8_to_s16( pix1v );                      \

Missing macro? Where is vec_u8_to_s16() defined?

> +pix2v = vec_u8_to_s16( pix2v );                      \
> +d = vec_sub( pix1v, pix2v);                          \
> +p1 += i1;                                            \
> +p2 += i2;
> +
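
For reference, here is a portable scalar model of what VEC_DIFF_H_8BYTE_ALIGNED is meant to compute, so the semantics can be checked off-target. It is only a sketch under assumptions: big-endian element order, and a vec_u8_to_s16() that zero-extends the first (high) 8 bytes, e.g. via vec_mergeh(zero_u8v, v). The model_* functions and the u8x16/s16x8 types are hypothetical stand-ins for the AltiVec intrinsics and vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for vec_u8_t and vec_s16_t. */
typedef struct { uint8_t b[16]; } u8x16;
typedef struct { int16_t h[8]; } s16x8;

/* vec_ld(0, p): loads the 16-byte block containing p (address rounded down). */
static u8x16 model_ld(const uint8_t *p)
{
    const uint8_t *base = (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15);
    u8x16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = base[i];
    return r;
}

/* vec_lvsl(0, p): permute control {s, s+1, ..., s+15} with s = p & 15. */
static u8x16 model_lvsl(const uint8_t *p)
{
    unsigned s = (unsigned)((uintptr_t)p & 15);
    u8x16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(s + i);
    return r;
}

/* vec_perm(a, b, c): byte i is byte (c[i] & 31) of the 32-byte a||b. */
static u8x16 model_perm(u8x16 a, u8x16 b, u8x16 c)
{
    u8x16 r;
    for (int i = 0; i < 16; i++) {
        unsigned k = c.b[i] & 31u;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* Assumed vec_u8_to_s16(): zero-extend the first (high) 8 bytes. */
static s16x8 model_u8_to_s16(u8x16 v)
{
    s16x8 r;
    for (int i = 0; i < 8; i++)
        r.h[i] = (int16_t)v.b[i];
    return r;
}

/* One step of the macro: d = p1[0..7] - p2[0..7] as signed 16-bit values. */
static s16x8 model_diff_h_8(const uint8_t *p1, const uint8_t *p2)
{
    assert(((uintptr_t)p1 & 7) == 0 && ((uintptr_t)p2 & 7) == 0);
    u8x16 zero = {{0}};
    s16x8 a = model_u8_to_s16(model_perm(model_ld(p1), zero, model_lvsl(p1)));
    s16x8 b = model_u8_to_s16(model_perm(model_ld(p2), zero, model_lvsl(p2)));
    s16x8 d;
    for (int i = 0; i < 8; i++)
        d.h[i] = (int16_t)(a.h[i] - b.h[i]);
    return d;
}
```

With p1 and p2 8-byte aligned, vec_lvsl gives an offset of 0 or 8, so the 8 wanted pixels always land in the first 8 bytes after the permute and a single vec_ld per pointer is enough. Since permPix1/permPix2 are computed only once in PREP_DIFF_8BYTEALIGNED, this holds only while i1 and i2 keep p1 and p2 at the same offset within a 16-byte block.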

...

> +#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A
> +#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_B

If it is only 8-byte aligned, you have to pick either the high or the low
half of the vector: B should take the low half, while A takes the high half.
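
A scalar sketch of the A/B split, reading "high" and "low" in the vec_mergeh/vec_mergel sense (elements 0-7 vs 8-15 of a big-endian vector). Assumptions: an 8-byte-aligned dst sits at offset 0 (A, high half) or 8 (B, low half) of its 16-byte block, and the 16-byte store must leave the other half untouched. store_sum_clip_align8() and clip_u8() are hypothetical names, not the patch's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to 0..255, as vec_packsu does when packing s16 back to u8. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model: add an 8-entry idct residual to the 8 pixels
 * at dst and clip.  dst is assumed 8-byte aligned only, so it is either the
 * high half (offset 0, variant A) or the low half (offset 8, variant B)
 * of the 16-byte block that vec_ld/vec_st would touch. */
static void store_sum_clip_align8(uint8_t *dst, const int16_t res[8])
{
    assert(((uintptr_t)dst & 7) == 0);
    uint8_t *block = (uint8_t *)((uintptr_t)dst & ~(uintptr_t)15);
    int off = (int)((uintptr_t)dst & 15);   /* 0: A (high), 8: B (low) */
    uint8_t v[16];

    for (int i = 0; i < 16; i++)            /* aligned 16-byte load */
        v[i] = block[i];
    for (int i = 0; i < 8; i++)             /* sum + clip the chosen half only */
        v[off + i] = clip_u8(v[off + i] + res[i]);
    for (int i = 0; i < 16; i++)            /* aligned 16-byte store, other half unchanged */
        block[i] = v[i];
}
```

Writing all 16 bytes back without preserving the untouched half would clobber the neighboring 8 pixels, which is why the half has to be chosen explicitly.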

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero