[Ffmpeg-devel] [RFC] VC1 Transform in AltiVec

Luca Barbato lu_zero
Tue Jul 18 13:02:05 CEST 2006


Michael Niedermayer wrote:
> Hi
> 

> 
> the vertical transform can also be done in 16 bits, though it's a little trickier

Some example code follows, untested and probably wrong. Guillaume, you may
complete, factor and fix it as an exercise.

> 
>             t1 = 6 * (src[ 0] + src[32]);

	A = vec_ld(0,src);
	B = vec_ld(32,src);
	tmp = vec_add(A, B);
	t1  = vec_add(tmp,vec_add(tmp,tmp));	/* 3 * tmp */
	t1  = vec_add(t1,t1);			/* 6 * tmp */

>             t2 = 6 * (src[ 0] - src[32]);
	tmp = vec_sub(A,B);
	t2  = vec_add(tmp,vec_add(tmp,tmp));	/* 3 * tmp */
	t2  = vec_add(t2,t2);			/* 6 * tmp */

>             t3 = 8 * src[16] +  3 * src[48];
					^^^^^_unaligned
	A = vec_ld(16,src);
	B = vec_ld(48,src);
	align = vec_lvsl(48,src);
	B_1 = vec_ld(48+15,src);
	B = vec_perm(B,B_1,align);
	
	/* 8 * A is a left shift by 3; 3 * B is B + B + B */
	t3 = vec_add(vec_sl(A,vec_splat_u16(3)),
			    vec_add(B,vec_add(B,B)));
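
The lvsl/perm sequence used for B is the usual unaligned-load idiom. For
anyone without a PowerPC at hand, here is a minimal byte-level model of it
in plain C. The names (vec16, model_ld, model_lvsl, model_perm,
load_unaligned) are made up for illustration, and the model assumes that
index 0 of the byte array is 16-byte aligned and that addresses are byte
offsets into it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of vec_ld: the low 4 bits of the address are ignored, so the
 * load always comes from the enclosing aligned 16-byte block. */
static vec16 model_ld(const uint8_t *mem, size_t addr)
{
    vec16 v;
    memcpy(v.b, mem + (addr & ~(size_t)15), 16);
    return v;
}

/* Model of vec_lvsl: byte i holds (addr & 15) + i. */
static vec16 model_lvsl(size_t addr)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)((addr & 15) + i);
    return v;
}

/* Model of vec_perm: each selector byte picks one of the 32 bytes of
 * the concatenation a||b (only the low 5 selector bits are used). */
static vec16 model_perm(vec16 a, vec16 b, vec16 sel)
{
    vec16 v;
    for (int i = 0; i < 16; i++) {
        unsigned s = sel.b[i] & 31;
        v.b[i] = s < 16 ? a.b[s] : b.b[s - 16];
    }
    return v;
}

/* The idiom itself: two aligned loads and one permute. */
static vec16 load_unaligned(const uint8_t *mem, size_t addr)
{
    vec16 lo = model_ld(mem, addr);
    vec16 hi = model_ld(mem, addr + 15);
    return model_perm(lo, hi, model_lvsl(addr));
}
```

When the address happens to be aligned, both loads hit the same block and
the permute just passes the first one through, so the idiom is safe (only
wasteful) on aligned data.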

>             t4 = 3 * src[16] -  8 * src[48];
	t4 = vec_sub(vec_add(A,vec_add(A,A)),vec_sl(B,vec_splat_u16(3)));

etc etc etc....
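
To check the arithmetic without AltiVec hardware, here is a scalar sketch
of what one 16-bit lane of the vector code above is meant to compute. The
helper names (mul3, mul6, mul8, t3_lane, t4_lane) are hypothetical; the
only point is that 6*x is 3*x added to itself, and that 8*x is a left
shift by 3.

```c
#include <stdint.h>

/* 3*x from two additions, as in vec_add(x, vec_add(x, x)). */
static int16_t mul3(int16_t x)
{
    return (int16_t)(x + (x + x));
}

/* 6*x: build 3*x first, then add it to itself. */
static int16_t mul6(int16_t x)
{
    int16_t t = mul3(x);
    return (int16_t)(t + t);
}

/* 8*x as a left shift by 3. The shift is done on the unsigned value
 * to mirror the modulo-2^16 lane behaviour and to avoid shifting a
 * negative signed integer in C. */
static int16_t mul8(int16_t x)
{
    return (int16_t)(uint16_t)((uint16_t)x << 3);
}

/* One lane of t3 = 8 * a + 3 * b. */
static int16_t t3_lane(int16_t a, int16_t b)
{
    return (int16_t)(mul8(a) + mul3(b));
}

/* One lane of t4 = 3 * a - 8 * b. */
static int16_t t4_lane(int16_t a, int16_t b)
{
    return (int16_t)(mul3(a) - mul8(b));
}
```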
> 
>             t5 = t1 + t3;
>             t6 = t2 + t4;
>             t7 = t2 - t4;
>             t8 = t1 - t3;
> 
>             t1 = (8 * src[ 8] + 8 * src[24] + 4 * src[40] + 2 * src[56]) + ((- src[24] + src[40])>>1);
				
>             t2 = (8 * src[ 8] - 2 * src[24] - 8 * src[40] - 4 * src[56]) + ((- src[ 8] - src[56])>>1);
>             t3 = (4 * src[ 8] - 8 * src[24] + 2 * src[40] + 8 * src[56]) + ((  src[ 8] - src[56])>>1);
>             t4 = (2 * src[ 8] - 4 * src[24] + 8 * src[40] - 8 * src[56]) + ((- src[24] - src[40])>>1);
> 
>             dst[ 0] = (t5 + t1 + 32) >> 6;
>             dst[ 8] = (t6 + t2 + 32) >> 6;
>             dst[16] = (t7 + t3 + 32) >> 6;
>             dst[24] = (t8 + t4 + 32) >> 6;
>             dst[32] = (t8 - t4 + 32) >> 6;
>             dst[40] = (t7 - t3 + 32) >> 6;
>             dst[48] = (t6 - t2 + 32) >> 6;
>             dst[56] = (t5 - t1 + 32) >> 6;
> 
> it's also interesting to note that microsoft must be aware of this due to the
> way rounding is done on the second half of coeffs but they apparently 
> don't mention it in the spec ... i am wondering what other stuff they have
> hidden ...

thehehe....
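
A quick way to convince yourself of the rounding remark is to compare the
halved-coefficient column transform quoted above against the wider form.
The sketch below does that in scalar C. Note the assumptions: col_ref is
my recollection of the 32-bit column transform (coefficients 12, 16, 15,
9, 6, 4, a final >> 7, and an extra + 1 in the second half); it is not
part of this mail. The function names are made up, and >> on negative
ints is assumed to be an arithmetic shift.

```c
#include <stdint.h>

/* Halved-coefficient column transform as quoted (stride 8). */
static void col_half(const int16_t *src, int16_t *dst)
{
    int t1, t2, t3, t4, t5, t6, t7, t8;

    t1 = 6 * (src[0] + src[32]);
    t2 = 6 * (src[0] - src[32]);
    t3 = 8 * src[16] + 3 * src[48];
    t4 = 3 * src[16] - 8 * src[48];
    t5 = t1 + t3; t6 = t2 + t4; t7 = t2 - t4; t8 = t1 - t3;

    t1 = (8 * src[8] + 8 * src[24] + 4 * src[40] + 2 * src[56]) + ((-src[24] + src[40]) >> 1);
    t2 = (8 * src[8] - 2 * src[24] - 8 * src[40] - 4 * src[56]) + ((-src[8] - src[56]) >> 1);
    t3 = (4 * src[8] - 8 * src[24] + 2 * src[40] + 8 * src[56]) + ((src[8] - src[56]) >> 1);
    t4 = (2 * src[8] - 4 * src[24] + 8 * src[40] - 8 * src[56]) + ((-src[24] - src[40]) >> 1);

    dst[ 0] = (int16_t)((t5 + t1 + 32) >> 6);
    dst[ 8] = (int16_t)((t6 + t2 + 32) >> 6);
    dst[16] = (int16_t)((t7 + t3 + 32) >> 6);
    dst[24] = (int16_t)((t8 + t4 + 32) >> 6);
    dst[32] = (int16_t)((t8 - t4 + 32) >> 6);
    dst[40] = (int16_t)((t7 - t3 + 32) >> 6);
    dst[48] = (int16_t)((t6 - t2 + 32) >> 6);
    dst[56] = (int16_t)((t5 - t1 + 32) >> 6);
}

/* ASSUMED wide form: doubled coefficients, >> 7, and "+ 1" in the
 * second half of the outputs. Not taken from this mail. */
static void col_ref(const int16_t *src, int16_t *dst)
{
    int t1, t2, t3, t4, t5, t6, t7, t8;

    t1 = 12 * (src[0] + src[32]);
    t2 = 12 * (src[0] - src[32]);
    t3 = 16 * src[16] +  6 * src[48];
    t4 =  6 * src[16] - 16 * src[48];
    t5 = t1 + t3; t6 = t2 + t4; t7 = t2 - t4; t8 = t1 - t3;

    t1 = 16 * src[8] + 15 * src[24] +  9 * src[40] +  4 * src[56];
    t2 = 15 * src[8] -  4 * src[24] - 16 * src[40] -  9 * src[56];
    t3 =  9 * src[8] - 16 * src[24] +  4 * src[40] + 15 * src[56];
    t4 =  4 * src[8] -  9 * src[24] + 15 * src[40] - 16 * src[56];

    dst[ 0] = (int16_t)((t5 + t1 + 64) >> 7);
    dst[ 8] = (int16_t)((t6 + t2 + 64) >> 7);
    dst[16] = (int16_t)((t7 + t3 + 64) >> 7);
    dst[24] = (int16_t)((t8 + t4 + 64) >> 7);
    dst[32] = (int16_t)((t8 - t4 + 64 + 1) >> 7);
    dst[40] = (int16_t)((t7 - t3 + 64 + 1) >> 7);
    dst[48] = (int16_t)((t6 - t2 + 64 + 1) >> 7);
    dst[56] = (int16_t)((t5 - t1 + 64 + 1) >> 7);
}

/* Returns 1 if both forms agree on the column starting at src. */
static int cols_match(const int16_t *src)
{
    int16_t a[64] = {0}, b[64] = {0};

    col_half(src, a);
    col_ref(src, b);
    for (int i = 0; i < 64; i += 8)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The reason they can agree: each odd-part coefficient of the wide form is
twice the halved one plus a leftover +-1 term, and the ">> 1" in the
halved form drops that term's low bit. In the first half the dropped bit
never changes the result of the final shift; in the second half the sign
flips, and the "+ 1" is what compensates for it.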

> 
> and the + 32 can be added to t1/t2 instead of the end
> 
> [...]

right
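
A small scalar sketch of why that works, using only the even half of the
quoted code. The function name even_part and its bias parameter are made
up for illustration. Since t5..t8 each contain exactly one of t1 or t2,
a bias added to t1 and t2 reaches every output exactly once.

```c
#include <stdint.h>

/* Even half of the quoted column transform, with a rounding bias
 * folded into t1 and t2. e[0..3] receive t5, t6, t7, t8. */
static void even_part(const int16_t *src, int bias, int e[4])
{
    int t1 = 6 * (src[0] + src[32]) + bias;
    int t2 = 6 * (src[0] - src[32]) + bias;
    int t3 = 8 * src[16] + 3 * src[48];
    int t4 = 3 * src[16] - 8 * src[48];

    e[0] = t1 + t3;   /* t5 */
    e[1] = t2 + t4;   /* t6 */
    e[2] = t2 - t4;   /* t7 */
    e[3] = t1 - t3;   /* t8 */
}
```

With bias = 32 the final stores reduce to (t5 + t1) >> 6 and so on, which
saves eight additions per column.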

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero




