[FFmpeg-devel] [PATCH] Altivec implementation of int32_to_float_fmul_scalar
Guillaume POIRIER
poirierg
Tue Dec 16 10:29:45 CET 2008
Hello,
On Tue, Dec 16, 2008 at 10:21 AM, Luca Barbato <lu_zero at gentoo.org> wrote:
> Guillaume POIRIER wrote:
>> Damn, I feel stupid! (all the more since I didn't understand why you
>> wrote that at first....)
>>
>> Here it is now!
>
> What about unaligned cases?
Assuming that the SSE2 version is correct:
static void int32_to_float_fmul_scalar_sse2(float *dst, const int *src, float mul, int len)
{
    x86_reg i = -4*len;
    __asm__ volatile(
        "movss  %3, %%xmm4 \n"
        "shufps $0, %%xmm4, %%xmm4 \n"
        "1: \n"
        "cvtdq2ps   (%2,%0), %%xmm0 \n"
        "cvtdq2ps 16(%2,%0), %%xmm1 \n"
        "mulps  %%xmm4, %%xmm0 \n"
        "mulps  %%xmm4, %%xmm1 \n"
        "movaps %%xmm0,   (%1,%0) \n"
        "movaps %%xmm1, 16(%1,%0) \n"
        "add $32, %0 \n"
        "jl 1b \n"
        :"+r"(i)
        :"r"(dst+len), "r"(src+len), "m"(mul)
    );
}
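Then we don't need to worry about the unaligned case, since the SSE2
version doesn't handle it either: movaps and cvtdq2ps with a memory
operand both require 16-byte-aligned addresses.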
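For reference, here is a rough plain C sketch of what the loop above computes and of what it silently assumes about its arguments. The names int32_to_float_fmul_scalar_ref, is_aligned16 and sse2_preconditions_ok are made up for this mail and are not existing FFmpeg functions. The preconditions are read off the asm: both pointers 16-byte aligned, and len a positive multiple of 8, since each iteration handles 32 bytes and the loop body runs at least once.

```c
#include <stdint.h>

/* Hypothetical scalar reference: convert each int to float and scale it. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int *src,
                                           float mul, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Hypothetical helper: non-zero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical check of what the SSE2 loop takes for granted:
 * aligned src and dst, and a positive len that is a multiple of 8
 * (two 4-element vectors per iteration, body executed at least once). */
static int sse2_preconditions_ok(const float *dst, const int *src, int len)
{
    return is_aligned16(dst) && is_aligned16(src) &&
           len > 0 && len % 8 == 0;
}
```

A caller that cannot guarantee these conditions would have to fall back to the scalar loop.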
> Besides that, it looks ok
Good. BTW, can you confirm that Altivec has no instruction for a
"plain" floating-point multiplication, only a vectorized multiply-add?
Guillaume
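PS: if only a multiply-add is available, a plain product can be obtained by adding zero. Below is a minimal scalar sketch in portable C, standing in for a single vector lane. It is not Altivec code, and madd and mul_via_madd are hypothetical names. The addend is -0.0 rather than +0.0 so that a product equal to -0.0 keeps its sign, since under round-to-nearest (-0.0) + (+0.0) gives +0.0.

```c
/* Hypothetical stand-in for one lane of a multiply-add: a * b + c. */
static float madd(float a, float b, float c)
{
    return a * b + c;
}

/* Plain multiplication expressed through multiply-add.
 * Adding -0.0f leaves every product unchanged, including -0.0f. */
static float mul_via_madd(float a, float b)
{
    return madd(a, b, -0.0f);
}
```

With a +0.0f addend the results would be the same except when the product is -0.0f, which would come out as +0.0f.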
--
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
Stephen Leacock - "I detest life-insurance agents: they always argue
that I shall some day die, which is not so."