[FFmpeg-devel] [RFC/PATCH] More flexible variafloat_to_int16 , WMA optimization, Vorbis
Loren Merritt
lorenm
Tue Jul 15 23:01:12 CEST 2008
On Tue, 15 Jul 2008, Michael Niedermayer wrote:
> On Tue, Jul 15, 2008 at 08:58:23AM -0600, Loren Merritt wrote:
>> On Tue, 15 Jul 2008, Michael Niedermayer wrote:
> [...]
>>> It also might be worth looking at mplayer/liba52/resample_mmx.c, maybe
>>> some of that code could be reused. Especially as we do not have an MMX
>>> float_to_int16; besides, the trick used could be tried with SSE2.
>>
>> I'm not very interested in optimizing for pentium2 / k6-1. I'm not sure I
>> could, anyway; that's so far removed from anything I can benchmark on.
>
> Well, maybe you are interested in a Merom-2M
> Your SSE2 : 16009
> My ancient MMX trick ported to SSE2 : 14764
Don't forget to include the cost of add_bias, since you're returning to
[384.0,386.0] scale.
Merom-2M (T5470), 1024 samples, 2 channels
svn sse2 : 14751
your sse2: 13630 + bias during windowing or something
below : 17237
@@ -2223,9 +2225,15 @@
)
FLOAT_TO_INT16_INTERLEAVE(sse2,
+ "movdqa ff_pd_0x43c08000, %%xmm7 \n"
+ "movdqa ff_ps_385, %%xmm6 \n"
"1: \n"
- "cvtps2dq (%2,%0), %%xmm0 \n"
- "cvtps2dq (%3,%0), %%xmm1 \n"
+ "movdqa (%2,%0), %%xmm0 \n"
+ "movdqa (%3,%0), %%xmm1 \n"
+ "addps %%xmm6, %%xmm0 \n"
+ "addps %%xmm6, %%xmm1 \n"
+ "psubd %%xmm7, %%xmm0 \n"
+ "psubd %%xmm7, %%xmm1 \n"
"packssdw %%xmm1, %%xmm0 \n"
"movhlps %%xmm0, %%xmm1 \n"
"punpcklwd %%xmm1, %%xmm0 \n"
--Loren Merritt
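
For anyone following the patch above, here is a scalar sketch of what the added addps/psubd pair appears to do. It assumes the input samples are in [-1.0, 1.0] scale and that ff_ps_385 holds 385.0f in every lane; both are my reading of the diff, not something the mail states. The function name is hypothetical and not from FFmpeg. Adding 385.0f moves the sample into [384.0, 386.0], where one float ulp is 2^-15, so the low mantissa bits are already the 16-bit sample. 0x43c08000 is the IEEE-754 bit pattern of 385.0f, and subtracting it as an integer leaves the signed sample, which packssdw then saturates.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar equivalent of one lane of the SSE2 loop.
 * Assumes x is in [-1.0, 1.0] and the default round-to-nearest mode. */
static int16_t float_to_int16_bias(float x)
{
    float biased = x + 385.0f;          /* addps: now in [384.0, 386.0] */
    uint32_t bits;
    int32_t v;

    memcpy(&bits, &biased, sizeof bits); /* reinterpret the float as an integer */
    v = (int32_t)(bits - 0x43c08000u);   /* psubd: remove the bits of 385.0f */

    /* packssdw: signed saturation to 16 bits */
    if (v > 32767)
        v = 32767;
    if (v < -32768)
        v = -32768;
    return (int16_t)v;
}
```

With these assumptions, 0.5f maps to 16384 and -1.0f to -32768, while 1.0f lands one step past the top of the range and is clamped to 32767 by the saturation step. Input outside [-1.0, 1.0] changes the float exponent and no longer produces a meaningful sample.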