[MPlayer-dev-eng] [PATCH] Make mp3lib SIMD optimizations work on AMD64, Part 2
Zuxy Meng
zuxy.meng at gmail.com
Mon May 21 03:55:39 CEST 2007
Hi,
2007/5/20, Guillaume POIRIER <poirierg at gmail.com>:
> Hi
>
> On 5/19/07, Zuxy Meng <zuxy.meng at gmail.com> wrote:
> > As discussed with Guillaume on IRC, I'll split my previous big patch
> > (Rewrite synth_1to1_MMX....) into several small parts for easier
> > review. Here's the first one, rewriting the generic code in
> > synth_1to1_MMX from assembly to C, so we don't need to deal with
> > different ABIs. I've tested it and confirmed it doesn't hurt
> > performance.
> >
> > Note I removed a conditional jump in the remaining assembly too. By
> > analyzing the code I'm sure it's never taken so don't worry about
> > that. Strictly speaking it should be in a separate patch but then this
> > patch would break mplayer...
>
> Patch Ok with me. I also tested it on AMD64: now mp3lib/decode_MMX.o
> can compile.
>
> You can apply your patch whenever you feel like it.
>
>
> > Part 2 will replace 32-bit leal with equivalent add/sub (without the 'l'
> > suffix) so pointer arithmetic will be 64-bit under amd64.
> >
> > Part 3 will remove hardcoded registers.
> >
> > Part 4 will kill tabinit_mmx.c. We don't need to compute the table at
> > runtime; it can be predetermined.
> >
> > Part 5 will correct data types, replacing 'long' with 'int' where necessary.
> >
> > The last patch will deal with Makefile and macros.
>
> I'm happy with this schedule, I look forward to reviewing these patches.
Thanks. Part 1 committed. Here is Part 2, which replaces leal with
add/sub so that pointer arithmetic will be 64-bit under AMD64.
According to AMD's and Intel's manuals, add/sub is faster than lea on K8
and P4 and has the same latency on Pentium M and Core 2.
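For illustration only, here is a minimal sketch of why dropping the 'l' suffix matters. It is not code from mp3lib: advance64 and check_advance64 are hypothetical names, and it assumes the register comes from an operand constraint, which this patch does not do yet (the registers are still hardcoded until Part 3). With no suffix, the assembler takes the operand size from the register GCC substitutes, so the same source line would give a 32-bit add on x86 and a 64-bit add on AMD64.

```c
#include <assert.h>

/* Hypothetical helper, not part of mp3lib: step a pointer forward 64 bytes. */
static char *advance64(char *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* No size suffix on "add": the assembler derives the operand size from
     * the register chosen for %0, which is pointer-sized on both targets.
     * Unlike lea, add writes the flags, hence the "cc" clobber. */
    __asm__ ("add $64, %0" : "+r"(p) : : "cc");
#else
    p += 64; /* plain C fallback for other compilers and targets */
#endif
    return p;
}

/* Hypothetical self-check: the pointer must have moved exactly 64 bytes. */
static void check_advance64(void)
{
    static char buf[128];
    assert(advance64(buf) - buf == 64);
}
```

The flags difference should be harmless in the patched loops, since each add/sub group is followed by decl, which sets the zero flag again before jnz tests it.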
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6
-------------- next part --------------
Index: mp3lib/decode_MMX.c
===================================================================
--- mp3lib/decode_MMX.c (revision 23360)
+++ mp3lib/decode_MMX.c (working copy)
@@ -124,9 +124,9 @@
"por %%mm0, %%mm1\n\t"
"movq %%mm1,(%%edi)\n\t"
- "leal 64(%%esi),%%esi\n\t"
- "leal 128(%%edx),%%edx\n\t"
- "leal 8(%%edi),%%edi\n\t"
+ "add $64,%%esi\n\t"
+ "add $128,%%edx\n\t"
+ "add $8,%%edi\n\t"
"decl %%ecx\n\t"
"jnz .L03\n\t"
@@ -149,11 +149,10 @@
"packssdw %%mm0,%%mm0\n\t"
"movd %%mm0,%%eax\n\t"
"movw %%ax, (%%edi)\n\t"
- "leal 32(%%esi),%%esi\n\t"
- "leal 64(%%edx),%%edx\n\t"
- "leal 4(%%edi),%%edi\n\t"
-
- "subl $64,%%esi\n\t"
+ "sub $32,%%esi\n\t"
+ "add $64,%%edx\n\t"
+ "add $4,%%edi\n\t"
+
"movl $7,%%ecx\n\t"
ASMALIGN(4)
".L04:\n\t"
@@ -201,9 +200,9 @@
"por %%mm0, %%mm1\n\t"
"movq %%mm1,(%%edi)\n\t"
- "subl $64,%%esi\n\t"
- "addl $128,%%edx\n\t"
- "leal 8(%%edi),%%edi\n\t"
+ "sub $64,%%esi\n\t"
+ "add $128,%%edx\n\t"
+ "add $8,%%edi\n\t"
"decl %%ecx\n\t"
"jnz .L04\n\t"