[FFmpeg-devel] [PATCH] Fix apply_welch_window_sse2 compilation on Mac OS X/x86
Pierre d'Herbemont
pdherbemont
Thu Oct 18 15:00:28 CEST 2007
On Oct 18, 2007, at 1:17 PM, Loren Merritt wrote:
> The whole point of splitting the asm block was to allow gcc to spill
> registers in between, because it doesn't have 6 general regs free. And
> look at your own disassembly: it did. So you jump from the 2nd asm block
> to the 1st without running the appropriate spilling code, and run the 1st
> block with the register values from the 2nd. Then you run the
> initialization code for the 2nd block again, which gcc expected to only
> run once.
Sorry, I missed that. Thanks for the explanation.
> BTW, spilling shouldn't be needed. It's possible to write the loop with 5
> regs, but that's slower than 6 if you have 6. Ideally I'd be able to write
> the loop part in C and gcc would use 5 or 6 regs for addressing depending
> on what's available, but that's not what happens in practice.
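Writing the loop part in C, as suggested above, might look roughly like the sketch below. It is a hypothetical illustration, not FFmpeg's code: the name welch_window_c is invented, and it assumes the window w(n) = 1 - (c*n - 1)^2 with c = 2/(len-1) (the same c the patch computes) and an even len. Addressing is left entirely to the compiler, which may use as many registers as it has free. It also computes each weight directly instead of stepping it incrementally as the SSE2 code appears to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C Welch window loop; not FFmpeg's implementation.
 * Assumes w(n) = 1 - (c*n - 1)^2 with c = 2/(len-1). This window is
 * symmetric, w(n) == w(len-1-n), so one weight serves both ends.
 * Also assumes an even len, so both halves hold n2 = len/2 samples. */
static void welch_window_c(const int32_t *data, int len, double *w_data)
{
    assert(len >= 2 && (len & 1) == 0);
    double c = 2.0 / (len - 1.0);
    int n2 = len >> 1;
    for (int n = 0; n < n2; n++) {
        double x = c * n - 1.0;
        double w = 1.0 - x * x;            /* weight shared by n and len-1-n */
        w_data[n]           = data[n] * w;  /* front half, walking forward */
        w_data[len - 1 - n] = data[len - 1 - n] * w; /* back half, walking backward */
    }
}
```

Whether gcc turns this into 5- or 6-register addressing (or vectorizes it at all) depends on the compiler version and flags, which is exactly the uncertainty raised in the quote.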
I see your point. Here is an attempt at the 5-register version; I suspect
it is not far from Thorsten Jordan's version.
Compared with the 6-register loop, it adds two negl instructions to remove
one sub, so it is slower. It would be nice to keep the 6-register version
around, or to rewrite the loop in C as you proposed.
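The 5-register trick relies on the fact that, in the 6-register loop, j starts at n2*4 while i starts at -n2*4, and every pass subtracts 8 from j and adds 8 to i, so j always equals -i. The hypothetical C model below (counters_agree is an invented name; it touches no memory and no SIMD registers) replays only the counter updates of both loops and checks that they produce the same front and back offsets. It assumes n2 is a positive even number, since each pass consumes two int32_t samples (8 bytes) and the asm loop body always runs at least once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the loop counters only; not FFmpeg code.
 * 6-register loop: front offset i, back offset j, then "sub $8, j"
 * and "add $8, i".
 * 5-register loop: front offset i; negate it, use -i as the back
 * offset, negate it again, then "add $8, i". */
static bool counters_agree(long n2)
{
    assert(n2 > 0 && n2 % 2 == 0);   /* two int32_t (8 bytes) per pass */
    long i6 = -n2 * 4, j6 = n2 * 4;  /* 6-register counters */
    long i5 = -n2 * 4;               /* single 5-register counter */
    do {
        long front5 = i5;
        i5 = -i5;                    /* first negl */
        long back5 = i5;
        i5 = -i5;                    /* second negl restores i */
        if (front5 != i6 || back5 != j6)
            return false;
        j6 -= 8;                     /* the sub dropped by the 5-register loop */
        i6 += 8;
        i5 += 8;
    } while (i5 < 0);                /* "jl 1b" tests the result of the add */
    return i5 == i6 && j6 == -i6;
}
```

The model shows why the second counter is redundant, and also why the saving costs an extra instruction per pass: one sub is replaced by two negations.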
Pierre.
Index: libavcodec/i386/dsputil_mmx.c
===================================================================
--- libavcodec/i386/dsputil_mmx.c (revision 10759)
+++ libavcodec/i386/dsputil_mmx.c (working copy)
@@ -2967,7 +2967,6 @@
double c = 2.0 / (len-1.0);
int n2 = len>>1;
long i = -n2*sizeof(int32_t);
- long j = n2*sizeof(int32_t);
asm volatile(
"movsd %0, %%xmm7 \n\t"
"movapd %1, %%xmm6 \n\t"
@@ -2985,17 +2984,18 @@
"movapd %%xmm6, %%xmm0 \n\t"\
"subpd %%xmm1, %%xmm0 \n\t"\
"pshufd $0x4e, %%xmm0, %%xmm1 \n\t"\
- "cvtpi2pd (%4,%0), %%xmm2 \n\t"\
- "cvtpi2pd (%5,%1), %%xmm3 \n\t"\
+ "cvtpi2pd (%3,%0), %%xmm2 \n\t"\
"mulpd %%xmm0, %%xmm2 \n\t"\
+ "movapd %%xmm2, (%1,%0,2) \n\t"\
+ "negl %0\n\t"\
+ "cvtpi2pd (%4,%0), %%xmm3 \n\t"\
"mulpd %%xmm1, %%xmm3 \n\t"\
- "movapd %%xmm2, (%2,%0,2) \n\t"\
- MOVPD" %%xmm3, (%3,%1,2) \n\t"\
+ MOVPD" %%xmm3, (%2,%0,2) \n\t"\
"subpd %%xmm5, %%xmm7 \n\t"\
- "sub $8, %1 \n\t"\
+ "negl %0\n\t"\
"add $8, %0 \n\t"\
"jl 1b \n\t"\
- :"+&r"(i), "+&r"(j)\
+ :"+&r"(i)\
:"r"(w_data+n2), "r"(w_data+len-2-n2),\
"r"(data+n2), "r"(data+len-2-n2)\
);