[FFmpeg-devel] [FFmpeg-cvslog] r12171 - trunk/doc/optimization.txt
Michael Niedermayer
michaelni
Thu Feb 21 20:11:20 CET 2008
On Thu, Feb 21, 2008 at 08:52:17PM +0200, İsmail Dönmez wrote:
> Hi,
>
> >Author: melanson
> >Date: Thu Feb 21 19:46:49 2008
> >New Revision: 12171
> >
> >Log:
> >minor English corrections
> >
> >
> >Modified:
> > trunk/doc/optimization.txt
> [...]
> > -Use asm() instead of intrinsics. Later requires a good optimizing compiler
> > +Use asm() instead of intrinsics. The latter requires a good optimizing compiler
> > which gcc is not.
>
> We all know this is FUD now. I know Michael still uses gcc 2.95, but
> the world has moved on. GCC 4.3 is about to be released.
> So please either back up these claims or note that this is not true for
> recent GCCs.
I use gcc r132072 ATM. I admit it's a few days old; do you claim that gcc
was rewritten yesterday?
Also, to back up the claim, the following change was suggested to me a few days ago:
-static inline void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, int stride)
+static void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, long stride)
{
- asm volatile(
- "pxor %%mm7, %%mm7 \n\t"
- "mov $-128, %%"REG_a" \n\t"
- ASMALIGN(4)
- "1: \n\t"
- "movq (%0), %%mm0 \n\t"
- "movq (%1), %%mm2 \n\t"
- "movq %%mm0, %%mm1 \n\t"
- "movq %%mm2, %%mm3 \n\t"
- "punpcklbw %%mm7, %%mm0 \n\t"
- "punpckhbw %%mm7, %%mm1 \n\t"
- "punpcklbw %%mm7, %%mm2 \n\t"
- "punpckhbw %%mm7, %%mm3 \n\t"
- "psubw %%mm2, %%mm0 \n\t"
- "psubw %%mm3, %%mm1 \n\t"
- "movq %%mm0, (%2, %%"REG_a") \n\t"
- "movq %%mm1, 8(%2, %%"REG_a") \n\t"
- "add %3, %0 \n\t"
- "add %3, %1 \n\t"
- "add $16, %%"REG_a" \n\t"
- "jnz 1b \n\t"
- : "+r" (s1), "+r" (s2)
- : "r" (block+64), "r" ((long)stride)
- : "%"REG_a
- );
+ long offset = -128;
+ MOVQ_ZERO(mm7);
+ do {
+ asm volatile(
+ "movq (%0), %%mm0 \n\t"
+ "movq (%1), %%mm2 \n\t"
+ "movq %%mm0, %%mm1 \n\t"
+ "movq %%mm2, %%mm3 \n\t"
+ "punpcklbw %%mm7, %%mm0 \n\t"
+ "punpckhbw %%mm7, %%mm1 \n\t"
+ "punpcklbw %%mm7, %%mm2 \n\t"
+ "punpckhbw %%mm7, %%mm3 \n\t"
+ "psubw %%mm2, %%mm0 \n\t"
+ "psubw %%mm3, %%mm1 \n\t"
+ "movq %%mm0, (%2, %4) \n\t"
+ "movq %%mm1, 8(%2, %4) \n\t"
+ : : "r" (s1), "r" (s2), "r" (block+64), "r" (stride), "r" (offset)
+ : "memory");
+ s1 += stride;
+ s2 += stride;
+ offset += 16;
+ } while (offset < 0);
}
The effect this has on the generated asm is:
.L143:
.loc 3 241 0
leaq (%rsi,%r8), %rdx
leaq (%r10,%r8), %rax
#APP
# 241 "dsputil_mmx.c" 1
movq (%rdx), %mm0
movq (%rax), %mm2
movq %mm0, %mm1
movq %mm2, %mm3
punpcklbw %mm7, %mm0
punpckhbw %mm7, %mm1
punpcklbw %mm7, %mm2
punpckhbw %mm7, %mm3
psubw %mm2, %mm0
psubw %mm3, %mm1
movq %mm0, (%rdi, %r9)
movq %mm1, 8(%rdi, %r9)
# 0 "" 2
.loc 3 258 0
#NO_APP
addq %rcx, %r8
.loc 3 259 0
addq $16, %r9
jne .L143
-------------
As you can see, gcc injects 2 unneeded lea instructions (the two leaq at the
top of .L143) into the innermost loop.
And I think this is very simple asm. If you want, you can try this with some
more complex code, but I recommend that you have a few vomit bags ready ...
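
For reference, here is a minimal scalar C sketch of what both versions of
diff_pixels_mmx above compute: an 8x8 block of 16-bit differences s1 - s2,
with source rows `stride` bytes apart. The name diff_pixels_ref is made up
for illustration, and the typedef of DCTELEM as a 16-bit signed integer is
an assumption, since its definition is not shown in this mail.

```c
#include <stdint.h>

/* Assumption: DCTELEM is a 16-bit signed type (not shown in this mail). */
typedef int16_t DCTELEM;

/* Hypothetical scalar reference, not part of the patch under discussion.
 * Writes 64 differences to block[0..63], one row of 8 per iteration,
 * matching the 8 loop iterations (offset -128 .. -16) of the asm. */
static void diff_pixels_ref(DCTELEM *block, const uint8_t *s1,
                            const uint8_t *s2, long stride)
{
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            block[col] = (DCTELEM)(s1[col] - s2[col]);
        /* Advance both sources by one line, as "add %3, %0" and
         * "add %3, %1" do in the single-block asm version. */
        s1 += stride;
        s2 += stride;
        block += 8;
    }
}
```

In the single-block version the two pointers are updated inside the asm
through the "+r" operands, so the loop needs no separate address
computation; in the split version the compiler owns the pointer updates,
which is where the extra lea instructions come from.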
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The educated differ from the uneducated as much as the living from the
dead. -- Aristotle