[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)
Paweł Sikora
pluto
Wed Feb 1 01:56:21 CET 2006
On Wednesday, 1 February 2006 01:39, Aurelien Jacobs wrote:
> Paweł Sikora <pluto at pld-linux.org> wrote:
> > hmmm, the 4.1/4.0 fixed_transpose4x4 are equal but the benchmarks differ.
> > maybe orig_transpose4x4 has a different prologue?
>
> seems so.
>
> > [ 4.1 / -O2 ]
> > orig_transpose4x4:
> > leal (%rdx,%rdx), %r9d
> > leal (%rcx,%rcx), %eax
> > movslq %edx,%r11
> > movslq %ecx,%r8
> > movslq %r9d,%r10
> > addl %edx, %r9d
> > movslq %eax,%rdx
> > addl %ecx, %eax
> > movslq %r9d,%r9
> > cltq
> [ 4.0 / -O2 ]
> orig_transpose4x4:
> leal (%rdx,%rdx), %r8d
> movslq %edx,%r10
> leaq (%rcx,%rcx,2), %rax
> movslq %r8d,%r9
> addl %edx, %r8d
> movslq %r8d,%r8
yeah, 4.1 generates worse code here, so my first benchmark can be sent
to /dev/null. moreover, the second fix (s/int/long/) simplifies the x86-64
prologue and gives a measurable gain.
thx for the tests.
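
A minimal sketch of why s/int/long/ can shorten the prologue, assuming an
LP64 x86-64 target (32-bit int, 64-bit long and pointers). The function
names sum_rows_int and sum_rows_long are hypothetical and are not the
FFmpeg code; they only mimic the "base pointer plus multiples of stride"
addressing that transpose4x4 uses. With an int stride the compiler
typically has to widen each multiple of the stride to 64 bits before using
it as an offset (the movslq/cltq instructions in the listings above); with
a long stride the offsets are already pointer-sized.

```c
#include <stdint.h>

/* Hypothetical example: stride is a 32-bit int. 2*stride and 3*stride
 * are computed in 32 bits and must then be sign-extended to 64 bits
 * before they can be added to the 64-bit pointer. */
int sum_rows_int(const uint8_t *src, int stride)
{
    return src[0] + src[stride] + src[2 * stride] + src[3 * stride];
}

/* Same computation with a long stride. On LP64 the multiples are
 * already 64 bits wide, so no separate widening step is required. */
long sum_rows_long(const uint8_t *src, long stride)
{
    return src[0] + src[stride] + src[2 * stride] + src[3 * stride];
}
```

Compiling both with -O2 -S and comparing the generated assembler should
show whether the compiler in use still emits the extra sign extensions for
the int variant.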
--
to_be || !to_be == 1, to_be | ~to_be == -1