[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

matthieu castet castet.matthieu
Tue Jan 31 21:25:29 CET 2006


Hi Paweł,

Paweł Sikora wrote:
> Hi all,
> 
> I have an implementation of transpose4x4 in C which uses gcc's vector
> extensions. It puts less pressure on the register allocator and allows
> optimal code scheduling.
> 
> Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
> gives a nice piece of code:
> 
> [ x86-64 example ]
> 
> foo:    movd        4(%rsi), %mm0
>         movd        (%rsi), %mm1
>         movd        8(%rsi), %mm2
>         movd        12(%rsi), %mm3
>         punpcklbw   %mm0, %mm1
>         punpcklbw   %mm3, %mm2
>         movq        %mm1, %mm0
>         punpckhwd   %mm2, %mm1
>         punpcklwd   %mm2, %mm0
>         movd        %mm1, 8(%rdi)
>         punpckhdq   %mm1, %mm1
>         movd        %mm0, (%rdi)
>         punpckhdq   %mm0, %mm0
>         movd        %mm1, 12(%rdi)
>         movd        %mm0, 4(%rdi)
>         ret
> 
> Actually, gcc-4.1 has a good optimizer. Hard-coding the asm by hand
> doesn't bring any great performance boost, only a degradation
> of code scheduling.
Could you post a benchmark comparing the two versions?
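
For anyone reading without the attachment, here is a rough sketch of what a transpose4x4 written with gcc's vector extensions might look like, next to a plain scalar version to compare against. This is not the attached patch. The names v8u8, SHUF8, transpose4x4_vec and transpose4x4_ref are made up for illustration, and the sketch relies on __builtin_shuffle (or __builtin_shufflevector under clang), which are newer than gcc-4.1. The shuffle steps follow the punpcklbw / punpcklwd / punpckhwd sequence in the listing above.

```c
#include <stdint.h>
#include <string.h>

/* Eight bytes in one 64-bit vector (GCC/Clang vector extension). */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Two-input byte shuffle: indices 0-7 pick from a, 8-15 pick from b.
 * Hypothetical helper macro, not part of the original patch. */
#if defined(__clang__)
#define SHUF8(a, b, i0, i1, i2, i3, i4, i5, i6, i7) \
    __builtin_shufflevector((a), (b), i0, i1, i2, i3, i4, i5, i6, i7)
#else
#define SHUF8(a, b, i0, i1, i2, i3, i4, i5, i6, i7) \
    __builtin_shuffle((a), (b), (v8u8){ i0, i1, i2, i3, i4, i5, i6, i7 })
#endif

/* Vector version: transposes a 4x4 block of bytes. */
static void transpose4x4_vec(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    v8u8 r0 = { 0 }, r1 = { 0 }, r2 = { 0 }, r3 = { 0 };
    uint8_t out[16];

    /* Load one 4-byte row into the low half of each vector (like movd). */
    memcpy(&r0, src + 0 * src_stride, 4);
    memcpy(&r1, src + 1 * src_stride, 4);
    memcpy(&r2, src + 2 * src_stride, 4);
    memcpy(&r3, src + 3 * src_stride, 4);

    /* Interleave bytes of two rows: a0 b0 a1 b1 ... (like punpcklbw). */
    v8u8 ab = SHUF8(r0, r1, 0, 8, 1, 9, 2, 10, 3, 11);
    v8u8 cd = SHUF8(r2, r3, 0, 8, 1, 9, 2, 10, 3, 11);

    /* Interleave 16-bit pairs (like punpcklwd / punpckhwd):
     * lo = a0 b0 c0 d0 a1 b1 c1 d1, hi = a2 b2 c2 d2 a3 b3 c3 d3. */
    v8u8 lo = SHUF8(ab, cd, 0, 1, 8, 9, 2, 3, 10, 11);
    v8u8 hi = SHUF8(ab, cd, 4, 5, 12, 13, 6, 7, 14, 15);

    /* Store the four transposed rows, 4 bytes each. */
    memcpy(out, &lo, 8);
    memcpy(out + 8, &hi, 8);
    for (int i = 0; i < 4; i++)
        memcpy(dst + i * dst_stride, out + 4 * i, 4);
}

/* Scalar reference to check (and time) the vector version against. */
static void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[x * dst_stride + y] = src[y * src_stride + x];
}
```

Calling both on the same 16-byte block with strides of 4 and comparing the outputs with memcmp gives a quick sanity check before timing anything. Whether the compiler turns the shuffles into the MMX sequence quoted above depends on the gcc version and flags.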




