[Ffmpeg-devel] [REQUEST] MMX/MMX2 and SSE optimizations for H.264 decoding

Mon Sep 19 12:23:42 CEST 2005

Michael Niedermayer wrote:
> On Fri, Sep 16, 2005 at 02:20:42PM +0200, Martin Boehme wrote:
>>Loren Merritt wrote:
>>>On Thu, 15 Sep 2005, Martin Boehme wrote:
>>>>Gamester17 wrote:
>>>>
>>>>>Yes, there already are some MMX integer optimizations for H.264, but 
>>>>>what about SSE (Streaming SIMD Extensions) optimizations? Isn't SSE 
>>>>>supposed to be much more powerful than MMX (and in fact the thing 
>>>>>that replaces MMX)?
>>>>
>>>>Well, for a start, SSE has registers that are 128 bits wide, while 
>>>>MMX's registers are 64 bits. As long as you're operating only on the 
>>>>registers (i.e. you're CPU-bound, not memory bandwidth limited) that's 
>>>>an instant factor of 2 speedup.
>>>
>>>On AMD, most SSE2 instructions take exactly twice as long as the 
>>>equivalent MMX instruction. Any speedups are due only to scheduling.
>>>In x264, we have a bunch of SSE2 functions, but most of them are 
>>>_slower_ than the MMX versions on AMD.
>>
>>Interesting -- wasn't aware of that. I would assume that the AMD 
>>processors only have enough execution units for 64 bits worth of data 
>>and have to do SSE operations in two gos?
> 
> Dunno, but
> AFAIK the P4 (at least the older ones) has 2 MMX units running at half the
> CPU clock speed, so it can execute either 1 MMX instruction per clock or
> 1 SSE(2) instruction every 2 clocks, with a very small number of exceptions.
> Further note that execution itself isn't the only thing that can be a 
> bottleneck ...

Interesting, wasn't aware of that... it's probably chip space 
considerations that play into that, given that there shouldn't be 
any dependencies between the individual "elements" of the vector units?

Martin

-- 
Martin Böhme
Inst. f. Neuro- and Bioinformatics
Ratzeburger Allee 160, D-23538 Luebeck
Phone: +49 451 500 5514
Fax:   +49 451 500 5502
boehme at inb.uni-luebeck.de