[FFmpeg-devel] [RFC] snow SSE2 optimizations
Tue Aug 28 18:06:06 CEST 2007
On Tue, Aug 28, 2007 at 05:10:18PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > On Tue, Aug 28, 2007 at 01:09:54PM +0200, Guillaume POIRIER wrote:
> >> Exactly. You need a CPU that has a full-width (128-bit) ALU to almost
> >> guarantee that SSE will be faster. Core2 and the upcoming K10 have
> >> full-width SSE ALUs.
> > another way to say it is that you need a cpu which has 2 mmx units and
> > can use both for sse instructions but can only use 1 for mmx
> > if that is a step in the correct direction well ...
> Start guessing why there is just one altivec (across 3 generations of
> cpus) and SPU is still quite similar...
> The Intel instruction set design wasn't and isn't the smartest, and
> they keep adding irregular changes...
true, but it also isn't the most stupid; sparc-vis beats them by quite a bit.
and i don't think a perfectly regular set is a good idea either, because
90% of the resulting instructions are never used by anyone, yet the
cpu must support them, which makes the cpu more complex and slower
where i think intel did mess up is:
* the mmx design, which reuses the floating point registers, is sick
* the fact that both mmx and sse have just 8 registers is sick;
  it was well known that 8 is a limiting factor in many cases,
  and with IA64 intel demonstrated that you can just as well do it wrong
  in the opposite direction by having hundreds of registers ...
* i want 8bit shifts, signed average, pack with shift and rounding, and
  some lea-like instruction for mmx
* the stack based FPU registers ...
* having implicit source and destination registers for some instructions,
  like the 32x32->64 bit multiply
* an integer fixed point multiply (multiply + rounding + shift down) like
  pmulhrsw but for normal integers is missing ...
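
To make the last wish concrete, here is a minimal scalar sketch in C of what such an operation computes. The function names mulhrs16 and mulhrs32 are hypothetical and exist only for illustration. mulhrs16 mirrors what pmulhrsw does to a single signed 16-bit lane; mulhrs32 is one guess at what the analogous operation on normal 32-bit integers could look like, not an existing instruction.

```c
#include <stdint.h>

/* One 16-bit lane of pmulhrsw: multiply, add the rounding constant,
 * shift down by 15 (Q15 fixed point).
 * Assumption: >> on a negative signed value is an arithmetic shift.
 * That holds for gcc and clang but is implementation-defined in ISO C.
 * As with the instruction, -32768 * -32768 wraps back to -32768. */
static inline int16_t mulhrs16(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;               /* full 32-bit product */
    return (int16_t)((prod + (1 << 14)) >> 15);  /* round, then shift down */
}

/* Hypothetical 32-bit analogue (Q31): the same multiply + rounding +
 * shift down, built on a 32x32->64 bit multiply. */
static inline int32_t mulhrs32(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * b;                        /* 64-bit product */
    return (int32_t)((prod + (INT64_C(1) << 30)) >> 31);  /* round, shift */
}
```

With Q15 operands, mulhrs16(16384, 16384) is 0.5 * 0.5 and returns 8192, i.e. 0.25. In plain integer code the same thing takes a widening multiply, an add and a shift, which is the sequence a single instruction would replace.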
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates