[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)

Guillaume POIRIER poirierg
Tue Aug 28 13:09:54 CEST 2007


Hi,

On 8/28/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
> > Right, right, I just missed a few lines of code while reading the C
> > version, thus the confusion.
> > Since the diff is unreadable, do you think the following is better than
> > the current code (I mean visually, it does decode correctly after all ;-),
> > though it is not measurably faster than the mmx code on my PC):
>
> SSE2 is rarely faster than MMX; that's because most CPUs need 2x as long to
> execute SSE2 instructions as MMX ones ...

Exactly. You need a CPU that has a full-width (128-bit) ALU to almost
guarantee that SSE will be faster. Core2 and the upcoming K10 have
full-width SSE ALUs.
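
To put rough numbers on that, here is a back-of-the-envelope sketch in C.
It is only a toy model under my own assumptions, not a measurement: the
helper name alu_passes is made up for illustration, and it assumes that an
instruction wider than the ALU is simply split into several ALU passes,
ignoring decode, latency and memory effects.

```c
#include <assert.h>

/* Hypothetical toy model (an assumption, not a benchmark): count the ALU
 * passes needed to process `bytes` bytes with vector instructions that are
 * `reg_bits` wide on an ALU that is `alu_bits` wide. An instruction wider
 * than the ALU is assumed to be split into several passes. */
static unsigned long alu_passes(unsigned long bytes,
                                unsigned reg_bits, unsigned alu_bits)
{
    assert(reg_bits > 0 && alu_bits > 0);

    /* Instructions needed to cover the data, rounded up. */
    unsigned long instructions = (bytes * 8 + reg_bits - 1) / reg_bits;

    /* ALU passes per instruction, rounded up (1 if the ALU is wide enough). */
    unsigned long passes_per_insn = (reg_bits + alu_bits - 1) / alu_bits;

    return instructions * passes_per_insn;
}
```

For 1024 bytes on a 64-bit ALU this gives 128 passes for both MMX
(alu_passes(1024, 64, 64)) and SSE2 (alu_passes(1024, 128, 64)), so no
gain. On a 128-bit ALU, SSE2 drops to 64 passes while MMX stays at 128.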

Guillaume
-- 
A soldier will fight long and hard for a bit of colored ribbon.
 -- Napoleon Bonaparte



