[FFmpeg-devel] [PATCH] Add x86-optimized function ac3_or_abs_int16() and use in log2_tab().

Reimar Döffinger Reimar.Doeffinger
Sat Feb 12 11:49:00 CET 2011


On Fri, Feb 11, 2011 at 08:10:15PM -0500, Justin Ruggles wrote:
> On 02/11/2011 07:55 PM, Justin Ruggles wrote:
> 
> > +%macro PABSW2_MMX 6 ; dst1, dst2, src1, src2, temp1, temp2
> > +    mova    %1, %3
> > +    mova    %2, %4
> > +    mova    %5, %1
> > +    mova    %6, %2
> > +    psraw   %5, 15
> > +    psraw   %6, 15
> > +    pxor    %1, %5
> > +    pxor    %2, %6
> > +    psubw   %1, %5
> > +    psubw   %2, %6
> > +%endmacro
> 
> 
> If anyone is wondering why I used 2 temp registers and interleaved the
> instructions instead of using 1 temp register... it is faster on Atom.
> 
>  MMX: 7367 vs. 8966
> SSE2: 4228 vs. 4838

I think it would not hurt to put this kind of thing in a comment next
to the macro; otherwise it is quite likely that someone will change it
at some point without realizing the drawback.
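
For anyone reading the macro without the context: each 16-bit lane goes
through the usual sign-mask absolute value (arithmetic shift right by 15,
xor, subtract). Below is a minimal scalar sketch of that per-lane
arithmetic in C. It is only an illustration, not code from the patch; the
function name abs16_signmask is made up here, and the result is returned
as an unsigned bit pattern so that the wraparound case stays well defined.

```c
#include <stdint.h>

/* Scalar sketch of what one 16-bit lane of the quoted PABSW2_MMX macro
 * computes. The name abs16_signmask is hypothetical, not from the patch. */
static uint16_t abs16_signmask(int16_t x)
{
    uint16_t ux   = (uint16_t)x;

    /* Equivalent of "psraw tmp, 15": 0x0000 for x >= 0, 0xFFFF for x < 0. */
    uint16_t mask = (uint16_t)-(ux >> 15);

    /* Equivalent of "pxor dst, tmp" followed by "psubw dst, tmp".
     * For negative x this is (~x) + 1, i.e. two's-complement negation.
     * The subtraction wraps modulo 2^16, so x = -32768 yields the bit
     * pattern 0x8000 rather than a positive int16 value. */
    return (uint16_t)((ux ^ mask) - mask);
}
```

The two-temp, interleaved form in the macro runs this same sequence on two
registers at once; the arithmetic per lane is unchanged.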


