[Ffmpeg-devel] clever 8-bit MMX loop filter ABS test

Skal skal
Wed May 4 08:53:44 CEST 2005


	Hi Michael,

On Tue, 2005-05-03 at 13:46, Michael Niedermayer wrote:
> Hi
> 
> On Tuesday 03 May 2005 11:03, Skal wrote:
> [...]
> > %macro ABS_LESS_SSE 2     ;   %1:out reg  %2: alpha-1/beta-1  mm0:Px mm1:Qx
> >   ; Trashes mm0,mm1,mm2
> >   movq    mm2, mm0  ; Save Po 
> >   psubusb mm0, mm1  ; Po-Qo
> >   psubusb mm1, mm2  ; Qo-Po
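> >   psubusb mm0,  %2  ; (Po-Qo)-%2, saturated at 0
> >   psubusb mm1,  %2  ; (Qo-Po)-%2, saturated at 0
> >   por     mm1, mm0  ; 0 only where both are 0
> >   pxor     %1,  %1  ; %1 = 0
> >   pcmpeqb  %1, mm1  ; 0xFF where |Po-Qo| <= %2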
> 
>    movq    mm2, mm0  ; Save Po 
>    psubusb mm0, %1   ; Po-Qo
>    psubusb %1, mm2   ; Qo-Po
>    por     %1, mm0   ; |Po-Qo| (one side is 0)
>    psubusb %1,  %2   ; |Po-Qo|-%2, saturated at 0
>    pcmpeqb  %1, mm7  ; mm7 = 0: 0xFF where |Po-Qo| <= %2
> is 2 instructions shorter and should be faster

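	Not necessarily, because of pairing: your sequence
	is a single dependency chain, while mine has two
	independent chains, so more of my instructions pair.
	But once the macro-ized code is expanded inline and
	interleaved with the surrounding code, your version
	will indeed be better, since it uses fewer registers
	and mm0 is preserved, allowing redundant loads to be
	removed across the whole filter.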

-Skal






