[MPlayer-dev-eng] Improved remove-logo filter
Zuxy Meng
zuxy.meng at gmail.com
Thu Sep 14 10:35:39 CEST 2006
Hi,
2006/9/13, Trent Piepho <xyzzy at speakeasy.org>:
> The reason I had _-prefixed variable names no longer applies, so I will
> change those.
>
> The constraints work fine. The 'y' constraint is for an MMX register, so
> gcc will place the 64-bit data into an MMX register it chooses. This way
> gcc has more options of where to put the movq instruction wrt the rest of
> the loop code. The actual code generated:
>
> .L44:
>         addl    $8, %edx        #, i                    i+=8
>         movq    (%ebx), %mm0    #* mask, tmp217         mm0 = *mask
>         addl    $8, %ebx        #, mask                 mask+=8
> #APP
>         pandn   (%ecx), %mm0    #* image.164, tmp217
>         psadbw  (%ecx), %mm0    #* image.164, tmp217
>         movd    %mm0, %eax      # tmp217, _sum
> #NO_APP
>         addl    $8, %ecx        #, image.164            image+=8
>         addl    %eax, %edi      # _sum, accumulator     accumulator += _sum
>         cmpl    %edx, %esi      # i, D.4595             i < logo_mask->width
>         jg      .L44            #,
>
> You can see in the second line, gcc has generated a movq to put *mask into
> mm0 for the asm block. gcc has decided to leave *image in memory, since
> I allowed that with the constraint "ym", so the asm block uses (%ecx). I
> could change the *image constraint to just "y", in which case gcc would
> put a 'movq (%ecx), %mm1' somewhere and the asm block would use mm1. I
> tried this, but it benchmarked slower.
>
> If I had used an "m" constraint and moved the data into MMX registers
> myself, then it could not be mixed in with the loop counter instructions
> for better scheduling.
>
MOVD from an MMX register to a 32-bit general-purpose register (the
"movd %mm0, %eax" above) is slow on AMD CPUs. Maybe using "m" instead of
"r" for sum would be faster?
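
For illustration, here is a minimal sketch of the two variants under
discussion. It is not the code from the patch: the function names
(masked_sum_r, masked_sum_m, row_sum), the USE_MMX_ASM macro and the plain C
fallback are all assumptions made up for this example. The asm body mirrors
the pandn/psadbw/movd sequence quoted above, with "y" for the mask, "ym" for
the image, and either "=r" or "=m" for the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: take the asm path only with GCC on x86 with MMX and SSE enabled
 * (psadbw on MMX registers needs SSE); otherwise use a plain C fallback. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__MMX__) && defined(__SSE__)
#define USE_MMX_ASM 1
#else
#define USE_MMX_ASM 0
#endif

/* Sum of (image[i] & mask[i]) over 8 bytes, result delivered in a GPR ("=r"). */
static unsigned masked_sum_r(const uint8_t *image, const uint8_t *mask)
{
    unsigned sum = 0;
#if USE_MMX_ASM
    uint64_t img, msk;
    memcpy(&img, image, 8);
    memcpy(&msk, mask, 8);
    __asm__ volatile (
        "pandn  %2, %1\n\t"   /* msk = ~msk & img                 */
        "psadbw %2, %1\n\t"   /* msk = sum of |msk - img| bytes   */
        "movd   %1, %0"       /* MMX register -> 32-bit GPR       */
        : "=r"(sum), "+y"(msk)
        : "ym"(img));
#else
    for (int i = 0; i < 8; i++)
        sum += image[i] & mask[i];
#endif
    return sum;
}

/* Same computation, but the result is stored straight to memory ("=m"). */
static unsigned masked_sum_m(const uint8_t *image, const uint8_t *mask)
{
    unsigned sum = 0;
#if USE_MMX_ASM
    uint64_t img, msk;
    memcpy(&img, image, 8);
    memcpy(&msk, mask, 8);
    __asm__ volatile (
        "pandn  %2, %1\n\t"
        "psadbw %2, %1\n\t"
        "movd   %1, %0"       /* MMX register -> 32-bit memory slot */
        : "=m"(sum), "+y"(msk)
        : "ym"(img));
#else
    for (int i = 0; i < 8; i++)
        sum += image[i] & mask[i];
#endif
    return sum;
}

/* Accumulate over one row; width must be a multiple of 8 in this sketch. */
static unsigned row_sum(const uint8_t *image, const uint8_t *mask,
                        size_t width, int via_memory)
{
    unsigned acc = 0;
    assert(width % 8 == 0);
    for (size_t i = 0; i < width; i += 8)
        acc += via_memory ? masked_sum_m(image + i, mask + i)
                          : masked_sum_r(image + i, mask + i);
#if USE_MMX_ASM
    __asm__ volatile ("emms" ::: "memory");  /* leave MMX state before any x87 code */
#endif
    return acc;
}
```

Both variants compute the same sum, so the only difference is where gcc is
allowed to put the result of movd. Whether the "=m" form is actually faster
would have to be benchmarked on the CPUs in question.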
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6