[FFmpeg-devel] [PATCH] SIMD-optimized exponent_min() for ac3enc
Loren Merritt
lorenm
Sat Jan 15 07:32:33 CET 2011
On Fri, 14 Jan 2011, Justin Ruggles wrote:
>+ /* round up to even multiple of 16 */
>+ if (nb_coefs & 15)
>+ nb_coefs = (nb_coefs & ~15) + 16;
This can be done unconditionally:
    nb_coefs = FFALIGN(nb_coefs, 16);
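
A small C sketch to check that the branch-free form gives the same result as the conditional one. ALIGN_UP is a local stand-in written to match libavutil's FFALIGN definition as I understand it; the function names are made up for this example and are not part of the patch.

```c
#include <assert.h>

/* Local stand-in, assumed to match libavutil's FFALIGN(x, a). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Conditional rounding, as written in the patch. */
static int round_up_conditional(int nb_coefs)
{
    if (nb_coefs & 15)
        nb_coefs = (nb_coefs & ~15) + 16;
    return nb_coefs;
}

/* Branch-free rounding, as suggested in the review. */
static int round_up_unconditional(int nb_coefs)
{
    return ALIGN_UP(nb_coefs, 16);
}

/* Returns 1 if both forms agree for every count in [0, limit]. */
static int rounding_forms_agree(int limit)
{
    assert(limit >= 0);
    for (int n = 0; n <= limit; n++)
        if (round_up_conditional(n) != round_up_unconditional(n))
            return 0;
    return 1;
}
```

Values that are already a multiple of 16 pass through unchanged in both forms, which is why the test on the low bits is not needed.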
>+%macro AC3_EXPONENT_MIN 1
>+cglobal ac3_exponent_min_%1, 3,4,3, exp, reuse_blks, offset, offset1
>+ cmp reuse_blksq, 0
>+ je .end
>+ sal reuse_blksq, 8
>+ sub offsetq, mmsize
>+.nextexp:
>+ mov offset1q, offsetq
>+ add offset1q, reuse_blksq
These two instructions can be a single lea:
    lea offset1q, [offsetq+reuse_blksq]
>+ mova m0, [expq+offsetq]
>+.nextblk:
>+ mova m1, [expq+offset1q]
>+%ifidn %1, mmx
>+ PMINUB_MMX m0, m1, m2
>+%else ; mmxext/sse/sse2
>+ pminub m0, m1
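pminub can take a memory operand, so the separate load into m1 is only needed in the mmx case:
    pminub m0, [expq+offset1q]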
>+%endif
>+ sub offset1q, 256
>+ cmp offset1q, offsetq
It is usually possible to arrange your pointers such that a loop ends with
an offset of 0, and then you can take the flags from the add/sub instead
of a separate cmp.
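
To illustrate the pointer arrangement in C (a sketch only; the function name is hypothetical and this is not the ac3 loop itself): point at the end of the data and run a negative offset up to 0, so the only loop test is "offset != 0". In asm that test comes for free from the flags of the add; whether a C compiler emits it that way is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum of n bytes (n > 0). The index is a negative offset from the
 * end of the buffer that counts up to 0, so the loop terminates on
 * "offset == 0" with no separate comparison against a bound. */
static uint8_t min_u8_negidx(const uint8_t *buf, size_t n)
{
    assert(n > 0);
    const uint8_t *end = buf + n;      /* one past the last element */
    ptrdiff_t off = -(ptrdiff_t)n;     /* end[off] is buf[0] on entry */
    uint8_t m = 255;

    do {
        if (end[off] < m)
            m = end[off];
    } while (++off != 0);              /* asm: add off, 1 / jne .loop */

    return m;
}
```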
>+ jne .nextblk
>+ mova [expq+offsetq], m0
>+ sub offsetq, mmsize
>+ jge .nextexp
>+.end:
>+ REP_RET
>+%endmacro
>+
>+INIT_MMX
>+AC3_EXPONENT_MIN mmx
>+AC3_EXPONENT_MIN sse_mmxext
mmx2 is a subset of sse; nothing should ever be tagged with both. In this
case, you're not using sse.
>+%macro PMINUB_MMX 3 ; dst, src, tmp
>+ mova %3, %1
>+ pcmpgtb %1, %2
>+ pand %2, %1
>+ pandn %1, %3
>+ por %1, %2
>+%endmacro
I think you can simplify that using psubusb.
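
The identity behind this is min(a, b) = a - sat(a - b) for unsigned bytes, where sat() clamps at 0 the way psubusb does. With the macro's arguments that would presumably be mova %3, %1 / psubusb %3, %2 / psubb %1, %3, which is three instructions instead of five and leaves src untouched. A scalar C model to check the identity over all byte pairs (function names are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of psubusb on one byte: unsigned subtract, clamped at 0. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* min(a, b) computed as a - sat(a - b). */
static uint8_t min_u8_via_subs(uint8_t a, uint8_t b)
{
    uint8_t d = sub_sat_u8(a, b);
    assert(d <= a);            /* so the plain subtract below cannot wrap */
    return (uint8_t)(a - d);
}

/* Returns 1 if the identity holds for all 256 * 256 byte pairs. */
static int min_identity_holds(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            uint8_t expected = (uint8_t)(a < b ? a : b);
            if (min_u8_via_subs((uint8_t)a, (uint8_t)b) != expected)
                return 0;
        }
    return 1;
}
```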
--Loren Merritt