[FFmpeg-devel] [PATCH] SIMD-optimized exponent_min() for ac3enc

Loren Merritt lorenm
Sat Jan 15 07:32:33 CET 2011


On Fri, 14 Jan 2011, Justin Ruggles wrote:

>+    /* round up to even multiple of 16 */
>+    if (nb_coefs & 15)
>+        nb_coefs = (nb_coefs & ~15) + 16;

This can be done unconditionally:
nb_coefs = FFALIGN(nb_coefs, 16);
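
As a quick sanity check that the two forms agree, here is a minimal sketch in C. The FFALIGN definition below is a local copy written to match the libavutil macro. The function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Local copy of the rounding macro, written to match libavutil's FFALIGN.
 * The alignment must be a power of two. */
#define FFALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Conditional form from the patch (hypothetical wrapper name). */
static int align16_branch(int nb_coefs)
{
    if (nb_coefs & 15)
        nb_coefs = (nb_coefs & ~15) + 16;
    return nb_coefs;
}

/* Unconditional form suggested in the review (hypothetical wrapper name). */
static int align16_uncond(int nb_coefs)
{
    return FFALIGN(nb_coefs, 16);
}

/* Checks that both forms agree for every count from 0 to 256. */
static void check_align16(void)
{
    for (int n = 0; n <= 256; n++)
        assert(align16_branch(n) == align16_uncond(n));
}
```

Values that are already a multiple of 16 pass through unchanged in both versions, so the branch buys nothing.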

>+%macro AC3_EXPONENT_MIN 1
>+cglobal ac3_exponent_min_%1, 3,4,3, exp, reuse_blks, offset, offset1
>+    cmp  reuse_blksq, 0
>+    je .end
>+    sal  reuse_blksq, 8
>+    sub      offsetq, mmsize
>+.nextexp:
>+    mov     offset1q, offsetq
>+    add     offset1q, reuse_blksq

Use lea here: lea offset1q, [offsetq+reuse_blksq]

>+    mova          m0, [expq+offsetq]
>+.nextblk:
>+    mova          m1, [expq+offset1q]
>+%ifidn %1, mmx
>+    PMINUB_MMX    m0, m1, m2
>+%else ; mmxext/sse/sse2
>+    pminub        m0, m1

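pminub can take a memory operand directly (pminub m0, [expq+offset1q]), so 
the separate mova into m1 is not needed on this path.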

>+%endif
>+    sub     offset1q, 256
>+    cmp     offset1q, offsetq

It is usually possible to arrange your pointers such that a loop ends with 
an offset of 0, and then you can take the flags from the add/sub instead 
of a separate cmp.
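
The pointer arrangement can be sketched in C, assuming a simple byte-minimum loop rather than the actual exponent_min layout. The helper name min_u8 is hypothetical. The base pointer is biased to the end of the buffer and the index runs from -n up to 0, so the loop condition is just "did the increment produce zero". In hand-written asm that is the add followed directly by jnz; a C compiler may or may not generate the same thing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: minimum of n bytes, n > 0, with the index
 * counting up towards zero. */
static uint8_t min_u8(const uint8_t *src, ptrdiff_t n)
{
    const uint8_t *end = src + n;   /* bias the pointer past the last element */
    ptrdiff_t i = -n;               /* negative offset that runs up to 0 */
    uint8_t m = 255;

    do {
        if (end[i] < m)
            m = end[i];
    } while (++i != 0);             /* the increment result decides the branch */

    return m;
}
```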

>+    jne .nextblk
>+    mova [expq+offsetq], m0
>+    sub      offsetq, mmsize
>+    jge .nextexp
>+.end:
>+    REP_RET
>+%endmacro
>+
>+INIT_MMX
>+AC3_EXPONENT_MIN mmx
>+AC3_EXPONENT_MIN sse_mmxext

mmx2 is a subset of sse; nothing should ever be tagged with both. In this 
case, you're not using sse.

>+%macro PMINUB_MMX 3 ; dst, src, tmp
>+    mova     %3, %1
>+    pcmpgtb  %1, %2
>+    pand     %2, %1
>+    pandn    %1, %3
>+    por      %1, %2
>+%endmacro

I think you can simplify that using psubusb.
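
One way to read that suggestion, as a scalar model in C of a single byte lane: psubusb computes max(a - b, 0), and subtracting that from a leaves min(a, b). In the macro this would be something like mova tmp, dst / psubusb tmp, src / psubb dst, tmp, i.e. three instructions instead of five. The function names below are made up for illustration. As a side note, pcmpgtb is a signed compare, so the quoted macro only gives an unsigned minimum for values below 128, whereas the psubusb form is correct over the full 0..255 range.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of psubusb: unsigned subtract, clamped at 0. */
static uint8_t subus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* min(a, b) = a - max(a - b, 0). */
static uint8_t min_via_subus(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - subus_u8(a, b));
}

/* Exhaustive check against a plain comparison for all byte pairs. */
static void check_min_via_subus(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            assert(min_via_subus((uint8_t)a, (uint8_t)b) == (a < b ? a : b));
}
```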

--Loren Merritt


