[FFmpeg-devel] [PATCH] SIMD-optimized exponent_min() for ac3enc

Frank Barchard fbarchard
Sat Jan 15 07:50:42 CET 2011


On Fri, Jan 14, 2011 at 10:32 PM, Loren Merritt <lorenm at u.washington.edu> wrote:

> On Fri, 14 Jan 2011, Justin Ruggles wrote:
>
>> +    /* round up to even multiple of 16 */
>> +    if (nb_coefs & 15)
>> +        nb_coefs = (nb_coefs & ~15) + 16;
>>
>
> unconditional
> nb_coefs = FFALIGN(nb_coefs, 16);
>

Loren is right.  But FYI, if you do it yourself, it's
nb_coefs = (nb_coefs + 15)  & ~15;
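
To see that the two forms agree, here is a minimal C sketch.  The helper
names are made up for illustration, and I am assuming FFALIGN expands to the
same add-and-mask expression for a power-of-two alignment.

```c
#include <assert.h>

/* Conditional round-up, as written in the patch. */
static int align16_branchy(int n)
{
    if (n & 15)
        n = (n & ~15) + 16;
    return n;
}

/* Unconditional add-and-mask form: adding 15 carries into the next
 * multiple of 16 unless n is already aligned, then the mask clears
 * the low four bits. */
static int align16_mask(int n)
{
    return (n + 15) & ~15;
}
```

For non-negative n both helpers should return the same value, e.g. 253
rounds up to 256 and 256 stays 256.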

>
>> +%macro AC3_EXPONENT_MIN 1
>> +cglobal ac3_exponent_min_%1, 3,4,3, exp, reuse_blks, offset, offset1
>> +    cmp  reuse_blksq, 0
>> +    je .end
>> +    sal  reuse_blksq, 8
>> +    sub      offsetq, mmsize
>> +.nextexp:
>> +    mov     offset1q, offsetq
>> +    add     offset1q, reuse_blksq
>>
>
> lea
>
>> +    mova          m0, [expq+offsetq]
>> +.nextblk:
>> +    mova          m1, [expq+offset1q]
>> +%ifidn %1, mmx
>> +    PMINUB_MMX    m0, m1, m2
>> +%else ; mmxext/sse/sse2
>> +    pminub        m0, m1
>>
>
> memory arg
>
>> +%endif
>> +    sub     offset1q, 256
>> +    cmp     offset1q, offsetq
>>
>
> It is usually possible to arrange your pointers such that a loop ends with
> an offset of 0, and then you can take the flags from the add/sub instead of
> a separate cmp.
>

Or check for underflow, i.e. loop on jns:

 sub     offset1q, 256
 js       next
top:
 ...
 sub     offset1q, 256
 jns      top
next:
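
Both loop shapes can be modeled in C.  This is only a sketch with
hypothetical helper names: the first biases the base pointer so the index
runs up to zero (Loren's suggestion), the second counts down and stops when
the index goes negative (the jns variant).  Whether a compiler actually
reuses the flags from the add/sub is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Index runs from -n up to 0, so the loop test is simply "i != 0". */
static uint8_t min_u8_neg_index(const uint8_t *p, ptrdiff_t n)
{
    const uint8_t *end = p + n;   /* base pointer biased to the end */
    uint8_t m = 255;
    ptrdiff_t i;

    for (i = -n; i != 0; i++)
        if (end[i] < m)
            m = end[i];
    return m;
}

/* Index counts down from n-1; the loop exits once it becomes negative. */
static uint8_t min_u8_countdown(const uint8_t *p, ptrdiff_t n)
{
    uint8_t m = 255;
    ptrdiff_t i;

    for (i = n - 1; i >= 0; i--)
        if (p[i] < m)
            m = p[i];
    return m;
}
```

Both return 255 for an empty range (n == 0) and otherwise the smallest byte.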


>> +    jne .nextblk
>> +    mova [expq+offsetq], m0
>> +    sub      offsetq, mmsize
>> +    jge .nextexp
>>
>
Use an unsigned cc if you can.  It fuses on more CPUs and does not use the
overflow condition.
jae nextexp


>> +.end:
>> +    REP_RET
>> +%endmacro
>> +
>> +INIT_MMX
>> +AC3_EXPONENT_MIN mmx
>> +AC3_EXPONENT_MIN sse_mmxext
>>
>
> mmx2 is a subset of sse; nothing should ever be tagged with both. In this
> case, you're not using sse.
>
>> +%macro PMINUB_MMX 3 ; dst, src, tmp
>> +    mova     %3, %1
>> +    pcmpgtb  %1, %2
>> +    pand     %2, %1
>> +    pandn    %1, %3
>> +    por      %1, %2
>> +%endmacro
>>
>
> I think you can simplify that using psubusb.
>
> --Loren Merritt
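
On the psubusb hint: my reading (it is not spelled out above) is that an
unsigned byte minimum can be built from a saturating subtract followed by a
plain subtract, since min(a, b) = a - max(a - b, 0).  A scalar C model of
one byte lane, with hypothetical helper names:

```c
#include <stdint.h>

/* One lane of an unsigned saturating subtract (what psubusb does per
 * byte): the result clamps at 0 instead of wrapping around. */
static uint8_t subus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* min(a, b) = a - max(a - b, 0).  If a <= b the saturating subtract
 * gives 0 and a is returned; otherwise a - (a - b) = b. */
static uint8_t min_u8_subus(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - subus_u8(a, b));
}
```

In MMX terms that would be roughly: copy dst to tmp, psubusb tmp by src,
then psubb dst by tmp.  It also holds for the full 0..255 range, whereas
pcmpgtb is a signed compare and only matches an unsigned minimum for values
below 128.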