[FFmpeg-devel] [PATCH 04/10] lavc/flacenc: add sse4 version of the lpc encoder

Christophe Gisquet christophe.gisquet at gmail.com
Wed Feb 12 12:41:34 CET 2014


Hi,

2014-02-12 0:11 GMT+01:00 James Darnley <james.darnley at gmail.com>:

> +%if ARCH_X86_64
> +    cglobal flac_enc_lpc_16, 6, 8, 4, 0, res, smp, len, order, coefs, shift
> +    %define posj r6
> +    %define negj r7
> +%else
> +    cglobal flac_enc_lpc_16, 6, 6, 4, 0, res, smp, len, order, coefs, shift
> +    %define posj r2
> +    %define negj r5
> +%endif
[...]
> +movd m3, shiftmp

If I'm not mistaken, and unless x264asm is already smarter than I am,
you're forcing shift to be loaded into a gpr when you never actually
need it there.
This 6th argument is only read through shiftmp, and it sits on the
stack on x86-32 and Win64, so you need one less gpr in all cases.

I'm not sure, but is it possible to leave order or len wherever they
are on x86 (32-bit), so as to save another gpr? That may require
loading the args manually.

> +.looplen:
> +    pxor m0,  m0
> +    xor posj, posj
> +    xor negj, negj
> +    .looporder:
> +        movd   m2, [coefsq+posj*4] ; c = coefs[j]
> +        SPLATD m2
> +        movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
> +        pmulld m1,  m2
> +        paddd  m0,  m1             ; p += c * s
> +
> +        add posj, 1
> +        sub negj, 1
> +        cmp posj, ordermp
> +    jne .looporder

Potentially stupid question: do the add and sub get assembled to
inc/dec? Is there a benefit compared to adding/subtracting 4? (I
guess there is.)
Also, maybe not worthwhile, coefsq could be incremented by orderq*4,
posj set to -orderq, and then you would do:
dec negj
inc posj
jl/jnz .looporder
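
To spell out the indexing idea in C terms, here is a rough scalar
sketch. The function names dot_forward and dot_negidx are made up for
illustration and are not part of the patch, and it assumes small inputs
so that the 32-bit sums do not overflow. The second function points
past the end of coefs and counts a negative index up to zero, so the
loop test is simply "has the index reached zero" and the separate cmp
against order goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the inner loop as posted:
 * p = sum of coefs[j] * smp[i-j-1], with j compared against order. */
int32_t dot_forward(const int32_t *smp, ptrdiff_t i,
                    const int32_t *coefs, int order)
{
    int32_t p = 0;
    for (int j = 0; j != order; j++)
        p += coefs[j] * smp[i - j - 1];
    return p;
}

/* Same sum, with coefs advanced by order and an index that runs from
 * -order up to 0, so the loop ends when the index reaches zero. */
int32_t dot_negidx(const int32_t *smp, ptrdiff_t i,
                   const int32_t *coefs, int order)
{
    assert(order > 0 && i >= order);
    const int32_t *cend = coefs + order; /* coefsq += orderq*4      */
    const int32_t *s    = smp + i - 1;   /* points at smp[i-1]      */
    ptrdiff_t posj = -order;             /* posj = -orderq          */
    ptrdiff_t negj = 0;
    int32_t p = 0;
    do {
        p += cend[posj] * s[negj];       /* c = coefs[j], s = smp[i-j-1] */
        negj--;                          /* dec negj                */
        posj++;                          /* inc posj                */
    } while (posj != 0);                 /* jnz .looporder          */
    return p;
}
```

Both functions visit the same (coefficient, sample) pairs in the same
order; only the way the index is tested differs.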

> +    movu  [resq], m1               ; res[i] = smp[i] - (p >> shift)
> +
> +    add resq, mmsize
> +    add smpq, mmsize
> +    sub lenmp, mmsize/4
> +jg .looplen

Equivalent trick here if len is in a reg: add 4*len to resq,
neg lenq then:
movu  [resq+4*lenq], m1
add smpq, mmsize
add lenq, mmsize/4
jl .looplen
There are probably errors in what I gave, but this should be
sufficient to give you the idea.
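
A rough C model of the same idea for the outer loop, under a few
assumptions: STEP stands for mmsize/4 with 16-byte registers, src
stands in for the vector computed on each iteration, len is a positive
multiple of STEP, and the function names are hypothetical. In this
sketch the output pointer is advanced by len elements (4*len bytes) up
front, and the loop keeps going while the negated counter is still
below zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STEP 4 /* samples per iteration, i.e. mmsize/4 for 16-byte regs */

/* Posted form: bump the output pointer and count len down to zero. */
void fill_countdown(int32_t *res, const int32_t *src, ptrdiff_t len)
{
    assert(len > 0 && len % STEP == 0);
    do {
        for (int k = 0; k < STEP; k++)
            res[k] = src[k];           /* movu [resq], m1          */
        res += STEP;                   /* add resq, mmsize         */
        src += STEP;                   /* add smpq, mmsize         */
        len -= STEP;                   /* sub len, mmsize/4        */
    } while (len > 0);                 /* jg .looplen              */
}

/* Negated counter: res points past the end, len runs from -len to 0,
 * so one add both advances the store address and sets the flags. */
void fill_negidx(int32_t *res, const int32_t *src, ptrdiff_t len)
{
    assert(len > 0 && len % STEP == 0);
    res += len;                        /* resq += 4*len bytes      */
    len = -len;                        /* neg lenq                 */
    do {
        for (int k = 0; k < STEP; k++)
            res[len + k] = src[k];     /* movu [resq+4*lenq], m1   */
        src += STEP;                   /* add smpq, mmsize         */
        len += STEP;                   /* add lenq, mmsize/4       */
    } while (len < 0);                 /* loop while still negative */
}
```

The second form drops one pointer update per iteration, since the
counter doubles as the store offset.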

-- 
Christophe

