[FFmpeg-devel] [PATCH 1/4] lavc/flacenc: add sse4 version of the 16-bit lpc encoder

James Almer jamrial at gmail.com
Tue Feb 25 04:42:15 CET 2014


On 20/02/14 3:48 PM, James Darnley wrote:
> From 1.8 to 2.4 times faster.  Runtime is reduced by 2 to 39%.  The
> speed-up generally increases with compression_level.
> 
> This lpc encoder is not used with levels < 3 so it provides no speed-up
> in these cases.
> ---
>  LICENSE                         |    1 +
>  libavcodec/flacenc.c            |    2 +-
>  libavcodec/x86/Makefile         |    3 +
>  libavcodec/x86/flac_dsp_gpl.asm |   83 +++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/flacdsp_init.c   |    4 ++
>  5 files changed, 92 insertions(+), 1 deletions(-)
>  create mode 100644 libavcodec/x86/flac_dsp_gpl.asm
> 

[...]

> +.looplen:
> +    pxor m0,   m0
> +    mov  posj, orderq
> +    xor  negj, negj
> +
> +    .looporder:
> +        movd   m2, [coefsq+posj*4] ; c = coefs[j]
> +        SPLATD m2
> +        movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]

> +        pmulld m1,  m2
> +        paddd  m0,  m1             ; p += c * s

PMACSDD m0, m1, m2, m0, m1

The same applies to the encoder (use PMACSDQL there instead). Do this in the
unrolling patches as well, of course.
You can then turn the functions into macros to get both SSE4 and XOP versions,
as I mentioned in a previous email.
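
For reference, here is a rough scalar sketch in C of why the swap should be safe. It assumes the macro behaves as a per-lane 32-bit multiply-accumulate that keeps only the low 32 bits and does not saturate, which is the same result as pmulld followed by paddd. The function names (mul_then_add, mul_acc, lanes_match) are made up for illustration and are not part of the patch or of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* one XMM register holds four 32-bit lanes */

/* Hypothetical scalar model of pmulld followed by paddd: the product is
 * truncated to its low 32 bits, then added to the accumulator with wraparound. */
static void mul_then_add(uint32_t acc[LANES], const int32_t s[LANES], int32_t c)
{
    for (int i = 0; i < LANES; i++) {
        uint32_t prod = (uint32_t)((uint64_t)(uint32_t)s[i] * (uint32_t)c);
        acc[i] += prod; /* p += c * s, modulo 2^32 */
    }
}

/* Hypothetical scalar model of a fused multiply-accumulate (assumed to be
 * what the macro provides): full signed product, add, keep the low 32 bits. */
static void mul_acc(uint32_t acc[LANES], const int32_t s[LANES], int32_t c)
{
    for (int i = 0; i < LANES; i++) {
        int64_t wide = (int64_t)s[i] * c; /* cannot overflow 64 bits */
        acc[i] = (uint32_t)((uint64_t)acc[i] + (uint64_t)wide);
    }
}

/* Returns 1 if both models give identical lanes for the same inputs. */
static int lanes_match(const int32_t s[LANES], int32_t c, uint32_t start)
{
    uint32_t a[LANES], b[LANES];
    for (int i = 0; i < LANES; i++)
        a[i] = b[i] = start;
    mul_then_add(a, s, c);
    mul_acc(b, s, c);
    for (int i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Small self-check over a few edge-case samples and coefficients. */
static void check_equivalence(void)
{
    const int32_t s[LANES] = { 1, -2, INT32_MAX, INT32_MIN };
    assert(lanes_match(s, 3, 0));
    assert(lanes_match(s, -70000, 123456789u));
}
```

Calling check_equivalence() from a test harness exercises both models on the same inputs. It only models the arithmetic, not the register allocation or the macro's operand order.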

