[FFmpeg-devel] [PATCH 1/4] lavc/flacenc: add sse4 version of the 16-bit lpc encoder

James Almer jamrial at gmail.com
Tue Feb 25 06:51:50 CET 2014


On 25/02/14 12:42 AM, James Almer wrote:
> On 20/02/14 3:48 PM, James Darnley wrote:
>> From 1.8 to 2.4 times faster.  Runtime is reduced by 2 to 39%.  The
>> speed-up generally increases with compression_level.
>>
>> This lpc encoder is not used with levels < 3 so it provides no speed-up
>> in these cases.
>> ---
>>  LICENSE                         |    1 +
>>  libavcodec/flacenc.c            |    2 +-
>>  libavcodec/x86/Makefile         |    3 +
>>  libavcodec/x86/flac_dsp_gpl.asm |   83 +++++++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/flacdsp_init.c   |    4 ++
>>  5 files changed, 92 insertions(+), 1 deletions(-)
>>  create mode 100644 libavcodec/x86/flac_dsp_gpl.asm
>>
> 
> [...]
> 
>> +.looplen:
>> +    pxor m0,   m0
>> +    mov  posj, orderq
>> +    xor  negj, negj
>> +
>> +    .looporder:
>> +        movd   m2, [coefsq+posj*4] ; c = coefs[j]
>> +        SPLATD m2
>> +        movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
> 
>> +        pmulld m1,  m2
>> +        paddd  m0,  m1             ; p += c * s
> 
> PMACSDD m0, m1, m2, m0, m1
> 
> Same with the encoder (use PMACSDQL there instead). Do the same in the
> unrolling patches as well, of course.
> You can then turn the functions into macros to get both SSE4 and XOP versions,
> as I mentioned in a previous email.
> 

I meant to say "Same with the 32-bit encoder". Sorry for the confusion.
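
For reference, here is a rough scalar C sketch of what the quoted inner loop
computes per sample, i.e. p += coefs[j] * smp[i-j-1] with 32-bit wrapping
products as pmulld produces. The function names, the warm-up copy of the first
`order` samples, and the residual step `smp[i] - (p >> shift)` are assumptions
made for illustration; they are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: prediction for sample i.
 * p = sum over j of coefs[j] * smp[i-j-1], keeping only the low 32 bits
 * of each product and of the running sum (like pmulld followed by paddd).
 * Unsigned arithmetic is used so the wrap-around is well defined. */
static int32_t lpc_predict(const int32_t *smp, const int32_t *coefs,
                           int order, int i)
{
    uint32_t p = 0;
    for (int j = 0; j < order; j++)
        p += (uint32_t)coefs[j] * (uint32_t)smp[i - j - 1]; /* p += c * s */
    return (int32_t)p; /* two's complement conversion assumed */
}

/* Hypothetical scalar reference for the whole block.
 * Assumption: the first `order` samples are copied through unchanged and
 * the residual is smp[i] - (p >> shift), with an arithmetic right shift. */
static void lpc_encode_ref(int32_t *res, const int32_t *smp, int len,
                           int order, const int32_t *coefs, int shift)
{
    for (int i = 0; i < order; i++)
        res[i] = smp[i];
    for (int i = order; i < len; i++)
        res[i] = smp[i] - (lpc_predict(smp, coefs, order, i) >> shift);
}
```

The multiply and add inside the j loop are the pair that a single
multiply-accumulate such as PMACSDD would replace on XOP hardware.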

