[FFmpeg-devel] [PATCH 09/10] lavc/flacenc: add sse4 version of the 32-bit lpc encoder
James Darnley
james.darnley at gmail.com
Wed Feb 12 15:55:20 CET 2014
On 2014-02-12 07:49, Clément Bœsch wrote:
> On Wed, Feb 12, 2014 at 12:11:21AM +0100, James Darnley wrote:
>> From 1.3 to 2.5 times faster. Runtime reduced by 4 to 58%. As with the
>> 16-bit version the speed-up generally increases with compression_level.
>>
>> Also like the 16-bit version, it is not used with levels less than 3.
>> ---
>> libavcodec/x86/flac_dsp_gpl.asm | 97 +++++++++++++++++++++++++++++++++++++++
>> libavcodec/x86/flacdsp_init.c | 5 ++
>> 2 files changed, 102 insertions(+), 0 deletions(-)
>>
>> diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
>> index 9e9249a..e36c76b 100644
>> --- a/libavcodec/x86/flac_dsp_gpl.asm
>> +++ b/libavcodec/x86/flac_dsp_gpl.asm
>> @@ -22,6 +22,14 @@
>>
>> %include "libavutil/x86/x86util.asm"
>>
>> +SECTION_RODATA
>> +
>> +pd_0_int_min: times 2 dd 0, -2147483648
>> +pq_int_min: times 2 dq -2147483648
>> +pq_int_max: times 2 dq 2147483647
>> +
>> +SECTION .text
>> +
>> INIT_XMM sse4
>> %if ARCH_X86_64
>> cglobal flac_enc_lpc_16, 6, 8, 8, 0, res, smp, len, order, coefs, shift
>> @@ -89,3 +97,92 @@ movd m3, shiftmp
>> sub lenmp, (3*mmsize)/4
>> jg .looplen
>> RET
>> +
>> +%macro PMINSQ 3
>> + mova %3, %2
>> + pcmpgtq %3, %1
>
> pcmpgtq %3, %2, %1
I can certainly change that, but it won't have any useful effect without
a version of the function that allows instructions with 3 operands, such
as an AVX one. I'm sure there are a few other places that could benefit
from this, and maybe from "new" instructions as well. I just need to grok
the instructions and then identify where they might be useful.
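
For anyone following along, here is a rough scalar model in C of what a
compare-and-select signed 64-bit minimum does per lane. It is only a
sketch: the quoted hunk shows just the first two instructions of PMINSQ,
so the select step is an assumption about the general technique, not a
copy of the macro. The names cmpgt_mask, min_sq, max_sq and clip_to_int32
are made up for illustration, and the clamp assumes that pq_int_min and
pq_int_max are used to saturate 64-bit values to the 32-bit range.

```c
#include <stdint.h>

/* Per-lane model of pcmpgtq: all ones if a > b (signed), else zero. */
static uint64_t cmpgt_mask(int64_t a, int64_t b)
{
    return a > b ? UINT64_MAX : 0;
}

/* Signed 64-bit minimum from a compare plus a mask select.
 * The mask is set where b > a, i.e. where a is the smaller value,
 * matching "mova %3, %2" followed by "pcmpgtq %3, %1". */
static int64_t min_sq(int64_t a, int64_t b)
{
    uint64_t mask = cmpgt_mask(b, a);
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* Signed 64-bit maximum, same idea with the comparison swapped. */
static int64_t max_sq(int64_t a, int64_t b)
{
    uint64_t mask = cmpgt_mask(a, b);
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* Assumed use of the int min/max constants: saturate to int32. */
static int32_t clip_to_int32(int64_t v)
{
    v = min_sq(v, INT32_MAX);
    v = max_sq(v, INT32_MIN);
    return (int32_t)v;
}
```

In this model the copy into the scratch operand is the "mova" that a
3-operand compare would fold away, which is why the change only pays off
where such an encoding is actually available.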