[FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder
James Darnley
james.darnley at gmail.com
Mon Nov 27 01:36:43 EET 2017
On 2017-11-27 00:17, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley <james.darnley at gmail.com> wrote:
>> @@ -152,13 +152,13 @@ RET
>> %macro FUNCTION_BODY_32 0
>>
>> %if ARCH_X86_64
>> - cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
>> + cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
>>
>
> Why x4, shouldn't this be x2?
I write three more mm registers to the stack, which makes four slots in
total. The first slot holds the sign extension for the hacked qword
arithmetic shift I added in the first 32-bit patch. The three new slots
store the "odd" values created in the first inner loop.
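As a rough illustration only (this is not code from the patch), the slot count and the idea behind the shift can be sketched in scalar C. The slot names and both helper functions are made up for this sketch, and the real assembly may build its sign extension differently. The only facts taken from the patch are the 1 + 3 spilled registers and the mmsize*4 reservation.

```c
#include <stdint.h>

/* Hypothetical slot names: one sign-extension slot from the first 32-bit
 * patch plus three slots for the "odd" values added by this patch. */
enum {
    SLOT_SIGN_EXT,
    SLOT_ODD_0,
    SLOT_ODD_1,
    SLOT_ODD_2,
    SLOT_COUNT          /* 4, matching mmsize*4 in the cglobal line */
};

/* Bytes of stack needed when every slot holds one full vector register.
 * mmsize is the register width in bytes (16 for XMM, 32 for YMM). */
static unsigned stack_bytes(unsigned mmsize)
{
    return SLOT_COUNT * mmsize;
}

/* Scalar sketch of a 64-bit arithmetic right shift built from a logical
 * shift plus a sign-extension mask, for 0 < n < 64. This is an assumed
 * construction, not necessarily the one used in the assembly. The final
 * cast relies on the usual two's-complement conversion. */
static int64_t sar64_emulated(int64_t x, unsigned n)
{
    uint64_t logical = (uint64_t)x >> n;                   /* zero-filled */
    uint64_t sign = (x < 0) ? ~UINT64_C(0) << (64 - n) : 0; /* top n bits */
    return (int64_t)(logical | sign);
}
```

With 16-byte registers the reservation comes to 64 bytes, and with 32-byte registers to 128 bytes.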
I admit that this is a rather ugly construction for a small speed gain,
but I think I've seen other ugly things since writing this.