[FFmpeg-devel] [PATCH 2/3] x86/float_dsp: unroll loop in vector_fmac_scalar
jamrial at gmail.com
Wed Apr 16 19:07:31 CEST 2014
On 16/04/14 1:35 PM, Christophe Gisquet wrote:
> On 16 Apr 2014 18:12, "James Almer" <jamrial at gmail.com> wrote:
>> Athlon 64 7750+ mingw-w64. Went from 274 cycles to 257 when I benched with
>> the dts-es sample I uploaded for the fate test.
>> Also, does aac even use vector_fmac_scalar? A grep on libavcodec shows
>> results only in dcadec.c.
> I must have mixed up in which batch I modified which code. So what I am
> remembering must have been for something else, then.
>> The difference in the resulting code is in the order of instructions
>> due to the unrolling of the loop. The mulps now have enough room to finish
>> before the addps are executed, and so do the addps before the mova to memory.
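To illustrate the reordering, here is a rough C sketch. It is not the asm
from the patch: the function names are made up for illustration, and the
unrolled version assumes len is a multiple of 4. A compiler is free to
reschedule C statements, so read the statement order as a picture of the
instruction order in the hand-written asm, nothing more.

```c
#include <stddef.h>

/* Straightforward form: each iteration multiplies, adds and stores one
 * element, so every add waits directly on the multiply just before it. */
static void fmac_scalar_ref(float *dst, const float *src, float mul, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Unrolled by four (hypothetical restriction: len must be a multiple of 4).
 * All multiplies are issued first, then all adds, then all stores, so each
 * result has time to complete before the operation that consumes it. */
static void fmac_scalar_unrolled(float *dst, const float *src, float mul, size_t len)
{
    for (size_t i = 0; i < len; i += 4) {
        /* four independent multiplies, one temporary each */
        float m0 = src[i + 0] * mul;
        float m1 = src[i + 1] * mul;
        float m2 = src[i + 2] * mul;
        float m3 = src[i + 3] * mul;

        /* the adds start once the first multiply has had time to finish */
        float a0 = dst[i + 0] + m0;
        float a1 = dst[i + 1] + m1;
        float a2 = dst[i + 2] + m2;
        float a3 = dst[i + 3] + m3;

        /* stores go last, after the adds */
        dst[i + 0] = a0;
        dst[i + 1] = a1;
        dst[i + 2] = a2;
        dst[i + 3] = a3;
    }
}
```

Both functions compute the same thing. The only point is that the second
one keeps four independent chains in flight instead of reusing the same
temporaries back to back.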
> I would have expected this to be handled by out of order execution. But I
> guess the mulps have too long a latency to not cause a dependency. I can't
> help benchmark this atm but there should be no harm to your changes then.
Five cycles latency for mulps and three for addps according to Intel.
Using only two regs for the four mulps+addps+mova will surely cause a
> OK from my side then.
> Best regards,