[FFmpeg-devel] [PATCH 2/3] x86/float_dsp: unroll loop in vector_fmac_scalar

Christophe Gisquet christophe.gisquet at gmail.com
Wed Apr 16 18:35:56 CEST 2014

On 16 Apr 2014 18:12, "James Almer" <jamrial at gmail.com> wrote:
> Athlon 64 7750+ mingw-w64. Went from 274 cycles to 257 when I benched with
> the dts-es sample I uploaded for the fate test.


> Also, does aac even use vector_fmac_scalar? A grep on libavcodec shows
> results only in dcadec.c.

I must have mixed up which batch of patches modified which code, so what I
remember must have been for something else.

> The difference in the resulting code is in the order of instructions, due
> to the unrolling of the loop. The mulps now have enough room to finish
> before the addps are executed, and so do the addps before the mova to memory.

I would have expected out-of-order execution to handle this, but I guess
the mulps latency is too long to avoid stalling the dependent addps. I can't
help benchmark this at the moment, but there should be no harm in your
changes, then.
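To make the reordering concrete, here is a plain C sketch (not the actual x86 asm from the patch) of what unrolling by two does to vector_fmac_scalar, whose semantics are dst[i] += src[i] * mul. The helper names fmac_scalar_ref, fmac_scalar_unrolled and unrolled_matches_ref are hypothetical, each inner loop over four floats stands in for one SSE register, and the sketch assumes len is a multiple of 8. Both "mulps" are issued before either "addps", and both "addps" before either store, which is the ordering described above; a C compiler is of course free to schedule these scalar operations differently.

```c
#include <assert.h>

/* Reference semantics: dst[i] += src[i] * mul */
static void fmac_scalar_ref(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Hypothetical unrolled variant: two 4-float "registers" per iteration.
 * Both multiplies come first, then both adds, then both stores, so the
 * second multiply sits between the first multiply and its dependent add.
 * Assumes len is a multiple of 8. */
static void fmac_scalar_unrolled(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i += 8) {
        float m0[4], m1[4], a0[4], a1[4];
        for (int k = 0; k < 4; k++) {   /* two "mulps" */
            m0[k] = src[i + k] * mul;
            m1[k] = src[i + 4 + k] * mul;
        }
        for (int k = 0; k < 4; k++) {   /* two "addps" */
            a0[k] = dst[i + k] + m0[k];
            a1[k] = dst[i + 4 + k] + m1[k];
        }
        for (int k = 0; k < 4; k++) {   /* two "mova" stores */
            dst[i + k] = a0[k];
            dst[i + 4 + k] = a1[k];
        }
    }
}

/* Checks that the unrolled loop produces the same result as the reference. */
static int unrolled_matches_ref(void)
{
    float src[16], d1[16], d2[16];
    for (int i = 0; i < 16; i++) {
        src[i] = (float)i;
        d1[i] = d2[i] = 1.0f;
    }
    fmac_scalar_ref(d1, src, 0.5f, 16);
    fmac_scalar_unrolled(d2, src, 0.5f, 16);
    for (int i = 0; i < 16; i++)
        if (d1[i] != d2[i])
            return 0;
    return 1;
}
```

Reordering only changes when each operation is issued, not what is computed, so the unrolled loop must match the reference element for element.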

OK from my side then.

Best regards,
