[FFmpeg-devel] [PATCH] x86/sbrdsp: add ff_sbr_autocorrelate_{sse, sse3}

Christophe Gisquet christophe.gisquet at gmail.com
Sun Jan 25 14:11:56 CET 2015


2015-01-25 2:05 GMT+01:00 James Almer <jamrial at gmail.com>:
> 2 to 2.5 times faster.
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
>  libavcodec/x86/sbrdsp.asm    | 114 +++++++++++++++++++++++++++++++++++++++++++

Not the first time I have noticed this, but memory moves are often
suboptimal when done with the old SSE instructions.
On my old Core i5, movlhps is fine but movlps isn't. You may want
to validate this with the attached patch, where storing ps_mask3 in m8
is also a gain on Win64 (the gain does not match the number of loops,
but it is still there).
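
For anyone reading without the attachment, here is a minimal C intrinsics
sketch of how the two 64-bit loads differ semantically. It is not code from
the patch: the function names are made up for illustration, and that the
patch swaps movlps for movq is only inferred from the "movq" label in the
timings. A movlps load replaces the low half of the register and keeps the
high half, so the result depends on the register's previous contents; a
movq load zeroes the high half instead. That extra dependency may be why
movlps is slower here, but this is a guess, not something measured.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* movlps-style load (hypothetical helper name): replaces lanes 0-1 with
 * src[0..1] and keeps lanes 2-3 of prev, so the result depends on prev. */
static __m128 load_low_movlps(__m128 prev, const float *src)
{
    return _mm_loadl_pi(prev, (const __m64 *)src);
}

/* movq-style load (hypothetical helper name): loads src[0..1] into
 * lanes 0-1 and zeroes lanes 2-3, with no input register involved. */
static __m128 load_low_movq(const float *src)
{
    return _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)src));
}

/* Small self-check of the lane contents produced by each load. */
static void self_check(void)
{
    const float src[2] = { 1.0f, 2.0f };
    float out[4];

    /* _mm_set_ps takes lanes from high to low: lanes 0..3 = 6, 7, 8, 9 */
    __m128 prev = _mm_set_ps(9.0f, 8.0f, 7.0f, 6.0f);

    _mm_storeu_ps(out, load_low_movlps(prev, src));
    assert(out[0] == 1.0f && out[1] == 2.0f);  /* low half loaded */
    assert(out[2] == 8.0f && out[3] == 9.0f);  /* high half preserved */

    _mm_storeu_ps(out, load_low_movq(src));
    assert(out[0] == 1.0f && out[1] == 2.0f);  /* low half loaded */
    assert(out[2] == 0.0f && out[3] == 0.0f);  /* high half zeroed */
}
```

The compiler is free to pick other instructions for these intrinsics, so
this only illustrates the semantics. When only the low 64 bits are used
afterwards, either load gives the same result, which is what makes such a
substitution possible in the first place.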

x64:  6023 decicycles in g, 262108 runs, 36 skips
SSE:  3049 decicycles in g, 262130 runs, 14 skips
SSE3: 2843 decicycles in g, 262086 runs, 58 skips
movq: 2693 decicycles in g, 262117 runs, 27 skips
m8:   2648 decicycles in g, 262083 runs, 61 skips

Thanks for doing it, I only had 3-year-old scraps left and no further
motivation to tackle the start/tail parts.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Use-different-mem-moves.patch
Type: text/x-patch
Size: 3179 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150125/791505d9/attachment.bin>
