[FFmpeg-devel] [PATCH] Add x86-optimized function ac3_or_abs_int16() and use in log2_tab().
Ronald S. Bultje
rsbultje
Sat Feb 12 22:24:07 CET 2011
Hi,
On Fri, Feb 11, 2011 at 7:55 PM, Justin Ruggles
<justin.ruggles at gmail.com> wrote:
>  libavcodec/ac3dsp.c         |    9 ++++++
>  libavcodec/ac3dsp.h         |   11 ++++++++
>  libavcodec/ac3enc_fixed.c   |   11 ++-----
>  libavcodec/x86/ac3dsp.asm   |   61 +++++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/ac3dsp_mmx.c |   11 ++++++++
>  5 files changed, 95 insertions(+), 8 deletions(-)
[..]
> + mova [rsp], m4
> + xor rax, rax
> + or ax, [rsp]
> + or ax, [rsp+2]
> + or ax, [rsp+4]
> + or ax, [rsp+6]
> +%ifidn mmsize, 16
> + or ax, [rsp+8]
> + or ax, [rsp+10]
> + or ax, [rsp+12]
> + or ax, [rsp+14]
> +%endif
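For the xmm version:
mova xmm5, xmm4
punpckhqdq xmm4, xmm4 ; copy high 8 bytes into low 8 bytes
por xmm5, xmm4 ; or in lowest 8 bytes
pshuflw xmm4, xmm5, 0xe ; move words 2,3 down to words 0,1
por xmm5, xmm4 ; or in lowest 4 bytes
pshuflw xmm4, xmm5, 0x1 ; move word 1 down to word 0
por xmm5, xmm4 ; or in lowest 2 bytes
movd eax, xmm5 ; result in ax; upper half of eax holds a copy, mask if needed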
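For the mmx version (pshufw is the mm-register form of the shuffle and needs mmxext):
pshufw mm5, mm4, 0xe ; move words 2,3 down to words 0,1
por mm4, mm5 ; or in lowest 4 bytes
pshufw mm5, mm4, 0x1 ; move word 1 down to word 0
por mm4, mm5 ; or in lowest 2 bytes
movd eax, mm4 ; result in ax; upper half of eax holds a copy, mask if needed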
This has the advantage that you don't need to mess around with rsp
anywhere. I can't predict for sure if it's faster, but it probably is.
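
To make the reduction easier to check, here is a rough scalar model in C of what the xmm sequence above computes. It is only a sketch: the function names hor_or_u16x8 and or_loop_u16x8 are made up for illustration and are not part of the patch. The array stands in for the eight 16-bit words of the register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the xmm sequence: fold 8 words down to 1. */
static uint16_t hor_or_u16x8(const uint16_t v[8])
{
    uint16_t a[8];
    int i;

    memcpy(a, v, sizeof(a));

    /* punpckhqdq + por: OR the high 4 words into the low 4 words */
    for (i = 0; i < 4; i++)
        a[i] |= a[i + 4];

    /* pshuflw 0xe + por: OR words 2,3 into words 0,1 */
    a[0] |= a[2];
    a[1] |= a[3];

    /* pshuflw 0x1 + por: OR word 1 into word 0 */
    a[0] |= a[1];

    return a[0];
}

/* Straightforward reference, equivalent to the chain of "or ax, [rsp+n]". */
static uint16_t or_loop_u16x8(const uint16_t v[8])
{
    uint16_t r = 0;
    int i;

    for (i = 0; i < 8; i++)
        r |= v[i];
    return r;
}
```

Both functions should return the same value for any input. The folded form takes three OR steps instead of eight and never stores the register to memory.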
Ronald