[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct
James Darnley
jdarnley at obe.tv
Sat Jun 3 03:18:03 EEST 2017
Two ideas here.
The first 3 patches alter the old mmx code so that it can use xmm registers. It
still uses only half the available width and adds a few shuffles, so it isn't
an ideal solution, though its output is exact compared with the mmx version.
It seems to be moderately faster on Skylake despite the shuffles, but similar
in speed on Yorkfield (like some of my previous work). Possibly useful if
anybody still uses a 32-bit build on these CPUs.
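For anyone wanting to check the "exact compared with the mmx version" claim on
their own build, the usual approach is to run both versions over the same
pseudo-random blocks and compare the outputs byte for byte. Below is a minimal
C sketch of that idea. It is not the FFmpeg test harness: idct_fn,
count_mismatching_blocks and the two toy transforms are hypothetical names made
up for illustration, and the toy transforms only stand in for real IDCT
functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: an in-place transform on an 8x8 block. */
typedef void (*idct_fn)(int16_t block[64]);

/* Small deterministic LCG so that every run sees the same inputs. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Feed identical random blocks to both functions and return the number
 * of blocks on which their outputs differ in any byte. */
static int count_mismatching_blocks(idct_fn ref, idct_fn test, int iterations)
{
    uint32_t seed = 1;
    int mismatches = 0;

    for (int it = 0; it < iterations; it++) {
        int16_t a[64], b[64];

        /* Coefficients in [-256, 255]; an assumed, arbitrary range. */
        for (int i = 0; i < 64; i++)
            a[i] = (int16_t)((int)(lcg_next(&seed) % 512) - 256);
        memcpy(b, a, sizeof(a));

        ref(a);
        test(b);
        if (memcmp(a, b, sizeof(a)) != 0)
            mismatches++;
    }
    return mismatches;
}

/* Toy stand-ins, NOT real IDCTs: one doubles every value, the other
 * does the same but is off by one in the first element. */
static void toy_ref(int16_t block[64])
{
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)(block[i] * 2);
}

static void toy_off_by_one(int16_t block[64])
{
    toy_ref(block);
    block[0] = (int16_t)(block[0] + 1);
}
```

With real functions plugged in, a return value of 0 over many iterations would
suggest, but not prove, that the two versions are bit-identical.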
The 4th patch is a bit of cleanup I did while reading and partly redoing the
10-bit simple_idct. It uses the named registers to remove a little indirection.
Not used everywhere, yet. It could be applied regardless of any other of these
patches.
The last 2 are an attempt to use the 10- and 12-bit macros for the 8-bit
function. I don't think the result is correct, perhaps due to rounding or due
to a small difference in the coefficients used. Changing the coefficients
causes other errors.
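To illustrate why a small coefficient difference can break exactness, here is a
hedged C sketch of a single multiply, round and shift step. The values 16383
and 16384 and the shift of 11 are assumptions picked for illustration, not
figures taken from these patches, and scale_round and first_divergence are
hypothetical helper names.

```c
#include <stdint.h>

/* Multiply x by coefficient w, add the rounding bias, then shift right.
 * Only non-negative x is used here, so the right shift is well defined. */
static int32_t scale_round(int32_t x, int32_t w, int shift)
{
    return (x * w + (1 << (shift - 1))) >> shift;
}

/* Return the smallest x in [1, limit] for which the two coefficients
 * give different results, or -1 if they agree over the whole range. */
static int32_t first_divergence(int32_t w_a, int32_t w_b, int shift,
                                int32_t limit)
{
    for (int32_t x = 1; x <= limit; x++)
        if (scale_round(x, w_a, shift) != scale_round(x, w_b, shift))
            return x;
    return -1;
}
```

For the assumed values, first_divergence(16383, 16384, 11, 4096) returns 1025:
that input gives 8199 with one coefficient and 8200 with the other. A one-unit
change in a coefficient is therefore enough to fail a bit-exact comparison,
even though most inputs still match.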
James Darnley (6):
  initial alignment corrections for xmm registers
  change explicit mmx register use to x264asm style
  add and fix xmm version of simple_idct
  avcodec/x86: cleanup simple_idct10
  add x86_64 8-bit simple_idct function
  change coeffs

 libavcodec/tests/x86/dct.c                |    5 +
 libavcodec/x86/idctdsp_init.c             |   11 +
 libavcodec/x86/proresdsp.asm              |    2 +-
 libavcodec/x86/simple_idct.asm            | 1242 +++++++++++++++--------------
 libavcodec/x86/simple_idct.h              |    4 +
 libavcodec/x86/simple_idct10.asm          |   18 +-
 libavcodec/x86/simple_idct10_template.asm |   64 +-
 7 files changed, 715 insertions(+), 631 deletions(-)
--
2.12.2