[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

Sat Jun 3 03:18:03 EEST 2017

Two ideas here.

The first 3 patches alter the old mmx code so that it can use xmm registers.  It
still uses only half of the available register width and adds a few shuffles,
so it isn't an ideal solution, though its output is exact compared with the mmx
version.  It seems to be moderately faster on Skylake despite the shuffles but
about the same speed on Yorkfield (like some of my previous work).  Possibly
useful if anybody still uses a 32-bit build on these CPUs.
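A rough C model of why a half-width xmm port can stay exact: if each 16-bit lane goes through the same arithmetic, placing four lanes in the low half of an eight-lane register and shuffling them back out changes no values.  This is only a sketch under that assumption; `mulhi16` is a hypothetical helper that mimics the `pmulhw` instruction, and the other names and the LCG input generator are also made up for illustration, not taken from the patches.

```c
#include <stdint.h>
#include <string.h>

/* Signed 16x16 -> high 16 bits of the 32-bit product, modelled on pmulhw.
 * Uses division so the floor rounding does not rely on the
 * implementation-defined right shift of negative values. */
static int16_t mulhi16(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (int16_t)(p >= 0 ? p / 65536 : -((-p + 65535) / 65536));
}

/* "MMX" model: one 64-bit register holds 4 int16 lanes. */
static void mul_4lane(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mulhi16(a[i], b[i]);
}

/* "Half-width XMM" model: the same 4 values sit in the low half of an
 * 8-lane register, the upper half is zero and ignored, and the low half
 * is copied back out (standing in for a shuffle). */
static void mul_8lane_lowhalf(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    int16_t xa[8] = {0}, xb[8] = {0}, xr[8];
    memcpy(xa, a, 4 * sizeof *a);
    memcpy(xb, b, 4 * sizeof *b);
    for (int i = 0; i < 8; i++)
        xr[i] = mulhi16(xa[i], xb[i]);
    memcpy(out, xr, 4 * sizeof *out);
}

/* Number of pseudo-random inputs (out of n) where the two models disagree;
 * 0 means they are bit-exact on that sample. */
static int count_mismatches(int n)
{
    uint32_t seed = 12345;
    int bad = 0;
    for (int t = 0; t < n; t++) {
        int16_t a[4], b[4], r4[4], r8[4];
        for (int i = 0; i < 4; i++) {
            seed = seed * 1664525u + 1013904223u;          /* LCG step */
            a[i] = (int16_t)((int32_t)(seed >> 16) - 32768);
            seed = seed * 1664525u + 1013904223u;
            b[i] = (int16_t)((int32_t)(seed >> 16) - 32768);
        }
        mul_4lane(a, b, r4);
        mul_8lane_lowhalf(a, b, r8);
        if (memcmp(r4, r8, sizeof r4) != 0)
            bad++;
    }
    return bad;
}
```

The real code obviously does more than one multiply per lane, but the same argument applies as long as the xmm version keeps the per-lane operations of the mmx version unchanged.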

The 4th patch is a bit of cleanup I did while reading and partly redoing the
10-bit simple_idct.  It uses the named registers to remove a little indirection.
They are not used everywhere yet.  It can be applied independently of the other
patches.

The last 2 are an attempt to reuse the 10- and 12-bit macros for an 8-bit
function.  I don't think the result is correct yet, perhaps due to rounding or
due to a small difference in the coefficients used.  Changing the coefficients
causes other errors.
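To illustrate how either a one-LSB coefficient difference or a different rounding step can break exactness, here is a small C sketch of a single fixed-point multiply-and-shift.  The coefficient values (16383 vs 16384), the shift of 11 and the function names are hypothetical, chosen only to make the effect easy to count; they are not claimed to be the constants used in the 8-bit or 10/12-bit templates.

```c
#include <stdint.h>

/* x * coeff / 2^shift, optionally adding half an LSB first (round to
 * nearest) instead of truncating.  Inputs are kept non-negative so the
 * right shift is well defined. */
static int32_t fixmul(int32_t x, int32_t coeff, int shift, int round)
{
    int32_t bias = round ? (1 << (shift - 1)) : 0;
    return (x * coeff + bias) >> shift;
}

/* Count inputs x in [0, max_x] where variant A and variant B disagree.
 * *first receives the smallest such x, or -1 if they always agree. */
static int compare(int32_t coeff_a, int round_a, int32_t coeff_b, int round_b,
                   int shift, int32_t max_x, int32_t *first)
{
    int n = 0;
    *first = -1;
    for (int32_t x = 0; x <= max_x; x++) {
        if (fixmul(x, coeff_a, shift, round_a) != fixmul(x, coeff_b, shift, round_b)) {
            if (*first < 0)
                *first = x;
            n++;
        }
    }
    return n;
}
```

With these numbers, changing the coefficient by one LSB alters the result for roughly half of the inputs in [0, 2047] (every x from 1025 up), and switching between truncation and round-to-nearest with the same coefficient also changes about half of them.  Each difference is only one output step, which is exactly the kind of small mismatch an exactness test would flag.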

James Darnley (6):
  initial alignment corrections for xmm registers
  change explicit mmx register use to x264asm style
  add and fix xmm version of simple_idct
  avcodec/x86: cleanup simple_idct10
  add x86_64 8-bit simple_idct function
  change coeffs

 libavcodec/tests/x86/dct.c                |    5 +
 libavcodec/x86/idctdsp_init.c             |   11 +
 libavcodec/x86/proresdsp.asm              |    2 +-
 libavcodec/x86/simple_idct.asm            | 1242 +++++++++++++++--------------
 libavcodec/x86/simple_idct.h              |    4 +
 libavcodec/x86/simple_idct10.asm          |   18 +-
 libavcodec/x86/simple_idct10_template.asm |   64 +-
 7 files changed, 715 insertions(+), 631 deletions(-)

-- 
2.12.2