[FFmpeg-devel] [PATCH 0/4] Port XvID iDCT to yasm syntax

Christophe Gisquet christophe.gisquet at gmail.com
Wed Mar 11 00:11:50 CET 2015

This patch series does not attempt to change the core implementation
of the iDCT.

The first patch is relatively straightforward. I've only dropped the
alignment on a series of jumps, which I didn't see helping at all.

The second patch is less so, as I've also tried to reuse tables. Some of
them seem to be similar to what can be found in, e.g., fdct.c. This MMX
code is not compiled for ARCH_X86_64. I also decided to edit the licence.

The last 2 patches are more questionable. They attempt to merge the
{put,add}_clamped and the iDCT for the SSE2 versions. This increases
object size only slightly, as the iDCT was always inlined in them.
For this merge, I favoured simplicity over minimizing code.
The gain is roughly 10 cycles (about 10%), which is hardly noticeable.
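
For readers unfamiliar with the {put,add}_clamped step, here is a minimal
scalar C sketch of what gets merged with the iDCT. It is not the code from
the patches: the function names, the row-major int16_t[64] block layout and
the idct callback are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a signed value to the 0..255 range of an 8-bit pixel. */
uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* "put": overwrite an 8x8 pixel area with the clamped block values.
 * Assumes block is 64 coefficients in row-major order. */
void put_clamped_8x8(uint8_t *dest, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * stride + x] = clamp_u8(block[y * 8 + x]);
}

/* "add": add the block values to the existing pixels, then clamp. */
void add_clamped_8x8(uint8_t *dest, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * stride + x] =
                clamp_u8(dest[y * stride + x] + block[y * 8 + x]);
}

/* Hypothetical callback type standing in for the in-place 8x8 iDCT. */
typedef void (*idct_fn)(int16_t *block);

/* Unmerged form: run the iDCT in place, then a separate pass that
 * reloads the block to clamp and store it. */
void idct_put(uint8_t *dest, ptrdiff_t stride, int16_t *block, idct_fn idct)
{
    idct(block);
    put_clamped_8x8(dest, stride, block);
}

void idct_add(uint8_t *dest, ptrdiff_t stride, int16_t *block, idct_fn idct)
{
    idct(block);
    add_clamped_8x8(dest, stride, block);
}
```

In a merged SIMD version, the clamp and store would presumably operate on
the iDCT results while they are still in registers, instead of writing the
block back and reloading it in a second pass.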

This has been tested under Win32 and Win64, on a 140000-frame video,
producing the expected CRC. The patch series passes FATE's xvid-idct
and xvid-custom-matrix tests.

However, Linux was not tested, and this is arguably sensitive code, so
further evaluation is welcome.

Christophe Gisquet (4):
  x86: xvid: port SSE2 idct to yasm
  x86: xvid_idct: port MMX IDCT to yasm
  x86: xvid_idct: merged idct_put SSE2 versions
  x86: xvid_idct: SSE2 merged add version

 libavcodec/x86/Makefile        |   3 +-
 libavcodec/x86/xvididct.asm    | 983 +++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/xvididct_init.c |  49 +-
 libavcodec/x86/xvididct_mmx.c  | 549 -----------------------
 libavcodec/x86/xvididct_sse2.c | 406 -----------------
 5 files changed, 1024 insertions(+), 966 deletions(-)
 create mode 100644 libavcodec/x86/xvididct.asm
 delete mode 100644 libavcodec/x86/xvididct_mmx.c
 delete mode 100644 libavcodec/x86/xvididct_sse2.c
