[FFmpeg-devel] [PATCH] Higher bit-depth x86 SIMD assembly for yadif

James Darnley james.darnley at gmail.com
Thu Jan 19 20:55:58 CET 2012


Attached are five patches which add:
mmx to sse4.1 code for 15 and 16 bits per sample
mmx to ssse3 code for 9 to 14 bits per sample
actual support for 9 bits per sample

I know that formats with 11 to 15 bits per sample don't exist at
present, but support might be added since H.264 allows up to 14 bits
per sample.  In any case, all the code added here is used by existing
features.

Below, I have copied the commit messages for convenience.

Something else to think about: the clarity of the source could be
greatly improved by using yasm and its preprocessor.  I wonder how much
abstraction it would take to roll the source of all three functions
into one, and whether that would reduce the source code size.

Subject: [PATCH 1/5] x86 SIMD for 16 bits per sample in yadif

It might be a rather dumb copy of the 8-bit SIMD, but it works and
produces output identical to the C code.  The MMX and SSE2 versions have
been tested on my Athlon64.  The SSSE3 and SSE4.1 versions need testing
and benchmarking elsewhere.

Benchmarks on the Athlon64 using a 704px wide video, per line:
1693075 decicycles in C, 521977 runs, 2311 skips
1029468 decicycles in mmx, 523347 runs, 941 skips
 730504 decicycles in sse2, 523474 runs, 814 skips

Subject: [PATCH 2/5] x86 SIMD for 9 to 14 bits per sample in yadif

These lower bit depths do not need unpacking to double words, which
lets the code process more pixels per iteration (still 2 in mmx but 6
in sse2) and avoids emulating the double word instructions that are
missing on older instruction sets.
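
To illustrate the two points above, here is a rough sketch in plain C.
It is not taken from the patch.  It assumes (my inference, not something
stated here) that the widest intermediate is the sum of two samples held
in a signed 16-bit lane, which would explain why the cut-off sits at 14
bits.  The helper names are hypothetical.  The second half shows the
usual compare-and-mask trick for standing in for a packed signed double
word maximum such as pmaxsd, which only arrived with SSE4.1; the patch
may well emulate different instructions or do it another way.

```c
#include <stdint.h>

/* Largest sample value at a given bit depth. */
int32_t max_sample(int bits)
{
    return ((int32_t)1 << bits) - 1;
}

/* ASSUMPTION: the widest intermediate is the sum of two samples kept in
 * a signed 16-bit lane.  Returns 1 if that sum still fits, 0 if the
 * samples would have to be unpacked to double words first. */
int sum_fits_in_word(int bits)
{
    return 2 * max_sample(bits) <= INT16_MAX;
}

/* Signed 32-bit maximum built from a compare mask.  This mirrors, one
 * lane at a time, how a compare (pcmpgtd) plus and/andnot/or can replace
 * a packed maximum on instruction sets that lack one. */
int32_t max_via_mask(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a > b);   /* all ones if a > b, else zero */
    return (a & mask) | (b & ~mask);
}
```

Under that assumption sum_fits_in_word() returns 1 for 9 to 14 bits and
0 for 15 and 16 bits, which matches the split between the two patches.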

Benchmarks on my Athlon64 using a 704 pixel wide video, per line:
1695927 decicycles in C, 260986 runs, 1158 skips
 854770 decicycles in mmx, 261717 runs, 427 skips
 440202 decicycles in sse2, 261829 runs, 315 skips

This works out at:
mmx - 1.20 times faster than the 16-bit mmx code
sse2 - 1.66 times faster than the 16-bit sse2 code
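
For anyone who wants to check the arithmetic, a small sketch in C that
recomputes these ratios from the two benchmark tables.  The struct and
function names are made up for this example; only the decicycle figures
come from the benchmarks above.

```c
#include <stdio.h>

/* Average decicycles per line, copied from the two benchmark tables. */
struct bench {
    const char *name;
    double bits16;      /* patch 1: 16 bits per sample */
    double bits9to14;   /* patch 2: 9 to 14 bits per sample */
};

const struct bench results[] = {
    { "C",    1693075.0, 1695927.0 },
    { "mmx",  1029468.0,  854770.0 },
    { "sse2",  730504.0,  440202.0 },
};

/* How many times faster `fast` is than `slow`. */
double speedup(double slow, double fast)
{
    return slow / fast;
}

/* Print, for each implementation, its speedup over C in both patches
 * and the ratio of the 16-bit code to the 9 to 14-bit code. */
void print_speedups(void)
{
    const struct bench *c = &results[0];
    for (int i = 1; i < 3; i++) {
        const struct bench *r = &results[i];
        printf("%-4s  vs C (16 bit): %.2f  vs C (9-14 bit): %.2f  "
               "16 bit vs 9-14 bit: %.2f\n",
               r->name,
               speedup(c->bits16, r->bits16),
               speedup(c->bits9to14, r->bits9to14),
               speedup(r->bits16, r->bits9to14));
    }
}
```

Calling print_speedups() gives last-column values that round to the 1.20
and 1.66 quoted above.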

Subject: [PATCH 3/5] cosmetic indent

Subject: [PATCH 4/5] Actually support 9-bit YUV in yadif

Subject: [PATCH 5/5] Update copyright headers in yadif related files




Attachments:

Name: 0001-x86-SIMD-for-16-bits-per-sample-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment.ksh>

Name: 0002-x86-SIMD-for-9-to-14-bits-per-sample-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0001.ksh>

Name: 0003-cosmetic-indent.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0002.ksh>

Name: 0004-Actually-support-9-bit-YUV-in-yadif.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0003.ksh>

Name: 0005-Update-copyright-headers-in-yadif-related-files.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120119/cd78453c/attachment-0004.ksh>

