[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter
David Conrad
lessen42
Wed Apr 1 11:06:59 CEST 2009
Hi,
Overall 17% faster decode on my Penryn, including the first function
to use SSE4 instructions in ffmpeg! (which shave an entire 2 clocks
off of vc1_h_loop_filter8 for me)
One thing I don't understand is why the PSIGNW_SRA_MMX macro is
necessary for correct results. I know that psignw isn't equivalent to
((a ^ b) - b), but the only difference I'm aware of is when b is 0,
psignw sets a to 0 as well. It's probably a stupidly simple case that
I'm missing...
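For reference, the comparison above can be written as a scalar model of a single 16-bit lane. This is only a sketch: the helper names are made up for illustration, and it assumes the MMX fallback first reduces b to a sign mask (0 or -1, as psraw by 15 would produce) before the xor and subtract. The name PSIGNW_SRA_MMX suggests that, but the macro body is not shown here.

```c
#include <stdint.h>

/* Wrap an int to 16 bits the way a packed-word instruction would. */
static int16_t wrap16(int v)
{
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* One lane of SSSE3 psignw: negate a if b < 0, zero it if b == 0,
 * pass it through unchanged if b > 0. */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b == 0)
        return 0;
    return b < 0 ? wrap16(-(int)a) : a;
}

/* Hypothetical MMX-style emulation: (a ^ m) - m, where m is the sign
 * mask of b (assumed to be what an arithmetic shift by 15 yields). */
static int16_t xorsub_lane(int16_t a, int16_t b)
{
    int m = b < 0 ? -1 : 0;
    return wrap16(((int)a ^ m) - m);
}

/* Nonzero when the two forms disagree for this pair of inputs. */
static int lanes_differ(int16_t a, int16_t b)
{
    return psignw_lane(a, b) != xorsub_lane(a, b);
}
```

Under these assumptions the two forms agree whenever b is nonzero, including a = -32768, where both wrap back to -32768. They disagree only for b == 0 with a != 0, which matches the difference described above.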
700 dezicycles in vc1_v_loop_filter4_mmx2, 1048506 runs, 70 skips
639 dezicycles in vc1_v_loop_filter4_ssse3, 1048447 runs, 129 skips
977 dezicycles in vc1_h_loop_filter4_mmx2, 2097069 runs, 83 skips
951 dezicycles in vc1_h_loop_filter4_ssse3, 2097040 runs, 112 skips
1116 dezicycles in vc1_v_loop_filter8_mmx2, 33552803 runs, 1629 skips
677 dezicycles in vc1_v_loop_filter8_ssse3, 33552817 runs, 1615 skips
1648 dezicycles in vc1_h_loop_filter8_mmx2, 33552806 runs, 1626 skips
1158 dezicycles in vc1_h_loop_filter8_ssse3, 33552878 runs, 1554 skips
1137 dezicycles in vc1_h_loop_filter8_sse4, 33553447 runs, 985 skips
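To put the timings above in cycles and relative terms, a small helper like the following can be used. It is a sketch with made-up function names, and it assumes the counts are tenths of a cycle, as the "dezicycles" unit indicates. These are per-call figures, so they are not directly comparable to the overall 17% decode speedup.

```c
#include <stdio.h>

/* The timer output is in tenths of a cycle ("dezicycles"). */
static double dezicycles_to_cycles(long dezicycles)
{
    return dezicycles / 10.0;
}

/* Percentage of per-call time saved going from `before` to `after`. */
static double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}

/* Print one comparison line for a pair of timings. */
static void report(const char *name, long before, long after)
{
    printf("%-22s %6.1f -> %6.1f cycles, %4.1f%% saved\n",
           name,
           dezicycles_to_cycles(before),
           dezicycles_to_cycles(after),
           percent_saved(before, after));
}
```

For example, report("vc1_v_loop_filter8", 1116, 677) compares the MMX2 and SSSE3 versions of the 8-pixel vertical filter, and dezicycles_to_cycles(1158 - 1137) gives the roughly 2 cycles that the SSE4 version saves over SSSE3 in vc1_h_loop_filter8.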
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vc1-sse-lf.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090401/ab929e72/attachment.txt>