[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

Sun Jul 4 12:54:15 CEST 2010

On Wed, Apr 01, 2009 at 05:06:59AM -0400, David Conrad wrote:
> Overall 17% faster decode on my Penryn, including the first function
> to use SSE4 instructions in ffmpeg! (which shave an entire 2 clocks
> off of vc1_h_loop_filter8 for me)
> 
> One thing I don't understand is why the PSIGNW_SRA_MMX macro is
> necessary for correct results. I know that psignw isn't equivalent
> to ((a ^ b) - b), but the only difference I'm aware of is that when b
> is 0, psignw sets a to 0 as well. It's probably a stupidly simple case
> that I'm missing...
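For reference, the operations being compared can be modeled one 16-bit lane at a time in C. This is only a sketch, and the function names are hypothetical. The assumption that PSIGNW_SRA_MMX builds a 0/-1 mask from the sign of b (as psraw by 15 would) is inferred from the macro's name alone. The model makes two points visible. First, psignw and the mask form agree for every nonzero b and differ only when b == 0. Second, ((a ^ b) - b) is a conditional negation only when b is already 0 or -1.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of SSSE3 psignw a, b:
 * -a when b < 0, 0 when b == 0, a when b > 0. Only the sign of b matters.
 * (a == INT16_MIN is not considered here.) */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)-a;
    if (b == 0)
        return 0;
    return a;
}

/* Hypothetical model of an SRA-style fallback: m is 0 or -1 depending on
 * the sign of b (what an arithmetic shift right by 15 produces), then
 * (a ^ m) - m gives a for m == 0 and -a for m == -1. It never yields 0
 * just because b is 0. */
static int16_t sign_sra_lane(int16_t a, int16_t b)
{
    int m = (b < 0) ? -1 : 0;
    return (int16_t)((a ^ m) - m);
}

/* The bare expression with b used directly instead of a 0/-1 mask:
 * a conditional negation only when b is 0 or -1. */
static int16_t xor_sub_lane(int16_t a, int16_t b)
{
    return (int16_t)((a ^ b) - b);
}
```

With a = 5, both psignw_lane and sign_sra_lane return -5 for b = -7. For b = 0, however, psignw_lane returns 0 while sign_sra_lane returns 5. Using b unmasked, xor_sub_lane(5, 3) returns 3 rather than 5.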
> 
> 
> 700 dezicycles in vc1_v_loop_filter4_mmx2, 1048506 runs, 70 skips.
> 639 dezicycles in vc1_v_loop_filter4_ssse3, 1048447 runs, 129 skips
> 
> 977 dezicycles in vc1_h_loop_filter4_mmx2, 2097069 runs, 83 skips.
> 951 dezicycles in vc1_h_loop_filter4_ssse3, 2097040 runs, 112 skips
> 
> 1116 dezicycles in vc1_v_loop_filter8_mmx2, 33552803 runs, 1629 skips
> 677 dezicycles in vc1_v_loop_filter8_ssse3, 33552817 runs, 1615 skips
> 
> 1648 dezicycles in vc1_h_loop_filter8_mmx2, 33552806 runs, 1626 skips
> 1158 dezicycles in vc1_h_loop_filter8_ssse3, 33552878 runs, 1554 skips
> 1137 dezicycles in vc1_h_loop_filter8_sse4, 33553447 runs, 985 skips
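The figures above appear to be output from FFmpeg's START_TIMER/STOP_TIMER macros, which report average time in dezicycles (tenths of a cycle). The small C helper below, with hypothetical names, turns the quoted numbers into per-function SSSE3-over-MMX2 speedups. It also checks that the SSE4 gain on vc1_h_loop_filter8 (1158 vs 1137 dezicycles) comes to about 2.1 cycles, which matches the "2 clocks" mentioned above. These are per-call averages for the filter functions only, so they do not by themselves account for the 17% overall decode figure.

```c
#include <assert.h>
#include <stdio.h>

/* Average cost per call, in dezicycles (tenths of a cycle), copied from the mail. */
struct timing {
    const char *name;
    int mmx2;
    int ssse3;
};

static const struct timing timings[] = {
    { "vc1_v_loop_filter4",  700,  639 },
    { "vc1_h_loop_filter4",  977,  951 },
    { "vc1_v_loop_filter8", 1116,  677 },
    { "vc1_h_loop_filter8", 1648, 1158 },
};

/* Speedup of the SSSE3 version over the MMX2 version. */
static double speedup(const struct timing *t)
{
    return (double)t->mmx2 / t->ssse3;
}

/* Convert dezicycles to cycles. */
static double dezi_to_cycles(int dezicycles)
{
    return dezicycles / 10.0;
}

static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
        printf("%-20s %.2fx\n", timings[i].name, speedup(&timings[i]));
    /* vc1_h_loop_filter8: SSSE3 at 1158 vs SSE4 at 1137 dezicycles */
    printf("SSE4 saves %.1f cycles\n", dezi_to_cycles(1158 - 1137));
}
```

Calling print_speedups() lists about 1.10x, 1.03x, 1.65x and 1.42x for the four functions. It also reports 2.1 cycles saved by the SSE4 version.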

It seems this got lost?