[Ffmpeg-devel] [PATCH] Snow mmx+sse2 asm optimizations
Sun Feb 5 18:47:14 CET 2006
I've written assembly SIMD optimizations (MMX, SSE2) for three parts of
snow. These changes include:
- MMX and SSE2 code for the bottom part of add_yblock_buffered.
- Left shifting the OBMC tables by 2, and updating parts of the code
to work with the change. This makes for somewhat faster code by
eliminating some shift operations in the innermost loop of
- vertical compose has a straightforward SIMD implementation.
- horizontal compose has been substantially modified internally to allow
for an efficient SIMD implementation and to improve cache performance.
For plain C code, it may be faster or slower on your system (faster on
mine). The largest change is that it is now almost entirely in-place and
only half of the temp buffer is used. An extra step, interleave_line(),
has been added because the in-place lifts do not leave the coefficients
in the proper places. This code is extremely fast in SIMD.
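
To illustrate the interleave step described in the last item, here is a
minimal sketch in plain C. It is not the code from the patch. It assumes
that after the in-place lifts the low-pass coefficients sit packed at the
front of the line and the high-pass coefficients sit in a separate
half-size buffer. The name interleave_line_sketch and its signature are
made up for this example.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the patch's interleave_line().
 * Assumed layout on entry:
 *   line[0 .. (width+1)/2 - 1]  low-pass coefficients, packed
 *   high[0 .. width/2 - 1]      high-pass coefficients (half-size buffer)
 * Layout on exit:
 *   line[] = low0, high0, low1, high1, ...
 */
static void interleave_line_sketch(int *line, const int *high, int width)
{
    int i;

    assert(width >= 0);
    i = width - 1;

    /* Odd width: the last sample is a low-pass one with no high partner. */
    if (width & 1) {
        line[i] = line[i >> 1];
        i--;
    }

    /*
     * Walk backwards so that each packed low-pass value is read before
     * the slot it lives in gets overwritten.
     */
    for (; i > 0; i -= 2) {
        line[i]     = high[i >> 1];
        line[i - 1] = line[i >> 1];
    }
}
```

Walking from the end of the line is what lets the low-pass half be
expanded in place. A forward loop would overwrite low-pass values before
they were moved.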
I am aware that conditional compilation for SIMD code is frowned upon,
so could someone give me some feedback on how my code could be
efficiently done using function pointers like the other SIMD
optimizations in ffmpeg? Some functions (interleave_line, 8x8 obmc) take
barely 500 clocks to finish.
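
For reference, the function-pointer approach asked about above could look
roughly like the sketch below. All names here (SnowDSPSketch,
snow_dsp_init_sketch, the SKETCH_CPU_* flags) are hypothetical and are not
ffmpeg's actual API. The "SIMD" variants are stubs that forward to the C
code, so the example only shows the dispatch pattern.

```c
#include <assert.h>

/* Hypothetical CPU capability flags, for this sketch only. */
#define SKETCH_CPU_MMX  0x1u
#define SKETCH_CPU_SSE2 0x2u

typedef void (*interleave_fn)(int *dst, const int *low,
                              const int *high, int width);

/* Hypothetical table of kernels, filled in once at init time. */
typedef struct SnowDSPSketch {
    interleave_fn interleave_line;
} SnowDSPSketch;

/* Plain C reference: dst = low0, high0, low1, high1, ... */
static void interleave_line_c(int *dst, const int *low,
                              const int *high, int width)
{
    int i;
    assert(width >= 0);
    for (i = 0; i < width; i++)
        dst[i] = (i & 1) ? high[i >> 1] : low[i >> 1];
}

/* Stand-ins for real MMX/SSE2 kernels: they just call the C version. */
static void interleave_line_mmx_stub(int *dst, const int *low,
                                     const int *high, int width)
{
    interleave_line_c(dst, low, high, width);
}

static void interleave_line_sse2_stub(int *dst, const int *low,
                                      const int *high, int width)
{
    interleave_line_c(dst, low, high, width);
}

/* Choose the best available kernel once; later flags override earlier. */
static void snow_dsp_init_sketch(SnowDSPSketch *dsp, unsigned cpu_flags)
{
    dsp->interleave_line = interleave_line_c;
    if (cpu_flags & SKETCH_CPU_MMX)
        dsp->interleave_line = interleave_line_mmx_stub;
    if (cpu_flags & SKETCH_CPU_SSE2)
        dsp->interleave_line = interleave_line_sse2_stub;
}
```

With this layout the CPU check happens once at init, and each call site
pays one indirect call. Whether that overhead matters for functions this
short would have to be measured.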
Also, if anyone has any ideas on how to clean up horizontal_compose or
the add_yblock asm without sacrificing much speed, that would be much
appreciated.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 77281 bytes
Desc: not available