[FFmpeg-devel] [PATCH] BGR24 Huffyuv and drive-by bug fixes
Loren Merritt
lorenm
Sat Oct 17 19:12:23 CEST 2009
On Sat, 17 Oct 2009, Alexander Strange wrote:
> On Oct 15, 2009, at 5:53 PM, Michael Niedermayer wrote:
>
>> it might be as fast to handle 3-byte groups in C, but it's not clear how SIMD
>> would behave with that
>
> I don't think it applies here.
>
> The decoder profile looks like:
>
> 73.5% 73.5% ffmpeg_g decode_bgr_bitstream
> 8.1% 8.1% ffmpeg_g add_hfyu_left_prediction_bgr24_c
> 1.1% 1.1% ffmpeg_g bswap_buf
> 1.1% 1.1% ffmpeg_g add_bytes_mmx
>
> so it's entirely VLC-lookup bound (on angels_480-huffyuvcompress.avi/x86-64).
> The first two functions already can't be easily SIMDed, and the second two
> work just as well in either case.
add_hfyu_left_prediction_bgr32 can be SIMDed. Just use the low half of an
MMX register (one 4-byte pixel) to add 3 samples at a time.
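To illustrate, here is a rough sketch in portable C rather than MMX. It assumes packed 4-byte pixels and uses a 32-bit word as a stand-in for the low half of an MMX register, with add_bytes4 emulating a per-byte wrapping add (what paddb does per lane). The names add_bytes4 and add_left_pred_bgr32_packed are made up for this example and are not the FFmpeg functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-byte wrapping add of four packed bytes: add the low 7 bits of
 * each lane normally, then fix up the top bit with XOR so that no
 * carry crosses a byte boundary. */
static uint32_t add_bytes4(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Left prediction over packed 4-byte pixels:
 * dst pixel i = dst pixel i-1 + src pixel i, per channel, mod 256.
 * left[] holds the running pixel on entry and is updated on return. */
static void add_left_pred_bgr32_packed(uint8_t *dst, const uint8_t *src,
                                       size_t npixels, uint8_t left[4])
{
    uint32_t acc, px;

    assert(dst != NULL && src != NULL && left != NULL);
    memcpy(&acc, left, 4);
    for (size_t i = 0; i < npixels; i++) {
        memcpy(&px, src + 4 * i, 4);   /* one whole pixel per load */
        acc = add_bytes4(acc, px);     /* all channels in one add  */
        memcpy(dst + 4 * i, &acc, 4);
    }
    memcpy(left, &acc, 4);
}
```

The fourth (alpha) lane is accumulated along with the other three, which costs nothing extra. Since every lane is independent, the byte order of the host does not matter here.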
The same works for bgr24, except then the data is unaligned, so the
loads/stores are slower. Shuffles may work better, and are still
conceptually simple, but are more annoying to write.
Hmm, even the yuv version can be done with a log-depth addition tree.
Dunno if that's faster than C.
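A sketch of what a log-depth addition tree could look like for a single plane, again in portable C rather than real SIMD. It assumes 8 samples per step held in a 64-bit word, with shifts by 1, 2 and 4 bytes building the running sums in three packed adds instead of eight serial ones. add_bytes8 and add_left_pred_tree are hypothetical names, and whether this beats the plain C loop is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte wrapping add of eight packed bytes (no carry between lanes). */
static uint64_t add_bytes8(uint64_t a, uint64_t b)
{
    uint64_t low7 = (a & 0x7f7f7f7f7f7f7f7fULL) + (b & 0x7f7f7f7f7f7f7f7fULL);
    return low7 ^ ((a ^ b) & 0x8080808080808080ULL);
}

/* dst[i] = left + src[0] + ... + src[i] (mod 256); returns the last sum. */
static uint8_t add_left_pred_tree(uint8_t *dst, const uint8_t *src,
                                  size_t w, uint8_t left)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    for (; i + 8 <= w; i += 8) {
        uint64_t x = 0;
        /* Pack so that sample k sits in byte lane k. */
        for (int k = 0; k < 8; k++)
            x |= (uint64_t)src[i + k] << (8 * k);

        x = add_bytes8(x, x << 8);   /* lane k: sum of 2 samples */
        x = add_bytes8(x, x << 16);  /* lane k: sum of 4 samples */
        x = add_bytes8(x, x << 32);  /* lane k: sum of 8 samples */

        /* Add the carry-in from the previous group to every lane. */
        x = add_bytes8(x, left * 0x0101010101010101ULL);

        for (int k = 0; k < 8; k++)
            dst[i + k] = (uint8_t)(x >> (8 * k));
        left = dst[i + 7];
    }
    /* Scalar tail for widths that are not a multiple of 8. */
    for (; i < w; i++) {
        left = (uint8_t)(left + src[i]);
        dst[i] = left;
    }
    return left;
}
```

Each group still depends on the last sum of the previous group, so the serial chain is shortened rather than removed.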
--Loren Merritt