[FFmpeg-devel] huffyuv optimization
Loren Merritt
lorenm
Wed May 23 13:18:07 CEST 2007
I have a potential optimization that could add another 1.5x speedup to
huffyuv/ffvhuff decoding, but it doesn't quite work as-is, and I'm
looking for ideas to salvage it.
Here's an oprofile of the current decoder (running on a core2):
samples   %        symbol name
475208    47.5263  decode_*_bitstream
442444    44.2496  add_median_prediction
 61532     6.1540  build_table
 13023     1.3025  read_huffman_tables
  6040     0.6040  bswap_buf
And for comparison, an oprofile of the current encoder:
samples   %        symbol name
299158    81.0695  encode_*_bitstream
 36627     9.9256  sub_hfyu_median_prediction_mmx2
 20002     5.4204  generate_len_table
  5554     1.5051  generate_bits_table
  4413     1.1959  bswap_buf
  1546     0.4189  encode_frame
Both runs covered the same content and the same number of frames, so the
absolute number of samples is comparable. Yes, decoding is 2.7x slower
than encoding.
A large part of the difference is that sub_median_prediction is simd'ed
and add_median_prediction isn't. When encoding, we have all the pixels
and can compute the prediction in any order. When decoding, each pixel
depends on its neighbors, so we can't easily predict multiple pixels in
parallel...
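To make the dependency concrete, here's a minimal scalar sketch of the
median predictor (function names here are illustrative, not FFmpeg's exact
code): each pixel is reconstructed as residual plus
median(left, top, left+top-topleft), so pixel x can't start until pixel
x-1 has been fully reconstructed.

```c
#include <stdint.h>

/* Median of three values. */
static int mid_pred(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }  /* now a <= b */
    if (c < a) return a;
    if (c > b) return b;
    return c;
}

/* Illustrative scalar reconstruction of one row. 'left'/'left_top'
 * carry state into the next call, mirroring how huffyuv threads the
 * predictor across a row. Note the serial chain: each iteration reads
 * 'l', which the previous iteration wrote. */
static void add_median_prediction_c(uint8_t *dst, const uint8_t *top,
                                    const uint8_t *diff, int w,
                                    int *left, int *left_top)
{
    int l = *left, lt = *left_top;
    for (int x = 0; x < w; x++) {
        int pred = mid_pred(l, top[x], l + top[x] - lt);
        l  = (pred + diff[x]) & 0xFF;  /* add residual mod 256 */
        lt = top[x];
        dst[x] = l;
    }
    *left     = l;
    *left_top = lt;
}
```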
But there is still some possible parallelism. The neighbors used are
only left, top, and topleft. So if we rearrange the pixels to load a
diagonal stripe into an mmreg, then simd can be applied. (This
rearrangement is simply transpose+skew, and the skew comes free with
stride manipulation.)
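A toy demo of the diagonal idea (a sketch, not the actual SIMD code):
pixel (x,y) depends only on diagonals x+y-1 and x+y-2, so every pixel on
one anti-diagonal is independent and the whole stripe could sit in one
mmreg. The demo decodes a small interior region in raster order and in
anti-diagonal order and gets identical results; it assumes the first row
and column are already decoded, and it deliberately ignores the row-wrap
"virtual left" dependency, which is exactly what breaks this for real
huffyuv streams.

```c
#include <stdint.h>

enum { W = 4, H = 4 };  /* toy image size */

static int medpred(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (c < a) return a;
    if (c > b) return b;
    return c;
}

/* Reference: raster-scan order, one pixel at a time. */
static void decode_raster(uint8_t img[H][W], const uint8_t res[H][W])
{
    for (int y = 1; y < H; y++)
        for (int x = 1; x < W; x++) {
            int p = medpred(img[y][x-1], img[y-1][x],
                            img[y][x-1] + img[y-1][x] - img[y-1][x-1]);
            img[y][x] = (p + res[y][x]) & 0xFF;
        }
}

/* Same computation, but walking anti-diagonals x+y = d. Each pixel on
 * a diagonal reads only diagonals d-1 and d-2, so the inner loop's
 * iterations are independent: this is the stripe a SIMD version would
 * load into one register (the transpose+skew rearrangement). */
static void decode_diagonal(uint8_t img[H][W], const uint8_t res[H][W])
{
    for (int d = 2; d <= (W - 1) + (H - 1); d++)
        for (int y = 1; y < H; y++) {
            int x = d - y;
            if (x < 1 || x >= W)
                continue;
            int p = medpred(img[y][x-1], img[y-1][x],
                            img[y][x-1] + img[y-1][x] - img[y-1][x-1]);
            img[y][x] = (p + res[y][x]) & 0xFF;
        }
}
```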
However, huffyuv also keeps the last pixel in a row as the virtual left
neighbor of the first pixel in the next row. I don't think it helps
compression in any way, and it's not any simpler than, say, using the
first pixel of the previous row. With this dependency, I can't implement
the above optimization.
I could modify ffvhuff to remove this dependency, but that wouldn't
optimize decoding of existing files.
--Loren Merritt