[FFmpeg-devel] huffyuv optimization

Wed May 23 13:18:07 CEST 2007

I nearly have an optimization that could give huffyuv/ffvhuff decoding
another 1.5x speedup, and I'm looking for ideas to salvage it.

Here's an oprofile of the current decoder (running on a core2):
samples  %        symbol name
475208   47.5263  decode_*_bitstream
442444   44.2496  add_median_prediction
61532     6.1540  build_table
13023     1.3025  read_huffman_tables
6040      0.6040  bswap_buf

And for comparison, an oprofile of the current encoder:
samples  %        symbol name
299158   81.0695  encode_*_bitstream
36627     9.9256  sub_hfyu_median_prediction_mmx2
20002     5.4204  generate_len_table
5554      1.5051  generate_bits_table
4413      1.1959  bswap_buf
1546      0.4189  encode_frame

Both runs covered the same content and the same number of frames, so the
absolute number of samples is comparable. Yes, decoding is 2.7x slower
than encoding.
A large part of the difference is that sub_hfyu_median_prediction is
simd'ed (the mmx2 version in the profile above) and add_median_prediction
isn't. When encoding, we have all the pixels
and can compute the prediction in any order. When decoding, each pixel
depends on its neighbors, so we can't easily predict multiple pixels in
parallel...
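
To illustrate the asymmetry, here is a minimal scalar sketch in C. It
assumes the usual median predictor, median(left, top, left + top - topleft);
the names median3, sub_median_row and add_median_row are made up for this
example and are not the actual FFmpeg routines.

```c
#include <assert.h>
#include <stdint.h>

/* Median of three values. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) b = c;                        /* b = min(max(a,b), c) */
    return a > b ? a : b;
}

/* Encoder side (illustrative): every input pixel is already known, so no
 * iteration depends on the result of another and the loop can be simd'ed. */
static void sub_median_row(uint8_t *diff, const uint8_t *top,
                           const uint8_t *cur, int w,
                           uint8_t left, uint8_t topleft)
{
    assert(w >= 0);
    for (int i = 0; i < w; i++) {
        int l    = i ? cur[i - 1] : left;
        int tl   = i ? top[i - 1] : topleft;
        int pred = median3(l, top[i], (l + top[i] - tl) & 0xFF);
        diff[i]  = (uint8_t)(cur[i] - pred);
    }
}

/* Decoder side (illustrative): the left neighbor is the pixel we just
 * reconstructed, so each iteration has to wait for the previous one. */
static void add_median_row(uint8_t *dst, const uint8_t *top,
                           const uint8_t *diff, int w,
                           uint8_t left, uint8_t topleft)
{
    int l = left, tl = topleft;
    assert(w >= 0);
    for (int i = 0; i < w; i++) {
        int pred = median3(l, top[i], (l + top[i] - tl) & 0xFF);
        l      = (uint8_t)(pred + diff[i]);  /* serial dependency on l */
        tl     = top[i];
        dst[i] = (uint8_t)l;
    }
}
```

In the second loop the value of l feeds straight into the next iteration,
which is the chain that blocks a straightforward simd version.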
But there is still some possible parallelism. The neighbors used are
only left, top, and topleft. So if we rearrange the pixels to load a
diagonal stripe into an mmreg, then simd can be applied. (This
rearrangement is simply transpose+skew, and the skew comes free with
stride manipulation.)
However, huffyuv also keeps the last pixel in a row as the virtual left
neighbor of the first pixel in the next row. I don't think it helps
compression in any way, and it's not any simpler than, say, using the
first pixel of the previous row. With this dependency, I can't implement
the above optimization.
I could modify ffvhuff to remove this dependency, but that wouldn't
optimize decoding of existing files.
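
For illustration, a scalar sketch of the diagonal order, written in C. It
is not the simd code, only the traversal. It assumes a HYPOTHETICAL
boundary rule without the row-wrap dependency (row 0 sees zeros above,
column 0 uses the pixel directly above as its left and topleft), and the
function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) b = c;
    return a > b ? a : b;
}

/* Predict pixel (x, y) from left, top and topleft.
 * The boundary handling is hypothetical, NOT huffyuv's actual rule. */
static int predict(const uint8_t *img, int stride, int x, int y)
{
    int t  = y ? img[(y - 1) * stride + x] : 0;
    int l  = x ? img[y * stride + x - 1] : t;
    int tl = (x && y) ? img[(y - 1) * stride + x - 1] : t;
    return median3(l, t, (l + t - tl) & 0xFF);
}

/* Reference: plain raster order. */
static void decode_raster(uint8_t *img, const uint8_t *res, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            img[y * w + x] = (uint8_t)(predict(img, w, x, y) + res[y * w + x]);
}

/* Diagonal order: all pixels with x + y == d only need pixels from
 * diagonals d-1 and d-2, so the inner loop iterations are independent.
 * Those iterations are the ones a simd version would run in parallel. */
static void decode_wavefront(uint8_t *img, const uint8_t *res, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int d = 0; d < w + h - 1; d++) {
        int y0 = d < w ? 0 : d - w + 1;
        int y1 = d < h ? d : h - 1;
        for (int y = y0; y <= y1; y++) {
            int x = d - y;
            img[y * w + x] = (uint8_t)(predict(img, w, x, y) + res[y * w + x]);
        }
    }
}
```

Both functions produce the same picture from the same residuals. With the
real huffyuv layout, pixel (0, y) needs pixel (w-1, y-1), which sits on
diagonal w+y-2 and is therefore not decoded before diagonal y for any frame
wider than one pixel, so this traversal no longer works.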

--Loren Merritt