[FFmpeg-devel] vc1dsp: introduce cases for 8x8 and 16x16
Ronald S. Bultje
rsbultje at gmail.com
Sun Apr 20 19:21:09 CEST 2014
Hi,
On Sun, Apr 20, 2014 at 12:16 PM, Christophe Gisquet <christophe.gisquet at gmail.com> wrote:
> I noticed the 16x16 partitions were actually using 4 calls to the 8x8 MC
> code.
>
Patch OK.
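To make the quoted observation concrete, here is a plain-C sketch of the two shapes: a 16x16 block assembled from four 8x8 calls versus a dedicated 16x16 loop. The kernel is a hypothetical stand-in (a rounded average of each pixel and its right neighbour), not the actual VC-1 filter in vc1dsp, and the names `mc_put_8x8`, `mc_put_16x16_via_8x8`, `mc_put_16x16` and `paths_agree` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8x8 MC kernel: rounded average of each pixel and its right
 * neighbour (a simple horizontal half-pel), standing in for the VC-1 filter. */
static void mc_put_8x8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}

/* 16x16 built from four 8x8 calls, one per quadrant. */
static void mc_put_16x16_via_8x8(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride)
{
    mc_put_8x8(dst,                  src,                  stride);
    mc_put_8x8(dst + 8,              src + 8,              stride);
    mc_put_8x8(dst + 8 * stride,     src + 8 * stride,     stride);
    mc_put_8x8(dst + 8 * stride + 8, src + 8 * stride + 8, stride);
}

/* Dedicated 16x16 case: a single call looping over 16 rows of 16 pixels. */
static void mc_put_16x16(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}

/* Returns 1 if both 16x16 paths give identical output on a test pattern. */
static int paths_agree(void)
{
    enum { STRIDE = 32 };
    uint8_t src[16 * STRIDE];
    uint8_t d1[16 * STRIDE] = {0}, d2[16 * STRIDE] = {0};
    for (int i = 0; i < 16 * STRIDE; i++)
        src[i] = (uint8_t)(i * 7 + 3);
    mc_put_16x16_via_8x8(d1, src, STRIDE);
    mc_put_16x16(d2, src, STRIDE);
    return memcmp(d1, d2, sizeof(d1)) == 0;
}
```

Both paths produce the same pixels; the dedicated version only avoids three extra function calls and whatever per-call setup the kernel performs, which is where a separate 16x16 case can save time.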
> Note: I tried to at least vertically unroll the MMX code in the 16x16
> case, but that somehow slowed the decoder back down to its original speed. I
> didn't bother further because of the aforementioned reason.
Vertically? Maybe the code size is just too big (a cache issue). I mean, 16
lines is quite a lot to unroll (I'm assuming you tried to unroll all 16; you
didn't specify, so I may be wrong). AFAIR, even h264 doesn't unroll much
vertically, just horizontally. I wouldn't expect much speed gain beyond an
unroll by 2 for instruction pairing anyway, so I guess no vertical unroll at
all should be fine for w=16.
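A rough illustration of what a vertical unroll by 2 means for a w=16 kernel, written in plain C rather than MMX: the second function handles two rows per loop iteration, so each iteration carries two independent rows of work that a hand-scheduled SIMD version could interleave for pairing. The averaging kernel and the names `avg16_rows`, `avg16_rows_x2` and `unroll_matches` are hypothetical, and the unrolled version assumes an even height.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical w=16 kernel (rounded average of two sources) with no
 * vertical unroll: exactly one row per loop iteration. */
static void avg16_rows(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
        dst += stride;
        a   += stride;
        b   += stride;
    }
}

/* Same kernel unrolled vertically by 2: each iteration computes two rows
 * whose results do not depend on each other. Assumes h is even. */
static void avg16_rows_x2(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                          ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < 16; x++) {
            dst[x]          = (uint8_t)((a[x] + b[x] + 1) >> 1);
            dst[x + stride] = (uint8_t)((a[x + stride] + b[x + stride] + 1) >> 1);
        }
        dst += 2 * stride;
        a   += 2 * stride;
        b   += 2 * stride;
    }
}

/* Returns 1 if both variants produce the same 16x16 output. */
static int unroll_matches(void)
{
    enum { STRIDE = 32, H = 16 };
    uint8_t a[H * STRIDE], b[H * STRIDE];
    uint8_t d1[H * STRIDE] = {0}, d2[H * STRIDE] = {0};
    for (int i = 0; i < H * STRIDE; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(255 - i);
    }
    avg16_rows(d1, a, b, STRIDE, H);
    avg16_rows_x2(d2, a, b, STRIDE, H);
    return memcmp(d1, d2, sizeof(d1)) == 0;
}
```

Fully unrolling all 16 rows would instead replicate the loop body 16 times, which is the code-size concern mentioned in the reply.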
Ronald