[FFmpeg-devel] vc1dsp: introduce cases for 8x8 and 16x16

Christophe Gisquet christophe.gisquet at gmail.com
Sun Apr 20 19:46:21 CEST 2014


btw it passes x86 FATE, but I haven't tested the ARM modifications.
They should be straightforward, and if I still managed to make a
mistake there, it should be easy to spot.

2014-04-20 19:21 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
> Vertically? Maybe the code size is just too big (cache issue), I mean, 16
> lines is quite a lot to unroll (I'm assuming you tried to unroll to 16, you
> didn't specify so I may be wrong).

Currently, the x86 code still does 4 calls to the _mc functions. The
actual MMX functions have a loop on the previous height of 16. What I
did was instead pass the height as a parameter to the function, so as
to perform 2 calls instead of 4.
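
To illustrate the idea only (this is not the patch itself), here is a
minimal C sketch. The names put_mc_8xh, put_16x16_four_calls and
put_16x16_two_calls are hypothetical, the 2-tap average merely stands
in for the real VC-1 filter, and the four-call variant is assumed to
cover the 16x16 block as four 8x8 quadrants:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-pixel-wide kernel with the height passed as a parameter.
 * The horizontal 2-tap average is a placeholder, not the VC-1 filter.
 * Reads src[0..8] on each of the h rows. */
static void put_mc_8xh(uint8_t *dst, const uint8_t *src,
                       ptrdiff_t stride, int h)
{
    assert(h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}

/* Assumed "before" shape: one call per 8x8 quadrant, 4 calls in total. */
static void put_16x16_four_calls(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride)
{
    put_mc_8xh(dst,                  src,                  stride, 8);
    put_mc_8xh(dst + 8,              src + 8,              stride, 8);
    put_mc_8xh(dst + 8 * stride,     src + 8 * stride,     stride, 8);
    put_mc_8xh(dst + 8 * stride + 8, src + 8 * stride + 8, stride, 8);
}

/* "After" shape: the height parameter lets each 8-wide column be done
 * in a single call, so 2 calls in total. */
static void put_16x16_two_calls(uint8_t *dst, const uint8_t *src,
                                ptrdiff_t stride)
{
    put_mc_8xh(dst,     src,     stride, 16);
    put_mc_8xh(dst + 8, src + 8, stride, 16);
}
```

Both wrappers write the same 16x16 output; the second only halves the
number of calls. It says nothing about the size of the intermediates,
which this sketch does not model.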

Like you, aside from the low level of optimization by today's
standards, I was also suspecting the increased size of the
intermediates (iirc 12x16 int16_t in that scenario).

> Afair even h264 doesn't unroll much
> vertically, just horizontally. I wouldn't expect much speed gain beyond an
> unroll by 2 for instruction pairing anyway, so I guess no v unroll should
> be fine for w=16.

Yeah, I was unclear: unrolling by more than 4 generally causes more
register pressure than anything else.

