[FFmpeg-devel] vc1dsp: introduce cases for 8x8 and 16x16
Christophe Gisquet
christophe.gisquet at gmail.com
Mon Apr 21 12:16:15 CEST 2014
2014-04-20 19:21 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
> Vertically? Maybe the code size is just too big (cache issue), I mean, 16
> lines is quite a lot to unroll (I'm assuming you tried to unroll to 16, you
> didn't specify so I may be wrong). Afair even h264 doesn't unroll much
> vertically, just horizontally. I wouldn't expect much speed gain beyond an
> unroll by 2 for instruction pairing anyway, so I guess no v unroll should
> be fine for w=16.
A follow-up on this: I hacked together 16x16 functions by copy'n'paste. Where
possible, they are unrolled horizontally.
The patch passes FATE, but it doesn't change decoding speed, while probably
doubling code size.
To make it worthwhile, I guess one would need to move to >=SSE2 (and
nasmify it). But again, that is not worth the effort.
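To illustrate what "unrolled horizontally" means here, below is a minimal C sketch, not the code from the attached patch. It contrasts a 16x16 block built from four 8x8 calls with a "true" 16x16 function that loops over rows once and handles both 8-byte halves of each row in the loop body. All function names (put_block8, put_block16_tiled, put_block16_true) are hypothetical, and a plain copy stands in for the actual VC-1 filtering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-pixel-wide primitive: copies h rows of 8 bytes each. */
static void put_block8(uint8_t *dst, const uint8_t *src,
                       ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, 8);
        dst += stride;
        src += stride;
    }
}

/* 16x16 assembled from four 8x8 calls (assumed baseline, not confirmed
 * by the post): four calls, four separate row loops. */
static void put_block16_tiled(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride)
{
    put_block8(dst,                  src,                  stride, 8);
    put_block8(dst + 8,              src + 8,              stride, 8);
    put_block8(dst + 8 * stride,     src + 8 * stride,     stride, 8);
    put_block8(dst + 8 * stride + 8, src + 8 * stride + 8, stride, 8);
}

/* "True" 16x16: a single row loop, unrolled horizontally by 2, so each
 * iteration loads and stores two independent 8-byte halves. There is no
 * vertical unrolling: one row per iteration. */
static void put_block16_true(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++) {
        uint64_t lo, hi;
        memcpy(&lo, src,     8);   /* left half of the row  */
        memcpy(&hi, src + 8, 8);   /* right half of the row */
        memcpy(dst,     &lo, 8);
        memcpy(dst + 8, &hi, 8);
        dst += stride;
        src += stride;
    }
}
```

Both variants write the same pixels; the second trades call and loop overhead for a larger function body, which is the code-size cost mentioned above.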
--
Christophe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-vc1dsp-implement-true-16x16-functions.patch
Type: text/x-patch
Size: 26816 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140421/775f1a90/attachment.bin>