[FFmpeg-devel] [PATCH] x86/dsputil: port clear_block functions to yasm

Ronald S. Bultje rsbultje at gmail.com
Wed May 21 18:46:59 CEST 2014


Hi,

On Wed, May 21, 2014 at 12:42 PM, James Almer <jamrial at gmail.com> wrote:

> On 21/05/14 4:43 AM, Christophe Gisquet wrote:
> > Hi,
> >
> > 2014-05-21 8:53 GMT+02:00 James Almer <jamrial at gmail.com>:
> >> +INIT_XMM sse
> >> +%define ZERO xorps
> >> +CLEAR_BLOCK 1, 1
> > [...]
> >> +INIT_XMM sse
> >> +%define ZERO xorps
> >> +CLEAR_BLOCKS 1
> >
> > Maybe it crossed your mind and then you crossed it out for lack of
> > benefit, but a sse2 and even maybe an avx version might make sense?
>
> Tried an AVX version, but it crashed on me, so it seems the blocks are only
> 16-byte aligned.
> Didn't look too much into it, though.
>
> And not sure if an SSE2 version is worth it. The function is not a critical
> one (and mostly used by vc1) and xorps -> pxor, movaps -> movdqa will
> probably not make that much of a difference.


Modern codecs integrate clear_blocks in the idct. The advantage is that you
can clear the block only partially as part of the sub-idct optimization step:
a dc-only idct, for example, only needs to clear block[0]. You also avoid the
extra call overhead.
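
To illustrate the dc-only case, here is a rough sketch in C. It is not taken
from any codec in the tree: the function name idct_dc_add_sketch, the helper
clip_u8, the 8x8 block size and the (dc + 4) >> 3 rounding are all assumptions
made for the example. The point is only that the transform itself resets the
one coefficient it consumed, so no separate clear_block call is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical dc-only 8x8 inverse transform + add.
 * Assumes only block[0] is non-zero on entry. The rounding below is
 * illustrative and does not match any particular codec.
 */
static void idct_dc_add_sketch(uint8_t *dst, ptrdiff_t stride, int16_t *block)
{
    int dc = (block[0] + 4) >> 3;

    /* Clear only the coefficient that was used, not all 64 of them. */
    block[0] = 0;

    /* Add the constant dc term to every pixel of the 8x8 destination. */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}
```

A full idct would instead clear all 64 coefficients, and a sub-idct covering,
say, the top-left 4x4 would clear just those, which is the partial clearing
mentioned above.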

If we really want to optimize codecs, remove their use of clear_block(s).

Ronald

