[FFmpeg-devel] [PATCH] x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format

Mon Feb 1 00:27:30 CET 2016

On Sun, Jan 31, 2016 at 06:18:53PM -0300, James Almer wrote:
> On 1/31/2016 4:48 PM, Timothy Gu wrote:
> > ---
> >  libavcodec/x86/vc1dsp.asm    | 104 ++++++++++++++++++++++
> >  libavcodec/x86/vc1dsp_init.c |  13 +++
> >  libavcodec/x86/vc1dsp_mmx.c  | 207 -------------------------------------------
> >  3 files changed, 117 insertions(+), 207 deletions(-)
> > 
> > diff --git a/libavcodec/x86/vc1dsp.asm b/libavcodec/x86/vc1dsp.asm
> > index 6415a83..f922927 100644
> > --- a/libavcodec/x86/vc1dsp.asm
> > +++ b/libavcodec/x86/vc1dsp.asm
> > @@ -395,3 +395,107 @@ cglobal vc1_put_ver_16b_shift2, 4,7,0, dst, src, stride
> >          jnz         .loop
> >      REP_RET
> >  %endif ; HAVE_MMX_INLINE
> > +
> > +%macro INV_TRANS_INIT 0
> > +    movsxdifnidn linesizeq, linesized
> 
> Maybe change the prototype so linesize is ptrdiff_t?

I wanted to do that at first, but then I realized that changing it would also
mean changing simple_idct and a bunch of other decoders. I do want to come back
to this, but it seems like too much work for just four functions =P
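
For context on why the prototype matters: with an int linesize, the value has
to be widened to pointer width before it is added to dest on x86_64, which is
what the movsxdifnidn line does; with ptrdiff_t the argument already arrives at
full width. A minimal C sketch of the two prototype styles follows. The function
names are hypothetical and are not part of the patch or of FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper with a ptrdiff_t stride: the stride can be added
 * to the pointer directly, with no widening step. */
static int sum_first_column(const uint8_t *src, ptrdiff_t linesize, int rows)
{
    int sum = 0;
    for (int y = 0; y < rows; y++) {
        sum += src[0];      /* first byte of the current row */
        src += linesize;    /* step to the next row (stride may be negative) */
    }
    return sum;
}

/* Hypothetical helper with an int stride: the explicit cast is the C-level
 * counterpart of the sign extension the asm performs on x86_64. */
static int sum_first_column_int(const uint8_t *src, int linesize, int rows)
{
    return sum_first_column(src, (ptrdiff_t)linesize, rows);
}
```

Both variants behave the same for positive and negative strides; the only
difference is where the widening happens.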

[...]
> > +; ff_vc1_inv_trans_?x?_dc_mmxext(uint8_t *dest, int linesize, int16_t *block)
> > +INIT_MMX mmxext
> > +cglobal vc1_inv_trans_4x4_dc, 3,4,0, dest, linesize, block
> > +    movsx         r3d, WORD [blockq]
> 
> Can this value be negative?

I'm not 100% certain but I believe it can be.

> Because you're using it as an argument
> for lea using native size after movsx sign extended the value to 32
> bits, which means that on x86_64 the upper bits of the register will
> be zeroed.
> 
> If it can you'll have to use blockq/r3q everywhere, and if it can't
> then use movzx and shr.

Changed locally to blockq/r3. I was emulating GCC's code generation, but it
seems there isn't much difference.
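
To illustrate the concern, here is a minimal C sketch of the two widening
paths. It only models the register behaviour (a write to a 32-bit register
zeroes the upper half of the 64-bit register on x86_64); the function names are
hypothetical and nothing here is taken from the patch.

```c
#include <stdint.h>

/* Models `movsx r3d, WORD [blockq]` followed by a 64-bit use of r3:
 * sign-extend 16 -> 32 bits, after which the upper 32 bits are zero. */
static int64_t widen_via_32bit(int16_t dc)
{
    uint32_t low = (uint32_t)(int32_t)dc;  /* 32-bit sign extension */
    return (int64_t)(uint64_t)low;         /* upper half zeroed */
}

/* Models a movsx into the full 64-bit register:
 * sign-extend 16 -> 64 bits in one step. */
static int64_t widen_via_64bit(int16_t dc)
{
    return (int64_t)dc;
}
```

For a non-negative DC value both paths agree. For a negative one, the 32-bit
path yields a large positive 64-bit number, so a native-size lea on it would
compute the wrong result.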

Timothy