[FFmpeg-devel] [PATCH] x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()

Clément Bœsch u at pkh.me
Tue Jan 28 13:24:34 CET 2014


On Tue, Jan 28, 2014 at 12:05:41PM +0100, Christophe Gisquet wrote:
> Hi,
> 
> 2014-01-28 James Almer <jamrial at gmail.com>:
> > +%if cpuflag(ssse3)
> >      mova                m0, [mask_mix]
> > +%endif
> >      movd                m2, Id
> >      movd                m3, Ed
> > -    pshufb              m2, m0
> > -    pshufb              m3, m0
> > +    SPLATB_MASK         m2, m0
> > +    SPLATB_MASK         m3, m0
> 
> Is there any gain in loading mask_mix into m0, in particular considering that:
> 

The register was available, and IIRC the splat macros need the value in a
register.

> >  %endif
> >      mova                m0, [pb_80]
> >      pxor                m2, m0
> > @@ -456,7 +469,7 @@ SECTION .text
> >      SPLATB_REG          m7, H, m0                       ; H H H H ...
> >  %else
> >      movd                m7, Hd
> > -    pshufb              m7, [mask_mix]
> > +    SPLATB_MASK         m7, [mask_mix]
> >  %endif
> 
> It is not loaded here?

I couldn't keep the register available until then.

> 
> I'm asking because I have noticed that sometimes (outside the vp9 code) it
> does not matter, or is even 1 cycle faster.

In that particular case we use it twice, so loading it once avoids a second
memory read. I admit I didn't benchmark it, but the difference is probably
negligible.

-- 
Clément B.