[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

Michael Niedermayer michael at niedermayer.cc
Tue Jul 14 05:06:17 CEST 2015


On Mon, Jul 13, 2015 at 11:39:15PM -0300, James Almer wrote:
> On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> > +INIT_XMM sse4
> > +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> > +    pxor              m0, m0
> > +.loop:
> > +    mova              m1, [sum0q+mmsize*0]
> > +    mova              m2, [sum0q+mmsize*1]
> > +    mova              m3, [sum0q+mmsize*2]
> > +    mova              m4, [sum0q+mmsize*3]
> > +    paddd             m1, [sum1q+mmsize*0]
> > +    paddd             m2, [sum1q+mmsize*1]
> > +    paddd             m3, [sum1q+mmsize*2]
> > +    paddd             m4, [sum1q+mmsize*3]
> > +    paddd             m1, m2
> > +    paddd             m2, m3
> > +    paddd             m3, m4
> > +    paddd             m4, [sum0q+mmsize*4]
> > +    paddd             m4, [sum1q+mmsize*4]
> > +    TRANSPOSE4x4D      1, 2, 3, 4, 5
> > +
> > +    ; m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12
> > +    pslld             m3, 6
> > +    pslld             m4, 6
> > +    pmulld            m5, m1, m2                ; fs1 * fs2
> > +    pmulld            m1, m1                    ; fs1 * fs1
> > +    pmulld            m2, m2                    ; fs2 * fs2
> 
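
For readers following the quoted assembly, here is a rough scalar model in C of what one output lane appears to compute. It is only a sketch: the function name `ssim_end1_model` is hypothetical, and the constants assume 8-bit samples and the usual x264-style SSIM stabilisers, which the quoted hunk does not show. The `* 64` terms correspond to the `pslld ..., 6` shifts above, and the three products correspond to the three `pmulld` instructions.

```c
#include <stdint.h>

/* Hypothetical scalar model of one ssim_end_line output lane.
 * s1, s2 : sums of the samples in the two 8x8 regions being compared
 * ss     : sum of squares of both regions added together
 * s12    : sum of the cross products
 * The constants are an assumption (8-bit input, 64 samples per region). */
static float ssim_end1_model(int s1, int s2, int ss, int s12)
{
    const int c1 = (int)(.01 * .01 * 255 * 255 * 64 + .5);
    const int c2 = (int)(.03 * .03 * 255 * 255 * 64 * 63 + .5);

    int vars  = ss  * 64 - s1 * s1 - s2 * s2;   /* fss  << 6, minus the squares */
    int covar = s12 * 64 - s1 * s2;             /* fs12 << 6, minus fs1 * fs2   */

    return (float)(2 * s1 * s2 + c1) * (float)(2 * covar + c2)
         / ((float)(s1 * s1 + s2 * s2 + c1) * (float)(vars + c2));
}
```

With two identical regions, `vars` equals `2 * covar`, so numerator and denominator match and the score comes out as 1.
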
> If these values are guaranteed to be always positive then this could also
> be implemented with pmuludq to get an sse2 version working. Although I'm
> not sure if it's worth doing. It will be six pmuludq and an awful lot of
> shuffling and unpacking when the speed up of the sse4 version is already
> only ~2x the C version.
> 
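
To make the pmuludq idea concrete, here is a minimal sketch in C with SSE2 intrinsics of how one pmulld could be emulated. The helper name `mullo_epi32_sse2` is made up for illustration and is not part of the patch. Each emulated multiply takes two pmuludq (`_mm_mul_epu32`) plus shuffles, so the three pmulld above would indeed become six. Note that the low 32 bits of a product are the same for signed and unsigned inputs, so this particular emulation should not depend on the values being positive.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: per-lane low 32 bits of a 32x32 multiply,
 * i.e. what SSE4.1 pmulld produces, using only SSE2 instructions. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    /* pmuludq multiplies lanes 0 and 2, giving two 64-bit products */
    __m128i even = _mm_mul_epu32(a, b);
    /* move lanes 1 and 3 down into the even positions, then multiply */
    __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                 _mm_srli_epi64(b, 32));

    /* gather the low dword of each 64-bit product into dwords 0 and 1 */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));

    /* interleave back into lane order: p0, p1, p2, p3 */
    return _mm_unpacklo_epi32(even, odd);
}
```

The extra shifts, shuffles and unpack per multiply are the overhead that makes an SSE2 version questionable when the SSE4 version is only about 2x faster than C.
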

> This was already OK'd (same with the psnr sse2 code), so it should be
> pushed already.

/me wonders a little bit why no one else applied it yet, but

applied

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes