[FFmpeg-devel] [PATCH 02/10] x86: dcadsp: implement SSE lfe_dir
Timothy Gu
timothygu99 at gmail.com
Sun Apr 6 17:34:42 CEST 2014
On Apr 6, 2014 8:24 AM, "Hendrik Leppkes" <h.leppkes at gmail.com> wrote:
>
> On Fri, Feb 14, 2014 at 5:00 PM, Christophe Gisquet
> <christophe.gisquet at gmail.com> wrote:
> > Results for Arrandale/Windows:
> > 32: 1670 -> 316
> > 64: 728 -> 298
> > ---
> > libavcodec/x86/dcadsp.asm | 87 ++++++++++++++++++++++++++++++++++++++++++++
> > libavcodec/x86/dcadsp_init.c | 4 ++
> > 2 files changed, 91 insertions(+)
> >
> > diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm
> > index a0995c9..f4149d2 100644
> > --- a/libavcodec/x86/dcadsp.asm
> > +++ b/libavcodec/x86/dcadsp.asm
> > @@ -88,3 +88,90 @@ INT8X8_FMUL_INT32
> >
> > INIT_XMM sse4
> > INT8X8_FMUL_INT32
> > +
> > +; %1=v0/v1 %2=in1 %3=in2
> > +%macro FIR_LOOP 2-3
> > +.loop%1:
> > +%define va m1
> > +%define vb m2
> > +%if %1
> > +%define OFFSET 0
> > +%else
> > +%define OFFSET NUM_COEF*count
> > +%endif
> > +; for v0, incrementing and for v1, decrementing
> > + mova va, [cf0q + OFFSET]
> > + mova vb, [cf0q + OFFSET + 4*NUM_COEF]
> > +%if %0 == 3
> > + mova m4, [cf0q + OFFSET + mmsize]
> > + mova m0, [cf0q + OFFSET + 4*NUM_COEF + mmsize]
> > +%endif
> > + mulps va, %2
> > + mulps vb, %2
> > +%if %0 == 3
> > + mulps m4, %3
> > + mulps m0, %3
> > + addps va, m4
> > + addps vb, m0
> > +%endif
> > + ; va = va1 va2 va3 va4
> > + ; vb = vb1 vb2 vb3 vb4
> > +%if %1
> > + SWAP va, vb
> > +%endif
> > + mova m4, va
> > + unpcklps va, vb ; va3 vb3 va4 vb4
> > + unpckhps m4, vb ; va1 vb1 va2 vb2
> > + addps m4, va ; va1+3 vb1+3 va2+4 vb2+4
> > + movhlps vb, m4 ; va1+3 vb1+3
> > + addps vb, m4 ; va0..4 vb0..4
> > + movh [outq + count], vb
>
> I got a complaint about a crash on a SSE-only system, and the disasm I
> got from the user was pointing at this exact line:
>
> ......
> 6A2F283F movaps xmm4,xmm1
> 6A2F2842 unpcklps xmm1,xmm2
> 6A2F2845 unpckhps xmm4,xmm2
> 6A2F2848 addps xmm4,xmm1
> 6A2F284B movhlps xmm2,xmm4
> 6A2F284E addps xmm2,xmm4
> 6A2F2851 movq mmword ptr [eax+ecx],xmm2 <<<
>
> The "movh" generates a movq, which according to my quick research
> seems to be SSE2-only, and causes an illegal instruction on the user's
> CPU.
That should probably be movlps. See
http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=b5161908e06b4497bf663510fb495ba97a6fd2b5
Timothy
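
To illustrate the distinction discussed above, here is a minimal C sketch
(not part of the patch) that mirrors the unpack/add/movhlps reduction from
the quoted FIR_LOOP macro using only SSE1 intrinsics from <xmmintrin.h>.
The function name hsum2_store_sse is hypothetical. The final store uses
_mm_storel_pi, which corresponds to MOVLPS; the SSE2 counterpart,
_mm_storel_epi64 (MOVQ), is declared in <emmintrin.h>. Lanes are numbered
from 0 at the low end, and a compiler may pick a different encoding when
SSE2 is enabled, so this shows the intent rather than guaranteed machine
code.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only; no <emmintrin.h> needed */

/*
 * Hypothetical helper: reduce two 4-float vectors to their horizontal sums
 * and write the two results to out[0] (sum of va) and out[1] (sum of vb).
 */
static void hsum2_store_sse(float *out, __m128 va, __m128 vb)
{
    __m128 lo = _mm_unpacklo_ps(va, vb); /* va0 vb0 va1 vb1 */
    __m128 hi = _mm_unpackhi_ps(va, vb); /* va2 vb2 va3 vb3 */
    __m128 s  = _mm_add_ps(lo, hi);      /* va0+va2 vb0+vb2 va1+va3 vb1+vb3 */
    __m128 t  = _mm_movehl_ps(s, s);     /* MOVHLPS: upper half of s moved to the lower half */

    s = _mm_add_ps(s, t);                /* lanes 0 and 1 now hold sum(va), sum(vb) */

    /* 64-bit store of the low two floats. This intrinsic maps to MOVLPS,
     * which plain SSE provides; no alignment is required for out. */
    _mm_storel_pi((__m64 *)out, s);
}
```

Calling it with va = (1, 2, 3, 4) and vb = (10, 20, 30, 40) writes 10 and
100 to the output, using the same data flow as the assembly with the store
expressed in a way that does not require SSE2.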