[FFmpeg-devel] [PATCH] Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32.
Ronald S. Bultje
rsbultje at gmail.com
Thu Oct 10 15:18:21 CEST 2013
Hi,
On Thu, Oct 10, 2013 at 8:52 AM, Michael Niedermayer <michaelni at gmx.at>wrote:
> On Thu, Oct 10, 2013 at 07:56:51AM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Thu, Oct 10, 2013 at 7:35 AM, Michael Niedermayer <michaelni at gmx.at
> >wrote:
> >
> > > On Wed, Oct 09, 2013 at 09:24:20PM -0400, Ronald S. Bultje wrote:
> > > > ---
> > > > libavcodec/x86/videodsp.asm | 64
> > > +++++++++++++++++++++++++-----------------
> > > > libavcodec/x86/videodsp_init.c | 11 ++++++--
> > > > 2 files changed, 47 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/libavcodec/x86/videodsp.asm
> b/libavcodec/x86/videodsp.asm
> > > > index cc2105d..aa865f5 100644
> > > > --- a/libavcodec/x86/videodsp.asm
> > > > +++ b/libavcodec/x86/videodsp.asm
> > > > @@ -100,7 +100,7 @@ cglobal emu_edge_hvar, 5, 6, 1, dst, dst_stride,
> > > start_x, n_words, h, w
> > > > ; FIXME also write a ssse3 version using pshufb
> > > > movzx wd, byte [dstq+start_xq] ; w = read(1)
> > > > imul wd, 0x01010101 ; w *= 0x01010101
> > > > - movd m0, wd
> > > > + movd m0, wd ; FIXME this is
> sse2,
> > > not sse
> > > [...]
> > > > diff --git a/libavcodec/x86/videodsp_init.c
> > > b/libavcodec/x86/videodsp_init.c
> > > > index a709d03..b5bab72 100644
> > > > --- a/libavcodec/x86/videodsp_init.c
> > > > +++ b/libavcodec/x86/videodsp_init.c
> > > > @@ -219,9 +219,14 @@ static av_noinline void
> > > emulated_edge_mc_sse(uint8_t *buf, ptrdiff_t buf_stride,
> > > > int block_w, int
> block_h,
> > > > int src_x, int src_y,
> int
> > > w, int h)
> > > > {
> > > > - emulated_edge_mc(buf, buf_stride, src, src_stride, block_w,
> block_h,
> > > > - src_x, src_y, w, h, vfixtbl_sse,
> > > &ff_emu_edge_vvar_sse,
> > > > - hfixtbl_sse, &ff_emu_edge_hvar_sse);
> > > > + emulated_edge_mc(buf, buf_stride, src, src_stride, block_w,
> > > block_h, src_x,
> > > > + src_y, w, h, vfixtbl_sse,
> &ff_emu_edge_vvar_sse,
> > > hfixtbl_sse,
> > > > +#if ARCH_X86_64
> > > > + &ff_emu_edge_hvar_sse
> > > > +#else
> > > > + &ff_emu_edge_hvar_mmx
> > > > +#endif
> > >
> > > this is a bit ugly but more important is to fix that bug so
> > > applied
> > >
> > > thanks
> >
> >
> > I'm not really sure how to properly fix this. I mean, I can temporarily
> > store on the stack (e.g. mov r0m, wd, followed by movss m0, r0m), but
> that
> > introduces an extra instruction in the loop, so we'd probably want a sse
> > and a sse2 version of the function (on x86-32 only of course; x86-64
> would
> > simply default to the sse2 one), and then have 3 (mmx, sse or sse2)
> instead
> > of the current 2 wrappers in videodsp_init.c (again, only on x86-32;
> x86-64
> > would always use sse2).
> >
> > In terms of binary size, it's not that bad, since it's only one small
> > function that would have 3 implementations (hvar); all others would
> simply
> > use the sse version in the sse2 wrapper. Does that sound ok? Loren, any
> > better ideas?
>
> hmm, thinking about it for a moment ...
>
> MMX and SSE make no sense for x86_64 when SSE2 is available, there
> are no cpus that would use them
> and IIRC SSE was not (much) faster than using twice as many MMX
> instructions on pre SSE2 cpus.
> so maybe MMX + SSE2 alone would be an option but then maybe i miss
> something ...
I'll probably want to measure that (on a real old sse2_slow or pre-sse2
computer) if I can find one. Anyone have old core duos or something with
ssh access? I threw mine out.
Ronald
More information about the ffmpeg-devel
mailing list