[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Stefano Sabatini stefasab at gmail.com
Thu Jun 11 17:24:45 CEST 2015


On date Friday 2015-05-29 09:47:58 -0700, Timothy Gu encoded:
> On Fri, May 29, 2015 at 03:49:22PM +0200, Stefano Sabatini wrote:
[...]
> >  OBJS-$(CONFIG_PIXELUTILS) += x86/pixelutils_init.o                      \
> > diff --git a/libavutil/x86/imgutils.c b/libavutil/x86/imgutils.c
> > new file mode 100644
> > index 0000000..8b3ed0f
> > --- /dev/null
> > +++ b/libavutil/x86/imgutils.c
> > @@ -0,0 +1,95 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <inttypes.h>
> > +#include "config.h"
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/imgutils_internal.h"
> > +
> > +#if HAVE_SSE2
> > +/* Copy 16/64 bytes from srcp to dstp loading data with the SSE>=2 instruction
> > + * load and storing data with the SSE>=2 instruction store.
> > + */
> > +#define COPY16(dstp, srcp, load, store) \
> > +    __asm__ volatile (                  \
> > +        load "  0(%[src]), %%xmm1\n"    \
> > +        store " %%xmm1,    0(%[dst])\n" \
> > +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1")
> > +
> > +#define COPY64(dstp, srcp, load, store) \
> > +    __asm__ volatile (                  \
> > +        load "  0(%[src]), %%xmm1\n"    \
> > +        load " 16(%[src]), %%xmm2\n"    \
> > +        load " 32(%[src]), %%xmm3\n"    \
> > +        load " 48(%[src]), %%xmm4\n"    \
> > +        store " %%xmm1,    0(%[dst])\n" \
> > +        store " %%xmm2,   16(%[dst])\n" \
> > +        store " %%xmm3,   32(%[dst])\n" \
> > +        store " %%xmm4,   48(%[dst])\n" \
> > +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1", "xmm2", "xmm3", "xmm4")
> > +#endif
> > +
> > +void ff_image_copy_plane_from_uswc_x86(uint8_t *dst, size_t dst_linesize,
> > +				       const uint8_t *src, size_t src_linesize,
> > +				       unsigned bytewidth, unsigned height,
> > +				       int cpu_flags)
> > +{
> > +#if !HAVE_SSSE3
> 

> Are any SSSE3 instructions used?

No. I re-checked: MOVDQA/MOVDQU were introduced in SSE2, MOVNTDQA in SSE4.1.

> > +    return av_image_copy_plane(dst, dst_linesize, src, src_linesize, bytewidth, height);
> > +#endif
> > +
> > +    av_assert0(((intptr_t)dst & 0x0f) == 0 && (dst_linesize & 0x0f) == 0);
> > +
> > +    __asm__ volatile ("mfence");
> > +
> > +    for (unsigned y = 0; y < height; y++) {
> > +        const unsigned unaligned = (-(uintptr_t)src) & 0x0f;
> > +        unsigned x = unaligned;
> > +
> 
> > +#if HAVE_SSE42
> > +        if (cpu_flags & AV_CPU_FLAG_SSE4) {
> 
> movntdqa is an SSE4.1 instruction, so this should work better:
> 
>     if (INLINE_SSE4(cpu_flags))
> 
> That checks both HAVE_SSE4_INLINE and cpu_flags for AV_CPU_FLAG_SSE4.
> 
> (But then like others have said new inline asm code shouldn't be added in the
> first place)
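
For reference, the combined check described above (build-time support plus
the run-time CPU flag) can be sketched roughly as follows. This is only an
illustration: the SKETCH_* names and the flag value are made up here and are
not FFmpeg's actual definitions.

```c
/* Hypothetical stand-ins for illustration, NOT FFmpeg's real macros. */
#define SKETCH_HAVE_SSE4_INLINE 1       /* build time: SSE4 inline asm enabled */
#define SKETCH_CPU_FLAG_SSE4    0x0100  /* run time: arbitrary bit value       */

/* Non-zero only if the code was compiled in AND the running CPU reports
 * the feature in its flags. */
#define SKETCH_INLINE_SSE4(flags) \
    (SKETCH_HAVE_SSE4_INLINE && ((flags) & SKETCH_CPU_FLAG_SSE4))

/* Returns 1 when the streaming-load path may be taken, 0 for the fallback. */
static int sketch_use_sse4_path(int cpu_flags)
{
    return SKETCH_INLINE_SSE4(cpu_flags) ? 1 : 0;
}
```

With a single macro of this shape the separate #if around the branch is no
longer needed, since the build-time part folds to a constant.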

The next step would be to port this to YASM, but for now I only want to
check whether the general approach is fine (and whether the API is too
specific). If someone wants to step up and port it to YASM I'm all for
it, since ASM/YASM is far from being my area of expertise.
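
To make the general approach easier to discuss without the asm, here is a
plain C sketch of the per-row structure. It assumes each row is split into an
unaligned head, 64-byte blocks, 16-byte blocks and a tail; the part of the
loop after the SSE4 check is not quoted above, so that split is an assumption.
The function name is hypothetical and memcpy() stands in for the SSE
load/store pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg. */
static void sketch_copy_plane_blocks(uint8_t *dst, size_t dst_linesize,
                                     const uint8_t *src, size_t src_linesize,
                                     unsigned bytewidth, unsigned height)
{
    /* Same precondition as the patch: dst and its stride are 16-byte aligned. */
    assert(((uintptr_t)dst & 0x0f) == 0 && (dst_linesize & 0x0f) == 0);

    for (unsigned y = 0; y < height; y++) {
        /* Bytes needed to bring src up to the next 16-byte boundary (0..15). */
        unsigned unaligned = (unsigned)((-(uintptr_t)src) & 0x0f);
        unsigned x = unaligned < bytewidth ? unaligned : bytewidth;

        memcpy(dst, src, x);                      /* unaligned head       */
        for (; x + 64 <= bytewidth; x += 64)      /* stands in for COPY64 */
            memcpy(dst + x, src + x, 64);
        for (; x + 16 <= bytewidth; x += 16)      /* stands in for COPY16 */
            memcpy(dst + x, src + x, 16);
        memcpy(dst + x, src + x, bytewidth - x);  /* remaining tail bytes */

        src += src_linesize;
        dst += dst_linesize;
    }
}
```

Note that once the head is non-empty, dst + x is no longer 16-byte aligned
even though src + x is, so a real SSE version would need an aligned load
paired with an unaligned store in that case.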
-- 
FFmpeg = Fiendish Fabulous Most Pure Evangelical God
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lavu-imgutils-add-av_image_copy_plane_from_uswc-func.patch
Type: text/x-diff
Size: 9673 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150611/2fd4505c/attachment.bin>

