[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Fri May 29 15:49:22 CEST 2015

On date Thursday 2015-05-28 18:02:34 -0300, James Almer encoded:
> On 28/05/15 2:39 PM, Stefano Sabatini wrote:
> > From f3b4e77dd9dd299aba8f4fa83625d2b61b243c3c Mon Sep 17 00:00:00 2001
> > From: Stefano Sabatini <stefasab at gmail.com>
> > Date: Fri, 15 May 2015 18:58:17 +0200
> > Subject: [PATCH] lavu/imgutils: add av_image_copy_plane_from_uswc() function.
> > 
> > This function allows support to optimized GPU to CPU.
> > 
> > Based on code from vlc dxva2.c, commit 62107e56 by Laurent Aimar
> > <fenrir at videolan.org>.
> > 
> > TODO: fix integration with the build system, bump micro
> > 
> > Signed-off-by: Stefano Sabatini <stefasab at gmail.com>
> > ---
> >  libavutil/imgutils.c          |  14 ++++++
> >  libavutil/imgutils.h          |  18 +++++++
> >  libavutil/imgutils_internal.h |  29 +++++++++++
> >  libavutil/x86/Makefile        |   1 +
> >  libavutil/x86/imgutils.c      | 109 ++++++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 171 insertions(+)
> >  create mode 100644 libavutil/imgutils_internal.h
> >  create mode 100644 libavutil/x86/imgutils.c
> > 
> > diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> > index ef0e671..e538c75 100644
> > --- a/libavutil/imgutils.c
> > +++ b/libavutil/imgutils.c
> > @@ -30,6 +30,7 @@
> >  #include "mathematics.h"
> >  #include "pixdesc.h"
> >  #include "rational.h"
> > +#include "imgutils_internal.h"
> >  
> >  void av_image_fill_max_pixsteps(int max_pixsteps[4], int max_pixstep_comps[4],
> >                                  const AVPixFmtDescriptor *pixdesc)
> > @@ -405,3 +406,16 @@ int av_image_copy_to_buffer(uint8_t *dst, int dst_size,
> >  
> >      return size;
> >  }
> > +
> > +void av_image_copy_plane_from_uswc(uint8_t *dst, size_t dst_linesize,
> > +				   const uint8_t *src, size_t src_linesize,
> > +				   unsigned bytewidth, unsigned height,
> > +				   unsigned cpu_flags)
> > +{
> > +#ifndef HAVE_SSSE3
> 
> All HAVE_ are always defined to either 0 or 1.

Fixed.

> Nonetheless, this kind of check does not belong outside of arch folders. You should
> check for ARCH_X86 to call functions in the x86/ folder. See lavc/lavfi for examples.

I see, but I think this use case is pretty different. We don't have a
context where to set a function pointer, and I don't want to add a new
context and API for such things (but I'm open to suggestions). A
probably slightly ugly alternative could be to define a function such
as:
get_ff_image_copy_plane_from_uswc_fn()

returning a pointer to the correct function.

[...]
> > diff --git a/libavutil/x86/imgutils.c b/libavutil/x86/imgutils.c
> > new file mode 100644
> > index 0000000..91c7a42
> > --- /dev/null
> > +++ b/libavutil/x86/imgutils.c
> > @@ -0,0 +1,109 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <inttypes.h>
> > +#include "config.h"
> > +#include "libavutil/attributes.h"
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/intreadwrite.h"
> > +#include "libavutil/x86/asm.h"
> > +#include "libavutil/x86/cpu.h"
> > +#include "libavutil/cpu.h"
> > +#include "libavutil/pixdesc.h"
> > +
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/x86/asm.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/imgutils_internal.h"
> > +
> > +#ifdef HAVE_SSE2
> > +/* Copy 16/64 bytes from srcp to dstp loading data with the SSE>=2 instruction
> > + * load and storing data with the SSE>=2 instruction store.
> > + */
> > +#define COPY16(dstp, srcp, load, store) \
> > +    __asm__ volatile (                      \
> > +        load "  0(%[src]), %%xmm1\n"    \
> > +        store " %%xmm1,    0(%[dst])\n" \
> > +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1")
> > +
> > +#define COPY64(dstp, srcp, load, store) \
> > +    __asm__ volatile (                      \
> > +        load "  0(%[src]), %%xmm1\n"    \
> > +        load " 16(%[src]), %%xmm2\n"    \
> > +        load " 32(%[src]), %%xmm3\n"    \
> > +        load " 48(%[src]), %%xmm4\n"    \
> > +        store " %%xmm1,    0(%[dst])\n" \
> > +        store " %%xmm2,   16(%[dst])\n" \
> > +        store " %%xmm3,   32(%[dst])\n" \
> > +        store " %%xmm4,   48(%[dst])\n" \
> > +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1", "xmm2", "xmm3", "xmm4")
> > +#endif
> 

> As already mentioned, this should be done in nasm/yasm syntax.
> Also, any reason you're not using more xmm registers to reduce the amount of loops?
> or just unroll things a bit even if you still use only four (you skipped xmm0 for
> that matter).

I'm not the author of the code, I just copied and arranged the code
from fenrir, and I'd like to keep that unchanged at least in the first
commit. Also I don't know enough about assembly and YASM, and even if
it could be useful for me to learn it, doing that for this single
patch would be overkill.

OTOH, the patch as is will cause compilation errors: surprisingly, it
compiles fine on Linux, while on Windows I had to manually add the
-mmmx -msse2 flags when compiling the x86/imgutils.o module.

[...]

I updated the patches with all the trivial fixes which I could apply.
-- 
FFmpeg = Frenzy Forgiving Magic Portable Exxagerate Game
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-lavu-imgutils-add-av_image_copy_plane_from_uswc-func.patch
Type: text/x-diff
Size: 8474 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150529/ee5ba261/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-ffmpeg_dxva2-use-optimized-copy-function-from-GPU-to.patch
Type: text/x-diff
Size: 2195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150529/ee5ba261/attachment-0001.bin>