[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Stefano Sabatini stefasab at gmail.com
Mon May 18 12:37:57 CEST 2015

On Thu, May 14, 2015 at 2:52 PM, Stefano Sabatini <stefasab at gmail.com> wrote:

> On date Thursday 2015-05-14 13:01:51 +0200, Stefano Sabatini encoded:
> > On date Tuesday 2015-05-12 15:54:17 +0200, Hendrik Leppkes encoded:
> [...]
> > > One limitation is as the manual said, it needs to be copied from the
> > > GPU to system memory. ffmpeg_dxva2.c does not implement a optimized
> > > copy function for this, it uses plain old memcpy.
> > > Intel introduced a new instruction for this in SSE4, MOVNTDQA, which
> > > is optimized for copying from USWC memory (Uncacheable Speculative
> > > Write Combining) to system memory. Using this may help speed up the
> > > process significantly, and VLC probably uses it.
> >
> > Now the question is, how would be possible to optimize GPU to CPU copy
> > to get an overall performance gain? At least VLC seems able to get
> > better performances when using HW decoding, but I'm not sure it is
> > copying decoded data back to the CPU (indeed it may perform direct
> > rendering).
> Self-reply:
> commit 62107e563f979c638f9a5f58cdfd5639d9c63ac7
> Author: Laurent Aimar <fenrir at videolan.org>
> Date:   Tue Nov 17 01:09:43 2009 +0100
>     Improved performance when copying video surface in dxva2.
> That is, VLC is using optimized GPU->CPU copy when the relevant SSE2
> instructions are available.

I have a first hackish patch. In my tests it gives a significant
performance gain: on my Intel Core i5 with Intel HD Graphics 4000, decoding
an H.264 1920x1080 video with DXVA2 now performs the same as the software
decoder, while using only a single thread. The patch as it stands is a
hack, since I had to modify the compilation flags to enable assembly
compilation in the ffmpeg_dxva2.c file. I should probably create an
optimized copy function in libavutil; comments are welcome.
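
To make the idea concrete, here is a minimal sketch of what such a copy
helper might look like. It is not the code from the attached patch: the
names copy_row_sse41() and copy_plane() are hypothetical, it assumes GCC or
Clang on x86 (it relies on the target attribute and on
__builtin_cpu_supports() instead of changing the compilation flags), and it
falls back to plain memcpy() everywhere else. It uses the MOVNTDQA
streaming load through the _mm_stream_load_si128() intrinsic, which needs a
16-byte aligned source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_SSE41_COPY 1

/* Hypothetical helper: copy n bytes of one row using MOVNTDQA loads.
 * Only this function is compiled with SSE4.1 enabled. */
__attribute__((target("sse4.1")))
static void copy_row_sse41(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

    /* MOVNTDQA requires a 16-byte aligned source address; if the row is
     * not aligned, skip the vector loop and let memcpy() do everything. */
    if (((uintptr_t)src & 15) == 0) {
        for (; i + 16 <= n; i += 16) {
            __m128i v = _mm_stream_load_si128((__m128i *)(src + i));
            _mm_storeu_si128((__m128i *)(dst + i), v);
        }
    }
    /* Copy whatever is left (the tail, or the whole unaligned row). */
    memcpy(dst + i, src + i, n - i);
}
#endif

/* Hypothetical helper: copy a width x height plane row by row, picking
 * the SSE4.1 path at run time when the CPU supports it. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       size_t width, size_t height)
{
    size_t y;
#ifdef HAVE_SSE41_COPY
    const int use_sse41 = __builtin_cpu_supports("sse4.1");
#endif

    assert(dst && src);

    for (y = 0; y < height; y++) {
#ifdef HAVE_SSE41_COPY
        if (use_sse41)
            copy_row_sse41(dst, src, width);
        else
#endif
            memcpy(dst, src, width);
        dst += dst_stride;
        src += src_stride;
    }
}
```

A real version would probably also stage the data through a small cached
buffer and be wired into the usual CPU flag detection; the sketch only
shows the per-row streaming load and the run-time fallback.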

The IDirect3D9_CreateDevice(... GetShellWindow ...) -> GetDesktopWindow
change is required to make the file compile under MinGW. With MinGW64 it is
probably not required; I still have to switch to MinGW64, but allowing
MinGW compilation is still worthwhile.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ffmpeg_dxva.c-add-support-to-optimized-GPU-to-CPU-co.patch
Type: application/octet-stream
Size: 10218 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150518/577c38b8/attachment.obj>
