[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Tue May 12 15:33:13 CEST 2015

Hi guys,

I'm playing with DXVA2 hardware decoding on Windows, and these are my
findings.

DXVA2 decoding was enabled in avconv/ffmpeg by this commit:

commit 35177ba77ff60a8b8839783f57e44bcc4214507a
Author: Hendrik Leppkes <h.leppkes at gmail.com>
Date:   Tue Apr 22 15:22:53 2014 +0200

    avconv: add support for DXVA2 decoding

    Signed-off-by: Anton Khirnov <anton at khirnov.net>

DXVA2 decoding is enabled when a dxva2api.h header is found in the
include path. As far as I understand, the header is provided by VLC:
http://download.videolan.org/pub/contrib/dxva2api.h

(I suppose the header was created in order to make compilation work
with MinGW). When compiling with MinGW from mingw.org I had to change
the GetShellWindow call in the line:

    hr = IDirect3D9_CreateDevice(ctx->d3d9, adapter, D3DDEVTYPE_HAL, GetShellWindow(),
                                 D3DCREATE_SOFTWARE_VERTEXPROCESSING | D3DCREATE_MULTITHREADED | D3DCREATE_FPU_PRESERVE,
                                 &d3dpp, &ctx->d3d9device);

to GetDesktopWindow in ffmpeg_dxva2.c. I applied the fix suggested
here:
http://ffmpeg.org/pipermail/libav-user/2014-December/007673.html

Then I performed some tests with the command:
ffmpeg -hwaccel dxva2 INPUT -threads 1 -f null -

The -threads 1 option seems to be required; without it ffmpeg fails
with decoding errors.

The ffmpeg(1) manual contains this prominent warning:
 Note that most acceleration methods are intended for playback and
 will not be faster than software decoding on modern
 CPUs. Additionally, ffmpeg will usually need to copy the decoded
 frames from the GPU memory into the system memory, resulting in
 further performance loss. This option is thus mainly useful for
 testing.

I tested several hardware combinations, and in every case pure
software decoding was several times faster than DXVA2 decoding. In
some cases I got invalid output (also with VLC), which may be due to
a problem in the graphics card or driver (a VIA VX900).

On the other hand, when testing with VLC I noticed better performance
(in general a significantly lower CPU usage, usually by a factor of
about 3), so I have to conclude that at least VLC is able to make
good use of DXVA2 hardware acceleration.

I'm aware that copying the decoded frames from GPU memory back to
system memory, as ffmpeg requires, defeats the advantage (if any) of
hardware decoding, especially given that multithreaded decoding
cannot be used with DXVA2.
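
To put a rough number on the copy-back, here is a small sketch that
estimates how many bytes per second have to move from the GPU to
system memory. It assumes the decoded surfaces are NV12 (8-bit 4:2:0,
one luma plane plus one interleaved chroma plane); that format and
the helper names nv12_frame_bytes / readback_bytes_per_second are my
own assumptions for illustration, they are not taken from
ffmpeg_dxva2.c.

```c
#include <stdint.h>

/* Bytes in one NV12 frame: a full-resolution luma plane plus an
 * interleaved Cb/Cr plane subsampled by 2 in both directions.
 * NV12 is an assumption; the surface format is not stated above. */
static uint64_t nv12_frame_bytes(uint32_t width, uint32_t height)
{
    uint64_t luma   = (uint64_t)width * height;
    /* Round odd dimensions up so every luma sample has chroma. */
    uint64_t chroma = 2 * (uint64_t)((width + 1) / 2) * ((height + 1) / 2);
    return luma + chroma;
}

/* Bytes per second that must be read back for a given frame rate. */
static uint64_t readback_bytes_per_second(uint32_t width, uint32_t height,
                                          uint32_t fps)
{
    return nv12_frame_bytes(width, height) * fps;
}
```

For a 1920x1080 stream at 30 fps this gives 3110400 bytes per frame
and 93312000 bytes per second. The raw volume is only a lower bound
on the cost: how fast a locked surface can actually be read depends
on the driver and hardware, which this estimate does not model.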

My questions are:

Are there cases where DXVA2 (or HW decoding in general) can be used
effectively in ffmpeg? Is there anything that could be improved in
the current ffmpeg_dxva2.c implementation? (My guess is that this
code is somehow based on the VLC code.)

Would it make sense to integrate DXVA2 decoding in ffplay.c, assuming
it would be worth the effort, at least for testing/didactic purposes?

Related resources:
https://trac.ffmpeg.org/ticket/604
https://ffmpeg.org/pipermail/ffmpeg-user/2012-May/006600.html
http://forum.doom9.org/showthread.php?t=170793

TIA for any comments.
-- 
FFmpeg = Fostering and Fantastic Maxi Picky Erudite God