[FFmpeg-devel] [PATCH v3] lavfi/qsvvpp: support async depth

Sat Apr 10 08:12:54 EEST 2021

Hi Fei W,

On Wed, Mar 31, 2021 at 10:14 AM Wang, Fei W <fei.w.wang at intel.com> wrote:
>
> On Wed, 2021-03-31 at 10:07 +0800, Fei Wang wrote:
> > Async depth allows the QSV filter to cache a few frames, and avoids
> > forcing a switch and ending the filter task frame by frame. This
> > change improves performance in some multi-task cases, for example
> > 1:N transcode (decode + vpp + encode) with all QSV plugins.
> >
> > Performance data measured on my Coffee Lake desktop (i7-8700K) with
> > the following 1:8 transcode test case:
> > 1. FPS improved from 55 to 130.
> > 2. Render/Video usage improved from ~61%/~38% to ~100%/~70% (data
> > from intel_gpu_top).
> >
> > test CMD:
> > ffmpeg -v verbose -init_hw_device qsv=hw:/dev/dri/renderD128 -filter_hw_device hw \
> > -hwaccel qsv -hwaccel_output_format qsv -c:v h264_qsv -i 1920x1080.264 \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
> > -vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null -
> >
> > Signed-off-by: Fei Wang <fei.w.wang at intel.com>
> > ---
> > Change:
> > 1. Add test data in commit message.
> > 2. Remove some duplicate code.

Appreciate the detailed data.
I verified locally that performance improves in 1:N downscale cases,
as you described.

I also ran some experiments for 1:N upscale (1080p->3840x2160), 1:1,
and N:1 cases. The bottleneck seems to be somewhere else, so
performance remains unchanged with vpp async depth. But that could be
another story.

Patch functionally LGTM, thx.

- linjie