[FFmpeg-devel] [PATCH v5] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

Mark Thompson sw at jkqxz.net
Sat Feb 16 14:12:28 EET 2019


On 15/02/2019 21:54, Shaofei Wang wrote:
> It enables multiple filter graphs to run concurrently, which brings about a
> 4%~20% improvement in some 1:N scenarios with CPU or GPU acceleration
> 
> Below are some test cases and comparison as reference.
> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> For 1:N transcode by GPU acceleration with vaapi:
> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
>     -hwaccel_output_format vaapi \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
>     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6.1%        6.9%        5.5%
> 
> For 1:N transcode by GPU acceleration with QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6%          4%          15%
> 
> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale  5 scale  10 scale
>     Improved    12%      21%      21%
> 
> For CPU only 1 decode to N scaling:
> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale  5 scale  10 scale
>     Improved    25%      107%     148%
> 
> Signed-off-by: Wang, Shaofei <shaofei.wang at intel.com>
> Reviewed-by: Zhao, Jun <jun.zhao at intel.com>
> ---
>  fftools/ffmpeg.c        | 121 ++++++++++++++++++++++++++++++++++++++++++++----
>  fftools/ffmpeg.h        |  14 ++++++
>  fftools/ffmpeg_filter.c |   1 +
>  3 files changed, 128 insertions(+), 8 deletions(-)

On a bit more review, I don't think this patch works at all.

The existing code was all written to be run serially.  This simplistic approach to parallelising it falls down because many of those functions read variables that are written by other functions.  Those writers used to be called at different times on the same thread, but they now run on other threads, so the accesses are data races and the behaviour is undefined.

To consider a single example (not the only one), the function check_init_output_file() does not work at all after this change.  The test for OutputStream initialisation (which exists so that the output file initialisation runs exactly once, after all of the output streams are ready) races with other threads setting those variables.  Since that's undefined behaviour you may get lucky sometimes and have the output file initialisation run exactly once, but in general it will fail in unknown ways.
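
To make the shape of that problem concrete, here is a minimal sketch of the "run exactly once after every stream is ready" pattern with the flag update and the all-ready test serialised under one mutex.  It is not code from ffmpeg.c: DemoOutputFile, DemoWorker, demo_check_init_output_file() and demo_run() are hypothetical names invented for illustration, and the real function does considerably more than this.  Without the lock, the loop that reads stream_initialized[] would race with other threads writing it, which is the undefined behaviour described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_STREAMS 16

/* Hypothetical stand-in for an output file; NOT the real OutputFile struct. */
typedef struct DemoOutputFile {
    pthread_mutex_t lock;               /* guards every field below */
    int nb_streams;
    int stream_initialized[MAX_STREAMS];
    int header_written;
    int header_write_count;             /* how many times init actually ran */
} DemoOutputFile;

typedef struct DemoWorker {
    DemoOutputFile *of;
    int index;
} DemoWorker;

/* Mark one stream ready, then initialise the file if all streams are ready.
 * The flag write and the all-ready test happen under the same lock, so no
 * thread can observe a half-updated set of flags. */
static void demo_check_init_output_file(DemoOutputFile *of, int index)
{
    int all_ready = 1;

    pthread_mutex_lock(&of->lock);
    of->stream_initialized[index] = 1;
    for (int i = 0; i < of->nb_streams; i++) {
        if (!of->stream_initialized[i]) {
            all_ready = 0;
            break;
        }
    }
    if (all_ready && !of->header_written) {
        of->header_written = 1;
        of->header_write_count++;       /* stands in for writing the header */
    }
    pthread_mutex_unlock(&of->lock);
}

static void *demo_worker(void *arg)
{
    DemoWorker *w = arg;
    demo_check_init_output_file(w->of, w->index);
    return NULL;
}

/* Start one thread per stream and return how often the file init ran. */
static int demo_run(int nb_streams)
{
    DemoOutputFile of = { .nb_streams = nb_streams };
    DemoWorker workers[MAX_STREAMS];
    pthread_t threads[MAX_STREAMS];

    assert(nb_streams > 0 && nb_streams <= MAX_STREAMS);
    pthread_mutex_init(&of.lock, NULL);
    for (int i = 0; i < nb_streams; i++) {
        int ret;
        workers[i] = (DemoWorker){ &of, i };
        ret = pthread_create(&threads[i], NULL, demo_worker, &workers[i]);
        assert(ret == 0);
    }
    for (int i = 0; i < nb_streams; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&of.lock);
    return of.header_write_count;
}
```

Calling demo_run(8) starts eight threads that each mark their own stream ready, and the return value is the number of times the initialisation step ran.  A single coarse lock like this only covers the one check shown here; every other piece of state in ffmpeg.c that is shared between the new threads would need the same kind of treatment.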

If you want to resubmit this patch, you will need to refactor a lot of the other code in ffmpeg.c to rule out these undefined cases.

- Mark

