[FFmpeg-devel] adding RGBA and BGRA to nvenc.c

Andy Furniss adf.lists at gmail.com
Mon Sep 12 16:51:17 EEST 2016

Andy Furniss wrote:

> I do know that I have really grabbed and encoded 1080p60 with my AMD
> h/w and including nv12 conversion gives a sane looking result -
>
> gst-launch-1.0 -f ximagesrc use-damage=0 startx=0 starty=0 endx=1919
>  endy=1079 num-buffers=1000 ! queue ! videoconvert !
> video/x-raw,framerate=100/1,format=NV12  ! fakesink
>
> Setting pipeline to PAUSED ...
> Pipeline is live and does not need PREROLL ...
> Setting pipeline to PLAYING ...
> New clock: GstSystemClock
> Got EOS from element "pipeline0".
> Execution ended after 0:00:14.419928745
> Setting pipeline to PAUSED ...
> Setting pipeline to READY ...
> Setting pipeline to NULL ...
> Freeing pipeline ...
>
> 1000/14.419928745 = 69.3
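
The 69.3 figure above is simply frames divided by wall-clock time. As a
minimal sketch in Python (the helper name fps_from_gst_log is made up for
illustration, and it assumes gst-launch reports the elapsed time in the
H:MM:SS.nnnnnnnnn form seen in the log above), the same number can be
pulled out of the log text:

```python
import re

def fps_from_gst_log(log: str, frames: int) -> float:
    """Return frames / elapsed seconds, using the 'Execution ended after' line."""
    m = re.search(r"Execution ended after (\d+):(\d+):(\d+(?:\.\d+)?)", log)
    if m is None:
        raise ValueError("no 'Execution ended after' line found")
    hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    elapsed = hours * 3600 + minutes * 60 + seconds
    return frames / elapsed

# Log excerpt copied from the run above; 1000 matches num-buffers=1000.
log = 'Got EOS from element "pipeline0". Execution ended after 0:00:14.419928745'
print(round(fps_from_gst_log(log, frames=1000), 1))  # 69.3
```

The frame count has to be supplied by hand because it comes from
num-buffers on the command line, not from the log.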

Over the weekend I looked at the CSC aspect of this without using
x11grab, i.e. benching bgr0 read from tmpfs to nv12, and managed with a
bit of luck to get ffmpeg to beat gstreamer.

Starting point: gstreamer bgr0 to nv12 = 70 fps, to I420 = 68 fps.

ffmpeg was benched using -f null, as -f rawvideo to ram or /dev/null is
slower. I suspect/hope that for my intended usage (vaapi upload) -f null
will be more representative, but of course I don't know that.

ffmpeg -f rawvideo -s 1920x1080 -pix_fmt bgr0  -i /mnt/ramdisk/out.bgr0 
-pix_fmt nv12 -f null -

= 41 fps; with -pix_fmt yuv420p instead = 66 fps

So yuv420p is close to gstreamer but nv12 is poor.
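
For a sense of scale, here is a rough Python sketch of the data volumes
involved at 1080p. The bytes-per-pixel values are the standard ones for
these formats (bgr0 is packed 32-bit, nv12 and yuv420p are 8-bit 4:2:0);
the function names are made up for illustration, and line padding is
assumed to be absent.

```python
WIDTH, HEIGHT = 1920, 1080

# Bytes per pixel: bgr0 is packed 32-bit; nv12 and yuv420p are 8-bit 4:2:0.
BYTES_PER_PIXEL = {"bgr0": 4.0, "nv12": 1.5, "yuv420p": 1.5}

def frame_bytes(pix_fmt: str) -> int:
    """Size of one unpadded frame in bytes."""
    return int(WIDTH * HEIGHT * BYTES_PER_PIXEL[pix_fmt])

def mb_per_second(pix_fmt: str, fps: float) -> float:
    """Data rate in MB/s (10^6 bytes) at the given frame rate."""
    return frame_bytes(pix_fmt) * fps / 1e6

print(frame_bytes("bgr0"))                # 8294400
print(frame_bytes("nv12"))                # 3110400
print(round(mb_per_second("bgr0", 41)))   # 340 MB/s read at the nv12 rate
print(round(mb_per_second("bgr0", 66)))   # 547 MB/s read at the yuv420p rate
```

nv12 and yuv420p frames are the same size, so the gap between 41 and
66 fps is not down to the amount of data written.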

By chance I wondered how much worse it would be if I used -sws_flags, as
I have done in the past. The result was that it got faster: it turns out
that +full_chroma_inp takes yuv420p from 66 to 84 fps and nv12 from 41
to 47 fps.

The reason is that with no flags time is spent in bgr32toUV_half_c;
with the flag above that function is not used and I see various SSE
routines in use instead.

nv12 is still too slow though. Looking with sysprof I see that time
is spent in yuv2nv12cX_c.

That seemed slow compared with what I remembered of yuv420p -> nv12
conversions in the past, so I benched 1080p yuv420p -> nv12 and got
> 500 fps. Doing this didn't use yuv2nv12cX_c at all, so I made a new
command line that goes via yuv420p -

ffmpeg -f rawvideo -s 1920x1080 -pix_fmt bgr0  -i /mnt/ramdisk/out.bgr0 
-vf scale=flags=+full_chroma_inp,format=yuv420p,format=nv12 -f null -

= 78fps, nice.
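
To illustrate why the yuv420p -> nv12 step on its own can be cheap: the
two formats hold the same samples, and only the chroma layout differs
(separate U and V planes versus one interleaved UV plane). The Python
sketch below shows just that repacking. It is a layout illustration under
the assumption of no line padding, not what libswscale actually runs, and
the function name is made up.

```python
def yuv420p_to_nv12(y: bytes, u: bytes, v: bytes) -> bytes:
    """Repack planar 4:2:0 (Y, U, V planes) into NV12 (Y plane + interleaved UV)."""
    if len(u) != len(v):
        raise ValueError("U and V planes must be the same size")
    uv = bytearray(len(u) * 2)
    uv[0::2] = u   # even bytes: U samples
    uv[1::2] = v   # odd bytes: V samples
    return bytes(y) + bytes(uv)

# Tiny 4x2 frame: 8 luma samples, 2 chroma samples per plane.
y = bytes(range(8))
u = bytes([100, 101])
v = bytes([200, 201])
nv12 = yuv420p_to_nv12(y, u, v)
print(list(nv12[8:]))  # [100, 200, 101, 201]
```

No arithmetic is done on the samples, which fits with the > 500 fps
measured for this step alone.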

So at least I can beat gstreamer on CSC now. Testing the new command
line with x11grab gets me close to gst when using the legacy
x11grab = 65 fps.

libxcb x11grab is 52 fps though, so it would be good if that can be 
fixed up.
