[FFmpeg-devel] Performance of P010LE/BE pixel conversion

Ali KIZIL alikizil at gmail.com
Thu Sep 1 15:36:57 EEST 2016


> On 1 Sep 2016, at 14:59, Timo Rothenpieler <timo at rothenpieler.org> wrote:
>
> On 01.09.2016 at 13:44, Ronald S. Bultje wrote:
>> Hi Timo,
>>
>> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org>
>> wrote:
>>
>>>> Hi,
>>>>
>>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
>>>>
>>>>> Hi Oliver,
>>>>>
>>>>> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It
>>>>> doesn't make much difference. FPS still fluctuates between 41 and
>>>>> 44 fps for UHD P010LE HEVC Main 10 encoding.
>>>>>
>>>>> Also, rawvideo P010LE encoding fluctuates between 39 and 42 fps. For
>>>>> your information: while FPS ranges from 39 to 42 fps for YUV420P to
>>>>> P010LE, YUV420P to YUV420P10LE runs at about 75-76 fps:
>>>>
>>>>
>>>> I think this is expected: the p010le conversion is C (no SIMD). The
>>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>>>
>>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>>> swscale. Better yet, add direct conversions from yuv420p10 (which I
>>>> assume is the internal format of your actual source after decoding?)
>>>> to p010le, first C and then later x86 SIMD.
>>>
>>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
>>> implementation.
>>>
>>
>> I agree. I didn't mean to offend you for writing bad C code, or for not
>> writing SIMD code. I simply meant to point out that if you want to go from
>> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that
>> direction.
>
> Didn't take it like that, was more a general remark.
> The C implementation is as straightforward as it gets.
> I wonder if re-arranging the code could make it more efficient though.
> Stuff like moving some if() checks out of the loop, and duplicating the
> loop instead, or other tricks that lead to gcc generating faster code.
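
For illustration, the "move the if() out of the loop and duplicate the
loop" idea mentioned above can be sketched as follows. This is not the
swscale code; the function names (store16, pack_row_branchy,
pack_row_hoisted) and the choice of an endianness check as the hoisted
condition are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value in the requested byte order. */
static void store16(uint8_t *dst, uint16_t v, int big_endian)
{
    if (big_endian) {
        dst[0] = (uint8_t)(v >> 8);
        dst[1] = (uint8_t)(v & 0xff);
    } else {
        dst[0] = (uint8_t)(v & 0xff);
        dst[1] = (uint8_t)(v >> 8);
    }
}

/* Variant 1: the byte-order check is evaluated once per sample. */
void pack_row_branchy(uint8_t *dst, const uint16_t *src,
                      size_t n, int big_endian)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++)
        store16(dst + 2 * i, (uint16_t)(src[i] << 6), big_endian);
}

/* Variant 2: the check is done once, then one of two specialised
 * loops runs without any per-sample branch. */
void pack_row_hoisted(uint8_t *dst, const uint16_t *src,
                      size_t n, int big_endian)
{
    assert(dst && src);
    if (big_endian) {
        for (size_t i = 0; i < n; i++) {
            uint16_t v = (uint16_t)(src[i] << 6);
            dst[2 * i]     = (uint8_t)(v >> 8);
            dst[2 * i + 1] = (uint8_t)(v & 0xff);
        }
    } else {
        for (size_t i = 0; i < n; i++) {
            uint16_t v = (uint16_t)(src[i] << 6);
            dst[2 * i]     = (uint8_t)(v & 0xff);
            dst[2 * i + 1] = (uint8_t)(v >> 8);
        }
    }
}
```

Both variants produce the same bytes. Whether the second is measurably
faster depends on the compiler: an optimising build may already perform
this transformation on its own (GCC has -funswitch-loops for it), in
which case hand-hoisting would change little.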

I’m not sure it’ll make much difference - you may recall my original
patch had code in nvenc.c that took a YUV420P input and converted it
to P010 as it fed the frames into the encoder. Out of curiosity I did
some quick testing of this versus the code that has since been added
in swscale to support P010 conversions and could find no difference in
the time it took to encode my 60s sample. This was not an exhaustive
test by any means, but if there were any obvious inefficiency in the
swscale code I’d have expected to see some difference. I tested my
sample three times with each version of the code, and the time taken to
encode was virtually identical every time.

Oliver

Hi Oliver,

I followed your comment and tried your original patch. It works much
better: FPS goes up to 88-92 fps for UHD HEVC Main 10 with
YUV420P10LE.

I have attached my nvenc.c for you to check as well (I only added the
two conversion functions and changed the YUV420P10LE pixel format
selection part).
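
As a rough idea of what a direct planar-to-P010 conversion involves,
here is a minimal sketch. It is not the code from the attached nvenc.c
or from the original patch; the function name yuv420p10_to_p010, its
parameter list, and the use of strides counted in 16-bit samples are
assumptions made for this example. It relies on yuv420p10 keeping each
sample in the low 10 bits of a 16-bit word, while P010 keeps it in the
high 10 bits and interleaves U and V into a single plane. Samples are
handled as native uint16_t values, so byte order matches the LE formats
only on a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: convert one planar 4:2:0 10-bit frame to P010 layout.
 * All strides are in 16-bit samples, not bytes (assumption of this
 * sketch). Width and height must be even.
 */
void yuv420p10_to_p010(const uint16_t *src_y, size_t src_y_stride,
                       const uint16_t *src_u, const uint16_t *src_v,
                       size_t src_c_stride,
                       uint16_t *dst_y, size_t dst_y_stride,
                       uint16_t *dst_uv, size_t dst_uv_stride,
                       int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma: move each 10-bit sample into the high bits of the word. */
    for (int y = 0; y < height; y++) {
        const uint16_t *s = src_y + (size_t)y * src_y_stride;
        uint16_t *d = dst_y + (size_t)y * dst_y_stride;
        for (int x = 0; x < width; x++)
            d[x] = (uint16_t)(s[x] << 6);
    }

    /* Chroma: shift the same way and interleave U and V as U,V,U,V... */
    for (int y = 0; y < height / 2; y++) {
        const uint16_t *u = src_u + (size_t)y * src_c_stride;
        const uint16_t *v = src_v + (size_t)y * src_c_stride;
        uint16_t *d = dst_uv + (size_t)y * dst_uv_stride;
        for (int x = 0; x < width / 2; x++) {
            d[2 * x]     = (uint16_t)(u[x] << 6);
            d[2 * x + 1] = (uint16_t)(v[x] << 6);
        }
    }
}
```

Each output sample needs only one shift and one store, so a loop like
this touches every sample exactly once and avoids any intermediate
format.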


ffmpeg version N-81508-g99882d0 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
  configuration: --prefix=/opt/ffmpeg --enable-shared --enable-static
--enable-nonfree --enable-gpl --extra-cflags='-I/opt/ffmpeg/include
-I/usr/local/include' --extra-ldflags=-L/opt/ffmpeg/lib
--bindir=/opt/ffmpeg/bin --extra-libs=-ldl --enable-libx264
--enable-libx265 --enable-nonfree --enable-gpl --enable-nvenc
--enable-vdpau --enable-libzvbi --enable-libfdk-aac --enable-libzimg
--enable-avresample --enable-libnpp --enable-cuda
  libavutil      55. 29.100 / 55. 29.100
  libavcodec     57. 54.101 / 57. 54.101
  libavformat    57. 48.101 / 57. 48.101
  libavdevice    57.  0.102 / 57.  0.102
  libavfilter     6. 58.100 /  6. 58.100
  libavresample   3.  0.  0 /  3.  0.  0
  libswscale      4.  1.100 /  4.  1.100
  libswresample   2.  1.100 /  2.  1.100
  libpostproc    54.  0.100 / 54.  0.100
Routing option err_detect to both codec and muxer layer
Input #0, matroska,webm, from
'/media/usb1/4K_TS/SES.Astra.UHD.Test.1.2160p.UHDTV.AAC.HEVC.x265-LiebeIst.mkv':
  Metadata:
    encoder         : libebml v1.3.1 + libmatroska v1.4.2
    creation_time   : 2015-10-03T13:49:42.000000Z
  Duration: 00:01:49.29, start: 0.816000, bitrate: 18484 kb/s
    Stream #0:0: Video: hevc (Main 10), 1 reference frame,
yuv420p10le(tv), 3840x2160 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 1k tbn,
60 tbc (default)
    Metadata:
      BPS             : 18497251
      BPS-eng         : 18497251
      DURATION        : 00:01:48.450000000
      DURATION-eng    : 00:01:48.450000000
      NUMBER_OF_FRAMES: 6507
      NUMBER_OF_FRAMES-eng: 6507
      NUMBER_OF_BYTES : 250753360
      NUMBER_OF_BYTES-eng: 250753360
      _STATISTICS_WRITING_APP: mkvmerge v8.0.0 ('Til The Day That I Die') 64bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v8.0.0 ('Til The Day That
I Die') 64bit
      _STATISTICS_WRITING_DATE_UTC: 2015-10-03 13:49:42
      _STATISTICS_WRITING_DATE_UTC-eng: 2015-10-03 13:49:42
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:1: Audio: aac (LC), 44100 Hz, stereo, fltp (default)
    Metadata:
      BPS             : 124607
      BPS-eng         : 124607
      DURATION        : 00:01:49.267000000
      DURATION-eng    : 00:01:49.267000000
      NUMBER_OF_FRAMES: 4669
      NUMBER_OF_FRAMES-eng: 4669
      NUMBER_OF_BYTES : 1701940
      NUMBER_OF_BYTES-eng: 1701940
      _STATISTICS_WRITING_APP: mkvmerge v8.0.0 ('Til The Day That I Die') 64bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v8.0.0 ('Til The Day That
I Die') 64bit
      _STATISTICS_WRITING_DATE_UTC: 2015-10-03 13:49:42
      _STATISTICS_WRITING_DATE_UTC-eng: 2015-10-03 13:49:42
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
[graph 0 input from stream 0:0 @ 0x17a00a0] w:3840 h:2160
pixfmt:yuv420p10le tb:1/1000 fr:60/1 sar:1/1 sws_param:flags=2
[scaler for output stream 0:0 @ 0x17cf980] w:3840 h:2160
flags:'bicubic' interl:0
[scaler for output stream 0:0 @ 0x17cf980] w:3840 h:2160
fmt:yuv420p10le sar:1/1 -> w:3840 h:2160 fmt:yuv420p10le sar:1/1
flags:0x4
[graph 1 input from stream 0:1 @ 0x17d0140] tb:1/44100 samplefmt:fltp
samplerate:44100 chlayout:0x3
-async is forwarded to lavfi similarly to -af
aresample=async=1:min_hard_comp=0.100000:first_pts=0.
[graph 1 aresample for input stream 0:1 @ 0x17d0be0] ch:2 chl:stereo
fmt:fltp r:44100Hz -> ch:2 chl:stereo fmt:s16 r:44100Hz
[nvenc_hevc @ 0x17dca80] This encoder is deprecated, use 'hevc_nvenc' instead
[nvenc_hevc @ 0x17dca80] Loaded Nvenc version 7.0
[nvenc_hevc @ 0x17dca80] Nvenc initialized successfully
[nvenc_hevc @ 0x17dca80] 1 CUDA capable devices found
[nvenc_hevc @ 0x17dca80] [ GPU #0 - < TITAN X (Pascal) > has Compute SM 6.1 ]
[nvenc_hevc @ 0x17dca80] supports NVENC
[mpegts @ 0x17e78c0] Using AVStream.codec to pass codec parameters to
muxers is deprecated, use AVStream.codecpar instead.
    Last message repeated 1 times
[mpegts @ 0x17e78c0] muxrate 30000000, pcr every 398 pkts, sdt every
9973, pat/pmt every 1994 pkts
Output #0, mpegts, to '/tmp/test1.ts':
  Metadata:
    service_name    : PikoEncoder
    service_provider: PikoEncoder
    encoder         : Lavf57.48.101
    Stream #0:0: Video: hevc (nvenc_hevc) (Main 10), 1 reference
frame, yuv420p10le, 3840x2160 [SAR 1:1 DAR 16:9], q=-1--1, 28000 kb/s,
60 fps, 90k tbn, 60 tbc (default)
    Metadata:
      BPS             : 18497251
      BPS-eng         : 18497251
      DURATION        : 00:01:48.450000000
      DURATION-eng    : 00:01:48.450000000
      NUMBER_OF_FRAMES: 6507
      NUMBER_OF_FRAMES-eng: 6507
      NUMBER_OF_BYTES : 250753360
      NUMBER_OF_BYTES-eng: 250753360
      _STATISTICS_WRITING_APP: mkvmerge v8.0.0 ('Til The Day That I Die') 64bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v8.0.0 ('Til The Day That
I Die') 64bit
      _STATISTICS_WRITING_DATE_UTC: 2015-10-03 13:49:42
      _STATISTICS_WRITING_DATE_UTC-eng: 2015-10-03 13:49:42
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      encoder         : Lavc57.54.101 nvenc_hevc
    Side data:
      cpb: bitrate max/min/avg: 28000000/0/28000000 buffer size:
28000000 vbv_delay: -1
    Stream #0:1: Audio: mp2, 44100 Hz, stereo, s16, delay 481, padding
0, 384 kb/s (default)
    Metadata:
      BPS             : 124607
      BPS-eng         : 124607
      DURATION        : 00:01:49.267000000
      DURATION-eng    : 00:01:49.267000000
      NUMBER_OF_FRAMES: 4669
      NUMBER_OF_FRAMES-eng: 4669
      NUMBER_OF_BYTES : 1701940
      NUMBER_OF_BYTES-eng: 1701940
      _STATISTICS_WRITING_APP: mkvmerge v8.0.0 ('Til The Day That I Die') 64bit
      _STATISTICS_WRITING_APP-eng: mkvmerge v8.0.0 ('Til The Day That
I Die') 64bit
      _STATISTICS_WRITING_DATE_UTC: 2015-10-03 13:49:42
      _STATISTICS_WRITING_DATE_UTC-eng: 2015-10-03 13:49:42
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      encoder         : Lavc57.54.101 mp2
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> hevc (nvenc_hevc))
  Stream #0:1 -> #0:1 (aac (native) -> mp2 (native))
Press [q] to stop, [?] for help
[graph 1 aresample for input stream 0:1 @ 0x17d0be0] [SWR @ 0x17ea060]
adding 1014 audio samples of silence
[AVBSFContext @ 0x4b67660] The input looks like it is Annex B already
frame=  818 fps= 89 q=16.0 Lsize=   50894kB time=00:00:13.91
bitrate=29968.0kbits/s speed=1.51x
video:19561kB audio:653kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 151.775177%
Input file #0 (/media/usb1/4K_TS/SES.Astra.UHD.Test.1.2160p.UHDTV.AAC.HEVC.x265-LiebeIst.mkv):
  Input stream #0:0 (video): 832 packets read (26952754 bytes); 819
frames decoded;
  Input stream #0:1 (audio): 599 packets read (218347 bytes); 599
frames decoded (613376 samples);
  Total: 1431 packets (27171101 bytes) demuxed
Output file #0 (/tmp/test1.ts):
  Output stream #0:0 (video): 818 frames encoded; 818 packets muxed
(20030971 bytes);
  Output stream #0:1 (audio): 533 frames encoded (614016 samples); 533
packets muxed (668316 bytes);
  Total: 1351 packets (20699287 bytes) muxed
[nvenc_hevc @ 0x17dca80] Nvenc unloaded
Exiting normally, received signal 2.
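
To put the reported frame rate into perspective, the following sketch
works out how much data a 16-bit-per-sample 4:2:0 frame (P010 or
yuv420p10le) occupies and how many bytes per second that is at a given
frame rate. The function names are made up for this example, and the
89 fps figure is simply the value shown in the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 4:2:0 frame stored with 16 bits per sample. */
uint64_t frame_bytes_420_16bit(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t luma   = (uint64_t)width * height;
    /* Two chroma planes (or one interleaved plane), each subsampled 2x2. */
    uint64_t chroma = 2 * ((uint64_t)(width / 2) * (height / 2));
    return (luma + chroma) * 2; /* 2 bytes per sample */
}

/* Bytes written per second when producing frames at the given rate. */
uint64_t bytes_per_second(uint64_t frame_bytes, uint32_t fps)
{
    return frame_bytes * fps;
}
```

For 3840x2160 this gives 24,883,200 bytes per frame, or about 2.2 GB/s
of output at 89 fps, with a similar amount read from the source frame.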
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nvenc.c
Type: text/x-csrc
Size: 62555 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160901/1eea510b/attachment.c>

