[FFmpeg-devel] Performance of P010LE/BE pixel conversion
Ronald S. Bultje
rsbultje at gmail.com
Thu Sep 1 14:44:09 EEST 2016
On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org> wrote:
> > Hi,
> > On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
> >> Hi Oliver,
> >> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It doesn't
> >> make much difference. FPS still fluctuates between 41 and 44 fps for UHD
> >> P010LE Main 10 encoding.
> >> Rawvideo P010LE encoding also fluctuates between 39 and 42 fps. For your
> >> information: while YUV420P to P010LE runs at 39-42 fps, YUV420P to
> >> YUV420P10LE runs at about 75-76 fps.
> > I think this is expected, the p010le conversion is C (no SIMD). The
> > yuv420p10le conversion is using x86 SIMD (probably AVX).
> > To fix this, add x86 SIMD implementations of the p010le conversions in
> > swscale. Better yet, add direct conversions from yuv420p10 (which I guess
> > is the internal format of your actual source after decoding?) to p010le,
> > first C and then later x86 SIMD.
> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
> code.
I agree. I didn't mean to offend you for writing bad C code, or for not
writing SIMD code. I simply meant to point out that if you want to go from
40-50fps to 100+fps, SIMD is probably the easiest way to move in that
direction.
> Also, isn't the internal representation of YUV 10bit in swscale
> essentially yuv420p10 anyway, so the conversion already is as direct as
> it gets?
There is probably no conversion at all, right. But given that there's also
a video being decoded, which is much more CPU-intensive than colorspace
conversion, you wouldn't expect the colorspace conversion to slow it down
by >2x. (Unless it's C, of course. :-).)
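To make concrete how little work separates the two layouts, here is a minimal C sketch of a direct planar 10-bit to P010 repack. It assumes P010 keeps the 10 significant bits in the top of each 16-bit word (so each sample is shifted left by 6) and interleaves U and V into one plane. The function name, the argument order and the strides counted in 16-bit samples are all made up for illustration; this is not an existing swscale function, and it assumes a little-endian host so that native uint16_t matches the "le" formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of swscale. Strides are in 16-bit samples. */
static void yuv420p10_to_p010(const uint16_t *src_y, const uint16_t *src_u,
                              const uint16_t *src_v,
                              ptrdiff_t src_y_stride, ptrdiff_t src_c_stride,
                              uint16_t *dst_y, uint16_t *dst_uv,
                              ptrdiff_t dst_y_stride, ptrdiff_t dst_uv_stride,
                              int width, int height)
{
    const int cw = (width + 1) >> 1;   /* chroma width for 4:2:0  */
    const int ch = (height + 1) >> 1;  /* chroma height for 4:2:0 */

    assert(width > 0 && height > 0);

    /* Luma: move the 10 payload bits into the high bits of each word. */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst_y[x] = (uint16_t)(src_y[x] << 6);
        src_y += src_y_stride;
        dst_y += dst_y_stride;
    }

    /* Chroma: same shift, plus interleaving U and V as U0 V0 U1 V1 ... */
    for (int y = 0; y < ch; y++) {
        for (int x = 0; x < cw; x++) {
            dst_uv[2 * x]     = (uint16_t)(src_u[x] << 6);
            dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << 6);
        }
        src_u  += src_c_stride;
        src_v  += src_c_stride;
        dst_uv += dst_uv_stride;
    }
}
```

Per sample this is one load, one shift and one store, which is why a plain C loop like this is a reasonable starting point before any SIMD work.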
> > I have no idea why you would want to convert from yuv420p to p010le or
> > yuv420p10le. I understand swscale supports it (it should) but I doubt
> > that's how you want to generate 10 bits content.
> P010 is the only YUV420 10bit format NVENC supports.
His source in the given example was yuv420p. If your source is 8bit, encode
8 bits, not 10 bits. For 10bit encoding, use a 10bit source.
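A small C sketch of why widening an 8bit source buys nothing: it counts how many of the 1024 possible 10-bit codes are ever produced from 8-bit input. The plain left shift by 2 is an assumption made for illustration (a real scaler may widen differently), but no per-sample mapping can yield more distinct outputs than there are inputs. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of distinct 10-bit codes reachable from
 * 8-bit samples when widening with a plain left shift by 2. */
static int distinct_10bit_codes_from_8bit(void)
{
    uint8_t seen[1024];
    int count = 0;

    memset(seen, 0, sizeof(seen));
    for (int v = 0; v < 256; v++) {
        int wide = v << 2;          /* assumed widening: 8 bit -> 10 bit */
        assert(wide < 1024);
        if (!seen[wide]) {
            seen[wide] = 1;
            count++;
        }
    }
    return count;
}
```

Only a quarter of the 10-bit code space is used, so the encoder spends 10-bit effort on 8-bit information.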
So even if this is only a performance test, we need to think about whether
the test tells us something meaningful. In particular, to repeat what I
said earlier, if the source is represented as yuv420p10le after decoding, a
direct yuv420p10le to p010le conversion in C and SIMD is probably going to
be even more efficient than a SIMD implementation of the p010le (or p010be)
input/output that you wrote earlier, since that's the "slow" conversion.
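As a rough idea of what the SIMD side of such a direct conversion could look like, here is a sketch of one chroma row using SSE2 intrinsics, with a scalar fallback for the tail and for builds without SSE2. It assumes the same P010 layout as usual (10 bits in the high bits of each 16-bit word, U and V interleaved) and a little-endian host. The function name is hypothetical, and real swscale SIMD is written in x86 assembly rather than intrinsics, so treat this only as an illustration of the data movement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: one row of planar 10-bit U and V samples to one row
 * of interleaved P010 UV. cw is the chroma width in samples. */
static void chroma_row_to_p010(const uint16_t *u, const uint16_t *v,
                               uint16_t *uv, int cw)
{
    int x = 0;

    assert(cw >= 0);

#if defined(__SSE2__)
    /* 8 U and 8 V samples per iteration: shift, then interleave. */
    for (; x + 8 <= cw; x += 8) {
        __m128i vu = _mm_slli_epi16(
            _mm_loadu_si128((const __m128i *)(u + x)), 6);
        __m128i vv = _mm_slli_epi16(
            _mm_loadu_si128((const __m128i *)(v + x)), 6);
        /* unpacklo/unpackhi produce U0 V0 U1 V1 ... U7 V7 */
        _mm_storeu_si128((__m128i *)(uv + 2 * x),
                         _mm_unpacklo_epi16(vu, vv));
        _mm_storeu_si128((__m128i *)(uv + 2 * x + 8),
                         _mm_unpackhi_epi16(vu, vv));
    }
#endif

    /* Scalar tail, and the whole row when SSE2 is unavailable. */
    for (; x < cw; x++) {
        uv[2 * x]     = (uint16_t)(u[x] << 6);
        uv[2 * x + 1] = (uint16_t)(v[x] << 6);
    }
}
```

The luma row is simpler still: the same load, shift and store without the unpack step.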
If this is confusing, poke me at VDD (QtCon) and I'll explain in more
detail.