[FFmpeg-devel] Performance of P010LE/BE pixel conversion
ovcollyer at mac.com
Thu Sep 1 14:52:49 EEST 2016
> On 1 Sep 2016, at 14:44, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi Timo,
> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org> wrote:
>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
>>>> Hi Oliver,
>>>> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It doesn't
>>>> make much difference. The frame rate still fluctuates between 41 and 44 fps
>>>> for UHD P010LE Main 10 encoding.
>>>> Also, rawvideo P010LE encoding fluctuates between 39 and 42 fps. For your
>>>> note: while YUV420P to P010LE fluctuates between 39 and 42 fps, YUV420P to
>>>> YUV420P10LE runs at about 75-76 fps.
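For scale, a rough per-frame budget from the figures above. This assumes "UHD" means 3840×2160 and takes about 40 fps and 75 fps as round representatives of the P010LE and YUV420P10LE paths; both 10-bit layouts use one 16-bit word per sample, so the output size per frame is the same. It is back-of-the-envelope arithmetic, not a measurement:

```latex
\begin{aligned}
N_{\text{samples}} &= 3840 \cdot 2160 + 2 \cdot (1920 \cdot 1080) = 12\,441\,600 \\
B_{\text{frame}} &= 2\ \text{bytes} \cdot N_{\text{samples}} = 24\,883\,200\ \text{bytes} \approx 24.9\ \text{MB} \\
t_{\text{P010LE}} &\approx \tfrac{1}{40}\ \text{s} = 25.0\ \text{ms}, \qquad
t_{\text{YUV420P10LE}} \approx \tfrac{1}{75}\ \text{s} \approx 13.3\ \text{ms} \\
\Delta t &\approx 25.0 - 13.3 \approx 11.7\ \text{ms per frame}
\end{aligned}
```

Under these assumptions the P010LE path spends roughly 11-12 ms more per frame while writing the same amount of data, which is at least consistent with the explanation that the gap lies in how the conversion code is implemented rather than in data volume.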
>>> I think this is expected: the p010le conversion is C (no SIMD). The
>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>> swscale. Better yet, add direct conversions from yuv420p10 (which I assume
>>> is the internal format of your actual source after decoding?) to p010le,
>>> first in C and then later in x86 SIMD.
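A minimal scalar C sketch of the direct yuv420p10le to p010le conversion suggested here. The function name and signature are hypothetical, not swscale's API. It assumes a little-endian host (so native uint16_t stores produce LE output), even width and height, and strides counted in uint16_t elements rather than bytes. P010 keeps each 10-bit sample in the high bits of a 16-bit word and stores U and V interleaved in a single half-height plane:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts one yuv420p10le frame to p010le.
 * Assumptions: little-endian host (native uint16_t stores are LE),
 * even width/height, strides in uint16_t elements (not bytes). */
static void yuv420p10le_to_p010le(const uint16_t *src_y, ptrdiff_t src_y_stride,
                                  const uint16_t *src_u, ptrdiff_t src_u_stride,
                                  const uint16_t *src_v, ptrdiff_t src_v_stride,
                                  uint16_t *dst_y, ptrdiff_t dst_y_stride,
                                  uint16_t *dst_uv, ptrdiff_t dst_uv_stride,
                                  int width, int height)
{
    /* Luma: move the 10 significant bits from bits 0..9 to bits 6..15. */
    for (int y = 0; y < height; y++) {
        const uint16_t *s = src_y + y * src_y_stride;
        uint16_t *d = dst_y + y * dst_y_stride;
        for (int x = 0; x < width; x++)
            d[x] = (uint16_t)(s[x] << 6);
    }

    /* Chroma: interleave U and V (U first) into one half-height plane,
     * applying the same shift to each sample. */
    for (int y = 0; y < height / 2; y++) {
        const uint16_t *u = src_u + y * src_u_stride;
        const uint16_t *v = src_v + y * src_v_stride;
        uint16_t *d = dst_uv + y * dst_uv_stride;
        for (int x = 0; x < width / 2; x++) {
            d[2 * x]     = (uint16_t)(u[x] << 6);
            d[2 * x + 1] = (uint16_t)(v[x] << 6);
        }
    }
}
```

FFmpeg frame linesizes are in bytes, so a real integration would halve them (or work on byte pointers). The sketch only shows the per-sample work, a shift by 6 plus a U/V interleave for chroma, which is the part a SIMD version would vectorize.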
>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C code.
> I agree. I didn't mean to offend you for writing bad C code, or for not
> writing SIMD code. I simply meant to point out that if you want to go from
> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that direction.
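A minimal sketch of what an x86 SIMD version of the per-row work might look like, using SSE2 intrinsics (`_mm_slli_epi16` for the shift, `_mm_unpacklo_epi16`/`_mm_unpackhi_epi16` for the U/V interleave). This is an illustration under assumptions, not FFmpeg code: FFmpeg's own x86 SIMD is written mostly as external assembly rather than intrinsics, the function names are hypothetical, and it assumes an x86 host with SSE2 (little-endian, so the output is P010LE):

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical luma row: shift eight 16-bit samples at a time into bits 6..15. */
static void p010_luma_row_sse2(const uint16_t *src, uint16_t *dst, int width)
{
    int x = 0;
    for (; x + 8 <= width; x += 8) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + x));
        _mm_storeu_si128((__m128i *)(dst + x), _mm_slli_epi16(s, 6));
    }
    for (; x < width; x++)             /* scalar tail for leftover samples */
        dst[x] = (uint16_t)(src[x] << 6);
}

/* Hypothetical chroma row: shift U and V, then interleave them (U first).
 * chroma_width is the number of U (= V) samples in the row. */
static void p010_chroma_row_sse2(const uint16_t *u, const uint16_t *v,
                                 uint16_t *dst_uv, int chroma_width)
{
    int x = 0;
    for (; x + 8 <= chroma_width; x += 8) {
        __m128i su = _mm_slli_epi16(_mm_loadu_si128((const __m128i *)(u + x)), 6);
        __m128i sv = _mm_slli_epi16(_mm_loadu_si128((const __m128i *)(v + x)), 6);
        /* unpacklo -> U0 V0 U1 V1 U2 V2 U3 V3, unpackhi -> U4 V4 ... U7 V7 */
        _mm_storeu_si128((__m128i *)(dst_uv + 2 * x),     _mm_unpacklo_epi16(su, sv));
        _mm_storeu_si128((__m128i *)(dst_uv + 2 * x + 8), _mm_unpackhi_epi16(su, sv));
    }
    for (; x < chroma_width; x++) {    /* scalar tail */
        dst_uv[2 * x]     = (uint16_t)(u[x] << 6);
        dst_uv[2 * x + 1] = (uint16_t)(v[x] << 6);
    }
}
```

The scalar tails keep the functions correct for any width; wider registers (AVX2) would follow the same shift-then-unpack pattern.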
>> Also, isn't the internal representation of YUV 10bit in swscale
>> essentially yuv420p10 anyway, so the conversion already is as direct as
>> it gets?
> There is probably no conversion at all, right. But given that there's also
> a video being decoded, which is much more CPU-intensive than colorspace
> conversion, you wouldn't expect the colorspace conversion to slow it down
> by >2x. (Unless it's C, of course. :-).)
>>> I have no idea why you would want to convert from yuv420p to p010le or
>>> yuv420p10le. I understand swscale supports it (it should), but I doubt
>>> that's how you want to generate 10-bit content.
>> P010 is the only YUV420 10bit format NVENC supports.
> His source in the given example was yuv420p. If your source is 8-bit, encode
> 8-bit, not 10-bit. For 10-bit encoding, use a 10-bit source.
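To illustrate the point about 8-bit sources at the bit level: when an 8-bit sample is promoted to 10 bits with a plain left shift (one common choice; swscale's exact scaling or dithering may differ), the two new low bits are zero, and packing into P010 shifts by another 6. The helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical: promote an 8-bit sample to 10 bits with a plain shift,
 * which appends two zero LSBs (no new picture information). */
static uint16_t u8_to_10bit(uint8_t v)
{
    return (uint16_t)(v << 2);
}

/* Hypothetical: P010 keeps the 10 significant bits in bits 6..15. */
static uint16_t u10_to_p010(uint16_t v)
{
    return (uint16_t)(v << 6);
}
```

With this mapping the 10-bit frame carries no picture information that the 8-bit source lacked; any file-size difference when encoding it at 10 bits, like the one reported in the reply below, presumably comes from how the encoder behaves at the higher bit depth rather than from the source.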
When I did some tests of this a week or so ago, I found that taking an 8-bit source, converting it to 10-bit and encoding as 10-bit could actually save space. I posted my results to this list.
I tried it after reading this...
…and was curious to see if it applied to NVENC HEVC.
I only tried one sample file, a yuv420p Slingbox capture, but when I kept the global quality setting constant I saved a fair bit on the output file size.
Interestingly (or not), I couldn’t reproduce anything similar with x265 using a comparable approach.
> So even if this is only a performance test, we need to think about whether
> the test tells us something meaningful. In particular, to repeat what I
> said earlier, if the source is represented as yuv420p10le after decoding, a
> direct yuv420p10le to p010le conversion in C and SIMD is probably going to
> be even-more-efficient than a SIMD implementation of the p010le (or p010be)
> input/output that you wrote earlier, since that's the "slow" conversion
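The "(or be)" here refers to p010be, which differs from p010le only in byte order. A minimal sketch, with hypothetical helper names, that writes one P010 sample in an explicit byte order independent of host endianness:

```c
#include <stdint.h>

/* Hypothetical: write one 10-bit sample as a P010 word (bits 6..15),
 * least significant byte first (p010le). */
static void store_p010le(uint8_t *dst, uint16_t v10)
{
    uint16_t w = (uint16_t)(v10 << 6);
    dst[0] = (uint8_t)(w & 0xFF);
    dst[1] = (uint8_t)(w >> 8);
}

/* Hypothetical: same word, most significant byte first (p010be). */
static void store_p010be(uint8_t *dst, uint16_t v10)
{
    uint16_t w = (uint16_t)(v10 << 6);
    dst[0] = (uint8_t)(w >> 8);
    dst[1] = (uint8_t)(w & 0xFF);
}
```

Writing bytes explicitly keeps the result the same on any host; a SIMD path for the non-native order would typically add a byte swap after the shift.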
> If this is confusing, poke me at VDD (QtCon) and I'll explain in more detail.
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org