[FFmpeg-devel] Performance of P010LE/BE pixel conversion

Oliver Collyer ovcollyer at mac.com
Thu Sep 1 15:08:16 EEST 2016


> On 1 Sep 2016, at 14:59, Timo Rothenpieler <timo at rothenpieler.org> wrote:
> 
> Am 01.09.2016 um 13:44 schrieb Ronald S. Bultje:
>> Hi Timo,
>> 
>> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org>
>> wrote:
>> 
>>>> Hi,
>>>> 
>>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
>>>> 
>>>>> Hi Oliver,
>>>>> 
>>>>> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It doesn't
>>>>> make much difference. FPS still fluctuates between 41 and 44 for UHD P010LE
>>>>> HEVC Main 10 encoding.
>>>>> 
>>>>> Also, rawvideo P010LE encoding fluctuates between 39 and 42 fps. For your
>>>>> information: while FPS ranges from 39 to 42 for YUV420P to P010LE, YUV420P
>>>>> to YUV420P10LE runs at about 75-76 fps:
>>>> 
>>>> 
>>>> I think this is expected, the p010le conversion is C (no SIMD). The
>>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>>> 
>>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>>> swscale. Better yet, add direct conversions from yuv420p10 (which I assume
>>>> is the internal format of your actual source after decoding?) to p010le,
>>>> first C and then later x86 SIMD.
>>> 
>>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
>>> implementation.
>>> 
>> 
>> I agree. I didn't mean to offend you for writing bad C code, or for not
>> writing SIMD code. I simply meant to point out that if you want to go from
>> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that
>> direction.
> 
> Didn't take it like that, was more a general remark.
> The C implementation is as straightforward as it gets.
> I wonder if rearranging the code could make it more efficient though.
> Stuff like moving some if() checks out of the loop and duplicating the
> loop instead, or other tricks that lead to gcc generating faster code.

I’m not sure it’ll make much difference. You may recall that my original patch had code in nvenc.c that took YUV420P input and converted it to P010 as it fed the frames into the encoder. Out of curiosity, I did some quick testing of that code against the P010 conversion code that has since been added to swscale, and could find no difference in the time it took to encode my 60s sample. It was not an exhaustive test by any means, but I ran my sample three times with each version of the code and the encode time was virtually identical every time. If there were any obvious inefficiency in the swscale code, I would have expected to see some difference.
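
For anyone following along, here is a rough sketch of what a plain C YUV420P to P010 conversion looks like. It is not the code from my patch or from swscale; the function name, parameters, and the choice to express strides in elements rather than bytes are all made up for illustration. It assumes even width and height, and it writes native-endian 16-bit words, so it only produces P010LE on a little-endian host. P010 keeps each 10-bit sample in the upper bits of a 16-bit word, so an 8-bit sample ends up shifted left by 8 (2 bits to widen to 10, then 6 to align).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function.
 * Converts 8-bit planar 4:2:0 (separate Y, U, V planes) to P010
 * (16-bit Y plane plus one interleaved 16-bit UV plane).
 * Strides are given in elements, not bytes (an assumption made here
 * to keep the indexing simple). Width and height must be even. */
static void yuv420p_to_p010(const uint8_t *src_y, ptrdiff_t src_y_stride,
                            const uint8_t *src_u, ptrdiff_t src_u_stride,
                            const uint8_t *src_v, ptrdiff_t src_v_stride,
                            uint16_t *dst_y, ptrdiff_t dst_y_stride,
                            uint16_t *dst_uv, ptrdiff_t dst_uv_stride,
                            int width, int height)
{
    int x, y;

    /* Luma: one output word per input byte, sample moved to the top bits. */
    for (y = 0; y < height; y++) {
        const uint8_t *in  = src_y + y * src_y_stride;
        uint16_t      *out = dst_y + y * dst_y_stride;
        for (x = 0; x < width; x++)
            out[x] = (uint16_t)(in[x] << 8);
    }

    /* Chroma: half resolution in both directions, U and V interleaved. */
    for (y = 0; y < height / 2; y++) {
        const uint8_t *in_u = src_u + y * src_u_stride;
        const uint8_t *in_v = src_v + y * src_v_stride;
        uint16_t      *out  = dst_uv + y * dst_uv_stride;
        for (x = 0; x < width / 2; x++) {
            out[2 * x]     = (uint16_t)(in_u[x] << 8);
            out[2 * x + 1] = (uint16_t)(in_v[x] << 8);
        }
    }
}
```

The inner loops have no branches and touch each sample once, which may be part of why rearranging this kind of code does not change the timing much.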

Oliver
