[FFmpeg-devel] Performance of P010LE/BE pixel conversion

Timo Rothenpieler timo at rothenpieler.org
Thu Sep 1 14:59:36 EEST 2016

On 01.09.2016 at 13:44, Ronald S. Bultje wrote:
> Hi Timo,
> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org>
> wrote:
>>> Hi,
>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
>>>> Hi Oliver,
>>>> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It
>>>> doesn't make much difference. FPS still fluctuates between 41 and 44 fps
>>>> for UHD P010LE Main 10 encoding.
>>>> Also, rawvideo P010LE encoding fluctuates between 39 and 42 fps. For
>>>> your information: while FPS varies from 39 to 42 fps for YUV420P to
>>>> P010LE, YUV420P to YUV420P10LE runs at about 75-76 fps:
>>> I think this is expected, the p010le conversion is C (no SIMD). The
>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>> swscale. Better yet, add direct conversions from yuv420p10 (which I assume
>>> is the internal format of your actual source after decoding?) to p010le,
>>> first C and then later x86 SIMD.
>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
>> implementation.
> I agree. I didn't mean to offend you for writing bad C code, or for not
> writing SIMD code. I simply meant to point out that if you want to go from
> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that
> direction.

Didn't take it like that; it was more of a general remark.
The C implementation is as straightforward as it gets.
I wonder if rearranging the code could make it more efficient, though.
Stuff like moving some if() checks out of the loop and duplicating the
loop instead, or other tricks that lead to gcc generating faster code.
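
To illustrate the kind of rearrangement meant above, here is a minimal
sketch, not the actual swscale code. The function names and the 8-bit luma
input are assumptions made up for this example. It relies only on P010
keeping the 10-bit sample in the high bits of each 16-bit word, so an 8-bit
value v ends up as v << 8. The first version checks the byte order for
every pixel, the second checks it once and duplicates the loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: the endianness check sits inside the loop,
 * so it is evaluated once per pixel. */
static void luma8_to_p010_branchy(uint8_t *dst, const uint8_t *src,
                                  size_t width, int big_endian)
{
    for (size_t i = 0; i < width; i++) {
        uint16_t v = (uint16_t)(src[i] << 8); /* 8 bit -> 10 bit in the high bits */
        if (big_endian) {
            dst[2 * i]     = (uint8_t)(v >> 8);
            dst[2 * i + 1] = (uint8_t)(v & 0xFF);
        } else {
            dst[2 * i]     = (uint8_t)(v & 0xFF);
            dst[2 * i + 1] = (uint8_t)(v >> 8);
        }
    }
}

/* Same result, but the check is hoisted out and the loop is duplicated,
 * leaving two branch-free loop bodies. */
static void luma8_to_p010_unswitched(uint8_t *dst, const uint8_t *src,
                                     size_t width, int big_endian)
{
    if (big_endian) {
        for (size_t i = 0; i < width; i++) {
            dst[2 * i]     = src[i]; /* high byte first */
            dst[2 * i + 1] = 0;      /* low byte is always zero here */
        }
    } else {
        for (size_t i = 0; i < width; i++) {
            dst[2 * i]     = 0;      /* low byte first */
            dst[2 * i + 1] = src[i];
        }
    }
}
```

A compiler may already do this transformation on its own at higher
optimization levels, so whether the manual version is actually faster would
have to be measured.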
