[FFmpeg-devel] Performance of P010LE/BE pixel conversion

Ronald S. Bultje rsbultje at gmail.com
Thu Sep 1 18:08:36 EEST 2016


Hi Timo,

On Thu, Sep 1, 2016 at 7:59 AM, Timo Rothenpieler <timo at rothenpieler.org>
wrote:

> On 01.09.2016 at 13:44, Ronald S. Bultje wrote:
> > Hi Timo,
> >
> > On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <timo at rothenpieler.org>
> > wrote:
> >
> >>> Hi,
> >>>
> >>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <alikizil at gmail.com> wrote:
> >>>
> >>>> Hi Oliver,
> >>>>
> >>>> I just set my DDR3 RAM speed to 2133 MHz on the i7-4960X server. It
> >>>> doesn't make much difference. FPS still fluctuates between 41 and 44 fps
> >>>> for UHD P010LE HEVC Main 10 encoding.
> >>>>
> >>>> Also, rawvideo P010LE encoding fluctuates between 39 and 42 fps. For your
> >>>> note: while YUV420P to P010LE runs at 39-42 fps, YUV420P to YUV420P10LE
> >>>> runs at about 75-76 fps:
> >>>
> >>>
> >>> I think this is expected: the p010le conversion is plain C (no SIMD). The
> >>> yuv420p10le conversion uses x86 SIMD (probably AVX).
> >>>
> >>> To fix this, add x86 SIMD implementations of the p010le conversions in
> >>> swscale. Better yet, add direct conversions from yuv420p10 (which I assume
> >>> is the internal format of your actual source after decoding?) to p010le,
> >>> first in C and later in x86 SIMD.
> >>
> >> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
> >> implementation.
> >>
> >
> > I agree. I didn't mean to suggest that your C code is bad, or to blame you
> > for not writing SIMD code. I simply meant to point out that if you want to
> > go from 40-50 fps to 100+ fps, SIMD is probably the easiest way to move in
> > that direction.
>
> I didn't take it like that; it was more of a general remark.
> The C implementation is as straightforward as it gets.
> I wonder whether rearranging the code could make it more efficient, though,
> e.g. moving some if() checks out of the loop and duplicating the loop
> instead, or other tricks that lead gcc to generate faster code.
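For illustration, here is a minimal C sketch of a direct yuv420p10 to P010 conversion for one chroma row, using the "move the if() out of the loop and duplicate the loop" trick. It is not swscale's actual code: the function names are hypothetical, and it assumes a little-endian host (as on x86), so the big_endian branch byte-swaps each word to produce P010BE. It relies on the layouts of the two formats: yuv420p10 keeps each 10-bit sample in the low bits of a 16-bit word in three separate planes, while P010 keeps it in the high 10 bits and interleaves U and V in a single plane, so the conversion is a left shift by 6 plus interleaving.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of a 16-bit word. */
static inline uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Hypothetical direct conversion of one chroma row: separate U and V
 * rows of yuv420p10 (10-bit values in the low bits) to one interleaved
 * UV row of P010 (10-bit values in the high bits).
 * Assumes a little-endian host: big_endian != 0 requests P010BE output.
 * The endianness test is hoisted out of the loop and the loop is
 * duplicated, so the per-pixel body contains no branch. */
static void yuv420p10_to_p010_uv_row(const uint16_t *u, const uint16_t *v,
                                     uint16_t *uv, size_t width,
                                     int big_endian)
{
    if (big_endian) {
        for (size_t i = 0; i < width; i++) {
            uv[2 * i]     = bswap16((uint16_t)(u[i] << 6));
            uv[2 * i + 1] = bswap16((uint16_t)(v[i] << 6));
        }
    } else {
        for (size_t i = 0; i < width; i++) {
            uv[2 * i]     = (uint16_t)(u[i] << 6);
            uv[2 * i + 1] = (uint16_t)(v[i] << 6);
        }
    }
}
```

Whether gcc already performs this loop unswitching on its own depends on the compiler version and flags, so it is worth checking the generated code before and after.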


So, partially. I just saw your other patch, and it indeed does very little,
but you'll still be able to get some speedup out of SIMD. SIMD is faster
simply because it lets you process 8 or so pixels per iteration of
instructions (instead of just 1).
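To make the "8 pixels per iteration" point concrete, here is a sketch of the luma part of a yuv420p10 to P010LE conversion (a left shift by 6 of every 16-bit sample), once in scalar C and once with SSE2 intrinsics, where one 128-bit register holds eight 16-bit pixels. It assumes an x86 compiler providing `<emmintrin.h>`; the function names are hypothetical. FFmpeg itself writes its x86 SIMD as external assembly (the style the x264 asm intro teaches) rather than intrinsics, so treat this only as an illustration of the idea.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one pixel per loop iteration. */
static void p10_to_p010_luma_c(const uint16_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)(src[i] << 6);
}

/* SSE2 version: each iteration loads, shifts and stores 8 pixels.
 * The trailing scalar loop handles the remaining n % 8 pixels. */
static void p10_to_p010_luma_sse2(const uint16_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i px = _mm_loadu_si128((const __m128i *)(src + i));
        px = _mm_slli_epi16(px, 6); /* shift each 16-bit lane left by 6 */
        _mm_storeu_si128((__m128i *)(dst + i), px);
    }
    for (; i < n; i++)
        dst[i] = (uint16_t)(src[i] << 6);
}
```

Both functions produce identical output; the SIMD loop simply issues one load, one shift and one store for every 8 pixels instead of for every pixel.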

If you're wondering how to get started with SIMD in ffmpeg, I highly
recommend the x264 asm intro:
https://wiki.videolan.org/X264_asm_intro/

Ronald

