[FFmpeg-devel] Memcpy Operation Duration

Sven C. Dack sven.c.dack at sky.com
Tue Oct 18 22:44:36 EEST 2016

On 18/10/16 20:26, Ali KIZIL wrote:
> Hi Everyone,
> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it takes
> longer than an optimized SSE, SSE2, MMX, MMX2, AVX or AVX2 based memcpy
> operation.
> I tried an FFmpeg build compiled with march=corei7-avx2; it does not change
> the duration of the memcpy operation.
> I also followed https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
> . Same result. In addition, I tried gcc 6.2 in case gcc was not selecting
> the correct flag. Same result again.
> These memcpy operations affect the decoding (and probably encoding) fps
> rates.
> In one case, an unscaled uyvy422 to p010 3840x2160 conversion in rawvideo,
> the fps rate increased from 44 fps to 52 fps on a Xeon E5 2630 v4.
> Am I missing anything when compiling FFmpeg for AVX2 or other optimisation
> flags, or does FFmpeg need a fix to direct some (or all) memcpy operations to
> an inherited memcpy operation which can decide the flag for optimisation?
> Or is there no such need and I am on the wrong path?
> (As a side note, FFmpeg works performance on i7 Extreme cores compared to
> Xeon v4 processors.)
> Kind Regards,

It could be that gcc's built-in memcpy is being used. It's been said that
libc's memcpy is occasionally faster than gcc's built-in version.

Use -fno-builtin-memcpy and see what difference it makes.
