[FFmpeg-devel] swscale/swscale_unscaled : add X86_64 (SSE2, AVX) for uyvyto422
h.leppkes at gmail.com
Tue Apr 3 12:20:35 EEST 2018
On Tue, Apr 3, 2018 at 2:10 AM, James Almer <jamrial at gmail.com> wrote:
> On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
>> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vignali at gmail.com>:
>>> Around 20% faster on a benchmark command that tests pix_fmt conversion
>>> (4.2 s with the patch, 5.2 s without).
>>> Passes FATE for me.
>>> checkasm results:
>>> uyvytoyuv422_c: 14146.6
>>> uyvytoyuv422_mmx: 13696.4
>>> uyvytoyuv422_mmxext: 19395.9
>> Something looks wrong here...
>> Carl Eugen
> On a Haswell using GCC I get:
> uyvytoyuv422_c: 44884.2
> uyvytoyuv422_mmx: 15284.5
> uyvytoyuv422_mmxext: 28656.5
> uyvytoyuv422_sse2: 10921.8
> uyvytoyuv422_avx: 10606.5
> Martin is using a Clang version that, for some reason, ignores our
> attempts at disabling tree vectorization, so the compiler auto-vectorizes
> his C function with SIMD instructions, hence the good result.
> However, the mmxext version being slower than the mmx one seems to be an
> existing issue in the tree, which we should probably deal with, unless
> of course the test is wrong.
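For comparison, the quoted figures can be reduced to ratios. checkasm reports a per-call timing where lower is better; no particular unit is assumed here, only the relative values are used.

```latex
\begin{aligned}
\text{Martin's benchmark (wall time):}\quad & \frac{5.2\,\text{s}}{4.2\,\text{s}} \approx 1.24\times,
  \qquad \frac{5.2 - 4.2}{5.2} \approx 19\%\ \text{less time} \\
\text{Haswell, GCC:}\quad & \frac{44884.2}{10921.8} \approx 4.11\ (\text{C vs. SSE2}),
  \qquad \frac{44884.2}{10606.5} \approx 4.23\ (\text{C vs. AVX}) \\
& \frac{28656.5}{15284.5} \approx 1.87\ (\text{mmxext slower than mmx})
\end{aligned}
```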
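To show what the function being benchmarked does, here is a minimal scalar sketch of a UYVY to planar YUV 4:2:2 split. It assumes an even width and byte strides. The name `uyvy_to_yuv422p_sketch` is hypothetical, and this is neither the code from the patch nor libswscale's own C implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch (not FFmpeg's uyvytoyuv422_c).
 * Packed UYVY stores each pixel pair as U0 Y0 V0 Y1. The output is planar
 * 4:2:2: a full-width Y plane and half-width U and V planes, with the same
 * number of rows. All strides are in bytes. Assumes an even width. */
static void uyvy_to_yuv422p_sketch(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                                   const uint8_t *src, int width, int height,
                                   ptrdiff_t lum_stride, ptrdiff_t chrom_stride,
                                   ptrdiff_t src_stride)
{
    assert(width % 2 == 0);
    for (int y = 0; y < height; y++) {
        const uint8_t *s = src + y * src_stride;
        uint8_t *yd = ydst + y * lum_stride;
        uint8_t *ud = udst + y * chrom_stride;
        uint8_t *vd = vdst + y * chrom_stride;
        /* Each 4-byte group holds two luma samples sharing one U and one V. */
        for (int x = 0; x < width / 2; x++) {
            ud[x]         = s[4 * x + 0];
            yd[2 * x]     = s[4 * x + 1];
            vd[x]         = s[4 * x + 2];
            yd[2 * x + 1] = s[4 * x + 3];
        }
    }
}
```

The inner loop only moves bytes and does no arithmetic, so an auto-vectorizing compiler can plausibly turn it into wide loads and byte shuffles. That is consistent with the explanation above of why a Clang build that does not honor the vectorization opt-out produces a C baseline close to the hand-written SIMD versions.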
It's MMX; dealing with it would probably entail just deleting it. We can
leave the ordinary MMX version and remove the mmxext one, or perhaps just