[FFmpeg-devel] [PATCH 10/11] avcodec/blockdsp: add AVX-512 version of clear_block(s)
James Almer
jamrial at gmail.com
Fri Nov 10 16:36:36 EET 2017
On 11/10/2017 10:28 AM, James Darnley wrote:
> On 2017-11-09 20:35, Martin Vignali wrote:
>> 2017-11-09 12:58 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>>
>>> From: James Darnley <james.darnley at gmail.com>
>>>
>>> Also adjust alignment requirements where necessary.
>>> ---
>>> Whether this patch is committed or not the change to 4xm.c should be
>>> picked to master because the alignment is wrong for the AVX version of
>>> this function. I assume it hasn't been noticed yet because it manages to
>>> be 32-byte aligned without intervention.
>>>
>>>
>> Thanks for fixing 4xm; I missed it in the AVX patch.
>>
>> Just out of curiosity: can you post the checkasm results (I can't test AVX-512)?
>
> I certainly can.
>
>> $ ./tests/checkasm/checkasm --bench --test=blockdsp
>> benchmarking with native FFmpeg timers
>> nop: 26.0
>> checkasm: using random seed 402373647
>> MMX:
>> - blockdsp.blockdsp [OK]
>> SSE:
>> - blockdsp.blockdsp [OK]
>> AVX:
>> - blockdsp.blockdsp [OK]
>> AVX-512:
>> - blockdsp.blockdsp [OK]
>> checkasm: all 8 tests passed
>> blockdsp.clear_block_c: 23.5
>> blockdsp.clear_block_mmx: 11.5
>> blockdsp.clear_block_sse: 5.5
>> blockdsp.clear_block_avx: 3.0
>> blockdsp.clear_block_avx512: 5.0
This one doesn't look worth adding: it's slower than the AVX version
(5.0 vs. 3.0).
>> blockdsp.clear_blocks_c: 48.0
>> blockdsp.clear_blocks_mmx: 77.0
>> blockdsp.clear_blocks_sse: 38.0
>> blockdsp.clear_blocks_avx: 18.5
>> blockdsp.clear_blocks_avx512: 11.0
This one is better, but we need a perf run to check how much CPU time is
actually spent in this function, because I'm not sure it's important
enough to justify having the CPU throttled just to run AVX-512 code.
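
For reference, the relative numbers can be worked out from the checkasm
output quoted above. This is only a rough sketch: the timings are copied
from that output, while the `timings` table and the `speedup` helper are
names made up for illustration and are not part of checkasm or FFmpeg.
It also says nothing about how much of a real decode is spent in these
functions, which is what a perf run would have to show.

```python
# Timings copied from the checkasm --bench output quoted above
# (lower is faster). Names and layout are illustrative only.
timings = {
    "clear_block": {"c": 23.5, "mmx": 11.5, "sse": 5.5, "avx": 3.0, "avx512": 5.0},
    "clear_blocks": {"c": 48.0, "mmx": 77.0, "sse": 38.0, "avx": 18.5, "avx512": 11.0},
}


def speedup(function, baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`.

    A value below 1.0 means the candidate is slower than the baseline.
    """
    return timings[function][baseline] / timings[function][candidate]


for function in timings:
    ratio = speedup(function, "avx", "avx512")
    verdict = "faster" if ratio > 1 else "slower"
    print(f"{function}: avx512 runs at {ratio:.2f}x the speed of avx ({verdict})")
```

By these numbers the AVX-512 clear_block is a regression against AVX,
while clear_blocks gains a bit under 1.7x in isolation.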