[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding
Martin Vignali
martin.vignali at gmail.com
Tue Oct 10 22:54:16 EEST 2017
>
>> This is still slower than the memset numbers from the first test, why
>> the high variation in there?
>>
>>
>
Hello,
Maybe the results in my first email were not very clear.
For the results below, I ran the checkasm test 10 times in each case and
kept the fastest result.
Original benchmark (similar to what proresdec currently does),
using these functions:
static void clear_blocks_prores_c(int16_t *blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++) {
        memset(blocks + (i << 6), 0, sizeof(int16_t) * 64);
    }
}

static void ff_clear_blocks_prores_sse(int16_t *blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++)
        ff_clear_block_sse(blocks + (i << 6));
}

static void ff_clear_blocks_prores_avx(int16_t *blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++)
        ff_clear_block_avx(blocks + (i << 6));
}

blockdsp.clear_blocks_prores_c: 570.3
blockdsp.clear_blocks_prores_sse: 325.8
blockdsp.clear_blocks_prores_avx: 190.3

New version:
blockdsp.clear_blocks_prores_c: 138.3
blockdsp.clear_blocks_prores_sse: 274.6
blockdsp.clear_blocks_prores_avx: 137.6

These numbers are with the new patch, which uses this for the C version:

static void clear_blocks_prores_c(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}

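To make the difference between the two C versions easier to check, here is a small standalone sketch that compares the per-block memset loop with the single memset call. It is not part of the patch: the names clear_blocks_per_block, clear_blocks_single and variants_agree are hypothetical, and it assumes the blocks are contiguous with 64 int16_t coefficients each, as in the functions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 8 /* arbitrary buffer size for this sketch */

/* Per-block variant: one memset call for each 64-coefficient block. */
static void clear_blocks_per_block(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    for (i = 0; i < block_count; i++)
        memset(blocks + (i << 6), 0, sizeof(int16_t) * 64);
}

/* Single-call variant: one memset over the whole contiguous range. */
static void clear_blocks_single(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}

/* Hypothetical helper: returns 1 if both variants clear exactly the
 * first block_count blocks and leave the rest of the buffer untouched. */
static int variants_agree(ptrdiff_t block_count)
{
    int16_t a[MAX_BLOCKS * 64], b[MAX_BLOCKS * 64];
    ptrdiff_t i;

    assert(block_count >= 0 && block_count <= MAX_BLOCKS);

    /* Fill both buffers with a non-zero pattern (0x5555 per element). */
    memset(a, 0x55, sizeof(a));
    memset(b, 0x55, sizeof(b));

    clear_blocks_per_block(a, block_count);
    clear_blocks_single(b, block_count);

    for (i = 0; i < MAX_BLOCKS * 64; i++) {
        int16_t expected = i < block_count * 64 ? 0 : 0x5555;
        if (a[i] != expected || b[i] != expected)
            return 0;
    }
    return 1;
}
```

Calling variants_agree() with a block count between 0 and MAX_BLOCKS should return 1 in every case, so the two versions differ only in the number of memset calls, not in the result.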
Martin