[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Martin Vignali martin.vignali at gmail.com
Tue Oct 10 22:54:16 EEST 2017

>> This is still slower then the memset numbers from the first test, why
>> the high variation in there?

Maybe the results in my first email were not very clear.

For the results below, I ran the checkasm test 10 times in each case and
took the fastest run.

Original benchmark (similar to the current way in the proresdec)

using these functions:

static void clear_blocks_prores_c(int16_t * blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++) {
        memset(blocks+(i << 6), 0, sizeof(int16_t) * 64);
    }
}

static void ff_clear_blocks_prores_sse(int16_t * blocks, ptrdiff_t
    int i;
    for (i = 0; i < block_count; i++)

static void ff_clear_blocks_prores_avx(int16_t * blocks, ptrdiff_t
    int i;
    for (i = 0; i < block_count; i++)

blockdsp.clear_blocks_prores_c: 570.3
blockdsp.clear_blocks_prores_sse: 325.8
blockdsp.clear_blocks_prores_avx: 190.3

new version
blockdsp.clear_blocks_prores_c: 138.3
blockdsp.clear_blocks_prores_sse: 274.6
blockdsp.clear_blocks_prores_avx: 137.6

with the new patch

using this for the C version:
static void clear_blocks_prores_c(int16_t * blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}
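
As a rough illustration of why the two C versions are interchangeable, here is a
minimal sketch that checks the per-block memset loop and the single memset clear
exactly the same bytes. It assumes the blocks are contiguous, with 64 int16_t
coefficients each. The names clear_blocks_per_block, clear_blocks_single,
clear_variants_agree and MAX_BLOCKS are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 16 /* hypothetical upper bound, used by this sketch only */

/* Per-block variant, as in the original benchmark: one memset per block. */
static void clear_blocks_per_block(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    for (i = 0; i < block_count; i++)
        memset(blocks + i * 64, 0, sizeof(int16_t) * 64);
}

/* Single-call variant, as in the new C version: one memset for all blocks. */
static void clear_blocks_single(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}

/* Returns 1 if both variants leave byte-identical buffers, 0 otherwise.
 * The buffers are pre-filled with a non-zero pattern so that both the
 * cleared region and the untouched tail are compared. */
static int clear_variants_agree(ptrdiff_t block_count)
{
    int16_t a[MAX_BLOCKS * 64], b[MAX_BLOCKS * 64];

    assert(block_count >= 0 && block_count <= MAX_BLOCKS);
    memset(a, 0x5A, sizeof(a));
    memset(b, 0x5A, sizeof(b));
    clear_blocks_per_block(a, block_count);
    clear_blocks_single(b, block_count);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling clear_variants_agree(n) for any n from 0 to MAX_BLOCKS should return 1.
This only checks equivalence of the result; it says nothing about speed, which
is what the checkasm numbers above measure.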

