[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Martin Vignali martin.vignali at gmail.com
Thu Oct 5 19:47:46 EEST 2017


2017-10-05 18:04 GMT+02:00 Hendrik Leppkes <h.leppkes at gmail.com>:

> On Thu, Oct 5, 2017 at 4:58 PM, Martin Vignali <martin.vignali at gmail.com>
> wrote:
> > Hello,
> >
> > Attached are patches that add a dedicated function for clearing the blocks
> > in ProRes decoding (proresdec2).
> >
> > Currently, the slice decode function uses a loop that calls the
> > blockdsp.clear_block function for each block.
> >
> > After some tests, this seems to be slower than memset (for me).
> > I checked this using these "fake" functions in blockdsp:
> > /* Each block holds 64 int16_t coefficients, hence the i << 6 offset. */
> > static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count) {
> >     int i;
> >     for (i = 0; i < block_count; i++)
> >         ff_clear_block_sse(blocks + (i << 6));
> > }
> >
> > static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count) {
> >     int i;
> >     for (i = 0; i < block_count; i++)
> >         ff_clear_block_avx(blocks + (i << 6));
> > }
> >
> > The checkasm results (the attached patch is needed to reproduce the test),
> > using the loop (lower is better):
> > blockdsp.clear_blocks_prores_c: 137.8
> > blockdsp.clear_blocks_prores_sse: 292.0
> > blockdsp.clear_blocks_prores_avx: 230.5
> >
> >
> > Using the new asm function, the results are (Kaby Lake, OS 10.12, Clang 8.1):
> > blockdsp.clear_blocks_prores_c: 153.4
> > blockdsp.clear_blocks_prores_sse: 284.4
> > blockdsp.clear_blocks_prores_avx: 142.2
> >
> >
>
> This is still slower than the memset numbers from the first test. Why
> the high variation there?
>
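To make the comparison in the quoted test concrete, here is a minimal portable C sketch (not code from the patch) of the two strategies: one clear call per 64-coefficient block versus a single memset over the whole contiguous buffer. The names clear_block_ref, clear_blocks_loop and clear_blocks_memset are hypothetical, and it assumes, as the i << 6 stride above implies, that the blocks are stored back to back with 64 int16_t coefficients each.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each block holds 64 int16_t coefficients (the i << 6 stride). */
#define BLOCK_COEFFS 64

/* Hypothetical stand-in for a per-block clear such as blockdsp.clear_block. */
static void clear_block_ref(int16_t *block)
{
    memset(block, 0, BLOCK_COEFFS * sizeof(*block));
}

/* Pattern used by the slice decoder today: one call per block. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    for (ptrdiff_t i = 0; i < block_count; i++)
        clear_block_ref(blocks + (i << 6));
}

/* Dedicated variant: a single memset over all blocks at once,
 * which avoids the per-call overhead of the loop above. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, (size_t)block_count * BLOCK_COEFFS * sizeof(*blocks));
}
```

Both functions leave the same memory zeroed; the only difference is how many calls it takes, which is what the loop-based checkasm numbers are measuring.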
Hello,

I don't know why. I often see this kind of variation between runs with
checkasm (and with -benchmark, as in the other thread about EXR SIMD, where
decoding the image sequence also showed variation between runs).
In both tests, the C function is the same.
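One generic way to damp this kind of run-to-run noise when comparing versions by hand is to repeat the measurement several times and compare medians instead of single runs. The sketch below is a hypothetical helper (bench_median, median and now_ns are illustrative names), not how checkasm measures internally; it assumes a POSIX system providing clock_gettime with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n >= 1 samples; sorts the array in place. */
static double median(double *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

typedef void (*clear_fn)(int16_t *blocks, ptrdiff_t block_count);

/* Run `iters` calls per sample, take `runs` samples (clamped to 1..64),
 * and return the median time per call in nanoseconds. */
static double bench_median(clear_fn fn, int16_t *blocks, ptrdiff_t block_count,
                           int runs, int iters)
{
    double samples[64];
    if (runs < 1)
        runs = 1;
    if (runs > 64)
        runs = 64;
    for (int r = 0; r < runs; r++) {
        double t0 = now_ns();
        for (int i = 0; i < iters; i++)
            fn(blocks, block_count);
        samples[r] = (now_ns() - t0) / iters;
    }
    return median(samples, (size_t)runs);
}
```

Comparing medians rather than single runs makes a single slow run (for example, one interrupted by another process) much less likely to flip the ranking between the C, SSE and AVX versions.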

But if I run the same test several times with the new asm, the AVX version
is always faster than C (which already seems to be optimized on my
computer). Also, following James Almer's answer in the discussion
"libavcodec/blockdsp : add AVX version", having AVX/SSE versions of this
simple function can be useful on some computers/compilers.
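For illustration, a dedicated SIMD clear can walk the whole buffer with wide stores in one call instead of calling a per-block function. The C sketch below uses SSE2 intrinsics rather than the patch's actual asm (which is not shown here). The name clear_blocks_sse2_sketch is hypothetical, and it assumes a 16-byte aligned buffer of block_count * 64 int16_t coefficients. It falls back to memset when SSE2 is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: zero block_count blocks of 64 int16_t coefficients.
 * Assumes `blocks` is 16-byte aligned (required by _mm_store_si128). */
static void clear_blocks_sse2_sketch(int16_t *blocks, ptrdiff_t block_count)
{
    size_t bytes = (size_t)block_count * 64 * sizeof(*blocks); /* 128 bytes per block */
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    char *p = (char *)blocks;
    /* One aligned 16-byte store per iteration; bytes is always a multiple of 16. */
    for (size_t off = 0; off < bytes; off += 16)
        _mm_store_si128((__m128i *)(p + off), zero);
#else
    memset(blocks, 0, bytes);
#endif
}
```

An AVX version would follow the same shape with 32-byte stores. Whether either beats a compiler-optimized memset depends on the CPU and compiler, which matches the mixed numbers above.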

Martin

