[FFmpeg-devel] libavcodec/blockdsp : add AVX version

Martin Vignali martin.vignali at gmail.com
Tue Oct 3 22:47:08 EEST 2017


Hello,


> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
> not the rest.
> Your compiler seems to have done a much better job than mine. Is it
> Clang? Does it somehow have vectorization enabled perhaps? Because
> that's not supposed to happen.
>
>
Yes, it's Clang 8.1.

I put the clear_blocks_c function in a file and ran
clang -S -O1 test_asm_gen.c

The asm result is:
    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl    _clear_blocks_c
    .p2align    4, 0x90
_clear_blocks_c:                        ## @clear_blocks_c
    .cfi_startproc
## BB#0:
    pushq    %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    movl    $768, %esi              ## imm = 0x300
    callq    ___bzero
    popq    %rbp
    retq
    .cfi_endproc


.subsections_via_symbols

It seems that an optimized function (___bzero) is called for clear_blocks_c.
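
For reference, a minimal sketch of what test_asm_gen.c may have looked like. The function body is an assumption based on the 768-byte size in the asm above (6 blocks of 64 int16_t coefficients); the self-check helper is hypothetical and only there to exercise the function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape of clear_blocks_c: zero six 8x8 blocks of int16_t.
 * 6 * 64 * sizeof(int16_t) = 768 bytes, matching "movl $768, %esi". */
void clear_blocks_c(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * 6 * 64);
}

/* Hypothetical helper (not part of FFmpeg): fill a buffer with
 * non-zero data, clear it, and verify every coefficient is zero.
 * Returns 1 on success. */
int clear_blocks_selfcheck(void)
{
    int16_t blocks[6 * 64];
    for (int i = 0; i < 6 * 64; i++)
        blocks[i] = (int16_t)(i + 1);

    clear_blocks_c(blocks);

    for (int i = 0; i < 6 * 64; i++)
        assert(blocks[i] == 0);
    return 1;
}
```

With a constant size like this, the compiler is free to turn the memset into a call to the platform's tuned zeroing routine, which would explain why the "C" version is hard to beat here.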

>
> > I also modify several decoder/encoder, in order to fix the
> DECLARE_ALIGNED
> > from 16 to 32
> >
> > I run make fate SAMPLES=fate-suite/
> > i have several errors, but after a check, these errors
> > doesn't seems to be related to this patch
>
> Make sure to clean your build folder if you recently pulled new commits
> from the git repository. Reconfigure if necessary.
>
>
OK, I reran it, and the FATE tests pass.


2017-10-02 4:05 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:

> Hi,
>
> On Sun, Oct 1, 2017 at 7:46 PM, Martin Vignali <martin.vignali at gmail.com>
> wrote:
>
> > I also modify several decoder/encoder, in order to fix the
> DECLARE_ALIGNED
> > from 16 to 32
> >
>
> How did you decide which ones to change?
>
> Ronald
>

After running the FATE tests, it looks like tests fail only when
LOCAL_ALIGNED_16 or DECLARE_ALIGNED(16, ...) is used to declare the block
variable, and not in other cases.

Using git grep clear_block, I checked all the files that use this function
and changed LOCAL_ALIGNED_16 to LOCAL_ALIGNED_32,
or DECLARE_ALIGNED(16, ...) to DECLARE_ALIGNED(32, ...).
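
To illustrate why the alignment matters, here is a small sketch in plain C11. It assumes the AVX version uses aligned 32-byte stores, which fault on a buffer that is only 16-byte aligned. BlockBuf, is_aligned and clear_blocks_checked are hypothetical names for illustration, and alignas(32) stands in for DECLARE_ALIGNED(32, ...).

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a codec context holding the block buffer.
 * alignas(32) plays the role of DECLARE_ALIGNED(32, ...). */
typedef struct BlockBuf {
    alignas(32) int16_t block[6][64];
} BlockBuf;

/* Returns 1 if p is aligned to `align` bytes, 0 otherwise. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Plain C clear with the precondition an aligned AVX store would need:
 * the buffer must start on a 32-byte boundary. */
void clear_blocks_checked(int16_t *blocks)
{
    assert(is_aligned(blocks, 32));
    memset(blocks, 0, sizeof(int16_t) * 6 * 64);
}
```

A buffer declared with 16-byte alignment can still land on a 32-byte boundary by chance, so a failure of this kind may only show up in some callers.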

Martin

