[FFmpeg-devel] [PATCH 2/2] libavutil: add bmi2 optimized av_zhb

Michael Niedermayer michaelni at gmx.at
Tue Mar 17 12:53:22 CET 2015


On Tue, Mar 17, 2015 at 01:08:06AM -0300, James Almer wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> GCC apparently can't generate a bzhi instruction on its own from the C version, so
> here's a custom implementation.
> 
> Before:
> 
> gcc -O3
> <av_zhb_c>:
>    0:   89 f1                   mov    ecx,esi
>    2:   ba 01 00 00 00          mov    edx,0x1
>    7:   d3 e2                   shl    edx,cl
>    9:   83 ea 01                sub    edx,0x1
>    c:   89 d0                   mov    eax,edx
>    e:   21 f8                   and    eax,edi
>   10:   c3                      ret
> 
> gcc -mbmi2 -O3
> <av_zhb_c>:
>    0:   ba 01 00 00 00          mov    edx,0x1
>    5:   c4 e2 49 f7 d2          shlx   edx,edx,esi
>    a:   8d 42 ff                lea    eax,[rdx-0x1]
>    d:   21 f8                   and    eax,edi
>    f:   c3                      ret
> 
> After:
> 
> gcc -mbmi2 -O3
> <av_zhb_bmi2>:
>    0:   c4 e2 48 f5 c7          bzhi   eax,edi,esi
>    5:   c3                      ret
> 
> The non-bmi2 example is a bit bloated with movs to get values into ecx (needed
> for shl) and eax (return value) since, unlike the actual function, it was not
> inlined. Still, when p is not a constant, the best case for the C version is
> mov + shl + sub/dec/lea + and, versus a single bzhi.

Orthogonal to this patch, you or someone else might want to submit a patch
to GCC so that it generates this optimization automatically.
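
For readers following the quoted disassembly, here is a minimal sketch of the
two forms being compared. It is not the code from the patch, which is not shown
in this message: the names zhb_c, zhb_bmi2 and zhb_self_check are hypothetical,
and the BMI2 path uses the _bzhi_u32 intrinsic from <immintrin.h> rather than
whatever custom implementation the patch contains. It assumes p is in the range
0..31, where the mask-and-and form and bzhi give the same result.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __BMI2__
#include <immintrin.h>   /* _bzhi_u32, only usable when built with -mbmi2 */
#endif

/* Portable form: keep the low p bits of a and clear the rest.
 * Only valid for p in 0..31; shifting a 32-bit value by 32 is undefined in C. */
static inline uint32_t zhb_c(uint32_t a, unsigned p)
{
    return a & ((1U << p) - 1);
}

/* BMI2 form (hypothetical helper, assumed for illustration).
 * With -mbmi2 this should compile to a single bzhi; otherwise it
 * falls back to the portable form so the file still builds and runs. */
static inline uint32_t zhb_bmi2(uint32_t a, unsigned p)
{
#ifdef __BMI2__
    return _bzhi_u32(a, p);
#else
    return zhb_c(a, p);
#endif
}

/* Checks that both forms agree for every p in 0..31 on one sample value. */
static void zhb_self_check(void)
{
    const uint32_t sample = 0xDEADBEEFu;
    for (unsigned p = 0; p < 32; p++)
        assert(zhb_c(sample, p) == zhb_bmi2(sample, p));
}
```

Building this with and without -mbmi2 and inspecting the output of objdump -d
is one way to reproduce the kind of comparison shown in the quoted message.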

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides