[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount
Clément Bœsch
u at pkh.me
Wed Feb 25 16:43:51 CET 2015
On Tue, Feb 24, 2015 at 10:05:24PM -0300, James Almer wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> I decided to go the configure route since other features (cmov, clz) also do
> it, but if preferred this could instead be done with a new intmath.h header
> in the x86/ folder containing something like
>
> #if defined(__GNUC__) && defined(__POPCNT__)
> #define av_popcount __builtin_popcount
> #if ARCH_X86_64
> #define av_popcount64 __builtin_popcountll
> #endif
> #endif
>
> For a cleaner compile time check.
>
>  configure           | 12 ++++++++++--
>  libavutil/intmath.h | 13 +++++++++++++
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
For the record, here is what our C reference and the builtin (with gcc and
with clang) compile to here:

0000000000000000 <av_popcount_c>:
   0:  89 f8              mov    %edi,%eax
   2:  d1 e8              shr    %eax
   4:  25 55 55 55 55     and    $0x55555555,%eax
   9:  29 c7              sub    %eax,%edi
   b:  89 fa              mov    %edi,%edx
   d:  c1 ef 02           shr    $0x2,%edi
  10:  81 e2 33 33 33 33  and    $0x33333333,%edx
  16:  81 e7 33 33 33 33  and    $0x33333333,%edi
  1c:  8d 04 17           lea    (%rdi,%rdx,1),%eax
  1f:  89 c2              mov    %eax,%edx
  21:  c1 ea 04           shr    $0x4,%edx
  24:  01 d0              add    %edx,%eax
  26:  25 0f 0f 0f 0f     and    $0xf0f0f0f,%eax
  2b:  89 c2              mov    %eax,%edx
  2d:  c1 ea 08           shr    $0x8,%edx
  30:  01 d0              add    %edx,%eax
  32:  89 c2              mov    %eax,%edx
  34:  c1 ea 10           shr    $0x10,%edx
  37:  01 d0              add    %edx,%eax
  39:  83 e0 3f           and    $0x3f,%eax
  3c:  c3                 retq
  3d:  0f 1f 00           nopl   (%rax)

0000000000000040 <popcount_gcc>:
  40:  48 83 ec 08        sub    $0x8,%rsp
  44:  89 ff              mov    %edi,%edi
  46:  e8 00 00 00 00     callq  4b <popcount_gcc+0xb>
  4b:  48 83 c4 08        add    $0x8,%rsp
  4f:  c3                 retq

0000000000000040 <popcount_clang>:
  40:  89 f8              mov    %edi,%eax
  42:  d1 e8              shr    %eax
  44:  25 55 55 55 55     and    $0x55555555,%eax
  49:  29 c7              sub    %eax,%edi
  4b:  89 f8              mov    %edi,%eax
  4d:  25 33 33 33 33     and    $0x33333333,%eax
  52:  c1 ef 02           shr    $0x2,%edi
  55:  81 e7 33 33 33 33  and    $0x33333333,%edi
  5b:  01 c7              add    %eax,%edi
  5d:  89 f8              mov    %edi,%eax
  5f:  c1 e8 04           shr    $0x4,%eax
  62:  01 f8              add    %edi,%eax
  64:  25 0f 0f 0f 0f     and    $0xf0f0f0f,%eax
  69:  69 c0 01 01 01 01  imul   $0x1010101,%eax,%eax
  6f:  c1 e8 18           shr    $0x18,%eax
  72:  c3                 retq

The clang output might hint at relevant "optimizations" for our reference code.
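
To make the comparison easier to follow, here is a rough C reading of the two
inline listings. This is only a sketch reconstructed from the disassembly
above, not code taken from the patch or from the tree: the names
popcount_shift_add() and popcount_mul() are made up for illustration. The
first mirrors the shift/add tail seen in av_popcount_c, the second replaces
that tail with the multiply by 0x01010101 and shift by 24 that clang emits.

```c
#include <stdint.h>

/* Hypothetical name: mirrors the av_popcount_c listing (shift/add tail). */
static inline int popcount_shift_add(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                      /* 2-bit partial counts */
    x  = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial counts */
    x  = (x + (x >> 4)) & 0x0F0F0F0Fu;                /* one count per byte   */
    x += x >> 8;                                      /* fold bytes together  */
    return (x + (x >> 16)) & 0x3F;                    /* total fits in 6 bits */
}

/* Hypothetical name: mirrors the popcount_clang listing (imul tail). */
static inline int popcount_mul(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;
    x  = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x  = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* Each byte holds at most 8, so the multiply sums all four bytes into
     * the top byte without any carry between bytes. */
    return (int)((x * 0x01010101u) >> 24);
}
```

Both should return the same value for any 32-bit input; whether the multiply
form is actually faster would need to be measured on the targets we care
about.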
[...]
--
Clément B.