[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount
James Almer
jamrial at gmail.com
Wed Feb 25 18:29:11 CET 2015
On 25/02/15 12:43 PM, Clément Bœsch wrote:
> On Tue, Feb 24, 2015 at 10:05:24PM -0300, James Almer wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> I decided to go the configure route since other features (cmov, clz) also do
>> it, but if preferred this could instead be done with a new intmath.h header
>> in the x86/ folder containing something like
>>
>> #if defined(__GNUC__) && defined(__POPCNT__)
>> #define av_popcount __builtin_popcount
>> #if ARCH_X86_64
>> #define av_popcount64 __builtin_popcountll
>> #endif
>> #endif
>>
>> This would give a cleaner compile-time check.
>>
>> configure | 12 ++++++++++--
>> libavutil/intmath.h | 13 +++++++++++++
>> 2 files changed, 23 insertions(+), 2 deletions(-)
>>
>
> For the record, this is what the reference and builtin implementations
> compile to here:
>
> 0000000000000000 <av_popcount_c>:
> 0: 89 f8 mov %edi,%eax
> 2: d1 e8 shr %eax
> 4: 25 55 55 55 55 and $0x55555555,%eax
> 9: 29 c7 sub %eax,%edi
> b: 89 fa mov %edi,%edx
> d: c1 ef 02 shr $0x2,%edi
> 10: 81 e2 33 33 33 33 and $0x33333333,%edx
> 16: 81 e7 33 33 33 33 and $0x33333333,%edi
> 1c: 8d 04 17 lea (%rdi,%rdx,1),%eax
> 1f: 89 c2 mov %eax,%edx
> 21: c1 ea 04 shr $0x4,%edx
> 24: 01 d0 add %edx,%eax
> 26: 25 0f 0f 0f 0f and $0xf0f0f0f,%eax
> 2b: 89 c2 mov %eax,%edx
> 2d: c1 ea 08 shr $0x8,%edx
> 30: 01 d0 add %edx,%eax
> 32: 89 c2 mov %eax,%edx
> 34: c1 ea 10 shr $0x10,%edx
> 37: 01 d0 add %edx,%eax
> 39: 83 e0 3f and $0x3f,%eax
> 3c: c3 retq
> 3d: 0f 1f 00 nopl (%rax)
>
> 0000000000000040 <popcount_gcc>:
> 40: 48 83 ec 08 sub $0x8,%rsp
> 44: 89 ff mov %edi,%edi
> 46: e8 00 00 00 00 callq 4b <popcount_gcc+0xb>
> 4b: 48 83 c4 08 add $0x8,%rsp
> 4f: c3 retq
>
> 0000000000000040 <popcount_clang>:
> 40: 89 f8 mov %edi,%eax
> 42: d1 e8 shr %eax
> 44: 25 55 55 55 55 and $0x55555555,%eax
> 49: 29 c7 sub %eax,%edi
> 4b: 89 f8 mov %edi,%eax
> 4d: 25 33 33 33 33 and $0x33333333,%eax
> 52: c1 ef 02 shr $0x2,%edi
> 55: 81 e7 33 33 33 33 and $0x33333333,%edi
> 5b: 01 c7 add %eax,%edi
> 5d: 89 f8 mov %edi,%eax
> 5f: c1 e8 04 shr $0x4,%eax
> 62: 01 f8 add %edi,%eax
> 64: 25 0f 0f 0f 0f and $0xf0f0f0f,%eax
> 69: 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax
> 6f: c1 e8 18 shr $0x18,%eax
> 72: c3 retq
>
> The clang output may suggest relevant "optimizations" for our reference code.
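
Reading the listings above back into C, av_popcount_c and the clang builtin
appear to share the first three steps and differ only in the final reduction:
the reference folds the byte sums with two shift/add pairs, while clang uses
a single multiply and shift. A rough sketch of both follows. The function
names are made up for illustration and are not FFmpeg API; this is a
transcription of the disassembly, not the actual source of either.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: shift-and-add reduction, as in the av_popcount_c listing. */
static unsigned popcount_shift_add(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;                      /* 2-bit partial sums */
    x  = (x & 0x33333333) + ((x >> 2) & 0x33333333); /* 4-bit partial sums */
    x  = (x + (x >> 4)) & 0x0F0F0F0F;                /* one sum per byte   */
    x += x >> 8;                                     /* fold bytes ...     */
    x += x >> 16;                                    /* ... into low byte  */
    return x & 0x3F;
}

/* Hypothetical name: multiply reduction, as in the popcount_clang listing. */
static unsigned popcount_multiply(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;
    x  = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x  = (x + (x >> 4)) & 0x0F0F0F0F;
    /* Multiplying by 0x01010101 accumulates all four byte sums in the top byte. */
    return (x * 0x01010101u) >> 24;
}

/* Small self-check that the two variants agree on a few inputs. */
static void popcount32_selfcheck(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xF0F0F0F0u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(popcount_shift_add(samples[i]) == popcount_multiply(samples[i]));
}
```

Whether the multiply variant is actually faster depends on the target CPU, so
it would need benchmarking before touching the reference code.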
What does clang generate for av_popcount64_c, or for its 64-bit builtin?
We currently call av_popcount_c twice from within av_popcount64_c, whereas
on x86_64 CPUs we could probably take advantage of the 64-bit GPRs.
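
To illustrate the idea, here is a minimal sketch contrasting the split
approach (two 32-bit counts added together) with a single pass over a 64-bit
register using the same bit-twiddling widened to 64-bit masks. All names are
hypothetical and this is not the code from the patch or from libavutil; any
speed difference is an assumption that would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit helper standing in for the reference implementation. */
static unsigned popcount32_sketch(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;
    x  = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x  = (x + (x >> 4)) & 0x0F0F0F0F;
    return (x * 0x01010101u) >> 24;
}

/* Split approach: count each 32-bit half separately and add the results. */
static unsigned popcount64_split(uint64_t x)
{
    return popcount32_sketch((uint32_t)x) + popcount32_sketch((uint32_t)(x >> 32));
}

/* Single pass over the full 64-bit value, suited to 64-bit registers. */
static unsigned popcount64_wide(uint64_t x)
{
    x -= (x >> 1) & 0x5555555555555555ULL;
    x  = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x  = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* The top byte of the product holds the sum of all eight byte counts. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Small self-check that both variants agree on a few inputs. */
static void popcount64_selfcheck(void)
{
    static const uint64_t samples[] = {
        0ULL, 1ULL, 0x8000000000000001ULL, 0x00FF00FF00FF00FFULL, ~0ULL
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(popcount64_split(samples[i]) == popcount64_wide(samples[i]));
}
```

The wide version does the work of one 32-bit count plus wider constants,
instead of two full counts and an add, which is where the potential gain on
x86_64 would come from.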
>
> [...]