[FFmpeg-devel] [PATCH 2/3] x86: asm for sign/zero_extend()

Måns Rullgård mans
Sun Feb 21 03:59:42 CET 2010


Michael Niedermayer <michaelni at gmx.at> writes:

> On Sun, Feb 21, 2010 at 02:37:17AM +0000, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> 
>> > On Sun, Feb 21, 2010 at 12:35:58AM +0000, Mans Rullgard wrote:
>> >> ---
>> >>  libavcodec/x86/mathops.h |   18 ++++++++++++++++++
>> >>  1 files changed, 18 insertions(+), 0 deletions(-)
>> >> 
>> >> diff --git a/libavcodec/x86/mathops.h b/libavcodec/x86/mathops.h
>> >> index 010cfb7..0c17f35 100644
>> >> --- a/libavcodec/x86/mathops.h
>> >> +++ b/libavcodec/x86/mathops.h
>> >> @@ -97,4 +97,22 @@ static inline uint32_t NEG_USR32(uint32_t a, int8_t s){
>> >>      return a;
>> >>  }
>> >>  
>> >> +#define sign_extend sign_extend
>> >> +static inline int sign_extend(int val, unsigned bits)
>> >> +{
>> >> +    __asm__ ("shll %1, %0 \n\t"
>> >> +             "sarl %1, %0 \n\t"
>> >> +             : "+&r" (val) : "ic" ((uint8_t)-bits));
>> >> +    return val;
>> >> +}
>> >> +
>> >
>> >> +#define zero_extend zero_extend
>> >> +static inline unsigned zero_extend(unsigned val, unsigned bits)
>> >> +{
>> >> +    __asm__ ("shll %1, %0 \n\t"
>> >> +             "shrl %1, %0 \n\t"
>> >> +             : "+&r" (val) : "ic" ((uint8_t)-bits));
>> >> +    return val;
>> >
>> > If bits is a constant (which I guess it often is),
>> > then this is quite inefficient.
>> > val & 0x00007FFF
>> > for example is more efficient in that case.
>> > Also, it's not certain that 2 shifts are faster than =-1, >>, &
>> > on all x86 CPUs.
>> 
>> Tell that to whoever wrote the asm for the NEG_*SR32 functions.
>> Oh, wait... that was you...
>
> True, my code is not optimal. I'll look into this, but that might
> or might not be anytime soon.
> Still, please don't copy my mistakes.

My intent was entirely the opposite.
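
For reference, a minimal portable C sketch of the two approaches discussed
above. This is not part of the patch: the names sign_extend_shift,
zero_extend_shift and zero_extend_mask are made up for illustration, and the
code assumes 1 <= bits <= 32 and a compiler (such as gcc on x86) that wraps
on unsigned-to-signed conversion and uses an arithmetic right shift for
signed values.

```c
#include <stdint.h>

/* Shift-pair form: roughly what the shll/sarl pair computes.
 * Assumption: 1 <= bits <= 32, so the shift count stays in 0..31. */
static inline int sign_extend_shift(int val, unsigned bits)
{
    unsigned s = 32 - bits;
    /* Shift left as unsigned to avoid signed overflow, then shift
     * right as signed so the top bit of the field is replicated. */
    return (int32_t)((uint32_t)val << s) >> s;
}

/* Shift-pair form of zero extension (shll/shrl in the patch). */
static inline unsigned zero_extend_shift(unsigned val, unsigned bits)
{
    unsigned s = 32 - bits;
    return (val << s) >> s;
}

/* Mask form from the review: with a constant bits the compiler can
 * fold the mask, e.g. bits == 15 becomes val & 0x00007FFF. */
static inline unsigned zero_extend_mask(unsigned val, unsigned bits)
{
    return val & (0xFFFFFFFFu >> (32 - bits));
}
```

With a compile-time constant bits, zero_extend_mask(val, 15) should reduce to
a single AND, which is the case the review is concerned about; whether it
beats the shift pair for a variable bits would need measuring on the CPUs in
question.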

-- 
Måns Rullgård
mans at mansr.com


