[FFmpeg-devel] [PATCH] x86/intmath: add sse optimized av_clipf and av_clipd

James Almer jamrial at gmail.com
Thu Jan 7 04:25:01 CET 2016


On 1/7/2016 12:19 AM, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Jan 6, 2016 at 8:09 PM, James Almer <jamrial at gmail.com> wrote:
> 
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavutil/x86/intmath.h | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
>> index 611ef88..e1cd596 100644
>> --- a/libavutil/x86/intmath.h
>> +++ b/libavutil/x86/intmath.h
>> @@ -98,6 +98,38 @@ static av_always_inline av_const unsigned av_mod_uintp2_bmi2(unsigned a, unsigne
>>
>>  #endif /* __BMI2__ */
>>
>> +#if defined(__SSE2__)
>> +
>> +#define av_clipd av_clipd_sse2
>> +static av_always_inline av_const double av_clipd_sse2(double a, double amin, double amax)
>> +{
>> +#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
>> +    if (amin > amax) abort();
>> +#endif
>> +    __asm__ ("minsd %2, %0 \n\t"
>> +             "maxsd %1, %0 \n\t"
>> +             : "+x"(a) : "xm"(amin), "xm"(amax));
>> +    return a;
>> +}
>> +
>> +#endif /* __SSE2__ */
> 
> 
> This __SSE2__ is kind of strange, and we don't use it anywhere else. I
> understand it's not the same thing, but for practical purposes, could we
> just use #if ARCH_X86_64 and not care about -msse2?
> 
> Ronald

We already use __SSE2__ in x86/intreadwrite.h, for AV_ZERO128.
And no, I'd rather have this working on x86_32 when -msse2 is used, since
the SSE2 version is much more efficient there.
Compare:

00000000 <_av_clipf_sse>:
   0:   83 ec 0c                sub    esp,0xc
   3:   f2 0f 10 44 24 10       movsd  xmm0,QWORD PTR [esp+0x10]
   9:   f2 0f 5d 44 24 20       minsd  xmm0,QWORD PTR [esp+0x20]
   f:   f2 0f 5f 44 24 18       maxsd  xmm0,QWORD PTR [esp+0x18]
  15:   f2 0f 11 04 24          movsd  QWORD PTR [esp],xmm0
  1a:   dd 04 24                fld    QWORD PTR [esp]
  1d:   83 c4 0c                add    esp,0xc
  20:   c3                      ret

with:

00000030 <_av_clipf_c>:
  30:   dd 44 24 04             fld    QWORD PTR [esp+0x4]
  34:   dd 44 24 14             fld    QWORD PTR [esp+0x14]
  38:   dd 44 24 0c             fld    QWORD PTR [esp+0xc]
  3c:   db ea                   fucomi st,st(2)
  3e:   77 10                   ja     50 <_clipf_c+0x20>
  40:   dd d8                   fstp   st(0)
  42:   d9 c9                   fxch   st(1)
  44:   db e9                   fucomi st,st(1)
  46:   db d1                   fcmovnbe st,st(1)
  48:   dd d9                   fstp   st(1)
  4a:   eb 08                   jmp    54 <_clipf_c+0x24>
  4c:   8d 74 26 00             lea    esi,[esi+eiz*1+0x0]
  50:   dd d9                   fstp   st(1)
  52:   dd d9                   fstp   st(1)
  54:   f3 c3                   repz ret
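
For anyone who wants to try the comparison without inline asm, here is a
rough sketch of the same min/max sequence written with SSE2 intrinsics. It
is not part of the patch: the name clipd_sketch is made up for illustration,
the plain C fallback is only an assumed stand-in for the generic version,
and NaN inputs are not considered.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not a libavutil function. */
static inline double clipd_sketch(double a, double amin, double amax)
{
    /* Same sanity check the patch does under ASSERT_LEVEL >= 2. */
    assert(amin <= amax);
#if defined(__SSE2__)
    __m128d v = _mm_set_sd(a);
    v = _mm_min_sd(v, _mm_set_sd(amax)); /* minsd: a = min(a, amax) */
    v = _mm_max_sd(v, _mm_set_sd(amin)); /* maxsd: a = max(a, amin) */
    return _mm_cvtsd_f64(v);
#else
    /* Assumed generic fallback when the compiler does not target SSE2. */
    if (a < amin)
        return amin;
    if (a > amax)
        return amax;
    return a;
#endif
}
```

Built with -msse2 on x86_32, the intrinsics path should compile down to
something close to the first listing, e.g. clipd_sketch(5.0, 0.0, 1.0)
gives 1.0.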
