[FFmpeg-devel] [PATCH] Fix ff_imdct_calc_sse() on gcc-4.6.

Måns Rullgård mans
Mon Jan 31 00:00:22 CET 2011


Alex Converse <alex.converse at gmail.com> writes:

> On Sun, Jan 30, 2011 at 2:13 AM, Reimar D?ffinger
> <Reimar.Doeffinger at gmx.de> wrote:
>> On Sun, Jan 30, 2011 at 01:22:05AM -0800, Alex Converse wrote:
>>> On Sun, Jan 30, 2011 at 1:08 AM, Alex Converse <alex.converse at gmail.com> wrote:
>>> >
>>> > Gcc 4.6 only preserves the first value when using a vector with an "m"
>>> > constraint.
>>> > ---
>>> > ?libavcodec/x86/fft_sse.c | ? ?4 ++--
>>> > ?1 files changed, 2 insertions(+), 2 deletions(-)
>>> >
>>> >
>>>
>>> oops this generates an extra indirection. Those of you who like to
>>> defend inline asm, please step up and make some suggestions.
>>
>> Use what would be your _only_ option (conceptually, not in implementation
>> of course) if you didn't use inline asm:
>> MANGLE and change DECLARE_ALIGNED to DECLARE_ASM_CONST
>> There's also the option of a gcc bug report, I have some doubts that is
>> a valid optimization (though there are constraints to make gcc load
>> directly into a xmm register, but both that and the current code have
>> needlessly unpredictable performance).

The optimisation is valid, the constraint is wrong.

> With MANGLE this time.
>
> From 51a872ff818bbb0071ab1649d104afcea48546c9 Mon Sep 17 00:00:00 2001
> From: Alex Converse <alex.converse at gmail.com>
> Date: Sun, 30 Jan 2011 01:04:41 -0800
> Subject: [PATCH 1/2] Fix ff_imdct_calc_sse() on gcc-4.6.
> MIME-Version: 1.0
> Content-Type: multipart/mixed; boundary="------------1"
>
> This is a multi-part message in MIME format.
> --------------1
> Content-Type: text/plain; charset=UTF-8; format=fixed
> Content-Transfer-Encoding: 8bit

Please try to avoid sending attached patches in this format.

> Gcc 4.6 only preserves the first value when using a vector with an "m"
> constraint.
> ---
>  libavcodec/x86/fft_sse.c |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
>
>
> --------------1
> Content-Type: text/x-patch; name="0001-Fix-ff_imdct_calc_sse-on-gcc-4.6.patch"
> Content-Transfer-Encoding: 8bit
> Content-Disposition: attachment; filename="0001-Fix-ff_imdct_calc_sse-on-gcc-4.6.patch"
>
> diff --git a/libavcodec/x86/fft_sse.c b/libavcodec/x86/fft_sse.c
> index 9f02816..db591d4 100644
> --- a/libavcodec/x86/fft_sse.c
> +++ b/libavcodec/x86/fft_sse.c
> @@ -23,7 +23,7 @@
>  #include "libavcodec/dsputil.h"
>  #include "fft.h"
>  
> -DECLARE_ALIGNED(16, static const int, m1m1m1m1)[4] =
> +DECLARE_ASM_CONST(16, int, ff_m1m1m1m1)[4] =
>      { 1 << 31, 1 << 31, 1 << 31, 1 << 31 };
>  
>  void ff_fft_dispatch_sse(FFTComplex *z, int nbits);
> @@ -82,7 +82,7 @@ void ff_imdct_calc_sse(FFTContext *s, FFTSample *output, const FFTSample *input)
>      j = -n;
>      k = n-16;
>      __asm__ volatile(
> -        "movaps %4, %%xmm7 \n"
> +        "movaps "LOCAL_MANGLE(ff_m1m1m1m1)", %%xmm7 \n"
>          "1: \n"
>          "movaps       (%2,%1), %%xmm0 \n"
>          "movaps       (%3,%0), %%xmm1 \n"
> @@ -95,8 +95,7 @@ void ff_imdct_calc_sse(FFTContext *s, FFTSample *output, const FFTSample *input)
>          "add $16, %0 \n"
>          "jl 1b \n"
>          :"+r"(j), "+r"(k)
> -        :"r"(output+n4), "r"(output+n4*3),
> -         "m"(*m1m1m1m1)
> +        :"r"(output+n4), "r"(output+n4*3)
>          XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm7")
>      );
>  }

Looks OK to me.

-- 
M?ns Rullg?rd
mans at mansr.com



More information about the ffmpeg-devel mailing list