[FFmpeg-devel] [PATCH] Fix ff_imdct_calc_sse() on gcc-4.6.
Måns Rullgård
mans
Mon Jan 31 00:00:22 CET 2011
Alex Converse <alex.converse at gmail.com> writes:
> On Sun, Jan 30, 2011 at 2:13 AM, Reimar D?ffinger
> <Reimar.Doeffinger at gmx.de> wrote:
>> On Sun, Jan 30, 2011 at 01:22:05AM -0800, Alex Converse wrote:
>>> On Sun, Jan 30, 2011 at 1:08 AM, Alex Converse <alex.converse at gmail.com> wrote:
>>> >
>>> > Gcc 4.6 only preserves the first value when using a vector with an "m"
>>> > constraint.
>>> > ---
>>> > ?libavcodec/x86/fft_sse.c | ? ?4 ++--
>>> > ?1 files changed, 2 insertions(+), 2 deletions(-)
>>> >
>>> >
>>>
>>> oops this generates an extra indirection. Those of you who like to
>>> defend inline asm, please step up and make some suggestions.
>>
>> Use what would be your _only_ option (conceptually, not in implementation
>> of course) if you didn't use inline asm:
>> MANGLE and change DECLARE_ALIGNED to DECLARE_ASM_CONST
>> There's also the option of a gcc bug report, I have some doubts that is
>> a valid optimization (though there are constraints to make gcc load
>> directly into a xmm register, but both that and the current code have
>> needlessly unpredictable performance).
The optimisation is valid, the constraint is wrong.
> With MANGLE this time.
>
> From 51a872ff818bbb0071ab1649d104afcea48546c9 Mon Sep 17 00:00:00 2001
> From: Alex Converse <alex.converse at gmail.com>
> Date: Sun, 30 Jan 2011 01:04:41 -0800
> Subject: [PATCH 1/2] Fix ff_imdct_calc_sse() on gcc-4.6.
> MIME-Version: 1.0
> Content-Type: multipart/mixed; boundary="------------1"
>
> This is a multi-part message in MIME format.
> --------------1
> Content-Type: text/plain; charset=UTF-8; format=fixed
> Content-Transfer-Encoding: 8bit
Please try to avoid sending attached patches in this format.
> Gcc 4.6 only preserves the first value when using a vector with an "m"
> constraint.
> ---
> libavcodec/x86/fft_sse.c | 7 +++----
> 1 files changed, 3 insertions(+), 4 deletions(-)
>
>
> --------------1
> Content-Type: text/x-patch; name="0001-Fix-ff_imdct_calc_sse-on-gcc-4.6.patch"
> Content-Transfer-Encoding: 8bit
> Content-Disposition: attachment; filename="0001-Fix-ff_imdct_calc_sse-on-gcc-4.6.patch"
>
> diff --git a/libavcodec/x86/fft_sse.c b/libavcodec/x86/fft_sse.c
> index 9f02816..db591d4 100644
> --- a/libavcodec/x86/fft_sse.c
> +++ b/libavcodec/x86/fft_sse.c
> @@ -23,7 +23,7 @@
> #include "libavcodec/dsputil.h"
> #include "fft.h"
>
> -DECLARE_ALIGNED(16, static const int, m1m1m1m1)[4] =
> +DECLARE_ASM_CONST(16, int, ff_m1m1m1m1)[4] =
> { 1 << 31, 1 << 31, 1 << 31, 1 << 31 };
>
> void ff_fft_dispatch_sse(FFTComplex *z, int nbits);
> @@ -82,7 +82,7 @@ void ff_imdct_calc_sse(FFTContext *s, FFTSample *output, const FFTSample *input)
> j = -n;
> k = n-16;
> __asm__ volatile(
> - "movaps %4, %%xmm7 \n"
> + "movaps "LOCAL_MANGLE(ff_m1m1m1m1)", %%xmm7 \n"
> "1: \n"
> "movaps (%2,%1), %%xmm0 \n"
> "movaps (%3,%0), %%xmm1 \n"
> @@ -95,8 +95,7 @@ void ff_imdct_calc_sse(FFTContext *s, FFTSample *output, const FFTSample *input)
> "add $16, %0 \n"
> "jl 1b \n"
> :"+r"(j), "+r"(k)
> - :"r"(output+n4), "r"(output+n4*3),
> - "m"(*m1m1m1m1)
> + :"r"(output+n4), "r"(output+n4*3)
> XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm7")
> );
> }
Looks OK to me.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list