[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Michael Niedermayer michaelni at gmx.at
Sun Jan 5 06:36:56 CET 2014


On Sun, Jan 05, 2014 at 01:27:26PM +1100, Matt Oliver wrote:
> On 30 December 2013 13:06, Matt Oliver <protogonoi at gmail.com> wrote:
> 
> > On 30 December 2013 02:26, Michael Niedermayer <michaelni at gmx.at> wrote:
> >
> >> On Sun, Dec 29, 2013 at 02:19:40PM +0100, Reimar Döffinger wrote:
> >> > On Sun, Dec 29, 2013 at 02:48:52PM +1100, Matt Oliver wrote:
> >> > > is to much to ask. The only other option I can think of is to have
> >> > > different versions. i.e. for those few lines with direct symbol
> >> references
> >> > > have an #ifdef that only changes the code for Intel compiler on
> >> Windows.
> >> > > Windows shared libraries dont need PIC so this shouldn't be a problem.
> >> > > Although the code may become a bit ugly.
> >> >
> >> > One thing I thought of: You could try using named asm constraints, and
> >> > only change the MANGLE to convert to a reference to a named asm
> >> > argument.
> >> > If you are lucky in quite a few cases you'd then only need an ifdef
> >> > around the extra asm constraints.
> >>
> >> that should be possible without ifdef, see XMM_CLOBBERS_ONLY() for
> >> something similar
> >>
> >> [...]
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>
> >> You can kill me, but you cannot change the truth.
> >>
> >
> > Using named asm constraints is a good idea as it seems that Intel compiler
> > supports those. So I will update the patch so that MANGLE remains in use
> > but it will just change to a named asm constraint on Intel+Windows. Combine
> > that with a macro such as XMM_CLOBBERS_ONLY in order to add the necessary
> > asm interfaces and that should work fine.
> >
> > That one thing I did notice though is that if the use of "m" constraints
> > with PIC is an issue then there are several locations in existing code that
> > dont seem to follow that. For instance in libavcodec/x86/cavsdsp.c the
> > macro QPEL_CAVSV1 uses a asm interface for MUL1 (which is variable ff_pw_96
> > and is assigned to %5). However MUL2 (which is variable ff_pw_42) is used
> > as a direct symbol through MANGLE. Since both these variables are defined
> > in the same place and are of the same type (uint64_t) then it appears there
> > are some inconsistencies in existing FFmpeg code. Ill leave all existing
> > instances of MANGLE but i thought id just point this out.
> >
> 
> OK so I have redone the patch using named asm constraints and a macro to
> add them to the asm-interface (similar to XMM_CLOBBERS_ONLY as discussed).
> 
> There are only 2 locations that are slightly different and these are in
> motion_est.c and vf_fspp.c.

> The inline asm in Intel compiler is rather
> unstable and changing some compilation flags suddenly result in errors from
> what was previously perfectly functional asm. To get compilation to work

hmm


> using standard release options 2 variables were changed from direct symbol
> references to an asm-interface. This should not be a problem as these
> symbols are defined in asm-interfaces already (for instance in vf_fspp.c
> temps[0] is already in an interface and all I did was add temps[1]). So the
> patch does not change anything that didnt already exist.
> 
> To ensure everything stayed the same I compiled using gcc 4.8.2 and checked
> the generated disassembly for the part affected. For all changes the
> resulting code is exactly identical so this does not change anything (which
> for the most part it shouldn't anyway as the macros only change when using
> Intel on Windows). So no surprises there but I thought i would be thorough
> just to be sure. So obviously all FATE tests passed.
> 
> Let me know how this one looks.

>  libavcodec/x86/cabac.h                 |    3 +
>  libavcodec/x86/cavsdsp.c               |    2 +
>  libavcodec/x86/constants.h             |    2 +
>  libavcodec/x86/dsputil_mmx.c           |    1 
>  libavcodec/x86/h264_i386.h             |    3 +
>  libavcodec/x86/idct_sse2_xvid.c        |   18 +++++-----
>  libavcodec/x86/lpc.c                   |    3 +
>  libavcodec/x86/mathops.h               |    2 -
>  libavcodec/x86/mlpdsp.c                |    5 +-
>  libavcodec/x86/motion_est.c            |    4 +-
>  libavcodec/x86/mpegvideoenc_template.c |    4 +-
>  libavcodec/x86/simple_idct.c           |    1 
>  libavcodec/x86/vc1dsp_mmx.c            |    6 +++
>  libavfilter/libmpcodecs/vf_fspp.c      |   58 ++++++++++++++++++---------------
>  libavfilter/libmpcodecs/vf_ilpack.c    |    8 ++--
>  libavfilter/libmpcodecs/vf_pp7.c       |    2 -
>  libavfilter/libmpcodecs/vf_uspp.c      |    2 -
>  libavfilter/vf_drawtext.c              |    4 +-
>  libavutil/mathematics.h                |    3 +
>  libavutil/x86/asm.h                    |   43 +++++++++++++++++++++++-
>  libpostproc/postprocess_template.c     |    7 +++
>  libswresample/x86/resample_mmx.h       |    2 +
>  libswscale/x86/rgb2rgb.c               |    8 ++++
>  libswscale/x86/rgb2rgb_template.c      |   11 ++++++
>  libswscale/x86/swscale_template.c      |   27 +++++++++++++++
>  libswscale/x86/yuv2rgb.c               |   18 +++++-----
>  libswscale/x86/yuv2rgb_template.c      |   21 ++++++++++-
>  27 files changed, 205 insertions(+), 63 deletions(-)
> d26137cd7e89161d06bef0332b959df68afaadbb  Fixed-inline-asm-compilation-with-Intel-compiler.patch
> From 6eea177a4761629df5d6f02359fccc4efb116aed Mon Sep 17 00:00:00 2001
> From: Matt Oliver <protogonoi at gmail.com>
> Date: Sun, 5 Jan 2014 13:09:25 +1100
> Subject: [PATCH] Fixed inline asm compilation with Intel compiler,
> 
> ---
>  libavcodec/x86/cabac.h                 |  3 +-
>  libavcodec/x86/cavsdsp.c               |  2 ++
>  libavcodec/x86/constants.h             |  2 ++
>  libavcodec/x86/dsputil_mmx.c           |  1 +
>  libavcodec/x86/h264_i386.h             |  3 +-
>  libavcodec/x86/idct_sse2_xvid.c        | 18 +++++------
>  libavcodec/x86/lpc.c                   |  3 ++
>  libavcodec/x86/mathops.h               |  2 +-
>  libavcodec/x86/mlpdsp.c                |  5 +--
>  libavcodec/x86/motion_est.c            |  4 ++-
>  libavcodec/x86/mpegvideoenc_template.c |  4 +--
>  libavcodec/x86/simple_idct.c           |  1 +
>  libavcodec/x86/vc1dsp_mmx.c            |  6 ++++
>  libavfilter/libmpcodecs/vf_fspp.c      | 58 +++++++++++++++++++---------------
>  libavfilter/libmpcodecs/vf_ilpack.c    |  8 ++---
>  libavfilter/libmpcodecs/vf_pp7.c       |  2 +-
>  libavfilter/libmpcodecs/vf_uspp.c      |  2 +-
>  libavfilter/vf_drawtext.c              |  4 ++-
>  libavutil/mathematics.h                |  3 ++
>  libavutil/x86/asm.h                    | 43 ++++++++++++++++++++++++-
>  libpostproc/postprocess_template.c     |  7 ++++
>  libswresample/x86/resample_mmx.h       |  2 ++
>  libswscale/x86/rgb2rgb.c               |  8 +++++
>  libswscale/x86/rgb2rgb_template.c      | 11 +++++++
>  libswscale/x86/swscale_template.c      | 27 ++++++++++++++++
>  libswscale/x86/yuv2rgb.c               | 18 +++++------
>  libswscale/x86/yuv2rgb_template.c      | 21 ++++++++++--
>  27 files changed, 205 insertions(+), 63 deletions(-)
> 
> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
> index 558d287..fe1eb32 100644
> --- a/libavcodec/x86/cabac.h
> +++ b/libavcodec/x86/cabac.h
> @@ -34,7 +34,8 @@
>  #       define BROKEN_COMPILER 0
>  #endif
>  

> -#if HAVE_INLINE_ASM
> +#if HAVE_INLINE_ASM && !( defined(__INTEL_COMPILER) && defined(_MSC_VER) )

that condition occurs alot maybe something like
HAVE_INTEL_GAS or .._ASSEMBLER or something
could be used to simplify these and also test for the actual thing
that causes the problems assuming that is what causes the problem


[...]
> @@ -266,7 +267,7 @@ static inline void sad8_4_mmx(uint8_t *blk1, uint8_t *blk2, int stride, int h)
>          "punpckhbw %%mm7, %%mm5         \n\t"
>          "paddw %%mm4, %%mm2             \n\t"
>          "paddw %%mm5, %%mm3             \n\t"
> -        "movq 16+"MANGLE(round_tab)", %%mm5 \n\t"
> +        "movq %5, %%mm5                 \n\t"
>          "paddw %%mm2, %%mm0             \n\t"
>          "paddw %%mm3, %%mm1             \n\t"
>          "paddw %%mm5, %%mm0             \n\t"
> @@ -290,6 +291,7 @@ static inline void sad8_4_mmx(uint8_t *blk1, uint8_t *blk2, int stride, int h)
>          " js 1b                         \n\t"
>          : "+a" (len)
>          : "r" (blk1 - len), "r" (blk1 -len + stride), "r" (blk2 - len), "r" ((x86_reg)stride)
> +          , "m"(round_tab[2])

a int64 instead of a array could be used


[...]
> diff --git a/libavfilter/libmpcodecs/vf_fspp.c b/libavfilter/libmpcodecs/vf_fspp.c
> index a8a33e2..290057f 100644
> --- a/libavfilter/libmpcodecs/vf_fspp.c
> +++ b/libavfilter/libmpcodecs/vf_fspp.c

changes to libmpcodecs should be sent to mplayer-dev


[...]



> index e4315ef..226b415 100644
> --- a/libswscale/x86/yuv2rgb.c
> +++ b/libswscale/x86/yuv2rgb.c
> @@ -48,15 +48,6 @@ DECLARE_ASM_CONST(8, uint64_t, pb_e0) = 0xe0e0e0e0e0e0e0e0ULL;
>  DECLARE_ASM_CONST(8, uint64_t, pb_03) = 0x0303030303030303ULL;
>  DECLARE_ASM_CONST(8, uint64_t, pb_07) = 0x0707070707070707ULL;
>  
> -//MMX versions
> -#if HAVE_MMX_INLINE
> -#undef RENAME
> -#undef COMPILE_TEMPLATE_MMXEXT
> -#define COMPILE_TEMPLATE_MMXEXT 0
> -#define RENAME(a) a ## _mmx
> -#include "yuv2rgb_template.c"
> -#endif /* HAVE_MMX_INLINE */
> -
>  // MMXEXT versions
>  #if HAVE_MMXEXT_INLINE
>  #undef RENAME
> @@ -66,6 +57,15 @@ DECLARE_ASM_CONST(8, uint64_t, pb_07) = 0x0707070707070707ULL;
>  #include "yuv2rgb_template.c"
>  #endif /* HAVE_MMXEXT_INLINE */
>  
> +//MMX versions
> +#if HAVE_MMX_INLINE
> +#undef RENAME
> +#undef COMPILE_TEMPLATE_MMXEXT
> +#define COMPILE_TEMPLATE_MMXEXT 0
> +#define RENAME(a) a ## _mmx
> +#include "yuv2rgb_template.c"
> +#endif /* HAVE_MMX_INLINE */
> +
>  #endif /* HAVE_INLINE_ASM */
>  
>  av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)

why ?


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is what and why we do it that matters, not just one of them.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140105/29907358/attachment.asc>


More information about the ffmpeg-devel mailing list