[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Tue Jun 16 14:30:02 CEST 2015

On date Tuesday 2015-06-16 14:16:11 +0200, Gwenole Beauchesne encoded:
> Hi,
> 
> 2015-06-16 14:03 GMT+02:00 Michael Niedermayer <michaelni at gmx.at>:
[...]
> >> +#if HAVE_SSE2
> >> +/* Copy 16 or 64 bytes from srcp to dstp, loading with the SSE2-or-later
> >> + * instruction passed as load and storing with the one passed as store.
> >> + */
> >> +#define COPY16(dstp, srcp, load, store) \
> >> +    __asm__ volatile (                  \
> >> +        load "  0(%[src]), %%xmm1\n"    \
> >> +        store " %%xmm1,    0(%[dst])\n" \
> >> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1")
> >> +
> >> +#define COPY64(dstp, srcp, load, store) \
> >> +    __asm__ volatile (                  \
> >> +        load "  0(%[src]), %%xmm1\n"    \
> >> +        load " 16(%[src]), %%xmm2\n"    \
> >> +        load " 32(%[src]), %%xmm3\n"    \
> >> +        load " 48(%[src]), %%xmm4\n"    \
> >> +        store " %%xmm1,    0(%[dst])\n" \
> >> +        store " %%xmm2,   16(%[dst])\n" \
> >> +        store " %%xmm3,   32(%[dst])\n" \
> >> +        store " %%xmm4,   48(%[dst])\n" \
> >> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1", "xmm2", "xmm3", "xmm4")
> >> +#endif
> >> +
> >> +#define COPY_LINE(dstp, srcp, size, load)                               \
> >> +    const unsigned unaligned = (-(uintptr_t)srcp) & 0x0f;               \
> >> +    unsigned x = unaligned;                                             \
> >> +                                                                        \
> >> +    av_assert0(((intptr_t)dstp & 0x0f) == 0);                           \
> >> +                                                                        \
> >> +    __asm__ volatile ("mfence");                                        \
> >> +    if (!unaligned) {                                                   \
> >> +        for (; x+63 < size; x += 64)                                    \
> >> +            COPY64(&dstp[x], &srcp[x], load, "movdqa");                 \
> >> +    } else {                                                            \
> >> +        COPY16(dstp, srcp, "movdqu", "movdqa");                         \
> >> +        for (; x+63 < size; x += 64)                                    \
> >> +            COPY64(&dstp[x], &srcp[x], load, "movdqu");                 \
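For readers following the alignment logic in COPY_LINE: `(-(uintptr_t)p) & 0x0f` is the number of bytes from `p` up to the next 16-byte boundary, and 0 when `p` is already aligned. Below is a minimal portable sketch of the same head/body split, with memcpy standing in for the SSE loads and stores. `bytes_to_align16` and `copy_line_sketch` are hypothetical names, and the tail handling is an assumption, since the quoted macro is cut off before that point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes from p to the next 16-byte boundary (0..15). */
static unsigned bytes_to_align16(const void *p)
{
    return (unsigned)((-(uintptr_t)p) & 0x0f);
}

/* Hypothetical portable analogue of COPY_LINE: memcpy stands in for the
 * movdqa/movdqu loads and stores used by the quoted macro. */
static void copy_line_sketch(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t x = bytes_to_align16(src);

    if (x > size)
        x = size;

    /* Head: bytes before the first 16-byte boundary of src
     * (the patch covers them with one unaligned 16-byte copy). */
    memcpy(dst, src, x);

    /* Body: src + x is now 16-byte aligned, so aligned loads would be
     * legal; dst + x may not be, hence movdqu stores in the patch. */
    for (; x + 63 < size; x += 64)
        memcpy(dst + x, src + x, 64);

    /* Tail (assumed): whatever is left after the last full 64-byte chunk. */
    memcpy(dst + x, src + x, size - x);
}
```

With an aligned source the head is empty and every chunk is aligned on both sides, which is why the patch can use movdqa stores in that branch.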
> >
> > To use SSE registers in inline asm operands or in the clobber list, you
> > need to build with -msse (which is probably the default on x86-64).
> >
> > Files built with -msse will invoke undefined behavior if anything in
> > them is executed on a pre-SSE CPU, as that flag allows gcc to put SSE
> > instructions directly into the code wherever it likes.
> >
> > The way out of this "design" is not to tell gcc that the string it
> > passes to the assembler contains SSE code; that is, do not use SSE
> > registers in operands and do not put them on the clobber list unless
> > gcc actually is in SSE mode and can use and needs them there.
> > See XMM_CLOBBERS*.
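Michael's suggestion can be sketched as follows: the XMM registers go into the clobber list only when the compiler itself targets SSE, detected here through the `__SSE__` macro that GCC and Clang predefine in that mode. `SKETCH_XMM_CLOBBERS` and `copy16_sse2` are hypothetical names, not FFmpeg's actual XMM_CLOBBERS* definitions, and the x86 guard with a memcpy fallback exists only so the sketch compiles elsewhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FFmpeg's XMM_CLOBBERS*: list XMM registers as
 * clobbered only when the compiler targets SSE (__SSE__ is predefined by
 * GCC and Clang in that case). Otherwise they are omitted: the compiler
 * then never allocates XMM registers, so it has no values in them to keep. */
#if defined(__SSE__)
#  define SKETCH_XMM_CLOBBERS(...) , __VA_ARGS__
#else
#  define SKETCH_XMM_CLOBBERS(...)
#endif

/* Copy 16 bytes with an unaligned SSE2 load and store. The asm string is
 * opaque to the compiler; the caller must ensure the CPU has SSE2. */
static void copy16_sse2(uint8_t *dst, const uint8_t *src)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile (
        "movdqu (%[src]), %%xmm1\n\t"
        "movdqu %%xmm1, (%[dst])\n\t"
        : /* no outputs */
        : [dst]"r"(dst), [src]"r"(src)
        : "memory" SKETCH_XMM_CLOBBERS("xmm1"));
#else
    memcpy(dst, src, 16); /* non-x86 fallback so the sketch still builds */
#endif
}
```

This is exactly the trade-off Gwenole objects to below: without `__SSE__`, the compiler is never told that xmm1 changes, which is only safe because it does not use XMM registers in that mode.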
> 
> Well, from past experience, lying to gcc is generally not a good thing
> either. There are multiple interesting ways it could fail from time to
> time. :)
> 
> Other approaches:
> - With GCC >= 4.4, you can use __attribute__((target(T))) where T =
> "ssse3", "sse4.1", etc. This is the easiest way;
> - Split the code into separate files, one per target. Though one could
> then argue that, while we are at it, we might as well start moving to yasm.
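A sketch of the first alternative, assuming GCC or Clang on x86: only `copy_sse2` is compiled for SSE2, and the caller checks the CPU at run time with `__builtin_cpu_supports`. It uses SSE2 intrinsics instead of the patch's inline asm. Calling intrinsics from a target-attributed function without building the whole file with -msse2 needs GCC 4.9 or later, newer than the 4.4 mentioned above for the attribute itself. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h> /* SSE2 intrinsics */

/* Only this function is compiled with SSE2 enabled; the rest of the file
 * keeps the baseline target, so the compiler cannot spread SSE2
 * instructions into code that may run on older CPUs. */
static __attribute__((target("sse2")))
void copy_sse2(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t x = 0;

    for (; x + 15 < size; x += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + x));
        _mm_storeu_si128((__m128i *)(dst + x), v);
    }
    memcpy(dst + x, src + x, size - x); /* remaining 0..15 bytes */
}
#endif

/* Dispatcher: take the SSE2 path only if the running CPU supports it. */
static void copy_line(uint8_t *dst, const uint8_t *src, size_t size)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2")) {
        copy_sse2(dst, src, size);
        return;
    }
#endif
    memcpy(dst, src, size);
}
```

Unlike the clobber-list trick, the compiler here knows precisely which function may use SSE2, so nothing has to be hidden from it.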
>
> The former approach looks more appealing to me, considering there may
> be an effort to migrate to yasm afterwards.

I plan to port this patch to yasm. I'll ask for help on IRC, since
without any guidance it would probably take me too long.
-- 
FFmpeg = Friendly and Fancy Mind-dumbing Pacific Easy Generator