[Ffmpeg-devel] [PATCH] (3) building with --disable-opts on i386 with mmx enabled

Fri Aug 11 22:07:48 CEST 2006

On Friday 11 August 2006 21:00, Michael Niedermayer wrote:
> 4.0+, 3.2+ or so and 2.95 must work

>
> > As for validity: the modified transpose4x4 is used in 4 regression tests
> > and passes all (with gcc-4.0.3 and gcc-3.3.6) - it's a straightforward
> > substitution and uint32_t can be passed as y. They enter via "movd".
> > Where's the error?
>
> writing to input operands ...
It's a pleasure to profit from your experience. What is the right way to 
handle this, if you want to work on a copy of the input parameter?

Is this better? 

static inline void transpose4x4(uint8_t *dst, uint8_t *src,
                                int dst_stride, int src_stride){
        uint32_t t0=(*(uint32_t*)(src + 0*src_stride)); 
        uint32_t t1=(*(uint32_t*)(src + 1*src_stride)); 
        uint32_t t2=(*(uint32_t*)(src + 2*src_stride)); 
        uint32_t t3=(*(uint32_t*)(src + 3*src_stride)); 

    asm volatile(
        "punpcklbw %5, %4               \n\t"
        "punpcklbw %7, %6               \n\t"
        "movq %4, %5                    \n\t"
        "punpcklwd %6, %4               \n\t"
        "punpckhwd %6, %5               \n\t"
        "movd  %4, %0                   \n\t"
        "punpckhdq %4, %4               \n\t"
        "movd  %4, %1                   \n\t"
        "movd  %5, %2                   \n\t"
        "punpckhdq %5, %5               \n\t"
        "movd  %5, %3                   \n\t"

        : "=m" (*(uint32_t*)(dst + 0*dst_stride)),
          "=m" (*(uint32_t*)(dst + 1*dst_stride)),
          "=m" (*(uint32_t*)(dst + 2*dst_stride)),
          "=m" (*(uint32_t*)(dst + 3*dst_stride)),
          "+y" (t0),
          "+y" (t1),
          "+y" (t2),
          "+y" (t3)
    );
}
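
To check the output of the MMX version above against something simple, a plain-C reference could look roughly like the following. This is only a sketch: the name transpose4x4_ref is hypothetical and not part of the patch, and it assumes dst and src do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C reference, not part of the patch:
 * transposes a 4x4 block of bytes, i.e. dst[x][y] = src[y][x]. */
static void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                             int dst_stride, int src_stride)
{
    int x, y;

    /* This sketch does not support transposing in place. */
    assert(dst != src);

    for (y = 0; y < 4; y++)
        for (x = 0; x < 4; x++)
            dst[x * dst_stride + y] = src[y * src_stride + x];
}
```

Running both functions on the same 4x4 block and comparing the four output rows with memcmp should show whether the asm operands are wired up as intended.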