[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2

Siarhei Siamashka siarhei.siamashka
Fri May 16 17:08:00 CEST 2008


On Friday 16 May 2008, Dmitry Antipov wrote:
> Michael Niedermayer wrote:
> > [...]
> >
> >> +static int vsad_intra16_iwmmxt(void *c, uint8_t *pix, uint8_t *dummy,
> >> +                               int stride, int h)
> >> +{
> >> +    int s;
> >> +
> >> +    asm volatile("mov r1, %3            \n\t"
> >> +                 "wzero wr0             \n\t"
> >> +                 "1: wldrd wr1, [%1]    \n\t"
> >> +                 "wldrd wr2, [%1, #8]   \n\t"
> >> +                 "add %1, %1, %2        \n\t"
> >> +                 "wldrd wr3, [%1]       \n\t"
> >> +                 "wldrd wr4, [%1, #8]   \n\t"
> >> +                 "wsadbz wr1, wr1, wr3  \n\t"
> >> +                 "wsadbz wr2, wr2, wr4  \n\t"
> >> +                 "waddw wr0, wr0, wr1   \n\t"
> >> +                 "waddw wr0, wr0, wr2   \n\t"
> >> +                 "subs r1, r1, #1       \n\t"
> >> +                 "bne 1b                \n\t"
> >
> > half of the loads in there are redundant, this also applies to a few
> > other functions
>
> Why? Unlike on x86, you can't do SIMD stuff between register(s) and memory
> - all data should be loaded first. This means that WMMX code will always
> issue more loads than equivalent MMX/SSE code.
>
> For example x = x + y for 8x8 vectors issues 1 load and 1 store on x86 with
> MMX:
>
> asm volatile("movq (%1), %%mm0\n\t"
>               "paddb (%0), %%mm0\n\t"
>               "movq %%mm0, (%0)\n\t"
>
>               : : "r"(x), "r"(y));
>
> For WMMX, you can't do it without at least 2 loads and 1 store:
>
> asm volatile ("wldrd wr0, [%0]\n\t"
>                "wldrd wr1, [%1]\n\t"
>                "waddb wr0, wr0, wr1\n\t"
>                "wstrd wr0, [%0]\n\t"
>
>                : : "r"(x), "r"(y));
>
> Am I missing something?

Half of the data loaded on the second iteration of your loop has already been
loaded on the first iteration: the row fetched into wr3/wr4 is fetched again
into wr1/wr2 on the next pass. It could be kept in registers and reused to
improve performance, which can be done by unrolling the loop.
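
To illustrate the idea, here is a rough sketch in plain C rather than IWMMXT
assembly. The names vsad_intra16_naive, vsad_intra16_reuse and sad16 are
made up for this example and are not part of the patch; the 16-byte arrays
a and b stand in for the register pairs wr1/wr2 and wr3/wr4. The loop is
unrolled by two so that the row loaded as "next" in one step serves as
"current" in the following step, with no extra load or register move.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sum of absolute differences between two 16-byte rows
 * (hypothetical helper, stands in for the two wsadbz + waddw pairs). */
static int sad16(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int x = 0; x < 16; x++)
        sum += abs(a[x] - b[x]);
    return sum;
}

/* Reference version: both rows of every pair are read from memory,
 * so each interior row is loaded twice. */
static int vsad_intra16_naive(const uint8_t *pix, int stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        score += sad16(pix, pix + stride);
        pix += stride;
    }
    return score;
}

/* Unrolled by two: a and b swap the roles of "current" and "next" row,
 * so every row is loaded exactly once. */
static int vsad_intra16_reuse(const uint8_t *pix, int stride, int h)
{
    uint8_t a[16], b[16];
    int score = 0;
    int y = 1;

    memcpy(a, pix, 16);                 /* row 0, loaded once up front */
    for (; y + 1 < h; y += 2) {
        memcpy(b, pix + stride, 16);    /* load row y */
        score += sad16(a, b);
        pix += 2 * stride;
        memcpy(a, pix, 16);             /* load row y + 1 */
        score += sad16(b, a);           /* b is reused, not reloaded */
    }
    if (y < h) {                        /* one row pair left over */
        memcpy(b, pix + stride, 16);
        score += sad16(a, b);
    }
    return score;
}
```

Both functions should return the same score for any block; the second one
just issues roughly half as many row loads inside the loop.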

-- 
Best regards,
Siarhei Siamashka



