[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2
Fri May 16 18:45:51 CEST 2008
On Friday 16 May 2008, Dmitry Antipov wrote:
> Siarhei Siamashka wrote:
> > Half of the data loaded on the second iteration of your loop has been
> > already loaded on the first iteration. It could be reused to improve
> > performance. This data can be reused by unrolling the loop.
> Argh, I see. Here two loads are avoided at the cost of having two moves:
> asm volatile("mov r1, %3 \n\t"
> "wzero wr0 \n\t"
> "wldrd wr1, [%1] \n\t"
> "wldrd wr2, [%1, #8] \n\t"
> "1: add %1, %1, %2 \n\t"
> "wldrd wr3, [%1] \n\t"
> "wldrd wr4, [%1, #8] \n\t"
> "wsadbz wr1, wr1, wr3 \n\t"
> "wsadbz wr2, wr2, wr4 \n\t"
> "waddw wr0, wr0, wr1 \n\t"
> "waddw wr0, wr0, wr2 \n\t"
> "wmov wr1, wr3 \n\t"
> "wmov wr2, wr4 \n\t"
> "subs r1, r1, #1 \n\t"
> "bne 1b \n\t"
> "textrmsw %0, wr0, #0 \n\t"
> : "=r"(s), "+r"(pix)
> : "r"(stride), "r"(h - 1)
> : "r1");
> As for unrolling, I don't believe it's a good idea here since the number of
> iterations of the outer loop isn't known. Here is an unrolled version:
You just need to unroll the loop by merging two iterations into one; then you can
flip the registers used for loads and avoid the unnecessary moves. I can try
to make some kind of code that theoretically should do the job (but can't
Also, one needs to take care of instruction latencies (especially loads) to
avoid pipeline stalls when you try to use the result before it is
available. There is at least one stall in your code. You can hide it
by moving "subs r1, r1, #1" instruction up and placing it after the
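
For illustration only, the "merge two iterations and flip the registers" idea
discussed above can be sketched in portable C. This is not the code from the
patch or from this thread: the function names (row_sad16, vsad16_reference,
vsad16_unrolled) and the byte arrays ra/rb, which stand in for the register
pairs wr1/wr2 and wr3/wr4, are assumptions made for the example. Each row is
loaded exactly once, no row is copied between the two "registers", and an odd
number of row pairs is handled by a short tail, so the iteration count does
not need to be known in advance.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sum of absolute differences of two 16-byte rows
 * (scalar stand-in for the pair of wsadbz instructions). */
static unsigned row_sad16(const uint8_t *a, const uint8_t *b)
{
    unsigned s = 0;
    for (int i = 0; i < 16; i++)
        s += (unsigned)abs(a[i] - b[i]);
    return s;
}

/* Straightforward version: compare every row with the row below it. */
static unsigned vsad16_reference(const uint8_t *pix, int stride, int h)
{
    unsigned s = 0;
    for (int y = 0; y + 1 < h; y++)
        s += row_sad16(pix + y * stride, pix + (y + 1) * stride);
    return s;
}

/* Unrolled by two: ra and rb swap roles in the second half of the body,
 * so the "move current row to previous row" step disappears. */
static unsigned vsad16_unrolled(const uint8_t *pix, int stride, int h)
{
    uint8_t ra[16], rb[16];   /* hypothetical stand-ins for wr1/wr2, wr3/wr4 */
    unsigned s = 0;
    int n = h - 1;            /* number of adjacent row pairs */

    assert(h >= 1);
    memcpy(ra, pix, 16);      /* initial load of row 0 */

    while (n >= 2) {
        pix += stride;
        memcpy(rb, pix, 16);  /* load next row into B */
        s += row_sad16(ra, rb);

        pix += stride;
        memcpy(ra, pix, 16);  /* load following row into A (roles flipped) */
        s += row_sad16(rb, ra);

        n -= 2;
    }

    if (n == 1) {             /* tail for an odd number of row pairs */
        pix += stride;
        memcpy(rb, pix, 16);
        s += row_sad16(ra, rb);
    }
    return s;
}
```

In an asm version the same structure would apply, with the second half of the
loop body using the register pairs in swapped roles; scheduling of the loads
relative to their first use would still need to be checked separately.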