[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2

Michael Niedermayer michaelni
Fri May 16 18:43:11 CEST 2008


On Fri, May 16, 2008 at 08:19:44PM +0400, Dmitry Antipov wrote:
> Siarhei Siamashka wrote:
> 
> > Half of the data loaded on the second iteration of your loop has already
> > been loaded on the first iteration. It could be reused to improve performance.
> > This reuse can be achieved by unrolling the loop.
> 
> Argh, I see. Here two loads are avoided at the cost of having two moves:
> 
> asm volatile("mov r1, %3            \n\t"
>               "wzero wr0             \n\t"
>               "wldrd wr1, [%1]       \n\t"
>               "wldrd wr2, [%1, #8]   \n\t"
>               "1: add %1, %1, %2     \n\t"
>               "wldrd wr3, [%1]       \n\t"
>               "wldrd wr4, [%1, #8]   \n\t"
>               "wsadbz wr1, wr1, wr3  \n\t"
>               "wsadbz wr2, wr2, wr4  \n\t"
>               "waddw wr0, wr0, wr1   \n\t"
>               "waddw wr0, wr0, wr2   \n\t"
>               "wmov wr1, wr3         \n\t"
>               "wmov wr2, wr4         \n\t"
>               "subs r1, r1, #1       \n\t"
>               "bne 1b                \n\t"
>               "textrmsw %0, wr0, #0  \n\t"
>               : "=r"(s), "+r"(pix)
>               : "r"(stride), "r"(h - 1)
>               : "r1");
> 
> As for unrolling, I don't believe it's a good idea here since the number of
> iterations of the outer loop isn't known. Here is an unrolled version:

the number of iterations is always even IIRC, but don't hesitate to add an
assert(!(h&1));
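
For reference, here is a rough scalar sketch of what the first loop quoted above appears to compute: the sum of absolute differences between each 16-byte row and the row below it, keeping the previous row around instead of reloading it. The function name vsad_intra16_ref is hypothetical and not taken from the patch, and the assert reflects the assumption that h is even.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of the quoted IWMMXT loop (name is an assumption):
 * sums |row[y] - row[y-1]| over 16 pixels for y = 1 .. h-1. */
static int vsad_intra16_ref(const uint8_t *pix, int stride, int h)
{
    assert(h >= 2 && !(h & 1));      /* assumed: the height is always even */

    const uint8_t *prev = pix;       /* row kept from the previous iteration */
    int sum = 0;

    for (int y = 1; y < h; y++) {
        const uint8_t *cur = prev + stride;   /* only the new row is fetched */
        for (int x = 0; x < 16; x++)
            sum += abs(cur[x] - prev[x]);
        prev = cur;                  /* corresponds to the two wmov in the asm */
    }
    return sum;
}
```

Calling vsad_intra16_ref(buf, 16, 4) on four rows filled with 0, 3, 6 and 9 gives 3 row pairs times 16 pixels times a difference of 3.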


> 
> asm volatile("mov r1, %3                \n\t"
>               "wzero wr0                 \n\t"
>               "wldrd wr1, [%1]           \n\t"
>               "wldrd wr2, [%1, #8]       \n\t"
>               "1: add %1, %1, %2         \n\t"
>               "wldrd wr3, [%1]           \n\t"
>               "wldrd wr4, [%1, #8]       \n\t"
>               "wsadbz wr1, wr1, wr3      \n\t"
>               "wsadbz wr2, wr2, wr4      \n\t"
>               "waddw wr0, wr0, wr1       \n\t"
>               "waddw wr0, wr0, wr2       \n\t"
>               "subs r1, r1, #1           \n\t"
>               "beq 2f                    \n\t"
>               "add %1, %1, %2            \n\t"
>               "wldrd wr5, [%1]           \n\t"
>               "wldrd wr6, [%1, #8]       \n\t"
>               "wsadbz wr3, wr3, wr5      \n\t"
>               "wsadbz wr4, wr4, wr6      \n\t"
>               "waddw wr0, wr0, wr3       \n\t"
>               "waddw wr0, wr0, wr4       \n\t"
>               "wmov wr1, wr5             \n\t"
>               "wmov wr2, wr6             \n\t"
>               "subs r1, r1, #1           \n\t"
>               "bne 1b                    \n\t"
>               "2: textrmsw %0, wr0, #0   \n\t"
>               : "=r"(s), "+r"(pix)
>               : "r"(stride), "r"(h - 1)
>               : "r1");

well, now you have 2 unneeded wmov in there
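
One way to get rid of them, as a sketch only: in the second half of the unrolled body, load the new row into wr1/wr2 instead of wr5/wr6, so the two register pairs simply swap roles and nothing has to be copied before branching back. The C model below illustrates that idea with two pointers standing in for the register pairs; the names row_sad16 and vsad16_unrolled are hypothetical, and an even h is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of two 16-byte rows; stands in for the wsadbz/waddw pairs. */
static int row_sad16(const uint8_t *a, const uint8_t *b)
{
    int s = 0;
    for (int x = 0; x < 16; x++)
        s += abs(a[x] - b[x]);
    return s;
}

/* Hypothetical model of a 2x unrolled loop without register copies. */
static int vsad16_unrolled(const uint8_t *pix, int stride, int h)
{
    assert(h >= 2 && !(h & 1));      /* assumed: the height is always even */

    const uint8_t *a = pix;          /* plays the role of wr1/wr2 */
    const uint8_t *b;                /* plays the role of wr3/wr4 */
    int n = h - 1;                   /* number of row pairs, odd for even h */
    int sum = 0;

    for (;;) {
        b = a + stride;              /* first half: new row goes into b */
        sum += row_sad16(a, b);
        if (--n == 0)                /* for even h the loop always exits here */
            break;
        a = b + stride;              /* second half: new row goes back into a */
        sum += row_sad16(b, a);
        --n;                         /* stays nonzero, so no second exit test */
    }
    return sum;
}
```

With an even h the pair count h - 1 is odd, so in this model the exit is only ever taken in the middle of the body and the branch at the bottom could be unconditional.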



> 
> The granularity of performance monitoring unit's clock cycle counter
> isn't enough to see performance differences between them :-).

run the code 100 times instead of once
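
A minimal sketch of that, using the portable clock() as a stand-in for the PMU cycle counter (an assumption, since the actual measurement code isn't shown in this thread). The helper names time_per_call and count_call are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: average CPU time per call, in seconds, over `runs`
 * back-to-back calls, so a coarse timer can still resolve small differences. */
static double time_per_call(void (*fn)(void *), void *arg, int runs)
{
    assert(runs > 0);

    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn(arg);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / runs;
}

/* Trivial stand-in workload; the function under test would go here instead. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

For example, time_per_call(count_call, &calls, 100) calls the workload 100 times and reports the mean time per call; raising runs further helps when the timer granularity is still too coarse.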

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates