[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2

Dmitry Antipov dmantipov
Fri May 23 11:30:21 CEST 2008


Siarhei Siamashka wrote:

> First it is better to do some "warming up" and call the tested functions at
> least once before doing benchmark, so that they are loaded to instructions
> cache and all the data they use gets loaded to the data cache. You need to
> be careful with data cache, because ARM cores may have read-allocate cache
> behaviour configured. With read-allocate cache, cache lines are not allocated
> on write misses, so just initializing array by writing to it may be not
> enough to ensure that it got into cache.

Sure, I'm taking this into account. The real output of 'loadwmmx' looks like
this:

0) 3165 430
1) 460 142
2) 23 23
3) 23 23
4) 23 23
5) 23 23
6) 23 23
7) 23 23
8) 23 23
9) 23 23
10) 23 23
11) 23 23
12) 23 23
13) 23 23
14) 23 23
15) 23 23
65616 65616

Iterations 0) and 1) are warm-up. From 2) to 15), the code is definitely
resident in the instruction cache, and thus shows the peak performance.
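
For illustration, a minimal sketch of such a measurement loop in portable C
might look like the following. This is not the actual 'loadwmmx' source:
sum_bytes, now_ns, bench and WARMUP_RUNS are hypothetical names, and
clock_gettime() stands in for whatever timer the real test uses. The stand-in
routine reads the buffer, so the warm-up runs also pull the data into the data
cache even when the cache is configured as read-allocate.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumption: two warm-up runs, mirroring iterations 0) and 1) above. */
#define WARMUP_RUNS 2

/* Hypothetical stand-in for the routine being benchmarked.
 * It only reads the buffer, so calling it also warms the data cache. */
static unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `runs` calls, store every duration in ns[], and return the best
 * duration seen after the warm-up runs have been discarded. */
static uint64_t bench(const unsigned char *buf, size_t n,
                      uint64_t *ns, int runs)
{
    volatile unsigned sink = 0;   /* keeps the call from being optimized out */
    uint64_t best = UINT64_MAX;

    assert(runs > WARMUP_RUNS);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        sink = sum_bytes(buf, n);
        ns[i] = now_ns() - t0;
        if (i >= WARMUP_RUNS && ns[i] < best)
            best = ns[i];
    }
    (void)sink;
    return best;
}
```

Calling bench() with runs = 16 gives the same shape as the listing above: the
first two entries of ns[] include cache misses, and only the remaining ones
are considered for the reported figure.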

> Second and more important. Your test buffer is a byte array and it is not
> guaranteed to be 8-byte aligned. WLDRD instruction requires strict 8-byte
> alignment at least for WMMX1 cores (if documentation is not completely wrong).
> You can check '/proc/cpu/alignment' to configure behaviour of your CPU when
> it performs unaligned memory reads/writes. Theoretically, it could be that
> CPU just ignores unaligned WLDRD reads and your benchmark does not make much
> sense. But it also could be that PXA3xx cores can support unaligned memory
> access, don't know.

As far as I know, all XScale cores strictly follow the rule 'an N-byte access
must be aligned on an N-byte boundary' (where N=1,2,4,8). An attempt to run
something like:

unsigned char __attribute__((aligned (8))) data[64];
asm volatile ("wldrh wr0, [%0]\n\t"
               "wldrh wr1, [%0, #1]\n\t"
               : : "r"(data));

as well as:

asm volatile ("wldrd wr0, [%0]\n\t"
               "wldrd wr1, [%0, #4]\n\t"
               : : "r"(data));

causes SIGBUS and some noise from the kernel:

Alignment trap: align (723) PC=0x000083e8 Instr=0xeddd1101 Address=0xbefffd94 FSR 0x013
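
The rule can also be checked in plain C before issuing the load. The sketch
below is only an illustration of the 'N-byte access on an N-byte boundary'
rule, assuming N is a power of two; is_naturally_aligned is a hypothetical
helper, not something from libavcodec or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if an access of `size` bytes at address `p` is naturally
 * aligned, i.e. the address is a multiple of `size`.
 * `size` must be a power of two (1, 2, 4 or 8 for the loads above). */
static int is_naturally_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)p & (size - 1)) == 0;
}
```

With data aligned on 8 bytes as above, the check fails for a halfword at
data + 1 and for a doubleword at data + 4, which are exactly the two faulting
accesses. It also fails for an 8-byte access at the reported address
0xbefffd94, which is 4-byte but not 8-byte aligned.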

Dmitry



