[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2

Dmitry Antipov dmantipov
Mon May 19 11:46:46 CEST 2008


Siarhei Siamashka wrote:

> But your unrolled code still "sucks" :) It has a lot of pipeline stalls
> that could be eliminated. Please read the optimization manual and find
> the definition of instruction latency. That will help a lot in optimizing
> code and understanding how CPU works. ARM pipeline is quite simple to
> comprehend and you will immediately spot potential stalls after you
> get more practice with assembly code.

What docs are you using? As I understand it, this is the main XScale core specification,
apart from the WMMX-specific material: http://www.intel.com/design/intelxscale/273473.htm
(I'm slightly confused about the relationship between XScale cores and the ARM{5,7,9,11}
ones).

> Let's try the following. We can start with getting a perfect version of
> 'vsad_intra16_iwmmxt' function first. Once it is done, you can focus on
> optimizing 'pix_sum' function yourself without getting any assistance or
> further hints. Once you manage to get an implementation that does not have
> any pipeline stalls, you should have enough experience and can move on to
> optimizing the rest of the functions. Is this plan acceptable to you?

OK, sure. Thank you very much for your assistance and hints.

Dmitry
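For context, the scalar behaviour that an optimized `vsad_intra16_iwmmxt` must reproduce can be sketched in plain C. This is a minimal sketch, assuming the usual semantics of FFmpeg's C version (the sum of absolute differences between each row of a 16-pixel-wide block and the row directly below it, over `h` rows). The name `vsad_intra16_ref` and its simplified signature are hypothetical and do not match the real dsputil prototype.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not the actual FFmpeg prototype):
 * for each pair of vertically adjacent rows, sum |s[x] - s[x + stride]|
 * over the 16 columns, for h rows in total (h - 1 row pairs). */
static int vsad_intra16_ref(const uint8_t *s, int stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride; /* advance to the next row pair */
    }
    return score;
}
```

A SIMD version would process the 16 columns in wide registers and accumulate the absolute differences in parallel; the scalar loop above is only useful as a correctness baseline for comparing results.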