[FFmpeg-devel] [PATCH] Some ARM VFP optimizations (vector_fmul, vector_fmul_reverse, float_to_int16)

Siarhei Siamashka siarhei.siamashka
Mon Apr 21 00:50:50 CEST 2008

On Monday 21 April 2008, Michael Niedermayer wrote:
> > I do not remember exactly, but there was a serious problem with your
> > suggested "solution"; IIRC it caused a serious speedloss on some
> > systems.
> >
> > anyway
> > 1. Why do you need an aligned stack? That is, why can't you use the heap?

For temporary data storage, when we run out of registers. Also, in the ARMv6
IDCT the stack is used for the 'permutation' (transposition) between the row
and column passes (columns are stored as rows).
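
To illustrate the 'permutation' idea in portable C (not the ARMv6 assembly itself): a minimal sketch, assuming a 16-bit DCTELEM and a hypothetical pass_1d() that merely doubles each value in place of the real IDCT butterflies. The row pass stores its results transposed into a 64-element stack buffer, so the column pass can walk contiguous memory.

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficients */

/* Hypothetical 1-D pass: only doubles each value, standing in for the
 * real IDCT butterflies so the data flow is easy to verify. */
static void pass_1d(const DCTELEM *in, DCTELEM *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = (DCTELEM)(in[i] * 2);
}

/* Two-pass 8x8 transform using a stack buffer for the permutation. */
static void transform_8x8(DCTELEM block[64])
{
    DCTELEM tmp[64];   /* scratch on the stack */
    DCTELEM out[8];

    /* Row pass: the result of row r is stored as column r of tmp. */
    for (int r = 0; r < 8; r++) {
        pass_1d(&block[r * 8], out);
        for (int c = 0; c < 8; c++)
            tmp[c * 8 + r] = out[c];
    }

    /* Column pass: column r of the row-pass output is now the contiguous
     * row r of tmp; transpose back while storing into block. */
    for (int r = 0; r < 8; r++) {
        pass_1d(&tmp[r * 8], out);
        for (int c = 0; c < 8; c++)
            block[c * 8 + r] = out[c];
    }
}
```

With this placeholder pass every value ends up scaled by 4 (2 per pass) at its original position, which shows that the second transpose undoes the first.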

I thought about reserving a spare working buffer after the IDCT coefficients
(so that the DCTELEM *data argument would point to 64 DCTELEM coefficients
immediately followed by a spare 64-element DCTELEM temporary buffer), but that
is a bit intrusive.
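
A minimal sketch of that layout, with a hypothetical IdctBlockWithScratch type and idct_scratch() helper (neither exists in FFmpeg): the IDCT still receives only DCTELEM *data and finds its scratch area at data + 64, with no stack use.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficients */

/* Hypothetical layout: 64 coefficients followed by 64 scratch elements
 * that belong to the IDCT. */
typedef struct {
    DCTELEM coeffs[64];
    DCTELEM scratch[64];
} IdctBlockWithScratch;

_Static_assert(offsetof(IdctBlockWithScratch, scratch) == 64 * sizeof(DCTELEM),
               "scratch must immediately follow the coefficients");

/* The IDCT locates its scratch buffer by pointer arithmetic alone. */
static DCTELEM *idct_scratch(DCTELEM *data)
{
    return data + 64;
}
```

The intrusive part is that every caller allocating a block for the IDCT would have to provide the extra 64 elements.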

How exactly could the heap be used efficiently from within the IDCT?

> > 2. Assuming you do need one, what was the problem with using a recent
> > gcc which supports maintaining stack alignment?
> > 3. What effect does your solution have on systems which do align the
> > stack, i.e. a recent gcc on pre-EABI? Or even a non-gcc compiler?

Please explain to me how exactly a recent gcc would align the stack on a
pre-EABI system. If you are thinking of something like
'-mpreferred-stack-boundary', that option is only supported on x86 (the '-m'
prefix generally denotes a machine-dependent option).
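
When the ABI does not guarantee the alignment a routine needs, a common workaround is to over-allocate a local buffer and round its address up; in ARM assembly the equivalent is typically saving sp and clearing its low bits with BIC. A C sketch of the idea, assuming the requirement is 8-byte alignment (as for double-word LDRD/STRD accesses on ARMv5TE) and using hypothetical helper names:

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficients */

/* Round p up to the next multiple of align (align must be a power of two). */
static void *align_up(void *p, uintptr_t align)
{
    return (void *)(((uintptr_t)p + (align - 1)) & ~(align - 1));
}

/* Hypothetical routine that needs an 8-byte aligned 64-element scratch
 * buffer but cannot rely on the ABI aligning the stack to 8 bytes.
 * Returns 1 if the scratch pointer it used was 8-byte aligned. */
static int scale_with_aligned_tmp(DCTELEM block[64])
{
    DCTELEM raw[64 + 4];              /* 8 spare bytes cover any 0..7 byte shift */
    DCTELEM *tmp = align_up(raw, 8);  /* aligned window inside raw */

    for (int i = 0; i < 64; i++)
        tmp[i] = (DCTELEM)(block[i] * 2);   /* placeholder work */
    for (int i = 0; i < 64; i++)
        block[i] = tmp[i];

    return ((uintptr_t)tmp & 7) == 0;
}
```

The cost is a few wasted bytes of stack and one extra AND per call, independent of what the compiler or ABI guarantees.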

And I would not even consider a non-gcc compiler right now; that's a waste of
time. Whatever you assume or try to guess about such a compiler is likely to
turn out wrong. Please remember that we are talking about assembly code here,
not just C99 or something else covered by a standard.

Assemblers are not portable by definition. I have already given you a link
to the Micro$oft page describing their assembly syntax and macros for ARM.
Here is one more link to another assembler, just for some variety:

Do you really want to think about supporting all of them?

I would stick with the following policy: if you want assembly optimizations
out of the box, use GNU binutils. If you want to port FFmpeg to some strange
toolchain, you are welcome to, but don't expect it to be easy. As a last
resort, one can always use FFmpeg as plain C without any assembly
optimizations; it will be slower, but it will
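
The plain C fallback works because optimized routines are only selected when they were actually built. A minimal sketch of that kind of dispatch, with hypothetical names throughout (HAVE_MY_ARMV5TE_ASM, my_idct_c, MyDSPFuncs); without the macro defined, the table simply points at the C version:

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficients */

/* Plain C version: always compiled. The placeholder body (halving each
 * value) stands in for a real IDCT. */
static void my_idct_c(DCTELEM *block)
{
    for (int i = 0; i < 64; i++)
        block[i] = (DCTELEM)(block[i] / 2);
}

#ifdef HAVE_MY_ARMV5TE_ASM            /* hypothetical configure macro */
void my_idct_armv5te(DCTELEM *block); /* would be defined in a .S file */
#endif

/* Hypothetical function table, loosely modeled on FFmpeg's DSPContext. */
typedef struct {
    void (*idct)(DCTELEM *block);
} MyDSPFuncs;

static void my_dsp_init(MyDSPFuncs *c)
{
    c->idct = my_idct_c;              /* portable fallback first */
#ifdef HAVE_MY_ARMV5TE_ASM
    c->idct = my_idct_armv5te;        /* override only if the asm was built */
#endif
}
```

A toolchain that cannot assemble the .S files just leaves the macro undefined and still gets a working, slower build.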

> To answer 3:
> huge speedloss, and that's why this isn't a solution

Where did you get this idea? Actually, using the current FFmpeg implementation
of the ARMv5TE IDCT is a huge speedloss :)

The proposed upgrade is not perfect, and it can still be improved further. But
it will provide a performance improvement, and provide it right now, before
this hardware (ARMv5TE is already old) gets completely outdated and abandoned by


> or better add a comment to the #define setting MAX_NEG_CROP

Yes, that's a bit ugly, but not a big deal.
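
Assuming MAX_NEG_CROP plays its usual role of sizing the negative margin of a clamp-by-lookup table (the quoted text does not say), any code indexing the table must stay inside that margin, which is what a comment on the #define would document. A minimal C sketch with a hypothetical crop_tbl and clip_u8(), using 1024 as an assumed value:

```c
#include <stdint.h>

#define MAX_NEG_CROP 1024  /* value assumed for this sketch */

static uint8_t crop_tbl[256 + 2 * MAX_NEG_CROP];

static void crop_tbl_init(void)
{
    for (int i = 0; i < 256; i++)
        crop_tbl[i + MAX_NEG_CROP] = (uint8_t)i;  /* identity for 0..255 */
    for (int i = 0; i < MAX_NEG_CROP; i++) {
        crop_tbl[i] = 0;                          /* inputs below 0 */
        crop_tbl[i + MAX_NEG_CROP + 256] = 255;   /* inputs above 255 */
    }
}

/* Clamp x to 0..255; valid only for -MAX_NEG_CROP <= x < 256 + MAX_NEG_CROP. */
static uint8_t clip_u8(int x)
{
    const uint8_t *cm = crop_tbl + MAX_NEG_CROP;
    return cm[x];
}
```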

Best regards,
Siarhei Siamashka
