[FFmpeg-devel] [PATCH] Some ARM VFP optimizations (vector_fmul, vector_fmul_reverse, float_to_int16)

Sun Apr 20 23:54:30 CEST 2008

On Sun, Apr 20, 2008 at 11:41:57PM +0200, Michael Niedermayer wrote:
> On Sun, Apr 20, 2008 at 11:33:43PM +0300, Siarhei Siamashka wrote:
[...]
> > Now about the problems why this code is still not in FFmpeg SVN. The most
> > important problem that stalled everything is 64-bit alignment. In order to
> > perform dual loads, data needs to be 64-bit (8 byte) aligned. This applies
> > to constants loading and accessing temporary data on stack. Constants loading
> > requires some kind of '.align' directive in the code. The problem is that
> > '.align' is nonportable and '.balign' or '.p2align' are not supported by 
> > broken Apple toolchain on x86 and apparently on ARM too (actually it has a 
> > lot more issues in addition to this one). One more problem is related to
> > stack, because legacy systems and toolchains (those that use OABI) support 
> > only 4 byte stack alignment. There was a tweak in the code that performs 
> > stack realignment when IDCT code is built for these broken legacy systems 
> > with absolutely no overhead on modern systems (this code does not get compiled
> > in for modern EABI systems that already provide 64-bit stack alignment), but
> > it was rejected by Michael Niedermayer. I strongly disagree with this decision
> > and breaking support for legacy systems for no reason is not a valid option
> > for me.
> 
> I do not remember exactly but there was a serious problem with your suggested
> "solution", IIRC it caused some serious speedloss on some systems.
> 
> anyway
> 1. Why do you need a aligned stack? That is why cant you use the heap?
> 2. Assuming you do need one, where was the problem with using a recent gcc
>    which supports maintaining stack alignment?
> 3. What effect does your solution have on systems which do align the stack
>    aka a recent gcc on pre EABI. Or even a non gcc compiler.

To awnser 3.
huge speedloss, and thats why this isnt a solution

> 
> 
> [...]
> > The last (actually minor) problem is related to crop table that is defined in
> > C code and used from assembly code. If anybody would ever change MAX_NEG_CROP
> > constant in C code for whatever reason, assembly implementation of ARMv5TE
> > IDCT will be silently broken (it will compile, but will break at runtime). I
> > would prefer to have compilation failure here. A good example is lowres
> > decoding support that is now broken for many less popular architectures
> > (lowres decoding was added at some time in the past, but nobody tweaked sparc
> > or alpha code to properly support it). Another option (which I initially used)
> > is just to have a copy of crop table stored in assembly source file, it will
> > take extra space, but will make code reliable and not dependent on any other
> > stuff from outside.
> > 
> > I really would like to hear some constructive feedback instead of "this is
> > unacceptable" with no alternatives suggested :)
> 
> #if MAX_NEG_CROP != 12345
> Please fix code in file foobar
> #endif

or better add a comment to the #define setting MAX_NEG_CROP

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080420/8cc98c5c/attachment.pgp>