[Ffmpeg-devel] [RFC] AltiVec optimizations, try 2

Guillaume POIRIER gpoirier
Wed Aug 2 22:57:20 CEST 2006


Luca Barbato wrote:
> Kostya wrote:
>> Here is my second attempt at writing optimized code.
>> Please test how it works on your Macs (my rough tests show ~6% speedup).
>> If there are no objections I'll commit it tomorrow.
>>
>>
>> ------------------------------------------------------------------------
>>
>> diff -ru --exclude .svn ffmpeg/libavcodec/Makefile ffmpeg-vc1/libavcodec/Makefile
> 
> svn diff ?
> 
>> diff -ru --exclude .svn ffmpeg/libavcodec/ppc/dsputil_ppc.c ffmpeg-vc1/libavcodec/ppc/dsputil_ppc.c
>> --- ffmpeg/libavcodec/ppc/dsputil_ppc.c	2006-07-27 18:22:09.000000000 +0300
>> +++ ffmpeg-vc1/libavcodec/ppc/dsputil_ppc.c	2006-07-31 11:51:04.000000000 +0300
>> @@ -251,6 +251,10 @@
>>  
>>  void dsputil_h264_init_ppc(DSPContext* c, AVCodecContext *avctx);
>>  
>> +#ifdef HAVE_ALTIVEC
>> +void vc1dsp_init_altivec(DSPContext* c, AVCodecContext *avctx);
>> +#endif
>> +
> 
> I think we should reorder the init code to be more rational; I'll do
> something about it soonish.
> 
>> +/* constants used in transform */
>> +static const vector int vec_64 = (vector int)64;
>> +static const vector int vec_7 = (vector int)7;
>> +static const vector int vec_5 = (vector int)5;
>> +static const vector int vec_4 = (vector int)4;
>> +static const vector int vec_3 = (vector int)3;
>> +static const vector int vec_2 = (vector int)2;
>> +static const vector int vec_1 = (vector int)1;
> 
> Define them with vec_splat_s32(value), and use
> vec_sl(vec_splat_s32(4), vec_splat_u32(4)) for 64.

Just out of curiosity, is it necessary to call vec_splat_s32 explicitly so 
that gcc uses the "splat" asm instruction? Otherwise, will it allocate 64, 
7, ... on the stack and load each register with these constants?

Also, as far as I understand how vec_splat_s32 works, it should be 
possible to generate a vector full of "64" with a single 
vec_splat_s32(64). So why is it desirable to use the form with more 
instructions (more decoding bandwidth, more dependencies, more execution 
unit slots used up)? Is this an optimization specific to the G4 or to 
AltiVec in general?
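
For reference, here is a rough scalar sketch of the two forms in question. 
It is not AltiVec code: the model_* names and the model_v4si type are made 
up for illustration. It assumes that the vec_splat_s32 immediate is a 5-bit 
signed literal (-16..15) and that vec_sl on words uses only the low 5 bits 
of each shift count. If those assumptions hold, 64 cannot be splatted 
directly, which would account for the splat-then-shift form quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a vector of four 32-bit words. */
typedef struct { int32_t w[4]; } model_v4si;

/* Assumption: the splat immediate must fit in 5 signed bits. */
static int model_splat_imm_ok(int imm)
{
    return imm >= -16 && imm <= 15;
}

/* Models vec_splat_s32(imm): every element gets the same small literal. */
static model_v4si model_splat_s32(int imm)
{
    model_v4si v;
    assert(model_splat_imm_ok(imm));
    for (int i = 0; i < 4; i++)
        v.w[i] = imm;
    return v;
}

/* Models vec_sl on words: each element of a is shifted left by the
 * low 5 bits of the matching element of count. */
static model_v4si model_sl(model_v4si a, model_v4si count)
{
    model_v4si v;
    for (int i = 0; i < 4; i++)
        v.w[i] = (int32_t)((uint32_t)a.w[i] << ((uint32_t)count.w[i] & 31));
    return v;
}

/* 64 does not fit the immediate range, so build it as 4 << 4. */
static model_v4si model_vec_64(void)
{
    model_v4si four = model_splat_s32(4);
    return model_sl(four, four);
}
```

In this model, the small constants from the patch (7, 5, 4, 3, 2, 1) each 
need a single splat, while 64 needs one splat plus one shift.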

Or am I just too blind to see the obvious solution?

Guillaume




