[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Mon Aug 23 12:18:47 CEST 2010

Luca Barbato wrote:
> On 08/23/2010 07:39 AM, Jason Garrett-Glaser wrote:
>> How in the world would intrinsics solve that problem?  There is no
>> compiler in the world that will magically rewrite your algorithm to
>> use completely different instructions on a given architecture.
> 
> You described what a compiler generally does...
> 
> you use c = a + b; not c = arch_specific_add_inst(a, b);
> 
> Generic vector intrinsics do exist (and yes, they do suck right now)

What is the point of it? I mean, compare these two code snippets:

void sum(float *out, const float *in, int size)
{
     vector float *o = (vector float *)out;
     const vector float *i = (const vector float *)in;

     size >>= 2;    /* size counts floats; 4 floats per vector */

     while(size--)
          *o++ += *i++;
}

void sum(float *out, const float *in, int size)
{
     /* both pointers 16-byte aligned, size a multiple of 4 */
     assert(!((uintptr_t)in&15));
     assert(!((uintptr_t)out&15));
     assert(!(size&3));

     while(size--)
          *out++ += *in++;
}

Why can't the compiler generate exactly the same ASM for both?

-Vitor
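
For reference, the two snippets can be written so that both compile with GCC or Clang and the scalar one hands the optimizer the same information the vector one implies. This is only a sketch under assumptions that are not in the post above: the names sum_vec, sum_scalar and v4f are made up for illustration, the vector type uses the generic vector_size extension instead of AltiVec's "vector float", and the scalar version adds restrict plus __builtin_assume_aligned, since a plain assert() only checks at run time and, without restrict, overlapping buffers would make the two loops compute different results. Whether a given compiler then emits identical code for both still depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Four floats in 16 bytes, via the GCC/Clang vector_size extension.
 * may_alias lets a v4f pointer legally access a plain float array. */
typedef float v4f __attribute__((vector_size(16), may_alias));

/* Explicit-vector version (hypothetical name). size counts floats and
 * is assumed to be a multiple of 4; both pointers 16-byte aligned. */
void sum_vec(float *out, const float *in, int size)
{
    v4f *o = (v4f *)out;
    const v4f *i = (const v4f *)in;

    size >>= 2;                 /* number of 4-float vectors */
    while (size--)
        *o++ += *i++;
}

/* Scalar version (hypothetical name). The asserts check the
 * preconditions at run time; restrict and __builtin_assume_aligned
 * state no-overlap and alignment to the optimizer. */
void sum_scalar(float *restrict out, const float *restrict in, int size)
{
    assert(!((uintptr_t)in & 15));
    assert(!((uintptr_t)out & 15));
    assert(!(size & 3));

    float *o = __builtin_assume_aligned(out, 16);
    const float *i = __builtin_assume_aligned(in, 16);

    while (size--)
        *o++ += *i++;
}
```

Comparing the output of something like "gcc -O3 -S" for the two functions is one way to see how close a particular compiler gets.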