[FFmpeg-devel] [PATCH] VP8 arithcoder asm
Jason Garrett-Glaser
darkshikari
Sun Jul 4 12:29:20 CEST 2010
On Sun, Jul 4, 2010 at 3:14 AM, Reimar Döffinger
<Reimar.Doeffinger at gmx.de> wrote:
> On Sun, Jul 04, 2010 at 02:25:18AM -0700, Jason Garrett-Glaser wrote:
>> This is rather odd, considering that the code looks a whole lot better
>> than what gcc generates, so there must be something stalling my code
>> that I'm missing, assuming my numbers are right. It couldn't possibly
>> be the extra pushes and pops implied by an extern call -- because at
>> least for me, calling the vp56_rac asm function repeatedly instead of
>> the merged tree function is actually faster, despite vastly more stack
>> thrashing.
>
> Maybe it causes the compiler to mess up completely in surrounding code?
> Sometimes the compiler manages to optimize code pieces together that
> are quite far apart, any kind of code it cannot see might confuse it.
As you can see, the timer doesn't *time* the surrounding code.
> Also, which compiler version do you use?
As I said, 4.3.
> Because e.g. your cmov-related
> "magic" does not work at all for me with gcc 4.4.4 and compiling for
> Phenom II, it always generates branches...
Relatedly, Yuvi confirmed that in the case that the compiler *doesn't*
generate the cmovs we want, the asm is significantly faster (~10%).
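
For illustration only (this is not the code from the patch): a minimal
sketch of the branch-versus-cmov question, using a toy decode step.
ToyState and the toy_* functions are hypothetical names, not FFmpeg's
VP56RangeCoder API, and renormalization and byte refill are left out.
Whether a given compiler turns either form into cmov or a branch is not
guaranteed and has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Toy decoder state: NOT FFmpeg's VP56RangeCoder, just the two fields
 * needed to show the select. Assumes value < range and split < range. */
typedef struct {
    uint32_t range;  /* current interval width */
    uint32_t value;  /* current code value */
} ToyState;

/* Branchy form: if the compiler keeps the `if` as a jump, the branch is
 * data-dependent and hard to predict. */
static int toy_decode_branchy(ToyState *s, uint32_t split)
{
    if (s->value >= split) {
        s->value -= split;
        s->range -= split;
        return 1;
    }
    s->range = split;
    return 0;
}

/* Branchless form: build an all-ones/all-zeros mask from the comparison
 * and select both updates with it, so no jump is needed. */
static int toy_decode_branchless(ToyState *s, uint32_t split)
{
    uint32_t bit  = s->value >= split;  /* 0 or 1 */
    uint32_t mask = 0u - bit;           /* 0x00000000 or 0xFFFFFFFF */

    s->value -= split & mask;
    s->range  = (split & ~mask) | ((s->range - split) & mask);
    return (int)bit;
}

/* Exhaustively compare the two forms over a small state space and
 * return the number of mismatches. */
static int toy_selfcheck(void)
{
    int mismatches = 0;
    for (uint32_t range = 128; range <= 255; range++)
        for (uint32_t split = 1; split < range; split++)
            for (uint32_t value = 0; value < range; value++) {
                ToyState a = { range, value };
                ToyState b = { range, value };
                int bit_a = toy_decode_branchy(&a, split);
                int bit_b = toy_decode_branchless(&b, split);
                if (bit_a != bit_b || a.range != b.range || a.value != b.value)
                    mismatches++;
            }
    return mismatches;
}
```

Both forms compute the same result; the difference under discussion is
only in what the compiler emits for them, which is why the gcc version
and target matter.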
Dark Shikari