[FFmpeg-devel] [PATCH] Faster CABAC H.264 residual decoding
Sun May 4 17:46:30 CEST 2008
On May 4, 2008, at 3:07 AM, Guillaume POIRIER wrote:
> On Sun, May 4, 2008 at 3:18 AM, Alexander Strange <astrange at ithinksw.com> wrote:
>> On May 3, 2008, at 5:21 PM, Michael Niedermayer wrote:
>>> I prefer uint8_t arrays, the speed loss they could cause is purely
>>> hypothetical while the int arrays will need 100 bytes more
>>> precious L1
>>> Besides this, the patch is ok if it's faster on average.
>> Here it is with uint8_t.
>> On Core 2:
>> old: avg 9.113 max 9.135 min 9.09
>> new: avg 9.068 max 9.077 min 9.061
>> It should be faster than with int on any RISC.
> How did you measure the speed? Did you measure overall decoding speed
> with "time", or did you use START_TIMER/STOP_TIMER around the
> relevant function?
I ran mplayer -benchmark many times on the same file and dropped all but
the lowest 4 times for each. This patch had a lot more effect than I
thought it would, so I guess START_TIMER would've worked fine. For
smaller changes it keeps giving me answers that are the opposite of
what measuring total time says, so I've been avoiding it. I meant to
look into using 'rdtscp' instead of rdtsc there; maybe that works.
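For anyone repeating this kind of measurement, here is a rough sketch of the "keep only the lowest few runs" step. It assumes the wall-clock times of the repeated runs have already been collected into an array; `RunStats` and `lowest_runs_stats` are made-up names for illustration and are not part of mplayer or FFmpeg.

```c
#include <stdlib.h>

/* Summary of the runs that were kept (hypothetical helper type). */
typedef struct {
    double avg, min, max;
} RunStats;

/* qsort comparator: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Sorts `times` in place, keeps the `keep` lowest values and returns
 * their average, minimum and maximum. Slower runs are assumed to be
 * disturbed by other system activity and are discarded.
 */
static RunStats lowest_runs_stats(double *times, size_t n, size_t keep)
{
    RunStats s = { 0.0, 0.0, 0.0 };
    double sum = 0.0;
    size_t i;

    if (n == 0 || keep == 0)
        return s;
    if (keep > n)
        keep = n;

    qsort(times, n, sizeof *times, cmp_double);

    for (i = 0; i < keep; i++)
        sum += times[i];

    s.avg = sum / keep;
    s.min = times[0];
    s.max = times[keep - 1];
    return s;
}
```

Calling `lowest_runs_stats(times, n, 4)` once for the old build and once for the new one gives avg/max/min figures in the same shape as the numbers quoted above.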
> I can test on PPC970 to have an idea of how this new code behaves on a
> RISC machine.
>> I'll apply it tomorrow
>> unless someone thinks it's still worse on Athlon.
> Well, I guess it's better to measure speed on Athlon rather than
> making a wild guess ;-)
Yeah, if I had one. The asm is a lot shorter with this version,
though, so I really hope it's not slower.
If it is, then:
uint8_t *ctx = (abslevelgt1 != 0 ? 0 : abslevel1) + abs_level_m1_ctx_base;
abslevel1 = FFMIN( 4, abslevel1+1 );
ctx = 5 + abslevelgt1 + abs_level_m1_ctx_base;
abslevelgt1 = FFMIN( 4, abslevelgt1+1 );
should also be better than the current code on x86, but it's not
really better on PPC where you don't have cmov.
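To make the idea concrete, here is a small standalone sketch comparing the two places the clamp can live. It only models the context offsets, not the CABAC decoder itself: the `bin` argument stands in for the decoded "level greater than 1" decision, the "clamp on use" variant is my assumption of what the current code does, and `LevelState`, `step_clamp_on_use`, `step_clamp_on_update` and `variants_agree` are hypothetical names. The starting values 1 and 0 for the two counters are also an assumption.

```c
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* The two running counters used to pick a context (illustrative only). */
typedef struct {
    int abslevel1;
    int abslevelgt1;
} LevelState;

/*
 * Variant A (assumed current behaviour): counters grow without bound
 * and are clamped with FFMIN each time a context offset is computed.
 * Returns the first context offset; *ctx2 gets the second offset when
 * bin is 1, or -1 when no second context is used.
 */
static int step_clamp_on_use(LevelState *s, int bin, int *ctx2)
{
    int ctx = (s->abslevelgt1 != 0 ? 0 : FFMIN(4, s->abslevel1));
    if (!bin) {
        *ctx2 = -1;
        s->abslevel1++;
    } else {
        *ctx2 = 5 + FFMIN(4, s->abslevelgt1);
        s->abslevelgt1++;
    }
    return ctx;
}

/*
 * Variant B (as proposed above): counters are clamped when they are
 * incremented, so the offset computation needs no FFMIN.
 */
static int step_clamp_on_update(LevelState *s, int bin, int *ctx2)
{
    int ctx = (s->abslevelgt1 != 0 ? 0 : s->abslevel1);
    if (!bin) {
        *ctx2 = -1;
        s->abslevel1 = FFMIN(4, s->abslevel1 + 1);
    } else {
        *ctx2 = 5 + s->abslevelgt1;
        s->abslevelgt1 = FFMIN(4, s->abslevelgt1 + 1);
    }
    return ctx;
}

/* Returns 1 if both variants pick identical offsets for every bin. */
static int variants_agree(const int *bins, int n)
{
    LevelState a = { 1, 0 }, b = { 1, 0 };
    int i;

    for (i = 0; i < n; i++) {
        int a2, b2;
        int a1 = step_clamp_on_use(&a, bins[i], &a2);
        int b1 = step_clamp_on_update(&b, bins[i], &b2);
        if (a1 != b1 || a2 != b2)
            return 0;
    }
    return 1;
}
```

Both variants select the same offsets as long as the counters start at 4 or below, since min(4, min(4, x) + 1) equals min(4, x + 1). The difference is only where the FFMIN sits relative to the context load, which is why the win depends on how cheaply the target can do the min.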