[Ffmpeg-devel] [PATCH] simple_idct_armv5te optimization
Siarhei Siamashka
siarhei.siamashka
Sat Sep 30 20:56:14 CEST 2006
On Saturday 30 September 2006 20:41, Michael Niedermayer wrote:
[...]
> > IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000
> > maxout=266 blockSumErr=64
> > IDCT SIMPLE-ARMv5TE: 153.4 kdct/s
> >
> > After patch:
> > IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000
> > maxout=266 blockSumErr=64
> > IDCT SIMPLE-ARMv5TE: 158.8 kdct/s
>
> patch looks ok (assuming its also faster with an actual video, instead of
> just dct-test)
A good point. Actually, on real video I was unable to see any
difference. And considering the increased code size, I now think it could
in theory even cause some slowdown if the program runs out of
instruction cache. I remember a discussion on the mplayer developers
mailing list about the h264 decoder and -O4 vs. -O2.
So it should be carefully benchmarked and investigated. As for the
current 'simple_idct_armv5.S', a strange thing is that it provides some
performance improvement over the older armv4 code for mpeg1 (up to 10%), but
has almost no effect for mpeg4 (within 1-2%) in my tests. And according to
the profiling results (on an x86 computer unfortunately, but with 'generic'
cpu and MMX/SSE and other stuff disabled) both mpeg1 and mpeg4 heavily use
IDCT, so some effect should have been observed. There should be some
explanation. I'll try to find a way to measure the effects of both data and
instruction caches.
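
For a rough idea of what to expect on real video, here is a small sketch
that turns the dct-test numbers above (153.4 and 158.8 kdct/s) into a
relative gain and then applies Amdahl's law to estimate the whole-decoder
speedup. The function names are made up for illustration, and the fraction
of decode time spent in IDCT is an assumption: it was not measured on ARM,
so any value plugged in is only a guess. Cache effects are not modelled at
all.

```c
/* Hypothetical helpers, not part of ffmpeg. */

/* Relative throughput gain, e.g. 0.05 means 5% more kdct/s. */
static double relative_gain(double kdct_before, double kdct_after)
{
    return (kdct_after - kdct_before) / kdct_before;
}

/*
 * Amdahl's law: idct_fraction is the share of total decode time spent
 * in IDCT before the change (ASSUMED, must come from profiling), and
 * idct_speedup is the factor by which IDCT alone got faster.
 * Returns the expected speedup factor of the whole decoder.
 */
static double overall_speedup(double idct_fraction, double idct_speedup)
{
    return 1.0 / ((1.0 - idct_fraction) + idct_fraction / idct_speedup);
}
```

With the numbers above, relative_gain(153.4, 158.8) is roughly 3.5%. If
IDCT took, say, 20% of the decode time (an assumed figure), then
overall_speedup(0.2, 158.8 / 153.4) comes out below 1% for the whole
decoder, which would be hard to tell apart from measurement noise.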