[Ffmpeg-devel] [PATCH] simple_idct_armv5te optimization
Måns Rullgård
mru
Sat Sep 30 22:17:24 CEST 2006
Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> On Saturday 30 September 2006 20:41, Michael Niedermayer wrote:
>
> [...]
>> > IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000
>> > maxout=266 blockSumErr=64
>> > IDCT SIMPLE-ARMv5TE: 153.4 kdct/s
>> >
>> > After patch:
>> > IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000
>> > maxout=266 blockSumErr=64
>> > IDCT SIMPLE-ARMv5TE: 158.8 kdct/s
>>
>> patch looks ok (assuming it's also faster with an actual video, instead of
>> just dct-test)
>
> A good point. Actually, on real video I was unable to see any visible
> difference. And considering the increased code size, I now think it may
> theoretically even cause some slowdown if the program runs out of
> instruction cache. I remember a discussion on the mplayer developers
> mailing list about the h264 decoder and -O4 vs. -O2.
>
> So it should be carefully benchmarked and investigated. Considering
> the current 'simple_idct_armv5.S', a strange thing is that it
> provides some performance improvement over older armv4 code for
> mpeg1 (up to 10%), but has almost no effect for mpeg4
> (within 1-2%) in my tests. And from the results of profiling (on an x86
> computer unfortunately, but with 'generic' cpu and MMX/SSE and other
> stuff disabled) both mpeg1 and mpeg4 heavily use IDCT, so some
> effect should have been observed. There should be some
> explanation. I'll try to find a way to measure effects of both data
> and instruction caches.
I'll have a look at it when I get time. Unfortunately, that will
probably not be within the next few days.
--
Måns Rullgård
mru at inprovide.com