[Ffmpeg-devel] Snow slicing support
Oded Shimon
ods15
Tue Apr 11 06:49:47 CEST 2006
On Thu, Apr 06, 2006 at 05:19:57PM +0200, Michael Niedermayer wrote:
> Hi
>
> On Mon, Apr 03, 2006 at 09:47:58PM +0300, Oded Shimon wrote:
> > Just thought this patch might be of general interest to anyone. What I
> > find interesting about it is that it's not the sliced output that helps
> > at all, but the rearranging of how the data is handled, of unpacking
> > coeffs separately from decoding the image. It is actually a surprisingly
> > huge difference on my cpu, almost 20% faster in some cases. This code
> > trades off code switches against data switches, and even in my high res
> > video (944x544), code switches proved to be far more expensive...
> >
> > I don't really expect this patch to go in CVS, but I am interested in any
> > comments if anyone has any...
>
> this needs testing with different resolutions, bitrates and cpus
> (320x240 720x576 p4 athlon ...)
>
> is this speed difference also there with other gcc versions
> and most interesting, is it there too at lower -O
>
> if it's consistently faster (or at least not slower) then this should be
> applied
Do you have any suggestions for how to test this efficiently? Cache
performance is hard to benchmark, especially in high level code. :/
Running mplayer -benchmark several times gave me wildly varying results:
without patch:
BENCHMARKs: VC: 108.872s VO: 17.123s A: 1.205s Sys: 32.865s = 160.065s
BENCHMARKs: VC: 102.149s VO: 15.351s A: 1.198s Sys: 33.220s = 151.918s
BENCHMARKs: VC: 99.299s VO: 15.920s A: 1.517s Sys: 34.233s = 150.970s
BENCHMARKs: VC: 101.674s VO: 16.263s A: 1.284s Sys: 32.215s = 151.436s
with patch:
BENCHMARKs: VC: 97.398s VO: 15.675s A: 1.299s Sys: 36.363s = 150.734s
BENCHMARKs: VC: 95.429s VO: 15.321s A: 1.174s Sys: 38.613s = 150.536s
BENCHMARKs: VC: 96.610s VO: 15.528s A: 1.181s Sys: 37.275s = 150.594s
BENCHMARKs: VC: 95.816s VO: 15.297s A: 1.197s Sys: 38.248s = 150.558s
(these are old benchmarks, and on that single file)
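One way to make the comparison a bit less impressionistic is to summarize the VC column of the runs above. The following is only a minimal sketch and not part of the patch: the helper names (`mean`, `spread`) are made up for illustration, and the only real data are the eight VC times copied from the output above.

```c
#include <assert.h>
#include <stddef.h>

/* VC (video codec) times in seconds, copied from the mplayer -benchmark
 * output above. Everything else in this file is illustrative. */
static const double vc_without_patch[] = { 108.872, 102.149, 99.299, 101.674 };
static const double vc_with_patch[]    = {  97.398,  95.429, 96.610,  95.816 };

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Spread of the samples: largest value minus smallest value. */
static double spread(const double *x, size_t n)
{
    double lo, hi;
    assert(n > 0);
    lo = hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return hi - lo;
}

/* Relative reduction of the mean time, e.g. 0.05 means 5% less time. */
static double relative_gain(const double *before, size_t nb,
                            const double *after, size_t na)
{
    double mb = mean(before, nb);
    return (mb - mean(after, na)) / mb;
}
```

Calling `relative_gain(vc_without_patch, 4, vc_with_patch, 4)` on these numbers gives roughly 0.065, i.e. the mean VC time drops from about 103.0 s to about 96.3 s. The spread is about 9.6 s without the patch and about 2.0 s with it, and the slowest patched run (97.398 s) is still faster than the fastest unpatched run (99.299 s). With only four runs per side this is a rough indication, not a proper significance test.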
In this case the difference was still obvious, but the results are very
noisy. Is there a better way to do this? Maybe START_TIMER around the
whole decode() function?
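To illustrate the idea of timing the whole decode call rather than the whole player, here is a rough sketch. It deliberately does not use FFmpeg's START_TIMER; it uses the standard C `clock()` instead, and `decode_stub` and `best_cpu_seconds` are hypothetical stand-ins, not real decoder functions. It repeats the call and keeps the fastest run, on the assumption that the minimum is less disturbed by scheduling and other noise than a single measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the decoder entry point (NOT FFmpeg code).
 * It just touches every byte so that there is some work to time. */
static unsigned decode_stub(unsigned char *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(buf[i] * 31u + 7u);
        sum += buf[i];
    }
    return sum;
}

/* Call fn `runs` times and return the smallest CPU time of a single call,
 * in seconds. The results of fn are accumulated into *sink so that the
 * compiler cannot discard the calls. */
static double best_cpu_seconds(unsigned (*fn)(unsigned char *, size_t),
                               unsigned char *buf, size_t len,
                               int runs, unsigned *sink)
{
    double best = -1.0;
    assert(runs > 0);
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        *sink += fn(buf, len);
        clock_t t1 = clock();
        double dt = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

A call such as `best_cpu_seconds(decode_stub, buf, sizeof buf, 5, &sink)` would then be made once per build (with and without the patch) on the same input. Note that `clock()` measures CPU time at fairly coarse resolution, so this only makes sense around a call that runs long enough, such as a whole frame decode.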
- ods15