[Ffmpeg-devel] Snow slicing support

Oded Shimon ods15
Tue Apr 11 06:49:47 CEST 2006


On Thu, Apr 06, 2006 at 05:19:57PM +0200, Michael Niedermayer wrote:
> Hi
> 
> On Mon, Apr 03, 2006 at 09:47:58PM +0300, Oded Shimon wrote:
> > Just thought this patch might be of general interest to anyone. What I 
> > find interesting about it is that it's not the sliced output that helps 
> > at all, but the rearranging of how the data is handled: unpacking 
> > coeffs separately from decoding the image. It is actually a surprisingly 
> > huge difference on my cpu, almost 20% faster in some cases. This code trades 
> > off code switches against data switches, and even in my high res video 
> > (944x544), code switches proved to be far more expensive...
> > 
> > I don't really expect this patch to go in CVS, but I am interested in any 
> > comments if anyone has any...
> 
> this needs testing with different resolutions, bitrates and cpus
> (320x240 720x576 p4 athlon ...)
> 
> is this speed difference also there with other gcc versions
> and most interesting, is it there too at lower -O
> 
> if it's consistently faster (or at least not slower) then this should be
> applied

Do you have any suggestions for how to test this efficiently? Cache 
performance is hard to benchmark, especially in high-level code. :/

Running mplayer -benchmark several times gave me wildly varying results:

without patch:
BENCHMARKs: VC: 108.872s VO:  17.123s A:   1.205s Sys:  32.865s =  160.065s
BENCHMARKs: VC: 102.149s VO:  15.351s A:   1.198s Sys:  33.220s =  151.918s
BENCHMARKs: VC:  99.299s VO:  15.920s A:   1.517s Sys:  34.233s =  150.970s
BENCHMARKs: VC: 101.674s VO:  16.263s A:   1.284s Sys:  32.215s =  151.436s

with patch:
BENCHMARKs: VC:  97.398s VO:  15.675s A:   1.299s Sys:  36.363s =  150.734s
BENCHMARKs: VC:  95.429s VO:  15.321s A:   1.174s Sys:  38.613s =  150.536s
BENCHMARKs: VC:  96.610s VO:  15.528s A:   1.181s Sys:  37.275s =  150.594s
BENCHMARKs: VC:  95.816s VO:  15.297s A:   1.197s Sys:  38.248s =  150.558s

(these are old benchmarks, and on that single file)
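
To put rough numbers on the runs above, here is a minimal Python sketch (not 
part of the patch; the helper names vc_times and summarize are made up for 
illustration) that pulls the VC time out of each BENCHMARKs line and compares 
mean, sample standard deviation and minimum for the two sets. It assumes the 
VC column is the relevant one, since that is where the decoder time shows up.

```python
import re
import statistics

# The eight BENCHMARKs lines quoted above, copied verbatim.
WITHOUT_PATCH = """\
BENCHMARKs: VC: 108.872s VO:  17.123s A:   1.205s Sys:  32.865s =  160.065s
BENCHMARKs: VC: 102.149s VO:  15.351s A:   1.198s Sys:  33.220s =  151.918s
BENCHMARKs: VC:  99.299s VO:  15.920s A:   1.517s Sys:  34.233s =  150.970s
BENCHMARKs: VC: 101.674s VO:  16.263s A:   1.284s Sys:  32.215s =  151.436s
"""

WITH_PATCH = """\
BENCHMARKs: VC:  97.398s VO:  15.675s A:   1.299s Sys:  36.363s =  150.734s
BENCHMARKs: VC:  95.429s VO:  15.321s A:   1.174s Sys:  38.613s =  150.536s
BENCHMARKs: VC:  96.610s VO:  15.528s A:   1.181s Sys:  37.275s =  150.594s
BENCHMARKs: VC:  95.816s VO:  15.297s A:   1.197s Sys:  38.248s =  150.558s
"""


def vc_times(log):
    """Return the VC (video codec) time in seconds from each BENCHMARKs line."""
    return [float(t) for t in re.findall(r"VC:\s*([0-9.]+)s", log)]


def summarize(times):
    """Mean, sample standard deviation and minimum of a list of timings."""
    return {
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "min": min(times),
    }


before = summarize(vc_times(WITHOUT_PATCH))
after = summarize(vc_times(WITH_PATCH))

# Fraction of mean VC time saved by the patch.
saving = 1 - after["mean"] / before["mean"]

for name, s in (("without patch", before), ("with patch", after)):
    print(f"{name}: mean {s['mean']:.3f}s  stdev {s['stdev']:.3f}s  min {s['min']:.3f}s")
print(f"mean VC time saved: {saving:.1%}")
```

On these eight runs that works out to a mean VC of about 103.0s without the 
patch against about 96.3s with it, roughly 6.5% less, with a spread of about 
4.1s and 0.9s respectively. Four runs per side is a small sample, so this is 
only a rough indication.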

In this case the difference was still obvious, but the results are very 
noisy. Is there a better way to do this? Maybe START_TIMER around the 
whole decode() function?

- ods15




