[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code

Daniel Egger degger at fhm.edu
Sat Jan 18 20:34:03 CET 2003


On Sat, 2003-01-18 at 18:07, Romain Dolbeau wrote:

> Yet most expected functions can be seen in the profile,
> so they are profiled.
> Also, the machine where the profile was generated was
> an 800 MHz PPC7450 w/o L3 cache, so as soon as you're
> out of the 128KB L2, memory accesses are very expensive.
> (it's regular PC133, not even DDR on this box...)

Sure, but the top functions are certainly not what I would have
expected. For instance, this is the profile on a G4 running Linux,
playing an MPEG4 with mp3 audio:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 21.50     10.00    10.00 45664560     0.00     0.00  idctSparseColAdd
  9.16     14.26     4.26 47547936     0.00     0.00  idctRowCondDC
  6.73     17.39     3.13  2759489     0.00     0.00  put_pixels8_xy2_c
  6.64     20.48     3.09  2716259     0.00     0.00  put_no_rnd_pixels8_xy2_c
  5.40     22.99     2.51 19075470     0.00     0.00  mpeg4_decode_block
  5.29     25.45     2.46   406296     0.01     0.01  synth_1to1
  3.46     27.06     1.61  3397968     0.00     0.00  ff_h263_decode_mb
  2.86     28.39     1.33  3397968     0.00     0.01  MPV_decode_mb
  2.69     29.64     1.25      729     1.71     1.71  play
  2.39     30.75     1.11  2798586     0.00     0.00  mpeg_motion
  2.13     31.74     0.99   406296     0.00     0.00  dct64_1
  2.04     32.69     0.95  3320142     0.00     0.00  MPV_motion
  1.93     33.59     0.90  5708070     0.00     0.00  simple_idct_add
  1.85     34.45     0.86  9409352     0.00     0.00  h263_decode_motion
  1.72     35.25     0.80   261652     0.00     0.00  put_pixels16_x2_c
  1.66     36.02     0.77   485686     0.00     0.00  dct36
  1.66     36.79     0.77   256612     0.00     0.00  put_no_rnd_pixels16_x2_c
  1.57     37.52     0.73   338964     0.00     0.00  ff_emulated_edge_mc

As you can see, the top offender is the iDCT, which is exactly what one
would expect because it's really compute-intensive. The mp3 iDCT has
far less data to process than the video one (bandwidth ratio of 1/7)
and thus ranks quite a bit lower. The put_* functions perform the already
mentioned MC and are quite expensive because of their memory-touching
nature.
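
To put rough numbers on that reading, here is a small sketch that adds up
the "% time" column for two groups of functions from the profile above.
The grouping is my own assumption (idctSparseColAdd, idctRowCondDC and
simple_idct_add as "video iDCT"; synth_1to1, dct64_1 and dct36 as "mp3
decoding"), and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* "% time" values copied from the flat profile above. Which functions
 * belong to which group is an assumption made for this sketch. */
static const double video_idct_pct[] = {
    21.50, /* idctSparseColAdd */
     9.16, /* idctRowCondDC    */
     1.93  /* simple_idct_add  */
};
static const double mp3_pct[] = {
    5.29,  /* synth_1to1 */
    2.13,  /* dct64_1    */
    1.66   /* dct36      */
};

/* Hypothetical helper: sum n percentage entries. */
double sum_pct(const double *v, size_t n)
{
    double total = 0.0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Share of samples spent in the functions grouped as video iDCT. */
double video_idct_share(void)
{
    return sum_pct(video_idct_pct,
                   sizeof video_idct_pct / sizeof video_idct_pct[0]);
}

/* Share of samples spent in the functions grouped as mp3 decoding. */
double mp3_share(void)
{
    return sum_pct(mp3_pct, sizeof mp3_pct / sizeof mp3_pct[0]);
}
```

With this grouping the video iDCT functions account for roughly a third
of the samples and the mp3 functions for somewhat under a tenth.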

> If the heavy computations are in-cache and the memory
> copies are off-cache, then the copies will eat up
> plenty of time. If you (copy, compute_on_copy),
> then most of the memory latency will be seen in the
> copy instead of the computations.

True, but most memory copies are implicit rather than explicit calls
to memcpy (and especially not memmove), and thus will show up
in the functions touching the memory, not in any memcpy/memmove routines.
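
To illustrate what I mean by an implicit copy, here is a minimal sketch
in the spirit of the put_*_xy2 functions from the profile. It is not the
libavcodec code; the name interp_xy2_8 is made up, and I'm assuming the
usual rounded four-neighbour average for half-pel interpolation in x and
y. Note that the source must provide h+1 rows of 9 readable pixels:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the libavcodec routine: half-pel interpolation
 * in both x and y over an 8-pixel-wide block of h rows. */
void interp_xy2_8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    assert(dst != NULL && src != NULL && h > 0);
    for (int y = 0; y < h; y++) {
        const uint8_t *s0 = src + y * stride; /* current source row */
        const uint8_t *s1 = s0 + stride;      /* source row below   */
        for (int x = 0; x < 8; x++) {
            /* Four loads and one store per output pixel: the block is
             * effectively copied from src to dst, but no memcpy or
             * memmove is ever called. */
            dst[x] = (uint8_t)((s0[x] + s0[x + 1] +
                                s1[x] + s1[x + 1] + 2) >> 2);
        }
        dst += stride;
    }
}
```

Any cache misses on src or dst are charged to this function in the
profile, which is why the memory cost shows up under the put_* entries
and not under a copy routine.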

> I don't have a G4 w/ L3 to verify this theory :-(

Me neither.

> I added -pg to config.mak, isn't that enough ? (after removing
> all .a and .o, of course).

Yes, it is; I just tried it on Linux. The CFLAGS are correctly propagated
to all subdirectories, as they should be.

>  I'm not sure what is a proper libc for profiling on MacOSX ...

I've no idea; I don't have the space to install the developer tools on
the OS X partition of my PowerBook. Maybe I'll try later on my iBook
(if my girlfriend permits)... :)

-- 
Servus,
       Daniel