[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code

Romain Dolbeau dolbeau at irisa.fr
Sat Jan 18 14:34:15 CET 2003


Daniel Egger wrote:

> And it works under Linux too, for instance for all players 
> using libmpeg2 in a more recent version.

I don't doubt libmpeg2, I was talking about my rip-off
of the code included in my mplayer patch. I won't claim
it works until I've tested it :-)

> Any plans to tackle libavcodec MC support? I could send you some
> (currently not working) code to start with. Unfortunately the alignment
> of the data changed quite a lot and thus it's almost impossible 
> to write fast versions without handling a dozen special cases. :/

What's MC? I know absolutely nothing about image/sound/signal
processing, I'm just a comp.arch guy trying to spare his
precious CPU cycles :-)

Seriously, it seems that on MacOSX the latest mplayer is pretty
fast, and CPU consumption is spread over many functions.

The number one CPU eater is YUV420To2VUY_W1x, a QuickTime
function. I assume that '2VUY' is the Radeon's favorite
YUV format for overlay, and as libSDL doesn't handle it
directly, QuickTime converts the data on-the-fly.
Apart from this one (and the gprof functions :-),
memmove and memset are bad...
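
For reference, here is a rough sketch in C of what such a conversion
has to do. It assumes that '2VUY' is the packed 4:2:2 layout with bytes
in U Y0 V Y1 order, that the planes are tightly packed (no row padding),
and that each chroma row is simply reused for two luma rows. The name
yuv420p_to_2vuy is made up for illustration; this is not QuickTime's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert planar YUV 4:2:0 (separate Y, U, V planes) to packed 4:2:2
 * in U Y0 V Y1 byte order. Assumption: this is the layout QuickTime
 * calls '2VUY'. Planes are assumed tightly packed, with no row padding.
 * dst must hold width * height * 2 bytes. */
void yuv420p_to_2vuy(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                     uint8_t *dst, int width, int height)
{
    /* 4:2:0 subsampling needs even dimensions. */
    assert(width % 2 == 0 && height % 2 == 0);

    for (int row = 0; row < height; row++) {
        const uint8_t *yrow = y + (size_t)row * width;
        /* One chroma row covers two luma rows; it is reused, not interpolated. */
        const uint8_t *urow = u + (size_t)(row / 2) * (width / 2);
        const uint8_t *vrow = v + (size_t)(row / 2) * (width / 2);
        uint8_t *out = dst + (size_t)row * width * 2;

        /* Each pair of pixels becomes four output bytes. */
        for (int col = 0; col < width; col += 2) {
            out[0] = urow[col / 2];
            out[1] = yrow[col];
            out[2] = vrow[col / 2];
            out[3] = yrow[col + 1];
            out += 4;
        }
    }
}
```

Every output byte is a plain copy, so the cost is almost entirely
memory traffic: 1.5 bytes read and 2 bytes written per pixel.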

These are all the functions eating up at least 1% of CPU
according to gprof for one particular file (first
2048 frames only):

#####
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  18.8       9.74     9.74                             _YUV420To2VUY_W1x [3]
  18.4      19.24     9.50                             _moncount (99654)
   6.5      22.58     3.34                             _memmove [12]
   5.4      25.40     2.82                             mcount (15826)
   4.0      27.46     2.06                             _gmc1_altivec [14]
   3.8      29.43     1.97                             _memset [15]
   2.4      30.67     1.24                             _mach_msg_overwrite [19]
   2.3      31.84     1.17                             _nanosleep [20]
   2.2      32.99     1.15                             _RateConvertStereo16AltiVec [21]
   2.0      34.04     1.05 13765710     0.00     0.00  _mpeg4_decode_block [22]
   1.7      34.94     0.90   164880     0.01     0.01  _synth_1to1 [23]
   1.7      35.82     0.88                             _syscall [24]
   1.6      36.65     0.83                             _put_pixels16_altivec [25]
   1.6      37.46     0.81  2457600     0.00     0.00  _MPV_decode_mb [9]
   1.5      38.26     0.80                             _loadVectorShort [26]
   1.3      38.95     0.69                             _idct_add_altivec [27]
   1.1      39.53     0.58   901254     0.00     0.00  _put_pixels8_xy2_c [28]
   1.0      40.07     0.54  2829604     0.00     0.00  _mpeg_motion [11]
   1.0      40.58     0.51                             _avg_pixels16_altivec [29]
   1.0      41.09     0.51                             _sdevCopyBuffer [30]
#####

There's not much left to do IMHO. Can't avoid Amdahl's law :-(

(Unless it's easy to output 2VUY from mplayer, and fix
libSDL to feed that to the hardware... QT can take 2VUY
directly it seems)
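
To put rough numbers on the Amdahl's law remark, here is a small sketch
in C. The helper name amdahl_speedup is made up for illustration, and it
takes the gprof percentages above at face value as fractions of run time
(which ignores the profiling overhead itself).

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the run time
 * (0 <= p <= 1) is made s times faster and the rest is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.04 (the share of _gmc1_altivec) and s = 2, the overall gain is
1 / 0.98, about 1.02x. Even removing the 18.8% spent in _YUV420To2VUY_W1x
entirely cannot give more than 1 / (1 - 0.188), about 1.23x.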

BTW, this is on OSX 10.1; mplayer works on it after all.

-- 
Romain Dolbeau


