[Mplayer-dev-eng] processor specific code

Arpi arpi at thot.banki.hu
Wed Sep 12 00:35:06 CEST 2001


Hi,

> Hello,
>  I'm conferring with debian at Tonelli.sns.it about uploading mplayer into
> debian unstable.  Besides unifying configs ( we need to handle all those
> funny things like fonts/codecs/general configs, and resolve win32 dlls
> stuff ) we are thinking about the single-binary problem.
> ie: mplayer detects the cpu and compiles cpu-specific code. from debian's
> perspective we're only considering x86 and sparc archs, and only x86 for
> now.
> We would like to have a single package of mplayer, but we don't want mplayer
> to run slow just because we compiled it with mcpu=i386.
It is not only not recommended, it's forbidden.

> So, we are thinking about ways around it - there is a proposed debian policy
> that makes all binaries put their cpu-specific code into shared libraries
> and include the right one at runtime. But as I understand it there would
> be a noticeable performance penalty.
> Another solution we thought about is having a few of the most common setups -
> for pentium,pII,pIII,pIV,k6,k6-2,duron/athlon and i386 just in case; the
> package would provide binaries like mplayer.athlon, mplayer.i386, and a little
> script or installation script would run the correct one.
It would be the easiest to implement, but it bloats the package a lot...
nobody wants to download a 15 MB package to use 1 MB of it.
Maybe a separate package for each architecture...

> There was an idea like this:
> > replace all
> >  ...#ifdef  HAVE_MMX
> > by
> >    if( have_mmx)
> but it would be a lot of work, wouldn't be nice and I suspect there would
> be large speed penalties due to branch-prediction misses. On cpus like
> pIV that would really hurt.
Yes. Forget it.

> Maybe someone has a better idea how to handle this?
Function pointers are the only way to handle it nicely and relatively fast
(but still a few % slower than #ifdef, see the ffmpeg benchmarks - we made
this change there).
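
A minimal sketch of the function-pointer approach, not the actual ffmpeg or
mplayer code: the names add_sat, add_sat_mmx, cpu_has_mmx and init_dispatch
are made up for illustration, the "MMX" variant is only a stand-in that calls
the C version, and __builtin_cpu_supports is a GCC/clang builtin assumed to be
available on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: saturating add of two byte buffers. */
typedef void (*add_sat_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable C version, always available. */
static void add_sat_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Stand-in for an MMX version. A real one would use hand-written MMX
 * code; here it just forwards to the C version so the sketch stays
 * portable and gives identical results. */
static void add_sat_mmx(uint8_t *dst, const uint8_t *src, size_t n)
{
    add_sat_c(dst, src, n);
}

/* Runtime CPU detection (assumption: GCC/clang builtin on x86). */
static int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("mmx") != 0;
#else
    return 0;
#endif
}

/* The pointer every caller uses; it defaults to the safe C version. */
add_sat_fn add_sat = add_sat_c;

/* Pick the implementation once, at startup. */
void init_dispatch(void)
{
    add_sat = cpu_has_mmx() ? add_sat_mmx : add_sat_c;
    assert(add_sat != NULL);
}
```

Call init_dispatch() once at startup; after that every call goes through the
pointer. That indirect call, which the compiler generally cannot inline, is
the kind of overhead you pay compared to #ifdef.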

> We also think that we would deliver only a couple of video outputs -
> namely sdl, ggi, x11 and xv.
> Or maybe also dga, fbdev, svga and aa, and make users with different needs
> use current practice - download cvs and build mplayer themselves.
The libvo2 API is designed to be (optionally) pluginized.
I mean there will be a config option to compile libvo as a common (not
environment-specific) core plus plugins (one for each lib, like dga1, dga2,
xv, x11, various (unfortunately incompatible) sdl versions...)

As they will only initialize the device/lib and set up the requested number
of buffers, they are not so speed-critical. All speed-critical stuff (actual
rendering / colorspace conversion) will be in the common libvo2 core.
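
A rough sketch of that split, under heavy assumptions: this is not the real
libvo2 API. The vo_plugin struct, the vo_null plugin and core_draw_inverted
are hypothetical names, and the byte inversion only stands in for real
colorspace conversion. The point is that the plugin only sets up buffers,
while the per-pixel loop lives in the common core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical plugin interface: only setup and buffer management. */
typedef struct vo_plugin {
    const char *name;
    int      (*init)(int width, int height, int nbuffers);
    uint8_t *(*get_buffer)(int index);
    void     (*uninit)(void);
} vo_plugin;

/* A "null" plugin that keeps its buffers in plain memory. */
#define NULL_MAX_BUFFERS 4
static uint8_t *null_bufs[NULL_MAX_BUFFERS];
static int null_count;

static void null_uninit(void)
{
    for (int i = 0; i < NULL_MAX_BUFFERS; i++) {
        free(null_bufs[i]);
        null_bufs[i] = NULL;
    }
    null_count = 0;
}

static int null_init(int width, int height, int nbuffers)
{
    if (width <= 0 || height <= 0 ||
        nbuffers < 1 || nbuffers > NULL_MAX_BUFFERS)
        return -1;
    null_uninit();                      /* drop any previous buffers */
    for (int i = 0; i < nbuffers; i++) {
        null_bufs[i] = calloc((size_t)width * (size_t)height, 1);
        if (!null_bufs[i]) {
            null_uninit();
            return -1;
        }
    }
    null_count = nbuffers;
    return 0;
}

static uint8_t *null_get_buffer(int index)
{
    return (index >= 0 && index < null_count) ? null_bufs[index] : NULL;
}

const vo_plugin vo_null = { "null", null_init, null_get_buffer, null_uninit };

/* Common core: the speed-critical per-pixel work is here, so it is
 * shared by all plugins and can be optimized in one place. */
int core_draw_inverted(const vo_plugin *vo, int index,
                       const uint8_t *src, size_t len)
{
    assert(vo != NULL && src != NULL);
    uint8_t *dst = vo->get_buffer(index);
    if (!dst)
        return -1;
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(255 - src[i]);  /* stand-in for conversion */
    return 0;
}
```

The plugin calls happen once per setup or once per frame, so going through
pointers there costs next to nothing compared to the per-pixel loop.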

The same can be done for libao2 too.

I think libao/vo are the most problematic parts, because they depend not
only on the CPU but also heavily on system libraries, the X version and
enabled kernel features.

The other parts are only CPU-specific - that should be solved somehow.
I think 2 variations are available:
1. using function pointers, and runtime cpu detection.
2. loading external libraries at runtime.
  2.a. using shared (.so) libs, but compiled without -fPIC, so at least
       on the x86 platform it would work very fast, with no noticeable
       slowdown. if we use -fPIC, it means an average 3-10% slowdown, but
       some architectures (non-x86) require it.
  2.b. loading static (.a/.o) libraries, as XFree 4.x does. it's very hard
       to implement (it's really linking the binary at runtime) but has no
       speed loss at all.
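
For variation 2.a, loading could look roughly like this. It is only a sketch:
the "./libmpcpu_<cpu>.so" naming scheme and the functions cpu_lib_path and
load_cpu_symbol are invented for illustration. dlopen/dlsym/dlclose are the
standard POSIX calls.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the (hypothetical) library path for a CPU name such as "k6".
 * Returns 0 on success, -1 if the name is empty or does not fit. */
int cpu_lib_path(char *out, size_t size, const char *cpu)
{
    assert(out != NULL && cpu != NULL);
    if (strlen(cpu) == 0)
        return -1;
    int n = snprintf(out, size, "./libmpcpu_%s.so", cpu);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Open the CPU-specific library and look up one symbol in it.
 * On success the library handle is stored in *handle_out (the caller
 * passes it to dlclose() later) and the symbol address is returned.
 * On any failure NULL is returned and *handle_out is left untouched. */
void *load_cpu_symbol(const char *cpu, const char *symbol, void **handle_out)
{
    char path[256];
    if (cpu_lib_path(path, sizeof path, cpu) != 0)
        return NULL;

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return NULL;            /* no library for this CPU: fall back */

    void *sym = dlsym(handle, symbol);
    if (!sym) {
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return sym;
}
```

The caller would cast the returned address to the right function-pointer
type and fall back to the built-in C version when NULL comes back. Older
glibc versions need -ldl at link time.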

These will solve mmx/sse/3dnow etc code selection, but the remaining
C code will still be optimized for a single cpu :(

So maximal performance will still need a full recompilation of mplayer,
inlining functions and optimizing the C code for the given CPU(s).

MPlayer is one of those few applications where speed really matters a lot;
even a few % slowdown can make it unusable on systems that are only just
fast enough.

My time plan:
1. finish libvo2 - this is the most important thing to do now.
   it is required for nearly all TODO items - GUI, aspect ratio, direct
   rendering, software scaling, encoding etc.
2. make cpu-specific packages with ao/vo plugins, so binary-maniac users
   will be happy while still using optimized code. but it means many
   (~10) packages.
3. implement optional function pointers (done for a few things, like mp3lib,
   libmpeg2 and libavcodec, but should be done for the other stuff too)
4. make cpu-specific packs. only the C code matters, the rest is selected
   at runtime. it means only a few packs: for 486, 586, 586mmx, 686, k6.
   this can't be reduced further without a big, noticeable speed loss.


A'rpi / Astral & ESP-team

--
mailto:arpi at thot.banki.hu
http://esp-team.scene.hu

_______________________________________________
Mplayer-dev-eng mailing list
Mplayer-dev-eng at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mplayer-dev-eng


