arpi at thot.banki.hu
Tue Oct 9 19:43:45 CEST 2001
> > > > It is not worth supporting! If you can select between functions optimized
> > > > for various CPUs it would take much more memory and a bit more CPU time
> > > > (function pointers). If you make dynamically loadable "plugins" it would
> > > > decrease performance by about 10 to 20 percent (afaik). The latter
> > > > because on x86 relocation information has to be kept in a register (and
> > > > shared objects are this kind of thing ..).
> > >
> > > Uhh, this is complete nonsense. If written in any reasonable manner it
> > > will be no slower. Also, mplayer already links against half a dozen
> > P*L*E*A*S*E! Don't flame me, I have probably written much more x86 assembly
> > than you. These are the facts! Please read some docs and learn more, because it's
> > a very lame attitude to say "complete nonsense" even if you're not right :)
> Then get the facts straight. There is no magic fairy that makes code
> slower just because it is loaded after the main binary. There aren't
> any function pointers involved after it's started, either - you call
> through one pointer when you start the decoding core (or whichever
> chunk of code you happen to be selecting), and that's it.
Ok, then the facts.
I've made measurements, and Walken (libmpeg2/mpeg2dec author) also did some.
You are simply wrong.
Let's see what happens if you're using a single binary with built-in codecs:
It's mapped into the process' virtual memory, usually loaded at 0x08048000.
Every variable has a constant memory address, calculated at _linking_ time.
So referencing a variable is a simple mov eax,[0xXXXXXXXX]. (intel syntax)
Now, if you have this code in a shared library:
it's mapped into memory around 0x40000000, just after the previous .so
library. so its memory address depends on the environment, the size of other
libraries and the program which is linked with it.
memory addresses are calculated at _runtime_, not at link time.
In the code placed into the library, referencing any function, variable
etc. requires an index, which is usually kept in a register (ebx).
So, the example above looks like: mov eax,[ebx+0xXXXXXXXX].
It means one less register can be used by the program. It isn't a problem for
MIPS, where 2x32 registers are available, but it is a problem for x86, where
only a few registers are available, and most of them have special functions.
It results in up to 20% performance loss in cpu-intensive tasks like divx
decoding or postprocessing. I personally got max 8% performance loss on my
900MHz celeron2, but it runs on a 100MHz bus and has a very fast cache, so ram
access isn't much slower than registers. Btw 8% is still too much.
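
To make the addressing difference concrete, here is a minimal sketch. The
file name, frame_counter and count_frames are made up for illustration and
are not taken from mplayer. Compiling it to assembly twice on 32-bit x86,
once without and once with -fPIC, should show the absolute-address form in
one listing and the register-relative form in the other; the exact output
depends on the compiler version and flags, so treat it as something to try,
not as a guaranteed result.

```c
/* pic_demo.c - hypothetical example, not part of mplayer.
 *
 * Compare the generated code on 32-bit x86:
 *   gcc -m32 -O2 -S pic_demo.c -o nopic.s        (absolute addresses)
 *   gcc -m32 -O2 -S -fPIC pic_demo.c -o pic.s    (GOT-relative, needs a
 *                                                 base register)
 */

/* A variable with static storage: in a non-PIC build its address is a
 * constant fixed at link time (or patched in by load-time relocation). */
static int frame_counter;

/* Adds n to the counter and returns the new value. In a PIC build every
 * access to frame_counter goes through the base register. */
int count_frames(int n)
{
    for (int i = 0; i < n; i++)
        frame_counter += 1;
    return frame_counter;
}
```

Diffing nopic.s against pic.s is the quickest way to see how the variable
is addressed in each build.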
But. We are on linux now. (note: mplayer works on non-linux systems too)
There are 2 ways to create shared libraries.
1.: with -fPIC
2.: without -fPIC
the first version is the standard way of building shared libs, and it is really shared then. i
mean many instances of this lib use the same memory. but it requires that
index register mentioned above.
the second version (without -fPIC) isn't the standard way, but it works under linux!
it means runtime relocation of the library. as it's relocated (patched), it
can't be used by more than one process (so it doesn't share the memory, just
the disk space), but it avoids using the index register, so it means <1% speed
loss. that's acceptable for us.
back to the original topic:
we aren't against using shared libraries. but the current design and API of
mplayer don't allow it. i mean mainly libvo, which is very
environment-dependent (available X, SDL etc libs, versions).
so, we have to change API design (we're working on libvo2) first, and then
we can see how to split it into plugins (shared libs).
until it's done, pre-configured pre-built binaries are bad things!!!!!!
another issue: runtime cpu detection.
it's done by only a few things in mplayer (libmpeg2 partially, libavcodec fully,
libmp3 partially, others: none)
someone should review these mmx/sse/3dnow etc optimizations, and change all
of them to function pointers set at initialization, at runtime, using _common_
cpu detection. we have no manpower for this now.
if you're interested, tell us. patches are welcome.
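
A rough sketch of what "function pointers set at initialization" could look
like. All names here (add_bytes, add_bytes_c, add_bytes_mmx, cpu_has_mmx,
init_cpu_dispatch) are hypothetical and not mplayer code. The "mmx" variant
is only a plain C stand-in so the sketch builds anywhere; a real one would
contain the hand-written assembly. Detection uses the GCC/clang builtins
__builtin_cpu_init() and __builtin_cpu_supports(), which is an assumption
about the compiler, not the common detection code asked for above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine: add src into dst, saturating at 255. */
typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable C version, always available. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Stand-in for an MMX version. It just calls the C code here; a real
 * implementation would use inline assembly or intrinsics. */
static void add_bytes_mmx(uint8_t *dst, const uint8_t *src, size_t n)
{
    add_bytes_c(dst, src, n);
}

/* Set once at startup. Defaults to the C version so that calling it
 * before init_cpu_dispatch() is still safe. */
add_bytes_fn add_bytes = add_bytes_c;

/* Returns nonzero if the CPU reports MMX (x86 with GCC/clang only). */
static int cpu_has_mmx(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("mmx") != 0;
#else
    return 0;
#endif
}

/* One common detection step picks the variant for the whole program. */
void init_cpu_dispatch(void)
{
    add_bytes = cpu_has_mmx() ? add_bytes_mmx : add_bytes_c;
}
```

After init_cpu_dispatch() has run, callers simply use add_bytes(dst, src, n)
and pay for one indirect call per invocation, not for a cpu check.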
> You don't even need to delay loading, it's quite possible to build all
> versions into the main binary - even easier to setup, but increases
> the virtual image size (which doesn't matter much on a system with a
> non-broken paging system).
if you want to create modular pre-built binaries, you have to use shared
libraries for environment-dependent functions (like -ao and -vo drivers), or
the user has to install tons of libraries (from svgalib to libesound),
while he probably uses only a few options (one for yuv and one for rgb).
now he can configure mplayer at compile time, and compile support (and
dependency) only for the libs he wants to use. plugins are the only way
to get this in binary packages.
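
As an illustration of the compile-time selection described here, a minimal
sketch of a driver table. The vo_driver struct, the HAVE_X11 macro and the
function names are assumptions made for this example; the real libvo API
differs. Only the drivers enabled at configure time end up in the table, so
the binary depends only on their libraries. A plugin build would fill such
a table at runtime instead (for example via dlopen()), which is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface, not the real libvo one. */
typedef struct {
    const char *name;                    /* value given to -vo */
    int (*init)(int width, int height);  /* returns 0 on success */
} vo_driver;

/* Driver with no external dependencies: accepts any sane size. */
static int null_init(int width, int height)
{
    return (width > 0 && height > 0) ? 0 : -1;
}

#ifdef HAVE_X11
/* Placeholder: a real driver would call into Xlib here. */
static int x11_init(int width, int height)
{
    (void)width;
    (void)height;
    return -1;
}
#endif

/* Only drivers selected at compile time are listed. */
static const vo_driver vo_drivers[] = {
#ifdef HAVE_X11
    { "x11", x11_init },
#endif
    { "null", null_init },
};

/* Looks up a driver by name; returns NULL if it was not compiled in. */
const vo_driver *find_vo_driver(const char *name)
{
    size_t count = sizeof vo_drivers / sizeof vo_drivers[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(vo_drivers[i].name, name) == 0)
            return &vo_drivers[i];
    return NULL;
}
```

Asking for a driver that was left out at build time simply returns NULL,
which is where a binary package would otherwise need a plugin.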
> You appear to have completely missed my point. You subscribe to the
> mplayer-users list by choice; if you don't like hearing from users,
he is an mplayer developer. you did nothing in mplayer. so?
A'rpi / Astral & ESP-team
mailto:arpi at thot.banki.hu