[MPlayer-dev-eng] Adding threaded functionality to mplayer

Attila Kinali attila at kinali.ch
Wed Sep 8 14:48:27 CEST 2004


On Mon, Sep 06, 2004 at 04:24:53PM +0100, Ed Wildgoose wrote:
 
> Most people have AGP, but someone told me that this is an asynchronous 
> interface...?  Faster to copy in than out...?  Anyway, PCI-X is on the 
> horizon, and this promises to let you do really silly things like use 
> your graphics card as a general purpose DSP processor!  This should be 
> really funky, and the bandwidth will be awesome.  Boards are already 
> available from Asus and others.

lol, do you really believe that PCI-Express will solve the I/O issue ?
Today's normal PCI bus can handle about 133MByte/s in theory (33.3MHz,
32bit), but in reality you can hardly get over 80MB/s with a single
device, and using more than one decreases the overall bandwidth further.
Even 80MB/s would be enough for most videos, but unfortunately we have
another bottleneck: RAM. Every I/O operation goes through RAM, and as long
as this stupid PC design continues, that will remain one of the major
bottlenecks.
If you are interested in how our GHz CPU future looks, read [1].

Also note that PCI-X is not the same as PCI-Express. PCI-X denotes the
extended PCI bus (iirc introduced around PCI 2.3) which uses 64bit and/or
66.6MHz.

Also note that PCI-Express is not really needed for graphics cards;
AGP is fast enough (iirc about as fast as PCI-Express will be). The reason
the industry pushes it is that they need a fast interface for the
upcoming GBit and 10GBit ethernet interfaces, and using the same interface
for both networking and graphics cuts costs (higher volume).
But the PC would actually profit more from a redesign of its I/O
architecture than from new busses; that just doesn't sell as well as
bigger numbers.

> Cutting back to the original point though about threading mplayer.  
> Adding threads is not really about performance, and whoever said that 

MPlayer is mostly about two things: performance and stability.
And if you think that threading will not have any drawback on
performance, then you should walk out of your university and have a look
at the real world. Having multiple threads that each touch large amounts
of memory (where large is more than a few cache lines) run concurrently
is a huge performance loss compared to running them in a non-preemptive,
round-robin manner, because they evict each other's working sets from the
cache. Search for cache efficiency and related topics if you want to know
more.
Also note that context switching is very expensive; depending on the CPU
and operating system it costs between hundreds and thousands of cycles.
Nothing that you want to do too often.
Read the various documents on microkernel performance for an in-depth
analysis of this.

> synchronisation became harder was missing the point.  Basically all I 
> was suggesting is that you have one thread displaying video, and one 
> running the audio.  Basically at a high level the video thread prepares 
> a frame and then blocks until the audio thread signals that the video 
> frame is due to be displayed, and effectively yields to the video 
> thread.  Video displays a frame and then blocks again until the audio 
> thread tellis it that it's time to display again.

How does the audio thread know how long the video thread has to wait ?
What happens if the video thread needs more than one frame time to
decode _and_ display a frame ? Also you would need a separate
file reader/demuxer thread, because both audio and video need it
independently, so you already have 3 threads to synchronize, not 2.
How do you handle the case of a subtitle file ? Or a separate audio
file ?

> In the same way you can have a keyboard loop sending instructions to the 
> other threads about what to play.

That makes it 4 threads to synchronize.

> It's quite a simple design compared with the event loop that we have at 
> the moment, 

I still don't think that's true. I agree that the current state machine
(an event loop is something different) is not easy to understand and
contains a lot of obfuscated code. But that's mainly because it just
grew from a simple loop and thus lacks design. If you think it can be
done that easily, fork MPlayer and prove it.

> effectively you just split main() into a number of chunks 
> and get them all to run in their own loops with appropriate 
> synchronisation back to the main thread for master control.  In practice 
> of course you need to be careful not to have a whole bunch of problems 
> caused by threading, but the top down design becomes very simple.

A nice top-down design with a hackish implementation isn't better than a
hackish implementation with no design.

> Anyway, I'm not going to bang on about this too much.  Splitting up 
> main() will go a long way to making this practical, and I will have a 
> bash when I get some time.  After that it will be pretty much a case of 
> showing a patch to the new broken up main that runs small chunks of it 
> in it's own thread.  We can argue about the merits much better when 
> there is some code to look at, but I guess I am unlikely to go down that 
> route anyway if there is resistance!
 
There is only resistance against your claims. Most of the developers
around MPlayer had to learn how to write efficient programs a long time
ago, when computers still ran at 10MHz and every cycle was needed to
get the job done. (Though we now have access to GHz processors, it is
still not true that we can afford to waste cycles.) Now you come along
and say that you can do the same thing in a way most people know is
slower, claiming it will be as fast as the current method. Can you see
why that is hard to believe ?


			Attila Kinali

[1] J. Ousterhout, "Why Aren't Operating Systems Getting Faster As Fast
As Hardware?", USENIX Summer Conference, 1990




More information about the MPlayer-dev-eng mailing list