[MPlayer-dev-eng] Adding threaded functionality to mplayer NODAEMON

Roberto Ragusa mail at robertoragusa.it
Sat Sep 11 09:59:49 CEST 2004


On Fri, 10 Sep 2004 21:15:47 -0400
D Richard Felker III <dalias at aerifal.cx> wrote:

> ok i'll write a more technical reply too.

Good, let me clear a couple of misunderstandings.

> > > On Fri, Sep 10, 2004 at 12:04:26PM +0200, Roberto Ragusa wrote:
> > 
> > > > According to vmstat there were 10 cs/s, so a timeslice of 100ms, which is
> > > > still large, but not very far from a player requirement (10ms?).
> > > 
> > > huh? i think you got this backwards. with 100ms timeslices, video
> > > smoothness will be DESTROYED.
> > 
> > But I didn't say 100, I said 10! :-)
> 
> look at what i quoted. you said 100ms. maybe you wrote the numbers
> wrong though.

What I meant was
"a timeslice of 100ms, which is still large, but not very far from a player's
requirement (which is what, 10ms?)"
and not
"a timeslice of 100ms, which is still large, but not very far from a player's
requirement (the difference is what, 10ms?)".

So "the requirement is 10ms", which I estimated from 40ms frame durations
at 25fps.

> > 100ms is not far from 10ms (only a factor 10)
> 
> 100ms is several frames' duration. 10ms is less than one frame. HUGE
> difference.

Maybe I confused you by saying "not far", because I considered 10ms and
100ms near.

What I was saying is
"100ms is near 10ms, so context switch cost will not get much much worse"
and not
"100ms is near 10ms, so if a player works at 10ms it can work at 100ms".

The player will certainly switch before 100ms, but if the operating system
can give a 100ms timeslice, the switches will mostly occur only when the
player wants them (that is, when a thread wants to sleep).
So there are no useless high-rate switches hurting performance.
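
To make the numbers explicit, here is a small sketch of the arithmetic
(Python; the function names are mine, and reading "cs/s" as "one timeslice
per context switch" is a simplifying assumption, not a measurement):

```python
def timeslice_ms(context_switches_per_sec):
    # Assumption: each context switch ends one timeslice, so the average
    # timeslice is simply the inverse of the switch rate.
    return 1000.0 / context_switches_per_sec


def frame_duration_ms(fps):
    # Time available for one frame at a given frame rate.
    return 1000.0 / fps


slice_ms = timeslice_ms(10)             # 10 cs/s as reported by vmstat
frame_ms = frame_duration_ms(25)        # 25fps video
frames_per_slice = slice_ms / frame_ms  # frames that elapse in one slice

print(slice_ms, frame_ms, frames_per_slice)
```

So a full 100ms slice spans more than two frames, which is why the player
must give up the CPU voluntarily well before the slice expires.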

> if it's a 300 mhz p2 (not celeron) or a xeon (did xeons that slow even
> exist? i'm not sure), then it probably has a plenty cache. even if
> not, you'll be playing a smaller movie (typically 640x288 or 512x384)
> since a box that slow can never play dvd resolution.

So you're saying that a CPU can basically play only movies with frames
fitting its cache, because overflowing the cache destroys the performance,
and when the cache is small the raw power of the CPU is probably also
too limited to make the deadlines.

OK, it could be true (partially at least).

> so do i... i don't think your data scales right. newer cpus have
> probably taken a lot of steps to make context switch less expensive.

So smaller machines suffer more, you're saying. OK, reasonable.

> > 1) fitting a frame into cache helps speed a lot
> 
> yes, but even if a whole frame doesn't fit, cache coherency still
> helps. for example, when decoding a frame, assuming motion isn't too
> chaotic, motion vectors for several subsequent macroblocks will come
> from the same general area of the picture, and it will help for this
> area to be in the cache. also, cache helps a lot with slice
> processing.
> 
> if you want to compare, try playing a really big movie that won't fit
> in your cache, first with cache, then with cache disabled in the bios
> setup... :)

Nice try :-)
but this is not fair: you're also uncaching all the code, and we are
talking about accidental cache thrashing, not a 100% miss rate.

> > 3) fast CPU have a (perhaps) big enough cache
> > doesn't a strategy assuming big caches only help fast CPUs?
> 
> re-read what i just said about snow. snow already requires an insanely
> fast cpu....in fact at present it can't be decoded realtime on _any_
> cpu. this will change once we get some more optimization but it's
> still going to be intensive.

In my ignorance, I'm hearing about snow for the first time and will
search for information about it. I think we're lacking a seriously
powerful codec at the moment.

Then there is dirac, where I'm afraid cache issues will be a problem.

> basically, if decoding movies is only taking 3-6% cpu time, it's time
> to design a new codec that's a lot more space-efficient, because the
> old speed restrictions don't apply anymore.

Yes, in fact I was discussing here a 3D (x,y,time) wavelet coder.
That would have multimegabyte "frames" (actually groups of frames)
and it's possible that only GPUs can do it fast enough.

> > 1) a huge main loop, hard to understand, hard to modify because
> > side effects could happen
> 
> agree. threading makes this worse, not better.

Well, no.
The main loop would not exist at all.
All the complexity would be in the processing blocks but there would be
a very clean interface (request frame, send frame, receive order, send
status info...).
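
A rough sketch of what I mean by a clean interface (Python just for
brevity; the class and method names are hypothetical, not anything that
exists in mplayer):

```python
from abc import ABC, abstractmethod


class Block(ABC):
    """Hypothetical processing block; the rest of the player only sees this."""

    def __init__(self, name):
        self.name = name
        self.paused = False

    @abstractmethod
    def request_frame(self):
        """Return the next frame, or None if nothing is available now."""

    def receive_order(self, order):
        # Orders are the only way to change a block's state from outside.
        if order == "pause":
            self.paused = True
        elif order == "resume":
            self.paused = False
        else:
            raise ValueError("unknown order: %r" % (order,))

    def status(self):
        # Status info a block reports about itself.
        return {"name": self.name, "paused": self.paused}


class CountingSource(Block):
    """Toy source that hands out the frame numbers 0..count-1."""

    def __init__(self, name, count):
        super().__init__(name)
        self.count = count
        self.sent = 0

    def request_frame(self):
        if self.paused or self.sent >= self.count:
            return None
        frame = self.sent
        self.sent += 1
        return frame
```

Whoever drives the blocks never touches their internals, so the "calling
this before that will not work" kind of side effect has nowhere to hide.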

> > 2) simple things like OSD during paused video are problematic
> 
> changing main won't help this one bit. it's a fundamental problem that
> can't be overcome without making the player slower. once you've
> displayed a frame, it's very possible that it no longer exists
> anywhere readable, so you can't change osd on it without getting a new
> frame.

True.

> > 3) playing a 25fps video at 50fps deinterlaced is not possible
> 
> this is a fundamental problem in the video filter layer and the main
> loop structure. again it has nothing to do with being non-threaded.
> mplayer g2 can do full-rate deinterlacing just fine, because it runs
> the filter chain in the correct order (pulling from the end).

Good. Is g2 already usable? I thought it was at a design stage.
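
As far as I understand "pulling from the end", the output asks the last
filter for a frame, which asks the one before it, and so on, so a filter
is free to hand out two frames for each one it pulls. A minimal sketch
with Python generators (all names are mine and hypothetical, this is not
g2's API; timestamps are integer milliseconds and the "fields" are just
labels):

```python
def source(n_frames, frame_ms=40):
    # 25fps interlaced source: one frame every 40ms, each with two fields.
    for i in range(n_frames):
        yield {"pts_ms": i * frame_ms, "fields": ("top", "bottom")}


def bob_deinterlace(frames, frame_ms=40):
    # Emits two pictures per input frame, half a frame period apart.
    half = frame_ms // 2
    for frame in frames:
        top, bottom = frame["fields"]
        yield {"pts_ms": frame["pts_ms"], "picture": top}
        yield {"pts_ms": frame["pts_ms"] + half, "picture": bottom}


def display(chain):
    # The output end drives the chain by pulling; nothing runs until asked.
    return [picture["pts_ms"] for picture in chain]


timestamps = display(bob_deinterlace(source(2)))
print(timestamps)
```

Two 25fps input frames come out as four pictures 20ms apart, that is 50fps.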

> > 4) playing real time streams without overflows/underflows is not possible
> 
> not sure what this means exactly.

Simply this: "mplayer /dev/dvb/adapter0/dvr0", the player will quit after
a few seconds. "-cache 4000" will gain you extra time.

Mplayer takes the timing from the sound card and reads when it needs data.

If mplayer is too fast, reading from the dvb device will fail, so mplayer
thinks it's EOF and quits.

If mplayer is too slow, the kernel dvb buffer fills up and the read returns
-EOVERFLOW. I don't remember if mplayer quits or ignores the errors and
goes on (but data has been lost in the kernel, so there is video corruption).
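
A toy model of the two failure modes (a sketch, not mplayer or DVB driver
code; a single buffer stands in for everything between the card and the
decoder, and the rates are made-up bytes per second):

```python
def simulate(produce_rate, consume_rate, capacity, initial_fill, seconds):
    """Step one second at a time; return (event, time_in_seconds).

    produce_rate: bytes/s delivered by the broadcast (fixed by the sender)
    consume_rate: bytes/s read by the player (fixed by the sound card clock)
    """
    fill = initial_fill
    for t in range(1, seconds + 1):
        fill += produce_rate - consume_rate
        if fill < 0:
            return ("underflow", t)   # player too fast: read fails, "EOF"
        if fill > capacity:
            return ("overflow", t)    # player too slow: data is dropped
    return ("ok", seconds)


print(simulate(1000, 1010, 4000, 0, 300))     # no prefill
print(simulate(1000, 1010, 4000, 2000, 300))  # prefilled, like -cache
print(simulate(1000, 990, 4000, 2000, 300))   # player slightly slow
```

With any mismatch between the two clocks the buffer only postpones the
failure; it does not remove it, which is why a bigger cache just gains
extra time.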

> > 5) playing a 1fps animation affects reaction times of keypresses
> 
> yes, this sucks. but it's easy to write a correct main loop without
> this problem.
[...] 
> a good design can use threads, but does not depend on it. most of the
> work i was doing on designing a new video layer couldn't care less
> whether the calling app is using threads or not...

Maybe I shouldn't have focused my comments on threads.
What I was really proposing is a modularization of the processing blocks
with well defined interfaces eliminating subtle side effects (such as,
calling this before this will not work...).
A good approach could be (you said you already know how to design a good
architecture, but let me have a try too) a main loop consisting
basically of a scheduler: it keeps info about all the functional blocks,
knows who has "something to do", and decides who to run based on a
deadline strategy. When that abstraction is in place you can add
threads if you want (2 threads with 2 CPUs will certainly be better).
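
Something like this, as a minimal single-threaded sketch (Python for
brevity; the names are hypothetical and "earliest deadline first" is just
one possible deadline strategy, picked here as an assumption):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A functional block with a list of pending job deadlines (seconds)."""
    name: str
    jobs: List[float] = field(default_factory=list)

    def has_work(self):
        return bool(self.jobs)

    def next_deadline(self):
        return min(self.jobs)

    def run_once(self):
        # Stand-in for real work: just retire the most urgent job.
        deadline = self.next_deadline()
        self.jobs.remove(deadline)
        return deadline


def schedule(blocks):
    """Run blocks until none has work; return the order they ran in."""
    order = []
    while True:
        ready = [b for b in blocks if b.has_work()]
        if not ready:
            return order
        # Earliest deadline first among the blocks with something to do.
        chosen = min(ready, key=lambda b: b.next_deadline())
        order.append((chosen.name, chosen.run_once()))


blocks = [
    Block("video", [0.040, 0.080]),
    Block("audio", [0.020, 0.060]),
    Block("input", [0.010]),
]
for name, deadline in schedule(blocks):
    print(name, deadline)
```

The scheduler knows nothing about what a block does, only whether it has
work and how urgent it is, so running the chosen block on another thread
later would not change the blocks themselves.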

But I think that without a more threaded approach the problem
"I have to display a frame in 2ms, but if I try to process other data
in the meantime I will miss the deadline" has no solution, because of
the preemptive/cooperative multitasking analogy I used before.

> i'm just tired of pro-thread propaganda...

No propaganda, just pleasant conversation.

-- 
   Roberto Ragusa    mail at robertoragusa.it



