[Mplayer-advusers] Bug? Anomalous CPU usage when playing HDTV clips.

John Stebbins stebbins at jetheaddev.com
Mon Apr 12 19:51:35 CEST 2004


A reply to Richard, and some additional timing analysis.

First the analysis.  I ran the same tests again, but this time
on the radeon driver.  It turns out its bugs are completely
different from the i830 bugs.  I can find no indication that
the radeon driver waits for vsync as the i830 driver does.
What I have found instead is that the bandwidth to the frame
buffer being used for XV seems to be about 5 times lower when
using the radeon.  With the i830, I get around 1.1 GB/s transfer
rate.  With the radeon, I only get about 210 MB/s.  On a
P4 2.6 GHz, mplayer actually decodes a 1280x720 frame faster
than XV can copy the bytes.

Note that I took great pains to minimize other
factors that could obscure the results.  Both tests were run
on the same hardware (except for the addition of the radeon
card itself).  I made sure that the code path followed in both
drivers is as similar as possible.  This meant using the YUY2
image format.  In both drivers, this results in a simple
memcpy call to copy the buffer.  I made sure the dimensions of
the region being copied are the same and thus the same number
of bytes are being copied.
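
To put those transfer rates in per-frame terms, here is a rough
sketch of the arithmetic.  It assumes YUY2 at 2 bytes per pixel,
decimal units (1 GB = 10^9 bytes), and that the whole 1280x720
frame is copied each time.  The function names are made up for
illustration and do not come from any driver.

```python
# Rough per-frame copy cost for a 1280x720 YUY2 frame.
# Assumptions: 2 bytes/pixel (YUY2 is packed 4:2:2), decimal MB/GB.

WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 2


def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size in bytes of one packed frame."""
    return width * height * bpp


def copy_ms(bandwidth_bytes_per_s, nbytes=None):
    """Milliseconds needed to copy nbytes at the given bandwidth."""
    if nbytes is None:
        nbytes = frame_bytes()
    return nbytes / bandwidth_bytes_per_s * 1000.0


I830_BW = 1.1e9     # ~1.1 GB/s, as measured above
RADEON_BW = 210e6   # ~210 MB/s, as measured above

print(frame_bytes())                 # 1843200 bytes per frame
print(round(copy_ms(I830_BW), 2))    # roughly 1.7 ms
print(round(copy_ms(RADEON_BW), 2))  # roughly 8.8 ms
```

At 60 fps there are only about 16.7 ms per frame, so under these
assumptions the radeon copy alone would take more than half of
the frame time.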

There is a bug entered in the XFree86 Bugzilla that indicates
others are seeing this problem as well.  This driver seems
to have taken a drastic performance hit around the time of the
XFree86 4.3 release.  The bug number is 414 if anyone is
interested in seeing the history and other people's efforts
to analyze it.  Also, before I saw this bug, I entered another
one (number 1292) documenting what I had seen.  I will be adding
comments to these bug reports soon.

After seeing all this, I took a closer look at the XFree86
log file and noticed the following warning:

(WW) RADEON(0): [agp] AGP not available

I'm going to do some spelunking in the driver to find out 
what causes this warning and see if it is related.

Reply to Richard below...

On Sun, 2004-04-11 at 14:34, D Richard Felker III wrote:
> On Fri, Apr 09, 2004 at 05:37:46PM -0700, John Stebbins wrote:
> > Some additional info for anyone following this thread.
> > 
> > I got tired of groping in the dark on this, so I horked the
> > X sources, added some usec timing measurements and profiled
> > what's going on for one of the drivers.  I did this on my desktop
> > machine which is a P4 2.6 GHz with intel i830 integrated graphics.
> > 
> > The color conversion and scaling is indeed being done completely
> > in hardware as I thought it should be.  The problem turns out
> > to be in a busy wait loop in the driver.  It appears to be waiting
> > for a vsync (as someone suggested earlier).  What I see is that
> > sometimes, the timing between mplayer & the X driver is just right
> > so the busy wait finishes within usecs.  Then other times the timing
> > will get completely wrong and the busy wait will spin for as long
> > as 14 msec.  This is fairly modal behavior as well.  When the timing
> > gets short it tends to stay short for a while, and when it gets long
> > it tends to stay long.  The propensity for the code to get into the
> > long wait mode seems to increase when mplayer is "working harder".
> > So high resolutions or high frame rates exaggerate the problem.
> > 
> > This busy wait is disastrous for 720p HD because this format runs
> > at 60fps which is 16 ms per frame.  When X chews up 14 ms per frame,
> > that leaves just 2 ms for decoding.  Ouch!
> > 
> > I don't know yet if the radeon driver has exactly the same problem.
> > I'll be diving into that one next.
> > 
> > I'm not sure what to do about the busy wait.  I don't see an obvious
> > way to synchronize closely to vsync without waiting.  Any ideas anyone?
> 
> Proper hardware should never require a busy loop; it just has a
> "switch buffers at next vblank" command you send to the card. Are you
> sure you're using the real backend scaler and not a stupid
> blitter-based xv port?
> 
Yes, very sure.  I've become far more familiar with the XFree86
XV code than I'd like to be :-P

The driver does use a command to the hardware that says
"switch buffers at next vblank".  But as implemented, the driver
only uses one buffer at a time in this code path (though 2 are
allocated).  Before copying data into the next buffer, it always
waits for the previous buffer to become free.  It sits in a busy
wait loop reading a status register until the previous operation
is complete.  Then it switches "current" buffers, fills the
"current" buffer, issues the command to refresh on vsync and
returns.

I don't know enough about the hardware details to suggest an
alternative yet.  The problem they seem to be working around is
that the status register they are reading to check for completion
can only be used to indicate the completion of one buffer at a time.
If they allowed both buffers to be in use simultaneously, they
would not be able to tell if one or both buffers were complete.
If they were to miss one, the status register would never update
again and they would be stuck in the loop indefinitely.  In fact,
the loop actually has a max number of iterations defined just in
case this were to happen in some other unknown scenario.
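
As an illustration of the pattern described above (not the actual
driver code, which is C and reads a hardware register), a bounded
busy wait might look roughly like this.  The names
wait_until_done and make_fake_register, and the default iteration
cap, are hypothetical.

```python
def wait_until_done(read_status, max_iterations=100000):
    """Spin on read_status() until it returns True.

    Returns the number of polls that came back "not done" before
    success, or None if the iteration cap was reached first.
    """
    for polls in range(max_iterations):
        if read_status():
            return polls
    return None  # gave up: the status never reported completion


def make_fake_register(ready_after):
    """Hypothetical stand-in for a status register: it reports
    "done" only after it has been read ready_after times."""
    reads = 0

    def read_status():
        nonlocal reads
        reads += 1
        return reads > ready_after

    return read_status


print(wait_until_done(make_fake_register(3)))             # 3
print(wait_until_done(make_fake_register(10**9), 1000))   # None
```

The cap turns a missed completion into a bounded delay instead of
a hang, but every poll before that still burns CPU time.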

Note that this problem only shows up if the frame rate of the
content is >= the refresh rate of the monitor (or you are running
a benchmark that does not meter the output frame rate).
In my case, I am working with 720p HD content which is 60fps.
There are several reasons why someone may use a 60Hz refresh rate.
To name a few: LCD, plasma, projector, DVI output, and (in my case)
a broken XFree86 driver that uses a BIOS call that is not
implemented on all systems to set the refresh rate (fixed in
XFree86 4.4 I've heard).
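
A toy model may help show why the frame rate matters.  The sketch
below is not derived from the driver source.  It assumes a single
buffer in flight, a flip that completes at the next vblank after
it is queued, and made-up decode and copy times of 2.0 ms and
1.7 ms.  It does not try to reproduce the modal behavior I
measured.

```python
import math


def simulate(refresh_hz, decode_ms, copy_ms, frame_interval_ms,
             frames=20):
    """Return the busy-wait time (ms) the driver spends per frame.

    frame_interval_ms is the player's pacing; 0 means unmetered
    (benchmark style), so frames are submitted as fast as possible.
    """
    period = 1000.0 / refresh_hz
    now = 0.0
    flip_done = 0.0  # when the previously queued flip completes
    waits = []
    for k in range(frames):
        # Player: decode, then sleep until this frame's time slot.
        now += decode_ms
        now = max(now, k * frame_interval_ms)
        # Driver: spin until the previous buffer is free.
        wait = max(0.0, flip_done - now)
        waits.append(wait)
        now += wait
        # Driver: copy the frame, queue the flip for next vblank.
        now += copy_ms
        flip_done = (math.floor(now / period) + 1) * period
    return waits


unmetered = simulate(60, 2.0, 1.7, 0.0)
half_rate = simulate(60, 2.0, 1.7, 1000.0 / 30)
print(round(unmetered[-1], 2))  # about 12.97 ms spinning per frame
print(max(half_rate))           # 0.0, the flip is always done in time
```

In this model the unmetered case settles at a wait of one refresh
period minus the decode and copy times, while 30fps content on a
60Hz display never waits at all.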

John




