[MPlayer-dev-eng] Re: Should I write Voodoo Banshee VIDIX driver?

Alban Bedel albeu at free.fr
Sat Mar 20 12:29:45 CET 2004

Hi Georgi Petrov,

on Sat, 20 Mar 2004 01:20:04 -0800 (PST) you wrote:

> I'm amazed how fast I get a response, that's really great!
> Tobias, thank you too for trying to help!
> I begin...
> I saw colorspaces.txt and I blame myself for asking too many questions
> that are answered there. Actually, a long time ago I read this file and
> everything I (don't) know was based on memories of what's written inside
> :)
> Things I figured out from your previous posts (can be wrong):
> 1) pitch=stride >= width * bpp
> Is here bpp like this:
> YV12 -> 12 bits -> bpp=12 bits=1,5 bytes
> YUY2 -> 16 bits -> bpp=16 bits=2 bytes

No. For planar formats each plane has its own stride.

> 2) for YV12 planar I have this layout in memory: YUV YUV YUV YUV...,
> where Y plane :w*h*8 bits
> U plane :(w/2)*(h/2)*8 bits
> V plane :(w/2)*(h/2)*8 bits
> (I can be wrong about YUVYUVYUV memory layout!!!)

Yes, you are wrong. You have 3 independent buffers:

y buffer (y_stride*height bytes)

u buffer (u_stride*height/2 bytes)

v buffer (v_stride*height/2 bytes) 

They are separate buffers, so you have a different stride for each buffer.
If you encounter a place where the format is YV12 and you have a single
pointer, then the planes are simply appended one after another. If you can
only set one stride, then it assumes u_stride = v_stride = y_stride/2.
If you can't set a stride at all, it probably assumes y_stride = width.
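
The single-pointer case can be sketched in C as below. This is only an
illustration, not code from any driver: the struct and function names are
made up, it assumes an even height, and it assumes the one-stride convention
described above (chroma stride = luma stride / 2). In YV12 the V plane is
stored before the U plane.

```c
#include <stddef.h>

/* Offsets and total size of a YV12 frame held in one contiguous buffer.
 * Hypothetical helper type, for illustration only. */
struct yv12_layout {
    size_t y_stride, c_stride;   /* bytes per row: luma, chroma */
    size_t y_off, v_off, u_off;  /* plane offsets from the buffer start */
    size_t size;                 /* total bytes needed */
};

/* Assumes an even height and u_stride = v_stride = y_stride / 2. */
static struct yv12_layout yv12_single_buffer(size_t y_stride, size_t height)
{
    struct yv12_layout l;
    l.y_stride = y_stride;
    l.c_stride = y_stride / 2;
    l.y_off = 0;
    l.v_off = l.y_off + l.y_stride * height;        /* V follows Y in YV12 */
    l.u_off = l.v_off + l.c_stride * (height / 2);  /* U follows V */
    l.size  = l.u_off + l.c_stride * (height / 2);
    return l;
}
```

With y_stride = width the total comes out to width*height*3/2 bytes, which
is where the "12 bits per pixel" figure comes from.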

> 3) for YUY2 packed I have YU YV YU YV..., where
> Y plane :w*h*8 bits
> U plane :(w/2)*h*8 bits
> V plane :(w/2)*h*8 bits
> But I suppose it's not so, because colorspaces.txt says:

It is so. You have one single buffer of stride*height bytes.
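
As a rough sketch of what that single buffer looks like: in YUY2 every 4
bytes hold two horizontally adjacent pixels as Y0 U Y1 V, with U and V shared
by both. The helper below is hypothetical (names invented for illustration)
and just reads one pixel back out of such a buffer.

```c
#include <stddef.h>
#include <stdint.h>

struct yuv { uint8_t y, u, v; };

/* Read pixel (x, y) from a YUY2 buffer. Each 4-byte group Y0 U Y1 V
 * covers two pixels of the same row that share one U and one V sample. */
static struct yuv yuy2_get(const uint8_t *buf, size_t stride,
                           size_t x, size_t y)
{
    const uint8_t *pair = buf + y * stride + (x / 2) * 4;
    struct yuv p;
    p.y = pair[(x & 1) ? 2 : 0];  /* odd pixels use the second luma byte */
    p.u = pair[1];
    p.v = pair[3];
    return p;
}
```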


> 4) The Banshee can't use YV12 natively, but it has hardware YV12 -> YUY2
> converter. This converter is supposed to be faster than software (that's
> its purpose), but on my system as well as on others it isn't. On my
> system I get way better performance by giving Banshee YUY2 format.
> On the other side this can be wrong, because XVideo accepts YV12 and the
> Banshee runs at full speed with it. Can it be because XVideo actually
> takes YV12, converts it to YUY2 by software and doesn't use Banshee's
> (slow???) hardware converter?
> Under Windows when I use YV12, I get the same low speed (really 3-4
> fps). Ideas?

At the moment I really have no idea what could be causing this. What are the
results with tdfxfb? If tdfxfb works at normal speed, then I think it's a
problem with your AGP driver and/or hardware.

> >You are running a celeron, i was running a k6 it might be the
> >difference.
> Can this in ANY way be related to the fact that my Celeron has no L2
> cache???
> 5) XVideo is bloated/slow and movies can play faster :)))

In my experience xv has always been the slowest. Even with DR it doesn't come
close to tdfxfb.

> 6) I understood DR.

Nice :)

> 7) I didn't understand why when double buffering in XVideo is turned
> off, only 1 frame on 3 uses DR.

IIRC it's because if you don't give -double the vo only allocates a single
buffer. Thus you can only DR one buffer (probably P frames only). If you give
-double it will allocate 3 buffers to allow DR with IP(B) codecs.

> When dealing with the video card, I allocate 2 separate memory ranges
> from it:
> 1) MMIO, which is used to set/read registers and as a whole change video
> card's configuration.
> 2) Video card's video memory.
> From now on I have 2 pointers to these memory locations.
> Then in order to get/set registers I look at the manual and see their
> offsets. I can do this correctly. When I want to play video, I setup an
> overlay. I write the needed info into needed registers, giving:
> 1) coordinates
> 2) size
> 3) format (eg. YUY2)
> 4) stride
> 5) H/V scaling, ... etc
> 6) memory, where the frame is located.
> Here only 6 bothers me.
> I understand it that way: I set LEFTOVBUF to point to some memory of the
> video card's memory (for example it's 10th MB) and then the Banshee
> reads from this address to update the overlay frame after frame. 
> If you want to do double buffering, you set RIGHTOVBUF too and then swap
> them frame by frame. This turned out to be wrong, because regardless of
> what I point LEFTOVBUF to, overlay sits empty. So after 4 days 8
> hours/day TRYING to find something useful, just got VIDCUROVRSTART's
> value and decided to write there. Surprisingly, it turned out to be exactly
> the start of the memory that Banshee uses to update the overlay. Am I
> wrong somewhere? What must I set LEFT/RIGHTOVBUF to point to?

According to the spec sheet I have, VIDCUROVRSTART is read-only, so I
never tried to write to it. It's probably not a good idea.

As far as I understand, LEFT/RIGHTOVBUF must be used. That is, you write your
next buffer address to LEFT/RIGHTOVBUF and then issue a SWAPBUFFER command.
This command allows vsync'ed swaps, etc.
I do it as explained in the specs (at least as I understand it should be
done) and it seems to work correctly.
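
The order of operations can be modelled in plain C as below. This is a
software stand-in only: the struct is not the real register map, the field
and function names are invented, and the idea that the swap simply copies the
pending address into the current one is an assumption made for illustration.
A real driver would do MMIO writes at the offsets given in the specs.

```c
#include <stdint.h>

/* Software stand-in for the overlay registers; NOT the real register map. */
struct fake_overlay {
    uint32_t left_ovbuf;  /* pending buffer address, written by the driver */
    uint32_t cur_start;   /* address being displayed, treated as read-only */
};

/* Models the swap command: the pending address becomes the current one.
 * Assumption for illustration; real hardware does this on its own. */
static void fake_swapbuffer(struct fake_overlay *ov)
{
    ov->cur_start = ov->left_ovbuf;
}

/* Queue the next frame, then swap. Returns the address now displayed. */
static uint32_t flip_to(struct fake_overlay *ov, uint32_t next_addr)
{
    ov->left_ovbuf = next_addr;  /* 1. write the next buffer address */
    fake_swapbuffer(ov);         /* 2. issue the swap */
    return ov->cur_start;
}
```

The point of the model is that the driver never writes the "current" address
itself; it only queues the next one and asks for a swap.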

> One more question: I must deal always with video card's memory, right? I
> mean - I have no work to do with computer's RAM and all those registers
> must point to the video card's RAM???


> Then I don't understand this:
> VIDIX wants these things:
> 1) video memory destination to write to. I give it VIDCUROVRSTART's value

OK. But check first where it is; don't assume it points to a valid location.

> 2) video format. I always reject YV12 and tell it to give me YUY2
> 3) alignment of Y,U,V in bytes. Should it be 4???

The start address *and* the stride must be 4-byte aligned.

> 4) frame size. How can I compute it? h*w*2 for YUY2 (2 comes from 2
> bytes=16 bits)


> 5) Y,U,V planes offsets within frame. It's only for planar formats, for
> packed only Y value is used (because it's packed and there are no
> Y,U,V???). Does this mean that for YUY2 I have to give only Y plane and
> should its offset be 0, eg. the start of the overlay memory? Ideas?

Sure. Packed means that all y, u and v are packed in the same buffer. So
you just give the start address, that's it.

> 6) stride. stride=w*2 for YUY2

Again, stride >= width*bpp (bpp in bytes here). So generally yes. But you
must take care of the alignment, so you may end up with stride > width*bpp.
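
A minimal sketch of that calculation for YUY2, assuming 2 bytes per pixel and
the 4-byte alignment mentioned above. The function names are invented for
illustration; the frame size follows from stride*height.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Smallest 4-byte-aligned stride for one YUY2 row (2 bytes per pixel). */
static size_t yuy2_stride(size_t width)
{
    return align_up(width * 2, 4);
}

/* Bytes needed for one YUY2 frame using that stride. */
static size_t yuy2_frame_size(size_t width, size_t height)
{
    return yuy2_stride(width) * height;
}
```

For an even width the stride is exactly width*2; only odd widths get padded,
which is the stride > width*bpp case.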


Everything is controlled by a small evil group
to which, unfortunately, no one we know belongs.
