[MPlayer-dev-eng] Re: Should I write Voodoo Banshee VIDIX driver?

Alban Bedel albeu at free.fr
Fri Mar 19 13:23:30 CET 2004

Hi Georgi Petrov,

on Fri, 19 Mar 2004 01:17:22 -0800 (PST) you wrote:

> Hi Alban,
> Thanks for answering me. It's always better to know someone cares about
> what you do :)
> >> 3) tdfx_vid: *Could be the best*, but lacks subtitle support!!! This
> >> is serious.
> >vf_expand ? Imho there is no need to dup even more code in MPlayer. But
> >feel free to do it.
> ??????????????????????
> I didn't understand it - how can -vf expand help me to see the
> subtitles, when they aren't implemented in the driver. I don't fully
> understand the internals of mplayer, but in your driver osd_something
> function is empty. Shouldn't this function place the subtitles on the
> screen, or this work is handled elsewhere?
> I'm really incompetent here :) Tell me!

the expand filter can draw the osd, just RTFM ;) Most vo just duplicate
this code as they were written before libmpcodecs (and all its
filters). I was also hoping to be able to use some hw functions to have
accelerated OSD :) I attempted to use some 3D stuff of the card but i
failed miserably. That's probably possible but not while X is running
(unless you disable all GL/drm stuff perhaps, but i doubt it).
It's probably possible to use the blitter if you don't care about the alpha.
Could be a good idea as using the osd functions would mean reading from
video (or agp) mem and it's sloooooow :(

> >> http://zebra.fh-weingarten.de/~maxi/html/mplayer-users/2003-09/msg00681.html
> >> There I explain that tdfx_vid CAN turn bilinear filtering on, but by
> >> default it doesn't! Can be fixed by ~2 lines patch. Result? Again
> >> poor quality when resizing!
> >
> >iirc there is a limitation. The bilinear filtering can only be enabled
> >in 1x mode (whatever that is) according to the specs i have. So the
> >code checks that and then enables bilinear filtering if possible.
> >Now if you can set it all the time a patch is welcome.
> Yes, that's true, BUT: In your code you check whether 1x or 2x mode is
> turned on. If 1x is turned on, you enable bilinear, if 2x is on, you
> disable it.

Which i think is/was the correct thing to do as long as we don't know
exactly what this 1x/2x mode does and whether you can just set it at any time.

> The right way isn't to check which mode is enabled (because on my card
> 2x mode is ALWAYS turned on), but explicitly to enable 1x mode and then
> enable bilinear filtering. Actually in my post I've included the patch.
> Sorry - it's not diff's output, but if you take a look at it, you'll see
> that it's really easy to correct it :) No more than 3 lines of code :)

Again, if it works and isn't causing any problems (strange side effects
afterwards or the like) then a patch is welcome (i couldn't find the one you
are writing about). Anyway i'll wait until your driver is ready and then i'll
backport all improvements to tdfx_vid ;)

> >It's at kernel level because you need to interface with the AGP mem.
> >I never was able to use the AGP stuff from user space (even the
> >simplest code samples failed miserably) so i turned to a kernel module.
> >It was also nice to see how you do modules ;)
> Yeah... That's true. I just want to see how the VIDIX driver will
> perform in comparison to yours. It would be fun :) Of course AGP memcpy
> is faster, but... I'm just curious what will happen :) Right now my
> driver gives me fewer dropped frames. It still doesn't display ANYTHING
> correctly, but that's only some adjustments I have to make. I have one
> video, on which:
> 1) XVideo gives me ~ 1300 dropped frames
> 2) tdfx_vid ~ 900
> 3) my driver ~ 200

Well if it doesn't display anything it's not really fair ;) But you'd better
do a speed benchmark (-nosound -benchmark) to compare. You can find those i
did after writing tdfx_vid here:

> Of course there may be some mistake from my side, because I still have
> no idea what is pitch, stride, how the whole thing is arranged in
> memory,
Pitch and stride are the same thing: the distance in bytes between 2
consecutive lines.
The banshee mem layout is pretty simple. Basically there are only 2 places
where you write data: the framebuffer, where packed data can be handled, and
the planar 2 packed converter (yv12 to yuy2).
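
To make the stride definition concrete, here is a minimal sketch in C. The function name and parameters are made up for illustration and are not taken from tdfx_vid or VIDIX; it only shows how a stride turns (x, y) coordinates into a byte offset in a packed buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) in a packed buffer (hypothetical helper).
 * stride: bytes from the start of one line to the start of the next.
 *         It may be larger than width * bytes_per_pixel when lines are padded. */
static size_t pixel_offset(size_t x, size_t y, size_t stride,
                           size_t bytes_per_pixel)
{
    /* The pixel has to fit inside one line of `stride` bytes. */
    assert((x + 1) * bytes_per_pixel <= stride);
    return y * stride + x * bytes_per_pixel;
}
```

If the stride passed in here differs from the one the hardware was told to use, every line starts at the wrong offset and the picture looks sheared or scrambled.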

> but after 1 week ENDLESS coding I managed to:
> 1) Implement almost all VIDIX needed functions (some partly, no double
> buffering)
> 2) Start overlay (with proper size/format) and give VIDIX its memory
> address, size, etc.
> Now I have to figure how exactly are Y,U and V planes arranged, which
> offsets between what I have to set and... what the f*** I have to do at
> all!!!

YV12 can only be handled using the planar 2 packed converter. It's pretty
simple to use. You set the stride and address of your target buffer
(where yuy2 data will be written). Then just write Y, U and then V to the
converter address. The converter uses a fixed address scheme: each plane
is 1MB big and has a stride of 1024 bytes.
Luckily the AGP move function can use different input/output strides so
there is no need for a slow loop which copies line by line :)
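
A rough sketch of the addressing described above, in C. The 1MB plane size and the 1024 byte stride come from the text; the plane order (Y at offset 0, then U, then V) is only an assumption based on the order in which the planes are written, and all names are hypothetical. The copy function is a plain CPU stand-in that shows what a move with different input and output strides has to do, i.e. the per-line loop that the AGP move makes unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONV_PLANE_SIZE (1024u * 1024u) /* each plane window is 1MB (from the text) */
#define CONV_STRIDE     1024u           /* fixed line stride inside a window */

/* ASSUMPTION: windows are laid out in the order Y, U, V. */
enum conv_plane { CONV_Y = 0, CONV_U = 1, CONV_V = 2 };

/* Offset of line `y` of a plane, relative to the converter base address. */
static size_t conv_line_offset(enum conv_plane plane, size_t y)
{
    assert(y < CONV_PLANE_SIZE / CONV_STRIDE); /* at most 1024 lines fit */
    return (size_t)plane * CONV_PLANE_SIZE + y * CONV_STRIDE;
}

/* CPU stand-in for a strided move: copies `lines` lines of `line_bytes`
 * bytes each, reading with src_stride and writing with dst_stride. */
static void copy_plane(uint8_t *dst, size_t dst_stride,
                       const uint8_t *src, size_t src_stride,
                       size_t line_bytes, size_t lines)
{
    assert(line_bytes <= dst_stride && line_bytes <= src_stride);
    for (size_t y = 0; y < lines; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, line_bytes);
}
```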

> Now VIDIX writes there, but what's on the screen hardly can be called
> video:))) It moves and there are SOME lines displayed correctly, but the
> whole thing is heavily unarranged. I'm still experimenting. I should
> adjust offsets and a couple of things, but it's really hard when you
> don't know how EXACTLY it works inside :(

This really sounds like a stride problem. For the overlay you'll use
a buffer you put somewhere in the video mem. You can choose any
stride but you probably want to use the original stride so it can
be copied "at once". You also have to be careful as quite a lot of stuff
must be aligned (overlay stride for yuy2 needs a 4 byte aligned stride
and address for example).
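
The alignment rule for yuy2 can be checked with a few lines of C. This is only a sketch with hypothetical helper names; the 4 byte figure is the one given above, and other registers may have different requirements.

```c
#include <assert.h>
#include <stddef.h>

/* Round `value` up to the next multiple of `align` (must be a power of two). */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Nonzero if `offset` is a multiple of `align` (a power of two). */
static int is_aligned(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset & (align - 1)) == 0;
}

/* YUY2 uses 2 bytes per pixel; pad each line to a 4 byte boundary. */
static size_t yuy2_stride(size_t width)
{
    return align_up(width * 2, 4);
}
```

For an odd width such as 321 pixels the raw line length is 642 bytes, which is not 4 byte aligned, so the stride has to be padded to 644.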

> BTW, writing kernel module is WAAAAAY beyond my skills :)))
> Ohh, yes - can the driver be modified so it can be inserted in 2.6.x
> kernel?

Dunno. I have no 2.6 box and i'm not going to switch to 2.6 soon i think.
So it's up to some 2.6 users imho. But don't worry it's really not
that hard and it's probably not very different from VIDIX coding.

> >I did consider VIDIX. But i soon understood VIDIX will never ever fit
> >with the banshee design.
> >VIDIX assumes the card has a nice overlay which supports a given number
> >of formats.
> >The banshee overlay only supports YUY2 and BGR16.
> Why not use -vf yuy2 always? At the moment when VIDIX tries to find
> matching colorspace, I just reject YV12 and say one big YES to YUY2.

You can do that. But on the box i used to write this driver (k6-2 333)
i never found *any* case where software conversion was faster than using
the hw stuff. If you prefer sw conversion you can force it anyway.
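
For reference, the software conversion being compared here does roughly the following. This is a plain, unoptimized C sketch and not MPlayer's actual converter (which uses MMX); the function name and parameter layout are made up for illustration. YV12 keeps one U and one V sample per 2x2 block of pixels, and YUY2 packs each pair of pixels as Y0 U Y1 V.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack planar YV12 (separate Y, U, V planes, chroma subsampled 2x2)
 * into YUY2 (bytes Y0 U Y1 V per pixel pair). Hypothetical helper. */
static void yv12_to_yuy2(uint8_t *dst, size_t dst_stride,
                         const uint8_t *y, size_t y_stride,
                         const uint8_t *u, const uint8_t *v, size_t uv_stride,
                         size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *yl = y + row * y_stride;
        /* Two luma rows share one chroma row. */
        const uint8_t *ul = u + (row / 2) * uv_stride;
        const uint8_t *vl = v + (row / 2) * uv_stride;
        uint8_t *out = dst + row * dst_stride;
        for (size_t x = 0; x < width; x += 2) {
            out[2 * x + 0] = yl[x];
            out[2 * x + 1] = ul[x / 2];
            out[2 * x + 2] = yl[x + 1];
            out[2 * x + 3] = vl[x / 2];
        }
    }
}
```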

> Well, I think that BGR16 on the other side isn't big issue - I mean that
> no codec in libavcodec uses RGB color space as far as I know. And even
> if some does, no MPEG4 codec does it :)

My objective when writing this driver was to get as much as i could from
the card with all colorspaces that it could handle in some way. Whether the
"big" codecs would really use it or not was the last thing i thought about.
Another thing is that i later used this driver in other progs. Without
RGB support it would have been pointless. And believe me tdfx_vid is faster
than *any* other video output method ;)

> >That's it, it can't even downscale. For all the rest you need to use
> >the internal converter/scaler.
> I think that downscaling isn't big issue either. I mean - how often do
> you need to downscale some video? I personally don't need it and I see
> no one, who will need it :)

Again completeness. I just hated it when the video was cut because i
resized the window a bit smaller :) But looking at the code i can't find
that anymore. Strange, i'm so sure that i wrote that once. Was perhaps lost
or forgotten and never committed :((

> >So in VIDIX to correctly support all colorspace you would need to
> >implement all this magic inside the driver. That was not an acceptable
> >solution.
> Why not just reject what's not supported by the driver and leave
> mplayer to do the hard work? For example since YV12 is the colorspace
> almost every codec uses, I just say NO to YV12 and the internal
> converter gives me YUY2 - converted using MMX and it's way faster than I
> could ever do it in the driver.

You seem to miss the point that the card hw itself does the conversion.
Look at the tdfx_vid code, you won't find any code to convert between
colorspaces. All it can do is ask the card to copy, with optional
conversion/scaling, some data which is in its memory.

> >So i wrote tdfx_vid which only gives access to the basic functions of
> >the card. Then the vo does all the needed black magic to have these bits
> >show up on the screen as fast as possible :)
> Yeah, that's what I also mean :)
> >Anyway good luck with VIDIX if you really thinks that's worth it :)
> Well, if it could give me 900 -> 200 dropped frames on some movie, I'll
> be satisfied. Even if I do something wrong at the moment to have this
> framedrop(e.g. asking from VIDIX to write smaller amount of information
> than really needed), I'll be happy, because that will be my first
> "driver" :)

Would be great but i somewhat doubt it. Keep in mind that the main
difference between tdfxfb and tdfx_vid is that tdfx_vid is using AGP move.
Using the overlay is no big speedup, the bottleneck is really when you
transfer data to the card.

> Thank you a lot. Without tdfx_vid, I would never succeed.

It's a pleasure for me. You know after writing tdfx_vid i got nearly 0
feedback. So even if it's long after, i really enjoy discussing this stuff.

BTW if you don't have the banshee specs, it's probably high time to check
that. If you have problems finding them then just ask me, i can send you
the stuff i have.

BTW2 my banshee lies unused atm so i can't do any testing. But i'll put it
back in some box soon i think.


Everything is controlled by a small evil group
to which, unfortunately, no one we know belongs.
