[MPlayer-dev-eng] Documenting and extending the subtitles system

Nicolas George nicolas.george at normalesup.org
Tue Feb 24 00:36:41 CET 2009


Hi. Thanks for both your replies.

On quintidi 5 Ventôse, year CCXVII, Reimar Döffinger wrote:
> >   - Overlaid stuff is stored as an almost-static list of objects in a global
> >     variable.
> There is no reason for it to be static or almost-static, though I may
> misunderstand what you mean.

What I meant is that although vo_osd_list is a dynamically allocated
linked list, only 6 elements are ever added to it, and adding new ones
requires adding code all over the place.

> >     - text objects are in fact made of a gray map (1 octet per pixel) plus a
> >       bitmap mask (1 octet per pixel as a boolean).
> Uh, no, no boolean. It is a luminance and an alpha bitmap, though alpha
> is represented in a way that allows faster software rendering

OK, my bad: this is indeed alpha + luminance.

> ... and alpha
> is first applied to the source, then the luminance is added.

Cairo does that too; they call it "pre-multiplied alpha". It makes
perfect sense if the image is only intended to be overlaid, which is our
case.
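For reference, here is a minimal sketch of pre-multiplied alpha compositing on a single 8-bit plane, using the conventional definition (alpha 0 = transparent, 255 = opaque). The function names are hypothetical, and MPlayer's own software renderer encodes alpha differently for speed, as Reimar notes, so this is not its actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-multiplication, done once when the bitmap is created:
 * stored value = luma * alpha / 255 (rounded). */
static uint8_t premultiply(uint8_t luma, uint8_t alpha)
{
    return (uint8_t)((luma * alpha + 127) / 255);
}

/* Hypothetical helper: overlay a pre-multiplied luminance bitmap 'src'
 * with coverage 'alpha' onto an 8-bit plane. Since src already holds
 * luma * alpha, each pixel needs one multiply-add:
 *     dst = src + dst * (255 - alpha) / 255                          */
static void blend_premultiplied(uint8_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src, const uint8_t *alpha,
                                ptrdiff_t src_stride, int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            unsigned a = alpha[y * src_stride + x];
            unsigned d = dst[y * dst_stride + x];
            /* +127 rounds to nearest instead of truncating */
            dst[y * dst_stride + x] =
                (uint8_t)(src[y * src_stride + x] + (d * (255 - a) + 127) / 255);
        }
    }
}
```

As long as the source really is pre-multiplied (src <= alpha), the sum cannot exceed 255, so no clamping is needed.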

> >   - A lot of image formats are supported, but for planar formats, only the
> >     first plane is affected, which is why very saturated colors bleed
> >     through subtitles.
> No, this is just an optimization of the software renderer. Hardware
> renderers as in vo_gl, vo_direct3d or vo_vdpau will not do this.

OK, so it is just an optimization, not worth mentioning.

> >   - Overlaid stuff is pulled from libass, and may be sent through, and will
> >     sometimes travel along VFCTRL_DRAW_EOSD.
> No, VOCTRL_DRAW_EOSD will contain the data, VFCTRL_DRAW_EOSD is only a
> notification that it is now time to draw the OSD.

I did not see the difference between V[OF]CTRL, my bad. But, in my defence,
there are inconsistencies:

- vf_ass always does the overlay in put_image and ignores V?CTRL_DRAW_EOSD;
- vo_gl (the only VO implementing EOSD) does the overlay on VOCTRL_DRAW_EOSD;
- vf_vo emits VOCTRL_DRAW_EOSD whenever it gets VFCTRL_DRAW_EOSD;
- dec_video.c emits VFCTRL_DRAW_EOSD just after put_image.

It looks to me like two subtly different ways of doing exactly the same
thing, and there is certainly room to make things simpler.

> I think you can do full-colour rendering with what is already there
> though, even if it would be quite tricky (I did not actually research
> this yet though).

The ass_image_t structure has only one field for the color, so apart from
splitting the overlay into one object per color, which is an awful waste, I
do not see how.
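To make the limitation concrete, here is a sketch using a local stand-in that mirrors the fields of libass's ass_image_t (size, stride, 8-bit alpha bitmap, one 32-bit RGBA color whose low byte is transparency, destination position, next pointer). The struct name `ass_image_mirror` and the blending helper are hypothetical, not the real ass.h declarations, and the luma conversion is a simplified full-range BT.601 approximation. The point is that the color is read once per image, so every pixel of a given image necessarily shares it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for libass's ass_image_t; NOT the real declaration. */
typedef struct ass_image_mirror {
    int w, h;                       /* bitmap size */
    int stride;                     /* bytes per bitmap row */
    const unsigned char *bitmap;    /* 8-bit coverage per pixel */
    uint32_t color;                 /* 0xRRGGBBAA, AA = transparency (0 = opaque) */
    int dst_x, dst_y;               /* placement in the frame */
    struct ass_image_mirror *next;  /* linked list */
} ass_image_mirror;

/* Full-range BT.601 luma in 8-bit fixed point (77 + 150 + 29 = 256). */
static uint8_t color_luma(uint32_t c)
{
    unsigned r = c >> 24, g = (c >> 16) & 0xFF, b = (c >> 8) & 0xFF;
    return (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
}

/* Blend a whole list onto a gray plane of size pw x ph. */
static void blend_list(uint8_t *plane, int stride, int pw, int ph,
                       const ass_image_mirror *img)
{
    assert(plane != NULL && stride >= pw);
    for (; img; img = img->next) {
        unsigned opacity = 255 - (img->color & 0xFF);
        unsigned y_col = color_luma(img->color); /* one color per image */
        for (int y = 0; y < img->h; y++) {
            int py = img->dst_y + y;
            if (py < 0 || py >= ph)
                continue;
            for (int x = 0; x < img->w; x++) {
                int px = img->dst_x + x;
                if (px < 0 || px >= pw)
                    continue;
                unsigned k = img->bitmap[y * img->stride + x] * opacity / 255;
                uint8_t *d = &plane[py * stride + px];
                *d = (uint8_t)((k * y_col + (255 - k) * *d) / 255);
            }
        }
    }
}
```

A glyph run mixing, say, a red and a white word therefore has to be emitted as at least two images.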

> Depends on what you consider the "target colorspace". It should not be
> required to be the same one as whatever the actual video uses.

I probably was not clear enough: I think the EOSD objects should be stored
in whatever pixel format is most practical for the code that uses them.

Normally, the EOSD objects should be used at exactly one point of the chain,
either the ass filter or the VO driver, and that point decides which pixel
format it wants. The ass filter will probably want the same pixel format as
the video going through it, while the gl output driver will probably want a
pixel format suitable for turning into a texture.


On quintidi 5 Ventôse, year CCXVII, Uoti Urpala wrote:
> It's not clear whether you're considering only the render-on-video part
> or also how OSD content state is maintained and updated. Those parts are
> separate; the code doing the rendering need not have any knowledge about
> how OSD internally maintains its list of objects. Your comments about
> the current mechanisms generally fail to distinguish the API for
> updating OSD state and the API for renderers, which are fairly
> independent.

I completely agree that these two points should be distinguished. But in the
current state of the code, they are not, which is a problem.

The old OSD code is the worst in that regard, of course: it actually needs
to know where an OSD object comes from to be able to render it. But the EOSD
code is not perfect either, since it gets its objects directly from libass.

Separating the production of (E)OSD objects from their use, and keeping the
code flow going in a single direction, would greatly help make the code
easier to maintain and extend.

> This stuff is all related to maintaining OSD state and is not visible to
> the part doing the rendering.

Alas, it is: there are actually several code paths that lead to the
rendering function, depending on where the OSD objects come from.

> Now this comment clearly only considers the rendering part.

Yes, indeed.

> The main problem with this is that it's worse for caching rendered
> fonts. Any color change will require rendering a separate fullcolor
> bitmap. It'll also use more memory for caching. I think the main
> alternatives are:
> 1) Cache fullcolor bitmaps. Uses more memory and requires rendering from
> scratch more often than current model.
> 2) Cache alpha bitmaps but export fullcolor bitmaps from libass, doing
> the alpha->fullcolor compositing inside libass (possibly caching that
> for a shorter time).
> 3) Export alpha bitmaps and allow renderers to implement a possibly more
> efficient compositing step (requiring more complicated compositing
> primitives than current EOSD).

I did not think of that, and this is a very interesting issue. I will come
back to it later.

> I think "EOSD" is best used to refer to the renderer API. You're
> considering other related functionality here.

I will try to avoid naming confusion as much as possible.

> > - a callback API, for anyone to get notification of significant events (such
> >   as timestamps expiration);
> This isn't so obvious. Currently there is no such API and I see no clear
> need for one. Subtitle events are probably best handled by the subtitle
> "codec". OSD objects may have expiration times, but it doesn't seem
> necessary to have those "in the OSD API".

I think you are mistaken: there is currently such an API, and it is used.
The ass and vo filters get their EOSD objects by calling
ass_mp_render_frame with the current PTS as a parameter, and libass uses
that PTS to compute which subtitles are active.
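The pull model described here can be illustrated with a small sketch: the caller passes the current PTS and the subtitle side decides which events are visible. The event structure and function name are hypothetical and only show the idea; they are not libass internals, and ass_mp_render_frame is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subtitle event, visible during [start, start + duration). */
typedef struct {
    double start;     /* seconds */
    double duration;  /* seconds */
    const char *text;
} sub_event;

/* Store pointers to the events active at 'pts' into 'out' (capacity
 * 'max') and return how many were stored. */
static size_t active_events(const sub_event *ev, size_t n, double pts,
                            const sub_event **out, size_t max)
{
    size_t count = 0;
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n && count < max; i++)
        if (pts >= ev[i].start && pts < ev[i].start + ev[i].duration)
            out[count++] = &ev[i];
    return count;
}
```

Whoever holds the timestamp (a filter today, possibly update_subtitles instead) only has to call such a function; where the PTS comes from is the only thing that differs.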

There does not seem to be a similar mechanism in the old OSD system, though,
and if I remember correctly, that is one of the reasons why ASS does not work
with mencoder (no PTS travels along the video filter chain in mencoder).

This is not actually important, we could easily change libass so that it
gets its timestamp from update_subtitles.

> I think your comments are overall too vague and general to really
> progress anywhere. You should identify some particular change you want
> to make or issue you want to fix. Continuing discussion at this level
> will generate talk but probably little else.

I will try to be much more precise, now that I am sure I was not completely
mistaken about the workings of the system. I will try to sketch what the
design should be when the work is done, and then see how to get there from
here with the least effort.

We have a thing that I will call "the overlay subsystem". It has two sides:
the "client" side and the "driver" side.

On the client side, there are various parts of the MPlayer core that want
to display something to the user in reaction to events: the subtitles react
to timestamps from the demuxer, the input system reacts to the user, etc.

On the driver side, there is one piece of code that receives the overlay
objects and actually puts them on the video, one way or another.

Here is the API of the client side (the function names are not definitive):

- void overlay_client_use(callback)

  The overlay subsystem will call the callback whenever it is reinitialized.
  The client will thus be informed that the overlay subsystem is blank; it
  also gets the resolution.

- imageptr overlay_client_add_image(image data, alpha data)

  The client gives an image to the overlay subsystem. Ownership of the
  memory for the data passes to the overlay subsystem, which is now
  responsible for freeing it. The function returns an opaque handle to the
  image. The input image can be in any pixel format the client wants.

- alphamapptr overlay_client_add_alphamap(alpha data)

  Same as above, but only the alpha channel.

- imageptr overlay_client_alphamap_instantiate(alphamapptr, color)

  Creates an image object from an alpha map and a single color. I will call
  these images "uniform images" from now on. Note that, from the client's
  point of view, once a uniform image has been instantiated, it is
  indistinguishable from a normal image.

- objectptr overlay_client_show_image(imageptr, x, y)

  Displays a previously loaded image on the video at position (x, y). The
  same imageptr can be displayed several times at once.

- void overlay_client_hide(objectptr)
- void overlay_client_destroy_image(imageptr)
- void overlay_client_destroy_alphamap(alphamapptr)

  The names should say it all.
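To make the client side more concrete, here is a minimal in-memory sketch of part of this API in C. All types and the global `displayed` list are hypothetical illustrations: handles are plain structs, overlay_client_add_image and overlay_client_use are omitted, and no pixel-format conversion happens yet. It mainly shows the ownership transfer and the fact that one image can be shown several times.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle types; nothing here exists in MPlayer. */
typedef struct overlay_alphamap {
    int w, h;
    uint8_t *alpha;             /* owned by the subsystem after add */
} overlay_alphamap;

typedef struct overlay_image {
    int w, h;
    uint8_t *data, *alpha;      /* both NULL for a uniform image */
    overlay_alphamap *map;      /* source map of a uniform image (not owned) */
    uint32_t color;
} overlay_image;

typedef struct overlay_object {
    overlay_image *img;
    int x, y;
    struct overlay_object *next;
} overlay_object;

static overlay_object *displayed; /* currently shown objects */

overlay_alphamap *overlay_client_add_alphamap(int w, int h, uint8_t *alpha)
{
    overlay_alphamap *m = malloc(sizeof *m);
    if (!m)
        return NULL;
    m->w = w; m->h = h;
    m->alpha = alpha;           /* ownership transferred to the subsystem */
    return m;
}

overlay_image *overlay_client_alphamap_instantiate(overlay_alphamap *m,
                                                   uint32_t color)
{
    assert(m != NULL);
    overlay_image *img = calloc(1, sizeof *img);
    if (!img)
        return NULL;
    img->w = m->w; img->h = m->h;
    img->map = m; img->color = color; /* pixels produced later, per driver */
    return img;
}

overlay_object *overlay_client_show_image(overlay_image *img, int x, int y)
{
    overlay_object *o = malloc(sizeof *o);
    if (!o)
        return NULL;
    o->img = img; o->x = x; o->y = y;
    o->next = displayed;
    displayed = o;
    return o;
}

void overlay_client_hide(overlay_object *o)
{
    for (overlay_object **p = &displayed; *p; p = &(*p)->next)
        if (*p == o) {
            *p = o->next;
            free(o);
            return;
        }
}

void overlay_client_destroy_image(overlay_image *img)
{
    if (!img)
        return;
    free(img->data);            /* free(NULL) is a no-op for uniform images */
    free(img->alpha);
    free(img);
}

void overlay_client_destroy_alphamap(overlay_alphamap *m)
{
    if (!m)
        return;
    free(m->alpha);
    free(m);
}
```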

Now the driver side:

- void overlay_driver_init(width, height, pixel_fmt, accept_alphamap)

  The driver tells the overlay subsystem its resolution, its preferred
  pixel format and whether it accepts raw alphamaps.

- overlay_driver_get()

  This function returns several values:

  - A boolean telling if anything changed on what is actually displayed.

  - A list of new images.

  - A list of destroyed images.

  - A list of new alpha maps.

  - A list of destroyed alpha maps.

  - A list of new uniform images.

  - A list of destroyed uniform images.

  - A list of all currently displayed objects.

  These lists have the following properties:

  - In each of the image, alpha map and uniform image structures,
    the driver has room to store an opaque value.

  - If accept_alphamap was false in the call to overlay_driver_init, then
    the lists of alpha maps and uniform images are guaranteed to be empty.

  - All images are ready-to-render: they are in the driver's preferred pixel
    format, with pre-multiplied alpha, and the alpha channel has been
    downscaled for all subsamplings.

  - The currently displayed objects are a simple structure with the
    coordinates, a pointer to the image structure, and a flag telling if
    this is a uniform image; that last flag is guaranteed to be false if
    accept_alphamap was false.

- overlay_driver_release()

  Tells the overlay subsystem that it can actually free the destroyed
  structures.
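One of the "ready-to-render" promises above is that the alpha channel has already been downscaled for subsampled planes. As an illustration only, here is how that could be done for 4:2:0 chroma by averaging each 2x2 block of the full-resolution alpha. The function name is hypothetical, box averaging is just one reasonable filter, and chroma siting is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the chroma-plane alpha for a 4:2:0 image.
 * Output size is ceil(sw/2) x ceil(sh/2); for odd sizes the last
 * column/row is reused instead of reading out of bounds. */
static void alpha_downscale_420(const uint8_t *src, int sw, int sh, int sstride,
                                uint8_t *dst, int dstride)
{
    assert(sw > 0 && sh > 0);
    int dw = (sw + 1) / 2, dh = (sh + 1) / 2;
    for (int y = 0; y < dh; y++) {
        int y0 = 2 * y, y1 = (2 * y + 1 < sh) ? 2 * y + 1 : 2 * y;
        for (int x = 0; x < dw; x++) {
            int x0 = 2 * x, x1 = (2 * x + 1 < sw) ? 2 * x + 1 : 2 * x;
            unsigned sum = src[y0 * sstride + x0] + src[y0 * sstride + x1]
                         + src[y1 * sstride + x0] + src[y1 * sstride + x1];
            dst[y * dstride + x] = (uint8_t)((sum + 2) / 4); /* rounded mean */
        }
    }
}
```

Doing this once in the overlay subsystem, rather than in every driver, is exactly the kind of work the driver side would no longer need to care about.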

With this API, I think the work of the various parts is quite
straightforward.

In particular, the overlay subsystem keeps the alpha maps, converts the
images into the driver's pixel format, and if the driver does not accept
alpha maps, it does the work of instantiating an alpha map into an image.
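As a sketch of that instantiation step, assuming the driver wants packed, pre-multiplied RGBA (as a gl-style driver might for a texture), turning an alpha map plus one color into an image could look like this. The function name, the byte order and the color convention (0xRRGGBBAA with AA as opacity, i.e. the opposite of libass's transparency byte) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: instantiate a uniform image from an alpha map and one
 * color, producing pre-multiplied RGBA (4 bytes per pixel, R,G,B,A).
 * 'color' is assumed to be 0xRRGGBBAA with AA = opacity (255 = opaque). */
static void instantiate_uniform_rgba(const uint8_t *alpha, int w, int h,
                                     ptrdiff_t astride, uint32_t color,
                                     uint8_t *out, ptrdiff_t ostride)
{
    unsigned r = color >> 24, g = (color >> 16) & 0xFF;
    unsigned b = (color >> 8) & 0xFF, op = color & 0xFF;
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* combine map coverage with the color's own opacity */
            unsigned a = (alpha[y * astride + x] * op + 127) / 255;
            uint8_t *p = out + y * ostride + 4 * x;
            p[0] = (uint8_t)((r * a + 127) / 255); /* pre-multiplied */
            p[1] = (uint8_t)((g * a + 127) / 255);
            p[2] = (uint8_t)((b * a + 127) / 255);
            p[3] = (uint8_t)a;
        }
    }
}
```

Because the alpha map itself is left untouched, the subsystem can instantiate it again with another color without re-rendering any glyph, which relates to the caching trade-off Uoti described.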

Does that sound reasonable?

Regards,

-- 
  Nicolas George