[MPlayer-dev-eng] [PATCH] A new ASS renderer

Mon Sep 24 19:14:34 CEST 2012

On duodi 2 Vendémiaire, year CCXXI, Xidorn Quan wrote:
> This optimization should not be applied to the current vf_ass itself,
> since it could make rendering slower in the common case where subtitles
> are horizontal. But it is useful for vf_ass2, because vf_ass2 only
> extends it once for each different subtitle, whereas vf_ass extends
> it every frame.

I do not see how it can not be helpful. Consider the following change:

Every time a line is about to be marked as "dirty":

  * keep track of the minimum and maximum affected x;
  * if the interval is enlarged, upscale the extra pixels.

The overhead of keeping track of min and max x is negligible, and even with
horizontal text layouts, most lines only cover a fraction of the width of
the screen, so there is a non-negligible benefit.
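To make the first idea concrete, here is a minimal sketch of the bookkeeping. It is not taken from vf_ass: LineRange, mark_dirty and upscale_span are hypothetical names, and upscale_span only counts pixels where the real filter would upscale the chroma.

```c
#include <assert.h>

/* Hypothetical per-line bookkeeping; not the actual vf_ass structures. */
typedef struct {
    int dirty;   /* non-zero once the line has been touched */
    int min_x;   /* first prepared pixel (inclusive) */
    int max_x;   /* one past the last prepared pixel (exclusive) */
} LineRange;

/* Stand-in for the real chroma upscaling: it only counts pixels. */
static long upscaled_pixels;

static void upscale_span(int y, int x0, int x1)
{
    (void)y;
    assert(x0 <= x1);
    upscaled_pixels += x1 - x0;
}

/* Mark [x0, x1) on line y as dirty and upscale only the pixels that
 * were not already inside the tracked interval. */
static void mark_dirty(LineRange *lines, int y, int x0, int x1)
{
    LineRange *l = &lines[y];

    assert(x0 < x1);
    if (!l->dirty) {
        /* First touch: the whole span is new. */
        l->dirty = 1;
        l->min_x = x0;
        l->max_x = x1;
        upscale_span(y, x0, x1);
        return;
    }
    if (x0 < l->min_x) {
        /* Interval grows to the left. */
        upscale_span(y, x0, l->min_x);
        l->min_x = x0;
    }
    if (x1 > l->max_x) {
        /* Interval grows to the right. */
        upscale_span(y, l->max_x, x1);
        l->max_x = x1;
    }
}
```

Note that a span disjoint from the tracked interval also upscales the gap in between, since only one min and one max are kept per line.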

But there is an even better option:

Every time libass tells us that the subtitles have changed, walk over all
covered lines and determine the min and max x affected.
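A rough sketch of that walk, computing a single x range for the whole frame. The Img struct below is a hypothetical stand-in that mirrors only the w, h, dst_x, dst_y and next fields of libass's ASS_Image; XRange and covered_x_range are made-up names. A per-line variant would keep one range per y instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the subset of ASS_Image used here. */
typedef struct Img {
    int w, h;          /* bitmap size */
    int dst_x, dst_y;  /* position on the frame */
    struct Img *next;  /* next image in the list, or NULL */
} Img;

/* Half-open interval [min_x, max_x); empty when min_x >= max_x. */
typedef struct {
    int min_x, max_x;
} XRange;

/* Walk the image list once and return the x range it covers,
 * clipped to the frame width. */
static XRange covered_x_range(const Img *img, int frame_w)
{
    XRange r = { frame_w, 0 };

    assert(frame_w > 0);
    for (; img; img = img->next) {
        int x0 = img->dst_x;
        int x1 = img->dst_x + img->w;

        if (img->w <= 0 || img->h <= 0)
            continue;               /* empty bitmap */
        if (x0 < 0)
            x0 = 0;
        if (x1 > frame_w)
            x1 = frame_w;
        if (x0 >= x1)
            continue;               /* entirely outside the frame */
        if (x0 < r.min_x)
            r.min_x = x0;
        if (x1 > r.max_x)
            r.max_x = x1;
    }
    return r;
}
```

Since this only needs to run when libass reports a change, its cost is not paid on every frame.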

Either of these optimizations can be implemented as is on top of the current
implementation. The second is in fact very simple, and I have a mind to work
on it as soon as I find some time (in particular for the benchmark part).

> It's true that this can be merged into the filter without any
> technological difficulty, but it also means a large modification to
> it. I just don't think that putting two different methods which are
> equivalent in result into one filter is a good idea. Maybe I'm wrong.

There is a large amount of shared code: all libass communication and filter
glue. Duplicating it is really not a good idea.

> Well, I think it is no problem to simply replace the old vf_ass with
> this new implementation, for the reason I have given: the performance
> of the filter becomes trivial when the subtitle changes every
> frame. And this would work better in other situations.

That is an option I do not rule out.

> This makes no sense since the elements not in the image will again
> hurt the cache,

I am not sure what you mean here, so maybe I was not clear in my suggestion.
If I understand your proposal correctly, when dealing with "Hello" with an
outline and a shadow, your patch creates an image for the whole word. My
suggestion is to create an image for the "H", with the body, outline,
shadow, merged, then an image for the "e", one for the "l", etc. That means
that the pixels above the "e" and "o" do not need to be processed.

>		  and this may significantly complicate the code.

That is probably true for that suggestion.

> Samplings of different components are different, so the alpha channel
> should not be pre-downsampled. It must fit the highest one. But
> downsampling other channels first might be a good idea.

You can create your alpha channel at the luminance resolution and then
create a second copy of the alpha channel subsampled at the chrominance
resolution.
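As an illustration of the two-copies idea, here is a minimal sketch that derives the subsampled alpha plane from the full-resolution one. It assumes 4:2:0, where the chroma planes have half the width and half the height of the luma plane; alpha_to_chroma_420 is a hypothetical helper, and the 2x2 box average is only one possible filter.

```c
#include <assert.h>
#include <stdint.h>

/* Average each 2x2 block of a full-resolution alpha plane into one
 * sample of a half-resolution plane (4:2:0 assumed). For odd widths
 * or heights, the last column or row is replicated. */
static void alpha_to_chroma_420(const uint8_t *src, int src_stride,
                                int w, int h,
                                uint8_t *dst, int dst_stride)
{
    int cw = (w + 1) / 2;   /* subsampled width, rounded up */
    int ch = (h + 1) / 2;   /* subsampled height, rounded up */

    assert(w > 0 && h > 0);
    for (int y = 0; y < ch; y++) {
        const uint8_t *r0 = src + (2 * y) * src_stride;
        const uint8_t *r1 = (2 * y + 1 < h) ? r0 + src_stride : r0;

        for (int x = 0; x < cw; x++) {
            int xa = 2 * x;
            int xb = (xa + 1 < w) ? xa + 1 : xa;

            /* Rounded average of the four samples. */
            dst[y * dst_stride + x] =
                (uint8_t)((r0[xa] + r0[xb] + r1[xa] + r1[xb] + 2) >> 2);
        }
    }
}
```

The full-resolution plane is then used when blending Y and the subsampled one when blending U and V, so the subsampling cost is paid once per subtitle change rather than once per blend.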

> I don't think it is a good idea to blend it three times since, for some
> reason, alpha values require relatively complex preprocessing. I don't
> think it is more cache-friendly either. CPU cache should be large
> enough to store such arrays nowadays.
> 
> For reference, lines in x86 L1 cache are 64 bytes. 3 color channels
> with 1 alpha channel and 1 data stream cost less than 0.5kB in L1
> which has been at least 4kB since it first appeared in this
> architecture.

This is speculating. Benchmarks are better.

Regards,

-- 
  Nicolas George