[MPlayer-dev-eng] [PATCH] A new ASS renderer

Xidorn Quan quanxunzhen at gmail.com
Sun Sep 23 15:23:46 CEST 2012


On Sun, Sep 23, 2012 at 5:20 PM, Nicolas George <
nicolas.george at normalesup.org> wrote:

> Le duodi 2 vendémiaire, an CCXXI, Xidorn Quan a écrit :
> > This patch addresses the first problem you mentioned. It renders the
> > whole sentence once, and reuses it for frames where the subtitle
> > stays unchanged. In fact, this renderer is also partially optimized
> > for the second problem at the same time.
> >
> > The workflow of the current renderer is: upsample all rows of the
> > original frame that will be rendered on, render the subtitles, then
> > downsample. In this new renderer, because every pixel is visited no
> > more than once, we only need to upsample and downsample the pixels we
> > actually touch instead of whole rows. So this renderer may also
> > perform better for vertical subtitles, where the number of pixels to
> > be rendered in a row is far smaller than a whole row, even if the
> > subtitles change every frame.
>
> Ok, I see. I believe your patch could be split into smaller components,
> making it easier to review and eventually apply:
>
> * Keep track of the extents of the "dirty" lines, to avoid up/downscaling a
>   whole line for a single char. Simple minor optimization, interesting by
>   itself.
>
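
For illustration, the difference described above can be put into a toy
cost model (hypothetical helper names and numbers, not the actual
vf_ass code): the old path converts every pixel of each dirty row,
while the new path converts only the pixels a glyph actually covers,
which matters most for narrow vertical subtitles.

```c
/* Toy cost model, counting pixels that get up/downsampled.
 * Hypothetical sketch; not the actual vf_ass code. */
static long row_based_cost(long frame_width, long dirty_rows)
{
    /* old renderer: whole rows are converted, even for a narrow glyph */
    return frame_width * dirty_rows;
}

static long pixel_based_cost(long glyph_width, long dirty_rows)
{
    /* new renderer: only the covered pixels are converted */
    return glyph_width * dirty_rows;
}
```

For a vertical subtitle 32 pixels wide on a 1920-pixel-wide frame, the
pixel-based path converts 60 times fewer pixels per dirty row.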

This optimization should not be applied to the current vf_ass itself,
since it could make rendering slower in the most common case, where
subtitles are horizontal. But it is useful for vf_ass2, because
vf_ass2 computes the extents only once per distinct subtitle, while
vf_ass would have to compute them every frame.
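
A minimal sketch of what such extent tracking could look like
(hypothetical names, not a patch against vf_ass): each image line
remembers the leftmost and rightmost dirty column, so conversion can
be restricted to [x_min, x_max) instead of the whole line.

```c
#include <limits.h>

/* Per-line dirty extents: remember the leftmost and rightmost column
 * any glyph touched on this line. Hypothetical sketch. */
struct line_extent { int x_min, x_max; };

static void extent_reset(struct line_extent *e)
{
    e->x_min = INT_MAX;   /* empty interval: nothing dirty yet */
    e->x_max = 0;
}

static void extent_add(struct line_extent *e, int x0, int x1)
{
    /* widen the dirty interval to cover a glyph span [x0, x1) */
    if (x0 < e->x_min) e->x_min = x0;
    if (x1 > e->x_max) e->x_max = x1;
}
```

vf_ass2 would pay this bookkeeping once per subtitle change, while
vf_ass would pay it on every frame, which is why it only helps the
former.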


> * Vectorized implementations of the blending functions.
>
> * Merge the image elements into a temporary image. To avoid duplicating the
>   filter, it can be just an option that activate a slightly different code
>   path inside the filter.
>

It's true that this can be merged into the filter without any
technical difficulty, but it would also mean a large modification to
it. I just don't think that putting two methods with equivalent
results into one filter is a good idea. Maybe I'm wrong.

Well, I think there is no problem in simply replacing the old vf_ass
with this new implementation, for the reason I gave above: the
filter's overhead becomes trivial when the subtitle changes every
frame, and the new renderer works better in the other situations.


> There are other possible optimizations. I can suggest at least two of them:
>
> * Detect overlapping image elements (glyph body, outline and shadow) and
>   combine only them in an intermediate image.
>

This makes no sense to me, since the elements not merged into the
intermediate image will still hurt the cache, and it may significantly
complicate the code.


> * Downscale the alpha channel instead of upscaling the target image.
>

The sampling rates of the components differ, so the alpha channel
should not be pre-downsampled: it must match the component with the
highest sampling rate. But downsampling the other channels first might
be a good idea.
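
For example, with 4:2:0 sampling (as in YV12), one chroma sample
covers a 2x2 block of luma pixels, so the alpha plane has to stay at
luma resolution. A sketch of fetching the matching alpha for one
chroma sample (hypothetical helper, assuming this layout):

```c
#include <stdint.h>

/* 4:2:0 sketch: the chroma sample at (cx, cy) covers the 2x2 luma
 * block at (2*cx, 2*cy). Average the four full-resolution alpha
 * values (with rounding) instead of pre-downsampling the alpha
 * plane to chroma resolution. */
static uint8_t alpha_for_chroma(const uint8_t *alpha, int stride,
                                int cx, int cy)
{
    const uint8_t *p = alpha + 2 * cy * stride + 2 * cx;
    return (uint8_t)((p[0] + p[1] + p[stride] + p[stride + 1] + 2) / 4);
}
```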


> Also, a few benchmarks would be nice. Blending an alpha channel with a
> single color is simpler and more cache-friendly than blending an alpha
> channel with a color channel, so it may happen that doing the first thrice
> would be faster than the latter.
>

I don't think it is a good idea to blend it three times, since the
alpha values require a relatively complex preprocessing step. I don't
think it is more cache-friendly either: CPU caches should be large
enough to hold such arrays nowadays.

For reference, x86 L1 cache lines are 64 bytes. Three color channels
plus one alpha channel and one data stream cost less than 0.5 kB of
L1, which has been at least 4 kB since it first appeared in this
architecture.
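
A back-of-the-envelope check of that figure (64-byte lines, five
sequential streams; the per-stream line count is my own assumption):

```c
/* Rough working-set estimate: 3 color channels + 1 alpha channel
 * + 1 data stream = 5 sequential streams, each needing only a
 * handful of active cache lines at a time. */
enum { CACHE_LINE_BYTES = 64, STREAMS = 5 };

static int working_set_bytes(int lines_per_stream)
{
    return STREAMS * lines_per_stream * CACHE_LINE_BYTES;
}
```

With one active line per stream that is 320 bytes, under the 0.5 kB
cited above; even a dozen active lines per stream still fits a 4 kB
L1.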
