[MPlayer-dev-eng] [PATCH] A new ASS renderer

Sun Sep 23 16:37:20 CEST 2012

On Sun, Sep 23, 2012 at 04:22:01PM +0200, wm4 wrote:
> On Sun, 23 Sep 2012 16:10:28 +0200
> Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> 
> > When running on the GPU, the alpha blending does not take any
> > relevant amount of time. What breaks the GPU renderer's neck is
> > when it has to upload 10000 2x2 images that are then stored in
> > 10000 different textures.
> 
> In my hacks to vo_gl, I reduced the number of textures needed for EOSD
> to 1, with 1 upload per frame. The CPU still needs to memcpy the images
> to pack them, though. This was a performance win, especially on OSX for
> some reason.

I am afraid that will depend on a lot of things.
For example how fast the driver is at creating a new texture vs.
just updating one.
Some drivers for example also have huge issues with memory fragmentation
when allocating and freeing huge textures of different sizes all the
time.
Then there's also the question of whether it's a subtitle where things
change all the time or one where 90% stays the same and only a small
part changes. In the latter case I suspect your approach would waste a
good bit of memory bandwidth, since everything gets re-uploaded every
frame. Whether that is a problem can differ a lot between integrated
and dedicated GPUs.
Lastly, if the driver decides to use a host-memory backing texture, the
glTexSubImage calls can be almost as fast as a memcpy, but that relies
heavily on driver details.
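
To make the trade-off concrete, here is a rough sketch of the kind of
packing step under discussion: copy every image into one staging buffer
with memcpy so that a single upload (e.g. one glTexSubImage2D call) can
replace thousands of small ones. This is not wm4's actual vo_gl code. The
SubImage and Atlas types, the atlas_pack name and the simple "shelf"
placement are all made up for illustration, and the images are assumed to
be 8-bit alpha bitmaps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types, not taken from MPlayer or libass. */
typedef struct {
    const uint8_t *pixels;  /* 8-bit alpha bitmap (assumed format) */
    int w, h, stride;
    int atlas_x, atlas_y;   /* position in the atlas, set by atlas_pack */
} SubImage;

typedef struct {
    uint8_t *pixels;        /* staging buffer of w * h bytes */
    int w, h;
} Atlas;

/* Place images left to right in rows ("shelves") and copy them into the
 * staging buffer. Returns how many images fit; fewer than n means the
 * atlas is full and the caller needs a fallback. */
static int atlas_pack(Atlas *atlas, SubImage *imgs, int n)
{
    int x = 0, y = 0, shelf_h = 0;

    for (int i = 0; i < n; i++) {
        SubImage *im = &imgs[i];

        if (im->w > atlas->w)
            return i;
        if (x + im->w > atlas->w) {  /* row is full, start a new shelf */
            y += shelf_h;
            x = 0;
            shelf_h = 0;
        }
        if (y + im->h > atlas->h)
            return i;

        /* This is the extra CPU-side memcpy mentioned above. */
        for (int row = 0; row < im->h; row++)
            memcpy(atlas->pixels + (size_t)(y + row) * atlas->w + x,
                   im->pixels + (size_t)row * im->stride,
                   (size_t)im->w);

        im->atlas_x = x;
        im->atlas_y = y;
        x += im->w;
        if (im->h > shelf_h)
            shelf_h = im->h;
    }
    return n;
}
```

Note that this repacks and recopies everything on every call, which is
exactly the bandwidth cost that hurts when 90% of the subtitle did not
change between frames.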

> > > > Only libass can e.g. look ahead
> > > 
> > > You could easily request a frame from the future from libass,
> > > provided libass has received all necessary subtitle packets. The
> > > rendering interface takes a time value.
> > 
> > That is useless, since what you need is for all glyphs that are
> > used repeatedly in future frames to be stored together in a shared
> > memory area (and preferably without introducing yet another memcpy),
> > whereas those that are used only a few times are kept out of it but
> > still grouped together so they can be processed in one go.
> 
> What do you mean by grouped together? Packed into a rectangular
> bitmap?

Preferably yes. I haven't thought too much about what exactly the
requirements would be, though. I could imagine that SIMD code wouldn't
care so much about the exact layout, but rather that there are large
chunks with the same alpha/color value. Just wild guessing though.
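
As an illustration of why runs with one shared color could matter, here
is a minimal sketch of a blend loop over such a run. It is an assumption
for the sake of the argument, not code from MPlayer or libass: blend_run
is a hypothetical name, and it handles a single 8-bit channel only. With
the color constant across the whole run, the loop body has no per-pixel
lookups or branches, which is the shape that hand-written SIMD or a
vectorizing compiler can handle well.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one constant color over n destination pixels, using a per-pixel
 * 8-bit alpha mask. Single channel only; hypothetical helper. */
static void blend_run(uint8_t *dst, const uint8_t *alpha, size_t n,
                      uint8_t color)
{
    for (size_t i = 0; i < n; i++) {
        unsigned a = alpha[i];
        /* dst = dst * (1 - a/255) + color * (a/255), rounded */
        dst[i] = (uint8_t)((dst[i] * (255 - a) + color * a + 127) / 255);
    }
}
```

If every small image carried its own color, this loop would have to be
restarted per image, so for 2x2 images the setup cost would dominate.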