[FFmpeg-soc] libavfilter - filter for .ass subtitle rendering using libass

Mon Mar 2 23:28:43 CET 2009

On date Thursday 2009-02-26 10:56:49 +0300, Alexey Lebedeff encoded:
> 
> On Thu, 26 Feb 2009 00:45:04 +0100, Stefano Sabatini wrote:
> > On date Tuesday 2009-02-24 20:28:53 +0300, Alexey Lebedeff encoded:
> >> On Tue, 24 Feb 2009 13:54:08 +0300, Alexey Lebedeff wrote:
> 
> >>   int vsub,hsub;   //< chroma subsampling
> >
> > missing space after the comma
> 
> And the same in the vf_drawbox.c =)

Fixed.

> >> } AssContext;
> >> 
> >> static int parse_args(AVFilterContext *ctx, AssContext *context, const char* args);
> >> static av_cold int init(AVFilterContext *ctx, const char *args, void *opaque)
> >> {
> >>   AssContext *context= ctx->priv;
> >> 
> >>   /* defaults */
> >>   context->margin = 10;
> >>   context->encoding = "utf-8";
> >> 
> >>   if ( parse_args(ctx, context, args) )
> >
> > if (parse_args(...))
> >
> > style is preferred.
> >
> >>     return 1;
> >
> > The documentation only says to return 0 in case of success, but all
> > the current filters return a negative value in case of failure, which
> > is consistent with the FFmpeg API, so I'd say to return -1 here and in
> > all the other cases.
> >
> > Also I find quite confusing to parse the args in a distinct function,
> > but maybe it's only me.
> 
> It was too big, and initially I wanted to place libass initialization
> in init() too. But together they was so large, that I decided to split
> them in such way.
> 
> >>   return 0;
> >> }
> 
> >> static int config_input(AVFilterLink *link)
> >> {
> >>   AssContext *context = link->dst->priv;
> >> 
> >>   context->frame_width = link->w;
> >>   context->frame_height = link->h;
> >> 
> >>   context->ass_library = ass_library_init();
> >> 
> >>   if ( !context->ass_library ) {
> >>     av_log(0, AV_LOG_ERROR, "ass_library_init() failed!\n");
> >>     return 1;
> >>   }
> >> 
> >>   ass_set_fonts_dir(context->ass_library, "");
> >>   ass_set_extract_fonts(context->ass_library, 1);
> >>   ass_set_style_overrides(context->ass_library, NULL);
> >> 
> >>   context->ass_renderer = ass_renderer_init(context->ass_library);
> >>   if ( ! context->ass_renderer ) {
> >>     av_log(0, AV_LOG_ERROR, "ass_renderer_init() failed!\n");
> >>     return 1;
> >>   }
> >> 
> >>   ass_set_frame_size(context->ass_renderer, link->w, link->h);
> >>   ass_set_margins(context->ass_renderer, context->margin, context->margin, context->margin, context->margin);
> >>   ass_set_use_margins(context->ass_renderer, 1);
> >>   ass_set_font_scale(context->ass_renderer, 1.);
> >>   ass_set_fonts(context->ass_renderer, NULL, "Sans");
> >> 
> >>   context->ass_track = ass_read_file(context->ass_library, context->filename, context->encoding);
> >>   if ( !context->ass_track ) {
> >>     av_log(0, AV_LOG_ERROR, "Failed to read subtitle file with ass_read_file()!\n");
> >>     return 1;
> >>   }
> >
> > I don't see why this is done here rather than in init().
> Because of link->w and link->h.
> It needs to be checked whether ass_set_frame_size() can be called
> anytime, or libass init sequence is fixed.

This is only used in:
ass_set_frame_size(context->ass_renderer, link->w, link->h);

all the other initialization may be done in init.
Also I think you're missing error check for the various ass*
functions.

But again, if you can work to a libavcodec ASS codec it would be
*much* better, and you have more chance to get included in FFmpeg (and
it will be rather useful by its own).

Check the ffmpeg-devel archive, recently there has been a discussion
on the new subtitles framework.

> > Also av_log should always have a context.
> I was not sure  what to use here, and 0 as context worked fine.
> Will be the 'link->dst' the right parameter for av_log() here?
> 
> >>   avcodec_get_chroma_sub_sample(link->format,
> >> 				&context->hsub, &context->vsub);
> >> 
> >>   return 0;
> >> }
> >> 
> >> static void start_frame(AVFilterLink *link, AVFilterPicRef *picref)
> >> {
> >>   avfilter_start_frame(link->dst->outputs[0], picref);
> >> }
> >> 
> >> #define _r(c)  ((c)>>24)
> >> #define _g(c)  (((c)>>16)&0xFF)
> >> #define _b(c)  (((c)>>8)&0xFF)
> >> #define _a(c)  ((c)&0xFF)
> >> #define rgba2y(c)  ( (( 263*_r(c)  + 516*_g(c) + 100*_b(c)) >> 10) + 16  )                                                                     
> >> #define rgba2u(c)  ( ((-152*_r(c) - 298*_g(c) + 450*_b(c)) >> 10) + 128 )                                                                      
> >> #define rgba2v(c)  ( (( 450*_r(c) - 376*_g(c) -  73*_b(c)) >> 10) + 128 )                                                                      
> 
> > Please avoid _foo var names, they are unreadable.
> > I wonder if we can do such transformation using some function, if
> > that's not the case maybe we should implement it in the lib, otherwise
> > filter writers will write this again and again (see for example VHOOK).
> 
> This can be rewritten using swscale, as Bobby Bingham suggested.
> 
> Then code will do the following:
> 1) Render all subtitle parts onto a big RGB bitmap
> 2) Convert this bitmap to destination format using swscale
> 3) Blend converted result with source pic
> 
> This is not optimal, as swscale will do many unneeded work.
> But all subtleties of different formats handling will be hidden from us.
> 
> >> static void draw_ass_image(AVFilterPicRef *pic, ass_image_t *img, AssContext *context)
> >> {
> >>   unsigned char *row[4];
> >>   unsigned char c_y = rgba2y(img->color);
> >>   unsigned char c_u = rgba2u(img->color);
> >>   unsigned char c_v = rgba2v(img->color);
> >>   unsigned char opacity = 255 - _a(img->color);
> >>   unsigned char *src;
> >>   int i, j;
> >> 
> >>   unsigned char *bitmap = img->bitmap;
> >>   int bitmap_w = img->w;
> >>   int bitmap_h = img->h;
> >>   int dst_x = img->dst_x;
> >>   int dst_y = img->dst_y;
> >> 
> >>   int channel;
> >>   int x,y;
> >> 
> >>   src = bitmap;
> >> 
> >>   for (i = 0; i < bitmap_h; ++i) {
> >>     y = dst_y + i;
> >>     if ( y >= pic->h )
> >>       break;
> >> 
> >>     row[0] = pic->data[0] + y * pic->linesize[0];
> >> 
> >>     for (channel = 1; channel < 3; channel++)
> >>       row[channel] = pic->data[channel] +
> >> 	pic->linesize[channel] * (y>> context->vsub);
> >> 
> >>     for (j = 0; j < bitmap_w; ++j) {
> >>       unsigned k = ((unsigned)src[j]) * opacity / 255;
> >> 
> >>       x = dst_x + j;
> >>       if ( y >= pic->w )
> >> 	break;
> >> 
> >>       row[0][x] = (k*c_y + (255-k)*row[0][x]) / 255;
> >>       row[1][x >> context->hsub] = (k*c_u + (255-k)*row[1][x >> context->hsub]) / 255;
> >>       row[2][x >> context->hsub] = (k*c_v + (255-k)*row[2][x >> context->hsub]) / 255;
> >>     }
> >> 
> >>     src += img->stride;
> >>   } 
> >> }
> 
> > It's always better to use the slice API, would be possible to
> > implement this overlay algorithm per-slice (I bet yes)?
> 
> What is slice API and where I can read about it?

Uh, use the source Luke! Basically with the term "API slice" I refer
to the use of draw_slice() rather than "sending" the whole frame with
end_frame(), you pass just a slice, it greatly helps cache locality
and so performance, especially for long and slow filter chains.

Check for example the eval filter (search in ffmpeg-devel) for a
simple example using it.

> > I don't think this is a good syntax, ':' is already used with a
> > special meaning (to separate params) in most filters, if you want to
> > support a list of key/val values (which looks like a good idea, at
> > least better than a list of unnamed params) then I think it's better
> > to use the syntax:
> > param1=val1:param2=val2...:paramN=valN
> Wanted to do this, but '=' was used by graph parsed, and I was
> unaware that equal signs can be quoted.
> 
> > As already noted, the patch as it is currently designed maybe won't be
> > accepted in SVN anyway, I think the "right" solution for the FFmpeg
> > standards would be to use/implement/extend somehow the native ASS
> > libavcodec decoder, then maybe we should also extend the lavfi API to
> > also support subtitle frames.
> 
> I have some interest in subtitle support and some spare time, so I'll
> try to implement the "right" solution.

Nice.

Regards.