[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame

Fri Nov 11 17:46:51 EET 2016

On Wed,  2 Nov 2016 23:09:34 +0100
Clément Bœsch <u at pkh.me> wrote:

> ---
> So this is just a prototype that is starting to work. It needs a split
> (and completion of the long TODO list) before review but I wanted to
> share it for early feedbacks.
> 
> So long story short: AVSubtitle → AVFrame, you have to use the new M:N
> API, and it's integrated into lavfi.
> 
> Now a bit longer: the AVFrame->extended_data are
> AVFrameSubtitleRectangle (declared in lavu/frame). Frames are
> refcounted. So far this is not really meaningful as a frame ref will
> alloc and copy every rectangle but that's a limitation we will have to
> deal with for now but it allows proper integration into the framework.
> 
> It actually simplifies a lot the workflow so I'm optimistic about the
> end result. Unfortunately, there is a lot of work remaining.
> 
> End note: despite my obsession with subtitles, I'd like to remind
> everyone that I actually hate working with them, I just believe it's
> really important to get that historical burden out of sight. That means,
> if anyone wants to help with that, they are extremelly welcome.
> 
> Thanks,
> 
> TODO:
> - what to do about sub2video?
> - properly deal with -ss and -t (need strim filter?)
> - sub_start_display/sub_end_display needs to be honored
> - find a test case for dvbsub as it's likely broken (ffmpeg.c hack is
>   removed and should be replaced by a EAGAIN logic in lavc/utils.c)
> - make it pass FATE:
>   * fix cc/subcc
>   * broke various other stuff
> - split commit
> - Changelog/APIchanges
> - proper API doxy
> - update lavfi/subtitles?
> ---
>  ffmpeg.c                   | 231 ++++++++++++++++-----------------------------
>  ffmpeg_filter.c            |  94 +++++++++++++++++-
>  ffmpeg_opt.c               |  33 ++++++-
>  libavcodec/avcodec.h       |   9 ++
>  libavcodec/utils.c         | 214 +++++++++++++++++++++++++++++++++++++++++
>  libavfilter/Makefile       |   3 +
>  libavfilter/allfilters.c   |   8 ++
>  libavfilter/avfilter.c     |   8 ++
>  libavfilter/buffersink.c   |  28 ++++++
>  libavfilter/buffersrc.c    |  59 ++++++++++++
>  libavfilter/fifo.c         |  31 ++++++
>  libavfilter/formats.c      |   4 +
>  libavfilter/internal.h     |   8 ++
>  libavfilter/sf_snull.c     |  48 ++++++++++
>  libavfilter/subtitle.c     |  60 ++++++++++++
>  libavfilter/subtitle.h     |  31 ++++++
>  libavfilter/vf_subtitles.c |  42 +++++----
>  libavformat/assenc.c       |   3 +
>  libavutil/frame.c          | 131 +++++++++++++++++++------
>  libavutil/frame.h          |  59 ++++++++++--
>  tests/ref/fate/sub-charenc |   6 +-
>  21 files changed, 895 insertions(+), 215 deletions(-)
>  create mode 100644 libavfilter/sf_snull.c
>  create mode 100644 libavfilter/subtitle.c
>  create mode 100644 libavfilter/subtitle.h
> 
> diff --git a/ffmpeg.c b/ffmpeg.c
> index 36a921a..8222b8b 100644
> --- a/ffmpeg.c
> +++ b/ffmpeg.c

Can't comment on those.

> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index 211112f..0912bff 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -4838,7 +4838,10 @@ int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture,
>   *                 must be freed with avsubtitle_free if *got_sub_ptr is set.
>   * @param[in,out] got_sub_ptr Zero if no subtitle could be decompressed, otherwise, it is nonzero.
>   * @param[in] avpkt The input AVPacket containing the input buffer.
> + *
> + * @deprecated Use avcodec_send_packet() and avcodec_receive_frame().
>   */
> +attribute_deprecated
>  int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
>                              int *got_sub_ptr,
>                              AVPacket *avpkt);
> @@ -5322,6 +5325,12 @@ attribute_deprecated
>  int avcodec_encode_video2(AVCodecContext *avctx, AVPacket *avpkt,
>                            const AVFrame *frame, int *got_packet_ptr);
>  
> +/**
> + * Encode a subtitle.
> + *
> + * @deprecated use avcodec_send_frame()/avcodec_receive_packet() instead
> + */
> +attribute_deprecated
>  int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
>                              const AVSubtitle *sub);
>  
> diff --git a/libavcodec/utils.c b/libavcodec/utils.c
> index 87de15f..add7edf 100644
> --- a/libavcodec/utils.c
> +++ b/libavcodec/utils.c
> @@ -2050,6 +2050,86 @@ int avcodec_encode_subtitle(AVCodecContext *avctx, uint8_t *buf, int buf_size,
>      return ret;
>  }
>  
> +// TODO: delete this compatibility code when all subtitles encoders moved
> +// to send_frame
> +static int encode_subtitle_frame(AVCodecContext *avctx,
> +                                 AVPacket *avpkt,
> +                                 const AVFrame *frame,
> +                                 int *got_packet_ptr)
> +{
> +    int i, ret;
> +
> +FF_DISABLE_DEPRECATION_WARNINGS
> +    AVPacket tmp_pkt;
> +    AVSubtitle subtitle;
> +
> +    get_subtitle_defaults(&subtitle);
> +
> +    /* Allocate (once) a temporary large output subtitle buffer */
> +    if (!avctx->internal->byte_buffer) {
> +        const int subtitle_out_max_size = 1024 * 1024;
> +        uint8_t *subtitle_out = av_malloc(subtitle_out_max_size);
> +        if (!subtitle_out)
> +            return AVERROR(ENOMEM);
> +
> +        avctx->internal->byte_buffer      = subtitle_out;
> +        avctx->internal->byte_buffer_size = subtitle_out_max_size;
> +    }

This is used for encoding below... does the old API really not have a
better way for this?

I would have thought 1MB is a bit small, but then again I looked at
ffmpeg.c, and it's using the same fixed size buffer.

> +
> +    /* Craft an AVSubtitle from the AVFrame */
> +    subtitle.format = frame->format == AV_PIX_FMT_NONE;
> +    subtitle.rects  = av_mallocz_array(frame->sub_nb_rects, sizeof(*subtitle.rects));
> +    if (!subtitle.rects)
> +        return AVERROR(ENOMEM);
> +    subtitle.num_rects = frame->sub_nb_rects;
> +    subtitle.pts = frame->pts;
> +
> +    for (i = 0; i < frame->sub_nb_rects; i++) {
> +        const AVFrameSubtitleRectangle *src_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
> +        AVSubtitleRect *dst_rect;
> +
> +        dst_rect = av_mallocz(sizeof(*subtitle.rects[i]));
> +        if (!dst_rect)
> +            return AVERROR(ENOMEM);

Leaks at least subtitle.rects (if you care...).

> +        subtitle.rects[i] = dst_rect;
> +
> +        if (subtitle.format) {
> +            dst_rect->type = SUBTITLE_ASS;
> +            dst_rect->ass  = src_rect->text;
> +        } else {
> +            dst_rect->type = SUBTITLE_BITMAP;
> +            dst_rect->x = src_rect->x;
> +            dst_rect->y = src_rect->x;
> +            dst_rect->w = src_rect->w;
> +            dst_rect->h = src_rect->h;
> +            memcpy(dst_rect->data,     src_rect->data,     sizeof(dst_rect->data));
> +            memcpy(dst_rect->linesize, src_rect->linesize, sizeof(dst_rect->linesize));
> +        }
> +        dst_rect->flags = src_rect->flags;
> +    }
> +
> +    /* Send it to the old API */
> +    ret = avcodec_encode_subtitle(avctx,
> +                                  avctx->internal->byte_buffer,
> +                                  avctx->internal->byte_buffer_size,
> +                                  &subtitle);
> +    if (ret < 0)
> +        return ret;
> +
> +    /* Wrap the encoded buffer into a ref-counted AVPacket */
> +    av_init_packet(&tmp_pkt);
> +    tmp_pkt.data = avctx->internal->byte_buffer;
> +    tmp_pkt.size = ret;

> +    av_packet_ref(avpkt, &tmp_pkt);

Can fail.

> +    avpkt->pts = frame->pts;
> +    avpkt->dts = frame->pkt_dts;
> +    avpkt->duration = frame->pkt_duration;
> +FF_ENABLE_DEPRECATION_WARNINGS
> +
> +    *got_packet_ptr = 1;
> +    return ret;
> +}
> +
>  /**
>   * Attempt to guess proper monotonic timestamps for decoded video frames
>   * which might have incorrect times. Input timestamps may wrap around, in
> @@ -2764,6 +2844,134 @@ void avsubtitle_free(AVSubtitle *sub)
>      memset(sub, 0, sizeof(AVSubtitle));
>  }
>  
> +static int decode_subtitle_frame(AVCodecContext *avctx,
> +                                 AVFrame *frame,
> +                                 int *got_frame_ptr,
> +                                 const AVPacket *avpkt)
> +{
> +    int i, ret = 0;
> +
> +    *got_frame_ptr = 0;
> +
> +    if (!avctx->codec)
> +        return AVERROR(EINVAL);
> +
> +    if (!avpkt->data && avpkt->size) {
> +        av_log(avctx, AV_LOG_ERROR, "invalid packet: NULL data, size != 0\n");
> +        return AVERROR(EINVAL);
> +    }

didn't vobsubs have 0-sized packets at some point? Also, I think the
caller should check this (avcodec_send_packet).

> +
> +    av_assert0(avctx->codec->type == AVMEDIA_TYPE_SUBTITLE);
> +
> +    av_frame_unref(frame);

It's already guaranteed to be unreffed.

> +
> +    if ((avctx->codec->capabilities & AV_CODEC_CAP_DELAY) || avpkt->size) {
> +        AVPacket pkt_recoded;
> +        AVPacket tmp = *avpkt;
> +        int did_split = av_packet_split_side_data(&tmp);
> +
> +        if (did_split) {
> +            /* FFMIN() prevents overflow in case the packet wasn't allocated with
> +             * proper padding.
> +             * If the side data is smaller than the buffer padding size, the
> +             * remaining bytes should have already been filled with zeros by the
> +             * original packet allocation anyway. */
> +            memset(tmp.data + tmp.size, 0,
> +                   FFMIN(avpkt->size - tmp.size, AV_INPUT_BUFFER_PADDING_SIZE));
> +        }
> +        pkt_recoded = tmp;
> +
> +        ret = recode_subtitle(avctx, &pkt_recoded, &tmp);
> +        if (ret >= 0) {

IMO early exit would be nicer...

This is mostly the old code copy&pasted, I assume.

> +            int err_ret;
> +            AVSubtitle sub = {0};
> +
> +            avctx->internal->pkt = &pkt_recoded;
> +
> +            ret = avctx->codec->decode(avctx, &sub, got_frame_ptr, &pkt_recoded);
> +            av_assert1((ret >= 0) >= !!*got_frame_ptr &&
> +                       !!*got_frame_ptr >= !!sub.num_rects);
> +
> +            for (i = 0; i < sub.num_rects; i++) {
> +                if (sub.rects[i]->ass && !utf8_check(sub.rects[i]->ass)) {
> +                    av_log(avctx, AV_LOG_ERROR,
> +                           "Invalid UTF-8 in decoded subtitles text; "
> +                           "maybe missing -sub_charenc option\n");
> +                    avsubtitle_free(&sub);
> +                    return AVERROR_INVALIDDATA;
> +                }
> +            }

Hate.

> +
> +            frame->pts                   =
> +            frame->best_effort_timestamp = avpkt->pts;
> +            frame->pkt_duration          = avpkt->duration;
> +            frame->pkt_pos               = avpkt->pos;
> +            frame->sub_nb_rects          = sub.num_rects;
> +
> +            if (avctx->codec_descriptor->props & AV_CODEC_PROP_BITMAP_SUB)
> +                frame->format = AV_PIX_FMT_PAL8;
> +            else if (avctx->codec_descriptor->props & AV_CODEC_PROP_TEXT_SUB)
> +                frame->format = AV_PIX_FMT_NONE;

AFAIK these are just hints, not the definitive format. Why not use
sub.format?

> +
> +            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
> +                                                              : AV_NOPTS_VALUE;
> +            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
> +                                                              : AV_NOPTS_VALUE;

end_display_time can be UINT32_MAX for "infinite". I think we should
clean this up here, and differentiate between "0", "unknown", and
"until next event".

> +            /* Allocate sub_nb_rects AVFrameSubtitleRectangle */
> +            err_ret = av_frame_get_buffer(frame, 0);
> +            if (err_ret < 0) {
> +                avsubtitle_free(&sub);
> +                return err_ret;
> +            }
> +
> +            /* Transfer data from AVSubtitleRect to AVFrameSubtitleRectangle */
> +            for (i = 0; i < sub.num_rects; i++) {
> +                AVSubtitleRect *src_rect = sub.rects[i];
> +                AVFrameSubtitleRectangle *dst_rect = (AVFrameSubtitleRectangle *)frame->extended_data[i];
> +
> +                if (frame->format == AV_PIX_FMT_NONE) {
> +                    dst_rect->text = src_rect->ass;
> +                } else {
> +                    dst_rect->x = src_rect->x;
> +                    dst_rect->y = src_rect->y;
> +                    dst_rect->w = src_rect->w;
> +                    dst_rect->h = src_rect->h;
> +                    memcpy(dst_rect->data,     src_rect->data,     sizeof(src_rect->data));
> +                    memcpy(dst_rect->linesize, src_rect->linesize, sizeof(src_rect->linesize));
> +                }
> +                dst_rect->flags = src_rect->flags;
> +
> +                // we free and make sure it is set to NULL so the data inside
> +                // is not freed after destructing the AVSubtitle
> +                av_freep(&sub.rects[i]);
> +            }
> +            sub.num_rects = 0;
> +            avsubtitle_free(&sub);
> +
> +            if (tmp.data != pkt_recoded.data) { // did we recode?
> +                /* prevent from destroying side data from original packet */
> +                pkt_recoded.side_data = NULL;
> +                pkt_recoded.side_data_elems = 0;
> +
> +                av_packet_unref(&pkt_recoded);
> +            }
> +            avctx->internal->pkt = NULL;
> +        }
> +
> +        if (did_split) {
> +            av_packet_free_side_data(&tmp);

This leaks if the code above returns due to an error, I think?

> +            if(ret == tmp.size)
> +                ret = avpkt->size;
> +        }
> +
> +        if (*got_frame_ptr)
> +            avctx->frame_number++;
> +    }
> +
> +    return ret;
> +}
> +
>  static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
>  {
>      int got_frame;
> @@ -2792,6 +3000,9 @@ static int do_decode(AVCodecContext *avctx, AVPacket *pkt)
>      } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
>          ret = avcodec_decode_audio4(avctx, avctx->internal->buffer_frame,
>                                      &got_frame, pkt);
> +    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
> +        ret = decode_subtitle_frame(avctx, avctx->internal->buffer_frame,
> +                                    &got_frame, pkt);
>      } else {
>          ret = AVERROR(EINVAL);
>      }
> @@ -2941,6 +3152,9 @@ static int do_encode(AVCodecContext *avctx, const AVFrame *frame, int *got_packe
>      } else if (avctx->codec_type == AVMEDIA_TYPE_AUDIO) {
>          ret = avcodec_encode_audio2(avctx, avctx->internal->buffer_pkt,
>                                      frame, got_packet);
> +    } else if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE) {
> +        ret = encode_subtitle_frame(avctx, avctx->internal->buffer_pkt,
> +                                    frame, got_packet);
>      } else {
>          ret = AVERROR(EINVAL);
>      }
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 7ed4696..5972b27 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -329,6 +329,9 @@ OBJS-$(CONFIG_YUVTESTSRC_FILTER)             += vsrc_testsrc.o
>  
>  OBJS-$(CONFIG_NULLSINK_FILTER)               += vsink_nullsink.o
>  
> +# subtitle filters
> +OBJS-$(CONFIG_SNULL_FILTER)                  += sf_snull.o
> +
>  # multimedia filters
>  OBJS-$(CONFIG_ADRAWGRAPH_FILTER)             += f_drawgraph.o
>  OBJS-$(CONFIG_AHISTOGRAM_FILTER)             += avf_ahistogram.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 82a65ee..e5d20cd 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -45,6 +45,7 @@ void avfilter_register_all(void)
>          return;
>      initialized = 1;
>  
> +    /* audio filters */
>      REGISTER_FILTER(ABENCH,         abench,         af);
>      REGISTER_FILTER(ACOMPRESSOR,    acompressor,    af);
>      REGISTER_FILTER(ACROSSFADE,     acrossfade,     af);
> @@ -138,6 +139,7 @@ void avfilter_register_all(void)
>  
>      REGISTER_FILTER(ANULLSINK,      anullsink,      asink);
>  
> +    /* video filters */
>      REGISTER_FILTER(ALPHAEXTRACT,   alphaextract,   vf);
>      REGISTER_FILTER(ALPHAMERGE,     alphamerge,     vf);
>      REGISTER_FILTER(ASS,            ass,            vf);
> @@ -345,6 +347,9 @@ void avfilter_register_all(void)
>  
>      REGISTER_FILTER(NULLSINK,       nullsink,       vsink);
>  
> +    /* subtitle filters */
> +    REGISTER_FILTER(SNULL,          snull,          sf);
> +
>      /* multimedia filters */
>      REGISTER_FILTER(ADRAWGRAPH,     adrawgraph,     avf);
>      REGISTER_FILTER(AHISTOGRAM,     ahistogram,     avf);
> @@ -367,10 +372,13 @@ void avfilter_register_all(void)
>      /* those filters are part of public or internal API => registered
>       * unconditionally */
>      REGISTER_FILTER_UNCONDITIONAL(asrc_abuffer);
> +    REGISTER_FILTER_UNCONDITIONAL(ssrc_sbuffer);
>      REGISTER_FILTER_UNCONDITIONAL(vsrc_buffer);
>      REGISTER_FILTER_UNCONDITIONAL(asink_abuffer);
> +    REGISTER_FILTER_UNCONDITIONAL(ssink_sbuffer);
>      REGISTER_FILTER_UNCONDITIONAL(vsink_buffer);
>      REGISTER_FILTER_UNCONDITIONAL(af_afifo);
> +    REGISTER_FILTER_UNCONDITIONAL(sf_sfifo);
>      REGISTER_FILTER_UNCONDITIONAL(vf_fifo);
>      ff_opencl_register_filter_kernel_code_all();
>  }
> diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
> index 1d469c3..908ab2c 100644
> --- a/libavfilter/avfilter.c
> +++ b/libavfilter/avfilter.c
> @@ -314,6 +314,12 @@ int avfilter_config_links(AVFilterContext *filter)
>  
>                  if (!link->time_base.num && !link->time_base.den)
>                      link->time_base = (AVRational) {1, link->sample_rate};
> +                break;
> +
> +            case AVMEDIA_TYPE_SUBTITLE:
> +                if (!link->time_base.num && !link->time_base.den)
> +                    link->time_base = inlink ? inlink->time_base : AV_TIME_BASE_Q;
> +                break;
>              }
>  
>              if (link->src->nb_inputs && link->src->inputs[0]->hw_frames_ctx &&
> @@ -1201,6 +1207,8 @@ int ff_filter_frame(AVFilterLink *link, AVFrame *frame)
>              av_assert1(frame->width               == link->w);
>              av_assert1(frame->height               == link->h);
>          }
> +    } else if (link->type == AVMEDIA_TYPE_SUBTITLE) {
> +        // TODO?
>      } else {
>          if (frame->format != link->format) {
>              av_log(link->dst, AV_LOG_ERROR, "Format change is not supported\n");
> diff --git a/libavfilter/buffersink.c b/libavfilter/buffersink.c
> index 2feb56d..f38fb1a 100644
> --- a/libavfilter/buffersink.c
> +++ b/libavfilter/buffersink.c
> @@ -388,6 +388,11 @@ static int asink_query_formats(AVFilterContext *ctx)
>      return 0;
>  }
>  
> +static av_cold int ssink_init(AVFilterContext *ctx, void *opaque)
> +{
> +    return common_init(ctx);
> +}
> +
>  #define OFFSET(x) offsetof(BufferSinkContext, x)
>  #define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
>  static const AVOption buffersink_options[] = {
> @@ -405,9 +410,11 @@ static const AVOption abuffersink_options[] = {
>      { NULL },
>  };
>  #undef FLAGS
> +static const AVOption sbuffersink_options[] = {NULL};
>  
>  AVFILTER_DEFINE_CLASS(buffersink);
>  AVFILTER_DEFINE_CLASS(abuffersink);
> +AVFILTER_DEFINE_CLASS(sbuffersink);
>  
>  static const AVFilterPad avfilter_vsink_buffer_inputs[] = {
>      {
> @@ -452,3 +459,24 @@ AVFilter ff_asink_abuffer = {
>      .inputs      = avfilter_asink_abuffer_inputs,
>      .outputs     = NULL,
>  };
> +
> +static const AVFilterPad avfilter_ssink_sbuffer_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_SUBTITLE,
> +        .filter_frame = filter_frame,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_ssink_sbuffer = {
> +    .name          = "sbuffersink",
> +    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them available to the end of the filter graph."),
> +    .priv_class    = &sbuffersink_class,
> +    .priv_size     = sizeof(BufferSinkContext),
> +    .init_opaque   = ssink_init,
> +    .uninit        = uninit,
> +    //.query_formats = ssink_query_formats,
> +    .inputs        = avfilter_ssink_sbuffer_inputs,
> +    .outputs       = NULL,
> +};
> diff --git a/libavfilter/buffersrc.c b/libavfilter/buffersrc.c
> index 9294811..c20b7b2 100644
> --- a/libavfilter/buffersrc.c
> +++ b/libavfilter/buffersrc.c
> @@ -63,6 +63,9 @@ typedef struct BufferSourceContext {
>      uint64_t channel_layout;
>      char    *channel_layout_str;
>  
> +    /* subtitle only */
> +    int sub_format;
> +
>      int got_format_from_params;
>      int eof;
>  } BufferSourceContext;
> @@ -128,6 +131,8 @@ int av_buffersrc_parameters_set(AVFilterContext *ctx, AVBufferSrcParameters *par
>          if (param->channel_layout)
>              s->channel_layout = param->channel_layout;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR_BUG;
>      }
> @@ -204,6 +209,8 @@ static int av_buffersrc_add_frame_internal(AVFilterContext *ctx,
>          CHECK_AUDIO_PARAM_CHANGE(ctx, s, frame->sample_rate, frame->channel_layout,
>                                   av_frame_get_channels(frame), frame->format);
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -271,6 +278,7 @@ unsigned av_buffersrc_get_nb_failed_requests(AVFilterContext *buffer_src)
>  #define OFFSET(x) offsetof(BufferSourceContext, x)
>  #define A AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_AUDIO_PARAM
>  #define V AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> +#define S AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_SUBTITLE_PARAM
>  
>  static const AVOption buffer_options[] = {
>      { "width",         NULL,                     OFFSET(w),                AV_OPT_TYPE_INT,      { .i64 = 0 }, 0, INT_MAX, V },
> @@ -306,6 +314,17 @@ static const AVOption abuffer_options[] = {
>  
>  AVFILTER_DEFINE_CLASS(abuffer);
>  
> +static const AVOption sbuffer_options[] = {
> +    { "time_base",      NULL, OFFSET(time_base),  AV_OPT_TYPE_RATIONAL, { .dbl = 0 }, 0, INT_MAX, S },
> +    { "format",         NULL, OFFSET(sub_format), AV_OPT_TYPE_INT,      { .i64 = -1 }, -1, 1, S, "format" },
> +        { "unspecified", NULL, 0, AV_OPT_TYPE_CONST, {.i64=-1}, INT_MIN, INT_MAX, S, "format" },
> +        { "bitmap",      NULL, 0, AV_OPT_TYPE_CONST, {.i64=0},  INT_MIN, INT_MAX, S, "format" },
> +        { "text",        NULL, 0, AV_OPT_TYPE_CONST, {.i64=1},  INT_MIN, INT_MAX, S, "format" },

Implies no mixed subs. I'm fine with that, but might be restrictive.

> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(sbuffer);
> +
>  static av_cold int init_audio(AVFilterContext *ctx)
>  {
>      BufferSourceContext *s = ctx->priv;
> @@ -397,6 +416,10 @@ static int query_formats(AVFilterContext *ctx)
>          if ((ret = ff_set_common_channel_layouts(ctx, channel_layouts)) < 0)
>              return ret;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        if ((ret = ff_add_format(&formats, c->sub_format)) < 0)
> +            return ret;
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -424,6 +447,8 @@ static int config_props(AVFilterLink *link)
>          if (!c->channel_layout)
>              c->channel_layout = link->channel_layout;
>          break;
> +    case AVMEDIA_TYPE_SUBTITLE:
> +        break;
>      default:
>          return AVERROR(EINVAL);
>      }
> @@ -461,6 +486,15 @@ static int poll_frame(AVFilterLink *link)
>      return size/sizeof(AVFrame*);
>  }
>  
> +static av_cold int init_subtitle(AVFilterContext *ctx)
> +{
> +    BufferSourceContext *c = ctx->priv;
> +
> +    if (!(c->fifo = av_fifo_alloc(sizeof(AVFrame*))))
> +        return AVERROR(ENOMEM);
> +    return 0;
> +}
> +
>  static const AVFilterPad avfilter_vsrc_buffer_outputs[] = {
>      {
>          .name          = "default",
> @@ -510,3 +544,28 @@ AVFilter ff_asrc_abuffer = {
>      .outputs   = avfilter_asrc_abuffer_outputs,
>      .priv_class = &abuffer_class,
>  };
> +
> +static const AVFilterPad avfilter_ssrc_sbuffer_outputs[] = {
> +    {
> +        .name          = "default",
> +        .type          = AVMEDIA_TYPE_SUBTITLE,
> +        .request_frame = request_frame,
> +        .poll_frame    = poll_frame,
> +        .config_props  = config_props,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_ssrc_sbuffer = {
> +    .name          = "sbuffer",
> +    .description   = NULL_IF_CONFIG_SMALL("Buffer subtitle frames, and make them accessible to the filterchain."),
> +    .priv_size     = sizeof(BufferSourceContext),
> +    .query_formats = query_formats,
> +
> +    .init          = init_subtitle,
> +    .uninit        = uninit,
> +
> +    .inputs        = NULL,
> +    .outputs       = avfilter_ssrc_sbuffer_outputs,
> +    .priv_class    = &sbuffer_class,
> +};
> diff --git a/libavfilter/fifo.c b/libavfilter/fifo.c
> index abfbba1..feda5bd 100644
> --- a/libavfilter/fifo.c
> +++ b/libavfilter/fifo.c
> @@ -313,3 +313,34 @@ AVFilter ff_af_afifo = {
>      .inputs    = avfilter_af_afifo_inputs,
>      .outputs   = avfilter_af_afifo_outputs,
>  };
> +
> +static const AVFilterPad avfilter_sf_sfifo_inputs[] = {
> +    {
> +        .name             = "default",
> +        .type             = AVMEDIA_TYPE_SUBTITLE,
> +        .filter_frame     = add_to_queue,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad avfilter_sf_sfifo_outputs[] = {
> +    {
> +        .name          = "default",
> +        .type          = AVMEDIA_TYPE_SUBTITLE,
> +        .request_frame = request_frame,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_sf_sfifo = {
> +    .name        = "sfifo",
> +    .description = NULL_IF_CONFIG_SMALL("Buffer input subtitles and send them when they are requested."),
> +
> +    .init      = init,
> +    .uninit    = uninit,
> +
> +    .priv_size = sizeof(FifoContext),
> +
> +    .inputs    = avfilter_sf_sfifo_inputs,
> +    .outputs   = avfilter_sf_sfifo_outputs,
> +};
> diff --git a/libavfilter/formats.c b/libavfilter/formats.c
> index 20f45e3..66bf138 100644
> --- a/libavfilter/formats.c
> +++ b/libavfilter/formats.c
> @@ -364,6 +364,10 @@ AVFilterFormats *ff_all_formats(enum AVMediaType type)
>                  return NULL;
>              fmt++;
>          }
> +    } else if (type == AVMEDIA_TYPE_SUBTITLE) {
> +        if (ff_add_format(&ret, 0 /* bitmap */) < 0 ||
> +            ff_add_format(&ret, 1 /* text */) < 0)
> +            return NULL;

Questionable, is probably preliminary code.

>      }
>  
>      return ret;
> diff --git a/libavfilter/internal.h b/libavfilter/internal.h
> index 3856012..cd8db73 100644
> --- a/libavfilter/internal.h
> +++ b/libavfilter/internal.h
> @@ -142,6 +142,14 @@ struct AVFilterPad {
>       * input pads only.
>       */
>      int needs_writable;
> +
> +    /**
> +     * Callback function to get a subtitle frame. If NULL, the filter system
> +     * will use ff_default_get_subtitle_buffer().
> +     *
> +     * Input subtitle pads only.
> +     */
> +    AVFrame *(*get_subtitle_buffer)(AVFilterLink *link, int nb_rects);
>  };
>  
>  struct AVFilterGraphInternal {
> diff --git a/libavfilter/sf_snull.c b/libavfilter/sf_snull.c
> new file mode 100644
> index 0000000..c20afdd
> --- /dev/null
> +++ b/libavfilter/sf_snull.c
> @@ -0,0 +1,48 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * null subtitle filter
> + */
> +
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +static const AVFilterPad sf_snull_inputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_SUBTITLE,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad sf_snull_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_SUBTITLE,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_sf_snull = {
> +    .name          = "snull",
> +    .description   = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."),
> +    .inputs        = sf_snull_inputs,
> +    .outputs       = sf_snull_outputs,
> +};
> diff --git a/libavfilter/subtitle.c b/libavfilter/subtitle.c
> new file mode 100644
> index 0000000..88e43d5
> --- /dev/null
> +++ b/libavfilter/subtitle.c
> @@ -0,0 +1,60 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/common.h"
> +
> +#include "subtitle.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    return ff_get_subtitle_buffer(link->dst->outputs[0], nb_rects);
> +}
> +
> +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects)
> +{
> +    AVFrame *frame = av_frame_alloc();
> +    int ret;
> +
> +    if (!frame)
> +        return NULL;
> +
> +    frame->sub_nb_rects = nb_rects;
> +    frame->format       = link->format;
> +    ret = av_frame_get_buffer(frame, 0);
> +    if (ret < 0) {
> +        av_frame_free(&frame);
> +        return NULL;
> +    }
> +
> +    return frame;
> +}
> +
> +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_samples)
> +{
> +    AVFrame *ret = NULL;
> +
> +    if (link->dstpad->get_subtitle_buffer)
> +        ret = link->dstpad->get_audio_buffer(link, nb_samples);
> +
> +    if (!ret)
> +        ret = ff_default_get_audio_buffer(link, nb_samples);
> +
> +    return ret;
> +}
> diff --git a/libavfilter/subtitle.h b/libavfilter/subtitle.h
> new file mode 100644
> index 0000000..1db9ebe
> --- /dev/null
> +++ b/libavfilter/subtitle.h
> @@ -0,0 +1,31 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVFILTER_SUBTITLE_H
> +#define AVFILTER_SUBTITLE_H
> +
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +AVFrame *ff_default_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +AVFrame *ff_null_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +AVFrame *ff_get_subtitle_buffer(AVFilterLink *link, int nb_rects);
> +
> +#endif /* AVFILTER_SUBTITLE_H */
> diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c
> index 0f22644..20b3c77 100644
> --- a/libavfilter/vf_subtitles.c
> +++ b/libavfilter/vf_subtitles.c
> @@ -305,6 +305,10 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
>      AVStream *st;
>      AVPacket pkt;
>      AssContext *ass = ctx->priv;
> +    AVFrame *frame = av_frame_alloc();
> +
> +    if (!frame)
> +        return AVERROR(ENOMEM);
>  
>      /* Init libass */
>      ret = init(ctx);
> @@ -447,32 +451,32 @@ static av_cold int init_subtitles(AVFilterContext *ctx)
>      pkt.data = NULL;
>      pkt.size = 0;
>      while (av_read_frame(fmt, &pkt) >= 0) {
> -        int i, got_subtitle;
> -        AVSubtitle sub = {0};
> -
>          if (pkt.stream_index == sid) {
> -            ret = avcodec_decode_subtitle2(dec_ctx, &sub, &got_subtitle, &pkt);
> -            if (ret < 0) {
> -                av_log(ctx, AV_LOG_WARNING, "Error decoding: %s (ignored)\n",
> -                       av_err2str(ret));
> -            } else if (got_subtitle) {
> -                const int64_t start_time = av_rescale_q(sub.pts, AV_TIME_BASE_Q, av_make_q(1, 1000));
> -                const int64_t duration   = sub.end_display_time;
> -                for (i = 0; i < sub.num_rects; i++) {
> -                    char *ass_line = sub.rects[i]->ass;
> +            int i;
> +
> +            ret = avcodec_send_packet(dec_ctx, &pkt);
> +            if (ret < 0)
> +                break;
> +            // XXX probably not correct api usage
> +            ret = avcodec_receive_frame(dec_ctx, frame);
> +            if (ret >= 0) {

I've found that converting code which used to old API to the new API is
very messy. I think that's mostly because the data flow is rather
different.

Correct API usage would be something like:

  while (1) {
    if (av_codec_send_packet() == AVERROR_EAGAIN)
      add_packet_back_to_demuxer_queue()
    while (1) {
      ret = avcodec_receive_frame
      if (ret == AVERROR_EAGAIN)
         break; // new packet needed
      /* process frame */
    }
  }

Plus EOF and error handling.

avcodec_send_packet() returning EAGAIN in this specific situation can
probably be reasonably treated as error instead. Unless the decoder
violates the declared API rules.

> +                // XXX: honor start/end display time
> +                const int64_t start_time = av_rescale_q(frame->pts,          st->time_base, av_make_q(1, 1000));
> +                const int64_t duration   = av_rescale_q(frame->pkt_duration, st->time_base, av_make_q(1, 1000));
> +
> +                for (i = 0; i < frame->sub_nb_rects; i++) {
> +                    AVSubtitleRect *rect = (AVSubtitleRect *)frame->extended_data[i]; // XXX move to lavu
> +                    char *ass_line = rect->ass;
>                      if (!ass_line)
>                          break;
> -                    if (LIBAVCODEC_VERSION_INT < AV_VERSION_INT(57,25,100))
> -                        ass_process_data(ass->track, ass_line, strlen(ass_line));
> -                    else
> -                        ass_process_chunk(ass->track, ass_line, strlen(ass_line),
> -                                          start_time, duration);
> +                    ass_process_chunk(ass->track, ass_line, strlen(ass_line),
> +                                      start_time, duration);
>                  }
>              }
> +            av_packet_unref(&pkt);
>          }
> -        av_packet_unref(&pkt);
> -        avsubtitle_free(&sub);
>      }
> +    av_frame_free(&frame);
>  
>  end:
>      av_dict_free(&codec_opts);
> diff --git a/libavformat/assenc.c b/libavformat/assenc.c
> index d50f18f..97a8de8 100644
> --- a/libavformat/assenc.c
> +++ b/libavformat/assenc.c
> @@ -164,6 +164,9 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
>      int hh2, mm2, ss2, ms2;
>      DialogueLine *dialogue = av_mallocz(sizeof(*dialogue));
>  
> +    if (pkt->duration < 0)
> +        end = INT64_MAX;

Wait what.

> +
>      if (!dialogue)
>          return AVERROR(ENOMEM);
>  
> diff --git a/libavutil/frame.c b/libavutil/frame.c
> index 53e6174..8bb3ed2 100644
> --- a/libavutil/frame.c
> +++ b/libavutil/frame.c
> @@ -216,52 +216,33 @@ fail:
>      return AVERROR(ENOMEM);
>  }
>  
> -static int get_audio_buffer(AVFrame *frame, int align)
> +static int get_data_buffer(AVFrame *frame, int n, size_t size)
>  {
> -    int channels;
> -    int planar   = av_sample_fmt_is_planar(frame->format);
> -    int planes;
> -    int ret, i;
> -
> -    if (!frame->channels)
> -        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
> -
> -    channels = frame->channels;
> -    planes = planar ? channels : 1;
> -
> -    CHECK_CHANNELS_CONSISTENCY(frame);
> -    if (!frame->linesize[0]) {
> -        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
> -                                         frame->nb_samples, frame->format,
> -                                         align);
> -        if (ret < 0)
> -            return ret;
> -    }
> +    int i;
>  
> -    if (planes > AV_NUM_DATA_POINTERS) {
> -        frame->extended_data = av_mallocz_array(planes,
> -                                          sizeof(*frame->extended_data));
> -        frame->extended_buf  = av_mallocz_array((planes - AV_NUM_DATA_POINTERS),
> -                                          sizeof(*frame->extended_buf));
> +    if (n > AV_NUM_DATA_POINTERS) {
> +        frame->extended_data = av_mallocz_array(n, sizeof(*frame->extended_data));
> +        frame->extended_buf  = av_mallocz_array(n - AV_NUM_DATA_POINTERS,
> +                                                sizeof(*frame->extended_buf));
>          if (!frame->extended_data || !frame->extended_buf) {
>              av_freep(&frame->extended_data);
>              av_freep(&frame->extended_buf);
>              return AVERROR(ENOMEM);
>          }
> -        frame->nb_extended_buf = planes - AV_NUM_DATA_POINTERS;
> +        frame->nb_extended_buf = n - AV_NUM_DATA_POINTERS;
>      } else
>          frame->extended_data = frame->data;
>  
> -    for (i = 0; i < FFMIN(planes, AV_NUM_DATA_POINTERS); i++) {
> -        frame->buf[i] = av_buffer_alloc(frame->linesize[0]);
> +    for (i = 0; i < FFMIN(n, AV_NUM_DATA_POINTERS); i++) {
> +        frame->buf[i] = av_buffer_alloc(size);
>          if (!frame->buf[i]) {
>              av_frame_unref(frame);
>              return AVERROR(ENOMEM);
>          }
>          frame->extended_data[i] = frame->data[i] = frame->buf[i]->data;
>      }
> -    for (i = 0; i < planes - AV_NUM_DATA_POINTERS; i++) {
> -        frame->extended_buf[i] = av_buffer_alloc(frame->linesize[0]);
> +    for (i = 0; i < n - AV_NUM_DATA_POINTERS; i++) {
> +        frame->extended_buf[i] = av_buffer_alloc(size);
>          if (!frame->extended_buf[i]) {
>              av_frame_unref(frame);
>              return AVERROR(ENOMEM);
> @@ -269,11 +250,50 @@ static int get_audio_buffer(AVFrame *frame, int align)
>          frame->extended_data[i + AV_NUM_DATA_POINTERS] = frame->extended_buf[i]->data;
>      }
>      return 0;
> +}
> +
> +static int get_audio_buffer(AVFrame *frame, int align)
> +{
> +    int channels;
> +    int planar   = av_sample_fmt_is_planar(frame->format);
> +    int planes;
> +    int ret;
> +
> +    if (!frame->channels)
> +        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
>  
> +    channels = frame->channels;
> +    planes = planar ? channels : 1;
> +
> +    CHECK_CHANNELS_CONSISTENCY(frame);
> +    if (!frame->linesize[0]) {
> +        ret = av_samples_get_buffer_size(&frame->linesize[0], channels,
> +                                         frame->nb_samples, frame->format,
> +                                         align);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    return get_data_buffer(frame, planes, frame->linesize[0]);
> +}
> +
> +static int get_subtitle_buffer(AVFrame *frame)
> +{
> +    int i, ret;
> +
> +    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
> +    if (ret < 0)
> +        return ret;
> +    for (i = 0; i < frame->sub_nb_rects; i++)
> +        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
> +    return 0;

I'm noticing it doesn't allocate the actual subtitle bitmap or text
data, which is a difference from audio/video.

>  }
>  
>  int av_frame_get_buffer(AVFrame *frame, int align)
>  {
> +    if (frame->sub_nb_rects)
> +        return get_subtitle_buffer(frame);

No format set is valid for subtitles?

> +
>      if (frame->format < 0)
>          return AVERROR(EINVAL);
>  
> @@ -320,6 +340,8 @@ FF_ENABLE_DEPRECATION_WARNINGS
>      dst->colorspace             = src->colorspace;
>      dst->color_range            = src->color_range;
>      dst->chroma_location        = src->chroma_location;
> +    dst->sub_start_display      = src->sub_start_display;
> +    dst->sub_end_display        = src->sub_end_display;
>  
>      av_dict_copy(&dst->metadata, src->metadata, 0);
>  
> @@ -561,6 +583,7 @@ int av_frame_make_writable(AVFrame *frame)
>      tmp.channels       = frame->channels;
>      tmp.channel_layout = frame->channel_layout;
>      tmp.nb_samples     = frame->nb_samples;
> +    tmp.sub_nb_rects   = frame->sub_nb_rects;
>      ret = av_frame_get_buffer(&tmp, 32);
>      if (ret < 0)
>          return ret;
> @@ -716,11 +739,57 @@ static int frame_copy_audio(AVFrame *dst, const AVFrame *src)
>      return 0;
>  }
>  
> +static int frame_copy_subtitle(AVFrame *dst, const AVFrame *src)
> +{
> +    int i, ret;
> +
> +    for (i = 0; i < dst->sub_nb_rects; i++) {
> +        const AVFrameSubtitleRectangle *src_rect = (const AVFrameSubtitleRectangle *)src->extended_data[i];
> +        AVFrameSubtitleRectangle       *dst_rect = (AVFrameSubtitleRectangle *)dst->extended_data[i];
> +
> +        memset(dst_rect, 0, sizeof(*dst_rect));
> +
> +        dst_rect->flags = src_rect->flags;
> +
> +        if (src->format != AV_PIX_FMT_NONE) {
> +            /* (Alloc and) copy bitmap subtitle */
> +
> +            av_assert1(!src_rect->text);
> +
> +            dst_rect->x = src_rect->x;
> +            dst_rect->y = src_rect->y;
> +            dst_rect->w = src_rect->w;
> +            dst_rect->h = src_rect->h;
> +
> +            ret = av_image_alloc(dst_rect->data, dst_rect->linesize,
> +                                 dst_rect->w, dst_rect->h,
> +                                 dst->format, 0);
> +            if (ret < 0)
> +                return ret;
> +            av_image_copy(dst_rect->data, dst_rect->linesize,
> +                          src_rect->data, src_rect->linesize,
> +                          dst->format, dst_rect->w, dst_rect->h);
> +        } else {
> +            /* Copy text subtitle */
> +
> +            av_assert1(!src_rect->x && !src_rect->y);
> +            av_assert1(!src_rect->w && !src_rect->w);
> +
> +            dst_rect->text = av_strdup(src_rect->text);
> +            if (!dst_rect->text)
> +                return AVERROR(ENOMEM);
> +        }
> +    }
> +    return 0;
> +}
> +
>  int av_frame_copy(AVFrame *dst, const AVFrame *src)
>  {
> -    if (dst->format != src->format || dst->format < 0)
> +    if (dst->format != src->format || (dst->format < 0 && !dst->sub_nb_rects))
>          return AVERROR(EINVAL);
>  
> +    if (dst->sub_nb_rects)
> +        return frame_copy_subtitle(dst, src);
>      if (dst->width > 0 && dst->height > 0)
>          return frame_copy_video(dst, src);
>      else if (dst->nb_samples > 0 && dst->channel_layout)

Maybe I glanced over it, but it doesn't explicitly reset the subtitle
fields. I think this should happen for unknown durations/start times
at least.

> diff --git a/libavutil/frame.h b/libavutil/frame.h
> index 8e51361..d531cfa 100644
> --- a/libavutil/frame.h
> +++ b/libavutil/frame.h
> @@ -148,6 +148,25 @@ typedef struct AVFrameSideData {
>      AVBufferRef *buf;
>  } AVFrameSideData;
>  
> +#define AV_NUM_DATA_POINTERS 8
> +
> +/**
> + * This structure describes decoded subtitle rectangle
> + */
> +typedef struct AVFrameSubtitleRectangle {
> +    int x, y;
> +    int w, h;

Might require clarification that this is strictly for bitmap subtitles
only.

> +
> +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> +    uint8_t *data[AV_NUM_DATA_POINTERS];
> +    int linesize[AV_NUM_DATA_POINTERS];

Does it define the format somewhere?

> +
> +    /* decoded text for text subtitles, in ASS */
> +    char *text;

I'm fine with this - other would probably argue this should be in
data[0] or so.

> +
> +    int flags;

Which flags (missing doc.)?

> +} AVFrameSubtitleRectangle;

The struct should probably be extendable (so its size is not part of
the ABI).

> +
>  /**
>   * This structure describes decoded (raw) audio or video data.
>   *
> @@ -182,7 +201,6 @@ typedef struct AVFrameSideData {
>   * for AVFrame can be obtained from avcodec_get_frame_class()
>   */
>  typedef struct AVFrame {
> -#define AV_NUM_DATA_POINTERS 8
>      /**
>       * pointer to the picture/channel planes.
>       * This might be different from the first allocated byte
> @@ -224,9 +242,12 @@ typedef struct AVFrame {
>       * For packed audio, there is just one data pointer, and linesize[0]
>       * contains the total size of the buffer for all channels.
>       *
> +     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> +     *

It should also mention how many data pointers there are.

Btw. I'm very not much of a fan to have to deal with extended_data. And
the duplication with data. (Does it even document what data[] contains
for subtitles?)

>       * Note: Both data and extended_data should always be set in a valid frame,
> -     * but for planar audio with more channels that can fit in data,
> -     * extended_data must be used in order to access all channels.
> +     * but for subtitles and planar audio with more channels that can fit in
> +     * data, extended_data must be used in order to access all audio channels
> +     * or subtitles rectangles.
>       */
>      uint8_t **extended_data;
>  
> @@ -359,10 +380,10 @@ typedef struct AVFrame {
>       * also be non-NULL for all j < i.
>       *
>       * There may be at most one AVBuffer per data plane, so for video this array
> -     * always contains all the references. For planar audio with more than
> -     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
> -     * this array. Then the extra AVBufferRef pointers are stored in the
> -     * extended_buf array.
> +     * always contains all the references. For planar audio or subtitles with
> +     * more than AV_NUM_DATA_POINTERS channels or rectangles, there may be more
> +     * buffers than can fit in this array. Then the extra AVBufferRef pointers
> +     * are stored in the extended_buf array.
>       */
>      AVBufferRef *buf[AV_NUM_DATA_POINTERS];

Also to note: this refcounts the AVFrameSubtitleRectangle structs, but
not the contained data. This is not sane.

In particular, I think AVBufferRef refcounted data can't sanely contain
pointer (as they describe byte data). However the hwframescontext stuff
already uses pointers in AVBufferRef'ed structs, so we should at least
make behavior consistent. This means: if the AVBuffer is unreferenced
and destroyed, its destructor needs to free the referenced subtitle
data. I think this is not done yet, but I could be wrong.

>  
> @@ -532,6 +553,29 @@ typedef struct AVFrame {
>       * AVHWFramesContext describing the frame.
>       */
>      AVBufferRef *hw_frames_ctx;
> +
> +    /**
> +     * Number of rectangles available in extended_data
> +     * - encoding: set by user (XXX: no encoding yet)
> +     * - decoding: set by libavcodec, read by user
> +     */
> +    int sub_nb_rects;
> +
> +    /**
> +     * Subtitle start display time in AV_TIME_BASE, relative to pts
> +     * XXX: semantic for <=0?
> +     * - encoding: unused
> +     * - decoding: set by libavcodec, read by user.
> +     */
> +    int64_t sub_start_display;
> +
> +    /**
> +     * Subtitle end display time in AV_TIME_BASE, relative to pts
> +     * XXX: semantic for <=0?
> +     * - encoding: unused
> +     * - decoding: set by libavcodec, read by user.
> +     */
> +    int64_t sub_end_display;

I suggest "0" means literally 0 (i.e. essentially not displayed). -1
could mean "unknown".

What I don't like is that this doesn't define semantics whether a
subtitle frame replaces a previous frame. Normally you would do the
latter if the duration is unknown - but some subtitle formats or
containers can mix them and signal a incorrect too long duration. Then
the next frame shouldn't overlap with the previous frame.

>  } AVFrame;
>  
>  /**
> @@ -641,6 +685,7 @@ void av_frame_move_ref(AVFrame *dst, AVFrame *src);
>   * - format (pixel format for video, sample format for audio)
>   * - width and height for video
>   * - nb_samples and channel_layout for audio
> + * - sub_nb_rects for subtitle
>   *
>   * This function will fill AVFrame.data and AVFrame.buf arrays and, if
>   * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
> diff --git a/tests/ref/fate/sub-charenc b/tests/ref/fate/sub-charenc
> index a056cd1..050807b 100644
> --- a/tests/ref/fate/sub-charenc
> +++ b/tests/ref/fate/sub-charenc
> @@ -23,7 +23,7 @@ Dialogue: 0,0:02:31.61,0:02:33.90,Default,,0,0,0,,- А ти как си?\N- До
>  Dialogue: 0,0:02:40.16,0:02:42.79,Default,,0,0,0,,Монахът Дзенг каза,\Nче си в планината Удан.
>  Dialogue: 0,0:02:43.04,0:02:46.54,Default,,0,0,0,,Каза, че практикуваш\Nдълбока медитация.
>  Dialogue: 0,0:02:48.84,0:02:50.68,Default,,0,0,0,,Сигурно в планината\Nе много спокойно.
> -Dialogue: 0,0:02:51.25,0:02:53.46,Default,,0,0,0,,Завиждам ти.
> +Dialogue: 0,0:02:51.26,0:02:53.47,Default,,0,0,0,,Завиждам ти.
>  Dialogue: 0,0:02:53.67,0:02:58.34,Default,,0,0,0,,Имам толкова много работа,\Nпочти не ми остава\Nвреме за почивка.
>  Dialogue: 0,0:03:00.26,0:03:03.89,Default,,0,0,0,,Оставих обучението рано.
>  Dialogue: 0,0:03:05.69,0:03:11.28,Default,,0,0,0,,Защо? Ти си боец на Удан.\NОбучението е всичко.
> @@ -40,7 +40,7 @@ Dialogue: 0,0:03:53.40,0:03:56.57,Default,,0,0,0,,Не можах да издъ
>  Dialogue: 0,0:03:57.49,0:03:59.74,Default,,0,0,0,,Прекъснах медитацията си.
>  Dialogue: 0,0:03:59.95,0:04:02.24,Default,,0,0,0,,Не можах да продължа.
>  Dialogue: 0,0:04:03.20,0:04:07.79,Default,,0,0,0,,Нещо...\Nме дърпаше назад.
> -Dialogue: 0,0:04:09.62,0:04:10.91,Default,,0,0,0,,Какво беше?
> +Dialogue: 0,0:04:09.63,0:04:10.92,Default,,0,0,0,,Какво беше?
>  Dialogue: 0,0:04:15.46,0:04:18.00,Default,,0,0,0,,Нещо, от което не\Nмога да се освободя.
>  Dialogue: 0,0:04:23.39,0:04:24.68,Default,,0,0,0,,Скоро ли ще тръгваш?
>  Dialogue: 0,0:04:26.77,0:04:30.27,Default,,0,0,0,,Подготвяме охрана\Nза една доставка...
> @@ -52,7 +52,7 @@ Dialogue: 0,0:04:48.37,0:04:52.67,Default,,0,0,0,,Да. Той винаги е 
>  Dialogue: 0,0:04:52.88,0:04:56.55,Default,,0,0,0,,Не разбирам.\NКак можеш да се разделиш с него?
>  Dialogue: 0,0:04:56.76,0:04:59.93,Default,,0,0,0,,Той винаги е бил с теб.
>  Dialogue: 0,0:05:01.18,0:05:05.52,Default,,0,0,0,,Твърде много хора са\Nзагинали от това острие.
> -Dialogue: 0,0:05:09.68,0:05:14.52,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
> +Dialogue: 0,0:05:09.69,0:05:14.53,Default,,0,0,0,,Чисто е единствено защото\Nкръвта се отмива лесно.
>  Dialogue: 0,0:05:15.40,0:05:20.61,Default,,0,0,0,,Ти го използваш справедливо.\NДостоен си за него.
>  Dialogue: 0,0:05:23.66,0:05:27.37,Default,,0,0,0,,Дойде време\Nда го оставя.
>  Dialogue: 0,0:05:27.58,0:05:31.21,Default,,0,0,0,,Е, какво ще правиш\Nот сега нататък?