[FFmpeg-devel] [PATCH 1/5] lavc/htmlsubtitles: improve handling broken garbage
Michael Niedermayer
michael at niedermayer.cc
Sun Jul 30 05:34:16 EEST 2017
On Sat, Jul 29, 2017 at 09:27:47PM +0200, Clément Bœsch wrote:
> This commit switches off forced correct nesting of tags and only keeps
> it for font tags. See long explanations in the code for the rationale.
>
> This results in various FATE changes which I'll explain here:
>
> - various swapping in font attributes, this is mostly noise due to the
> old reverse stack way of printing them. The new one is more correct as
> the last attribute takes over the previous ones.
>
> - unrecognized tags disappears
>
> - invalid tags that were previously displayed aren't anymore (instead,
> we have a warning). This is better for the end user
>
> The main benefit of this commit is to be more tolerant to error, leading
> to a better handling of badly nested tags or random wrong formatting for
> the end user.
> ---
> libavcodec/htmlsubtitles.c | 199 ++++++++++++++++++++++++++-------------
> tests/ref/fate/sub-sami2 | 4 +-
> tests/ref/fate/sub-srt | 14 +--
> tests/ref/fate/sub-srt-badsyntax | 4 +-
> tests/ref/fate/sub-textenc | 14 +--
> tests/ref/fate/sub-webvttenc | 14 +--
> 6 files changed, 157 insertions(+), 92 deletions(-)
>
> diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> index e1fb7bc3cf..69d855df21 100644
> --- a/libavcodec/htmlsubtitles.c
> +++ b/libavcodec/htmlsubtitles.c
> @@ -1,5 +1,6 @@
> /*
> * Copyright (c) 2010 Aurelien Jacobs <aurel at gnuage.org>
> + * Copyright (c) 2017 Clément Bœsch <u at pkh.me>
> *
> * This file is part of FFmpeg.
> *
> @@ -18,6 +19,7 @@
> * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> */
>
> +#include "libavutil/avassert.h"
> #include "libavutil/avstring.h"
> #include "libavutil/common.h"
> #include "libavutil/parseutils.h"
> @@ -31,19 +33,6 @@ static int html_color_parse(void *log_ctx, const char *str)
> return rgba[0] | rgba[1] << 8 | rgba[2] << 16;
> }
>
> -enum {
> - PARAM_UNKNOWN = -1,
> - PARAM_SIZE,
> - PARAM_COLOR,
> - PARAM_FACE,
> - PARAM_NUMBER
> -};
> -
> -typedef struct SrtStack {
> - char tag[128];
> - char param[PARAM_NUMBER][128];
> -} SrtStack;
> -
> static void rstrip_spaces_buf(AVBPrint *buf)
> {
> if (av_bprint_is_complete(buf))
> @@ -75,17 +64,48 @@ static void handle_open_brace(AVBPrint *dst, const char **inp, int *an, int *clo
> av_bprint_chars(dst, *in, 1);
> }
>
> +struct font_tag {
> + char face[128];
> + int size;
> + uint32_t color;
> +};
> +
> +/*
> + * The general politic of the convert is to mask unsupported tags or formatting
> + * errors (but still alert the user/subtitles writer with an error/warning)
> + * without dropping any actual text content for the final user.
> + */
> int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
> {
> - char *param, buffer[128], tmp[128];
> - int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
> - SrtStack stack[16];
> + char *param, buffer[128];
> + int len, tag_close, sptr = 0, line_start = 1, an = 0, end = 0;
> int closing_brace_missing = 0;
> + int i, likely_a_tag;
>
> - stack[0].tag[0] = 0;
> - strcpy(stack[0].param[PARAM_SIZE], "{\\fs}");
> - strcpy(stack[0].param[PARAM_COLOR], "{\\c}");
> - strcpy(stack[0].param[PARAM_FACE], "{\\fn}");
> + /*
> + * state stack is only present for fonts since they are the only tags where
> + * the state is not binary. Here is a typical use case:
> + *
> + * <font color="red" size=10>
> + * red 10
> + * <font size=50> RED AND BIG </font>
> + * red 10 again
> + * </font>
> + *
> + * On the other hand, using the state system for all the tags should be
> + * avoided because it breaks wrongly nested tags such as:
> + *
> + * <b> foo <i> bar </b> bla </i>
> + *
> + * We don't want to break here; instead, we will treat all these tags as
> + * binary state markers. Basically, "<b>" will activate bold, and "</b>"
> + * will deactivate it, whatever the current state.
> + *
> + * This will also prevents cases where we have a random closing tag
> + * remaining after the opening one was dropped. Yes, this happens and we
> + * still don't want to print a "</b>" at the end of the dialog event.
> + */
> + struct font_tag stack[16] = {0};
this seems to produce a compiler warning:
./libavcodec/htmlsubtitles.c: In function ‘ff_htmlmarkup_to_ass’:
./libavcodec/htmlsubtitles.c:112:12: warning: missing braces around initializer [-Wmissing-braces]
gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170730/3c0564f4/attachment.sig>
More information about the ffmpeg-devel
mailing list