[FFmpeg-devel] [PATCH v22 22/23] avutil/ass_split: Add parsing of hard-space tags (\h)

Soft Works softworkz at hotmail.com
Thu Dec 9 14:39:41 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Soft Works
> Sent: Thursday, December 9, 2021 1:13 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH v22 22/23] avutil/ass_split: Add parsing of
> hard-space tags (\h)
> 
> The \h tag in ASS/SSA indicates a non-breaking space. See
> https://github.com/Aegisub/aegisite/blob/master/source/docs/3.2/ASS_Tags.html.md
> 
> The ass_split implementation is used by almost all text subtitle
> encoders, and it didn't handle this tag. Interestingly, several tests
> exercise \h parsing but had incorrect reference data.
> 
> The \h tag is specific to ASS and has no meaning outside of ASS.
> Still, the reference data for ttmlenc, textenc and webvttenc were full of
> literal \h tags.
> 
> Signed-off-by: softworkz <softworkz at hotmail.com>
> ---
>  libavutil/ass_split.c            |  7 +++++++
>  libavutil/ass_split_internal.h   |  1 +
>  tests/ref/fate/mov-mp4-ttml-dfxp |  8 ++++----
>  tests/ref/fate/mov-mp4-ttml-stpp |  8 ++++----
>  tests/ref/fate/sub-textenc       | 10 +++++-----
>  tests/ref/fate/sub-ttmlenc       |  8 ++++----
>  tests/ref/fate/sub-webvttenc     | 10 +++++-----
>  7 files changed, 30 insertions(+), 22 deletions(-)
> 
> diff --git a/libavutil/ass_split.c b/libavutil/ass_split.c
> index c5963351fc..30512dfc74 100644
> --- a/libavutil/ass_split.c
> +++ b/libavutil/ass_split.c
> @@ -484,6 +484,7 @@ int avpriv_ass_split_override_codes(const
> ASSCodesCallbacks *callbacks, void *pr
>      while (buf && *buf) {
>          if (text && callbacks->text &&
>              (sscanf(buf, "\\%1[nN]", new_line) == 1 ||
> +             sscanf(buf, "\\%1[hH]", new_line) == 1 ||
>               !strncmp(buf, "{\\", 2))) {
>              callbacks->text(priv, text, text_len);
>              text = NULL;
> @@ -492,6 +493,12 @@ int avpriv_ass_split_override_codes(const
> ASSCodesCallbacks *callbacks, void *pr
>              if (callbacks->new_line)
>                  callbacks->new_line(priv, new_line[0] == 'N');
>              buf += 2;
> +        } else if (sscanf(buf, "\\%1[hH]", new_line) == 1) {
> +            if (callbacks->hard_space)
> +                callbacks->hard_space(priv);
> +            else if (callbacks->text)
> +                callbacks->text(priv, " ", 1);
> +            buf += 2;
>          } else if (!strncmp(buf, "{\\", 2)) {
>              buf++;
>              while (*buf == '\\') {
> diff --git a/libavutil/ass_split_internal.h b/libavutil/ass_split_internal.h
> index 8e8e51115c..d6eaade4a4 100644
> --- a/libavutil/ass_split_internal.h
> +++ b/libavutil/ass_split_internal.h
> @@ -141,6 +141,7 @@ typedef struct {
>       * @{
>       */
>      void (*text)(void *priv, const char *text, int len);
> +    void (*hard_space)(void *priv);
>      void (*new_line)(void *priv, int forced);
>      void (*style)(void *priv, char style, int close);
>      void (*color)(void *priv, unsigned int /* color */, unsigned int
> color_id);
> diff --git a/tests/ref/fate/mov-mp4-ttml-dfxp b/tests/ref/fate/mov-mp4-ttml-
> dfxp
> index e24b5d618b..e565ffa1f6 100644
> --- a/tests/ref/fate/mov-mp4-ttml-dfxp
> +++ b/tests/ref/fate/mov-mp4-ttml-dfxp
> @@ -1,9 +1,9 @@
> -2e7e01c821c111466e7a2844826b7f6d *tests/data/fate/mov-mp4-ttml-dfxp.mp4
> -8519 tests/data/fate/mov-mp4-ttml-dfxp.mp4
> +658884e1b789e75c454b25bdf71283c9 *tests/data/fate/mov-mp4-ttml-dfxp.mp4
> +8486 tests/data/fate/mov-mp4-ttml-dfxp.mp4
>  #tb 0: 1/1000
>  #media_type 0: data
>  #codec_id 0: none
> -0,          0,          0,    68500,     7866, 0x456c36b7
> +0,          0,          0,    68500,     7833, 0x31b22193
>  {
>      "packets": [
>          {
> @@ -15,7 +15,7 @@
>              "dts_time": "0.000000",
>              "duration": 68500,
>              "duration_time": "68.500000",
> -            "size": "7866",
> +            "size": "7833",
>              "pos": "44",
>              "flags": "K_"
>          }
> diff --git a/tests/ref/fate/mov-mp4-ttml-stpp b/tests/ref/fate/mov-mp4-ttml-
> stpp
> index 77bd23b7bf..f25b5b2d28 100644
> --- a/tests/ref/fate/mov-mp4-ttml-stpp
> +++ b/tests/ref/fate/mov-mp4-ttml-stpp
> @@ -1,9 +1,9 @@
> -cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
> -8547 tests/data/fate/mov-mp4-ttml-stpp.mp4
> +c9570de0ccebc858b0c662a7e449582c *tests/data/fate/mov-mp4-ttml-stpp.mp4
> +8514 tests/data/fate/mov-mp4-ttml-stpp.mp4
>  #tb 0: 1/1000
>  #media_type 0: data
>  #codec_id 0: none
> -0,          0,          0,    68500,     7866, 0x456c36b7
> +0,          0,          0,    68500,     7833, 0x31b22193
>  {
>      "packets": [
>          {
> @@ -15,7 +15,7 @@ cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-
> ttml-stpp.mp4
>              "dts_time": "0.000000",
>              "duration": 68500,
>              "duration_time": "68.500000",
> -            "size": "7866",
> +            "size": "7833",
>              "pos": "44",
>              "flags": "K_"
>          }
> diff --git a/tests/ref/fate/sub-textenc b/tests/ref/fate/sub-textenc
> index 3ea56b38f0..910ca3d6e3 100644
> --- a/tests/ref/fate/sub-textenc
> +++ b/tests/ref/fate/sub-textenc
> @@ -160,18 +160,18 @@ but show this: {normal text}
>  \ N is a forced line break
>  \ h is a hard space
>  Normal spaces at the start and at the end of the line are trimmed while hard
> spaces are not trimmed.
> -
> The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hha
> rd\hspace.\h:-D
> +The line will never break automatically right before or after a hard space.
> :-D
> 
>  31
>  00:00:54,501 --> 00:00:56,500
> 
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
>  A (Normal  spaces followed by a letter)
>  A (No hard spaces followed by a letter)
> 
>  32
>  00:00:56,501 --> 00:00:58,500
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
>  A (Normal  spaces followed by a letter)
>  A (No hard spaces followed by a letter)
>  Show this: \TEST and this: \-)
> @@ -179,10 +179,10 @@ Show this: \TEST and this: \-)
>  33
>  00:00:58,501 --> 00:01:00,500
> 
> -A letter followed by 05 hard spaces: A\h\h\h\h\h
> +A letter followed by 05 hard spaces: A
>  A letter followed by normal  spaces: A
>  A letter followed by no hard spaces: A
> -05 hard  spaces between letters: A\h\h\h\h\hA
> +05 hard  spaces between letters: A     A
>  5 normal spaces between letters: A     A
> 
>  ^--Forced line break
> diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc
> index 4df8f8796f..aea09bb31e 100644
> --- a/tests/ref/fate/sub-ttmlenc
> +++ b/tests/ref/fate/sub-ttmlenc
> @@ -109,16 +109,16 @@
>          end="00:00:54.500"><span region="Default">Hide these tags:<br/>also
> hide these tags:<br/>but show this: {normal text}</span></p>
>        <p
>          begin="00:00:54.501"
> -        end="00:01:00.500"><span region="Default"><br/>\ N is a forced line
> break<br/>\ h is a hard space<br/>Normal spaces at the start and at the end
> of the line are trimmed while hard spaces are not
> trimmed.<br/>The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\
> hafter\ha\hhard\hspace.\h:-D</span></p>
> +        end="00:01:00.500"><span region="Default"><br/>\ N is a forced line
> break<br/>\ h is a hard space<br/>Normal spaces at the start and at the end
> of the line are trimmed while hard spaces are not trimmed.<br/>The line will
> never break automatically right before or after a hard space. :-D</span></p>
>        <p
>          begin="00:00:54.501"
> -        end="00:00:56.500"><span region="Default"><br/>\h\h\h\h\hA (05 hard
> spaces followed by a letter)<br/>A (Normal  spaces followed by a
> letter)<br/>A (No hard spaces followed by a letter)</span></p>
> +        end="00:00:56.500"><span region="Default"><br/>     A (05 hard
> spaces followed by a letter)<br/>A (Normal  spaces followed by a
> letter)<br/>A (No hard spaces followed by a letter)</span></p>
>        <p
>          begin="00:00:56.501"
> -        end="00:00:58.500"><span region="Default">\h\h\h\h\hA (05 hard
> spaces followed by a letter)<br/>A (Normal  spaces followed by a
> letter)<br/>A (No hard spaces followed by a letter)<br/>Show this: \TEST and
> this: \-)</span></p>
> +        end="00:00:58.500"><span region="Default">     A (05 hard spaces
> followed by a letter)<br/>A (Normal  spaces followed by a letter)<br/>A (No
> hard spaces followed by a letter)<br/>Show this: \TEST and this: \-
> )</span></p>
>        <p
>          begin="00:00:58.501"
> -        end="00:01:00.500"><span region="Default"><br/>A letter followed by
> 05 hard spaces: A\h\h\h\h\h<br/>A letter followed by normal  spaces: A<br/>A
> letter followed by no hard spaces: A<br/>05 hard  spaces between letters:
> A\h\h\h\h\hA<br/>5 normal spaces between letters: A     A<br/><br/>^--Forced
> line break</span></p>
> +        end="00:01:00.500"><span region="Default"><br/>A letter followed by
> 05 hard spaces: A     <br/>A letter followed by normal  spaces: A<br/>A
> letter followed by no hard spaces: A<br/>05 hard  spaces between letters: A
> A<br/>5 normal spaces between letters: A     A<br/><br/>^--Forced line
> break</span></p>
>        <p
>          begin="00:01:00.501"
>          end="00:01:02.500"><span region="Default">Both line should be
> strikethrough,<br/>yes.<br/>Correctly closed tags<br/>should be
> hidden.</span></p>
> diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
> index 45ae0b6131..f4172dcc84 100644
> --- a/tests/ref/fate/sub-webvttenc
> +++ b/tests/ref/fate/sub-webvttenc
> @@ -132,26 +132,26 @@ but show this: {normal text}
>  \ N is a forced line break
>  \ h is a hard space
>  Normal spaces at the start and at the end of the line are trimmed while hard
> spaces are not trimmed.
> -
> The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hha
> rd\hspace.\h:-D
> +The line will never break automatically right before or after a hard space.
> :-D
> 
>  00:54.501 --> 00:56.500
> 
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
>  A (Normal  spaces followed by a letter)
>  A (No hard spaces followed by a letter)
> 
>  00:56.501 --> 00:58.500
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
>  A (Normal  spaces followed by a letter)
>  A (No hard spaces followed by a letter)
>  Show this: \TEST and this: \-)
> 
>  00:58.501 --> 01:00.500
> 
> -A letter followed by 05 hard spaces: A\h\h\h\h\h
> +A letter followed by 05 hard spaces: A
>  A letter followed by normal  spaces: A
>  A letter followed by no hard spaces: A
> -05 hard  spaces between letters: A\h\h\h\h\hA
> +05 hard  spaces between letters: A     A
>  5 normal spaces between letters: A     A
> 
>  ^--Forced line break
> --
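
For readers skimming the diff, here is a minimal standalone sketch of the
fallback behaviour the patch adds: call hard_space when the encoder provides
it, otherwise emit a regular space through the text callback. It is a
simplified illustration under stated assumptions, not the actual ass_split
code. DemoCallbacks, DemoBuf, demo_text, demo_hard_space and demo_split are
hypothetical names, and the handling of \n, \N and {\...} override blocks is
left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced callback table. The real ASSCodesCallbacks
 * has more members (new_line, style, color, ...). */
typedef struct {
    void (*text)(void *priv, const char *text, int len);
    void (*hard_space)(void *priv);
} DemoCallbacks;

/* Hypothetical output buffer standing in for an encoder's private data. */
typedef struct {
    char   out[256];
    size_t len;
} DemoBuf;

/* Append a text run to the buffer, truncating if it would overflow. */
static void demo_text(void *priv, const char *text, int len)
{
    DemoBuf *b = priv;
    if (b->len + (size_t)len >= sizeof(b->out))
        len = (int)(sizeof(b->out) - 1 - b->len);
    memcpy(b->out + b->len, text, (size_t)len);
    b->len += (size_t)len;
    b->out[b->len] = '\0';
}

/* An encoder that knows about non-breaking spaces could emit
 * U+00A0 (UTF-8: C2 A0) instead of a plain space. */
static void demo_hard_space(void *priv)
{
    demo_text(priv, "\xC2\xA0", 2);
}

/* Walk buf, flushing pending text before each \h or \H and then
 * dispatching the hard space, with a plain-space fallback. */
static void demo_split(const DemoCallbacks *cb, void *priv, const char *buf)
{
    const char *text = NULL;
    int text_len = 0;

    while (*buf) {
        if (buf[0] == '\\' && (buf[1] == 'h' || buf[1] == 'H')) {
            if (text && cb->text)
                cb->text(priv, text, text_len);
            text = NULL;
            if (cb->hard_space)
                cb->hard_space(priv);
            else if (cb->text)
                cb->text(priv, " ", 1);   /* fallback: regular space */
            buf += 2;
        } else {
            if (!text) {
                text     = buf;
                text_len = 0;
            }
            text_len++;
            buf++;
        }
    }
    if (text && cb->text)
        cb->text(priv, text, text_len);
}
```

With only demo_text set, "A\hB" comes out as "A B". This is the fallback
that turns the literal \h sequences in the textenc, ttmlenc and webvttenc
references into spaces, including spaces at the end of a line.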

Patchwork fails to apply this patch due to trailing whitespace:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/DM8P223MB036543CB351641BF7280F653BA709@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM/

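The problem is that, this time, the whitespace needs to be there: the
updated reference files contain lines that end in spaces where trailing
\h tags used to be.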

Does anybody have an idea what could be done in this case?

Thanks,
softworkz


