[FFmpeg-devel] [PATCH 2/4] avcodec/samidec: use ff_htmlmarkup_to_ass()
Yayoi Ukai
yayoi.ukai at gmail.com
Fri Aug 28 08:05:11 CEST 2015
Thank you for the suggestion.. It took me a while to get it but I
think it looks much better now.
Please let me know what you think!
Cheers,
On Sat, Aug 15, 2015 at 11:24 AM, Clément Bœsch <u at pkh.me> wrote:
> On Sat, Aug 08, 2015 at 09:24:03PM -0700, Yayoi Ukai wrote:
>> On Sat, Aug 8, 2015 at 8:23 AM, Clément Bœsch <u at pkh.me> wrote:
>> > On Fri, Aug 07, 2015 at 11:03:29PM -0700, Yayoi wrote:
>> >> ---
>> >> libavcodec/samidec.c | 59 +++++++++++++++++++++++++++++--------------------
>> >> tests/ref/fate/sub-sami | 18 +++++++--------
>> >> 2 files changed, 44 insertions(+), 33 deletions(-)
>> >>
>> >> diff --git a/libavcodec/samidec.c b/libavcodec/samidec.c
>> >> index 47850e2..41826a7 100644
>> >> --- a/libavcodec/samidec.c
>> >> +++ b/libavcodec/samidec.c
>> >> @@ -27,6 +27,7 @@
>> >> #include "ass.h"
>> >> #include "libavutil/avstring.h"
>> >> #include "libavutil/bprint.h"
>> >> +#include "htmlsubtitles.h"
>> >>
>> >> typedef struct {
>> >> AVBPrint source;
>> >> @@ -41,11 +42,12 @@ static int sami_paragraph_to_ass(AVCodecContext *avctx, const char *src)
>> >> char *tag = NULL;
>> >> char *dupsrc = av_strdup(src);
>> >> char *p = dupsrc;
>> >> + char *pcopy = NULL;
>> >> + int index = 0;
>> >> + int second_paragraph = 0;
>> >>
>> >> - av_bprint_clear(&sami->content);
>> >> for (;;) {
>> >> char *saveptr = NULL;
>> >> - int prev_chr_is_space = 0;
>> >> AVBPrint *dst = &sami->content;
>> >>
>> >> /* parse & extract paragraph tag */
>> >> @@ -77,37 +79,46 @@ static int sami_paragraph_to_ass(AVCodecContext *avctx, const char *src)
>> >> goto end;
>> >> }
>> >>
>> >
>> >> - /* extract the text, stripping most of the tags */
>> >> + /* check for the second paragrph */
>> >
>> > Why change the comment? What does "check" mean here? What is the "second
>> > paragraph"?
>>
>>
>> I answer it below with the other questions you have because you are
>> basically asking the same things.
>> And I will document better too.
>>
>> >
>> >> + pcopy = av_strdup(p);
>> >> while (*p) {
>> >> if (*p == '<') {
>> >> - if (!av_strncasecmp(p, "<P", 2) && (p[2] == '>' || av_isspace(p[2])))
>> >> + if (!av_strncasecmp(p, "<p", 2) && (p[2] == '>' || av_isspace(p[2]))) {
>> >> + second_paragraph = 1;
>> >> break;
>> >> - if (!av_strncasecmp(p, "<BR", 3))
>> >> - av_bprintf(dst, "\\N");
>> >> - p++;
>> >> - while (*p && *p != '>')
>> >> - p++;
>> >> - if (!*p)
>> >> - break;
>> >> - if (*p == '>')
>> >> - p++;
>> >> - continue;
>> >> + }
>> >> }
>> >> - if (!av_isspace(*p))
>> >> - av_bprint_chars(dst, *p, 1);
>> >> - else if (!prev_chr_is_space)
>> >> - av_bprint_chars(dst, ' ', 1);
>> >> - prev_chr_is_space = av_isspace(*p);
>> >> p++;
>> >> + index++;
>> >> + }
>> >> + p = p - index;
>> >> + if (second_paragraph) {
>> >> + p[index] = 0;
>> >> }
>> >> - }
>> >>
>> >> - av_bprint_clear(&sami->full);
>> >> - if (sami->source.len)
>> >> - av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str);
>> >> - av_bprintf(&sami->full, "%s", sami->content.str);
>> >> + ff_htmlmarkup_to_ass(avctx, dst, p);
>> >> +
>> >> + /* add the source if there are any. */
>> >> + av_bprint_clear(&sami->full);
>> >> + if (sami->source.len) {
>> >> + av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str);
>> >> + av_bprintf(&sami->full, "%s", sami->content.str);
>> >> + if (second_paragraph) {
>> >> + second_paragraph = 0;
>> >> + p = pcopy;
>> >> + p += index;
>> >> + index = 0;
>> >> + continue;
>> >> + }
>> >> + } else {
>> >> + av_bprintf(&sami->full, "%s", sami->content.str);
>> >> + }
>> >> + av_bprint_clear(&sami->content);
>> >> +
>> >> + }
>> >>
>> >
>> > This looks clumsy at best: you are finalizing the subtitle event inside
>> > the paragraph loop when it should be outside. It also seems there is a
>> > duplicating "second paragraph" logic even though the loop is supposed to
>> > handle one paragraph at a time.
>>
>> I know. It is clumsy.. I explain it below as well.
>>
>> >
>> > If you are uncomfortable with the current logic or believe it's badly
>> > designed for the logic you're trying to mangle, feel free to rewrite it;
>> > it's better than trying to inhibit the old behaviour with hacks.
>>
>> It's not that your code was badly designed to begin with. I feel like
>> it is the nature
>> of this format.
>>
>> So let me explain what was going on your old code and what I had changed.
>> Here is the example: (This one is from current fate sample.)
>>
>> <SYNC Start=73000>
>> <P Class=ENUSCC ID=Source>End of:
>> <P Class=ENUSCC>President John F. Kennedy Speech
>>
>> So this is one subtitle event and there are two paragraphs. One
>> paragraph has a new source and
>> the other paragraph has its content.
>> Since it is decided that this code uses html
>> parser(ff_htmlmarkup_to_ass) and use it to parse the contents of the
>> paragraph. Each paragraph content needs to go to to html parser,
>> regardless whether it is source or subtitle content.
>>
>> So I need to tell html_parser, ff_htmlmarkup_to_ass that where the
>> destination is.. (whether it is source or content).
>> I can't no longer specify the destination which I was able to do in
>> your old code. I mean maybe I can change the function header and add
>> additional flag to tell our parser where to put in the dst but I don't
>> want to because it may require to change srtdec and I do not want to
>> do that..
>>
>> Anyways, so that's the basic idea. I honestly think it is good that
>> subtitle event handle is also in loop because one loop iteration per
>> one subtitle event and subtitle event does not guarantee that it
>> contains only one paragraph to begin with. It was a possibility either
>> adding additional loop or having goto to honor your original logic but
>> after I noticed that subtitle event contains more than one paragraph,
>> I gave up on the idea to keep your logic. The comment says "check
>> second paragraph" but it should be able to handle multiple paragraphs
>> more than 2 in one subtitle event. (I definitely tested while I was
>> working on this part.. I can modify sample or attach test file if that
>> makes you feel happier. And I make a better comment then.)
>>
>> So I want to leave it as it is if you do not see major flaws or you do
>> not have very very strong opinion about how it needs to be modified.
>> Well, given the information, if you have some ideas, you can tell me
>> too. Besides, I can't cross out the possibility if you or me or
>> someone wants to add more enhancement (maybe one day you decided to
>> add support for img tag? who knows..) , and it may need to change code
>> structure again. And as it is now, it is still enhancement that works.
>> ;-).
>>
>
> I think it's breakable one way or another without that much effort.
>
> Here is a suggestion based on the original code:
>
> - make sure you have 5 bprint buffers: source_sami, content_sami,
> source_encoded, content_encoded, full (you already have 3 of them in
> the context)
>
> - remove the current markup processing from the loop and make sure the
> loop just adds the content of each paragraph where it belongs (that is,
> in either source_sami or content_sami). The code already does that, so
> there is probably no change to do (code already concats in each bprint
> buffer if more than 2 paragraphs happen).
>
> - after the loop ends, you now have the SAMI data of your packet
> split into content_sami and eventually in source_sami, without the SAMI
> crap (such as <P ...>). So what you need to do is just call the markup
> on content_sami (and eventually source_sami if there is data) to create
> content_encoded (and eventually source_encoded).
>
> - the existing code will then finish things up by concatenating source &
> content as it is already doing
>
> This is probably way more solid than trying to interlace another partial
> sub-loop in a loop of the same purpose like you proposed.
>
> If you do not want to do that, I will have to push your version and do the
> above or similar before pushing the whole set.
>
> [...]
>
> --
> Clément B.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
More information about the ffmpeg-devel
mailing list