[FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)

Soft Works softworkz at hotmail.com
Sat Dec 11 19:39:38 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel
> Cantarín
> Sent: Saturday, December 11, 2021 4:18 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add
> new graphicsub2text filter (OCR)
> 
> Hi there softworkz.
> 
> Having worked with OCR filter output before, I'd like to suggest a
> modification for your new filter.
> It's not something that should delay the patch, just a nice addendum.
> It could be done in another patch, or I could even do it myself in the
> future. But I'll leave the comment here anyway, for you to consider.
> 
> If you take a look at vf_ocr, you'll see that it sets the
> "lavfi.ocr.confidence" metadata field.
> Well... downstream filters can check that field in order to only keep
> results above a certain confidence threshold, discarding the rest.
> This is very useful when doing OCR with non-ASCII chars, as I do for
> Spanish.
> 
> So I propose an option like this:
> 
>    { "confidence", "Sets the confidence threshold for valid OCR. Default
> 80." , OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },
> 
> Then you average all the confidences reported by Tesseract after OCR
> but before converting to a text subtitle frame, and compare the
> average against that option value.
> Something like this:
> 
>    int average = number_of_confidence_items > 0
>                  ? sum_of_all_confidences / number_of_confidence_items : 0;
>    if (average >= s->confidence) {
>        do_your_thing();
>    } else {
>        av_log(ctx, AV_LOG_DEBUG, "Confidence average %d under threshold. "
>               "Text detected: '%s'\n", average, text);
>    }
> 
> Also, I would like to do some tests with Spanish OCR, as I had to
> explicitly allowlist our non-ASCII chars when using the OCR filter, and
> I don't know how yours will behave in that situation. Maybe having a
> chars-allowlist option here too would be a good idea. But, again: none
> of this should delay the patch, as your work is much more important
> than these nice-to-have functionalities, which could easily be
> implemented later by anyone.
> 

Hi Daniel,

I don't think any of that will be necessary. For the generic ocr
filter, this might make sense, because it is meant to work in
many different situations: different text sizes, different (not
necessarily uniform) backgrounds, static or moving content, a wide
spectrum of colours, no quantization in the time dimension, etc.
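
(For reference, evaluating that metadata downstream would look roughly
like the following. This is a minimal sketch, assuming that
"lavfi.ocr.confidence" holds a space-separated list of per-word values;
the function name and threshold handling are illustrative only.)

    #include <stdlib.h>
    #include "libavutil/dict.h"
    #include "libavutil/frame.h"

    /* Average the per-word confidences attached to a frame by vf_ocr
     * and compare the result against a threshold. */
    static int confidence_ok(const AVFrame *frame, int threshold)
    {
        AVDictionaryEntry *e =
            av_dict_get(frame->metadata, "lavfi.ocr.confidence", NULL, 0);
        const char *p;
        long sum = 0;
        int count = 0;

        if (!e)
            return 0;

        for (p = e->value; *p; ) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;
            sum += v;
            count++;
            p = end;
        }

        return count > 0 && sum / count >= threshold;
    }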

But for subtitle OCR, we have a fixed and static background, we have
only about 4 to 32 palette colours, we know when an event starts and
that it doesn't change until the next one, and we have a pixel
density relative to the text height that is a multiple of what
you get when you scan a letter, for example.

Basically, this is a pre-school situation for an OCR engine. If it
couldn't recognize that reliably and you ended up needing to dissect
the results by confidence level, then the OCR wouldn't be worth a
penny and this filter would be kind of pointless ;-)

IIUC, you haven't tried graphicsub2text yet. I suggest you look at
filters.texi for instructions on setting up the model data.
There's an example with a test stream that you can run right
away. With that example, I haven't been able to spot a single
incorrectly recognized character.

Somebody who tried my filter contacted me last week because he
was getting rather bad recognition results. It turned out
that the text in his case had strong outlines and the inner
text was black. After removing the outlines and inverting the
text, the recognition result was close to perfect.

The crucial part is the preparation of the image before doing
OCR. When this is not done right, it can't be remedied later by
evaluating confidence levels.

What's working fine already is bright text without outlines.
Still left for me to do is automatic detection of outline colours
and their removal before running recognition. The second part is
detection of the text (fill) colour and, depending on that,
replacing the transparency with either a light or a dark background
colour (and inverting in the latter case).
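
(To make that concrete: below is a rough sketch of the kind of palette
preprocessing meant here, assuming a PAL8 bitmap with a 256-entry ARGB
palette and a precomputed per-entry pixel histogram. All names are
illustrative, and the light/dark policy follows the anecdote above,
not the actual filter code.)

    #include <stdint.h>

    /* Approximate BT.601 luma of an ARGB palette entry. */
    static int luma_of(uint32_t argb)
    {
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return (19595 * r + 38470 * g + 7471 * b) >> 16;
    }

    /* Find the text fill colour (the most frequent opaque entry),
     * invert the opaque entries when the fill is dark so the text
     * becomes bright, then replace transparency with an opaque
     * contrasting background. */
    static void prepare_palette(uint32_t pal[256], const int hist[256])
    {
        int i, fill = -1, max_count = 0;

        for (i = 0; i < 256; i++) {
            if ((pal[i] >> 24) > 0x80 && hist[i] > max_count) {
                max_count = hist[i];
                fill = i;
            }
        }
        if (fill < 0)
            return; /* fully transparent bitmap, nothing to do */

        if (luma_of(pal[fill]) < 0x80) {
            /* Dark fill: invert the RGB of the opaque entries,
             * keeping alpha, so the text ends up bright. */
            for (i = 0; i < 256; i++)
                if ((pal[i] >> 24) > 0x80)
                    pal[i] ^= 0x00FFFFFF;
        }

        /* The text is now bright, so fill the (near-)transparent
         * entries with an opaque dark background. */
        for (i = 0; i < 256; i++)
            if ((pal[i] >> 24) <= 0x80)
                pal[i] = 0xFF000000;
    }

Outline colours would still need to be detected and neutralized before
this step, which is exactly the part described above as still open.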

When you get a chance to try it, please let me know about your
results.

PS: If positive, post here; otherwise contact me privately... LOL

Just joking, whatever you prefer.

Kind regards,
softworkz