[FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)

Soft Works softworkz at hotmail.com
Sat Dec 11 19:39:38 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel
> Cantarín
> Sent: Saturday, December 11, 2021 4:18 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add
> new graphicsub2text filter (OCR)
> 
> Hi there softworkz.
> 
> Having worked with OCR filter output before, I'd like to suggest a
> modification for your new filter.
> It's not something that should delay the patch, just a nice addendum.
> It could be done in another patch, or I could even do it myself in the
> future. But I'll leave the comment here anyway, for you to consider.
> 
> If you take a look at vf_ocr, you'll see that it sets the
> "lavfi.ocr.confidence" metadata field.
> Well... downstream filters can check that field in order to only keep
> results above a certain confidence threshold, discarding the rest.
> This is very useful when doing OCR with non-ASCII chars, as I do for
> Spanish.
> 
> So I propose an option like this:
> 
>    { "confidence", "Sets the confidence threshold for valid OCR. Default
> 80." , OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },
> 
> Then you average all the confidences reported by Tesseract after OCR
> but before converting to a text subtitle frame, and compare the
> average against that option value.
> Something like this:
> 
>    int average = number_of_confidence_items > 0
>                  ? sum_of_all_confidences / number_of_confidence_items : 0;
>    if (average >= s->confidence) {
>        do_your_thing();
>    } else {
>        av_log(ctx, AV_LOG_DEBUG, "Confidence average %d under threshold. "
>               "Text detected: '%s'\n", average, text);
>    }
> 
> Also, I would like to do some tests with Spanish OCR, as I had to
> explicitly allowlist our non-ASCII chars when using the OCR filter, and
> I don't know how yours will behave in that situation. Maybe having a
> chars-allowlist option here too would be a good idea. But, again: none
> of this should delay the patch, as your work is much more important
> than these nice-to-have functionalities, which could easily be
> implemented later by anyone.
> 

Hi Daniel,

I don't think any of that will be necessary. For the generic ocr
filter, this might make sense, because it is meant to work in
many different situations: different text sizes, different (not
necessarily uniform) backgrounds, static or moving content, a wide
spectrum of colours, no quantization in the time dimension, etc.
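
(For reference, evaluating that metadata downstream would look roughly
like the following. This is a minimal sketch, assuming that
"lavfi.ocr.confidence" holds a space-separated list of per-word values;
the function name and threshold handling are illustrative only.)

    #include <stdlib.h>
    #include "libavutil/dict.h"
    #include "libavutil/frame.h"

    /* Average the per-word confidences attached to a frame by vf_ocr
     * and compare the result against a threshold. */
    static int confidence_ok(const AVFrame *frame, int threshold)
    {
        AVDictionaryEntry *e =
            av_dict_get(frame->metadata, "lavfi.ocr.confidence", NULL, 0);
        const char *p;
        long sum = 0;
        int count = 0;

        if (!e)
            return 0;

        for (p = e->value; *p; ) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;
            sum += v;
            count++;
            p = end;
        }

        return count > 0 && sum / count >= threshold;
    }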

But for subtitle OCR, we have a fixed and static background, we have
only about 4 to 32 palette colours, we know when an event starts and
that it doesn't change until the next one, and we have a pixel
density relative to the text height that is a multiple of what
you get when you scan a letter, for example.

Basically, this is a pre-school situation for an OCR engine. If it
couldn't recognize that reliably and you ended up needing to dissect
the results by confidence level, then the OCR wouldn't be worth a
penny and this filter would be kind of pointless ;-)

IIUC, you haven't tried graphicsub2text yet. I suggest you look at
filters.texi for instructions on setting up the model data.
There's an example with a test stream that you can run right
away. With that example, I haven't been able to spot a single
incorrectly recognized character.

Somebody who tried my filter contacted me last week because he
was getting rather bad recognition results. It turned out
that the text in his case had strong outlines and the inner
text was black. After removing the outlines and inverting the
text, the recognition result was close to perfect.

The crucial part is the preparation of the image before doing
OCR. When this is not done right, it can't be remedied later by
evaluating confidence levels.

What's working fine already is bright text without outlines.
Still left for me to do is automatic detection of outline colours
and their removal before running recognition. The second part is
detection of the text (fill) colour and, depending on that,
replacing the transparency with either a light or a dark background
colour (and inverting in the latter case).
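
(To make that concrete: below is a rough sketch of the kind of palette
preprocessing meant here, assuming a PAL8 bitmap with a 256-entry ARGB
palette and a precomputed per-entry pixel histogram. All names are
illustrative, and the light/dark policy follows the anecdote above,
not the actual filter code.)

    #include <stdint.h>

    /* Approximate BT.601 luma of an ARGB palette entry. */
    static int luma_of(uint32_t argb)
    {
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return (19595 * r + 38470 * g + 7471 * b) >> 16;
    }

    /* Find the text fill colour (the most frequent opaque entry),
     * invert the opaque entries when the fill is dark so the text
     * becomes bright, then replace transparency with an opaque
     * contrasting background. */
    static void prepare_palette(uint32_t pal[256], const int hist[256])
    {
        int i, fill = -1, max_count = 0;

        for (i = 0; i < 256; i++) {
            if ((pal[i] >> 24) > 0x80 && hist[i] > max_count) {
                max_count = hist[i];
                fill = i;
            }
        }
        if (fill < 0)
            return; /* fully transparent bitmap, nothing to do */

        if (luma_of(pal[fill]) < 0x80) {
            /* Dark fill: invert the RGB of the opaque entries,
             * keeping alpha, so the text ends up bright. */
            for (i = 0; i < 256; i++)
                if ((pal[i] >> 24) > 0x80)
                    pal[i] ^= 0x00FFFFFF;
        }

        /* The text is now bright, so fill the (near-)transparent
         * entries with an opaque dark background. */
        for (i = 0; i < 256; i++)
            if ((pal[i] >> 24) <= 0x80)
                pal[i] = 0xFF000000;
    }

Outline colours would still need to be detected and neutralized before
this step, which is exactly the part described above as still open.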

When you get a chance to try it, please let me know about your
results.

PS: If positive, post here; otherwise contact me privately... LOL

Just joking, whatever you prefer.

Kind regards,
softworkz