[FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)

Daniel Cantarín canta at canta.com.ar
Sat Dec 11 22:23:54 EET 2021


> Hi Daniel,
> 
> I don't think that any of that will be necessary. For the generic ocr 
> filter, this might make sense, because it is meant to work in 
> many different situations, different text sizes, different (not 
> necessarily uniform) backgrounds, static or moving, a wide spectrum
> of colours, and no quantization in the time dimension, etc.
>
> But for subtitle-ocr, we have a fixed and static background, we have
> palette colours from like 4 to 32 only, we know when it starts and
> that it doesn’t change until the next event and we have a pixel 
> density relative to the text height that is a multiple of what
> you get when you scan a letter for example.
>

I see. That's a good point: this isn't generic OCR, but something
pretty specific. I hadn't considered that before.

> 
> Basically, this is like a pre-school situation for an OCR. If it 
> can't recognize that in a reliable way and you would end up needing
> to dissect results by confidence level, then the OCR wouldn't be 
> worth a penny and this filter kind of pointless ;-)
>

Well... I respectfully disagree. Reality is pretty effective at
messing with common sense, which makes that paragraph simply too
optimistic. I'm sure we'll find some subtitle provider with awful
fonts and/or subtitling practices sooner rather than later, and on
that day those words will turn sour.

Yet, I get your point. Please just ignore my previous comments
about the new filter. I'll test it properly eventually, and give you
some feedback. If any change is needed, I'll try to apply it myself,
so you don't have to do extra work. But just forget about it in the
meantime, as your point stands so far.

>
> IIUC, you haven't tried graphicsub2text yet. I suggest you
> look at filters.texi for instructions to set up the model data.
> (...)
> The crucial part is the preparation of the image before doing
> OCR. When this is not done right, you can't remedy later with
> confidence level evaluation.
> 

I'm aware, thanks. I'm no expert, but I have some experience with this
stuff.

I'm actually using vf_ocr, taking dvbsubs and doing some alchemy with
lavfi: the fps filter to deal with the sparseness (and OCR CPU usage),
color tuning, creating a proper background for the OCR process, and so
on. I got OK results with image prep, and lots of noise without it. So
I kinda know the deal. Insights are cool anyway, and your code gives
some good ideas too.

> 
> What's working fine already is bright text without outlines.
> Left for me to do is automatic detection of outline colours
> and removing those before running recognition. Second part is
> detection of the text (fill) color and depending on that - replace
> the transparency either with a light or dark  background colour 
> (and invert in the latter case).
> 

Dark (black) characters over a bright (white) background have given me
the best results so far.
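
To make the preparation step discussed above a bit more concrete, here
is a rough sketch in Python. It is not taken from the patch or from
vf_ocr; prepare_for_ocr, its threshold parameter and the list-of-tuples
pixel layout are all made up for illustration, and it assumes outlines
have already been removed. It estimates whether the text fill is bright
or dark, flattens the transparency onto the opposite background, and
inverts when needed so the result is always dark text on a light
background:

```python
def luminance(rgb):
    """Approximate perceived brightness (Rec. 601 weights), range 0..255."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def prepare_for_ocr(pixels, threshold=128):
    """Flatten RGBA pixels to grayscale, dark text on a light background.

    pixels: list of rows, each row a list of (r, g, b, a) tuples.
    Hypothetical helper: assumes the visible pixels are mostly text fill,
    i.e. outline colours were stripped beforehand.
    """
    total_alpha = sum(p[3] for row in pixels for p in row)
    if total_alpha == 0:
        # Nothing visible at all: return a plain white canvas.
        return [[255 for _ in row] for row in pixels]

    # Alpha-weighted mean brightness of the visible pixels, used as an
    # estimate of the text fill colour.
    mean_lum = sum(luminance(p[:3]) * p[3]
                   for row in pixels for p in row) / total_alpha
    text_is_bright = mean_lum >= threshold

    # Bright text is composited on black, dark text on white.
    background = 0 if text_is_bright else 255

    out = []
    for row in pixels:
        out_row = []
        for r, g, b, a in row:
            alpha = a / 255
            gray = luminance((r, g, b)) * alpha + background * (1 - alpha)
            if text_is_bright:
                # Invert so the OCR always sees dark text on light.
                gray = 255 - gray
            out_row.append(round(gray))
        out.append(out_row)
    return out
```

A real filter would work on the subtitle's palette rather than on
per-pixel tuples, so take this only as a description of the idea.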

> 
> When you get a chance to try, please let me know about your 
> results.
> 

Most likely next week I'll take a look at it. It's easier now that
you've put a public fork online (in another thread). I'm still getting
used to the patch and mailing list dynamics.

>
> PS: When positive, post here - otherwise contact me privately...LOL
> 
> Just joking..whatever you prefer.
> 
> Kind regards,
> softworkz


I try not to be rude, because I know it feels awful on the other side,
and I value feelings. I also tend to be chatty, in order to try to
understand and be understood. However, I fear that replying a lot may
be seen as spamming the mailing list, so I'll keep my interactions to
a minimum. Please know there are people like me reading your work,
even when we keep silent for different reasons.


Thanks,
Daniel.


