[FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes

Soft Works softworkz at hotmail.com
Sat Feb 5 04:08:48 EET 2022



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Oneric
> Sent: Saturday, February 5, 2022 2:20 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix
> handling of backslashes
> 
> On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote:
> > You want to "pollute" gazillions of subtitle streams in the
> > world from multiple subtitle formats with invisible
> > characters in order to solve an escaping problem in ffmpeg?
> 
> I do not consider using characters that are explicitly recommended to
> be
> used by Unicode to be “polluting”. Further consider that as mentioned
> invisible characters in ASS are not uncommon anyway already and
> conversion
> from ASS to something else are rare due to being generally lossy.
> Lossy
> with regards to typesetting that is, removing breaking hints in form
> of
> plain Unicode characters would be a new form of lossyness.
> 
> > [From the other mail:]
> > I'm not into changing ffmpeg's ass output, it's all
> > about the internally used ass format and the escaping is
> > a central problem there.
> 
> I’m not interested in reworking ffmpeg’s internal subtitle handling.
> The proposed patch is a clear improvement over the status quo which
> is plain incorrect. Within reasonable effort and sound arguments for
> it adjustments to the patch can be made; reworking ffmpeg internals is
> imo not “reasonable” effort to correct an uncontestedly wrong escape.
> 
> You have two options:
> Either finally tell me what I asked about:
> where (as in which file and function) removing wordjoiners should
> even happen and where possible lingering “\\ → \” conversions
> presumably
> are and if it’s simple enough I can add a removal accompanied by a
> comment
> pointing out that this can go wrong.
> Or go ahead and create your own patch.
> 
> ~~~~~~
> 
> > > > I'm not sure whether all ffmpeg text-sub encoders can handle
> > > > those chars - which could be verified of course.
> > >
> > > Since it's in the BMP and ffmpeg already seems happy to assume
> some
> > > UTF-8
> > > support by converting everything to it, I'm not worried about this
> > > until
> > > proven wrong.
> >
> > Proven wrong: https://github.com/libass/libass/issues/507
> 
> This issue is not at all wordjoiner specific despite the name.
> As far as I recall this never lead to wrong rendering.
> With HarfBuzz, the only fully featured shaping backend of libass,
> control characters were and are handled by HarfBuzz.
> And even with FriBiDi U+2060 was ignored since long before (2012)
> the linked issue was opened.
> 
> What that issue really is about is a combination of two more general
> issues. libass is currently not caching failure to lookup a glyph
> leading
> to multiple messages and at worst a perf degradation if no font on the
> font pool contained a glyph for a particular glyph. And the
> realisation
> that libass’ font-fallback strategy is not ideal for prefix-type
> control
> characters, characters which visibly affect both neighbours and a few
> others.
> The word-joiner is only highlighted here as due to its usage as an
> backslash escape its commonly passed to libass and a high enough
> percentage of fonts doesn’t contain it to create reports about it.
> 
> 
> For further reference: U+2060 was added in Unicode 3.2 released 2002.
> If you want to strip it because it might not render correctly you
> should
> also strip most emoji, the uppercase eszett ẞ and several actively
> used writing systems in their entirety.


Let's try to approach this from a different side. Which case is 
your [1/2] commit actually supposed to fix? 
How did you test your patch?
Can we please go over an example?

Thanks,
sw






More information about the ffmpeg-devel mailing list