[MPlayer-dev-eng] ctype.h functions on win32

Rich Felker dalias at aerifal.cx
Sat Oct 13 06:45:11 CEST 2007


On Sat, Oct 13, 2007 at 12:51:02AM +0200, Gianluigi Tiesi wrote:
> > > the proto of .h is int, so it should really work also with negative
> > > values.
> > 
> > And the prototype for strlen is char *, so it should work with any
> > possible value a char * can take on??? No, of course not! The
> > documentation for the function specifies what values are valid.
> 
> yes at least linux man pages says this but non ascii chars
> as signed char are negative values while the functions
> needs an unsigned char OR -1

The point is that negative numbers are not valid character numbers. If
you're in an 8bit legacy locale, the extra 128 characters are
representible only as _unsigned_ char or a larger integer type. Using
signed char for them is bogus. But if you're using multibyte character
handling correctly (needed for UTF-8 to work) then it doesn't matter
since you'll never directly access the bytes anyway and won't use the
is*() api but instead the isw*() api or something better.

> from the man:
> These functions check whether c, which must have the value of an
> unsigned char or EOF, falls into a certain character class according to
> the current locale.
> 
> "according to the current locale"
> this is not so good, as currently used in mplayer

The current locale in MPlayer is always C, so it's not a problem.
However, leaving the locale as C is probably a bad idea...

> > > > > in these places the value can be casted to uint32_t
> > > > 
> > > > Absolutely not.
> > > 
> > > I mean in the isspace call not elsewhere
> > 
> > That would be even worse. -1 is (IIRC) a valid argument to isspace but
> > 2^32 - 1 definitely is not valid!!
> 
> hmm uint8_t is better :D

No, that's also extremely wrong! It would make EOF get treated as
character 255, which is an extremely common and stupid bug in poorly
written C programs. No cast at all should be done. The return value of
fgetc is already exactly what it should be for use with is*()
functions!

> Mainly problems are in subtitle stuff, I wonder if subreader.c
> it's also used when using ass.
> If yes I think it will need a rewrite or partial rewrite.
> it's all based on isXXX functions, that imho should be avoided

Indeed. I'd be happy to see a patch to remove is*() calls from MPlayer
and use the appropriate replacements. The question of what is
appropriate depends on what's being done. If we're parsing text for
syntax (config, etc. files) then hard-coding a few ascii values is
sufficient. For example, isspace() is a check for character 9-13 or
32. But if we're processing human-language data (e.g. subtitle layout)
then using is*() was wrong in the first place and we need proper
Unicode functions that are aware of the character semantics for all
100000+ characters. Unicode has plenty of linebreak opportunities
other than ascii spaces. Some of them are useless optional linebreaks
for beautifying layout but some of them are mandatory for certain
languages (e.g. Tibetan and Ogham).

> and please don't turn each email of this ml into a flame
> or a dispute about api calls meaning
> I'm trying to help and I'm open to your prospectives

My flames are only about bugs you're proposing we introduce, not
anything ideological. Hopefully flaming for bugs/incorrect code is
still appropriate on this list. :)

Rich



More information about the MPlayer-dev-eng mailing list