[MPlayer-dev-eng] ctype.h functions on win32

Gianluigi Tiesi mplayer at netfarm.it
Sat Oct 13 07:50:29 CEST 2007


On Sat, Oct 13, 2007 at 12:45:11AM -0400, Rich Felker wrote:
> On Sat, Oct 13, 2007 at 12:51:02AM +0200, Gianluigi Tiesi wrote:
> > > > the proto of .h is int, so it should really work also with negative
> > > > values.
> > > 
> > > And the prototype for strlen is char *, so it should work with any
> > > possible value a char * can take on??? No, of course not! The
> > > documentation for the function specifies what values are valid.
> > 
> > yes at least linux man pages says this but non ascii chars
> > as signed char are negative values while the functions
> > needs an unsigned char OR -1
> 
> The point is that negative numbers are not valid character numbers. If
> you're in an 8bit legacy locale, the extra 128 characters are
> representible only as _unsigned_ char or a larger integer type. Using
> signed char for them is bogus. But if you're using multibyte character
> handling correctly (needed for UTF-8 to work) then it doesn't matter
> since you'll never directly access the bytes anyway and won't use the
> is*() api but instead the isw*() api or something better.

isspace can be replaced by check if value is 9 or 32
this works on ascii, 8bit locales and utf-8
obiviously doesn't work on utf-16, but you
cannot handle anyway wide chars using byte functions
or even worst making a check if it's multibyte or singlebyte
then use the correct function
mplayer can use utf-8 but I don't known about wide chars
for sure subtitle stuff won't work correctly using wide chars

> > from the man:
> > These functions check whether c, which must have the value of an
> > unsigned char or EOF, falls into a certain character class according to
> > the current locale.
> > 
> > "according to the current locale"
> > this is not so good, as currently used in mplayer
> 
> The current locale in MPlayer is always C, so it's not a problem.
> However, leaving the locale as C is probably a bad idea...

this is a workaround not a solution

> > > > > > in these places the value can be casted to uint32_t
> > > > > 
> > > > > Absolutely not.
> > > > 
> > > > I mean in the isspace call not elsewhere
> > > 
> > > That would be even worse. -1 is (IIRC) a valid argument to isspace but
> > > 2^32 - 1 definitely is not valid!!
> > 
> > hmm uint8_t is better :D
> 
> No, that's also extremely wrong! It would make EOF get treated as
> character 255, which is an extremely common and stupid bug in poorly
> written C programs. No cast at all should be done. The return value of
> fgetc is already exactly what it should be for use with is*()
> functions!

I currently mean:

if (c == -1) then do something to handle eof
else call isspace with cast,
-1 will never become 255 in the isspace call cast
so it doesn't look so wrong

a better way is checking if < 0 then avoid isspace call at all
no cast is needed here

> > Mainly problems are in subtitle stuff, I wonder if subreader.c
> > it's also used when using ass.
> > If yes I think it will need a rewrite or partial rewrite.
> > it's all based on isXXX functions, that imho should be avoided
> 
> Indeed. I'd be happy to see a patch to remove is*() calls from MPlayer
> and use the appropriate replacements. The question of what is
> appropriate depends on what's being done. If we're parsing text for
> syntax (config, etc. files) then hard-coding a few ascii values is
> sufficient. For example, isspace() is a check for character 9-13 or
> 32. But if we're processing human-language data (e.g. subtitle layout)
> then using is*() was wrong in the first place and we need proper
> Unicode functions that are aware of the character semantics for all
> 100000+ characters. Unicode has plenty of linebreak opportunities
> other than ascii spaces. Some of them are useless optional linebreaks
> for beautifying layout but some of them are mandatory for certain
> languages (e.g. Tibetan and Ogham).
> 
> > and please don't turn each email of this ml into a flame
> > or a dispute about api calls meaning
> > I'm trying to help and I'm open to your prospectives
> 
> My flames are only about bugs you're proposing we introduce, not
> anything ideological. Hopefully flaming for bugs/incorrect code is
> still appropriate on this list. :)

I really proposed nothing definitive, just talking about
I've posted no code at all, so you may be a bit polite
with me, by not saying that I write wrong code when I wrote nothing
You may also look at my contribuited code and see if I really
write wrong/bugged code.
Really difficult to make pacific discussions on this ml, I refused
also the write access on svn since I'm scared :D

the problem is that isXXX functions with negative values are not
working correct on mingw and they get a crash on msvc 2005 (not really a problem
since mplayer cannot be compiled by msvc)
on other platforms mileage can vary

signed chars can be changed to unsigned, fgets stuff can be checked if
negative, isXXX functions can be wrapped by a macro that checks if
negative then returns false (0 or whatever) if positive returns
the result of the api call

Regards

-- 
Gianluigi Tiesi <sherpya at netfarm.it>
EDP Project Leader
Netfarm S.r.l. - http://www.netfarm.it/
Free Software: http://oss.netfarm.it/



More information about the MPlayer-dev-eng mailing list