[MPlayer-dev-eng] ctype.h functions on win32

Rich Felker dalias at aerifal.cx
Fri Oct 12 23:33:13 CEST 2007


On Fri, Oct 12, 2007 at 09:57:59PM +0200, Gianluigi Tiesi wrote:
> On Fri, Oct 12, 2007 at 01:03:03PM -0400, Rich Felker wrote:
> > On Thu, Oct 11, 2007 at 07:03:23AM +0200, Gianluigi Tiesi wrote:
> > > Win32 crt's ctype functions like isalpha isspace etc
> > > are declared as int, but they return wrong results
> > > (using msvc 2005, you get an assert or crash)
> > > if the value is < 0
> > 
> > It's unclear from the spec what they're supposed to do. On the one
> > hand it says the application shall ensure that the value passed is a
> > valid value for unsigned char. But on the other hand, it says the
> > functions shall return a nonzero value if the argument is a character
> > of the corresponding type, and 0 otherwise (which could be interpreted
> > as meaning that 0 should be returned for invalid values as well).
> > 
> > In any case, why are we using these functions? They're bogus since
> > they only work for 8bit characters and not multibyte characters, and
> > MPlayer does not even use the locale anyway due to various issues, so
> > we might as well just code the ascii tests directly if ascii is all we
> > need...
> 
> the proto of .h is int, so it should really work also with negative
> values.

And the prototype for strlen is char *, so it should work with any
possible value a char * can take on??? No, of course not! The
documentation for the function specifies what values are valid.

> there are strange results with different locales, ie if LANG
> env var is turkish islower and/or tolwer works in a different way
> 
> suggestions?

MPlayer does not use locale aside from determining the character
encoding. BTW this hilights a rather serious bug in that MPlayer's use
of translations is invalid. The translated strings passed to printf
are generally not valid multibyte character strings in the current
locale, since the locale remains C. Thus, printf should not
necessarily accept them.

IMO we should use LC_CTYPE (but not any other locale classes) and just
remove all use of case-mapping and character-class functions and
replace them with ascii-only macros.

> > > By looking at the code, there are some functions
> > > where it's enough to change char to unsigned char in arguments
> > > 
> > > other functions are more difficult since they use int
> > > since some api are still returning int like fgetc (deprecated iirc)
> > 
> > Huh?? Why would fgetc be deprecated? Returning int is correct for
> > fgetc. It returns a byte (unsigned char) or EOF (-1) if end of file is
> > reached or another error occurs. What type would you suggest it should
> > return to meet these needs?
> 
> so I don't remember correct  :) the problem still exists with the value
> so it may need to be checked if -1 then casted to unsigned in isspace
> call

-1 (EOF) is also valid for passing to isspace and friends, iirc..

> > > in these places the value can be casted to uint32_t
> > 
> > Absolutely not.
> 
> I mean in the isspace call not elsewhere

That would be even worse. -1 is (IIRC) a valid argument to isspace but
2^32 - 1 definitely is not valid!!

Rich



More information about the MPlayer-dev-eng mailing list