
I look at it the other way: I've hard coded the reading and writing routines inside the tokenization logic.

Being able to do that is exactly why it's so much simpler to avoid a silly API such as strcspn (or, god forbid, strtok).

> non-ASCII

Yeah, I know... Do you prefer strcspn("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")? Do you think it's faster?

If you're pedantic, you could lex (0x41 <= c && c <= 0x5A). That way at least you consistently read ASCII, even on non-ASCII implementations. But I don't care and it's less readable.
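A minimal sketch of that code-point test, for illustration only. The helper names are made up here and are not from the thread. Note that 0x41..0x5A covers only the uppercase letters; the lowercase range 0x61..0x7A is added on the assumption that a lexer wants both.

```c
#include <stdbool.h>
#include <stddef.h>

/* Classify a byte by its ASCII code point. The numeric literals keep the
   result independent of the compiler's character set and of the locale.
   All names below are hypothetical helpers, not an existing API. */
bool ascii_is_upper(unsigned char c) { return 0x41 <= c && c <= 0x5A; }
bool ascii_is_lower(unsigned char c) { return 0x61 <= c && c <= 0x7A; }
bool ascii_is_alpha(unsigned char c) { return ascii_is_upper(c) || ascii_is_lower(c); }

/* Length of the leading run of ASCII letters in a NUL-terminated buffer. */
size_t ascii_alpha_span(const char *s)
{
    size_t n = 0;
    while (ascii_is_alpha((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

The loop stops at the terminating NUL because 0 is outside both ranges, so no separate end check is needed.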

> I suppose you consider isalpha() to have a weird API?

Yes. I do not even understand what it does.

>> isalpha() checks for an alphabetic character; in the standard "C" locale, it is equivalent to (isupper(c) || islower(c)). In some locales, there may be additional characters for which isalpha() is true: letters which are neither upper case nor lower case.

Well in any case I'm sure that's not what I wanted... By the way locale is super hard to use as well. Locale is a process global property. I'm not aware of any way to pass explicit locales to library functions.
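To make the process-global point concrete, here is a rough sketch using only standard C. The function names are hypothetical. The second helper pins the global locale to "C" around the call and restores it afterwards, which is not thread-safe; that is exactly the awkwardness being complained about.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Count the bytes isalpha() accepts under whatever locale is current. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char value to isalpha() is undefined. */
        if (isalpha((unsigned char)*s)) {
            n++;
        }
    }
    return n;
}

/* Same count, but with the process-wide LC_CTYPE forced to "C" for the
   duration of the call. Not thread-safe: setlocale() changes global state. */
size_t count_alpha_c_locale(const char *s)
{
    char saved[256] = "C"; /* fallback if the current name does not fit */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current != NULL && strlen(current) < sizeof saved) {
        strcpy(saved, current); /* copy: a later setlocale() may overwrite it */
    }
    setlocale(LC_CTYPE, "C");
    size_t n = count_alpha(s);
    setlocale(LC_CTYPE, saved);
    return n;
}
```

In the "C" locale only the 26 uppercase and 26 lowercase letters count, so a byte such as 0xE9 is rejected there even though some single-byte locales would accept it.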




> If you're pedantic, you could lex (0x41 <= c && c <= 0x5A)

'A' vs. 0x41 makes no difference for portability. The thing that's unportable about that is that it assumes that the characters A..Z are contiguous in your character encoding, which isn't portable C.

Although admittedly having to deal with EBCDIC these days is rare in anything except highly portable programs like C compilers or popular script interpreters.

This is why ctype.h functions exist. Just use them.


> 'A' vs. 0x41 makes no difference for portability. The thing that's unportable about that is that it assumes that the characters A..Z are contiguous in your character encoding, which isn't portable C.

Wait, what? If C does not require A..Z to be contiguous, the distinction between 'A' and 0x41 is extremely significant to portable programs intending to parse ASCII when the native compiler encoding is whatever franken-coding doesn't have contiguous latin characters.


Yes. If the problem was trying to parse ASCII consistently that would be the right solution.

My response was to OP's moving the goalposts to "portably parsing ASCII" in response to his suggested replacement for a C library function not being portable on non-ASCII systems, which makes no sense.


I explicitly wrote "or whatever test" as a comment to the code snippet, and obviously the test was not the point of my comment.

Anyway I think most programming languages nowadays have their source encoding specified as UTF-8 or at least something ASCII-like, so ('A' <= c && c <= 'Z') is in fact what I would likely write, and using isalpha() would technically be a bug just as well.


EBCDIC famously does not have A..Z as contiguous characters, and I wouldn't describe it as a 'franken-coding' just yet - it still finds plenty of use in some places.
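For concreteness, a small sketch of what the gaps look like. In the common EBCDIC code pages the uppercase letters sit in three separate runs: A..I at 0xC1..0xC9, J..R at 0xD1..0xD9, and S..Z at 0xE2..0xE9. The function names are invented for this example.

```c
#include <stdbool.h>

/* Uppercase test against EBCDIC code points: three runs, not one range. */
bool ebcdic_is_upper(unsigned char c)
{
    return (0xC1 <= c && c <= 0xC9)   /* A..I */
        || (0xD1 <= c && c <= 0xD9)   /* J..R */
        || (0xE2 <= c && c <= 0xE9);  /* S..Z */
}

/* What a single ('A' <= c && c <= 'Z') comparison amounts to when 'A' and
   'Z' are EBCDIC values: it also accepts the non-letter bytes in the gaps. */
bool ebcdic_naive_range(unsigned char c)
{
    return 0xC1 <= c && c <= 0xE9;
}
```

A byte like 0xCA or 0xE1 falls in a gap: the naive range accepts it, while the three-run test rejects it.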


Unless you're dealing with mainframes, it's not like you see it every day.


EBCDIC is a classic example of a franken-coding.

If your compiler's source character set is EBCDIC and you want to parse ASCII files, you must use 0x41, etc., instead of 'A'.



