> But also, "strings" and "time" are actually very complex concepts, and these functions operate on often outdated assumptions about those underlying abstractions.
Even in safer languages such as Rust, people often ask why certain string operations are either impossible or surprisingly complicated for what looks like a simple task, and they are then met with responses such as “Did you know that capitalizing a string can change its length, and that the result can depend on locale settings taken from environment variables?”
P.s.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume that they are simpler than they are, and that code that handles them is thus written on such assumptions that the length of a string remain the same after capitalization, or that the result not be under influence of environment variables.
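To make the length assumption concrete, here is a minimal sketch in Python. Note that Python's `str.upper()` uses the Unicode default case mappings and does not consult the locale, so this only shows the length-changing half of the claim, not the environment-variable half. `upper_growth` is a hypothetical helper name.

```python
def upper_growth(s: str) -> int:
    """Return how many code points a string gains when uppercased."""
    return len(s.upper()) - len(s)

# German sharp s uppercases to two letters: "straße" -> "STRASSE".
print("straße".upper(), upper_growth("straße"))

# The "fi" ligature (U+FB01) uppercases to the two letters "FI".
print("\ufb01n".upper(), upper_growth("\ufb01n"))

# Plain ASCII keeps its length, which is why the assumption usually "works".
print("hello".upper(), upper_growth("hello"))
```

Any code that allocates an output buffer sized to the input, or indexes the uppercased string with offsets from the original, is relying on the third case.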
Also known as "why does my code that parses floats fail in Turkey?"
Also also known as the discrepancy between a string's length-as-in-bytes, its length-as-in-code-points, and its length-as-in-how-humans-count-glyphs.
Strings are hard.
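A small Python sketch of those three lengths, for what it's worth. The third count is only a rough approximation I'm assuming for illustration: it skips combining marks, which handles accents but not things like emoji joined with zero-width joiners. Proper grapheme segmentation follows Unicode's text segmentation rules and needs a dedicated library.

```python
import unicodedata

def lengths(s: str) -> tuple[int, int, int]:
    """Return (UTF-8 bytes, code points, approximate visible glyphs)."""
    n_bytes = len(s.encode("utf-8"))
    n_code_points = len(s)
    # Approximation: a code point with a non-zero combining class attaches
    # to the previous character instead of starting a new glyph.
    n_glyphs = sum(1 for ch in s if unicodedata.combining(ch) == 0)
    return n_bytes, n_code_points, n_glyphs

# "noël" written with a combining diaeresis (U+0308) after the "e".
word = "noe\u0308l"
print(lengths(word))   # (6, 5, 4)
print(lengths("abc"))  # (3, 3, 3)
```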
Edit to respond to your addendum:
> P.s.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume that they are simpler than they are, and that code that handles them is thus written on such assumptions that the length of a string remain the same after capitalization, or that the result not be under influence of environment variables.
I don't think I agree with that, though we may just be disagreeing on semantics. I think the big mistake many of us make is confusing two different abstractions for the same one. We've got this high-level abstraction for "text" that includes issues like locale and encoding and several other things. And then we've got this low-level abstraction for "text" that is just a blob of bytes. And we often mix the abstractions because it often turns out okay anyway. Otherwise we have to confront demons like "a UTF-8 string containing 10 characters (code points) can be anywhere between 10 and 40 bytes long".
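The 10-to-40 range falls straight out of UTF-8 using one to four bytes per code point. A quick Python check, with `utf8_len` as a hypothetical helper name:

```python
def utf8_len(s: str) -> int:
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))

# Ten code points each time; only the byte count changes.
samples = {
    "ASCII letter": "a" * 10,          # 1 byte each
    "Latin accent": "\u00e9" * 10,     # é, 2 bytes each
    "Euro sign":    "\u20ac" * 10,     # €, 3 bytes each
    "Emoji":        "\U0001F600" * 10, # 4 bytes each
}

for name, text in samples.items():
    print(f"{name}: {len(text)} code points, {utf8_len(text)} bytes")
```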
> Also known as "why does my code that parses floats fail in Turkey?"
I am quite certain that I have written code that lowercases or uppercases a string and then checks for “i” in it, and I now realize that it would fail under Turkish locale settings, where “i” does not uppercase to “I” as one might expect.
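A sketch of the mismatch in Python. Python's built-in `str.upper()` ignores the locale, so to show what a Turkish-aware mapping produces I'm using a hypothetical helper, `turkish_upper`, that applies only the two Turkish-specific rules (dotted i to İ, dotless ı to I) before the default mapping. It is an illustration, not a complete Turkish case mapping.

```python
def turkish_upper(s: str) -> str:
    """Uppercase with the Turkish dotted/dotless i rules applied first."""
    # i -> İ (U+0130) and ı (U+0131) -> I, then default rules for the rest.
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()

city = "istanbul"

# Locale-independent default: the check for "I" succeeds.
print("I" in city.upper())          # True

# Turkish rules: the first letter is İ, so the same check fails.
print("I" in turkish_upper(city))   # False

# The reverse direction changes length: İ lowercases to "i" plus a
# combining dot above (U+0307) under the default rules.
print(len("\u0130".lower()))        # 2
```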
The problem is you'd have to pass an annoying extra argument (even if just a NULL) to numerous functions which have no alternative without that argument.
Technically it would be better, especially from a multi-threading point of view. The locale stuff was designed in the 1980s, before multi-threading was a mainstream technique.
Say you have a multi-threaded global server which has to localize something in the context of a session, to the locale of the user making the request.
Still, for thread support, you don't necessarily need a cluttering argument. The locale can be made into a thread-specific variable. In Lisp I would almost certainly prefer for the locale to be a dynamic variable. (It would be pretty silly to be passing an argument to influence whether the decimal point is a comma, while the radix of integers is being controlled by *print-base*.)
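For anyone who hasn't used dynamic variables, here is a rough Python analogue using `contextvars`, whose bindings are per-context rather than process-global. The names `decimal_point` and `format_number` are hypothetical, and a real locale would carry far more than one separator.

```python
import contextvars
from contextlib import contextmanager

# The "dynamic variable": defaults to the C-locale decimal point.
_decimal_point = contextvars.ContextVar("decimal_point", default=".")

@contextmanager
def decimal_point(sep: str):
    """Rebind the decimal point for the duration of a with-block."""
    token = _decimal_point.set(sep)
    try:
        yield
    finally:
        # Restore the previous binding, like leaving a Lisp `let`.
        _decimal_point.reset(token)

def format_number(x: float, digits: int = 2) -> str:
    """Format x using whatever decimal point is currently bound."""
    return f"{x:.{digits}f}".replace(".", _decimal_point.get())

print(format_number(3.14159))        # 3.14
with decimal_point(","):
    print(format_number(3.14159))    # 3,14
print(format_number(3.14159))        # 3.14
```

No function in the call chain needs a locale parameter, but the formatting still changes only inside the block that asked for it.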
What you want is for the locale stuff to be broken out into a complete separate library: a whole separate set of loc_* functions: loc_strtod, loc_printf, and so on.
The threading aspect is one thing yes, but I think programmers forgetting or never realizing that these functions will magically behave differently for some of their users is a bigger problem.
I don't think having to pass a locale argument would be that big of a problem - you could always have wrapper functions for the C locale, although they should be implemented directly for performance.
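Roughly what that split could look like, sketched in Python rather than C. Everything here is an assumption for illustration: `NumberLocale`, `loc_strtod` (name borrowed from the comment above), `c_strtod`, and the separator values, which are typed in by hand rather than taken from any real locale database. The wrapper simply delegates, which ignores the performance point but keeps the sketch short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NumberLocale:
    """Hypothetical, minimal stand-in for a locale object."""
    decimal_point: str = "."
    thousands_sep: str = ""

C_LOCALE = NumberLocale()
# Illustrative values: comma as decimal point, period for grouping.
TR_LOCALE = NumberLocale(decimal_point=",", thousands_sep=".")

def loc_strtod(text: str, loc: NumberLocale) -> float:
    """Parse a float using the conventions of an explicitly passed locale."""
    s = text.strip()
    if loc.thousands_sep:
        s = s.replace(loc.thousands_sep, "")
    if loc.decimal_point != ".":
        if "." in s:
            raise ValueError(f"unexpected '.' in {text!r}")
        s = s.replace(loc.decimal_point, ".")
    return float(s)

def c_strtod(text: str) -> float:
    """Wrapper that always uses the C locale, whatever the environment says."""
    return loc_strtod(text, C_LOCALE)

print(loc_strtod("1.234,5", TR_LOCALE))  # 1234.5
print(c_strtod("1234.5"))                # 1234.5
```

The point of the design is that `c_strtod("1234,5")` fails the same way for every user, instead of succeeding or failing depending on where the program happens to run.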
> What you want is for the locale stuff to be broken out into a complete separate library: a whole separate set of loc_* functions: loc_strtod, loc_printf, and so on.