
I don't see any functions in the OP's library that would require dedicated UTF-8 handling. String length is given in bytes, not characters or codepoints, and there's no functionality to fetch the character at the n-th position, etc. You can easily implement all Unicode-specific functionality in a separate library and use it together with the OP's library. IMHO that's even preferable.
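As a rough sketch of that layering (not the OP's API, just an illustration): codepoint counting can sit on top of plain byte-length strings, since UTF-8 continuation bytes are recognizable by their 10xxxxxx bit pattern.

    /* Illustrative only: count UTF-8 codepoints over a byte string.
     * Assumes the input is valid UTF-8; no validation is done. */
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    static size_t utf8_codepoints(const char *s, size_t nbytes)
    {
        size_t count = 0;
        for (size_t i = 0; i < nbytes; i++)
            if (((unsigned char)s[i] & 0xC0) != 0x80) /* skip continuation bytes */
                count++;
        return count;
    }

    int main(void)
    {
        const char *s = "na\xC3\xAFve"; /* "naïve": 6 bytes, 5 codepoints */
        printf("%zu bytes, %zu codepoints\n",
               strlen(s), utf8_codepoints(s, strlen(s)));
        return 0;
    }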



Yes, but don't call it a string library then. Strings should handle strings, and strings are Unicode now. Unicode needs normalization and case-insensitive support.
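Normalization matters because canonically equivalent text can have different byte encodings. A minimal demonstration (standard C only, no Unicode library):

    /* U+00E9 (precomposed "é") vs. U+0065 U+0301 ("e" + combining
     * acute accent): canonically equivalent, but different bytes,
     * so strcmp sees two distinct strings. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *precomposed = "\xC3\xA9";  /* U+00E9 in UTF-8 */
        const char *decomposed  = "e\xCC\x81"; /* U+0065 U+0301 in UTF-8 */
        printf("%d\n", strcmp(precomposed, decomposed)); /* nonzero */
        return 0;
    }

Without normalization, lookups and comparisons silently miss equivalent strings.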

And it's not easy. I implemented the third of its kind. First there was ICU, which is overly bloated; you don't need 30MB for a simple string libc. Then there is libunistring, which has overly slow iterators, so it's not usable for coreutils. And then there's my safelibc, which is small and fast, but only handles wide chars, not UTF-8.

I fixed and updated the musl case-mapping, making it 2x faster, but that's not merged yet. And there's not even a properly spec'ed wcscmp/wcsicmp to search strings with. glibc is an overall mess; I won't touch that. wcsicmp/wcsfc/wcsnorm are not even in POSIX.
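To see why a proper spec is needed, here's a naive case-insensitive wide compare (the name naive_wcsicmp is made up; POSIX has no wcsicmp). Folding one wchar_t at a time via towlower can never handle full case folding like U+00DF "ß" → "ss", which maps one character to two.

    #include <wchar.h>
    #include <wctype.h>

    /* Naive sketch: compares after per-character towlower folding.
     * Misses multi-character foldings and locale special cases
     * (e.g. Turkish dotless i), hence the need for a real spec. */
    static int naive_wcsicmp(const wchar_t *a, const wchar_t *b)
    {
        wint_t ca, cb;
        do {
            ca = towlower((wint_t)*a++);
            cb = towlower((wint_t)*b++);
        } while (ca == cb && ca != L'\0');
        return (int)(ca - cb);
    }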


How does the utf8proc[1] library that Julia uses compare to these?

[1] http://juliastrings.github.io/utf8proc/doc/


Why try to redefine the word "string"?

In computer jargon, I believe CISC and the PDP-11 have seniority. That's why functions operating on multi-word data, like memcpy, are in C's string.h header.


Hey, even C contains a locale-dependent string comparison, namely `strcoll` (since 1990!).
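For instance, a byte-wise comparison and a collating comparison can disagree (assuming a German locale is installed; the exact locale name varies by system):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        setlocale(LC_COLLATE, "de_DE.UTF-8"); /* assumes this locale exists */
        const char *a = "\xC3\xA4pfel"; /* "äpfel" */
        const char *b = "zebra";
        /* Byte-wise, 0xC3 sorts after 'z'; collation puts "ä" near "a". */
        printf("strcmp  says a > b: %d\n", strcmp(a, b) > 0);  /* 1 */
        printf("strcoll says a > b: %d\n", strcoll(a, b) > 0); /* 0 */
        return 0;
    }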

I admit the two words "string" and "text" are now interchangeable. But that doesn't mean strings have fewer requirements; people are just expecting more out of strings.



