In addition to separate string types, they have separate iterator types that let you explicitly get the value you want. So:
String.len() == number of bytes
String.bytes().count() == number of bytes
String.chars().count() == number of Unicode scalar values
String.graphemes(true).count() == number of extended grapheme clusters (requires the unicode-segmentation crate, which is not in the stdlib)
String.lines().count() == number of lines
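A minimal sketch of the stdlib-only counts from that list, in case it helps to see the numbers diverge. The `counts` helper and the sample strings are made up for illustration; the grapheme count is left out because it needs the unicode-segmentation crate.

```rust
// Hypothetical helper: returns (len, byte count, scalar value count, line count).
fn counts(s: &str) -> (usize, usize, usize, usize) {
    (
        s.len(),           // length in bytes (UTF-8)
        s.bytes().count(), // same number, obtained via the byte iterator
        s.chars().count(), // Unicode scalar values
        s.lines().count(), // lines, split on \n or \r\n
    )
}

fn main() {
    // Precomposed accents: two of the characters take two bytes each.
    let s = "h\u{e9}llo\nw\u{f6}rld";
    println!("{:?}", counts(s));

    // 'e' followed by a combining acute accent: one user-perceived
    // character, but two scalar values and three bytes.
    let combining = "e\u{301}";
    println!("{:?}", counts(combining));
}
```

With unicode-segmentation added as a dependency, `combining.graphemes(true).count()` would be the call that treats the second string as a single grapheme.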
Really my only complaint is I don't think String.len() should exist, it's too ambiguous. We should have to explicitly state what we want/mean via the iterators.
That's a real nice API. (Similarly, Python has `@` for matmul but there is no implementation of matmul in the stdlib. NumPy has a matmul implementation so that the `@` operator works.)
ugrapheme and ucwidth are one way to get the grapheme count from a string in Python.
It's probably possible to get the grapheme cluster count from a string containing emoji characters with ICU?
Any correctly designed grapheme cluster segmenter handles emoji characters. It’s part of the spec (says the guy who wrote a Unicode segmentation library for Rust).