> Notably Rust did the correct thing. In addition to separate string types, they ...

pron · 2025-08-22T19:22:13 1755890533

Similar to Java:

   String.chars().count(), String.codePoints().count(), and, for historical reasons, String.getBytes(StandardCharsets.UTF_8).length

westurner · 2025-08-22T19:04:16 1755889456

  String.graphemes().count()

That's a real nice API. (Similarly, Python has the `@` operator for matmul, but there is no implementation of matmul in the stdlib. NumPy provides a matmul implementation so that `@` works on its arrays.)
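A minimal sketch of how that works, in case it helps: `@` dispatches to a `__matmul__` method, so any type can opt in. The `Matrix` class below is a hypothetical toy for illustration only, not NumPy's implementation.

```python
class Matrix:
    """Hypothetical toy matrix type, used only to illustrate the @ operator."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Python calls this for `self @ other`.
        # Plain row-by-column multiplication, no shape checking.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
product = a @ b
print(product.rows)  # [[19, 22], [43, 50]]
```

Built-in types such as `int` and `list` do not define `__matmul__`, so `@` on them raises `TypeError`.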

ugrapheme and ucwidth are one way to get the grapheme count from a string in Python.

It's probably possible to get the grapheme cluster count from a string containing emoji characters with ICU?
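To show why a separate grapheme count matters for emoji, here is a stdlib-only Python sketch of the three counts from the Java comment above. The helper name `unit_counts` and the sample string are my own choices for illustration. None of the three numbers matches what a user sees, which is a single grapheme cluster.

```python
def unit_counts(s: str) -> dict:
    """Count a string in three different units (hypothetical helper)."""
    return {
        # len() on a Python str counts Unicode code points.
        "code_points": len(s),
        # UTF-16 code units, i.e. what Java's String.length() reports.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Bytes in the UTF-8 encoding.
        "utf8_bytes": len(s.encode("utf-8")),
    }


# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL, rendered as one glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
counts = unit_counts(family)
print(counts)  # {'code_points': 5, 'utf16_units': 8, 'utf8_bytes': 18}
```

Getting 1 for this string requires the segmentation rules from the Unicode spec, which the Python stdlib does not implement. That is the gap a library like ugrapheme or ICU fills.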

dhosek · 2025-08-22T19:54:38 1755892478

Any correctly designed grapheme cluster implementation handles emoji characters. It’s part of the spec (says the guy who wrote a Unicode segmentation library for Rust).