I'd say all of the above objections don't address any of the actual issues (which stem from there being multiple, similar metrics one wants regarding the size in bytes, characters, normalized characters, etc. of a Unicode string), while introducing extra semantic bike-shedding!
Whether the metric is called "length", or "size", or whatever, is irrelevant, and length for strings and arrays and lists is so well entrenched and understood that the objections don't make sense.
The actual problem is not that "string length" isn't "really" a length in the way that a leg or a wall has length, but that for Unicode strings it's difficult to calculate, and confusing, unless one understands several Unicode implementation mechanics...
The name "length" itself, for example, or the measuring of said length, was never an issue with ASCII strings (the issue there was remembering to add/subtract the NUL at the end).
It doesn’t matter how well you understand Unicode, it is impossible to compute “the length of a string” because there is no single metric that means that.
It’s like measuring “the size of a box.” I want volume, you want maximum linear dimension, UPS wants length plus width plus height. The problem with “the size of a box” isn’t that it’s hard to measure, it’s that it doesn’t exist. Imagine your favorite language has a Box type with a “size” property. What does it return? How likely is it that the thing it measures is the thing you want?
Of course it was never a problem for ASCII, because ASCII is structured to make most of the measurements people care about be the same value. I want bytes, you want code points, he wants “characters,” doesn’t matter, they’re all the same number.
ASCII is also incapable of representing real world text with good fidelity. It’s inherently impossible to remedy that while maintaining a singular definition of “length.” If you value length measurement over fidelity, you can keep using ASCII.
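To make that concrete, here's a minimal sketch (Python 3, standard library only, with a purely illustrative example string) of how one short string already yields several different "lengths" depending on which metric you ask for:

    import unicodedata

    s = "e\u0301\U0001F1FA\U0001F1F8"   # 'e' + combining acute accent + the US-flag pair

    print(len(s))                                # 4  code points
    print(len(s.encode("utf-8")))                # 11 bytes as UTF-8
    print(len(s.encode("utf-16-le")))            # 12 bytes as UTF-16 (the flag needs surrogate pairs)
    print(len(unicodedata.normalize("NFC", s)))  # 3  code points once 'e' + accent compose
    # Grapheme clusters (an accented 'e' plus one flag) would be 2,
    # but counting those needs a third-party library.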
>It doesn’t matter how well you understand Unicode, it is impossible to compute “the length of a string” because there is no single metric that means that.
That's already covered in my comment ("[the problem] stems from there being multiple, similar metrics one wants").
Whether we call all of them "length" or give each a specialized name is not the real problem.
The real problem is you need to know what you want of each, and some of them (e.g. regarding normalization, decomposition, and so on) can be hard to grasp.
In 99% of cases people want to know either "how many bytes" or "how many discrete character glyphs of final output" (even if they have combining diacritics etc.).
It's really rare to care about the number of glyphs. That's not something you can answer for a string in isolation, anyway; it depends on the font being used to render it. The only code that would care about this would be something like a text rendering engine allocating a buffer to hold glyph info.
I suspect you mean the number of grapheme clusters, which is Unicode's attempt to define something that lines up with the intuitive notion of "a character." This is basically a unit that your cursor moves over when you press an arrow key.
However, it's pretty uncommon to want to know the number of grapheme clusters too. Lots of people think they want to know it, but I struggle to come up with a use case where it's actually appropriate. An intentionally arbitrary limit like Tweet length is the best I can think of.
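For completeness, here's a hedged sketch of counting grapheme clusters, leaning on the third-party `regex` module, whose \X pattern matches one extended grapheme cluster (the 280 figure is just the Tweet-style arbitrary limit mentioned above):

    import regex  # third-party; pip install regex

    def grapheme_count(s: str) -> int:
        # "What a cursor skips over": one extended grapheme cluster per \X match.
        return len(regex.findall(r"\X", s))

    print(grapheme_count("e\u0301"))               # 1, even though it's 2 code points
    print(grapheme_count("\U0001F1FA\U0001F1F8"))  # 1 flag: 2 code points, 8 UTF-8 bytes
    print(grapheme_count("hello world") <= 280)    # True -- the arbitrary-limit use case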
"How many bytes" is ambiguous. Do you mean UTF-8, UTF-16, UTF-32, or something else?
There are a lot of different ways to answer the question, "how long is this string?"
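As a quick illustration of that ambiguity (the string is just an example):

    s = "naïve 🙂"

    print(len(s))                      # 7 code points
    print(len(s.encode("utf-8")))      # 11 bytes
    print(len(s.encode("utf-16-le")))  # 16 bytes
    print(len(s.encode("utf-32-le")))  # 28 bytes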
You did mention similar metrics, but you then went on to say that the objections don't make sense and that the actual problem is that length for a Unicode string is difficult to calculate.
My point is that the difficulty of calculating a length is not the problem. It's annoying, but people have written the code to do it and there's rarely any reason to write it yourself. Just call into whatever library and have it do the work. The problem is that you have to know what kind of question to ask so you can make the call that will actually give you the answer that you need. And that is not the sort of thing that can be wrapped up in a nice little API.
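A hedged sketch of what that looks like in practice (the helper names are only illustrative, and grapheme clusters again use the third-party `regex` module): both truncations below are trivial library calls, but they answer different questions, and picking the wrong one is the actual bug.

    import regex  # third-party

    def truncate_to_bytes(s: str, max_bytes: int) -> str:
        # Fit a UTF-8 byte budget, e.g. a fixed-size storage column.
        return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

    def truncate_to_graphemes(s: str, max_graphemes: int) -> str:
        # Fit a user-visible "character" budget, e.g. a display limit.
        return "".join(regex.findall(r"\X", s)[:max_graphemes])

    s = "e\u0301\U0001F1FA\U0001F1F8!"
    print(truncate_to_bytes(s, 3))       # keeps 'e' + accent; the flag doesn't fit
    print(truncate_to_graphemes(s, 2))   # keeps the accented 'e' and the flag; drops '!'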
Apparently I misremembered slightly: it's actually length + 2 x width + 2 x height.
That link doesn't seem to explain the why, but my understanding is that it's just a decent heuristic for the general difficulty of handling packages as they go through the system. Volume wouldn't be appropriate, because a really long, skinny box is harder to handle than a cube of the same volume.
How do you decide which dimension is the width versus the length? I assume height is often significant for packages containing things that shouldn't be turned upside down, but length versus width seems pretty arbitrary. Is width just assumed to be the shorter of the two dimensions?
Length is defined as the longest side. The other two sides are interchangeable so pick what you like. This measure doesn’t appear to account for packages that require a certain orientation.
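In code form, a small sketch of the measure as described here (the dimensions and the 165-inch threshold are only illustrative):

    def length_plus_girth(a: float, b: float, c: float) -> float:
        # "Length" is the longest side; the other two each count twice (the girth).
        length, width, height = sorted((a, b, c), reverse=True)
        return length + 2 * width + 2 * height

    print(length_plus_girth(40, 20, 10))         # 40 + 2*20 + 2*10 = 100
    print(length_plus_girth(40, 20, 10) <= 165)  # True -- an illustrative size cap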