Fixing Python's String class (pydanny.com)
20 points by pydanny on April 1, 2013 | 10 comments



Getting the length of a string is a bad idea. Usually you do this because you want to iterate over the string one character at a time. But getting the character at a particular index is an O(n) operation, so iterating over a string this way is O(n^2).

This is 2013, strings are Unicode now, and we have to stop thinking of them as arrays of characters. The most widely used Unicode encodings, UTF-8 and UTF-16, both have variable-length characters.


The most obvious way to iterate over a string character by character in Python is 'for char in string: ...'. That loop invokes str.__iter__, which makes the whole traversal O(n) rather than O(n^2).
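To make the comparison concrete, here is a minimal sketch of the two traversal styles being discussed. The function names and the sample string are made up for illustration; both functions produce the same list of characters, and the first is the idiomatic form that goes through the string's iterator.

```python
def chars_by_iteration(text):
    # Idiomatic: the for loop asks the string for an iterator
    # and pulls one character at a time from it.
    return [ch for ch in text]


def chars_by_index(text):
    # The style the article complains about: take the length,
    # then index into the string once per position.
    return [text[i] for i in range(len(text))]


sample = "naïve café"  # hypothetical sample text
print(chars_by_iteration(sample) == chars_by_index(sample))
```

Whether the indexed version is actually slower depends on how the interpreter stores strings, which is what the replies below get into.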


As of Python 3.3, Unicode string indexing in Python is a constant-time operation. http://www.python.org/dev/peps/pep-0393/


Unicode string indexing has always been constant time. On UCS-2 (narrow) builds, though, indexing could return the wrong thing for characters outside the Basic Multilingual Plane: half of a surrogate pair, where a UCS-4 build would have returned the whole character.
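A small sketch of the difference, assuming Python 3.3 or later (where PEP 393 applies). The sample string is made up; the UTF-16 encoding step is only there to show how many 16-bit units an old narrow build would have counted for the same text.

```python
# One character outside the Basic Multilingual Plane, between two ASCII letters.
s = "a\U0001F600b"

# On Python 3.3+, length and indexing work in whole code points.
print(len(s))                 # 3
print(s[1] == "\U0001F600")   # True

# Encoded as UTF-16, the middle character takes two 16-bit units
# (a surrogate pair), which is what a narrow build stored and indexed.
utf16_units = len(s.encode("utf-16-le")) // 2
print(utf16_units)            # 4
```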


I understand this was a joke, but what's the real reason Python does not have a "string".length?


There is a built-in len function, and according to the Python philosophy there should be only one obvious way to do things.
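For anyone wondering how the built-in works with their own types: len() calls the object's __len__ method. A minimal sketch, with a made-up Playlist class standing in for any user-defined collection:

```python
class Playlist:
    """Hypothetical container used only to illustrate __len__."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # len(playlist) ends up here.
        return len(self._tracks)


playlist = Playlist(["intro", "verse", "outro"])
print(len(playlist))   # 3
print(len("string"))   # 6, same built-in, different type
```

So the spelling is the same for strings, lists, and anything else that defines __len__.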



Well, there are two parts here: First, why does Python have a method instead of an attribute, and second, why is it external instead of internal to the object?

To answer the first question, Python allows you to ask about the size of any finite collection, including collections for which knowing the actual length could require an O(n) (or greater!) traversal, and collections which are mutable and can have lengths which change. A method is a far better fit than an attribute for describing the length, since it is something that might have to be calculated rather than being known a priori.

(Of course, nearly all collections keep accounting of their length at all times, but the reasoning is still there.)
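As an illustration of a collection where the length has to be computed, here is a sketch of a singly linked list that keeps no running count, so its __len__ walks every node. The Node and LinkedList classes are hypothetical, not from any library.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList:
    """Hypothetical list that does not track its own size."""

    def __init__(self):
        self.head = None

    def push(self, value):
        # Prepend in O(1); no counter is updated.
        self.head = Node(value, self.head)

    def __len__(self):
        # O(n): count nodes by walking from the head to the end.
        count = 0
        node = self.head
        while node is not None:
            count += 1
            node = node.next_node
        return count


items = LinkedList()
for value in ("a", "b", "c"):
    items.push(value)
print(len(items))  # 3
```

A real implementation would usually just keep a counter, which is the point of the parenthetical above.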

The second question, why len() is external to collections, is related to Python's heritage. I posit, without proof, that len() is external because Python's granddaddy is Smalltalk, where iteration and collection manipulation are external, inverted from what we would expect in other modern languages.

Now, with all of this said, the actual reasons are lost to history, and I don't know if anybody actually knows the exact reason for the design decision anymore. But, nonetheless, that's the way that it is. Hope this was interesting.

Edit: Well, never mind, looks like Guido did explain it. Today I learned things. Leaving this up for posterity, but please do ignore it.


Thought this was a day early for April Fools' until I realized he uses UTC on his blog.


[deleted]


Principle of April Fool's Day



