
I don't understand this comment. The hash method changed between different releases of Python 2.x, enough that we had to change our doctests to ensure a consistent order. So as far as I'm aware, the only guarantee is that calling .keys() and calling .values() will give you the terms in the same order, so long as there hasn't been a modification in between.



Ok, try running this in a program like so:

    d = {chr(i):i for i in range(65,91)}
    print(d)
Do it with python2 and python3. You'll see that the output in python3 changes every time.

If someone was relying on consistent ordering, they're going to have a bug.

Python2's ordering is deterministic[0]

[0]https://docs.python.org/2/library/stdtypes.html#dict.items


Okay, we're talking about the same thing. As the documentation points out:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

If you restart Python, you have broken the correspondence.
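A minimal sketch of what that guarantee does cover, assuming nothing modifies the dictionary between the calls (the data here is just the example from upthread):

```python
# Example data from upthread; any dict works.
d = {chr(i): i for i in range(65, 91)}

# No modification between the two calls, so position k in keys
# corresponds to position k in values.
keys = list(d.keys())
values = list(d.values())
pairs = list(zip(keys, values))

# The zipped pairs match items() element for element.
same_as_items = pairs == list(d.items())
print(same_as_items)
```

This says nothing about which order you get, only that the views agree with each other within one run.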

You'll note that either the documentation is incomplete or your interpretation is incorrect, as the most recent versions of 2.6 and 2.7 will use a randomized hash table when the -R flag is enabled:

    % ~/Python-2.7.10/python.exe -R x.py
    {'M': 77, 'L': 76, 'O': 79, 'N': 78, 'I': 73, 'H': 72,
     'K': 75, 'J': 74, 'E': 69, 'D': 68, 'G': 71, 'F': 70,
     'A': 65, 'C': 67, 'B': 66, 'Y': 89, 'X': 88, 'Z': 90,
     'U': 85, 'T': 84, 'W': 87, 'V': 86, 'Q': 81, 'P': 80,
     'S': 83, 'R': 82}
    % ~/Python-2.7.10/python.exe -R x.py
    {'Z': 90, 'Y': 89, 'X': 88, 'W': 87, 'V': 86, 'U': 85,
     'T': 84, 'S': 83, 'R': 82, 'Q': 81, 'P': 80, 'O': 79,
     'N': 78, 'M': 77, 'L': 76, 'K': 75, 'J': 74, 'I': 73,
     'H': 72, 'G': 71, 'F': 70, 'E': 69, 'D': 68, 'C': 67,
     'B': 66, 'A': 65}
I can totally understand how people expect an invariant order. As I pointed out, our regression code broke in the 2.x series because we relied on consistent ordering, and CPython never made that promise. But what I quoted above is the only guarantee about dictionary order. Everything else is an implementation accident.
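For doctests, one way to avoid depending on that accident is to sort before comparing. A sketch, with `letter_codes` as a made-up function for illustration rather than anything from our code base:

```python
def letter_codes():
    """Return a mapping of uppercase letters to their code points.

    Sorting the items makes the expected output independent of
    whatever order the dict happens to iterate in.

    >>> sorted(letter_codes().items())[:3]
    [('A', 65), ('B', 66), ('C', 67)]
    """
    return {chr(i): i for i in range(65, 91)}
```

Saved to a file, this can be checked with `python -m doctest` and should pass regardless of hash seed or interpreter.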

Nor is it the only such implementation-specific behavior that people sometimes depend on.

  >>> for c in "This is a test":
  ...   if c is "i": print "Got one!"
  ... 
  Got one!
  Got one!
That's under CPython, where single-character strings with ord(c) < 256 use an intern table. PyPy doesn't print anything because it doesn't use that mechanism.

Note that 'is' testing is also faster:

    % python -mtimeit -s 's="testing 1, 2, 3."*1000' 'sum(1 for c in s if c is "t")'
    1000 loops, best of 3: 893 usec per loop
    % python -mtimeit -s 's="testing 1, 2, 3."*1000' 'sum(1 for c in s if c == "t")'
    1000 loops, best of 3: 1.01 msec per loop
This extra 10% is sometimes attractive.
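The portable version uses equality and gives the same answer on any implementation. A small sketch (`count_char` is just an illustrative helper name):

```python
def count_char(text, target):
    """Count occurrences of target in text using equality, not identity."""
    return sum(1 for c in text if c == target)

# Equality does not depend on whether the interpreter interns
# single-character strings.
print(count_char("This is a test", "i"))  # prints 2
```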


Sure, but why did you enable the -R flag?

We are talking about the default way of doing things in by far the most commonly used implementation.

I'm not saying someone should have relied on the specific ordering or that the code that does rely on it is a great way of doing things.

CPython2 did make that promise -

    CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
In other words, the order should be the same no matter how many times you restart the application.

It is a subtle source of bugs since it always worked in each specific version of CPython2 without the -R flag.


"why did you enable the -R flag?"

Because either the documentation means to include -R in the description, in which case your interpretation of the documentation is incorrect, or the documentation is incomplete because it doesn't describe a valid CPython 2.x run-time. Either way, it indicates that the difference isn't, strictly speaking, a Python2/3 issue.

"an arbitrary order"

Where does it say that the arbitrary order must be consistent across multiple invocations? Quoting from https://docs.python.org/2/using/cmdline.html#cmdoption-R :

> Changing hash values affects the order in which keys are retrieved from a dict. Although Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds), enough real-world code implicitly relies on this non-guaranteed behavior that the randomization is disabled by default.

I totally understand your point. I remember the debates about how this would break code. But it's there to mitigate algorithmic complexity attacks against an ever-increasing attack surface. This was the best solution they came up with, along with a migration path to the new default.
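If you want to see the effect of the seed without relying on dict order at all, here is a rough sketch that asks a fresh interpreter for a string hash under a fixed PYTHONHASHSEED. The helper name `hash_in_child` is mine, and I'm assuming an interpreter with hash randomization on by default (Python 3.3+):

```python
import os
import subprocess
import sys

def hash_in_child(seed, text="A"):
    """Return hash(text) as computed by a fresh interpreter
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % text],
        env=env,
    )
    return int(out.decode().strip())

# The same seed gives the same hash across restarts.
print(hash_in_child(1) == hash_in_child(1))
```

With the seed left unset, each restart picks its own random seed, which is why code that depends on hash values can see different results from run to run.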



