It is surprising though that the order was non-deterministic, not only undefined...

fyi1183 · on Feb 5, 2018

Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.

From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.

stavros · on Feb 5, 2018

I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.

ridiculous_fish · on Feb 5, 2018

Would you expect that `str(some_dict) == str(some_dict)`?

PeterisP · on Feb 5, 2018

For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.

I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.

stavros · on Feb 5, 2018

No, of course not. Some order has to be applied to serialize it to a string, but that order is not defined between calls.

eesmith · on Feb 5, 2018

Python long ago made the guarantee that:

> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.

That is, the order will be maintained between calls, so long as the dictionary had not been modified.

Going back to "str(some_dict) == str(some_dict)". I would not expect to always be the same, but for entirely different reasons. Consider:

  class Strange():
    def __repr__(self):
      import random
      return str(random.random())

  some_dict = {1: Strange()}

  >>> str(some_dict) == str(some_dict)
  False

stavros · on Feb 5, 2018

A section called "CPython implementation detail" looks like the very antithesis of a guarantee, to me.

eesmith · on Feb 5, 2018

My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.

I should have quoted the next section:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.

stavros · on Feb 5, 2018

Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.

ubernostrum · on Feb 5, 2018

See this comment for the details:

https://news.ycombinator.com/item?id=16307007

brett40324 · on Feb 5, 2018

It is simply that a python dict is not a list (read array).