Hacker News new | past | comments | ask | show | jobs | submit login

It is surprising though that the order was non-deterministic, not only undefined. Determinism is something I intuitively expect from read-only operations like iteration.



Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.

From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.


I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.


Would you expect that `str(some_dict) == str(some_dict)`?


For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.

I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.


No, of course not. Some order has to be applied to serialize it to a string, but that order is not defined between calls.


Python long ago made the guarantee that:

> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.

That is, the order will be maintained between calls, so long as the dictionary had not been modified.

Going back to "str(some_dict) == str(some_dict)". I would not expect to always be the same, but for entirely different reasons. Consider:

  class Strange():
    def __repr__(self):
      import random
      return str(random.random())

  some_dict = {1: Strange()}

  >>> str(some_dict) == str(some_dict)
  False


A section called "CPython implementation detail" looks like the very antithesis of a guarantee, to me.


My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.

I should have quoted the next section:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.


Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.


See this comment for the details:

https://news.ycombinator.com/item?id=16307007


It is simply that a python dict is not a list (read array).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: