Hacker News new | past | comments | ask | show | jobs | submit login

Dicts are UNORDERED associative containers. If youre depending your app on implementation defined behavior, that's on your developpers shoulders. Stuff like that shouldn't pass code review



It is surprising though that the order was non-deterministic, not only undefined. Determinism is something I intuitively expect from read-only operations like iteration.


Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.

From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.


I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.


Would you expect that `str(some_dict) == str(some_dict)`?


For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.

I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.


No, of course not. Some order has to be applied to serialize it to a string, but that order is not defined between calls.


Python long ago made the guarantee that:

> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.

That is, the order will be maintained between calls, so long as the dictionary had not been modified.

Going back to "str(some_dict) == str(some_dict)". I would not expect to always be the same, but for entirely different reasons. Consider:

  class Strange():
    def __repr__(self):
      import random
      return str(random.random())

  some_dict = {1: Strange()}

  >>> str(some_dict) == str(some_dict)
  False


A section called "CPython implementation detail" looks like the very antithesis of a guarantee, to me.


My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.

I should have quoted the next section:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.


Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.


See this comment for the details:

https://news.ycombinator.com/item?id=16307007


It is simply that a python dict is not a list (read array).


Not for much longer, as of 3.7 the ordering is a language feature: https://mail.python.org/pipermail/python-dev/2017-December/1...

It's mad that it ever wasn't this way. Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.

It has been so much more pleasant to write python since ordering was maintained by default.


> Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.

Here's one real-life use case of that: Avro records. One of the formats Avro uses is a text format that's basically ordered JSON. One company I worked for years ago used Avro as its wire protocol, and some Avro data was stored as JSON files on disk. Of course, Python's JSON implementation by default loads/unloads JSON to/from a dict. So just calling json.load() and json.dump() means I can't just load an Avro record from disk, change some data, and save it (which is something that came up when I was writing an upgrade script at a company I was working at years ago).

Thankfully, I had an out: the JSON library lets you override what container you load JSON into with object_pairs_hook, so I could just snarf it into an OrderedDict. But if I ever have to do this again after 3.7 comes out, I'm glad I won't have to worry about making sure I have the right container class. It makes my code simpler, and I won't have to leave a comment explaining why the code will break unless I specify an OrderedDict.


Great example.

I hate to think of all the developer hours wasted because JSON doesn't maintain key ordering.


Not to mention the lost opportunities for delta compression.


>>(all database query result rows, for one)

What? No. SQL does not return results in any consistent ordering unless specifically instructed to.


Not the result set, the rows of the result set.


Do you mean the columns of the row?


No, I mean the rows. Each row is semantically an ordered mapping.


Shouldn’t the collection of rows be a set or list, not a dictionary?

That said, you disagreed with my question then went on to show my question was on point.

The “rows themself” being an ordered map means you are referring to the columns, the order being set by the SELECT clause or table definition order (in case of wildcard).

That said, I personally feel iterating over table columns in that way to be a “bad code smell”. Not saying it’s bad in all cases, but generally it’s an anti-pattern to me.


Order-significance is exactly how the relational model was defined in the beginning

"An array which represents an n-ary relation R has the following properties:

1. Each row represents an n-tuple of R.

2. The ordering of rows is immaterial.

3. All rows are distinct.

4. The ordering of columns is significant -- it corresponds to the ordering S1, S2, ..., Sn of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).

5. The significance of each column is partially conveyed by labeling it with the name of the corresponding domain."

-- A relational model of data for large shared data banks[1]

[1]: https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf


> Shouldn’t the collection of rows be a set or list, not a dictionary?

I didn't mention the collection of rows, I mentioned the rows themselves.

> The “rows themself” being an ordered map means you are referring to the columns

No, it means I'm referring to the rows themselves.

The rows themselves are each ordered mappings.

Hope this helps to clarify things. Happy to keep repeating this as many times as necessary.


I think the confusion is your use of “columns” or “columns of the row” to refer to attribute-value pairs that make up the row.


Unless there is an "order by" clause, the order of rows in the result set is undefined and non-deterministic from query to query.


As I said:

> Not the result set, the rows of the result set.

Each row is a mapping with ordered keys.


Perhaps you are confused and mean "columns"? ("Rows of the result set" is what I was referring to.)

A result set has rows, which are not in a deterministic order unless an "order by" is provided. Each row has columns. The columns are in order, obviously.


Perhaps you are confused and think I'm talking about the ordering of the result set.

I said that database query result rows are made up of ordered dictionaries. I didn't mention ordering of the result set.

Happy to keep repeating this as many times as necessary.


ok. We're actually saying the same thing, just in different words.


I have the opposite reaction to it, it seems insanely idiotic to have a associative array with ordered keys. It can only make sense to someone who doesn't know anything about fundamental data structures and a language that caters to people like that in spite of the performance penalty is just strange.

but hey, its Guido, I still can't fathom that he moved reduce into functools.


I vaguely remember that the change to ordered keys in 3.6 was actually a side effect of making the dict implementation more efficient!

https://mail.python.org/pipermail/python-dev/2016-September/...


Well, given that I used reduce twice in production in 14 years of Python, my guess is that you are using Python wrong.


Putting aside the fact that this change was made to increase performance, I'd rather have a language that's useful, expressive, and semantically powerful by default, instead of one that is less powerful and harder to use but slightly faster.


The data structure OrderedDict does what you describe and has been in the stdlib since I think 2.7


Indeed, but it's much nicer when it's universal and built-in rather than requiring a specific import and different, more cumbersome syntax.


Ordereddict has been available for ages. Why should I pay for the overhead in the 98% of the cases where I don't need it.


The new dict implementation was introduced in 3.5 (with forced order randomization, which was removed in 3.6) because it is faster and uses less memory than the previous non-ordered dict. Ordering is merely a nice byproduct. So you're not paying any penalty.

OrderedDict is fundamentally different because it is designed to allow inserting/removing keys in the middle of the ordering in O(1) time. It does not use the new dict under the hood, or vice-versa.


Actually, 3.7 are not always ordered. If you delete a key, order is not guaranteed anymore. For performance reasons.


Are you sure about that? I thought they considered that but decided to just make them always ordered in the end.


Keeping the dict ordered is actually faster. There’s a link to a YouTube talk a little upthread that discusses the algorithms used. It’s really interesting.


agree completely

aside from that, python has a separate ordered dict class




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: