Dicts are UNORDERED associative containers. If youre depending your app on implementation defined behavior, that's on your developpers shoulders. Stuff like that shouldn't pass code review
It is surprising though that the order was non-deterministic, not only undefined. Determinism is something I intuitively expect from read-only operations like iteration.
Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.
From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.
I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.
For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.
I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.
> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.
That is, the order will be maintained between calls, so long as the dictionary had not been modified.
Going back to "str(some_dict) == str(some_dict)". I would not expect to always be the same, but for entirely different reasons. Consider:
class Strange():
def __repr__(self):
import random
return str(random.random())
some_dict = {1: Strange()}
>>> str(some_dict) == str(some_dict)
False
My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.
I should have quoted the next section:
> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.
Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.
Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.
It's mad that it ever wasn't this way. Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.
It has been so much more pleasant to write python since ordering was maintained by default.
> Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.
Here's one real-life use case of that: Avro records. One of the formats Avro uses is a text format that's basically ordered JSON. One company I worked for years ago used Avro as its wire protocol, and some Avro data was stored as JSON files on disk. Of course, Python's JSON implementation by default loads/unloads JSON to/from a dict. So just calling json.load() and json.dump() means I can't just load an Avro record from disk, change some data, and save it (which is something that came up when I was writing an upgrade script at a company I was working at years ago).
Thankfully, I had an out: the JSON library lets you override what container you load JSON into with object_pairs_hook, so I could just snarf it into an OrderedDict. But if I ever have to do this again after 3.7 comes out, I'm glad I won't have to worry about making sure I have the right container class. It makes my code simpler, and I won't have to leave a comment explaining why the code will break unless I specify an OrderedDict.
Shouldn’t the collection of rows be a set or list, not a dictionary?
That said, you disagreed with my question then went on to show my question was on point.
The “rows themself” being an ordered map means you are referring to the columns, the order being set by the SELECT clause or table definition order (in case of wildcard).
That said, I personally feel iterating over table columns in that way to be a “bad code smell”. Not saying it’s bad in all cases, but generally it’s an anti-pattern to me.
Order-significance is exactly how the relational model was defined in the beginning
"An array which represents an n-ary relation R has the following properties:
1. Each row represents an n-tuple of R.
2. The ordering of rows is immaterial.
3. All rows are distinct.
4. The ordering of columns is significant -- it corresponds to the ordering S1, S2, ..., Sn of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).
5. The significance of each column is partially conveyed by labeling it with the name of the corresponding domain."
-- A relational model of data for large shared data banks[1]
Perhaps you are confused and mean "columns"?
("Rows of the result set" is what I was referring to.)
A result set has rows, which are not in a deterministic order unless an "order by" is provided. Each row has columns. The columns are in order, obviously.
I have the opposite reaction to it, it seems insanely idiotic to have a associative array with ordered keys. It can only make sense to someone who doesn't know anything about fundamental data structures and a language that caters to people like that in spite of the performance penalty is just strange.
but hey, its Guido, I still can't fathom that he moved reduce into functools.
Putting aside the fact that this change was made to increase performance, I'd rather have a language that's useful, expressive, and semantically powerful by default, instead of one that is less powerful and harder to use but slightly faster.
The new dict implementation was introduced in 3.5 (with forced order randomization, which was removed in 3.6) because it is faster and uses less memory than the previous non-ordered dict. Ordering is merely a nice byproduct. So you're not paying any penalty.
OrderedDict is fundamentally different because it is designed to allow inserting/removing keys in the middle of the ordering in O(1) time. It does not use the new dict under the hood, or vice-versa.
Keeping the dict ordered is actually faster. There’s a link to a YouTube talk a little upthread that discusses the algorithms used. It’s really interesting.