I've been toying around with Python 3 and using it for most of my personal/hack projects, but I somehow missed the unpacking improvements: https://www.python.org/dev/peps/pep-0448/
In particular, being able to create an updated copy of a dict with a single expression is pretty cool:
return {**old, 'foo': 'bar'}
# Old way
new = old.copy()  # copy() must be called; shallow copy, then mutate
new['foo'] = 'bar'
return new
It's twice as slow, doesn't work with more than one dictionary, you can't easily control the merge precedence, and you can't (easily) use expressions/variables in the keys.
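For instance, a quick sketch of the last three points (made-up names):
a, b = {'x': 1, 'y': 2}, {'y': 20, 'z': 30}
key = 'w'
merged = {**a, **b, key + '!': 99}  # later entries win: {'x': 1, 'y': 20, 'z': 30, 'w!': 99}
# whereas dict(a, **b) accepts only string keys and a single ** source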
On the face of it, sure, but really it's some new bytecode that gets rid of the limitations of using kwargs like that, namely that it's slow, impossible to really optimize, and doesn't support duplicate keys.
Not really, and it's not the most pythonic way of doing things IMO. With the syntax above (I think) you can work with any iterable, whereas with a dictionary union operator you'd have to define it on every class you'd want to use, and you'd be out of luck with generators.
I've been practicing Python for a while and didn't even know about this. In my code style, I try to completely avoid the "dict" keyword and exclusively use dict literal notation.
I'm allergic to the { ... } thing. And I like terse and cryptic, but for some reason I find it the least attractive syntactical trick in all of Python 3 (that I know of).
Bear in mind that most things JavaScript got recently are old: literals, closure syntax, destructuring, spread... (Lisp and Scheme of course, but Python picked them up a while back too). This is all very, very old, but now ECMAxxxx is bringing it to the mainstream.
It's so strange to me that data scientists would need to be convinced to move to Python 3. It's superior in every way to legacy Python. I can understand maintaining Python 2 compatibility for legacy systems if you don't want to have Python 3 as a dependency, but data scientists will be writing mostly ad hoc code and using Jupyter notebooks. The people around me are not allowed to use Python 2; in fact, they're generally required to use the latest version of Python 3.
For anyone having trouble with maintaining multiple Python versions, I recommend pyenv. You can install multiple local versions and switch between them. The selected version then provides the standard commands "python", "pip", etc., which you can use to make your virtualenv from.
If you’re doing data science, do consider the anaconda python distribution with conda environments instead. Intel MKL linked numpy gives a lot better performance, and conda can install binary dependencies for libs that are tricky to compile even on linux. Pip packages are just more work and less performance in my experience (~10x)
For me, it's as simple as this: Python 2 is not a moving target. Whatever new changes there are in Python 3 don't affect me, since I won't be using any of the new features even if I were to use Python 3.
Tradition? ;) I only use python for deep learning, which is not really that intensive in terms of code. Also most of the code out there for deep learning works on both python 2 and 3.
Also, your phrasing invites the question: why on earth would I use a new feature in a Turing-complete language? Unless using the new feature results in tangible improvements in my code, I don't see a reason to use it.
That's not the point. The point is there is no reason to use an extra language feature unless it provides a tangible improvement in the code.
In fact, there are lots of reasons not to use an extra language feature that doesn't provide tangible improvements in code, including maintainability, ease of reading code, and portability in python's case.
I've moved to python 3 over the past couple of months, after resisting for the better part of a decade. I like it.
One surprising thing I learned from this document is that dicts now iterate in assignment order, not hash order. That's going to break some code for people.
I believe newer Python versions have at least deterministic iteration order (correct me if I'm wrong), while some previous ones had a non-deterministic one (non-deterministic between multiple invocations of the interpreter). But there is also OrderedDict which iterates in assignment order.
In Python 2.7 prior to 2.7.3, or Python 3 prior to 3.3, iterating a dictionary would -- if it always contained the same set of keys -- be in a consistent order across multiple runs of the same program. This was an implementation detail not to be relied on.
Starting in 2.7.3 and 3.3, the hashing algorithm applied a per-process random salt to certain built-in types, in order to mitigate a potential denial-of-service vector by crafting dictionary keys which cause hash collisions. Unless the PYTHONHASHSEED environment variable was set to a consistent value, iteration over a dictionary was no longer accidentally consistent across runs.
In 3.6, the dictionary implementation was rewritten to drastically improve performance. As a side effect, dictionary iteration became consistent again, this time determined by insertion order. In 3.6, this consistency was an implementation detail and not to be relied on.
Beginning with 3.7, dictionary iteration order is finally guaranteed by the language, independent of implementation, and goes by insertion order.
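A trivial check of what that guarantees:
d = {}
for k in ['banana', 'apple', 'cherry']:
    d[k] = len(k)
print(list(d))  # ['banana', 'apple', 'cherry'] on any conforming 3.7+ implementation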
Hmmmmmm.... that makes it hard for other implementations. CPython isn't the whole of the Python world. Requirements like that can be problematic for implementations like Micropython, for instance.
The entire reason they decided to make it a standard rather than keep it as an implementation detail is to make things better for users. It’s not about just looking after cpython.
I suspect the reasoning was with CPython dominating, people would eventually make this assumption in their code, perhaps unintentionally, and reduce compatibility. Better to make it official.
EDIT: I was just informed that as of 3.7 it is an official language feature, making everything below invalid.
Personally, I'm worried people will come to rely on the new behaviour in code instead. As core developers have repeatedly said, dict order is still an implementation detail; it should not be relied on, as it is not officially part of the language. Other implementations (except PyPy) will probably not have this behaviour.
Yet, I feel like this will fall on deaf ears and become a de-facto part of the language. Blogs will state it as a new feature, Python books will teach it and new coders will rely on it, forever locking the dict internals in place for all python interpreters.
(If you need to rely on ordering, use an OrderedDict instead.)
Dicts are UNORDERED associative containers. If you're making your app depend on implementation-defined behavior, that's on your developers' shoulders. Stuff like that shouldn't pass code review.
It is surprising though that the order was non-deterministic, not only undefined. Determinism is something I intuitively expect from read-only operations like iteration.
Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.
From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.
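You can observe the salt from outside the process; a minimal sketch, assuming 3.3+ defaults (PYTHONHASHSEED not pinned in the environment):
import subprocess, sys
cmd = [sys.executable, '-c', "print(hash('python'))"]
runs = {subprocess.check_output(cmd) for _ in range(2)}
print(len(runs))  # almost always 2: each interpreter process salts str hashing differently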
I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.
For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.
I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.
> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.
That is, the order will be maintained between calls, so long as the dictionary had not been modified.
Going back to "str(some_dict) == str(some_dict)". I would not expect it to always be the same, but for entirely different reasons. Consider:
>>> import random
>>> class Strange:
...     def __repr__(self):
...         return str(random.random())
...
>>> some_dict = {1: Strange()}
>>> str(some_dict) == str(some_dict)
False
My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.
I should have quoted the next section:
> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.
Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.
Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.
It's mad that it ever wasn't this way. Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.
It has been so much more pleasant to write python since ordering was maintained by default.
> Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.
Here's one real-life use case of that: Avro records. One of the formats Avro uses is a text format that's basically ordered JSON. One company I worked for years ago used Avro as its wire protocol, and some Avro data was stored as JSON files on disk. Of course, Python's JSON implementation by default loads/dumps JSON to/from a dict. So just calling json.load() and json.dump() means I can't simply load an Avro record from disk, change some data, and save it (which is exactly what came up when I was writing an upgrade script there).
Thankfully, I had an out: the JSON library lets you override what container you load JSON into with object_pairs_hook, so I could just snarf it into an OrderedDict. But if I ever have to do this again after 3.7 comes out, I'm glad I won't have to worry about making sure I have the right container class. It makes my code simpler, and I won't have to leave a comment explaining why the code will break unless I specify an OrderedDict.
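Roughly this, with a made-up file name and field:
import json
from collections import OrderedDict

with open('record.json') as f:
    record = json.load(f, object_pairs_hook=OrderedDict)  # keep key order on load
record['version'] = 2
with open('record.json', 'w') as f:
    json.dump(record, f)  # keys serialize back out in the preserved order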
Shouldn’t the collection of rows be a set or list, not a dictionary?
That said, you disagreed with my question then went on to show my question was on point.
The “rows themselves” being an ordered map means you are referring to the columns, the order being set by the SELECT clause or table definition order (in case of wildcard).
That said, I personally feel iterating over table columns in that way to be a “bad code smell”. Not saying it’s bad in all cases, but generally it’s an anti-pattern to me.
Order-significance is exactly how the relational model was defined in the beginning:
"An array which represents an n-ary relation R has the following properties:
1. Each row represents an n-tuple of R.
2. The ordering of rows is immaterial.
3. All rows are distinct.
4. The ordering of columns is significant -- it corresponds to the ordering S1, S2, ..., Sn of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).
5. The significance of each column is partially conveyed by labeling it with the name of the corresponding domain."
-- A relational model of data for large shared data banks[1]
Perhaps you are confused and mean "columns"?
("Rows of the result set" is what I was referring to.)
A result set has rows, which are not in a deterministic order unless an "order by" is provided. Each row has columns. The columns are in order, obviously.
I have the opposite reaction to it: it seems insanely idiotic to have an associative array with ordered keys. It can only make sense to someone who doesn't know anything about fundamental data structures, and a language that caters to people like that in spite of the performance penalty is just strange.
But hey, it's Guido. I still can't fathom that he moved reduce into functools.
Putting aside the fact that this change was made to increase performance, I'd rather have a language that's useful, expressive, and semantically powerful by default, instead of one that is less powerful and harder to use but slightly faster.
The new dict implementation was introduced in CPython 3.6 because it is faster and uses less memory than the previous non-ordered dict. Ordering is merely a nice byproduct, so you're not paying any penalty.
OrderedDict is fundamentally different because it is designed to allow inserting/removing keys in the middle of the ordering in O(1) time. It does not use the new dict under the hood, or vice-versa.
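For example, OrderedDict has O(1) reordering operations that plain dict has no equivalent for:
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od.move_to_end('a')     # O(1) move to the back
od.popitem(last=False)  # O(1) pop from the front; removes ('b', 2)
print(list(od))         # ['c', 'a']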
Keeping the dict ordered is actually faster. There’s a link to a YouTube talk a little upthread that discusses the algorithms used. It’s really interesting.
Before 3.7, dict ordering is documented literally everywhere as "consider it implementation details, use OrderedDict if you need order". From 3.7, it's part of the spec.
If you were relying on keys order before, you were not only doing it wrong, but you were doing so despite being told again and again.
This has been changed in Python 3.6 only, due to a change in dictionary implementation to make them more efficient. "Modern Dictionaries by Raymond Hettinger" [1] is a quite interesting and technical talk about these changes, explaining also the change in iteration order IIRC, worth a watch in my opinion!
This is why you read the documentation rather than relying on what a piece of library code appears to do. Maybe all programmers have to learn this the hard way.
I’m a little surprised at this point that Apple still doesn’t include a default Python 3.x on macOS. It’s the single thing keeping me from moving (as there’s a big difference between “just run this” and “first download this, then run this”).
My theory is that it's not going to be a nice transition when they drop 2.7. Maybe one version will ship with both installed by default, then they'll use only 3.7/3.8 for the next decade. macOS doesn't seem to care a lot about backwards compatibility recently.
I’m not so sure; /System/Library/Frameworks links for Python versions have been stable for a very long time (which surprised me at first but I imagine Apple has plenty of stuff of their own that uses Python). Even though they’ve since hacked a lot of the older versions with symbolic links to 2.7, versions back to 2.3 have valid paths.
Honestly the fact that they include a system Python 2 is a huge pain. You can't add packages to it, and you shouldn't add another Python 2 interpreter to the PATH. You end up having to use virtualenv which is a stupid hack.
I used PyInstaller for the first time this week to build a single binary executable for Linux. So far in limited use the result has been great. I think I'm gonna do OS X next.
I'd been a 2.7 holdout for ages, but when f-strings were greenlit for 3.6, I decided then and there that all my new personal projects would be written in 3.
I'm glad I did. F-strings are wonderful, as are pathlib and the enhanced unpacking syntax.
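All three in one toy snippet (3.6+, names made up):
from pathlib import Path

name, count = 'reports', 3
target = Path.home() / 'projects' / name        # pathlib: / joins components
print(f'{name} -> {target} ({count} entries)')  # f-string interpolation
options = {**{'retries': 1}, 'retries': 5}      # enhanced unpacking (PEP 448)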
Since I started my current job, I've also been writing as many scripts as I can in Python 3 as well (and Docker has been a godsend for that because I can now deploy 3.6 scripts to the few servers we have that are still running RHEL 6).
This runs "/opt/command_to_run /file_to_process" inside a container as the current uid, mounting the parameter to the shell script as "/file_to_process" inside the container.
The :z mount parameter may or may not be needed depending on whether you have SELinux enabled (by default, SELinux prevents containers from accessing any file on the host, and :z changes the SELinux context to allow access). I don't know if this is the case with EL6, though.
The -t parameter shouldn't be used if your script is running in a pipeline (it creates a pseudo-tty), so it may be worth having some kind of conditional to remove it.
The wrapper I use also has a conditional to add the "$(pwd)" prefix to the call parameter only if the parameter is a relative path.
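I don't know the parent's actual wrapper, but a Python sketch of what's described above might look like this (image and command names made up):
import os, subprocess, sys

path = sys.argv[1]
if not os.path.isabs(path):  # the "$(pwd)" prefix, only for relative paths
    path = os.path.join(os.getcwd(), path)

cmd = ['docker', 'run', '--rm',
       '--user', f'{os.getuid()}:{os.getgid()}',  # run as the current uid
       '-v', f'{path}:/file_to_process:z']        # :z relabels for SELinux hosts
if sys.stdout.isatty():  # skip the pseudo-tty when running in a pipeline
    cmd.append('-t')
cmd += ['some-image', '/opt/command_to_run', '/file_to_process']
subprocess.run(cmd, check=True)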
And at some point, I'll make proper startup scripts for them. On RHEL 7 boxes, I've made systemd unit files. On RHEL 6... well, I suppose I'll be writing initscripts soon.
Several posters indicate that they've stuck to Python 2.7 even for small side projects until now. I cannot understand why: Python 3 has been technically superior for a few years, and side projects must surely be a good opportunity for learning something new?
But surely that's not stopping anyone. As someone above said, just use pyenv to run Python 2.7.x and Python 3. It's not as if anyone has to settle for using only legacy Python.
If you would like to write Python 3 but need to maintain support for any particular version of Python 2, something that I can personally recommend is using the Coconut transpiler: http://coconut-lang.org/
The Coconut language is a superset of Python 3, so any valid Python 3 is valid Coconut. But the advantage of the transpiler extends beyond the language itself, in that it can target any version of Python from 2.6 onwards on the Python 2 branch and 3.2 onwards on the Python 3 branch.
It has been really useful for me in that I want the advantages of type annotations and f-strings and other python 3.5+ features but I have to support running in an environment with only 2.6 installed. So when I target a 3.5+ version, all of those features are maintained, but when I target 2.6, the transpiler does all the work in converting to running 2.6 code for me.
I have been so conditioned to write code that is 2/3 compatible that even when I am writing specifically for the PY3 interpreter, the code turns out to be a __future__ import away from being valid PY2. I did not think much of them at first, but very recently, f-strings have changed that.
I think many people get imprinted with writing the PY3 code using only the features available at the time they switched over.
Back-ports are an option, but they are an extra dependency. When you have not used a new feature, the cost of an extra dependency outweighs the unknown benefits that would be realised by using a back-port.
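The preamble I mean; with it, much Python 3-style code runs unchanged on 2.7:
from __future__ import absolute_import, division, print_function, unicode_literals

print('same on 2.7 and 3.x:', 7 / 2)  # true division gives 3.5 on both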
Every time somebody mentions how awesome f-strings are, it makes me laugh. Around ten years ago the Python community was looking down at Perl and shell with their string interpolation, but now that Python has pretty much the same thing, it's suddenly not considered a misfeature.
Have you considered that maybe the language had not evolved enough for them to be appealing? With b'' and u'' string syntax coming into the language, there is a realisation that string interpolation can be an opt-in feature, among other reasons.
What you have is decision making analogous to the function
> With b'' and u'' string syntax coming into the language, there is a realisation that string interpolation can be an opt-in feature among other reasons.
This argument won't fly. Did you know that string interpolation in Perl and shell was always an opt-in feature? And despite that, it was frowned upon by the Python community.
> Previously it was always tempting to use string concatenation (concise, but obviously bad), now with pathlib the code is safe, concise, and readable.
This is the kind of feature that I'm wary to use even in scripts: questionable benefit, and probably too clever.
I'll challenge you: you shouldn't have many paths, you should keep them at a centralized location in the code, and there you could just use a normal function with a proper self-documenting name.
It's not like programs consist of path operations to a significant degree, so using fancy syntactic sugar doesn't seem like a worthwhile optimization to me. At all.
I don't have a machine available right now, but I wonder what happens if two adjacent path elements are integers? Does it perform division instead of path/string concatenation?
The Path object won't construct a path from an integer.
>>> from pathlib import Path
>>> p=Path(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/pathlib.py", line 979, in __new__
self = cls._from_parts(args, init=False)
File "/usr/lib/python3.6/pathlib.py", line 654, in _from_parts
drv, root, parts = self._parse_args(args)
File "/usr/lib/python3.6/pathlib.py", line 638, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not int
>>>
So what happens if the path components are numeric strings? They are treated like any other characters.
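Checking both cases (output abbreviated, Linux 3.6 shown):
>>> from pathlib import Path
>>> Path('some_dir') / '1'  # a numeric *string* is just another component
PosixPath('some_dir/1')
>>> Path('some_dir') / 1    # an actual int raises; no division happens
TypeError: expected str, bytes or os.PathLike object, not int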
Came here to validate my own "I moved" experience, and learned stuff I hadn't checked on. TL;DR: it's never too late to learn what you can do, once you can deprecate the past.
I think it's because the remaining Python 2.7 users are mostly people using the packages that were unsupported on 3, which afaik were mostly packages used in data science.
What do you mean? `A.dot(B)` and `A @ B` are the same thing for NumPy arrays. You might be mixing it up with the weirdness of `array` versus `matrix`, but that's totally separate.
Ah you're right, I had that mixed up. matmul should generally be the desired behavior anyway I'd think, the previous behavior of dot was a bit weird IMO, especially the behavior with a scalar. It's not backwards compatible, but I think it'd be better to prefer @/matmul in the future anyway.
I almost think it'd be nice to make matmul undefined for ranks higher than 2, since it's not really matrix multiplication and if you want to do that (or the previous behavior of dot) it can be achieved with einsum, with the advantage that you have to be a lot more explicit about what sort of tensor multiplication you want. That's probably a bit too purist though.
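A quick sanity check of the rank-2 case (assumes NumPy is installed):
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
assert (A @ B == A.dot(B)).all()  # identical for plain 2-D matrices
# They diverge elsewhere: A.dot(2) scales the array while A @ 2 raises,
# and for rank > 2, np.dot and np.matmul broadcast differently.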
I had so many issues trying to install python3 in an existing server that I ended up having to go back. pip kept complaining and it was just really annoying. Then some libraries were not compatible and it felt like it wasn't worth it.
Even if you don't, any OS-level package manager should easily install Python 3 alongside whatever the base install is, without any conflicts, as `python3` and pip as `pip3`.
Homebrew, apt, pacman, etc. all have one-line python3 installation.
Python allows overriding just about every operator. For pathlib they overrode the division operator to instead perform path joining in a platform-agnostic manner.
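The same trick in miniature; a toy class, not the real pathlib code:
class ToyPath:
    def __init__(self, *parts):
        self.parts = parts

    def __truediv__(self, other):  # invoked by the / operator
        return ToyPath(*self.parts, str(other))

    def __repr__(self):
        return '/'.join(self.parts)

print(ToyPath('usr') / 'local' / 'bin')  # usr/local/bin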
Syntactic support for annotating functions and exposing the annotations existed as of 3.0.
The 'typing' module in the standard library was new as of 3.5.
Syntactic support for annotating variables was new as of 3.6.
Support for delaying resolution of annotations is new in 3.7 with a __future__ import.
Originally the annotation feature was seen as a possible way to add type hints to Python, but other potential uses were envisioned and no immediate preference was given to types over other uses of annotations.
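Those milestones in one snippet (the __future__ line needs 3.7+):
from __future__ import annotations  # 3.7: postponed evaluation (PEP 563)
from typing import List             # 3.5: the typing module

count: int = 0                      # 3.6: variable annotations (PEP 526)

def greet(names: List[str]) -> str: # 3.0: function annotation syntax (PEP 3107)
    return ', '.join(names)

print(greet(['Ada', 'Grace']))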
I realized, just today, that the secrets module is new to 3.6 after trying to pip install it. This being provided directly by the language is a game changer, IMHO.
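For instance, stdlib only on 3.6+:
import secrets

print(secrets.token_hex(16))               # random hex string
print(secrets.token_urlsafe(16))           # URL-safe token
print(secrets.choice(['heads', 'tails']))  # CSPRNG-backed choice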