Hacker News new | past | comments | ask | show | jobs | submit login
The Elements of Python Style (github.com/amontalenti)
147 points by pixelmonkey on Jan 4, 2016 | hide | past | favorite | 61 comments



This guy got a bunch of things wrong:

> encode = lambda x: x.encode("utf-8", "ignore")

Why not rather call it s (short for str)?

> With tuple unpacking, using _ as a throwaway label is also OK.

Use __ (double underscore) instead so it won't clash with the gettext convention:

    from django.utils.translation import ugettext as _
> always use args and kwargs for variable argument lists

No, no, no, no, no! Why not clarify what are those accepted arguments?

What the following code is doing?

    def shuffle(*args):
        # do something
It shuffles SOMETHING.

And this?

    def shuffle(*cards):
        # do something
It shuffles CARDS. Big difference IMO.

> Use parens (...) for fluent APIs

I don't like that.

> Use implicit continuations in function calls

PEP8 suggest: "The preferred place to break around a binary operator is after the operator, not before it."

Following this logic, this should be:

    return set((key.lower(), val.lower()) for
                key, val in mapping.iteritems())
(for at the end of the first line)

> Rarely create your own exception types

> ... and when you must, don't make too many.

Sorry but that's plain stupid.

> if item vs if item is not None

Yes, you should absolutely care. Not all the times, but sometimes a "NULL" value is different than a filled, but empty value. See zero. It's falsy, but totally acceptable input...


I'm "this guy" who "got a bunch of things wrong" :)

Style guides are opinionated, and not everyone will agree. I intentionally made the style guide part of a Github repository so I could consider feedback from the community, and I've already incoporated plenty. As I mention below in the thread, I think you have a good point about "args" vs "cards" for your shuffle example.

Re: "The preferred place to break around a binary operator is after the operator, not before it", I think you mis-read this PEP8 suggestion. A binary operator is something like "and" or "+", not the "for" keyword inside a listcomp.

Re: your other disagreements, they amount to style choices of their own. When a style guide picks between two ways of doing things, there will inevitably be detractors, and I am OK with that.


If you don't use "if item" properly, it's bad code, not just style choice.


>No, no, no, no, no! Why not clarify what are those accepted arguments?

> What the following code is doing?

    def shuffle(args):  
        # do something
It's misusing the args - you shouldn't be using args/kwargs there in the first place - they should be used for forwarding and things like that - eg. you're writing a generic function decorator and things like that.

If you want a collection to operate on "cards" then pass cards as a collection, if you want to operate on abstract function parameters use args/kwargs


Yep.

If you want meaningful names then do it explicitly:

    def shuffle(*args):
        cards = args
Much less likely to catch someone out.


If that method only accepted Cards, that's more confusing and not something visible to a code hinter that will supply the method signature. `*cards` is more explicit when the expected parameter is a variable number of card parameters.


But it's not pythonic to care if something is a "Card" or not. All that matters is if looks and acts like a Card.


I've never agreed with that. Without a good definition of what exactly it means to "look and act like a Card", large projects can easily get several slightly different definitions of what a Card is like, and nobody knows exactly which methods to use where (so what exactly do I need to implement so my class is a "file-like object"?). Horrible.

I call it smurf typing. If it smurfs like a smurf, smurfs like a smurf and possibly smurfs like a smurf, we consider it a smurf. Good luck with that information.


I think it's valid to use a more specific parameter for vararg to indicate the type of parameters.

for example:

  def ping(*servers):
But please don't deviate from args and kwargs for the generic case (like kw or kargs), it ruins more finely tuned muscle memory.


These are good points. I can't speak for pixelmonkey, but I bet he would appreciate if you sent him a pull request with your feedback. I don't know how noisy the conversation about Python style guidelines is right now, but could be worth it.


I'd gladly accept a PR on the issue raised re: "args" vs "cards" for the shuffle function. When I wrote the guideline on using the names "args" and "kwargs", I was thinking more about cases where you have to proxy both, e.g.

    class SomeClass(SomeOtherClass):
        def __init__(self, *args, **kwargs):
            # some logic
            super(SomeClass, self).__init__(*args, **kwargs)


that's totally acceptable in this case.


> This guy got a bunch of things wrong:

> I don't like that.

Not sure how "not liking something" equates getting things wrong, but I agree with most of what the author mentioned.

Using 'x' or 's' for a short lambda makes no difference. 's' could be anything, like 'x'. Unless you're specifically using django, there's no point in renaming _. Agreed with you on the argument lists and exceptions.

Python doesn't have NULL, I think you meant None.


> Unless you're specifically using django, there's no point in renaming _

Consistency? Accidental overwriting gettext alias? Don't forget it's also the last result in the interactive interpreter...

> Python doesn't have NULL, I think you meant None.

I meant "NULL" (with quotes).


>> Use parens (...) for fluent APIs > I don't like that.

And I can't stand '/' line continuations and much prefer the paren style. Quite happy to agree to disagree but you did start your post with "This guy got a bunch of things wrong" - so you should maybe be stricter about the distinction between what you truly regard as 'wrong' and what you just respectfully disagree with?


you are right


> Use reST for docstrings

I would suggest also using the Napoleon [1] extension now included with recent versions of Sphinx. It allows you to use Google or Numpy-style reST formatting to describe parameters and types without all the javadoc-ish noise from :param:, :type:, and others.

[1] https://sphinxcontrib-napoleon.readthedocs.org/en/latest/


Thanks -- we are actually discussing this possible change in a Pull Request here: https://github.com/amontalenti/elements-of-python-style/pull...


> If the strict 79-character line length rule in flake8 bothers you, feel free to ignore or adjust that rule.

Thanks for remarking this. Too many people use flake8/pep8/pylint in commit hooks.

From the BDFL himself:

> "I personally hate with a vengeance tools named after style guide PEPs that claim to enforce the guidelines from those PEPs."

> "[stylechecker] tools' rigidity and simplicity reflects badly on the [style guide] PEPs, which try hard not to be rigid or simplistic"

> GvR

https://twitter.com/raymondh/status/683793667996303360

https://twitter.com/raymondh/status/683809696902332416


My big problem when reading Python code are "from" and "as" in imports.

To illustrate: I read a function doing a few calls in file, and I want to understand what happens in one of the calls (a name without any dot prefix). So I do forward search for the name in the current file (a bit silly when thinking of it, but forward slash is so close to my index finger; and in web browser the search is forward by default), and it only finds it in imports in the beginning of the file after "from". At this point I lose the context of the function I was originally reading, because I have seen other chunks of code which I try to understand a bit. Also I lose the position of the original function in the file. If I want to dive deeper to the calls, I look up the module in question.

If the name would be use with full namespace prefix, like

  import datetime
  datetime.datetime.utcnow()
^ I could immediately see which module it's coming from and go there. Straight up! I wouldn't need finger acrobatics, and it would be readable in e.g. github code viewer.

If you are really bothered by the length of the dot-prefixed names, why not do

  now = datetime.datetime.utcnow()
  print(now())
^ That's all clear, and most likely you will fit to 79 chars per line.

Namespaces are a good idea, and explicit over implicit, right? I understand that there are some conventions (like _ for gettext), but I see importing names without dot prefixes as killing readability.


Minor nitpick: I think you mean

   now = datetime.datetime.utcnow # no parentheses
More generally, I think the strongest argument for using "from" is when you're dealing with redundant "foo.foo.bar" names. Would you really be confused at seeing "datetime.utcnow()" instead of "datetime.datetime.utcnow()"?


To me, this sounds like a failure of your tooling rather than the code you're reading. Vim with Jedi, PyCharm and PyDev can all jump straight to definitions. What's your development environment?

I'll use `from <module> import <identifiers>` when the identifiers' names express their purpose, and aren't dependent on the module's name. Importing identifiers directly makes the code using them less noisy [0], and makes it possible to replace the source module for the identifiers later on. In Django projects, I'll often need to rename modules or move identifiers between modules, and so using `from` imports makes that a lot easier.

I agree that `as` imports should be used sparingly.

[0]: your suggested solution is even more noisy than using the qualified name, because now you have a variable hanging around and readers have to work out if `now` is going to be reassigned or used later.


Good piece.

A (minor) nitpick.

On section titled '''Prefer "pure" functions and generators''' two "dedupe" functions are shown. They are stated to be the same but aren't. The first function returns the number of duplicated elements excluded (an int). The second function returns a set of excluding duplicated elements. It's true (in the first function) that the original list "items" will be modified to exclude the duplicated elements but an additional unshown step is needed in the second to determine the number... i.e len(items) - len(set(items)).


I agree. I opened PR #19 to address this:

https://github.com/amontalenti/elements-of-python-style/pull...

Would love to hear what you think of the changes.


I've always been curious about docstring style. I've usually gone with numpy-style docstrings:

https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMEN...

They tend to be easier to read than reST, and happen to be used by the packages I use most (numpy, scipy). They are also easily worked into sphynx/napoleon.

2 questions:

1.) Why use reST instead?

2.) Why isn't there a python-wide standard or best practices as is found for Scala/Java/others?


I use Sphinx style because it gives me better auto-completion via Jedi[1][2]. Otherwise I really like the look of Google style docstrings[3]. There's some ongoing work[4][5] to get support for that in Jedi as well.

[1] https://github.com/davidhalter/jedi [2] https://github.com/davidhalter/jedi-vim [3] http://sphinxcontrib-napoleon.readthedocs.org [4] https://github.com/davidhalter/jedi/issues/504 [5] https://github.com/davidhalter/jedi/pull/617


From [1], the "Docstring Standard" section of your link, "Our docstring standard uses re-structured text (reST) syntax". It seems like they use reST?

1 - https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMEN...


That's referring to the syntax in the docstring format they developed is only supported by reST, render by Sphinx's integration with reST. For exmaple,

    .. image:: filename 
You don't get that in Markdown.


> The biggest offender here is the bare except: pass clause. Never use these. Suppressing exceptions is simply dangerous.

This doesn't apply if you're using exceptions as control flow, which is idiomatic and recommended Python behavior.

Let's say you want to make sure some file doesn't exist before you start doing something. Specifically, you want to wipe it out if it exists, and you don't care if it doesn't exist. You do:

    if os.path.exists(temp_file):
        os.remove(temp_file)
This isn't idiomatic Python behavior. Instead, you want to use exceptions:

    try:
        os.remove(temp_file)
    except OSError:
        pass
Mind you, an actual bare "except:" (without specifying an exception type) is still bad, because that'll catch things you don't want being caught, like a SystemExit or KeyboardInterrupt, but it's perfectly acceptable to silently pass in this case.


Even better (with Python 3.5 or newer):

    from contextlib import suppress

    with suppress(OSError):
        os.remove(temp_file)
More generally, context managers are the right way to refactor repeated try/except/finally clauses.


Well, I have yet another reason to migrate from 2 to 3 now.


Note that FileNotFoundError is preferred in modern Python instead of OSError. In case of old Python, one should narrow the catching to cover only the wanted case:

    import os, errno

    try:
        os.remove(temp_file)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise


I can see why this is the preferred idiom.


Your example proves his point.

Suppose that permissions for the directory that the file is in got screwed up. (Or you're working in a new environment and they weren't set up right.) Your version gives no indication that anything went wrong. The version you dislike gives a helpful error message that says exactly what went wrong.

That error message can save a LOT of debugging time. Please make sure it is there.


However, the first version has a race condition. The file might not exist between the if-statement check and the call to os.remove.


That sort of issue is why the best practice is to use careful routines like https://docs.python.org/2/library/tempfile.html for temp files.

But in practice on a trusted server, this kind of race condition doesn't actually happen. And if you have someone untrusted on your server, you already have a bigger problem than what you're fixing by getting the race condition right.

By contrast the kind of permission problem that I mentioned is common, and I've spent too many hours debugging such issues in code other people wrote. (Ever tried to run old code on a new server that someone else tried to set up? You'd be amazed how many hidden dependencies will turn up...)

That is why I consider the race condition to be less important than the exception swallowing anti-pattern.


I never silently pass on exceptions any more as I've been bitten by way too many edge cases. You could imagine the case where your example is run by a different user to the owner of temp_file and fails to remove it due to permission issues. In my opinion you should at least log some sort of message here.


This still doesn't solve the problem, though, because you still have the race condition that the exception case is supposed to be fixing.

There's probably no good solution to this in this case.


Did you mean that:

    try:
        os.remove(temp_file)
    except OSError:
        pass
Has a race condition? Care to explain?


Suppose you have the if-then case. You might end up with the file on disk anyway: maybe os.path.exists says false, and it skips the remove, and so some other process sneaks in and creates the file behind your back. By the time you go to do whatever it is that requires this file not to exist, that file is there again.

Suppose then you decide to go for the try-except version. You still have the same problem. At some point between os.remove, and whatever you're going to do, this other process could leap in and create the file.

Some types of race condition can be solved by simply trying the operation required, rather than checking first and then trying only if it seems necessary, but I don't think this situation is one of them.


I see what you mean. I wouldn't call it a race condition in that code though. It may be a race condition in the system, but this code fragment on its own is race free - if there's a file, it will be removed. The rest of the system is not present and we can only see the correct-or-not way of file removal. (raising exception or not)

Otherwise we'll end up with "a = 1" being a race condition, because it may happen to map to a device memory used by many applications.


> With tuple unpacking, using _ as a throwaway label is also OK.

I see this sometimes, but the problem is that this breaks internationalization using gettext, commonly imported under the name _:

   from gettext import gettext as _
(For those who’ve never seen it, “_” is then used as a function which wraps strings, like _("Book"); translation tools can then parse the source code, deduce that “Book” is a term which needs translating, and at runtime the “_” function can return the translated term instead of the original string when a different language is needed. The name “_” for this functionality predates Python and is a common feature across most programming language using the gettext method of internationalization. It is therefore unfortunate that some Python users have taken “_” to be a kind of throwaway variable.)


I abhor this practice, frankly. It doesn't take that much more work to type gettext("The translating string"), or even t("string"). The purpose of "blank variable" is better served by _ than translations are.

And gettext the tool can pull out the terms requiring translation using any identifier; you don't have to use "_", and I believe "gettext" is supported automatically. Also, depending on what you're doing, a simple function might not be enough; the _("foo") stuff was intended to simply use the locale of the local machine, which is almost certainly wrong in certain environments, such as server-side programming. (Where we need to bind in things like the user's preferences, or the Accept-Language header on the current request to the call to the translation function, so _("foo") is really something like _("foo", request), though with a bit of magic we can get back to a _("foo") syntax, though again, I prefer the explicitness of gettext("foo") or translate("foo").)


This is true, but any decent IDE (I use PyCharm by Jetbrains) will hint you are overrinding a global variable.

I like to use the underscore as a throwaway because it is explicit - it is far more common for me to override other common throwaway variable names like "i", "j" or "x" that are also often used as loop counters.


Do those editors use the project's builtin namespace? gettext.install sets __builtins__._ to the translation function so that individual modules don't have to import it.


Yep, there's an option "Collect run-time types information for code insight" (not sure if enabled by default). JetBrains IDEs are smart as hell.


I thought it was mostly imported as __ (double underscore).


No, I’ve never seen that. The normal single-underscore name is used in the Python documentation:

https://docs.python.org/3.6/library/gettext.html


You're right, I was probably thinking of some other library/framework/language.


I agree with a lot of the criticisms. In addition,

1. The prefixed underscore affects exports if you're providing an API in a package - see the standard library for lots of examples of prepended underscores.

2. Don't use lambdas where a regular function definition works.

3. Only use args and *kwargs where these args are completely generic, otherwise be specific as to the semantics.

4. Complex list comps should be rewritten to append to lists.

5. Multiline condition statements should use indentation other than 4 spaces (2 or 8, for example).

6. Don't use `if foo` when you mean `if foo is not None`.

Good luck with your style guide, I wish more people would make good style important to them.


Most of it is pretty obvious. However, what I am still fighting myself on with Python is a project layout/module structure.

In languages with C++-like OOP model (Java, PHP) it's most of the time reasonable to settle on "1 file, 1 class" and to use directory structure the same way you use namespaces. Actually, in some such languages it's enforced.

For PHP it still leaves a problem with where you shoud put/expect to find anything beside classes, but it's fine, more or less. I hope some day we'll decide even more generally what works best, but for now it's decided per project/framework and it's mostly bearable.

For Ruby it's mostly the same as for PHP, even though the language is a bit more flexible. But it's still normal to use class as the main entity, so pretty much the same guidelines/problems apply.

For Python… I cannot decide yet, and what's worse, looking at the code on, say, Github, it doesn't seem like it's less of a problem for many others. Because some of "solutions" are seriously horrible. No project structure whatsoever, tens of classes in the same file. Classes, functions and even app-specific code (like GUI, CLI or HTTP-routing) all in the same file. Multi-thousand LOC files are considered completely normal. Ridiculous amount of code in __init__.py, that completely re-arranges submodule structure or even defines a bunch of new code. Hell, I've seen projects (more than 1) almost entirely written in the __init__.py! Please don't say you think it's normal.

I would say that things remind of "project-structure" in C, but (it's scary to suddenly realize that) it would actually be a compliment for Python, because even though large files are considered normal as well, with modern practices of writing C-code and all that "C-lang OOP" model stuff it's mostly clear where you should put something. With Python I'm never sure. The only way to find something is to use grep.

With things like that in Python, it's even no use to speak of a "project structure" in the more general sense, aside from what's enforced by the frameworks like Django.

So, any suggestions?


In PHP you create classes for everything because there is no other encapsulating scope. In that circumstance, with the class loader hook the language gives you, one class per file is a clear win.

In Python, with module-level scope, it makes sense to group like-functionality in the same module (aka file). Maybe it only seems "wrong" to you because it's different than other languages you're more familiar with?


Yes, it makes sense "to group like-functionality in the same module", but that seems to be the very problem. Because I already describes what it leads to, and compared with state of things in languages with more rigid code structure I cannot say it satisfies me.

And, just to clarify, I consider myself quite familiar with Python, because it's probably my first language I'm still using a lot, even though the language itself changed a lot during this time. That's about 10 years. And I am still never sure where I'm supposed to put anything.

When 1 file contains everything there is to say about `time` — it will contain a lot, because there really is a lot to say about time. It's much easier to move around the codebase, when there are many files with clear semantic scope. Say, there's a module Time with a file (and a class) Timezone in it. Then, even though the class Timezone may be 4000 lines long — it doesn't bother me. If I assume that there's no crazy person working on that project, that would write something about product energy value in the class named "Timezone" — there's pretty much zero cognitive load of keeping that file in my text editor. Because I know, that even though 4000 LOC is quite a lot — there's nothing else in that file. There's no "before" and "after" that class. If I am not finding something here — it isn't here.

And the best thing is, that the directory structure can be quite nested, but feel pretty flat. The directory tree is completely semantic, there's no 1 directory with 200 files in that, there's clear meaning in that: src/Time/Timezone, src/Time/Datetime, src/Food/Components/Protein, src/Food/Components/Fat. I can execute `tree -f` in the console, and I already know a lot about what I can expect to find in the codebase. Discoverability is important.

It is also easier to review commits in CVS with a project structure like that. I can look only at files changed and I know, that somebody changed something in Timezone class, not just something "time-related".

With python it's a great deal harder to follow project structure like that. If everything can be grouped by file based just on "related-ness" principle — files can grow huge and I have to apply constant, significant effort to notice that something can become a sub-module, not just a file with bunch of functionality. And looking at the file with the same 4000 LOC becomes way harder: I don't know what else there is in that class, judging only by filename, assumption of developers being more or less sane and even the piece of code I'm looking at right now. There can be many things "related to time" before or after that piece of code. 4000 suddenly seems much larger than before, and the cognitive load now is quite significant.

That's why I'm looking for some "project-structure style guide" for Python. There needs to be some set of simple rules to follow, so that python project would not become the mess it always becomes. If we think it's reasonable to have a style-guide about "backslash vs parens in multi-line expressions" it's unreasonable to assume that project structure will grow just fine all by itself. Well, it contradicts my observations.


Python gives you packages for domains as large as "time". You would create a directory (for the uninitiated: with an empty __init__.py file in it) and now that directory is a package. The files within it are modules. There would be a time package with a timezone module within. That module would have the Timezone class.

The language has a feature -- a second level of scope -- that does need to be managed if used. You're right about that and it's a legitimate concern I suppose if it affects your throughput.

I suppose to me it's a win because my Python code has fewer classes than say my php code because I'm able to make use of module scope to provide encapsulation for things that don't need to be a class.


I would suggest working on how to improve your Python tooling; Vim with Jedi, or PyCharm are both great options.

I think Python's module semantics are harder to use and understand, but when you know how to wield them (or reading the code of someone who does) they serve you far better than other systems. Python's module semantics are closer to Java's (explicit packages) rather than Ruby or JavaScript (global namespace), but unlike Java, Python's modules are derived from the file structure and not `package` statements.

The consequence of this is that most Python developers throw everything that should be in the same module into the same file, because that's "easiest". It stops being easy when the file grows large.

What you can do to address the problem of large modules in a single file is to turn the module into a directory of the same name, create an `__init__.py`, then put separate files in that directory to hold sections of the module (e.g. a file for each class if that's appropriate), and then in the `__init__.py` import the classes and whatever else you want to export from the module. Done - and client code needn't change.

Python's module system (and namespace system in general) is one of the largest reasons I prefer it over other dynamic languages. Knowing where every identifier came from in a file makes it so much easier to learn a codebase.


I can't take a guide seriously when things like this are sprinkled all over it:

    return [x
         for x in items
         if x.endswith(".py")]

    if (response and
        "data" in response and
        response["data"]):
        return response["data"]
Just use 2 lines instead of using valuable horizontal lines, and make the code harder to read.

ex:

    pyfiles = [x for x in items if x.endswith(".py")]
    return pyfiles


It's a fair point, but the style guide is meant to be illustrative.

The former example is indented the way longer lines might very well be e.g. if your filter clause were a 2-part bool, you might not be able to fit it in a one-liner. The indentation also clarifies the difference between it and the imperative example above it.

The latter example is a rewrite of code written with 2 nested if statements, so the indentation serves to show that now the two nested statements were incorporated into the bool evaluation. If I made it a one-liner, someone might make the argument that the nested version is "more readable", but as written all you can say is that the single-if version is "less nested", which is the point I was trying to get across.


At first read this is a pretty reasonable doc for some things to consider. I also recommend watching the linked "PEP8" YouTube presentation from PyCON: https://www.youtube.com/watch?v=wf-BqAjZb8M

From past projects, I got way more adherance-to-the-letter type patches versus "make this code pythonic and elegant" type patches, and the latter IMHO is more of the point of where things should go.

It's a bit of a long/slow talk, so if you find yourself getting bored, switch it up to 2x speed on youtube and I think many of you will like it.

My personal PEP8 ignore list was:

-pep8 -r --ignore=E501,E221,W291,W391,E302,E251,E203,W293,E231,E303,E201,E225,E261,E241 lib/ bin/

Which is quite a lot of it :)

Codebases are easier to understand when everybody's consistent within the codebase, but things need to be idiomatic well beyond PEP8 and it's easy to have a forrest vs trees scenario.


>But another good example is rewriting an if/else chain as a dictionary lookup or repetitive code as a tuple of operations followed by a for loop.

I'm familiar with the dictionary lookup method but not the tuple. Perhaps it's obvious but could someone please give an example?


All great tips, except I still feel could write docstings in markdown.

Is there a way to use Sphinx but with markdown?


The grammar capos of python are back.

Can we get rid of the stupid indent vs tab stuff and have errors visible on screen without black magic?

Oh and maybe why not braces to really control variable lifetime?

    from __future__ import braces
    SyntaxError: not a chance
Ah ah what a joke.

Just remember that use strict exists in Perl and in python you have

    dyslexia=0
    if True:
        dylesxcia=1
will raise no error.

The strawman fallacie of python purity of style




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: