Nice course. Looks like good, practical material for people who want to dive in.
Side note, this example made me pause and be grateful that I'm no longer professionally writing in python as much as I used to be:
@shares.setter
def shares(self, value):
    if not isinstance(value, int):
        raise TypeError('Expected int')
    self._shares = value
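For anyone who wants to run that snippet, here's a minimal self-contained version. The surrounding `Stock` class, its `name` and `price` attributes, and the sample values are assumptions; the comment only shows the setter.

```python
class Stock:
    """Hypothetical holding whose share count must be an int."""

    def __init__(self, name: str, shares: int, price: float) -> None:
        self.name = name
        self.shares = shares  # goes through the property setter below
        self.price = price

    @property
    def shares(self) -> int:
        return self._shares

    @shares.setter
    def shares(self, value):
        # Manual runtime check: reject anything that is not an int
        if not isinstance(value, int):
            raise TypeError('Expected int')
        self._shares = value


s = Stock('GOOG', 100, 490.1)
s.shares = 75        # accepted
# s.shares = '75'    # would raise TypeError: Expected int
```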
I programmed python for about 6 years. Now I've switched to Java and Go and really don't miss having to type-check things manually. Static-typed languages have too many benefits right off the bat. You can (mostly) avoid all of these types of checks.
I still use python if I'm doing prototyping, exploring APIs or ideas, etc. But I try not to roll it into production unless it's an inconsequential program.
I’m in the midst of a Python project right now. I’m fairly familiar with Python but I haven’t used it in a while, preferring TS. I’m currently majorly kicking myself for not starting out with mypy. Unfortunately, even with mypy, the soundness guarantees are nowhere near as strong as TS's, and the typing ecosystem is lacking (there's no DefinitelyTyped equivalent that I know of), so working with libraries is pretty much just guesswork. I really have no clue why people love this language so much.
In my experience, the people who love Python are those who are happy with dynamic typing - usually because they only use the language for small projects.
When starting a large project, it would be better to skip Python altogether and use a proper statically-typed language.
Type hints only make sense when you’re working on an existing large Python project - just as Guido was doing at Dropbox when he added them to the language. This is because type annotations are the worst-of-both-worlds - they require the verbosity of static typing and provide few of its benefits.
> This is not a course for software engineers on how to write or maintain a one-million line Python application. I don't write programs like that, nor do most companies who use Python, and neither should you.
If you already know your project is going to be as big as Dropbox's entire codebase, salut don Corleone!
But otherwise, I feel like trying to choose the language that scales to a million lines dooms the project from the start through over-engineering. I attempted that myself but kept coming back to Python because it's fast to code, it's forgiving as hell and it's customizable. Once you're used to its quirks you can breeze through it. I'm sure Ruby and JS offer the same, and with TS you get better typing, but with the typing you can do in Python 3.8, I'll argue that Python gives the _best_ of both worlds, especially when paired with a good IDE like PyCharm. If you really want, you can also incorporate pytype or something similar from the get-go. If my greenfield project's biggest problem is that Python doesn't scale, then I'm a very happy man.
Python doesn’t scale well beyond one file, one developer, one time write. Once a program needs maintaining, updating, expanding, debugging - the experience quickly deteriorates.
The codebase I work on now is python and I prefer it to both the java and scala codebases I've worked on before that were much smaller. Python gives a better debugging experience than java and a better updating experience than scala (especially someone else's scala) in my opinion.
I guess I should ask what you consider a small project. I have some five-digit LOC projects written entirely in Python. I have never noticed this typing issue everyone is very concerned about and am often baffled by the emphasis placed upon it. What am I missing?
I am willing to entertain the idea that I have been very lucky or that I have some programming mannerism which has caused me to skate by this kind of thing, but I just don't get it.
It is not impossible to create a large program in a dynamic language. It simply requires extreme discipline.
At the same time, it is not impossible to write a large program entirely in assembly language, it just requires extreme discipline, even more so than python.
It's just a question of degree.
But when you DO get a type problem, and in any sizable code base, you WILL, it only manifests AT RUNTIME, and possibly at a customer site or in production, rather than manifesting at compile time.
It’s certainly possible to write untyped code and have it function, but you’re sacrificing development velocity by requiring both you and future maintainers to manually do all the things that a type checker can do automatically.
Perhaps I have not run into this because I tend to avoid situations where anyone within two degrees of me uses "velocity" to describe a project.
The only times I have heard from people who have worked on projects after me is the odd phone call to thank me for the documentation and the clarity of code, just as an intro to ask me a "what should we do about this?" or some kind of obscure historical follow-up.
Case in point, I had written an apartment search website in Perl. Now, at this point I had been pretty disenchanted with Perl as a whole due to the "there's more than one way to do it" culture combining with the "executable line noise" syntax to give rise to a lot of very impenetrable write-once, read-never-again code from others. So, when I did this, I used real, appropriately-named variables, eschewing the convenience of $_, and I made sure each line of code did one thing and only one thing if at all possible. I did not pack a lot into a line. Each line should be obvious. Where I felt that it might be subject to interpretation, I added comments as to what I was doing and why. Each function had its own comment section which discussed what it was for and why, possible room for improvement, and so on.
Eventually the duty passed to another set of hands, a bunch of students who had never seen Perl (the rise of PHP was strong at that point) and in the follow-up, they mentioned that they found it very straightforward to simply re-write each line.
I heard back in the past few months about one of those very large projects I did in a previous job, just keeping touch with people. I asked about one of my personal pet projects. "No, it just keeps running." "It's very obvious where to make any changes." All Python. No type stuff.
It's unquestionably more for short scripts - what I mostly do - instead of big applications. My memory is of learning Java, where you're all but forced to create custom classes for everything, just in case you need extensibility later. With Python? It can .quack(), and that's all I should care about.
I understand generics have helped here, but they still don't seem quite there. And ironically, I'm finding ML tasks in python to be something that could really benefit from type hints.
Side note, and this bugs me: if people love it - empirically, they do - and you don't understand why - surely that should bring about some introspection? It seems all-too-often to bring about the opposite reflex.
Badly indented python code doesn't run, which it shouldn't - my understanding and the machine's are different. Well-indented python code has a lot less visual noise than other languages. That alone should give pause for thought. Why would you not want that feature?
Your comparison seems entirely about Python vs. Java. I encourage you to check out some modern gradually typed languages to understand what Python lacks. TypeScript is particularly good for this.
> Why would you not want that feature?
Easy. Braces/etc. become basically invisible after using the language for a bit so they don’t bother me whatsoever. However moving code around in Python is always a pain because you must make sure everything is placed at the correct indentation level rather than simply letting the formatter take care of it. I’ve definitely experienced bugs from moved code having its first/last line at the wrong level, and it can be particularly confusing when the last line is at a different indentation than the previous. It’s so much easier to just grab a brace-enclosed block and smack it down somewhere (which also provides a good sanity check that you’ve yanked the entire block and aren’t missing any lines).
> Badly indented python code doesn't run
I wish this were the case. What’s actually the case is that badly indented python code will give you potentially different results than what you expect, which ranges from syntax errors to failed tests to very hard to diagnose bugs.
> Side note, and this bugs me: if people love it - empirically, they do - and you don't understand why - surely that should bring about some introspection?
Side note, and this bugs me: if people hate it - empirically, they do - and you don't understand why - surely that should bring about some introspection?
"Invisible" syntax is precisely what I don't want. That's practically the definition of a bug!
I know I set myself up for the last line. I can only say I've really made a good faith effort to try to understand the explanations for braces, and none make any sense to me. I've had nearly ten years writing more or less python alongside braced languages (I agree it's not a massive pain), and outside of the REPL - I have literally never seen an IndentationError (err.. that's the same as "doesn't run" to me), or "hard to diagnose bugs". Almost never a TypeError, either. Maybe a dozen or so?
I need to understand, from source code, how the instructions flow. For that, I and essentially all other humans need indentation. I never want the machine to interpret the instruction flow differently to me. I genuinely cannot understand how someone can fail to produce python code to that standard. I trivially can in any language which includes syntax specifically for giving a machine a separate understanding of program flow to humans.
Because no matter what problem you have, after 12 minutes of googling, you can pip install foo. The language almost doesn't matter because most programming is via library api.
I can do the same with npm, but if there’s a @types available (or better yet built in), I barely have to read long-form documentation and I get smart completions with documentation, types, etc right in my editor, specific to the exact expression I’m editing.
And...exactly the same with mypy, either via in-library typings or the typeshed.
Projects in the npm ecosystem may be somewhat more likely to have typings if the project exists, but I find that, for anything other than web frontend, where JS is obviously king, the tool I'm looking for is more likely to actually exist in the Python ecosystem.
There are a couple dozen packages on there. This can’t be described as “exactly the same” as npm/TypeScript if the scale and level of community involvement [1] is nowhere near the same.
1: among the most important things here, in addition to the strength of the type system, where mypy also lacks
People love the language because the language doesn't matter?
I don't think that quite explains it. We loved Python at version 1, before pip and PyPI were around - although the "batteries included" standard library played a similar role to some extent. For me it's mostly the syntax and the simplicity.
> I really have no clue why people love this language so much.
You get used to it somehow. I did C and Java for about 10 years before Python. My first exposure to Python was a Computational Physics class and it was maddening not knowing what types went where. Eventually you realize it doesn't really matter as long as you're disciplined and rely on others who are disciplined enough to document their code well. I haven't had the chance to use mypy in the 7 years I've been using Python (across 3 industries; medical devices, scientific devices, and web) but I would be surprised if it significantly reduced the number of bugs I've seen in practice.
It’s certainly possible to write out types in English and cross your fingers and hope they’re correct and stay correct as the file gets edited by more and more people, but why do that when you can do it in a language that an intelligent type checker can understand and validate, and further use to provide smart editor tab-completions/etc?
This is especially big for any refactoring work - just yesterday I had to change the format of a config file that gets read by both TS and Python scripts. For ts, I updated the typedef and I instantly got every error in the codebase annotated. For python, I had to manually go through each line, using my own brain as the type checker. My brain is much faster and better at being a “squiggly red line spotter” than a type checker, so I finished my TS work in a few minutes and took probably an hour going through the python, despite the projects being similar in size.
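A rough Python counterpart to the TS typedef is a `TypedDict` describing the config; after a schema change, a checker such as mypy can then flag stale key accesses instead of leaving them for manual review. The field names below are hypothetical, and note that a `TypedDict` is only checked statically, not validated at runtime.

```python
import json
from typing import List, TypedDict


class Config(TypedDict):
    """Hypothetical schema for the shared config file."""
    name: str
    retries: int
    hosts: List[str]  # e.g. changed from a single `host: str`


def load_config(text: str) -> Config:
    # json.loads returns Any; the annotation tells the type checker what
    # shape we expect, but nothing is validated at runtime here.
    return json.loads(text)


def first_host(cfg: Config) -> str:
    # If this still read cfg["host"], mypy would report that the
    # TypedDict has no such key.
    return cfg["hosts"][0]


cfg = load_config('{"name": "svc", "retries": 3, "hosts": ["a", "b"]}')
print(first_host(cfg))
```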
"I have to write a really small script... ok, not so small that I can do it with bash. I know, I'll use Python! This is going to be a small, throwaway utility anyway"
(Some weeks later, it's turned into a large monstrosity and I need to refactor something, and everything breaks because refactoring anything nontrivial written in Python is a dangerous proposition)
I mean, you need to write tests before you refactor, but that's true in all languages.
I do agree (and have the scars to prove it) that this isn't a panacea but it does help, a lot.
That being said, one of the worst bugs I had in Python was in passing in the wrong types to a constructor function, and my tests didn't catch that. To be fair, mypy would have, but I hadn't annotated that part. At the end of that debugging situation, I would have gladly killed someone for enforced static types in Python.
That being said, the data model is a work of art, and a core reason why I enjoy coding in Python. It's just a shame that pandas kinda sucks.
I find the problem with writing tests for Python is twofold:
- Most Python programs I write start their life as tiny scripts, always with the certainty they'll never grow (Narrator: they always grow). I don't know many people who write tests for their scripts...
- Testing in Python means too much effort in the wrong places. Consider that most refactoring problems would have been caught for free by a language with static typing. I've experienced runtime errors because I changed the return type of a function from a single value to a list (or vice versa). A statically typed language would have let me know of my mistake for free, so that I could focus on tests that really matter.
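A small sketch of that single-value-to-list refactor, with hypothetical function names. With annotations in place, mypy reports the stale caller (roughly: `"List[str]" has no attribute "upper"`) before anything runs; without a checker, it only surfaces as an `AttributeError` when that line executes.

```python
from typing import List


def get_tags(item_id: int) -> List[str]:
    """Hypothetical helper; it used to return a single str."""
    return [f"tag-{item_id}"]


def shout_stale(item_id: int) -> str:
    # Stale caller written for the old str return type. A type checker
    # flags this line; plain Python only fails when it actually runs.
    return get_tags(item_id).upper()


def shout(item_id: int) -> str:
    # Updated caller that matches the new return type.
    return ", ".join(tag.upper() for tag in get_tags(item_id))


print(shout(7))
```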
Are types a free lunch, or are there trade-offs? What are you getting and what are you losing by adopting types?
How do the benefits you get, relate to the difficulty of problems you’re solving when programming?
Eg if the cost of types is increased coupling of code but the trade off benefit is it makes fixing typos easy then that’s a bad trade. You’ve made a hard problem more brittle in exchange for making a trivial problem even easier.
Could you give an example of how code could be coupled at the type level, but not logically coupled? It seems impossible to me.
> trivial problem
Type checking is quite non-trivial, especially as the logic and types get more advanced (conditional types, index types, mapped types, etc etc). Not just “typos”.
>> coupled at the type level, but not logically coupled?
float multiply(float a, float b) { return a * b; }
You can’t multiply doubles with this code because it’s coupled to the float type. Summing a for b times has no logical coupling to floats or doubles just as floats or doubles have no logical coupling to either implementation of addition - both types are added by the same operations. You can swap the types or the algorithm.
Before you reach for the polymorphism or further pollute the universe of types, consider that it costs fewer lines of code to erase types here.
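For comparison, here's roughly what the polymorphic route looks like in Python: a constrained `TypeVar` lets one annotated `multiply` accept either `int` or `float` while still being checkable. This is a sketch for illustration, not the commenter's code, and the name `Num` is arbitrary.

```python
from typing import TypeVar

# Num may be int or float; both arguments and the result share one type.
Num = TypeVar("Num", int, float)


def multiply(a: Num, b: Num) -> Num:
    return a * b


print(multiply(6, 7))      # 42
print(multiply(1.5, 2.0))  # 3.0
```

Note also that PEP 484 lets an `int` be passed where `float` is annotated, so even a plain `float` signature is somewhat less rigid in Python than in C.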
What errors would preserving types help with here? What has more utility: fewer lines of code spent for the same outcomes, or an implementation that sits further along the typing spectrum in the “at compile time” direction? I don’t believe that question can be answered, but in practice I find that fewer lines of code correlate more strongly with outcomes I care about than the degree of typing applied.
> What errors would preserving types help with here?
Pretty simple, it helps when you do a refactoring that changes the type of the value passed somewhere from number to number[] and the compiler instantly tells you that that’s illegal.
I did something similar just the other day when I had a shared config file read by both TS scripts and Python scripts and I needed to change the format to support a new feature. On the ts side I updated the typedef and the compiler pointed out to me every area that needed updating, on the Python side I had to spend a good chunk of time manually going through the script to figure out what needed updating.
>> it helps when you do a refactoring that changes the type of the value passed somewhere from number to number[]
Typing isn't a strong enough tool to combat that class of problem; you can fix the type signatures while introducing a semantic break. You need tests and if you have tests, what does typing bring? Keep your tests fast, measure and maintain a speed of >250 tests per second and remember to test behaviour not implementation - you won't go far wrong.
IDEs have been performing refactoring changes on our behalf in dynamic languages for years at this point. Don't edit manually; we've never had such powerful tooling available to us.
Even concepts that I'd say are strictly the domain of static typing, like automatically pruning dead code behind retired feature flags, are making headway today: https://github.com/jendrikseipp/vulture
Typing has a time and a place, without a doubt, but the world is better viewed without the static-typing lens permanently affixed; your efficacy will increase.
Testing isn’t solely about whether code works today; you’re unlikely to check in something completely broken. In fact, I advocate NO tests for throwaway code: move fast.
It’s whether it continues to work tomorrow after you’ve added or removed a behaviour. Tests are the secret sauce that keep the cost of change low.
Refactoring (a commit with zero changes to test code) is the other key pillar of long term codebase health.
> Static-typed languages have too many benefits right off the bat. You can (mostly) avoid all of these types of checks.
Python’s major typecheckers, which add features faster than the type systems of most major statically-typed languages, already support a more robust type system than many industrially-popular statically-typed languages.
It's not Haskell, or even Scala, but then neither is Go or Java.
Now, obviously, you don't get the performance benefits of type-informed static compilation with Python, but performance isn't the issue most people seem to be discussing here.
A large majority of projects out there are not using type hints. You'd have to roll your own type hints. And sometimes, the type hints are impossible to upstream:
For example, let's say you use Mongoengine. Now, you can query a collection by using MyObject.objects.get(...) and it will return a MyObject. You might be able to make that work in Mypy. Now, you use some fancy aggregation feature, passing a dict into Mongoengine and it changes the type of the result you get back. How will you typehint that? I think the only way would be to special case it for all your queries.
Dynamic languages have 'reflection' all over the place. Type checkers are designed to terminate quickly, and if they don't, you are hacking them and they perform terribly (the Game of Life in C++ templates and such).
So even if performance is not the goal, you can't even get the correctness property right.
In Haskell, there are libraries that will generate code from schemas, including migrations. They are typed to various degrees. There are also shallow embeddings of the PostgreSQL query language (with types for the returned type, unlike the Mongoengine example before!). I just want to demonstrate that in Haskell, typing is not all-or-nothing either, but it is a spectrum, where a Haskell lib will typically end up being more typed than a Python lib. You could 'simplify' their type signatures (get dynamic typing) with GHC.Generic and you'd get something like what is common practice with Python. But it is pretty much impossible to go the other way.
> For example, let's say you use Mongoengine. Now, you can query a collection by using MyObject.objects.get(...) and it will return a MyObject. You might be able to make that work in Mypy. Now, you use some fancy aggregation feature, passing a dict into Mongoengine and it changes the type of the result you get back. How will you typehint that? I think the only way would be to special case it for all your queries.
Isn't this the case any time you use a DSL to query external data? Only sometimes has someone else type-hinted things for you.
I work in a strongly typed language (F#) that interops with JS, and one of our principles is to do the type checking on the F#--JS boundary and then not have to worry about it again.
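A rough Python analogue of "check at the boundary, then stop worrying": parse untyped JSON into a dataclass once, at the edge, and let the rest of the code take the typed object. The `Order` shape and `parse_order` name are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Order:
    """Hypothetical record that downstream code can trust."""
    symbol: str
    shares: int


def parse_order(raw: str) -> Order:
    # All runtime validation happens here, at the boundary.
    data: Any = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    symbol, shares = data.get("symbol"), data.get("shares")
    if not isinstance(symbol, str):
        raise ValueError("symbol must be a string")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(shares, int) or isinstance(shares, bool):
        raise ValueError("shares must be an integer")
    return Order(symbol, shares)


order = parse_order('{"symbol": "AAPL", "shares": 10}')
# Code that receives an Order needs no further isinstance checks.
```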
For that specific contrived scenario (Mongoengine, which I've never used, apparently uses .aggregate for the latter rather than overriding .get), you'd write two stub @overloads of [...].get for type hinting: one covering what it takes and gives in the aggregate case of interest, and one for what it takes and gives in the simple case.
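A minimal sketch of the two-overload idea, using a hypothetical `Collection.get` rather than Mongoengine's actual API: called with no pipeline it returns a model instance, called with a pipeline (a list of dicts) it returns raw dicts, and a checker picks the matching return type per call site.

```python
from typing import Any, Dict, List, Optional, Union, overload

Row = Dict[str, Any]


class MyObject:
    """Hypothetical document model."""

    def __init__(self, name: str) -> None:
        self.name = name


class Collection:
    """Hypothetical query API, not Mongoengine's real interface."""

    @overload
    def get(self, pipeline: None = None) -> MyObject: ...

    @overload
    def get(self, pipeline: List[Row]) -> List[Row]: ...

    def get(self, pipeline: Optional[List[Row]] = None) -> Union[MyObject, List[Row]]:
        if pipeline is None:
            # Simple case: a single model instance
            return MyObject("example")
        # Aggregate case: raw dicts (placeholder result)
        return [{"stages": len(pipeline)}]


c = Collection()
obj = c.get()                   # checker infers MyObject
rows = c.get([{"$match": {}}])  # checker infers List[Dict[str, Any]]
```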
I agree that, in general, duck-typing is preferable. However in cases like this, where you really want to ensure the count of shares is a whole number, I can see an argument for `isinstance`.
What I can’t understand is the argument that occasional use of `isinstance` is bad, but also that pervasive nominal type-checking via annotations is good.
It usually means there is a better way to do what you are doing.
If it's user input you should sanitize there. In Python this is often easy because you can usually cast as the type you need e.g. with int() and then raise exception if it cannot.
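A sketch of that sanitize-at-input approach; `parse_shares` is a hypothetical name. `int()` already strips surrounding whitespace and raises `ValueError` on anything that isn't a whole-number string.

```python
def parse_shares(text: str) -> int:
    """Convert user input to an int once, at the edge of the program."""
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None


print(parse_shares(" 42 "))  # 42
```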
If it's your code, then you should make sure you are passing in an int if that's what is required. If you type hint the input as int, and somewhere in your code you pass in a string, you will get a warning in the IDE.
I would argue that's a very different time and place than checking for instanceof in the running code and that's why isinstance is bad and type-checking is good.
There are always exceptions but it's bad to write them in examples when teaching code.
> If you type hint the input as int, and somewhere in your code you pass in a string, you will get a warning in the IDE.
Statically-typed languages can guarantee to get this right 100% of the time. Can type-checkers for a highly-dynamic language like Python guarantee the same?
> Statically-typed languages can guarantee to get this right 100% of the time.
No, they don't. They usually provide escape hatches for things the type system does not cover, so there are cases where the type checker will just trust that you know what you are doing (even if you don't).
But more importantly, you don't need 100% for it to be useful. For IDE assistance, high precision (with somewhat lacking recall) is good enough. Of course, for refactoring, the higher the recall, the better (but you could compensate for lacking recall with tests, which is suboptimal, but viable).
But it's an interesting question what Python/mypy (a Python type checker) can actually do. The answer is that it depends on configuration. Mypy with its default configuration type-checks only typed code (i.e. functions which have type annotations), so you get guarantees only there. But you can configure it to be more and more strict (checking untyped defs, disallowing untyped code, and more), which increases the guarantees you get (and also increases the number of valid programs that it rejects). You can get really strictly typed code in Python, but you can also hit a wall if you need libraries that do not provide proper type hints (unless you write the type hints yourself).
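To make the default-vs-strict distinction concrete, a small illustrative file (function names hypothetical). Run as `mypy example.py` with defaults, the bad call inside `unannotated` goes unreported; `--check-untyped-defs` makes mypy check that body, and `--disallow-untyped-defs` (both included in `--strict`) rejects the missing annotations outright. At runtime, Python only fails when the line executes.

```python
def annotated(x: int) -> int:
    # Annotated, so mypy checks this body and its call sites by default.
    return x + 1


def unannotated(x):
    # No annotations: with default settings mypy skips this body, so the
    # str argument below is not reported. With --check-untyped-defs it is.
    return annotated("oops")


print(annotated(1))  # 2
```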
> What I can’t understand is the argument that occasional use of `isinstance` is bad, but also that pervasive nominal type-checking via annotations is good.
It doesn't have to be nominal; mypy supports structural subtyping through Protocols [1]
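A minimal sketch of structural subtyping with `typing.Protocol` (Python 3.8+). The class names are made up; the point is that `Duck` and `Robot` never inherit from `Quacker`, yet both satisfy it for the type checker because they have a matching `quack` method.

```python
from typing import Protocol


class Quacker(Protocol):
    def quack(self) -> str: ...


class Duck:
    # No inheritance from Quacker; having the method is enough.
    def quack(self) -> str:
        return "Quack!"


class Robot:
    def quack(self) -> str:
        return "beep-quack"


def make_noise(q: Quacker) -> str:
    return q.quack()


print(make_noise(Duck()), make_noise(Robot()))
```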
It's not idiomatic to force a specific type like that even so. Idiomatic code would accept any type that has the same operations (and their semantics) as int.
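One way to accept "anything that behaves like an int" is `operator.index`, which works for any object implementing `__index__` (as `int` and NumPy integer types do) and raises `TypeError` for floats and strings. Applying it to a `shares` setter is a sketch, not the course's code; the `Position` and `Lots` classes are hypothetical.

```python
import operator


class Position:
    """Hypothetical class; shares accepts any integer-like object."""

    def __init__(self, shares) -> None:
        self.shares = shares

    @property
    def shares(self) -> int:
        return self._shares

    @shares.setter
    def shares(self, value) -> None:
        # operator.index() uses __index__, so integer-like types pass,
        # while float and str raise TypeError.
        self._shares = operator.index(value)


class Lots:
    """Hypothetical integer-like type that is not an int subclass."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __index__(self) -> int:
        return self.n


print(Position(Lots(3)).shares)  # 3
```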
What PEP 484 did was standardize the use of annotations as type annotations and provide ancillary out-of-the-box support for type hinting via annotations, particularly the typing module.
Mypy was actually using Python 3.x annotations for type annotations before PEP 484 standardized them and brought in the stdlib typing module, but with PEP 484 there was a common, language-defined standard baseline for mypy and other efforts.
Maybe your IDE experience is different from mine, but a typed function will gladly accept an untyped var without complaining, and at runtime couldn't care less.
Why should type annotation be enforced at run time? In statically typed languages there is no type checking at run time, your type system already proved the type of variable.
> That’s why, for example, they can’t be used to increase the performance of the interpreter.
They can (as in, there's an API for that; the rest is up to a community effort) improve the performance of a final program, if type-annotated code is passed through Cython with the ``annotation_typing=True`` flag:
The annotations are a language syntax feature that the runtime doesn't enforce. There are a number of separate static type checkers, such as mypy, that do allow AOT static verification (which is, after all, all that languages like Haskell have; runtime enforcement of types that have been statically verified in advance isn't super common or necessary).
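A quick illustration of that point: the annotations are stored on the function object and can be inspected, but CPython does not check them when the function is called. The function name is arbitrary.

```python
from typing import get_type_hints


def double(x: int) -> int:
    return x * 2


# The annotations are recorded on the function...
print(get_type_hints(double))  # {'x': <class 'int'>, 'return': <class 'int'>}
# ...but nothing enforces them at call time.
print(double("ab"))            # abab
```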