Hacker News
PyAnnotate – Auto-generate type annotations for mypy (mypy-lang.blogspot.com)
129 points by psychotik on Nov 15, 2017 | 33 comments



For any dropboxers (or others), how does this compare with pytype? https://github.com/google/pytype.


(I worked on an early version of PyAnnotate.)

The main difference is that pytype is a static analyzer (i.e. it inspects the code and tries to figure out what types various things are), whereas PyAnnotate is a profiler hook, so you have to run your code and it observes types as your code runs.

Both have their pros and cons. While static analysis (in my personal opinion) would be ideal because you don't have to run your code, and in theory it can be much more complete, it's also much harder (often impossible) in Python. The runtime analysis of PyAnnotate has a lot of downsides (it doesn't give you types for code that it didn't observe run and it can't know if it has seen all the particular types for a parameter or return). The upside is that it was quick to implement something useful and it gets you quickly to pretty decent type annotations for your main code paths. Which is nice, because in a large untyped codebase it effectively lays down a rough draft of type annotations, making it a lot easier to fix up and fill in edge cases by hand.
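To make the "profiler hook" idea concrete, here is a minimal sketch of observing argument types at runtime with the standard `sys.setprofile` hook. It is not PyAnnotate's actual implementation; the names `observed`, `_profiler`, and `collect_types` are made up for illustration, and it only looks at positional parameters.

```python
import sys
from collections import defaultdict

# Hypothetical store: function name -> set of observed argument-type tuples.
observed = defaultdict(set)


def _profiler(frame, event, arg):
    # "call" fires on entry to every Python-level function.
    if event != "call":
        return
    code = frame.f_code
    # Positional parameter names come first in co_varnames.
    names = code.co_varnames[:code.co_argcount]
    signature = tuple(type(frame.f_locals[n]).__name__ for n in names)
    observed[code.co_name].add(signature)


def collect_types(func, *args, **kwargs):
    """Run func with the profiler hook installed and record argument types."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)


def add(a, b):
    return a + b


collect_types(add, 1, 2)
collect_types(add, "x", "y")
```

After these two calls `observed["add"]` holds one signature per call. It says nothing about types that were never passed in, which is exactly the limitation described above.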


There was a thing called PySonar by Yin Wang. It's sort of gone now but you can still find copies around the net.


(I work for Google and use pytype daily.)

Pytype is similar to mypy in that it can do type checking given proper annotations. In addition to using annotations, pytype can also do inference based on static analysis.

I don't have much experience with mypy, but the last time I used it, it could not infer from `return x == y` that the function returns a bool. Pytype can correctly infer many simple forms of function argument and return types, and even some more complex ones.

From reading the project, PyAnnotate relies completely on runtime profiling info to _help_ you get to the first round of annotations. We also have a similar project that gathers types at runtime and helps people annotate their code. Type information gathered this way has its limitations (the PyAnnotate project calls this out as well: you should only use it on legacy code, not on newly written code).

To give an example: if PyAnnotate observes the function below accepting a list of ints and returning an int, it may conclude that the type of this function is `Callable[[List[int]], int]`

```
def foo(xs):
    ret = 0
    for x in xs:
        ret += x
    return ret
```

But it can actually work on any iterable (because of the for-in loop), and each item in `xs` is a number (because of the `__iadd__` call on the integer 0). With static analysis, the correct inferred type might be `Callable[[Iterable[Union[int, float]]], Union[int, float]]`
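A quick sketch of that point: the same `foo` runs fine on inputs that a `List[int]` annotation would exclude. The variable names are only for illustration.

```python
def foo(xs):
    ret = 0
    for x in xs:
        ret += x
    return ret


from_list = foo([1, 2, 3])               # the observed case: list of ints
from_tuple = foo((1.5, 2.5))             # tuple of floats
from_generator = foo(n for n in range(4))  # generator, not a list at all
from_set = foo({1, 2})                   # any iterable of numbers works
```

A profiler that only ever saw the first call has no way of knowing the other three are valid.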


It's not possible to infer that the result of `return x == y` is a bool, because Python has rich comparisons (e.g. if x and y are numpy arrays, you'll get an array of the same shape (after broadcasting) back). So either pytype uses additional information, or it's sometimes just wrong.
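You don't need numpy to see this. A minimal sketch with a made-up `Vector` class whose `__eq__` compares element-wise, loosely in the spirit of array equality:

```python
class Vector:
    """Hypothetical container with element-wise equality."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        # Returns one result per element, not a single bool.
        return [a == b for a, b in zip(self.items, other.items)]


def same(x, y):
    return x == y


int_result = same(1, 2)                         # a plain bool
vec_result = same(Vector(1, 2), Vector(1, 3))   # a list of bools
```

The return type of `same` depends entirely on what `__eq__` returns for the argument types.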


Technically true, since you'll need to check the return type of `__eq__()`. But the following code doesn't trigger any error using `mypy test.py --check-untyped-defs`.

  def foo():
    x = 1
    return x == 2


  def bar(x: bool):
    print(x)


  def baz(x: int):
    print(x)


  if __name__ == '__main__':
    bar(foo())
    baz(foo())


I think another good example of "observed types are not necessarily the intended types" is a Text parameter (unicode in python2, str in python3). The actual intent might be Iterable[Text].
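A small sketch of why that case is slippery: a string is itself an iterable of strings, so both calls below succeed and a runtime observer that only saw the second one would record `str`. The function name `upper_all` is invented for the example.

```python
from typing import Iterable, List


def upper_all(items: Iterable[str]) -> List[str]:
    # Intended for a collection of strings.
    return [item.upper() for item in items]


from_list = upper_all(["ab", "cd"])  # intended use
from_str = upper_all("ab")           # also accepted: iterates characters
```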


Well at first glance, much better documentation :)

I just tried pytype and it basically did nothing but spit out some errors about imports not found. I didn't have time to try to investigate further and the documentation seems to be almost non-existent. Surprising with 1780 commits to the project.



Somewhat off topic but I think that more and more people are learning (the hard way, unfortunately) how important static typing is, and how dynamic typing makes it very difficult to develop and maintain large projects.

I think the next generation of successful languages will all be statically typed (whether they will run natively or in a virtual machine is a different (even if related) question).


No I think what we are seeing is a lot more hybrid systems. Go is like a statically typed language with lots of dynamic features. Julia is a dynamically typed language with lots of static typing features.

Paradigms are getting mixed too. Rust, Kotlin and Swift are all imperative languages with heavy functional inspiration.

Traditional statically typed OOP languages such as Java are what people want to get away from.


> No I think what we are seeing is a lot more hybrid systems.

I'm not sure why you started that sentence with "No". I actually agree with it.

There are many good ideas that have emerged separately in different languages, and are being combined in some of the new languages.

All I'm saying is, successful upcoming languages will probably be more statically typed than dynamically typed.

In other words, the dynamic typing paradigm is failing the real world test.

> Go is like a statically typed language with lots of dynamic features.

Go is statically typed.

The part that lacks static typing (no generics) is the worst part of the language that gets the most flak.

> Traditional statically typed OOP languages such as Java are what people want to get away from.

Java's problem is that it's just way too verbose.

    User user = new User(....); // This is not even that verbose

    // Maybe more like this:

    User user = new User(new UserProfile(....), UserManagerFactory.getDefaultUserManager());

People have misdiagnosed the problem and thought it was the static typing.

It turns out to be the lack of support for functions as objects. So for many things you end up having to create dummy classes and objects just to wrap functions.

Even C supported passing function pointers around.

So in this regard, Java is less expressive than C.


I somewhat agree but I think they will be optionally typed (and for that matter, optionally borrow checked), more along the lines of Julia. Where type stable code gets the performance benefits of static typing, even if it doesn't use any static typing. And where types can be added at any point to improve type checking, performance, and polymorphism all at once.

This allows for fast prototyping, and when done correctly, easy to add type safety. For example, you can prototype the code, make sure it works, add more tests, then add type checking while cleaning it up and documenting it. That would be my ideal workflow.


I disagree, static typing shouldn't be optional.


Do you actually mean explicit typing as opposed to merely static typing? How about type inference?


Why?


Citation needed?

Anecdotally, I've developed large projects in C++ and Java (I know, they're pretty lame static type systems, but certainly the most popular ones) and also in Python and Clojure, and I really haven't seen much benefit in static typing in regards to software defect rate or quality. Static typing makes autocomplete and refactoring tools easier, for sure, but it also slows down ease of experimentation (and writing generic code can be painful, although other static languages, especially type-inferred ones, fare better here). I buy into Rich Hickey's view on this topic[1] and that's one reason why I like Clojure: it gets out of the way, but it provides me with the tools I need to verify or validate my data (e.g. on the module or application boundaries).

I've played around with languages that have fancier type systems (Haskell, various ML's, briefly ATS) and am very interested in Rust (but have yet to use it), but they haven't really provided enough benefits for the effort of describing the types.

Note that I used to be very heavily in the static typing camp and I still very much like the idea of static typing, I just don't think we have found a static type system yet that has the right balance of convenience and safety and actually catches the right kinds of errors (as described in the below talk).

I guess my point is that it's not quite clear that the next generation of successful languages will all be statically typed. In fact, current trends would suggest otherwise (most of the popular languages are dynamically typed), although perhaps that depends on your definition of "successful".

[1] https://www.youtube.com/watch?v=2V1FtfBDsLU


> I just don't think we have found a static type system yet that has the right balance of convenience and safety and actually catches the right kinds of errors (as described in the below talk).

"Please don't be an uninformed Rich Hickey talk"

<Clicks>

"Oh, it's an uninformed Rich Hickey talk"


Explain?


> I really haven't seen much benefit in static typing in regards to software defect rate or quality

Hold it right there. I've never seen anyone argue that static type systems prevent bugs.

I mean they do prevent silly bugs that occur from mistyping variable/property names but I've never seen anyone claim that they eliminate other classes of bugs.

The biggest benefit of static type checking is you know what all the variables are.

    def checkout_cart(customer, payment_methods, cart, session):
        # body

What the hell is customer? What is payment methods? What fields are available on these objects? What methods can you call on them? no freaking idea.
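For comparison, a sketch of what the same signature could look like with annotations. Every type here (`Customer`, `PaymentMethod`, `CartItem`, the fields, the `Dict[str, str]` session, and the return value) is invented for illustration; the original snippet gives no hint of what they really are, which is the point.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    name: str
    email: str


@dataclass
class PaymentMethod:
    kind: str   # e.g. "card"
    token: str


@dataclass
class CartItem:
    sku: str
    quantity: int
    unit_price_cents: int


def checkout_cart(customer: Customer,
                  payment_methods: List[PaymentMethod],
                  cart: List[CartItem],
                  session: Dict[str, str]) -> int:
    """Stand-in body: return the cart total in cents."""
    return sum(item.quantity * item.unit_price_cents for item in cart)


total = checkout_cart(
    Customer("Ada", "ada@example.com"),
    [PaymentMethod("card", "tok_123")],
    [CartItem("sku-1", 2, 500), CartItem("sku-2", 1, 250)],
    {"id": "abc"},
)
```

Now the fields and methods available on each argument can be read off the signature, or checked by a tool such as mypy.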

Of course, this kind of code is confusing in Java as well, but for a different reason: Java conventions encourage a kind of obtuse programming style where everything is hidden behind layers of abstractions of factories and managers, so that even when everything is typed, you're not sure what anything is doing because all the data that matters is private and so are all the methods that actually do anything useful. All you're left with is an abstract interface that can sometimes be just as bad as an untyped variable. But this is mostly a cultural problem. (I've digressed).

> Static typing makes autocomplete and refactoring tools easier, for sure, but it also slows down ease of experimentation

Java slows down ease of experimentation because it requires tons of boilerplate code for even the simplest tasks.

It's not the static type checking.

If anything, static type checking helps experimentation because you can change your mind quickly and the compiler will help you catch all the stupid mistakes that can occur from mismatching types or mistyping variable names. This removes a huge cognitive tax and makes programming more enjoyable. Although I will concede this is subjective.


>Hold it right there. I've never seen anyone argue that static type systems prevent bugs.

Really? I see this every single time the subject is brought up. And, to be fair, they do catch some bugs, it's just that they do so at a cost.

>What the hell is customer? What is payment methods? What fields are available on these objects? What methods can you call on them? no freaking idea.

And, if they are all strings, how much more of an idea do you have?

Static typing does not necessarily help solve this problem. A combination of reduced scope (i.e. looser coupling), more specific variable naming and higher cohesion (e.g. having a customer object) does.

Moreover, there's a super easy way to figure out what all of those things are and figure out how you want to change it - run a behavioral test and launch a REPL when it hits that function.

At that point you can inspect customer, use autocomplete on it and even experimentally run code.

>If anything, static type checking helps experimentation because you can change your mind quickly and the compiler will help you catch all the stupid mistakes that can occur from mismatching types or mistyping variable names. This removes a huge cognitive tax and makes programming more enjoyable.

Behavioral tests perform this function equally well, if you have them.


> IMHO behavioral tests perform this function equally well.

I think a common pitfall in these discussions is to compare the worst case examples rather than reasonable quality codebases. I'd be far more interested in, say, time/cost to correct result metrics for a well-maintained Python codebase which has reasonable use of tests & linting (e.g. flake8) to an equivalently-proficient team using a statically typed language.


If we're discussing well-maintained code, then I would expect that the public interface is documented, at least in docstrings. Then I also know what the parameters are.


Agreed — I'm just wondering about how to quantify the impact of various changes. A dynamic language project with no tests, etc. is going to look like a selling point for static typing but I suspect the real-world bug counts for, say, a Python project using mypy (or even flake8 + tests + coverage) is going to be a lot closer than you might think from how heated these discussions get.


There was a study that did a line-by-line translation of 4 Python projects to Haskell and caught some bugs (between 0 and 4 per project): http://evanfarrer.blogspot.co.uk/2012/06/unit-testing-isnt-e...

I got the impression that the bugs found were either not at all serious (e.g. throwing a TypeError on malformed input instead of some other nicer kind of error) or were in areas of the code not covered by tests.

Unfortunately the author does not rate them by severity.


Thanks - that's a lot like what I had in mind!

My gut feeling is that dynamic typing + tests & static analysis is faster than very heavyweight languages (e.g. Java) but probably near or less than languages with more advanced typing systems like Haskell or Rust, but I'd really like to see something more comprehensive than a subjective opinion.


What is a behavioral test? Like, I don't understand all these weird paradigms that people come up with to deal with the deficiencies of dynamic typing.

If declaring structs is seen as costly overhead that complicates coding, tests are even more cumbersome.


>What is a behavioral test?

A behavioral test is a test that tests the behavior of a piece of software, as opposed to a test that checks types or implementation details or something else that isn't behavior.
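As a rough sketch of what that means in practice, the tests below only check what a function does for given inputs and say nothing about types or internals. The function `apply_discount` and its integer-cents convention are made up for the example.

```python
def apply_discount(total_cents, percent):
    """Hypothetical function under test: take a percentage off a total."""
    return total_cents - total_cents * percent // 100


def test_discount_reduces_total():
    # Behavior: 10% off 1000 cents leaves 900 cents.
    assert apply_discount(1000, 10) == 900


def test_zero_discount_changes_nothing():
    assert apply_discount(1000, 0) == 1000


test_discount_reduces_total()
test_zero_discount_changes_nothing()
```

If someone later passes the wrong kind of value along a tested path, a test like this fails as well, which is the overlap with type checking being argued about here.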

It is perhaps not necessary to write tests like these in languages that produce code that does not have bugs. I have yet to encounter such a language.

>If declaring structs is seen as costly overhead that complicates coding, tests are even more cumbersome.

You do not write tests then?


No, I do not. I especially don't write unit tests.


Except that static typing doesn’t help much there either (unless perhaps you’re using Frink with its units). The type doesn’t carry enough information. For example, knowing that something is an integer really doesn’t give you enough context about what that integer means or is used for. If it’s an object or struct, ok, then it helps to document, but if it’s primitive or standard collections...

Anyway, in Clojure, we now use spec to specify the shape of data that we expect, with nice descriptive namespaced keyword names. It helps in validating data entering the system, generating test data and as documentation.


> I've never seen anyone argue that static type systems prevent bugs.

It's extremely common to claim that static typing prevents entire classes of bugs (and I agree!). Here's just one instance of such a claim I found on Google:

https://news.ycombinator.com/item?id=10934134


The pendulum has swung the other way and back even in my brief memory as a programmer. Is it a trend or an oscillation?


Have to agree with you, but with a little bit of additional context.

I get why people started rejecting statically typed languages. I work with many different languages, but started with statically typed languages with various forms of static typing (Pascal was my first, but then C and C++). Dynamic languages offer flexibility that statically typed languages lack. Most of the time, giving up that flexibility ends up being a good thing, as it encourages sane code that the compiler can analyze to prevent runtime bugs, so I've always accepted this sacrifice of flexibility as a feature rather than a hindrance.

However, I've found myself recently writing many solutions using TypeScript[0] and I'm finding its static type system, which allows one to configure the strictness, to be incredibly powerful. It's helped, first, by fantastic support for union types, type inference and duck typing. I find that the compiler, which is little more than a transpiler with a static analyser bolted on top of it, catches errors that I make and greatly reduces runtime WTFs while still allowing me to write terse, simple code that isn't littered with unnecessary type annotations. I add typings where either the implicit type detection can't predict the type for me or where it decreases cognitive overhead while reading code, and leave them out when the opposite is true. This moves the bar back a little bit toward the flexibility side: I can do things in code that are completely illegal in C# and that result in less code, yet don't increase my runtime bugs or decrease readability in the process. And if there's something truly hairy that has to be done, I can tell the compiler to simply ignore the violation and give me the JavaScript output that I want regardless of what it thinks the code is going to do.

...and I find myself longing for that type system everywhere else. So while I am still a huge proponent of statically typed languages and I don't see that changing any time, precisely how the static type system works and what features it supports is becoming very important to me.

[0] Which, considering how much I hate JavaScript, was a huge surprise given that it's basically ES6 JavaScript with optional type annotations.



