Type inference really doesn't require ML (and is probably better done without it in order to be robust).
I think the Kolmogorov complexity of serious production codebases is very high -- in order to convey what the codebase does, you have to transmit roughly as many bits as the codebase itself. In most companies, you already have a way to get a "compact, human-readable description" of what the code does -- it's called asking your coworker, or reading your internal documentation. Ramp up is still hard.
This holds only if you have sufficient context: feeding what is effectively an untyped function into e.g. mypy will not produce anything meaningful. Even languages with rich type systems like Haskell require some types - even if only implicit ones - to infer everything else.
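To make the point concrete, here's a minimal Python sketch (the function names are made up for illustration): with zero annotations there is no local type information for a checker like mypy to propagate, whereas a single annotation gives it an anchor to infer the rest from.

```python
from typing import get_type_hints

# An effectively untyped function: a checker like mypy has nothing
# local to work with, so the parameters default to Any.
def process(data, key):
    return data[key]

# No annotations means no type information to propagate:
print(get_type_hints(process))  # {}

# With annotations as an anchor, a checker can infer the rest
# (mypy would infer the return type of the subscript as int):
def process_typed(data: dict[str, int], key: str):
    return data[key]

print(get_type_hints(process_typed))  # {'data': dict[str, int], 'key': str}
```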
In contrast, copilot has access to significantly more information when emitting type-annotated code, including context about what the project is like, how other functions are defined, plus all the prior knowledge accumulated during training.
> I think the Kolmogorov complexity of serious production codebases is very high
Do serious codebases exceed gigabytes in raw storage?
> Even languages with rich type systems like Haskell do require some types - even implicitly - to infer everything else.
If the Haskell compiler is spitting out "ambiguous instance of +" or whatever that error message is, it's a sign that the author doesn't understand what they're asking for. Taking a language whose value proposition is "I refuse to compile your code if there is the slightest indication that it's not perfect" and slapping a fuzzy ML model on top of it to suppress a class of errors is not a good idea.
Type systems exist as a way of ensuring that the programmer's mental model matches the code. Offloading type annotations to something else removes that safety from you.
> no priors
Sure? I don't see how that changes anything. My point is just that we already have very high quality summarizers of code, but software development is still hard. Making a model that attempts to approximate the summarizers of code that we already have isn't going to help much, no matter how good it gets.
> and slapping a fuzzy ML model on top of it to suppress a class of errors is not a good idea.
I am not saying that that is a good idea; I am pointing out that there are limits on what one can infer without broader context.
> Offloading type annotations to something else removes that safety from you.
Counterargument: the annotations can be used as a second sanity check. If the inferred types match your mental model - and the type-inference model is good - then you know your mental model is correct and you didn't miss something.
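That sanity check can even be mechanized. A minimal sketch of the idea (the `total` function and the "inferred" annotations are hypothetical stand-ins for whatever a tool would emit): compare the annotations on the code against the signature you believe it has.

```python
from typing import get_type_hints

# Suppose an inference tool (hypothetical) emitted these annotations:
def total(prices: list[float], tax: float) -> float:
    return sum(prices) * (1 + tax)

# Your mental model of the signature, written down independently:
expected = {"prices": list[float], "tax": float, "return": float}

# If these agree, the inferred types confirm your mental model;
# a mismatch is a prompt to look closer at the code (or the tool).
assert get_type_hints(total) == expected
```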