At the heart of the machine learning I do with PyTorch, most of the errors come from mismatched sizes in matrix multiplications, or from some matrix input that has the wrong size somewhere in the middle of the net.
How much does an extra wrapper layer in a functional language help with that? PyTorch is not very helpful in terms of error messages. You have to print out the matrix sizes produced by intermediate operations to find out what is really happening.
(I'm obviously biased, being the author of ocaml-torch.)
Using functional programming is not of much help for size mismatches: the current bindings don't even use the type system to check the number of dimensions, although that should be reasonably easy to add. Maybe a better approach to help with this would be tensors with named dimensions (http://nlp.seas.harvard.edu/NamedTensor). It's possible that a strong type system would help here, but I don't think there have been many attempts.
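To give a flavour of what "named dimensions checked by the type system" could look like, here is a minimal OCaml sketch using phantom types. None of this is part of ocaml-torch; the `Named` module and the `batch`/`features`/`classes` tags are made up for illustration, and the matrices are plain float arrays rather than torch tensors.

```ocaml
(* Sketch only, not part of ocaml-torch: phantom types naming the two
   dimensions of a matrix, so mismatched multiplications fail to compile. *)
module Named : sig
  type ('rows, 'cols) mat
  val of_array : float array array -> ('rows, 'cols) mat
  val mm : ('a, 'b) mat -> ('b, 'c) mat -> ('a, 'c) mat
end = struct
  type ('rows, 'cols) mat = float array array

  let of_array a = a

  (* Naive dense multiplication, only here to make the sketch runnable. *)
  let mm a b =
    let n = Array.length a and k = Array.length b and p = Array.length b.(0) in
    Array.init n (fun i ->
        Array.init p (fun j ->
            let s = ref 0. in
            for l = 0 to k - 1 do
              s := !s +. (a.(i).(l) *. b.(l).(j))
            done;
            !s))
end

(* Dimension names are just empty phantom tags. *)
type batch
type features
type classes

let x : (batch, features) Named.mat = Named.of_array [| [| 1.; 2. |] |]
let w : (features, classes) Named.mat = Named.of_array [| [| 3. |]; [| 4. |] |]
let logits = Named.mm x w (* fine: has type (batch, classes) Named.mat *)

(* let bad = Named.mm w x *)
(* rejected at compile time: features is not batch *)
```

The downside is that every op has to be given such a signature by hand, and things like broadcasting or reshaping don't fit this scheme nearly as neatly as matrix multiplication does.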
However, when using the Python API I often have errors because of:
- Unused variables when refactoring my code, which usually just means I forgot to use some parameter.
- Comparing things of different types (Python does not report any error and just returns that they are different).
- Making changes to some helper functions without adapting all the projects where I'm using them.
Using a good Python linter probably helps with these, but this is a place where languages like OCaml naturally shine.
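As a toy illustration of the first two bullets (nothing ocaml-torch specific), both cases surface at or before compile time in OCaml instead of silently doing the wrong thing at runtime:

```ocaml
(* Second bullet: polymorphic equality requires both sides to have the same
   type, so the cross-type comparison Python silently evaluates to False
   cannot even compile here. *)
let ok = 1 = 1 (* fine: both ints *)

(* let bad = 1 = "1" *)
(* type error: this expression has type string, an int was expected *)

(* First bullet: an unused binding is flagged by the compiler (unused-variable
   warning) instead of being silently ignored after a refactor. *)
let f x =
  let doubled = x * 2 in
  (* warning: unused variable doubled *)
  x + 1
```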
I've done quite a lot of work trying to solve this problem in Haskell, and I'm of the opinion that current type-system technology is not strong enough for real-world machine learning.
At the most basic level, the sizes of tensors are often not known until runtime, so some sort of dependent typing is necessary. Idris is currently the most practical dependently typed language, and it's missing a number of features that would be needed for machine learning work. For example, it only supports 64-bit floating point operations, whereas 32-bit ops are standard and the industry is moving towards 16-bit and even lower-precision ops.
There are ways in Haskell to get most of the benefits of dependent typing, but they're pretty ugly. My subhask library [1] did, I think, a reasonable job for vector/matrix math, but once you get into higher-order tensors everything becomes either far too verbose or impossible to specify. For example, permuting the axes of a tensor takes roughly a full page of code, and it's not even properly type safe. At the bottom of the link [1], there's a list of type-system features that I think would need to be added to Haskell before it has a chance of a good linear algebra system... but in all honesty, I'm not even convinced yet that a usable linear algebra system can be implemented in a language with proper dependent types.
Where are cutting-edge type systems at for statically verifying the “dimensionality” of computations? I remember a long time ago reading about a language from Sun called “Fortress” which was going to include the option of assigning units to values in such a way that scientific domain-model operations could be defined via a kind of type sugar... it seems like that technology would work really well for ML (and data science in general) if it were effective.
I remember reading some lengthy hype about Fortress and then never hearing about it again...
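Part of the Fortress-style units idea can be approximated today with phantom types. A rough OCaml sketch (the `Quantity` module and the `metres`/`seconds` tags are invented for illustration):

```ocaml
(* Rough sketch of unit-tagged values via phantom types. This is nowhere near
   full dimensional analysis (no derived units like m/s), just enough to stop
   adding metres to seconds. *)
module Quantity : sig
  type 'unit t
  val make : float -> 'unit t
  val add : 'unit t -> 'unit t -> 'unit t
  val value : 'unit t -> float
end = struct
  type 'unit t = float

  let make x = x
  let add = ( +. )
  let value x = x
end

(* Unit names are empty phantom tags. *)
type metres
type seconds

let distance : metres Quantity.t = Quantity.make 100.
let time : seconds Quantity.t = Quantity.make 9.58
let twice = Quantity.add distance distance (* fine: metres + metres *)

(* let nonsense = Quantity.add distance time *)
(* rejected at compile time: seconds is not metres *)
```

What Fortress promised went further, e.g. multiplication and division producing derived units automatically, and that is where the type-level machinery starts getting heavy in mainstream languages.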
Do bindings like this help much with the day to day work of machine learning research? Are the kinds of errors you encounter the kind that could be avoided with a smarter type checker?
So, this doesn't apply to torch specifically, but to give an example from my own experience: I have a TensorFlow model that takes 45 minutes just to build the graph. You can imagine how frustrating it is to wait most of an hour to run your code, only to see it crash due to some trivial error that could have been caught by a compiler.
Although this is somewhat of an extreme example, I'd say most of my TensorFlow models take at least several minutes to build a graph. Anything that reduces the possibility of runtime errors would therefore speed up my development cycle dramatically. I'll also note that the OCaml compiler is lightning-fast compared to, say, gcc.
Are you sure you’re writing it in a reasonable way? I’ve made some awfully large nets (implementing current state-of-the-art models) with TF and I’ve never run into anything like that. I mean, I might have 30 seconds to a minute before it gets moving, but that’s the most I’ve seen, and that includes the entire init process (reserving GPUs, preprocessing enough data to keep the cache moving, initializing variables for optimizers and such, and so on). Some of these models have an absolute ton of ops.
What are you doing that requires a 10, 30, 45 min graph build?
I'm fairly sure the model is implemented in a reasonable way. It's an experimental deep generative model based on https://github.com/openai/glow, though more complex because the warp and its inverse are evaluated at training time, and the outputs fed to other things. The warp has around 200 layers, IIRC. The model requires keeping track of the evolution of the log-determinant of the warp after each operation, along with the derivatives of those things... so the graph can get pretty huge.
I'm not sure that's a good metric. After all, type checking does not really answer a question a developer has about type errors; it just informs you of a type error before you run your program.
In fact, I'd wager your average Haskell type error has generated more than a few frustrated Stack Overflow questions.
Right, but learning about the errors without having to run and hit every codepath in your program has a lot of value. So in that way, seeing how many people hit pandas runtime type errors is a useful thing to measure.
Type errors in functional languages (except Elm) are so unhelpful that you just get used to their general shape and rely on context to figure out what the problem might be. Eventually they're rarely a problem, but I agree that for beginners type errors can be quite hostile.
I'll say, however, that the benefit of an ML like OCaml is less the types themselves than the fact that functional languages are the most advanced at providing tools for domain modeling in terms of composition and defining tailored algebras. This is something that fits both applied and experimental machine learning especially well.
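By "tailored algebras" I mean something like the following toy sketch: treat a layer as a function and build models by composing layers. The names here are illustrative only, not from any library, and the "tensors" are plain float arrays.

```ocaml
(* Toy algebra of layers: a layer is a function from arrays to arrays, and a
   model is built by composing layers. *)
type layer = float array -> float array

(* Left-to-right composition of layers. *)
let ( >>> ) (f : layer) (g : layer) : layer = fun x -> g (f x)

let relu : layer = Array.map (fun v -> if v < 0. then 0. else v)
let scale c : layer = Array.map (fun v -> c *. v)

(* The model definition reads like the architecture description itself. *)
let model : layer = scale 2. >>> relu >>> scale 0.5

let () = model [| -1.; 3. |] |> Array.iter (Printf.printf "%f\n")
```

The nice part is that experimenting with an architecture is just rearranging a pipeline of values, and the compiler keeps the plumbing honest.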
Fair point. I guess it's fair to say that a typechecker would have avoided (almost) all runtime type errors, and a fair portion of compile-time type errors too, provided the developer had type-guided tool assistance.
There are places where types will help when reading unfamiliar code. They'll also help when you're putting together more complex algorithms for novel architectures. They're less helpful if you're only stacking layers like Lego blocks and tuning hyperparameters of existing models.
If you look at the issues, you'll see the developer contemplating GADTs to deal with tensor shape matching (though even phantom types would do). This is an issue that causes headaches for just about anyone who's dealt with neural nets.
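For concreteness, here is one way a GADT could be used for this kind of checking: tracking the rank (number of axes) of a tensor in its type. This is purely illustrative and is not how ocaml-torch represents tensors; the payloads are stubbed out.

```ocaml
(* Type-level naturals for the rank; they never need values. *)
type z
type 'n s

(* A shape indexed by its rank: Nil has rank zero, Cons adds one axis. *)
type _ shape =
  | Nil : z shape
  | Cons : int * 'n shape -> 'n s shape

(* A tensor carrying its rank in the type; the data is left as a stub. *)
type 'n tensor = { shape : 'n shape; data : float array }

(* matmul only accepts rank-2 tensors, so passing a vector or a 3-D tensor is
   a compile-time error instead of a runtime shape exception. *)
let matmul (a : z s s tensor) (b : z s s tensor) : z s s tensor =
  ignore b;
  a (* actual arithmetic omitted; only the typing is the point here *)

(* A reduction provably removes exactly one axis: rank n+1 in, rank n out. *)
let sum_leading_axis : type n. n s tensor -> n tensor =
 fun t ->
  match t.shape with
  | Cons (_, rest) -> { shape = rest; data = [||] (* reduction omitted *) }
```

Note this only catches rank mismatches; checking the actual extents (the inner dimensions of a matmul, say) pushes you toward type-level integers or phantom-tagged dimensions, which is where the ergonomics start to suffer.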
Bindings author here. This should support most of the PyTorch ops; the binding code is generated automatically, as there are more than a thousand of these. Most of the models available in torchvision should also be there (with pre-trained weights), see https://github.com/LaurentMazare/ocaml-torch/tree/master/src... Finally, it's also possible to export some Python-defined models and run them from OCaml.
That being said, there are some rough edges compared to the PyTorch API, e.g. no parallel data loader, not much tooling, and only a couple of tutorials...
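For anyone curious what the surface looks like, a minimal example along the lines of the README; the exact function names below are recalled from memory, so treat them as approximate rather than authoritative:

```ocaml
open Torch

let () =
  (* Create a random 4x2 tensor and print it. *)
  let tensor = Tensor.randn [ 4; 2 ] in
  Tensor.print tensor;
  (* The shape comes back as a plain int list, i.e. checked only at runtime. *)
  List.iter (Printf.printf "%d ") (Tensor.shape tensor)
```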