I LOVE PyTorch for experimenting with dynamic deep neural nets (DNNs) -- that is, DNNs that can have different graphs for different input samples. I find it much, MUCH easier to create and tinker with dynamic DNNs using PyTorch than, say, TensorFlow Fold. PyTorch is great for R&D experimentation.
For example, here's how easy it is to construct a fully-connected neural net with a dynamically random number of recurrent hidden layers in PyTorch. Yes, it's a silly example, but it shows how easy it is to construct dynamic DNNs with PyTorch:
import random
import torch

class MySillyDNN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MySillyDNN, self).__init__()
        self.input_layer = torch.nn.Linear(input_dim, hidden_dim)
        self.hidden_layer = torch.nn.Linear(hidden_dim, hidden_dim)
        self.output_layer = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x, max_recurrences=3):
        hidden_relu = self.input_layer(x).clamp(min=0)
        # Re-apply the same hidden layer a random number of times -- the graph
        # is built on the fly, so it can differ on every forward pass.
        for r in range(random.randint(0, max_recurrences)):
            hidden_relu = self.hidden_layer(hidden_relu).clamp(min=0)
        y_pred = self.output_layer(hidden_relu)
        return y_pred
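For instance, calling it a couple of times gives you a different graph each call (the dims here are made up, and on older PyTorch releases you'd wrap the input in torch.autograd.Variable before calling the net):

    net = MySillyDNN(input_dim=4, hidden_dim=8, output_dim=2)
    x = torch.randn(1, 4)    # a single made-up input sample
    y1 = net(x)              # maybe 0 hidden recurrences this time
    y2 = net(x)              # maybe 2 next time -- a different graph per call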
It would be a hassle to do something like this with other frameworks like TensorFlow or Theano, which require you to specify the computational graph (including conditionals, if any) before you can run the graph.
PyTorch's define-the-graph-by-running-it approach is sooo nice for quick-n'-dirty experimentation with dynamic graphs.
You can even create and tinker with dynamic graphs interactively on a Python REPL :-)
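Just to make the contrast concrete, here's a rough sketch of what the same random-depth idea pushes you toward in graph-first TensorFlow (TF1-style API, made-up dims): the loop can't be a Python for-loop over the layers; it has to be baked into the graph as a tf.while_loop op, with the depth fed in at run time.

    import numpy as np
    import tensorflow as tf

    # Build the whole graph up front, control flow included.
    x = tf.placeholder(tf.float32, shape=[None, 4])
    n_recurrences = tf.placeholder(tf.int32, shape=[])

    w_in = tf.Variable(tf.random_normal([4, 8]))
    w_hidden = tf.Variable(tf.random_normal([8, 8]))
    w_out = tf.Variable(tf.random_normal([8, 2]))

    hidden = tf.nn.relu(tf.matmul(x, w_in))
    _, hidden = tf.while_loop(
        lambda i, h: i < n_recurrences,
        lambda i, h: (i + 1, tf.nn.relu(tf.matmul(h, w_hidden))),
        (tf.constant(0), hidden))
    y_pred = tf.matmul(hidden, w_out)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(y_pred, feed_dict={x: np.random.randn(1, 4).astype(np.float32),
                                    n_recurrences: 2})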
What about Keras? Keras was originally billed as the Torch API in Python, so it shares the same API design inspiration. I've used both, but I generally still use Keras because I don't see enough of a difference in PyTorch to switch.
The graph does need to be compiled, you're right. But you can still do everything interactively through the REPL. I don't see the difference in practice.
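E.g. something like this works fine from the REPL (made-up dims, and separate hidden layers here rather than shared weights, just to keep it simple); the only catch is that the depth is fixed before compile():

    from keras.models import Sequential
    from keras.layers import Dense

    n_hidden = 2    # depth chosen here, before the graph is compiled
    model = Sequential()
    model.add(Dense(8, activation='relu', input_dim=4))
    for _ in range(n_hidden):
        model.add(Dense(8, activation='relu'))
    model.add(Dense(2))
    model.compile(optimizer='adam', loss='mse')   # graph gets compiled once, up front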
Also, is there any reason to think useful optimizations are being made during compilation in Theano or TensorFlow that don't get made in PyTorch because it is more strictly dynamic? Anecdotally, PyTorch seems quite fast, but I'm wondering.
Have you run anything on multiple GPUs or scaled to multiple nodes? My biggest hesitation about using PyTorch is what appears to be its limited distributed compute support. Being able to easily scale a dynamic graph to arbitrarily large size across a cluster would make PyTorch an easy sell for me.
No, I haven't done any multi-GPU or multi-node work with PyTorch... at least not yet. So far, I've used PyTorch only for quick-turnaround tinkering and experimentation, and for building prototypes, typically with small datasets or smaller subsets of larger datasets.
For real-world workloads, my colleagues and I currently use TensorFlow, which has good performance, a large community and supporting infrastructure, and fantastic tooling around it.
If an idea shows promise in PyTorch, our next step is usually to implement it in TensorFlow with more data. But we do a lot of experimental tinkering in TensorFlow too. It depends on the learning task at hand.
Note that this version of PyTorch is the first one to support distributed workloads such as multi-node training.
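The entry point is the torch.distributed package. Something like this (the rendezvous settings are made up) sets up a process group and runs a collective op across processes:

    import torch
    import torch.distributed as dist

    # Every participating process runs this with its own rank.
    dist.init_process_group(backend='gloo',
                            init_method='tcp://10.0.0.1:23456',
                            rank=0, world_size=2)

    t = torch.ones(4)
    dist.all_reduce(t)    # sums the tensor across all processes in the group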
There was a great podcast with Soumith Chintala on the O'Reilly Data Show a couple of days back with more info on PyTorch and how it differs from Theano and TensorFlow: