This is really powerful. As a use case I'm considering the graph of streets in a city. A task would be to make predictions on the next nodes visited by a vehicle given the history of the recent nodes. I'm not sure how you deal with mini-batches, since you'd need the adjacency matrix of the whole graph?
Mini-batching is indeed tricky, as you need the information of the complete local neighborhood for all nodes in a mini-batch. Let's say you select N nodes for a mini-batch; for a neural net with k layers, you would then also have to provide all nodes of the k-th order neighborhood of those N nodes (if you want an exact procedure). I'd recommend doing some subsampling in this case, although this is not trivial. We're currently looking into this. Otherwise, full-batch gradient descent tends to work very well on most datasets. Datasets up to ~1 million nodes should fit into memory, and the training updates should still be quite fast.
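To make the "exact procedure" concrete, here's a minimal sketch of collecting the k-th order neighborhood of a mini-batch. The helper name `k_hop_nodes` is my own, and it uses a dense numpy adjacency matrix purely for clarity (a real implementation would use sparse structures):

```python
import numpy as np

def k_hop_nodes(adj, batch_nodes, k):
    """Return all nodes needed to compute exact k-layer GCN outputs
    for `batch_nodes`: the batch plus its k-th order neighborhood."""
    needed = set(batch_nodes)
    frontier = set(batch_nodes)
    for _ in range(k):
        nxt = set()
        for v in frontier:
            nxt.update(np.nonzero(adj[v])[0].tolist())  # direct neighbors
        frontier = nxt - needed   # only expand newly discovered nodes
        needed |= nxt
    return sorted(needed)

# toy chain graph 0-1-2-3-4
adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1

print(k_hop_nodes(adj, [2], 2))  # → [0, 1, 2, 3, 4]
```

Note how even a single batch node can pull in the entire graph once k gets large; that blow-up is exactly why subsampling the neighborhood becomes attractive.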
Both graph-level and node-level classification are possible. Graph-level classification requires some form of pooling operation (simplest case: mean-pooling over all nodes, but there are more elaborate things one can do).
In a graphical model, you'd explicitly model the probabilistic assumptions that you make with respect to the data. In this neural network-based approach, the goal is better thought of as learning a function that maps from some input to some desired output. But indeed, the form of the propagation rule resembles mean field inference in graphical models.
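For reference, the propagation rule in question can be sketched in a few lines of numpy: each layer computes H' = ReLU(D̂^(-1/2) Â D̂^(-1/2) H W), where Â = A + I adds self-connections. This is a simplified dense sketch, not the library implementation:

```python
import numpy as np

def gcn_layer(adj, H, W):
    """One GCN propagation step:
    H' = ReLU(D_hat^{-1/2} A_hat D_hat^{-1/2} H W), with A_hat = A + I."""
    A_hat = adj + np.eye(adj.shape[0])          # add self-connections
    d = A_hat.sum(axis=1)                       # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D_hat^{-1/2}
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# toy chain graph 0-1-2 with one-hot node features
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
H = np.eye(3)                  # input features
W = np.random.randn(3, 2)      # trainable weight matrix
print(gcn_layer(adj, H, W).shape)  # → (3, 2)
```

The resemblance to mean field inference comes from each node repeatedly averaging (normalized) messages from its neighbors before a pointwise nonlinearity.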
Currently the framework only supports undirected graphs, so directed graphical models wouldn't be supported as input. I can't really judge how useful it would be to take a Bayesian net as input; it sounds a bit hacky to me. But in principle you could train a neural net on any kind of graph. Someone recently suggested taking the connection graph of another neural net as input and trying to learn some function on that. But again, it's really hard to judge in advance how useful such approaches would be ;)
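If you do want to feed a directed graph into a framework that only handles undirected inputs, one common (lossy) workaround is to symmetrize the adjacency matrix first. This is a general trick, not something the framework provides:

```python
import numpy as np

def symmetrize(adj):
    """Make a directed adjacency matrix undirected by keeping an edge
    wherever an arc exists in either direction (edge directions are lost)."""
    return ((adj + adj.T) > 0).astype(adj.dtype)

# directed chain 0 -> 1 -> 2
A_dir = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
print(symmetrize(A_dir))
```

Whether dropping the directionality is acceptable obviously depends on the task; for a Bayesian net, the arc directions carry the model's semantics, which is part of why that input sounds hacky.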