Also no - we use data parallelism. I'd suggest checking out the new Spark page we put up, which explains how parameter averaging works: http://deeplearning4j.org/spark
To quote your page:
Data parallelism shards large datasets and hands those pieces to separate neural networks, say, each on its own core. Deeplearning4j relies on Spark for this, training models in parallel
This isn't the same thing as the TensorFlow distributed training model at all.
No..? I'm not sure how training on several shards at once and averaging the results asynchronously is hyperparameter tuning o_0. We have a whole dedicated library for that called Arbiter.
You are thinking of grid search and the like. We implement grid search and Bayesian optimization on Spark (the latter being closed source).
I didn't say it was exactly the same as TF either. We have been doing this for close to 2 years now. Actually.. I'm not sure why it has to be? It's closer to Hogwild and co.
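For anyone following along, here is a rough sketch of what "train on several shards and average the results" means. It is plain Python, not DL4J or Spark code: the function names (`sgd_epoch`, `average`, `train_parameter_averaging`) and the toy linear-regression problem are made up for illustration, and the averaging here is synchronous for simplicity, whereas a real cluster would run each shard on its own executor.

```python
import random

def sgd_epoch(weights, shard, lr):
    """One pass of plain SGD for y = w*x + b over a single shard."""
    w, b = weights
    for x, y in shard:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def average(params):
    """Element-wise mean of the parameter tuples returned by the workers."""
    n = len(params)
    return tuple(sum(p[i] for p in params) / n for i in range(len(params[0])))

def train_parameter_averaging(data, num_workers=4, rounds=100, lr=0.05):
    # Shard the dataset: every worker sees a disjoint slice.
    shards = [data[i::num_workers] for i in range(num_workers)]
    weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each worker starts from the same shared weights and trains locally.
        # (Run sequentially here; on a cluster these would run in parallel.)
        local = [sgd_epoch(weights, shard, lr) for shard in shards]
        # The averaged result becomes the shared weights for the next round.
        weights = average(local)
    return weights

rng = random.Random(0)
data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(200))]
w, b = train_parameter_averaging(data)
print(w, b)
```

Note that there is one set of hyperparameters and one final model: the copies only exist between averaging steps. A grid search, by contrast, would train independent models with different settings and keep the best one.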
You also never answered my question ;). Not sure what to assume here.
Sorry, I edited. I agree it's not hyperparameter optimisation, more some kind of regularisation thing.
But it isn't the same as what TensorFlow does, and I'd argue it is much closer to my initial characterisation ("can use Spark to coordinate training multiple models in parallel")
Not entirely sure about which question I missed, or what you want to assume. If it is this:
When you "played" with us - I'm assuming you just cloned our examples and ran us on the desktop? Likely more than a year ago, before our C++ rewrite? I'd be curious to see if you ran us on Spark as well.
Then no. It was around November, and I got the demos working, and attempted to build a custom network.
Edit: Was it the pyspark question? Then yes, but we also use Scala, Java and R (and SQL of course).
Are you comparing 2015 DL4J to 2016 TensorFlow? Your opinion of our framework seems outdated. You're welcome to try us now -- that would actually be a fair comparison.