We're actually going to be doing some work with IBM/NVLink, as well as some other neat things I can't announce. We'll be able to benchmark on this front much like you guys, though :).
We'll be using RDMA for this, and we plan to write the code to match, with Spark handling orchestration and data storage/loading.
Right now the main thing we do is data parallelism: each Spark partition trains on its own slice of the data, with intermittent averaging across partitions.
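To make the "train on partitions, average every so often" idea concrete, here is a rough sketch in plain Python. It is not our actual Spark code: the one-weight linear model, the function names (`sgd_pass`, `train_with_averaging`), and the hyperparameters are all made up for illustration, and the "partitions" are just Python lists standing in for Spark partitions.

```python
import random

def sgd_pass(w, batch, lr):
    # One SGD pass over a partition for the toy model y = w * x
    # with squared-error loss. (Hypothetical model, for illustration.)
    for x, y in batch:
        grad = 2.0 * (w * x - y) * x
        w -= lr * grad
    return w

def train_with_averaging(partitions, rounds=20, local_passes=5, lr=0.01):
    # Every worker starts each round from the same shared weight,
    # trains locally on its own partition, and then the local
    # weights are averaged to form the next shared weight.
    w = 0.0
    for _ in range(rounds):
        local_weights = []
        for part in partitions:
            w_local = w
            for _ in range(local_passes):
                w_local = sgd_pass(w_local, part, lr)
            local_weights.append(w_local)
        w = sum(local_weights) / len(local_weights)
    return w

# Noise-free toy data with true weight 3.0, split into 4 partitions.
rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(200))]
partitions = [data[i::4] for i in range(4)]
w = train_with_averaging(partitions)
```

The `local_passes` knob is the "intermittent" part: larger values mean less frequent averaging (less communication), at the cost of workers drifting further apart between syncs.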
Other than that, we have multi-GPU settings.
We've made it pretty configurable though: http://deeplearning4j.org/gpu
Admittedly, there's more work for us to do in this area.
So far fp16 has been pretty nice, though :).