It's not just about accelerating ML, specifically deep learning. There are many other enterprise technologies that can benefit from GPUs. One example: OLAP-focused databases (such as MapD - https://www.mapd.com/). For some benchmarks, check out this blog: http://tech.marksblogg.com/benchmarks.html.
The DL "training" use-case is well-known at this point, but there are many others which are emerging.
A GPU database isn't that useful, because the arithmetic intensity (ops/byte) is relatively low. Cross-sectional memory bandwidth is what really matters, and you can get similar effects with an appropriately provisioned cluster of CPU machines, each holding a shard or a replica of the database. I say this as someone who has written a GPU in-memory database of sorts that is used at Facebook (Faiss); what is interesting is if you can tie that to something with higher arithmetic intensity before or after the database lookup on the GPU.
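As a rough back-of-the-envelope sketch (the toy query and the bandwidth number below are my own illustrative assumptions, not anything from the parent), here is why a plain scan/lookup tends to be bandwidth-bound rather than compute-bound:

```python
# Arithmetic intensity of a simple filtered-sum scan over a 64-bit column.
# Each element is read once (8 bytes) and touched by ~2 ops (compare + add),
# so ops/byte stays well below 1 and the scan is limited by memory bandwidth.

bytes_per_row = 8          # one int64 column value read from memory
ops_per_row = 2            # one comparison + one conditional add

intensity = ops_per_row / bytes_per_row
print(f"scan arithmetic intensity ~ {intensity:.2f} ops/byte")

# Against a hypothetical GPU with ~900 GB/s of memory bandwidth, the scan can
# sustain at most bandwidth * intensity ops/s, far below peak arithmetic
# throughput, so extra FLOPs on the chip buy nothing for this workload.
bandwidth_bytes_per_s = 900e9
print(f"scan ops ceiling ~ {bandwidth_bytes_per_s * intensity / 1e9:.0f} Gop/s")
```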
GPUs are only really being used for machine learning because of the sequential dependence of SGD and the relatively high arithmetic intensity (flops/byte) of convolutions or certain GEMMs. The faster you can take a gradient descent step, the shorter the wall-clock time to convergence, and if you try to split a single computation between multiple nodes you lose either to reduced memory reuse (for conv/GEMM) or to communication overhead and latency. The Volta "tensor cores" (fp16 units) make the GPU less arithmetic-bound for operations such as convolution that reduce to a GEMM-like operation, but since memory bandwidth did not increase by a similar factor, Volta is fairly unbalanced.
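To make the flops/byte point concrete, here is a worked sketch (my own numbers; the Volta figures are approximate public specs, ~125 Tflop/s fp16 tensor-core peak and ~900 GB/s HBM2):

```python
# Arithmetic intensity of an M x K by K x N GEMM in fp16, which is roughly
# what a convolution lowers to. FLOPs = 2*M*N*K; bytes moved (ideally, each
# matrix read or written once) = 2 bytes * (M*K + K*N + M*N).

def gemm_intensity(m, n, k, elem_bytes=2):
    flops = 2 * m * n * k
    bytes_moved = elem_bytes * (m * k + k * n + m * n)
    return flops / bytes_moved

print(f"1024^3 GEMM: ~{gemm_intensity(1024, 1024, 1024):.0f} flops/byte")

# Roofline balance point for a Volta-class part: how many flops per byte a
# kernel must perform to stay compute-bound rather than bandwidth-bound.
peak_flops = 125e12   # approx. fp16 tensor-core peak
peak_bw = 900e9       # approx. HBM2 bandwidth
print(f"balance point: ~{peak_flops / peak_bw:.0f} flops/byte")
```

A big square GEMM clears that balance point, but smaller or skinnier GEMMs (and most non-GEMM kernels) fall well under it, which is the sense in which the chip is unbalanced.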
The point about Intel not increasing its headline performance as much as GPUs have is also misleading. Intel CPUs are very good at branchy code and are latency-optimized, not throughput-optimized (as far as a general-purpose computer can be). Not everything we want to do, even in deep learning, will necessarily run well on a throughput-optimized machine.
Actually, in columnar databases the arithmetic intensity (ops/byte) is significantly higher, and the GPU helps here.
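A minimal sketch of that point (my own toy example, assuming a dictionary-encoded column, not taken from any particular engine): with a compressed columnar layout, every byte scanned feeds several operations, because decode, filter, and aggregate all run per physical byte read.

```python
import numpy as np

# A float32 column dictionary-encoded down to 1-byte codes. The scan must
# decode (gather), filter (compare), and aggregate (add) per byte scanned,
# so ops/byte is several times higher than for a raw uncompressed scan.

rng = np.random.default_rng(0)
dictionary = rng.normal(size=256).astype(np.float32)            # 256 distinct values
codes = rng.integers(0, 256, size=10_000_000, dtype=np.uint8)   # 1 byte per row

decoded = dictionary[codes]        # ~1 gather per byte read
mask = decoded > 0.5               # ~1 compare per byte
total = decoded[mask].sum()        # ~1 add per selected byte

print(f"selected sum = {total:.1f} (roughly 3+ ops per byte of column data scanned)")
```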
If you think about how a database CAN be built, instead of how databases have been built until now, you will find very interesting ideas that can and do make use of the GPU.
Research into GPU databases has been around since 2006, with a lot of interesting papers published around 2008-2010.
There are also at least 5 different GPU databases around, each with its own characteristics and suitable use-cases [1]...
The DL "training" use-case is well-known at this point, but there are many others which are emerging.