Yes, I have done non-trivial implementations of a number of SoTA models in Julia. The framework I've used is Flux[1], which I love for its simplicity; it is very much like the DarkNet[2] framework in that regard, which is refreshing after using TensorFlow. PyTorch is much better than TensorFlow at avoiding unnecessary complexity and has a more sensible API, but Flux is simpler still.
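To give a sense of that simplicity, here is a minimal sketch of a Flux model (layer sizes and the batch are arbitrary, API as of recent Flux releases):

    using Flux

    # A small MLP; a Flux model is just ordinary Julia functions composed with Chain.
    model = Chain(
        Dense(784 => 128, relu),
        Dense(128 => 10),
        softmax)

    x = rand(Float32, 784, 32)   # a fake batch of 32 inputs
    yhat = model(x)              # forward pass, gives a 10x32 output

There is no session, graph, or module boilerplate; the model is a plain callable value you can compose and differentiate like any other Julia code.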
The ability for Julia to compile directly to PTX assembly[3][4] means that you can even write the GPU kernels in Julia and eliminate the C/C++ CUDA code. Unfortunately, there is still a lot of work to be done to make it as reliably fast and easy as TensorFlow/PyTorch, so I don't think it is usable for production yet.
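As a rough illustration (a sketch only; the vadd! kernel and the sizes are made up, using the current CUDA.jl API), a GPU kernel can be written in plain Julia:

    using CUDA

    # Element-wise add kernel in plain Julia; CUDA.jl compiles it to PTX.
    function vadd!(c, a, b)
        i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
        if i <= length(c)
            @inbounds c[i] = a[i] + b[i]
        end
        return nothing
    end

    a = CUDA.rand(1024)
    b = CUDA.rand(1024)
    c = CUDA.zeros(1024)

    CUDA.@sync @cuda threads=256 blocks=cld(length(c), 256) vadd!(c, a, b)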
I hope it will be production ready soon, but it will likely take some time to fully tune the compute stacks. They are already working on AMD GPU support with AMDGPU.jl[5], and since the latest NVIDIA GPU release has, IMHO, purposefully limited performance for scientific compute applications (onboard RAM, power), I would love to be able to develop on my AMD GPU workstation and deploy easily on whatever infrastructure, all in the same language.
I do have some gripes with Julia, but the biggest of them are mostly cosmetic.