I was going to write a cynical complaint about how it's hardly useful without GPU support... but it's using WebGL to hit the GPU. Of course. And it's probably a million times easier than trying to install a TensorFlow stack locally on your desktop.
We store NDArrays as floating point WebGLTextures (in rgba channels). Mathematical operations are defined as fragment shaders that operate on WebGLTextures and produce new WebGLTextures.
The fragment shaders we write operate in the context of a single output value of our result NDArray, which gets parallelized by the WebGL stack. This is how we get the performance that we do.
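To make that concrete, here's a minimal sketch of what such a kernel can look like, written as an illustrative WebGL 1 fragment shader inside a TypeScript string (this is the general pattern, not deeplearn.js's actual source): each fragment invocation computes exactly one output texel, and the GPU runs many of them in parallel.

    // Illustrative elementwise-add "kernel": one fragment = one output value.
    // uA and uB are float textures holding the two input NDArrays; uSize is
    // the texture resolution in texels.
    const ADD_FRAGMENT_SHADER = `
      precision highp float;
      uniform sampler2D uA;
      uniform sampler2D uB;
      uniform vec2 uSize;

      void main() {
        // gl_FragCoord tells this invocation which output texel it owns.
        vec2 uv = gl_FragCoord.xy / uSize;
        float a = texture2D(uA, uv).r;
        float b = texture2D(uB, uv).r;
        // The sum lands in the output texture bound to the framebuffer.
        gl_FragColor = vec4(a + b, 0.0, 0.0, 1.0);
      }
    `;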
Which... is pretty much how GPGPU started in the early 2000s.
Sad/funny how we go through this cycle again.
It will be interesting to see if the industry will produce a standard for GPGPU in the browser, given that even on the desktop the open standard (OpenCL) is less common than the proprietary one (CUDA).
This is still done in pretty much every game engine I've worked with (for general computation used to support rendering as much as the rendering itself). It's frankly extremely practical and better than many GPGPU apis because it matches what the hardware is doing internally better (GPU core warps, texel caches, vertex caches, etc).
> It will be interesting to see if the industry will produce a standard for GPGPU in the browser.
They did: WebCL. Sadly, it had multiple security issues, so the browsers that had implemented it in their beta channels (just Chrome and Firefox, I believe) ended up removing it. Now I think it's totally stalled and no one is planning on implementing it.
Also sadly, SIMD.js support is coming along extremely slowly.
And SwiftShader is quite a nice fallback for blacklisted GPUs. It simulates WebGL on the CPU and takes advantage of SIMD:
https://github.com/google/swiftshader
As I understand it, deeplearn.js is more of a kitchen than a prepared meal. Part of the library is referred to as “numpy for the web”, with classes to run linear algebra operations efficiently by leveraging the GPU. I don’t see why you couldn’t use those pieces to set up other networks. I think the name “deeplearn.js” is more about capitalizing on the branding momentum of “deep learning” than about demonstrating one kind of network. I’m in the middle of introductory machine learning classes, so I hope someone will correct me if I’m wrong.
We wanted to do hardware-accelerated deep learning on the web, but we realized there was no NumPy equivalent. Our linear algebra layer has now matured to a place where we can start building a more functional automatic differentiation layer. We're going to completely remove the Graph in favor of a much simpler API by the end of January.
Once that happens, we'll continue to build higher level abstractions that folks are familiar with: layers, networks, etc.
We really started from nothing, but we're getting there :)
Thanks for the explanation! I've recently been working on my own deep learning library (for fun) and was doing something similar. Aren't GL textures sampled inexactly, with floating point coordinates? Do you just rely on the floating point error being small enough that you can reliably index weights?
I ended up switching to OpenCL since I am running this on my desktop. Just curious to see what you did. Thanks!
You can set nearest neighbor interpolation for the texture (aka no interpolation), and gl_FragCoord can be used to determine which pixel the fragment shader is operating on.
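For example, a sketch of the texture setup (not the library's code; width, height and data are whatever the caller provides):

    // Create a float texture with filtering disabled so texel reads are exact.
    function createExactFloatTexture(gl: WebGLRenderingContext, width: number,
                                     height: number, data: Float32Array): WebGLTexture {
      gl.getExtension('OES_texture_float');           // float textures in WebGL 1
      const tex = gl.createTexture()!;
      gl.bindTexture(gl.TEXTURE_2D, tex);
      // NEAREST = no interpolation: a lookup returns the stored value as-is.
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                    gl.RGBA, gl.FLOAT, data);         // 4 floats per texel
      // In the fragment shader, sample at the texel center so no filtering or
      // rounding error creeps in:
      //   vec2 uv = (floor(gl_FragCoord.xy) + 0.5) / uSize;
      return tex;
    }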
It's not really a hack, it's just using the GPU's parallel computing capabilities to compute things in parallel. This technique has been around for ages.
Languages, buddy, languages. Just as languages were a barrier to human cultures spreading their ideas, the same is true in the computing world. JS is catching up with many concepts that were already prevalent in other languages/environments, and because it's JS, the technique is becoming more accessible and popular with everyday developers.
Compute shaders are not supported in WebGL, but it is possible to perform vector operations by rendering to a texture with the fragment shader. It's basically a hack. The trick is rendering an image without ever putting it on the screen, storing arbitrary data in the pixels. This has limitations, but it's actually good enough for many vector and matrix operations.
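Roughly, the trick looks like this (a sketch of the general pattern, assuming a compiled shader program, an output float texture, and a full-screen quad are already set up; reading floats back needs the float-texture extensions in WebGL 1):

    // Run a fragment-shader "kernel" into an off-screen texture and read it back.
    function runKernel(gl: WebGLRenderingContext, program: WebGLProgram,
                       outputTex: WebGLTexture, width: number,
                       height: number): Float32Array {
      // Render into the texture instead of the screen.
      const fbo = gl.createFramebuffer();
      gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
      gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                              gl.TEXTURE_2D, outputTex, 0);

      gl.viewport(0, 0, width, height);
      gl.useProgram(program);
      gl.drawArrays(gl.TRIANGLES, 0, 6);  // full-screen quad: one fragment per output

      // Read the "image" back: the pixels are really our result values.
      const result = new Float32Array(width * height * 4);
      gl.readPixels(0, 0, width, height, gl.RGBA, gl.FLOAT, result);
      gl.bindFramebuffer(gl.FRAMEBUFFER, null);
      return result;
    }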
I believe this method was even used with desktop OpenGL when GPUs were first being used for general computing and didn't yet have more flexible APIs like OpenCL/CUDA.
Why pretend? A texture is just an array of floats. The thing preventing you, the programmer, from seeing it as something beyond an "image" is really yourself. In modern graphics, we use textures to represent all kinds of data. Projections onto different spherical harmonic modes, luminance histograms, atmospheric data, etc.
A texture is just data and a shader mutates that in a highly parallel way with specific coherence semantics. Before knocking it, I suggest educating yourself a bit more.
What are we pretending here? A GPU is designed to do multiplication. We're doing multiplication. Calling it "textures" or "shading" or whatever else is only because of the GPU's roots in gaming; otherwise the problems are exactly the same.
Yeah, I don't see how I stepped on any toes. Deep learning libraries have always been a pain in the ass to install, and they only support some OSes and GPUs. There's nothing wrong with JS or WebGL, but it seems an incredibly inelegant choice for this.
I wasn't aware it worked at all on Windows now, that's cool. Doesn't seem very simple to install though. This is the installation guide for just one of the prerequisites: http://docs.nvidia.com/cuda/cuda-installation-guide-microsof... And as you mention, only Nvidia.
When you want performance, the only way is to get close to the hardware. That is why deep learning is usually Linux + Nvidia, and why game engines use (or used) C or even assembly code for critical parts.
Don't expect anything better soon as companies focus on performance per watt and will only develop things in that path.
Companies will improve the algorithms and go low level with hardware instructions. The problem with GPUs is that they have special instructions for some operations, and not using them can be 20 times slower. You are not going to improve the algorithms 20x, and even if you do, you can still improve them another 20x with hardware instructions. You do both if you can. That is a big saving in electricity in data centers.
I'm not sure how convenient this will be for most use cases.
JavaScript in the browser lacks good data manipulation libraries (nothing that comes close to using Pandas, or R). Even loading/saving files with the browser is a PITA (by design really, for security reasons).
I can see a day when JavaScript (or WebAssembly) has APIs like NumPy and Pandas, but for now it's simply easier to install one of those than to try to do everything in JavaScript.
Presumably this will usually be run from a JS engine like Node, independent of a browser, which will have all the libraries needed for loading data and such. In theory it will be an order of magnitude faster than NumPy because it uses the GPU (I don't know what the actual performance is, though). Certainly the JS will be much faster than Python or R.
Deeplearn.js is designed for WebGL in a browser. I'm not sure if it'll work in Node. And on a server you'd probably want to target OpenCL.
Node adds file support, but libraries for stuff like dataframes are still in their early days. That might not be a problem for example problems, but real world data needs cleaning up before you can start doing ML.
I've been waiting years for this. Once compute shaders became available to the browser I planned to finally get around to writing some code because you could project out to so many platforms. This sucks the entropy out of the coding task. You don't have to have multiple implementations, with unique bugs on multiple platforms.
JS has improved over the years, but you could also go with a typed language if you wanted, like PureScript or TypeScript.
I guess I pegged compute shaders in the browser as a good time to revisit linear algebra and ANNs, since I could expect improved performance, an improved programming model, and improved portability.
You have created an abstraction that is pretty portable. You'll probably be able to capture new performance enhancements as they occur on web runtimes. Maybe I'll try it out.
pip dependencies can be a pain in the ass. It's also remarkably bad about telling you how to satisfy the dependencies.
* Error in line 392
After an hour of searching, the solution is:
"Just install these 100 packages. Package 1-20 need to be compiled from source, because the project requires an old version. And they all have dependencies of their own."
OP here: please check out the GitHub repository of the Neural Network in JS too [0].
Would love to see some exciting discussions around machine learning in JS. I am exploring the topic heavily at the moment and I am keen to apply/write/educate/learn more about it.
I also hope internet bandwidth grows to the point where hundreds of megabytes takes milliseconds to download, and that computer memory increases until hundreds of megabytes is trivial, and that the economics of ISPs mean I get truly unlimited data without incurring charges, but when all that happens I definitely do hope we're utilising technology in ways that includes embedding complex neural networks in webpages as a normal, typical thing we do.
Resources don't grow to infinity; there are physical limits to everything. Be aware of that. You can see the perfect example in CPUs: we keep adding cores, but single-core peak speed doesn't grow much between generations the way it did a few years ago. The web is going to hit something similar at some point (even the TCP protocol has limits that prevent small transfers from reaching top speed on current connections).
The theoretical maximum speed of 5G wireless is 10,000Mbps with a 1ms latency - a 300MB file would download in 251ms. That's current technology being tested in the field at the moment. Real phones will use it in a couple of years. We won't get that speed, but we'll get quite close in major cities. Today's flagship desktop computers have 16GB of RAM, but there's no particular reason why they can't have 10 times that much. Servers already do. Phones have 4GB, but again they could have much more. CPU speeds are fast, but neural networks run on GPUs, which are considerably faster for that task (remember this isn't training the network, it's just running one you've downloaded).
This is what's actually happening right now. This is the near future.
I am just going to forget all the countries that are still on GSM and say we have 5G right now.
1) The theoretical speed is one thing; the real one is another.
2) There is more than one node connected to the network.
3) TCP needs some packet exchanges to increase the window size.
4) Latency to the server where the page is hosted is not going to be 1ms; you are lucky if it is around 20ms.
5) Servers have more RAM because they use better CPUs, which means they are more expensive.
6) Who connects to websites from a desktop nowadays? Mobile is the way to go now (which means slower CPUs, less RAM, lower power consumption, etc.).
7) Why would you transfer a complete neural network instead of sending the data to the server? The data is going to be smaller than the network, and you avoid the risk of someone cloning the neural network for their own web site.
8) There are no useful applications for neural networks in web sites. It makes no sense to me (please prove me wrong and show me a use case where it makes sense).
> 7) Why would you transfer a complete neural network instead of sending the data to the server?
Privacy is one reason I can think of. By running the trained network locally I can benefit from it without sharing all my data with the company serving it.
Data size is another reason. If I want to categorize 50GB of images using a 300MB model it makes a lot more sense to download the model than upload the data.
> 8) There are no useful applications for neural networks in web sites.
I admit that I haven't got a clue how these things will be useful. If I did then I'd have started a business to implement it already. My lack of insight isn't a good reason to suggest they'll never be useful though.
> Data size is another reason. If I want to categorize 50GB of images using a 300MB model it makes a lot more sense to download the model than upload the data.
The faster the network is, the more reason you have to upload your data. With low latency, the time to process an image in the cloud is going to be virtually zero, since the resources in the cloud are huge compared with what you can have at home.
There is lots of work being done in model compression (quantization, simple factorization tricks, better conv kernels like depthwise separable convs, etc). We won’t let that happen!
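As a concrete illustration of the simplest of those tricks (not deeplearn.js code, just the general idea of linear 8-bit quantization): store each weight as a byte plus a per-tensor min/max, which cuts float32 weights to roughly a quarter of their size.

    // Linear 8-bit quantization of a weight tensor.
    interface QuantizedTensor {
      values: Uint8Array;  // one byte per weight
      min: number;         // value that byte 0 maps back to
      max: number;         // value that byte 255 maps back to
    }

    function quantize(weights: Float32Array): QuantizedTensor {
      let min = Infinity, max = -Infinity;
      for (const w of weights) { min = Math.min(min, w); max = Math.max(max, w); }
      const scale = (max - min) / 255 || 1;  // avoid divide-by-zero for constant tensors
      const values = new Uint8Array(weights.length);
      for (let i = 0; i < weights.length; i++) {
        values[i] = Math.round((weights[i] - min) / scale);
      }
      return {values, min, max};
    }

    function dequantize(q: QuantizedTensor): Float32Array {
      const scale = (q.max - q.min) / 255 || 1;
      const out = new Float32Array(q.values.length);
      for (let i = 0; i < q.values.length; i++) {
        out[i] = q.values[i] * scale + q.min;
      }
      return out;
    }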
I am aware of that research, but even with a 20x decrease in size, some models are still too big for the web (think about the world wide web, not just internet connections in the US).
Oftentimes researchers train huge models but don't think about model size (because they don't have to). We've seen ~200MB production models get down to ~4MB without losing much precision. I'm quite confident we'll continue that trend.
Don't forget that folks were saying this about the web when images / rich media were becoming prevalent!
200MB is still a small model, and 4MB is almost double the size of an average web page (including images). 10MB web pages are really bad, even more so for countries that are still developing their infrastructure.
I saw a talk on this paper a couple of years ago: https://arxiv.org/abs/1503.02531 The method is to train a smaller model on the predictions of a large model or ensemble. I'd be interested in knowing other techniques as well.
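For reference, the core of that paper (Hinton et al.'s distillation) is to soften the teacher's logits with a temperature and train the student to match the softened distribution; the paper also mixes in the usual hard-label loss. A rough sketch of the soft-target part over plain number arrays, outside any particular framework:

    // Distillation targets and loss in the spirit of arXiv:1503.02531.
    function softmaxWithTemperature(logits: number[], temperature: number): number[] {
      const scaled = logits.map(l => l / temperature);
      const maxLogit = Math.max(...scaled);               // numerical stability
      const exps = scaled.map(l => Math.exp(l - maxLogit));
      const sum = exps.reduce((a, b) => a + b, 0);
      return exps.map(e => e / sum);
    }

    // Cross-entropy between the teacher's softened predictions and the student's.
    function distillationLoss(teacherLogits: number[], studentLogits: number[],
                              temperature: number): number {
      const target = softmaxWithTemperature(teacherLogits, temperature);
      const pred = softmaxWithTemperature(studentLogits, temperature);
      return -target.reduce((acc, t, i) => acc + t * Math.log(pred[i] + 1e-12), 0);
    }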
This explanation from a Google blogpost helped me:
The API mimics the structure of TensorFlow and NumPy, with a delayed execution model for training (like TensorFlow), and an immediate execution model for inference (like NumPy). We have also implemented versions of some of the most commonly-used TensorFlow operations. With the release of deeplearn.js, we will be providing tools to export weights from TensorFlow checkpoints, which will allow authors to import them into web pages for deeplearn.js inference.
When possible, most DL frameworks take advantage of Nvidia's specialized cuDNN libraries, which often provide a 10x+ speedup (and are obviously not available via WebGL). So at least on the latest Nvidia cards, you will likely see a 10x slowdown, and probably even more.
It uses WebGL to access the GPU, so it's probably closer to other standalone libs (but I guess it will depend on the kernels the network uses and how well WebGL handles them).
What is so appealing about a delayed execution model? Why can't we just perform tensor math as in numpy, and let the library figure out the fastest way to do it behind the scenes? I think the whole "graph" approach is making things needlessly complicated.
This is a meme that keeps getting repeated, and I don't know why. TensorFlow, for example, despite several years of development, does little to no graph optimization, and for tons of tasks ends up much slower than PyTorch / Chainer / DyNet (TensorFlow is developing a "JIT compiler", but it is still in alpha).
It goes without saying that a framework that does define-by-run also knows the whole graph.
By the way, if you'd make your interface more general than deep learning, your library could be the start of an alternative for numpy/scipy on JS, and it would be even faster than the original Python version because it uses the GPU. Just a thought ...
(One small downside is that JS doesn't have the nice operator overloading that Python has, afaik)
We call ourselves deeplearn.js, but you can use it for general linear algebra! Our NDArrayMath layer is analogous to NumPy, and we support a large subset of it (many of the linear algebra kernels, broadcasting, axis reduction, etc.).
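For a taste of that layer, here's roughly what immediate-mode use looks like in the 0.x releases (a sketch from memory; exact class and method names may differ between versions):

    // Plain linear algebra with the NDArrayMath layer, no Graph involved.
    import {Array2D, NDArrayMathGPU} from 'deeplearn';

    const math = new NDArrayMathGPU();

    math.scope(() => {                      // scope() cleans up GPU textures
      const a = Array2D.new([2, 2], [1, 2, 3, 4]);
      const b = Array2D.new([2, 2], [0, 2, 4, 6]);

      const product = math.matMul(a, b);    // runs as a WebGL fragment shader
      const total = math.sum(product);      // reduction kernel

      console.log(product.getValues(), total.getValues());
    });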