The demo uses WebGL. If you can't get it to work, you can find a
recorded gif here(https://github.com/Erkaman/regl-cnn) that shows what it is supposed to look like.
This demo does handwritten digit recognition by evaluating a
Convolutional Neural Network on the GPU with WebGL. The network was
trained in TensorFlow by this script here(https://github.com/Erkaman/regl-cnn/blob/gh-pages/scripts/cr...), and was then
reimplemented on the GPU by hand with WebGL. The main purpose of the
demo was to demonstrate how our WebGL framework
regl(https://github.com/mikolalysenko/regl) can be used to greatly
simplify GPGPU programming in WebGL. The secondary purpose was to test
whether evaluating Deep Learning networks in WebGL is doable. To our
knowledge (but we may be wrong!), ours is the first
implementation to attempt GPU-accelerating neural networks with
WebGL, and we hope that it will provide a foundation
for people who, like us, wish to experiment with Deep Learning and
WebGL. The GPU implementation can be found here(https://github.com/Erkaman/regl-cnn/blob/gh-pages/src/gpu.js).
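To give a feel for what that looks like, here is a minimal regl GPGPU sketch (the shader and names are illustrative, not the actual code from gpu.js): one pass that reads an input texture and writes a per-channel ReLU into an output framebuffer.

```js
// Minimal regl GPGPU sketch (illustrative only): one full-screen pass that
// reads an input texture and writes ReLU(x) into an output framebuffer.
const regl = require('regl')()

const applyRelu = regl({
  vert: `
    precision mediump float;
    attribute vec2 position;
    void main () {
      gl_Position = vec4(position, 0.0, 1.0);
    }`,
  frag: `
    precision mediump float;
    uniform sampler2D inputTex;
    uniform vec2 resolution;
    void main () {
      vec2 uv = gl_FragCoord.xy / resolution;
      gl_FragColor = max(texture2D(inputTex, uv), 0.0); // per-channel ReLU
    }`,
  // a single triangle that covers the whole viewport
  attributes: { position: [[-1, -1], [4, -1], [-1, 4]] },
  uniforms: {
    inputTex: regl.prop('input'),
    resolution: regl.prop('resolution')
  },
  framebuffer: regl.prop('output'),
  count: 3
})

// usage: applyRelu({ input: someTexture, output: someFramebuffer, resolution: [w, h] })
```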
Note that this network will probably be slower than the corresponding
network implemented on the CPU. This is because of the overhead
associated with transferring data to and from the GPU. But in the
future we will attempt to implement more complex networks in the browser,
such as Neural Style(https://arxiv.org/pdf/1508.06576v2.pdf), and we think those will show a
significant speedup compared to the CPU.
Lastly, if anyone has any questions, I will be glad to answer them
here.
Nice work! I've also tried my hand at using WebGL to do deep learning in the browser (e.g. https://github.com/scienceai/neocortex/blob/master/src/lib/w...). The conclusion I came to was that there are just way too many limitations for it to really pay off. The need to encode everything in textures etc. limits the data shape and dimensionality, not to mention the complexity cost. If you can get more complex networks working I'll be really impressed!
MXnet.js [1] is an emscripten port of the base C++ framework. It runs entirely in the browser and works fairly well. The actual code produced by emscripten isn't that large, but the model weights can become an issue. I've tried to get emscripten working on TensorFlow, even just for forward prediction, but have pretty much gotten nowhere. Of course this doesn't let you harness GPU power.
Lots of cool potential applications of doing deep learning over the web are just waiting to be discovered and built.
I had plans to do that, but that means implementing all of TensorFlow's functionality in Javascript, which is a huge pain and lots of work. But if we restricted all networks to a strict subset of TensorFlow, it could be done, I think.
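To give a sense of the scale involved, here is a hypothetical sketch of just one such op, a naive single-channel valid-padding 2D convolution in plain JavaScript (names and data layout are illustrative, not from any existing port):

```js
// Hypothetical sketch of one op a "strict subset of TensorFlow" in JS would need:
// a naive single-channel 2D convolution with valid padding, row-major arrays.
function conv2dValid (input, inW, inH, kernel, kW, kH) {
  const outW = inW - kW + 1
  const outH = inH - kH + 1
  const out = new Float32Array(outW * outH)
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0
      for (let ky = 0; ky < kH; ky++) {
        for (let kx = 0; kx < kW; kx++) {
          sum += input[(y + ky) * inW + (x + kx)] * kernel[ky * kW + kx]
        }
      }
      out[y * outW + x] = sum
    }
  }
  return out
}
```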
Weird, my 7 with a slash is consistently (2/2 times) recognised as a 7. Maybe you need to write the slash without connecting it to the bottom with a little loop? Otherwise it's a real shame that these networks are only trained on American handwriting (is it even handwriting? most Americans seem to print-write).
I wrote it without the dash for most of my childhood, until I found out the dash was a possible way to write it. From then on, nobody ever mistook my 7's for 1's anymore. It was great, since I have really bad handwriting and I'd take all the help I could get not to get things wrong on tests. Zeroes get slashes, too, just for safety.
In my experience the dash is unusual (but not unheard of) in the US. I've heard people refer to it as a German or European 7. I've never seen the German 1 that is like an upside down V used by an American.
Just tried it with my usual length dash (extending 3/4 along the top line to the left, and about 1/4 out the other side) and it returned a 3. Used a smaller dash and it returned a 7. Exceptionally neat tool though and seriously impressive!
Interesting that it returns the wrong numbers when written in a 'non-American' style. Drawing a 1 is recognized as 7, and drawing a 7 (with 'strike-through') is recognized as 2 :)
Since the network has already been trained, I can't imagine using WebGL is worthwhile. It would be interesting to see either drawing samples from the equilibrium distribution or training in the browser.
It must have been quite painstaking to hand-code this neural network into WebGL shaders. It could have been easier if the browser vendors would just implement the WebCL standard.
This seems like a throwback to the pre-CUDA "GPGPU" era, when people were implementing numerical algorithms in OpenGL to be able to leverage GPUs for general purpose computing.
Yeah, it was really painful. In order to speed up the process, I first implemented the network on the CPU, so that I could quickly verify my GPU implementation.
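The check itself can be simple; something along these lines (a hypothetical sketch, not the actual code), where a layer's GPU output is read back and compared element-wise against the CPU reference within a tolerance:

```js
// Hypothetical verification sketch: compare a layer's GPU output (read back as
// a typed array) against the CPU reference implementation, element by element.
function assertLayerMatches (cpuOut, gpuOut, eps = 1e-3) {
  if (cpuOut.length !== gpuOut.length) {
    throw new Error('output size mismatch: ' + cpuOut.length + ' vs ' + gpuOut.length)
  }
  for (let i = 0; i < cpuOut.length; i++) {
    if (Math.abs(cpuOut[i] - gpuOut[i]) > eps) {
      throw new Error('mismatch at index ' + i + ': ' + cpuOut[i] + ' vs ' + gpuOut[i])
    }
  }
}
```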
I draw my 1's the way it's often displayed on screen, with a little tail at the top and a line at the base. Your system keeps detecting this as a 2. I've also had 3's detected as 2's.
I have always written 1's with a tail at the top. That is both the way it's taught in German schools, and how this program outputs the digit. Most of my attempts get recognized as 7.
I prefer my writing as unambiguous as possible, so I also like that way of writing it. Like someone else mentioned, "mousewritten" is not necessarily the same as handwritten, so that could be part of the problem.
Pretty cool. I realize it is mainly a proof-of-concept, but decided to try out variations of scribbles[1]. Does the code make explicit assumptions about orientation, or did the training data make certain assumptions?
The original MNIST training data made the assumption that the digits are not flipped. You could solve it by creating more training data from flipped copies of the original digits, but then you suddenly end up with an awful lot of data, and the training process would literally take days.
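For illustration only, mirroring a 28x28 MNIST-style digit is trivial; in the real pipeline this would happen in the TensorFlow training script, so this JavaScript sketch is purely hypothetical:

```js
// Sketch of the augmentation idea above: mirror a 28x28 grayscale digit
// (stored row-major) left-to-right so flipped examples can be added to the
// training set. Illustrative only.
function flipDigit (pixels, width = 28, height = 28) {
  const flipped = new Float32Array(width * height)
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      flipped[y * width + (width - 1 - x)] = pixels[y * width + x]
    }
  }
  return flipped
}
```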
Intriguing! I was under the illusion that WebGL was emit-only, and that JavaScript programs can't read back any output generated by WebGL. I must be wrong or out of date :) So how does the script do it?
With the right extensions, you can render to a texture which can be sampled from in subsequent draw calls.
The framebuffer object extensions that allow you to write to 32-bit RGBA textures are widely supported on desktops and mobiles (OpenGL ES 2). But floating point textures are not. So, shaders resort to encoding 32-bit floats in RGBA textures. This unfortunately isn't a simple cast. More here: http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-t...
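The packing from that post looks roughly like this (GLSL kept here as JavaScript string constants; a sketch for values in [0, 1), not the demo's actual shader code):

```js
// GLSL helpers for packing a float in [0, 1) into an 8-bit RGBA texture and
// unpacking it again, following the technique in the linked blog post.
const encodeFloatRGBA = `
  vec4 encodeFloatRGBA (float v) {
    vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;
    enc = fract(enc);
    enc -= enc.yzww * vec4(1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0, 0.0);
    return enc;
  }`

const decodeFloatRGBA = `
  float decodeFloatRGBA (vec4 rgba) {
    return dot(rgba, vec4(1.0, 1.0 / 255.0, 1.0 / 65025.0, 1.0 / 16581375.0));
  }`
```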
I am storing all data in textures, and you can read data from a texture by attaching it to an FBO, binding that FBO as the current framebuffer, and then reading the data back with `glReadPixels`.
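In plain WebGL calls, the readback looks roughly like this (variable names are illustrative):

```js
// Attach the texture holding the data to a framebuffer, bind it, and read the
// pixels back into a typed array on the JavaScript side.
const fbo = gl.createFramebuffer()
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo)
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0)

const pixels = new Uint8Array(width * height * 4)
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels)

// unbind so later draws go back to the canvas (the default framebuffer)
gl.bindFramebuffer(gl.FRAMEBUFFER, null)
```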
One way would be to crack captchas, e.g. make a Chrome extension that automatically fills them in. I remember someone implemented that at some point for MegaUpload.
Another way is to gather a data set of people writing digits with their mouse, and make a classifier that tells you if an input is realistic or not.
Of course you'd need to store previous user inputs to make sure someone is not just reusing the same digit over and over again.