Show HN: GPU-Accelerated Digit Recognition with WebGL (erkaman.github.io)
135 points by erkaman on Aug 17, 2016 | 57 comments



The demo uses WebGL, and if you can't get the demo to work, you can find a recorded gif here (https://github.com/Erkaman/regl-cnn) that shows what it is supposed to look like.

This demo does handwritten digit recognition by evaluating a Convolutional Neural Network on the GPU with WebGL. The network was trained in TensorFlow by this script (https://github.com/Erkaman/regl-cnn/blob/gh-pages/scripts/cr...), and was then reimplemented on the GPU by hand with WebGL. The main purpose of the demo was to demonstrate how our WebGL framework regl (https://github.com/mikolalysenko/regl) can be used to greatly simplify GPGPU programming in WebGL. The secondary purpose was to test whether evaluating Deep Learning networks in WebGL is doable. To our knowledge (but we may be wrong!), ours is the first implementation to attempt GPU-accelerating neural networks with WebGL, and we hope that it will provide a foundation for people who, like us, wish to experiment with Deep Learning and WebGL. The GPU implementation can be found here: https://github.com/Erkaman/regl-cnn/blob/gh-pages/src/gpu.js
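
To give a flavour of what GPGPU with regl means here: each layer is a fragment shader that renders into a framebuffer, with the layer's inputs stored in textures. Below is a minimal sketch of such a pass (not the demo's actual code; the texture sizes and shader bodies are made up, and the real layers live in src/gpu.js):

    // A minimal regl GPGPU pass: draw a full-screen triangle into a
    // framebuffer, computing one output value per pixel in the fragment
    // shader. Sketch only.
    const regl = require('regl')();

    const src = regl.texture({ width: 28, height: 28 });      // layer input
    const dst = regl.framebuffer({ width: 28, height: 28 });  // layer output

    const pass = regl({
      framebuffer: dst,
      vert: `
        attribute vec2 position;
        varying vec2 uv;
        void main () {
          uv = 0.5 * (position + 1.0);
          gl_Position = vec4(position, 0.0, 1.0);
        }`,
      frag: `
        precision mediump float;
        varying vec2 uv;
        uniform sampler2D src;
        void main () {
          // A real layer would combine several texture reads with weights.
          gl_FragColor = texture2D(src, uv);
        }`,
      attributes: { position: [[-4, -4], [4, -4], [0, 4]] }, // covers the screen
      uniforms: { src: src },
      count: 3
    });

    pass(); // runs the fragment shader once per output pixel

The output framebuffer can then be read back, or bound as the input texture of the next pass, which is how the layers get chained together.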

Note that this network will probably be slower than the corresponding network implemented on the CPU, because of the overhead associated with transferring data to and from the GPU. But in the future we will attempt to implement more complex networks in the browser, such as Neural Style (https://arxiv.org/pdf/1508.06576v2.pdf), and then we think we will see a significant speedup compared to the CPU.

Lastly, if anyone has any questions, I will be glad to answer them here.


Nice work! I've also tried my hand at using WebGL to do deep learning in the browser (e.g. https://github.com/scienceai/neocortex/blob/master/src/lib/w...). The conclusion I came to was that there are just way too many limitations for it to really pay off. The need to encode everything in textures etc. limits the data shape and dimensionality, not to mention the complexity cost. If you can get more complex networks working I'll be really impressed!

MXnet.js [1] is an emscripten port of the base C++ framework. It runs entirely in the browser and works fairly well. The actual code produced by emscripten isn't that large, but the model weights can become an issue. I've tried to get emscripten working on TensorFlow, even just for forward prediction, but have pretty much gotten nowhere. Of course this doesn't let you harness GPU power.

Lots of cool potential applications of doing deep learning over the web are just waiting to be discovered and built.

[1] https://github.com/dmlc/mxnet.js


Wow, so there are other people who have tried doing deep learning in WebGL! But I will also give it a try, and see if I can do it better than you.


Very well done, but not the first: 2 years ago Jetpac released DeepBeliefSDK (https://github.com/jetpacapp/DeepBeliefSDK). They were acquired by Google and development stopped. But the demo has been ported (http://waylonflinn.github.io/DeepBeliefSDK/) using a newer toolkit called weblas (https://github.com/waylonflinn/weblas). Reddit discussion about weblas at (https://www.reddit.com/r/MachineLearning/comments/41luif/gpu...), which has some more links and thoughts about this kind of thing. One more toolkit, written during a hackathon and still getting started, is gpu.js (https://github.com/gpujs/gpu.js/).


Are there any plans to make the porting automatic with some sort of compile to JS?


I had plans to do that, but that means implementing all of TensorFlow's functionality in Javascript, which is a huge pain and lots of work. But if we restricted all networks to a strict subset of TensorFlow, it could be done, I think.


I grew up in Germany and my 7s (written with the slash through it) is recognized as a 3 or 8 :\


Weird, my 7 with a slash is consistently (2/2 times) recognised as a 7. Maybe you need to write the slash without connecting it to the bottom with a little loop? Otherwise it's a real shame that these networks are only trained on American handwriting (is it even handwriting? most Americans seem to print-write).


Hmmm... I don't connect with a loop. See the following: http://i.imgur.com/H7sYLNn.png I'd love to see what yours looks like.


Not the person you were talking to, but here's mine:

http://imgur.com/a/7OIgK

(American here). I did get it saying 3 a few times, but 7 most of the time.


When I handwrite a 7, it has a dash through the middle. Everyone I know does this. Your app doesn't like these; it sees a 3.


Really?? It is the opposite here. No one really uses the dash. I like wee differences like this.


I wrote it without the dash for most of my childhood, until I found out the dash was a possible way to write it. From then on, nobody ever mistook my 7's for 1's anymore. It was great, since I have really bad handwriting and I'd take all the help I could get not to get things wrong on tests. Zeroes get slashes, too, just for safety.


> It is the opposite here. No one really uses the dash.

Who is no one? Where is here?


In my experience the dash is unusual (but not unheard of) in the US. I've heard people refer to it as a German or European 7. I've never seen the German 1 that is like an upside down V used by an American.


I just tried a few 7s with dashes and it got them all, how big a dash do you use?


Just tried it with my usual length dash (extending 3/4 along the top line to the left, and about 1/4 out the other side) and it returned a 3. Used a smaller dash and it returned a 7. Exceptionally neat tool though and seriously impressive!


I use a dash too, but it recognises it OK for me


And when I handwrite a 1 with a serif at the top, it sees it as a 7

https://en.wikipedia.org/wiki/Regional_handwriting_variation...


For me, it returns a 2.


I would just assume you wrote a 7, realised it was wrong, and crossed it out.

But yes, the recognition isn't 100% accurate. There are quite a few simple "0" shapes that it doesn't get right.


I keep getting incorrect results for 4's in this style: http://imgur.com/BIlhnN9

However if I draw 4's like this: http://imgur.com/akifdRs I get the correct result.

Is this a limitation with the training set?


That kind of 4 doesn't exist in the training set. This works: http://imgur.com/gallery/xnFnF


When I draw 4s the second way, I keep getting 9... which is odd, because I assume this was trained on the usual MNIST?


It also had some issues recognizing 1.


Interesting that it returns the wrong numbers when written in a 'non-American' style. Drawing a 1 is recognized as 7, and drawing a 7 (with a strike-through) is recognized as 2 :)

(https://en.wikipedia.org/wiki/Regional_handwriting_variation...)


Since the network has already been trained, I can't imagine using WebGL is worthwhile. It would be interesting to see either drawing samples from the equilibrium distribution or training in the browser.


It must have been quite painstaking to hand-code this neural network into WebGL shaders. It would have been easier if the browser vendors had just implemented the WebCL standard.

This seems like a throwback to the pre-CUDA "GPGPU" era, when people were implementing numerical algorithms in OpenGL to be able to leverage GPUs for general purpose computing.


Yeah, it was really painful. In order to speed up the process, I first implemented the network on the CPU, so that I could quickly verify my GPU implementation.
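
For anyone curious, the CPU reference can be as dumb as a quadruple loop. Something along these lines (a hypothetical helper, not the demo's actual code) is enough to diff against the GPU output:

    // Naive "valid" 2D convolution on the CPU, useful as a reference
    // implementation to verify GPU results against. Hypothetical sketch:
    // `image` and `kernel` are row-major Float32Arrays.
    function conv2dValid(image, w, h, kernel, kw, kh) {
      const ow = w - kw + 1;
      const oh = h - kh + 1;
      const out = new Float32Array(ow * oh);
      for (let y = 0; y < oh; y++) {
        for (let x = 0; x < ow; x++) {
          let sum = 0.0;
          for (let ky = 0; ky < kh; ky++) {
            for (let kx = 0; kx < kw; kx++) {
              sum += image[(y + ky) * w + (x + kx)] * kernel[ky * kw + kx];
            }
          }
          out[y * ow + x] = sum;
        }
      }
      return out;
    }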


I draw my 1's the way it's often displayed on screen, with a little tail at the top and a line at the base. Your system keeps detecting this as a 2. I've also had 3's detected as 2's.


Yeah, it seems the MNIST dataset is not that great, at least for recognising real handwritten digits.


That's bad news for me, because I also plan to use the MNIST data in a project of mine. :)


Well, handwritten is not mousewritten ;) Have you tried real handwritten digits?


I have always written 1's with a tail at the top. That is both the way it's taught in German schools, and how this program outputs the digit. Most of my attempts get recognized as 7.


I prefer my writing as unambiguous as possible, so I also like that way of writing it. Like someone else mentioned, "mousewritten" is not necessarily the same as handwritten, so that could be part of the problem.


Add to that list, when I draw an 8, it's also recognised as a 2.


It may have some issues with 9, depending on how you write it. https://i.imgur.com/URYuOSO.png


Pretty cool. I realize it is mainly a proof-of-concept, but decided to try out variations of scribbles[1]. Does the code make explicit assumptions about orientation, or did the training data make certain assumptions?

[1] http://imgur.com/a/HCFzy


The original MNIST training data assumes that the digits are not flipped. You could address that by creating more training data from flipped copies of the original digits, but then you suddenly end up with an awful lot of data, and the training process would literally take days.
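
For concreteness, flipping a digit image is just reversing each row of pixels. A hypothetical sketch in JavaScript (the actual training happens in the TensorFlow script, not in the browser):

    // Horizontally flip a row-major grayscale image, e.g. a 28x28 MNIST
    // digit stored in a Float32Array. Hypothetical augmentation helper.
    function flipHorizontal(img, width, height) {
      const out = new Float32Array(img.length);
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          out[y * width + (width - 1 - x)] = img[y * width + x];
        }
      }
      return out;
    }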


Drew 8s skewed slightly to the left and ended up with "6" or "3".


Intriguing! I was under the illusion that WebGL was emit-only, and that Javascript programs can't read back any output generated by WebGL. I must be wrong or out of date :) So how does the script do it?


With the right extensions, you can render to a texture which can be sampled from in subsequent draw calls.

The framebuffer object extensions that allow you to write to 32-bit RGBA textures are widely supported on desktops and mobiles (OpenGL ES 2). But floating point textures are not. So, shaders resort to encoding 32-bit floats in RGBA textures. This unfortunately isn't a simple cast. More here: http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-t...
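
The packing itself looks something like this, embedded as GLSL in a shader string (the standard trick from the linked post; a sketch, not this demo's exact code):

    // GLSL helpers for packing a float in [0, 1) into an RGBA8 render
    // target, to be spliced into a fragment shader. Precision is limited
    // but workable.
    const packGLSL = `
      vec4 encodeFloat(float v) {
        vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;
        enc = fract(enc);
        enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);
        return enc;
      }
      float decodeFloat(vec4 rgba) {
        return dot(rgba, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));
      }
    `;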


I am storing all data in textures. You can read the data back from a texture by attaching it to an FBO, binding that FBO, and then reading the data with `glReadPixels`.
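
In raw WebGL terms, the readback looks roughly like this (sketch; assumes an existing context `gl`, an RGBA texture `texture`, and its `width`/`height`):

    // Read a texture back to JavaScript by attaching it to a framebuffer
    // and calling readPixels.
    const fbo = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
    gl.framebufferTexture2D(
      gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);

    const pixels = new Uint8Array(width * height * 4);
    gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);

    gl.bindFramebuffer(gl.FRAMEBUFFER, null); // back to the default framebuffer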


I have trouble with the 9's being returned as 3's


Same here. I thought for a second that my handwriting was THAT bad after years of using a keyboard.


Very nice! For some reason I thought it was for letters, and was wondering why it thought my 'h' looked like a '6' heh


I've never gotten a correct result.

7 returned 9, 3 returned 8, 1 returned 9...


666 returns 4


When you break the contract of a function (a single digit here), you might as well be playing dice. https://xkcd.com/221/


Might be helpful in a Captcha-type setup.


What are you envisioning? It would seem that drawing digits is fairly easy for bots.


One way would be to crack captchas, e.g. make a Chrome extension that automatically fills them in. I remember someone implemented that at some point for MegaUpload.

Another way is to gather a data set of people writing digits with their mouse, and make a classifier that tells you if an input is realistic or not. Of course you'd need to store previous user inputs to make sure someone is not just reusing the same digit over and over again.


I have tried this. It always returns 0.


Hmm, may be a problem with your graphics card. What's your graphics card and browser?


Getting the same issue with Chrome 52, Windows 10 and an R9 Fury.

[.Offscreen-For-WebGL-043C4318]GL ERROR :GL_INVALID_OPERATION : glReadPixels: demo:1

Issue with AMD cards, maybe?


Yeah, reading from floating point textures with `glReadPixels` is not really supported on some cards or browsers, it seems.


I just checked chrome://gpu and it shows:

Canvas: Software only, hardware acceleration unavailable

and some other 'Problems detected' so the problem is probably on my side.

Cool project though!



