The demo uses WebGL. If you can't get it to work, you can find a
recorded gif here(https://github.com/Erkaman/regl-cnn) that shows what it is supposed to look like.
This demo does handwritten digit recognition by evaluating a
Convolutional Neural Network on the GPU with WebGL. The network was
trained in TensorFlow by this script here(https://github.com/Erkaman/regl-cnn/blob/gh-pages/scripts/cr...), and was then
reimplemented on the GPU by hand with WebGL. The main purpose of the
demo was to demonstrate how our WebGL framework
regl(https://github.com/mikolalysenko/regl) can be used to greatly
simplify GPGPU programming in WebGL. The secondary purpose was to test
whether evaluating Deep Learning networks in WebGL is doable. To our
knowledge (but we may be wrong!), ours is the first
implementation to attempt GPU-accelerating neural networks with
WebGL, and we hope that it will provide a foundation
for people who, like us, wish to experiment with Deep Learning and
WebGL. The GPU implementation can be found here(https://github.com/Erkaman/regl-cnn/blob/gh-pages/src/gpu.js).
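To give a feel for what that looks like, here is a minimal regl GPGPU sketch (the shader and names are illustrative, not the actual code from gpu.js): one pass that reads an input texture and writes a per-channel ReLU into an output framebuffer.

```js
// Minimal regl GPGPU sketch (illustrative only): one full-screen pass that
// reads an input texture and writes ReLU(x) into an output framebuffer.
const regl = require('regl')()

const applyRelu = regl({
  vert: `
    precision mediump float;
    attribute vec2 position;
    void main () {
      gl_Position = vec4(position, 0.0, 1.0);
    }`,
  frag: `
    precision mediump float;
    uniform sampler2D inputTex;
    uniform vec2 resolution;
    void main () {
      vec2 uv = gl_FragCoord.xy / resolution;
      gl_FragColor = max(texture2D(inputTex, uv), 0.0); // per-channel ReLU
    }`,
  // a single triangle that covers the whole viewport
  attributes: { position: [[-1, -1], [4, -1], [-1, 4]] },
  uniforms: {
    inputTex: regl.prop('input'),
    resolution: regl.prop('resolution')
  },
  framebuffer: regl.prop('output'),
  count: 3
})

// usage: applyRelu({ input: someTexture, output: someFramebuffer, resolution: [w, h] })
```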
Note that this network will probably be slower than the corresponding
network implemented on the CPU. This is because of the overhead
associated with transferring data to and from the GPU. But in the
future we will attempt to implement more complex networks in the browser,
such as Neural Style(https://arxiv.org/pdf/1508.06576v2.pdf), and we think those will show a
significant speedup compared to the CPU.
Lastly, if anyone has any questions, I will be glad to answer them
here.
Nice work! I've also tried my hand at using WebGL to do deep learning in the browser (e.g. https://github.com/scienceai/neocortex/blob/master/src/lib/w...). The conclusion I came to was that there are just way too many limitations for it to really pay off. The need to encode everything in textures etc. limits the data shape and dimensionality, not to mention the complexity cost. If you can get more complex networks working I'll be really impressed!
MXnet.js [1] is an emscripten port of the base C++ framework. It runs entirely in the browser and works fairly well. The actual code produced by emscripten isn't that large, but the model weights can become an issue. I've tried to get emscripten working on TensorFlow, even just for forward prediction, but have pretty much gotten nowhere. Of course this doesn't let you harness GPU power.
Lots of cool potential applications of doing deep learning over the web are just waiting to be discovered and built.
I had plans to do that, but that means implementing all of TensorFlow's functionality in Javascript, which is a huge pain and lots of work. But if we restricted all networks to a strict subset of TensorFlow, it could be done, I think.
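To give a sense of the scale involved, here is a hypothetical sketch of just one such op, a naive single-channel valid-padding 2D convolution in plain JavaScript (names and data layout are illustrative, not from any existing port):

```js
// Hypothetical sketch of one op a "strict subset of TensorFlow" in JS would need:
// a naive single-channel 2D convolution with valid padding, row-major arrays.
function conv2dValid (input, inW, inH, kernel, kW, kH) {
  const outW = inW - kW + 1
  const outH = inH - kH + 1
  const out = new Float32Array(outW * outH)
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0
      for (let ky = 0; ky < kH; ky++) {
        for (let kx = 0; kx < kW; kx++) {
          sum += input[(y + ky) * inW + (x + kx)] * kernel[ky * kW + kx]
        }
      }
      out[y * outW + x] = sum
    }
  }
  return out
}
```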
Weird, my 7 with a slash is consistently (2/2 times) recognised as a 7. Maybe you need to write the slash without connecting it to the bottom with a little loop? Otherwise it's a real shame that these networks are only trained on American handwriting (is it even handwriting? most Americans seem to print-write).
I wrote it without the dash for most of my childhood, until I found out the dash was a possible way to write it. From then on, nobody ever mistook my 7's for 1's anymore. It was great, since I have really bad handwriting and I'd take all the help I could get not to get things wrong on tests. Zeroes get slashes, too, just for safety.
In my experience the dash is unusual (but not unheard of) in the US. I've heard people refer to it as a German or European 7. I've never seen the German 1 that is like an upside down V used by an American.
Just tried it with my usual length dash (extending 3/4 along the top line to the left, and about 1/4 out the other side) and it returned a 3. Used a smaller dash and it returned a 7. Exceptionally neat tool though and seriously impressive!
Interesting that it returns the wrong numbers when written in a 'non-American' style. Drawing a 1 is recognized as 7, and drawing a 7 (with 'strike-through') is recognized as 2 :)
Since the network has already been trained, I can't imagine using WebGL is worthwhile. It would be interesting to see either drawing samples from the equilibrium distribution or training in the browser.
It must have been quite painstaking to hand-code this neural network into WebGL shaders. It could have been easier if the browser vendors would just implement the WebCL standard.
This seems like a throwback to the pre-CUDA "GPGPU" era, when people were implementing numerical algorithms in OpenGL to be able to leverage GPUs for general purpose computing.
Yeah, it was really painful. In order to speed up the process, I first implemented the network on the CPU, so that I could quickly verify my GPU implementation.
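The check itself can be simple; something along these lines (a hypothetical sketch, not the actual code), where a layer's GPU output is read back and compared element-wise against the CPU reference within a tolerance:

```js
// Hypothetical verification sketch: compare a layer's GPU output (read back as
// a typed array) against the CPU reference implementation, element by element.
function assertLayerMatches (cpuOut, gpuOut, eps = 1e-3) {
  if (cpuOut.length !== gpuOut.length) {
    throw new Error('output size mismatch: ' + cpuOut.length + ' vs ' + gpuOut.length)
  }
  for (let i = 0; i < cpuOut.length; i++) {
    if (Math.abs(cpuOut[i] - gpuOut[i]) > eps) {
      throw new Error('mismatch at index ' + i + ': ' + cpuOut[i] + ' vs ' + gpuOut[i])
    }
  }
}
```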
I draw my 1's the way it's often displayed on screen, with a little tail at the top and a line at the base. Your system keeps detecting this as a 2. I've also had 3's detected as 2's.
I have always written 1's with a tail at the top. That is both the way it's taught in German schools, and how this program outputs the digit. Most of my attempts get recognized as 7.
I prefer my writing as unambiguous as possible, so I also like that way of writing it. Like someone else mentioned, "mousewritten" is not necessarily the same as handwritten, so that could be part of the problem.
Pretty cool. I realize it is mainly a proof-of-concept, but decided to try out variations of scribbles[1]. Does the code make explicit assumptions about orientation, or did the training data make certain assumptions?
The original MNIST training data made the assumption that the digits are not flipped. You could solve it by creating more training data from flipped copies of the original digits, but then you suddenly end up with an awful lot of data, and the training process would literally take days.
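For illustration only, mirroring a 28x28 MNIST-style digit is trivial; in the real pipeline this would happen in the TensorFlow training script, so this JavaScript sketch is purely hypothetical:

```js
// Sketch of the augmentation idea above: mirror a 28x28 grayscale digit
// (stored row-major) left-to-right so flipped examples can be added to the
// training set. Illustrative only.
function flipDigit (pixels, width = 28, height = 28) {
  const flipped = new Float32Array(width * height)
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      flipped[y * width + (width - 1 - x)] = pixels[y * width + x]
    }
  }
  return flipped
}
```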
Intriguing! I was under the illusion that WebGL was emit-only, and that JavaScript programs can't read back any output generated by WebGL. I must be wrong or out of date :) So how does the script do it?
With the right extensions, you can render to a texture which can be sampled from in subsequent draw calls.
The framebuffer object extensions that allow you to write to 32-bit RGBA textures are widely supported on desktops and mobiles (OpenGL ES 2). But floating point textures are not. So, shaders resort to encoding 32-bit floats in RGBA textures. This unfortunately isn't a simple cast. More here: http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-t...
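The packing from that post looks roughly like this (GLSL kept here as JavaScript string constants; a sketch for values in [0, 1), not the demo's actual shader code):

```js
// GLSL helpers for packing a float in [0, 1) into an 8-bit RGBA texture and
// unpacking it again, following the technique in the linked blog post.
const encodeFloatRGBA = `
  vec4 encodeFloatRGBA (float v) {
    vec4 enc = vec4(1.0, 255.0, 65025.0, 16581375.0) * v;
    enc = fract(enc);
    enc -= enc.yzww * vec4(1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0, 0.0);
    return enc;
  }`

const decodeFloatRGBA = `
  float decodeFloatRGBA (vec4 rgba) {
    return dot(rgba, vec4(1.0, 1.0 / 255.0, 1.0 / 65025.0, 1.0 / 16581375.0));
  }`
```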
I am storing all data in textures, and you can read data from a texture by attaching it to an FBO, binding that FBO as the current framebuffer, and then reading the data back with `glReadPixels`.
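In plain WebGL calls, the readback looks roughly like this (variable names are illustrative):

```js
// Attach the texture holding the data to a framebuffer, bind it, and read the
// pixels back into a typed array on the JavaScript side.
const fbo = gl.createFramebuffer()
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo)
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0)

const pixels = new Uint8Array(width * height * 4)
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels)

// unbind so later draws go back to the canvas (the default framebuffer)
gl.bindFramebuffer(gl.FRAMEBUFFER, null)
```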
One way would be to crack captchas, e.g. make a Chrome extension that automatically fills them in. I remember someone implemented that at some point for MegaUpload.
Another way is to gather a data set of people writing digits with their mouse, and make a classifier that tells you if an input is realistic or not.
Of course you'd need to store previous user inputs to make sure someone is not just reusing the same digit over and over again.