Scrolling HN, I saw the two words I'm always wary of - "from scratch". Mostly because I'll click the link hoping to learn the mathematics behind a particular algorithm, only to see they've imported sklearn and skipped over all the explaining of how things ACTUALLY get done. Not that these types of tutorials don't have their place, but it's irksome to see something called "from scratch" when most of the hard part has been done for you.
With that, thank you Victor. Specifically because you did not do this at all and instead wrote a very easy-to-follow guide. I think this type of learning material will be very useful for CS and mathematics. The idea that very complicated algorithms should be explicitly implemented and then walked through, rather than left as symbols in a white paper, will help make the mathematics of CS more accessible to everyone.
And to anyone bringing up numpy, it's at a level of "prepackaged" I'm fine with. I'm not going to raise my own pigs and chickens to make a breakfast burrito, but saying I made it from scratch by microwaving a frozen one isn't going to cut it either. Numpy is like the basic ingredients of the recipe. While something like sklearn or tensorflow is perfectly acceptable, I wouldn't say that's the best method for learning CNNs.
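To make the "numpy as basic ingredients" point concrete, here's a rough sketch of what "from scratch with numpy" looks like in practice: the core convolution loop written out explicitly, with numpy only supplying arrays and elementwise math (a toy example, not the tutorial's actual code):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation - the core op a from-scratch CNN implements.

    numpy gives us arrays and elementwise multiply/sum; the sliding-window
    logic itself is spelled out by hand, which is the whole point.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # a simple box-blur filter
print(conv2d(image, kernel).shape)  # (3, 3) - "valid" output is smaller than the input
```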
Thanks for this writeup! Having moved into Data Science as a primary career (after 25 years architecting software), it's frustrating to try to decipher the various symbology used in Physics, Economics, Computer Science whitepapers.
I need a Whitepaper Rosetta Stone. And this article is one such example.
I agree with you overall. This is purely speculative, but it seems the rise of pre-packaged ML solutions has shifted the meaning of "from scratch" toward "with sklearn". It's easy to feel like a short python script using sklearn is "from scratch" when you were using a WYSIWYG solution before.
That said, the book "Data Science from Scratch" is great, and I'd recommend it to those looking for a deeper understanding than just "import sklearn".
“Each of the 4 filters in the conv layer produces a 26x26 output, so stacked together they make up a 26x26x8 volume. All of this happens because of 3 × 3 (filter size) × 8 (number of filters) = only 72 weights!”
There seems to be a mismatch here between 4 and 8. Given that the weight count is computed as 3 × 3 × 8 = 72, the 4 is probably the typo.
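The arithmetic in the quoted passage can be checked directly. Assuming the article's MNIST-style setup (28x28 input, 3x3 filters, valid convolution - shapes taken from the quote, not verified against the article's code):

```python
import numpy as np

# Assumed shapes: 28x28 input, 3x3 filters, 8 filters (per the quoted passage).
input_size, filter_size, num_filters = 28, 3, 8

# Valid convolution shrinks each spatial dimension by (filter_size - 1).
output_size = input_size - filter_size + 1
filters = np.random.randn(num_filters, filter_size, filter_size)

print(output_size)                              # 26 - each filter yields a 26x26 map
print((output_size, output_size, num_filters))  # (26, 26, 8) - the stacked volume
print(filters.size)                             # 72 - total weights, 3 * 3 * 8
```

So the output volume is 26x26x8, consistent with 8 filters rather than 4.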
This is a great tutorial. However, every time I see RNN/CNN it's always applied to some video stream or set of images. I'd really like to find a similar tutorial applied to event logs or other text-based input. Does anyone have a good link for that?
> If you trained a network to detect dogs, you’d want it to be able to detect a dog regardless of where it appears in the image. Imagine training a network that works well on a certain dog image, but then feeding it a slightly shifted version of the same image. The dog would not activate the same neurons, so the network would react completely differently!
But CNNs only deal with translations. What if the image of the dog is rotated?
True! If dealing with stuff like rotations is a concern, you could augment the training set by applying small random transforms to it (like rotations, cropping, scaling, color adjustment, etc).
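A minimal sketch of that augmentation idea, using only numpy. For simplicity this uses axis-aligned rotations and flips; real pipelines use small-angle rotations, random crops, color jitter, etc. (illustrative only, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Apply one random transform: maybe a horizontal flip, then a 90-degree rotation.

    The label stays the same - a rotated dog is still a dog - so the network
    sees more viewpoints of each class during training.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)
    k = rng.integers(0, 4)  # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

image = np.arange(16, dtype=float).reshape(4, 4)
batch = [augment(image) for _ in range(8)]

print(all(a.shape == image.shape for a in batch))           # True: shape preserved
print(all(np.isclose(a.sum(), image.sum()) for a in batch)) # True: pixels only move
```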
Well, human children do train their vision in a lot of different ways while playing, and probably develop a mechanism for dealing with all kinds of rotated objects very early.
For some objects, though, even adults aren't instantly good: reading text upside down is initially very hard, but with some practice it can be done. I've met teachers who can do it fluently from years of tutoring people across a desk.