Introduction to Convolutional Neural Networks (serokell.io)
150 points by aroccoli on Aug 4, 2021 | 22 comments



20-odd years ago I worked on someone else's helpdesk doing application and PC support. Sometimes the calls that came in were a bit more interesting than "Extra Term is blinking at me" or "Sophos ate my Word doc".

A call (ticket) came in along the lines of "Need help with a neural network I am developing in Excel". I bit. I showed the end user (who was orders of magnitude cleverer than me) a bit about VBA: some reasonably good habits like Option Explicit and making sure all eventualities are covered in If clauses; keep your inputs, workings, and outputs separate; sum north/south and east/west and compare. Document the bloody thing! If I recall correctly, it was a Hopfield thingie, so I'm pretty off topic here.

The next call was for help with an Access database with fifty-odd tables all linked to each other in a way that must surely have been designed to invoke Cthulhu. I deleted it and we started again! My call notes were odd enough to get a mention at the next ops meeting.

The product is still flying so all good.

Convoluted ... what?


> If I recall correctly, it was a Hopfield thingie, so I'm pretty off topic here.

Actually, you’re spot on. Hopfield networks were once popular, and are now popular again: https://arxiv.org/abs/2008.02217

(Hopularity leads to popularity, as they say.)

I wish there was some way to track down that spreadsheet. :) It’d be neat to read. Thanks for sharing.


> fifty odd tables all linked to each other in a way that must surely be designed to invoke Cthulhu.

Please add a warning that this might make people spit out their coffee.


Wow, awesome, we’re so spoiled now


Has anyone tried searching for new basic operations like convolution or pooling?

We've been using these methods for years, and I doubt that the first major development in NN vision (convolution) is the best method possible.

Consider the extreme case of searching over all (differentiable) mathematical operations to see if something really novel can be discovered.

How feasible would this be?


Multi-head self-attention seems to be the new trendy architectural primitive.

I don't know how feasible it would be. I guess you could take a set of base operations (matrix multiplication, softmax, etc.), randomly generate feature transformations, and check whether any of them yield good features (stick a linear readout at the end and test the performance on some downstream tasks).

That would be an unguided search (roughly sketched below); I guess you could also try something like a genetic algorithm instead. Also, it uses neural network training as an inner-loop step, so it would probably be too expensive. It would be better if you could somehow get the gradient w.r.t. the tentative operation.
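Something like this, maybe. A toy sketch of that unguided search: the base ops, the toy task, and all the names are mine, and a real version would train a network in the inner loop rather than fit a cheap least-squares readout:

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def random_transform(dim, depth=3):
        # Compose `depth` randomly chosen base operations into one candidate
        # feature transform. Random matmul weights are fixed at composition
        # time so the transform is deterministic.
        ops = []
        for _ in range(depth):
            kind = rng.integers(3)
            if kind == 0:
                W = rng.normal(size=(dim, dim)) / np.sqrt(dim)
                ops.append(lambda x, W=W: x @ W)          # linear map
            elif kind == 1:
                ops.append(lambda x: np.maximum(x, 0.0))  # relu
            else:
                ops.append(softmax)
        def transform(x):
            for op in ops:
                x = op(x)
            return x
        return transform

    def readout_score(feats, y):
        # Fit a least-squares linear readout on the features; report R^2.
        w, *_ = np.linalg.lstsq(feats, y, rcond=None)
        resid = y - feats @ w
        return 1.0 - resid.var() / y.var()

    # Toy downstream task: predict a nonlinear function of the inputs.
    X = rng.normal(size=(512, 16))
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

    scores = [readout_score(random_transform(16)(X), y) for _ in range(100)]
    print("best R^2 over 100 random transforms:", max(scores))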

The problem is that training NNs is nontrivial, and you might need things like BatchNorm and residual connections to make training stable, so you'd somehow have to search for a good architecture around each candidate operation as well.


This is basically what the field of neural architecture search tries to do. Here’s a good (somewhat technical) introduction: https://lilianweng.github.io/lil-log/2020/08/06/neural-archi...


I think NAS is a bit higher level than what the OP had in mind - NAS isn't usually used to search for fundamental operations like self-attention or convolution. But I guess you could probably adapt it quite easily.


I believe some algorithms, like AutoML-Zero, search on the level of mathematical operations.


At least there is work on greatly generalizing convolutions. They're much more broadly applicable (in neural networks) to very differently structured data than their standard form suggests. (The "in neural networks" qualifier is there because quite a bit of this has been understood about the mathematical operation of convolution for a long, long time.)

Some recent developments:

* https://arxiv.org/abs/2010.03633 (disclosure: I'm one of the authors)

* https://arxiv.org/abs/2012.06333


Group CNNs are a drop-in replacement for convolution layers.
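For intuition, here's a minimal sketch of just the first ("lifting") layer of a group CNN for the p4 group (translations plus 90-degree rotations); the function name and the toy check are mine, not from the papers above:

    import numpy as np
    from scipy.signal import correlate2d

    def p4_lifting_conv(image, kernel):
        # Correlate the image with each 90-degree rotation of the kernel,
        # producing one feature map per rotation: output shape (4, H', W').
        return np.stack([correlate2d(image, np.rot90(kernel, k), mode="valid")
                         for k in range(4)])

    img = np.random.default_rng(1).normal(size=(8, 8))
    ker = np.random.default_rng(2).normal(size=(3, 3))

    out = p4_lifting_conv(img, ker)
    out_rot = p4_lifting_conv(np.rot90(img), ker)

    # Equivariance: rotating the input rotates each feature map and
    # cyclically shifts the rotation channel.
    assert np.allclose(out_rot[1], np.rot90(out[0]))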


Another "beginner" intro that starts with describing FCs and neurons, and doesn't tell why we need NNs in the first place.

Although deep NNs are not very interpretable, there are good intuitions behind the designs. These kinds of articles only make deep learning more mysterious.


I was thinking the same; there are so many articles explaining the basics.

For me it would be more helpful to start off with a real-life scenario where the mentioned method can be applied and even might excel compared to other methods; bonus points if you also explain what properties of the method make it so very well-suited for the specific real-life scenario.

There are so many methods in data science / machine learning and from what I remember from my university days one of the difficult tasks was to know when to use which method, depending on the properties of your data and on what you want to achieve; additionally, sometimes you also need to optimize/improve the method's hyperparameters and that's almost a whole separate discipline by itself.

Nonetheless, the posted article contains a lot of valuable information for a beginner, so it's definitely a good start.


Yes. In most problems I face, I find gradient boosting is at least as good as, if not better than, any neural network, and much easier to implement and explain.
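To illustrate the "easier to implement" part, a complete baseline fits in a few lines; a sketch using scikit-learn, where the dataset and default hyperparameters are just placeholders:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # A small built-in tabular dataset stands in for a real problem.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))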


So there are hundreds of posts on CNNs; what's special about this one?


My hypotheses are:

1. Some companies have found a way to promote this type of blog article past a certain threshold so it stays on the front page long enough for... profits?

2. The demographics of HN have changed substantially in recent times, so that copypasta articles which don't add anything new over existing, better sources are actually valuable to them.

Edit: typo.


I'm inclined to agree. I've been meaning to take the time to implement a CNN from scratch, so I went right to this article hoping it would have some code, but no: just the same content over again.
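For what it's worth, the core forward pass is short enough to write out; here's a rough, unoptimized sketch (stride 1, "valid" padding, no autograd; all names are mine):

    import numpy as np

    def conv2d_forward(x, w, b):
        # x: (C_in, H, W) input, w: (C_out, C_in, KH, KW) filters, b: (C_out,)
        c_out, c_in, kh, kw = w.shape
        _, h, wd = x.shape
        out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
        for o in range(c_out):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    # Dot product of the o-th filter with the patch under it.
                    out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o]) + b[o]
        return out

    x = np.random.default_rng(0).normal(size=(3, 8, 8))
    w = np.random.default_rng(1).normal(size=(4, 3, 3, 3))
    print(conv2d_forward(x, w, np.zeros(4)).shape)  # (4, 6, 6)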


I didn't implement a CNN from scratch, but a few years ago, I wrote a blog post on CNNs [1] because, like the other commenters, I could find almost no decent blog content on what exactly a CNN was. Maybe it will help in your efforts.

[1] http://gregorygundersen.com/blog/2017/02/24/cnns/


This seems really good. I always like to read code because it gives a better idea of how things work under the hood. When I first wrote a toy neural network, I read everything I could on the topic, plus I was taking an ML class where the teacher was a big fan of implementing things from scratch.


If you want to implement a CNN from scratch, wouldn't you prefer an article that describes how it works rather than one that just gives you the code? Otherwise it's more a process of copying from a source than implementing from scratch.


I guess you could look at a popular framework such as PyTorch and try to figure out from the code how a CNN is implemented.


Icons are screen width on mobile.



