Hacker News new | past | comments | ask | show | jobs | submit login

Automatic differentiation is why we're able to have complex models like transformers; it's arguably the key reason (along with large amounts of data and massive compute resources) that we have the revolution in AI that we have.

Nobody working in this space is hand calculating derivatives for these models. Thinking in terms of differentiable programming is a given and I think certainly counts as "from scratch" in this case.
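To make that point concrete, here's a toy sketch (my own illustration, not anything from the book) of the reverse-mode automatic differentiation that frameworks like PyTorch perform so nobody has to hand-derive gradients:

```python
class Value:
    """A scalar that records the operations applied to it, so the
    gradient of any downstream result can be computed by backprop."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads  # d(this node)/d(each parent)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, grad=1.0):
        """Accumulate d(final output)/d(self) via the chain rule."""
        self.grad += grad
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(grad * local)

x = Value(3.0)
y = x * x + x          # f(x) = x^2 + x
y.backward()
print(y.data, x.grad)  # f(3) = 12, f'(3) = 2*3 + 1 = 7
```

The derivative falls out of the recorded computation graph; you only ever write the forward pass, which is the whole idea behind "differentiable programming."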

Any time I see someone post a comment like this, I suspect they don't really understand what's happening under the hood or how contemporary machine learning works.




> Thinking in terms of differentiable programming is a given and I think certainly counts as "from scratch" in this case.

I have to disagree that that's an obvious assumption for the meaning of "from scratch", especially given that the book description says readers only need to know Python. It would be like reading "Crafting Interpreters" only to find that step one is to download Lex and Yacc because everyone working in the space already knows how parsers work.

> I suspect they don't really understand what's happening under the hood or how contemporary machine learning works.

Everyone has to start somewhere. I thought I would be interested in a book like this precisely because I don't already fully understand what's happening under the hood, but it sounds like it might not actually be a good starting point for my idea of "from scratch."


On that note, I have a relatively comprehensive intro to PyTorch in the Appendix (~40 pages) that goes over automatic differentiation etc.

The alternative, if you want to build something truly from scratch, would be to implement everything in CUDA, but that would not be a very accessible book.


“If you wish to make an apple pie from scratch, you must first invent the universe.” -- Carl Sagan


It depends on which hood you want to look under.

Let's say you wanted to write your own SSH client as a learning exercise. Is it cheating if you use OpenSSL? Is it cheating if you use Python? Is it cheating if you use a C compiler?


Oh, I see you're using an existing ISA and not creating your own for this. And also, where do you get off using existing integrated circuits? From scratch means you have to start from sand, make your own NAND gates, get to an adder and a latch and then a CPU, and write an operating system for it, before you can get to using a language that you invented for this purpose.


Nobody writes code in terms of NANDs, but there is the Nand to Tetris course ("The Elements of Computing Systems: Building a Modern Computer from First Principles" book): https://www.nand2tetris.org

PyTorch to LLMs has a lot to show even without the Python-to-PyTorch part. It reminds me of Andrej Karpathy's "Neural Networks: Zero to Hero": https://m.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9... Prerequisites: solid programming (Python), intro-level math (e.g. derivative, gaussian). https://karpathy.ai/zero-to-hero.html



I’m very comfortable with AI in general but not so much with Machine Learning. I understand transformers are a key piece of the puzzle that enables tools like LLMs, but I don’t know much about them.

Do you (or others) have good resources explaining what they are and how they work at a high level?


I'd say Chapter 1 would be the high-level intro to transformers and how they relate to LLMs.
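For a quick taste of the mechanism before diving into a full chapter: the core operation inside a transformer is scaled dot-product self-attention. A minimal NumPy sketch (my own toy illustration, not code from the book):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position's output is a
    weighted average of all value vectors, with weights from a
    softmax over query-key similarity scores."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                               # (n, d) mixed values

# Tiny example: 4 token positions, 8-dimensional vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed vector per position
```

The "learning" part is that Q, K, and V are produced from the input by learned projections; the attention mixing itself is just this handful of lines repeated across layers and heads.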



