Ask HN: AI/ML papers to catch up with current state of AI?
222 points by hahnchen on Dec 15, 2023 | 49 comments
I used to be into ML back in the R-CNN, GAN, ResNet era and would read papers/blogs.

Seems like ML is taking off recently and I want to get back into it! So far on my list I have Attention Is All You Need, QLoRA, LLaMA, and Q-learning. Suggestions?




Since nobody is actually recommending papers, here's an incomplete reading list that I sent out to some master's students I work with so they can understand the current (academic) research my little team is doing:

Paper reference / main takeaways / link

instructGPT / main concepts of instruction tuning / https://proceedings.neurips.cc/paper_files/paper/2022/hash/b...

self-instruct / bootstrap off models own generations / https://arxiv.org/pdf/2212.10560.pdf

Alpaca / how alpaca was trained / https://crfm.stanford.edu/2023/03/13/alpaca.html

Llama 2 / probably the best chat model we can train on, focus on training method. / https://arxiv.org/abs/2307.09288

LongAlpaca / One of many ways to extend context, and a useful dataset / https://arxiv.org/abs/2309.12307

PPO / important training method / idk just watch a youtube video
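Since the PPO row punts on the details, here is roughly what the clipped surrogate objective looks like in PyTorch (a sketch; the variable names are mine, not from any particular codebase):

    import torch

    def ppo_clip_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
        # Probability ratio between the current policy and the policy that collected the data.
        ratio = torch.exp(logprobs_new - logprobs_old)
        # Clipped surrogate objective from the PPO paper; we minimize its negative.
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()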

Obviously these are specific to my work and are out of date by ~3-4 months but I think they do capture the spirit of "how do we train LLMs on a single GPU and no annotation team" and are frequently referenced simply by what I put in the "paper reference" column.
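For the "single GPU and no annotation team" spirit, a minimal QLoRA-style setup might look like the following. This is only a sketch assuming the Hugging Face transformers + peft + bitsandbytes stack; the checkpoint name and hyperparameters are placeholders, not a recipe from any of the papers above:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM you have access to

    # Load the base model in 4-bit so it fits on a single consumer GPU.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=bnb_config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Train only small low-rank adapters on top of the frozen 4-bit weights.
    lora_config = LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of total parameters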


Mamba: Linear-Time Sequence Modeling with Selective State Spaces / https://arxiv.org/abs/2312.00752


I would say that the Chinchilla paper is a prerequisite to all of the ones mentioned above:

https://arxiv.org/abs/2203.15556
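The practical takeaway is roughly that for a fixed compute budget C ≈ 6·N·D (N parameters, D training tokens) you want on the order of ~20 tokens per parameter. A back-of-the-envelope sketch using those common approximations (not exact constants from the paper):

    import math

    def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
        # C ~ 6 * N * D and D ~ 20 * N  =>  N ~ sqrt(C / (6 * 20))
        n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    # Example: ~5.76e23 FLOPs of compute
    n, d = chinchilla_optimal(5.76e23)
    print(f"~{n/1e9:.0f}B params, ~{d/1e12:.1f}T tokens")

That example works out to roughly 70B parameters and 1.4T tokens, which lines up with the Chinchilla model's own configuration.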


DPO should be listed as well: https://arxiv.org/abs/2305.18290

It's extremely zeitgeisty atm
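The DPO objective is compact enough to sketch directly; a rough PyTorch version (beta and the variable names are my own):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Implicit reward = beta * log(pi_theta / pi_ref) for each completion.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between preferred and dispreferred completions.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

The appeal is that there's no separate reward model or PPO loop; the preference pairs train the policy directly.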


My view is to focus on doing stuff. That's what I did. Pick some task you want the model to do, try finetuning LLaMA, play with the OpenAI APIs, etc., Googling and asking GPT along the way.

Foundation-model training got so expensive that unless you can get hired by someone who "owns a nuclear power plant of GPUs", you are not going to get any "research" done. And as the area got white-hot, those companies have more available talent than hardware nowadays. So just getting into the practitioner area is the best way to get productive with these models. And you improve as a practitioner by practicing, not by reading papers.

If you're at the computer, your time is best spent writing code and interacting with those models, in my opinion. If you can't (e.g. on a commute), I listen to some stuff (e.g. https://www.youtube.com/watch?v=zjkBMFhNj_g - anything from Karpathy on YouTube, or the https://www.youtube.com/@YannicKilcher channel).
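If the "just play with the APIs" route appeals, a minimal sketch with the OpenAI Python client (the model name is only an example):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name; swap for whatever you have access to
        messages=[{"role": "user", "content": "Explain LoRA finetuning in two sentences."}],
    )
    print(resp.choices[0].message.content)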


https://trendingpapers.com/

This tool can help you find what's new & relevant to read. It's updated every day (based on ArXiv).

You can filter by category (Computer Vision, Machine Learning, NLP, etc.) and by release date, but most importantly, you can rank by PageRank (a proxy for influence/readership), PageRank growth (to see the fastest-growing papers in terms of influence), total # of citations, etc.


I'd be wary of programmatic lists that claim to track the most important/popular recent papers. There's a ridiculous amount of hype/propaganda and citation hacking surrounding new AI research, making it hard to discern what will truly stand the test of time. Tomas Mikolov just posted about this:

https://news.ycombinator.com/item?id=38654038


It doesn't claim to track the most important recent papers. It's very clear and upfront that it aims to track the most trending recent papers. It's even in the title of the website. There's no claim of permanent importance.


Thank you. You're right. I edited my comment so it refers to "important/popular" instead.


Maybe this tweet by John Carmack can help you:

This is a great little book to take you from “vaguely understand neural networks” to the modern broad state of practice. I saw very little to quibble with. https://fleuret.org/francois/lbdl.html


Thanks! Purchased a copy for myself and a friend.

And Francois could easily report the unauthorized seller to Amazon, or send an S&D letter; suing isn't required.


Thanks for this - looks great. Do you have any other recommendations? Videos, blogs, etc?


Bear in mind that the ML skillset is now bifurcating into two components. On one side are the people who work at places like OpenAI/DeepMind/Mistral/etc, who have billion-dollar compute budgets. They are the ones who will create the foundational models. At this point a lot of this work is very technically narrow, dealing with CUDA, GPU issues, numerical stability, etc. On the other side are people who are using the models through the APIs in various ways. This is much more open-ended and potentially creative, but you don't need to know how Q-learning works to do this.

It's a bit analogous to the situation with microprocessors. There is a ton of deep technical knowledge about how chips work, but most of this knowledge isn't critical for mainstream programming.


The book that just came out, "Understanding Deep Learning", is an excellent overview of the current state of AI: https://udlbook.github.io/udlbook/

Read that first, then to keep up to date you can follow up with any papers that seem interesting to you. A good way to be aware of the interesting papers that come out is to follow @_akhaliq on X: https://twitter.com/_akhaliq


What do you think of this book?

https://fleuret.org/francois/lbdl.html

I like that it’s formatted for the phone.


I think it is quite good if what you need is to get very quickly a simple overview of the current state of AI.


My current fave book to introduce DNNs is "Deep Learning: A Visual Approach" by Glassner. He's crystal clear, covers a lot of ground, and the book is up to date on everything but LLMs, which are moving so fast that no book could keep up.


Another book which also seems to be very good is "Deep Learning, Foundations and Concepts". It is coming out soon and you can already preview it on-line at https://www.bishopbook.com/


Prince's other writings have been outstanding. Based on the relative opacity of Bishop's venerable PRML, I'd turn to Prince's book before I would Bishop's newest.


Hey, IMHO the best overall technical intro to LLMs (I guess that's your main interest, as you mentioned QLoRA + LLaMA) is by Simon Willison [1]. Additionally, or if you prefer videos, the recent 1h "busy person's intro" by Andrej Karpathy is great and dense as well [2].

[1] https://simonwillison.net/2023/Aug/3/weird-world-of-llms/ [2] https://youtu.be/zjkBMFhNj_g?si=M6pRX66NrRyPM8x-

EDIT: Maybe I misunderstood, as you asked about papers, not general intros. I don't think that reading papers is the best way to "catch up", as the pace is rapid and knowledge very decentralized. I can confirm what Andrej recently wrote on X [3]:

"Unknown to many people, a growing amount of alpha is now outside of Arxiv, sources include but are not limited to:

- https://github.com/trending

- HN

- that niche Discord server

- anime profile picture anons on X

- reddit"

[3] https://twitter.com/karpathy/status/1733968385472704548


This, but I'd replace Reddit with 4chan. There is a lot more information on how to build, finetune and run models there, compared to Reddit.


Is he referencing a particular “niche discord server?”


This one is very good, and will provide certain key insights into the way you should think about NNs. -> https://www.amazon.it/Deep-Learning-Python-Francois-Chollet/...

This is a good explanation of the Transformer details -> https://www.youtube.com/watch?v=bCz4OMemCcA&ab_channel=UmarJ...

This is old, but it covers a lot of background that you need to know to understand the rest very well. What I like about this book is that it often explains the motivations behind certain choices in a very intuitive way. -> https://www.amazon.it/Natural-Language-Processing-Pytorch-Ap...


Once a week (at least!) some research group publishes another review paper to the cs.AI section on arXiv. Look for new papers with "survey" in the title (https://arxiv-sanity-lite.com/?q=survey&rank=time&tags=cs.AI...). You'll get surveys on every conceivable subtopic of ML/AI.
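If you'd rather script that search than browse, the public arXiv API can do roughly the same thing; a quick sketch (the exact query filter is my own choice):

    import urllib.request
    import xml.etree.ElementTree as ET

    URL = ("http://export.arxiv.org/api/query?"
           "search_query=cat:cs.AI+AND+ti:survey"
           "&sortBy=submittedDate&sortOrder=descending&max_results=20")

    with urllib.request.urlopen(URL) as resp:
        feed = ET.fromstring(resp.read())

    ns = {"atom": "http://www.w3.org/2005/Atom"}
    for entry in feed.findall("atom:entry", ns):
        # Collapse the multi-line titles arXiv returns into a single line.
        title = " ".join(entry.find("atom:title", ns).text.split())
        print(title)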


I'd also add "Deep reinforcement learning from human preferences" https://proceedings.neurips.cc/paper_files/paper/2017/file/d... and "Training language models to follow instructions with human feedback" https://proceedings.neurips.cc/paper_files/paper/2022/file/b....

These papers outline the approach of reinforcement learning from human feedback (RLHF), which is being used to train lots of these LLMs, such as ChatGPT.
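The reward-modelling step at the heart of that approach boils down to a simple pairwise (Bradley-Terry style) loss over human preference pairs; a rough sketch with variable names of my own:

    import torch
    import torch.nn.functional as F

    def reward_model_loss(reward_chosen, reward_rejected):
        # Push the scalar reward of the human-preferred response above the rejected one.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()

The trained reward model then provides the signal that PPO (or DPO, implicitly) optimizes against.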


I kind of despair of keeping up to date with ML, at least to the extent that I might ever get current enough to be paid to work with it. I did Andrew Ng's Coursera specialisation a few years back, and I've worked through some of the developer-oriented courses, implemented some stuff, read more than a few books, read papers (the ones I might have a hope of understanding), and tried to get a former employer to take it seriously. But it seems like unless you have a PhD or big-co experience, it's very difficult to keep up to date by working in the field.

Notwithstanding the above, I'd agree with others here who suggest learning by doing/implementing, not reading papers.


I put together a reading list for Andrej Karpathy's intro to LLMs that would be helpful for all of the latest LLM and multi-modal architectures:

https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro...


Build something of personal interest to you. Start by looking for similar open-source projects online. Look at the online posts of the authors. Then look for the papers that you think will be useful for your project. Before you know it, you'll become an expert in your area of interest.

Above all, be wary of programmatic lists that claim to track the most important recent papers. There's a ridiculous amount of hype/propaganda and citation hacking surrounding new AI research, making it hard to discern what will truly stand the test of time. Tomas Mikolov just posted about this.[a]

---

[a] https://news.ycombinator.com/item?id=38654038


Part 2 of the fast.ai course might be a good start: https://course.fast.ai/Lessons/part2.html


I found Cosma Shalizi's notes on the subject pretty insightful.

http://bactra.org/notebooks/nn-attention-and-transformers.ht...

Definitely read through to the last section.


Thanks for sharing! I always find Cosma's writing enlightening, and I read his stuff far less than I should.


https://www.youtube.com/@algorithmicsimplicity - that series cleared up the fundamental question about transformers I couldn't find an answer for in many recommended materials.

Here's also a nice tour of the building blocks, which could also double as Transformers/TensorFlow API reference documentation: https://www.youtube.com/watch?v=eMXuk97NeSI&t=207s

The #1 visualization of architecture and size progression: https://bbycroft.net/llm


This resource has been invaluable to me: https://paperswithcode.com/

From the past examples you give it sounds like you were into computer vision. There’s been a ton of developments since then, and I think you’d really enjoy the applications of some of those classic convolutional and variational encoder techniques in combination with transformers. A state of the art multimodal non-autoregressive neural net model such as Google’s Muse is a nice paper to work up to, since it exposes a breadth of approaches.


No emergence

[2304.15004] Are Emergent Abilities of Large Language Models a Mirage? - arXiv https://arxiv.org/abs/2304.15004

Can't plan

https://openreview.net/forum?id=X6dEqXIsEW

No compositionality https://openreview.net/forum?id=Fkckkr3ya8

Apart from that it's great


Would suggest our weekly paper club called Arxiv Dive - https://lu.ma/oxenbookclub. You can see past ones on our blog (https://blog.oxen.ai/) - have covered papers like Mamba, CLIP, Attention is all you need, and more. We also do a "hands on" session with live code, models, and real world data on Fridays!


I recently started reading research papers related to GPTs and LLMs. I have listed them here, along with my short synopses and links to their code and datasets:

https://www.thinkevolveconsulting.com/large-language-models-...


Little late to this thread but from my list:

LLM (foundational papers)

* Attention Is All You Need - transformers + self-attention (a minimal sketch of the attention op is at the end of this comment)

* BERT - first masked LM using transformers + self attention

* GPT-3 - big LLM decoder (basis of GPT-4 and most LLMs)

* InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning)

* Chain of Thought (improve performance via prompting)

Some other papers that have become trendy, depending on your interest:

* RLHF - RL using human feedback

* LoRA - low-rank adapters that shrink the set of trainable parameters for finetuning

* MoE - kind of ensembling

* self instruct - self label data

* constitutional ai - self alignment

* tree of thought - like CoT but a tree

* FlashAttention, Longformer - optimized attention mechanisms

* ReAct - agents
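Since the first item on the list is self-attention, here is a minimal sketch of single-head scaled dot-product attention (no masking or batching, just the core op):

    import math
    import torch

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (seq_len, d_k) for a single head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

Everything else in the transformer (multi-head projections, causal masks, MLP blocks) is built around this one operation.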


The good (and, some might say, bad) thing is that when it comes to fundamental technologies there are only two that are relevant:

1. Transformers

2. Diffusion

The benefit: focus on understanding them both reeaaalllyy well and you are at the forefront of research ;)
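For the diffusion half, the core training recipe is small enough to sketch: corrupt a clean sample with a known noise schedule and train a network to predict the noise. A rough sketch (the linear schedule here is just one common choice; details vary by paper):

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal-keep factors

    def add_noise(x0, t):
        # Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
        eps = torch.randn_like(x0)
        a_bar = alpha_bars[t]
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
        return x_t, eps  # the model is trained to predict eps from (x_t, t)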

Also, what is the reason you want to do this? If it is about building some kind of AI-enabled app, you don't have to read anything. Get an API key and let's go; the barrier has never been lower.


To add to that: what these models can express, precisely, is an interesting question; for transformer encoders:

https://arxiv.org/abs/2301.10743

Another interesting research topic is the trusted generation of tasks for finetuning

https://arxiv.org/abs/2306.08568

And I suppose running these at the edge is terribly interesting too. If you can find analyses of "quantization": this is a highly active research area, and the results are pretty incredible, since it cuts resources by huge factors and no one quite knows why.

This is one that's easy to dive into with consumer hardware, but I don't know any great papers myself.

Run locally: https://github.com/ggerganov/llama.cpp

Quantized models: https://huggingface.co/TheBloke
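If you want a feel for what quantization is doing under the hood of tools like llama.cpp, the simplest version is just rounding weights onto a low-bit grid; a toy symmetric int8 sketch (real schemes such as GPTQ or llama.cpp's k-quants are considerably smarter):

    import torch

    def quantize_int8(w):
        # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
        scale = w.abs().max() / 127.0
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q, scale):
        return q.to(torch.float32) * scale

    w = torch.randn(4096, 4096)
    q, s = quantize_int8(w)
    print("mean abs error:", (dequantize(q, s) - w).abs().mean().item())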

Explainability is under research, though I haven't seen any good solutions.

This may arise from skeptics who are calling these things stochastic parrots, incapable of reason, without a world model, etc.


> there are only 2 that are relevant: 1. Transformers 2. Diffusion

I'd argue that there are plenty of less sexy, non-unicorn uses for AI/ML - particularly in industrial applications. SVMs, DNNs, etc are still very relevant. As is GOFAI in some domains.


Posted in another thread, but sadly I got no replies...

Related question: how can I learn how to read the mathematical notation used in AI/ML papers? Is there a definitive work that describes the basics? I am a post-grad Engineer, so I know the fundamentals, but I'm really struggling with a lot of the Arxiv papers. Any pointers hugely appreciated.


I particularly enjoyed Kevin Murphy's book [0] for being just rigorous enough to satisfy but not too dry, and also not trying to add humor unnecessarily. It's not the best introduction text, but it's great for someone with a little familiarity with the field who wants to broaden their understanding. There are proofs to rationalize some approaches, though maybe not to the degree that would satisfy a hardcore mathematician; tbh I think that's a good thing for a book of this scope.

If you find a sample, it may include the index of symbols in the beginning which is pretty comprehensive and may satisfy your question on its own.

https://www.goodreads.com/book/show/15857489-machine-learnin...


Thank you!


Have you tried asking ChatGPT to help explain the notation? I haven't tried that myself, but have read that it can work[0].

[0]: https://medium.com/@eric.christopher.ness/get-an-explanation...


From ChatGPT:

>> To catch up with the current state of Artificial Intelligence and Machine Learning, it's essential to look at the latest and most influential research papers. Here are some categories and specific papers you might consider:

1. *Foundational Models and Large Language Models*:
- Papers on GPT (Generative Pre-trained Transformer) series, particularly the latest like GPT-4, which detail the advancements in language models.
- Research on BERT (Bidirectional Encoder Representations from Transformers) and its variants, which are pivotal in understanding natural language processing.

2. *Computer Vision*:
- Look into papers on Convolutional Neural Networks (CNNs) and their advancements.
- Research on object detection, image classification, and generative models like Generative Adversarial Networks (GANs).

3. *Reinforcement Learning*:
- Papers from DeepMind, like those on AlphaGo and AlphaZero, showcasing advances in reinforcement learning.
- Research on advanced model-free algorithms like Proximal Policy Optimization (PPO).

4. *Ethics and Fairness in AI*:
- Papers discussing the ethical implications and biases in AI, including work on fairness, accountability, and transparency in machine learning.

5. *Quantum Machine Learning*:
- Research on the integration of quantum computing with machine learning, exploring how quantum algorithms can enhance ML models.

6. *Healthcare and Bioinformatics Applications*:
- Papers on AI applications in healthcare, including drug discovery, medical imaging, and personalized medicine.

7. *Robotics and Autonomous Systems*:
- Research on the intersection of AI and robotics, including autonomous vehicles and drone technology.

8. *AI in Climate Change*:
- Papers discussing the use of AI in modeling, predicting, and combating climate change.

9. *Interpretable and Explainable AI*:
- Research focusing on making AI models more interpretable and explainable to users.

10. *Emerging Areas*:
- Papers on new and emerging areas in AI, such as AI in creative arts, AI for social good, and the integration of AI with other emerging technologies like the Internet of Things (IoT).

To find these papers, you can check academic journals like "Journal of Machine Learning Research," "Neural Information Processing Systems (NeurIPS)," and "International Conference on Machine Learning (ICML)," or platforms like arXiv, Google Scholar, and ResearchGate. Additionally, following key AI research labs like OpenAI, DeepMind, Facebook AI Research, and university research groups can provide insights into the latest developments.


If you want good up to date resources on the applied side I’d recommend checking out https://hamel.dev/notes/


In the Twitter section at the bottom there are usually good papers: https://news.mioses.com


you don't need papers; arXiv is self-aggrandizement from some meme in East Asia

just join communities on Discord or LocalLLaMA on Reddit


Can I get some insights on AI and robotics? Some papers to implement and get my hands dirty with?





