I used to be into ML back in the R-CNN, GAN, ResNet era and would read papers/blogs.
Seems like ML is taking off recently and I want to get back into it! So far on my list I have Attention Is All You Need, QLoRA, the Llama papers, and Q-learning. Suggestions?
Since nobody is actually recommending papers, here's an incomplete reading list that I sent out to some master's students I work with so they can understand the current (academic) research my little team is doing:
Paper reference / main takeaways / link
instructGPT / main concepts of instruction tuning / https://proceedings.neurips.cc/paper_files/paper/2022/hash/b...
self-instruct / bootstrap off the model's own generations / https://arxiv.org/pdf/2212.10560.pdf
Alpaca / how Alpaca was trained / https://crfm.stanford.edu/2023/03/13/alpaca.html
Llama 2 / probably the best chat model we can train on, focus on the training method / https://arxiv.org/abs/2307.09288
LongAlpaca / one of many ways to extend context, and a useful dataset / https://arxiv.org/abs/2309.12307
PPO / important training method / idk, just watch a YouTube video
Obviously these are specific to my work and are out of date by ~3-4 months, but I think they do capture the spirit of "how do we train LLMs on a single GPU and no annotation team", and they are frequently referenced simply by what I put in the "paper reference" column.
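Since the PPO row above just punts to a video: the one equation worth internalizing is the clipped surrogate objective from the original PPO paper (Schulman et al., 2017), which keeps each policy update close to the previous policy:

```latex
% PPO clipped surrogate objective (maximized with respect to \theta)
L^{\mathrm{CLIP}}(\theta) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\left(
      r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
    \right)
  \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
% \hat{A}_t : advantage estimate at timestep t;  \epsilon : clipping range (e.g. 0.2)
```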
My view is to focus on doing stuff. That's what I did. Pick some task you want the model to do, try finetuning Llama, play with the OpenAI APIs, etc., Googling and asking GPT along the way.
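As one concrete starting point for the finetuning route, here's a minimal sketch using Hugging Face transformers + peft; the model name, target modules, and hyperparameters are illustrative placeholders, not a recommendation:

```python
# Minimal sketch: wrap a causal LM with LoRA adapters so it can be finetuned
# on a single consumer GPU. Model name and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumes you have access to the weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# LoRA: train small low-rank adapter matrices instead of the full weight matrices.
lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here you would tokenize your task data and run an ordinary training
# loop (or transformers.Trainer / trl's SFTTrainer) over the wrapped model.
```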
Foundational model training got so expensive that unless you can get hired by a company that owns a nuclear power plant's worth of GPUs, you are not going to get any "research" done. And as the area got white-hot, those companies now have more available talent than hardware. So just getting into the practitioner side is the best way to get productive with these models. And you improve as a practitioner by practicing, not by reading papers.
This tool can help you find what's new & relevant to read. It's updated every day (based on ArXiv).
You can filter by category (Computer Vision, Machine Learning, NLP, etc.) and by release date, but most importantly you can rank by PageRank (a proxy for influence/readership), PageRank growth (to see the fastest-growing papers in terms of influence), total # of citations, etc...
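For intuition on what ranking by PageRank over the citation graph means, here's a toy sketch; the graph is made up, and networkx is just one convenient implementation:

```python
# Toy illustration of PageRank over a citation graph: an edge A -> B means
# "paper A cites paper B", so influential papers accumulate score from the
# papers that cite them. The graph here is invented for the example.
import networkx as nx

citations = [
    ("new_paper_1", "attention_is_all_you_need"),
    ("new_paper_2", "attention_is_all_you_need"),
    ("new_paper_2", "bert"),
    ("new_paper_3", "bert"),
    ("bert", "attention_is_all_you_need"),
]
G = nx.DiGraph(citations)

scores = nx.pagerank(G)  # damping factor defaults to 0.85
for paper, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{paper}: {score:.3f}")
```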
I'd be wary of programmatic lists that claim to track the most important/popular recent papers. There's a ridiculous amount of hype/propaganda and citation hacking surrounding new AI research, making it hard to discern what will truly stand the test of time. Tomas Mikolov just posted about this:
It doesn't claim to track the most important recent papers. It's very clear and upfront that it aims to track the most trending recent papers. It's even in the title of the website. There's no claim of permanent importance.
This is a great little book to take you from “vaguely understand neural networks” to the modern broad state of practice. I saw very little to quibble with. https://fleuret.org/francois/lbdl.html
Bear in mind that the ML skillset is now bifurcating into two components. On the one side are the people who work at places like OpenAI/DeepMind/Mistral/etc, who have billion dollar compute budgets. They are the ones who will create the foundational models. At this point a lot of this work is very technically narrow, dealing with CUDA, GPU issues, numerical stability, etc. On the other side are people who are using the models through the APIs in various ways. This is much more open-ended and potentially creative, but you don't need to know how Q-learning works to do this.
It's a bit analogous to the situation with microprocessors. There is a ton of deep technical knowledge about how chips work, but most of this knowledge isn't critical for mainstream programming.
The book that just came out, "Understanding Deep Learning", is an excellent overview of the current state of AI: https://udlbook.github.io/udlbook/
Read that first, then to keep up to date you can follow up with any papers that seem interesting to you.
A good way to be aware of the interesting papers that come out is to follow @_akhaliq on X: https://twitter.com/_akhaliq
My current fave book to introduce DNNs is "Deep Learning: A Visual Approach" by Glassner. He's crystal clear, covers a lot of ground, and the book is up-to-date on everything but LLMs, which are moving so fast that no book could keep up.
Another book which also seems to be very good is "Deep Learning, Foundations and Concepts". It is coming out soon and you can already preview it on-line at https://www.bishopbook.com/
Prince's other writings have been outstanding. Based on the relative opacity of Bishop's venerable PRML, I'd turn to Prince's book before I would Bishop's newest.
Hey, imho the best overall technical intro to LLMs (I guess that's your main interest, as you mentioned QLoRA + Llama) is by Simon Willison [1]. Additionally, or if you prefer videos, the recent 1h "busy person's intro" by Andrej Karpathy is great and dense as well [2].
EDIT: Maybe I misunderstood as you asked about papers, not general intros.
I don't think that reading papers is the best way to "catch up", as the pace is rapid and the knowledge very decentralized. I can confirm what Andrej recently wrote on X [3]:
"Unknown to many people, a growing amount of alpha is now outside of Arxiv, sources include but are not limited to:
This is old but it covers a lot of the background you need to know to understand the rest well. What I like about this book is that it often explains, in a very intuitive way, the motivations behind certain choices. -> https://www.amazon.it/Natural-Language-Processing-Pytorch-Ap...
Once a week (at least!) some research group publishes another review paper to the cs.AI section on ArXiv. Look for new [papers with "survey" in the title](https://arxiv-sanity-lite.com/?q=survey&rank=time&tags=cs.AI...). You'll get surveys on every conceivable subtopic of ML/AI.
I kind of despair of keeping up to date with ML, at least to the extent that I might ever get current enough to be paid to work with it. I did Andrew Ng's Coursera specialisation a few years back, and I've worked through some of the developer-oriented courses, implemented some stuff, read more than a few books, read papers (the ones I might have a hope of understanding), and tried to get a former employer to take it seriously. But it seems like unless you have a PhD or big-co experience, it's very difficult to keep up to date by working in the field.
Notwithstanding the above, I'd agree with others here who suggest learning by doing/implementing, not reading papers.
Build something of personal interest to you. Start by looking for similar open-source projects online. Look at the online posts of the authors. Then look for the papers that you think will be useful for your project. Before you know it, you'll become an expert in your area of interest.
Above all, be wary of programmatic lists that claim to track the most important recent papers. There's a ridiculous amount of hype/propaganda and citation hacking surrounding new AI research, making it hard to discern what will truly stand the test of time. Tomas Mikolov just posted about this.[a]
From the past examples you give, it sounds like you were into computer vision. There's been a ton of developments since then, and I think you'd really enjoy the applications of some of those classic convolutional and variational autoencoder techniques in combination with transformers. A state-of-the-art multimodal non-autoregressive neural net model such as Google's Muse is a nice paper to work up to, since it exposes a breadth of approaches.
Would suggest our weekly paper club called Arxiv Dive - https://lu.ma/oxenbookclub. You can see past ones on our blog (https://blog.oxen.ai/) - have covered papers like Mamba, CLIP, Attention is all you need, and more. We also do a "hands on" session with live code, models, and real world data on Fridays!
I recently started reading research papers related to GPTs and LLMs. I have listed them here, along with short synopses and links to their code and datasets.
The good (and, some might say, bad) thing is that when it comes to fundamental technologies there are only 2 that are relevant:
1. Transformers
2. Diffusion
The benefit is that if you focus on understanding them both reeaaalllyy well, you are at the forefront of research ;)
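For the transformer half, here is a minimal sketch of scaled dot-product self-attention, the core operation behind "Attention Is All You Need"; the single head, tiny shapes, and lack of masking are simplifications for illustration:

```python
# Minimal single-head self-attention over a toy sequence, in numpy.
# Real transformers add multiple heads, masking, residuals, and layer norm.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)   # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # (seq_len, d_head) weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```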
Also, what is the reason you want to do this? If it is about building some kind of AI-enabled app, you don't have to read anything. Get an API key and go; the barrier has never been lower.
And I suppose running these at the edge is terribly interesting too. If you can find analyses of "quantization", that's a highly active research area, and the results are pretty incredible, since it cuts resources by huge factors and no one quite knows why.
This is one that's easy to dive into with consumer hardware, but I don't know any great papers myself.
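To make the resource-savings point concrete, here's a toy sketch of symmetric int8 weight quantization in numpy; real schemes (like the 4-bit NF4 format used by QLoRA) are more involved, but the idea is the same:

```python
# Toy symmetric int8 quantization of a weight matrix: store 1 byte per weight
# plus one float scale per row, instead of 4 bytes per float32 weight.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(1024, 1024)).astype(np.float32)

scale = np.abs(W).max(axis=1, keepdims=True) / 127.0          # per-row scale factor
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale                  # approximate reconstruction

print("bytes fp32:", W.nbytes, "bytes int8:", W_int8.nbytes)   # ~4x smaller
print("mean abs error:", np.abs(W - W_dequant).mean())         # tiny relative to weight scale
```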
> there are only 2 that are relevant: 1. Transformers 2. Diffusion
I'd argue that there are plenty of less sexy, non-unicorn uses for AI/ML - particularly in industrial applications. SVMs, DNNs, etc are still very relevant. As is GOFAI in some domains.
Posted in another thread, but sadly I got no replies...
Related question: how can I learn how to read the mathematical notation used in AI/ML papers? Is there a definitive work that describes the basics? I am a post-grad Engineer, so I know the fundamentals, but I'm really struggling with a lot of the Arxiv papers. Any pointers hugely appreciated.
I particularly enjoyed Kevin Murphy's book [0] for being just rigorous enough to satisfy but not too dry, while also not trying to add humor unnecessarily. It's not the best introductory text, but it's great for someone with a little familiarity with the field who wants to broaden their understanding. There are proofs to rationalize some approaches, though not to the degree that would satisfy a hardcore mathematician, maybe, but tbh I think that's a good thing for a book of this scope.
If you find a sample, it may include the index of symbols in the beginning which is pretty comprehensive and may satisfy your question on its own.
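As a small illustration of the kind of notation you'll hit constantly, here is the standard maximum-likelihood training objective written the way papers tend to write it, with a gloss of each symbol (this is a generic example, not from any particular paper):

```latex
% Typical paper-style statement of a training objective:
\theta^{\ast} = \arg\min_{\theta}\; \mathcal{L}(\theta),
\qquad
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\log p_{\theta}(y \mid x)\right]
% \theta               : model parameters (the weights being learned)
% \theta^{\ast}        : the parameter setting that minimizes the loss
% \mathcal{D}          : the data distribution; (x, y) is an input/label pair drawn from it
% \mathbb{E}           : expectation, i.e. an average over draws from \mathcal{D}
% p_{\theta}(y \mid x) : the probability the model assigns to y given x
```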
>> To catch up with the current state of Artificial Intelligence and Machine Learning, it's essential to look at the latest and most influential research papers. Here are some categories and specific papers you might consider:
1. *Foundational Models and Large Language Models*:
- Papers on GPT (Generative Pre-trained Transformer) series, particularly the latest like GPT-4, which detail the advancements in language models.
- Research on BERT (Bidirectional Encoder Representations from Transformers) and its variants, which are pivotal in understanding natural language processing.
2. *Computer Vision*:
- Look into papers on Convolutional Neural Networks (CNNs) and their advancements.
- Research on object detection, image classification, and generative models like Generative Adversarial Networks (GANs).
3. *Reinforcement Learning*:
- Papers from DeepMind, like those on AlphaGo and AlphaZero, showcasing advances in reinforcement learning.
- Research on advanced model-free algorithms like Proximal Policy Optimization (PPO).
4. *Ethics and Fairness in AI*:
- Papers discussing the ethical implications and biases in AI, including work on fairness, accountability, and transparency in machine learning.
5. *Quantum Machine Learning*:
- Research on the integration of quantum computing with machine learning, exploring how quantum algorithms can enhance ML models.
6. *Healthcare and Bioinformatics Applications*:
- Papers on AI applications in healthcare, including drug discovery, medical imaging, and personalized medicine.
7. *Robotics and Autonomous Systems*:
- Research on the intersection of AI and robotics, including autonomous vehicles and drone technology.
8. *AI in Climate Change*:
- Papers discussing the use of AI in modeling, predicting, and combating climate change.
9. *Interpretable and Explainable AI*:
- Research focusing on making AI models more interpretable and explainable to users.
10. *Emerging Areas*:
- Papers on new and emerging areas in AI, such as AI in creative arts, AI for social good, and the integration of AI with other emerging technologies like the Internet of Things (IoT).
To find these papers, you can check venues like the Journal of Machine Learning Research (JMLR), Neural Information Processing Systems (NeurIPS), and the International Conference on Machine Learning (ICML), or platforms like arXiv, Google Scholar, and ResearchGate. Additionally, following key AI research labs like OpenAI, DeepMind, Facebook AI Research, and university research groups can provide insights into the latest developments.