
I wouldn't be so quick to cry conspiracy. I'm the author of a paper, and a famous blog post, that trains a particular common architecture much faster and with far fewer parameters (don't want to dox myself too much), but it has been rejected several times and is now arXiv-only. Our most common complaint was "who would use this? Why not just take a large model and tune it?" That question alone held us back a year (the paper had over a hundred citations by then and remains my most cited work), until the complaints switched to "use more datasets" and "not novel" (by that time true: others had built on our work, cited us, and published in top venues).

I don't think this was some conspiracy by big labs to push back against us (we're nobodies), but rather that people get caught up in hype and reviewers are lazy and incentivized to reject. You're trained to be critical of works, and especially to consider that post hoc most solutions appear far simpler than they actually are. Context matters, because if you don't approach every paper with nuance it's easy to say "oh, it's just x." But if those ideas were really so simple and obvious, they would already be everywhere.

I see a lot of small labs suffer the same fate simply due to lack of compute. If you can't demonstrate your new technique on many datasets, that becomes the easiest grounds for rejecting the paper, and ACs aren't checking that reviews are reasonable. I've even argued with fellow reviewers about workshop papers -- papers I would have accepted at the main conference -- that were brushed off by reviewers who admitted in their own reviews that they do not work on these topics. I don't understand what's going on, but at times it feels like a collective madness. A 10-page paper with 4 very different datasets that solves a problem, is clearly written, has no major flaws, and is useful to the community should not need defending at a workshop just because the reviewers aren't qualified to review the work (this paper got in, btw).

We are moving into a "pay to play" ecosystem, and that will only create bad science through groupthink. (Another aspect of "pay to play" is the tuning. Spending $1M to tune your model to be the best doesn't mean it is better than a model whose authors could not afford the search. Often more than half of resources are spent on tuning now.)




Is there a place where you guys discuss... things? I'm a layman interested in this topic the way I'm interested in pop-physics/maths, but I have no chance of just reading papers and "getting it." On the other hand, the immediately available resources focus more on the how-to part than on what's up overall. Also, do you have something like 3b1b/pbs/nph for it? Content that you can watch and say "well, yep, good job."


I don't have any great recommendations, and unfortunately my advice may not be what you want to hear. What I tell my students is "You don't need to know math to build good models, but you need to know math to know why your models are wrong." But even this is a contentious statement within the community. (Personally I'm more interested in exploring what we can build and understand rather than throwing more compute and data at problems. There's a lot of work to be done that does not require significant compute, but it isn't flashy and you'll get little fame. Every famous model you know has some unsung hero(s) who built the foundation before compute was thrown at the problem.) I was previously a physicist, and we similarly say that you do not know the material unless you can do the math. Physicists are trained to generate analogies because they help communication, but this sometimes leads people to convince themselves that they understand things far better than they actually do. They say the devil is in the details, and boy are there a lot of details. (Of the science communicators, I'm happy those are the ones you mention, though!) But do not take this as gatekeeping! These groups are often happy to help with the math and to recommend readings. ML is kind of a wild west: you can honestly pick almost any subdomain of math and probably find it useful, but I would start by making sure you have a foundation in multivariate calculus and linear algebra.

As to paper reading, my suggestion is to just start. I faced this fear when I began grad school: it feels overwhelming, like everyone is leagues ahead of you and you have no idea where to begin. I promise that is not the case. Start anywhere; it is okay, as where you end up will not depend much on where you begin. Mentors help, but they aren't necessary if you have dedication. As you read you will become accustomed to the language and start to understand the "lore." I highly suggest following topics you find interesting backwards through time, as this has been one of the most beneficial practices in my own learning. I still find that revisiting old works reveals many hidden gems that were forgotten. Plus, they'll be easier to read! Yes, you will have to reread many of those works later, as your knowledge matures, but that is not a bad thing: you will come back with new eyes. Your goal should be to first understand the motivation/lore, so do not worry if you do not understand all the details. You will learn a lot through immersion. It is perfectly okay if you barely understand a work when first starting, because a mistake many people make (including a lot of researchers!) is assuming a paper is, or even can be, self-contained. You cannot truly read a work without understanding its history, and that only comes with time and experience. Never forget this aspect; it is all too easy to deceive yourself that things are simpler than they are (the curse of hindsight).

I'd also suggest just getting building. To learn physics you must do physics problems; to learn ML you must build ML systems. There are no shortcuts, but progress is faster than it looks. There are hundreds of tutorials out there and most are absolute garbage, but I also don't have something comprehensive I can point to. Just keep in mind that you're always learning, and so are the people writing tutorials. I'm going to kinda just dump some links; they aren't in any particular order, sorry haha. It's far from comprehensive, but it should help you get started, and nothing in here is too advanced (there's also a small code sketch after the links showing the kind of first project I mean). If it looks complicated, spend more time and you'll get it. It's normal if it doesn't click right away, and there's nothing wrong with that.

https://www.youtube.com/@Mutual_Information

https://www.youtube.com/@EmergentGarden

https://www.youtube.com/@pascalpoupart3507

https://www.youtube.com/@AndrejKarpathy

https://www.youtube.com/@alfcnz

https://www.youtube.com/@rmcelreath

http://neuralnetworksanddeeplearning.com/

https://adversarial-ml-tutorial.org/introduction/

https://www.deeplearningbook.org/

https://nlp.seas.harvard.edu/2018/04/03/attention.html

https://huggingface.co/blog/annotated-diffusion

https://lilianweng.github.io

https://pytorch.org/ecosystem/

https://medium.com/pytorch/archive

https://www.inference.vc/
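
And since I said to just get building, here's a minimal sketch of the kind of first project I mean: fitting a tiny network to synthetic data with PyTorch. The architecture, learning rate, and step count are arbitrary choices of mine, not taken from any of the links above.

    # Toy example: fit y = sin(x) with a small MLP.
    import torch
    import torch.nn as nn

    # synthetic regression data: y = sin(x) plus a little noise
    x = torch.linspace(-3, 3, 256).unsqueeze(1)
    y = torch.sin(x) + 0.1 * torch.randn_like(x)

    model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(1000):
        opt.zero_grad()                              # clear old gradients
        loss = nn.functional.mse_loss(model(x), y)   # how wrong are we?
        loss.backward()                              # backprop the gradients
        opt.step()                                   # one optimizer update
        if step % 200 == 0:
            print(step, loss.item())

Once a loop like that makes sense, swap in a real dataset or a different architecture and watch what breaks. That's where the learning happens.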


Absolutely fantastic advice. Thank you!


No problem! And good luck! It's a lot of work but well worth it.


Thank you very much!


Thanks!


Unless they were very confident of acceptance, a top research prof would rewrite and resubmit before publishing on arXiv, where others could "build on it" (i.e., scoop you at a top conference).


Welcome to ML. And idk, I'd feel pretty confident that a paper with that many citations would get accepted. The review system is like a slot machine if you aren't a big tech lab.



