Ten Noteworthy AI Research Papers of 2023 (sebastianraschka.com)
128 points by danboarder 8 months ago | 19 comments



A couple of other interesting papers from 2023 worth adding, specifically on small language models.

TinyStories: https://arxiv.org/abs/2305.07759

Phi-1.5: https://arxiv.org/abs/2309.05463

Both of these papers showed that with good, concise training data, we can get better performance from smaller parameter counts.
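
To get a feel for how small these models are, here's a minimal sketch of sampling from a TinyStories checkpoint with Hugging Face transformers (the checkpoint name and tokenizer pairing are taken from the community model card, not from the paper itself; treat them as assumptions):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ~33M-parameter TinyStories model; the model card pairs it with the GPT-Neo tokenizer.
    model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")

    inputs = tokenizer("Once upon a time there was", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Even at ~33M parameters, the continuations read as simple but grammatical children's stories, which is the paper's point.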


No, I don't think so. TinyStories has a different architecture and different assumptions for success... different enough that it doesn't compare to the others, IMHO.


Perhaps, but TinyStories was one of the first papers to show that we don't need a lot of parameters to build a coherent generative model.


FWIW, this link looks interesting, everyone:

https://github.com/neuml/txtai
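
If anyone wants to try it, here's a minimal sketch based on the txtai README (the embedding model name is the README's suggestion; treat the specifics as assumptions and check the repo docs):

    from txtai.embeddings import Embeddings

    # Build a semantic search index over a handful of sentences.
    embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
    data = [
        "US tops 5 million confirmed virus cases",
        "Maine man wins $1M from $25 lottery ticket",
    ]
    embeddings.index([(i, text, None) for i, text in enumerate(data)])

    # Returns a list of (id, score) tuples for the best match.
    print(embeddings.search("public health news", 1))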


What is “coherent”?


Generates plausible continuations of text.


Both GPT-2 and GPT-4 do that.


> we don't need a lot of parameters to build a coherent generative model.

GPT-4 likely has a trillion+ parameters across its MoE experts.


Fantastic commentary on these papers, and very good choices for AI engineering. Especially don't miss Sebastian's commentary on Pythia https://twitter.com/rasbt/status/1734920232173539796?s=12&t=... and BloombergGPT (which was a surprise to me because, to my knowledge, the model wasn't actually ever released? idk) https://x.com/rasbt/status/1738467874644128193?s=20

We covered some of these papers with their authors at NeurIPS (those we could get…); coverage here:

https://www.latent.space/p/neurips-2023-papers


I stumbled upon this site:

https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...

Not sure how this works, exactly.


People run conferences through OpenReview: you sign up as an organizer, people submit papers, and reviewers review them. Then a final decision is made about acceptance. Much of it is public and queryable via their API (see the sketch at the end of this comment).

If you're a non-expert, though, I think OpenReview is basically useless for you. The signal-to-noise ratio is terrible. It's just like arXiv.

Just because a paper got good reviews doesn't make it a good paper, and just because it got bad reviews doesn't mean it's a bad paper. Conference reviews are pretty random: a lot of good work is rejected, and a mountain of trash is accepted.

At the moment we're massively short on competent reviewers, so the level of commentary on OpenReview is... "oh look, my racist uncle commented on cnn.com again"-level abysmal.
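
For the curious, here's a minimal sketch of pulling NeurIPS 2023 notes with the openreview-py client (the venue id follows the pattern in the URL above; the exact calls and the v2 endpoint are my assumptions, so check the OpenReview docs):

    import openreview

    # Anonymous read-only client; NeurIPS 2023 appears to live on the newer v2 API.
    client = openreview.api.OpenReviewClient(baseurl="https://api2.openreview.net")

    # Fetch publicly visible submissions for the venue.
    notes = client.get_all_notes(content={"venueid": "NeurIPS.cc/2023/Conference"})
    print(len(notes), "visible submissions")
    for note in notes[:3]:
        # In the v2 API, content fields are wrapped as {"value": ...}.
        print(note.content["title"]["value"])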


+1 to this. I've been moving away from ML conferences because the review process is so bad; it's an incredible amount of wasted work responding to reviews and resubmitting after getting a bad crop of reviewers. Honestly, it also makes me despair for the quality of the conferences themselves.


Sadly, if you're an academic, it's a game you must play. My students need papers so that they can get jobs, and I need them to get grants.

But yeah. We all waste an incredible amount of time resubmitting to another conference for no reason at all.


For my group, there are journals and a whole ecosystem outside of the ML conference space related to the actual application area... We can go to a top-tier journal in the application domain and get better reviewers.


This is a popular site used by various conferences for peer review. It's useful for getting expert opinions on papers, but probably too noisy for someone not involved in research.


It’s also useful for getting hot takes from reviewers who didn’t read the paper carefully. Though even in those cases, the author responses can sometimes be pretty illuminating.


What happened with BloombergGPT? The paper was published in March, and as far as I know, they haven't launched anything.


AdaptLLM appeared to be just as capable.


Pretty sure it was never intended for external consumption.



