As LLMs are productionised and commodified, the changes they pick up tend to be enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but MoE models are much more efficient for parallel batched inference.
Of course people don't say it, but there are many cases where reported algorithmic improvements are attributable to poor baseline tuning or shoddy statistical treatment. Tao is exhibiting a lot more epistemic humility than most researchers, who probably have stronger incentives to market their work and publish.
Inferior in what sense? Genie 3 is addressing a fundamentally different problem to a physics sim or procgen: building a good-enough (and broad-enough) model of the real world to train agents that act in the real world. Sims are insufficient for that purpose, hence the "sim2real" gap that has stymied robotics development for years.
Genie 3 is inferior in the sense you just described: the sim2real gap would be greater, because it's a less accurate model of the aspects of the world that are relevant to robotics.
The reports are definitely bland, but I find them very helpful for discovering sources. For example, if I'm trying to answer an academic question like "has X been done before," sending something to scour the internet and find me examples to dig into is really helpful - especially since LLMs have some base knowledge which can help with finding the right search terms. It's not doing all the thinking, but those kinds of broad overviews are quite helpful, especially since they can just run in the background.
I do that too. I wonder how much of it is the LLM being helpful, and how much is the RAG pipeline somehow surfacing better references for the LLM than a Google search would?
Generally you train all the experts simultaneously. The benefit of MoEs is cheap inference: you only use the active expert parameters, which constitute a small fraction of the total parameter count. For example, DeepSeek R1 (which is especially sparse) only activates about 1/18th of its total parameters per token.
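To make the routing concrete, here's a minimal top-k gating sketch in PyTorch (a toy illustration, not DeepSeek's actual architecture; names like `num_experts` and `top_k` are just for the example). Each token only passes through `top_k` of the experts' FFN weights, which is where the "small fraction of parameters per token" comes from:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy top-k MoE layer: each token runs through only k of n experts."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # choose k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 64)   # 16 tokens
y = TinyMoE()(x)          # only 2 of 8 experts' weights touch each token
```

During training, gradients only flow into the experts each token was routed to (plus the router), so all experts are trained "simultaneously" across a batch even though each token only sees a few of them.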
That's an interesting idea - it sounds similar to the principles behind low-precision models like BitNet (where each weight is +1, -1, or 0).
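For a sense of what that looks like, here's a rough sketch of absmean-style ternary quantization in the spirit of BitNet b1.58 (my approximation, not the exact published recipe): scale by the mean absolute weight, then round every entry to one of {-1, 0, +1}.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Absmean-style ternary quantization (BitNet-flavoured sketch):
    scale by the mean |weight|, then round each entry to -1, 0, or +1."""
    gamma = w.abs().mean()
    return (w / (gamma + eps)).round().clamp_(-1, 1)

w = torch.randn(4, 4)
print(ternarize(w))   # every entry is now in {-1, 0, +1}
```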
That said, I know DeepSeek accumulate their gradient updates in fp32 even though they use fp8 for inference, and a recent paper shows that RL + LLM training is shakier in bf16 than fp16. Both suggest that numerical precision in the gradients still matters.
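The bf16 vs fp16 point comes down to mantissa bits (bf16 has 7, fp16 has 10), which you can see directly with a small PyTorch check on a gradient-sized update (illustrative numbers, not from any particular training run):

```python
import torch

w, delta = 1.0, 1e-3   # weight and a small gradient-sized update

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    updated = (torch.tensor(w, dtype=dtype) + torch.tensor(delta, dtype=dtype)).item()
    print(dtype, updated)

# fp16 (10 mantissa bits) resolves 1.0 + 1e-3 reasonably well;
# bf16 (7 mantissa bits) rounds the update away entirely near 1.0;
# fp32 has no trouble, which is why mixed-precision training usually
# keeps fp32 master weights for the optimizer update.
```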
Cutting people at FAIR is a real shame though. Great models like DINO and SAM have had a massive positive impact; hopefully that work doesn't slow in favour of LLM-only development at MSL.