OT but my intuition says that there’s a spectrum - non thinking models - thinkin...

futureshock · 2026-02-12T20:33:35 1770928415

I think step 4 is the agent swarm. Manager model gets the prompt and spins up a swarm of looping subagents, maybe assigns them different approaches or subtasks, then reviews results, refines the context files and redeploys the swarm on a loop till the problem is solved or your credit card is declined.
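A rough sketch of that manager loop, not how any real product does it. `run_swarm`, `propose` and `check` are hypothetical names; `propose` stands in for a subagent call and `max_rounds` stands in for the credit limit.

```python
from typing import Callable, List, Optional


def run_swarm(
    task: str,
    propose: Callable[[str, List[str], int], str],  # hypothetical subagent call
    check: Callable[[str], bool],                   # hypothetical verifier
    n_agents: int = 4,
    max_rounds: int = 3,                            # the "credit card" budget
) -> Optional[str]:
    """Redeploy a swarm of subagents until one result passes or the budget runs out."""
    notes: List[str] = []  # shared context that the manager refines between rounds
    for _ in range(max_rounds):
        # Each subagent sees the task, the accumulated notes, and its own id
        # (the id is where you could hand out different approaches or subtasks).
        results = [propose(task, notes, agent_id) for agent_id in range(n_agents)]
        # Manager review: accept the first result that passes the check.
        for result in results:
            if check(result):
                return result
        # Nothing passed: fold the failed attempts into the context and go again.
        notes.extend(results)
    return None  # budget exhausted
```

The only real difference from a single looping agent is the fan-out per round and the shared notes, which may be why it is hard to separate cleanly from step 3.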

jasondigitized · 2026-02-13T04:48:19 1770958099

So Google Answers is coming back?!?!?!

simianwords · 2026-02-12T20:39:54 1770928794

i think this is the right answer

edit: i don't know how this is meaningfully different from 3

NitpickLawyer · 2026-02-12T17:45:06 1770918306

> best of N models like deep think and gpt pro

Yeah, these are made possible largely by better use at high context lengths. You also need a step that gathers all the Ns and selects the best ideas / parts and compiles the final output. Goog have been SotA at useful long context for a while now (since 2.5 I'd say). Many others have come with "1M context", but their usefulness after 100k-200k is iffy.
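For reference, the two simplest versions of that "gather the Ns and pick" step look roughly like this. It is a toy sketch: `majority_vote` and `best_of_n` are hypothetical names, and `score` stands in for whatever judge or reward model you have. The real systems presumably do something richer, such as merging parts of several candidates.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """maj@n: return the answer that appears most often among the n samples."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """best-of-n: return the candidate that the (assumed) scorer rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)
```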

What's even more interesting than maj@n or best of n is pass@n. For a lot of applications you can frame the question and search space such that pass@n is your success rate. Think security exploit finding. Or optimisation problems with quick checks (better algos, kernels, infra routing, etc). It doesn't matter how good your pass@1 or avg@n is, all you care about is that you find more as you spend more time. Literally throwing money at the problem.
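Back-of-the-envelope version of why that works, assuming each attempt succeeds independently with probability p. That is an idealisation, since real samples from one model are correlated. Under it, pass@n = 1 - (1 - p)^n. The function names here are made up for illustration.

```python
import math


def pass_at_n(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def attempts_needed(p: float, target: float) -> int:
    """Smallest n with pass@n >= target, under the same independence assumption."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

Under that assumption, a model that finds the exploit 1% of the time per attempt needs 69 attempts for better-than-even odds. With a cheap automatic check, that is purely a budget question.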

mnicky · 2026-02-12T17:50:27 1770918627

> can a sufficiently large non thinking model perform the same as a smaller thinking?

Models from Anthropic have always been excellent at this. See e.g. https://imgur.com/a/EwW9H6q (top-left Opus 4.6 is without thinking).

simianwords · 2026-02-12T17:54:36 1770918876

It's interesting that Opus 4.6 added a parameter to make it think extra hard.

andy12_ · 2026-02-13T12:00:56 1770984056

The difference between thinking and no-thinking models can be a little blurry. For example, when doing coding tasks Anthropic models with no-thinking mode tend to use a lot of comments to act as a scratchpad. In contrast, models in thinking mode don't do this because they don't need to.

Ultimately, the only real difference between no-thinking and thinking models is the amount of tokens used to reach the final answer. Whether those extra scratchpad tokens are between <think></think> tags or not doesn't really matter.