I can't find a GitHub or email for Hannah - if you're reading this, I'd like to add Australian energy price data via Open Electricity[0] to the data (reach out via my profile)
So we're all going to hold onto Sequoia like we did Snow Leopard. The only reason I'm not buying a new Mac at the moment is that it would force me to upgrade.
The situation is absurd.
FWIW, switching to the Sequoia beta channel in System Settings killed the nag notifications for me (I believe the profile as defined in the OP will stop all updates, which you probably don't want)
From my understanding, Anthropic is now hiring a lot of experts in different fields who write content used to post-train models to make these decisions, and the results are constantly adjusted by the Anthropic team itself.
This is why the stacks in the report, and what CC suggests, closely match the latest developer "consensus".
Your suggestion would degrade the user experience and be noticed very quickly.
I guess that's why I'm not seeing anyone trying to build a marketplace for agent skills files. The LLM API will read in any skills you add to context as plain text, and then use your content to help populate its own skills files.
That's how Google search worked back when it was at its most useful. They had a large "editorial team" that manually tweaked page ranks on a site-by-site basis.
The core graph reputation based page ranking algorithm lasted for a hot second before people started gaming it. No idea what they do these days.
The providers have all found ways to optimize for higher cache hit rates within their own harness: common system prompts and so on. The more users hitting the cache, the more dramatically the cost of inference goes down.
What bothers me about much of the discussion around here about providers disallowing other harnesses on subscription plans is the complete lack of awareness that economies of scale from common caching practices across more users are what enable the higher, cheaper quotas subscriptions give you.
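To make the economics concrete, here's a rough back-of-the-envelope sketch. All prices and the cache discount are made up for illustration; they're not any provider's actual rates:

```python
# Hypothetical per-token prices, to illustrate why prefix caching lowers
# inference cost when many users share the same harness system prompt.
PRICE_UNCACHED = 3.00 / 1_000_000  # $/input token (invented rate)
PRICE_CACHED = 0.30 / 1_000_000    # cached prefix tokens (assumed 90% discount)

def request_cost(prefix_tokens: int, unique_tokens: int, cache_hit: bool) -> float:
    """Cost of one request; the shared prefix is cheap only on a cache hit."""
    prefix_price = PRICE_CACHED if cache_hit else PRICE_UNCACHED
    return prefix_tokens * prefix_price + unique_tokens * PRICE_UNCACHED

# A harness with a common 10k-token system prompt and 2k tokens of user input:
miss = request_cost(10_000, 2_000, cache_hit=False)
hit = request_cost(10_000, 2_000, cache_hit=True)
print(f"cache miss: ${miss:.4f}, cache hit: ${hit:.4f}")
# → cache miss: $0.0360, cache hit: $0.0090
```

With a fleet of users on an identical harness, almost every request after the first is a cache hit on the shared prefix, which is why the provider can afford to sell subscription quotas below nominal per-token pricing.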
I feel like a lot of this would go away if they made a different API for the "only for use with our client" subscriptions. A different API from the generic one, one that moves some of their client behaviors up to the server, seems like it would solve all this. People would still reverse engineer it to use it in other tools, but it would be less useful (due to the forced scaffolding instead of an entirely generic completions API) and would also ease the burden on their inference compute.
I'm sure they went with reusing the generic completions API to iterate faster and to make it easier to support both subscription and pay-per-token users in the same client, but it feels like they're burning trust/goodwill when a technical solution could at least be attempted.
> I feel like a lot of this would go away if they made a different API for the “only for use with our client” subscriptions.
They literally did exactly that. That's what people who "reverse engineer to use it in other tools" are being cut off from (Antigravity access, i.e. the private "only for use with our client" subscription - not the whole account, btw).
Nothing here is new or surprising; the problem has been the same since Anthropic released Claude Code and the Max subscriptions. The first thing people did then was try to auth regular API use with Claude Code tokens, so they wouldn't have to pay the API prices they were supposed to.
What I was getting at is that the current API is still a generic inference endpoint, just with OAuth instead of an API key. What I'm suggesting is that they move some of the client logic up to the OAuth endpoint so it's no longer a generic inference endpoint (e.g. the system prompt is static, context management is done on the server, etc.). I assume they could get it to a point where it's no longer useful for a general-purpose client like OpenClaw.
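A toy sketch of what such a restricted endpoint could look like, purely hypothetical; every name here is invented and none of this reflects any provider's actual API. The point is that the client sends only the new user turn, while the system prompt and conversation state live server-side:

```python
# Hypothetical server-side handler for a subscription-only endpoint.
# Because the system prompt is fixed on the server and context is
# assembled there, the endpoint can't be driven as a generic
# completions API by a third-party client.
from dataclasses import dataclass, field

STATIC_SYSTEM_PROMPT = "You are the first-party coding client."  # fixed server-side

@dataclass
class Session:
    turns: list = field(default_factory=list)

SESSIONS: dict = {}

def handle_turn(session_id: str, user_message: str) -> list:
    """Append the user turn and build the full prompt server-side."""
    session = SESSIONS.setdefault(session_id, Session())
    session.turns.append({"role": "user", "content": user_message})
    # Server-side context management would go here (truncation, tool
    # injection, etc.) - none of it controllable by the client.
    prompt = [{"role": "system", "content": STATIC_SYSTEM_PROMPT}] + session.turns
    return prompt  # handed to inference; never echoed back to the client

prompt = handle_turn("session-1", "fix the failing test")
print(len(prompt))  # → 2 (server-held system prompt + one user turn)
```

A reverse-engineered client could still call this, but it would be stuck with the forced scaffolding rather than free-form completions, which is the trade-off described above.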
> What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better
100% agree - I spun out an internal tool I've been using to close the loop on website audits in agents (focused more on website security + perf + SEO etc. rather than appsec), and the results so far have been remarkable:
Human-written rules, with an agent step that dynamically updates config to squash false positives (with verification) and find issues, while still allowing the LLM to reason.
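A minimal sketch of that loop, with all rule names and data invented; the real tool's rules and agent step are obviously more involved. Static rules flag issues, the agent triages likely false positives, and each proposed suppression is verified by re-running the rules before it lands in the config:

```python
def run_rules(rules, page, suppressions):
    """Apply human-written rules, skipping suppressed finding IDs."""
    return [r["id"] for r in rules
            if r["id"] not in suppressions and r["check"](page)]

def agent_triage(finding_id, page):
    """Stand-in for the LLM step that judges false positives.
    Assumption for this sketch: missing alt text on a decorative
    image is a false positive."""
    return finding_id == "missing-alt" and page.get("decorative", False)

rules = [
    {"id": "missing-alt", "check": lambda p: not p.get("alt")},
    {"id": "no-https", "check": lambda p: not p["url"].startswith("https")},
]
page = {"url": "https://example.com", "alt": "", "decorative": True}

suppressions = set()
for fid in run_rules(rules, page, suppressions):
    if agent_triage(fid, page):
        candidate = suppressions | {fid}
        # Verification: only keep the suppression if re-running the
        # rules confirms it actually removes the finding.
        if fid not in run_rules(rules, page, candidate):
            suppressions = candidate

print(sorted(run_rules(rules, page, suppressions)))  # → [] (remaining findings)
```

The verification re-run is what keeps the agent honest: a suppression the agent hallucinates but that doesn't change the rule output never makes it into the config.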
My understanding is that all recent gains come from post-training, and no one (publicly) knows how much scaling pretraining will still help at this point.
Happy to learn more about this if anyone has more information.
I still remember Gemini 1.5 Ultra and GPT-4.5 as extremely strong in some areas that no benchmark captures. It was probably not economical to serve them on a $20 subscription, but they felt different, and smarter in some ways. The benchmarks seem to be missing something, because Flash 3 was very close to 3 Pro on some benchmarks, but much, much dumber.
[0] https://explore.openelectricity.org.au/