We are not anywhere near 160 IQ assistants; otherwise there would have been a blooming of incredible one-person projects by now.
By 160 IQ, there should have been people researching ultra-safe languages with novel reflection types, enhanced by brilliant thermodynamics-inspired SMT solvers. There would be more contributors to TLA+ and TCS, number-theoretic advancements, and tools like TLA+ and reflection types would be better integrated into everyday software development.
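To make "better integrated into everyday development" concrete, here is a minimal sketch of SMT-backed checking using Z3's Python bindings; the clamp routine and the property checked are my own hypothetical example, not anything described in this thread:

```python
# Minimal sketch: using the Z3 SMT solver (pip install z3-solver) to check
# a safety property of a tiny clamp routine before shipping it.
from z3 import Int, If, Solver, Not, And, sat

x, lo, hi = Int("x"), Int("lo"), Int("hi")

# Symbolic model of: def clamp(x, lo, hi): return max(lo, min(x, hi))
clamped = If(x < lo, lo, If(x > hi, hi, x))

# Safety property: whenever lo <= hi, the result stays within [lo, hi].
prop = And(clamped >= lo, clamped <= hi)

s = Solver()
s.add(lo <= hi)    # precondition
s.add(Not(prop))   # ask the solver for a counterexample
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("property verified: clamp never leaves [lo, hi]")
```

The idea is that this kind of check runs automatically in CI rather than as a specialist exercise; Z3 proves the property here because the negation is unsatisfiable.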
There would be deeper, cleverer searches across possible reagents and their combinations to add to watch lists, expanding and improving on existing systems.
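For concreteness, a hedged sketch of what such a watch-list expansion pass might look like; the reagent names and hazard_score function are placeholders I'm assuming for illustration, not a real screening API:

```python
# Hypothetical sketch: enumerate small combinations of known precursors and
# flag any combination whose (assumed) hazard score crosses a threshold.
from itertools import combinations

known_precursors = ["reagent_a", "reagent_b", "reagent_c", "reagent_d"]
watch_list = set()

def hazard_score(combo):
    # Placeholder: a real system would query a learned model or a
    # curated rules database here, not a toy formula.
    return len(combo) * 0.3

THRESHOLD = 0.8
for size in (2, 3):
    for combo in combinations(known_precursors, size):
        if hazard_score(combo) >= THRESHOLD:
            watch_list.add(frozenset(combo))

print(f"{len(watch_list)} combinations flagged for review")
```

A genuinely clever search would prune and prioritize this combinatorial space rather than enumerate it exhaustively, which is exactly where a genius-level assistant would help.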
Sure, a world where the average IQ abruptly shifts upwards would mean a bump in brilliant offenders, but it would also mean a far larger bump in genius-level defenders.
I agree we're not at 160 IQ general assistants, yet.
But just a few years ago, I'd have said that prospect was "maybe 20 years away, or longer, or even never". Today, with the recent rapid progress in LLMs (& other related models), with many tens of billions in new investment, & plentiful gains seemingly possible from just "scaling up" (to say nothing of concomitant rapid theoretical improvements), I'd strongly disagree with "not anywhere near". It might be just a year or a few away, especially in well-resourced labs that aren't sharing their best work publicly.
So yes, all those things you'd expect with plentiful fast-thinking 160 IQ assistants are things that I expect, too. And there's a non-negligible chance those start breaking out all over in the next few years.
And yes, such advances would upgrade prudent & well-intentioned "defenders", too. But are all the domains-of-danger symmetrical in how upgraded attackers and defenders affect them? For example, if you think "watch lists" of dangerous inputs are an effective defense – I'm not sure they are – can you generate & enforce those new "watch lists" faster than completely-untracked capacities & novel syntheses are developed? (Does your red-teaming to enumerate risks actually create new leaked recipes-for-mayhem?)
That's unclear, so even though I am in general optimistic about AI, & wary of any centralized-authority "pause" interventions proposed so far, I take well-informed analyses of the risks seriously.
And I think casually & confidently judging these AIs to be categorically incapable of synthesizing novel recipes-for-harm, or being certain that amoral genius-level AI assistants are so far away as to be beyond any horizon of concern, reflects gaps in understanding current AI progress, its velocity, & even its potential acceleration.
I think this argument doesn't work if the model is open source, though.
First, it's unclear how all these defensive measures are supposed to help if a bad actor is using an LLM for evil on their personal machine. How do reflection types or watch lists help in that scenario?
Second, if the model is open source, a bad actor could use it for evil before good actors are able to devise, implement, and stress-test all the defensive measures you describe.