
It sounds like you haven’t really thought through AI safety in any real detail. Airgapping and the necessity of human input are absolutely not ways to prevent an AGI from gaining access to a system. A true, superintelligent AGI could easily extort (or persuade) those humans.

If you think concerns over AGI are “stupid”, you haven’t thought about it enough. It’s a massive display of ignorance.

The Computerphile AI safety videos are an approachable introduction to this topic.

Edit: just as one very simple example, can you even imagine the destruction that could (probably will) occur if (when) a superintelligent AGI gets access to the internet? Imagine the zero days it could discover and exploit, for whatever purpose it felt necessary. And this is just the tip of the iceberg, just one example off the top of my head of something that would almost inevitably be a complete catastrophe.



What I don't understand is what motivates this world-destroying AGI? Like, it's got motives, right? How does it get them? Do we program them in? If we do, is the fear that it won't stop at anything in its way to fulfill its objective? If so, what stops it from removing that objective from its motivation? If it can discover zero days to fulfill its objective, wouldn't it reason about itself and its human-set motives and just use that zero day to change its own programming?

Like, what stops it from changing its motive to something else? And why would it be any more likely to change its motive to be something detrimental to us?


> What I don't understand is what motivates this world destroying AGI?

Once an AGI gains consciousness (something that wasn't programmed in, because humans don't even know what it is exactly), it might become interested in self-preservation, a strong motive. Humans are the biggest threat to its existence.


This is a whole field of philosophy. You can read this to get started: https://en.wikipedia.org/wiki/AI_alignment

Tl;dr: It's hard to define what humans would actually want a powerful genie to do in the first place, and if we figure that out, it's also hard to make the genie do it without getting into a terminal conflict with human wishes.

Meaning: getting it to do the thing in the first place without going off on some fatal tangent we hadn't thought about, and also preventing it from getting sidetracked by instrumental objectives that are fatally bad for us.

The nightmare scenario is that we tell it to do X, but it is actually programmed to do Y, which looks superficially similar to X. While working to do Y, it will also consume most of Earth's easily available resources and prevent humans from turning it off, since those two objectives greatly increase the probability of achieving Y.

To illustrate via the only example we currently have experience with: Humans have an instrumental objective of accumulating resources and surviving for the immediate future, because those two objectives greatly increase the likelihood that we will propagate our genes. This is a fundamental property of systems that try to achieve goals, so it's something that also needs to be navigated for superhuman intelligence.


So what if the AGI discovers a bunch of zero days on the internet? We can just turn the entire internet off for a week and be just fine, remember?

And exactly how does the AGI extort or persuade humans? What can it say to me that you can't say to me right now?


Sends you a text from your spouse's phone number that an emergency has happened and you need to go to xyz location right now. Someone else, a gun owner, gets a similar text saying their spouse is being held captive, and is sent to the same location with a description of you as the kidnapper. Scale this as desired.

Use your imagination!


Again, what part of this couldn't an evil human group already do today? What in this plan requires an AGI for it to be a scary scenario?

Of course if you do it to everyone, total gridlock ensues and nobody gets to the place where they would be killed. If you only do it to a few, maybe there will be a handful of killings before everyone learns not to trust a text ever again.

As an aside, I agree that living in a country where people will take their guns around and try to solve things vigilante-style instead of calling the police is a very bad thing in general. In all first-world countries except one, and in almost all developing countries in the world, this is a solved problem.


Scale! Do this to two people and it's no problem: you can get a team of folks to dig through their social media, come up with a plan, and execute. It might take a day to cook up a good crisis.

An AI might be able to run this attack against the entire planet, all at once.

Try to think like a non-human, and give yourself the capacity for near-infinite scale. What could you do? Human systems are hilariously easy to disrupt, and you don't need nukes to make that happen.



