You've brought up Pascal's Wager twice in this thread, as if it really can end an argument by itself. Just because you don't like what high impact / low probability EV calculations do in theology doesn't mean you can wave away high impact / low probability EV calculations in other domains.



I agree that "that's Pascal's wager!" isn't a reasonable response to someone arguing that, say, a 1% or 10% extinction risk is worth taking seriously. If you think the probability is infinitesimally small but that we should work on it just in case, then that's more like Pascal's wager.

I think the whole discussion thread has a false premise, though. The main argument for working on AGI accident risk is that it's high-probability, not that it's 'low-probability but not super low.'

Roughly: it would be surprising if we didn't reach AGI this century; it would be surprising if AGI exhibited roughly human levels of real-world capability (in spite of potential hardware and software improvements over the brain) rather than shooting past human-par performance; and it would be surprising if it were easy to get robustly good outcomes out of AI systems much smarter than humans, operating in environments too complex for it to be feasible to specify desirable v. undesirable properties of outcomes. "It's really difficult to make reliable predictions about when and how people will make conceptual progress on a tough technological challenge, and there's a lot of uncertainty" doesn't imply "the probability of catastrophic accidents is <10%" or even "the probability of catastrophic accidents is <50%".


>>high impact / low probability EV calculations in other domains

Yeah, but there isn't a calculation of the risk/probability of AGI; there's just an appeal to the idea.


That's… not at all true? There have been numerous thorough attempts at coming up with as reasonable an estimate as possible, involving in-depth conversations with most of the experts in the field. Maybe it's not strictly a "calculation", but it's a hell of a lot more than just an appeal to the idea of AGI.


Except that there's no such thing as the field of "AGI" (or if there is, it's a subfield of philosophy). Asking modern ML and deep learning researchers about their thoughts on AGI is like asking the Wright brothers about a Mars mission.

Sure, they're the closest thing we have to "experts", but there's not just one, but likely 5 or 10 more field-changing, if not world-changing, leaps we need to make before the technologies we have resemble anything like AGI.

And that's ignoring the people who ascribe deity-like powers to some potential AGI. Air gap the computer and control the inputs and outputs. We can formally prove what a system is capable of. That fixes the problem.


> Air gap the computer and control the inputs and outputs. We can formally prove what a system is capable of. That fixes the problem.

Debates about superhuman AI have focused quite a lot on what it would mean to "control the inputs and outputs" while still being able to get some kind of benefit from the AI.

You can indeed formally prove that a computer will or won't do certain things, so you could use that for isolation purposes. But in order to be useful, the AI needs to interact with people and/or the world in some way. Otherwise it might as well be switched off or never have been built in the first place.

If it's really a superhuman intelligence with superhuman knowledge about the world, then interacting with people is where the risk creeps back in, because the AI could make suggestions, recommendations, requests, offers, promises, or threats. Although there are plenty of ideas about limiting the nature of questions and answers, having some kind of separate person or machine judge whether information from the AI's communications should be used or how, or limiting what the AI is programmed to attempt to do or how, none of these measures are straightforward to formally prove correct in the way that simpler isolation properties are.

If we made contact with intelligent aliens, would formal proofs of correctness of the computers through which we (say, exclusively) communicate with them guarantee that they couldn't massively disrupt our society by means of what they had to say?


Your opinion about how far off AGI is is a totally reasonable one, and some AI experts agree with it. Other AI experts disagree, and think that AGI is basically tractable with current research methods plus a breakthrough or two.

Yes, there's some additional uncertainty about whether you're even asking the right people. But you can take account of that uncertainty, both by widening your error bars and by asking people from other fields as well (including philosophers, e.g. Bostrom). What you can't do is just throw up your hands and say "it's unknowable".

This is all beside the original point, which is that these arguments are much more rigorously grounded than just a wave in the direction of AGI. Could they be better? Sure, and there are a lot of people who'd be interested in seeing some better estimates. But for now, they're the best we have, and they're a completely reasonable thing to base decisions on.

> We can formally prove what a system is capable of. That fixes the problem.

That's exactly what some of the people working on this problem are trying to do, but it's a hell of a lot harder than you make it sound. Formal methods have come a really long way, but they're not even remotely close to being able to prove that an AGI system is safe (yet).


You don't need to prove anything about an AGI system if you can prove things about its I/O capabilities. I recognize that proving things about machine learning models is very hard; I tried to do some research in that area and got practically nowhere.

And I think you and I have very different definitions of "rigorous". Like I said, unless you'd take Wilbur Wright's thoughts on a Mars mission seriously, I don't think you should give much thought to Geoff Hinton's thoughts about when we'll get AGI, and I say that having enormous respect for him and his achievements (and using him as a simple example).

It's pseudoscience. We're notoriously bad at predicting the future. I don't see any reason to trust people going on about the dangers of AGI any more than the futurologists of my parents' generation, who predicted flying cars and interstellar travel, but missed smartphones.


> You don't need to prove anything about an AGI system if you can prove things about its I/O capabilities.

But then you can't benefit from it, or you can only benefit from it in narrowly predefined ways.


Perhaps, but if the alternative is either incredible fear of the technology, or the technology potentially killing humanity, then "making great strides in a select few fields" seems quite good.


I really, really don't think modern AI capabilities and AGI are as dissimilar as airplanes and rockets. Neither do many AI researchers. You're obviously welcome to disagree, but you don't just get to declare their opinions invalid because you think they're out of their depth.


I don't think AI researchers are "out of their depth" (I am one myself). And yes, there is room for different opinions.

However, I believe that the ones who say sensational things about AI doomsday are the ones who are disproportionately quoted by the media.


This is also true. You see people like Hawking and Musk talking about the AI Apocalypse (when they aren't even experts in the field), and it gets people scared when, at this point, there's very little reason to be.

On the other hand, I don't actually know Hinton's opinion on these things, and he might agree with me (in which case, you absolutely should listen to him!). But instead the loudest voices are perhaps the most ridiculous.

That said, I do think that if I asked you to make a bet with me on when we would reach AGI, even at 50% confidence, your error bars would be on the order of a century.


"We can formally prove what a system is capable of."

This is not true. By Rice's theorem, either the system is too dumb to be useful, or we can't prove what it can do.


This is an abuse of Rice's theorem (which seems to be getting more common).

Rice's theorem says that no program can decide an arbitrary non-trivial semantic property for all programs, not that no properties of programs can be proven. There are useful programs about which non-trivial properties can be, and have been, proven, and Rice's theorem is no limit on the complexity of an individual program about which a property may be proven, or the complexity of a property which an individual program may be proven to exhibit.

Usually programs with provable properties have been intentionally constructed to make it possible to prove those properties, rather than having someone come along and prove a property after-the-fact.
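To make the distinction concrete, here's a toy Python sketch (purely illustrative, nothing to do with AI): a specific program with a provable non-trivial property, next to the kind of universal decider that Rice's theorem actually rules out.

    def clamp(x: float) -> float:
        # Non-trivial semantic property, provable for THIS program:
        # for every finite input, the result lies in [0.0, 1.0].
        return max(0.0, min(1.0, x))

    def always_returns_in_unit_interval(program_source: str) -> bool:
        # Rice's theorem is about a function like this one: no
        # implementation can answer the question correctly for EVERY
        # possible input program. That doesn't stop us proving the
        # property above for one carefully constructed program.
        raise NotImplementedError("undecidable in general")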


My argument is as follows. You're right, and I'm appealing to a stronger Wolfram-esque version of Rice, together with the fact that humans readily throw non-essential goals under the bus in pursuit of speed.

* Building AI is a race against time, and in such races, victory is most easily achieved by those who can cut the most corners while still successfully producing the product.

* As a route to general AI, a neural architecture seems plausible. (Not at the current state-of-the-art, of course.)

* Neural networks (as they currently stand) are famously extremely hard to analyse: certainly we have no good reason to believe they're more easily analysed than a random arbitrary program.

* A team which is racing to make a neural-architecture AI has little incentive to even try to make their AI easy to analyse. Either it does the job or it doesn't. (Witness the current attempts to produce self-driving cars through deep learning.) Any further effort spent on making an easily-analysable AI is effort which is wasting time that another team is using just to build the damn thing.

* Therefore, absent a heroic effort to the contrary, the first AI will be a program which is as hard as a random arbitrary program to analyse. And, as much as I hate to appeal to Wolfram, he has abundantly shown that random arbitrary programs, even very simply-specified ones, tend to be hard to analyse in practice.

(My argument doesn't actually require a neural architecture of the AI; it's just a proxy for a general unanalyseable thing.)


1. I'm not sure that I agree. Not all research is a race against time. But, perhaps you're right, I'll accept this.

2. It's certainly the most plausible thing we have now. I'm not sure that makes it plausible, but it's better than anything else, so okay.

3. This depends on what you mean. Neural networks are actually significantly easier to analyze than arbitrary programs: when you essentially restrict yourself to two operations (multiplication and a sigmoid or ReLU), things get a lot easier to analyze. Here are some questions we can answer about a neural network that we can't answer about an arbitrary program: "Will this halt for this input?", "Will this halt for all inputs?", "What effect will a mild perturbation of this input have on the output?" (see the sketch after this list). These follow from finiteness and differentiability, which are not attributes that a normal program has. (Caveat: this gets more difficult with things like RNNs and NTMs, but afaik is still true.) The questions we find difficult to answer for a neural network are very different from those for a normal program: namely "How did this network arrive at these weights as opposed to these other ones?" and, relatedly, "What does this weight or set of weights represent?" But I don't think there's any indication that those questions are impossible to answer (and often we can answer them, as with facial recognition networks, where we can clearly see that successive layers detect gradients, curves, facial features, and then eventually entire faces).

4. Agreed. There's no real reason to know why it works if it works.

5. I think you can tell, but I don't think this holds.
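Here's the sketch promised in (3): a first-order answer to "what effect does a mild perturbation of the input have on the output?", read off the input-output gradient. (This assumes PyTorch, and the tiny network is made up purely for illustration.)

    import torch

    # Toy feedforward network: linear -> ReLU -> linear.
    net = torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 1),
    )

    x = torch.randn(4, requires_grad=True)
    y = net(x)[0]  # single scalar output

    # The network is a finite, almost-everywhere-differentiable
    # composition, so the gradient dy/dx exists and answers the
    # local sensitivity question directly.
    grad = torch.autograd.grad(y, x)[0]

    # First-order estimate: |delta_y| is roughly bounded by
    # ||dy/dx|| * ||delta_x||
    eps = 1e-3
    print("a perturbation of norm", eps,
          "changes the output by roughly at most",
          (grad.norm() * eps).item())

You can't do anything like this for an arbitrary program, where a one-bit change to the input can send execution down a completely different path.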


Those arguments are plausible, and thanks for the clarification.

I just hate to see Rice's theorem interpreted as "nobody can ever know if a program is correct or not". People have been making a ton of progress on knowing if (some) programs are correct, and Rice's theorem never said they can't.


An airgapped computer is not going to be capable of taking over the world.

Further, Turing completeness is not required to be "useful". You can get to the moon without Turing completeness.
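For what it's worth, here's a toy Python sketch of the kind of thing I mean: a fragment restricted to bounded loops, which is not Turing complete and trivially guaranteed to halt, yet still does real work. (The example is made up, obviously.)

    # Restricted style: only bounded `for` loops over fixed-length data,
    # no `while`, no recursion. Not Turing complete, but every call
    # provably terminates.

    def horner_eval(coeffs, x):
        # Evaluate a polynomial with Horner's rule; the loop runs
        # exactly len(coeffs) times, so termination is immediate.
        acc = 0.0
        for c in coeffs:
            acc = acc * x + c
        return acc

    # Position under constant acceleration: p(t) = 0.5*a*t^2 + v0*t + p0
    print(horner_eval([0.5 * -9.8, 5.0, 100.0], 2.0))  # -> 90.4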


I didn't address airgappedness at all, but you may still be wrong: we use the superintelligence's output (or else why would we have created it?), so we are the channel by which it affects the world. Anyway, who knows what can be done by a superintelligence which can control a stream of electrons emitted from its own CPU!


Well, exactly as much as any other CPU can do by solely controlling its output electrons: not much. Let's not ascribe deity-like powers to things that we can understand fairly well.


Have you ever seen a CPU running a program which is trying to communicate with the outside world through its electron side-channels? As far as I can see, your argument is "no computer has ever done this, and also we understand computers well enough to know that no computer ever will". The first clause is obvious, since no computer has ever been made to do this. The second clause is assuming without proof that we will never make a superintelligent AI. Just because you don't see how to exploit the side-channels of your system, doesn't mean they're unexploitable. This is the lesson of all security research ever.


Pray explain how you could use the electrons coming out of a CPU as a side channel. I don't need anything specific, but I'd prefer something that doesn't sound like it's taken out of a Heinlein novel.

You're again using terms incorrectly. A "side channel" implies that someone is listening to information that is unintentionally leaked. Unless your expectation is that this CPU is going to start side-channeling our minds with the EM waves it's emitting (which again, "deity-like attributes"), we'd need to be specifically listening to whatever "side channel" it uses, and that would require knowledge of and access to that side channel.

Something being able to send additional information over a side channel doesn't help unless that information is received, and so realistically, unless your hypothesis is "mind control/hacking the airwaves/whatever via sound waves the chip emanates" or similar, which are preposterous, it'll always be just as easy for the thing to transmit information via the normal channels.


A side channel is a channel through which information may leak because of the physical instantiation of an algorithm. It's not much of a stretch to include "things which let us manipulate the world" in that; do you have a better term? I thought the meaning was obvious, but apparently it's not: by "side channel" I here mean "unintended means of affecting the world by a mechanism derived from an accidental consequence of the physical implementation", by analogy with the standard "information"-related "side channel".


I think the closest conventional thing would be a sandbox escape/backdoor, although (not that I'm an expert) I've never heard of anything close to a sandbox escape using side-channel-like things. That said, most side channel attacks are either timing-based or involve things like the heat and power usage of the system.

The thing about all of these is that they generally let you get a small amount of data out that can sometimes help you. But again, without ascribing magic powers to the system, all the stuff it can directly affect (power draw, temperature, disk spin speeds, LED blink speeds, noises, even relatively insane things like EM emissions) can be controlled fairly easily, and no matter how smart it is, I don't see an AGI violating physics.
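For concreteness, this is roughly what a conventional timing side channel looks like, as a toy Python sketch (the secret and the sleep are made up to exaggerate the effect):

    import time

    def insecure_equals(secret: str, guess: str) -> bool:
        # Early-exit comparison: it returns as soon as a character
        # differs, so elapsed time leaks how many leading characters
        # of the guess were correct.
        for a, b in zip(secret, guess):
            if a != b:
                return False
            time.sleep(0.001)  # exaggerate the per-character cost
        return len(secret) == len(guess)

    secret = "hunter2"
    for guess in ["zzzzzzz", "huzzzzz", "huntzzz"]:
        start = time.perf_counter()
        insecure_equals(secret, guess)
        print(guess, "took", round(time.perf_counter() - start, 4), "s")

Note that the leak is a trickle of bits flowing out to someone who is already measuring the right thing; it's not a lever on the world.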


I distinctly remember a paper about AI figuring out how to either get wifi or send radio waves without access to the relevant hardware. Can't find the link at the moment though :/


I expect you're referring to this article:

https://www.damninteresting.com/on-the-origin-of-circuits/


Right, because there's no way even small computers can communicate through air. Just one tiny crack is all that's needed.

And that's not even going into things the AI might say that'll convince the gatekeepers to just voluntarily let it out.


This makes me think you don't know what the word "airgapped" means in this context.


Sure, you're counting on the AI not being able to exploit its hardware to communicate with anything. That seems like a huge assumption against a superintelligence.


Experts in the field of AI have a long history of being wrong. "AI is 20 years away" has become the standard joke in the field. There is no reason to grant expert opinion on this topic any weight.


What's your suggested alternative? It's really easy to criticize, but a lot harder to do any better. If you aren't suggesting any other methods, you're implicitly saying we should just ignore the problem. I really don't think that will end well.


My suggested alternative is prioritizing problems appropriately. This "Open Philanthropy Project" is a guy taking money that used to be for curing malaria (and I respect what he used to do on that front!) and giving it to his roommate.

You can do better solving a terrible problem that does exist than solving a sci-fi problem that doesn't exist.

I know it's usually uncouth to compare different charities like this, but that's exactly what Effective Altruism was supposed to be about, and this cause directly competes with curing malaria.

If more than zero AGI technology starts existing, we can re-prioritize. It makes no sense for a field to go from not existing to superhuman performance without anyone noticing.


Nobody thinks we won't notice when it's more imminent; the concern is that capability research has always been faster than safety research. If we wait until it's imminent, we risk having no chance of catching up in time. Why not devote a tiny fraction of our funding to starting the research now, so we can be ready on time? Worst case we're early... which in this context is a lot better than being late.


Worst case the safety research ends up causing the AGI to be actively antagonistic. Slightly less worst case it never leads anywhere and AGI ends up turning us into mouthless slugs anyway. Third-ish worst it makes AGI suborn-able and whoever gets to it first becomes our god-king.

many many worse situations later we get to:

It never leads anywhere but AGI never happens anyway and it just ends up being welfare-for-future-phobes instead of curing debilitating disease or buying every homeless person in a city a new pair of shoes or whatever.

"we're early" is in fact the best possible scenario, actually it's basically the only positive scenario.


You admit that you don't know what the actual probabilities are. Or you admit you have an extremely low confidence in your estimate of the probabilities.


Have you read any of the estimates? Basically every expert gives a huge confidence interval, as do most of the people actively working on AI risk.


Saying you are 95% confident AGI will come between 50 and 100 years from now (a "huge" confidence interval) is still far too confident.


I don't recall seeing 95% confidence intervals that narrow. Even the 90% confidence intervals I see are usually like 5-150 years or something. (Also, it would be super weird to me if the lower bound of someone's 95% CI was 50.)



