Our Values Are Complex and Fragile
The claim that we’ll need extreme precision to make safe, usable AIs is key to this book’s argument. So let’s back off for a moment and consider a few objections to the whole idea.
First, one might object to the whole idea of AIs making autonomous, independent decisions. When discussing the potential power of AIs, the phrase “Al-empowered humans” cropped up. Would not future AIs remain tools rather than autonomous agents? Actual humans would be making the decisions, and they would apply their own common sense and not try to cure cancer by killing everyone on the planet.
Human overlords raise their own problems, of course. The daily news reveals the suffering that tends to result from powerful, unaccountable humans. Now, we might consider empowered humans as a regrettable “lesser of two evils” solution if the alternative is mass death. But they aren’t actually a solution at all.
Why aren’t they a solution at all? It’s because these empowered humans are part of a decision-making system (the AI proposes certain approaches, and the humans accept or reject them), and the humans are the slow and increasingly inefficient part of it. As AI power increases, it will quickly become evident that those organizations that wait for a human to give the green light are at a great disadvantage. Little by little (or blindingly quickly, depending on how the game plays out), humans will be compelled to turn more and more of their decision making over to the AI. Inevitably, the humans will be out of the loop for all but a few key decisions.
Moreover, humans may no longer be able to make sensible decisions, because they will no longer understand the forces at their disposal. Since their role is so reduced, they will no longer comprehend what their decisions really entail. This has already happened with automatic pilots and automated stock-trading algorithms: these programs occasionally encounter unexpected situations where humans must override, correct, or rewrite them. But these overseers, who haven’t been following the intricacies of the algorithm’s decision process and who don’t have hands-on experience of the situation, are often at a complete loss as to what to do—and the plane or the stock market crashes.1
Finally, without a precise description of what counts as the Al’s “controller,” the AI will quickly come to see its own controller as just another obstacle it must manipulate in order to achieve its goals. (This is particularly the case for socially skilled AIs.)
Consider an AI that is tasked with enhancing shareholder value for a company, but whose every decision must be ratified by the (human) CEO. The AI naturally believes that its own plans are the most effective way of increasing the value of the company. (If it didn’t believe that, it would search for other plans.) Therefore, from its perspective, shareholder value is enhanced by the CEO agreeing to whatever the AI wants to do. Thus it will be compelled, by its own programming, to present its plans in such a way as to ensure maximum likelihood of CEO agreement. It will do all it can do to seduce, trick, or influence the CEO into agreement. Ensuring that it does not do so brings us right back to the problem of precisely constructing the right goals for the AI, so that it doesn’t simply find a loophole in whatever security mechanisms we’ve come up with.