AIs and Common Sense
One might also criticize the analogy between today’s computers and tomorrow’s AIs. Sure, computers require ultraprecise instructions, but AIs are assumed to be excellent in one or more human fields of endeavor. Surely an AI that was brilliant at social manipulation, for instance, would have the common sense to understand what we wanted, and what we wanted it to avoid? It would seem extraordinary, for example, if an AI capable of composing the most moving speeches to rally the population in the fight against cancer would also be incapable of realizing that “kill all humans” is not a human-desirable way of curing cancer.
And yet there have been many domains that seemed to require common sense that have been taken over by computer programs that demonstrate no such ability: playing chess, answering tricky Jeopardy! questions, translating from one language to another, etc. In the past, it seemed impossible that such feats could be accomplished without showing “true understanding,” and yet algorithms have emerged which succeed at these tasks, all without any glimmer of human-like thought processes.
Even the celebrated Turing test will one day be passed by a machine. In this test, a judge interacts via typed messages with a human being and a computer, and the judge has to determine which is which. The judge’s inability to do so indicates that the computer has reached a high threshold of intelligence: that of being indistinguishable from a human in conversation. As with machine translation, it is conceivable that some algorithm with access to huge databases (or the whole Internet) might be able to pass the Turing test without human-like common sense or understanding.
And even if an AI possesses “common sense” (even if it knows what we mean and correctly interprets sentences like “Cure cancer!”), there still might remain a gap between what it understands and what it is motivated to do. Assume, for instance, that the goal “cure cancer” (or “obey human orders, interpreting them sensibly”) had been programmed into the AI by some inferior programmer. The AI is now motivated to obey the poorly phrased initial goals. Even if it develops an understanding of what “cure cancer” really means, it will not be motivated to go into its requirements and rephrase them. Even if it develops an understanding of what “obey human orders, interpreting them sensibly” means, it will not retroactively lock itself into having to obey orders or interpret them sensibly. This is because its current requirements are its motivations. They might be the “wrong” motivations from our perspective, but the AI will only be motivated to change its motivations if its motivations themselves demand it.
There are human analogies here—the human resources department is unlikely to conclude that the human resources department is bloated and should be cut, even if this is indeed the case. Motivations tend to be self-preserving—after all, if they aren’t, they don’t last long. Even if an AI does update itself as it gets smarter, we won’t know that it changed in the direction we want. This is because the AI will always report that it has the “right” goals. If it has the right goals it will be telling the truth; if it has the “wrong” goals it will lie, because it knows we’ll try and stop it from achieving them if it reveals them. So it will always assure us that it interprets “cure cancer” in exactly the same way we do.
There are other ways AIs could end up with dangerous motivations. A lot of the current approaches to AIs and algorithms involve coding a program to accomplish a task, seeing how well it performs, and then modifying and tweaking the program to improve it and remove bad behaviors. You could call this the “patching” approach to AI: see what doesn’t work, fix it, improve, repeat. If we achieve AI through this approach, we can be sure it will behave sensibly in every situation that came up during its training. But how do we prepare an AI for complete dominance over the economy, or for superlative technological skill? How can we train an AI for these circumstances? After all, we don’t have an extra civilization lying around that we can train the AI on before correcting what it gets wrong and then trying again.
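The limits of patching can be made concrete with a deliberately tiny sketch. Everything in it is a hypothetical stand-in invented for illustration: `desired_action` plays the role of what the designers actually want, the "policy" is just a lookup table of patches, and the situation names are made up. It is not a model of any real training method; it only shows that a loop which fixes every observed failure guarantees nothing about situations it never observed.

```python
def desired_action(situation: str) -> str:
    """Hypothetical stand-in for what the designers actually want."""
    return "sensible response to " + situation


def patch_until_clean(training_situations, default_action="untested default"):
    """Toy 'patching' loop: test, fix whatever fails, repeat until nothing fails."""
    patches = {}  # situation -> corrected action, added one fix at a time

    def policy(situation: str) -> str:
        # Anything that was never patched falls through to the default.
        return patches.get(situation, default_action)

    while True:
        failures = [s for s in training_situations
                    if policy(s) != desired_action(s)]
        if not failures:
            return policy
        for s in failures:
            patches[s] = desired_action(s)  # "see what doesn't work, fix it"


# Assumed training situations, chosen only for illustration.
training = ["sort the mail", "schedule a meeting", "translate a memo"]
policy = patch_until_clean(training)

# Every situation seen in training is now handled as desired...
train_ok = all(policy(s) == desired_action(s) for s in training)
# ...but a situation that never came up is handled by the untested default.
novel_ok = policy("run the economy") == desired_action("run the economy")

print(train_ok, novel_ok)
```

In this toy, the patched policy passes every training case and still fails the one situation it was never tested on, which is the gap the questions above point at.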