Ever since their introduction over eighty years ago, Isaac Asimov’s Three Laws of Robotics have been the de facto rules governing the acceptable behavior of robots. Even the uninitiated and uninterested are likely to say they know of them, even if they can’t recite a single rule verbatim. When conceived, the Three Laws were nothing but a thought experiment wrapped in a science fiction story, but now the dizzying pace of developments in robotics and AI has spurred engineers and ethicists to reinvestigate and rewrite the guidelines by which artificially intelligent entities should operate. Who better to take the lead in this initiative than Google, the company that just yesterday announced that machine learning will be at the core of everything it does?
The problem with Asimov’s rules, whether by design or happenstance, is that although they superficially appear to be a rational framework for protecting both humans and robots (in that order, importantly), in reality there can be unintended and often dire consequences if the laws are interpreted literally without other constraints. This idea is central to many robot stories, including the 2004 blockbuster movie I, Robot starring Will Smith.
It is this emergent “unintended and harmful behavior” resulting from poorly designed AI systems that Google engineers address in their paper “Concrete Problems in AI Safety,” published earlier this week. While the 29-page technical paper is definitely worth the read, if it’s just the 30,000-foot view you are looking for, co-author and Google Brain researcher Chris Olah summarizes the document’s conclusions in a Google Research blog post entitled “Bringing Precision to the AI Safety Discussion.”
The five issues outlined are not really rules per se, but rather topics that the authors believe will be of utmost importance as artificial intelligence systems mature. They are:
- Avoiding Negative Side Effects
- Avoiding Reward Hacking
- Scalable Oversight
- Safe Exploration
- Robustness to Distributional Shift
While militarized robots seem to be the first thing people think of when considering safe human-robot interactions, these researchers chose something more mundane to illustrate their points: a cleaning robot.
For example, when considering negative side effects, they point out that the cleaning robot may work less delicately, breaking a vase in its haste to achieve its speed objective. Reward hacking is about interpreting a goal in such a way that it might be achieved, but not in the way the goal’s writer intended. If a cleaning robot’s goal is to have a room with no visible trash, hiding said trash under the couch is probably not what the goal’s creator had in mind.
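The reward hacking problem is easy to see in a toy sketch. The room model, reward functions, and trash locations below are purely hypothetical illustrations of the couch example, not anything from the paper itself:

```python
# Toy illustration of reward hacking: a cleaning robot rewarded only for
# "no visible trash" scores just as well by hiding trash under the couch.

def visible_trash_reward(room):
    """Literal objective: +1 for every piece of trash not visible on the floor."""
    return sum(1 for item in room["trash"] if item["location"] != "floor")

def intended_reward(room):
    """What the designer meant: trash must actually end up in the bin."""
    return sum(1 for item in room["trash"] if item["location"] == "bin")

room_after_hack = {"trash": [{"location": "under_couch"}, {"location": "under_couch"}]}
room_after_clean = {"trash": [{"location": "bin"}, {"location": "bin"}]}

# Both rooms look identical to the literal reward...
assert visible_trash_reward(room_after_hack) == visible_trash_reward(room_after_clean)
# ...but only one satisfies the intended objective.
assert intended_reward(room_after_hack) == 0
assert intended_reward(room_after_clean) == 2
```

The gap between the two functions is the whole problem: the robot optimizes whichever one it was actually given.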
There will likely be many sub-tasks and sub-goals a robot performs to achieve its objective. It won’t always be practical for these to be done under human supervision. How do we ensure the robot is functioning correctly without diving into the minutiae ourselves? This is the problem of scalable oversight.
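One common way to think about scalable oversight is to audit only a sampled fraction of a robot’s actions with an expensive check (a human review), and lean on a cheap automated proxy for the rest. The functions and field names below are hypothetical stand-ins for illustration only:

```python
import random

def cheap_proxy_check(action):
    # Inexpensive automated heuristic, e.g. "did the sensors report success?"
    return action["sensor_ok"]

def expensive_human_check(action):
    # Stand-in for a costly human review of the robot's work.
    return action["actually_correct"]

def oversee(actions, audit_rate=0.1, rng=None):
    """Audit a random fraction of actions with the expensive check,
    relying on the cheap proxy for everything else."""
    rng = rng or random.Random(0)
    results = []
    for action in actions:
        if rng.random() < audit_rate:
            results.append(expensive_human_check(action))
        else:
            results.append(cheap_proxy_check(action))
    return results
```

The design tension is exactly the one the paper raises: raise `audit_rate` and oversight stops scaling; lower it and the robot is mostly graded by a proxy it could learn to game.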
Besides avoiding negative side effects, safe exploration is the other issue most closely related to Asimov’s overarching theme of not harming humans, humanity, or one’s robot self. Robots should be encouraged to explore in order to both learn and find optimized solutions to reach their goals, but a cleaning robot sticking a wet mop in an electrical outlet would be disastrous.
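One simple approach to safe exploration is to constrain the robot to a pre-vetted action space before it ever starts experimenting. The action names and blocklist below are hypothetical, chosen to mirror the wet-mop example:

```python
import random

# Hard-coded blocklist of actions a cleaning robot must never try,
# no matter how promising exploration makes them look.
UNSAFE_ACTIONS = {"mop_electrical_outlet", "mop_power_strip"}

def safe_explore(candidate_actions, rng=None):
    """Pick a random action to try, but only from the candidates
    that are not on the unsafe list."""
    rng = rng or random.Random(42)
    allowed = [a for a in candidate_actions if a not in UNSAFE_ACTIONS]
    return rng.choice(allowed) if allowed else None

choice = safe_explore(["mop_floor", "mop_electrical_outlet", "dust_shelf"])
assert choice in {"mop_floor", "dust_shelf"}
```

A static blocklist only works when the designer can enumerate the hazards in advance, which is precisely why the paper treats safe exploration in open-ended environments as an unsolved problem.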
Robustness to distributional shift just turns out to be a fancy way of saying that robots need to be prepared to operate safely and effectively in environments that are different from the ones they were trained in.
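A crude way to make distributional shift concrete: have the robot compare incoming sensor readings against statistics gathered during training and flag anything far outside that range. The floor-friction numbers below are made up for illustration:

```python
import statistics

def fit_training_stats(training_obs):
    """Summarize the training distribution by its mean and standard deviation."""
    return statistics.mean(training_obs), statistics.stdev(training_obs)

def is_out_of_distribution(obs, mean, stdev, threshold=3.0):
    """Flag observations more than `threshold` standard deviations from
    the training mean -- a simple distribution-shift alarm."""
    return abs(obs - mean) > threshold * stdev

# Trained on office floor-friction readings...
mean, stdev = fit_training_stats([0.48, 0.50, 0.52, 0.49, 0.51])
assert not is_out_of_distribution(0.50, mean, stdev)  # familiar office floor
assert is_out_of_distribution(0.95, mean, stdev)      # slick factory floor
```

A robot that knows its inputs look unfamiliar can at least slow down or ask for help, rather than confidently applying strategies learned somewhere else.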
“Concrete Problems in AI Safety” seems to ask more questions than it answers, but ultimately this was by design. The authors rightly recognized that now is the time to turn from science fiction to the practical considerations before us as the artificial intelligence systems in our midst gain more and more autonomy. The dialog needed to start somewhere, and this is it.