The dawn of AI safety
Feral roosters compete with little effect to wake the tourists and hospitality workers of Waikiki Beach in Honolulu. They are whimsical reminders of the unintended consequences of our relentless march across the planet. As the sun rises and the roosters crow, scientists and engineers travel from around the world to gather for AAAI-19, the Association for the Advancement of Artificial Intelligence’s thirty-third annual conference.
Attendees are conferring over topics such as multi-agent pathfinding, deep Bayesian learning, and, encouragingly, AI safety. In general, the aim of the conference's participants is to make artificial intelligence more effective at achieving the goals that humans set for it. As the safety track of the conference emphasizes again and again, those goals are mostly well intended and yet tricky to express properly. The genie of AI, like its Aladdin counterpart, tends to carry out precisely our instructions rather than our wishes.
Despite the sponsorship of the Safe AI workshop by Cambridge's Centre for the Study of Existential Risk, the risks covered were mostly pedestrian – literally pedestrian in the case of autonomous vehicle systems. Participants of the Safe AI workshop selected "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering" by Chen et al. as their favorite paper. A number of the other presentations also enumerated adversarial attacks and their remedies. Researchers have previously shown a variety of ways that a malicious actor can cause AI systems to produce unintended or outright malevolent results. In the case of autonomous vehicle control, this can be disastrous. A concrete example is marking stop-sign images in a training dataset with small, innocuous-looking Post-it notes and labeling them as speed-limit signs (Gu et al. 2017). This sort of white-box attack on an AI model requires access to the training dataset, the model, or both, but other black-box attacks, which compromise systems without such access, were also presented along with proposed remedies.
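The activation-clustering defense that took the workshop's favorite-paper honors can be sketched in miniature: collect a network's late-layer activations for each class, split each class into two clusters, and treat a conspicuously small cluster as a sign of poisoned training data. The toy one-dimensional activations, the tiny 2-means routine, and the 35% size threshold below are illustrative assumptions for this sketch, not the implementation from Chen et al.

```python
# Minimal sketch of the activation-clustering idea: backdoored samples in a
# class tend to activate differently from clean ones, so a class whose
# activations split into one large and one small cluster is suspect.

def kmeans_2(points, iters=20):
    """Tiny 2-means over 1-D activation summaries (illustrative only)."""
    centers = [min(points), max(points)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

def flag_suspicious(activations_by_class, ratio=0.35):
    """Flag classes whose smaller activation cluster is suspiciously small."""
    flagged = []
    for label, acts in activations_by_class.items():
        a, b = kmeans_2(acts)
        small = min(len(a), len(b))
        if 0 < small < ratio * len(acts):
            flagged.append(label)
    return flagged

# Hypothetical data: the "stop" class contains a backdoored minority whose
# activations (around 5.0) sit far from the clean cluster (around 1.0).
data = {
    "speed_limit": [0.9, 1.0, 1.1, 0.95, 1.05, 1.0, 0.98, 1.02],
    "stop":        [0.9, 1.0, 1.1, 0.95, 1.05, 5.0, 5.1],
}
print(flag_suspicious(data))  # ['stop']
```

Real activations are high-dimensional, so the paper's pipeline reduces dimensionality before clustering; the one-dimensional surrogate above keeps only the core small-cluster heuristic.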
Absent a malicious actor, artificial intelligence systems can succeed at carrying out their objectives while failing to benefit, or even harming, their creators. Beyond Isaac Asimov's three laws of robotics, there are real-world efforts under way to ensure that AI systems deliver on the intent of their creators, not just the letter of their objective functions. Gopal Sarma proposed employing neuropsychology to assure value alignment in AI systems. Francesca Rossi neatly outlined an IBM approach to ethically bounding the behavior of such systems at some cost to objective performance. And for runaway AI, others described "red button" techniques for interrupting operation.
AI applications themselves remain challenging problems. As Philip Koopman of CMU entertainingly illustrated, autonomous vehicles in Pittsburgh must master a gauntlet of challenges before they can safely get across town. They have to be made better at their task before they can be safe. In fact, almost all of the AI safety efforts presented at the conference were directed at making AI systems more effective at carrying out their specified and intended tasks.
The understood failure space of AI systems is expanding rapidly. José Hernández-Orallo summarized the state of the field and deftly pressed out at its corners. His paper describes a number of scenarios in which a continuously learning and changing intelligent system might break away from its defined objectives or safety constraints. Tantalizing is his observation that it is difficult to determine the boundaries between cognition within an artificial intelligence and its environment, including our own minds. This might seem to be veering into the realm of the philosophical, but that may be where the most profound risk of AI resides.
The presentation and paper given by Peter Eckersley of the Partnership on AI came closest to exploring AI risk with civilization-wide impact. He was one of two speakers to raise Nick Bostrom's paperclip-maximizer thought experiment. Eckersley drew upon a number of proofs of the impossibility of satisfying multiple societal goals to assert that "we do not presently have a trustworthy framework for making decisions about the welfare or existence of people in the future." He asked the audience to consider, "Who will exist in the future?" with the implication that utility functions will be making these decisions. Eckersley urged machine learning practitioners to avoid using totally ordered objective functions. He presented the dire Totalitarian Convergence Conjecture:
“Powerful agents with mathematically certain, monotonically increasing, open-ended objective functions will adopt sub-goals to disable or dis-empower other agents in all or almost all cases,”
followed quickly by the reassuring caveat that perhaps there is a pluralistic alternative. Eckersley's presentation did not deal with existential risk directly, but it was an unsettling reminder of the societal effects of pervasive AI.
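One way to read Eckersley's advice against totally ordered objective functions is to compare outcomes only by Pareto dominance over multiple objectives, which leaves some pairs genuinely incomparable rather than forcing every choice into a single ranking. The sketch below illustrates that pluralistic idea with assumed objective tuples; it is not Eckersley's own formalism.

```python
# Illustrative sketch: a partial order over multi-objective outcomes.
# Each outcome is a tuple of objective scores, e.g. (welfare_now, welfare_future).

def dominates(a, b):
    """a Pareto-dominates b if a is at least as good on every objective
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def compare(a, b):
    """Unlike a scalar utility, this comparison can decline to rank."""
    if dominates(a, b):
        return "a > b"
    if dominates(b, a):
        return "a < b"
    return "incomparable"

print(compare((3, 3), (2, 1)))  # a > b
print(compare((3, 1), (1, 3)))  # incomparable
```

An agent optimizing under such a partial order cannot justify sacrificing one objective entirely for another, since the resulting outcome would not dominate the alternatives it forecloses.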
Interstitial interviews with speakers and attendees revealed an array of responses to the question of existential risk. Philip Koopman provided the most detailed scenario of a disaster that could befall civilization in the near term as the result of rapid adoption of technology. He should know – besides his responsibilities at CMU, he is CTO of Edge Case Research, a company that is guiding us toward autonomous transportation. In his presentation Koopman had presented the sort of challenging but tractable problems that AI faces in safely guiding vehicles through city streets like Pittsburgh's. Asked offline about larger-scale risk, he began by reminding us that while we have very effective artificial intelligence in narrow domains, it is still quite brittle. "There are rules baked in," he said, "Rules that we don't understand." That leaves the possibility for effects that hide within the statistics for some time. He raised the question of whether a demographic shift in fatalities resulting from full autonomy would be an acceptable outcome. For instance, when airbags were first deployed in cars, their explosive charges were calibrated to restrain a 200-lb man during a high-speed collision. Lighter passengers and babies paid a terrible price. Koopman did admit that he expected fatalities to fall as autonomy pervades transportation – he just warned that we need to be ready to accept the bias inherent in AI and its potentially lethal effects.
When pressed to envision large-scale worst-case scenarios, Professor Koopman had a laundry list of potential points of failure with implications beyond the scope of individual car accidents. He said that while the Y2K bug had been reported as a case of systemic failure forestalled, there were many pockets of failure, like a system at CMU that was down for nine months. He pointed out that the operating system of choice for autonomous vehicles, Linux, faces a similar rollover in the year 2038, when signed 32-bit Unix time counters overflow. A bank failing is a serious problem, but a transportation system failing is another matter. He recounted the time that F-22 fighter plane navigation, communication, and fuel systems failed simultaneously as the jets crossed the international dateline. Only quick thinking and good weather prevented the time-bomb of a software bug from downing the fighters.
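The 2038 rollover Koopman alluded to is easy to demonstrate: a signed 32-bit Unix time counter runs out shortly after 03:14:07 UTC on January 19, 2038, and a two's-complement wrap lands it back in 1901.

```python
# The "year 2038 problem": signed 32-bit Unix time overflows.
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold

last_moment = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_moment)  # 2038-01-19 03:14:07+00:00

# One second later, the counter wraps to a large negative number -
# a date in 1901, the same class of rollover as the F-22 dateline bug.
wrapped = (INT32_MAX + 1) - 2**32  # simulated two's-complement wrap
print(datetime.fromtimestamp(wrapped, tz=timezone.utc).year)  # 1901
```

Modern Linux kernels and 64-bit platforms have largely moved to 64-bit time representations; the hazard Koopman flags is in long-lived embedded systems still compiled with 32-bit time types.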
“Human drivers have good numbers,” said Koopman (100 million miles between fatalities by one calculation). “We are good at compensating for mistakes.” He painted the worst case scenario in which a Y2K or dateline bug strikes system-wide: “Imagine all the cars crashing simultaneously, fatally.” He concluded, “We are deploying quicker than we are understanding.”
Professor Koopman’s direst scenario would entail horrific loss of human life followed by a paralyzed transportation system, which would in turn generate significant dislocation and disruption. Food and medicine shortages, crop failures, and more could precipitate a larger global disaster. But humans are, as he pointed out, adept at compensating for natural and man-made disaster. Other conference-goers had relatively sanguine views about the risk of paper-clip maximizers or other nightmare outcomes of AI run amok. Xiaowei Huang said that current AI is “not so powerful” that it could pose an existential risk. He pointed out that current AI is insufficient to even “abstract the best idea” from a data set, something that humans do on a daily basis. Alessio Lomuscio indicated that he was more concerned about the ill effects of “stupid algorithms” than wickedly smart AI. Neither was concerned about the possibility of a near-term intelligence explosion.
José Hernández-Orallo was kind enough to thoughtfully consider the problem of existential risk. Having provided perhaps the broadest overview of AI pitfalls, he might be the one to sense a path to the darkest corner of the possible scenarios. His paper was co-authored with two researchers from Cambridge's Centre for the Study of Existential Risk. In a sideline interview, he dismissed a "run-away Roomba" with a shrug and, in an apparent segue, began to express his concern for "a potential tech oligopoly of companies like Google and Facebook." He noted that the West had perhaps seven or eight such companies handling our data and that China had perhaps two such entities. He explained that there was a "conflict of values" between these companies and the targets of their attention, the human race. Peter Flach agreed that "AI is a tool like a knife." It can be used to prepare food or to injure, depending on who wields it and their intent. The concern he then raised appeared to be a detour – he said that Facebook's application of AI is in service of "psychology, maximizing engagement." Neither Hernández-Orallo nor Flach asserted an existential threat from the tech giants, but it was interesting that they raised the topic in response to the question.
Later in the conference, Stuart Russell expanded upon Flach’s concern of these companies doing more than passive observation and analysis of their subjects. He said that, “We cannot assume the objective functions [of these companies] are safe.” While the view is that they are passively adapting to the behavior of users, he felt that their actual effect was to, “make the user more predictable.” He said that Facebook’s algorithms might be based on the assumption of fixed human preferences, but that their net effect was to, “mold people to produce more payoff.” Russell freely embraced the idea of existential risk posed by AI during the interview. He said that it is a “defensive reaction” of the AI community to downplay the risk and that the, “Luddites were right about the significant risks of technologies such as nuclear power. The nuclear industry has almost completely destroyed itself by brushing the risks under the carpet.”
On the whole, the Safe AI workshop speakers painted a picture of a burgeoning area of research and a virtual arms race between scientists and engineers on one side and, on the other, the natural and man-made pitfalls that their creations face. While existential risk was not the focus and was widely discounted, there was a strong air of concern about the rapid adoption of AI into the fabric of civilization. Transportation, medicine, governance, finance, and communication were only a few of the realms of AI adoption named as sources of risk. Research on the cumulative risk of the broad adoption of AI may be warranted. Offline, participants cited aggregators and mediators of human behavior as particularly concerning.