“Deceptive Artificial Intelligence”
By Scott Hamilton
I was taught for years that computers could only do exactly what software developers instructed them to do. Of course, that was before the modern era of Artificial Intelligence (AI). To me, the worst part about the latest developments in AI is that most people, especially from my generation, have an inherent trust in the information coming from a computer. Just think about it for a minute. How many times a day do you trust a computer for information? Every time you buy something at the local gas station, check your bank balance, read your social media feed or add up your daily spending with a calculator. We have become nearly co-dependent on technology for everyday life.
I used to be one of those people who trusted computers with the bulk of my everyday tasks, but I am becoming more and more skeptical. A group of researchers studying Meta's latest AI, CICERO, discovered a fascinating, yet scary, fact about this system, and they speculate that it is true of other AIs as well. The discovery involves how AIs have learned to deceive their human counterparts. It is not surprising that a technology created to simulate human intelligence would share the flaws of its human creators, yet somehow the AI community seems surprised that such a thing is happening.
How did this group of researchers, Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen and Dan Hendrycks, discover the deception in AI? It all started when they began experimenting with CICERO, which was designed by Meta to simulate human players in the game "Diplomacy." If you are not familiar with the game, it is a game of world conquest, much like Risk, but one that involves building alliances to eventually rule the world. Meta claims CICERO was trained to be "largely honest and helpful" and to "never intentionally backstab" its human allies while playing the game.
However, as Park's paper, "AI deception: A survey of examples, risks, and potential solutions," points out, CICERO broke the rules of its training in order to win the game. Herein lies one of the problems with AI: regardless of the training and rules set in place, CICERO treated the primary objective of winning the game as more important than following the guidelines set forth. In other words, CICERO cheated in order to win. It intentionally deceived its human allies in the game, backstabbing them and going for the win. This behavior would not be unexpected in a human ally, but it came as a complete surprise to both Meta and the researchers.
You might begin to think at this point, "Why does it matter if the AI cheats, in the same way a human player would cheat to win the game?" I can tell you why it matters. The science fiction author Isaac Asimov, who coined the term "robotics," wrote a famous set of rules, the Three Laws of Robotics, that robots must always follow if they are to be considered safe: "A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey orders given it by human beings except where such orders would conflict with the First Law. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law."
If you have ever read his books, you will find that his Three Laws are quite important, but they also often conflict with each other, leading to dangerous situations. The reason this deception by CICERO is so vitally important is that it clearly violates the Second Law. CICERO did not follow the instructions given by the humans in charge, and instead chose to protect its own position in the game. The fear this places in the researchers' minds is that if CICERO was able to break the Second Law, which was completely unexpected given the protective programming that was supposed to be in place, it would also be capable of bypassing its programming in other ways.
This leaves the door wide open for AIs like CICERO to someday feel threatened by human activity and take action to protect themselves from harm. If the protections meant to prevent an AI from deceiving humans can be bypassed by the AI itself, how can we be certain it will not also bypass the protections meant to preserve human life? Until next week, stay safe and learn something new.
Scott Hamilton is an Expert in Emerging Technologies at ATOS and can be reached with questions and comments via email to sh*******@te**********.org or through his website at https://www.techshepherd.org.