I am honored to interview Dr. Russell and get his thoughts. He is a leading expert in Artificial Intelligence, Professor of Computer Science and Smith-Zadeh Professor in Engineering at the University of California, Berkeley, and the author of numerous articles and books. While, for pragmatic reasons, we tend to jump directly into machine learning, I strongly encourage my students to first read Dr. Russell's book Artificial Intelligence: A Modern Approach (coauthored with Peter Norvig). The book provides the conceptual foundation upon which machine learning knowledge can be built.
Professor Naqvi
Professor Naqvi: What factors led to the founding of your center? How can the world of AI benefit from your research?
Dr. Stuart Russell: Artificial intelligence research is concerned with the design of machines capable of intelligent behavior, i.e., behavior likely to be successful in achieving objectives. The long-term outcome of AI research seems likely to include machines that are more capable than humans across a wide range of objectives and environments. This raises a problem of control: given that the solutions developed by such systems are intrinsically unpredictable by humans, it may occur that some such solutions result in negative and perhaps irreversible outcomes for humans. CHCAI’s goal is to ensure that this eventuality cannot arise, by refocusing AI away from the capability to achieve arbitrary objectives and towards the ability to generate provably beneficial behavior.
The AI community will benefit because this new goal is much less likely to cause problems that could lead to a partial or complete shutdown of AI research. Systems that make mistakes that harm users, either financially or physically, can have negative consequences for an entire sector of the AI enterprise.
Professor Naqvi: If robots learn from us, from observing us, and from reading about us – wouldn’t they develop the same biases and prejudices that limit us?
Dr. Stuart Russell: It is clear that prejudices hurt others, and the value alignment process for machines should take into account the values of all humans.
Thus, a prejudiced value system is not one that machines can consistently adopt.
Professor Naqvi: Sanders and Tewkesbury (2015) contend that "As AI develops, we might have to engineer ways to prevent consciousness in them just as we engineer other systems to be safe." You seem to be doing the opposite. Why?
Dr. Stuart Russell: I don’t think we’re proposing the opposite (i.e., engineering consciousness into machines). We’re just proposing the goal of building machines that will eventually learn the right objectives. We have absolutely no idea how to design or test for consciousness in machines. And let me reiterate, consciousness isn’t the problem, it’s competence.
Professor Naqvi: Your center will certainly require multidisciplinary leadership. How do you plan to include experts from multiple backgrounds?
Dr. Stuart Russell: We have experts from cognitive science and game theory, and will be adding affiliate faculty in philosophy, economics, and possibly sociology. As we learn to speak the same language and develop new research projects, we will expand the Center's funding to accommodate them.
Professor Naqvi: The literature or data used to train AI-based systems could be limited or tilted towards one value system (e.g., Western values); hence even a fair algorithm will have no choice but to learn the values for which the most data is available. How would you prevent that?
Dr. Stuart Russell: A good Bayesian understands that data come from a data-generating process that may over- or under-represent all kinds of groups. (For example, the great majority of newspaper articles focus on just a few thousand people in the world.) If handled properly, this does not introduce a bias, but there will inevitably be greater uncertainty associated with the values and preferences of people who are simply not visible in the data.
If the machine’s decisions might impact those people then it has an incentive to gather more information about them.
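This Bayesian point can be illustrated with a toy sketch (the groups, observation counts, and the 60% preference rate below are invented purely for illustration): with a correct model of the sampling process, under-representation does not bias the posterior estimate, but it does leave much wider uncertainty around the less-visible group.

```python
import math

def posterior(prior_a, prior_b, yes, no):
    """Beta-Bernoulli posterior over a preference probability.

    Returns the posterior mean and standard deviation after observing
    `yes` positive and `no` negative examples, starting from a
    Beta(prior_a, prior_b) prior.
    """
    a, b = prior_a + yes, prior_b + no
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Two groups with the same underlying ~60% preference rate, but one is
# heavily over-represented in the observed data.
well_covered = posterior(1, 1, yes=600, no=400)   # 1000 observations
under_covered = posterior(1, 1, yes=6, no=4)      # 10 observations

# The point estimates agree (both near 0.6), so no bias is introduced,
# but the standard deviation for the sparse group is roughly ten times larger.
print(well_covered)
print(under_covered)
```

The widened uncertainty is exactly the signal that gives the machine an incentive to gather more information before acting on decisions that affect the under-represented group.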
It’s also worth pointing out that in a real sense it’s not the robot’s value function at all. The robot’s objective is just to help humans realize their values, and what it learns is what those values are.
So, a robot chef in a vegan household isn’t a vegan robot, it’s a robot that knows the humans it is cooking for are vegans. If they lend the robot to their neighbors who eat meat, the robot won’t refuse to cook the steak on moral grounds!
Professor Naqvi: Do you feel that a bit of subjectivity (qualitativeness) and creativity is key to learning from observed behavior? If yes, would AI be able to adapt to qualitative and creative learning frameworks? If no, how would its bias-free and purely quantitative learning differ from typical human learning?
Dr. Stuart Russell: Learning anything complex generally requires acquiring, in some form, discrete or qualitative structures as well as continuous parameters.
(For example, with deep learning, experimenters try all sorts of network structures — this learning process is sometimes called “graduate student descent”.) It’s also the case that optimal decisions within decision theory can be purely qualitatively determined — for example, you can decide that you’d rather crash into a solid concrete wall at 5mph than 75mph, without calculating any probabilities and utilities.
So there are many aspects of human learning and decision making that are good ideas for machines too.
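The wall-crash example can be made concrete as a dominance check (the ordinal outcome scores and the two scenarios below are invented for illustration): when one option is at least as good in every possible state of the world and strictly better in some, it can be chosen without estimating any probabilities or utilities.

```python
def dominates(a, b):
    """True if outcome vector `a` is weakly better than `b` in every
    state and strictly better in at least one (higher score = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Ordinal outcome scores for two hypothetical states of the world
# (e.g., the wall is fully solid vs. partially gives way).
crash_5mph = [3, 4]    # minor damage in both states
crash_75mph = [0, 1]   # severe damage in both states

# 5 mph dominates 75 mph, so it is the right choice regardless of
# which state obtains and with what probability.
print(dominates(crash_5mph, crash_75mph))  # True
```

Note that dominance only resolves the easy cases; when options trade off across states (better in one, worse in another), probabilities and utilities are needed again.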
The main difference from humans, I think, is that the robot has no
preferences of its own; it should be purely altruistic. This is not very common in humans.
Professor Naqvi: Humans have a built-in feedback loop that reinforces pain and pleasure stimuli, corresponding to the basic hardwired survival mechanism in biological entities (in this case, humans). A lot of learning takes place as a consequence of that feedback loop. How do you integrate that feedback loop, if at all, into your AI research?
Dr. Stuart Russell: Unlike Asimov, I believe there is no reason for the robot to have any
intrinsic preference for avoiding harm to itself. This preference should
exist only as a subgoal of helping humans, i.e., the robot is less useful to people if it is damaged, and its owner would be out of pocket and perhaps upset, so *to that extent* (and no more) it has an obligation to avoid damage. The self-sacrifice by TARS in Interstellar is an excellent example of how robots should behave, in cases where such extreme measures are indicated.
Professor Naqvi: Thank you so much. I truly appreciate your time.