Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell


Hey everybody, so instead of having myself and Ben, your usual hosts, I'm here with special guest Professor Stuart Russell of Berkeley, who's written a really fascinating book about AI and the future of AI, and we're gonna talk about it. Professor Russell, thank you for joining me.

It's a pleasure.

You're listening to Linear Digressions. So Professor Russell, I know your name might be familiar to a lot of folks who are listening to this podcast, but for those who don't know your work as much, or who maybe recognize the name but aren't quite sure how to place it, would you mind giving a brief introduction?

Sure. So I've been at Berkeley longer than I care to remember, about thirty-four years, and I've been doing AI actually since I was in high school. I wrote a chess program back around 1975, in high school, so I've been doing AI a long time. You might know my name if you've taken an AI course: you possibly used a book by Russell and Norvig. So Peter Norvig and I wrote that starting in 1992; we just sent the fourth edition off to the printer last week. And so my research has covered pretty much every area of artificial intelligence: reasoning, learning, problem solving, game playing, planning, robotics, language, vision. These days I'm concerned about the following simple question, which we had actually in the first edition of the textbook: what if we succeed?

There's a new book that you have out now, Human Compatible: Artificial Intelligence and the Problem of Control. As you mentioned, it sounds like this is something you've at least been aware of for a long time, so I want to ask: what motivated you to write this book now?

So the book sort of has two parts. One is the part that says: okay, this is how we currently think of AI, and this is why it's extremely wrong. And if we assume the current standard model of AI, then as AI systems get better and better, we face the prospect of losing control over them, and losing control to machines altogether. The second part of the book says: okay.
Here's how to fix it. Here's a way of doing AI on a completely different kind of general theoretical foundation and conceptual framework. And in this new framework, it seems that at least that failure mode of losing control to machines goes away. The "why now" is because sometime around 2013, 2014, I figured out what the second half of the book should be, namely a way of dealing with the problem. I didn't just want to write a book saying, okay, we're all doomed. Right? Alan Turing actually, in 1951, said we're all doomed, so that wouldn't be a new point.

So without asking you to cannibalize your book sales too much here, you know, in a snapshot, what is the fundamental way that we've gotten it wrong for a long time, and where is the ray of hope that you found in that second half of the book?

So the standard model of AI involves building machinery that optimizes a fixed, known objective. So if you remember, if you've read the first few chapters of the textbook, we talk about, for example, problem-solving systems that find a sequence of actions that's guaranteed to achieve a goal with minimum cost. So there you have to specify the goal, you have to specify the cost function. In Markov decision processes, you have to specify the reward function. In machine learning algorithms, you have to specify the loss function. In control theory, you have to specify a cost function. So in fact it's not just AI; a good fraction of twentieth-century technology is based on this model. And the model is wrong, because we do not know how to specify objectives correctly, particularly when you have systems that start to operate in the real world. It's easy on the chessboard to say, okay, you're supposed to win the game. But in the real world you might say, okay, I'd like you to restore carbon dioxide levels to preindustrial concentrations so that we can get the climate back in balance. That sounds great. What a wonderful objective. What could go wrong?
Well, you'd get rid of all the people, because they're the ones who are producing the carbon dioxide. And then you might say, okay, well, let's not do that. Let's restore carbon dioxide and not kill anybody. And then of course the system has a subtle and complex social media campaign that convinces everyone to have fewer and fewer children, until there are no people left, and then carbon dioxide is restored. And that's much easier than trying to do all the politics of convincing people to stop consuming and producing and all that kind of stuff.

So actually we've known this for a long time. We have the story of King Midas, you know: he gave his objective specification, everything he touches turns to gold, and of course it was the wrong objective. He died because his food and drink, and his family, turned to gold. And the genie: you get three wishes, and your third wish is always "please undo the first two wishes, because I messed everything up." So we know this, and yet we persist with a model where the more effective the machine, the better the AI, the worse the outcome is going to be for human beings. And if that isn't a bad engineering model, I don't know what is.

Right, so I think we should abandon that way we have of doing things. In brief, the solution is to say that the machine's objective is still to satisfy human preferences about the future, but the machine knows that it doesn't know what those preferences are, so it's explicitly uncertain. Just to give you a simple analogy: when you go to a restaurant, the restaurant doesn't know what you want to eat. They know that they don't know, so they ask you: what would you like to eat? What are your menu choices? And if you pick something off the menu, they'll do their best, within reasonable cost, to give you that thing. You know, if they're out of that item, they're not gonna, you know, traipse all over the city trying to find more of it. They'll say, well, sorry, you know, we're out of the duck tonight, or, you know, the chicken's not so great, but maybe I could recommend the pork medallions instead, whatever.
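The carbon-dioxide example above can be sketched in a few lines of code. This is my own toy illustration, not anything from the book: the plan names and numbers are invented, and the point is only that an optimizer scored on a fixed objective that omits what we actually care about will happily pick a catastrophic plan.

```python
# Toy illustration of objective misspecification (invented numbers, not from
# the book): an optimizer told only "minimize CO2" picks a catastrophic plan.

# Hypothetical candidate plans: (name, resulting CO2 in ppm, fraction of people left)
plans = [
    ("slow political reform",        320, 1.0),
    ("remove all the people",        280, 0.0),
    ("campaign for fewer children",  280, 0.0),
]

def naive_objective(plan):
    _, co2, _ = plan
    return co2  # only CO2 appears; human survival is not in the objective

best = min(plans, key=naive_objective)
print(best[0])  # → remove all the people
```

The fix is not to keep patching the objective (that just leads to the "fewer children" plan instead); it's to make the machine uncertain about the objective in the first place, which is the second half of the book.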
It might be. So this is perfectly normal and understandable among human beings, that we don't know what other humans want: we ask them, and/or they tell us, and we have an interactive process. And we can do the same thing with machines. The machine knows that it doesn't know what you want, but it has to somehow act in a way that is beneficial to us, so it's naturally motivated to try to solve that problem. The solution to that problem is to do things like ask questions, ask permission. Before you kill everyone to restore carbon dioxide levels, you ask, you know: I understand about the carbon dioxide, but is it okay if I kill everyone? And then you can say: no, that's not what we had in mind, are you sure you're not... right?

Yeah. So that's the basic idea, and we can formulate this mathematically, if you're interested, as a problem in game theory. And the solutions to those games have the property that the better the AI, the better the outcome for the human beings.
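A minimal sketch of the "ask before acting" idea, again my own framing rather than Russell's formal game-theoretic model, with invented probabilities and payoffs: a machine that maintains uncertainty over which preferences the human holds can compare the expected value of acting immediately against the expected value of asking first, and asking wins whenever the downside risk under some possible preference is large enough.

```python
# Sketch (my framing, invented numbers) of a machine that is uncertain about
# human preferences and therefore prefers to ask before acting drastically.

# Machine's belief over what the human actually cares about.
belief = {"co2_only": 0.9, "co2_and_people": 0.1}

# Payoff to the human of each plan, under each possible preference.
payoff_drastic = {"co2_only": 10, "co2_and_people": -1000}
payoff_mild    = {"co2_only": 5,  "co2_and_people": 5}
ASK_COST = 1  # small cost of interrupting the human with a question

def expected(payoff):
    return sum(belief[w] * payoff[w] for w in belief)

# Option 1: act now on the machine's best guess, without checking.
ev_act = max(expected(payoff_drastic), expected(payoff_mild))   # = 5.0

# Option 2: ask first, learn the true preference, then act on it.
ev_ask = sum(
    belief[w] * max(payoff_drastic[w], payoff_mild[w]) for w in belief
) - ASK_COST                                                    # = 8.5

print(ev_ask > ev_act)  # → True: asking is worth it despite its cost
```

This is just a value-of-information calculation, but it captures the property Russell describes: the machine is motivated to defer to us precisely because it knows it doesn't know what we want.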
