Adversarial Examples Are Not Bugs, They Are Features with Aleksander Madry

Automatic transcript

Hey, welcome.

Thank you for having me. So yes, indeed, I am Aleksander Madry. What I work on is machine learning: trying to understand why this technology works and what we can do to make it better, with a particular focus on reliability and robustness issues.

Awesome. Can you share with us a little bit about your background, and how you got started working in machine learning in general and reliability and robustness in particular?

That's actually an interesting story. I originally was, and actually still am, a theorist, so I worked in the field called theory of computation. As I was learning about machine learning and all the recent exciting developments there, it piqued my curiosity and I started to learn more and more. In particular, as I was trying to make sense of this field, one thing that stood out to me was exactly this notion of adversarial examples, which we will talk about more in a moment. Essentially, what it showed was that, first of all, we don't fully understand how this machine learning works, but also that even though it works well on average, in the average case, its performance in the worst case is much, much more troubling. So that piqued my interest and made me want to figure out: okay, if we think about machine learning from a more worst-case perspective, what does it look like? That's essentially how I got into this field.

Your paper was recently presented by one of your students at the NeurIPS conference, and it's the paper that we wanted to dig into in this interview. It's called "Adversarial Examples Are Not Bugs, They Are Features." Why don't we start by having you share just the broad space that you're looking to address with this paper?

Sure. So actually, this is exactly about this notion of adversarial examples that I mentioned in passing. Let's talk about what adversarial examples are. Think of adversarial examples as these very curious glitches of state-of-the-art machine learning systems. Essentially, what people observed is that you can take, let's say, an image of a pig that a state-of-the-art classifier recognizes as a pig with high confidence. Then what you can do is add a tiny bit of noise, just a speck of noise, to this image. This noise is not random; it is carefully chosen. Once you add it, you get a different image that to a human looks indistinguishable from the original, but for some reason this new image is grossly misclassified by the classifier. So, for instance, you can make a pig be classified as an airliner with high confidence. The usual joke I make is that it shows how powerful machine learning can be: it can make pigs fly. But essentially what it really shows us is the brittleness of the prediction: everything looks fine, but then I can show you a version of the input that looks almost identical, or to you as a human identical, to the original picture, for which the system seems to severely underperform. So this is the point of the paper: trying to understand, first of all, why we have adversarial examples and why they are so widespread. Because it turns out that if you just look at a standardly trained machine learning system, essentially every input seems to have this behavior: you can take any input and make it be classified as any other class just by adding a bit of noise to it.
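[Editor's note: a minimal sketch of how such a "carefully chosen" bit of noise can be found, using the fast gradient sign method (FGSM). This is an illustrative example rather than the exact procedure discussed in the interview; `model`, `x`, and `y` are assumed to be a trained PyTorch classifier, an input image with pixel values in [0, 1], and its true label.]

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """Return an adversarially perturbed copy of x within a small L-infinity ball."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss of the correct label on this input
    loss.backward()                       # gradient of the loss w.r.t. the pixels
    # Step each pixel in the direction that increases the loss, but only by
    # epsilon, so the perturbed image still looks unchanged to a human.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```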
So this was very puzzling. We understand that the system will not be perfect, that there will be some inputs on which things go wrong, but why is it possible to do this to essentially any input, and what is the nature of this phenomenon? That's essentially the point of the paper. People have of course wondered about this question since the discovery of the phenomenon, and they proposed things like: maybe adversarial examples exist just because of some statistical quirk, or this is just an imperfection of the kinds of techniques that we're using. What all of the proposed explanations have in common is that they view adversarial examples as a glitch in the system, as a bug of our systems, and that once we develop machine learning methods that are actually perfect, they will not have these bugs anymore.

But it's something that we need to fix as a community in order for these systems to be more useful, or less susceptible to the problems that they represent?

Exactly. This is definitely true, and our paper also recognizes this as the goal. But what was new, and kind of mind-blowing to us when we discovered it, even though in retrospect it is actually perfectly obvious, is that even though we think of these objects as bugs, as something undesirable, and it is definitely undesirable behavior, the root of this behavior comes from the fact that our machine learning models are actually performing the task we are asking them to perform all too well, as opposed to doing it the wrong way. So this is why the title says these are not really bugs. Even though they are undesirable, they are not really bugs; they are actually features. They are just a natural consequence of the misalignment between how our models solve the task we want them to solve and how we as humans expect them to solve it.

Interesting. So when I hear you say that they're working as designed, I think of these algorithms as kind of pattern-matching machines: we give them a pattern, then they make a prediction, and this adversarial example problem is kind of manipulating that pattern, so the prediction is expectedly different. Is that the general sense, or where are the nuances in this idea that these are actually features?

Yes, essentially exactly. You are definitely touching on the right subject, so let me just put it in the right framing. What do we really expect our machine learning systems to do, and what are they actually doing? What we expect a machine learning system to do is this: let's say I give it the ImageNet dataset to train on, so it has over a million high-quality images. Each image is a picture of, let's say, a dog together with a label, "look, this is a dog"; there's a picture of a cat and a label, "this is a cat." We let our system just look at this data and try to figure out all the patterns that seem to capture how a dog looks and how a cat looks. And then the way we test this, the way we assess the performance, is that we have a held-out dataset of different images with labels. So there are some new cats and new dogs, and we just show the system a picture and ask it: okay, tell me what you think is in this picture.
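[Editor's note: a rough sketch of the standard "train on labeled images, test on held-out images" procedure just described. The data, model, and hyperparameters here are toy placeholders, not anything specific from the interview or the paper.]

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 32x32 RGB "images" with labels 0 (cat) or 1 (dog).
train_set = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 2, (512,)))
test_set = TensorDataset(torch.randn(128, 3, 32, 32), torch.randint(0, 2, (128,)))

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: the model only ever sees (image, label) pairs and adjusts its
# weights to answer this "quiz" correctly -- nothing forces it to learn the
# human notion of a dog or a cat.
for epoch in range(5):
    for x, y in DataLoader(train_set, batch_size=64, shuffle=True):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Testing: accuracy on the held-out images is how we judge whether it "learned".
with torch.no_grad():
    correct = sum((model(x).argmax(1) == y).sum().item()
                  for x, y in DataLoader(test_set, batch_size=64))
print("held-out accuracy:", correct / len(test_set))
```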
That's kind of the setup. What we expect is that by just looking at the training set and passing this test, the system learns what a dog looks like or what a cat looks like, and the test is essentially just a way to verify that. So that's the technical side, training and testing, and our expectation is that after undergoing this procedure the system knows what a dog and a cat look like. Now, what we discovered is that, even though this is our expectation, it is not exactly what's happening. What is essentially happening is that our deep learning models in particular, but really machine learning models in general, only have to figure out the correct answers to this kind of quiz that we give them at the end. You don't necessarily have to learn what a dog or a cat really is from the human perspective; all you have to figure out is that there are certain patterns in the data that correlate well with one label versus the other, and then you extract these patterns and make your classification based on them. So far, this is not surprising. What is surprising is that the patterns that turn out to be most effective for performing well on this test are totally not the patterns that we as humans would see as correlated with the label in any way, or as having anything to do with this picture corresponding to a dog and that picture corresponding to a cat. And essentially, what adversarial examples do is exactly leverage and manipulate these patterns that the machine learning model is sensitive to but that to humans seem completely unimportant, so as to change the prediction of the model while not changing anything to a human.
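[Editor's note: a toy illustration, not the paper's exact construction, of how a model can score well by relying on patterns that are each tiny and humanly imperceptible but, in aggregate, correlate strongly with the label, and how a small nudge to exactly those patterns flips the prediction. All numbers here are made up for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 1000
y = rng.choice([-1, 1], size=n)                   # label: -1 = "cat", +1 = "dog"
X = 0.1 * y[:, None] + rng.normal(size=(n, d))    # each feature: weak 0.1 signal + noise of std 1

# A classifier that simply averages all the features is almost perfect,
# because the many individually weak correlations add up.
scores = X.mean(axis=1)
print("clean accuracy:", (np.sign(scores) == y).mean())            # ~0.999

# Nudge every feature by 0.2 against the label: tiny compared to the noise,
# so the example "looks the same", yet the prediction flips almost everywhere.
X_adv = X - 0.2 * y[:, None]
print("accuracy under tiny nudge:", (np.sign(X_adv.mean(axis=1)) == y).mean())  # ~0.0
```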
