Automate and/or Die?
Hi there, welcome to the Cloud Security Podcast. Thanks for joining us today. Your hosts here are Tim Peacock, product manager for threat detection here at Google Cloud, and Anton Chuvakin, a reformed analyst and a member of the Cloud Security team here. You can find the podcast wherever podcasts are distributed, and on our website when we finally launch it. You can also follow us on Twitter at twitter.com/cloudsecpodcast. Today we have a new guest and a new topic, and the topic will involve remediation in the cloud. Our guest today is Joe. Hey Joe, welcome to the podcast. Could you introduce yourself and tell us about your previous role at Citi as well as your current job?

Sure, thank you guys for having me. In my previous role at Citi I came in as the vice president of cloud native security engineering, focused on Google Cloud. They were doing Google Cloud adoption, and my job was to use cloud native tools as much as possible to build security guardrails to protect the workloads that would be going into the cloud. After my boss left, I ascended to senior vice president as well as head of the CISO's special projects, so I did a little bit of work still mainly focused on Google Cloud but also with AWS. Now I work for one of the larger cybersecurity companies in the world, still focused on Google.

When you were at Citi you focused on Google Cloud. Is this like Conway's law in action, and was there an upside to that? Was it good Conway's law in some ways?

Yeah, I think that it was Conway's law almost in a global way, because I think there are a lot of large enterprises thinking the same way about cloud and figuring things out, trying to close security gaps and close the gaps in technology between traditional and cloud. And I think that cloud kind of just put itself out there as one of those tools that allows us to build and customize.

You said you were focused on native tools as much as possible. What did that end up looking like? What does that mean for you?
There were really three areas of security guardrails that we built. There was the preventative, obviously starting with your pipeline and moving forward. Then there's the alerting: whatever SIEM tool and threat hunting tool you're using, integrating into that. And then there was the remedial aspect of it, for which we used Cloud Functions. Serverless was the greatest invention to hit cloud as far as security and being able to customize your tooling.

By remedial, I don't imagine you mean the kind of classes I had to take as a child. I imagine you're actually talking about the topic of our podcast today, which is threat remediation or vulnerability remediation. We see a lot of people talking about threats and vulnerabilities in the cloud, and even within my product we sometimes mix them up. Do you think that crisp definition was important in your role at Citi, or did you guys treat those as the same thing that had to be responded to automatically?

I think they were very much the same, but I think cloud certainly blurs the lines a little bit between the two. The crisp definition still wins out in my opinion, because there's still some level of human interaction or human-made software that has to play a role. But when you think of cloud, you get into the idea that every resource is now an identity. Your VM is accessing Cloud Storage as the service account that it's attached to. So it definitely blurs the lines, and you have to treat it very differently and make sure that you're following best practices. And again, cloud is one of those areas where the lines are blurred, but it's still traditional enough that your dyed-in-the-wool technician is going to understand it.

Yeah, but here's the one that usually tweaks me.
Because I am the person who would fight to the death over the definitions and not confuse threats and vulnerabilities. If you forgot to patch and you have a weakness that somebody can exploit, versus somebody just hacked you, picked the password, or did something to you, the chasm is kind of huge, and I am really quite allergic to people who confuse them. However, when we started looking at this specifically in the cloud, I kind of noticed that some of the workflows, some of the thinking, and some of the processes are similar, so I had to reset my brain a little bit. Before that, I'd be reaching for my baseball bat to ask people, why are you confusing a buffer overflow with an attack? These are from different domains. But I see that when people remediate an unpatched system versus when they remediate a system where they have evidence of password compromise, they sort of flow through similar steps, and I started questioning my kind of religious fervor in this regard. Is that what's going on, or is there something new here?

You know, I think you hit the nail on the head. The threats have changed, just taking into consideration what we use to build these tools. So serverless, right? It's a whole new technology that a lot of people don't understand. You have these functions and you put the code out there one day, but if you're not monitoring that code, if you don't know that it's been updated or changed, and you don't have control over those IAM permissions, that becomes a threat and a vulnerability almost at the same time, because someone can add a line of code, start data exfiltration, and you have a serious problem on your hands. So I think you just have to be aware that identity is totally different in the cloud, and your identity and access management is where all of your security has to start.

A weak password is still a vulnerability, but you're one step away from being exploited in a very bad way. That's roughly what may be going on, because, to me,
I like to highlight the fact that identity is the main boundary in the cloud. You may not have layers of firewalls. And that's going to be the topic of a future podcast, I'm sure, where we're going to wax poetic over how identity is really your main security barrier in the cloud. It's not physical, it's not firewalls, apologies to firewall vendors, but it's mostly the identity that really does separate the good from the bad.

And apologies to the poor firewalls that were being kept in buckets. That's awful. Let them live their lives. They served us well.

So we touched a little bit on this. I sort of mentioned process, and you guys mentioned automation. So the real elephant in the room is: can we automate remediation of vulnerabilities, and can we automate dealing with some of the threats in the cloud? How do we go about that? So this is kind of my introduction to the main feature. Can we hope for automation or not?

I think so. I would say that in the cloud, if you are not open to automation, you're probably in the wrong industry, because it's given us the agility that we've always been promised technology was going to give us. Your products are changing every day, your APIs are changing, the way that they are integrated is changing, and there's a lot of back and forth there. One of the things that has to be stressed here is that, from a remedial standpoint, you have to really test things. You have to make sure, before anything hits production, that it's doing only what you want it to do. So there's a lot of back and forth between the logs and "oops, I didn't want to destroy that, I need to layer this a little bit better." But yeah, I think there are very few things in the cloud that I would say you can't automate if you have competent engineers on your staff. And that's the hurdle to clear, right? I think there's a limited number of security engineers, and certainly an even more limited number of cloud security engineers.
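To make the point about testing a remediation before it reaches production a little more concrete, here is a minimal Python sketch of a dry-run mode. It is only an illustration under stated assumptions: the `Resource` fields, the `ALLOWED_REGIONS` set, and the `delete` callback are hypothetical names introduced here, not anything described in the conversation, and no real cloud API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory record; the field names are assumptions."""
    name: str
    kind: str
    region: str


# Hypothetical policy: only these regions are allowed.
ALLOWED_REGIONS = {"us-east1", "us-central1"}


def plan_remediation(resources):
    """Return the names the remediation WOULD delete, with no side effects."""
    return [r.name for r in resources if r.region not in ALLOWED_REGIONS]


def apply_remediation(resources, delete, dry_run=True):
    """Delete out-of-policy resources, or only report them when dry_run is True."""
    planned = plan_remediation(resources)
    for name in planned:
        if dry_run:
            print(f"[dry-run] would delete {name}")
        else:
            delete(name)
    return planned


inventory = [
    Resource("vm-a", "vm", "us-east1"),
    Resource("vm-b", "vm", "europe-west4"),
    Resource("bucket-c", "bucket", "us-central1"),
]

deleted = []  # stands in for a real delete call
planned = apply_remediation(inventory, deleted.append, dry_run=True)
```

Running the plan against a test inventory first, and comparing it with what was expected, is one way to catch the "oops, I didn't want to destroy that" case before the function is allowed to act for real.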
We certainly talk about the challenge of hiring. One of the things we talk a lot about on my team, ironically when it comes to guardrails, is how do we, in our system for enforcing guardrails, put up the right guardrails to help security teams scale their human operations with less effort. One of the things I run into a lot as a challenge, as I talk to users about automated remediation, is that to me it feels like there's a really big gap between where we are with automated response and where we could be with automated remediation. Let me give an example to tease that difference apart. If I have a basement and water gets into it (see, I grew up on the East Coast, and this is the thing I worry about; you don't worry about it in California), it's really easy to have a sump pump with a float that automatically kicks on and pumps the water out. Now that's automated response, but that didn't deal with the now moldy drywall or the crinkly pictures of grandma. To deal with that, I call Servpro. They show up with a fancy fan and they fix my house. That's remediation. So when you were building these things previously as the elevated senior VP, were you building response or were you building remediation? And did you have a path from one to the other?

I think that response and remediation kind of became one in some ways, because we were really solving for the fact that we had a lack of internal knowledge amongst ops teams, amongst the SOC. We were having to solve problems in engineering that ordinarily would have been the problem of some other team. So when we were responding to things, it was important that we did kick on the sump pump, but also got all the artifacts that were necessary to build out reports so that we could respond properly. You don't want to just kill someone's resource and then not have a conversation about why. So I think that automation end to end is really the key.
And making sure that you are keeping the artifacts, keeping the documentation. It's just become so natural for us to automate everything at this point.

So you didn't have a human in the loop most of the time? It really was truly automatic?

Truly automatic. As far as I'm concerned, you can get away with having a human in the loop and still call it automation, but end-to-end automation is really the key. The fewer hands touch things, the fewer mistakes you make and the less likely you are to have human engineering and influence, so it just makes for a much more secure environment.

But then you have to trust automation to be sure that it remediated the issue, right? Like Tim pointed out, there's this whole distinction between remediation and response. Automation can act, but how do we know that the problem is solved at the end? Does a human have to check? I'm trying to be a bit of a devil's advocate, but how do they get to trust that the problem is truly solved by the machinery with no humans?

There is a kind of loop that you build with automation, where you do the remediation and then you output the logs to report that the remediation was carried out. And then you have another loop that goes back and checks your automation to make sure that it is doing what it's supposed to do. One part of the human element that we really can't get rid of is the auditing aspect of things. You need those auditors to come in and check your code. You want that checks-and-balances system. It's important. Automation will take us far, but it will only take us so far.

That's really interesting, that automation runs in an organizational context of: we think we can solve this problem, but did we actually solve it?

One of the interesting things is that as a security engineer, and especially in leadership, you have these metrics that you have to be able to provide when reporting up the chain. They may not care, but they want to know that you're doing something.
They want to make sure that the tools you're building are doing what you say they're doing. So you have to produce some sort of metrics and take them reports about, you know, here's what we stopped this last quarter, here's what we know happened. Do they care? Probably not. They probably care about money and PR, right? They don't want to end up on the nightly news. You give them the confidence through your tooling that they're not going to end up on the nightly news.

It's one of those cases where metrics are interesting for both managing up as well as managing down. What were you tracking with your team when it came to this? Were you tracking how many use cases you had, how often they fired? How did you manage this?

I was tracking how often they fired, how often the alerts that we built fired. Because we layered things, we always assumed that one layer was going to fail. So if you built a preventative measure, plan for the fact that someone could local-exec anything in their Terraform, right, and make a change. So be ready for that with alerts, and then when the alert fails you go to remedial. We tracked how many alerts we had versus how many actual remedial events we had. We tracked the type of infraction that we were correcting, so if someone built a VM workload or container workload in a region that wasn't allowed, we would track that so that we knew the kind of threats that we were dealing with. But more often than not, we were providing insights into: here's the CIS benchmark, here's what we stopped, we are in compliance at all times. That was really what they wanted to see.

Some of this sounds like it's a policy issue rather than a weakness or vulnerability or a threat. If I launch the VM in the wrong region, eventually it may end up with privacy issues or whatever, but ultimately it's neither a vulnerability nor a threat, right? It's sort of a policy issue. It might be too subtle, but that's the story.
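The tracking described above (alerts versus actual remedial events, broken down by type of infraction) can be sketched in a few lines of Python. This is a hedged illustration only: the event records, the `layer` and `infraction` field names, and the infraction labels are all hypothetical and were not given in the conversation.

```python
from collections import Counter

# Hypothetical event log; in practice this would come from exported logs.
events = [
    {"layer": "alert", "infraction": "disallowed_region"},
    {"layer": "alert", "infraction": "public_bucket"},
    {"layer": "alert", "infraction": "disallowed_region"},
    {"layer": "remediation", "infraction": "disallowed_region"},
    {"layer": "remediation", "infraction": "public_bucket"},
]


def summarize(events):
    """Count alerts vs. remediation events, and remediations per infraction type."""
    by_layer = Counter(e["layer"] for e in events)
    by_infraction = Counter(
        e["infraction"] for e in events if e["layer"] == "remediation"
    )
    return {
        "alerts": by_layer["alert"],
        "remediations": by_layer["remediation"],
        "remediations_by_infraction": dict(by_infraction),
    }


report = summarize(events)
```

A summary like `report` is the sort of thing that could feed a quarterly "here's what we stopped" report; which counts matter most is a judgment call for each organization.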
I think you're one hundred percent right, and making sure that everybody in a massive organization understands what the policy is, that's a challenge. So a lot of it was just learning on the fly in an organization, and making sure that people had the logs output to understand why their machine suddenly disappeared, why their storage suddenly disappeared. That was a fun time.

Did you get particularly memorable emails from people whose instances disappeared?

Oh, absolutely, absolutely. And there are always the times where something goes a little farther than it should have and gets rid of stuff that they needed. That's why the testing is important, and making sure that you're giving people the information so that they can build the right resources.

How about we drag the elephant in the room into the light and say that some of the stuff you're describing kind of points out that your team, and the teams around you, did not have this notorious security fear of automation. You mentioned something may have gone too far, whatever, but we always hear stories from the nineties and two thousands when somebody blocks a laptop, and it was the COO's during a presentation, so the whole project was shut down and the whole domain of security automation was terminated. We hear those stories, but apparently in the cloud things aren't that bad, or maybe even aren't that way at all. So what about the cases where things went wrong? How come automation survived those occurrences? What's different in the cloud?

I think one of the biggest things is that recovery can be so rapid in the cloud, where before there was such a massive process to correct any of the issues that you created. The worst you run into in the cloud is that you're going to have to redeploy an entire new project, and what is that going to take you, from the perspective of infrastructure as code? So I think there's less fear because of that. One of the things that people say about the cloud is that the landscape is huge.
It's harder and harder to see everything. But at the end of the day, you're actually able to visualize your isolation, your VPC and project boundaries, so much more easily in the cloud than you are in everyday life. You can target specific instances. You can take things to a resource level for access and make sure that they are meeting the requirements needed for special workgroups. The upside is that you get more benefit than you get negative.

It sounds like you can also use automation to fix automation mistakes, right? I've heard that. Maybe I was reading between the lines.

Absolutely, you're one hundred percent right. You have to build a loop in, and one of those loops is actually making sure that the functions that you're using to secure your account are always available. So if something gets deleted, having an automated redeployment of that security function in the back end is really important.

I like the thinking with loops. One of my favorites is John Boyd, who came up with the OODA loop, so I like where you're going with this. One question I have gets back to people, though. It sounds like you had good people; actually, I know you had good people, because I met some of your team. What advice would you have for somebody who wanted to get started with this when they had fewer people, or fewer people they trusted as much as you trusted your team?

Check and double check, obviously. But I think investing in the training and the knowledge is something that a lot of organizations are probably avoiding right now. I think the path is toward speed to production, and really what you lose with that is being able to bring up a team that understands these services in depth. So I think training is really important.
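The loop mentioned a moment ago, the one that makes sure the security functions themselves stay deployed, can be sketched roughly as follows in Python. Everything named here is an assumption made for illustration: `REQUIRED_FUNCTIONS`, the function names in it, and the `list_deployed`, `redeploy`, and `log` callbacks are hypothetical stand-ins, and the in-memory `FakeCloud` class replaces any real cloud API.

```python
# Hypothetical names of the security functions that must always exist.
REQUIRED_FUNCTIONS = {"enforce-region-policy", "revoke-public-bucket"}


def reconcile(list_deployed, redeploy, log):
    """Redeploy any missing security function, then re-check the result.

    Returns (redeployed, still_missing) so a caller can alert a human
    when the second check still finds gaps.
    """
    missing = sorted(REQUIRED_FUNCTIONS - set(list_deployed()))
    for name in missing:
        redeploy(name)
        log(f"redeployed {name}")
    still_missing = sorted(REQUIRED_FUNCTIONS - set(list_deployed()))
    return missing, still_missing


class FakeCloud:
    """In-memory stand-in used only to exercise the loop."""

    def __init__(self, deployed):
        self.deployed = set(deployed)

    def list_deployed(self):
        return list(self.deployed)

    def redeploy(self, name):
        self.deployed.add(name)


cloud = FakeCloud({"enforce-region-policy"})  # one function was deleted
audit_log = []
redeployed, still_missing = reconcile(
    cloud.list_deployed, cloud.redeploy, audit_log.append
)
```

The second listing after the redeploy is the "goes back and checks" step: if `still_missing` is not empty, the automation has not actually fixed the problem and a person should look at it.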
The other thing is, give people a test environment and let them go. Because the other thing that's happening within cloud is that we are so scared of a minute mistake leading to a problem we don't want to face, when really we should just isolate some boundaries and let people get into the cloud and go, because that's the best way to learn. That's how I trained my team: I just gave them access and said start playing with it, and before long they were probably more competent than I was at the end of the day.

It's really interesting that the best way to learn is just by doing like that. Yeah. Well, hey, we are just at time, so Joe, thank you so much for joining us. And everybody who's listened through to this point in the podcast, thank you so much for joining us. Again, you can find this podcast on Google Podcasts and wherever else you get your podcasts. You can follow us on Twitter at twitter.com/cloudsecpodcast. Find Anton and myself on Twitter, tweet at us, email us, argue with us, and if we like or really hate what we hear, we might invite you on the podcast. That's either a blessing or a curse, depending on how much we agree to disagree and how brave you are. So see you all next time on the Cloud Security Podcast.