Disrupt Your SOC or Be Disrupted
This session from The Modern SOC Summit is with Girish Bhat and DJ Goldsworthy about disrupting your SOC. Girish is the Vice President of Security Marketing at Sumo Logic, and DJ is the Director of Security Operations and Threat Management at Aflac. Today, they share how we can shift more focus to defending our systems from modern attacks. DJ tells us why we should be moving towards an autonomous SOC and the importance of using risk scoring methodologies.
DJ Goldsworthy, Director of Security Operations and Threat Management, Aflac
Girish Bhat, Vice President of Security Marketing, Sumo Logic
Girish Bhat: Welcome to the Modern SOC Summit. I'm Girish Bhat, and I lead the security marketing practice at Sumo Logic. Today's session is titled Disrupt Your SOC or Be Disrupted. Before we jump into the session itself, it's my pleasure to welcome the main speaker for today's session, DJ Goldsworthy. DJ is the SecOps and Threat Management Director at Aflac today, and I've had the opportunity and pleasure of working with him for many years now. Not only is DJ an accomplished security leader, but he's also a hands-on practitioner. And my hunch is that he's had or played with every possible security tool and technology that exists in a security operations center today. I know, DJ, I did not do you justice by introducing you that way; I know you have a very checkered and accomplished history in security. Could you share a little bit more about what you do today for today's audience?
DJ Goldsworthy: Sure. Yeah. I've been with Aflac, I'm sure many of you are familiar with Aflac the duck, for about six years now. I came over to help build up some of the cybersecurity capabilities with a focus on threat intelligence and enterprise vulnerability management, and expanded my responsibilities to take over all of security operations, incident response and forensics, administration, and more recently engineering. So I've really helped build a lot of the security capabilities. And then through my various teams we do a lot of the hands-on aspects of security for Aflac. So we've built quite a bit over the last five years.
Girish Bhat: Thank you, DJ. And thank you so much for agreeing to speak at our Modern SOC Summit. So let's jump into the topic of today's session, disrupt your SOC or be disrupted. I'm a data-driven person. I always like to base things on data to the extent possible. And what I always like to do is to look at third-party data in the industry. So for today's session I'm leveraging third-party data published by Ernst & Young, the global consulting firm. Every year they put out a report, and I highly encourage you guys to take a look at it. It's a very well-written report and it tends to be vendor neutral, which is what I like about the report as well. And there are four statistics that I want to share today. The first statistic that's very relevant here is that SOC spend dominates all cybersecurity spending within enterprises. The sample size is roughly 1,300 enterprises across the world that were surveyed by Ernst & Young, so SOC dominates. The interesting part is that this survey also indicated that 59% of the surveyed organizations experienced some sort of a material breach or a significant breach. This is a big deal, right? Let's look at it a little bit more. Out of the 59% of the companies that were breached, that's roughly 767 companies out of 1,300, only 26% of the SOCs within these enterprises could actually detect that. That's only about 200 companies that had their SOC detect it, and roughly 560 where the SOC failed to detect the attack or the breach. Look, there's always a blame game to these things, right? Is it tools? Is it technologies? Is it humans? Is it workflows? Is it strategy or the lack of it? It's all of the above in many companies, right? So there's a fourth statistic that is very relevant here. The report also said that 86% of the cybersecurity spend is on more mundane and reactive measures and also some important measures around compliance.
Don't get me wrong, compliance should be the foundation of any cybersecurity strategy, but you need to do more in terms of being able to defend against modern attacks and also grow your system. So that's the theme of today's session: we absolutely need to change. This is my view, and I can't think of any better person to talk about this than DJ. DJ, the session is all yours, sir.
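The company counts quoted in this introduction follow from the survey percentages. As a minimal sketch for checking that arithmetic (the function name and the rounding are assumptions made here for illustration, not something taken from the Ernst & Young report), the breakdown can be reproduced in Python:

```python
def breach_detection_breakdown(surveyed: int, breached_rate: float, detected_rate: float) -> dict:
    """Turn the quoted survey percentages into approximate company counts."""
    # Companies that reported a material or significant breach.
    breached = round(surveyed * breached_rate)
    # Breached companies whose SOC detected the breach.
    detected = round(breached * detected_rate)
    return {
        "breached": breached,
        "detected_by_soc": detected,
        "missed_by_soc": breached - detected,
    }


# Figures as quoted in the session: about 1,300 surveyed, 59% breached, 26% detected.
figures = breach_detection_breakdown(1300, 0.59, 0.26)
print(figures)
```

With exact rounding the split comes out slightly different from the round numbers spoken in the session (about 200 detected and roughly 560 missed), which is consistent with the speaker approximating.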
DJ Goldsworthy: Thank you, Girish. That was a great introduction and a great synopsis of the challenge at hand. I can definitely say firsthand, working with a lot of peers, that I see some of these challenges on the front line. So when it comes to framing the way that I think we move forward as an industry, I think it's important that we look at the direction that our adversaries are likely to go. And that becomes our target preemptively; we don't wait until they get there and then react. And where I believe that we're heading eventually is AI versus AI. And I know that's a buzzword and a lot of people's skin crawls a bit when they hear AI, because there aren't a lot of real practical examples of AI. But suffice it to say advanced machine learning, deep learning, more intelligence within algorithms driving attacks. And things are going to accelerate to a pace that we can't really fathom right now. And so as attacks move to more of a high velocity they're going to look more like pen tests, but very fast moving. And so what I envision the adversarial landscape looking like is they'll have open-source intelligence on companies, social media profiles of employees, a good picture of what we as a company look like or you as a company look like or any target looks like, and there'll be an adaptive attack engine as a service, that is, a framework built at scale to launch attacks. And essentially these attacks will use open-source intelligence and adaptive frameworks and machine learning to acquire targets, multiple targets per entity, and go through multiple stages of attack, iterating very quickly, changing things like source domains and source IP addresses. The infrastructure will be highly variable so that you can't do just pure IP-based blocking and stop an attack in its tracks. The methods will be changing, and if method A doesn't work it'll move on to method B and so forth.
And so if you think about things moving very quickly and the attacks being polymorphic and rapidly evolving, security is going to have to be obviously very adaptive, very automated to be able to sustain these types of attacks and keep them at bay. And so looking at the statistics that Girish put out there, we're obviously not there yet as an industry. And as attacks accelerate the problem's going to get worse if we don't head it off. And I'm pessimistic about the state of affairs of criminal organizations and their intent, but optimistic about our ability to get out ahead of them. So, to do so though, I believe we need to accept some hard truths. When we look at the number of incidents that occur, when I say exceedingly rare here you would say, "Well, Girish's stats just said that they're not rare." I mean on a company-by-company basis: the chances are your company doesn't have dozens of incidents a month, or even dozens a year necessarily. And then when you look at near misses, where there was an attack and it got stopped at some phase of the kill chain and didn't get to action on objectives, where they actually got data or did some form of disruptive attack, the number goes up some. We certainly see companies have near misses on a bit more of a regular basis, but still, if you add all those up, I'd venture to say that the average company probably is within maybe a couple dozen to not more than 50 of these real incidents and near misses combined. But the average SOC has thousands of alerts. And so when we take the simple truth that incidents and near misses are quantitatively small numbers but our alerts are very high numbers, we know we're missing the mark. So we have to be bold as an industry to address that problem. And I think if we begin with the end in mind then we can have a North Star that we're working towards, and I think that that North Star needs to be the near zero alert SOC.
And I know that that may sound a bit tongue in cheek, like, "Whoa, of course, near zero." But again, that's a target. If we know that incidents and near misses are relatively rare, that should be our target. It shouldn't be to get from a thousand alerts to 900, it should be to get from a thousand alerts to very few, so that our alerts are more reflective of reality. And I have some ideas on how I think we can get there as an industry.
Girish Bhat: Hey DJ, I have a question here. So I like the way in which you're talking about this. You presented this saying that, "Hey, there will be some incidents and alerts." And so sometimes I think we as an industry kind of mix and match the terminology. Sometimes what you mean by an alert may be a significant incident in an Ernst & Young report or vice versa, right?
DJ Goldsworthy: Yeah.
Girish Bhat: So I think there's a terminology normalization that needs to happen. But I think more importantly what you're saying is there are tens of thousands of so-called alerts or potential incidents, but really you can count the real ones on your fingertips if you've architected your SOC properly. I think that's what you seem to be indicating. Is that fair?
DJ Goldsworthy: Yes, that's very fair. I mean, when I say incidents I mean damage was inflicted, maybe it's in the news, maybe reporting to regulators or stakeholders that something bad happened. And near misses just means that there was some form of unauthorized access or adverse effects but they were mitigated before the really bad stuff happened. And so this isn't like a phishing email made it into an inbox but nobody clicked it; I don't consider that an incident or a near miss, that's an event. And I'll explain what I think we need to be doing with those in just-
Girish Bhat: Thank you.
DJ Goldsworthy: ...A moment here. Okay. So when we talk about modern SOC attributes I think we need to be moving towards an increasingly autonomous SOC. And no, that doesn't mean SOC people should start packing up their desks and thinking about what they do next with their career. It's really about autonomous in terms of what we currently consider the outputs of a SOC, which is again many alerts and then a person working those alerts. But we already know that there isn't enough time to sit and wait for a human to review an alert and decide what to do. So if we know we have to get faster then we have to really change the paradigm of how a SOC operates. And in my mind the way to get there is to focus on automation, and there's a couple of things that you have to do to be able to automate. You can't just automate; it's not just go code and automate and you're done, there's some foundational work that has to be done first. And I believe in the process of moving towards an autonomous SOC, we redefine what it means to be a SOC analyst. And I think that that's going to be an important evolution that's going to help shape the way that we do operations in the future. And so in a modern SOC I envision analysts spending more time working on the system than in the system. And what I mean by that is today they work in the system, meaning they're fielding alerts reactively, doing investigations and calling a lot of stuff false positive, and then closing it out and sending a few things on to incident response, maybe to do deeper investigations. But if we go back to that thought about the low to no alert SOC, most things aren't incidents. We know most of the tickets or alerts that we work aren't incidents; if they were we'd all be in big trouble, because we'd be having to explain to our board why we have so many incidents every month. I don't think that that's what's happening. And so with those alerts that are being worked, that's time wasted.
And so what I'm proposing, and what I believe the industry will move towards, is analysts looking at each false alarm, each alert that comes in that's not an incident, as an opportunity to improve the system to get long-term rewards. So they're going to spend more of their time as coders, as threat modelers, as engineers building a system that's more effective, automation and so forth. Because we need to repurpose their skillsets from reading alerts and responding to them to figuring out: why does an alert fire in the first place? What type of data makes a high quality alert? When does something fire that actually warrants attention? And how do I get there from an alerting standpoint more reliably and less frequently? And so I think that repurposing the SOC and upskilling them is going to be a critical transition that we need to make collectively. And so it's like, "Okay. That's all good, DJ, you have these ideas, but how do we get there?" And there are ways, I assure you. I think what we need to do first is look at a matrix here of confidence on the Y axis and frequency on the X axis; this is like the spectrum of alerts. Anything that is low confidence, whether it's high or low frequency, we just need to eliminate. We are wasting way too much time, way too many valuable resources responding to low fidelity, low confidence alerts, and that's taking away from the opportunity for us to really improve things. So I think the first thing we need to do is eliminate those low confidence alerts and be comfortable saying to our leadership team, "We shouldn't be spending our time on this. We've got to be spending our time on better things and therefore some of your metrics are going to change, but that's a change that we welcome." Then you get to this middle tier of confidence. And this is where, if you haven't already, I would recommend you start shifting towards risk scoring methodologies, where you have things like signals as opposed to one-to-one alerts.
And you aggregate risk based on entities, whether that's a system or an ID or some other fixed attribute that makes sense to collect risk on, and watch that risk over time for multiple indicators that something's happening. Because medium confidence isn't enough to trigger an alert each time. We're so focused on catching everything that we're ultimately catching very little as an industry, right? So we have to get more and more comfortable saying, "We're going to let some risk bubble below the surface, but if multiple things happen and that risk exceeds the threshold, now it warrants attention. Now it rises to the fidelity of earning the right to command our SOC folks' time." And also I think the industry's getting increasingly more effective at analytics. There's still a ways to go, but I'm seeing machine learning capabilities that are functioning well, deep learning models that are getting more effective. And we need to be working closely with our partners on how to develop more and more effective analytics and deep learning models so that this middle tier of confidence alerts can be elevated up to a higher confidence composite alert. And so as we do that with risk scoring, and as we do that with improved models from our partners and maybe even some we're developing internally, though I vastly prefer our partners do a lot of that lift, we can slowly eliminate a lot of that middle tier. And what we're left with in a perfect world is high confidence alerts. There's always going to be new stuff coming in, the orange and the red will creep back in, and that's where we have to iterate through. And that's where I get back to the SOC: one of their core responsibilities in the future will be to work on the system. They're going to see that orange creeping back in or some red creeping back in and say, "No, we've got to go build on this system to get rid of that."
We need more enrichment, we need new data sources, we need smarter model rules, we need to do better risk aggregation, working collectively to fix that and push it back up to these high confidence composite alerts again. And once we're in the green, this is where we can automate. Okay. Don't try to automate in the orange and red unless that automation is automating additional enrichment or other things that are going to raise the confidence up. Once we get that high confidence though, we can start automating. And this is where over the course of a year or two years, maybe three years, we can increase automation to get closer to autonomous SOCs, where we are building truly adaptive, truly resilient networks that can withstand the impending acceleration of attacks that are using more machine learning against us, and build resiliency in. If you see this, automatically block; if you see similar traffic, take these actions; start implementing speed bumps, multifactor challenges. There are various things that we can do to improve the responsiveness without a person needing to be involved, because if we're high confidence, the amount of human involvement needed drops drastically. And so I think this is how we can make substantial progress.
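The three tiers DJ describes (eliminate low confidence, aggregate medium-confidence signals into entity risk, automate on high confidence) can be illustrated with a small Python sketch. Everything in it is a hypothetical assumption for illustration: the `Signal` fields, the confidence cut-offs, the risk threshold, and the entity and signal names. None of it is taken from Aflac's or any vendor's implementation, and a real system would also age risk out over a time window, which is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cut-offs; real values would come from tuning against your own data.
LOW_CONFIDENCE = 0.3    # below this, the signal is dropped (the "red" tier)
HIGH_CONFIDENCE = 0.8   # at or above this, the signal is safe to automate on (the "green" tier)
RISK_THRESHOLD = 50     # accumulated entity risk that earns analyst attention


@dataclass(frozen=True)
class Signal:
    entity: str        # system, user ID, or other fixed attribute risk is collected on
    name: str
    confidence: float  # 0.0 to 1.0
    risk: int          # risk points contributed by this signal


def triage(signals):
    """Split signals into automated responses and composite entity alerts."""
    automated = []
    risk_by_entity = defaultdict(int)
    for signal in signals:
        if signal.confidence < LOW_CONFIDENCE:
            continue  # eliminate low confidence signals outright
        if signal.confidence >= HIGH_CONFIDENCE:
            automated.append(signal)  # high confidence: hand to automation
            continue
        # Middle tier: no one-to-one alert, just accumulate risk on the entity.
        risk_by_entity[signal.entity] += signal.risk
    composite = sorted(e for e, r in risk_by_entity.items() if r >= RISK_THRESHOLD)
    return automated, composite


signals = [
    Signal("laptop-17", "rare_process", 0.5, 20),
    Signal("laptop-17", "odd_login_hour", 0.4, 15),
    Signal("laptop-17", "new_admin_tool", 0.6, 25),
    Signal("server-02", "port_scan_noise", 0.1, 5),
    Signal("server-02", "odd_login_hour", 0.4, 15),
    Signal("hr-user", "known_ransomware_hash", 0.95, 90),
]
automated, composite = triage(signals)
print([s.name for s in automated], composite)
```

In this example only one entity crosses the risk threshold and only one signal is automated, so six raw signals produce a single composite alert for a person to look at.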
Girish Bhat: Yeah. So DJ, I couldn't agree more on this slide and all the other slides so far, right? Because I think directionally what we're also hearing, just like you are highlighting, is humans in the loop for the high value intelligent activities, and augmenting humans with the power of machines as relevant and possible to help scale. So you're spot on. Thank you, sir.
DJ Goldsworthy: Yeah. Thank you. So on the analytics, I just want to add one thought here on how I think we'll see SIEM and analytics evolve in the near term to really help, I think, bridge the chasm of machine learning, getting the middle tier of confidence alerts up to that higher confidence tier more systemically, on a very highly reliable basis. And so what I envision is this: breach and attack simulations already are a pretty well-proven segment of the market. A lot of companies have adopted that, those fortunate enough to get funding for it. And how that works is we take threat intelligence and security research: what are adversaries doing? What does their infrastructure look like? What does their malware look like? And you run it through a breach simulation platform that then simulates that attack in your environment, right? So it's how this attack looks, how ransomware looks in our environment, how remote access tools work in our environment, how lateral movement techniques look, and so forth, the whole gamut. And if we can integrate that into our SIEM, but not just on the basis of, does the SIEM trigger a one-to-one alert because it saw some log, a "hey, that log means something bad happened," but rather our partners are building deep learning models that are supervised deep learning models: this is what bad looks like. And it just keeps detonating attacks over and over and it's training these models: here's what bad looks like in our environment. "In our environment" is a very important part, because it's instrumenting it to focus the model's attention on how it looks when it occurs in our environment. And when you complement that with what we've been doing as an industry for a while now, which is unsupervised learning, where we're just taking logs and saying, "We're going to be able to find out what abnormal looks like after we build a baseline."
That by itself hasn't proven to build enough confidence in our ability to get to that high confidence bracket, because a lot of stuff looks abnormal. Things change in environments, new systems come online, baselines change, and suddenly the unsupervised learning is going, "Hey, this looks bad. This looks bad. This looks bad." And the SOC's back to chasing their tail again. But if we combine the "this is abnormal" with "this is abnormal and it looks like bad" based on the supervised deep learning models, I think we can get to a much higher fidelity system of alerting. And considering that the breach and attack simulation is happening based on current threat intelligence, within a very brief period of time we're able to model what the new techniques look like. And we'll have some really good archetypes, some really good peer groupings, clusters of behaviors that say, "This is what ransomware looks like, and that ransomware's evolved. This is what the new generation of ransomware looks like." And as those behaviors that are unsupervised start to look like that we say, "You know what? This looks like somebody moving laterally. Because I've never seen it before and it looks more like this than this normal person over here." And so when that happens I think we can get to highly automated responses, even more so than we are now. We have plenty I think we can automate now and start making good progress, but we need to make this leap collectively. And I think we will. And I think if everyone continues to encourage their partners to go this direction, we as an industry can help it evolve together. And I think that that's our responsibility, to all collectively try to effect change in the right direction.
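The combination DJ describes, raising an alert only when behavior is both abnormal against the baseline and similar to a known attack archetype, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the three-number feature vectors, the archetype values, the thresholds, and the use of cosine similarity in place of a trained supervised deep learning model are all hypothetical choices made here, not how any SIEM or breach and attack simulation product works.

```python
import math

# Hypothetical archetypes, imagined as learned from simulated attacks in your own
# environment. Features (assumed): file write rate, new host logins, data entropy.
ARCHETYPES = {
    "ransomware": (0.9, 0.1, 0.8),
    "lateral_movement": (0.2, 0.9, 0.3),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(behavior, anomaly_score, anomaly_min=0.7, similarity_min=0.9):
    """Return the matching archetype name, or None if no high confidence alert is warranted.

    anomaly_score stands in for the unsupervised "this is abnormal" output;
    similarity to an archetype stands in for the supervised "this looks like bad" output.
    """
    if anomaly_score < anomaly_min:
        return None  # not abnormal for this entity, so nothing to raise
    name, similarity = max(
        ((n, cosine(behavior, v)) for n, v in ARCHETYPES.items()),
        key=lambda pair: pair[1],
    )
    return name if similarity >= similarity_min else None


# Abnormal and close to the lateral movement archetype: high confidence alert.
print(classify((0.1, 0.8, 0.2), anomaly_score=0.9))
# Abnormal (say, a new system came online) but unlike any archetype: no alert.
print(classify((0.5, 0.5, 0.5), anomaly_score=0.95))
```

The second call is the case DJ warns about: the baseline changed, so unsupervised detection alone would fire, but requiring resemblance to a known bad pattern keeps it from reaching an analyst.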
Girish Bhat: Yeah. So DJ I was just having some flashbacks going back five, six years when we started off with unsupervised machine learning aka you run things like that, right?
DJ Goldsworthy: Yeah.
Girish Bhat: The new shiny thing. I mean it's great for baselining is what we learned over time, but then on its own it's a good identification of our potentially baselining but you need to augment it with real- time threat intel as well as other tools.
DJ Goldsworthy: Yeah.
Girish Bhat: So, yeah. So this is what we're consistently hearing. Thank you for highlighting this.
DJ Goldsworthy: My pleasure. Yeah. I think the industry is getting on board with this concept more and more and it's a good thing. I think that this is going to take us far. So, key takeaways. We didn't have the time obviously to get into details, and I'm on LinkedIn if anybody wants to ping me and talk details; happy to connect you with my team or talk details myself. But thinking about how you can practically walk away from here and start doing some things now, I say number one is eliminate the waste and distractions. We just have to say enough is enough, and we're going to call ugly, ugly and get rid of those ugly alerts. The stuff that's low value, we just need to stop doing it. Then we focus on that middle tier. Work with your partners on their analytics models, tell them what's working and what's not, do some threat modeling and explain what kind of models you want to see from them. And then within your own system continue to enhance your correlations and the risk scoring within those to try to increase the fidelity. And all these steps are giving time back to your SOC. Work with your SOC to upskill them, to get them more comfortable working on the system than in the system, so you can get that upward spiral of giving time back and improving the accuracy. And the more you do that, the more time they have to do even better and keep improving things. And I say that we've done this at Aflac. We started with thousands of alerts and we have very few these days, far fewer than we started with, because we've really focused on investments and the time to improve the system. And so I'm not selling you something that I haven't done myself. And then augmenting people with automation. Once you have that high confidence, take the time to automate it, have trust in it, trust but verify. Shift your metrics to verifying that the automation is working, that you're not creating help desk tickets because you're blocking people in scenarios where you shouldn't and stuff like that.
And so long as you're not, continue on the journey of automation and keep automating more and more until one day you look and say, "Wow. We're a largely autonomous SOC. We're spending much more time on the system than in the system." And life will be better.
Girish Bhat: DJ, I'm going to jump in here and ask you a question, because I'm like, "Wow, this is great." So you obviously have built this sophistication over a period of time, six years at Aflac. And also Aflac has quite good access to resources: human, capital, among others. So for someone who perhaps has a moderate level of maturity in their SOC to come up with or head down this road map, what's the time horizon?
DJ Goldsworthy: I mean, it's a journey. It is not a tactical move, it's a very strategic move. The timeframe to invest in getting things built right, then iterating and giving yourself time back, and then reinvesting that time, like in the stock market, collecting those dividends and reinvesting them: it's a multi-year journey. Generally speaking at least a year or two, but you can make some really meaningful gains pretty quickly. So I say that the journey to get to a mature state is probably a multi-year endeavor, so starting now is important. But the timeframe from start to where you will go, "Wow. We've made really good progress," is months. Even with a program that doesn't have a substantial amount of funding you can make impacts by just starting with that process of eliminating waste and distractions and then reinvesting that time in engineering and automation. You know, improving the system, not working in the system. Don't free up your SOC's time and then ask them to do other busy work; let them work on the system and you'll start to see the benefits quickly.
Girish Bhat: Cool. Thank you so much, DJ. That was so informative. On behalf of the Modern SOC Summit organizers and Sumo Logic, thank you, sir. We appreciate you spending your time with us and sharing this back with the community.
DJ Goldsworthy: I enjoyed being here. Thank you all very much. I appreciate it.