Video: Clean Room and/or IRE? Stop Guessing and Start Recovering | Duration: 1808s | Summary: Clean Room and/or IRE? Stop Guessing and Start Recovering | Chapters: Webinar Introduction (33.77s), Incident Response Process (95.22s), Investigation Assumptions (328.625s), Investigation Questions (524.73505s), Clean Room Investigation (694.865s), Rubrik IRE Advantages (1034.895s)
Transcript for "Clean Room and/or IRE? Stop Guessing and Start Recovering":
Everyone. Welcome to this webinar about clean room and IREs. We're gonna talk a bit about how the two relate, how they work towards helping you with cyber recovery. It's a very strategic decision. It sounds very tactical. Hey. We're gonna build this environment. We're gonna build that environment. But much like disaster recovery, it's super important for you all to think about this ahead of time to be very strategic in what you build, how you build it, where you build it, all that kind of stuff. So we'll talk through a bunch of that, what they actually mean. You know, what is a clean room? What is an isolated recovery environment? How do they relate to one another? When should you use one or the other? And why each of them matters when it comes to a cyber recovery event? And it can make the difference between taking weeks to recover and taking hours to recover. So a super important thing to to consider. My name is Brian Knudson. I am a technical marketing manager at Rubrik, and I'm excited to to walk you through this. So let's get started. So I like to start with kinda what's the process we go through? What does it look like to deal with the cyber incident? And who all gets involved? So in most environments, you've got a security team, you've got an IT team. Smaller environments, you may be the same person. Bigger environments, they may be two completely disparate teams and different organizational structures that sometimes are not necessarily working together on a regular basis. And so it's important to understand that, one, you've got two different teams with two different goals going on here. You also have a general process things run through. So there's a lot of different frameworks to talk about how cyber incidents happen, what you need to do to prepare. 
I've kinda purposely left this a little bit more generic and not necessarily aligned directly with any one framework, but it's important to understand the framework on both sides and what you're dealing with. As you go through a security incident, you are generally going to start with detection. You're gonna find out that something got into your environment, and that kinda kicks off the process of notifying people, bringing together, essentially, a war room, and just having everybody notified about what's going on. Then the first step after that really tends to be, like, let's stop this thing. We don't want it to spread further. We don't want this issue to get worse. So we need to try and box it in as best we can. Sometimes doing it a little bit on the sly. We'll talk about why it's important to understand that you don't always wanna tip off the threat actor immediately, but stopping it from spreading is the key there. And then we get into the forensics and investigation side of things. Both of them are kinda loaded terms. I'm using them in the most generic sense. It's the period in which we try to figure out what the heck's going on. What do we need to do about it? What do we need to know about it? Spending time learning about the incident and who's doing it, what they're trying to do, what they accessed, how long they've been there. That's really the biggest chunk of the downtime in most incidents: figuring out what's going on in that forensics investigation phase. Once we know what we're dealing with, we get to eradication. We can start to clean up production, make it ready to restore back into. And that's where we tend to flip from the security team to the IT team, is that recovery piece of things. IT wants to move fast. They unfortunately have to kinda sit on their hands through most of this process, which is tough. If you're on the IT side of things, you want to take action.
You know, DR is all about acting as quickly as possible and getting things up as early as we can. That doesn't always work well in the cyber recovery situation because even though the business is down and we're losing money every single hour, and the pressure's on to get things back up and running, there's a serial set of steps that we need to get through, especially if we only have a production environment. No clean room. No IRE. And so we've gotta sit there, and we've gotta let this process play itself out. And that does lead us into a challenge, especially with those two different teams both getting pressure from up above to get things up and running and stop hemorrhaging dollars: we end up with this impossible choice. We can either rush the forensics and risk reinfecting, where we restore the wrong thing and all of a sudden it's broken out again, or we sit there and patiently wait for that investigation to complete, get it completely out of production, and then start to do the restore. And there's a lot of complicating factors that go into all this. I'm gonna do a little bit of whiteboarding here in a little bit, and I'll kind of explain more of that as we get into the details of these different environments and how they relate to this process. But, ultimately, what it kinda comes down to is you've gotta take the time to do it right. You don't wanna rush things. And under pressure, that's really hard to do. So there's kind of three assumptions that we've discovered that customers run into during a cyber incident that can really cause them problems. This first one is to assume that it's safe to do the investigation in production in the first place. This is the environment where the attack happened. This is the environment where things are actively going south, and all of our systems are in place in this environment.
But it's not safe to assume that that is a safe place to do the investigation, because the threat actors are constantly on the lookout for you to be trying to stop them. And, you know, their goal oftentimes at the beginning is to be quiet and to silently move from one step to the next. And if you happen to notice them in your environment and they notice you noticing them, they're gonna hit the fast forward button. They're gonna stop being quiet. They're gonna be noisy and fast and start to execute whatever payloads they've put into place. And so you gotta be careful about that. You gotta be careful when you get into the containment phase and you start trying to stop them from spreading. That's gonna tip them off, because if all of a sudden the system that they've been using as their entry point into the environment suddenly goes away, that's a clue to them that, hey, things might not be going well. And they may actually have a kill switch built into it to execute the payload at that point. So you gotta be careful about how you go about the investigation in production as it is. The second assumption is to assume, like we would in DR, that the most recent point in time is the one we wanna restore to. This is why it's super important to have a separate cyber recovery plan from your disaster recovery plan, because they are two very different situations. And so knowing which point in time to restore to is super important and is a bit of why we have to kinda wait for the investigation piece to really play itself out, because we don't necessarily know that that's the clean point in time. We don't know how long they've been dwelling in the environment. And how you figure that out can make a huge difference between, you know, weeks of downtime and hours of downtime. We'll dig into that more. The third assumption is that scanning after the restore is sufficient. So, you know, we've got a clean environment.
Let's go ahead and restore and we'll scan it. We'll see what's going on there. Well, there's two problems with that assumption. One is the fact that you may accidentally introduce the malware back into the environment. Even if you scan it and find it after you restore, it's had an opportunity to get out and do its thing. Secondarily, if you miss it on the first try, if you're using the same set of tools to scan, in this post event situation as you were in a pre event situation and it missed it pre event, good chance it's gonna miss it post event as well. And so using all of the tools that they have at their disposal to hide from your EDR tools, which they will actively try to do, means that they may be able to hide from those same EDR tools after you after you do the restoration. So all three of these assumptions are places where traditional recovery tends to fail and why you kinda need a different perspective on how to go about it. Ultimately, it all comes down to how do we investigate this problem and prepare ourselves for recovery and more importantly to restore clean points in time. As we go through the investigation, there are several questions we're trying to answer. And these are all questions that that Rubrik has really kinda focused on to try and help our customers reduce that investigation time. What is the attacker's footprint? What is the scope of the attack? Those two are, like, how wide, how deep, and how early did they get into our environment? Third, the identity side of things. Super critical question these days because they are using identity as the entry point, and they are using the identity environment as the backdoor to get back into the environment, because that's not a place that we've traditionally been looking very closely at from a security perspective. Next, what is the sensitive exposure risk? You know, they're looking for data oftentimes. 
And if they have, corrupted your data, if they've encrypted it, they've probably exfiltrated it as well. And will you be able to answer very quickly, based on what systems they accessed, what data they probably exfiltrated? That's a super important question to answer very quickly in in the situation. You definitely don't have weeks to answer that question in most cases. And then finally, what is definitely clean? What's the point in time that we can feel safe restoring into a production environment and and know that we're not going to reintroduce the malware? That clean question has has some complexity in and of itself too. You know, you wanna make sure you not just get rid of the encrypted data, but that you're sure that it doesn't have the ransomware. It doesn't have the malware in it that you can easily identify. Even more importantly, you wanna go back to the point where they maybe just loaded the tools. They dwelled for five days. The first day was just loading their back doors so that they could get back in. Then they started bringing the malware into the environment because that is a risky move for them. That is the point at which maybe things get noticed. Maybe the EDR tool picks it up. Ultimately, all three of these boil down to the fact that you wanna restore the business and not the malware as you're starting to do the restoration process. So if you're restoring that malware and you're not truly removing all three of these layers of an attack, there's a good chance they're gonna get back in your environment. You may not even know it immediately. You may have you may get back up and be up for a few more weeks, and then all of a sudden, things start encrypting again. What do you do at that point? So the question really becomes, where do we run this investigation? I talked about, you know, the the timeline slide at the beginning was really about what if we only had production to work in. And, of course, this webinar is about cleanroom and IRE. 
So where do we wanna run the investigation? Really, it depends. It is the true engineering answer. It depends. And this is a business decision as much as it is a technology decision. So I'm gonna do something fun. I'm gonna move to a whiteboard, and we're going to, actually depict what these different environments look like so you can compare and contrast. So got our whiteboard. Let's start with the most basic part of all of our environments, which is going to be our production environment. Now our production environment, of course, has a bunch of production workloads. These can be virtual machines. They can be databases. You know, whatever applications are important to you and your business. We'll also have some sort of backup set up within this environment. Ideally, this backup environment is going to be air gapped. It's gonna be immutable. I mean, if all else fails, you wanna have that backup available to you so that these four workloads that are most important to your business all survive the attack. Even if everything else goes away, backups need to be there. Now when when we talk about those those stages that we go through, you know, we start by detecting. We start by contain. We then move into the investigation phase. We move into eradicate. And finally, we want to recover. Hopefully, you can read that okay. When you only have a production environment, guess what? Every one of these stages happens in that environment happens in that production environment. And some of those, those assumptions we talked about can get a little dicey in that environment, to be honest. So so it can be a challenging environment to have as only as your only environment to work with. More importantly, it leads to that linear approach to things that I mentioned at the beginning where each one of these phases can't start until the previous one ends. 
You can't start investigating until you've really contained it, because the investigation and the containment are both, like I said, alert signals that may be sent. So this is where we get that linear perspective on things: they're all depending on this one environment. Now from there, then we can start talking about, well, what does a clean room give us? So if I depict a clean room as really a separate environment, and it's gonna be another one of those environments, we wanna make sure it is truly disconnected from the rest of the environment. This needs to be isolated. I need to be able to bring things up in this environment that will never get out. And one of the few things you probably need to allow to transition back and forth are your backups, to make sure that you've got the forensics information in that environment that you can play with. These are going to be backups that you are going to want to restore and see if there happens to be something untoward in them. Allow them to do some things, potentially. You're looking at the logs that are in those systems. You're looking at how do they spread? What's the next thing they're looking for? What are the pathways that they take through the environment? So that you can then start to understand, okay, well, these may be the points in time we don't ever want to restore. We wanna keep those aside. How are you maybe tracking those? And that leads to a really interesting situation where, you know, you may do this restore. And this process of trying to figure out what the corrupted or, you know, infected points in time are may require you to do a restore and then scan it and realize, oh, wait. There is malware on there. And then you'll have to delete that. And then you'll have to start over again.
So this failure mode requires, an over and over loop until you find that clean point in time, slowly going back in time. That's super time consuming. But having a clean room to do that, it gives you additional capability that you may not have when you're in the in the production environment. So as we look at our our timeline and how that affects the or how clean room affects that timeline, you know, we're still detecting in production. We're still containing in production. That's never going to change. But now we can do some investigation in in that clean room. It gives us that separate environment, which is super useful if, you know, for some reason, law enforcement has police tape around the production environment. You can't even access it. That's a huge detriment to your ability to, move through the environment and and be able to do these stages. And when we get to the eradication, you know, it's still dependent on us investigating. We need to know where it's been and what it's been doing so that we can truly get rid of it. But we can do a little bit of skipping around from contain to eradicate. Essentially, what we're doing is we're introducing some parallel capabilities. We can start to investigate whilst being contained in production because we don't have that single environment. We can isolate it and be able to do the things that may alert the threat actor as to what's going on by doing it in this isolated environment that doesn't call out to their command and control servers. Now, of course, we need to wait for recovery to end because recovery is still going to happen in production. We don't have any place to do it. We still have to eradicate it in the production environment as well. So we still end up with a fairly linear process here, but it is one that allows us a little bit more parallelism, a little bit more flexibility to speed up that investigation time frame. 
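As a rough illustration of the restore, scan, delete loop described above, here is a minimal Python sketch. Nothing in it comes from the webinar or from any vendor API: `Snapshot`, `find_clean_snapshot`, and the `is_infected` callback are hypothetical names, and `is_infected` simply stands in for "restore this point in time into the clean room, scan it, and delete it if malware turns up." The point is only that each failed attempt costs a full restore cycle, which is why the loop is so time consuming.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for one backup point in time."""
    taken_at: datetime
    label: str


def find_clean_snapshot(
    snapshots: Iterable[Snapshot],
    is_infected: Callable[[Snapshot], bool],
) -> Tuple[Optional[Snapshot], int]:
    """Walk backwards in time until a snapshot scans clean.

    `is_infected` is an assumption: it represents one full
    restore-and-scan cycle in an isolated environment.
    Returns the first clean snapshot found (or None) and the
    number of restore attempts that were needed.
    """
    attempts = 0
    # Newest first, then slowly go back in time.
    for snap in sorted(snapshots, key=lambda s: s.taken_at, reverse=True):
        attempts += 1
        if not is_infected(snap):
            return snap, attempts
        # Infected: in practice the restored copy is deleted here
        # and the next older point in time is tried.
    return None, attempts


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    daily = [Snapshot(base + timedelta(days=d), f"day-{d}") for d in range(1, 6)]
    # Illustrative assumption: the attacker's tools first appear on day 3.
    first_bad = base + timedelta(days=3)
    clean, tries = find_clean_snapshot(daily, lambda s: s.taken_at >= first_bad)
    print(clean.label if clean else "no clean snapshot", "after", tries, "restores")
```

A scan result is only as good as the scanner behind it, so a snapshot that comes back clean here may still predate nothing more than the visible malware; the dwell-time question raised earlier still applies.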
But you still have IT kinda sitting on their hands waiting for this to play out. So let's move on and talk about how an IRE affects this whole process. So again, an isolated recovery environment is gonna be similar. This is gonna be a highly isolated environment where we don't trust anything inside of it. And we definitely don't want anything inside of it to get back to the production environment without a super controlled process. Again, we're also going to want our backups in here. So a little bit of a hole that allows replication of our backups or points in time into this environment so that we can, again, be able to restore things out of there. Now what really differentiates an IRE from a clean room is that we end up with multiple environments in here. So we may have something very similar to a clean room in here. Let's call this our forensics just to differentiate it here, but it's effectively the same thing. Like, this is gonna be super isolated even within the IRE because we want to be able to put known bad things inside of it and let them do their thing. So we'll still have this kind of recovery capability to discover where we've potentially got malware, so that we can start to identify what should and shouldn't be restored into production, and do this, you know, restore, fail, delete loop over and over again. So that still happens at that point, but we've got an environment to do that. And even better, we could potentially create multiple forensics environments that allow us to do things in parallel. So the investigation the security team's doing can happen in one forensics environment, and the forensics that the IT team needs to do in order to find the clean point in time can happen in a separate one. And now you're starting to see where this can help to speed up the process a little bit, hopefully.
We also tend to see staging environments, where we can do the restores of when we find known good clean points in time. We can restore them into the staging environment and start to build the environment back up, get get things moving faster. Again, potentially happen in parallel with the other investigation pieces. And then finally, have a production restore environment where as we put things together and we know they're good and clean and that they are working together properly, we can start to migrate things from our staging environment to our production environment. These environments, of course, are are also kept isolated, maybe not quite as thoroughly. The production environment, ultimately, we're gonna wanna, you know, get our our end users into that environment so that they can access that and start using these systems. But that that wall around that production environment needs to be permeable, but very well controlled. We don't want just anything moving in and out of it. We wanna be very cautious and very thoughtful about what we're letting in. So it is definitely a deny everything until we feel confident these things are ready for users to access them. So as we look back at our stages and how this affects our different stages of the recovery, we can detect in production, we contain in production. That's normal. Investigation now becomes, again, that that linchpin becomes very interesting where this can be happening in our investigation environments, where we've got multiple of them. They can move faster. The eradication, of course, still happens in production, but now our recovery is actually happening within the IRE. And what that does is that really starts to decouple these stages. You know, detect is still a dependency for containment. Now investigation, of course, is no longer going to be coupled to the containment. We can start that immediately. Now we're probably not gonna eradicate until we start to contain. And sorry for the messy diagram. 
There's a lot going on here, which is why I'm using these different colors. The containment still kinda is a prerequisite for eradication. We gotta make sure we've got it well well contained before we can truly eradicate it. But now recovery is no longer dependent on eradication. And, honestly, it's not even totally dependent on investigation. You know, there's a loose bit there. But since we have that separate environment with which we can put users into using it in a production mode, that unlocks a lot. That's where we really start to decouple this whole thing and allow things to happen much much quicker in that regard. So this is a Rubrik webinar, and I did want to really just kinda focus in on the high level in helping you understand the basics that, really, any backup vendor can can help you with. I wanna just take a couple minutes to show how Rubrik can really change the game for you in this regard. Rubrik, one of the cool things that we do is as we ingest backups, we are going to do what we call threat monitoring and sensitive data detection. And what that's gonna do is it's gonna start to help us to identify which points in time have known malware or known, malware tools. So the tools the attackers are using aren't necessarily the malware themselves, but are the tools that they potentially use, and be able to identify that at the point. So our detection actually can happen with Rubrik along with your EDR tools. Again, things will evade the EDR tools, but we have the ability to potentially see around or or look at the two points in time and say, hey. There's there's something that showed up here that EDR missed. And allowing that to to hopefully happen a little bit faster in some cases, you know, it's it's a secondary way of looking at things, which is super important, which gets into that restore, fail, delete piece of things. Because now you've got a secondary viewpoint of things so that you're you're not depending on the same thing. 
We don't have to restore these backups in order to check and see. We don't have to scan them in order to see. We have that knowledge built in already. And even if it's a zero day, if you can get that hash value, Turbo Threat Hunting allows you to search through all of these points in time that we've taken and just do a metadata search to find them within seconds. The other thing that Rubrik can really help do is to eliminate some of these forensics environments. So you may not need that restore, fail, delete type of an environment to do the scanning. So now you've saved yourself some space in that IRE. You've saved yourself a ton of time because you don't have to go through that process. Looking for that sensitive data also allows us to very quickly give you an identification as to where, you know, once you know which systems they touched, it's a simple report to run out of Rubrik to identify where sensitive data may be, where that credit card number is, where the Social Security number may be. Those things can be processed very, very quickly because to us, it's metadata. We don't have to touch the backups at that point to scan those things. And that information gets brought down to you into that IRE. So you don't even need to go back to that previous point in time or that production environment in order to find out what's going on there. So really speeding up the investigation and reducing the need for different environments in the IRE is a big part of it. But having all of these environments allows you to now put that parallelism in there to decouple these different pieces so that you don't have heavy dependencies on them. So that's kinda what the differences between a clean room and an IRE are.
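To make the decoupling concrete, the three whiteboard timelines can be sketched as dependency graphs in Python. This is an illustrative model, not something shown in the webinar: the dependency sets are one reading of the whiteboard discussion (in particular, recovery in the IRE is modeled as depending only on investigation, which the talk calls a loose dependency), and the hour figures in `HOURS` are invented purely to show the arithmetic.

```python
from typing import Dict, List

# Hypothetical phase durations in hours (made up for illustration).
HOURS: Dict[str, int] = {
    "detect": 1, "contain": 8, "investigate": 72, "eradicate": 24, "recover": 12,
}

# Production only: every phase waits for the previous one.
PRODUCTION_ONLY: Dict[str, List[str]] = {
    "detect": [],
    "contain": ["detect"],
    "investigate": ["contain"],
    "eradicate": ["investigate"],
    "recover": ["eradicate"],
}

# Clean room: investigation runs alongside containment,
# but recovery still happens in production after eradication.
CLEAN_ROOM: Dict[str, List[str]] = {
    "detect": [],
    "contain": ["detect"],
    "investigate": ["detect"],
    "eradicate": ["contain", "investigate"],
    "recover": ["eradicate"],
}

# IRE: recovery happens inside the IRE, so it no longer waits
# for production to be eradicated (assumed: it waits only for
# investigation to identify clean points in time).
IRE: Dict[str, List[str]] = {
    "detect": [],
    "contain": ["detect"],
    "investigate": ["detect"],
    "eradicate": ["contain", "investigate"],
    "recover": ["investigate"],
}


def finish_times(deps: Dict[str, List[str]], hours: Dict[str, int]) -> Dict[str, int]:
    """Earliest finish time of each phase given its prerequisites."""
    done: Dict[str, int] = {}

    def finish(phase: str) -> int:
        if phase not in done:
            # A phase starts once all of its prerequisites have finished.
            start = max((finish(p) for p in deps[phase]), default=0)
            done[phase] = start + hours[phase]
        return done[phase]

    for phase in deps:
        finish(phase)
    return done


if __name__ == "__main__":
    for name, deps in [("production only", PRODUCTION_ONLY),
                       ("clean room", CLEAN_ROOM),
                       ("IRE", IRE)]:
        print(f"{name}: users back after {finish_times(deps, HOURS)['recover']} hours")
```

With these invented numbers the model gives 117 hours for production only, 109 for a clean room, and 85 for an IRE. The absolute values mean nothing; what matters is that the gain comes from removing dependencies rather than from making any single phase faster.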
Obviously, there's a lot more complexity with an IRE, but just like with disaster recovery, knowing that you've got an environment to go to, knowing you've got the processes and procedures already in place, and a place that, honestly, you can test them on a regular basis without affecting production means that when it comes time to a cyber attack, when that call comes at 2:00 in the morning on a Friday night before a long weekend, you know you're gonna be ready to go. You know what to do. Everybody knows their role and is able to jump in and start doing things as quickly as possible to get that environment up and running. And that's really the most important thing: to reduce the number of assumptions and know exactly what you have to work with and what it's going to take to get you there. So with that, I'd like to thank you for joining us. Hopefully, you've been asking questions in the Q&A panel, and we've been answering them on the back end. We're not gonna take live Q&A, but we're gonna stay on with you as long as you all have questions in that Q&A panel. We've got a team of people helping us down there. So, hopefully, it'll go pretty quick for you all. I really do appreciate your time. And, yeah, please ask whatever questions you have. We'll help you out for as long as you've got them. But thank you for your time, and we hope you have a great day.