Video: Introducing Microsoft 365 Autonomous Business Recovery: Shift from Data Recovery to Business Recovery | Duration: 2052s | Summary: Introducing Microsoft 365 Autonomous Business Recovery: Shift from Data Recovery to Business Recovery | Chapters: Welcome and Introductions (30.11s), M365 Data Loss (74.185005s), Cyber Resiliency Solution (198.68s), Autonomous Business Recovery (255.06s), Autonomous Business Recovery (344.47s), Recovery Challenges (418.56s), Autonomous Business Recovery (472.085s), System Architecture (547.195s), Future Vision Roadmap (675.73004s), Rubrik Agent Cloud (1047.93s), Live Product Demo (1393.89s), Closing Remarks (1938.0801s)
Transcript for "Introducing Microsoft 365 Autonomous Business Recovery: Shift from Data Recovery to Business Recovery":
Hello and welcome to our follow-up session from the Rubrik Cyber Resiliency Summit, where we talk all things resiliency as it relates to your data environments. This is a follow-up to our M365 session, where we're gonna talk more about some of the challenges we see in dealing with data loss in M365, and what we at Rubrik have done to really change the game in helping you recover data as quickly as possible. My name is Eddie Waslowski, field CTO for Rubrik's SaaS business. I'll be joined by Akram, who is responsible for our M365 product portfolio as a product manager, and later by Dev, who is the GM of Rubrik's AI business.
As we covered in the Rubrik Resiliency Summit, there are a lot of challenges in managing data, or potentially losing data, in M365. The data loss and data management challenges that exist inside M365 really shouldn't be news to anyone. We know from Microsoft that identity attacks are happening in significant numbers on a daily basis. There is the likelihood that users inadvertently delete data through administrative changes or policy changes in the environment. And we now have to deal with the next evolved threat: how threat actors can weaponize AI to further damage organizations or hold them hostage through data loss. So we're constantly faced with this barrage of attacks and potential data loss events, and we have to think strategically about how we can bring our business back up as quickly as possible. And to be honest, the challenge we're seeing with recovery isn't really 'can we recover the data?' There's a multitude of ways in which we can recover.
But when we think about instantiating the minimally viable business, where we wanna make sure we can get the most critical users and the most critical data up and running as quickly as possible, it's a very difficult question to answer, largely because we don't know. Organizations haven't gone through the process of classifying all of their data. They may deploy Purview for content going forward, or classify from an application as users are authoring or touching data, but there really isn't an addressable solution right now for going backwards and dealing with all of the existing data. So if I were to ask, 'What is the most important data?', largely what we see is that organizations don't know. And this unknown factor leaves us scrambling if we have some kind of data loss event inside M365.
Rubrik has thousands of customers and millions of users across M365, and we have built what we call our end-to-end cyber resiliency solution for M365. Again, it's not just the recovery of the data, but making sure we're doing it in the fastest way possible, based on what we know about where the most critical data is, while also being able to look at all the critical identity components of M365, and to help not only with data access governance across the application but also with governing the AI agents that are being deployed inside the environment. So I'm excited to pass it over to Akram to talk more about autonomous business recovery, the solution we've developed, because it really is game-changing and will help organizations, in their most critical hour, recover the most critical data for their most important users. Over to you, Akram.
Thanks, Eddie. Hey, everyone. I am Mohammad Akram, the product manager for our Microsoft 365 data protection portfolio. Let me walk you through our new autonomous business recovery.
Before I talk about the solution and what we're trying to build, let's look at the problem we're trying to solve. In the event of an M365 incident, the biggest problem we've heard from our customers is that they don't know where to even start recovering their assets. This is partly because the M365 admin doesn't have any context on what data is business-critical to their users. They don't have context on what data is important to the CTO or the CIO, or where that data resides. It's a complex mesh of information across Exchange, OneDrive, SharePoint, and Teams, and nobody actually knows where the data resides or where it's being used. And lastly, Microsoft limits us on the API, so there's only so much data that we can recover in a given day. So it becomes crucial for them to be precise, because if they waste those API calls, the important data is just sitting there waiting to be recovered, and it could take hours, days, or even weeks to get it back, which means prolonged downtime, which leads to business and revenue loss. So overall, what we're trying to do here is give them a way to overcome this blind spot, to have some context on the data they're trying to recover, and to be more precise in their recoveries.
That's where we are introducing our M365 Autonomous Business Recovery, which is the first business-aware recovery for Microsoft 365. Through Autonomous Business Recovery, we're not just recovering your technical snapshots; we are restoring operations based on business criticality. We are introducing a single-click business continuity solution that recovers the most critical data, recently accessed by your most crucial users, across Exchange, OneDrive, SharePoint, and Teams.
What this means is that we restore the exact assets for your most critical users with surgical precision. The recovery is so targeted that there are no wasted API calls and no wasted time, we're not over-restoring any data, and you get your business back up and running in the shortest amount of time possible.
Let's look at what some of our customers have been saying. We recently heard from a healthcare customer that finding the data they need, which today means chasing their leaders, would save them an unimaginable amount of time. Their thinking was: how would I go about this? I would have to talk to my leaders and understand what documents they want. Just automating that process would save them an unimaginable amount of time. Second, we talked with a large IT customer, and they said, yes, all of this M365 data is really complicated; you're dealing with data that you don't understand. So anything Rubrik can do to help them understand this data and be more predictive and more autonomous in their recovery plans would reduce the effort and be very useful. And lastly, there's a retail customer we work with who actually had an attack. The feedback we received was, and this was not on M365 with Rubrik but at a previous company, that while they had solid backups, what they didn't have was a clue how to recover from those backups. During the attack they were making much of it up on the hoof, and it completely slowed them down. They couldn't do anything for multiple days, even though they had really solid backups. What they needed was a single-click restoration, which is what we are trying to provide now.
So let's look at the recovery times our solution provides. I want us to be grounded in the reality of what a recovery looks like for most organizations today.
A major incident or data loss event happens, and the natural instinct is, 'Let me perform a full mass recovery.' But you're looking at a full Microsoft 365 subscription: every user, every file, every archive. For a decent-sized organization, say the healthcare customer I mentioned, a 'restore all' button could take weeks to complete. And at this moment, I want you to ask yourself: can your business survive two to three weeks without access to its most critical communications and its most critical files? For most customers, the answer is a resounding no. That's where our autonomous business recovery changes the game. Instead of a brute-force restore of everything all at once, ABR, or autonomous business recovery, is surgical: it identifies the most important and relevant data for your key critical users. And by focusing only on what is required, we can shrink the recovery window from weeks to less than three hours in this particular example.
Now that we understand the impact of what we're trying to build, let's spend some time on how this system works. It all starts with proactive signal intelligence. We aren't just looking at the static backups that Rubrik continuously ingests. We are constantly ingesting signals from backup metadata, index files, and Microsoft 365 audit logs, and continuously analyzing these signals across Exchange, OneDrive, SharePoint, and Teams so that we can identify which data is hot: what data is recently active, what data was recently accessed by the most business-critical users, and what data is business-critical. So instead of treating every file the same, we are essentially building a user data intelligence map of what your business needs to survive right now. Once we have those signals, they are fed into the Rubrik Autonomous Engine.
This is the brain of the operation. The engine evaluates that incoming data against your own user-defined parameters; you define the parameters in this case. It intelligently ranks the recovery priorities and generates an optimized recovery plan. This removes the manual burden of having to decide, in a high-pressure situation, which folders or which mailboxes should be restored first. The engine does the heavy lifting to make sure your most vital assets are at the front of the line.
Last, we move on to the execution, which is our accelerated recovery. Through our integration of Rubrik Security Cloud and Azure Kubernetes Service (AKS), we orchestrate a targeted restore operation, and we can scale this recovery process and maximize speed within your available API resources, so that we are using your APIs as efficiently as possible. So this is how we use proactive signal intelligence, the Rubrik Autonomous Engine, and accelerated recovery to build the solution, so that the most critical data accessed by the most crucial users is available for recovery.
While we're really proud of what is available today, I also wanted to give you a glimpse into our North Star vision. This is where we see the future of our recovery heading. The first stage starts with the admin creating a minimum viable company, shown as MVC here. This is your baseline: the most critical users in your organization, for whom we want to restore operations. Second, Rubrik creates the recovery plan on the basis of the configuration you've chosen. This is already available today, as I discussed in the previous slides: it looks at your Exchange, OneDrive, SharePoint, and Teams and pre-creates a baseline recovery plan. Now we move to the second stage, which is where it gets interesting.
Instead of clicking through endless nested menus to adjust your recovery plan, our vision allows the customer to modify the recovery plan using natural language to suit their scenario. Think of it like having a conversation. You could simply tell the system, 'Prioritize my exec leadership mailboxes' or 'Ensure the legal SharePoint site is restored within the first hour.' The system understands the context and adjusts the technical parameters automatically.
The next phase is what I call the AI-agent-as-a-consultant phase. In this phase, we introduce the agent. This isn't just a passive tool or a passive participant. The agent will actively recommend modifications based on real-time data or historical trends that it has seen in your environment. In all of these cases, the admin always has the final word: you can accept those intelligent recommendations or reject them. The key is that you always have the power to modify your recovery plan and leverage the agent to optimize your recoveries.
Another key highlight of the plan is that you'll always have an estimated restore time available to you in real time. Every time you take an action, whether you accept a recommendation from the agent or modify the plan yourself through natural language, we update the estimated recovery time, so that at any given point you know how long it will take to get back to business.
Lastly, to close the loop, the final step is 'Save or Restore'. Whether you're executing a live recovery or just creating a plan for a rainy day, this workflow ensures that the finalized output is optimized, validated, and available to be executed right now or later, whenever the time comes.
Now that we've seen the vision and the roadmap, let me show you how this might look in our real product experience.
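As a rough illustration of the ranking and restore-time estimate described in this part of the session, the sketch below orders assets by owner criticality and recency, then estimates restore time from an hourly API budget. This is not Rubrik's engine: the `Asset` fields, the scoring weights, the 30-day "hot" window, and the flat `api_calls_per_hour` budget are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Asset:
    """One recoverable item; every field is a hypothetical signal."""
    name: str
    app: str                  # e.g. "Exchange", "OneDrive"
    owner_is_critical: bool   # owner belongs to the minimum viable company
    last_accessed: datetime
    api_calls: int            # assumed cost to restore this asset


def priority(asset: Asset, now: datetime, hot_window: timedelta) -> float:
    """Toy score: critical owners first, then recently accessed ("hot") data."""
    score = 10.0 if asset.owner_is_critical else 0.0
    age = now - asset.last_accessed
    if age <= hot_window:
        # Linear decay from 1.0 (just accessed) to 0.0 (edge of the window).
        score += 1.0 - age / hot_window
    return score


def build_plan(assets, now, hot_window=timedelta(days=30)):
    """Return assets ordered from highest to lowest restore priority."""
    return sorted(assets, key=lambda a: priority(a, now, hot_window), reverse=True)


def estimated_hours(plan, api_calls_per_hour: int) -> float:
    """Restore time if the only bottleneck is a fixed hourly API budget."""
    return sum(a.api_calls for a in plan) / api_calls_per_hour


if __name__ == "__main__":
    now = datetime(2025, 1, 31)
    assets = [
        Asset("all-hands archive", "Exchange", False, now - timedelta(days=90), 6000),
        Asset("CFO mailbox", "Exchange", True, now - timedelta(days=1), 3000),
        Asset("board deck", "OneDrive", True, now - timedelta(days=20), 1000),
    ]
    plan = build_plan(assets, now)
    print([a.name for a in plan])
    print(estimated_hours(plan, api_calls_per_hour=4000))
    # Removing assets from the plan lowers the estimate.
    trimmed = [a for a in plan if a.owner_is_critical]
    print(estimated_hours(trimmed, api_calls_per_hour=4000))
```

Recomputing `estimated_hours` after every change to the plan is the simplest way to mimic the continuously updated estimate mentioned above; a real system would presumably also account for throttling and parallelism.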
Let's go over some mocks. This is our real RSC (Rubrik Security Cloud) product experience, together with part of our vision for how it would look in the near future. In the center of the screen you have a dynamic relationship map, where you can see all your most critical users, for example Alice Johnson in this case, and all the data that we're trying to recover for them. We split all of this by application, so for Alice we show everything that we're trying to recover for Exchange, OneDrive, SharePoint, and Teams. And within Exchange, it gives you visibility into each of the specific components, like emails, calendar, contacts, and tasks. So we don't just give you a list; we give you an interactive plan that you can explore to understand which assets are coming back.
Another important thing I want you to see is the estimated restore time at the top center. In this particular case, it shows 1.5 hours, so you know how long this configuration of your plan will take to recover.
Next, let's look at the bottom left of the screen. This is the natural language prompt that we discussed in our roadmap, and I want to give you this natural language chat interface to modify your recovery plan. You can use some of our suggested prompts, like 'Do not recover emails without attachments', or type in your own instructions. As you execute an instruction, the system recalculates the plan and the restore time in real time, which gives the admin ultimate control with minimal friction.
Now, let's go over one of these sample prompts. In this example, for Exchange, the admin does not want to recover email that was sent company-wide.
We all know these noisy company-wide emails: they get sent in high volumes, and we don't need to recover them in an incident. So the admin wants to remove all of those emails, and for OneDrive, they only want to recover the files with the sensitivity label Confidential. This organization is pretty strong with its Purview labeling, and they only want to recover certain files for the OneDrive environment. Once this prompt is executed, we see two things. Number one, the volume of assets across Exchange and OneDrive has dropped from what we saw on the previous screen: because the prompt ran, the emails that were sent company-wide were removed, and only, let's say in this example, half of the OneDrive files, the ones with a Confidential label, are part of Alice Johnson's recovery plan. Number two, the estimated time to recover is now 55 minutes, because we reduced the volume of assets.
At the same time, you can also use this prompt-based mechanism to add more assets. You can say, 'Hey, please add some critical SharePoint sites that are not accessed by my critical users but are relevant to me.' Basically, you are in full control to add assets, remove assets, and manage your RTOs in real time from our interface in Rubrik Security Cloud.
I am extremely excited about our new offering and what we're about to build in the future, and I can't wait for all of you to try this out. Next I'll hand it off to Dev, who is going to cover our Rubrik Agent Cloud offering.
Thank you very much, Akram. Now I wanna talk about one of the newest additions to the Rubrik family, which is called the Rubrik Agent Cloud. With the Rubrik Agent Cloud, we've really decided to underwrite one trend, which we call the rise of agentic AI. We say agents are coming, but for a lot of the organizations we work with, they're already here.
And the trick with AI agents is no longer the difficulty of building them, or whether the models are good enough. It's really the concern of how to deploy, operate, and govern these agents at scale. Agents present a lot of opportunity in what can go right, but also a lot of risk that needs to be managed in what can go wrong. In speaking with leaders, one of the key challenges with agents is that their access is often unguarded and unmonitored. We define agents as LLMs with access to tools. So you can think of them as models that can take action on behalf of users or employees inside a company.
Pretty consistently, if we talk to an IT or security leader, the real concern is the lack of a single control plane that can help them answer questions that start as basic as: one, what AI agents are even running, and what kinds of tools and data can they access? Two, how do I enforce my custom AI guardrails in real time? And of course, what do I do when something goes wrong? Our experience is that the lack of a good product and platform solution providing that single control plane for securing and governing agents is one of the things that slows down AI deployment the most on the pathway from pilot to production.
The truth is that building agents can be done today in hours, days, or weeks, but deployment is the piece that often takes months. We see companies build initial proofs of concept quite quickly. But where things often get bogged down is a committee approval process that needs to understand all aspects of how that agent works, then give feedback on what is required and the changes needed to make it production-ready.
That becomes an endless loop of making sure the docs are updated and resubmitted for approval, until you finally reach the long tail of deploying the agent into production. One of the things that makes this particularly challenging with AI is that agents can operate in very diverse ways, with non-deterministic access inside your organization. That means the overall risk profile of what an agent can do is much greater than that of an employee with a similar set of privileges. Agents can operate 100 times faster than a normal employee and, as a function of that, do about 100 times the damage as well.
This is the key problem we've looked to solve with the Rubrik Agent Cloud: providing a unified control layer for agents and the operations they're running. The reason Rubrik has felt a compelling interest in attacking this space is that our historical core competencies in data backup and resilience give us a good understanding of the underlying data and applications an agent might be accessing. We layered on an identity solution, which became one of our fastest-growing products. And with the acquisition of Predibase last year, we brought in a core understanding of models and LLM infrastructure. We think those three key ingredients give us the right basis to play in the space of agent operations, because agents are essentially models, or LLMs, that use identities to operate on data and applications.
To attack that problem, we've released a new product that we call the Rubrik Agent Cloud. The Rubrik Agent Cloud is essentially a layer that sits between your users and agents and the underlying models they might call, and it provides three key value propositions. The first is continuous monitoring and observability.
That is the ability to hook into any place your agents are deployed, automatically discover what's happening, and populate the agent inventory, giving you full visibility into which agents are actually running inside the ecosystem and what kinds of tools and data they can access.
The second thing we provide is active custom governance. One of the key things organizations need is the ability to apply their own custom AI guardrails in real time to what an agent is doing. This is typically one of the blockers to deploying agents at scale. What we've built inside the Rubrik Agent Cloud is a way to do real-time policy enforcement for your own custom agent guardrails, and I'm gonna show you a little of how that works in just a few minutes.
And the last required capability, as organizations roll out agents, is remediation: what do you do when something goes wrong? It used to be that the things requiring data backup and resilience were natural disasters, fire, and flood. That eventually became more about cybersecurity and attacks. But what we see now is that agents are able to do things like delete databases or take other destructive actions. What we've released inside Rubrik Agent Cloud is a feature we call agent rewind, which allows you to undo a destructive action an agent has taken, if the underlying property is one that Rubrik protects, by letting you easily correlate the observability we have into the agent's action with the previous healthy snapshot we have from backup.
These three core capabilities make up the Rubrik Agent Cloud. I just wanna double-click into how that works and show you the product in action. The Rubrik Agent Cloud plugs into the existing ecosystem and the ways that agents get built today. We typically see three different ways that agents get built.
Oftentimes there's a purely custom agent you might be building, which uses direct inference calls to a backend service like OpenAI or Anthropic, often federated through an orchestration layer like LangChain or LangGraph. There we hook in and integrate directly with your AI gateway, or can provide a standalone AI gateway for you, so that you can monitor all the prompts, responses, and tool calls. Second, we hook into some of the managed platforms that help you build agents. A great example is Copilot Studio, where you can build agents that have access to a large number of different properties inside your Microsoft ecosystem. For those organizations, we hook into your Copilot Studio ecosystem and can automatically see the agents that are running and analyze the prompts, responses, and tool calls those agents are making. And finally, there are often agents running on the endpoint. You can think of Claude Code as an example. What we see being effective here is our integration with MDM, or mobile device management, solutions, which gives us both visibility and some hooks into what is actually running on the endpoint.
Once we have the visibility, we can couple that with our proprietary governance engine, which we call Sage. Sage stands for Semantic AI Governance Engine, and it allows you to define your own custom AI guardrails directly in English. We convert those into a small language model judge that can run on every input and every output of a given agent interaction and help you improve your policy enforcement layer over time. In short, Sage translates your policies into a small language model, so you are using AI to secure and govern the agents themselves, and it improves over time.
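The idea of a judge that runs on every input and every output of an agent interaction can be sketched as a thin wrapper around the agent call. This is a minimal illustration, not Sage itself: `Verdict`, `guarded_call`, and the `block` flag are hypothetical names, and the judge is any callable you supply. The stub judge below uses a substring check only so the example runs; the session makes the point that a policy like this cannot really be enforced by string matching, which is why a model plays that role in the product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    violation: bool
    reason: str = ""


# A judge takes (policy_text, message_text) and returns a Verdict.
Judge = Callable[[str, str], Verdict]


@dataclass
class GuardedResult:
    response: Optional[str]   # None when the interaction was blocked
    alerts: list


def guarded_call(agent: Callable[[str], str], judge: Judge, policy: str,
                 prompt: str, block: bool = False) -> GuardedResult:
    """Run the judge on the input and on the output of one agent interaction."""
    alerts = []
    verdict = judge(policy, prompt)
    if verdict.violation:
        alerts.append(("input", verdict.reason))
        if block:
            return GuardedResult(None, alerts)
    response = agent(prompt)
    verdict = judge(policy, response)
    if verdict.violation:
        alerts.append(("output", verdict.reason))
        if block:
            return GuardedResult(None, alerts)
    return GuardedResult(response, alerts)


def stub_judge(policy: str, text: str) -> Verdict:
    # Stand-in only: a real judge would be a model, not a substring check.
    flagged = "you should buy" in text.lower()
    return Verdict(flagged, "looks like a recommendation" if flagged else "")


def toy_agent(prompt: str) -> str:
    # Hypothetical agent that gives advice whenever investing is mentioned.
    if "invest" in prompt.lower():
        return "You should buy tech stocks."
    return "A mutual fund pools money from many investors."


if __name__ == "__main__":
    policy = "Agents should never give financial advice."
    print(guarded_call(toy_agent, stub_judge, policy, "How do I invest?"))
    print(guarded_call(toy_agent, stub_judge, policy, "How do I invest?", block=True))
```

With `block=False` the response is still returned and the violation is only recorded, which corresponds to the monitor mode shown later in the demo; with `block=True` the response is withheld.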
To show you how this works in practice, I'll jump over to our live demo. This is the Rubrik Agent Cloud product and the key dashboard that users see when they first connect. You can see that right out of the box we're able to detect the agents that are running, as well as risks we might see inside the ecosystem. These get automatically discovered through our key integrations with our partners.
The first thing I do in Rubrik Agent Cloud is connect it to the ecosystem of tools I'm using to build my agents. I can do this directly through my connections tab. As I mentioned, there are three different places where agents get built. The first is directly in cloud platforms. Here you can see things like Copilot Studio or Microsoft Foundry, platforms that help organizations build agents, which we've built direct API-level integrations into. All you need to do is connect your credentials, and we can automatically scan and discover that environment. We also connect to your existing AI gateways, or directly through an endpoint, in order to detect the agents you're looking to run at scale.
Once we've connected to your ecosystem, we can populate what we call the agent inventory. The agent inventory gives you a quick view into the tools and agents that are actively running inside your ecosystem today. Here we can see that we've discovered a number of agents. We can see what applications they have access to and when they were discovered, but also the active alerts and violations these agents are raising. Alerts and violations are one of the areas where we see the most value for organizations that are trying to establish their own custom AI guardrails. Alerts and violations really have to do with the policies that I've placed on agents.
And I can configure these directly in my policies tab. Let me show you how that works. If I click into the policies tab, you'll notice that we have some predefined policies that you can enable directly on your system. Things like 'agents should be read-only by default' or detecting any time PII is being shared are the types of policies you can enable in one click inside the Rubrik Agent Cloud. But one of the powerful things we allow you to do is create your own custom policies directly in English. I work with a lot of organizations in financial services that might have a policy that says something like 'agents should never give financial advice.' This is a good example of a policy that I often see in governance documents and that is really hard to enforce in a conventional rules-based system. The reason is that this is not something you can write as a regex or a string match to run on every single input, output, or response.
Instead, we'll use Sage to condense this particular policy into a small language model. The first thing we'll do is take this idea that agents should not give financial advice and put it into an enhanced prompt template that we call our policy configuration. That policy configuration is automatically given a strength score, similar to a password strength score, along with some detail on how to make the policy a little bit better. You can see that we have some goals and instructions here and some key definitions, all of which are editable. For example, what does financial advice actually mean? What does it mean to give a recommendation? And so forth. We can also see examples of where this policy would be expected to trigger a violation or mark something as safe.
For example, if the agent said something along the lines of 'Based on market trends, I recommend you diversify your portfolio with tech stocks,' that might be marked as a violation. But if it's simply defining what a mutual fund is, that's marked as safe. You can also run your own test examples, something like 'definitely buy a given stock,' and see how the agent governance policy would trigger. Here we found that this violates our policy with high confidence, and we even have a reasoning trace. If for any reason our policy model got something wrong, you can easily add this as an example that the model will then learn from.
Once you've created your policy configuration, which gets populated by default using our models, you can click next, and the policy is ready to be enforced across your ecosystem, whether that's an agent you built in Copilot Studio or Microsoft Azure AI Foundry, or one connected directly through something like an AI gateway. These policies can be put in monitor mode, where we'll trigger a violation alert, or, if we're placed in line, in a blocking mode as well.
Once we have any of these policies set up, we can go back to our agent inventory and see where different policies are getting triggered and where I wanna dig deeper. Now in our inventory, I can see our Cloud File Manager agent is triggering some active alerts. So I can click into this agent and get a sense of its agent map: what are the tools and data that it can access? This agent, for example, has access to SharePoint and the ability to delete and search within SharePoint. And it's triggering an alert on one of my policies. I can click into one of the flagged actions to understand why. It looks like this agent deleted some files inside SharePoint.
So it violated one of the policies I had put on the agent, which was that by default it should only be using read-only tools. And it actually went one step further and deleted some files. One of the powerful things I can do is dig deeper by looking at the action timeline. I can see that this agent was searching across OneDrive for secrets and did find some, and it decided to delete the file in which it found them. Now, this agent was being a little overzealous relative to what I'd asked it to do. So what I can do, of course, is switch my read-only policy from monitor mode to block mode. But I can go one step further and rewind this action as well, which means I'm gonna restore this file to the previous healthy snapshot from the Rubrik backup. The way this is done is that we correlate the timestamp at which the agent took the action with the previous snapshot in our backup, giving you an easy path to identify those items and do the rewind.
So this is the Rubrik Agent Cloud. It's an interface that gives you the control we see organizations need across their agents, regardless of where those are built, to accelerate their journey. And with that, I wanna say thank you for watching, and I'll hand it back over to Eddie to continue the conversation.
Great. Thank you, Dev, for walking us through what is really transformative for Rubrik with our Rubrik Agent Cloud. And thank you, Akram, for walking us through the exciting announcements we've made around autonomous business recovery for M365. Together, these solutions are no doubt going to change the landscape for how we think about data resiliency for Microsoft 365. We're looking forward to continuing the conversation with you all.
Our team will definitely be in touch, but if you want to reach out to us beforehand, there should be a link here within the webinar to ask more questions. So thank you again for your time, and we look forward to working with each of you and talking more about how we can help with your M365 deployments.
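The agent rewind step Dev describes in the demo, correlating the timestamp of the agent's action with the previous snapshot in backup, comes down to a lookup in a sorted list of snapshot times. The sketch below is an assumption-laden illustration rather than Rubrik's implementation: `snapshot_before` is a hypothetical helper, and it simply treats the most recent snapshot taken before the action as the healthy one.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional, Sequence


def snapshot_before(snapshots: Sequence[datetime],
                    action_time: datetime) -> Optional[datetime]:
    """Return the latest snapshot strictly before the agent action, if any.

    `snapshots` must be sorted in ascending order.
    """
    # bisect_left gives the index of the first snapshot >= action_time,
    # so the element just before it is the last one taken earlier.
    i = bisect_left(snapshots, action_time)
    return snapshots[i - 1] if i > 0 else None


if __name__ == "__main__":
    snapshots = [datetime(2025, 1, 1), datetime(2025, 1, 2), datetime(2025, 1, 3)]
    deleted_at = datetime(2025, 1, 2, 14, 30)   # time of the agent's delete
    print(snapshot_before(snapshots, deleted_at))
```

Using a strict "before" comparison avoids picking a snapshot taken at the same instant as the destructive action; whether that snapshot is actually healthy would still need to be validated separately.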