Video: Powering Progress: Securing Global Leadership with AI-Driven Observability in India | Duration: 2048s | Summary: Powering Progress: Securing Global Leadership with AI-Driven Observability in India | Chapters: Welcome and Introduction (19.775s), Session Overview (114.51s), AI Adoption in India (192.16s), Observability and Automation Priorities (267.935s), AI in IT Ops (450.72s), Autonomous Operational Resilience (580.93s), Advanced Observability Use Cases (930.375s), AI in IT Management (1068.165s), AI Design Framework (1181.0149s), AI-Driven IT Optimization (1343.75s), AI Agent Integration (1624.6799s), Summarizing AI Observability Importance (1707.8301s), Conclusion and Survey (1837.085s)
Transcript for "Powering Progress: Securing Global Leadership with AI-Driven Observability in India": Hello, everyone, and welcome. Thank you for joining us for Powering Progress: Securing Global Leadership with AI-Driven Observability in India. In today's session, we will explore how organizations can strengthen operational resilience, improve visibility across increasingly complex IT environments, and unlock greater business value through AI-driven observability. Today's session will be presented by Neeraj Kumar, Director, Solutions Engineering, APJ GSI at SolarWinds. In this session, he will share insights on how AI-driven observability can empower organizations in India to manage complexity, accelerate innovation, and strengthen their position in the global digital economy.
Thanks, everybody, for joining today's session. My name is Neeraj Kumar, and I lead the Solutions Engineering team for SolarWinds, focusing on the Asia Pacific and Japan business as well as the global SI business. Today's session is all about how SolarWinds partnered with IDC to understand AI impact, adoption, and use cases, and how AI is being leveraged across enterprises and government organizations in the APAC region. In today's session, we'll focus on India, and on the results from discussions with Indian organizations about how they are trying to secure global leadership with AI-driven observability, especially in the IT operations area. Keep your questions coming in the chat window, and we'll definitely try to answer them at the end of the session. We'll first start with India's AI adoption: how we are moving from pilot to scale, what the use cases are where organizations are investing, and the planned investments across 2026. Then we'll move towards the IT operations area and how Indian enterprises are scaling AIOps, GenAI, and agentic AI, especially in IT operations.
We'll continue with the AI implications for the IT ops landscape: with AI becoming more and more prevalent, what the new IT ops landscape is going to look like, and what challenges and barriers need to be taken care of to adopt these use cases. And finally, we'll talk about how SolarWinds is solving some of those challenges, as well as what the use cases are and how we are adopting AI and building use cases across our solutions as well. So let's dive deeper into these areas. Now, if I start with India's AI adoption. Okay. From the survey and the discussions we had with various organizations and customers together with IDC, the number one priority was IT ops, and major investments are planned in this area, especially in 2026. Okay? So that's 74%, followed by AI-augmented merchandising as well as augmented agile systems and all. Correct? Now, if you focus on the IT ops area, it was pretty clear that 62% of organizations are already adopting AIOps, mainly to modernize their infrastructure, applications, operations, and things like that. Correct? The astonishing part was that 40% of organizations have already started deploying agentic AI, which is all about how we can automate some of those things and eventually move to autonomous operations, or autonomous operational resilience, which we'll talk about in the later slides. Okay? Now let's focus on the AIOps area, and on how Indian enterprises are scaling AIOps and also leveraging things like GenAI and agentic AI to drive operational resilience and business growth. If we look at the survey results for the capabilities required across organizations, 71% of the organizations are in favor of real-time observability. So that's the top priority, or the top capability requirement, that's coming up.
That's followed by automated remediation, at 68% of the organizations, and then root cause analysis, at 59%. Noise reduction is interesting. This is where you have IT ops teams flooded with alerts rather than actionable insights. Correct? So noise reduction, obviously, is one of the key priorities. And then this is one of my favorites, which is predictive prevention. When we get an issue, we get insights, we get the RCA, we solve that issue. But can we prevent those issues from happening? Correct? What can we do so that the issue doesn't happen? So that's predictive prevention, and that's also among the top required capabilities, from 45% of the organizations. Correct? Another interesting aspect was that 61% actually prefer a platform-based approach. Correct? So no more siloed solutions and siloed observability across compute, storage, security, and network. 61% of organizations actually prefer a platform-based approach. That's followed by things like machine-learning-driven capacity optimization. This is coming up a lot more, especially where you want to predict when you need to buy more hardware, when you are going to reach your saturation limits, and all of those things. Correct? And then security is also one of the major strategic priorities. We are adopting AI, we are adopting all these use cases, so how can we adopt them in a more secure, governed way, where we have overall governance around what data we are exposing and what data we are sending? So that's also coming up as one of the key priorities. Correct? And then the other two are pretty obvious. One is resilient digital backbones. Whatever interfaces we are exposing for our customers, partners, and stakeholders, those digital backbones should obviously be resilient.
They should be foolproof against attacks, and obviously they should perform well for our customers, so that the end-user experience on them is great. Okay? And then the FinOps part is also becoming more and more important. We keep on throwing infrastructure at problems, we keep on leveraging AI, we keep on consuming tokens, and all of those things. So how can we optimize all of that and use it to the best of our ability? How can we optimize the infrastructure, the cloud cost, and all of those things? So that's also one of the top priorities. Now, if we drill down into the IT ops landscape, which is going to be more and more powered by AI, and how we can leverage more AI across this IT ops area: to move to that level, there are obviously challenges. Okay. One of the major challenges is hybrid complexity: 58% of the organizations mentioned that they have complex hybrid environments. Then, again, data silos: 39% mentioned tool sprawl, where we are using separate solutions for everything to get data. So the problem is not the data. The problem is the silos and the complexity across these hybrid environments. The other important part was the skill shortage: struggling for resources in the cloud, DevOps, and database areas, which obviously slows down all these AI adoptions and modernizations, and the benefits that come out of them. So effectively, if you look at it, the real barrier is not the technology. It's actually the capacity. Just 13% of the organizations attributed it to not having the right technology and tools in place. Correct? So we have the tools, we have the technologies, we have the data. We are collecting so much data across all of those layers.
But, obviously, the siloed nature of that data, the skill set of the team, and the overall process are making it a lot more difficult to respond to issues quickly and effectively, and to move to what we want, which is autonomous operational resilience. Okay. So 51% said that processes are making it difficult to adopt. 36% said it is the teams, where they don't have enough people, there is a skill gap, and all of those things. Only 13% said tools, which was pretty interesting data to understand. So what are we doing from a SolarWinds perspective? We talked about tools. We talked about processes. And we talked about the technology areas and the data collection across each of those layers. Correct? So effectively, it's people, process, and technology. And how SolarWinds solves this problem is that we have everything integrated as part of the overall platform. If you look at the left-hand side, it's all about how we can collect observability data across all of these technology layers: be it network; be it infrastructure, which can be physical servers, virtual servers, cloud, containers, Kubernetes, and all of these environments; be it databases; be it applications, both COTS and custom-built; be it end-user experience, meaning how users are using our digital channels and applications; and also the security aspects around all of these things. So we collect telemetry data across all of these and bring it into a single platform. And we have that platform available either as a self-hosted version or as a software-as-a-service version to consume. Then comes the people aspect. How can I get insights and intelligence from the data I'm collecting at the monitoring layer, get those alerts and incidents to the right teams at the right time, and bring people together to collaborate?
So that's the middle layer, which is the incident response platform, where people can collaborate, get notified, and get status on what's happening, where we are on the incidents, and all of those things. And, obviously, they can leverage the knowledge bases to solve all of those aspects. Correct? And then the final part is the workflows. If something needs to change, if there are dependencies, if somebody raises an incident or a service request, how can we streamline that overall workflow? That's the IT service management part. So effectively, how can we bring the whole of people, process, and technology together on a common layer, on a common platform, and start driving outcomes like tools consolidation, removing those data silos, and moving towards operational resilience and service excellence, all at a reduced cost and in a completely automated fashion? Now, moving a little bit deeper into this. Okay. The goal is to move to autonomous operational resilience powered by AI-driven observability, GenAI, agentic AI, and all of those things. Correct? But it's effectively about moving from a purely human-driven approach to an approach where we have intelligent systems doing a lot of the heavy lifting for us, with a human in the middle, augmenting that with human expertise to anticipate and fix problems in advance, before they become issues, rather than reacting to things. Correct? So that's the whole idea of autonomous operational resilience. In this new world, the idea is how the technology can work for the people, how we can make technology work for us rather than the other way around. Okay. Let's get this technology to work. So let the technology, or the AI systems, do a lot of this heavy lifting.
And we bring our expertise to driving these things and validating these things, and eventually use our human expertise to anticipate problems and fix them before they actually become a problem. So, looking a little bit deeper into this, this has been our vision from a SolarWinds perspective, and it is built on three pillars, which come from our 25 years of expertise in this area, supporting network monitoring, infrastructure observability, cloud, applications, end-user experience, security, and all of those aspects. First, it's all about unification. How can we remove those data silos? How can we get all the data onto a single platform across the entire IT landscape? Then moving towards proactivity, which is prediction. How can we analyze historical data as well as real-time data to understand problems, get to insights and root causes quickly, and predict things before they actually become an issue? And then moving towards automation: how can we automate those day-to-day, mundane tasks and free up people to focus on high-value tasks that actually drive business revenue and growth for the organization? Correct? So that's our vision. Now, from a SolarWinds perspective, we are incorporating a lot of use cases in our solutions in each of the pillars we discussed earlier, be it monitoring and observability, be it incident response, be it IT service management. Okay? Let's talk about a couple of use cases that help drive towards that autonomous operational resilience goal. If you talk about monitoring and observability, rather than setting up thresholds for everything, can the system learn and automatically detect an anomaly, or create an anomaly-based alert, for us to look at issues proactively, before the issue actually happens? Correct?
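As a rough illustration of the difference between a fixed threshold and a learned baseline, here is a minimal sketch in Python. It is not how SolarWinds implements anomaly-based alerts; the function name `anomaly_alerts`, the rolling window of 12 samples, and the z-score cutoff of 3.0 are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def anomaly_alerts(samples, window=12, z_threshold=3.0):
    """Flag samples that deviate strongly from a rolling baseline.

    Hypothetical helper: the baseline is the mean and standard deviation
    of the previous `window` samples, so no fixed threshold is configured.
    """
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            sigma = 1e-9  # avoid dividing by zero on a perfectly flat baseline
        z = (samples[i] - mu) / sigma
        if abs(z) > z_threshold:
            alerts.append((i, samples[i]))
    return alerts


# CPU utilization in percent; the value 78 is far outside the learned range,
# yet it would never trip a typical static threshold such as 90%.
cpu = [41, 43, 40, 42, 44, 41, 43, 42, 40, 43, 42, 41, 42, 43, 78, 42]
print(anomaly_alerts(cpu))
```

The point of the sketch is only that the alert condition is derived from the metric's own recent behavior rather than from a number someone typed into a threshold field.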
So that's one of the use cases. Correct? Root cause analysis is about whether we can understand what started the chain of events that suddenly led to some degradation or errors in the application or in the infrastructure. Correct? Can we understand that root cause by correlating metrics across multiple data sources and multiple domains: infrastructure, networks, applications, all of those things? So that becomes an important use case. Again, capacity planning, which we talked about earlier, is all about whether we can predict when I'm going to reach my saturation point or a threshold, so that I can plan my purchase decisions. Let's not keep on throwing infra resources at the problem, which obviously increases the cost. Let's run it in an optimal fashion, which is all about resource optimization and forecasting. Correct? Similarly, if we focus on database observability, can I look at all the queries that are getting executed and understand whether I have proper indexes set, whether I need to set up new indexes, whether I need to optimize a query so that it works properly, or whether I need to optimize my tables, and all of those things? Correct? If you talk about incident response, when an incident happens and there are multiple people involved in looking at and solving that incident, can I have transparency into how that incident is going? Can I have a knowledge base of past incidents? Can I compare with past incidents and how they were solved? Then there is the noise reduction use case, which we talked about earlier. When we are getting events and alerts from various sources, can we group those alerts in a meaningful way and figure out what the probable root causes could be, and all of those things? Correct?
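To make the alert-grouping idea concrete, here is a minimal sketch in Python, assuming each alert carries a timestamp in seconds and the name of the affected service. The field names (`ts`, `service`, `source`), the five-minute window, and the helper `group_alerts` are hypothetical and only illustrate the general technique; a real product would use richer correlation than this.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts for the same service that fire close together in time.

    Hypothetical rule: an alert joins the open group for its service if it
    fires within `window_seconds` of that group's first alert; otherwise it
    starts a new group.
    """
    groups = []
    open_groups = {}  # service name -> most recent group for that service
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        group = open_groups.get(alert["service"])
        if group is not None and alert["ts"] - group["first_ts"] <= window_seconds:
            group["alerts"].append(alert)
        else:
            group = {"service": alert["service"], "first_ts": alert["ts"], "alerts": [alert]}
            open_groups[alert["service"]] = group
            groups.append(group)
    return groups


raw_alerts = [
    {"ts": 0, "service": "checkout", "source": "storage"},
    {"ts": 40, "service": "checkout", "source": "database"},
    {"ts": 95, "service": "checkout", "source": "application"},
    {"ts": 120, "service": "search", "source": "network"},
    {"ts": 900, "service": "checkout", "source": "application"},
]

for g in group_alerts(raw_alerts):
    # The earliest alert in a group is a reasonable first place to look.
    print(g["service"], len(g["alerts"]), "first seen from", g["alerts"][0]["source"])
```

Here five raw alerts collapse into three groups, so the on-call engineer sees three items, each with a first-seen source to start from, instead of five separate notifications.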
If there are some transient alerts, where something comes up and the next minute it goes away, can I remove those and not page the on-call engineer just to waste their time? Correct. So there are a lot of those use cases available. Similarly, in IT service management, when I'm logging a service request, can AI automatically suggest that this should be the category, this should be the team it is assigned to, and this is the required information for that request, so that it goes directly to the right team rather than going first to an L1 team that then routes it to the respective teams, and we don't waste time in fulfilling that request? Correct? Similarly, when I'm resolving an incident, can I get the recommended runbook to use? What should the steps be to solve that incident? Correct? If I have some steps in a Word doc or in SharePoint, can I use them to create an automated runbook on demand inside my system, which can then be leveraged again by the agents? Correct? So these are some of the use cases where we are leveraging AI in our solutions, and these are all available today. But obviously, we are building a lot more. One of them is the AI agent, which is the agentic capability that is already available in tech preview in our observability self-hosted as well as SaaS solutions, where you can ask questions about what is happening in my network, what is happening in my applications, and what the root cause is. You can also ask how to configure things. And, obviously, we are building a lot more capabilities. It's all about how we can leverage AI to do these heavy-duty tasks and have the human take the actions and decisions. Correct? And we are building a lot more use cases, like correlation, analyzing log patterns, automatically surfacing alerts and incidents, and all of those things.
So we talked about the SolarWinds AI use cases, what we have available today as well as what we are building. But before I show you a couple of use cases, let's also talk about the SolarWinds AI by Design framework. We heard that one of the barriers to adoption is governance, security, guidelines, and all of those things. So this is very important for us. We always follow the SolarWinds AI by Design framework whenever we build any use case that leverages AI inside our solutions. Okay? Now, there are basically four pillars. One is privacy and security. We always make it optional what data you want to share with the AI engines or the LLMs to get to the insights. There is always an opt-in, opt-out facility for each of those use cases. So this takes care of the privacy and security aspects. You can also decide which data you want to send, and maybe mask out some data. A lot of those controls are available inside the solutions. The second pillar is accountability and fairness. We want to use algorithms that are unbiased. But again, algorithms are algorithms, and we always want a human in the loop. So AI provides the insights and recommendations, and there is a human in the loop to take the decisions: whether he or she wants to incorporate them, ask follow-up questions, or change some of that. Correct? That way it's always accountable who executed those decisions. The third pillar is transparency and trust. It's not just about AI giving an insight or recommended actions. It also explains what data it analyzed, what decisions it took, and what steps it followed. So it's all about explainable AI.
And then finally, simplicity and accessibility. It's not like, okay, build another chatbot or another AI engine, connect all the solutions to that, and let it answer questions. Correct? Why don't we build the use cases inside the existing solutions, so that it becomes easy for anybody to use? We are already used to using the solutions, so it becomes accessible to everybody using the existing solutions, and simple to use. Right? So that's simplicity and accessibility. So let's move on to some of the use cases. Okay. The first use case I wanted to talk about is capacity planning, because this was one that came up in the survey. It is all about resources: bandwidth, storage, memory, and CPU across infra, network, storage, and all of those things. Correct? It's not about just adding more and more resources. Can we understand, based on usage patterns, when I'm going to reach my saturation point? Can I do some modeling? Can I project future resource needs and exhaustion dates? This obviously helps prevent out-of-memory and disk-full scenarios, and also helps in planning new hardware procurement. The second use case is root cause assist. Nothing happens in isolation in an IT environment. There will typically be a sequence. Let's say storage IOPS suddenly increased because of some disk failure. Correct? That led to infra resource usage increasing, spiking CPU and memory. Correct? That potentially led to database queries taking a long time to respond. That potentially led to some applications experiencing slowness, and all of those things. Correct? So there is typically a sequence or series of events, because things are so interconnected these days. Nothing happens in isolation. Correct?
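The sequence described above (disk failure, IOPS spike, CPU and memory pressure, slow queries, slow application) can be sketched as a walk over a dependency map. This is a minimal Python illustration under stated assumptions, not the product's algorithm: the component names, the `depends_on` map, the anomaly start times, and the function `find_origin` are all hypothetical.

```python
def find_origin(symptom, depends_on, anomaly_start):
    """Trace a symptom back through anomalous dependencies.

    Hypothetical rule: from the symptomatic component, repeatedly step to a
    dependency that is also anomalous and whose anomaly began no later than
    the current component's. Returns the chain, ending at the earliest
    upstream anomaly found.
    """
    chain = [symptom]
    visited = {symptom}
    current = symptom
    while True:
        candidates = [
            dep
            for dep in depends_on.get(current, [])
            if dep in anomaly_start
            and dep not in visited
            and anomaly_start[dep] <= anomaly_start[current]
        ]
        if not candidates:
            return chain
        # Prefer the dependency whose anomaly started first.
        current = min(candidates, key=lambda d: anomaly_start[d])
        visited.add(current)
        chain.append(current)


# Which component relies on which (illustrative topology only).
depends_on = {
    "web-app": ["orders-db", "auth-service"],
    "orders-db": ["db-host"],
    "db-host": ["san-volume"],
}

# Seconds at which each component's metrics first looked anomalous.
anomaly_start = {"san-volume": 100, "db-host": 130, "orders-db": 160, "web-app": 200}

print(find_origin("web-app", depends_on, anomaly_start))
```

Starting from the slow application, the walk ends at the storage volume, which matches the order of events in the spoken example. Note that `auth-service` is skipped because it shows no anomaly at all.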
So root cause assist is all about understanding all those metric and event anomalies across interconnected systems and figuring out what started it. Correct? What's the underlying cause? Basically, what was the patient zero? What it does is reduce the war rooms and the blame games across teams about where the problem was, and help me quickly get to the root cause, improving my mean time to diagnose and eventually my mean time to remediate. Now, I talked about resource optimization. Correct? Typically, and especially from a FinOps perspective, the scenario has always been: let's keep on adding resources. Let's over-allocate or over-provision some things, or in some cases under-provision, which impacts my performance. Either way, I'm at a loss: if I over-provision, I'm getting hit on cost; if I under-provision, I'm getting hit on performance and end-user experience. Correct? So I should have ideal, optimal performance. Now, it's humanly impossible for me to keep analyzing all of these things all the time. So why not leverage AI to keep analyzing all of that and recommend the optimal configuration, be it for Kubernetes, for network, or for infrastructure? Correct? In this example, I have allocated CPU and memory limits as well as requests. AI can analyze the usage patterns and recommend the optimal configuration. All it requires is for the human in the loop to take this configuration and go ahead and execute it, if it's to their liking. Correct? So it helps optimize resources, and it helps optimize mean time to remediate in case slowness is experienced. And this doesn't require high-value skills or experts in Kubernetes, databases, or infrastructure to solve all those problems.
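A simple way to picture the right-sizing recommendation is a percentile-based rule over observed usage. The Python sketch below is an assumption-laden illustration, not the method used in the product: the 95th percentile, the 20% headroom, the verdict rules, and the function names `percentile` and `recommend_cpu` are all chosen for the example. It only produces a recommendation; applying it is left to the human in the loop.

```python
def percentile(values, pct):
    """Nearest-rank percentile using integer arithmetic (pct is a whole number)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def recommend_cpu(usage_millicores, current_request, headroom_pct=20):
    """Suggest a CPU request and limit from observed usage.

    Hypothetical rule: request = p95 usage plus headroom,
    limit = peak usage plus headroom, both rounded up.
    """
    p95 = percentile(usage_millicores, 95)
    peak = max(usage_millicores)
    request = (p95 * (100 + headroom_pct) + 99) // 100
    limit = (peak * (100 + headroom_pct) + 99) // 100
    if current_request * 2 > request * 3:  # more than 1.5x what is needed
        verdict = "over-provisioned"
    elif current_request < p95:
        verdict = "under-provisioned"
    else:
        verdict = "ok"
    return {"request_m": request, "limit_m": limit, "verdict": verdict}


# Observed CPU usage for one container, in millicores (illustrative data).
usage = [120, 135, 110, 150, 140, 160, 130, 125, 145, 155,
         138, 142, 128, 133, 148, 152, 137, 144, 129, 310]

print(recommend_cpu(usage, current_request=1000))
```

With a current request of 1000 millicores, the sketch reports the container as over-provisioned and suggests a request near 192m with a limit near 372m, which is the cost side of the trade-off. A current request of 100m would instead be reported as under-provisioned, the performance side.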
So we can have operations teams or SREs take care of all of these things. Another interesting example is AI query assist. We write a query and we forget about it. Now, can I have AI keep analyzing that query? Which indexes is it using? Which types of requests are hitting this table? Can I optimize my query? Can I look at which sources it is using? Can I add additional indexes? AI can help in these aspects too: it can keep analyzing the execution plans and the usage, and help me rewrite those queries, add those indexes, and do other tuning, again always with a human in the loop, where the human takes the final decision on these steps. The other example is, if I get an incident, can I leverage past learnings to understand what was done to solve similar issues? Correct? What was done in the past? Which teams were involved? What was the potential issue last time? That way we can solve all of those problems. Correct? Now, finally, moving to the use case around the AI agent, which we have already released in our self-hosted solution in tech preview. And we are going to release it this month itself for our SaaS solution as well. You get a conversational interface where you can ask questions related to what is on the screen, so it's all contextual. You can ask questions related to your nodes, your applications, your devices. You can even say, okay, this is what I want to do, how should I proceed? And it will suggest which alert to configure, how to go about it, and all of those things. Correct? So it becomes pretty easy: think of it as actually talking to your monitoring system and getting work done, rather than configuring a lot of these things and doing a lot of this heavy-duty work yourself.
So, to summarize why AI-driven observability is important and actually matters now, to drive autonomous operational resilience and business growth. Think of it this way: we have so much scale. We have so much data getting collected across the hybrid environments. It's humanly impossible to do all of this manually, and the traditional methods can't handle it either. Correct? Observability, at one point in time, was more about: let's collect a lot more data, let's keep on collecting data across all of the silos and all of the domains. Correct? But now we are moving to how we can get insights from that data, and how I can take action on that data to prevent things from happening and move from the reactive phase to a proactive and preventive phase. So that's the proactive operations piece. And then one of the things we also heard from the survey results: no more disparate, siloed tools. Why not have a cohesive observability layer across the hybrid environments? And then the final part is keeping security and compliance in mind. How can it also do compliance monitoring across all the data domains, and maintain a security posture so that, if I'm leveraging AI, I can be secure and compliant in that aspect too? So our AI by design and security by design principles help in those areas. This obviously supports all of these value drivers: 43% of respondents said that they value AI-powered RCA, 61% preferred a platform-based approach, and 48% were focused on security and governance, and on the ability to drive security and compliance across the overall data domains. With that, that's the end of the presentation. We will take up the questions in the chat now.
If you haven't put your questions in the chat yet, please keep adding them, and we will take up all of them. Thanks, everybody.
Thank you, everyone, for joining us. Before you leave, we would greatly appreciate it if you could complete the survey form appearing on your screen. Your feedback is very important to us and helps us improve future sessions. We would also like to invite you to our largest virtual event, SolarWinds Day: Your Path to Autonomous Operational Resilience, taking place on April 16. In this event, you will learn how AI precision and intelligent, context-aware technology can cut through complexity, surface what matters most, and help teams move from reactive troubleshooting to resilient operations. Thank you again for your time and participation. We look forward to seeing you at SolarWinds Day.