Video: Test What Matters. Skip What Doesn't. Ship. | Duration: 1744s | Summary: Test What Matters. Skip What Doesn't. Ship. | Chapters: Introduction and Overview (115.575s), Testing Bottleneck Challenges (167.06s), Intelligent Testing Strategy (376.91s), Smart Testing Strategies (515.75s), SmartTest Analytics Explained (874.395s), Smart Test Selection (1275.025s), Cloud Compatibility (1357.925s), Addressing Remaining Questions (1408.085s), AI in Testing (1477.93s), SmartTest and Environments (1548.935s), SmartTest Feature Availability (1649.5s), Conclusion and Thanks (1699.225s)
Transcript for "Test What Matters. Skip What Doesn't. Ship.":
Alright. Hi, everyone. Thank you for joining. I'm Shoja Abnakvi, a senior solutions architect at CloudBees. Today's webinar is about testing what matters and skipping what doesn't. Let me see if I can advance the slide. There you go. Across a lot of teams right now, we're seeing the same patterns. For example, PR validation is slowing things down while CI costs keep going up. So the question becomes: how do we move faster without losing confidence in our pipelines? That's what we'll unpack today. We'll leave some time at the end for Q&A, so feel free to drop questions in the chat as we go. So let's dive in. Let me start with a situation that's very familiar. It's the beginning of the quarter. Engineering leadership walks into a platform or shared services meeting and says, we need to increase velocity. And in the same breath, they say, we also need to reduce cost. So now you have to go faster and cheaper at the same time. The business is pushing for features to be delivered at higher velocity. AI-assisted development is increasing the number of pull requests, but leadership is looking at the CI bill and asking the hard questions. Why are compute costs climbing? Why are pipelines running for so long? And why are we paying to execute the same tests over and over again for the same PR? That pressure lands on platform engineering, or maybe even QA leadership. On paper, everything looks modern. Builds are automated, infrastructure scales dynamically, and all the tests are parallelized. But then someone asks the question: what is actually consuming the most time and the most compute? More often than not, it's testing. Regression suites have grown. End-to-end coverage has expanded. And with AI, more test cases are being written and automated. But validation time has quietly become the longest stage in the pipeline.
Developers are moving faster, but every change triggers a full test suite. So validation takes hours, failures are discovered late, and pipelines are rerun. Naturally, cloud cost rises, and that's where the team is stuck. How do you increase velocity and decrease spend without increasing risk? That's the tension most teams are dealing with right now. So what happens when testing doesn't evolve? At first, it's subtle. Test suites grow. Coverage expands. But every test adds runtime, it adds compute, and it adds noise. Now every PR runs everything. A small change triggers a massive validation load. It doesn't matter if it's a change to a config file; if I update the README, it's going to run that full validation. So developers have to wait, and feedback is slow. Then flaky tests appear: tests that pass and fail with the same code. Developers rerun the pipeline, it eventually passes, and they move on. But the trust is gone, because they have to rerun pipelines just to get past that flaky test. So what does that mean? Reruns double compute time, and spend increases. Now let's connect the dots. Velocity drops, cost rises, and confidence falls. The problem here isn't runtime. It's trust. And the root cause? We're running everything. Many teams have tried to fix this problem. What did they do? They parallelized their test cases. They split the suites. They added more runners. They increased concurrency. All of this helps, but they're still running everything. Then they scaled their infrastructure. They added bigger machines, more memory, more CPU, more executors. Things go faster, but guess what? It's a lot more expensive. And then they have to add observability.
They need it to manage those VMs and their CPU utilization, the additional VMs they're creating, or the workloads they're spinning up on Kubernetes. So they add dashboards and metrics. But none of that changes execution time. The underlying issue is still the assumption that every test case has to run. So what needs to change? It's not testing volume; it's testing strategy. We've been asking how to run things faster, but the better question is: which tests actually matter for the change being made to the code? Not every change touches everything. Not every test carries the same amount of risk. What if validation could be aligned to risk and calibrated based on confidence? That's the shift from blanket testing to intelligent testing. So how do we do this safely? First, we observe. CloudBees SmartTest watches the builds, the commits, and the results. Then it learns: it maps changes to failures. Then it calibrates: it determines which tests matter most, and its confidence increases over time. And then we execute: run the most relevant test cases first. So it's not less testing; it's a smarter way of testing. Observe, learn, calibrate, execute. Now let's make this real and look at what this looks like inside an actual pipeline. There are really two ways teams apply this. The first is testing earlier: bringing the signal forward in the pipeline so developers don't have to wait. The second is running only the tests that are most relevant to a given change. Most teams don't end up choosing one or the other. They do both, depending on where they are in their pipeline. Let's look at what that means in practice.
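To make the observe, learn, calibrate, execute loop concrete, here is a deliberately simple sketch of the "learn" step: it counts how often each test failed in past builds that changed a given file, then ranks tests for a new change by those counts. This is not SmartTest's model (in the Q&A the speaker says it is ML-based and doesn't describe it further); the `Build` record, the `learn` and `rank` functions, and the file and test names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Build:
    """One observed build: the files it changed and the tests that failed."""
    changed_files: set[str]
    failed_tests: set[str]


def learn(history: list[Build]) -> dict[str, dict[str, int]]:
    """Count how often each test failed in builds that changed each file."""
    cofail: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for build in history:
        for path in build.changed_files:
            for test in build.failed_tests:
                cofail[path][test] += 1
    return cofail


def rank(cofail: dict[str, dict[str, int]], changed_files: set[str], all_tests: list[str]) -> list[str]:
    """Order tests so those historically linked to the changed files run first."""
    def score(test: str) -> int:
        return sum(cofail.get(path, {}).get(test, 0) for path in changed_files)

    # sorted() stays stable with reverse=True, so ties keep their original order.
    return sorted(all_tests, key=score, reverse=True)


history = [
    Build({"billing/tax.py"}, {"test_tax_rounding"}),
    Build({"billing/tax.py", "README.md"}, {"test_tax_rounding", "test_invoice_total"}),
    Build({"auth/login.py"}, {"test_login_lockout"}),
]
model = learn(history)
suite = ["test_login_lockout", "test_invoice_total", "test_tax_rounding"]
print(rank(model, {"billing/tax.py"}, suite))
# ['test_tax_rounding', 'test_invoice_total', 'test_login_lockout']
```

A change the model has never seen (every score is zero) falls back to the original order, which is one reason a real system needs an observation period before its selections become useful.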
Before this approach, and this should sound very familiar, teams run heavy integration or smoke tests less frequently, so failures are discovered late. That creates a long feedback loop and significant wait time across teams. Let me give you an example. I'm a developer. I come into work at 8 AM. Well, let's make it a little more realistic: I come in at nine, maybe ten, and I start writing code. As I write, I commit my changes and run some unit tests I've written, and then I have to wait for the nightly build to find out whether my commits will be merged to the feature branch. In some cases, I don't know until I come back twenty-four hours later and realize my change was rejected. With SmartTest, we shift that feedback even further left. Instead of waiting for full validation, developers get results sooner, because SmartTest dynamically creates subsets of the full test suite and runs only the tests that exercise the code changes that were made. So instead of waiting until the next day, developers know right away whether their changes are good. Here's another important shift. Traditionally, tests run in some fixed or random order, which means that even if failures exist, you might not see them until much later in the run. With SmartTest, we prioritize by failure probability: the tests most likely to fail run first. So instead of waiting for the entire test suite, you fail faster, which dramatically reduces the time to first failure and shortens the feedback loop to the developer. Now, it's important to understand how this happens. It's not instant. The system starts in observation mode. When you first integrate with CloudBees SmartTest, you add one or two lines to your current build process.
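The effect of ordering by failure probability on time to first failure can be shown with a small sketch. It assumes each test already has an estimated failure probability for the change (how SmartTest computes that isn't described in the talk); `CandidateTest`, `prioritize`, and `time_to_first_failure` are hypothetical names, and the durations and probabilities are made-up example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateTest:
    name: str
    duration_s: float  # expected runtime in seconds
    fail_prob: float   # estimated probability that this test fails for the change


def prioritize(tests: list[CandidateTest]) -> list[CandidateTest]:
    """Run the tests most likely to fail first."""
    return sorted(tests, key=lambda t: t.fail_prob, reverse=True)


def time_to_first_failure(order: list[CandidateTest], failing: set[str]) -> Optional[float]:
    """Seconds until the first actually-failing test finishes, or None if none fail."""
    elapsed = 0.0
    for test in order:
        elapsed += test.duration_s
        if test.name in failing:
            return elapsed
    return None


suite = [
    CandidateTest("test_login", 120.0, 0.02),
    CandidateTest("test_checkout", 300.0, 0.05),
    CandidateTest("test_tax_rounding", 60.0, 0.60),
    CandidateTest("test_reports", 600.0, 0.01),
]
failing = {"test_tax_rounding"}
print(time_to_first_failure(suite, failing))              # 480.0 in the fixed order
print(time_to_first_failure(prioritize(suite), failing))  # 60.0 when prioritized
```

The suite is the same size either way; only the order changes, so the saving comes entirely from surfacing the likely failure first.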
Those one or two lines let SmartTest watch the builds, analyze the results, and over time build what we call the confidence curve. As confidence increases, the system becomes more precise. It begins to select the right tests, and this is how the reduction is earned. Let me show you what I mean. What you're seeing here right now is really important. This chart shows the relationship between how long your tests run and how confident you are of catching all the failures. On the horizontal axis, you have time: how long your tests run. On the vertical axis, you have confidence: the percentage of failures you're likely to catch. If you run the entire test suite for, say, a hundred minutes, you'll catch essentially 100% of the failures, and that's what teams do right now. But this curve shows something really important: you don't need to run all hundred minutes to get most of the signal. In this example, you might run the test suite for thirty or forty minutes and still catch 90 to 95% of the failures. This is where the idea of risk tolerance comes in, because now you have a choice. Do you want 100% confidence at a hundred minutes, or 95% confidence at forty or forty-five minutes? That's a very important question, because saying "we can cut your test suite by half" is very different from saying "we can cut it by half and still give you 95% confidence that you'll catch the failures." What SmartTest does is make this trade-off visible and measurable. Instead of guessing, you can calibrate test execution based on confidence. More importantly, this isn't static. The curve is built from your actual builds, your actual failures, and your actual code base.
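One way to build a curve like this is to replay history: run the tests in priority order and, after each test, record what fraction of past failing builds would already have been caught (at least one of their failing tests had finished). The sketch below assumes that definition of "confidence"; the talk doesn't give SmartTest's exact definition. The function names, test names, and durations (in minutes) are hypothetical.

```python
from typing import Optional


def confidence_curve(
    order: list[str],
    durations: dict[str, float],
    failing_builds: list[set[str]],
) -> list[tuple[float, float]]:
    """(elapsed minutes, fraction of past failing builds caught) after each test."""
    points = []
    elapsed = 0.0
    finished: set[str] = set()
    for test in order:
        elapsed += durations[test]
        finished.add(test)
        # A past failing build counts as caught once any of its failing tests has run.
        caught = sum(1 for failed in failing_builds if failed & finished)
        points.append((elapsed, caught / len(failing_builds)))
    return points


def minimal_budget(points: list[tuple[float, float]], target: float) -> Optional[float]:
    """Shortest elapsed time at which confidence reaches the target, or None."""
    for elapsed, confidence in points:
        if confidence >= target:
            return elapsed
    return None


order = ["test_tax", "test_auth", "test_reports", "test_e2e"]
durations = {"test_tax": 10, "test_auth": 20, "test_reports": 30, "test_e2e": 40}
failing_builds = [{"test_tax"}, {"test_auth"}, {"test_tax", "test_reports"}, {"test_auth"}, {"test_e2e"}]
curve = confidence_curve(order, durations, failing_builds)
print(curve)  # [(10.0, 0.4), (30.0, 0.8), (60.0, 0.8), (100.0, 1.0)]
print(minimal_budget(curve, 0.80))  # 30.0
print(minimal_budget(curve, 0.95))  # 100.0
```

With this toy history, 80% confidence is reached at 30 of 100 minutes, while 95% still needs the full run. Choosing the target is the risk-tolerance decision described in the talk.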
So as the system learns more, this curve becomes more accurate. This isn't about skipping tests. It's about understanding how much testing is actually required to maintain confidence. What you're looking at here is a summary of test executions over time. Each row represents a month and how much time was spent running tests. The most important columns are total duration without SmartTest and total duration with SmartTest: the difference between running everything and running only the things that matter. For example, if we look at this month, without SmartTest there were over two hundred hours of execution time. Once SmartTest was enabled, it was under a hundred hours, so you're eliminating half the time. That translates directly into less compute usage, lower resource consumption, and lower CI and cloud spend, and, just as importantly, faster feedback for developers. As you move down the table, you'll see this isn't a one-time improvement. It's consistent over time, and as the system learns more, the benefit compounds. This is based on actual pipeline activity: your builds, your tests, and your failure patterns. It's not a benchmark or an estimate; it's real data from your environment. So instead of guessing how much time or cost you might save, you can measure it directly. At the same time, SmartTest doesn't just give you visibility into what's happening across your test ecosystem; it helps you improve it. We also build analytics on test duration, failure ratios, flaky tests, and long-running tests. So with SmartTest you're not just optimizing execution; you're improving the health of your overall test suite.
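The kinds of test-suite health analytics mentioned here (failure ratios, flaky tests, long-running tests) can be computed from raw run records. The sketch below assumes a hypothetical `Run` record per test execution and uses one simple flakiness rule: a test that both passed and failed on the same commit, i.e. changed result without a code change. It illustrates the idea; it is not SmartTest's analytics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    commit: str
    test: str
    passed: bool
    duration_s: float


def health_report(runs: list[Run]) -> dict[str, dict]:
    """Per-test failure ratio, average duration, and a flakiness flag."""
    by_test: dict[str, list[Run]] = defaultdict(list)
    for run in runs:
        by_test[run.test].append(run)

    report = {}
    for test, test_runs in by_test.items():
        # Outcomes seen per commit; both True and False on one commit means
        # the result changed without a code change, so the test looks flaky.
        outcomes: dict[str, set[bool]] = defaultdict(set)
        for run in test_runs:
            outcomes[run.commit].add(run.passed)
        report[test] = {
            "failure_ratio": sum(not r.passed for r in test_runs) / len(test_runs),
            "avg_duration_s": sum(r.duration_s for r in test_runs) / len(test_runs),
            "flaky": any(seen == {True, False} for seen in outcomes.values()),
        }
    return report


runs = [
    Run("abc123", "test_login", False, 12.0),
    Run("abc123", "test_login", True, 12.0),  # rerun on the same commit passed
    Run("abc123", "test_tax_rounding", False, 3.0),
    Run("def456", "test_tax_rounding", True, 3.0),
    Run("def456", "test_reports", True, 600.0),
]
report = health_report(runs)
slowest = max(report, key=lambda t: report[t]["avg_duration_s"])
print(report["test_login"])  # {'failure_ratio': 0.5, 'avg_duration_s': 12.0, 'flaky': True}
print(slowest)               # test_reports
```

Note that `test_tax_rounding` has the same failure ratio as `test_login` but is not flagged: its failure and its pass happened on different commits, so the code change is a plausible cause.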
Another interesting capability of SmartTest is triage. Instead of looking at dozens of failures individually, the system groups them into underlying issues. For example, instead of 18 failures, you might have only two actual problems to investigate. Then you prioritize the most impactful failures first, for example by identifying a single root cause that's affecting multiple tests. A good example is a network timeout. That may affect twenty, thirty, or forty test cases, and there may be just one cause. If you fix that one cause, guess what? You've just fixed forty test cases. And once you know the root cause, you can associate it with a commit, which makes it easier to remediate the problems you're facing in your test runs. Another very important piece of what SmartTest offers is identifying flaky tests so they can be deprioritized. Instead of chasing noise, teams can focus on the signal. When you put all of this together (prioritized execution, faster failure detection, better visibility, streamlined triage), the pipeline starts to behave very differently. Let's look at what that means for runtime, cost, and trust. When you use CloudBees SmartTest, runtime drops: hours become minutes. Developers get feedback much faster, iteration speeds up, and compute usage drops. So you get fewer reruns, lower spend, and most importantly, trust returns: developers begin to trust the green. Developers trust the signal. Leadership sees speed and cost aligning, and that's the outcome: speed, cost, and confidence.
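The talk doesn't say how SmartTest groups failures. A common, simple approach is to normalize each error message (masking numbers, hex addresses, and other volatile details) into a signature, group tests that share a signature, and list the largest group first. The sketch below assumes that approach; `signature`, `triage`, and the sample failures are hypothetical.

```python
import re
from collections import defaultdict


def signature(message: str) -> str:
    """Mask volatile details so similar errors collapse to one signature."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # memory addresses
    sig = re.sub(r"\d+", "<n>", sig)                    # ports, IPs, timings, counts
    return sig.strip()


def triage(failures: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Group failing tests by error signature, largest group first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for test, message in failures.items():
        groups[signature(message)].append(test)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


failures = {
    "test_checkout": "TimeoutError: connect to 10.0.3.7:5432 timed out after 30s",
    "test_refund": "TimeoutError: connect to 10.0.3.9:5432 timed out after 30s",
    "test_invoice": "TimeoutError: connect to 10.0.3.7:5432 timed out after 31s",
    "test_tax_rounding": "AssertionError: expected 10.05, got 10.04",
}
for sig, tests in triage(failures):
    print(len(tests), sig, tests)
```

Four failures collapse into two issues here: one network timeout shared by three tests, and one genuine assertion failure. Fixing the timeout clears three failures at once, which is the prioritization the talk describes.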
Testing didn't become a bottleneck overnight, but it doesn't have to stay that way. If testing is your constraint, the question isn't whether you need to run all your tests; the question is whether you're running the right tests. If that all makes sense, I'd love to take any questions and hear your comments. Thank you. Alright, we have one question. How is the confidence determined on the curve? Great question. We analyze the code changes happening in the Git repository. From there, we look at the test suite executions that are actually running, and we analyze the types of failures that are happening. So if you're running a test suite with, say, a 10, 20, or 30% failure rate, we're able to determine: did this failure happen because of the code change, or was it something else? And as we continue to learn, that confidence continues to increase. Next question: does this require changing our test suites? No, it doesn't. We work with the majority of test suites. In fact, I haven't come across any scenario where we haven't been able to work with a test suite. Whether you're using Playwright, Selenium, Cucumber, or something else, we'll be able to integrate with it. Great question, thank you. Alright, a couple more questions. Let's see, I missed one. How does the SmartTest mechanism pick the minimal but necessary set of tests to run? Great question. Once the confidence curve has been built, you have the option of saying: based on the confidence curve, I want to run the tests for thirty minutes, or ten minutes.
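Picking a subset by time budget, as in "run the tests for thirty minutes, or ten minutes", can be sketched as taking the highest-priority tests in order until the next one would exceed the budget, and leaving the rest for the nightly full run. This assumes the tests are already in priority order and that a strict prefix is wanted (an alternative would keep scanning for shorter tests that still fit). `split_by_budget` and the example data (durations in minutes) are hypothetical.

```python
def split_by_budget(
    order: list[str], durations: dict[str, float], budget: float
) -> tuple[list[str], list[str]]:
    """Split a priority-ordered suite into (run now, defer to the nightly full run)."""
    used = 0.0
    for i, test in enumerate(order):
        # Stop at the first test that would overflow the budget (strict prefix).
        if used + durations[test] > budget:
            return order[:i], order[i:]
        used += durations[test]
    return list(order), []


order = ["test_tax", "test_auth", "test_reports", "test_e2e"]
durations = {"test_tax": 10, "test_auth": 20, "test_reports": 30, "test_e2e": 40}
print(split_by_budget(order, durations, budget=30))
# (['test_tax', 'test_auth'], ['test_reports', 'test_e2e'])
```

Keeping a strict prefix means the subset is exactly the front of the priority list, so its expected confidence can be read straight off a curve built in the same order.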
What we usually see customers doing early in the SDLC, at the dev stage, is accepting around 80% confidence on their test runs. They still run their nightly builds in the evening, which run the full test suite. But remember, the idea is to shorten the feedback loop so developers know sooner whether their pull request will be able to merge, rather than finding out when they come back to work the next day. Next question: in terms of infrastructure, can CloudBees SmartTest be used on any cloud? The answer is yes. I'm assuming you're running on AWS and asking whether you can run on Azure or elsewhere. Yes, that's not a problem, because SmartTest is a SaaS offering. You'll have a command-line utility that you pull down onto your build agents, and a line in your existing CI process calls that CLI and sends the required metadata to SmartTest. Let's see, any other questions? I don't think I've answered all of them. Let me make sure. Yeah, I think I got them all. Oh, a new one came up: how does it help when tests pass in one topology but not in another? That's a good question. Could you clarify in the chat what you mean by topology, maybe with an example? Then I can answer more precisely. Are you talking about network topology? Let me see if I can open the chat window real quick. Yeah, can you clarify that question a bit? I'll answer another one in the meantime. Does this require code coverage data? No, it doesn't. It's nice to have, but it's not required. Next question.
Is SmartTest a plugin that needs to be installed on CloudBees CI? For CloudBees CI, yes, there is a plugin, an integration you can leverage. But it's not limited to CloudBees CI. If you're using GitHub Actions, GitLab CI, or some other CI solution, integrations exist there as well. Does predictive test selection use AI? Yes, it does. We use ML to build the model, and when we select test cases for the code changes using that model, that's where the AI piece comes in. If we have Tosca or other suites, will this work with them? Yes, it will. Now, back to the topology question. I'm assuming you mean different environments: a test works in one environment but not in another. There may be a way to use that information in what we call predictive test selection, or PTS, in SmartTest. You can flag or tag certain environments, so we can determine from the tagging: if I'm deploying into the QA environment, I know these tests work, or these tests are flaky. That could be a good way to determine for yourself why certain test cases work in one topology and not the other. Next question: is predictive test selection using an LLM? Not exactly, but we do have an LLM, or an agent, that will be leveraging SmartTest as part of our unified offering. Good question. We have two minutes left. Any more questions? These are great questions, by the way. Okay, I see another one: we use open source Jenkins; can we take advantage of SmartTest? Of course, yes. Whether you have open source Jenkins or enterprise Jenkins, you can take advantage of SmartTest. Alright, I don't see any new questions coming in. Let's wait another moment. Okay, I think we're good. Thank you, everyone.
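The environment-tagging idea can be illustrated by computing each test's pass rate per environment tag and flagging tests that reliably pass in one environment but often fail in another. The thresholds, the `(test, environment, passed)` record format, and the function names are assumptions for illustration, not SmartTest features.

```python
from collections import defaultdict


def pass_rates(results: list[tuple[str, str, bool]]) -> dict[str, dict[str, float]]:
    """Pass rate per test per environment tag, from (test, environment, passed) records."""
    counts: dict = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passes, runs]
    for test, env, passed in results:
        tally = counts[test][env]
        tally[0] += int(passed)
        tally[1] += 1
    return {
        test: {env: passes / runs for env, (passes, runs) in envs.items()}
        for test, envs in counts.items()
    }


def environment_sensitive(
    rates: dict[str, dict[str, float]], high: float = 0.95, low: float = 0.5
) -> list[str]:
    """Tests that reliably pass in one environment but often fail in another."""
    return sorted(
        test
        for test, envs in rates.items()
        if max(envs.values()) >= high and min(envs.values()) <= low
    )


results = (
    [("test_upload", "qa", True)] * 4
    + [("test_upload", "staging", False)] * 3
    + [("test_upload", "staging", True)]
    + [("test_login", "qa", True), ("test_login", "staging", True)]
)
rates = pass_rates(results)
print(rates["test_upload"])          # {'qa': 1.0, 'staging': 0.25}
print(environment_sensitive(rates))  # ['test_upload']
```

A flagged test points at an environment difference (configuration, network, data) rather than at the code change, which is the "why does this work in one topology and not the other" question from the Q&A.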
Thank you for joining. I believe the recording will be available. And if you're interested in seeing a more customized demo of SmartTest, do reach out to CloudBees. Thank you.