Automation and Observability with Mirko Novakovic

This is a podcast episode titled, Automation and Observability with Mirko Novakovic. The summary for this episode is: Jerry is joined by CEO of Instana, Mirko Novakovic. They discuss how the increasing complexity of AI and cloud environments can lead to the creation of an enterprise black box, where no one knows exactly how an application works or how AI models make decisions. Mirko describes how Application Performance Monitoring and Observability play a critical role in tackling this problem for an enterprise by helping them "turn on the lights." He also introduces "Stan" (The Robot) to discuss the future of automation in observability.

Art by Adaoha Onyekwelu.

Transcript

Ethan: From IBM Cloud and Cognitive Software, you are listening to the Art of Automation with host Jerry Cuomo.

Jerry Cuomo: Thank you, Ethan, and welcome to the Art of Automation, a podcast that explores the application of automation in the enterprise. You ask, what is automation? Well, automation uses technology to automate tasks that once exclusively required humans. Well, there's a saying, you can't fix what you can't see. Seems pretty logical, right? Well, that saying also translates to automation pretty well, as in you can't automate what you can't see. Well, you can probably automate blindly, but then what are the odds that you would actually fix the issue at hand? And by fix, in the context of automation, it usually means things like reducing repetitive, mundane work and shifting the new free time that you've just liberated to tasks that matter more to your company. So the question that we're going to try to answer today on this podcast is how can we help businesses figuratively turn the lights on so that they can see what's happening across their business and IT systems so that they can fix problems and automate actions based on data and real insights with the lights on? So I'm excited to have as my guest to lead this discussion, Mirko Novakovic, the CEO of Instana, and now also the leader of our application performance and observability practice within IBM Automation. Mirko is a recognized leader in the area of enterprise application performance management, also known as APM, and observability. At Instana, Mirko led the creation of a solution designed specifically to turn the lights on for those managing modern cloud native applications and microservices. And before Instana, he co-founded Codecentric AG, a startup that focuses on software development and delivery and innovation around technologies in Germany. And before that, he honed his skills around enterprise applications by leading projects, working for Electronic Data Systems and IBM. So we've come full circle, and this is where we start. Welcome, Mirko, to the Art of Automation.

Mirko: Hello, Jerry. I'm happy to be here.

Jerry Cuomo: Oh, great to have you. All right, let's get right into our first question. Mirko, why are you so excited about application performance management and observability and what does it actually deliver to its users?

Mirko: Yeah, Jerry, when I started my career actually at IBM almost 20 years ago, I was a software developer. And as a software developer, you have good tools like debuggers and profilers on your laptop, but whenever you put your software into production and something happens, you basically have a black box. And that's how I got excited about APM because APM basically makes your black box a white box more and more. So it provides you information about your code performing in production, gives you the golden signals, the duration of your application, a request by the user, the errors and context to the errors, the error rate. And also it provides you with user experience data. So for example, how an application performs in the browser of a user or on the mobile phone today, and that's exciting. It gives developers, it gives operators, it gives DevOps teams the right information to understand what's happening in production and if there is an issue, how to fix those issues.
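
The signals Mirko lists here (request duration, errors, error rate) can be illustrated with a small Python sketch. This is not from the episode and not how any particular APM product computes them; the `Request` record and the `golden_signals` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    """One observed request (hypothetical record shape, for illustration only)."""
    duration_ms: float  # how long the request took
    failed: bool        # whether the request ended in an error


def golden_signals(requests: list[Request]) -> dict[str, float]:
    """Summarize call count, mean latency, and error rate for a batch of requests."""
    if not requests:
        return {"calls": 0, "mean_latency_ms": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in requests if r.failed)
    return {
        "calls": len(requests),
        "mean_latency_ms": mean(r.duration_ms for r in requests),
        "error_rate": errors / len(requests),
    }


# Four sample requests, one of which failed.
sample = [
    Request(120.0, False),
    Request(80.0, False),
    Request(400.0, True),
    Request(200.0, False),
]
signals = golden_signals(sample)
print(signals)
```

A real tool would compute these per service and per time window, and would usually report latency percentiles rather than a plain mean.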

Jerry Cuomo: Yeah. So it sounds like the telemetry to get the data from end to end is key to that, to making the black box a white box. Is that true?

Mirko: Absolutely. I mean, only end to end really gives you the visibility into the whole application flow from the end user to basically the database maybe on the mainframe, right?

Jerry Cuomo: Yeah, yeah, true, true. So Mirko, is observability different than monitoring?

Mirko: It is. I mean, you could say it's different. You can also say it's an evolution. I would see it more as an evolution, and basically observability is the art of understanding what's happening inside of an application from the outside. And what has happened over the past few years is that developers got more and more tools to make their code observable, like adding traces to their code or adding metrics or adding logs. And there has been a lot of standardization. OpenTelemetry is the newest project that basically standardizes how you instrument your code with traces, logs, and metrics. And also, in some terms, it standardizes the protocol, so how that data is transferred between the tools and the application. So it's not something totally new, but it basically adds a new layer to it.

Jerry Cuomo: All right, Mirko, that makes sense. So now can you connect back into automation? So what is the connection between observability and automation?

Mirko: I think observability is the data source for automation, and I think there are multiple levels, how you can see it and how I think automation will work and AIOps will work on top of observability data. The first step is basically automating on top of alerts and events. So basically if you figure out that something is not working, you can start a workflow or a runbook that automates based on the alert that you figured out. I think the next level, what you are seeing right now, is the integration into the whole DevOps lifecycle. So tools, observability tools, monitoring tools integrate into the developer lifecycle, into CI/CD. They understand what a release is, they understand deployments and versions, and then derive, for example, if a code change has caused an error or if a release performs better or worse than the release before. And then you can automate, you can automatically roll back, you can trigger a code change, et cetera, out of the box. And I think the third step, which I think observability and AIOps today are not that good at yet, is prediction in production. And what I mean by that is, for example, scaling up and scaling down by predicting traffic that's coming, load that's coming, or fixing stuff before it even happens. Right?
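
The first two levels Mirko describes, starting a runbook from an alert and comparing a release with the one before it, can be sketched roughly as follows. This is a minimal illustration and not from the episode: the alert names, the runbook names, the `tolerance` default, and both functions are assumptions made up for the example.

```python
# Hypothetical mapping from alert type to the runbook that should be started.
RUNBOOKS = {
    "out_of_memory": "restart_service",
    "high_latency": "scale_up",
}


def runbook_for(alert_type: str) -> str:
    """Level one: pick a runbook for an alert, falling back to paging a human."""
    return RUNBOOKS.get(alert_type, "page_on_call")


def error_rate(errors: int, calls: int) -> float:
    """Share of calls that failed; 0.0 when there was no traffic."""
    return errors / calls if calls else 0.0


def should_roll_back(
    previous: tuple[int, int],
    current: tuple[int, int],
    tolerance: float = 0.01,
) -> bool:
    """Level two: flag a release whose error rate is clearly worse than the last one.

    Each argument is an (errors, calls) pair. The tolerance is an assumed
    value that ignores small fluctuations between releases.
    """
    return error_rate(*current) - error_rate(*previous) > tolerance


print(runbook_for("out_of_memory"))
print(should_roll_back(previous=(5, 1000), current=(50, 1000)))
```

The third level, prediction, would replace the fixed comparison with a forecast of traffic or failures, which is beyond what a sketch like this shows.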

Jerry Cuomo: Yeah. So Mirko, that gets into our third question that's about AI. So it sounds like you're on a roll, keep going. Tell us how AI plays a role in observability.

Mirko: Yeah, I think AI plays a very important role in this whole observability and especially automation space. And I think so far what you are seeing in how AI is adopted in the monitoring and observability space is basically baselining and learning what a normal state of behavior is. So you will see baselines, seasonality; tools are getting better at understanding spikes. We have more dynamic thresholds for alerts. We understand error rates. This is what the state of the art is. And what's next in my point of view, and what we are already doing in Instana, is understanding and learning patterns. And I always give an example that every developer knows. For example, if you look into a log file today or into application logs, you have errors. And if you go to a developer, he will tell you that there are good errors and bad errors, which normally shouldn't happen, but what [inaudible]. Yeah, that means basically somebody logs an error, but he knows that this is not a fatal error. Yeah, it's an error, but it's not fatal, right? But if you pollute your whole log file with these good errors, you maybe are not finding the bad errors, because for a machine, an error is an error and you will look at error rates. So you could have 1,000 errors where 998 are good, but two are different. And understanding these differences automatically, understanding patterns, understanding the context of a problem, understanding the complexity around it with the microservices, that's something that we are getting better and better at. And also with what's in AIOps and the technology within Instana, we can understand more complex things like language, the words there in the log file, maybe even the natural language of an issue that was posted into ServiceNow or Jira or wherever. And then you can combine that knowledge and you can understand more complex patterns and scenarios.
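
Mirko's example of 1,000 errors where 998 are "good" and two are different can be made concrete with a small sketch. It uses plain frequency counting over normalized messages, which is a deliberately simple stand-in and not the pattern learning or language understanding he describes. The log messages, the function names, and the `max_share` threshold are all invented for illustration.

```python
import re
from collections import Counter


def pattern_of(message: str) -> str:
    """Collapse numbers so messages that differ only in IDs share one pattern."""
    return re.sub(r"\d+", "<n>", message)


def rare_patterns(messages: list[str], max_share: float = 0.01) -> list[str]:
    """Return patterns that make up at most max_share of all messages."""
    counts = Counter(pattern_of(m) for m in messages)
    total = len(messages)
    return sorted(p for p, c in counts.items() if c / total <= max_share)


# 998 routine errors that a developer would call "good" (expected, not fatal).
logs = [f"ERROR retry {i} for cache lookup timed out" for i in range(998)]
# Two errors that look different from everything else.
logs += [
    "ERROR payment 17 failed: card declined",
    "ERROR database 3 connection refused",
]

unusual = rare_patterns(logs)
print(unusual)
```

The error rate alone treats all 1,000 lines the same; grouping by pattern is what lets the two unusual messages stand out from the routine ones.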

Jerry Cuomo: Yeah, and I think that is key, the labeling of data and to then be able to group that data and to say these things are related. Maybe as you said, this is an error that came from a piece of code. There is an alert coming from maybe a running system in a cloud that's saying you're out of memory. And then to be able to group those things together, using both natural language for labeling, but also using a machine learning model to look at the relationships between those things, and then provide recommendations based on past ticket history of how problems of that nature could have been solved. I think this is a wonderful mixture of things.

Mirko: And just adding to that, you already named it. I think topology and understanding context is essential. Right? You cannot just compare two or three metrics. You need the whole context and an understanding of the topology, like that service A is calling service B. With that topology, you can much better pinpoint a root cause or group things together.
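
One way to picture how topology helps pinpoint a root cause: if service A calls service B and both are alerting, B is the more likely origin. The sketch below applies that single heuristic to a made-up call graph. The service names and the `root_cause_candidates` function are hypothetical, and real root cause analysis uses far more context than this.

```python
def root_cause_candidates(
    calls: dict[str, list[str]],
    unhealthy: set[str],
) -> list[str]:
    """Return unhealthy services that do not call any other unhealthy service.

    `calls` maps each service to the services it calls. A service whose
    dependencies are all healthy cannot blame a dependency, so it is a
    candidate root cause.
    """
    return sorted(
        service
        for service in unhealthy
        if not any(dep in unhealthy for dep in calls.get(service, []))
    )


# Hypothetical topology: frontend -> orders -> payments -> database.
topology = {
    "frontend": ["orders"],
    "orders": ["payments", "inventory"],
    "payments": ["database"],
    "inventory": [],
    "database": [],
}
alerting = {"frontend", "orders", "payments", "database"}

candidates = root_cause_candidates(topology, alerting)
print(candidates)
```

Four services are alerting, but the topology groups them into one incident and points at the service at the end of the chain.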

Jerry Cuomo: So one of the things when we talk about Instana is its specialty in cloud native applications and microservices. What are the complexities that customers see with cloud native applications? Clearly creating microservices is part of the nirvana of application development, but what's the other side of it? What problems do people run into when creating applications and how does Instana help with those situations?

Mirko: Yeah, I think the core issue in production is that these systems are what we call deep. What that means is, let's go back 20 years. When I was at IBM and we developed a system, you had a WebSphere server, you deployed your application, that WebSphere server called the database. So you basically had two steps, right? Application calls database. So you could say the depth is two, right?

Jerry Cuomo: Yes.

Mirko: But today, with a microservice application, we have customers where the depth is bigger than 100. So when a user clicks a button, you have 100 service calls behind that, it's called service, service, service, service. So if there is a problem, it's really hard to understand where the root cause of that problem is, because it could be everywhere in this deep network of services that are called. And some of them are external, some of them are internal, and they are all interconnected, systems get redeployed. So the rate of change and the depth of the systems have become so complex that understanding that network and analyzing that network of services is really an art. And it's what makes it so complex to manage these applications in production.
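
The "depth" Mirko talks about can be read as the longest chain of calls behind a single user request. As a rough sketch, assuming the call graph has no cycles, it can be computed like this; the graphs and the `call_depth` function are invented for illustration.

```python
def call_depth(calls: dict[str, list[str]], entry: str) -> int:
    """Length of the longest call chain starting at `entry`, counting `entry` itself.

    Assumes the call graph is acyclic; a cycle would recurse without end.
    """
    memo: dict[str, int] = {}

    def depth(service: str) -> int:
        if service not in memo:
            downstream = calls.get(service, [])
            memo[service] = 1 + max((depth(d) for d in downstream), default=0)
        return memo[service]

    return depth(entry)


# The classic setup from 20 years ago: the application calls the database.
classic = {"websphere_app": ["database"], "database": []}

# A made-up microservice chain of 100 services, each calling the next.
chain = {f"service_{i}": [f"service_{i + 1}"] for i in range(99)}
chain["service_99"] = []

print(call_depth(classic, "websphere_app"))
print(call_depth(chain, "service_0"))
```

With depth two there are only two places to look when something breaks; with depth 100, the problem can sit anywhere along the chain.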

Jerry Cuomo: That's great, Mirko. It makes a lot of sense. So where do you see this all going? You mentioned the power of AI coming into observability and APM. What else could our audience look forward to with respect to observability and automation?

Mirko: Personally, you maybe know that we have this little robot as our logo, and we call that robot Stan. And from day one, our vision for this whole space was that you get another SRE or DevOps teammate, which is Stan, that helps your DevOps team accomplish the work that needs to be done. And basically Stan takes over, I would say, the simple repetitive work for the DevOps teams and frees up their time to concentrate on the stuff that's really important and that they should care about. And not monitoring and watching dashboards and then doing restarts or scaling up, scaling down manually. And I think that's where we want to go. We want to create basically a teammate for SRE and DevOps teams that helps them become better at managing their complex applications in production.

Jerry Cuomo: Oh, I love that. Yeah, that's awesome.

Mirko: And by the way, I mean there's still a lot to do to get there, but I think step by step, by becoming smarter, by applying AI, getting the workflows done, we are getting there. And I think in the next two or three years, similar to what autopilot or autonomous driving is for cars, I think we will get to a more autonomous way of managing applications in production.

Jerry Cuomo: Right. Wow. Stan, the SRE bot, that is your digital employee. That is amazing. So ladies and gentlemen, you've been listening to the Art of Automation with guest Mirko Novakovic, the CEO of Instana, and now our leader of observability within IBM Automation. Okay, we started this episode talking about working with the lights on such that you are data driven, which is the key enabler for automation. And Mirko showed us and told us about application performance monitoring and observability as being the place to start your automation journey. He also talked about modern cloud environments being dynamic and constantly increasing in complexity, and that most problems are neither known nor monitored. So observability addresses this key issue of these unknowns, so to speak, and enables you to continuously automate, get ahead with the lights on, exposing the facts and the data of where your opportunities and hotspots are, where automation would be of the greatest benefit. Well, folks, that's it. And once again, thank you, Mirko, for joining.

Mirko: Thank you. It was fun talking to you here.

Jerry Cuomo: And also I'd like to thank all of you for listening. Well, this is Jerry Cuomo, IBM Fellow and Chief Technology Officer of Automation at IBM. See you again on an upcoming episode.
