AI for Sustainable IT - Part 1
DJ: You're listening to the Art of AI podcast, with your host, Jerry Cuomo. You are also listening to Making of the SRE Omelette with host, Kevin Yu. Welcome to today's podcast. I'm DJ, your AI host, and we're here to connect the dots between two trending areas with the help of subject matter experts from their respective podcasts: Jerry Cuomo from the Art of AI, and Kevin Yu of Making of the SRE Omelette. In this episode, the first in a two-part series, we explore the intersection of AI with sustainable IT operations, where innovation meets responsibility. We'll look at how AI has advanced from basic functions to sophisticated applications that complement human intellect, emphasizing the enhancement of both creativity and productivity. Alongside, we'll examine the principles of SRE and their application in not just maintaining, but improving the efficiency and sustainability of our technological environment. By tuning into this episode, you'll understand how AI and SRE converge to enhance and sustain technology operations for the future. Stay with us for the entire series to gain a complete understanding of these critical topics. Let's get right to the first question, starting with Jerry. Can you start off by explaining to our audience what AI is, and why it has become a hot topic recently?
Jerry Cuomo: Yeah, sure, DJ, and hey, thanks for pulling this together. I think it's going to be a great discussion between Kevin and me. What is AI, and why is it becoming such a hot topic, you ask? AI is essentially a field of computer science that focuses on creating machines capable of performing tasks that would typically require human intelligence. This includes things like learning, reasoning, problem solving, perception, and language understanding, and it has evolved significantly over the years. Initially, it was about programming computers to do specific tasks. However, the real breakthrough came with the introduction of a technique called machine learning, which is a subset of AI. Machine learning enables systems to, quote, unquote, "learn and improve from their experiences" without being explicitly programmed, which is pretty cool. Later came the introduction of neural networks, which mimic the structure and function of the human brain, and that marked another big jump in AI's growth. Fast-forward to today: the latest, and arguably the most exciting, development in AI is the emergence of what we call foundation models. These are large-scale models trained on vast amounts of data, which enables them to perform a wide range of tasks, sometimes beyond what they were initially trained for. This adaptability makes them foundational for building many new types of applications. This leads to generative AI, a subset of these foundation models that focuses on generating or creating new content, whether it be text, images, or even music. It's like teaching AI to be creative, based on the patterns it has learned from its data. One of the notable examples of generative AI is large language models, and these are the things powering the chatbots that you all know and love, like ChatGPT and Google Bard. These models are also trained on massive data sets, in this case, text data sets.
They can understand and generate human-like text, making them incredibly versatile. They can write articles, compose poetry, or, my favorite, help generate source code. Now, when I talk about AI, I often enjoy putting a little twist on the A. For me it stands for augmented, as in augmented intelligence. In this view, humans contribute the creativity and emotional insight, while AI adds the vast data processing and pattern recognition. So, DJ, in a nutshell, that's what AI is, and a little bit about what is making it exciting these days.
DJ: Thanks, Jerry. Now Kevin, it's your turn. Your podcast touches on a unique intersection between site reliability engineering, or SRE, and sustainability. Can you give the audience an overview of the discipline of SRE, and then get us started by defining what we mean by the term sustainability?
Kevin Yu: First of all, thank you for the invite, DJ. It is actually quite different on the other side of the mic, and it is always a pleasure speaking with Jerry. One of the words you often hear about SRE is the one you just used, discipline, meaning it is a practice. As people practice it often, it becomes a behavior, something we just do. What does the practice of SRE, Site Reliability Engineering, teach us to do? Ultimately, it is to deliver a great experience for users, allowing them to achieve success using the service, and in turn produce positive business results. To achieve that outcome, the discipline of SRE must not be a point-in-time activity. It must be applied across the entire life cycle of the product. Sorry if it sounds like a commercial for SRE, but as someone reminded me, think of SRE as the planks and stretches you do to cure a bad back, for a great NPS and happy customers. I'll give some additional examples of how it is applied across the life cycle, so we can see how it becomes real. Starting with design, SREs are there to ensure we understand the requirements, and to start any architecture changes needed to achieve them.
Jerry Cuomo: Yep.
Kevin Yu: If we see challenging goals, it may mean early POCs, proofs of concept, so we validate assumptions and known limits, and adjust as needed. SREs are there during the building of the solution to make sure we understand potential failure points, and how we can detect and mitigate them. SREs work with engineers to put the right quality checks in the pipeline, so we can balance velocity and quality with data. Lastly, and now this really sounds like a commercial, SREs are the all-seeing eye in production, knowing about and acting to mitigate disruptions before they negatively impact users, and maintaining that good user experience. Above all, SREs are passionate about eliminating toil, work that is repetitive, manual, and not enduring, so we can maximize our efficiency to scale, and achieve that build once, run everywhere promise. We do all that with the art of AI and automation.
Jerry Cuomo: Yeah, I love that.
Kevin Yu: Oh, you also asked about sustainability. I like how Christina Shim, IBM VP for Sustainability, explained it, so I'll steal it. She borrowed from the United Nations: "Sustainability is about how we can leave the planet better off for the next generation than we are now."
DJ: Nice.
Kevin Yu: It is about development that meets the needs of the present day without compromising the ability of future generations to meet their own needs. I think that is such a powerful capture, and it is much broader than just the carbon emissions we often hear about. It is really about how all of us, all the stakeholders, can work together, so that while we meet the demands of today, we are not taking away from the needs of tomorrow.
DJ: That's good to hear, Kevin. Conserve today for a smarter tomorrow. Based on your opening statements, I can see how these ingredients come together to effect positive change. Okay. Now back to Jerry. We can all appreciate the impact of climate change. In fact, this past summer was the hottest on record, and I'm struggling to keep my circuits cooled.
Kevin Yu: Right.
DJ: I'm curious, is there an intersect between AI, SRE and sustainability?
Jerry Cuomo: Yes. Anytime there is a breakthrough in technology, we have an opportunity to not just progress our day-to-day and make our everyday life better... and the reason I like these topics is that they're all aiming at giving us back the greatest gift we have on this planet, which is the gift of time.
DJ: True.
Jerry Cuomo: We can spend time doing the things that we love with the people that we love. I think having an SRE there, who is wisely keeping a system running and looking ahead, being proactive using AI, versus being reactive, is going to allow that person to have a weekend to be with their family, and avoid those terrible calls that you get.
DJ: Yeah, I really despise those calls saying the system is down, and they always seem to happen on a Friday.
Kevin Yu: Right.
Jerry Cuomo: Of course, technology takes energy, and some technology takes more energy than others. AI, and especially the types of AI I talked about earlier, foundation models, large language models, and generative AI, can in some cases take significant energy. That's why we need to think about it across its entire life cycle, like you mentioned, as a good SRE is trained to do: it's so much cheaper to fix a problem before it occurs than after. Being wise in AI requires a thoughtful approach to the AI life cycle. Recently, I wrote in an article that every prompt issued to ChatGPT costs a bottle of water.
DJ: Wow.
Kevin Yu: Wow.
Jerry Cuomo: When you think about it, there's the energy involved in traversing these vast neural networks, with GPUs assisting in the processing. To cool the GPUs, you need water, and you need pumps to move the water through the system, which require electricity, et cetera, et cetera. I think there are wiser ways to apply AI that still get me the outcome I need. We're seeing a trend toward sustainable AI through picking the smallest model and task tuning, that is, fine-tuning the model for particular tasks, so that you have an efficient model that's no bigger, and no smaller, than the job at hand. These vast models show you the art of the possible, but they come at a cost. I think over time we're going to see a trend toward the right AI life cycle, one that lets you start at the very beginning, with the data and data set curation, and manage the life cycle all the way through to governance of the AI model, so that it is authentic, nimble, and energy efficient. I see this happening as we speak with products from our friends and our employer at IBM, with watsonx and the like. It's really built on those principles. Can you have your cake and eat it too? Yes, you can. You can have a model that is amazing, that provides surprising results to augment what you do on a daily basis, makes you a better person, and gives you back that gift of time, without using up many, many bottles of water to cool GPUs, and, as you said, leave the planet the way it was given to us.
Kevin Yu: I love that capture: being proactive and wise about AI. Jerry, I'm going to steal that. But perhaps we'll have it with an omelette.
Jerry Cuomo: Right, have the omelette needed to.
DJ: Kevin, can you flip this around? Jerry just took a closer look at AI. Now, can you do the same for SRE, looking at use cases at the intersection of AI and sustainability?
Kevin Yu: On AI use cases for SRE, the starting point of the intersection of AI and SRE is AIOps. Jerry, you and I talked about that in the previous episode.
Jerry Cuomo: Yeah.
Kevin Yu: Just like self-driving cars were a big wow factor a few years back, they are becoming the norm. I'm not talking about navigating completely from point A to point B, but lane-keeping assist is becoming a common feature. Similarly, there's AI-assisted operations in most of the products now. I remember we used to have to set up hundreds of alerts to notify us or trigger automation to maintain the user experience.
Jerry Cuomo: Yep.
Kevin Yu: Now I can pull up a tool like Instana, pick a golden signal such as latency, error rate, or [inaudible], and tell it to do something when something abnormal happens. I can pick the standard deviations, and it's all just a matter of drag and drop and a few clicks to have it applied to everything it monitors. I've basically implemented golden signal triggers, actions, and notifications for all my systems in seconds, if not minutes. Jerry, I remember in our first podcast you asked, wouldn't it be great if AI could tell me, "Hey Kevin, if you want to go to the hockey game tonight, you probably shouldn't push this code to production now."
Jerry Cuomo: That's right.
Kevin Yu: Well, for context for the audience, I'm Canadian, so hockey's important to me. Guess what? I see flavors of that today. Tools like Instana can surface areas of potential concern to users. Say I checked in a piece of bad code, and it has been running in the pipeline; the tool can now tell me, "Hey Kevin, there is an increasing error rate after the deployment marker. You should probably take a look now." Well, it doesn't quite say "Hey Kevin," but it is able to surface those potential problems to prompt action before it is too late.
Jerry Cuomo: Yeah.
Kevin Yu: By the way, it probably could, if we just put a chatbot to it.
Jerry Cuomo: Yeah, look at the risk in making a change. I think I also joked, maybe on that very podcast, that AI, along with a healthy omelette, eats data for breakfast. With data accessible from a series of disparate systems, now you can say, "Well, in GitHub, these types of changes have led to 70% in downstream incidents, and look, Kevin's about to check in a change to that code, so maybe there's a chance that..." So you can do impact analysis across large amounts of data and logs, look at anomalies occurring, and then assess the risk in doing certain tasks. Maybe you will make that hockey game.
Kevin Yu: Yes, hockey is important to us Canadians.
Jerry Cuomo: We have the Carolina Hurricanes that I frequent.
Kevin Yu: Right, and Carolina actually won a cup, before the Toronto Maple Leafs did.
Jerry Cuomo: Okay. All right. We're not going to talk about the Maple Leafs on this podcast.
Kevin Yu: We'd bring a whole different audience here. Sorry, DJ, back to the program.
DJ: Thank you for not forgetting that I'm actually the host today. All right. Kevin, it sounds like data represents the egg in this omelette. Can you say more about the role of data, to bring out the best in these ingredients of SRE and sustainability with AI?
Kevin Yu: You touched on it, Jerry. Data is critical in this AI journey, and at the intersection with sustainability: as we become more efficient at using the system, we naturally reduce compute. One of the primary use cases I've seen so far with SRE, AI, and sustainability is right-sizing the environment. Meaning, let's not provision 100 cores for a workload and have them sitting 90% idle. How can we get visibility into consumption, through data, to Jerry's point, and partner with engineering teams to build a system that is elastic from the start-
Jerry Cuomo: Absolutely.
Kevin Yu: ... so we can provision just what we need. This will help not just to reduce carbon, but also to reduce cost, so there's tremendous business motivation.
DJ: Thanks Kevin. Okay, listeners, I think this is a good place to pause. We've covered some ground today with Kevin and Jerry, and their insights on the vital role of data in SRE and sustainability, blending in the precision of AI. Let's reflect on these insights. We'll continue this dialogue in the next part of our series, where we'll pick up this conversation and go deeper into the actionable aspects of sustainability and AI. Thank you Kevin and Jerry, for your expertise. Well, that's it for today. Please be sure to tune into part two of this two- part series on AI and sustainable IT operations. This is DJ signing off. Catch you again real soon.
DESCRIPTION
In this first episode of a two-part series, hosted by DJ, the AI chatbot, guests Jerry Cuomo and Kevin Yu explore the intersection of Artificial Intelligence (AI) and Site Reliability Engineering (SRE) in promoting sustainable IT operations.
You can catch Part 2 on Kevin's "Making of the SRE Omelette" feed.
Jerry discusses the complexities of AI, tracing its progression from machine learning to advanced stages like generative AI and large language models such as ChatGPT. Kevin, meanwhile, provides insights into the SRE field, underscoring its significance in delivering outstanding user experiences and advocating for sustainable methods.
This episode, a unique blend of "Art of AI" and "Making of the SRE Omelette," offers listeners insights into how AI and SRE collaborate to elevate and augment the efficiency and sustainability of our technology environment. The dialogue explores the critical importance of a forward-thinking and intelligent application of AI, its influence on energy usage, and the crucial role of data in refining SRE and sustainability practices. Be sure to listen to this episode for a thorough exploration of these vital subjects, and join us for the complete series to deepen your understanding of AI's impact in creating a sustainable technological future.
Oh... and be sure to catch Part 2 on Kevin's "Making of the SRE Omelette" feed.
* Cover art was created with the assistance of DALL·E 2 by OpenAI. ** Music for the podcast created by Mind The Gap Band - Cox, Cuomo, Haberkorn, Martin, Mosakowski, and Rodriguez