Episode 7 - Making SRE Real

This is a podcast episode titled Episode 7 - Making SRE Real. The summary for this episode is: Stacy Joines, IBM Fellow, VP and CTO in the IBM WW Team for Global Markets, shares her distinguished field experience driving client success for special events and peak shopping seasons like the US Cyber Monday. Stacy describes what it takes to make SRE outcomes real, achieve business and client success, and influence prioritization to make it happen.

Timestamps:
[00:00 - 01:07] Intro to the episode
[01:29 - 07:10] IT culture: Through the cloud movement, hyper-scaling movement, etc.
[07:27 - 09:20] An example of a constant state of readiness
[10:21 - 13:25] The biggest driver for SRE
[13:41 - 18:52] Words of wisdom for current and next gen SRE professionals
[22:49 - 25:07] Stacy's ingredient and recipe for the SRE Omelette
This is how we do business (00:27 min)
Intro to the episode (01:06 min)
IT culture: Through the cloud movement, hyper-scaling movement, etc. (05:41 min)
An example of a constant state of readiness (01:52 min)
The biggest driver for SRE (03:03 min)
Words of wisdom for current and next gen SRE professionals (05:11 min)
Stacy's ingredient and recipe for the SRE Omelette (02:18 min)

Stacy: We're building a practice that helps us anticipate and helps us to be in a constant state of readiness, so that big days, big events, new things, whatever comes our way, are something we've practiced, something we're prepared for, something we're doing all the time. It's part of our DNA. It's part of how we do business.

Kevin: Welcome to another episode of the Making of the SRE Omelet Podcast. Today's episode is all about making SRE real: specifically, what it takes to achieve the SRE outcome and what that looks like. I am so excited to have Stacy Joines, IBM Fellow, VP, and CTO in the IBM worldwide team for Global Markets, join us today. I have had the pleasure of working with Stacy for many years, and I know that everyone, Team IBM and customers alike, is happy when Stacy walks into the room, because we know that no matter how bad the situation may look, she'll lead us to solve those problems and deliver a positive business outcome. Welcome to the show, Stacy.

Stacy: Hey, Kevin. Thank you. Gosh, that's such a warm introduction. I appreciate it. And we have known each other a while, haven't we?

Kevin: Yeah. It goes back decades, and we definitely share lots of experiences and have lots of stories to tell.

Stacy: Well, I think you touched on something right at the beginning that you and I have talked about many times, which is that in our profession, we often get introduced through something that's gone wrong. That's a very reasonable way to be introduced; we often meet people on their worst day. But the ideal here, and what we've built a lot of collateral around and what we see happening in the industry, is to build a practice around SRE. If we're doing this right, we're building a practice that helps us anticipate and helps us to be in a constant state of readiness, so that big days, big events, new things, whatever comes our way, are something we've practiced, something we're prepared for, something we're doing all the time. It's part of our DNA. It's part of how we do business. And that's the ultimate goal. We don't always start there, but that's where we want to be. And that has become such a big part of IT culture as we see this movement to cloud and this movement to hyperscaling. As we say, we have sufficient materials to make that work, but we have to know how to use them, and we have to know how to leverage all this capability that is now available to us in an unprecedented way. Regular companies and people have never had this type of compute available to them in the history of the world. How do we make the most of it? To some extent, that has to go back to our culture: not just thinking about how a programmer acts, but about how all that work comes together into something scalable, manageable, and predictable that we're leveraging with our clients and with our business. Man, I went off on just the introduction. We're in trouble already, I think.

Kevin: No. This is definitely not your first rodeo, to use a term we've used before. And I think you touched on many aspects of what this really means for our customers. How has this definition of "good enough" changed over time? Maybe you can take our audience through that, from the good old [inaudible] days, to SaaS, to the hybrid cloud that we are all going after.

Stacy: You and I were there on the ground floor of some of the things that we now see fully realized. You and I were both in the retail practice for many years. A lot of what we see in the market today really came out of those experiences: the whole concept of CI/CD, fail-fast and experimentation, bots, all those things. I like to believe, at least, that much of it originated in the retail space, from learning how to present things to clients to get the most impact from what we were doing, and getting that realized into a culture. So when we talk about SRE and DevOps and the things we're doing these days in cloud and in hybrid clouds, these things were kind of born on the web. I spent over a year working with the IBM Garage, really looking at how this movement is not just a set of tools but a culture that's built around those tools, and that's what Garage is driving on the customer side. It's not just learning how a particular tool fits an IT process, but really thinking about how that IT process fits into this culture that was born on the web. We're pushing things and experimenting with things and looking at how we most efficiently interact with a customer, how we scale, how we're prepared, and how we stay in a constant state of preparedness. I'll probably keep coming back to that as we talk. That's a change from the traditional IT model, to your point, of: we're going to package this up, we're going to test it, we're going to deploy it, we're going to open bug reports against it, we're going to fix those, and we're going to come out with the next release. In this space, particularly at the presentation layer for user-facing things, there may be hundreds of copies of that presentation layer out there being evaluated for which is the most efficient, which draws the most traffic, and which drives the most business and the most click-through for our company. And that's being evaluated automatically. We can push different experiments throughout the entire space. Now, that's not true of all types of code or all types of applications, but it's certainly a huge difference from how we were thinking about software 10, 15, 20 years ago. We think about a much more dynamic environment, about places where we can try things, and about how we evaluate what we've tried and build on it. So it's a different culture. It's getting our heads around that culture, figuring out how we reconcile all of that with our existing IT processes and where we want to be in the future, and leveraging some of this capability that's available to us.

Kevin: Much more dynamic: a shift to a culture where we try, experiment, and automatically evaluate the impact on the business and the user experience. Stacy, do you have an example of this constant state of readiness you spoke of?

Stacy: It's a real hybrid journey. How do we manage them? How do we keep our SLAs? How do we protect them? We've had some really bizarre things happen over the past couple of years.

Kevin: No kidding, Stacy.

Stacy: Like deep-freeze weather in Texas, for crying out loud, and volcanoes going off on islands in the middle of the Atlantic, disrupting air traffic and other things. These are situations that have exposed systems that maybe weren't completely prepared for sudden issues. It was something that maybe we thought we had done, and maybe at one point in time we had done it.

Kevin: Or thought about it, like BCDR.

Stacy: Or thought about it. Or maybe we bought the right equipment to do it, but nobody quite got it all put together. But again, it wasn't part of our culture. We thought ready was one-and-done, but ready doesn't happen once. You and I have talked about that before. Ready doesn't happen once. Ready is a constant state of readiness. It's in our culture. We're using the elements of this equipment frequently, or we're at least testing it frequently. And it's about getting that into our thought process and figuring out how it can fit into how we want to operate our business, our IT business. Because everything IT does is there because it supports a business need. And it's really quite amazing how things that aren't particularly esteemed in the IT space can suddenly become extremely important when things go wrong.

Kevin: So Stacy, you and I joke about how, in the good old days, people didn't really consider performance and reliability until the weeks leading up to Black Friday. And I'm so glad you mentioned that it shouldn't be a point-in-time thought, meaning call Stacy a week before Black Friday, but rather something you consider all the time.

Stacy: Or on Black Friday.

Kevin: Right. That might be a bit too late. You're great, but that would be pure magic.

Stacy: It's hard to get that going.

Kevin: So I think this touches on what I would say is one of the biggest challenges everyone in the industry has: driving prioritization for SRE and reliability features. Because often, they fall behind features and function. In the good old days, and even now, many still refer to them as non-functional requirements.

Stacy: That just burns me up. I don't like the term. I think the biggest driver, just to answer your question succinctly, is again that everything we're doing in IT usually maps to a business need of the company. Talking about how that business need needs to behave and what kind of financial impact it could have is an important part of what we need to do when we're speaking to the business and helping them understand, because sometimes the folks on the business side are not necessarily acquainted in a substantial way with how the IT functional side has to work. In the early days of the work we were doing on the web, you and I would often encounter folks who said, "My website is not really a big deal, so we're not investing a lot." And we'd say, "Okay, well, you've invested quite a bit in time, if nothing else. And with a little bit more investment, this thing that you have built could be stable during peak loading times and could provide a good user experience." And after we got through that discussion and they had a successful sale event and understood the possibilities, people got really excited. And so, success is a big aspect of demonstrating the value there. In other industries, the importance is more regulatory: "If this becomes unavailable, I may not meet regulatory requirements." People may also have SLAs to their clients for services they're providing. Then it's, "Here's how you're going to have to build out to support that SLA," and explaining all the pieces and parts. I think the challenge then was that this seemed like a lot of hard work for an IT team that was maybe more familiar with building things for point of sale, which is a much lower-volume enterprise than a commercial public website in terms of inventory management, presentation management, sale management, order management, all those things. Getting that culture shift in place was very important.

Kevin: So it is a case of tailoring the message to the audience, connecting that impact and benefit back to the business metric, and having that culture you spoke of, that constant state of readiness, that shift to align and prioritize what we do. Again, back to business impact and client success. So Stacy, let's jump to the next segment. One of the main reasons I started this podcast is tech vitality. Any words of wisdom you would give to the current and next generation of practitioners who may want to get into the profession?

Stacy: Wow. This could be a series, if you ask me questions like that.

Kevin: I'm more than happy to come back to you and have a sequel with Stacy.

Stacy: It is one of the blessings of this role, that I do get... And also, where I live, down here in RTP, we're surrounded by universities. It is exciting to get to talk to a lot of different students who are doing a lot of different things. And I always tell them, and I mean it from the bottom of my heart, I am so jealous of what they're going to get to see in the 30- or 40-year career they're going to spend in IT, things we're just beginning to touch on. You look at all the advances we've had. I was very blessed to work on the Watson team, arguably the first AI product that was brought to market as a commercially available AI, tailored for the client. It was an exciting time. And you think of the possibilities of that, and we're just beginning to unpack what we can do. And then we look at the things that are coming next. IBM is a leader in the quantum space, and there are some of the things we've delivered there, and the possibilities of quantum.

Kevin: What you used to see in the movies has now become a reality.

Stacy: Right. You go back to AI, and I think I've told you this in the past: every morning you woke up and something that you had only seen or thought about before in science fiction was now real, something you could do. It speaks multiple languages. It understands all these different types of media. It's just an exciting time, where we're getting this technology. We have some use cases in mind for it already, obviously, that's why it got developed, but exploring the full extent of what it can do for humankind is just breathtaking. And I'm excited for the next generation of technologists, not that I'm planning to go anywhere soon. Just think of what that will do for us over the next 50 years, 100 years. What discoveries it can make for us, not only in the physical sciences, but also where it could take us in computer science. What can this level of computer science do for us in the next level of computer science? It's an exciting time. And it's all new. It's like going from building houses with wood to steel. We've discovered steel and concrete. What could you do with that?

Kevin: That leap.

Stacy: Yeah. And behind all that is not just the applications, not just what we're going to do with them and how we train them, but the folks who are going to help guide all of that through the infrastructure that's available, so that it becomes stable and consumable for its client base. And that's really what SRE is going to be about going forward, I think. Not just one more dashboard, to my friend Jerry's comment, but how we harness these things so that we can do it safely, repeatably, and consistently across an epic scale. It's just an unbelievably exciting time; we're seeing things we had only been able to imagine. It's like Hollywood wrote our specs for us here, to some extent. Just amazing stuff. So that's a big area of conversation with students: the visionary future. The short-term future, I think, is an ongoing blurring between traditional IT roles and traditional business roles, particularly as we see things like AI really gain traction in the market. Hopefully, we are going to end up pretty quickly in a place where some of this IT backlog gets broken, because people have the capability to set up the things they want to do, or the business things they need to do, using more human-friendly tooling: computers that speak and operate on human terms. I don't have to learn a language, I don't have to learn some macros or whatever. I can just tell it what I want, and it understands. That, I think, is a wave that is hopefully gaining momentum soon. That's totally my opinion. That would be the next big thing as we continue to see IT permeate, as IT moves more and more into our day-to-day lives.

Kevin: I would love that. It essentially gets to what we spoke of: it is not just site reliability engineers doing SRE, but everyone involved in building the solution. And we can get there by making it easier for people to interact with and enable those capabilities.

Stacy: Right. Some of these SRE capabilities have to be consumable as the IT becomes more consumable out to the edge. And particularly, as it gets more important. I'm looking around my office here at home, there are tons of gadgets. One day, they'll probably be under one umbrella of things. And hopefully, those SRE concepts become as consumable as the end capabilities that the applications are built around. So I can manage how my thermostats operate without-

Kevin: Without knowing the internals of how it worked.

Stacy: Right. Without having to know the internals, and being able to simplify that whole process of how these things operate together, because that's really where I think we're headed. This is again my opinion, but it's about how all this stuff meshes together. We don't necessarily have a centrally operated set of functionality; instead, all this stuff gets managed and pushed together at the edge. So really, SRE, to your point, Kevin, needs to move closer to the edge as well, in a consumable way. That doesn't make us all experts at deploying a full site to a cloud; it's about how I can just make sure I get cold air in the summer and hot air in the winter out of this thing. That's all I want.

Kevin: Right. And Stacy, by the way, I started using the term "SRE features" for simplicity, to capture SRE requirements and drive them into Aha! and whatnot. And in closing, I think in a way this relates to one of the first insights you gave us on how you drive prioritization: demonstrate success. That makes it easier to have the conversation, gain momentum, and get the priority.

Stacy: I think it's really important to say we're going to do this, we're going to get these capabilities, we're going to be in a place where we can have these capabilities, like a hybrid cloud. And so, by thinking a little bit more about it and doing these things as part of our cultural practice, we are always ready. It may require some more upfront investment in planning, but if it becomes just the way we do business, then we are constantly ensuring that our systems have the scalability, high availability, and disaster recovery they need, because we're constantly leveraging those capabilities in our day-to-day business. That's the ideal. Now, that takes planning and practice. It is an intentional thing, as intentional as anything on Earth. Just because we've gone to a cloud, or done this, that, or the other, we still have to understand how the story fits together, make sure we're driving that story on a regular basis, and demonstrate how this helps us maximize the value of everything we're doing.

Kevin: I think that's a perfect segue to the last segment of the podcast. Here we go back to the title of the podcast, Making of the SRE Omelet. Stacy, what would be your ingredient and recipe for SRE to get to the business outcome?

Stacy: Kevin, I love the analogy. I love the omelet analogy. When we make an omelet, and in fact I think I had one for lunch, we're making a new thing. And the thing that makes an omelet really good, in my opinion, is for the things we put in there to be fresh. I want to be careful: that doesn't mean that in order to build a new thing we have to do all new things, but it does mean that we have to take a fresh look at everything. Is it a good fit for what we're trying to do? And if not, what do we need to get it there? So if we're looking to build this highly scalable, highly reliable hybrid cloud implementation, how does everything fit together? And we'll go back to that conversation about the early days of retail, when a lot of folks did a lot of work on their website, only to point back to a system that was quite out of date, maybe designed for a very small point-of-sale business, that was now taking real-time orders. How are your components? How do things look in your omelet, if you will? And not just in terms of your infrastructure and your applications, but also your internal processes. We touched on this approach of SRE and some of the elements we can bring to bear. It's a bit of a culture change. Is everything and everyone prepared to come on board with that? And how do we fit that into our culture, so that we can take advantage of what we're getting ready to modernize to? So I think that's a bit of the challenge: taking a fresh look at how we're going to put all this together, and whether there are some pieces that need to get refreshed and modernized, or whether it's really more about changing culture and changing how we do things than throwing out perfectly good pieces just because. So, taking a fresh look, Kevin. That's my answer. Fresh stuff for fresh omelets, to solve our problems and to do new things. I think that would be my choice there.

Kevin: I love it, Stacy. It is a perfect answer from your consulting lens. Taking that fresh perspective, where fresh doesn't mean we do all new things, means asking and validating the reasons behind how we built those solutions in the first place and the goals we want them to achieve, then surfacing the elements that need to change. In a way, that's also the best way to get buy-in from the team. Because let's face it, change is hard, and we don't want to change what's working.

Stacy: Yeah, exactly. I think that's the biggest thing. And coffee, lots of coffee to go with your... That's really all you need.

Kevin: We'll save the recipe for a good coffee for another podcast.

Stacy: It's exciting to see something that you and I have worked on our entire careers really hit the mainstream. It certainly makes it a lot easier to explain the value of these processes and capabilities. It's way more than just one more dashboard; it's really about how we orchestrate everything you're doing to deliver value, take advantage of the capabilities you have, meet your SLAs, meet your regulatory requirements, and keep everything safe and secure. It's the glue and the conveyor belt, if you will, the machine that makes all that possible. And it's an exciting time to be in the profession, I'll say it again.

Kevin: Yeah. No, definitely. Thank you so much, Stacy, for spending time with us and taking us through this journey of evolution, from Black Friday and Cyber Monday performance to what you captured really well. What a fantastic time to be in the site reliability engineering profession. Thank you so much for this.

Stacy: Yeah. No worries, Kevin. Thank you for inviting me.

Kevin: I would also like to thank you, the audience, for listening. See you again on an upcoming episode.

Today's Host

Kevin Yu

Principal SRE, IBM Sustainability Software

Today's Guests

Stacy Joines

IBM Fellow, VP, CTO IBM Account Team for Kyndryl