Episode 0 - Making of the SRE Omelette

This is a podcast episode titled Episode 0 - Making of the SRE Omelette. The summary for this episode is: An introduction to the Making of the SRE Omelette podcast. Kevin shares the motivation behind the show, explains what an omelette has to do with SRE, and gives a preview of what is to come.

Kevin Yu: Welcome to the introduction episode of the Making of the SRE Omelette podcast. This is the show where we explore the positive business and client success outcomes from site reliability engineering and hear from experts on how they influence the culture and mindset shift that led to those results. I'm your host, Kevin Yu, and I am the principal SRE at IBM applications, where I lead the SRE transformation of our organization into a data-driven engineering discipline. Join me and my guests on this journey to embrace SRE and deliver business and client success. What is SRE, you ask? SRE stands for site reliability engineering. It originated at Google as a way to treat operations as if it's a software problem, with the goal of improving system reliability and scalability. Over the years, many organizations like IBM have also embraced the discipline. In my experience, SRE is a mindset that champions the culture of Agile and is an engineering discipline that is data-driven and KPI-focused. It applies a life cycle approach towards building resilient solutions that meet business goals and delight clients. In this very first episode, let's call it episode zero, I will give an introduction to the show as well as a sneak peek into what you can expect in future episodes. One of my main motivations for starting this podcast is that I find many people and teams have struggled with driving the mindset shift to embrace SRE. Therefore, I'd like this show to help answer common questions people have on this journey as we bring in business and industry leaders in this domain, so that the community as a whole can benefit from the insights and experiences of those leaders. Together, we can capture the business and customer value of SRE and gain the knowledge to overcome challenges and achieve those outcomes. To begin, you may wonder: what does an omelette have to do with SRE? Well, this will be one of the many stories you will hear on this podcast.
This story started with me struggling to convince people that we need to prioritize our resources on SRE capabilities and tackle the technical debt. And although people would no longer debate the importance of SRE, we still struggled to balance time spent on SRE features and capabilities versus other features. Does that sound familiar? So I took one opportunity with my general manager, [inaudible], to ask the question: how do we overcome the challenge of finding time to drive the culture shift of SRE when everyone is so busy with existing tasks and putting out fires? I gave the analogy that it is a little like the chicken or the egg, meaning we know we wouldn't be so busy if we could only get some of those tasks done, such as having the system automatically detect and recover from disruptions or having guardrails to prevent users from harming the system, but we simply don't have time. So it is a little like the chicken or the egg. His response was a thought-provoking one. He said, "Kevin, culture is the outcome of what we do. In the context of the chicken or the egg, it is like an omelette." I thought it was a clever way to capture it, and it really highlights that culture is the outcome, because it emphasizes that behavior is a key ingredient. With that, I embarked on studying how to influence people's behaviors. My initial research led me to the work of Dr. Albert Bandura, a psychologist. His theory of reciprocal determinism states that a person's behavior both influences and is influenced by personal factors and the social environment, meaning that people's behavior not only influences themselves but also their environment. But the reverse is also true, in that the environment influences people's behavior. The key to changing behavior, therefore, lies in recognizing the influences that are working against us. Once we identify those influences, we must take action to change not only ourselves but also the environment to accomplish the goal of changing behavior.
Bringing this into how we drive SRE culture in a state where people are overworked, this means we must surface all the influences that lead to the overworked state. Now, some of those are visible, like to-dos and deadlines, but many are hidden or implicit, like everyone simply working late to meet deadlines or people simply continuing to do repetitive manual tasks that have no enduring value. Only when we have identified those influences can we make conscious changes to get rid of them, so we can arrive at the outcome we want. At this point, I will borrow another quote, this time from IBM's former CEO, Ginni Rometty. She said, "I learned along the way, culture is behavior. That's all it is. Culture is people's behaviors." So the key to driving the SRE cultural shift lies in influencing people and the workplace environment. Now, if there's one thing I have learned from leading large teams and being the dad of two boys, it is that incentive matters. You can give them all the skills, resources, and plans, but if there is no incentive, it will always be an uphill battle. With incentive, people, and kids, I've learned, will be highly motivated and will actually be driven to learn whatever is needed, find ways to get the resources, and come up with great plans and innovations. So the key to influencing behavior, therefore, is to find what motivates people. For my kids, it is iPads and gaming, but that doesn't quite work at the workplace. And this leads to this podcast. How are leaders finding what motivates their teams, and how are they motivating those teams towards the site reliability engineering discipline? Perhaps another story to capture the takeaway. I've seen teams getting really good at putting out fires. Meaning, instead of being manually paged about [inaudible] causing service disruptions, they implemented a way to detect the disruption and automatically recycle the JVM. This leads to a significant improvement in recovery time, from tens of minutes to seconds.
However, if no one asks why that automated recovery runs all the time, the automation is simply covering up some hidden problems. To truly embrace SRE, we need the team to ask why the fires are happening, be curious to understand what led to the smoke that led to the fire, and find ways to mitigate that. This can lead to the discovery that users have use cases such as running long reports, which cause long-running queries and hurt the system. So while we may be recovering the whole system by recycling the JVM, users are still getting an unsatisfactory experience every time they try to run those reports. The outcome of SRE is to build solutions that truly consider elasticity, scalability, and resiliency to drive customer and business success. That is what SRE strives for. In this podcast, I will have leaders and subject matter experts answer common questions such as: What does SRE mean to them, and how does it help their business and customers? How do they drive the cultural mindset shift? What do they find motivates their teams? And what are some of the negative influences they have identified, and how did they overcome those? In addition, we'll hear stories from guests that highlight the importance of SRE, the good, the bad, and the ugly, as well as how SRE has evolved over the years and where we think it is going. So thank you for listening, and I look forward to this journey of making the SRE omelette, where we explore how organizations can embrace site reliability engineering and drive business and client success.
