Episode 14 - There is no SRE without Team
Bill Higgins: Coordination is when I reach out to you because I have to. Collaboration is when I reach out to you because I think by reaching out to you and pulling you into my activity, there will be a better outcome than had I not reached out to you.
Speaker 2: Hi everyone. Welcome back to another episode of the Making of the SRE Omelette podcast, where we explore the positive business and client success outcomes from site reliability engineering and how experts influence the cultural and mindset shift that led to those results. SRE is a team sport. I'll use a Canadian reference here: there's no hockey without assists, and to me, we can say the same for business and client success. Without assists from team members, those achievements will not be possible. Here to talk about the importance of teamwork and how to ask for help is Bill Higgins. Bill is the director of research and development at IBM Watson and is a champion of technical vitality and diversity. His passion to help others is contagious, and I'm so excited to have him here with us today. Welcome to the show, Bill.
Bill Higgins: Hi, it's good to be here.
Speaker 2: Bill, could you please share with the audience your career journey to get us started?
Bill Higgins: So I grew up in Pennsylvania and I went to Penn State University in the late 1990s for computer science. Then I came to IBM in 2001 where I've been ever since. My first five years or so I was working on application development and architecture, and then from 2006 to 2010, I worked in the Rational group building the product called Rational Team Concert, which was a really great experience because the team had some of the best software developers in the world. The team lead was Erich Gamma, who had written Design Patterns, created JUnit, and now leads VS Code, and there were just some other really amazing folks. So that was a really great place to be a young software engineer and learn about the craft of software engineering and architecture, about leadership and about good teamwork. Then in about 2011 to 2013, I went to the Tivoli division, which does operational tools. And so I took the lessons learned from the development tools and the operational tools and really got into DevOps and then I helped IBM get into DevOps. Then in the mid-2010s, I actually took a left turn and went to the design group because there was this big design transformation going on. So I brought the DevOps knowledge and the engineering knowledge, synthesized that with the design folks with the design knowledge and really started to think about a whole team approach, design, product management and engineering for building great products. One of the things we realized was that you need tools that actually bring people together. So then in the mid-2010s along with your previous guest, David Lee, we actually rolled out a brand new set of tools for IBM like Slack, GitHub, Travis CI, etc., and really started to bring the company much closer together instead of being a bunch of silos. Then based on that experience, I then took over the development of our Watson AI technology because we had this aspiration to infuse AI into all the IBM software products. 
And so in some ways it was a similar project to getting everybody to adopt new tools. Now, we were trying to get them to adopt a new programming model with AI, and so I've been working on that for about four years now. We're just getting ready for our upcoming Think conference where we're going to roll out our NextGen AI stack with foundation models and open source a bunch of the technology that we've created over the past four years.
Speaker 2: Wow, what an amazing journey. And I now know who to thank for bringing us the wonderful tools such as Slack. And there you go, ladies and gentlemen, what Bill has captured is why I'm so excited for him to be here today: demonstrated successes, leading teams in diverse roles across product development and organizations. Bill, I'm sure you have come across many great teams and people. I recall stories shared on the Unlearning Podcast with Barry O'Reilly where you spoke of how people who are best at what they do are aggressive at trying things. Can you take this audience through that story?
Bill Higgins: Barry and I were talking about AI and what's different about AI versus traditional programming. And of course with traditional programming, you basically encode a set of procedures like while loops, for loops, if statements, else statements, and so you very deterministically understand what you want the system to do. Whereas with AI, you actually learn from data and it's much more probabilistic. The example I used with Barry was the way the Tesla semi-self-driving system works. So basically, the goal of the Tesla semi-self-driving system is to keep you between the lines and a consistent distance behind the car in front of you. And if it's working correctly, you basically never have to touch the wheel. I mean, you have to touch the wheel to prove that you're actually paying attention, but theoretically, you don't have to touch the wheel. So every time a person takes over the wheel, Tesla considers that an error and it feeds it back to the machine learning system so that it can try to get better. So that feedback on error corrections is a way that AI systems get better over time. The point I made with Barry is that in my experience, people who are really successful are those who actually try to do hard things, take calculated risks. If you're trying hard things and taking calculated risks, you will fail from time to time, sometimes small, sometimes big. And the people that are the most successful are the ones who actually don't give up when they have a setback, but rather learn from it and figure out how they can do better next time. I have a good friend who's a famous skateboarder named Rodney Mullen, and he has a similar talk about how, in the world of skateboarding, the best skateboarders like him, his buddy Tony Hawk and some of the other Bones Brigade members, one characteristic of them is that they actually fall down the most because they're trying hard tricks. 
They're trying tricks that have never been done before, but they keep getting up and they keep trying again. And if you keep doing that, you can do tricks that nobody else in the world can do, literally.
Speaker 2: What a great set of stories. You won't learn and get better if you don't try something new. And the key is to not just give up, but to recognize those failures and falls and take the feedback to learn from them. It reminds me of what one of my earliest mentors, Harry Pickett, told me many years ago. He said, "Kevin, always look to try something you're not comfortable with." I remember back then I was like, why? That sounds scary. But to your point, if we never push our abilities, we will never get to learn and get better.
Bill Higgins: Yeah, it's fundamental to learning. Sports are nice as an analogy because they're so concrete and measurable. And so think about somebody who's a bad tennis player. With good coaching, you can concretely show them how to serve better. And it might start with very small things like change your grip, change your stance, change how high you throw the ball up in the air, change the angle at which you do the racket. But if you get the coaching on the better practices, you're literally rewiring your brain so that it becomes, we call it muscle memory. It's really just a mental model, so that you can consistently hit 120-mile-an-hour serves in play, which seems almost superhuman.
Speaker 2: And seems so routine, at least on TV.
Bill Higgins: But it's because of that deliberate practice. But you can use deliberate practice with anything. In a way, if you think about it, having a good tennis coach teaching you how to serve is analogous to having a senior engineer giving you feedback on your pull requests. So whereas with the tennis coach, they're saying, "Oh, the ball is still five inches too low, throw it five inches higher," with the pull request, they might say, "Oh, you didn't think of this error condition. You always forget this particular type of error condition." And if you get that feedback enough in the course of day-to-day work about things you've been thinking really hard about, then over time it becomes as baked into your mind as how you tie your shoes. It's actually harder to describe something that's really become entrenched in your mind. Try to describe how to tie your shoes verbally. It's very hard. Or try to describe how you operate a car. It's very hard because it's just become an intuitive thing versus something you have to consciously think about.
Speaker 2: I remember when I tried to teach my boys how to tie their shoes, was it around the rabbit ears or something like that? I said, "Let me just show you."
Bill Higgins: But now it's just become part of their mental model. They no longer have to think about it consciously.
Speaker 2: So Bill, what is the art of learning if there is one?
Bill Higgins: The way I think about learning is the following. There are, I don't even know how to categorize it, thousands, millions of things you could be good at. You could be a great driver, you could be a great tennis player, you could be a great software engineer. But each of those areas of expertise breaks down into a set of tasks that are fairly concrete, like writing code, writing test cases, thinking about error conditions. So you really have to get concrete about what you're trying to get better at. Am I trying to get better at writing code? Am I going to try to get better at communication? Am I going to try to get better at emotional intelligence? And then once you choose the area where you want to level up, one analogy I use is video games. So usually in a lot of video games, you start off with a character who's relatively weak, but over time you gain skills and power, you gain skills and health, you gain skills and weaponry. So, I try to think about what are the set of skills that I need in order to be successful at whatever it is I'm trying to do, to be better at what I'm trying to do. Then pick one or two in a six-month period that you want to level up. Say that you tend to get frustrated in meetings, and then you tend to lash out and it creates a toxic environment. So that's a well-known area of science called emotional intelligence, and there's books about it. So you pick that area, and then I actually use a three-legged stool. I study books, I pick a mentor and then I practice on the job. And so over a six-month period or so, you're intentional about trying to get better, very concretely, not in the abstract. With behaviors, it's like, if somebody says something that makes me angry, I'm going to count to 10 before I respond. If I am about to dash off an angry email, I will save it in the drafts folder until the next day. So you have to get very concrete about behaviors you can change. 
And then if you do those behaviors enough, eventually it becomes habitual and all of a sudden it's like tying your shoes. You no longer have to think about it, and then you can move on to the next skill. So it's really about what do you need to level up and why? And then I use the three-legged stool of study, mentoring and on-the-job practice.
Speaker 2: Level up with Bill, I know I've gained at least a few points of intelligence since the start of the conversation. I love the analogy of video games. I know I use that a lot with my kids because it is very relatable and gets them engaged.
Bill Higgins: I think both video games and sports are good teaching mechanisms, because they're simplified models of the world. The everyday world is almost infinitely complex. But in video games, they have to have a set of rules because somebody had to program them. And in sports they have to have a set of rules. So if you can use those as not just analogies, but actually you can take lessons from video games and lessons from sports, it can help you get your head around some of the complexity of the world.
Speaker 2: So Bill, thank you for the journey of getting better through learning from trying new things. Now, naturally as we learn, we are going to come across things we don't know of and let's bring the team environment here. Is there the art of asking for help?
Bill Higgins: Yeah, so asking for help is an underappreciated skill, and one of the metaphors I always use is there's ditches on both sides of the road. So there's extremes which are both bad. And so one ditch is when you just ask for help automatically without actually trying to do something yourself. Somebody gets a problem and they immediately say, "Help me do this thing." Well, if you do that, number one, you're not going to learn anything. Number two, people are going to think you're not serious. The ditch on the other side of the road is when you wait too long to ask for help, and that one is less understood. So I think the first one, people understand when you hear about success, it's like try to muddle through yourself to a certain extent because that's one of the ways that you learn, by actually trying a bunch of things that don't work. But the other one is less obvious, and I actually have a story around it. So about, let's see, seven years ago, a company approached us about a partnership. And for reasons, I was the one selected on the IBM side to lead the partnership to basically use Watson in this product. I was pretty confident at the start of the project, because I knew the product really well and I was in charge of the AI stuff. But the project started going off the rails. I won't go into the reasons why, because that's not really the point, but let's just say the project was starting to go off the rails, and we weren't making progress. And so about maybe six months into the project, I had to have a meeting with the CEO of the other company we were partnering with and with the very senior person inside IBM, who was the executive sponsor for the partnership on behalf of our CEO. And the meeting was just awful. It was just clear the project was completely off the rails. We didn't have anything to show. And it's like, "What have you all been doing for six months?" 
And so the next week I had a talk with my manager and she was like, "What happened with this project? I thought you had it under control." I started telling her, "Well, this thing was wrong. That thing was wrong. This person didn't help us." And she said, "Why didn't you tell me this four months ago?" And I said, "Well, I'm an IBM distinguished engineer. We're supposed to figure hard things out." And she looked at me and she goes, "Bill, I'm a general manager. I'm much more senior than you. I ask for help when I need it." And she said, "One of the skills of an executive is to know when to ask for help." And so basically, if you put the two thoughts together, like asking for help too soon and asking for help too late, it takes you back to Agile 101 and timeboxing. So for a given task, whether it's a small task or a very hard task, give yourself a roadmap of what you think good progress would look like and then work really hard to do it yourself or with your team, but without asking for help. But if you miss a milestone, you at least should tell your leaders that you missed the milestone, what you think is going wrong and how you hope to correct it. But you could get to a point where you actually don't know how to correct it. That's what I call being stuck. And so if you're feeling stuck, that's when you go to your manager. And if you've tried to do all the stuff yourself, then you're going to say, "Look, I tried this, I tried that. I thought this would work. Here's why it didn't work." And they're going to say, "You know what? You worked really hard to get this and I understand why you're stuck. Maybe you need some help at my level. Maybe I need to ping the CEO of the company and tell him that the partner's engineers need to be more engaged than they are." So I think it's really an art about knowing when to ask for help. 
And since that happened, I've actually, again, going back to the deliberate practice, I think I've become really good at knowing when to ask for help and things just move much faster. We've had a much greater impact ever since then, because the folks above you want you to succeed. If they're good, they don't penalize you for asking for help. They actually appreciate that because it helps the whole thing go faster. I mean, in some ways, that's their job. Their job is to set strategy, set strategic context, and then help people when they get blocked. That's the job of an executive to some extent and to allocate resources intentionally.
Speaker 2: In my dialogue with Kareem Joseph, my general manager, and now SVP, after we review progress, he always asks if there's anything he can do to help. He welcomes us telling him of the blockers we're having so he can help us succeed. So Bill, stay in the middle of the road, not in either of the ditches of giving up too fast without trying or taking too long by ourselves. The moment we're stuck after learning and trying, it's time to ask for that help. So Bill, now we see the benefit and importance of knowing when to ask for help. If we were to put on our mentor, manager or leader's hat, what can people do to make it easier for people to ask for help?
Bill Higgins: Well, I mean, I think you just gave a great example with Kareem. So when he asks proactively, "What are your blockers?" it comes down to psychological safety. If people think they'll be penalized for asking for help, they're not going to ask for help. So by asking you, "What are your blockers?" even though he is not saying it explicitly, what Kareem is telling you is that I expect you to have blockers because you're doing hard stuff. And so by just asking you what are your blockers, he normalizes that. He expects you to have blockers. And so that creates psychological safety.
Speaker 2: I like that.
Bill Higgins: The other thing though is when somebody does ask for help, you can't penalize them, even if you think maybe they could have done something a little bit better before they asked for help. So basically when somebody asks me for help, one of the first things I always say is, "Number one, thank you. I'm glad you trust me enough that you feel safe to ask me for help." And again, normalize that it's actually a good thing, that you appreciate that they ask for help, as opposed to, "Why are you bothering me with this? You're an STSM. You should be able to figure this out yourself." Because you want to keep that feedback loop open. And if you penalize them in any way, whether it's like a soft penalization, like a sigh, or a hard penalization, like you give them a worse performance review, they'll learn either way and they'll adapt either way. So the more psychological safety you create, the more they'll be willing to bring more sensitive issues to you, and then you can be a better helper to them. And then finally, follow through. So if somebody asks you for help, you work through the plan with them, you do this, I'll do that. And for whatever things you sign up for, make that your top priority. Because again, if you do that consistently, it's going to either create a virtuous cycle if you do the positive things, or a vicious cycle if you do the negative things.
Speaker 2: Wow, what a great capture, really to foster a culture and environment where people feel safe to ask for help and follow through on the help promised to reinforce this cycle.
Bill Higgins: The last thing I'll say is in the example with the partnership failure, my manager told me she was a general manager. She said, "I ask for help." And one of the reasons I wrote that article about that story was to say, "I ask for help." So it's also important as leaders that we say, not only is it normal for somebody who reports to me to ask for help, but make it clear that even at a much more senior level, we also ask for help.
Speaker 2: Very good point. It definitely makes people feel even safer, and makes it feel normal to do so, when they know even their leaders ask for help too. So Bill, you leveled us up in learning and asking for help. Can you take us to what makes a great team next?
Bill Higgins: It starts with team design. [inaudible] was just humble. He had this thing where he said, "It's skills, not roles." A DevOps team is a team where you've got the right aggregate set of skills. It's not like a role. Whenever I take on a mission, the first thing I always do is a team design. For instance, with the current project, Watson Core, it's core AI building blocks for mission-critical use cases. So to do that, you need a variety of skills. Of course, you need AI people, but you also need software engineers because if you think about infusing AI into mission-critical systems, it's really the intersection of AI and software engineering. There's very few magical unicorns who can be great at both. So just like with DevOps, you need some awesome AI people, you need some awesome software engineers, and they need to know, understand and respect enough about each other's craft that they can work well together. It starts with the team design. So what are the aggregate set of skills across engineering, across product management, across design, and also levels of seniority? With engineering, for example, you're going to need some senior engineers in order to really lead on the architecture. They should also be in the code base, but they're the ones who need to be accountable for the architectural decisions and leading that trade-off analysis. So that's just the basic team design. Then of course, actual people have to fit into that team design. With people, I look at two things. I look at the skills. You don't want to take a visual designer and ask them to be a machine learning engineer and vice versa. So there's a basic skills fit, but then there's also, and I know this is a controversial term, but a culture fit. The culture I promote on my team is a culture of psychological safety, inclusiveness and appreciation. 
I know that might sound like a bunch of hand-wavy kumbaya stuff, but basically if you select for people like that and then you encourage that sort of behavior, you have a team who plays together like 2017 FC Barcelona or 1986 Boston Celtics, who not only are great at playing together, but they actually love playing with one another. While that's just nice on a human level, it actually results in higher performance because when you really enjoy what you're doing and when you enjoy working with the people who you spend eight hours a day with, you actually have better performance. There is a humanist side of it for sure, but there's also just a performance and business outcome side of it. The team I lead right now, the Watson Core team, is the best team I've ever worked with, ever, and I've worked with some really good teams. The number one reason that it's better than those other teams is that it's the happiest and healthiest team. There's a really wonderful ritual that we have called Thankful Thursday, still in Slack. We actually got this from, we took it from the marketing team, so thanks marketing team. Every Thursday there's a Slack reminder that goes off that says, "@here, has somebody done something awesome this week? Think about recognizing them with a Thankful Thursday." And then if you look in the Slack channel on Thursday, you'll see, yesterday I had a goofy one. I said, "Thankful Thursday to Mike Hollinger and Alex Brooks for tolerating my brainstorming about how Watson Core might someday power industrial robots." But then there's other ones like, "Thankful Thursday to Joe for providing the best code review ever." So yesterday, I think we had a record of Thankful Thursdays. I think we had something like 30 of them on a 20-person team.
Speaker 2: Oh, wow.
Bill Higgins: And so that culture of appreciation, actually, it goes back to-
Speaker 2: It's contagious.
Bill Higgins: It really is. It really is. Contagious is a good word for it. There's cultural values which are intangible, and then there's manifestation of cultural values. So one of the cultural values I really believe in, which I got from friends of mine a long time ago, Matt Lavin and [inaudible], was "celebrate awesomeness." So human nature is that we tend to speak up when somebody does something bad. It's like, "Why did you do that?" And then when somebody does something good, that's just "thanks," or you don't even say thanks. And so typically, the natural course of things, like human nature, is you only see the bad stuff. It's like nobody appreciates plumbing until the toilet doesn't work. So the Thankful Thursday is an explicit cultural signifier that we do appreciate each other. And one of the nice things when you mix that with the Slack channel paradigm, it's visible to everybody else. So not only do you get the recognition in the Slack channel, but also you become aware of things you might not have known about. It's like, huh, that's pretty cool, and so it also helps with awareness among the team.
Speaker 2: I really like it, be intentional with showing appreciation. Bill, it reminds me of my conversation with David Lee in that we often do incident learning on what went wrong, but hey, for everything that went wrong, there are probably hundreds of things that went right, so we should take the time to celebrate the success as well.
Bill Higgins: And related to that, one of David's and my longtime mentors is a guy named John Allspaw, who used to work at Etsy, and now he works at Adaptive Capacity Labs. And one thing he told me, I remember him telling me this like it was yesterday, because it was just such a light bulb moment. He said, "Don't call them postmortems because that has negative connotations. Call them learning reviews, and don't just do learning reviews when something goes 'wrong,' do learning reviews when something goes right." So probably the single best-run, mission-critical project I've ever seen is one that David led when we had to migrate our IBM GitHub Enterprise instance, which is the largest GitHub Enterprise instance in the world, from its version two architecture to its version three architecture. Basically version two architecture hit fundamental scalability problems that just hurt reliability. Let's just leave it at that. It was a scary situation. And basically we needed to migrate to a new architecture, but we didn't know what that new architecture was. It's like a ticking time bomb, because every day people put more content into GitHub, exacerbating the scalability problem. So the longer we waited, the worse the problem got and the closer we came to disaster. David led a project over maybe about six months to come up with a new architecture, and then the thing that really impressed me was how much they practiced. They must have done 50 test migrations, and they did things like pulled out power plugs, pulled out hard drives, to just find all the possible error conditions they could imagine, and then of course, put in more countermeasures against it. So when we finally did the migration on Labor Day 2017, first of all, it was supposed to take two and a half days, and I think they got it done in one and a half days.
Speaker 2: Wow.
Bill Higgins: And then afterwards, it was scary. It was like a scary silence because we kept waiting for the other shoe to drop, but nothing went wrong or just very tiny things went wrong. And so after that, we actually did a learning review about what went right-
Speaker 2: I love it.
Bill Higgins: ...what we learned, but also what we might do differently because there were missteps along the way. And so I think it's just really important to both reflect on failures, but also to reflect on successes.
Speaker 2: Hear, hear, definitely. In the land of reliability engineering, I always say boring is good.
Bill Higgins: Boring is good, but incidents are one of our best sources of learning.
Speaker 2: Oh, yes. There's no better way of requirements and production for sure.
Bill Higgins: Yeah, I got to use my favorite quote from Richard Cook: "Success comes from experience. Experience comes from failure."
Speaker 2: That's a great quote to take us to the final segment, and here's the signature question of this podcast that goes back to the inspiration of the omelette. Bill, what is your ingredient and recipe for teamwork?
Bill Higgins: I think it really starts with humility, curiosity, and presuming brilliance. Let's just use DevOps since the audience is, sorry, you said that DevOps is probably closer to home. So what do the bad old days look like? The bad old days look like the following. The application developers are writing code as fast as possible, and they're not thinking about things like operability, observability, robustness, and the monitoring infrastructure to inform humans for resilience. They look at the system administrators as adversaries because those darn system administrators are the ones who make it harder to get my new features in production. They have these security standards, they have these change windows, etc., so they're the other. Then on the other side, the system administrators, they look at the developers and they're like, these are the folks who destabilize the systems. These are the ones who don't write enough test cases. These are the ones who have zero logging. So that when we, the system administrators, get woken up at 3:00 AM on a Saturday night, we don't know what to do, and it's incredibly stressful. Those application developers are reckless and careless. So that's sort of the natural tribalism that goes back to our hunter-gatherer days as an evolutionary survival instinct. But to get past that, you have to, again, I said humility. So you say, "How does this person create value?" That's also curiosity. And, "What might I be doing wrong that's hurting this collaboration?" Maybe that's humility. And then expecting awesomeness. So you say, "I bet the system administrator is amazing and I'm going to talk to them and understand how they create value, and I'm going to talk to them and ask them, how can I earn your trust? And how can I be a better collaborator?" In any collaboration I have these days with a new leader, I've got my spiel about how I'm committed to collaboration and learning what they do, learning how they add value. 
And then I say, "Let me know how I can earn your trust and let me know if I ever do anything that hurts your trust in me," and that's the right mindset. The other thing I'll say, which is another thing I got from John Allspaw, he said, "What's the difference between coordination and collaboration?" And I was like, "I don't know. You tell me." He said, "Coordination is when I reach out to you because I have to. That's what the process is. There's a step in the process where I have to notify you of something. Collaboration is when I reach out to you because I think by reaching out to you and pulling you into my activity, there will be a better outcome than had I not reached out to you." And so in a healthy team, people deeply appreciate the diversity of experiences and skills, and we call them superpowers, like in the Avengers movies, that different people have. They first of all treasure those skills, diverse perspectives and superpowers, and they proactively reach out in collaboration to create better outcomes, like the whole is greater than the sum of the parts. So I guess that's a rambling way of how I think about teamwork.
Speaker 2: There you go, ladies and gentlemen, the ingredient and recipe for teamwork of superheroes from Bill Higgins. Bill, thank you so much for spending the time with us today.
Bill Higgins: Thank you.
Speaker 2: And I'd like to thank you, the audience, for listening. You can also check the reference section of the podcast on Casted for references spoken of by Bill and a link to his Unlearning podcast episode. See you on a future episode.
DESCRIPTION
SRE is a team sport - without assists from team members, business and client successes would not be possible. Bill shares how we can succeed through learning, how to know when to ask for help, and how to foster an environment where people feel safe to ask for help. Lastly, he shares his ingredient and recipe for building a high-performing team. Listen in to Bill and let's level up your teamwork!