Episode 17 - SRE for Managers
Marion Clelland: It's interesting you mentioned toil because for me, I don't think we've done enough SRE on the systems that managers use in the background. There is so much more toil in my job now and some of it's getting better. But yeah, I mean I'd love to set up SRE for management systems.
Kevin Yu: Hi, everyone. Welcome back to another episode of the Making of the SRE Omelette podcast, where we explore how we achieve positive business and client success outcomes via the practice of site reliability engineering. As we look to wrap up season one, I really wanted to have the perspective of an SRE and SRE manager, because managers provide leadership support and mentorship to their teams. They help employees reach their goals and shelter them from unnecessary distractions to help them focus. I'm so excited to have Marion Clelland join us today. Marion is a development and SRE manager for IBM Cloud Container Registry and Vulnerability Advisor, based in the UK. She's also a key member of the IBM SRE profession core team and the first woman in IBM to be certified as a thought leader in SRE. I have the pleasure of working with Marion in many SRE profession activities and I can think of no better person to speak to on this topic. Welcome to the show, Marion.
Marion Clelland: Cool. Thanks Kevin for having me. Excited to talk about SRE and cooking, which are two of my favorite things.
Kevin Yu: There you go. Ladies and gentlemen, you know you are in for a treat today. So Marion, why don't you get us started by sharing with us your career journey?
Marion Clelland: I can, and I guess it's just not really very traditional. I've not really gone up through a route and come out at the end in a specific place that I'd assumed I would end up. I did business IT at university. This is my 17th, 18th year in IBM and my first role was actually in operations and it was one of my favorite roles, I must admit. I sort of look back on that and go, "Wow, yeah, that was a really good role and I really learned loads of things from it." But early in your career you're very much taught to try lots of different things and so I went off and I did development after that and then it was, "Oh, so what's your career plan?" I was like, "Oh, I don't really know. I've not really thought about it." But there was, I guess, a bit of a pressure to decide what you want to do when you grow up. And so I went down what I thought was the path that I wanted to do, which was architect, and I was good at it, but I sort of got to this point where I was a bit like, "I don't really find this fun anymore." And then there was a decision point for me in terms of having a family and what do I do? So I looked for a role that made it easier for me to do that transition, become a mother, and I moved business units so that I could be closer to home and I went into technical PM work, but I sort of got to this place where I was like, "I'm not enjoying this and I don't know why I am coming into work every day and putting my daughter in childcare. I'm not really enjoying it." So I sort of had a bit of a reflection point at that time and I was like, "Well, what have I really enjoyed in my career?" And it was that first role in ops and I was like, "Could I do that, but where I am?" So in the department I was working in at the time, we'd sort of gone through a transition, we were moving into cloud, we were a software product, how do we make that a SaaS offering?
And there was this gap in that most of the people in the department came up through that traditional software development route and they hadn't done the operations side. So I could offer something a bit different because I'd done it before. And so it was from there that I started researching what SRE was, how could we implement it? And from there took it out, expanded it, we had a pilot, we grew it and I set it up for the department, which was a really cool thing to do because you could sort of use some of your past experience, but also it was like, I don't know, 16 years on in my career and things had changed. So there were all these new practices, SRE was a bit of a buzz and I could bring those things to the team and that was really cool. Then I guess management, we'll talk about how did I go into management after that? It's a separate question.
Kevin Yu: What you did there was such a great highlight of how we should often reflect. I mean, as part of SRE, we do incident learning and reflection, and what you just spoke of in your career journey is a really good example of how we should also reflect on what we enjoy about our work, what we don't, and do something about it.
Marion Clelland: Yeah, yeah, you go to work for so many hours a day, I really think you should enjoy it.
Kevin Yu: Great role model there.
Marion Clelland: Thank you.
Kevin Yu: So what made you consider to go into the manager's role?
Marion Clelland: So as you can see from my past, I don't really stay in a role for very long, which I don't know if that's a good thing or a bad thing, but for me it sort of helped me build something that I just really enjoy doing. As I said, if you're coming to work for that many hours a day, you might as well enjoy it whilst you're there. And so I think really for me it's just trying lots of different things, understanding what I'm good at, what I don't really enjoy, and how can I create a role where I'm doing things that I enjoy most of the time and it feels like I'm adding some value. And so I know that to a lot of my sponsors, moving away from SRE (and I haven't fully moved away from SRE, but moving away from a technical hands-on SRE role) seemed quite a rapid decision, but it honestly wasn't. I'd done a training course called Insight into Management I think maybe three years before I made the move. And it was just something that I started to naturally move towards. I was in a leadership role, I was leading a team. There were things that I would do as a team lead that perhaps not all other team leads would do. So I would hold one-to-ones with everyone in my team, which felt like a bit of a manager thing to do, but for me it was important to understand were they enjoying what they were doing? Was the stress of the on-call too much? Were there things that I could do to make things better for them? And I'm very much of the servant leadership style of management. And so all those sorts of things were coming together and for me it just became, "Well, this is where I want to head next." But like I said, to my sponsors it appeared quite a rash decision. I think it was maybe the first time in my career that I'd started to be considered as an SME in a particular subject matter. So to then go, "Oh, that's it, I'm going to throw in the towel. I'm going to go and move over to management," maybe seemed a bit weird.
I was the first woman and I think still perhaps the only woman in IBM to be certified as a thought leader in SRE. And for the work that I'd done on the SRE pilot and setting that up, I'd won a technical industry award. So I was starting to get a really big name in SRE and then, oh no, and now I'm off and I'm a manager. And I think there's also a bit of a pressure as a woman, maybe this is just me, I don't know, to stay technical and not to then be one of those women who just goes off and does a management role. I don't know, I had a bit of an internal fight with myself. But like I said at the beginning, it was just I wanted to do something that I found fun and I enjoyed, and I think I just felt naturally I was good at the role and wanted to give it a shot and see how I got on. And that's really how I've done most things in my career.
Kevin Yu: Wow. Marion, I'll say it again, amazing role model you are. Those are traits of a great leader and person. And Marion, I'm sure you'll inspire more women to follow your path. Speaking of liking what we do, I recall having a conversation with our friend Jerry Cuomo. I think he caught me on a bad day and I was telling him, "I'm not sure I like what I'm doing." And he reminded me, just like we aim to keep toil to less than 50% of what an SRE does, "Kevin, you don't have to be happy with what you do a hundred percent of the time. You just have to make sure you do have more good days versus bad days."
Marion Clelland: Yeah, yeah. Well, so it's interesting you mentioned toil. Because for me, I don't think we've done enough SRE on the systems that managers use in the background. There is so much more toil in my job now.
Kevin Yu: Oh, no.
Marion Clelland: And some of it's getting better. But yeah, I mean I'd love to set up SRE for management systems because I find I spend so much time now typing in boxes, different boxes, same information between spreadsheets into different systems. It's like, "Oh, why have we not automated this yet?" So that's one thing. And so I'd, like I say, love to get some of that fixed. And then other things, I think you get a different insight and perspective on decision points that are made within the business that I think when you're an engineer, you just think people are doing it to spite you, but there's reasons behind it and oftentimes they're financial reasons or legal reasons. And obviously as a manager you get insight into those things and then it's a case of, well, what do I pass on to the team? What do I try and fix before they even find out that it might have been a problem? Do I tell them that I fixed it or do I just gloss over it? Because I'm here, I see my primary job as trying to make the world a better place for my engineering team to work in and why burden them with the idea that I had to go off and fix something for them? So I do find my scrum updates really challenging in a, well, what do I actually say that I did yesterday? Because I did all these things. I was trying to get this license sorted out for you because I didn't want you knowing that maybe we were going to have to change the tooling you were going to be using and that would be an interrupt to you. So I did all those things, but I don't really feel like I need to tell you any of it. So "I was in meetings yesterday" is usually my update. Meetings yesterday, more meetings today. And then the other thing is it's a bit weird not being in the weeds anymore. One of the things that I really enjoyed, and maybe I'm just weird, is I love incidents and getting involved in incidents.
I really found it was how I learned about a system and how it all hung together because when something fails, you see the knock-on impact across the architecture and you lose sight of that when you become a manager and you start seeing the incidents in the metrics that get reported and what's your MTTR and why did it take so long for you to detect that? And you start seeing it as those numbers and you start losing some of the detail. And so I find it really interesting going through the incident reports with my team just to keep my own knowledge up of how the system works. I find that fascinating because like I say, I've sort of stepped away in that I'm not hands-on, but I do still find it interesting and engaging. So I love sitting in on RCAs with my team.
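A minimal sketch of the reported numbers Marion refers to here, mean time to detect and mean time to resolve, computed from incident records. The `Incident` fields and the choice to measure both intervals from the moment the incident started are assumptions for illustration; teams define these metrics in different ways, and nothing here reflects IBM's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class Incident:
    # Hypothetical record shape; real incident trackers differ.
    started: datetime   # when the problem actually began
    detected: datetime  # when alerting (or a customer) noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas: Iterable[timedelta]) -> timedelta:
    """Average a collection of durations; zero if there are none."""
    items = list(deltas)
    if not items:
        return timedelta(0)
    return sum(items, timedelta(0)) / len(items)


def mean_time_to_detect(incidents: Iterable[Incident]) -> timedelta:
    """Average gap between an incident starting and being noticed."""
    return mean_delta(i.detected - i.started for i in incidents)


def mean_time_to_resolve(incidents: Iterable[Incident]) -> timedelta:
    """Average gap between an incident starting and being resolved."""
    return mean_delta(i.resolved - i.started for i in incidents)
```

The averages are the part that reaches a manager's dashboard. The detail Marion says gets lost, which component failed and what it took down with it, lives in the individual incident reports, not in these two numbers.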
Kevin Yu: You know Marion, that's a really interesting perspective. I never thought about that way, but it makes perfect sense. Perhaps you can get yourself, maybe you can start hosting some of the chaos testing game day events.
Marion Clelland: Yes, that would be a good idea.
Kevin Yu: Wait, you can be the one that breaks the-
Marion Clelland: I can break things quite easily. Don't worry about that.
Kevin Yu: And then challenge your team to see what did Marion break this time? Did she touch anything, right?
Marion Clelland: Yeah. Yeah.
Kevin Yu: And then it reminds me, David Lee from the CIO, they have a ridic practice discussing incident learning through storytelling and perhaps that's a practice that you and your team can get into as well. That kind of thing.
Marion Clelland: Yeah, nice. Sounds really good.
Kevin Yu: Yeah. Yeah. No, and I'll say that the toil you mentioned that you started to see when you took on this role. I remember I had to go procure some licensing for our team and oh my gosh, I did not know how hard it is to spend money.
Marion Clelland: Yeah, it really is.
Kevin Yu: I think you hit on something there. You also mentioned that you won numerous IT awards for being hands-on. Marion, I definitely see opportunities for you to win more awards for eliminating toil at the management and the finance level. I think about all the reporting: rather than a PowerPoint, how can we do those on the glass, on the dashboards?
Marion Clelland: Yeah. The amount of times you have to put the same information in three different places is unbelievable. Different templates for different audiences. I'm just like, " Oh, can we not just have one?" I don't know.
Kevin Yu: Yes. You know what, Marion, let's work on reducing that as a to-do for both of us.
Marion Clelland: Absolutely, yes. Let's do that.
Kevin Yu: Everyone's a lot better off. So Marion, I think you touched on some of those. What would you say are the top challenges that you see from your team as a manager?
Marion Clelland: I think it's hard to say what the top challenge for my team is. I would probably poll them. I've got 16 people in my team. I'd probably get 120 different answers if I asked, "What's your number one top challenge?" But a challenge for sure that I've seen in both my teams is the alerting balance. So how do you get the right balance of alerting so that your team can get sleep and rest and they're not interrupted 24/7, but also that you notice it before your customers do. And SLOs help, but I don't think they're the end there really. For me, they help in that your team understand what they're trying to aim for, but unless it's published, your customers don't know. And particularly around the question of performance and what is slow. I mean that's certainly something that we've been talking a lot about in our team at the moment because slow to us might not be slow to someone else. The whole "slow is the new down." Yeah, I absolutely agree with that. How do we know what is acceptable? And like I say, SLOs help there, but only until you're publishing them and agreeing them externally, and then it becomes an SLA, and you don't really want SLAs on performance per se. So it is really difficult getting that balance of how do I not interrupt the team all the time, but how do we notice before our customers do? What's the precise point where you need your alerting to kick in without disturbing everyone all the time? I find that's really hard, but you iterate on it, don't you? So you have a starting point. If it's not right, you try again. And I think really that's just life, learning from failure, isn't it? So with the whole reinventing myself, you do it all the time, you iterate, you learn, and you improve over time. So I'm sure we'll get there and then we'll change your requirements and we'll have to do it all over again.
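One common way to approach the alerting balance Marion describes is to page on how fast the SLO's error budget is being spent instead of on every individual failure. The sketch below is an illustration under stated assumptions, not her team's method: the `Window` type, the 99.9% target, and the 14.4 burn-rate threshold (the figure commonly paired with a one-hour and a five-minute window against a 30-day SLO) are all starting points to iterate on. A request counts as "bad" if it failed or was slower than an agreed latency threshold, which is where the team's own definition of "slow" would plug in.

```python
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical request counts over one lookback window.
    total: int  # all requests seen in the window
    bad: int    # requests that failed or exceeded the latency threshold


def burn_rate(window: Window, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if window.total == 0:
        return 0.0
    error_rate = window.bad / window.total
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_rate / budget


def should_page(long_window: Window, short_window: Window,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, so nobody is woken for a blip that has
    already recovered.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)
```

For example, `should_page(Window(100_000, 2_000), Window(10_000, 300))` pages because both windows burn at 20x or more, while `should_page(Window(100_000, 2_000), Window(10_000, 5))` stays quiet because the short window shows the problem has already eased. Raising or lowering `threshold` is the iteration she mentions: you start somewhere, and if the team is paged too often or customers notice first, you adjust and try again.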
Kevin Yu: Yeah, exactly. I like the point you just mentioned, don't be afraid to change your requirements if that's what's needed to give the life back to the team.
Marion Clelland: Yeah, absolutely.
Kevin Yu: I think over time having dialogues with our customers, we would come to a consensus. What is the definition of slow? Yes.
Marion Clelland: Yeah, we'll see.
Kevin Yu: Let's hope. So let me turn that around. What is the top challenge for you?
Marion Clelland: So that's an interesting one too. So I think something that I've slowly learned to do over time, and I still find it hard and it's different now as well, that I'm in a management position is I care a lot about the work that I do. And for me, I can find that comes out as quite an emotional response sometimes. Particularly when I was doing my previous role and I was setting up the SRE pilot, setting up a new team, there were sometimes management decisions that came down. And again, I wasn't a manager at the time, so there were all these decisions coming down around finance and shape of the organization that I didn't quite understand. And I got really quite upset about it to the point that sometimes I would be in tears, but it's because I really cared about the team. I cared about what we were doing. And for me that's been quite difficult. And now as a manager, I get different insights into things and I also get insights into people's personal lives, which is great because the team trusts me enough to share things with me. But then it makes it quite hard on me because you're sort of carrying this load of, " Oh no, this person in the team is struggling because of this thing, and am I doing the right thing for them and do I need to change the way that I'm interacting with them?" And there's all these questions all the time. And so for me, caring about your job is a really good thing, but you also need to learn how to compartmentalize, put work aside, switch your laptop off at the end of the day and actually leave it off. I'm getting much better at it. I used to be dreadful. I used to still be online late at night, but now it is very much laptops shut. I need to stop, I need to do something else because it just gets too much otherwise.
Kevin Yu: You get emotional because you care. And I think that's, again, a trait of a good manager, a good human being, right?
Marion Clelland: Yeah. Yeah. It's all about being human, isn't it, at the end of the day?
Kevin Yu: Yeah. Yeah, it is. And having that empathy, like you said, having that perspective, how does your team feel? And I know personally I contemplate the management role and I think what you mentioned about that being able to regulate emotion, compartmentalizing. Have you found a way to do that better?
Marion Clelland: Oh no. I don't know. I don't think there's any magic solution around it beyond, you just have to do it. You have to force yourself to know today I'm switching off my laptop and I'm going to go do something else because I have to for my own mental health. I don't think there is an easy way of doing it beyond just being disciplined and switching technology off. If I come up with something, I'll come back to you though, Kevin.
Kevin Yu: Sounds good. We'll do a sequel after this episode.
Marion Clelland: Yeah, sure.
Kevin Yu: And I think what you mentioned about is really helping your team to, and maybe set a good example of if you're not on call, put your work aside, have your life back?
Marion Clelland: Yes, please do.
Kevin Yu: Yes. So let's touch on this: at least from my side, I see a lot of measurement on, hey, we need to improve our efficiency at scale, which gets into how can we build a team that's more efficient, more effective? Lots of kumbaya stuff there. But what is your definition of an effective and impactful team?
Marion Clelland: It's a difficult question to answer, but I think it's one where people feel that they can challenge each other's ideas in a safe environment. Disruption can be a good thing. And I think in my team I've got some really good strong personalities and they're not afraid to challenge one another constructively. And I think a lot of that comes from diversity. Not to steer the conversation away from SRE, but I think diversity really is a big challenge for us at the moment. And getting more diverse teams is a way of increasing our productivity. And it is not even the obvious, gender and things like that. It's just different skills and different life experiences and even moving from a different business unit. So I was talking to a colleague just the other week, they were less technical than the rest of their team. They'd come in from a different business unit and they were like, " Oh, I don't know what I'm bringing to this team. They're all so much more technical than I am. They're deeper in this, they understand these things." And we had a bit of a conversation about it and there were other things that this person was bringing to the team. They were bringing more organizational skills than the rest of the team had. They had far more client insight into how products were being used. And it wasn't until those missing skills had been brought into the team that the team realized that they were actually needed. And I think until you bring in lots of different people with different backgrounds and different experiences, you don't actually understand that the diversity is missing. You need your team to feel free and safe to experiment, and you get those new ideas from different people coming in from outside and going, " Have you thought about doing it a different way?" So that sort of thing I think is really important and we do really need to consider it.
Kevin Yu: I really like that because if you just have people who have the same background and experience or previous roles, they will all be asking the same questions. And I remember one of the previous guests, MP English from Google, when they got a new intern who joined the team, one of the first questions was like, "Why do we do things this way?" If we don't have a good answer for it, that leads us to challenge the status quo, which I saw was reading. And I think that's really highlighted by what you just mentioned. I really like how your definition and measurement of impactful and effective is really about how we drive that disruption, how we are challenging that status quo and finding a better way to do things.
Marion Clelland: And I think we need to reward it as well. So when you have those disruptive moments and people challenging things and bringing in new ideas, how do we reward that? Someone going and fixing some tech debt, it doesn't always go noticed, but recognizing it to the team, these accomplishments happen, we've got rid of this tech debt, they're important things to celebrate as well as the features and as well as the reliability improvements. It's those other things that you also need to reward the team for as well.
Kevin Yu: Hear, hear, and I'm going to go off the script here.
Marion Clelland: That's fine.
Kevin Yu: So this is an interesting area and I think you were just mentioning you miss being in the trenches, in the fire, putting out the fire.
Marion Clelland: I do.
Kevin Yu: I would say for my own career, I have definitely been recognized a lot more often when I was the one, " Hey, there's a big fire. Who do we call? Kevin?" Kevin went in, put out the fire, get recognized.
Marion Clelland: Well done.
Kevin Yu: I benefited from a lot of that. However, as we look at the SRE, it's really about, " How do we prevent that fire?"
Marion Clelland: Absolutely. Yeah.
Kevin Yu: What are your thoughts on how to actively promote and recognize people who prevented the fire?
Marion Clelland: It's difficult because you don't get a lot of time to do it. That's the hard thing. And really I think it's allowing the team to feel like they can do it and not get in trouble. So we've got all these deliverables, yes, we've got all these incidents that we need to manage. Lots of challenges around time management and getting things in plan, but how do you also carve out the time to do those other things? And people don't always feel like they're allowed to, like there's some permission to do it because it's not on a plan. So if you go off and do something else, it feels like you're not doing the right pieces of work to move the project on, and that's all valid. Trying to have conversations with a team in, could you spend 10% of your time? It doesn't have to be a lot of your time, but 10% of your time doing something that you think will help these things. Is there PRs that you can clear up that have been hanging around for ages that are debt that we've just not sorted out? Are there issues that you could raise that you could then put into our backlog so that we could fix it in the future if it's too big to just fix in an afternoon? Because unless you highlight the tech debt in your plan somewhere, it will always go unnoticed and no one will ever fix it and execs won't know that it exists. So you have to be upfront about it and show that it exists for your team to be able to then have the permission to go and fix it. And then rewarding it, I think it needs to be bundled in with the same way that you reward feature development. They should be on a par with one another because they're all things that drive the business forward. They're all things that are important for us to do so that we continue to make money as a business. And if you don't reward consistently, they don't get done.
Kevin Yu: People are smart, people are going to change their behavior based on what's being recognized, right?
Marion Clelland: Yeah, absolutely. Yeah.
Kevin Yu: So I really like that, creating that safe environment definitely is first and foremost. I like what you called out about perhaps spending time, maybe you don't completely solve it, but you raise it and bring it front and center so it gets prioritized for the next sprint sessions. Wrapping up this segment then, Marion, what would be your definition of a good SRE manager?
Marion Clelland: It's a tricky question. I don't know. Is it different to other managers or is it the same as other managers? So I guess there is an element of understanding, especially where I've been in the role before, understanding that there is a stress around being on call, having the pager, getting that alarm going off in the middle of the night and your phone is ringing and you're like, "Oh god, what is happening?" It seeps into your home life. And so I think there is an element of understanding that. It's not the same level as emergency services, doctors, nurses, that sort of role. But equally, you don't know who's on the other end of your system. As much as you might put on your product, "This can't be used for life-saving critical systems," you don't know if it is or not. And so there's always that element of stress in a, well, I don't know all the customers, particularly in a cloud environment, anyone could be using it. And particularly in IBM Cloud, it's big banks, it's massive amounts of money. And so there is a stress there. And so understanding that as a manager is really important because you can empathize, you can talk through ways to deal with it. It is a part of the role. It's not something that you can just go, "Well, I just don't want to do that part of the role," because it sort of means you're not really doing an SRE role. So there is that side of it, but I think it goes back to what we said at the beginning around it's all about being a nice, good human being. And I think management is a lot like that. It's being consistent in the way that you deliver messages. It's being authentic in who you are. I think one of the nicest compliments I had from someone in my team was, "You've always been the same person. So from the day that you hired me to this conversation right now, I've always known who you are and you've always spoken to me in the same way." It's an element of no surprises.
I mean, no one wants to go, " Oh, is Marion having a good day today? Or is she in a bad mood today? I don't know if I should bring her this problem today or tomorrow because how is she going to react to it?" You don't want to be that person. You need to be someone who can build that trust because once you've done that, it feeds into the blameless culture. People will bring you their problems. They will say, " We had this incident and oh, actually it was because I pressed the wrong key." Okay, well that's great and let's fix it so that no one else in the whole organization can press that wrong key because it doesn't come down to people. And now going into, I guess RCAs and things, but you need people to be upfront about how they got into a specific situation so that you can fix it with the technology solutions. And I need my team to feel that they can trust me so that we can get these problems solved.
Kevin Yu: Going back to the original point about having that safe environment, people are willing to challenge the status quo. People are willing to share perspectives and insights to really drive that change. So thank you so much, Marion, for the insights on what makes a good SRE manager. I would say just a good manager and human being in general. A big part of this podcast is to drive tech vitality of SRE. And I know this is an area you're passionate about as well. You know that one of the biggest challenges practitioners have, and we often hear, is, "I simply don't have time. Marion, Kevin, I'm so busy putting out fires. I don't have time to learn. I don't have the time to get a certification." Do you see that challenge as well? And if so, how have you helped your team address it?
Marion Clelland: I 100% see that in pretty much everyone in my team. Yes. And there's mandatory education that comes down as well as all the other things that you want to learn. And so it's how do you prioritize all those things and your day job and responding to the incidents. And there isn't a magic answer to this problem. It's discipline, I'm afraid. But there's a mindset to it as well, I think. So there's the premise in investing where you invest little and often and you see big gains. And that's how I see skills building really. And we are in a skills-based market, so we reward based on market value of skills. So ensuring that your skills are up-to-date is incredibly important for your career and you should be making time for it. But yes, it's hard. So I've talked about making time: an hour of your working week is only 2.5% of the time you're in work. It's not really a massive amount of time. So you can make time for it, you make time for other things. So make time to build your skills, block the time out in your calendar. I try to encourage my team, and I don't think they all do it, but I do encourage my team to block out some time in the afternoon on a Friday. If you don't work on a Friday, a different day. Get off social media, close your email down. Having that agreement with the team also helps because if you know everyone else in the team is doing it on a Friday afternoon, then you're less likely to interrupt each other. And I think there's also an element of does my manager allow me to go and do this? Is this allowed? I don't know. Should I really be actually working on this Epic instead? Because that's the most important thing. We're getting a lot of flak about it, maybe from execs, I don't know. But I think you also have to not just communicate that it's allowed, but also role model it yourself. So I do try to highlight to my team that I'm also keeping my skills up to date.
I will post in Slack about things that I've done, courses that I've been on. "I'd recommend you maybe watch this video because I found it really good." And there's also all the not so obvious learning, so it doesn't have to be a course. I subscribe to quite a lot of SRE newsletters. There'll be articles in there. They may be just a five-minute read, but I'll go through them. I'll filter out the ones that I think maybe the team would be interested in, maybe we've been working on some performance issues and oh, lo and behold, there's an article about some other company who's also been going through the same thing and how have they set their SLOs and what have they done? And so I'll ping it to the team so that they can have a read of it. And it's only five minutes, but it gets the team thinking, different insights, all those sorts of things. And making sure the team are pairing. It happens, maybe it doesn't happen enough, but pairing with one another and building up each other's skills is also really important. I did suggest a really wacky idea to someone in my team. We were talking about how do you go to conferences when you can't get funding to travel? And maybe we could come up with our own conference day. A lot of the conferences that you do, so SREcon and there was DevOps Institute Summit, they put YouTube videos out there. Could we curate our own conference by all submitting a video that we thought was really cool that we've seen and build it into our own little mini conference day in the office? And obviously some of that came from organizing the SRE conference last year in IBM, but I was like, wouldn't that be a really cool idea? Because everyone gets to share something that they thought would be really beneficial for the team, and you get lots of different ideas from lots of different people and could we build it into our own little conference? So I think there's ways. You just need to make it fun as well.
Kevin Yu: Yeah, speaking of fun, I can't remember where I read it. Nobody likes going to meetings, but everyone loves going to an event.
Marion Clelland: It's rebranding. That's all it is, Kevin.
Kevin Yu: That's it, right? So I love your idea of an event where we have a watch party, and Marion, you also reminded me, I had a session with Bill Higgins. He touched on the art of asking for help. And one of the best ways to show your team that it's okay to do so is letting them know that you do it too. And I think you did a really good highlight here where you're showing your team that, "Hey, I don't know everything. I am still learning."
Marion Clelland: I really don't.
Kevin Yu: And this is what I do. So I think that's a really fantastic way of, again, being a good role model. So Marion, any hints for potential practitioners looking to get into SRE on what they can do to get ready? Perhaps you can break it down for a new hire as well as someone who is an experienced SRE.
Marion Clelland: For me, SRE really roots itself in development and engineering. So being an awesome developer is really important. And for me, there's also, I guess the difference between being a disciplined professional engineer versus coder. And that's not to say either which one is wrong or right, but I think when you're looking at SRE, you really need someone who's disciplined because they're the people who will pick out your tech debt. They're the ones who will think about clean code. They will really elevate the quality of the products that you are delivering. And it's not always just about around the edges in the way that you might do sysadmin or ops. It's how do you improve the product that's being delivered as well. And for me, having someone with a really strong base in development and engineering is really important. Beyond that, getting into SRE in the beginning, it's understanding things like the terminology. You bandy around SLO, MTTR, what do they really mean? How do you get to those things? There's loads of really good books that you can read about it. And just getting a basic understanding of what they mean and why you care about them is really important. And then I mentioned earlier about articles. I mean, I read incident reports from other companies just because it's fun. Do that. What other problems do companies have? What are we reporting about what's going on in the world? How do people mitigate incidents? What do people care about? Getting that understanding of what can go wrong so that you can improve how you respond to things, I think is really important. That's I guess where I'd sort of start. You asked about what I'd look for in a new hire. I think really it comes down to really wanting to learn. So we mentioned about building skills. It really comes across in an interview, if someone's passionate about learning and passionate about the subject, it is very obvious. It's not something that you can easily fake. 
And the most successful people that I've brought into teams have been the ones who have really come across as they just want to learn. They just want to absorb everything. Most tech skills you can teach, I don't need you to have a hundred percent of the skills that we've put on an application to join the company. If I was going back to the diversity point, I'd say to women who won't apply for jobs unless they've got a hundred percent, that's not what I'm looking for as a manager. It really isn't. I know that I can teach you these skills if you're passionate about learning. So please apply if you're doubting yourself. If you've got 70%, just apply anyway. Obviously it depends on the experience you are hiring for. So if I need a senior engineer and I need them to be well- versed in something from day one, that's a different scenario. But for the majority of roles, a lot of the skills we could just teach you when you're here, that's really important to learn.
Kevin Yu: Yeah. Marion, I remember reading your LinkedIn post on that. It is a great call-out. I will also add that, to embrace neurodiversity, the same is also true. Please don't feel that you have to meet every requirement before you apply.
Marion Clelland: Yeah, please don't.
Kevin Yu: And I think this is perhaps a call-out to all managers who create job posts to clearly separate must-haves versus nice-to-haves. This will help us get the most diverse applicants. So thank you so much Marion for that. I really like what you're saying about learning from incidents at other companies. I think that really feeds back into building that additional perspective and boosting that diversity of thoughts to help us come up with better ways to solve problems. Do you have some thoughts there to help sharpen people's skills?
Marion Clelland: Yeah, so I guess one of the things that I don't see maybe enough SREs doing is actually using what they're supporting. So there's a SaaS offering that you support. Are you using the product? Do you try it out? Do you know how your customers are interacting with the system where things could fail? What happens when you have an outage? What's the customer experience? We do a lot of design thinking in IBM, and I don't think enough SREs think it applies to them, but for me it really does because the whole reason you are doing your role is so that customers have an amazing experience using your product. And I think if you don't understand how it works for them when you have an outage, then that's a really big problem. The other thing for me maybe is, as you understood from my career path, I'm a little bit anti-SME. I become an SME, and then I go, "Oh, let's just go do something else." But I think I've learned a lot from doing that because I've had enough different experiences that I bring something different to the role all the time. And so I don't think there's a problem being an SRE maybe for software, like a SaaS product, and then maybe going, "Oh, actually maybe I will try being an infrastructure SME." It's build your skills by moving around. And even in SRE, it sounds quite niche, but there are lots of different opportunities and different things you can get involved in as an SRE. And I think as my experience has taught me, trying different things and learning lots of different experiences just makes you a stronger whatever it is you want to be at the end of the day. A human being, let's go with that again. All these experiences, they shape you and you bring something different. And I think that's really important.
Kevin Yu: I love it. So Marion, where do you think SRE is going?
Marion Clelland: Such a difficult question. I really didn't know how to answer this one at all. I guess in a way, SRE is still a little bit early on. It is a bit new, and maybe it shouldn't be, but it does still feel new. And I think we're still very much agreeing, "What does SRE mean?" I was talking to someone a while back who was critiquing the way another team was doing SRE, and they were like, "That is not proper SRE." I was like, "Well, what is proper SRE? And is it a particular way of doing it? Is it a particular set of things that you do or is it a little more like agile? Is it principles?" I don't know. Maybe there is a, "This is SRE and this is how you should do it", but for me, I really feel it should be tailored to what your business needs are because maybe ops is right for you. Maybe you don't need SRE. If you're not thinking about the business and what the business requirements are, then why are you doing it at all? And for me, that was definitely, it came out in my previous role where we were setting up the pilot because my SRE team, we didn't look at CI/CD at all. And for some people they would say, "Well, you're not doing SRE then." But we were doing so many other things that really were SRE, but the business didn't need us to care about CI/CD. It wasn't a problem. So we needed to focus on what the problems were, and that was our alerting infrastructure, to be honest. It was the metrics. We had nothing there. And so we had to really build that up and to focus then on CI/CD as well, it would've been too much of a distraction and we would not have fixed the business problem that we were there to do. And then my other thought, which might get me in trouble with some people is we started out doing SRE. I mean, in Google, and it was because engineers hated doing ops, but now we're hiring people who want to do SRE. I don't know, is that just a bit weird? 
The premise was you hated the role and you wanted to automate your way out of it, and now we're just hiring people who just want to do it. I love SRE, I think it's great, but maybe for me it's morphing a little bit into DevOps. And I don't know, maybe SRE will continue on and it will stay distinct. But maybe also for me, I think development understanding how to operate a system and operators understanding how to develop, it's really important. So I don't know, maybe it will just be DevOps in the future, or DevSecOps or Dev, whatever other acronyms we want to throw in there. Or maybe it will stay SRE. I don't know. What do you think?
Kevin Yu: What do I think? Oh my God. Tough question. Turned it around. I never had that happen.
Marion Clelland: I know.
Kevin Yu: You're good at this. So I think, in some form, there are maybe two pillars I'll put forward. I think, just from a reliability perspective, it would just transition to be without the S, and I'm taking that from my previous guest, Kyle Brown, rather than narrowing it down to site, service, or whatever you want to call it. Ultimately, we want to achieve the goal of something being reliable, and that should be considered by everyone. The other part: I think that one of the biggest culture shifts, and maybe process rethinking, that we're pushing with SRE is automation and toil reduction. You and I spoke of that at the beginning, and you mentioned so much more toil. So I really feel like, imagine, you probably wouldn't have this today if, 10 years ago, there were managers who embraced the SRE mindset. They would've said, "Oh my God, there's all this toil. Let me work my way to eliminate it." So I think in the future, I would like to see at least SRE being embraced by everybody in the organization. It's not really a job. It's really a... How would you call that? A discipline, a mindset that everybody has. People in finance, when they're looking to buy new licensed software, they should see what I think is toil.
Marion Clelland: They really should.
Kevin Yu: Eliminate it, exactly, right? Project managers. They're doing reporting updates on projects. They realize they just send spreadsheets and PowerPoints through email repeatedly. How do I eliminate that? So I would love to see us get into that spot where everyone works to embrace the outcome of SRE.
Marion Clelland: Sounds lovely. I would love that too.
Kevin Yu: Yeah. Yeah. I think we all would. Less PowerPoints and spreadsheets.
Marion Clelland: Oh, I don't know. I quite like a PowerPoint, but that's just me.
Kevin Yu: But in all that toil, if you're bringing sustainability, look at all the carbon we'll be saving, right Marion?
Marion Clelland: Yeah, too right.
Kevin Yu: So in closing, we went on our journey of improving it all: I would say improving lives, improving the planet, improving sustainability. I always like to go, in closing, back to the inspiration of this podcast, but instead of SRE as a whole, what would you say is a key ingredient and recipe for a well-managed SRE team?
Marion Clelland: I felt there was a lot of pressure on this question because I love cooking so much. My husband is really good at making omelets as well. So I was just like, "Oh no, I really have to answer this very well." So for me, the key is obviously the eggs. So the eggs for me is the psychological safety that comes from the way that you manage the team, from the leadership, from building the trust. And that doesn't obviously come from day one. It comes over the long term. And as I mentioned earlier, it's around being authentic, being consistent. So if you say you're going to do something, you do it. And if you don't, it's okay to apologize and own up to the fact that you didn't do it and to make it right. And I think for me, that builds that psychological safety. Also, obviously eggs, seasoning. You have to have seasoning, salt and pepper. So the fun, the creativity, the freedom to express and be creative in your job, to do the fun things like create that really cool bot on the side. Having the time for that, it is really important because otherwise all you're doing is turning the handle and that's no fun. You need to have some fun at work. And fillings, so you can put whatever you like in an omelet within reason. I mean, maybe putting chocolate in it'd be a bit weird, but cheese, pepper, ham, mushrooms, and for me, that's the people. So it's the diversity element, it's bringing all the different skills in, all the different personalities and also acknowledging that sometimes a filling that you want for your omelet isn't available. Maybe they're on holiday, maybe they're off doing that education that you want them to do and that's fine. And the omelet is still fine. The omelet is still great. You can take people out of that omelet, but the team still runs. And having an effective team where you don't have just one person who has all the knowledge for me is really important. 
So fillings that, they're all there most of the time, and you can choose whatever ones you want, but sometimes people have a life. They need to go off and do something else. That's my recipe for a well-managed SRE team.
Kevin Yu: Wow, that's fantastic. I really love it. It really captures the empathy, the different perspectives people bring, and creating that safe environment. People feel like they're not the only one that can solve this problem. They can go on holiday, they can have a break, and their team's got their back.
Marion Clelland: I find it so sad when someone says, "Oh, can I go on holiday? Because so-and-so is on holiday as well." I'm like, "Of course you can." If we can't feel like we can go on holiday, then we're doing something wrong in the team. We really are.
Kevin Yu: Marion. I'll say it again. You are such a role model for all of us.
Marion Clelland: Thank you.
Kevin Yu: I wish all aspiring managers listening to this have an aspiration to be just like you. Thank you so much, Marion.
Marion Clelland: Thank you very much, Kevin. That's making me blush.
Kevin Yu: We'll have you follow up with, say, additional ingredients on a future episode.
Marion Clelland: Yeah. Okay. I'll need to give it even more thought, won't I? Cool. Thank you.
Kevin Yu: I'd also like to thank the audience for listening. This is Kevin Yu, principal SRE of IBM sustainability software. See you on a future episode.
DESCRIPTION
Marion provides advice for managers and leaders to create a safe environment to foster innovation and teamwork with a diverse team in this episode - and a creative way to tackle technical debt.
Managers provide leadership, support and mentorship to their teams - they help employees reach their goals and shelter them from unnecessary distractions to help them focus. Marion shares with the SRE community challenges she has observed from her team, as well as issues experienced as a manager - including much more toil.
Marion gives her perspective on what makes an effective team and what can be done to assemble and maintain it. In addition, she offers advice on how practitioners can balance maintaining SLAs/SLOs and backlogs with keeping up with education and industry trends for career progression.
Lastly, Marion summarizes with a fantastic recipe for the SRE Omelette for a well-managed SRE team.