Episode 8 - Supply Chain Resiliency

This is a podcast episode titled Episode 8 - Supply Chain Resiliency. The summary for this episode is:

Supply chain has become a household term during the pandemic because supply chain issues have disrupted the availability of consumer goods, even chicken sandwiches, as Marshall pointed out in his blog post here (https://devopslamb.wordpress.com/2020/12/30/disaster-recovery-for-supply-chains/).

I'm joined by Marshall Lamb, Distinguished Engineer and CTO of Sterling, IBM Sustainability Software, to discuss what a resilient supply chain looks like, how to build it, and its relationship with SRE. Marshall also shares his guidance for practitioners interested in the supply chain industry and the ingredients organizations should embrace to build resilient solutions.

Timestamps:
[00:00 - 00:54] Intro to the episode
[01:06 - 03:30] Marshall shares the chicken sandwich story
[04:41 - 10:31] What SRE means to Marshall from a supply chain perspective
[18:27 - 24:52] What "good enough" looks like in supply chain now, and is it changing?
[25:26 - 30:45] Marshall's words of wisdom to those interested in a supply chain related profession
[33:01 - 35:43] Marshall's ingredient and recipe for the SRE Omelette

Intro to the episode
00:53 MIN
Marshall shares the chicken sandwich story
02:24 MIN
What SRE means to Marshall from a supply chain perspective
05:49 MIN
What "good enough" looks like in supply chain now, and is it changing?
06:24 MIN
Marshall's words of wisdom to those interested in a supply chain related profession
05:18 MIN
Marshall's ingredient and recipe for the SRE Omelette
02:41 MIN

Marshall: Because of the pandemic and the issues that it has created for supply and demand, supply chain's become a household word. If it takes me longer to recover my supply chain to normal operations than I have inventory on hand to meet that demand, I'm in big trouble. And that's what the pandemic has really taught us.

Kevin: Hi, everyone. Welcome back to another episode of The Making of the SRE Omelette podcast. Supply chain has been blamed for many problems during the pandemic of the last couple of years. Anything from toilet paper to chips for your car and chicken sandwiches, as my next guest has noted. Joining us today to talk about supply chain and SRE is Marshall Lamb, Distinguished Engineer and CTO of Sterling, IBM Sustainability Software. Welcome to the show, Marshall.

Marshall: Thank you, Kevin. Happy to be here.

Kevin: So, Marshall, before our guests wonder further, perhaps you can share the story of the chicken sandwich to get us started.

Marshall: Yeah, sure. So, I was traveling on the road, and I believe I was on Interstate 95 and it was lunchtime. I was hungry, so I pulled off and found a fast food restaurant. And I went inside, and this was during the pandemic, probably within the first six to eight months of it. As I approached the cash register, I noticed there was a printed sign on the cash register, and it said, "Due to supply chain issues, we are temporarily unable to serve chicken sandwiches." And I just remember pausing and thinking about that sign. Well, first of all, I really wasn't going to get a chicken sandwich. So, I wasn't worried about that. But the fact that it called out supply chain issues struck me as they're making supply chain a household word. Not them, obviously. Not the fast food restaurant. But suddenly, because of the pandemic and the issues that it has created for supply and demand, supply chain's become a household word when ideally it shouldn't. Right? Supply chain is something that we all take for granted as happening happily and working properly every day. That when we go to the grocery store and we look for our milk and bread, that it's there on the shelf. Or when we go get a new set of tires for our car, that they have our brand and model in stock. But when it doesn't work, it's like plumbing in a house. When you go into a friend's house and you go use the restroom and turn on the water faucet, you're just expecting water to flow. But when it doesn't, wow, what's happening here? Why is this plumbing all of a sudden failing? And when it really fails and you have backups in the toilet or a burst in a line out in the yard, it's spectacular. I mean, when plumbing goes bad, it has the ability to condemn a building. I mean, you're right, a bad paint job? Yeah, that looks bad. But plumbing.
When something goes bad with supply chain, things come to a grinding halt in our economy and start to affect our everyday life. So, it just really made me stop and think, wow, suddenly, because of this pandemic, supply chain has become a household word. And I'm not sure that's a good thing.

Kevin: I love that analogy. And Marshall, it is September and close to the peak shopping season. Performance and reliability are very much like what you said. Most of the time, they're transparent. But when services fail and impact the business and customer, they become front and center of all forms of escalations.

Marshall: That's right. And most large retailers make a huge percentage of their profits in the last month to month and a half of the year. So, Black Friday, the after Christmas sales, those last two months or so are really, really important for large retailers. So, if the supply chain is broken or there's something just preventing resupply of inventory, that's hugely impactful to these retailers.

Kevin: Right. Marshall, this is a perfect segue to the main purpose of this podcast, which is to understand the business impact of SRE and the culture to achieve that outcome you spoke of. Could you start by sharing with the audience what SRE means to you from a supply chain perspective?

Marshall: That's a really great question. So SRE, site reliability engineering. Obviously, that applies universally to infrastructure and the management of infrastructure. The thing about a supply chain is that it's almost like a happy mistake that it all works. Because a supply chain is a stitched together series of systems and human based processes that are owned by lots of different entities, right? So, you have customers, you have retailers, you have distributors, wholesalers, suppliers, manufacturers, and they all have to work together to make it all happen to make our economy thrive. And so, no single entity, no single enterprise, no single business owns that infrastructure end to end. You own your little bit. You own your piece. So, in the purest sense of the term SRE, there's only a portion of the entire supply chain that I, as an entity owner, as an enterprise owner, am actually in control of. And sure, in the traditional sense, I am responsible for making sure that the infrastructure running my portion of the supply chain is resilient, it's highly available, it's distributed, I have proper monitoring in place, and all the good things that SRE teaches us we should do, I should be applying to the bit that I control. But what about the entire supply chain? So, I've given this a lot of thought, especially given the influences of the pandemic. I bridge the gap between SRE and the supply chain by talking about business continuity. So, in my past history with IBM, I've been responsible for cloud operations of various products, and we had to have a business continuity plan. IBM requires us to do that, and it requires that we practice it. Well, the business continuity plan, at a high level, has two components. The first is, what are you doing infrastructure wise? Basically, technology. How do I define high availability? How do I practice rolling updates and recovery when things go wrong? But that's really infrastructure technology and technology operations.
The other half that people don't think about all that much in a business continuity plan is what about my people? What if I lose an entire data center? Well, okay, well I've lost that infrastructure, sure. And I should have the ability to fail over to other locations. But what about the people who work in that data center?

Kevin: Well, they can't go into the office.

Marshall: Yeah, what happens if they can't go into an office?

Kevin: Right.

Marshall: So, when we practice business continuity, we not only practice failing over technology, but we have our people work from home that day. Can they do their job sufficiently from home? Now, let's superimpose that idea with a supply chain. What does business continuity look like from a supply chain perspective? Well, again, I only own the bit that I own, right? I don't own what my partners do. I have no responsibility over them or my suppliers or my distributors or my manufacturers. I can't control what they do. I can through my business. I can decide not to do business with them because I don't like what they're doing. But at the end of the day, they are responsible for their own business continuity plans. Okay. Well given that, what does business continuity mean to me? And I think another way to say it is, what does a resilient supply chain mean to me, and what can I do about it? First of all, there's measurement. So, you would say, Kevin, that in any SRE practice you have to measure what you do so that you can understand if you're making improvement-

Kevin: Right. And you know what good enough is, right? Yeah.

Marshall: Yeah, exactly. What is good enough? Well, what the DevOps practice taught us, that I think translates really well into SRE, is that time to recovery is a really important metric to measure, right? When something goes wrong, and stuff always goes wrong-

Kevin: You anticipate. You expect it will.

Marshall: Yeah, right. You can't deny that it's going to go wrong, right? Something will fail. So, what are you going to do about it? How fast can you recover from it? So, we can't prevent failure, and I think the pandemic taught us that really, really well. So, how quickly can I recover? So TTR, in the supply chain world, is very important. But again, that's about what do I control? The bits that I control. And we'll talk about that in a minute, what I think that means in the supply chain realm. So, TTR is as important to supply chain operations as it is to IT operations. If something goes wrong in my supply chain, how quickly can I recover so I'm not impacting my business, or so I'm not impacting my customers? So, I think that's an important metric. But there's a new metric that is introduced in the supply chain world that in typical SRE is not measured. And that's TTS, time to survive. And let's take a large retailer, for example. Let's take a grocery store. So, I grew up on the southeast Texas coast, and we had hurricanes all the time. And of course, when a hurricane was barreling down on the coast, there'd be a run on milk and bread and other things in the-

Kevin: Water, all that.

Marshall: Water, exactly. So, well, that's a big problem if I did not anticipate that inventory hit. So, maybe I should have, as a grocer, started stocking up on more bread and water and milk. But my inventory on hand and the demand on that inventory define what my time to survive is. Survival means keeping inventory for every sale. I do not want to have an out of stock situation.

Kevin: [inaudible] of sales.

Marshall: Yeah, exactly. Out of stock situations for retailers are death.

Kevin: Well, of course, some of the food may expire, right?

Marshall: Right, exactly. And that's a whole nother problem, keeping on top of expirations and making sure you stay up on top of expiring inventory. But let's just think of it from a demand perspective. It's really important for me to measure what my typical demand is to stay ahead of that demand and make sure I have inventory to meet that demand. But what if there's a sudden change in that demand? When a hurricane is approaching the coast, the demand on certain goods accelerates beyond what normal measures show. And so, my time to survive is suddenly diminished.

Kevin: Right.

Marshall: So, it's really, really important for supply chain operators who have inventory as part of their supply chain to understand what their TTS is, what is time to survive. And that doesn't only apply to a retailer who has inventory in store to meet a customer's demand, but it also applies to distributors and manufacturers. And so, the equation that presents a problem is if my time to recover exceeds my time to survive, I have a big problem.
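
Note: the comparison Marshall describes can be written down in a few lines. This is a minimal sketch, not a method from the episode. It assumes time to survive is simply inventory on hand divided by daily demand, and the function names and all the numbers (loaves, demand rates, recovery days) are hypothetical.

```python
def time_to_survive(inventory_on_hand: float, demand_per_day: float) -> float:
    """Days until stock runs out if nothing is resupplied (assumed definition)."""
    if demand_per_day <= 0:
        return float("inf")  # no demand means inventory never runs out
    return inventory_on_hand / demand_per_day


def recovery_exceeds_survival(ttr_days: float, tts_days: float) -> bool:
    """True when recovery takes longer than inventory lasts (the 'big problem')."""
    return ttr_days > tts_days


# Hypothetical numbers for one product at one store.
loaves_on_hand = 1200
normal_demand = 200   # loaves sold per day in a typical week
surge_demand = 600    # assumed demand when a hurricane is approaching
ttr_days = 4          # assumed days needed to restore normal resupply

for label, demand in [("normal", normal_demand), ("surge", surge_demand)]:
    tts = time_to_survive(loaves_on_hand, demand)
    at_risk = recovery_exceeds_survival(ttr_days, tts)
    print(f"{label}: TTS = {tts} days, TTR exceeds TTS: {at_risk}")
```

With these assumed numbers the store is safe under normal demand but in trouble under the surge, which is the sudden drop in time to survive that the hurricane example illustrates.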

Kevin: Right.

Marshall: Right? If it takes me longer to recover my supply chain to normal operations than I have inventory on hand to meet that demand, I'm in big trouble. And that's what the pandemic has really taught us. So, there's no magic answer to how to ensure you always have enough inventory on hand no matter what the demand is because that can fluctuate widely. But it's important to measure it. What is my typical time to survive? What is my typical time to recovery? And how can I make both better, right? So, a resilient supply chain, I think given that, a resilient supply chain looks like this, from a point of view of an enterprise that owns a portion of the supply chain. First, I should have multiple suppliers, not just one. As a grocer, I should have multiple suppliers that can give me bell peppers. Not just one farmer. And I should be actively rotating between those suppliers, such that at any given point, if a supplier is impacted by some, let's say, natural event, like fires in California are affecting farms, right? So, if my supplier is in California, and they had to evacuate their farm because of a large fire, then I need another supplier to get my bell peppers from. So, it's just like IT operations. I should have multiple data centers, multiple servers that can equally serve requests. Exactly. I should do that from a supplier perspective. The other thing is I should be digitizing and requiring my partners to digitize every aspect of their business. Digitization removes human processes or it removes manual intervention, which not only makes things more efficient, but it makes it more portable. Now, if I'm not dealing with a clipboard, I could operate my business from home or anywhere or a coffee shop on a corner because it's been digitized. So, that helps with the mobility of my human element in the case of a disaster, right?
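
Note: the "multiple suppliers, actively rotated" idea resembles round-robin selection with failover in IT operations. The sketch below is one possible way to express it, under the assumption that a supplier can be flagged as unavailable. The `SupplierPool` class and the farm names are hypothetical and are not taken from any product discussed in the episode.

```python
class SupplierPool:
    """Rotate orders across suppliers, skipping any marked unavailable."""

    def __init__(self, suppliers):
        self._suppliers = list(suppliers)
        self._unavailable = set()
        self._next = 0  # index of the supplier whose turn is next

    def mark_unavailable(self, name):
        self._unavailable.add(name)

    def mark_available(self, name):
        self._unavailable.discard(name)

    def next_supplier(self):
        # Try each supplier at most once, starting from the rotation point.
        for _ in range(len(self._suppliers)):
            candidate = self._suppliers[self._next]
            self._next = (self._next + 1) % len(self._suppliers)
            if candidate not in self._unavailable:
                return candidate
        raise RuntimeError("no supplier available")


# Hypothetical bell pepper suppliers.
pool = SupplierPool(["farm_ca", "farm_tx", "farm_fl"])
print(pool.next_supplier())       # normal rotation
pool.mark_unavailable("farm_tx")  # e.g. the farm had to evacuate
print(pool.next_supplier())       # rotation continues past the impacted farm
```

Rotating in normal times, rather than keeping a backup idle, means the alternate suppliers are already exercised when one of them drops out.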

Kevin: Right. People are working remotely.

Marshall: Exactly.

Kevin: Right.

Marshall: They can do it anywhere if their aspect of the business is properly digitized. And by the way, there's a side effect of that: digitized processes also make that data, which is now digitized, readily available to visualization and analytics for end to end visibility, which is another aspect of a resilient supply chain. I need that end to end visibility. I need to know where my orders are, where my shipments are, and if they're stalled or broken or something is held up, I need to know immediately so that I can take corrective action before it becomes customer impacting. Just like in SRE, if a server goes belly up, I need to be able to recover that server before the failure affects my customers or my business. Same thing in a supply chain. I need to be able to see what's happening in the supply chain well enough that if there is an impact, and there will be, I can do something about it before it starts impacting my business. And then, of course, there's the workforce continuity that I mentioned around the business continuity plans. If all of that is in place, then my people should be able to do their job wherever they are. So, if they have to evacuate their town because of floods, or they have to leave the building because of a power outage, then they can work from home. So, I think those are the aspects of a resilient supply chain. And again, it goes back to measuring TTR and TTS, time to recovery and time to survive, and optimizing both. And once you do that, you can start playing what if scenarios. I can start modeling disaster scenarios and determine, well, what are the courses of action I need to take if that happens? So, operators can actually model and test various disaster scenarios.

Kevin: Supply chain, as you described, really sounds just like a complex system. And just like in supply chain, we need that visibility to data to drive actions, be they automated or manual.

Marshall: Yeah, exactly. If you don't properly digitize your business, not only are you probably spending a lot of extra effort and cost on manual processes, but you're losing out on an opportunity to model different outcomes.

Kevin: Right. With data we can anticipate and proactively adjust as needed versus being reactive, doing things only after the fact.

Marshall: Exactly. But just like it helps you model disaster scenarios, it can also help you model new revenue opportunities.

Kevin: Right, right, right.

Marshall: Because the more you can optimize, even shave off, single digit percentage points on time spent, you can increase your profit margins considerably by lowering the cost of the supply chain operations. So, it's not only about disaster recovery and business continuity, but it's also about mining new cost savings and profit making opportunities, as well.

Kevin: I also liked how you touched on dependencies. Everyone who has a supply chain owns a little piece, and no one has control over everything. This is very much like a complex system. I remember you coaching the team to simply anticipate that everything that can fail will fail. Sure, we can ask our dependencies to be more reliable, but what's in our control is to be better at handling those failures to mitigate or minimize the disruptions to the service that we provide.

Marshall: That's right. At the end of the day, we only control what we control. And we can complain when someone else fails at their job. We can complain all we want, but we can't prevent them from failing at what they do. All we can do is be resilient to when they fail.

Kevin: Right. And that's definitely a big aspect of how we learn from disruptions.

Marshall: Exactly. Exactly. That's why I think these learnings in SRE that apply to infrastructure and infrastructure operations apply equally to supply chain. Maybe even more so because, again, I'm only controlling my bit of the supply chain. So, how can I be more resilient in other aspects when other parts of my supply chain fail that I don't control? What do I need to do about that?

Kevin: And speaking of data, from a latency and processing time perspective, good enough in retail, especially inventory management, is measured in milliseconds or near real time. I recall for supply chain, it would be good enough if trading documents can be delivered in five to 10 minutes. Marshall, can you give the audience some examples of what good enough in supply chain looks like now? And if that's changing?

Marshall: Yeah, that's an interesting phenomenon. It's actually changing, too. Let's use a bank example. And this doesn't necessarily translate directly to other supply chains, but I think it's a good example. So, not that long ago, if I were to go into a bank branch and deposit a check, it wouldn't appear on my account. It wouldn't be credited in my account until the end of the day. Even if I went in the morning, it wouldn't happen until the end of the day. And I remember thinking, boy, I wanted to work at a bank as a kid because they closed every day at 4:30. And I was thinking, wow, that'd be great to work for a company where I get to go home at 4:30. I didn't realize that they just closed their doors, and they spent the next hour or two hours-

Kevin: Counting checks.

Marshall: ...clearing all those checks. Yeah, going through all the checks and batching them up, and then sending the information to the Central Bank or the Federal Reserve. So they're doing a lot of work behind the scenes at the end of the day in batch. So, that's batch processing of a value chain. I'll use the term value chain instead of supply chain because it equally applies to things like financial supply chains, as well as retail supply chains. But value chains, in B2B networking, are largely batch oriented. And so, the transactions, even though electronic, were fewer, but huge.

Kevin: Right.

Marshall: Typically large batches or giant envelopes or think of it as giant zip files filled with orders or invoices or even deposited checks. And they were processed infrequently. And that's just how business worked. And so, yeah, there were, up until recently, pretty lenient requirements around data transaction fidelity. Service Level Agreements, SLAs, around data delivery were in terms of tens of minutes.

Kevin: Right.

Marshall: So, if it takes up to 30 minutes to actually get a response, that's probably okay. But that's changing rapidly. And not to call out one vendor specifically, but we often call it the Amazon effect. And really, it's about online retailers, as you hinted at. We have, as individual private consumers, we have developed an expectation of knowing the status of my order at any given minute of every day.

Kevin: Right, right.

Marshall: So, I can look up where my order is, if it's being packaged, if it's been shipped, and what my tracking number is. And it even gets to the point where I can see where the truck is on the road, and how many stops that truck has before getting to my house.

Kevin: Right.

Marshall: That's an amazing level of data availability and fidelity-

Kevin: Transparency.

Marshall: ...and transparency to the end user, to the customer, to me, to the person who is outside of the supply chain, really as an individual consumer. That is an amazing lack of data latency. The data's just available, it seems like, instantaneously. And available to me, the private consumer. So, as a private consumer now, I am bringing that expectation for lower data latency and availability into the workplace, into my enterprise. Why can't I know the status of my order at any given minute? Why are you taking, Mr. Supplier, 30 minutes to acknowledge my order? And so, there is an increasing demand for faster transaction availability, transaction completion and transparency, and availability of data. And so, what comes with that is a transition, a pretty slow transition, away from batch oriented data exchanges, which are file based today or yesterday. Think of SFTP or AS2 protocols, where you're just sending over large batches of files. We're moving away from that to more API driven workloads.

Kevin: Interesting. That's a whole slew of problems.

Marshall: Yeah. And so-

Kevin: Challenges.

Marshall: Right, right. I mean, APIs have been around for decades in the software world. So, a software engineer thinks, "What? Why are APIs all of a sudden important to supply chain? They've been around for decades. Why supply chain? Why now?" Well, it's just because supply chains have been batch oriented. They've been using-

Kevin: Right.

Marshall: ...file based transfer protocols for decades. They work just fine. There's nothing wrong with them. It's just now businesses are bringing a new expectation. And with that comes a demand for different ways of moving that data around. So, think about, not to dwell on this too much, but think about what has to happen in a file transfer based, batch oriented supply chain. There has to be a file server that I'm depositing files to, or customers are pulling files out of the network and putting them in their file server. And something has to be done to that file: something has to break it up, and then push it to different places, maybe upload it to an ERP, maybe load it up to a CRM. But something has to watch for those files, and then do something with those files with another system. So, in order to do faster business, in order to have that faster transaction, the faster settlement of an order, you really can't afford to have that middleman, as it were, that file server. You've got to get rid of that, simplify your network, and have your network actually communicate directly via APIs with the transacting system. And that not only simplifies the path, but APIs, generally, are used for smaller bits of data, not huge amounts of data, and are much more responsive. So, we're moving rapidly.

Kevin: And easier to recover.

Marshall: Exactly. Yeah, much easier to recover. And so, we're moving rapidly away from large, infrequent, batch oriented data transfers to more frequent, much smaller, more nimble API based transactions in order to fuel that expectation, in order to meet that expectation of having instantaneous transaction acknowledgement and payment settlements and data availability. So, that's what we're seeing from a technology perspective to meet that expectation.
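
Note: the latency difference between the two styles can be illustrated with a toy model. This is only a sketch under simplifying assumptions: batch runs happen at fixed intervals starting at minute 0, an order is acknowledged at the next run at or after its arrival, and an API call is acknowledged after a fixed per-order processing time. The function names and the numbers are hypothetical.

```python
def batch_ack_delays(arrival_minutes, batch_interval):
    """Minutes each order waits for the next scheduled batch run."""
    delays = []
    for t in arrival_minutes:
        # Round up to the next scheduled run (ceiling division on integers).
        next_run = -(-t // batch_interval) * batch_interval
        delays.append(next_run - t)
    return delays


def api_ack_delays(arrival_minutes, processing_minutes):
    """Each order is handled on arrival, so the delay is just processing time."""
    return [processing_minutes for _ in arrival_minutes]


# Hypothetical order arrival times, in minutes after midnight.
arrivals = [0, 5, 29, 30, 31]
batch = batch_ack_delays(arrivals, batch_interval=30)
api = api_ack_delays(arrivals, processing_minutes=1)

print("batch delays:", batch, "worst case:", max(batch))
print("api delays:  ", api, "worst case:", max(api))
```

In this model the batch worst case approaches the full batch interval, which lines up with SLAs expressed in tens of minutes, while the per-order path is bounded by processing time alone.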

Kevin: Thanks for sharing that fantastic insight, Marshall. It is really cool to connect changes in consumer expectations to changes in the definition of what success means in supply chain, and in turn changes to architecture and technology needed to meet those goals. With that, and in relation to your childhood dream of being a banker, a big part of this podcast is technical vitality. Marshall, any words of wisdom you would give to audiences you may have just inspired to look into a profession related to supply chain?

Marshall: Well, I really haven't been in the supply chain space that long. I've been in it probably four years now, and I am working with several people for whom this is their life's work, working in supply chain. And so, I learn a huge amount every day I'm working with these people. But there's one thing that I have in common with them in that the majority of my career at IBM has been on the back end of software development. I call myself a plumber, not a painter. And supply chain is all about plumbing. Going back to our resiliency talk, it's all about data movement. It's hard to move data around to different systems, many of which you don't control, and make sure everything works properly. It is an amazing collaboration of entities and individuals that otherwise wouldn't talk to each other. So, going back to my statement, it's a happy mistake that our supply chains even work. But if someone's interested in supply chain, I really do think one of the biggest opportunities that supply chain practitioners have going forward, especially from an engineering perspective, is in this area of more resilient, more nimble, faster supply chains. I think certainly an element of that is SRE, and I have a thought on really getting into the SRE engineering space equally for supply chain, as well as anything else, like IT operations. But in supply chain, in general, if someone wants to get into supply chain, I really think beyond just the engineering discipline around plumbing and understanding what it means to securely and reliably move data from point A to point B, you really need to understand the business. And I think unlike almost any other job that I've had at IBM, I've really had to understand the industry better for this job because so much of the behavior of supply chain is driven by expectations and behaviors of the companies that rely on the supply chain. It's really important to understand what's important to them.
And also, you really need to understand, when things go wrong, how it impacts them. I mean, when an aspect of a supply chain that we control fails for whatever reason, we hear it. We hear about it from our customers. I mean, if a supply chain fails, they're losing money, real money. And of course, we hear about it, and we get to understand just how impactful this plumbing is to their everyday business. I think it's really important for a supply chain practitioner to really spend time with customers. And not only their customers, their service customers, but their suppliers, their distributors, to understand all the different roles at play in a supply chain so you can understand what's important to them, and what their needs are. So, I think that's important. And certainly, when a lot of people think of supply chain, they think retail, large retail. That's the big bear in the room. But there are lots of different types of supply chains out there. There are manufacturing supply chains, there are financial supply chains, there are medical supply chains. So, there are lots of different styles and types of supply chains, and they all bring different types of requirements, and sometimes regulatory requirements, as well. So, understanding the business is imperative to being able to build a productive supply chain. But from an SRE perspective, because we have spent so much time talking about how SRE translates to supply chain, if someone wanted to get into the software side of supply chain, it's really no different than any type of highly resilient IT infrastructure you may want to get involved in, other than the fact that you just have to be aware that what you're building, from a technology perspective, is the lifeblood of these enterprises, of these businesses. So resiliency, time to recovery, is really, really important. It's always important.

Kevin: Right.

Marshall: Right, Kevin? I mean, TTR is always important no matter what, you're right. But it's particularly poignant for supply chain technology. And so, I would challenge an engineer who wants to get into supply chain to ask him or herself this question, and to ask his or her other colleagues this question. When your code breaks, and it will break, how will it break? How will you know? And what will you do about it? Now, I'm not here to imply that, as an engineer, you're writing bad code. We've all written bad code. We know what that's like. I'm not accusing anyone of writing bad code. I'm just saying your code will break because it's likely to depend on something else. In my part of the supply chain, I am expecting when I send an order to a supplier, that that supplier acknowledges my order.

Kevin: Right.

Marshall: Well, what if it doesn't, right?

Kevin: How do you know it doesn't, right? Yeah.

Marshall: My code is-

Kevin: Yeah.

Marshall: Exactly. Exactly. So, it's important to understand equally for any system that you're writing software for. But again, more importantly for supply chain, it's important to understand everywhere in your code that could break, many times because of things beyond your control. But how will it break? What will it look like when it breaks? And so, you need to understand all those points. And then, how will you know? Meaning, are you properly monitoring those points of breakage? So, if this remote system I'm dependent on, or this remote supplier that I'm dependent on, if they don't do their job, what do I need to look for to make sure I understand that as quickly as possible? How am I monitoring my system? And then lastly, once you're alerted that something's wrong, what are you going to do about it? What's your playbook? What's your reaction? What automation do you employ to correct the action? If you ask yourself and your colleagues that simple question, when your code breaks, how will it break? How will you know? And what will you do about it? You are off on good footing from an SRE perspective, not only for supply chain, but any software solution that is dependent upon a highly resilient system.
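
Note: the three questions (how will it break, how will you know, what will you do) can be applied to the order acknowledgement example from this exchange. The sketch below is illustrative only. It assumes the failure mode is a supplier that never acknowledges, the detection is a deadline check, and the response is a playbook message. The `SentOrder` type, the 30 minute deadline, and the action text are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SentOrder:
    order_id: str
    supplier: str
    sent_at: datetime
    acknowledged_at: Optional[datetime] = None  # None means no acknowledgement yet


def unacknowledged(orders: List[SentOrder], now: datetime,
                   deadline: timedelta) -> List[SentOrder]:
    """How will you know? Orders still unacknowledged after the deadline."""
    return [o for o in orders
            if o.acknowledged_at is None and now - o.sent_at > deadline]


def playbook_action(order: SentOrder) -> str:
    """What will you do about it? A placeholder for the real playbook step."""
    return (f"alert on-call: order {order.order_id} not acknowledged by "
            f"{order.supplier}; resend or reroute to a backup supplier")


# Hypothetical orders, checked at a fixed point in time.
now = datetime(2022, 9, 1, 12, 0)
orders = [
    SentOrder("A1", "farm_ca", now - timedelta(minutes=45)),
    SentOrder("A2", "farm_tx", now - timedelta(minutes=45),
              acknowledged_at=now - timedelta(minutes=40)),
    SentOrder("A3", "farm_fl", now - timedelta(minutes=5)),
]

for late in unacknowledged(orders, now, deadline=timedelta(minutes=30)):
    print(playbook_action(late))
```

Only the first order is flagged here: the second was acknowledged and the third is still inside its deadline.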

Kevin: That is great advice for both current and future practitioners. In fact, that series of questions, how we know something has failed and how we recover, are fundamental questions we have in incident learning. But I really want to echo the part you mentioned about getting to know the customer, not just what their expectations are, but also understanding how our disruptions impact them. Because only with that understanding can we design a better system that leads to client success. So, Marshall, you took us through a great journey from the chicken sandwich to what a resilient supply chain means today, and what it will mean tomorrow. Going back to the inspiration of this podcast, what would you say is the ingredient and recipe for an organization to achieve the SRE outcome?

Marshall: So, I think SRE is a natural evolution of the DevOps movement. And I got initiated in DevOps as part of an exercise when taking a product that I was responsible for to cloud back in 2013. And so, I was part of a task force of engineers who were tasked with bringing up our solution, which we had been running in our own data center, in a cloud. And what did that mean? And that's when we got introduced to DevOps. But back in 2013, DevOps was not the buzzword that it has become, and SRE certainly wasn't a notion. And I feel very, very strongly that DevOps is not a job description. And therefore, I also feel very strongly that SRE is not a job description. Now, don't get me wrong. We need strong practitioners like yourself to teach people how to practice SRE and DevOps correctly. But it's not a profession, in general. It's a practice. It's a philosophy.

Kevin: It's a hat you wear. Yeah.

Marshall: Yeah, it's something we all are responsible for. And I think whenever an organization has a group that's called the SRE team, it's doing a disservice. It's basically saying, "Oh, this other team is responsible for SRE. That's not my job."

Kevin: I'll build it. I'll give it to them.

Marshall: Right. They're responsible for the resiliency aspect, right, of my software. That's simply not true. As a software engineer, I am responsible, at the end of the day, for what is deployed to production. I am responsible for what it means to build a resilient system based on that code. Not someone else. And so, I think it's really important, it's a key ingredient, for an organization to embrace that SRE is everybody's responsibility, and we do require coaching and guidance and leadership from people like yourself. But that does not mean that we are delegating all SRE responsibility to you. That's just a recipe for failure, in my mind. So, as soon as an organization, and really the organization's leadership, it has to start with the leadership, believes in and preaches that SRE is every engineer's responsibility, that's when you start seeing real progress, real successes, in the area of improving operations and operational efficiencies.

Kevin: Love it. Shifting left the responsibility of SRE to all engineers, starting with the leadership, so we can build products with considerations and features of SRE from the start. So, there you go, ladies and gentlemen, the ingredient and recipe for SRE from Marshall Lamb, Distinguished Engineer and CTO of Sterling, IBM Sustainability Software. Thank you very much, Marshall, for joining us today.

Marshall: Great. Thank you, Kevin. I enjoyed it.

Kevin: Please also check out Marshall's blog, Enterprise to Cloud, as he shares stories of his journey to enable cloud-friendly, scalable, and profitable service. Thank you all for listening. See you again on an upcoming episode.

DESCRIPTION

Supply chain has become a household term during the pandemic. This is because issues in the supply chain have led to disruptions for consumer goods, or even chicken sandwiches, as Marshall pointed out in his blog post here (https://devopslamb.wordpress.com/2020/12/30/disaster-recovery-for-supply-chains/)


I'm joined by Marshall Lamb, Distinguished Engineer and CTO of Sterling, IBM Sustainability Software, to discuss what a resilient supply chain looks like, how to build it, and its relationship with SRE. Marshall also shares his guidance for practitioners interested in the supply chain industry and the ingredients organizations should embrace to build resilient solutions.

Today's Host

Kevin Yu

Principal SRE, IBM Sustainability Software

Today's Guests

Marshall Lamb

Distinguished Engineer, Master Inventor and CTO, IBM Sterling