The Cedar Language and Policy Based Authorization with Emina Torlak

This is a podcast episode titled, The Cedar Language and Policy Based Authorization with Emina Torlak. The summary for this episode is: In this episode of Authorization in Software, host Damian Schenkelman talks to Emina Torlak, Senior Principal Applied Scientist at AWS, about the intricacies of software authorization, policies, and the Cedar policy language. Torlak delves into the philosophy behind Cedar, an open-source language for writing and enforcing custom authorization policies. They discuss the need for policy-based access control, how it separates application code from authorization logic, and the importance of user interface in managing authorization.

Transcript

Damian Schenkelman: Welcome to another episode of Authorization in Software, where we dive deep into tooling, standards, and best practices for software authorization. My name is Damian Schenkelman. Today I'm chatting about all things authorization, policies, and the Cedar programming language with Emina Torlak, senior principal applied scientist at AWS. Hey Emina, it's great to have you here.

Emina Torlak: Hey Damian, thank you very much for inviting me. It's great to be here.

Damian Schenkelman: I'm really excited about what we're going to be chatting about. But before we get started and dive deep, could you give our listeners a brief overview of your background and your current role at Amazon?

Emina Torlak: Absolutely. So, I'm currently a senior principal applied scientist at Amazon and an associate professor of computer science at the University of Washington. I work at the intersection of programming languages and automated reasoning, which is an area of computer science that's concerned with automatically proving correctness of systems, such as logical specifications and code. When I'm wearing my professor hat, I work on a programming language called Rosette. I've been developing it for about a decade or so, and it virtualizes access to an automated theorem prover. It makes it possible for people who are not experts in automated reasoning to very quickly build state-of-the-art verification tools for the domains that they care about. So, some examples include building a verifier for the therapy control software for a radiation therapy machine in active clinical use at the University of Washington. Another example includes verifying just-in-time compilers that are embedded in the Linux kernel. When I'm wearing my AWS hat, I co-lead the development of Cedar, which is a new language for writing and evaluating authorization policies. We designed Cedar to balance performance, expressiveness, and analyzability, meaning the ability to reason automatically about the correctness of Cedar policies. If you attended re:Invent 2022, that's where it launched as part of two other products, which are Amazon Verified Permissions and AWS Verified Access.

Damian Schenkelman: This is amazing. It seems you've been thinking for a while about how to add verifiability and stability to very critical things, right? Again, on the one hand, verified permissions, on the other hand, software for healthcare, which also needs to be fairly consistent and make sure that it does the right thing. It's great to see how these academic concepts are applying to more practical matters, as you said, an applied scientist. You mentioned Cedar is a policy language. What does that mean for people maybe that are not familiar with policies? What are policies? What is policy-based access control and how does it relate to authorization?

Emina Torlak: So for policies, you can think of them as programs in a domain-specific language. They're a very restricted class of programs. They take as input a principal that needs to be authorized, some action that they want to perform, the resource on which they want to perform the action, and the current context. And based on that, the policy decides if the principal is allowed to do this. For example, a user in a photo sharing app might want to access a particular photo that was uploaded by their friend. Whether they can do so or not is determined by the policy that their friend set, whether you're allowed to do it or not. Policy-based access control, I have to admit, I actually had to look that up, that's not what we call it. I had to Google it. And, there are different definitions out there. But they all get at the same three concepts that are generally useful and that resonate with us. So the first one is, when you're using policy-based access control, and in particular authorization, you want to separate your code from the authorization logic. Okay? So this has two advantages. The first one is, if you want to change your authorization policies, you don't have to change your application or recompile, right? You just change the policy, and things continue to work, and you have a clean separation of concerns, your application logic from your authorization logic. The second thing that resonates with us in the various definitions of policy-based authorization is that when you externalize your authorization into a language, this language should give you flexibility and the right concepts, the right abstractions to express the authorization notions that are relevant to your applications. So don't be dogmatic, right? In some cases it makes sense to give authorization based on roles and hierarchies, and in other contexts it makes sense to do so based on attributes.
So really, the ability to mix and match those freely is one of the things that we wanted to put into Cedar. And then, the third aspect, which may be underappreciated and is extremely important, is that, whatever policy-based authorization we provide for you to build your applications with, the thing that really matters for your customers and the users of your applications is providing the right UI. So, there is no universal UI to an authorization system that works for all applications. If we're talking about a UI for somebody to use to authorize their photos or documents, you want something point and clicky, right? You don't want users to write policies. On the other hand, if you are giving power to your system admins to write authorization policies, maybe they really do want the text interface and you provide them a nice language to do that in. So, the thing that we were thinking about with Cedar was that we wanted to provide a language that makes all of those things easier, right? So it's a layer that you can use from within your application to give a UX to the authorization system that makes sense for how your application is going to be used.
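
The idea of a policy as a small program over principal, action, resource, and context can be sketched in a few lines of Python. This is only an illustration of the concept, not Cedar syntax or the Cedar API; the `Request` class, the `OWNERS` and `FRIENDS` tables, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    principal: str   # who is asking, e.g. "alice"
    action: str      # what they want to do, e.g. "viewPhoto"
    resource: str    # what they want to do it to, e.g. "trip.jpg"
    context: dict = field(default_factory=dict)  # extra facts, e.g. source IP


# Hypothetical application data: photo owners and each owner's friends.
OWNERS = {"trip.jpg": "bob"}
FRIENDS = {"bob": {"alice"}}


def friends_can_view(req: Request) -> bool:
    """Policy set by the photo's owner: friends may view the photo."""
    owner = OWNERS.get(req.resource)
    return req.action == "viewPhoto" and req.principal in FRIENDS.get(owner, set())


def is_authorized(req: Request, policies) -> bool:
    """The application only calls this; it never inspects the policies."""
    return any(policy(req) for policy in policies)
```

Because the application only calls `is_authorized`, the list of policies can change without touching or recompiling the application code, which is the separation of concerns described above.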

Damian Schenkelman: Okay. This is very interesting and you touched on a few interesting topics and we're going to be able to dig into those over the course of the show. But, essentially you said, okay, so policies allow you to define some rules based on who's trying to access something, and what they're trying to do, and what they're trying to access. So it's like, who is doing it? What they're trying to do? What they're doing it with? And then, some context which might be someone's IP address or the specific permissions they have and a few other things. You're trying to make this authorization decision, which in some cases it might be based on someone's role, in some cases it might be based on other attributes. And that's where you touch on some topics like attribute-based access control and role-based access control. And then, you also said, "Well, we also wanted to make sure that whoever was using this as the end user was able to have a user interface that allowed them to understand what was going on there." Can you share a bit more about that? Because, I don't think that's something that, at least, I've heard a lot when folks are talking about designing authorization languages.

Emina Torlak: Mm-hmm. So, the thing that we wanted to do with Cedar is to give you a general substrate that would make this easier. So, let me talk about a more particular aspect of the Cedar language that makes UX building in particular easier, and that is the notion of templates that we have in Cedar. So you can write a generalized version of a policy, where you are, let's say, leaving some aspects of the hierarchy unspecified. Then, your user is interacting with the UI and they're clicking where in the hierarchy they want to authorize something. So for example, my managers want to see my employee records or something like that. The only thing that the application needs to do then is to turn these UI clicks into instantiations of this template. Right? So that is one example, one feature that we built into Cedar that makes building UXs in particular easier, and especially the UXs that are pointy and clicky. Now we do have applications in which people just want to write Cedar. So one example of that is AWS Verified Access, where the people who are interacting with AVA are writing Cedar policies that are essentially implementing a zero trust system for accessing corporate applications. So, you can do either thing, whatever makes sense. In the context of one application, you want to expose the text interface. In the context of another application, it really should be a more pointy and clicky interface.
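
The template idea can be sketched as follows: a policy with unfilled slots that the application fills in when a user clicks in the UI. This is a simplified Python model rather than Cedar's own template syntax or API; `Template`, `Policy`, `VIEW_RECORD`, and `on_share_click` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    principal: str
    action: str
    resource: str

    def permits(self, principal: str, action: str, resource: str) -> bool:
        return (principal, action, resource) == (
            self.principal, self.action, self.resource)


@dataclass(frozen=True)
class Template:
    action: str  # fixed when the template is written; the other slots are open

    def instantiate(self, principal: str, resource: str) -> Policy:
        """Fill the principal and resource slots to get a concrete policy."""
        return Policy(principal, self.action, resource)


# Written once by the application developer.
VIEW_RECORD = Template(action="viewRecord")


def on_share_click(selected_user: str, selected_record: str) -> Policy:
    """Hypothetical UI handler: a click becomes a template instantiation."""
    return VIEW_RECORD.instantiate(selected_user, selected_record)
```

The end user never sees policy text in this model; the only thing the UI produces is the pair of values that fill the template's slots.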

Damian Schenkelman: That makes sense. And, it seems that this is one of those things that you did to make sure that you address expressiveness in the Cedar system. Whenever I see languages developed, particularly in an enterprise setting, I find it interesting to understand more of the background and the thinking process. So you said, "Hey, we're looking for performance. We're looking for expressiveness. We're looking for analyzability." What made you create Cedar? What were you able to do maybe with other tools and other languages and what made you say, "We're going to have to write a new one because we think this is the way to go"?

Emina Torlak: Yeah, it's a really good question and it's an interesting story, because we resisted building the new language for the longest time, because doing so is just not a thing to be taken lightly. So, for a very long time, customers were coming up and asking us for help. They were saying, "Hey, you guys built this IAM language. And it seems to work really well for protecting AWS resources. We have our own resources that we want to protect and it's been a struggle." We really have customers come and tell us, "We have tried to build our own in-house authorization system three times. We're terrified of it. It's hard to scale. It's hard to get the language right. Can you help us do it?" So, having heard this enough, we took a look at the current landscape to see what's out there and whether we could just direct people toward an existing solution and say, "Go use that. We're 100% confident it's going to satisfy your needs." What we essentially found is that in the current landscape you can place the current solutions along an axis that's... Actually, okay, so it's a three-dimensional space, where on one hand you are considering the performance, how fast the authorization runs, can you actually bound the latency when somebody makes an authorization request? On the other hand, on the other axis, you can place the expressiveness of the language. Okay, can I express everything that I need to say? All right? Can I talk about role-based access control? Can I talk about ABAC? Can I talk about other things? And then, the third dimension is, is it actually possible to prove properties of these policies? Now, this last analyzability dimension sounds very academic, but it is something that we are currently doing behind the scenes, behind the tools that are very popular among IAM users in particular. So, the IAM Access Analyzer and S3 Block Public Access. Under the hood, they're using a theorem prover to establish properties of IAM policies.
That is one thing that customers really wanted. They came to us and said, "Can you give us that?" Right? "In whatever solution you come up with, we really want this functionality because it has helped us with compliance, with all sorts of things. It really makes our lives easier." So, on one hand you have the languages that are extremely expressive, and that's very good if you need to be super flexible. But, the price of expressiveness, and this is true with programming languages in general, nothing specific to authorization, but the price of expressiveness is that the more expressive the language is, the fewer things you can guarantee or say about its performance, right? Classic example is if your language includes loops, you can't even guarantee that a program terminates, right? And on the other hand, if you want to have really good performance, you really have to limit your language. What you can say in the language is pretty limited. And the third dimension, analyzability, is just like that. If your language includes certain features in order to be expressive enough, chances are you won't be able to analyze it. In technical terms, it means that the analysis problem for the language becomes undecidable. It's not actually possible to write an algorithm that determines whether a policy can do something or not. So, what was really missing in this space is this point in the middle where you try to balance these concerns, where you get just enough expressiveness, it's not super expressive, right, you get just enough to cover the basic authorization use cases, RBAC, ReBAC, ABAC. And, it's sufficiently constrained that you can put bounds on performance. You can say something about latency, how long an authorization call is going to take. And, you can also analyze these policies automatically.

Damian Schenkelman: That makes sense. And, I have a couple of questions. I think the first one is, you talk about performance, and you also contrasted that to expressiveness. So you said, "Hey, the more expressive something is, [inaudible] you can make, which also factors into performance." When we talk about performance, are we talking about performance of the entire authorization decision, or are we talking about performance of how long the policies take to run? And this naturally depends on what the policies are doing and whether they are doing another thing. So, how do you folks think about it?

Emina Torlak: So, you are absolutely right. There are two dimensions. So, the dimension that's directly under Cedar's control is, once you have done the hard job of deciding which policies to evaluate and which data you need to pull in to evaluate them, then we give you bounds on performance, on how long this evaluation is going to take, how long we are going to take given all the data in order to make an authorization decision. And, for some typical use cases with hundreds, thousands of policies and entities, our authorization latency is less than one millisecond. Okay? So, it is super fast, super cheap. But, like you pointed out, there is a much bigger problem surrounding this inner core problem, which is fetching the policies and fetching the data. In the case of Cedar, that is something that the applications or services that are built on top of Cedar are responsible for doing. So, our interface really stops at that authorization and evaluation layer. So after you have gathered all the data and you give it to us, we're promising to come back in the typical case in under a millisecond.

Damian Schenkelman: That makes sense. Yeah. It's usually hard to make commitments for things that you don't control, and particularly in very complex systems. The other thing I was curious about is you mentioned some things about being able to verify what your policies essentially allow you to do, what things will or won't happen. This seems very interesting, because the industry talks a lot about SIEM, which is Security Information and Event Management. And essentially, it's after the fact, figuring out what things happened and maybe doing anomaly detection, and who might have access to things that they shouldn't have. Making sure that you're very responsive there. But this seems to be more preventive, making sure that things that shouldn't ever happen, don't happen. How did you folks come to think about it? Because, as you said, one thing is, "Hey, customers came to us and said we want help implementing these capabilities." But a very different one is learning, "Hey, these customers actually need this capability, even though they might not have asked for it because they might not know it's even possible to do."

Emina Torlak: Mm-hmm. IAM has already been through this process. And, through this process, they decided to build the IAM Access Analyzer, which tells you in simple terms what your IAM policies allow, and you can write scripts to process the data from IAM and decide whether you have any security holes in your system or not. Same thing for S3 Block Public Access. So, within that context, the basic property is, "Whatever I write with my policies, I really shouldn't make my buckets public." So that's another thing that we could check fully automatically. So, in the context of Cedar, it's actually a good question what it means to verify Cedar policies. So, the thing that both IAM Access Analyzer and S3 Block Public Access have in common is that under the hood they are both using the same engine, which is called Zelkova, and it takes as input policies in IAM and translates them to logical formulas, which it then analyzes. So, we have the same kind of engine, call it the equivalent, for Cedar. But the really good question is, how do you expose this capability to customers? For IAM, you have the domain knowledge to use that makes sense for all IAM users. You know what an S3 bucket is, you know what an ARN looks like. In Cedar there is no such thing. Exactly because you bring your own resources and your own attributes and so on to the table. So, the question is, what are interesting properties to check about Cedar policies? And, we are talking to customers and figuring those things out. But, to give you an example, one thing that people have been pretty excited about and have been asking us for is checking the equivalence of two policies. So, imagine that you had a deadline, you wrote a bunch of policies, you tested them, and they seemed to be doing what you want. They seem to be working. There are 10 of them and they're super nasty and ugly. So the deadline passes and you want to refactor them, right?
You want to rewrite them so they look nice, they're easy to audit, they're easy to understand, and you get it down to three policies. Now, how can you be sure that the [inaudible] policies that you had and these three beautiful policies that you've written are actually doing the exact same thing? They will disagree... They will agree on every single request that comes in. If one says yes, the other is going to say yes and vice versa. So, this problem, equivalence checking, is something that we can do with the automated reasoning that we have built into that.
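
To make the equivalence property concrete, here is a small Python sketch. It does not use automated reasoning the way the discussion describes; as a stand-in, it exhaustively enumerates every request in a deliberately tiny, made-up request space (`ROLES`, `ACTIONS`, `OWNER`) and reports the first request on which two policy sets disagree. All names and policies here are hypothetical.

```python
from itertools import product

# A tiny, finite request space so that every request can be enumerated.
ROLES = ["admin", "editor", "viewer"]
ACTIONS = ["read", "write", "delete"]
OWNER = [True, False]


def original(role, action, is_owner):
    """The 'nasty' version: many overlapping rules written under deadline."""
    if role == "admin":
        return True
    if role == "editor" and action == "read":
        return True
    if role == "editor" and action == "write":
        return True
    if role == "viewer" and action == "read":
        return True
    if is_owner and action == "read":
        return True
    return False


def refactored(role, action, is_owner):
    """The cleaned-up version that is supposed to mean the same thing."""
    return role == "admin" or action == "read" or (
        role == "editor" and action == "write")


def buggy_refactor(role, action, is_owner):
    """A plausible-looking rewrite that accidentally lets viewers write."""
    return role == "admin" or action != "delete"


def find_counterexample(p, q):
    """Return a request on which p and q disagree, or None if equivalent."""
    for req in product(ROLES, ACTIONS, OWNER):
        if p(*req) != q(*req):
            return req
    return None
```

Enumeration only works because this request space has 18 elements. With open-ended entities and attributes, the space is far too large to enumerate, which is why a solver-based analysis over logical formulas is used instead.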

Damian Schenkelman: That makes sense. So, that also goes to what we were saying earlier, rather than sampling for a number of very large combinations for inputs, you can actually mathematically verify that these two things are equivalent, which gives you much bigger safety. That's very interesting. You also mentioned things around... You give us your model, by that you are expressing your authorization, not your authorization model with Cedar data. And that's also what makes some of these things challenging. We were facing some similar challenges while developing some of our own authorization solutions, with [inaudible], right? Which is, you are telling us all of your nouns, right? You have your folders, and inside you have your files. But, we don't know what they mean and they might mean different things to different companies working on these projects even if the noun label is the same.

Emina Torlak: Exactly. Exactly. So, that's really one challenge of building generic tools, tools that people are supposed to customize, is then, it becomes difficult to provide universally useful tooling, and environments, and ecosystem for that. You have all of these interesting usability questions that come up. What are the properties that every Cedar user is going to care about? So, equivalence might be one of them. Maybe that's not the first-order concern, but that's one of the things that we really look forward to finding out as more people use the language and interact with it.

Damian Schenkelman: So you've been talking about policies, [inaudible] fidelity, performance bounds, being able to know that what you have there has some guarantees. What was the process of developing Cedar like to achieve those properties? And what tools did you and the team use to get to those results?

Emina Torlak: So, one of the things that we have focused on from the beginning, in addition to performance, the thing that we really care about is safety. So, you are putting Cedar on the critical path of applications, critical to security, which means that we wanted to be really sure of two things, actually three things. Okay? So, thing number one is... Which I'm going to jokingly call the human computer interface, which is, we have to write the specification of the language in a way that is completely clear, so that you as the human, when you're translating your intent, right, whatever is in your head into the policy language, are less likely to make mistakes. Right? You have to understand what the specification is actually saying. And it has to be unambiguous. Okay? So, this is why we settled on writing a formal specification for Cedar and I can talk more about what that actually means. The second one is, once you have this formal specification written down, there are properties of Cedar that we wanted to prove. So, we wanted to be 100% sure that those properties hold about the design of the language, okay, the specification. So, one example is very simple and that better hold in any authorization system, and that is that the default behavior of the system is deny. Okay? If there is no explicit permission given to do something, if you are asked whether you can do it, the answer should be no by default. We can prove that. In addition to those generic properties that have to be true about all authorization systems, we also have properties which are unique to Cedar and related to performance.
For example, the syntax of the Cedar language is designed in such a way that there is a part of the policy which we call the policy scope, such that, if you use the information from the policy scope to do indexing, and to retrieve policies based on the context of the scope, you are guaranteed to pull in all the policies that you need in order to make a correct authorization decision. Okay? So your mental model when using Cedar is if you have a million policies, all of them get evaluated to make an authorization decision. That's your easy mental model as a user. Of course that's not going to happen in real life, right? Somebody has to select a subset of those policies that's actually relevant to that request and evaluate only those, because otherwise it wouldn't scale. So how to select that subset is a problem that we call policy slicing and we wanted to prove that the policy slicing algorithm that we propose based on the syntax of the Cedar language is sound. So you're guaranteed to pull in enough policies. So, that's another example of a thing that we wanted to prove about the design of the language, so the actual spec. And finally, the third thing that we wanted to ensure is that what we actually implemented in Rust, so this is how we get the good performance, our implementation is in Rust, actually matches the language spec. Right? So, if you have a clear spec and you prove things about it, but who knows if your implementation matches it or not, we wanted to bridge that gap as well. And we do bridge that gap with a technique called random differential testing, which has been used very successfully to test compilers. So, it was pioneered in the context of testing SQL compilers. And, that is what we use to establish the correspondence between our Rust implementation and our formal specification, which is written in a language called Dafny.
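
The two properties mentioned here, default deny and sound policy slicing, can be illustrated with a toy model in Python. This is a simplified sketch under assumptions of my own, not Cedar's evaluator or its actual slicing algorithm: scopes are exact matches or wildcards (`None`), the index is keyed on the principal only, and a matching forbid overrides any permit. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    effect: str                 # "permit" or "forbid"
    principal: Optional[str]    # scope; None means "any principal"
    action: Optional[str]       # scope; None means "any action"
    resource: Optional[str]     # scope; None means "any resource"
    condition: Callable[[dict], bool] = lambda ctx: True


def scope_matches(p: Policy, req: dict) -> bool:
    pairs = ((p.principal, req["principal"]),
             (p.action, req["action"]),
             (p.resource, req["resource"]))
    return all(want is None or want == got for want, got in pairs)


def decide(policies, req: dict) -> str:
    matching = [p for p in policies
                if scope_matches(p, req) and p.condition(req["context"])]
    if any(p.effect == "forbid" for p in matching):
        return "deny"           # a matching forbid always wins
    if any(p.effect == "permit" for p in matching):
        return "allow"
    return "deny"               # default deny: nothing explicitly permitted


def build_index(policies) -> dict:
    """Index policies by the principal named in their scope."""
    index = {}
    for p in policies:
        index.setdefault(p.principal, []).append(p)
    return index


def slice_for(index: dict, req: dict) -> list:
    """Policies whose scope could match: exact principal plus wildcards."""
    return index.get(req["principal"], []) + index.get(None, [])
```

The soundness claim for this toy slicer is that `decide(slice_for(index, req), req)` always equals `decide(all_policies, req)`: any policy left out of the slice names a different principal in its scope, so it could never have matched the request anyway.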

Damian Schenkelman: Okay. So, there is a lot to unpack here. Let me try to be brief, and naturally correct me if I didn't get anything right. So, you folks said, "Hey, we're going to start by formally defining the language. That means, no coding, no implementation, let's just make sure that the rules for the language, the grammar, what is valid to express can be mathematically expressed and we understand what we might do here." So, that's step one, conceptually and, in quotes, all on paper. Then you said, "Okay, I'm going to start using Dafny to guarantee some of the properties for that language specification." And this is where you started going into things like, "Hey, default deny, make sure that that's a thing." But also, the rules around scope. So scope is... If you specify your principal, your action, and your resource as, I'm going to use air quotes though no one can see them, a pre-filter, then that's what allows the Cedar engine to figure out what policies absolutely need to run in order to make that authorization decision. And that's one guarantee. And once you folks had the specification, once you had the Dafny tests, making sure that everything was working, only at that point you went on and said, "Okay. Now we're going to go write some Rust code, which is to [inaudible] and you also guarantee performance upper bounds, no garbage collection, making sure that you folks know when memory is allocated, et cetera." I mean, is there a feedback loop here, or does Dafny verify Rust? How does this work?

Emina Torlak: Yeah. Yeah. So, I think you got the summary correct, except that [inaudible] was a little bit more concurrent than that. So it wasn't a waterfall model, it was more iterative. We were developing the Dafny and the Rust at the same time. So, the cool thing about building our specification in Dafny is that Dafny is both a theorem prover and a full-fledged programming language. So, when we say Cedar formal specification, it sounds like we wrote a bunch of math formulas, but we actually wrote an interpreter for Cedar. So, our specification is a reference implementation. The difference between that implementation and the Rust one is that with the Dafny one, we focused on it being very small and very readable. We didn't care about performance at all. So, we used crazy features of Dafny, like set comprehensions, to implement code. And this is not something that you would use in a production implementation if you want your production implementation to run fast. So, basically, we built Cedar twice. Once in Dafny, as a reference functional program effectively, and once in Rust. And, we proved properties of this functional program in Dafny. And then, we have a system that's built on top of cargo-fuzz that generates millions, and millions, and millions of inputs. So millions of Cedar policies, millions of inputs for these policies, and millions of entity stores. And then, it feeds them to both implementations. If they agree, we're good. If they disagree, we found a bug. Okay? So maybe it was a bug in the reference, maybe it was in the implementation, then you have to examine it as a human being and figure out who was right. So there's definitely a feedback loop there.
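
The differential testing loop described above can be sketched in Python. This is not the Dafny and Rust setup from the conversation, and it uses Python's `random` module rather than cargo-fuzz. The sketch compares a small, readable reference function (written with a set comprehension) against a faster stack-based one on randomly generated entity hierarchies; the problem chosen (finding all ancestors of an entity) and every name in it are assumptions made for the illustration.

```python
import random


def reference_ancestors(parents: dict, entity: int) -> set:
    """Readable reference: grow the ancestor set until it stops changing."""
    found = set(parents.get(entity, ()))
    while True:
        bigger = found | {gp for p in found for gp in parents.get(p, ())}
        if bigger == found:
            return found
        found = bigger


def fast_ancestors(parents: dict, entity: int) -> set:
    """'Production' version: explicit stack, each node expanded once."""
    seen, stack = set(), list(parents.get(entity, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def random_hierarchy(rng: random.Random, n: int = 6) -> dict:
    """Each of n entities gets up to two randomly chosen parents."""
    return {i: {rng.randrange(n) for _ in range(rng.randrange(3))}
            for i in range(n)}


def differential_test(trials: int = 1000, seed: int = 0):
    """Feed the same random inputs to both versions and compare results."""
    rng = random.Random(seed)
    for _ in range(trials):
        parents = random_hierarchy(rng)
        entity = rng.randrange(6)
        if reference_ancestors(parents, entity) != fast_ancestors(parents, entity):
            return parents, entity  # disagreement: a human decides who is right
    return None
```

A returned input is only evidence that the two versions differ. As noted in the conversation, the bug may be in the reference or in the optimized implementation, so someone still has to inspect the disagreement.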

Damian Schenkelman: That's very neat. And, how do end users, or I guess developers, or IT admins, whatever you folks think of as an end user, get to participate in this process of making sure that it's not just that you can prove things about the language and that performance is bounded, but also it's readable, it's understandable, it's something that folks can write and iterate on that code?

Emina Torlak: Right. Yeah. That's a good question. So, so far we've been using the Dafny model as our source of truth for the development internally. So for example, when we talk about things like building a type checker for Cedar, it turns out that Cedar is dynamically typed, but it has an optional type checker. So, if you tell us the schema that describes the shape of your data, okay? So, who can roll up to whom in the hierarchy? What attributes you have? What are the types of those attributes? We have a type checker that can check this for you. So one thing that you want to prove about type checkers is soundness. So, that means for example, that if the type checker comes back and it says, "Your program is correct," it means that when you run it, at runtime, it's guaranteed not to throw any type errors. So the evaluator is not going to be throwing any type errors. So this property is... Again, for every major component, we modeled it both in Dafny and in Rust, and did the same process, proofs, differential testing, and so on. And this is what we have been using the models for so far. A super nerdy academic thing that I would eventually like to do in my spare time: I think it would be super cool if we could generate an English specification from the Dafny. Right now, they are both done independently. Okay? So we write the Dafny, then we talk to our tech writers and we figure out how to translate it into nice English. But, it would be super cool if we could just take the formal spec and generate the English that people can read and understand from it.

Damian Schenkelman: That we need. I'm sure someone will figure that out with generative AI now that it has all the hype. You mentioned dynamic types with the possibility of doing static specifications. Again, you mentioned for example the scope feature. How did you make sure that end developers, the people that were going to be using Cedar to express their business policies understood the language, they were happy with the feature set, how did that work?

Emina Torlak: So, one thing that we have done throughout the process of developing Cedar, which has been extremely helpful, is when we first started designing it on paper and writing little prototypes that are now in some dusty GitHub repo, who knows whether they work or not. We made sure that throughout this entire process we were talking to both external and internal teams within Amazon, who came to us originally and said, "Hey, we have this authorization problem, can you solve it for us?" We would write a prototype, go to them and say, "Hey, this is what we're thinking right now. Does this look like a good idea?" And, they would say, "Yeah, this is okay. But this part is unacceptable to me." So, one interesting story is... And one question that people often ask is, "Well, why is Cedar actually dynamically typed? Why are you not enforcing static typing discipline from the start?" And the answer is, that's actually how we started. So, our original very first version zero draft of Cedar was strongly, statically typed, so think [inaudible], super strict. And, we took that to potential customers and they said, "I can't work with this." And, the reason is very simple: the authorization data that they work on, they don't control. It comes from third parties. So, they don't necessarily know the shape of the data in advance, and it means that we have to allow them within their policy to write very dynamic things, like, "If this policy has the name attribute, then compare it to the string Emina." Right? You can't demand that they have the name attribute and you can't demand that they even know the shape of the data that they're operating on. So that's how we decided to make typing optional. Okay? So Cedar is dynamically typed by default. The semantics is specified in that way. But then, other customers came along and said, "Hey, well I actually know the shape of my data.
Can you help me actually use this to make sure that the policy that I write or see that I'm not writing typos and I'm not accessing an attribute that doesn't actually exist?" And that's how we built this optional type system. If you have the schema, if you know the shape of your data, you get this extra security and the extra safety.
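As a rough illustration of the two modes described above, here is a minimal Python sketch. It is not Cedar code and does not use the Cedar API; the names `name_is_emina` and `undeclared_attrs` and the example data are hypothetical. The first function checks for an attribute before reading it, so data of unknown shape never causes an error. The second plays the role of an optional schema check that flags a misspelled attribute before anything runs.

```python
from typing import Any, Dict, List, Set

# Hypothetical entity data, shaped however the third party happened to send it.
Entity = Dict[str, Any]


def name_is_emina(principal: Entity) -> bool:
    """Dynamic style: test that the attribute exists before comparing it."""
    return "name" in principal and principal["name"] == "Emina"


def undeclared_attrs(used_attrs: Set[str], schema_attrs: Set[str]) -> List[str]:
    """Optional static style: list attributes a policy reads that the schema
    does not declare, for example because of a typo."""
    return sorted(used_attrs - schema_attrs)


if __name__ == "__main__":
    # No "name" attribute at all: the dynamic check simply does not match.
    print(name_is_emina({"email": "someone@example.com"}))
    # With a schema, the misspelled attribute is reported up front.
    print(undeclared_attrs({"nmae"}, {"name", "email"}))
```

The point of the split is that the second check is only possible when a schema is available, which matches the "optional" framing in the conversation.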

Damian Schenkelman: That's neat. You mentioned GitHub repositories, and we were talking about developer feedback. Have you folks considered making Cedar open source?

Emina Torlak: Yes. We have. I'm pretty excited to say that we just launched Cedar at the Linux Foundation Open Source Summit. And you can find it on GitHub at cedar-policy.

Damian Schenkelman: Wow. That's amazing. Congratulations.

Emina Torlak: Thank you.

Damian Schenkelman: How did you come to this decision?

Emina Torlak: Obviously, I'm super biased, so take that with a grain of salt. But we think that the security and performance properties of Cedar and the way that we built it make it a pretty good language for many authorization applications. And we really wanted to enable the broader open source community to build on the work that we've done. So, to benefit from it, to extend it, to come up with cool ideas. Because basically, every time we have extended the reach of Cedar, when we included more people, we have gotten invaluable feedback, and incorporating it has inevitably made the language better: safer, more secure, more usable. So, there is the community-building aspect, but there's also the selfish aspect, because the more people kick the tires, the better it's going to get.

Damian Schenkelman: Yeah, that's one of the nice things about communities and open source. You get feedback that, I think, is typically a lot more organic, a lot more natural than if you just have a closed product, because the use cases are different, because you can peek under the hood and do things that maybe you can't otherwise. Are the Rust and the Dafny implementations going to be open? How's that going to work?

Emina Torlak: Yes. So, we're making everything open. So, all the Dafny code is going to be open, and all the Rust code, as well as the differential testing framework that we're using to establish the equivalence of the two. We feel that this is really important because it builds trust with customers. You can not only analyze our code, which is of course the whole point of being open source, but you can look at our proofs, and you can look at the statements of the properties that we proved, and convince yourself that it's a property that you care about, that it's a good one. Or maybe if it's not, you can come back to us and say, "Hey, I actually really care about this other thing. Does this other thing hold about your language?" And if it does, we or you can write a proof about it.
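To make the idea of differential testing concrete, here is a small Python sketch under stated assumptions. It is a toy and not the Cedar team's framework, which compares a Dafny model against the Rust implementation. The `Policy` and `Request` types, the two decision functions, and the random generator are all hypothetical. Both functions implement the same rule (deny by default, and any matching forbid overrides a permit): one is written for clarity, the other for speed, and a random generator checks that they never disagree.

```python
import random
from typing import List, NamedTuple


class Policy(NamedTuple):
    effect: str      # "permit" or "forbid"
    principal: str
    action: str


class Request(NamedTuple):
    principal: str
    action: str


def matches(policy: Policy, request: Request) -> bool:
    return policy.principal == request.principal and policy.action == request.action


def spec_decision(policies: List[Policy], request: Request) -> bool:
    """Reference semantics, written for readability rather than speed."""
    hits = [p for p in policies if matches(p, request)]
    has_permit = any(p.effect == "permit" for p in hits)
    has_forbid = any(p.effect == "forbid" for p in hits)
    return has_permit and not has_forbid


def fast_decision(policies: List[Policy], request: Request) -> bool:
    """'Production' version: a single pass that exits early on a forbid."""
    allowed = False
    for p in policies:
        if not matches(p, request):
            continue
        if p.effect == "forbid":
            return False
        allowed = True
    return allowed


def differential_test(trials: int = 1000, seed: int = 0) -> int:
    """Run both versions on random inputs and fail on any disagreement."""
    rng = random.Random(seed)
    principals = ["alice", "bob"]
    actions = ["read", "write"]
    for _ in range(trials):
        policies = [
            Policy(rng.choice(["permit", "forbid"]),
                   rng.choice(principals),
                   rng.choice(actions))
            for _ in range(rng.randint(0, 5))
        ]
        request = Request(rng.choice(principals), rng.choice(actions))
        assert spec_decision(policies, request) == fast_decision(policies, request), \
            (policies, request)
    return trials


if __name__ == "__main__":
    print(differential_test(), "random cases agreed")
```

Random agreement is evidence rather than proof, which is why the conversation pairs this kind of testing with proofs about the specification itself.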

Damian Schenkelman: That does make sense. The tooling for some of these things might not be something that folks use all the time. How do you folks think about the community contributing? What would that process be? For example, do you need to send a PR that passes the spec, and also runs that part of steps, and is there an environment or people who will review to quickly try this out or set up the repo locally?

Emina Torlak: Yes. So, setting up the repo locally and trying out Cedar and the demo apps, that's all easy, or at least we have tried to make it as easy as possible. But this point that you make about people not necessarily being familiar with verification and Dafny, and how that's going to work in terms of open source contributions, it's a very good question, right? Because it is an unfamiliar development model. And extending Cedar in that sense is a process that involves updating both the product specification and the code. So the way that we decided to do this is having an RFC process similar to what other open source languages, like Rust, have. So, if you want to contribute something that's not a core feature, that doesn't need verification, that's easy. Right? You open a PR just like everybody else, you get code reviewed. If enough people like it, we pull it in. So, an example of that could be a sidecar implementation for Cedar, right? But if you want to propose a change to the core language, the core semantics, then you go through the RFC process. And if it's something that the community agrees is a good idea, and we like it, even if you don't know anything about Dafny and you just want to write the Rust code, that's possible too. We have expertise on our side to make the proofs go through again.

Damian Schenkelman: So, that's interesting. So, it'll be a mix of handholding for features that folks are familiar with. Then you'd have the option of the small PR menu item where you say, "Hey, this is more than enough, it should be able to get in." And then for larger things it might be either a model or other function like verification model, where people can run it on their side. That makes sense. What about the future? What are your plans for future development of Cedar? And are there specific features or improvements that are coming that you're excited about?

Emina Torlak: So, we are just at the beginning of our journey. We just open sourced. There are lots of things that we're excited about. Our roadmap is pretty open. So I'll tell you some things that we have in mind. And all of this is subject to change based on what people actually want. So, the nice thing about open source is, if Cedar sounds like a good idea to you, something that might be useful for your application, you have the option to influence it. So, open PRs, participate in the process, let us know. Things that we know for sure people want right now, for example, are bindings for additional languages. So right now, it is easy to use Cedar if you have Rust code. Also for Java, so we released bindings for Java. But there are many more languages out there that people like and enjoy using. So, those are definitely on the roadmap. We want those additional bindings. Having a sidecar. So a lot of these authorization systems work by having a sidecar, and people are used to that model, they know how to work with it. And that's another thing that we'd like to implement. On a more technical side, an idea that we've been playing with is doing some form of type inference. Okay? So right now, you have two modes. So one of the modes is, you don't have any types. So, you just run everything at runtime, and the type errors, you deal with them, it's fine. The other mode is, you know everything. So you give us a full specification, the schema, and we check it. But there is a third mode, where we're finding some customers exist in that in-between area, where they know some types and they don't know the others. So then, the technical question becomes, "Can we actually infer those?" So, again, whether this is a good idea or not will depend on how many people use it, how many people are excited about it. And in that sense, our roadmap is really about figuring out what's most useful, what people are excited about, and engaging with the community in that way.

Damian Schenkelman: Yeah, this is one of the great things about the community and being able to gather that feedback. You mentioned, for example, the sidecar stuff. And I'm sure that's great to have for, again, typical orchestrators like Kubernetes, and then there's going to be, "Hey, I need to make HTTP requests before running my [inaudible]. Can you folks help me with that?" I'm sure that a number of projects will come from this. I know also that, again, this is being open sourced, but you've been seeing Cedar run in real applications for a while as part of Amazon Verified Permissions. What is Amazon Verified Permissions and how does it differ from Cedar, so that people can understand how they work together and what each one is?

Emina Torlak: Yeah, that's a good question. So, we've been talking about Cedar so far. And the best mental image that you can have about Cedar is that it's a language implementation. It gives you a language and an API for evaluating policies in that language. And if you have a very small application, that's probably enough. Right? So let's say you have three policies that don't change very often. You store them in a file, when your application starts up, you load them up, and then you just use them whenever you need to authorize a request. So, this is a model that's completely workable for small applications. Or, for whatever reason, you have to cache policies locally. Again, this is where you would take the SDK model and run with it. But when you start trying to scale, and a lot of AWS customers operate on a very big scale, then new sources of problems creep in that Cedar itself does not solve. So, some obvious ones are, where and how are you storing policies? Can you deal with policy governance? Policy versioning? How are you evaluating these policies? How are you doing the policy slicing? So Cedar itself doesn't do the policy slicing for you. It gives you a theorem that says, "If you do the policy slicing this way, things are going to be correct." But if you have a massive store with billions of resources and millions of policies, somebody has to implement a distributed database somewhere that actually implements this slicing algorithm. So, all of those things are what AVP does for you. Okay? So, you can think of it as a managed-service policy store that stores Cedar policies, versions them, does governance, all of that on your behalf. And then, when a request comes in from your application, AVP gathers the policies that need to be evaluated, the slice, and answers the request for you very quickly. So now, the "very quickly" becomes the question that you asked at the beginning. Well, it's not going to be a millisecond any longer, because AVP has to take some time to gather those policies. But again, it's going to be very fast, because the people who are building AVP are amazing engineers. They've been doing this for years. So, they've really gotten the art of building distributed systems down. So, that is the difference between AVP and Cedar.
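The policy slicing idea can be sketched in a few lines of Python. This is an assumption-laden toy, not Cedar's slicing algorithm or the Amazon Verified Permissions implementation. The `PolicyStore` class, its index keyed on the policy's principal, and the example policies are all hypothetical. The store hands the evaluator only the policies that could possibly apply to a request, and the property to preserve is that the decision on the slice equals the decision on the full policy set.

```python
from typing import Dict, List, NamedTuple, Optional


class Policy(NamedTuple):
    effect: str                 # "permit" or "forbid"
    principal: Optional[str]    # None means "any principal"
    resource: Optional[str]     # None means "any resource"


class Request(NamedTuple):
    principal: str
    resource: str


def applies(policy: Policy, request: Request) -> bool:
    return (policy.principal in (None, request.principal)
            and policy.resource in (None, request.resource))


def decide(policies: List[Policy], request: Request) -> bool:
    """Deny by default; any applicable forbid overrides a permit."""
    hits = [p for p in policies if applies(p, request)]
    return (any(p.effect == "permit" for p in hits)
            and not any(p.effect == "forbid" for p in hits))


class PolicyStore:
    """Toy store that indexes policies by the principal named in their scope."""

    def __init__(self, policies: List[Policy]) -> None:
        self._by_principal: Dict[Optional[str], List[Policy]] = {}
        for p in policies:
            self._by_principal.setdefault(p.principal, []).append(p)

    def slice_for(self, request: Request) -> List[Policy]:
        # Only wildcard policies and policies naming this principal can apply.
        return (self._by_principal.get(None, [])
                + self._by_principal.get(request.principal, []))


if __name__ == "__main__":
    all_policies = [
        Policy("permit", "alice", "report"),
        Policy("permit", "bob", None),
        Policy("forbid", None, "payroll"),
    ]
    store = PolicyStore(all_policies)
    req = Request("bob", "payroll")
    # The decision on the slice should match the decision on everything.
    print(decide(store.slice_for(req), req) == decide(all_policies, req))
```

A real store would index on more than one dimension and live in a distributed database, which is the part the conversation attributes to the managed service rather than to the language.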

Damian Schenkelman: Okay. So, that makes sense. It seems like Cedar is the language that you use to define your policies. AVP is the way in which, within AWS, you can decide to author them, version them, make them available whenever you need to run them, and so on. What are customers doing with AVP and Cedar? Maybe what are two interesting examples that people might learn about so that they know, "Hey, this is the kind of thing that I could be doing. Maybe I didn't know about AVP, I didn't know about Cedar, and I can get started with it"?

Emina Torlak: Yeah, that's a good question. So, the heaviest use of Cedar is currently coming from AVP customers. And roughly speaking, there are three classes of applications that we're seeing. So one of them, we can call them consumer-facing applications. So think about the financial industry, and they are building banking services for end users. These applications are interesting because the rules tend to be very ABAC heavy, so attribute-based access control, because there is no natural hierarchy between banking users, right? So, people want to do things like authorize legal dependents, signatories on accounts, that sort of thing. So, those policies tend to be very ABAC heavy. And then, going back to that question of interfaces or UXs for authorization, these will be mostly these pointy and clicky interfaces that use Cedar templates to make that go through easily on the application side, on the side of the application code. The second class of applications that we're seeing are internal services. So, we have some within AWS and some other organizations as well. But you can think of it as an organization that has a lot of sensitive internal resources, billing data for example, that it needs to make available to employees and applications within the organization, but in a very limited fashion. Okay? So the person who owns the data wants to control the access. They determine who gets to call it, how they get to call it, how they get to use it, and so on. These tend to be a mix of attribute-based and role-based. So, for example, developers in a certain organization are allowed to access my billing data, but only if it's no more than three days old, that sort of thing. So that's the second class of applications that we're seeing for AVP. And the final one is kind of business-to-business software-as-a-service applications, so think of somebody building an HR application, and then having other companies subscribe to this HR application to provide services to their own employees. And these tend to be more heavily role-based rather than attribute-based. Right? So a manager is allowed to access their employees' records. But it's a mix of all of those. So, those are the things that we're most frequently seeing right now.
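The billing-data example mixes a role check with attribute checks, and a short Python sketch may help show the difference. This is not a Cedar policy and not how Amazon Verified Permissions evaluates anything. The `User` and `BillingRecord` types, their fields, and `can_read_billing` are hypothetical names chosen for illustration. The role test ("is a developer") is the role-based part. The organization and the three-day age limit are the attribute-based parts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    roles: FrozenSet[str]
    org: str


@dataclass(frozen=True)
class BillingRecord:
    allowed_org: str        # the organization the data owner has approved
    created_at: datetime


MAX_AGE = timedelta(days=3)


def can_read_billing(user: User, record: BillingRecord, now: datetime) -> bool:
    is_developer = "developer" in user.roles          # role-based condition
    in_allowed_org = user.org == record.allowed_org   # attribute on the principal
    is_recent = now - record.created_at <= MAX_AGE    # attribute on the resource
    return is_developer and in_allowed_org and is_recent


if __name__ == "__main__":
    now = datetime(2023, 5, 10, 12, 0)
    dev = User("dana", frozenset({"developer"}), "payments")
    recent = BillingRecord("payments", datetime(2023, 5, 9, 12, 0))
    stale = BillingRecord("payments", datetime(2023, 5, 1, 12, 0))
    print(can_read_billing(dev, recent, now))
    print(can_read_billing(dev, stale, now))
```

Hard-coding rules like this in application code is exactly what a policy language is meant to replace. The sketch only shows which conditions are role-based and which are attribute-based.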

Damian Schenkelman: Yeah. Yeah, I can understand that. I can relate to it. When you get into B2B with custom models per customer, that's when things start to get a bit messier. And I'm sure, a lot of the optimizations around Cedar and the scopes start to make a lot more sense.

Emina Torlak: Yes, yes. It's definitely a hairy problem. And you know what? Once you've seen enough of them, you really get to empathize with your customers and how much pain they have to go through when they're trying to implement these things on their own.

Damian Schenkelman: Yeah. This is amazing. It's been great learning about what you folks are doing, and it's been great learning from you. I have one final question, which is, where does the name Cedar come from?

Emina Torlak: That's a good question. So, there were some internal versions of the IAM policy language that started with the letters A and B. C was the next letter in sequence. The previous two languages were trees, so we decided to go with the alphabetical tree scheme.

Damian Schenkelman: Okay. It's always interesting to learn about those things. How did this very, very big thing maybe in 10 years end up being named... Oh yeah, it was the first two were taken and they were all trees. That's very neat. Emina, it's been great to have you. I really appreciate your time. I know you're working a lot on the launch and making sure that things are great, making sure that the community can contribute to Cedar. It's been amazing to have you here. I learned a lot and hopefully everyone listening in has as well.

Emina Torlak: Thank you so much for having me. It's been an absolute pleasure.

Damian Schenkelman: That's it for today's episode of Authorization in Software. Thanks for tuning in and listening to us. If you enjoy the show, be sure to subscribe to the podcast on your preferred platform so you'll never miss an episode. And if you have any feedback or suggestions for future episodes, feel free to reach out to us on social media. We love hearing from our listeners. Keep building secure software, and we'll catch you on the next episode of Authorization in Software.

Today's Host

Damian Schenkelman

Principal Architect @ Okta

Today's Guests

Emina Torlak

Senior Principal Applied Scientist, Amazon Web Services

Emina Torlak is a Senior Principal Applied Scientist at Amazon Web Services and an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She received her Bachelors (2003), Masters (2004), and Ph.D. (2009) degrees from MIT. She is a recipient of the Robin Milner Young Researcher Award (2021), NSF CAREER Award (2017), Sloan Research Fellowship (2016), and the AITO Dahl-Nygaard Junior Prize (2016). Emina works at the intersection of programming languages and automated reasoning. She has built the Kodkod solver and the Rosette programming language. Kodkod has been used in over 70 tools for software engineering and design, and Rosette powers state-of-the-art verification tools for correctness-critical systems, ranging from radiation therapy control to just-in-time compilers in the Linux kernel. Currently, Emina co-leads the development of Cedar, an open-source language for writing and enforcing custom authorization policies. Cedar balances expressiveness, performance, and analyzability. It is used by Amazon Verified Permissions and AWS Verified Access.

