FOSDEM 2024: SpiceDB: Mature, Open Source ReBAC
All right! So this is the talk on SpiceDB. Thanks everyone for showing up so early in the morning. I'm starting to lose my voice because there was a long day yesterday of talking and meeting awesome people. This is my first FOSDEM. So who am I? My name is Jimmy Zelinskie. I'm the co-founder of a company called authzed and authzed builds SpiceDB. Previously, I've worked at Red Hat and CoreOS. So I've been around in the container and Kubernetes ecosystem for a pretty long time -- basically since the beginning. There I'm actually a maintainer of OCI which is the standard
specification for Linux containers and I've also started a bunch of projects in that space: notably the Kubernetes Operator Framework and some others. This talk is entitled SpiceDB, but since FOSDEM is more of a developer community conference, I really wanted this talk to be less of a vendor pitch for SpiceDB and more of a level set about the problems in the authorization space and the history and status quo of that, so that everyone understands what might be the best tool to solve their problems. I'm not going to try to sell you SpiceDB for all problems because the more informed you are, the better you can pick the product that's actually going to complement your software stack and what you need. That means there are going to be way more qualified people using SpiceDB and way more qualified people using other authorization tooling. But obviously I'm the most jazzed about SpiceDB because I created it! So why are we all here? We're all here because there is a not-for-profit organization called OWASP, which is the Open Worldwide Application Security Project, that got started in the early 2000s. They're famous for having this list called the Top 10
and the Top 10 is basically an enumeration of the highest risks -- the highest threats -- for web security. As of 2017, Broken Access Control was number five. As of 2021, Broken Access Control is number one. That means this is the biggest threat to the web and to all the internet-facing applications running on it. But really the question is: how did we actually get to this point? How did this happen? And how did it happen so quickly? I'm not going to point any fingers, but what I'm actually going to do is dive into two different groups of stakeholders in the history of authorization. There's Academia -- people publishing papers in this space and defining concepts -- and then there's the industry practitioners that are actually building the software and realizing these systems as they're actually connected to the web. I'm going to start with Academia first. On the LEFT-hand side, you're going to see a timeline and then on the RIGHT-hand
side, there's going to be some notes, and, not on this slide but on others, you'll see QR codes in this corner as well. Those QR codes are going to link to the specific novel paper. So if you're interested in any of these particular concepts, you can feel free to scan the QR codes. But our history of authorization is actually going to start in the '80s. It gets really kicked off with this publication called the Trusted Computer System Evaluation Criteria, which is a security practices book published by the US Department of Defense. It outlines a lot of different security practices that are effectively a part of the United States military. Importantly,
they describe these two different Access Control systems: Discretionary and Mandatory. Now Discretionary is conceptually just "if you created the idea or the information you can share it" and "if you're then given access to that you can share that". It's at your discretion. I used file systems and Google Docs as an example here, but it's not a perfect one-to-one match. If someone shares a file with you on a unix-like file system, you can copy that file if you have read access and then you can change whatever permissions you like on the copy and share that -- similarly with Google Docs. So it's at your discretion how you're going to share that information once you're given read access. Then there's Mandatory Access Control, which is effectively a long list -- an exhaustive list -- of all the access for a particular thing. People are most familiar with SELinux
as the example of this. If you're unfamiliar with SELinux, it's a way of locking down the Linux kernel. Honestly, it kind of comes with a negative connotation because mandatory access control systems are very verbose and very difficult to get right because you have to enumerate absolutely everything. Some people say that the three letter agency at the US government that created this are the only people who actually know how to configure this correctly. I don't know if
that's actually true or how many people use it. I do know Red Hat is one of the folks that actually does promote SELinux. But the one thing about this slide that I really wanted to drive home is that these ideas -- they're as old as the military and war itself. There's
nothing novel about the '80s where these ideas got "invented", but what actually happened was someone only actually ever thought to write this down in the '80s. So it took that long after using these ideas for many, many, many years. So now we jump roughly 10 years, actually 9 years to 1992. This coincidentally happens to also be the year I was born--that makes me feel relatively old. But, in '92, we get this paper published on Role-based Access Control and Role-based Access Control often called RBAC is where actually most people believe the state-of-the-art for authorization systems is.
The core idea is, basically, there is a group that is assigned access to a particular thing and those groups are called Roles and then you map users into these roles and by means of being in this role you get access delegated to you. The kind of number one problem with RBAC is that everyone defines it differently. If you build any enterprise software, you're going to talk to clients and they're going to ask you for RBAC, but the difference is if I look at two different enterprise applications, how they implement RBAC is entirely different. The only commonality is this mapping of users into groups that then have access. This is kind of going to be a recurring theme across all these papers published in academia -- anything with *BAC -- because they're documenting concepts, but not actually specifications that would give you an ultimately cohesively designed and secure system. Most famously the biggest issue with
RBAC is that there really is no scope. If you say someone is an admin, does that mean they're an admin of the entire web app? Does that mean they're an admin of a particular resource in the app? You just don't know until you actually build it yourself. So there's not really an easy way to reason about these systems until you actually touch them. So now we jump well into the future into 2015 and this is when the paper on ABAC, which is Attribute based Access Control, is written. The idea behind ABAC is to kind of generalize on RBAC and say the role that you're assigned is just one attribute that your user can have and other attributes might be that you logged in with this IP address or many other dynamic attributes can be assigned to you. The really
important thing about ABAC is it's providing this real-time context so now you can write rules like "are they connecting from this country's subnet at this time?" You can delegate access at particular windows of time and perform more logic on these attributes that folks have. And, now, we're going to take a huge digression back to 1965. If you're unfamiliar, Multics is actually this operating system that was developed between MIT, GE, and Bell Labs. You might not remember it, but it actually inspired an operating system you're probably familiar with: Unix. Unix is actually an attempt at porting Multics concepts to less expensive hardware. Multics is
often credited as the first operating system that has access control for the file system. I actually don't know if that's true, but it's often credited as that. In Multics, you have a file system tree, so you get hierarchical structure, and then at every branch which would be a file or a directory, you can have five different attributes assigned to that. You get read, write, exec, and append -- these are all file operations that you'd be familiar with. But, there's this fifth one that's super interesting called "trap" and that actually gives you the ability to do callbacks into C functions. It was initially designed so you could do file locking in user space. But the
thing with Multics and the reason why I bring it up is because there was inheritance, there was ABAC, and there were user-defined functions in an authorization system in 1965, when in academia the ideas behind attributes were published in 2015. So there are systems using these concepts, but they maybe haven't been formalized and written down in a concrete form, and this is a huge issue with the whole space, because people are doing things but they're not really studying how to make these systems robust with these ideas. They're kind of more just documenting these ideas ad-hoc. So getting back to the normal timeline, we hit 2019. It's actually in 2007 that the term
is coined "Relationship-based Access Control" (ReBAC) and the idea behind this is that by establishing a chain of relationships like "Jimmy is a speaker at FOSDEM" and "speakers at FOSDEM have access to the FOSDEM speaker Matrix chat", if you can follow these chains of relationships you can actually arrive at "Jimmy has access to the FOSDEM speaker room". This term is coined around then and it's looking forward at what tech in the web 2.0 era will look like. It's published initially while considering how Facebook's social graph works internally -- when you share photos on Facebook and you say "friends of friends" can view this, you're literally defining it in terms of relationship to yourself. So, we hit 2019 and that's when Google publishes a paper called Zanzibar which is documenting an internal system at Google powered by these concepts. And
the difference and the reason why I have 2019 for ReBAC is because Google is documenting a concrete implementation of this, unlike a lot of these other papers talking purely about concepts. It's talking about an application of these concepts and really giving you a framework for how to use this effectively and in a correct way across multiple products at Google. So then in 2021, SpiceDB is open-sourced, which also implements concepts similar to Zanzibar's. Obviously, I'm going to get into that later. There are other *BAC models, but these were the primary ones that I see as most relevant in industry. You can dive into Wikipedia if you're interested in other ones, but now we've got to cover the industry side of things. We're leaving academia and evaluating how industry runs into this problem: you go to build a web application and your first job is to just build the MVP -- the minimum viable product -- of your web application. So what you're going to do is do what you do with everything in a web application,
which is store data in a database -- probably the relational database you're using for everything else. And then you're going to try to check if a user has particular access based on some data you stored in the database. It's maybe going to be a role if you're inspired by RBAC, but maybe it's just an enumeration of the list of users that can do a particular thing. So you may have written code that looks like this, but the problem is this falls over at some point in time: either you've fundamentally built a system that is just really slow, or you have to build a new system that is way faster than you ever intended the original to be, or you get users of your software that demand new functionality that is not actually possible for you to implement until you refactor your authorization code. A great example of that is if they want recursive teams -- so if
you have groups of users, what if you have groups of groups, or groups of groups of groups of groups? That is something that most people cannot build or they don't build in their initial MVP and, when you get requested functionality like that, you're forced to completely rewrite your authorization system. The other thing that could happen to you is your company buys another company and they're based on a different continent, and that means all the requests for checking permissions now have to travel across an ocean (if they want to be correct). That's a huge problem, and making sure that the performance is actually going to be viable and the answers you're going to get for authorization questions are correct is a difficult problem. So you hit one of these kind of big
issues and then you are forced to enter this cycle that I'm going to get into -- these numbers are kind of fudged -- but the whole point is that it's going to take an engineer probably with expertise in that web app that has worked on this specific authorization system. It's going to take them a while to implement this. It's going to be super sensitive, because someone else is going to have to review it and that person is ALSO going to have to be deeply embedded in that code-base.
They're going to be extraordinarily careful because any mistake that happens in this code base is going to be a vulnerability because it's giving access to people that shouldn't otherwise have access. So that's going to take a long time then you're going to do QA. You might actually have to perform a security audit before you can deploy this software because you're deploying to enterprise environments. Then you're also probably going to want to take extra time rolling out these changes into production. You probably don't want to deploy it to everyone all at once. You probably want to deploy to a minor subset just in case you find something wrong with the code. All of this just takes time and the problem is it's actually putting security of your software at odds with development velocity. Fundamentally, it's going to take you too long to add this functionality and you're going to want to take shortcuts, but shortcuts are security flaws in your software.
Then it's rinse and repeat. You basically don't know how long until the pain is going to build up where you're forced to rewrite these authorization systems and that is like the mystery box entirely. You could finish or not even be finished rewriting your authorization system and then all of a sudden a new user sets some requirement for you and you're doomed. You have to completely rewrite the thing you just thought you re-architected to be future proof. How do we fix this never ending cycle? Well, OWASP themselves actually have recommendations for this. They say you should no longer adopt RBAC, but instead concepts from ABAC and ReBAC. Obviously, I'm biased towards ReBAC,
because I think it's the more modern approach. The OWASP folks also give you some high-level benefits for why you would adopt these new ones over RBAC. I'm going to just take this from the ReBAC perspective. When you're doing a graph-like thing -- a Relationship-based system -- you're forced to talk about individual entities. "This user Jimmy has access to this particular document". Because you're doing that, that has this kind of buzz-word associated with it: fine-grained. You're
not resolving Jimmy to a role or a group; you're actually following Jimmy directly through to the document. You're talking about individual entities in the system so, as a result, you get more fine-grained access. I'm not trying to generalize about any users or paint over anything; I'm actually talking about the exact objects I care about. That means you can actually like develop systems where you delegate access to a particular row in a database or a cell in a spreadsheet. All of these systems are designed for speed because they understand that they're going to have to store a lot of data to be this fine-grained. Then because your applications are
only talking about the direct objects that they care about, any of the relationships "in between" don't get written into your code. You just ask the question "can this user perform this action on this thing?" How they got access to that, and if you ever refactor or change how they get access to that, does not live in your code base anymore. That means you can make changes to your permission system and not change a single line of code in any of your web applications. Believe me when you do that for the first time, it is a magical feeling because you don't have to touch ANY code. Then there's also multi-tenancy and management ease. This is kind of just about simplicity around
modeling and then, with ABAC and ReBAC systems, you're kind of paying it forward. So RBAC might be really easy conceptually for you to implement at the beginning, but these systems -- the ABAC and ReBAC ones -- they're more focused on forward thinking like if you need to make changes like I just described how you can change ReBAC designs without changing code. It may be a little bit more effort for you to get started in building and integrating with one of these systems, but by day two, if you ever need to make a change, it's going to pay dividends. Now, I wanted to get deeper into this Zanzibar paper that I talked about earlier, which kind of like kicked off the interest in ReBAC that you see today. Basically, Zanzibar is a purpose-built graph database that is very specifically optimized for one thing: finding a path in a graph and by virtue of finding that path that means that a user has access to that particular resource.
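To make the path-finding idea concrete, here is a minimal Python sketch. It is not how Zanzibar or SpiceDB is implemented; it only assumes that relationships are stored as (resource, relation, subject) tuples and that a permission check is a reachability search over them. The class `RelationGraph`, its `write` and `check` methods, and all object IDs such as `user:alex` or `group:staff` are hypothetical names invented for this illustration.

```python
from collections import defaultdict, deque


class RelationGraph:
    """Toy in-memory relationship store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps (resource, relation) -> set of subjects. A subject is either a
        # plain object ID such as "user:jimmy", or a (resource, relation) pair
        # meaning "everyone who has that relation on that resource".
        self._edges = defaultdict(set)

    def write(self, resource, relation, subject):
        """Store one relationship tuple."""
        self._edges[(resource, relation)].add(subject)

    def check(self, resource, relation, user):
        """Breadth-first search for a path from (resource, relation) to user."""
        start = (resource, relation)
        queue = deque([start])
        seen = {start}  # guards against cycles such as groups nested in a loop
        while queue:
            node = queue.popleft()
            for subject in self._edges.get(node, ()):
                if subject == user:
                    return True
                if isinstance(subject, tuple) and subject not in seen:
                    seen.add(subject)
                    queue.append(subject)
        return False


graph = RelationGraph()
# "Jimmy is a speaker at FOSDEM"
graph.write("event:fosdem", "speaker", "user:jimmy")
# "Speakers at FOSDEM have access to the FOSDEM speaker chat"
graph.write("chat:fosdem-speakers", "member", ("event:fosdem", "speaker"))
# Groups of groups are just more edges in the same graph.
graph.write("chat:fosdem-speakers", "member", ("group:organizers", "member"))
graph.write("group:organizers", "member", ("group:staff", "member"))
graph.write("group:staff", "member", "user:alex")

print(graph.check("chat:fosdem-speakers", "member", "user:jimmy"))    # True
print(graph.check("chat:fosdem-speakers", "member", "user:alex"))     # True
print(graph.check("chat:fosdem-speakers", "member", "user:mallory"))  # False
```

Note that the caller only ever asks "can this user do this on this thing?"; how the path is found, through a speaker relationship or through nested groups, stays inside the store. A real system would add a schema, persistence, caching, and consistency guarantees that this sketch leaves out.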
It's actually one of the few good things that came out of Google+. There's only two things that came out of Google+: there is Zanzibar internally at Google and then the consumer-facing Google Photos. The novelty of this paper is actually that it is solving an authorization problem with a focus on distributed systems. You'll notice the title of the paper is called Zanzibar:
Google's Consistent, Global Authorization System, so it is fundamentally trying to tackle authorization as a distributed systems problem, which is not really something anyone else has done in the past. They kind of acknowledge that if they're going to deploy one system at Google, it needs to work across all geos in the world and it has to be extremely, extremely reliable and it can never be wrong. These are really difficult requirements, but the anecdote that I like to use is when you're using a cloud provider like Amazon and you go to provision something like, say, an S3 bucket, you're always choosing what region. But, actually, if you go to set IAM
rules in a cloud provider like Amazon, you don't pick the region. That is because these systems fundamentally have to be global and when you're designing them yourself at a particular scale, you need to think about how you're going to make your system global. So this paper actually inspired two companies, Carta and Airbnb, to go forward and implement their own internal systems based on the ideas in this paper. Neither of them is truly 100% what I would call authentic to the original paper, but rather the paper fused with the requirements of their business at the time. I think the real superpower with Zanzibar, though, is that, if you go to send someone a Google Doc in Gmail, and they don't already have access, Gmail will pop up a box and tell you "hey! you didn't give access to this person". That fundamentally means that Gmail actually has a way to ask questions
and check permissions that are built into Google Drive. That means you can have one central source of truth for authorization data that your whole application suite can share -- microservices can share. This is incredibly powerful because not only does it allow integrations like this, but it also lets you have that central source of truth where if you need to audit something you can just ask that one service. It's the only service you have to trust. It's the only service that you have to query if you're trying to really dig into any of this data if, say, you have a problem like an outage or an incident and you need to understand what the access control looked like. So you might now be wondering "how do I Zanzibar?" This is exactly what we set out to do basically the year after the paper was published. My co-founders and I left Red Hat to
found authzed and build SpiceDB in the open source. There were some folks experimenting with the ideas around ReBAC at the time, but no one was really moving the needle towards making this a production thing that you could use in a real enterprise environment or at a real tech company. We originally prototyped the thing in Python: it was type-annotated, lazily-evaluated, functional Python. It was way faster than you'd ever think Python should be, but it was not fast enough, so we ended up rewriting it in Go and open sourcing that. The name is
actually inspired by Dune because internally at Google the project was actually called project "SPICE" because of a running joke that "the ACLs must flow". The timing for that has actually been really good with the resurgence of Dune in the movies. Internally at authzed all of our software is named after Dune references as a kind of homage. So if we fast forward to today, the SpiceDB community has actually gotten contributions from a lot of companies -- big names like Netflix, GitHub, Google, Red Hat, Adobe, and Plaid. There are production users in small companies like startups where it's just the co-founders all the way up to Fortune 50 companies.
But I still haven't actually told you what SpiceDB is. SpiceDB is, as I described with Zanzibar earlier, this extremely parallel graph database. Developers basically apply a schema just like you would for a relational database and -- I've given an example schema here modeling a Google doc -- then what they do is they store data inside that database and query that data according to that schema. It's really magic when you can actually make schema changes in a forward compatible way that enables you to actually modify your permission systems without changing any application code. So we don't actually have a SQL API despite being a database; instead, we give you gRPC and HTTP APIs. The primary interface we recommend is gRPC for latency reasons because authorization is in the critical path of everything your web applications are going to do and possibly everything at your business. You really have to make sure the stuff is fast, thus everything needs to be kept in memory and
everything needs to be returned in single digit milliseconds. gRPC is actually pretty critical for that. Then, in addition to the main server, we also expose servers for powering developer tools, so you can get things like autocomplete in your editor, as well as integration testing services. It's Kubernetes-native -- designed that way from the beginning because our background is all in Kubernetes. SpiceDB instances self-cluster. If you deploy SpiceDB directly on Kubernetes, it will discover other nodes and start to divide and shard up the in-memory graph that it's using to serve requests across them automatically. We also offer a SpiceDB Kubernetes Operator in the
open source which will then do automated updates for SpiceDB. Having zero-downtime updates for a database is notoriously tricky, so we just took that problem off the table for most people and automated it for anyone using Kubernetes. We remain true to Zanzibar's goals of consistency at scale: we have pluggable data storage systems and, depending on what your requirements are -- say you need to deploy everywhere on the globe -- you can store all your raw relationship data in something like Spanner or CockroachDB, and then you can deploy regional deployments of SpiceDB that will exist as independent caches for those geos. Fundamentally
they're sharing all the same core data and they're consistent across those environments. If that sounds too complicated for you or like you don't really need that because you're just a single-region shop, that's fine. We also have deep integrations with PostgreSQL or MySQL if you just want to use something like Aurora or Amazon RDS. Obviously then there's also an in-memory datastore for testing. We also have a tool called zed. Zed is the official command line tool. It manages cluster credentials,
backups, and it gives you a command for every single SpiceDB API. I just kind of give an example of running with a debug flagged permissions check. You can actually see it gives you a whole graph traversal. It shows you a tree of how you actually computed whether or not someone has access with timing data associated with all that so you can see where things slow down. We have a web IDE, so actually the two things you just saw -- SpiceDB and Zed -- we compile to WebAssembly and then run that in the browser. Then we build that all on top of Monaco -- the engine that powers VSCode -- and
give you a full IDE where you don't have to install any of the software I just showed you. You can just go to play.authzed.com and start playing with this stuff. You can run zed against live data. You can load in test data. What we actually do is we can generate exhaustively all of the paths available in the graph for you, so there's somewhat of a model checking happening here and you can actually prove exhaustively that all of the ways you can traverse the graph are the ways you think they are. That basically lets you prove that a system is correct without you deploying it into production or having someone do an extremely long security audit on your program. Then you can check
this stuff into CI/CD so if you make a change to the schema you can actually guarantee that certain assertions always pass and that everything is exhaustively checked. So Zanzibar is not a silver bullet. We actually have had to extend Zanzibar in a bunch of different ways. SpiceDB remains true to all of the core concepts that you'll find in Zanzibar, but not everyone is Google, so not everyone relies on users being represented the same way. We are more flexible with how people can model their own users and then we add on developer experience, because at Google they can say you're forced to use the software, but when you're building open source software, you can't force people to use your software. You have to compel them to use your software by having a better experience than what they're currently doing. We've also added kind of
contextual relationships with ABAC so that means relationships can actually exist dynamically based on context that you provide at runtime. That was a joint project with Netflix. So if you're wondering "how do I SpiceDB?", you can go to our Discord: discord.gg/spicedb or check out GitHub. Basically anywhere on the internet where you expect to find open source projects, SpiceDB is there. Thanks!
2024-02-16 19:50