What is Google Zanzibar?

Learn what Google Zanzibar is, how to implement it, and how it compares to other authorization technologies.

If you've done any research on authorization systems, you've likely come across Google's “consistent, global authorization system” known as Zanzibar.

The system is particularly notable for providing a way to handle "relationship-based access control" (ReBAC), which extends traditional role-based access control (RBAC) and attribute-based access control (ABAC) models by considering the relationships between entities (like user-to-user or user-to-resource) as part of the access control decisions.

In this article, we’ll discuss Google Zanzibar in depth, how to implement it, and how it compares to other authorization technologies.

Why Google Zanzibar?

Authorization and access control, particularly in web applications, is not a new concept. Nearly every programming language or framework has built-in or third-party libraries to help with implementing authorization and access control. So why did Google decide to build Zanzibar from scratch?

Fine-grained authorization

Google's products are heavily ingrained in users' lives (e.g., email, calendar, maps, photos), so privately and accurately managing access at all layers is central to protecting user data and ensuring a great user experience. 

For such consumer use cases, traditional, coarse-grained access control methods like role-based access control (RBAC) fall short. Expressing access control rules such as 'user:alice can manage calendar-invite:eng-team-standup' (Calendar) or 'user:bob can edit document:readme' (Docs) requires resource- or object-level fine-grained authorization (FGA). At Google’s scale, this means trillions of access rules, which requires a system that can store and serve all of them.

Centralized & scalable authorization

From the outset, Google realized that its product teams needed a centralized, fine-grained authorization service.

For one, the cost of implementing and maintaining the same authorization logic within each product (e.g., Maps, Docs, Photos, YouTube) didn't make financial sense.

More importantly, multiple implementations would dramatically increase the chances of bugs and security holes. For these reasons, Google opted to take on the challenge of building and scaling a centralized authorization system that each of its product teams could rely on for consistent authorization logic.

Key concepts

The Google Zanzibar paper does a great job of detailing Google's implementation and the many specific enhancements they have made over the years to get Zanzibar to scale to trillions of rules and millions of requests per second. 

Here are a few of Zanzibar's core concepts at a high level:

Namespace configurations

Zanzibar introduces the concept of "namespaces," which are essentially schemas defining different types of objects and the relations that can exist between them. Each namespace configuration specifies:

  • Object types: Like documents, images, or projects.
  • Relation types: Such as "owner," "viewer," or "editor," which define how users or other objects can interact with the objects.

The key concept in Zanzibar: Relation tuples

The most important concept to understand within Zanzibar is the 'relation tuple.' A relation tuple is simply a representation of a specific rule or ACL entry within Zanzibar. The basic format for a relation tuple is <object>#<relation>@<user>, which has three parts:

  1. An object (like documents, images, or projects)
  2. A relation (defining how users or other objects can interact with objects)
  3. A subject (the user, or set of users, that holds the relation).

A tuple describes a specific rule within the system that is consulted when making authorization decisions. For example, the tuple document:readme#edit@user:alice represents that Alice has editing rights over the readme document.
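To make the <object>#<relation>@<user> format concrete, here is a minimal Python sketch that parses and formats tuples. The `RelationTuple` class and the `type:id` naming convention are assumptions for illustration, not Zanzibar's actual API. Splitting on the first `#` and then the first `@` also lets the user part itself be a set of users such as `group:eng#member`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    """Hypothetical in-memory form of <object>#<relation>@<user>."""
    obj: str       # e.g. "doc:readme"
    relation: str  # e.g. "editor"
    user: str      # e.g. "user:alice" or a userset like "group:eng#member"

    @classmethod
    def parse(cls, text: str) -> "RelationTuple":
        # The first '#' ends the object; the first '@' after it ends the relation.
        obj, sep, rest = text.partition("#")
        relation, sep2, user = rest.partition("@")
        if not (sep and sep2 and obj and relation and user):
            raise ValueError(f"malformed relation tuple: {text!r}")
        return cls(obj, relation, user)

    def __str__(self) -> str:
        return f"{self.obj}#{self.relation}@{self.user}"


t = RelationTuple.parse("doc:readme#editor@user:alice")
print(t.obj, t.relation, t.user)  # doc:readme editor user:alice
print(t)                          # doc:readme#editor@user:alice
```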

Schema configuration

If relation tuples express specific rules, Zanzibar's schema, segmented by 'namespaces,' defines which rules can be created. A schema configuration may specify the valid object types and relations in a namespace. 

For example, 'namespace-1' might define a 'tenant' object type that supports 'member' and 'admin' relationships. 

With this schema, it's possible to create rules like 'user:eve is a member of tenant:techcorp'. If you're familiar with relational databases, you can think of the Zanzibar schema as a relational database's table schema, whereas relation tuples are like rows in that table adhering to the schema.
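Continuing the table-schema analogy, a schema can be sketched as a mapping from object type to the relations it allows, with each proposed tuple validated before it is written. The `SCHEMA` dictionary and `validate` function below are hypothetical, simplified stand-ins for a Zanzibar namespace configuration, which in practice also carries rules for how relations derive from one another.

```python
# Hypothetical schema: object type -> relations that type supports.
SCHEMA: dict[str, set[str]] = {
    "tenant": {"member", "admin"},
    "doc": {"owner", "editor", "viewer"},
}


def validate(obj: str, relation: str) -> None:
    """Raise ValueError if (obj, relation) is not allowed by SCHEMA.

    `obj` is written as "<type>:<id>", e.g. "tenant:techcorp".
    """
    obj_type, sep, obj_id = obj.partition(":")
    if not sep or not obj_id:
        raise ValueError(f"object must look like type:id, got {obj!r}")
    allowed = SCHEMA.get(obj_type)
    if allowed is None:
        raise ValueError(f"unknown object type {obj_type!r}")
    if relation not in allowed:
        raise ValueError(f"{obj_type!r} has no relation {relation!r}")


validate("tenant:techcorp", "member")  # OK: like a row that fits the table schema
# validate("tenant:techcorp", "editor") would raise ValueError
```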

User-specified consistency

Key to Google Zanzibar's viability and success at Google is its user-specified consistency model. Every write is stamped with a global commit timestamp, and Zanzibar hands these timestamps to clients as opaque consistency tokens called zookies.

On each read (e.g., a check operation), clients can pass a zookie, instructing the service to evaluate the request on data no older than the timestamp the zookie encodes. This guarantees that the check reflects every ACL change committed up to that point, which is exactly the freshness the client asked for.

This model of user-specified consistency allows Zanzibar to use replicated data and caches wherever possible while still returning correct results. Rather than applying one fixed consistency level to every request, Zanzibar lets clients trade off consistency and performance on a per-request basis.
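One way to picture this per-request trade-off is a router that serves a read from the fastest replica whose data is at least as fresh as the zookie, falling back to the leader only when no replica has caught up. The `Replica` type, the integer timestamps, and the routing rule are assumptions for illustration; real zookies are opaque tokens, and Zanzibar's actual replica selection is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    applied_ts: int    # latest commit timestamp this replica has applied
    latency_ms: float  # hypothetical observed latency


def choose_replica(replicas: list[Replica], leader: Replica,
                   zookie_ts: Optional[int]) -> Replica:
    """Pick the fastest replica satisfying 'no older than zookie_ts'.

    With no zookie, any replica is acceptable (cheapest, least fresh option).
    """
    fresh_enough = [
        r for r in replicas
        if zookie_ts is None or r.applied_ts >= zookie_ts
    ]
    if not fresh_enough:
        return leader  # assumption: the leader always has the latest writes
    return min(fresh_enough, key=lambda r: r.latency_ms)


replicas = [Replica("us-east", applied_ts=105, latency_ms=2.0),
            Replica("eu-west", applied_ts=98, latency_ms=1.0)]
leader = Replica("leader", applied_ts=110, latency_ms=40.0)

print(choose_replica(replicas, leader, 100).name)   # us-east
print(choose_replica(replicas, leader, None).name)  # eu-west
print(choose_replica(replicas, leader, 108).name)   # leader
```

A client that can tolerate slightly stale data omits the zookie and gets the cheapest replica; a client that just changed an ACL passes the zookie and pays more latency only when no nearby replica has caught up.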

Zookie (Zanzibar Cookie)

A zookie is essentially a timestamped snapshot token. It represents a point-in-time snapshot of the authorization data (Access Control Lists or ACLs) that Zanzibar uses to make decisions about whether a particular action by a user is allowed. 

The purpose of the zookie is to help clients and the Zanzibar system maintain a consistent view of permissions, despite potential delays or asynchronicity in data replication across a globally distributed environment.
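The snapshot idea can be illustrated with a tiny multi-version store: every write records the timestamp at which a tuple was added or removed, and a read at snapshot `t` sees exactly the tuples that were live at `t`. The `VersionedTupleStore` class and its integer clock are hypothetical; Zanzibar itself relies on Spanner's versioned storage rather than anything like this.

```python
from collections import defaultdict


class VersionedTupleStore:
    """Hypothetical multi-version store of (object, relation, user) tuples."""

    def __init__(self) -> None:
        self._clock = 0
        # tuple -> list of (timestamp, present?) events, in commit order
        self._history: dict[tuple[str, str, str], list[tuple[int, bool]]] = defaultdict(list)

    def _commit(self, key: tuple[str, str, str], present: bool) -> int:
        self._clock += 1
        self._history[key].append((self._clock, present))
        return self._clock  # plays the role of a zookie in this sketch

    def write(self, obj: str, relation: str, user: str) -> int:
        return self._commit((obj, relation, user), True)

    def delete(self, obj: str, relation: str, user: str) -> int:
        return self._commit((obj, relation, user), False)

    def exists_at(self, obj: str, relation: str, user: str, snapshot: int) -> bool:
        """True if the tuple was present as of `snapshot`."""
        present = False
        for ts, is_present in self._history.get((obj, relation, user), []):
            if ts > snapshot:
                break  # events are appended in increasing timestamp order
            present = is_present
        return present


store = VersionedTupleStore()
z1 = store.write("doc:readme", "viewer", "user:bob")   # Bob gains access
z2 = store.delete("doc:readme", "viewer", "user:bob")  # Bob loses access
print(store.exists_at("doc:readme", "viewer", "user:bob", z1))  # True
print(store.exists_at("doc:readme", "viewer", "user:bob", z2))  # False
```

A check evaluated at snapshot `z2` or later correctly sees that Bob's access was removed, even if it runs on a replica that also holds older versions.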

Data replication & scale

Implementing a globally distributed authorization service with user-specified consistency is no easy task. Google Zanzibar achieves high scale by running more than 10,000 servers in several dozen clusters worldwide. 

Spanner, Google's globally distributed database, is discussed extensively in the paper and is Zanzibar's primary data store. Spanner provides global sharding and replication, and its TrueTime mechanism enables the globally consistent commit timestamps that make the zookie-based snapshot reads described above possible.

In addition to Spanner, Zanzibar implements multiple layered caches optimized to combat 'hot spots' and ensure a p95 of less than 10ms for check requests.

Google Zanzibar also uses request hedging to mitigate the impact of tail latency. Tail latency refers to the occasional, unusually long response times caused by factors such as garbage collection pauses, resource contention, or temporary network issues. When a request takes too long, Zanzibar sends the same request to one or more additional servers and uses the first successful response.
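Request hedging can be sketched with Python's `asyncio`: send the request to one server and, if it has not answered within a hedge delay, send the same request to a second server and take whichever finishes first. The `fetch` coroutine, the server names, and the fixed delay are hypothetical; a production system would typically derive the delay from recently observed latencies and skip failed responses, which this sketch does not do.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch(server: str, delay: float) -> str:
    """Hypothetical RPC: pretend `server` takes `delay` seconds to respond."""
    await asyncio.sleep(delay)
    return f"allowed (from {server})"


async def hedged_request(primary: Callable[[], Awaitable[str]],
                         secondary: Callable[[], Awaitable[str]],
                         hedge_delay: float) -> str:
    """Run `primary`; if still pending after hedge_delay, also run `secondary`
    and return whichever finishes first (error handling omitted)."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_delay)
    if done:
        return first.result()  # fast path: no hedge needed

    second = asyncio.ensure_future(secondary())
    done, pending = await asyncio.wait({first, second},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop the slower duplicate
    return done.pop().result()


async def main() -> None:
    # The primary is stuck (e.g., a GC pause); the hedged request answers quickly.
    result = await hedged_request(
        lambda: fetch("server-a", 1.0),
        lambda: fetch("server-b", 0.05),
        hedge_delay=0.1,
    )
    print(result)  # allowed (from server-b)


asyncio.run(main())
```

Waiting for a short delay before hedging, rather than always sending two requests, keeps the extra load small because only the slow tail of requests is duplicated.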

Functionality

The Google Zanzibar API consists of five core methods for handling access control: read, write, watch, check, and expand.

  • Read: Used to retrieve stored access control lists (ACLs) or specific permissions associated with objects.
  • Write: Used to create, update, or delete relation tuples (ACL entries).
  • Watch: Used to subscribe to changes in access control information.
  • Check: Used to verify whether a particular subject has permission to access an object.
  • Expand: Used to determine the full set of principals that have a particular kind of access to a resource, returned as a tree that follows indirect relations such as group membership.
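To show how Write, Read, Check, and Expand relate to stored tuples, here is a minimal in-memory sketch. It supports direct tuples and usersets (a subject written as `object#relation`, such as `group:eng#member`), which is how group membership is expressed in Zanzibar's tuple format. The `TupleStore` class and its method names are hypothetical, Watch is omitted, and this Expand flattens its result into a set, whereas Zanzibar's Expand returns a userset tree. Real Zanzibar also evaluates namespace rewrite rules, snapshots, and caching, none of which appear here.

```python
from collections import defaultdict
from typing import Optional


class TupleStore:
    """Hypothetical in-memory store supporting Write, Read, Check, and Expand."""

    def __init__(self) -> None:
        # (object, relation) -> subjects (user IDs or "obj#rel" usersets)
        self._tuples: dict[tuple[str, str], set[str]] = defaultdict(set)

    def write(self, obj: str, relation: str, subject: str) -> None:
        self._tuples[(obj, relation)].add(subject)

    def read(self, obj: str, relation: str) -> set[str]:
        return set(self._tuples.get((obj, relation), set()))

    def check(self, obj: str, relation: str, user: str,
              _seen: Optional[set] = None) -> bool:
        """Does `user` hold obj#relation directly or through nested usersets?"""
        seen = _seen if _seen is not None else set()
        if (obj, relation) in seen:
            return False  # already explored; also guards against cycles
        seen.add((obj, relation))
        for subject in self.read(obj, relation):
            if subject == user:
                return True
            if "#" in subject:  # userset: follow it recursively
                sub_obj, sub_rel = subject.split("#", 1)
                if self.check(sub_obj, sub_rel, user, seen):
                    return True
        return False

    def expand(self, obj: str, relation: str,
               _seen: Optional[set] = None) -> set[str]:
        """All concrete users holding obj#relation, flattening usersets."""
        seen = _seen if _seen is not None else set()
        if (obj, relation) in seen:
            return set()
        seen.add((obj, relation))
        users: set[str] = set()
        for subject in self.read(obj, relation):
            if "#" in subject:
                sub_obj, sub_rel = subject.split("#", 1)
                users |= self.expand(sub_obj, sub_rel, seen)
            else:
                users.add(subject)
        return users


store = TupleStore()
store.write("group:eng", "member", "user:alice")
store.write("doc:readme", "viewer", "group:eng#member")
store.write("doc:readme", "viewer", "user:bob")

print(store.check("doc:readme", "viewer", "user:alice"))  # True (via group:eng)
print(store.check("doc:readme", "viewer", "user:carol"))  # False
print(sorted(store.expand("doc:readme", "viewer")))       # ['user:alice', 'user:bob']
```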

What are the alternatives to Google Zanzibar?

Here are some alternatives to Google Zanzibar:

Open Policy Agent (OPA)

OPA is an open-source, general-purpose policy engine that unifies policy enforcement across the stack. OPA decouples policy decision-making from policy enforcement, letting you specify policy as code and use simple APIs to offload policy decisions from your software. Policies are written in a high-level declarative language called Rego, and your services query them via REST APIs.

A notable downside of OPA is that you typically need to deploy a policy agent next to each service you want to make policy decisions for.

Casbin

Casbin is an open-source access control library originally written for Go projects, but it also supports other languages like Java, PHP, and Node.js. It supports various access control models, such as ACL, RBAC, ABAC, and even RESTful (path- and method-based) rules. Casbin is well-suited for applications that require fine-grained control over user permissions.

Google Zanzibar vs. other authorization technologies

Zanzibar is not the first or only authorization technology out there. Approaches range from home-grown, basic role-based access control (RBAC) libraries to more sophisticated rules and policy engines available on the market.

In general, authorization strategies fall into one of two categories:

Stateful systems

Stateful authorization systems like Zanzibar and other similar relationship-based access control (ReBAC) services store all necessary data internally to make authorization decisions. This includes user roles, relationships, permissions, and any other context needed for decision-making.

These systems are a more natural fit for application-layer authorization (e.g., modeling hierarchies, ownership, access to applications, etc.). They are fully self-contained and can make authorization decisions independently or serve as access-aware indexes.

Stateless systems

Stateless systems like the Open Policy Agent (OPA) do not store any contextual data internally. Instead, the caller must pass all the contextual information needed for an authorization decision in the request.

These systems are more commonly used for infrastructure-layer authorization (e.g., ABAC, IP-range blocks, etc.).
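A stateless check can be written as a pure function: everything it needs arrives in the request, and it keeps no data between calls. The input fields and the rule below (allow reads from a corporate IP range for users in an `engineering` department) are hypothetical, and the sketch is plain Python rather than OPA's Rego.

```python
import ipaddress

# Hypothetical policy constant: the only "state" is the rule itself.
CORPORATE_NET = ipaddress.ip_network("10.0.0.0/8")


def allow(request: dict) -> bool:
    """Decide using only caller-supplied context; no stored relationships."""
    ip = ipaddress.ip_address(request["source_ip"])
    return (
        request["action"] == "read"
        and ip in CORPORATE_NET
        and request["user"].get("department") == "engineering"
    )


print(allow({"action": "read", "source_ip": "10.1.2.3",
             "user": {"id": "alice", "department": "engineering"}}))  # True
print(allow({"action": "read", "source_ip": "203.0.113.7",
             "user": {"id": "alice", "department": "engineering"}}))  # False
```

A stateful relationship check, by contrast, would look up stored tuples such as "alice is a member of group:eng" instead of trusting the caller to supply that fact.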

Which is better?

Deciding which technology is better for your tech stack or application depends on various factors:

  • Application needs: Consider whether the application requires complex relationship tracking and contextual decision-making or if it operates with well-defined, rule-based access controls.
  • Architecture complexity: In more distributed environments, such as microservices, stateless systems might offer better scalability and flexibility.
  • Performance requirements: Stateful systems may provide faster responses for complex queries due to their integrated data, but stateless systems can scale more effectively across distributed environments.
  • Maintenance and overhead: Managing a stateful system's data can require more overhead compared to stateless systems, which offload much of the data management to the application layer.

Both have tradeoffs, and a fully-fledged authorization system will likely use elements of both approaches.

Implementing your own Zanzibar service

Upon reading the Google Zanzibar paper, you might be surprised to learn that Google never productized Zanzibar for outside customers. To date, no Zanzibar service is available in Google Cloud. If you're looking to use a similar service, you'll need to either:

  • Build your own system or use an open-source implementation: Inspired by the principles and architecture outlined in the Zanzibar paper, you can attempt to create your own custom authorization system. 

    Alternatively, you can leverage open-source implementations like OpenFGA and SpiceDB. Opting for an open-source project could save you some development time and provide a solid base to build upon.

    Both options, however, demand a significant investment in development time and expertise in security and distributed systems architecture. 
  • Use a vendor service like WorkOS: WorkOS FGA’s core authorization engine incorporates many concepts from the Zanzibar paper (such as tuples, namespaces, and zookies) and adds features that improve functionality and the developer experience in areas where Zanzibar may be lacking. One major benefit of using WorkOS or a similar vendor is that you avoid deploying and managing a complex distributed system yourself.

In WorkOS FGA, tuples are known as warrants. A warrant includes the same three major components (object, relationship, and subject) plus an optional "policy" component. This policy, a boolean expression evaluated at query time, allows WorkOS FGA to handle dynamic ABAC scenarios, where decisions are made based on external data.
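The idea of a tuple with an optional policy can be sketched as a tuple plus a predicate over request-time context. This is a hypothetical Python illustration of the concept only; the `Warrant` class, the `check` function, and the lambda-based policy do not reproduce WorkOS FGA's actual warrant format, policy language, or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Context = dict  # request-time attributes supplied by the caller


@dataclass(frozen=True)
class Warrant:
    """Hypothetical tuple with an optional policy evaluated at query time."""
    obj: str
    relation: str
    subject: str
    policy: Optional[Callable[[Context], bool]] = None


def check(warrants: list[Warrant], obj: str, relation: str,
          subject: str, context: Context) -> bool:
    for w in warrants:
        if (w.obj, w.relation, w.subject) == (obj, relation, subject):
            # No policy means the relationship alone grants access.
            if w.policy is None or w.policy(context):
                return True
    return False


warrants = [
    # Hypothetical rule: Alice may approve report:q3 only while the amount is under 1000.
    Warrant("report:q3", "approver", "user:alice",
            policy=lambda ctx: ctx.get("amount", 0) < 1000),
]
print(check(warrants, "report:q3", "approver", "user:alice", {"amount": 250}))   # True
print(check(warrants, "report:q3", "approver", "user:alice", {"amount": 5000}))  # False
```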

Namespaces, as described in Zanzibar, are known as object types in WorkOS FGA. Object types are represented as JSON with the ability to set restrictions on which types of objects can be involved in specific relationships. WorkOS FGA also provides pre-built object types for common authorization scenarios like RBAC and multi-tenancy within Zanzibar’s ReBAC model.

Ultimately, WorkOS FGA aims to provide a generic and scalable authorization service capable of modeling diverse use cases and performing access checks globally with low latency.
