What is the CAP theorem?

Aug 27, 2024

Coined in 1998 by Prof. Eric Brewer of the University of California, Berkeley, the CAP theorem states the following: “In any distributed database design, in the face of Partition Tolerance, you can choose to design for either Consistency (CP) or Availability (AP). You cannot have both.”

Consistency

Consistency means that all clients see the same data at the same time, no matter which node they connect to.

For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’

Strategies for Ensuring Consistency

Ensuring consistency in distributed systems often involves trade-offs with other CAP theorem properties, particularly Availability. Here are common strategies for achieving consistency:

Strong Consistency: This approach requires that all read and write operations are performed on the entire dataset, ensuring that data remains consistent across all nodes. However, it can lead to increased latency and reduced availability, especially during network partitions.
Eventual Consistency: Eventual consistency acknowledges that, in certain scenarios, it may be acceptable for nodes to temporarily have slightly different versions of data. Over time, all nodes will converge to a consistent state. This approach prioritizes availability and performance over strict consistency.
Quorum-based Systems: Many distributed databases use quorums to reach consensus. A quorum is a subset of nodes that must agree on a data change before it is considered valid. Quorum-based systems strike a balance between consistency and availability by requiring a majority vote for operations.
Consistency Models: Different consistency models, such as linearizability, sequential consistency, and causal consistency, offer varying levels of strictness in ensuring consistency. The choice of a model depends on the specific requirements of the application.

Availability

Availability means that any client making a request for data gets a response, even if one or more nodes are down.

Another way to state this—all working nodes in the distributed system return a valid response for any request, without exception.

Use Cases

Highly Available Web Services: E-commerce platforms, social media networks, and online services require high availability to ensure uninterrupted access for users.
Real-time Communication: Messaging and communication systems, like chat applications and video conferencing tools, rely on availability to provide seamless interactions.
Disaster Recovery: Availability is crucial in disaster recovery scenarios, where systems must continue operating despite catastrophic events.

Partition tolerance

A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes.

Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.

Achieving Partition Tolerance

Partition tolerance is typically achieved through various techniques and strategies in distributed systems, including:

Replication: Creating redundant copies of data and services across different nodes and regions to ensure that even if some nodes are unreachable, the system can continue to operate using available replicas.
Load Balancing: Distributing incoming requests or traffic evenly across multiple nodes to prevent overloading any single component and reduce the impact of a potential partition.
Quorum Systems: Implementing quorum-based algorithms that require a minimum number of nodes to agree on a decision or operation. This ensures that critical tasks can proceed even if some nodes are unreachable.

CAP theorem NoSQL database types

CP database: A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.
AP database: An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)
CA database: A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance.

CAP theorem and microservices

Microservices are defined as loosely coupled services that can be independently developed, deployed, and maintained. They include their own stack, database, and database model, and communicate with each other through a network. Microservices have become especially popular in hybrid cloud and multi-cloud environments, and they are also widely used in on-premises data centers. If you want to create a microservices application, you can use the CAP theorem to help you determine a database that will best fit your needs.

Here's a simple table summarizing the use cases and trade-offs for Consistency (C), Availability (A), and Partition Tolerance (P) in the context of the CAP theorem:

This table provides a quick reference to understand how each element of the CAP theorem might be applied and what compromises are typically made in each scenario.

Better Engineers

Discussion about this post