CAP Theorem in System Design

By Oleksandr Andrushchenko

The CAP theorem is one of the most fundamental principles in distributed system design. It defines the trade-offs between three key properties — Consistency, Availability, and Partition tolerance — and explains why it’s impossible to achieve all three simultaneously in a distributed data system.

What the CAP theorem states

Formally introduced by Eric Brewer in 2000 and later proven by Gilbert and Lynch, the CAP theorem states:

In the presence of a network partition, a distributed system must choose between consistency and availability.

This means that when parts of the system can’t communicate reliably (due to network failures, delays, or node crashes), the system must decide: either it refuses some requests to remain consistent, or it serves all requests but may return stale or inconsistent data.

The three properties explained

  • Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
  • Availability (A): Every request receives a (non-error) response, regardless of whether it’s the latest data.
  • Partition Tolerance (P): The system continues to operate despite network partitions or communication failures between nodes.

Visualizing the CAP triangle

Distributed databases can be visualized as points inside a triangle — each vertex representing one of the properties. When a network partition occurs, you can only guarantee two of the three.

| Combination | Guarantees | Examples |
| --- | --- | --- |
| CA (Consistency + Availability) | No partition tolerance; works only in failure-free networks. | Traditional single-node relational databases (e.g., PostgreSQL without replication). |
| CP (Consistency + Partition Tolerance) | Guarantees correctness but may sacrifice availability during partitions. | ZooKeeper, HBase, etcd, MongoDB (in majority-write mode). |
| AP (Availability + Partition Tolerance) | Stays available but may temporarily serve stale or inconsistent data. | Cassandra, DynamoDB, CouchDB, Riak. |

CAP theorem in practice

In real-world distributed systems, partitions are not a matter of if but when. Therefore, partition tolerance is mandatory, and system designers typically choose between consistency and availability under failure conditions.

During normal operation (no partitions), modern systems often provide tunable consistency, allowing applications to choose between strong and eventual consistency depending on the use case.
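One common way to make consistency tunable is quorum configuration: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and choosing R + W > N guarantees that every read quorum overlaps every write quorum. The sketch below illustrates this under simplifying assumptions (in-memory replicas, all reachable, explicit version numbers); the class and function names are illustrative, not any particular database's API.

```python
# Minimal sketch of tunable quorum consistency with N in-memory replicas.
# If R + W > N, every read quorum overlaps every write quorum, so a read
# always sees the latest committed version; lowering R or W favors latency.

N = 3

class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

def write(replicas, key, value, version, w):
    acks = 0
    for rep in replicas:
        rep.data[key] = (version, value)  # every replica reachable in this sketch
        acks += 1
    return acks >= w

def read(replicas, key, r):
    # take the highest-versioned entry among the first r replicas that answer
    answers = [rep.data.get(key, (0, None)) for rep in replicas[:r]]
    return max(answers)[1]

replicas = [Replica() for _ in range(N)]
write(replicas, "cfg", "v1", version=1, w=2)
write(replicas, "cfg", "v2", version=2, w=2)
assert read(replicas, "cfg", r=2) == "v2"  # R + W = 4 > N, so the read sees v2
```

Lower settings such as W=1, R=1 accept and serve requests faster but may return an older version, which is exactly the latency/consistency dial that systems like Cassandra and DynamoDB expose.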

Examples

1. CP system (Consistency first)

Systems like etcd or ZooKeeper prefer consistency. During partitions, they may reject writes or block reads to prevent serving stale data. This is common for systems maintaining configuration, coordination, or leader election.

```python
# simplified pseudo-code: consistent write with coordination
def write_consistent(cluster, key, value):
    quorum = len(cluster) // 2 + 1
    ack = 0
    for node in cluster:
        if node.reachable():
            node.store[key] = value
            ack += 1
    if ack >= quorum:
        return "Write committed"
    else:
        raise Exception("Not enough nodes for consistency")
```
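To see both outcomes of a quorum write, the logic can be exercised with a minimal mock node (the `Node` class below is a hypothetical stand-in for a real cluster member):

```python
# Hypothetical mock node to exercise a quorum write under partition.
class Node:
    def __init__(self, up=True):
        self.up = up
        self.store = {}

    def reachable(self):
        return self.up

def write_consistent(cluster, key, value):
    quorum = len(cluster) // 2 + 1
    acks = 0
    for node in cluster:
        if node.reachable():
            node.store[key] = value
            acks += 1
    if acks >= quorum:
        return "Write committed"
    raise Exception("Not enough nodes for consistency")

healthy = [Node(), Node(), Node()]
print(write_consistent(healthy, "leader", "node-1"))  # Write committed

partitioned = [Node(), Node(up=False), Node(up=False)]  # only a minority reachable
try:
    write_consistent(partitioned, "leader", "node-2")
except Exception as e:
    print(e)  # Not enough nodes for consistency
```

The second call fails on purpose: a CP system would rather refuse the write than let the minority side of a partition diverge.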

2. AP system (Availability first)

Systems like Cassandra or DynamoDB prioritize availability. Each replica can accept writes even if others are unreachable. Conflicts are later resolved through replication and reconciliation mechanisms.

```python
# simplified eventual consistency example
def write_available(node, key, value):
    node.local_store[key] = value
    node.replicate_async(key, value)  # replicate later
    return "Write accepted"
# reads may return stale data until replicas converge
```
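One common reconciliation strategy is last-write-wins: each entry carries a timestamp, and when two diverged replicas sync, the newer entry survives. This is a simplified sketch of the idea (real systems such as Cassandra apply timestamps per cell and handle clock skew more carefully; the data layout here is illustrative):

```python
# Sketch of last-write-wins reconciliation between two diverged replicas.
# Each replica maps key -> (timestamp, value); merging keeps the newer entry.

def merge(replica_a, replica_b):
    merged = dict(replica_a)
    for key, entry in replica_b.items():
        if key not in merged or entry[0] > merged[key][0]:
            merged[key] = entry
    return merged

a = {"cart": (10, ["book"])}
b = {"cart": (12, ["book", "pen"]), "theme": (5, "dark")}
print(merge(a, b))  # {'cart': (12, ['book', 'pen']), 'theme': (5, 'dark')}
```

Last-write-wins can silently drop concurrent updates, which is why some AP stores (e.g., Riak) also offer vector clocks or CRDTs for conflict resolution.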

Beyond CAP: PACELC theorem

Daniel Abadi proposed PACELC as an extension to CAP, observing that even without partitions (P), systems still face a trade-off between latency (L) and consistency (C):

If Partition (P) occurs → choose between Availability (A) and Consistency (C); Else → choose between Latency (L) and Consistency (C).

This means systems constantly balance performance and correctness, not just during failures.
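The "else" branch of PACELC can be made concrete with a toy read path: even with no partition, a client can either read its local replica immediately (low latency, possibly stale) or contact other replicas first (consistent, but paying round-trip latency). The functions and the latency model below are illustrative assumptions, not a real client API:

```python
# Sketch of PACELC's latency/consistency trade-off with no partition present.
import time

def read_local(node):
    # answer from the local replica: no network hops, but possibly stale
    return node["value"]

def read_quorum(nodes, rtt=0.01):
    # consult all replicas first: consistent, but pays simulated round trips
    time.sleep(rtt * len(nodes))
    return max(nodes, key=lambda n: n["version"])["value"]

nodes = [{"version": 1, "value": "old"},
         {"version": 2, "value": "new"},
         {"version": 2, "value": "new"}]
print(read_local(nodes[0]))  # 'old' -- fast but stale
print(read_quorum(nodes))    # 'new' -- consistent, slower
```

Both reads succeed; the difference is purely how much latency the system spends to guarantee freshness, which is the trade-off PACELC highlights.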

Choosing the right balance

The “right” choice depends on your system’s goals:

  • CP systems — use when correctness is more important than uptime (financial transactions, configuration stores).
  • AP systems — use when availability and responsiveness matter most (caches, analytics, social feeds).
  • CA systems — typically single-node or tightly coupled systems where partitions are not expected.

Guidelines for system designers

  1. Assume partitions will happen; design for failure.
  2. Identify which operations must remain consistent vs. available.
  3. Use tunable consistency (like DynamoDB’s strongly consistent read option) when possible.
  4. Document consistency behavior clearly for API consumers.
  5. Use monitoring and chaos testing to understand behavior under partitions.

Summary

The CAP theorem doesn’t prescribe what to build — it clarifies the unavoidable trade-offs in distributed design. When partitions occur, systems can’t have it all. Understanding CAP helps architects reason about data correctness, resilience, and user experience — and design systems that behave predictably under real-world network conditions.
In practice, successful architectures embrace partition tolerance and provide a configurable balance between availability and consistency — because in distributed systems, trade-offs are not optional; they are design choices.