Consistency in System Design
By Oleksandr Andrushchenko
Consistency is one of the foundational concerns when designing distributed systems. It answers the question: when a client reads data, how fresh or correct should that data be compared to the most recent write? Different applications have different tolerances for stale or conflicting data. This article explains common consistency models, trade-offs, and practical patterns you can apply when building real systems.
Why consistency matters
Consistency affects correctness, user experience, and system complexity. A bank transfer system that shows inconsistent balances is unacceptable. A news feed or shopping catalog can often tolerate slightly stale data for much better availability and latency. Understanding required consistency lets you make sensible architecture choices.
High-level trade-offs: CAP and PACELC
The CAP theorem (Consistency, Availability, Partition tolerance) says that during a network partition a system must choose between consistency and availability. Partitions are rare but inevitable, so every system must decide where it stands on the consistency–availability spectrum.
PACELC expands CAP by noting that even without partitions you trade off latency vs consistency. Many design decisions are about where you draw these lines.
Common consistency models
- Strong consistency — a read always returns the latest committed write. This simplifies reasoning but costs latency and/or availability (often requires coordination like consensus).
- Linearizability — a strict form of strong consistency that preserves real-time ordering of operations.
- Sequential consistency — operations appear in a total order consistent across clients, but not necessarily tied to real time.
- Eventual consistency — writes propagate asynchronously; eventually all replicas converge. This provides high availability and low latency but allows temporary divergence.
- Causal consistency — if operation B causally depends on A, all clients will see A before B. Causal consistency is strictly stronger than eventual consistency but weaker than strong consistency.
- Session guarantees — practical client-side guarantees like read-your-writes, monotonic reads, and monotonic writes that help UX without full strong consistency.
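Causal ordering can be tracked mechanically with vector clocks. The sketch below is illustrative (the function names are ours, not a library API): each replica keeps a map from node id to a logical counter, and event A happens-before event B exactly when A's clock is component-wise less than or equal to B's and strictly less in at least one component.

```python
# Minimal vector-clock sketch for causal ordering (illustrative names).

def increment(clock, node):
    # Record a local event on `node` by bumping its counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(clock_a, clock_b):
    # Combine knowledge from two replicas: component-wise max.
    nodes = set(clock_a) | set(clock_b)
    return {n: max(clock_a.get(n, 0), clock_b.get(n, 0)) for n in nodes}

def happens_before(clock_a, clock_b):
    # True if the event stamped clock_a causally precedes clock_b.
    nodes = set(clock_a) | set(clock_b)
    le = all(clock_a.get(n, 0) <= clock_b.get(n, 0) for n in nodes)
    lt = any(clock_a.get(n, 0) < clock_b.get(n, 0) for n in nodes)
    return le and lt

a = increment({}, "A")            # A writes
b = increment(merge({}, a), "B")  # B observes A's write, then writes
assert happens_before(a, b)       # so A's write causally precedes B's
```

If neither `happens_before(a, b)` nor `happens_before(b, a)` holds, the events are concurrent, which is exactly the case that needs conflict resolution.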
Patterns to achieve consistency
1. Leader-based replication
A single leader accepts writes and replicates to followers. This is simple and can provide strong consistency (if reads go to the leader) or tunable consistency (if some reads are served from followers).
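A minimal sketch of this routing choice, with toy Node and Cluster classes of our own invention (not a real client library), replicating synchronously for simplicity:

```python
# Leader-based replication sketch: writes go to the leader; reads can be
# routed to the leader (fresh) or a follower (possibly stale under async
# replication). Node and Cluster are illustrative, not a real library.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def write(self, key, value):
        # All writes go through the single leader, then replicate out.
        self.leader.data[key] = value
        for follower in self.followers:   # synchronous for simplicity
            follower.data[key] = value

    def read(self, key, strong=False):
        # strong=True reads from the leader and sees the latest write;
        # otherwise a follower serves the read, trading freshness for load.
        node = self.leader if strong else self.followers[0]
        return node.data.get(key)

cluster = Cluster(Node("leader"), [Node("f1"), Node("f2")])
cluster.write("balance", 100)
assert cluster.read("balance", strong=True) == 100
```

With asynchronous replication the follower read could return a stale value, which is the tunable-consistency case described above.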
2. Quorum reads/writes
Use R (read quorum) and W (write quorum) such that R + W > N (replica count). This gives a tunable trade-off: higher W favors stronger consistency, lower W favors availability and latency.
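The overlap argument can be demonstrated with a small sketch (illustrative, not a real client): because R + W > N, every read quorum intersects every write quorum, so at least one replica in any read sample holds the latest version.

```python
# Quorum read/write sketch over N versioned replicas (illustrative).
# With R + W > N, read and write quorums always overlap, so a quorum
# read that picks the highest version sees the latest committed write.
import random

N, W, R = 5, 3, 3          # R + W = 6 > N = 5
replicas = [{"value": None, "version": 0} for _ in range(N)]

def quorum_write(value, version):
    # The write succeeds once W replicas acknowledge it.
    for replica in random.sample(replicas, W):
        replica["value"] = value
        replica["version"] = version

def quorum_read():
    # Read R replicas and return the value with the highest version.
    sampled = random.sample(replicas, R)
    return max(sampled, key=lambda r: r["version"])["value"]

quorum_write("v1", version=1)
quorum_write("v2", version=2)
assert quorum_read() == "v2"   # guaranteed: read quorum overlaps write quorum
```

Real systems (e.g. Dynamo-style stores) add read repair and version reconciliation on top of this basic overlap property.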
3. Consensus protocols (Paxos / Raft)
For strict linearizability across multiple nodes, use a consensus protocol. It provides strong consistency but involves leader election and more coordination overhead.
4. Conflict-free Replicated Data Types (CRDTs)
CRDTs are data structures designed so replicas can update independently and merge deterministically without central coordination. They are excellent for highly available collaborative systems where deterministic conflict resolution is needed.
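A grow-only counter (G-Counter) is the classic introductory CRDT; the class below is a minimal sketch of our own, not a library API. Each replica increments only its own slot, and merge takes the per-replica maximum, which makes merging commutative, associative, and idempotent, so replicas converge regardless of delivery order:

```python
# Minimal G-Counter CRDT sketch (illustrative, not a library API).

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}          # replica_id -> local increment total

    def increment(self, amount=1):
        # Each replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Deterministic merge: per-replica max, safe to apply repeatedly
        # and in any order.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

a, b = GCounter("A"), GCounter("B")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5   # replicas converge after state exchange
```

Counters, sets, and registers all have CRDT variants; the common thread is that the merge function, not a coordinator, resolves concurrent updates.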
5. Versioning and compare-and-swap (CAS)
Optimistic concurrency control uses versions or vector clocks. A write reads the version, attempts an update, and wins only if the version hasn't changed (CAS). This reduces blocking but requires conflict handling on failures.
Practical techniques and safeguards
- Idempotency: make operations safe to retry to handle uncertain outcomes across networks.
- Read-after-write consistency: ensure clients reading after a successful write see their update — often a session guarantee implemented by sticking a client to a replica or by reading from the leader.
- Cache invalidation: design strong cache invalidation rules or short TTLs when correctness matters.
- Background reconciliation: anti-entropy (periodic replica syncing) and CRDT merges to repair divergence.
- Application-level conflict resolution: when automatic merging is impossible, apply deterministic rules (last-writer-wins, highest-priority source) or surface conflicts for human resolution.
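As a concrete example of the idempotency safeguard above, a client-supplied idempotency key lets a server detect retries and return the recorded result instead of applying the side effect twice. The PaymentService class and names below are illustrative assumptions, not a real API:

```python
# Sketch of retry-safe request handling with an idempotency key
# (illustrative; class and method names are our own).

class PaymentService:
    def __init__(self):
        self.processed = {}   # idempotency_key -> result of first execution
        self.balance = 0

    def charge(self, idempotency_key, amount):
        # A retried request with the same key returns the stored result
        # instead of charging again.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        self.balance += amount
        result = {"status": "ok", "balance": self.balance}
        self.processed[idempotency_key] = result
        return result

svc = PaymentService()
first = svc.charge("req-123", 50)
retry = svc.charge("req-123", 50)   # a timeout led the client to retry
assert svc.balance == 50            # charged once despite the retry
assert first == retry
```

This is why idempotency keys are the standard answer to "the network timed out; did my write happen?" across unreliable links.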
Developer-friendly guidelines
- Start by specifying correctness requirements — which operations must be strongly consistent and which can tolerate eventual consistency.
- Prefer session guarantees for UX-sensitive features (e.g., read-your-writes for a user’s recent actions).
- Keep critical paths strongly consistent (accounting, billing), and move non-critical reads to eventually consistent caches/replicas.
- Instrument and simulate partitions and delays to observe behavior under failure modes.
- Document the consistency model in your API contract so clients know what to expect.
Quick comparison
| Model | Latency | Availability | Typical Use Cases |
|---|---|---|---|
| Strong / Linearizable | Higher | Lower during partitions | Banking, leader election, inventory decrement |
| Sequential | Higher than eventual | Medium | Where global ordering matters but strict real-time isn't required |
| Causal | Low–medium | High | Social feeds, collaborative apps |
| Eventual | Low | High | Caches, analytics, catalogs |
Small, practical Python examples
Compare-and-swap (optimistic update)
```python
# Simple CAS pattern using a versioned record (pseudocode):
#   store.get(key) -> (value, version)
#   store.put_if_version(key, new_value, expected_version) -> True/False

def increment_counter(store, key):
    while True:
        value, version = store.get(key)
        new_value = value + 1
        success = store.put_if_version(key, new_value, version)
        if success:
            return new_value
        # retry on version mismatch
```
Read-your-writes session guarantee (client stickiness)
```python
# Naive session approach: bind writes to the leader for a session;
# the client session stores the preferred_node after a write.

def write_and_read(session, store, key, value):
    # write to the leader (or the node used for previous writes in this session)
    node = session.get('preferred_node') or store.leader()
    node.put(key, value)
    session['preferred_node'] = node
    # read from the same node to ensure read-your-writes
    return node.get(key)
```
When to pick what
If your system can't tolerate anomalies (double-spend, inconsistent inventory), favor stronger guarantees with consensus or single-writer patterns. If you need low-latency global reads and can accept temporary divergence, eventual or causal models with good conflict resolution are better.
Summary
Consistency is not a single on/off switch; it's a spectrum of models and trade-offs. Design systems by explicitly classifying operations by their consistency needs, choosing appropriate replication and coordination patterns, and using pragmatic techniques (session guarantees, idempotency, reconciliation) to reduce complexity for clients. The right choice depends on the domain, failure assumptions, and user expectations — but explicit, documented consistency contracts will always make your system easier to reason about and safer to evolve.
Further reading suggestions (to explore after this primer): CAP theorem proofs, Raft/Paxos tutorials, CRDT libraries and patterns, and real-world case studies on consistency vs. availability decisions.