Replication
Wikipedia · Cambridge Distributed Systems · Replication
Copy data to multiple nodes for fault tolerance and lower latency. The hard part is keeping copies consistent. Three architectures: leader/follower, multi-leader, leaderless. The quorum condition R + W > N guarantees you read at least one up-to-date copy.
Why replicate
Single copies are single points of failure. Replication buys you: (1) fault tolerance: if a node dies, others have the data; (2) lower latency: serve reads from the nearest copy; (3) higher throughput: spread read load across replicas. The cost is consistency: when one copy is updated, others lag behind.
Leader/follower (primary/backup)
One node is the leader. All writes go to the leader. The leader replicates to followers. Reads can go to any replica (but might be stale). If the leader dies, a follower must be promoted. Simple, but the leader is a bottleneck and a single point of failure for writes.
Quorum reads and writes
In a leaderless system with N replicas, write to W nodes and read from R nodes. If R + W > N, at least one node in every read set has the latest write. This is the quorum condition. With N=3, R=2, W=2: every read overlaps every write.
Consistency models
Strong consistency: every read sees the most recent write. Requires coordination (slow). Eventual consistency: replicas converge eventually, but reads may be stale. Causal consistency: if A caused B, everyone sees A before B, but concurrent events may appear in any order. Stronger is safer, weaker is faster.
Neighbors
Cross-references
- 🌐 Ch.7 CAP Theorem — the fundamental tradeoff between consistency and availability
- 🌐 Ch.10 CRDTs — eventual consistency without coordination
june.kim/caches-all-the-way-down — replication as caching: every replica is a cache of the leader's state