← back to distributed systems

CAP Theorem

Brewer 2000, Gilbert & Lynch 2002 · wpWikipedia

The CAP theorem: a distributed data store can provide at most two of three guarantees: Consistency (every read gets the latest write), Availability (every request gets a response), Partition tolerance (the system works despite network partitions). Since partitions happen in practice, the real choice is between C and A during a partition.

Consistency Availability Partition Tolerance CP: HBase, MongoDB, Zookeeper AP: Cassandra, DynamoDB, CouchDB CA: single-node RDBMS (not distributed) During a partition: CP waits, AP serves stale data

The three properties

Consistency: linearizability. Every read returns the value of the most recent write. Availability: every non-failing node returns a response for every request. Partition tolerance: the system continues to operate despite arbitrary message loss between nodes. In any real network, partitions happen, so you must tolerate them. The theorem says you then must choose: block during a partition (CP) or return possibly stale data (AP).

Scheme

The real choice: latency vs consistency

CAP is often oversimplified. The deeper insight (from Abadi's PACELC extension): even when there is no partition, there is a tradeoff between latency and consistency. Synchronous replication is consistent but slow. Asynchronous replication is fast but eventually consistent. The partition case is just the extreme version of this tradeoff.

Scheme
Neighbors

Cross-references