← back to distributed systems

Raft

Ongaro & Ousterhout 2014 · wpWikipedia

Raft is a consensus algorithm designed to be understandable. It decomposes consensus into three subproblems: leader election, log replication, and safety. It provides the same guarantees as Paxos but with a clearer structure. One leader per term. The leader appends entries to its log and replicates them. An entry is committed when a majority of nodes have it.

Leader election

Nodes start as followers. If a follower receives no heartbeat from a leader within a timeout, it becomes a candidate and starts an election. It increments the term, votes for itself, and requests votes from others. A node wins if it gets a majority. If two candidates split the vote, a new election starts with a new term. Randomized timeouts prevent repeated ties.

Scheme
Leader F1 F2 x=1 y=2 x=3 z=4 x=1 y=2 x=3 x=1 y=2 commit index committed uncommitted

Log replication

The leader receives client requests and appends them to its log. It sends AppendEntries RPCs to all followers. Once a majority has the entry, the leader commits it and responds to the client. Followers apply committed entries to their state machines in log order. If a follower is behind, the leader sends missing entries.

Scheme

Safety and liveness

Safety: if an entry is committed, no future leader will have a different entry at that index. Raft guarantees this via the election restriction: a candidate must have a log at least as up-to-date as a majority. Liveness: the system makes progress as long as a majority of nodes and the network are working. Randomized election timeouts prevent livelock.

Neighbors

Cross-references