Raft

Ongaro & Ousterhout 2014 · Wikipedia

Raft is a consensus algorithm designed to be understandable. It decomposes consensus into three subproblems: leader election, log replication, and safety. It provides the same fault tolerance and guarantees as Multi-Paxos but with a clearer structure. Each term has at most one leader. The leader appends client entries to its log and replicates them to the followers. An entry is committed once it is stored on a majority of nodes.

Leader election

Nodes start as followers. If a follower receives no heartbeat from a leader within its election timeout, it becomes a candidate and starts an election: it increments its term, votes for itself, and requests votes from the other nodes. Each node grants at most one vote per term, so a candidate that collects votes from a majority wins and becomes leader for that term. If the vote splits and no candidate reaches a majority, the candidates time out and a new election starts in a new term. Randomized timeouts make repeated ties unlikely.

Scheme

; Raft leader election.
; Nodes: follower -> candidate -> leader.
; Need majority to win.

(define num-nodes 5)
(define majority (+ 1 (quotient num-nodes 2)))  ; 3

(define (election candidate-id votes-received)
  (display "Node ") (display candidate-id)
  (display " got ") (display votes-received) (display " votes")
  (display " (need ") (display majority) (display "): ")
  (if (>= votes-received majority)
      (display "ELECTED LEADER")
      (display "election failed"))
  (newline))

(display "5 nodes, majority = ") (display majority) (newline)
(election 0 3)  ; wins
(election 1 2)  ; loses
(election 2 4)  ; wins
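
The tally above assumes the votes have already been cast. As a complementary sketch, here is one way a single node might decide whether to grant a vote, following the one-vote-per-term rule. The names current-term, voted-for, and handle-request-vote are hypothetical, and the check on the candidate's log is deliberately left out.

```scheme
; Sketch: how one node decides whether to grant a vote.
; Hypothetical names; the log up-to-date check is omitted here.

(define current-term 1)
(define voted-for #f)   ; #f means no vote cast in current-term

(define (handle-request-vote candidate-id candidate-term)
  ; A higher term moves this node into the new term and clears its vote.
  (if (> candidate-term current-term)
      (begin (set! current-term candidate-term)
             (set! voted-for #f)))
  (cond
    ((< candidate-term current-term) #f)   ; stale candidate
    ((or (not voted-for) (equal? voted-for candidate-id))
     (set! voted-for candidate-id)
     #t)
    (else #f)))                            ; already voted this term

(define (show-vote candidate-id candidate-term)
  (display "Vote for node ") (display candidate-id)
  (display " in term ") (display candidate-term) (display ": ")
  (display (if (handle-request-vote candidate-id candidate-term)
               "granted"
               "denied"))
  (newline))

(show-vote 0 2)  ; granted: new term, no vote cast yet
(show-vote 1 2)  ; denied: already voted for node 0 in term 2
(show-vote 1 1)  ; denied: term 1 is stale
(show-vote 1 3)  ; granted: term 3 is new
```

Because each node votes at most once per term, two candidates cannot both collect a majority in the same term.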

Log replication

The leader receives client requests and appends them to its log. It sends AppendEntries RPCs to all followers. Once a majority of nodes (counting the leader) has stored the entry, the leader commits it and responds to the client. Followers apply committed entries to their state machines in log order. If a follower is behind, the leader sends it the missing entries.

Scheme

; Raft log replication.
; Leader appends, replicates, commits on majority.

(define leader-log (list))
(define f1-log (list))
(define f2-log (list))
(define commit-index 0)

(define (append-entry entry)
  (set! leader-log (append leader-log (list entry)))
  (display "Leader appends: ") (display entry) (newline))

(define (replicate-to-follower follower-name)
  ; Simplification: copy the leader's whole log to the follower
  (cond
    ((equal? follower-name "F1")
     (set! f1-log leader-log))
    ((equal? follower-name "F2")
     (set! f2-log leader-log)))
  (display "  Replicated to ") (display follower-name) (newline))

(define (try-commit index)
  ; Count how many of the 3 nodes hold an entry at this index
  (let ((count (+ (if (>= (length leader-log) index) 1 0)
                  (if (>= (length f1-log) index) 1 0)
                  (if (>= (length f2-log) index) 1 0))))
    (if (>= count 2)  ; majority of 3 nodes
        (begin (set! commit-index index)
               (display "Committed index ") (display index) (newline))
        (begin (display "Not yet committed") (newline)))))

(append-entry "x=1")
(replicate-to-follower "F1")
(replicate-to-follower "F2")
(try-commit 1)

(append-entry "y=2")
(replicate-to-follower "F1")
; F2 is slow, has not received y=2 yet
(try-commit 2)  ; 2 of 3 have it -> committed
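
The example above copies the leader's whole log on every replication. A slightly closer sketch of "the leader sends missing entries" is shown here, assuming the follower's log is a prefix of the leader's; the case where the logs conflict is not modeled. The helper names drop-first, missing-entries, and catch-up are hypothetical.

```scheme
; Sketch: bringing a lagging follower up to date.
; Assumption: the follower's log is a prefix of the leader's.

(define (drop-first lst n)
  ; Return lst without its first n elements.
  (if (or (= n 0) (null? lst))
      lst
      (drop-first (cdr lst) (- n 1))))

(define (missing-entries leader-entries follower-entries)
  ; Entries the leader holds beyond the follower's last index.
  (drop-first leader-entries (length follower-entries)))

(define (catch-up leader-entries follower-entries)
  ; The follower appends what it was missing, preserving log order.
  (append follower-entries
          (missing-entries leader-entries follower-entries)))

(define demo-leader (list "x=1" "y=2" "z=3"))
(define demo-follower (list "x=1"))

(display "Missing: ")
(display (missing-entries demo-leader demo-follower)) (newline)
(display "After catch-up: ")
(display (catch-up demo-leader demo-follower)) (newline)
```

Sending only the suffix keeps the message small when a follower is only slightly behind.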

Safety and liveness

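Safety: once an entry is committed, every future leader has that same entry at that index. Raft guarantees this via the election restriction: a node grants its vote only if the candidate's log is at least as up-to-date as its own, so a winning candidate's log is at least as up-to-date as the logs of a majority and therefore contains every committed entry. Liveness: the system makes progress as long as a majority of nodes are running and can communicate with each other in a timely way. Randomized election timeouts make repeated split votes unlikely, which avoids livelock in practice.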
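
Scheme

The "at least as up-to-date" comparison can be sketched as follows. Raft compares the term of the last entry first and falls back to log length only when those terms are equal. The function and parameter names here are hypothetical, and each log is summarized by its last entry's term and index.

```scheme
; Sketch: the "at least as up-to-date" comparison used when voting.
; Each log is summarized by the term and index of its last entry.

(define (up-to-date? cand-last-term cand-last-index
                     voter-last-term voter-last-index)
  ; The later last term wins; on equal terms the longer log wins.
  (or (> cand-last-term voter-last-term)
      (and (= cand-last-term voter-last-term)
           (>= cand-last-index voter-last-index))))

(define (check label ct ci vt vi)
  (display label) (display ": ")
  (display (if (up-to-date? ct ci vt vi)
               "vote allowed"
               "vote refused"))
  (newline))

(check "higher last term, shorter log" 3 2 2 5)  ; vote allowed
(check "same term, longer log"         2 6 2 5)  ; vote allowed
(check "same term, shorter log"        2 4 2 5)  ; vote refused
(check "lower last term, longer log"   1 9 2 5)  ; vote refused
```

The last case shows why length alone is not enough: a long log of entries from an old term can still be behind a shorter log that ends in a newer term.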

Neighbors

Cross-references

🌐 Ch.4 Consensus — the general consensus problem and FLP impossibility
🌐 Ch.6 Byzantine Fault Tolerance — what happens when nodes can lie, not just crash

by june.kim