Gossip Protocols

Wikipedia · Gossip protocol

Gossip protocols spread information like an epidemic. Each node periodically picks a random peer and exchanges state. After O(log N) rounds, all N nodes have the information with high probability. There is no central coordinator and no single point of failure, and dissemination is robust to node failures. The tradeoff: consistency is only eventual, and redundant messages cost bandwidth.

Epidemic dissemination

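A node with new information "infects" a random peer each round. That peer infects another. While few nodes are informed, the number of informed nodes roughly doubles each round, so about log2(N) rounds reach most of the network. The last few nodes take a little longer because many messages land on peers that already know, but the total stays O(log N). Three styles: push (I send you my data), pull (I ask you for your data), push-pull (we exchange). Push-pull converges fastest.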

Scheme

; Idealized gossip model: each informed node infects one uninformed peer
; per round. Count how many rounds it takes to reach all n nodes.

(define (gossip-rounds n)
  ; Best case: every message lands on a node that did not know yet,
  ; so the informed count doubles each round (capped at n).
  (let loop ((informed 1) (round 0))
    (if (>= informed n)
        (begin
          (display "N=") (display n)
          (display ", rounds=") (display round) (newline))
        (loop (min n (* informed 2)) (+ round 1)))))

(gossip-rounds 4)
(gossip-rounds 8)
(gossip-rounds 16)
(gossip-rounds 64)
(gossip-rounds 1024)
(gossip-rounds 1000000)
; Prints rounds = 2, 3, 4, 6, 10, 20: ceiling(log2 N), i.e. O(log N) rounds
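
The doubling model above is a best case. A rough way to see how random peer selection changes the picture is to simulate it. The sketch below assumes synchronous rounds, lets a node pick itself as a peer for simplicity, and uses a small hand-rolled pseudo-random generator so that it needs no library. The names next-random!, gossip-round, and rounds-to-inform-all are hypothetical helpers introduced here, not part of any gossip library.

```scheme
; Randomized gossip in synchronous rounds (illustrative sketch).
; next-random! is a small linear congruential generator, chosen so the run
; is deterministic and needs no library. It is not a high-quality RNG.

(define seed 12345)

(define (next-random! k)
  ; Advance the generator, then scale the 31-bit state into [0, k).
  (set! seed (modulo (+ (* seed 1103515245) 12345) 2147483648))
  (quotient (* seed k) 2147483648))

(define (count-informed state)
  (let loop ((i 0) (count 0))
    (if (= i (vector-length state))
        count
        (loop (+ i 1) (if (vector-ref state i) (+ count 1) count)))))

(define (copy-state state)
  (let ((next (make-vector (vector-length state) #f)))
    (let loop ((i 0))
      (if (< i (vector-length state))
          (begin (vector-set! next i (vector-ref state i))
                 (loop (+ i 1)))
          next))))

; One round: every node contacts one random peer. Decisions read the old
; state and write to a copy, so news travels at most one hop per round.
(define (gossip-round state push-pull?)
  (let ((n (vector-length state))
        (next (copy-state state)))
    (let loop ((i 0))
      (if (< i n)
          (let ((peer (next-random! n)))
            ; Push: an informed caller informs its peer.
            (if (vector-ref state i)
                (vector-set! next peer #t))
            ; Pull half of push-pull: the caller learns from an informed peer.
            (if (and push-pull? (vector-ref state peer))
                (vector-set! next i #t))
            (loop (+ i 1)))
          next))))

; Rounds until all n nodes are informed, starting with node 0 informed.
(define (rounds-to-inform-all n push-pull?)
  (let ((start (make-vector n #f)))
    (vector-set! start 0 #t)
    (let loop ((state start) (round 0))
      (if (= (count-informed state) n)
          round
          (loop (gossip-round state push-pull?) (+ round 1))))))

(define (report n)
  (display "N=") (display n)
  (display ", push rounds=") (display (rounds-to-inform-all n #f))
  (display ", push-pull rounds=") (display (rounds-to-inform-all n #t))
  (newline))

(report 16)
(report 64)
(report 1024)
```

With push alone the informed set can at most double per round, so the push count is never below ceiling(log2 N), and it is usually somewhat higher because late pushes mostly hit nodes that already know. Push-pull should typically finish in fewer rounds, since uninformed nodes can also fetch the update themselves.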

Failure detection with gossip

Gossip can also detect failures. Each node maintains a heartbeat counter that it increments periodically, and nodes gossip these counters to one another. If a node's heartbeat has not increased for long enough, it is suspected dead. The phi-accrual failure detector outputs a continuous suspicion level (phi) rather than a binary dead/alive verdict: higher phi means more likely dead. Because phi grows gradually, a threshold can be chosen that tolerates temporary slowdowns instead of declaring a node dead prematurely.
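
The heartbeat bookkeeping described above can be sketched as a table merge. This is a minimal sketch under stated assumptions: each table entry is a list (node-id heartbeat last-increase-time), times come from the local clock only, and the helper names merge-heartbeats and suspected are hypothetical. A fixed timeout stands in for "long enough".

```scheme
; Heartbeat table merge (illustrative sketch).
; Entry layout (an assumption): (node-id heartbeat last-increase-time)

(define (entry-id e) (car e))
(define (entry-heartbeat e) (cadr e))
(define (entry-time e) (caddr e))

; Return the table with the entry for the same node id swapped for `new`.
(define (replace-entry table new)
  (map (lambda (e) (if (equal? (entry-id e) (entry-id new)) new e))
       table))

; Merge a peer's table into the local one at local time `now`.
; Only the peer's ids and counters are used: its timestamps come from a
; different clock. A higher counter shows the node was alive recently,
; so its local timer resets to `now`.
(define (merge-heartbeats local remote now)
  (let loop ((remote remote) (result local))
    (if (null? remote)
        result
        (let* ((id (entry-id (car remote)))
               (hb (entry-heartbeat (car remote)))
               (known (assoc id result)))
          (loop (cdr remote)
                (cond ((not known)
                       (cons (list id hb now) result))
                      ((> hb (entry-heartbeat known))
                       (replace-entry result (list id hb now)))
                      (else result)))))))

; Ids whose heartbeat has not increased for more than `timeout`.
(define (suspected table now timeout)
  (let loop ((rest table) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((> (- now (entry-time (car rest))) timeout)
           (loop (cdr rest) (cons (entry-id (car rest)) acc)))
          (else (loop (cdr rest) acc)))))

; Local view at time 10: node c last advanced at time 2.
(define table-a (list (list 'a 12 10) (list 'b 7 9) (list 'c 4 2)))
; Gossip received from a peer (its timestamps are ignored).
(define from-peer (list (list 'b 9 0) (list 'c 4 0) (list 'd 1 0)))

(define merged (merge-heartbeats table-a from-peer 10))
(display merged) (newline)
(display (suspected merged 10 5)) (newline)  ; prints (c)
```

Node b's counter advanced and node d is new, so both get a fresh timestamp. Node c's counter is unchanged in the peer's view as well, so it stays stale and is the only node suspected.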

Scheme

; Phi-accrual failure detector (simplified).
; phi = -log10(probability that heartbeat is just late).
; Higher phi = more suspicious.

(define (phi time-since-last-heartbeat mean-interval)
  ; Simplified: assume exponentially distributed heartbeat intervals.
  ; P(next heartbeat arrives later than t) = e^(-t/mean)
  ; phi = -log10(P) = t / (mean * ln(10)), with ln(10) ~= 2.302585
  (/ time-since-last-heartbeat (* mean-interval 2.302585)))

(define mean-hb 1.0)  ; average 1 second between heartbeats

(display "Time since last HB | phi") (newline)
(display "---") (newline)
(let loop ((t 1))
  (if (> t 10) #t
      (begin
        (display t) (display "s              | ")
        (display (phi t mean-hb)) (newline)
        (loop (+ t 1)))))
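; A threshold around phi > 8 is a common choice for "considered dead".
; Under this simplified model that takes about 18.4 mean intervals of
; silence, so the table above (which stops near phi = 4.34) never reaches it.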
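
In practice the mean interval is not a constant but is estimated from recent heartbeat arrivals. The sketch below keeps this section's simplified exponential model and derives the mean from a short list of arrival times. The original phi-accrual formulation fits a distribution to a sliding window of inter-arrival times, so treat this as an approximation of the idea rather than the real detector. The names mean-interval, phi-from-arrivals, and time-to-threshold, and the sample arrival times, are hypothetical.

```scheme
; Estimating the mean heartbeat interval from observed arrivals
; (illustrative sketch, exponential model as in the block above).

; Arrival times in seconds of recent heartbeats from one peer, oldest first.
(define arrivals '(0.0 1.0 2.1 2.9 4.0))

; Mean gap between consecutive arrivals. Needs at least two samples.
(define (mean-interval times)
  (let loop ((ts times) (sum 0.0) (count 0))
    (if (or (null? ts) (null? (cdr ts)))
        (/ sum count)
        (loop (cdr ts)
              (+ sum (- (cadr ts) (car ts)))
              (+ count 1)))))

(define (last-element lst)
  (if (null? (cdr lst))
      (car lst)
      (last-element (cdr lst))))

; phi at local time `now`, using the silence since the last arrival.
(define (phi-from-arrivals times now)
  (/ (- now (last-element times))
     (* (mean-interval times) (log 10))))

; Silence needed before phi reaches `threshold`:
; phi = t / (mean * ln 10)  =>  t = threshold * mean * ln 10
(define (time-to-threshold threshold mean)
  (* threshold mean (log 10)))

(display "mean interval: ") (display (mean-interval arrivals)) (newline)
(display "phi at t=6.0:  ") (display (phi-from-arrivals arrivals 6.0)) (newline)
(display "silence for phi=8: ")
(display (time-to-threshold 8 (mean-interval arrivals))) (newline)
```

With these sample arrivals the mean gap is about 1 second, phi after 2 seconds of silence is roughly 0.87, and phi reaches 8 only after about 18.4 seconds of silence.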

Neighbors

Cross-references

🎰 Probability Ch.11 — Markov chains: gossip spreading is a random process on a graph

by june.kim