← back to distributed systems

Gossip Protocols

Wikipedia · wpGossip protocol

Gossip protocols spread information like an epidemic. Each node periodically picks a random peer and exchanges state. After O(log N) rounds, all N nodes have the information. No central coordinator, no single point of failure, and remarkably robust to node failures. The tradeoff: eventual consistency only, and bandwidth overhead from redundant messages.

Epidemic dissemination

A node with new information "infects" a random peer each round. That peer infects another. The number of informed nodes doubles each round (approximately). After log2(N) rounds, everyone knows. Three styles: push (I send you my data), pull (I ask you for your data), push-pull (we exchange). Push-pull converges fastest.

t=0 t=1 t=2 t=3
Scheme

Failure detection with gossip

Gossip can detect failures. Each node maintains a heartbeat counter that increments periodically. Nodes gossip heartbeats. If a node's heartbeat has not increased for long enough, it is suspected dead. The phi-accrual failure detector outputs a suspicion level (phi) rather than a binary dead/alive. Higher phi means more likely dead. This avoids premature declarations from temporary slowdowns.

Scheme
Neighbors

Cross-references