NoSQL databases abandon the relational model when it does not fit. Four main types: key-value stores (Redis), document stores (MongoDB), column-family stores (Cassandra), and graph databases (Neo4j). The CAP theorem says you can have at most two of Consistency, Availability, and Partition tolerance.
Key-value stores
The simplest model: a dictionary. Store and retrieve values by key. No schema, no query language beyond GET/SET/DELETE. Extremely fast for point lookups. Examples: Redis, DynamoDB, Riak.
Store semi-structured documents (JSON, BSON). Each document can have different fields. Queries can reach into nested structure. Good for heterogeneous data that does not fit a fixed schema. Examples: MongoDB, CouchDB.
Python
# Document store simulationimport json
class DocumentStore:
def __init__(self):
self.docs = {}
self.next_id = 1def insert(self, doc):
doc_id = str(self.next_id)
self.next_id += 1
self.docs[doc_id] = doc
return doc_id
def find(self, query):
"""Find docs matching all key-value pairs in query."""
results = []
for doc_id, doc in self.docs.items():
ifall(doc.get(k) == v for k, v in query.items()):
results.append((doc_id, doc))
return results
db = DocumentStore()
db.insert({"name": "Alice", "age": 30, "tags": ["admin", "dev"]})
db.insert({"name": "Bob", "age": 22, "dept": "sales"})
db.insert({"name": "Carol", "age": 30, "tags": ["dev"]})
# Different documents have different shapes - that is fineprint("All docs:")
for doc_id, doc in db.docs.items():
print(" " + doc_id + ": " + json.dumps(doc))
print()
print("Find age=30:")
for doc_id, doc in db.find({"age": 30}):
print(" " + doc_id + ": " + doc['name'])
Graph databases
Store nodes and edges directly. Queries traverse relationships without expensive joins. Good for social networks, recommendation engines, knowledge graphs. Examples: Neo4j, Amazon Neptune.
The CAP theorem (Brewer, 2000): a distributed data store can satisfy at most two of three properties. Consistency (every read returns the latest write), Availability (every request gets a response), Partition tolerance (the system works despite network splits). In practice, partitions happen, so the real choice is between consistency and availability during a partition.
Scheme
; CAP theorem: pick two.; CP systems: consistent but unavailable during partition (e.g., HBase); AP systems: available but may return stale data (e.g., Cassandra); CA systems: not partition-tolerant (single-node relational DB)
(define systems
(list
(list "PostgreSQL (single)""CA""No partition tolerance")
(list "HBase / MongoDB""CP""Blocks during partition")
(list "Cassandra / DynamoDB""AP""Eventual consistency")
(list "Spanner""CP""Uses TrueTime for global consistency")))
(for-each
(lambda (s)
(display (car s))
(display " [") (display (cadr s)) (display "] - ")
(display (caddr s))
(newline))
systems)
Eventual consistency
AP systems trade strong consistency for availability. After a write, replicas may temporarily disagree. Given enough time without new writes, all replicas converge to the same value. This is eventual consistency. Many applications tolerate this: a social media timeline that is a few seconds stale is acceptable.
Python
# Eventual consistency simulation: three replicasimport random
class Replica:
def __init__(self, name):
self.name = name
self.data = {"x": 0}
self.pending = []
def write(self, key, value):
self.data[key] = value
print(" " + self.name + ": wrote " + key + "=" + str(value))
def sync_to(self, other):
for k, v in self.data.items():
if other.data.get(k) != v:
other.data[k] = v
replicas = [Replica("R1"), Replica("R2"), Replica("R3")]
# Write to R1 onlyprint("Write x=42 to R1:")
replicas[0].write("x", 42)
print()
print("Before sync:")
for r in replicas:
print(" " + r.name + ": x = " + str(r.data['x']))
# Sync propagatesprint()
print("Syncing...")
replicas[0].sync_to(replicas[1])
replicas[0].sync_to(replicas[2])
print()
print("After sync (eventual consistency achieved):")
for r in replicas:
print(" " + r.name + ": x = " + str(r.data['x']))