NoSQL and Beyond

Wikipedia · NoSQL · CC BY-SA 4.0

NoSQL databases abandon the relational model when it does not fit. Four main types: key-value stores (Redis), document stores (MongoDB), column-family stores (Cassandra), and graph databases (Neo4j). The CAP theorem says you can have at most two of Consistency, Availability, and Partition tolerance.

Key-value stores

The simplest model: a dictionary. Store and retrieve values by key. No schema, no query language beyond GET/SET/DELETE. Extremely fast for point lookups. Examples: Redis, DynamoDB, Riak.

Scheme

; Key-value store: just an association list.

(define store (list))

(define (kv-set key value)
  (set! store (cons (cons key value)
    (filter (lambda (p) (not (string=? (car p) key))) store))))

(define (kv-get key)
  (let ((found (assoc key store)))
    (if found (cdr found) #f)))

(define (kv-delete key)
  (set! store (filter (lambda (p) (not (string=? (car p) key))) store)))

(kv-set "user:1" "Alice")
(kv-set "user:2" "Bob")
(kv-set "session:abc" "user:1")

(display "GET user:1 = ") (display (kv-get "user:1")) (newline)
(display "GET session:abc = ") (display (kv-get "session:abc")) (newline)

(kv-delete "user:2")
(display "GET user:2 = ") (display (kv-get "user:2"))

Document stores

Store semi-structured documents (JSON, BSON). Each document can have different fields. Queries can reach into nested structure. Good for heterogeneous data that does not fit a fixed schema. Examples: MongoDB, CouchDB.

Python

# Document store simulation
import json

class DocumentStore:
    def __init__(self):
        self.docs = {}
        self.next_id = 1

    def insert(self, doc):
        doc_id = str(self.next_id)
        self.next_id += 1
        self.docs[doc_id] = doc
        return doc_id

    def find(self, query):
        """Find docs matching all key-value pairs in query."""
        results = []
        for doc_id, doc in self.docs.items():
            if all(doc.get(k) == v for k, v in query.items()):
                results.append((doc_id, doc))
        return results

db = DocumentStore()
db.insert({"name": "Alice", "age": 30, "tags": ["admin", "dev"]})
db.insert({"name": "Bob", "age": 22, "dept": "sales"})
db.insert({"name": "Carol", "age": 30, "tags": ["dev"]})

# Different documents have different shapes - that is fine
print("All docs:")
for doc_id, doc in db.docs.items():
    print("  " + doc_id + ": " + json.dumps(doc))

print()
print("Find age=30:")
for doc_id, doc in db.find({"age": 30}):
    print("  " + doc_id + ": " + doc['name'])

# Document store simulation
import json

class DocumentStore:
    def __init__(self):
        self.docs = {}
        self.next_id = 1

def insert(self, doc):
        doc_id = str(self.next_id)
        self.next_id += 1
        self.docs[doc_id] = doc
        return doc_id

def find(self, query):
        """Find docs matching all key-value pairs in query."""
        results = []
        for doc_id, doc in self.docs.items():
            if all(doc.get(k) == v for k, v in query.items()):
                results.append((doc_id, doc))
        return results

db = DocumentStore()
db.insert({"name": "Alice", "age": 30, "tags": ["admin", "dev"]})
db.insert({"name": "Bob", "age": 22, "dept": "sales"})
db.insert({"name": "Carol", "age": 30, "tags": ["dev"]})

# Different documents have different shapes - that is fine
print("All docs:")
for doc_id, doc in db.docs.items():
    print("  " + doc_id + ": " + json.dumps(doc))

print()
print("Find age=30:")
for doc_id, doc in db.find({"age": 30}):
    print("  " + doc_id + ": " + doc['name'])

Graph databases

Store nodes and edges directly. Queries traverse relationships without expensive joins. Good for social networks, recommendation engines, knowledge graphs. Examples: Neo4j, Amazon Neptune.

Scheme

; Graph database: nodes and edges with traversal.

(define nodes
  (list (cons "alice" "person")
        (cons "bob" "person")
        (cons "carol" "person")
        (cons "proj1" "project")))

(define edges
  (list (list "alice" "knows" "bob")
        (list "alice" "knows" "carol")
        (list "bob" "works-on" "proj1")
        (list "carol" "works-on" "proj1")))

; Traverse: who does Alice know?
(define (neighbors node rel)
  (map caddr
    (filter (lambda (e)
      (and (string=? (car e) node)
           (string=? (cadr e) rel)))
      edges)))

(display "Alice knows: ")
(display (neighbors "alice" "knows")) (newline)

; Two-hop: what projects do Alice's friends work on?
(display "Projects of Alice's friends:") (newline)
(for-each
  (lambda (friend)
    (let ((projects (neighbors friend "works-on")))
      (for-each (lambda (p)
        (display "  ") (display friend)
        (display " -> ") (display p) (newline))
        projects)))
  (neighbors "alice" "knows"))

CAP theorem

The CAP theorem (Brewer, 2000): a distributed data store can satisfy at most two of three properties. Consistency (every read returns the latest write), Availability (every request gets a response), Partition tolerance (the system works despite network splits). In practice, partitions happen, so the real choice is between consistency and availability during a partition.

Scheme

; CAP theorem: pick two.
; CP systems: consistent but unavailable during partition (e.g., HBase)
; AP systems: available but may return stale data (e.g., Cassandra)
; CA systems: not partition-tolerant (single-node relational DB)

(define systems
  (list
    (list "PostgreSQL (single)" "CA" "No partition tolerance")
    (list "HBase / MongoDB"     "CP" "Blocks during partition")
    (list "Cassandra / DynamoDB" "AP" "Eventual consistency")
    (list "Spanner"             "CP" "Uses TrueTime for global consistency")))

(for-each
  (lambda (s)
    (display (car s))
    (display " [") (display (cadr s)) (display "] - ")
    (display (caddr s))
    (newline))
  systems)

Eventual consistency

AP systems trade strong consistency for availability. After a write, replicas may temporarily disagree. Given enough time without new writes, all replicas converge to the same value. This is eventual consistency. Many applications tolerate this: a social media timeline that is a few seconds stale is acceptable.

Python

# Eventual consistency simulation: three replicas

import random

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {"x": 0}
        self.pending = []

    def write(self, key, value):
        self.data[key] = value
        print("  " + self.name + ": wrote " + key + "=" + str(value))

    def sync_to(self, other):
        for k, v in self.data.items():
            if other.data.get(k) != v:
                other.data[k] = v

replicas = [Replica("R1"), Replica("R2"), Replica("R3")]

# Write to R1 only
print("Write x=42 to R1:")
replicas[0].write("x", 42)

print()
print("Before sync:")
for r in replicas:
    print("  " + r.name + ": x = " + str(r.data['x']))

# Sync propagates
print()
print("Syncing...")
replicas[0].sync_to(replicas[1])
replicas[0].sync_to(replicas[2])

print()
print("After sync (eventual consistency achieved):")
for r in replicas:
    print("  " + r.name + ": x = " + str(r.data['x']))

Neighbors

Cross-references

🌐 Distributed Systems — CAP theorem originates from distributed systems theory

Foundations (Wikipedia)

← Query Processing by june.kim