← back to databases

Query Processing

Wikipedia · wpQuery optimization · CC BY-SA 4.0

Query processing translates a SQL query into an execution plan. The optimizer considers multiple plans (different join orders, index choices, join algorithms) and picks the cheapest one based on cost estimates. The main join algorithms: nested-loop, sort-merge, and hash join.

Project(name) Select(age>25) Hash Join(id=uid) Scan(users) Scan(orders) Query plan: scan, join, filter, project.

Nested-loop join

The simplest join algorithm. For each row in the outer table, scan the entire inner table for matches. Cost: O(n * m). Good when one table is small or the inner table has an index.

Scheme

Sort-merge join

Sort both tables on the join key, then merge them in one pass. Cost: O(n log n + m log m) for the sorts, then O(n + m) for the merge. Best when both tables are large and already nearly sorted.

Python

Hash join

Build a hash table on the smaller (build) relation, then probe it with each row of the larger (probe) relation. Cost: O(n + m) on average. The default choice in most modern query optimizers when no useful index exists.

Scheme

Cost estimation

The optimizer estimates the cost of each plan using statistics: table sizes, index selectivity, data distribution. It picks the plan with the lowest estimated I/O and CPU cost. The estimates are not always accurate, which is why sometimes the optimizer picks a suboptimal plan.

Python
Neighbors

Cross-references

Foundations (Wikipedia)