Neural Networks
Lovelace textbook · CC BY-SA 4.0 · computationalcognitivescience.github.io/lovelace/home
A neural network is a function built from layers of simple units. Each unit computes a weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation function. Stacking layers lets the network learn hierarchical representations. Backpropagation adjusts the weights by propagating error gradients backward through the layers. With enough hidden units, even a single hidden layer can approximate any continuous function on a compact domain.
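The forward computation can be sketched in a few lines. This is a minimal illustration, not any particular library's API: the layer sizes (3-4-2) and ReLU activation are arbitrary choices for the example.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z) elementwise."""
    return np.maximum(0.0, z)

def forward(x, layers):
    """Forward pass: each layer is a (weights, bias) pair.
    Each layer computes activation(W @ a + b) on the previous output."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# A 3-4-2 network with random weights (sizes chosen for illustration).
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
y = forward(np.array([1.0, -0.5, 2.0]), layers)
print(y.shape)  # (2,)
```

Each layer is just the weighted-sum-plus-bias-then-nonlinearity described above, applied to the previous layer's output.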
The perceptron
The simplest neural network: one unit with n inputs, n weights, and one bias. Output = activation(w1*x1 + w2*x2 + ... + wn*xn + bias). A single perceptron can learn any linearly separable function (AND, OR) but not XOR, because no single line separates XOR's positive and negative cases. This limitation motivated multilayer networks.
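The classic perceptron learning rule makes this concrete: nudge each weight by lr * (target - output) * input after every example. A minimal sketch, trained on AND (the learning rate and epoch count are arbitrary choices):

```python
import numpy as np

def step(z):
    """Threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: w += lr * (target - output) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            out = step(w @ x + b)
            w += lr * (t - out) * x
            b += lr * (t - out)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y_and)
print([step(w @ x + b) for x in X])  # [0, 0, 0, 1]
```

Swap in XOR targets ([0, 1, 1, 0]) and the same loop never converges, no matter the learning rate: that is the linear-separability limit.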
Backpropagation
Backpropagation computes the gradient of the loss with respect to each weight by applying the chain rule layer by layer, from output back to input. Each weight is then nudged in the direction that reduces the loss. The learning rate controls the step size. This is gradient descent applied to a compositional function. The same backward-pass structure appears in cognitive architectures as
consolidation: a process that reads from experience and writes parameter changes back to the substrate.
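The chain-rule computation can be verified directly: compute the analytic gradient with a backward pass, then compare it against a finite-difference estimate. This sketch uses a 3-4-1 sigmoid network with squared-error loss; the sizes and the specific weight being checked are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, t, W1, b1, W2, b2):
    """Forward pass, then backprop: the chain rule applied layer by layer."""
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)                     # hidden activations
    z2 = W2 @ a1 + b2
    y = sigmoid(z2)                      # network output
    L = 0.5 * np.sum((y - t) ** 2)       # squared-error loss
    # Backward pass: delta = dL/dz at each layer, propagated inward.
    dz2 = (y - t) * y * (1 - y)          # output-layer delta
    dW2 = np.outer(dz2, a1)
    db2 = dz2
    dz1 = (W2.T @ dz2) * a1 * (1 - a1)   # hidden-layer delta via chain rule
    dW1 = np.outer(dz1, x)
    db1 = dz1
    return L, dW1, db1, dW2, db2

rng = np.random.default_rng(1)
x, t = rng.normal(size=3), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

L, dW1, *_ = loss_and_grads(x, t, W1, b1, W2, b2)

# Numerical check: perturb one weight and compare the slope of the loss.
eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
Lp, *_ = loss_and_grads(x, t, Wp, b1, W2, b2)
print(abs((Lp - L) / eps - dW1[0, 0]) < 1e-4)  # True
```

Gradient descent is then just `W1 -= lr * dW1` (and likewise for the other parameters), with `lr` controlling the step size as described above.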
Universal approximation
A feedforward network with a single hidden layer and enough units can approximate any continuous function on a compact domain to arbitrary precision. This is the universal approximation theorem. It guarantees expressiveness but says nothing about learnability: finding the right weights is a separate problem, and that is where depth, architecture, and training dynamics matter.
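The expressiveness/learnability split can be illustrated without training the hidden layer at all: fix random hidden weights, and solve only the output layer by least squares. This is a random-features sketch, not the theorem's construction; the unit count, weight range, and target function sin(x) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 200
# Random hidden layer, never trained: tanh(w*x + b) per unit.
w = rng.uniform(-3, 3, size=n_hidden)
b = rng.uniform(-3, 3, size=n_hidden)

x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)

# Hidden features: one column per unit, evaluated at every input point.
H = np.tanh(np.outer(x, w) + b)
# Fit only the output-layer weights by least squares.
c, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ c
print(np.max(np.abs(approx - target)))  # small worst-case error
```

A wide-enough bank of shifted, scaled tanh units spans the target closely (expressiveness); finding hidden weights by gradient descent rather than fixing them randomly is the separate, harder problem (learnability).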
Notation reference
| Symbol | Meaning |
|---|---|
| w, b | Weights and biases |
| sigma(z) | Activation function (e.g., sigmoid, ReLU) |
| dL/dw | Gradient of loss with respect to weight |
| lr | Learning rate (step size for gradient descent) |
| UAT | Universal approximation theorem |
Neighbors
- Lovelace Ch.5 — reinforcement learning uses neural networks as function approximators
- Lovelace Ch.8 — cognitive architectures that integrate neural-network-style learning
- 🍞 Capucci 2021 — backpropagation as lens composition: the categorical structure of gradient descent
- ∫ Calculus Ch.5 Chain Rule — the chain rule that makes backpropagation work
Translation notes
The Lovelace textbook covers the history of connectionism, from Rosenblatt's perceptron through the PDP group to modern deep learning. This page focuses on the computational mechanics: what a network computes, how backpropagation trains it, and why universal approximation matters. The textbook also discusses representation learning and distributed representations, which are central to the connectionist program in cognitive science.