
Neural Networks

Lovelace textbook · CC BY-SA 4.0 · computationalcognitivescience.github.io/lovelace/home

A neural network is a function built from layers of simple units. Each unit computes a weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation function. Stacking layers lets the network learn hierarchical representations. Backpropagation adjusts the weights by propagating error gradients backward through the layers. With enough hidden units, even a single hidden layer can approximate any continuous function on a compact domain.
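The per-unit computation can be sketched in a few lines. This is a minimal illustration, not the page's original code; the weights, biases, and names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each unit: weighted sum of inputs, plus a bias, through the activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny 2-input -> 2-hidden -> 1-output network with fixed example weights.
W1 = [[0.5, -0.3], [0.8, 0.2]]   # hidden-layer weights (illustrative values)
b1 = [0.1, -0.1]
W2 = [[1.0, -1.0]]               # output-layer weights
b2 = [0.0]

hidden = layer([1.0, 0.5], W1, b1)   # first layer's representation
output = layer(hidden, W2, b2)       # network output, in (0, 1)
```

Stacking more calls to layer adds depth; each layer re-represents the previous layer's output.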

The perceptron

The simplest neural network: one unit with n inputs, n weights, and one bias. Output = activation(w1*x1 + w2*x2 + ... + wn*xn + bias). A single perceptron can learn any linearly separable function (AND, OR) but not XOR, which is not linearly separable. This limitation motivated multilayer networks.
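A sketch of the classic perceptron learning rule makes the limitation concrete: training converges on AND but can never classify all four XOR cases, since no single line separates them. The code below is illustrative (the learning rate and epoch count are arbitrary choices).

```python
# Perceptron: step activation over a weighted sum plus bias.
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Classic perceptron rule: nudge weights by (target - prediction) * input.
def train(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w_and, b_and = train(AND)   # converges: AND is linearly separable
w_xor, b_xor = train(XOR)   # never converges: XOR is not
```

After training, the AND perceptron classifies all four cases correctly, while any weight setting for XOR misclassifies at least one case.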

[Figure: a feedforward network with inputs x1, x2, x3, a hidden layer h1, h2, h3, and output y.]

Backpropagation

Backpropagation computes the gradient of the loss with respect to each weight by applying the chain rule layer by layer, from output back to input. Each weight is then nudged in the direction that reduces the loss. The learning rate controls the step size. This is gradient descent applied to a compositional function. The same backward-pass structure appears in cognitive architectures as consolidation: a process that reads from experience and writes parameter changes back to the substrate.
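A minimal worked example of the backward pass, on the smallest network where the chain rule composes across layers: one input, one sigmoid hidden unit, one sigmoid output, squared loss. The initial weights, input, and learning rate are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Network: h = sigmoid(w1*x + b1);  y = sigmoid(w2*h + b2);  L = (y - t)^2
def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    return h, y

def grads(params, x, t):
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    # Chain rule, applied from output back to input:
    dL_dy = 2 * (y - t)        # derivative of (y - t)^2 w.r.t. y
    dz2 = dL_dy * y * (1 - y)  # sigmoid'(z2) expressed via y
    dh = dz2 * w2              # error flowing back through w2
    dz1 = dh * h * (1 - h)     # through the hidden unit's activation
    return [dz1 * x, dz1, dz2 * h, dz2]   # [dL/dw1, dL/db1, dL/dw2, dL/db2]

# One gradient-descent step: nudge each parameter against its gradient.
params = [0.5, -0.2, 0.8, 0.1]   # illustrative initial values
x, t, lr = 1.0, 1.0, 0.5
g = grads(params, x, t)
new_params = [p - lr * gi for p, gi in zip(params, g)]
```

One step of this update lowers the loss on the training example; repeating it is gradient descent.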


Universal approximation

A feedforward network with a single hidden layer and enough units can approximate any continuous function on a compact domain to arbitrary precision. This is the universal approximation theorem. It guarantees expressiveness but says nothing about learnability: finding the right weights is a separate problem, and that is where depth, architecture, and training dynamics matter.
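A toy illustration of expressiveness (not a proof of the theorem): a single hidden layer of two ReLU units represents |x| exactly, hinting at how piecewise-linear units can be combined to tile any continuous function on a compact domain.

```python
def relu(z):
    return max(0.0, z)

# One hidden layer with two ReLU units and a linear output:
#   |x| = relu(x) + relu(-x)
def net(x):
    hidden = [relu(1.0 * x), relu(-1.0 * x)]   # hidden weights: +1 and -1
    return sum(hidden)                          # output weights: 1 and 1
```

Adding more units with shifted biases produces more kinks, letting the hidden layer trace out increasingly complicated piecewise-linear approximations; the theorem says enough units suffice for any continuous target on a compact domain.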

Notation reference

Symbol      Meaning
w, b        Weights and biases
sigma(z)    Activation function (e.g., sigmoid, ReLU)
dL/dw       Gradient of loss with respect to weight
lr          Learning rate (step size for gradient descent)
UAT         Universal approximation theorem

Translation notes

The Lovelace textbook covers the history of connectionism, from Rosenblatt's perceptron through the PDP group to modern deep learning. This page focuses on the computational mechanics: what a network computes, how backpropagation trains it, and why universal approximation matters. The textbook also discusses representation learning and distributed representations, which are central to the connectionist program in cognitive science.

Read the original: Lovelace, Chapter 4.