The derivative f'(c) is the limit of the difference quotient (f(x) - f(c))/(x - c) as x → c. It measures the instantaneous rate of change. The mean value theorem says that somewhere between a and b, the derivative equals the average rate of change.
Definition of the derivative
f'(c) = lim (f(x) - f(c)) / (x - c) as x → c, if the limit exists. Equivalently, f'(c) = lim (f(c+h) - f(c)) / h as h → 0. Differentiability at c implies continuity at c, but not vice versa: |x| is continuous at 0 but not differentiable there.
Scheme
; Derivative as limit of difference quotient
(define (approx-derivative f c h)
  (/ (- (f (+ c h)) (f c)) h))
(define (f x) (* x x)) ; f(x) = x^2, f'(x) = 2x
(display "Approximating f'(3) for f(x) = x^2:") (newline)
(for-each (lambda (h)
            (display " h=") (display h)
            (display ": ") (display (approx-derivative f 3 h))
            (newline))
          (list 1.0 0.1 0.01 0.001 0.0001 0.00001))
(display "Exact: f'(3) = 6") (newline)
; |x| is not differentiable at 0
(display "Derivative of |x| at 0:") (newline)
(display " from right: ")
(display (approx-derivative abs 0 0.001)) (newline)
(display " from left: ")
(display (approx-derivative abs 0 -0.001)) (newline)
(display "Left and right limits differ: not differentiable")
Python
# Numerical derivative
def deriv(f, c, h=1e-8):
    return (f(c + h) - f(c)) / h

f = lambda x: x**2
print(f"f'(3) = {deriv(f, 3):.6f} (exact: 6)")
g = lambda x: abs(x)
print(f"|x|' at 0 from right: {deriv(g, 0, 0.001)}")
print(f"|x|' at 0 from left: {deriv(g, 0, -0.001)}")
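One caveat worth a sketch: the symmetric difference quotient (f(c+h) - f(c-h)) / (2h) averages the two one-sided slopes, so it can report a finite value even where the derivative does not exist. For |x| at 0 it returns exactly 0, which is why the one-sided checks above matter:

```python
# Symmetric (central) difference quotient -- averages the one-sided slopes.
def central(f, c, h):
    return (f(c + h) - f(c - h)) / (2 * h)

# For |x| at 0 the left slope is -1 and the right slope is +1;
# the central quotient averages them to 0, masking the corner.
print("central difference for |x| at 0:", central(abs, 0, 0.001))  # 0.0
```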
Chain rule
If g is differentiable at c and f is differentiable at g(c), then (f ∘ g)'(c) = f'(g(c)) * g'(c). The derivative of a composition is the product of derivatives along the chain.
Scheme
; Chain rule: (f o g)'(c) = f'(g(c)) * g'(c)
; f(x) = x^2, g(x) = sin(x)
; (f o g)(x) = sin^2(x)
; (f o g)'(x) = 2*sin(x)*cos(x) = sin(2x)
(define (approx-deriv f c)
  (let ((h 0.00001))
    (/ (- (f (+ c h)) (f c)) h)))
(define (f x) (* x x))
(define (g x) (sin x))
(define (fog x) (f (g x)))
(define c 1.0)
(display "Chain rule at c = 1:") (newline)
(display " numerical (fog)'(1) = ")
(display (approx-deriv fog c)) (newline)
(display " f'(g(1)) * g'(1) = ")
(display (* (approx-deriv f (g c)) (approx-deriv g c))) (newline)
(display " exact sin(2) = ")
(display (sin 2.0))
Python
# Python equivalent
import math

def deriv(f, c, h=1e-5):
    return (f(c + h) - f(c)) / h
f = lambda x: x**2
g = lambda x: math.sin(x)
fog = lambda x: f(g(x))
c = 1.0
print("Chain rule at c = 1:")
print(" numerical (fog)'(1) =", deriv(fog, c))
print(" f'(g(1)) * g'(1) =", deriv(f, g(c)) * deriv(g, c))
print(" exact sin(2) =", math.sin(2.0))
Mean value theorem
If f is continuous on [a, b] and differentiable on (a, b), there exists c in (a, b) with f'(c) = (f(b) - f(a)) / (b - a). The tangent at c is parallel to the secant from a to b. Rolle's theorem is the special case where f(a) = f(b), giving f'(c) = 0.
Scheme
; MVT: find c where f'(c) = (f(b)-f(a))/(b-a)
; f(x) = x^3, [a,b] = [1, 3]
(define (f x) (* x x x))
(define a 1.0)
(define b 3.0)
(define avg-slope (/ (- (f b) (f a)) (- b a)))
(display "Average slope on [1,3]: ")
(display avg-slope) (newline) ; (27-1)/2 = 13
; f'(x) = 3x^2 = 13 => x = sqrt(13/3)
(define c-exact (sqrt (/ 13.0 3.0)))
(display "MVT point c = sqrt(13/3) = ")
(display c-exact) (newline)
(display "f'(c) = 3c^2 = ")
(display (* 3 c-exact c-exact)) (newline)
(display "Matches average slope? ")
(display (< (abs (- (* 3 c-exact c-exact) avg-slope)) 0.0001))
Python
# Python equivalent
import math
f = lambda x: x**3
a, b = 1.0, 3.0
avg_slope = (f(b) - f(a)) / (b - a)
print("Average slope on [1,3]:", avg_slope)
# f'(x) = 3x^2 = 13 => x = sqrt(13/3)
c = math.sqrt(13 / 3)
print("MVT point c = sqrt(13/3) =", c)
print("f'(c) = 3c^2 =", 3 * c * c)
print("Matches average slope?", abs(3 * c * c - avg_slope) < 0.0001)
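Rolle's theorem (the f(a) = f(b) special case mentioned above) can be checked the same way. A small sketch, using f(x) = x^2 - 4x on [0, 4] as an illustrative choice (not from the original listing): f(0) = f(4) = 0, and f'(x) = 2x - 4 vanishes at c = 2.

```python
# Rolle's theorem: f(0) = f(4), so some c in (0, 4) has f'(c) = 0.
# Here f'(x) = 2x - 4, which vanishes at c = 2.
f = lambda x: x**2 - 4*x

def deriv(f, c, h=1e-8):
    return (f(c + h) - f(c)) / h

a, b = 0.0, 4.0
c = 2.0  # the point guaranteed by Rolle's theorem
print("f(0) =", f(a), " f(4) =", f(b))   # equal endpoint values
print("f'(2) =", deriv(f, c))            # approximately 0
```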
L'Hopital's rule and Taylor's theorem
L'Hopital's rule: if f(c) = g(c) = 0, f and g are differentiable near c with g'(x) ≠ 0 there, and lim f'(x)/g'(x) exists as x → c, then lim f(x)/g(x) = lim f'(x)/g'(x). Taylor's theorem: f(x) = f(c) + f'(c)(x-c) + f''(c)(x-c)^2/2! + ... + f^(n)(c)(x-c)^n/n! + R_n, with an explicit remainder term. The Taylor polynomial is the best polynomial approximation of its degree near c.
Scheme
; Taylor polynomial for e^x around c=0
; e^x = 1 + x + x^2/2! + x^3/3! + ...
(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))
(define (taylor-exp x n)
  (let loop ((k 0) (sum 0.0))
    (if (> k n) sum
        (loop (+ k 1) (+ sum (/ (expt x k) (factorial k)))))))
(define x 1.0)
(display "Taylor approximations of e^1:") (newline)
(for-each (lambda (n)
            (display " degree ") (display n)
            (display ": ") (display (taylor-exp x n))
            (display " (error: ") (display (abs (- (exp x) (taylor-exp x n))))
            (display ")") (newline))
          (list 1 2 3 5 10 15))
(display "Exact e = ") (display (exp 1.0))
Python
# Python equivalent
import math

def taylor_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
print("Taylor approximations of e^1:")
for n in [1, 2, 3, 5, 10, 15]:
approx = taylor_exp(x, n)
error = abs(math.e - approx)
print(" degree " + str(n) + ": " + str(approx) + " (error: " + str(error) + ")")
print("Exact e =", math.e)
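The code above covers Taylor's theorem; L'Hopital's rule can be checked numerically in the same spirit. A sketch for the classic 0/0 limit sin(x)/x as x → 0, where the rule gives lim cos(x)/1 = 1:

```python
import math

# lim sin(x)/x as x -> 0 has the 0/0 form; L'Hopital: lim cos(x)/1 = 1.
f = lambda x: math.sin(x)
g = lambda x: x

for x in [0.1, 0.01, 0.001, 1e-6]:
    print(f"x = {x}: f(x)/g(x) = {f(x) / g(x)}")
print("L'Hopital limit: cos(0)/1 =", math.cos(0.0))  # 1.0
```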
📐 Calculus Ch.5 — derivative rules: the computational counterpart to this rigorous treatment
🤖 ML Ch.3 — gradient descent: the chain rule drives all of neural network training
🍞 Capucci 2021 — categorical differentiation: the chain rule as composition in a double category
Translation notes
Numerical differentiation via difference quotients suffers from cancellation error: too-small h makes (f(c+h) - f(c)) lose significant digits. The sweet spot is around h = 10^-8 for double precision. The exact derivative is a limit, not a computation.
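A quick sketch of that trade-off, using sin at c = 1 so the exact derivative cos(1) is known:

```python
import math

# Forward-difference error vs h: truncation error shrinks with h,
# but cancellation (rounding) error grows once h gets too small.
def deriv(f, c, h):
    return (f(c + h) - f(c)) / h

exact = math.cos(1.0)
for h in [1e-2, 1e-4, 1e-8, 1e-12]:
    err = abs(deriv(math.sin, 1.0, h) - exact)
    print(f"h = {h:.0e}: error = {err:.2e}")
# The error bottoms out near h = 1e-8; for much smaller h,
# cancellation typically dominates and the error grows again.
```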