Terrain as Log-Likelihood

The connection between walking uphill and fitting a statistical model

The Metaphor You've Been Using

In the optimisation pages, you watched algorithms climb a terrain: gradient ascent following the steepest slope, Newton-Raphson using curvature to leap toward the peak, and simulated annealing wandering randomly to escape suboptimal local peaks.

That terrain wasn't just a convenient visual metaphor. In statistics, the terrain is the log-likelihood surface. Every point on the map represents a candidate set of model parameters, and the elevation at that point is how well those parameters explain the observed data.

| Terrain Concept | Statistical Concept |
| --- | --- |
| Map coordinates (x, y) | Parameter values ($\beta_0, \beta_1, \ldots$) |
| Elevation at a point | Log-likelihood $\ell(\beta \mid \text{data})$ |
| Peak / summit | Maximum likelihood estimate (MLE) |
| Slope (gradient) | Score function $\nabla \ell(\beta)$ |
| Curvature (Hessian) | Observed information matrix $-\nabla^2 \ell(\beta)$ |
| Confidence ellipse at peak | Approximate confidence region for $\beta$ |
| Sharp peak vs broad plateau | Small vs large standard errors |
"Climbing a hill" is maximum likelihood estimation. The algorithms are the same — only the surface they're climbing changes.
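The correspondence "slope = score function" can be made concrete with a minimal NumPy sketch (the data, starting point, and step size here are illustrative, not taken from the visualisation): gradient ascent on the Gaussian log-likelihood in $\mu$ climbs straight to the sample mean, which is the MLE.

```python
import numpy as np

# Gaussian log-likelihood in the mean mu (sigma fixed at 1):
#   l(mu) = -0.5 * sum((y - mu)^2) + const
# Score (the slope of the terrain): dl/dmu = sum(y - mu)
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=200)  # illustrative data

mu = 0.0        # starting position on the terrain
step = 0.004    # learning rate
for _ in range(1000):
    score = np.sum(y - mu)   # gradient of the log-likelihood at mu
    mu += step * score       # move in the uphill direction

# The summit of this terrain is the sample mean -- the MLE.
print(mu, y.mean())
```

Because this surface is an exact quadratic bowl, the climb is guaranteed to reach the single peak.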

See It: What Link Functions Do to the Surface

The middle panel shows the log-likelihood parameterized directly in terms of the mean $\mu$ (no link function). The right panel shows the same log-likelihood after applying the canonical link transformation. Compare how the link function reshapes the surface into something that looks like the smooth terrain on the left.

[Interactive figure: three clickable surfaces, left to right: Terrain (metaphor), LL without link (μ-space), LL with link (η-space). Click a surface to set the starting position, then run an algorithm.]

Left: a smooth Gaussian-bump terrain. Middle: log-likelihood parameterized directly in $\mu$. Right: the same log-likelihood after applying the canonical link function. The link transforms the awkward middle surface into the well-behaved right surface.

Why Link Functions Make Surfaces Similar

Without a link function, the log-likelihood surface can be badly shaped: cliffs where parameters hit constraints ($\mu > 0$ for counts, $0 < p < 1$ for probabilities), asymmetric curvature, and steep drop-offs. Optimisation algorithms struggle on such surfaces.

The Canonical Link Solution

Each GLM family belongs to the exponential family, which has a natural parameter $\theta$. The canonical link sets $\eta = \theta$, giving the log-likelihood a common structure:

$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \theta_i - b(\theta_i) \right] + \text{const}$

where $\theta_i = \eta_i = \beta_0 + \beta_1 x_i$. This is always concave in $\beta$ — a smooth hill with a single peak, exactly like the terrain metaphor. The different families only change $b(\theta)$:

| Family | Canonical Link | $b(\theta)$ | Surface Shape |
| --- | --- | --- | --- |
| Gaussian | Identity: $\eta = \mu$ | $\theta^2/2$ | Exact quadratic bowl |
| Binomial | Logit: $\eta = \log\frac{p}{1-p}$ | $\log(1 + e^\theta)$ | Smooth concave hill |
| Poisson | Log: $\eta = \log\mu$ | $e^\theta$ | Smooth concave hill |
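Why is the surface always concave? Differentiating the common structure makes it explicit (a standard exponential-family derivation, writing $x_i$ for the covariate vector of observation $i$):

$\nabla \ell(\beta) = \sum_{i=1}^{n} \left[ y_i - b'(\theta_i) \right] x_i = \sum_{i=1}^{n} (y_i - \mu_i)\, x_i, \qquad \nabla^2 \ell(\beta) = -\sum_{i=1}^{n} b''(\theta_i)\, x_i x_i^{\top}$

Since $b''(\theta_i) = \text{Var}(y_i) > 0$, the Hessian is negative semi-definite everywhere, so $\ell(\beta)$ is concave: one smooth hill, no local traps.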
The link function is not arbitrary decoration — it transforms a constrained, awkwardly-shaped optimisation problem into a smooth, well-behaved hill that algorithms can climb efficiently. This is why GLMs use link functions, and why the terrain metaphor works so well for all of them.
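On such a concave surface, Newton-Raphson is extremely effective. A hedged one-parameter sketch (simulated data, illustrative true coefficient, NumPy assumed) for Poisson regression with the canonical log link:

```python
import numpy as np

# One-parameter Poisson model with the canonical log link:
#   theta_i = eta_i = beta * x_i,  mu_i = exp(theta_i),  b(theta) = exp(theta)
# Log-likelihood up to a constant: l(beta) = sum(y_i * theta_i - exp(theta_i))
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=300)
y = rng.poisson(np.exp(0.7 * x))      # simulated data, true beta = 0.7

beta = 0.0                            # starting position
for _ in range(25):
    mu = np.exp(beta * x)
    score = np.sum((y - mu) * x)      # gradient (slope) at beta
    info = np.sum(mu * x**2)          # observed information (-Hessian, curvature)
    beta += score / info              # Newton-Raphson step
print(beta)
```

Because the surface in $\beta$ is concave, each curvature-scaled step moves toward the single peak, and the score is driven to zero in a handful of iterations.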

Without the link

If we parameterize Poisson regression as $\mu = \beta_0 + \beta_1 x$ (identity link), the surface has a cliff where $\mu$ approaches zero (the log-likelihood plummets to $-\infty$). For logistic regression with $p = \beta_0 + \beta_1 x$ (the linear probability model), the surface has walls at $p = 0$ and $p = 1$. These constraints make the surface hard to optimise, and the terrain metaphor breaks down.

With the canonical link

Apply log: $\log\mu = \beta_0 + \beta_1 x$ for Poisson. Apply logit: $\log\frac{p}{1-p} = \beta_0 + \beta_1 x$ for logistic. Now the parameters are unconstrained, the surface is concave, and gradient ascent or Newton-Raphson will find the peak reliably — just like climbing a hill on a map.
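The cliff can be seen numerically. A small sketch (the data points and candidate coefficients are made up for illustration): under the identity link, any $(\beta_0, \beta_1)$ that pushes some $\mu_i \le 0$ sends the Poisson log-likelihood to $-\infty$, while under the log link every $(\beta_0, \beta_1)$ gives a finite value.

```python
import numpy as np

# Illustrative count data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1, 2, 4, 9])

def ll_identity(b0, b1):
    # Identity link: mu = b0 + b1*x must stay positive -- a hard constraint.
    mu = b0 + b1 * x
    if np.any(mu <= 0):
        return -np.inf                 # off the cliff
    return np.sum(y * np.log(mu) - mu)

def ll_log(b0, b1):
    # Log link: mu = exp(b0 + b1*x) is positive for every (b0, b1).
    eta = b0 + b1 * x
    return np.sum(y * eta - np.exp(eta))

print(ll_identity(1.0, -0.5))   # some mu_i <= 0: the surface drops to -inf
print(ll_log(1.0, -0.5))        # finite -- the reparameterized surface is a smooth hill
```

The same candidate slope that falls off the identity-link cliff sits on a perfectly ordinary part of the log-link hill, which is exactly why the algorithms on the right-hand surface behave like the terrain metaphor promises.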

Connecting the Pieces

Every tutorial's "fitting" page is about climbing a log-likelihood surface. Every tutorial's "advanced" page derives the shape of that surface from the probability model. And every optimisation visualisation demonstrates the algorithms that do the climbing.

Explore further:
Theory: For the mathematical foundations of likelihood-based inference, see Likelihood and Simulation Theory on JonStats.