The connection between walking uphill and fitting a statistical model
In the optimisation pages, you watched algorithms climb a terrain — gradient ascent following the steepest slope, Newton-Raphson using curvature to leap toward the peak, simulated annealing randomly wandering to escape local hills.
That terrain wasn't just a convenient visual metaphor. In statistics, the terrain is the log-likelihood surface. Every point on the map represents a candidate set of model parameters, and the elevation at that point is how well those parameters explain the observed data.
| Terrain Concept | Statistical Concept |
|---|---|
| Map coordinates (x, y) | Parameter values ($\beta_0, \beta_1, \ldots$) |
| Elevation at a point | Log-likelihood $\ell(\beta | \text{data})$ |
| Peak / summit | Maximum likelihood estimate (MLE) |
| Slope (gradient) | Score function $\nabla \ell(\beta)$ |
| Curvature (Hessian) | Observed information matrix $-\nabla^2 \ell(\beta)$ |
| Confidence ellipse at peak | Approximate confidence region for $\beta$ |
| Sharp peak vs broad plateau | Small vs large standard errors |
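The table's correspondence can be made concrete. Below is a minimal sketch (toy data and function names of my own invention, not from any particular library) that computes the three terrain quantities for a Poisson regression with a log link: the elevation (log-likelihood), the slope (score), and the curvature (observed information):

```python
import numpy as np

# Hypothetical toy data: counts y observed at covariate values x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0, 9.0])

def log_lik(beta):
    """Elevation: Poisson log-likelihood at beta = (b0, b1)."""
    eta = beta[0] + beta[1] * x      # linear predictor
    mu = np.exp(eta)                 # canonical (log) link
    # The log(y!) constant is dropped; it does not depend on beta.
    return np.sum(y * eta - mu)

def score(beta):
    """Slope: gradient of the log-likelihood (the score function)."""
    mu = np.exp(beta[0] + beta[1] * x)
    return np.array([np.sum(y - mu), np.sum((y - mu) * x)])

def observed_info(beta):
    """Curvature: observed information matrix, the negative Hessian."""
    mu = np.exp(beta[0] + beta[1] * x)
    return np.array([[np.sum(mu),     np.sum(mu * x)],
                     [np.sum(mu * x), np.sum(mu * x**2)]])
```

A quick sanity check: a central finite difference of `log_lik` should match `score`, confirming that the slope really is the derivative of the elevation.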
*(Figure: three surfaces, left to right.)* Left: a smooth Gaussian-bump terrain. Middle: the log-likelihood parameterised directly in the mean $\mu$ (no link function). Right: the same log-likelihood after the canonical link transformation. The link reshapes the awkward middle surface into a well-behaved hill like the terrain on the left.
Without a link function, the log-likelihood surface can be badly shaped: cliffs where parameters hit constraints ($\mu > 0$ for counts, $0 < p < 1$ for probabilities), asymmetric curvature, and steep drop-offs. Optimisation algorithms struggle on such surfaces.
Each GLM family belongs to the exponential family, which has a natural parameter $\theta$. The canonical link sets $\eta = \theta$, giving the log-likelihood a common structure:
$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \theta_i - b(\theta_i) \right] + \text{const}$$
where $\theta_i = \eta_i = \beta_0 + \beta_1 x_i$. This is always concave in $\beta$ — a smooth hill with a single peak, exactly like the terrain metaphor. The different families only change $b(\theta)$:
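The concavity claim can be checked numerically. The sketch below (hypothetical Bernoulli data) evaluates the binomial log-likelihood $\sum_i [y_i \theta_i - b(\theta_i)]$ with $b(\theta) = \log(1 + e^\theta)$ along a one-dimensional slice and confirms that the second differences are never positive, i.e. a single smooth peak:

```python
import numpy as np

# Hypothetical Bernoulli data under the canonical (logit) link.
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def ell(b0, b1):
    theta = b0 + b1 * x              # theta_i = eta_i under the canonical link
    b = np.log1p(np.exp(theta))      # b(theta) for the binomial family
    return np.sum(y * theta - b)

# Concavity along a 1-D slice through the surface: second differences <= 0.
b1_grid = np.linspace(-3, 3, 101)
vals = np.array([ell(0.2, b1) for b1 in b1_grid])
print(np.all(np.diff(vals, 2) <= 0))  # True: the slice is concave
```

Any one-dimensional slice of a concave surface is itself concave, so this check would pass for every choice of slice, not just this one.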
| Family | Canonical Link | $b(\theta)$ | Surface Shape |
|---|---|---|---|
| Gaussian | Identity: $\eta = \mu$ | $\theta^2/2$ | Exact quadratic bowl |
| Binomial | Logit: $\eta = \log\frac{p}{1-p}$ | $\log(1 + e^\theta)$ | Smooth concave hill |
| Poisson | Log: $\eta = \log\mu$ | $e^\theta$ | Smooth concave hill |
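One reason $b(\theta)$ is the only moving part: a standard exponential-family identity (not derived here) says that $b'(\theta)$ equals the mean of $y$. A quick numerical sketch verifies this for the three $b(\theta)$ entries in the table:

```python
import numpy as np

def num_deriv(b, theta, eps=1e-6):
    """Central-difference derivative of b at theta."""
    return (b(theta + eps) - b(theta - eps)) / (2 * eps)

theta = 0.7
# Gaussian: b'(theta) = theta, which is mu under the identity link.
assert abs(num_deriv(lambda t: t**2 / 2, theta) - theta) < 1e-6
# Binomial: b'(theta) = e^theta / (1 + e^theta), the success probability p.
p = np.exp(theta) / (1 + np.exp(theta))
assert abs(num_deriv(lambda t: np.log1p(np.exp(t)), theta) - p) < 1e-6
# Poisson: b'(theta) = e^theta, the mean count mu.
assert abs(num_deriv(np.exp, theta) - np.exp(theta)) < 1e-6
```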
If we parameterise Poisson regression as $\mu = \beta_0 + \beta_1 x$ (identity link), the surface has a cliff where $\mu$ approaches zero: the log-likelihood plummets to $-\infty$. For logistic regression with $p = \beta_0 + \beta_1 x$ (a linear probability model), the surface has walls at $p = 0$ and $p = 1$. These constraints make the surface hard to optimise, and the terrain metaphor breaks down.
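The Poisson cliff is easy to see numerically. A toy sketch (illustrative data and coefficients, not from the source) evaluating the identity-link log-likelihood:

```python
import numpy as np

# Hypothetical toy count data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0, 9.0])

def ell_identity(b0, b1):
    """Poisson log-likelihood with mu modelled directly (identity link)."""
    mu = b0 + b1 * x
    if np.any(mu <= 0):
        return -np.inf               # the cliff: mu must stay positive
    return np.sum(y * np.log(mu) - mu)

print(ell_identity(1.0, 2.0))   # finite elevation: all mu_i > 0
print(ell_identity(1.0, -1.0))  # -inf: mu hits zero inside the data range
```

Under the log link, by contrast, every $(\beta_0, \beta_1)$ maps to a valid $\mu > 0$, so no such cliff exists anywhere on the surface.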
Apply log: $\log\mu = \beta_0 + \beta_1 x$ for Poisson. Apply logit: $\log\frac{p}{1-p} = \beta_0 + \beta_1 x$ for logistic. Now the parameters are unconstrained, the surface is concave, and gradient ascent or Newton-Raphson will find the peak reliably — just like climbing a hill on a map.
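To illustrate the reliable climb, here is a minimal Newton-Raphson sketch on the log-link Poisson surface (toy data as above; a teaching sketch, not production fitting code):

```python
import numpy as np

# Hypothetical toy count data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0, 9.0])

def fit_poisson(tol=1e-10, max_iter=50):
    """Newton-Raphson on the concave log-link Poisson log-likelihood."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean()), 0.0])   # start at the flat model
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # slope of the surface
        info = X.T @ (mu[:, None] * X)         # curvature (observed information)
        step = np.linalg.solve(info, score)    # Newton step toward the peak
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

beta_hat = fit_poisson()   # converges in a handful of iterations
```

At the returned `beta_hat` the score is (numerically) zero: the climber has reached the single peak, which is exactly the MLE.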
Every tutorial's "fitting" page is about climbing a log-likelihood surface. Every tutorial's "advanced" page derives the shape of that surface from the probability model. And every optimisation visualisation demonstrates the algorithms that do the climbing.