Multi-Optima: When Algorithms Get Stuck

Soldiers vs Scouts in the search for the global maximum

⚔️ Soldier Mindset

"Always seek higher ground"

Fast, decisive, follows orders (the gradient). But if it starts on the wrong hill, it will climb to a local peak and declare victory.

  • Gradient Descent
  • Newton-Raphson
  • Steepest Ascent

🔎 Scout Mindset

"Survey the territory first"

Slower, willing to explore, may temporarily go downhill. More likely to find the true global maximum.

  • Random Restarts
  • Simulated Annealing
  • Basin Hopping

The Trade-off

When you're certain about the landscape (like most GLMs with convex likelihood), soldier algorithms are efficient. When uncertain (complex models, neural networks), scout algorithms are worth the extra cost.

Arthur's Seat: A Multi-Peak Landscape
[Interactive demo. Readouts: Current Position, Current Elevation, Iterations, Best Found, Global Max (261m). Click "Run Algorithm" or "Step" to begin.]
Section 1: Speed — How Fast to Reach the Peak?

On a simple, unimodal landscape, soldier algorithms excel. Watch how Newton-Raphson uses curvature information to take larger, smarter steps compared to gradient ascent's cautious approach.

[Interactive race: ⚔️ Gradient Ascent vs 🔬 Newton-Raphson. Click to compare step counts and elevation.]

Why Newton is Faster

Gradient ascent only uses first derivatives (slope) — it knows which direction is uphill but not how steep it stays. Newton-Raphson uses second derivatives (curvature) too — it can estimate how far to go before the slope flattens. On smooth, bowl-shaped landscapes, this makes Newton dramatically faster.
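The speed gap is easy to see in code. Below is a minimal sketch (not the page's demo) comparing the two update rules on a simple quadratic hill, f(x) = -(x - 3)², whose summit is at x = 3; the step size and tolerance are illustrative.

```python
# Minimal sketch: gradient ascent vs Newton-Raphson climbing
# f(x) = -(x - 3)^2, a smooth single-peak hill with its summit at x = 3.

def slope(x):            # first derivative: which way is uphill, how steep
    return -2.0 * (x - 3.0)

def curvature(x):        # second derivative: how fast the slope flattens
    return -2.0

def gradient_ascent(x, lr=0.1, tol=1e-8, max_iter=10_000):
    for i in range(1, max_iter + 1):
        step = lr * slope(x)              # fixed fraction of the slope
        x += step
        if abs(step) < tol:
            return x, i
    return x, max_iter

def newton_raphson(x, tol=1e-8, max_iter=100):
    for i in range(1, max_iter + 1):
        step = -slope(x) / curvature(x)   # curvature sizes the step
        x += step
        if abs(step) < tol:
            return x, i
    return x, max_iter

x_ga, n_ga = gradient_ascent(0.0)
x_nr, n_nr = newton_raphson(0.0)
# Both reach x close to 3, but Newton lands on the summit in a couple of
# steps while gradient ascent inches up for dozens of iterations.
```

On a pure quadratic, Newton's step is exact, which is why it converges almost immediately; gradient ascent shrinks its error by a constant factor per step.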

Section 2: Accuracy — How Often Do You Find the Global Peak?

Speed means nothing if you climb the wrong mountain. On multi-modal landscapes like Arthur's Seat, where you start determines where you end up. Scout algorithms trade speed for better global exploration.
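A toy version of this experiment, assuming a made-up two-peak landscape (global peak near x = 4, smaller local peak near x = 1): a single hill-climb from one random start versus keeping the best of five restarts, each scored over 200 trials.

```python
import math
import random

# Toy two-peak landscape: global peak near x = 4, local peak near x = 1.
def f(x):
    return 2.0 * math.exp(-(x - 4.0) ** 2) + math.exp(-(x - 1.0) ** 2)

def slope(x, h=1e-5):                 # numerical gradient
    return (f(x + h) - f(x - h)) / (2.0 * h)

def hill_climb(x, lr=0.05, n_steps=500):
    for _ in range(n_steps):
        x += lr * slope(x)            # soldier rule: always go uphill
    return x

def single_start():
    return hill_climb(random.uniform(0.0, 5.0))

def best_of_restarts(k=5):            # scout rule: try k hills, keep the best
    return max((hill_climb(random.uniform(0.0, 5.0)) for _ in range(k)), key=f)

random.seed(0)
trials = 200
hits_single = sum(abs(single_start() - 4.0) < 0.5 for _ in range(trials))
hits_restarts = sum(abs(best_of_restarts() - 4.0) < 0.5 for _ in range(trials))
# Restarts reach the global peak in far more trials than a single start.
```

A single start only succeeds when it happens to land in the global peak's basin of attraction; five restarts only need one of five draws to land there.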

[Interactive trials: ⚔️ Gradient Ascent, 🔬 Newton-Raphson, 🔍 Random Restarts (×5), 🌡️ Simulated Annealing. Each is run 20 times from random starting positions and scored by how often it finds the global peak. Click to run.]

The Local Optimum Trap

Soldier algorithms always find a peak — but often not the highest one. Scout algorithms invest extra computation to escape local optima. The payoff depends on how much you value finding the global optimum over settling for any reasonable solution.
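Simulated annealing makes "willing to temporarily go downhill" concrete. A minimal sketch on an assumed two-peak landscape: downhill moves are accepted with probability exp(Δ/T), and the temperature T cools so the scout gradually turns into a soldier. All constants are illustrative.

```python
import math
import random

def f(x):  # toy landscape: local peak near x = 1, global peak near x = 4
    return 2.0 * math.exp(-(x - 4.0) ** 2) + math.exp(-(x - 1.0) ** 2)

def anneal(x, temp=2.0, cooling=0.995, n_steps=3000, step_sd=0.3):
    best = x
    for _ in range(n_steps):
        cand = x + random.gauss(0.0, step_sd)      # propose a random move
        delta = f(cand) - f(x)
        # Uphill moves always accepted; downhill with prob exp(delta / temp)
        if delta > 0 or random.random() < math.exp(delta / temp):
            x = cand
        if f(x) > f(best):
            best = x                               # remember the best seen
        temp *= cooling                            # cool: explore less over time
    return best

random.seed(1)
found = anneal(1.0)   # deliberately start on the *local* peak
# `found` is guaranteed at least as high as the start, and with enough
# steps it usually ends up near the global peak at x = 4.
```

Early on, high temperature makes downhill moves cheap to accept, so the walker can cross the valley between peaks; late on, the near-zero temperature pins it to whichever hill it is on.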

Section 3: Representativeness — How Well Do You Describe the Landscape?

Finding a peak is just part of the story. How confident should we be in that answer? The Hessian-based approximation assumes the landscape is a smooth bowl near the peak — but what if it isn't?

[Interactive comparison: MLE + Hessian ellipse (95% confidence region based on curvature at the peak) vs MCMC posterior samples (actual samples from the probability landscape).]

MLE + Hessian Approximation

Fast: One optimisation run + one matrix inversion.
Assumes: Landscape is roughly quadratic (bowl-shaped) near the peak.
Fails when: Ridges, multiple modes, or heavy tails.
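As a concrete sketch of this recipe, here is the Hessian step for a Poisson rate, with made-up counts: the curvature of the log-likelihood at the MLE is inverted to get a standard error.

```python
import math

# Sketch: curvature-based standard error for a Poisson rate (toy data).
counts = [3, 5, 2, 4, 6, 3, 4]
n = len(counts)
total = sum(counts)

mle = total / n              # lambda-hat: the sample mean maximises the likelihood

# Log-likelihood (dropping constants): total * log(lam) - n * lam
# Second derivative at the peak:       -total / lam^2
hessian = -total / mle ** 2  # negative: the peak curves downward

se = math.sqrt(-1.0 / hessian)          # invert the curvature for the variance
ci_95 = (mle - 1.96 * se, mle + 1.96 * se)
# For the Poisson this reduces to the familiar se = sqrt(lambda-hat / n).
```

The whole procedure is one maximisation and one inversion, which is why it is fast; the 1.96 multiplier is exactly the "landscape is a quadratic bowl" assumption at work.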

MCMC Sampling

Slow: Thousands of iterations to explore.
Assumes: Very little — just the ability to evaluate the likelihood.
Reveals: True shape of uncertainty, including asymmetries and multiple modes.
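"Just the ability to evaluate the likelihood" is literal. Below is a minimal random-walk Metropolis sketch, with an assumed skewed target (a Gamma(2, 1) density) where the Hessian's symmetric ellipse would miss the asymmetry.

```python
import math
import random

def log_lik(x):
    # Log of a Gamma(shape=2, rate=1) density, up to a constant: skewed
    # right, with its mode at x = 1 but its mean at x = 2.
    return math.log(x) - x if x > 0 else -math.inf

def metropolis(n_samples=20_000, step_sd=1.0, x=1.0):
    samples = []
    for _ in range(n_samples):
        cand = x + random.gauss(0.0, step_sd)          # propose a jump
        # Accept with prob min(1, lik(cand) / lik(x)), done in log space
        if math.log(random.random()) < log_lik(cand) - log_lik(x):
            x = cand
        samples.append(x)
    return samples

random.seed(0)
draws = metropolis()[5000:]        # discard burn-in
posterior_mean = sum(draws) / len(draws)
# The sample mean sits near 2 while the peak is at 1: exactly the skew
# that a quadratic approximation at the mode cannot represent.
```

Note the sampler never needs derivatives or a known normalising constant, only likelihood ratios between the current point and a proposal.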

Conclusion: No Free Lunch

Each algorithm optimises for different things. There is no single "best" choice — only trade-offs appropriate to your problem.

Algorithm           | Speed | Accuracy | Representativeness | Best For
Gradient Ascent     | ⭐⭐   |          |                    | Simple convex problems, debugging
Newton-Raphson      | ⭐⭐⭐  | ⭐⭐      |                    | GLMs, smooth likelihoods
Random Restarts     |       | ⭐⭐⭐     |                    | Mixture models, neural net init
Simulated Annealing |       | ⭐⭐⭐     |                    | Discrete optimisation, scheduling
MCMC                |       | ⭐⭐      | ⭐⭐⭐               | Bayesian inference, uncertainty quantification

The Fundamental Trade-off

Speed vs Thoroughness: Fast algorithms assume the landscape is "nice" (convex, smooth). Slow algorithms make fewer assumptions but pay with computation.

Point Estimate vs Distribution: Optimisation finds where the peak is. MCMC tells you how certain you should be about that answer.

For most GLMs with well-behaved likelihoods, Newton-Raphson + Hessian-based standard errors is the sweet spot: fast, accurate, and "representative enough". But always check your assumptions!

Why This Matters for Statistics

Most GLM likelihoods are concave (a single smooth dome when maximising), meaning there's only one peak. Soldier algorithms like Newton-Raphson are perfect here — they'll always find the global maximum.

But some models have multiple local optima:

  • Mixture models
  • Neural networks and other complex, non-convex models

The Uncertainty Principle of Optimisation

The more uncertain you are about your landscape's shape, the more valuable exploration becomes. A soldier who always seeks higher ground will summit a peak quickly — but it may not be the highest one. A scout who surveys the territory first moves slower but is more likely to find the true summit.

📖
The Scout Mindset
Julia Galef (2021)
The soldier/scout distinction is borrowed (loosely!) from Galef's book about reasoning and truth-seeking. In her framing, soldiers fight to defend beliefs they already hold; scouts explore to understand reality. We're repurposing the metaphor for optimisation: soldiers follow orders (the gradient) without question; scouts survey the territory before committing. Galef's book is about epistemology, not algorithms — but the core insight transfers: when uncertain about the landscape, exploration beats exploitation.

🐜 You found the ant!

What's an ant doing on a page about optimisation?

Real ants have solved the exploration-exploitation trade-off that plagues optimisation algorithms. Their solution is now a real algorithm: Ant Colony Optimisation (ACO).

The Colony's Dilemma

Most ants follow pheromone trails left by other ants. The more ants that travel a route, the stronger the trail becomes. This is efficient — why reinvent the wheel?

But here's the problem: if all ants followed the strongest trail 100% of the time, the colony would deplete that food source and starve. They'd be stuck on a "local optimum" while better resources exist elsewhere.

The Solution: Disciplined Wandering

A small percentage of ants ignore the pheromone trail and wander randomly. Most of these "rebels" find nothing. But occasionally, one discovers a new food source — and starts a new trail.

Ant Behaviour                 | Algorithm Equivalent
Follow strong pheromone trail | Exploit current best solution
Random wandering              | Explore new regions
Pheromone strength            | Solution quality memory
Pheromone evaporation         | Forgetting old solutions
% that wander                 | Exploration parameter (ε)
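The right-hand column of the table is essentially an epsilon-greedy bandit. A toy sketch (a deliberate simplification of real ACO; the routes, rewards, and rates are made up): most ants exploit the strongest trail, a fraction eps wander at random, and evaporation lets the colony forget a route whose scent no longer matches its quality.

```python
import random

def choose_route(pheromone, eps=0.1):
    if random.random() < eps:                        # the wandering rebels
        return random.randrange(len(pheromone))
    return max(range(len(pheromone)), key=pheromone.__getitem__)

def lay_scent(pheromone, route, reward, evaporation=0.2):
    # Evaporation forgets part of the old trail; the reward lays fresh scent.
    pheromone[route] = (1 - evaporation) * pheromone[route] + evaporation * reward

random.seed(0)
true_quality = [1.0, 0.2, 0.5]      # hidden quality of three food routes
pheromone = [0.0, 0.5, 0.0]         # colony starts locked onto route 1

for _ in range(500):
    route = choose_route(pheromone)
    lay_scent(pheromone, route, true_quality[route])

favourite = max(range(3), key=pheromone.__getitem__)
# A few "failed" wanderers are enough for the colony to migrate to route 0.
```

With eps = 0, the colony would reinforce route 1 forever; the occasional random wanderer is what lets the better route's scent ever accumulate.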

The profound bit: Individual ants that wander off probably fail. But the colony benefits from having some percentage of "failures" — because resources deplete, environments change, and better solutions exist elsewhere. The 100% disciplined colony is perfectly adapted to current conditions, and doomed when they change.

Beyond Search: A Philosophy

This maps to model selection too. Don't just fit the model that worked last time. Some "wasted" exploration prevents overfitting to one dataset or problem type.

The ants figured this out millions of years before we invented gradient descent.