Soldiers vs Scouts in the search for the global maximum
**The soldier:** "Always seek higher ground." Fast, decisive, follows orders (the gradient). But if it starts on the wrong hill, it will climb to a local peak and declare victory.

**The scout:** "Survey the territory first." Slower, willing to explore, may temporarily go downhill. More likely to find the true global maximum.
When you're certain about the landscape (like most GLMs, whose log-likelihood is concave), soldier algorithms are efficient. When you're uncertain (complex models, neural networks), scout algorithms are worth the extra cost.
On a simple, unimodal landscape, soldier algorithms excel. Watch how Newton-Raphson uses curvature information to take larger, smarter steps compared to gradient ascent's cautious approach.
Gradient ascent only uses first derivatives (slope) — it knows which direction is uphill but not how steep it stays. Newton-Raphson uses second derivatives (curvature) too — it can estimate how far to go before the slope flattens. On smooth, bowl-shaped landscapes, this makes Newton dramatically faster.
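The difference can be sketched on a toy concave "log-likelihood" (the quadratic, learning rate, and tolerances below are illustrative choices, not from the text):

```python
# Toy concave objective: f(x) = -(x - 3)^2, peak at x = 3.
def grad(x):   # first derivative: which way is uphill
    return -2.0 * (x - 3.0)

def hess(x):   # second derivative: how fast the slope flattens
    return -2.0

def gradient_ascent(x, lr=0.1, tol=1e-8, max_iter=10_000):
    for i in range(max_iter):
        step = lr * grad(x)
        x += step
        if abs(step) < tol:
            return x, i + 1
    return x, max_iter

def newton_raphson(x, tol=1e-8, max_iter=100):
    for i in range(max_iter):
        step = grad(x) / hess(x)   # curvature-scaled step
        x -= step
        if abs(step) < tol:
            return x, i + 1
    return x, max_iter

x_ga, n_ga = gradient_ascent(0.0)
x_nr, n_nr = newton_raphson(0.0)
# On an exactly quadratic landscape, one Newton step lands on the peak
# (a second iteration just confirms convergence); gradient ascent creeps
# there over dozens of small steps.
print(n_ga, n_nr)
```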
Speed means nothing if you climb the wrong mountain. On multi-modal landscapes like Arthur's Seat, where you start determines where you end up. Scout algorithms trade speed for better global exploration.
Soldier algorithms always find a peak — but often not the highest one. Scout algorithms invest extra computation to escape local optima. The payoff depends on how much you value finding the global versus any reasonable solution.
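A minimal random-restarts sketch, using a made-up two-peaked landscape (function, learning rate, and start range are all illustrative):

```python
import math
import random

# Two-peaked landscape: local peak near x = 2, global peak near x = -2.
def f(x):
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x + 2.0) ** 2)

def climb(x, lr=0.05, steps=2000, h=1e-5):
    # A plain "soldier" hill-climb using a numerical gradient.
    for _ in range(steps):
        g = (f(x + h) - f(x - h)) / (2 * h)
        x += lr * g
    return x

random.seed(0)
# One soldier from an unlucky start summits the local peak...
local = climb(3.0)
# ...while random restarts keep the best of many climbs (a "scout" tactic).
starts = [random.uniform(-5, 5) for _ in range(10)]
best = max((climb(x0) for x0 in starts), key=f)
print(round(local, 2), round(best, 2))
```

The extra cost is exactly ten climbs instead of one; the payoff is that at least one restart lands in the global peak's basin.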
Finding a peak is just part of the story. How confident should we be in that answer? The Hessian-based approximation assumes the landscape is a smooth bowl near the peak — but what if it isn't?
**Hessian-based approximation**
- Fast: one optimisation run plus one matrix inversion.
- Assumes: the landscape is roughly quadratic (bowl-shaped) near the peak.
- Fails when: there are ridges, multiple modes, or heavy tails.

**MCMC sampling**
- Slow: thousands of iterations to explore.
- Assumes: very little, just the ability to evaluate the likelihood.
- Reveals: the true shape of uncertainty, including asymmetries and multiple modes.
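A minimal Metropolis sampler shows this on an assumed skewed target, a Gamma(3, 1) log-density whose mode (2) and mean (3) differ, exactly the asymmetry a quadratic approximation at the peak cannot see (target, proposal scale, and chain length are illustrative choices):

```python
import math
import random

# Skewed toy log-posterior: log p(x) = 2*ln(x) - x for x > 0
# (Gamma with shape 3, rate 1).  Mode = 2, mean = 3.
def logp(x):
    return 2.0 * math.log(x) - x if x > 0 else -math.inf

def metropolis(n, x0=2.0, scale=1.0, seed=42):
    random.seed(seed)
    x, samples = x0, []
    for _ in range(n):
        prop = x + random.gauss(0.0, scale)   # symmetric random-walk proposal
        # Accept uphill moves always, downhill moves with probability
        # exp(logp(prop) - logp(x)).
        if random.random() < math.exp(min(0.0, logp(prop) - logp(x))):
            x = prop
        samples.append(x)
    return samples

draws = metropolis(50_000)[5_000:]            # drop burn-in
mean = sum(draws) / len(draws)
# The sample mean tracks the distribution's mean (near 3), not the
# mode (2) that a pure optimiser would report.
print(round(mean, 2))
```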
Each algorithm optimises for different things. There is no single "best" choice — only trade-offs appropriate to your problem.
| Algorithm | Speed | Accuracy | Representativeness | Best For |
|---|---|---|---|---|
| Gradient Ascent | ⭐⭐ | ⭐ | ⭐ | Simple convex problems, debugging |
| Newton-Raphson | ⭐⭐⭐ | ⭐ | ⭐⭐ | GLMs, smooth likelihoods |
| Random Restarts | ⭐ | ⭐⭐⭐ | ⭐ | Mixture models, neural net init |
| Simulated Annealing | ⭐ | ⭐⭐⭐ | ⭐ | Discrete optimisation, scheduling |
| MCMC | ⭐ | ⭐⭐ | ⭐⭐⭐ | Bayesian inference, uncertainty quantification |
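To make the simulated-annealing row concrete, here is a hedged sketch on a made-up two-peaked landscape: the temperature schedule lets the walker accept downhill moves early on and escape the local peak it starts beside (all constants are illustrative):

```python
import math
import random

# Two-peaked landscape: local peak near x = 2, global peak near x = -2.
def f(x):
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x + 2.0) ** 2)

def anneal(x, n=20_000, t0=1.0, seed=1):
    random.seed(seed)
    best = x
    for i in range(n):
        t = t0 * (1 - i / n) + 1e-6           # linear cooling schedule
        prop = x + random.gauss(0.0, 0.5)
        delta = f(prop) - f(x)
        # Always accept uphill; accept downhill with prob exp(delta / t),
        # which shrinks toward zero as the temperature cools.
        if delta > 0 or random.random() < math.exp(delta / t):
            x = prop
        if f(x) > f(best):
            best = x
    return best

best = anneal(3.0)   # starts in the basin of the *local* peak
print(round(best, 2))
```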
**Speed vs thoroughness:** fast algorithms assume the landscape is "nice" (convex, smooth); slow algorithms make fewer assumptions but pay with computation.
**Point estimate vs distribution:** optimisation finds where the peak is; MCMC tells you how certain you should be about that answer.
For most GLMs with well-behaved likelihoods, Newton-Raphson plus Hessian-based standard errors is the sweet spot: fast, accurate, and "representative enough". But always check your assumptions!
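The Hessian-based standard error amounts to measuring curvature at the peak. A hedged sketch for a Poisson rate MLE, with made-up data (the dataset and step size `h` are illustrative):

```python
import math

y = [3, 1, 4, 1, 5, 9, 2, 6]          # made-up Poisson counts
n, s = len(y), sum(y)

def loglik(lam):
    # Poisson log-likelihood up to an additive constant
    return s * math.log(lam) - n * lam

lam_hat = s / n                        # the MLE is the sample mean
h = 1e-4
# Numerical second derivative (curvature) at the peak
curv = (loglik(lam_hat + h) - 2 * loglik(lam_hat) + loglik(lam_hat - h)) / h**2
se = 1.0 / math.sqrt(-curv)            # SE = sqrt of inverse observed information
# For this model the analytic answer is sqrt(lam_hat / n)
print(round(lam_hat, 3), round(se, 3))
```

The sharper the curvature at the peak, the smaller the standard error, which is exactly the "smooth bowl near the peak" assumption flagged above.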
Most GLM log-likelihoods are concave (a single smooth dome when maximising), meaning there's only one peak. Soldier algorithms like Newton-Raphson are perfect here: they'll always find the global maximum.
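For instance, Newton-Raphson on an intercept-only logistic regression (a minimal GLM; the binary outcomes below are made up) reaches the unique maximum in a handful of steps, matching the closed-form answer, the log-odds of the sample proportion:

```python
import math

y = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]     # made-up binary outcomes
n, s = len(y), sum(y)

beta = 0.0                              # intercept on the log-odds scale
for _ in range(25):
    p = 1.0 / (1.0 + math.exp(-beta))   # fitted probability
    grad = s - n * p                    # score (first derivative)
    hess = -n * p * (1.0 - p)           # curvature (always negative: concave)
    beta -= grad / hess                 # Newton step

# The MLE matches logit(mean(y)) regardless of the starting value,
# because the concave log-likelihood has a single peak.
print(round(beta, 4))
```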
But some models, such as mixture models and neural networks, have multiple local optima.
The more uncertain you are about your landscape's shape, the more valuable exploration becomes. A soldier who always seeks higher ground will summit a peak quickly — but it may not be the highest one. A scout who surveys the territory first moves slower but is more likely to find the true summit.