
Tutorial 3: Fitting the Model

How should we estimate the Poisson regression parameters?


Your model so far

Systematic component: $\eta = \beta_0 + \beta_1 \cdot \text{Temp} + \ldots$
Link function: log, $\ln(\mu) = \eta$
Distribution: Poisson

Same Fitting Story as Tutorial 2

Like logistic regression, Poisson regression has no closed-form solution. The log link creates a non-linear equation that must be solved iteratively. The good news is that the same IRLS algorithm works for both: GLMs share a unified fitting framework!

How should we find the $\beta$ coefficients?

We have our model structure defined. Now we need to estimate the parameters ($\beta_0, \beta_1, \ldots, \beta_5$) that best fit the observed rental counts.

Click on a card to select it.

🔄

Maximum Likelihood (IRLS)

Iteratively Reweighted Least Squares - the standard GLM fitting algorithm.

Iterative optimization • Works for all GLMs • Fisher scoring

📐

Closed-Form (OLS)

Direct analytical solution using matrix algebra: $\beta = (X'X)^{-1}X'y$

One-shot calculation • No iteration • Exact solution

⛰️

Gradient Descent

General-purpose optimization by following the gradient of the loss function.

Step size tuning • Slower convergence • Deep learning standard

🎯

Newton-Raphson

Second-order optimization using the Hessian matrix.

Fast convergence • Computes curvature • IRLS is a form of this

✔ Fitting Method Selected

Maximum Likelihood via IRLS is the standard approach for GLMs, and it's required for Poisson regression since there's no closed-form solution.

R's glm() and Python's statsmodels use IRLS (a form of Fisher scoring) by default. For our bike rental model, it typically converges in 4-5 iterations.

✔ Correct!

Maximum Likelihood Estimation (MLE) via Iteratively Reweighted Least Squares (IRLS) is the standard method for Poisson regression and all GLMs.

Why IRLS is essential for Poisson regression:
The log link creates a non-linear relationship between the predictors and the expected count. There's no algebraic trick to solve for $\beta$ directly; we must iterate to find the maximum likelihood solution.

The IRLS algorithm for Poisson regression:
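A minimal NumPy sketch of the IRLS loop, using the standard GLM working-response form (the data here is synthetic, for illustration only):

```python
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=25):
    """Fit a Poisson GLM (log link) by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # inverse log link
        z = eta + (y - mu) / mu        # working response
        W = mu                         # Poisson weights: Var(Y) = mu
        # Weighted least squares step: beta = (X'WX)^{-1} X'Wz
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic example (assumed data, for illustration only)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(500), rng.uniform(0, 35, 500)])
y = rng.poisson(np.exp(0.5 + 0.04 * X[:, 1]))
print(irls_poisson(X, y))  # estimates should be close to the true (0.5, 0.04)
```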

Notice: the weights for Poisson ($\mu$) differ from logistic ($p(1-p)$), but the algorithm structure is identical!

🔍 Want to see optimisation in action?
Our interactive visualisations show how algorithms navigate parameter space from 1D to 4D, including gradient descent and Newton-Raphson.

❌ No Closed-Form Solution Exists

For linear regression (Gaussian + identity), the closed-form solution $\beta = (X'X)^{-1}X'y$ works beautifully. But Poisson regression has no such solution.

Why? The log link creates a non-linear relationship:

$\ln(\mu) = X\beta \quad \Rightarrow \quad \mu = e^{X\beta}$

Solving for $\beta$ requires finding where the derivative of the Poisson log-likelihood equals zero, which has no closed-form solution.
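Concretely, those score equations are:

$\ell(\beta) = \sum_i \left[ y_i\, x_i^\top \beta - e^{x_i^\top \beta} - \ln(y_i!) \right]$

$\nabla \ell(\beta) = \sum_i \left( y_i - e^{x_i^\top \beta} \right) x_i = 0$

Because $\beta$ sits inside the exponential, this system is non-linear in $\beta$ and cannot be isolated algebraically.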

This is a fundamental difference between linear regression, whose normal equations solve in one step, and GLMs with non-identity links, which must be fitted iteratively.

Select Maximum Likelihood (IRLS) to continue.

⚠️ Valid, But Not Standard for GLMs

Gradient descent would work for Poisson regression - it's how neural networks are trained. However, it's not the standard approach for GLMs.

Why IRLS is preferred over gradient descent:
  • Faster convergence: IRLS uses curvature information (second derivatives)
  • No step-size tuning: gradient descent requires choosing a learning rate
  • Exact standard errors: IRLS yields the Fisher information matrix for free
  • Guaranteed convergence: for well-posed GLMs, IRLS converges reliably

Gradient descent is typically 10-100x slower than IRLS for Poisson regression. It remains the standard in deep learning, where computing the full Hessian is impractical.
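For contrast, here is a bare-bones gradient ascent on the Poisson log-likelihood (equivalent to gradient descent on the negative log-likelihood). The data is synthetic, and the learning rate is hand-tuned, which is exactly the step-size sensitivity noted above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])  # intercept + one predictor
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))         # true beta = (1.0, 0.5)

beta = np.zeros(2)
lr = 1e-3  # hand-tuned: too large diverges, too small crawls
for _ in range(2000):
    mu = np.exp(X @ beta)
    grad = X.T @ (y - mu)      # score of the Poisson log-likelihood
    beta = beta + lr * grad    # ascend the log-likelihood

print(beta)  # thousands of steps to reach what IRLS gets in a handful
```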

Select Maximum Likelihood (IRLS) for the standard GLM approach.

⭐ Excellent Insight!

Newton-Raphson is indeed the mathematical foundation of how we fit Poisson regression! IRLS for Poisson GLMs is actually equivalent to Newton-Raphson (specifically, Fisher scoring).

The connection:

Newton-Raphson update: $\beta^{new} = \beta^{old} - H^{-1} \nabla \ell$

Where $H$ is the Hessian and $\nabla \ell$ is the score. For GLMs, this can be rewritten as an iteratively reweighted least squares problem.

Fisher scoring (used in IRLS) replaces the observed Hessian with its expected value (Fisher information), which simplifies computation and guarantees positive definiteness.
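For the Poisson model this replacement changes nothing: with the canonical log link, the observed Hessian contains no term involving $y$, so it already equals its expectation. Writing $W = \mathrm{diag}(\mu_1, \ldots, \mu_n)$:

$H = -X^\top W X = -\mathcal{I}(\beta)$

Newton-Raphson and Fisher scoring therefore take identical steps for Poisson regression.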

In practice, we use the term "Maximum Likelihood via IRLS" - select that option to continue.