
Tutorial 3: Fitting the Model

How should we estimate the Poisson regression parameters?


Your model so far

Systematic component: $\eta = \beta_0 + \beta_1 \cdot \text{Temp} + \ldots$
Link function: log, $\ln(\mu) = \eta$
Distribution: Poisson

Same Fitting Story as Tutorial 2

Like logistic regression, Poisson regression has no closed-form solution. The log link creates a non-linear equation that must be solved iteratively. The good news is that the same IRLS algorithm works for both: GLMs share a unified fitting framework!

How should we find the $\beta$ coefficients?

We have our model structure defined. Now we need to estimate the parameters ($\beta_0, \beta_1, \ldots, \beta_5$) that best fit the observed rental counts.

Click on a card to select it.

🔄

Maximum Likelihood (IRLS)

Iteratively Reweighted Least Squares - the standard GLM fitting algorithm.

Iterative optimization • Works for all GLMs • Fisher scoring

📐

Closed-Form (OLS)

Direct analytical solution using matrix algebra: $\beta = (X'X)^{-1}X'y$

One-shot calculation • No iteration • Exact solution

⛰️

Gradient Descent

General-purpose optimization by following the gradient of the loss function.

Step size tuning • Slower convergence • Deep learning standard

🎯

Newton-Raphson

Second-order optimization using the Hessian matrix.

Fast convergence • Computes curvature • IRLS is a form of this

✔ Fitting Method Selected

Maximum Likelihood via IRLS is the standard approach for GLMs, and it's required for Poisson regression since there's no closed-form solution.

R's glm() and Python's statsmodels use IRLS (a form of Fisher scoring) by default. For our bike rental model, it typically converges in 4-5 iterations.

✔ Correct!

Maximum Likelihood Estimation (MLE) via Iteratively Reweighted Least Squares (IRLS) is the standard method for Poisson regression and all GLMs.

Why IRLS is essential for Poisson regression:
The log link creates a non-linear relationship between the predictors and the expected count. There's no algebraic trick to solve for $\beta$ directly; we must iterate to find the maximum likelihood solution.

The IRLS algorithm for Poisson regression:
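A minimal NumPy sketch of the IRLS loop, using the standard GLM working-response form (the data here is synthetic, for illustration only):

```python
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=25):
    """Fit a Poisson GLM (log link) by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # inverse log link
        z = eta + (y - mu) / mu        # working response
        W = mu                         # Poisson weights: Var(Y) = mu
        # Weighted least squares step: beta = (X'WX)^{-1} X'Wz
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic example (assumed data, for illustration only)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(500), rng.uniform(0, 35, 500)])
y = rng.poisson(np.exp(0.5 + 0.04 * X[:, 1]))
print(irls_poisson(X, y))  # estimates should be close to the true (0.5, 0.04)
```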

Notice: the weights for Poisson ($\mu$) differ from logistic ($p(1-p)$), but the algorithm structure is identical!

🔍 Want to see optimisation in action?
Our interactive visualisations show how algorithms navigate parameter space from 1D to 4D, including gradient descent and Newton-Raphson.

❌ No Closed-Form Solution Exists

For linear regression (Gaussian + identity), the closed-form solution $\beta = (X'X)^{-1}X'y$ works beautifully. But Poisson regression has no such solution.

Why? The log link creates a non-linear relationship:

$\ln(\mu) = X\beta \quad \Rightarrow \quad \mu = e^{X\beta}$

Solving for $\beta$ requires finding where the derivative of the Poisson log-likelihood equals zero, which has no closed-form solution.
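Concretely, those score equations are:

$\ell(\beta) = \sum_i \left[ y_i\, x_i^\top \beta - e^{x_i^\top \beta} - \ln(y_i!) \right]$

$\nabla \ell(\beta) = \sum_i \left( y_i - e^{x_i^\top \beta} \right) x_i = 0$

Because $\beta$ sits inside the exponential, this system is non-linear in $\beta$ and cannot be isolated algebraically.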

This is a fundamental difference between linear regression, whose normal equations solve in one step, and GLMs with non-identity links, which must be fitted iteratively.

Select Maximum Likelihood (IRLS) to continue.

⚠️ Valid, But Not Standard for GLMs

Gradient descent would work for Poisson regression - it's how neural networks are trained. However, it's not the standard approach for GLMs.

Why IRLS is preferred over gradient descent:
  • Faster convergence: IRLS uses curvature information (second derivatives)
  • No step-size tuning: gradient descent requires choosing a learning rate
  • Exact standard errors: IRLS yields the Fisher information matrix for free
  • Guaranteed convergence: for well-posed GLMs, IRLS converges reliably

Gradient descent is typically 10-100x slower than IRLS for Poisson regression. It remains the standard in deep learning, where computing the full Hessian is impractical.
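For contrast, here is a bare-bones gradient ascent on the Poisson log-likelihood (equivalent to gradient descent on the negative log-likelihood). The data is synthetic, and the learning rate is hand-tuned, which is exactly the step-size sensitivity noted above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])  # intercept + one predictor
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))         # true beta = (1.0, 0.5)

beta = np.zeros(2)
lr = 1e-3  # hand-tuned: too large diverges, too small crawls
for _ in range(2000):
    mu = np.exp(X @ beta)
    grad = X.T @ (y - mu)      # score of the Poisson log-likelihood
    beta = beta + lr * grad    # ascend the log-likelihood

print(beta)  # thousands of steps to reach what IRLS gets in a handful
```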

Select Maximum Likelihood (IRLS) for the standard GLM approach.

⭐ Excellent Insight!

Newton-Raphson is indeed the mathematical foundation of how we fit Poisson regression! IRLS for Poisson GLMs is actually equivalent to Newton-Raphson (specifically, Fisher scoring).

The connection:

Newton-Raphson update: $\beta^{new} = \beta^{old} - H^{-1} \nabla \ell$

Where $H$ is the Hessian and $\nabla \ell$ is the score. For GLMs, this can be rewritten as an iteratively reweighted least squares problem.

Fisher scoring (used in IRLS) replaces the observed Hessian with its expected value (Fisher information), which simplifies computation and guarantees positive definiteness.
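For the Poisson model this replacement changes nothing: with the canonical log link, the observed Hessian contains no term involving $y$, so it already equals its expectation. Writing $W = \mathrm{diag}(\mu_1, \ldots, \mu_n)$:

$H = -X^\top W X = -\mathcal{I}(\beta)$

Newton-Raphson and Fisher scoring therefore take identical steps for Poisson regression.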

In practice, we use the term "Maximum Likelihood via IRLS" - select that option to continue.