Generalised Linear Models

Interactive tutorials that teach you to match statistical models to your data's characteristics

Why Does the Model Choice Matter?

Many introductory statistics courses teach only one model: ordinary least squares regression (R's lm(), scikit-learn's LinearRegression() in Python). This works well for continuous outcomes with approximately normal errors. But what about...

Using the wrong model isn't just statistically incorrect: it can give you impossible predictions (negative probabilities, negative counts) and misleading inferences (wrong standard errors, invalid confidence intervals).
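To make the "impossible predictions" point concrete, here is a minimal sketch that fits a straight line to a binary outcome using the closed-form OLS formulas. The data are invented for illustration and are not from any of the tutorial datasets.

```python
# Toy data (made up for illustration): a binary outcome against one predictor.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

def ols_fit(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = ols_fit(x, y)

# Predictions read as "probabilities" at a low, a middle, and a high x value.
for x_new in (0, 4.5, 10):
    print(x_new, round(intercept + slope * x_new, 3))
```

With these numbers the fitted line predicts a value below 0 at x = 0 and above 1 at x = 10, neither of which can be a probability.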

GLMs are about making your model compatible with your data's generating process—not forcing everything into an inappropriate framework because it's the only one you know.

What Goes Wrong Without GLMs?

| Data Type | Example Outcome | OLS Prediction Problem | GLM Solution |
|---|---|---|---|
| Binary (0/1) | Heart disease (yes/no) | Can predict P = 1.3 or P = -0.2 | Logistic: bounds to (0,1) |
| Count | Bike rentals per day | Can predict -50 rentals | Poisson/NegBin: non-negative integers |
| Positive continuous | Insurance claim amount | Can predict negative costs | Gamma: strictly positive |

The GLM Framework: $X \to \eta = X\beta \to \mu = g^{-1}(\eta) \to Y \sim f(\mu, \alpha)$
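The middle of that chain, from the linear predictor $\eta$ to the mean $\mu$, can be sketched in a few lines. The observation and coefficients below are hypothetical, and the three inverse links shown (identity, logit, log) are common choices rather than an exhaustive list.

```python
import math

# Inverse link functions g^{-1}: they map the linear predictor eta to the mean mu.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                         # mu anywhere on the real line
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # mu in (0, 1)
    "log": lambda eta: math.exp(eta),                    # mu > 0
}

def linear_predictor(x_row, beta):
    """eta = x . beta for one observation (x_row starts with 1 for the intercept)."""
    return sum(xj * bj for xj, bj in zip(x_row, beta))

x_row = [1.0, 2.5]    # hypothetical observation: intercept term and one predictor
beta = [-3.0, 0.4]    # hypothetical coefficients
eta = linear_predictor(x_row, beta)

for name, inv_link in INVERSE_LINKS.items():
    print(f"{name:8s} eta={eta:+.2f} -> mu={inv_link(eta):.4f}")
```

The same negative $\eta$ stays negative under the identity link but lands inside (0, 1) under the logit link and above 0 under the log link, which is how the link keeps predictions in the valid range.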

The Two-Part GLM Structure (after King, Tomz & Wittenberg, 2000):

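Stochastic $Y_i \sim f(\theta_i, \alpha)$: the random component (distribution family + dispersion)
Systematic $\theta_i = g(X_i, \beta)$: the deterministic component (predictors + link). In this notation $g$ stands for the whole systematic function; for a GLM it is the inverse link applied to the linear predictor $X_i\beta$.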

Models from linear regression to logistic regression all fit this two-part structure. The tutorials help you choose appropriate $f(\cdot)$ and $g(\cdot)$ for your data.
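As a worked illustration, the two models named above can be written in this two-part form. These are the standard textbook forms, not taken from the tutorials themselves: for linear regression $\theta_i = \mu_i$ and $\alpha = \sigma^2$; for logistic regression $\theta_i = \pi_i$ and there is no separate dispersion parameter.

$$
\begin{aligned}
\text{Linear regression:} \quad & Y_i \sim \mathcal{N}(\mu_i, \sigma^2), & \mu_i &= X_i\beta \\
\text{Logistic regression:} \quad & Y_i \sim \text{Bernoulli}(\pi_i), & \pi_i &= \frac{1}{1 + e^{-X_i\beta}}
\end{aligned}
$$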

Tutorial Series

Each tutorial presents a real dataset and decision problem. Your task is to figure out the appropriate model.

1. Heart Rate Prediction

Which GLM family?

The Decision Problem

Predict a patient's maximum heart rate during exercise from their characteristics.

Dataset
UCI Heart Disease (Cleveland)
Your challenge: What type of outcome is heart rate? What constraints does it have? Which GLM family fits best?

2. Heart Disease Classification

Which GLM family?

The Decision Problem

Classify whether a patient has heart disease based on diagnostic measurements.

Dataset
UCI Heart Disease (Cleveland)
Your challenge: The outcome is yes/no. What link function maps probabilities to a linear predictor?

3. Bike Rental Demand

Which GLM family?

The Decision Problem

Predict daily bike rental demand from weather and calendar variables.

Dataset
UCI Bike Sharing Dataset
Your challenge: Rentals are counts (0, 1, 2, ...). What distribution models count data? What might go wrong?

4. Handling Overdispersion

Which GLM family?

The Decision Problem

The model from Tutorial 3 has a problem. Can you diagnose and fix it?

Dataset
UCI Bike Sharing (revisited)
Your challenge: A Poisson model assumes the variance equals the mean. When the variance exceeds the mean (overdispersion), its standard errors are too small. What's the solution?
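One common way to check for this problem is the Pearson dispersion statistic, sketched below for an intercept-only Poisson model. The counts are invented for illustration and are not the bike-sharing data; the tutorial itself may use a different diagnostic.

```python
from statistics import mean, variance

# Hypothetical daily counts, invented for illustration.
counts = [2, 0, 15, 3, 40, 1, 7, 22, 0, 10]

mu_hat = mean(counts)      # Poisson MLE of the mean for an intercept-only model
s2 = variance(counts)      # sample variance (n - 1 denominator)

# Pearson dispersion statistic: sum((y - mu)^2 / mu) / (n - p), with p = 1 here.
n, p = len(counts), 1
dispersion = sum((y - mu_hat) ** 2 / mu_hat for y in counts) / (n - p)

print(f"mean={mu_hat:.2f} variance={s2:.2f} dispersion={dispersion:.2f}")
```

Under a well-specified Poisson model the statistic should be close to 1. A value far above 1, as with these made-up counts, suggests overdispersion.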

5. Blood Pressure Prediction

Which GLM family?

The Decision Problem

Predict resting blood pressure—a strictly positive, continuous outcome.

Dataset
UCI Heart Disease (revisited)
Your challenge: Blood pressure can't be negative. What GLM family handles positive continuous data?
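To see what a strictly positive outcome looks like under one candidate model, the sketch below simulates from a Gamma distribution whose mean follows a log link. The coefficients, shape parameter, and predictor values are hypothetical and not fitted to the heart disease data.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_gamma_glm(x_values, beta0, beta1, shape):
    """Draw Y ~ Gamma with mean mu = exp(beta0 + beta1 * x) (log link).

    With shape k and scale mu / k, the Gamma mean is mu and the variance is mu^2 / k.
    """
    draws = []
    for x in x_values:
        mu = math.exp(beta0 + beta1 * x)   # inverse log link keeps mu > 0
        draws.append(random.gammavariate(shape, mu / shape))
    return draws

# Hypothetical coefficients and predictor values, chosen only for illustration.
x_values = [0.0, 0.5, 1.0, 1.5, 2.0] * 200
y = simulate_gamma_glm(x_values, beta0=4.5, beta1=0.1, shape=50.0)

print(min(y) > 0, round(sum(y) / len(y), 1))
```

Every draw is positive by construction, and the spread grows with the mean, two properties that a normal model with constant variance does not share.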

What You'll Learn

Each tutorial walks through the same 6-step process:

  1. Systematic Component — Choose your response and predictors
  2. Link Function — Connect the linear predictor to the mean
  3. Distribution — Choose the appropriate probability distribution
  4. Fitting Method — Understand how parameters are estimated
  5. Implementation — Code it in R and Python
  6. Advanced — Derive the log-likelihood and fit from scratch
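As a rough preview of step 6, the sketch below writes out the Bernoulli log-likelihood for a logistic regression with one predictor and maximises it by plain gradient ascent. The data, step size, and iteration count are assumptions chosen for illustration. Library routines typically use iteratively reweighted least squares instead, so treat this as one simple way to fit the model, not the method the tutorials necessarily use.

```python
import math

def sigmoid(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta, x, y):
    """Bernoulli log-likelihood with a logit link and one predictor."""
    b0, b1 = beta
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_gradient_ascent(x, y, step=0.05, n_iter=5000):
    """Maximise the log-likelihood by plain gradient ascent (not IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        g0 = sum(residuals)                               # d loglik / d b0
        g1 = sum(r * xi for r, xi in zip(residuals, x))   # d loglik / d b1
        b0 += step * g0
        b1 += step * g1
    return b0, b1

# Hypothetical data: the classes overlap, so the maximum-likelihood estimate is finite.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_gradient_ascent(x, y)
print(round(b0, 3), round(b1, 3), round(log_likelihood((b0, b1), x, y), 3))
```

The gradient has a simple form, residual times predictor, because the logit is the canonical link for the Bernoulli distribution.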

Optimisation Visualised

How do algorithms find the best parameters? Our interactive visualisations show you how MLE algorithms navigate parameter space—from simple 1D problems to the high-dimensional challenges faced by modern AI.

1D (curve) → 2D (surface) → 3D (volume) → 4D+ (projections only)

Watch gradient descent, Newton-Raphson, and analytic solutions in action. See why we need algorithms when visualisation fails.
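The three approaches can be compared on a 1D problem where the answer is known: estimating $\theta = \log\lambda$ for Poisson counts, whose MLE is the log of the sample mean. The sketch below is an assumption-laden illustration: the counts, step size, and tolerances are made up, and gradient ascent on the log-likelihood stands in for gradient descent on its negative.

```python
import math

counts = [3, 5, 2, 8, 4, 6, 1, 7]   # hypothetical counts, invented for illustration
n, total = len(counts), sum(counts)

def gradient(theta):
    """Derivative of the Poisson log-likelihood in theta = log(lambda)."""
    return total - n * math.exp(theta)

def hessian(theta):
    """Second derivative; always negative, so the log-likelihood is concave."""
    return -n * math.exp(theta)

def gradient_ascent(theta=0.0, step=0.01, tol=1e-10, max_iter=10_000):
    """Fixed-step ascent; returns (estimate, iterations used)."""
    for i in range(1, max_iter + 1):
        g = gradient(theta)
        if abs(g) < tol:
            return theta, i
        theta += step * g
    return theta, max_iter

def newton_raphson(theta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson; the step is scaled by the curvature at each iterate."""
    for i in range(1, max_iter + 1):
        g = gradient(theta)
        if abs(g) < tol:
            return theta, i
        theta -= g / hessian(theta)
    return theta, max_iter

theta_exact = math.log(total / n)   # analytic solution: log of the sample mean
theta_gd, iters_gd = gradient_ascent()
theta_nr, iters_nr = newton_raphson()
print(theta_exact, theta_gd, iters_gd, theta_nr, iters_nr)
```

All three arrive at the same estimate. Newton-Raphson needs fewer iterations here because it uses curvature, while the analytic solution exists only because this problem is simple enough to solve by hand.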

Explore Optimisation →

Alpha Version - We'd Love Your Feedback!

This tutorial series is in active development. If you encounter any issues or have ideas for improvements, please let us know through GitHub:

Report a Bug | Suggest a Feature

Requires a GitHub account. Your feedback helps improve these tutorials for everyone.