Generalised Linear Models

Interactive tutorials that teach you to match statistical models to your data's characteristics

Why Does the Model Choice Matter?

Many introductory statistics courses teach only one model: ordinary least squares regression (R's lm(), scikit-learn's LinearRegression() in Python). This works well for continuous outcomes with approximately normal errors. But what about binary outcomes, counts, proportions, or ordered categories?

Using the wrong model isn't just statistically incorrect—it can give you impossible predictions (negative probabilities, fractional counts) and misleading inferences (wrong standard errors, invalid confidence intervals).
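To make the "impossible predictions" point concrete, here is a minimal sketch that fits a straight line to a 0/1 outcome by least squares. The data are made up for illustration, and the names `x`, `y` and `ols_predict` are hypothetical.

```python
# Hypothetical toy data: x is a risk score, y is a 0/1 outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form simple least squares: slope = S_xy / S_xx.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def ols_predict(x_new):
    """Fitted straight line, naively read as a probability."""
    return intercept + slope * x_new


# Nothing stops the line from leaving the [0, 1] range.
for x_new in (-2, 4.5, 12):
    print(f"x = {x_new:>4}: predicted 'probability' = {ols_predict(x_new):.3f}")
```

With this toy data the fitted line drops below 0 at the low end and rises above 1 at the high end, which no probability can do.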

GLMs are about making your model compatible with your data's generating process—not forcing everything into an inappropriate framework because it's the only one you know.

These interactive tutorials are a companion to JonStats, which covers the underlying theory of statistical inference and simulation in depth.

What Goes Wrong Without GLMs?

| Data Type | Example Outcome | OLS Prediction Problem | GLM Solution |
|---|---|---|---|
| Binary (0/1) | Heart disease (yes/no) | Can predict P = 1.3 or P = -0.2 | Logistic: bounds to (0,1) |
| Count | Bike rentals per day | Can predict -50 rentals | Poisson/NegBin: non-negative integers |
| Positive continuous | Insurance claim amount | Can predict negative costs | Gamma: strictly positive |
| Proportion (0,1) | Exam pass rate | Can predict rates > 1 or < 0 | Beta: bounded to (0,1) |
| Ordered categories | Pain level (None/Mild/Moderate/Severe) | Treats gaps as equal, predicts "2.7 pain" | Ordinal: respects ordering without equal spacing |
| Zero-heavy counts | Doctor visits (many never visit) | Underpredicts zeros, wrong SEs | ZIP: two processes generating zeros |

The GLM Framework: $X \to \eta = X\beta \to \mu = g^{-1}(\eta) \to Y \sim f(\mu, \alpha)$

The Two-Part GLM Structure (after King, Tomz & Wittenberg, 2000):

Stochastic $Y_i \sim f(\theta_i, \alpha)$ — the random component (distribution family + dispersion)
Systematic $\theta_i = g(X_i, \beta)$ — the deterministic component (predictors + link)

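Models from linear regression to logistic regression fit this two-part structure. Note that $g(\cdot)$ here is the whole systematic function mapping predictors to $\theta_i$, so it plays the role of the inverse link $g^{-1}$ in the framework line above. The tutorials help you choose appropriate $f(\cdot)$ and $g(\cdot)$ for your data.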
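The chain from predictors to outcome can be sketched in a few lines. This is an illustrative simulation of one case only (a logit link with a Bernoulli outcome); the function names, design matrix, and coefficients are hypothetical.

```python
import math
import random


def inverse_logit(eta):
    """Inverse of the logit link: maps any real eta to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_bernoulli_glm(X, beta, rng):
    """Walk the chain X -> eta -> mu -> Y for a logistic model."""
    draws = []
    for row in X:
        eta = sum(b * x for b, x in zip(beta, row))  # systematic part: eta = X beta
        mu = inverse_logit(eta)                      # mean: mu = g^{-1}(eta)
        y = 1 if rng.random() < mu else 0            # stochastic part: Y ~ Bernoulli(mu)
        draws.append((eta, mu, y))
    return draws


rng = random.Random(0)
X = [(1.0, x) for x in (-3.0, -1.0, 0.0, 1.0, 3.0)]  # first column is the intercept
beta = (0.5, 1.2)                                    # hypothetical coefficients

for eta, mu, y in simulate_bernoulli_glm(X, beta, rng):
    print(f"eta = {eta:+.2f}   mu = {mu:.3f}   y = {y}")
```

Swapping `inverse_logit` for `math.exp` and the Bernoulli draw for a Poisson draw would give the count-data version of the same chain.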

Tutorial Series

Each tutorial presents a real dataset and decision problem. Your task is to figure out the appropriate model.

New to GLMs? Start with the theory in Introduction to Generalised Linear Models on JonStats.

1. Heart Rate Prediction

Which GLM family?

The Decision Problem

Predict a patient's maximum heart rate during exercise from their characteristics.

Dataset
UCI Heart Disease (Cleveland)
Your challenge: What type of outcome is heart rate? What constraints does it have? Which GLM family fits best?

2. Heart Disease Classification

Which GLM family?

The Decision Problem

Classify whether a patient has heart disease based on diagnostic measurements.

Dataset
UCI Heart Disease (Cleveland)
Your challenge: The outcome is yes/no. What link function maps probabilities to a linear predictor?

3. Bike Rental Demand

Which GLM family?

The Decision Problem

Predict daily bike rental demand from weather and calendar variables.

Dataset
UCI Bike Sharing Dataset
Your challenge: Rentals are counts (0, 1, 2, ...). What distribution models count data? What might go wrong?

4. Handling Overdispersion

Which GLM family?

The Decision Problem

The model from Tutorial 3 has a problem. Can you diagnose and fix it?

Dataset
UCI Bike Sharing (revisited)
Your challenge: When the variance exceeds the mean, a Poisson model's standard errors are too small and its inferences become unreliable. What's the solution?
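One common way to check whether variance exceeds the mean is the Pearson dispersion statistic. The sketch below computes it for an intercept-only Poisson fit; the counts are invented for illustration and are not the UCI data, and `pearson_dispersion` is a hypothetical helper.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values near 1 are consistent with a Poisson model; values well
    above 1 suggest overdispersion.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical daily counts, made up for this example.
counts = [12, 85, 40, 7, 150, 63, 22, 98, 5, 130]
mean_count = sum(counts) / len(counts)

# Intercept-only Poisson fit: every fitted mean equals the sample mean.
fitted = [mean_count] * len(counts)
phi = pearson_dispersion(counts, fitted, n_params=1)

print(f"mean = {mean_count:.1f}, dispersion = {phi:.1f}")
```

For an intercept-only model this statistic is just the sample variance divided by the sample mean, so a value far above 1 signals that the Poisson variance assumption does not hold.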

5. Blood Pressure Prediction

Which GLM family?

The Decision Problem

Predict resting blood pressure—a strictly positive, continuous outcome.

Dataset
UCI Heart Disease (revisited)
Your challenge: Blood pressure can't be negative. What GLM family handles positive continuous data?

6. Exam Performance Rates

Which GLM family?

The Decision Problem

Predict the proportion of exam questions answered correctly—a continuous value bounded between 0 and 1.

Dataset
Synthetic Exam Proportions
Your challenge: Proportions aren't binary and can't fall outside the interval from 0 to 1. What distribution handles continuous bounded data?

7. Pain Level Assessment

Which GLM family?

The Decision Problem

Predict pain severity (None/Mild/Moderate/Severe) after treatment—ordered categories with unequal gaps.

Dataset
Synthetic Clinical Trial
Your challenge: The outcome has a natural ordering but isn't a number. How do you model ordered categories?

8. Doctor Visit Counts

Which GLM family?

The Decision Problem

Predict annual doctor visits: count data where 41% of people have zero visits, far more than a standard count model such as Poisson would predict.

Dataset
Synthetic Health Survey
Your challenge: Two different processes produce zeros: healthy non-visitors and access-blocked patients. How do you model both?
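The idea of two processes generating zeros can be written as a mixture. Below is a minimal sketch of a zero-inflated Poisson probability mass function; the parameter values `lam` and `pi` are hypothetical and are not estimates from the tutorial's dataset.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: with probability pi the count is a
    structural zero, otherwise it is an ordinary Poisson(lam) draw."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


lam = 2.5   # hypothetical mean visits among people who can visit
pi = 0.35   # hypothetical share of structural zeros

print(f"P(0) under Poisson: {poisson_pmf(0, lam):.3f}")
print(f"P(0) under ZIP:     {zip_pmf(0, pi, lam):.3f}")
```

A zero can arise from either branch of the mixture, which is why the zero probability under the mixture is always at least as large as under the plain Poisson with the same `lam`.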

Tutorials 4–8 extend the framework beyond the original blog series — see the 25-part GLM series that started it all.

What You'll Learn

Each tutorial walks through the same 6-step process:

  1. Systematic Component — Choose your response and predictors
  2. Link Function — Connect the linear predictor to the mean
  3. Distribution — Choose the appropriate probability distribution
  4. Fitting Method — Understand how parameters are estimated
  5. Implementation — Code it in R and Python
  6. Advanced — Derive the log-likelihood and fit from scratch
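As a rough illustration of what steps 4 and 6 involve, the sketch below writes out a Bernoulli log-likelihood under a logit link and maximises it with Newton-Raphson for one predictor plus an intercept. It is not the tutorials' own code; the data and function names are hypothetical, and it uses only the standard library.

```python
import math


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood under a logit link."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - math.log1p(math.exp(eta))
    return total


def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for intercept b0 and slope b1, starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # gradient (score) components
        h00 = h01 = h11 = 0.0   # entries of X'WX (negative Hessian)
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (X'WX) step = gradient by hand.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical toy data with overlapping classes, so the MLE is finite.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0_hat, b1_hat = fit_logistic_newton(x, y)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
print(f"log-likelihood at fit = {log_likelihood(b0_hat, b1_hat, x, y):.3f}")
```

In practice you would call a library routine such as R's glm() rather than solve the system by hand; the point here is only to show that the fit is an iterative search for the top of the log-likelihood.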

Optimisation Visualised

How do algorithms find the best parameters? Our interactive visualisations show you how MLE algorithms navigate parameter space—from simple 1D problems to the high-dimensional challenges faced by modern AI.

1D (curve) → 2D (surface) → 3D (volume) → 4D+ (projections only)

Watch gradient descent, Newton-Raphson, and analytic solutions in action. See why we need algorithms when visualisation fails.
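The three approaches can be compared on a 1D problem. This sketch estimates a Poisson log-rate $\theta = \log\lambda$ from invented counts, where the analytic answer is the log of the sample mean. The step size, tolerances, and function names are assumptions chosen for this example.

```python
import math

# Hypothetical counts, made up for this example.
counts = [2, 0, 3, 1, 4, 2, 2, 5, 1, 0]
n, total = len(counts), sum(counts)


def gradient(theta):
    """Derivative of the Poisson log-likelihood in theta = log(lambda)."""
    return total - n * math.exp(theta)


def hessian(theta):
    """Second derivative; always negative, so the problem is concave."""
    return -n * math.exp(theta)


def gradient_ascent(theta=0.0, step=0.01, tol=1e-8, max_iter=10_000):
    """Fixed-step ascent on the log-likelihood (descent on its negative)."""
    for i in range(1, max_iter + 1):
        g = gradient(theta)
        if abs(g) < tol:
            return theta, i
        theta += step * g
    return theta, max_iter


def newton_raphson(theta=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson: scale each step by the curvature."""
    for i in range(1, max_iter + 1):
        g = gradient(theta)
        if abs(g) < tol:
            return theta, i
        theta -= g / hessian(theta)
    return theta, max_iter


theta_exact = math.log(total / n)   # analytic solution: log of the sample mean
theta_gd, iters_gd = gradient_ascent()
theta_nr, iters_nr = newton_raphson()

print(f"analytic:        {theta_exact:.6f}")
print(f"gradient ascent: {theta_gd:.6f} after {iters_gd} iterations")
print(f"Newton-Raphson:  {theta_nr:.6f} after {iters_nr} iterations")
```

All three agree on the estimate; the difference is the amount of work, with the curvature-aware Newton-Raphson update needing far fewer iterations than the fixed-step update on this problem.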

Explore Optimisation →

For the theory behind likelihood-based inference, see Likelihood and Simulation Theory on JonStats.

Alpha Version - We'd Love Your Feedback!

This tutorial series is in active development. If you encounter any issues or have ideas for improvements, please let us know through GitHub:

Report a Bug
Suggest a Feature

Requires a GitHub account. Your feedback helps improve these tutorials for everyone.