GLM Tutorials: Matching Models to Data

Why Does the Model Choice Matter?

Many introductory statistics courses teach only one model: ordinary least squares regression (R's lm(), scikit-learn's LinearRegression() in Python). This works well for continuous outcomes with roughly normal errors. But what about...

Binary outcomes (yes/no, survived/died, clicked/didn't click)?
Count data (number of accidents, website visits, species observed)?
Positive continuous data (insurance claims, waiting times, costs)?

Using the wrong model isn't just statistically incorrect: it can give you impossible predictions (negative probabilities, negative counts) and misleading inferences (wrong standard errors, invalid confidence intervals).

GLMs are about making your model compatible with your data's generating process, not forcing everything into an inappropriate framework because it's the only one you know.

What Goes Wrong Without GLMs?

Data Type	Example Outcome	OLS Prediction Problem	GLM Solution
Binary (0/1)	Heart disease (yes/no)	Can predict P = 1.3 or P = -0.2	Logistic: bounds to (0,1)
Count	Bike rentals per day	Can predict -50 rentals	Poisson/NegBin: non-negative integers
Positive continuous	Insurance claim amount	Can predict negative costs	Gamma: strictly positive
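
The first row of the table can be reproduced with a few lines of code. The sketch below is illustrative only: the ten data points are made up (they are not the UCI data), and `fit_ols` and `inv_logit` are hypothetical helper names. It fits a straight line to a 0/1 outcome and then shows that a logistic inverse link keeps values inside (0, 1).

```python
import math

# Hypothetical toy data (not from the UCI dataset): x is a risk score,
# y is a 0/1 outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inv_logit(eta):
    """Logistic inverse link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


intercept, slope = fit_ols(x, y)

# OLS fitted values at the smallest and largest observed x.
ols_at_min = intercept + slope * min(x)
ols_at_max = intercept + slope * max(x)
print(f"OLS 'probability' at x = {min(x)}: {ols_at_min:.3f}")
print(f"OLS 'probability' at x = {max(x)}: {ols_at_max:.3f}")

# The inverse logit stays strictly between 0 and 1 over this range of eta.
squeezed = [inv_logit(eta) for eta in range(-20, 21)]
print(min(squeezed) > 0 and max(squeezed) < 1)
```

On this toy data the straight line already leaves the unit interval inside the observed range of x (roughly -0.13 at x = 0 and 1.13 at x = 9), which is the problem the table describes.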

The GLM Framework: $X \to \eta = X\beta \to \mu = g^{-1}(\eta) \to Y \sim f(\mu, \alpha)$

The Two-Part GLM Structure (after King, Tomz & Wittenberg, 2000):

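Stochastic $Y_i \sim f(\theta_i, \alpha)$: the random component (distribution family + dispersion)

Systematic $\theta_i = g(X_i, \beta)$: the deterministic component (predictors + link). In this notation $g$ maps the predictors to the parameter $\theta_i$, so it plays the role of the inverse link $g^{-1}$ in the framework line above.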

Every model in this series, from linear regression to logistic regression, fits this two-part structure. The tutorials help you choose appropriate $f(\cdot)$ and $g(\cdot)$ for your data.
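
To make the two components concrete, here is a minimal sketch that simulates one observation from a linear model and one from a logistic model using the same predictors and coefficients. Everything in it is an assumption for illustration: the helper names (`linear_predictor`, `simulate`, and so on), the coefficient values, and the fixed standard deviation of 1 for the normal case. Only the inverse link and the distribution change between the two models.

```python
import math
import random


def linear_predictor(x_row, beta):
    """Systematic part: eta = x . beta."""
    return sum(xj * bj for xj, bj in zip(x_row, beta))


# Inverse links: map the linear predictor eta to the mean mu.
def identity(eta):
    return eta


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Stochastic part: draw Y given its mean mu.
def draw_normal(mu, rng, sigma=1.0):
    return rng.gauss(mu, sigma)


def draw_bernoulli(mu, rng):
    return 1 if rng.random() < mu else 0


def simulate(x_row, beta, inverse_link, draw, rng):
    """Follow X -> eta -> mu -> Y for a single observation."""
    eta = linear_predictor(x_row, beta)
    mu = inverse_link(eta)
    return mu, draw(mu, rng)


rng = random.Random(0)
x_row = [1.0, 2.0]    # the leading 1.0 is the intercept column
beta = [0.5, -0.25]   # hypothetical coefficients

mu_lin, y_lin = simulate(x_row, beta, identity, draw_normal, rng)
mu_log, y_log = simulate(x_row, beta, inv_logit, draw_bernoulli, rng)
print("linear:  ", mu_lin, y_lin)
print("logistic:", mu_log, y_log)
```

Swapping in a different inverse link and sampler is all it takes to move between families in this sketch, which is the sense in which the models share one structure.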

Tutorial Series

Each tutorial presents a real dataset and decision problem. Your task is to figure out the appropriate model.

1. Heart Rate Prediction

Which GLM family?

The Decision Problem

Predict a patient's maximum heart rate during exercise from their characteristics.

Dataset

UCI Heart Disease (Cleveland)

Your challenge: What type of outcome is heart rate? What constraints does it have? Which GLM family fits best?

2. Heart Disease Classification

Which GLM family?

The Decision Problem

Classify whether a patient has heart disease based on diagnostic measurements.

Dataset

UCI Heart Disease (Cleveland)

Your challenge: The outcome is yes/no. What link function maps probabilities to a linear predictor?

3. Bike Rental Demand

Which GLM family?

The Decision Problem

Predict daily bike rental demand from weather and calendar variables.

Dataset

UCI Bike Sharing Dataset

Your challenge: Rentals are counts (0, 1, 2, ...). What distribution models count data? What might go wrong?

4. Handling Overdispersion

Which GLM family?

The Decision Problem

The model from Tutorial 3 has a problem. Can you diagnose and fix it?

Dataset

UCI Bike Sharing (revisited)

Your challenge: When the variance exceeds the mean (overdispersion), the model's standard errors are too small, so tests and confidence intervals become unreliable. What's the solution?
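
As a first diagnostic, one rough check is to compare the sample variance of the counts with their mean. The sketch below uses made-up daily counts (not the UCI data) and ignores predictors entirely, so treat it as a quick screen rather than a formal test; a fuller check would look at the residuals of the fitted model.

```python
from statistics import mean, variance

# Hypothetical daily counts, invented for illustration (not the UCI data).
counts = [12, 45, 3, 60, 22, 8, 95, 30, 17, 51]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)

# For a Poisson variable the variance equals the mean, so a ratio well
# above 1 hints at overdispersion.
dispersion_ratio = v / m
print(f"mean = {m:.1f}, variance = {v:.1f}, ratio = {dispersion_ratio:.1f}")
```

A ratio far above 1, as in this toy sample, is the pattern this tutorial asks you to diagnose.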


5. Blood Pressure Prediction

Which GLM family?

The Decision Problem

Predict resting blood pressure—a strictly positive, continuous outcome.

Dataset

UCI Heart Disease (revisited)

Your challenge: Blood pressure can't be negative. What GLM family handles positive continuous data?

What You'll Learn

Each tutorial walks through the same 6-step process:

Systematic Component — Choose your response and predictors
Link Function — Connect the linear predictor to the mean
Distribution — Choose the appropriate probability distribution
Fitting Method — Understand how parameters are estimated
Implementation — Code it in R and Python
Advanced — Derive the log-likelihood and fit from scratch
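
As an illustration of what the last step involves, here is the log-likelihood for one common case, a Poisson model with a log link. This is a sketch of the standard derivation under the assumption $Y_i \sim \text{Poisson}(\mu_i)$ with $\mu_i = \exp(X_i \beta)$; the individual tutorials may present it differently.

```latex
% Poisson probability mass function for one observation
P(Y_i = y_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!},
\qquad \mu_i = \exp(X_i \beta)

% Log-likelihood: sum the log of the mass function over observations
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i X_i \beta - \exp(X_i \beta) - \log(y_i!) \right]

% Score (gradient), set to zero to find the maximum likelihood estimate
\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} \left( y_i - \mu_i \right) X_i^{\top} = 0
```

The score equation has no closed-form solution for $\beta$ in general, which is why fitting from scratch relies on an iterative algorithm.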

Optimisation Visualised

How do algorithms find the best parameters? Our interactive visualisations show you how MLE algorithms navigate parameter space—from simple 1D problems to the high-dimensional challenges faced by modern AI.

1D (curve) → 2D (surface) → 3D (volume) → 4D+ (projections only)

Watch gradient descent, Newton-Raphson, and analytic solutions in action. See why we need algorithms when visualisation fails.
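
The three approaches can be compared on a 1D problem where the answer is known. The sketch below estimates a single Poisson rate from made-up counts: the analytic MLE is the sample mean, and both iterative methods should converge to it. The data, starting value, learning rate, and tolerance are all assumptions chosen for illustration. The code climbs the log-likelihood (gradient ascent), which is equivalent to gradient descent on the negative log-likelihood.

```python
# Hypothetical counts; for a Poisson rate the analytic MLE is the sample mean.
data = [2, 4, 3, 5, 1, 3]
total, n = sum(data), len(data)
analytic = total / n


def gradient(lam):
    """First derivative of the Poisson log-likelihood in lambda."""
    return total / lam - n


def hessian(lam):
    """Second derivative of the Poisson log-likelihood in lambda."""
    return -total / lam ** 2


def gradient_ascent(lam, lr=0.05, tol=1e-8, max_iter=10_000):
    """Take small fixed-size steps uphill until the gradient is near zero."""
    for step in range(max_iter):
        g = gradient(lam)
        if abs(g) < tol:
            return lam, step
        lam += lr * g
    return lam, max_iter


def newton_raphson(lam, tol=1e-8, max_iter=100):
    """Use curvature to choose the step size at each iteration."""
    for step in range(max_iter):
        g = gradient(lam)
        if abs(g) < tol:
            return lam, step
        lam -= g / hessian(lam)
    return lam, max_iter


gd_estimate, gd_steps = gradient_ascent(1.0)
nr_estimate, nr_steps = newton_raphson(1.0)
print(f"analytic:        {analytic}")
print(f"gradient ascent: {gd_estimate:.6f} after {gd_steps} steps")
print(f"Newton-Raphson:  {nr_estimate:.6f} after {nr_steps} steps")
```

On this problem Newton-Raphson needs far fewer iterations than fixed-step gradient ascent, because it uses the curvature of the log-likelihood to scale each step.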

Alpha Version - We'd Love Your Feedback!