Tutorial: The Link Function

Choose how to connect your predictors to the response

1 Systematic Component
2 Link Function
3 Distribution

Your model so far

MaxHeartRate = g(?)(1, Age, ExerciseAngina, STDepression)

The link function g() determines how the linear combination of predictors relates to the expected response.
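As a rough illustration of what g() does, the sketch below pairs four common link functions with their inverses. It assumes the usual GLM convention that g maps the expected response to the linear predictor and that its inverse maps back. The `LINKS` table and the `round_trip` helper are names made up for this example; they are not part of the tutorial.

```python
import math

# Each entry maps a link name to (g, g_inverse).
# g takes the expected response mu to the linear predictor eta;
# g_inverse takes eta back to mu.
# LINKS and round_trip are hypothetical names used only for this sketch.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def round_trip(name, mu):
    """Apply the link and then its inverse; the result should be mu again."""
    g, g_inverse = LINKS[name]
    return g_inverse(g(mu))


# A heart-rate-like value for three links, a probability for the logit.
for name, mu in [("identity", 150.0), ("log", 150.0),
                 ("logit", 0.3), ("inverse", 150.0)]:
    print(name, round_trip(name, mu))
```

Note that the logit entry only accepts values strictly between 0 and 1, and the log and inverse entries need a positive, non-zero mean.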

Choose the Link Function

For predicting maximum heart rate (a continuous value typically 60-200 bpm), which link function is most appropriate?

Link Function Selected

With the identity link, your model equation becomes:

E[MaxHeartRate] = β0 + β1·Age + β2·ExerciseAngina + β3·STDepression

The identity link means the predicted heart rate equals the linear combination directly, with no transformation needed. This is the natural choice for continuous outcomes that can take any value.

✔ Good Choice!

The identity link is a sensible choice for predicting maximum heart rate.

Since heart rate is a continuous measure and we expect predictors to have additive effects, the identity link gives us the most interpretable model.

What this means: A one-year increase in age changes predicted heart rate by β1 bpm, holding the other predictors fixed. That makes the coefficient easy to interpret.

A small caveat: The identity link allows predictions of any value, including negative heart rates (which are impossible). If your model predicts negative or otherwise implausible values, consider a link that enforces positivity, such as the log link. Within typical data ranges, this usually isn't a problem.
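The additive interpretation and the caveat can both be seen in a small sketch. The coefficient values below are invented purely for illustration and are not fitted to any data; `BETA` and `predict_identity` are hypothetical names.

```python
# Hypothetical coefficients, chosen only for illustration (not fitted to data).
BETA = {
    "intercept": 205.0,
    "age": -1.0,
    "exercise_angina": -15.0,
    "st_depression": -5.0,
}


def predict_identity(age, exercise_angina, st_depression):
    """Expected max heart rate under the identity link.

    With the identity link the prediction is the linear predictor itself.
    """
    return (BETA["intercept"]
            + BETA["age"] * age
            + BETA["exercise_angina"] * exercise_angina
            + BETA["st_depression"] * st_depression)


# The effect of one extra year is the same at every age: it is the age coefficient.
step_at_40 = predict_identity(41, 0, 1.0) - predict_identity(40, 0, 1.0)
step_at_70 = predict_identity(71, 0, 1.0) - predict_identity(70, 0, 1.0)

# Far outside any realistic data range, nothing stops the prediction going negative.
extreme = predict_identity(250, 1, 6.0)

print(step_at_40, step_at_70, extreme)
```

The negative value only appears for inputs well beyond anything a real dataset would contain, which is why the caveat rarely matters in practice.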

⚠ Not a Bad Idea...

The log link isn't unreasonable here - heart rate is always positive, so the log link would guarantee positive predictions.

However, consider:
  • We don't typically expect multiplicative effects of age on heart rate
  • Coefficients become harder to interpret (each one scales predicted heart rate by a factor of exp(β), roughly a percentage change, rather than adding bpm)
  • The identity link is simpler and often sufficient

The log link is more natural when the response must be positive (like counts or concentrations) and when effects are genuinely multiplicative.
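To see what "multiplicative" means here, the sketch below uses a log link with two made-up coefficients (`B0` and `B_AGE` are hypothetical and not fitted to any data). Each extra year multiplies the prediction by the same factor, so the change in bpm depends on the starting age.

```python
import math

# Hypothetical coefficients on the log scale (illustration only, not fitted).
B0 = math.log(200.0)
B_AGE = -0.005


def predict_log(age):
    """Expected max heart rate under the log link: exp of the linear predictor."""
    return math.exp(B0 + B_AGE * age)


# The ratio between consecutive ages is constant and equals exp(B_AGE).
ratio_40 = predict_log(41) / predict_log(40)
ratio_70 = predict_log(71) / predict_log(70)

# The difference in bpm is not constant: it shrinks as the prediction shrinks.
drop_40 = predict_log(41) - predict_log(40)
drop_70 = predict_log(71) - predict_log(70)

print(ratio_40, ratio_70)
print(drop_40, drop_70)
```

Because the prediction is an exponential, it stays positive for any input, which is the guarantee the log link offers over the identity link.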

For this tutorial, try the identity link instead. In free explore mode, you could compare both approaches!

❌ Not Appropriate Here

The logit link is designed for modelling probabilities - values that must be between 0 and 1.

Maximum heart rate is measured in bpm (typically 60-200), not as a probability. Because the inverse of the logit maps every linear combination to a value between 0 and 1, the model could never predict a realistic heart rate.

Logit is the right choice for:
  • Binary outcomes (yes/no, survived/died)
  • Proportions (percentage of successes)
  • Any response bounded between 0 and 1
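A quick sketch shows why the logit cannot work for bpm: whatever the linear combination of predictors is, the inverse logit squeezes it into the interval between 0 and 1. The `inverse_logit` helper and the sample values are made up for this example.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor eta to a mean strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


# A spread of linear-predictor values, from strongly negative to strongly positive.
predictions = [inverse_logit(eta) for eta in (-20.0, -2.0, 0.0, 2.0, 20.0)]

for p in predictions:
    print(p)
```

None of these predictions can come anywhere near the 60-200 bpm range, no matter which coefficients are chosen.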

❌ Not Appropriate Here

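The inverse link models the reciprocal of the expected response as the linear combination of predictors, so 1/E[MaxHeartRate] = β0 + β1·Age + β2·ExerciseAngina + β3·STDepression. As the linear combination increases, the predicted response falls along a hyperbolic curve rather than by a constant amount.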

This is the canonical link for Gamma distributions, typically used for modelling waiting times, rates, or other strictly positive continuous data with right-skewed distributions.

The inverse link is natural for:
  • Time-to-event data (waiting times)
  • Insurance claim amounts
  • Other right-skewed positive continuous data

For heart rate, we expect more straightforward additive effects, not reciprocal ones.
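The contrast with additive effects can be sketched with an inverse link and two invented coefficients (`B0` and `B_AGE` are hypothetical, not fitted to any data). Equal steps in age produce unequal steps in the predicted mean.

```python
# Hypothetical coefficients on the reciprocal scale (illustration only, not fitted).
B0 = 0.005
B_AGE = 0.0001


def predict_inverse(age):
    """Expected response under the inverse link: reciprocal of the linear predictor."""
    return 1.0 / (B0 + B_AGE * age)


# Two equal 20-year steps in age give different changes in the prediction.
drop_20_to_40 = predict_inverse(20) - predict_inverse(40)
drop_40_to_60 = predict_inverse(40) - predict_inverse(60)

print(drop_20_to_40, drop_40_to_60)
```

Under the identity link both drops would be identical, which is the additive behaviour the tutorial expects for heart rate.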