Tutorial: The Link Function

Choose how to connect your predictors to the response

✓ 1 Systematic Component
2 Link Function
3 Distribution

Your model so far

g(E[MaxHeartRate]) = β₀ + β₁·Age + β₂·ExerciseAngina + β₃·STDepression, with the link g still to be chosen

The link function g() determines how the linear combination of predictors relates to the expected response.

Choose the Link Function

For predicting maximum heart rate (a continuous value typically 60-200 bpm), which link function is most appropriate?

Identity

g(μ) = μ

The simplest link: predictions are directly on the response scale.

Use when: Response can be any real number (positive, negative, or zero)

Log

g(μ) = ln(μ)

Ensures predictions are always positive. Models multiplicative effects.

Use when: Mean response must be strictly positive (counts, concentrations)

Logit

g(μ) = ln(μ/(1-μ))

Maps probabilities (0-1) to the real line. The "log-odds" transformation.

Use when: Response is a probability or proportion

Inverse

g(μ) = 1/μ

Creates a reciprocal relationship between the linear predictor and the expected response.

Use when: Modelling rates or times (Gamma regression)
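
As a rough illustration of the four cards above, each link can be written as a pair of functions: g, which maps the mean μ to the linear-predictor scale, and its inverse, which maps back. The dictionary name `LINKS` and the helper `round_trip` are made up for this sketch and are not part of the tutorial.

```python
import math

# Each entry maps a link name to (g, g_inverse).
# g takes the mean mu to the linear-predictor scale; g_inverse maps back.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}


def round_trip(name, mu):
    """Apply g and then g_inverse; the result should equal mu up to rounding."""
    g, g_inverse = LINKS[name]
    return g_inverse(g(mu))


# A mean of 150 bpm is valid input for identity, log and inverse,
# but not for logit, which only accepts values strictly between 0 and 1.
for name in ("identity", "log", "inverse"):
    print(name, round_trip(name, 150.0))
print("logit", round_trip("logit", 0.3))
```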

Link Function Selected

With the identity link, your model equation becomes:

E[MaxHeartRate] = β₀ + β₁·Age + β₂·ExerciseAngina + β₃·STDepression

The identity link means predicted heart rate equals the linear combination directly - no transformation needed. This is the natural choice for continuous outcomes that can take any value.
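
A minimal sketch of what this equation computes, assuming made-up coefficient values (they are not fitted to any data and are chosen only so the arithmetic is easy to follow). The function name `expected_max_heart_rate` is likewise hypothetical.

```python
# Hypothetical coefficients for illustration only; a real model would
# estimate these from data.
BETA = {
    "intercept": 200.0,
    "Age": -1.0,
    "ExerciseAngina": -15.0,
    "STDepression": -5.0,
}


def expected_max_heart_rate(age, exercise_angina, st_depression, beta=BETA):
    """Identity link: the expected response is the linear predictor itself."""
    linear_predictor = (
        beta["intercept"]
        + beta["Age"] * age
        + beta["ExerciseAngina"] * exercise_angina
        + beta["STDepression"] * st_depression
    )
    return linear_predictor


base = expected_max_heart_rate(age=50, exercise_angina=0, st_depression=1.0)
one_year_older = expected_max_heart_rate(age=51, exercise_angina=0, st_depression=1.0)
print(base, one_year_older - base)  # 145.0 -1.0
```

With the other predictors held fixed, the difference between the two predictions equals the Age coefficient exactly, which is what makes identity-link coefficients easy to read.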

✔ Good Choice!

The identity link is a sensible choice for predicting maximum heart rate.

Since heart rate is a continuous measure and we expect predictors to have additive effects, the identity link gives us the most interpretable model.

What this means: With the other predictors held fixed, a one-year increase in age changes predicted heart rate by β₁ bpm directly. Easy to interpret!

A small caveat: The identity link allows predictions of any value, including negative heart rates (which are impossible). In practice, if your model predicts negative values, you might need to reconsider your approach. For typical data ranges, this usually isn't a problem.
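
The caveat can be made concrete with a small sketch. The intercept and slope below are invented for illustration, and `implausible_ages` is a hypothetical helper that flags inputs whose identity-link prediction is not positive.

```python
# Hypothetical single-predictor model: heart rate = 200 - 1.0 * age.
INTERCEPT = 200.0
AGE_SLOPE = -1.0


def identity_prediction(age):
    """Identity link: nothing constrains the prediction to be positive."""
    return INTERCEPT + AGE_SLOPE * age


def implausible_ages(ages):
    """Return the ages whose predicted heart rate is zero or negative."""
    return [age for age in ages if identity_prediction(age) <= 0]


# Realistic ages give sensible predictions; only an absurd
# extrapolation (age 250) produces an impossible value.
flagged = implausible_ages([30, 60, 90, 250])
print(flagged)  # [250]
```

Under these assumed coefficients the problem only appears far outside any realistic age range, which is why the identity link is usually acceptable here.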

⚠ Not a Bad Idea...

The log link isn't unreasonable here - heart rate is always positive, so the log link would guarantee positive predictions.

However, consider:
• We don't typically expect multiplicative effects of age on heart rate
• Coefficients become harder to interpret (percentage changes rather than bpm)
• The identity link is simpler and often sufficient

The log link is more natural when the mean response must be positive (as with counts or concentrations) and when effects are genuinely multiplicative.
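
To see what "multiplicative" means under a log link, here is a short sketch with invented log-scale coefficients (`B0` and `B_AGE` are assumptions, not estimates). Each extra year multiplies the expected heart rate by exp(B_AGE), regardless of the starting age.

```python
import math

# Hypothetical coefficients on the log scale, for illustration only.
B0 = 5.3
B_AGE = -0.005


def mean_heart_rate(age):
    """Log link: ln(mu) = B0 + B_AGE * age, so mu = exp(B0 + B_AGE * age)."""
    return math.exp(B0 + B_AGE * age)


# The ratio between consecutive ages is the same everywhere: exp(B_AGE).
ratio_at_50 = mean_heart_rate(51) / mean_heart_rate(50)
ratio_at_70 = mean_heart_rate(71) / mean_heart_rate(70)

# Expressed as a percentage change per year of age.
percent_change = (math.exp(B_AGE) - 1) * 100
print(ratio_at_50, ratio_at_70, percent_change)
```

With these assumed values each year of age lowers the expected heart rate by roughly 0.5%, so the change in bpm depends on the current level rather than being a fixed amount.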

For this tutorial, try the identity link instead. In free explore mode, you could compare both approaches!

❌ Not Appropriate Here

The logit link is designed for modelling probabilities - values that must be between 0 and 1.

Maximum heart rate is measured in bpm (typically 60-200), not as a probability. The logit is only defined for means strictly between 0 and 1, so applying it to a mean of, say, 150 bpm would require the logarithm of a negative number.
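
A quick sketch of why the logit does not fit a bpm-scale response. The functions `logit` and `inverse_logit` are written out by hand here, and the explicit range check is an addition for this example.

```python
import math


def logit(mu):
    """g(mu) = ln(mu / (1 - mu)); only defined for 0 < mu < 1."""
    if not 0 < mu < 1:
        raise ValueError(f"logit is undefined for mu={mu}; need 0 < mu < 1")
    return math.log(mu / (1 - mu))


def inverse_logit(eta):
    """Map any real linear predictor back to a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-eta))


print(logit(0.5))          # 0.0
print(inverse_logit(5.0))  # close to, but below, 1

try:
    logit(150)  # a heart rate in bpm is outside the logit's domain
except ValueError as error:
    print(error)
```

The inverse direction shows the same mismatch: whatever the linear predictor is, the fitted mean stays between 0 and 1, so it could never reach a value like 150.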

Logit is the right choice for:
• Binary outcomes (yes/no, survived/died)
• Proportions (percentage of successes)
• Any response bounded between 0 and 1

❌ Not Appropriate Here

The inverse link creates a reciprocal relationship: the expected response is the reciprocal of the linear predictor (μ = 1/η), so as the linear predictor increases, the expected response falls along a hyperbola.

This is the canonical link for the Gamma distribution, which is typically used to model waiting times, rates, or other strictly positive, right-skewed continuous data.

Inverse link is natural for:
• Time-to-event data (waiting times)
• Insurance claim amounts
• Other right-skewed positive continuous data

For heart rate, we expect more straightforward additive effects, not reciprocal ones.
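
To illustrate the difference between reciprocal and additive effects, the sketch below applies the inverse link to three equally spaced values of the linear predictor. The values are arbitrary and `inverse_link_mean` is a hypothetical helper name.

```python
def inverse_link_mean(eta):
    """Inverse link: g(mu) = 1 / mu, so mu = 1 / eta (eta must be nonzero)."""
    if eta == 0:
        raise ValueError("the linear predictor must be nonzero")
    return 1.0 / eta


# Equal steps in the linear predictor...
etas = (0.005, 0.006, 0.007)
means = [inverse_link_mean(eta) for eta in etas]

# ...produce unequal, shrinking steps in the expected response.
first_drop = means[0] - means[1]
second_drop = means[1] - means[2]
print(means, first_drop, second_drop)
```

Under the identity link the two drops would be equal; here the first is larger than the second, which is the hyperbolic pattern the inverse link imposes.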