Your model so far
MaxHeartRate ~ f(?) with E[MaxHeartRate] = β0 + β1·Age + β2·ExerciseAngina + β3·STDepression
The distribution f() describes how individual observations scatter around the expected value.
This is the stochastic (random) component of the model.
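As a rough sketch of the two components, the Python snippet below computes the expected value from the linear predictor and then scatters simulated observations around it. The coefficient values, the function names, and the use of a normal distribution as a stand-in for f are all assumptions made for illustration; no distribution has been chosen at this point.

```python
import random
import statistics

# Hypothetical coefficients, chosen only for illustration (not fitted values).
BETA = {"intercept": 200.0, "age": -1.0, "ex_angina": -10.0, "st_dep": -3.0}


def expected_max_hr(age, ex_angina, st_dep):
    """Systematic component: the linear predictor gives E[y]."""
    return (BETA["intercept"]
            + BETA["age"] * age
            + BETA["ex_angina"] * ex_angina
            + BETA["st_dep"] * st_dep)


def simulate(mu, sigma, n, seed=0):
    """Stochastic component: draw n observations scattered around mu.

    A normal distribution is assumed here purely as a placeholder for f.
    """
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


mu = expected_max_hr(age=50, ex_angina=1, st_dep=2.0)
draws = simulate(mu, sigma=15.0, n=5000)
print("expected value:", mu)
print("sample mean:   ", round(statistics.mean(draws), 1))
```

Individual draws differ from mu, but their average settles close to it, which is the sense in which f describes scatter around the expected value.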
Choose the Distribution Family
Maximum heart rate is a continuous measure, typically between 60 and 200 bpm. Which distribution family best describes how observations vary around the expected value?
Gaussian (Normal)
y ~ N(μ, σ²)
The classic bell curve. Observations scatter symmetrically around the mean.
Use when: Continuous data, symmetric residuals, constant variance
Variance: constant (σ²)
Gamma
y ~ Gamma(α, β)
For positive continuous data that's often right-skewed. Variance increases with mean.
Use when: Strictly positive, right-skewed, variance grows with mean
Variance: proportional to μ²
Poisson
y ~ Poisson(λ)
For count data (0, 1, 2, ...). Mean equals variance.
Use when: Counting events, integers only, mean ≈ variance
Variance: equals μ
Inverse Gaussian
y ~ IG(μ, λ)
For highly right-skewed positive continuous data. Variance increases rapidly with mean.
Use when: Positive, highly skewed, variance grows as μ³
Variance: proportional to μ³
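The practical difference between the four families is how the variance responds to the mean. The sketch below encodes the variance functions V(μ) listed on the cards (constant, μ, μ², μ³) and reports how much the variance grows when the mean doubles. The dictionary and function names are illustrative assumptions, and the dispersion parameter is left out because it cancels in the ratio.

```python
# Variance functions V(mu) for the four families.
# Var(y) is V(mu) times a dispersion parameter, which cancels in a ratio.
VARIANCE_FUNCTIONS = {
    "gaussian": lambda mu: 1.0,              # constant variance
    "poisson": lambda mu: mu,                # variance equals the mean
    "gamma": lambda mu: mu ** 2,             # variance proportional to mu^2
    "inverse_gaussian": lambda mu: mu ** 3,  # variance proportional to mu^3
}


def variance_ratio(family, mu_low, mu_high):
    """How many times larger the variance is at mu_high than at mu_low."""
    v = VARIANCE_FUNCTIONS[family]
    return v(mu_high) / v(mu_low)


# Compare the variance at a mean of 200 bpm with the variance at 100 bpm.
for family in VARIANCE_FUNCTIONS:
    print(family, variance_ratio(family, 100.0, 200.0))
```

If the scatter of maximum heart rate looks about the same at low and high predicted values, only the first of these variance functions matches that pattern.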
Model Complete!
You've specified all three components of your GLM:
MaxHeartRate ~ Normal(μ, σ²)
where E[MaxHeartRate] = μ = β0 + β1·Age + β2·ExerciseAngina + β3·STDepression
Random component: Normal
Systematic component: 4 terms (incl. intercept)
Link function: identity (μ equals the linear predictor)
This model is equivalent to ordinary least squares (OLS) linear regression, the foundation of statistical modelling:
a Gaussian GLM with an identity link yields the same coefficient estimates as OLS.
The GLM framework shows that this familiar model is just one special case of a much broader family.
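The link to OLS can be checked numerically. The sketch below is a simplified illustration under stated assumptions: it uses a single predictor (Age) instead of all three, a handful of toy data points invented for this example, and a fixed σ of 5. It computes the closed-form least-squares estimates and then confirms that moving the coefficients away from them lowers the Gaussian log-likelihood.

```python
import math

# Toy data, invented purely for illustration (not from any real dataset).
age = [35.0, 45.0, 50.0, 60.0, 70.0]
max_hr = [182.0, 170.0, 168.0, 150.0, 141.0]


def ols_fit(x, y):
    """Closed-form least-squares estimates for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def gaussian_loglik(b0, b1, sigma, x, y):
    """Log-likelihood of y ~ N(b0 + b1 * x, sigma^2), identity link."""
    n = len(x)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)


SIGMA = 5.0  # assumed fixed; the maximising coefficients do not depend on it
b0, b1 = ols_fit(age, max_hr)
best = gaussian_loglik(b0, b1, SIGMA, age, max_hr)

# Log-likelihood at coefficients nudged away from the OLS solution.
shifts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.05), (0.0, -0.05)]
perturbed = [gaussian_loglik(b0 + d0, b1 + d1, SIGMA, age, max_hr)
             for d0, d1 in shifts]

print("log-likelihood at OLS estimates:", best)
print("all perturbed values lower:", all(p < best for p in perturbed))
```

For a fixed σ the Gaussian log-likelihood is a decreasing function of the residual sum of squares, so the coefficients that maximise the likelihood are the ones that minimise squared error.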