The distribution f() describes the probability of observing each outcome (0 or 1). This is the stochastic (random) component of the model.
Our response is either 0 (no disease) or 1 (disease). This isn't continuous data that varies around a mean - it's a binary outcome where we model the probability of "success" (disease = 1).
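As a minimal sketch of this random component, the snippet below writes the probability of each outcome as a small function and simulates a few patients. The names `outcome_probability` and `simulate_outcomes`, and the value 0.3 for the disease probability, are illustrative assumptions and do not come from the dataset.

```python
import random


def outcome_probability(y, p):
    """Probability of observing y (0 or 1) when P(y = 1) = p."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p if y == 1 else 1.0 - p


def simulate_outcomes(p, n, seed=0):
    """Draw n independent 0/1 outcomes, each equal to 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


# Hypothetical disease probability, chosen only for illustration.
p_disease = 0.3

print(outcome_probability(1, p_disease))  # probability of disease
print(outcome_probability(0, p_disease))  # probability of no disease
print(simulate_outcomes(p_disease, 10))   # ten simulated 0/1 responses
```

Every simulated response is exactly 0 or 1. Only the long-run proportion of 1s approaches the probability being modelled, which is why the model targets that probability and not a mean with symmetric noise around it.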
For modelling heart disease presence - a binary outcome (0 or 1) - which distribution family describes this type of data?
You've specified all three components of your GLM:
$\text{HeartDisease} \sim \text{Binomial}(1, p)$
where $\text{logit}(p) = \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Sex} + \beta_3 \cdot \text{CP} + \beta_4 \cdot \text{MaxHR} + \beta_5 \cdot \text{STDep}$
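This combination - Binomial distribution + logit link - is logistic regression, the workhorse of binary classification. The logit is the "canonical" link for the Binomial family: the log-odds $\log\frac{p}{1-p}$ is the distribution's natural parameter, so the linear predictor models it directly. That is what makes this a natural and mathematically elegant pairing.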
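To make the link function concrete, here is a rough sketch of how a linear predictor on the log-odds scale is turned back into a probability. The coefficient values and the example patient are hypothetical, picked only to illustrate the arithmetic; they are not fitted estimates. Treating CP as a single numeric predictor simply mirrors the formula above.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse of the logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(beta, x):
    """beta_0 plus the sum of beta_j * x_j over the predictors in x."""
    return beta["Intercept"] + sum(beta[name] * value for name, value in x.items())


# Hypothetical coefficients (NOT estimates from real data).
beta = {"Intercept": -1.0, "Age": 0.04, "Sex": 1.2,
        "CP": 0.6, "MaxHR": -0.02, "STDep": 0.5}

# Hypothetical patient.
patient = {"Age": 55, "Sex": 1, "CP": 2, "MaxHR": 140, "STDep": 1.0}

eta = linear_predictor(beta, patient)  # value on the log-odds scale
p = inv_logit(eta)                     # probability of heart disease

print(f"log-odds = {eta:.3f}, probability = {p:.3f}")
```

The linear predictor can take any real value, while `inv_logit` always returns a number strictly between 0 and 1, so the modelled probability stays valid whatever the predictor values are.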