The distribution f() describes how the observed counts vary around the expected count. This is the stochastic (random) component of the model.
Our response is a count: 0, 1, 2, 3, ... (non-negative integers). Unlike continuous data, counts are discrete and can't be negative. The distribution we choose should respect these constraints.
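To make these constraints concrete, here is a minimal sketch of what goes wrong when a continuous, unbounded distribution is used for a small count. The Normal distribution and the expected count of 2 are illustrative assumptions, not part of the bike rental model; the variance is set equal to the mean only to keep the comparison simple.

```python
from statistics import NormalDist

# Hypothetical quiet hour with an expected count of 2 rentals (illustrative value).
mean_count = 2.0

# A Normal distribution with the same mean; variance set equal to the mean
# purely for illustration.
normal = NormalDist(mu=mean_count, sigma=mean_count ** 0.5)

# Probability the Normal assigns to impossible negative "counts".
p_negative = normal.cdf(0.0)

# Probability the Normal assigns to values between 1 and 2; every value
# strictly inside this interval is a non-integer, which a count can never be.
p_between_1_and_2 = normal.cdf(2.0) - normal.cdf(1.0)

print(f"P(value < 0)      = {p_negative:.3f}")
print(f"P(1 < value < 2)  = {p_between_1_and_2:.3f}")
```

Under these assumptions the Normal places roughly 8% of its probability below zero, and it spreads the rest over non-integer values. A distribution defined only on 0, 1, 2, 3, ... avoids both problems.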
Bike rental counts are non-negative integers (0, 1, 2, ..., up to thousands). Which distribution family describes this type of data?
You've specified all three components of your GLM:
$\text{RentalCount} \sim \text{Poisson}(\mu)$
where $\ln(\mu) = \beta_0 + \beta_1 \cdot \text{Temp} + \beta_2 \cdot \text{Hum} + \beta_3 \cdot \text{Wind} + \beta_4 \cdot \text{Work} + \beta_5 \cdot \text{Weather}$
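This combination of a Poisson distribution and a log link is Poisson regression, the standard starting point for count data. The log is the "canonical" link for the Poisson family, and because $\mu$ is obtained by exponentiating the linear predictor, the expected count is always positive.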
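The model above can be sketched in a few lines of plain Python to show how the log link turns a linear predictor into an expected count. The coefficient values and the example day are hypothetical, chosen only for illustration; they are not fitted estimates from the bike rental data, and `Weather` is treated as a single numeric predictor because the formula gives it one coefficient.

```python
import math

# Hypothetical coefficients (beta_0 ... beta_5), NOT fitted values.
COEFS = {
    "Intercept": 4.0,
    "Temp": 0.05,
    "Hum": -0.01,
    "Wind": -0.02,
    "Work": 0.10,
    "Weather": -0.30,
}


def expected_count(row):
    """Return mu = exp(beta_0 + sum(beta_j * x_j)) for one observation."""
    eta = COEFS["Intercept"] + sum(COEFS[name] * value for name, value in row.items())
    return math.exp(eta)  # inverse of the log link; always positive


def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


# Hypothetical predictor values for one day.
day = {"Temp": 20.0, "Hum": 60.0, "Wind": 10.0, "Work": 1.0, "Weather": 1.0}
warmer_day = {**day, "Temp": 21.0}

mu = expected_count(day)
mu_warmer = expected_count(warmer_day)

# With a log link, a one-unit increase in Temp multiplies the expected
# count by exp(beta_1), regardless of the other predictor values.
rate_ratio = mu_warmer / mu

print(f"expected count: {mu:.1f}")
print(f"rate ratio for +1 Temp: {rate_ratio:.4f}")
print(f"P(count = 50): {poisson_pmf(50, mu):.4f}")
```

The design point is that coefficients act multiplicatively on the count scale: `rate_ratio` equals `math.exp(COEFS["Temp"])`, so exponentiated coefficients can be read as rate ratios.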
A key Poisson assumption is that the variance equals the mean (equidispersion): $\text{Var}(\text{RentalCount}) = \mu$. In practice, many real datasets show overdispersion, where the variance is greater than the mean. We'll check for this after fitting; if it is present, Tutorial 4 shows how to handle it with the Negative Binomial distribution.
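One common way to check the mean-variance assumption after fitting is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below is a minimal version in plain Python; the observed counts, fitted means, and parameter count are made-up values for illustration, and the tutorial's own check may use a different method.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values near 1 are consistent with the Poisson assumption;
    values well above 1 suggest overdispersion.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    # Under a Poisson model the variance of each observation equals its mean,
    # so each squared residual is scaled by the fitted mean.
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / dof


# Made-up counts and fitted means, for illustration only.
observed = [12, 30, 7, 55, 21, 3, 40, 18]
fitted = [20.0, 22.0, 15.0, 35.0, 25.0, 10.0, 30.0, 19.0]

dispersion = pearson_dispersion(observed, fitted, n_params=2)
print(f"Pearson dispersion: {dispersion:.2f}")
```

For these illustrative numbers the statistic comes out well above 1, the pattern that would point towards a Negative Binomial model. The cutoff for "well above 1" is a rule of thumb, not a formal test.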