Fit the Poisson regression model to predict bike rental counts
Now we'll implement our Poisson GLM using real code. Both R and Python have excellent GLM support, and the results will be identical (within rounding).
Our model: $\text{Count} \sim \text{Poisson}(\mu)$ with $\ln(\mu) = \beta_0 + \beta_1 \cdot \text{temp} + \beta_2 \cdot \text{hum} + \beta_3 \cdot \text{windspeed} + \beta_4 \cdot \text{workingday} + \beta_5 \cdot \text{weathersit}$
# Load the UCI Bike Sharing dataset (daily aggregates)
bike <- read.csv("day.csv")
# Quick look at the data
dim(bike)
# [1] 731 16
summary(bike$cnt)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 22 3152 4548 4504 5956 8714
# Fit Poisson regression with log link (the default for Poisson)
fit <- glm(cnt ~ temp + hum + windspeed + workingday + weathersit,
family = poisson(link = "log"),
data = bike)
summary(fit)
Call:
glm(formula = cnt ~ temp + hum + windspeed + workingday + weathersit,
family = poisson(link = "log"), data = bike)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.239119 0.003845 2142.57 <2e-16 ***
temp 1.397084 0.003179 439.44 <2e-16 ***
hum -0.356839 0.005271 -67.69 <2e-16 ***
windspeed -0.967277 0.008000 -120.91 <2e-16 ***
workingday 0.038946 0.001202 32.39 <2e-16 ***
weathersit -0.129765 0.001380 -94.02 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 668801 on 730 degrees of freedom
Residual deviance: 380005 on 725 degrees of freedom
AIC: 387415
Number of Fisher Scoring iterations: 4
# For Poisson regression, exp(coefficient) gives the rate ratio
exp(coef(fit))
# (Intercept) temp hum windspeed workingday weathersit
# 3.779e+03 4.043e+00 7.001e-01 3.800e-01 1.040e+00 8.783e-01
# With 95% confidence intervals
exp(confint(fit))
# Key diagnostic: residual deviance should be close to df
# Ratio >> 1 indicates overdispersion
fit$deviance / fit$df.residual
# [1] 524.1
# Severe overdispersion detected!
# Residual deviance: 380,005
# Degrees of freedom: 725
# Ratio: 524.1 (should be ~1 for Poisson to be appropriate)
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
# Load the UCI Bike Sharing dataset
bike = pd.read_csv("day.csv")
# Quick look at the data
print(bike.shape)
# (731, 16)
print(bike['cnt'].describe())
# count 731.000000
# mean 4504.348837
# std 1937.211452
# min 22.000000
# 25% 3152.000000
# 50% 4548.000000
# 75% 5956.000000
# max 8714.000000
# Fit Poisson regression with log link
fit = smf.glm('cnt ~ temp + hum + windspeed + workingday + weathersit',
data=bike,
family=sm.families.Poisson()).fit()
print(fit.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: cnt No. Observations: 731
Model: GLM Df Residuals: 725
Model Family: Poisson Df Model: 5
Link Function: Log Scale: 1.0000
Method: IRLS Log-Likelihood: -1.9370e+05
Date: Wed, 11 Dec 2025 Deviance: 3.8000e+05
Time: 16:58:06 Pearson chi2: 3.68e+05
No. Iterations: 5 Pseudo R-squ. (CS): 1.000
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 8.2391 0.004 2142.559 0.000 8.232 8.247
temp 1.3971 0.003 439.442 0.000 1.391 1.403
hum -0.3568 0.005 -67.693 0.000 -0.367 -0.347
windspeed -0.9673 0.008 -120.911 0.000 -0.983 -0.952
workingday 0.0389 0.001 32.390 0.000 0.037 0.041
weathersit -0.1298 0.001 -94.024 0.000 -0.132 -0.127
==============================================================================
# For Poisson regression, exp(coefficient) gives the rate ratio
print("Rate Ratios:")
print(np.exp(fit.params))
# Intercept 3779.086
# temp 4.043
# hum 0.700
# windspeed 0.380
# workingday 1.040
# weathersit 0.878
# With 95% confidence intervals
print("\nRate Ratios with 95% CI:")
print(np.exp(fit.conf_int()))
# Key diagnostic: deviance / df_residual should be ~1
overdispersion_ratio = fit.deviance / fit.df_resid
print(f"Overdispersion ratio: {overdispersion_ratio:.1f}")
# Overdispersion ratio: 524.1
# Severe overdispersion detected!
# Deviance: 380,005
# Degrees of freedom: 725
# Ratio: 524.1 (should be ~1 for Poisson to be appropriate)
The coefficients are on the log scale. To interpret them, we exponentiate to get rate ratios - the multiplicative effect on expected count.
| Predictor | Coefficient ($\beta$) | Rate Ratio ($e^\beta$) | Interpretation |
|---|---|---|---|
| temp | +1.397 | 4.04 | Full temp range (0 to 1) = 4x more rentals |
| hum | -0.357 | 0.70 | Full humidity range = 30% fewer rentals |
| windspeed | -0.967 | 0.38 | Full wind range = 62% fewer rentals |
| workingday | +0.039 | 1.04 | Working days = 4% more rentals |
| weathersit | -0.130 | 0.88 | Each worse weather category = 12% fewer rentals |
Temperature effect: $e^{1.397} = 4.04$ means that going from the coldest to warmest temperature (normalized 0-1) multiplies expected rentals by 4. Temperature is the strongest predictor.
Note: These predictors are normalized (0-1 scale), so the coefficients represent the effect across their full range, not a "one unit" change in the original scale.
The model diagnostics reveal severe overdispersion. The Poisson model assumes Mean = Variance, but our data has variance much greater than the mean.
What does this mean?
The solution? Use the Negative Binomial distribution (Tutorial 4), which has an extra parameter to account for overdispersion.