1 Load the data
heart <- read.csv("heart.csv")
head(heart)
2 Fit the GLM
model <- glm(
thalach ~ age + exang + oldpeak,
data = heart,
family = gaussian(link = "identity")
)
Note: For Gaussian + identity, you could also use lm() which gives identical results.
Using glm() makes the GLM framework explicit and is easily adapted to other families.
3 View results
summary(model)
coef(model)
confint(model)
4 Check assumptions
par(mfrow = c(2, 2))
plot(model)
Expected Output
Call:
glm(formula = thalach ~ age + exang + oldpeak, family = gaussian(link = "identity"),
data = heart)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 203.3660 6.7458 30.147 < 2e-16 ***
age -0.8298 0.1247 -6.657 1.34e-10 ***
exang -14.2542 2.4519 -5.813 1.57e-08 ***
oldpeak -3.7805 1.0091 -3.746 0.000215 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 366.9819)
Null deviance: 169344 on 302 degrees of freedom
Residual deviance: 109727 on 299 degrees of freedom
AIC: 2655.2
1 Import libraries and load data
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
heart = pd.read_csv("heart.csv")
print(heart.head())
2 Fit the GLM
model = smf.glm(
formula="thalach ~ age + exang + oldpeak",
data=heart,
family=sm.families.Gaussian(
link=sm.families.links.Identity()
)
).fit()
Note: For Gaussian + identity, you could also use smf.ols() which gives identical results.
Using glm() makes the GLM framework explicit and is easily adapted to other families.
3 View results
print(model.summary())
print(model.params)
print(model.conf_int())
4 Check assumptions
import matplotlib.pyplot as plt
import scipy.stats as stats
residuals = model.resid_pearson
fitted = model.fittedvalues
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].scatter(fitted, residuals, alpha=0.5)
axes[0, 0].axhline(y=0, color='red', linestyle='--')
axes[0, 0].set_xlabel('Fitted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')
stats.probplot(residuals, plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot')
plt.tight_layout()
plt.show()
Expected Output
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: thalach No. Observations: 303
Model: GLM Df Residuals: 299
Model Family: Gaussian Df Model: 3
Link Function: Identity Scale: 366.98
Method: IRLS Log-Likelihood: -1322.6
Date: Thu, 11 Dec 2025 Deviance: 1.0973e+05
Time: 15:31:43 Pearson chi2: 1.10e+05
No. Iterations: 3 Pseudo R-squ. (CS): 0.3524
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 203.3660 6.746 30.147 0.000 190.144 216.588
age -0.8298 0.125 -6.657 0.000 -1.074 -0.585
exang -14.2542 2.452 -5.813 0.000 -19.060 -9.448
oldpeak -3.7805 1.009 -3.746 0.000 -5.758 -1.803
==============================================================================