
Tutorial 2: Implementation

Fit your logistic regression model in R and Python


Tutorial Complete!

You've successfully specified a logistic regression model. Here's how to fit it in practice.

Your Complete Model

$\text{HeartDisease} \sim \text{Binomial}(1, p)$ where $\text{logit}(p) = \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Sex} + \beta_3 \cdot \text{CP} + \beta_4 \cdot \text{MaxHR} + \beta_5 \cdot \text{STDep}$
Response: Heart Disease (0/1)
Link: Logit
Family: Binomial
Method: IRLS (5 iterations)
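The logit link means the model's linear predictor lives on the log-odds scale; the inverse logit (sigmoid) maps it back to a probability. A minimal, framework-free sketch of that mapping:

```python
import math

def inv_logit(eta):
    """Inverse logit (sigmoid): map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of 0 means even odds: p = 0.5
print(inv_logit(0.0))            # 0.5
# Large positive log-odds push p toward 1
print(round(inv_logit(4.0), 3))  # 0.982
```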

R implementation

1 Load the data

# Load the UCI Heart Disease dataset
heart <- read.csv(
  "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data",
  header = FALSE
)
names(heart) <- c("age", "sex", "cp", "trestbps", "chol", "fbs",
                  "restecg", "thalach", "exang", "oldpeak",
                  "slope", "ca", "thal", "target")

# Binarise target: 0 = no disease, 1 = disease
heart$target <- ifelse(heart$target > 0, 1, 0)

# Preview the data
head(heart)

2 Fit the GLM

# Fit logistic regression using glm()
model <- glm(
  target ~ age + sex + cp + thalach + oldpeak,
  data = heart,
  family = binomial(link = "logit")
)
Note: family = binomial(link = "logit") specifies logistic regression. The logit link is the default for binomial, so you could also write just family = binomial.

3 View results

# Summary of fitted model
summary(model)

# Extract coefficients (log-odds scale)
coef(model)

# Calculate odds ratios
exp(coef(model))

# Confidence intervals for odds ratios
exp(confint(model))

4 Assess model fit

# Deviance and AIC
model$deviance   # Residual deviance
model$null.deviance  # Null deviance
AIC(model)

# Pseudo R-squared (McFadden)
1 - (model$deviance / model$null.deviance)

# Confusion matrix (at threshold 0.5)
predicted <- ifelse(predict(model, type = "response") > 0.5, 1, 0)
table(Actual = heart$target, Predicted = predicted)

Expected Output

Call:
glm(formula = target ~ age + sex + cp + thalach + oldpeak,
    family = binomial(link = "logit"), data = heart)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.165518   2.025288  -1.563  0.11805
age          0.035939   0.018825   1.909  0.05624 .
sex          1.674501   0.350569   4.777 1.78e-06 ***
cp           0.896265   0.170199   5.266 1.39e-07 ***
thalach     -0.024663   0.007989  -3.087  0.00202 **
oldpeak      0.682883   0.152781   4.470 7.83e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 417.98  on 302  degrees of freedom
Residual deviance: 273.13  on 297  degrees of freedom
AIC: 285.13

Number of Fisher Scoring iterations: 5

Python implementation

1 Import libraries and load data

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load the UCI Heart Disease dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs",
        "restecg", "thalach", "exang", "oldpeak",
        "slope", "ca", "thal", "target"]
heart = pd.read_csv(url, header=None, names=cols)

# Binarise target: 0 = no disease, 1 = disease
heart['target'] = (heart['target'] > 0).astype(int)

# Preview the data
print(heart.head())

2 Fit the GLM

# Fit logistic regression using formula interface
model = smf.glm(
    formula="target ~ age + sex + cp + thalach + oldpeak",
    data=heart,
    family=sm.families.Binomial()
).fit()
Note: Binomial() uses the logit link by default. You could explicitly specify Binomial(link=sm.families.links.Logit()).
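To see what the fitted model actually computes, you can reproduce a single prediction by hand. The sketch below uses the (rounded) coefficient estimates shown in the expected output further down; the patient profile (age 55, male, cp = 4, max HR 150, ST depression 1.0) is an arbitrary illustration, not a row from the dataset:

```python
import math

# Coefficient estimates from the fitted model's summary (rounded)
coefs = {"Intercept": -3.1655, "age": 0.0359, "sex": 1.6745,
         "cp": 0.8963, "thalach": -0.0247, "oldpeak": 0.6829}

# Hypothetical patient: 55-year-old male, cp = 4, max HR 150, ST depression 1.0
x = {"age": 55, "sex": 1, "cp": 4, "thalach": 150, "oldpeak": 1.0}

# Linear predictor (log-odds), then inverse logit to get a probability
eta = coefs["Intercept"] + sum(coefs[k] * v for k, v in x.items())
p = 1.0 / (1.0 + math.exp(-eta))
print(f"log-odds = {eta:.3f}, p = {p:.3f}")  # p is roughly 0.74 here
```

This is exactly what `model.predict()` does for every row: compute the linear predictor from the coefficients, then pass it through the inverse link.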

3 View results

# Summary of fitted model
print(model.summary())

# Extract coefficients (log-odds scale)
print(model.params)

# Calculate odds ratios
print(np.exp(model.params))

# Confidence intervals for odds ratios
print(np.exp(model.conf_int()))
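Exponentiating a coefficient turns it into an odds ratio: the multiplicative change in the odds of disease for a one-unit increase in that predictor, holding the others constant. As a worked example using the sex coefficient from the expected output below (an illustration of the arithmetic, not a substitute for running the model):

```python
import math

beta_sex = 1.6745              # log-odds coefficient for sex (male = 1)
odds_ratio = math.exp(beta_sex)
print(f"OR for sex: {odds_ratio:.2f}")  # about 5.34
# An odds ratio around 5.3 means males have roughly 5.3 times the odds
# of heart disease as females, with the other predictors held fixed.
```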

4 Assess model fit

# Deviance and AIC
print(f"Deviance: {model.deviance:.2f}")
print(f"AIC: {model.aic:.2f}")

# Pseudo R-squared (McFadden)
pseudo_r2 = 1 - (model.deviance / model.null_deviance)
print(f"Pseudo R²: {pseudo_r2:.4f}")

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
predicted = (model.predict() > 0.5).astype(int)
print(confusion_matrix(heart['target'], predicted))
print(classification_report(heart['target'], predicted))
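From a confusion matrix you can also compute the headline metrics by hand. The counts below are a hypothetical 2x2 matrix for a 303-row dataset, purely to show the arithmetic; substitute the counts your own run produces:

```python
import numpy as np

# Hypothetical confusion matrix (rows: actual, cols: predicted)
cm = np.array([[140,  24],   # actual 0: true negatives, false positives
               [ 25, 114]])  # actual 1: false negatives, true positives
tn, fp, fn, tp = cm.ravel()

accuracy    = (tp + tn) / cm.sum()  # overall agreement
sensitivity = tp / (tp + fn)        # recall for the disease class
specificity = tn / (tn + fp)        # recall for the no-disease class
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}, "
      f"specificity={specificity:.3f}")
```

Sensitivity and specificity move in opposite directions as you change the 0.5 threshold, which is why a single accuracy number rarely tells the whole story.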

Expected Output

                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                 target   No. Observations:                  303
Model:                            GLM   Df Residuals:                      297
Model Family:                Binomial   Df Model:                            5
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -136.57
Date:                Thu, 11 Dec 2025   Deviance:                       273.13
Time:                        16:37:21   Pearson chi2:                     309.
No. Iterations:                     5   Pseudo R-squ. (CS):             0.3800
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -3.1655      2.025     -1.563      0.118      -7.135       0.804
age            0.0359      0.019      1.909      0.056      -0.001       0.073
sex            1.6745      0.351      4.777      0.000       0.987       2.362
cp             0.8963      0.170      5.266      0.000       0.563       1.230
thalach       -0.0247      0.008     -3.087      0.002      -0.040      -0.009
oldpeak        0.6829      0.153      4.470      0.000       0.383       0.982
==============================================================================

What's Next?