
Tutorial 2: Fitting the Model

How should we estimate the logistic regression parameters?


Your model so far

  • Systematic Component: $\eta = \beta_0 + \beta_1 \cdot \text{Age} + \ldots$
  • Link Function: Logit: $\ln(p/(1-p)) = \eta$
  • Distribution: Binomial

Key Difference from Tutorial 1

Unlike linear regression (Gaussian + identity), logistic regression has no closed-form solution. We cannot simply compute $\beta = (X'X)^{-1}X'y$. The logit link creates a non-linear equation that must be solved iteratively.
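To make the contrast concrete, here is a minimal NumPy sketch (synthetic data, hypothetical variable names) of the one-step closed-form solution that linear regression enjoys and logistic regression lacks:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form OLS: solve the normal equations (X'X) beta = X'y in one step.
# No loop, no convergence check -- there is no logistic analogue of this line.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

For the binomial case, the corresponding likelihood equations are non-linear in $\beta$, so no single `solve` call can produce the estimates.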

How should we find the $\beta$ coefficients?

We have our model structure defined. Now we need to estimate the parameters ($\beta_0, \beta_1, \ldots, \beta_5$) that best fit the observed data.

Click on a card to select it.

🔄

Maximum Likelihood (IRLS)

Iteratively Reweighted Least Squares - the standard GLM fitting algorithm.

Iterative optimization • Required for logistic • Fisher scoring

📐

Closed-Form (OLS)

Direct analytical solution using matrix algebra: $\beta = (X'X)^{-1}X'y$

One-shot calculation • No iteration • Exact solution

⛰️

Gradient Descent

General-purpose optimization by following the gradient of the loss function.

Step size tuning • Slower convergence • Deep learning standard

🎯

Newton-Raphson

Second-order optimization using the Hessian matrix.

Fast convergence • Computes curvature • IRLS is a form of this

✔ Fitting Method Selected

Maximum Likelihood via IRLS is the standard approach for GLMs, and it's required for logistic regression since there's no closed-form solution.

R's glm() and Python's statsmodels use IRLS (a form of Fisher scoring) by default. For our heart disease model, it typically converges in about 5 iterations.

✔ Correct!

Maximum Likelihood Estimation (MLE) via Iteratively Reweighted Least Squares (IRLS) is the standard method for logistic regression and all GLMs.

Why IRLS is essential for logistic regression:
The logit link creates a non-linear relationship between predictors and probability. There's no algebraic trick to solve for $\beta$ directly; we must iterate to find the maximum likelihood solution.

The IRLS algorithm for logistic regression:
  • Start from an initial guess (e.g. $\beta = 0$)
  • Compute fitted probabilities $p$ and weights $W = \text{diag}(p_i(1 - p_i))$
  • Form the working response $z = \eta + W^{-1}(y - p)$
  • Update $\beta$ by solving the weighted least squares problem $\beta = (X'WX)^{-1}X'Wz$
  • Repeat until the estimates stop changing

This is equivalent to Newton-Raphson optimization using Fisher information - hence "Fisher scoring".
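As a concrete sketch, here is an illustrative NumPy implementation of the IRLS update. This is not the production code inside R's glm() or statsmodels (which add safeguards such as step-halving and deviance-based convergence checks), just the bare algorithm:

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS / Fisher scoring (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))    # inverse logit: fitted probabilities
        w = p * (1.0 - p)                 # IRLS weights
        z = eta + (y - p) / w             # working response
        # One weighted least squares step: beta = (X'WX)^{-1} X'Wz
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic demo: recover known coefficients (-0.5, 1.5)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.5]))))
y = (rng.random(1000) < p_true).astype(float)
beta_hat = fit_logistic_irls(X, y)
```

On well-behaved data like this, the loop exits after a handful of iterations, matching the convergence behaviour described above.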

🔍 Want to see optimisation in action?
Our interactive visualisations show how algorithms navigate parameter space from 1D to 4D, including gradient descent and Newton-Raphson.

❌ No Closed-Form Solution Exists

For linear regression (Gaussian + identity), the closed-form solution $\beta = (X'X)^{-1}X'y$ works beautifully. But logistic regression has no such solution.

Why? The logit link creates a non-linear relationship:

$\ln\left(\frac{p}{1-p}\right) = X\beta$

Solving for $\beta$ requires finding where the derivative of the log-likelihood equals zero, which has no closed-form solution.
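Writing this out (a standard derivation, with $p_i$ the fitted probability for observation $i$): differentiating the binomial log-likelihood and setting the result to zero gives the score equations

$\nabla \ell(\beta) = X'(y - p) = 0, \qquad p_i = \frac{1}{1 + e^{-x_i'\beta}}$

In linear regression the analogous condition, $X'(y - X\beta) = 0$, is linear in $\beta$ and rearranges directly into $\beta = (X'X)^{-1}X'y$. Here $p$ depends on $\beta$ through the inverse logit, so no rearrangement isolates $\beta$.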

This is a fundamental difference between:
  • Linear regression: the normal equations are linear in $\beta$ and solve in one step
  • Logistic regression: the likelihood equations are non-linear in $\beta$ and must be solved iteratively

Select Maximum Likelihood (IRLS) to continue.

⚠️ Valid, But Not Standard for GLMs

Gradient descent would work for logistic regression - it's how neural networks are trained. However, it's not the standard approach for GLMs.

Why IRLS is preferred over gradient descent:
  • Faster convergence: IRLS uses curvature information (2nd derivatives)
  • No step size tuning: Gradient descent requires choosing learning rate
  • Exact standard errors: IRLS gives information matrix for free
  • Guaranteed convergence: For well-posed GLMs, IRLS always converges

Gradient descent is typically 10-100x slower than IRLS for logistic regression. It's the standard for deep learning where computing the full Hessian is impractical.
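For comparison, here is a bare-bones gradient ascent sketch (illustrative only; the learning rate `lr` is a hand-tuned assumption, which is exactly the knob IRLS avoids):

```python
import numpy as np

def fit_logistic_gd(X, y, lr=1.0, n_steps=2000):
    """Fit logistic regression by plain gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) / len(y)  # average score vector
        beta += lr * grad              # fixed step size: must be tuned by hand
    return beta

# Same synthetic setup as the IRLS example: true coefficients (-0.5, 1.5)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.5]))))
y = (rng.random(1000) < p_true).astype(float)
beta_gd = fit_logistic_gd(X, y)
```

Each step uses only first-derivative information, so it takes thousands of cheap iterations to reach the estimates IRLS finds in about five expensive ones, illustrating the convergence gap described above.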

Select Maximum Likelihood (IRLS) for the standard GLM approach.

⭐ Excellent Insight!

Newton-Raphson is indeed the mathematical foundation of how we fit logistic regression! IRLS for binomial GLMs is actually equivalent to Newton-Raphson (specifically, Fisher scoring).

The connection:

Newton-Raphson update: $\beta^{new} = \beta^{old} - H^{-1} \nabla \ell$

Where $H$ is the Hessian and $\nabla \ell$ is the score. For GLMs, this can be rewritten as an iteratively reweighted least squares problem.

Fisher scoring (used in IRLS) replaces the observed Hessian with its expected value (Fisher information), which simplifies computation and guarantees positive definiteness.
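The rewriting goes as follows (a standard derivation; for the canonical logit link, the observed Hessian and Fisher information coincide). With $W = \text{diag}(p_i(1 - p_i))$,

$\nabla \ell = X'(y - p), \qquad H = -X'WX$

so the Newton update becomes

$\beta^{new} = \beta^{old} + (X'WX)^{-1}X'(y - p) = (X'WX)^{-1}X'Wz, \qquad z = X\beta^{old} + W^{-1}(y - p)$

That is, each Newton/Fisher step is exactly a weighted least squares fit of the working response $z$ on $X$ with weights $W$ - hence "iteratively reweighted" least squares.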

In practice, we use the term "Maximum Likelihood via IRLS" - select that option to continue.