
Tutorial 2: Fitting the Model

How should we estimate the logistic regression parameters?


Your model so far

  • Systematic Component: $\eta = \beta_0 + \beta_1 \cdot \text{Age} + \ldots$
  • Link Function: Logit: $\ln(p/(1-p)) = \eta$
  • Distribution: Binomial

Key Difference from Tutorial 1

Unlike linear regression (Gaussian + identity), logistic regression has no closed-form solution. We cannot simply compute $\beta = (X'X)^{-1}X'y$. The logit link creates a non-linear equation that must be solved iteratively.
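To make the contrast concrete, here is a minimal NumPy sketch (synthetic data, hypothetical variable names) of the one-step closed-form solution that linear regression enjoys and logistic regression lacks:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form OLS: solve the normal equations (X'X) beta = X'y in one step.
# No loop, no convergence check -- there is no logistic analogue of this line.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

For the binomial case, the corresponding likelihood equations are non-linear in $\beta$, so no single `solve` call can produce the estimates.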

How should we find the $\beta$ coefficients?

We have our model structure defined. Now we need to estimate the parameters ($\beta_0, \beta_1, \ldots, \beta_5$) that best fit the observed data.

Click on a card to select it.

🔄

Maximum Likelihood (IRLS)

Iteratively Reweighted Least Squares - the standard GLM fitting algorithm.

Iterative optimization • Required for logistic • Fisher scoring

📐

Closed-Form (OLS)

Direct analytical solution using matrix algebra: $\beta = (X'X)^{-1}X'y$

One-shot calculation • No iteration • Exact solution

⛰️

Gradient Descent

General-purpose optimization by following the gradient of the loss function.

Step size tuning • Slower convergence • Deep learning standard

🎯

Newton-Raphson

Second-order optimization using the Hessian matrix.

Fast convergence • Computes curvature • IRLS is a form of this

✔ Fitting Method Selected

Maximum Likelihood via IRLS is the standard approach for GLMs, and it's required for logistic regression since there's no closed-form solution.

R's glm() and Python's statsmodels use IRLS (a form of Fisher scoring) by default. For our heart disease model, it typically converges in about 5 iterations.

✔ Correct!

Maximum Likelihood Estimation (MLE) via Iteratively Reweighted Least Squares (IRLS) is the standard method for logistic regression and all GLMs.

Why IRLS is essential for logistic regression:
The logit link creates a non-linear relationship between predictors and probability. There's no algebraic trick to solve for $\beta$ directly; we must iterate to find the maximum likelihood solution.

The IRLS algorithm for logistic regression:
  • Start from an initial guess (e.g. $\beta = 0$)
  • Compute fitted probabilities $p$ and weights $W = \text{diag}(p_i(1 - p_i))$
  • Form the working response $z = \eta + W^{-1}(y - p)$
  • Update $\beta$ by solving the weighted least squares problem $\beta = (X'WX)^{-1}X'Wz$
  • Repeat until the estimates stop changing

This is equivalent to Newton-Raphson optimization using Fisher information - hence "Fisher scoring".
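As a concrete sketch, here is an illustrative NumPy implementation of the IRLS update. This is not the production code inside R's glm() or statsmodels (which add safeguards such as step-halving and deviance-based convergence checks), just the bare algorithm:

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS / Fisher scoring (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))    # inverse logit: fitted probabilities
        w = p * (1.0 - p)                 # IRLS weights
        z = eta + (y - p) / w             # working response
        # One weighted least squares step: beta = (X'WX)^{-1} X'Wz
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic demo: recover known coefficients (-0.5, 1.5)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.5]))))
y = (rng.random(1000) < p_true).astype(float)
beta_hat = fit_logistic_irls(X, y)
```

On well-behaved data like this, the loop exits after a handful of iterations, matching the convergence behaviour described above.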

🔍 Want to see optimisation in action?
Our interactive visualisations show how algorithms navigate parameter space from 1D to 4D, including gradient descent and Newton-Raphson.

❌ No Closed-Form Solution Exists

For linear regression (Gaussian + identity), the closed-form solution $\beta = (X'X)^{-1}X'y$ works beautifully. But logistic regression has no such solution.

Why? The logit link creates a non-linear relationship:

$\ln\left(\frac{p}{1-p}\right) = X\beta$

Solving for $\beta$ requires finding where the derivative of the log-likelihood equals zero, which has no closed-form solution.
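Writing this out (a standard derivation, with $p_i$ the fitted probability for observation $i$): differentiating the binomial log-likelihood and setting the result to zero gives the score equations

$\nabla \ell(\beta) = X'(y - p) = 0, \qquad p_i = \frac{1}{1 + e^{-x_i'\beta}}$

In linear regression the analogous condition, $X'(y - X\beta) = 0$, is linear in $\beta$ and rearranges directly into $\beta = (X'X)^{-1}X'y$. Here $p$ depends on $\beta$ through the inverse logit, so no rearrangement isolates $\beta$.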

This is a fundamental difference between:
  • Linear regression: the normal equations are linear in $\beta$ and solve in one step
  • Logistic regression: the likelihood equations are non-linear in $\beta$ and must be solved iteratively

Select Maximum Likelihood (IRLS) to continue.

⚠️ Valid, But Not Standard for GLMs

Gradient descent would work for logistic regression - it's how neural networks are trained. However, it's not the standard approach for GLMs.

Why IRLS is preferred over gradient descent:
  • Faster convergence: IRLS uses curvature information (2nd derivatives)
  • No step size tuning: Gradient descent requires choosing learning rate
  • Exact standard errors: IRLS gives information matrix for free
  • Guaranteed convergence: For well-posed GLMs, IRLS always converges

Gradient descent is typically 10-100x slower than IRLS for logistic regression. It's the standard for deep learning where computing the full Hessian is impractical.
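For comparison, here is a bare-bones gradient ascent sketch (illustrative only; the learning rate `lr` is a hand-tuned assumption, which is exactly the knob IRLS avoids):

```python
import numpy as np

def fit_logistic_gd(X, y, lr=1.0, n_steps=2000):
    """Fit logistic regression by plain gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) / len(y)  # average score vector
        beta += lr * grad              # fixed step size: must be tuned by hand
    return beta

# Same synthetic setup as the IRLS example: true coefficients (-0.5, 1.5)
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.5]))))
y = (rng.random(1000) < p_true).astype(float)
beta_gd = fit_logistic_gd(X, y)
```

Each step uses only first-derivative information, so it takes thousands of cheap iterations to reach the estimates IRLS finds in about five expensive ones, illustrating the convergence gap described above.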

Select Maximum Likelihood (IRLS) for the standard GLM approach.

⭐ Excellent Insight!

Newton-Raphson is indeed the mathematical foundation of how we fit logistic regression! IRLS for binomial GLMs is actually equivalent to Newton-Raphson (specifically, Fisher scoring).

The connection:

Newton-Raphson update: $\beta^{new} = \beta^{old} - H^{-1} \nabla \ell$

Where $H$ is the Hessian and $\nabla \ell$ is the score. For GLMs, this can be rewritten as an iteratively reweighted least squares problem.

Fisher scoring (used in IRLS) replaces the observed Hessian with its expected value (Fisher information), which simplifies computation and guarantees positive definiteness.
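The rewriting goes as follows (a standard derivation; for the canonical logit link, the observed Hessian and Fisher information coincide). With $W = \text{diag}(p_i(1 - p_i))$,

$\nabla \ell = X'(y - p), \qquad H = -X'WX$

so the Newton update becomes

$\beta^{new} = \beta^{old} + (X'WX)^{-1}X'(y - p) = (X'WX)^{-1}X'Wz, \qquad z = X\beta^{old} + W^{-1}(y - p)$

That is, each Newton/Fisher step is exactly a weighted least squares fit of the working response $z$ on $X$ with weights $W$ - hence "iteratively reweighted" least squares.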

In practice, we use the term "Maximum Likelihood via IRLS" - select that option to continue.