GLM Tutorial: Choose the Link Function for Classification

✔ Correct Choice!

The logit link is the standard choice for binary classification, giving us logistic regression.

The logit function maps probabilities to "log-odds":

$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right)$

When $p = 0.5$, $\text{logit}(p) = 0$

When $p \to 0$, $\text{logit}(p) \to -\infty$

When $p \to 1$, $\text{logit}(p) \to +\infty$
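The three cases above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `logit` and `inv_logit` are illustrative choices, not part of the tutorial.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse of logit: maps any real number back into (0, 1)."""
    # Branch on the sign of eta so math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Probabilities near 0 or 1 map to large negative or positive log-odds.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"p = {p:<5}  logit(p) = {logit(p):+.3f}")
```

Applying `inv_logit` to any real-valued linear predictor returns a value strictly between 0 and 1, which is the property the other links in this tutorial lack.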

Why logit dominates:

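Odds ratio interpretation: $e^\beta$ is the odds ratio for a one-unit increase in the corresponding predictor, holding the other predictors fixed
Algebraically convenient: With respect to the linear predictor $\eta$, the probability has a simple closed-form derivative, $dp/d\eta = p(1-p)$, which simplifies optimization
Canonical link: Natural pairing with the Binomial distribution in GLM theory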
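The first two points can be illustrated with a short sketch. The coefficients `BETA_0` and `BETA_1` below are hypothetical values chosen for illustration, not estimates from any dataset. The code checks that the odds ratio for a one-unit increase equals $e^{\beta_1}$ whatever the starting value of the predictor, and that the slope of the logistic curve matches $p(1-p)$.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a linear predictor eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients for a single predictor (not fitted to data).
BETA_0 = -5.0
BETA_1 = 0.08

def predicted_probability(x: float) -> float:
    return inv_logit(BETA_0 + BETA_1 * x)

def odds_ratio_for_unit_increase(x: float) -> float:
    """Ratio of the odds at x + 1 to the odds at x."""
    return odds(predicted_probability(x + 1)) / odds(predicted_probability(x))

def numerical_slope(eta: float, h: float = 1e-6) -> float:
    """Central-difference approximation of dp/d(eta)."""
    return (inv_logit(eta + h) - inv_logit(eta - h)) / (2.0 * h)

for x in (30, 50, 70):
    ratio = odds_ratio_for_unit_increase(x)
    print(f"x = {x}: odds ratio = {ratio:.6f}, exp(beta_1) = {math.exp(BETA_1):.6f}")

p = inv_logit(0.3)
print(f"numerical slope = {numerical_slope(0.3):.6f}, p(1-p) = {p * (1.0 - p):.6f}")
```

The probability itself changes by a different amount at each starting point; it is the ratio of odds that stays constant.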

⚠ Valid, But Not Preferred Today

The probit link is mathematically valid for binary outcomes - it also maps probabilities to the real line. However, it's less commonly used than logit today.

Historical Context

Pre-1980s: Probit dominated in economics and bioassay (dose-response studies). It was developed by Chester Bliss in the 1930s for analyzing insecticide effectiveness.

Why probit was preferred then: The "latent variable" interpretation was appealing - the binary outcome is assumed to occur when an underlying, normally distributed quantity (such as an individual's tolerance to a dose) crosses a threshold.

Why logit won: With modern computation, logit's advantages became clear:

Direct odds ratio interpretation
Simpler mathematics (no need for normal CDF tables)
Canonical link for Binomial in GLM framework

Practical note: Probit and logit give very similar predictions in practice; their coefficients differ mainly by a scale factor. As a rough rule of thumb:

$\beta_{\text{probit}} \approx 0.625 \times \beta_{\text{logit}}$

Probit is still used in economics (by tradition) and in some dose-response studies.
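To see how close the two links are, the sketch below compares the logistic CDF with the standard normal CDF after rescaling the argument by the 0.625 factor quoted in the note. It uses only the standard library (`math.erf` for the normal CDF); the grid from -6 to 6 is an arbitrary choice for illustration.

```python
import math

def logistic_cdf(x: float) -> float:
    """Inverse logit link: P(outcome = 1) for linear predictor x."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x: float) -> float:
    """Inverse probit link: the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

SCALE = 0.625  # rule-of-thumb factor from the note above

# Evaluate both curves on a grid of logit-scale linear predictors.
grid = [i / 10.0 for i in range(-60, 61)]
max_gap = max(abs(logistic_cdf(x) - normal_cdf(SCALE * x)) for x in grid)

print(f"largest gap between the two probability curves: {max_gap:.4f}")
```

The two curves agree exactly at 0 (both give 0.5) and differ most in the shoulders, which is why the fitted probabilities from the two models rarely disagree by much.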

For this tutorial, try the logit link - it's the modern standard and gives easier-to-interpret coefficients.

❌ Theoretically Wrong (But Sometimes Used)

The identity link doesn't bound predictions - it gives us the "linear probability model" (LPM):

$P(\text{HeartDisease} = 1) = \beta_0 + \beta_1 \cdot \text{Age} + \ldots$

The theoretical problem: Linear combinations can produce any real number, so this model can predict:

$P = 1.3$ (impossible - probability > 1)
$P = -0.2$ (impossible - probability < 0)
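A small sketch makes the problem concrete. The intercept and slope below are hypothetical, picked only so that the out-of-range values quoted above appear at plausible ages; they are not estimates from the tutorial's heart disease data.

```python
# Hypothetical linear probability model: P = BETA_0 + BETA_1 * age.
BETA_0 = -0.60
BETA_1 = 0.02  # change in predicted probability per year of age

def lpm_probability(age: float) -> float:
    """Identity link: the linear predictor is used directly as a probability."""
    return BETA_0 + BETA_1 * age

for age in (20, 40, 60, 80, 95):
    p = lpm_probability(age)
    status = "ok" if 0.0 <= p <= 1.0 else "outside [0, 1]"
    print(f"age = {age:>3}  predicted P = {p:+.2f}  ({status})")
```

With these made-up coefficients the model is sensible for middle ages but returns a negative "probability" at age 20 and a value above 1 at age 95.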

In Practice...

Despite being theoretically incorrect, the LPM is still used, especially in econometrics:

Easy interpretation: Each coefficient is the change in probability per one-unit increase in the predictor (multiply by 100 for percentage points)
Often "good enough": When predicted probabilities stay near 0.5, well away from 0 and 1, out-of-bounds predictions are rare
Causal inference: Some economists prefer LPM for its simplicity in causal analysis

However, for serious prediction or when probabilities near 0 or 1 matter, logit is preferred.

For this tutorial, use the logit link to learn the proper GLM approach.

❌ Not Appropriate for Probabilities

The log link ensures predictions are positive, but doesn't bound them to be less than 1:

$\ln(p) = \beta_0 + \beta_1 \cdot \text{Age} + \ldots$

$p = e^{\beta_0 + \beta_1 \cdot \text{Age} + \ldots}$

The problem: The exponential function can produce values greater than 1:

When the linear predictor $\eta = \beta_0 + \beta_1 \cdot \text{Age} + \ldots$ is positive, we get $p = e^\eta > 1$ (impossible)
The log link is designed for count data (Poisson), not probabilities

The log link is the right choice when the mean of the response must be positive (as with counts), but probabilities need a link that bounds predictions to (0, 1).
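The sketch below shows where a log-link model starts producing invalid probabilities. The coefficients are hypothetical values chosen for illustration; the crossover point is simply the age at which the linear predictor reaches 0.

```python
import math

# Hypothetical log-link model: ln(p) = BETA_0 + BETA_1 * age.
BETA_0 = -4.0
BETA_1 = 0.05

def log_link_probability(age: float) -> float:
    """Inverse of the log link: always positive, but not capped at 1."""
    return math.exp(BETA_0 + BETA_1 * age)

# The linear predictor is 0 (so p = 1) at age = -BETA_0 / BETA_1.
crossover_age = -BETA_0 / BETA_1

for age in (40, 60, 80, 100):
    print(f"age = {age:>3}  predicted p = {log_link_probability(age):.3f}")
print(f"predictions exceed 1 beyond age {crossover_age:.0f}")
```

Below the crossover the predictions look like probabilities; above it they keep growing exponentially, which is the behaviour wanted for counts but not for a binary outcome.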

Tutorial 2: The Link Function for Classification
