Part One: Model fitting as parameter calibration

statistics
Author

Jon Minton

Published

November 28, 2023

tl;dr

This is part of a series of posts which introduce and discuss the implications of a general framework for thinking about statistical modelling. This framework is most clearly expressed in King, Tomz, and Wittenberg (2000).

Part 1: What are statistical models and how are they fit?

It’s common for different statistical methods to be taught as if they’re completely different species or families. In particular, standard linear regression tends to be taught first, with additional, more exotic models, like logistic or Poisson regression, introduced at a later stage in an advanced course.

The disadvantage of this standard approach to teaching statistics is that it obscures the fact that almost all statistical models are, fundamentally, trying to do something very similar, and work in very similar ways.

Something I’ve found immensely helpful over the years is the following pair of equations:

Stochastic Component

\[ Y_i \sim f(\theta_i, \alpha) \]

Systematic Component

\[ \theta_i = g(X_i, \beta) \]

In words, the above is saying something like:

  • The predicted response \(Y_i\) for a set of predictors \(X_i\) is assumed to be drawn from (the \(\sim\) symbol) a stochastic distribution, \(f(.,.)\).
  • The stochastic distribution contains both parameters we’re interested in and which are determined by the data, \(\theta_i\), and ancillary parameters we’re not interested in and might just have to assume, \(\alpha\).
  • The parameters we’re interested in determining from the data, \(\theta_i\), are themselves determined by a systematic component, \(g(.,.)\), which takes and transforms two inputs: the observed predictor data \(X_i\) and a set of coefficients \(\beta\). (A concrete sketch of both components follows this list.)
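To make this concrete, here is a minimal sketch in Python (assuming numpy; the function names g and f simply mirror the notation above and aren’t from any particular library). For a Poisson regression, the parameter of interest \(\theta_i\) is the expected count, the systematic component is \(g(X_i, \beta) = \exp(X_i \beta)\), and the Poisson happens to need no ancillary \(\alpha\):

import numpy as np

rng = np.random.default_rng(42)

# Systematic component g: transform predictors X and coefficients beta
# into the parameter of interest theta (here a Poisson rate)
def g(X, beta):
    return np.exp(X @ beta)  # exp keeps the rate positive

# Stochastic component f: draw a predicted response Y from a
# distribution parameterised by theta. The Poisson takes no ancillary
# alpha; a Normal, by contrast, would take alpha as its standard deviation
def f(theta):
    return rng.poisson(theta)

X = np.column_stack([np.ones(5), rng.normal(size=5)])  # intercept + one predictor
beta = np.array([0.5, 1.2])

theta = g(X, beta)  # theta_i = g(X_i, beta)
Y = f(theta)        # Y_i ~ f(theta_i)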

And graphically this looks something like:

flowchart LR
  X
  beta
  g
  f
  alpha
  theta
  Y
  
  X --> g
  beta --> g
  g --> theta
  theta --> f
  alpha --> f
  
  f --> Y


To understand how this fits into the ‘whole game’ of modelling, it’s worth introducing another term, \(D\), for the data we’re using, and to say that \(D\) is partitioned into observed predictors \(X_i\), and observed responses, \(y_i\).

For each observation, \(i\), we therefore have a predicted response, \(Y_i\), and an observed response, \(y_i\). We can compare \(Y_i\) with \(y_i\) to get the difference between the two, \(\delta_i\).

Now, we obviously can’t change the data to make it fit our model better. But what we can do is calibrate the model a little better. How do we do this? By adjusting the \(\beta\) coefficients that feed into the systematic component \(g\). Graphically, this process of comparison, adjustment, and calibration looks as follows:

flowchart LR
  D
  y
  X
  beta
  g
  f
  alpha
  theta
  Y
  diff
  
  D -->|partition| X
  D -->|partition| y
  X --> g
  beta -->|rerun| g
  g -->|transform| theta
  theta --> f
  alpha --> f
  
  f -->|predict| Y
  
  Y -->|compare| diff
  y -->|compare| diff
  
  diff -->|adjust| beta
  
  
  
  linkStyle default stroke:blue, stroke-width:1px

Pretty much all statistical model fitting involves iterating around this feedback loop, in which the \(\delta_i\) are used to adjust \(\beta\) and the model is rerun, until some kind of stopping condition is met that amounts to minimising the \(\delta_i\).
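As an illustration, here is a minimal sketch of that loop in Python for the simplest case: a linear model, where \(g\) is just \(X_i \beta\) and the prediction \(Y_i\) is the mean of the stochastic component. The loop adjusts \(\beta\) by gradient descent on the sum of squared \(\delta_i\); the step size and stopping rule here are illustrative assumptions, and real fitting routines typically maximise a likelihood instead:

import numpy as np

rng = np.random.default_rng(0)

# Simulate data D and partition it into predictors X and responses y
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([2.0, -1.5])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

def g(X, beta):
    return X @ beta  # identity link: theta_i = X_i @ beta, and Y_i = theta_i

beta = np.zeros(2)  # initial guess for the coefficients
lr = 0.1            # step size for each adjustment

for step in range(2000):
    Y = g(X, beta)               # predict
    delta = Y - y                # compare
    grad = X.T @ delta / n       # direction that increases sum(delta**2)
    beta -= lr * grad            # adjust beta, then rerun g
    if np.sum(grad**2) < 1e-12:  # stop once the adjustment is negligible
        break

print(beta)  # converges to roughly true_beta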

I’ll expand on this idea further in part 2.

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44: 341–355. http://gking.harvard.edu/files/abs/making-abs.shtml.