My blog series on statistical inference and modelling has, at the time of writing, 13 parts, and a feature-length reading time. 1 I’ve been strongly motivated to write this because it covers what I consider the essential theory and practice necessary to be a competent user of statistical methods. In this post I’ll go a bit more into my own background, and how I came to pick up this knowledge.
My first degree was in the applied physical sciences: electronic engineering. From this I learned two things: firstly, not to be afraid of algebra and coding; secondly, that I didn’t want to do electronic engineering as a career. So I moved into the social sciences, and this move took me to the health sciences, demography and epidemiology.
The move from the applied physical to the social and health sciences made me realise I’d learned something else from the engineering course: a pair of expectations about methods training. The first expectation was that the methods taught should allow the substantive questions of interest in the field to be addressed. The second expectation was that methods should be taught with sufficient rigour and formalism that students attending the same course, and being sufficiently attentive in that course, leave with a common understanding of what has been taught and exactly how the methods are applied.
I wish I could honestly say otherwise, but in my experience the methods taught in much of the social sciences in the UK fell short of the standards of rigour and application that are just taken as given in an engineering course. Qualitative methods courses tend to trade in abstract nouns and unfalsifiable declarations - how does one really know whether one’s employing a feminist methodology, or a critical realist epistemology, or a post-structuralist framing, when asking people why they’re so sad, or angry, or poor? And most of the quantitative methods training, at least when I first encountered them, took the form of telling people what buttons to click, in which order, after opening up a copy of SPSS. Press this button, then this button, then this button, then look at this number here, and check it’s under 0.05, and look for the number of stars in this row, and so on.
When I started a PhD in the quantitative social sciences I was highly unskilled. I sat in on some general social science methods courses, some econometrics, some first year probability and statistics courses run by the maths departments, but still didn’t feel I knew how to use the methods of quantitative research with the same level of rigour and understanding that I’d been used to in the engineering course. So I kept searching.
The training course that finally changed this was Gov 2001, a course that’s been run annually by Harvard University for decades, and seems to have become something of an institution. The course teaches statistical inference from the ground up, from the first principles of likelihood and probability, but doesn’t skimp on the practicalities of application: it’s highly applied, with students evaluated on whether they can, at the end of the course, replicate and improve upon an article that’s already been published. It also emphasises the family resemblances between statistical models, the way almost all specific models are just different versions of an underlying ‘mother model’ (my term) which comprises two linked equations.
I took the course as a distance student over a decade ago, and still find its contents immensely valuable. The blog post series listed below is largely based on that course, though with my own idiosyncratic spin and emphasis. It’s quite technical in places, but the juice is worth the squeeze. If you follow along you will know and understand more about statistical models and their application than almost any UK graduate in a field other than statistics. 2
Footnotes
Much of this reading may be messages generated by R functions, however.↩︎
A statement based on a great deal of personal experience, sadly. Statistical inference is still generally quite poorly explained, poorly understood, and poorly applied in much of the UK, especially when it comes to model building, comparison, interpretation and use for prediction.↩︎