Blog Archaeology: Mapping the Territory

A data-driven look at what I’ve been writing about

Categories: meta, blog, analytics, data visualization

Author: Jon Minton
Published: February 15, 2026

Introduction

This blog started in late November 2023, initially as a place to work through ideas about statistics; it quickly expanded to encompass film and book reviews, pop culture analysis, data visualisations, hand-drawn illustrations, and assorted other preoccupations. Over two years and 130+ posts later, it seems worth stepping back and looking at what’s accumulated.

This post uses R to programmatically read every post’s metadata — title, date, word count, categories — and visualise the blog’s structure, growth, and evolution.

library(yaml)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(plotly)
library(purrr)
library(lubridate)

Reading the metadata

# Find all index.qmd files in posts/
qmd_files <- list.files(
  "../../posts",
  pattern = "index\\.qmd$",
  recursive = TRUE,
  full.names = TRUE
)

# Function to safely extract metadata from a .qmd file
extract_post_meta <- function(f) {
  lines <- tryCatch(readLines(f, warn = FALSE), error = function(e) character(0))
  if (length(lines) == 0) return(NULL)

  # Find YAML delimiters
  yaml_markers <- which(lines == "---")
  if (length(yaml_markers) < 2) return(NULL)

  yaml_text <- paste(lines[(yaml_markers[1] + 1):(yaml_markers[2] - 1)], collapse = "\n")
  meta <- tryCatch(yaml::yaml.load(yaml_text), error = function(e) NULL)
  if (is.null(meta)) return(NULL)

  # Calculate word count (text after YAML header)
  body_lines <- lines[(yaml_markers[2] + 1):length(lines)]
  text <- paste(body_lines, collapse = " ")
  word_count <- str_count(text, "\\S+")

  # Extract path components
  rel_path <- str_remove(f, ".*/posts/")
  path_parts <- str_split(rel_path, "/")[[1]]

  # Determine folder structure
  if (length(path_parts) >= 3) {
    top_folder <- path_parts[1]
    sub_folder <- path_parts[2]
  } else if (length(path_parts) == 2) {
    top_folder <- "standalone"
    sub_folder <- path_parts[1]
  } else {
    top_folder <- "other"
    sub_folder <- "other"
  }

  # Handle date
  post_date <- tryCatch(as.Date(meta$date), error = function(e) NA_Date_)

  # Handle categories
  cats <- meta$categories
  if (is.null(cats)) cats <- character(0)

  tibble(
    path = rel_path,
    title = meta$title %||% "Untitled",
    date = post_date,
    word_count = word_count,
    top_folder = top_folder,
    sub_folder = sub_folder,
    categories = list(cats),
    has_claude_footnote = str_detect(text, "\\[\\^claude-")
  )
}

# Extract metadata from all posts
posts_df <- map_dfr(qmd_files, extract_post_meta) |>
  filter(!is.na(date)) |>
  arrange(date)

cat(nrow(posts_df), "posts found, spanning",
    as.character(min(posts_df$date)), "to", as.character(max(posts_df$date)), "\n")
137 posts found, spanning 2023-11-25 to 2026-02-15 
cat("Total word count:", format(sum(posts_df$word_count), big.mark = ","), "\n")
Total word count: 225,824 

The Treemap

The treemap below shows every post, grouped by top-level folder and sub-folder. Size reflects word count; colour reflects posting date (dark blue = oldest, through lighter blue, to red = most recent). Hover for details.

# Prepare treemap data: display labels, truncated titles, hover text
treemap_data <- posts_df |>
  mutate(
    # Clean up folder names for display
    folder_label = case_when(
      top_folder == "glms" ~ "GLM Series",
      top_folder == "unpop" ~ "Unpopular Opinions",
      top_folder == "tardy-tuesday" ~ "Tardy Tuesday",
      top_folder == "handdrawn-stats" ~ "Hand-drawn Stats",
      top_folder == "standalone" ~ "Standalone",
      TRUE ~ top_folder
    ),
    # Truncate long titles
    short_title = str_trunc(title, 40),
    date_numeric = as.numeric(date),
    hover_text = paste0(
      "<b>", title, "</b><br>",
      "Date: ", date, "<br>",
      "Words: ", format(word_count, big.mark = ","), "<br>",
      "Folder: ", top_folder, "/", sub_folder
    )
  )

# Build the treemap. Plotly requires every value in `parents` to also appear
# in `labels`, so we add one root node per folder before the post leaves.
folder_nodes <- treemap_data |>
  distinct(folder_label) |>
  transmute(
    label = folder_label, parent = "",
    value = 0, text = folder_label,
    colour = NA_real_  # folder nodes take the default colour
  )

leaf_nodes <- treemap_data |>
  transmute(
    label = short_title, parent = folder_label,
    value = word_count, text = hover_text, colour = date_numeric
  )

nodes <- bind_rows(folder_nodes, leaf_nodes)

plot_ly(
  type = "treemap",
  labels = nodes$label,
  parents = nodes$parent,
  values = nodes$value,
  text = nodes$text,
  hoverinfo = "text",
  marker = list(
    colors = nodes$colour,
    # colorscale entries must be list(position, colour) pairs
    colorscale = list(
      list(0, "#2c3e50"),
      list(0.5, "#3498db"),
      list(1, "#e74c3c")
    ),
    showscale = TRUE,
    colorbar = list(
      title = "Date",
      ticktext = c(
        as.character(min(treemap_data$date)),
        as.character(max(treemap_data$date))
      ),
      tickvals = c(
        min(treemap_data$date_numeric),
        max(treemap_data$date_numeric)
      )
    )
  ),
  textinfo = "label"
) |>
  layout(
    title = list(text = "Blog Post Treemap: Size = Word Count, Colour = Date"),
    margin = list(t = 50)
  )

Posting Patterns

posts_by_month <- posts_df |>
  mutate(month = floor_date(date, "month")) |>
  count(month) |>
  complete(month = seq(min(month), max(month), by = "month"), fill = list(n = 0))

ggplot(posts_by_month, aes(x = month, y = n)) +
  geom_col(fill = "#3498db", alpha = 0.8) +
  geom_smooth(se = FALSE, colour = "#e74c3c", linewidth = 0.8) +
  labs(
    title = "Posts per month",
    x = NULL,
    y = "Number of posts"
  ) +
  theme_minimal() +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y")

# Calculate gaps between posts
posts_sorted <- posts_df |> arrange(date)
gaps <- tibble(
  from_title = posts_sorted$title[-nrow(posts_sorted)],
  to_title = posts_sorted$title[-1],
  from_date = posts_sorted$date[-nrow(posts_sorted)],
  to_date = posts_sorted$date[-1],
  gap_days = as.numeric(to_date - from_date)
)

cat("Longest gaps between posts:\n\n")
Longest gaps between posts:
gaps |>
  arrange(desc(gap_days)) |>
  head(5) |>
  mutate(
    gap = paste(gap_days, "days"),
    from = paste0(from_title, " (", from_date, ")"),
    to = paste0(to_title, " (", to_date, ")")
  ) |>
  select(gap, from, to) |>
  knitr::kable()
| gap     | from                                                                           | to                                                                            |
|---------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| 85 days | Brother Lee the Antimonk (2025-01-04)                                          | Time and (state) change (2025-03-30)                                          |
| 44 days | Remembering KGB: 1992’s subtly terrifying social poison simulator (2025-08-26) | The Man Who Solved Intelligence (2025-10-09)                                  |
| 36 days | The Contestant (2025-06-22)                                                    | It’s your choice (2025-07-28)                                                 |
| 27 days | Claude Adds Footnotes: A Reflection (2025-12-06)                               | 2025: The Last Year Most Knowledge Workers will be Human (2026-01-02)         |
| 24 days | Statistics Website (2024-09-04)                                                | The Paradox of Tolerating Intolerance: Position A and Position B (2024-09-28) |

Content Categories

folder_summary <- posts_df |>
  count(top_folder, sort = TRUE) |>
  mutate(
    pct = n / sum(n) * 100,
    label = paste0(top_folder, " (", n, ", ", round(pct), "%)")
  )

ggplot(folder_summary, aes(x = reorder(top_folder, n), y = n)) +
  geom_col(fill = "#2ecc71", alpha = 0.8) +
  geom_text(aes(label = n), hjust = -0.2) +
  coord_flip() +
  labs(
    title = "Posts by top-level folder",
    x = NULL,
    y = "Number of posts"
  ) +
  theme_minimal()

# Explode categories and count
all_cats <- posts_df |>
  unnest(categories) |>
  count(categories, sort = TRUE)

cat("Top 15 category tags:\n\n")
Top 15 category tags:
all_cats |>
  head(15) |>
  knitr::kable()
| categories    | n  |
|---------------|----|
| statistics    | 44 |
| R             | 37 |
| tidy tuesday  | 13 |
| stories       | 8  |
| time series   | 7  |
| films         | 6  |
| AI            | 5  |
| Tidy Tuesday  | 5  |
| books         | 5  |
| causality     | 5  |
| games         | 5  |
| blog          | 4  |
| bootstrapping | 4  |
| economics     | 4  |
| meta          | 4  |

Word Counts

ggplot(posts_df, aes(x = word_count)) +
  geom_histogram(bins = 30, fill = "#9b59b6", alpha = 0.8) +
  geom_vline(xintercept = median(posts_df$word_count),
             linetype = "dashed", colour = "#e74c3c") +
  annotate("text", x = median(posts_df$word_count) + 100,
           y = Inf, vjust = 2, hjust = 0,
           label = paste("Median:", format(median(posts_df$word_count), big.mark = ",")),
           colour = "#e74c3c") +
  labs(
    title = "Distribution of post word counts",
    x = "Word count",
    y = "Number of posts"
  ) +
  theme_minimal()

cumulative <- posts_df |>
  arrange(date) |>
  mutate(
    cumulative_words = cumsum(word_count),
    post_number = row_number()
  )

ggplot(cumulative, aes(x = date, y = cumulative_words)) +
  geom_line(colour = "#3498db", linewidth = 1) +
  geom_point(colour = "#3498db", size = 0.8, alpha = 0.5) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Cumulative word count over time",
    x = NULL,
    y = "Cumulative words"
  ) +
  theme_minimal()

cat("Longest posts:\n\n")
Longest posts:
posts_df |>
  arrange(desc(word_count)) |>
  head(10) |>
  select(title, date, word_count, top_folder) |>
  knitr::kable()
| title                                                       | date       | word_count | top_folder |
|-------------------------------------------------------------|------------|------------|------------|
| Time and (state) change                                     | 2025-03-30 | 12044      | glms       |
| Some thoughts on The Genius(*) Myth: A Review and a Reverie | 2025-08-19 | 6636       | unpop      |
| Nine thoughts on Tim Berners-Lee’s This is for Everyone     | 2025-10-12 | 6359       | unpop      |
| Demystifying and Disenchanting Statistical Significance     | 2024-10-19 | 6319       | glms       |
| The Man Who Solved Intelligence                             | 2025-10-09 | 5578       | unpop      |
| 2025: The Last Year Most Knowledge Workers will be Human    | 2026-01-02 | 5134       | standalone |
| The Dilbert Future in Retrospect                            | 2026-01-25 | 4665       | standalone |
| The Book of Nigel                                           | 2024-10-25 | 4451       | standalone |
| The Analytical Maxim Gun                                    | 2025-11-29 | 4172       | standalone |
| Part Thirteen: On Marbles and Jumping Beans                 | 2024-02-10 | 4066       | glms       |

The Claude Footnotes Project

In December 2025, Claude Sonnet reviewed 128 posts and added 42 fact-checking footnotes across 29 posts. In February 2026, Claude Opus conducted a more extensive review, expanding to previously skipped opinion and commentary posts, correcting typos, and adding further footnotes.
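Detection relies on the regex defined in the metadata chunk above: a post counts as footnoted if its body contains a marker beginning [^claude-. A minimal illustration of the match — the example sentence here is invented, and only the claude- marker prefix comes from the real posts:

# Hypothetical body text; the pattern matches the literal prefix "[^claude-"
str_detect(
  "The film grossed $100m at the box office.[^claude-1]",
  "\\[\\^claude-"
)
#> [1] TRUE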

footnoted <- posts_df |>
  filter(has_claude_footnote)

cat(nrow(footnoted), "of", nrow(posts_df), "posts now have Claude footnotes.\n")
31 of 137 posts now have Claude footnotes.
cat("Footnoted posts by folder:\n\n")
Footnoted posts by folder:
footnoted |>
  count(top_folder, sort = TRUE) |>
  knitr::kable()
| top_folder    | n  |
|---------------|----|
| unpop         | 15 |
| glms          | 11 |
| standalone    | 3  |
| on-writing    | 1  |
| tardy-tuesday | 1  |

Reflection

What does the data show? A blog that started with a burst of statistical pedagogy in late 2023 and early 2024, diversified into cultural commentary and personal essays, and settled into a rhythm of roughly weekly posting with occasional longer gaps. The GLM series remains the heaviest single body of work by word count, but the Unpopular Opinions posts collectively rival it — and tend to be more widely read. The Tardy Tuesday data visualisation posts, though numerous, are lighter on prose, being primarily code and charts.
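The per-folder claims above can be checked directly from the metadata already collected. A quick summary of posts_df — nothing here beyond what the earlier chunks produced:

# Posts, total words, and typical post length per top-level folder
posts_df |>
  group_by(top_folder) |>
  summarise(
    n_posts = n(),
    total_words = sum(word_count),
    median_words = median(word_count),
    .groups = "drop"
  ) |>
  arrange(desc(total_words))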

The blog has become, unexpectedly, a body of work. Not a planned one — there was never a grand design — but an accumulation of preoccupations that, viewed at this distance, form a reasonably coherent picture of someone interested in both the mechanics of understanding the world (statistics, data, methodology) and the meaning we impose on it (stories, film, neurodiversity, politics).

Whether anyone other than the author reads it is, as ever, beside the point.