Jon Minton’s Blog - Tidy Tuesday on Dr Who

Tidy Tuesday challenge

First we load the packages

The tidyverse equivalent of pacman is now pak.

The latest dataset is here, along with the specific files to work with.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

dta_list <- tidytuesdayR::tt_load(x = "2023-11-28")

--- Compiling #TidyTuesday Information for 2023-11-28 ----
--- There are 3 files available ---
--- Starting Download ---


    Downloading file 1 of 3: `drwho_episodes.csv`
    Downloading file 2 of 3: `drwho_directors.csv`
    Downloading file 3 of 3: `drwho_writers.csv`

--- Download complete ---

dta_eps <- dta_list[["drwho_episodes"]]
dta_wrt <- dta_list[["drwho_writers"]]

Let’s see how the viewership changed over time

dta_eps_season <- 
  dta_eps |> 
  group_by(season_number) |> 
  mutate(
    mean_viewers = mean(uk_viewers),
    mean_date = mean(first_aired)
    ) |> 
  ungroup()

dta_eps_season |> 
  ggplot(aes(x = first_aired, y = uk_viewers)) + 
  geom_point(colour = "grey") +
  geom_point(aes(x = mean_date, y = mean_viewers), size = 2.5) + 
  scale_x_date(breaks = "2 years", labels = \(x) format(x, "%Y")) +
  labs(
    x = "First aired",
    y = "UK Viewers (millions)",
    title = "Viewers over time for Dr Who",
    subtitle = "People don't watch TV like they used to..."
  ) +
  annotate("text", x = lubridate::make_date(2015), y = 10, label = "What happened here?!") +
  annotate("text", x = lubridate::make_date(2014), y= 8, label = "Smartphone strangling the TV from now", hjust = 0) + 
  stat_smooth(colour = "blue", se = FALSE)

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
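The season means in the chunk above are attached with group_by() plus mutate() rather than summarise(), which is what lets the per-episode points and the per-season points share one data frame. A minimal sketch of the difference, using a made-up toy table (the values and the names toy_eps, toy_mutated and toy_summarised are hypothetical, not taken from the Dr Who data):

```r
library(dplyr)

# Hypothetical toy data: two seasons, viewers in millions
toy_eps <- tibble::tibble(
  season_number = c(1, 1, 2, 2),
  uk_viewers    = c(10, 8, 6, 4)
)

# mutate() keeps one row per episode and repeats the season mean on each row
toy_mutated <- toy_eps |>
  group_by(season_number) |>
  mutate(mean_viewers = mean(uk_viewers)) |>
  ungroup()

# summarise() collapses the data to one row per season
toy_summarised <- toy_eps |>
  group_by(season_number) |>
  summarise(mean_viewers = mean(uk_viewers))

nrow(toy_mutated)     # still four rows, one per episode
nrow(toy_summarised)  # two rows, one per season
```

Because mutate() repeats the season mean on every episode row, the larger points in the plot are drawn once per episode on top of each other, which is harmless here but worth knowing if transparency is added later.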

Let’s now look at writers by season

dta_eps_wrt <- 
  dta_eps |> 
    left_join(dta_wrt, by = "story_number")
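One thing to keep in mind with this kind of join: if a story has more than one writer row, left_join() repeats the episode row once per writer, so later row counts are counts of episode-writer pairs. Whether that happens in the Dr Who files is not checked here; the sketch below only shows the mechanism on hypothetical toy tables (toy_eps, toy_wrt and their values are made up):

```r
library(dplyr)

# Hypothetical toy tables: story "158" has two credited writers
toy_eps <- tibble::tibble(
  story_number  = c("157", "158"),
  episode_title = c("Episode A", "Episode B")
)
toy_wrt <- tibble::tibble(
  story_number = c("157", "158", "158"),
  writer       = c("Writer X", "Writer Y", "Writer Z")
)

# Each episode row is repeated once for every matching writer row
toy_joined <- left_join(toy_eps, toy_wrt, by = "story_number")

nrow(toy_eps)     # 2 episodes
nrow(toy_joined)  # 3 rows after the join
```

Comparing nrow() before and after the real join would be a quick way to see whether any episodes are being counted more than once.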

How many episodes by writer?

dta_eps_wrt |> 
  group_by(writer) |>  
  summarise(
    n_written = n()
  ) |> 
  ungroup() |> 
  arrange(desc(n_written))

# A tibble: 40 × 2
   writer           n_written
   <chr>                <int>
 1 Steven Moffat           45
 2 Russell T Davies        31
 3 Chris Chibnall          29
 4 Mark Gatiss              9
 5 Toby Whithouse           7
 6 Gareth Roberts           5
 7 Helen Raynor             4
 8 Jamie Mathieson          4
 9 Peter Harness            4
10 Matthew Graham           3
# ℹ 30 more rows

So Moffat wrote the most episodes, then Davies, then Chibnall.
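As an aside, the group_by(), summarise(), arrange() pattern used for this table can be written more compactly with dplyr's count(). A minimal sketch on a hypothetical toy table (the writer names are placeholders, not real data):

```r
library(dplyr)

# Hypothetical toy data: one row per episode-writer pair
toy <- tibble::tibble(
  writer = c("Writer X", "Writer Y", "Writer X",
             "Writer Z", "Writer X", "Writer Y")
)

# count() groups, tallies, and (with sort = TRUE) orders by descending count
toy_counts <- toy |>
  count(writer, sort = TRUE, name = "n_written")

toy_counts
```

Applied to dta_eps_wrt, the same call should give the same table as the longer version, assuming the column names are as shown above.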

And what about popularity by writer?

dta_eps_wrt |> 
  group_by(writer) |>  
  mutate(
    n_written = n()
  ) |> 
  ungroup() |> 
  filter(n_written >= 5) |> 
  ggplot(aes(x = fct_reorder(writer, rating), y= rating)) + 
  geom_boxplot() + 
  coord_flip() + 
  labs(
    # labels follow the aesthetics, not the flipped axes
    x = "Writer",
    y = "Rating", 
    title = "Rating distribution by writer",
    subtitle = "Writers who wrote at least five episodes"
  )

When were the different writers active?

major_writers_active <- 
  dta_eps_wrt |> 
    group_by(writer) |>  
    mutate(
      n_written = n()
    ) |> 
    ungroup() |> 
    filter(n_written >= 5) |> 
    group_by(writer) |> 
    summarise(
      started_writing = min(first_aired),
      finished_writing = max(first_aired),
      n_written = n_written[1]
    ) |> 
    ungroup() |> 
    mutate(
      yr_start = year(started_writing),
      yr_end = year(finished_writing)
    )

major_writers_active |> 
  arrange(started_writing)

# A tibble: 6 × 6
  writer           started_writing finished_writing n_written yr_start yr_end
  <chr>            <date>          <date>               <int>    <dbl>  <dbl>
1 Russell T Davies 2005-03-26      2010-01-01              31     2005   2010
2 Mark Gatiss      2005-04-09      2017-06-10               9     2005   2017
3 Steven Moffat    2005-05-21      2017-12-25              45     2005   2017
4 Toby Whithouse   2006-04-29      2017-06-03               7     2006   2017
5 Gareth Roberts   2007-04-07      2011-09-24               5     2007   2011
6 Chris Chibnall   2007-05-19      2022-10-23              29     2007   2022

Here we see the tenure of the different major writers. Russell T Davies and Steven Moffat are the major players.
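The tenures are easier to compare as a chart than as a table. The following is one possible sketch, not part of the original session: it retypes the dates from the printed table into a small tibble (writers_active and tenure_plot are names introduced here for illustration) and draws one horizontal segment per writer with geom_segment().

```r
library(ggplot2)

# Dates copied from the printed major_writers_active table
writers_active <- tibble::tribble(
  ~writer,            ~started_writing, ~finished_writing,
  "Russell T Davies", "2005-03-26",     "2010-01-01",
  "Mark Gatiss",      "2005-04-09",     "2017-06-10",
  "Steven Moffat",    "2005-05-21",     "2017-12-25",
  "Toby Whithouse",   "2006-04-29",     "2017-06-03",
  "Gareth Roberts",   "2007-04-07",     "2011-09-24",
  "Chris Chibnall",   "2007-05-19",     "2022-10-23"
)
writers_active$started_writing  <- as.Date(writers_active$started_writing)
writers_active$finished_writing <- as.Date(writers_active$finished_writing)

# Order the writers by the date of their first episode
writers_active$writer <- factor(
  writers_active$writer,
  levels = writers_active$writer[order(writers_active$started_writing)]
)

# One segment per writer, from first to last episode aired
tenure_plot <- ggplot(
  writers_active,
  aes(x = started_writing, xend = finished_writing, y = writer, yend = writer)
) +
  geom_segment(linewidth = 2) +
  labs(
    x = "First aired",
    y = "Writer",
    title = "Active period of major Dr Who writers"
  )

tenure_plot
```

In the live session the same plot could be built directly from major_writers_active instead of retyping the dates.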

Coda

Neither of us knows much about Dr Who!

But hopefully we now know a bit more!