Tidy Tuesday: Roaming US Holidays

R
USA
holidays
Tidy Tuesday
Authors

Brendan Clarke

Kate Pyper

Kennedy Owusu-Afriyie

Myriam Scansetti

Andrew Saul

Jon Minton

Published

June 19, 2024

For this Tardy Tuesday session we looked at the following Tidy Tuesday data challenge, which involved identifying the date when various public holidays in the USA (a rare thing) are expected to occur each year.

Brendan led/‘scribed’ the session

Analysis

We used the tidytuesdayR package to load the data, then pushed these to the global environment using list2env.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
tidytuesdayR::tt_load('2024-06-18') |>
  list2env(envir = .GlobalEnv)
--- Compiling #TidyTuesday Information for 2024-06-18 ----
--- There are 2 files available ---
--- Starting Download ---

    Downloading file 1 of 2: `federal_holidays.csv`
    Downloading file 2 of 2: `proposed_federal_holidays.csv`
--- Download complete ---
<environment: R_GlobalEnv>

Our main dataset looked as follows:

federal_holidays
# A tibble: 11 × 6
   date  date_definition official_name year_established date_established details
   <chr> <chr>           <chr>                    <dbl> <date>           <chr>  
 1 Janu… fixed date      New Year's D…             1870 1870-06-28       "Celeb…
 2 Janu… 3rd monday      Birthday of …             1983 1983-11-02       "Honor…
 3 Febr… 3rd monday      Washington's…             1879 NA               "Honor…
 4 May … last monday     Memorial Day              1868 NA               "Honor…
 5 June… fixed date      Juneteenth N…             2021 2021-06-17       "Comme…
 6 July… fixed date      Independence…             1870 NA               "Celeb…
 7 Sept… 1st monday      Labor Day                 1894 NA               "Honor…
 8 Octo… 2nd monday      Columbus Day              1968 NA               "Honor…
 9 Nove… fixed date      Veterans Day              1938 NA               "Honor…
10 Nove… 4th thursday    Thanksgiving…             1941 NA               "Tradi…
11 Dece… fixed date      Christmas Day             1870 NA               "The m…

We were interested in those ‘roaming holidays’ where the date column contains a range of dates, and the date definition contains information on the criterion used to determine the specific date for a given year.

We decided to try to solve the problem manually for MLK day, which should be the third monday in January.

# find 3rd monday of january 202x
wday("2024-06-17")
[1] 2
date_range <- "January 15–21"
year <- 2024

# find monday (2) in date range

start_date <- "January 15 2024"
end_date <- "January 21 2024"
mdy(start_date)
[1] "2024-01-15"
str_split(date_range, "–")
[[1]]
[1] "January 15" "21"        
month <- str_extract(date_range, "[a-zA-Z]+") 

dates <- str_extract_all(date_range, "[0-9]+") |>
  unlist()

start_end <- ymd(paste(year, month, dates))

# ymd("2024-01-16") %within% interval(start_end[1], start_end[2])

dates_between <- seq(from = start_end[1], to = start_end[2], by = 1)

dates_between[wday(dates_between) == 2]
[1] "2024-01-15"

Then, we generalised this slightly by producing a function that finds the date of MLK day for different given years:

mlk_day <- function(year){
  date_range <- "January 15–21"
  
  month <- stringr::str_extract(date_range, "[a-zA-Z]+") 
  
  dates <- stringr::str_extract_all(date_range, "[0-9]+") |>
    unlist()
  
  start_end <- lubridate::ymd(paste(year, month, dates))
  
  dates_between <- seq(from = start_end[1], to = start_end[2], by = 1)
  
  dates_between[wday(dates_between) == 2] # update day for general
}

mlk_day(2025)
[1] "2025-01-20"
map_vec(1983:2025, mlk_day)
 [1] "1983-01-17" "1984-01-16" "1985-01-21" "1986-01-20" "1987-01-19"
 [6] "1988-01-18" "1989-01-16" "1990-01-15" "1991-01-21" "1992-01-20"
[11] "1993-01-18" "1994-01-17" "1995-01-16" "1996-01-15" "1997-01-20"
[16] "1998-01-19" "1999-01-18" "2000-01-17" "2001-01-15" "2002-01-21"
[21] "2003-01-20" "2004-01-19" "2005-01-17" "2006-01-16" "2007-01-15"
[26] "2008-01-21" "2009-01-19" "2010-01-18" "2011-01-17" "2012-01-16"
[31] "2013-01-21" "2014-01-20" "2015-01-19" "2016-01-18" "2017-01-16"
[36] "2018-01-15" "2019-01-21" "2020-01-20" "2021-01-18" "2022-01-17"
[41] "2023-01-16" "2024-01-15" "2025-01-20"

Finally, we generalised this further with a function to find the date of roaming holidays for many different types of holiday and years:

any_day <- function(year, date, date_definition){
  
  day <- stringr::str_extract(date_definition, " .+$") |>
    str_trim() 
  
  weekdays <- paste0(c("sun", "mon", "tues", "wednes", "thurs", "fri", "satur"), "day")
  
  day_no <- which(weekdays == day)
  
  month <- stringr::str_extract(date, "[a-zA-Z]+") 
  
  dates <- stringr::str_extract_all(date, "[0-9]+") |>
    unlist()
  
  start_end <- lubridate::ymd(paste(year, month, dates))
  
  dates_between <- seq(from = start_end[1], to = start_end[2], by = 1)
  
  dates_between[wday(dates_between) == day_no] # update day for general
}

any_day(2024, "January 15–21", "3rd monday")
[1] "2024-01-15"
movers <- federal_holidays |>
  filter(date_definition != "fixed date") |>
  expand_grid(year = 1983:2024) 

movers <- movers |>
  bind_cols(actual_date = pmap_vec(list(year = movers$year, date = movers$date, date_definition = movers$date_definition), any_day))

movers
# A tibble: 252 × 8
   date  date_definition official_name year_established date_established details
   <chr> <chr>           <chr>                    <dbl> <date>           <chr>  
 1 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 2 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 3 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 4 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 5 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 6 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 7 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 8 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
 9 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
10 Janu… 3rd monday      Birthday of …             1983 1983-11-02       Honors…
# ℹ 242 more rows
# ℹ 2 more variables: year <int>, actual_date <date>