For this session we chose to focus on a data wrangling challenge that Andrew brought to the table, rather than the most recent dataset. The challenge involved converting slightly messy data structure from the first format seen below, to the second (tidy) structure:
# A tibble: 1 × 9
region la la_code house_count house_sold flat_count flat_sold all_count
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Scotland Minyip M394932 1000 900 600 300 1600
# ℹ 1 more variable: all_sold <chr>
This solution involved a mixture of tidyverse and base R expressions. Brendan made the point that tidyverse is great at doing 80% of the work needed quickly, but sometimes base R is needed to complete the remaining 20%.
Other solutions
Andrew proposed the following function for doing the data cleaning.
merge_rows <-function(df){ # eg. lst[[1]]# convert rows1&2 into columns row1 <- df[1, ] |>t() row2 <- df[2, ] |>t() # remove selected text then fill down NAs row1 <-str_remove(row1, " Stats in Nov") |>as_tibble() |>fill(value) |>pull() |>replace_na("") row3 <-str_c(row1, " ", row2) |>str_trim()# create same header vector as original df header_vec <-character()for (i inseq_along(df)){ header_vec[i] <-str_c("...", i) }# create tibble with header and 1st row of df tib <-t(row3) |>as_tibble() |>set_names(header_vec) return(tib)}create_header_df <-function(df) {merge_rows(df) |>bind_rows(df) |> janitor::row_to_names(1) |>clean_names()}create_header_df(df) |>slice(-c(1:2))
After Kate solved Andrew’s problem with lots of time to spare, she then ran through a real-world example that makes use of some similar approaches. The use of unlist() to extract vectors to work on was a big part of both solutions.
We all agreed Kate was excellent with these kinds of data wrangling challenges, despite coding with no prior warning of the challenge and wet hair.