Tidy Tuesday on Life Expectancy

R
tidy tuesday
Life Expectancy
Authors

Nick Christofides

Jon Minton

Sandra Nwobi

Published

December 6, 2023

This week’s Tidy Tuesday compares life expectancy across the globe and is available here:

Loading the data

Code
library(tidyverse)
library(tidytuesdayR)
data_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-12-05/life_expectancy_different_ages.csv"
 
dta <- read_csv(data_url)

# Alternatively
tuesdata <- tidytuesdayR::tt_load('2023-12-05')

    Downloading file 1 of 3: `life_expectancy.csv`
    Downloading file 2 of 3: `life_expectancy_different_ages.csv`
    Downloading file 3 of 3: `life_expectancy_female_male.csv`
Code
names(tuesdata)
[1] "life_expectancy"                "life_expectancy_different_ages"
[3] "life_expectancy_female_male"   
Code
life_expectancy <- tuesdata$life_expectancy

n_distinct(life_expectancy$Entity)
[1] 261

Setting a global plot theme

Code
theme_set(theme_minimal())

Comparing life expectancy across regions

Code
regions <- life_expectancy %>%
  filter(str_detect(Entity, "region"))

regions %>%
  ggplot(aes(x = Year, y = LifeExpectancy)) +
  geom_line(aes(col = Entity)) +
  theme(legend.position = "top") +
  annotate(geom = "text",
           x = 1960, y = 50,
           label = "What happened here?") +
  geom_vline(xintercept = 2019, linetype = 2) +
  annotate(geom = "text", x = 2019, y = 75,
           label = "Start of COVID pandemic",
           hjust = 1)

Difference in life expectancy between more and less developed regions

Code
regions %>%
  filter(Entity %in% c("More developed regions",
                       "Less developed regions"))  %>%
  arrange(Year, Entity) %>%
  group_by(Year) %>%
  mutate(difference = LifeExpectancy - lag(LifeExpectancy)) %>%
  filter(!is.na(difference)) %>%
  ungroup() %>%
  ggplot(aes(x = Year, y = difference)) +
  geom_area(alpha = 0.5) +
  expand_limits(y = 0) +
  labs(title = "Difference in life expectancy between more developed and less developed regions",
       y = "Difference in life expectancy (years)")

We look at life expectancy at different ages in three specific countries.

Code
data_tidy <-
  dta |>
    pivot_longer(
      cols = LifeExpectancy0:LifeExpectancy80
    ) |>
    mutate(
      starting_age = str_remove(name, "LifeExpectancy") %>%
        as.numeric()
    ) |>
    select(-name) |>
    rename(e_x = value)
Code
data_tidy |>
  filter(
    Entity %in% c(
      "Nigeria", "Iran",
      "South Africa"
    )
  ) |>
  arrange(Year)  |>
  ggplot(aes(Year, e_x, group = factor(starting_age), colour = factor(starting_age))) +
  geom_line() +
  facet_wrap(~Entity)

Sandra Nwobi, who suggested the three countries above, provides the following summary:

Of the three developing countries—Iran, South Africa, and Nigeria—Nigeria has a significantly higher zero-age death rate in the late 50s and early 60s. This can be attributed to a number of factors, including socioeconomic instability, political unrest, malnutrition, and limited access to healthcare. Comparing this result to South Africa and Iran, it is comparatively higher. However, there have been noticeable improvements in Nigeria during the 1980s, with a steady increase. Nevertheless, much work needs to be done to combat this in Nigeria, as it performs significantly worse than the other two countries.

There was a noticeable decline in data in the early 2000s, particularly in South Africa. Health crises like HIV/AIDS, which may have affected people between the ages of 0 and 25, as well as a number of social and economic problems may have contributed to this decline.

Iran’s data indicates consistent growth across all age groups over the years, with the exception of a general decline in 2020 that was likely caused by the COVID-19 virus. Out of the three countries, South Africa is the most affected, maybe as a result of a much older demography compared to Nigeria.