Ridge Plot Strava Rides

In this post, I want to explore my hardest Strava rides. By ‘hard’ rides, I mean the activities with the greatest total positive altitude gain, in other words the rides with the most climbing. As a visualisation technique, I wanted to try out a so-called ‘Ridge Plot’. This type of plot resembles the iconic cover art of Joy Division’s album Unknown Pleasures.

First, load all my Strava activities from a private GitHub repository. If you want to create a repository with all of your own Strava activities, you can do so by following the instructions in this blog post.

library(tidyverse)
library(lubridate)
library(ggridges)
library(pins)

theme_set(theme_light())
board_register_github(repo = "duju211/strava_data")

df_meas_raw <- pin_get("strava_meas", board = "github")

board_disconnect("github")

Filter for all “Ride” activities. Also drop every measurement with a missing altitude value and keep only the measurements recorded while I was moving.

df_bike_rides_raw <- df_meas_raw %>% 
  filter(type == "Ride", !is.na(altitude), moving)

Nest the data by id and calculate important KPIs for every ride.

df_bike_rides_kpi <- df_bike_rides_raw %>%
  nest(ride_data = -id) %>% 
  mutate(
    # Fraction of the ride completed at each measurement, in (0, 1]
    progress = map(ride_data, ~ seq_len(nrow(.x)) / nrow(.x)),
    # Total climbing: sum of all positive altitude differences
    pos_altitude = map_dbl(
      ride_data, 
      ~ sum(pmax(diff(.x$altitude), 0))),
    # Highest point of the ride
    max_altitude = map_dbl(ride_data, ~ max(.x$altitude)))

The pos_altitude column holds the total positive altitude gain, and the max_altitude column holds the highest point of each ride. The progress column runs over (0, 1], so that the progress of every ride is on the same scale. All of these new columns are used later in the final visualisation.
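To make the altitude-gain calculation concrete, here is a minimal sketch on a made-up altitude profile. The numbers are purely illustrative and not taken from my rides. It uses base R's diff() to get the altitude difference between consecutive measurements and keeps only the positive ones.

```r
# Hypothetical altitude profile in metres (illustrative values only)
altitude <- c(100, 105, 103, 110, 110, 108)

# Differences between consecutive measurements: 5, -2, 7, 0, -2
steps <- diff(altitude)

# Keep only the climbs (negative steps become 0) and add them up
pos_altitude <- sum(pmax(steps, 0))
pos_altitude
## [1] 12

# Progress as a fraction of the ride, ending at exactly 1
progress <- seq_along(altitude) / length(altitude)
```

Descents do not cancel out climbs here: the ride above ends only 8 m higher than it started, but the total climbing is 12 m.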

For each ride there is one resulting row with all the calculated KPIs and the measurements nested in a list column (ride_data).

df_bike_rides_kpi
## # A tibble: 167 x 5
##            id ride_data              progress       pos_altitude max_altitude
##         <dbl> <list>                 <list>                <dbl>        <dbl>
##  1 3371274479 <tibble [5,358 x 12]>  <dbl [5,358]>          655.         923 
##  2 3356503514 <tibble [12,122 x 12]> <dbl [12,122]>        1008.         983.
##  3 3345884420 <tibble [12,496 x 12]> <dbl [12,496]>        1438.         984.
##  4 3340985913 <tibble [5,998 x 12]>  <dbl [5,998]>          709.         903 
##  5 3312058282 <tibble [3,807 x 12]>  <dbl [3,807]>          449.         903.
##  6 3307711963 <tibble [5,837 x 12]>  <dbl [5,837]>          694.         895.
##  7 3289469810 <tibble [11,432 x 12]> <dbl [11,432]>        1384.         946.
##  8 3279693358 <tibble [11,981 x 12]> <dbl [11,981]>         997.         967.
##  9 3258708906 <tibble [9,460 x 12]>  <dbl [9,460]>          941.         953 
## 10 3253894818 <tibble [12,884 x 12]> <dbl [12,884]>        1152.         865.
## # ... with 157 more rows

There are 167 activities in the data, which is too many to display reasonably in a single plot. Because of this, determine the top 15 rides by the calculated pos_altitude column. Afterwards, turn the id column into a factor whose levels are ordered by max_altitude. This helps to display the activities in a reasonable order and avoids overplotting later.

n_rides <- 15

df_bike_rides_top_n_nested <- df_bike_rides_kpi %>%
  top_n(n = n_rides, wt = pos_altitude) %>% 
  arrange(max_altitude) %>%
  mutate(id = fct_inorder(as.character(id)))

df_bike_rides_top_n_nested
## # A tibble: 15 x 5
##    id         ride_data              progress       pos_altitude max_altitude
##    <fct>      <list>                 <list>                <dbl>        <dbl>
##  1 3289469810 <tibble [11,432 x 12]> <dbl [11,432]>        1384.         946.
##  2 3217214645 <tibble [12,605 x 12]> <dbl [12,605]>        1162.         949.
##  3 1956833517 <tibble [12,785 x 12]> <dbl [12,785]>        1267.         951 
##  4 2782824418 <tibble [13,456 x 12]> <dbl [13,456]>        1353.         958.
##  5 1564670479 <tibble [15,141 x 12]> <dbl [15,141]>        1656.         964.
##  6 3197748921 <tibble [13,536 x 12]> <dbl [13,536]>        1327.         965.
##  7 3194353432 <tibble [11,380 x 12]> <dbl [11,380]>        1152.         966.
##  8 2414596388 <tibble [12,341 x 12]> <dbl [12,341]>        1339.         975 
##  9 2304838230 <tibble [20,613 x 12]> <dbl [20,613]>        1794.         983 
## 10 3345884420 <tibble [12,496 x 12]> <dbl [12,496]>        1438.         984.
## 11 2822209559 <tibble [10,823 x 12]> <dbl [10,823]>        1285.         985.
## 12 2302099156 <tibble [17,322 x 12]> <dbl [17,322]>        2010.        1065.
## 13 2547075904 <tibble [14,901 x 12]> <dbl [14,901]>        1465.        1201.
## 14 2525681356 <tibble [12,307 x 12]> <dbl [12,307]>        1308.        1230 
## 15 2519983912 <tibble [10,581 x 12]> <dbl [10,581]>        1396.        1357.
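The effect of the arrange() and fct_inorder() combination is easiest to see on a toy example. The sketch below uses a small hypothetical tibble (the ids and values are invented) and the same three steps as above: the factor levels follow the row order after sorting, not the alphabetical order of the ids.

```r
library(dplyr)
library(forcats)

# Hypothetical rides (invented ids and values, for illustration only)
toy <- tibble(
  id = c(11, 22, 33, 44),
  pos_altitude = c(500, 1500, 1200, 900),
  max_altitude = c(800, 1300, 950, 700))

toy_top <- toy %>%
  top_n(n = 2, wt = pos_altitude) %>%   # keeps ids 22 and 33
  arrange(max_altitude) %>%             # 33 (950 m) before 22 (1300 m)
  mutate(id = fct_inorder(as.character(id)))

levels(toy_top$id)
## [1] "33" "22"
```

A plain factor(as.character(id)) would instead sort the levels alphabetically ("22", "33") and lose the ordering by highest point.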

Unnest the data to get it into the right shape for a ggplot visualisation.

df_bike_rides_top_n <- df_bike_rides_top_n_nested %>% 
  unnest(c(ride_data, progress))
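As a rough illustration of what this step does, the sketch below unnests two list columns in parallel on a tiny invented data set. The column names mirror the ones above, but the values are hypothetical. Each ride contributes as many rows as it has measurements, and the ride-level id is repeated on every row.

```r
library(tibble)
library(tidyr)

# Two hypothetical rides with 2 and 3 measurements respectively
nested <- tibble(
  id = c("a", "b"),
  ride_data = list(
    tibble(altitude = c(100, 110)),
    tibble(altitude = c(200, 190, 205))),
  progress = list(c(1 / 2, 1), c(1 / 3, 2 / 3, 1)))

# Both list columns are expanded together, row by row
flat <- unnest(nested, c(ride_data, progress))

nrow(flat)
## [1] 5
```

Unnesting both columns in a single call only works because, for every ride, ride_data and progress have the same number of elements.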

Final visualisation:

df_bike_rides_top_n %>%
  mutate(pos_altitude = pos_altitude / 1000) %>% 
  ggplot(
    aes(
      x = progress, y = id, height = altitude, group = id,
      fill = pos_altitude)) +
    geom_ridgeline(scale = 0.0032) +
    theme(
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.text.x = element_blank(),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      panel.background = element_blank(),
      panel.border = element_blank(),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      plot.background = element_blank(),
      legend.position = "bottom", legend.text = element_text(angle = 90)) +
    scale_fill_viridis_b() +
    labs(
      title = str_glue(
        "My Top {n_rides} Strava bike rides (by total positive altitude)"),
      subtitle = "Sorted in descending order by highest point of the ride",
      caption = str_glue("As of: {today()}"),
      fill = "Positive Altitude [1000m]")
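The scale argument of geom_ridgeline() deserves a short note. With a factor on the y axis, neighbouring ridgelines sit one unit apart, and scale multiplies the height values before they are drawn. The following minimal sketch uses invented data and a hypothetical scale_factor of 0.001, so that 1000 m of altitude spans exactly the distance between two rides. It is an illustration of the mechanism, not a recommendation for a specific value.

```r
library(ggplot2)
library(ggridges)

# Two hypothetical rides with four measurements each (invented values)
toy <- data.frame(
  progress = rep(c(0.25, 0.5, 0.75, 1), times = 2),
  id = factor(rep(c("low", "high"), each = 4), levels = c("low", "high")),
  altitude = c(500, 600, 550, 500, 800, 1000, 900, 800))

# Assumed scaling: 1000 m of altitude corresponds to one unit on the y axis
scale_factor <- 0.001

p <- ggplot(toy, aes(x = progress, y = id, height = altitude, group = id)) +
  geom_ridgeline(scale = scale_factor)

p
```

Larger values make the ridgelines overlap more, which gives the stacked look of the final plot. Smaller values keep the rides visually separated.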