After doing The Incomparable, I had to go see what I could do with Relay.fm’s data as well, naturally.
So this is the beginning of that. The source code is included in this repo, and, in case anyone asks:
Yes, you can use these graphs for whatever. Maybe link back to me, that would be cool.

Please note that there’s probably still stuff coming.

Overview

Relay.fm spans 35 shows and a combined runtime of over 118 days spread across 2574 episodes since 2014.

In 2016, there were 752 episodes published, and we’re currently about 104.7% of the way there at 787 episodes this year.

Shows by Age

readRDS("data/relay.rds") %>% group_by(podcast) %>% mutate(d_min = min(date), 
    d_max = max(date)) %>% ggplot(aes(x = reorder(podcast, d_min), color = show_status)) + 
    geom_errorbar(aes(ymin = d_min, ymax = d_max), width = 0.5, size = 2, alpha = 0.75) + 
    coord_flip() + scale_x_discrete(position = "top") + scale_color_brewer(palette = "Dark2") + 
    labs(x = "", y = "First to Last Episode Date", title = "Relay.fm", subtitle = "Show Timelines", 
        caption = caption, color = "Status") + theme(legend.position = "top")

Shows per Year

relay %>% group_by(podcast, year) %>% tally %>% ggplot(aes(x = as.factor(year), 
    weight = n, fill = podcast)) + geom_bar() + scale_y_continuous(breaks = seq(0, 
    1000, 100), minor_breaks = seq(0, 1000, 25)) + scale_fill_viridis(discrete = TRUE, 
    guide = guide_legend(ncol = 1, keyheight = 0.75, )) + labs(x = "Year", y = "# of Episodes", 
    fill = "Show", title = "Relay.fm", subtitle = "Episodes per Year", caption = caption) + 
    theme_ipsum(grid = "y") + theme(legend.key.size = unit(0.5, "cm"))

Shows per Week

relay %>% mutate(week = week(date), year_num = year(date)) %>% filter(year_num >= 
    2014) %>% group_by(year, week) %>% tally %>% {
    ggplot(data = ., aes(x = week, y = n)) + geom_point() + geom_smooth(method = lm, 
        se = F, color = "red") + scale_y_continuous(breaks = seq(0, 100, 5), 
        minor_breaks = seq(0, 100, 1)) + scale_x_continuous(breaks = seq(0, 
        55, 16)) + facet_wrap(~year) + labs(title = "Relay.fm", subtitle = "Weekly Episode Output", 
        x = "Week of Year", y = "Episodes Published", caption = caption) + theme_ipsum(grid = "y")
}

Regularity

relay %>% mutate(week = isoweek(date), year_num = year(date)) %>% filter(year_num == 
    2017) %>% group_by(podcast, year, week) %>% tally %>% ggplot(aes(x = week, 
    y = n)) + geom_col() + # scale_y_continuous(breaks = seq(0, 10, 1)) + scale_x_continuous(breaks =
# seq(0, 60, 10)) +
facet_wrap(~podcast, ncol = 3) + labs(title = "Relay.fm", subtitle = "Episodes per Week in 2017", 
    x = "Week of Year", y = "# fo Episodes", caption = caption) + theme_ipsum(grid = "y") + 
    theme(axis.text.y = element_text(size = rel(0.5)))

Regularity and Age

relay %>% group_by(podcast) %>% summarize(first_aired = min(date), last_aired = max(date), 
    episodes = length(unique(number)), duration = sum(duration)) %>% mutate(lifetime_weeks = as.numeric(difftime(last_aired, 
    first_aired, units = "weeks")), episodes_per_week = episodes/lifetime_weeks, 
    duration_per_week = duration/lifetime_weeks) %>% ggplot(aes(x = lifetime_weeks, 
    y = episodes_per_week)) + geom_point() + geom_label_repel(aes(label = podcast), 
    size = 3) + geom_hline(aes(yintercept = mean(episodes_per_week)), linetype = "dotted", 
    color = "blue") + geom_vline(aes(xintercept = mean(lifetime_weeks)), linetype = "dotted", 
    color = "blue") + scale_y_continuous(breaks = seq(0, 5, 0.25), minor_breaks = seq(0, 
    5, 0.1)) + labs(title = "Relay.fm", subtitle = "Show Lifetime and Weekly Output", 
    x = "Lifetime of Podcast in Weeks", y = "Average # of Episodes per Week", 
    caption = caption) + theme_ipsum(grid = "y")

Shows per Month

relay %>% mutate(month = month(date), year_num = year(date)) %>% filter(year_num >= 
    2014) %>% group_by(year, month) %>% tally %>% {
    ggplot(data = ., aes(x = month, y = n)) + geom_point() + geom_smooth(method = lm, 
        se = F, color = "red") + scale_y_continuous(breaks = seq(0, 100, 15), 
        minor_breaks = seq(0, 100, 5)) + scale_x_continuous(breaks = seq(0, 
        13, 2), minor_breaks = seq(0, 13, 1)) + facet_grid(. ~ year, space = "free_x", 
        scales = "free_x") + labs(title = "Relay.fm", subtitle = "Monthly Episode Output", 
        x = "Month of Year", y = "Episodes Published", caption = caption) + 
        theme_ipsum()
}

Runtimes

All Shows

relay %>% filter(year >= 2011) %>% ggplot(aes(x = date, y = duration, color = podcast)) + 
    geom_point(alpha = 0.5) + scale_x_date(date_breaks = "1 years", date_labels = "%Y", 
    date_minor_breaks = "1 month") + scale_y_continuous(breaks = seq(0, 300, 
    60), minor_breaks = seq(0, 300, 15), limits = c(0, NA)) + scale_color_viridis(discrete = TRUE, 
    guide = guide_legend(ncol = 1, keyheight = 0.75, )) + labs(title = "Relay.fm", 
    subtitle = "Episode Durations of Every Episode Since 2011", x = "Date Published", 
    y = "Duration (mins)", color = "Show", caption = caption)

relay %>% filter(podcast != "B-Sides") %>% {
    ggplot(., aes(x = date, y = duration)) + geom_point() + expand_limits(y = 0) + 
        scale_x_date(breaks = date_breaks("12 months"), minor_breaks = date_breaks("6 month"), 
            date_labels = "%Y") + scale_y_continuous(breaks = seq(0, 300, 30), 
        minor_breaks = seq(0, 300, 15)) + facet_wrap(~podcast, ncol = 5) + labs(title = "Relay.fm", 
        subtitle = "Episode Durations", x = "Date Published", y = "Duration (mins)", 
        caption = caption) + theme_ipsum(grid = "y") + theme(axis.text.x = element_text(angle = 90, 
        vjust = 0.5))
}

Histogram

relay %>% ggplot(aes(x = duration)) + geom_histogram(binwidth = 2.5) + geom_density(aes(y = ..count.., 
    group = 1), fill = "black", alpha = 0.3) + scale_x_continuous(breaks = seq(0, 
    300, 15), minor_breaks = seq(0, 300, 5)) + scale_y_continuous(breaks = seq(0, 
    300, 15), minor_breaks = seq(0, 300, 5)) + labs(title = "Relay.fm", subtitle = "Episode Durations", 
    y = "Frequency", x = "Duration (mins)", caption = caption) + theme_ipsum(grid = "y")

Average by Show

Boxplots

relay %>% # filter(podcast != 'B-Sides') %>%
group_by(podcast) %>% mutate(dur_mean = mean(duration)) %>% {
    ggplot(., aes(x = forcats::fct_reorder(podcast, dur_mean), y = duration)) + 
        geom_boxplot() + stat_summary(fun.y = "mean", geom = "point", size = 2, 
        color = "red") + coord_flip() + scale_y_continuous(breaks = seq(0, 300, 
        30), minor_breaks = seq(0, 300, 15)) + labs(title = "Relay.fm", subtitle = "Episode Duration — Boxplot + Mean (red)", 
        x = "", y = "Duration (mins)", caption = caption) + theme_ipsum()
}

95% CI

relay %>% # filter(podcast != 'B-Sides') %>%
group_by(podcast) %>% mutate(dur_mean = mean(duration)) %>% {
    ggplot(., aes(x = forcats::fct_reorder(podcast, dur_mean), y = duration)) + 
        stat_summary(fun.data = mean_cl_normal, geom = "errorbar") + stat_summary(fun.y = "mean", 
        geom = "point", size = 2, color = "red") + coord_flip() + expand_limits(y = 0) + 
        scale_y_continuous(breaks = seq(0, 300, 15), minor_breaks = seq(0, 300, 
            5)) + labs(title = "Relay.fm", subtitle = "Episode Duration — 95% CI + Mean (red)", 
        x = "", y = "Duration (mins)", caption = caption) + theme_ipsum()
}

Shitty Statistics

The following is merely done for fun and for me to try out some things.
Nothing I do here should actually be considered statistically sound or something.

The following plot shows the proportion of shows airing (or that used to air) on a given day which are retired or active.
We can infer from this that “monday shows” get retired more often, relatively speaking.

relay %>% count(show_status, weekday) %>% mutate(weekday = forcats::fct_rev(weekday)) %>% 
    group_by(weekday) %>% mutate(p = n/sum(n)) %>% ggplot(aes(x = weekday, p, 
    fill = show_status)) + geom_col() + coord_flip() + scale_y_percent() + scale_fill_brewer(palette = "Dark2") + 
    labs(title = "Relay.fm", subtitle = "Show Status by Publishing Day of Week", 
        x = "", y = "Proportion (within Day of Week)", fill = "Status", caption = caption) + 
    theme(legend.position = "top")

…Which, obviously, is bullshitty because podcasts don’t work like live television, and when a show is published should not by any means indicate whether or not it will be retired. At least I can’t think of any reason.

To put it in statsy terms:

relay %>% mutate(status_bin = if_else(show_status == "Retired", 1, 0), weekday = factor(weekday, 
    ordered = FALSE)) %>% glm(formula = status_bin ~ weekday, data = ., family = binomial) %>% 
    sjPlot::sjt.glm(exp.coef = TRUE, separate.ci.col = FALSE)
    status_bin
    Odds Ratio (CI) p
(Intercept)   0.04
(0.02 – 0.08)
<.001
weekday
Monday   7.27
(3.73 – 16.38)
<.001
Tuesday   4.19
(2.06 – 9.70)
<.001
Wednesday   4.64
(2.34 – 10.59)
<.001
Thursday   2.59
(1.29 – 5.96)
.014
Friday   3.63
(1.72 – 8.63)
.002
Saturday   3.07
(0.78 – 10.39)
.081
Observations   2574

Monday shows are 7 times more like to be retires than sunday shows (which is the reference level), while wednesday shows are 4 times more like. Also, friday shows are half as likely to be retired compared to monday shows.

Huh.

As I said, this is not methodologically sound, as I just dumped data into a logistic regression model and looked at the results, but if the next Relay show to be retires aired on wednesdays or mondays, I’ll have my hands full with hindsight bias, so there’s that.

People

Sadly I don’t have a lot of data about people besides the hosts of each show. I could try to pry out the guests of shows out of each show’s RSS feed, but sadly this information is not neatly presented in the feed, but rather would require an amount of regexing I am not prepared, and therere for not able to do.
Be that as it may, here’s a bit about the hosts.

relay %>% separate(col = people, sep = ",\\s", into = paste0("Host", 1:10)) %>% 
    gather(role, person, starts_with("Host")) %>% mutate(role = "Host") %>% 
    filter(!is.na(person)) %>% count(person) %>% ggplot(aes(x = reorder(person, 
    n), y = n)) + geom_col() + coord_flip() + labs(title = "Relay.fm", subtitle = "Person by Number of Episodes", 
    x = "", y = "# of Episodes Hosted", caption = caption)


Creative Commons Lizenzvertrag