After doing The Incomparable, I had to go see what I could do with Relay.fm’s data as well, naturally.
So this is the beginning of that. The source code is included in this repo, and, in case anyone asks:
Yes, you can use these graphs for whatever. Maybe link back to me, that would be cool.

Please note that there’s probably still stuff coming.

Overview

Relay.fm spans 39 shows and a combined runtime of over 142 days spread across 3276 episodes since 2014.

In 2017, there were 850 episodes published, and we’re currently about 75.2% of the way there at 639 episodes this year.

Shows by Age

readRDS("data/relay.rds") %>% group_by(podcast) %>% mutate(d_min = min(date), 
    d_max = max(date)) %>% ggplot(aes(x = reorder(podcast, d_min), color = show_status)) + 
    geom_errorbar(aes(ymin = d_min, ymax = d_max), width = 0.5, size = 2, alpha = 0.75) + 
    coord_flip() + scale_x_discrete(position = "top") + scale_color_brewer(palette = "Dark2") + 
    labs(x = "", y = "First to Last Episode Date", title = "Relay.fm", subtitle = "Show Timelines", 
        caption = caption, color = "Status") + theme(legend.position = "top")

Shows per Year

relay %>% group_by(podcast, year) %>% tally %>% ggplot(aes(x = as.factor(year), 
    weight = n, fill = podcast)) + geom_bar() + scale_y_continuous(breaks = seq(0, 
    1000, 100), minor_breaks = seq(0, 1000, 25)) + scale_fill_viridis(discrete = TRUE, 
    guide = guide_legend(ncol = 1, keyheight = 0.75, )) + labs(x = "Year", y = "# of Episodes", 
    fill = "Show", title = "Relay.fm", subtitle = "Episodes per Year", caption = caption) + 
    theme_ipsum(grid = "y") + theme(legend.key.size = unit(0.5, "cm"))

Shows per Week

relay %>% mutate(week = week(date), year_num = year(date)) %>% filter(year_num >= 
    2014) %>% group_by(year, week) %>% tally %>% {
    ggplot(data = ., aes(x = week, y = n)) + geom_point() + geom_smooth(method = lm, 
        se = F, color = "red") + scale_y_continuous(breaks = seq(0, 100, 5), 
        minor_breaks = seq(0, 100, 1)) + scale_x_continuous(breaks = seq(0, 
        55, 16)) + facet_wrap(~year) + labs(title = "Relay.fm", subtitle = "Weekly Episode Output", 
        x = "Week of Year", y = "Episodes Published", caption = caption) + theme_ipsum(grid = "y")
}

Regularity

relay %>% mutate(week = isoweek(date), year_num = year(date)) %>% filter(year_num == 
    2017) %>% group_by(podcast, year, week) %>% tally %>% ggplot(aes(x = week, 
    y = n)) + geom_col() + # scale_y_continuous(breaks = seq(0, 10, 1)) + scale_x_continuous(breaks =
# seq(0, 60, 10)) +
facet_wrap(~podcast, ncol = 3) + labs(title = "Relay.fm", subtitle = "Episodes per Week in 2017", 
    x = "Week of Year", y = "# fo Episodes", caption = caption) + theme_ipsum(grid = "y") + 
    theme(axis.text.y = element_text(size = rel(0.5)))

Regularity and Age

relay %>% group_by(podcast) %>% summarize(first_aired = min(date), last_aired = max(date), 
    episodes = length(unique(number)), duration = sum(duration)) %>% mutate(lifetime_weeks = as.numeric(difftime(last_aired, 
    first_aired, units = "weeks")), episodes_per_week = episodes/lifetime_weeks, 
    duration_per_week = duration/lifetime_weeks) %>% ggplot(aes(x = lifetime_weeks, 
    y = episodes_per_week)) + geom_point() + geom_label_repel(aes(label = podcast), 
    size = 3) + geom_hline(aes(yintercept = mean(episodes_per_week)), linetype = "dotted", 
    color = "blue") + geom_vline(aes(xintercept = mean(lifetime_weeks)), linetype = "dotted", 
    color = "blue") + scale_y_continuous(breaks = seq(0, 5, 0.25), minor_breaks = seq(0, 
    5, 0.1)) + labs(title = "Relay.fm", subtitle = "Show Lifetime and Weekly Output", 
    x = "Lifetime of Podcast in Weeks", y = "Average # of Episodes per Week", 
    caption = caption) + theme_ipsum(grid = "y")

Shows per Month

relay %>% mutate(month = month(date), year_num = year(date)) %>% filter(year_num >= 
    2014) %>% group_by(year, month) %>% tally %>% {
    ggplot(data = ., aes(x = month, y = n)) + geom_point() + geom_smooth(method = lm, 
        se = F, color = "red") + scale_y_continuous(breaks = seq(0, 100, 15), 
        minor_breaks = seq(0, 100, 5)) + scale_x_continuous(breaks = seq(0, 
        13, 2), minor_breaks = seq(0, 13, 1)) + facet_grid(. ~ year, space = "free_x", 
        scales = "free_x") + labs(title = "Relay.fm", subtitle = "Monthly Episode Output", 
        x = "Month of Year", y = "Episodes Published", caption = caption) + 
        theme_ipsum()
}

Runtimes

All Shows

relay %>% filter(year >= 2011) %>% ggplot(aes(x = date, y = duration, color = podcast)) + 
    geom_point(alpha = 0.5) + scale_x_date(date_breaks = "1 years", date_labels = "%Y", 
    date_minor_breaks = "1 month") + scale_y_continuous(breaks = seq(0, 300, 
    60), minor_breaks = seq(0, 300, 15), limits = c(0, NA)) + scale_color_viridis(discrete = TRUE, 
    guide = guide_legend(ncol = 1, keyheight = 0.75, )) + labs(title = "Relay.fm", 
    subtitle = "Episode Durations of Every Episode Since 2011", x = "Date Published", 
    y = "Duration (mins)", color = "Show", caption = caption)

relay %>% filter(podcast != "B-Sides") %>% {
    ggplot(., aes(x = date, y = duration)) + geom_point() + expand_limits(y = 0) + 
        scale_x_date(breaks = date_breaks("12 months"), minor_breaks = date_breaks("6 month"), 
            date_labels = "%Y") + scale_y_continuous(breaks = seq(0, 300, 30), 
        minor_breaks = seq(0, 300, 15)) + facet_wrap(~podcast, ncol = 5) + labs(title = "Relay.fm", 
        subtitle = "Episode Durations", x = "Date Published", y = "Duration (mins)", 
        caption = caption) + theme_ipsum(grid = "y") + theme(axis.text.x = element_text(angle = 90, 
        vjust = 0.5))
}

Histogram

relay %>% ggplot(aes(x = duration)) + geom_histogram(binwidth = 2.5) + geom_density(aes(y = ..count.., 
    group = 1), fill = "black", alpha = 0.3) + scale_x_continuous(breaks = seq(0, 
    300, 15), minor_breaks = seq(0, 300, 5)) + scale_y_continuous(breaks = seq(0, 
    300, 15), minor_breaks = seq(0, 300, 5)) + labs(title = "Relay.fm", subtitle = "Episode Durations", 
    y = "Frequency", x = "Duration (mins)", caption = caption) + theme_ipsum(grid = "y")

Average by Show

Boxplots

relay %>% # filter(podcast != 'B-Sides') %>%
group_by(podcast) %>% mutate(dur_mean = mean(duration)) %>% {
    ggplot(., aes(x = forcats::fct_reorder(podcast, dur_mean), y = duration)) + 
        geom_boxplot() + stat_summary(fun.y = "mean", geom = "point", size = 2, 
        color = "red") + coord_flip() + scale_y_continuous(breaks = seq(0, 300, 
        30), minor_breaks = seq(0, 300, 15)) + labs(title = "Relay.fm", subtitle = "Episode Duration — Boxplot + Mean (red)", 
        x = "", y = "Duration (mins)", caption = caption) + theme_ipsum()
}

95% CI

relay %>% # filter(podcast != 'B-Sides') %>%
group_by(podcast) %>% mutate(dur_mean = mean(duration)) %>% {
    ggplot(., aes(x = forcats::fct_reorder(podcast, dur_mean), y = duration)) + 
        stat_summary(fun.data = mean_cl_normal, geom = "errorbar") + stat_summary(fun.y = "mean", 
        geom = "point", size = 2, color = "red") + coord_flip() + expand_limits(y = 0) + 
        scale_y_continuous(breaks = seq(0, 300, 15), minor_breaks = seq(0, 300, 
            5)) + labs(title = "Relay.fm", subtitle = "Episode Duration — 95% CI + Mean (red)", 
        x = "", y = "Duration (mins)", caption = caption) + theme_ipsum()
}

Shitty Statistics

The following is merely done for fun and for me to try out some things.
Nothing I do here should actually be considered statistically sound or something.

The following plot shows the proportion of shows airing (or that used to air) on a given day which are retired or active.
We can infer from this that “monday shows” get retired more often, relatively speaking.

relay %>% count(show_status, weekday) %>% mutate(weekday = forcats::fct_rev(weekday)) %>% 
    group_by(weekday) %>% mutate(p = n/sum(n)) %>% ggplot(aes(x = weekday, p, 
    fill = show_status)) + geom_col() + coord_flip() + scale_y_percent() + scale_fill_brewer(palette = "Dark2") + 
    labs(title = "Relay.fm", subtitle = "Show Status by Publishing Day of Week", 
        x = "", y = "Proportion (within Day of Week)", fill = "Status", caption = caption) + 
    theme(legend.position = "top")

…Which, obviously, is bullshitty because podcasts don’t work like live television, and when a show is published should not by any means indicate whether or not it will be retired. At least I can’t think of any reason.

To put it in statsy terms:

relay %>% mutate(status_bin = if_else(show_status == "Retired", 1, 0), weekday = factor(weekday, 
    ordered = FALSE)) %>% glm(formula = status_bin ~ weekday, data = ., family = binomial) %>% 
    sjPlot::sjt.glm(exp.coef = TRUE, separate.ci.col = FALSE)
    status_bin
    Odds Ratio (CI) p
(Intercept)   0.06
(0.03 – 0.10)
<.001
weekday
Monday   8.10
(4.63 – 15.60)
<.001
Tuesday   2.35
(1.28 – 4.69)
.009
Wednesday   2.47
(1.37 – 4.84)
.005
Thursday   2.66
(1.49 – 5.18)
.002
Friday   3.62
(1.97 – 7.23)
<.001
Saturday   1.29
(0.35 – 3.87)
.671
Observations   3276

Monday shows are 7 times more like to be retires than sunday shows (which is the reference level), while wednesday shows are 4 times more like. Also, friday shows are half as likely to be retired compared to monday shows.

Huh.

As I said, this is not methodologically sound, as I just dumped data into a logistic regression model and looked at the results, but if the next Relay show to be retires aired on wednesdays or mondays, I’ll have my hands full with hindsight bias, so there’s that.

People

Sadly I don’t have a lot of data about people besides the hosts of each show. I could try to pry out the guests of shows out of each show’s RSS feed, but sadly this information is not neatly presented in the feed, but rather would require an amount of regexing I am not prepared, and therere for not able to do.
Be that as it may, here’s a bit about the hosts.

relay %>% separate(col = people, sep = ",\\s", into = paste0("Host", 1:10)) %>% 
    gather(role, person, starts_with("Host")) %>% mutate(role = "Host") %>% 
    filter(!is.na(person)) %>% count(person) %>% ggplot(aes(x = reorder(person, 
    n), y = n)) + geom_col() + coord_flip() + labs(title = "Relay.fm", subtitle = "Person by Number of Episodes", 
    x = "", y = "# of Episodes Hosted", caption = caption)


Creative Commons Lizenzvertrag