Podcast Stats
  • Overview
  • The Incomparable
    • The Motshership
    • The Network
  • Relay FM
  • ATP
  • Data

On this page

  • Episode Duration
    • Duration by Year
    • Duration by Month
  • Weekday Heatmap
  • Raw Data

Accidental Tech Podcast

Author

Lukas Burk

Updated

May 4, 6:22 (UTC)

Data in this summary is limited to the 100 most recent episodes in the feed.
At least that was the case until November 2016, when Marco pointed out to me that there was, indeed, a better place for me to scrape episode data.

In the meantime, I had taken John’s advice and started downloading every ATP episode ever as mp3 to manually mediainfo the duration out of them in a script, only to realize that the first few episodes didn’t have title information in them. This left only the option of crawling the ATP website for full episode metadata (which was annoying because of non-trivial URL parameters for pagination), which I was very much about to do… until I got Marco’s reply.
So… thanks Marco for saving me from the horrible fate that would have been a while loop in R.

At least that was the case, but ever since the ATP site relaunched in… 2020 or so? it was feasible so scrape the site and just leave it at that, as overcast feeds were not easily available programmatically anyway. It was a whole thing and I neglected to write any of it down.
Whoopsie.

Episode Duration

Code
ggplot(atp, aes(x = date, y = duration)) +
  geom_point(alpha = .5, size = 2) +
  geom_label_repel(
    data = slice_max(atp, duration, n = 5),
    aes(label = glue::glue("{number}: {title}"))
  ) +
  expand_limits(y = 0) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "red") +
  scale_x_date(
    breaks = date_breaks("12 months"),
    minor_breaks = date_breaks("6 months"),
    date_labels = "%Y"
  ) +
  scale_y_time(
    breaks = hms(minutes = seq(0, 1e6, 30)),
    labels = \(x) {
      stringr::str_replace(x, ":\\d\\d$", "")
    }
  ) +
  labs(
    title = "ATP Episode Durations",
    subtitle = "All episodes durations with trend line.\n5 longest episodes are labelled with their episode number.",
    x = "Date Published",
    y = "Duration (H:M)",
    caption = caption
  )

Duration by Year

Code
atp |>
  group_by(year) |>
  ggplot(aes(x = as.factor(year), y = duration)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", color = "red") +
  scale_y_time(
    breaks = hms(minutes = seq(0, 1e6, 30)),
    labels = \(x) {
      stringr::str_replace(x, ":\\d\\d$", "")
    }
  ) +
  labs(
    title = "ATP Episode Durations per Year",
    subtitle = "Duration by year with average (dot)",
    x = "",
    y = "Duration (H:M)",
    caption = caption
  ) +
  theme(
    panel.grid.major.x = element_blank()
  )

Duration by Month

Code
atp |>
  ggplot(aes(x = month, y = duration)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", color = "red") +
  scale_x_discrete(
    guide = guide_axis(n.dodge = 2)
  ) +
  scale_y_time(
    breaks = hms(minutes = seq(0, 1e6, 30))
  ) +
  labs(
    title = "ATP Episode Durations grouped by Month",
    subtitle = "Duration by month with average (dot)",
    x = NULL,
    y = duration_mins,
    caption = caption
  ) +
  theme(
    panel.grid.major.x = element_blank()
  )

Weekday Heatmap

Code
atp |>
  mutate(year = forcats::fct_rev(factor(year))) |>
  count(year, weekday, .drop = FALSE) |>
  ggplot(aes(x = weekday, y = year, fill = n, color = n)) +
  geom_tile() +
  scale_color_viridis_c() +
  scale_fill_viridis_c() +
  labs(
    title = "ATP Release Day",
    subtitle = "Frequency of releases per day of week",
    x = "Day of Week",
    y = "Year",
    caption = caption,
    fill = "Count",
    color = "Count"
  ) +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

Raw Data

Code
atp |>
  mutate(
    duration = round(duration / 60, 2),
    date = as_date(date)
  ) |>
  select(number, title, duration, date, year, month) |>
  datatable(
    style = "bootstrap",
    colnames = c("#", "Title", "Duration (mins)", "Date", "Year", "Month"),
    rownames = FALSE
  )
 
  • About