At least that was the case until November 2016, when Marco pointed out to me that there was, indeed, a better place for me to scrape episode data.
In the meantime, I had taken John’s advice and started downloading every ATP episode ever as mp3 to manually mediainfo the duration out of them in a script, only to realize that the first few episodes didn’t have title information in them. This left only the option of crawling the ATP website for full episode metadata (which was annoying because of non-trivial URL parameters for pagination), which I was very much about to do… until I got Marco’s reply.
So… thanks Marco for saving me from the horrible fate that would have been a while loop in R.
At least that was the case, but ever since the ATP site relaunched in… 2020 or so? it was feasible so scrape the site and just leave it at that, as overcast feeds were not easily available programmatically anyway. It was a whole thing and I neglected to write any of it down. Whoopsie.
Episode Duration
Code
ggplot(atp, aes(x = date, y = duration)) +geom_point(alpha = .5, size =2) +geom_label_repel(data =slice_max(atp, duration, n =5),aes(label = glue::glue("{number}: {title}")) ) +expand_limits(y =0) +geom_smooth(method = lm, formula = y ~ x, se =FALSE, color ="red") +scale_x_date(breaks =date_breaks("12 months"),minor_breaks =date_breaks("6 months"),date_labels ="%Y" ) +scale_y_time(breaks =hms(minutes =seq(0, 1e6, 30)) ) +labs(title ="ATP Episode Durations",subtitle ="All episodes by duration",x ="Date Published", y = duration_mins, caption = caption )
Duration by Year
Code
atp |>group_by(year) |>ggplot(aes(x =as.factor(year), y = duration)) +geom_boxplot() +stat_summary(fun = mean, geom ="point", color ="red") +scale_y_time(breaks =hms(minutes =seq(0, 1e6, 30)) ) +labs(title ="ATP Episode Durations per Year",subtitle ="Duration by year with average (dot)",x ="", y = duration_mins, caption = caption ) +theme(panel.grid.major.x =element_blank() )
Duration by Month
Code
atp |>ggplot(aes(x = month, y = duration)) +geom_boxplot() +stat_summary(fun = mean, geom ="point", color ="red") +scale_x_discrete(guide =guide_axis(n.dodge =2) ) +scale_y_time(breaks =hms(minutes =seq(0, 1e6, 30)) ) +labs(title ="ATP Episode Durations grouped by Month",subtitle ="Duration by month with average (dot)",x =NULL, y = duration_mins, caption = caption ) +theme(panel.grid.major.x =element_blank() )
Weekday Heatmap
Code
atp |>mutate(year = forcats::fct_rev(factor(year))) |>count(year, weekday, .drop =FALSE) |>ggplot(aes(x = weekday, y = year, fill = n, color = n)) +geom_tile() +scale_color_viridis_c() +scale_fill_viridis_c() +labs(title ="ATP Release Day",subtitle ="Frequency of releases per day of week",x ="Day of Week", y ="Year", caption = caption,fill ="Count", color ="Count" ) +theme(panel.grid.major =element_blank(),panel.grid.minor =element_blank() )