Using base R (as I did previously) means thinking about what do I want plotted where. Using ggplot means thinking about what job is the data doing in this graph. This is just running through what I was thinking when making my own version the Scientific American births per year style graph, but using ggplot rather than base plot.

library(readxl) library(dplyr) library(tidyr) library(ggplot2)

Taking some data of births per year. In this case I am measuring percentage bloacks away from the mean, so transform the data into the day of the year, and births on that day as a proportion of the annual mean.

if(!file.exists("births.xlsx")){ download.file("http://www.stats.govt.nz/~/media/Statistics/browse-categories/population/pop-birthdays-table/most-common-birthdays-19802014.xlsx", destfile="births.xlsx", mode="wb") } birthnums <- read_excel("births.xlsx", sheet = 2, skip=2) names(birthnums)[1] <- "day" long1 <- birthnums %>% gather(month,births, January:December) %>% filter(!is.na(births) & !(month=="February" & day==29)) %>% mutate(day_yr = 1:n(), perc=births/mean(births)) %>% select(day_yr, perc)

Because days are discrete intervals, I am going to double up and displace the data to make both edges of the interval. This is to give a discrete interval bar graph effect to a continious line style graph. Because I intend to use geom_ribbon(), I also need a return path along the annual mean value.

long1$baseline <- 1 long1$trueday <- long1$day_yr long2 <- long1 long1$part <- 1 long2$part <- 2 long2$day_yr <- long2$day_yr + 1 long3 <- rbind(long1, long2) %>% arrange(trueday,part)

This does kind of work for what it does at this point,

ggplot(long3) + coord_polar() + geom_ribbon(aes(ymin=baseline, ymax=perc, x=day_yr), fill="red")

To get multiple bands, we are going to want a group for each range, because in ggplot I think in terms of having the data to do a job on the graph. The ultimate aim is to overplot each of the data generated ribbons so that the wider ribbons go first with the narrower ones later.

This involves relacing unwanted entries with the mean value (1), and creating multiple sets of data than move in through the bands to the mean. This is all automatable with a function, but I will show manually figuring out the logic here and anyone interested can abstract it to a function.

First we need the range, in order to know the bands

range(long3$perc) ## [1] 0.7572485 1.1001570

So bands of steps of .75 to 1.15 will be steps of 5% difference (with little visible in the last step). Doing a test set:

long_postive1.15 <- long3 long_postive1.15$perc[long_postive1.15$perc < 1] <- 1 long_postive1.15$ranges <- "+10% to +15%" long_postive1.05 <- long3 long_postive1.05$perc[long_postive1.05$perc < 1] <- 1 long_postive1.05$perc[long_postive1.05$perc > 1.05] <- 1.05 long_postive1.05$ranges <- "0 to +5%" grf <- rbind(long_postive1.15, long_postive1.05) ggplot() + coord_polar() + geom_blank(data=long3, aes(x=day_yr, y=perc)) + geom_ribbon(data=grf,aes(ymin=baseline, ymax=perc, x=day_yr, group=ranges, fill=ranges))

That happens to work because the factors are naturally in the correct order, so let’s run with it.

long_postive1.15 <- long3 long_postive1.15$perc[long_postive1.15$perc < 1] <- 1 long_postive1.15$ranges <- "+10% to +15%" long_postive1.10 <- long3 long_postive1.10$perc[long_postive1.10$perc < 1] <- 1 long_postive1.10$perc[long_postive1.10$perc > 1.10] <- 1.10 long_postive1.10$ranges <- "+5% to +10%" long_postive1.05 <- long3 long_postive1.05$perc[long_postive1.05$perc < 1] <- 1 long_postive1.05$perc[long_postive1.05$perc > 1.05] <- 1.05 long_postive1.05$ranges <- "0% to +5%" long_postive0.75 <- long3 long_postive0.75$perc[long_postive0.75$perc > 1] <- 1 long_postive0.75$ranges <- "-25% to -20%" long_postive0.80 <- long3 long_postive0.80$perc[long_postive0.80$perc > 1] <- 1 long_postive0.80$perc[long_postive0.80$perc < 0.8] <- 0.8 long_postive0.80$ranges <- "-20% to -15%" long_postive0.85 <- long3 long_postive0.85$perc[long_postive0.85$perc > 1] <- 1 long_postive0.85$perc[long_postive0.85$perc < 0.85] <- 0.85 long_postive0.85$ranges <- "-15% to -10%" long_postive0.90 <- long3 long_postive0.90$perc[long_postive0.90$perc > 1] <- 1 long_postive0.90$perc[long_postive0.90$perc < 0.90] <- 0.90 long_postive0.90$ranges <- "-10% to -5%" long_postive0.95 <- long3 long_postive0.95$perc[long_postive0.95$perc > 1] <- 1 long_postive0.95$perc[long_postive0.95$perc < 0.95] <- 0.95 long_postive0.95$ranges <- "-5% to 0%" grf <- rbind(long_postive1.15, long_postive1.10, long_postive1.05, long_postive0.75, long_postive0.80, long_postive0.85, long_postive0.90, long_postive0.95) ggplot() + coord_polar() + geom_blank(data=long3, aes(x=day_yr, y=perc)) + geom_ribbon(data=grf,aes(ymin=baseline, ymax=perc, x=day_yr, group=ranges, fill=ranges))

The data is great, but we need to control the order of the factors.

grf$`Difference to mean` <- factor(grf$ranges, levels=c("-25% to -20%", "-20% to -15%", "-15% to -10%", "-10% to -5%", "-5% to 0%", "+10% to +15%", "+5% to +10%", "0% to +5%")) ggplot() + coord_polar() + geom_blank(data=long3, aes(x=day_yr, y=perc)) + geom_ribbon(data=grf,aes(ymin=baseline, ymax=perc, x=day_yr, group=`Difference to mean`, fill=`Difference to mean`))

Now it is a matter of adding a bunch of settings to make it look better.

clrs <- c("#1D5C8F", "#266f9B", "#2F84A3", "#3498A1", "#16AE99", "#E76525", "#F38320", "#F7A21C") legendorder=c("-25% to -20%", "-20% to -15%", "-15% to -10%", "-10% to -5%", "-5% to 0%", "0% to +5%", "+5% to +10%", "+10% to +15%") linedf <- data.frame(ax = c(1,1), ay=c(1.05,1.15)) ggplot() + coord_polar() + ylim(.4,1.15) + ggtitle("New Zealand births by day of year, over/under average") + theme_classic() + theme(axis.title.x = element_blank(), axis.title.y = element_blank(), axis.line=element_blank(), axis.ticks =element_blank(), axis.text.x=element_blank(), axis.text.y=element_blank()) + geom_blank(data=long3, aes(x=day_yr, y=perc)) + geom_ribbon(data=grf,aes(ymin=baseline, ymax=perc, x=day_yr, group=`Difference to mean`, fill=`Difference to mean`)) + geom_line(data=grf,aes(x=day_yr, y=1), colour="#444444", lwd=0.5) + geom_line(data=linedf,aes(x=ax, y=ay), colour="#444444") + annotate(geom="text", x=15, y=1.1, label="January", size=3) + annotate(geom="text", x=350, y=1.1, label="December", size=3) + scale_fill_manual(values=clrs, breaks=legendorder)

So that’s one approach.

Thanks for the code, one thing I couldn’t figure out was how you did the coloring by percent, mine kept coming out with the color in blocks instead of accurately representing the values…

https://wordpress.com/post/rforearth.wordpress.com/664

LikeLike

I would be wondering if what you were grouping by was et appropriately in the graph aes- I don’t know this is the case, but if everything is precalculated into variables before making the graph, I would expect the variable with what band it is in combines with geom_ribbon to make each ribbon at the height of the percentiles. My guess is if you do not have data for a there and back path for the ribbon with groups of which ribbon, the process may go astray.

LikeLike