Uncategorized

Circular banded graphs for ggplot

 

Using base R (as I did previously) means thinking about what do I want plotted where. Using ggplot means thinking about what job is the data doing in this graph. This is just running through what I was thinking when making my own version the Scientific American births per year style graph, but using ggplot rather than base plot.

library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)

Taking some data of births per year. In this case I am measuring percentage bloacks away from the mean, so transform the data into the day of the year, and births on that day as a proportion of the annual mean.

if(!file.exists("births.xlsx")){
 download.file("http://www.stats.govt.nz/~/media/Statistics/browse-categories/population/pop-birthdays-table/most-common-birthdays-19802014.xlsx", destfile="births.xlsx", mode="wb")
}
birthnums <- read_excel("births.xlsx", sheet = 2, skip=2)
names(birthnums)[1] <- "day"
long1 <- birthnums %>% gather(month,births, January:December) %>%
 filter(!is.na(births) & !(month=="February" & day==29)) %>%
 mutate(day_yr = 1:n(), perc=births/mean(births)) %>% 
 select(day_yr, perc)

Because days are discrete intervals, I am going to double up and displace the data to make both edges of the interval. This is to give a discrete interval bar graph effect to a continious line style graph. Because I intend to use geom_ribbon(), I also need a return path along the annual mean value.

long1$baseline <- 1
long1$trueday <- long1$day_yr
long2 <- long1
long1$part <- 1
long2$part <- 2
long2$day_yr <- long2$day_yr + 1
long3 <- rbind(long1, long2) %>% arrange(trueday,part)

This does kind of work for what it does at this point,

ggplot(long3) + coord_polar() +
 geom_ribbon(aes(ymin=baseline,
 ymax=perc,
 x=day_yr), fill="red")

To get multiple bands, we are going to want a group for each range, because in ggplot I think in terms of having the data to do a job on the graph. The ultimate aim is to overplot each of the data generated ribbons so that the wider ribbons go first with the narrower ones later.

This involves relacing unwanted entries with the mean value (1), and creating multiple sets of data than move in through the bands to the mean. This is all automatable with a function, but I will show manually figuring out the logic here and anyone interested can abstract it to a function.

First we need the range, in order to know the bands

range(long3$perc)
## [1] 0.7572485 1.1001570

So bands of steps of .75 to 1.15 will be steps of 5% difference (with little visible in the last step). Doing a test set:

long_postive1.15 <- long3
long_postive1.15$perc[long_postive1.15$perc < 1] <- 1
long_postive1.15$ranges <- "+10% to +15%"

long_postive1.05 <- long3
long_postive1.05$perc[long_postive1.05$perc < 1] <- 1
long_postive1.05$perc[long_postive1.05$perc > 1.05] <- 1.05
long_postive1.05$ranges <- "0 to +5%"

grf <- rbind(long_postive1.15, long_postive1.05)

ggplot() + coord_polar() + 
 geom_blank(data=long3, aes(x=day_yr,
 y=perc)) +
 geom_ribbon(data=grf,aes(ymin=baseline,
 ymax=perc,
 x=day_yr,
 group=ranges,
 fill=ranges))

That happens to work because the factors are naturally in the correct order, so let’s run with it.

long_postive1.15 <- long3
long_postive1.15$perc[long_postive1.15$perc < 1] <- 1
long_postive1.15$ranges <- "+10% to +15%"

long_postive1.10 <- long3
long_postive1.10$perc[long_postive1.10$perc < 1] <- 1
long_postive1.10$perc[long_postive1.10$perc > 1.10] <- 1.10
long_postive1.10$ranges <- "+5% to +10%"

long_postive1.05 <- long3
long_postive1.05$perc[long_postive1.05$perc < 1] <- 1
long_postive1.05$perc[long_postive1.05$perc > 1.05] <- 1.05
long_postive1.05$ranges <- "0% to +5%"

long_postive0.75 <- long3
long_postive0.75$perc[long_postive0.75$perc > 1] <- 1
long_postive0.75$ranges <- "-25% to -20%"

long_postive0.80 <- long3
long_postive0.80$perc[long_postive0.80$perc > 1] <- 1
long_postive0.80$perc[long_postive0.80$perc < 0.8] <- 0.8
long_postive0.80$ranges <- "-20% to -15%"

long_postive0.85 <- long3
long_postive0.85$perc[long_postive0.85$perc > 1] <- 1
long_postive0.85$perc[long_postive0.85$perc < 0.85] <- 0.85
long_postive0.85$ranges <- "-15% to -10%"

long_postive0.90 <- long3
long_postive0.90$perc[long_postive0.90$perc > 1] <- 1
long_postive0.90$perc[long_postive0.90$perc < 0.90] <- 0.90
long_postive0.90$ranges <- "-10% to -5%"

long_postive0.95 <- long3
long_postive0.95$perc[long_postive0.95$perc > 1] <- 1
long_postive0.95$perc[long_postive0.95$perc < 0.95] <- 0.95
long_postive0.95$ranges <- "-5% to 0%"

grf <- rbind(long_postive1.15,
 long_postive1.10,
 long_postive1.05,
 long_postive0.75,
 long_postive0.80,
 long_postive0.85,
 long_postive0.90,
 long_postive0.95)

ggplot() + coord_polar() + 
 geom_blank(data=long3, aes(x=day_yr,
 y=perc)) +
 geom_ribbon(data=grf,aes(ymin=baseline,
 ymax=perc,
 x=day_yr,
 group=ranges,
 fill=ranges))

 

The data is great, but we need to control the order of the factors.

grf$`Difference to mean` <- factor(grf$ranges, levels=c("-25% to -20%",
 "-20% to -15%",
 "-15% to -10%",
 "-10% to -5%",
 "-5% to 0%",
 "+10% to +15%",
 "+5% to +10%",
 "0% to +5%"))
ggplot() + coord_polar() + 
 geom_blank(data=long3, aes(x=day_yr,
 y=perc)) +
 geom_ribbon(data=grf,aes(ymin=baseline,
 ymax=perc,
 x=day_yr,
 group=`Difference to mean`,
 fill=`Difference to mean`))

Now it is a matter of adding a bunch of settings to make it look better.

clrs <- c("#1D5C8F",
 "#266f9B",
 "#2F84A3",
 "#3498A1",
 "#16AE99",
 "#E76525",
 "#F38320",
 "#F7A21C")
legendorder=c("-25% to -20%",
 "-20% to -15%",
 "-15% to -10%",
 "-10% to -5%",
 "-5% to 0%",
 "0% to +5%",
 "+5% to +10%",
 "+10% to +15%")
linedf <- data.frame(ax = c(1,1), ay=c(1.05,1.15))
ggplot() + coord_polar() + 
 ylim(.4,1.15) + ggtitle("New Zealand births by day of year, over/under average") + theme_classic() +
 theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
 axis.line=element_blank(), axis.ticks =element_blank(),
 axis.text.x=element_blank(), axis.text.y=element_blank()) +
 geom_blank(data=long3, aes(x=day_yr, y=perc)) +
 geom_ribbon(data=grf,aes(ymin=baseline,
 ymax=perc,
 x=day_yr,
 group=`Difference to mean`,
 fill=`Difference to mean`)) +
 geom_line(data=grf,aes(x=day_yr, y=1), colour="#444444", lwd=0.5) +
 geom_line(data=linedf,aes(x=ax, y=ay), colour="#444444") +
 annotate(geom="text", x=15, y=1.1, label="January", size=3) +
 annotate(geom="text", x=350, y=1.1, label="December", size=3) +
 scale_fill_manual(values=clrs, breaks=legendorder) 

So that’s one approach.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s