
WheRe in the woRld is R

I decided to check world usage of R.

After a few false starts (the CRAN download logs don’t include IP blocks), I settled on using Google Trends to find the frequency of the search term

R Data

and downloaded the interest by region table. But being from tiny New Zealand, I am also interested in an interest per capita measure, so I used the number of internet users in each country as the population that could be searching Google for information about R Data.
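As a rough sketch of that per capita measure: it is just the Trends score divided by the number of internet users, scaled to 100000 users. The function name and the figures below are made up for illustration; they are not real Trends or population data.

```r
# Hypothetical helper: interest score per 100000 internet users.
per_100k_users <- function(score, internet_users) {
  100000 * score / internet_users
}

# Invented figures for two imaginary countries (not real data):
# a small country with a moderate score and a large one with a high score.
score <- c(small = 50, large = 80)
internet_users <- c(small = 4e6, large = 2e8)

round(per_100k_users(score, internet_users), 2)
```

The small country comes out well ahead even though its raw score is lower, which is the effect a per capita ranking is meant to bring out.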

Singapore wins both by a long way, which is not actually surprising as the government there actively supported people learning R for analytics.

 

Country                     Raw score    Per 100000 Internet Users
Singapore                         100    2.15
New Zealand                        67    1.76
Ireland                            58    1.53
Switzerland                        79    1.13
Denmark                            58    1.1
Norway                             46    0.94
Finland                            45    0.93
Austria                            38    0.57
Israel                             31    0.52
Sweden                             42    0.46
Belgium                            39    0.45
Netherlands                        47    0.3
Australia                          55    0.28
Korea, (South) Republic of         73    0.16
South Africa                       46    0.16
Canada                             51    0.15
Taiwan                             30    0.15
Malaysia                           23    0.11
Poland                             20    0.08
Spain                              21    0.06
Italy                              19    0.05
Germany                            34    0.05
Philippines                        22    0.05
Indonesia                          28    0.04
France                             18    0.03
United States                      65    0.02
India                              77    0.02
Turkey                              6    0.02
Mexico                              9    0.02
Brazil                             12    0.01
Russia (Russian Fed.)               9    0.01
Japan                               8    0.01

 

New Zealand is a clear second per capita, suggesting that if you said you were having data frame problems in an urban coffee shop, people might know what you meant.

[Map: R interest per 100000 Internet Users]

The raw interest map, by contrast, highlights countries with large populations and tech centres.

[Map: raw R interest]

And the code I used:

 

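library(countrycode)
# geoMap.csv is the interest by region table downloaded from Google Trends
Rcountries <- read.csv("geoMap.csv")
# the country names are read in as row names; convert them to ISO 2-letter codes
Rcountries$cnames <- countrycode(rownames(Rcountries), 'country.name', 'iso2c')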


library(rvest)
url <- "http://www.internetworldstats.com/list2.htm"
# Scrape every table on the page into a list of data frames
htmltables <- url %>% read_html() %>% html_nodes("table") %>% html_table(fill = TRUE)
# how many tables were found
length(htmltables)
# check which tables have the data by looking through them like this
View(htmltables[[19]])
View(htmltables[[7]])
# etc. through the number of tables
# these four tables held the country data; stack them into one data frame
p1 <- htmltables[[7]]
p2 <- htmltables[[10]]
p3 <- htmltables[[13]]
p4 <- htmltables[[16]]
countries <- rbind(p1,p2,p3,p4)

# join on the 2-letter country code, dropping Trends rows that got no code
countriesplus <- merge(countries, Rcountries[!is.na(Rcountries$cnames),], by.x="X2", by.y="cnames")
# strip the thousands separators before converting to numbers
countriesplus$internet <- as.numeric(gsub(",","",countriesplus$X5))
countriesplus$population <- as.numeric(gsub(",","",countriesplus$X4))
countriesplus$Category..All.categories <- as.numeric(as.character(countriesplus$Category..All.categories))
# Trends score per 100000 internet users
countriesplus$ratioper100000 <- 100000 *countriesplus$Category..All.categories/countriesplus$internet

# sort from highest to lowest per capita interest (this is the table above)
RUsage <- countriesplus[order(-countriesplus$ratioper100000),]

library(rworldmap)

for_map <- joinCountryData2Map( countriesplus, joinCode = "ISO2", nameJoinColumn = "X2")
mapBits <- mapCountryData( for_map, nameColumnToPlot="Category..All.categories", addLegend=FALSE,
 catMethod = "pretty", numCats=10, mapTitle = "raw R interest")
do.call( addMapLegend, c(mapBits, legendWidth=0.5, legendMar = 2))

mapBits <- mapCountryData( for_map, nameColumnToPlot="ratioper100000", addLegend=FALSE,
 catMethod = "pretty", numCats=10, mapTitle = "R interest per 100000 Internet Users")
do.call( addMapLegend, c(mapBits, legendWidth=0.5, legendMar = 2))
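For anyone who wants to follow the merge and clean-up steps without downloading anything, here is a minimal sketch on toy data. The country names, codes, column names and numbers are all invented for illustration and only mimic the shape of the two inputs; they are not the real Trends or internet user tables.

```r
# Toy stand-in for the Trends table: scores with country names as row names.
trends <- data.frame(score = c(40, 90, 10),
                     row.names = c("Atlantis", "Freedonia", "Nowhere"))
# Pretend the last name could not be matched to a code.
trends$code <- c("AA", "BB", NA)

# Toy stand-in for the scraped table: counts arrive as comma-formatted text.
users <- data.frame(code = c("AA", "BB"),
                    internet = c("1,500,000", "60,000,000"),
                    stringsAsFactors = FALSE)

# Drop unmatched rows, then join on the shared code.
joined <- merge(users, trends[!is.na(trends$code), ], by = "code")

# Strip thousands separators before converting to numbers.
joined$internet <- as.numeric(gsub(",", "", joined$internet))
joined$per100k <- 100000 * joined$score / joined$internet

# Highest per capita interest first.
joined[order(-joined$per100k), c("code", "score", "per100k")]
```

Filtering out the NA codes before merging keeps unmatched names from joining against each other or silently disappearing later, and converting the text counts to numbers has to happen before any division.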