Maps: R-Downloads around the world

1 Summary

To see the code used in this post, visit my Kaggle kernel, which is in R Markdown format.

  • Objectives: To spatially visualise a dataset to understand global trends.
  • Challenge: Using the tmap package.
  • Data points: 7,282,808
  • Language: R

2 Question

Which countries download R the most, and where are they? Which are the most intensive R users, measured by R downloads per 1,000 capita? Which months of the year are the most popular times to download R? Does it depend on the season? What variable could we attempt to use as a predictor of R downloads? Could we use linear regression for inference?

3 Dataset description

I used the Tidy Tuesday dataset for R downloads posted on 30th October 2018. The dataset contained 938,115 observations and eight features: a unique id, download date, time, size, version, os, country, and IP id. The downloads cover a year, running from 2017-10-20 to 2018-10-20. To avoid overlap at the endpoints and keep exactly one year, I kept only downloads up to 2018-10-19. After removing missing values, 910,351 observations remained.
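The two cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the code from the kernel; the column names `date` and `country` are assumptions, and the real file has eight columns.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the Tidy Tuesday file; column names are assumed.
downloads <- tibble(
  date    = ymd(c("2017-10-20", "2018-03-01", "2018-10-19", "2018-10-20")),
  country = c("US", NA, "CN", "DE")
)

clean <- downloads %>%
  filter(date <= ymd("2018-10-19")) %>%  # keep exactly one year of downloads
  na.omit()                              # drop rows with any missing value
```

On the toy data, the row dated 2018-10-20 and the row with a missing country are both dropped.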

3.1 Total R Downloads

So what are the top ten countries with the most downloads?

Figure 3.1: The US has almost six times as many downloads as China, the second-ranked country.
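A ranking like this one can be produced by counting rows per country and keeping the largest ten. The sketch below assumes one row per download and a column named `country`; the toy values are made up and are not the post's results.

```r
library(dplyr)

# Toy data: one row per download; `country` is an assumed column name.
downloads <- tibble(country = c("US", "US", "US", "CN", "CN", "DE"))

top_countries <- downloads %>%
  count(country, name = "downloads", sort = TRUE) %>%  # downloads per country, largest first
  head(10)                                             # keep at most the ten largest
```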

And what about other countries and regions?

Figure 3.2: Besides the US, China, Germany, and the other countries in the top 10, the map shows which countries in each region downloaded R the most: Colombia and Brazil in South America; Mexico in Central America; and Italy, France and Poland in Europe.
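A choropleth of this kind can be drawn with tmap by joining per-country counts onto country polygons. The sketch below is not the post's code: it uses the `World` dataset that ships with tmap, assumes the counts are keyed by three-letter ISO codes (`iso_a3`), and uses made-up numbers purely for illustration.

```r
library(dplyr)
library(tmap)

data("World")                               # country polygons shipped with tmap
World$iso_a3 <- as.character(World$iso_a3)  # make sure the join key is character

# Made-up counts for illustration only; not the post's results.
toy_downloads <- tibble(
  iso_a3    = c("USA", "CHN", "DEU"),
  downloads = c(10, 5, 3)
)

# Attach the counts to the polygons; countries without data get NA.
world_downloads <- left_join(World, toy_downloads, by = "iso_a3")

# Choropleth: fill each country by its download count.
download_map <- tm_shape(world_downloads) +
  tm_polygons("downloads")
```

Printing `download_map` draws the map. The real data may identify countries with a different code scheme, in which case the codes would need converting before the join.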

3.2 Total R Downloads per 1,000 capita

Some countries are heavier R users measured by the number of downloads per 1,000 capita. I wanted to see how this compared with total downloads.

Figure 3.3: Hong Kong, Switzerland, and Iceland had more than two downloads per 1,000 capita during the 2017/18 period.
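The per-capita measure is simply downloads divided by population, multiplied by 1,000. A minimal sketch, assuming a separate population table keyed by the same country code; the countries "AA" and "BB" and all numbers are hypothetical.

```r
library(dplyr)

# Hypothetical countries and numbers, for illustration only.
downloads_by_country <- tibble(country = c("AA", "BB"),
                               downloads = c(500, 12000))
population <- tibble(country = c("AA", "BB"),
                     pop = c(200000, 60000000))

per_capita <- downloads_by_country %>%
  inner_join(population, by = "country") %>%
  mutate(downloads_per_1000 = downloads / pop * 1000) %>%  # downloads per 1,000 people
  arrange(desc(downloads_per_1000))
```

Here the small country "AA" ranks first with 2.5 downloads per 1,000 capita even though "BB" has far more total downloads, which is the effect this section looks at.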

And again, what does this look like around the world?

Figure 3.4: Beyond the top 10, the map shows which countries lead their regions in downloads per 1,000 capita: Uruguay and Chile in South America; Costa Rica in Central America; and Japan in Asia.

3.3 Times of the year with most downloads

I wondered which times of the year are most popular for R downloads. Does it vary among regions?

Figure 3.5: It seems not everybody loves the sunshine: #RStats doesn’t. The summer months had the fewest downloads around the world, with the exception of the Sub-Saharan Africa and Pacific Islander regions.
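Monthly totals by region can be obtained by extracting the month from each download date and counting. A rough sketch, assuming a `region` column has already been attached to each download; the region labels and dates below are toy values.

```r
library(dplyr)
library(lubridate)

# Toy data; `region` is an assumed column, not part of the raw file.
downloads <- tibble(
  date   = ymd(c("2018-01-15", "2018-01-20", "2018-07-04",
                 "2018-01-02", "2018-07-09", "2018-07-30")),
  region = c("Western Europe", "Western Europe", "Western Europe",
             "Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa")
)

monthly <- downloads %>%
  mutate(month = month(date)) %>%            # month number, 1 to 12
  count(region, month, name = "downloads")   # downloads per region and month
```

Passing `label = TRUE` to `month()` gives month names instead of numbers, which is usually nicer for plotting.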

4 Conclusion

In this blog post I analysed a dataset of R software downloads spanning October 2017 to October 2018. Using the tidyverse, lubridate, and tmap packages, I found out which countries download the software the most. Unsurprisingly, the most populated countries have the most total downloads. Since countries with large populations will naturally have more total downloads, I also inspected the number of downloads per 1,000 capita. This extra step revealed that small, developed countries such as Hong Kong, Switzerland, Iceland, Singapore, Liechtenstein, the Netherlands and Denmark have the most downloads per 1,000 capita. The US and Australia were exceptions, with at least one download per 1,000 capita despite being larger countries. Lastly, I looked at which months had the most R downloads by sub-region and found that, almost everywhere, summer isn’t a very popular season for R downloads.
