To see the code used in this post, visit my kernel on Kaggle in R Markdown format.
- Objectives: To spatially visualise a dataset to understand global trends.
- Challenge: Using the
- Data points: 7,282,808
- Language: R
What countries download R the most? Where are they? And who are the most intense R users measured by R downloads per 1,000 capita? Which months of the year are the most popular times to download R? Does it depend on the season? What variable could we attempt to use as a predictor of R downloads? Could we use linear regression for inference?
3 Dataset description
I used the Tidy Tuesday dataset for R downloads posted on 30th October 2018. The dataset contained 938,115 observations and eight features: a unique id, download date, time, size, version, os, country, and IP id. The downloads cover a year’s worth of activity, running from 2017-10-20 to 2018-10-20. To avoid counting the same calendar day twice and to keep exactly one year, I kept only downloads up to 2018-10-19. After I removed missing values, there were 910,351 observations.
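The date cut-off and missing-value removal described above can be sketched in base R. This is a minimal illustration on made-up rows, not the code from the kernel: the data frame `downloads` and its column names are assumptions, and I read "until 2018-10-19" as including that day.

```r
# Hypothetical toy rows standing in for the download log (column names are assumed).
downloads <- data.frame(
  date    = as.Date(c("2017-10-20", "2018-03-05", "2018-10-19",
                      "2018-10-20", "2018-06-01")),
  country = c("US", "DE", NA, "GB", "IN"),
  os      = c("win", "osx", "win", "win", NA),
  stringsAsFactors = FALSE
)

# Keep downloads up to and including 2018-10-19, so the window spans one year.
one_year <- downloads[downloads$date <= as.Date("2018-10-19"), ]

# Drop every row that has a missing value in any column.
clean <- one_year[complete.cases(one_year), ]

print(clean)
```

In this toy example one row falls outside the date window and two rows contain a missing value, so two rows remain.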
3.1 Total R Downloads
So what are the top ten countries with the most downloads?
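One way to answer this is to count rows per country and keep the ten largest counts. The sketch below uses base R on invented counts; the data frame `clean` and its `country` column are assumptions, and the numbers are illustrative only, not results from the dataset.

```r
# Hypothetical cleaned data: one row per download, with made-up counts per country.
clean <- data.frame(
  country = rep(c("US", "CN", "DE", "GB", "IN", "JP",
                  "FR", "CA", "BR", "AU", "ES", "NL"),
                times = c(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)),
  stringsAsFactors = FALSE
)

# Count downloads per country, sort in descending order, keep the first ten.
counts  <- table(clean$country)
top_ten <- head(sort(counts, decreasing = TRUE), 10)

# Convert the table to a plain data frame for plotting or joining.
top_ten_df <- data.frame(
  country   = names(top_ten),
  downloads = as.integer(top_ten),
  stringsAsFactors = FALSE
)

print(top_ten_df)
```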
3.2 Total R Downloads per 1,000 capita
Some countries are heavier R users measured by the number of downloads per 1,000 capita. I wanted to see how this compared with total downloads.
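The per-capita measure is total downloads divided by population, multiplied by 1,000. Here is a minimal sketch in base R, assuming a separate population table that can be joined by country. Both tables, the country names, and all figures below are invented for illustration.

```r
# Hypothetical totals and populations (all names and values are made up).
totals <- data.frame(
  country   = c("Largeland", "Smallia", "Tinyburg"),
  downloads = c(5000, 800, 40),
  stringsAsFactors = FALSE
)
population <- data.frame(
  country    = c("Largeland", "Smallia", "Tinyburg"),
  population = c(50e6, 400e3, 40e3),
  stringsAsFactors = FALSE
)

# Join the two tables, then scale downloads to a rate per 1,000 people.
rates <- merge(totals, population, by = "country")
rates$per_1000 <- rates$downloads / rates$population * 1000

# Rank countries by intensity of use rather than by raw totals.
rates <- rates[order(rates$per_1000, decreasing = TRUE), ]

print(rates)
```

In this toy example the country with the most total downloads ranks last once population is taken into account, which is the effect the per-capita view is meant to expose.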
And again, what does this look like around the world?
3.3 Times of the year with most downloads
I wondered which times of the year are the most popular for R downloads. Does it change among regions?
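A simple way to explore this is to cross-tabulate downloads by month and region, then convert each region's counts to shares so that regions of different sizes are comparable. The sketch below is base R on invented rows; the `sub_region` column is an assumption about how regions are attached to the data.

```r
# Hypothetical cleaned data with a download date and a sub-region label.
clean <- data.frame(
  date = as.Date(c("2018-01-15", "2018-01-20", "2018-07-04",
                   "2018-01-09", "2018-07-11", "2018-07-30")),
  sub_region = c("Northern Europe", "Northern Europe", "Northern Europe",
                 "South America", "South America", "South America"),
  stringsAsFactors = FALSE
)

# Extract the month as a two-digit code; fixing the levels keeps empty months.
clean$month <- factor(format(clean$date, "%m"), levels = sprintf("%02d", 1:12))

# Rows are sub-regions, columns are months, cells are download counts.
monthly <- table(clean$sub_region, clean$month)

# Share of each sub-region's downloads that falls in each month.
monthly_share <- prop.table(monthly, margin = 1)

print(round(monthly_share[, c("01", "07")], 2))
```

Comparing shares rather than raw counts makes it easier to see whether the quiet months line up with the local summer in each hemisphere.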
"In this blogpost I analysed a dataset containing R software downloads spanning from October 2017 to 2018. Unsurprisingly, I found that the most populated countries have the most total downloads. Using the
tmappackages, I found out which countries most download the software. Since large countries with large populations will have more total downloads, I decided to inspect number downloads per 1,000 capita. This extra step revealed that small, developed countries, such as Hong Kong, Switzerland, Iceland, Singapore, Liechtenstein, the Netherlands and Denmark have the most downloads per 1,000 capita. An exception to this were the US and Australia with at least one download per 1,000 capita despite being larger countries. Lastly, I looked into which months had the most R downloads by sub-regions and I find that almost everywhere, the summer isn’t a very popular season for R downloads.