Comparing Quality of Life, Income Inequality, and Nobel Laureates Per Capita

Now that I’m done with finals, I finally have time to update my blog. I managed to find three separate Wikipedia entries: one about the Quality of Life Scores of several different countries, one about the number of Nobel Laureates per capita, and one that is a List of Countries by Income Inequality which uses Gini index to rank countries.

I then plotted the data to see if there was a discernable relationship between the three. Most of the work for this project had to do with merging and cleaning the data. I began by pasting the tables from the wikipedia articles into a .csv file. Since the tables were all different lengths and some countries were missing values of the parameters, I had to tidy the dataset up quite a bit.

The result, featured below the code, is pretty interesting.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
library(ggplot2)

nobel.data <- read.csv("nobel.csv", header = TRUE)
fixed.nobel.data <- matrix(nrow = 64, ncol = 4)
colnames(fixed.nobel.data) <- c("Country", "Laureates.Per.10.Million", "Quality.Of.Life.Score", "Gini.Score")

fixed.nobel.data[ , 1] <- paste(intersect(intersect(nobel.data$Country,
                                              nobel.data$Country.or.territory),
                                              nobel.data$Country.1))

for(i in 1:64) {
 country <- fixed.nobel.data[i, 1]
 for (j in 1:48) {
   if (nobel.data[j, 2] == country) {
     fixed.nobel.data[i , 2] <- nobel.data[j, 3]
   }
 }
}

for(i in 1:64) {
 country <- fixed.nobel.data[i, 1]
 for (j in 1:111) {
   if (nobel.data[j, 4] == country) {
     fixed.nobel.data[i , 3] <- nobel.data[j, 5]
   }
 }
}

for(i in 1:64) {
 country <- fixed.nobel.data[i, 1]
 for (j in 1:173) {
   if (nobel.data[j, 6] == country) {
     fixed.nobel.data[i , 4] <- nobel.data[j, 7]
   }
 }
}

fixed.nobel.data <- na.omit(fixed.nobel.data)
fixed.nobel.data2 <- as.data.frame(fixed.nobel.data)
fixed.nobel.data2[, 4] <- as.numeric(fixed.nobel.data[, 4])
fixed.nobel.data2[, 3] <- as.numeric(fixed.nobel.data[, 3])
fixed.nobel.data2[, 2] <- as.numeric(fixed.nobel.data[, 2])
fixed.nobel.data2[, 1] <- as.character(fixed.nobel.data[, 1])
fixed.nobel.data2[, 1] <- factor(fixed.nobel.data[, 1])

g <- ggplot(fixed.nobel.data2,
           aes(x = Quality.Of.Life.Score,
               y = Laureates.Per.10.Million,
               size = Gini.Score)) +
           xlab("Quality of Life Score") +
           ylab("Nobel Laureates Per 10 Million People")

g <- g + ylim(0, 45) + annotate("text", x = 5, y = 30,
                       label ="Larger Text represents\nLarger Income Inequality",
                       color = "red")

g <- g + theme(legend.position = "none")

g <- g + geom_text(aes(label = Country)) + scale_size(range = c(2.5, 10))

g

We can see that there is a weak positive relationship between the Quality of Life of a country and its Nobel Laureates per capita, as I expected. Perhaps most interesting is how the income inequality fits into all of this. We see that countries that have a high Quality of Life Score AND high Nobel Laureates per capita tend to have very low income inequality. On the other hand, most of the countries that had relatively low Quality of Life scores and very few Nobel Laureates per capita tended to also have high levels of income inequality.

It is curious to note that some countries with higher levels of income inequality still had very low Nobel Laureates per capita, even if they were on the medium-high end of the Quality of Life scoring.