Normalized Frequency of Terrorism in the US

I’ve been using the Global Terrorism Database (GTD) a lot lately, so I decided to share an interesting plot I made with the data.

The GTD provides over 100,000 observations of terrorist incidents between 1970 and 2011. Of these, about 2,400 occurred in the USA. While this is not a large number, the plot still shows some interesting and intuitive results.

## Load libraries
library(ggplot2)
library(plyr)
library(maps)
library(stringr)

## Load terrorism data
gtd.data <- read.csv("gtd.csv", stringsAsFactors = FALSE)



##
## Begin USA heatmap plot
##

## Subset data to only include terrorist attacks in the USA
gtd.usa <- subset(gtd.data, country_txt == "United States")

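## Clean provstate column: strip the "(U.S. State)" suffix.
## fixed() matches the text literally; as a plain regex the parentheses
## would form a group and "." would match any character.
gtd.usa$provstate <- str_replace(gtd.usa$provstate, fixed("(U.S. State)"), "")
## Remove any leftover parentheses
gtd.usa$provstate <- str_replace(gtd.usa$provstate, "[(]", "")
gtd.usa$provstate <- str_replace(gtd.usa$provstate, "[)]", "")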

## Trim whitespaces
gtd.usa$provstate <- str_trim(gtd.usa$provstate)

## Load US state population data
populations <- read.csv("states.csv")

## Create counts of terrorist activity in each state
counts <- count(gtd.usa, "provstate")

## Merge the populations dataset with the counts dataset
gtd.pop.merge <- merge(counts, populations, by.x = "provstate", by.y = "Name")

## Create normalized terrorism frequency by dividing frequency
## by the population of the state
gtd.pop.merge <- mutate(gtd.pop.merge, normal = freq / CENSUS2010POP)
## Take log10 of the per-capita values to spread out the clustered frequencies
gtd.pop.merge$normal <- log10(gtd.pop.merge$normal)

gtd.pop.merge$provstate <- tolower(gtd.pop.merge$provstate)
names(gtd.pop.merge)[1] <- "region"

## Load US state data
states <- map_data("state")

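## Merge the map data with our previous dataset
merged <- merge(states, gtd.pop.merge, sort = FALSE, by = "region")
## merge() does not guarantee row order, so restore the polygon point order
merged <- merged[order(merged$order), ]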

## Plot the heatmap
g <- ggplot(merged) + geom_polygon(aes(x = long, y = lat, group = group,
                                       fill = normal))

g <- g + scale_fill_gradient(low = "lightgreen", high = "blue")

g <- g + theme_bw() + labs(fill = "Normalized Frequency of Terrorism") +
     theme(legend.position = "bottom")

g <- g + xlab(NULL) + ylab(NULL)

g <- g + theme(panel.grid.minor=element_blank(),
               panel.grid.major=element_blank())

g <- g + theme(axis.text.x = element_blank(), axis.text.y = element_blank())

g <- g + ggtitle("Normalized Frequency of Terrorism in the USA")

g <- g + scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL)

g
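
One thing worth checking in the two merge steps above: `merge()` performs an inner join by default, so any state whose name does not match across the tables (or that has no recorded attacks) is silently dropped and will not be drawn. The sketch below shows one way to spot such rows. The two small data frames are hypothetical stand-ins with made-up numbers, not the real GTD or census tables.

```r
## Hypothetical stand-ins for the counts and populations tables (made-up values)
counts <- data.frame(provstate = c("Texas", "Ohio", "Unknown"),
                     freq = c(10, 4, 2),
                     stringsAsFactors = FALSE)
populations <- data.frame(Name = c("Texas", "Ohio", "Utah"),
                          CENSUS2010POP = c(100, 200, 300),
                          stringsAsFactors = FALSE)

## Default merge is an inner join: only names present in both tables survive
inner <- merge(counts, populations, by.x = "provstate", by.y = "Name")

## Names that fail to match on either side
dropped.counts <- setdiff(counts$provstate, populations$Name)
dropped.pops <- setdiff(populations$Name, counts$provstate)

## all.x = TRUE keeps unmatched count rows, with NA for the population
left <- merge(counts, populations, by.x = "provstate", by.y = "Name",
              all.x = TRUE)

print(dropped.counts)
print(dropped.pops)
```

If either `setdiff()` result is non-empty on the real data, the names probably need more cleaning before the merge.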

To obtain meaningful results, rather than simply plotting the number of terrorist incidents per state, I divided each state’s count by its 2010 population. I know this is not entirely correct, since state populations have shifted relative to one another between 1970 and 2011, but it was fine for my purposes. The resulting per-capita frequencies clustered together, so I took a log10 transform to spread the values out more smoothly across the color scale.
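
To make the normalization concrete, here is a minimal sketch on made-up numbers. The `toy` data frame, its column names, and its values are hypothetical and not taken from the GTD. It computes the per-capita rate and its log10, and also a rate per 100,000 residents, which is my own addition as a possibly easier-to-read alternative rather than something the plot above uses.

```r
## Toy data: hypothetical attack counts and populations (not GTD figures)
toy <- data.frame(
  region = c("a", "b", "c"),
  freq = c(5, 50, 500),
  pop = c(1e6, 1e6, 2e7),
  stringsAsFactors = FALSE
)

## Per-capita rate: attacks divided by population
toy$rate <- toy$freq / toy$pop

## log10 of the rate, the same transform applied in the post
toy$normal <- log10(toy$rate)

## Assumed alternative: the same rate expressed per 100,000 residents
toy$per100k <- toy$rate * 1e5

print(toy)
```

Because log10 is monotonic, it changes the spacing of the values on the color scale but not the ranking of the states.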