I recently discovered an awesome R package called zipcode so I decided to play around with it a little bit. I found some IRS data on the 100 highest and 100 lowest income zip codes in the US. After cleaning up and modifying the data a little bit I plotted it onto a map projection of the US.
library(ggplot2) library(maps) library(zipcode) data(zipcode) high.income <- read.csv("high.csv", header=T) low.income <-read.csv("low.csv", header=T) high.income$type <- "High" low.income$type <- "Low" high.income$zip <- clean.zipcodes(high.income$zip) low.income$zip <- clean.zipcodes(low.income$zip) high.income <- merge(high.income, zipcode, by.x='zip', by.y='zip') low.income <- merge(low.income, zipcode, by.x='zip', by.y='zip') #### Remove Hawaii low.income <- low.income[-which(low.income$state=="HI"),] states <- map_data("state") g <- ggplot() + geom_path(aes(x = long, y = lat, group=group), data=states) g <- g + geom_point(aes(x=longitude, y=latitude, color=type, size=Salary), data = high.income) g <- g + geom_point(aes(x=longitude, y=latitude, color=type, size=Salary), data = low.income) g <- g + scale_size_continuous(range = c(3, 10)) g <- g + theme_bw() + labs(x=NULL, y=NULL) + ggtitle("Zip Codes by Salary") g
The blue points represent zip codes that are part of the lowest group while the red points are in the wealthiest group. The size of the points correspond to reported salaries in those zip codes. For this post, I only considered zip codes that reported more than 5,000 returns.
We can see that many of the least wealthy zip codes are in the southeast while the New York/New Jersey area as well as LA/San Francisco have high densities of wealthy zip codes.