Problem set 4

Due by 11:59 PM on Tuesday, October 30, 2018

Task 0: Setting things up

Create a new RStudio project somewhere on your computer. Open that new folder in Windows File Explorer or macOS Finder (however you navigate around the files on your computer), and create subfolders there named output and data.

Download this R Markdown file and place it in the root of your newly-created project. You’ll probably have to right click on the link and choose “Save link as…”.

It contains a basic outline/skeleton of the tasks you’ll do in this assignment. It doesn’t have a lot this time. You’re on your own. Awesome. Wow.

Download these two CSV files (unemployment.csv and water_usage.csv) and place them in your data folder.

In the end, the structure of your new project directory should look something like this:

your-project-name/
  your-name_problem-set-4.Rmd
  your-project-name.Rproj
  output/
    NOTHING
  data/
    unemployment.csv
    water_usage.csv
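
If you’d rather create the two subfolders from the R console instead of your file browser, something like this should work. It’s only a sketch, and it assumes your working directory is the project root (which RStudio sets for you when you open the .Rproj file).

```r
# Create the two subfolders in the current working directory.
# showWarnings = FALSE keeps this quiet if a folder already exists.
for (folder in c("output", "data")) {
  dir.create(folder, showWarnings = FALSE)
}

# Check that both folders are now in place
dir.exists(c("output", "data"))
```

You still need to download the CSV files and the R Markdown file yourself and move them into place.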

Task 1: Bullet charts

Bullet charts are goofy and wonky, but they’re excellent practice problems for ggplot, since they involve lots of geom_*() layers.

Recreate this figure in R. Original figure by Bill Dean, posted at “The Bullet Graph”.

Don’t worry about including the legend.

Bill Dean’s bullet chart

A couple hints:

The final image should look something like this: Don’t worry about custom fonts unless you want to be brave; I’m using Roboto Condensed in the plot.

You can use whatever colors you want and whatever titles you want.
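
To give a sense of how the layering works, here’s a rough sketch of a bullet chart. The data frame, its column names, and all the numbers are made up for illustration (they are not the values in Bill Dean’s figure), so treat it as a starting point rather than the answer.

```r
library(ggplot2)

# Hypothetical data: every value is a percent of some goal, invented for this sketch
bullet_data <- data.frame(
  measure = c("Revenue", "Profit", "Satisfaction"),
  poor    = c(50, 40, 60),     # upper edge of the "poor" band
  ok      = c(75, 70, 80),     # upper edge of the "ok" band
  good    = c(100, 100, 100),  # upper edge of the "good" band
  actual  = c(82, 55, 91),     # the thin bar drawn on top
  target  = c(90, 65, 85)      # the tick mark
)

bullet_plot <- ggplot(bullet_data, aes(x = measure)) +
  # Background bands, drawn widest range first so narrower ranges sit on top
  geom_col(aes(y = good), fill = "grey85", width = 0.6) +
  geom_col(aes(y = ok), fill = "grey70", width = 0.6) +
  geom_col(aes(y = poor), fill = "grey55", width = 0.6) +
  # Actual value as a thinner bar
  geom_col(aes(y = actual), fill = "black", width = 0.2) +
  # Target as a short tick: an error bar whose ymin and ymax are equal
  geom_errorbar(aes(ymin = target, ymax = target), color = "red", width = 0.4) +
  coord_flip() +
  labs(x = NULL, y = "Percent of goal") +
  theme_minimal()

bullet_plot
```

The order of the geom_col() layers matters here, since each layer is painted over the previous one.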

Task 2: Small multiples

Use data from the US Bureau of Labor Statistics (BLS) to show the trends in unemployment rate for all 50 states between 2006 and 2016. What stories does this plot tell? Which states struggled to recover from the 2008–09 recession?

Some hints:
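
One way to get small multiples in ggplot is facet_wrap(). The sketch below uses a tiny made-up data frame instead of the real CSV. The column names (state, date, unemployment) are borrowed from the highlighting snippet in Task 3, and I’m assuming the real file looks similar.

```r
library(ggplot2)

# Toy stand-in for data/unemployment.csv with invented numbers
states <- c("Utah", "Arizona", "Nevada")
dates <- seq(as.Date("2006-01-01"), as.Date("2016-01-01"), by = "year")

toy_unemployment <- data.frame(
  state = rep(states, each = length(dates)),
  date = rep(dates, times = length(states))
)
toy_unemployment$unemployment <- 5 + 3 * sin(seq_len(nrow(toy_unemployment)) / 4)

# One small panel per state, all sharing the same axes
small_multiples <- ggplot(toy_unemployment, aes(x = date, y = unemployment)) +
  geom_line() +
  facet_wrap(~ state) +
  labs(x = NULL, y = "Unemployment rate (%)") +
  theme_minimal()

small_multiples
```

With 50 states you’ll probably want to play with the ncol argument of facet_wrap() and the figure dimensions so the panels stay readable.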

Task 3: Slopegraphs

Use data from the BLS to create a slopegraph that compares the unemployment rate in January 2006 with the unemployment rate in January 2009, either for all 50 states at once or for a specific region or division. Make sure the plot doesn’t look too busy or crowded in the end.

What story does this plot tell? Which states in the US (or in the specific region you selected) were the most/least affected by the Great Recession?

Some hints:

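library(tidyverse)

# Flag the states to emphasize; %in% already returns TRUE/FALSE
unemployment_with_highlights <- unemployment %>% 
  mutate(highlight = state %in% c("Utah", "Arizona"))

# Grey lines for every state, red for the highlighted ones
ggplot(unemployment_with_highlights, 
       aes(x = date, y = unemployment, group = state, color = highlight)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(linewidth = 0.5, alpha = 0.75) +
  scale_color_manual(values = c("grey70", "red"), guide = "none") +
  theme_minimal()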
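
The snippet above only handles highlighting. For the slopegraph itself, one possible approach is to keep just the two dates, turn them into a two-level factor for the x-axis, and connect each state’s two points with a line. This sketch uses invented numbers and assumes the same column names as above (state, date, unemployment), with date stored as a Date.

```r
library(dplyr)
library(ggplot2)

# Toy data with made-up values; the real CSV has many more rows
toy_unemployment <- tibble(
  state = rep(c("Utah", "Arizona", "Nevada"), each = 2),
  date = rep(as.Date(c("2006-01-01", "2009-01-01")), times = 3),
  unemployment = c(3.1, 5.6, 4.4, 8.9, 4.1, 9.6)
)

# Keep only the two months being compared and label them by year
slope_data <- toy_unemployment %>%
  filter(date %in% as.Date(c("2006-01-01", "2009-01-01"))) %>%
  mutate(period = factor(format(date, "%Y")))

slopegraph <- ggplot(slope_data,
                     aes(x = period, y = unemployment, group = state)) +
  geom_line(color = "grey50") +
  geom_point() +
  # Label each line at its right-hand end
  geom_text(data = filter(slope_data, period == "2009"),
            aes(label = state), hjust = 0, nudge_x = 0.05) +
  labs(x = NULL, y = "Unemployment rate (%)") +
  theme_minimal()

slopegraph
```

With all 50 states the labels will overlap, which is one reason to limit the plot to a region or to highlight only a few states.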

Submit

When you’re done, submit a knitted PDF or Word file of your analysis on Learning Suite. As always, it’s best if the final knitted document is clean and free of warnings and messages (so if a chunk is creating messages, like wherever you run library(tidyverse), add message=FALSE, warning=FALSE to the chunk options).
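
If you’d rather not repeat those options in every chunk, one alternative is to set them once for the whole document. This is a sketch of that approach, not a requirement for the assignment; it would go in the first chunk of your .Rmd file.

```r
library(knitr)

# Applies to every chunk that comes after this one in the document
opts_chunk$set(message = FALSE, warning = FALSE)
```

Individual chunks can still override these defaults in their own chunk options.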

Postscript: how I got this unemployment data

For the curious, here’s the code I used to download the unemployment data from the BLS.

And to pull the curtain back and show how much googling is involved in data visualization (and data analysis and programming in general), here was my process for getting this data:

  1. I thought “I want to have students show variation in something domestic over time” and then I googled “us data by state”. Nothing really came up (since it was an exceedingly vague search in the first place), but some results mentioned unemployment rates, so I figured that could be cool.
  2. I googled “unemployment statistics by state over time” and found that the BLS keeps statistics on this. I clicked on the “Data Tools” link in their main navigation bar, clicked on “Unemployment”, and then clicked on the “Multi-screen data search” button for the Local Area Unemployment Statistics (LAUS).
  3. I walked through the multiple screens and got excited that I’d be able to download all unemployment stats for all states for a ton of years, BUT THEN the final page had links to 51 individual Excel files, which was dumb.
  4. So I went back to Google and searched for “download bls data r” and found a few different packages people have written to do this. The first one I clicked on was blscrapeR at GitHub, and it looked like it had been updated recently, so I went with it.
  5. I followed the examples in the blscrapeR package and downloaded data for every state.
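
To make that last step a bit more concrete, here’s a rough sketch of what a download with blscrapeR could look like. It is not the actual script behind this data. The helper names (make_series_id, download_unemployment) are hypothetical, and the series ID pattern is my understanding of how LAUS labels seasonally adjusted state unemployment rates, so check it against the BLS site before relying on it.

```r
# Hypothetical helper: build a LAUS series ID from a state's FIPS code.
# Assumed pattern: "LASST" + two-digit FIPS + eleven zeros + "03"
make_series_id <- function(fips) {
  sprintf("LASST%02d0000000000003", fips)
}

# Arizona is FIPS 4 and Utah is FIPS 49
series_ids <- make_series_id(c(4L, 49L))
series_ids

# Hypothetical wrapper around blscrapeR::bls_api(). It is defined here but
# not called, since calling it needs the blscrapeR package and an internet
# connection.
download_unemployment <- function(ids, start_year, end_year) {
  blscrapeR::bls_api(ids, startyear = start_year, endyear = end_year)
}
```

The BLS API limits how many series and years a single request can cover, so pulling every state for 2006 to 2016 would likely mean splitting the work into batches and binding the results together.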

Another day in the life of doing modern data science. I had no idea people had written R packages to access BLS data, but there are like 3 packages out there! After a few minutes of tinkering, I got it working and it’s super magic.