Annotating and grouping
Materials for class on Tuesday, November 27, 2018
Data to download
Download these and put them in a folder named “data” in an RStudio project:
- World happiness: I collected this data from the UN and the World Bank. If you’re interested, you can see the R script I used to create this dataset here.
- Louisville animal bites: See the complete column descriptions. The data is released under a public domain license and was originally hosted on Kaggle.
Use this link to see the code that I’m actually typing:
I’ve saved the R script to Dropbox, and that link goes to a live version of that file. Refresh or re-open the link as needed to copy/paste code I type up on the screen.
Louisville animal bites
Use some of this code to help you get started. You don’t have to use it, though: it just counts dog, cat, and other bites per year from 2010 through 2017. Feel free to do whatever you want. You’re iterating here!
library(tidyverse)
library(lubridate)

bites_raw <- read_csv("data/Health_AnimalBites.csv")
# Or directly from the internet if you want
# bites_raw <- read_csv("https://datavizf18.classes.andrewheiss.com/data/Health_AnimalBites.csv")

bites <- bites_raw %>%
  # Extract the year from each bite date
  mutate(year = year(bite_date)) %>%
  # Collapse species into Dog, Cat, and everything else
  mutate(species = case_when(
    SpeciesIDDesc == "CAT" ~ "Cat",
    SpeciesIDDesc == "DOG" ~ "Dog",
    TRUE ~ "Other"
  )) %>%
  # Fix the order so Dog comes first in plots and legends
  mutate(species = factor(species, levels = c("Dog", "Cat", "Other"), ordered = TRUE)) %>%
  # Keep bites from 2010 through 2017
  filter(year < 2018, year >= 2010)

# Count bites per species per year
bites_species_year <- bites %>%
  filter(!is.na(species)) %>%
  group_by(year, species) %>%
  summarize(total_bites = n())
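Since the last step only counts rows per group, dplyr’s count() is a handy shortcut for the group_by() + summarize(n()) pair. Here’s a minimal sketch of that equivalence using ggplot2’s built-in mpg data as a stand-in for the bites data (grouping by year and drive type is just for illustration):

```r
library(tidyverse)

# The long way: group, then count the rows in each group with n()
counts_long <- mpg %>%
  group_by(year, drv) %>%
  summarize(total = n(), .groups = "drop")

# The shortcut: count() groups, counts, and returns an ungrouped result
counts_short <- mpg %>%
  count(year, drv, name = "total")

counts_short
```

With the bites data, the same idea would be bites %>% count(year, species, name = "total_bites").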
Iterative design + grouping and annotating
Here are some fairly polished plots based on the world happiness index and other UN and World Bank data, all arranged in a nice 3-panel figure with patchwork. This is the final output. Getting to this point took a while and went through lots of different iterations, which is the creative process in action.
library(tidyverse)
library(ggrepel)
library(broom)  # For dealing with models as data frames
library(patchwork)
library(ggbeeswarm)  # For cool dot plots
happiness <- read_csv("data/world_happiness.csv")
happiness_clean <- happiness %>%
  # Flag East Asian and Pacific countries and keep names only for those rows
  mutate(in_asia = region == "East Asia & Pacific") %>%
  mutate(label_to_plot = ifelse(in_asia, country, NA)) %>%
  # Collapse the regions into five broader groups
  mutate(region_big = case_when(
    region == "East Asia & Pacific" ~ "Asia",
    region == "Europe & Central Asia" ~ "Europe",
    region == "Latin America & Caribbean" ~ "North & South America",
    region == "North America" ~ "North & South America",
    region == "South Asia" ~ "Asia",
    TRUE ~ region
  )) %>%
  # Put the regions in a fixed order for legends and axes
  mutate(region_big = factor(region_big,
                             levels = c("North & South America", "Europe",
                                        "Middle East & North Africa", "Asia",
                                        "Sub-Saharan Africa"),
                             ordered = TRUE))
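The label_to_plot column relies on a small trick: text layers such as ggrepel’s geom_text_repel() drop rows whose label is NA, so only the highlighted countries get names. Here’s a minimal sketch of the same idea on ggplot2’s built-in mpg data, highlighting two-seater cars (an arbitrary, hypothetical choice) with redundant color and shape:

```r
library(tidyverse)
library(ggrepel)

# Flag the rows to highlight and keep a label only for those rows;
# every other row gets NA
mpg_labeled <- mpg %>%
  mutate(is_2seater = class == "2seater") %>%
  mutate(label_to_plot = ifelse(is_2seater, model, NA))

p <- ggplot(mpg_labeled, aes(x = displ, y = hwy)) +
  # Color and shape both encode the highlight (redundant encoding)
  geom_point(aes(color = is_2seater, shape = is_2seater)) +
  # Rows with an NA label are not drawn; na.rm = TRUE silences the warning
  geom_text_repel(aes(label = label_to_plot), na.rm = TRUE)

p
```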
Happiness explained by life expectancy
Here’s the relationship between life expectancy and national happiness, with East Asian and Oceanic countries highlighted with redundant shapes. Note how instead of using annotate(), I make a separate data frame called extra_labels and then use geom_text() to plot it twice. This might be overkill here, since I’m only plotting two things, but it allows for more flexibility later if I want to add additional labels without worrying about adding even more annotate() layers.
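Here’s a minimal sketch of the separate-label-data-frame approach on ggplot2’s built-in mpg data. The label text and coordinates in extra_labels are hypothetical; the point is that adding another annotation only means adding another row, not another annotate() layer:

```r
library(tidyverse)

# One row per annotation: where it goes and what it says (hypothetical values)
extra_labels <- tibble(
  x = c(2, 6),
  y = c(40, 30),
  label = c("Small engines,\nbetter mileage", "Big engines,\nworse mileage")
)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # inherit.aes = FALSE keeps this layer from picking up the plot's global mappings
  geom_text(data = extra_labels, aes(x = x, y = y, label = label),
            inherit.aes = FALSE, hjust = 0)

p
```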
Happiness explained by life expectancy, colored by region
Here I collapsed some of the regions with case_when() up above, and then generated a palette of five perceptually uniform, colorblind-friendly colors at iWantHue.
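Once the groups are collapsed, a hand-picked palette goes in with scale_color_manual(). The sketch below does the same thing on the vehicle classes in ggplot2’s mpg data. The five hex codes come from the Okabe-Ito colorblind-friendly palette and stand in for the iWantHue colors, which aren’t listed here:

```r
library(tidyverse)

mpg_collapsed <- mpg %>%
  # Collapse seven vehicle classes into five broader groups
  mutate(class_big = case_when(
    class %in% c("compact", "subcompact") ~ "Small car",
    class == "midsize" ~ "Midsize car",
    class == "2seater" ~ "Two-seater",
    class == "minivan" ~ "Minivan",
    TRUE ~ "Truck/SUV"  # pickup and suv
  )) %>%
  # Fix the order so the legend and palette line up
  mutate(class_big = factor(class_big,
                            levels = c("Small car", "Midsize car", "Two-seater",
                                       "Minivan", "Truck/SUV")))

# Five colorblind-friendly colors (Okabe-Ito), one per group
five_colors <- c("#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00")

p <- ggplot(mpg_collapsed, aes(x = displ, y = hwy, color = class_big)) +
  geom_point() +
  scale_color_manual(values = five_colors)

p
```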
The other cool thing about this plot is final_predicted_points, which fits a separate linear regression model for each region and then finds the predicted point at the end of each regression line. I then use those points with geom_text_repel() to put the region names directly on the plot.
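Here’s one way to build something like final_predicted_points, sketched on ggplot2’s mpg data with one model per drive type. This is an illustration, not the exact code from class: it fits each model with lm(), turns the fitted values into a data frame with broom’s augment(), and keeps the row with the largest x value in each group:

```r
library(tidyverse)
library(broom)
library(ggrepel)

final_predicted_points <- mpg %>%
  # One row per group, with that group's data in a list column
  nest(data = -drv) %>%
  # Fit a model per group and get fitted values (.fitted) as a data frame
  mutate(model = map(data, ~ lm(hwy ~ displ, data = .x)),
         fitted = map(model, augment)) %>%
  select(drv, fitted) %>%
  unnest(fitted) %>%
  # Keep the rightmost point of each group's regression line
  group_by(drv) %>%
  slice_max(displ, n = 1, with_ties = FALSE) %>%
  ungroup()

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Label each line at its end instead of using a legend
  geom_text_repel(data = final_predicted_points,
                  aes(x = displ, y = .fitted, label = drv)) +
  guides(color = "none")

p
```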
Happiness by region
Here I just plot happiness scores by region (i.e., no comparison with life expectancy or anything else). I use geom_quasirandom() from the ggbeeswarm package, which spreads overlapping points out sideways in cool shapes that follow the density of the data.
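A minimal sketch of geom_quasirandom() on ggplot2’s mpg data, with drive type standing in for region:

```r
library(tidyverse)
library(ggbeeswarm)

# One column of points per group; points are offset sideways so the
# width of each cloud follows the density of the y values
p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_quasirandom()

p
```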
Combined mega plot with patchwork
Finally, I put all of these together in a final combined plot using the patchwork package. Note how I make some adjustments to plot2, like shrinking the titles and adding tags. Also note that I use & to combine the plots in the right configuration. I figured this out by reading the README on patchwork’s GitHub repository.
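In patchwork, | places plots side by side, / stacks them, and & adds a ggplot element (like a theme() tweak) to every plot in the combination. Here’s a small sketch with three stand-in plots of ggplot2’s mpg data; the plots and titles are illustrative, not the ones from class:

```r
library(tidyverse)
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine size vs. highway mileage")

p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Mileage by drive type") +
  # Shrink this plot's title
  theme(plot.title = element_text(size = 10))

p3 <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  labs(title = "Distribution of highway mileage")

# | puts p1 and p2 side by side, / stacks p3 underneath,
# and & applies the bold title tweak to every plot
combined <- (p1 | p2) / p3 &
  theme(plot.title = element_text(face = "bold"))

# Tag the panels A, B, and C
combined <- combined + plot_annotation(tag_levels = "A")

combined
```

Because / binds more tightly than & in R, the & step applies to the whole layout rather than only to p3.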
Clearest and muddiest things
Go to this form and answer these two questions:
- What was the muddiest thing from class today? What are you still wondering about?
- What was the clearest thing from class today? What was the most exciting thing you learned?
I’ll compile the questions and send out answers after class.