class: center, middle, inverse, title-slide

# visualizing spatial data II
### 2021-10-13

---

class: middle, inverse

# Welcome

---

## Announcements

-

---

## Setup

.midi[
```r
# load packages
library(tidyverse)
library(scales)
library(geofacet)
library(lubridate)

# set default theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16))

# set default figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 8,
  fig.asp = 0.618,
  fig.retina = 3,
  dpi = 300,
  out.width = "60%"
)

# dplyr print min and max
options(dplyr.print_max = 6, dplyr.print_min = 6)
```
]

---

class: middle, inverse

# Fisheries of the world

---

The Fisheries and Aquaculture Department of the Food and Agriculture Organization of the United Nations collects data on the fisheries production of countries. The (not-so-great) visualization below shows the distribution of fishery harvest by country for 2018, split into capture and aquaculture.

<br>

.pull-left[
<img src="images/fisheries-data.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Countries whose total harvest was less than 100,000 tons are not included in the visualization.
- Source: [Fishing industry by country](https://en.wikipedia.org/wiki/Fishing_industry_by_country)
]

---

.question[
What are some ways you would improve this visualization?
]

<img src="images/fisheries.png" width="80%" style="display: block; margin: auto;" />

---

## Get the data

```r
fisheries <- read_csv("data/fisheries.csv")
```

---

## Inspect the data

.midi[
```r
fisheries
```

```
## # A tibble: 216 × 3
##   country        capture aquaculture
##   <chr>            <dbl>       <dbl>
## 1 China         17800000    63700000
## 2 Indonesia      6584419    16600000
## 3 India          5082332     5703002
## 4 Vietnam        2785940     3634531
## 5 United States  4931017      444369
## 6 Russia         4773413      173840
## # … with 210 more rows
```
]

---

## Data prep

Calculate the total harvest (capture plus aquaculture):

.pull-left[
```r
fisheries <- fisheries %>%
  mutate(total = capture + aquaculture)
```
]

.pull-right[
```r
fisheries
```

```
## # A tibble: 216 × 4
##   country        capture aquaculture    total
##   <chr>            <dbl>       <dbl>    <dbl>
## 1 China         17800000    63700000 81500000
## 2 Indonesia      6584419    16600000 23184419
## 3 India          5082332     5703002 10785334
## 4 Vietnam        2785940     3634531  6420471
## 5 United States  4931017      444369  5375386
## 6 Russia         4773413      173840  4947253
## # … with 210 more rows
```
]

---

## Mapping the fisheries data

- Obtain country boundaries and store as a data frame
- Join the fisheries and country boundaries data frames
- Plot the country boundaries, and fill by fisheries harvest data

---

## `map_data()`

The `map_data()` function easily turns data from the maps package into a data frame suitable for plotting with ggplot2:

```r
world_map <- map_data("world") %>%
  as_tibble()
```

---

## Mapping the world

.midi[
```r
ggplot(world_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "gray") +
  coord_quickmap()
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-10-1.png" width="70%" />
]

---

## Join fisheries and world map

.pull-left[
```r
fisheries %>% select(country)
```

```
## # A tibble: 216 × 1
##   country
##   <chr>
## 1 China
## 2 Indonesia
## 3 India
## 4 Vietnam
## 5 United States
## 6 Russia
## # … with 210 more rows
```
]

.pull-right[
```r
world_map %>% select(region)
```

```
## # A tibble: 99,338 × 1
##   region
##   <chr>
## 1 Aruba
## 2 Aruba
## 3 Aruba
## 4 Aruba
## 5 Aruba
## 6 Aruba
## # … with 99,332 more rows
```
]

---

## Join fisheries and world map

```r
fisheries_map <- left_join(fisheries, world_map, by = c("country" = "region"))
```

```r
glimpse(fisheries_map)
```

```
## Rows: 85,970
## Columns: 9
## $ country     <chr> "China", "China", "China", "China", "China", "China", "Chi…
## $ capture     <dbl> 17800000, 17800000, 17800000, 17800000, 17800000, 17800000…
## $ aquaculture <dbl> 63700000, 63700000, 63700000, 63700000, 63700000, 63700000…
## $ total       <dbl> 81500000, 81500000, 81500000, 81500000, 81500000, 81500000…
## $ long        <dbl> 110.8888, 110.9383, 110.9707, 110.9977, 111.0137, 110.9127…
## $ lat         <dbl> 19.99194, 19.94756, 19.88330, 19.76470, 19.65547, 19.58608…
## $ group       <dbl> 418, 418, 418, 418, 418, 418, 418, 418, 418, 418, 418, 418…
## $ order       <int> 28698, 28699, 28700, 28701, 28702, 28703, 28704, 28705, 28…
## $ subregion   <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
```

---

## Mapping fisheries

.task[
What is missing/misleading about the following map?
]

.midi[
```r
ggplot(fisheries_map, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = capture)) +
  scale_fill_viridis_c() +
  coord_quickmap()
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-15-1.png" width="65%" />
]

---

class: middle

.hand[
livecoding
]

---

background-color: #114E8B

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-16-1.png" width="95%" />

---

## Highlights from livecoding

- When working through non-matching unique identifiers in a join, you might need to clean the data in both data frames being merged, depending on the context
- Two ways to surface polygons with `NA`s:
  - `left_join()` map to data, layering with map at the bottom, data on top
  - `left_join()` data to map, set `na.value` in `scale_fill_*()` to desired color
- Use `na.value = "red"` (or some other color that will stand out) to easily spot polygons with `NA`s

---

class: middle, inverse

# Geofaceting

---

<img src="13-visualize-spatial-II_files/figure-html/geofacet-state-1.png" width="90%" />

---

## Daily US vaccine data by state

.small[
```r
us_state_vaccinations <- read_csv(here::here("data/us_state_vaccinations.csv"))
```

```r
us_state_vaccinations
```

```
## # A tibble: 17,501 × 14
##   date       location total_vaccinations total_distributed people_vaccinated
##   <date>     <chr>                 <dbl>             <dbl>             <dbl>
## 1 2021-01-12 Alabama               78134            377025             70861
## 2 2021-01-13 Alabama               84040            378975             74792
## 3 2021-01-14 Alabama               92300            435350             80480
## 4 2021-01-15 Alabama              100567            444650             86956
## 5 2021-01-16 Alabama                  NA                NA                NA
## 6 2021-01-17 Alabama                  NA                NA                NA
## # … with 17,495 more rows, and 9 more variables:
## #   people_fully_vaccinated_per_hundred <dbl>,
## #   total_vaccinations_per_hundred <dbl>, people_fully_vaccinated <dbl>,
## #   people_vaccinated_per_hundred <dbl>, distributed_per_hundred <dbl>,
## #   daily_vaccinations_raw <dbl>, daily_vaccinations <dbl>,
## #   daily_vaccinations_per_million <dbl>, share_doses_used <dbl>
```
]

.footnote[
Source:
https://ourworldindata.org/us-states-vaccinations
]

---

## Facet by location

.panelset.sideways[
.panel[.panel-name[Code]
```r
ggplot(
  us_state_vaccinations,
  aes(x = date, y = people_fully_vaccinated_per_hundred)
) +
  geom_area() +
  facet_wrap(~location)
```
]

.panel[.panel-name[Plot]
```
## Warning: Removed 1802 rows containing missing values (position_stack).
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-21-1.png" width="100%" />
]
]

---

## Data cleaning

```r
us_state_vaccinations <- us_state_vaccinations %>%
  mutate(location = if_else(location == "New York State", "New York", location)) %>%
  filter(location %in% c(state.name, "District of Columbia"))
```

---

## Geofacet by state

Using `geofacet::facet_geo()`:

.panelset.sideways[
.panel[.panel-name[Code]
```r
ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred)) +
  geom_area() +
* facet_geo(~ location) +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data"
  )
```
]

.panel[.panel-name[Plot]
```
## Warning: Removed 567 rows containing missing values (position_stack).
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-23-1.png" width="95%" />
]
]

---

## Geofacet by state, with improvements

.panelset.sideways[
.panel[.panel-name[Plot]
<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-24-1.png" width="100%" />
]

.panel[.panel-name[Code]
.midi[
```r
ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) +
  geom_area() +
  facet_geo(~location) +
* scale_y_continuous(
*   limits = c(0, 100),
*   breaks = c(0, 50, 100),
*   minor_breaks = c(25, 75)
* ) +
* scale_x_date(breaks = c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels = "%b") +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data"
  ) +
  theme(
*   strip.text.x = element_text(size = 7),
*   axis.text = element_text(size = 8),
    plot.title.position = "plot"
  )
```
]
]
]

---

## Bring in 2020 Presidential election results

```r
election_2020 <- read_csv(here::here("data/us-election-2020.csv"))
```

```r
election_2020
```

```
## # A tibble: 51 × 5
##   state      electoal_votes biden trump win
##   <chr>               <dbl> <dbl> <dbl> <chr>
## 1 Alabama                 9     0     9 Republican
## 2 Alaska                  3     0     3 Republican
## 3 Arizona                11    11     0 Democrat
## 4 Arkansas                6     0     6 Republican
## 5 California             55    55     0 Democrat
## 6 Colorado                9     9     0 Democrat
## # … with 45 more rows
```

---

## Geofacet by state, color by presidential election result

.small[
.panelset.sideways[
.panel[.panel-name[Code]
```r
us_state_vaccinations %>%
  left_join(election_2020, by = c("location" = "state")) %>%
  ggplot(aes(x = date, y = people_fully_vaccinated_per_hundred)) +
* geom_area(aes(fill = win)) +
  facet_geo(~location) +
* scale_y_continuous(limits = c(0, 100), breaks = c(0, 50, 100), minor_breaks = c(25, 75)) +
  scale_x_date(breaks = c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels = "%b") +
* scale_fill_manual(values = c("#2D69A1", "#BD3028")) +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data",
    fill = "2020 Presidential\nElection"
  ) +
  theme(
    strip.text.x = element_text(size = 7),
    axis.text = element_text(size = 8),
    plot.title.position = "plot",
*   legend.position = c(0.93, 0.15),
*   legend.text = element_text(size = 9),
*   legend.title = element_text(size = 11),
*   legend.background = element_rect(color = "gray", size = 0.5)
  )
```
]

.panel[.panel-name[Plot]
<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-29-1.png" width="100%" />
]
]
]
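
---

## Sketch: checking join keys before mapping

The livecoding highlights mention cleaning non-matching identifiers and surfacing polygons with `NA`s. The sketch below is one possible way to do that, not the code from the session. `fisheries_toy` and `map_toy` are hypothetical stand-ins for `fisheries` and `world_map`; the totals are copied from the output shown earlier, and the region names assume the maps package labels the United States as "USA".

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the fisheries and world_map data frames
fisheries_toy <- tribble(
  ~country,        ~total,
  "United States", 5375386,
  "Russia",        4947253
)
map_toy <- tibble(region = c("USA", "Russia", "Aruba"))

# Step 1: list countries in the data that have no matching region in the map
unmatched <- anti_join(fisheries_toy, map_toy, by = c("country" = "region"))

# Step 2: recode the non-matching name so it agrees with the map
fisheries_clean <- fisheries_toy %>%
  mutate(country = if_else(country == "United States", "USA", country))

# Step 3: join the data onto the map so every region is kept;
# regions without fisheries data (here, Aruba) get NA for total
map_joined <- left_join(map_toy, fisheries_clean, by = c("region" = "country"))
```

With real polygons, filling by `total` and setting `na.value = "red"` in `scale_fill_viridis_c()` would make the regions that still have `NA` easy to spot.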