visualizing spatial data II

# visualizing<br>spatial data II
### 2021-10-13

---

# Welcome

---

## Announcements

---

## Setup

```r
# load packages
library(tidyverse)
library(scales)
library(geofacet)
library(lubridate)

# set default theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16))

# set default figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 8, fig.asp = 0.618, fig.retina = 3, dpi = 300, out.width = "60%"
)

# dplyr print min and max
options(dplyr.print_max = 6, dplyr.print_min = 6)
```

---

# Fisheries of the world

---

The Fisheries and Aquaculture Department of the Food and Agriculture Organization of the United Nations collects data on fisheries production by country. The (not-so-great) visualization below shows each country's fishery harvest for 2018, split into capture and aquaculture.

<br>

.pull-left[
<img src="images/fisheries-data.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
- Countries whose total harvest was less than 100,000 tons are not included in the visualization.
- Source: [Fishing industry by country](https://en.wikipedia.org/wiki/Fishing_industry_by_country)
]

---

## Get the data

```r
fisheries <- read_csv("data/fisheries.csv")
```

---

## Inspect the data

```r
fisheries
```

```
## # A tibble: 216 × 3
##   country        capture aquaculture
##   <chr>            <dbl>       <dbl>
## 1 China         17800000    63700000
## 2 Indonesia      6584419    16600000
## 3 India          5082332     5703002
## 4 Vietnam        2785940     3634531
## 5 United States  4931017      444369
## 6 Russia         4773413      173840
## # … with 210 more rows
```

---

## Data prep

.pull-left[

Calculate each country's total harvest as the sum of capture and aquaculture:

```r
fisheries <- fisheries %>%
  mutate(total = capture + aquaculture)
```
]
.pull-right[

```r
fisheries
```

```
## # A tibble: 216 × 4
##   country        capture aquaculture    total
##   <chr>            <dbl>       <dbl>    <dbl>
## 1 China         17800000    63700000 81500000
## 2 Indonesia      6584419    16600000 23184419
## 3 India          5082332     5703002 10785334
## 4 Vietnam        2785940     3634531  6420471
## 5 United States  4931017      444369  5375386
## 6 Russia         4773413      173840  4947253
## # … with 210 more rows
```
]

---

## Mapping the fisheries data

- Obtain country boundaries and store as a data frame
- Join the fisheries and country boundaries data frames
- Plot the country boundaries, and fill by fisheries harvest data

---

## `map_data()`

The `map_data()` function turns data from the maps package into a data frame suitable for plotting with ggplot2:

```r
world_map <- map_data("world") %>% as_tibble()
```
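
---

## A closer look at `map_data()` output

Before plotting, it can help to check how the data frame is laid out. The sketch below assumes the maps package is installed (`map_data()` needs it) and only inspects the result. The summary column names `n_regions` and `n_polygons` are made up for illustration.

```r
library(ggplot2)
library(dplyr)

world_map <- map_data("world") %>% as_tibble()

# Each row is one polygon vertex: `long`/`lat` give its position,
# `group` identifies the polygon, and `order` the drawing sequence
names(world_map)

# A region (country) can be made up of several polygons, e.g. islands
map_summary <- world_map %>%
  summarise(
    n_regions  = n_distinct(region),
    n_polygons = n_distinct(group)
  )
map_summary
```

Because one region can span several polygons, `group` rather than `region` is the variable to map to the `group` aesthetic.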

---

## Mapping the world

```r
ggplot(world_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "gray") +
  coord_quickmap()
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-10-1.png" width="70%" />

---

## Join fisheries and world map

.pull-left[

```r
fisheries %>% select(country)
```

```
## # A tibble: 216 × 1
##   country      
##   <chr>        
## 1 China        
## 2 Indonesia    
## 3 India        
## 4 Vietnam      
## 5 United States
## 6 Russia       
## # … with 210 more rows
```
]
.pull-right[

```r
world_map %>% select(region)
```

```
## # A tibble: 99,338 × 1
##   region
##   <chr> 
## 1 Aruba 
## 2 Aruba 
## 3 Aruba 
## 4 Aruba 
## 5 Aruba 
## 6 Aruba 
## # … with 99,332 more rows
```
]

---

## Join fisheries and world map

```r
fisheries_map <- left_join(fisheries, world_map, by = c("country" = "region"))
```

```r
glimpse(fisheries_map)
```

```
## Rows: 85,970
## Columns: 9
## $ country     <chr> "China", "China", "China", "China", "China", "China", "Chi…
## $ capture     <dbl> 17800000, 17800000, 17800000, 17800000, 17800000, 17800000…
## $ aquaculture <dbl> 63700000, 63700000, 63700000, 63700000, 63700000, 63700000…
## $ total       <dbl> 81500000, 81500000, 81500000, 81500000, 81500000, 81500000…
## $ long        <dbl> 110.8888, 110.9383, 110.9707, 110.9977, 111.0137, 110.9127…
## $ lat         <dbl> 19.99194, 19.94756, 19.88330, 19.76470, 19.65547, 19.58608…
## $ group       <dbl> 418, 418, 418, 418, 418, 418, 418, 418, 418, 418, 418, 418…
## $ order       <int> 28698, 28699, 28700, 28701, 28702, 28703, 28704, 28705, 28…
## $ subregion   <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
```
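
---

## Checking for unmatched countries

The row count after the join does not show whether every country found a match. One way to check is `anti_join()`, which keeps the rows of the first data frame that have no match in the second. The sketch below uses small stand-ins (`fisheries_toy`, `regions_toy`) rather than the real data, and assumes a spelling mismatch of the kind that often comes up with country names.

```r
library(dplyr)

# Hypothetical stand-ins for the fisheries data and the map's region names
fisheries_toy <- tibble(
  country = c("China", "United States", "Russia"),
  total   = c(81500000, 5375386, 4947253)
)
regions_toy <- tibble(region = c("China", "USA", "Russia"))

# Countries in the data with no matching region in the map
unmatched <- anti_join(fisheries_toy, regions_toy, by = c("country" = "region"))
unmatched

# Recode the mismatched name, then confirm nothing is left unmatched
fisheries_fixed <- fisheries_toy %>%
  mutate(country = if_else(country == "United States", "USA", country))

still_unmatched <- anti_join(fisheries_fixed, regions_toy, by = c("country" = "region"))
nrow(still_unmatched)
```

Running the same check in the other direction (map first, data second) lists map regions that have no data.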

---

## Mapping fisheries

```r
ggplot(fisheries_map, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = capture)) +
  scale_fill_viridis_c() +
  coord_quickmap()
```

<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-15-1.png" width="65%" />

---

## Highlights from livecoding

- When working through non-matching unique identifiers in a join, you might need to clean the data in both data frames being merged, depending on the context

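- Two ways to surface polygons with `NA`s:
  - `left_join()` the map onto the data (polygons without data are dropped), then layer the full map at the bottom and the data on top
  - `left_join()` the data onto the map (polygons without data get `NA`), then set `na.value` in `scale_fill_*()` to the desired color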

- Use `na.value = "red"` (or some other color that will stand out) to easily spot polygons with `NA`s
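
---

## Surfacing `NA` polygons: a sketch

The two approaches listed above can be compared on a toy example. Everything here is made up for illustration: `toy_map` holds three unit squares standing in for country polygons, and `toy_data` has no row for region "B".

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy map: three unit squares standing in for country polygons
toy_map <- tibble(
  region = rep(c("A", "B", "C"), each = 4),
  group  = rep(1:3, each = 4),
  long   = c(0, 1, 1, 0,  2, 3, 3, 2,  4, 5, 5, 4),
  lat    = rep(c(0, 0, 1, 1), times = 3)
)

# Hypothetical data with no row for region "B"
toy_data <- tibble(country = c("A", "C"), total = c(10, 20))

# Option 1: join the map onto the data; "B" is dropped from the result,
# so draw the full map in gray underneath and the data on top
data_with_map <- left_join(toy_data, toy_map, by = c("country" = "region"))
p_layered <- ggplot(mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(data = toy_map, fill = "gray") +
  geom_polygon(data = data_with_map, aes(fill = total)) +
  scale_fill_viridis_c() +
  coord_quickmap()

# Option 2: join the data onto the map; "B" is kept with total = NA,
# and na.value controls the color it is drawn in
map_with_data <- left_join(toy_map, toy_data, by = c("region" = "country"))
p_na <- ggplot(map_with_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = total)) +
  scale_fill_viridis_c(na.value = "red") +
  coord_quickmap()
```

Printing `p_layered` and `p_na` shows the same three squares; only the way the missing region is drawn differs.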

---

# Geofaceting

---

## Daily US vaccine data by state

```r
us_state_vaccinations <- read_csv(here::here("data/us_state_vaccinations.csv"))
```

```r
us_state_vaccinations
```

```
## # A tibble: 17,501 × 14
##   date       location total_vaccinations total_distributed people_vaccinated
##   <date>     <chr>                 <dbl>             <dbl>             <dbl>
## 1 2021-01-12 Alabama               78134            377025             70861
## 2 2021-01-13 Alabama               84040            378975             74792
## 3 2021-01-14 Alabama               92300            435350             80480
## 4 2021-01-15 Alabama              100567            444650             86956
## 5 2021-01-16 Alabama                  NA                NA                NA
## 6 2021-01-17 Alabama                  NA                NA                NA
## # … with 17,495 more rows, and 9 more variables:
## #   people_fully_vaccinated_per_hundred <dbl>,
## #   total_vaccinations_per_hundred <dbl>, people_fully_vaccinated <dbl>,
## #   people_vaccinated_per_hundred <dbl>, distributed_per_hundred <dbl>,
## #   daily_vaccinations_raw <dbl>, daily_vaccinations <dbl>,
## #   daily_vaccinations_per_million <dbl>, share_doses_used <dbl>
```


---

## Facet by location

.panelset.sideways[
.panel[.panel-name[Code]

```r
ggplot(
  us_state_vaccinations,
  aes(x = date, y = people_fully_vaccinated_per_hundred)
) +
  geom_area() +
  facet_wrap(~location)
```

```
## Warning: Removed 1802 rows containing missing values (position_stack).
```

]
]

---

## Data cleaning

```r
us_state_vaccinations <- us_state_vaccinations %>%
  mutate(location = if_else(location == "New York State", "New York", location)) %>%
  filter(location %in% c(state.name, "District of Columbia"))
```
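
---

## Checking location names

The filter above relies on location names matching R's built-in `state.name` vector. A quick way to see which names would be dropped is `setdiff()`. The location values below are made up for illustration; the real file may contain other non-state entries.

```r
# Hypothetical location values, standing in for
# unique(us_state_vaccinations$location)
locations <- c("Alabama", "New York State", "Guam", "District of Columbia")

# Names the plot should keep: the 50 states plus DC
keep <- c(state.name, "District of Columbia")

# Before recoding, "New York State" would be dropped along with the non-state entry
dropped_before <- setdiff(locations, keep)
dropped_before

# After recoding, only the non-state entry is dropped
locations_clean <- ifelse(locations == "New York State", "New York", locations)
dropped_after <- setdiff(locations_clean, keep)
dropped_after
```

Running this kind of check before filtering helps catch names that differ only in spelling, which would otherwise disappear silently.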

---

## Geofacet by state

Using `geofacet::facet_geo()`:

.panelset.sideways[
.panel[.panel-name[Code]

```r
ggplot(us_state_vaccinations, 
       aes(x = date, y = people_fully_vaccinated_per_hundred)) +
  geom_area() +
* facet_geo(~ location) +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data"
  )
```

```
## Warning: Removed 567 rows containing missing values (position_stack).
```

]
]
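
---

## What `facet_geo()` lays out

`facet_geo()` arranges panels according to a grid data frame. The sketch below peeks at `us_state_grid2`, which appears to be the package's default grid, and checks which of the cleaned location names it covers. Treat the default and the column names as assumptions to verify against the installed version of geofacet.

```r
library(geofacet)

# Each row places one panel: `row` and `col` give the grid cell,
# `code` and `name` identify the state
head(us_state_grid2)

# Location names kept by the data cleaning step
keep <- c(state.name, "District of Columbia")

# Names with no cell in the grid; an empty result means every panel has a place
missing_from_grid <- setdiff(keep, us_state_grid2$name)
missing_from_grid
```

If a name shows up here, it needs recoding before `facet_geo()` can place its panel.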

---

## Geofacet by state, with improvements

.panelset.sideways[
.panel[.panel-name[Plot]
<img src="13-visualize-spatial-II_files/figure-html/unnamed-chunk-24-1.png" width="100%" />
]
.panel[.panel-name[Code]
.midi[

```r
ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) +
  geom_area() +
  facet_geo(~location) +
* scale_y_continuous(
*   limits = c(0, 100),
*   breaks = c(0, 50, 100),
*   minor_breaks = c(25, 75)
*   ) +
* scale_x_date(breaks = c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels = "%b") +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data"
  ) +
  theme(
*   strip.text.x = element_text(size = 7),
*   axis.text = element_text(size = 8),
    plot.title.position = "plot"
  )
```
]
]
]

---

## Bring in 2020 Presidential election results

```r
election_2020 <- read_csv(here::here("data/us-election-2020.csv"))
```

```r
election_2020
```

```
## # A tibble: 51 × 5
##   state      electoal_votes biden trump win       
##   <chr>               <dbl> <dbl> <dbl> <chr>     
## 1 Alabama                 9     0     9 Republican
## 2 Alaska                  3     0     3 Republican
## 3 Arizona                11    11     0 Democrat  
## 4 Arkansas                6     0     6 Republican
## 5 California             55    55     0 Democrat  
## 6 Colorado                9     9     0 Democrat  
## # … with 45 more rows
```

---

## Geofacet by state, color by presidential election result

```r
us_state_vaccinations %>%
  left_join(election_2020, by = c("location" = "state")) %>%
  ggplot(aes(x = date, y = people_fully_vaccinated_per_hundred)) +
* geom_area(aes(fill = win)) +
  facet_geo(~location) +
* scale_y_continuous(limits = c(0, 100), breaks = c(0, 50, 100), minor_breaks = c(25, 75)) +
  scale_x_date(breaks = c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels = "%b") +
* scale_fill_manual(values = c("Democrat" = "#2D69A1", "Republican" = "#BD3028")) +
  labs(
    x = NULL, y = NULL,
    title = "Covid-19 vaccination rate in the US",
    subtitle = "Daily number of people fully vaccinated, per hundred",
    caption = "Source: Our World in Data",
    fill = "2020 Presidential\nElection"
  ) +
  theme(
    strip.text.x = element_text(size = 7),
    axis.text = element_text(size = 8),
    plot.title.position = "plot",
*   legend.position = c(0.93, 0.15),
*   legend.text = element_text(size = 9),
*   legend.title = element_text(size = 11),
*   legend.background = element_rect(color = "gray", size = 0.5)
  )
```
