visualizing uncertainty

# visualizing uncertainty
### 2021-10-27

---

# Welcome

---

## Announcements

- You can now find thank-you cards and ugly ggplots linked from the course website

- Project 1s:
  - Grades updated, well done on resubmissions!
  - Request to link your write-ups (project website) and slides (not the repo, commits, etc.) from the course website; note: your names will be publicly available
  - The request will also ask whether you want to "own" a copy of your Project 1 repo, i.e., a copy whose commit history and issues do not contain scores
  - Need team consensus for both
  - Helpful to highlight your work + show future STA/ISS 313 students what they'll learn in the course

- Watch assigned video before Friday's class

- More on live-coding webinars

---

## Setup

```r
# load packages
library(tidyverse)
library(here)
library(colorspace)
library(cowplot)
library(emmeans)
library(broom)
library(gapminder)
library(emo)     # install_github("hadley/emo")
library(ungeviz) # install_github("wilkelab/ungeviz")

# set default theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16))
update_geom_defaults("point", list(size = 2)) # 2 for full width, 2.5 for half width

# set default figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 8, fig.asp = 0.618, fig.retina = 3, dpi = 300, out.width = "60%"
)

# dplyr print min and max
options(dplyr.print_max = 10, dplyr.print_min = 10)
```

---

## .hand[Let's imagine we're playing a game]

---

## .hand[The odds are in your favor:]
## .hand[You have a 90% chance of winning!]

---

class: center middle
background-image: url("images/Disappearing_dots.gif")
background-size: contain
background-color: #cccccc

???

Image by Wikimedia user [Jahobr](https://commons.wikimedia.org/wiki/User:Jahobr), released into the public domain.

https://commons.wikimedia.org/wiki/File:Disappearing_dots.gif

---

## .hand[Sorry, you lost.] 🙂

---

## .hand[How does that make you feel?]

---

## We are bad at judging uncertainty

* You had a 10% chance of losing

* One in ten people playing this game will lose

* A 90% chance of winning is nowhere near a certain win

---

## It helps to visualize a set of possible outcomes

Possible outcomes from 100 individual games played

---

## Frequency framing

This type of visualization is called **frequency framing**
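A minimal sketch of a frequency-framing plot, assuming we draw exactly 90 wins and 10 losses (the expected split for 100 games at a 90% win chance), shuffle them, and lay them out on a 10 × 10 grid. The seed, grid layout, and styling are arbitrary choices, not the code behind the original figure.

```r
library(tidyverse)

set.seed(2021) # arbitrary seed so the arrangement is reproducible

# 100 games: exactly 90 wins and 10 losses in random order,
# placed row by row on a 10 x 10 grid
games <- tibble(
  game    = 1:100,
  outcome = sample(rep(c("win", "loss"), times = c(90, 10))),
  x       = (game - 1) %% 10,
  y       = (game - 1) %/% 10
)

ggplot(games, aes(x = x, y = y, color = outcome)) +
  geom_point(size = 5) +
  coord_fixed() +
  theme_void()
```

Each dot is one game, so the ten losses show up as concrete outcomes rather than as an abstract 10% chance.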

---

## Visualizing uncertainty of point estimates

- A point estimate is a single number, such as a mean

- Uncertainty is typically expressed as a standard error, a confidence interval, or a credible interval

- Important: the uncertainty of a point estimate is not the same as the variation in the sample
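To make the distinction concrete, here is a small sketch on a hypothetical simulated sample (50 normal draws; the seed and parameters are arbitrary). The sample standard deviation describes how spread out the individual values are; the standard error, and the confidence interval built from it, describe how uncertain the sample mean is.

```r
set.seed(313) # arbitrary seed

# hypothetical sample: 50 draws from a normal distribution
x <- rnorm(50, mean = 10, sd = 2)
n <- length(x)

s  <- sd(x)         # variation in the sample (spread of individual values)
se <- s / sqrt(n)   # uncertainty of the sample mean (standard error)

# 95% confidence interval for the mean, using the t distribution
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

c(mean = mean(x), sd = s, se = se)
ci
```

Collecting more data shrinks the standard error roughly like 1/√n, while the sample standard deviation settles around the population value instead of shrinking.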

---

## Key concepts of statistical sampling

.center[
<img src="16-visualize-uncertainty_files/figure-html/sampling-schematic1-1.png" width="60%" />
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Key concepts of statistical sampling

.center[
<img src="16-visualize-uncertainty_files/figure-html/sampling-schematic2-1.png" width="60%" />
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Key concepts of statistical sampling

.center[
<img src="16-visualize-uncertainty_files/figure-html/sampling-schematic3-1.png" width="60%" />
]
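The sampling idea can be checked with a small simulation. This sketch assumes a hypothetical normal population and draws 2,000 samples of size 25 (all values arbitrary). The standard deviation of the sample means should come out close to the theoretical standard error σ/√n.

```r
set.seed(42) # arbitrary seed

# hypothetical population of 100,000 values
population  <- rnorm(1e5, mean = 50, sd = 10)
sample_size <- 25
n_samples   <- 2000

# draw many samples and record the mean of each one
sample_means <- replicate(n_samples, mean(sample(population, sample_size)))

# spread of the sampling distribution vs. the theoretical standard error
c(
  simulated = sd(sample_means),
  theory    = sd(population) / sqrt(sample_size)
)
```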

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Confidence intervals

> We are 95% confident that the confidence interval includes the true population mean.

.panel[.panel-name[Discuss]
<iframe src="https://app.sli.do/event/rxg9buzy" height="100%" width="100%" frameBorder="0" style="min-height: 560px;" title="Slido"></iframe>
]

---

## Frequentist interpretation of a confidence interval
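The frequentist interpretation can be simulated directly. This sketch assumes a normal population with a known true mean (possible only because we simulate it). It repeatedly draws a sample, computes a 95% t-interval with `t.test()`, and records whether that interval contains the true mean.

```r
set.seed(2021) # arbitrary seed

true_mean   <- 50    # known only because we simulate the population
sample_size <- 20
n_intervals <- 1000

# repeat: draw a sample, compute a 95% t-interval, check whether it
# contains the true mean
covered <- replicate(n_intervals, {
  x  <- rnorm(sample_size, mean = true_mean, sd = 10)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# long-run proportion of intervals that contain the true mean
mean(covered)
```

The 95% describes the procedure. Roughly 95% of intervals built this way contain the true mean, while any single interval either contains it or does not.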

???

Figure from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Everest

```r
everest <- read_csv(here::here("16-visualize-uncertainty", "data/everest.csv"))
```

```r
everest
```

```
## # A tibble: 21,813 × 21
##    expedition_id member_id    peak_id peak_name  year season sex     age
##    <chr>         <chr>        <chr>   <chr>     <dbl> <chr>  <chr> <dbl>
##  1 EVER63101     EVER63101-03 EVER    Everest    1963 Spring M        36
##  2 EVER63101     EVER63101-04 EVER    Everest    1963 Spring M        31
##  3 EVER63101     EVER63101-05 EVER    Everest    1963 Spring M        27
##  4 EVER63101     EVER63101-06 EVER    Everest    1963 Spring M        26
##  5 EVER63101     EVER63101-07 EVER    Everest    1963 Spring M        26
##  6 EVER63101     EVER63101-08 EVER    Everest    1963 Spring M        29
##  7 EVER63101     EVER63101-01 EVER    Everest    1963 Spring M        44
##  8 EVER63101     EVER63101-09 EVER    Everest    1963 Spring M        37
##  9 EVER63101     EVER63101-10 EVER    Everest    1963 Spring M        32
## 10 EVER63101     EVER63101-11 EVER    Everest    1963 Spring M        26
## # … with 21,803 more rows, and 13 more variables: citizenship <chr>,
## #   expedition_role <chr>, hired <lgl>, highpoint_metres <dbl>, success <lgl>,
## #   solo <lgl>, oxygen_used <lgl>, died <lgl>, death_cause <chr>,
## #   death_height_metres <dbl>, injured <lgl>, injury_type <chr>,
## #   injury_height_metres <dbl>
```

---

## Example: Highest point reached on Everest in 2019

Includes only climbers and expedition members who **did not** summit

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest-highest-point-1.png" width="70%" />
]

---

## Marginal effects example: Height reached on Everest

Average height reached relative to:<br>
a male climber who climbed with oxygen, summited, and survived

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins-1.png" width="70%" />
]
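The Everest model is not shown in these slides, so here is a hedged sketch of the general idea on *simulated* data (not the Everest data). It fits a linear model with factor predictors whose first levels define the reference climber (male, used oxygen, summited, survived). Each coefficient is then the average difference in height from that reference, and `broom::tidy(conf.int = TRUE)` supplies 95% intervals to plot. All variable names and effect sizes below are made up for illustration.

```r
library(tidyverse)
library(broom)

set.seed(1) # arbitrary seed
n <- 400

# hypothetical simulated climbers; the first level of each factor is the
# reference group: male, used oxygen, summited, survived
climbers <- tibble(
  sex      = factor(sample(c("M", "F"), n, replace = TRUE), levels = c("M", "F")),
  oxygen   = factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no")),
  summited = factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no")),
  died     = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.9, 0.1)),
                    levels = c("no", "yes"))
) %>%
  mutate(
    # made-up effects plus noise, for illustration only
    height = 8000 - 100 * (sex == "F") - 400 * (oxygen == "no") -
      600 * (summited == "no") - 300 * (died == "yes") + rnorm(n, sd = 250)
  )

fit <- lm(height ~ sex + oxygen + summited + died, data = climbers)

# each coefficient: average difference from the reference climber, with 95% CI
effects <- tidy(fit, conf.int = TRUE) %>%
  filter(term != "(Intercept)")

ggplot(effects, aes(x = estimate, xmin = conf.low, xmax = conf.high, y = term)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange() +
  labs(x = "Difference in height reached (m)", y = NULL)
```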

---

## Marginal effects example: Height reached on Everest

Other visualization options: half-eye

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins2-1.png" width="70%" />
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: gradient interval

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins3-1.png" width="70%" />
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: quantile dotplot

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins4-1.png" width="70%" />
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: quantile dotplot

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins5-1.png" width="70%" />
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: quantile dotplot

.center[
<img src="16-visualize-uncertainty_files/figure-html/everest_margins6-1.png" width="70%" />
]

---

## Making a plot with error bars in R

Example: Relationship between life expectancy and GDP per capita

.center[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-regressions-1.png" width="70%" />
]

---

## Making a plot with error bars in R

Example: Relationship between life expectancy and GDP per capita

.pull-left[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-regressions2-1.png" width="100%" />
]

.pull-right[
<br>
<img src="16-visualize-uncertainty_files/figure-html/gapminder-summary-1.png" width="100%" />

]

---

## Gapminder

See [gapminder.org](https://www.gapminder.org/) for fantastic visualizations and up-to-date data

```r
gapminder
```

```
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year))

lm_data
```

```
## # A tibble: 60 × 3
##    continent  year data             
##    <fct>     <int> <list>           
##  1 Asia       1952 <tibble [33 × 4]>
##  2 Asia       1957 <tibble [33 × 4]>
##  3 Asia       1962 <tibble [33 × 4]>
##  4 Asia       1967 <tibble [33 × 4]>
##  5 Asia       1972 <tibble [33 × 4]>
##  6 Asia       1977 <tibble [33 × 4]>
##  7 Asia       1982 <tibble [33 × 4]>
##  8 Asia       1987 <tibble [33 × 4]>
##  9 Asia       1992 <tibble [33 × 4]>
## 10 Asia       1997 <tibble [33 × 4]>
## # … with 50 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x))
  )

lm_data
```

```
## # A tibble: 60 × 4
##    continent  year data              fit   
##    <fct>     <int> <list>            <list>
##  1 Asia       1952 <tibble [33 × 4]> <lm>  
##  2 Asia       1957 <tibble [33 × 4]> <lm>  
##  3 Asia       1962 <tibble [33 × 4]> <lm>  
##  4 Asia       1967 <tibble [33 × 4]> <lm>  
##  5 Asia       1972 <tibble [33 × 4]> <lm>  
##  6 Asia       1977 <tibble [33 × 4]> <lm>  
##  7 Asia       1982 <tibble [33 × 4]> <lm>  
##  8 Asia       1987 <tibble [33 × 4]> <lm>  
##  9 Asia       1992 <tibble [33 × 4]> <lm>  
## 10 Asia       1997 <tibble [33 × 4]> <lm>  
## # … with 50 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  )

lm_data
```

```
## # A tibble: 60 × 5
##    continent  year data              fit    tidy_out        
##    <fct>     <int> <list>            <list> <list>          
##  1 Asia       1952 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  2 Asia       1957 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  3 Asia       1962 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  4 Asia       1967 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  5 Asia       1972 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  6 Asia       1977 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  7 Asia       1982 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  8 Asia       1987 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
##  9 Asia       1992 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
## 10 Asia       1997 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
## # … with 50 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) %>%
  unnest(cols = tidy_out)

lm_data
```

```
## # A tibble: 120 × 9
##    continent  year data      fit    term    estimate std.error statistic p.value
##    <fct>     <int> <list>    <list> <chr>      <dbl>     <dbl>     <dbl>   <dbl>
##  1 Asia       1952 <tibble … <lm>   (Inter…    15.8       9.27      1.71 9.78e-2
##  2 Asia       1952 <tibble … <lm>   log(gd…     4.16      1.25      3.33 2.28e-3
##  3 Asia       1957 <tibble … <lm>   (Inter…    18.1       9.70      1.86 7.20e-2
##  4 Asia       1957 <tibble … <lm>   log(gd…     4.17      1.28      3.26 2.71e-3
##  5 Asia       1962 <tibble … <lm>   (Inter…    16.6       9.52      1.74 9.11e-2
##  6 Asia       1962 <tibble … <lm>   log(gd…     4.59      1.24      3.72 7.94e-4
##  7 Asia       1967 <tibble … <lm>   (Inter…    19.8       9.05      2.19 3.64e-2
##  8 Asia       1967 <tibble … <lm>   log(gd…     4.50      1.15      3.90 4.77e-4
##  9 Asia       1972 <tibble … <lm>   (Inter…    21.9       8.14      2.69 1.13e-2
## 10 Asia       1972 <tibble … <lm>   log(gd…     4.44      1.01      4.41 1.16e-4
## # … with 110 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- lm_data %>%
  select(-fit, -data)

lm_data
```

```
## # A tibble: 120 × 7
##    continent  year term           estimate std.error statistic  p.value
##    <fct>     <int> <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 Asia       1952 (Intercept)       15.8       9.27      1.71 0.0978  
##  2 Asia       1952 log(gdpPercap)     4.16      1.25      3.33 0.00228 
##  3 Asia       1957 (Intercept)       18.1       9.70      1.86 0.0720  
##  4 Asia       1957 log(gdpPercap)     4.17      1.28      3.26 0.00271 
##  5 Asia       1962 (Intercept)       16.6       9.52      1.74 0.0911  
##  6 Asia       1962 log(gdpPercap)     4.59      1.24      3.72 0.000794
##  7 Asia       1967 (Intercept)       19.8       9.05      2.19 0.0364  
##  8 Asia       1967 log(gdpPercap)     4.50      1.15      3.90 0.000477
##  9 Asia       1972 (Intercept)       21.9       8.14      2.69 0.0113  
## 10 Asia       1972 log(gdpPercap)     4.44      1.01      4.41 0.000116
## # … with 110 more rows
```

---

## Making a plot with error bars in R

```r
lm_data <- lm_data %>%
  filter(term != "(Intercept)", continent != "Oceania")

lm_data
```

```
## # A tibble: 48 × 7
##    continent  year term           estimate std.error statistic       p.value
##    <fct>     <int> <chr>             <dbl>     <dbl>     <dbl>         <dbl>
##  1 Asia       1952 log(gdpPercap)     4.16     1.25       3.33 0.00228      
##  2 Asia       1957 log(gdpPercap)     4.17     1.28       3.26 0.00271      
##  3 Asia       1962 log(gdpPercap)     4.59     1.24       3.72 0.000794     
##  4 Asia       1967 log(gdpPercap)     4.50     1.15       3.90 0.000477     
##  5 Asia       1972 log(gdpPercap)     4.44     1.01       4.41 0.000116     
##  6 Asia       1977 log(gdpPercap)     4.87     1.03       4.75 0.0000442    
##  7 Asia       1982 log(gdpPercap)     4.78     0.852      5.61 0.00000377   
##  8 Asia       1987 log(gdpPercap)     5.17     0.727      7.12 0.0000000531 
##  9 Asia       1992 log(gdpPercap)     5.09     0.649      7.84 0.00000000760
## 10 Asia       1997 log(gdpPercap)     5.11     0.628      8.15 0.00000000335
## # … with 38 more rows
```

---

## Making a plot with error bars in R

.small.pull-left[

```r
ggplot(lm_data) +
  aes(
    x = year, y = estimate,
    # approximate 95% CI: estimate ± 1.96 standard errors
    ymin = estimate - 1.96*std.error,
    ymax = estimate + 1.96*std.error,
    color = continent
  ) +
  geom_pointrange(
    position = position_dodge(width = 1)
  ) +
  scale_x_continuous(
    breaks = unique(gapminder$year)
  ) + 
  theme(legend.position = "top")
```
]

.pull-right[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-model-out-1.png" width="100%" />
]
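The ± 1.96 standard errors above is a normal approximation. With only a few dozen countries per continent-year fit, t-based intervals are slightly wider. A possible variant, not taken from the original slides, asks `broom::tidy()` for these intervals directly via `conf.int = TRUE`, which adds `conf.low` and `conf.high` columns:

```r
library(tidyverse)
library(broom)
library(gapminder)

# same nest/map pipeline as above, but with t-based 95% CIs from broom
lm_ci <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy, conf.int = TRUE)
  ) %>%
  unnest(cols = tidy_out) %>%
  select(-fit, -data) %>%
  filter(term != "(Intercept)", continent != "Oceania")

ggplot(lm_ci) +
  aes(x = year, y = estimate, ymin = conf.low, ymax = conf.high, color = continent) +
  geom_pointrange(position = position_dodge(width = 1))
```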

???

Figure and code idea from [Kieran Healy. Data Visualization: A practical introduction. Princeton University Press, 2019.](https://socviz.co/)

---

## Half-eyes, gradient intervals, etc.

The **ggdist** package provides many different visualizations of uncertainty

.small.pull-left[

```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = 
      fct_reorder(continent, estimate) 
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_halfeye(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4
  )
```
]

.pull-right[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-halfeye-out-1.png" width="100%" />
]

---

## Half-eyes, gradient intervals, etc.

The **ggdist** package provides many different visualizations of uncertainty

.small.pull-left[

```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = 
      fct_reorder(continent, estimate) 
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_gradientinterval(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4,
    fill = "skyblue"
  )
```
]

.pull-right[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-gradinterval-out-1.png" width="100%" />
]

---

## Half-eyes, gradient intervals, etc.

The **ggdist** package provides many different visualizations of uncertainty

.small.pull-left[

```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = 
      fct_reorder(continent, estimate) 
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_dotsinterval(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4,
    fill = "skyblue",
    quantiles = 20
  )
```
]

.pull-right[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-quantiledots-out-1.png" width="100%" />
]
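A quantile dotplot represents a distribution as a fixed number of equally likely outcomes. Here is a rough sketch of the idea, assuming evenly spaced probabilities (i − 0.5)/n (ggdist's exact choice of quantiles may differ). The 20 dots for the Asia 1952 slope (estimate 4.16, standard error 1.25 in `lm_data`) are then just 20 quantiles of a normal distribution:

```r
# Asia 1952 slope estimate and standard error, as printed in lm_data
mu     <- 4.16
sigma  <- 1.25
n_dots <- 20

# evenly spaced probabilities; each dot stands for 1/20 = 5% of the distribution
probs <- (seq_len(n_dots) - 0.5) / n_dots
dots  <- qnorm(probs, mean = mu, sd = sigma)

round(dots, 2)

# frequency-style reading: how many of the 20 dots fall below zero?
sum(dots < 0)
```

None of the 20 dots falls below zero here. That is the dotplot's way of saying that, under this normal approximation, a negative slope is unlikely.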

---

## Half-eyes, gradient intervals, etc.

The **ggdist** package provides many different visualizations of uncertainty

.small.pull-left[

```r
library(ggdist)
library(distributional) # for dist_normal()
```
]

.pull-right[
<img src="16-visualize-uncertainty_files/figure-html/gapminder-quantiledots2-out-1.png" width="100%" />
]

---

## Further reading and acknowledgements

- Acknowledgements: Slides from [Visualizing uncertainty](https://wilkelab.org/SDS375/slides/visualizing-uncertainty.html) by Claus Wilke
- Further reading
  - Fundamentals of Data Visualization: [Chapter 16: Visualizing uncertainty](https://clauswilke.com/dataviz/visualizing-uncertainty.html)
  - Data Visualization: A Practical Introduction: [Chapter 6.6: Grouped analysis and list columns](https://socviz.co/modeling.html#grouped-analysis-and-list-columns)
  - Data Visualization: A Practical Introduction: [Chapter 6.7: Plot marginal effects](https://socviz.co/modeling.html#plot-marginal-effects)
  - **ggdist** reference documentation: https://mjskay.github.io/ggdist/index.html
  - **ggdist** vignette: [Frequentist uncertainty visualization](https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html)