class: center, middle, inverse, title-slide # visualizing uncertainty ### 2021-10-27 --- class: middle, inverse # Welcome --- ## Announcements - You can now find thank you cards and ugly ggplots linked from the course website - Project 1s: - Grades updated, well done on resubmissions! - Request to link to writeups (project website) and slides (not repo, commits, etc.) from the course website -- note: your names will be publicly available - Request will also include question about whether you want to "own" a copy of your Project 1 repo, i.e. make commit history and issues not containing scores - Need team consensus for both - Helpful to highlight your work + show future STA/ISS 313 students what they'll learn in the course - Watch assigned video before Friday's class - More on live-coding webinars --- ## Setup .midi[ ```r # load packages library(tidyverse) library(here) library(colorspace) library(cowplot) library(emmeans) library(broom) library(gapminder) library(emo) # install_github("hadley/emo") library(ungeviz) # install_github("wilkelab/ungeviz") # set default theme for ggplot2 ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16)) update_geom_defaults("point", list(size = 2)) # 2 for full width, 2.5 for half width # set default figure parameters for knitr knitr::opts_chunk$set( fig.width = 8, fig.asp = 0.618, fig.retina = 3, dpi = 300, out.width = "60%" ) # dplyr print min and max options(dplyr.print_max = 10, dplyr.print_min = 10) ``` ] --- class: middle ## .hand[Let's imagine we're playing a game] --- class: middle ## .hand[The odds are in your favor:] ## .hand[You have a 90% chance of winning!] --- class: center middle background-image: url("images/Disappearing_dots.gif") background-size: contain background-color: #cccccc <style> .move-down { margin-top: -6em; } </style> .move-down[ # playing ] ??? Image by Wikiemdia user [Jahobr](https://commons.wikimedia.org/wiki/User:Jahobr), released into the public domain. https://commons.wikimedia.org/wiki/File:Disappearing_dots.gif --- class: middle ## .hand[Sorry, you lost.] 🙂 --- class: middle ## .hand[How does that make you feel?] --- ## We are bad at judging uncertainty -- * You had a 10% chance of losing -- * One in ten playing this game will lose -- * 90% chance of winning is nowhere near a certain win --- ## It helps to visualize a set of possible outcomes Possible outcomes from 100 individual games played .center[ <img src="16-visualize-uncertainty_files/figure-html/freq-waffle-1.png" width="45%" /> ] --- ## Frequency framing This type of visualization is called **frequency framing** .center[ <img src="16-visualize-uncertainty_files/figure-html/freq-waffle2-1.png" width="90%" /> ] --- ## Visualizing uncertainty of point estimates -- - A point estimate is a single number, such as a mean -- - Uncertainty is expressed as standard error, confidence interval, or credible interval -- - Important: Uncertainty of a point estimate != variation in the sample --- ## Key concepts of statistical sampling .center[ <img src="16-visualize-uncertainty_files/figure-html/sampling-schematic1-1.png" width="60%" /> ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Key concepts of statistical sampling .center[ <img src="16-visualize-uncertainty_files/figure-html/sampling-schematic2-1.png" width="60%" /> ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Key concepts of statistical sampling .center[ <img src="16-visualize-uncertainty_files/figure-html/sampling-schematic3-1.png" width="60%" /> ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Confidence intervals .task[ What does "95% confident" mean in the following sentence? ] .panelset[ .panel[.panel-name[Sentence] > We are 95% confident that the confidence interval includes the true population mean. ] .panel[.panel-name[Discuss] <iframe src="https://app.sli.do/event/rxg9buzy" height="100%" width="100%" frameBorder="0" style="min-height: 560px;" title="Slido"></iframe> ] ] --- ## Frequentist interpretation of a confidence interval <img src="16-visualize-uncertainty_files/figure-html/unnamed-chunk-2-1.png" width="50%" style="display: block; margin: auto;" /> ??? Figure from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Everest ```r everest <- read_csv(here::here("16-visualize-uncertainty", "data/everest.csv")) ``` .small[ ```r everest ``` ``` ## # A tibble: 21,813 × 21 ## expedition_id member_id peak_id peak_name year season sex age ## <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> ## 1 EVER63101 EVER63101-03 EVER Everest 1963 Spring M 36 ## 2 EVER63101 EVER63101-04 EVER Everest 1963 Spring M 31 ## 3 EVER63101 EVER63101-05 EVER Everest 1963 Spring M 27 ## 4 EVER63101 EVER63101-06 EVER Everest 1963 Spring M 26 ## 5 EVER63101 EVER63101-07 EVER Everest 1963 Spring M 26 ## 6 EVER63101 EVER63101-08 EVER Everest 1963 Spring M 29 ## 7 EVER63101 EVER63101-01 EVER Everest 1963 Spring M 44 ## 8 EVER63101 EVER63101-09 EVER Everest 1963 Spring M 37 ## 9 EVER63101 EVER63101-10 EVER Everest 1963 Spring M 32 ## 10 EVER63101 EVER63101-11 EVER Everest 1963 Spring M 26 ## # … with 21,803 more rows, and 13 more variables: citizenship <chr>, ## # expedition_role <chr>, hired <lgl>, highpoint_metres <dbl>, success <lgl>, ## # solo <lgl>, oxygen_used <lgl>, died <lgl>, death_cause <chr>, ## # death_height_metres <dbl>, injured <lgl>, injury_type <chr>, ## # injury_height_metres <dbl> ``` ] --- ## Example: Highest point reached on Everest in 2019 Includes only climbers and expedition members who **did not** summit .center[ <img src="16-visualize-uncertainty_files/figure-html/everest-highest-point-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest -- Average height reached relative to:<br> a male climber who climbed with oxygen, summited, and survived -- .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest Other visualization options: half-eye .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins2-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest Other visualization options: gradient interval .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins3-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest Other visualization options: quantile dotplot .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins4-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest Other visualization options: quantile dotplot .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins5-1.png" width="70%" /> ] --- ## Marginal effects example: Height reached on Everest Other visualization options: quantile dotplot .center[ <img src="16-visualize-uncertainty_files/figure-html/everest_margins6-1.png" width="70%" /> ] --- ## Making a plot with error bars in R Example: Relationship between life expectancy and GDP per capita .center[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-regressions-1.png" width="70%" /> ] --- ## Making a plot with error bars in R Example: Relationship between life expectancy and GDP per capita .pull-left[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-regressions2-1.png" width="100%" /> ] .pull-right[ <br> <img src="16-visualize-uncertainty_files/figure-html/gapminder-summary-1.png" width="100%" /> ] --- ## Gapminder See [gapminder.org](https://www.gapminder.org/) for fantastic visualizations and up-to-date data ```r gapminder ``` ``` ## # A tibble: 1,704 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) lm_data ``` ``` ## # A tibble: 60 × 3 ## continent year data ## <fct> <int> <list> ## 1 Asia 1952 <tibble [33 × 4]> ## 2 Asia 1957 <tibble [33 × 4]> ## 3 Asia 1962 <tibble [33 × 4]> ## 4 Asia 1967 <tibble [33 × 4]> ## 5 Asia 1972 <tibble [33 × 4]> ## 6 Asia 1977 <tibble [33 × 4]> ## 7 Asia 1982 <tibble [33 × 4]> ## 8 Asia 1987 <tibble [33 × 4]> ## 9 Asia 1992 <tibble [33 × 4]> ## 10 Asia 1997 <tibble [33 × 4]> ## # … with 50 more rows ``` ] --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) %>% mutate( fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)) ) lm_data ``` ``` ## # A tibble: 60 × 4 ## continent year data fit ## <fct> <int> <list> <list> ## 1 Asia 1952 <tibble [33 × 4]> <lm> ## 2 Asia 1957 <tibble [33 × 4]> <lm> ## 3 Asia 1962 <tibble [33 × 4]> <lm> ## 4 Asia 1967 <tibble [33 × 4]> <lm> ## 5 Asia 1972 <tibble [33 × 4]> <lm> ## 6 Asia 1977 <tibble [33 × 4]> <lm> ## 7 Asia 1982 <tibble [33 × 4]> <lm> ## 8 Asia 1987 <tibble [33 × 4]> <lm> ## 9 Asia 1992 <tibble [33 × 4]> <lm> ## 10 Asia 1997 <tibble [33 × 4]> <lm> ## # … with 50 more rows ``` ] --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) %>% mutate( fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)), tidy_out = map(fit, tidy) ) lm_data ``` ``` ## # A tibble: 60 × 5 ## continent year data fit tidy_out ## <fct> <int> <list> <list> <list> ## 1 Asia 1952 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 2 Asia 1957 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 3 Asia 1962 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 4 Asia 1967 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 5 Asia 1972 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 6 Asia 1977 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 7 Asia 1982 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 8 Asia 1987 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 9 Asia 1992 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## 10 Asia 1997 <tibble [33 × 4]> <lm> <tibble [2 × 5]> ## # … with 50 more rows ``` ] --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) %>% mutate( fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)), tidy_out = map(fit, tidy) ) %>% unnest(cols = tidy_out) lm_data ``` ``` ## # A tibble: 120 × 9 ## continent year data fit term estimate std.error statistic p.value ## <fct> <int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Asia 1952 <tibble … <lm> (Inter… 15.8 9.27 1.71 9.78e-2 ## 2 Asia 1952 <tibble … <lm> log(gd… 4.16 1.25 3.33 2.28e-3 ## 3 Asia 1957 <tibble … <lm> (Inter… 18.1 9.70 1.86 7.20e-2 ## 4 Asia 1957 <tibble … <lm> log(gd… 4.17 1.28 3.26 2.71e-3 ## 5 Asia 1962 <tibble … <lm> (Inter… 16.6 9.52 1.74 9.11e-2 ## 6 Asia 1962 <tibble … <lm> log(gd… 4.59 1.24 3.72 7.94e-4 ## 7 Asia 1967 <tibble … <lm> (Inter… 19.8 9.05 2.19 3.64e-2 ## 8 Asia 1967 <tibble … <lm> log(gd… 4.50 1.15 3.90 4.77e-4 ## 9 Asia 1972 <tibble … <lm> (Inter… 21.9 8.14 2.69 1.13e-2 ## 10 Asia 1972 <tibble … <lm> log(gd… 4.44 1.01 4.41 1.16e-4 ## # … with 110 more rows ``` ] --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) %>% mutate( fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)), tidy_out = map(fit, tidy) ) %>% unnest(cols = tidy_out) %>% select(-fit, -data) lm_data ``` ``` ## # A tibble: 120 × 7 ## continent year term estimate std.error statistic p.value ## <fct> <int> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Asia 1952 (Intercept) 15.8 9.27 1.71 0.0978 ## 2 Asia 1952 log(gdpPercap) 4.16 1.25 3.33 0.00228 ## 3 Asia 1957 (Intercept) 18.1 9.70 1.86 0.0720 ## 4 Asia 1957 log(gdpPercap) 4.17 1.28 3.26 0.00271 ## 5 Asia 1962 (Intercept) 16.6 9.52 1.74 0.0911 ## 6 Asia 1962 log(gdpPercap) 4.59 1.24 3.72 0.000794 ## 7 Asia 1967 (Intercept) 19.8 9.05 2.19 0.0364 ## 8 Asia 1967 log(gdpPercap) 4.50 1.15 3.90 0.000477 ## 9 Asia 1972 (Intercept) 21.9 8.14 2.69 0.0113 ## 10 Asia 1972 log(gdpPercap) 4.44 1.01 4.41 0.000116 ## # … with 110 more rows ``` ] --- ## Making a plot with error bars in R .small[ ```r lm_data <- gapminder %>% nest(data = -c(continent, year)) %>% mutate( fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)), tidy_out = map(fit, tidy) ) %>% unnest(cols = tidy_out) %>% select(-fit, -data) %>% filter(term != "(Intercept)", continent != "Oceania") lm_data ``` ``` ## # A tibble: 48 × 7 ## continent year term estimate std.error statistic p.value ## <fct> <int> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Asia 1952 log(gdpPercap) 4.16 1.25 3.33 0.00228 ## 2 Asia 1957 log(gdpPercap) 4.17 1.28 3.26 0.00271 ## 3 Asia 1962 log(gdpPercap) 4.59 1.24 3.72 0.000794 ## 4 Asia 1967 log(gdpPercap) 4.50 1.15 3.90 0.000477 ## 5 Asia 1972 log(gdpPercap) 4.44 1.01 4.41 0.000116 ## 6 Asia 1977 log(gdpPercap) 4.87 1.03 4.75 0.0000442 ## 7 Asia 1982 log(gdpPercap) 4.78 0.852 5.61 0.00000377 ## 8 Asia 1987 log(gdpPercap) 5.17 0.727 7.12 0.0000000531 ## 9 Asia 1992 log(gdpPercap) 5.09 0.649 7.84 0.00000000760 ## 10 Asia 1997 log(gdpPercap) 5.11 0.628 8.15 0.00000000335 ## # … with 38 more rows ``` ] --- ## Making a plot with error bars in R .small.pull-left[ ```r ggplot(lm_data) + aes( x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96*std.error, color = continent ) + geom_pointrange( position = position_dodge(width = 1) ) + scale_x_continuous( breaks = unique(gapminder$year) ) + theme(legend.position = "top") ``` ] .pull-right[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-model-out-1.png" width="100%" /> ] ??? Figure and code idea from [Kieran Healy. Data Visualization: A practical introduction. Princeton University Press, 2019.](https://socviz.co/) --- ## Half-eyes, gradient intervals, etc. -- The **ggdist** package provides many different visualizations of uncertainty -- .small.pull-left[ ```r library(ggdist) library(distributional) # for dist_normal() lm_data %>% filter(year == 1952) %>% mutate( continent = fct_reorder(continent, estimate) ) %>% ggplot(aes(x = estimate, y = continent)) + stat_dist_halfeye( aes(dist = dist_normal( mu = estimate, sigma = std.error )), point_size = 4 ) ``` ] .pull-right[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-halfeye-out-1.png" width="100%" /> ] --- ## Half-eyes, gradient intervals, etc. The **ggdist** package provides many different visualizations of uncertainty .small.pull-left[ ```r library(ggdist) library(distributional) # for dist_normal() lm_data %>% filter(year == 1952) %>% mutate( continent = fct_reorder(continent, estimate) ) %>% ggplot(aes(x = estimate, y = continent)) + stat_dist_gradientinterval( aes(dist = dist_normal( mu = estimate, sigma = std.error )), point_size = 4, fill = "skyblue" ) ``` ] .pull-right[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-gradinterval-out-1.png" width="100%" /> ] --- ## Half-eyes, gradient intervals, etc. The **ggdist** package provides many different visualizations of uncertainty .small.pull-left[ ```r library(ggdist) library(distributional) # for dist_normal() lm_data %>% filter(year == 1952) %>% mutate( continent = fct_reorder(continent, estimate) ) %>% ggplot(aes(x = estimate, y = continent)) + stat_dist_dotsinterval( aes(dist = dist_normal( mu = estimate, sigma = std.error )), point_size = 4, fill = "skyblue", quantiles = 20 ) ``` ] .pull-right[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-quantiledots-out-1.png" width="100%" /> ] --- ## Half-eyes, gradient intervals, etc. The **ggdist** package provides many different visualizations of uncertainty .small.pull-left[ ```r library(ggdist) library(distributional) # for dist_normal() lm_data %>% filter(year == 1952) %>% mutate( continent = fct_reorder(continent, estimate) ) %>% ggplot(aes(x = estimate, y = continent)) + stat_dist_dotsinterval( aes(dist = dist_normal( mu = estimate, sigma = std.error )), point_size = 4, fill = "skyblue", quantiles = 10 ) ``` ] .pull-right[ <img src="16-visualize-uncertainty_files/figure-html/gapminder-quantiledots2-out-1.png" width="100%" /> ] --- ## Further reading and acknowledgements - Acknowledgements: Slides from [Visualizing uncertainty](https://wilkelab.org/SDS375/slides/visualizing-uncertainty.html) by Claus Wilke - Further reading - Fundamentals of Data Visualization: [Chapter 16: Visualizing uncertainty](https://clauswilke.com/dataviz/visualizing-uncertainty.html) - Data Visualization—A Practical Introduction: [Chapter 6.6: Grouped analysis and list columns](https://socviz.co/modeling.html#grouped-analysis-and-list-columns) - Data Visualization—A Practical Introduction: [Chapter 6.7: Plot marginal effects](https://socviz.co/modeling.html#plot-marginal-effects) - **ggdist** reference documentation: https://mjskay.github.io/ggdist/index.html - **ggdist** vignette: [Frequentist uncertainty visualization](https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html)