Hands-on Exercise 4.3 - Visualising Uncertainty

Author

Nguyen Nguyen Ha (Summer)

Published

May 6, 2025

Modified

May 6, 2025

1. Learning Outcome

Visualizing uncertainty is a crucial aspect of statistical analysis. This chapter provides hands-on experience in creating informative statistical graphics that effectively communicate uncertainty, including:

Plot statistical error bars: Use ggplot2
Create interactive error bars: Combine ggplot2, plotly, crosstalk and DT
Visualise distributions and uncertainty: Use the ggdist package
Generate Hypothetical Outcome Plots (HOPs): Use the ungeviz package

2. Getting Started

2.1. Installing and loading the packages

For this exercise, we will use the following R packages:

tidyverse: A collection of R packages for data science tasks, including data manipulation, transformation, and visualization.
plotly: For creating interactive and dynamic plots.
gganimate: For generating animated plots that show changes over time or across different conditions.
DT: For displaying interactive HTML tables, with features such as sorting, filtering, and searching.
crosstalk: For cross-widget interactions, such as linked brushing and filtering, to explore data relationships across different visualizations.
ggdist: For visualizing distributions and uncertainty.
ungeviz: For uncertainty visualization, including Hypothetical Outcome Plots (HOPs).
ggridges: For ridgeline plots of distributions.
colorspace: For constructing and manipulating colour palettes.

ungeviz is not available on CRAN, so it is installed once from GitHub:

devtools::install_github("wilkelab/ungeviz")

All packages are then loaded with p_load() from the pacman package, which also installs any missing CRAN packages:

pacman::p_load(ungeviz, plotly, crosstalk,
               DT, ggdist, ggridges,
               colorspace, gganimate, tidyverse)

2.2. Data import

The dataset Exam_data.csv is used in this exercise. It contains the year-end examination scores of a cohort of Primary 3 students from a local school and is stored in CSV format.

The read_csv() function from the readr package, part of the tidyverse, is used to import the file:

exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

📋 Preview of the data:

ID	CLASS	GENDER	RACE	ENGLISH	MATHS	SCIENCE
Student321	3I	Male	Malay	21	9	15
Student305	3I	Female	Malay	24	22	16
Student289	3H	Male	Chinese	26	16	16
Student227	3F	Male	Chinese	27	77	31
Student318	3I	Male	Malay	27	11	25
Student306	3I	Female	Malay	31	16	16

3. Visualizing the uncertainty of point estimates: ggplot2 methods

A point estimate is a single value, such as a mean, while uncertainty is represented by measures like standard error, confidence interval, or credible interval.

Important

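It’s crucial to distinguish the uncertainty of a point estimate from the variation within a sample. The standard deviation describes how widely individual scores are spread; the standard error describes how precisely the mean is estimated, and it shrinks as the sample size grows.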
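The difference between the two can be made concrete with a small simulation. The sketch below draws scores from a hypothetical normal population (mean 70 and sd 15 are assumed values chosen for illustration, not taken from Exam_data) and compares the sample standard deviation with the standard error of the mean, sd/sqrt(n), as the sample grows. It uses base R only.

```r
# Illustration only: scores drawn from a hypothetical normal population
# (mean 70, sd 15 are assumed values, not taken from Exam_data)
set.seed(2025)

sample_sizes <- c(10, 100, 1000)

summary_by_n <- t(sapply(sample_sizes, function(n) {
  scores <- rnorm(n, mean = 70, sd = 15)
  c(n  = n,
    sd = sd(scores),             # spread of individual scores
    se = sd(scores) / sqrt(n))   # precision of the sample mean
}))

print(round(summary_by_n, 2))
```

The sd column stays close to 15 at every sample size, while the se column shrinks roughly in proportion to 1/sqrt(n).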

In this section, we explore how to plot error bars for math scores by race using data from the exam tibble.

First, the code chunk below calculates the necessary summary statistics.

my_sum <- exam %>%
  group_by(RACE) %>%
  summarise(
    n=n(),
    mean=mean(MATHS),
    sd=sd(MATHS)
    ) %>%
  mutate(se=sd/sqrt(n-1))

Key takeaways from the above code chunk

group_by() (dplyr): Groups observations by RACE.
summarise(): Computes count, mean, and standard deviation.
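mutate(): Calculates the standard error (se) of maths scores for each race as sd/sqrt(n-1). The conventional standard error of the mean is sd/sqrt(n), so these values are slightly larger than the textbook formula would give.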
The final output is stored as a tibble named my_sum.
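As a check on the formula, the snippet below recomputes the standard error for the Chinese group from the summary values in the my_sum table of this section (n = 193, sd = 15.69040). It compares the sd/sqrt(n-1) form used in my_sum with the conventional sd/sqrt(n). The helper se_mean() is hypothetical and is not part of the exercise.

```r
# Summary values for the Chinese group, taken from the my_sum table
n_chinese  <- 193
sd_chinese <- 15.69040

# Form used in my_sum: sd / sqrt(n - 1)
se_n_minus_1 <- sd_chinese / sqrt(n_chinese - 1)

# Conventional standard error of the mean: sd / sqrt(n)
se_n <- sd_chinese / sqrt(n_chinese)

print(round(c(se_n_minus_1 = se_n_minus_1, se_n = se_n), 4))

# Hypothetical helper for raw scores, e.g. summarise(se = se_mean(MATHS))
se_mean <- function(x) sd(x) / sqrt(length(x))
```

The first value reproduces the se reported for the Chinese group; the second is marginally smaller. For groups as large as this the difference is negligible, but it grows for small groups such as Others (n = 9).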

Next, the code chunk below displays the my_sum tibble as an HTML table.

knitr::kable(head(my_sum), format = 'html')

RACE	n	mean	sd	se
Chinese	193	76.50777	15.69040	1.132357
Indian	12	60.66667	23.35237	7.041005
Malay	108	57.44444	21.13478	2.043177
Others	9	69.66667	10.72381	3.791438

3.1. Plotting standard error bars of point estimates

The code below plots the standard error bars of the mean maths score by race.

ggplot(my_sum) +
  geom_errorbar(
    aes(x = RACE,
        ymin = mean - se,
        ymax = mean + se),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    linewidth = 0.5) +      # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(
    aes(x = RACE,
        y = mean),
    stat = "identity",
    color = "red",
    size = 1.5,
    alpha = 1) +
  ggtitle("Standard error of mean maths score by race")

Key takeaways from the above code chunk

The error bars span mean ± se.
stat = "identity" tells geom_point() to plot the precomputed mean values as they are. This is already the default for geom_point(), so the argument is included only for clarity.
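The error-bar endpoints can be checked by hand. The sketch below rebuilds the mean and se columns of my_sum as a base R data frame, using the values copied from the table above, and computes the ymin and ymax values that geom_errorbar() receives.

```r
# Mean and standard error per race, copied from the my_sum table
se_table <- data.frame(
  RACE = c("Chinese", "Indian", "Malay", "Others"),
  mean = c(76.50777, 60.66667, 57.44444, 69.66667),
  se   = c(1.132357, 7.041005, 2.043177, 3.791438)
)

# Endpoints drawn by geom_errorbar(): ymin = mean - se, ymax = mean + se
se_table$ymin <- se_table$mean - se_table$se
se_table$ymax <- se_table$mean + se_table$se

print(se_table, digits = 5)
```

The Indian group has the widest bar (about ±7 points) because its sample is small (n = 12) and its scores are widely spread (sd ≈ 23.4).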

3.2. Plotting confidence interval of point estimates

Instead of standard error bars, we can also plot the 95% confidence intervals of the mean maths score by race.

ggplot(my_sum) +
  geom_errorbar(
    aes(x = reorder(RACE, -mean),
        ymin = mean - 1.96 * se,
        ymax = mean + 1.96 * se),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    linewidth = 0.5) +
  geom_point(
    aes(x = reorder(RACE, -mean),   # same x mapping as the error bars
        y = mean),
    stat = "identity",
    color = "red",
    size = 1.5,
    alpha = 1) +
  labs(x = "Race",
       y = "Maths score",
       title = "95% confidence interval of mean maths score by race")

Key takeaways from the above code chunk

The confidence intervals are computed as mean ± 1.96*se.
reorder(RACE, -mean) sorts the races by average maths score, from highest to lowest.
labs() sets the axis labels and the plot title.
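The constants 1.96 used here and 2.58 used in Section 3.3 are rounded standard normal quantiles. The sketch below computes them with qnorm(). For comparison, it also shows the t-distribution multipliers from qt() for the two smallest groups, Indian (n = 12) and Others (n = 9). Using t rather than z for such small groups is a common alternative; it is not what this exercise does.

```r
# Two-sided normal (z) multipliers for 95% and 99% intervals
z_95 <- qnorm(0.975)   # approx. 1.96
z_99 <- qnorm(0.995)   # approx. 2.58

# t multipliers with n - 1 degrees of freedom for the two smallest groups
n_small <- c(Indian = 12, Others = 9)
t_95 <- qt(0.975, df = n_small - 1)

print(round(c(z_95 = z_95, z_99 = z_99, t_95), 3))
```

Both t multipliers exceed 1.96, so t-based intervals for these groups would be somewhat wider than the ones plotted above.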

3.3. Visualizing the uncertainty of point estimates with interactive error bars

In this section, we plot interactive error bars for the 99% confidence interval of the mean maths score by race. SharedData from crosstalk wraps my_sum so that the plotly chart and the DT table are linked, and bscols() places them side by side.

shared_df <- SharedData$new(my_sum)

bscols(widths = c(4, 8),
       ggplotly((ggplot(shared_df) +
                   geom_errorbar(aes(
                     x = reorder(RACE, -mean),
                     ymin = mean - 2.58 * se,
                     ymax = mean + 2.58 * se),
                     width = 0.2,
                     colour = "black",
                     alpha = 0.9,
                     linewidth = 0.5) +
                   geom_point(aes(
                     x = reorder(RACE, -mean),   # match the error-bar x mapping
                     y = mean,
                     text = paste("Race:", RACE,
                                  "<br>N:", n,
                                  "<br>Avg. Scores:", round(mean, digits = 2),
                                  "<br>99% CI:[",
                                  round((mean - 2.58 * se), digits = 2), ",",
                                  round((mean + 2.58 * se), digits = 2), "]")),
                     stat = "identity",
                     color = "red",
                     size = 1.5,
                     alpha = 1) +
                   xlab("Race") +
                   ylab("Average Scores") +
                   theme_minimal() +
                   theme(axis.text.x = element_text(
                     angle = 45, vjust = 0.5, hjust = 1)) +
                   ggtitle("99% Confidence interval of average<br>maths scores by race")),
                tooltip = "text"),
       DT::datatable(shared_df,
                     rownames = FALSE,
                     class = "compact",
                     width = "100%",
                     options = list(pageLength = 10,
                                    scrollX = TRUE),
                     colnames = c("Race",          # one header per column of my_sum
                                  "No. of pupils",
                                  "Avg Scores",
                                  "Std Dev",
                                  "Std Error")) %>%
         formatRound(columns = c('mean', 'sd', 'se'),
                     digits = 2))

4. Visualising Uncertainty: `ggdist` package

ggdist is an R package that enhances ggplot2 by providing flexible geoms and statistical functions specifically designed for visualizing distributions and uncertainty. It supports both frequentist and Bayesian uncertainty visualization, unifying them through the concept of distribution visualization:

For frequentist models, it enables the visualization of confidence distributions or bootstrap distributions (see vignette("freq-uncertainty-vis")).
For Bayesian models, it facilitates the visualization of probability distributions, integrating seamlessly with the tidybayes package, which builds upon ggdist.

4.1. Visualizing the uncertainty of point estimates: ggdist methods

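In the code chunk below, stat_pointinterval() of ggdist is used to build a visual showing the distribution of maths scores by race. Because it receives the raw MATHS scores, it summarises the scores themselves. By default, it draws the median with 66% and 95% quantile intervals.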

exam %>%
  ggplot(aes(x = RACE,
             y = MATHS)) +
  stat_pointinterval() +   # defaults: point_interval = "median_qi", .width = c(0.66, 0.95)
  labs(
    title = "Visualising the distribution of maths scores by race",
    subtitle = "Median point + multiple-interval plot")
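In ggdist, stat_pointinterval() defaults to point_interval = "median_qi" and .width = c(0.66, 0.95). The base R sketch below approximates that summary for a hypothetical vector of scores. ggdist's own qi() may differ slightly in quantile method, so treat this as an illustration of the idea rather than a reproduction of ggdist's internals.

```r
# Hypothetical maths scores for one group (illustration only)
scores <- c(45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 84, 88, 93)

widths <- c(0.66, 0.95)   # ggdist's default .width for stat_pointinterval()

# Central quantile interval (qi) for each width, with the median as the point
qi_summary <- data.frame(
  .width = widths,
  point  = median(scores),
  lower  = sapply(widths, function(w) quantile(scores, (1 - w) / 2, names = FALSE)),
  upper  = sapply(widths, function(w) quantile(scores, (1 + w) / 2, names = FALSE))
)

print(qi_summary)
```

The 95% interval contains the 66% interval, which is why the default plot shows a thick inner bar inside a thinner outer one.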

In the code chunk below, the following arguments are used:

.width = 0.95: draw a single 95% interval
point_interval = "median_qi": summarise each group by its median with a quantile interval (qi)

exam %>%
  ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(.width = 0.95,
                     point_interval = "median_qi") +
  labs(
    title = "Visualising the distribution of maths scores by race",
    subtitle = "Median point + 95% quantile interval")
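Because stat_pointinterval() receives the raw MATHS scores, its intervals describe where individual scores fall, not how precisely the mean is estimated. The sketch below contrasts the two for simulated scores. The data are hypothetical normal values with mean 65 and sd 20. It compares a 95% quantile interval of the scores with a normal-approximation 95% confidence interval for the mean.

```r
set.seed(42)
# Simulated scores for illustration only
scores <- rnorm(200, mean = 65, sd = 20)

# 95% quantile interval of individual scores (the kind of interval a qi summarises)
qi_95 <- quantile(scores, c(0.025, 0.975), names = FALSE)

# 95% confidence interval for the mean: mean +/- 1.96 * sd / sqrt(n)
se <- sd(scores) / sqrt(length(scores))
ci_95 <- mean(scores) + c(-1, 1) * qnorm(0.975) * se

print(rbind(quantile_interval = qi_95, mean_ci = ci_95))
```

With 200 observations, the quantile interval is roughly sqrt(200) ≈ 14 times wider than the confidence interval for the mean.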

Plot with 95% and 99% intervals

Makeover the plot to show both 95% and 99% intervals, with the mean as the point estimate.

exam %>%
  ggplot(aes(x = RACE,
             y = MATHS)) +
  stat_pointinterval(.width = c(0.95, 0.99),
                     point_interval = "mean_qi",
                     show.legend = FALSE) +
  labs(
    title = "Visualising 95% and 99% intervals of maths scores by race",
    subtitle = "Mean point + multiple-interval plot")

4.2. Visualizing the uncertainty of point estimates: ggdist methods

exam %>%
  ggplot(aes(x = RACE,
             y = MATHS)) +
  stat_pointinterval(
    show.legend = FALSE) +
  labs(
    title = "Visualising the distribution of maths scores by race",
    subtitle = "Median point + multiple-interval plot")

Note

This function has many arguments; see its syntax reference for details.

4.3. Visualizing the uncertainty of point estimates: ggdist methods

The code chunk below uses stat_gradientinterval() of ggdist to build a visual showing the distribution of maths scores by race. The shading of the gradient is darker where scores are more densely concentrated.

exam %>%
  ggplot(aes(x = RACE,
             y = MATHS)) +
  stat_gradientinterval(
    fill = "skyblue",
    show.legend = TRUE
  ) +
  labs(
    title = "Visualising the distribution of maths scores by race",
    subtitle = "Gradient + interval plot")

Note

This function has many arguments; see its syntax reference for details.

5. Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)

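ungeviz is not on CRAN, so it is installed once from GitHub. Skip this step if it was already installed in Section 2.1:

devtools::install_github("wilkelab/ungeviz")

Next, the code chunk below builds the HOP. sampler(25, group = RACE) from ungeviz generates 25 random draws of the data within each race. geom_hpline() draws each sampled value as a short horizontal line, and transition_states() from gganimate animates through the draws.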

ggplot(data = exam,
       mapping = aes(x = factor(RACE), y = MATHS)) +
  geom_point(position = position_jitter(
    height = 0.3, width = 0.05),
    size = 0.4, color = "#0072B2", alpha = 1/2) +
  geom_hpline(data = sampler(25, group = RACE), height = 0.6, color = "#D55E00") +
  theme_bw() +
  # `.draw` is a generated column indicating the sample draw
  transition_states(.draw, 1, 3)
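A HOP replaces a static interval with a sequence of plausible outcomes shown one at a time. The code below is a rough, package-free sketch of that idea. It uses a bootstrap, which is one way to generate outcomes and is not necessarily what sampler() does. It draws 25 resamples of a hypothetical score vector and records the mean of each. An animation would show one of these values per frame.

```r
set.seed(7)
# Hypothetical scores for one group (illustration only)
scores <- c(45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 84, 88, 93)

n_draws <- 25   # same number of draws as requested from sampler() above

# Each draw: resample the scores with replacement and take the mean
draws <- data.frame(
  .draw = seq_len(n_draws),
  mean  = replicate(n_draws, mean(sample(scores, replace = TRUE)))
)

print(head(draws))

# The spread of the draws approximates the standard error of the mean
print(c(bootstrap_sd = sd(draws$mean),
        formula_se   = sd(scores) / sqrt(length(scores))))
```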

Practice: Point-interval plot with shaded ribbons across varying confidence levels

ci_levels <- c(0.25, 0.5, 0.75, 0.95, 1)

# For each interval width, compute the central quantile range of MATHS scores by gender
# (a width of 1 gives the minimum and maximum scores)
ci_data <- map_dfr(ci_levels, function(cl) {
  exam %>%
    group_by(GENDER) %>%
    summarise(
      .width = cl,
      lower = quantile(MATHS, probs = (1 - cl) / 2),
      upper = quantile(MATHS, probs = 1 - (1 - cl) / 2),
      mean = mean(MATHS),
      .groups = "drop"
    ) %>%
    mutate(confidence = cl)
})

ggplot(ci_data, aes(x = confidence, y = mean, group = GENDER, color = GENDER, fill = GENDER)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) +
  geom_pointrange(aes(ymin = lower, ymax = upper), size = 0.4) +
  scale_x_continuous(breaks = ci_levels, limits = c(0.25, 1)) +
  labs(
    title = "Mean MATHS Scores with Quantile Intervals by GENDER",
    subtitle = "Vertical bars + shaded ribbons show the central range of scores at each interval width",
    x = "Interval Width",
    y = "Mean MATHS Score",
    color = "GENDER",
    fill = "GENDER"
  ) +
  theme_minimal()

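The Male group shows a slightly wider spread of scores (wider bands) at the higher interval widths.
These bands are quantile ranges of individual scores, not confidence intervals of the mean. Their overlap shows that the two score distributions overlap heavily, but it does not by itself show whether the gender means differ.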
