
Add marginal dist. to PPD plots #429

Open
mattansb wants to merge 41 commits into stan-dev:master from mattansb:master


mattansb commented Mar 8, 2026

Addresses #425

This PR adds the show_marginal argument (default FALSE) to the PPD-distributions and PPD-test-statistics functions (the PPD-intervals functions already show the marginal PPD(s) de facto).
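As a rough illustration of what "marginal PPD" means here, the base-R sketch below assumes, as in bayesplot's other PPD functions, that ypred is a draws-by-observations matrix, and approximates the marginal predictive distribution by pooling every value across draws. The data are simulated for illustration; this is not the PR's implementation.

```r
# Illustrative sketch only: simulated data, not the PR's code.
set.seed(1)
S <- 19  # number of draws (rows)
N <- 50  # number of observations (columns)
ypred <- matrix(rnorm(S * N, mean = 10, sd = 2), nrow = S, ncol = N)

# One predictive distribution per draw (what the default overlay plots show)
per_draw <- split(ypred, row(ypred))

# A single pooled distribution of all S * N values (the marginal PPD)
marginal <- as.vector(ypred)

length(per_draw)  # 19 distributions, one per draw
length(marginal)  # 950 pooled values
```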

I think we need a better default PPD color. Which one?

Here are all of these functions (the default plots, with show_marginal = FALSE, are unchanged!):

library(bayesplot)
#> This is bayesplot version 1.15.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

ypred <- example_yrep_draws()[1:19,]
g <- example_group_data()

ppd-distributions

ppd_dens_overlay(ypred) # mid color

ppd_dens_overlay(ypred, show_marginal = TRUE) # light + dark colors

# ppd_ecdf_overlay(ypred)
ppd_ecdf_overlay(ypred, show_marginal = TRUE)

# ppd_dens(ypred)
ppd_dens(ypred, show_marginal = TRUE)

# ppd_hist(ypred)
ppd_hist(ypred, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_dots(ypred)
ppd_dots(ypred, show_marginal = TRUE)

# ppd_freqpoly(ypred)
ppd_freqpoly(ypred, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_freqpoly_grouped(ypred[1:4,], group = g)
ppd_freqpoly_grouped(ypred[1:4,], group = g, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_boxplot(ypred[1:7,])
ppd_boxplot(ypred[1:7,], show_marginal = TRUE)

ppd-test-statistics

# ppd_stat_data returns marginal data too...
ppd_stat_data(ypred, g, var, show_marginal = TRUE) |>
  subset(variable == "PPD")
#> # A tibble: 2 × 3
#>   group  variable value
#>   <fct>  <fct>    <dbl>
#> 1 GroupA PPD       463.
#> 2 GroupB PPD       389.

# ppd_stat(ypred)
ppd_stat(ypred, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_stat_grouped(ypred, g)
ppd_stat_grouped(ypred, g, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_stat_freqpoly(ypred)
ppd_stat_freqpoly(ypred, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_stat_freqpoly_grouped(ypred, g)
ppd_stat_freqpoly_grouped(ypred, g, show_marginal = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# ppd_stat_2d(ypred)
ppd_stat_2d(ypred, show_marginal = TRUE)

Created on 2026-03-12 with reprex v2.1.1
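The ppd_stat_data() output above returns one "PPD" row per group. Assuming that value is the statistic applied to all draws of the group's observations pooled together (an assumption, not confirmed from the PR code), the idea can be sketched in base R with simulated data, alongside the per-draw statistics that the grouped plots summarize:

```r
# Illustrative sketch (assumption): a grouped "marginal" statistic computed
# by pooling all draws of the observations in each group. Simulated data.
set.seed(2)
ypred <- matrix(rnorm(19 * 6, mean = 20, sd = 5), nrow = 19)  # 19 draws x 6 obs
g <- factor(c("GroupA", "GroupA", "GroupB", "GroupB", "GroupB", "GroupA"))

# Column indices belonging to each group
cols_by_group <- split(seq_len(ncol(ypred)), g)

# var() of the pooled draws within each group: one value per group
marginal_var <- vapply(
  cols_by_group,
  function(cols) var(as.vector(ypred[, cols, drop = FALSE])),
  numeric(1)
)

# For comparison: one var() per draw and group
per_draw_var <- sapply(
  cols_by_group,
  function(cols) apply(ypred[, cols, drop = FALSE], 1, var)
)
dim(per_draw_var)  # 19 rows (draws) x 2 columns (groups)
```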

jgabry (Member) commented Mar 9, 2026

Thanks for the PR! I haven't had a chance to look at the code yet (I'm a bit swamped with work at the moment), but a couple of quick comments:

  1. I just triggered the GHA workflows to run and it looks like there are some failures.

  2. Regarding the color issue you mentioned, this is my fault due to a lack of foresight! When I added the PPD plots I assumed there would be no overlays, since we wouldn't be comparing to data. So instead of the light and dark distinction used for the PPC plots, I went with a color somewhere in between. But now this makes it harder to find a good color for overlays.

mattansb (Author) commented

@jgabry I fixed the error that popped up.

Well, the color "issue" can be resolved quite simply if we switch PPD from "mid" to "light", like PPC.
Or, with some logic, we can use "light"/"dark" only if show_marginal = TRUE and "mid" otherwise.
WDYT?
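The second option amounts to a small branch on show_marginal. The helper below is hypothetical (ppd_colors() is not a bayesplot function), and which shade goes to the draws versus the marginal is an assumption modeled on the PPC plots, where the replicated draws are light and the observed data dark.

```r
# Hypothetical helper, for illustration only (not bayesplot's actual code):
# choose color-scheme levels for PPD plots depending on show_marginal.
ppd_colors <- function(show_marginal = FALSE) {
  if (isTRUE(show_marginal)) {
    # Individual draws light, marginal PPD dark (assumed, mirroring PPC plots)
    list(draws = "light", marginal = "dark")
  } else {
    # Default plots keep the single middle shade, so their look is unchanged
    list(draws = "mid", marginal = NULL)
  }
}

ppd_colors()$draws                          # "mid"
ppd_colors(show_marginal = TRUE)$marginal   # "dark"
```

Keeping "mid" on the default path is what preserves the appearance of existing PPD plots.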

jgabry (Member) commented Mar 10, 2026


@tjmahr Any preference on this? The first option would keep the code simpler, but it would change the appearance of all the PPD plots (could we call this breaking visual backwards compatibility?). I would lean towards the second option but I'm not really sure.

jgabry (Member) commented Mar 10, 2026

I just tried using this branch and show_marginal seems to work well for the plots I tried, but I don't think the new argument is documented yet.

mattansb (Author) commented

Sorry, it was only documented in PPD-distributions. Added to PPD-test-statistics now.

mattansb (Author) commented

Regarding using "light"/"dark" only if show_marginal = TRUE and "mid" otherwise: I did this. It wasn't too hard, and it actually made some of the code simpler. I've updated the examples above, and I think they look much better (note that the first ppd_dens_overlay(ypred) still uses the mid color, as it did before).

jgabry (Member) commented Mar 13, 2026

Thanks, I agree it looks better!

jgabry (Member) commented Mar 13, 2026

I think after that change there are now some visual tests that are failing. I checked one of the ppd_stat ones and it looks like it's the legend that's changed.

mattansb (Author) commented

Okay, should be fixed.

codecov-commenter commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 59.42029% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.83%. Comparing base (a30a706) to head (5cbe24a).

Files with missing lines    Patch %   Lines
R/ppd-distributions.R       53.59%    71 Missing ⚠️
R/ppd-test-statistics.R     62.50%    36 Missing ⚠️
R/helpers-gg.R              78.26%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
- Coverage   98.66%   96.83%   -1.83%     
==========================================
  Files          35       35              
  Lines        5857     6010     +153     
==========================================
+ Hits         5779     5820      +41     
- Misses         78      190     +112     


jgabry (Member) left a comment

OK, I finally took a closer look. I think this is getting close! I made two small review comments inline, and here are a few other things I noticed:

  • ppd_stat(..., discrete = TRUE, show_marginal = TRUE) seems to behave strangely. It looks to me like it's just plotting a bar with a count of 1 instead of the marginal PPD, but I'm not entirely sure. For example, try:
   ypred <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100)
   prop0 <- function(x) mean(x == 0)
   ppd_stat(ypred, stat = prop0, discrete = TRUE, show_marginal = TRUE)
  • The default for freq was changed in ppd_stat() but not in ppd_stat_grouped().

  • Currently there are no tests that turn on show_marginal=TRUE (this is also why the codecov comment shows low test coverage for the PR). Can you add some snapshot visual tests with show_marginal = TRUE?
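To see why a bar with a count of 1 would be suspicious in the discrete example above, the base-R sketch below (an illustration, not the PR's code) compares the per-draw statistic that ppd_stat() plots with the same statistic applied to the pooled, marginal draws. The pooled value is a single number, and because every draw has the same number of observations it equals the mean of the per-draw values. One possible explanation for the observed bar is that this single value is being tallied like one more draw, but that is a guess.

```r
# Illustrative sketch using the reviewer's example (not the PR's code)
set.seed(123)
ypred <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100)  # 100 draws x 20 obs
prop0 <- function(x) mean(x == 0)

# T(ypred) for each draw: 100 values, each a multiple of 1/20
t_draws <- apply(ypred, 1, prop0)

# T applied to the pooled (marginal) draws: a single value
t_marginal <- prop0(as.vector(ypred))

# With equal-length draws, the pooled proportion equals the mean per-draw one
all.equal(mean(t_draws), t_marginal)  # TRUE
```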

Also @avehtari and @tjmahr do either of you have any thoughts on this PR before we go ahead with it?

show_marginal = show_marginal,
# in case user turns legend back on
guide = guide_legend(
override.aes = list(size = 2 * size, alpha = 1)

Suggested change:
- override.aes = list(size = 2 * size, alpha = 1)
+ override.aes = list(linewidth = 2 * size, alpha = 1)

fill = "PPD"),
notch = notch,
linewidth = 1,
outlier.color = get_color("mh"),

Should this be get_color("dh") now that we changed to the light and dark instead of middle colors? Or are the outliers intentionally using a different color?

