Using brms and traitintegration

library(traitintegration)

Introduction

This vignette provides an overview of how to use brms to fit a Bayesian multiple response model to traits measured on individuals. The goal in doing so is to generate a posterior distribution for the correlation among traits. We can then use this posterior distribution in the traitintegration package to estimate the posterior distribution of the maximum eigenvalue, the variance of eigenvalues, and entR as measures of trait integration. We will use the wild population data from Helianthome as an example.

Fitting a model with brms

First we retrieve the data from Helianthome, store it in a data frame called dat, and retain only the columns Leaf C N ratio, SLA, Primary branches, Stem diameter at flowering, Ligule length, Phyllaries length, and population_id. Note: If you load the readr library, you don’t need the readr:: prefix for the read_csv().

## retrieve the phenotype matrix
##
phenos <- readr::read_csv("http://www.helianthome.org/rest/population/phenotype_matrix/wild",
                          show_col_types = FALSE)
# ## retrieve the list of populations (for population_id)
# ##
# populations <- readr::read_csv("http://www.helianthome.org/rest/population/list/",
#                           show_col_types = FALSE)
## retrieve individual id and population id
##
index <- readr::read_csv("http://www.helianthome.org/rest/individual/list/",
                          show_col_types = FALSE)
## merge population id into phenotypic data
##
wild <- merge(phenos, index, by = "individual_id") |>
  dplyr::select(`Leaf C N ratio`, SLA, `Primary branches`,
                `Stem diameter at flowering`, `Ligule length`,
                `Phyllaries length`, population_id) |>
  dplyr::rename(Leaf.C.N.ratio = `Leaf C N ratio`,
                Primary.branches = `Primary branches`,
                Stem.diameter.at.flowering = `Stem diameter at flowering`,
                Ligule.length = `Ligule length`,
                Phyllaries.length = `Phyllaries length`)

Now that we have the data, we can fit a model using brms. Specifically, we will fit a model in which we simply fit a random intercept for each population. This means that the correlation matrix we estimate is the within-population covariance matrix. The lines begininng wild_ define linear models with a random intercept for each of the six traits included in the analysis. They are combined into a single, multiresponse model using the + operator in the call to brm.

The set_rescor(rescor = TRUE) argument tells brms to estimate correlations among the traits. The refresh = 0 prevents brms from telling us how many iterations it is completing. The options(mc.cores = parallel::detectCores()) tells brms to use as many cores as you have (up to 4) to run the MCMC chains in parallel. Note: Just as above, if you load thebrms library, you don’t need the brms:: prefix for the calls to bf() and brm().

options(mc.cores= parallel::detectCores())

wild_CN <- brms::bf(Leaf.C.N.ratio ~ (1|population_id))
wild_SLA <- brms::bf(SLA ~ (1|population_id))
wild_branch <- brms::bf(Primary.branches ~ (1|population_id))
wild_stem <- brms::bf(Stem.diameter.at.flowering ~ (1|population_id))
wild_ligule <- brms::bf(Ligule.length ~ (1|population_id))
wild_phyllary <- brms::bf(Phyllaries.length ~ (1|population_id))

mod_wild <- brms::brm(wild_CN + wild_SLA + wild_branch + wild_stem +
                        wild_ligule + wild_phyllary + 
                        brms::set_rescor(rescor = TRUE),
                      data = wild,
                      family = "gaussian", 
                      refresh = 0)

brms will automatically exclude any individuals with missing data. Now that the model has finished, we can explore some of the results.

summary(mod_wild)
#>  Family: MV(gaussian, gaussian, gaussian, gaussian, gaussian, gaussian) 
#>   Links: mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity 
#> Formula: Leaf.C.N.ratio ~ (1 | population_id) 
#>          SLA ~ (1 | population_id) 
#>          Primary.branches ~ (1 | population_id) 
#>          Stem.diameter.at.flowering ~ (1 | population_id) 
#>          Ligule.length ~ (1 | population_id) 
#>          Phyllaries.length ~ (1 | population_id) 
#>    Data: wild (Number of observations: 530) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Multilevel Hyperparameters:
#> ~population_id (Number of levels: 62) 
#>                                       Estimate Est.Error l-95% CI u-95% CI Rhat
#> sd(LeafCNratio_Intercept)                 0.65      0.12     0.42     0.89 1.00
#> sd(SLA_Intercept)                         2.17      0.43     1.31     3.05 1.00
#> sd(Primarybranches_Intercept)             6.79      0.68     5.63     8.29 1.00
#> sd(Stemdiameteratflowering_Intercept)     6.44      0.69     5.18     7.93 1.00
#> sd(Ligulelength_Intercept)                0.49      0.06     0.39     0.61 1.00
#> sd(Phyllarieslength_Intercept)            0.53      0.06     0.43     0.66 1.00
#>                                       Bulk_ESS Tail_ESS
#> sd(LeafCNratio_Intercept)                 1573     2503
#> sd(SLA_Intercept)                         1442     1472
#> sd(Primarybranches_Intercept)             1069     1908
#> sd(Stemdiameteratflowering_Intercept)     1174     1709
#> sd(Ligulelength_Intercept)                1696     2329
#> sd(Phyllarieslength_Intercept)            1041     2020
#> 
#> Regression Coefficients:
#>                                   Estimate Est.Error l-95% CI u-95% CI Rhat
#> LeafCNratio_Intercept                 7.88      0.11     7.66     8.10 1.00
#> SLA_Intercept                        23.19      0.40    22.39    23.96 1.00
#> Primarybranches_Intercept            21.74      0.88    20.02    23.46 1.00
#> Stemdiameteratflowering_Intercept    34.39      0.88    32.65    36.09 1.00
#> Ligulelength_Intercept                5.24      0.07     5.10     5.38 1.00
#> Phyllarieslength_Intercept            3.32      0.07     3.18     3.46 1.00
#>                                   Bulk_ESS Tail_ESS
#> LeafCNratio_Intercept                 3167     2859
#> SLA_Intercept                         3912     3422
#> Primarybranches_Intercept              637     1221
#> Stemdiameteratflowering_Intercept     1337     2313
#> Ligulelength_Intercept                1599     2413
#> Phyllarieslength_Intercept            1286     1984
#> 
#> Further Distributional Parameters:
#>                               Estimate Est.Error l-95% CI u-95% CI Rhat
#> sigma_LeafCNratio                 1.69      0.06     1.58     1.81 1.00
#> sigma_SLA                         6.42      0.22     6.01     6.87 1.00
#> sigma_Primarybranches             4.86      0.16     4.55     5.20 1.00
#> sigma_Stemdiameteratflowering     6.89      0.23     6.45     7.36 1.00
#> sigma_Ligulelength                0.68      0.02     0.64     0.73 1.00
#> sigma_Phyllarieslength            0.58      0.02     0.54     0.62 1.00
#>                               Bulk_ESS Tail_ESS
#> sigma_LeafCNratio                 3380     2744
#> sigma_SLA                         3320     3065
#> sigma_Primarybranches             4476     3545
#> sigma_Stemdiameteratflowering     4657     3588
#> sigma_Ligulelength                5561     3112
#> sigma_Phyllarieslength            5444     3393
#> 
#> Residual Correlations: 
#>                                                  Estimate Est.Error l-95% CI
#> rescor(LeafCNratio,SLA)                             -0.36      0.04    -0.44
#> rescor(LeafCNratio,Primarybranches)                  0.32      0.04     0.24
#> rescor(SLA,Primarybranches)                         -0.14      0.05    -0.24
#> rescor(LeafCNratio,Stemdiameteratflowering)          0.22      0.04     0.13
#> rescor(SLA,Stemdiameteratflowering)                 -0.13      0.05    -0.22
#> rescor(Primarybranches,Stemdiameteratflowering)      0.35      0.04     0.27
#> rescor(LeafCNratio,Ligulelength)                     0.02      0.05    -0.08
#> rescor(SLA,Ligulelength)                            -0.02      0.05    -0.11
#> rescor(Primarybranches,Ligulelength)                -0.12      0.05    -0.21
#> rescor(Stemdiameteratflowering,Ligulelength)         0.09      0.05     0.00
#> rescor(LeafCNratio,Phyllarieslength)                 0.05      0.05    -0.04
#> rescor(SLA,Phyllarieslength)                        -0.06      0.05    -0.15
#> rescor(Primarybranches,Phyllarieslength)            -0.10      0.05    -0.19
#> rescor(Stemdiameteratflowering,Phyllarieslength)     0.13      0.05     0.04
#> rescor(Ligulelength,Phyllarieslength)                0.36      0.04     0.28
#>                                                  u-95% CI Rhat Bulk_ESS
#> rescor(LeafCNratio,SLA)                             -0.28 1.00     4062
#> rescor(LeafCNratio,Primarybranches)                  0.40 1.00     4447
#> rescor(SLA,Primarybranches)                         -0.05 1.00     4906
#> rescor(LeafCNratio,Stemdiameteratflowering)          0.31 1.00     4692
#> rescor(SLA,Stemdiameteratflowering)                 -0.04 1.00     4668
#> rescor(Primarybranches,Stemdiameteratflowering)      0.43 1.00     4345
#> rescor(LeafCNratio,Ligulelength)                     0.11 1.00     5527
#> rescor(SLA,Ligulelength)                             0.08 1.00     5384
#> rescor(Primarybranches,Ligulelength)                -0.03 1.00     5568
#> rescor(Stemdiameteratflowering,Ligulelength)         0.19 1.00     5400
#> rescor(LeafCNratio,Phyllarieslength)                 0.13 1.00     5055
#> rescor(SLA,Phyllarieslength)                         0.03 1.00     6051
#> rescor(Primarybranches,Phyllarieslength)            -0.01 1.00     5277
#> rescor(Stemdiameteratflowering,Phyllarieslength)     0.22 1.00     4802
#> rescor(Ligulelength,Phyllarieslength)                0.44 1.00     5015
#>                                                  Tail_ESS
#> rescor(LeafCNratio,SLA)                              3486
#> rescor(LeafCNratio,Primarybranches)                  3216
#> rescor(SLA,Primarybranches)                          3425
#> rescor(LeafCNratio,Stemdiameteratflowering)          3162
#> rescor(SLA,Stemdiameteratflowering)                  3395
#> rescor(Primarybranches,Stemdiameteratflowering)      3256
#> rescor(LeafCNratio,Ligulelength)                     3207
#> rescor(SLA,Ligulelength)                             3270
#> rescor(Primarybranches,Ligulelength)                 3203
#> rescor(Stemdiameteratflowering,Ligulelength)         3239
#> rescor(LeafCNratio,Phyllarieslength)                 3223
#> rescor(SLA,Phyllarieslength)                         2858
#> rescor(Primarybranches,Phyllarieslength)             3167
#> rescor(Stemdiameteratflowering,Phyllarieslength)     2978
#> rescor(Ligulelength,Phyllarieslength)                3302
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

The most important results here are that the Rhats are all very close to 1. That means we have good evidence that the chains have converged. It’s also good to see that both the Bulk_ESS and Tail_ESS are reasonably high (all above 6000), and especially that they are high for the rescor parameters. Those are the correlations that we’re interested.

Visualizing the posterior correlation matrix

With the fitted model in hand, visualizing the correlation matrix is straightforward. If you store the result of the call to plot_posterior_correlation() in a variable, say p, you can print the posterior mean correlation matrix as shown below.

p <- plot_posterior_correlation(mod_wild)

print(p$R)
#>                         LeafCNratio         SLA Primarybranches
#> LeafCNratio              1.00000000 -0.36023696      0.32423435
#> SLA                     -0.36023696  1.00000000     -0.14472748
#> Primarybranches          0.32423435 -0.14472748      1.00000000
#> Stemdiameteratflowering  0.21977329 -0.12712371      0.35332146
#> Ligulelength             0.01656442 -0.01577509     -0.11847978
#> Phyllarieslength         0.04612757 -0.06200969     -0.09992245
#>                         Stemdiameteratflowering Ligulelength Phyllarieslength
#> LeafCNratio                          0.21977329   0.01656442       0.04612757
#> SLA                                 -0.12712371  -0.01577509      -0.06200969
#> Primarybranches                      0.35332146  -0.11847978      -0.09992245
#> Stemdiameteratflowering              1.00000000   0.09377997       0.13001835
#> Ligulelength                         0.09377997   1.00000000       0.36269402
#> Phyllarieslength                     0.13001835   0.36269402       1.00000000

The color of each edge reflects the posterior mean of the correlation coefficient for the nodes that are being connected, red for negative (< -0.2), blue for positive (> 0.2), or white for intermediate (-0.2 to 0.2). If you have correlation coefficients that are especially large in magnitude, they will display as more intense colors of red or blue.

You will probably find the labels at each node a bit long. If you do, you can use the labels argument to give them new names. If you’d also like to see values for the individual correlation coefficients, you can set the label_edges argument to TRUE. Note: storing the result of the call to plot_posterior_correlation() in p to suppresses printing of the posterior correlation matrix.

p <- plot_posterior_correlation(mod_wild, labels = c("Leaf CN", "SLA", "Branches",
                                                     "Stems", "Ligules", 
                                                     "Phyllaries"),
                           label_edges = TRUE)

If you look closely, you’ll notice that not all of the edges are labeled. That’s because some of those in the middle would overlap if they were displayed.

You aren’t stuck with the color palette used here. You can use any of the diverging color palettes in RColorBrewer. You can also change the number of digits displayed by using the digits argument.

Estimating trait integration

Estimating trait integration is straightforward. We simply call the assemble_statistics() function with the fitted model as the argument. It is best to store the result in a variable, say results, so that you can work with the results.

results <- assemble_statistics(mod_wild)
str(results)
#> 'data.frame':    4000 obs. of  9 variables:
#>  $ EV_1   : num  1.89 1.78 1.83 1.88 1.94 ...
#>  $ EV_2   : num  1.41 1.49 1.38 1.42 1.45 ...
#>  $ EV_3   : num  0.933 0.993 0.999 1.002 0.855 ...
#>  $ EV_4   : num  0.679 0.635 0.679 0.666 0.64 ...
#>  $ EV_5   : num  0.595 0.585 0.595 0.557 0.597 ...
#>  $ EV_6   : num  0.498 0.514 0.518 0.467 0.518 ...
#>  $ Lead_EV: num  1.89 1.78 1.83 1.88 1.94 ...
#>  $ Var_EV : num  0.295 0.279 0.266 0.311 0.327 ...
#>  $ ent_R  : num  0.265 0.263 0.253 0.279 0.275 ...

As you can see, results is a data frame in which the columns starting with EV_ are the eigenvalues of the correlation matrix, Lead_EV is the lead eigenvalue, Var_EV is the variance of the eigenvalues, and ent_R the entropy measure of trait integration. The rows of the data frame correspond to samples from the posterior distribution.

You can summarize the results using the summarize_statistics() function.

summarize_statistics(results)
#>   Statistic  Mean  2.5% 97.5%
#> 1   Lead_EV 1.801 1.651 1.956
#> 2    Var_EV 0.265 0.203 0.337
#> 3     ent_R 0.252 0.220 0.286

By default the data frame that is returned contains the posterior mean and the 2.5% and 97.5% quantiles of the statistic and rounds the results to 3 digits, but those options are easy to change.

summarize_statistics(results, probs = c(0.025, 0.05, 0.5, 0.95, 0.975), digits = 2)
#>   Statistic Mean 2.5%   5%  50%  95% 97.5%
#> 1   Lead_EV 1.80 1.65 1.67 1.80 1.93  1.96
#> 2    Var_EV 0.27 0.20 0.21 0.26 0.32  0.34
#> 3     ent_R 0.25 0.22 0.22 0.25 0.28  0.29

You can also visualize the posterior distribution of any column using functions from the bayesplot package. For example, you can visualize the posterior distributions of the lead eigenvalue, variance of eigenvalues, and entR as follows (and as before, you can leave off the bayesplot:: and `ggplot2:: prefixes if you’ve loaded those libraries):

bayesplot::mcmc_areas(results[, c("Lead_EV", "Var_EV", "ent_R")])

Note: You will only be able to use bayesplot functions that expect all chains to be merged, e.g., mcmc_areas(), mcmc_intervals(), mcmc_hist(), andmcmc_dens()`.