brms
and
traitintegration
This vignette provides an overview of how to use brms
to
fit a Bayesian multiple response model to traits measured on
individuals. The goal in doing so is to generate a posterior
distribution for the correlation among traits. We can then use this
posterior distribution in the traitintegration
package to
estimate the posterior distribution of the maximum eigenvalue, the
variance of eigenvalues, and entR as measures of trait integration. We
will use the wild population data from Helianthome as an example.
brms
First we retrieve the data from Helianthome, store it in a data
frame called dat
, and retain only the columns
Leaf C N ratio
, SLA
,
Primary branches
, Stem diameter at flowering
,
Ligule length
, Phyllaries length
, and
population_id
. Note: If you load the readr
library, you don’t need the readr::
prefix for the
read_csv()
.
## retrieve the phenotype matrix
##
phenos <- readr::read_csv("http://www.helianthome.org/rest/population/phenotype_matrix/wild",
show_col_types = FALSE)
# ## retrieve the list of populations (for population_id)
# ##
# populations <- readr::read_csv("http://www.helianthome.org/rest/population/list/",
# show_col_types = FALSE)
## retrieve individual id and population id
##
index <- readr::read_csv("http://www.helianthome.org/rest/individual/list/",
show_col_types = FALSE)
## merge population id into phenotypic data
##
wild <- merge(phenos, index, by = "individual_id") |>
dplyr::select(`Leaf C N ratio`, SLA, `Primary branches`,
`Stem diameter at flowering`, `Ligule length`,
`Phyllaries length`, population_id) |>
dplyr::rename(Leaf.C.N.ratio = `Leaf C N ratio`,
Primary.branches = `Primary branches`,
Stem.diameter.at.flowering = `Stem diameter at flowering`,
Ligule.length = `Ligule length`,
Phyllaries.length = `Phyllaries length`)
Now that we have the data, we can fit a model using
brms
. Specifically, we will fit a model in which we simply
fit a random intercept for each population. This means that the
correlation matrix we estimate is the within-population covariance
matrix. The lines begininng wild_
define linear models with
a random intercept for each of the six traits included in the analysis.
They are combined into a single, multiresponse model using the
+
operator in the call to brm
.
The set_rescor(rescor = TRUE)
argument tells
brms
to estimate correlations among the traits. The
refresh = 0
prevents brms
from telling us how
many iterations it is completing. The
options(mc.cores = parallel::detectCores())
tells brms to
use as many cores as you have (up to 4) to run the MCMC chains in
parallel. Note: Just as above, if you load thebrms
library,
you don’t need the brms::
prefix for the calls to
bf()
and brm()
.
options(mc.cores= parallel::detectCores())
wild_CN <- brms::bf(Leaf.C.N.ratio ~ (1|population_id))
wild_SLA <- brms::bf(SLA ~ (1|population_id))
wild_branch <- brms::bf(Primary.branches ~ (1|population_id))
wild_stem <- brms::bf(Stem.diameter.at.flowering ~ (1|population_id))
wild_ligule <- brms::bf(Ligule.length ~ (1|population_id))
wild_phyllary <- brms::bf(Phyllaries.length ~ (1|population_id))
mod_wild <- brms::brm(wild_CN + wild_SLA + wild_branch + wild_stem +
wild_ligule + wild_phyllary +
brms::set_rescor(rescor = TRUE),
data = wild,
family = "gaussian",
refresh = 0)
brms
will automatically exclude any individuals with
missing data. Now that the model has finished, we can explore some of
the results.
summary(mod_wild)
#> Family: MV(gaussian, gaussian, gaussian, gaussian, gaussian, gaussian)
#> Links: mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> Formula: Leaf.C.N.ratio ~ (1 | population_id)
#> SLA ~ (1 | population_id)
#> Primary.branches ~ (1 | population_id)
#> Stem.diameter.at.flowering ~ (1 | population_id)
#> Ligule.length ~ (1 | population_id)
#> Phyllaries.length ~ (1 | population_id)
#> Data: wild (Number of observations: 530)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Multilevel Hyperparameters:
#> ~population_id (Number of levels: 62)
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> sd(LeafCNratio_Intercept) 0.65 0.12 0.42 0.89 1.00
#> sd(SLA_Intercept) 2.17 0.43 1.31 3.05 1.00
#> sd(Primarybranches_Intercept) 6.79 0.68 5.63 8.29 1.00
#> sd(Stemdiameteratflowering_Intercept) 6.44 0.69 5.18 7.93 1.00
#> sd(Ligulelength_Intercept) 0.49 0.06 0.39 0.61 1.00
#> sd(Phyllarieslength_Intercept) 0.53 0.06 0.43 0.66 1.00
#> Bulk_ESS Tail_ESS
#> sd(LeafCNratio_Intercept) 1573 2503
#> sd(SLA_Intercept) 1442 1472
#> sd(Primarybranches_Intercept) 1069 1908
#> sd(Stemdiameteratflowering_Intercept) 1174 1709
#> sd(Ligulelength_Intercept) 1696 2329
#> sd(Phyllarieslength_Intercept) 1041 2020
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> LeafCNratio_Intercept 7.88 0.11 7.66 8.10 1.00
#> SLA_Intercept 23.19 0.40 22.39 23.96 1.00
#> Primarybranches_Intercept 21.74 0.88 20.02 23.46 1.00
#> Stemdiameteratflowering_Intercept 34.39 0.88 32.65 36.09 1.00
#> Ligulelength_Intercept 5.24 0.07 5.10 5.38 1.00
#> Phyllarieslength_Intercept 3.32 0.07 3.18 3.46 1.00
#> Bulk_ESS Tail_ESS
#> LeafCNratio_Intercept 3167 2859
#> SLA_Intercept 3912 3422
#> Primarybranches_Intercept 637 1221
#> Stemdiameteratflowering_Intercept 1337 2313
#> Ligulelength_Intercept 1599 2413
#> Phyllarieslength_Intercept 1286 1984
#>
#> Further Distributional Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> sigma_LeafCNratio 1.69 0.06 1.58 1.81 1.00
#> sigma_SLA 6.42 0.22 6.01 6.87 1.00
#> sigma_Primarybranches 4.86 0.16 4.55 5.20 1.00
#> sigma_Stemdiameteratflowering 6.89 0.23 6.45 7.36 1.00
#> sigma_Ligulelength 0.68 0.02 0.64 0.73 1.00
#> sigma_Phyllarieslength 0.58 0.02 0.54 0.62 1.00
#> Bulk_ESS Tail_ESS
#> sigma_LeafCNratio 3380 2744
#> sigma_SLA 3320 3065
#> sigma_Primarybranches 4476 3545
#> sigma_Stemdiameteratflowering 4657 3588
#> sigma_Ligulelength 5561 3112
#> sigma_Phyllarieslength 5444 3393
#>
#> Residual Correlations:
#> Estimate Est.Error l-95% CI
#> rescor(LeafCNratio,SLA) -0.36 0.04 -0.44
#> rescor(LeafCNratio,Primarybranches) 0.32 0.04 0.24
#> rescor(SLA,Primarybranches) -0.14 0.05 -0.24
#> rescor(LeafCNratio,Stemdiameteratflowering) 0.22 0.04 0.13
#> rescor(SLA,Stemdiameteratflowering) -0.13 0.05 -0.22
#> rescor(Primarybranches,Stemdiameteratflowering) 0.35 0.04 0.27
#> rescor(LeafCNratio,Ligulelength) 0.02 0.05 -0.08
#> rescor(SLA,Ligulelength) -0.02 0.05 -0.11
#> rescor(Primarybranches,Ligulelength) -0.12 0.05 -0.21
#> rescor(Stemdiameteratflowering,Ligulelength) 0.09 0.05 0.00
#> rescor(LeafCNratio,Phyllarieslength) 0.05 0.05 -0.04
#> rescor(SLA,Phyllarieslength) -0.06 0.05 -0.15
#> rescor(Primarybranches,Phyllarieslength) -0.10 0.05 -0.19
#> rescor(Stemdiameteratflowering,Phyllarieslength) 0.13 0.05 0.04
#> rescor(Ligulelength,Phyllarieslength) 0.36 0.04 0.28
#> u-95% CI Rhat Bulk_ESS
#> rescor(LeafCNratio,SLA) -0.28 1.00 4062
#> rescor(LeafCNratio,Primarybranches) 0.40 1.00 4447
#> rescor(SLA,Primarybranches) -0.05 1.00 4906
#> rescor(LeafCNratio,Stemdiameteratflowering) 0.31 1.00 4692
#> rescor(SLA,Stemdiameteratflowering) -0.04 1.00 4668
#> rescor(Primarybranches,Stemdiameteratflowering) 0.43 1.00 4345
#> rescor(LeafCNratio,Ligulelength) 0.11 1.00 5527
#> rescor(SLA,Ligulelength) 0.08 1.00 5384
#> rescor(Primarybranches,Ligulelength) -0.03 1.00 5568
#> rescor(Stemdiameteratflowering,Ligulelength) 0.19 1.00 5400
#> rescor(LeafCNratio,Phyllarieslength) 0.13 1.00 5055
#> rescor(SLA,Phyllarieslength) 0.03 1.00 6051
#> rescor(Primarybranches,Phyllarieslength) -0.01 1.00 5277
#> rescor(Stemdiameteratflowering,Phyllarieslength) 0.22 1.00 4802
#> rescor(Ligulelength,Phyllarieslength) 0.44 1.00 5015
#> Tail_ESS
#> rescor(LeafCNratio,SLA) 3486
#> rescor(LeafCNratio,Primarybranches) 3216
#> rescor(SLA,Primarybranches) 3425
#> rescor(LeafCNratio,Stemdiameteratflowering) 3162
#> rescor(SLA,Stemdiameteratflowering) 3395
#> rescor(Primarybranches,Stemdiameteratflowering) 3256
#> rescor(LeafCNratio,Ligulelength) 3207
#> rescor(SLA,Ligulelength) 3270
#> rescor(Primarybranches,Ligulelength) 3203
#> rescor(Stemdiameteratflowering,Ligulelength) 3239
#> rescor(LeafCNratio,Phyllarieslength) 3223
#> rescor(SLA,Phyllarieslength) 2858
#> rescor(Primarybranches,Phyllarieslength) 3167
#> rescor(Stemdiameteratflowering,Phyllarieslength) 2978
#> rescor(Ligulelength,Phyllarieslength) 3302
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
The most important results here are that the Rhat
s are
all very close to 1. That means we have good evidence that the chains
have converged. It’s also good to see that both the
Bulk_ESS
and Tail_ESS
are reasonably high (all
above 6000), and especially that they are high for the
rescor
parameters. Those are the correlations that we’re
interested.
With the fitted model in hand, visualizing the correlation matrix is
straightforward. If you store the result of the call to
plot_posterior_correlation()
in a variable, say
p
, you can print the posterior mean correlation matrix as
shown below.
print(p$R)
#> LeafCNratio SLA Primarybranches
#> LeafCNratio 1.00000000 -0.36023696 0.32423435
#> SLA -0.36023696 1.00000000 -0.14472748
#> Primarybranches 0.32423435 -0.14472748 1.00000000
#> Stemdiameteratflowering 0.21977329 -0.12712371 0.35332146
#> Ligulelength 0.01656442 -0.01577509 -0.11847978
#> Phyllarieslength 0.04612757 -0.06200969 -0.09992245
#> Stemdiameteratflowering Ligulelength Phyllarieslength
#> LeafCNratio 0.21977329 0.01656442 0.04612757
#> SLA -0.12712371 -0.01577509 -0.06200969
#> Primarybranches 0.35332146 -0.11847978 -0.09992245
#> Stemdiameteratflowering 1.00000000 0.09377997 0.13001835
#> Ligulelength 0.09377997 1.00000000 0.36269402
#> Phyllarieslength 0.13001835 0.36269402 1.00000000
The color of each edge reflects the posterior mean of the correlation coefficient for the nodes that are being connected, red for negative (< -0.2), blue for positive (> 0.2), or white for intermediate (-0.2 to 0.2). If you have correlation coefficients that are especially large in magnitude, they will display as more intense colors of red or blue.
You will probably find the labels at each node a bit long. If you do,
you can use the labels
argument to give them new names. If
you’d also like to see values for the individual correlation
coefficients, you can set the label_edges
argument to
TRUE
. Note: storing the result of the call to
plot_posterior_correlation()
in p
to
suppresses printing of the posterior correlation matrix.
p <- plot_posterior_correlation(mod_wild, labels = c("Leaf CN", "SLA", "Branches",
"Stems", "Ligules",
"Phyllaries"),
label_edges = TRUE)
If you look closely, you’ll notice that not all of the edges are labeled. That’s because some of those in the middle would overlap if they were displayed.
You aren’t stuck with the color palette used here. You can use any of
the diverging color palettes in RColorBrewer
. You can also
change the number of digits displayed by using the digits
argument.
Estimating trait integration is straightforward. We simply call the
assemble_statistics()
function with the fitted model as the
argument. It is best to store the result in a variable, say
results
, so that you can work with the results.
results <- assemble_statistics(mod_wild)
str(results)
#> 'data.frame': 4000 obs. of 9 variables:
#> $ EV_1 : num 1.89 1.78 1.83 1.88 1.94 ...
#> $ EV_2 : num 1.41 1.49 1.38 1.42 1.45 ...
#> $ EV_3 : num 0.933 0.993 0.999 1.002 0.855 ...
#> $ EV_4 : num 0.679 0.635 0.679 0.666 0.64 ...
#> $ EV_5 : num 0.595 0.585 0.595 0.557 0.597 ...
#> $ EV_6 : num 0.498 0.514 0.518 0.467 0.518 ...
#> $ Lead_EV: num 1.89 1.78 1.83 1.88 1.94 ...
#> $ Var_EV : num 0.295 0.279 0.266 0.311 0.327 ...
#> $ ent_R : num 0.265 0.263 0.253 0.279 0.275 ...
As you can see, results
is a data frame in which the
columns starting with EV_
are the eigenvalues of the
correlation matrix, Lead_EV
is the lead eigenvalue,
Var_EV
is the variance of the eigenvalues, and
ent_R
the entropy measure of trait integration. The rows of
the data frame correspond to samples from the posterior
distribution.
You can summarize the results using the
summarize_statistics()
function.
summarize_statistics(results)
#> Statistic Mean 2.5% 97.5%
#> 1 Lead_EV 1.801 1.651 1.956
#> 2 Var_EV 0.265 0.203 0.337
#> 3 ent_R 0.252 0.220 0.286
By default the data frame that is returned contains the posterior mean and the 2.5% and 97.5% quantiles of the statistic and rounds the results to 3 digits, but those options are easy to change.
summarize_statistics(results, probs = c(0.025, 0.05, 0.5, 0.95, 0.975), digits = 2)
#> Statistic Mean 2.5% 5% 50% 95% 97.5%
#> 1 Lead_EV 1.80 1.65 1.67 1.80 1.93 1.96
#> 2 Var_EV 0.27 0.20 0.21 0.26 0.32 0.34
#> 3 ent_R 0.25 0.22 0.22 0.25 0.28 0.29
You can also visualize the posterior distribution of any column using
functions from the bayesplot
package. For example, you can
visualize the posterior distributions of the lead eigenvalue, variance
of eigenvalues, and entR as follows (and as before, you can leave off
the bayesplot::
and `ggplot2::
prefixes if
you’ve loaded those libraries):
Note: You will only be able to use bayesplot
functions
that expect all chains to be merged, e.g., mcmc_areas()
,
mcmc_intervals()
,
mcmc_hist(), and
mcmc_dens()`.