This is the classic eight schools example from Rubin (1981). (It also appears in Chapter 5 of Gelman et al., 2014.) The story is that eight different schools experimented with an SAT coaching program. The performance gains of the coached students were compared to those of students in a wait-list control group. Separate estimates were obtained for each school, but because the schools differed in size, the standard errors differed as well.
Here are the data:
Schools <- data.frame(row.names=c("A","B","C","D","E","F","G","H"),
                      effect = c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16),
                      see = c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6))
Schools
Let's start by calculating a weighted average effect. I'll weight each case by its precision (one over the square of the standard error, the see column).
Schools$w <- 1/Schools$see^2
Schools.mean <- sum(Schools$w*Schools$effect)/sum(Schools$w)
Schools.mean
[1] 7.870546
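As a side note, precision weighting also yields a standard error for the pooled estimate, 1/sqrt(sum(w)), a standard fact about inverse-variance weighting. A quick sketch (the pooled.mean and pooled.se names are mine, and the data are repeated so the snippet stands alone):

```r
# Eight schools data (repeated from above)
effect <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
see    <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)

w <- 1 / see^2                          # precision weights
pooled.mean <- sum(w * effect) / sum(w) # matches the 7.870546 above
pooled.se   <- 1 / sqrt(sum(w))         # SE of the precision-weighted mean
round(c(mean = pooled.mean, se = pooled.se), 3)
```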
Here is a plot of the data.
ord <- order(Schools$effect)
plot(Schools$effect[ord[c(1,8)]]+c(-2,2)*Schools$see[ord[c(1,8)]],
c(nrow(Schools),1),main = "8 Schools data.",type="n",yaxt="n",
xlab="Effect Size",ylab="School")
points(Schools$effect[ord],nrow(Schools):1,pch=rownames(Schools)[ord])
segments(Schools$effect[ord]-2*Schools$see[ord],nrow(Schools):1,
Schools$effect[ord]+2*Schools$see[ord],nrow(Schools):1)
abline(v=Schools.mean,col="blue")
First we need to load the packages. The rstan package runs Stan from R. The shinystan package gives us a browser for the results.
library(rstan)
library(shinystan)
library(parallel) # For using multiple chains
First we set up the Stan model and put it into a variable called school8 (note the output.var="school8" in the stan block tag).
Almost all models will have a data, parameters and model block. They can have other blocks as well (common ones are transformed data, for pre-calculations, and transformed parameters, to recast the model).
In a Stan model, data refers to values that don't change over the course of the MCMC loop. This tends to be one of three things:

* The observed data values.
* Fixed hyperparameters for prior distributions.
* Structural hyperparameters (e.g., sample size, number of groups).
In Stan (like C++) all variables must be declared. They are generally either int or real (here vector is shorthand for a vector of real values). Modifiers such as <lower=0> place range constraints on a declared variable.
The parameters are the values that Stan will try to estimate. These include latent variables as well as ordinary parameters and hyperparameters.
In Stan, parameters should all be real: non-continuous parameters don't work with Hamiltonian Monte Carlo. (There are tricks for dealing with common cases like mixture models in the Stan examples.) What Stan samples is the log posterior density, which is called lp__ in the output.
Note carefully the use of the lower and upper bounds. It is important for stan to know when a parameter is restricted to say positive values (i.e., a scale parameter) as it needs to constrain the space for the sampler.
This section gives the distribution for all of the parameters. First I give the BUGS-like way of doing this, which is to use the ~ operator to give the distribution. The names are slightly different in Stan and R, so the rstan function lookup can help you find the Stan function corresponding to an R function.
lookup(dnorm)
lookup(dt)
Note that what Stan is actually doing in the model block is calculating the log p.d.f. Thus, the commented-out expressions are an alternative to the ~ notation.
Finally, note that this is an incomplete Bayesian model, as there are no distributions for mu or tau. In Stan this means we are implicitly putting a uniform prior on mu and on tau; because tau is declared with a lower bound, its prior is uniform over the positive reals. These are improper priors, but Stan will be fine as long as the posterior is proper.
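The Stan program itself does not appear in this excerpt (it sits in the stan chunk tagged output.var="school8"). For reference, here is a sketch of the standard eight-schools model matching the description above; the block structure and variable names follow the data list built below, but details may differ from the author's chunk. The commented-out target += lines are the log-density alternative to the ~ notation mentioned above.

```stan
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // standard errors of the effects
}
parameters {
  real mu;                 // population mean effect (implicit uniform prior)
  real<lower=0> tau;       // population sd of effects (implicit uniform prior)
  real theta[J];           // school-level effects
}
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
  // Equivalent log-density form:
  // target += normal_lpdf(theta | mu, tau);
  // target += normal_lpdf(y | theta, sigma);
}
```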
Build a list which contains the data as elements using the names in the data section of the model.
school8.dat <- list(
  J = nrow(Schools),
  y = Schools$effect,
  sigma = Schools$see)
There are two functions to start the sampling in Stan. The first is stan(file=XXX,data=YYY,...), which assumes the Stan model is in a file. However, with R Markdown we have already saved the compiled model in an object, so we can use sampling(model,data=YYY,...). By the way, the same function can be used to draw additional samples after we have sampled for a while.
The mc.cores option should work well on Unix (Linux and Mac OS); I'm not so sure about Windows.
#options(mc.cores = parallel::detectCores()-1)
school8.fit1 <- sampling(
school8.stan, # The model
data = school8.dat, # The data
chains = 5,
warmup = 1000,
iter = 2000,
refresh=100 # Show progress
)
starting worker pid=925 on localhost:11964 at 15:53:01.250
starting worker pid=934 on localhost:11964 at 15:53:01.367
starting worker pid=943 on localhost:11964 at 15:53:01.485
starting worker pid=952 on localhost:11964 at 15:53:01.599
starting worker pid=963 on localhost:11964 at 15:53:01.713
SAMPLING FOR MODEL 'school8' NOW (CHAIN 1).
Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 300 / 2000 [ 15%] (Warmup)
Iteration: 400 / 2000 [ 20%] (Warmup)
Iteration: 500 / 2000 [ 25%] (Warmup)
Iteration: 600 / 2000 [ 30%] (Warmup)
Iteration: 700 / 2000 [ 35%] (Warmup)
Iteration: 800 / 2000 [ 40%] (Warmup)
Iteration: 900 / 2000 [ 45%] (Warmup)
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
Iteration: 1100 / 2000 [ 55%] (Sampling)
Iteration: 1200 / 2000 [ 60%] (Sampling)
Iteration: 1300 / 2000 [ 65%] (Sampling)
Iteration: 1400 / 2000 [ 70%] (Sampling)
Iteration: 1500 / 2000 [ 75%] (Sampling)
Iteration: 1600 / 2000 [ 80%] (Sampling)
Iteration: 1700 / 2000 [ 85%] (Sampling)
Iteration: 1800 / 2000 [ 90%] (Sampling)
Iteration: 1900 / 2000 [ 95%] (Sampling)
Iteration: 2000 / 2000 [100%] (Sampling)
Elapsed Time: 0.036901 seconds (Warm-up)
0.018909 seconds (Sampling)
0.05581 seconds (Total)
(Analogous progress output for chains 2 through 5 omitted.)
There were 410 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems
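With 410 divergent transitions, the usual first remedy is the one the warning suggests: raise adapt_delta, which forces a smaller step size at the cost of slower sampling. A sketch, reusing the compiled model and data from above (the fit2 name and the 0.95 value are illustrative choices, not from the original):

```r
# Re-run with tighter adaptation; 0.95 is a common first try
school8.fit2 <- sampling(
  school8.stan,
  data = school8.dat,
  chains = 5,
  warmup = 1000,
  iter = 2000,
  control = list(adapt_delta = 0.95)  # default is 0.8
)
```

For this particular model, a non-centered parameterization of theta is generally the more robust fix for divergences, but raising adapt_delta is the quicker experiment.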
summary(school8.fit1)
$summary
              mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
mu            7.81    0.61 5.61  -1.97   4.01   8.11  11.63  18.65    86 1.04
tau           6.53    0.55 5.59   0.49   2.48   5.10   8.89  20.57   104 1.03
theta[1]     11.41    0.79 8.84  -2.40   5.48  10.74  16.13  32.41   124 1.03
theta[2]      7.66    0.71 6.73  -4.62   2.93   7.99  12.04  21.29    89 1.04
theta[3]      6.19    0.53 7.91 -11.04   0.74   6.57  11.66  20.63   226 1.02
theta[4]      7.45    0.54 6.88  -5.38   2.46   7.67  11.93  21.38   163 1.02
theta[5]      5.25    0.52 6.59  -8.56   0.80   5.66  10.16  17.27   158 1.03
theta[6]      6.08    0.59 6.93  -8.57   0.91   6.45  11.24  18.48   139 1.02
theta[7]     10.23    0.76 7.19  -1.36   5.51  10.31  14.46  26.17    89 1.03
theta[8]      8.35    0.60 7.95  -5.90   2.59   8.37  12.72  24.74   173 1.02
lp__        -16.77    1.08 6.67 -27.67 -21.43 -17.66 -12.81   1.24    38 1.11
($c_summary, which reports the same statistics separately for each of the five chains, omitted.)
Alternate style using external file.
options(mc.cores = parallel::detectCores()-1)
school8.fit1 <- stan(
file="Bayesian/school8.stan", # The model
data = school8.dat, # The data
chains = 5,
warmup = 1000,
iter = 2000,
refresh=100 # Show progress
)
summary(school8.fit1)
Printing gives summaries of the posterior for the specified parameters. Use the pars argument to select what to print.
print(school8.fit1,pars=c("theta","mu","tau","lp__"), probs=c(.1,.5,.9))
Inference for Stan model: school8.
5 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=5000.
mean se_mean sd 10%
theta[1] 11.41 0.79 8.84 1.10
theta[2] 7.66 0.71 6.73 -1.07
theta[3] 6.19 0.53 7.91 -3.19
theta[4] 7.45 0.54 6.88 -0.73
theta[5] 5.25 0.52 6.59 -2.87
theta[6] 6.08 0.59 6.93 -2.05
theta[7] 10.23 0.76 7.19 0.16
theta[8] 8.35 0.60 7.95 -0.71
mu 7.81 0.61 5.61 0.07
tau 6.53 0.55 5.59 0.97
lp__ -16.77 1.08 6.67 -24.32
50% 90% n_eff Rhat
theta[1] 10.74 22.86 124 1.03
theta[2] 7.99 15.71 89 1.04
theta[3] 6.57 15.20 226 1.02
theta[4] 7.67 15.80 163 1.02
theta[5] 5.66 12.79 158 1.03
theta[6] 6.45 14.28 139 1.02
theta[7] 10.31 19.33 89 1.03
theta[8] 8.37 17.82 173 1.02
mu 8.11 14.39 86 1.04
tau 5.10 13.65 104 1.03
lp__ -17.66 -7.30 38 1.11
Samples were drawn using NUTS(diag_e) at Wed Mar 6 15:53:05 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
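Beyond printed summaries, the draws themselves can be pulled out with rstan::extract for arbitrary posterior computations. A sketch, assuming the school8.fit1 object from above (the particular comparison, school A versus school C, is just an illustration):

```r
post <- rstan::extract(school8.fit1)     # named list of draws, warmup discarded
dim(post$theta)                          # iterations x 8 schools
mean(post$theta[, 1] > post$theta[, 3])  # posterior P(theta_A > theta_C)
quantile(post$tau, c(.1, .5, .9))        # posterior quantiles of tau
```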
The plot function gives 80% and 95% intervals by default.
plot(school8.fit1)
ci_level: 0.8 (80% intervals)
outer_level: 0.95 (95% intervals)
A traceplot helps assess convergence (note that stan and coda have slightly different traceplot functions, hence the rstan:: prefix).
rstan::traceplot(school8.fit1,pars=c("mu","tau","lp__"), inc_warmup=TRUE, nrow=3)
What we are looking for here is (a) white-noise-like behavior within each chain, and (b) all chains plotting on top of each other.
Typically variance and scale parameters will be positively skewed, and the log-posterior is often not as well behaved as the other parameters.
Pairs plots sometimes uncover problems with models.
pairs(school8.fit1,pars=c("mu","theta[1]","tau"))
ShinyStan opens a new Shiny workspace that allows interactive browsing of the results.
school8.fit1s <- launch_shinystan(school8.fit1)
Launching ShinyStan interface... for large models this may take some time.
Listening on http://127.0.0.1:3422