---
title: "8 Schools Stan Improved"
author: "Russell Almond"
date: "1/31/2019"
output: html_notebook
editor_options:
chunk_output_type: inline
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# The 8 Schools Problem
This is the classic eight schools example from Rubin(1981)[^1]. (It is also found in Chapter 5 of Gelman et al., 2014 [^2].) The story is that 8 different schools experimented with an SAT coaching experiment. The performance gains of the coached students were compared to students on a weight list control. Separate estimates were obtained for each school, but because the size of the schools differed, the standard errors differed as well.
Here are the data:
```{r data}
Schools <- data.frame(row.names=c("A","B","C","D","E","F","G","H"),
effect = c(28.39,7.94,-2.75,6.82,-.64,.63,18.01,12.16),
see = c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6))
Schools
```
Lets start by calculating a weighted average effect. I'll weight each case by the precision (one over the square of the see).
```{r grand mean}
Schools$w <- 1/Schools$see^2
Schools.mean <- sum(Schools$w*Schools$effect)/sum(Schools$w)
Schools.mean
```
Here is a plot of the data.
```{r data Plot}
ord <- order(Schools$effect)
plot(Schools$effect[ord[c(1,8)]]+c(-2,2)*Schools$see[ord[c(1,8)]],
c(nrow(Schools),1),main = "8 Schools data.",type="n",yaxt="n",
xlab="Effect Size",ylab="School")
points(Schools$effect[ord],nrow(Schools):1,pch=rownames(Schools)[ord])
segments(Schools$effect[ord]-2*Schools$see[ord],nrow(Schools):1,
Schools$effect[ord]+2*Schools$see[ord],nrow(Schools):1)
abline(v=Schools.mean,col="blue")
```
## Model Setup in Stan.
First we need to load the packages. The `rstan` package runs stan from R. The `shinystan` package gives us a browser for the results.
```{r}
library(rstan)
library(shinystan)
library(parallel) # For using multiple chains
options(mc.cores=5)
```
First we set up the Stan model, and put it into a variable called `school8` (note the `output.var="school8"` in stan block tag)
```{stan output.var="school8.stan"}
data {
int J; // number of schools
real y[J]; // estimated treatement effects
real sigma[J]; //s.e. of effects.
}
parameters {
real mu;
real tau;
vector[J] eta;
}
transformed parameters {
vector [J] theta;
theta = mu + eta*tau;
}
model {
eta ~ normal(0,1);
//target += normal_lpdf(theta|mu,tau);
y ~ normal(theta,sigma);
//target += normal_lpdf(effect|theta,see);
}
```
### Parts of the stan code.
Almost all models will have a `data`, `parameters` and `model` block. They could have others as well (common ones are `transformed data` to do pre-calculations and `transformed parameters` to recast the model).
#### Data
In a stan model, _data_ refers to values that don't change over the course of the MCMC loop. This tends to be one of three things:
* The observed data values.
* Fixed hyperparameters for prior distributions.
* Structural hyperparamters (e.g., sample size, number of groups).
In stan (like C++) all variables must be declared. They are generally either `int` or `real` (here `vector` is shorthand for a vector of real values). The modifiers and can be used to constrain the inputs. In the data field this is just used to to type checking.
#### Parameters
The parameters are the values that stan will try to estimate. These include both latent variables and ordinary parameters and hyperparameters.
In stan, parameters should all be real. Non-continuous parameters don't work with the Hamiltonian Monte Carlo. (There are tricks for dealing with common cases like mixture models in the stan examples.) This is called `lp__` (log pdf) in the output.
Note carefully the use of the lower and upper bounds. It is important for stan to know when a parameter is restricted to say positive values (i.e., a scale parameter) as it needs to constrain the space for the sampler.
#### Model
This section gives the distribution for all of the parameters. First I give the BUGS-like way of doing this. This is to use the `~` operator to give the distribution. The names are slightly different in stan and R, so the rstan function `lookup` can help you find the stan function corresponding to the R function.
```{r lookup, echo=TRUE}
lookup(dnorm)
lookup(dt)
```
Note that what stan is actually doing in the model block is calculating the log p.d.f. Thus, the commented out expressions are an alternative to the `~` notation.
Finally, note that this is an incomplete Bayesian model as there are no distributions for `mu` or `tau`. In stan this means we are implicitly putting a uniform prior on `mu` and `log(tau)`; the latter is transformed so that it will always be positive. These are improper priors, but stan will be fine as long as the posterior is proper.
## Data preparation
Build a list which contains the data as elements using the names in the data section of the model.
```{r School 8 Data}
school8.dat <- list(
J = nrow(Schools),
y = Schools$effect,
sigma=Schools$see)
```
## Running Stan
There are two funcitons to start the sampling in stan. The first one is `stan(file=XXX,data=YYY,...)`. This assumes that the stan model is in a file. However, with R markdown, we already saved the model in an object so we can use `sampling(model,data=YYY,...)`. By the way, this same function can be used to make additional samples after we have sampled for a while.
The mc.cores should work well for Unix (Linux and Mac OS), I'm not so sure about Windows.
```{r Run Stan-sampling}
#options(mc.cores = parallel::detectCores()-1)
school8.fit1 <- sampling(
school8.stan, # The model
data = school8.dat, # The data
chains = 5,
warmup = 1000,
iter = 2000,
refresh=1000 # Show progress
)
summary(school8.fit1)
```
Alternate style using external file.
```{r Run Stan-stan}
options(mc.cores = parallel::detectCores()-1)
school8.fit1 <- stan(
file="Bayesian/school8.stan", # The model
data = school8.dat, # The data
chains = 5,
warmup = 1000,
iter = 2000,
refresh=100 # Show progress
)
summary(school8.fit1)
```
## Summaries using the base Stan functions
Printing gives summaries of the posterior for the specified parameters. Use the `pars` argument to select what to print.
```{r print stanfit}
print(school8.fit1,pars=c("theta","mu","tau","lp__"), probs=c(.1,.5,.9))
```
The `plot` function gives 50% and 95% intervals.
```{r stan plot}
plot(school8.fit1,pars=c("theta","mu","tau"))
```
Traceplot shows convergence (note stan and coda have slightly different traceplot functions).
```{r stan traceplot}
rstan::traceplot(school8.fit1,pars=c("mu","tau","lp__"), inc_warmup=TRUE, nrow=3)
```
What we are looking for here is (a) white-noise like, and all chains plotting over top of each other.
Typically variance and scale parameters will be positively skewed, and log-posterior is often not as well behaved as others.
Pair plot sometimes uncover problems with models.
```{r stan pairs}
pairs(school8.fit1,pars=c("mu","eta[1]","tau"))
```
## Shinystan
Shinystan opens a new shiny workspace which allows interactive browsing of the results.
```{r shinystan}
school8.fit1s <- launch_shinystan(school8.fit1)
```
### References
[^1]: Rubin, D. B. (1981). Estimation in Parallel randomized experiments. _Journal of Educational Statistics_, __6__, 377-401.
[^2]: Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). _Bayesian Data Analysis: Third Edition_. CRC Press. (ISBN: 978-1-4398-4095-5)
[^3]: Gelman, A. and Hill, J. (2007). _Data Analysis Using Regression and Hierarchical/Multilevel Models._ Cambridge University Press.
[^4]: Dempster, A. P. Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). _Journal of the Royal Statistical Society,_ __39__, 1--38.