MCMC for Hierarchical Mixture Models
The purpose of this page is to share some of my R, JAGS and Stan code
for fitting hierarchical mixture models. Hopefully, others will find
this of use. I also may get some people to help me with feedback for
improving the way these models are fit.
The following working paper describes the
model that I am working with and some of the approaches in the code.
The paper was presented at
the 2014
Bayesian Application Workshop as
part of the Uncertainty In Artificial Intelligence Conference. The
slides from that conference are available (Power
point, PDF), as are four graphs which
show a direct comparison between the JAGS and Stan (unconstrained)
runs for the case were both the data generation and estimated model
have K=2:
- Deviance/Log Posterior. Good
convergence in both JAGS and Stan
- mu0[1] Level 2 mean for first
component. Note slow mixing and low effective
sample size in JAGS.
- alpha0[1] Level 2 average
probability of first component. Note signs of multiple modes for
this parameter in both JAGS and Stan runs.
- gamma0[1] Level 2 standard deviation
of Component 1 variances. Note cross-over at point 5,000 in Stan
runs (Stan chains are actually a chain of size 5,000 pasted together
with a chain of size 10,000) and multiple modes.
I have not pulled this together into a formal R package. I have a
couple of files full of R functions plus a couple of longer script
files for running the whole script. At the moment, this script file
needs a fair bit of fine tuning for any application, so I'm not ready
to make this into a package yet.
Prerequisites
- rjags
- rstan. Note: rstan
is a source package, because it uses the local C compiler. So you
need to follow the instructions on compiling R packages from
source. The proper compilers are usually already installed under
most Linux distributions. For Mac OS X, you need to download the
software developers kit (Xcode) for your version of Mac OS from Apple. For
Windows, follow the instructions
at: Install
Rtools for Windows.
- coda
- mixtools
- rje
Code
- Extra functions
- coda2.R Extensions to coda, mainly for
handling conversion of JAGS and Stan output to coda.
- sortmix.R Code for sorting mixture
components and generating input values.
- mixfuns.R Code for component
identification, as well as WAIC and DIC calculations.
- hmmgen.RData for generating Hierarchcial
Mixture Models.
- comparePlots.RPost processing to
generate Stan/JAGS comparision plots for talk.
- For Running in JAGS
- For Running in JAGS
Test Data
- Simulated Data Simulated data for K=2, 3,
and 4.
- Initial Values Initial values taken from
fitting a real data set to the K=2, 3, and 4 models using the data
initialization procedure. This can be used to make realistic
simulations.
Results from Running on test data
Output scripts and plots
The .Rout
files are plain text which should be readable
by any reasonable text reader (Note:
Microsoft's NOTEPAD.EXE
is broken, don't use it.)
The .1.pdf
files are from the first part of the MCMC run
and the .2.pdf
files are from the second part of the MCMC
run.
Run Times
Times are user times in minutes. Other processing was happening
during runs, but never more than a 2–3 process machine. Runs
were on a ZaReason laptop with 8 CPUs running Ubuntu 14.4.
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | 11.8 | 34.1 | 31.3 |
K=2 | K=3 | 24.7 | 60.6 | 131.2 |
K=2 | K=4 | 51.2 | 348.8 | 389.0 |
|
K=3 | K=2 | 12.2 | 37.8 | 21.6 |
K=3 | K=3 | 32.2 | 76.8 | 57.9 |
K=3 | K=4 | 50.0 | 337.5 | 140.9 |
|
K=4 | K=2 | 11.1 | 81.9 | 90.8 |
K=4 | K=3 | 31.4 | 438.2 | 198.9 |
K=4 | K=4 | 51.8 | 461.4 | 340.6 |
Convergence Results
This table summarizes the maximum value of R-hat for the Level-2
parameters for each data/model combination using each method. The
minimum effective MCMC sample size is given in parenthesis (note that
the full MC sample is 45,000 cycles in each case).
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | 1.6 (142) | 1.9 (308) | 1.4 (3084) |
K=2 | K=3 | 2.0 (23) | 2.0 (278) | 1.5 (3164) |
K=2 | K=4 | 1.6 (19) | 2.1 (10) | 1.3 (9917) |
|
K=3 | K=2 | 2.2 (101) | 1.4 (1389) | 1.5 (1163) |
K=3 | K=3 | 1.4 (64) | 2.7 (636) | 1.3 (462) |
K=3 | K=4 | 1.6 (147) | 1.9 (60) | 1.4 (486) |
|
K=4 | K=2 | 1.3 (17) | 1.4 (292) | 1.5 (805) |
K=4 | K=3 | 1.8 (31) | 1.5 (108) | 1.3 (799) |
K=4 | K=4 | 1.5 (45) | 1.5 (252) | 1.7 (647) |
Model Fit indexes
This table looks at the DIC, and WAIC indexes for each model. Look at
Chapter 7
of Bayesian
Data Analysis, 3rd Edition for definitions. The number
in parenthesis is the estimated dimensionality of the model. Note
that the Markov chains have not reached pseudo-convergence, so these
results need to be taken with an entire carton of salt.
If we were calculating AIC instead of WAIC, the number of parameters
would vary according to the second column. The parameter count
is (3K-1)I
, where in all data sets I=10
; so
the counts should be 50 (K=2), 80 (K=3) or 110 (K=4). Nominally, the
WAIC value should be lowest when the first and second columns match.
WAIC1 |
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | 1132 (36.4) | 1131
(36.3) | 1132 (36.4) |
K=2 | K=3 | 1132 (35.5) | 1133
(36.5) | 1131 (35.5) |
K=2 | K=4 | 1135 (38) | 1133
(35.8) | 1132 (35.5) |
|
K=3 | K=2 | 1361 (67.7) | 1326
(49.6) | 1320 (43.0) |
K=3 | K=3 | 1190 (60.29) | 1188
(60.7) | 1186.6 (60.0) |
K=3 | K=4 | 1187 (63.8) | 1188
(60.6) | 1188 (60.9) |
|
K=4 | K=2 | 1227 (50.5) | 1260 (77.3) | 1234 (53.3) |
K=4 | K=3 | 1124 (73.4) | 1125
(74.9) | 1128 (76.6) |
K=4 | K=4 | 1120 (97.7) | 1119
(97.1) | 1116 (93.4) |
WAIC2 |
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | 1132 (36.3) | 1130
(36.3) | 1131 (36.3) |
K=2 | K=3 | 1132 (35.5) | 1133
(36.5) | 1131 (35.5) |
K=2 | K=4 | 1135 (38.2) | 1133
(35.8) | 1132 (35.5) |
|
K=3 | K=2 | 1361 (67.7) | 1326
(49.7) | 1320 (43.0) |
K=3 | K=3 | 1190 (60.3) | 1189
(60.7) | 1187 (60.0) |
K=3 | K=4 | 1187 (63.8) | 1188
(60.5) | 1188 (60.9) |
|
K=4 | K=2 | 1227 (50.5) | 1260
(77.3) | 1234 (53.3) |
K=4 | K=3 | 1124 (73.4) | 1125
(74.9) | 1128 (76.6) |
K=4 | K=4 | 1121 (97.7) | 1118
(97.1) | 1116 (93.5) |
Something strange is happening currently in the DIC calculations. The
pDIC is almost always turning out negative, and sometimes the DIC is
negative as well. There is likely a bug in the code, I need to check it.
DIC |
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | # | # | # |
K=2 | K=3 | # | # | # |
K=2 | K=4 | # | # | # |
|
K=3 | K=2 | # | # | # |
K=3 | K=3 | # | # | # |
K=3 | K=4 | # | # | # |
|
K=4 | K=2 | # | # | # |
K=4 | K=3 | # | # | # |
K=4 | K=4 | # | # | # |
This is based on the DIC alternative suggested by Andrew Gelman, which
uses the variance of the deviance to estimate the dimensionality of
the model.
DIC alternate |
Data Set | Model Fit | JAGS | Stan
(constrained) | Stan (unconstrained) |
K=2 | K=2 | 1318 (56.7) | 1572
(55.2) | 1475 (57.1) |
K=2 | K=3 | 2546.7 (55.7) | 2417
(57.8) | 2372 (56.5) |
K=2 | K=4 | 2822 (61.3) | 13309
(59.7) | 16352 (57.9) |
|
K=3 | K=2 | 1805 (127.2) | 1326
(49.7) | 1458 (63.1) |
K=3 | K=3 | 1548 (107.3) | 1839 (112.9) | 1601 (117.2) |
K=3 | K=4 | 1686 (123.0) | 3579 (126.3) | 2314 (119.0) |
|
K=4 | K=2 | 1359 (75.9) | 2132
(162.8) | 1767 (127.9) |
K=4 | K=3 | 1760 (103.7) | 2795
(117.5) | 2382 (116.7) |
K=4 | K=4 | 2008 (196.0) | 3217.9
(191.1) | 5174 (211.1) |
Page maintained by Russell Almond.
Last updated on 2014-06-23.