MCMC for Hierarchical Mixture Models

The purpose of this page is to share some of my R, JAGS and Stan code for fitting hierarchical mixture models. Hopefully, others will find this of use. I also may get some people to help me with feedback for improving the way these models are fit.

The following working paper describes the model that I am working with and some of the approaches in the code. The paper was presented at the 2014 Bayesian Application Workshop as part of the Uncertainty In Artificial Intelligence Conference. The slides from that conference are available (Power point, PDF), as are four graphs which show a direct comparison between the JAGS and Stan (unconstrained) runs for the case were both the data generation and estimated model have K=2:

Deviance/Log Posterior. Good convergence in both JAGS and Stan
mu0[1] Level 2 mean for first component. Note slow mixing and low effective sample size in JAGS.
alpha0[1] Level 2 average probability of first component. Note signs of multiple modes for this parameter in both JAGS and Stan runs.
gamma0[1] Level 2 standard deviation of Component 1 variances. Note cross-over at point 5,000 in Stan runs (Stan chains are actually a chain of size 5,000 pasted together with a chain of size 10,000) and multiple modes.

I have not pulled this together into a formal R package. I have a couple of files full of R functions plus a couple of longer script files for running the whole script. At the moment, this script file needs a fair bit of fine tuning for any application, so I'm not ready to make this into a package yet.

Prerequisites

rjags
rstan. Note: rstan is a source package, because it uses the local C compiler. So you need to follow the instructions on compiling R packages from source. The proper compilers are usually already installed under most Linux distributions. For Mac OS X, you need to download the software developers kit (Xcode) for your version of Mac OS from Apple. For Windows, follow the instructions at: Install Rtools for Windows.
coda
mixtools
rje

Code

Extra functions
- coda2.R Extensions to coda, mainly for handling conversion of JAGS and Stan output to coda.
- sortmix.R Code for sorting mixture components and generating input values.
- mixfuns.R Code for component identification, as well as WAIC and DIC calculations.
- hmmgen.RData for generating Hierarchcial Mixture Models.
- comparePlots.RPost processing to generate Stan/JAGS comparision plots for talk.
For Running in JAGS
- R script using JAGS
- JAGS model
For Running in JAGS
- R script using Stan
- There are three Stan models, depending on constraints: Unconstrained, mu0 constrained, and tau0 constrained.

Test Data

Simulated Data Simulated data for K=2, 3, and 4.
Initial Values Initial values taken from fitting a real data set to the K=2, 3, and 4 models using the data initialization procedure. This can be used to make realistic simulations.

Results from Running on test data

Output scripts and plots

The .Rout files are plain text which should be readable by any reasonable text reader (Note: Microsoft's NOTEPAD.EXE is broken, don't use it.) The .1.pdf files are from the first part of the MCMC run and the .2.pdf files are from the second part of the MCMC run.

Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=2	K=3	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=2	K=4	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots

K=3	K=2	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=3	K=3	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=3	K=4	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots

K=4	K=2	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=4	K=3	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots
K=4	K=4	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots	Log File 1st Run Plots 2nd Run Plots

Run Times

Times are user times in minutes. Other processing was happening during runs, but never more than a 2–3 process machine. Runs were on a ZaReason laptop with 8 CPUs running Ubuntu 14.4.

Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	11.8	34.1	31.3
K=2	K=3	24.7	60.6	131.2
K=2	K=4	51.2	348.8	389.0

K=3	K=2	12.2	37.8	21.6
K=3	K=3	32.2	76.8	57.9
K=3	K=4	50.0	337.5	140.9

K=4	K=2	11.1	81.9	90.8
K=4	K=3	31.4	438.2	198.9
K=4	K=4	51.8	461.4	340.6

Convergence Results

This table summarizes the maximum value of R-hat for the Level-2 parameters for each data/model combination using each method. The minimum effective MCMC sample size is given in parenthesis (note that the full MC sample is 45,000 cycles in each case).

Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	1.6 (142)	1.9 (308)	1.4 (3084)
K=2	K=3	2.0 (23)	2.0 (278)	1.5 (3164)
K=2	K=4	1.6 (19)	2.1 (10)	1.3 (9917)

K=3	K=2	2.2 (101)	1.4 (1389)	1.5 (1163)
K=3	K=3	1.4 (64)	2.7 (636)	1.3 (462)
K=3	K=4	1.6 (147)	1.9 (60)	1.4 (486)

K=4	K=2	1.3 (17)	1.4 (292)	1.5 (805)
K=4	K=3	1.8 (31)	1.5 (108)	1.3 (799)
K=4	K=4	1.5 (45)	1.5 (252)	1.7 (647)

Model Fit indexes

This table looks at the DIC, and WAIC indexes for each model. Look at Chapter 7 of Bayesian Data Analysis, 3rd Edition for definitions. The number in parenthesis is the estimated dimensionality of the model. Note that the Markov chains have not reached pseudo-convergence, so these results need to be taken with an entire carton of salt.

If we were calculating AIC instead of WAIC, the number of parameters would vary according to the second column. The parameter count is (3K-1)I, where in all data sets I=10; so the counts should be 50 (K=2), 80 (K=3) or 110 (K=4). Nominally, the WAIC value should be lowest when the first and second columns match.

WAIC1

Data Set Model Fit JAGS Stan (constrained) Stan (unconstrained)

K=2 K=2 1132 (36.4) 1131 (36.3) 1132 (36.4)

K=2 K=3 1132 (35.5) 1133 (36.5) 1131 (35.5)

K=2 K=4 1135 (38) 1133 (35.8) 1132 (35.5)

K=3 K=2 1361 (67.7) 1326 (49.6) 1320 (43.0)

K=3 K=3 1190 (60.29) 1188 (60.7) 1186.6 (60.0)

K=3 K=4 1187 (63.8) 1188 (60.6) 1188 (60.9)

K=4 K=2 1227 (50.5) 1260 (77.3) 1234 (53.3)

K=4 K=3 1124 (73.4) 1125 (74.9) 1128 (76.6)

K=4 K=4 1120 (97.7) 1119 (97.1) 1116 (93.4)

WAIC2

Data Set Model Fit JAGS Stan (constrained) Stan (unconstrained)

K=2 K=2 1132 (36.3) 1130 (36.3) 1131 (36.3)

K=2 K=3 1132 (35.5) 1133 (36.5) 1131 (35.5)

K=2 K=4 1135 (38.2) 1133 (35.8) 1132 (35.5)

K=3 K=2 1361 (67.7) 1326 (49.7) 1320 (43.0)

K=3 K=3 1190 (60.3) 1189 (60.7) 1187 (60.0)

K=3 K=4 1187 (63.8) 1188 (60.5) 1188 (60.9)

K=4 K=2 1227 (50.5) 1260 (77.3) 1234 (53.3)

K=4 K=3 1124 (73.4) 1125 (74.9) 1128 (76.6)

K=4 K=4 1121 (97.7) 1118 (97.1) 1116 (93.5)

Something strange is happening currently in the DIC calculations. The pDIC is almost always turning out negative, and sometimes the DIC is negative as well. There is likely a bug in the code, I need to check it.

DIC

Data Set Model Fit JAGS Stan (constrained) Stan (unconstrained)

K=2 K=2 # # #

K=2 K=3 # # #

K=2 K=4 # # #

K=3 K=2 # # #

K=3 K=3 # # #

K=3 K=4 # # #

K=4 K=2 # # #

K=4 K=3 # # #

K=4 K=4 # # #

This is based on the DIC alternative suggested by Andrew Gelman, which uses the variance of the deviance to estimate the dimensionality of the model.

DIC alternate

Data Set Model Fit JAGS Stan (constrained) Stan (unconstrained)

K=2 K=2 1318 (56.7) 1572 (55.2) 1475 (57.1)

K=2 K=3 2546.7 (55.7) 2417 (57.8) 2372 (56.5)

K=2 K=4 2822 (61.3) 13309 (59.7) 16352 (57.9)

K=3 K=2 1805 (127.2) 1326 (49.7) 1458 (63.1)

K=3 K=3 1548 (107.3) 1839 (112.9) 1601 (117.2)

K=3 K=4 1686 (123.0) 3579 (126.3) 2314 (119.0)

K=4 K=2 1359 (75.9) 2132 (162.8) 1767 (127.9)

K=4 K=3 1760 (103.7) 2795 (117.5) 2382 (116.7)

K=4 K=4 2008 (196.0) 3217.9 (191.1) 5174 (211.1)

WAIC1
Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	1132 (36.4)	1131 (36.3)	1132 (36.4)
K=2	K=3	1132 (35.5)	1133 (36.5)	1131 (35.5)
K=2	K=4	1135 (38)	1133 (35.8)	1132 (35.5)

K=3	K=2	1361 (67.7)	1326 (49.6)	1320 (43.0)
K=3	K=3	1190 (60.29)	1188 (60.7)	1186.6 (60.0)
K=3	K=4	1187 (63.8)	1188 (60.6)	1188 (60.9)

K=4	K=2	1227 (50.5)	1260 (77.3)	1234 (53.3)
K=4	K=3	1124 (73.4)	1125 (74.9)	1128 (76.6)
K=4	K=4	1120 (97.7)	1119 (97.1)	1116 (93.4)

WAIC2
Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	1132 (36.3)	1130 (36.3)	1131 (36.3)
K=2	K=3	1132 (35.5)	1133 (36.5)	1131 (35.5)
K=2	K=4	1135 (38.2)	1133 (35.8)	1132 (35.5)

K=3	K=2	1361 (67.7)	1326 (49.7)	1320 (43.0)
K=3	K=3	1190 (60.3)	1189 (60.7)	1187 (60.0)
K=3	K=4	1187 (63.8)	1188 (60.5)	1188 (60.9)

K=4	K=2	1227 (50.5)	1260 (77.3)	1234 (53.3)
K=4	K=3	1124 (73.4)	1125 (74.9)	1128 (76.6)
K=4	K=4	1121 (97.7)	1118 (97.1)	1116 (93.5)

DIC
Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	#	#	#
K=2	K=3	#	#	#
K=2	K=4	#	#	#

K=3	K=2	#	#	#
K=3	K=3	#	#	#
K=3	K=4	#	#	#

K=4	K=2	#	#	#
K=4	K=3	#	#	#
K=4	K=4	#	#	#

DIC alternate
Data Set	Model Fit	JAGS	Stan (constrained)	Stan (unconstrained)
K=2	K=2	1318 (56.7)	1572 (55.2)	1475 (57.1)
K=2	K=3	2546.7 (55.7)	2417 (57.8)	2372 (56.5)
K=2	K=4	2822 (61.3)	13309 (59.7)	16352 (57.9)

K=3	K=2	1805 (127.2)	1326 (49.7)	1458 (63.1)
K=3	K=3	1548 (107.3)	1839 (112.9)	1601 (117.2)
K=3	K=4	1686 (123.0)	3579 (126.3)	2314 (119.0)

K=4	K=2	1359 (75.9)	2132 (162.8)	1767 (127.9)
K=4	K=3	1760 (103.7)	2795 (117.5)	2382 (116.7)
K=4	K=4	2008 (196.0)	3217.9 (191.1)	5174 (211.1)

Page maintained by Russell Almond. Last updated on 2014-06-23.