---
title: "Central Limit Theorem"
author: "Russell Almond"
date: "September 8, 2020"
output: html_document
runtime: shiny
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Pick a distribution:

* Uniform -- platykurtic
* Binomial (size = 1, p = .5) -- symmetric, but discrete and platykurtic
* Exponential -- highly skewed
* Gamma (shape = 3) -- skewed
* T (df = 3) -- high kurtosis (heavy tails)

Slide the size of each sample (the second slider) up and down, and notice how the empirical distribution of the sample means (shown in the QQ-plot) and its histogram converge to the normal distribution function and density.

```{r DistConv, echo=FALSE}
nmax <- 1000  # maximum number of samples (rows)
mmax <- 100   # maximum size of each sample (columns)

## Random number generators; each takes the number of draws n
rdist <- list(Uniform = runif,
              Binomial = function(n) rbinom(n, 1, .5),
              Exponential = rexp,
              Gamma = function(n) rgamma(n, 3),
              "T" = function(n) rt(n, 3))

inputPanel(
  selectInput("dist", label = "Distribution Type",
              choices = c("Uniform", "Binomial", "Exponential", "Gamma", "T"),
              selected = "Uniform"),
  sliderInput("NN", label = "Number of Samples:",
              min = 25, max = nmax, value = nmax, step = 5),
  sliderInput("MM", label = "Size of each sample:",
              min = 1, max = mmax, value = 1, step = 1)
)

## Redrawn only when the distribution changes: nmax * mmax values arranged
## as an nmax x mmax matrix, so each row is one sample of size mmax.
XX <- reactive(matrix(do.call(rdist[[input$dist]], list(nmax * mmax)),
                      nmax, mmax))

renderPlot({
  layout(matrix(1:4, 2, 2))  # 2 x 2 grid, filled column by column
  ## Left column: single draws (the original distribution)
  X1 <- XX()[1:input$NN, 1]
  ## Right column: mean of the first MM values in each row
  Xmean <- rowMeans(XX()[1:input$NN, 1:input$MM, drop = FALSE])
  hist(X1, main = "Average of sample of size 1", probability = TRUE)
  curve(dnorm(x, mean(X1), sd(X1)), add = TRUE, lty = 2, col = "red")
  qqnorm(X1, main = "Average of sample of size 1")
  qqline(X1)
  hist(Xmean, main = paste("Average of sample of size", input$MM),
       probability = TRUE)
  curve(dnorm(x, mean(Xmean), sd(Xmean)), add = TRUE, lty = 2, col = "red")
  qqnorm(Xmean, main = paste("Average of sample of size", input$MM))
  qqline(Xmean)
})
```

The left column shows the original distribution. (I call that the _black hat_ in my CLT demo.) The right column shows the distribution of the means of samples of size $M$ (set with the second slider). (This is the _white hat_ distribution, and $M$ is the number of cards averaged to get the white hat value.) The top row shows histograms with a dashed red normal density curve on top, using the mean and standard deviation of the plotted values.
The bottom row shows a QQ-plot, which shows how far the sample departs from a normal distribution. If the data are normal, the points should fall close to the diagonal line.

* A U-shaped curve indicates positive (right) skewness; an upside-down U indicates negative skewness. Try the exponential distribution.
* An S-shaped curve indicates high kurtosis; a backwards S indicates low kurtosis. Try the Student-t and Uniform distributions.

As $M$ (the number of cards averaged to get the white hat value) gets bigger, the distribution should get closer and closer to the normal distribution.

## Take home

* Even if the underlying data aren't normal, the distribution of the group means should be close to normal (provided the data have a finite variance).
* How close depends on the sample size.
* A bigger sample is needed if the data are highly skewed (exponential and gamma) or leptokurtic (exponential and Student t).
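One way to see the last point numerically, outside the app, is to track the sample skewness and excess kurtosis of the means as $M$ grows. This sketch is not part of the original demo: it uses base R only, `skew()` and `exkurt()` are hypothetical helper names defined here (plain moment estimators with no small-sample correction), and the 10,000 samples and the values of $M$ are arbitrary choices. For exponential data, the mean of $M$ draws has skewness $2/\sqrt{M}$ and excess kurtosis $6/M$, so the simulated columns can be checked against those theoretical values.

```r
## Hypothetical helpers: moment estimators of skewness and excess kurtosis
skew <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}
exkurt <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}

set.seed(1)          # arbitrary seed, only for reproducibility
nsamp <- 10000       # number of samples (assumed; more than the app's 1000 to reduce noise)
Ms <- c(1, 4, 16, 64)

## Exponential(rate = 1): a single draw has skewness 2 and excess kurtosis 6.
## The mean of M draws has skewness 2/sqrt(M) and excess kurtosis 6/M.
res <- t(sapply(Ms, function(M) {
  X <- matrix(rexp(nsamp * M), nsamp, M)  # each row is one sample of size M
  Xmean <- rowMeans(X)
  c(M = M,
    skew = skew(Xmean), theory_skew = 2 / sqrt(M),
    exkurt = exkurt(Xmean), theory_exkurt = 6 / M)
}))
print(round(res, 2))
```

Each fourfold increase in $M$ should roughly halve the skewness, which is why skewed data need larger samples before the means look normal. The theory columns apply only to the exponential; the t distribution with 3 degrees of freedom has infinite kurtosis, so its sample kurtosis would not settle down at any $M$.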