--- title: "Law of Large Numbers" author: "Russell Almond" date: "February 19, 2019" output: html_document runtime: shiny --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Law of Large Numbers This is pretty close to the frequency definition of probability. Suppose the probability of some event is $p$. Suppose further than we sample $N$ times from the process that generates this event. Let $p_N$ be the proportion of times the event occurs in $N$ trials. As $N$ gets bigger and bigger, $p_N$ gets closer and closer to $p$. ![Detour](sign_turn_left.png)_(Skip this unless you are good with calculus.)_ This is one of those epsilon-delta theorems. So let $\delta$ be a difference from $p$ and let $\epsilon$ be a small probability. For any $\epsilon$ and $\delta$, there exists an $N$ such that $P(|p_N-p|>\delta) < \epsilon$. ## A demonstration. In the picture below, pick a probability $p$ and a sample size $N$. The computer will generate samples up to $N$ and plot $p_N$. The $\delta$-line is an error bound plus or minus $\delta$ units from the target $p$. This is a target so you can judge how close you got. ```{r LoLN, echo=FALSE} inputPanel( selectInput("N", label = "Maximum Sample Size:", choices = c(50, 100, 200, 500, 1000), selected = 200), sliderInput("p", label = "Probability of event (p)", min = 0, max = 1, value = .5, step = 0.01), sliderInput("delta", label = "Distance of reference line from target (delta)", min = 0, max = .1, value = .05, step = 0.005) ) renderPlot({ x <- runif(input$N) < input$p pn <- cumsum(x)/1:input$N plot(1:input$N,pn,xlab="Number of Trials",ylab="Proportion Success", type="l") abline(h=input$p,col="blue") abline(h=input$p+input$delta,col="skyblue") abline(h=input$p-input$delta,col="skyblue") }) ``` ## Convergence of Distributions (Boot strap distribution) We can use the _Law of Large Numbers_ to prove an important theorem. As the sample size gets larger and larger, the sample looks more and more like the population it is drawn from. ![Proof](sign_turn_left.png) Technically, the _Law of Large Numbers_ refers to the result above. But we can use it so show a very important basis of statistics. Suppose we have some kind of distribution, $F(x)$, that generates numbers, $X$. Recall that the definition of $F(x)=\Pr(X \leq x)$. ![Proof](sign_turn_left.png) Draw a sample of size $N$ from this distribution. Now consider the sampled data points $X_1,\ldots,X_N$, and consider sampling a new value $Y$ from that distribution. Let $F_N(y) = \Pr(Y \leq y)$. This is sometimes called the _bootstrap distribution_. ![Proof](sign_turn_left.png) By the law of large numbers, for every $y$, as $N$ gets large $F_N(y) \rightarrow F(y)$. So the sample distribution $F_N()$ converges to the $F()$. ## Demonstration of convergence of distributions. Pick a distribution: * Normal -- standard normal * Exponential -- highly skewed * Gamma (shape = 3) -- skewed * T (df =3) -- high kurtosis Slide the sample size up and down, notice how the empirical distribution function and histogram coverge to the theoretical distribution function and density. ```{r DistConv, echo=FALSE} nmax <- 1000 rdist <- list(Normal=rnorm, Exponential = rexp, Gamma = function(n) rgamma(n,3), "T" = function(n) rt(n,3)) pdist <- list(Normal=pnorm, Exponential = pexp, Gamma = function(q) pgamma(q,3), "T" = function(q) pt(q,3)) ddist <- list(Normal=dnorm, Exponential = dexp, Gamma = function(x) dgamma(x,3), "T" = function(x) dt(x,3)) inputPanel( selectInput("dist",label="Distribution Type", choices=c("Normal","Exponential","Gamma","T"), selected="Normal"), sliderInput("NN", label = "Maximum Sample Size:", min = 25, max=nmax, value=100, step=5) ) renderPlot({ XX <- do.call(rdist[[input$dist]],list(nmax)) Fn <-ecdf(XX[1:input$NN]) layout(matrix(c(1,2),1,2)) plot(Fn, main=paste("Actual vs Empirical Distribution Function, N=",input$NN)) curve(do.call(pdist[[input$dist]],list(x)),add=TRUE,lty=2,col="red") hist(XX[1:input$NN], probability = TRUE, main=paste("Actual vs Empirical Density Function, N=",input$NN),xlab="X") curve(do.call(ddist[[input$dist]],list(x)),add=TRUE,lty=2,col="red") }) ``` See also the [animated version](LawOfLargeNumbersAnimated.Rmd).