--- title: "Law of Large Numbers" author: "Russell Almond" date: "February 19, 2019" output: html_document runtime: shiny --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(plotly) library(patchwork) library(tidyverse) accumulate_by <- function(dat, var) { var <- lazyeval::f_eval(var, dat) lvls <- plotly:::getLevels(var) dats <- lapply(seq_along(lvls), function(x) { cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]]) }) dplyr::bind_rows(dats) } ``` # Law of Large Numbers This is pretty close to the frequency definition of probability. Suppose the probability of some event is $p$. Suppose further than we sample $N$ times from the process that generates this event. Let $p_N$ be the proportion of times the event occurs in $N$ trials. As $N$ gets bigger and bigger, $p_N$ gets closer and closer to $p$. ![Detour](sign_turn_left.png)_(Skip this unless you are good with calculus.)_ This is one of those epsilon-delta theorems. So let $\delta$ be a difference from $p$ and let $\epsilon$ be a small probability. For any $\epsilon$ and $\delta$, there exists an $N$ such that $P(|p_N-p|>\delta) < \epsilon$. ## A demonstration. In the picture below, pick a probability $p$ and a sample size $N$. The computer will generate samples up to $N$ and plot $p_N$. The $\delta$-line is an error bound plus or minus $\delta$ units from the target $p$. This is a target so you can judge how close you got. ```{r LoLN, echo=FALSE} inputPanel( selectInput("N", label = "Maximum Sample Size:", choices = c(50, 100, 200, 500, 1000), selected = 200), sliderInput("p", label = "Probability of event (p)", min = 0, max = 1, value = .5, step = 0.01), sliderInput("delta", label = "Distance of reference line from target (delta)", min = 0, max = .1, value = .05, step = 0.005) ) renderPlotly({ n <- 1:input$N x <- runif(input$N) < input$p pn <- cumsum(x)/n datalist <- lapply(n,function(nn) data.frame(n=1:nn,pn=pn[1:nn],f=nn)) data <- dplyr::bind_rows(datalist) target <- input$p bounds <- input$p+c(-1,1)*input$delta fig <- ggplot(data,aes(x=n,y=pn, frame=f)) + geom_line() + xlab("Number of Trials") + ylab("Proportion Success") + geom_hline(aes(yintercept=target,col="target")) + geom_hline(aes(yintercept=bounds[1],col="bound")) + geom_hline(aes(yintercept=bounds[2],col="bound")) + labs(col="Target Lines") + scale_color_manual(values=c(target="blue",bound="skyblue")) ggplotly(fig) %>% animation_opts(frame=100,transition=0,redraw=FALSE) }) ``` ## Convergence of Distributions (Boot strap distribution) We can use the _Law of Large Numbers_ to prove an important theorem. As the sample size gets larger and larger, the sample looks more and more like the population it is drawn from. ![Proof](sign_turn_left.png) Technically, the _Law of Large Numbers_ refers to the result above. But we can use it so show a very important basis of statistics. Suppose we have some kind of distribution, $F(x)$, that generates numbers, $X$. Recall that the definition of $F(x)=\Pr(X \leq x)$. ![Proof](sign_turn_left.png) Draw a sample of size $N$ from this distribution. Now consider the sampled data points $X_1,\ldots,X_N$, and consider sampling a new value $Y$ from that distribution. Let $F_N(y) = \Pr(Y \leq y)$. This is sometimes called the _bootstrap distribution_. ![Proof](sign_turn_left.png) By the law of large numbers, for every $y$, as $N$ gets large $F_N(y) \rightarrow F(y)$. So the sample distribution $F_N()$ converges to the $F()$. ## Demonstration of convergence of distributions. 
## Demonstration of convergence of distributions

Pick a distribution:

* Normal -- standard normal
* Exponential -- highly skewed
* Gamma (shape = 3) -- skewed
* T (df = 3) -- high kurtosis

Then slide the sample size up and down, and notice how the empirical distribution function and histogram converge to the theoretical distribution function and density.

```{r DistConv, echo=FALSE}
nmax <- 1000
# Random-number (r), distribution-function (p), and density (d)
# generators for each choice of distribution.
rdist <- list(Normal = rnorm, Exponential = rexp,
              Gamma = function(n) rgamma(n, 3),
              "T" = function(n) rt(n, 3))
pdist <- list(Normal = pnorm, Exponential = pexp,
              Gamma = function(q) pgamma(q, 3),
              "T" = function(q) pt(q, 3))
ddist <- list(Normal = dnorm, Exponential = dexp,
              Gamma = function(x) dgamma(x, 3),
              "T" = function(x) dt(x, 3))
inputPanel(
  selectInput("dist", label = "Distribution Type",
              choices = c("Normal", "Exponential", "Gamma", "T"),
              selected = "Normal"),
  selectInput("NN", label = "Maximum Sample Size:",
              choices = c(50, 100, 200, 500, 1000), selected = 200)
)
cumdat <- reactive({
  NN <- as.numeric(input$NN)      # selectInput returns a string
  XX <- do.call(rdist[[input$dist]], list(NN))
  # One frame per sample size i: sorted data with empirical CDF heights.
  bind_rows(lapply(25:NN, function(i)
    data.frame(x = sort(XX[1:i]), Fn = (1:i)/i, f = i)))
})
renderPlotly({
  erfplot <- ggplot(cumdat(), aes(x, y = Fn, frame = f)) +
    geom_point() +
    stat_function(fun = pdist[[input$dist]], geom = "line", col = "red") +
    labs(title = "Actual vs Empirical Distribution Function")
  ggplotly(erfplot) %>% animation_opts(frame = 100)
})
renderPlotly({
  histplot <- ggplot(cumdat(), aes(x, frame = f)) +
    geom_histogram(aes(y = ..density..), binwidth = .25,
                   position = "identity") +
    stat_function(fun = ddist[[input$dist]], geom = "line", col = "red") +
    labs(title = "Actual vs Empirical Density Function")
  ggplotly(histplot) %>% animation_opts(frame = 100)
})
```

See also the [non-animated version](LawOfLargeNumbers.Rmd).
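If you want to experiment with the first demonstration outside of Shiny, here is a minimal console sketch of the same running-proportion calculation. The values mirror the app's default settings ($p = 0.5$, $N = 200$, $\delta = 0.05$); change them freely.

```{r LoLNconsole, eval=FALSE}
# Sketch of the first demonstration: the running proportion of
# successes p_N, with the target and delta reference lines.
set.seed(20190219)                   # arbitrary seed for reproducibility
p <- 0.5; N <- 200; delta <- 0.05    # app defaults
pn <- cumsum(runif(N) < p) / (1:N)   # p_N for sample sizes 1, 2, ..., N
plot(1:N, pn, type = "l",
     xlab = "Number of Trials", ylab = "Proportion Success")
abline(h = p, col = "blue")                         # target
abline(h = p + c(-1, 1) * delta, col = "skyblue")   # delta bounds
```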