---
title: "Independence"
author: "Russell Almond"
date: "10/5/2020"
output: html_document
runtime: shiny
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(shiny)
```

Imagine a population which is split into two groups:  $A$ and $B$.  We select 100 people at random and ask them a question, which has two answers `yes` and `no`.  Define the following quantities:

* $N_{Ay}$  The number of people from Group $A$ who answered `yes`.

* $N_{An}$  The number of people from Group $A$ who answered `no`.

* $N_{By}$  The number of people from Group $B$ who answered `yes`.

* $N_{Bn}$  The number of people from Group $B$ who answered `no`.

Define the following values (row and column totals):

* $N_{A+}=N_{Ay}+N_{An}$  The number of people from Group $A$.

* $N_{B+}=N_{By}+N_{Bn}$  The number of people from Group $B$

* $N_{+y}=N_{Ay}+N_{By}$  The number of people who answered `yes`.

* $N_{+n}=N_{An}+N_{Bn}$  The number of people who answered `no`.

* $N=N_{A+}+N_{B+} = N_{+y} + N_{+n}$ The total number of people in the sample.

Dividing any of those numbers by $N_{xx}$ produces a corresponding proportion $P_{xx}$ (which can be interpreted as a probability or proportion.

Suppose group membership and the answer to the question are statistically indepedent.  In the diagram below, adjust $P_{A+}$ and $P_{+y}$ to make a two-by-two table:

```{r independent, echo=FALSE}
N <- 100
inputPanel(
  sliderInput("pa", label = "P(Member of Group A)",
              min = 0, max = 1, value = .5, step = 0.05),
  sliderInput("py", label = "P(Answered `yes`)",
              min = 0, max = 1, value = .5, step = 0.05)
)

tabi <- reactive(
  matrix(N*c(input$pa*input$py,
           input$pa*(1-input$py),
           (1-input$pa)*input$py,
           (1-input$pa)*(1-input$py)),2,2,
         byrow=TRUE,
         dimnames=list(Group=c("A","B"),Answer=c("y","n")))
)
renderTable(tabi(),rownames=TRUE, digits=1)
renderPlot(mosaicplot(tabi()),color=TRUE,main="Independent data")
```


There are two things you should notice about the independent data.

* The proportion of `yes` and `no` answers in group A and B are always the same:  $N_{Ay}/N_{A+} = N_{By}/N_{B+} = N_{+y}/N$

* The proportion of people in both Groups $A$ and $B$ are the same for people who answered both `yes`  and `no`:  $N_{Ay}/N_{+y} = N_{An}/N_{+n} = N_{A+}/N$

We could say that the row and column proportions are always the same.

Another way to think about this is to say:
* If we learned which group a person belongs to, that would not change the probability of their answer.
* If we learned how a person answered, that would not change the probablity of their group.

## Dependent

To make the table dependence, we need to add another parameter to the model to specify the degree of dependence.  

For a two-by-two table, the odds ratio is as fairly easy to understand choice:
$$ OR = \frac{P_{Ay}/P_{An}}{P_{By}/P_{Bn}}$$
When group and answer are indpendent the cross product ratio should be 1.  

If Group $A$ is more likely to answer `yes`, then the ratio should be bigger than 1.  

If Group $B$ is more likely to answer `yes`, then the ratio should be less than one.

```{r dependent, echo=FALSE}
N <- 100
inputPanel(
  sliderInput("pad", label = "P(Member of Group A)",
              min = 0, max = 1, value = .5, step = 0.05),
  sliderInput("pyd", label = "P(Answered `yes`)",
              min = 0, max = 1, value = .5, step = 0.05),
  selectInput("OR",label="Odds Ratio",
              choices=c("1/4","1/3","1/2","2/3","1","3/2","2","3","4"), selected ="1")
)

tabd <- reactive({
  pa <-input$pad
  py <-input$pyd
  OR <- eval(str2lang(input$OR))
  cat(pa,py,OR,"\n")
  if (OR == 1) {
    pay <- pa*py
  } else {
    S <- sqrt((1+(pa+py)*(OR-1))^2 + 4*OR*(1-OR)*pa*py)
    cat(S,"\n")
    pay <- (1+(pa+py)*(OR-1)-S)/2/(OR-1)
    cat(pay,"\n")
  }
  matrix(N*c(pay,(pa-pay),(py-pay),(1-py-pa+pay)),
        2,2, byrow=TRUE,
        dimnames=list(Group=c("A","B"),Answer=c("y","n")))
})
renderTable(tabd(),rownames=TRUE, digits=1)
renderPlot(mosaicplot(tabd()),color=TRUE,main="Dependent data")
```