---
title: "Independence"
author: "Russell Almond"
date: "10/5/2020"
output: html_document
runtime: shiny
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(shiny)
```
Imagine a population which is split into two groups: $A$ and $B$. We select 100 people at random and ask them a question, which has two answers `yes` and `no`. Define the following quantities:
* $N_{Ay}$ The number of people from Group $A$ who answered `yes`.
* $N_{An}$ The number of people from Group $A$ who answered `no`.
* $N_{By}$ The number of people from Group $B$ who answered `yes`.
* $N_{Bn}$ The number of people from Group $B$ who answered `no`.
Define the following values (row and column totals):
* $N_{A+}=N_{Ay}+N_{An}$ The number of people from Group $A$.
* $N_{B+}=N_{By}+N_{Bn}$ The number of people from Group $B$
* $N_{+y}=N_{Ay}+N_{By}$ The number of people who answered `yes`.
* $N_{+n}=N_{An}+N_{Bn}$ The number of people who answered `no`.
* $N=N_{A+}+N_{B+} = N_{+y} + N_{+n}$ The total number of people in the sample.
Dividing any of those numbers by $N_{xx}$ produces a corresponding proportion $P_{xx}$ (which can be interpreted as a probability or proportion.
Suppose group membership and the answer to the question are statistically indepedent. In the diagram below, adjust $P_{A+}$ and $P_{+y}$ to make a two-by-two table:
```{r independent, echo=FALSE}
N <- 100
inputPanel(
sliderInput("pa", label = "P(Member of Group A)",
min = 0, max = 1, value = .5, step = 0.05),
sliderInput("py", label = "P(Answered `yes`)",
min = 0, max = 1, value = .5, step = 0.05)
)
tabi <- reactive(
matrix(N*c(input$pa*input$py,
input$pa*(1-input$py),
(1-input$pa)*input$py,
(1-input$pa)*(1-input$py)),2,2,
byrow=TRUE,
dimnames=list(Group=c("A","B"),Answer=c("y","n")))
)
renderTable(tabi(),rownames=TRUE, digits=1)
renderPlot(mosaicplot(tabi()),color=TRUE,main="Independent data")
```
There are two things you should notice about the independent data.
* The proportion of `yes` and `no` answers in group A and B are always the same: $N_{Ay}/N_{A+} = N_{By}/N_{B+} = N_{+y}/N$
* The proportion of people in both Groups $A$ and $B$ are the same for people who answered both `yes` and `no`: $N_{Ay}/N_{+y} = N_{An}/N_{+n} = N_{A+}/N$
We could say that the row and column proportions are always the same.
Another way to think about this is to say:
* If we learned which group a person belongs to, that would not change the probability of their answer.
* If we learned how a person answered, that would not change the probablity of their group.
## Dependent
To make the table dependence, we need to add another parameter to the model to specify the degree of dependence.
For a two-by-two table, the odds ratio is as fairly easy to understand choice:
$$ OR = \frac{P_{Ay}/P_{An}}{P_{By}/P_{Bn}}$$
When group and answer are indpendent the cross product ratio should be 1.
If Group $A$ is more likely to answer `yes`, then the ratio should be bigger than 1.
If Group $B$ is more likely to answer `yes`, then the ratio should be less than one.
```{r dependent, echo=FALSE}
N <- 100
inputPanel(
sliderInput("pad", label = "P(Member of Group A)",
min = 0, max = 1, value = .5, step = 0.05),
sliderInput("pyd", label = "P(Answered `yes`)",
min = 0, max = 1, value = .5, step = 0.05),
selectInput("OR",label="Odds Ratio",
choices=c("1/4","1/3","1/2","2/3","1","3/2","2","3","4"), selected ="1")
)
tabd <- reactive({
pa <-input$pad
py <-input$pyd
OR <- eval(str2lang(input$OR))
cat(pa,py,OR,"\n")
if (OR == 1) {
pay <- pa*py
} else {
S <- sqrt((1+(pa+py)*(OR-1))^2 + 4*OR*(1-OR)*pa*py)
cat(S,"\n")
pay <- (1+(pa+py)*(OR-1)-S)/2/(OR-1)
cat(pay,"\n")
}
matrix(N*c(pay,(pa-pay),(py-pay),(1-py-pa+pay)),
2,2, byrow=TRUE,
dimnames=list(Group=c("A","B"),Answer=c("y","n")))
})
renderTable(tabd(),rownames=TRUE, digits=1)
renderPlot(mosaicplot(tabd()),color=TRUE,main="Dependent data")
```