---
title: "Covid-19 Vaccines"
author: "Russell Almond"
date: "11/18/2020"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(kableExtra)
```

Good News!  In November, both Pfizer and Moderna announce Phase 3 Vaccine Trials with promising results.

With over 30,000 participants in each study they reported the following data.

```{r data}
covidVaccines <- tibble(
  Treatment=c("Placebo","Vaccine"),
  Pfizer=c(90,5),ModernaAll=c(95,5),
  ModernaSevere=c(11,0),
  N=c(15000,15000))
covidVaccines.N <- 15000
kable(covidVaccines)
```
## Measures of Effectiveness

We start with a cross-tab

Treatment | Sick | Healthy | Total
--------- | ---- | ------- | -----
Placebo   |  SP  |   HP    |  NP
Vaccine   |  SV  |   HV    |  NV
Total     |  NS  |   NH    |  N

Odds of getting sick

Placebo:  $ SP/HP $
Vaccine:  $ SV/HV $


### Cross Product (Odds) Ratio

$$ OR = \frac{SP/HP}{SV/HV} $$
How much does your odds of getting sick increase if you get the placebo instead of the vaccine.

### Risk Ratio

$$ RR = \frac{SP/NP}{SV/NV} $$
How much does your probiliby of getting sick increase if you get the placebo instead of the vaccine.

### Vaccine Effectiveness

$$ VE = 100 (1 - \frac{1}{RR}) $$

### Chi-square test

Null hypothesis is that getting the disease is independent of the vaccine.  In other words, $OR=RR=1$.

$$ SV/NV = SP/NP $$

Large chi-squared value incidates that cross product rate is not 1 (but doesn't tell if placebo or vaccine is better!

### Z-score test

Another way to work with these data is to calculate probabilities of infection for each group and the standard errors.  Then can use the $z$-test to compare.

$$p_V = p(S|V) = SV/NV \qquad SE(p_V) = \sqrt{p_V(1-p_V)/NV} $$
$$p_P = p(S|P) = SP/NP \qquad SE(p_P) = \sqrt{p_P(1-p_P)/NP} $$
$$ z = \frac{p_P-p_V}{\sqrt{SE(p_V)^2 + SE(p_P)^2}}$$

## Pfizer Vaccine

There were around 30,000 volunteers in the Phase 3 trials; 15,000 in each arm.

```{r}

covidVaccines %>% 
  mutate(p.Pfizer=Pfizer/N) %>%
  mutate(s.Pfizer=sqrt(p.Pfizer*(1-p.Pfizer)/N)) ->
  covidVaccines

select(covidVaccines,Treatment,contains("Pfizer")) %>% kable(digits=c(4,5))
              

```
```{r}
ggplot(covidVaccines,aes(x=Treatment,y=p.Pfizer,ymin=p.Pfizer-2*s.Pfizer,ymax=p.Pfizer+2*s.Pfizer)) + geom_pointrange()
```
### X^2 -- Pfizer

In SPSS this is done by producing a cross-tab.  We don't have the number of negative cases in each arm of the study, but up to rounding error it is just the sample size, so we will use that.

![Pfizer Cross Tab](CovidPfizerCrossTab.png)

![Phizer X2](CovidPfizerChi2.png)
```{r}
chisq.test(as.matrix(select(covidVaccines,Pfizer,N)))
```

```{r}
p <- pull(covidVaccines,p.Pfizer)
s <- pull(covidVaccines,s.Pfizer)

z <- (p[1]-p[2])/sqrt(sum(s^2))
pz <- 1-pnorm(z)
cat("Z = ",round(z,2), "p = ",round(pz,3),"\n")

```


![Pfizer Risk Ratio](CovidPfizerRisk.png)

```{r}
p <- pull(covidVaccines,p.Pfizer)
RR <- p[1]/p[2]
VE <- 100*(1-1/RR)
cat("Risk Ratio: ",round(RR,2),
    "Vaccine Effectiveness: ",round(VE,1),"\n")

```


## Moderna Vaccine -- All Cases


There were around 30,000 volunteers in the Phase 3 trials; 15,000 in each arm.

```{r}

covidVaccines %>% 
  mutate(p.ModernaAll=ModernaAll/N) %>%
  mutate(s.ModernaAll=sqrt(p.ModernaAll*(1-p.ModernaAll)/N)) ->
  covidVaccines

select(covidVaccines,Treatment,contains("ModernaAll")) %>% kable(digits=c(4,5))
              

```
```{r}
ggplot(covidVaccines,aes(x=Treatment,y=p.ModernaAll,ymin=p.ModernaAll-2*s.ModernaAll,ymax=p.ModernaAll+2*s.ModernaAll)) + geom_pointrange()
```
### X^2 -- Moderna (All Cases)

In SPSS this is done by producing a cross-tab.  We don't have the number of negative cases in each arm of the study, but up to rounding error it is just the sample size, so we will use that.

![Moderna All Cases Cross Tab](CovidModernaAllCrossTab.png)

![Moderna All Cases X2](CovidModernaAllChi2.png)
```{r}
chisq.test(as.matrix(select(covidVaccines,ModernaAll,N)))
```

```{r}
p <- pull(covidVaccines,p.ModernaAll)
s <- pull(covidVaccines,s.ModernaAll)

z <- (p[1]-p[2])/sqrt(sum(s^2))
pz <- 1-pnorm(z)
cat("Z = ",round(z,2), "p = ",round(pz,3),"\n")

```


![Moderna All Cases Risk Ratio](CovidModernaAllRisk.png)

```{r}
p <- pull(covidVaccines,p.ModernaAll)
RR <- p[1]/p[2]
VE <- 100*(1-1/RR)
cat("Risk Ratio: ",round(RR,2),
    "Vaccine Effectiveness: ",round(VE,1),"\n")

```

## Moderna Vaccine -- Severe Cases


```{r}

covidVaccines %>% 
  mutate(p.ModernaSevere=ModernaSevere/N) %>%
  mutate(s.ModernaSevere=sqrt(p.ModernaSevere*(1-p.ModernaSevere)/N)) ->
  covidVaccines

select(covidVaccines,Treatment,contains("ModernaSevere")) %>% kable(digits=c(4,5))
              

```
```{r}
ggplot(covidVaccines,aes(x=Treatment,y=p.ModernaSevere,ymin=p.ModernaSevere-2*s.ModernaSevere,ymax=p.ModernaSevere+2*s.ModernaSevere)) + geom_pointrange()
```
### X^2 -- Moderna (All Cases)

In SPSS this is done by producing a cross-tab.  We don't have the number of negative cases in each arm of the study, but up to rounding error it is just the sample size, so we will use that.

![Moderna Severe Cases Cross Tab](CovidModernaSevereCrossTab.png)

![Moderna Severe Cases X2](CovidModernaSevereChi2.png)
```{r}
chisq.test(as.matrix(select(covidVaccines,ModernaSevere,N)))
```

```{r}
p <- pull(covidVaccines,p.ModernaSevere)
s <- pull(covidVaccines,s.ModernaSevere)

z <- (p[1]-p[2])/sqrt(sum(s^2))
pz <- 1-pnorm(z)
cat("Z = ",round(z,2), "p = ",round(pz,3),"\n")

```

```{r}
p <- pull(covidVaccines,p.ModernaSevere)
RR <- p[1]/p[2]
VE <- 100*(1-1/RR)
cat("Risk Ratio: ",round(RR,2),
    "Vaccine Effectiveness: ",round(VE,1),"\n")

```

Yikes!  The estimate for the chances of getting Severe Covid-19 with
the virus is 0.  Divide by zero error!

But probability zero means impossible.  That is not right!

### Continuity Correction

Fix this by adding a conditinuity correction.  We add 1/2 to all of
the entries in the table.  

In particular, this makes the estimated
rate for getting severe COVID-19 $\frac{1}{2}/(N+1)$.

```{r}

covidVaccines %>% 
  mutate(p.ModernaSevere=(ModernaSevere+.5)/(N+1)) %>%
  mutate(s.ModernaSevere=sqrt(p.ModernaSevere*(1-p.ModernaSevere)/(N+1))) ->
  covidVaccines

select(covidVaccines,Treatment,contains("ModernaSevere")) %>% kable(digits=c(5,5))
              

```
```{r}
ggplot(covidVaccines,aes(x=Treatment,y=p.ModernaSevere,ymin=p.ModernaSevere-2*s.ModernaSevere,ymax=p.ModernaSevere+2*s.ModernaSevere)) + geom_pointrange()
```
```{r}
p <- pull(covidVaccines,p.ModernaSevere)
RR <- p[1]/p[2]
VE <- 100*(1-1/RR)
cat("Risk Ratio: ",round(RR,2),
    "Vaccine Effectiveness: ",round(VE,1),"\n")

```


## References:

* StatNews article on Pfizer vaccine:
https://www.statnews.com/2020/11/09/covid-19-vaccine-from-pfizer-and-biontech-is-strongly-effective-early-data-from-large-trial-indicate/

* Official Protocol document from Pfizer:
https://www.pfizer.com/science/coronavirus

* Pfizer Press Release:
https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-vaccine-candidate-against

* Moderna Press Release:
https://investors.modernatx.com/news-releases/news-release-details/modernas-covid-19-vaccine-candidate-meets-its-primary-efficacy

* Entries from Andrew Gelman's Blog:
https://statmodeling.stat.columbia.edu/2020/11/16/estimating-efficacy-of-the-vaccine-from-95-true-infections/

https://statmodeling.stat.columbia.edu/2020/11/11/the-pfizer-biontech-vaccine-may-be-a-lot-more-effective-than-you-think/

* _How to use SPSS to obtain Odd Ratio and Relative Risk_  http://brahms.emu.edu.tr/icetin/spss8-RR-OR.pdf