---
title: "Reliability and Validity of Physics Playground"
author: 
  - Russell Almond, Florida State University
  - Jiawei Li
  - Zhichun (Lukas) Liu (now at UMass Dartmouth)
  - Seyedahmad Rahimi (UF, starting Summer 2021)
  - Chen Sun
  - Seyfullah Tingir, Cambium Assessment
date: "4/12/2021"
output: ioslides_presentation
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
library(psych)
library(effectsize)
statAB <- read.csv("statAB.csv")
BSEAP <- read.csv("BSnewEAP.csv")
```

# Physics Playground: The Game

## Gameplay Sketching

[Sketching Levels](https://youtu.be/VGZ_QyDpXD8)

![Physics Playground Sketching](UltimatePinball1.png)

Players draw objects (ramps, pendulums, levers, springboards) to guide the ball to the balloon.

## Gameplay Manipulation

[Manipulation Levels](https://youtu.be/TzDvJRFweKM)

![Physics Playground Manipulation](Tricks1.png)

Players manipulate physics parameters (gravity, ball mass, air resistance, bounciness) to guide the ball to the balloon.

## Four Process Architecture

* **Presentation** (Game Engine) -- Presents the game level and captures the _work product_ (a collection of events).

* **Evidence Identification** (Server) -- Processes the events from each game level to create an evidence set (observable values) for that level.

* **Evidence Accumulation** (Server) -- Uses Bayesian networks to estimate physics proficiencies from the evidence sets.

* **Activity Selection** (Game Engine) -- Picks next game level based on current ability estimates.

Note:  Events and evidence sets are stored in a database, so scoring can easily be rerun by marking evidence sets as "unprocessed" and reprocessing.
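
## Evidence Accumulation Sketch

A deliberately simplified sketch of one evidence-accumulation step, assuming a single proficiency node with states `High`, `Medium`, `Low` and one hypothetical observable (level solved or not). The uniform prior and the `pSolve` probabilities are made up for illustration; the actual engine updates full Bayes nets, not this one-node model.

```r
states <- c("High", "Medium", "Low")
## Uniform prior over one proficiency node (assumption for illustration).
prior <- setNames(rep(1/3, 3), states)
## Hypothetical Pr(level solved | state); not Physics Playground values.
pSolve <- setNames(c(0.8, 0.5, 0.2), states)
## Bayes rule for one observable: multiply by the likelihood, renormalize.
accumulate <- function(margin, solved) {
  lik  <- if (solved) pSolve else 1 - pSolve
  post <- margin * lik
  post / sum(post)
}
## Two solved levels in a row shift probability toward High.
post <- accumulate(accumulate(prior, TRUE), TRUE)
round(post, 3)   # High 0.688, Medium 0.269, Low 0.043
```

Each evidence set plays the role of `solved` here; processing them one at a time is what makes it possible to rescore by reprocessing a chosen subset.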


## Physics Playground Model


![Proficiency Model](PP_OrangeNodes_6.png)

* 4 high-level proficiencies (salmon)
* 9 low-level proficiencies (orange)

Each node has states `High`, `Medium`, and `Low`.

## Physics Playground Q-Matrix

There is a collection of evidence models:  one for each game level.

These are represented through an augmented Q-matrix:

[online sheet](https://docs.google.com/spreadsheets/d/16LcEuCspZjiBoZ3-Y1R3jxi1COXmh9vuTa9GwH1A_7Q/)

[Peanut package](https://pluto.coe.fsu.edu/RNetica/Peanut.html) has code for building Bayes nets from the spreadsheet.
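
## Q-Matrix Sketch

A minimal sketch of a plain (not augmented) Q-matrix as an R matrix: rows are game levels, columns are proficiency nodes, and a 1 means that level's evidence model taps that node. The level names and entries are hypothetical placeholders, and the four high-level node names are used as columns only for brevity; the real sheet linked above carries additional columns that are not shown here.

```r
## Hypothetical Q-matrix; not the real Physics Playground entries.
Q <- matrix(c(1, 0, 1, 0,
              0, 1, 0, 0,
              1, 0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("LevelA", "LevelB", "LevelC"),
                            c("ForceAndMotion", "LinearMomentum",
                              "Energy", "Torque")))
## For each level, list the proficiency nodes its evidence model uses.
parents <- setNames(lapply(rownames(Q),
                           function(lv) colnames(Q)[Q[lv, ] == 1]),
                    rownames(Q))
parents$LevelA   # "ForceAndMotion" "Energy"
```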

## Physics Playground Scores

* _Margin_ -- Probabilities for `High`, `Medium`, and `Low` (vector valued).
  
* _Mode_ -- The one of `High`, `Medium`, or `Low` with the highest probability.
  
* _EAP (Expected A Posteriori)_ -- Assign the value +1 to `High`, 0 to `Medium`, and -1 to `Low`, then calculate the expected value.
  This can also be expressed as $\Pr(High) - \Pr(Low)$.  The score runs
  from +1 (high) to -1 (low).  (In the implementation, .97 and -.97
  were used, so the actual scores never quite reach +1 or -1.)
  
[EABN Engine](https://pluto.coe.fsu.edu/Proc4) can output each of these score types for each node.
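
## Computing the Scores (Sketch)

A minimal sketch of how the Mode and EAP scores follow from a single node's margin. The margin values are made up, and `scoreMode` and `scoreEAP` are hypothetical helper names, not EABN Engine functions. The last line assumes the .97 and -.97 mentioned above are the values assigned to `High` and `Low`.

```r
## Margin for one node, in the order High, Medium, Low (made-up values).
margin <- c(High = 0.6, Medium = 0.3, Low = 0.1)
## Mode: the state with the highest probability.
scoreMode <- function(margin) names(margin)[which.max(margin)]
## EAP: expected value of the state scores; `values` must follow the
## same High, Medium, Low order as `margin`.
scoreEAP <- function(margin, values = c(1, 0, -1)) sum(margin * values)
scoreMode(margin)                    # "High"
scoreEAP(margin)                     # 0.5, i.e. Pr(High) - Pr(Low)
scoreEAP(margin, c(0.97, 0, -0.97))  # 0.485
```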

# Reliability

## Split Halves

* "Items" are game levels:  each game level must be kept intact (all of its evidence goes to the same form).

* Pair game levels on:

  - Primary and Secondary Skills
  - Difficulty
  - Other features
  
* Randomly assign one level of each pair to _Form A_ and the other to _Form B_.
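
## Split-Half Assignment (Sketch)

A minimal sketch of the random assignment step, assuming the matched pairs have already been chosen (pairing on skills, difficulty, and other features is a judgment made from the Q-matrix, not something this code does). The level names are hypothetical placeholders.

```r
set.seed(2021)  # reproducible coin flips
## Hypothetical matched pairs of game levels (placeholder names).
pairs <- data.frame(first  = c("LevelA", "LevelC", "LevelE"),
                    second = c("LevelB", "LevelD", "LevelF"),
                    stringsAsFactors = FALSE)
## One coin flip per pair: TRUE sends `first` to Form A, FALSE sends `second`.
toA   <- runif(nrow(pairs)) < 0.5
formA <- ifelse(toA, pairs$first, pairs$second)
formB <- ifelse(toA, pairs$second, pairs$first)
```

Because the flip is made within each pair, both forms always contain exactly one member of every pair.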

## Rescoring Logic

* Mark observable sets from _Form A_ levels as unscored.  Rescore to get _Form A_ scores.

* Mark observable sets from _Form B_ levels as unscored.  Rescore to get _Form B_ scores.
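
## Rescoring Bookkeeping (Sketch)

A minimal sketch of the bookkeeping, using a hypothetical in-memory table of evidence sets in place of the database. The `processed` column, the `markForm` helper, and the level-to-form lookup are all illustrative assumptions, and the sketch assumes each rescoring run starts from a fresh student model so that only the unprocessed sets contribute.

```r
## Hypothetical evidence-set table: one row per (student, game level).
evidence <- data.frame(uid   = c("s1", "s1", "s1", "s2", "s2"),
                       level = c("LevelA", "LevelB", "LevelC", "LevelA", "LevelB"),
                       stringsAsFactors = FALSE)
## Hypothetical form assignment from the split-half step.
formOf <- c(LevelA = "A", LevelB = "B", LevelC = "A", LevelD = "B")
## Flag one form's evidence sets as unprocessed (FALSE); a rescoring run
## would then accumulate just those sets for each student.
markForm <- function(ev, form) {
  ev$processed <- unname(formOf[ev$level] != form)
  ev
}
evA <- markForm(evidence, "A")   # Form A sets flagged for rescoring
evB <- markForm(evidence, "B")   # Form B sets flagged for rescoring
```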

## Reliabilities:  EAP correlations

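EAP scores are continuous, so ordinary (Pearson) correlations between the Form A and Form B scores can be used.

Optionally, apply the Spearman-Brown correction to estimate the reliability of the full-length game (both forms combined).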

```{r Reliability, echo=FALSE}
corlist <- data.frame(
  Measure=c("Physics","Force and Motion","Linear Momentum",
            "Energy", "Torque"),
  Reliability=c(
    cor(statAB$Physics_EAPA,statAB$Physics_EAPB),
    cor(statAB$ForceAndMotion_EAPA,statAB$ForceAndMotion_EAPB),
    cor(statAB$LinearMomentum_EAPA,statAB$LinearMomentum_EAPB),
    cor(statAB$Energy_EAPA,statAB$Energy_EAPB),
    cor(statAB$Torque_EAPA,statAB$Torque_EAPB)))


knitr::kable(corlist,digits=3,caption="Correlations between Form A and B sub-forms.")

```
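
## Spearman-Brown Correction (Sketch)

Each sub-form is only about half the game, so the Form A/Form B correlation $r_{AB}$ understates the reliability of the full set of levels. The Spearman-Brown prophecy formula gives the predicted full-length reliability $\rho^{*} = \frac{2 r_{AB}}{1 + r_{AB}}$. A minimal base-R sketch (the function name `spearmanBrown` is hypothetical):

```r
## Spearman-Brown prophecy formula: predicted reliability when the test
## is lengthened by a factor k (k = 2 turns a half-test correlation
## into a full-length estimate).
spearmanBrown <- function(r, k = 2) k * r / (1 + (k - 1) * r)
spearmanBrown(0.5)   # 0.6666667
spearmanBrown(0.6)   # 0.75
```

Applied to the table above, this would be `spearmanBrown(corlist$Reliability)`. The correction assumes the two forms are parallel, which the matched-pair design is meant to approximate.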


## Scatterplot (Higher Reliability)

```{r Energy, echo=FALSE, message=FALSE, fig.cap="Energy EAP score consistency"}
ggplot(statAB) + geom_point(aes(Energy_EAPA,Energy_EAPB)) + 
  geom_smooth(method="lm",aes(Energy_EAPA,Energy_EAPB)) + 
  labs(x="EAP(Energy) -- Form A",
       y="EAP(Energy) -- Form B")
```

## Scatterplot (Lower Reliability)

```{r Momentum, echo=FALSE, message=FALSE, fig.cap="Linear Momentum EAP score consistency"}
ggplot(statAB) + geom_point(aes(LinearMomentum_EAPA,LinearMomentum_EAPB)) + 
  geom_smooth(method="lm",aes(LinearMomentum_EAPA,LinearMomentum_EAPB)) + 
  labs(x="EAP(Momentum) -- Form A",
       y="EAP(Momentum) -- Form B")
```


## Reliabilities:  Modal Kappas

```{r ModalKappas, echo=TRUE}
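lev <- c("High","Medium","Low")
## Fix the level order so the table is square and ordered High/Medium/Low
modalTab <- table(factor(statAB$Physics_ModeA,levels=lev),
                  factor(statAB$Physics_ModeB,levels=lev))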
```
```{r KappaOutput,echo=FALSE}
knitr::kable(modalTab,caption="Agreement of Physics Modal Scores.")
cohen.kappa(modalTab)
```

The `cohen.kappa` function is from the `psych` package.
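
## Cohen's Kappa by Hand (Sketch)

To show what is being computed, here is a minimal base-R sketch of unweighted Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed proportion of agreement and $p_e$ the agreement expected from the row and column totals. The 3 by 3 table is made up for illustration, not field-trial data, and `kappaFromTable` is a hypothetical name. Its result should match the unweighted estimate that `cohen.kappa` reports for the same table.

```r
## Unweighted Cohen's kappa from a square agreement table.
kappaFromTable <- function(tab) {
  tab <- tab / sum(tab)                       # counts -> proportions
  po  <- sum(diag(tab))                       # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab))     # chance agreement
  (po - pe) / (1 - pe)
}
## Made-up Form A (rows) by Form B (columns) table: High, Medium, Low.
tab <- matrix(c(20,  5,  0,
                 4, 30,  6,
                 1,  4, 30), nrow = 3, byrow = TRUE)
kappaFromTable(tab)   # about 0.695
```

Because it works on proportions, the same function also applies to a non-integer expected-agreement table.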


## Expected Kappas (1)

Get each student's "expected agreement" table by multiplying the Form A and Form B margin scores together (an outer product). The first student's margins:

```{r Margins,echo=TRUE}
statAB[1,paste("Physics_Margin.",c("High","Medium","Low"),"A",sep="")]
statAB[1,paste("Physics_Margin.",c("High","Medium","Low"),"B",sep="")]
```

## Expected Kappas (2)

```{r ExpectedAgreement1,echo=TRUE}
outer(
  as.numeric(statAB[1,paste("Physics_Margin.",c("High","Medium","Low"),"A",sep="")]),
  as.numeric(statAB[1,paste("Physics_Margin.",c("High","Medium","Low"),"B",sep="")])
)
```

## Expected Kappas (3)

Now sum across students:

```{r ExpectedAgreement,echo=TRUE}
states <- c("High","Medium","Low")
margA <- as.matrix(statAB[,paste("Physics_Margin.",states,"A",sep="")])
margB <- as.matrix(statAB[,paste("Physics_Margin.",states,"B",sep="")])
## Start from zeros so a missing first row cannot make the whole table NA
expTab <- matrix(0, 3, 3, dimnames=list(states,states))
for (irow in seq_len(nrow(statAB))) {
  itab <- outer(margA[irow,], margB[irow,])
  ## Skip students with missing margins on either form
  if (all(!is.na(itab)))
    expTab <- expTab + itab
}
```

## Expected Kappas (4)

```{r EKappaOutput,echo=FALSE}
knitr::kable(expTab,digits=3,caption="Expected agreement of Physics marginal scores.")
cohen.kappa(expTab)
```
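
## Expected Table as a Matrix Product (Sketch)

The per-student loop on the Expected Kappas (3) slide adds up outer products of margin vectors. Stacking the margins into student-by-state matrices $A$ and $B$ gives the same table in one step, since $\sum_i a_i b_i^\top = A^\top B$. A minimal sketch with three made-up students, assuming complete margins (rows with missing values would need to be dropped first, e.g. with `complete.cases`):

```r
## Made-up margins for three students (rows); columns are High, Medium, Low.
margA <- rbind(c(.7, .2, .1), c(.1, .6, .3), c(.2, .3, .5))
margB <- rbind(c(.6, .3, .1), c(.2, .5, .3), c(.1, .3, .6))
## Loop version: add up each student's outer product.
loopTab <- Reduce(`+`, lapply(seq_len(nrow(margA)),
                              function(i) outer(margA[i, ], margB[i, ])))
## Matrix version: t(margA) %*% margB in a single call.
crossTab <- crossprod(margA, margB)
all.equal(loopTab, crossTab)   # TRUE
```

Since each margin sums to 1, the entries of the expected table sum to the number of students.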


# Validity

## Correlations with external measures

The Fall 2019 field trial included a pretest and a posttest.

Items were coded to subscales matching the 4 high-level (salmon) nodes.

```{r Cormat, echo=FALSE}

knitr::kable(cor(select(BSEAP,c("preScore","postScore","Physics_EAP"))),
             digits=3,caption="Correlation of Physics EAP score with whole pretest and posttest.",booktabs=TRUE)

```

## Subscales

Because the subscale measures are short, the pretest and posttest subscale scores are summed.

```{r Validity, echo=FALSE}
corlist$Validity <- c(
  cor(BSEAP$preScore+BSEAP$postScore,BSEAP$Physics_EAP),
  cor(BSEAP$ForceAndMotion_EAP,BSEAP$prepostNFM),
  cor(BSEAP$LinearMomentum_EAP,BSEAP$prepostMomentum),
  cor(BSEAP$Energy_EAP,BSEAP$prepostEnergy),
  cor(BSEAP$Torque_EAP,BSEAP$prepostTorque)
)
knitr::kable(corlist,digits=3,
             caption="Reliability (sub-form correlations) and Validity (correlation with pretest + posttest)",booktabs=TRUE)

```

## Eta Statistic

The $\eta$ (or $\eta^2$) statistic shows how much of the variability in the external (pretest or posttest) score is explained by the modal Physics classification.

```{r EtaOverall, echo=TRUE}
## aov() needs the grouping variable to be a factor.
BSEAP$Physics_Mode <- factor(BSEAP$Physics_Mode)
## Pretest
preaov <- aov(preScore ~ Physics_Mode,data=BSEAP)
cat("Pretest eta:", round(
  sqrt(eta_squared(preaov, partial=FALSE)$Eta_Sq),
 digits=2),"\n")
```


```{r EtaPost, echo=FALSE}
## Posttest
postaov <- aov(postScore ~ Physics_Mode,data=BSEAP)
cat("Posttest eta:", round(
  sqrt(eta_squared(postaov, partial=FALSE)$Eta_Sq),
 digits=2),"\n")
  
```

The `eta_squared` function is from the `effectsize` package.
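
## Eta Squared by Hand (Sketch)

For a one-way ANOVA, $\eta^2 = SS_{between}/SS_{total}$, which for a single factor is the quantity `eta_squared(..., partial=FALSE)` reports. A minimal base-R sketch on made-up scores for three modal classes (illustrative numbers only, not field-trial data):

```r
## Made-up scores for students in three modal classes.
dat <- data.frame(score = c(3, 4, 5, 6, 7, 8, 9, 10, 11),
                  mode  = factor(rep(c("Low", "Medium", "High"), each = 3)))
grand <- mean(dat$score)
## ave() replaces each score with the mean of its class.
classMean <- ave(dat$score, dat$mode)
ssBetween <- sum((classMean - grand)^2)   # variability explained by class
ssTotal   <- sum((dat$score - grand)^2)   # total variability
eta2 <- ssBetween / ssTotal   # 0.9
eta  <- sqrt(eta2)
```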


## Discussion

These techniques work with any cognitive diagnostic model (CDM) that provides probability distributions over mastery states.

Evidence-centered design (ECD) models make it much easier to build matched forms for split halves.

These calculations can be used in formative analysis.

The same analysis can be run on simulated data to examine candidate form designs.

## Resources

* Physics Playground:  https://pluto.coe.fsu.edu/ppteam/

* RNetica and Peanut (Bayes net tools):  https://pluto.coe.fsu.edu/RNetica

* Four Process Scoring Tools:  https://pluto.coe.fsu.edu/Proc4

* Russell's Github page:  https://github.com/ralmond