--- title: "RNetica Basics" author: "Russell Almond" date: "March 18, 2019" output: pdf_document: default ioslides_presentation: default slidy_presentation: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ## Four Packages ![Peanut Package Dependencies](../figures/PnetPackages.png) - RNetica -- Low-level R/Netica Interface - CPTtools -- Design patterns for Conditional Probability Tables * Independent of other packages - Peanut -- Object-oriented parameterized Bayesian Networks * Implementation independent * Uses CPTtools - PNetica -- RNetica implementation of Peanut protocols https://pluto.coe.fsu.edu/RNetica ## RNetica License Agreements - R: GPL-3 (Free and open source) - RNetica, CPTtools, Peanut, PNetica: Artistic-2.0 (Free and open source) - Netica.dll/libNetica.so: Commercial (openAPI, but not open source) * Free Student/Demo Version + Limited number of nodes. + Limited Usage (education, evaluation of Netica) * Paid version (see http://www.norsys.com/ for price information). + Need to purchase API, not GUI verison of Netica for RNetica + May want GUI for network visualization. * Netica.dll/libNetica.so included with binary distributions by permission of Norsys + License key still must be purchased. ## Installing non-CRAN packages - Windows and MacOS X: Download object package from pluto * Windows: RNetica-0.5-4.zip * MacOS: RNetica-0.5-5.tgz - RStudio: Use Tools> Install Packages ..., select "Install From" = "Package Archive File" - install.packages("RNetica-0.5-4.zip", repos=NULL) - Source Installation: * Download Source tarball: RNetica-0.5-4.tar.gz * Download and unpack Netica C API from https://norsys.com/ * From Command Line: ` R CMD INSTALL RNetica --configure-args='--with-netica=/path/to/Netica_API_504' ` * Windows, need to set a NETICA_HOME environmental variable, see INSTALLATION FILE. - Now load RNetica ```{r library} library(RNetica) ``` ## Netica Session: R and Netica Memory ![Netica Session and Netica Heap](../figures/RNetica-Netica1.png) - R and Netica have two different workspaces (memory heaps) - R workspace is saved and restored automatically when you quick and restart R. - Netica heap must be reconnected manually. - Netica Session object + Provides link to Netica Memory Space + Is a container for all `NeticaNetwork` objects + Must be reconnected every R session, using `startSession()` ## Installing a License Key - When you purchase a license, Norsys will send you a license key. Something that looks like: “+Course/FloridaSU/Ex15-05-30,120,310/XXXXX” (Where I’ve obscured the last 5 security digits) - To install the license key, start R in your project directory and type: ```{r licenseKey, echo=TRUE} #DefaultNeticaSession <- #NeticaSession(LicenseKey="+AlmondR/Tutorial/Ex19-06-01,120,310/XXXXX") #q("yes") ``` Restart R and type ```{r restart, echo=TRUE} #library(RNetica) #startSession(DefaultNeticaSession) ``` + If license key is not installed, then you will get the limited/student mode. Most of these examples will run ## When to use the session object. * When starting/restarting Netica * When creating a network, or reading one from a file. * When searching for networks. * Certain global properties `NeticaNetwork` objects have a `$session` proprty which points back to the session. `NeticaNode` objects have a `$node` property which points back to the network (which points to the session). ## Active and Inactive Pointers ![Netica Session and Netica Heap](../figures/RNetica-Netica1.png) + When RNetica creates/finds a Netica object it creates a corresponding R object + R NeticaBN objects live in the NeticaSession object. - R NeticaNode objects live in the NeticaBN. + If the pointer gets broken (saving & restarting R, deleting the network/node) then the R object becomes inactive. + The function `is.active()` tests to see if the node/net/session is active ## A simple example A subset of the ACED (http://ecd.ralmond.net/ecdwiki/ACED/ACED/) network. ![ACED Motif 1](../figures/ACEDMotif1.png) ## Starting The Netica Interface Load Libraries ```{r loadlib, echo=TRUE} library(RNetica) library(CPTtools) ``` Create a session object and start it. Session object can be reused across workspaces, but it needs to be started again everytime R is connected. ```{r start Session,echo=TRUE} sess <- NeticaSession(LicenseKey="+AlmondR/Tutorial/Ex19-06-01,120,310/864") # Could add License Key as argument startSession(sess) ``` Must start a session every time R is started. As a fallback, RNetica, will look for an object called `DefaultNeticaSession` and try to start that. ## Reloading Nets and Nodes Note that load networks needs a session as an argument (as networks are created within a session). Networks must be reloaded every time R is started. Note also, that the nodes must be reconnected to R objects. `NetworkAllNodes()` is the easiest way of doing this. ```{r load motif1, echo=TRUE} ## Read in Network -- Do this every time R is started. ##DeleteNetwork(motif1) motif1 <- ReadNetworks("../Nets/ACEDMotif1.dne",session = sess) ## Reconnect nodes -- Do this every time R is restarted m1.nodes <- NetworkAllNodes(motif1) m1.sgp <- m1.nodes$SolveGeometricProblems m1.obs <- NetworkNodesInSet(motif1,"Observables") m1.obs ``` ## Node Sets Netica defines a node set functionality which Adds a collection of labels (sets) to each node Defines a collection of nodes with that label Netica GUI really only offers the opportunity to color nodes by set RNetica can loop over node sets (lists of nodes) ```{r node sets, echo=TRUE} ## Node Sets NetworkNodeSets(motif1) NetworkNodesInSet(motif1,"Proficiencies") NodeSets(m1.sgp) ## These are all settable NodeSets(m1.sgp) <- c(NodeSets(m1.sgp),"HighLevel") NodeSets(m1.sgp) ``` ## RNetica Functions ```{r queries, echo=TRUE} ## Querying Nodes NodeStates(m1.sgp) #List states NodeParents(m1.sgp) #List parents NodeLevels(m1.sgp) #List numeric values associated with states NodeProbs(m1.sgp) # Conditional Probability Table (as array) ``` Can use the [] notation for viewing, setting CPTs ```{r CPTs, echo=TRUE} m1.sgp[] # Conditional Probability Table (as data frame) ## These are all settable (can be used on RHS of <-) for ## model construction motif1$nodes$X3Medium[] ``` ## Inference Before doing inference, must compile the network. This is the lightning bolt icon on the GUI. ```{r compile, echo=TRUE} CompileNetwork(motif1) #Lightning bolt on GUI ## Must do this before inference ## Recompiling an already compiled network is harmless ``` ```{r} ## Enter Evidence by setting values for these functions NodeFinding(m1.sgp) #View or set the value NodeLikelihood(m1.sgp) #Virtual evidence ``` ```{r beliefs, echo=TRUE} ## Query beliefs NodeBeliefs(m1.sgp) #Current probability (given entered evidence) ``` If we associate numeric values with a node, we can calculate expected values. ```{r node levels, echo=TRUE} NodeLevels(m1.sgp) <- c(-1,0,1) NodeExpectedValue(m1.sgp) ``` ## Enter Evidence Direct Evidence can be entered with the setter method of `NodeFinding` ```{r node finding, echo=TRUE} NodeStates(m1.obs$X3Medium) NodeFinding(m1.obs$X3Medium) <- "True" ``` Can also enter virtual evidence in the form of a likelihood. ```{r virtualEvidence, echo=TRUE} ## Enter Evidence "Not Low" ("High or Medium") NodeLikelihood(motif1$nodes$TableGeometric) <- c(1,1,0) ``` Query the nodes again, and the values should have changed. ```{r belief2, echo=TRUE} NodeBeliefs(m1.sgp) #Current probability (given entered evidence) NodeExpectedValue(m1.sgp) #If node has values, EAP ``` ## Retract Evidence Can retract evidence node by node, or on entire network. ```{r retract, echo=TRUE} RetractNodeFinding(m1.obs$X3Medium) RetractNetFindings(motif1) ``` Query the nodes again, and the values should have returned to the baselines. Query the nodes again, and the values should have changed. ```{r beliefr, echo=TRUE} NodeBeliefs(m1.sgp) #Current probability (given entered evidence) NodeExpectedValue(m1.sgp) #If node has values, EAP ``` Many more examples ```{r help, echo=TRUE} help(RNetica) ``` ## Mini-ACED Proficiency Model - Subset of ACED network (Shute, Hansen & Almond (2008); http://ecd.ralmond.net/ecdwiki/ACED ) - Proficiency Model subset: ![miniACED Proficiency Model](../figures/miniACEDPnet.png) ## Mini-ACED EM Fragments + All ACED tasks were scored correct/incorrect + Each evidence model is represented by a fragment consisting of observables with _stub_ edges indicating where it should be _adjoined_ with the network. Common Ratio Easy: ![Common Ratio Easy Fragment](../figures/CommonRatioEasyEM.png) Model Extend Table Hard: ![Model Extend Table Hard Fragment](../figures/ModelTableExtendHardEM.png) ## Task to EM Map * Need a table to tell which EM to use with which task * Row names are Task IDs * EM column contains evidence model name. * EM filename has suffix ".dne" attached. ```{r EM Table, echo=TRUE} ## Read in task->evidence model mapping EMtable <- read.csv("../Nets/MiniACEDEMTable.csv",row.names=1, as.is=2) #Keep EM names as strings head(EMtable) ``` ## Scoring Script 1. Start RNetica Session 2. Open Profiency Model netowrk 3. Load EM to net table. When Student evidence comes in: 1. Find SM for that student (clone PM if no SM) 2. Find EM for the task 3. Adjoin the two models. 4. Instantiate observable variables. 5. Drop EM. 6. Record Statistics & save SM ## Load the Proficiency Model ```{r load prof model, echo=TRUE} ## Read in network – Do this every time R is restarted profModel <- ReadNetworks("../Nets/miniACEDPnet-1.dne",session=sess) ## If profModels already exists could also use ## ReadNewtorks(profModel) ## Reconnect nodes – Do this every time R is restarted allNodes <- NetworkAllNodes(profModel) sgp <- allNodes$SolveGeometricProblems profNodes <- NetworkNodesInSet(profModel,"Proficiencies") ``` ## A student walks into the test center ... * Student gives the name “Fred” * Student is the right grade/age for ACED (8th or 9th grader, pre-algebra) * Bayes net has three states - Fred logs into ACED - Fred attempts the task `tCommonRatio1a` and gets it right - Fred attempts the task `tCommonRatio2a` and gets it wrong ## Start a New Student ```{r start new student, echo=TRUE} ## Copy the master proficiency model ## to make student model Fred.SM <- CopyNetworks(profModel,"Fred") Fred.SMvars <- NetworkAllNodes(Fred.SM) CompileNetwork(Fred.SM) ## Setup score history prior <- NodeBeliefs(Fred.SMvars$SolveGeometricProblems) Fred.History <- matrix(prior,1,3) row.names(Fred.History) <- "*Baseline*" colnames(Fred.History) <- names(prior) Fred.History ``` ## Score 1st Task ```{r score task 1, echo=TRUE} ### Fred does a task t.name <- "tCommonRatio1a" t.isCorrect <- "Yes" ## Adjoin SM and EM EMnet <- ReadNetworks(file.path("..","Nets", paste(EMtable[t.name,"EM"],"dne",sep=".")), session=sess) obs <- AdjoinNetwork(Fred.SM,EMnet) #NodeVisPos(obs$isCorrect) <- EMtable[t.name,c("X","Y")] NetworkAllNodes(Fred.SM) ## Fred.SM is now the Motif for the current task. CompileNetwork(Fred.SM) ## Enter finding NodeFinding(obs$isCorrect) <- t.isCorrect ``` ## Stats for 1st Task ```{r} ## Calculate statistics of interest post <- NodeBeliefs(Fred.SMvars$SolveGeometricProblems) Fred.History <- rbind(Fred.History,new=post) rownames(Fred.History)[nrow(Fred.History)] <- paste(t.name,t.isCorrect,sep="=") Fred.History ``` ## Cleanup from 1st Task ```{r cleanup Task 1, echo=TRUE} ## Cleanup and Observable no longer needed, so absorb it: DeleteNetwork(EMnet) ## Delete EM AbsorbNodes(obs) ## Currently, there is a Netica bug with Absorb Nodes, we will leave ## this node in place as that is mostly harmless. ``` ## 2nd Task Write a script for scoring the second task. This time Fred attempts the task `tCommonRatio2a` and gets it incorrect. ```{r exercise, echo=TRUE} ### Fred does another task t.name <- "tCommonRatio2a" t.isCorrect <- "No" ## Load Evidence Model and adjoin ## Recompile ## Add Evidence ## Check Finding and add to history ## Clean up ``` ## Answer for 2nd Task ```{r solution, echo=TRUE} ### Fred does another task t.name <- "tCommonRatio2a" t.isCorrect <- "No" EMnet <- ReadNetworks(file.path("..","Nets", paste(EMtable[t.name,"EM"],"dne", sep=".")), session=sess) obs <- AdjoinNetwork(Fred.SM,EMnet) #NodeVisPos(obs$isCorrect) <- EMtable[t.name,c("X","Y")] ## Fred.SM is now the Motif for the current task. CompileNetwork(Fred.SM) NodeFinding(obs[[1]]) <- t.isCorrect post <- NodeBeliefs(Fred.SMvars$SolveGeometricProblems) Fred.History <- rbind(Fred.History,new=post) rownames(Fred.History)[nrow(Fred.History)] <- paste(t.name,t.isCorrect,sep="=") Fred.History ## Cleanup: Delete EM and Absorb observables DeleteNetwork(EMnet) ## Delete EM AbsorbNodes(obs) ``` ## Save and Restore ```{r save and restore, echo=TRUE} ## Fred logs out WriteNetworks(Fred.SM,"FredSM.dne") DeleteNetwork(Fred.SM) is.active(Fred.SM) ## No longer active in Netica space ## Fred logs back in Fred.SM <- ReadNetworks("FredSM.dne",session=sess) is.active(Fred.SM) ``` ## Getting Serious * ACED field test has 230 students attempt all 63 tasks. * File miniACED-Geometric contains 30 task subset - There may be data registration issues here, don’t publish using these data before checking with me for an update * Each row is one student Record * Lets score the first student - And build a score history ## Setup for mini-ACED ```{r miniACED setup, echo=TRUE} miniACED.data <- read.csv("../Nets/miniACED-Geometric.csv",row.names=1) head(miniACED.data) ## Mark columns of table corresponding to tasks first.task <- 9 last.task <- ncol(miniACED.data) ## Code key for numeric values t.vals <- c("No","Yes") ``` ## Setup new Student ```{r setup student 1, echo=TRUE} ## Pick a student, we might normally iterate over this. Student.row <- 1 ## Setup for student in sample ## Create Student Model from Proficiency Model Student.SM <- CopyNetworks(profModel,"Student") Student.SMvars <- NetworkAllNodes(Student.SM) CompileNetwork(Student.SM) ## Initialize history list prior <- NodeBeliefs(Student.SMvars$SolveGeometricProblems) Student.History <- matrix(prior,1,3) row.names(Student.History) <- "*Baseline*" colnames(Student.History) <- names(prior) ``` ## Processing Loop ```{r processing loop, echo=TRUE} ## Now loop over tasks for (itask in first.task:last.task) { ## Look up the EM for the task, and adjoin it. tid <- names(miniACED.data)[itask] EMnet <- ReadNetworks(file.path("..","Nets", paste(EMtable[tid,"EM"],"dne",sep=".")), session=sess) obs <- AdjoinNetwork(Student.SM,EMnet) CompileNetwork(Student.SM) ## Add the evidence t.val <- t.vals[miniACED.data[Student.row,itask]] #Decode integer NodeFinding(obs[[1]]) <- t.val ## Update the history post <- NodeBeliefs(Student.SMvars$SolveGeometricProblems) Student.History <- rbind(Student.History,new=post) rownames(Student.History)[nrow(Student.History)] <- paste(tid,t.val,sep="=") ## Cleanup, Delete EM and Absob Observables DeleteNetwork(EMnet) AbsorbNodes(obs) # Still broken } ``` ## Weight of Evidence * Good (1985) * $H$ is a binary hypothesis, e.g., _Proficiency_ > `medium` * $E$ is evidence for that hypothesis * _Weight of Evidence_ of $E$ for $H$ (WOE) is $$ W(H:E) = \log\frac{\Pr(E|H)}{\Pr(E|\overline{H})} = \log \frac{\Pr(H|E)}{\Pr(\overline{H}|E)} - \log \frac{\Pr(H)}{\Pr(\overline{H}} $$ ## Conditional weight of evidence * Observe two different pieces of evidence, $E_1$ and $E_2$. * _Conditional Weight of Evidence_ is $$ W(H:E_2|E_1) = \log \frac{\Pr(E_2|H,E_1)}{\Pr(E_2|\overline{H},E_1)} $$ * Nice Additive Properties $$ W(H:E_1, E_2) = W(H:E_1) + W(H: E_2|E_1)$$ * Order Sensitive: - Earlier terms have higher WOE - Total WOE is always the same * WOE Balance Sheet (Madigan, Mosurski & Almond, 1997) ## Weight Of Evidence Balance Sheet Weight of Evidences can be calculated by differencing the log values in the history. ```{r woehist, echo=TRUE} Student.History woeHist(Student.History,c("High","Medium"),"Low") ``` We can graph this as a balansh sheet. ```{r woeBal,echo=TRUE} woeBal(Student.History,c("High","Medium"),"Low", title=paste("Evidence Balance Sheet for ", rownames(miniACED.data)[Student.row])) ``` ## Stacked Bar Charts These are like ordinary stacked bar charts, but offset at a given level. ```{r stackeBars, echo=TRUE} stackedBars(t(Student.History[1:4,]),1, col=hsv(223/360,.2,0.15*(3:1)+.5) ) ``` ```{r stackedBars, echo=TRUE} margins <- data.frame ( Trouble=c(Novice=.19,Semester1=.24,Semester2=.28,Semseter3=.20,Semester4=.09), NDK=c(Novice=.01,Semester1=.09,Semester2=.35,Semseter3=.41,Semester4=.14), Model=c(Novice=.19,Semester1=.28,Semester2=.31,Semseter3=.18,Semester4=.04) ) stackedBars(margins,3, main="Marginal Distributions for NetPASS skills", sub="Baseline at 3rd Semester level.", cex.names=.75, col=hsv(148/360,.2,0.10*(5:1)+.5)) ``` ## Comparing multiple students Compare one student to class average. Subgroup to full group Two groups to each other. Before to after. ```{r compareBars, echo=TRUE} margins.prior <- data.frame ( Trouble=c(Novice=.19,Semester1=.24,Semester2=.28,Semseter3=.20,Semester4=.09), NDK=c(Novice=.01,Semester1=.09,Semester2=.35,Semseter3=.41,Semester4=.14), Model=c(Novice=.19,Semester1=.28,Semester2=.31,Semseter3=.18,Semester4=.04) ) margins.post <- data.frame( Trouble=c(Novice=.03,Semester1=.15,Semester2=.39,Semseter3=.32,Semester4=.11), NDK=c(Novice=.00,Semester1=.03,Semester2=.28,Semseter3=.52,Semester4=.17), Model=c(Novice=.10,Semester1=.25,Semester2=.37,Semseter3=.23,Semester4=.05)) compareBars(margins.prior,margins.post,3,c("Prior","Post"), main="Margins before/after Medium Trouble Shooting Task", sub="Observables: cfgCor=Medium, logCor=High, logEff=Medium", legend.loc = "topright",legend.cex=.1, cex.names=.75, col1=hsv(h=.1,s=.2*1:5-.1,alpha=1), col2=hsv(h=.6,s=.2*1:5-.1,alpha=1)) ``` ```{r stopSession, echo=TRUE} stopSession(sess) ```