\name{Pnet2Qmat}
\alias{Pnet2Qmat}
\title{Makes an augmented Q-matrix from a collection of parameterized nets}
\description{
  In augmented \eqn{Q}-matrix, there is a set of rows for each
  \code{\link{Pnode}} which describes the conditional probability table
  for that node in terms of the model parameters (see
  \code{\link{BuildTable}}).  As the Pnodes could potentially come from
  multiple nets, the key for the table is (\dQuote{Model},
  \dQuote{Node}).  As there are multiple rows per node, \dQuote{State}
  is the third part of the key.

  The function \code{Pnet2} creates an augmented \eqn{Q}-matrix
  out of a collection of \code{\link{Pnode}}s, possibly spanning multiple
  \code{\link{Pnet}}s.

}
\usage{
Pnet2Qmat(obs, prof, defaultRule = "Compensatory", defaultLink = "partialCredit", defaultAlpha = 1, defaultBeta = NULL, defaultLinkScale = NULL, debug = TRUE)
}
\arguments{
  \item{obs}{A list of \emph{observable} \code{\link{Pnode}} objects.
    These could span multiple \code{\link{Pnet}} objects.  Each element
    of this list will corresponded to one or more rows in the output
    \eqn{Q}-matrix. 
  }
  \item{prof}{A list of \emph{proficiency} \code{Pnode}s.  These are the
    parents of the \code{Pnode}s in the \code{obs} list.  Usually, these
    are all in a central proficiency or hub model.
  }
  \item{defaultRule}{This should be a character scalar giving the name
    of a CPTtools combination rule (see
    \code{\link[CPTtools]{Compensatory}}).  
  }
  \item{defaultLink}{This should be a character scalar giving the name
    of a CPTtools link function (see \code{\link{partialCredit}}).}
  \item{defaultAlpha}{A numeric scalar giving the default value for
    slope parameters.}
  \item{defaultBeta}{A numeric scalar giving the default value for
    difficulty (negative intercept) parameters.}
  \item{defaultLinkScale}{A positive number which gives the default
    value for the link scale parameter.}
  \item{debug}{A logical value.  If true, extra information will be
    printed during process of building the Pnet.}
}
\details{

  A \eqn{Q}-matrix is a 0-1 matrix which describes which proficiency
  (latent) variables are connected to which observable outcome
  variables; \eqn{q_{jk}=1} if and only if 
  proficiency variable \eqn{k} is a parent of observable variable
  \eqn{j}.  Almond (2010) suggested that augmenting the \eqn{Q}-matrix
  with additional columns representing the combination rules
  (\code{\link{PnodeRules}}), link function (\code{\link{PnodeLink}}),
  link scale parameter (if needed, \code{\link{PnodeLinkScale}}) and
  difficulty parameters (\code{\link{PnodeBetas}}).  The discrimination
  parameters (\code{\link{PnodeAlphas}}) could be overloaded with the
  \eqn{Q}-matrix, with non-zero parameters in places where there were
  1's in the \eqn{Q}-matrix. 

  This arrangement worked fine with combination rules (e.g.,
  \code{\link[CPTtools]{Compensatory}}) which contained multiple alpha
  (discrimination) parameters, one for each parent variable,  and a
  single beta (difficulty).  The introduction of a new type of offset
  rule (e.g., \code{\link[CPTtools]{OffsetDisjunctive}}) which uses a multiple
  difficulty parameters, one for each parent variable, and a single
  alpha.  Almond (2016) suggested a new augmentation which has three
  matrixes in a single table (a Qmat):  the \eqn{Q}-matrix, which
  contains structural information; the \eqn{A}-matrix, which contains
  discrimination parameters; and the \eqn{B}-matrix, which contains the
  difficulty parameters.  The names for the columns for these matrixes
  contain the names of the proficiency variables, prepended with
  \dQuote{A.} or \dQuote{B.} in the case of the \eqn{A}-matrix and
  \eqn{B}-matrix.   There are two additional columns marked \dQuote{A}
  and \dQuote{B} which are used for the discrimination and difficulty
  parameter in the multiple-beta and multiple-alpha cases.  There is
  some redundancy between the \eqn{Q}, \eqn{A} and \eqn{B} matrixes, but
  this provides an opportunity for checking the validity of the input.

  The introduction of the partial credit link function
  (\code{\link[CPTtools]{partialCredit}}) added a further
  complication.  With the partial credit model, there could be a
  separate set of discrimination or difficulty parameters for each
  transition for a polytomous item.  Even the
  \code{\link[CPTtools]{gradedResponse}} link function requires a
  separate difficulty parameter for each level of the varaible save the
  first.  The rows of the Qmat data structure are hence augmented to
  include one row for every state but the lowest-level state.  There
  should be of fewer rows of associated with the node than the value in
  the \dQuote{Nstates} column, and the names of the states (values in
  the \dQuote{State} column) should correspond to every state of the
  target variable except the first.  It is an error if the number of
  states does not match the existing node, or if the state names do not
  match what is already used for the node or is in the manifest for the
  node \code{\link{Warehouse}}.  

  Note that two nodes in different networks may share the same name, and
  two states in two different nodes may have the same name as well.
  Thus, the formal key for the Qmat data frame is (\dQuote{Model},
  \dQuote{Node}, \dQuote{State}), however, the rows which share the
  values for (\dQuote{Model}, \dQuote{Node}) form a subtable for that
  particular node.  In particular, the rows of the \eqn{Q}-matrix
  subtable for that node form the \emph{inner Q-matrix} for that node.
  The inner \eqn{Q}-matrix shows which variables are relevant for each
  state transition in a partial credit model.  The column-wise maximum
  of the inner \eqn{Q}-matrix forms the row of the outer \eqn{Q}-matrix
  for that node.  This shows which proficiency nodes are the parent of
  the observable node. This corresponds to 
  \code{\link{PnodeQ}(\var{node})}. 

  The function \code{Qmat2Pnet} creates and sets the parameters of the
  observable \code{\link{Pnode}}s referenced in the \code{Qmat}
  argument.  As it needs to reference, and possibly create, a number of
  \code{\link{Pnet}}s and \code{Pnode}s, it requires both a network and
  a node \code{\link{Warehouse}}.  If the \code{override} parameter is
  true, the networks will be modified so that each node has the correct
  parents, otherwise \code{Qmat2Pnet} will signal an error if the
  existing network structure is inconsistent with the \eqn{Q}-matrix.

  As there is only one link function for each \var{node}, the values of
  \code{\link{PnodeLink}(\var{node})} and
  \code{\link{PnodeLinkScale}(\var{node})} 
  are set based on the values in the \dQuote{Link} and
  \dQuote{LinkScale} columns and the first row corresponding to
  \var{node}.  Note that the choice of link functions determines what is
  sensible for the other values but this is not checked by the code.

  The value of \code{\link{PnodeRules}(\var{node})} can either be a single
  value or a list of rule names.  The first value in the sub-Qmat must a
  character value, but if the other values are missing then a single
  value is used.  If not, all of the entries should be non-missing.  If
  this is a single value, then effectively the same combination rule is
  used for each transition.
  
  The interpretation of the \eqn{A}-matrix and the \eqn{B}-matrix
  depends on the value in the \dQuote{Rules} column.  There are two
  types of rules, multiple-A rules and multiple-B rules (offset rules).
  The CPTtools funciton \code{\link[CPTtools]{isOffsetRule}} checks to
  see what kind of a rule it is.  The multiple-A rules, of which
  \code{\link[CPTtools]{Compensatory}} is the canonical example, have one
  discrimination (or slope) parameter for every parent variable (values
  of 1 in the \eqn{Q}-matrix) and have a single difficulty (negative
  intercept) parameter which is in the \dQuote{B} column of the Qmat.
  The multiple-B or offset rules, of which
  \code{\link[CPTtools]{OffsetConjunctive}} is the canonical example,
  have a difficulty (negative intercept) parameter for each parent
  variable and a single discrimination (slope) parameter which is in the
  \dQuote{A} column.  The function \code{Qmat2Pnet} uses the value of
  \code{isOffsetRule} to determine whether to use the multiple-B (true)
  or multiple-A (false) paradigm.
  
  A simple example is a binary observable variable which uses the
  \code{\link[CPTtools]{Compensatory}} rule.  This is essentially a
  regression model (logistic regression with
  \code{\link[CPTtools]{partialCredit}} or
  \code{\link[CPTtools]{gradedResponse}} link funcitons, linear
  regression with \code{\link[CPTtools]{normalLink}} link function) on
  the parent variables.  The linear predictor is:
  \deqn{\frac{1}{\sqrt{K}} (a_1\theta_1 + \ldots + a_K\theta_K) - b .}
  The values \eqn{\theta_1, \ldots, \theta_K} are effective thetas, real
  values corresponding to the states of the parent variables.  The
  value \eqn{a_i} is stored in the column \dQuote{A.\var{namei}} where
  \var{namei} is the name of the \eqn{i}th proficiency variable; the
  value of \code{\link{PnodeAlphas}(\var{node})} is the vector \eqn{a_1,
  \ldots, a_k} with names corresponding to the parent variables.  The
  value of \eqn{b} is stored in the \dQuote{B} column; the value of
  \code{\link{PnodeBetas}(\var{node})} is \eqn{b}.

  The multiple-B pattern replaces the \eqn{A}-matrix with the
  \eqn{B}-matrix and the column \dQuote{A} with \dQuote{B}.
  Consider binary observable variable which uses the
  \code{\link[CPTtools]{OffsetConjunctive}} rule.  The linear predictor is:
  \deqn{a \min (\theta_1 -b+1, \ldots , \theta_K- b_K) .}
  The value \eqn{b_i} is stored in the column \dQuote{B.\var{namei}} where
  \var{namei} is the name of the \eqn{i}th proficiency variable; the
  value of \code{\link{PnodeBetas}(\var{node})} is the vector \eqn{b_1,
  \ldots, b_k} with names corresponding to the parent variables.   The
  value of \eqn{a} is stored in the \dQuote{A} column; the value of
  \code{\link{PnodeBetas}(\var{node})} is \eqn{a}.

  When there are more than two states in the output varible,
  \code{\link{PnodeRules}}, \code{\link{PnodeAlphas}(\var{node})} and
  \code{\link{PnodeBetas}(\var{node})} become lists to indicate that a
  different value should be used for each transition between states.
  If there is a single value in the \dQuote{Rules} column, or
  equivalently the value of \code{\link{PnodeRules}} is a scalar, then
  the same rule is repeated for each state transition.  The same is true
  for \code{\link{PnodeAlphas}(\var{node})} and
  \code{\link{PnodeBetas}(\var{node})}.  If these values are a list,
  that indicates that a different value is to be used for each
  transition.  If they are a vector that means that different values (of
  discriminations for multiple-a rules or difficulties for multiple-b
  rules) are needed for the parent variables, but the same set of values
  is to be used for each state transition.  If different values are to
  be used then the values are a list of vectors.

  
  The necessary configuration of \eqn{a}'s and \eqn{b}'s depends on the
  type of link function.  Here are the rules for the currently existing
  link funcitons:
  \describe{
    \item{normal}{(\code{\link[CPTtools]{normalLink}}) This link function
    uses the same linear predictor for each transition, so there should be
    a single rule, and \code{\link{PnodeAlphas}(\var{node})} and
    \code{\link{PnodeBetas}(\var{node})} should both be vectors (with
    \eqn{b} of length 1 for a multiple-a rule).  This rule also requires a
    positive value for the \code{\link{PnodeLinkScale}(\var{node})} in the
    \dQuote{"LinkScale"} column.  The values in the \dQuote{A.\var{name}}
    and \dQuote{B.\var{name}} for rows after the first can be left as
    \code{NA}'s to indicate that the same values are reused.}

    \item{graded response}{(\code{\link[CPTtools]{gradedResponse}}) This
    link function models the probability of getting at or above each
    state and then calculates the differences between them to produce
    the conditional probability table.  In order to avoid negative
    probabilities, the probability of being in a higher state must
    always be nonincreasing.  The surest way to ensure this is to both
    use the same combination rules at each state and the same set of
    discrimination parameters for each state.  The difficulty parameters
    must be nondecreasing.  Again, values for rows after the first can
    be left as \code{NA}s to indicate that the same value should be
    resused. }
    
    \item{partial credit}{(\code{\link[CPTtools]{partialCredit}}) This
    link function models the conditional probability from moving from
    the previous state to the current state.  As such, there is no
    restriction on the rules or parameters.  In particular, it can
    alternate between multiple-a and multiple-b style rules from row to
    row.

    Another restriction that the use of the partial credit rule lifts is
    the restriction that all parent variable must be used in each
    transition.  Note that there is one row of the \eqn{Q}-matrix (the
    inner \eqn{Q}-matrix) for each state transition.  Only the parent
    variables with 1's in the particular state row are considered when
    building the \code{\link{PnodeAlphas}(\var{node})} and
    \code{\link{PnodeBetas}(\var{node})} for this model.  Note that only
    the partial credit link function can take advantage of the multiple
    parents, the other two require all parents to be used for every
    state. }
  }

  The function \code{Pnet2Qmat} takes a collection of nodes (in a series
  of spoke or evidence models) and builds a Qmat data structure that can
  reproduce them.  It loops through the nodes and fills out the Qmat
  based on the properties of the \code{\link{Pnode}}s.  Note that if the
  proprties are not yet set, then the default values are used, thus
  applying this to a network for which the structure has been
  established, but the parameters have not yet been set will build a
  blank Qmat which can be adjusted by experts.

}
\value{

  The output augmented \eqn{Q}-matrix is a data frame with the columns
  described below.  The number of columns is variable, with items marked
  \var{prof} actually corresponding to a number of columns with names
  taken from the proficiency variables (the \code{prof} argument).


  \item{Model}{The name of the \code{\link{Pnet}} in which the node
    in this row lives.}
  \item{Node}{The name of the \code{\link{Pnode}} described in this
    row.  Except for the multiple rows corresponding to the same node,
    the value of this column needs to be unique within \dQuote{Model}.}
  \item{Nstates}{The number of states for this node.  Generally, each
    node should have one fewer rows than this number.}
  \item{State}{The name of the state for this row.  This should be
    unique within the (\dQuote{Model},\dQuote{Node}) combination.} 
  \item{Link}{The name of a link function.  This corresponds to
    \code{\link{PnodeLink}(\var{node})}.} 
  \item{LinkScale}{Either a positive number giving the link scale
    parameter or an \code{NA} if the link function does not need scale
    parameters.  This corresponds to
    \code{\link{PnodeLinkScale}(\var{node})}.}
  \item{\var{prof}}{There is one column for each proficiency variable.
    This corresponds to the structural part of the \eqn{Q}-matrix.
    There should be 1 in this column if the named proficiency is used in
    calculating the transition to this state for this particular node,
    and a 0 otherwise.}
  \item{Rules}{The name of the combination rule to use for this row.
    This corresponds to \code{\link{PnodeRules}(\var{node})}.}
  \item{A.\var{prof}}{There is one column for each proficiency with the
    proficiency name appended to \dQuote{A.}.  If a multiple-alpha style
    combination rule (e.g., \code{\link[CPTtools]{Compensatory}}) this
    column should contain the appropriate discriminations, otherwise,
    its value should be \code{NA}.}
  \item{A}{If a multiple-beta style
    combination rule (e.g., \code{\link[CPTtools]{OffsetConjunctive}}) this
    column should contain the single discrimination, otherwise,
    its value should be \code{NA}.}
  \item{B.\var{prof}}{There is one column for each proficiency with the
    proficiency name appended to \dQuote{B.}.  If a multiple-bet style
    combination rule (e.g., \code{\link[CPTtools]{OffsetConjunctive}}) this
    column should contain the appropriate difficulty (negative
    intercept), otherwise, its value should be \code{NA}.}
  \item{B}{If a multiple-beta style
    combination rule (e.g., \code{\link[CPTtools]{Compensatory}}) this
    column should contain the single difficulty (negative
    intercept), otherwise, its value should be \code{NA}.}
  \item{PriorWeight}{The amount of weight which should be given to the
    current values when learning conditional probability tables.  See
    \code{\link{PnodePriorWeight}}.} 
}
\references{
  Almond, R. G. (2010). \sQuote{I can name that Bayesian network in two
  matrixes.} \emph{International Journal of Approximate Reasoning.}
  \bold{51}, 167-178.

  Almond, R. G. (presented 2017, August). Tabular views of Bayesian
  networks. In John-Mark Agosta and Tomas Singlair (Chair), \emph{Bayeisan
    Modeling Application Workshop 2017}. Symposium conducted at the
  meeting of Association for Uncertainty in Artificial Intelligence,
  Sydney, Australia. (International) Retrieved from
  \url{http://bmaw2017.azurewebsites.net/} 
}
\author{Russell Almond}
\seealso{
  The inverse operation is \code{\link{Qmat2Pnet}}.

  See \code{\link{Warehouse}} for description of the network and node
  warehouse arguments

  See \code{\link[CPTtools]{partialCredit}},
  \code{\link[CPTtools]{gradedResponse}}, and
  \code{\link[CPTtools]{normalLink}} for currently available link
  functions.  See \code{\link[CPTtools]{Conjunctive}} and
  \code{\link[CPTtools]{OffsetConjunctive}} for more information about
  available combination rules.
  
  The node attributes set from the Omega matrix include:
  \code{\link{PnodeParents}(\var{node})},
  \code{\link{PnodeLink}(\var{node})},
  \code{\link{PnodeLinkScale}(\var{node})},
  \code{\link{PnodeRules}(\var{node})},
  \code{\link{PnodeQ}(\var{node})},
  \code{\link{PnodeAlphas}(\var{node})},
  \code{\link{PnodeBetas}(\var{node})}, and
  \code{\link{PnodePriorWeight}(\var{node})}
}
\examples{

## Sample Q matrix
Q1 <- read.csv(paste(library(help="Peanut")$path, "auxdata",
                           "miniPP-Q.csv", sep=.Platform$file.sep),
               stringsAsFactors=FALSE)

\dontrun{
library(PNetica) ## Needs PNetica
sess <- NeticaSession()
startSession(sess)
curd <- getwd()

netman1 <- read.csv(paste(library(help="Peanut")$path, "auxdata",
                          "Mini-PP-Nets.csv", sep=.Platform$file.sep),
                    row.names=1,stringsAsFactors=FALSE)

nodeman1 <- read.csv(paste(library(help="Peanut")$path, "auxdata",
                           "Mini-PP-Nodes.csv", sep=.Platform$file.sep),
                     row.names=1,stringsAsFactors=FALSE)

omegamat <- read.csv(paste(library(help="Peanut")$path, "auxdata",
                           "miniPP-omega.csv", sep=.Platform$file.sep),
                     row.names=1,stringsAsFactors=FALSE)

## Insures we are building nets from scratch
setwd(tempdir())
## Network and node warehouse, to create networks and nodes on demand.
Nethouse <- BNWarehouse(manifest=netman1,session=sess,key="Name")

Nodehouse <- NNWarehouse(manifest=nodeman1,
                         key=c("Model","NodeName"),
                         session=sess)

## Build the proficiency model first:
CM <- WarehouseSupply(Nethouse,"miniPP_CM")
CM1 <- Omega2Pnet(omegamat,CM,Nodehouse,override=TRUE)

## Build the nets from the Qmat

Qmat2Pnet(Q1, Nethouse,Nodehouse)


## Build the Qmat from the nets
## Generate a list of nodes
obs <-unlist(sapply(list(sess$nets$PPcompEM,sess$nets$PPconjEM,
                  sess$nets$PPtwostepEM,sess$nets$PPdurAttEM),
             NetworkAllNodes))

Q2 <- Pnet2Qmat(obs,NetworkAllNodes(CM))

## adjust Q1 to match Q2
Q1 <- Q1[,-1]  ## Drop unused first column.
class(Q1) <- c("Qmat", "data.frame")
# Force them into the same order
Q1 <- Q1[order(Q1$Model,Q1$Node),]
Q2 <- Q2[order(Q2$Model,Q2$Node),]
row.names(Q1) <- NULL
row.names(Q2) <- NULL


## Force all NA columns into the right type
Q1$LinkScale <- as.numeric(Q1$LinkScale)
Q1$A.Physics <- as.numeric(Q1$A.Physics)
Q1$A.IterativeD <- as.numeric(Q1$A.IterativeD)
Q1$B.Physics <- as.numeric(Q1$B.Physics)
Q1$B.NTL <- as.numeric(Q1$B.NTL)

## Fix fancy quotes added by some spreadsheets
Q1$Rules <- gsub(intToUtf8(c(91,0x201C,0x201D,93)),"\"",Q1$Rules)

## Insert Default Prior Weights
Q1$PriorWeight <- ifelse(is.na(Q1$NStates),"","10")
all.equal(Q1,Q2)


stopSession(sess)
setwd(curd)
}
}
\keyword{ distribution }
\keyword{ graph }