\name{Categorical} \alias{dcat} \alias{rcat} \alias{docat} \alias{pocat} \alias{qocat} \alias{rocat} \title{Categorical and Ordered Categorical Distributions. } \description{ A categorical distribution is one where the random variable can take one of a finite number of values. The random variable can take on a nominal or ordinal scale. The random variable can be represented by a (ordered) factor or an integer. For unordered categories, only the \code{dcat} and \code{rcat} operations are supported. For ordered categories, \code{pocat} and \code{qocat} are supported as well. } \usage{ dcat(x, prob, log = FALSE) docat(x, prob, log = FALSE) pocat(q, prob, log = FALSE) qocat(p, prob, factor = TRUE, labels = NULL) rcat(n, prob, factor = TRUE, ordered = FALSE, labels = NULL) rocat(n, prob, factor = TRUE, labels = NULL) } \arguments{ \item{n}{An integer scalar giving the number of random values to generate.} \item{p}{A numeric vector of probabilities.} \item{q}{An integer or ordered factor giving quantiles of the distribution.} \item{x}{An integer, factor or ordered factor giving values of the random variable.} \item{prob}{A vector, matrix or data frame of probability values. All rows must add to one. The number of rows should match \code{n} or the length of \code{p}, \code{q} or \code{x} (if those are note one).} \item{factor}{A logical scalar, indicates whether or not the output should be converted into a factor.} \item{ordered}{A logical scalar, indicates whether or not the output should be converted into an ordered factor.} \item{labels}{A character vector. If the output is an ordered factor, these are the names of the levels. The default is the names or column names of \code{prob}.} \item{log}{A logical scalar. If true, log probabilities are returned instead of probabilities.} } \details{ A categorical distribution is descrbed by a vector of probabilities, \eqn{p_1,\ldots,p_k} which sum to one. The possible values are the integers \eqn{1,\ldots,k}. The parameter \code{prob} is represented by a numeric vector whose values sum to one. The random value \code{x} can be represented either by an integer, or a factor value. If it is a factor value, then the level names of the factor should match the names of \code{prob}. The value of \code{prob} can be a matrix or a data frame. In which case each row is treated as a parameter. The column names of the matrix (or names of the data frame) should match the level names of the factor variable. An ordered categorical distribution is a categorical distribution whose values are considered to be ordered. The while all categorical variables are on at least a nominal scale, ordered categorial variables are also considered to be on an interval scale. Note that quantiles are defined for ordered categorical variables, but not for unordered categorical variables, thus \code{pocat(q,prob)} and \code{qocat(p,prob)} functions are available, but not \code{pcat(q,prob)} and \code{qcat(p,prob)} The functions \code{dcat(x,prob)} and \code{docat(x,prob)} calculate the probability of the random values \code{x}. The two functions are identical, as ordered and unordered categorical variables behave the same for this operation. The function is vectorized; if \code{x} is a vector or \code{prob} is a matrix or data frame, then the result is a vector of probabilities. The functions \code{rcat(n,prob)} and \code{rocat(n,prob)} both generate a random vector of categorical values. The only difference between the functions is that \code{rocat} produces an ordered factor by default. If \code{prob} is a matrix or data frame, then the number of rows should be \code{n} and a different set of probabilities is used for each value. The function \code{pocat(q,prob)} calculate the cumulative probability of the quantile \code{q}. This operation is only defined for ordered categorical variables. The function is vectorized; if \code{q} is a vector or \code{prob} is a matrix or data frame, then the result is a vector of probabilities. The function \code{pocat(p,prob)} calculate the quantile corresponding to a given probability \code{p}. This operation is only defined for ordered categorical variables. The function is vectorized; if \code{p} is a vector or \code{prob} is a matrix or data frame, then the result is a vector of values. } \value{ For \code{rcat}, \code{rocat} and \code{qocat} the value is a vector of values. These may be integers or (ordered) factors depending on the value of the \code{factor} argument. For \code{dcat}, \code{docat} and \code{pocat} the value is a vector of probabilities. } \author{Russell G Almond} \seealso{ \code{\link{median.ordered}} } \examples{ ### Random Number Test set.seed(123456) pp <- c(L=.5,M=.3,H=.2) x <- rcat(1000,pp) stopifnot(is.factor(x),!is.ordered(x), all.equal(levels(x),names(pp))) exp <- 1000*pp chisq <- sum((table(x)-exp)^2/exp) stopifnot (chisq < qchisq(.95,2)) ## Ordered categorical x1 <- rocat(10,pp) stopifnot(is.factor(x1),is.ordered(x1), all.equal(levels(x1),names(pp)) ### Discrete Markov Chain Generation N <- 100 Tmax <- 10 P0 <- c(L=.25,M=.5,H=.25) P1 <- rbind(L=c(L=.6,M=.3,H=.1), M=c(L=.2,M=.6,H=.2), H=c(L=.1,M=.3,H=.6)) chain <- matrix(ordered(NA,levels=1:3,labels=c("L","M","H")),N,Tmax) chain[,1] <- rcat(N,P0) for (t in 2:Tmax) { chain[,t] <- rcat(N,P1[as.integer(chain[,t-1L]),]) } dd <- dcat(3:1,pp) stopifnot(all.equal(dd,rev(pp))) dd <- dcat(factor(3:1,levels=1:3,labels=names(pp)),pp) stopifnot(all.equal(dd,rev(pp))) ddd <- dcat(1:3,P1) stopifnot(all.equal(ddd,diag(P1),check.attributes=FALSE)) ddd <- dcat(factor(1:3,levels=1:3,labels=colnames(P1)),P1) stopifnot(all.equal(ddd,diag(P1),check.attributes=FALSE)) pp1 <- pocat(1:3,pp) stopifnot(all.equal(pp1,cumsum(pp),check.attributes=FALSE)) pp1 <- pocat(ordered(3:1,levels=1:3,labels=names(pp)),pp) stopifnot(all.equal(pp1,rev(cumsum(pp)),check.attributes=FALSE)) pp1 <- pocat(1:3,P1) stopifnot(all.equal(pp1,diag(t(apply(P1,1,cumsum))), check.attributes=FALSE)) qq1 <- qocat(seq(.1,1,.1),pp) stopifnot(all.equal(qq1,ordered(rep(1:3,times=10*pp), levels=1:3,labels=names(pp)))) qq2 <- qocat(.5,P1) stopifnot(all.equal(qq2,ordered(1:3,levels=1:3,labels=names(pp)), check.attributes=FALSE)) } \keyword{ distribution }