\name{SSX}
\alias{SSX}
\title{Compute weighted sums of variables, squares and cross products}
\description{
  This routine produces the sums of squares and cross products of an
  augmented data matrix.  The data matrix is first augmented with a
  column of \eqn{1}'s, so that in addition to the sums of squares and
  cross products, the sums of the variables and the sum of the weights
  are calculated.
}
\usage{
SSX(X, weights = 1, side = "left")
}
\arguments{
  \item{X}{A data set expressed as a matrix (or an object which can be
    coerced into a matrix).  Rows represent individuals and columns
    represent variables.}
  \item{weights}{If supplied, this should be a numeric vector of the
    same length as the number of rows in \code{X}.  This produces a
    weighted sum/sum of squares.}
  \item{side}{A character scalar which should have either the value
    \dQuote{left} or \dQuote{right} (abbreviations are allowed).  This
    controls on which side of the matrix the column of \eqn{1}'s is
    added.}
}
\details{
  Dempster (1969) describes a trick where both the means and
  variances/covariances for a set of data can be calculated in a
  single matrix.  Let \eqn{\bold{X}} be an \eqn{n \times p} data
  matrix.  The augmented matrix, \eqn{\bold{X}_{(+)}}, is an
  \eqn{n \times (p+1)} matrix formed by adding a column of \eqn{1}'s.
  If the augmented column is added on the right, then
  \eqn{\bold{X}_{(+)}^{T}\bold{X}_{(+)}} has the following form:

  \deqn{
    \bold{Q}_{(+)} = \left [ \begin{array}{ccccc}
      \sum x_{i1}^2 & \sum x_{i1} x_{i2} & \ldots &
        \sum x_{i1} x_{ip} & \sum x_{i1} \\
      \sum x_{i2} x_{i1} & \sum x_{i2}^2 & \ldots &
        \sum x_{i2} x_{ip} & \sum x_{i2} \\
      \vdots & \vdots & \ddots & \vdots & \vdots \\
      \sum x_{ip} x_{i1} & \sum x_{ip} x_{i2} & \ldots &
        \sum x_{ip}^2 & \sum x_{ip} \\
      \sum x_{i1} & \sum x_{i2} & \ldots & \sum x_{ip} & n
    \end{array} \right ]
  }

  Applying the sweep operator (\code{\link{matSweep}}) has an
  interesting effect.
  It produces the matrix:

  \deqn{SWP[p+1]\bold{Q}_{(+)} = \left [ \begin{array}{cc}
      \bold{T} & \bar{\bold{X}} \\
      \bar{\bold{X}}^{T} & -1/n
    \end{array} \right ]
  }

  where \eqn{\bar{\bold{X}}} is the vector of column means and
  \eqn{\bold{T}} is \eqn{n-1} times the covariance matrix for
  \eqn{\bold{X}}.

  If weights are supplied, instead of calculating
  \eqn{\bold{X}_{(+)}^{T}\bold{X}_{(+)}}, \code{SSX} calculates
  \eqn{\bold{X}_{(+)}^{T}\bold{W}\bold{X}_{(+)}}, where
  \eqn{\bold{W}} is a diagonal matrix with the weights on the
  diagonal.  Thus it becomes a weighted sum of squares:

  \deqn{
    \left [ \begin{array}{ccccc}
      \sum w_i x_{i1}^2 & \sum w_i x_{i1} x_{i2} & \ldots &
        \sum w_i x_{i1} x_{ip} & \sum w_i x_{i1} \\
      \sum w_i x_{i2} x_{i1} & \sum w_i x_{i2}^2 & \ldots &
        \sum w_i x_{i2} x_{ip} & \sum w_i x_{i2} \\
      \vdots & \vdots & \ddots & \vdots & \vdots \\
      \sum w_i x_{ip} x_{i1} & \sum w_i x_{ip} x_{i2} & \ldots &
        \sum w_i x_{ip}^2 & \sum w_i x_{ip} \\
      \sum w_i x_{i1} & \sum w_i x_{i2} & \ldots &
        \sum w_i x_{ip} & \sum w_i
    \end{array} \right ]
  }

  While Dempster (1969) usually does the augmentation on the right,
  Little and Rubin (2002) do the augmentation on the left.  Other than
  the location of the augmented row and column in the final matrix,
  there is little difference.  The \code{side} argument controls on
  which side the augmentation is done.  The default is to follow the
  Little and Rubin style (on the left) rather than the on-the-right
  style shown in the equations above.
}
\value{
  A symmetric matrix of size \eqn{(p+1) \times (p+1)}, where \eqn{p}
  is the number of columns of the \code{X} argument.  Its contents are
  as described above.
}
\references{
  Dempster, A. P. (1969).  \emph{Elements of Continuous Multivariate
  Analysis.}  Addison-Wesley.

  Little, R. J. A. and Rubin, D. B. (2002).  \emph{Statistical
  Analysis with Missing Data, Second Edition.}  Wiley.
}
\author{Russell Almond}
\seealso{
  \code{\link{matSweep}}, \code{\link[stats]{cov}},
  \code{\link[stats]{cov.wt}}, \code{\link[base]{crossprod}}
}
\examples{
data(eggs)
eggsQ <- SSX(eggs)
stopifnot(as.integer(eggsQ[1,1])==nrow(eggs))

## Sweep out constant row and column.
eggsT <- matSweep(eggsQ,1)
stopifnot(all.equal(eggsT[2:4,2:4],(nrow(eggs)-1)*cov(eggs)),
          all.equal(eggsT[1,2:4],apply(eggs,2,mean)),
          all.equal(eggsT[1,1],-1/nrow(eggs)))

## Test of weighted version
wt <- c(rep(2/3,6), rep(1/3,6))
eggwQ <- SSX(eggs,wt,side="right")
stopifnot(all.equal(eggwQ[4,4],sum(wt)))

## Sweep
eggwT <- matSweep(eggwQ,4)

## Weighted mean.
eggwcov <- cov.wt(eggs,wt,method="ML")
stopifnot(all.equal(eggwT[1:3,1:3]/(sum(wt)), eggwcov$cov),
          all.equal(eggwT[4,1:3],eggwcov$center))
}
\keyword{ array }% use one of RShowDoc("KEYWORDS")
\keyword{ algebra }% __ONLY ONE__ keyword per line
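\section{Illustrative sketch}{
  The computation described in the details can be sketched in base R.
  This is only a minimal sketch under stated assumptions, not the
  implementation used by \code{SSX} or \code{\link{matSweep}}: the
  helper names \code{ssx_sketch} and \code{sweep_sketch} are
  hypothetical, the sweep follows the convention in which the pivot
  element becomes \eqn{-1/q_{kk}}, and the small matrix \code{X} is
  made-up illustration data.

  \preformatted{
## Hypothetical helper: augmented, weighted cross-product matrix.
## Written only to illustrate the formulas; not the package's SSX().
ssx_sketch <- function(X, weights = 1, side = c("left", "right")) {
  side <- match.arg(side)
  X <- as.matrix(X)
  w <- rep_len(weights, nrow(X))   # recycle a scalar weight over rows
  Xa <- if (side == "left") cbind(1, X) else cbind(X, 1)
  crossprod(Xa, w * Xa)            # t(Xa) W Xa with W = diag(w)
}

## Hypothetical helper: sweep the symmetric matrix Q on pivot k.
sweep_sketch <- function(Q, k) {
  d <- Q[k, k]
  H <- Q - outer(Q[, k], Q[k, ]) / d   # residual sums of squares
  H[k, ] <- Q[k, ] / d                 # pivot row: means
  H[, k] <- Q[, k] / d                 # pivot column: means
  H[k, k] <- -1 / d
  H
}

## Made-up data: 4 individuals, 2 variables.
X <- cbind(a = c(1, 2, 4, 7), b = c(3, 1, 5, 2))
Q <- ssx_sketch(X)                     # augmentation on the left
S <- sweep_sketch(Q, 1)
stopifnot(all.equal(Q[1, 1], nrow(X)),
          all.equal(S[1, 2:3], colMeans(X)),
          all.equal(unname(S[2:3, 2:3]),
                    unname((nrow(X) - 1) * cov(X))),
          all.equal(S[1, 1], -1 / nrow(X)))
  }

  Multiplying the augmented matrix row-wise by the weights before
  calling \code{crossprod} avoids forming the \eqn{n \times n}
  diagonal matrix \eqn{\bold{W}} explicitly.
}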