woeHist {CPTtools} | R Documentation |
Takes a matrix providing the probability distribution for the target variable at several time points and returns a weight of evidence for all time points except the first.
woeHist(hist, pos, neg)
hist | A matrix whose rows represent time points (after tests) and whose columns hold the probabilities of the states of the target variable. |
pos | Names or numbers of the states that should be regarded as "positive". |
neg | Names or numbers of the states that should be regarded as "negative". |
Good (1971) defines the Weight Of Evidence (WOE) as:
100 log10 [ Pr(E|H) / Pr(E|not H) ] = 100 [ log10 ( Pr(H|E) / Pr(not H|E) ) - log10 ( Pr(H) / Pr(not H) ) ]
where not H denotes the negation of the hypothesis H. Good recommends taking the logarithm to base 10 and multiplying by 100, and calls the resulting units centibans. The second form, which expresses the weight of evidence as a difference in log odds, leads naturally to the idea of an incremental weight of evidence for each new observation.
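As a small numeric check of the two forms of the definition, the sketch below computes the WOE once as a log likelihood ratio and once as a difference in log odds. The probabilities are hypothetical values chosen only for illustration; they do not come from the package or its data.

```r
# Hypothetical inputs, chosen only for illustration.
prior_H  <- 0.3   # Pr(H)
lik_H    <- 0.8   # Pr(E | H)
lik_notH <- 0.2   # Pr(E | not H)

# Posterior Pr(H | E) from Bayes' rule.
post_H <- lik_H * prior_H / (lik_H * prior_H + lik_notH * (1 - prior_H))

# First form: 100 * log10 of the likelihood ratio.
woe_lr <- 100 * log10(lik_H / lik_notH)

# Second form: 100 * (posterior log odds - prior log odds).
woe_odds <- 100 * (log10(post_H / (1 - post_H)) -
                   log10(prior_H / (1 - prior_H)))

# The two forms agree up to floating-point error (about 60.2 centibans).
stopifnot(isTRUE(all.equal(woe_lr, woe_odds)))
```

Because the likelihood ratio here is 4, both expressions evaluate to 100 log10(4), whatever prior is chosen.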
Following Madigan, Mosurski and Almond (1997), all that is needed to
calculate the WOE is the marginal distribution for the hypothesis
variable at each time point. They also note that the definition is
somewhat problematic if the hypothesis variable is not binary. In
that case, they recommend partitioning the states into a
positive and a negative set. The pos and neg arguments describe that
partition. They can be any expression suitable for selecting columns
from the hist matrix.
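The calculation described above can be sketched in a few lines of base R. This is an illustration only, not the CPTtools source: woe_sketch and hist_demo are hypothetical names, the probabilities are made up, and the factor of 100 is an assumption that follows Good's centiban convention.

```r
# Hypothetical helper illustrating the idea; not the CPTtools implementation.
woe_sketch <- function(hist, pos, neg) {
  # Collapse the chosen states into one "positive" and one "negative" mass
  # per time point (drop = FALSE keeps single columns as matrices).
  p_pos <- rowSums(hist[, pos, drop = FALSE])
  p_neg <- rowSums(hist[, neg, drop = FALSE])
  # Log odds (base 10) at each time point.
  log_odds <- log10(p_pos / p_neg)
  # Incremental weights of evidence in centibans (assumed scaling).
  100 * diff(log_odds)
}

# Made-up history: a prior followed by two observations.
hist_demo <- matrix(
  c(0.2, 0.3, 0.5,
    0.4, 0.3, 0.3,
    0.6, 0.3, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Prior", "Task1", "Task2"),
                  c("High", "Medium", "Low"))
)

# The partition can be given by state names or by column numbers.
woe_by_name  <- woe_sketch(hist_demo, "High", c("Medium", "Low"))
woe_by_index <- woe_sketch(hist_demo, 1, 2:3)
```

With three rows in the history the result has two elements, one for each observation after the prior, and the two calls give the same values because they select the same columns.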
A vector of weights of evidence of length one less than the number of
rows of hist (i.e., the result of applying diff() to the vector of
log odds).
Russell Almond
Good, I. (1971) The probabilistic explication of information, evidence, surprise, causality, explanation and utility. In Proceedings of a Symposium on the Foundations of Statistical Inference. Holt, Rinehart and Winston, 108-141.
Madigan, D., Mosurski, K. and Almond, R. (1997) Graphical explanation in belief networks. Journal of Computational and Graphical Statistics, 6, 160-181.
## Not run: 
allcorrect <- parseProbVec("CorrectSequence.csv")
woeHist(allcorrect, c("High"), c("Medium", "Low"))
woeHist(allcorrect, 1:2, 3)
## End(Not run)