An ensemble method for right-censored data
Extension of random forests
Based on H. Ishwaran et al., “Random Survival Forests”, The Annals of Applied Statistics, 841-860, 2008
Citations in 2017: “Nature Genetics”, “Clinical Cancer Research”, “Nature Medicine”, “PloS one”, etc
Probability that a data point is left out of a bootstrap sample, i.e. is out-of-bag (OOB): \[
\left(1 - \frac{1}{n}\right)^n \approx \exp(-1) \approx 0.368
\]
Each bootstrap (training) sample will contain approximately \(63.2\%\) of the distinct original data points
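The out-of-bag fraction can be checked numerically. The following is a minimal sketch assuming NumPy; the function names `oob_probability` and `simulated_oob_fraction` are illustrative and do not come from the paper.

```python
import numpy as np

def oob_probability(n):
    """Exact probability that one fixed case is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def simulated_oob_fraction(n, seed=0):
    """Fraction of cases missing from a single bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=n)   # n draws with replacement from 0..n-1
    in_bag = np.unique(idx)            # distinct cases that were drawn
    return 1.0 - in_bag.size / n

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, oob_probability(n), simulated_oob_fraction(n))
```

As n grows, both quantities approach exp(-1), so roughly 63.2% of the distinct cases end up in-bag.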
Standard methods rely on restrictive assumptions, e.g. proportional hazards
Nonlinear effects are typically detected with ad hoc methods
Zhu and Kosorok, recursively imputed survival tree
Draw B bootstrap samples from the original data
Grow a survival tree for each bootstrap sample
Calculate a cumulative hazard function (CHF) for each tree
Average to obtain the ensemble CHF
Calculate prediction error for the ensemble CHF, using OOB data
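The outer loop of the steps above can be sketched as follows. This is only an illustration of the bootstrap, OOB bookkeeping, and averaging: as a loudly stated simplification, each "tree" here is a root-only stand-in (a Nelson-Aalen estimate on the whole bootstrap sample, with no splitting and no covariates), not a grown survival tree. The names `chf_on_grid` and `ensemble_chf` are hypothetical.

```python
import numpy as np

def chf_on_grid(times, events, grid):
    """Nelson-Aalen CHF of one sample, evaluated as a step function on grid."""
    event_times = np.unique(times[events])
    if event_times.size == 0:
        return np.zeros(len(grid))
    deaths = np.array([np.sum((times == t) & events) for t in event_times])
    at_risk = np.array([np.sum(times >= t) for t in event_times])
    chf = np.cumsum(deaths / at_risk)
    # Index of the last event time <= each grid point; -1 means "before the first event".
    pos = np.searchsorted(event_times, grid, side="right") - 1
    return np.where(pos >= 0, chf[np.clip(pos, 0, None)], 0.0)

def ensemble_chf(times, events, n_trees=100, seed=0):
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = times.size
    grid = np.unique(times[events])            # common time grid: observed event times
    rng = np.random.default_rng(seed)
    chfs = np.empty((n_trees, grid.size))
    oob = np.empty((n_trees, n), dtype=bool)
    for b in range(n_trees):
        idx = rng.integers(0, n, size=n)       # bootstrap sample b
        oob[b] = ~np.isin(np.arange(n), idx)   # cases left out of sample b
        # ASSUMPTION: root-only stand-in for a grown survival tree.
        chfs[b] = chf_on_grid(times[idx], events[idx], grid)
    return grid, chfs.mean(axis=0), oob        # ensemble CHF = average over trees
```

The boolean matrix `oob` records which cases each tree never saw; an OOB prediction for case i would average only over the rows b with `oob[b, i]` true.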
Survival trees are binary trees
Start at the root node
A terminal node should contain at least \(d_0 > 0\) unique events
CHF is estimated by a Nelson-Aalen estimator
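The Nelson-Aalen estimate sums, over the distinct event times up to t, the number of events at that time divided by the number of cases still at risk. A minimal sketch assuming NumPy, applied to the cases in one node; the function name `nelson_aalen` is illustrative.

```python
import numpy as np

def nelson_aalen(times, events):
    """Return (distinct event times, CHF values at those times).

    times  : observed times (event or censoring)
    events : 1/True if the time is an event, 0/False if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])   # sorted distinct event times
    # d_l: number of events at each distinct event time
    deaths = np.array([np.sum((times == t) & events) for t in event_times])
    # Y_l: number of cases still at risk just before that time
    at_risk = np.array([np.sum(times >= t) for t in event_times])
    return event_times, np.cumsum(deaths / at_risk)

if __name__ == "__main__":
    t, H = nelson_aalen([1, 2, 2, 3], [1, 1, 0, 1])
    print(t, H)
```

In the small example the at-risk counts are 4, 3, and 1, so the increments are 1/4, 1/3, and 1.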
Harrell’s concordance index, C-index (Harrell et al., 1982)
C-index specifically accounts for censoring
Form all possible pairs of cases over the data
Omit pairs whose shorter survival time is censored
Omit \((i, j)\) if \(T_i = T_j\) unless at least one is an event
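The pair rules above can be sketched in plain Python. The omission rules follow the text; the scoring of permissible pairs is not stated above, so the scoring used here (1 for a concordant pair, 0.5 for ties, with special handling of tied times) is an assumption based on the usual Harrell-style convention. The function name `c_index` is illustrative, and `risk` is assumed to be a predicted outcome where a larger value means worse expected survival.

```python
from itertools import combinations

def c_index(times, events, risk):
    """Harrell-style concordance index for right-censored data."""
    concordance = 0.0
    permissible = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] != times[j]:
            # Order the pair so that s has the shorter observed time.
            s, l = (i, j) if times[i] < times[j] else (j, i)
            if not events[s]:
                continue                      # shorter time is censored: omit
            permissible += 1
            if risk[s] > risk[l]:
                concordance += 1.0            # shorter survival, worse prediction
            elif risk[s] == risk[l]:
                concordance += 0.5            # tied predictions
        else:
            if not (events[i] or events[j]):
                continue                      # tied times, no event: omit
            permissible += 1
            if events[i] and events[j]:
                # ASSUMPTION: both events at the same time, tied predictions are ideal.
                concordance += 1.0 if risk[i] == risk[j] else 0.5
            else:
                # ASSUMPTION: exactly one event, it should have the worse prediction.
                d, c = (i, j) if events[i] else (j, i)
                concordance += 1.0 if risk[d] > risk[c] else 0.5
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return concordance / permissible
```

A value of 1 means every permissible pair is ordered correctly, 0.5 corresponds to random guessing, and the prediction error is commonly reported as 1 minus the C-index.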
breast - Wisconsin Prognostic Breast Cancer Data
breast10 - breast plus 10 uniform iid random variables
breast50 - breast plus 50 uniform iid random variables
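The noise-augmented variants can be built by appending independent uniform columns to the covariate matrix. A minimal sketch assuming NumPy; the function name `add_noise_variables` is hypothetical, the range [0, 1) of the uniform variables is an assumption (the text only says "uniform iid"), and the zero matrix in the usage lines is a placeholder, not the actual breast cancer data.

```python
import numpy as np

def add_noise_variables(X, k, seed=0):
    """Append k iid Uniform[0, 1) noise columns to the covariate matrix X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, size=(X.shape[0], k))  # ASSUMPTION: unit interval
    return np.hstack([X, noise])

if __name__ == "__main__":
    X = np.zeros((5, 3))                    # placeholder for the real covariates
    X10 = add_noise_variables(X, 10)        # analogue of "breast10"
    X50 = add_noise_variables(X, 50)        # analogue of "breast50"
    print(X10.shape, X50.shape)
```

Because the added columns carry no information about survival, such variants are useful for checking how much prediction error degrades as noise variables accumulate.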