Random Survival Forests

Aleksandr Savenkov

An ensemble method for right-censored data
Extension of random forests
Based on H. Ishwaran, et al., Random Survival Forests, The Annals of Applied Statistics, 841-860, 2008
Citations in 2017: “Nature Genetics”, “Clinical Cancer Research”, “Nature Medicine”, “PLoS ONE”, etc.

Two forms of randomization:
- random bootstrapped sample
- random subset of variables at each node split
Applications:
- regression and classification problems
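
The two forms of randomization listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular package: the function names `bootstrap_indices` and `candidate_variables`, the sizes `n` and `p`, and the choice `mtry = 3` are all assumptions made for the example.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (the in-bag sample for one tree)."""
    return [rng.randrange(n) for _ in range(n)]

def candidate_variables(p, mtry, rng):
    """Pick mtry distinct variable indices to consider at one node split."""
    return rng.sample(range(p), mtry)

rng = random.Random(42)   # fixed seed only so the sketch is reproducible
n, p, mtry = 10, 9, 3     # hypothetical sizes; mtry = 3 is an assumed choice

in_bag = bootstrap_indices(n, rng)
out_of_bag = sorted(set(range(n)) - set(in_bag))

print("in-bag rows:    ", sorted(in_bag))
print("out-of-bag rows:", out_of_bag)
print("variables tried at this node:", candidate_variables(p, mtry, rng))
```

Rows that never appear in `in_bag` form the out-of-bag set for that tree, and a fresh variable subset is drawn at every node rather than once per tree.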

Probability that a data point is left out of a bootstrap sample, i.e. is out-of-bag (OOB): \[ \left(1 - \frac{1}{n}\right)^n \approx \exp(-1) \approx 0.368 \]
Each bootstrap training sample will therefore contain approximately \(63.2\%\) of the distinct original observations
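
As a quick numerical check of the limit above, the sketch below evaluates \((1 - 1/n)^n\) for growing \(n\) and compares it with a single simulated bootstrap draw. The helper names `oob_probability` and `simulate_oob_fraction` are hypothetical and exist only for this illustration.

```python
import math
import random

def oob_probability(n):
    """Probability that one observation is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def simulate_oob_fraction(n, seed=0):
    """Fraction of observations missing from one simulated bootstrap sample."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n

for n in (10, 100, 1000, 10000):
    print(n, round(oob_probability(n), 4))

print("limit exp(-1):", round(math.exp(-1), 4))
print("simulated OOB fraction (n = 10000):", round(simulate_oob_fraction(10000), 4))
```

The exact value approaches \(\exp(-1)\) from below as \(n\) grows, and a single large bootstrap sample should leave out a fraction of observations close to that limit.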

methods with restrictive assumptions: proportional hazards
ad hoc methods to detect nonlinear effects
identifying interaction terms is problematic
- brute force for all two-way or three-way interactions
- subject knowledge to narrow the search
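
To give a sense of why a brute-force search over interactions becomes expensive, the sketch below counts the candidate two-way and three-way interaction terms for \(p\) covariates. The value \(p = 32\) is an assumption chosen to match the size of the breast data set described later in these slides, and `interaction_counts` is a hypothetical helper.

```python
import math

def interaction_counts(p):
    """Number of distinct two-way and three-way interaction terms among p covariates."""
    return {"two_way": math.comb(p, 2), "three_way": math.comb(p, 3)}

p = 32  # assumed number of covariates, for illustration only
print(interaction_counts(p))
```

With 32 covariates this gives 496 two-way and 4960 three-way candidate terms, before any of them has been fitted or tested.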

Survival trees are binary trees
Start at the root node
Split into two nodes using a survival criterion:
- log-rank: splits by maximization of the log-rank test statistic
- log-rank score: standardized log-rank statistic
- custom (requires some effort)
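
The log-rank splitting rule above can be illustrated with a small pure-Python sketch. It uses the standard two-sample log-rank statistic and scans the cut points of a single covariate. It is a simplified illustration under stated assumptions, not the randomForestSRC implementation: the function names are hypothetical, only one covariate is searched, and practical constraints such as a minimum number of events per daughter node are omitted.

```python
import math

def log_rank_statistic(times, events, in_left):
    """Absolute two-sample log-rank statistic for a left/right partition."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    numerator, variance = 0.0, 0.0
    for t in event_times:
        # Counts at event time t: at risk and deaths, overall and in the left node
        at_risk = sum(1 for s in times if s >= t)
        at_risk_left = sum(1 for s, l in zip(times, in_left) if s >= t and l)
        deaths = sum(1 for s, e in zip(times, events) if s == t and e)
        deaths_left = sum(1 for s, e, l in zip(times, events, in_left)
                          if s == t and e and l)
        # Observed minus expected deaths in the left node
        numerator += deaths_left - at_risk_left * deaths / at_risk
        if at_risk > 1:
            frac = at_risk_left / at_risk
            variance += frac * (1 - frac) * (at_risk - deaths) / (at_risk - 1) * deaths
    return abs(numerator) / math.sqrt(variance) if variance > 0 else 0.0

def best_split(x, times, events):
    """Cut point c (rule: x <= c goes left) that maximizes the log-rank statistic."""
    best_c, best_stat = None, -1.0
    for c in sorted(set(x))[:-1]:          # the largest value would leave the right node empty
        in_left = [v <= c for v in x]
        stat = log_rank_statistic(times, events, in_left)
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat

# Toy data, invented for illustration: covariate, follow-up time, event flag (1 = event)
x      = [1, 2, 3, 4, 5, 6]
times  = [2, 3, 4, 10, 12, 15]
events = [1, 1, 1, 1, 0, 1]
print(best_split(x, times, events))
```

In a forest, this search would be repeated at each node over the randomly chosen subset of variables, and the variable and cut point with the largest statistic would define the split.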

Form all possible pairs of cases over the data
Omit pairs whose shorter survival time is censored
Omit \((i, j)\) if \(T_i = T_j\) unless at least one is an event
For a permissible pair:
- for \(T_i \neq T_j\), count 1 if the shorter survival time has the worse predicted outcome, 0.5 if the predictions are tied
- for \(T_i = T_j\) and both are events, count 1 if the predictions are tied and 0.5 otherwise
- for \(T_i = T_j\) and not both are events, count 1 if the event has the worse predicted outcome and 0.5 otherwise
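
The pair-counting rules above can be written out as a short Python sketch. The final ratio, the sum of counts divided by the number of permissible pairs, is the concordance index (C-index) as defined in Ishwaran et al. (2008); it is not stated on this slide and is added here as an assumption about how the counts are used. The function name `concordance_index` and the convention that a larger `risk` value means a worse predicted outcome are also assumptions of this sketch.

```python
def concordance_index(times, events, risk):
    """C-index from pair counts; larger risk is taken to mean a worse predicted outcome."""
    concordance, permissible = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] != times[j]:
                # Identify which case has the shorter survival time
                short, long_ = (i, j) if times[i] < times[j] else (j, i)
                if not events[short]:
                    continue                      # shorter time is censored: omit
                permissible += 1
                if risk[short] > risk[long_]:
                    concordance += 1.0            # shorter time has the worse prediction
                elif risk[short] == risk[long_]:
                    concordance += 0.5            # tied predictions
            else:
                if not (events[i] or events[j]):
                    continue                      # tied times, no event: omit
                permissible += 1
                if events[i] and events[j]:
                    concordance += 1.0 if risk[i] == risk[j] else 0.5
                else:
                    ev, cen = (i, j) if events[i] else (j, i)
                    concordance += 1.0 if risk[ev] > risk[cen] else 0.5
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return concordance / permissible

# Toy data, invented for illustration (1 = event, 0 = censored)
times  = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
risk   = [0.9, 0.7, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

A value of 1 means every permissible pair is ordered correctly by the predictions, and 0.5 is what constant predictions would give on data without tied survival times.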

breast - Wisconsin Prognostic Breast Cancer Data
- 198 observations with 32 covariates
- outcome: R = recurrent, N = non-recurrent
- the first 30 covariates are computed from a digitized image
breast10 - breast plus 10 uniform iid random variables
breast50 - breast plus 50 uniform iid random variables
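
The construction of breast10 and breast50 can be sketched as follows. The placeholder matrix of zeros only stands in for the 198 x 32 covariate table and is not the real data. The helper name `add_noise_covariates`, the seed, and the Uniform(0, 1) range are assumptions, since the slides say only "uniform iid".

```python
import random

def add_noise_covariates(rows, k, seed=0):
    """Append k iid Uniform(0, 1) noise columns to every row of a covariate table."""
    rng = random.Random(seed)
    return [list(row) + [rng.random() for _ in range(k)] for row in rows]

# Placeholder for the covariate table: 198 observations, 32 covariates (zeros, not real data)
breast = [[0.0] * 32 for _ in range(198)]

breast10 = add_noise_covariates(breast, 10)
breast50 = add_noise_covariates(breast, 50)

print(len(breast10), len(breast10[0]))
print(len(breast50), len(breast50[0]))
```

Because the added columns are independent of the outcome by construction, they give a simple way to check how a method behaves as the number of uninformative covariates grows.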

H. Ishwaran, et al., Random Survival Forests, The Annals of Applied Statistics, 841-860, 2008
L. Breiman, Random Forests, Machine Learning, 5-32, 2001
H. Ishwaran, U. B. Kogalur, randomForestSRC: Random Forests for Survival, Regression, and Classification, R package version 2.5.0, 2017
T. Hothorn, et al., Survival Ensembles, Biostatistics, 355-373
R. Zhu, M. Kosorok, Recursively Imputed Survival Trees, JASA, 331-340, 2012