Consider a standard supervised learning problem. Common ways to combine multiple learners include:
- boosting
- random forests
- averaging
- voting
- stacking
Averaging is the most popular and fundamental combination method. The weighted average of \(N\) learners is
\[ H(x) = \sum_{i=1}^N w_i h_i(x) \quad \text{with} \ w_i \geq 0 \ \text{and} \ \sum_{i=1}^N w_i = 1 \]
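As a quick illustration of the formula, a weighted average of base-learner predictions in R (the prediction values and weights below are made up for illustration, not from the source):

```r
# Toy weighted average of N = 3 base-learner predictions on 3 samples.
# The prediction values and weights are illustrative only.
h <- cbind(h1 = c(0.9, 0.2, 0.6),
           h2 = c(0.8, 0.4, 0.5),
           h3 = c(0.7, 0.1, 0.9))
w <- c(0.5, 0.3, 0.2)        # w_i >= 0 and sum(w_i) == 1
H <- as.vector(h %*% w)      # H(x) = sum_i w_i * h_i(x)
H                            # 0.83 0.24 0.63
```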
Consider a classification problem with 10 samples, all with response 1: \(Y = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)\).
Take three classifiers M1, M2, M3, each with accuracy 0.7, combined by majority vote. Assuming their errors are independent:
P(all three correct) = \(0.7\times0.7\times0.7 = 0.343\)
P(exactly two correct) = \(3\times0.7\times0.7\times0.3 = 0.441\)
P(at least two correct) = \(0.343 + 0.441 = 0.784\), so the majority vote beats any single classifier.
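These probabilities can be read straight off the binomial distribution; a quick check in R:

```r
# Majority vote of 3 independent classifiers, each with accuracy 0.7,
# is correct when at least 2 of the 3 are correct.
p_all <- dbinom(3, size = 3, prob = 0.7)  # all three correct: 0.343
p_two <- dbinom(2, size = 3, prob = 0.7)  # exactly two correct: 0.441
p_majority <- p_all + p_two               # 0.784
p_majority
```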
This gain hinges on the independence assumption. If the classifiers are highly correlated they tend to make the same mistakes and voting adds little; the benefit is largest when they are less correlated.
Stacking is a general procedure where a learner is trained to combine the individual learners.
The individual learners are called the first-level learners; the model that combines them is called the meta-learner (or second-level learner).
Let \(Y\) be the vector of responses, \(X\) the matrix of covariates, and \(h_1, \dots, h_L\) the set of base learners. Stacking proceeds in two steps:
1. Create a matrix \(Z\) whose \(l\)-th column holds the cross-validated predictions of base learner \(h_l\) (cross-validation keeps the meta-learner from simply favoring base learners that overfit the training data).
2. Estimate a meta-learner \(H\) by regressing \(Y\) on \(Z\).
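A minimal base-R sketch of these two steps, assuming a numeric response and two simple base learners (a full and a single-covariate linear model); the data and all names here are illustrative:

```r
set.seed(1)
n <- 100
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- x$x1 - 0.5 * x$x2 + rnorm(n, sd = 0.3)

k_folds <- 5
fold <- sample(rep(1:k_folds, length.out = n))

# Step 1: Z holds cross-validated predictions of L = 2 base learners.
Z <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("h1", "h2")))
for (k in 1:k_folds) {
  tr <- fold != k
  h1 <- lm(y ~ ., data = cbind(y = y[tr], x[tr, ]))   # full linear model
  h2 <- lm(y ~ x1, data = cbind(y = y[tr], x[tr, ]))  # one-covariate model
  Z[!tr, 1] <- predict(h1, newdata = x[!tr, ])
  Z[!tr, 2] <- predict(h2, newdata = x[!tr, ])
}

# Step 2: estimate the meta-learner H on Z (here simply a linear model).
H <- lm(y ~ ., data = data.frame(y = y, Z))
coef(H)
```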
R packages that support building stacked ensembles:
- caret
- SuperLearner
- caretEnsemble
The caret package streamlines the model-building process with tools for:
- data splitting
- pre-processing
- feature selection
- model tuning using resampling
- variable importance estimation
Data: Sonar from mlbench:
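The preview below can be reproduced along these lines (a sketch; the original call is not shown in the source):

```r
library(mlbench)
data(Sonar)           # 208 observations, covariates V1-V60 plus Class (M/R)
Sonar[1:10, 1:6]      # first ten rows of the first six covariates
```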
## V1 V2 V3 V4 V5 V6
## 1 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986
## 2 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583
## 3 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280
## 4 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368
## 5 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649
## 6 0.0286 0.0453 0.0277 0.0174 0.0384 0.0990
## 7 0.0317 0.0956 0.1321 0.1408 0.1674 0.1710
## 8 0.0519 0.0548 0.0842 0.0319 0.1158 0.0922
## 9 0.0223 0.0375 0.0484 0.0475 0.0647 0.0591
## 10 0.0164 0.0173 0.0347 0.0070 0.0187 0.0671
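The calls below reference a train/test split (`training`, `testing`) and a shared `trainControl` object `my_control` that the source does not show. A plausible setup in the style of the caretEnsemble vignette (the 75% split and the resampling settings are assumptions):

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(107)
inTrain  <- createDataPartition(Sonar$Class, p = 0.75, list = FALSE)
training <- Sonar[inTrain, ]
testing  <- Sonar[-inTrain, ]

# Shared resampling indices so the base models can be stacked later.
my_control <- trainControl(
  method = "boot", number = 25,
  savePredictions = "final",
  classProbs = TRUE,
  index = createResample(training$Class, 25),
  summaryFunction = twoClassSummary
)
```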
set.seed(107)
model_list <- caretList(
  Class ~ ., data = training,
  trControl = my_control,
  methodList = c("rf", "gbm", "rpart")
)
Correlation between the models (rf and gbm are fairly strongly correlated, while rpart is only weakly correlated with either):
## rf gbm rpart
## rf 1.0000000 0.6447882 0.3250142
## gbm 0.6447882 1.0000000 0.1658579
## rpart 0.3250142 0.1658579 1.0000000
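With the fitted models, caret's `modelCor(resamples(model_list))` yields a matrix like the one above. As a self-contained base-R illustration of the same computation, applied to made-up per-resample accuracies:

```r
# Hypothetical per-resample accuracies for three models (illustrative only).
acc <- data.frame(
  rf    = c(0.81, 0.78, 0.84, 0.80, 0.79),
  gbm   = c(0.80, 0.77, 0.83, 0.78, 0.76),
  rpart = c(0.70, 0.73, 0.69, 0.74, 0.68)
)
round(cor(acc), 2)   # pairwise correlation of the models' resample metrics
```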
glm_ensemble <- caretStack(
  model_list,
  method = "glm",
  metric = "ROC",
  trControl = trainControl(
    method = "boot",
    number = 10,
    savePredictions = "final",
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)
model_preds holds each base model's predicted class probabilities on the test set (its construction is not shown here); the stacked ensemble's predictions are appended as an extra column:
model_preds$ensemble <- predict(glm_ensemble,
                                newdata = testing, type = "prob")
## rf gbm rpart ensemble
## M vs. R 0.945216 0.8919753 0.6566358 0.9552469
With all three base learners, the stacked ensemble (AUC 0.955) outperforms the best single model (rf, 0.945).
set.seed(107)
model_list <- caretList(
  Class ~ ., data = training,
  trControl = my_control,
  methodList = c("rf", "gbm")
)
glm_ensemble <- caretStack(
  model_list,
  method = "glm",
  metric = "ROC",
  trControl = trainControl(
    method = "boot",
    number = 10,
    savePredictions = "final",
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)
model_preds$ensemble <- predict(glm_ensemble,
                                newdata = testing, type = "prob")
## rf gbm ensemble
## M vs. R 0.945216 0.8919753 0.9429012
Using only the highly correlated rf and gbm, the ensemble (0.943) is actually slightly worse than rf alone (0.945).
set.seed(107)
model_list <- caretList(
  Class ~ ., data = training,
  trControl = my_control,
  methodList = c("rf", "rpart")
)
glm_ensemble <- caretStack(
  model_list,
  method = "glm",
  metric = "ROC",
  trControl = trainControl(
    method = "boot",
    number = 10,
    savePredictions = "final",
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)
model_preds$ensemble <- predict(glm_ensemble,
                                newdata = testing, type = "prob")
## rf rpart ensemble
## M vs. R 0.945216 0.6566358 0.9506173
Pairing rf with the weakly correlated rpart lifts the ensemble (0.951) above rf alone (0.945): diversity among the base learners, not just their individual accuracy, drives the gain from stacking.