Ensemble Methods/Techniques
  • are techniques that create multiple models and then combine them to produce improved results

Ensemble - Steps

  1. construction of a dictionary 𝐷 = {𝑇1(𝑋), 𝑇2(𝑋), …, 𝑇𝑀(𝑋)} of basis elements (weak learners) 𝑇𝑖(𝑋)
  2. fitting a model 𝑓(𝑋) = 𝛴1≤𝑖≤𝑚[𝛼𝑖·𝑇𝑖(𝑋)]

Ensemble - Types

Type

Description

Linear Regression

  • the ensemble consists of the coordinate functions 𝑇𝑖(𝑋) = 𝑋𝑖
  • the fitting is done by least squares

Voting & Averaging

  1. create multiple (classification/continuous) regression models of dataset
    1. each base model can be created using either:
      1. different subsets of dataset and same algorithm
      2. same dataset with different algorithms
      3. any other method
  2. compute output for each model and combine results
    1. majority voting
    2. weighted voting
    3. simple averaging
    4. weighted averaging

Stacking

basic idea:

  • train machine learning algorithms with training dataset and then generate a new dataset with these models
  • then this new dataset is used as input for the combiner machine learning algorithm

Bootstrapping Aggregation - Bagging

  • averages a noisy fitted function to many bootstrap samples, to reduce its variance
  • each model is given a resample dataset of original dataset with replacement
  • training/generation of each model can be done in parallel
  • prediction is done by aggregating the predictions of all trained models into a single prediction
Link to original

Random Forest

  • a fancier version of bagging
  • the ensemble consists of trees grown to bootstrapped versions of the data, with additional randomization at each split
  • fitting simply averages

Boosting (Additive Modeling)

  • boosting is an ensemble technique where a new model fits the residuals of pre-existing models, and then this new model is added to the pre-existing models. Models are added sequentially until no further improvements can be made
  • boosting incrementally builds an ensemble by training each model with the same dataset but where the weights of each dataset instance are adjusted according to the error of the last prediction. The main idea is to force the models to focus on the “hard” instances.
  • unlike bagging, boosting is a sequential method, so you can not use parallel operations here
  • combining simple models (high bias) to form a complex model (lower bias)
Link to original

  • the ensemble is grown in an adaptive fashion, then simply averaged at the end

Ensemble - Comparisons

Single Decision Tree <  Bagging < Random Forest < Boosting

Ensemble - Resources