Ensemble Methods/Techniques

are techniques that create multiple models and then combine them to produce improved results

Ensemble - Steps

construction of a dictionary 𝐷 = {𝑇₁(𝑋), 𝑇₂(𝑋), …, 𝑇_𝑀(𝑋)} of basis elements (weak learners) 𝑇_𝑖(𝑋)
fitting a model 𝑓(𝑋) = 𝛴_{1≤𝑖≤𝑚}[𝛼_𝑖·𝑇_𝑖(𝑋)]

Ensemble - Types

Type	Description
Linear Regression	the ensemble consists of the coordinate functions 𝑇_𝑖(𝑋) = 𝑋_𝑖 the fitting is done by least squares
Voting & Averaging	create multiple (classification/continuous) regression models of dataset each base model can be created using either: different subsets of dataset and same algorithm same dataset with different algorithms any other method compute output for each model and combine results majority voting weighted voting simple averaging weighted averaging
Stacking	basic idea: train machine learning algorithms with training dataset and then generate a new dataset with these models then this new dataset is used as input for the combiner machine learning algorithm
Bootstrapping Aggregation - Bagging	averages a noisy fitted function to many bootstrap samples, to reduce its variance each model is given a resample dataset of original dataset with replacement training/generation of each model can be done in parallel prediction is done by aggregating the predictions of all trained models into a single prediction Link to original
Random Forest	a fancier version of bagging the ensemble consists of trees grown to bootstrapped versions of the data, with additional randomization at each split fitting simply averages
Boosting (Additive Modeling)	boosting is an ensemble technique where a new model fits the residuals of pre-existing models, and then this new model is added to the pre-existing models. Models are added sequentially until no further improvements can be made boosting incrementally builds an ensemble by training each model with the same dataset but where the weights of each dataset instance are adjusted according to the error of the last prediction. The main idea is to force the models to focus on the “hard” instances. unlike bagging, boosting is a sequential method, so you can not use parallel operations here combining simple models (high bias) to form a complex model (lower bias) Link to original the ensemble is grown in an adaptive fashion, then simply averaged at the end

Ensemble - Comparisons

Single Decision Tree < Bagging < Random Forest < Boosting

Ensemble - Resources