
Ensemble Machine Learning Techniques

Before we understand what the Ensemble technique in Machine Learning is, let us first understand a few challenges associated with building an efficient and accurate Machine Learning model:

1. Bias - If our model is skewed towards certain data points, it cannot capture the true relationship in the data. Bias is a type of error in which the model's weights do not properly represent the data, leading to skewed, less accurate results and larger systematic errors.

"Higher the Bias less accurate will be our model".

2. Variance - The difference between a model's accuracy on the training data and its accuracy on the test data is called 'Variance'. A model with high variance has learned the noise in the training data along with the underlying pattern, so it performs well on the training data but poorly on the test data, i.e. it overfits.

"Higher the Variance less accurate will be our model".

3. Overfitting - When we train our model on a lot of data, there is a chance that it learns from noise and inaccurate data points in the dataset. The model then fails to categorize new data correctly because it has memorized too much noise and detail.

"Overfitting is when High Variance and Low Bias is present in a model".

4. Underfitting - In this situation the model fails to identify the underlying trend at all, destroying the accuracy of our Machine Learning model. This usually happens when the model is too simple or has not been trained on sufficient data points, for example trying to fit a Linear model to data with a non-linear relationship.

"Underfitting is when Low Variance and High Bias is present in a model".

To deal with these problems we need a model with "Low Variance and Low Bias", which is considered the ideal, good-fit model for making better predictions and drawing the best insights from our dataset. The point where both Variance and Bias are low is called the "Sweet spot between a simple model and a complex model", and we can find it using Regularization, Bagging, Boosting and Stacking.


Ensemble Machine Learning is a technique of combining the predictions of multiple Machine Learning models trained on the same training dataset to achieve better accuracy. It is one of the most efficient ways of building a Machine Learning model. The individual models being combined are referred to as Classifiers (or learners), and they come in two types:

1. Strong Classifiers: predictions obtained from a model that performs really well on the given regression or classification task.

2. Weak Classifiers: predictions obtained from a model that performs only slightly better than random chance. We can use a single weak learner or combine several weak learners.

We can divide Ensemble learning techniques into Simple and Advanced Ensemble learning techniques (a short code sketch of the simple techniques follows the list below): -
1. Simple
        a. Max Voting.
        b. Averaging.
        c. Weighted Averaging.
2. Advanced
        a. Stacking.
        b. Blending.
        c. Bagging.
        d. Boosting.
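
As a minimal sketch of the simple techniques (using NumPy, with made-up predictions from three hypothetical classifiers), Max Voting picks the most common class label, while Averaging and Weighted Averaging combine predicted probabilities:

import numpy as np

# Made-up class predictions from three hypothetical classifiers for 5 samples.
preds = np.array([[0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0]])

# Max Voting: pick the most frequent label across models for each sample
# (rounding the mean works for binary labels with an odd number of models).
max_vote = np.round(preds.mean(axis=0)).astype(int)

# Made-up predicted probabilities of class 1 from the same three models.
probas = np.array([[0.9, 0.8, 0.6, 0.2, 0.7],
                   [0.6, 0.9, 0.4, 0.1, 0.8],
                   [0.7, 0.7, 0.8, 0.3, 0.4]])

# Averaging: simple mean of the predicted probabilities.
avg = probas.mean(axis=0)

# Weighted Averaging: give better models a larger say (weights are arbitrary here).
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = np.average(probas, axis=0, weights=weights)

print(max_vote, avg, weighted_avg)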

Regularization: 

Let's begin by understanding the Regularization techniques used to improve the accuracy of a model and to control overfitting scenarios (basically controlling high Variance). Regularization does not necessarily improve performance on the training data, but it can improve how well the model generalizes to new, unseen data.

The main Regularization techniques are: -

1. L2 penalty/L2 Norm - Ridge Regression.

2. L1 penalty/L1 Norm - Lasso Regression.

3. Elastic Net - a hybrid of the L1 and L2 penalties.

4. Dropout.

We can use Ridge and Lasso with any algorithm that involves weighted parameters, including Neural Networks, whereas Dropout is used primarily for Neural Networks such as ANNs, CNNs, DNNs and RNNs to moderate their learning.


1. Ridge Regularization (L2 Norm): The main purpose of Ridge Regularization is to find a new line that does not completely overfit the training data, which means we introduce a small amount of Bias into how the new line fits the data. By accepting a slightly worse fit with some bias, we get a significant improvement in long-term predictions and accuracy on the test data.

Fig: - Ridge Regression Formulation

Ridge Regression penalizes the sum of squared coefficients. Here we add the Ridge Regression penalty (Lambda * slope^2) to the Least Squares line (Regression line) and minimize the combined cost, so that the Ridge Regression line still fits most of the data points. The value of 'Lambda' is chosen by trying positive values from 0 upwards and picking the best one using Cross Validation.

If Lambda = 0 then Ridge Regression gives the same line as the Least Squares Regression line. The larger the value of Lambda, the less steep the slope of the fitted line becomes.
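
A minimal sketch with scikit-learn on made-up data; RidgeCV plays the role of the Cross Validation step described above, trying several values of the penalty (called alpha in scikit-learn, Lambda here) and keeping the one that generalizes best:

import numpy as np
from sklearn.linear_model import RidgeCV

# Toy data: a noisy linear relationship, purely for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

# Try several penalty strengths; larger alpha shrinks the slopes more.
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
ridge.fit(X, y)

print("Best Lambda (alpha):", ridge.alpha_)
print("Shrunken coefficients:", ridge.coef_)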


2. Lasso Regularization (L1 Norm): penalizes the absolute values of the coefficients. Lasso Regression is very similar to Ridge Regression, but along with improving predictions it also helps perform feature selection, because the Lasso penalty can shrink coefficients all the way down to zero.


Fig: Lasso Regression Formulation
Lasso Regression can exclude useless variables from an equation, making the final equation simpler and easier to calculate - this is effectively feature selection. Hence Lasso Regression is best to use when we have lots of useless parameters.
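
A minimal sketch with scikit-learn on made-up data where only the first two of five features actually matter; the Lasso penalty shrinks the coefficients of the useless features all the way to zero:

import numpy as np
from sklearn.linear_model import Lasso

# Toy data: only the first two features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The useless features end up with coefficients of exactly zero,
# which is how Lasso performs its variable selection.
print("Coefficients:", lasso.coef_)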

3. Elastic Net Regression: is a hybrid of the Lasso Regression and Ridge Regression techniques. It is used when multiple features are correlated with each other.

Fig: Elastic Net Regression
1. When Lambda1 and Lambda2 are both 0, it reduces to the Least Squares Regression line.
2. If Lambda1 = 0, then it is Ridge Regression.
3. If Lambda2 = 0, then it is Lasso Regression.
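
A minimal sketch with scikit-learn; note that scikit-learn's ElasticNet uses a single alpha plus an l1_ratio to mix the two penalties rather than two separate Lambdas:

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# l1_ratio = 1.0 behaves like Lasso, l1_ratio close to 0 behaves like Ridge.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print("Elastic Net coefficients:", enet.coef_)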

4. Dropout: Dropout is a Regularization technique we mostly use while building Neural Network models, as it prevents complex co-adaptations between neurons. In Neural Networks, fully connected layers are the most prone to overfitting the training dataset. Using Dropout, during training we randomly drop the units of the specified layers with probability 1-p (where p is a keep-probability parameter that needs to be tuned), so each training step works with a reduced network; at test time the full network is used.

Fig: Dropout Regularization
Dropout can also speed up each training pass (fewer active units) and forces the network to learn more robust internal representations for identifying random and unseen data.
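
A minimal sketch of where Dropout sits in a network (using PyTorch as one possible framework; the layer sizes and dropout rate are arbitrary, and note that PyTorch's p is the drop probability, i.e. 1-p in the keep-probability convention above):

import torch
import torch.nn as nn

# A small fully connected network with Dropout after the hidden layer.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden activation is zeroed with probability 0.5 during training
    nn.Linear(64, 2),
)

model.train()              # training mode: Dropout is active, a reduced network is used
x = torch.randn(8, 20)
out_train = model(x)

model.eval()               # evaluation/test mode: Dropout is disabled, the full network is used
out_test = model(x)
print(out_train.shape, out_test.shape)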

Now let us discuss some of the advanced Ensemble Machine Learning Techniques: -

1. Bagging: short for Bootstrap Aggregating, is a Machine Learning technique designed to reduce Variance and improve the accuracy and stability of algorithms used in statistical classification and regression. Bagging reduces variance and prevents overfitting of the training model.
Bagging works by creating multiple bootstrap samples of the training dataset, fitting a model (typically a decision tree) on each sample independently, and then combining the resulting weak classifiers in parallel using a deterministic averaging or voting process.

Fig: - Bagging Ensemble technique

Random Forest is an extension of the Bagging Ensemble method, in which each decision tree considers only a few randomly chosen features at each split rather than all of the features together.
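
A minimal sketch with scikit-learn on a made-up dataset: BaggingClassifier trains many trees on bootstrap samples and combines their votes, while RandomForestClassifier additionally restricts each split to a random subset of features:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy classification dataset, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: each base learner (a decision tree by default) is trained on a
# bootstrap sample, and the predictions are combined in parallel by voting.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Random Forest: bagging of trees plus a random subset of features at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

print("Bagging accuracy:      ", cross_val_score(bagging, X, y, cv=5).mean())
print("Random Forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())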

2. Boosting: is an Ensemble Machine Learning technique used to reduce Bias (and also Variance) in supervised learning by adding models sequentially to the ensemble, where each newly added model attempts to correct the errors made by the previous models. Adding more and more models keeps reducing the error, at least up to the point where the ensemble begins to overfit the training data.
Fig: - Boosting Ensemble Technique
AdaBoost, Gradient Boosting and XGBoost are some of the most successful Boosting algorithms used in Ensemble Machine Learning.
AdaBoost uses very simple trees that make a single decision on one input variable before making a prediction; these short trees are referred to as Decision Stumps.
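
A minimal sketch with scikit-learn's AdaBoostClassifier on made-up data; by default each weak learner is a decision stump (a depth-1 tree), and each new stump focuses on the examples the previous ones got wrong:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Models are added sequentially; each new decision stump is weighted towards
# the training examples that the previous stumps misclassified.
ada = AdaBoostClassifier(n_estimators=200, random_state=0)

print("AdaBoost accuracy:", cross_val_score(ada, X, y, cv=5).mean())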

We can see the comparison between Bagging and Boosting as follows:
Fig: - Bagging vs Boosting

3. Stacking: involves combining the predictions of multiple Machine Learning models trained on the same training dataset, just like Bagging and Boosting, but instead of simple voting another Machine Learning model (the meta-model) is used to learn how best to combine the Base Models' predictions. The meta-model is often a linear model such as Linear Regression for regression or Logistic Regression for classification, but we can still use any Machine Learning model.
Fig: - Stacking Ensemble Technique
Stacking mainly involves K-fold Cross Validation (or a Train-Test split) to generate out-of-fold predictions from each Base model; these predictions are stored, the Base models are then re-trained on the entire training dataset, and the Meta-model is trained on the stored predictions to learn which model to trust under which circumstances.
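
A minimal sketch with scikit-learn's StackingClassifier on made-up data; the base models here (a Random Forest and an SVM) are arbitrary choices, and Logistic Regression is the meta-model that learns how to combine their out-of-fold predictions:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Base models generate out-of-fold predictions via internal K-fold cross validation (cv=5);
# the Logistic Regression meta-model is then trained on those predictions
# to learn which base model to trust under which circumstances.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)

print("Stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())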

Check out my Github repository on the XGBoost Algorithm, one of the most widely used Ensemble Learning methods - https://github.com/HarishSingh2095/Ensemble-Learning_XGBoost-Algorithm


References:

For more references on Ensemble Learning you can visit https://machinelearningmastery.com/ensemble-machine-learning-with-python-7-day-mini-course/


Also do visit - https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205 for further references.


You can connect with me on - 

LinkedIn - https://www.linkedin.com/in/harish-singh-166b63118

Twitter - @harisshh_singh

Gmail - hs02863@gmail.com

 

End notes:

Hope this was useful for beginners in the field of Data Science. 

See you guys until next time.

