Machine Learning Algorithms Every Data Scientist Should Know

Types Of ML Algorithms

There are a huge number of ML algorithms out there. Trying to classify them leads to the distinction being made in types of the training procedure, applications, the latest advances, and some of the standard algorithms used by ML scientists in their daily work. There is a lot to cover, and we shall proceed as given in the following listing:

1. Statistical Algorithms

2. Classification

3. Regression

4. Clustering

5. Dimensionality Reduction

6. Ensemble Algorithms

7. Deep Learning

8. Reinforcement Learning

9. AutoML (Bonus)

1. Statistical Algorithms**Statistics** is necessary for every machine learning expert. Hypothesis testing and confidence intervals are some of the many statistical concepts to know if you are a data scientist. Here, we consider here the phenomenon of **overfitting. **Basically, overfitting occurs when an ML model learns so many features of the training data set that the generalization capacity of the model on the test set takes a toss. The tradeoff between performance and overfitting is well illustrated by the following illustration:

Overfitting — from WikipediaHere, the black curve represents the performance of a classifier that has appropriately classified the dataset into two categories. Obviously, training the classifier was stopped at the right time in this instance. The green curve indicates what happens when we allow the training of the classifier to ‘overlearn the features’ in the training set. What happens is that we get an accuracy of 100%, but we lose out on performance on the test set because the test set will have a feature boundary that is usually similar but definitely not the same as the training set. This will result in a high error level when the classifier for the green curve is presented with new data. How can we prevent this?

## Keep reading with a 7-day free trial

Subscribe to ** The Blog of Thomas Cherickal** to keep reading this post and get 7 days of free access to the full post archives.