How do random and stratified splitting differ from each other?
Random splitting is a technique that divides the data according to a given ratio by selecting samples at random, irrespective of their class labels. As a result, the split can end up with a disproportionate share of samples from a single class, which skews the class distribution seen during training, hurts accuracy, and can give rise to problems such as overfitting.
In such cases, stratified sampling helps overcome these difficulties. By drawing samples from each class in proportion to its share of the data, every class is represented equally well in the split, so the overall accuracy estimate is not distorted.
Let us apply random splitting to the Iris dataset in an 80:20 ratio and look at the resulting class distribution.
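A minimal sketch of this step, assuming scikit-learn is available and using its built-in copy of the Iris dataset via load_iris and train_test_split (the random_state value is arbitrary and only fixes the shuffle for reproducibility):

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Plain random split: 80% train / 20% test, no regard for the class labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

# Count how many samples of each species landed in the training set.
print(Counter(iris.target_names[y_train]))
```

Because the rows are shuffled without looking at the labels, the counts per species in the training set will generally be unequal.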
Here we can see that the species (the target variable) are not chosen in equal proportions for training. Instead, we can take 80% of each class from the data. In this case, to obtain a reliable accuracy estimate and avoid overfitting, stratified sampling is used.
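A sketch of the stratified version, assuming the same variables as above: passing the labels through the stratify parameter of train_test_split preserves the class proportions in both splits.

```python
# Stratified split: the 80:20 ratio is applied within each species,
# so every class contributes the same proportion of samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42, stratify=y
)

# Each of the three Iris species should now appear about 40 times
# in the training set (80% of its 50 samples).
print(Counter(iris.target_names[y_train]))
```

With the stratified split, the training set mirrors the class balance of the full dataset, which is exactly the property random splitting does not guarantee.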