Python random state in splitting dataset

Asked by Nabhashahin in Python, asked on Mar 9, 2021

I'm kind of new to Python. Can anyone tell me why we set random_state to zero when splitting data into a train and a test set?

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

I have seen situations like this where random state is set to one!

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

What effect does random_state have in cross-validation as well?

Answered by Nabha shahin

Here is how random_state works in train_test_split:


random_state can be 0, 1, or any other integer. The particular value does not matter; what matters is that you use the same value on every run if you want your results to be comparable across runs of the code. By the way, I have seen random_state=42 used in various official scikit-learn examples.

The random_state parameter seeds the internal random number generator, which decides how the data are split into train and test indices in your case.

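If random_state is None or np.random, scikit-learn uses the global RandomState singleton behind np.random, so the split changes from run to run unless you have seeded NumPy's global generator yourself.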

If random_state is an integer, then it is used to seed a new RandomState object.
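The seed-handling rule described above can be tried directly. This is a small sketch, assuming scikit-learn and NumPy are installed; it uses scikit-learn's public helper sklearn.utils.check_random_state, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seeds a brand-new RandomState object each time.
rng_a = check_random_state(0)
rng_b = check_random_state(0)

# Two generators seeded with the same integer shuffle indices identically.
same_permutation = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))
print(same_permutation)

# None does not create a seeded generator; it hands back the global
# generator behind np.random, whose state is normally different on each run.
global_rng = check_random_state(None)
print(type(global_rng).__name__)
```

The first print shows whether the two permutations match; with the same integer seed they do.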

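Fixing random_state makes your results reproducible, so you can check and compare them across runs: the same sequence of random numbers is generated each time you run the code, and therefore the data are split the same way each time. The same applies to cross-validation splitters that shuffle, such as KFold with shuffle=True: a fixed random_state gives the same folds on every run.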
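A minimal sketch of this reproducibility, assuming scikit-learn and NumPy are installed. The toy arrays X and y and the helper split_test_labels are hypothetical, made up for illustration, and are not part of the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def split_test_labels(seed):
    """Return the test-set labels chosen by a split with the given random_state."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test.tolist()

# The same seed selects the same test samples on every call.
first_run = split_test_labels(0)
second_run = split_test_labels(0)
print(first_run == second_run)

# A different seed is free to select different samples.
print(first_run, split_test_labels(1))
```

Whether you pick 0, 1, or 42 makes no difference to the quality of the split; the point is only that the chosen value stays fixed between runs.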


