Python random state in splitting dataset
I'm fairly new to Python. Can anyone tell me why we set random_state to zero when splitting data into a train and test set?
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
I have also seen cases like this, where random_state is set to one:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)
Also, what effect does random_state have in cross-validation?
The answer below explains what random_state does in train_test_split:
random_state can be 0, 1, or any other integer. Keep it at the same value if you want to reproduce and validate your results over multiple runs of the code. By the way, I have seen random_state=42 used in various official scikit-learn examples.
The random_state parameter seeds the internal random number generator, which in your case decides how the data is split into train and test indices.
If random_state is None or np.random, the global RandomState singleton used by np.random is returned, so the split will generally differ from run to run.
If random_state is an integer, then it is used to seed a new RandomState object.
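The two cases above can be checked directly with scikit-learn's check_random_state helper from sklearn.utils, which is what turns a random_state argument into a generator. This is only a minimal sketch; the variable names are illustrative and not from the original post.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seeds a brand-new RandomState, so two generators built
# from the same seed produce identical streams of random numbers.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
same_stream = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))

# None hands back NumPy's global RandomState, which is not reseeded
# for you, so its output normally varies between runs.
global_rng = check_random_state(None)

# An existing RandomState instance is passed through unchanged.
own_rng = np.random.RandomState(42)
passed_through = check_random_state(own_rng) is own_rng

print("same seed, same stream:", same_stream)
print("instance passed through:", passed_through)
```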
This lets you check and validate your results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers, and therefore the same train/test split, is generated each time you run the code.
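To see the reproducibility for yourself, here is a small sketch that splits a toy dataset several times. The data and the helper split_test_labels are made up for illustration; only train_test_split and its arguments come from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def split_test_labels(seed):
    """Return the test-set labels produced by a given random_state."""
    _, _, _, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    return y_test

# The same seed gives the same split on every call and every run.
repeatable = np.array_equal(split_test_labels(0), split_test_labels(0))

print("same seed gives same split:", repeatable)
print("test labels with random_state=0:", split_test_labels(0))
print("test labels with random_state=1:", split_test_labels(1))
```

The particular integer carries no meaning. 0, 1, and 42 are all just seeds, and each one fixes one particular shuffle. The same idea applies to cross-validation: splitters such as KFold accept random_state too, and it only has an effect there when shuffle=True.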