How can I get a train/test/validation split?

Asked by DavidEDWARDS in Data Science, Asked on Feb 13, 2023

How can I randomly split a data matrix and its corresponding label vector into X_train, X_test, X_val, y_train, y_test and y_val with scikit-learn? As far as I know, sklearn.model_selection.train_test_split can only split data into two sets, not three.
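The premise is easy to check. A single train_test_split call returns one train/test pair for each array you pass in, so passing X and y yields four arrays and never six. The toy X and y below are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 10 samples with 2 features, plus 10 labels
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One call produces exactly two pieces (train, test) per input array
parts = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_test, y_train, y_test = parts

print(len(parts))                   # 4: a train/test pair for X and one for y
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```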

Answered by Dominic Poole

To get a train/test/validation split, you can simply call sklearn.model_selection.train_test_split twice: first split the data into train and test sets, then split the resulting train set again into train and validation sets. Something like this:


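 from sklearn.model_selection import train_test_split

 # First split: hold out 20% of the data as the test set
 X_train, X_test, y_train, y_test = train_test_split(
     X, y, test_size=0.2, random_state=1)
 # Second split: take 25% of the remaining 80% as validation (0.25 x 0.8 = 0.2)
 X_train, X_val, y_train, y_val = train_test_split(
     X_train, y_train, test_size=0.25, random_state=1)  # result: 60/20/20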
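If you want to state the validation and test fractions relative to the whole dataset, the second test_size has to be rescaled to what is left after the first split, i.e. val_size / (1 - test_size), which is where 0.25 comes from in the 0.25 x 0.8 = 0.2 comment. A minimal sketch, assuming you wrap the two calls in a helper; train_val_test_split is a hypothetical name, not a scikit-learn function.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def train_val_test_split(X, y, val_size=0.2, test_size=0.2, random_state=None):
    """Hypothetical helper (not part of scikit-learn): a three-way split
    built from two train_test_split calls.

    val_size and test_size are fractions of the *whole* dataset.
    """
    if val_size + test_size >= 1:
        raise ValueError("val_size + test_size must be less than 1")

    # First call: hold out the test set from the full data.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    # Second call: rescale the validation fraction to the remaining data,
    # e.g. 0.2 / (1 - 0.2) = 0.25 of the rest, which is 0.2 of the whole.
    relative_val = val_size / (1 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=relative_val, random_state=random_state)

    return X_train, X_val, X_test, y_train, y_val, y_test


if __name__ == "__main__":
    # Toy data (assumed): 100 samples with 2 features and 100 labels
    X = np.arange(200).reshape(100, 2)
    y = np.arange(100)
    X_tr, X_va, X_te, y_tr, y_va, y_te = train_val_test_split(X, y, random_state=1)
    print(X_tr.shape, X_va.shape, X_te.shape)
```

Passing the same random_state to both calls keeps the whole split reproducible; for classification you could also pass stratify=y to the first call and stratify=y_rest to the second to preserve class proportions in every subset.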
