A user is trying to run logistic regression on a dataset (6 categorical columns, 1 integer column) using scikit-learn. He is following the scikit-learn documentation, but fitting the model on the data raises the following ValueError.

Asked by SnehaPandey in Data Science, asked on Nov 30, 2019
Answered by Sneha Pandey

#Below are the variables of my data.

train_data.dtypes

    OUTPUT

    TripType                 category
    VisitNumber              category
    Weekday                  category
    Upc                      category
    ScanCount                int64
    DepartmentDescription    category
    FinelineNumber           category
    dtype: object


from sklearn import linear_model

X = train_data.loc[:, 'VisitNumber':'FinelineNumber']
Y = train_data.loc[:, 'TripType':'TripType']

logreg = linear_model.LogisticRegression()
logreg.fit(X, Y)

**ValueError: could not convert string to float: GROCERY DRY GOODS**

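The error occurs because the dataset contains categorical variables whose values are strings. Logistic regression needs numeric inputs, so category names such as GROCERY DRY GOODS cannot be used directly as features. Each categorical variable has to be converted into encoded vectors (dummy variables) first. A variable with 6 categories needs 5 dummy variables, because the remaining category is implied when all five are 0.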
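The conversion described above can be sketched with scikit-learn's own tools. This is a minimal sketch, not the asker's actual setup: the small `train_data` frame below is made up for illustration and only reuses some column names from the question. It assumes the categorical columns can be one-hot encoded with `OneHotEncoder` inside a `ColumnTransformer`, while the integer column `ScanCount` is passed through unchanged.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for train_data; the values are illustrative only.
train_data = pd.DataFrame({
    "TripType": ["8", "8", "30", "30", "8", "30"],
    "Weekday": ["Friday", "Friday", "Sunday", "Sunday", "Friday", "Sunday"],
    "ScanCount": [1, 2, 1, -1, 1, 3],
    "DepartmentDescription": [
        "GROCERY DRY GOODS", "DAIRY", "SHOES",
        "SHOES", "DAIRY", "GROCERY DRY GOODS",
    ],
})

categorical_cols = ["Weekday", "DepartmentDescription"]
X = train_data[["Weekday", "ScanCount", "DepartmentDescription"]]
y = train_data["TripType"]

# One-hot encode the string columns; leave the integer column as it is.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

# The encoder runs before the classifier, so fit() only ever sees numbers.
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
model.fit(X, y)
predictions = model.predict(X)
```

By default `OneHotEncoder` keeps one column per category rather than dropping one. Its output is sparse, which may help with columns that have many distinct values, such as Upc or VisitNumber.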

For example, a gender column with two categories can be replaced by a single dummy variable that takes the values 0 and 1.
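A gender column like the one mentioned above could be turned into a dummy variable with `pandas.get_dummies`. This is only a sketch: the frame `df`, its column names, and its values are hypothetical and not taken from the question's data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [23, 31, 45, 52],
})

# drop_first=True keeps k-1 dummies per encoded column (here 2 - 1 = 1).
# The first category in sorted order ("Female") becomes the reference level.
encoded = pd.get_dummies(df, columns=["gender"], drop_first=True, dtype=int)
print(encoded)
```

The result has an `age` column and a single `gender_Male` column holding 1 for Male and 0 for Female.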

