A user ran the following code and received an error

Asked by GayatriJaiteley in Data Science, asked on Nov 9, 2019

Answered by Gayatri Jaiteley

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame()
data['A'] = ['a','a','b','a']
data['B'] = ['b','b','a','b']
data['C'] = [0, 0, 1, 0]
data['Class'] = ['n','n','y','n']

tree = DecisionTreeClassifier()
tree.fit(data[['A','B','C']], data['Class'])

The user received the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 154, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 377, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: b

How can this be fixed?

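Scikit-learn (sklearn) is the most widely used machine learning library in Python, but its estimators, including DecisionTreeClassifier, do not accept string-valued categorical variables directly. During fit, the feature matrix is converted to a float array, and the string 'b' in columns A and B cannot be converted, which raises the ValueError shown above. There are two common ways to handle categorical variables: encode the strings as numbers with LabelEncoder, or convert them to dummy (one-hot) variables.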
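The dummy-variable route can be sketched on the data from the question as follows. This is only an illustrative sketch, not part of the original answer: it assumes pandas' get_dummies is an acceptable way to build the dummy columns, and the names X and predictions, as well as random_state=0, are chosen here for the example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Data from the question
data = pd.DataFrame({
    'A': ['a', 'a', 'b', 'a'],
    'B': ['b', 'b', 'a', 'b'],
    'C': [0, 0, 1, 0],
    'Class': ['n', 'n', 'y', 'n'],
})

# Replace the string columns A and B with one 0/1 column per category;
# the numeric column C is passed through unchanged.
X = pd.get_dummies(data[['A', 'B', 'C']], columns=['A', 'B'])

# random_state is an assumption added here only for reproducibility.
tree = DecisionTreeClassifier(random_state=0)

# String class labels in y are accepted as-is; only the features need encoding.
tree.fit(X, data['Class'])
predictions = tree.predict(X)
```

Because every feature column is now numeric, fit no longer has to convert a string to a float.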

For example, with LabelEncoder:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])
le.transform(["tokyo", "tokyo", "paris"])

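This transforms the strings into integer codes. LabelEncoder sorts the classes alphabetically, so "amsterdam" becomes 0, "paris" becomes 1 and "tokyo" becomes 2, and the transform call returns array([2, 2, 1]). The operation can also be inverted to get the original strings back: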

list(le.inverse_transform([2, 2, 1]))

This converts the numbers back to the original strings: ['tokyo', 'tokyo', 'paris'].
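Applied to the DataFrame from the question, the same idea might look like the sketch below. It is an illustration under assumptions rather than the answerer's exact code: one LabelEncoder per string column is kept in a dictionary called encoders (a name chosen here), and random_state=0 is added only for reproducibility.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Data from the question
data = pd.DataFrame({
    'A': ['a', 'a', 'b', 'a'],
    'B': ['b', 'b', 'a', 'b'],
    'C': [0, 0, 1, 0],
    'Class': ['n', 'n', 'y', 'n'],
})

X = data[['A', 'B', 'C']].copy()

# Fit one encoder per string column and keep it, so that the integer
# codes can later be mapped back with inverse_transform.
encoders = {}
for column in ['A', 'B']:
    encoders[column] = LabelEncoder()
    X[column] = encoders[column].fit_transform(X[column])

# random_state is an assumption added here only for reproducibility.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, data['Class'])
predictions = tree.predict(X)
```

Note that integer codes impose an arbitrary order on the categories. The scikit-learn documentation describes LabelEncoder as intended for target labels and offers OrdinalEncoder and OneHotEncoder for feature columns, so for features with many unordered categories the dummy-variable approach is often the safer choice.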
