Explain how to implement a decision tree in Python
First, we import the libraries and load the dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Fraud_check.csv")
data.head()
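If Fraud_check.csv is not at hand, a small synthetic stand-in lets you run the remaining steps. This is a sketch under loud assumptions. The column names are the ones this walkthrough uses (Undergrad, Marital.Status, TaxableIncome, Urban), while the category values, the row count and the income range are hypothetical. The rows are random, so the model trained on them will not be meaningful.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL stand-in for Fraud_check.csv: random rows that use the column
# names referenced in this walkthrough. The values are synthetic, not real data.
rng = np.random.default_rng(0)
n = 600  # hypothetical row count
data = pd.DataFrame({
    "Undergrad": rng.choice(["YES", "NO"], size=n),
    "Marital.Status": rng.choice(["Single", "Married", "Divorced"], size=n),
    # Incomes kept inside (0, 100000] so the bins used in the next step cover them
    "TaxableIncome": rng.integers(10_000, 100_000, size=n),
    "Urban": rng.choice(["YES", "NO"], size=n),
})
print(data.head())
```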
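Next, we discretize the target. We first check the maximum TaxableIncome so that the upper bin edge covers every value. Then we label incomes up to 30000 as 'Risky' and higher incomes as 'Good':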
data.TaxableIncome.max()
data['TaxableIncome'] = pd.cut(data['TaxableIncome'], [0,30000,100000], labels=['Risky','Good'])
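The bins passed to `pd.cut` are right-closed by default, so they become the intervals (0, 30000] and (30000, 100000]. The toy incomes below are hypothetical. They show that exactly 30000 is labelled 'Risky', and that a value of 0 or anything above 100000 becomes NaN. This is why checking the maximum first matters. Passing `include_lowest=True` would also include the 0 edge.

```python
import pandas as pd

# Hypothetical incomes chosen to probe the bin edges [0, 30000, 100000]
incomes = pd.Series([0, 15_000, 30_000, 30_001, 99_999, 120_000])
labels = pd.cut(incomes, [0, 30000, 100000], labels=["Risky", "Good"])

# 0 and 120000 fall outside the right-closed intervals and become NaN
print(labels.tolist())
```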
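We also one-hot encode the categorical columns, drop the originals, and join the dummy columns back on:
data_dummies = pd.get_dummies(data[['Undergrad','Marital.Status','Urban']])
data_new = data.drop(['Undergrad','Marital.Status','Urban'], axis=1)
data_new = pd.concat([data_new, data_dummies], axis=1)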
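To see what `pd.get_dummies` produces, here is a three-row, hypothetical frame that uses the same categorical column names. Each category value becomes its own 0/1 column, named `<column>_<value>`. Recent pandas versions return these as booleans, which scikit-learn accepts. Passing `drop_first=True` would drop one redundant column per feature, although tree models do not require it.

```python
import pandas as pd

# Hypothetical rows using the walkthrough's categorical column names
df = pd.DataFrame({
    "Undergrad": ["YES", "NO", "YES"],
    "Marital.Status": ["Single", "Married", "Single"],
    "Urban": ["NO", "YES", "YES"],
    "TaxableIncome": ["Good", "Risky", "Good"],
})
cats = ["Undergrad", "Marital.Status", "Urban"]

# One indicator column per (column, value) pair
dummies = pd.get_dummies(df[cats])
# Replace the original categorical columns with their indicators
encoded = pd.concat([df.drop(columns=cats), dummies], axis=1)
print(encoded.columns.tolist())
```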
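Now we check for missing values (pd.cut yields NaN for incomes outside the bins), separate the features from the target, and split the dataset into training and test sets:
data_new.isnull().sum()
# Every column except the target is a feature
X = data_new.drop('TaxableIncome', axis=1)
y = data_new['TaxableIncome']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)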
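When one class is much rarer than the other, a purely random split can leave the test set with too few examples of it. `train_test_split` accepts `stratify=y`, which keeps the class proportions equal in both parts. The labels below are hypothetical (80 'Good', 20 'Risky'). They show that a 20% stratified test set gets exactly 4 'Risky' rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 "Good", 20 "Risky"
y = np.array(["Good"] * 80 + ["Risky"] * 20)
X = np.arange(100).reshape(-1, 1)  # dummy single-feature matrix

# stratify=y preserves the 80/20 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print((y_test == "Risky").sum(), len(y_test))
```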
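Now we fit the model on the training set only and predict the test set. Fitting on the full X would let the model see the test rows and inflate the evaluation scores.
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)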
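At each node, `DecisionTreeClassifier` picks the feature and threshold that most reduce impurity. With its default `criterion="gini"`, the impurity is 1 minus the sum of squared class proportions. The sketch below reproduces that idea for a single numeric feature, trying midpoints between sorted values. It is an illustration only, not scikit-learn's actual implementation. The helper names `gini` and `best_threshold` and the toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Try each midpoint between consecutive unique values of one feature and
    return (threshold, weighted Gini) for the split x <= t versus x > t."""
    values = np.unique(x)  # sorted unique values
    best_t, best_w = None, np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        left, right = y[x <= t], y[x > t]
        # Child impurities weighted by the fraction of samples in each child
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if w < best_w:
            best_t, best_w = float(t), float(w)
    return best_t, best_w

# Hypothetical toy data: one feature that cleanly separates two classes
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array(["Risky", "Risky", "Risky", "Good", "Good", "Good"])
print(gini(y))               # 0.5
print(best_threshold(x, y))  # (6.5, 0.0)
```

A full tree repeats this search over every feature and then recurses into each child until the nodes are pure or a stopping rule such as `max_depth` is reached.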
Finally, we evaluate the model on the test set:
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))
print('\n')
print(classification_report(y_test,predictions))
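In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, in the order given by `labels` (sorted by default). The hand-made labels below are hypothetical. They make the layout easy to check by counting: one true 'Risky' row was predicted 'Risky' and one was missed.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, only to illustrate the layout
y_true = ["Good", "Good", "Good", "Risky", "Risky", "Good"]
y_pred = ["Good", "Risky", "Good", "Risky", "Good", "Good"]

# Rows = true class, columns = predicted class, both ordered Risky then Good
cm = confusion_matrix(y_true, y_pred, labels=["Risky", "Good"])
print(cm)
print(accuracy_score(y_true, y_pred))
```

Here `cm` is `[[1, 1], [1, 3]]`. Recall for 'Risky' is 1 / (1 + 1) = 0.5 and accuracy is 4/6, which are the kinds of figures `classification_report` summarizes per class.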