Compare logistic regression with decision tree along with a case study in Python

947    Asked by varshaChauhan in Data Science , Asked on Nov 5, 2019
Answered by varsha Chauhan

First we import the data

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')

X = dataset.iloc[:, [2, 3]].values

y = dataset.iloc[:, 4].values

Now we split and scale the data

# Splitting the dataset into the Training set and Test set

from sklearn.cross_validation import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)

Now we fit to the logistic classification and decision tree classification

# Fitting Logistic Regression and Decision tree to the Training set

from sklearn.linear_model import LogisticRegression

from sklearn.tree import DecisionTreeClassifier

classifier1 = LogisticRegression(random_state = 0)

classifier 1.fit(X_train, y_train)

classifier2=DecisionTreeClassifier()

classifier 2.fit(X_train, y_train)

Now we predict both the models

# Predicting the Test set results

y_pred1 = classifier1.predict(X_test)

y_pred2 = classifier2.predict(X_test)


 Now we evaluate both the model

# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred1)

cm = confusion_matrix(y_test, y_pred2)

Compared to the confusion matrix, Decision tree performed better than logistic regression



Your Answer

Answer (1)

As anyone with a background in data analysis, this theme definitely caught my attention.


Looking at the code, it is extremely good to see the use of famous libraries like numpy, pandas, and scikit-learn. Importing the dataset and performing the necessary data splitting and scaling are essential steps in any machine learning project.

I see that logistic regression and decision tree classifiers are being trained on the dataset. It's always valuable to explore multiple algorithms and compare their performance. In this case, it's mentioned that decision trees outperformed logistic regression based on the confusion matrix evaluation. It would be interesting to dive deeper into the evaluation metrics and understand why decision trees yielded better results in this particular scenario.

Overall, it's fascinating to witness the power of Python in implementing machine learning models and conducting insightful analysis. I'm excited to further explore this topic and learn more about the nuances of logistic regression and decision trees in different contexts.

1 Year

Interviews

Parent Categories