How to implement one-hot encoding in Python and in R?

Asked by NiharikaDeshpande in Data Science on Nov 30, 2019

One-hot encoding transforms a categorical variable into a set of binary dummy variables, one per category, so that the data can be fed to a model.
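
As a quick illustration of what the transformation produces, here is a minimal sketch using pandas' get_dummies on a small made-up column (the toy values are an assumption for illustration, not taken from Data.csv):

```python
import pandas as pd

# Hypothetical toy data; the column name "Country" mirrors the example dataset.
df = pd.DataFrame({"Country": ["France", "Spain", "Germany", "Spain"]})

# One 0/1 column per category. dtype=int keeps the output as integers
# regardless of the default dtype of the installed pandas version.
dummies = pd.get_dummies(df["Country"], dtype=int)
print(dummies)
```

Each row ends up with a single 1 in the column of its category and 0 everywhere else.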

In Python

To implement one-hot encoding in Python, we start with the following:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Data.csv')

X = dataset.iloc[:, :-1].values

y = dataset.iloc[:, 3].values

The code above imports the dataset from 'Data.csv' and separates the features and the target variable into X and y, respectively.
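
The split can be tried without the CSV file. The sketch below uses a small hypothetical DataFrame as a stand-in for Data.csv; the Age and Salary columns and all values are assumptions made only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for Data.csv: three feature columns and one target column.
toy = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

X_toy = toy.iloc[:, :-1].values  # every column except the last
y_toy = toy.iloc[:, -1].values   # the last column (index 3 in this frame)

print(X_toy.shape, y_toy.shape)
```

Using -1 for the target column instead of a fixed index keeps the split valid if the number of feature columns changes.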

One-hot encoding is applied to the independent variables in X; here the categorical column is column 0.

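from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22,

# so ColumnTransformer selects column 0 and passes the remaining columns through.

ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder = 'passthrough', sparse_threshold = 0)

X = ct.fit_transform(X)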
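
To see what the encoded matrix looks like, here is a minimal sketch on a small hypothetical array. It assumes a scikit-learn version that provides ColumnTransformer (0.20 or later) and uses it to one-hot encode column 0 only; the array contents and the name X_demo are made up for illustration:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature matrix: column 0 is categorical, column 1 is numeric.
X_demo = np.array([["France", 44], ["Spain", 27], ["Germany", 30]], dtype=object)

ct_demo = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])],  # encode column 0 only
    remainder="passthrough",             # keep the other columns unchanged
    sparse_threshold=0,                  # always return a dense array
)
X_encoded = ct_demo.fit_transform(X_demo)
print(X_encoded)
```

The encoded columns come first, ordered by sorted category name (France, Germany, Spain), followed by the passed-through numeric column.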

In R

To implement one-hot encoding in R, we do the following:

# Importing the dataset

dataset = read.csv('Data.csv')

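Here, the dataset 'Data.csv' is imported, and we will encode its categorical variables as factors. R's modelling functions such as lm() expand factors into dummy variables automatically, so no separate encoder object is needed.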

# Encoding categorical data

dataset$Country = factor(dataset$Country, levels = c('France', 'Spain', 'Germany'), labels = c(1, 2, 3))

dataset$Purchased = factor(dataset$Purchased, levels = c('No', 'Yes'), labels = c(0, 1))
