How to implement one-hot encoding in Python and in R?
One-hot encoding transforms a categorical variable into a set of binary dummy variables, one per category, so that the data can be used to fit a model.
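To make the idea concrete before turning to the libraries, here is a minimal sketch in plain Python. The list `countries` is a made-up example (it is not read from Data.csv), and the column order is simply the alphabetical order of the categories, which is an assumption of this sketch.

```python
# Hypothetical example values, not taken from Data.csv.
countries = ["France", "Spain", "Germany", "Spain"]

# One column per distinct category, in alphabetical order.
categories = sorted(set(countries))

# Each row has a 1 in the column of its own category and 0 elsewhere.
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in countries
]

for value, row in zip(countries, one_hot):
    print(value, row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from.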
In Python
To implement one-hot encoding in Python, we first import the libraries and the dataset:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
In the above code, the dataset 'Data.csv' is imported, and the features and the target variable are separated into X and y respectively.
One-hot encoding is applied to the independent variables X; here the categorical feature is the first column of X.
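from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# The categorical_features argument was removed from OneHotEncoder in scikit-learn 0.22.
# ColumnTransformer applies the encoder to column 0 and passes the other columns through unchanged.
# sparse_threshold=0 forces a dense array, as .toarray() did before.
ct = ColumnTransformer([('onehot', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)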
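As an alternative to scikit-learn, pandas can do the same encoding directly on a DataFrame with `pd.get_dummies`. The sketch below uses a small made-up DataFrame in place of Data.csv; the column names `Country` and `Age` and their values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
})

# Replace the Country column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)

print(encoded)
```

The untouched columns are kept first, and the new dummy columns are named after the original column and the category, for example `Country_France`.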
In R
To implement one-hot encoding in R, we need to do the following.
# Importing the dataset
dataset = read.csv('Data.csv')
Here, the dataset 'Data.csv' is imported, and we will encode its categorical variables, Country and Purchased.
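# Encoding categorical data
dataset$Country = factor(dataset$Country, levels = c('France', 'Spain', 'Germany'), labels = c(1, 2, 3))
dataset$Purchased = factor(dataset$Purchased, levels = c('No', 'Yes'), labels = c(0, 1))
# factor() only marks a column as categorical; model.matrix() is one way to
# expand it into explicit one-hot (dummy) columns, one per level
country_dummies = model.matrix(~ Country - 1, data = dataset)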