Explain some of the preprocessing steps required before building a model in R.
Preprocessing is needed when the data contain categorical variables, have missing values, or need to be normalized so that features are on a comparable scale.
First, read in the data that these steps will be applied to.
# Importing the dataset
dataset = read.csv('Data.csv')
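Before changing anything, it can help to check which columns actually need work. The sketch below is only an illustration: it uses a small made-up data frame in place of Data.csv (the values are invented, not taken from the original file), with the same column names used in this answer.

```r
# Hypothetical stand-in for Data.csv; the values are made up for illustration
dataset = data.frame(
  Country   = c('France', 'Spain', 'Germany', 'Spain', 'Germany'),
  Age       = c(44, 27, 30, NA, 40),
  Salary    = c(72000, 48000, NA, 61000, 58000),
  Purchased = c('No', 'Yes', 'No', 'No', 'Yes'),
  stringsAsFactors = FALSE
)

# Column types: character columns are candidates for encoding
str(dataset)

# Number of missing values per column
na_counts = colSums(is.na(dataset))
print(na_counts)
```

Columns with a non-zero count need imputation (or removal), and character columns need encoding before most models can use them.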
Next, handle missing values by replacing each NA with the mean of the non-missing values in its column.
# Taking care of missing data
dataset$Age = ifelse(is.na(dataset$Age),
                     ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                     dataset$Age)
dataset$Salary = ifelse(is.na(dataset$Salary),
                        ave(dataset$Salary, FUN = function(x) mean(x, na.rm = TRUE)),
                        dataset$Salary)
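When many columns have gaps, repeating the ifelse call per column gets tedious. A minimal sketch of the same mean imputation applied to every numeric column is shown here; impute_mean is a hypothetical helper name and the data are made up for illustration.

```r
# Made-up numeric data with gaps (illustrative values only)
df = data.frame(Age    = c(44, 27, NA, 40),
                Salary = c(72000, NA, 54000, 61000))

# Hypothetical helper: replace NA in a numeric vector
# with the mean of the observed values
impute_mean = function(x) {
  x[is.na(x)] = mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every numeric column
num_cols = sapply(df, is.numeric)
df[num_cols] = lapply(df[num_cols], impute_mean)
print(df)
```

Mean imputation is simple but shrinks the variance of the column; the median is a common substitute when the column has outliers.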
Categorical variables can be encoded as factors, mapping each category to a label:
# Encoding categorical data
dataset$Country = factor(dataset$Country,
                         levels = c('France', 'Spain', 'Germany'),
                         labels = c(1, 2, 3))
dataset$Purchased = factor(dataset$Purchased,
                           levels = c('No', 'Yes'),
                           labels = c(0, 1))
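Finally, normalize (standardize) the numeric features. Here training_set and test_set are assumed to come from an earlier train/test split of dataset, which is not shown. scale() accepts only numeric data, so the factor columns are left out.
# Feature scaling (numeric columns only)
training_set[, c('Age', 'Salary')] = scale(training_set[, c('Age', 'Salary')])
test_set[, c('Age', 'Salary')] = scale(test_set[, c('Age', 'Salary')])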
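The scaling step relies on a training_set and test_set that the answer never creates. The following is a minimal sketch under stated assumptions: the data are made up, the split is a plain 80/20 random split done with base R's sample(), and the test set is scaled with the training set's mean and standard deviation, which is a common alternative to scaling each set on its own statistics.

```r
set.seed(123)  # make the random split reproducible

# Made-up numeric data for illustration
dataset = data.frame(
  Age    = c(44, 27, 30, 38, 40, 35, 52, 48, 50, 37),
  Salary = c(72000, 48000, 54000, 61000, 58000,
             63000, 83000, 79000, 67000, 52000)
)

# 80/20 random split into training and test rows
train_idx    = sample(nrow(dataset), size = 0.8 * nrow(dataset))
training_set = dataset[train_idx, ]
test_set     = dataset[-train_idx, ]

# Standardize the training set; scale() stores the column
# means and standard deviations it used as attributes
train_scaled = scale(training_set)
centers = attr(train_scaled, 'scaled:center')
spreads = attr(train_scaled, 'scaled:scale')

# Reuse the training statistics on the test set
test_scaled = scale(test_set, center = centers, scale = spreads)

print(round(colMeans(train_scaled), 10))
print(test_scaled)
```

Reusing the training statistics keeps both sets on the same scale and avoids letting information from the test set influence the transformation.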