Explain some of the preprocessing steps required before building a model in R.

Asked by Sunitapandey in Data Science, asked on Nov 5, 2019
Answered by Sunita pandey

Preprocessing is required when the data contain categorical variables, have missing values, or need normalization before a model can be fitted.

Let us first read the data on which these preprocessing steps will be performed.

# Importing the dataset

dataset = read.csv('Data.csv')
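Before changing anything, it usually helps to check which columns actually contain missing values. The sketch below is only an illustration: since Data.csv itself is not shown here, it builds a small stand-in data frame with the same column names used in this answer, and every value in it is made up.

```r
# Hypothetical stand-in for Data.csv; the values are invented for illustration
dataset <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany", "France"),
  Age       = c(44, 27, 30, 38, NA, 35),
  Salary    = c(72000, 48000, 54000, NA, 61000, 58000),
  Purchased = c("No", "Yes", "No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Column types at a glance
str(dataset)

# Number of missing values in each column
na_counts <- colSums(is.na(dataset))
print(na_counts)
```

With real data, the same two calls can be run directly on the data frame returned by read.csv().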

Next, we handle missing data by replacing each missing value with the mean of its column.

# Taking care of missing data

dataset$Age = ifelse(is.na(dataset$Age),
                     ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                     dataset$Age)

dataset$Salary = ifelse(is.na(dataset$Salary),
                        ave(dataset$Salary, FUN = function(x) mean(x, na.rm = TRUE)),
                        dataset$Salary)
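When there are many numeric columns, repeating the same statement for each one gets tedious. One possible alternative, sketched here under the assumption that mean imputation is wanted for every numeric column, is a small helper applied in one pass. The name impute_mean is hypothetical, and the data frame is a made-up example rather than the real Data.csv.

```r
# Made-up example data with one missing value per column
dataset <- data.frame(
  Age    = c(44, 27, NA, 38),
  Salary = c(72000, NA, 54000, 61000)
)

# Hypothetical helper: replace NA with the mean of the non-missing values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every numeric column at once
numeric_cols <- sapply(dataset, is.numeric)
dataset[numeric_cols] <- lapply(dataset[numeric_cols], impute_mean)

print(dataset)
```

Non-numeric columns are left untouched because they are filtered out by is.numeric().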

We can also encode categorical variables as factors, giving each category a numeric label:

# Encoding categorical data

dataset$Country = factor(dataset$Country,
                         levels = c('France', 'Spain', 'Germany'),
                         labels = c(1, 2, 3))

dataset$Purchased = factor(dataset$Purchased,
                           levels = c('No', 'Yes'),
                           labels = c(0, 1))
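Two details of factor() are worth checking after encoding: the labels are stored as character levels rather than numbers, and any value that is not listed in levels silently becomes NA. The short sketch below illustrates both on a made-up vector in which "Italy" is deliberately left out of the levels.

```r
# Made-up input; "Italy" is intentionally not among the declared levels
country <- c("France", "Spain", "Germany", "Italy")

encoded <- factor(country,
                  levels = c("France", "Spain", "Germany"),
                  labels = c(1, 2, 3))

# The labels are kept as character levels: "1" "2" "3"
print(levels(encoded))

# Underlying integer codes; the unlisted value maps to NA
codes <- as.integer(encoded)
print(codes)

# Count values lost because their category was not listed
n_unmatched <- sum(is.na(encoded))
print(n_unmatched)
```

Running table(dataset$Country, useNA = "ifany") on the real data before and after encoding is a quick way to confirm that no category was dropped.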

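We can normalize (standardize) the numeric columns with scale(). This assumes the data have already been split into training_set and test_set. Because scale() accepts only numeric data, it is applied to the Age and Salary columns rather than to the whole data frame, which now contains factors.

# Feature scaling (numeric columns only)

training_set[, c('Age', 'Salary')] = scale(training_set[, c('Age', 'Salary')])

test_set[, c('Age', 'Salary')] = scale(test_set[, c('Age', 'Salary')])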
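The answer does not show how training_set and test_set are created, so the following is a minimal sketch under stated assumptions: a random 80/20 split done with base R's sample(), an arbitrary seed, and made-up Age and Salary values. It also shows a common variant in which the test set is scaled with the mean and standard deviation learned from the training set, so that both sets end up on the same scale.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Made-up numeric data standing in for the Age and Salary columns
dataset <- data.frame(
  Age    = c(44, 27, 30, 38, 40, 35, 48, 50, 37, 29),
  Salary = c(72000, 48000, 54000, 61000, 63000,
             58000, 79000, 83000, 67000, 52000)
)

# Hold out 20% of the rows as the test set
test_idx     <- sample(nrow(dataset), size = 2)
training_set <- dataset[-test_idx, ]
test_set     <- dataset[test_idx, ]

num_cols <- c("Age", "Salary")

# Standardize the training columns and keep the parameters scale() used
scaled_train <- scale(training_set[, num_cols])
train_center <- attr(scaled_train, "scaled:center")
train_scale  <- attr(scaled_train, "scaled:scale")
training_set[, num_cols] <- scaled_train

# Reuse the training mean and standard deviation for the test set
test_set[, num_cols] <- scale(test_set[, num_cols],
                              center = train_center,
                              scale  = train_scale)

print(colMeans(training_set[, num_cols]))
```

After this step the training columns have mean 0 and standard deviation 1, while the test columns are expressed in the same units without using any information from the test rows.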


