Explain with a case study how to perform text analysis using R.

Asked on Jan 10, 2020
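In this case study, we perform text analysis on a dataset of restaurant reviews. Each row holds the text of a review (Review) and a label (Liked), and the goal is to predict the label from the text.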

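First, we read the data. The file is tab-separated, so we use read.delim. Setting quote = '' turns off quote handling, so quotation marks inside a review are read as ordinary characters, and stringsAsFactors = FALSE keeps the reviews as character strings.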

# Importing the dataset

dataset_original = read.delim('Restaurant_Reviews.tsv', quote = '', stringsAsFactors = FALSE)
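
To see what these arguments do, here is a small sketch that writes three made-up reviews to a temporary file and reads them back the same way. The reviews and the temporary file are invented for illustration; only the column names Review and Liked come from the code in this answer.

```r
# Three made-up reviews (not from the real file), one containing double quotes
lines <- c("Review\tLiked",
           "Wow... Loved this place.\t1",
           "The \"fresh\" fish was not fresh.\t0",
           "Crust is not good.\t0")
tmp <- tempfile(fileext = ".tsv")
writeLines(lines, tmp)

# Same arguments as in the case study
toy <- read.delim(tmp, quote = '', stringsAsFactors = FALSE)
unlink(tmp)

# Review is a character column, Liked is an integer column
str(toy)
```

The double quotes in the second review survive as part of the text, which is the behaviour we want for free-form reviews.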

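Next, we clean the text after installing and loading the required libraries. The reviews are converted to lowercase; numbers, punctuation and stop words are removed; words are reduced to their stems; and extra whitespace is collapsed.

# Cleaning the texts

# install.packages('tm')

# install.packages('SnowballC')

library(tm)

library(SnowballC)

corpus = VCorpus(VectorSource(dataset_original$Review))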

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removeNumbers)

corpus = tm_map(corpus, removePunctuation)

corpus = tm_map(corpus, removeWords, stopwords())

corpus = tm_map(corpus, stemDocument)

corpus = tm_map(corpus, stripWhitespace)
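
To make the effect of these steps concrete, the sketch below applies rough base-R equivalents to a single made-up review. It only approximates what tm does: the three-word stop word list is hypothetical, the final trimws call is added for display, and stemming is left out because base R has no stemmer.

```r
# A made-up review, used only for illustration
review <- "Wow... Loved this place, 10 out of 10!"

# A tiny hypothetical stop word list; tm's stopwords() returns a much longer one
toy_stopwords <- c("this", "out", "of")

x <- tolower(review)                               # lowercase
x <- gsub("[[:digit:]]+", "", x)                   # drop numbers
x <- gsub("[[:punct:]]+", "", x)                   # drop punctuation
stop_pattern <- paste0("\\b(", paste(toy_stopwords, collapse = "|"), ")\\b")
x <- gsub(stop_pattern, "", x, perl = TRUE)        # drop stop words
x <- trimws(gsub("[[:space:]]+", " ", x))          # collapse whitespace

print(x)  # "wow loved place"
```

In the real pipeline, stemDocument would additionally shorten words to their stems; for example, "loved" would typically become "love".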

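Now we will create the bag of words model, in which each review becomes a row and each remaining word becomes a column holding the number of times it occurs in that review. removeSparseTerms(dtm, 0.999) then drops the rarest terms, that is, those absent from about 99.9% or more of the reviews.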

# Creating the Bag of Words model

dtm = DocumentTermMatrix(corpus)

dtm = removeSparseTerms(dtm, 0.999)

dataset = as.data.frame(as.matrix(dtm))

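dataset$Liked = dataset_original$Liked

# Encoding the target as a factor so that randomForest performs classification rather than regression

dataset$Liked = factor(dataset$Liked)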
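
As a rough illustration of what a document-term matrix looks like, the sketch below builds one by hand in base R for three made-up, already cleaned reviews. It is not how tm builds its matrix internally, and the filter at the end (keep words found in more than one document) is only a simple stand-in for the idea behind removeSparseTerms.

```r
# Three made-up, already cleaned reviews
docs <- c("wow love place", "crust good", "love crust")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per word, cells hold word counts
toy_dtm <- t(vapply(tokens,
                    function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
                    integer(length(vocab))))
colnames(toy_dtm) <- vocab
print(toy_dtm)

# Keep only the words that occur in more than one document
doc_freq <- colSums(toy_dtm > 0)
toy_dtm_small <- toy_dtm[, doc_freq > 1, drop = FALSE]
print(colnames(toy_dtm_small))  # "crust" "love"
```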

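Now we will split the dataset into a training set and a test set. sample.split keeps the proportion of each label approximately the same in both sets.

# Splitting the dataset into the Training set and Test set

# install.packages('caTools')

library(caTools)

# Fixing the seed (123 is an arbitrary choice) makes the split reproducible

set.seed(123)

split = sample.split(dataset$Liked, SplitRatio = 0.8)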

training_set = subset(dataset, split == TRUE)

test_set = subset(dataset, split == FALSE)
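
The idea of a split that preserves the label proportions can be sketched in base R as follows. This is a simplified illustration on made-up labels, not the actual implementation of sample.split; the names labels and is_train are invented for the example.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Made-up labels: 50 negative (0) and 50 positive (1) reviews
labels <- rep(c(0, 1), times = c(50, 50))

# Within each class, mark 80% of the rows as training rows
is_train <- logical(length(labels))
for (cls in unique(labels)) {
  idx <- which(labels == cls)
  is_train[sample(idx, size = round(0.8 * length(idx)))] <- TRUE
}

# Both sets keep the 50/50 balance of the full data
print(table(labels[is_train]))
print(table(labels[!is_train]))
```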

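Now we will fit a Random Forest classifier to the training set, using every column except Liked as a predictor.

# Fitting Random Forest Classification to the Training set

# install.packages('randomForest')

library(randomForest)

# Liked is the last column, so -ncol(training_set) selects all the word columns

classifier = randomForest(x = training_set[-ncol(training_set)],

                          y = training_set$Liked,

                          ntree = 10)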

After fitting, we will predict the labels of the reviews in the test set.

# Predicting the Test set results

y_pred = predict(classifier, newdata = test_set[-ncol(test_set)])

Finally, we evaluate the model using a confusion matrix, which counts the actual labels (rows) against the predicted labels (columns).

# Making the Confusion Matrix

cm = table(test_set$Liked, y_pred)
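
A confusion matrix is usually summarised with a few metrics. The sketch below computes accuracy, precision and recall on made-up labels, assuming that 1 is the positive class; the vectors actual and predicted are invented and are not results from the restaurant data.

```r
# Made-up labels for ten reviews (not real model output)
actual    <- factor(c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1))

# Rows are actual labels, columns are predicted labels
toy_cm <- table(actual, predicted)
print(toy_cm)

accuracy  <- sum(diag(toy_cm)) / sum(toy_cm)          # share of correct predictions
precision <- toy_cm["1", "1"] / sum(toy_cm[, "1"])    # correct among predicted 1
recall    <- toy_cm["1", "1"] / sum(toy_cm["1", ])    # correct among actual 1

print(c(accuracy = accuracy, precision = precision, recall = recall))
```

The same three lines can be applied to cm once the model has been fitted and the test set predicted.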

