What preprocessing steps are required before performing retrieval?
Several preprocessing steps are applied to the text before a query is run. The common ones are:
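a) Tokenizing - It splits the text of a document into smaller units called tokens, typically individual words.
b) Stopword removal - Stopwords are common words, such as articles and determiners ("a", "the", "this"), that carry little meaning on their own and are removed to make the query more effective.
c) Stemming and lemmatization - These techniques reduce words to their base form, so that variants such as "searching" and "searched" match the same term, which increases the relevance of the query results.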
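
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the stopword list is a tiny hand-picked sample, and `naive_stem` is a toy suffix stripper written for this example, not the Porter stemmer or a real lemmatizer. All function names here are hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "for"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenizing, stopword removal, and stemming in sequence."""
    return [naive_stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("The user is searching for indexed documents"))
# prints ['user', 'search', 'index', 'document']
```

The same `preprocess` function would be applied to both the documents and the query, so that a query for "searching" can match a document containing "searched". In practice a library stemmer or lemmatizer would likely replace `naive_stem`.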