What is tokenization and how does it work?

Asked by NehaTambe in Data Science on Nov 27, 2019
Answered by Neha Tambe

Tokenization is the process of breaking the original text into component pieces called tokens. Tokens have a variety of useful attributes and methods. One important thing to know is that tokens are pieces of the original text: they are not converted to their base forms, as happens with lemmatization and stemming.
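To make the definition concrete, here is a minimal sketch in plain Python. The answer does not name a library, so this is an illustrative assumption rather than any library's actual tokenizer. It uses a hypothetical rule (a token is either a run of word characters or a single punctuation mark) and records each token's character offsets as a simple stand-in for the "attributes" mentioned above. Every token is an exact slice of the input text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str   # the exact characters taken from the original string
    start: int  # offset of the token's first character in the original string
    end: int    # offset one past the token's last character


# Hypothetical rule for this sketch: a run of word characters, or a single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[Token]:
    """Split text into tokens that are verbatim slices of the input."""
    return [Token(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = "The cats were running quickly!"
    for tok in tokenize(sentence):
        print(tok.text, tok.start, tok.end)
```

Note that "cats" and "running" come out exactly as written. A lemmatizer would typically map them to "cat" and "run", which is the distinction the answer draws.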

The flowchart below shows how tokenization works.
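A common rule-based procedure, similar in spirit to the one spaCy's documentation describes but greatly simplified here, first splits the text on whitespace and then peels punctuation prefixes and suffixes off each chunk, checking a table of special cases so that abbreviations like "N.Y." stay intact and contractions like "Let's" split correctly. The sketch below assumes small, hypothetical rule tables (`PREFIXES`, `SUFFIXES`, `EXCEPTIONS`); real tokenizers use much larger, language-specific rules.

```python
# Hypothetical, deliberately tiny rule tables for illustration only.
PREFIXES = set('"\'([$')
SUFFIXES = set('"\')].,!?;:')
EXCEPTIONS = {
    "Let's": ["Let", "'s"],
    "don't": ["do", "n't"],
    "N.Y.": ["N.Y."],  # keep the abbreviation's periods attached
}


def rule_based_tokenize(text: str) -> list[str]:
    """Split on whitespace, then peel punctuation off each chunk's ends."""
    tokens: list[str] = []
    for chunk in text.split():
        prefixes: list[str] = []
        suffixes: list[str] = []
        # Strip one character at a time until the remainder is a special
        # case or has no more splittable prefix or suffix.
        while chunk and chunk not in EXCEPTIONS:
            if chunk[0] in PREFIXES:
                prefixes.append(chunk[0])
                chunk = chunk[1:]
            elif chunk[-1] in SUFFIXES:
                suffixes.insert(0, chunk[-1])
                chunk = chunk[:-1]
            else:
                break
        tokens.extend(prefixes)
        if chunk:
            # A special case may expand into several tokens.
            tokens.extend(EXCEPTIONS.get(chunk, [chunk]))
        tokens.extend(suffixes)
    return tokens


if __name__ == "__main__":
    print(rule_based_tokenize("Let's go to N.Y.!"))
```

Running it prints `['Let', "'s", 'go', 'to', 'N.Y.', '!']`. Without the "N.Y." entry, the final period would also be split off as a separate token, which is why the special-case check runs before each stripping step.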



