What is tokenization and how does it work?
Tokenization is the process of breaking the original text into component pieces, which are known as tokens. Tokens have a variety of useful attributes and methods. One important thing to know is that tokens are pieces of the original text: they are not converted to base words, as happens with lemmatization and stemming.
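As a rough illustration of the idea, here is a minimal sketch of a tokenizer built on Python's standard `re` module. The function name `tokenize` and the splitting rule (runs of word characters, or single punctuation marks) are assumptions made for this example. It is not the tokenizer of any particular NLP library, and real tokenizers handle many more cases, such as contractions and URLs.

```python
import re

def tokenize(text):
    """Split text into tokens: runs of word characters or single punctuation marks.

    Hypothetical helper for illustration only; the rule is deliberately simple.
    """
    return re.findall(r"\w+|[^\w\s]", text)

text = "The cats are running."
tokens = tokenize(text)
print(tokens)  # ['The', 'cats', 'are', 'running', '.']

# Every token is an unchanged piece of the original text.
print(all(token in text for token in tokens))  # True
```

Note that "cats" and "running" come out exactly as written. A lemmatizer or stemmer would reduce them toward base forms such as "cat" and "run", but tokenization alone leaves the text untouched.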
The flowchart below shows how tokenization works.