The process of splitting text into TOKENS, also known as TOKENIZATION
Assembling TOKENS into a meaningful structure
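
The two steps above can be sketched on a toy example. The snippet below is a minimal illustration, not a reference implementation: it assumes the text is a simple arithmetic expression, and the names TOKEN_RE, tokenize, and parse are hypothetical, chosen only for this example. The first function splits text into TOKENS; the second (a step commonly called parsing) assembles those TOKENS into a nested tuple that reflects operator precedence.

```python
import re

# Assumed token set for this sketch: integers, + - * /, and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into a flat list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at or after position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Assemble tokens into a nested tuple tree such as ('+', 1, 2)."""
    pos = 0

    def peek():
        # Look at the next token without consuming it.
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        # Consume and return the next token (None at end of input).
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor := INTEGER | '(' expr ')'
        tok = advance()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


tokens = tokenize("2 + 3 * (4 - 1)")
tree = parse(tokens)
print(tokens)  # ['2', '+', '3', '*', '(', '4', '-', '1', ')']
print(tree)    # ('+', 2, ('*', 3, ('-', 4, 1)))
```

The flat token list carries no structure of its own; only the second step records that the multiplication binds tighter than the addition.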