Tokenisation & normalisation

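When processing almost any text, we first need to find the words. This involves two steps: splitting the input character sequence into tokens, and normalising each token into a canonical word form, so that variants such as "Hat" and "HAT" are treated as the same word.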
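As a rough illustration of these two steps, here is a minimal Python sketch. The names tokenise, normalise and words are hypothetical, and so is the token pattern. It assumes that a token is a run of word characters, optionally joined by internal apostrophes or hyphens, and that normalisation means Unicode NFKC normalisation followed by case folding. Real systems often make different choices, for example about abbreviations, numbers or languages without spaces.

```python
import re
import unicodedata

# Assumed token pattern (illustrative only): a run of word characters,
# optionally continued by an apostrophe or hyphen plus more word characters,
# so that "cat's" and "co-operate" stay together as single tokens.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def tokenise(text):
    """Split the input character sequence into a list of token strings."""
    return TOKEN_RE.findall(text)


def normalise(token):
    """Map a token to a canonical form: NFKC normalisation, then case folding."""
    return unicodedata.normalize("NFKC", token).casefold()


def words(text):
    """Tokenise the text, then normalise each token."""
    return [normalise(token) for token in tokenise(text)]


print(words("The cat's Hat, the HAT!"))
# prints ['the', "cat's", 'hat', 'the', 'hat']
```

Keeping the two steps as separate functions makes it easy to change one without the other, for instance swapping case folding for stemming while leaving the tokeniser untouched.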