*3.2. Tokenization*

Tokenization is performed at the term level for all five methods presented. Each term is represented by a unique token. Linguistic elements such as abbreviations and negations are additionally cleaned, returned to their canonical form, and assigned a token. For example, the commonly used negation *'don't'* is tokenized to *'do', 'not'*.
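The term-level tokenization with negation expansion described above can be sketched as follows; the expansion table and punctuation handling are illustrative assumptions, since the exact cleaning rules are not specified:

```python
import re

# Hypothetical expansion table mapping contracted negations to their
# canonical multi-term form (the paper does not list its full rules).
CONTRACTIONS = {
    "don't": ["do", "not"],
    "doesn't": ["does", "not"],
    "can't": ["can", "not"],
    "won't": ["will", "not"],
}

def tokenize(text):
    """Split text into term-level tokens, expanding known negations."""
    tokens = []
    for term in text.lower().split():
        term = term.strip(".,!?;:")  # drop trailing/leading punctuation
        if term:
            tokens.extend(CONTRACTIONS.get(term, [term]))
    return tokens

print(tokenize("I don't like spam."))  # → ['i', 'do', 'not', 'like', 'spam']
```

Each resulting string would then be mapped to a unique integer token ID in a vocabulary shared across the five methods.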
