#### *3.2. Tokenization*

Each review in the data set is processed individually. Preprocessing begins with tokenization, which splits a review into small units called tokens. A typical tokenizer also separates punctuation marks from the surrounding text while producing these tokens; a token can be a word, a symbol, etc. Here, we use the CoreNLP PTBTokenizer, which follows the Penn Treebank conventions for tokenizing English text, and split each review into sentences to produce a simple review file.
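As an illustration, the sketch below approximates Penn Treebank-style tokenization using only regular expressions. It is a simplified stand-in, not the actual CoreNLP PTBTokenizer, which handles many more cases (quotes, ellipses, abbreviations); the function name and rules are illustrative assumptions.

```python
import re

def ptb_style_tokenize(text):
    """Rough approximation of Penn Treebank tokenization:
    splits common English contractions and separates punctuation.
    (Illustrative only; the real PTBTokenizer covers far more cases.)"""
    # Split contractions the PTB way, e.g. "doesn't" -> "does", "n't"
    text = re.sub(r"(\w)(n't)\b", r"\1 \2", text)
    text = re.sub(r"(\w)('s|'re|'ve|'ll|'d|'m)\b", r"\1 \2", text)
    # Put punctuation marks into their own tokens
    text = re.sub(r"([.,!?;:()\"])", r" \1 ", text)
    return text.split()

review = "The battery doesn't last, but the screen's great!"
print(ptb_style_tokenize(review))
# ['The', 'battery', 'does', "n't", 'last', ',', 'but',
#  'the', 'screen', "'s", 'great', '!']
```

Note how the contraction "doesn't" becomes the two tokens "does" and "n't", mirroring the Penn Treebank convention that CoreNLP follows.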

#### *3.3. Stop Words Removal*

Meaningless or irrelevant words in a piece of text can seriously affect the accuracy of the output. Removing such stop words from the input text is therefore an important phase in sentiment analysis. In the collected user reviews, a stop word can be a number, a preposition, a person's name, a product's name, etc. After tokenization, each review goes through the stop-word removal phase. Our approach uses the CoreNLP library [33], which helps in identifying a list of stop words.
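The filtering step can be sketched as follows. The stop-word list here is a small illustrative sample, not the list supplied by CoreNLP, and the number check is one simple way to drop numeric tokens as described above.

```python
# Illustrative stop-word list; a real pipeline would load the
# (much larger) list identified via the CoreNLP library.
STOP_WORDS = {"a", "an", "the", "is", "it", "in", "but", "and", "or", "to", "of"}

def remove_stop_words(tokens):
    """Drop stop words and bare numbers, keeping sentiment-bearing tokens."""
    return [t for t in tokens
            if t.lower() not in STOP_WORDS and not t.isdigit()]

tokens = ["The", "battery", "is", "great", "but", "it", "died", "in", "2", "days"]
print(remove_stop_words(tokens))
# ['battery', 'great', 'died', 'days']
```

Tokens such as "great" and "died" survive the filter, which matters because these are exactly the words that carry sentiment in later phases.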
