!5311AA22!DB3533ACC4!AB!C25DD!!!...

Finally, we collapsed consecutive repetitions of a symbol into a single occurrence and recorded the run lengths in a vector of "weights" associated with the resulting non-redundant sequence. Formally, in a string $t_1 \dots t_n$, whenever there exists an $i$, with $1 \le i \le n$, such that $t_i = t_{i+1} = \dots = t_{i+k}$, we replace $t_i, t_{i+1}, \dots, t_{i+k}$ with the single symbol $t_i$ and associate with it the weight $w_i = k + 1$. This is done by scanning the string once and counting the consecutive occurrences of each symbol, which takes linear time in the length of the string.
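The collapsing step described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the implementation used in this work: the function name `collapse_runs` and the choice to return the weights as a plain list are hypothetical.

```python
from itertools import groupby


def collapse_runs(sequence):
    """Collapse runs of identical consecutive symbols.

    Returns the non-redundant sequence and the list of weights, where
    each weight is the length of the run that the symbol replaced.
    The input is scanned once, so the cost is linear in its length.
    """
    symbols = []
    weights = []
    # groupby yields one group per maximal run of equal adjacent symbols.
    for symbol, run in groupby(sequence):
        symbols.append(symbol)
        weights.append(sum(1 for _ in run))
    return "".join(symbols), weights


# First ten symbols of the example sequence shown earlier.
collapsed, weights = collapse_runs("!5311AA22!")
print(collapsed)  # !531A2!
print(weights)    # [1, 1, 1, 2, 2, 2, 1]
```

A symbol that is not repeated simply receives weight 1, which corresponds to the case $k = 0$ in the definition.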

For the sequence above, the preprocessing yields:
