Almost Optimal Searching of Maximal Subrepetitions in a Word

Kolpakov, Roman

doi:10.3390/math10193569

Open AccessArticle

Almost Optimal Searching of Maximal Subrepetitions in a Word

by

Roman Kolpakov

^1,2

¹

Faculty of Mechanics and Mathematics, Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, 119991 Moscow, Russia

²

Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Vavilov St. 40, 119333 Moscow, Russia

Mathematics 2022, 10(19), 3569; https://doi.org/10.3390/math10193569

Submission received: 9 August 2022 / Revised: 20 September 2022 / Accepted: 26 September 2022 / Published: 30 September 2022

(This article belongs to the Special Issue Analysis of One-Dimensional Regularities)

Download

Browse Figure

Versions Notes

Abstract

:

For some fixed

δ

such that

0 < δ < 1

, a

δ

-subrepetition in a word is a factor whose exponent is less than 2 but is not less than

1 + δ

(the exponent of the factor is the ratio of the factor length to its minimal period). The

δ

-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. In the paper, we propose an algorithm for searching all maximal

δ

-subrepetitions in a word of length n in

O (\frac{n}{δ} \log \frac{1}{δ})

time (the lower bound for this time is

Ω (\frac{n}{δ})

). It improves the previous known time complexity bounds for solving this problem.

Keywords:

combinatorics on words; combinatorial algorithms; time complexity; repetitions

MSC:

68W32

1. Introduction

Let

w = w [1] w [2] \dots w [n]

be an arbitrary word of length n. A fragment

w [i] \dots w [j]

of w, where

1 \leq i \leq j \leq n

, is called a factor of w and is denoted by

w [i \dots j]

. Note that this factor can be considered either as a word itself or as the fragment

w [i] \dots w [j]

of w. Thus, for factors, we have two different notions of equality: factors can be equal as the same fragment of the word w or as the same word. To avoid this ambiguity, we use two different notations: if two factors u and v of w are the same word (the same fragment of w), we will write

u = v

(

u \equiv v

). For any

i = 1, \dots, n

the factor

w [1 \dots i]

(

w [i \dots n]

) is called a prefix (a suffix) of w. By positions in w, we mean the order numbers

1, 2, \dots, n

of letters of the word w. For any factor

v \equiv w [i \dots j]

of w, the positions i and j are called the start position of v and the ending position of v and denoted by

beg (v)

and

end (v)

, respectively. For any two factors u, v of w, the factor u is contained in v if

beg (v) \leq beg (u)

and

end (u) \leq end (v)

. Two factors u and v of w such that

beg (u) \leq beg (v)

are called overlapped if

beg (v) \leq end (u) + 1

. The intersection of the overlapped factors u and v is the factor

w [beg (v) \dots end (u)]

(if

beg (v) = end (u) + 1

, the intersection of u and v is assumed to be an empty word). The length of the intersection of the overlapped factors u and v is called the overlap of u and v. The union of the overlapped factors u and v is the factor

w [i \dots j]

, where

i = \min (beg (u), beg (v))

,

j = \max (end (u), end (v))

. If some word u is equal to a factor v of w, then v is called an occurrence of u in w.

A positive integer p is called a period of w if

w [i] = w [i + p]

for each

i = 1, \dots, n - p

. We denote by

p (w)

the minimal period of w and by

e (w)

the ratio

| w | / p (w)

, which is called the exponent of w. Further, we use the following well-known fact, which is usually called the periodicity lemma.

Lemma 1.

If a word w has two periods

p, q

, and

| w | \geq p + q

, then

\gcd (p, q)

is also a period of w.

The periodicity lemma is actually a weaker version of the Fine and Wilf theorem (see [1,2]). Using the periodicity lemma, it is easy to obtain the following.

Proposition 1.

Let q be a period of a word w such that

| w | \geq 2 q

. Then, q is divisible by

p (w)

.

We will also use the following evident fact.

Proposition 2.

If two overlapped factors of a word have the same period p and the overlap of these factors is not less than p, then p is a period of the union of these factors.

A word is called primitive if its exponent is not an integer greater than 1. For primitive words, the following well-known fact takes place (see, e.g., [3]).

Lemma 2 (primitivity lemma).

If u is a primitive word, then the word

u u

has no occurrences of u that are neither a prefix nor a suffix of

u u

.

Words r such that

e (r) \geq 2

are called repetitions. A repetition in a word is called maximal if this repetition cannot be extended to the left or to the right in the word by at least one letter while preserving its minimal period. More precisely, a repetition r in a word w is called maximal if it satisfies the following maximality conditions:

if $beg (r) > 1$ , then $w [beg (r) - 1] \neq w [beg (r) + p (r) - 1]$ ,
if $end (r) < n$ , then $w [end (r) - p (r) + 1] \neq w [end (r) + 1]$ .

For example, word

ababaabaaababab

has 6 maximal repetitions:

ababa

,

abaabaa

,

aabaaaba

,

aa

,

aaa

and

ababab

. Maximal repetitions are usually called runs in the literature. By

R (w)

, we denote the set of all maximal repetitions in the word w. For any natural n, we define also

R (n) = \max_{| w | = n} | R (w) |

and

E (n) = \max_{| w | = n} \sum_{r \in R (w)} e (r)

.

The possible number of maximal repetitions was actively investigated in the literature. In [4], the linear upper bound

E (n) = O (n)

is proven, which implies obviously that

R (n) = O (n)

. Due to a series of papers (see, e.g., [5]), more precise upper bounds on

E (n)

and

R (n)

have been obtained. A breakthrough in this direction was made in [6], where the bounds

R (n) < n

,

E (n) < 3 n

are proven. To our knowledge, the best upper bound

| R (w) | \leq \frac{183}{193} n

for binary words w of length n is shown in [7], and the best lower bounds

2.035 n

on

E (n)

and

0.9445757 n

on

R (n)

are obtained, respectively, in [8,9]. Some results on the average number of runs in arbitrary words are obtained in [10,11].

In this paper, we consider words over a (polynomially bounded) integer alphabet, i.e., words over an alphabet that consists of nonnegative integers bounded by some polynomial of the length of words. Thus, further, by w, we will mean an arbitrary word of length n over the integer alphabet.

The problem of finding of all maximal repetitions in words was also actively investigated in the literature. A

O (n)

-time algorithm for finding all runs in a word of length n was proposed in [4] for the case of a constant-size alphabet. This result was generalized to the case of words over the integer alphabet in [12]. Another

O (n)

-time algorithm for the case of the integer alphabet, based on a different approach, has been proposed in [6]. An algorithm for solving the problem in the case of an unbounded, linearly ordered alphabet was proposed in [13]. This algorithm was improved in [14,15]. Finally, a linear time algorithm for this case was proposed in [16]. The obtained results can be summarized in the following two theorems.

Theorem 1.

The number of maximal repetitions in w is

O (n)

, and all these repetitions with their minimal periods can be computed in

O (n)

time.

Theorem 2.

The sum

\sum_{r \in R (w)} e (r)

of exponents of all maximal repetitions in w is

O (n)

.

Let r be a repetition in w. We call any factor of w that has the length

p (r)

and is contained in r a cyclic root of r. The cyclic root that is the prefix of r is called the prefix cyclic root of r. It follows from the minimality of the period

p (r)

that any cyclic root of r is a primitive word. Thus, the following proposition can be easily obtained from Lemma 2.

Proposition 3.

Two cyclic roots

x^{'}, x^{″}

of a repetition r are equal if and only if

beg (x^{'}) \equiv beg (x^{″}) (\mod p (r))

(

end (x^{'}) \equiv end (x^{″}) (\mod p (r))

).

Since all roots of any repetition are primitive, any repetition r has

p (r)

different cyclic roots, which are cyclic rotations of each other. The lexicographically minimal root among these cyclic roots is called the Lyndon root of the repetition r. Let x be the leftmost occurrence of the Lyndon root in the repetition r. Then, the difference

beg (x) - beg (r)

is denoted by

a (r)

. Two repetitions with the same minimal period are called repetitions with the same cyclic roots if they have the same set of distinct cyclic roots. Note that repetitions with the same cyclic roots have the same Lyndon root.

Let

r^{'}, r^{″}

be maximal repetitions with the same cyclic roots, p be the minimal period of

r^{'}

and

r^{″}

, and

x^{'}, x^{″}

be cyclic roots of repetitions

r^{'}, r^{″}

, respectively. Note that

beg (r^{'}) + a (r^{'})

and

beg (r^{″}) + a (r^{″})

are the starting positions of the leftmost Lyndon roots of repetitions

r^{'}, r^{″}

, respectively. Denote by

δ^{'}

(

δ^{″}

) the residue of

beg (x^{'}) - (beg (r^{'}) + a (r^{'}))

(

beg (x^{″}) - (beg (r^{″}) + a (r^{″}))

) in modulo p. Using Proposition 3, it is easy to see that

x^{'} = x^{″}

if and only if

δ^{'} = δ^{″}

. Thus, we obtain the following fact.

Proposition 4.

Let

r^{'}, r^{″}

be maximal repetitions with the same cyclic roots, p be the minimal period of

r^{'}

and

r^{″}

, and

x^{'}, x^{″}

be cyclic roots of repetitions

r^{'}, r^{″}

, respectively. Then,

x^{'} = x^{″}

if and only if

beg (x^{'}) - (beg (r^{'}) + a (r^{'})) \equiv beg (x^{″}) - (beg (r^{″}) + a (r^{″})) (\mod p) .

Further, we will use double-linked lists of all maximal repetitions with the same cyclic roots in the order of their increasing of starting positions. According to [17], these lists can be computed for the word w in

O (n)

time. It is also shown in [17] that, for all maximal repetitions r in w, the values

a (r)

can be computed in

O (n)

total time.

We will also use the following facts for overlaps of maximal repetitions (see, e.g., [18]).

Proposition 5.

The overlap of any two different maximal repetitions with the same minimal period p is smaller than p.

Proposition 6.

The overlap of any two different maximal repetitions

r^{'}

and

r^{″}

is smaller than

p (r^{'}) + p (r^{″})

.

A natural generalization of repetitions are factors with exponents strictly less than 2. We will call such factors subrepetitions. More precisely, a factor r is called a subrepetition if

1 < e (r) < 2

. Analogously to repetitions, a subrepetition r in w is called maximal if r satisfies the maximality conditions, i.e., if r cannot be extended to the left or to the right in w by at least one letter while preserving its minimal period. For any

δ

such that

0 < δ < 1

(In the paper, for convenience, we assume actually that

δ < 1 - ε

for some fixed

ε

), a subrepetition r is called a (δ-subrepetition) if

e (r) \geq 1 + δ

. It is shown below that the number of maximal

δ

-subrepetitions in a word of length n is

O (n / δ)

. In this paper, the following problem is investigated.

Problem 1.

For a given value δ, find in a given word w of length n all maximal δ-subrepetions.

Previously, in [19], two algorithms for solving Problem 1 were proposed: the first algorithm has

O (\frac{n \log \log n}{δ^{2}})

time complexity and the second algorithm has

O (n \log n + \frac{n}{δ^{2}} \log \frac{1}{δ})

expected time complexity. In [20], the expected time of the second algorithm was improved to the linear bound

O (\frac{n}{δ^{2}} \log \frac{1}{δ})

. Using the results of [21,22], this time can be further improved to

O (\frac{n}{δ} \log \frac{1}{δ})

. In [23], it is shown that all subrepetitions with the largest exponent (over all subrepetitions) in an overlap-free string can be found in O(n) time for a constant-size alphabet. In this paper, we propose an alternative deterministic algorithm for solving Problem 1 in

O (\frac{n}{δ} \log \frac{1}{δ})

time.

In the following sections, we give a detailed description of the proposed algorithm. In Section 2, we consider, in a word, regular structures, which are called maximal (gapped or overlapped) repeats. These repeats are closely related to maximal subrepetitions. We show that there exists a one-to-one correspondence between maximal subrepetitions and some maximal gapped repeats, which are called principal, so, to find all maximal subrepetitions, we can find actually all principal gapped repeats in the word. In Section 3, we introduce the notion of covering maximal repeats and show that a gapped repeat is principal if and only if it is not covered by another maximal repeat. Thus, to find all maximal subrepetitions, we find actually, in the word, all maximal gapped repeats that are not covered by other maximal repeats. In Section 4, Section 5 and Section 6, some auxiliary notions that are used in the proposed algorithm—in particular, the notions of generated, periodic, represented, birepresented and

α

-periodic repeats—are introduced and several useful statements related to these notions are proven. Finally, a detailed description of the proposed algorithm is contained in Section 7.

2. Repeats

Another regular structure in a word that is closely related to repetitions and subrepetitions is repeats. In a general case, a repeat

σ

in the word w is a pair

u^{'}, u^{″}

of equal factors of the word w, where

beg (u^{'}) < beg (u^{″})

. The factors

u^{'}, u^{″}

are called copies of the repeat

σ

; in particular,

u^{'}

is called the left copy of

σ

and

u^{'}

is called the right copy of

σ

. The length of copies of

σ

is denoted by

c (σ)

. The difference

beg (u^{″}) - beg (u^{'})

is called the period of the repeat

σ

and is denoted by

p (σ)

. The minimal factor

w [beg (u^{'}) \dots end (u^{″})]

containing both copies

u^{'}, u^{″}

of

σ

will be denoted by

fact (σ)

. Note that for different repeats

σ^{'}

and

σ^{″}

, we can have actually

fact (σ^{'}) = fact (σ^{″})

. Note also that

p (σ)

is a period of

fact (σ)

, but the minimal period of

fact (σ)

can be less than

p (σ)

. By the starting position (the ending position) of

σ

, we will mean the starting position (the ending position) of

fact (σ)

. We will say that a maximal repeat

σ

is contained in some factor

w^{'}

of the word if the factor

fact (σ)

is contained in

w^{'}

. A repeat is called overlapped if its copies are overlapped factors; otherwise, the repeat is called gapped. In other words, the repeat

σ

is gapped if

fact (σ)

can be represented in the form

u^{'} v u^{″}

, where v is a nonempty word called the gap of the repeat

σ

. For any

α > 1

, a gapped repeat

σ

is called α-gapped if

p (σ) \leq α c (σ)

.

Analogously to repetitions and subrepetitions, we can also introduce the notion of maximal repeats. A repeat

σ

with left and right copies

u^{'}, u^{″}

in w is called maximal if it satisfies the following conditions:

if $beg (u^{'}) > 1$ , then $w [beg (u^{'}) - 1] \neq w [beg (u^{″}) - 1]$ ,
if $end (u^{″}) < n$ , then $w [end (u^{'}) + 1] \neq w [end (u^{″}) + 1]$ .

In other words, a repeat in a word is called maximal if its copies cannot be extended to the left or to the right in the word by at least one letter while preserving its period. Note that any repeat can be extended to a uniquely defined maximal repeat with the same period. In particular, any

α

-gapped repeat can be extended to a uniquely defined maximal

α

-gapped or overlapped repeat. In [21,22], the following fact on maximal

α

-gapped repeats was obtained.

Theorem 3.

The number of maximal α-gapped repeats in w is

O (α n)

, and all these repeats can be computed in

O (α n)

time.

In [22], the more precise upper bound

18 α n

on the number of maximal

α

-gapped repeats in w was actually obtained. A tighter bound on this number was obtained later in [24]. An algorithm that finds, in each position of a word, the longest gapped repeats satisfying additional restrictions is proposed in [25].

Let

σ

be a maximal overlapped repeat with left and right copies

u^{'}, u^{″}

in w. Note that, in this case, the period

p (σ)

of

fact (σ)

is not greater than half of

| fact (σ) |

, so

fact (σ)

is a repetition. Let p be the minimal period of

fact (σ)

. By Proposition 1, we have that p is a divisor of

p (σ)

. Let

beg (fact (σ)) = beg (u^{'}) > 1

. Since

σ

is a maximal repeat, we have that

w [beg (fact (σ)) - 1] = w [beg (u^{'}) - 1] \neq w [beg (u^{″}) - 1] = w [beg (fact (σ)) + p (σ) - 1] .

On the other hand, since the period p of

fact (σ)

is a divisor of

p (σ)

, we have that

w [beg (fact (σ)) + p - 1] = w [beg (fact (σ)) + p (σ) - 1]

. Thus, we obtain that

w [beg (fact (σ)) - 1] \neq w [beg (fact (σ)) + p - 1]

. In an analogous way, we can obtain that if

end (fact (σ)) < n

, then

w [beg (fact (σ)) + 1] \neq w [beg (fact (σ)) - p + 1]

. Thus, we conclude that

fact (σ)

is a maximal repetition whose minimal period is a divisor of

p (σ)

. We will denote this repetition by

rep (σ)

. Now, let r be a maximal repetition in w. Then, we can consider the repeat

σ

with left and right copies

w [beg (r) \dots end (r) - p (r)]

and

w [beg (r) + p (r) \dots end (r)]

. It is easy to note that

σ

is a maximal overlapped repeat such that

p (σ) = p (r)

. We will call the repeat

σ

the principal repeat of the repetition r. Principal repeats of maximal repetitions will be also called reprincipal repeats. Note that any reprincipal repeat

σ

is the principal repeat of the repetition

rep (σ)

, so, for any maximal repetition, we have that this repetition and the principal repeat of this repetition are uniquely defined by each other. Therefore, we have one-to-one correspondence between all maximal repetitions and all reprincipal repeats in a word. Thus, in any word, the number of reprincipal repeats is equal to the number of maximal repetitions. We have the following fact for reprincipal repeats.

Proposition 7.

The number of reprincipal repeats in w is

O (n)

, and all these repeats can be computed in

O (n)

time.

Proof.

Recall that the number of reprincipal repeats in w is equal to the number of maximal repetitions in w, so, by Theorem 1, this number is

O (n)

. To compute all reprincipal repeats in w, first, we find all maximal repetitions in w. By Theorem 1, this can be done in time

O (n)

. Then, for each found maximal repetition r, we compute the principal repeat of r as repeat

(u^{'}, u^{″})

, where

beg (u^{'}) = beg (r)

,

end (u^{'}) = end (r) - p (r)

,

beg (u^{″}) = beg (r) + p (r)

,

end (u^{″}) = end (r)

. It can be computed in constant time. Therefore, the total time of computing all reprincipal repeats from the found maximal repetitions is proportional to the number of these repetitions, so this time is

O (n)

. Thus, the total time of computing all reprincipal repeats in w is

O (n)

. □

Now, let r be a maximal

δ

-subrepetition in w. Then, we can consider a gapped repeat

σ

with the left copy

w [beg (r) \dots end (r) - p (r)]

and the right copy

w [beg (r) + p (r) \dots end (r)]

. It is easy to see that

σ

is a maximal gapped repeat with the period

p (σ) = p (r)

. Moreover, we have

c (σ) = | r | - p (r) \geq (1 + δ) p (r) - p (r) = δ p (r) = δ p (σ),

so

p (σ) \leq c (σ) / δ

. Thus,

σ

is actually a maximal

\frac{1}{δ}

-gapped repeat. We call the repeat

σ

the principal repeat of the subrepetition r. Thus, for any maximal

δ

-subrepetition, there exists the principal repeat of this subrepetition. On the other hand, a maximal gapped repeat may not be the principal repeat of any maximal subrepetition. For example, in the word shown in Figure 1, we can consider the maximal subrepetition r with the minimal period 7. Note that the gapped repeat

u^{'} v u^{″}

is the principal repeat of r, while the gapped repeat

{\bar{u}}^{'} \bar{v} {\bar{u}}^{″}

is not the principal repeat of r, so the repeat

{\bar{u}}^{'} \bar{v} {\bar{u}}^{″}

is not the principal repeat of any maximal subrepetition. Thus, the repeat

u^{'} v u^{″}

is a principal repeat and the repeat

{\bar{u}}^{'} \bar{v} {\bar{u}}^{″}

is not a principal repeat. Note that for any maximal subrepetition, we have that this subrepetition and the principal repeat of this subrepetition are uniquely defined by each other, and we have one-to-one correspondence between all maximal

δ

-subrepetitions and all principal maximal

\frac{1}{δ}

-gapped repeats in a word (According to this observation, maximal subrepetitions can be identified with maximal repeats

σ

such that

p (σ)

is equal to the minimal period of

fact (σ)

(see Proposition 9)). Thus, Problem 1 can be reformulated in the following way.

Problem 2.

For a given value δ, find in a given word w of length n all principal maximal

1 / δ

-gapped repeats.

We obtain also that, in any word, the number of maximal

δ

-subrepetitions is not greater than the number of maximal

\frac{1}{δ}

-gapped repeats. Thus, Theorem 3 implies the following upper bound on the number of maximal

δ

-subrepetitions in a word.

Proposition 8.

The number of maximal δ-subrepetitions in w is

O (n / δ)

.

Note also that, for any principal overlapped or gapped repeats, we have the following obvious fact.

Proposition 9.

A maximal repeat σ is principal if and only if the minimal period of

fact (σ)

is equal to

p (σ)

.

Let M be a set of maximal repeats in the word w. Note that a maximal repeat is uniquely defined by its starting position together with its period. Thus, we can represent the set M by lists

M L_{t}

for

t = 1, 2, \dots, n

, where

M L_{t}

is the list containing all these repeats with the starting position t in the order of their increasing periods. Using bucket sorting, all the lists

M L_{t}

can be computed from the set M in total

O (n + | M |)

time. Moreover, all the lists

M L_{t}

can be traversed in total

O (n + | M |)

time. We will call the lists

M L_{t}

start position lists for the set M.

3. Covering of Maximal Repeats

Now, we reformulate Problem 2 in terms of covering maximal repeats. We say that a maximal repeat

σ

is covered by another maximal repeat

σ^{'}

if

fact (σ)

is contained in

fact (σ^{'})

and

p (σ^{'}) < p (σ)

. For example, in the word

aabababbababaa

, the gapped repeat

abababbababa

with copies

aba

is covered by the gapped repeat

abababbababa

with copies

ababa

. Note that the introduced notion of covering of repeats satisfies the transitivity property: if some maximal repeat

σ^{'}

is covered by some maximal repeat

σ^{″}

and the maximal repeat

σ^{″}

is covered by some maximal repeat

σ^{‴}

, then

σ^{'}

is covered by

σ^{‴}

. The following auxiliary facts can be easily checked.

Proposition 10.

A maximal α-gapped repeat can be covered only by α-gapped or overlapped repeats.

Proposition 11.

If a maximal repeat σ is covered by a maximal repeat

σ^{'}

, then the left copy of σ is contained in the left copy of

σ^{'}

and the right copy of σ is contained in the right copy of

σ^{'}

.

In an analogous way, a maximal repeat

σ

is covered by a maximal repetition r if

fact (σ)

is contained in r and

p (r) < p (σ)

. For example, the repetition

aababaababa

with the minimal period 5 covers the gapped repeat

ababaababa

with copies

aba

. Note that the repeat

σ

is covered by r if and only if

σ

is covered by the principal repeat of r. Note also that any maximal overlapped repeat coincides as a factor with some maximal repetition whose period is a divisor of the period of this repeat. Thus, any maximal repeat covered by a maximal overlapped repeat is covered also by the maximal repetition coinciding with this overlapped repeat. Thus, we have the following fact.

Proposition 12.

Any maximal repeat covered by a maximal overlapped repeat is covered also by the principal repeat of some maximal repetition.

Let a maximal repeat

σ

be covered by a maximal repeat

σ^{'}

. Then, the factor

fact (σ)

has the period

p (σ^{'}) < p (σ)

, so, by Proposition 9, the repeat

σ

is not principal. Hence, principal repeats are not covered by other maximal repeats. Now, let a maximal repeat

σ

be not principal, i.e.,

σ

is contained in some repetition or subrepetition r such that

p (r) < p (σ)

. In this case,

σ

is covered by the principal repeat of r. Therefore, if

σ

is not covered by any other maximal repeat, then

σ

is principal. Thus, we obtain the following fact.

Proposition 13.

A maximal repeat is principal if and only if it is not covered by any other maximal repeat.

Using Propositions 10 and 13, Problem 2 can be reformulated in the following way.

Problem 3.

For a given value δ, find in a given word w of length n all maximal

1 / δ

-gapped repeats that are not covered by other maximal repeats.

In the paper, we actually propose an algorithm for solving Problem 3. The main idea of the proposed algorithm is the following. First, using already known algorithms, we find in w all maximal

1 / δ

-gapped repeats, and then we remove from the found repeats all repeats that are covered by other maximal repeats. Thus, the crucial part of the proposed algorithm is the procedure of removing maximal

1 / δ

-gapped repeats covered by other maximal repeats. This procedure is described below in the text.

4. Periodic and Generated Repeats

For resolving Problem 3, we initially introduce the notions of periodic and generated repeats. Repeat

σ

is called periodic if the copies of

σ

are repetitions with a minimal period not greater than

c (σ) / 3

; otherwise,

σ

is called nonperiodic. Let

σ

be a maximal periodic repeat with a left copy

u^{'}

and a right copy

u^{″}

. Since

u^{'}

and

u^{″}

are repetitions, these repetitions can be extended to some maximal repetitions

r^{'}

and

r^{″}

with the same cyclic roots. Let

r^{'}

and

r^{″}

be different maximal repetitions. Then, we say that

σ

is represented by the pair of maximal repetitions

r^{'}, r^{″}

or, more briefly,

σ

is birepresented. The pair of maximal repetitions

r^{'}, r^{″}

will be called left if

| r^{'} | \leq | r^{″} |

; otherwise, it is called right. We will also call the repeat

σ

left (right) birepresented if

σ

is represented by a left (right) pair of maximal repetitions. If the maximal repetitions

r^{'}

and

r^{″}

are the same repetition r, we say that

σ

is represented by the maximal repetition r.

A maximal repeat

σ

is considered to be generated by a maximal repetition r if

beg (σ) = beg (r)

,

end (σ) = end (r)

, and

p (σ)

is divisible by

p (r)

. For maximal periodic repeats represented by maximal repetitions, we have the following fact.

Proposition 14.

Any maximal periodic repeat represented by a maximal repetition is generated by this maximal repetition.

Proof.

Let

σ

be a maximal periodic repeat represented by a maximal repetition r, and

u^{'}

,

u^{″}

be, respectively, the left and right copies of

σ

. Note that, since

σ

is periodic, the length of

u^{'}

and

u^{″}

is greater than

p (r)

. Denote by

x^{'}

,

x^{″}

the cyclic roots of r, which are prefixes of

u^{'}

and

u^{″}

, respectively. Since

x^{'} = x^{″}

, by Proposition 3, we have

beg (x^{'}) \equiv beg (x^{″}) (\mod p (r))

, so

beg (u^{'}) \equiv beg (u^{″}) (\mod p (r))

. Thus,

p (σ) = beg (u^{″}) - beg (u^{'})

is divisible by

p (r)

. Let

beg (σ) = beg (u^{'}) > beg (r)

. Then, both symbols

w [beg (u^{'}) - 1]

and

w [beg (u^{″}) - 1]

are contained in r and the difference between the positions of these symbols is divisible by

p (r)

. Thus,

w [beg (u^{'}) - 1] = w [beg (u^{″}) - 1]

, which contradicts the notion that

σ

is a maximal repeat. Thus,

beg (σ) = beg (r)

. In an analogous way, we have that

end (σ) = end (r)

. Thus,

σ

is generated by r. □

Note that any maximal repetition r generates no more than

e (r) / 2

gapped repeats, and, knowing values

beg (r)

,

end (r)

and

p (r)

, we can compute all these repeats in

O (e (r))

time. We can check each of these repeats in constant time if this repeat is

α

-gapped. Thus, we can compute all maximal

α

-gapped repeats generated by r in

O (e (r))

time. Therefore, we have the following simple procedure for computing all maximal

α

-gapped repeats generated by maximal repetitions in w. First, we find all maximal repetitions in w. According to Theorem 1, this can be done in

O (n)

time, and the total number of these repetitions is

O (n)

. Then, for each found repetition r, we compute all maximal

α

-gapped repeats generated by r in

O (e (r))

time. The total time of this procedure is

O (\sum_{r \in R (w)} e (r))

, so, by Theorem 2, this time is

O (n)

. Note also that, by Theorem 2, the number of computed repeats is

O (n)

. Thus, we obtain the following fact.

Proposition 15.

The number of maximal α-gapped repeats generated by maximal repetitions in w is

O (n)

, and all these repeats can be computed in

O (n)

time.

Proposition 15 is used further in the proposed algorithm for the exclusion of covered repeats generated by maximal repetitions.

5. Birepresented Gapped Repeats

Now, we prove several properties of birepresented gapped repeats, which are used in the proposed algorithm. Let the maximal repeat

σ

with left and right copies

u^{'}

and

u^{″}

be represented by a left pair of maximal repetitions

r^{'}, r^{″}

with the same cyclic roots, and p be the minimal period of

r^{'}

and

r^{″}

. Assume that

beg (u^{'}) > beg (r^{'})

and

beg (u^{″}) > beg (r^{″})

. Then,

w [beg (u^{'}) - 1]

and

w [beg (u^{″}) - 1]

are contained in

r^{'}

and

r^{″}

, respectively, so

\begin{matrix} u^{'} [p] & = & w [beg (u^{'}) - 1 + p] = w [beg (u^{'}) - 1], \\ u^{″} [p] & = & w [beg (u^{″}) - 1 + p] = w [beg (u^{″}) - 1] . \end{matrix}

Thus, from

u^{'} [p] = u^{″} [p]

, we have

w [beg (u^{'}) - 1] = w [beg (u^{″}) - 1]

, which contradicts the notion that

σ

is maximal. Therefore, we have

beg (u^{'}) = beg (r^{'})

or

beg (u^{″}) = beg (r^{″})

. Thus, we have the following separate cases for

beg (u^{'})

and

beg (u^{″})

:

a1.: $beg (u^{'}) = beg (r^{'})$ and $beg (u^{″}) = beg (r^{″})$ ;
a2.: $beg (u^{'}) = beg (r^{'})$ and $beg (u^{″}) > beg (r^{″})$ ;
a3.: $beg (u^{'}) > beg (r^{'})$ and $beg (u^{″}) = beg (r^{″})$ .

Analogously, it can be shown that

end (u^{'}) = end (r^{'})

or

end (u^{″}) = end (r^{″})

. Thus, we have the following separate cases for

end (u^{'})

and

end (u^{″})

:

b1.: $end (u^{'}) = end (r^{'})$ and $end (u^{″}) = end (r^{″})$ ;
b2.: $end (u^{'}) = end (r^{'})$ and $end (u^{″}) < end (r^{″})$ ;
b3.: $end (u^{'}) < end (r^{'})$ and $end (u^{″}) = end (r^{″})$ .

Thus, for

i, j = 1, 2, 3

, we have in total 9 cases aibj when there are cases ai and bj simultaneously. Note that, since

| r^{'} | \leq | r^{″} |

, the case of

beg (u^{″}) = beg (r^{″})

and

end (u^{″}) = end (r^{″})

implies that

beg (u^{'}) = beg (r^{'})

and

end (u^{'}) = end (r^{'})

, so cases a1b3, a3b1 and a3b3 are impossible. The remaining cases can be grouped into the three following cases:

$beg (u^{'}) > beg (r^{'})$ , $end (u^{'}) = end (r^{'})$ and $beg (u^{″}) = beg (r^{″})$ (case a3b2);
$beg (u^{'}) = beg (r^{'})$ , $end (u^{'}) = end (r^{'})$ and $end (u^{″}) < end (r^{″})$ (cases a1b2 and a2b2);
$beg (u^{'}) = beg (r^{'})$ and $end (u^{″}) = end (r^{″})$ (cases a1b1, a2b1 and a2b3).

We will call the repeat

σ

the repeat of the first type in case 1, repeat of the second type in case 2, and repeat of the third type in case 3. For example, consider the word

b a a b a a b a a b a b b a b a a b a a b a a b a a b a

, which has the left pair of the maximal repetitions

r^{'} = b a a b a a b a a b a

,

r^{″} = a b a a b a a b a a b a a b a

with the same cyclic roots of length 3. This pair represents the gapped repeat

a b a a b a a b a b b a b a a b a a b a

of the first type with copies

a b a a b a a b a

, the gapped repeat

b a a b a a b a a b a b b a b a a b a a b a a b a

of the second type with copies

b a a b a a b a a b a

and the gapped repeat

b a a b a a b a a b a b b a b a a b a a b a a b a a b a

of the third type with copies

b a a b a a b a a b a

.

First, consider case 1. Let

σ^{'}

,

σ^{″}

be two different maximal repeats of the first type represented by the left pair of maximal repetitions

r^{'}, r^{″}

. If

beg (σ^{'}) = beg (σ^{″})

, then it is easy to note that

σ^{'}

and

σ^{″}

are the same repeat, so

beg (σ^{'}) \neq beg (σ^{″})

. Let

x^{'}

,

x^{″}

be the prefixes of length p in the left copies of repeats

σ^{'}

and

σ^{″}

, respectively. Note that

x^{'}

,

x^{″}

are cyclic roots of

r^{'}

that are equal to the prefix cyclic root of

r^{″}

, so

x^{'} = x^{″}

. Thus, by Proposition 3, we have that the difference

beg (x^{'}) - beg (x^{″}) = beg (σ^{'}) - beg (σ^{″})

is divisible by p. Moreover, by the definition of repeats of the first type, we have that both values

beg (σ^{'}), beg (σ^{″})

are in the segment

beg (r^{'}) < beg (σ^{'}), beg (σ^{″}) \leq end (r^{'}) - 3 p + 1 .

Thus, the starting positions of all maximal repeats of the first type represented by the pair

r^{'}, r^{″}

form in the segment

[beg (r^{'}) + 1; end (r^{'}) - 3 p + 1]

an arithmetic progression of numbers with the step p. Let

t^{'}

be the maximal number in this progression and

k^{'}

be the number of all maximal repeats of the first type represented by the pair

r^{'}, r^{″}

. Then, we can consider the numbers

t^{'}, t^{'} - p, t^{'} - 2 p, \dots, t^{'} - (k^{'} - 1) p

of this progression in descending order as the starting positions of the corresponding repeats. In this way, we consider the set of all maximal repeats of the first type represented by the pair

r^{'}, r^{″}

as

{σ_{1}^{'}, σ_{2}^{'}, \dots, σ_{k^{'}}^{'}}

, where

beg (σ_{j}^{'}) = t^{'} - (j - 1) p

for

j = 1, 2, \dots, k^{'}

. Note that

beg (σ_{j}^{'}) = beg (σ_{1}^{'}) - (j - 1) p

,

p (σ_{j}^{'}) = p (σ_{1}^{'}) + (j - 1) p

and

end (σ_{j}^{'}) = end (σ_{1}^{'}) + (j - 1) p

for

j = 1, 2, \dots, k^{'}

.

Now, consider case 2. Let

σ^{'}

,

σ^{″}

be two different maximal repeats of the second type represented by the left pair of maximal repetitions

r^{'}, r^{″}

, and

{\hat{u}}^{'}

,

{\hat{u}}^{″}

be the right copies of repeats

σ^{'}

,

σ^{″}

, respectively. If

beg ({\hat{u}}^{'}) = beg ({\hat{u}}^{″})

, then

σ^{'}

and

σ^{″}

are the same repeat, so

beg ({\hat{u}}^{'}) \neq beg ({\hat{u}}^{″})

. Let

x^{'}

,

x^{″}

be the prefixes of length p in

σ^{'}

and

σ^{″}

, respectively. Note that

x^{'}

,

x^{″}

are cyclic roots of

r^{″}

, which are equal to the prefix cyclic root of

r^{'}

, so

x^{'} = x^{″}

. Thus, by Proposition 3, we have that the difference

beg ({\hat{u}}^{'}) - beg ({\hat{u}}^{″})

is divisible by p. Moreover, by the definition of repeats of the second type, we have that both values

beg ({\hat{u}}^{'}), beg ({\hat{u}}^{″})

are in the segment

beg (r^{″}) \leq beg ({\hat{u}}^{'}), beg ({\hat{u}}^{″}) \leq end (r^{″}) - | r^{'} | .

Thus, the starting positions of the right copies of all maximal repeats of the second type represented by the pair

r^{'}, r^{″}

form in the segment

[beg (r^{″}); end (r^{″}) - | r^{'} |]

an arithmetic progression of numbers with the step p. Let

t^{″}

be the minimal number in this progression and

k^{″}

be the number of all maximal repeats of the second type represented by the pair

r^{'}, r^{″}

. Then, we can consider the numbers

t^{″}, t^{″} + p, t^{″} + 2 p, \dots, t^{″} + (k^{″} - 1) p

of this progression in ascending order as the starting positions of the right copies of the corresponding repeats. In this way, we consider the set of all maximal repeats of the second type represented by the pair

r^{'}, r^{″}

as

{σ_{1}^{″}, σ_{2}^{″}, \dots, σ_{k^{″}}^{″}}

, where the starting position of the right copy of

σ_{j}^{″}

is

t^{″} + (j - 1) p

for

j = 1, 2, \dots, k^{″}

. Note that

beg (σ_{j}^{″}) = beg (r^{'})

,

p (σ_{j}^{″}) = p (σ_{1}^{″}) + (j - 1) p

and

end (σ_{j}^{″}) = end (σ_{1}^{'}) + (j - 1) p

for

j = 1, 2, \dots, k^{″}

.

Finally, consider case 3. Let

σ^{'}

,

σ^{″}

be two different maximal repeats of the third type represented by the left pair of maximal repetitions

r^{'}, r^{″}

, and

{\hat{u}}^{'}

,

{\hat{u}}^{″}

be the right copies of repeats

σ^{'}

,

σ^{″}

, respectively. Analogously to case 2, it can be shown that

beg ({\hat{u}}^{'}) \neq beg ({\hat{u}}^{″})

, and the difference

beg ({\hat{u}}^{'}) - beg ({\hat{u}}^{″})

is divisible by p. Thus, the starting positions of the right copies of all maximal repeats of the third type represented by the pair

r^{'}, r^{″}

form in the segment

[end (r^{″}) - | r^{'} | + 1; end (r^{″}) - 3 p + 1]

an arithmetic progression of numbers with the step p. Let

t^{‴}

be the minimal number in this progression and

k^{‴}

be the number of all maximal repeats of the third type represented by the pair

r^{'}, r^{″}

. Then, we can consider the numbers

t^{‴}, t^{‴} + p, t^{‴} + 2 p, \dots, t^{‴} + (k^{‴} - 1) p

of this progression in ascending order as the starting positions of the right copies of the corresponding repeats. In this way, we consider the set of all maximal repeats of the third type represented by the pair

r^{'}, r^{″}

as

{σ_{1}^{‴}, σ_{2}^{‴}, \dots, σ_{k^{‴}}^{‴}}

, where the starting position of the right copy of

σ_{j}^{‴}

is

t^{‴} + (j - 1) p

for

j = 1, 2, \dots, k^{‴}

. Note that

beg (σ_{j}^{‴}) = beg (r^{'})

,

p (σ_{j}^{‴}) = p (σ_{1}^{‴}) + (j - 1) p

and

end (σ_{j}^{‴}) = end (r^{″})

for

j = 1, 2, \dots, k^{‴}

. The repeat

σ_{1}^{‴}

will be called dominating, and other repeats

{σ_{2}^{‴}, \dots, σ_{k^{‴}}^{‴}}

will be called nondominating.

Consider additionally the repeats

σ_{k^{'}}^{'}

and

σ_{1}^{″}

. Let

u_{0}^{'}

and

u_{0}^{″}

be left and right copies of

σ_{k^{'}}^{'}

, and

u_{1}^{'}

and

u_{1}^{″}

be left and right copies of

σ_{1}^{″}

. Consider in

r^{'}

the prefix

x^{'}

of length p, which is a cyclic root of

r^{'}

. Since

x^{'}

is a prefix of

u_{1}^{'}

, we have also in

u_{1}^{″}

the prefix

x^{″}

of length p, which is a cyclic root of

r^{″}

and is equal to

x^{'}

. Note that

x^{″}

is a factor of

u_{0}^{″}

, so we can also consider in

u_{0}^{'}

the factor

x^{‴}

corresponding to

x^{″}

. Note also that

x^{‴}

is a cyclic root of

r^{'}

such that

x^{‴} = x^{'}

. Moreover,

x^{‴}

has to be in

r^{'}

the leftmost cyclic root to the right of

x^{'}

, so, by Proposition 3,

beg (x^{‴}) = beg (x^{'}) + p

. We have also that

beg (x^{″}) = beg (x^{'}) + p (σ_{1}^{″})

and

beg (x^{‴}) = beg (x^{″}) - p (σ_{k^{'}}^{'})

, so

beg (x^{‴}) = beg (x^{'}) + p (σ_{1}^{″}) - p (σ_{k^{'}}^{'})

. Thus,

p (σ_{1}^{″}) - p (σ_{k^{'}}^{'}) = p

, and

end (σ_{1}^{″}) = end (r^{'}) + p (σ_{1}^{″}) = end (r^{'}) + p (σ_{k^{'}}^{'}) + p = end (σ_{k^{'}}^{'}) + p .

Consider also the repeats

σ_{k^{″}}^{″}

and

σ_{1}^{‴}

. Let

u_{0}^{'}

and

u_{0}^{″}

be left and right copies of

σ_{k^{″}}^{″}

, and

u_{1}^{'}

and

u_{1}^{″}

be left and right copies of

σ_{1}^{‴}

. Consider in

r^{'}

the prefix

x^{'}

of length p, which is a cyclic root of

r^{'}

. Since

x^{'}

is a prefix of

u_{0}^{'}

, we have in

u_{0}^{″}

the prefix

x^{″}

of length p, which is a cyclic root of

r^{″}

and is equal to

x^{'}

. Since

x^{'}

is a prefix of

u_{1}^{'}

, we have also in

u_{1}^{″}

the prefix

x^{‴}

of length p, which is a cyclic root of

r^{″}

and is equal to

x^{'}

. Thus,

x^{″}

and

x^{‴}

are two equal cyclic roots of

r^{″}

. Note that

x^{‴}

has to be in

r^{'}

the leftmost cyclic root to the right of

x^{″}

, so, by Proposition 3,

beg (x^{‴}) = beg (x^{″}) + p

. Therefore,

p (σ_{1}^{‴}) = beg (x^{‴}) - beg (u_{1}^{'}) = beg (x^{‴}) - beg (u_{0}^{'}) = beg (x^{″}) + p - beg (u_{0}^{'}) = p (σ_{k^{″}}^{″}) + p .

Now, we can join all the repeats represented by the pair of repetitions

r^{'}, r^{″}

into the sequence of repeats

Ψ = {\hat{σ}}_{1}, {\hat{σ}}_{2}, \dots, {\hat{σ}}_{k^{'} + k^{″} + k^{‴}}

, where the repeats of the first type, the repeats of the second type and the repeats of the third type are inserted consecutively, i.e.,

{\hat{σ}}_{j} = σ_{j}^{'}

for

j = 1, 2, \dots, k^{'}

,

{\hat{σ}}_{k^{'} + j} = σ_{j}^{″}

for

j = 1, 2, \dots, k^{″}

, and

{\hat{σ}}_{k^{'} + k^{″} + j} = σ_{j}^{‴}

for

j = 1, 2, \dots, k^{‴}

. From the above observations, we have that

p ({\hat{σ}}_{j + 1}) = p ({\hat{σ}}_{j}) + p

for

j = 1, 2, \dots, k^{'} + k^{″} + k^{‴} - 1

,

beg ({\hat{σ}}_{j + 1}) < beg ({\hat{σ}}_{j})

for

j = 1, 2, \dots, k^{'}

,

beg ({\hat{σ}}_{j}) = beg (r^{'})

for

j = k^{'} + 1, k^{'} + 2, \dots, k^{'} + k^{″} + k^{‴}

,

end ({\hat{σ}}_{j + 1}) > end ({\hat{σ}}_{j})

for

j = 1, 2, \dots, k^{'} + k^{″}

, and

end ({\hat{σ}}_{j}) = end (r^{″})

for

j = k^{'} + k^{″} + 1, k^{'} + k^{″} + 2, \dots, k^{'} + k^{″} + k^{‴}

. Note also that

end (r^{″}) = end ({\hat{σ}}_{k^{'} + k^{″} + 1}) \leq end ({\hat{σ}}_{k^{'} + k^{″}}) + p

. Since

p ({\hat{σ}}_{j^{'}}) < p ({\hat{σ}}_{j^{″}})

for

j^{'} < j^{″}

, any repeat

{\hat{σ}}_{j}

cannot be covered by repeats from

Ψ

with greater indexes. Moreover, since

| {\hat{σ}}_{j^{'}} | < | {\hat{σ}}_{j^{″}} |

for

j^{'} < j^{″} \leq k^{'} + k^{″} + 1

, any repeat

{\hat{σ}}_{j}

for

j \leq k^{'} + k^{″} + 1

cannot be covered by repeats from

Ψ

with smaller indexes. Thus, any repeat

{\hat{σ}}_{j}

for

j \leq k^{'} + k^{″} + 1

cannot be covered by other repeats from

Ψ

. On the other hand, all repeats

{\hat{σ}}_{j}

for

j > k^{'} + k^{″} + 1

that are actually nondominating repeats of the third type are covered by the repeat

{\hat{σ}}_{k^{'} + k^{″} + 1}

, which is actually the dominating repeat of the third type. Thus, we have the following fact.

Proposition 16.

A maximal repeat σ represented by a left pair of maximal repetitions is covered by another repeat represented by the same pair of repetitions if and only if σ is a nondominating repeat of the third type.

From the left pair of maximal repetitions

r^{'}, r^{″}

, one can compute effectively all maximal

α

-gapped periodic repeats represented by the pair

r^{'}, r^{″}

.

Lemma 3.

Let

r^{'}, r^{″}

be a left pair of maximal repetitions with the same cyclic roots such that

p (r^{'}) = p (r^{″})

,

beg (r^{'})

,

beg (r^{″})

,

end (r^{'})

,

end (r^{″})

,

a (r^{'})

,

a (r^{″})

be known. Then, all maximal α-gapped periodic repeats represented by the pair

r^{'}, r^{″}

can be computed in time

O (1 + s)

, where s is the number of the maximal α-gapped periodic repeats represented by the pair

r^{'}, r^{″}

.

Proof.

First, we compute in constant time a sequence

Ψ

of repeats, where, by computing

Ψ

, we mean computing formulas by which any repeat

{\hat{σ}}_{j}

from

Ψ

can be computed in constant time. Since any maximal repeat

σ

is defined uniquely by the values

p (σ)

,

beg (σ)

and

end (σ)

, we will compute actually for any repeat

{\hat{σ}}_{j}

from

Ψ

the values

p ({\hat{σ}}_{j})

,

beg ({\hat{σ}}_{j})

and

end ({\hat{σ}}_{j})

.

Let p be the minimal period of

r^{'}

and

r^{″}

, and

x^{″}

be the prefix cyclic root of

r^{″}

. Denote by

f^{'}

the starting position of the leftmost cyclic root of

r^{'}

, which is equal to

x^{″}

and is not a prefix of

r^{'}

. Taking into account Proposition 4, it can be checked that

f^{'} = \{\begin{matrix} beg (r^{'}) + a (r^{'}) - a (r^{″}) if a (r^{'}) > a (r^{″}); \\ beg (r^{'}) + a (r^{'}) - a (r^{″}) + p if a (r^{'}) \leq a (r^{″}) . \end{matrix}

Note that

f^{'}

has to be actually the starting position of the repeat

{\hat{σ}}_{k^{'}}

from

Ψ

, and in this case,

c ({\hat{σ}}_{k^{'}}) = end (r^{'}) - f^{'} + 1

. Thus, if

end (r^{'}) - f^{'} + 1 < 3 p

, we can conclude that

k^{'} = 0

; otherwise,

k^{'} > 0

.

Let

end (r^{'}) - f^{'} + 1 \geq 3 p

, i.e.,

k^{'} > 0

. Denote by

f_{1}

the starting position of

{\hat{σ}}_{1}

. Note that

f_{1}

is the starting position of the rightmost cyclic root of

r^{'}

, which is equal to

x^{″}

, and such that

c ({\hat{σ}}_{1}) = end (r^{'}) - f_{1} + 1 \geq 3 p

. Thus, taking into account Proposition 3, we have that

f_{1}

is the greatest number such that

f_{1} - f^{'}

is divisible by p and

f_{1} \leq end (r^{'}) - 3 p + 1

. Note that

f_{1}

can be computed in constant time. Then,

beg ({\hat{σ}}_{1}) = f_{1}

,

p ({\hat{σ}}_{1}) = beg (r^{″}) - f_{1}

and

end ({\hat{σ}}_{1}) = end (r^{'}) + p ({\hat{σ}}_{1})

. Note also that

f^{'} = beg ({\hat{σ}}_{k^{'}}) = beg ({\hat{σ}}_{1}) - (k^{'} - 1) p = f_{1} - (k^{'} - 1) p,

so

k^{'} = \frac{f_{1} - f^{'}}{p} + 1,

(1)

and for any

j = 1, 2, \dots, k^{'}

, we have

beg ({\hat{σ}}_{j}) = beg ({\hat{σ}}_{1}) - p (j - 1)

,

end ({\hat{σ}}_{j}) = end ({\hat{σ}}_{1}) + p (j - 1)

,

p ({\hat{σ}}_{j}) = p ({\hat{σ}}_{1}) + p (j - 1)

and

c ({\hat{σ}}_{j}) = c ({\hat{σ}}_{1}) + p (j - 1)

. Denote

\hat{k} = k^{'} + k^{″}

. As shown above,

\hat{k}

is the greatest number such that

end ({\hat{σ}}_{1}) + p (\hat{k} - 1) < end (r^{″})

, so

\hat{k} = ⌊ \frac{1}{p} (end (r^{″}) - end ({\hat{σ}}_{1}) - 1) ⌋ + 1,

and

k^{″} = \hat{k} - k^{'}

. If

k^{″} > 0

, from the above observations for

j = 1, 2, \dots, k^{″}

, we have

beg ({\hat{σ}}_{k^{'} + j}) = beg (r^{'})

,

end ({\hat{σ}}_{k^{'} + j}) = end ({\hat{σ}}_{k^{'}}) + j p

,

p ({\hat{σ}}_{k^{'} + j}) = p ({\hat{σ}}_{k^{'}}) + j p

and

c ({\hat{σ}}_{k^{'} + j}) = | r^{'} |

. Moreover, the repeat

{\hat{σ}}_{\hat{k} + 1}

from

Ψ

has to satisfy the conditions

beg ({\hat{σ}}_{\hat{k} + 1}) = beg (r^{'})

,

end ({\hat{σ}}_{\hat{k} + 1}) = end (r^{″})

,

p ({\hat{σ}}_{\hat{k} + 1}) = p ({\hat{σ}}_{\hat{k}}) + p

. Note that in this case,

c ({\hat{σ}}_{{\hat{k}}^{'} + 1}) = | {\hat{σ}}_{{\hat{k}}^{'} + 1} | - p ({\hat{σ}}_{{\hat{k}}^{'} + 1}) = end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{\hat{k}}) - p .

Thus, if

end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{\hat{k}}) - p < 3 p,

we conclude that

k^{‴} = 0

; otherwise,

k^{‴} > 0

. Let

k^{‴} > 0

. Note that

k^{‴}

is the greatest number such that

c ({\hat{σ}}_{\hat{k} + k^{‴}}) = c ({\hat{σ}}_{{\hat{k}}^{'} + 1}) - p (k^{‴} - 1) \geq 3 p,

i.e.,

end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{\hat{k}}) - p k^{‴} \geq 3 p .

Thus,

k^{‴} = ⌊ \frac{1}{p} (end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{\hat{k}})) ⌋ - 3

, and for any

j = 1, 2, \dots, k^{‴}

, we have

beg ({\hat{σ}}_{\hat{k} + j}) = beg (r^{'})

,

end ({\hat{σ}}_{\hat{k} + j}) = end (r^{″})

,

p ({\hat{σ}}_{\hat{k} + j}) = p ({\hat{σ}}_{\hat{k} + 1}) + (j - 1) p

and

c ({\hat{σ}}_{\hat{k} + j}) = c ({\hat{σ}}_{\hat{k} + 1}) - (j - 1) p

.

Now, consider the case

k^{'} = 0

, i.e.,

end (r^{'}) - f^{'} + 1 < 3 p

and the pair

r^{'}, r^{″}

represents no repeats of the first type. Since all repeats of the second or third type represented by the pair

r^{'}, r^{″}

must have starting position

beg (r^{'})

, the equality

beg ({\hat{σ}}_{1}) = beg (r^{'})

must be held for

{\hat{σ}}_{1}

. Denote by

{\hat{σ}}_{0}

the repeat with the starting position

f^{'}

and the period

beg (r^{″}) - f^{'}

. We proved above that

p ({\hat{σ}}_{k^{'} + 1}) = p ({\hat{σ}}_{k^{'}}) + p

. In the same way, one can prove that

p ({\hat{σ}}_{1})

must be equal to

p ({\hat{σ}}_{0}) + p = beg (r^{″}) - f^{'} + p

. In this case, we obtain that

c ({\hat{σ}}_{1}) = \min (| r^{'} |, | r^{″} | + f^{'} - beg (r^{'}) - p)

. Therefore, if

\min (| r^{'} |, | r^{″} | + f^{'} - beg (r^{'}) - p) < 3 p

, we conclude that

Ψ

is empty. Let

\min (| r^{'} |, | r^{″} | + f^{'} - beg (r^{'}) - p) \geq 3 p

. Note that

f^{'} \leq beg (r^{'}) + p

, so

end (r^{'}) - f^{'} + 1 \geq | r^{'} | - p

. Thus,

| r^{'} | - p < 3 p

. Note also that

c ({\hat{σ}}_{k^{″} + 2})

has to be equal to

c ({\hat{σ}}_{k^{″} + 1}) - p \leq | r^{'} | - p

, so

c ({\hat{σ}}_{k^{″} + 2})

has to be less than

3 p

. Thus, we have

k^{‴} \leq 1

. First, consider the case

| r^{'} | < | r^{″} | + f^{'} - beg (r^{'}) - p

, i.e.,

{\hat{σ}}_{1}

is a repeat of the second type, so

end ({\hat{σ}}_{1}) = end (r^{'}) + p ({\hat{σ}}_{1})

. Then,

k^{″}

can be computed in constant time as the greatest number such that

end ({\hat{σ}}_{1}) + (k^{″} - 1) p < end (r^{″})

, and for any

j = 1, 2, \dots, k^{″}

, we have

beg ({\hat{σ}}_{j}) = beg (r^{'})

,

end ({\hat{σ}}_{j}) = end ({\hat{σ}}_{1}) + (j - 1) p

,

p ({\hat{σ}}_{j}) = p ({\hat{σ}}_{1}) + (j - 1) p

and

c ({\hat{σ}}_{j}) = | r^{'} |

. Moreover, we obtain that

{\hat{σ}}_{k^{″} + 1}

must satisfy the conditions

beg ({\hat{σ}}_{k^{″} + 1}) = beg (r^{'})

,

end ({\hat{σ}}_{k^{″} + 1}) = end (r^{″})

and

p ({\hat{σ}}_{k^{″} + 1}) = p ({\hat{σ}}_{k^{″}}) + p

, so

c ({\hat{σ}}_{k^{″} + 1}) = | {\hat{σ}}_{k^{″} + 1} | - p ({\hat{σ}}_{k^{″} + 1}) = end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{k^{″}}) - p .

Thus, if

end (r^{″}) - beg (r^{'}) + 1 - p ({\hat{σ}}_{k^{″}}) - p < 3 p

, we conclude that

k^{‴} = 0

. Otherwise, taking into account

k^{‴} \leq 1

, we obtain that

k^{‴} = 1

and

beg ({\hat{σ}}_{k^{″} + 1}) = beg (r^{'})

,

end ({\hat{σ}}_{k^{″} + 1}) = end (r^{″})

and

p ({\hat{σ}}_{k^{″} + 1}) = p ({\hat{σ}}_{k^{″}}) + p

. Finally, consider the case

| r^{'} | \geq | r^{″} | + f^{'} - beg (r^{'}) - p

, i.e.,

k^{″} = 0

and

{\hat{σ}}_{1}

is a repeat of the third type, so

end ({\hat{σ}}_{1}) = end (r^{″})

. Taking into account

k^{‴} \leq 1

, we obtain in this case that

{\hat{σ}}_{1}

is a unique repeat in

Ψ

. Thus, in any case,

Ψ

can be computed in constant time.

Now, we need to select from

Ψ

all

α

-gapped repeats, i.e., all gapped repeats

\hat{σ}

such that

\frac{p (\hat{σ})}{c (\hat{σ})} \leq α

. If

\hat{k} = 0

, i.e.,

k^{'} = k^{″} = 0

, as shown above, we have a trivial case when

Ψ

contains no more than one repeat. Thus, further, we assume that

\hat{k} > 0

. First, we consider the case when

{\hat{σ}}_{1}

is a gapped repeat. Note that for any repeats

{\hat{σ}}_{j}

and

{\hat{σ}}_{j + 1}

from

Ψ

, the ending position of the left copy of

{\hat{σ}}_{j + 1}

is not greater than the ending position of the left copy of

{\hat{σ}}_{j}

and the starting position of the right copy of

{\hat{σ}}_{j + 1}

is not less than the right copy of

{\hat{σ}}_{j}

. Thus, if

{\hat{σ}}_{j}

is a gapped repeat, then all repeats

{\hat{σ}}_{l}

for

l > j

are also gapped repeats. Therefore, in this case, all repeats from

Ψ

are gapped repeats. Further, we note that for any j such that

k^{'} < j < k^{'} + k^{″} + k^{‴}

, we have

c ({\hat{σ}}_{j}) \geq c ({\hat{σ}}_{j + 1})

and

p ({\hat{σ}}_{j}) < p ({\hat{σ}}_{j + 1})

, so

\frac{p ({\hat{σ}}_{k^{'} + 1})}{c ({\hat{σ}}_{k^{'} + 1})} < \frac{p ({\hat{σ}}_{k^{'} + 2})}{c ({\hat{σ}}_{k^{'} + 2})} < \dots < \frac{p ({\hat{σ}}_{k^{'} + k^{″} + k^{‴}})}{c ({\hat{σ}}_{k^{'} + k^{″} + k^{‴}})} .

(2)

Now, consider a repeat

{\hat{σ}}_{j}

from

Ψ

such that

1 \leq j < k^{'}

. Since

{\hat{σ}}_{j}

is a gapped repeat, we have

c ({\hat{σ}}_{j}) < p ({\hat{σ}}_{j})

and, as shown above,

c ({\hat{σ}}_{j + 1}) = c ({\hat{σ}}_{j}) + p

,

p ({\hat{σ}}_{j + 1}) = p ({\hat{σ}}_{j}) + p

. Hence,

\frac{p ({\hat{σ}}_{j + 1})}{c ({\hat{σ}}_{j + 1})} = \frac{p ({\hat{σ}}_{j}) + p}{c ({\hat{σ}}_{j}) + p} < \frac{p ({\hat{σ}}_{j})}{c ({\hat{σ}}_{j})} .

Thus, we have

\frac{p ({\hat{σ}}_{1})}{c ({\hat{σ}}_{1})} > \frac{p ({\hat{σ}}_{2})}{c ({\hat{σ}}_{2})} > \dots > \frac{p ({\hat{σ}}_{k^{'}})}{c ({\hat{σ}}_{k^{'}})} .

(3)

From inequalities (2) and (3), we conclude that all the

α

-gapped repeats from

Ψ

have to form a continuous segment

{\hat{σ}}_{l}, {\hat{σ}}_{l + 1}, \dots, {\hat{σ}}_{m}

in

Ψ

. Thus, to efficiently find all these repeats, we need to compute the indexes l and m.

First, we compute l. Let

k^{'} > 0

and

\frac{p ({\hat{σ}}_{1})}{c ({\hat{σ}}_{1})} \leq α

. Then, we obviously obtain

l = 1

.

Now, let

k^{'} > 0

and

\frac{p ({\hat{σ}}_{1})}{c ({\hat{σ}}_{1})} > α \geq \frac{p ({\hat{σ}}_{k^{'}})}{c ({\hat{σ}}_{k^{'}})}

. Then, taking into account (3), we have that l is the smallest number such that

\frac{p ({\hat{σ}}_{l})}{c ({\hat{σ}}_{l})} \leq α

. Since this number cannot be greater than

k^{'}

, as shown above, we have also

p ({\hat{σ}}_{l}) = p ({\hat{σ}}_{1}) + p (l - 1)

and

c ({\hat{σ}}_{l}) = c ({\hat{σ}}_{1}) + p (l - 1)

. Thus, l is the smallest number such that

\frac{p ({\hat{σ}}_{1}) + p (l - 1)}{c ({\hat{σ}}_{1}) + p (l - 1)} \leq α

. It can be easily computed that

l = 1 + ⌈\frac{p ({\hat{σ}}_{1}) - α c ({\hat{σ}}_{1})}{p (α - 1)}⌉ .

Now, let

k^{'} = 0

or

α < \frac{p ({\hat{σ}}_{k^{'}})}{c ({\hat{σ}}_{k^{'}})}

. In this case, if there exists the repeat

{\hat{σ}}_{k^{'} + 1}

in

Ψ

and

\frac{p ({\hat{σ}}_{k^{'} + 1})}{c ({\hat{σ}}_{k^{'} + 1})} \leq α

, then we have

l = k^{'} + 1

; otherwise, from (2), we obtain that there are no

α

-gapped repeats in

Ψ

.

Now, we compute m. If

\frac{p ({\hat{σ}}_{\hat{k} + k^{‴}})}{c ({\hat{σ}}_{\hat{k} + k^{‴}})} \leq α

, then we obviously have

m = \hat{k} + k^{‴}

. Thus, further, we assume that

\frac{p ({\hat{σ}}_{\hat{k} + k^{‴}})}{c ({\hat{σ}}_{\hat{k} + k^{‴}})} > α

. Let

k^{‴} > 0

and

\frac{p ({\hat{σ}}_{\hat{k} + 1})}{c ({\hat{σ}}_{\hat{k} + 1})} \leq α < \frac{p ({\hat{σ}}_{\hat{k} + k^{‴}})}{c ({\hat{σ}}_{\hat{k} + k^{‴}})}

. Then, taking into account (2), we have that m is equal to the greatest number

\hat{k} + m^{'}

for

k^{‴} > m^{'} \geq 1

such that

\frac{p ({\hat{σ}}_{\hat{k} + m^{'}})}{c ({\hat{σ}}_{\hat{k} + m^{'}})} \leq α

. As shown above, we have also

p ({\hat{σ}}_{\hat{k} + m^{'}}) = p ({\hat{σ}}_{\hat{k} + 1}) + (m^{'} - 1) p

and

c ({\hat{σ}}_{\hat{k} + m^{'}}) = c ({\hat{σ}}_{\hat{k} + 1}) - (m^{'} - 1) p

. Thus, m is equal to the greatest number

\hat{k} + m^{'}

for

k^{‴} > m^{'} \geq 1

such that

\frac{p ({\hat{σ}}_{\hat{k} + 1}) + (m^{'} - 1) p}{c ({\hat{σ}}_{\hat{k} + 1}) - (m^{'} - 1) p} \leq α

. It can be easily computed that

m = \hat{k} + 1 + ⌊\frac{α c ({\hat{σ}}_{\hat{k} + 1}) - p ({\hat{σ}}_{\hat{k} + 1})}{p (α + 1)}⌋ .

Now, let

k^{‴} > 0

and

\frac{p ({\hat{σ}}_{\hat{k}})}{c ({\hat{σ}}_{\hat{k}})} \leq α < \frac{p ({\hat{σ}}_{\hat{k} + 1})}{c ({\hat{σ}}_{\hat{k} + 1})}

. Then, we have obviously that

m = \hat{k}

.

Now, we can assume that

α < \frac{p ({\hat{σ}}_{\hat{k}})}{c ({\hat{σ}}_{\hat{k}})}

and, if

k^{‴} > 0

,

α < \frac{p ({\hat{σ}}_{\hat{k} + 1})}{c ({\hat{σ}}_{\hat{k} + 1})}

. Let

k^{″} > 0

and

\frac{p ({\hat{σ}}_{k^{'} + 1})}{c ({\hat{σ}}_{k^{'} + 1})} \leq α < \frac{p ({\hat{σ}}_{\hat{k}})}{c ({\hat{σ}}_{\hat{k}})}

. Then, taking into account (2), we have that m is equal to the greatest number

k^{'} + m^{″}

for

k^{″} > m^{″} \geq 1

such that

\frac{p ({\hat{σ}}_{k^{'} + m^{″}})}{c ({\hat{σ}}_{k^{'} + m^{″}})} \leq α

. From the above observations, we obtain that

p ({\hat{σ}}_{k^{'} + m^{″}}) = p ({\hat{σ}}_{k^{'} + 1}) + (m^{″} - 1) p

and

c ({\hat{σ}}_{k^{'} + m^{″}}) = | r^{'} |

. Thus, m is equal to the greatest number

k^{'} + m^{″}

for

k^{″} > m^{″} \geq 1

such that

\frac{p ({\hat{σ}}_{k^{'} + 1}) + (m^{″} - 1) p}{| r^{'} |} \leq α

. It can be easily computed that

m = k^{'} + 1 + ⌊\frac{α | r^{'} | - p ({\hat{σ}}_{k^{'} + 1})}{p}⌋ .

Now, we assume additionally that, if

k^{″} > 0

,

α < \frac{p ({\hat{σ}}_{k^{'} + 1})}{c ({\hat{σ}}_{k^{'} + 1})}

In this case, if

k^{'} > 0

and

α \geq \frac{p ({\hat{σ}}_{k^{'}})}{c ({\hat{σ}}_{k^{'}})}

, then

m = k^{'}

; otherwise, we can conclude that

Ψ

has no

α

-gapped repeats. Thus, l and m can be computed in constant time, so all

α

-gapped repeats in

Ψ

can be computed in the required time.

Finally, we consider the case when

{\hat{σ}}_{1}

is an overlapped repeat. Denote for any

j = 1, 2, \dots, k^{'} + \hat{k}

by

{\hat{u}}_{j}^{'}

and

{\hat{u}}_{j}^{″}

the left and right copies of the repeat

{\hat{σ}}_{j}

. Let

k^{'} > 0

, i.e.,

{\hat{σ}}_{1}

is a repeat of the first type. Note that for any repeat

{\hat{σ}}_{j}

of the first type, we have

end ({\hat{u}}_{j}^{'}) = end ({\hat{u}}_{1}^{'})

and

beg ({\hat{u}}_{j}^{″}) = beg ({\hat{u}}_{1}^{″})

. Therefore, if

{\hat{σ}}_{1}

is an overlapped repeat, then any repeat

{\hat{σ}}_{j}

of the first type is also an overlapped repeat. Thus, in the considered case, we obtain that all repeats of the first type from

Ψ

are overlapped repeats, i.e., only repeats of the second or third types from

Ψ

can be gapped repeats. Now, consider in

Ψ

a repeat

{\hat{σ}}_{j}

such that

j \geq k^{'} + 2

. As shown above, we have

beg ({\hat{u}}_{j}^{″}) \geq beg ({\hat{u}}_{k^{'} + 2}^{″}) = beg ({\hat{u}}_{k^{'} + 1}^{″}) + p \geq beg (r^{″}) + p .

On the other hand, we have

end ({\hat{u}}_{j}^{'}) \leq end (r^{'})

. By Proposition 5, the overlap of repetitions

r^{'}

and

r^{″}

is less than p, so

end (r^{'}) < beg (r^{″}) + (p - 1)

. Thus, we obtain that

beg ({\hat{u}}_{j}^{″}) \geq beg (r^{″}) + p

and

end ({\hat{u}}_{j}^{'}) < beg (r^{″}) + (p - 1)

, so

{\hat{σ}}_{j}

is a gapped repeat. Hence, all repeats

{\hat{σ}}_{j}

from

Ψ

such that

j \geq k^{'} + 2

are gapped repeats. Thus, all gapped repeats from

Ψ

are repeats

{\hat{σ}}_{l}, {\hat{σ}}_{l + 1}, \dots, {\hat{σ}}_{k^{'} + \hat{k}}

, where

l = k^{'} + 1

if

{\hat{σ}}_{k^{'} + 1}

is a gapped repeat and

l = k^{'} + 2

otherwise. Since one can check in constant time if the repeat

{\hat{σ}}_{k^{'} + 1}

is gapped, the value l can be computed in constant time. Taking into account inequalities (2), we conclude that

\frac{p ({\hat{σ}}_{l})}{c ({\hat{σ}}_{l})} < \frac{p ({\hat{σ}}_{l + 1})}{c ({\hat{σ}}_{l + 1})} < \dots < \frac{p ({\hat{σ}}_{k^{'} + \hat{k}})}{c ({\hat{σ}}_{k^{'} + \hat{k}})} .

Thus, if

\frac{p ({\hat{σ}}_{l})}{c ({\hat{σ}}_{l})} > α

, then there are no

α

-gapped repeats in

Ψ

; otherwise, all

α

-gapped repeats in

Ψ

form a continuous segment

{\hat{σ}}_{l}, {\hat{σ}}_{l + 1}, \dots, {\hat{σ}}_{m}

, where the index m can be computed in constant time, analogously to the case considered above, when

{\hat{σ}}_{1}

is a gapped repeat. Therefore, all

α

-gapped repeats in

Ψ

can be computed in the required time in any case. □

In a similar way, from the left pair of maximal repetitions

r^{'}, r^{″}

, one can compute effectively all maximal

α

-gapped nondominating repeats of the third type represented by the pair

r^{'}, r^{″}

.

Lemma 4.

Let

r^{'}, r^{″}

be a left pair of maximal repetitions with the same cyclic roots such that

p (r^{'}) = p (r^{″})

,

beg (r^{'})

,

beg (r^{″})

,

end (r^{'})

,

end (r^{″})

,

a (r^{'})

,

a (r^{″})

be known. Then, all maximal α-gapped nondominating repeats of the third type represented by the pair

r^{'}, r^{″}

can be computed in time

O (1 + s)

, where s is the number of the maximal α-gapped nondominating repeats of the third type represented by the pair

r^{'}, r^{″}

.

Proof.

According to the proof of the previous lemma, all maximal

α

-gapped periodic repeats represented by the pair

r^{'}, r^{″}

form a continuous segment

{\hat{σ}}_{l}, {\hat{σ}}_{l + 1}, \dots, {\hat{σ}}_{m}

in

Ψ

, where the indexes

l, m

can be computed in constant time. Note that a repeat

{\hat{σ}}_{j}

from

Ψ

is a nondominating repeat of the third type if and only if

j > k^{'} + k^{″} + 1

. Thus, in order to find all maximal

α

-gapped nondominating repeats of the third type represented by the pair

r^{'}, r^{″}

, one needs to select from repeats

{\hat{σ}}_{l}, {\hat{σ}}_{l + 1}, \dots, {\hat{σ}}_{m}

all repeats

{\hat{σ}}_{j}

such that

j > k^{'} + k^{″} + 1

. This can be done obviously in time

O (1 + s)

, where s is the number of the selected repeats. □

Let

r^{'}, r^{″}

be a left pair of maximal repetitions representing some

α

-gapped repeat

σ

with left and right copies

u^{'}

and

u^{″}

and gap v. Note that the distance

beg (r^{″}) - (end (r^{'}) + 1)

between the repetitions

r^{'}

and

r^{″}

cannot be greater than the gap length

| v |

, which is not greater than

(α - 1) | u^{'} | \leq (α - 1) | r^{'} |

. Thus,

beg (r^{″}) - (end (r^{'}) + 1) \leq (α - 1) | r^{'} |

, so

beg (r^{″}) - beg (r^{'}) = (beg (r^{″}) - (end (r^{'}) + 1)) + | r^{'} | \leq (α - 1) | r^{'} | + | r^{'} | = α | r^{'} | .

If

beg (r^{″}) - beg (r^{'}) \leq α | r^{'} |

, we will say that the repetition

r^{″}

is

α

-close to the repetition

r^{'}

from the right. Thus, we obtain the following fact.

Proposition 17.

If a left pair

r^{'}, r^{″}

of maximal repetitions represents at least one maximal α-gapped repeat, then

r^{″}

is α-close to

r^{'}

from the right.

We also use the following proposition.

Proposition 18.

Any maximal repetition

r^{'}

has no more than

2 α

maximal repetitions that are α-close to

r^{'}

from the right.

Proof.

Let

r_{1}^{″}, r_{2}^{″}, \dots, r_{s}^{″}

be all maximal repetitions that are

α

-close to

r^{'}

from the right in the increasing order of their starting positions:

beg (r^{'}) < beg (r_{1}^{″}) < beg (r_{2}^{″}) < \dots < beg (r_{s}^{″}) .

For convenience, denote

r^{'}

by

r_{0}^{″}

and consider two consecutive repeats

r_{j}^{″}

and

r_{j + 1}^{″}

for

j = 0, 1, \dots, s - 1

. Since

p (r_{j}^{″}) = p (r_{j + 1}^{″}) = p (r^{'})

, by Proposition 5, the overlap of

r_{j}^{″}

and

r_{j + 1}^{″}

is less than

p (r^{'})

, so

beg (r_{j + 1}^{″}) > beg (r_{j}^{″}) + (| r_{j}^{″} | - p (r^{'})) \geq beg (r_{j}^{″}) + (| r^{'} | - p (r^{'})) \geq beg (r_{j}^{″}) + | r^{'} | / 2 .

Thus, we obtain that

beg (r_{0}^{″}) + s | r^{'} | / 2 < beg (r_{s}^{″}) \leq beg (r^{'}) + α | r^{'} | = beg (r_{0}^{″}) + α | r^{'} |,

so

s | r^{'} | / 2 < α | r^{'} |

, i.e.,

s < 2 α

. □

Further, by computing a periodic repeat, we will mean that the minimal period of copies of this repeat is additionally computed. Now, we show how all left birepresented maximal

α

-gapped periodic repeats in a word can be effectively computed.

Lemma 5.

All left birepresented maximal α-gapped periodic repeats in w can be computed in time

O (n α)

.

Proof.

First, we find all maximal repetitions in w. According to Theorem 1, this can be done in

O (n)

time, and the total number of these repetitions is

O (n)

. For each maximal repetition r in w, we compute the values

p (r)

,

beg (r)

,

end (r)

,

a (r)

. Moreover, we divide all maximal repetitions in w into subsets of all repetitions with the same cyclic roots, and represent each of these subsets as a double-linked list

P R S R_{j}

of the subset repetitions in the order of their starting positions. According to [17], this can be done in

O (n)

time. We also rearrange the repetitions in each list

P R S R_{j}

in the order of nondecreasing length (the repetitions of the same length are arranged in the order of increasing starting position). The rearranged list

P R S R_{j}

will be denoted by

L R S R_{j}

. Using bucket sorting, all the lists

L R S R_{j}

can be computed from the lists

P R S R_{j}

in total

O (n)

time.

For each j, we compute separately all maximal

α

-gapped periodic repeats represented by left pairs of repetitions from

P R S R_{j}

. The computation is performed as follows. We consider consecutively all repetitions from

L R S R_{j}

. For each repetition

r^{'}

from

L R S R_{j}

, we compute all maximal

α

-gapped periodic repeats represented by left pairs

r^{'}, r^{″}

of repetitions where

r^{″}

is a repetition from

P R S R_{j}

. Before the computation, we assume that all repetitions that precede the repetition

r^{'}

in the list

L R S R_{j}

are already removed from the current list

P R S R_{j}

. Note that, in order to compute the required repeats for the repetition

r^{'}

, we need to consider all the repetitions

r^{″}

from

P R S R_{j}

such that the pair of maximal repetitions

r^{'}, r^{″}

is left and represents at least one maximal

α

-gapped periodic repeat. According to Proposition 17, such repetitions

r^{″}

have to be

α

-close to

r^{'}

from the right. Thus, we actually need to consider only repetitions

r^{″}

from

P R S R_{j}

that are

α

-close to

r^{'}

from the right. Note that, since the lengths of all these repetitions are not less than the length of

r^{'}

and the starting positions of all these repetitions are greater than the starting position of

r^{'}

, all these repetitions follow

r^{'}

in the list

L R S R_{j}

, so they are presented in the current list

P R S R_{j}

. Moreover, they follow

r^{'}

in

P R S R_{j}

. Recall that the current list

P R S R_{j}

contains no repetitions of length less than the length of

r^{'}

. Thus, in the current list

P R S R_{j}

between the repetition

r^{'}

and the repetitions that are

α

-close to

r^{'}

from the right, there are no any other repetitions. Thus, all the repetitions that are

α

-close to

r^{'}

from the right form in the list

P R S R_{j} = (r_{1}^{″}, r_{2}^{″}, \dots, r_{q}^{″})

some continuous segment

(r_{l}^{″}, r_{l + 1}^{″}, \dots, r_{m}^{″})

, which follows immediately

r^{'}

, i.e.,

r^{'} = r_{l - 1}

. Therefore, proceeding from the repetition

r^{'}

, each of the repetitions that are

α

-close to

r^{'}

from the right can be found in the current list

P R S R_{j}

in constant time. After finding each repetition

r^{″}

from these repetitions, we compute all maximal

α

-gapped periodic repeats represented by the left pair

r^{'}, r^{″}

. According to Lemma 3, it can be done in time

O (1 + s)

, where s is the number of computed repeats. The minimal periods of copies of all these computed repeats are defined as the minimal period of

r^{'}

. Thus, the treatment of each of the repetitions that are

α

-close to

r^{'}

from the right can be done in time

O (1 + s)

. Since, according to Lemma 4, the number of maximal repetitions that are

α

-close to

r^{'}

from the right is not greater than

2 α

, the total treatment of all these repetitions can be done in time

O (α + s^{'})

, where

s^{'}

is the total number of computed repeats for all these repetitions. Then, we remove the considered repetition

r^{'}

from the double-linked list

P R S R_{j}

. It can be done in constant time. Thus, the consideration of the repetition

r^{'}

can be performed in time

O (α + s^{'})

. Therefore, since the number of all maximal repetitions is

O (n)

, the total time of considering all maximal repetitions from all lists

L R S R_{j}

is

O (n α + s^{″})

, where

s^{″}

is the total number of computed repeats. Since each computed repeat is a maximal

α

-gapped repeat, the number

s^{″}

is not greater than the total number of maximal

α

-gapped repeats in w, which is

O (n α)

by Theorem 3, so

s^{″} = O (n α)

. Thus, the total time of considering all maximal repetitions from all lists

L R S R_{j}

(which is actually the time of computing all left birepresented maximal

α

-gapped periodic repeats in w) is

s^{″} = O (n α)

. □

Analogously to the proof of Lemma 5, from Lemma 4, we can also prove the following lemma.

Lemma 6.

All left birepresented maximal α-gapped nondominating repeats of the third type in w can be computed in time

O (n α)

.

Now, consider the case of periodic repeats represented by right pairs of maximal repetitions. Let the maximal repeat

σ

be represented by the right pair of maximal repetitions

r^{'}, r^{″}

with the same cyclic roots and the minimal period p. Analogously to the case of repeats represented by left pairs of repetitions, we can show that

σ

can satisfy one of the three following cases:

$end (u^{'}) = end (r^{'})$ , $beg (u^{″}) = beg (r^{″})$ and $end (u^{″}) < end (r^{″})$ ;
$beg (u^{'}) > beg (r^{'})$ , $beg (u^{″}) = beg (r^{″})$ and $end (u^{″}) = end (r^{″})$ ;
$beg (u^{'}) = beg (r^{'})$ and $end (u^{″}) = end (r^{″})$ .

We will call the repeat

σ

a repeat of the first type in case 1, repeat of the second type in case 2, and repeat of the third type in case 3.

Analogously to the case of repeats represented by left pairs of repetitions, we can also show that the starting positions of the right copies of all maximal repeats of the third type represented by the pair

r^{'}, r^{″}

form an arithmetic progression

t^{‴}, t^{‴} + p, t^{‴} + 2 p, \dots, t^{‴} + (k^{‴} - 1) p

, where

k^{‴}

is the number of all maximal repeats of the third type represented by the pair

r^{'}, r^{″}

. We will call the repeat of the third type with the starting position

t^{‴}

of its right copy a dominating repeat; all the other repeats of the third type will be called nondominating. Analogously to Proposition 16, we can prove that a maximal repeat

σ

represented by a right pair of maximal repetitions is covered by another repeat represented by the same pair of repetitions if and only if

σ

is a nondominating repeat of the third type. Taking this into account, together with Proposition 16, we obtain the following fact.

Corollary 1.

A maximal periodic repeat σ represented by a pair of maximal repetitions is covered by another periodic repeat represented by the same pair of repetitions if and only if σ is a nondominating repeat of the third type.

Analogously to Lemmas 5 and 6, we can also prove similar facts for repeats represented by right pairs of maximal repetitions.

Lemma 7.

All right birepresented maximal α-gapped periodic repeats in w can be computed in time

O (n α)

.

Lemma 8.

All right birepresented maximal α-gapped nondominating repeats of the third type in w can be computed in time

O (n α)

.

From Lemmas 6 and 8, we obtain the following corollary.

Corollary 2.

All birepresented maximal α-gapped nondominating repeats of the third type in w can be computed in time

O (n α)

.

Further, we use Corollary 2 in the proposed algorithm for the exclusion of birepresented maximal

a l p h a

-gapped nondominating covered repeats. We show also how all maximal

α

-gapped periodic repeats in a word can be effectively computed.

Lemma 9.

All maximal α-gapped periodic repeats in w can be computed in time

O (n α)

.

Proof.

By Lemmas 5 and 7, we can find in time

O (n α)

all birepresented maximal

α

-gapped periodic repeats in w, so for finding all maximal

α

-gapped periodic repeats in w, we need to compute additionally in w all maximal

α

-gapped periodic repeats represented by maximal repetitions. By Proposition 14, maximal

α

-gapped periodic repeats represented by maximal repetitions are generated by these maximal repetitions. Thus, for finding all these repeats, we can compute initially all maximal

α

-gapped repeats generated by maximal repetitions in w. By Proposition 15, the number of such repeats is

O (n)

, and all these repeats can be computed in

O (n)

time. Note that a maximal repeat

σ

generated by a maximal repetition r is a periodic repeat represented by r if and only if

p (r) \leq c (σ) / 3

. Moreover, in this case,

p (r)

is the minimal period of copies of

σ

. Thus, for each of the computed repeats, we can check in constant time if this repeat is a periodic repeat represented by a maximal repetition and can define in this case the minimal period of copies of this repeat. Thus, all maximal

α

-gapped repeats generated by maximal repetitions in w can be computed in

O (n)

time, so the total time of computing all maximal

α

-gapped repeats generated by maximal repetitions in w is

O (n α)

. □

6. $α$ -Periodic Repeats

Further, we introduce the notion of

α

-periodic repeats, which is crucial for the proposed algorithm. Repeat

σ

is called α-periodic if the minimal period of copies of

σ

is not greater than

\frac{p (σ)}{3 α}

; otherwise,

σ

is called α-nonperiodic. Note that for any

α

-periodic maximal repeat

σ

that is

α

-gapped or overlapped, we have

c (σ) \geq p (σ) / α \geq 3 p

, where p is the minimal period of copies of

σ

. Thus, any

α

-periodic maximal

α

-gapped or overlapped repeat is a periodic repeat. Thus, all

α

-periodic maximal

α

-gapped or overlapped repeats can be classified as repeats represented by single maximal repetitions or repeats represented by pairs of maximal repetitions. We will use the following property of reprincipal repeats.

Proposition 19.

Let r be a maximal repetition such that

e (r) \geq 3

. Then, the principal repeat of r is α-nonperiodic.

Proof.

Let

σ

be the principal repeat of r. Assume that

σ

is

α

-periodic, i.e., the copies of

σ

are repetitions with minimal period

p^{'}

not greater than

\frac{p (σ)}{3 α} = \frac{p (r)}{3 α}

, so

p^{'} < p (r) / 3 < p (r)

. Since

e (r) \geq 3

, i.e.,

| r | \geq 3 p (r)

, we have that the length

c (σ)

of copies of

σ

is not less than

2 p (r) > p (r) + p^{'}

. Therefore, since both

p (r)

and

p^{'}

are periods of copies of

σ

, by the periodicity lemma, we obtain that

\gcd (p (r), p^{'})

is also a period of copies of

σ

. Since

p^{'}

is the minimal period of copies of

σ

, we conclude that

p^{'} = \gcd (p (r), p^{'})

, i.e.,

p^{'}

is a divisor of

p (r)

. In this case, we obtain that

p^{'}

is also a period of r, which contradicts the notion that

p (r)

is the minimal period of r. □

Let

σ

be a

α

-periodic maximal repeat represented by some maximal repetition r. Note that, in this case,

σ

is covered by r, i.e., it is covered by the principal repeat of r. Since the repeat

σ

is a periodic repeat with the minimal period

p (r)

of its copies, i.e., the length of its copies is not less than

3 p (r)

, and r contains copies of

σ

, we have that

e (r) \geq 3

. Thus, we have the following corollary from Proposition 19.

Corollary 3.

Any α-periodic maximal repeat represented by some maximal repetition is covered by the α-nonperiodic principal repeat of this repetition.

Our algorithm is based on the following lemma.

Lemma 10.

Let a maximal α-gapped repeat σ be covered by some α-periodic maximal repeat and not covered by any α-nonperiodic principal repeat of maximal repetition. Then, σ is a periodic birepresented nondominating repeat of the third type.

Proof.

Let

σ

be covered by a

α

-periodic maximal repeat

σ^{'}

. By Proposition 10,

σ^{'}

is a

α

-gapped or overlapped repeat, so

σ^{'}

is a periodic repeat. If

σ^{'}

is represented by a maximal repetition, then, by Corollary 3,

σ^{'}

is covered by the

α

-nonperiodic principal repeat of this repetition, so

σ

is also covered by the

α

-nonperiodic principal repeat of this repetition, which contradicts conditions of the lemma. Thus,

σ^{'}

is represented by a pair of maximal repetitions

r^{'}, r^{″}

. Let

p^{'}

be the minimal period of these repetitions. Note that, by Proposition 11, the left copy of

σ

is contained in the left copy of

σ^{'}

and the right copy of

σ

is contained in the right copy of

σ^{'}

. Hence, the left copy of

σ

is contained in

r^{'}

and the right copy of

σ

is contained in

r^{″}

, so

p^{'}

is a period of copies of

σ

. Moreover, we have that

c (σ) \geq \frac{p (σ)}{α} > \frac{p (σ^{'})}{α} \geq 3 p^{'} .

Thus,

σ

is a periodic repeat. Denote by p the minimal period of copies of

σ

. Let

p < p^{'}

. Since

c (σ) > 3 p^{'} > p^{'} + p

, by the periodicity lemma, we have that

\gcd (p^{'}, p)

is also a period of copies of

σ

, which has to be equal to p. Thus, p is a divisor of

p^{'}

, so, in the case

p < p^{'}

, we obtain that the cyclic roots of repetitions

r^{'}, r^{″}

are not primitive. Therefore,

p^{'} = p

, i.e.,

p^{'}

is the minimal period of copies of

σ

, so

σ

is represented by the pair of repetitions

r^{'}, r^{″}

. Thus, we have that

σ

is a periodic repeat represented by the pair

r^{'}, r^{″}

and is covered by the periodic repeat

σ^{'}

represented by the same pair

r^{'}, r^{″}

. Thus, by Corollary 1,

σ

is a nondominating repeat of the third type. □

We show also that all maximal

α

-gapped

α

-periodic repeats in a word can be computed in the following way.

Lemma 11.

All maximal α-gapped α-periodic repeats in w can be computed in time

O (n α)

.

Proof.

Recall that any maximal

α

-gapped

α

-periodic repeat is a periodic repeat, so, to find all maximal

α

-gapped

α

-periodic repeats, we can compute initially all maximal

α

-gapped periodic repeats in w. By Lemma 9, this can be done in time

O (n α)

. Moreover, by Theorem 3, the number of computed repeats is

O (n α)

. Recall that for each computed repeat, we compute additionally the minimal period of copies of this repeat, so we can check in constant time if this repeat is

α

-periodic. Thus, in

O (n α)

time, we can select from the computed repeats all maximal

α

-gapped

α

-periodic repeats in w. The total time of the proposed procedure for computing the required repeats is

O (n α)

. □

Lemma 11 is used further in the proposed algorithm for the identification of all maximal

α

-gapped

α

-periodic repeats.

Further, we show how to compute effectively all reprincipal

α

-periodic repeats in a word. Note that, as shown before, these repeats are actually maximal overlapped periodic repeats, which can be birepresented or represented by maximal repetitions. Thus, among reprincipal

α

-periodic repeats, we consider separately birepresented repeats and repeats represented by maximal repetitions. We consider initially birepresented reprincipal

α

-periodic repeats. These repeats are maximal overlapped periodic repeats represented by left or right pairs of maximal repetitions. Note that such repeats can be represented only by pairs of overlapped maximal repetitions. First, we consider maximal overlapped periodic repeats represented by left pairs of maximal repetitions.

Lemma 12.

Let

r^{'}, r^{″}

be a left pair of maximal overlapped repetitions with the same cyclic roots such that

p (r^{'}) = p (r^{″})

,

beg (r^{'})

,

beg (r^{″})

,

end (r^{'})

,

end (r^{″})

,

a (r^{'})

,

a (r^{″})

be known. Then, the number of maximal overlapped periodic repeats represented by the pair

r^{'}, r^{″}

is less than

e (r^{'})

, and all these repeats can be computed in time

O (e (r^{'}))

.

Proof.

It is shown in the proof of Lemma 3 that in the set

Ψ

, all repeats

{\hat{σ}}_{j}

such that

j \geq k^{'} + 2

are gapped repeats, so only repeats

{\hat{σ}}_{1}, {\hat{σ}}_{2}, \dots, {\hat{σ}}_{k^{'} + 1}

can be overlapped repeats, and, moreover, all these

k^{'} + 1

repeats can be computed in

O (1 + k^{'})

time. It is also shown that repeats

{\hat{σ}}_{1}, {\hat{σ}}_{2}, \dots, {\hat{σ}}_{k^{'}}

are overlapped if and only if repeat

{\hat{σ}}_{1}

is overlapped. Thus, in constant time, we can select all overlapped repeats from repeats

{\hat{σ}}_{1}, {\hat{σ}}_{2}, \dots, {\hat{σ}}_{k^{'} + 1}

. It follows from Equation (1) that

k^{'} \leq \frac{| r^{'} |}{p (r^{'})} - 2 = e (r^{'}) - 2

. Thus, the number of the selected repeats is not greater than

1 + k^{'} < e (r^{'})

, and the total time of computing these repeats is

O (1 + k^{'}) = O (e (r^{'}))

. □

From Lemma 12, we obtain the following fact.

Lemma 13.

The number of maximal overlapped periodic repeats represented by left pairs of maximal repetitions is

O (n)

, and all these repeats can be computed in

O (n)

time.

Proof.

Analogously to the proof of Lemma 5, first, we find all maximal repetitions in w; for each maximal repetition r in w, we compute the values

p (r)

,

beg (r)

,

end (r)

,

a (r)

, and, moreover, we divide all maximal repetitions in w into subsets of all repetitions with the same cyclic roots and represent each of these subsets as a list

P R S R_{j}

of the subset repetitions in the order of increasing starting position. As shown in the proof of Lemma 5, this can be done in

O (n)

time. Then, we consider each list

P R S R_{j}

. Let

P R S R_{j}

consist of consecutive repetitions

(r_{1}^{(j)}, r_{2}^{(j)}, \dots, r_{q}^{(j)})

. According to Proposition 5, for each repetition

r_{i}^{(j)}

, the overlaps of

r_{i}^{(j)}

with repetitions

r_{i - 1}^{(j)}

and

r_{i + 1}^{(j)}

are less than

p (r_{i}^{(j)})

, so repetitions

r_{i - 1}^{(j)}

and

r_{i + 1}^{(j)}

cannot be overlapped. Thus, only pairs

r_{i}^{(j)}, r_{i + 1}^{(j)}

can be overlapped pairs of repetitions representing maximal overlapped repeats. Therefore, we traverse the list

P R S R_{j}

for finding all overlapped left pairs

r_{i}^{(j)}, r_{i + 1}^{(j)}

of repetitions. Note that the total number of maximal repetitions in w is

O (n)

, so the total time of traversing all lists

P R S R_{j}

is

O (n)

. For each found pair

r_{i}^{(j)}, r_{i + 1}^{(j)}

, we compute all maximal overlapped periodic repeats represented by this pair. By Lemma 12, the number of these repeats is less than

e (r_{i}^{(j)})

, and all these repeats can be computed in time

O (e (r_{i}^{(j)}))

. For the computed repeats, the minimal periods of copies of these repeats are defined as the minimal period of

r_{i}^{(j)}

. Note that for each maximal repetition r in w, there can be only one left pair

r_{i}^{(j)}, r_{i + 1}^{(j)}

of repetitions such that

r \equiv r_{i}^{(j)}

, so the total number of maximal overlapped periodic repeats represented by all left pairs

r_{i}^{(j)}, r_{i + 1}^{(j)}

of repetitions in all lists

P R S R_{j}

is less than

\sum_{r \in R (w)} e (r)

; thus, by Theorem 2, this number is

O (n)

. For the same reason, the total time of computing all maximal overlapped periodic repeats represented by all left pairs

r_{i}^{(j)}, r_{i + 1}^{(j)}

of repetitions in all lists

P R S R_{j}

is

O (\sum_{r \in R (w)} e (r))

, so, by Theorem 2, this time is

O (n)

. Thus, the total time of the considered procedure for computing the required repeats is

O (n)

. □

We can also prove the analogous lemma for right pairs of maximal repetitions.

Lemma 14.

The number of maximal overlapped periodic repeats represented by right pairs of maximal repetitions is

O (n)

, and all these repeats can be computed in

O (n)

time.

From Lemmas 13 and 14, we directly obtain the corollary.

Corollary 4.

The number of maximal overlapped periodic birepresented repeats is

O (n)

, and all these repeats can be computed in

O (n)

time.

Now, we show that all maximal overlapped

α

-periodic birepresented repeats can be easily selected from maximal overlapped periodic birepresented repeats.

Lemma 15.

The number of maximal overlapped α-periodic birepresented repeats is

O (n)

, and all these repeats can be computed in

O (n)

time.

Proof.

Recall that any maximal overlapped

α

-periodic repeat is a periodic repeat, so, to find all maximal overlapped

α

-periodic birepresented repeats, we can compute initially all maximal overlapped periodic birepresented repeats in w. By Corollary 4, the number of such repeats is

O (n)

, and all these repeats can be computed in

O (n)

time. Since, for each computed repeat, we compute additionally the minimal period of copies of this repeat, we can check in constant time if this repeat is

α

-periodic. Thus, in

O (n)

time, we can select from the computed repeats all maximal overlapped

α

-periodic birepresented repeats in w. The total time of the proposed procedure for computing the required repeats is

O (n)

. □

Finally, we prove that all reprincipal

α

-periodic birepresented repeats can be effectively selected from maximal overlapped

α

-periodic birepresented repeats.

Lemma 16.

The number of reprincipal α-periodic birepresented repeats is

O (n)

, and all these repeats can be computed in

O (n)

time.

Proof.

Recall that all reprincipal repeats are maximal overlapped repeats, so for computing all reprincipal

α

-periodic birepresented repeats in w, we can select them from all maximal overlapped

α

-periodic birepresented repeats in w. By Lemma 15, the number of all maximal overlapped

α

-periodic birepresented repeats is

O (n)

, and these repeats can be computed in

O (n)

time. By Proposition 7, the number of reprincipal repeats in w is also

O (n)

, and all these repeats can be computed in

O (n)

time. In order to select all reprincipal repeats from maximal overlapped

α

-periodic birepresented repeats, we represent the set of all the computed maximal overlapped

α

-periodic birepresented repeats by start position lists

M O B R L_{t}

. In the same way, we represent the set of all the computed reprincipal repeats by start position lists

P R L_{t}

. All the lists

M O B R L_{t}

and

P R L_{t}

can be computed in time

O (n + S)

, where S is the total size of all lists

M O B R L_{t}

and

P R L_{t}

, so, since this total size is

O (n)

, the time of computing all the lists

M O B R L_{t}

and

P R L_{t}

is

O (n)

. Then, in order to select the required reprincipal repeats, we traverse simultaneously lists

M O B R L_{t}

and

P R L_{t}

for

t = 1, 2, \dots, n

. This can also be done in time

O (n + S) = O (n)

. Thus, the total time of the proposed procedure for computing the required repeats is

O (n)

. □

Now, we consider reprincipal

α

-periodic repeats represented by maximal repetitions. We show actually that these repeats are impossible.

Proposition 20.

The principal repeat of a maximal repetition cannot be represented by another maximal repetition.

Proof.

Let

σ

be the principal periodic repeat of a maximal repetition r. Assume that

σ

is represented by some another maximal repetition

r^{'}

. Note that, in this case,

p (r^{'}) \leq p (r)

, and r is contained in

r^{'}

, i.e., the length of the overlap of the repetitions r and

r^{'}

is not less than

2 p (r)

. It contradicts Proposition 6. □

Corollary 5.

Any reprincipal α-periodic repeat is a birepresented repeat.

Proof.

Let

σ

be the principal

α

-periodic repeat of some maximal repetition r. Note that, as shown above,

σ

is a periodic repeat. Assume that

σ

is represented by some maximal repetition. By Proposition 20,

σ

can be represented only by repetition r. In this case, we have that

p (σ) = p (r)

and the minimal period of copies of

σ

is

p (r)

, so the minimal period of copies of

σ

is greater than

\frac{p (σ)}{3 α}

, which contradicts the notion that

σ

is a

α

-periodic repeat. □

From Lemma 16 and Corollary 5, we obtain immediately the following fact.

Corollary 6.

The number of reprincipal α-periodic repeats is

O (n)

, and all these repeats can be computed in

O (n)

time.

Corollary 6 is used further in the proposed algorithm for the identification of all reprincipal

α

-periodic repeats.

7. Algorithm for Solving Problem 3

In this section, we describe in detail the proposed algorithm. The pseudocode of the proposed algorithm is presented in Algorithm 1. Let

α = 1 / δ

. We compute initially the set

G R

of all maximal

α

-gapped repeats in w and the set

P R

of all reprincipal repeats in w. By Theorem 3, we have that

| G R | = O (n α)

and

G R

can be computed in

O (α n)

time. Moreover, by Proposition 7, we have that

| P R | = O (n)

and

P R

can be computed in

O (n)

time. Recall that our goal is to exclude from

G R

all repeats that are covered by other maximal repeats. Using Propositions 10 and 12, we conclude that we need actually to exclude from

G R

all repeats that are covered by other maximal gapped repeats from

G R

or reprincipal repeats. Denote by

G R^{*}

the set of all repeats from

G R

that are not covered by other repeats from

G R

or reprincipal repeats. In these terms, our goal is to compute the set

G R^{*}

.

Note that all maximal repeats generated by a maximal repetition, except the principal repeat of this repetition, are covered by this repetition and so are covered by the principal repeat of this repetition. Recall also that the principal repeats of maximal repetitions are overlapped repeats, so gapped repeats cannot be reprincipal repeats. Thus, maximal gapped repeats generated by a maximal repetition are covered by the principal repeat of this repetition and so have to be excluded from

G R

. At the first stage, we exclude from

G R

all maximal

α

-gapped repeats generated by maximal repetitions. By Proposition 15, the number of these repeats is

O (n)

, and all these repeats can be computed in

O (n)

time. Denote the set of all the computed repeats by

C R

. In order to exclude from

G R

the repeats of the set

C R

, we represent

G R

by start position lists

G R L_{t}

. These lists can be computed in time

O (n + | G R |) = O (α n)

. In the same way, we represent the set

C R

by start position lists

C R L_{t}

. These lists can be computed in time

O (n + | C R |) = O (n)

. Then, all the computed repeats that are contained in

G R

can be excluded from

G R

by simultaneously traversing lists

G R L_{t}

and

C R L_{t}

in time

O (n + | G R | + | C R |) = O (α n)

.

Denote by

G R^{'}

the resulting set of all repeats from

G R

that remain after the first stage. Recall that all repeats from

G R

that are removed at the first stage are covered by reprincipal repeats. Therefore, any repeat from

G R

that is covered by a repeat

σ

removed at the first stage is covered also by some reprincipal repeat covering the repeat

σ

. Thus, in order to compute the set

G R^{*}

, we can remove from

G R^{'}

all repeats that are covered by other repeats from

G R^{'}

or reprincipal repeats. Denote by

\hat{G R}

the set of all repeats from

G R^{'}

together with all reprincipal repeats in w. In these terms, in order to compute the set

G R^{*}

, we remove from

G R^{'}

all repeats that are covered by other repeats from

\hat{G R}

.

Algorithm 1 Algorithm for solving Problem 3

α : = 1 / δ

compute the set

G R

of all maximal

α

-gapped repeats in w

compute the set

P R

of all reprincipal repeats in w

compute the set

C R

of all repeats from

G R

generated by maximal repetitions

exclude from

G R

all repeats from

C R

G R^{'} \leftarrow G R

compute the set

A G R

of all

α

-periodic repeats from

G R

compute the set

A P R

of all

α

-periodic reprincipal repeats in w

compute the set

N P R = P R ∖ A P R

of all

α

-nonperiodic reprincipal repeats in w

mark all

α

-nonperiodic repeats from

G R^{'}

as

α

-nonperiodic by using the set

A G R

\hat{G R^{'}} \leftarrow G R^{'} \cup N P R

remove from

\hat{G R^{'}}

all reprincipal repeats and all repeats that are covered by

α

-nonperiodic repeats

compute the set

B A N R

of all periodic birepresented maximal

α

-gapped nondominating repeats of third type in w

remove from

\hat{G R^{'}}

all repeats from

B A N R

G R^{*} \leftarrow \hat{G R^{'}}

output

G R^{*}

At the second stage, we remove from

G R^{'}

all repeats that are covered by

α

-nonperiodic repeats from

\hat{G R}

. For this purpose, first, we compute the set

A G R

of all

α

-periodic repeats from

G R

and the set

A P R

of all

α

-periodic reprincipal repeats in w. By Lemma 11, the set

A G R

can be computed in

O (n α)

time, and, by Corollary 6, the set

A P R

can be computed in

O (n)

time. Note that after performing the first stage, the set

G R^{'}

is represented by its start position lists

G R L_{t}^{'}

. The set

A G R

can be also represented by start position lists

A G R L_{t}

, which can be computed in

O (n + | A G R |) = O (n α)

time. Then, using the simultaneous traversing of lists

G R L_{t}^{'}

and

A G R L_{t}

, we mark all

α

-nonperiodic repeats from

G R^{'}

as

α

-nonperiodic (each repeat from

G R L_{t}^{'}

that is not contained in

A G R L_{t}

is marked as

α

-nonperiodic). Moreover, we compute the set

N P R

of all

α

-nonperiodic reprincipal repeats in w by removing from the set

P R

all repeats from

A P R

. To perform this removal, we also represent the sets

P R

and

A P R

by their start position lists

P R L_{t}

and

A P R L_{t}

. All these lists can be computed in time

O (n + | P R | + | A P R |) = O (n)

. Then, we compute all repeats from

N P R

by the simultaneous traversing of lists

P R L_{t}

and

A P R L_{t}

in total time

O (n + | P R | + | A P R |) = O (n)

. Note that the computed set

N P R

is also represented by its start position lists

N P R L_{t}

. Then, we compute the set

\hat{G R^{'}} = G R^{'} \cup N P R

by merging the lists

G R L_{t}^{'}

and

N P R L_{t}

into the start position lists

{\hat{G R L^{'}}}_{t}

for this set. This can be done by this simultaneous traversing of lists

G R L_{t}^{'}

and

N P R L_{t}

in time

O (n + | G R^{'} | + | N P R |) = O (α n)

. During the merging of repeats into the lists

{\hat{G R L^{'}}}_{t}

, we also mark these repeats as gapped or as reprincipal. Note also that

| \hat{G R^{'}} | = | G R^{'} | + | N P R | = O (α n)

.

For

i = 1, 2, \dots, ⌊ \log_{2} (n - 1) ⌋

, we denote by

R Q_{i}

the subset of all repeats

σ

from

\hat{G R^{'}}

such that

2^{i} \leq p (σ) < 2^{i + 1}

, i.e.,

⌊ \log_{2} p (σ) ⌋ = i

.

Proposition 21.

Let a maximal gapped repeat σ be covered by a maximal gapped repeat

σ^{'}

. Then,

p (σ) \geq p (σ^{'}) > p (σ) / 2

.

Proof.

Note that

p (σ) < | σ | \leq | σ^{'} | < 2 p (σ^{'})

, so

p (σ^{'}) > p (σ) / 2

. The inequality

p (σ) \geq p (σ^{'})

is obvious. □

Corollary 7.

Let a gapped repeat σ from

R Q_{i}

be covered by a gapped repeat

σ^{'}

from

\hat{G R^{'}}

. Then,

σ^{'}

is contained in

R Q_{i}

or

R Q_{i - 1}

.

Proposition 22.

Let a gapped repeat σ from

R Q_{i}

be covered by a reprincipal repeat

σ^{'}

from

R Q_{i^{'}}

. Then,

i \geq i^{'} \geq i - ⌈ \log_{2} α ⌉

.

Proof.

It is obvious that

i \geq i^{'}

. Assume that

i^{'} < i - ⌈ \log_{2} α ⌉

. Note that for any

σ

from

R Q_{i}

and any

σ^{'}

from

R Q_{i^{'}}

, we have

p (σ) > 2^{i - i^{'} - 1} p (σ^{'})

, so, in this case,

p (σ) > 2^{⌈ \log_{2} α ⌉} p (σ^{'}) \geq α p (σ^{'})

. Hence,

c (σ) \geq p (σ) / α > p (σ^{'})

. Let

u^{'}

,

u^{″}

be left and right copies of

σ

. Since

σ

is covered by

σ^{'}

, the repeat

σ

is contained in the repetition

rep (σ^{'})

with the minimal period

p (σ^{'})

, so both copies

u^{'}

,

u^{″}

are contained in

rep (σ^{'})

and

| u^{'} | = | u^{″} | > p (σ^{'})

. Consider the prefixes of length

p (σ^{'})

in

u^{'}

and

u^{″}

. Since these prefixes are equal cyclic roots of

rep (σ^{'})

, by Proposition 3, we have

beg (u^{'}) \equiv beg (u^{″}) (\mod p (σ^{'}))

, so

p (σ) = beg (u^{″}) - beg (u^{'})

is divisible by

p (σ^{'})

, i.e.,

p (σ)

is divisible by the minimal period of

rep (σ^{'})

. Moreover, if

beg (u^{'}) > beg (rep (σ^{'}))

, then both symbols

w [beg (u^{'}) - 1]

and

w [beg (u^{″}) - 1]

are contained in

rep (σ^{'})

, so

w [beg (u^{'}) - 1] = w [beg (u^{″}) - 1]

, i.e.,

σ

can be extended to the left, which contradicts the notion that

σ

is a maximal repeat. Thus,

beg (u^{'}) = beg (rep (σ^{'}))

. It can be analogously proven that

end (u^{″}) = end (rep (σ^{'}))

. Thus, the repeat

σ

is generated by the repetition

rep (σ^{'})

, which contradicts the notion that

σ \in \hat{G R^{'}}

. □

Summarizing Corollary 7 and Proposition 22, we obtain the following fact.

Corollary 8.

Let a gapped repeat σ from

R Q_{i}

be covered by a repeat

σ^{'}

from

R Q_{i^{'}}

. Then,

i \geq i^{'} \geq i - ⌈ \log_{2} α ⌉

.

For finding all repeats that have to be removed at the second stage, for each starting position

t = 1, 2, \dots, n

, we compute consecutively such repeats starting at position t. Note that such repeats starting at position t can be covered only by

α

-nonperiodic repeats from

\hat{G R^{'}}

that start at a position not greater than t and end at a position greater than t. Denote the set of all these

α

-nonperiodic repeats by

S Q

and write

S Q_{i} = S Q \cap R Q_{i}

for

i = 1, 2, \dots, ⌊ \log_{2} (n - 1) ⌋

. Note that if, for some repeat

σ

from

S Q_{i}

, there exists a repeat

σ^{'}

in

S Q

such that

end (σ) \leq end (σ^{'})

and

p (σ^{'}) \leq p (σ)

, then

σ

can be excluded from consideration. The remaining repeats from

S Q_{i}

form a sequence

L Q_{i} = σ_{1}, σ_{2}, \dots, σ_{s}

such that

end (σ_{1}) < end (σ_{2}) < \dots < end (σ_{s})

and

p (σ_{1}) < p (σ_{2}) < \dots < p (σ_{s})

. In order to perform an effective search in this sequence, we present

L Q_{i}

as AVL-tree

L Q T_{i}

. For each i, we compute also the value

{lep}_{i}

, which is the maximum of the ending positions of the last repeats in sequences

L Q_{i - 1}

,

L Q_{i - 2}

, …,

L Q_{i - ⌈ \log_{2} α ⌉}

.

Let

L Q_{i} = σ_{1}, σ_{2}, \dots, σ_{s}

. For any repeat

σ

, we define

p e (σ) = p (σ) + end (σ)

. From

end (σ_{1}) < end (σ_{2}) < \dots < end (σ_{s})

and

p (σ_{1}) < p (σ_{2}) < \dots < p (σ_{s})

, we have

p e (σ_{1}) < p e (σ_{2}) < \dots < p e (σ_{s})

.

Lemma 17.

For each

j \leq s - 2

, the inequality

p e (σ_{j + 2}) > p e (σ_{j}) + \frac{p (σ_{j})}{3 α}

is valid.

Proof.

Denote, for convenience, by

u_{1}^{'}

,

u_{2}^{'}

,

u_{3}^{'}

the left copies of

σ_{j}

,

σ_{j + 1}

,

σ_{j + 2}

, and by

u_{1}^{″}

,

u_{2}^{″}

,

u_{3}^{″}

the right copies of

σ_{j}

,

σ_{j + 1}

,

σ_{j + 2}

. For

k = 1, 2, 3

, denote

p_{k} = p (σ_{j + k - 1})

,

e_{k} = end (σ_{j + k - 1})

,

p e_{k} = p_{k} + e_{k}

, and for

k = 2, 3

, denote also

Δ p_{k} = p_{k} - p_{1}

,

Δ e_{k} = e_{k} - e_{1}

,

Δ p e_{k} = p e_{k} - p e_{1}

.

Assume that

p e_{3} \leq p e_{1} + \frac{p_{1}}{3 α}

, i.e.,

Δ p e_{2} < Δ p e_{3} \leq \frac{p_{1}}{3 α}

. Consider separately the following three cases.

1. Let

u_{1}^{″}

be contained in

u_{2}^{″}

. Thus,

u_{2}^{'}

contains the factor

{\hat{u}}_{1}^{'}

corresponding to the factor

u_{1}^{″}

in

u_{2}^{″}

such that

end (u_{1}^{'}) - end ({\hat{u}}_{1}^{'}) = Δ p_{2}

. Thus, since

u_{1}^{'} = {\hat{u}}_{1}^{'}

, we obtain that

u_{1}^{'}

has the period

Δ p_{2} < Δ p e_{2} < \frac{p_{1}}{3 α}

, which contradicts the notion that

σ_{j}

is

α

-nonperiodic.

2. Now, let

u_{1}^{″}

be not contained in

u_{2}^{″}

, i.e.,

beg (u_{1}^{″}) < beg (u_{2}^{″})

, and

Δ e_{2} \leq Δ p_{2}

, which implies

end (u_{2}^{'}) \leq end (u_{1}^{'})

. Let

u^{″}

be the intersection of

u_{1}^{″}

and

u_{2}^{″}

. Since

u^{″}

is a factor of

u_{1}^{″}

and

u_{2}^{″}

, for

u^{″}

, there are corresponding factors

u^{'}

in

u_{1}^{'}

and

{\hat{u}}^{'}

in

u_{2}^{'}

. Since

end (u^{'}) - end ({\hat{u}}^{'}) = Δ p_{2}

and

u^{'} = {\hat{u}}^{'} = u^{″}

, both factors

u^{'}

and

{\hat{u}}^{'}

have the period

Δ p_{2}

. Note that

Δ p_{2}

and

Δ e_{2}

are less than

Δ p e_{2} < \frac{p_{1}}{3 α}

. Thus,

| u^{″} | = | u_{2}^{″} | - Δ e_{2} > | u_{2}^{″} | - \frac{p_{1}}{3 α} \geq \frac{p_{2}}{α} - \frac{p_{1}}{3 α} > \frac{p_{1}}{α} - \frac{p_{1}}{3 α} = \frac{2 p_{1}}{3 α} > 2 Δ p_{2} .

(4)

Therefore, the intersection of

u^{'}

and

{\hat{u}}^{'}

has a length greater than

Δ p_{2}

, so, by Proposition 2, the union

\hat{u}

of factors

u^{'}

and

{\hat{u}}^{'}

has also the period

Δ p_{2}

. Note that

u_{2}^{'}

is contained in

\hat{u}

, so

u_{2}^{'}

has also the period

Δ p_{2} < \frac{p_{1}}{3 α} < \frac{p_{2}}{3 α}

, which contradicts the notion that

σ_{j + 1}

is

α

-nonperiodic.

3. Finally, let

u_{1}^{″}

be not contained in

u_{2}^{″}

and

Δ e_{2} > Δ p_{2}

, which implies

end (u_{2}^{'}) > end (u_{1}^{'})

. As in case 2, we define the factors

u^{″}

and

u^{'}

, and show that both these factors have period

Δ p_{2}

. Note that

u^{″}

is a prefix of

u_{1}^{″}

, so

u^{'}

is a prefix of

u_{1}^{'}

. Thus,

end (u^{'}) = end (u_{1}^{'})

. Consider the symbol

w [end (u_{1}^{'}) + 1]

following the copy

u_{1}^{'}

. Since

σ_{1}

is maximal, we have

w [end (u_{1}^{'}) + 1] \neq w [end (u_{1}^{″}) + 1]

. Moreover, since

end (u_{2}^{'}) > end (u_{1}^{'})

, the symbol

w [end (u_{1}^{'}) + 1]

is contained in

u_{2}^{'}

, so

u_{2}^{″}

contains the corresponding symbol

w [end (u_{1}^{'}) + p_{2} + 1] = w [end (u_{1}^{″}) + Δ p_{2} + 1]

, which is equal to

w [end (u_{1}^{'}) + 1]

. Thus, we obtain that

w [end (u_{1}^{″}) + 1] \neq w [end (u_{1}^{″}) + Δ p_{2} + 1]

. Let

w [end (u_{1}^{″}) + 1]

be not contained in

u_{3}^{″}

, i.e.,

beg (u_{3}^{″}) > end (u_{1}^{″}) + 1

. In this case, we have that

Δ e_{3} > | u_{3}^{″} | \geq \frac{p_{3}}{α} > \frac{p_{1}}{α}

which contradicts that

Δ e_{3} < Δ p e_{3} \leq \frac{p_{1}}{3 α}

. Thus,

w [end (u_{1}^{″}) + 1]

is contained in

u_{3}^{″}

. Therefore, since the symbol

w [end (u_{1}^{″}) + Δ p_{2} + 1]

is contained in

u_{2}^{″}

and

end (u_{2}^{″}) < end (u_{3}^{″})

, the symbol

w [end (u_{1}^{″}) + Δ p_{2} + 1]

is also contained in

u_{3}^{″}

. Thus,

u_{3}^{″}

contains both unequal symbols

w [end (u_{1}^{″}) + 1]

and

w [end (u_{1}^{″}) + Δ p_{2} + 1]

. Therefore,

u_{3}^{″}

contains the corresponding unequal symbols

w [end (u_{1}^{″}) + 1 - p_{3}]

and

w [end (u_{1}^{″}) + Δ p_{2} + 1 - p_{3}]

, which can be also represented as

w [end (u_{1}^{'}) - Δ p_{3} + 1]

and

w [end (u_{1}^{'}) - Δ p_{3} + Δ p_{2} + 1]

. Let

end (u_{1}^{'}) - Δ p_{3} + 1 \geq beg (u^{'})

. Note that

Δ p_{3} > Δ p_{2}

, so

end (u_{1}^{'}) - Δ p_{3} + Δ p_{2} + 1 \leq end (u_{1}^{'})

. Thus, in this case, we have

beg (u^{'}) \leq end (u_{1}^{'}) - Δ p_{3} + 1 < end (u_{1}^{'}) - Δ p_{3} + Δ p_{2} + 1 \leq end (u_{1}^{'}) = end (u^{'}),

i.e., both unequal symbols

w [end (u_{1}^{'}) - Δ p_{3} + 1]

and

w [end (u_{1}^{'}) - Δ p_{3} + Δ p_{2} + 1]

are contained in

u^{'}

, which contradicts the notion that

u^{'}

has period

Δ p_{2}

. Thus,

end (u_{1}^{'}) - Δ p_{3} + 1 < beg (u^{'}) = end (u^{'}) - | u^{'} | + 1 = end (u_{1}^{'}) - | u^{'} | + 1,

so

Δ p_{3} > | u^{'} | = | u^{″} |

. By relation (4), we have

| u^{″} | > \frac{2 p_{1}}{3 α}

. Thus,

Δ p_{3} > \frac{2 p_{1}}{3 α}

, which contradicts the notion that

Δ p_{3} < \frac{p_{1}}{3 α}

.

Since we obtained contradictions in all considered cases, the lemma is proven. □

Lemma 18.

end (σ_{s - 1}) < t + 2^{i + 2}

.

Proof.

Assume by contradiction that

end (σ_{s - 1}) \geq t + 2^{i + 2}

. Note that in this case,

| σ_{s - 1} | > 2^{i + 2} > 2 p (σ_{s - 1})

, so

σ_{s - 1}

is an overlapped repeat. Since

end (σ_{s}) > end (σ_{s - 1})

, for the same reason,

σ_{s}

is also an overlapped repeat. Note that the intersection of repetitions

rep (σ_{s})

and

rep (σ_{s - 1})

contains the factor

w [t \dots t + 2^{i + 2}]

, so the overlap of these repetitions is greater than

2^{i + 2} > p (σ_{s - 1}) + p (σ_{s})

, which contradicts Proposition 6, i.e., the lemma is proven. □

Using Lemma 18, we have that

t + 2^{i} < p e (σ_{1}) < p e (σ_{2}) < \dots < p e (σ_{s - 1}) < t + 3 \cdot 2^{i + 1},

and, by Lemma 17, for each

j < s - 2

, we have

p e (σ_{j + 2}) - p e (σ_{j}) > \frac{p (σ_{j})}{3 α} \geq \frac{2^{i}}{3 α} .

Thus, we obtain that

s - 1 = O (α)

, so

s = O (α)

. Here, we state this fact.

Corollary 9.

| L Q_{i} | = O (α)

for any i.

The procedure of the algorithm for a starting position t is as follows. First, we remove from trees

L Q T_{i}

all repeats ending at position t if such repeats exist. In order to perform effectively this removal, for each starting position t, we can maintain a double-linked list containing all repeats ending at position t from trees

L Q T_{i}

. In this case, all repeats that have to be removed from trees

L Q T_{i}

can be found in time proportional to the number of these repeats. Then, we consider consecutively all repeats in list

{\hat{G R L^{'}}}_{t}

. Let

σ

be a current considered gapped repeat from

{\hat{G R L^{'}}}_{t}

, and

⌊ \log_{2} p (σ) ⌋ = i

. By Corollary 7,

σ

can be covered only by repeats from

R Q_{i^{'}}

, where

i \geq i^{'} \geq i - ⌈ \log_{2} α ⌉

. Note that

σ

is covered by a repeat from

L Q T_{i^{'}}

such that

i - 1 \geq i^{'} \geq i - ⌈ \log_{2} α ⌉

if and only if

end (σ) \leq {lep}_{i}

. Thus, in this case, we remove

σ

from

{\hat{G R L^{'}}}_{t}

. Otherwise, we check if

σ

is covered by a repeat from

L Q T_{i}

. For this purpose, we search in

L Q T_{i}

the maximal k such that

p (σ_{k}) < p (σ)

. If such k exists and

end (σ) \leq end (σ_{k})

, then

σ

is covered by

σ_{k}

, so, in this case, we also remove

σ

from

{\hat{G R L^{'}}}_{t}

. Otherwise, if

σ

is a

α

-nonperiodic repeat, we insert

σ

in

L Q T_{i}

between

σ_{k}

and

σ_{k + 1}

. After this, we remove from

L Q T_{i}

all

σ_{j}

such that

j > k

and

end (σ_{j}) \leq end (σ)

. If it is necessary, we update the values

{lep}_{i + 1}

,

{lep}_{i + 2}

,…,

{lep}_{i + ⌈ \log_{2} α ⌉}

, which can depend on the value

end (σ)

. Now, let

σ

be a current considered reprincipal repeat from

{\hat{G R L^{'}}}_{t}

. In this case, we remove

σ

from

{\hat{G R L^{'}}}_{t}

and insert

σ

in

L Q T_{i}

, where

i = ⌊ \log_{2} p (σ) ⌋

, in the same way as inserting it above a

α

-nonperiodic gapped repeat, which is checked to be not covered. Then, we proceed to the next repeat in

{\hat{G R L^{'}}}_{t}

.

Note that during the described procedure for each repeat

σ

from

\hat{G R^{'}}

, we perform at most one search in some tree

L Q T_{i}

, at most one insertion of

σ

in

L Q T_{i}

and at most one deletion of

σ

from

L Q T_{i}

. All these operations can be performed in

O (\log | L Q_{i} |)

time, since, by Corollary 9,

| L Q_{i} | = O (α)

, we obtain that all these operations can be performed in

O (\log α)

time. Moreover, after the insertion of

σ

, no more than

⌈ \log_{2} α ⌉

values

{lep}_{i}

can be updated. It can be also performed in

O (\log α)

time. All the other operations over

σ

required for the described procedure can be performed in constant time. Thus, each repeat from

\hat{G R^{'}}

can be treated in

O (\log α)

time, so the total time of the described procedure for all starting positions t is

O (n + | \hat{G R^{'}} | \log α)

. Recall that

| \hat{G R^{'}} | = O (n α)

, so the total time of the described procedure is

O (n α \log α)

.

Note that after the second stage, we removed from

\hat{G R^{'}}

all reprincipal repeats and all repeats from

G R^{'}

that are covered by

α

-nonperiodic repeats from

\hat{G R}

, so, after the second stage, the set

\hat{G R^{'}}

consists of all repeats from

G R^{'}

that are not covered by

α

-nonperiodic repeats from

\hat{G R}

. Note also that

| \hat{G R^{'}} | = O (n α)

.

At the third stage, we compute the set

G R^{*}

by removing from the set

\hat{G R^{'}}

all repeats that are not contained in

G R^{*}

. Let

σ

be a repeat from

\hat{G R^{'}}

that is not contained in

G R^{*}

, i.e.,

σ

is covered by some other repeat besides

\hat{G R}

. Since

\hat{G R^{'}}

consists of repeats that are not covered by

α

-nonperiodic repeats from

\hat{G R}

, the repeat

σ

can be covered only by a

α

-periodic repeat from

\hat{G R}

. Moreover,

σ

cannot be covered by any

α

-nonperiodic reprincipal repeat, since, otherwise,

σ

has to be removed from

\hat{G R^{'}}

at the second stage. Thus,

σ

is covered by some

α

-periodic repeat from

\hat{G R}

and is not covered by any

α

-nonperiodic reprincipal repeat. Therefore, by Lemma 10,

σ

is a periodic birepresented nondominating repeat of the third type. On the other hand, if

σ

is a periodic birepresented nondominating repeat of the third type from

\hat{G R^{'}}

, then, by Corollary 1,

σ

is covered by another periodic repeat represented by the same pair of repetitions, so

σ

is not contained in

G R^{*}

. Thus, a repeat from

\hat{G R^{'}}

is not contained in

G R^{*}

if and only if this repeat is a periodic birepresented nondominating repeat of the third type. Hence, for computing the set

G R^{*}

, we remove from

\hat{G R^{'}}

all periodic birepresented nondominating repeats of the third type. Recall that

\hat{G R^{'}}

consists of maximal

α

-gapped repeats, so we need actually to remove from

\hat{G R^{'}}

all periodic birepresented maximal

α

-gapped nondominating repeats of the third type. For this purpose, we compute the set

B A N R

of all periodic birepresented maximal

α

-gapped nondominating repeats of the third type in w. By Corollary 2, this set can be computed in time

O (n α)

. Moreover, since the set

B A N R

is a subset of the set

G R

, we have

| B A N R | = O (n α)

. Then, we represent the set

B A N R

by its start position lists

B A N R L_{t}

in time

O (n + | B A N R |) = O (n α)

. Finally, we remove from

\hat{G R^{'}}

all repeats from

B A N R

by the simultaneous traversing of lists

{\hat{G R L^{'}}}_{t}

and

B A N R L_{t}

in time

O (n + | \hat{G R^{'}} | + | B A N R |) = O (α n)

. As a result, we obtain the required set

G R^{*}

. The total time of the third stage procedure is

O (n α)

.

Summarizing the times of all the procedures of the proposed algorithm, we obtain that the total time of the algorithm is

O (n α \log α) = O (\frac{n}{δ} \log \frac{1}{δ})

. Thus, Problem 3 can be resolved in

O (\frac{n}{δ} \log \frac{1}{δ})

time. Since, as shown above, Problem 3 is equivalent to Problem 1, we conclude the following main result of our paper.

Theorem 4.

All maximal δ-subrepetitions in a given word of length n over an integer alphabet can be found in time

O (\frac{n}{δ} \log \frac{1}{δ})

.

8. Conclusions

In the paper, we proposed an algorithm for finding all maximal

δ

-subrepetitions in a given word of length n in time

O (\frac{n}{δ} \log \frac{1}{δ})

. By Proposition 8, the number of all maximal

δ

-subrepetitions in a word of length n is

O (\frac{n}{δ})

, so the considered problem could be presumably resolved in

O (\frac{n}{δ})

time. Thus, finding all maximal

δ

-subrepetitions in a given word of length n in time

O (\frac{n}{δ})

is still an open problem.

Funding

This research was funded by the Ministry of Education and Science of the Russian Federation as part of the program of the Moscow Center for Fundamental and Applied Mathematics under the agreement 075-15-2022-284.

Conflicts of Interest

The author declares no conflict of interest.

References

Fine, N.J.; Wilf, H.S. Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 1965, 16, 109–114. [Google Scholar]
Choffrut, C.; Karhumäki, J. Combinatorics on words. In Handbook of Formal Languages; Rozenberg, G., Salomaa, A., Eds.; Springer: Berlin/Heidelberg, Gemany, 1997; Volume I, pp. 329–438. [Google Scholar]
Crochemore, M.; Hancart, C.; Lecroq, T. Algorithms on Strings; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
Kolpakov, R.; Kucherov, G. On Maximal Repetitions in Words. J. Discret. Algorithms 2000, 1, 159–186. [Google Scholar]
Crochemore, M.; Ilie, L.; Tinta, L. Towards a solution to the “runs” conjecture. In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching, Pisa, Italy, 18–20 June 2008; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2008; Volume 5029, pp. 290–302. [Google Scholar]
Bannai, H.; Tomohiro, I.; Inenaga, S.; Nakashima, Y.; Takeda, M.; Tsuruta, K. The “Runs” Theorem. SIAM J. Comput. 2017, 46, 1501–1514. [Google Scholar]
Holub, S. Prefix frequency of lost positions. Theor. Comput. Sci. 2017, 684, 43–52. [Google Scholar] [CrossRef]
Crochemore, M.; Kubica, M.; Radoszewski, J.; Rytter, W.; Walen, T. On the maximal sum of exponents of runs in a string. J. Discret. Algorithms 2012, 14, 9–36. [Google Scholar] [CrossRef]
Simpson, J. Modified Padovan words and the maximum number of runs in a word. Australas. J. Comb. 2010, 46, 129–145. [Google Scholar]
Puglisi, S.; Simpson, J. The expected number of runs in a word. Australas. J. Comb. 2008, 42, 45–54. [Google Scholar]
Christodoulakis, M.; Christou, M.; Crochemore, M.; Iliopoulos, C.S. On the average number of regularities in a word. Theor. Comput. Sci. 2014, 525, 3–9. [Google Scholar] [CrossRef]
Crochemore, M.; Ilie, L.; Smyth, W.F. A Simple Algorithm for Computing the Lempel-Ziv Factorization. In Proceedings of the Data Compression Conference (DCC 2008), Snowbird, UT, USA, 25–27 March 2008; pp. 482–488. [Google Scholar]
Kosolobov, D. Computing runs on a general alphabet. Inf. Process. Lett. 2016, 116, 241–244. [Google Scholar] [CrossRef]
Gawrychowski, P.; Kociumaka, T.; Rytter, W.; Waleń, T. Faster longest common extension queries in strings over general alphabets. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), Tel Aviv, Israel, 27–29 June 2016; Grossi, R., Lewenstein, M., Eds.; LIPIcs, Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Wadern, Germany, 2016; Volume 54, pp. 5:1–5:13. [Google Scholar]
Crochemore, M.; Iliopoulos, C.S.; Kociumaka, T.; Kundu, R.; Pissis, S.P.; Radoszewski, J.; Rytter, W.; Waleń, T. Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries. In Proceedings of the 23rd String Processing and Information Retrieval (SPIRE 2016), Beppu, Japan, 18–20 October 2016; Lecture Notes in Computer Science. Inenaga, S., Sadakane, K., Sakai, T., Eds.; Springer: Cham, Switzerland, 2016; Volume 9954, pp. 22–34. [Google Scholar]
Ellert, J.; Fischer, J. Linear Time Runs over General Ordered Alphabets. In Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), Glasgow, Scotland, 12–16 July 2021; Leibniz International Proceedings in Informatics (LIPIcs). Bansal, N., Merelli, E., Worrell, J., Eds.; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2021; Volume 198, pp. 63:1–63:16. [Google Scholar]
Crochemore, M.; Iliopoulos, C.S.; Kubica, M.; Radoszewski, J.; Rytter, W.; Waleń, T. Extracting powers and periods in a word from its runs structure. Theor. Comput. Sci. 2014, 521, 29–41. [Google Scholar] [CrossRef]
Kolpakov, R. On primary and secondary repetitions in words. Theor. Comput. Sci. 2012, 418, 71–81. [Google Scholar] [CrossRef]
Kolpakov, R.; Podolskiy, M.; Posypkin, M.; Khrapov, N. Searching of gapped repeats and subrepetitions in a word. J. Discret. Algorithms 2017, 46, 1–15. [Google Scholar] [CrossRef]
Kociumaka, T.; Radoszewski, J.; Rytter, W.; Waleń, T. Internal pattern matching queries in a text and applications. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), San Diego, CA, USA, 4–6 January 2015; Indyk, P., Ed.; Society for Industrial and Applied Mathematics: San Diego, CA, USA, 2015; pp. 532–551. [Google Scholar]
Crochemore, M.; Kolpakov, R.; Kucherov, G. Optimal bounds for computing α-gapped repeats. Inf. Comput. 2019, 268, 104434. [Google Scholar] [CrossRef]
Gawrychowski, P.; Tomohiro, I.; Inenaga, S.; Köppl, D.; Manea, F. Tighter Bounds and Optimal Algorithms for All Maximal α-gapped Repeats and Palindromes. Theory Comput. Syst. 2018, 62, 162–191. [Google Scholar] [CrossRef]
Badkobeh, G.; Crochemore, M.; Toopsuwan, C. Computing the Maximal-Exponent Repeats of an Overlap-Free String in Linear Time. In Proceedings of the 19th String Processing and Information Retrieval (SPIRE 2012), Cartagena de Indias, Colombia, 21–25 October 2012; Lecture Notes in Computer Science. Calderón-Benavides, L., Cristina, N., González-Caro, C.N., Chávez, E., Ziviani, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7608, pp. 61–72. [Google Scholar]
Tomohiro, I.; Köppl, D. Improved upper bounds on all maximal α-gapped repeats and palindromes. Theor. Comput. Sci. 2019, 753, 1–15. [Google Scholar]
Dumitran, M.; Manea, F. Longest Gapped Repeats and Palindromes. In Proceedings of the Mathematical Foundations of Computer Science 2015 (MFCS 2015), Milan, Italy, 24–28 August 2015; Lecture Notes in Computer Science. Italiano, G., Pighizzini, G., Sannella, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9234, pp. 205–217. [Google Scholar]

Figure 1. Principal and nonprincipal repeats.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kolpakov, R. Almost Optimal Searching of Maximal Subrepetitions in a Word. Mathematics 2022, 10, 3569. https://doi.org/10.3390/math10193569

AMA Style

Kolpakov R. Almost Optimal Searching of Maximal Subrepetitions in a Word. Mathematics. 2022; 10(19):3569. https://doi.org/10.3390/math10193569

Chicago/Turabian Style

Kolpakov, Roman. 2022. "Almost Optimal Searching of Maximal Subrepetitions in a Word" Mathematics 10, no. 19: 3569. https://doi.org/10.3390/math10193569

APA Style

Kolpakov, R. (2022). Almost Optimal Searching of Maximal Subrepetitions in a Word. Mathematics, 10(19), 3569. https://doi.org/10.3390/math10193569

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Almost Optimal Searching of Maximal Subrepetitions in a Word

Abstract

1. Introduction

2. Repeats

3. Covering of Maximal Repeats

4. Periodic and Generated Repeats

5. Birepresented Gapped Repeats

6. $α$ -Periodic Repeats

7. Algorithm for Solving Problem 3

8. Conclusions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Almost Optimal Searching of Maximal Subrepetitions in a Word

Abstract

1. Introduction

2. Repeats

3. Covering of Maximal Repeats

4. Periodic and Generated Repeats

5. Birepresented Gapped Repeats

6. α -Periodic Repeats

7. Algorithm for Solving Problem 3

8. Conclusions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

6. $α$ -Periodic Repeats