Sorting by Multi-Cut Rearrangements

Laurent Bulteau; Guillaume Fertin; Géraldine Jean; Christian Komusiewicz

doi:10.3390/a14060169

Abstract

A multi-cut rearrangement of a string S is a string

S^{'}

obtained from S by an operation called k-cut rearrangement, that consists of (1) cutting S at a given number k of places in S, making S the concatenated string

X_{1} \cdot X_{2} \cdot X_{3} \cdot \dots \cdot X_{k} \cdot X_{k + 1}

, where

X_{1}

and

X_{k + 1}

are possibly empty, and (2) rearranging the

X_{i}

s so as to obtain

S^{'} = X_{π (1)} \cdot X_{π (2)} \cdot X_{π (3)} \cdot \dots \cdot X_{π (k + 1)}

,

π

being a permutation on

1, 2, \dots, k + 1

satisfying

π (1) = 1

and

π (k + 1) = k + 1

. Given two strings S and T built on the same multiset of characters from an alphabet

Σ

, the Sorting by Multi-Cut Rearrangements (SMCR) problem asks whether a given number ℓ of k-cut rearrangements suffices to transform S into T. The SMCR problem generalizes several classical genomic rearrangements problems, such as Sorting by Transpositions and Sorting by Block Interchanges. It may also model chromoanagenesis, a recently discovered phenomenon consisting of massive simultaneous rearrangements. In this paper, we study the SMCR problem from an algorithmic complexity viewpoint. More precisely, we investigate its classical and parameterized complexity, as well as its approximability, in the general case or when S and T are permutations.

Keywords:

genome rearrangements; sorting; strings; permutations; algorithmic complexity

1. Introduction

Genome rearrangements are large-scale evolutionary events that affect the genome of a species. They include among others reversals [1], transpositions [2], block interchanges [3] and many more [4]. Compared to small-scale evolutionary events such as the insertion, deletion or substitution of single DNA nucleotides, they are considered to be rare and, until recently, were assumed to happen one after the other. In the recent literature, however, a new type of event, called chromoanagenesis, has been shown to occur in genomes [5,6]. The term chromoanagenesis subsumes different types of rearrangements (namely, chromothripsis, chromoanasynthesis and chromoplexy) whose common ground is the following: in a single event, the genome is cut into many blocks, and then rearranged. As stated by Pellestor and Gatinois [6], these are “massive chromosomal rearrangements arising during single chaotic cellular events”. Chromoanagenesis, and notably chromothripsis, is suspected to play a role in cancer and congenital diseases [5]. In this paper, we introduce a new model for genome rearrangements that is general enough to encompass most of the previously described genome rearrangements [4], as well as chromoanagenesis. In view of providing fast and accurate algorithms for simulating chromoanagenesis, and following a long tradition of algorithmic developments in the genome rearrangements community [4,7], our goal here is to study its properties in terms of computational complexity.

The present article is an extended and enriched version of a paper published by the same authors in SOFSEM 2021 [8].

1.1. Notations and Definitions

Given an alphabet

Σ

, let

Σ^{*}

be the set of all strings over

Σ

. We say that two strings

S, T \in Σ^{*}

are balanced if S and T are built on the same multiset of characters—in other words, each character in S also appears in T with the same number of occurrences. We denote by

| S |

the length of a string S. Unless otherwise stated, we assume that

| S | = | T | = n

. We denote by

S_{i}

,

1 \leq i \leq n

, the i-th character of S. Given a string S in

Σ^{*}

, we denote by d the maximum number of occurrences of any character of

Σ

in S. In the special case where

d = 1

(i.e., when S and T are permutations), and for any

0 \leq i \leq n

, we say that there is a breakpoint at position i in S (or, equivalently, that

(S_{i}, S_{i + 1})

is a breakpoint) if there is no position

0 \leq j \leq n

in T, such that the two consecutive characters

S_{i}

and

S_{i + 1}

are consecutive in T, i.e.,

T_{j} = S_{i}

and

T_{j + 1} = S_{i + 1}

. For the special cases

i = 0

and

i = n

, we artificially set

S_{0} = T_{0} = α_{0}

and

S_{n + 1} = T_{n + 1} = α_{n + 1}

, where

α_{0} \notin Σ

and

α_{n + 1} \notin Σ

. Thus, there is a breakpoint at position 0 (resp. n) in S whenever

S_{1} \neq T_{1}

(

S_{n} \neq T_{n}

). We also denote by

b (S, T)

the number of breakpoints in S with respect to T.

Definition 1.

Given a string

S \in Σ^{*}

and an integer k, a k-cut rearrangement of S is an operation consisting of the two following steps: (1) cut S at k locations (thus S can be written as the concatenation of

k + 1

strings, i.e.,

S = X_{1} \cdot X_{2} \cdot X_{3} \cdot \dots \cdot X_{k + 1}

, where any

X_{i}

is possibly empty, and where a cut occurs between

X_{i}

and

X_{i + 1}

,

1 \leq i \leq k

) and (2) rearrange (i.e., permute) the

X_{i}

s, so as to obtain

S^{'} = X_{π (1)} \cdot X_{π (2)} \cdot X_{π (3)} \cdot \dots \cdot X_{π (k + 1)}

, π being a permutation on the elements

1, 2, \dots, k + 1

, such that

π (1) = 1

and

π (k + 1) = k + 1

. Each of the

X_{i}

s considered in a given k-cut rearrangement will be called a block.

In the above model, any block is only allowed to move whenever it is cut at both its left and right extremities. Hence, moving a prefix of S (resp. a suffix of S) comes at the cost of an “extra cut” to its left (resp. right), and this corresponds, in the above definition, to the case where

X_{1}

(resp.

X_{k + 1}

) is empty.

Note also that, although a k-cut rearrangement has been defined as a cut along S at k locations, it is always possible, if necessary, to perform only

k^{'} \leq k

cuts, by cutting several times at the same location. Thus, one may mimic a k-cut rearrangement, while actually realizing a

k^{'}

-cut rearrangement. Note finally that, in this model, each block

X_{i}

can only be moved, thus no reversal of

X_{i}

is allowed, and therefore the strings we consider are always unsigned. In this paper, we study the following problem.

Sorting by Multi-Cut Rearrangements (SMCR)
Instance: Two balanced strings S and T, two integers k and ℓ.
Question: Is there a sequence of at most ℓ many k-cut rearrangements that transforms S into T?

Some examples of k-cut rearrangement scenarii are presented in Figure 1. For convenience, we may also refer to the SMCR problem with parameters k and ℓ as the

(k, ℓ)

-SMCR problem. Note that this notation shall not imply that k or ℓ are constants. Unless mentioned otherwise, input strings are considered to have an unbounded alphabet, although most hardness proofs also hold for constant-size alphabets, as detailed in the tables of results.

Figure 1. Examples of k-cut rearrangement scenarii. Cuts are represented by vertical lines and the order of the rearranged blocks is indicated by the framed numbers. (A) Example on strings

S = a b c e b c d a d c c b

and

T = a e a b b c c b c c d d

. (A.1) shows a 5-cut rearrangement scenario with

ℓ = 3

. (A.2) shows a 6-cut rearrangement scenario with

ℓ = 2

. (B) Example on permutations

S = 3 5 9 7 10 4 6 1 8 2

and

T = 1 2 3 4 5 6 7 8 9 10

with

b (S, T) = 11

. (B.1,B.2) show two alternative 5-cut rearrangement scenarii with

ℓ = 3

.

1.2. Parameterized Algorithmics

Our goal in this paper is to study the algorithmic properties of SMCR. In particular, we aim to outline the effect of bounds on k and ℓ on the complexity of the problem. To carry out our studies, we use the framework of parameterized algorithmics [9,10] which has been applied to many hard string problems in the past [11].

An instance of a parameterized problem consists of the (classical) input instance I and a parameter

κ \in N

. A parameterized problem is fixed-parameter tractable if every instance

(I, κ)

can be solved in

f (κ) \cdot poly (n)

time, where

n = | I |

is the size of the input, and f is some computable function. An algorithm with such a running time is called an FPT-algorithm. Thus, an FPT -algorithm can be considered pratical if f does not grow too fast and

κ

takes on relatively small values, for example if

f (κ) = 2^{κ}

and

κ

is upper-bounded by

log | I |

.

A particularly important technique for obtaining FPT-algorithms is kernelization, which is a framework for analyzing the power of data reduction rules for parameterized problems. A parameterized problem admits a kernelization, or simply a kernel, if there exists a polynomial-time algorithm that transforms every instance

(I, κ)

into an equivalent instance

(I^{'}, κ^{'})

, such that

| I^{'} | + κ^{'} \leq g (κ) .

The function g is called the kernel size. For any decidable parameterized problem L, the existence of a kernel implies that L has an FPT-algorithm with respect to

κ

, since the instance

(I^{'}, κ^{'})

can be solved in

f (κ)

time.

In our study, we consider the case that the parameter

κ

is one of k or ℓ, that

κ

is the combined parameter

k + ℓ

, or that k and ℓ are part of the input. Moreover, we consider some special cases where k or ℓ take on small constant values. Some of these cases correspond to well-studied problems, such as computing the transposition distance or the MCSP distance, as we discuss below. We will separately consider the case where S and T are strings (i.e.,

d > 1

both in S and T) and the case where S and T are permutations (i.e.,

d = 1

both in S and T), in Section 2 and Section 3, respectively.

1.3. Basic Observations

Both in permutations and strings, the cases

k = 1

and

k = 2

are trivial, since they do not allow one to move any block, and thus we are in the presence of a Yes-instance if, and only if,

S = T

. Additionally, the SMCR problem is a natural generalization and extension of several problems that have already been defined and studied in the literature before, as described hereafter.

When

k = 3

, each k-cut rearrangement is necessarily a transposition of blocks

X_{2}

and

X_{3}

(see Figure 2). Thus SMCR in that case is equivalent to the Sorting by Transpositions problem [2], which is NP-hard, even if S and T are permutations [12] or binary strings [13].

Figure 2. Relationship between classical rearrangements and k-cut rearrangements. Cuts are represented by vertical lines, while dashed rectangles define blocks that are moved in classical rearrangements. (a) when

k = 3

, a k-cut rearrangement corresponds to a transposition (here of blocks

X_{2}

and

X_{3}

). (b–f) are the 5 possible cases that we can get with a 4-cut rearrangement, all of them corresponding to a block interchange of dashed blocks.

When

k = 4

, each k-cut rearrangement allows one to move two blocks among

X_{2}

,

X_{3}

and

X_{4}

, which exactly corresponds to the Sorting by Block Interchange problem (see Figure 2). This problem is known to be in P for permutations [3] and NP-hard for strings (an NP-hardness proof for binary strings is given in ([14], Theorem 5.7.2)).

It may come as a surprise that the case

k = 4

is polynomial on permutations, when the case

k = 3

—and all other non-trivial cases—are NP-hard. The original algorithm for

k = 4

is based on the cycle graph (see Section 3.1.1 for formal definitions): in this setting, the goal is to create as many cycles as possible at each step (a block interchange may create at most two cycles). The key intuition is that such an “efficient” block-interchange always exists and can be polynomially computed. In other words, efficient block-interchanges commute, so it is safe to greedily apply the best-available operation at any point. This is in contrast with

k = 3

, since transpositions may also create up to two new cycles, but such efficient transpositions are not always guaranteed to exist and cannot be chosen greedily (applying one may destroy another). In fact, finding a sequence of such efficient transpositions is at the core of the NP-hardness proof [12]. On a related note, Lin et al. [15] proposed an algebraic formulation of the Sorting by Block Interchange problem, translating the cycles into permutation orbits, leading to a linear-time algorithm.

When

ℓ = 1

, the SMCR problem comes down to deciding whether k cuts are sufficient to rearrange S into T in one atomic move (i.e., one k-cut rearrangement). In permutations, the problem is trivially solved by counting the number

b (S, T)

of breakpoints between S and T, since we have a Yes-instance if and only if

b (S, T) \leq k

. In strings, the SMCR problem resembles the Minimum Common String Partition problem [16]. Both problems are indeed almost identical, and only differ in their optimum value, depending on whether S and T share the same prefix and/or suffix. This will be discussed in more detail in Theorems 5 and 6.

When k and ℓ are constant, SMCR is trivially polynomial-time solvable, since a brute-force algorithm, exhaustively testing all possible k-cut rearrangements at each of the ℓ permitted moves, has a running time of

O (n^{k ℓ + 1})

—the additional factor of n being needed to verify that the result corresponds to string T.

It is also natural to wonder whether

(k, ℓ)

-SMCR and

(k ℓ, 1)

-SMCR are equivalent. It can be easily seen that a Yes-instance for

(k, ℓ)

-SMCR is also a Yes-instance for

(k ℓ, 1)

-SMCR: it suffices for this to aggregate all cuts from the

(k, ℓ)

-SMCR solution (of which there are at most

k ℓ

), and rearrange accordingly. However, the reverse (i.e., from

(k ℓ, 1)

-SMCR to

(k, ℓ)

-SMCR) is not always true. For example, take

S = afedcbg

,

T = abcdefg

,

k = 3

, and

ℓ = 2

. This is a Yes-instance for

(6, 1)

-SMCR: One 6-cut rearrangement on S with the cuts

a | f | e | d | c | b | g

(symbolized as vertical segments) suffices to obtain T. In contrast, it is a No-instance for

(3, 2)

-SMCR: The number

b (S, T)

of breakpoints is equal to 6 and every 3-cut rearrangement is a transposition. In this instance, however, no transposition can decrease

b (S, T)

by 3. Thus, at least three 3-cut rearrangements are necessary to transform S into T.

2. Sorting by Multi-Cut Rearrangements in Strings

In this section, we provide algorithmic results concerning the Sorting by Multi-Cut Rearrangements problem, in the general case where S and T are strings. Our results are summarized in Table 1.

Table 1. Summary of the results for Sorting by Multi-Cut Rearrangements in strings. Recall that d denotes the maximum number of occurrences of any character in the input string S.

As mentioned in the previous section, we know that SMCR is NP-hard in binary strings for

k = 3, 4

[13,14]. In the following theorem, we extend this result to any value of k, however, in larger alphabet strings.

Theorem 1.

SMCRis NP-hard in k-ary strings for any fixed

k \geq 5

.

Proof.

By reduction from 3-Partition, a problem in which the input is a set

A

of integers and an integer B, and the question is whether

A

can be partitioned into triples such that the integers of each triple sum up to B. Observe that 3-Partition is strongly NP-hard, that is, it is NP-hard even if all integers are represented as unary numbers [17]. Given an instance of 3-Partition

(A, B)

with

A = {a_{1}, a_{2}, \dots, a_{3 m}}

and

m B = \sum_{i = 1}^{3 m} a_{i}

, we construct an instance of SMCR for any fixed

k \geq 5

as follows. By simple padding arguments, we may assume that each

a_{i}

is a multiple of

4 m

and that

\frac{B}{4} < a_{i} < \frac{B}{2}

. This implies the following property: If for some subset I of

{1, 2, \dots, 3 m}

and some

δ

with

0 \leq δ \leq 4 m

we have

\sum_{i \in I} a_{i} + δ = B + 4

, then

\sum_{i \in I} a_{i} = B

,

δ = 4

, and

| I | = 3

. We use a size-k alphabet

{0, 1, \dots, k - 1}

, we denote by X the string

k - 1 \cdot k - 2 \cdot \dots \cdot 3

, and by

X^{'}

the reverse of X, i.e.,

3 \cdot 4 \cdot \dots \cdot k - 1

. Note that X and

X^{'}

have length

k - 3 \geq 2

. We define

S = 10^{a_{1}} 10^{a_{2}} \dots 10^{a_{3 m}} 1 {(20 X 0 X 0 X 0)}^{m} 2

and

T = {(1 X^{'})}^{3 m} 1 {(20^{B + 4})}^{m} 2,

and set

ℓ = 3 m

. This completes the construction. Before proving its correctness, let us group length-2 substrings of S and T (hereafter called duos) based on whether they are in excess in S, in T, or equal in both strings.

Group 1 contains the duos $(0, 1), (1, 0), (0, k - 1)$ , $(k - 1, k - 2), \dots, (4, 3)$ , and $(3, 0)$ , which each occur $3 m$ times in S and which do not occur in T.
Group 2 contains the duos $(0, 0)$ , which occur $B m - 3 m$ times in S and $B m + 3 m$ times in T, and the duos $(1, 3), (3, 4), \dots, (k - 2, k - 1), (k - 1, 1)$ which do not occur in S, and occur $3 m$ times each in T.
Group 3 contains the duos $(0, 2)$ and $(2, 0)$ , which each occur m times in S and in T.

There are no further duos in S or T. To show the correctness of the reduction, we show that

(A, B)

is a Yes-instance of 3-Partition if, and only if, there exists a sequence of at most

ℓ = 3 m

many k-cut rearrangements transforming S into T.

(\Rightarrow)

Pick a solution of 3-Partition. For each triple

(a_{i}, a_{j}, a_{p})

of the solution, choose a unique substring

20 X 0 X 0 X 0

of S and perform the following three k-cut rearrangements: first, cut S around

0^{a_{i}}

and around the first copy of X in the chosen subsequence, and cut at every position inside X. Observe that the number of cuts is exactly k. Now reverse X into

X^{'}

and exchange

0^{a_{i}}

and

X^{'}

. Perform a similar k-cut rearrangement with

a_{j}

and the second occurrence of X and with

a_{p}

and the third occurrence of X in the selected substring. The selected substring is transformed into

200^{a_{i}} 00^{a_{j}} 00^{a_{k}} 0 = 20^{B + 4}

and since each string

0^{a_{i}}

is replaced by

X^{'}

, the first part of the string is

{(1 X^{'})}^{3 m}

. Hence, the string obtained by the

3 m

many k-cut rearrangements described above is T.

(\Leftarrow)

There are altogether

6 m + (k - 2) 3 m = 3 k m

duos in Group 1 which are in excess in S, and

3 k m

duos in Group 2 which are in excess in T. Since

ℓ = 3 m

, each k-cut rearrangement cuts k duos in Group 1 (and no duo in Group 2 or 3). In particular, no 00 duo may be cut in a feasible solution, so each subsequence

0^{B + 4}

in T is obtained by concatenating a number of strips of the form

0^{a_{i}}

, as well as some number

δ

of 0 singletons. Since S has

4 m

of these singleton 0s, we have

0 \leq δ \leq 4 m

. By the constraint on the values of

a_{i}

, each subsequence

0^{B + 4}

in T contains four singletons from S and three substrings

0^{a_{i}}

of S whose lengths sum to B. Thus, the m substrings

0^{B + 4}

in T correspond to a partition of

A

into m sets of three integers whose values sum up to B. □

Theorem 1 is improved by the following two theorems, which prove the NP-hardness of SMCR in ternary strings, but limited to odd values of k with

k \geq 7

. More precisely, Theorem 2 focuses on the specific case

k = 7

, while Theorem 3 extends the result to any odd

k \geq 7

.

Theorem 2.

SMCR isNP-hard for

k = 7

in ternary strings.

Proof.

The proof is, as the proof of Theorem 1, by reduction from the strongly NP-hard 3-Partition problem, in which the input is a set

A

of integers and an integer B, and the question is whether

A

can be partitioned into triplets, such that the sum of the integers in each triplet is equal to B. Given an instance

(A, B)

of 3-Partition with

A = {a_{1}, a_{2}, \dots, a_{3 m}}

such that (i)

\frac{B}{4} < a_{i} < \frac{B}{2}

for any

1 \leq i \leq 3 m

and (ii)

m B = \sum_{i = 1}^{3 m} a_{i}

, we build the following instance

(S, T, k, ℓ)

of SMCR:

S = 10^{a_{1}} 10^{a_{2}} \dots 10^{a_{3 m}} 12^{m + 1}

T = 1^{3 m + 1} {(20^{B})}^{m} 2

with

k = 7

and

ℓ = m

. Clearly, S and T are built on an alphabet of cardinality 3. Now, let us prove the correctness of our reduction.

(⇒) Suppose the instance

(A, B)

of 3-Partition is a Yes-instance. Thus,

A

can be partitioned in sets

A_{1}, A_{2}, \dots, A_{m}

, such that

A_{j} = {a_{j_{1}}, a_{j_{2}}, a_{j_{3}}}

and

a_{j_{1}} + a_{j_{2}} + a_{j_{3}} = B

for any

1 \leq j \leq m

. We now show how to reach T from S, using 7-cut rearrangements, in

ℓ = m

steps: at each step

1 \leq j \leq m

, cut in S to the right of the 1 preceding

0^{a_{j_{1}}}

(resp.

0^{a_{j_{2}}}

,

0^{a_{j_{3}}}

) and to the left of the 1 following

0^{a_{j_{1}}}

(resp.

0^{a_{j_{2}}}

,

0^{a_{j_{3}}}

), for a total of 6 cuts; the last cut is to the right of the j-th occurrence of 2. The 6 first cuts allow one to move blocks

0^{a_{j_{1}}}

,

0^{a_{j_{2}}}

,

0^{a_{j_{3}}}

, which we concatenate so as to obtain

0^{B}

(since

a_{j_{1}} + a_{j_{2}} + a_{j_{3}} = B

by hypothesis). The 7th cut allows us to insert block

0^{B}

between the j-th and the

(j + 1)

-th copy of 2. An illustration of the first step of the above described algorithm for solving SMCR is provided in Figure 3.

Figure 3. Example of the proposed reduction starting from the following instance of 3-Partition:

A = {6, 4, 4, 7, 4, 5, 4, 7, 5, 5, 4, 5}

(thus

m = 4

) and

B = 15

. Each

a_{i}

is represented by a rectangle containing

a_{i}

squares. Colors in

A

describe a way to partition

A

into triplets, each summing to

B = 15

. S and T are built as described in the reduction. Each vertical arrow corresponds to a 7-cut. The first one, which is the first step of the proposed 7-cut rearrangement scenario, going from S to

S^{'}

, is detailed: each red bullet corresponds to a cut, and blocks of 0 that are rearranged are underlined in green (corresponding to the three “green

a_{i}

s” summing to B).

After every step

1 \leq j \leq m

is achieved, three of the

0^{a_{i}}

blocks (each one being surrounded by 1s) are removed, while a

0^{B}

block is created between two occurrences of 2. Thus, after step m, we have produced sequence

1^{3 m + 1} {(20^{B})}^{m} 2 = T

. Since every step uses

k = 7

cuts and T is obtained after

m = ℓ

steps, we conclude that

(S, T, k, ℓ)

is a Yes-instance for SMCR.

(⇐) Suppose that the instance

(S, T, k, ℓ)

that we have built is a Yes-instance. First, we note that S contains exactly

7 m

duos (i.e., length-2 substrings) that are absent from T, while T contains

7 m

duos that are absent from S, as shown in the exhaustive list of duos provided in Table 2.

Table 2. Exhaustive list of duos present both in S and T, together with their occurrences.

Thus, starting from S, and in order to end in T, there are altogether

7 m

duos to destroy, and

7 m

to create. Since

k = 7

and

ℓ = m

, for each of the m allowed steps, 7 duos need to be destroyed (and never further created), while 7 duos need to be created (and never further destroyed). In particular, this implies that we never cut strictly inside a block of the form

0^{a_{i}}

, otherwise a

(0, 0)

duo would be destroyed (whereas the goal is to create altogether

2 m = (B - 1) m - (B - 3) m

of them). Since, in T, blocks containing 0s all are of the form

0^{B}

, this implies that there is a way to arrange all blocks of the form

0^{a_{i}}

from S, by concatenating them (and never cutting them), so that we obtain m blocks of the form

0^{B}

. This shows that it is possible to partition

A

into m sets, each achieving a sum equal to B. However, by hypothesis, each

a_{i}

satisfies

\frac{B}{4} < a_{i} < \frac{B}{2}

. Hence, the only way to partition

A

in the above described manner is to group the

a_{i}

s by triplets. Consequently, this shows that

(A, B)

is a Yes-instance for 3-Partition. □

Proof of Theorem 2 above can be extended so as to prove NP-hardness of SMCR for any odd

k \geq 7

in ternary alphabets. This is the purpose of the following theorem.

Theorem 3.

SMCR isNP-hard for any odd

k \geq 7

in ternary strings.

The proof is heavily based on the arguments developed in proof of Theorem 2. The idea here is to reduce from a decision problem that we call p-Partition (

p \geq 3

being an integer), which is an extension of 3-Partition; second, we slightly modify strings S and T according to this new problem. First, let us formally introduce p-Partition, and discuss its NP-hardness. The input of p-Partition is a set

A^{'} = {a_{1}^{'}, a_{2}^{'}, \dots, a_{p m}^{'}}

and an integer

B^{'}

, such that

m B^{'} = \sum_{i = 1}^{m p} a_{i}^{'}

, and the question is whether

A^{'}

can be partitioned into sets

A_{1}^{'}, A_{2}^{'}, \dots, A_{m}^{'}

, each of cardinality p, and such that, for any

1 \leq j \leq m

,

\sum_{a_{i}^{'} \in A_{j}^{'}} a_{i}^{'} = B^{'}

. We call Balanced p-Partition the restriction where

a_{i}^{'} > \frac{B^{'}}{p + 1}

for all i.

Lemma 1.

For any

p \geq 3

,Balanced p-Partitionis stronglyNP-hard.

Proof.

First, we show that for any

p \geq 3

, p-Partition is strongly NP-hard, by reduction from 3-Partition. Indeed, consider an instance

(A, B)

of 3-Partition, with

A = {a_{1}, a_{2}, \dots, a_{3 m}}

and

B = \frac{1}{m} \sum_{i = 1}^{3 m} a_{i}

. Let

N = B + 1

. We now construct an instance

(A^{'}, B^{'})

of p-Partition as follows: for any

1 \leq i \leq 3 m

,

a_{i}^{'} = a_{i}

, while

a_{i}^{'} = N

for all

3 m + 1 \leq i \leq p m

. Let

B^{'} = \frac{1}{m} \sum_{i = 1}^{p m} a_{i}^{'}

. Then,

B^{'} = \frac{1}{m} (\sum_{i = 1}^{m} a_{i} + (p - 3) m N)

, that is

B^{'} = B + (p - 3) N

. Note that

B^{'} < (p - 2) N

.

(⇒) If

(A, B)

is a Yes-instance for 3-Partition, then assume that

A

can be partitioned into

A_{1}, A_{2}, \dots, A_{m}

. Now, for any

1 \leq j \leq m

, take

A_{j}^{'}

to be the union of

A_{j}

and of any

(p - 3)

of the remaining elements among

a_{3 m + 1}^{'}, a_{3 m + 2}^{'}, \dots, a_{p m}^{'}

. The sum of the elements in

A_{j}^{'}

is equal to

B^{'}

, and sets

A_{j}^{'}

together form a partition of

A^{'}

. Hence,

(A^{'}, B^{'})

is a Yes-instance for p-Partition.

(⇐) If

(A^{'}, B^{'})

is a Yes-instance for p-Partition, then assume that

A^{'}

can be partitioned into

A_{1}^{'}, A_{2}^{'}, \dots, A_{m}^{'}

. Any

A_{j}^{'}

contains at least three elements among

{a_{1}^{'}, a_{2}^{'}, \dots, a_{3 m}^{'}}

, otherwise the sum of elements of

A_{j}^{'}

would be at least

(p - 2) N > B^{'}

. Since there are

3 m

such elements, we conclude that each

A_{j}^{'}

must contain exactly three elements of

A

. Now, let

A_{j} = A_{j}^{'} \cap A

. Let

S_{j}

(resp.

S_{j}^{'}

) be the sum of the elements of

A_{j}

(resp.

A_{j}^{'}

). Then, we have

S_{j}^{'} = S_{j} + (p - 3) N

. Since

S_{j}^{'} = B^{'}

, we conclude that

S_{j} = B

. This shows that

A_{1}, A_{2}, \dots, A_{m}

is a partition of

A

for which each set is of cardinality 3 and sums to B. In other words,

(A, B)

is a Yes-instance for 3-Partition.

Finally, the following argument shows that p-Partition remains strongly NP-hard in the balanced restriction. Indeed, take any instance of p-Partition and modify it as follows: every

a_{i}^{'}

is increased by

B^{'}

, and set

B^{''}

= (p + 1) B^{'}

. In that case, the sum of all elements from

A^{'}

is

m (p + 1) B^{'}

, and if a partition

A_{1}^{'}, A_{2}^{'}, \dots, A_{m}^{'}

exists, the sum of the elements of each

A_{j}^{'}

is equal to

B^{''}

. Since

a_{i}^{'} > B^{'}

, we conclude that

a_{i}^{'} > \frac{B^{″}}{p + 1}

and we are done. Note that each

a_{i}^{'}

remains bounded by a polynomial on the instance size (since each

a_{i}

and B is polynomialy bounded in 3-Partition), hence preserving the strong NP-hardness. □

Proof of Theorem 3

Now, in order to prove that SMCR is NP-hard in ternary strings for any odd

k \geq 7

, it suffices to adapt the reduction provided in proof of Theorem 2.

Hence, starting from any instance

(A^{'}, B^{'})

of Balanced p-Partition, we build the following instance

(S, T, k, ℓ)

of SMCR:

S = 10^{a_{1}^{'}} 10^{a_{2}^{'}} \dots 10^{a_{p m}^{'}} 12^{m + 1}

T = 1^{p m + 1} {(20^{B^{'}})}^{m} 2

with

k = 2 p + 1

and

ℓ = m

. Clearly, S and T are built on an alphabet of cardinality 3.

Correctness of the reduction relies on the same type of arguments, as in proof of Theorem 2, and are only informally described here: if a partition of

A^{'}

exists, this provides us with a rearrangement scenario where, at each step,

2 p + 1

cuts are applied. More precisely,

2 p

of these cuts allow one to “disconnect” the p blocks of the form

0^{a_{i}^{'}}

corresponding to a set

A_{j}^{'}

(of cardinality p) of the partition, while the last cut allows one to insert the concatenated abovementioned blocks into a single

0^{B^{'}}

block between two occurrences of character 2. The reverse direction is based on the number of duos that need to be deleted and created: there are

(2 p + 1) m

duos in S which are not present in T, and

(2 p + 1) m

duos in T which are not present in S; besides,

(2 p + 1) m = k ℓ

is the total number of cuts we are allowed altogether in a rearrangement scenario from S to T. Consequently, no

(0, 0)

duo can be deleted during such a scenario, which shows that any block of the form

0^{B^{'}}

in T must be the concatenation of blocks of the form

0^{a_{i}^{'}}

. This in turn proves that

A^{'}

can be partitioned in sets, each of which sums to

B^{'}

. Moreover, since every

a_{i}^{'} > \frac{B^{'}}{p + 1}

, all sets in that partition are of cardinality at most p. However, since there are m sets and altogether

m p

elements to partition, we conclude that each set is necessarily of cardinality exactly p. Consequently, we are in the presence of a Yes-instance for p-Partition. □

Theorem 3 above shows that SMCR is NP-hard for ternary strings, for any odd

k \geq 7

. We conjecture that this result can be extended in two ways, i.e., that SMCR is NP-hard for any

k \geq 5

(both odd and even), even in the case of binary strings.

Theorems 1 to 3 are concerned with the complexity of SMCR relatively to parameter k. We now turn our attention to parameter ℓ.

Theorem 4.

SMCR isNP-hard for

| Σ | = 5

and any fixed

ℓ \geq 1

.

Proof.

The reduction being very similar to the one of Theorem 1, we only highlight here the differences that allow us to have a fixed ℓ instead of fixed k. First assume that m is a multiple of ℓ (add up to ℓ triples of dummy elements otherwise), and let

k = \frac{15 m}{ℓ}

. Note that k is a multiple of 5. The reduction is the same as in proof of Theorem 1 using

k^{'} = 5

(i.e., with a size-5 alphabet). In other words, we have

X = 43

and

X^{'} = 34

.

In the forward direction, use the described scenario using 5-cut rearrangements, but combine a series of

k / 5

such rearrangements into a single k-cut rearrangement, as described at the end of Section 1.3. This gives a total of

\frac{3 m}{k / 5} = ℓ

many k-cut rearrangements sorting S into T. In the reverse direction, the same breakpoint count as in proof of Theorem 1 holds, namely

3 k^{'} m = 15 m

duos need to be broken using ℓ many k-cut rearrangements, with

ℓ k = 15 m

. So, again, no

(0, 0)

duo may be broken, and by the same argument, we obtain a valid 3-partition of

A

. □

The previous theorem shows NP-hardness of SMCR for any fixed ℓ. However, a stronger result can be obtained in the specific case

ℓ = 1

.

Theorem 5.

SMCR isNP-hard when

ℓ = 1

, even when

d = 2

.

Proof.

The proof is obtained by reduction from the Minimum Common String Partition (MCSP) problem, which has been proven to be NP-hard in strings, even when

d = 2

[16]. Recall that the decision version of MCSP asks, given two balanced strings S and T, and an integer p, whether S can be written as the concatenation of p blocks

S = X_{1} \cdot X_{2} \cdot \dots \cdot X_{p - 1} \cdot X_{p}

and T can be written as

T = X_{π (1)} \cdot X_{π (2)} \cdot \dots \cdot X_{π (p - 1)} \cdot X_{π (p)}

, where

π

is a permutation of

1, 2, \dots, p

. Note that here, we may have

π (1) = 1

and/or

π (p) = p

.

Given an instance

(S, T, p)

of MCSP, we build an instance

(S^{'}, T^{'}, k, ℓ)

of SMCR by setting

S^{'} = x \cdot S

,

T^{'} = T \cdot x

(with

x \notin Σ

),

k = p + 2

and

ℓ = 1

. Clearly, if

(S, T, p)

is a Yes-instance for MCSP, then

(S^{'}, T^{'}, p + 2, 1)

is a Yes-instance for SMCR: the MCSP solution uses

p - 1

cuts, to which we add one before x, one after x, and one after S for solving SMCR. Conversely, if

(S^{'}, T^{'}, k, ℓ)

is a Yes-instance for SMCR, and since x occurs only once in

S^{'}

, then 2 cuts are used to “isolate” x from

S^{'}

. Besides, since

T^{'}

ends with x, there must exist a cut after the last character of

S^{'}

. Hence, since

S^{'} = x \cdot S

, at most

k - 3 = p - 1

cuts are used strictly within S, which in turns means that S has been decomposed in p blocks, which can be rearranged so as to obtain T, since

ℓ = 1

. Thus,

(S, T, p)

is a Yes-instance for MCSP. □

Note that MCSP has been proved to be in FPT with respect to the size of the solution [18]. It can be seen that this result can be adapted for the SMCR problem in the case

ℓ = 1

.

Theorem 6.

When

ℓ = 1

,SMCRisFPTwith respect to parameter k.

Proof.

Assuming

S \neq T

, let A (resp. B) be the length of the longest common prefix (resp. suffix) of S and T. For

0 \leq a \leq A

and

0 \leq b \leq B

, let

S_{a, b}

and

T_{a, b}

be the strings obtained from S and T by removing the first a and last b characters. Then, T can be obtained from S by one k-cut rearrangement if and only if, for some pair

(a, b)

,

S_{a, b}

and

T_{a, b}

admit a common string partition into

k - 1

blocks. Indeed, this is easy to verify by matching the limits of the blocks in MCSP (including at the end of the strings) with the cuts of the rearrangement. So, SMCR when

ℓ = 1

can be solved using

O (n^{2})

calls to MCSP with parameter

k - 1

, each with a different pair

(a, b)

, which itself is FPT for k [18]. □

Note that it is not sufficient to check only with the longest common prefix and suffix (i.e.,

S_{A, B}

and

T_{A, B}

), which can be seen in the following example. When

S = xxzyxtyy

and

T = xxtyxzyy

, then S can be transformed into T via the 3-cut rearrangement with the cuts

x | xzy | xty | y

. Moreover, we have

A = B = 2

, but only

S_{1, 1} = xzy xty

and

T_{1, 1} = xty xzy

have a common partition into two blocks, whereas

S_{1, 1} = zyxt

and

T_{2, 2} = tyxz

do not have such a partition.

3. Sorting by Multi-Cut Rearrangements in Permutations

In this section, we provide algorithmic results concerning the Sorting by Multi-Cut Rearrangements problem, in the specific case where S and T are permutations. Our results are summarized in Table 3.

Table 3. Summary of the results for Sorting by Multi-Cut Rearrangements in permutations. Additionally, Opt-SMCR admits a 2-approximation algorithm (Theorem 10).

We start with showing the NP-hardness for constant k or ℓ (Theorems 7 and 8). Then, we show fixed-parameter tractability for the combined parameter

k + ℓ

in Theorem 9. Finally, we provide a 2-approximation result (Theorem 10).

3.1. Hardness for Constant Number of Cuts

We now show that SMCR is also hard in permutations.

Theorem 7.

For any

k \geq 5

,SMCRin permutations isNP-hard.

Theorem 7 gives a complexity dichotomy with respect to the number of cuts in the rearrangement, since the case

k = 3

is known to be NP-hard [12] and the case

k = 4

is known to be polynomial-time solvable [3].

We show Theorem 7 by reduction from Sorting By Transpositions on 3-cyclic permutations [12]. Intuitively, in such permutations, it is straightforward to identify triples of breakpoints, called 3-cycles, that should be solved together in a transposition. The difficulty arises in selecting a correct order in which those 3-cycles should be solved. Our approach consists of extending these 3-cycles into k-cycles, such that any k-cut rearrangement solving the original cycle must solve all k breakpoints together, and still performs a simple transposition on the rest of the sequence (to this end,

k - 3

dummy elements are created in order to consume the extra blocks in k-cut rearrangements). We first recall the necessary definitions and properties for breakpoints and cyclic permutations, then show how to extend a single cycle by only two or three elements, and finally successively apply this method to extend all cycles to any size

k \geq 5

.

3.1.1. Breakpoints and Cycle Graph

See Figure 4 for an illustration of the definitions in this section. For a permutation S of length n, we assume that the alphabet of S is

{1, 2, \dots, n}

. We further write

S_{0} = 0

and

S_{n + 1} = n + 1

(they are considered as implicit elements: they may not be part of rearrangements, but allow all other characters to have another element before and after). For a rearrangement r transforming S into

S^{'}

, we write

r (S) = S^{'}

and

r (S, T) = (S^{'}, T)

. The cycle graph

C (S, T)

of strings S and T is the graph over

n + 1

vertices

{0, 1, \dots, n}

with arcs

T_{j} \to S_{i}

if

T_{j + 1} = S_{i + 1}

. Every vertex has indegree and outdegree 1, so the graph is a disjoint union of cycles. Self-loops are called trivial cycles (when seen as a cycle) or adjacencies (when seen as an arc); other arcs are breakpoints. An element (or vertex) x is an adjacency (resp. breakpoint) according to its outgoing arc (we use transparently the bijection between a vertex and its outgoing arc). A k-cycle is a cycle of length k. The next breakpoint of breakpoint

x \to y

in

C (S, T)

is y (or equivalently, the outgoing arc of y). We write

C_{x} (S, T)

for the cycle of

C (S, T)

containing element x. A cycle graph (and, by extension, a pair of sequences generating this cycle graph) is k-cyclic if it contains only adjacencies and k-cycles. A rearrangement r applied to a permutation Scuts an element x,

0 \leq x \leq n

, if r cuts between

x = S_{i}

and

S_{i + 1}

. Furthermore, rsolves breakpointx if r cuts x and x is an adjacency in

r (S)

. A rearrangement rsolves a cycle if it solves all breakpoints in it. We write

d_{b} (S, T)

for the number of breakpoints of

C (S, T)

. A k-cut rearrangement is efficient if it solves k breakpoints. A pair

(S, T)

is k-efficiently sortable if there exists a sequence of efficient k-cut rearrangements transforming S into T. The following is a trivial generalization of a well-known lower bound for the transposition distance.

Figure 4. Example of a 3-cut rearrangement scenario from

S = 5 2 4 7 3 8 6 1 9

to

T = 1 2 3 4 5 6 7 8 9

using cycle graphs. From top to bottom: The cycle graph for

(S, T)

contains three non-trivial cycles and a trivial one (breakpoint u is depicted to the right of the label for u). Blocks 1 to 4 of a 3-cut rearrangement (cutting after 5, 8, and 1) are given: it solves cycle

5 \to 8 \to 1 \to 5

(denoted

(5, 8, 1)

for short). Two other efficient rearrangements solve cycles

(0, 6, 4)

and

(2, 7, 3)

, thus reaching T in three steps. Note that some cycles do not admit efficient 3-cut rearrangement, e.g.,

(0, 6, 4)

in the original permutation S.

Proposition 1.

A k-cut rearrangement may not solve more than k breakpoints, so S needs at least

\frac{d_{b} (S, T)}{k}

k-cut rearrangements to be transformed into T. Furthermore, the bound is reached if, and only if,

(S, T)

is k-efficiently sortable.

Proposition 2.

If r solves a breakpoint, it cuts the next breakpoint in the cycle graph.

Proof.

Let

x \to y

be an arc of the cycle graph, and let

x^{'}

be the successor of x in T as well as the successor of y in S. If r solves x, then r joins a block ending in x with a block starting in

x^{'}

, so

x^{'}

is the first element of some block of r. Thus, y is the last element of some block of r, and r cuts the breakpoint y in

C (S, T)

. □

Proposition 3.

If r is efficient, it solves a cycle if, and only if, it solves any breakpoint in it. Furthermore, r solves all breakpoints in a union of cycles of total size k.

Proof.

If r is efficient, then it solves all breakpoints that it cuts (since it may not solve a breakpoint without cutting them, and it solves and cuts k breakpoints). Due to Proposition 2, if r solves a breakpoint in a cycle, then it must solve all subsequent arcs in the same cycle. Hence, r either solves all breakpoints of a cycle or none at all. The size constraint follows from the fact that all cycles are disjoint. □

Cycle

C_{1}

is tied to another cycle

C_{2}

through the pair of breakpoints

(x, y)

if x is in

C_{1}

, y is in

C_{2}

, the permutation S has

S_{i} = y

and

S_{i + 1} = x

for some i, and T has

T_{j} = x

and

T_{j + 1} = y

for some y (for an illustration, see Figure 5). A breakpoint is without ties if no cycle is tied to the cycle containing it.

Figure 5. (a) The cycle

(1, 5, 8)

from Figure 4, and how solving it affects the rest of the sequence (substrings

2 - 8

and

6 - 1

, i.e., blocks

and

are swapped). (b) The 2-extension of S for breakpoint

x = 1

, giving cycle

(1, n + 1, n + 2, 5, 8)

. This extended cycle can be solved with a 5-cut rearrangement having an equivalent effect on the sequence (substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{2}^{'}

is the 2-extension of

S^{'}

for adjacency

x = 1

. (c) The 3-extension of S for breakpoint

x = 1

, giving cycles

(1, n + 2, 5, 8)

and

(n + 1, n + 3)

(the latter being tied to the former: there is no way of solving breakpoint

n + 3

without cutting

n + 2

). These two cycles can be solved with a 6-cut rearrangement having an equivalent effect on the sequence (again, substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{3}^{'}

is the 3-extension of

S^{'}

for adjacency

x = 1

.

Proposition 4.

If

C_{1}

is tied to

C_{2}

, then any efficient rearrangement solving

C_{1}

must also solve

C_{2}

.

Proof.

Let r be an efficient rearrangement solving

C_{1}

and, in particular, x. Then, r must place y after x in

r (S)

, although y is before x in S, so r must have a cut somewhere between y and x, i.e., just after y. So, r cuts breakpoint y, and solves cycle

C_{2}

. □

3.1.2. One-Cycle Extensions

We now introduce the central technical engine for the reduction of Theorem 7. Let

(S, T)

be a pair of permutations. Let x be a vertex of

C (S, T)

with the following properties (we say that x is safe): x is either an adjacency or a breakpoint without ties in a cycle of length

k_{x} \geq 3

, and all 2-cycles in

C (S, T)

are tied. The p-extension of

(S, T)

on x, with

p \in {2, 3}

, denoted

ϕ_{x}^{p} (S, T)

is the pair

(S^{'}, T^{'})

such that (see Figure 5 for an example):

For $p = 2$ :
$\begin{matrix} S^{'} & = (S_{1}, \dots, S_{i} = x, n + 2, S_{i + 1}, \dots, S_{n}, n + 1) & if x is an adjacency \\ S^{'} & = (S_{1}, \dots, S_{n}, n + 1, n + 2) & if x is a breakpoint \\ T^{'} & = (T_{1}, \dots, T_{j} = x, n + 2, T_{j + 1} = S_{i + 1}, \dots, T_{n}, n + 1) \end{matrix}$
For $p = 3$ :
$\begin{matrix} S^{'} & = (S_{1}, \dots, S_{i} = x, n + 3, n + 2, S_{i + 1}, \dots, S_{n}, n + 1) & if x is an adjacency \\ S^{'} & = (S_{1}, \dots, S_{n}, n + 1, n + 2, n + 3) & if x is a breakpoint \\ T^{'} & = (T_{1}, \dots, T_{j} = x, n + 3, n + 2, T_{j + 1} = S_{i + 1}, \dots, T_{n}, n + 1) \end{matrix}$

Note that the implicit rightmost element

S_{n + 1} = n + 1

is always inserted explicitly (indeed,

S_{n}

is followed by

n + 1

in both

S^{'}

and

T^{'}

), and becomes

S_{n + p + 1}^{'} = T_{n + p + 1}^{'} = n + p + 1

in the new permutations.

Lemma 2.

A p-extension on x has the following effects on the cycle graph:

if x is an adjacency, it adds p trivial cycles;
if x is a breakpoint and $p = 2$ , it adds $n + 1$ and $n + 2$ to the cycle containing x;
if x is a breakpoint and $p = 3$ , it adds $n + 2$ to the cycle containing x and a 2-cycle $(n + 1, n + 3)$ tied to the one containing x.

Other arcs and tied cycles are unchanged.

Proof.

If x is an adjacency, the p-extension inserts elements

n + 1

to

n + p

in both strings in the same order after x, and they are followed by the same element in both strings, since x is an adjacency, so only trivial cycles are added.

Assume now that x is a breakpoint. Consider first an arc

T_{j} \to S_{i}

with

T_{j} \neq x

in

C (S, T)

. Since no element is inserted after

T_{j}

or

S_{i}

,

T_{j} \to S_{i}

also appears in

C (S^{'}, T^{'})

(the case

i = j = n

is particular, as

n + 1

is explicitly introduced in both sequences, but it also yields the arc

T_{j} \to S_{j}

in

C (S^{'}, T^{'})

). If a cycle is tied to another one through a pair

(x, y)

in S and

(y, x)

in T, these duos cannot be broken by the p-extension (since x is safe, no cycle can be tied to

C_{x} (S, T)

), so it is still tied after the extension. Similarly, a non-tied cycle cannot become tied because of the extension.

It remains to describe arcs going out from

{x, n + 1, \dots, n + p}

. Let y be the head of the outgoing arc from x.

For j such that

T_{j}^{'} = x

, we have

T_{j + 1}^{'} = n + p = S_{n + p}^{'}

, so there exists an arc

x \to S_{n + p - 1}^{'} = n + p - 1

in

C (S^{'}, T^{'})

(note, in particular, that y no longer has its incoming arc

x \to y

).

For j such that

T_{j}^{'} = n + 1

, we have

T_{j + 1}^{'} = n + p + 1 = S_{n + p + 1}^{'}

, so there exists an arc

n + 1 \to S_{n + p}^{'} = n + p

in

C (S^{'}, T^{'})

.

For

p = 3

and j such that

T_{j}^{'} = n + 3

, we have

T_{j + 1}^{'} = n + 2 = S_{n + 2}^{'}

, so there exists an arc

n + 3 \to S_{n + 1}^{'} = n + 1

in

C (S^{'}, T^{'})

.

At this point, the outgoing arcs for all vertices except

n + 2

have been described, as well as incoming arcs for all vertices except y, so the last remaining arc is

n + 2 \to y

.

Overall, for

p = 2

, arc

x \to y

is replaced with the path

x \to n + 1 \to n + 2 \to y

. For

p = 3

, arc

x \to y

is replaced with

x \to n + 2 \to y

and a 2-cycle

n + 1 \leftrightarrow n + 3

is created. Note that this 2-cycle is tied to

C_{x} (S^{'}, T^{'})

through

(n + 2, n + 3)

. □

We now show how efficient rearrangements can be adapted through extensions. Let r be a k-cut rearrangement of

(S, T)

. We write

r^{'} = ψ_{x}^{p} (r)

for the

k^{'}

-cut rearrangement of

(S^{'}, T^{'}) = ϕ_{x}^{p} (S, T)

defined as follows:

if r does not cut x, then $k^{'} = k$ , $r^{'}$ cuts the same elements as r, and rearranges the blocks in the same order;
if r cuts x, then $k^{'} = k + p$ , $r^{'}$ cuts the same elements as r, as well as $n + 1$ , $n + 2$ and $n + 3$ (when $p = 3$ ), and rearranges the blocks in the same way as r, with elements $n + 3$ (when $p = 3$ ) and $n + 2$ inserted after x.

The following two lemmas show how efficient rearrangements of

(S, T)

and those of

ϕ_{x}^{p} (S, T)

are related through

ψ_{x}^{p}

.

Lemma 3.

If r is an efficient k-cut rearrangement of

(S, T)

, then

r^{'} = ψ_{x}^{p} (r)

is an efficient

k^{'}

-cut rearrangement of

(S^{'}, T^{'}) = ϕ_{x}^{p} (S, T)

. Furthermore,

r^{'} (S^{'}, T^{'}) = ϕ_{x}^{p} (r (S, T))

.

Proof.

If r does not cut x, then

k^{'} = k

and

r^{'}

solves in

(S^{'}, T^{'})

exactly the same breakpoints as r, so it is efficient. Furthermore, all elements in

r^{'} (S^{'}, T^{'})

and

ϕ_{x}^{p} (r (S, T))

are in the same order as in

r (S, T)

, except for

n + 1, \dots, n + p

, which are inserted, in both cases, at the end of S and in T as in

T^{'}

(since r and

r^{'}

do not edit the second string).

If r cuts x,

r^{'}

furthermore solves breakpoints

n + 1, \dots, n + p

, since it rearranges these elements in the same order as in

T^{'}

. So, it is an efficient rearrangement as well. Finally, all elements in

r^{'} (S^{'}, T^{'})

and

ϕ_{x}^{p} (r (S, T))

are in the same order as in

r (S, T)

, except for

n + p, \dots, n + 2

(which are inserted after x in both strings) and

n + 1

(which is inserted as a last element). □

The following claim will be useful for proving Lemma 4.

Claim 1.

Either

r^{'}

solves all breakpoints in

{x, n + 1, \dots, n + p}

, or none at all.

Proof.

For

p = 2

, this is a direct application of Proposition 3, since elements x,

n + 1

and

n + 2

are in the same cycle of

C (S^{'}, T^{'})

.

For

p = 3

, by Lemma 2,

C_{x} = C_{x} (S^{'}, T^{'})

is a

(k_{x} + 1)

-cycle containing x and

n + 2

, and

C (S^{'}, T^{'})

also contains a cycle denoted

C_{y}

with elements

n + 1

and

n + 3

. By Proposition 3,

r^{'}

solves any element in

C_{x}

(resp.

C_{y}

) if, and only if, it solves all elements in the same cycle (in particular,

k^{'} \geq k_{x} + 1

if

r^{'}

cuts x, so

k^{'} = k + p

). Furthermore,

C_{y}

is tied to

C_{x}

, so if

r^{'}

solves

C_{y}

, it must also solve

C_{x}

(by Proposition 4). Now, all that is left is to check the last direction: if

r^{'}

solves

C_{x}

, then it also solves

C_{y}

. Indeed,

C_{x}

is a

(k_{x} + 1)

-cycle and

r^{'}

solves a total of

k_{x} + 3

breakpoints, so it must also solve some 2-cycle

C_{y}^{'}

. Aiming at a contradiction, assume that

C_{y}^{'} \neq C_{y}

. Then,

C_{y}^{'}

is already a 2-cycle of

C (S, T)

, and it is tied to some other cycle

C_{x}^{'}

(both in

C (S, T)

and

C (S^{'}, T^{'})

), so

r^{'}

also solves

C_{x}^{'}

. Since

C_{x}^{'}

may not be equal to

C_{x}

(x was chosen without ties),

r^{'}

solves at least

| C_{x} | + | C_{y}^{'} | + | C_{x}^{'} | > k_{x} + 3

breakpoints, which yields a contradiction for a

(k_{x} + 3)

-cut rearrangement. □

Lemma 4.

If

r^{'}

is an efficient

k^{'}

-cut rearrangement of

(S^{'}, T^{'}) = ϕ_{x}^{p} (S, T)

with

k^{'} \in {k_{x}, k_{x} + p}

, then there exists an efficient k-cut rearrangement r of

(S, T)

such that

r^{'} = ψ_{x}^{p} (r)

, where

k = k^{'} - p = k_{x}

if

r^{'}

cuts x and

k = k^{'}

otherwise. Furthermore,

r^{'} (S^{'}, T^{'}) = ϕ_{x}^{p} (r (S, T))

.

Proof.

We build r from

r^{'}

using the converse operations of Lemma 3: mimicking the cuts and reordering of

r^{'}

, but ignoring cuts after

n + 1, \dots, n + p

if

r^{'}

cuts x. The relation between k and

k^{'}

and the efficiency of r follow from the fact that

r^{'}

solves either all of x,

n + 1, \dots, n + p

, or none at all, as proven in Claim 1. The ’furthermore’ part follows from Lemma 3, applied to r. □

3.1.3. Extending All Cycles

We use the natural order over integers as an arbitrary total order over the nodes. The representative of a cycle is its minimum node. We assume

(S, T)

to be k-cyclic for some k. A sample for

(S, T)

, where

(S, T)

is k-cyclic is a list X containing the representative from each k-cycle, and an arbitrary number of adjacencies. The p-extensions of

(S, T)

for sample

X = (x_{1}, \dots, x_{ℓ})

and of a rearrangement r of

(S, T)

are, respectively,

Φ_{X}^{p} (S, T) = ϕ_{x_{ℓ}}^{p} (\dots ϕ_{x_{2}}^{p} (ϕ_{x_{1}}^{p} (S, T)) \dots)

and

Ψ_{X}^{p} (r) = ψ_{x_{ℓ}}^{p} (\dots ψ_{x_{2}}^{p} (ψ_{x_{1}}^{p} (r)) \dots)

.

For

x_{i}

in the sample, we write

n_{x_{i}} = | S | + p \cdot (i - 1)

, i.e.,

n_{x_{i}}

is the size of the strings on which

ϕ_{x_{i}}^{p}

is applied. Note that the above definition requires each

x_{i}

to be safe in

ϕ_{x_{i - 1}}^{p} (\dots ϕ_{x_{1}}^{p} (S, T) \dots)

. This is indeed the case by Lemma 2: either

p = 2

, there are no 2-cycles, and all breakpoints are without ties, or

p = 3

, all 2-cycles are tied to a single cycle

C_{x_{j}} (S, T)

with

j < i

, which are all different from

C_{x_{i}} (S, T)

(since X is a sample and contains at most one element per cycle).

Proposition 5.

If

(S, T)

is k-cyclic, then

Φ_{X}^{2} (S, T)

is

(k + 2)

-cyclic.

Proof.

This follows from Lemma 2, since the 2-extension adds 2 elements to each k-cycle, so

Φ_{X}^{2} (S, T)

is

(k + 2)

-cyclic. □

Lemma 5.

If r is an efficient k-cut rearrangement of

(S, T)

, then

r^{'} = Ψ_{X}^{p} (r)

is an efficient

k + p

rearrangement of

(S^{'}, T^{'}) = Φ_{X}^{p} (S, T)

. Moreover, in this case,

r (S, T)

is k-cyclic with sample X, and

Φ_{X}^{p} (r (S, T)) = r^{'} (Φ_{X}^{p} (S, T))

.

Conversely, any efficient

k + p

rearrangement

r^{'}

of

(S^{'}, T^{'}) = Φ_{X}^{p} (S, T)

can be written as

r^{'} = Ψ_{X}^{p} (r)

, where r is an efficient k-cut rearrangement of

(S, T)

.

Proof.

Given an efficient k-cut rearrangement r, let

\begin{matrix} (S^{0}, T^{0}) & = (S, T), & (S^{j}, T^{j}) & = ϕ_{x_{j}}^{p} (S^{j - 1}, T j - 1) for all 0 < j \leq ℓ, \\ r_{0} & = r, & r_{j} & = ψ_{x_{j}}^{p} (r_{j - 1}) for all 0 < j \leq ℓ . \end{matrix}

Since r is an efficient k-cut rearrangement of

(S, T)

and

(S, T)

is k-cyclic, Proposition 3 implies that r must solve a single cycle of

C (S, T)

. Let x be the representative of this cycle: x is the only breakpoint of X cut by r, and

x = x_{i}

for some i. Furthermore,

C (r (S, T))

is also k-cyclic with sample X (with one cycle less than

C (S, T)

).

By Lemma 3, we have that

r_{j}

is an efficient

k_{j}

-cut rearrangement of

(S^{j}, T^{j})

for each j, where

k_{j} = k_{j - 1}

if

r_{j - 1}

does not cut x (i.e.,

j \neq i

) and

k_{j} = k_{j - 1} + p

otherwise. So, overall,

r^{'} = r_{ℓ}

is a an efficient

(k + p)

-cut rearrangement of

Φ_{X}^{p} (S, T)

. The relationship

Φ_{X}^{p} (r (S, T)) = r^{'} (Φ_{X}^{p} (S, T))

also follows from Lemma 3.

The converse direction is proven similarly using Lemma 4, with a specific attention given to the size of the rearrangements: starting from

r^{'}

(with

k + p

cuts), the number of cuts remains constant, except for

ψ_{x_{j}}^{p}

, where it drops to k and then remains constant again (so the condition

k^{'} \in {k, k + p}

in Lemma 4 is indeed satisfied). □

Lemma 6.

If

(S, T)

is k-cyclic with sample X, then

(S, T)

is k-efficiently sortable if, and only if,

Φ_{X}^{p} (S, T)

is k-efficiently sortable.

Proof.

This is a direct application of Lemma 5: a sequence of efficient k-cut rearrangements of

(S, T)

translates into a sequence of efficient

(k + p)

-cut rearrangements of

Φ_{X}^{p} (S, T)

through function

Ψ_{X}^{p}

(note that X remains a sample of

(S, T)

throughout the sequence of rearrangements). □

Lemma 7.

For any odd

k \geq 5

, deciding whether a k-cyclic pair

(S, T)

is k-efficiently sortable isNP-hard. For any even

k \geq 6

, deciding whether a pair

(S, T)

is k-efficiently sortable isNP-hard.

Proof.

By induction on k. Deciding if a 3-cyclic pair

(S, T)

is efficiently sortable is NP-hard (cf. [12], where it is shown that deciding if a permutation can be sorted with

\frac{d_{b} (S, T)}{3}

transpositions is NP-hard). For any

k \geq 5

, take

p = 2

if k is odd and

p = 3

otherwise, and consider a

(k - p)

-cyclic instance

(S, T)

and a sample X for

(S, T)

(note that one always exists):

(S, T)

is

(k - p)

-efficiently sortable if, and only if,

Φ_{X}^{p} (S, T)

is k-efficiently sortable by Lemma 6, and

Φ_{X}^{p} (S, T)

is k-cyclic for

p = 2

by Proposition 5. This gives a polynomial reduction proving hardness for k (even when restricted to k-cyclic permutations when k is odd). □

Theorem 7 is a corollary of Lemma 7, since a k-cyclic pair

(S, T)

is k-efficiently sortable if, and only if, S can be rearranged into T with at most

\frac{d_{b} (S, T)}{k}

k-cut rearrangements (Proposition 1).

3.2. Hardness for Constant Number of Rearrangements

By Theorem 7, SMCR is NP-hard for constant k. We now show that the case with constant ℓ is NP-hard for all

ℓ \geq 2

. Since for

ℓ = 1

SMCR is the same as computing the breakpoint distance between permutations, we thus obtain a complexity dichotomy with respect to ℓ.

Theorem 8.

SMCRin permutations isNP-hard for any constant

ℓ \geq 2

.

We refer to the notions of breakpoint graph, cycles and efficient rearrangements introduced in Section 3.1.1 of the proof of Theorem 7, as well as their first properties (Propositions 1 to 4). We first give an informal outline of our reduction for

ℓ = 2

: we show that it is NP-hard to check if an input with

2 k

breakpoints can be solved with two k-cut rearrangements (i.e., can be sorted efficiently). We build two permutations whose cycle graph can be decomposed into cycles of specific sizes, and we show that an efficient sorting scenario needs to satisfy two types of constraints: (i) each cycle must be solved entirely in the first or in the second rearrangement, so the sum of cycle size for each rearrangement must be exactly k, and (ii) some cycles must satisfy precedence constraints, i.e., we generalize the notion of ties by building tuples of cycles

(C_{1}, \dots, C_{p}, C^{'})

, where some

C_{i}

must be solved before or during the same k-cut rearrangement as

C^{'}

. We build an instance with a large cycle that needs to be solved during the first rearrangement by constraint (i), and that forces one to solve a non-trivial set of cycles in the same rearrangement by constraint (ii).

Proof.

We present a reduction from Exact 3-Set Cover, where we are given a collection of sets

X = {X_{1}, X_{2}, \dots, X_{n}}

, all of size 3 over a universe U of size

3 p

, and ask whether there are p sets in

X

whose union is exactly U. Further assume that

7 n = 8 p

. This is without loss of generality: indeed, let

δ = 8 p - 7 n

. If

δ > 0

, duplicate set

X_{1}

⌈ \frac{δ}{7} ⌉

times, so n increases by this quantity to become

n^{'}

, and p is unchanged, so

δ

decreases by

7 ⌈ \frac{δ}{7} ⌉

, and now

- 7 \leq 8 p - 7 n^{'} \leq 0

. If

δ < 0

, add

3 | δ |

new elements to the universe, and

| δ |

sets to

X

, covering exactly those new elements, so both n and p increase by

| δ |

and become respectively

n_{1}

and

p_{1}

, yielding

8 p_{1} - 7 n_{1} = 8 p + 8 | δ | - 7 n - 7 | δ | = δ + | δ | = 0

.

The following reduction is for

ℓ = 2

; we show how to extend it to any larger ℓ at the end of the proof. See Figure 6 for an illustration. Given an instance of Exact 3-Set cover, we build an alphabet of size

8 n + 6 p + 2

, with symbols

x_{i, q}

for

1 \leq i \leq n

and

1 \leq q \leq 8

, and

y_{i}

for

1 \leq i \leq 2 p + 2

. Given

u \in U

, let

i_{1}, \dots, i_{k}

be the indices, such that

u \in X_{i_{j}}

for all

1 \leq j \leq k

. Furthermore, let

p_{j}

be the index of u in

X_{j}

(i.e., 1, 2 or 3). Write

S_{u}

for the following string:

\begin{matrix} S_{u} & = x_{i_{1}, 2 p_{1} + 1} x_{i_{1}, 2 p_{1}} x_{i_{2}, 2 p_{2} + 1} x_{i_{2}, 2 p_{2}} \dots x_{i_{k}, 2 p_{k} + 1} x_{i_{k}, 2 p_{k}} . \end{matrix}

Figure 6. Illustration of the reduction from a simplified version of Exact Cover, with size-2 sets over a size-4 universe

U = {1, 2, 3, 4}

:

A = {1, 2}

,

B = {1, 4}

,

C = {2, 4}

,

D = {3, 4}

(for lighter indices we use

a, b, c, d

instead of

x_{1}, x_{2}, x_{3}, x_{4}

; note also that we have

a_{i}

for

1 \leq i \leq 6

instead of

1 \leq i \leq 8

because of the smaller sets). All six non-trivial cycles of

C (S, T)

are depicted on the top permutation:

C^{a} = {a_{1}, a_{3}, a_{5}}

,

C^{b}

, etc. In this example,

(S, T)

admits an efficient 11-cut rearrangement corresponding to an exact set cover of U: cut all breakpoints in bold cycles

C^{a}, C^{d}, C_{1}^{y}

, followed by an efficient 19-cut rearrangement solving the remaining breakpoints (in the actual reduction, we consider instances where

7 n = 8 p

to enforce that both efficient rearrangements have the same size k).

We now build permutations S and T:

\begin{matrix} S = & x_{1, 1} x_{1, 8} x_{2, 1} x_{2, 8} \dots x_{n, 1} x_{n, 8} & y_{1} y_{4} S_{1} y_{3} y_{6} S_{2} y_{5} \dots y_{6 p - 1} y_{6 p + 2} S_{3 p} y_{6 p + 1} y_{2} \\ T = & x_{1, 1} x_{1, 2} \dots x_{1, 8} x_{2, 1} x_{2, 2} \dots x_{2, 8} \dots x_{n, 1} x_{n, 2} \dots x_{n, 8} & y_{1} y_{2} \dots y_{6 p + 2} \end{matrix}

where the suffix of S is the concatenation of

y_{2 i - 1} y_{2 i + 2} S_{i}

for every

1 \leq i \leq 3 p

, followed by

y_{6 p + 1} y_{2}

. Finally, we set

k = 7 p + 1

. Note that strings S and T have

n + 1

adjacencies (0, and each

x_{i, 8}

,

1 \leq i \leq n

), so they have

(8 n + 6 p + 2) + 1 - (n + 1) = 7 n + 6 p + 2 = 14 p + 2 = 2 k

breakpoints. We compute the cycles of

C (S, T)

. To this end, we describe

n + 2

cycles (denoted

C_{i}^{x}

,

1 \leq i \leq n

,

C_{1}^{y}

and

C_{2}^{y}

), and check that their total size is indeed equal to the number of breakpoints.

For each

1 \leq i \leq n

, let

C_{i}^{x}

be cycle

x_{i, 1} \to x_{i, 3} \to x_{i, 5} \to x_{i, 7} \to x_{i, 1}

. There are n such size-4 cycles, for a total size of

4 n

breakpoints. Let

C_{1}^{y}

be the cycle

y_{1} \to y_{6 p + 1} \to y_{6 p - 1} \to \dots y_{5} \to y_{3} \to y_{1}

. It has size

3 p + 1

. The remaining arcs in substring

y_{2 u + 2} S_{u} y_{2 u + 1}

are

y_{2 u} \to x_{i_{k}, 2 p_{k}}

,

x_{i_{j}, 2 p_{j}} \to x_{i_{j - 1}, 2 p_{j - 1}}

for

2 \leq j \leq k

, and

x_{i_{1}, 2 p_{1}} \to y_{2 u + 2}

. Furthermore,

y_{6 p + 2} \to y_{2}

, so all these arcs form a single cycle,

C_{2}^{y}

of length

3 n + 3 p + 1

(

3 n

for all

x_{i, 2 q}

,

q \in {1, 2, 3}

, plus

3 p + 1

for all

y_{2 q}

,

1 \leq q \leq 3 p + 1

). The total size of the cycles above is

(4 n) + (3 p + 1) + (3 n + 3 p + 1) = 7 n + 6 p + 2

, so all breakpoints have been accounted for. Clearly, the reduction can be performed in polynomial time; we now prove that the reduction is correct, i.e.:

S can be transformed into T with 2 k-cut rearrangements

\Leftrightarrow X

has an exact cover of U.

(\Leftarrow)

Given an exact cover of U, we show that cutting all breakpoints in cycles

C_{1}^{y}

and

C_{i}^{x}

where

x_{i}

is in the set cover yields a k-cut rearrangement solving k breakpoints in S. In other words, we first cut the following breakpoints: for each i such that

X_{i}

is in the cover, cut between

x_{i, 1}

and

x_{i, 8}

, as well as between

x_{i, 2 q + 1}

and

x_{i, 2 q}

for

1 \leq q \leq 3

. Furthermore, cut after

y_{2 i + 1}

for each

0 \leq i \leq 3 p

. There are indeed

4 p + 3 p + 1 = k

cuts. At this point, each substring

y_{2 u + 2} S_{u} y_{2 u + 1}

is cut into two strings

y_{2 u + 2} S_{u}^{'} x_{j, 2 p + 1}

, and

x_{j, 2 p} S_{u}^{″} y_{2 u + 1}

(where j is the index of the only set

X_{j}

in the cover, such that

u \in X_{j}

) merge them together in reverse order, i.e., create string

X_{j, p} = x_{j, 2 p} S_{u}^{″} y_{2 u + 1} y_{2 u + 2} S_{u}^{'} x_{j, 2 p + 1}

(note that breakpoint

y_{2 u + 1}

is now solved). Furthermore, we merge the prefix ending in

y_{1}

with

y_{2}

, solving an additional breakpoint. Finally, for each j, such that

X_{j}

is in the cover, insert strings

X_{j, 1}

,

X_{j, 2}

,

X_{j, 3}

between

x_{j, 1}

and

x_{j, 8}

. This step solves breakpoints

x_{j, 1}, x_{j_{3}}, x_{j_{5}}

and

x_{j_{3}}

for each j in the cover, for a total of

4 p

breakpoints solved. So, overall, k breakpoints of S can be solved with a single k-cut rearrangement. Note that the resulting

S^{'}

thus has k remaining breakpoints with T: they can all be solved with a single k-cut rearrangement, so S can be sorted into T in only 2 steps.

(\Rightarrow)

Permutation S can be sorted into T with

ℓ = 2

k-cut rearrangements if, and only if, it can be sorted using two efficient k-cut rearrangements (Proposition 1). Each of these solves a union of cycles of total size k (Proposition 3). In other words, some cycles (of total size k) are solved first (i.e., with the first rearrangement), the other cycles (of total size k as well) are solved second. Note that all cycles

C_{i}^{x}

together have a size of

4 n < k

, so at least one cycle among

C_{1}^{y}

,

C_{2}^{y}

is solved first. They are too large to be solved together, so in fact exactly one of them is solved first, the other second. We extend the notion of being solved first to breakpoints and arcs appearing in the cycles solved first. The following result adapts Proposition 4 to our setting.

Claim 2.

If an arc

u \to v

such that v occurs before u in S is solved first, then some breakpoint strictly between v and u in S is solved first as well.

Proof.

Let

v w_{1} \dots w_{k} u

be the substring of S from v to u. Then, T has

u w_{1}

as a consecutive pair. Solving breakpoint u implies that the rearrangement puts

w_{1}

just after u, which is only possible if there is at least one cut point between

w_{1}

and u in S. □

Consider any

u \in U

, and element

x_{u, 3}

. This element appears in some substring

z x_{u, 3}, x_{u, 2}

of S (depending on the position of

x_{u, 3}

, z is either of the form

y_{2 q}

or

x_{u^{'}, 2 p}

), and

C_{2}^{y}

has an arc

x_{u, 2} \to z

. By the lemma above, if

C_{2}^{y}

is solved first, then the only breakpoint between z and

x_{u, 2}

, (namely,

x_{u, 3}

) must be solved first, and the cycle

C_{u}^{x}

containing it as well. So,

C_{2}^{y}

cannot be solved first, since otherwise, all cycles

C^{x}

would be solved first as well, for a total size of

7 n + 3 p + 1 > 7 p + 1 = k

.

Thus,

C_{1}^{y}

is solved first, as well as

\frac{k - | C_{1}^{y} |}{4} = p

cycles of the form

C_{i}^{x}

. Write P for the size-p subset of

{1, . .,, n}

, such that

i \in P

, if, and only if,

C_{i}^{x}

is solved first. We show that

⋃_{i \in P} X_{i} = U

, i.e., P is a solution to the set cover instance. Indeed, let

u \in U

. Recall that S contains substring

y_{2 u - 1} y_{2 u + 2} S_{u} y_{2 u + 1}

, with arc

y_{2 u + 1} \to y_{2 u - 1}

in cycle

C_{1}^{y}

. Thus, some other breakpoint z of this substring must be solved first. We have

z \neq y_{2 u + 2}

and

z \neq x_{i, 2 q}

, since otherwise

z \in C_{2}^{y}

, which is solved second. Thus

z = x_{i, 2 q + 1}

for some i such that

u \in X_{i}

, and since z is solved first, we have

i \in P

, which concludes the hardness proof for

ℓ = 2

.

For higher values of ℓ, the following is a reduction of SMCR for ℓ to SMCR for

ℓ + 1

: given permutations

S, T

and an integer k (assuming S and T have

k ℓ

breakpoints), let

S^{'} = S \cdot z_{1} \dots z_{k}

and

T^{'} = T \cdot z_{k} \dots z_{1}

. Then, breakpoints

(z_{1}, \dots, z_{k})

form a size-k cycle: any efficient sorting of

(S, T)

must use one step to exclusively sort the breakpoints in this cycle. Thus,

(S, T)

can be solved with ℓ many k-cut rearrangements if, and only if,

(S^{'}, T^{'})

can be solved with

ℓ + 1

many k-cut rearrangements. □

In many settings, we encounter a class of Center problems, which can be generally defined as follows: given a set of objects

S_{1}, \dots, S_{m}

and a radius k, find an object

S^{*}

at distance at most k from each

S_{i}

. In particular, in the context of strings, depending on the distance measure, this problem is called Consensus String (using the Hamming distance) or Center String (using the Edit distance). Problems Longest Common Subsequence and Shortest Common Supersequence also fit in this setting, where (non-symmetrical) distances count 1 for deletions (resp. insertions), and forbid all other operations. See [11] for a review of results on these and related problems. Applying this framework to permutations, it is natural to consider the Breakpoint Center problem: given permutations

(S_{1}, \dots, S_{m})

and an integer k, find a permutation

S^{*}

such that the breakpoint distance between each

S_{i}

and

S^{*}

is at most k. Note that this is slightly different from the Breakpoint Median problem [19], usually defined over

m = 3

permutations, which aims to minimize the sum of distances rather than the maximum. Clearly, for

m = 2

,

(S_{1}, S_{2})

have a breakpoint center

S^{*}

at distance k if, and only if,

S_{2}

can be obtained from

S_{1}

using two k-cut rearrangements (using

S^{*}

as an intermediary step). Thus, Theorem 8 can be restated as follows.

Corollary 1.

TheBreakpoint Centerproblem isNP-hard, even when restricted to two input permutations.

3.3. Fixed-Parameter Tractability and Approximability

We now show that input instances with bounded k and ℓ can be solved efficiently. More precisely, we show that SMCR in permutations is fixed-parameter tractable when parameterized by

k + ℓ

. In fact, SMCR admits a kernel with a polynomial size bound.

Theorem 9.

SMCRin permutations admits a kernel of size

2 k ℓ + 2

.

Proof.

The kernel is based on the following reduction rule: if there is a duo

(a, b)

(i.e., a substring

a b

) in both S and T, then remove b from S and T. Before we show the correctness, observe that the exhaustive application of the rule indeed gives the desired result: any Yes-instance that is reduced exhaustively with respect to the above rule has at most

4 k ℓ + 2

letters: we must cut inside every duo in S. Overall, we may create at most

k ℓ

cuts via ℓ many k-cut rearrangements. Hence, if S has more than

k ℓ

duos, then

(S, T)

is a No-instance. Thus, after applying the rule exhaustively, we may replace any instance with more than

k ℓ + 1

letters in S (and thus in T) by a trivial No-instance of constant size. Thus, the correctness of the rule remains to be proven. Consider an instance consisting of the permutations S and T to which the rule is applied and let

S^{'}

and

T^{'}

denote the resulting instance. We show that

(S, T)

is a Yes-instance if, and only if,

(S^{'}, T^{'})

is a Yes-instance.

(\Rightarrow)

If

(S, T)

is a Yes-instance, then there is a sequence of

ℓ + 1

permutations

(S = S_{1}, S_{2}, \dots, S_{ℓ + 1} = T)

, such that

S_{i + 1}

can be obtained from

S_{i}

via one k-cut rearrangement. Removing b from

S_{i}

gives a sequence

(S_{1}^{'}, S_{2}^{'}, \dots, S_{ℓ + 1}^{'} = T)

such that

S_{i + 1}^{'}

can be obtained from

S_{i}^{'}

via one k-cut rearrangement.

(\Leftarrow)

If

(S^{'}, T^{'})

is a Yes-instance, then there is a sequence of

ℓ + 1

permutations

(S^{'} = S_{1}^{'}, S_{2}^{'}, \dots, S_{ℓ + 1}^{'} = T^{'})

, such that

S_{i + 1}^{'}

can be obtained from

S_{i}^{'}

via one k-cut rearrangement. Replacing a by

a b

in each permutation

S_{i}^{'}

gives a sequence

(S = S_{1}, S_{2}, \dots, S_{ℓ + 1} = T)

such that

S_{i + 1}^{'}

can be obtained from

S_{i}^{'}

via one k-cut rearrangement. □

In addition, we show that for arbitrary values of k, SMCR has a polynomial-time approximation algorithm. Let Opt-SMCR be the optimisation version of SMCR, where we look for the smallest ℓ that is necessary to obtain T from S by k-cut rearrangements.

Theorem 10.

Opt-SMCRin permutations is 2-approximable.

Proof.

Let

I = (S, T, k)

be an instance of Opt-SMCR. We first rewrite S and T into

S^{'}

and

T^{'}

in such a way that

T^{'} = i d_{n}

, where

i d_{n}

denotes the identity permutation of

{1, \dots, n}

. Let

k^{'} = ⌊ \frac{k}{2} ⌋

. The algorithm consists of iterating the following three steps, starting from

S^{'}

:

(a): rewrite $S^{'}$ , by contracting adjacencies so as to obtain a permutation containing no adjacencies,
(b): cut around (i.e., right before and right after) the first $k^{'}$ elements $1, 2, 3, \dots, k^{'}$ of that permutation, and
(c): rearrange it so as to obtain $i d_{k^{'}}$ followed by the rest of the permutation.

Steps (b) and (c) above are presented for the case where k is even. If k is odd, (b) and (c) are slightly modified, since we are left with an unused cut:

(b’): do as (b) and additionally cut to the left of $k^{'} + 1$
(c’): do as (c) but rearrange in such a way that $k^{'}$ and $k^{'} + 1$ are consecutive.

Clearly, the optimal value ℓ for Opt-SMCR satisfies

ℓ \geq \frac{b (S, T)}{k}

. Our algorithm removes at least

k^{'}

(at least

k^{'} + 1

) breakpoints at each iteration when k is even (when k is odd), and thus requires

ℓ^{'} \leq \frac{b (S, T)}{k^{'}}

(

ℓ^{'} \leq \frac{b (S, T)}{k^{'} + 1}

) many k-cut rearrangements. Altogether, we have

ℓ^{'} \leq \frac{ℓ k}{k^{'}}

if k is even and

ℓ^{'} \leq \frac{ℓ k}{k^{'} + 1}

if k is odd. Since

k^{'} = ⌊ \frac{k}{2} ⌋

, we conclude that

ℓ^{'} \leq 2 ℓ

. □

4. Conclusions

We introduced Sorting by Multi-Cut Rearrangements, a generalization of usual genome rearrangement problems that, however, does not include reversals. We discussed its classical computational complexity (P vs. NP-hard) and its membership in FPT with respect to the parameters k and ℓ. For this, we distinguished the case where S and T are permutations from the case where S and T are strings.

The obvious remaining open problem is the one indicated in Table 1, namely the FPT status of SMCR with respect to parameter

ℓ + k

in strings (including cases where one of k or ℓ is constant, with

ℓ > 1

). Another question is whether the approximation factor of 2 of Theorem 10 can be improved. We also recall that we conjecture SMCR to be NP-hard for any

k \geq 5

in binary strings. It would also be interesting to better understand the comparative roles of k and ℓ in SMCR, for instance by studying the following question: assuming k is increased by some constant c, what impact does this have on the optimal distance?

Extensions or variants of SMCR could also be studied, notably the one allowing reversals (and thus applicable to signed strings/permutations), or the one where T is the lexicographically ordered string derived from S, as studied, for instance, by Radcliffe et al. [13] in the case of reversals.

Author Contributions

Conceptualization, L.B., G.F., G.J. and C.K.; methodology, L.B., G.F., G.J. and C.K.; validation, L.B., G.F., G.J. and C.K.; formal analysis, L.B., G.F., G.J. and C.K.; writing—original draft preparation, L.B., G.F., G.J. and C.K.; writing—review and editing, L.B., G.F., G.J. and C.K.; supervision, G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded and supported by the PHC Procope grant number 17746PC (G.F. and G.J.), and by the DAAD Procope grant number 57317050 (C.K.).

Data Availability Statement

Not Applicable, the study does not report any data.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SMCR	Sorting by Multi-Cut Rearrangements
MCSP	Minimum Common String Partition
FPT	Fixed-Parameter Tractable

References

Bafna, V.; Pevzner, P.A. Genome Rearrangements and Sorting by Reversals. SIAM J. Comput. 1996, 25, 272–289. [Google Scholar] [CrossRef]
Bafna, V.; Pevzner, P.A. Sorting by Transpositions. SIAM J. Discret. Math. 1998, 11, 224–240. [Google Scholar] [CrossRef]
Christie, D.A. Sorting Permutations by Block-Interchanges. Inf. Process. Lett. 1996, 60, 165–169. [Google Scholar] [CrossRef]
Fertin, G.; Labarre, A.; Rusu, I.; Tannier, E.; Vialette, S. Combinatorics of Genome Rearrangements; Computational Molecular Biology; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
Stephens, P.J.; Greenman, C.D.; Fu, B.; Yang, F.; Bignell, G.R.; Mudie, L.J.; Pleasance, E.D.; Lau, K.W.; Beare, D.; Stebbings, L.A.; et al. Massive Genomic Rearrangement Acquired in a Single Catastrophic Event during Cancer Development. Cell 2011, 144, 27–40. [Google Scholar] [CrossRef] [PubMed]
Pellestor, F.; Gatinois, V. Chromoanagenesis: A piece of the macroevolution scenario. Mol. Cytogenet. 2020, 13, 3. [Google Scholar] [CrossRef] [PubMed]
Hannenhalli, S.; Pevzner, P.A. Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals. J. ACM 1999, 46, 1–27. [Google Scholar] [CrossRef]
Bulteau, L.; Fertin, G.; Jean, G.; Komusiewicz, C. Sorting by Multi-cut Rearrangements. In Proceedings of the SOFSEM 2021: Theory and Practice of Computer Science—47th International Conference on Current Trends in Theory and Practice of Computer Science, Bolzano-Bozen, Italy, 25–29 January 2021; pp. 593–607. [Google Scholar]
Downey, R.; Fellows, M.R. Fundamentals of Parameterized Complexity; Springer: London, UK, 2013. [Google Scholar]
Cygan, M.; Fomin, F.V.; Kowalik, L.; Lokshtanov, D.; Marx, D.; Pilipczuk, M.; Pilipczuk, M.; Saurabh, S. Parameterized Algorithms; Springer: London, UK, 2015. [Google Scholar]
Bulteau, L.; Hüffner, F.; Komusiewicz, C.; Niedermeier, R. Multivariate algorithmics for NP-hard string problems. Bull. Eur. Assoc. Theor. Comput. Sci. 2014, 114. Available online: https://hal.archives-ouvertes.fr/hal-01260610/document (accessed on 27 May 2021).
Bulteau, L.; Fertin, G.; Rusu, I. Sorting by Transpositions Is Difficult. SIAM J. Discret. Math. 2012, 26, 1148–1180. [Google Scholar] [CrossRef]
Radcliffe, A.J.; Scott, A.D.; Wilmer, E.L. Reversals and transpositions over finite alphabets. SIAM J. Discret. Math. 2005, 19, 224–244. [Google Scholar] [CrossRef]
Christie, D.A. Genome Rearrangement Problems. Ph.D. Thesis, University of Glasgow, Glasgow, UK, 1998. [Google Scholar]
Lin, Y.C.; Lu, C.L.; Chang, H.Y.; Tang, C.Y. An efficient algorithm for sorting by block-interchanges and its application to the evolution of vibrio species. J. Comput. Biol. 2005, 12, 102–112. [Google Scholar] [CrossRef] [PubMed]
Kolman, P.; Goldstein, A.; Zheng, J. Minimum Common String Partition Problem: Hardness and Approximations. Electron. J. Comb. 2005, 12. [Google Scholar] [CrossRef]
Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman & Co.: New York, NY, USA, 1979. [Google Scholar]
Bulteau, L.; Komusiewicz, C. Minimum Common String Partition Parameterized by Partition Size Is Fixed-Parameter Tractable. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 102–121. [Google Scholar]
Sankoff, D.; Blanchette, M. The Median Problem for Breakpoints in Comparative Genomics. In Proceedings of the Computing and Combinatorics, Third Annual International Conference, COCOON ’97, Shanghai, China, 20–22 August 1997; Springer: London, UK, 1997; Volume 1276, pp. 251–264. [Google Scholar]

Figure 1. Examples of k-cut rearrangement scenarii. Cuts are represented by vertical lines and the order of the rearranged blocks is indicated by the framed numbers. (A) Example on strings

S = a b c e b c d a d c c b

and

T = a e a b b c c b c c d d

. (A.1) shows a 5-cut rearrangement scenario with

ℓ = 3

. (A.2) shows a 6-cut rearrangement scenario with

ℓ = 2

. (B) Example on permutations

S = 3 5 9 7 10 4 6 1 8 2

and

T = 1 2 3 4 5 6 7 8 9 10

with

b (S, T) = 11

. (B.1,B.2) show two alternative 5-cut rearrangement scenarii with

ℓ = 3

.

Figure 1. Examples of k-cut rearrangement scenarii. Cuts are represented by vertical lines and the order of the rearranged blocks is indicated by the framed numbers. (A) Example on strings

S = a b c e b c d a d c c b

and

T = a e a b b c c b c c d d

. (A.1) shows a 5-cut rearrangement scenario with

ℓ = 3

. (A.2) shows a 6-cut rearrangement scenario with

ℓ = 2

. (B) Example on permutations

S = 3 5 9 7 10 4 6 1 8 2

and

T = 1 2 3 4 5 6 7 8 9 10

with

b (S, T) = 11

. (B.1,B.2) show two alternative 5-cut rearrangement scenarii with

ℓ = 3

.

Figure 2. Relationship between classical rearrangements and k-cut rearrangements. Cuts are represented by vertical lines, while dashed rectangles define blocks that are moved in classical rearrangements. (a) when

k = 3

, a k-cut rearrangement corresponds to a transposition (here of blocks

X_{2}

and

X_{3}

). (b–f) are the 5 possible cases that we can get with a 4-cut rearrangement, all of them corresponding to a block interchange of dashed blocks.

Figure 2. Relationship between classical rearrangements and k-cut rearrangements. Cuts are represented by vertical lines, while dashed rectangles define blocks that are moved in classical rearrangements. (a) when

k = 3

, a k-cut rearrangement corresponds to a transposition (here of blocks

X_{2}

and

X_{3}

). (b–f) are the 5 possible cases that we can get with a 4-cut rearrangement, all of them corresponding to a block interchange of dashed blocks.

Figure 3. Example of the proposed reduction starting from the following instance of 3-Partition:

A = {6, 4, 4, 7, 4, 5, 4, 7, 5, 5, 4, 5}

(thus

m = 4

) and

B = 15

. Each

a_{i}

is represented by a rectangle containing

a_{i}

squares. Colors in

A

describe a way to partition

A

into triplets, each summing to

B = 15

. S and T are built as described in the reduction. Each vertical arrow corresponds to a 7-cut. The first one, which is the first step of the proposed 7-cut rearrangement scenario, going from S to

S^{'}

, is detailed: each red bullet corresponds to a cut, and blocks of 0 that are rearranged are underlined in green (corresponding to the three “green

a_{i}

s” summing to B).

Figure 3. Example of the proposed reduction starting from the following instance of 3-Partition:

A = {6, 4, 4, 7, 4, 5, 4, 7, 5, 5, 4, 5}

(thus

m = 4

) and

B = 15

. Each

a_{i}

is represented by a rectangle containing

a_{i}

squares. Colors in

A

describe a way to partition

A

into triplets, each summing to

B = 15

. S and T are built as described in the reduction. Each vertical arrow corresponds to a 7-cut. The first one, which is the first step of the proposed 7-cut rearrangement scenario, going from S to

S^{'}

, is detailed: each red bullet corresponds to a cut, and blocks of 0 that are rearranged are underlined in green (corresponding to the three “green

a_{i}

s” summing to B).

Figure 4. Example of a 3-cut rearrangement scenario from

S = 5 2 4 7 3 8 6 1 9

to

T = 1 2 3 4 5 6 7 8 9

using cycle graphs. From top to bottom: The cycle graph for

(S, T)

contains three non-trivial cycles and a trivial one (breakpoint u is depicted to the right of the label for u). Blocks 1 to 4 of a 3-cut rearrangement (cutting after 5, 8, and 1) are given: it solves cycle

5 \to 8 \to 1 \to 5

(denoted

(5, 8, 1)

for short). Two other efficient rearrangements solve cycles

(0, 6, 4)

and

(2, 7, 3)

, thus reaching T in three steps. Note that some cycles do not admit efficient 3-cut rearrangement, e.g.,

(0, 6, 4)

in the original permutation S.

Figure 4. Example of a 3-cut rearrangement scenario from

S = 5 2 4 7 3 8 6 1 9

to

T = 1 2 3 4 5 6 7 8 9

using cycle graphs. From top to bottom: The cycle graph for

(S, T)

contains three non-trivial cycles and a trivial one (breakpoint u is depicted to the right of the label for u). Blocks 1 to 4 of a 3-cut rearrangement (cutting after 5, 8, and 1) are given: it solves cycle

5 \to 8 \to 1 \to 5

(denoted

(5, 8, 1)

for short). Two other efficient rearrangements solve cycles

(0, 6, 4)

and

(2, 7, 3)

, thus reaching T in three steps. Note that some cycles do not admit efficient 3-cut rearrangement, e.g.,

(0, 6, 4)

in the original permutation S.

Figure 5. (a) The cycle

(1, 5, 8)

from Figure 4, and how solving it affects the rest of the sequence (substrings

2 - 8

and

6 - 1

, i.e., blocks Algorithms 14 00169 i003

and

are swapped). (b) The 2-extension of S for breakpoint

x = 1

, giving cycle

(1, n + 1, n + 2, 5, 8)

. This extended cycle can be solved with a 5-cut rearrangement having an equivalent effect on the sequence (substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{2}^{'}

is the 2-extension of

S^{'}

for adjacency

x = 1

. (c) The 3-extension of S for breakpoint

x = 1

, giving cycles

(1, n + 2, 5, 8)

and

(n + 1, n + 3)

(the latter being tied to the former: there is no way of solving breakpoint

n + 3

without cutting

n + 2

). These two cycles can be solved with a 6-cut rearrangement having an equivalent effect on the sequence (again, substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{3}^{'}

is the 3-extension of

S^{'}

for adjacency

x = 1

.

Figure 5. (a) The cycle

(1, 5, 8)

from Figure 4, and how solving it affects the rest of the sequence (substrings

2 - 8

and

6 - 1

, i.e., blocks Algorithms 14 00169 i003

and

are swapped). (b) The 2-extension of S for breakpoint

x = 1

, giving cycle

(1, n + 1, n + 2, 5, 8)

. This extended cycle can be solved with a 5-cut rearrangement having an equivalent effect on the sequence (substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{2}^{'}

is the 2-extension of

S^{'}

for adjacency

x = 1

. (c) The 3-extension of S for breakpoint

x = 1

, giving cycles

(1, n + 2, 5, 8)

and

(n + 1, n + 3)

(the latter being tied to the former: there is no way of solving breakpoint

n + 3

without cutting

n + 2

). These two cycles can be solved with a 6-cut rearrangement having an equivalent effect on the sequence (again, substrings

2 - 8

and

6 - 1

are swapped). The resulting

S_{3}^{'}

is the 3-extension of

S^{'}

for adjacency

x = 1

.

Figure 6. Illustration of the reduction from a simplified version of Exact Cover, with size-2 sets over a size-4 universe

U = {1, 2, 3, 4}

:

A = {1, 2}

,

B = {1, 4}

,

C = {2, 4}

,

D = {3, 4}

(for lighter indices we use

a, b, c, d

instead of

x_{1}, x_{2}, x_{3}, x_{4}

; note also that we have

a_{i}

for

1 \leq i \leq 6

instead of

1 \leq i \leq 8

because of the smaller sets). All six non-trivial cycles of

C (S, T)

are depicted on the top permutation:

C^{a} = {a_{1}, a_{3}, a_{5}}

,

C^{b}

, etc. In this example,

(S, T)

admits an efficient 11-cut rearrangement corresponding to an exact set cover of U: cut all breakpoints in bold cycles

C^{a}, C^{d}, C_{1}^{y}

, followed by an efficient 19-cut rearrangement solving the remaining breakpoints (in the actual reduction, we consider instances where

7 n = 8 p

to enforce that both efficient rearrangements have the same size k).

Figure 6. Illustration of the reduction from a simplified version of Exact Cover, with size-2 sets over a size-4 universe

U = {1, 2, 3, 4}

:

A = {1, 2}

,

B = {1, 4}

,

C = {2, 4}

,

D = {3, 4}

(for lighter indices we use

a, b, c, d

instead of

x_{1}, x_{2}, x_{3}, x_{4}

; note also that we have

a_{i}

for

1 \leq i \leq 6

instead of

1 \leq i \leq 8

because of the smaller sets). All six non-trivial cycles of

C (S, T)

are depicted on the top permutation:

C^{a} = {a_{1}, a_{3}, a_{5}}

,

C^{b}

, etc. In this example,

(S, T)

admits an efficient 11-cut rearrangement corresponding to an exact set cover of U: cut all breakpoints in bold cycles

C^{a}, C^{d}, C_{1}^{y}

, followed by an efficient 19-cut rearrangement solving the remaining breakpoints (in the actual reduction, we consider instances where

7 n = 8 p

to enforce that both efficient rearrangements have the same size k).

Table 1. Summary of the results for Sorting by Multi-Cut Rearrangements in strings. Recall that d denotes the maximum number of occurrences of any character in the input string S.

Table 2. Exhaustive list of duos present both in S and T, together with their occurrences.

Duo	Occurrences in S	Occurrences in T
$(0, 0)$	$B m - 3 m$	$B m - m$
$(0, 1)$	$3 m$	0
$(0, 2)$	0	m
$(1, 0)$	$3 m$	0
$(1, 1)$	0	$3 m$
$(1, 2)$	1	1
$(2, 0)$	0	m
$(2, 1)$	0	0
$(2, 2)$	m	0

Table 3. Summary of the results for Sorting by Multi-Cut Rearrangements in permutations. Additionally, Opt-SMCR admits a 2-approximation algorithm (Theorem 10).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.

Sorting by Multi-Cut Rearrangements †

Abstract

1. Introduction

1.1. Notations and Definitions

1.2. Parameterized Algorithmics

1.3. Basic Observations

2. Sorting by Multi-Cut Rearrangements in Strings

3. Sorting by Multi-Cut Rearrangements in Permutations

3.1. Hardness for Constant Number of Cuts

3.1.1. Breakpoints and Cycle Graph

3.1.2. One-Cycle Extensions

3.1.3. Extending All Cycles

3.2. Hardness for Constant Number of Rearrangements

3.3. Fixed-Parameter Tractability and Approximability

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Article Metrics

Article Access Statistics

Sorting by Multi-Cut Rearrangements^†