**Graph Algorithms and Applications**

Editors

**Serafino Cicerone Gabriele Di Stefano**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Serafino Cicerone Department of Information Engineering, Computer Science and Mathematics University of L'Aquila L'Aquila Italy Gabriele Di Stefano Department of Information Engineering, Computer Science and Mathematics University of L'Aquila L'Aquila Italy

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Algorithms* (ISSN 1999-4893) (available at: www.mdpi.com/journal/algorithms/special_issues/Graph_Algorithms_Applications).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-1542-7 (Hbk) ISBN 978-3-0365-1541-0 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **About the Editors**

#### **Serafino Cicerone**

Serafino Cicerone received a PhD degree from the University "La Sapienza" of Rome in 1997. He is currently an Associate Professor with the Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila. His research interests revolve around the specification, design, verification and implementation of efficient algorithms. Specific areas of interest include algorithmic graph theory, combinatorial optimization, distributed algorithms, algorithm engineering, and spatial and geometric data.

#### **Gabriele Di Stefano**

Gabriele Di Stefano received a PhD degree from the University "La Sapienza" of Rome in 1992. He is currently a Full Professor with the Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila. He has played key roles in several EU-funded projects, including MILORD (AIM 2024), COLUMBUS (IST 2001-38314), AMORE (HPRN-CT-1999-00104), ARRIVAL (IST FP6-021235-2), and, recently, GEOSAFE (H2020-691161). His current research interests include algorithmic graph theory, combinatorial optimization, network algorithms, and distributed computing.

## *Editorial* **Special Issue on "Graph Algorithms and Applications"**

**Serafino Cicerone \* and Gabriele Di Stefano**

Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila, I-67100 L'Aquila, Italy; gabriele.distefano@univaq.it

**\*** Correspondence: serafino.cicerone@univaq.it

**Abstract:** Much real-life data naturally exhibits structure or connections. Typical examples include biological data, communication network data, and image data. Graphs provide a natural way to represent and analyze these types of data and their relationships. For instance, graphs have recently found new applications in solving problems for emerging research fields such as social network analysis, design of robust computer network topologies, frequency allocation in wireless networks, and bioinformatics. Unfortunately, the related algorithms usually suffer from high computational complexity, since some of these problems are NP-hard. Therefore, in recent years, many graph models and optimization algorithms have been proposed to achieve a better balance between efficacy and efficiency. The aim of this Special Issue is to provide an opportunity for researchers and engineers from both academia and industry to publish their latest and original results on graph models, algorithms, and applications to problems in the real world, with a focus on optimization and computational complexity.

**Keywords:** analysis and design of graph algorithms; distributed graph and network algorithms; graph theory with algorithmic applications; computational complexity of graph problems; experimental evaluation of graph algorithms

**Citation:** Cicerone, S.; Di Stefano, G. Special Issue on "Graph Algorithms and Applications". *Algorithms* **2021**, *14*, 150. https://doi.org/10.3390/a14050150

Received: 26 April 2021 Accepted: 6 May 2021 Published: 10 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Graphs are mathematical abstractions that can be used to model networks of various types: physical (e.g., the Internet or transportation networks), biological (e.g., brain networks), or social (e.g., online social networks). This led to the development of algorithmic graph theory as a classical research area in computer science. It focuses on the discovery of characterization theorems on (different types of) graphs, which in turn often lead to the development of efficient algorithms for practical problems that can be modeled on graphs.

#### **2. Special Issue**

In response to the call for papers, a total of eighteen manuscripts were submitted. Out of them, we selected six submissions to appear in this Special Issue. In what follows, we summarize the contents of all six published papers.

In [1], the authors addressed a typical problem concerning the visual analysis of real-world networks. To this end, they introduced and studied the following beyond-planarity problem, which they call *h*-CLIQUE2PATH PLANARITY. Let *G* be a simple topological graph whose vertices are partitioned into subsets of size at most *h*, each inducing a clique: *h*-CLIQUE2PATH PLANARITY asks whether it is possible to obtain a planar subgraph of *G* by removing edges from each clique so that the subgraph induced by each subset is a path. They investigated the complexity of this problem in relation to *k*-planarity. In particular, they proved that *h*-CLIQUE2PATH PLANARITY is NP-complete even when *h* = 4 and *G* is a simple 3-plane graph, while it can be solved in linear time when *G* is a simple 1-plane graph, for any value of *h*. The results contribute to the growing fields of hybrid planarity and of graph drawing beyond planarity.

In [2], the authors used graph theory models to cope with problems arising in the field of molecular biology and bioinformatics. They considered the ancestral mixture model proposed by Chen and Lindsay in 2006, an important model building a hierarchical tree from high dimensional binary sequences. As a phylogenetic tree (or evolutionary tree), a mixture tree created from ancestral mixture models involves the inferred evolutionary relationships among various biological species. Moreover, it contains the information of time when the species mutates. The tree comparison metric, an essential issue in bioinformatics, is used to measure the similarity between trees. Since the approach to the comparison between two mixture trees is still unknown, the authors proposed a new metric to measure the similarity of two mixture trees and designed efficient algorithms for computing it.

In [3], the authors proposed graph models and algorithms for social network analysis. In particular, they considered the phenomenon occurring in many political campaigns where social influence is used in order to convince voters to support/oppose a specific candidate/party. In the problem of election control via social influence, an attacker tries to find a limited set of influencers to start disseminating a political message in a social network of voters. A voter changes their opinion when they receive and accept the message. In the constructive case, the goal is to maximize the number of votes/winners of a target candidate/party, while in the destructive case, the attacker tries to minimize them. Recent works considered the problem in different models and presented some hardness and approximation results. In that paper, the authors considered multi-winner election control through social influence on different graph structures and diffusion models, and the goal was to maximize/minimize the number of winners in the target party. They showed that the problem is hard to approximate when voters' connections form a graph and the diffusion model is the linear threshold model. They also proved the same result considering an arborescence under the independent cascade model. Moreover, they presented a dynamic programming algorithm for the cases in which the voting system is a variation of straight-party voting and voters form a tree.

In [4], the authors considered congestion games, a well-known class of noncooperative games that have the capability to model several interesting competitive scenarios while maintaining nice properties. In these games, there is a set of players sharing a set of resources. Each resource has an associated cost function, which depends on the number of players using it (the so-called congestion). Players aim to choose subsets of resources to minimize the sum of resource costs. In particular, the authors introduced multidimensional congestion games, that is, congestion games for which the set of players is partitioned into $d + 1$ clusters $C_0, C_1, \ldots, C_d$. Players in $C_0$ have full information about all of the other participants in the game, while players in $C_i$, for any $1 \le i \le d$, have full information only about the members of $C_0 \cup C_i$ and are unaware of the others. This model has at least two interesting applications: (*i*) it is a special case of graphical congestion games induced by an undirected social knowledge graph with independence number equal to $d$, and (*ii*) it represents scenarios in which players have a type and the level of competition they experience on a resource depends on their type and on the types of the other players using it. The authors focused on the case in which the cost function associated with each resource is affine, and bounded the price of anarchy and the price of stability as a function of $d$ with respect to two meaningful social cost functions and for both weighted and unweighted players. They also provided refined bounds for the special case of $d = 2$ in the presence of unweighted players.
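To make the basic mechanics concrete, the following is a minimal sketch of a classical (full-information) congestion game with affine costs; it does not model the clusters of the multidimensional variant. The names `player_cost` and `is_pure_nash`, and the representation of a strategy as a tuple of resource names, are assumptions made only for illustration.

```python
from collections import Counter

def player_cost(profile, i, cost):
    """Cost paid by player i: sum of the costs of the resources it uses,
    each evaluated at that resource's congestion (number of users)."""
    load = Counter(r for strategy in profile for r in strategy)
    return sum(cost[r](load[r]) for r in profile[i])

def is_pure_nash(profile, strategies, cost):
    """True if no player can strictly lower its cost by deviating alone."""
    for i, options in enumerate(strategies):
        current = player_cost(profile, i, cost)
        for alt in options:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if player_cost(deviated, i, cost) < current:
                return False
    return True

# Hypothetical instance: two players, two resources with affine cost x -> 1*x + 0.
cost = {"r1": lambda x: x, "r2": lambda x: x}
strategies = [[("r1",), ("r2",)], [("r1",), ("r2",)]]
print(is_pure_nash((("r1",), ("r2",)), strategies, cost))
print(is_pure_nash((("r1",), ("r1",)), strategies, cost))
```

In this toy instance, the profile in which the players split over the two resources is stable, while the one in which both use `r1` is not, since either player would halve its cost by moving.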

The remaining two papers addressed typical problems in algorithmic graph theory. In [5], the authors studied the maximum-clique independence problem and some variations of the clique transversal problem such as the {*k*}-clique, maximum-clique, minus clique, signed clique, and *k*-fold clique transversal problems from algorithmic aspects for *k*-trees, suns, planar graphs, doubly chordal graphs, clique perfect graphs, total graphs, split graphs, line graphs, and dually chordal graphs. They gave equations to compute the {*k*}-clique, minus clique, signed clique, and *k*-fold clique transversal numbers for suns and showed that the {*k*}-clique transversal problem is polynomial-time solvable for graphs in which the clique transversal numbers equal their clique independence numbers. They also showed the relationship between the signed and generalization clique problems and presented NP-completeness results for the considered problems on *k*-trees with unbounded *k*, planar graphs, doubly chordal graphs, total graphs, split graphs, line graphs, and dually chordal graphs.

Finally, in [6], the class of *k*-distance-hereditary graphs was studied. These graphs have the property that the distance in each connected induced subgraph is at most *k* times the distance in the whole graph. They generalize the well-known distance-hereditary graphs, which correspond to the 1-distance-hereditary graphs. This paper provides characterizations for the class of all *k*-distance-hereditary graphs such that *k* < 2. The new characterizations are given in terms of both forbidden subgraphs and cycle-chord properties. Such results also lead to a polynomial-time recognition algorithm for this type of graph that, according to the provided characterizations, simply detects the presence of quasi-holes in any given graph.
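The defining property can be illustrated by brute force on small graphs. The sketch below is not the recognition algorithm of [6]; it simply computes, for a connected graph given as an adjacency dictionary, the largest ratio between a distance in a connected induced subgraph and the corresponding distance in the whole graph. The function names are assumptions, and the running time is exponential in the number of vertices.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def bfs_distances(adj, source, allowed):
    """Distances from source in the subgraph induced by the set `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in allowed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def stretch_number(adj):
    """Max over connected induced subgraphs H and pairs u, v of d_H(u,v) / d_G(u,v).
    Assumes the input graph is connected."""
    vertices = list(adj)
    full = {v: bfs_distances(adj, v, set(vertices)) for v in vertices}
    best = Fraction(1)
    for size in range(2, len(vertices) + 1):
        for subset in combinations(vertices, size):
            allowed = set(subset)
            if len(bfs_distances(adj, subset[0], allowed)) < size:
                continue  # the induced subgraph is disconnected
            for u in subset:
                d_sub = bfs_distances(adj, u, allowed)
                for v in subset:
                    if u != v:
                        best = max(best, Fraction(d_sub[v], full[u][v]))
    return best

def cycle(n):
    """Adjacency dictionary of the cycle on n vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

print(stretch_number(cycle(4)), stretch_number(cycle(5)))
```

Under the definition above, a graph is *k*-distance-hereditary exactly when this ratio is at most *k*; for instance, the 4-cycle yields 1, whereas the 5-cycle yields 3/2.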

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** The guest editors thank all of the authors who submitted their work to this Special Issue, the reviewers for their constructive comments, and the editorial staff for their assistance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Graph Planarity by Replacing Cliques with Paths †**

#### **Patrizio Angelini <sup>1</sup> , Peter Eades <sup>2</sup> , Seok-Hee Hong <sup>2</sup> , Karsten Klein <sup>3</sup> , Stephen Kobourov <sup>4</sup> , Giuseppe Liotta <sup>5</sup> , Alfredo Navarra <sup>6,\*</sup> and Alessandra Tappini <sup>5</sup>**


Received: 3 July 2020; Accepted: 11 August 2020; Published: 13 August 2020

**Abstract:** This paper introduces and studies the following beyond-planarity problem, which we call *h*-CLIQUE2PATH PLANARITY. Let *G* be a simple topological graph whose vertices are partitioned into subsets of size at most *h*, each inducing a clique. *h*-CLIQUE2PATH PLANARITY asks whether it is possible to obtain a planar subgraph of *G* by removing edges from each clique so that the subgraph induced by each subset is a path. We investigate the complexity of this problem in relation to *k*-planarity. In particular, we prove that *h*-CLIQUE2PATH PLANARITY is NP-complete even when *h* = 4 and *G* is a simple 3-plane graph, while it can be solved in linear time when *G* is a simple 1-plane graph, for any value of *h*. Our results contribute to the growing fields of hybrid planarity and of graph drawing beyond planarity.

**Keywords:** planar graphs; *k*-planarity; NP-hardness; polynomial time reduction; cliques; paths

#### **1. Introduction**

A typical problem concerning the visual analysis of real-world networks refers to the creation of occlusions and hairball-like structures in dense subnetworks when node-link diagrams are generated by standard layout algorithms, e.g., force-directed methods. On the other hand, different representations, such as adjacency matrices, are well suited for dense graphs but make neighbor identification and path-tracing more difficult [1,2]. *Hybrid graph representations* combine different visualization metaphors in order to exploit their strengths and overcome their drawbacks.

The *NodeTrix* model [3] represents a first example of hybrid representation. It combines node-link diagrams with adjacency-matrix representations of the denser subgraphs [3–6]. Inspired by NodeTrix, other hybrid representation models were recently introduced [7–9]. The *ChordLink* model [7] embeds chord diagrams, used for the visualization of dense subgraphs (*clusters*), into a node-link diagram. In a (*k*, *p*) *representation* [8], each cluster contains at most *k* vertices and each vertex can occur at most *p* times along the boundary of the cluster. In the *intersection-link representations* [9] model, vertices are geometric objects and edges are either intersections between objects (*intersection-edges*) or crossing-free Jordan arcs attaching at their boundary (*link-edges*). Different types of objects determine different intersection-link representations.

*Clique-planar* drawings are defined in [9] as intersection-link representations in which the objects are isothetic rectangles, and the partition into intersection- and link-edges is given as a part of the input, so that the graph induced by the intersection-edges is composed of a set of vertex-disjoint cliques. The corresponding recognition problem, called CLIQUE-PLANARITY, has been proved NP-complete in general and polynomial-time solvable in restricted cases, for example when the rectangle representing each vertex is given as a part of the input, or when the cliques are arranged on levels according to a hierarchy. In [9], it is also proven that, if a graph is clique-planar, then it admits an intersection-link representation in which all vertices in the same cluster are isothetic unit squares whose upper-left corners are aligned along a line of slope one (see Figure 1a,b).

**Figure 1.** (**a**) A non-planar graph *G*. Cliques are highlighted with bold edges. (**b**) A clique-planar drawing of *G*. (**c**) Replacing each clique by a path spanning its vertices. Note that, different from (**a**), in (**c**), the first vertex and the last vertex of each path have only one place to connect to edges, while the interior vertices have two places: this is what makes the problem non-trivial.

Therefore, we can reformulate the CLIQUE-PLANARITY problem in the terminology of *beyond-planarity* [10,11] as follows. Given a graph $G = (V, E)$ and a partition of its vertex set $V$ into subsets $V_1, \ldots, V_m$ such that the subgraph of $G$ induced by each subset $V_i$ is a clique, the goal is to compute a planar subgraph $G' = (V, E')$ of $G$ by replacing the clique induced by $V_i$, for each $i = 1, \ldots, m$, with a path spanning the vertices of $V_i$ (see Figure 1c).

In this paper, we introduce and study a problem called *h*-CLIQUE2PATH PLANARITY (for short, *h*-C2PP), that is a restricted version of CLIQUE-PLANARITY in which the input graph comes with a given embedding and each clique has size at most *h*. Preliminary results have been presented in [12].
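The problem statement can be made concrete with an exhaustive search on tiny instances. The sketch below is only an illustration and takes exponential time. It assumes that the given embedding is summarized by a list of crossing edge pairs, that edges not belonging to any clique are always kept, and that the names `solve_c2pp`, `spanning_paths`, and `edge` are hypothetical helpers.

```python
from itertools import permutations, product

def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def spanning_paths(clique):
    """Yield the edge set of every path spanning the vertices of the clique."""
    seen = set()
    for order in permutations(clique):
        edges = frozenset(edge(a, b) for a, b in zip(order, order[1:]))
        if edges not in seen:
            seen.add(edges)
            yield edges

def solve_c2pp(cliques, crossings, other_edges=()):
    """Pick one spanning path per clique so that no two kept edges cross.
    `crossings` lists pairs of edges that cross in the given embedding.
    Returns a tuple of edge sets (one per clique) or None."""
    fixed = {edge(u, v) for u, v in other_edges}
    for choice in product(*(list(spanning_paths(c)) for c in cliques)):
        kept = fixed.union(*choice)
        if all(not (edge(*e) in kept and edge(*f) in kept) for e, f in crossings):
            return choice
    return None

# Hypothetical instance: edge (d, e) crosses two edges of triangle a, b, c.
cliques = [("a", "b", "c"), ("d", "e", "f")]
crossings = [(("a", "b"), ("d", "e")), (("a", "c"), ("d", "e"))]
solution = solve_c2pp(cliques, crossings)
print(solution is not None and edge("d", "e") not in solution[1])
```

In this toy instance, the only way to resolve both crossings is to drop edge (d, e), because a path spanning a triangle must keep two of its three edges.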

#### *1.1. Our Results*

A graph *G* is *planar* if it admits an embedding in the plane where no two edges cross; this embedding is a *planar embedding* of *G*. A planar graph with an associated planar embedding is said to be an *embedded planar* graph, or a *plane* graph.

In the version of *h*-CLIQUE2PATH PLANARITY that we study, the input graph *G* is a *simple topological graph*. A *topological* graph is embedded in the plane so that each edge is a Jordan arc connecting its end-vertices. A topological graph is *simple* if a Jordan arc does not pass through any vertex, and does not intersect any arc more than once (either with a proper crossing or sharing a common end-vertex); finally, no three arcs mutually cross at the same point.

Our main goal is to investigate the complexity of *h*-C2PP in relation to the well-studied class of *k-planar graphs*, i.e., those that admit a drawing in which each edge has at most *k* crossings [9,10,13,14]. With a slight abuse of notation, we use the term *embedding* also for non-planar graphs, where we interpret each crossing as a dummy vertex. In particular, a *k-planar* graph together with a *k*-planar embedding is a *k-plane graph*.

A *geometric graph* is drawn in the plane so that each edge is a straight line segment. The version of *h*-C2PP in which the input graph *G* is a geometric graph has been recently studied by Kindermann et al. [15], who called it the *partition spanning forest problem*. They proved that 4-C2PP for geometric graphs is NP-complete, which immediately implies the NP-completeness of 4-C2PP for simple topological graphs.

We strengthen this result by proving that 4-C2PP is NP-complete even for simple topological 3-plane graphs. On the positive side, we prove that the *h*-C2PP problem for simple topological 1-plane graphs can be solved in linear time for any value of *h*. We finally remark that the 2-SAT formulation used in [15] to solve 3-C2PP for geometric graphs can be easily extended to solve 3-C2PP for any simple topological graph.
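One natural way to phrase 3-C2PP as a 2-SAT instance is sketched below; it is not claimed to coincide with the formulation of [15]. A Boolean variable per edge states that the edge is removed; each crossing requires that at least one of its two edges is removed, and each triangle may lose at most one edge (a triangle that loses none can drop an arbitrary edge afterwards without creating crossings). The sketch assumes that every edge belongs to a clique of size at most 3, and all function names are hypothetical.

```python
from itertools import combinations

def solve_2sat(num_vars, clauses):
    """Clauses are pairs of literals; +i / -i means variable i is true / false.
    Returns a list of booleans (index 0 unused) or None if unsatisfiable."""
    n = 2 * num_vars
    node = lambda lit: 2 * (abs(lit) - 1) + (lit < 0)
    graph, rev = [[] for _ in range(n)], [[] for _ in range(n)]
    for a, b in clauses:
        for x, y in ((-a, b), (-b, a)):  # (a or b) == (not a -> b) and (not b -> a)
            graph[node(x)].append(node(y))
            rev[node(y)].append(node(x))
    order, seen = [], [False] * n
    for s in range(n):  # Kosaraju, first pass: finishing order (iterative DFS)
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(graph[w])))
                    break
            else:
                order.append(v)
                stack.pop()
    comp, c = [-1] * n, 0
    for s in reversed(order):  # second pass: components in topological order
        if comp[s] != -1:
            continue
        comp[s], stack = c, [s]
        while stack:
            v = stack.pop()
            for w in rev[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1
    model = [None]
    for i in range(1, num_vars + 1):
        if comp[node(i)] == comp[node(-i)]:
            return None
        model.append(comp[node(i)] > comp[node(-i)])
    return model

def solve_3c2pp(cliques, crossings):
    """Return a set of removed edges solving the instance, or None."""
    var, clauses = {}, []  # var[e] is true when edge e is removed
    for clique in cliques:
        if len(clique) > 3:
            raise ValueError("this sketch only handles cliques of size at most 3")
        es = [frozenset(p) for p in combinations(clique, 2)]
        for e in es:
            var[e] = len(var) + 1
        if len(es) == 1:
            clauses.append((-var[es[0]], -var[es[0]]))  # a 2-clique keeps its edge
        for e, f in combinations(es, 2):
            clauses.append((-var[e], -var[f]))  # at most one removal per triangle
    for e, f in crossings:
        clauses.append((var[frozenset(e)], var[frozenset(f)]))  # resolve the crossing
    model = solve_2sat(len(var), clauses)
    if model is None:
        return None
    removed = {e for e, v in var.items() if model[v]}
    for clique in cliques:  # untouched triangles drop an arbitrary edge
        es = [frozenset(p) for p in combinations(clique, 2)]
        if len(es) == 3 and not removed.intersection(es):
            removed.add(es[0])
    return removed
```

The satisfiability test uses the standard implication-graph argument with strongly connected components, so the whole procedure is linear in the number of cliques and crossings.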

#### *1.2. Outline*

In Section 2, we further investigate the relationship between *h*-C2PP and the partition spanning forest problem, that is the problem studied by Kindermann et al. [15]. In Section 3, we prove the NP-completeness of 4-C2PP for simple topological 3-plane graphs. In Section 4, we show that the *h*-C2PP problem for simple topological 1-plane graphs is linear-time solvable for any value of *h*. Finally, in Section 5, we provide challenging open problems.

#### **2. Relationship between** *h***-CLIQUE2PATH PLANARITY and the Partition Spanning Forest Problem**

The input of the problem studied by Kindermann et al. [15] is a set of colored points in the plane, and the goal is to decide whether there exist straight-line spanning trees, one for each same-colored point subset, that do not cross each other. Since edges are straight-line, their drawings are determined by the positions of the points, and hence each same-colored point subset can, in fact, be seen as a straight-line drawing of a clique, from which edges have to be removed so that each clique becomes a tree and the drawing becomes planar.
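Because the drawing is fixed by the point positions, the crossing pairs among the clique edges can be computed directly. The following sketch does so with the standard orientation test; it assumes points in general position (no three collinear) with integer or otherwise exact coordinates, and the names `orient`, `properly_cross`, and `all_crossings` are illustrative only.

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (d > 0) - (d < 0)

def properly_cross(s, t):
    """True if the open segments s and t intersect in a single interior point."""
    (a, b), (c, d) = s, t
    return (orient(a, b, c) * orient(a, b, d) < 0 and
            orient(c, d, a) * orient(c, d, b) < 0)

def all_crossings(points):
    """`points` maps each point (x, y) to its color. Returns all pairs of crossing
    straight-line edges, where an edge joins two points of the same color."""
    by_color = {}
    for p, color in points.items():
        by_color.setdefault(color, []).append(p)
    segments = [seg for pts in by_color.values()
                for seg in combinations(sorted(pts), 2)]
    return [(s, t) for s, t in combinations(segments, 2) if properly_cross(s, t)]

# Hypothetical instance: a red and a blue segment forming an X.
points = {(0, 0): "red", (2, 2): "red", (0, 2): "blue", (2, 0): "blue"}
print(len(all_crossings(points)))
```

Segments that merely share an end-point are not reported, since at least one of the orientation values is then zero.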

The authors proved NP-completeness for the case in which the spanning tree is a path, even when there are at most four vertices with the same color. This result implies that 4-C2PP for geometric graphs is NP-complete. On the other hand, they provided a linear-time algorithm when there exist at most three vertices with the same color, which then extends to 3-C2PP for geometric graphs.

Although not explicitly mentioned in [15], the drawings produced by the reduction used to prove the NP-completeness of 4-C2PP for geometric graphs are 4-planar. We now provide some details about this reduction.

The authors of [15] performed a polynomial-time reduction from PLANAR 3-SATISFIABILITY. The variable gadget (shown in the yellow region of Figure 2) consists of a triangle $X$ whose edges are $x$, $x_l$, and $x_r$. Edge $x$ is crossing-free and the truth value of $X$ is encoded according to which edge among $x_l$ and $x_r$ is crossing-free. Let $T_1$ and $T_2$ be two triangles whose vertices are $u$, $y$, $z$ and $v$, $y$, $z$, respectively. They define two faces $f_1$ and $f_2$, respectively. Concatenate a triangle $T_3$ defined as in the variable gadget with $f_1$ by inserting its crossing-free edge $(y, z)$ inside $f_1$ and by crossing the other two edges of $T_3$ with $(u, y)$ and $(u, z)$, respectively. Now, concatenate another triangle $T_4$ defined as in the variable gadget with $f_2$. If the crossing-free edge of $T_4$ is inside $f_2$, the gadget composed by $T_1$, $T_2$, $T_3$ and $T_4$ is the wire gadget; if the crossing-free edge of $T_4$ is outside $f_2$, the gadget composed by $T_1$, $T_2$, $T_3$ and $T_4$ is the inverter gadget. The splitting gadget consists of three variable gadgets $X$, $Y$ and $Z$, and two 4-cliques, concatenated as illustrated inside the blue region in Figure 2, where the yellow region contains a variable gadget, the orange region contains a wire gadget and the violet region contains an inverter gadget. As shown in Figure 2, multiple splittings of a variable $X$ lead to an instance where a triangle has two edges with four crossings.

**Figure 2.** A drawing produced by the reduction in [15]. The yellow region contains a variable gadget, the blue region contains a splitting gadget, the orange region contains a wire gadget, and the violet region contains an inverter gadget.

The NP-completeness of 4-C2PP for geometric graphs implies the NP-completeness of 4-C2PP for simple topological 4-plane graphs. In what follows, we further explore the complexity of 4-C2PP in relation to *k*-planarity by considering values of *k* < 4. In particular, we prove that the problem remains NP-complete for *k* = 3, while it becomes linear-time solvable for *k* = 1.

#### **3.** *NP***-Completeness for Simple Topological 3-Plane Graphs**

In this section, we prove that the 4-C2PP problem remains NP-complete even when the input is a simple topological 3-plane graph.

Since the planarity of a simple topological graph can be checked in linear time, the *h*-C2PP problem for simple topological *k*-plane graphs belongs to NP for all values of *h* and *k*.

In the following, we prove the NP-hardness by means of a reduction from the PLANAR POSITIVE 1-IN-3-SAT problem. In this version of the SATISFIABILITY problem, which is known to be NP-complete [16], each variable appears only with its positive literal, each clause has at most three variables, the graph obtained by connecting each variable with all the clauses it belongs to is planar, and the goal is to find a truth assignment in such a way that, for each clause, exactly one of its three variables is set to True. Our reduction is technically different from the one presented in [15], which reduces from PLANAR 3-SATISFIABILITY.
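The satisfaction condition of POSITIVE 1-IN-3-SAT can be stated in a few lines. The sketch below checks an assignment and, for tiny formulas only, searches all assignments exhaustively; it does not verify planarity of the variable-clause incidence graph, and the function names are assumptions.

```python
from itertools import product

def satisfies_1in3(clauses, assignment):
    """True if every clause has exactly one variable set to True."""
    return all(sum(assignment[v] for v in clause) == 1 for clause in clauses)

def solve_positive_1in3(clauses):
    """Exhaustive search (exponential time); returns an assignment or None."""
    variables = sorted({v for clause in clauses for v in clause})
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies_1in3(clauses, assignment):
            return assignment
    return None

# Hypothetical formula with two clauses sharing the variables x and y.
formula = [("x", "y", "z"), ("x", "y", "w")]
print(solve_positive_1in3(formula))
```

Since all literals are positive, a clause is violated both when none of its variables is True and when two or more are, which is exactly the dichotomy exploited by the clause gadget in the reduction.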

For each 3-clique we use in the reduction, there is a *base edge*, which is crossing-free in the constructed topological graph, while the other two edges always have crossings. We call *left* (*right*) the edge that follows (precedes) the base edge in the clockwise order of the edges along the 3-clique. In addition, if an edge *e* of a clique does not belong to the path replacing the clique, we say that *e* is *removed*, and that all the crossings involving *e* in *G* are *resolved*.

For each variable $x$, let $n_x$ be the number of clauses containing $x$. We construct a simple topological graph gadget $G_x$ for $x$, called *variable gadget* (see the left dotted box in Figure 3a). This gadget contains $2n_x$ 3-cliques $t^x_1, \ldots, t^x_{2n_x}$, forming a ring, so that the left (right) edge of $t^x_i$ only crosses the left (right) edge of $t^x_{i-1}$ and of $t^x_{i+1}$, for each $i = 1, \ldots, 2n_x$. In addition, gadget $G_x$ contains $n_x$ additional 3-cliques, called $\tau^x_1, \ldots, \tau^x_{n_x}$, so that the right edge of $\tau^x_j$ crosses the left edge of $t^x_{2j-1}$ and the right edge of $t^x_{2j}$, while the left edge of $\tau^x_j$ crosses the left edge of $t^x_{2j}$ and the right edge of $t^x_{2j-1}$.

**Figure 3.** (**a**) The variable gadget $G_x$ for a variable $x$ is represented in the left dotted box. The clause gadget for a clause $c$ is represented in the right dotted box. The chain connecting $G_x$ to $G_c$ is represented with lighter colors. The removed edges are dashed red. (**b**) All variables are False. (**c**) At least two variables are True.

Then, for each clause $c$, we construct a simple topological graph gadget $G_c$, called *clause gadget*, which is composed of a planar drawing of a 4-clique, together with three 3-cliques whose left and right edges cross the edges of the 4-clique as in the right dotted box in Figure 3a. In particular, observe that the right (left) edge of each 3-clique crosses exactly one (two) edges of the 4-clique.

Every 3-clique in $G_c$ corresponds to one of the three variables of $c$. Let $x$ be one of such variables; assuming that $c$ is the $j$th clause that contains $x$ according to the order of the clauses in the given formula, we connect the 3-clique corresponding to $x$ in the clause gadget $G_c$ to the 3-clique $\tau^x_j$ of the variable gadget $G_x$ of $x$ by a chain of 3-cliques of odd length, as in Figure 3a.

By construction, the resulting simple topological graph $G$ contains cliques of size at most 4, namely one per clause, and hence is a valid instance of 4-C2PP. In addition, by collapsing each variable and clause gadget into a vertex, and each chain connecting them into an edge, the resulting graph $G'$ preserves the planarity of the PLANAR POSITIVE 1-IN-3-SAT instance. This implies that the only crossings for each edge of $G$ are with other edges in the gadget it belongs to and, possibly, with the edges of the 3-cliques of a chain. Hence, $G$ is 3-planar. Namely, each base edge is crossing-free; each internal edge of a 4-clique has one crossing; each external edge of a 4-clique has two crossings, and the same is true for the left and right edges of each 3-clique in a chain; finally, the left and right edges of each 3-clique in either a variable or a clause gadget have three crossings.

In the following, we prove the equivalence between the original instance of PLANAR POSITIVE 1-IN-3-SAT and the constructed instance *G* of 4-C2PP. For this, we first give a lemma stating that variable gadgets correctly represent the behavior of a variable; indeed, they can assume one out of two possible states in any solution for 4-C2PP.

**Lemma 1.** *Let $G_x$ be the variable gadget for a variable $x$ in $G$. Then, in any solution for* 4-C2PP*, either the left edge of each* 3*-clique $\tau^x_j$, with $j = 1, \ldots, n_x$, is removed, or the right edge of each* 3*-clique $\tau^x_j$ is removed.*

**Proof.** We first consider the possible removals of edges in $t^x_1, \ldots, t^x_{2n_x}$ and claim that, in any solution for 4-C2PP, one of the two following conditions is satisfied: (i) for each 3-clique $t^x_i$, if $i$ is odd, then the left edge is removed, while if $i$ is even the right edge is removed; and (ii) for each 3-clique $t^x_i$, if $i$ is odd, then the right edge is removed, while if $i$ is even the left edge is removed. Note that this claim is sufficient to prove the statement; in fact, if Condition (i) holds (as in Figure 3a), then the right edge of each 3-clique $\tau^x_j$ must be removed, in order to resolve its crossings with the right edge of $t^x_{2j-1}$ and with the left edge of $t^x_{2j}$, while if Condition (ii) holds, then the left edge of each 3-clique $\tau^x_j$ must be removed, in order to resolve its crossings with the left edge of $t^x_{2j-1}$ and with the right edge of $t^x_{2j}$.

To prove the claim, we consider the possible removals of edges of $t^x_1$. Suppose first that the base edge of $t^x_1$ is removed. Thus, the crossings between the left (right) edge of $t^x_1$ and the left (right) edge of $t^x_2$ are not resolved; this implies that they have to be resolved by removing both the left and the right edge of $t^x_2$, which is not possible. If the right edge of $t^x_1$ is removed, then the crossing between the right edges of $t^x_1$ and $t^x_2$ is resolved, while the one between their left edges is not. Hence, the left edge of $t^x_2$ must be removed. By iterating this argument we conclude that the right (left) edge of each $t^x_i$ with $i$ odd (even) is removed. Symmetrically, we can prove that, if the left edge of $t^x_1$ is removed, then the left (right) edge of each $t^x_i$ with $i$ odd (even) is removed. This concludes the proof of the lemma.
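The claim about the ring can be checked exhaustively for small sizes. The sketch below abstracts a ring of $m$ 3-cliques in which the left (right) edge of each 3-clique crosses the left (right) edge of its two ring neighbors, and lists all ways of removing exactly one edge per 3-clique that resolve every crossing. The encoding with the letters `B`, `L`, `R` for base, left, and right edges, and the name `ring_solutions`, are assumptions made for illustration.

```python
from itertools import product

def ring_solutions(m):
    """All ways to remove one edge ('B', 'L' or 'R') from each of m ring triangles
    so that, for consecutive triangles i and i+1 (cyclically), the L-L crossing
    and the R-R crossing each have at least one removed edge."""
    solutions = []
    for choice in product("BLR", repeat=m):
        resolved = all(side in (choice[i], choice[(i + 1) % m])
                       for i in range(m) for side in "LR")
        if resolved:
            solutions.append("".join(choice))
    return solutions

print(ring_solutions(4))  # ['LRLR', 'RLRL']
```

For a ring of four 3-cliques, only the two alternating patterns survive, matching Conditions (i) and (ii); no solution removes a base edge, and a ring of odd length admits no solution at all, which is consistent with the gadget using $2n_x$ triangles.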

Given Lemma 1, we can associate the truth value of a variable $x$ with the fact that either the left or the right edge of each 3-clique $\tau^x_j$ in the variable gadget $G_x$ of $G$ is removed. We use this association to prove the following theorem.

#### **Theorem 1.** *The* 4-C2PP *problem is NP-complete, even for* 3*-plane graphs.*

**Proof.** Given an instance of PLANAR POSITIVE 1-IN-3-SAT, we construct an instance *G* of 4-C2PP in linear time as described above. We prove their equivalence.

Suppose first that there exists a solution for 4-C2PP, i.e., a set of edges of $G$ whose removal resolves all crossings. By Lemma 1, for each variable $x$ either the left or the right edge of each 3-clique $\tau^x_j$ in the variable gadget $G_x$ is removed. If the right edge is removed, we assign value True to variable $x$, otherwise we assign False.

To prove that this assignment results in a solution for the given formula of PLANAR POSITIVE 1-IN-3-SAT, we first show that, for each clause $c$ that contains variable $x$, the right (left) edge of the 3-clique $t_c(x)$ of the clause gadget $G_c$ corresponding to $x$ is removed if and only if the right (left) edge of each 3-clique $\tau^x_j$ is removed. Namely, consider the chain that connects $t_c(x)$ with a 3-clique $\tau^x_j$ of $G_x$. Note that, for any two consecutive 3-cliques along the chain, the left edge of one 3-clique and the right edge of the other 3-clique must be removed. Since the chain has odd length, the right (left) edge of $t_c(x)$ is removed if and only if the right (left) edge of $\tau^x_j$ is removed, that is, the truth value of $G_x$ is transferred to the 3-clique $t_c(x)$ of $G_c$.

Finally, consider any clause *c*, composed of variables *x*, *y*, and *z*. Let $t_c(x)$, $t_c(y)$, and $t_c(z)$ be the three 3-cliques of the clause gadget $G_c$ of *c* corresponding to *x*, *y*, and *z*, respectively; also, let *v* be the central vertex of the 4-clique of $G_c$, and let $v_x$, $v_y$, and $v_z$ be the vertices of this 4-clique lying inside $t_c(x)$, $t_c(y)$, and $t_c(z)$, respectively; see Figure 3. We assume without loss of generality that $v_x$, $v_y$, and $v_z$ appear in this clockwise order around *v*. As discussed above, the left or the right edge of $t_c(x)$ (of $t_c(y)$; of $t_c(z)$) is removed depending on whether the left or the right edge of each $\tau^x_j$ (of each $\tau^y_j$; of each $\tau^z_j$) is removed. We show that, for exactly one of $t_c(x)$, $t_c(y)$, and $t_c(z)$, the right edge is removed, which then implies that exactly one of *x*, *y*, and *z* is True, and hence the instance of PLANAR POSITIVE 1-IN-3-SAT is positive.

Suppose first that for each of $t_c(x)$, $t_c(y)$, and $t_c(z)$ the left edge is removed (and hence all the three variables are set to False), as in Figure 3b. This implies that the crossings between the right edges of the three 3-cliques and the three edges of triangle $(v_x, v_y, v_z)$ are not resolved. Hence, all the edges of this triangle should be removed, which is not possible since the remaining edges of the 4-clique do not form a path.

Suppose now that for at least two of $t_c(x)$, $t_c(y)$, and $t_c(z)$, say $t_c(x)$ and $t_c(y)$, the right edge is removed (and hence *x* and *y* are set to True), as in Figure 3c. Since each edge of triangle $(v_x, v_y, v)$ is crossed by the left edge of one of $t_c(x)$ and $t_c(y)$, by construction, these crossings are not resolved. Hence, all the edges of $(v_x, v_y, v)$ should be removed, which is not possible since the remaining edges of the 4-clique do not form a path of length 4.

Suppose finally that for exactly one of $t_c(x)$, $t_c(y)$, and $t_c(z)$, say $t_c(x)$, the right edge is removed (and hence *x* is the only one to be set to True), as in Figure 3a. Then, by removing edges $(v, v_x)$, $(v_x, v_y)$, and $(v_y, v_z)$, all the crossings are resolved and the remaining edges of the 4-clique form a path of length 4, as desired.

The proof of the other direction is analogous. Namely, suppose that there exists a truth assignment that assigns a True value to exactly one variable in each clause. Then, for each variable *x* that is set to True (to False), we remove the right (left) edge of each 3-clique $t^x_i$, with $i = 2j - 1$ and $j = 1, \ldots, n_x$, we remove the left (right) edge of each 3-clique $t^x_i$, with $i = 2j$ and $j = 1, \ldots, n_x$, and we remove the right (left) edge of each 3-clique $\tau^x_j$, with $j = 1, \ldots, n_x$. Then, we remove the left or right edge of each 3-clique in a chain so that, for any two consecutive 3-cliques, the left edge is removed from one of them and the right edge from the other. This ensures that, for each clause *c*, the right edge of exactly one of the three 3-cliques that belong to the clause gadget $G_c$ has been removed, say the one corresponding to variable *x*, while for the other two 3-cliques the left edge has been removed. Hence, we can resolve all crossings by removing edges $(v, v_x)$, $(v_x, v_y)$, and $(v_y, v_z)$, as discussed above (see Figure 3a). The statement follows.
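The three cases for the 4-clique of a clause gadget can be checked mechanically. The following is a minimal sketch, not part of the original proof: it assumes the vertex names `v`, `vx`, `vy`, `vz` from the text, and the helper `remaining_is_path` is a hypothetical name introduced here. It tests whether the three edges left after a removal form a path through all four vertices.

```python
from itertools import combinations

# Vertex names follow the proof: v is the central vertex of the 4-clique.
VERTICES = ["v", "vx", "vy", "vz"]
K4_EDGES = {frozenset(e) for e in combinations(VERTICES, 2)}


def remaining_is_path(removed):
    """Return True if K4 minus `removed` is a path through all four vertices."""
    remaining = K4_EDGES - {frozenset(e) for e in removed}
    if len(remaining) != len(VERTICES) - 1:
        return False
    neighbors = {u: set() for u in VERTICES}
    for edge in remaining:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    if any(len(adj) > 2 for adj in neighbors.values()):
        return False
    # With 3 edges and maximum degree 2, connectivity implies a path.
    seen, stack = {"v"}, ["v"]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(VERTICES)


# All variables False: the triangle (vx, vy, vz) would have to be removed.
all_false = remaining_is_path([("vx", "vy"), ("vy", "vz"), ("vx", "vz")])
# x and y True: the triangle (vx, vy, v) would have to be removed.
two_true = remaining_is_path([("vx", "vy"), ("vy", "v"), ("vx", "v")])
# Only x True: remove (v, vx), (vx, vy), (vy, vz).
one_true = remaining_is_path([("v", "vx"), ("vx", "vy"), ("vy", "vz")])
```

In the first two cases the remaining edges form a star, so only the third removal leaves a path, in line with the case analysis above.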

#### **4.** *h***-CLIQUE2PATH PLANARITY and 1-Planarity**

In this section, we show that, when the given simple topological graph is 1-plane, *h*-C2PP can be solved in linear time in the size of the input, for any *h*. We consider all possible simple topological 1-plane cliques and show that the problem can be solved using only local tests, each requiring constant time. Note that we can restrict to the case *h* ≤ 6, since $K_6$ is the largest 1-planar complete graph [11].

Simple topological 1-plane graphs containing cliques with at most four vertices that cross each other can be constructed, but it is easy to enumerate all these graphs (up to symmetry) (see Figure 4). Note that such graphs involve at most two cliques and that, if $K_4$ has a crossing, combining it with any other clique would violate 1-planarity (see Figure 4a,b). The next lemma accounts for cliques with five or six vertices.

**Lemma 2.** *There exists no* 1*-plane simple topological graph that contains two cliques, one of which with at least five vertices, whose edges cross each other.*

**Proof.** Consider a simple 1-plane graph *G* that contains two disjoint cliques *K* and *H*, with five and three vertices, respectively. Let $K'$ be the simple plane topological graph obtained from *K* by replacing each crossing with a dummy vertex. By 1-planarity, every face of $K'$ is a triangle and contains at most one dummy vertex. Suppose, for a contradiction, that there exists a crossing between an edge of *K* and an edge of *H* in *G*. Then, there would exist at least one vertex *v* of *H* inside a face *f* of $K'$ and at least one outside *f*. Since *H* is a triangle, there must be two edges that connect vertices inside *f* to vertices outside *f*. If *f* contains one dummy vertex, then two of its edges are not crossed by edges of *H*, as otherwise *G* would not be 1-planar. Hence, both the edges that connect vertices inside *f* to vertices outside *f* cross the other edge of *f*, a contradiction. If *f* contains no dummy vertices, then each edge of *f* admits one crossing. Let *u* be the vertex of *f* that is incident to the two edges crossed by edges of *H*. Since *u* has degree 4 in *K*, it is not possible to draw the third edge of *H* so that it crosses only one edge of *K*, which completes the proof.

**Figure 4.** All possible 1-plane graphs involving one or more cliques of type $K_3$ and $K_4$ admitting crossing edges. (**a**) and (**b**): two representations of a clique of type $K_4$; (**c**) and (**d**): two representations of two intersecting cliques of type $K_3$; (**e**) and (**f**): two representations of a clique of type $K_3$ intersecting a clique of type $K_4$; (**g**): two intersecting cliques of type $K_4$.

Combining the previous discussion with Lemma 2, we conclude that, for each subgraph of the input graph *G* that consists either of a combination of at most two cliques of size at most 4, as in Figure 4, or of a single clique not crossing any other clique, the crossings involving this subgraph (possibly with other edges not belonging to cliques) can only be resolved by removing its edges, which can be checked in constant time. In the next theorem, *n* denotes the number of vertices.

**Theorem 2.** *h*-C2PP *is O*(*n*)*-time solvable for simple topological* 1*-plane graphs.*

#### **5. Conclusions and Open Problems**

We introduced and studied the *h*-CLIQUE2PATH PLANARITY problem for simple topological *k*-plane graphs; we proved that this problem is NP-complete for *h* = 4 and *k* = 3, while it is solvable in linear time for every value of *h* when *k* = 1. The natural open question is: What is the complexity for simple topological 2-plane graphs?

Kindermann et al. [15] recently proved that problem 4-C2PP is NP-complete for geometric 4-plane graphs. It would be interesting to study this geometric version of the problem for 2-plane and 3-plane graphs.

Recall that the version of the *h*-C2PP problem when the input is an *n*-vertex abstract graph and *h* ∈ *O*(*n*) is NP-complete, since it is equivalent to CLIQUE PLANARITY [9]. What if the input is an abstract graph and *h* is bounded by a constant or sublinear function? We remark that for *h* = 3 this version of the problem is equivalent to CLUSTERED PLANARITY, when restricted to instances in which the graph induced by each cluster consists of three isolated vertices.

Finally, another intriguing research direction is to study the *h*-CLIQUE2PATH PLANARITY problem in the scenario in which the input graph comes without a clustering of its vertex set, but dense portions of the graph are found by an algorithm. While the problem of finding cliques in a graph is NP-complete [17], one could identify dense subgraphs, for example *k*-cores, in polynomial time [18].

**Author Contributions:** Conceptualization, All authors; Methodology, All authors; Software, All authors; Validation, All authors; Formal analysis, All authors; Investigation, All authors; Resources, All authors; Data curation, All authors; Writing—original draft preparation, All authors; Writing—review and editing, All authors; Visualization, All authors; Supervision, All authors; Project administration, All authors; Funding acquisition, All authors; All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was partially supported by: (i) MIUR-DAAD Joint Mobility Program n.57397196 (P.A.); (ii) ARC (Australian Research Council) DP project (S.H.); (iii) Young Scholar Fund/AFF - Univ. Konstanz (K.K.); (iv) NSF grants CCF-1740858 - CCF-1712119 (S.K.); (v) MIUR grant 20174LF3T8 "AHeAD: efficient Algorithms for HArnessing networked Data" (G.L., A.T.); (vi) Dipartimento di Ingegneria dell'Università degli Studi di Perugia, grant RICBA19FM: "Modelli, algoritmi e sistemi per la visualizzazione di grafi e reti" (G.L., A.T.); and (vii) projects "Algorithms and Emergency", "Robot-based computing systems", "Distributed Computing by mobile entities" funded by Fondo Ricerca di Base 2017, 2018, 2019, respectively, University of Perugia (A.N.).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Efficient Approaches to the Mixture Distance Problem**

#### **Justie Su-Tzu Juan <sup>1</sup> , Yi-Ching Chen <sup>2</sup> , Chen-Hui Lin <sup>1</sup> and Shu-Chuan Chen 3,\***


Received: 31 August 2020; Accepted: 23 November 2020; Published: 28 November 2020

**Abstract:** The ancestral mixture model, an important model for building a hierarchical tree from high-dimensional binary sequences, was proposed by Chen and Lindsay in 2006. As a phylogenetic tree (or evolutionary tree), a mixture tree created from ancestral mixture models involves the inferred evolutionary relationships among various biological species. Moreover, it contains information on the time at which each species mutates. Tree comparison metrics, an essential topic in bioinformatics, are used to measure the similarity between trees. To our knowledge, however, no approach to comparing two mixture trees is known. In this paper, we propose a new metric, named the mixture distance metric, to measure the similarity of two mixture trees. It uniquely considers the factor of evolutionary times between trees. If we convert the mixture tree that contains the information of mutation time of each internal node into a weighted tree, the mixture distance metric is very close to the weighted path difference distance metric. Since the converted mixture tree forms a special weighted tree, we were able to design a more efficient algorithm to calculate this new metric. Therefore, we developed two algorithms to compute the mixture distance between two mixture trees. One requires $O(n^2)$ and the other requires $O(nh_1h_2)$ computational time with $O(n)$ preprocessing time, where $n$ denotes the number of leaves in the two mixture trees, and $h_1$ and $h_2$ denote the heights of these two trees.

**Keywords:** phylogenetic tree; evolutionary tree; ancestral mixture model; mixture tree; mixture distance; tree comparison

#### **1. Introduction**

Phylogeny reconstruction involves reconstructing the evolutionary relationships among species from biological sequences. It has become a critical issue in molecular biology and bioinformatics. Several existing methods, such as neighbor-joining methods [1] and maximum likelihood methods [2], have been proposed to reconstruct a phylogenetic tree. A novel and natural method, ancestral mixture models [3], was developed by Chen and Lindsay to deal with this problem. The mixture tree, a hierarchical tree created from the ancestral mixture model, uses a sieve parameter to represent the evolutionary time. Chen, Rosenberg and Lindsay (2011) then developed the MixtureTree algorithm [4], a Linux-based program written in C++, which employs the ancestral mixture models to reconstruct a mixture tree from DNA sequences. With the information provided by the mixture tree, one can identify when and how a mutation event of a species occurs. An example of the mixture tree created by the MixtureTree algorithm [3] is shown in Figure 1. The data from Griffiths and Tavare (1994) [5] are a subset of the mitochondrial DNA sequences which first appeared in Ward et al. (1991) [6]. To study the mitochondrial diversity within the Nuu-Chah-Nulth, an Amerindian tribe from Vancouver Island, Ward et al. (1991) [6] sequenced 360 nucleotide segments of the mitochondrial control region for 63 individuals from the Nuu-Chah-Nulth. Griffiths and Tavare's subsample consisted of 55 of the 63 distinct sequences and 18 segregating sites, including 13 pyrimidines (C, T) and five purines (A, G). Each lineage represents a distinct sequence, that is, there are lineages *a* through *n*. The time scale on the tree can be represented by − log(1 − 2*p*), where *p* is a parameter, the mutation rate. The number on the tree represents the site of the lineage at which the mutation occurs. For example, when *p* = 0.01, lineages *e* and *f* merge because a mutation occurs at site 5 of lineage *f*.
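As a small numerical illustration of the time scale − log(1 − 2*p*), the sketch below evaluates it for the mutation rate used in the example. It assumes that log denotes the natural logarithm, which the text does not state, and the function name `time_scale` is introduced here for illustration only.

```python
import math


def time_scale(p: float) -> float:
    """Evaluate -log(1 - 2p); natural logarithm is an assumption of this sketch."""
    if not 0.0 <= p < 0.5:
        raise ValueError("the expression is only defined for 0 <= p < 0.5")
    return -math.log(1.0 - 2.0 * p)


# Mutation rate from the example in the text.
t_example = time_scale(0.01)
```

For small *p* the value is close to 2*p*, so under this assumption *p* = 0.01 corresponds to a time of roughly 0.0202.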

**Figure 1.** An example of the mixture tree [3].

Distinct methods may produce distinct trees, even when they are applied to an identical dataset [7]. To identify a tree that represents the evolutionary relationships among species well, it is important to estimate how similar (or different) trees are. The tree distance between two trees is a general measure of their similarity.

The tree distance problem is a traditional issue in mathematics. Several metrics have been proposed to measure the similarity between two trees, such as the partition metric (also called the Robinson–Foulds metric or RF distance for short) [8], the quartet metric [9], the nearest neighbour interchange metric [10] and the nodal distance metric [11]. Those metrics all compare two trees by considering the tree structure only, and do not take any parameter of the tree into account. Thus, those metrics are not suitable for computing the similarity between two mixture trees. Therefore, we propose a novel metric, named the mixture distance metric, to measure the similarity of two mixture trees in this paper. Among the above metrics, the metric from the nodal distance algorithm is the most similar to our proposed metric. In 2003, John Bluis and Dong-Guk Shin [11] presented the nodal distance algorithm, which is used to measure the distances from leaves to all other leaves in a tree. The metric is defined as follows: $\mathrm{Distance}(T_1, T_2) = \sum_{x,y \in L(T_1) = L(T_2)} |D_{T_1}(x, y) - D_{T_2}(x, y)|$, where $D_{T_i}(x, y)$ denotes the distance from leaf $x$ to leaf $y$ in the tree $T_i$. The nodal distance algorithm was developed for this metric. However, this metric is not suitable for measuring the distance between two mixture trees.

For the mixture distance metric, the time parameter indicating when a mutation event of a species occurs plays an important role in tree similarity; it is, however, not considered by the previous metrics. If the weight of an edge in a mixture tree is defined as the difference in time parameters between its two endpoints, a mixture tree can be regarded as a weighted tree. We can design metrics to calculate the distance between two weighted trees. Some literature discusses the distance problem between two weighted trees, for example, the weighted RF metric [12], the geodesic distance [13] and the path difference metric [14]. However, in these works the weight on each edge is the number of base changes between the sequences of the species represented by its incident vertices. Since the weights of the edges in those weighted trees may all be different, an algorithm must spend more time to calculate the distance between two such weighted trees. For example, although there is a linear-time algorithm to compute the RF distance [15], and a randomized algorithm has been shown to approximate the RF distance with a bounded error in sublinear time [16], computing the weighted RF distance still needs $O(n^2)$ time. Some papers have studied algorithms for calculating the geodesic tree distance [17–19]. The best one known runs in $O(n^4)$ time [19]. Due to the characteristics of the time parameter of a mixture tree, any two edges connecting two leaves to the same parent have the same "weight" in a mixture tree. This helped us to design a better metric and algorithm. We further developed two algorithms to compute the mixture distance between two mixture trees. One requires $O(n^2)$ and the other requires $O(nh_1h_2)$ computational time with $O(n)$ preprocessing time, where $n$ denotes the number of leaves in these two mixture trees, and $h_1$ and $h_2$ denote the heights of these two trees. If we use the nodal distance algorithm with the mixture distance metric, the time complexity will be $O(n^3)$ for binary unrooted trees. Comparisons with some previous methods show that our method performs better.

#### **2. Mixture Distance Metric**

A tree *T* = (*V*(*T*), *E*(*T*)) is a connected and acyclic graph with a node set *V*(*T*) and an edge set *E*(*T*). *T* is a rooted tree if exactly one node of *T* has been designated the root. A node *v* ∈ *V*(*T*) is a leaf if it has no child; otherwise, *v* is an internal node. A node *v* ∈ *V*(*T*) is said to be at level *i*, denoted by *level*(*v*) = *i*, if the number of edges on the path between the root and *v* is *i*. Let *L*(*T*) denote the subset of the node set *V*(*T*) consisting of the leaves of *T*, and let *n* = |*L*(*T*)|. Let *height*(*T*) denote the height of tree *T*, which is max{*level*(*v*) | *v* ∈ *L*(*T*)}. *T* is a full binary tree if each node of *T* either has two children or is a leaf. A complete binary tree is a full binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. Let $h_1 = height(T_1)$ and $h_2 = height(T_2)$.

For a mixture tree *T*, each leaf is associated with a species, and every internal node *v* is associated with a mutation time $m_T(v)$ that represents the time when a mutation event occurs on the species node. In fact, the mutation time of an internal node in a mixture tree can be regarded as the distance between the node and any leaf among its descendants. Any two mixture trees $T_1$ and $T_2$ are comparable if $L(T_1) = L(T_2)$. Throughout this paper, unless stated otherwise, a tree refers to a rooted full binary tree in which each internal node is associated with its mutation time.

Given any two nodes *u*, *v* ∈ *V*(*T*), the least common ancestor or lowest common ancestor (abbreviated LCA) of *u* and *v* is an ancestor of both *u* and *v* with the smallest mutation time. (It is also called the most recent common ancestor (abbreviated MRCA), or the last common ancestor in biology and genealogy.) Let $P_T(u, v)$ denote the mutation time $m_T(w)$ of the LCA *w* of two leaves *u* and *v* in *T*. The mixture distance metric, a metric for the mixture tree, is formally defined as follows.

The mixture distance between two comparable mixture trees $T_1$ and $T_2$, denoted by $d_m(T_1, T_2)$, is defined as the sum of the differences of the mutation times with respect to the LCAs of any two leaves in $T_1$ and $T_2$. That is, $d_m(T_1, T_2) = \sum_{u,v \in L(T_1) = L(T_2)} |P_{T_1}(u, v) - P_{T_2}(u, v)|$.
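The definition can be evaluated directly by tabulating the LCA mutation time of every leaf pair in both trees. The following is a minimal sketch under stated assumptions: the `Node` class and the helper names are hypothetical, leaves are identified by their labels, the sum is taken over unordered pairs of distinct leaves, and the two small trees are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: Optional[str] = None   # leaf label (None for internal nodes)
    time: float = 0.0            # mutation time (unused for leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name):
    return Node(name=name)


def join(time, left, right):
    return Node(time=time, left=left, right=right)


def lca_times(root):
    """Map each unordered leaf pair {u, v} to P_T(u, v)."""
    times = {}

    def collect(node):
        if node.left is None:
            return [node.name]
        left_leaves = collect(node.left)
        right_leaves = collect(node.right)
        # node is the LCA of every pair with one leaf on each side.
        for a in left_leaves:
            for b in right_leaves:
                times[frozenset((a, b))] = node.time
        return left_leaves + right_leaves

    collect(root)
    return times


def mixture_distance_naive(t1, t2):
    """Sum |P_T1(u, v) - P_T2(u, v)| over all unordered leaf pairs."""
    p1, p2 = lca_times(t1), lca_times(t2)
    if p1.keys() != p2.keys():
        raise ValueError("the trees are not comparable")
    return sum(abs(p1[pair] - p2[pair]) for pair in p1)


# Two hypothetical comparable mixture trees on the leaves A, B, C, D.
T1 = join(3, join(1, leaf("A"), leaf("B")), join(2, leaf("C"), leaf("D")))
T2 = join(4, join(2, join(1, leaf("A"), leaf("C")), leaf("B")), leaf("D"))
d = mixture_distance_naive(T1, T2)
```

This brute-force version visits every leaf pair once per tree, so it runs in quadratic time and mainly serves as a reference against which faster algorithms can be checked.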

The significance of the mixture distance metric is that it measures the similarity between two mixture trees while considering the mutation times (molecular clock) and mutation sites simultaneously. This study sought to develop two algorithms for efficiently computing the mixture distance between two comparable mixture trees. Before we go into the algorithms, three properties of the mixture distance metric are demonstrated. Felsenstein [20] derived three mathematical properties (reflexivity, symmetry and triangle inequality) required for a well-defined metric. We show that the mixture distance is well-defined in Theorem 1.

**Theorem 1.** *The mixture distance* $d_m$ *satisfies:*

*1. Reflexivity: for any two comparable mixture trees* $T_1$ *and* $T_2$*,* $d_m(T_1, T_2) = 0$ *if and only if* $T_1 = T_2$*.*

*2. Symmetry: for any two comparable mixture trees* $T_1$ *and* $T_2$*,* $d_m(T_1, T_2) = d_m(T_2, T_1)$*.*

*3. Triangle inequality: for any three comparable mixture trees* $T_1$*,* $T_2$ *and* $T_3$*,* $d_m(T_1, T_2) + d_m(T_2, T_3) \geq d_m(T_1, T_3)$*.*

**Proof.** 1. If $T_1 = T_2$, then for any two leaves $u, v \in L(T_1) = L(T_2)$ we have $P_{T_1}(u, v) = P_{T_2}(u, v)$. Therefore, $d_m(T_1, T_2) = 0$. Conversely, if $d_m(T_1, T_2) = 0$ for two comparable mixture trees $T_1$ and $T_2$, then by the definition we have $P_{T_1}(u, v) = P_{T_2}(u, v)$ for any $u, v \in L(T_1) = L(T_2)$. Then we can prove $T_1 = T_2$ by induction on the height of $T_1$ (or $T_2$).

2. For any two leaves $u, v \in L(T_1) = L(T_2)$, $P_{T_1}(u, v) - P_{T_2}(u, v) = -(P_{T_2}(u, v) - P_{T_1}(u, v))$. Thus, $d_m(T_1, T_2) = \sum_{u,v \in L(T_1) = L(T_2)} |P_{T_1}(u, v) - P_{T_2}(u, v)| = \sum_{u,v \in L(T_1) = L(T_2)} |P_{T_2}(u, v) - P_{T_1}(u, v)| = d_m(T_2, T_1)$.

3. The triangle inequality is always satisfied for any three nonnegative numbers $a, b, c \in \mathbb{R}^+ \cup \{0\}$; that is, $|a - b| + |b - c| \geq |a - c|$. Therefore, $|P_{T_1}(u, v) - P_{T_2}(u, v)| + |P_{T_2}(u, v) - P_{T_3}(u, v)| \geq |P_{T_1}(u, v) - P_{T_3}(u, v)|$ holds for every pair of leaves $u, v$. Summing over all pairs, we have

$$\begin{aligned} \sum_{u,v \in L(T_1)} |P_{T_1}(u, v) - P_{T_2}(u, v)| + \sum_{u,v \in L(T_2)} |P_{T_2}(u, v) - P_{T_3}(u, v)| \\ \geq \sum_{u,v \in L(T_1)} |P_{T_1}(u, v) - P_{T_3}(u, v)|. \end{aligned}$$

Consequently, $d_m(T_1, T_2) + d_m(T_2, T_3) \geq d_m(T_1, T_3)$ can be concluded.

#### **3. An** $O(nh_1h_2)$**-Time Algorithm**

Let $T_1$ and $T_2$ denote two comparable mixture trees with $n$ leaves each. Note that the mixture distance of $T_1$ and $T_2$ can be computed in $O(n^2)$ time: there are $O(n^2)$ pairs of leaves in each of $T_1$ and $T_2$, and the LCA of any pair of leaves can be found by adopting the $O(1)$-time algorithm with $O(n)$-time preprocessing [21].

In the following, another $O(n^2)$-time algorithm, named Algorithm MIXTUREDISTANCE, is proposed to compute the mixture distance between $T_1$ and $T_2$, which will help us to realize the next $O(nh_1h_2)$-time algorithm, the main result.

#### *3.1. Algorithm MixtureDistance*

Algorithm MIXTUREDISTANCE, as shown in Algorithm 1, processes the nodes of $T_1$ by breadth-first search. For each internal node $v$ in $T_1$, we find the leaves of $T_1$ such that $v$ is exactly the LCA of each pair of leaves, and then compute the LCA $u$ of the leaves in $T_2$ which are mapped to the found leaves of $T_1$. Finally, the difference of the mutation times between $u$ and $v$ is calculated. For convenience, we define $(a, b) * (c, d) = ad + bc$ for any two ordered pairs $(a, b)$ and $(c, d)$ in this algorithm, where $a$, $b$, $c$ and $d$ are any four integers.

The algorithm adopts a 2-coloring method [22] on the leaves in $T_1$ and $T_2$ for easy implementation. For each iteration associated with an internal node $v$ of $T_1$ in line 4, the leaves of the left and right subtrees rooted at $v$ are colored red and green, respectively. The mapped leaves in $T_2$ have the same coloring as those in $T_1$. The mixture distance between each internal node $u$ in $T_2$ and $v$ is calculated according to the coloring scheme in $T_2$ (in lines 16–17), and the coloring information of $u$ is derived for the computation of its parent node (in line 18).

The coloring information of $u$, denoted by $color(u)$, indicates the coloring information of the subtree in $T_2$ rooted at $u$. $color(u)$ consists of the numbers of $u$'s descendant leaves colored red ($color(u)[0]$) and green ($color(u)[1]$), respectively. $color(u)$ is derived from the coloring information of its two children. That is, $color(u)[0] = color(u_L)[0] + color(u_R)[0]$ and $color(u)[1] = color(u_L)[1] + color(u_R)[1]$, where $u_L$ and $u_R$ denote the left and right children of $u$ in $T_2$, respectively.


In line 16, $number(u)$ is obtained by the special product of the color vectors of $u$'s two children, $number(u) = color(u_L)[0] \times color(u_R)[1] + color(u_L)[1] \times color(u_R)[0]$, which is the number of times that $u$ is the LCA of a red leaf and a green leaf. We multiply the difference of their mutation times by $number(u)$ in line 17, for computing the mixture distance between each internal node $u$ in $T_2$ and $v$. At the end of Algorithm MIXTUREDISTANCE, D indicates the mixture distance of $T_1$ and $T_2$.

Since the numbers of internal nodes in $T_1$ and $T_2$ (= $I_1$ and $I_2$) are both equal to $n - 1$, each of the two for-loops iterates $O(n)$ times, and the innermost for-loop always takes 2 (a constant) time units. Therefore, Algorithm MIXTUREDISTANCE requires $O(n^2)$ computational time.
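The coloring scheme described above can be sketched as follows. This is an illustrative reconstruction from the prose, not the listing of Algorithm 1: the `Node` class, the function names and the example trees are assumptions, and the line numbers cited in the text do not correspond to lines of this sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: Optional[str] = None   # leaf label (None for internal nodes)
    time: float = 0.0            # mutation time of an internal node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves_under(node):
    """Return the labels of all leaves in the subtree rooted at node."""
    if node.left is None:
        return [node.name]
    return leaves_under(node.left) + leaves_under(node.right)


def internal_nodes_bfs(root):
    """Return the internal nodes of a tree in breadth-first order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node.left is not None:
            order.append(node)
            queue.append(node.left)
            queue.append(node.right)
    return order


def mixture_distance_coloring(t1, t2):
    total = 0.0
    for v in internal_nodes_bfs(t1):
        red = set(leaves_under(v.left))      # leaves of v's left subtree
        green = set(leaves_under(v.right))   # leaves of v's right subtree

        def visit(u):
            """Return color(u) = (#red, #green) and accumulate the distance."""
            nonlocal total
            if u.left is None:
                return (int(u.name in red), int(u.name in green))
            cl, cr = visit(u.left), visit(u.right)
            # number(u): red/green pairs whose LCA in T2 is u.
            number = cl[0] * cr[1] + cl[1] * cr[0]
            total += number * abs(v.time - u.time)
            return (cl[0] + cr[0], cl[1] + cr[1])

        visit(t2)
    return total


# Hypothetical comparable trees on the leaves A, B, C, D.
T1 = Node(time=3,
          left=Node(time=1, left=Node("A"), right=Node("B")),
          right=Node(time=2, left=Node("C"), right=Node("D")))
T2 = Node(time=4,
          left=Node(time=2,
                    left=Node(time=1, left=Node("A"), right=Node("C")),
                    right=Node("B")),
          right=Node("D"))
D = mixture_distance_coloring(T1, T2)
```

Each internal node of $T_1$ triggers one full traversal of $T_2$, which is where the quadratic running time comes from; uncolored leaves contribute the pair (0, 0) and never change the total.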

#### *3.2. Modified Algorithm*

After introducing Algorithm MIXTUREDISTANCE, we can give an $O(nh_1h_2)$-time algorithm for computing the mixture distance between two mixture trees. In Algorithm MIXTUREDISTANCE, when the leaves of the subtree rooted at an internal node $v$ in $T_1$ are colored, the other leaves in $T_1$ have no color, and neither do their mapped leaves in $T_2$. That is, $color(w) = (0, 0)$ for each such leaf $w \in L(T_2)$. However, Algorithm MIXTUREDISTANCE still processes the ancestors of such leaves in $T_2$. In the following, we propose an algorithm that disregards the nodes without meaningful coloring information, and reduces the time complexity from $O(n^2)$ to $O(nh_1h_2)$.

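The algorithm contains three main stages, as follows:

1. Rank the nodes of $T_2$ in postorder and assign to each leaf of $T_1$ the rank of its mapped leaf in $T_2$.
2. For each internal node $v$ of $T_1$, construct a minimal subtree of $T_2$ that contains the colored leaves with respect to $v$.
3. Compute the partial mixture distance between this subtree and the subtree of $T_1$ rooted at $v$ by Algorithm MIXTUREDISTANCE.

In stage 1, the nodes of $T_2$ are ranked in postorder, and each leaf of $T_1$ is assigned the rank of its mapped leaf in $T_2$. In Figure 2, the red numbers near the leaves of two given comparable mixture trees $T_1$ and $T_2$ indicate the ranking achieved by stage 1 of the algorithm. Note that the number within a node is the mutation time $m_{T_i}(v)$ of the associated node $v$ for $i = 1$ or $2$.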

**Figure 2.** An example of ranking leaves of *T*<sup>1</sup> and *T*2.
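Stage 1, together with the rank-sorted leaf lists used later, can be sketched as below. This is a minimal illustration under assumptions: trees are given as nested tuples `(left, right)` with string leaves (mutation times are omitted because they play no role in the ranking), all nodes of $T_2$ receive a postorder number, and `heapq.merge` stands in for the two-way merge.

```python
import heapq


def postorder_leaf_ranks(tree):
    """Number all nodes of the tree in postorder and return the leaf ranks."""
    ranks = {}
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, tuple):
            visit(node[0])
            visit(node[1])
        counter += 1              # postorder number of the current node
        if not isinstance(node, tuple):
            ranks[node] = counter

    visit(tree)
    return ranks


def sorted_leaf_list(node, ranks):
    """Return the leaves below node, sorted by rank, by merging child lists."""
    if not isinstance(node, tuple):
        return [node]
    left = sorted_leaf_list(node[0], ranks)
    right = sorted_leaf_list(node[1], ranks)
    return list(heapq.merge(left, right, key=ranks.__getitem__))


# Hypothetical trees: T2 = (((A, C), B), D) and T1 = ((A, B), (C, D)).
T2 = ((("A", "C"), "B"), "D")
T1 = (("A", "B"), ("C", "D"))
ranks = postorder_leaf_ranks(T2)
root_list = sorted_leaf_list(T1, ranks)
```

Sorting the leaves below a node of $T_1$ by their postorder rank in $T_2$ lists them in the left-to-right order in which they appear in $T_2$, which is what the subtree construction relies on.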

The algorithm proceeds to stage 2 for each internal node $v$ of $T_1$ in the reverse order of breadth-first search. When $v$ in $T_1$ is processed, stage 2 constructs a minimal subtree $T'_2$ of $T_2$ that contains the colored leaves with respect to node $v$. For node $v$, a nondecreasing list of the leaves of the subtree rooted at $v$, denoted by $leaf(v)$, is obtained from the leaf lists of its two children, where the leaves in the list are sorted by their ranks. Suppose that there are $k$ ordered nodes in $leaf(v)$, that is, $leaf(v) = \{w_1, w_2, \ldots, w_k\}$. With the list $leaf(v)$, the subtree $T'_2$ can be constructed as follows.

Let $lca(w_i, w_j)$ denote the LCA of leaves $w_i$ and $w_j$ in $T_2$, for any $i, j \in \{1, 2, \ldots, k\}$. The subtree $T'_2 = (V', E')$ is initialized by $V' = \{w_1, w_2, lca(w_1, w_2)\}$, $E' = \{\overline{lca(w_1, w_2)w_1}, \overline{lca(w_1, w_2)w_2}\}$ and $root = lca(w_1, w_2)$. For each node $w_i$, $i \in \{1, 2, \ldots, k - 2\}$,

$$V' = V' \cup \{ lca(w_{i+1}, w_{i+2}), w_{i+2} \} \text{ and}$$

$$E' = E' \cup \{ \overline{lca(w_{i+1}, w_{i+2})w_{i+2}} \}.$$

Moreover, if the mutation time (the number written in the node circle) of $lca(w_{i+1}, w_{i+2})$, denoted by $t(lca(w_{i+1}, w_{i+2}))$, is larger than the mutation time of $root$, denoted by $t(root)$, the edge $\overline{lca(w_{i+1}, w_{i+2})\,root}$ is inserted into $E'$ and $lca(w_{i+1}, w_{i+2})$ is set as the new $root$. Otherwise, if $t(lca(w_{i+1}, w_{i+2}))$ is smaller than the mutation time of $lca(w_i, w_{i+1})$, denoted by $t(lca(w_i, w_{i+1}))$, the edge $\overline{lca(w_i, w_{i+1})\,w_{i+1}}$ is removed from $E'$ and the edges $\overline{lca(w_{i+1}, w_{i+2})\,w_{i+1}}$ and $\overline{lca(w_i, w_{i+1})\,lca(w_{i+1}, w_{i+2})}$ are inserted into $E'$. Otherwise, let $x = w_{i+1}$ and repeat $x = father(x)$ until $t(x) < t(lca(w_{i+1}, w_{i+2})) < t(father(x))$, where $father(x)$ is the node $y$ such that $\overline{yx} \in E'$. Then the edge $\overline{father(x)\,x}$ is removed from $E'$ and the edges $\overline{lca(w_{i+1}, w_{i+2})\,x}$ and $\overline{father(x)\,lca(w_{i+1}, w_{i+2})}$ are inserted into $E'$.

**Example 1.** *An example of constructing the subtree* $T'_2$ *with respect to* $leaf(v_2) = \{A, B, G, H\}$ *is illustrated in Figure 3. Initially, the node set* $V'$ *is* $\{A, B, lca(A, B)\}$ *and the edge set* $E'$ *includes the incident edges of the three nodes in* $T_2$*. As node A is processed, two nodes* $lca(B, G)$ *and G are inserted into* $V'$*, and two edges* $\overline{lca(A, B)\,lca(B, G)}$ *and* $\overline{lca(B, G)\,G}$ *are inserted into* $E'$*. Later, when node B is processed, two nodes* $lca(G, H)$ *and H are inserted into* $V'$ *and two edges* $\overline{lca(B, G)\,lca(G, H)}$ *and* $\overline{lca(G, H)\,H}$ *are inserted into* $E'$*. Meanwhile, the edge* $\overline{lca(B, G)\,G}$ *is removed from* $E'$ *and the edge* $\overline{lca(G, H)\,G}$ *is inserted into* $E'$*, because the mutation time of* $lca(B, G)$ *is larger than that of* $lca(G, H)$*.*

**Figure 3.** An example of constructing the subtree $T'_2$ with respect to $leaf(v_2)$ in Figure 2. (**a**) The initialization of $T'_2$. (**b**) The intermediate state of $T'_2$ as node A is processed. (**c**) The complete subtree $T'_2$ as node B is processed. As the mutation time of $lca(B, G)$ is larger than that of $lca(G, H)$, the dotted line incident to G is removed and the other incident edge of G is inserted.
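A subtree of this kind can also be built with a stack that holds the rightmost path of the partial subtree. The sketch below uses that standard stack-based construction rather than the edge-rewiring procedure described above, compares node depths instead of mutation times, and finds LCAs by climbing parent pointers instead of the $O(1)$-time method of [21]. The tree encoding (a `parent` dictionary), the function names and the eight-leaf example tree are all assumptions; the example tree is not the tree of Figure 2.

```python
def node_depths(parent):
    """Return the depth of every node of a tree given as {node: parent}."""
    depth = {}

    def compute(x):
        if x not in depth:
            depth[x] = 0 if parent[x] is None else compute(parent[x]) + 1
        return depth[x]

    for x in parent:
        compute(x)
    return depth


def naive_lca(parent, depth, a, b):
    """Climb from the deeper node until both pointers meet."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def minimal_subtree(parent, ordered_leaves):
    """Return {node: parent} of the subtree induced by leaves and their LCAs.

    ordered_leaves must be sorted by rank (left-to-right order in the tree).
    """
    depth = node_depths(parent)
    sub_parent = {}
    stack = [ordered_leaves[0]]          # rightmost path, root at the bottom
    for x in ordered_leaves[1:]:
        l = naive_lca(parent, depth, stack[-1], x)
        while stack and depth[stack[-1]] > depth[l]:
            child = stack.pop()
            if stack and depth[stack[-1]] >= depth[l]:
                sub_parent[child] = stack[-1]
            else:
                sub_parent[child] = l
        if not stack or stack[-1] != l:
            stack.append(l)
        stack.append(x)
    while len(stack) > 1:
        child = stack.pop()
        sub_parent[child] = stack[-1]
    sub_parent[stack[0]] = None          # the remaining node is the root
    return sub_parent


# Hypothetical T2: r -> (s, t), s -> (ab, cd), t -> (ef, gh), pairs of leaves.
PARENT = {"r": None, "s": "r", "t": "r", "ab": "s", "cd": "s", "ef": "t",
          "gh": "t", "A": "ab", "B": "ab", "C": "cd", "D": "cd",
          "E": "ef", "F": "ef", "G": "gh", "H": "gh"}
SUB = minimal_subtree(PARENT, ["A", "B", "G", "H"])
```

For the leaf list {A, B, G, H} the result keeps only the three LCAs `ab`, `gh` and `r`; the intermediate nodes `s` and `t`, which carry no meaningful coloring information, are skipped.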

After the subtree $T'_2$ with respect to the currently processed node $v$ is constructed, stage 3 of the algorithm performs lines 5–18 of Algorithm MIXTUREDISTANCE to compute the "partial" mixture distance between $T'_2$ and the subtree rooted at $v$ (it only computes the distances of the node pairs whose LCA is equal to $v$). At the end of the algorithm, D indicates the mixture distance between $T_1$ and $T_2$.

**Theorem 2.** *The improved algorithm takes $O(n h_1 h_2)$ computational time and $O(n)$ preprocessing time, where $n$ denotes the number of leaves of the mixture trees and $h_i = \mathit{height}(T_i)$ for $i = 1, 2$.*

**Proof.** The algorithm contains three main stages. The first stage ranks the leaves in $T_1$ and $T_2$, which takes $O(n)$ time.

In the second stage, a minimal subtree $T'_2$ of $T_2$ involved in colored leaves with respect to each node $v$ in $T_1$ is constructed. For each node $v$, a leaf list $\mathit{leaf}(v)$ is obtained from the leaf lists of its two children, which is achieved in $O(t)$ time by using the two-way merging algorithm [23] performed on the leaf lists of $v$'s children, where $t$ is the size of $\mathit{leaf}(v)$. The $O(1)$-time algorithm with $O(n)$-time preprocessing [21] is employed to compute the LCA of any pair of nodes in $T_2$. Constructing $T'_2$ takes $O(t h')$ time, where $h'$ is the height of $T'_2$, due to the "repeat" step. The last stage computes the mixture distance between $v$ and each internal node in $T'_2$ by performing lines 5–18 of Algorithm MIXTUREDISTANCE, which takes $O(t)$ time. Stages 2 and 3 take $O(n)$ iterations in total. However, each iteration deals with a different number $t$ of nodes. Note that for all internal nodes on the same level of $T_1$, the sum of $t$ (over these nodes) is at most $n$. Therefore, stages 2 and 3 take $O(n h_1 h') = O(n h_1 h_2)$ time in total, where $h_1$ is the height of $T_1$ (note that $h' \le h_2 = \mathit{height}(T_2)$). Hence, the algorithm requires $O(n h_1 h_2)$ computational time with $O(n)$ preprocessing time.
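The counting argument in the proof can be written out as a single chain. This is only a sketch: the symbols $t_v = |\mathit{leaf}(v)|$ and $L_j$ (the set of internal nodes on level $j$ of $T_1$) are introduced here for illustration and do not appear in the original text.

$$\sum_{v \in T_1} O(t_v\, h') \;=\; \sum_{j=1}^{h_1} \sum_{v \in L_j} O(t_v\, h') \;\le\; \sum_{j=1}^{h_1} O(n\, h') \;=\; O(n\, h_1 h') \;\subseteq\; O(n\, h_1 h_2).$$

The inequality uses the fact that the leaf lists of nodes on the same level are disjoint, so their sizes sum to at most $n$; the last step uses $h' \le h_2$.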

#### **4. Conclusions**

In this paper, we provide a novel metric named the mixture distance metric to measure the similarity between two mixture trees. It uniquely considers the estimated evolutionary time in the trees. Two algorithms were developed to compute the mixture distance between mixture trees. One requires $O(n^2)$ computational time and the other requires $O(n h_1 h_2)$ computational time with $O(n)$ preprocessing time. Note that when $T_1$ and $T_2$ are complete binary trees, $h_1$ and $h_2$ will be $O(\log n)$ and the time complexity of our algorithm will be $O(n \log^2 n)$.

Now, we compare our metric with some previous methods which measure phylogenetic differences in consideration of the branch length, when we consider a mixture tree as a weighted tree (recall that the weight of an edge in a mixture tree is defined as the difference of time parameters between its two endpoints). For the geodesic tree distance, the implementation is quite complex and requires heavy computation [19], although a heuristic fast version exists [18]. The definition of the weighted path difference distance [14] is almost the same as the mixture distance. Actually, the weighted path difference distance between two mixture trees $T_1$ and $T_2$ is equal to $2 d_m(T_1, T_2)$. However, it requires $O(n^2)$ computational time. The mixture distance seems to be similar to the weighted RF distance [12], but the two metrics can behave differently when they compare pairs of mixture trees with different degrees of similarity. We give an example as follows.

**Example 2.** *Four mixture trees with the same lineages A, B and C are illustrated in Figure 4; the time parameters are listed in the vertices, and the associated edge weights are labeled beside each edge. All pairs of these four trees have been compared using the methods outlined in [12] and this paper. The weighted RF (wRF) and mixture ($d_m$) distances are given in Tables 1 and 2, respectively. From these two tables, one can make the following observations. (1) $d_m$ seems to maintain the order relationship of wRF: when wRF considers two trees to be similar, $d_m$ also yields a smaller value for these two trees: $\mathit{wRF}(T_1, T_3) > \mathit{wRF}(T_2, T_3) > \mathit{wRF}(T_2, T_4) > \mathit{wRF}(T_1, T_2)$ and $d_m(T_1, T_3) > d_m(T_2, T_3) > d_m(T_2, T_4) > d_m(T_1, T_2)$. (2) When wRF considers the distances between two pairs of trees to be the same, so does $d_m$: $\mathit{wRF}(T_1, T_2) = \mathit{wRF}(T_3, T_4)$, $\mathit{wRF}(T_1, T_4) = \mathit{wRF}(T_2, T_3)$ and $d_m(T_1, T_2) = d_m(T_3, T_4)$, $d_m(T_1, T_4) = d_m(T_2, T_3)$. However, there are still differences between these two metrics in the details: (3) when wRF considers the distances between two pairs of trees to be very different, $d_m$ sometimes does not: $\mathit{wRF}(T_1, T_3) - \mathit{wRF}(T_2, T_3) = 1$, $\mathit{wRF}(T_1, T_4) - \mathit{wRF}(T_2, T_4) = 3$, but $d_m(T_1, T_3) - d_m(T_2, T_3) = d_m(T_1, T_4) - d_m(T_2, T_4) = 1$.*

**Figure 4.** Four weighted trees with the same lineages A, B and C.

**Table 1.** The weighted RF distances wRF among $T_1$, $T_2$, $T_3$ and $T_4$.



**Table 2.** The mixture distances $d_m$ among $T_1$, $T_2$, $T_3$ and $T_4$.

Therefore, it can be said that the performance of the mixture distance in calculating the similarity of two weighted trees is as good as the performance of the weighted RF distance, while the time complexity of the mixture distance is better. In addition, we compared our approaches with the methods performed with the nodal distance metric [11], geodesic tree distance [19], weighted path difference metric [14] and weighted RF distance [12], and the results are shown in Table 3. Our proposed approaches performed better than all of the previous methods when discussing the distance between two mixture trees.


**Author Contributions:** Investigation, Y.-C.C. and C.-H.L.; methodology, J.S.-T.J. and S.-C.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** The first author was supported in part by the Ministry of Science and Technology of the Republic of China under Contract No. MOST100-2221-E-260-024- and MOST109-2115-M-260-001.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Multi-Winner Election Control via Social Influence: Hardness and Algorithms for Restricted Cases**

#### **Mohammad Abouei Mehrizi \* and Gianlorenzo D'Angelo \***

Computer Science Department, Gran Sasso Science Institute (GSSI), Viale Francesco Crispi, L'Aquila AQ 67100, Italy

**\*** Correspondence: mohammad.aboueimehrizi@gssi.it (M.A.M.); gianlorenzo.dangelo@gssi.it (G.D.)

Received: 4 September 2020; Accepted: 29 September 2020; Published: 2 October 2020

**Abstract:** Nowadays, many political campaigns use social influence in order to convince voters to support/oppose a specific candidate/party. In the election control via social influence problem, an attacker tries to find a limited set of influencers to start disseminating a political message in a social network of voters. A voter will change his opinion when he receives and accepts the message. In the constructive case, the goal is to maximize the number of votes/winners of a target candidate/party, while in the destructive case, the attacker tries to minimize them. Recent works considered the problem in different models and presented some hardness and approximation results. In this work, we consider multi-winner election control through social influence on different graph structures and diffusion models, and our goal is to maximize/minimize the number of winners in our target party. We show that the problem is hard to approximate when voters' connections form a graph and the diffusion model is the linear threshold model. We also prove the same result considering an arborescence under the independent cascade model. Moreover, we present a dynamic programming algorithm for the case in which the voting system is a variation of straight-party voting and voters form a tree.

**Keywords:** computational social choice; election control; multi-winner election; social influence; influence maximization

#### **1. Introduction**

Social media is an integral part of modern life, and its effect on many aspects of our lives cannot be ignored. Many people from all around the world use social networks to provide/use various services like teaching/learning, spreading information, announcing events, and advertising. It has been shown that two-thirds of American adults get news on social media [1]. It is easy to find evidence that a social influence (SI) started by a few users has influenced many people. Thus, social media is a cheap means of spreading a message among many users. Note that the power of social media does not lie only in spreading a message or advertising. Its power comes from the fact that a user receives news from those who have enough authority to change his opinion, like close friends, family members, and colleagues. Since using social influence is effective and cheap, it has attracted the attention of many political campaigns and candidates, who target users' opinions through SI. They disseminate a piece of information to change voters' opinions. Many real case studies show that campaigns used social influence to change voters' opinions [2–5]. For example, Allcott and Gentzkow showed that 92% of Americans remembered pro-Trump false news, and 23% remembered pro-Clinton fake news [6].

There are two well-known diffusion models used in social influence called linear threshold model (LTM) and Independent Cascade Model (ICM) [7]. In LTM, a voter accepts a message if the sum over his incoming neighbors' influence, who already accepted the message, is high enough. On the other hand, in ICM, a voter will accept a message if at least one of his incoming neighbors, who already accepted the message, can convince him to accept it (please see Section 2 for a formal definition of LTM and ICM).

In this paper, we consider the multi-winner election control via social influence problem. We are given a social network of voters, a limited budget, a set of candidates, each belonging to a party, a dynamic diffusion model to spread a message among the voters, and an attacker/manipulator who supports/opposes a party. When we use the LT diffusion model, we assume that the attacker knows the probability that each voter votes for each candidate. To take into account the incoming influence of each node *v*, we use an updating rule based on the influence from the node's incoming activated neighbors, akin to [8]. On the other hand, when we use ICM, we assume the attacker knows the exact preference lists of all voters. When a node/voter becomes active/influenced/infected, in the constructive (resp. destructive) case, it promotes (resp. demotes) the position of the target candidates in its preference list, akin to [9,10] (see Section 3 for a formal definition).

Regarding both LTM and ICM, there will be several winners, and they will be elected according to the overall candidates' scores after the diffusion. In the constructive (resp. destructive) case, the attacker wants to find a set of nodes (voters), according to its budget, to start the diffusion and change the voters' opinion to maximize (resp. minimize) the number of winners from his target party. In fact, in a given directed graph, we should find some diffusion starters to influence the voters such that the difference between the number of winners from our target party, w.r.t. the number of winners in the opponent party with the most winners, after and before the diffusion is maximized (resp. minimized). We present some results, including hardness of approximation, approximation, and polynomial-time exact algorithms considering some well-known objective functions on different structures.

Related works. There are many articles regarding voting manipulation (see the survey in [11]). The problem of finding a set of limited seed nodes from a given graph to maximize the expected number of influenced nodes is known as Influence Maximization (IM) problem. There exists an extensive literature about it, too [12]. Domingos and Richardson [13,14] introduced the IM problem, and Kempe et al. formalized it [7,15]. On the other hand, few works consider both of them together, i.e., the election control through social influence problem.

Wilder and Vorobeychik introduced the election control through SI problem regarding single-winner elections [10]. They investigated maximizing the margin of victory (MoV) and the probability of victory (PoV), where MoV is the difference of the score between the target candidate and the most voted opponent after and before the diffusion. The problem is considered under ICM. They showed that maximizing MoV is *NP*-hard, and presented a $\left(1 - \frac{1}{e}\right)$-approximation algorithm with respect to the optimal solution. Furthermore, for maximizing PoV, they showed that it is *NP*-hard to approximate the problem within any constant factor. Corò et al. [16,17] extended the work using any non-increasing scoring function under LTM. They demonstrated the same approximation factor for it. Abouei Mehrizi et al. considered the problem when the attacker knows a probability distribution over the candidates instead of the exact preference list, under LTM [8]. They showed that maximizing/minimizing the expected probability to vote for a target candidate is hard to approximate within any constant factor under the unique games with small set expansion conjecture. They also presented some constant factor approximation algorithms for a relaxed version of the problem. Abouei Mehrizi and D'Angelo showed that in multi-winner elections, when the manipulator wants to maximize/minimize the number of winners in his target party, the problem is inapproximable under ICM, unless *P* = *NP* [9]. They also presented some constant factor approximation algorithms when the voting system is similar to straight-party voting.

Bredereck and Elkind considered some different models, like bribing nodes/voters, adding or deleting edges under LTM. They showed that the problem is hard in those models. They also presented some polynomial-time algorithms for specific cases of the problem [18]. Castiglioni et al. investigated similar models under ICM. They showed that the problem is hard even in restricted structures. Regarding bribing nodes to influence other voters, they proved that election control is hard even if the given graph is a line. Furthermore, considering the edge removal/addition case, they demonstrated that the problem is hard even if the attacker has an infinite budget [19]. Faliszewski et al. considered the problem where each voter has a preference list. Each node of the graph represents all users with the same opinion. There is an edge between two nodes if their opinions differ by the order of an adjacent pair of candidates. They used LTM and proved that maximizing the number of votes for the target candidate is *NP*-hard and fixed-parameter tractable with respect to the number of candidates [20]. Furthermore, there is another model in which voters have a preference list over candidates, and voters change their preference list according to the majority of their neighbors' opinions [21–23].

Outline and our results. In Section 2, we define the most prominent diffusion models in the literature (called LTM and ICM) that we use in this paper. Section 3 defines our model and objective functions formally. In Section 4, we show that our problem is hard to approximate within any factor in a general graph when the diffusion model is LTM. Section 5 contains the same result when the diffusion model is ICM and the given graph is an arborescence, i.e., edges are directed from the leaves to the root of the tree. Moreover, in Section 6, we investigate the problem when the voting system is a variation of straight-party voting, where voters can vote for the parties. In other words, voters have a preference list (or probability distribution) over the candidates, but they vote for the parties instead of candidates. We present a polynomial-time algorithm based on dynamic programming to find the maximum difference of votes for our target party before and after diffusion. It also gives $\frac{1}{3}$- and $\frac{1}{2}$-approximation algorithms for maximizing MoV in the constructive and destructive models, respectively. Finally, we discuss the results and future work in Section 7.

#### **2. Background**

In this section, we introduce the two diffusion models used in this paper, called the linear threshold model (LTM) and the independent cascade model (ICM), presented by Kempe et al. [7,15]. They are the most prominent dynamic diffusion models used in the literature (see a survey on the topic [24]).

#### *2.1. Linear Threshold Model*

We are given a directed graph $G = (V, E)$. Each edge $(u, v) \in E$ has a weight $b_{u,v} \in [0, 1]$. The sum of the incoming weights of each node $v \in V$ is at most one, i.e., $\sum_{u \in N_v^i} b_{u,v} \le 1$, where $N_v^i$ is the set of incoming neighbors of $v$. Furthermore, each node $v \in V$ has a threshold $t_v \in [0, 1]$, which is generated uniformly at random.

In this model, the diffusion starts from a set of nodes $S \subseteq V$ known as seed nodes. At the first step, just the seed nodes become active/influenced/infected, and all other nodes are inactive. Let $A_i$ denote the set of nodes that are active at step $i$, i.e., $A_1 = S$. The activation process, for each step $i > 1$, is as follows: all nodes in $A_{i-1}$ remain active at step $i$, i.e., $A_{i-1} \subseteq A_i$; moreover, each inactive node $v \in V \setminus A_{i-1}$ becomes active if the sum of the weights from its incoming activated neighbors is not less than its threshold, i.e., each node $v \in V \setminus A_{i-1}$ will be in $A_i$ if $\sum_{u \in A_{i-1} \cap N_v^i} b_{u,v} \ge t_v$. The diffusion process proceeds in at most $|V|$ discrete steps, and it stops as soon as no extra node becomes active, i.e., it stops at step $k > 1$ if $A_k = A_{k-1}$. We use $A_S$ for the set of activated nodes after the diffusion process started from the set of seed nodes $S$. In what follows, to increase the readability of this article, when we say after $S$, we mean after the diffusion process started from a set of seed nodes $S$. Note that the thresholds are not a part of the input; they are generated uniformly at random and independently when we run the process. Hence, the process is random, and several executions on the same graph may give different results for $A_S$.
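The activation process described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `simulate_ltm`, the dictionary-based graph encoding, and the optional `rng` argument are assumptions made for this example.

```python
import random

def simulate_ltm(nodes, weights, seeds, rng=None):
    """Run one LTM diffusion and return the final active set A_S.

    nodes   -- iterable of node identifiers
    weights -- dict mapping an edge (u, v) to its weight b_{u,v}
    seeds   -- the seed set S
    rng     -- optional random.Random instance (hypothetical parameter)
    """
    rng = rng or random.Random()
    nodes = list(nodes)
    # Thresholds are not part of the input: draw them afresh for every run.
    threshold = {v: rng.random() for v in nodes}
    incoming = {v: [] for v in nodes}
    for (u, v), b in weights.items():
        incoming[v].append((u, b))

    active = set(seeds)  # A_1 = S
    while True:
        # An inactive node joins if its active in-neighbors' weight reaches its threshold.
        newly_active = {
            v for v in nodes
            if v not in active
            and sum(b for u, b in incoming[v] if u in active) >= threshold[v]
        }
        if not newly_active:  # A_k = A_{k-1}: the process stops
            return active
        active |= newly_active

# Usage: a star whose center v1 points to four leaves with weight 1.
star_nodes = ["v1", "v2", "v3", "v4", "v5"]
star_weights = {("v1", v): 1.0 for v in star_nodes[1:]}
print(sorted(simulate_ltm(star_nodes, star_weights, {"v1"})))
```

Because the thresholds are random, repeated calls on the same graph may return different active sets, except in degenerate cases such as the weight-1 star used here.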

Kempe et al. [7] defined the IM problem as follows: given a graph $G = (V, E)$ and a budget $B \le |V|$, find a set of seed nodes $S \subseteq V$ with $|S| \le B$ so that the expected value of $|A_S|$ is maximized. They proved that the problem is *NP*-hard under LTM. Moreover, they showed that a greedy algorithm can solve the problem approximately within a factor of $1 - \frac{1}{e} - \epsilon$, where $\epsilon$ is any small fixed constant.
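The greedy strategy mentioned above repeatedly adds the node with the largest marginal gain in expected spread. The sketch below assumes the caller supplies an estimator `estimate_spread` for the expected size of $A_S$ (in practice typically a Monte Carlo average over many simulated diffusions); the names `greedy_seeds`, `estimate_spread`, and the toy graph are hypothetical and chosen only for illustration.

```python
def greedy_seeds(nodes, budget, estimate_spread):
    """Pick up to `budget` seeds, each time adding the best marginal node."""
    seeds = set()
    for _ in range(budget):
        candidates = [v for v in nodes if v not in seeds]
        if not candidates:
            break
        # Choose the node whose addition yields the largest estimated spread.
        best = max(candidates, key=lambda v: estimate_spread(seeds | {v}))
        seeds.add(best)
    return seeds

# Toy estimator: every edge has weight 1 and every node has at most one
# incoming edge, so the spread is exactly the number of reachable nodes.
toy_edges = {"a": ["b", "c"], "b": [], "c": ["d"], "d": [], "e": ["f"], "f": []}

def reachable_count(seed_set):
    seen, stack = set(seed_set), list(seed_set)
    while stack:
        u = stack.pop()
        for w in toy_edges[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)

print(sorted(greedy_seeds(list(toy_edges), 2, reachable_count)))
```

Passing the estimator as a function keeps the greedy loop independent of the diffusion model, so the same loop can be used with an LTM or an ICM simulator.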

#### *2.2. Independent Cascade Model*

Consider a graph $G = (V, E)$ with a weight $b_{u,v} \in [0, 1]$ on each edge $(u, v) \in E$. As in LTM, all nodes are initially inactive, and at the first step the seed nodes $S \subseteq V$ become active. Let us define $S_i$ as the set of nodes that were inactive at step $i - 1$ and became active at step $i$; then $S_1 = S$. At each step $i > 1$, each node $v \in S_{i-1}$ tries to activate its outgoing neighbors with the probability of the edge between them. In other words, let $N_v^o$ be the set of outgoing neighbors of node $v$; for each $u \in N_v^o$, node $v$ tries to activate $u$ with probability $b_{v,u}$. If $v$ has multiple outgoing neighbors, it tries to activate them in an arbitrary order. Note that a node becomes active once, say at step $k$, and tries to activate its outgoing neighbors exactly once, at step $k + 1$.
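A single run of this cascade can be sketched as follows. The function name `simulate_icm`, the adjacency-list encoding, and the `rng` parameter are assumptions made for this illustration rather than part of the original model description.

```python
import random

def simulate_icm(out_edges, seeds, rng=None):
    """Run one ICM diffusion and return the final active set A_S.

    out_edges -- dict mapping v to a list of (u, b_{v,u}) pairs
    seeds     -- the seed set S
    rng       -- optional random.Random instance (hypothetical parameter)
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)  # S_1 = S
    while frontier:
        next_frontier = []
        for v in frontier:
            # v gets exactly one chance to activate each outgoing neighbor.
            for u, b in out_edges.get(v, []):
                if u not in active and rng.random() < b:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier  # S_{i+1}: nodes activated in this step
    return active

# Usage: the center of a star activates each leaf with probability 0.5.
star = {"v1": [("v2", 0.5), ("v3", 0.5), ("v4", 0.5), ("v5", 0.5)]}
print(sorted(simulate_icm(star, {"v1"})))
```

Only the nodes activated in the previous step are kept in the frontier, which enforces the rule that each node tries to activate its neighbors exactly once.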

Kempe et al. [7] considered the IM problem under ICM. They showed that the greedy algorithm works for this model, too. They also demonstrated that it is *NP*-hard to approximate the problem within any factor better than $1 - \frac{1}{e}$.

#### **3. Multi-Winner Election Control: Models and Objective Functions**

In this section, we consider the Multi-Winner Election Control, where some parties are running for an election in which more than one candidate will be elected, as in a parliamentary election. We consider $t$ different parties $C_1, \ldots, C_t$, each of which contains $k$ different candidates, i.e., $C_i = \{c_1^i, \ldots, c_k^i\}$, $1 \le i \le t$. We use $C$ for the set of all candidates, i.e., $C = \bigcup_{i=1}^{t} C_i$. Furthermore, without loss of generality, we assume $C_1$ is our target party. Note that there will be exactly $k$ winners for the election.

#### *3.1. Multi-Winner Election Control under LTM*

In this model, we investigate the case in which the adversary does not know the preference lists of the voters; instead, for each voter, the attacker has a probability distribution over all candidates. This model is similar to the model known as probabilistic linear threshold ranking (PLTR) defined in [8]. Since most voters do not reveal their preferences on social media, this is a realistic assumption.

The adversary tries to maximize/minimize the number of winners in his target party. For each node $v \in V$, we denote by $\pi_v$ the probability distribution of the voter/node $v$ over all candidates; we define $\pi_v(c)$ as the probability that the voter $v$ votes for a specific candidate $c \in C$. Then for every node $v \in V$ and candidate $c \in C$ we have $\pi_v(c) \in [0, 1]$, and $\sum_{c \in C} \pi_v(c) = 1$.

In LTM, each node has an incoming influence, which shows the amount of pressure from incoming neighbors to support/oppose a target party. We use this incoming influence of node $v \in V$ to change its probability distribution. Let us define $\tilde{\pi}_v$ as the probability distribution of node $v$ after $S$. Accordingly, $\tilde{\pi}_v(c)$ is the probability that node $v$ will vote for candidate $c \in C$ after $S$. We use $A_S$ to denote the set of nodes that will become active after $S$.

We consider a single message which spreads among the voters. The message contains some constructive/destructive information targeting all candidates in the target party. When a node $v$ becomes active, its probability distribution changes according to the incoming influence from its activated neighbors. We have to normalize the vector in order to make sure that the sum of the probabilities is equal to one after $S$. For the constructive model, the probability distribution of a node $v \in A_S$ changes as follows.

$$\forall c \in C_1: \tilde{\pi}_v(c) = \frac{\pi_v(c) + \frac{1}{|C_1|} \sum_{u \in A_S \cap N_v^i} b_{u,v}}{1 + \sum_{u \in A_S \cap N_v^i} b_{u,v}},$$

$$\forall c \in C \setminus C_1: \tilde{\pi}_v(c) = \frac{\pi_v(c)}{1 + \sum_{u \in A_S \cap N_v^i} b_{u,v}}.$$

Recall that $N_v^i$ is the set of incoming neighbors of node $v$. Furthermore, considering the destructive case, the probability distribution of an active node $v \in A_S$ changes as follows.

$$\forall c \in C_1: \tilde{\pi}_v(c) = \frac{\pi_v(c)}{1 + \sum_{u \in A_S \cap N_v^i} b_{u,v}},$$

$$\forall c \in C \setminus C_1: \tilde{\pi}_v(c) = \frac{\pi_v(c) + \frac{1}{|C \setminus C_1|} \sum_{u \in A_S \cap N_v^i} b_{u,v}}{1 + \sum_{u \in A_S \cap N_v^i} b_{u,v}}.$$

By these changes (and normalization), we guarantee that the sum of the probabilities for each node is equal to 1. In both constructive and destructive cases, the probability distribution of inactive nodes $v \in V \setminus A_S$ will not change after $S$, i.e., $\tilde{\pi}_v = \pi_v$.
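The two update rules share one pattern: the incoming influence is split evenly among the boosted candidates (the target party in the constructive case, all other candidates in the destructive case), and the whole vector is then divided by one plus the influence. A minimal sketch follows, assuming the total influence from active in-neighbors has already been summed; the function name `update_distribution` and its arguments are hypothetical, and the boosted set is assumed to be non-empty.

```python
from fractions import Fraction

def update_distribution(pi, target, influence, constructive=True):
    """Return the updated distribution of an active node.

    pi        -- dict mapping each candidate to its probability
    target    -- set of candidates in the target party C_1
    influence -- sum of b_{u,v} over the active incoming neighbors of v
    """
    # Candidates that receive the extra probability mass.
    boosted = set(target) if constructive else set(pi) - set(target)
    share = Fraction(influence) / len(boosted)
    norm = 1 + Fraction(influence)
    return {
        c: ((p + share) if c in boosted else p) / norm
        for c, p in pi.items()
    }

# Usage with exact fractions, so that the normalization can be checked exactly.
pi = {"c11": Fraction(1, 8), "c12": Fraction(1, 8),
      "c21": Fraction(3, 8), "c22": Fraction(3, 8)}
after = update_distribution(pi, {"c11", "c12"}, influence=1)
print(after, sum(after.values()))
```

Using `Fraction` instead of floats is a choice made for this example only; it makes it easy to confirm that the updated probabilities sum to exactly one.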

Let us define the expected number of votes for candidate $c \in C$ after $S$ as $\mathcal{F}(c, S) = \mathbb{E}_{A_S}\left[\sum_{v \in V} \tilde{\pi}_v(c)\right]$; similarly, $\mathcal{F}(c, \emptyset) = \mathbb{E}\left[\sum_{v \in V} \pi_v(c)\right]$ is the expected number of votes for candidate $c \in C$ before any diffusion.

**Example 1.** *Assume there are two parties supporting two candidates each, i.e., $C = C_1 \cup C_2$, $C_1 = \{c_1^1, c_2^1\}$, $C_2 = \{c_1^2, c_2^2\}$. There are five nodes in the given graph $G = (V, E)$, where their connections form a star and the weight of all edges is one, i.e., $(v_1, v_2), (v_1, v_3), (v_1, v_4), (v_1, v_5) \in E$, $b_{v_1,v_2} = b_{v_1,v_3} = b_{v_1,v_4} = b_{v_1,v_5} = 1$. Let us write the probability distribution of each node $v \in V$ as $\pi_v = \left(\pi_v(c_1^1), \pi_v(c_2^1), \pi_v(c_1^2), \pi_v(c_2^2)\right)$. We set the probability distribution of all nodes to $\left(\frac{1}{8}, \frac{1}{8}, \frac{3}{8}, \frac{3}{8}\right)$. Then before any diffusion, the candidates' scores are*

$$\begin{aligned} \mathcal{F}(c_1^1, \emptyset) &= \mathcal{F}(c_2^1, \emptyset) = \frac{5}{8}, \\ \mathcal{F}(c_1^2, \emptyset) &= \mathcal{F}(c_2^2, \emptyset) = \frac{15}{8}, \end{aligned}$$

*and both of our target candidates have a lower score than their opponents. Consider the constructive model in which the adversary's budget is one, i.e., he can select one node to influence the voters and change their opinion. Since the node $v_1 \in V$ is the most influential node in the graph, the adversary selects it as his seed node. It activates all nodes in the graph, and their probability distributions are updated as follows.*

$$\begin{aligned} \tilde{\pi}_{v_1} &= \left(\frac{1}{8}, \frac{1}{8}, \frac{3}{8}, \frac{3}{8}\right), \\ \tilde{\pi}_{v_2} = \tilde{\pi}_{v_3} = \tilde{\pi}_{v_4} = \tilde{\pi}_{v_5} &= \left(\frac{5}{16}, \frac{5}{16}, \frac{3}{16}, \frac{3}{16}\right), \end{aligned}$$

*and the expected number of votes for the candidates is*

$$\begin{aligned} \mathcal{F}(c_1^1, S) &= \mathcal{F}(c_2^1, S) = \frac{11}{8}, \\ \mathcal{F}(c_1^2, S) &= \mathcal{F}(c_2^2, S) = \frac{9}{8}, \end{aligned}$$

*and our target candidates' score is more than their opponents' score.*
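The numbers in Example 1 can be checked by hand. The seed $v_1$ has no incoming edges, so its distribution is unchanged, while each leaf receives total influence $b_{v_1,v} = 1$ from the active seed. A worked derivation for a leaf $v \in \{v_2, \ldots, v_5\}$ and for the resulting scores, using only the constructive update rule of this section:

$$\begin{aligned} \tilde{\pi}_{v}(c_1^1) &= \frac{\frac{1}{8} + \frac{1}{2} \cdot 1}{1 + 1} = \frac{5}{16}, & \tilde{\pi}_{v}(c_1^2) &= \frac{\frac{3}{8}}{1 + 1} = \frac{3}{16}, \\ \mathcal{F}(c_1^1, S) &= \frac{1}{8} + 4 \cdot \frac{5}{16} = \frac{11}{8}, & \mathcal{F}(c_1^2, S) &= \frac{3}{8} + 4 \cdot \frac{3}{16} = \frac{9}{8}. \end{aligned}$$

Since every edge has weight one and thresholds lie in $[0, 1]$, every leaf is activated with certainty, so no expectation over $A_S$ is needed here.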

#### *3.2. Multi-Winner Election Control under ICM*

Our model is similar to the work presented in [9]. We briefly describe the model below. In this model, unlike under LTM, we assume that the attacker knows the voters' preference lists. Each voter $v \in V$ has a preference list $\pi_v$. Abusing notation, $1 \le \pi_v(c) \le tk$ is the rank of candidate $c$ in the preference list of voter $v$. After the diffusion, inactive voters keep their original opinions, i.e., $\forall v \in V \setminus A_S: \tilde{\pi}_v = \pi_v$; however, the activated voters change their preference lists as follows. Recall that $A_S$ is the set of activated nodes after $S$.

• Constructive: For each node $v \in A_S$ and for each target candidate $c \in C_1$, the new position of $c$ in $\tilde{\pi}_v$ is

$$
\tilde{\pi}_v(c) = \begin{cases}
\pi_v(c) - 1 & \text{if } \exists\, c' \in C \setminus C_1 \text{ s.t. } \pi_v(c') < \pi_v(c), \\
\pi_v(c) & \text{otherwise;}
\end{cases}
$$

also, for other candidates $c \in C \setminus C_1$, if there is a candidate $c' \in C \setminus C_1$ s.t. $\pi_v(c') = \pi_v(c) + 1$, then we set $\tilde{\pi}_v(c) = \pi_v(c)$; otherwise the new rank of the candidate $c$ will be calculated as follows.

$$\tilde{\pi}_v(c) = \pi_v(c) + \left| \left\{ c'' \in C_1 \mid \pi_v(c'') > \pi_v(c) \land \left( \nexists\, \bar{c} \in C \setminus C_1 : \pi_v(c) < \pi_v(\bar{c}) < \pi_v(c'') \right) \right\} \right|.$$

• Destructive: For each node $v \in A_S$ and for each target candidate $c \in C_1$, we have

$$\tilde{\pi}_v(c) = \begin{cases} \pi_v(c) + 1 & \text{if } \exists\, c' \in C \setminus C_1 \text{ s.t. } \pi_v(c') > \pi_v(c), \\ \pi_v(c) & \text{otherwise,} \end{cases}$$

while for $c \in C \setminus C_1$, if there exists a candidate $c' \in C \setminus C_1$ s.t. $\pi_v(c') = \pi_v(c) - 1$ we set $\tilde{\pi}_v(c) = \pi_v(c)$, otherwise we have

$$\tilde{\pi}_v(c) = \pi_v(c) - \left| \left\{ c'' \in C_1 \mid \pi_v(c'') < \pi_v(c) \land \left( \nexists\, \bar{c} \in C \setminus C_1 : \pi_v(c'') < \pi_v(\bar{c}) < \pi_v(c) \right) \right\} \right|.$$
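The rank updates above can be sketched as a function that maps an ordered preference list to the updated one. This is an illustrative reading of the rules, not the authors' code: the name `update_ranking`, the list-based encoding, and labels such as `"p1c2"` (standing for candidate 2 of party 1) are assumptions, and the target set is assumed to contain only candidates that appear in the list.

```python
def update_ranking(order, target, constructive=True):
    """Return the preference list of an active voter after the update.

    order  -- candidates from most to least preferred
    target -- set of candidates in the target party C_1
    """
    rank = {c: i + 1 for i, c in enumerate(order)}
    opponents = [c for c in order if c not in target]

    def opponent_between(lo, hi):
        # True if some opponent is ranked strictly between positions lo and hi.
        return any(lo < rank[o] < hi for o in opponents)

    new_rank = {}
    for c in order:
        r = rank[c]
        if c in target:
            if constructive:
                # Move up one place if any opponent is ranked ahead.
                moves = any(rank[o] < r for o in opponents)
                new_rank[c] = r - 1 if moves else r
            else:
                # Move down one place if any opponent is ranked behind.
                moves = any(rank[o] > r for o in opponents)
                new_rank[c] = r + 1 if moves else r
        elif constructive:
            # Targets below c with no opponent in between jump over c.
            # The count is zero when another opponent sits directly below c.
            jumped = sum(1 for t in target
                         if rank[t] > r and not opponent_between(r, rank[t]))
            new_rank[c] = r + jumped
        else:
            jumped = sum(1 for t in target
                         if rank[t] < r and not opponent_between(rank[t], r))
            new_rank[c] = r - jumped
    return sorted(order, key=new_rank.get)

# Usage: one opponent is ranked ahead of both target candidates.
print(update_ranking(["p2c1", "p1c2", "p1c1", "p2c2"], {"p1c1", "p1c2"}))
```

The special case in the text (an opponent directly adjacent to $c$) needs no separate branch in this sketch, because the counted set is then empty and the rank is left unchanged.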

In this article, we consider the plurality scoring rule for simplicity, where just the most preferred candidate of each voter gets one point. However, the results can be extended to any non-increasing scoring function, e.g., $k$-approval, anti-plurality, and Borda's rule [25]. Let us denote by $\mathcal{F}(c, \emptyset)$ and $\mathcal{F}(c, S)$ the expected score of candidate $c$ before and after $S$, respectively; formally, $\forall c \in C: \mathcal{F}(c, \emptyset) = \sum_{v \in V} \mathbf{1}_{\pi_v(c) = 1}$, $\mathcal{F}(c, S) = \mathbb{E}_{A_S}\left[\sum_{v \in V} \mathbf{1}_{\tilde{\pi}_v(c) = 1}\right]$. (If we want to generalize the problem and consider any non-increasing scoring function $g(\cdot)$, the functions would be defined as $\mathcal{F}(c, \emptyset) = \sum_{v \in V} g(\pi_v(c))$, $\mathcal{F}(c, S) = \mathbb{E}_{A_S}\left[\sum_{v \in V} g(\tilde{\pi}_v(c))\right]$.)
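For a fixed set of preference lists (that is, without the expectation over $A_S$), the score is a plain sum of $g$ applied to each voter's rank of the candidate. The sketch below is an illustration under that simplification; the function name `score`, the candidate labels, and the specific Borda variant $g(r) = 4 - r$ are assumptions of this example. The preference lists are those used in Example 2 before any diffusion.

```python
def score(candidate, rankings, g=lambda rank: 1 if rank == 1 else 0):
    """Sum g(rank of candidate) over all voters; the default g is plurality."""
    return sum(g(order.index(candidate) + 1) for order in rankings)

# Preference lists before diffusion, most preferred candidate first.
# "p1c2" is a hypothetical label for candidate 2 of party 1.
rankings = [
    ["p1c1", "p1c2", "p2c1", "p2c2"],
    ["p2c1", "p1c2", "p1c1", "p2c2"],
    ["p2c2", "p2c1", "p1c1", "p1c2"],
    ["p2c1", "p1c2", "p1c1", "p2c2"],
    ["p2c2", "p1c1", "p2c1", "p1c2"],
]

plurality = {c: score(c, rankings) for c in rankings[0]}
# A Borda-style non-increasing rule for four candidates: rank r earns 4 - r points.
borda = {c: score(c, rankings, g=lambda rank: 4 - rank) for c in rankings[0]}
print(plurality)
print(borda)
```

Swapping the function `g` is all that is needed to move between scoring rules, which mirrors the generalization stated in the text.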

**Example 2.** *Consider the graph $G$ and candidates $C$ in Example 1. Let us set the voters' preference lists as follows.*

$$\begin{aligned} \pi_{v_1} &= c_1^1 \succ c_2^1 \succ c_1^2 \succ c_2^2, \\ \pi_{v_2} &= c_1^2 \succ c_2^1 \succ c_1^1 \succ c_2^2, \\ \pi_{v_3} &= c_2^2 \succ c_1^2 \succ c_1^1 \succ c_2^1, \\ \pi_{v_4} &= c_1^2 \succ c_2^1 \succ c_1^1 \succ c_2^2, \\ \pi_{v_5} &= c_2^2 \succ c_1^1 \succ c_1^2 \succ c_2^1, \end{aligned}$$

*where a ≻ b means that a is preferred to b. The candidates' scores before any diffusion are*

$$\begin{aligned} \mathcal{F}(c_1^1, \emptyset) &= 1, \\ \mathcal{F}(c_2^1, \emptyset) &= 0, \\ \mathcal{F}(c_1^2, \emptyset) &= \mathcal{F}(c_2^2, \emptyset) = 2, \end{aligned}$$

*and before any diffusion, both of our target candidates have lower scores than their opponents. Consider the constructive case where the adversary's budget is one. As in Example 1, the adversary selects the node $v_1$ as a seed node, and it activates all nodes in the graph. After S, the voters update their preference lists as follows.*

$$\begin{aligned} \pi_{v_1} &= c_1^1 \succ c_2^1 \succ c_1^2 \succ c_2^2, \\ \pi_{v_2} &= c_2^1 \succ c_1^1 \succ c_1^2 \succ c_2^2, \\ \pi_{v_3} &= c_2^2 \succ c_1^1 \succ c_2^1 \succ c_1^2, \\ \pi_{v_4} &= c_2^1 \succ c_1^1 \succ c_1^2 \succ c_2^2, \\ \pi_{v_5} &= c_1^1 \succ c_2^2 \succ c_2^1 \succ c_1^2, \end{aligned}$$

*and the candidates' score will be as follows.*

$$\begin{aligned} \mathcal{F}(c_1^1, S) &= \mathcal{F}(c_2^1, S) = 2, \\ \mathcal{F}(c_1^2, S) &= 0, \\ \mathcal{F}(c_2^2, S) &= 1, \end{aligned}$$

*and both of the target candidates receive more votes than their opponents.*
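The plurality scores of Example 2 can be reproduced with a few lines of code. The sketch below is only illustrative: the string labels (`"c1^1"` for candidate 1 of party 1, and so on), the voter keys, and the helper name `plurality_scores` are assumptions introduced here, and the diffusion is taken to be deterministic, as the example states that the seed activates every node.

```python
from collections import Counter

# Hypothetical labels: "c{i}^{j}" stands for candidate i of party j.
CANDIDATES = ["c1^1", "c2^1", "c1^2", "c2^2"]


def plurality_scores(preference_lists):
    """One point per voter for the candidate ranked first in that voter's list."""
    firsts = Counter(ranking[0] for ranking in preference_lists.values())
    return {c: firsts.get(c, 0) for c in CANDIDATES}


# Preference lists of the five voters before the diffusion (Example 2).
before = {
    "v1": ["c1^1", "c2^1", "c1^2", "c2^2"],
    "v2": ["c1^2", "c2^1", "c1^1", "c2^2"],
    "v3": ["c2^2", "c1^2", "c1^1", "c2^1"],
    "v4": ["c1^2", "c2^1", "c1^1", "c2^2"],
    "v5": ["c2^2", "c1^1", "c1^2", "c2^1"],
}

# Preference lists after all five voters have been activated.
after = {
    "v1": ["c1^1", "c2^1", "c1^2", "c2^2"],
    "v2": ["c2^1", "c1^1", "c1^2", "c2^2"],
    "v3": ["c2^2", "c1^1", "c2^1", "c1^2"],
    "v4": ["c2^1", "c1^1", "c1^2", "c2^2"],
    "v5": ["c1^1", "c2^2", "c2^1", "c1^2"],
}

scores_before = plurality_scores(before)
scores_after = plurality_scores(after)
```

Only the first entry of each list matters under plurality; a different non-increasing scoring function would replace the `Counter` over first choices with a sum of position-dependent points.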

#### *3.3. Objective Functions*

In this paper, our goal is to maximize/minimize the number of winners from our target party, so the objective functions are the same as in [9]. Considering both the IC and LT models, we define $\mathcal{F}(C_1, S)$ as the number of candidates in $C_1$ that are among the winners. Formally, consider a set of activated nodes $A_S$, i.e., the nodes that became active after *S*. Let us define $\mathcal{F}_{A_S}(c)$ as the expected number of votes that candidate *c* receives when $A_S$ is the set of activated nodes. We set $\mathcal{Y}_{A_S}(c)$ as the number of candidates $c' \in C \setminus \{c\}$ whose expected number of votes is less than that of *c*. To account for the tie-breaking rule, if $\mathcal{F}_{A_S}(c_i^j) = \mathcal{F}_{A_S}(c_{i'}^{j'})$, then $c_i^j$ has priority over $c_{i'}^{j'}$ if $j < j'$, or $j = j' \wedge i < i'$. Then $\mathcal{Y}_{A_S}(c)$ is defined as

$$\mathcal{Y}_{A_S}(c_i^j) = \left| \left\{ c_{i'}^{j'} \in C \;\middle|\; \mathcal{F}_{A_S}(c_i^j) > \mathcal{F}_{A_S}(c_{i'}^{j'}) \vee \left( \mathcal{F}_{A_S}(c_i^j) = \mathcal{F}_{A_S}(c_{i'}^{j'}) \wedge \left( j < j' \vee (j = j' \wedge i < i') \right) \right) \right\} \right|.$$

By this definition, we define $\mathcal{F}(C_1, S)$ as the expected number of winners from party $C_1$, i.e., $\mathcal{F}(C_1, S) = \mathbb{E}_{A_S}\left[\sum_{c \in C_1} \mathbb{1}_{\mathcal{Y}_{A_S}(c) \ge (t-1)k}\right]$.
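For a fixed set of activated nodes, the count $\mathcal{Y}$ and the resulting number of winners per party can be sketched as follows. The encoding of a candidate as a pair `(i, j)` (candidate index, party index) and the function names are assumptions made for this illustration. A candidate is counted as a winner when it beats at least $(t-1)k$ of the other candidates, i.e., when it is among the top *k* of the $tk$ candidates; this threshold is our reading of the definition above. The scores are those of Example 2, where the diffusion is deterministic.

```python
def beaten_count(scores, cand):
    """Y(c): number of candidates that `cand` beats, including tie-breaks.

    Candidates are (i, j) pairs: candidate index i within party j.
    Ties go to the lower party index, then to the lower candidate index.
    """
    i, j = cand
    count = 0
    for (i2, j2), other_score in scores.items():
        if (i2, j2) == cand:
            continue
        wins_outright = scores[cand] > other_score
        wins_tie = scores[cand] == other_score and (j < j2 or (j == j2 and i < i2))
        if wins_outright or wins_tie:
            count += 1
    return count


def winners_in_party(scores, party, t, k):
    """Number of candidates of `party` that beat at least (t - 1) * k others."""
    threshold = (t - 1) * k
    return sum(
        1 for c in scores
        if c[1] == party and beaten_count(scores, c) >= threshold
    )


# Scores from Example 2 with t = 2 parties and k = 2 candidates per party.
example_before = {(1, 1): 1, (2, 1): 0, (1, 2): 2, (2, 2): 2}
example_after = {(1, 1): 2, (2, 1): 2, (1, 2): 0, (2, 2): 1}
```

With these inputs, the target party has no winner before the diffusion and two winners after it, which matches the discussion in Example 2. In the general model the outer expectation over $A_S$ would still have to be taken, for instance by averaging this count over sampled diffusions.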

Now, let us define the first objective function, the Difference of Winners (DoW), which is the difference between the number of winners in our target party before and after *S*. Formally, in the constructive (resp., destructive) model we define $\text{DoW}_c$ (resp., $\text{DoW}_d$) as

$$\begin{aligned} \text{DoW}_c(C_1, S) &= \mathcal{F}(C_1, S) - \mathcal{F}(C_1, \emptyset), \\ \text{DoW}_d(C_1, S) &= \mathcal{F}(C_1, \emptyset) - \mathcal{F}(C_1, S). \end{aligned}$$

The constructive difference of winners (CDW) problem asks for a set of seed nodes *S* ($|S| \le B$) that maximizes $\text{DoW}_c(C_1, S)$. Similarly, destructive difference of winners (DDW) refers to the problem of finding a set of seed nodes *S* ($|S| \le B$) that maximizes $\text{DoW}_d(C_1, S)$.

As the second objective function, we define a more compelling one called Margin of Victory (MoV). For the constructive case, we define it as DoW plus the change in the number of winners of the strongest opponent party, i.e., the number of winners of the opponent party with the most winners before *S* minus that of the opponent party with the most winners after *S*. Formally, for the constructive (resp., destructive) case, we define $\text{MoV}_c$ (resp., $\text{MoV}_d$) as

$$\begin{split} \text{MoV}_c(C_1, S) &= \mathcal{F}(C_1, S) - \mathcal{F}(C_A^S, S) - \left(\mathcal{F}(C_1, \emptyset) - \mathcal{F}(C_B, \emptyset)\right), \\ \text{MoV}_d(C_1, S) &= \mathcal{F}(C_1, \emptyset) - \mathcal{F}(C_B, \emptyset) - \left(\mathcal{F}(C_1, S) - \mathcal{F}(C_A^S, S)\right), \end{split}$$

where $C_B$ and $C_A^S$ are the opponent parties with the most winners before and after *S*, respectively.

The constructive margin of victory (CMV) problem asks for a set of seed nodes *S* ($|S| \le B$) that maximizes $\text{MoV}_c(C_1, S)$. Similarly, destructive margin of victory (DMV) refers to the problem of finding a set of seed nodes *S* ($|S| \le B$) that maximizes $\text{MoV}_d(C_1, S)$.
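As a quick sanity check of these definitions, the scores of Example 2 can be plugged in. The calculation is only a sketch under stated assumptions: the diffusion in Example 2 is deterministic (so the expectations reduce to counts), there are *t* = 2 parties with *k* = 2 winners, ties are broken in favor of the lower party and candidate index, and, with a single opponent party, $C_B = C_A^S = C_2$. Before the diffusion the two winners are $c_1^2$ and $c_2^2$, so $\mathcal{F}(C_1, \emptyset) = 0$ and $\mathcal{F}(C_2, \emptyset) = 2$; after it the winners are $c_1^1$ and $c_2^1$, so $\mathcal{F}(C_1, S) = 2$ and $\mathcal{F}(C_2, S) = 0$. Hence

$$\begin{aligned} \text{DoW}_c(C_1, S) &= 2 - 0 = 2, \\ \text{MoV}_c(C_1, S) &= (2 - 0) - (0 - 2) = 4. \end{aligned}$$

In this instance MoV counts both the two seats gained by the target party and the two seats lost by its opponent.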

#### **4. Multi-Winner Election Control on Graph under LTM**

It has been proven that the problem is *NP*-hard to approximate within any factor under ICM [9]. In this section, we prove the same statement under LTM.

#### **Theorem 1.** *It is NP-hard to approximate* CMV *and* CDW *within any factor on a given graph under LTM.*

**Proof.** Let us reduce the vertex cover (VC) problem to any approximation algorithm for CDW (resp., CMV). In VC, we are given an undirected graph *G* = (*V*, *E*) and an integer *k*; the decision question is: Is there a set of nodes $V' \subseteq V$ ($|V'| \le k$) such that for each edge (*u*, *v*) ∈ *E*, at least one of its endpoints is in $V'$? Assume $\mathcal{I}(G, B)$ is a given instance of the VC problem, where *G* = (*V*, *E*) is the given graph and *B* is an integer value. We create an instance $\mathcal{I}'(G', B)$ for CDW (resp., CMV) so that $G' = (V \cup V' \cup V'', E')$ is the graph built from *G*, and *B* is also the budget for our problem. Let us consider a case where there are two parties and four candidates, i.e., *t* = *k* = 2, $C = C_1 \cup C_2$, $C_1 = \{c_1^1, c_2^1\}$, $C_2 = \{c_1^2, c_2^2\}$. We fix the order of candidates in the probability distribution of the voter *v* as $\pi_v = (\pi_v(c_1^1), \pi_v(c_2^1), \pi_v(c_1^2), \pi_v(c_2^2))$, and build $G'$ as follows.


$$\begin{aligned} \forall v \in V, \pi\_{\upsilon} &= (\frac{1}{2}, \frac{1}{2}, 0, 0), \\ \forall v' \in V', \pi\_{\upsilon'} &= (\frac{1}{2}, 0, \frac{1}{2}, 0), \\ \forall v'' \in V'', \pi\_{\upsilon''} &= (0, 0, \frac{1}{2}, \frac{1}{2}). \end{aligned}$$

By this reduction, the scores of the candidates before any diffusion are $\mathcal{F}(c_1^1, \emptyset) = \mathcal{F}(c_1^2, \emptyset) = |V|$ and $\mathcal{F}(c_2^1, \emptyset) = \mathcal{F}(c_2^2, \emptyset) = \frac{1}{2}|V|$. Then $\mathcal{F}(C_1, \emptyset) = \mathcal{F}(C_2, \emptyset) = 1$.

Note that in this reduction a node *v* will become active deterministically, if either it is selected as a seed node, or all of its incoming neighbors are selected as the seed nodes. Then if we can find a set of seed nodes *S* ⊆ *V* so that it activates all nodes in *V* deterministically, the seed set *S* is also an answer for the corresponding VC problem.

In any approximation algorithm, we may assume that *S* ⊆ *V*; otherwise, if there is a node $v' \in V' \cap S$, we can replace it with its incoming neighbor *v* ∈ *V* such that $(v, v') \in E'$ and we get at least the same value for $\text{MoV}_c$ and $\text{DoW}_c$. Furthermore, if there exists a node $v'' \in V'' \cap S$, one of the following situations holds:


Then from now on, we assume *S* ⊆ *V*.

If all nodes in *V* become active, since they have an outgoing edge to all nodes $v' \in V'$ with probability one, then all nodes in $V \cup V'$ will become active, and the scores of the candidates will be as follows.

$$\begin{aligned} \mathcal{F}(c_1^1, S) &= |V|, \\ \mathcal{F}(c_2^1, S) &= \mathcal{F}(c_1^2, S) = \frac{3}{4}|V|, \\ \mathcal{F}(c_2^2, S) &= \frac{1}{2}|V|. \end{aligned}$$

Then $\mathcal{F}(C_1, S) = 2$, $\mathcal{F}(C_2, S) = 0$, $\text{DoW}_c(C_1, S) > 0$, $\text{MoV}_c(C_1, S) > 0$, and any approximation algorithm will return a positive value; hence, the answer of $\mathcal{I}$ is YES.

On the other hand, if there is a node *v* ∈ *V* that is inactive after the diffusion, i.e., $\exists v \in V \setminus A_S$, the scores of the candidates will be as follows.

$$\begin{aligned} \mathcal{F}(c_1^1, S) &= |V|, \\ \mathcal{F}(c_2^1, S) &< \frac{3}{4}|V|, \\ \mathcal{F}(c_1^2, S) &> \frac{3}{4}|V|, \\ \mathcal{F}(c_2^2, S) &= \frac{1}{2}|V|. \end{aligned}$$

Then $\mathcal{F}(C_1, S) = \mathcal{F}(C_2, S) = 1$, $\text{DoW}_c(C_1, S) = \text{MoV}_c(C_1, S) = 0$, and any approximation algorithm will return zero; hence, the answer of $\mathcal{I}$ is NO.

For the other direction, note that if we can find a set of nodes *S* ⊆ *V* that is an answer for $\mathcal{I}$, then using the same set of nodes we can activate all nodes in $V \cup V'$, and $\text{DoW}_c(C_1, S) > 0$, $\text{MoV}_c(C_1, S) > 0$.

To extend the proof to any number of parties (*t*) and candidates (*k*), we assign the probability distributions as follows, and the same approach concludes the proof for any *t*, *k* ≥ 2. As before, the order of the candidates in the probability distribution of a voter *v* is $\pi_v = (\pi_v(c_1^1), \dots, \pi_v(c_k^1), \pi_v(c_1^2), \dots, \pi_v(c_k^2), \dots, \pi_v(c_1^t), \dots, \pi_v(c_k^t))$.

$$\begin{aligned} \forall v \in V, \pi_v &= \Big(\overbrace{\tfrac{1}{k}, \tfrac{1}{k}, \dots, \tfrac{1}{k}}^{k}, \overbrace{0, \dots, 0}^{k(t-1)}\Big), \\ \forall v' \in V', \pi_{v'} &= \Big(\overbrace{\tfrac{1}{k}, \dots, \tfrac{1}{k}}^{k-1}, 0, \tfrac{1}{k}, \overbrace{0, \dots, 0}^{k(t-1)-1}\Big), \\ \forall v'' \in V'', \pi_{v''} &= \Big(\overbrace{0, \dots, 0}^{k}, \overbrace{\tfrac{1}{k}, \dots, \tfrac{1}{k}}^{k}, \overbrace{0, \dots, 0}^{k(t-2)}\Big). \end{aligned}$$

The following theorem proves the same statement for the destructive case of the problem.

#### **Theorem 2.** *It is NP-hard to approximate* DMV *and* DDW *within any factor on a given graph under LTM.*

**Proof.** The reduction is similar to the constructive case. Consider the case where *t* = *k* = 2. We set the voters' probability distributions such that one of our target candidates is among the losers before and after any diffusion. Furthermore, the other target candidate is among the winners before any diffusion; however, this candidate loses the election if and only if all nodes in the connected part of the graph become active. Note that, since our target candidates have priority over the others, we need one more node to achieve this.

#### **5. Multi-Winner Election Control on Arborescence under ICM**

In this section, instead of a general graph, we consider an arborescence. We are given a tree *G* = (*V*, *E*) whose edges are directed from the leaves towards the root, and a budget *B*; the diffusion follows ICM. We are asked to find at most *B* seed nodes to maximize $\text{MoV}_c$ and $\text{DoW}_c$.

It has been shown that the problem is inapproximable on a general graph unless *P* = *NP* [9]. Bharathi et al. conjectured that the IM problem under ICM on an arborescence is *NP*-hard [26]. Lu et al. proved that the conjecture is true [27], while Wang et al. showed that the IM problem admits a polynomial-time algorithm on an arborescence under LTM [28]. In the following, we show that our problem is hard to approximate within any factor on an arborescence under ICM.

#### **Theorem 3.** *It is NP-hard to find an approximation algorithm for* CMV *and* CDW *on arborescence under ICM.*

**Proof.** We show the hardness by reducing the IM problem to our problem. Let $\mathcal{I}(T, B)$ be an instance of the IM problem, where *T* = (*V*, *E*) is the tree (arborescence) and *B* is the budget. Let us define the decision version of the problem as follows: Is there a set of at most *B* seed nodes that activates all nodes of the tree in expectation?

We consider the case where there are two parties and each of them has just two candidates, i.e., $C = C_1 \cup C_2$, $C_1 = \{c_1^1, c_2^1\}$, $C_2 = \{c_1^2, c_2^2\}$. Furthermore, for simplicity, we consider the plurality scoring rule. The proof can be extended to any number of parties and candidates using any non-increasing scoring function, akin to [29].

Let us create an instance of our problem $\mathcal{I}'(T', B)$ as follows, where $T' = (V \cup V' \cup V'', E)$ is a tree, and *B* is the same budget for both problems.


$$\begin{aligned} \forall v \in V &: c_1^2 \succ c_2^2 \succ c_1^1 \succ c_2^1, \\ \forall v' \in V' &: c_2^2 \succ c_1^2 \succ c_2^1 \succ c_1^1, \\ \forall v'' \in V'' &: c_1^2 \succ c_1^1 \succ c_2^1 \succ c_2^2. \end{aligned}$$

Clearly, seed nodes will be selected from *V*, i.e., *S* ⊆ *V*; otherwise, if there is a node $v' \in S \cap V'$, then the node is useless and does not affect $\text{DoW}_c$ or $\text{MoV}_c$. If there is a node $v'' \in S \cap V''$, we can replace it with its incoming neighbor and get at least the same value for $\text{DoW}_c$ and $\text{MoV}_c$.

Using the aforementioned polynomial-time reduction, if there exists a set of nodes *S* ⊆ *V* ($|S| \le B$) such that $\text{MoV}_c > 0$ (resp. $\text{DoW}_c > 0$), then these nodes activate all nodes in $V \cup V''$. Hence, we can select the same set, and it activates all nodes in *T*; then the answer of $\mathcal{I}$ is YES. On the other hand, if $\text{MoV}_c = 0$ (resp. $\text{DoW}_c = 0$), no seed set can activate all nodes in $V \cup V''$; then the answer of $\mathcal{I}$ is NO. More formally, before any diffusion the scores of the candidates are

$$\begin{aligned} \mathcal{F}(c_1^1, \emptyset) &= \mathcal{F}(c_2^1, \emptyset) = 0, \\ \mathcal{F}(c_1^2, \emptyset) &= 2|V|, \\ \mathcal{F}(c_2^2, \emptyset) &= |V|. \end{aligned}$$

Then, none of the candidates in our target party will be elected as a winner. After *S*, if there exists an inactive node in $V \cup V''$, then the scores of the candidates will be as follows:

$$\begin{aligned} \mathcal{F}(c_1^1, S) &< |V|, \\ \mathcal{F}(c_2^1, S) &= 0, \\ \mathcal{F}(c_1^2, S) &> |V|, \\ \mathcal{F}(c_2^2, S) &= |V|. \end{aligned}$$

In this case also, none of our target candidates will be among the winners, and $\text{MoV}_c = \text{DoW}_c = 0$. However, if all nodes in $V \cup V''$ become active after *S*, the scores of the candidates will be as follows.

$$\begin{aligned} \mathcal{F}(c_1^1, S) &= |V|, \\ \mathcal{F}(c_2^1, S) &= 0, \\ \mathcal{F}(c_1^2, S) &= |V|, \\ \mathcal{F}(c_2^2, S) &= |V|. \end{aligned}$$

Then one of our target candidates ($c_1^1$) will be elected as a winner, and any approximation algorithm will return $\text{MoV}_c > 0$ (resp. $\text{DoW}_c > 0$). This concludes the proof.

The following theorem demonstrates the same hardness of approximation for the destructive case of our problem.

#### **Theorem 4.** *It is NP-hard to find an approximation algorithm for* DMV *and* DDW *on arborescence under ICM.*

**Proof.** The proof for the destructive case is similar to the constructive one. Considering $\mathcal{I}'$ from Theorem 3, we need to set the preference lists of the nodes so that all of our target candidates win the election before any diffusion; however, after the diffusion, one of them (let us say $c \in C_1$) loses if and only if all nodes in $V \cup V''$ become active. Note that since our target candidates have priority over the others, we need one more isolated node to ensure that *c* loses the election after the diffusion. Following the same approach concludes the proof.

#### **6. Multi-Winner Election Control on Tree Using Straight-Party Voting**

In this section, we consider the problem on a variation of the straight-party voting system (also called straight-ticket voting), in which the voters can vote for a party instead of candidates [30,31]. This model is used in many real elections [32,33]. The multi-winner election control problem via social influence under ICM on a general graph is considered in [9]. The authors showed that the problem is hard, and presented constant-factor approximations for the straight-party voting system. Here, we consider the problem on a tree where the edges are directed from the root to the leaves.

In the rest of this section, we assume that the given tree is a binary tree, as we can convert any tree *T* to a binary tree $T'$ by adding *O*(*n*) fake nodes. Our algorithm can use the fake nodes to navigate the tree, but they neither have a probability distribution (preference list) nor can be selected as seed nodes. To ensure that the fake nodes do not change the diffusion process on the tree, the weight of each incoming edge to a fake node is set to one. Moreover, the weight of an edge from a fake node to an original node is equal to the weight of the original node's incoming edge in *T*.
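One possible way to carry out this conversion is sketched below. It is not taken from the paper: the function name `binarize`, the dictionary-based tree encoding, and the `("fake", n)` labels are all assumptions made for illustration. A node with more than two children keeps its first child and hands the remaining ones to a chain of fake nodes whose incoming edges have weight one, so the product of the weights along any root-to-node path is unchanged.

```python
def binarize(children, weight):
    """Turn a rooted tree into a binary tree by inserting fake nodes.

    children : dict node -> list of child nodes (leaves map to an empty list)
    weight   : dict node -> weight of the edge from its parent to the node
    Returns (binary_children, binary_weight, fake_nodes).
    """
    binary_children = {}
    binary_weight = dict(weight)   # original nodes keep their incoming weight
    fake_nodes = []
    for node, kids in children.items():
        parent, rest = node, list(kids)
        while len(rest) > 2:
            fake = ("fake", len(fake_nodes))    # hypothetical label for a fake node
            fake_nodes.append(fake)
            binary_weight[fake] = 1.0           # edge into a fake node has weight one
            binary_children[parent] = [rest[0], fake]
            parent, rest = fake, rest[1:]
        binary_children[parent] = rest
    return binary_children, binary_weight, fake_nodes


# Hypothetical toy tree: a root with four leaves.
toy_children = {"r": ["a", "b", "c", "d"], "a": [], "b": [], "c": [], "d": []}
toy_weight = {"a": 0.2, "b": 0.3, "c": 0.4, "d": 0.5}
bin_children, bin_weight, fakes = binarize(toy_children, toy_weight)
```

A node with *d* > 2 children receives *d* − 2 fake nodes, so the total number of fake nodes is below the number of nodes of *T*, in line with the *O*(*n*) bound stated above.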

In the following, we present a dynamic programming (DP) algorithm to maximize $\text{DoV}_c^{spv}$ (and $\text{DoV}_d^{spv}$). Given a tree *T* = (*V*, *E*) and a budget *B*, the idea is that for a fixed node *v* ∈ *V* and budget *k* ($0 \le k \le B$), we calculate the maximum outcome from the sub-tree rooted at *v* among the following cases: first, select the node *v* and find the other *k* − 1 seed nodes in the sub-trees of its children; second, do not select *v* and look for *k* seed nodes in the sub-trees of its children.

We define *r*(*v*), *l*(*v*), *f*(*v*), respectively, as the right child, left child, and the parent (father) of the node *v*. In Section 6.1 we consider the problem under LTM, and in Section 6.2 the problem is investigated under ICM.

#### *6.1. Multi-Winner Election Control Using Straight-Party Voting under LTM*

In this section, the voters have preference lists over the candidates. However, they vote for a party with probability proportional to the sum of their probabilities of voting for the candidates in that party. Let us define $\mathcal{F}_{spv}(C_1, \emptyset)$ and $\mathcal{F}_{spv}(C_1, S)$ as the sum of the scores for our target party $C_1$ before and after *S*, respectively. Formally, they are defined as follows.

$$\begin{aligned} \mathcal{F}_{spv}(C_1, \emptyset) &= \mathbb{E}\left[\sum_{v \in V} \sum_{c \in C_1} \pi_v(c)\right], \\ \mathcal{F}_{spv}(C_1, S) &= \mathbb{E}_{A_S}\left[\sum_{v \in V} \sum_{c \in C_1} \tilde{\pi}_v(c)\right]. \end{aligned}$$

As before, we define the objective functions MoV and difference of votes (DoV) for the constructive case as follows.

$$\begin{aligned} \text{DoV}_c^{spv}(C_1, S) &= \mathcal{F}_{spv}(C_1, S) - \mathcal{F}_{spv}(C_1, \emptyset), \\ \text{MoV}_c^{spv}(C_1, S) &= \mathcal{F}_{spv}(C_1, S) - \mathcal{F}_{spv}(C_A^S, S) - \left(\mathcal{F}_{spv}(C_1, \emptyset) - \mathcal{F}_{spv}(C_B, \emptyset)\right), \end{aligned} \tag{1}$$

where $C_B$ and $C_A^S$ are the most voted opponent parties before and after *S*, respectively. For the destructive model, the objective functions are defined as

$$\begin{split} \text{DoV}_d^{spv}(C_1, S) &= \mathcal{F}_{spv}(C_1, \emptyset) - \mathcal{F}_{spv}(C_1, S), \\ \text{MoV}_d^{spv}(C_1, S) &= \mathcal{F}_{spv}(C_1, \emptyset) - \mathcal{F}_{spv}(C_B, \emptyset) - \left(\mathcal{F}_{spv}(C_1, S) - \mathcal{F}_{spv}(C_A^S, S)\right). \end{split} \tag{2}$$

#### 6.1.1. Maximizing DoV in Straight-Party Voting under LTM

We define $F_v$ as the set of possible probabilities with which the node *f*(*v*) may become active. More precisely, consider all nodes on the path from the root to *v*, $F'_v = \{v_0, v_1, \dots, v_t = f(v)\}$ (recall that *f*(*v*) is the parent of *v*). If none of the nodes in $F'_v$ is selected as a seed node, then the probability that *f*(*v*) becomes active through its incoming influence is zero. If only the root ($v_0$) is selected as a seed node, then the probability that *f*(*v*) becomes active is $\prod_{i=0}^{t-1} b_{v_i, v_{i+1}}$; if $v_1$ is selected as a seed node but none of the nodes $v_i$, $2 \le i \le t$, is selected, the probability that *f*(*v*) becomes active is $\prod_{i=1}^{t-1} b_{v_i, v_{i+1}}$, and so on; all these probabilities belong to $F_v$.
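The set of candidate probabilities for one node can be enumerated directly from the edge weights on its root path. The sketch below is illustrative only: the function name and the list encoding of the path are assumptions, and it includes the value 1 for the case where *f*(*v*) itself is a seed (the empty product), which is our reading of "and so on" in the description above.

```python
def parent_activation_probabilities(path_weights):
    """Enumerate the possible activation probabilities of f(v).

    path_weights[i] is the weight b_{v_i, v_{i+1}} of the i-th edge on the
    path v_0 (root), v_1, ..., v_t = f(v); the list is empty when f(v) is the root.
    """
    probabilities = {0.0, 1.0}   # no seed on the path / f(v) itself is a seed
    product = 1.0
    # Move the closest seed one step further from f(v) at each iteration.
    for w in reversed(path_weights):
        product *= w
        probabilities.add(product)
    return sorted(probabilities)


# Hypothetical path root -> v_1 -> f(v) with both edge weights equal to 0.5.
example_probabilities = parent_activation_probabilities([0.5, 0.5])
```

The set has at most one entry per node on the path plus the zero entry, so its size grows linearly with the depth of *v*.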

Let us define $\text{DoV}_c(v, k, S, p)$ as the maximum, over the sub-tree rooted at *v*, of the sum of the differences between the probabilities of voting for our target party after and before *S*, where $p \in F_v$ is the probability that the parent of *v* is active and the budget is *k*. Furthermore, all selected seed nodes are in *S*. In other words, $\text{DoV}_c(v, k, S, p) = \max\{\text{DoV}_c^{spv}(C_1, S)\}$ in the sub-tree rooted at *v*, where *v* becomes active with probability $p \cdot b_{f(v),v}$ and $|S| \le k$. The formal definition of $\text{DoV}_c(v, k, S, p)$ is as follows:

$$\begin{split} \text{DoV}_c(v, k, S, p) = \max\Big\{ &\max_{k'=0}^{k} \big\{ \text{DoV}_c\big(r(v), k', S, p \cdot b_{f(v),v}\big) + \text{DoV}_c\big(l(v), k - k', S, p \cdot b_{f(v),v}\big) \big\} + p \cdot b_{f(v),v} \cdot \mathcal{D}_v, \\ &\max_{k'=0}^{k-1} \big\{ \text{DoV}_c\big(r(v), k', S \cup \{v\}, 1\big) + \text{DoV}_c\big(l(v), k - k' - 1, S \cup \{v\}, 1\big) \big\} + \mathcal{D}_v \Big\}, \end{split} \tag{3}$$

where $\mathcal{D}_v$ is the increase in the score of our target party contributed by the node *v* if it becomes active, which is

$$\mathcal{D}_v = \sum_{c \in C_1} \left( \frac{\pi_v(c) + \frac{1}{|C_1|} \cdot p \cdot b_{f(v),v}}{1 + p \cdot b_{f(v),v}} - \pi_v(c) \right). \tag{4}$$
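Equation (4) can be put in a closed form that is convenient when implementing the DP. Writing $q = p \cdot b_{f(v),v}$ and $M_v = \sum_{c \in C_1} \pi_v(c)$ (both are shorthand introduced here and are not notation from the original text), the terms $\frac{1}{|C_1|} q$ sum to $q$ over the $|C_1|$ candidates of the target party, so that

$$\mathcal{D}_v = \frac{M_v + q}{1 + q} - M_v = \frac{q\,(1 - M_v)}{1 + q}.$$

Since $M_v \le 1$, the gain is non-negative; it vanishes when $q = 0$ or when *v* already votes for the target party with probability one.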

We can calculate and store the values in a two-dimensional array $A[B+1, |V|]$, where the rows are the budgets (from zero to *B*) and the columns are the nodes of the tree in reverse BFS order, and each cell (*i*, *j*) ($0 \le i \le B$, $0 \le j < |V|$) of the array refers to another array $A'[|F_{v_j}|]$. In the worst case, since the budget *B* and $|F_{v_j}|$ (for any $v_j \in V$) are at most $|V|$, we can solve the problem in polynomial time using $O(|V|^3)$ memory. Note that we have to fill the matrix *A* left-to-right and top-down, while for each of its cells we can fill the corresponding array $A'$ in any order.

As the base cases, for each leaf *v* ∈ *V* and $p \in F_v$, if *k* > 0 we set $\text{DoV}_c(v, k, S, p) = \mathcal{D}_v$; otherwise, if *k* = 0, we have $\text{DoV}_c(v, k, S, p) = p \cdot b_{f(v),v} \cdot \mathcal{D}_v$, which is the difference in the probability of voting for our party after and before the diffusion from *S*, contributed by the node *v*. Indeed, if the budget is greater than zero, the node can be selected as a seed and becomes active for sure, so we consider the full difference of scores; if the budget is zero, we cannot select it as a seed node, and the value must be multiplied by the probability that the node becomes active, i.e., $p \cdot b_{f(v),v}$. We also define $\text{DoV}_c(null, k, S, p) = 0$, that is, the value of $\text{DoV}_c$ for a null reference is zero. This is useful when a node has only a left (resp. right) child: the value of the function for its right (resp. left) child is then zero regardless of the other parameters. The pseudo-code of the DP is presented in Algorithm 1, which calculates the maximum $\text{DoV}_c^{spv}$; with small changes, it can find the seed nodes too. Note that the final answer is given by $\text{DoV}_c(v_{root}, B, \emptyset, 0)$, where $v_{root}$ is the root node of the tree, *B* is the budget, ∅ indicates that we have no seed node so far, and 0 means that the parent of the root node is activated with probability zero. The following theorem shows that the DP is correct.

**Algorithm 1**: Calculating the maximum $\text{DoV}_c$ for a given tree *T* and budget *B* when the diffusion model is LTM and the voting system is straight-party voting.

**Procedure** DoV(Tree $T = (V, E)$, Budget $B$)

1. $A \leftarrow [B+1, |V|]$ ▷ a two-dimensional array $A[0..B, 0..|V|-1]$
2. Name all nodes in $V$ from $0$ to $|V|-1$ in reverse BFS order
3. **for** ($j \leftarrow 0$; $j < |V|$; $j \leftarrow j+1$) **do**
   1. $F_{v_j} \leftarrow$ set of all possible probabilities with which $f(v_j)$ may become active
   2. **for** ($i \leftarrow 0$; $i \le B$; $i \leftarrow i+1$) **do** ▷ $i$ and $j$ are the row and column counters, respectively
      1. $A[i, j] \leftarrow \text{Array}[|F_{v_j}|]$ ▷ each cell $(i, j)$ is an array
      2. **if** ($v_j$ is a leaf) **then**
         1. **for** ($p \in F_{v_j}$) **do**
            1. $A[i, j; p] \leftarrow \sum_{c \in C_1} \left( \frac{\pi_{v_j}(c) + \frac{1}{|C_1|} \cdot p \cdot b_{f(v_j),v_j}}{1 + p \cdot b_{f(v_j),v_j}} - \pi_{v_j}(c) \right)$
            2. **if** ($i = 0$) **then** $A[i, j; p] \leftarrow p \cdot b_{f(v_j),v_j} \cdot A[i, j; p]$
         2. **continue**
      3. **for** ($p \in F_{v_j}$) **do** ▷ if $r(v_j)$ or $l(v_j)$ does not exist, $A[\dots, r(v_j) \text{ or } l(v_j); \dots]$ is zero
         1. $\mathcal{D}_v \leftarrow \sum_{c \in C_1} \left( \frac{\pi_{v_j}(c) + \frac{1}{|C_1|} \cdot p \cdot b_{f(v_j),v_j}}{1 + p \cdot b_{f(v_j),v_j}} - \pi_{v_j}(c) \right)$
         2. $max_j \leftarrow \max_{k=0}^{i} \left( A[k, r(v_j); p \cdot b_{f(v_j),v_j}] + A[i-k, l(v_j); p \cdot b_{f(v_j),v_j}] \right)$
         3. $max'_j \leftarrow \max_{k=0}^{i-1} \left( A[k, r(v_j); 1] + A[i-k-1, l(v_j); 1] \right)$
         4. $A[i, j; p] \leftarrow \max\left(max_j + p \cdot b_{f(v_j),v_j} \cdot \mathcal{D}_v,\; max'_j + \mathcal{D}_v\right)$
4. **return** $A[B, |V|-1; 0]$ ▷ the final result for the root node using all budget
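For readers who prefer executable code, the recurrence in Equation (3) can also be written as a memoized recursion instead of the tabular form of Algorithm 1. The sketch below is a literal transcription under several assumptions that are not part of the original: the tree is given by hypothetical dictionaries `left`, `right`, `weight`, and `mass` (where `mass[v]` stands for $\sum_{c \in C_1} \pi_v(c)$), the seed set *S* is dropped because it only records the chosen seeds and does not affect the value, and fake nodes are not handled.

```python
from functools import lru_cache


def max_dov_c(left, right, weight, mass, root, budget):
    """Maximum DoV_c on a binary tree, following recurrence (3).

    left, right : dict node -> child (a missing key means no child)
    weight      : dict node -> b_{f(v),v}, weight of the edge from the parent
    mass        : dict node -> sum of pi_v(c) over the target party C_1
    """

    def gain(v, q):
        # Equation (4) summed over C_1, with q = p * b_{f(v),v}.
        m = mass[v]
        return (m + q) / (1.0 + q) - m

    @lru_cache(maxsize=None)
    def dov(v, k, p):
        if v is None:                        # null reference contributes nothing
            return 0.0
        q = p * weight.get(v, 0.0)           # probability that the parent activates v
        d = gain(v, q)
        l, r = left.get(v), right.get(v)
        if l is None and r is None:          # base case: leaf
            return d if k > 0 else q * d
        # Case 1: v is not a seed; split the budget k between the two sub-trees.
        best = max(dov(r, kp, q) + dov(l, k - kp, q) for kp in range(k + 1)) + q * d
        # Case 2: v is a seed (costs one unit); its children see an active parent.
        if k > 0:
            seeded = max(
                dov(r, kp, 1.0) + dov(l, k - 1 - kp, 1.0) for kp in range(k)
            ) + d
            best = max(best, seeded)
        return best

    return dov(root, budget, 0.0)


# Hypothetical toy instance: a root with two leaves, edge weights 0.5,
# and no initial support for the target party.
toy_left = {"root": "a"}
toy_right = {"root": "b"}
toy_edge_weight = {"a": 0.5, "b": 0.5}
toy_mass = {"root": 0.0, "a": 0.0, "b": 0.0}
result_no_budget = max_dov_c(toy_left, toy_right, toy_edge_weight, toy_mass, "root", 0)
result_one_seed = max_dov_c(toy_left, toy_right, toy_edge_weight, toy_mass, "root", 1)
```

The recursion visits each child before its parent's value is finalized, which plays the role of the reverse BFS order in Algorithm 1, and the cache is keyed by (node, budget, probability), mirroring the cells $A[i, j; p]$. In the toy instance, seeding the root yields a gain of 1/6 at each leaf, for a total of 1/3.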

**Theorem 5.** *Given a tree T* = (*V*, *E*) *and budget B, the DP in Equation* (3) *finds a set of seed nodes S* ($|S| \le B$) *that maximizes* $\text{DoV}_c^{spv}$*.*

**Proof.** Consider the matrix $A[B+1, |V|]$, where each cell $A[k, v]$ points to another array $A'$ whose columns are all possible probabilities with which *f*(*v*) becomes active. Computing all possible probabilities for the array $A'$, we have at most $|F_v|$ columns for each node *v* ∈ *V* and budget $0 \le k \le B$, and for each of them, we need to calculate and store the maximum $\text{DoV}_c$.

Please note that if *f*(*v*) becomes active, it can activate *v* with a probability equal to the weight of the edge between them ($b_{f(v),v}$). This holds because each node has just one incoming edge (from its parent), and the threshold of the node is drawn uniformly at random. Hence, the probability that the threshold of the node *v* is less than or equal to the weight of the incoming edge is $b_{f(v),v}$.

Let us show by induction that all values in the arrays are calculated correctly. First, consider the base cases. For each leaf *v* ∈ *V*, the node cannot activate any other node as it has no outgoing edge. Hence, these nodes cannot change the probability distribution of other nodes; in other words, each leaf changes only its own probability distribution. If *k* = 0, we cannot select the node as a seed node, and we need to consider the probability of activating the node, because only activated nodes update their probability distribution after the diffusion. Then, if *k* = 0, we have $\text{DoV}_c(v, k, S, p) = p \cdot b_{f(v),v} \cdot \mathcal{D}_v$, where $\mathcal{D}_v$ is the difference in the party's score if the node *v* becomes active (defined in Equation (4)), and $p \cdot b_{f(v),v}$ is the probability that the node is activated by its parent. On the other hand, if *k* > 0, we can select *v* as a seed node, and it is activated with probability one; then we have $\text{DoV}_c(v, k, S, p) = \mathcal{D}_v$. By the updating rule (defined in Section 3.1) and the definition of $\text{DoV}_c^{spv}$ (defined in Equation (1)), the base cases are correct.

Let us define $(i', j') < (i, j)$ if $j' < j$, or $j' = j \wedge i' < i$. We have shown that all arrays $A'$ related to the base cases are filled out correctly. Now, as the induction step, assume that all arrays related to pairs $(i', j')$ smaller than $(i, j)$ are correctly calculated. In order to calculate the array $A'$ related to $A[i, j]$, for each column $p \in F_{v_j}$ we use the following formula

$$\begin{split} \text{DoV}_c(v_j, i, S, p) = \max\Big\{ &\max_{k=0}^{i} \big\{ \text{DoV}_c\big(r(v_j), k, S, p \cdot b_{f(v_j),v_j}\big) + \text{DoV}_c\big(l(v_j), i - k, S, p \cdot b_{f(v_j),v_j}\big) \big\} + p \cdot b_{f(v_j),v_j} \cdot \mathcal{D}_{v_j}, \\ &\max_{k=0}^{i-1} \big\{ \text{DoV}_c\big(r(v_j), k, S \cup \{v_j\}, 1\big) + \text{DoV}_c\big(l(v_j), i - k - 1, S \cup \{v_j\}, 1\big) \big\} + \mathcal{D}_{v_j} \Big\}, \end{split}$$

in which the first maximization considers the maximum value among all possible cases in which we do not select the node $v_j$ as a seed node, and the second one considers the maximum value among all possible cases in which we choose $v_j$ as a seed node. The last term in each maximization is the increase of $\text{DoV}_c$ at the node $v_j$, weighted by the probability that $v_j$ becomes active. Note that the above formula uses the values of $\text{DoV}_c$ for the children of $v_j$; since the nodes are sorted in reverse BFS order, all required values have already been calculated correctly, and we select the maximum value among all possible cases. Hence, $\text{DoV}_c(v_j, i, S, p)$ correctly finds the maximum possible value of $\text{DoV}_c^{spv}$, which concludes the proof.

For the destructive model, we define $\text{DoV}_d(v, k, S, p)$ as the maximum difference of the probability of voting for our target party before and after *S* in the sub-tree rooted at *v*, where the budget is *k* and $p \in F_v$ is the probability that *f*(*v*) becomes active. Formally, we define $\text{DoV}_d(v, k, S, p)$ as follows.

$$\begin{aligned} \text{DoV}_{d}(v, k, S, p) = \max \Big\{ &\max_{k'=0}^{k} \big\{ \text{DoV}_{d}\big(r(v), k', S, p \cdot b_{f(v),v}\big) + \text{DoV}_{d}\big(l(v), k-k', S, p \cdot b_{f(v),v}\big) \big\} + p \cdot b_{f(v),v} \cdot \mathcal{D}'_{v}, \\ &\max_{k'=0}^{k-1} \big\{ \text{DoV}_{d}\big(r(v), k', S \cup \{v\}, 1\big) + \text{DoV}_{d}\big(l(v), k-k'-1, S \cup \{v\}, 1\big) \big\} + \mathcal{D}'_{v} \Big\}, \end{aligned} \tag{5}$$

where $\mathcal{D}'_v = \sum_{c \in C_1} \Big( \pi_v(c) - \frac{\pi_v(c)}{1 + p \cdot b_{f(v),v}} \Big)$ is the difference that the node $v$ can apply. Moreover, for the base cases of the problem, for each leaf $v \in V$ and each probability $p \in F_v$: if $k = 0$, we need to account for the probability that the node becomes active, so $\text{DoV}_d(v, k, S, p) = p \cdot b_{f(v),v} \cdot \mathcal{D}'_v$; otherwise, if $k > 0$, we have $\text{DoV}_d(v, k, S, p) = \mathcal{D}'_v$. Furthermore, we set $\text{DoV}_d(null, k, S, p) = 0$. As in the constructive case, the implementation uses a two-dimensional array $A[B + 1, |V|]$. Moreover, for each cell $(i, j)$, $0 \le i \le B$, $0 \le j < |V|$, we keep another array $A'[|F_{v_j}|]$, where $F_{v_j}$ is the set of possible probabilities with which the node $f(v_j)$ can become active. The following theorem shows that by filling the matrix $A$ left-to-right and top-down, we can find the optimal answer for $\text{DoV}_d^{spv}$.

**Theorem 6.** *Given a tree $T = (V, E)$ and a budget $B$, using the DP Equation* (5)*, we can find a set of seed nodes $S$ ($|S| \le B$) that maximizes $\text{DoV}_d^{spv}$.*

**Proof.** The proof is similar to that of Theorem 5, except for the base cases and the way each activated node's probability distribution is updated after the diffusion. Since a leaf cannot activate any other node, the only change it can make is to update its own probability distribution. By the updating rule (Section 3.1) and the definition of $\text{DoV}_d^{spv}$ (Equation (2)), the base cases hold. Furthermore, by induction, the DP Equation (5) finds the maximum value of $\text{DoV}_d^{spv}$ correctly.

#### 6.1.2. Maximizing MoV in Straight-Party Voting under LTM

In order to maximize $\text{MoV}_c^{spv}$ we have to know $C_A^S$, i.e., the most voted opponent party after $S$. Finding the most voted opponent party before any diffusion ($C_B$) is easy; however, to find the most voted opponent party after $S$ we need the optimal set of seed nodes that maximizes $\text{MoV}_c^{spv}$, and to find that optimal set we need the most voted opponent party (or parties). This is a circular dependency.

To deal with this problem, one might consider each $C_i$, $2 \le i \le t$, in turn as the most voted opponent party after $S$, solve the related DP, and, after finding the outcome for all $t - 1$ parties, select the maximum result as the output. Nevertheless, this is not correct in all cases. Consider a case in which there are two opponent parties, each with half of the votes before any diffusion. If we fix either of them as the most voted opponent after the diffusion, we get a wrong outcome, since each of them can be the most voted opponent after a different diffusion process. In fact, we need to consider multiple parties as the most voted opponent party.

However, it has been shown that maximizing $\text{DoV}_c^{spv}$ yields a $\frac{1}{3}$-approximation for maximizing $\text{MoV}_c^{spv}$. Moreover, maximizing $\text{DoV}_d^{spv}$ yields a $\frac{1}{2}$-approximation for maximizing $\text{MoV}_d^{spv}$ [8].

#### *6.2. Multi-Winner Election Control Using Straight-Party Voting under ICM*

As we saw in the previous section (on LTM), each node $v$ becomes active either by being among the seed nodes or through the incoming influence from its parent $f(v)$. Since each node $v \in V$ has just one incoming edge and its threshold $t_v$ is drawn uniformly at random, the probability that the threshold is less than or equal to the incoming weight $b_{f(v),v}$ is equal to $b_{f(v),v}$. In other words, the node is activated by its parent with probability equal to the probability that $f(v)$ is active, times the weight of the edge between them. On the other hand, in ICM, a node $v$ becomes active if it is either selected as a seed node or its parent $f(v)$ is activated and succeeds in influencing $v$, which happens with probability $b_{f(v),v}$. Hence, in a tree, the activation processes in LTM and ICM are the same.

However, the updating rule is entirely different in the two models. In LTM, voters have a probability distribution over the candidates, and the activated nodes update their probability of voting for each candidate according to the influence of their activated incoming neighbors. In ICM, voters have an exact preference list over the candidates, and the activated nodes promote/demote the position of some candidates in their preference list, regardless of their neighbors (see Section 2 for a formal definition).
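The activation argument above can be illustrated with a minimal sketch. It assumes the tree is given as a parent map and a map of incoming edge weights; the function name `activation_probabilities` and the data layout are illustrative choices, not part of the original text. A seed is active with probability one, and every other node is active with the probability of its parent times the weight of the incoming edge.

```python
from typing import Dict, Optional, Set


def activation_probabilities(
    parent: Dict[str, Optional[str]],
    weight: Dict[str, float],
    seeds: Set[str],
) -> Dict[str, float]:
    """Probability that each node of a rooted tree becomes active.

    parent[v] is f(v) (None for the root) and weight[v] is b_{f(v),v}.
    Both names are hypothetical; the recursion follows the text:
    P(v) = 1 if v is a seed, else P(f(v)) * b_{f(v),v}.
    """
    prob: Dict[str, float] = {}

    def solve(v: str) -> float:
        if v not in prob:
            if v in seeds:
                prob[v] = 1.0
            elif parent[v] is None:
                prob[v] = 0.0  # a non-seed root has nobody to activate it
            else:
                prob[v] = solve(parent[v]) * weight[v]
        return prob[v]

    for node in parent:
        solve(node)
    return prob


# Small example: a -> b -> c and a -> d, with a as the only seed.
parent = {"a": None, "b": "a", "c": "b", "d": "a"}
weight = {"b": 0.5, "c": 0.4, "d": 0.25}
probs = activation_probabilities(parent, weight, seeds={"a"})
```

In this example, node `c` is reached only through `b`, so its activation probability is the product of the two edge weights on the path from the seed.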

Since the diffusion process in ICM is the same as in LTM, we focus on the updating part of the problem to maximize $\text{DoV}_c^{spv}$. Recall that we consider the plurality scoring rule for simplicity; however, the results extend to any non-increasing scoring function. The scoring function $\mathcal{F}_{spv}$ for our target party is then defined as follows. (To extend the result to any non-increasing scoring function $g(\cdot)$, we should define the functions as $\mathcal{F}_{spv}(C_1, \emptyset) = \sum_{v \in V} \sum_{c \in C_1} g(\pi_v(c))$ and $\mathcal{F}_{spv}(C_1, S) = \mathbb{E}_{A_S}\big[\sum_{v \in V} \sum_{c \in C_1} g(\tilde{\pi}_v(c))\big]$.)

$$\begin{aligned} \mathcal{F}_{spv}(C_1, \emptyset) &= \sum_{v \in V} \sum_{c \in C_1} \mathbb{1}_{\pi_v(c) = 1}, \\ \mathcal{F}_{spv}(C_1, S) &= \mathbb{E}_{A_S} \Big[ \sum_{v \in V} \sum_{c \in C_1} \mathbb{1}_{\tilde{\pi}_v(c) = 1} \Big], \end{aligned}$$

and the objective functions for the constructive and destructive cases of our problem are the same as Equations (1) and (2), respectively.
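As a small illustration of the plurality score before any diffusion, the following sketch counts the voters whose top-ranked candidate belongs to the target party. The representation of preference lists as Python lists (most preferred first) and the name `party_score` are assumptions made for this example only.

```python
from typing import Dict, List, Set


def party_score(preferences: Dict[str, List[str]], target_party: Set[str]) -> int:
    """Plurality score of the target party with an empty seed set.

    preferences[v] is voter v's ranking, most preferred candidate first.
    Exactly one candidate is ranked first, so the double sum over voters
    and target candidates reduces to counting voters whose first choice
    is a target candidate.
    """
    return sum(1 for ranking in preferences.values() if ranking[0] in target_party)


# Hypothetical instance: target party {c1, c2}, opponents x and y.
preferences = {
    "v1": ["c1", "x", "y"],
    "v2": ["x", "c2", "y"],
    "v3": ["y", "x", "c1"],
}
score = party_score(preferences, {"c1", "c2"})
```

Only voter `v1` ranks a target candidate first, so the target party scores one point in this instance.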

#### 6.2.1. Maximizing DoV in Straight-Party Voting under ICM

In this case, a node $v$ can increase our target party's score by one if none of our target candidates is in the first position before any diffusion and one of them is in the second position of the voter's preference list. In other words, the voter $v$ may increase the score of our target party if $\exists c \in C_1, \exists c' \in C \setminus C_1 : \pi_v(c') = 1 \wedge \pi_v(c) = 2$; otherwise, the node $v$ can influence its children and change their opinion, but it cannot affect the target party's score. We call this condition the pre-condition and denote it by $\P_v$. We define $F_v$ as the set of all possible probabilities with which the node $v$ may become active (note that the definition of $F_v$ in ICM differs from the one in LTM). Given a sub-tree rooted at $v \in V$, a budget $k$, a seed set $S$, and $p \in F_v$, we define $\text{DoV}'_c(v, k, S, p)$ as follows.

$$\begin{aligned} \text{DoV}'_{c}(v, k, S, p) = \max \Big\{ &\max_{k'=0}^{k} \big\{ \text{DoV}'_{c}(r(v), k', S, p \cdot b_{v,r(v)}) + \text{DoV}'_{c}(l(v), k-k', S, p \cdot b_{v,l(v)}) \big\} + p \cdot \mathbb{1}_{\P_v}, \\ &\max_{k'=0}^{k-1} \big\{ \text{DoV}'_{c}(r(v), k', S \cup \{v\}, b_{v,r(v)}) + \text{DoV}'_{c}(l(v), k-k'-1, S \cup \{v\}, b_{v,l(v)}) \big\} + \mathbb{1}_{\P_v} \Big\}. \end{aligned} \tag{6}$$

As the base cases of the problem, for each leaf $v \in V$, budget zero, and $p \in F_v$ as the probability that $v$ becomes active, we set $\text{DoV}'_c(v, k, S, p) = p \cdot \mathbb{1}_{\P_v}$, and for the same parameters except with a budget $k > 0$ we set $\text{DoV}'_c(v, k, S, p) = \mathbb{1}_{\P_v}$. (To extend the algorithm to any non-increasing scoring function $g(\cdot)$, we need to define the base cases, respectively, as $\text{DoV}'_c(v, k, S, p) = p \cdot \sum_{c \in C_1, \exists c' \in C \setminus C_1 : \pi_v(c') < \pi_v(c)} \big( g(\pi_v(c) - 1) - g(\pi_v(c)) \big)$ and $\text{DoV}'_c(v, k, S, p) = \sum_{c \in C_1, \exists c' \in C \setminus C_1 : \pi_v(c') < \pi_v(c)} \big( g(\pi_v(c) - 1) - g(\pi_v(c)) \big)$.) As before, for each reference to a node that does not exist ($null$), we define $\text{DoV}'_c(null, k, S, p) = 0$. The implementation of the DP Equation (6) follows the same idea as Algorithm 1. The following theorem shows that it calculates the maximum $\text{DoV}_c^{spv}$ in polynomial time.

**Theorem 7.** *Given a tree $T = (V, E)$ and a budget $B$, the DP Equation* (6) *gives a set of seed nodes $S$ ($|S| \le B$) that maximizes $\text{DoV}_c^{spv}$.*

**Proof.** The DP Equation (6) is a maximization over two inner maximizations. The first one covers the case in which we do not select $v$ as a seed node; here we take into account the probability $p \in F_v$ that node $v$ becomes active. The second maximization covers the case in which $v$ is selected as a seed node; here $v$ is activated with probability one. In both cases, the node may increase the function's value if the pre-condition holds; otherwise, it can only influence its children. As in the previous proofs, we show correctness by induction.

Consider a two-dimensional array $A[B + 1, |V|]$ whose rows are the budgets from zero to $B$ and whose columns are the nodes in reverse BFS order. Each cell $A[i, j]$ ($0 \le i \le B$, $0 \le j < |V|$) refers to another array $A'$ of size $|F_{v_j}|$. We fill the array of each cell $(i, j)$ left-to-right and top-down.

To show that the base cases are correct, note that the leaves cannot activate any other node; their only effect comes from becoming active and changing their own opinion. If the pre-condition holds for a leaf $v$, there are two cases. First, if the budget is greater than zero, $v$ can be a seed node and increase $\text{DoV}'_c$ by one. Second, if the budget is zero, $v$ can increase $\text{DoV}'_c$ only by becoming active through its parent, so its expected contribution is $p \cdot \mathbb{1}_{\P_v}$, where $p \in F_v$ is the probability that $v$ is activated through its parent. If the pre-condition does not hold, the leaf has no effect, and its contribution is zero in both cases.

Let us say $(i', j') < (i, j)$ if $j' < j$, or $j' = j \wedge i' < i$. As the induction step, assume that all cells $(i', j')$ smaller than $(i, j)$ are filled correctly, for $0 \le i \le B$, $0 \le j < |V|$. In order to calculate the array $A'$ related to the cell $(i, j)$, for each $p \in F_{v_j}$ we have to evaluate the following function.

$$\begin{aligned} \text{DoV}'_{c}(v_j, i, S, p) = \max \Big\{ &\max_{k=0}^{i} \big\{ \text{DoV}'_{c}(r(v_j), k, S, p \cdot b_{v_j,r(v_j)}) + \text{DoV}'_{c}(l(v_j), i-k, S, p \cdot b_{v_j,l(v_j)}) \big\} + p \cdot \mathbb{1}_{\P_{v_j}}, \\ &\max_{k=0}^{i-1} \big\{ \text{DoV}'_{c}(r(v_j), k, S \cup \{v_j\}, b_{v_j,r(v_j)}) + \text{DoV}'_{c}(l(v_j), i-k-1, S \cup \{v_j\}, b_{v_j,l(v_j)}) \big\} + \mathbb{1}_{\P_{v_j}} \Big\}. \end{aligned}$$

The outer maximization is over two cases, which we check separately. The first case considers all possible ways to split the budget between the children $r(v_j)$ and $l(v_j)$ (the first and second terms) when $v_j$ is not selected as a seed node. It finds the split with the maximum outcome using the values of $\text{DoV}'_c$ for the children, which are calculated correctly by the induction hypothesis. In this case, since $v_j$ is not a seed node, the probability that its right (resp. left) child becomes active is $p \cdot b_{v_j,r(v_j)}$ (resp. $p \cdot b_{v_j,l(v_j)}$). The fixed term is the change that $v_j$ itself can contribute to our target party's score: if the pre-condition holds, then with probability $p$ it increases the score by one, that is, $p \cdot \mathbb{1}_{\P_{v_j}}$.

The second maximization investigates the same situation except that it selects $v_j$ as a seed node (if $i > 0$) and uses the values of $\text{DoV}'_c$ for the children to find the best split of the remaining budget $i - 1$. In this case, $v_j$ increases our party's score by one (if the pre-condition holds), since it is selected as a seed node and is activated for sure. (To generalize the proof to any non-increasing scoring function $g(\cdot)$, we should change the updating part of each maximization (the fixed part) to $p \cdot \sum_{c \in C_1, \exists c' \in C \setminus C_1 : \pi_v(c') < \pi_v(c)} \big( g(\pi_v(c) - 1) - g(\pi_v(c)) \big)$ and $\sum_{c \in C_1, \exists c' \in C \setminus C_1 : \pi_v(c') < \pi_v(c)} \big( g(\pi_v(c) - 1) - g(\pi_v(c)) \big)$, respectively.) Note that all corresponding values for the children of $v_j$ have been calculated correctly beforehand because the nodes are processed in reverse BFS order. Finally, the formula takes the maximum of the two cases.
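The recurrence of the DP Equation (6) can be sketched as a memoized recursion, which is an alternative to the table-filling order used in the proof. This is only an illustrative sketch under stated assumptions: the tree is binary and given by hypothetical `left`/`right` child maps, `weight[(v, child)]` stands for the edge weight, `pre[v]` is 1 if the pre-condition holds for `v` and 0 otherwise, and a non-seed root is assumed to become active with probability 0. The seed set is returned alongside the value instead of being passed as an argument.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Optional, Tuple


def max_dov_icm(
    left: Dict[str, str],
    right: Dict[str, str],
    weight: Dict[Tuple[str, str], float],
    pre: Dict[str, int],
    root: str,
    budget: int,
) -> Tuple[float, FrozenSet[str]]:
    """Return (max expected score increase, seed set) for a binary tree."""

    @lru_cache(maxsize=None)
    def dov(v: Optional[str], k: int, p: float) -> Tuple[float, FrozenSet[str]]:
        if v is None:  # reference to a missing child contributes nothing
            return 0.0, frozenset()
        l, r = left.get(v), right.get(v)
        w_l = weight.get((v, l), 0.0)
        w_r = weight.get((v, r), 0.0)
        best: Tuple[float, FrozenSet[str]] = (-1.0, frozenset())
        # Case 1: v is not a seed and is active with probability p.
        for k_r in range(k + 1):
            val_r, seeds_r = dov(r, k_r, p * w_r)
            val_l, seeds_l = dov(l, k - k_r, p * w_l)
            value = val_r + val_l + p * pre[v]
            if value > best[0]:
                best = (value, seeds_r | seeds_l)
        # Case 2: v is a seed (costs one unit of budget) and is surely active.
        for k_r in range(k):
            val_r, seeds_r = dov(r, k_r, w_r)
            val_l, seeds_l = dov(l, k - 1 - k_r, w_l)
            value = val_r + val_l + pre[v]
            if value > best[0]:
                best = (value, seeds_r | seeds_l | {v})
        return best

    # Assumption: without being a seed, the root is active with probability 0.
    return dov(root, budget, 0.0)


# Hypothetical instance: a has children b (right) and c (left); b has child d.
left = {"a": "c"}
right = {"a": "b", "b": "d"}
weight = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "d"): 0.8}
pre = {"a": 0, "b": 1, "c": 1, "d": 1}
value, seeds = max_dov_icm(left, right, weight, pre, root="a", budget=1)
```

In this instance, seeding `b` gains one point from `b` itself plus 0.8 in expectation from `d`, which beats seeding the root (expected gain 1.4) or any single leaf (gain 1).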

For the destructive case of the problem, we define the pre-condition $\P'_v$ as $\exists c \in C_1 : \pi_v(c) = 1$. Then, if a node $v$ becomes active and $\P'_v$ holds, the node decreases the party's score by one; otherwise, $v$ cannot change it. For each sub-tree rooted at $v$, budget $k$, and $p \in F_v$, let us define $\text{DoV}'_d(v, k, S, p)$ as follows.

$$\begin{aligned} \text{DoV}'_{d}(v, k, S, p) = \max \Big\{ &\max_{k'=0}^{k} \big\{ \text{DoV}'_{d}(r(v), k', S, p \cdot b_{v,r(v)}) + \text{DoV}'_{d}(l(v), k-k', S, p \cdot b_{v,l(v)}) \big\} + p \cdot \mathbb{1}_{\P'_v}, \\ &\max_{k'=0}^{k-1} \big\{ \text{DoV}'_{d}(r(v), k', S \cup \{v\}, b_{v,r(v)}) + \text{DoV}'_{d}(l(v), k-k'-1, S \cup \{v\}, b_{v,l(v)}) \big\} + \mathbb{1}_{\P'_v} \Big\}. \end{aligned} \tag{7}$$

Note that the definition is exactly the same as in the constructive case except for the pre-condition. Furthermore, the base cases are the same as before if we substitute $\P'_v$ for $\P_v$. The proof of the following theorem is similar to that of Theorem 7, so we omit it to avoid repetition.

**Theorem 8.** *Given a tree $T = (V, E)$ and a budget $B$, the DP Equation* (7) *gives a set of seed nodes $S$ ($|S| \le B$) that maximizes $\text{DoV}_d^{spv}$.*
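The only difference between the constructive and destructive recurrences is the pre-condition, which can be checked directly on a voter's preference list. The sketch below assumes preference lists are Python lists with the most preferred candidate first; the function names are hypothetical.

```python
from typing import List, Set


def constructive_precondition(ranking: List[str], target_party: Set[str]) -> bool:
    """True if a non-target candidate is ranked first and a target candidate second."""
    return (
        len(ranking) >= 2
        and ranking[0] not in target_party
        and ranking[1] in target_party
    )


def destructive_precondition(ranking: List[str], target_party: Set[str]) -> bool:
    """True if a target candidate is ranked first."""
    return bool(ranking) and ranking[0] in target_party


target = {"c1", "c2"}
ranking_a = ["x", "c2", "y", "c1"]  # opponent first, target candidate second
ranking_b = ["c1", "x", "c2", "y"]  # target candidate already first
```

For a given voter at most one of the two pre-conditions can hold, since the first requires a non-target candidate in the first position and the second requires a target candidate there.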

#### 6.2.2. Maximizing MoV in Straight-Party Voting under ICM

Similar to Section 6.1.2, we do not know the highest-scoring parties after a diffusion started from an optimal set of seed nodes. However, it has been shown that maximizing $\text{DoV}_c^{spv}$ (resp. $\text{DoV}_d^{spv}$) gives a $\frac{1}{3}$-approximation (resp. $\frac{1}{2}$-approximation) algorithm for maximizing $\text{MoV}_c^{spv}$ (resp. $\text{MoV}_d^{spv}$) [9].

#### **7. Discussion**

Controlling elections via social influence is a crucial concern in democratic elections. It has been shown that many campaigns use this powerful tool to influence voters and change their opinion during elections. In this work, we considered multi-winner election control through social influence, in which the attacker tries to maximize/minimize the number of winners from his target party relative to the party with the most winners.

We exhibited different results, including hardness of approximation, approximation guarantees, and optimal solutions for our problem considering different structures, diffusion models, and voting systems. In ICM, each voter has a preference list over the candidates and votes for one or more candidates according to the voting rule, e.g., plurality, Borda's rule, *k*-approval, and anti-plurality. In this case, the influenced voters change their opinion by promoting/demoting the candidates' position in their preference list. On the other hand, in LTM, we consider that the voters have a probability distribution over all candidates. Each voter votes for one or more candidates proportionally to the probability of voting for them. In this model, the activated voters change their opinion based on the influence of their activated incoming neighbors.

We proved that the problem is hard to approximate within any factor when the structure is a general graph and the diffusion model is LTM. We also considered the problem when the structure is an arborescence and the diffusion process follows the ICM rules, and showed that the problem is inapproximable within any factor unless $P = NP$. Another structure that we investigated is a tree where the voting system is a variation of straight-party voting. We presented a polynomial-time algorithm to maximize the expected score of our target party under both the LT and IC diffusion models. It follows that we get a $\frac{1}{3}$-approximation for maximizing MoV in the constructive case, and a $\frac{1}{2}$-approximation for MoV in the destructive case.

The results of this paper open several research directions. Multi-winner election control through social influence on arborescences when the diffusion model is LTM is an interesting open problem. We conjecture that maximizing both objective functions (MoV and DoW) is hard, even though there exists a polynomial-time algorithm for the IM problem on arborescences under LTM. We plan to consider maximizing MoV in straight-party voting, to either present an optimal solution or provide a hardness result for both the constructive and destructive cases. Furthermore, maximizing DoV on bidirected trees, where a child can also activate its parent, is another interesting direction. We conjecture that this problem admits a polynomial-time algorithm following a similar dynamic programming approach.

**Author Contributions:** Conceptualization, M.A.M. and G.D.; methodology, M.A.M. and G.D.; software, M.A.M. and G.D.; validation, M.A.M. and G.D.; formal analysis, M.A.M. and G.D.; investigation, M.A.M. and G.D.; resources, M.A.M. and G.D.; data curation, M.A.M. and G.D.; writing–original draft preparation, M.A.M. and G.D.; writing–review and editing, M.A.M. and G.D.; visualization, M.A.M. and G.D.; supervision, G.D.; project administration, G.D.; funding acquisition, G.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Markets".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **On Multidimensional Congestion Games †**

#### **Vittorio Bilò 1,\*, Michele Flammini 2,\*, Vasco Gallotti <sup>3</sup> and Cosimo Vinci 2,\***


Received: 9 September 2020; Accepted: 3 October 2020; Published: 15 October 2020

**Abstract:** We introduce multidimensional congestion games, that is, congestion games whose set of players is partitioned into $d + 1$ clusters $C_0, C_1, \ldots, C_d$. Players in $C_0$ have full information about all the other participants in the game, while players in $C_i$, for any $1 \le i \le d$, have full information only about the members of $C_0 \cup C_i$ and are unaware of all the others. This model has at least two interesting applications: (*i*) it is a special case of graphical congestion games induced by an undirected social knowledge graph with independence number equal to $d$, and (*ii*) it represents scenarios in which players have a type and the level of competition they experience on a resource depends on their type and on the types of the other players using it. We focus on the case in which the cost function associated with each resource is affine and bound the price of anarchy and stability as a function of $d$ with respect to two meaningful social cost functions and for both weighted and unweighted players. We also provide refined bounds for the special case of $d = 2$ in the presence of unweighted players.

**Keywords:** congestion games; pure Nash equilibrium; potential games; price of anarchy; price of stability

#### **1. Introduction**

*Congestion games* [1–4] are, perhaps, the most famous class of non-cooperative games due to their capability to model several interesting competitive scenarios, while maintaining nice properties. In these games, there is a set of players sharing a set of resources. Each resource has an associated cost function which depends on the number of players using it (the so-called *congestion*). Players aim at choosing subsets of resources so as to minimize the sum of the resource costs.

Congestion games were introduced by Rosenthal in Reference [2]. He proved that each such game admits a bounded potential function whose set of local minima coincides with the set of *pure Nash equilibria* [5] of the game, that is, strategy profiles in which no player can decrease her cost by unilaterally changing her strategic choice. This existence result makes congestion games particularly appealing, especially in all those applications in which pure Nash equilibria are elected as the ideal solution concept.

In these contexts, the study of the inefficiency due to selfish and non-cooperative behavior has emerged as an active research direction. To this aim, the notions of *price of anarchy* [6] and *price of stability* [7] are widely adopted. The price of anarchy (resp. stability) compares the performance of the worst (resp. best) pure Nash equilibrium with that of an optimal cooperative solution.

Congestion games with unrestricted cost functions are general enough to model the Prisoner's Dilemma game, whose unique pure Nash equilibrium is known to perform arbitrarily badly with respect to the solution in which all players cooperate. Hence, in order to obtain meaningful bounds on the prices of anarchy and stability, some kind of regularity needs to be imposed on the cost functions associated with the resources. To this aim, a lot of research attention has been devoted to the case of polynomial cost functions [8–17], and to more general latency functions verifying some mild assumptions [10,18,19]. Among these, the particular case of affine functions plays a predominant role. In fact, as shown in References [20–22], they represent the only case, together with that (perhaps not particularly meaningful) of exponential cost functions, for which *weighted congestion games* [20], that is, the generalization of congestion games in which each player has a weight and the congestion of a resource becomes the sum of the weights of its users, still admit a potential function.
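To make the potential-function property concrete, here is a minimal sketch of an unweighted congestion game together with Rosenthal's potential, which sums, for every resource, the costs for congestion levels 1 up to its current congestion. The data layout (strategies as frozensets of resource names, cost functions as Python callables) and all identifiers are assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List

Strategy = FrozenSet[str]
CostFn = Callable[[int], float]


def congestion(profile: List[Strategy]) -> Counter:
    """Number of players using each resource in the given strategy profile."""
    return Counter(r for strategy in profile for r in strategy)


def player_cost(i: int, profile: List[Strategy], cost: Dict[str, CostFn]) -> float:
    """Sum of the costs of the resources chosen by player i."""
    load = congestion(profile)
    return sum(cost[r](load[r]) for r in profile[i])


def rosenthal_potential(profile: List[Strategy], cost: Dict[str, CostFn]) -> float:
    """Sum over resources r of cost_r(1) + ... + cost_r(congestion of r)."""
    load = congestion(profile)
    return sum(cost[r](k) for r, n in load.items() for k in range(1, n + 1))


# Hypothetical game: two resources with affine costs, two players.
cost = {"x": lambda n: 3 * n, "y": lambda n: n + 1}
before = [frozenset({"x"}), frozenset({"x"})]
after = [frozenset({"y"}), frozenset({"x"})]  # player 0 deviates from x to y

cost_change = player_cost(0, after, cost) - player_cost(0, before, cost)
potential_change = rosenthal_potential(after, cost) - rosenthal_potential(before, cost)
```

In this instance the deviating player's cost drops from 6 to 2, and the potential drops from 9 to 5: the two changes coincide, which is the defining property of an exact potential and the reason its local minima are pure Nash equilibria.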

#### *1.1. Motivations*

Traditional (weighted) congestion games are defined under a *full information* scenario: each player knows all the other participants in the game as well as their available strategies. These requirements, however, become too strong in many practical applications, where players may be unaware of even the mere existence of other potential competitors. This observation, together with the spread of competitive applications in social networks, has drawn some attention to the model of *graphical (weighted) congestion games* [23–25].

A graphical (weighted) congestion game $(\mathcal{G}, G)$ is obtained by coupling a traditional (weighted) congestion game $\mathcal{G}$ with a *social knowledge graph* $G$ expressing the *social context* in which the players operate. In $G$, each node is associated with a player of $\mathcal{G}$ and there exists a directed edge from node $i$ to node $j$ if and only if player $i$ has full information about player $j$. A basic property of (weighted) congestion games is that the congestion of a resource $r$ in a given strategy profile $\sigma$ is the same for all players. The existence of a social context in graphical (weighted) congestion games, instead, makes the congestion of each resource player-dependent. In these games, in fact, the congestion presumed by player $i$ on resource $r$ in the strategy profile $\sigma$ is obtained by excluding from the set of players choosing $r$ in $\sigma$ those of whom player $i$ has no knowledge. Clearly, if $G$ is complete, then there is no difference between $(\mathcal{G}, G)$ and $\mathcal{G}$. In all the other cases, however, there may be a big difference between the cost that a player *presumes* to pay on a certain strategy profile and the real cost that she effectively *perceives* because of the presence of players she was unaware of during her decisional process. (We observe that the model of graphical congestion games is sufficiently powerful to describe an alternative scenario in which players never perceive their real costs, which are perceived and evaluated by a central entity only. In such a case, the central entity aims at evaluating the global and real impact on the performance of the game caused by the players' strategic behaviour.)
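The gap between presumed and perceived costs can be seen in a small unweighted sketch. It assumes the social knowledge graph is given as a map `knows`, where `knows[i]` is the set of players about whom player `i` has full information; this representation and the function names are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Strategy = FrozenSet[str]
CostFn = Callable[[int], float]


def presumed_congestion(
    i: int, r: str, profile: List[Strategy], knows: Dict[int, Set[int]]
) -> int:
    """Users of resource r that player i is aware of (player i counts herself)."""
    return sum(
        1 for j, s in enumerate(profile) if r in s and (j == i or j in knows[i])
    )


def presumed_cost(
    i: int, profile: List[Strategy], knows: Dict[int, Set[int]], cost: Dict[str, CostFn]
) -> float:
    """Cost player i expects to pay, ignoring players she does not know."""
    return sum(cost[r](presumed_congestion(i, r, profile, knows)) for r in profile[i])


def perceived_cost(i: int, profile: List[Strategy], cost: Dict[str, CostFn]) -> float:
    """Cost player i actually pays, counting every user of her resources."""
    return sum(cost[r](sum(1 for s in profile if r in s)) for r in profile[i])


# Hypothetical instance: three players share resource x with cost c(n) = n.
# Player 0 knows everybody; players 1 and 2 only know player 0.
cost = {"x": lambda n: n}
profile = [frozenset({"x"})] * 3
knows = {0: {1, 2}, 1: {0}, 2: {0}}
```

Here player 1 presumes a congestion of 2 on `x` (herself and player 0) but perceives a congestion of 3 once player 2 is accounted for, whereas player 0, who knows everybody, presumes and perceives the same cost.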

Graphical congestion games have been introduced by Bilò et al. in Reference [24]. They focus on affine cost functions and provide a complete characterization of the cases in which existence of pure Nash equilibria can be guaranteed. In particular, they show that equilibria always exist if and only if *G* is either undirected or directed acyclic. Then, for all these cases, they give bounds on the price of anarchy and stability expressed as a function of the number of players in the game and of the maximum degree of *G*. These metrics are defined with respect to both the sum of the perceived costs and the sum of the presumed ones.

Fotakis et al. [25] argue that the maximum degree of $G$ is not a proper measure of the level of social ignorance in a graphical congestion game and propose to bound the prices of anarchy and stability as a function of the independence number of $G$, denoted by $\delta(G)$. They focus on graphical weighted congestion games with affine cost functions and show that they still admit a potential function when $G$ is undirected. Then, they prove that the price of anarchy is between $\delta(G)(\delta(G) + 1)$ and $\delta(G)\big(\delta(G) + 2 + \sqrt{\delta(G)^2 + 4\delta(G)}\big)/2$ with respect to both the sum of the perceived costs and the sum of the presumed ones, and that the price of stability is between $\delta(G)$ and $2\delta(G)$ with respect to the sum of the perceived costs.
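To get a feeling for how these bounds grow, the following sketch simply evaluates them for a given independence number. It reads the upper bound on the price of anarchy as δ(δ + 2 + sqrt(δ² + 4δ))/2, with δ = δ(G); the function names are hypothetical.

```python
import math
from typing import Tuple


def poa_bounds(delta: int) -> Tuple[float, float]:
    """Lower and upper bound on the price of anarchy for independence number delta."""
    lower = delta * (delta + 1)
    upper = delta * (delta + 2 + math.sqrt(delta**2 + 4 * delta)) / 2
    return lower, upper


def pos_bounds(delta: int) -> Tuple[int, int]:
    """Lower and upper bound on the price of stability (sum of perceived costs)."""
    return delta, 2 * delta


lower_1, upper_1 = poa_bounds(1)  # complete graph: full information
lower_2, upper_2 = poa_bounds(2)
```

For δ(G) = 1 the interval is [2, (3 + √5)/2], roughly [2, 2.618]; for δ(G) = 2 it is [6, 4 + 2√3], roughly [6, 7.464], so both bounds grow quadratically in the independence number.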

#### *1.2. Our Contribution and Significance*

The works of Bilò et al. [24] and Fotakis et al. [25] aim at characterizing the impact of social ignorance in the most general case, that is, without imposing any particular structure on the social knowledge graphs defining the graphical game. Nevertheless, real-world-based knowledge relationships usually obey some regularities and present recurrent patterns: for instance, people tend to cluster themselves into well-structured collaborative groups (cliques) due to family memberships, mutual friendships, interest sharing, business partnerships.

To this aim, we introduce and study *multidimensional (weighted) congestion games*, that is, (weighted) congestion games whose set of players is partitioned into $d + 1$ clusters $C_0, C_1, \ldots, C_d$. Players in $C_0$ have full information about all the other participants in the game, while players in $C_i$, for any $1 \le i \le d$, have full information only about the members of $C_0 \cup C_i$ and are unaware of all the others. It is not difficult to see (and we provide a formal proof of this fact in the next section) that each multidimensional (weighted) congestion game is a graphical (weighted) congestion game whose social knowledge graph $G$ is undirected and verifies $\delta(G) = d$. In addition, $G$ possesses the following well-structured topology: it is the union of $d + 1$ disjoint cliques (each corresponding to one of the $d + 1$ clusters in the multidimensional (weighted) congestion game) with the additional property that there exists an edge from all the nodes belonging to one of these cliques (the one corresponding to cluster $C_0$) to all the nodes in all the other cliques.
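The structure of the social knowledge graph can be checked on a toy instance. The sketch below builds the undirected graph from a cluster assignment and computes its independence number by brute force, which is only feasible for very small graphs. The encoding `cluster[i]` in {0, 1, ..., d} and the function names are assumptions of this example, and every cluster with index at least 1 is assumed to be non-empty.

```python
from itertools import combinations
from typing import Dict, List, Set


def knowledge_graph(cluster: List[int]) -> Dict[int, Set[int]]:
    """Adjacency sets of G: i and j are adjacent if either one belongs to
    cluster 0 or both belong to the same cluster."""
    n = len(cluster)
    return {
        i: {
            j
            for j in range(n)
            if j != i
            and (cluster[i] == 0 or cluster[j] == 0 or cluster[i] == cluster[j])
        }
        for i in range(n)
    }


def independence_number(adj: Dict[int, Set[int]]) -> int:
    """Size of a largest set of pairwise non-adjacent nodes (brute force)."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


# Hypothetical instance with d = 3: players 0 and 1 are in C_0,
# players 2 and 3 in C_1, player 4 in C_2, and player 5 in C_3.
cluster = [0, 0, 1, 1, 2, 3]
graph = knowledge_graph(cluster)
alpha = independence_number(graph)
```

A largest independent set picks exactly one player from each of the clusters with index 1 to 3, so the independence number of this toy graph equals d = 3, in line with the claim above.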

We believe that the study of graphical games restricted to some prescribed social knowledge graphs like the ones we consider here, may be better suited to understand the impact of social ignorance in non-cooperative systems coming from practical and real-world applications. Moreover, the particular social knowledge relationships embedded in the definition of multidimensional (weighted) congestion games perfectly model the situation that generates when several independent games with full information are gathered together by some promoting parties so as to form a sort of "global super-game". The promoting parties become players with full information in the super-game, while each player in the composing sub-games maintains full information about all the other players in the same sub-game, acquires full information about all the promoting parties in the super game, but completely ignores the players in the other sub-games. Such a composing scheme resembles, in a sense, the general architecture of the Internet, viewed as a self-emerged network resulting from the aggregation of several autonomous systems (AS). Users in an AS have full information about anything happening within the AS, but, at the same time, they completely ignore the network global architecture and how it develops outside their own AS, except for the existence of high-level network routers. High-level network routers, instead, have full information about the entire network.

Furthermore, multidimensional (weighted) congestion games are also useful to model games in which players belong to different types and the level of competition that each player experiences on each selected resource depends on her type and on the types of the other players using the resource. Consider, for instance, a machine which is able to perform *d* different types of activities in parallel and a set of tasks requiring the use of this machine. Tasks are of two types: simple and complex. Simple tasks keep the machine busy on one particular activity only, while complex tasks require the completion of all the *d* activities. Hence, complex tasks compete with all the other tasks, while simple ones compete only with the tasks requiring the same activity (thus, also with complex tasks). A similar example is represented by a facility location game where players want to locate their facilities so as to minimize the effect of the competition due to the presence of neighboring competitors. If we assume that the facilities can be either specialized shops selling particular products (such as perfumeries, clothes shops, shoe shops) or shopping centers selling all kinds of products, we again have that the shopping centers compete with all the other participants in the game, while specialized shops compete only with shops of the same type and with shopping centers.

In this paper, we focus on multidimensional (weighted) congestion games with affine cost functions. In such a setting, we bound the price of anarchy and the price of stability with respect to the two social cost functions, which are the sum of the perceived costs and the sum of the presumed costs. In fact, when multidimensional (weighted) congestion games are viewed as graphical (weighted) congestion games with highly clustered knowledge relationships, the sum of the perceived costs is more appropriate to define the overall quality of a profile: players decide according to their knowledge, but then, when the solution is physically realized, their cost becomes influenced also by the players of which they were not aware. Hence, under this social cost function, the notions of price of anarchy and price of stability effectively measure the impact of social ignorance in the system. On the other hand, when multidimensional (weighted) congestion games are used to model players belonging to different types, the perceived cost of a player coincides with the presumed one since there is no real social ignorance, even if the fact that players can be of different types allows for a reinterpretation of the model as a special case of graphical (weighted) congestion games. Hence, in such a setting, the inefficiency due to selfish behavior has to be analyzed with respect to the sum of the presumed costs.

We determine general upper bounds for the price of anarchy and the price of stability as a function of *d*. For the sum of the presumed costs, we show that the price of anarchy and the price of stability of weighted games are at most (√(*d* + 4) + √*d*)(√*d* · √(*d* + 4) + *d* + 4)/(4√(*d* + 4)) ≤ *d* + 2 and 2, respectively. Instead, for the sum of the perceived costs, the results of Reference [25] yield upper bounds of *d*(*d* + 2 + √(*d*<sup>2</sup> + 4*d*))/2 and 2*d* for the price of anarchy and the price of stability, respectively.

Then, we investigate the case of unweighted games with *d* = 2 (i.e., bidimensional congestion games) in greater depth and provide refined bounds. In particular, we prove that the price of anarchy is 119/33 ≈ 3.606 with respect to the sum of the presumed costs and 35/8 = 4.375 with respect to the sum of the perceived ones, and that the price of stability is between (1 + √5)/2 ≈ 1.618 and 1 + 2/√7 ≈ 1.756 for the sum of the presumed costs as social cost function, and between (5 + √17)/4 ≈ 2.28 and 2.92 for the sum of the perceived ones. These results are derived by exploiting the primal-dual method developed in Reference [11].

A preliminary version of this paper has been presented at SIROCCO 2012 [26].

#### *1.3. Paper Organization*

The next section contains all formal definitions and notation. In Section 3, we discuss the existence of pure Nash equilibria in multidimensional weighted congestion games. In Sections 4 and 5, we present our bounds for the price of anarchy and the price of stability, respectively. Finally, in the last section, we give some concluding remarks and discuss open problems.

#### **2. Model and Definitions**

For an integer *n* ≥ 2, we denote [*n*] := {1, 2, . . . , *n*}. In a *weighted congestion game* G, we have *n* players and a set of resources *R*, where each resource *r* ∈ *R* has an associated cost function ℓ<sub>*r*</sub>. The set of strategies for each player *i* ∈ [*n*], denoted as *S*<sub>*i*</sub>, can be any subset of the powerset of *R*, that is, *S*<sub>*i*</sub> ⊆ 2<sup>*R*</sup>. Each player *i* ∈ [*n*] is associated with a positive weight *w*<sub>*i*</sub> > 0. Given a strategy profile *σ* = (*σ*<sub>1</sub>, . . . , *σ*<sub>*n*</sub>), the congestion of resource *r* in *σ*, denoted as *n*<sub>*r*</sub>(*σ*), is the total weight of players choosing *r* in *σ*, that is, *n*<sub>*r*</sub>(*σ*) = ∑<sub>*i*∈[*n*]:*r*∈*σ*<sub>*i*</sub></sub> *w*<sub>*i*</sub>. The perceived cost paid by player *i* in *σ* is *c*<sub>*i*</sub>(*σ*) = ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> ℓ<sub>*r*</sub>(*n*<sub>*r*</sub>(*σ*)). An *unweighted congestion game* (congestion game, for brevity) is a weighted congestion game in which all players have unitary weights. An *affine weighted congestion game* is a weighted congestion game such that, for any *r* ∈ *R*, it holds that ℓ<sub>*r*</sub>(*x*) = *α*<sub>*r*</sub>*x* + *β*<sub>*r*</sub>, with *α*<sub>*r*</sub>, *β*<sub>*r*</sub> ≥ 0; the game is *linear* if *β*<sub>*r*</sub> = 0 for any *r* ∈ *R*.

For any integer *d* ≥ 2, a *d-dimensional weighted congestion game* (G, C) consists of a weighted congestion game G whose set of players is partitioned into *d* + 1 clusters *C*<sub>0</sub>, *C*<sub>1</sub>, . . . , *C*<sub>*d*</sub>. For a player *i*, we denote by *f*(*i*) ∈ {0, . . . , *d*} the cluster she belongs to. We say that players in *C*<sub>0</sub> are *omniscient* and that players in *C*<sub>*i*</sub>, for any *i* ∈ [*d*], are *ignorant*. Given a strategy profile *σ*, we denote by *n*<sub>*r*,*j*</sub>(*σ*) the total weight of players belonging to *C*<sub>*j*</sub> who are using resource *r* in *σ*. The presumed cost of a player *i* in *σ* is *ĉ*<sub>*i*</sub>(*σ*) = ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> ℓ<sub>*r*</sub>(*n*<sub>*r*,*f*(*i*)</sub>(*σ*) + *n*<sub>*r*,0</sub>(*σ*)) if she is ignorant and *ĉ*<sub>*i*</sub>(*σ*) = ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> ℓ<sub>*r*</sub>(∑<sub>*j*=0</sub><sup>*d*</sup> *n*<sub>*r*,*j*</sub>(*σ*)) = ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> ℓ<sub>*r*</sub>(*n*<sub>*r*</sub>(*σ*)) = *c*<sub>*i*</sub>(*σ*) if she is omniscient. A *d-dimensional weighted affine congestion game* is a pair (G, C) such that G is an affine weighted congestion game.
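The two cost notions can be made concrete with a small sketch for affine cost functions. Everything in it is an illustrative assumption rather than part of the model: the function names, the encoding of a profile as a list of resource sets, and the three-player instance with one player per cluster.

```python
def cluster_loads(resource, profile, weights, cluster, d):
    """Return [n_{r,0}, ..., n_{r,d}]: weight placed on `resource` by each cluster."""
    loads = [0.0] * (d + 1)
    for player, strategy in enumerate(profile):
        if resource in strategy:
            loads[cluster[player]] += weights[player]
    return loads


def perceived_cost(i, profile, weights, cluster, d, alpha, beta):
    """c_i: every player using a resource contributes to its congestion."""
    cost = 0.0
    for r in profile[i]:
        n_r = sum(cluster_loads(r, profile, weights, cluster, d))
        cost += alpha[r] * n_r + beta[r]
    return cost


def presumed_cost(i, profile, weights, cluster, d, alpha, beta):
    """c-hat_i: an ignorant player in C_j only counts players of C_0 and C_j."""
    cost = 0.0
    for r in profile[i]:
        loads = cluster_loads(r, profile, weights, cluster, d)
        if cluster[i] == 0:
            seen = sum(loads)                      # omniscient: sees everybody
        else:
            seen = loads[cluster[i]] + loads[0]    # ignorant: own cluster plus C_0
        cost += alpha[r] * seen + beta[r]
    return cost


# Hypothetical instance with d = 2 and one player per cluster.
D = 2
WEIGHTS = [1.0, 2.0, 1.0]
CLUSTER = [0, 1, 2]                      # f(i) for players 0, 1, 2
ALPHA = {"a": 1.0, "b": 1.0}
BETA = {"a": 0.0, "b": 3.0}
PROFILE = [{"a"}, {"a"}, {"a", "b"}]

ARGS = (PROFILE, WEIGHTS, CLUSTER, D, ALPHA, BETA)
presumed = [presumed_cost(i, *ARGS) for i in range(3)]
perceived = [perceived_cost(i, *ARGS) for i in range(3)]
print(presumed, perceived)
```

In this instance the omniscient player has equal presumed and perceived costs, while both ignorant players presume less than they actually pay, because each of them overlooks the other on resource `a`.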

Given a strategy profile *σ* and a strategy *s* ∈ *S*<sub>*i*</sub> for a player *i* ∈ [*n*], we denote with (*σ*<sub>−*i*</sub>, *s*) the strategy profile obtained from *σ* by replacing the strategy *σ*<sub>*i*</sub> played by *i* in *σ* with *s*. A *pure Nash equilibrium* is a strategy profile *σ* such that, for any player *i* ∈ [*n*] and for any strategy *s* ∈ *S*<sub>*i*</sub>, it holds that *ĉ*<sub>*i*</sub>(*σ*<sub>−*i*</sub>, *s*) ≥ *ĉ*<sub>*i*</sub>(*σ*).

Let Σ be the set of all possible strategy profiles which can be realized in (G, C). We denote with NE(G, C) ⊆ Σ the set of pure Nash equilibria of (G, C). Let SF : Σ → ℝ<sub>≥0</sub> be a *social cost function* measuring the overall quality of each strategy profile in Σ. We denote with *σ*<sup>∗</sup> a *social optimum* of (G, C) with respect to SF, that is, a strategy profile minimizing the social cost function SF. We consider two social cost functions, namely, the (weighted) sum of the presumed costs of all players and the (weighted) sum of their perceived ones, denoted, respectively, as Pres and Perc. Technically, they assume the following expressions:

$$\begin{split} \mathsf{Pres}(\sigma) &= \sum\_{i \in [n]} w\_i \hat{c}\_i(\sigma) \\ &= \sum\_{i \in C\_0} w\_i \sum\_{r \in \sigma\_i} \left( \alpha\_r n\_r(\sigma) + \beta\_r \right) + \sum\_{i \notin C\_0} w\_i \sum\_{r \in \sigma\_i} \left( \alpha\_r \left( n\_{r, f(i)}(\sigma) + n\_{r, 0}(\sigma) \right) + \beta\_r \right) \\ &= \sum\_{r \in R} \left( \alpha\_r n\_{r, 0}(\sigma) \sum\_{j=0}^d n\_{r, j}(\sigma) + \beta\_r n\_{r, 0}(\sigma) \right) \\ &\quad + \sum\_{r \in R} \left( \alpha\_r n\_{r, 0}(\sigma) \sum\_{j=1}^d n\_{r, j}(\sigma) + \alpha\_r \sum\_{j=1}^d n\_{r, j}(\sigma)^2 + \beta\_r \sum\_{j=1}^d n\_{r, j}(\sigma) \right) \\ &= \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d n\_{r, j}(\sigma)^2 + 2n\_{r, 0}(\sigma) \sum\_{j=1}^d n\_{r, j}(\sigma) \right) + \beta\_r \sum\_{j=0}^d n\_{r, j}(\sigma) \right) \end{split}$$

and

$$\mathsf{Perc}(\sigma) = \sum\_{i \in [n]} w\_i c\_i(\sigma) = \sum\_{i \in [n]} w\_i \sum\_{r \in \sigma\_i} (\alpha\_r n\_r(\sigma) + \beta\_r) = \sum\_{r \in R} \left( \alpha\_r n\_r(\sigma)^2 + \beta\_r n\_r(\sigma) \right).$$
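As a sanity check on the two chains of equalities, the following sketch evaluates Pres and Perc both player by player (first line of each chain) and resource by resource (last line), and compares the results on random instances. The instance generator, the integer weights and coefficients (chosen so that comparisons are exact), and all function names are assumptions made for this illustration.

```python
import random


def loads(r, profile, weights, cluster, d):
    """[n_{r,0}, ..., n_{r,d}] for resource r."""
    out = [0] * (d + 1)
    for i, strategy in enumerate(profile):
        if r in strategy:
            out[cluster[i]] += weights[i]
    return out


def costs_player_by_player(profile, weights, cluster, d, alpha, beta):
    """(Pres, Perc) as weighted sums of the players' presumed/perceived costs."""
    pres = perc = 0
    for i, strategy in enumerate(profile):
        for r in strategy:
            n = loads(r, profile, weights, cluster, d)
            seen = sum(n) if cluster[i] == 0 else n[cluster[i]] + n[0]
            pres += weights[i] * (alpha[r] * seen + beta[r])
            perc += weights[i] * (alpha[r] * sum(n) + beta[r])
    return pres, perc


def costs_resource_by_resource(profile, weights, cluster, d, alpha, beta):
    """(Pres, Perc) from the closed-form per-resource expressions."""
    pres = perc = 0
    for r in alpha:
        n = loads(r, profile, weights, cluster, d)
        total = sum(n)
        pres += alpha[r] * (sum(x * x for x in n) + 2 * n[0] * sum(n[1:]))
        pres += beta[r] * total
        perc += alpha[r] * total ** 2 + beta[r] * total
    return pres, perc


def random_instance(rng, n=8, d=3, resources="abcd"):
    """Hypothetical generator: integer data keep all arithmetic exact."""
    weights = [rng.randint(1, 5) for _ in range(n)]
    cluster = [rng.randint(0, d) for _ in range(n)]
    alpha = {r: rng.randint(0, 4) for r in resources}
    beta = {r: rng.randint(0, 4) for r in resources}
    profile = [set(rng.sample(resources, rng.randint(1, len(resources))))
               for _ in range(n)]
    return profile, weights, cluster, d, alpha, beta


rng = random.Random(0)
agree = all(
    costs_player_by_player(*inst) == costs_resource_by_resource(*inst)
    for inst in (random_instance(rng) for _ in range(200))
)
print("closed forms agree:", agree)
```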

For a fixed social cost function SF, the *price of anarchy* of (G, C), denoted by *PoA*(G, C), is the ratio between the social value of the *worst* pure Nash equilibrium of (G, C) and that of a social optimum, that is, *PoA*(G, C) = max<sub>*σ*∈NE(G,C)</sub> SF(*σ*)/SF(*σ*<sup>∗</sup>). The *price of stability*, denoted by *PoS*(G, C), considers the best case, so that *PoS*(G, C) = min<sub>*σ*∈NE(G,C)</sub> SF(*σ*)/SF(*σ*<sup>∗</sup>).
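For very small games, both ratios can be computed by exhaustive enumeration, which may help in getting a feel for the definitions. The sketch below does this under the social cost function Pres. The instance (three unit-weight players, one per cluster, choosing between a resource with linear cost and one with constant cost) and all names are hypothetical, and the approach is exponential in the number of players, so it is an illustration only.

```python
from itertools import product


def presumed_cost(i, profile, weights, cluster, alpha, beta):
    """Presumed cost of player i: she counts herself, C_0 and her own cluster."""
    cost = 0.0
    for r in profile[i]:
        seen = sum(
            weights[p]
            for p in range(len(profile))
            if r in profile[p]
            and (cluster[i] == 0 or cluster[p] in (0, cluster[i]))
        )
        cost += alpha[r] * seen + beta[r]
    return cost


def pres(profile, weights, cluster, alpha, beta):
    """Social cost Pres: weighted sum of the presumed costs."""
    return sum(weights[i] * presumed_cost(i, profile, weights, cluster, alpha, beta)
               for i in range(len(profile)))


def is_pure_nash(profile, strategies, weights, cluster, alpha, beta):
    """No player can strictly lower her presumed cost by a unilateral deviation."""
    for i in range(len(profile)):
        current = presumed_cost(i, profile, weights, cluster, alpha, beta)
        for s in strategies[i]:
            deviation = profile[:i] + (s,) + profile[i + 1:]
            if presumed_cost(i, deviation, weights, cluster, alpha, beta) < current - 1e-12:
                return False
    return True


def poa_and_pos(strategies, weights, cluster, alpha, beta):
    """Return (PoA, PoS) under Pres; assumes an equilibrium exists and optimum > 0."""
    profiles = list(product(*strategies))
    optimum = min(pres(p, weights, cluster, alpha, beta) for p in profiles)
    equilibria = [pres(p, weights, cluster, alpha, beta) for p in profiles
                  if is_pure_nash(p, strategies, weights, cluster, alpha, beta)]
    return max(equilibria) / optimum, min(equilibria) / optimum


# Hypothetical instance: l_a(x) = x and l_b(x) = 2, one player in each of C_0, C_1, C_2.
STRATEGIES = [[("a",), ("b",)]] * 3
WEIGHTS = [1.0, 1.0, 1.0]
CLUSTER = [0, 1, 2]
ALPHA = {"a": 1.0, "b": 0.0}
BETA = {"a": 0.0, "b": 2.0}

poa, pos = poa_and_pos(STRATEGIES, WEIGHTS, CLUSTER, ALPHA, BETA)
print(poa, pos)  # 1.5 1.0
```

Here the best equilibrium is also a social optimum (the omniscient player alone on `b`, both ignorant players on `a`), whereas the worst equilibria place the omniscient player on `a` together with one ignorant player.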

#### **3. Existence of Pure Nash Equilibria**

In this section, we prove that multidimensional unweighted (resp. weighted affine) congestion games are graphical unweighted (resp. weighted affine) congestion games defined by an underlying undirected social knowledge graph. This allows us to use a result in Reference [24] (resp. [25]) stating that these games are potential games, thus admitting pure Nash equilibria.

A *graphical weighted congestion game* (G, *G*) consists of a weighted congestion game G and a directed graph *G* = (*N*, *A*) such that each node of *N* is associated with a player in G and there exists a directed edge from node *i* to node *j* if and only if player *i* has full information about player *j*. The congestion presumed by player *i* on resource *r* in the profile *σ* is *ñ*<sub>*r*,*i*</sub>(*σ*) = ∑<sub>*p*∈*N*:*r*∈*σ*<sub>*p*</sub>,(*i*,*p*)∈*A*</sub> *w*<sub>*p*</sub> + *w*<sub>*i*</sub> and the presumed cost paid by player *i* in *σ* is *c̃*<sub>*i*</sub>(*σ*) = ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> ℓ<sub>*r*</sub>(*ñ*<sub>*r*,*i*</sub>(*σ*)). A *graphical weighted affine congestion game* is a pair (G, *G*) such that G is an affine weighted congestion game. The *independence number δ*(*G*) of (G, *G*) is the cardinality of a maximum independent set of graph *G*.

A function Φ : Σ → ℝ is a *weighted potential function* for a graphical weighted congestion game (G, *G*) if, for any profile *σ*, any player *i* ∈ [*n*] and any strategy *s* ∈ *S*<sub>*i*</sub>, it holds that Φ(*σ*) − Φ(*σ*<sub>−*i*</sub>, *s*) = *a*<sub>*i*</sub>(*c̃*<sub>*i*</sub>(*σ*) − *c̃*<sub>*i*</sub>(*σ*<sub>−*i*</sub>, *s*)) for some *a*<sub>*i*</sub> > 0; if *a*<sub>*i*</sub> = 1, Φ is an *exact potential function*. In Reference [24] (resp. [25]), it is shown that each graphical unweighted (resp. weighted affine) congestion game (G, *G*) such that *G* is undirected admits an exact potential function (resp. weighted potential function).

The following result shows that *d*-dimensional weighted congestion games are a subclass of graphical weighted congestion games.

**Proposition 1.** *Each d-dimensional weighted congestion game is a graphical weighted congestion game whose social knowledge graph is undirected.*

**Proof.** Fix a *d*-dimensional weighted congestion game (G, C). We define a graph *G* = (*N*, *A*) such that each node in *N* is associated with a player in G and there is an undirected edge {*u*, *v*} ∈ *A* if and only if either *u*, *v* ∈ *C*<sub>*i*</sub> for some 0 ≤ *i* ≤ *d*, or at least one of *u* and *v* belongs to *C*<sub>0</sub>. We show that, for any strategy profile *σ* of G and for any *i* ∈ [*n*], *ĉ*<sub>*i*</sub>(*σ*) = *c̃*<sub>*i*</sub>(*σ*).

Consider first an omniscient player *i* ∈ *C*<sub>0</sub>. In (G, C), it holds that

$$\hat{c}\_i(\sigma) = \sum\_{r \in \sigma\_i} \ell\_r(n\_r(\sigma)) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in [n] : r \in \sigma\_p} w\_p \right),$$

while in (G, *G*), it holds that

$$\tilde{c}\_i(\sigma) = \sum\_{r \in \sigma\_i} \ell\_r(\tilde{n}\_{r,i}(\sigma)) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in N: r \in \sigma\_p, \{i, p\} \in A} w\_p + w\_i \right) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in [n]: r \in \sigma\_p} w\_p \right),$$

where the last equality follows from the fact that, by construction of *G*, it holds that {*i*, *p*} ∈ *A* for any *p* ∈ [*n*] with *p* ≠ *i*.

Next, consider an ignorant player *i* ∈ *C*<sub>*j*</sub> for some *j* ∈ [*d*]. In (G, C), it holds that

$$\hat{c}\_i(\sigma) = \sum\_{r \in \sigma\_i} \ell\_r(n\_{r, f(i)}(\sigma) + n\_{r, 0}(\sigma)) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in C\_0 \cup C\_j : r \in \sigma\_p} w\_p \right),$$

while in (G, *G*), it holds that

$$\tilde{c}\_i(\sigma) = \sum\_{r \in \sigma\_i} \ell\_r(\tilde{n}\_{r,i}(\sigma)) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in N: r \in \sigma\_p, \{i, p\} \in A} w\_p + w\_i \right) = \sum\_{r \in \sigma\_i} \ell\_r \left( \sum\_{p \in C\_0 \cup C\_j: r \in \sigma\_p} w\_p \right),$$

where the last equality follows from the fact that, by construction of *G*, for any *p* ∈ [*n*] with *p* ≠ *i*, it holds that {*i*, *p*} ∈ *A* if and only if *p* ∈ *C*<sub>0</sub> ∪ *C*<sub>*j*</sub>.
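The construction used in the proof can be replayed on a small instance. The sketch below builds the social knowledge graph from a cluster assignment, compares the congestion presumed through the graph with the one presumed through the clusters, and computes the independence number by brute force. The instance, the random profile and the function names are assumptions for illustration; the independence number equals *d* here because every cluster *C*<sub>1</sub>, . . . , *C*<sub>*d*</sub> of the instance is non-empty.

```python
from itertools import combinations
import random


def knowledge_graph(cluster):
    """Undirected edges {u, v}: same cluster, or at least one endpoint in C_0."""
    players = range(len(cluster))
    return {
        frozenset((u, v))
        for u, v in combinations(players, 2)
        if cluster[u] == cluster[v] or 0 in (cluster[u], cluster[v])
    }


def congestion_via_graph(i, r, profile, weights, edges):
    """n-tilde_{r,i}: own weight plus the weight of the neighbours using r."""
    return weights[i] + sum(
        weights[p]
        for p in range(len(profile))
        if p != i and r in profile[p] and frozenset((i, p)) in edges
    )


def congestion_via_clusters(i, r, profile, weights, cluster):
    """Congestion presumed by player i on r in the d-dimensional game."""
    return sum(
        weights[p]
        for p in range(len(profile))
        if r in profile[p] and (cluster[i] == 0 or cluster[p] in (0, cluster[i]))
    )


def independence_number(n, edges):
    """Brute force over all subsets; only sensible for a handful of nodes."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                return size
    return 0


# Hypothetical 3-dimensional instance with seven players.
CLUSTER = [0, 0, 1, 1, 2, 3, 3]
WEIGHTS = [2, 1, 3, 1, 2, 1, 4]
EDGES = knowledge_graph(CLUSTER)
rng = random.Random(1)
PROFILE = [set(rng.sample("abc", rng.randint(1, 3))) for _ in CLUSTER]

same = all(
    congestion_via_graph(i, r, PROFILE, WEIGHTS, EDGES)
    == congestion_via_clusters(i, r, PROFILE, WEIGHTS, CLUSTER)
    for i in range(len(CLUSTER))
    for r in PROFILE[i]
)
delta = independence_number(len(CLUSTER), EDGES)
print(same, delta)
```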

Each game admitting an exact or weighted potential function always admits pure Nash equilibria. Hence, by Proposition 1 and the existence of an exact (resp. weighted) potential function for graphical unweighted (resp. weighted affine) congestion games with undirected social knowledge graphs, we have that *d*-dimensional unweighted (resp. weighted affine) congestion games always admit pure Nash equilibria.

For weighted affine games, the potential function assumes the following expression:

$$\begin{split} \Phi(\sigma) &= \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{i \in [n] : r \in \sigma\_i} w\_i^2 + \sum\_{\{i, p\} \in A : r \in \sigma\_i \cap \sigma\_p} w\_i w\_p \right) + \beta\_r \sum\_{i \in [n] : r \in \sigma\_i} w\_i \right) \\ &= \frac{1}{2} \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d n\_{r,j}(\sigma)^2 + \sum\_{i \in [n] : r \in \sigma\_i} w\_i^2 + 2n\_{r,0}(\sigma) \sum\_{j=1}^d n\_{r,j}(\sigma) \right) + 2\beta\_r \sum\_{j=0}^d n\_{r,j}(\sigma) \right) . \end{split} \tag{1}$$
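The two expressions in (1) and the potential property can be checked numerically. The sketch below evaluates Φ both over the edges of the knowledge graph and over the cluster loads, and then tests the identity Φ(*σ*) − Φ(*σ*<sub>−*i*</sub>, *s*) = *w*<sub>*i*</sub>(*ĉ*<sub>*i*</sub>(*σ*) − *ĉ*<sub>*i*</sub>(*σ*<sub>−*i*</sub>, *s*)) on random unilateral deviations. The scaling factor *a*<sub>*i*</sub> = *w*<sub>*i*</sub> is our reading of (1), obtained by collecting the terms that involve player *i*; it is an assumption of this sketch rather than a statement taken from the text, as are the random instances and the function names.

```python
from fractions import Fraction
from itertools import combinations
import random


def adjacent(u, v, cluster):
    """Edge rule of the knowledge graph: same cluster or one endpoint in C_0."""
    return cluster[u] == cluster[v] or 0 in (cluster[u], cluster[v])


def presumed_cost(i, profile, weights, cluster, alpha, beta):
    cost = 0
    for r in profile[i]:
        seen = sum(weights[p] for p in range(len(profile))
                   if r in profile[p] and (p == i or adjacent(i, p, cluster)))
        cost += alpha[r] * seen + beta[r]
    return cost


def potential_edges(profile, weights, cluster, alpha, beta):
    """First expression in (1): squares of weights plus products over edges."""
    phi = 0
    for r in alpha:
        users = [i for i in range(len(profile)) if r in profile[i]]
        squares = sum(weights[i] ** 2 for i in users)
        pairs = sum(weights[i] * weights[p] for i, p in combinations(users, 2)
                    if adjacent(i, p, cluster))
        phi += alpha[r] * (squares + pairs) + beta[r] * sum(weights[i] for i in users)
    return phi


def potential_clusters(profile, weights, cluster, d, alpha, beta):
    """Second expression in (1), written in terms of the cluster loads n_{r,j}."""
    twice_phi = 0
    for r in alpha:
        n, squares = [0] * (d + 1), 0
        for i, strategy in enumerate(profile):
            if r in strategy:
                n[cluster[i]] += weights[i]
                squares += weights[i] ** 2
        twice_phi += alpha[r] * (sum(x * x for x in n) + squares + 2 * n[0] * sum(n[1:]))
        twice_phi += 2 * beta[r] * sum(n)
    return Fraction(twice_phi, 2)


def check_once(rng, n=7, d=3, resources="abcd"):
    """One random instance and one random deviation (all data are integers)."""
    weights = [rng.randint(1, 5) for _ in range(n)]
    cluster = [rng.randint(0, d) for _ in range(n)]
    alpha = {r: rng.randint(0, 4) for r in resources}
    beta = {r: rng.randint(0, 4) for r in resources}
    profile = [frozenset(rng.sample(resources, rng.randint(1, 4))) for _ in range(n)]
    i = rng.randrange(n)
    deviation = list(profile)
    deviation[i] = frozenset(rng.sample(resources, rng.randint(1, 4)))
    same_value = (potential_edges(profile, weights, cluster, alpha, beta)
                  == potential_clusters(profile, weights, cluster, d, alpha, beta))
    lhs = (potential_edges(profile, weights, cluster, alpha, beta)
           - potential_edges(deviation, weights, cluster, alpha, beta))
    rhs = weights[i] * (presumed_cost(i, profile, weights, cluster, alpha, beta)
                        - presumed_cost(i, deviation, weights, cluster, alpha, beta))
    return same_value and lhs == rhs


rng = random.Random(2)
potential_ok = all(check_once(rng) for _ in range(300))
print("potential identity holds on all samples:", potential_ok)
```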

#### **4. Bounds for the Price of Anarchy**

In this section, we provide an upper bound for the price of anarchy of multidimensional weighted affine congestion games as a function of *d*.

Fix a pure Nash equilibrium *σ* and a social optimum *σ*<sup>∗</sup>, thus fixing the congestions *n*<sub>*r*,*j*</sub>(*σ*) and *n*<sub>*r*,*j*</sub>(*σ*<sup>∗</sup>) for each *j* ∈ [*d*] ∪ {0} and *r* ∈ *R*. The pure Nash equilibrium condition implies that no player lowers her presumed cost by deviating to the strategy she uses in *σ*<sup>∗</sup>. For any player *i* ∈ *C*<sub>0</sub>, this yields

$$\sum\_{r \in \sigma\_i} \left( \alpha\_r n\_r(\sigma) + \beta\_r \right) - \sum\_{r \in \sigma\_i^\*} \left( \alpha\_r \left( n\_r(\sigma) + w\_i \right) + \beta\_r \right) \le 0,\tag{2}$$

which is a necessary condition satisfied by any pure Nash equilibrium (the equilibrium condition yields the stronger inequality ∑<sub>*r*∈*σ*<sub>*i*</sub>∖*σ*<sub>*i*</sub><sup>∗</sup></sub> (*α*<sub>*r*</sub>*n*<sub>*r*</sub>(*σ*) + *β*<sub>*r*</sub>) − ∑<sub>*r*∈*σ*<sub>*i*</sub><sup>∗</sup>∖*σ*<sub>*i*</sub></sub> (*α*<sub>*r*</sub>(*n*<sub>*r*</sub>(*σ*) + *w*<sub>*i*</sub>) + *β*<sub>*r*</sub>) ≤ 0, so that inequality ∑<sub>*r*∈*σ*<sub>*i*</sub></sub> (*α*<sub>*r*</sub>*n*<sub>*r*</sub>(*σ*) + *β*<sub>*r*</sub>) − ∑<sub>*r*∈*σ*<sub>*i*</sub><sup>∗</sup></sub> (*α*<sub>*r*</sub>(*n*<sub>*r*</sub>(*σ*) + *w*<sub>*i*</sub>) + *β*<sub>*r*</sub>) ≤ 0 is a relaxation of the equilibrium condition). For weighted games, by using *w*<sub>*i*</sub> ≤ *n*<sub>*r*,0</sub>(*σ*<sup>∗</sup>) for any *r* ∈ *R* and *i* ∈ *C*<sub>0</sub> such that *r* ∈ *σ*<sub>*i*</sub><sup>∗</sup>, by multiplying (2) by *w*<sub>*i*</sub>, and by summing it over all *i* ∈ *C*<sub>0</sub>, we get

$$\sum\_{r \in R} \left( \alpha\_r n\_{r,0}(\sigma) \sum\_{j=0}^d n\_{r,j}(\sigma) + \beta\_r n\_{r,0}(\sigma) - \alpha\_r n\_{r,0}(\sigma^\*) \left( n\_{r,0}(\sigma^\*) + \sum\_{j=0}^d n\_{r,j}(\sigma) \right) - \beta\_r n\_{r,0}(\sigma^\*) \right) \le 0,\tag{3}$$

which is a further necessary condition satisfied by any pure Nash equilibrium. For unweighted games, we simply fix *w*<sub>*i*</sub> = 1 for any *i* ∈ [*n*] and sum the inequality over all *i* ∈ *C*<sub>0</sub>, thus getting

$$\sum\_{r \in R} \left( \alpha\_r n\_{r,0}(\sigma) \sum\_{j=0}^d n\_{r,j}(\sigma) + \beta\_r n\_{r,0}(\sigma) - \alpha\_r n\_{r,0}(\sigma^\*) \left( 1 + \sum\_{j=0}^d n\_{r,j}(\sigma) \right) - \beta\_r n\_{r,0}(\sigma^\*) \right) \le 0. \tag{4}$$

For any player *i* ∈ *C*<sub>*j*</sub>, with *j* ∈ [*d*], the equilibrium condition yields

$$\sum\_{r \in \sigma\_i} \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) \right) + \beta\_r \right) - \sum\_{r \in \sigma\_i^\*} \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) + w\_i \right) + \beta\_r \right) \le 0.$$

For weighted games, again, by using *w*<sub>*i*</sub> ≤ *n*<sub>*r*,*j*</sub>(*σ*<sup>∗</sup>) for any *r* ∈ *R* and *i* ∈ *C*<sub>*j*</sub> such that *r* ∈ *σ*<sub>*i*</sub><sup>∗</sup>, by multiplying this inequality by *w*<sub>*i*</sub>, and by summing it over all *i* ∈ *C*<sub>*j*</sub>, we get

$$\sum\_{r \in R} \left( \alpha\_r n\_{r,j}(\sigma) \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) \right) + \beta\_r n\_{r,j}(\sigma) - \alpha\_r n\_{r,j}(\sigma^\*) \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) + n\_{r,j}(\sigma^\*) \right) - \beta\_r n\_{r,j}(\sigma^\*) \right) \le 0.$$

By further summing for each *j* ∈ [*d*], we obtain

$$\sum\_{r \in R} \left( \sum\_{j=1}^{d} n\_{r,j}(\sigma) \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) \right) + \beta\_r \right) - \sum\_{j=1}^{d} n\_{r,j}(\sigma^\*) \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) + n\_{r,j}(\sigma^\*) \right) + \beta\_r \right) \right) \le 0. \tag{5}$$

For unweighted games, by setting *w*<sub>*i*</sub> = 1, and by summing the equilibrium constraint over all *i* ∈ *C*<sub>*j*</sub> and *j* ∈ [*d*], we analogously get

$$\sum\_{r \in R} \left( \sum\_{j=1}^{d} n\_{r,j}(\sigma) \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) \right) + \beta\_r \right) - \sum\_{j=1}^{d} n\_{r,j}(\sigma^\*) \left( \alpha\_r \left( n\_{r,j}(\sigma) + n\_{r,0}(\sigma) + 1 \right) + \beta\_r \right) \right) \le 0. \tag{6}$$

In the sequel, for the sake of conciseness, we adopt *k*<sub>*r*,*j*</sub> and *l*<sub>*r*,*j*</sub> as shorthands for *n*<sub>*r*,*j*</sub>(*σ*) and *n*<sub>*r*,*j*</sub>(*σ*<sup>∗</sup>), respectively.

Theorem 1 provides an upper bound for the price of anarchy of multidimensional weighted affine congestion games with respect to social cost function Pres.

**Theorem 1.** *For each d-dimensional weighted affine congestion game* (G, C)*,*

$$PoA(\mathcal{G}, \mathcal{C}) \le \frac{(\sqrt{d+4} + \sqrt{d})(\sqrt{d} \cdot \sqrt{d+4} + d + 4)}{4\sqrt{d+4}} \le d+2$$

*under the social cost function* Pres*.*

**Proof.** Let *σ* and *σ*<sup>∗</sup> be a worst-case equilibrium and a social optimum of (G, C), respectively. Let *k*<sub>*r*</sub> = (*k*<sub>*r*,0</sub>, . . . , *k*<sub>*r*,*d*</sub>), *l*<sub>*r*</sub> = (*l*<sub>*r*,0</sub>, . . . , *l*<sub>*r*,*d*</sub>), and let *P* = (*p*<sub>*i*,*j*</sub>)<sub>*i*,*j*∈[*d*]∪{0}</sub> be the (*d* + 1) × (*d* + 1) binary matrix such that: (i) *p*<sub>*i*,*j*</sub> = 1 if either *i* = *j*, or *i* = 0, or *j* = 0; (ii) *p*<sub>*i*,*j*</sub> = 0 otherwise. By summing inequalities (3) and (5), we get the following compact inequality involving the product between vectors, matrices, and scalars:

$$\sum\_{r \in R} \left( \alpha\_r (\mathbf{k}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T) + \beta\_r \sum\_{j=0}^d k\_{r,j} - \alpha\_r (\mathbf{l}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T + \mathbf{l}\_r \cdot \mathbf{l}\_r^T) - \beta\_r \sum\_{j=0}^d l\_{r,j} \right) \le 0. \tag{7}$$

Let *Q* = (*q*<sub>*i*,*j*</sub>)<sub>*i*,*j*∈[*d*]∪{0}</sub> be the (*d* + 1) × (*d* + 1) matrix such that: (i) *q*<sub>*i*,*j*</sub> = √*d* if *i* = *j*; (ii) *q*<sub>*i*,*j*</sub> = 1 if either *i* = 0, or *j* = 0, with (*i*, *j*) ≠ (0, 0); (iii) *q*<sub>*i*,*j*</sub> = 0 otherwise. As 0 ≤ *p*<sub>*i*,*j*</sub> ≤ *q*<sub>*i*,*j*</sub> for any *i*, *j*, we have that

$$\mathbf{l}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T \le \mathbf{l}\_r \cdot \mathbf{Q} \cdot \mathbf{k}\_r^T. \tag{8}$$

Matrix *Q* is symmetric and positive-semidefinite (see Lemma A1 in Appendix A for the proof of this fact); thus, the following inequality holds for any *u* > 0:

$$0 \le \left(\sqrt{u} \cdot \mathbf{k}\_r - \frac{1}{2\sqrt{u}} \cdot \mathbf{l}\_r\right) \cdot \mathbf{Q} \cdot \left(\sqrt{u} \cdot \mathbf{k}\_r - \frac{1}{2\sqrt{u}} \cdot \mathbf{l}\_r\right)^T = u \cdot \mathbf{k}\_r \cdot \mathbf{Q} \cdot \mathbf{k}\_r^T + \frac{1}{4u} \cdot \mathbf{l}\_r \cdot \mathbf{Q} \cdot \mathbf{l}\_r^T - \mathbf{l}\_r \cdot \mathbf{Q} \cdot \mathbf{k}\_r^T. \tag{9}$$

Finally, as 0 ≤ *q*<sub>*i*,*j*</sub> ≤ √*d* · *p*<sub>*i*,*j*</sub> for any *i*, *j*, we have that

$$\mathbf{x} \cdot \mathbf{Q} \cdot \mathbf{x}^T \le \sqrt{d} \cdot \mathbf{x} \cdot \mathbf{P} \cdot \mathbf{x}^T \tag{10}$$

for any vector *x* = (*x*<sub>0</sub>, . . . , *x*<sub>*d*</sub>) of non-negative real numbers. By exploiting (7), (8), (9), and (10), for any fixed *u* > 0 we get

$$\begin{aligned} \mathsf{Pres}(\sigma) &= \sum\_{r \in R} \left( \alpha\_r (\mathbf{k}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T) + \beta\_r \sum\_{j=0}^d k\_{r,j} \right) \\ &\leq \sum\_{r \in R} \left( \alpha\_r (\mathbf{l}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T + \mathbf{l}\_r \cdot \mathbf{l}\_r^T) + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) \end{aligned} \tag{11}$$

$$\begin{aligned} &\leq \sum\_{r\in R} \left( \alpha\_{r} (\mathbf{l}\_{r} \cdot \mathbf{P} \cdot \mathbf{k}\_{r}^{T} + \mathbf{l}\_{r} \cdot \mathbf{P} \cdot \mathbf{l}\_{r}^{T}) + \beta\_{r} \sum\_{j=0}^{d} l\_{r,j} \right) \\ &\leq \sum\_{r\in R} \left( \alpha\_{r} \left( u \cdot \mathbf{k}\_{r} \cdot \mathbf{Q} \cdot \mathbf{k}\_{r}^{T} + \frac{1}{4u} \cdot \mathbf{l}\_{r} \cdot \mathbf{Q} \cdot \mathbf{l}\_{r}^{T} + \mathbf{l}\_{r} \cdot \mathbf{P} \cdot \mathbf{l}\_{r}^{T} \right) + \beta\_{r} \sum\_{j=0}^{d} l\_{r,j} \right) \end{aligned} \tag{12}$$

$$\begin{aligned} &\leq \sum\_{r \in R} \left( \alpha\_r \left( \sqrt{d} \cdot u \cdot \mathbf{k}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T + \left( \frac{\sqrt{d}}{4u} + 1 \right) \cdot \mathbf{l}\_r \cdot \mathbf{P} \cdot \mathbf{l}\_r^T \right) + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) \end{aligned} \tag{13}$$

$$\begin{aligned} & \leq \sqrt{d} \cdot u \cdot \sum\_{r \in R} \left( \alpha\_r (\mathbf{k}\_r \cdot \mathbf{P} \cdot \mathbf{k}\_r^T) + \beta\_r \sum\_{j=0}^d k\_{r,j} \right) + \left( \frac{\sqrt{d}}{4u} + 1 \right) \cdot \sum\_{r \in R} \left( \alpha\_r (\mathbf{l}\_r \cdot \mathbf{P} \cdot \mathbf{l}\_r^T) + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) \\ & = \sqrt{d} \cdot u \cdot \mathsf{Pres}(\sigma) + \left( \frac{\sqrt{d}}{4u} + 1 \right) \cdot \mathsf{Pres}(\sigma^\*), \end{aligned} \tag{14}$$

where (11) comes from (7), (12) from (8) and (9), and (13) from (10) (inequalities (11)–(14) can be stated within the smoothness framework of Roughgarden [19], and show that multidimensional weighted affine congestion games are (*λ*, *µ*)-smooth with *λ* = √*d*/(4*u*) + 1 and *µ* = √*d* · *u* for any *u* > 0). Finally, by manipulating (14), we get

$$\operatorname{PoA}(\mathcal{G}, \mathcal{C}) = \frac{\mathsf{Pres}(\sigma)}{\mathsf{Pres}(\sigma^\*)} \le \inf\_{0 < u < 1/\sqrt{d}} \frac{\frac{\sqrt{d}}{4u} + 1}{1 - \sqrt{d} \cdot u} = \frac{(\sqrt{d+4} + \sqrt{d})(\sqrt{d} \cdot \sqrt{d+4} + d + 4)}{4\sqrt{d+4}},\tag{15}$$

thus showing the claim. A simpler upper bound of *d* + 2 can be obtained by setting *u* = 1/(2√*d*) in (15).
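The optimization over *u* can be reproduced numerically. The sketch below evaluates the ratio appearing in (15) on a grid of values 0 < *u* < 1/√*d* (the range in which the denominator is positive) and compares the grid minimum with the closed-form bound and with *d* + 2. The grid resolution and the candidate minimizer *u*<sup>∗</sup> = (√(*d* + 4) − √*d*)/4, which we obtained by setting the derivative to zero, are assumptions of this sketch and are not stated in the text.

```python
import math


def ratio(u, d):
    """Expression minimized in (15); requires 0 < u < 1/sqrt(d)."""
    s = math.sqrt(d)
    return (s / (4 * u) + 1) / (1 - s * u)


def closed_form(d):
    """Right-hand side of (15)."""
    s, t = math.sqrt(d), math.sqrt(d + 4)
    return (t + s) * (s * t + d + 4) / (4 * t)


def grid_minimum(d, steps=20000):
    """Minimum of the ratio over an evenly spaced grid in (0, 1/sqrt(d))."""
    s = math.sqrt(d)
    return min(ratio(k / (steps * s), d) for k in range(1, steps))


def candidate_minimiser(d):
    """u* = (sqrt(d + 4) - sqrt(d)) / 4 (our own computation, see lead-in)."""
    return (math.sqrt(d + 4) - math.sqrt(d)) / 4


for d in (2, 3, 5, 10):
    print(d, closed_form(d), grid_minimum(d), ratio(1 / (2 * math.sqrt(d)), d))
```

For *d* = 2 the closed form simplifies to 2 + √3 ≈ 3.732, which is below the simpler bound *d* + 2 = 4.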

With respect to the social cost function Perc, the following upper bound is derived as a corollary of a result in Reference [25].

**Corollary 1.** *For each d-dimensional affine congestion game* (G, C)*, PoA*(G, C) ≤ *d*(*d* + 2 + √(*d*<sup>2</sup> + 4*d*))/2 *under the social cost function* Perc*.*

**Proof.** Theorem 2 of Reference [25] states that *δ*(*G*)(*δ*(*G*) + 2 + √(*δ*(*G*)<sup>2</sup> + 4*δ*(*G*)))/2 is an upper bound for the price of anarchy of any graphical congestion game having independence number *δ*(*G*). As the graphical congestion game equivalent to (G, C) has independence number equal to *d*, the claim follows.

#### **5. Bounds for the Price of Stability**

In order to bound the price of stability with respect to the social cost function Pres, we consider a pure Nash equilibrium that minimizes the potential function Φ defined in (1), which leads to the following upper bound.

**Theorem 2.** *For each d-dimensional weighted affine congestion game* (G, C)*, PoS*(G, C) ≤ 2 *under the social cost function* Pres*.*

**Proof.** Let *σ* be a pure Nash equilibrium minimizing the potential function Φ defined in (1), and let *σ*<sup>∗</sup> be a social optimum. We have that

$$\begin{aligned} \mathsf{Pres}(\sigma) &= \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d k\_{r,j}^2 + 2k\_{r,0} \sum\_{j=1}^d k\_{r,j} \right) + \beta\_r \sum\_{j=0}^d k\_{r,j} \right) \\ &\leq \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d k\_{r,j}^2 + \sum\_{i \in [n] : r \in \sigma\_i} w\_i^2 + 2k\_{r,0} \sum\_{j=1}^d k\_{r,j} \right) + 2\beta\_r \sum\_{j=0}^d k\_{r,j} \right) \\ &= 2 \cdot \Phi(\sigma) \end{aligned} \tag{16}$$

$$\leq 2 \cdot \Phi(\sigma^\*) \tag{17}$$

$$= \sum\_{r\in R} \left( \alpha\_r \left( \sum\_{j=0}^d l\_{r,j}^2 + \sum\_{i\in [n]: r \in \sigma\_i^\*} w\_i^2 + 2l\_{r,0} \sum\_{j=1}^d l\_{r,j} \right) + 2\beta\_r \sum\_{j=0}^d l\_{r,j} \right) \tag{18}$$

$$\begin{aligned} &\leq \sum\_{r\in R} \left( \alpha\_r \left( 2\sum\_{j=0}^d l\_{r,j}^2 + 2l\_{r,0} \sum\_{j=1}^d l\_{r,j} \right) + 2\beta\_r \sum\_{j=0}^d l\_{r,j} \right) \\ &\leq 2\sum\_{r\in R} \left( \alpha\_r \left( \sum\_{j=0}^d l\_{r,j}^2 + 2l\_{r,0} \sum\_{j=1}^d l\_{r,j} \right) + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) \\ &= 2\cdot\mathsf{Pres}(\sigma^\*), \end{aligned} \tag{19}$$

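where (17) holds since *σ* minimizes Φ, (16) and (18) hold by exploiting (1), and the inequality following (18) uses the fact that the sum of the squared weights of the players of a cluster using *r* is at most the square of their total weight, so that ∑<sub>*i*∈[*n*]:*r*∈*σ*<sub>*i*</sub><sup>∗</sup></sub> *w*<sub>*i*</sub><sup>2</sup> ≤ ∑<sub>*j*=0</sub><sup>*d*</sup> *l*<sub>*r*,*j*</sub><sup>2</sup>. By (19), we get *PoS*(G, C) ≤ Pres(*σ*)/Pres(*σ*<sup>∗</sup>) ≤ 2, and the claim follows.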

With respect to the social cost function Perc, the following upper bound is derived as a corollary of a result in Reference [25].

**Corollary 2.** *For each d-dimensional affine congestion game* (G, C)*, PoS*(G, C) ≤ 2*d under the social cost function* Perc*.*

**Proof.** Theorem 6 of Reference [25] states that 2*δ*(*G*) is an upper bound for the price of stability of any graphical congestion game having independence number *δ*(*G*). As the graphical congestion game equivalent to (G, C) has independence number equal to *d*, the claim follows.

#### **6. Bounds for Bidimensional Unweighted Games**

In this section, we investigate in more detail the case of unweighted affine games with *d* = 2, that is, bidimensional affine congestion games, and provide refined bounds for the price of anarchy and the price of stability under both social cost functions. The technique we adopt is the primal-dual framework introduced in Reference [11].

#### *6.1. Price of Anarchy*

We first consider the price of anarchy. Let (G, C) be an arbitrary *d*-dimensional unweighted congestion game, and let *σ* and *σ*<sup>∗</sup> be a worst-case equilibrium and a social optimum of (G, C), respectively. For SF = Pres, we get the following primal linear program LP(Pres, *σ*, *σ*<sup>∗</sup>) in variables (*α*<sub>*r*</sub>, *β*<sub>*r*</sub>)<sub>*r*∈*R*</sub>, whose optimal solution provides an upper bound to *PoA*(G, C):

$$\begin{array}{ll} \max & \displaystyle \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d k\_{r,j}^2 + 2k\_{r,0} \sum\_{j=1}^d k\_{r,j} \right) + \beta\_r \sum\_{j=0}^d k\_{r,j} \right) \\ \text{s.t.} & \displaystyle \sum\_{r \in R} \left( \alpha\_r k\_{r,0} \sum\_{j=0}^d k\_{r,j} + \beta\_r k\_{r,0} - \alpha\_r l\_{r,0} \left( 1 + \sum\_{j=0}^d k\_{r,j} \right) - \beta\_r l\_{r,0} \right) \le 0 \\ & \displaystyle \sum\_{r \in R} \left( \sum\_{j=1}^d k\_{r,j} \left( \alpha\_r (k\_{r,j} + k\_{r,0}) + \beta\_r \right) - \sum\_{j=1}^d l\_{r,j} \left( \alpha\_r (k\_{r,j} + k\_{r,0} + 1) + \beta\_r \right) \right) \le 0 \\ & \displaystyle \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d l\_{r,j}^2 + 2l\_{r,0} \sum\_{j=1}^d l\_{r,j} \right) + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) = 1 \\ & \alpha\_r, \beta\_r \ge 0 \qquad \forall r \in R \end{array}$$

The optimal solution of the above linear program is an upper bound to the price of anarchy, since the objective function is equal to Pres(*σ*), the first two constraints are the pure Nash equilibrium conditions derived in (4) and (6), respectively (which are necessary conditions satisfied by any equilibrium), and the last normalization constraint imposes, without loss of generality, that Pres(*σ*<sup>∗</sup>) = 1. (When applying the primal-dual method, we observe that, once *σ* and *σ*<sup>∗</sup> are fixed, the coefficients (*α*<sub>*r*</sub>)<sub>*r*∈*R*</sub> and (*β*<sub>*r*</sub>)<sub>*r*∈*R*</sub> are chosen in such a way that the value Pres(*σ*) = Pres(*σ*)/Pres(*σ*<sup>∗</sup>) is maximized, thus getting an upper bound on the price of anarchy. We also observe that (*α*<sub>*r*</sub>)<sub>*r*∈*R*</sub> and (*β*<sub>*r*</sub>)<sub>*r*∈*R*</sub> are the only variables in the considered LP formulation, while the other quantities (e.g., the congestions) are considered as fixed parameters of the LP formulation. See Reference [11] for further details on the primal-dual method and how to apply it to measure the performance of congestion games under different quality metrics.)

By associating the three dual variables *x*, *y* and *γ* with the three constraints of LP(Pres, *σ*, *σ*<sup>∗</sup>), the dual formulation DLP(Pres, *σ*, *σ*<sup>∗</sup>) becomes

$$\begin{array}{lll} \min & \gamma & \\ \text{s.t.} & \displaystyle x\left(k\_{r,0}\sum\_{j=0}^{d}k\_{r,j} - l\_{r,0}\left(1 + \sum\_{j=0}^{d}k\_{r,j}\right)\right) + y\sum\_{j=1}^{d}\left(k\_{r,j}(k\_{r,j} + k\_{r,0}) - l\_{r,j}(k\_{r,j} + k\_{r,0} + 1)\right) & \\ & \displaystyle {} + \gamma\left(\sum\_{j=0}^{d}l\_{r,j}^{2} + 2l\_{r,0}\sum\_{j=1}^{d}l\_{r,j}\right) \geq \sum\_{j=0}^{d}k\_{r,j}^{2} + 2k\_{r,0}\sum\_{j=1}^{d}k\_{r,j} & \forall r \in R\\ & \displaystyle x(k\_{r,0} - l\_{r,0}) + y\sum\_{j=1}^{d}(k\_{r,j} - l\_{r,j}) + \gamma\sum\_{j=0}^{d}l\_{r,j} \geq \sum\_{j=0}^{d}k\_{r,j} & \forall r \in R\\ & x, y \geq 0. & \end{array}$$

By the Weak Duality Theorem, each feasible solution for DLP(Pres, *σ*, *σ*<sup>∗</sup>) provides an upper bound to the optimal solution of LP(Pres, *σ*, *σ*<sup>∗</sup>), that is, to the price of anarchy achievable by the particular choice of *σ* and *σ*<sup>∗</sup>. However, if the provided dual solution is independent of this choice, we obtain an upper bound on the price of anarchy for any possible game.

For the case of the social cost function Perc, we only need to replace the objective function and the third constraint in LP(Pres, *σ*, *σ*<sup>∗</sup>), respectively, with

$$\sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d k\_{r,j} \right)^2 + \beta\_r \sum\_{j=0}^d k\_{r,j} \right) \text{ and } \sum\_{r \in R} \left( \alpha\_r \left( \sum\_{j=0}^d l\_{r,j} \right)^2 + \beta\_r \sum\_{j=0}^d l\_{r,j} \right) = 1.$$

This results in the following dual program DLP(Perc, *σ*, *σ* ∗ ):

$$\begin{array}{lll}\min & \gamma & \\ \text{s.t.} & x\left(k\_{r,0}\sum\_{j=0}^{d}k\_{r,j} - l\_{r,0} - l\_{r,0}\sum\_{j=0}^{d}k\_{r,j}\right) + y\sum\_{j=1}^{d}\left(k\_{r,j}(k\_{r,j} + k\_{r,0}) - l\_{r,j}(k\_{r,j} + k\_{r,0} + 1)\right) & \\ & \quad + \gamma\left(\sum\_{j=0}^{d}l\_{r,j}\right)^{2} \geq \left(\sum\_{j=0}^{d}k\_{r,j}\right)^{2} & \forall r \in R\\ & x(k\_{r,0} - l\_{r,0}) + y\sum\_{j=1}^{d}(k\_{r,j} - l\_{r,j}) + \gamma\sum\_{j=0}^{d}l\_{r,j} \geq \sum\_{j=0}^{d}k\_{r,j} & \forall r \in R\\ & x, y \geq 0. & \end{array}$$

Note that the second constraint is the same in both DLP(Pres, *σ*, *σ*<sup>∗</sup>) and DLP(Perc, *σ*, *σ*<sup>∗</sup>). For the sake of conciseness, in the sequel, we shall drop the subscript *r* from the notation; moreover, once a dual solution is fixed, we shall denote the first and second constraint of a given dual program as *g*<sub>1</sub>(*k*, *l*) ≥ 0 and *g*<sub>2</sub>(*k*, *l*) ≥ 0, respectively, where we set *k* = (*k*<sub>0</sub>, . . . , *k<sub>d</sub>*) and *l* = (*l*<sub>0</sub>, . . . , *l<sub>d</sub>*).

When *d* = 2, we exploit an equivalent, but nicer, representation of the dual inequalities. With this aim, we set *k<sup>r</sup>* := *nr*(*σ*) and *l<sup>r</sup>* := *nr*(*σ* ∗ ) and replace *kr*,0 and *lr*,0 with *k<sup>r</sup>* − *kr*,1 − *kr*,2 and *l<sup>r</sup>* − *lr*,1 − *lr*,2, respectively. By substituting and rearranging, DLP(Pres, *σ*, *σ* ∗ ) becomes

$$\begin{array}{lll} \min & \gamma & \\ \text{s.t.} & x\left((k\_{r}-k\_{r,1}-k\_{r,2})k\_{r}-(l\_{r}-l\_{r,1}-l\_{r,2})(k\_{r}+1)\right) & \\ & \quad + y\left(k\_{r,1}(k\_{r}-k\_{r,2})-l\_{r,1}(k\_{r}-k\_{r,2}+1)+k\_{r,2}(k\_{r}-k\_{r,1})-l\_{r,2}(k\_{r}-k\_{r,1}+1)\right) & \\ & \quad + \gamma\left(l\_{r}^{2}-2l\_{r,1}l\_{r,2}\right) \geq k\_{r}^{2}-2k\_{r,1}k\_{r,2} & \forall r \in R \\ & x(k\_{r}-k\_{r,1}-k\_{r,2}-l\_{r}+l\_{r,1}+l\_{r,2})+y(k\_{r,1}+k\_{r,2}-l\_{r,1}-l\_{r,2})+\gamma l\_{r} \geq k\_{r} & \forall r \in R \\ & x, y \geq 0. & \end{array}$$

Similarly, the dual program DLP(Perc, *σ*, *σ* ∗ ) can be rewritten as:

$$\begin{array}{lll}\min & \gamma & \\ \text{s.t.} & x\left((k\_{r} - k\_{r,1} - k\_{r,2})k\_{r} - (l\_{r} - l\_{r,1} - l\_{r,2})(k\_{r} + 1)\right) & \\ & \quad + y\left(k\_{r,1}(k\_{r} - k\_{r,2}) - l\_{r,1}(k\_{r} - k\_{r,2} + 1) + k\_{r,2}(k\_{r} - k\_{r,1}) - l\_{r,2}(k\_{r} - k\_{r,1} + 1)\right) + \gamma l\_{r}^{2} \geq k\_{r}^{2} & \forall r \in R\\ & x(k\_{r} - k\_{r,1} - k\_{r,2} - l\_{r} + l\_{r,1} + l\_{r,2}) + y(k\_{r,1} + k\_{r,2} - l\_{r,1} - l\_{r,2}) + \gamma l\_{r} \geq k\_{r} & \forall r \in R\\ & x, y \geq 0. & \end{array}$$
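The substitution behind the bidimensional programs can be sanity-checked mechanically. The following Python sketch (all function names are illustrative and not part of the original text) compares, for *d* = 2, the *x*- and *y*-terms of the first dual constraint written with sums over *j* against the rearranged form in *k<sub>r</sub>* and *l<sub>r</sub>*, and checks the identity ∑<sub>*j*</sub> *k*<sub>*r*,*j*</sub><sup>2</sup> + 2*k*<sub>*r*,0</sub>(*k*<sub>*r*,1</sub> + *k*<sub>*r*,2</sub>) = *k<sub>r</sub>*<sup>2</sup> − 2*k*<sub>*r*,1</sub>*k*<sub>*r*,2</sub> used for the Pres terms. It assumes that the quantity multiplied by *x* is *k*<sub>*r*,0</sub>∑<sub>*j*</sub>*k*<sub>*r*,*j*</sub> − *l*<sub>*r*,0</sub>(∑<sub>*j*</sub>*k*<sub>*r*,*j*</sub> + 1), as in the bidimensional programs.

```python
from itertools import product


def first_lhs_general(x, y, k, l):
    """x- and y-terms of the first dual constraint, written with sums over j.

    k = (k_0, ..., k_d) and l = (l_0, ..., l_d) are the per-class congestions.
    """
    total_k = sum(k)
    x_term = x * (k[0] * total_k - l[0] - l[0] * total_k)
    y_term = y * sum(
        k[j] * (k[j] + k[0]) - l[j] * (k[j] + k[0] + 1)
        for j in range(1, len(k))
    )
    return x_term + y_term


def first_lhs_bidimensional(x, y, kr, k1, k2, lr, l1, l2):
    """Same terms after replacing k_0 with kr - k1 - k2 and l_0 with lr - l1 - l2."""
    x_term = x * ((kr - k1 - k2) * kr - (lr - l1 - l2) * (kr + 1))
    y_term = y * (
        k1 * (kr - k2) - l1 * (kr - k2 + 1)
        + k2 * (kr - k1) - l2 * (kr - k1 + 1)
    )
    return x_term + y_term


def presumed_cost_general(k):
    """Sum of squares plus the cross term 2 * k_0 * (k_1 + ... + k_d)."""
    return sum(v * v for v in k) + 2 * k[0] * sum(k[1:])


def forms_agree(max_value=3, x=3, y=5):
    """Compare both forms on every (k_0, k_1, k_2, l_0, l_1, l_2) up to max_value."""
    for k in product(range(max_value + 1), repeat=3):
        for l in product(range(max_value + 1), repeat=3):
            kr, lr = sum(k), sum(l)
            general = first_lhs_general(x, y, k, l)
            rearranged = first_lhs_bidimensional(x, y, kr, k[1], k[2], lr, l[1], l[2])
            if general != rearranged:
                return False
            if presumed_cost_general(k) != kr * kr - 2 * k[1] * k[2]:
                return False
    return True
```

Since both forms are linear in *x* and *y*, agreement for one pair of distinct integer multipliers on a grid is only a spot check, not a proof of the identity.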

In the following theorem we provide upper bounds for the price of anarchy of bidimensional affine congestion games under social cost functions Pres and Perc.

**Theorem 3.** *For each bidimensional affine congestion game* (G, C)*, PoA*(G, C) ≤ 119/33 *under the social cost function* Pres *and PoA*(G, C) ≤ 35/8 *under the social cost function* Perc*.*
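The following Python sketch illustrates how a dual solution certifies these two bounds. The multipliers used below are an assumption: they are not stated in this section and were inferred by matching the coefficients of the first dual constraint against the polynomials *θ* of Lemmas A2 and A7 in Appendix A, which gives (*x*, *y*, *γ*) = (17/11, 50/33, 119/33) for Pres and (15/8, 21/8, 35/8) for Perc. The function names are illustrative. The check confirms that 33·*g*<sub>1</sub> and 8·*g*<sub>1</sub> coincide with the two polynomials, and that both dual constraints hold on a small grid of congestion values.

```python
from fractions import Fraction as F


def nash_terms(x, y, k, k1, k2, l, l1, l2):
    """x- and y-weighted equilibrium terms of the first constraint (d = 2)."""
    return (x * ((k - k1 - k2) * k - (l - l1 - l2) * (k + 1))
            + y * (k1 * (k - k2) - l1 * (k - k2 + 1)
                   + k2 * (k - k1) - l2 * (k - k1 + 1)))


def g1_pres(x, y, gamma, k, k1, k2, l, l1, l2):
    return (nash_terms(x, y, k, k1, k2, l, l1, l2)
            + gamma * (l * l - 2 * l1 * l2) - (k * k - 2 * k1 * k2))


def g1_perc(x, y, gamma, k, k1, k2, l, l1, l2):
    return nash_terms(x, y, k, k1, k2, l, l1, l2) + gamma * l * l - k * k


def g2(x, y, gamma, k, k1, k2, l, l1, l2):
    return (x * (k - k1 - k2 - l + l1 + l2)
            + y * (k1 + k2 - l1 - l2) + gamma * l - k)


def theta_a2(a, b, c, d, e, f):
    """Polynomial of Lemma A2."""
    return (18 * a * a - a * (b + c + 51 * d - e - f) + 50 * b * f + 50 * c * e
            - 34 * b * c + 119 * d * d - 51 * d + e + f - 238 * e * f)


def theta_a7(a, b, c, d, e, f):
    """Polynomial of Lemma A7."""
    return (7 * a * a + 3 * a * (2 * b + 2 * c - 5 * d - 2 * e - 2 * f)
            + 21 * b * f + 21 * c * e - 42 * b * c + 35 * d * d
            - 15 * d - 6 * e - 6 * f)


# Assumed multipliers (x, y, gamma): inferred by coefficient matching, not quoted.
PRES = (F(17, 11), F(50, 33), F(119, 33))
PERC = (F(15, 8), F(21, 8), F(35, 8))


def feasible_tuples(n):
    """All (k, k1, k2, l, l1, l2) with k, l <= n, k1 + k2 <= k and l1 + l2 <= l."""
    for k in range(n + 1):
        for k1 in range(k + 1):
            for k2 in range(k - k1 + 1):
                for l in range(n + 1):
                    for l1 in range(l + 1):
                        for l2 in range(l - l1 + 1):
                            yield (k, k1, k2, l, l1, l2)


def check_theorem3_multipliers(n=5):
    """True if the scaled g1 equals theta and both constraints hold on the grid."""
    for t in feasible_tuples(n):
        pres, perc = g1_pres(*PRES, *t), g1_perc(*PERC, *t)
        if 33 * pres != theta_a2(*t) or 8 * perc != theta_a7(*t):
            return False
        if min(pres, perc, g2(*PRES, *t), g2(*PERC, *t)) < 0:
            return False
    return True
```

Exact rational arithmetic is used so that tight tuples, where a constraint holds with equality, are not misreported because of rounding. A finite grid is only evidence; the statement for all congestion values rests on the lemmas.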

We now show the existence of two matching lower bounding instances (the proof is deferred to Appendix B).

**Theorem 4.** *There exist two bidimensional linear congestion games* (G, C) *and* (G′, C′) *such that PoA*(G, C) ≥ 119/33 *under the social cost function* Pres *and PoA*(G′, C′) ≥ 35/8 *under the social cost function* Perc*.*

#### *6.2. Price of Stability*

In order to bound the price of stability, we can use the same primal formulations exploited for the determination of the price of anarchy with the additional constraint Φ(*σ*) ≤ Φ(*σ* ∗ ), which, by Equation (1), becomes

$$\sum\_{r\in R} \left( \alpha\_{r} \left( \sum\_{j=0}^{d} \left( k\_{r,j}^{2} + k\_{r,j} - l\_{r,j}^{2} - l\_{r,j} \right) + 2k\_{r,0} \sum\_{j=1}^{d} k\_{r,j} - 2l\_{r,0} \sum\_{j=1}^{d} l\_{r,j} \right) + 2\beta\_{r} \sum\_{j=0}^{d} (k\_{r,j} - l\_{r,j}) \right) \le 0.$$

Hence, the dual program for the social cost function Pres becomes the following one.

$$\begin{array}{lll}\min & \gamma & \\ \text{s.t.} & x\left(k\_{r,0}\sum\_{j=0}^{d}k\_{r,j} - l\_{r,0} - l\_{r,0}\sum\_{j=0}^{d}k\_{r,j}\right) + y\sum\_{j=1}^{d}\left(k\_{r,j}(k\_{r,j} + k\_{r,0}) - l\_{r,j}(k\_{r,j} + k\_{r,0} + 1)\right) & \\ & \quad + z\left(\sum\_{j=0}^{d}\left(k\_{r,j}^{2} + k\_{r,j} - l\_{r,j}^{2} - l\_{r,j}\right) + 2k\_{r,0}\sum\_{j=1}^{d}k\_{r,j} - 2l\_{r,0}\sum\_{j=1}^{d}l\_{r,j}\right) & \\ & \quad + \gamma\left(\sum\_{j=0}^{d}l\_{r,j}^{2} + 2l\_{r,0}\sum\_{j=1}^{d}l\_{r,j}\right) \ge \sum\_{j=0}^{d}k\_{r,j}^{2} + 2k\_{r,0}\sum\_{j=1}^{d}k\_{r,j} & \forall r \in R\\ & x(k\_{r,0} - l\_{r,0}) + y\sum\_{j=1}^{d}(k\_{r,j} - l\_{r,j}) + 2z\sum\_{j=0}^{d}(k\_{r,j} - l\_{r,j}) + \gamma\sum\_{j=0}^{d}l\_{r,j} \ge \sum\_{j=0}^{d}k\_{r,j} & \forall r \in R\\ & x, y, z \ge 0. & \end{array}$$

Again, for the social cost function Perc, we obtain mutatis mutandis the following dual program.

$$\begin{array}{lll}\min & \gamma & \\ \text{s.t.} & x\left(k\_{r,0}\sum\_{j=0}^{d}k\_{r,j} - l\_{r,0} - l\_{r,0}\sum\_{j=0}^{d}k\_{r,j}\right) + y\sum\_{j=1}^{d}\left(k\_{r,j}(k\_{r,j} + k\_{r,0}) - l\_{r,j}(k\_{r,j} + k\_{r,0} + 1)\right) & \\ & \quad + z\left(\sum\_{j=0}^{d}\left(k\_{r,j}^{2} + k\_{r,j} - l\_{r,j}^{2} - l\_{r,j}\right) + 2k\_{r,0}\sum\_{j=1}^{d}k\_{r,j} - 2l\_{r,0}\sum\_{j=1}^{d}l\_{r,j}\right) & \\ & \quad + \gamma\left(\sum\_{j=0}^{d}l\_{r,j}\right)^{2} \ge \left(\sum\_{j=0}^{d}k\_{r,j}\right)^{2} & \forall r \in R\\ & x(k\_{r,0} - l\_{r,0}) + y\sum\_{j=1}^{d}(k\_{r,j} - l\_{r,j}) + 2z\sum\_{j=0}^{d}(k\_{r,j} - l\_{r,j}) + \gamma\sum\_{j=0}^{d}l\_{r,j} \ge \sum\_{j=0}^{d}k\_{r,j} & \forall r \in R\\ & x, y, z \ge 0. & \end{array}$$

Again, by setting *k<sup>r</sup>* := *nr*(*σ*) and *l<sup>r</sup>* := *nr*(*σ* ∗ ) and replacing *kr*,0 and *lr*,0 with *k<sup>r</sup>* − *kr*,1 − *kr*,2 and *l<sup>r</sup>* − *lr*,1 − *lr*,2, respectively, DLP(Pres, *σ*, *σ* ∗ ) becomes

$$\begin{array}{lll} \min & \gamma & \\ \text{s.t.} & x\left((k\_{r}-k\_{r,1}-k\_{r,2})k\_{r}-(l\_{r}-l\_{r,1}-l\_{r,2})(k\_{r}+1)\right) & \\ & \quad + y\left(k\_{r,1}(k\_{r}-k\_{r,2})-l\_{r,1}(k\_{r}-k\_{r,2}+1)+k\_{r,2}(k\_{r}-k\_{r,1})-l\_{r,2}(k\_{r}-k\_{r,1}+1)\right) & \\ & \quad + z\left(k\_{r}^{2}-2k\_{r,1}k\_{r,2}-l\_{r}^{2}+2l\_{r,1}l\_{r,2}+k\_{r}-l\_{r}\right)+\gamma\left(l\_{r}^{2}-2l\_{r,1}l\_{r,2}\right) \geq k\_{r}^{2}-2k\_{r,1}k\_{r,2} & \forall r\in R \\ & x(k\_{r}-k\_{r,1}-k\_{r,2}-l\_{r}+l\_{r,1}+l\_{r,2})+y(k\_{r,1}+k\_{r,2}-l\_{r,1}-l\_{r,2})+2z(k\_{r}-l\_{r})+\gamma l\_{r}\geq k\_{r} & \forall r\in R \\ & x, y, z\geq 0. & \end{array}$$

Similarly, the dual program DLP(Perc, *σ*, *σ* ∗ ) can be rewritten as:

$$\begin{array}{ll} \min & \gamma\\ \text{s.t.} & \\ & \mathbf{x}\left((k\_{r}-k\_{r,1}-k\_{r,2})k\_{r}-(l\_{r}-l\_{r,1}-l\_{r,2})(k\_{r}+1)\right) \\ & + \mathbf{y}\left(k\_{r,1}(k\_{r}-k\_{r,2})-l\_{r,1}(k\_{r}-k\_{r,2}+1)+k\_{r,2}(k\_{r}-k\_{r,1})-l\_{r,2}(k\_{r}-k\_{r,1}+1)\right) \\ & + \mathbf{z}\left(k\_{r}^{2}-2k\_{r,1}k\_{r,2}-l\_{r}^{2}+2l\_{r,1}l\_{r,2}+k\_{r}-l\_{r}\right)+\gamma l\_{r}^{2} \geq k\_{r}^{2} & \quad \forall r\in\mathcal{R} \\ & \mathbf{x}(k\_{r}-k\_{r,1}-k\_{r,2}-l\_{r}+l\_{r,1}+l\_{r,2})+\mathbf{y}(k\_{r,1}+k\_{r,2}-l\_{r,1}-l\_{r,2})+2\mathbf{z}(k\_{r}-l\_{r})+\gamma l\_{r} \geq k\_{r} & \quad \forall r\in\mathcal{R} \\ & \mathbf{x},y,z \geq 0. \end{array}$$

**Theorem 5.** *For each bidimensional affine congestion game* (G, C)*, PoS*(G, C) ≤ 1 + 2/√7 *under the social cost function* Pres *and PoS*(G, C) ≤ 2.92 *under the social cost function* Perc*.*

**Proof.** For the social cost function Pres, set *γ* = 1 + 2/√7, *x* = *y* = 1/√7 and *z* = 1/2 + 1/(2√7). The second dual constraint is always satisfied, as min{*x*, *y*} + 2*z* ≥ 1 and max{*x*, *y*} + 2*z* ≤ *γ*. Thus, we shall focus on the first constraint *g*<sub>1</sub>(*k*, *l*) ≥ 0. For any *r* ∈ *R*, after multiplying both sides by 2√7, *g*<sub>1</sub>(*k*, *l*) ≥ 0 becomes

$$k^2(3 - \sqrt{7}) - k(2l - 1 - \sqrt{7}) + 2k\_1k\_2(\sqrt{7} - 3) + 2(k\_1l\_2 + k\_2l\_1) + (l^2 - l)(3 + \sqrt{7}) - 2l\_1l\_2(3 + \sqrt{7}) \ge 0.$$

The claim follows by applying Lemma A9 reported in Appendix A.

For the social cost function Perc, set *γ* = 2.92, *x* = 0.68, *y* = 1.3 and *z* = 0.81. Again, the second dual constraint is always satisfied, as min{*x*, *y*} + 2*z* ≥ 1 and max{*x*, *y*} + 2*z* ≤ *γ*. Thus, we shall focus again on the first constraint *g*<sub>1</sub>(*k*, *l*) ≥ 0. For any *r* ∈ *R*, after multiplying both sides by 100, *g*<sub>1</sub>(*k*, *l*) ≥ 0 becomes

$$49k^2 + k(62k\_1 + 62k\_2 - 68l - 62l\_1 - 62l\_2 + 81) + 130k\_1l\_2 + 130k\_2l\_1 - 422k\_1k\_2 + 211l^2 - 149l + 2(81l\_1l\_2 - 31l\_1 - 31l\_2) \ge 0.$$

The claim follows by applying Lemma A13 reported in Appendix A.
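The two dual solutions used in this proof can be spot-checked numerically. The Python sketch below (function names are illustrative) evaluates both constraints of the bidimensional price-of-stability programs on a small grid of congestion values. The Perc multipliers are rational, so exact arithmetic is used; the Pres multipliers involve √7, so floating point with a small tolerance is assumed to be adequate there.

```python
from fractions import Fraction as F
from math import sqrt


def g1(x, y, z, gamma, k, k1, k2, l, l1, l2, cost):
    """Left-hand side minus right-hand side of the first constraint (d = 2)."""
    nash = (x * ((k - k1 - k2) * k - (l - l1 - l2) * (k + 1))
            + y * (k1 * (k - k2) - l1 * (k - k2 + 1)
                   + k2 * (k - k1) - l2 * (k - k1 + 1)))
    potential = z * (k * k - 2 * k1 * k2 - l * l + 2 * l1 * l2 + k - l)
    if cost == "Pres":
        return (nash + potential
                + gamma * (l * l - 2 * l1 * l2) - (k * k - 2 * k1 * k2))
    return nash + potential + gamma * l * l - k * k  # Perc


def g2(x, y, z, gamma, k, k1, k2, l, l1, l2):
    """Left-hand side minus right-hand side of the second constraint."""
    return (x * (k - k1 - k2 - l + l1 + l2) + y * (k1 + k2 - l1 - l2)
            + 2 * z * (k - l) + gamma * l - k)


def perc_polynomial(k, k1, k2, l, l1, l2):
    """The polynomial displayed in the proof for Perc, i.e. 100 * g1."""
    return (49 * k * k + k * (62 * k1 + 62 * k2 - 68 * l - 62 * l1 - 62 * l2 + 81)
            + 130 * k1 * l2 + 130 * k2 * l1 - 422 * k1 * k2
            + 211 * l * l - 149 * l + 2 * (81 * l1 * l2 - 31 * l1 - 31 * l2))


S7 = sqrt(7)
PRES = (1 / S7, 1 / S7, 0.5 + 0.5 / S7, 1 + 2 / S7)          # x, y, z, gamma
PERC = (F(68, 100), F(13, 10), F(81, 100), F(292, 100))      # x, y, z, gamma


def feasible_tuples(n):
    """All (k, k1, k2, l, l1, l2) with k, l <= n, k1 + k2 <= k and l1 + l2 <= l."""
    for k in range(n + 1):
        for k1 in range(k + 1):
            for k2 in range(k - k1 + 1):
                for l in range(n + 1):
                    for l1 in range(l + 1):
                        for l2 in range(l - l1 + 1):
                            yield (k, k1, k2, l, l1, l2)


def min_slack(values, cost, n=5):
    """Smallest value taken by g1 and g2 over the grid; feasibility needs >= 0."""
    return min(min(g1(*values, *t, cost=cost), g2(*values, *t))
               for t in feasible_tuples(n))
```

A non-negative `min_slack` on the grid is consistent with the proof but does not replace Lemmas A9 and A13, which cover all congestion values.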

For these cases, unfortunately, we are not able to devise matching lower bounds. The following result is obtained by suitably extending the lower bounding instance given in Reference [17] for the price of stability of congestion games (the proof is deferred to the Appendix).

**Theorem 6.** *For any ε* > 0*, there exist two bidimensional linear congestion games* (G, C) *and* (G′, C′) *such that PoS*(G, C) ≥ (1 + √5)/2 − *ε under the social cost function* Pres *and PoS*(G′, C′) ≥ (5 + √17)/4 − *ε under the social cost function* Perc*.*

#### **7. Conclusions and Open Problems**

We have introduced *d*-dimensional (weighted) congestion games: a generalization of (weighted) congestion games able to model various interesting application scenarios. They can also be reinterpreted as a particular subclass of graphical (weighted) congestion games, namely those defined by an undirected social knowledge graph whose independence number is equal to *d*. We have provided bounds for the price of anarchy and the price of stability of these games as a function of *d* under the two fundamental social cost functions sum of the players' perceived costs and sum of the players' presumed costs. We have also considered in deeper detail the case of *d* = 2 in the presence of unweighted players only.

Closing the gap between upper and lower bounds is an intriguing and challenging open problem. In particular, we conjecture that the upper bound of *O*(*d*) for the price of anarchy of *d*-dimensional weighted congestion games is asymptotically tight (with respect to *d*), even for unweighted games.

Along the line of research of improving the performance of congestion games via some feasible strategies or coordination (e.g., taxes [27,28] or Stackelberg strategies [29,30]), another interesting research direction is partitioning the players into *d* + 1 clusters similarly as in *d*-dimensional games, to improve as much as possible the price of anarchy or the price of stability.

A further research direction is that of combining the model of multidimensional congestion games with other variants of congestion games (e.g., risk-averse congestion games [31–34] and congestion games with link failures [35–37]).

**Author Contributions:** Conceptualization, V.B., M.F., V.G., and C.V.; Methodology, V.B., M.F., V.G., and C.V.; Validation, V.B., M.F., and C.V.; Formal Analysis, V.B., M.F., and C.V.; Investigation, V.B., M.F., V.G., and C.V.; Writing Original Draft Preparation, V.B., M.F., and C.V.; Writing Review & Editing, V.B. and C.V.; Visualization, V.B., M.F., V.G., and C.V.; Supervision, V.B., M.F., and C.V.; Project Administration, V.B. and M.F.; Funding Acquisition, M.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Markets".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Technical Lemmas**

In this section we gather all technical lemmas needed to prove our main theorems.

**Lemma A1.** *For any d* ≥ 0*, let Q* = (*q*<sub>*i*,*j*</sub>)<sub>*i*,*j*∈[*d*]∪{0}</sub> *be the* (*d* + 1) × (*d* + 1) *matrix such that: (i) q*<sub>*i*,*j*</sub> = √*d if i* = *j; (ii) q*<sub>*i*,*j*</sub> = 1 *if either i* = 0 *or j* = 0*, with* (*i*, *j*) ≠ (0, 0)*; (iii) q*<sub>*i*,*j*</sub> = 0 *otherwise. We have that Q is a positive-semidefinite matrix.*

**Proof.** To show the claim, we resort to Sylvester's criterion, stating that a symmetric matrix *M* is positive-semidefinite if and only if every principal minor of *M* (i.e., the determinant of every submatrix obtained by selecting the same set of rows and columns of *M*) is non-negative. Let *A*<sub>*h*,*x*</sub> = (*a*<sub>*h*,*x*,*i*,*j*</sub>)<sub>*i*,*j*∈[*h*]</sub> be the *h* × *h* matrix such that: (i) *a*<sub>*h*,*x*,*i*,*j*</sub> = *x* if *i* = *j*; (ii) *a*<sub>*h*,*x*,*i*,*j*</sub> = 1 if (*i*, *j*) ≠ (1, 1) and either *i* = 1 or *j* = 1; (iii) *a*<sub>*h*,*x*,*i*,*j*</sub> = 0 otherwise. Each principal submatrix of *Q* that contains row and column 0 is of type *A*<sub>*h*,√*d*</sub> for some *h* ∈ [*d* + 1], while each of the remaining ones is a diagonal matrix whose diagonal entries are all equal to √*d*, so that its determinant is trivially non-negative. Thus, it suffices to show that the determinant of matrix *A*<sub>*h*,√*d*</sub>, denoted as *Det*(*A*<sub>*h*,√*d*</sub>), is non-negative for any *h* ∈ [*d* + 1].

We first show by induction on integers *h* ≥ 1 that *Det*(*A*<sub>*h*,*x*</sub>) = *x*<sup>*h*</sup> − (*h* − 1) · *x*<sup>*h*−2</sup> for any fixed *x* ∈ ℝ. If *h* = 1 we trivially get *Det*(*A*<sub>*h*,*x*</sub>) = *x* = *x*<sup>*h*</sup> − (*h* − 1) · *x*<sup>*h*−2</sup>. Now, we assume that *Det*(*A*<sub>*h*,*x*</sub>) = *x*<sup>*h*</sup> − (*h* − 1)*x*<sup>*h*−2</sup> holds for some *h* ≥ 1, and we show that *Det*(*A*<sub>*h*+1,*x*</sub>) = *x*<sup>*h*+1</sup> − *h* · *x*<sup>*h*−1</sup>. We get *Det*(*A*<sub>*h*+1,*x*</sub>) = *x* · *Det*(*A*<sub>*h*,*x*</sub>) − *x*<sup>*h*−1</sup> = *x*(*x*<sup>*h*</sup> − (*h* − 1)*x*<sup>*h*−2</sup>) − *x*<sup>*h*−1</sup> = *x*<sup>*h*+1</sup> − *h* · *x*<sup>*h*−1</sup>, where the first equality comes from the Laplace expansion for computing the determinant, and the second equality comes from the inductive hypothesis.

By using the fact that *Det*(*A*<sub>*h*,*x*</sub>) = *x*<sup>*h*</sup> − (*h* − 1) · *x*<sup>*h*−2</sup> holds for any *x* ∈ ℝ and any integer *h* ≥ 1, we have that *Det*(*A*<sub>*h*,√*d*</sub>) = (√*d*)<sup>*h*</sup> − (*h* − 1)(√*d*)<sup>*h*−2</sup> ≥ 0 for any *h* ∈ [*d* + 1], where the last inequality holds since the quantity *x*<sup>*h*</sup> − (*h* − 1)*x*<sup>*h*−2</sup> = *x*<sup>*h*−2</sup>(*x*<sup>2</sup> − (*h* − 1)) is non-negative for any *x* ≥ √(*h* − 1), and √*d* ≥ √(*h* − 1) whenever *h* ≤ *d* + 1. Thus each principal minor of *Q* is non-negative, and the claim follows.
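The closed form for the determinant used in this proof can be checked on small instances. The following Python sketch (function names are illustrative) builds the matrix *A*<sub>*h*,*x*</sub> with rational *x*, computes its determinant exactly by Gaussian elimination, and compares it with *x*<sup>*h*</sup> − (*h* − 1)*x*<sup>*h*−2</sup>. Rational values of *x* are an assumption made to keep the arithmetic exact; the lemma itself applies the formula at *x* = √*d*.

```python
from fractions import Fraction


def arrow_matrix(h, x):
    """A_{h,x}: x on the diagonal, 1 elsewhere in the first row and column, 0 otherwise."""
    x = Fraction(x)
    return [
        [x if i == j else (Fraction(1) if (i == 0 or j == 0) else Fraction(0))
         for j in range(h)]
        for i in range(h)
    ]


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def closed_form(h, x):
    """x^h - (h - 1) x^(h - 2), written so that h = 1 needs no negative exponent."""
    x = Fraction(x)
    return x if h == 1 else x ** h - (h - 1) * x ** (h - 2)
```

For example, `determinant(arrow_matrix(3, 2))` and `closed_form(3, 2)` both evaluate the 3 × 3 case with *x* = 2.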

**Lemma A2.** *Let θ* : ℤ<sup>6</sup><sub>≥0</sub> → ℚ *be the function such that*

$$\theta(a, b, c, d, e, f) = 18a^2 - a(b + c + 51d - e - f) + 50bf + 50ce - 34bc + 119d^2 - 51d + e + f - 238ef.$$

*For any* (*a*, *b*, *c*, *d*, *e*, *f*) ∈ ℤ<sup>6</sup><sub>≥0</sub> *such that a* ≥ *b* + *c and d* ≥ *e* + *f, it holds that θ*(*a*, *b*, *c*, *d*, *e*, *f*) ≥ 0*.*

**Proof.** As a first step, in order to use standard arguments from calculus, we allow the 6-tuples (*a*, *b*, *c*, *d*, *e*, *f*) to take values in the set of non-negative real numbers.

We first show that, in such an extended scenario, *θ* attains its minimum at 6-tuples (*a*, *b*, *c*, *d*, *e*, *f*) such that *b* = *c* and *e* = *f*. To this aim, consider the 6-tuple (*a*, *b*, *b* + *h*, *d*, *e*, *e* + *k*), where *h*, *k* ∈ ℝ. By definition of *θ*, we get

$$\theta\left(a, b + \tfrac{h}{2}, b + \tfrac{h}{2}, d, e + \tfrac{k}{2}, e + \tfrac{k}{2}\right) = \theta(a, b, b + h, d, e, e + k) - \frac{17h^2 - 50hk + 119k^2}{2} \le \theta(a, b, b + h, d, e, e + k),$$

where the inequality holds since the quadratic form 17*h*<sup>2</sup> − 50*hk* + 119*k*<sup>2</sup> has negative discriminant (50<sup>2</sup> − 4 · 17 · 119 < 0) and is therefore non-negative.

Hence, we do not lose generality by restricting to 6-tuples of non-negative real values (*a*, *b*, *b*, *d*, *e*, *e*) such that *a* ≥ 2*b* and *d* ≥ 2*e*. In this case *θ* becomes

$$18a^2 - a(2b + 51d - 2e) - 34b^2 + 100be + 119d^2 - 51d - 238e^2 + 2e.$$

Consider the two partial derivatives ∂*θ*/∂*b* = 100*e* − 2*a* − 68*b* and ∂*θ*/∂*e* = 2(*a* + 50*b* + 1 − 238*e*). Since they are linear and decreasing in *b* and *e*, respectively, it follows that *θ* is minimized at one of the following four cases: *b* = 0 ∧ *e* = 0, *b* = 0 ∧ *e* = *d*/2, *b* = *a*/2 ∧ *e* = 0 and *b* = *a*/2 ∧ *e* = *d*/2.

In the first case, *θ* becomes 18*a*<sup>2</sup> − 51*ad* + 119*d*<sup>2</sup> − 51*d*. Since ∂*θ*/∂*a* = 36*a* − 51*d*, *θ* is minimized at *a* = 17*d*/12. By substituting, *θ* becomes (1/8)(663*d*<sup>2</sup> − 408*d*), which is always non-negative for any *d* ∈ ℤ.

In the second case, *θ* becomes (1/2)(36*a*<sup>2</sup> − 100*ad* + 119*d*<sup>2</sup> − 100*d*). Since ∂*θ*/∂*a* = 36*a* − 50*d*, *θ* is minimized at *a* = 25*d*/18. By substituting, *θ* becomes (1/9)(223*d*<sup>2</sup> − 450*d*), which is always non-negative for any *d* ∈ ℤ \ {1, 2}.

In the third case, *θ* becomes (17/2)(*a*<sup>2</sup> − 6*ad* + 14*d*<sup>2</sup> − 6*d*). Since ∂*θ*/∂*a* = 17(*a* − 3*d*), *θ* is minimized at *a* = 3*d*. By substituting, *θ* becomes (17/2)(5*d*<sup>2</sup> − 6*d*), which is always non-negative for any *d* ∈ ℤ \ {1}.

In the fourth case, *θ* becomes (1/2)(17*a*<sup>2</sup> − 50*ad* + 119*d*<sup>2</sup> − 100*d*). Since ∂*θ*/∂*a* = 17*a* − 25*d*, *θ* is minimized at *a* = 25*d*/17. By substituting, *θ* becomes (1/17)(699*d*<sup>2</sup> − 850*d*), which is always non-negative for any *d* ∈ ℤ \ {1}.

Hence, in order to complete the proof, we are left to settle the following cases: (*a*, 0, 0, 1, 0, 0), (*a*, 0, 0, 2, 1, 1), (*a*, 0, 0, 1, 1, 0), (*a*, 0, 0, 1, 0, 1), (*a*, *a*/2, *a*/2, 1, 1, 0), (*a*, *a*/2, *a*/2, 1, 0, 1) and (*a*, *a*/2, *a*/2, 1, 0, 0).

In the case (*a*, 0, 0, 1, 0, 0), *θ* becomes 18*a*<sup>2</sup> − 51*a* + 68, which is always non-negative for any *a* ∈ ℝ. In the case (*a*, 0, 0, 2, 1, 1), *θ* becomes 18*a*<sup>2</sup> − 100*a* + 138, which is always non-negative for any *a* ∈ ℤ. In the cases (*a*, 0, 0, 1, 1, 0) and (*a*, 0, 0, 1, 0, 1), *θ* becomes 18*a*<sup>2</sup> − 50*a* + 69, which is always non-negative for any *a* ∈ ℝ. In the cases (*a*, *a*/2, *a*/2, 1, 1, 0) and (*a*, *a*/2, *a*/2, 1, 0, 1), *θ* becomes (17*a*<sup>2</sup> − 50*a* + 138)/2, which is always non-negative for any *a* ∈ ℝ. Finally, in the case (*a*, *a*/2, *a*/2, 1, 0, 0), *θ* becomes (17/2)(*a*<sup>2</sup> − 6*a* + 8), which is always non-negative for any *a* ∈ ℤ \ {3}. Hence, we are only left to consider the case (3, *b*, *c*, 1, 0, 0), for which *θ* becomes 77 − 34*bc* − 3(*b* + *c*). Since *b* + *c* ≤ 3, it holds that 77 − 34*bc* − 3(*b* + *c*) ≥ 68 − 34*bc*, which is always non-negative since *bc* ≤ 2 for any *b*, *c* ∈ ℤ<sub>≥0</sub> such that *b* + *c* ≤ 3.
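Two ingredients of this proof lend themselves to a quick computational check: the symmetrization identity, which states that replacing (*b*, *b* + *h*) and (*e*, *e* + *k*) by their averages changes *θ* by exactly (17*h*<sup>2</sup> − 50*hk* + 119*k*<sup>2</sup>)/2, and the non-negativity of *θ* on feasible integer tuples. The Python sketch below (function names are illustrative) tests both on small ranges with exact arithmetic; the chosen ranges are an arbitrary assumption.

```python
from fractions import Fraction as F


def theta(a, b, c, d, e, f):
    """Polynomial of Lemma A2."""
    return (18 * a * a - a * (b + c + 51 * d - e - f) + 50 * b * f + 50 * c * e
            - 34 * b * c + 119 * d * d - 51 * d + e + f - 238 * e * f)


def symmetrization_gap(a, b, h, d, e, k):
    """theta at (a, b, b+h, d, e, e+k) minus theta at the averaged tuple."""
    half_h, half_k = F(h, 2), F(k, 2)
    asymmetric = theta(a, b, b + h, d, e, e + k)
    averaged = theta(a, b + half_h, b + half_h, d, e + half_k, e + half_k)
    return asymmetric - averaged


def min_theta(n):
    """Minimum of theta over integer tuples with a, d <= n, b + c <= a, e + f <= d."""
    return min(
        theta(a, b, c, d, e, f)
        for a in range(n + 1)
        for b in range(a + 1)
        for c in range(a - b + 1)
        for d in range(n + 1)
        for e in range(d + 1)
        for f in range(d - e + 1)
    )
```

On the grid the minimum is attained with value 0, for instance at (3, 0, 0, 2, 1, 1) and (3, 1, 2, 1, 0, 0), which matches the special cases handled separately at the end of the proof.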

**Lemma A3.** *Let λ* : ℤ<sup>2</sup><sub>≥0</sub> → ℚ *be the function such that λ*(*a*, *d*) = *a*<sup>2</sup> − 3*ad* + 5*d*<sup>2</sup> − 3*d. For any* (*a*, *d*) ∈ ℤ<sup>2</sup><sub>≥0</sub> *such that d* ≠ 1*, it holds that λ*(*a*, *d*) ≥ 0*.*

**Proof.** Since ∂*λ*/∂*a* = 2*a* − 3*d*, *λ* is minimized at *a* = 3*d*/2. By substituting, we get (1/4)(11*d*<sup>2</sup> − 12*d*), which is non-negative for any *d* ∈ ℤ \ {1}.

**Lemma A4.** *Let λ* : ℤ<sup>2</sup><sub>≥0</sub> → ℚ *be the function such that λ*(*a*, *d*) = *a*<sup>2</sup> − 6*ad* + 14*d*<sup>2</sup> − 6*d. For any* (*a*, *d*) ∈ ℤ<sup>2</sup><sub>≥0</sub> *such that d* ≠ 1*, it holds that λ*(*a*, *d*) ≥ 0*.*

**Proof.** Since ∂*λ*/∂*a* = 2*a* − 6*d*, *λ* is minimized at *a* = 3*d*. By substituting, we get 5*d*<sup>2</sup> − 6*d*, which is non-negative for any *d* ∈ ℤ \ {1}.

**Lemma A5.** *Let λ* : ℤ<sup>2</sup><sub>≥0</sub> → ℚ *be the function such that λ*(*a*, *d*) = 20*a*<sup>2</sup> − 84*ad* + 259*d*<sup>2</sup> − 168*d. For any* (*a*, *d*) ∈ ℤ<sup>2</sup><sub>≥0</sub>*, it holds that λ*(*a*, *d*) ≥ 0*.*

**Proof.** Since ∂*λ*/∂*a* = 40*a* − 84*d*, *λ* is minimized at *a* = 21*d*/10. By substituting, we get (14/5)(61*d*<sup>2</sup> − 60*d*), which is non-negative for any *d* ∈ ℤ.

**Lemma A6.** *Let λ* : ℤ<sup>2</sup><sub>≥0</sub> → ℚ *be the function such that λ*(*a*, *d*) = 13*a*<sup>2</sup> − 21*ad* + 35*d*<sup>2</sup> − 21*d. For any* (*a*, *d*) ∈ ℤ<sup>2</sup><sub>≥0</sub>*, it holds that λ*(*a*, *d*) ≥ 0*.*

**Proof.** Since ∂*λ*/∂*a* = 26*a* − 21*d*, *λ* is minimized at *a* = 21*d*/26. By substituting, we get (7/52)(197*d*<sup>2</sup> − 156*d*), which is non-negative for any *d* ∈ ℤ.
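Lemmas A3 to A6 share one pattern: a quadratic *pa*<sup>2</sup> − *qad* + *rd*<sup>2</sup> − *sd* is minimized over real *a* at *a* = *qd*/(2*p*), and the sign of the resulting polynomial in *d* is then inspected. The Python sketch below (function and dictionary names are illustrative) computes that minimum exactly; in each lemma it is a positive multiple of the polynomial in *d* quoted in the proof. The sketch also checks the four statements on a small integer grid, whose size is an arbitrary assumption.

```python
from fractions import Fraction as F

# Coefficients (p, q, r, s) of p*a^2 - q*a*d + r*d^2 - s*d for each lemma.
LEMMAS = {
    "A3": (1, 3, 5, 3),
    "A4": (1, 6, 14, 6),
    "A5": (20, 84, 259, 168),
    "A6": (13, 21, 35, 21),
}


def lam(p, q, r, s, a, d):
    """Value of the quadratic at (a, d)."""
    return p * a * a - q * a * d + r * d * d - s * d


def min_over_a(p, q, r, s, d):
    """Exact minimum over real a, attained at the stationary point a = q*d / (2*p)."""
    a_star = F(q * d, 2 * p)
    return lam(p, q, r, s, a_star, d)


def nonnegative_on_grid(name, n=20, excluded_d=()):
    """Check lam >= 0 for all integers 0 <= a, d <= n, skipping the excluded d values."""
    p, q, r, s = LEMMAS[name]
    return all(
        lam(p, q, r, s, a, d) >= 0
        for d in range(n + 1) if d not in excluded_d
        for a in range(n + 1)
    )
```

For Lemmas A3 and A4 the value *d* = 1 is excluded, as in their statements, because the real minimum is negative there.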

**Lemma A7.** *Let θ* : ℤ<sup>6</sup><sub>≥0</sub> → ℚ *be the function such that*

$$\theta(a, b, c, d, e, f) = 7a^2 + 3a(2b + 2c - 5d - 2e - 2f) + 21bf + 21ce - 42bc + 35d^2 - 15d - 6e - 6f.$$

*For any* (*a*, *b*, *c*, *d*, *e*, *f*) ∈ ℤ<sup>6</sup><sub>≥0</sub> *such that a* ≥ *b* + *c and d* ≥ *e* + *f, it holds that θ*(*a*, *b*, *c*, *d*, *e*, *f*) ≥ 0*.*

**Proof.** As a first step, in order to use standard arguments from calculus, we allow the 6-tuples (*a*, *b*, *c*, *d*, *e*, *f*) to take values in the set of non-negative real numbers. Since ∂*θ*/∂*c* = 3(2*a* − 14*b* + 7*e*) and ∂*θ*/∂*f* = 3(7*b* − 2*a* − 2) do not depend on *c* and *f*, respectively, *θ* is linear in each of *c* and *f*, and it is therefore minimized at one of the following four cases: *c* = 0 ∧ *f* = 0, *c* = 0 ∧ *f* = *d* − *e*, *c* = *a* − *b* ∧ *f* = 0 and *c* = *a* − *b* ∧ *f* = *d* − *e*.

In the first case, we get *θ* = 7*a*<sup>2</sup> + 3*a*(2*b* − 5*d* − 2*e*) + 35*d*<sup>2</sup> − 15*d* − 6*e*. Since ∂*θ*/∂*b* = 6*a*, *θ* is minimized at *b* = 0, which yields *θ* = 7*a*<sup>2</sup> − 3*a*(5*d* + 2*e*) + 35*d*<sup>2</sup> − 15*d* − 6*e*. Since ∂*θ*/∂*e* = −6(*a* + 1), *θ* is minimized at *e* = *d*, which yields *θ* = 7(*a*<sup>2</sup> − 3*ad* + 5*d*<sup>2</sup> − 3*d*). The claim then follows for any *d* ≠ 1 by applying Lemma A3. For the leftover tuples of the form (*a*, 0, 0, 1, 1, 0), we get *θ* = 7(*a*<sup>2</sup> − 3*a* + 2), which is always non-negative for any *a* ∈ ℤ.

In the second case, we get *θ* = 7*a*<sup>2</sup> + 3*a*(2*b* − 7*d*) + 7(3*b*(*d* − *e*) + 5*d*<sup>2</sup> − 3*d*). Since ∂*θ*/∂*b* = 3(2*a* + 7(*d* − *e*)), *θ* is minimized at *b* = 0, which yields *θ* = 7(*a*<sup>2</sup> − 3*ad* + 5*d*<sup>2</sup> − 3*d*). The claim then follows for any *d* ≠ 1 by applying Lemma A3. For the leftover tuples of the form (*a*, 0, 0, 1, *e*, 1 − *e*), we get *θ* = 7(*a*<sup>2</sup> − 3*a* + 2), which is always non-negative for any *a* ∈ ℤ.

In the third case, we get *θ* = 13*a*<sup>2</sup> − 3*a*(14*b* + 5(*d* − *e*)) + 42*b*<sup>2</sup> − 21*be* + 35*d*<sup>2</sup> − 15*d* − 6*e*. Since ∂*θ*/∂*e* = 3(5*a* − 7*b* − 2), *θ* is minimized at either *e* = 0 or *e* = *d*. For *e* = *d*, we get *θ* = 13*a*<sup>2</sup> − 42*ab* + 42*b*<sup>2</sup> − 21*bd* + 35*d*<sup>2</sup> − 21*d*. Since ∂*θ*/∂*b* = −21(2*a* − 4*b* + *d*), *θ* is minimized at *b* = (2*a* + *d*)/4. This yields *θ* = (1/8)(20*a*<sup>2</sup> − 84*ad* + 259*d*<sup>2</sup> − 168*d*) and the claim then follows by applying Lemma A5. For *e* = 0, we get *θ* = 13*a*<sup>2</sup> − 3*a*(14*b* + 5*d*) + 42*b*<sup>2</sup> + 35*d*<sup>2</sup> − 15*d*. Since ∂*θ*/∂*b* = 42(2*b* − *a*), *θ* is minimized at *b* = *a*/2, which yields *θ* = (5/2)(*a*<sup>2</sup> − 6*ad* + 14*d*<sup>2</sup> − 6*d*), and the claim then follows for any *d* ≠ 1 by applying Lemma A4. For the leftover tuples of the form (*a*, *a*/2, *a*/2, 1, 0, 0), we get *θ* = (5/2)(*a*<sup>2</sup> − 6*a* + 8), which is always non-negative for any *a* ∈ ℤ \ {3}. Hence, we are still left to consider the tuples of the form (3, *b*, 3 − *b*, 1, 0, 0). In this case, we get *θ* = 42*b*<sup>2</sup> − 126*b* + 92, which is always non-negative for any *b* ∈ ℤ.

In the fourth case, we get *θ* = 13*a*<sup>2</sup> − 21*a*(2*b* + *d* − *e*) + 7(6*b*<sup>2</sup> + 3*b*(*d* − 2*e*) + 5*d*<sup>2</sup> − 3*d*). Since ∂*θ*/∂*e* = 21(*a* − 2*b*), *θ* is minimized at either *e* = 0 or *e* = *d*. For *e* = 0, we get *θ* = 13*a*<sup>2</sup> − 21*a*(2*b* + *d*) + 7(6*b*<sup>2</sup> + 3*bd* + 5*d*<sup>2</sup> − 3*d*). Since ∂*θ*/∂*b* = −21(2*a* − 4*b* − *d*), *θ* is minimized at either *b* = 0 or *b* = (2*a* − *d*)/4. The first case yields *θ* = 13*a*<sup>2</sup> − 21*ad* + 35*d*<sup>2</sup> − 21*d* and the claim then follows by applying Lemma A6, while the second one yields *θ* = (20*a*<sup>2</sup> − 84*ad* + 259*d*<sup>2</sup> − 168*d*)/8 and the claim then follows by applying Lemma A5. For *e* = *d*, we get *θ* = 13*a*<sup>2</sup> − 42*ab* + 7(6*b*<sup>2</sup> − 3*bd* + 5*d*<sup>2</sup> − 3*d*). Since ∂*θ*/∂*b* = −21(2*a* − 4*b* + *d*), *θ* is minimized at *b* = (2*a* + *d*)/4, which yields *θ* = (20*a*<sup>2</sup> − 84*ad* + 259*d*<sup>2</sup> − 168*d*)/8, and the claim then follows by applying Lemma A5.

**Lemma A8.** *Let $\lambda : \mathbb{Z}_{\geq 0}^{2} \to \mathbb{Q}$ be the function such that $\lambda(a, d) = \frac{3-\sqrt{7}}{2}a^2 + (1 + \sqrt{7} - 2d)a + (3 + \sqrt{7})\left(\frac{d^2}{2} - d\right)$. For any $(a, d) \in \mathbb{Z}_{\geq 0}^{2} \setminus \{(0, 1), (1, 1), (1, 2)\}$, it holds that $\lambda(a, d) \geq 0$.*

**Proof.** Since $\frac{\delta\lambda}{\delta a} = (3 + \sqrt{7})a - 2d + 1 + \sqrt{7}$, $\lambda$ is minimized at either $a = 0$ or $a = \frac{2d - 1 - \sqrt{7}}{3 + \sqrt{7}}$.

In the first case, $\lambda$ becomes $\frac{3-\sqrt{7}}{2}d(d - 2)$, which is non-negative for any $d \in \mathbb{Z}_{\geq 0} \setminus \{1\}$.

In the second case, $\lambda$ becomes $\frac{1}{2}\left(3(\sqrt{7} - 1)d^2 + 2d(\sqrt{7} - 7) + \sqrt{7} - 5\right)$, which is non-negative for any $d \in \mathbb{Z}_{\geq 0} \setminus \{1, 2\}$. For the leftover case $d = 2$, $\lambda$ becomes $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} - 3)a$, which is non-negative for any $a \in \mathbb{Z} \setminus \{1\}$. For the other case $d = 1$, $\lambda$ becomes $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} - 1)a - \frac{3+\sqrt{7}}{2}$, which is non-negative for any $a \in \mathbb{Z} \setminus \{0, 1\}$.

**Lemma A9.** *Let $\theta : \mathbb{Z}_{\geq 0}^{6} \to \mathbb{Q}$ be the function such that $\theta(a, b, c, d, e, f) = a^2(3 - \sqrt{7}) - a(2d - 1 - \sqrt{7}) + 2bc(\sqrt{7} - 3) + 2(bf + ce) + (d^2 - d)(3 + \sqrt{7}) - 2(3 + \sqrt{7})ef$. For any $(a, b, c, d, e, f) \in \mathbb{Z}_{\geq 0}^{6}$ such that $a \geq b + c$ and $d \geq e + f$, it holds that $\theta(a, b, c, d, e, f) \geq 0$.*

**Proof.** Note first that, for 6-tuples of the form $(0, b, c, 1, e, f)$, it holds that $\theta = 0$, since $a = 0 \Rightarrow b = c = 0$ and $d = 1 \Rightarrow ef = 0$; for 6-tuples of the form $(1, b, c, 1, e, f)$, it holds that $\theta = 2(1 + bf + ce) > 0$, since $a = d = 1 \Rightarrow bc = ef = 0$; and for 6-tuples of the form $(1, b, c, 2, e, f)$, it holds that $\theta = 2bf + 2ce - 2(3 + \sqrt{7})ef + 2(3 + \sqrt{7}) \geq 0$, since $d = 2 \Rightarrow ef \leq 1$. Hence, in the remainder of the proof, we do not consider the cases $a = 0 \wedge d = 1$, $a = d = 1$ and $a = 1 \wedge d = 2$.

To use standard arguments from calculus, we first allow the 6-tuples $(a, b, c, d, e, f)$ to take values in the set of non-negative real numbers. Since $\frac{\delta\theta}{\delta c} = 2(b(\sqrt{7} - 3) + e)$ and $\frac{\delta\theta}{\delta f} = 2(b - (\sqrt{7} + 3)e)$, $\theta$ is linear in each of $c$ and $f$, so it is minimized in one of the following four cases: $c = 0 \wedge f = 0$, $c = 0 \wedge f = d - e$, $c = a - b \wedge f = 0$ and $c = a - b \wedge f = d - e$.

In the first case, we get $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d)$. The claim follows by applying Lemma A8, since $\theta \geq \lambda$.

In the second case, we get $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d) + 2(d - e)(b - (3 + \sqrt{7})e)$. Since $\frac{\delta\theta}{\delta b} = 2(d - e)$, $\theta$ is minimized at $b = 0$, which yields $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d) - 2(d - e)(3 + \sqrt{7})e$. Since $\frac{\delta\theta}{\delta e} = 4(3 + \sqrt{7})e - 2d(3 + \sqrt{7})$, $\theta$ is minimized for $e = \frac{d}{2}$. In this case, $\theta$ becomes $(3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})\left(\frac{d^2}{2} - d\right)$. The claim follows by applying Lemma A8, since $\theta \geq \lambda$.

In the third case, we get $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 + 2e - 2d)a + (3 + \sqrt{7})(d^2 - d) + 2b^2(3 - \sqrt{7}) + 2b((\sqrt{7} - 3)a - e)$. Since $\frac{\delta\theta}{\delta e} = 2(a - b)$, $\theta$ is minimized for $e = 0$, which yields $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d) + 2b^2(3 - \sqrt{7}) + 2ab(\sqrt{7} - 3)$. Since $\frac{\delta\theta}{\delta b} = 4(3 - \sqrt{7})b - 2a(3 - \sqrt{7})$, $\theta$ is minimized for $b = \frac{a}{2}$. In this case, $\theta$ becomes $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d)$. The claim follows by applying Lemma A8, since $\theta \geq \lambda$.

In the fourth case, we get $\theta = (3 - \sqrt{7})a^2 + (\sqrt{7} + 1 + 2e - 2d)a + (3 + \sqrt{7})(d^2 - d) + 2b^2(3 - \sqrt{7}) + 2(\sqrt{7} - 3)ab + 2bd - 4be - 2(3 + \sqrt{7})de + 2(3 + \sqrt{7})e^2$. Since $\frac{\delta\theta}{\delta b} = 4(3 - \sqrt{7})b + 2(\sqrt{7} - 3)a + 2d - 4e$, $\theta$ is minimized at either $b = 0$ or $b = \frac{(3-\sqrt{7})a + 2e - d}{2(3-\sqrt{7})}$. For $b = 0$, $\theta$ becomes $(3 - \sqrt{7})a^2 + (\sqrt{7} + 1 + 2e - 2d)a + (3 + \sqrt{7})(d^2 - d) - 2(3 + \sqrt{7})de + 2(3 + \sqrt{7})e^2$. Since $\frac{\delta\theta}{\delta e} = 2a - 2(3 + \sqrt{7})d + 4(3 + \sqrt{7})e$, $\theta$ is minimized at either $e = 0$ or $e = \frac{(3+\sqrt{7})d - a}{2(3+\sqrt{7})}$. In these two cases, $\theta$ becomes, respectively, $(3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - 2d)a + (3 + \sqrt{7})(d^2 - d)$ and $\frac{3}{4}(3 - \sqrt{7})a^2 + (\sqrt{7} + 1 - d)a + (3 + \sqrt{7})\left(\frac{d^2}{2} - d\right)$, which are non-negative because of Lemma A8 and the fact that $\theta \geq \lambda$. For $b = \frac{(3-\sqrt{7})a + 2e - d}{2(3-\sqrt{7})}$, $\theta$ becomes $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} + 1 - d)a + (3 + \sqrt{7})\left(\frac{3d^2}{4} - d\right) - (3 + \sqrt{7})de + (3 + \sqrt{7})e^2$. Since $\frac{\delta\theta}{\delta e} = (3 + \sqrt{7})(2e - d)$, $\theta$ is minimized at either $e = 0$ or $e = \frac{d}{2}$. In these two cases, $\theta$ becomes, respectively, $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} + 1 - d)a + (3 + \sqrt{7})\left(\frac{3d^2}{4} - d\right)$ and $\frac{3-\sqrt{7}}{2}a^2 + (\sqrt{7} + 1 - d)a + (3 + \sqrt{7})\left(\frac{d^2}{2} - d\right)$, which are non-negative because of Lemma A8 and the fact that $\theta \geq \lambda$.

**Lemma A10.** *Let $\lambda : \mathbb{Z}_{\geq 0}^{2} \to \mathbb{Q}$ be the function such that $\lambda(a, d) = 49a^2 + a(81 - 130d) + 211d^2 - 211d$. For any $(a, d) \in \mathbb{Z}_{\geq 0}^{2}$, it holds that $\lambda(a, d) \geq 0$.*

**Proof.** Since $\frac{\delta\lambda}{\delta a} = 98a - 130d + 81$, $\lambda$ is minimized at either $a = 0$ or $a = \frac{130d - 81}{98}$.

In the first case, $\lambda$ becomes $211d(d - 1)$, which is non-negative for any $d \in \mathbb{Z}$.

In the second case, $\lambda$ becomes, up to the positive factor $\frac{2}{49}$, $d(3057d - 2537) - \frac{6561}{8}$, which is non-negative for any $d \in \mathbb{Z} \setminus \{0, 1\}$. For the leftover case $d = 0$, $\lambda$ becomes $49a^2 + 81a$, which is non-negative for any non-negative $a \in \mathbb{R}$. For the other case $d = 1$, $\lambda$ becomes $49a(a - 1)$, which is non-negative for any $a \in \mathbb{Z}$.

**Lemma A11.** *Let $\lambda : \mathbb{Z}_{\geq 0}^{2} \to \mathbb{Q}$ be the function such that $\lambda(a, d) = 11a^2 + 2a(81 - 68d) + 422d^2 - 298d$. For any $(a, d) \in \mathbb{Z}_{\geq 0}^{2}$, it holds that $\lambda(a, d) \geq 0$.*

**Proof.** Since $\frac{\delta\lambda}{\delta a} = 22a - 2(68d - 81)$, $\lambda$ is minimized at either $a = 0$ or $a = \frac{68d - 81}{11}$.

In the first case, $\lambda$ becomes $2d(211d - 149)$, which is non-negative for any $d \in \mathbb{Z}$.

In the second case, $\lambda$ becomes, up to the positive factor $\frac{2}{11}$, $d(9d + 3869) - \frac{6561}{2}$, which is non-negative for any $d \in \mathbb{Z}_{\geq 0} \setminus \{0\}$. For the leftover case $d = 0$, $\lambda$ becomes $11a^2 + 162a$, which is non-negative for any non-negative $a \in \mathbb{R}$.

**Lemma A12.** *Let $\lambda : \mathbb{Z}_{\geq 0}^{2} \to \mathbb{Q}$ be the function such that $\lambda(a, d) = 2321a^2 + 422a(81 - 65d) + 84817d^2 - 89042d$. For any $(a, d) \in \mathbb{Z}_{\geq 0}^{2} \setminus \{(0, 1)\}$, it holds that $\lambda(a, d) \geq 0$.*

**Proof.** Since $\frac{\delta\lambda}{\delta a} = 4642a - 422(65d - 81)$, $\lambda$ is minimized at either $a = 0$ or $a = \frac{65d - 81}{11}$.

In the first case, $\lambda$ becomes $84817d^2 - 89042d$, which is non-negative for any $d \in \mathbb{Z} \setminus \{1\}$. In the second case, $\lambda$ becomes, up to the positive factor $\frac{8}{11}$, $d(5189d + 155296) - \frac{1384371}{8}$, which is non-negative for any $d \in \mathbb{Z}_{\geq 0} \setminus \{0, 1\}$. For the leftover case $d = 0$, $\lambda$ becomes, up to the positive factor 211, $11a^2 + 162a$, which is non-negative for any non-negative $a \in \mathbb{R}$. For the other case $d = 1$, $\lambda$ becomes, up to the positive factor 211, $a(11a + 32) - \frac{4225}{211}$, which is non-negative for any $a \in \mathbb{Z}_{\geq 0} \setminus \{0\}$.

**Lemma A13.** *Let $\theta : \mathbb{Z}_{\geq 0}^{6} \to \mathbb{Q}$ be the function such that $\theta(a, b, c, d, e, f) = 49a^2 + a(62b + 62c - 68d - 62e - 62f + 81) + 130bf + 130ce - 422bc + 211d^2 - 149d + 162ef - 62e - 62f$. For any $(a, b, c, d, e, f) \in \mathbb{Z}_{\geq 0}^{6}$ such that $a \geq b + c$ and $d \geq e + f$, it holds that $\theta(a, b, c, d, e, f) \geq 0$.*

**Proof.** To use standard arguments from calculus, we first allow the 6-tuples $(a, b, c, d, e, f)$ to take values in the set of non-negative real numbers. Since $\frac{\delta\theta}{\delta c} = 62a - 422b + 130e$ and $\frac{\delta\theta}{\delta f} = -2(31a - 65b - 81e + 31)$, $\theta$ is linear in each of $c$ and $f$, so it is minimized in one of the following four cases: $c = 0 \wedge f = 0$, $c = 0 \wedge f = d - e$, $c = a - b \wedge f = 0$ and $c = a - b \wedge f = d - e$.

In the first case, we get $\theta = 49a^2 + a(62b - 68d - 62e + 81) + 211d^2 - 149d - 62e$. Since $\frac{\delta\theta}{\delta e} = -62(a + 1)$, $\theta$ is minimized at $e = d$, which yields $\theta = 49a^2 + a(62b - 130d + 81) + 211d^2 - 211d$. Since $\frac{\delta\theta}{\delta b} = 62a$, $\theta$ is minimized for $b = 0$. In this case, $\theta$ becomes $49a^2 + a(81 - 130d) + 211d^2 - 211d$. The claim follows by applying Lemma A10.

In the second case, we get $\theta = 49a^2 + a(62b - 130d + 81) + 130b(d - e) + 211d^2 + d(162e - 211) - 162e^2$. Since $\frac{\delta\theta}{\delta b} = 62a + 130(d - e)$, $\theta$ is minimized at $b = 0$, which yields $\theta = 49a^2 + a(81 - 130d) + 211d^2 + d(162e - 211) - 162e^2$. Since $\frac{\delta\theta}{\delta e} = 162d - 324e$, $\theta$ is minimized at either $e = 0$ or $e = d$. In both cases, $\theta$ becomes $49a^2 + a(81 - 130d) + 211d^2 - 211d$ and the claim follows by applying Lemma A10.

In the third case, we get $\theta = 111a^2 - a(422b + 68d - 68e - 81) + 422b^2 - 130be + 211d^2 - 149d - 62e$. Since $\frac{\delta\theta}{\delta b} = -422a + 844b - 130e$, $\theta$ is minimized at $b = \frac{211a + 65e}{422}$, which yields (here and below, $\theta$ is rescaled by positive constants, which does not affect its sign) $\theta = 2321a^2 + 422a(3e + 81 - 68d) + 89042d^2 - 62878d - 4225e^2 - 26164e$. Since $\frac{\delta\theta}{\delta e} = 1266a - 8450e - 26164$, $\theta$ is minimized at either $e = 0$ or $e = d$. For $e = 0$, $\theta$ becomes $11a^2 + 2a(81 - 68d) + 422d^2 - 298d$ and the claim follows by applying Lemma A11. For $e = d$, $\theta$ becomes $2321a^2 + 422a(81 - 65d) + 84817d^2 - 89042d$ and the claim follows for any 6-tuple $(a, b, c, d, e, f)$ such that $(a, d) \neq (0, 1)$ by applying Lemma A12. Hence, we are left to consider the 6-tuples of the form $(0, 0, 0, 1, e, 0)$. In this case, $\theta$ becomes $62(1 - e)$, which is non-negative since $e \in \{0, 1\}$.

In the fourth case, we get $\theta = 111a^2 - a(422b + 130d - 130e - 81) + 422b^2 + 130b(d - 2e) + 211d^2 + d(162e - 211) - 162e^2$. Since $\frac{\delta\theta}{\delta b} = -422a + 844b + 130(d - 2e)$, $\theta$ is minimized at either $b = 0$ or $b = \frac{211a - 65(d - 2e)}{422}$. For $b = 0$, $\theta$ becomes $111a^2 - a(130d - 130e - 81) + 211d^2 + d(162e - 211) - 162e^2$. Since $\frac{\delta\theta}{\delta e} = 130a + 162d - 324e$, $\theta$ is minimized at either $e = 0$ or $e = d$. In these two cases, $\theta$ becomes, respectively, $111a^2 + a(81 - 130d) + 211d^2 - 211d$ and $111a^2 + 81a + 211d^2 - 211d$, which are non-negative because of Lemma A10 and the fact that $\theta \geq \lambda$. For $b = \frac{211a - 65(d - 2e)}{422}$, $\theta$ becomes, after rescaling by the positive factor 422, $2321a^2 + 422a(81 - 65d) + 84817d^2 + 2d(42632e - 44521) - 85264e^2$. Since $\frac{\delta\theta}{\delta e} = 85264d - 170528e$, $\theta$ is minimized at either $e = 0$ or $e = d$. In both cases, $\theta$ becomes $2321a^2 + 422a(81 - 65d) + 84817d^2 - 89042d$ and the claim follows for any 6-tuple $(a, b, c, d, e, f)$ such that $(a, d) \neq (0, 1)$ by applying Lemma A12. Hence, we are left to consider the 6-tuples of the form $(0, 0, 0, 1, e, 1 - e)$. In this case, $\theta$ becomes $162e(1 - e)$, which is non-negative since $e \in \{0, 1\}$.
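As an informal sanity check of Lemma A13 (not a substitute for the proof), the inequality can be tested exhaustively on a small grid of integer 6-tuples. The sketch below is illustrative only: the function names and the grid bound are assumptions introduced here, and passing on a finite grid says nothing about larger tuples.

```python
from itertools import product


def theta(a, b, c, d, e, f):
    """The function theta of Lemma A13, transcribed term by term."""
    return (49 * a * a
            + a * (62 * b + 62 * c - 68 * d - 62 * e - 62 * f + 81)
            + 130 * b * f + 130 * c * e - 422 * b * c
            + 211 * d * d - 149 * d
            + 162 * e * f - 62 * e - 62 * f)


def min_theta_on_grid(bound):
    """Smallest value of theta over feasible integer tuples with entries in 0..bound."""
    best = None
    for a, b, c, d, e, f in product(range(bound + 1), repeat=6):
        # Feasibility constraints from the statement of the lemma.
        if a >= b + c and d >= e + f:
            value = theta(a, b, c, d, e, f)
            if best is None or value < best:
                best = value
    return best


if __name__ == "__main__":
    print(min_theta_on_grid(5))
```

The boundary tuples singled out in the proof, such as $(0, 0, 0, 1, 1, 0)$, are exactly the ones where the value drops to zero.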

#### **Appendix B. Missing Proofs**

**Theorem A1** (Claim of Theorem 4)**.** *There exist two bidimensional linear congestion games* (G, C) *and* (G′, C′) *such that PoA*(G, C) ≥ $\frac{119}{33}$ *under the social cost function* Pres *and PoA*(G′, C′) ≥ $\frac{35}{8}$ *under the social cost function* Perc*.*

**Proof.** For the social cost function Pres, consider the game (G, C) depicted in Figure A1a. First, we show that $\sigma$ is a pure Nash equilibrium for (G, C), that is, no player can lower her perceived cost by switching to her optimal strategy. Player 1 is paying 27 · 2 + 46 = 100; by switching to $\sigma^{\ast}_1$, she pays 7 · 4 + 18 · 4 = 100. Player 2 is paying 27 · 2 + 42 + 56 = 152; by switching to $\sigma^{\ast}_2$, she pays 17 · 4 + 21 · 4 = 152. Player 3 is paying 27 · 2 + 42 = 96; by switching to $\sigma^{\ast}_3$, she pays 7 · 4 + 17 · 4 = 96. Player 4 is paying 27 · 2 + 46 + 56 = 156; by switching to $\sigma^{\ast}_4$, she pays 18 · 4 + 21 · 4 = 156. Player 5 is paying 7 · 3 + 17 · 3 + 21 · 3 = 135; by switching to $\sigma^{\ast}_5$, she pays 27 · 5 = 135. Player 6 is paying 7 · 3 + 18 · 3 + 21 · 3 = 138; by switching to $\sigma^{\ast}_6$, she pays 46 · 3 = 138. Player 7 is paying 7 · 3 + 18 · 3 + 17 · 3 = 126; by switching to $\sigma^{\ast}_7$, she pays 42 · 3 = 126. Player 8 is paying 18 · 3 + 17 · 3 + 21 · 3 = 168; by switching to $\sigma^{\ast}_8$, she pays 56 · 3 = 168.

The price of anarchy of (G, C) is then lower bounded by the ratio

$$\frac{100+152+96+156+135+138+126+168}{25+38+24+39+27+46+42+56} = \frac{1071}{297} = \frac{119}{33}.$$
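The arithmetic behind this ratio can be re-checked mechanically. The following Python sketch only re-adds the numbers quoted in the proof and in the displayed fraction; the variable names are illustrative and not part of the original.

```python
from fractions import Fraction

# Costs paid by players 1..8 in the equilibrium sigma, as listed in the proof.
equilibrium_costs = [100, 152, 96, 156, 135, 138, 126, 168]
# Terms of the denominator, copied from the displayed ratio.
denominator_terms = [25, 38, 24, 39, 27, 46, 42, 56]

# Exact rational arithmetic avoids any floating-point rounding.
pres_ratio = Fraction(sum(equilibrium_costs), sum(denominator_terms))
print(pres_ratio)  # 119/33
```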

For the social cost function Perc, consider the game (G′, C′) depicted in Figure A1b. First, we show that $\sigma$ is a pure Nash equilibrium for (G′, C′), that is, no player can lower her perceived cost by switching to her optimal strategy. Player 1 is paying 1418 + 958 + 189 · 2 = 2754; by switching to $\sigma^{\ast}_1$, she pays 918 · 3 = 2754. Player 2 is paying 616 + 221 + 189 · 2 = 1215; by switching to $\sigma^{\ast}_2$, she pays 405 · 3 = 1215. Player 3 is paying 1418 + 616 + 189 · 2 = 2412; by switching to $\sigma^{\ast}_3$, she pays 804 · 3 = 2412. Player 4 is paying 958 + 221 + 189 · 2 = 1557; by switching to $\sigma^{\ast}_4$, she pays 519 · 3 = 1557. Player 5 is paying (918 + 405 + 804) · 2 = 4254; by switching to $\sigma^{\ast}_5$, she pays 1418 · 3 = 4254. Player 6 is paying (918 + 519) · 2 = 2874; by switching to $\sigma^{\ast}_6$, she pays 958 · 3 = 2874. Player 7 is paying (405 + 519) · 2 = 1848; by switching to $\sigma^{\ast}_7$, she pays 616 · 3 = 1848. Player 8 is paying 804 · 2 = 1608; by switching to $\sigma^{\ast}_8$, she pays 221 · 3 + 189 · 5 = 1608.

By noting that the perceived cost of the first four players is exactly twice their presumed one, the price of anarchy of (G′, C′) is then lower bounded by the ratio

$$\frac{2 \cdot (2754 + 1215 + 2412 + 1557) + 4254 + 2874 + 1848 + 1608}{1418 + 958 + 616 + 221 + 189 + 918 + 405 + 804 + 519} = \frac{26460}{6048} = \frac{35}{8}.$$
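As with the first instance, the sums in this ratio can be re-checked with exact arithmetic. The sketch below copies the numbers from the proof and the displayed fraction; the variable names are illustrative only.

```python
from fractions import Fraction

# Costs of players 1..4 (counted twice in the numerator) and of players 5..8.
first_four = [2754, 1215, 2412, 1557]
last_four = [4254, 2874, 1848, 1608]
# Terms of the denominator, copied from the displayed ratio.
denominator_terms = [1418, 958, 616, 221, 189, 918, 405, 804, 519]

numerator = 2 * sum(first_four) + sum(last_four)
perc_ratio = Fraction(numerator, sum(denominator_terms))
print(perc_ratio)  # 35/8
```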

**Figure A1.** The games depicted in figures (**a**,**b**) represent the lower bound instances w.r.t. the social cost functions Pres and Perc, respectively. Each column in the matrix represents a resource of cost function $\ell(x) = \alpha x$ whose coefficient $\alpha$ is reported at the bottom of the column. Each row $i$ in the matrix models the strategy set of player $i$ as follows: circles represent resources belonging to $\sigma_i$, while crosses represent resources belonging to $\sigma^{\ast}_i$.

**Theorem A2** (Claim of Theorem 6)**.** *For any $\epsilon > 0$, there exist two bidimensional linear congestion games* (G, C) *and* (G′, C′) *such that PoS*(G, C) ≥ $\frac{1+\sqrt{5}}{2} - \epsilon$ *under the social cost function* Pres *and PoS*(G′, C′) ≥ $\frac{5+\sqrt{17}}{4} - \epsilon$ *under the social cost function* Perc*.*

**Proof.** Let (G, C) be a bidimensional linear congestion game such that $|C_0| = n_0$ and $|C_1| = |C_2| = n_1$. Each player $i \in C_1 \cup C_2$ has two strategies $\sigma_i$ and $\sigma^{\ast}_i$, while all players in $C_0$ have the same strategy $s$.

There are three types of resources:


Let $\sigma$ (resp. $\sigma^{\ast}$) be the strategy profile in which each player $i \notin C_0$ plays strategy $\sigma_i$ (resp. $\sigma^{\ast}_i$). The cost of each player $i \in C_j$, with $j \in \{1, 2\}$, for adopting strategy $\sigma_i$ when there are exactly $h$ players in $C_j$ adopting the strategy played in $\sigma$ (and thus there are $n_1 - h$ players in $C_j$ adopting the strategy played in $\sigma^{\ast}$) is $\mathit{cost}_{\sigma}(h) = \frac{2n_1 - h - 1}{2} + n_0 + h$. Similarly, the cost of each player $i \in C_j$ for adopting strategy $\sigma^{\ast}_i$ when there are exactly $h$ players in $C_j$ adopting the strategy played in $\sigma$ is $\mathit{cost}_{\sigma^{\ast}}(h) = \frac{n_1 + 2n_0 + 1 + \gamma}{2} + \frac{n_1 + h - 1}{2}$. Since for any $h \in [n_1]$, it holds that $\mathit{cost}_{\sigma^{\ast}}(h - 1) > \mathit{cost}_{\sigma}(h)$, it follows that $\sigma$ is the only pure Nash equilibrium for (G, C).

The price of stability of (G, C) is then lower bounded by the following ratio

$$\frac{n\_1(n\_1 - 1) + 2n\_1(n\_1 + n\_0) + n\_0(2n\_1 + n\_0)}{n\_1(n\_1 + 2n\_0 + 1 + \gamma) + n\_1(n\_1 - 1) + n\_0^2},$$

which, for $n_0$ going to infinity and $n_1 = \frac{1+\sqrt{5}}{2}n_0$, tends to $\frac{1+\sqrt{5}}{2}$.
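The limit can be illustrated numerically by evaluating the displayed ratio for a large $n_0$. This is a rough sketch rather than part of the proof: the function name is hypothetical, and since the definition of $\gamma$ belongs to the resource list that is not reproduced here, the value `gamma=0.5` is purely an assumed placeholder (any fixed constant gives the same limit).

```python
import math


def pres_pos_ratio(n0, n1, gamma):
    """Ratio from the proof of Theorem A2 for the social cost function Pres."""
    numerator = n1 * (n1 - 1) + 2 * n1 * (n1 + n0) + n0 * (2 * n1 + n0)
    denominator = n1 * (n1 + 2 * n0 + 1 + gamma) + n1 * (n1 - 1) + n0 ** 2
    return numerator / denominator


golden_ratio = (1 + math.sqrt(5)) / 2
n0 = 10 ** 6
n1 = round(golden_ratio * n0)  # n1 is chosen proportional to n0, as in the proof
value = pres_pos_ratio(n0, n1, gamma=0.5)  # gamma: assumed placeholder value
print(value, golden_ratio)
```

In the limit the ratio reduces to $\frac{3x^2 + 4x + 1}{2x^2 + 2x + 1}$ with $x = n_1 / n_0$, which equals $x$ itself when $x^2 = x + 1$.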

Let (G′, C′) be a bidimensional linear congestion game such that $C_0 = \emptyset$, $|C_1| = n_1$ and $|C_2| = n_2$. Each player $i \in C_1$ has two strategies $\sigma_i$ and $\sigma^{\ast}_i$, while all players in $C_2$ have the same strategy $s$.

There are three types of resources:


Let $\sigma$ (resp. $\sigma^{\ast}$) be the strategy profile in which each player $i \notin C_0$ plays strategy $\sigma_i$ (resp. $\sigma^{\ast}_i$). The cost of each player $i \in C_1$ for adopting strategy $\sigma_i$ when there are exactly $h$ players in $C_1$ adopting the strategy played in $\sigma$ (and thus there are $n_1 - h$ players in $C_1$ adopting the strategy played in $\sigma^{\ast}$) is $\mathit{cost}_{\sigma}(h) = \frac{2n_1 - h - 1}{2} + h$. Similarly, the cost of each player $i \in C_1$ for adopting strategy $\sigma^{\ast}_i$ when there are exactly $h$ players in $C_1$ adopting the strategy played in $\sigma$ is $\mathit{cost}_{\sigma^{\ast}}(h) = \frac{n_1 + 1 + \gamma}{2} + \frac{n_1 + h - 1}{2}$. Since for any $h \in [n_1]$, it holds that $\mathit{cost}_{\sigma^{\ast}}(h - 1) > \mathit{cost}_{\sigma}(h)$, it follows that $\sigma$ is the only pure Nash equilibrium for (G′, C′).

The price of stability of (G′, C′) is then lower bounded by the following ratio

$$\frac{\frac{1}{2}n\_1(n\_1 - 1) + (n\_1 + n\_2)^2}{\frac{1}{2}n\_1(n\_1 + 1 + \gamma) + \frac{1}{2}n\_1(n\_1 - 1) + n\_2^2},$$

which, for $n_2$ going to infinity and $n_1 = \frac{1+\sqrt{17}}{4}n_2$, tends to $\frac{5+\sqrt{17}}{4}$.
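A numerical evaluation of this second ratio for a large $n_2$ is consistent with the bound $\frac{5+\sqrt{17}}{4}$ stated in Theorem A2. As before, this is only a sketch: the function name is hypothetical and `gamma=0.5` is an assumed placeholder, since $\gamma$ is defined in the resource list that is not reproduced here.

```python
import math


def perc_pos_ratio(n1, n2, gamma):
    """Ratio from the proof of Theorem A2 for the social cost function Perc."""
    numerator = 0.5 * n1 * (n1 - 1) + (n1 + n2) ** 2
    denominator = 0.5 * n1 * (n1 + 1 + gamma) + 0.5 * n1 * (n1 - 1) + n2 ** 2
    return numerator / denominator


proportion = (1 + math.sqrt(17)) / 4   # n1 / n2, as chosen in the proof
target = (5 + math.sqrt(17)) / 4       # bound claimed in Theorem A2
n2 = 10 ** 6
n1 = round(proportion * n2)
value = perc_pos_ratio(n1, n2, gamma=0.5)  # gamma: assumed placeholder value
print(value, target)
```

With $x = n_1 / n_2$, the limiting ratio is $\frac{\frac{3}{2}x^2 + 2x + 1}{x^2 + 1}$, which is maximized where $2x^2 = x + 2$, that is, at $x = \frac{1+\sqrt{17}}{4}$.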


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Algorithmic Aspects of Some Variations of Clique Transversal and Clique Independent Sets on Graphs**

**Chuan-Min Lee**

Department of Computer and Communication Engineering, Ming Chuan University, 5 De Ming Road, Guishan District, Taoyuan City 333, Taiwan; joneslee@mail.mcu.edu.tw; Tel.: +886-3-350-7001 (ext. 3432); Fax: +886-3-359-3876

**Abstract:** This paper studies the maximum-clique independence problem and some variations of the clique transversal problem such as the {*k*}-clique, maximum-clique, minus clique, signed clique, and *k*-fold clique transversal problems from algorithmic aspects for *k*-trees, suns, planar graphs, doubly chordal graphs, clique perfect graphs, total graphs, split graphs, line graphs, and dually chordal graphs. We give equations to compute the {*k*}-clique, minus clique, signed clique, and *k*-fold clique transversal numbers for suns, and show that the {*k*}-clique transversal problem is polynomial-time solvable for graphs whose clique transversal numbers equal their clique independence numbers. We also show the relationship between the signed and generalization clique problems and present NP-completeness results for the considered problems on *k*-trees with unbounded *k*, planar graphs, doubly chordal graphs, total graphs, split graphs, line graphs, and dually chordal graphs.

**Keywords:** clique independent set; clique transversal number; signed clique transversal function; minus clique transversal function; *k*-fold clique transversal set

**Citation:** Lee, C.-M. Algorithmic Aspects of Some Variations of Clique Transversal and Clique Independent Sets on Graphs. *Algorithms* **2021**, *14*, 22. https://doi.org/10.3390/a14010022

Received: 9 December 2020 Accepted: 11 January 2021 Published: 13 January 2021

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Every graph $G = (V, E)$ in this paper is finite, undirected, connected, and has at most one edge between any two vertices. We assume that the vertex set $V$ and edge set $E$ of $G$ contain $n$ vertices and $m$ edges, respectively. They can also be denoted by $V(G)$ and $E(G)$. A graph $G' = (V', E')$ is an *induced subgraph* of $G$, denoted by $G[V']$, if $V' \subseteq V$ and $E'$ contains all the edges $(x, y) \in E$ for $x, y \in V'$. Two vertices $x, y \in V$ are *adjacent* or *neighbors* if $(x, y) \in E$. The sets $N_G(x) = \{y \mid (x, y) \in E\}$ and $N_G[x] = N_G(x) \cup \{x\}$ are the *neighborhood* and *closed neighborhood* of a vertex $x$ in $G$, respectively. The number $deg_G(x) = |N_G(x)|$ is the *degree* of $x$ in $G$. If $deg_G(x) = k$ for every $x \in V$, then $G$ is *k-regular*. In particular, 3-regular graphs are also called *cubic graphs*.

A subset $S$ of $V$ is a *clique* if $(x, y) \in E$ for all distinct $x, y \in S$. Let $Q$ be a clique of $G$. If $Q \cap Q' \neq Q$ for any other clique $Q'$ of $G$, then $Q$ is a *maximal* clique. We use $C(G)$ to represent the set $\{C \mid C \text{ is a maximal clique of } G\}$. A clique $S \in C(G)$ is a *maximum* clique if $|S| \geq |S'|$ for every $S' \in C(G)$. The number $\omega(G) = \max\{|S| \mid S \in C(G)\}$ is the *clique number* of $G$. A set $D \subseteq V$ is a *clique transversal set* (abbreviated as CTS) of $G$ if $|C \cap D| \geq 1$ for every $C \in C(G)$. The number $\tau_C(G) = \min\{|S| \mid S \text{ is a CTS of } G\}$ is the *clique transversal number* of $G$. The *clique transversal problem* (abbreviated as CTP) is to find a minimum CTS for a graph. A set $S \subseteq C(G)$ is a *clique independent set* (abbreviated as CIS) of $G$ if $|S| = 1$, or $|S| \geq 2$ and $C \cap C' = \emptyset$ for all distinct $C, C' \in S$. The number $\alpha_C(G) = \max\{|S| \mid S \text{ is a CIS of } G\}$ is the *clique independence number* of $G$. The *clique independence problem* (abbreviated as CIP) is to find a maximum CIS for a graph.
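To make these definitions concrete, the following Python sketch computes the maximal cliques, the clique transversal number, and the clique independence number of a tiny graph by exhaustive search. It is an illustration only: the function names are hypothetical, the search is exponential in the number of vertices, and it is not one of the algorithms studied in this paper.

```python
from itertools import combinations


def maximal_cliques(vertices, edges):
    """Enumerate all maximal cliques by brute force (small graphs only)."""
    adjacency = {v: set() for v in vertices}
    for x, y in edges:
        adjacency[x].add(y)
        adjacency[y].add(x)
    cliques = []
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if all(y in adjacency[x] for x, y in combinations(subset, 2)):
                cliques.append(frozenset(subset))
    # A clique is maximal if no other clique properly contains it.
    return [q for q in cliques if not any(q < other for other in cliques)]


def clique_transversal_number(vertices, edges):
    """Smallest size of a vertex set meeting every maximal clique."""
    cliques = maximal_cliques(vertices, edges)
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if all(q & set(candidate) for q in cliques):
                return size


def clique_independence_number(vertices, edges):
    """Largest number of pairwise disjoint maximal cliques."""
    cliques = maximal_cliques(vertices, edges)
    for size in range(len(cliques), 0, -1):
        for family in combinations(cliques, size):
            if all(not (c1 & c2) for c1, c2 in combinations(family, 2)):
                return size
    return 0


# Path on four vertices: maximal cliques are the three edges.
path_vertices = [1, 2, 3, 4]
path_edges = [(1, 2), (2, 3), (3, 4)]
print(clique_transversal_number(path_vertices, path_edges))
print(clique_independence_number(path_vertices, path_edges))
```

For the path on four vertices, both numbers equal 2: the set {1, 3} meets every edge, and the edges {1, 2} and {3, 4} are disjoint.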

The CTP and the CIP have been widely studied. Some studies on the CTP and the CIP consider imposing some additional constraints on CTS or CIS, such as the *maximum-clique independence problem* (abbreviated as MCIP), the *k-fold clique transversal problem* (abbreviated as *k*-FCTP), and the *maximum-clique transversal problem* (abbreviated as MCTP).

**Definition 1** ([1,2])**.** *Suppose that $k \in \mathbb{N}$ is fixed and $G$ is a graph. A set $D \subseteq V(G)$ is a $k$-fold clique transversal set (abbreviated as $k$-FCTS) of $G$ if $|C \cap D| \geq k$ for every $C \in C(G)$. The number $\tau_C^k(G) = \min\{|S| \mid S \text{ is a } k\text{-FCTS of } G\}$ is the $k$-fold clique transversal number of $G$. The $k$-FCTP is to find a minimum $k$-FCTS for a graph.*
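Definition 1 can be checked directly once the maximal cliques are known. The sketch below assumes the maximal cliques are supplied as a list of vertex sets; the function names are hypothetical and the minimization is a plain exhaustive search intended for tiny examples only.

```python
from itertools import combinations


def is_k_fold_cts(candidate, maximal_cliques, k):
    """True if the candidate set contains at least k vertices of every maximal clique."""
    chosen = set(candidate)
    return all(len(chosen & set(clique)) >= k for clique in maximal_cliques)


def k_fold_clique_transversal_number(vertices, maximal_cliques, k):
    """Smallest size of a k-FCTS, or None if no k-FCTS exists."""
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_k_fold_cts(candidate, maximal_cliques, k):
                return size
    return None  # some maximal clique has fewer than k vertices


# Two triangles sharing vertex 3 (a "bow-tie" graph).
bowtie_vertices = [1, 2, 3, 4, 5]
bowtie_cliques = [{1, 2, 3}, {3, 4, 5}]
print(k_fold_clique_transversal_number(bowtie_vertices, bowtie_cliques, 2))
```

In the bow-tie example, vertex 3 alone is a 1-fold clique transversal set, while for $k = 2$ three vertices are needed because the two triangles share only one vertex.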

**Definition 2** ([3,4])**.** *Suppose that $G$ is a graph. A set $D \subseteq V(G)$ is a maximum-clique transversal set (abbreviated as MCTS) of $G$ if $|C \cap D| \geq 1$ for every $C \in C(G)$ with $|C| = \omega(G)$. The number $\tau_M(G) = \min\{|S| \mid S \text{ is an MCTS of } G\}$ is the maximum-clique transversal number of $G$. The MCTP is to find a minimum MCTS for a graph. A set $S \subseteq C(G)$ is a maximum-clique independent set (abbreviated as MCIS) of $G$ if $|C| = \omega(G)$ for every $C \in S$ and $C \cap C' = \emptyset$ for all distinct $C, C' \in S$. The number $\alpha_M(G) = \max\{|S| \mid S \text{ is an MCIS of } G\}$ is the maximum-clique independence number of $G$. The MCIP is to find a maximum MCIS for a graph.*

The $k$-FCTP on balanced graphs can be solved in polynomial time [2]. The MCTP has been studied in [3] for several well-known graph classes and the MCIP is polynomial-time solvable for any graph $H$ with $\tau_M(H) = \alpha_M(H)$ [4]. Assume that $Y \subseteq \mathbb{R}$ and $f : X \to Y$ is a function. Let $f(X') = \sum_{x \in X'} f(x)$ for $X' \subseteq X$, and let $f(X)$ be the *weight* of $f$. A CTS of $G$ can be expressed as a function $f$ whose domain is $V(G)$ and range is $\{0, 1\}$, and $f(C) \geq 1$ for every $C \in C(G)$. Then, $f$ is a *clique transversal function* (abbreviated as CTF) of $G$ and $\tau_C(G) = \min\{f(V(G)) \mid f \text{ is a CTF of } G\}$. Several types of CTF have been studied [4–7]. The following are examples of CTFs.

**Definition 3.** *Suppose that $k \in \mathbb{N}$ is fixed and $G$ is a graph. A function $f$ is a $\{k\}$-clique transversal function (abbreviated as $\{k\}$-CTF) of $G$ if the domain and range of $f$ are $V(G)$ and $\{0, 1, 2, \ldots, k\}$, respectively, and $f(C) \geq k$ for every $C \in C(G)$. The number $\tau_C^{\{k\}}(G) = \min\{f(V(G)) \mid f \text{ is a } \{k\}\text{-CTF of } G\}$ is the $\{k\}$-clique transversal number of $G$. The $\{k\}$-clique transversal problem (abbreviated as $\{k\}$-CTP) is to find a minimum-weight $\{k\}$-CTF for a graph.*

**Definition 4.** *Suppose that G is a graph. A function f is a signed clique transversal function (abbreviated as SCTF) of G if the domain and range of f are V*(*G*) *and* {−1, 1}*, respectively, and f*(*C*) ≥ 1 *for C* ∈ *C*(*G*)*. If the domain and range of f are V*(*G*) *and* {−1, 0, 1}*, respectively, and f*(*C*) ≥ 1 *for C* ∈ *C*(*G*)*, then f is a minus clique transversal function (abbreviated as MCTF) of G. The number τ<sub>C</sub><sup>s</sup>*(*G*) = *min*{ *f*(*V*(*G*)) | *f is an SCTF of G*} *is the signed clique transversal number of G. The minus clique transversal number of G is τ<sub>C</sub>*<sup>−</sup>(*G*) = *min*{ *f*(*V*(*G*)) | *f is an MCTF of G*}*. The signed clique transversal problem (abbreviated as SCTP) is to find a minimum-weight SCTF for a graph. The minus clique transversal problem (abbreviated as MCTP) is to find a minimum-weight MCTF for a graph.*

Lee [4] introduced some variations of the *k*-FCTP, the {*k*}-CTP, the SCTP, and the MCTP, but those variations are dedicated to maximum cliques in a graph. The MCTP on chordal graphs is NP-complete, while the MCTP on block graphs is linear-time solvable [7]. The MCTP and SCTP are linear-time solvable for any strongly chordal graph *G* if a *strong elimination ordering* of *G* is given [5]. The SCTP is NP-complete for doubly chordal graphs [6] and planar graphs [5].

According to what we have described above, there are very few algorithmic results regarding the *k*-FCTP, the {*k*}-CTP, the SCTP, and the MCTP on graphs. This motivates us to study the complexities of the *k*-FCTP, the {*k*}-CTP, the SCTP, and the MCTP. This paper also studies the MCTP and MCIP for some graphs and investigates the relationships between different *dominating functions* and CTFs.

**Definition 5.** *Suppose that k* ∈ N *is fixed and G is a graph. A set S* ⊆ *V*(*G*) *is a k-tuple dominating set (abbreviated as k-TDS) of G if* |*S* ∩ *NG*[*x*]| ≥ *k for x* ∈ *V*(*G*)*. The number γ*<sub>×*k*</sub>(*G*) = *min*{|*S*| | *S is a k-TDS of G*} *is the k-tuple domination number of G. The k-tuple domination problem (abbreviated as k-TDP) is to find a minimum k-TDS for a graph.*

Notice that a *dominating set* of a graph *G* is a 1-TDS. The *domination number γ*(*G*) of *G* is *γ*×1(*G*).

**Definition 6.** *Suppose that k* ∈ N *is fixed and G is a graph. A function f is a* {*k*}*-dominating function (abbreviated as* {*k*}*-DF) of G if the domain and range of f are V*(*G*) *and* {0, 1, 2, . . . , *k*}*, respectively, and f*(*NG*[*x*]) ≥ *k for x* ∈ *V*(*G*)*. The number γ*<sub>{*k*}</sub>(*G*) = *min*{ *f*(*V*(*G*)) | *f is a* {*k*}*-DF of G*} *is the* {*k*}*-domination number of G. The* {*k*}*-domination problem (abbreviated as* {*k*}*-DP) is to find a minimum-weight* {*k*}*-DF for a graph.*

**Definition 7.** *Suppose that G is a graph. A function f is a signed dominating function (abbreviated as SDF) of G if the domain and range of f are V*(*G*) *and* {−1, 1}*, respectively, and f*(*NG*[*x*]) ≥ 1 *for x* ∈ *V*(*G*)*. If the domain and range of f are V*(*G*) *and* {−1, 0, 1}*, respectively, and f*(*NG*[*x*]) ≥ 1 *for x* ∈ *V*(*G*)*, then f is a minus dominating function (abbreviated as MDF) of G. The number γs*(*G*) = *min*{ *f*(*V*(*G*)) | *f is an SDF of G*} *is the signed domination number of G. The minus domination number of G is γ* <sup>−</sup>(*G*) = *min*{ *f*(*V*(*G*)) | *f is an MDF of G*}*. The signed domination problem (abbreviated as SDP) is to find a minimum-weight SDF for a graph. The minus domination problem (abbreviated as MDP) is to find a minimum-weight MDF for a graph.*

Our main contributions are as follows.


#### **2. Suns**

In this section, we give equations to compute *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*), *τ<sub>C</sub><sup>k</sup>*(*G*), *τ<sub>C</sub><sup>s</sup>*(*G*), and *τ<sub>C</sub>*<sup>−</sup>(*G*) for any sun *G* and show that *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *γ*<sub>{*k*}</sub>(*G*), *τ<sub>C</sub><sup>k</sup>*(*G*) = *γ*<sub>×*k*</sub>(*G*), *τ<sub>C</sub><sup>s</sup>*(*G*) = *γ<sub>s</sub>*(*G*), and *τ<sub>C</sub>*<sup>−</sup>(*G*) = *γ*<sup>−</sup>(*G*).

Let *p* ∈ N and *G* be a graph. An edge *e* ∈ *E*(*G*) is a *chord* if *e* connects two nonconsecutive vertices of a cycle in *G*. If every cycle of *G* with more than three vertices has a chord, *G* is a *chordal* graph. A *sun G* is a chordal graph whose vertices can be partitioned into *W* = {*w<sub>i</sub>* | 1 ≤ *i* ≤ *p*} and *U* = {*u<sub>i</sub>* | 1 ≤ *i* ≤ *p*} such that (1) *W* is an independent set, (2) the vertices *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u<sub>p</sub>* of *U* form a cycle, and (3) every *w<sub>i</sub>* ∈ *W* is adjacent to precisely two vertices *u<sub>i</sub>* and *u<sub>j</sub>*, where *j* ≡ *i* + 1 (mod *p*). We use *S<sub>p</sub>* = (*W*, *U*, *E*) to denote a sun. Then, |*V*(*S<sub>p</sub>*)| = 2*p*. If *p* is odd, *S<sub>p</sub>* is an *odd* sun; otherwise, it is an *even* sun. Figure 1 shows two suns.

**Figure 1.** (**a**) The sun *S*<sub>3</sub>. (**b**) A sun *S*<sub>4</sub>.

**Lemma 1.** *For any sun S<sub>p</sub>* = (*W*, *U*, *E*)*, τ<sub>C</sub>*<sup>2</sup>(*S<sub>p</sub>*) = *p and τ<sub>C</sub>*<sup>3</sup>(*S<sub>p</sub>*) = 2*p.*

**Proof.** It is straightforward to see that *U* is a minimum 2-FCTS and *W* ∪ *U* is a minimum 3-FCTS of *Sp*. This lemma therefore holds.

**Lemma 2.** *Suppose that k* ∈ N *and k* > 1*. Then, τ<sub>C</sub>*<sup>{*k*}</sup>(*S<sub>p</sub>*) = ⌈*pk*/2⌉ *for any sun S<sub>p</sub>* = (*W*, *U*, *E*)*.*

**Proof.** Let *i*, *j* ∈ {1, 2, . . . , *p*} such that *j* ≡ *i* + 1 (mod *p*). Since every *w<sub>i</sub>* ∈ *W* is adjacent to precisely two vertices *u<sub>i</sub>*, *u<sub>j</sub>* ∈ *U*, *N*<sub>*S<sub>p</sub>*</sub>[*w<sub>i</sub>*] = {*w<sub>i</sub>*, *u<sub>i</sub>*, *u<sub>j</sub>*} is a maximal clique of *S<sub>p</sub>*. By contradiction, we can prove that there exists a minimum {*k*}-CTF *f* of *S<sub>p</sub>* such that *f*(*w<sub>i</sub>*) = 0 for *w<sub>i</sub>* ∈ *W*. Since *f*(*N*<sub>*S<sub>p</sub>*</sub>[*w<sub>i</sub>*]) ≥ *k* for 1 ≤ *i* ≤ *p*, we have

$$\tau_{C}^{\{k\}}(S_p) = \sum_{i=1}^{p} f(u_i) = \frac{\sum_{i=1}^{p} f(N_{S_p}[w_i])}{2} \ge \frac{pk}{2}.$$

Since *τ<sub>C</sub>*<sup>{*k*}</sup>(*S<sub>p</sub>*) is a nonnegative integer, *τ<sub>C</sub>*<sup>{*k*}</sup>(*S<sub>p</sub>*) ≥ ⌈*pk*/2⌉.

We define a function *h* : *W* ∪ *U* → {0, 1, . . . , *k*} by *h*(*w<sub>i</sub>*) = 0 for every *w<sub>i</sub>* ∈ *W*, *h*(*u<sub>i</sub>*) = ⌈*k*/2⌉ for every *u<sub>i</sub>* ∈ *U* with odd index *i*, and *h*(*u<sub>i</sub>*) = ⌊*k*/2⌋ for every *u<sub>i</sub>* ∈ *U* with even index *i*. Clearly, a maximal clique *Q* of *S<sub>p</sub>* is either the closed neighborhood of some vertex in *W* or a set of at least three vertices in *U*. If *Q* = *N*<sub>*S<sub>p</sub>*</sub>[*w<sub>i</sub>*] for some *w<sub>i</sub>* ∈ *W*, then *h*(*Q*) ≥ ⌈*k*/2⌉ + ⌊*k*/2⌋ = *k*. Suppose that *Q* is a set of at least three vertices in *U*. Since *k* ≥ 2, *h*(*Q*) ≥ 3 · ⌊*k*/2⌋ ≥ *k*. Therefore, *h* is a {*k*}-CTF of *S<sub>p</sub>*. We show that the weight of *h* is ⌈*pk*/2⌉ by considering two cases as follows.

Case 1: *p* is even. We have

$$h(V(S_p)) = \sum_{i=1}^{p} h(u_i) = \frac{p}{2} \cdot (\lceil k/2 \rceil + \lfloor k/2 \rfloor) = \frac{pk}{2}.$$

Case 2: *p* is odd. We have

$$h(V(S_p)) = \sum_{i=1}^{p} h(u_i) = \frac{p-1}{2} \cdot k + \lceil k/2 \rceil = \lceil pk/2 \rceil.$$

Following what we have discussed above, we know that *h* is a minimum {*k*}-CTF of *S<sub>p</sub>* and thus *τ<sub>C</sub>*<sup>{*k*}</sup>(*S<sub>p</sub>*) = ⌈*pk*/2⌉.
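
The function *h* from this proof is easy to tabulate. The sketch below is an illustration, not the authors' code, and its helper names are hypothetical. It builds *h* on *U*, checks the constraint on every clique *N*[*w<sub>i</sub>*], and compares the weight of *h* with ⌈*pk*/2⌉. Cliques that lie inside *U* depend on which chords the sun has, so they are not checked here.

```python
def lemma2_function(p, k):
    """h on U: ceil(k/2) at odd indices, floor(k/2) at even indices (h is 0 on W)."""
    return {i: (k + 1) // 2 if i % 2 == 1 else k // 2 for i in range(1, p + 1)}


def check_lemma2(p, k):
    h = lemma2_function(p, k)
    # N[w_i] = {w_i, u_i, u_{i+1}} with indices taken mod p, and h(w_i) = 0.
    w_cliques_ok = all(h[i] + h[i % p + 1] >= k for i in range(1, p + 1))
    weight = sum(h.values())
    bound = (p * k + 1) // 2  # ceil(pk / 2) in integer arithmetic
    return w_cliques_ok, weight, bound


for p, k in [(3, 2), (3, 3), (4, 3), (5, 5)]:
    print(p, k, check_lemma2(p, k))
```

For odd *p* the clique *N*[*w<sub>p</sub>*] contains *u<sub>p</sub>* and *u*<sub>1</sub>, which both have odd index, so its value is 2⌈*k*/2⌉ rather than exactly *k*.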

**Lemma 3.** *For any sun S<sub>p</sub>* = (*W*, *U*, *E*)*, τ<sub>C</sub>*<sup>−</sup>(*S<sub>p</sub>*) = *τ<sub>C</sub><sup>s</sup>*(*S<sub>p</sub>*) = 0*.*

**Proof.** For 1 ≤ *i* ≤ *p*, *N*<sub>*S<sub>p</sub>*</sub>[*w<sub>i</sub>*] is a maximal clique of *S<sub>p</sub>*. Let *h* be a minimum SCTF of *S<sub>p</sub>*. Then, *τ<sub>C</sub><sup>s</sup>*(*S<sub>p</sub>*) = *h*(*V*(*S<sub>p</sub>*)). Note that *h*(*N*<sub>*S<sub>p</sub>*</sub>[*w<sub>i</sub>*]) ≥ 1 for 1 ≤ *i* ≤ *p*. We have

$$h(V(S_p)) = \sum_{i=1}^{p} h(N_{S_p}[w_i]) - \sum_{i=1}^{p} h(u_i) \ge p - \sum_{i=1}^{p} h(u_i).$$

Since ∑<sub>*i*=1</sub><sup>*p*</sup> *h*(*u<sub>i</sub>*) ≤ *p*, we have *τ<sub>C</sub><sup>s</sup>*(*S<sub>p</sub>*) ≥ 0. Let *f* be an SCTF of *S<sub>p</sub>* such that *f*(*u<sub>i</sub>*) = 1 and *f*(*w<sub>i</sub>*) = −1 for 1 ≤ *i* ≤ *p*. The weight of *f* is 0, so *f* is a minimum SCTF of *S<sub>p</sub>*. Hence, *τ<sub>C</sub><sup>s</sup>*(*S<sub>p</sub>*) = 0. The proof for *τ<sub>C</sub>*<sup>−</sup>(*S<sub>p</sub>*) = 0 is analogous.

**Theorem 1** (Lee and Chang [9])**.** *Let S<sup>p</sup> be a sun. Then,*

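(1) *γ*<sup>−</sup>(*S<sub>p</sub>*) = *γ<sub>s</sub>*(*S<sub>p</sub>*) = 0*;* (2) *γ*<sub>{*k*}</sub>(*S<sub>p</sub>*) = ⌈*pk*/2⌉*;* (3) *γ*<sub>×1</sub>(*S<sub>p</sub>*) = ⌈*p*/2⌉*, γ*<sub>×2</sub>(*S<sub>p</sub>*) = *p, and γ*<sub>×3</sub>(*S<sub>p</sub>*) = 2*p.*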

**Corollary 1.** *Let S<sup>p</sup> be a sun. Then,*


**Proof.** The corollary holds by Lemmas 1–3 and Theorem 1.

#### **3. Clique Perfect Graphs**

Let 𝒢 be the set of all induced subgraphs of *G*. If *τC*(*H*) = *αC*(*H*) for every *H* ∈ 𝒢, then *G* is *clique perfect*. In this section, we study the {*k*}-CTP for clique perfect graphs and the SCTP for balanced graphs.

**Lemma 4.** *Let G be a graph such that τC*(*G*) = *αC*(*G*)*. Then, τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *kτC*(*G*)*.*

**Proof.** Assume that *D* is a minimum CTS of *G*. Then, |*D*| = *τC*(*G*). Let *f* be the function whose domain is *V*(*G*) and range is {0, 1, . . . , *k*} such that *f*(*x*) = *k* if *x* ∈ *D* and *f*(*x*) = 0 otherwise. Clearly, *f* is a {*k*}-CTF of *G* of weight *k*|*D*|. We have *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) ≤ *kτC*(*G*).

Assume that *f* is a minimum-weight {*k*}-CTF of *G*. Then, *f*(*V*(*G*)) = *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) and *f*(*C*) ≥ *k* for every *C* ∈ *C*(*G*). Let *S* = {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} be a maximum CIS of *G*. Since the cliques in *S* are pairwise disjoint and *f* is nonnegative, we know that |*S*| = ℓ = *αC*(*G*) and ∑<sub>*i*=1</sub><sup>ℓ</sup> *f*(*C<sub>i</sub>*) ≤ *f*(*V*(*G*)). Therefore, *kτC*(*G*) = *kαC*(*G*) = *k*ℓ ≤ ∑<sub>*i*=1</sub><sup>ℓ</sup> *f*(*C<sub>i</sub>*) ≤ *f*(*V*(*G*)) = *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*). Following what we have discussed above, we know that *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *kτC*(*G*).
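
Lemma 4 can be sanity-checked by brute force on small graphs. The Python sketch below is illustrative only: the function names are hypothetical and both searches are exponential. It compares the minimum weight of a {*k*}-CTF with *k* times the clique transversal number on a path with four vertices, where the transversal and independence numbers are both 2, and on a 5-cycle, where they are 3 and 2.

```python
from itertools import combinations, product


def maximal_cliques(adj):
    """Return all maximal cliques of a graph given as {vertex: set of neighbours}."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found


def tau_c(adj):
    """Clique transversal number by exhaustive search over vertex subsets."""
    cliques = maximal_cliques(adj)
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if all(c & set(candidate) for c in cliques):
                return size


def tau_k_function(adj, k):
    """Minimum weight of a {k}-CTF by exhaustive search over all functions."""
    cliques = maximal_cliques(adj)
    vertices = list(adj)
    weights = []
    for values in product(range(k + 1), repeat=len(vertices)):
        f = dict(zip(vertices, values))
        if all(sum(f[v] for v in c) >= k for c in cliques):
            weights.append(sum(values))
    return min(weights)


path4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(tau_c(path4), [tau_k_function(path4, k) for k in (1, 2, 3)])  # 2 [2, 4, 6]
print(tau_c(cycle5), tau_k_function(cycle5, 2))  # 3 5
```

On the 5-cycle the constant function 1 is a {2}-CTF of weight 5, which is less than 2 · 3, so the hypothesis of Lemma 4 cannot simply be dropped.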

**Theorem 2.** *If a graph G is clique perfect, τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *kτC*(*G*)*.*

**Proof.** Since *G* is clique perfect, *τC*(*G*) = *αC*(*G*). Hence, the theorem holds by Lemma 4.

**Corollary 2.** *The* {*k*}*-CTP is polynomial-time solvable for distance-hereditary graphs, balanced graphs, strongly chordal graphs, comparability graphs, and chordal graphs without odd suns.*

**Proof.** Distance-hereditary graphs, balanced graphs, strongly chordal graphs, comparability graphs, and chordal graphs without odd suns are clique perfect, and the CTP can be solved in polynomial time for them [10–14]. The corollary therefore holds.

**Definition 8.** *Suppose that R is a function whose domain is C*(*G*) *and range is* {0, 1, . . . , *ω*(*G*)}*. If R*(*C*) ≤ |*C*| *for every C* ∈ *C*(*G*)*, then R is a clique-size restricted function (abbreviated as CSRF) of G. A set D* ⊆ *V*(*G*) *is an R-clique transversal set (abbreviated as R-CTS) of G if R is a CSRF of G and* |*D* ∩ *C*| ≥ *R*(*C*) *for every C* ∈ *C*(*G*)*. Let τR*(*G*) = *min*{|*D*| | *D is an R-CTS of G*}*. The generalized clique transversal problem (abbreviated as GCTP) is to find a minimum R-CTS for a graph G with a CSRF R.*

**Lemma 5.** *Let G be a graph with n vertices and a CSRF R. If R*(*C*) = ⌈(|*C*| + 1)/2⌉ *for every C* ∈ *C*(*G*)*, then τ<sub>C</sub><sup>s</sup>*(*G*) = 2*τR*(*G*) − *n.*

**Proof.** Assume that *D* is a minimum *R*-CTS of *G*. Then, |*D*| = *τR*(*G*). Let *f* be the function whose domain is *V*(*G*) and range is {−1, 1} such that *f*(*x*) = 1 if *x* ∈ *D* and *f*(*x*) = −1 otherwise. Since |*D* ∩ *C*| ≥ ⌈(|*C*| + 1)/2⌉ for every *C* ∈ *C*(*G*), there are at least ⌈(|*C*| + 1)/2⌉ vertices in *C* with the function value 1. Therefore, *f*(*C*) ≥ 1 for every *C* ∈ *C*(*G*), and *f* is an SCTF of *G* of weight |*D*| − (*n* − |*D*|) = 2*τR*(*G*) − *n*. Then, *τ<sub>C</sub><sup>s</sup>*(*G*) ≤ 2*τR*(*G*) − *n*.

Assume that *h* is a minimum-weight SCTF of *G*. Clearly, *h*(*V*(*G*)) = *τ<sub>C</sub><sup>s</sup>*(*G*). Since *h*(*C*) ≥ 1 for every *C* ∈ *C*(*G*), *C* contains at least ⌈(|*C*| + 1)/2⌉ vertices with the function value 1. Let *D* = {*x* ∈ *V*(*G*) | *h*(*x*) = 1}. The set *D* is an *R*-CTS of *G*. Therefore, 2*τR*(*G*) − *n* ≤ 2|*D*| − *n* = *τ<sub>C</sub><sup>s</sup>*(*G*). Hence, we have *τ<sub>C</sub><sup>s</sup>*(*G*) = 2*τR*(*G*) − *n*.
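
The correspondence in Lemma 5 between an *R*-CTS *D* and a ±1 function can be traced on a tiny example. The following Python sketch is an illustration under stated assumptions: the maximal cliques are listed by hand rather than computed, the helper names are hypothetical, and both searches are exhaustive.

```python
from itertools import combinations, product


def r_value(clique):
    """R(C) = ceil((|C| + 1) / 2), the threshold used in Lemma 5."""
    return (len(clique) + 2) // 2


def min_r_cts(vertices, cliques):
    """Return (size, D) for a minimum R-CTS found by exhaustive search."""
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            d = set(candidate)
            if all(len(c & d) >= r_value(c) for c in cliques):
                return size, d


def signed_from_set(vertices, d):
    """The function f of the proof: +1 on D and -1 elsewhere."""
    return {v: 1 if v in d else -1 for v in vertices}


def tau_signed(vertices, cliques):
    """Minimum weight of an SCTF by exhaustive search over all +/-1 functions."""
    best = None
    for values in product((-1, 1), repeat=len(vertices)):
        f = dict(zip(vertices, values))
        if all(sum(f[v] for v in c) >= 1 for c in cliques):
            best = sum(values) if best is None else min(best, sum(values))
    return best


# Triangle {1, 2, 3} with a pendant edge {3, 4}.
vertices = [1, 2, 3, 4]
cliques = [frozenset({1, 2, 3}), frozenset({3, 4})]
size, d = min_r_cts(vertices, cliques)
f = signed_from_set(vertices, d)
print(size, sum(f.values()), tau_signed(vertices, cliques))  # 3 2 2
```

Here the minimum *R*-CTS has three vertices, and 2 · 3 − 4 = 2 agrees with the signed clique transversal number found by the direct search.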

**Theorem 3.** *The SCTP on balanced graphs can be solved in polynomial time.*

**Proof.** Suppose that a graph *G* has *n* vertices *v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v<sub>n</sub>* and ℓ maximal cliques *C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>. Let *i* ∈ {1, 2, . . . , ℓ} and *j* ∈ {1, 2, . . . , *n*}. Let *M* be an ℓ × *n* matrix such that the element *M*(*i*, *j*) of *M* is one if the maximal clique *C<sub>i</sub>* contains the vertex *v<sub>j</sub>*, and *M*(*i*, *j*) = 0 otherwise. We call *M* the *clique matrix* of *G*. If the clique matrix *M* of *G* does not contain a square submatrix of odd order with exactly two ones per row and column, then *M* is a *balanced* matrix and *G* is a *balanced* graph. We formulate the GCTP on a balanced graph *G* with a CSRF *R* as the following integer programming problem:

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{n} x_i \\ \text{subject to} & MX \ge \mathcal{C}, \end{array}$$

where 𝒞 = (*R*(*C*<sub>1</sub>), *R*(*C*<sub>2</sub>), . . . , *R*(*C*<sub>ℓ</sub>)) is a column vector and *X* = (*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*) is a column vector such that each *x<sub>i</sub>* is either 0 or 1. Since the matrix *M* is balanced, an optimal 0–1 solution of the integer programming problem above can be found in polynomial time by the results in [15]. By Lemma 5, we know that the SCTP on balanced graphs can be solved in polynomial time.
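
The integer program above can be written down directly for a small graph. The sketch below is only an illustration: it builds the clique matrix from a hand-listed set of maximal cliques and solves the 0–1 program by enumeration, which stands in for the polynomial-time method of [15] and does not implement it. The function names are hypothetical.

```python
from itertools import product


def clique_matrix(vertices, cliques):
    """Row i, column j is 1 exactly when clique C_i contains vertex v_j."""
    return [[1 if v in c else 0 for v in vertices] for c in cliques]


def solve_01_program(m, rhs):
    """Minimise sum(x) subject to M x >= rhs over 0-1 vectors, by enumeration."""
    best = None
    for x in product((0, 1), repeat=len(m[0])):
        feasible = all(
            sum(a * b for a, b in zip(row, x)) >= need for row, need in zip(m, rhs)
        )
        if feasible and (best is None or sum(x) < sum(best)):
            best = x
    return best


# Two triangles {1, 2, 3} and {2, 3, 4} sharing the edge {2, 3}.
vertices = [1, 2, 3, 4]
cliques = [{1, 2, 3}, {2, 3, 4}]
m = clique_matrix(vertices, cliques)
rhs = [(len(c) + 2) // 2 for c in cliques]  # R(C) = ceil((|C| + 1) / 2)
x = solve_01_program(m, rhs)
print(m, rhs, x)  # [[1, 1, 1, 0], [0, 1, 1, 1]] [2, 2] (0, 1, 1, 0)
print(2 * sum(x) - len(vertices))  # 0, the value 2 * tau_R - n from Lemma 5
```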

#### **4. Split Graphs**

Let *G* be a graph such that *V*(*G*) = *I* ∪ *C* and *I* ∩ *C* = ∅. If *I* is an independent set and *C* is a clique, *G* is a *split* graph. Then, every maximal clique of *G* is either *C* itself or the closed neighborhood *NG*[*x*] of a vertex *x* ∈ *I*. We use *G* = (*I*, *C*, *E*) to represent a split graph. The {*k*}-CTP, the *k*-FCTP, the SCTP, and the MCTP for split graphs are considered in this section. We also consider the {*k*}-DP, the *k*-TDP, the SDP, and the MDP for split graphs.

For split graphs, the {*k*}-DP, the *k*-TDP, and the MDP are NP-complete [16–18], but the complexity of the SDP is still unknown. In the following, we examine the relationships between the {*k*}-CTP and the {*k*}-DP, the *k*-FCTP and the *k*-TDP, the SCTP and the SDP, and the MCTP and the MDP. Then, by the relationships, we prove the NP-completeness of the SDP, the {*k*}-CTP, the *k*-FCTP, the SCTP, and the MCTP for split graphs. We first consider the {*k*}-CTP and the *k*-FCTP and show in Theorems 4 and 5 that *τ<sub>C</sub><sup>k</sup>*(*G*) = *γ*<sub>×*k*</sub>(*G*) and *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *γ*<sub>{*k*}</sub>(*G*) for any split graph *G*. Chordal graphs form a superclass of split graphs [19]. The cardinality of *C*(*G*) is at most *n* for any chordal graph *G* [20]. The following lemma therefore holds trivially.

**Lemma 6.** *The k-FCTP, the* {*k*}*-CTP, the SCTP, and the MCTP for chordal graphs are in NP.*

**Theorem 4.** *Suppose that k* ∈ N *and G* = (*I*, *C*, *E*) *is a split graph. Then, τ<sub>C</sub><sup>k</sup>*(*G*) = *γ*<sub>×*k*</sub>(*G*)*.*

**Proof.** Let *S* be a minimum *k*-FCTS of *G*. Consider a vertex *y* ∈ *I*. By the structure of *G*, *NG*[*y*] is a maximal clique of *G*. Then, |*S* ∩ *NG*[*y*]| ≥ *k*. We now consider a vertex *x* ∈ *C*. If *C* ∉ *C*(*G*), then there exists a vertex *y* ∈ *I* such that *NG*[*y*] = *C* ∪ {*y*}. Clearly, *NG*[*y*] ⊆ *NG*[*x*] and |*S* ∩ *NG*[*x*]| ≥ |*S* ∩ *NG*[*y*]| ≥ *k*. If *C* ∈ *C*(*G*), then |*S* ∩ *NG*[*x*]| ≥ |*S* ∩ *C*| ≥ *k*. Hence, *S* is a *k*-TDS of *G*. We have *γ*<sub>×*k*</sub>(*G*) ≤ *τ<sub>C</sub><sup>k</sup>*(*G*).

Let *D* be a minimum *k*-TDS of *G*. Recall that the closed neighborhood of every vertex in *I* is a maximal clique. Then, *D* contains at least *k* vertices in the maximal clique *NG*[*y*] for every vertex *y* ∈ *I*. If *C* ∉ *C*(*G*), *D* is clearly a *k*-FCTS of *G*. Suppose that *C* ∈ *C*(*G*). We consider three cases as follows.

Case 1: *y* ∈ *I* \ *D*. Then, |*D* ∩ *C*| ≥ |*D* ∩ *NG*(*y*)| ≥ *k*. The set *D* is a *k*-FCTS of *G*.

Case 2: *y* ∈ *I* ∩ *D* and *x* ∈ *NG*(*y*) \ *D*. Then, the set *D*′ = (*D* \ {*y*}) ∪ {*x*} is still a minimum *k*-TDS and |*D*′ ∩ *C*| ≥ |*D*′ ∩ *NG*(*y*)| ≥ *k*. The set *D*′ is a *k*-FCTS of *G*.

Case 3: *I* ⊆ *D* and *NG*[*y*] ⊆ *D* for every *y* ∈ *I*. Then, |*D* ∩ *C*| ≥ |*D* ∩ *NG*(*y*)| ≥ *k* − 1. Since *C* ∈ *C*(*G*), there exists *x* ∈ *C* such that *y* ∉ *NG*(*x*). If *NG*(*x*) ∩ *I* = ∅, then *NG*[*x*] = *C* and |*D* ∩ *C*| = |*D* ∩ *NG*[*x*]| ≥ *k*. Otherwise, let *y*′ ∈ *NG*(*x*) ∩ *I*. Then, *x* ∈ *D* and |*D* ∩ *C*| ≥ |*D* ∩ *NG*(*y*)| + 1 ≥ *k*. The set *D* is a *k*-FCTS of *G*.

By the discussion of the three cases, we have *τ<sub>C</sub><sup>k</sup>*(*G*) ≤ *γ*<sub>×*k*</sub>(*G*). Hence, we obtain that *γ*<sub>×*k*</sub>(*G*) ≤ *τ<sub>C</sub><sup>k</sup>*(*G*) and *τ<sub>C</sub><sup>k</sup>*(*G*) ≤ *γ*<sub>×*k*</sub>(*G*). The theorem holds for split graphs.
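
The equality in Theorem 4 can be observed on a small split graph by exhaustive search. The Python sketch below is illustrative and not from the paper: the example graph, the helper names, and the brute-force strategy are all assumptions chosen for brevity.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Return all maximal cliques of a graph given as {vertex: set of neighbours}."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found


def smallest(vertices, accepts):
    """Size of a smallest vertex subset accepted by `accepts`, or None."""
    for size in range(len(vertices) + 1):
        if any(accepts(set(c)) for c in combinations(vertices, size)):
            return size
    return None


def tau_k(adj, k):
    """k-fold clique transversal number."""
    cliques = maximal_cliques(adj)
    return smallest(list(adj), lambda d: all(len(c & d) >= k for c in cliques))


def gamma_k(adj, k):
    """k-tuple domination number (closed neighbourhoods must meet d k times)."""
    return smallest(
        list(adj), lambda d: all(len((adj[v] | {v}) & d) >= k for v in adj)
    )


# Split graph with clique {c1, c2, c3} and independent set {a, b}.
split = {
    "a": {"c1", "c2"},
    "b": {"c2", "c3"},
    "c1": {"c2", "c3", "a"},
    "c2": {"c1", "c3", "a", "b"},
    "c3": {"c1", "c2", "b"},
}
for k in (1, 2, 3):
    print(k, tau_k(split, k), gamma_k(split, k))
```

For this graph both numbers are 1, 3, and 5 for *k* = 1, 2, and 3, respectively.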

**Theorem 5.** *Suppose that k* ∈ N *and G* = (*I*, *C*, *E*) *is a split graph. Then, τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *γ*<sub>{*k*}</sub>(*G*)*.*

**Proof.** We can verify by contradiction that *G* has a minimum-weight {*k*}-CTF *f* and a minimum-weight {*k*}-DF *g* of *G* such that *f*(*y*) = 0 and *g*(*y*) = 0 for every *y* ∈ *I*. By the structure of *G*, *NG*[*y*] ∈ *C*(*G*) for every *y* ∈ *I*. Then, *f*(*NG*[*y*]) ≥ *k* and *g*(*NG*[*y*]) ≥ *k*. Since *f*(*y*) = 0 and *g*(*y*) = 0, *f*(*NG*(*y*)) ≥ *k* and *g*(*NG*(*y*)) ≥ *k*.

For every *y* ∈ *I*, *NG*(*y*) ⊆ *C* and *f*(*C*) ≥ *f*(*NG*(*y*)) ≥ *k*. For every *x* ∈ *C*, *f*(*NG*[*x*]) ≥ *f*(*C*) ≥ *k*. Therefore, the function *f* is also a {*k*}-DF of *G*. We have *γ*<sub>{*k*}</sub>(*G*) ≤ *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*). We now consider *g*(*C*) for the clique *C*. If *C* ∉ *C*(*G*), the function *g* is clearly a {*k*}-CTF of *G*. Suppose that *C* ∈ *C*(*G*). Notice that *g* is a {*k*}-DF and *g*(*y*) = 0 for every *y* ∈ *I*. Then, *g*(*C*) = *g*(*NG*[*x*]) ≥ *k* for any vertex *x* ∈ *C*. Therefore, *g* is also a {*k*}-CTF of *G*. We have *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) ≤ *γ*<sub>{*k*}</sub>(*G*). Following what we have discussed above, we know that *τ<sub>C</sub>*<sup>{*k*}</sup>(*G*) = *γ*<sub>{*k*}</sub>(*G*).

**Corollary 3.** *The* {*k*}*-CTP and the k-FCTP are NP-complete for split graphs.*

**Proof.** The corollary holds by Theorems 4 and 5 and the NP-completeness of the {*k*}-DP and the *k*-TDP for split graphs [16,18].

A graph *G* is *complete* if *C*(*G*) = {*V*(*G*)}. Let *G* be a complete graph and let *x* ∈ *V*(*G*). The vertex set *V*(*G*) is the union of the sets {*x*} and *V*(*G*) \ {*x*}. Clearly, {*x*} is an independent set and *V*(*G*) \ {*x*} is a clique of *G*. Therefore, complete graphs are split graphs. Lemma 7 is easy to verify.

**Lemma 7.** *If G is a complete graph and k* ∈ N*, then*


For split graphs, however, the signed and minus domination numbers are not necessarily equal to the signed and minus clique transversal numbers, respectively. Figure 2 shows a split graph *G* with *τ<sub>C</sub><sup>s</sup>*(*G*) = *τ<sub>C</sub>*<sup>−</sup>(*G*) = −3. However, *γs*(*G*) = *γ*<sup>−</sup>(*G*) = 1. We therefore introduce *H*1*-split* graphs and show in Theorem 6 that their signed and minus domination numbers are equal to the signed and minus clique transversal numbers, respectively. *H*1-split graphs are motivated by the graphs in [17] for proving the NP-completeness of the MDP on split graphs. Figure 3 shows an *H*1-split graph.

**Figure 2.** A split graph *G* with *τ<sub>C</sub><sup>s</sup>*(*G*) = *τ<sub>C</sub>*<sup>−</sup>(*G*) = −3.

**Definition 9.** *Suppose that G* = (*I*, *C*, *E*) *is a split graph with* 3*p* + 3ℓ + 2 *vertices. Let U, S, X, and Y be pairwise disjoint subsets of V*(*G*) *such that U* = {*u<sub>i</sub>* | 1 ≤ *i* ≤ *p*}*, S* = {*s<sub>i</sub>* | 1 ≤ *i* ≤ ℓ}*, X* = {*x<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}*, and Y* = {*y<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}*. The graph G is an H*1*-split graph if V*(*G*) = *U* ∪ *S* ∪ *X* ∪ *Y and G satisfies the following three conditions.*


**Figure 3.** A split graph *G* with one of its partitions indicated.

**Theorem 6.** *For any H*1*-split graph G* = (*I*, *C*, *E*)*, τ<sub>C</sub><sup>s</sup>*(*G*) = *γs*(*G*) *and τ<sub>C</sub>*<sup>−</sup>(*G*) = *γ*<sup>−</sup>(*G*)*.*

**Proof.** We first prove *τ<sub>C</sub><sup>s</sup>*(*G*) = *γs*(*G*). Let *G* = (*I*, *C*, *E*) be an *H*1-split graph. As stated in Definition 9, *I* can be partitioned into *S* = {*s<sub>i</sub>* | 1 ≤ *i* ≤ ℓ} and *Y* = {*y<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}, and *C* can be partitioned into *U* = {*u<sub>i</sub>* | 1 ≤ *i* ≤ *p*} and *X* = {*x<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}. Assume that *f* is a minimum-weight SDF of *G*. For each *y<sub>i</sub>* ∈ *Y*, |*NG*[*y<sub>i</sub>*]| = 2 and *y<sub>i</sub>* is adjacent to only the vertex *x<sub>i</sub>* ∈ *X*. Then, *f*(*x<sub>i</sub>*) = *f*(*y<sub>i</sub>*) = 1 for 1 ≤ *i* ≤ *p* + ℓ + 1. Since *C* = *U* ∪ *X* and |*U*| = *p*, we know that *f*(*C*) = *f*(*U*) + *f*(*X*) ≥ (−*p*) + (*p* + ℓ + 1) ≥ ℓ + 1. Notice that *f*(*NG*[*y*]) ≥ 1 and *NG*[*y*] ∈ *C*(*G*) for every *y* ∈ *I*. Therefore, *f* is also an SCTF of *G*. We have *τ<sub>C</sub><sup>s</sup>*(*G*) ≤ *γs*(*G*).

Assume that *h* is a minimum-weight SCTF of *G*. For each *y<sub>i</sub>* ∈ *Y*, |*NG*[*y<sub>i</sub>*]| = 2 and *y<sub>i</sub>* is adjacent to only the vertex *x<sub>i</sub>* ∈ *X*. Then, *h*(*x<sub>i</sub>*) = *h*(*y<sub>i</sub>*) = 1 for 1 ≤ *i* ≤ *p* + ℓ + 1. Consider the vertices in *I*. Since *NG*[*y*] ∈ *C*(*G*) for every *y* ∈ *I*, *h*(*NG*[*y*]) ≥ 1. We now consider the vertices in *C*. Recall that *C* = *U* ∪ *X*. Let *u<sub>i</sub>* ∈ *U*. Since |*U*| = *p* and |*S*| = ℓ, we know that *h*(*NG*[*u<sub>i</sub>*]) = *h*(*U*) + *h*(*X*) + *h*(*NG*[*u<sub>i</sub>*] ∩ *S*) ≥ (−*p*) + (*p* + ℓ + 1) + (−ℓ) ≥ 1. Let *x<sub>i</sub>* ∈ *X*. Then, *h*(*NG*[*x<sub>i</sub>*]) = *h*(*U*) + *h*(*X*) + *h*(*y<sub>i</sub>*) + *h*(*s<sub>i</sub>*) ≥ (−*p*) + (*p* + ℓ + 1) + 1 − 1 ≥ ℓ + 1. Therefore, *h* is also an SDF of *G*. We have *γs*(*G*) ≤ *τ<sub>C</sub><sup>s</sup>*(*G*).

Following what we have discussed above, we have *τ<sub>C</sub><sup>s</sup>*(*G*) = *γs*(*G*). The proof for *τ<sub>C</sub>*<sup>−</sup>(*G*) = *γ*<sup>−</sup>(*G*) is analogous to that for *τ<sub>C</sub><sup>s</sup>*(*G*) = *γs*(*G*). Hence, the theorem holds for any *H*1-split graph.

**Theorem 7.** *The SDP on H*1*-split graphs is NP-complete.*

**Proof.** We reduce the (3,2)-*hitting set problem* to the SDP on *H*1-split graphs. Let *U* = {*u<sub>i</sub>* | 1 ≤ *i* ≤ *p*} and let 𝒞 = {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} such that *C<sub>i</sub>* ⊆ *U* and |*C<sub>i</sub>*| = 3 for 1 ≤ *i* ≤ ℓ. A (3,2)-hitting set for the instance (*U*, 𝒞) is a subset *U*′ of *U* such that |*C<sub>i</sub>* ∩ *U*′| ≥ 2 for 1 ≤ *i* ≤ ℓ. The (3,2)-hitting set problem is to find a minimum (3,2)-hitting set for any instance (*U*, 𝒞). The (3,2)-hitting set problem is NP-complete [17].

Consider an instance (*U*, 𝒞) of the (3,2)-hitting set problem. Let *S* = {*s<sub>i</sub>* | 1 ≤ *i* ≤ ℓ}, *X* = {*x<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}, and *Y* = {*y<sub>i</sub>* | 1 ≤ *i* ≤ *p* + ℓ + 1}. We construct an *H*1-split graph *G* = (*I*, *C*, *E*) by the following steps.


Let *τ<sub>h</sub>*(3, 2) be the minimum cardinality of a (3,2)-hitting set for the instance (*U*, 𝒞). Assume that *U*′ is a minimum (3,2)-hitting set for the instance (*U*, 𝒞). Then, |*U*′| = *τ<sub>h</sub>*(3, 2). Let *f* be a function whose domain is *V*(*G*) and range is {−1, 1}, and *f*(*v*) = 1 if *v* ∈ *X* ∪ *Y* ∪ *U*′ and *f*(*v*) = −1 if *v* ∈ *S* ∪ (*U* \ *U*′). Clearly, *f* is an SDF of *G*. We have *γs*(*G*) ≤ 2(*p* + ℓ + 1) + |*U*′| − ℓ − (*p* − |*U*′|) = *p* + ℓ + 2*τ<sub>h</sub>*(3, 2) + 2.

Assume that *f* is a minimum-weight SDF of *G*. For each *y<sub>i</sub>* ∈ *Y*, |*NG*[*y<sub>i</sub>*]| = 2 and *y<sub>i</sub>* is adjacent to only the vertex *x<sub>i</sub>* ∈ *X*. Then, *f*(*x<sub>i</sub>*) = *f*(*y<sub>i</sub>*) = 1 for 1 ≤ *i* ≤ *p* + ℓ + 1. For any vertex *v* ∈ *X* ∪ *Y* ∪ *U*, *f*(*NG*[*v*]) ≥ 1 no matter what values the function *f* assigns to the vertices in *U* or in *S*. Consider the vertices in *S*. By the construction of *G*, *deg<sub>G</sub>*(*s<sub>i</sub>*) = 4 and |*NG*[*s<sub>i</sub>*]| = 5 for 1 ≤ *i* ≤ ℓ. There are at least three vertices in *NG*[*s<sub>i</sub>*] with the function value 1. If *f*(*NG*[*s<sub>i</sub>*]) = 5, then there exists an SDF *g* of *G* such that *g*(*s<sub>i</sub>*) = −1 and *g*(*v*) = *f*(*v*) for every *v* ∈ *V*(*G*) \ {*s<sub>i</sub>*}. Then, *g*(*V*(*G*)) < *f*(*V*(*G*)), which contradicts the assumption that the weight of *f* is minimum. Therefore, there exists a minimum-weight SDF *h* of *G* such that *h*(*s<sub>i</sub>*) = −1 for 1 ≤ *i* ≤ ℓ. Notice that *NG*(*s<sub>i</sub>*) = *C<sub>i</sub>* ∪ {*x<sub>i</sub>*} for 1 ≤ *i* ≤ ℓ. There are at least two vertices in *C<sub>i</sub>* with the function value 1. Then, the set *U*′ = {*u* ∈ *U* | *h*(*u*) = 1} is a (3,2)-hitting set for the instance (*U*, 𝒞). We have *p* + ℓ + 2*τ<sub>h</sub>*(3, 2) + 2 ≤ *p* + ℓ + 2|*U*′| + 2 = *γs*(*G*).

Following what we have discussed above, we know that *γs*(*G*) = *p* + ℓ + 2*τ<sub>h</sub>*(3, 2) + 2. Hence, the SDP on *H*1-split graphs is NP-complete.

**Corollary 4.** *The SCTP and the MCTP on split graphs are NP-complete.*

**Proof.** The corollary holds by Theorems 6 and 7 and the NP-completeness of the MDP on split graphs [17].

#### **5. Doubly Chordal and Dually Chordal Graphs**

Assume that *G* is a graph with *n* vertices *x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*. Let *i* ∈ {1, 2, . . . , *n*} and let *G<sub>i</sub>* be the subgraph *G*[*V*(*G*) \ {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>*i*−1</sub>}]. For every *x* ∈ *V*(*G<sub>i</sub>*), let *N<sub>i</sub>*[*x*] = *NG*[*x*] \ {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>*i*−1</sub>}. If, for every *i*, there exists a vertex *x<sub>j</sub>* ∈ *N<sub>i</sub>*[*x<sub>i</sub>*] such that *N<sub>i</sub>*[*x<sub>k</sub>*] ⊆ *N<sub>i</sub>*[*x<sub>j</sub>*] for every *x<sub>k</sub>* ∈ *N<sub>i</sub>*[*x<sub>i</sub>*], then the ordering (*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*) is a *maximum neighborhood ordering* (abbreviated as MNO) of *G*. A graph *G* is *dually chordal* [21] if and only if *G* has an MNO. It takes linear time to compute an MNO for any dually chordal graph [22]. A graph *G* is a *doubly chordal* graph if *G* is both chordal and dually chordal [23]. Lemma 8 shows that a dually chordal graph is not necessarily a chordal graph or a clique perfect graph. Notice that the number of maximal cliques in a chordal graph is at most *n* [20], but the number of maximal cliques in a dually chordal graph can be exponential [24].
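
The MNO condition can be tested directly from its definition. The following Python sketch is an illustration with hypothetical names: it verifies a given ordering in polynomial time and is not the linear-time algorithm of [22]. The first example is the graph used in the proof of Lemma 8, a 4-cycle plus a vertex adjacent to all four cycle vertices.

```python
def is_mno(adj, order):
    """Check the maximum neighbourhood ordering condition for `order`."""
    removed = set()
    for x in order:

        def closed(v):
            # N_i[v]: closed neighbourhood of v among the vertices not yet removed.
            return (adj[v] | {v}) - removed

        n_x = closed(x)
        has_max_neighbour = any(
            all(closed(y) <= closed(z) for y in n_x) for z in n_x
        )
        if not has_max_neighbour:
            return False
        removed.add(x)
    return True


# The 4-cycle x1 x2 x3 x4 plus a hub x5 adjacent to every cycle vertex.
wheel = {
    "x1": {"x2", "x4", "x5"},
    "x2": {"x1", "x3", "x5"},
    "x3": {"x2", "x4", "x5"},
    "x4": {"x1", "x3", "x5"},
    "x5": {"x1", "x2", "x3", "x4"},
}
cycle4 = {v: nbrs - {"x5"} for v, nbrs in wheel.items() if v != "x5"}
print(is_mno(wheel, ["x1", "x2", "x3", "x4", "x5"]))  # True
print(is_mno(cycle4, ["x1", "x2", "x3", "x4"]))  # False
```

In the first graph the hub *x*<sub>5</sub> serves as the maximum neighbor at every step, while in the bare 4-cycle no neighbor of *x*<sub>1</sub> has a closed neighborhood containing those of the others.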

**Lemma 8.** *For any dually chordal graph G, τC*(*G*) = *αC*(*G*)*, but G is not necessarily clique perfect or chordal.*

**Proof.** Brandstädt et al. [25] showed that the CTP is a particular case of the *clique r-domination problem* and the CIP is a particular case of the *clique r-packing problem*. They also showed that the minimum cardinality of a clique *r*-dominating set of a dually chordal graph *G* is equal to the maximum cardinality of a clique *r*-packing set of *G*. Therefore, *τC*(*G*) = *αC*(*G*).

Assume that *H* is a graph obtained by connecting every vertex of a cycle *C*<sup>4</sup> of four vertices *x*1, *x*2, *x*3, *x*<sup>4</sup> to a vertex *x*5. Clearly, the ordering (*x*1, *x*2, *x*3, *x*4, *x*5) is an MNO and thus *H* is a dually chordal graph. The cycle *C*<sup>4</sup> is an induced subgraph of *H* and does not have a chord. Moreover, *τC*(*H*) = *αC*(*H*) = 1, but *τC*(*C*4) = 2 and *αC*(*C*4) = 1. Hence, a dually chordal graph is not necessarily clique perfect or chordal.

**Theorem 8.** *Suppose that k* ∈ N *and k* > 1*. The k-FCTP on doubly chordal graphs is NP-complete.*

**Proof.** Suppose that *G* is a chordal graph. Let *H* be a graph such that *V*(*H*) = *V*(*G*) ∪ {*x*} and *E*(*H*) = *E*(*G*) ∪ {(*x*, *y*) | *y* ∈ *V*(*G*)}. Clearly, *H* is a doubly chordal graph and we can construct *H* from *G* in linear time.

Assume that *S* is a minimum (*k* − 1)-FCTS of *G*. By the construction of *H*, each maximal clique of *H* contains the vertex *x*. Therefore, *S* ∪ {*x*} is a *k*-FCTS of *H*. Then *τ<sub>C</sub><sup>k</sup>*(*H*) ≤ *τ<sub>C</sub>*<sup>*k*−1</sup>(*G*) + 1.

By contradiction, we can verify that there exists a minimum *k*-FCTS *D* of *H* such that *x* ∈ *D*. Let *S* = *D* \ {*x*}. Clearly, *S* is a (*k* − 1)-FCTS of *G*. Then *τ<sub>C</sub>*<sup>*k*−1</sup>(*G*) ≤ *τ<sub>C</sub><sup>k</sup>*(*H*) − 1. Following what we have discussed above, we have *τ<sub>C</sub><sup>k</sup>*(*H*) = *τ<sub>C</sub>*<sup>*k*−1</sup>(*G*) + 1. Notice that *τC*(*G*) = *τ<sub>C</sub>*<sup>1</sup>(*G*) and the CTP on chordal graphs is NP-complete [14]. Hence, the *k*-FCTP is NP-complete for doubly chordal graphs.
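
The identity relating *H* and *G* in this proof can be checked by brute force on a small chordal graph. The sketch below is illustrative only: the helper names are hypothetical, the example is a path with three vertices, and the search is exponential.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Return all maximal cliques of a graph given as {vertex: set of neighbours}."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found


def tau_k(adj, k):
    """k-fold clique transversal number by exhaustive search, or None."""
    cliques = maximal_cliques(adj)
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if all(len(c & set(candidate)) >= k for c in cliques):
                return size
    return None


def add_universal_vertex(adj, x):
    """Build H from G by joining a new vertex x to every vertex of G."""
    h = {v: set(nbrs) | {x} for v, nbrs in adj.items()}
    h[x] = set(adj)
    return h


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
h = add_universal_vertex(path, "x")
for k in (2, 3):
    print(k, tau_k(h, k), tau_k(path, k - 1) + 1)
```

Both columns agree on this example: the values are 2 for *k* = 2 and 4 for *k* = 3.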

**Theorem 9.** *For any doubly chordal graph G, τ* {*k*} *C* (*G*) *can be computed in linear time.*

**Proof.** The clique *r*-domination problem on doubly chordal graphs can be solved in linear time [25]. The CTP is a particular case of the clique *r*-domination problem. Therefore, the CTP on doubly chordal graphs can also be solved in linear time. By Lemmas 4 and 8, the theorem holds.

#### **6.** *k***-Trees**

Assume that *G* is a graph with *n* vertices *x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*. Let *i* ∈ {1, 2, . . . , *n*} and let *G<sub>i</sub>* be the subgraph *G*[*V*(*G*) \ {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>*i*−1</sub>}]. For every *x* ∈ *V*(*G<sub>i</sub>*), let *N<sub>i</sub>*[*x*] = *NG*[*x*] \ {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>*i*−1</sub>}. If *N<sub>i</sub>*[*x<sub>i</sub>*] is a clique for 1 ≤ *i* ≤ *n*, then the ordering (*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*) is a *perfect elimination ordering* (abbreviated as PEO) of *G*. A graph *G* is chordal if and only if *G* has a PEO [26]. A chordal graph *G* is a *k*-tree if and only if either *G* is a complete graph of *k* vertices or *G* has more than *k* vertices and there exists a PEO (*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*) such that *N<sub>i</sub>*[*x<sub>i</sub>*] is a clique of *k* + 1 vertices for 1 ≤ *i* ≤ *n* − *k* and a clique of *k* vertices for *i* = *n* − *k* + 1. Figure 4 shows a 2-tree with the PEO (*v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>13</sub>).
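
Both conditions in this paragraph can be tested for a given ordering. The Python sketch below is illustrative (the names and the small example are assumptions, and it is not the graph of Figure 4). It checks whether an ordering is a PEO and whether the sizes of the sets *N<sub>i</sub>*[*x<sub>i</sub>*] match the *k*-tree condition.

```python
from itertools import combinations


def later_closed_neighbourhoods(adj, order):
    """Return N_i[x_i] for each position i of the ordering."""
    position = {v: i for i, v in enumerate(order)}
    return [{v} | {u for u in adj[v] if position[u] > position[v]} for v in order]


def is_peo(adj, order):
    """True if every N_i[x_i] is a clique."""
    return all(
        b in adj[a]
        for nbhd in later_closed_neighbourhoods(adj, order)
        for a, b in combinations(nbhd, 2)
    )


def is_k_tree_ordering(adj, order, k):
    """PEO with |N_i[x_i]| = k + 1 for i <= n - k and |N_i[x_i]| = k for i = n - k + 1."""
    n = len(order)
    sizes = [len(s) for s in later_closed_neighbourhoods(adj, order)]
    return (
        n > k
        and is_peo(adj, order)
        and all(s == k + 1 for s in sizes[: n - k])
        and sizes[n - k] == k
    )


# Two triangles {1, 2, 3} and {2, 3, 4} sharing an edge: a 2-tree on four vertices.
two_tree = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_peo(two_tree, [1, 4, 2, 3]), is_k_tree_ordering(two_tree, [1, 4, 2, 3], 2))
print(is_peo(two_tree, [2, 1, 3, 4]), is_peo(cycle4, [1, 2, 3, 4]))
```

The first line prints `True True`. The second prints `False False`, because vertex 2 has the nonadjacent later neighbors 1 and 4 in the second ordering and the 4-cycle has no simplicial vertex.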

**Figure 4.** A 2-tree *H*.

In [3], Chang et al. showed that the MCTP is NP-complete for *k*-trees with unbounded *k* by proving *γ*(*G*) = *τM*(*G*) for any *k*-tree *G*. However, Figure 4 shows a counterexample that disproves *γ*(*G*) = *τM*(*G*) for any *k*-tree *G*. The graph *H* in Figure 4 is a 2-tree with the perfect elimination ordering (*v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>13</sub>). The set {*v*<sub>5</sub>, *v*<sub>10</sub>} is a minimum dominating set of *H* and the set {*v*<sub>5</sub>, *v*<sub>10</sub>, *v*<sub>11</sub>} is a minimum MCTS of *H*. A modified NP-completeness proof is therefore desired for the MCTP on *k*-trees with unbounded *k*.

**Theorem 10.** *The MCTP and the MCIP are NP-complete for k-trees with unbounded k.*

**Proof.** The CTP and the CIP are NP-complete for *k*-trees with unbounded *k* [8]. Since every maximal clique in a *k*-tree is also a maximum clique [27], an MCTS is a CTS and an MCIS is a CIS. Hence, the MCTP and the MCIP are NP-complete for *k*-trees with unbounded *k*.

**Theorem 11.** *The SCTP is NP-complete for k-trees with unbounded k.*

**Proof.** Suppose that *k*<sub>1</sub> ∈ N and *G* is a *k*<sub>1</sub>-tree with |*V*(*G*)| > *k*<sub>1</sub>. Let *C*(*G*) = {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>}. Since *G* is a *k*<sub>1</sub>-tree, |*C<sub>i</sub>*| = *k*<sub>1</sub> + 1 for 1 ≤ *i* ≤ ℓ.

Let *Q* be a clique with *k*<sub>1</sub> + 1 vertices. Let *H* be a graph such that *V*(*H*) = *V*(*G*) ∪ *Q* and *E*(*H*) = *E*(*G*) ∪ {(*x*, *y*) | *x*, *y* ∈ *Q*} ∪ {(*x*, *y*) | *x* ∈ *Q*, *y* ∈ *V*(*G*)}. Let *X<sub>i</sub>* = *C<sub>i</sub>* ∪ *Q* be a clique for 1 ≤ *i* ≤ ℓ. Clearly, *C*(*H*) = {*X<sub>i</sub>* | 1 ≤ *i* ≤ ℓ}. Let *k*<sub>2</sub> = 2*k*<sub>1</sub> + 1. Then, *H* is a *k*<sub>2</sub>-tree and |*X<sub>i</sub>*| = *k*<sub>2</sub> + 1 = 2*k*<sub>1</sub> + 2 for 1 ≤ *i* ≤ ℓ. We can verify that there exists a minimum-weight SCTF *h* of *H* such that *h*(*x*) = 1 for every *x* ∈ *Q*. Then, *C<sub>i</sub>* = *X<sub>i</sub>* \ *Q* contains at least one vertex *x* with *h*(*x*) = 1 for 1 ≤ *i* ≤ ℓ. Let *S* = {*x* | *x* ∈ *V*(*H*) \ *Q* and *h*(*x*) = 1}. Then, *S* is a CTS of *G*. Since *τ<sub>C</sub><sup>s</sup>*(*H*) = |*Q*| + 2|*S*| − |*V*(*G*)|, we have |*Q*| + 2*τC*(*G*) − |*V*(*G*)| ≤ *τ<sub>C</sub><sup>s</sup>*(*H*).

Assume that *D* is a minimum CTS of *G*. Let *f* be a function of *H* whose domain is *V*(*H*) and range is {−1, 1}, and (1) *f*(*x*) = 1 for every *x* ∈ *Q*, (2) *f*(*x*) = 1 for every *x* ∈ *D*, and (3) *f*(*x*) = −1 for every *x* ∈ *V*(*G*) \ *D*. Each maximal clique of *H* has at least *k*<sub>1</sub> + 2 vertices with the function value 1. Therefore, *f* is an SCTF. We have *τ<sub>C</sub><sup>s</sup>*(*H*) ≤ |*Q*| + 2*τC*(*G*) − |*V*(*G*)|. Following what we have discussed above, we know that *τ<sub>C</sub><sup>s</sup>*(*H*) = |*Q*| + 2*τC*(*G*) − |*V*(*G*)|. The theorem therefore holds by the NP-completeness of the CTP for *k*-trees [8].

**Theorem 12.** *Suppose that κ* ∈ N*. Then the κ-FCTP is NP-complete for k-trees with unbounded k.*

**Proof.** Assume that *k*<sub>1</sub> ∈ N and *G* is a *k*<sub>1</sub>-tree with |*V*(*G*)| > *k*<sub>1</sub>. Let *H* be a graph such that *V*(*H*) = *V*(*G*) ∪ {*x*} and *E*(*H*) = *E*(*G*) ∪ {(*x*, *y*) | *y* ∈ *V*(*G*)}. Clearly, *H* is a (*k*<sub>1</sub> + 1)-tree and we can construct *H* in linear time. Following an argument analogous to the proof of Theorem 8, we have *τ*<sup>*κ*</sup><sub>*C*</sub>(*H*) = *τ*<sup>*κ*−1</sup><sub>*C*</sub>(*G*) + 1. The theorem therefore holds by the NP-completeness of the CTP for *k*-trees [8].

**Theorem 13.** *The SCTP and the κ-FCTP can be solved in linear time for k-trees with fixed k.*

**Proof.** Assume that *κ* ∈ N and *G* is a graph. The *κ*-FCTP is the GCTP with the CSRF *R* whose domain is *C*(*G*) and range is {*κ*}. By Lemma 5, *τ s C* (*G*) can be obtained from the solution to the GCTP on a graph *G* with a particular CSRF *R*. Since the GCTP is linear-time solvable for *k*-trees with fixed *k* [8], the SCTP and *κ*-FCTP are also linear-time solvable for *k*-trees with fixed *k*.

#### **7. Planar, Total, and Line Graphs**

In a graph, a vertex *x* and an edge *e* are *incident* to each other if *e* connects *x* to another vertex. Two edges are *adjacent* if they share a vertex. Let *G* and *H* be graphs such that each vertex *x* ∈ *V*(*H*) corresponds to an edge *e*<sub>*x*</sub> ∈ *E*(*G*) and two vertices *x*, *y* ∈ *V*(*H*) are adjacent in *H* if and only if their corresponding edges *e*<sub>*x*</sub> and *e*<sub>*y*</sub> are adjacent in *G*. Then, *H* is the *line graph* of *G*, denoted by *L*(*G*). Let *H*′ be a graph such that *V*(*H*′) = *V*(*G*) ∪ *E*(*G*) and two vertices *x*, *y* ∈ *V*(*H*′) are adjacent in *H*′ if and only if *x* and *y* are adjacent or incident to each other in *G*. Then, *H*′ is the *total graph* of *G*, denoted by *T*(*G*).
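The two constructions can be made concrete with a small sketch. It is only an illustration: the helper names (`line_graph`, `total_graph`, `degrees`) and the choice of the cube graph as a sample triangle-free, 3-connected, cubic planar graph are assumptions made here, not part of the original text.

```python
from itertools import combinations


def _norm(edges):
    """Represent each edge of G as a sorted tuple so it can serve as a vertex."""
    return [tuple(sorted(e)) for e in edges]


def line_graph(edges):
    """Return (nodes, adjacencies) of L(G): one node per edge of G."""
    ev = _norm(edges)
    # Two edges of G are adjacent when they share an endpoint.
    adj = {frozenset((e, f)) for e, f in combinations(ev, 2) if set(e) & set(f)}
    return set(ev), adj


def total_graph(vertices, edges):
    """Return (nodes, adjacencies) of T(G): nodes are V(G) together with E(G)."""
    ev = _norm(edges)
    nodes = set(vertices) | set(ev)
    adj = {frozenset(e) for e in ev}                    # adjacent vertices of G
    adj |= {frozenset((x, e)) for e in ev for x in e}   # vertex-edge incidences
    adj |= line_graph(edges)[1]                         # adjacent edges of G
    return nodes, adj


def degrees(nodes, adj):
    """Degree of every node in a graph given as a set of 2-element frozensets."""
    deg = {x: 0 for x in nodes}
    for pair in adj:
        for x in pair:
            deg[x] += 1
    return deg


# Hypothetical sample input: the cube graph Q3 (vertices 0..7, edges join
# labels that differ in exactly one bit).
CUBE_V = list(range(8))
CUBE_E = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
```

On this sample input, every node of `line_graph(CUBE_E)` has degree 4 and every node of `total_graph(CUBE_V, CUBE_E)` has degree 6, which matches the regularity used in the proofs of this section.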

**Lemma 9** ([28])**.** *The following statements hold for any triangle-free graph G.*


**Theorem 14.** *The MCIP is NP-complete for any 4-regular planar graph G with the clique number 3.*

**Proof.** Since |*C*(*G*)| = *O*(*n*) for any planar graph *G* [29], the MCIP on planar graphs is in NP. Let G be the class of triangle-free, 3-connected, cubic planar graphs. The independent set problem remains NP-complete even when restricted to the graph class G [30]. We reduce this NP-complete problem to the MCIP for 4-regular planar graphs with the clique number 3 as follows.

Let *G* ∈ G and *H* = *L*(*G*). Clearly, we can construct *H* in polynomial time. By Lemma 9, we know that *H* is a 4-regular planar graph with *ω*(*H*) = 3 and each maximal clique is a triangle in *H*.

Assume that *D* = {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>ℓ</sub>} is an independent set of *G* of maximum cardinality. Since *G* ∈ G, *deg*<sub>*G*</sub>(*x*) = 3 for every *x* ∈ *V*(*G*). Let *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> ∈ *E*(*G*) have the vertex *x*<sub>*i*</sub> in common for 1 ≤ *i* ≤ ℓ. Then, *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> form a triangle in *H*. Let *C*<sub>*i*</sub> be the triangle formed by *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> in *H* for 1 ≤ *i* ≤ ℓ. For each pair of vertices *x*<sub>*j*</sub>, *x*<sub>*k*</sub> ∈ *D*, *x*<sub>*j*</sub> is not adjacent to *x*<sub>*k*</sub> in *G*. Therefore, *C*<sub>*j*</sub> and *C*<sub>*k*</sub> in *H* do not intersect. The set {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} is an MCIS of *H*. We have *α*(*G*) ≤ *α*<sub>*M*</sub>(*H*).

Assume that *S* = {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} is a maximum MCIS of *H*. Then, each *C*<sub>*i*</sub> ∈ *S* is a triangle in *H*. Let *C*<sub>*i*</sub> be formed by *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> in *H* for 1 ≤ *i* ≤ ℓ. Then, *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> are incident to the same vertex in *G*. For 1 ≤ *i* ≤ ℓ, let *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> ∈ *E*(*G*) have the vertex *x*<sub>*i*</sub> in common. For each pair *C*<sub>*j*</sub>, *C*<sub>*k*</sub> ∈ *S*, *C*<sub>*j*</sub> and *C*<sub>*k*</sub> do not intersect. Therefore, *x*<sub>*j*</sub> is not adjacent to *x*<sub>*k*</sub> in *G*. The set {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>ℓ</sub>} is an independent set of *G*. We have *α*<sub>*M*</sub>(*H*) ≤ *α*(*G*).

Hence, *α*(*G*) = *α*<sub>*M*</sub>(*H*). For *k* ∈ N, we know that *α*(*G*) ≥ *k* if and only if *α*<sub>*M*</sub>(*H*) ≥ *k*.
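The equality obtained in this proof can be checked by brute force on a tiny instance. The sketch below is only a sanity check under stated assumptions: it takes the cube graph as a member of the class G, reads an MCIS as a family of pairwise disjoint maximum cliques (triangles, in this case), and uses hypothetical helper names. It runs in exponential time and is not part of the reduction itself.

```python
from itertools import combinations


def line_graph(edges):
    """Nodes and adjacencies of L(G); each node is an edge of G as a sorted tuple."""
    ev = [tuple(sorted(e)) for e in edges]
    adj = {frozenset((e, f)) for e, f in combinations(ev, 2) if set(e) & set(f)}
    return ev, adj


def triangles(nodes, adj):
    """All 3-cliques of a graph given by its node list and adjacency set."""
    return [frozenset(t) for t in combinations(nodes, 3)
            if all(frozenset(p) in adj for p in combinations(t, 2))]


def max_independent_set_size(vertices, edges):
    """Largest set of pairwise non-adjacent vertices (exhaustive search)."""
    vs = list(vertices)
    es = {frozenset(e) for e in edges}
    for r in range(len(vs), 0, -1):
        for sub in combinations(vs, r):
            if all(frozenset(p) not in es for p in combinations(sub, 2)):
                return r
    return 0


def max_disjoint_family(sets):
    """Largest family of pairwise disjoint sets (exhaustive search)."""
    for r in range(len(sets), 0, -1):
        for fam in combinations(sets, r):
            if all(not (a & b) for a, b in combinations(fam, 2)):
                return r
    return 0


# Hypothetical sample input: the cube graph Q3.
CUBE_E = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]

nodes, adj = line_graph(CUBE_E)
tris = triangles(nodes, adj)                      # maximum cliques of H = L(G)
alpha_G = max_independent_set_size(range(8), CUBE_E)
alpha_M_H = max_disjoint_family(tris)
```

For this input the two quantities `alpha_G` and `alpha_M_H` coincide, as the proof predicts.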

**Corollary 5.** *The MCIP is NP-complete for line graphs of triangle-free, 3-connected, cubic planar graphs.*

**Proof.** The corollary holds by the reduction of Theorem 14.

**Theorem 15.** *The MCIP problem is NP-complete for total graphs of triangle-free, 3-connected, cubic planar graphs.*

**Proof.** Since |*C*(*G*)| = *O*(*n*) for a planar graph *G*, the MCIP on planar graphs is in NP. Let G be the class of triangle-free, 3-connected, cubic planar graphs. The independent set problem remains NP-complete even when restricted to the graph class G [30]. We reduce this NP-complete problem to the MCIP for total graphs of triangle-free, 3-connected, cubic planar graphs as follows.

Let *G* ∈ G and *H* = *T*(*G*). Clearly, we can construct *H* in polynomial time. By Lemma 9, we can verify that *H* is a 6-regular graph with *ω*(*H*) = 4.

Assume that *D* = {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>ℓ</sub>} is an independent set of *G* of maximum cardinality. Since *G* ∈ G, *deg*<sub>*G*</sub>(*x*) = 3 for every *x* ∈ *V*(*G*). Let *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> ∈ *E*(*G*) have the vertex *x*<sub>*i*</sub> in common. Then, *x*<sub>*i*</sub>, *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> form a maximum clique in *H*. Let *C*<sub>*i*</sub> be the maximum clique formed by *x*<sub>*i*</sub>, *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> in *H* for 1 ≤ *i* ≤ ℓ. For each pair of vertices *x*<sub>*j*</sub>, *x*<sub>*k*</sub> ∈ *D*, *x*<sub>*j*</sub> is not adjacent to *x*<sub>*k*</sub> in *G*. Therefore, *C*<sub>*j*</sub> and *C*<sub>*k*</sub> in *H* do not intersect. The set {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} is an MCIS of *H*. We have *α*(*G*) ≤ *α*<sub>*M*</sub>(*H*).

Assume that *S* = {*C*<sub>1</sub>, *C*<sub>2</sub>, . . . , *C*<sub>ℓ</sub>} is a maximum MCIS of *H*. By the construction of *H*, each *C*<sub>*i*</sub> ∈ *S* is formed by three edge-vertices in *E*(*G*) and their common end vertex in *V*(*G*). Let *x*<sub>*i*</sub> ∈ *V*(*G*) and *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> ∈ *E*(*G*) be such that *C*<sub>*i*</sub> is formed by *x*<sub>*i*</sub>, *e*<sub>*i*<sub>1</sub></sub>, *e*<sub>*i*<sub>2</sub></sub>, *e*<sub>*i*<sub>3</sub></sub> for 1 ≤ *i* ≤ ℓ. For each pair *C*<sub>*j*</sub>, *C*<sub>*k*</sub> ∈ *S*, *C*<sub>*j*</sub> and *C*<sub>*k*</sub> do not intersect. Therefore, *x*<sub>*j*</sub> is not adjacent to *x*<sub>*k*</sub> in *G*. The set {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x*<sub>ℓ</sub>} is an independent set of *G*. We have *α*<sub>*M*</sub>(*H*) ≤ *α*(*G*).

Hence, *α*(*G*) = *αM*(*H*). For *k* ∈ N, we know that *α*(*G*) ≥ *k* if and only if *αM*(*H*) ≥ *k*.

**Funding:** This research is supported by a Taiwanese grant under Grant No. NSC-97-2218-E-130-002-MY2.

**Acknowledgments:** We are grateful to the anonymous referees for their valuable comments and suggestions to improve the presentation of this paper.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **A Quasi-Hole Detection Algorithm for Recognizing** *k***-Distance-Hereditary Graphs, with** *k* < **2**

**Serafino Cicerone**

Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila, I-67100 L'Aquila, Italy; serafino.cicerone@univaq.it

**Abstract:** Cicerone and Di Stefano defined and studied the class of *k*-distance-hereditary graphs, i.e., graphs where the distance in each connected induced subgraph is at most *k* times the distance in the whole graph. These graphs generalize the well-known distance-hereditary graphs, which correspond to 1-distance-hereditary graphs. In this paper we take a step forward in the study of these new graphs by providing characterizations for the class of all the *k*-distance-hereditary graphs such that *k* < 2. The new characterizations are given in terms of both forbidden subgraphs and cycle-chord properties. These results also lead to a polynomial-time recognition algorithm for this kind of graph that, according to the provided characterizations, simply detects the presence of quasi-holes in any given graph.

**Keywords:** distance-hereditary graphs; stretch number; recognition problem; forbidden subgraphs; hole detection

**Citation:** Cicerone, S. A Quasi-Hole Detection Algorithm for Recognizing *k*-Distance-Hereditary Graphs, with *k* < 2. *Algorithms* **2021**, *14*, 105. https://doi.org/10.3390/a14040105

Academic Editor: Frank Werner

Received: 10 February 2021 Accepted: 23 March 2021 Published: 25 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Distance-hereditary graphs have been introduced by Howorka [1], and are defined as those graphs in which every connected induced subgraph is isometric; that is, the distance between any two vertices in the subgraph is equal to the one in the whole graph. Therefore, any connected induced subgraph of any distance-hereditary graph *G* "inherits" its distance function from *G*. Formally:

**Definition 1** (from [1])**.** *A graph G is a distance-hereditary graph if, for each connected induced subgraph G*′ *of G, the following holds: d*<sub>*G*′</sub>(*x*, *y*) = *d*<sub>*G*</sub>(*x*, *y*)*, for each x*, *y* ∈ *G*′*.*

This kind of graph has been rediscovered many times (e.g., see [2]). Since their introduction, dozens of papers have been devoted to them, and different kinds of characterizations have been found: metric, forbidden subgraphs, cycle/chord conditions, level/neighborhood conditions, generative, and more (e.g., see [3]). Among such results, the generative properties have proved the most fruitful for algorithmic applications, since they allowed researchers to efficiently solve many combinatorial problems in the class of distance-hereditary graphs (e.g., see [4–9]).

From an applicative point of view, distance-hereditary graphs are mainly attractive due to their basic metric property. For instance, these graphs can model unreliable communication networks [10,11] in which vertex failures may occur: at a given time, if sender and receiver are still connected, any message can be still delivered without increasing the length of the path used to reach the receiver.

Since in communication networks this property could be considered too restrictive, in [12] the class of *k-distance-hereditary graphs* has been introduced. These graphs can model unreliable networks in which messages can eventually reach the destination traversing a path whose length is at most *k* times the length of a shortest path computed in the absence of vertex failures. The minimum *k* a network guarantees regardless of the failed vertices is called the *stretch number*. Formally:

**Definition 2** (from [12])**.** *Given a real number k* ≥ 1*, a graph G is a k*-distance-hereditary graph *if, for each connected induced subgraph G*′ *of G, the following holds: d*<sub>*G*′</sub>(*x*, *y*) ≤ *k* · *d*<sub>*G*</sub>(*x*, *y*)*, for each x*, *y* ∈ *G*′*.*

The class of all the *k*-distance-hereditary graphs is denoted by DH(*k*). Concerning this class of graphs, the following relationships hold:


Additional results about the class hierarchy DH(*k*) can be found in [13,14]. It is worth noting that this hierarchy is *fully general*; that is, for each arbitrary graph *G* there exists a number *k* such that *G* ∈ DH(*k*). It follows that the stretch number of *G*, denoted as *s*(*G*), is the smallest number *t* such that *G* belongs to DH(*t*). In [12], it has been shown that the stretch number *s*(*G*) of any connected graph *G* can be computed as follows:


It follows that for any non-trivial graph *G* with *n* ≥ 4 vertices, by simply maximizing *D*(*u*, *v*) and minimizing *d*(*u*, *v*), we get *s*(*G*) ≤ (*n* − 2)/2. From the above relationship about *s*(*G*), we get that the stretch number is always a rational number. Interestingly, it has been shown that there are some rational numbers that cannot be stretch numbers. Formally, a positive rational number *t* is called *admissible stretch number* if there exists a graph *G* such that *s*(*G*) = *t*. The following result characterizes which numbers are admissible stretch numbers.

**Theorem 1** (from [14])**.** *A rational number t is an admissible stretch number if and only if t* = 2 − 1/*i, for some integer i* ≥ 1*, or t* ≥ 2*.*

Apart from the interesting general results found for the classes DH(*k*), the original motivation was studying how (if possible) to extend the known algorithmic results from the base class, namely DH(1), to DH(*k*) for some constant *k* > 1. According to Theorem 1, in this work we are interested in studying the class containing each graph *G* such that *s*(*G*) < 2. Since this class contains graphs with stretch number *strictly* less than two, throughout this paper it will be denoted by *s*DH(2).

**Results.** In this work, we provide three results for the class *s*DH(2), namely two different characterizations and a recognition algorithm (notice that the characterizations have already been presented in [13] but with omitted proofs). The first characterization is based on listing all the minimal forbidden subgraphs for each graph in the class. It is interesting to observe the similarity with the corresponding result for the class DH(1):

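- *G* ∈ DH(1) if and only if the following graphs are not induced subgraphs of *G*:
	- **–** holes *H*<sub>*n*</sub>, for each *n* ≥ 5;
	- **–** cycles *C*<sub>5</sub> with *cd*(*C*<sub>5</sub>) = 1;
	- **–** cycles *C*<sub>6</sub> with *cd*(*C*<sub>6</sub>) = 1.
- *G* ∈ *s*DH(2) if and only if the following graphs are not induced subgraphs of *G*:
	- **–** holes *H*<sub>*n*</sub>, for each *n* ≥ 6;
	- **–** cycles *C*<sub>6</sub> with *cd*(*C*<sub>6</sub>) = 1;
	- **–** cycles *C*<sub>7</sub> with *cd*(*C*<sub>7</sub>) = 1;
	- **–** cycles *C*<sub>8</sub> with *cd*(*C*<sub>8</sub>) = 1.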

Here we used the notion of "chord distance" *cd*(*C*) to express the position of possible chords within any cycle *C* (see Section 2 for a formal definition). Notice that in [14] a similar result has been provided for the generic class DH(2 − 1/*i*), *i* > 1.

The second result is a characterization based on a cycle-chord property. As in the previous case, notice the similarity with the corresponding result for the class DH(1):


The last result is a recognition algorithm for graphs belonging to *s*DH(2) that works in *O*(*n*<sup>2</sup>*m*<sup>2</sup>) time and *O*(*m*<sup>2</sup>) space. Basically, this algorithm exploits the result based on the cycle-chord property and, as a consequence, simply detects *quasi-holes* in any graph. A quasi-hole is any cycle with at least five vertices and chord distance at most one (i.e., all the possible chords of the cycle must be incident to the same vertex). This algorithm is obtained by adapting the algorithm provided in [15] for detecting holes (i.e., cycles with at least five vertices and no chords).

**Outline.** The paper is organized as follows. In Section 2, we introduce notation and basic concepts used throughout the paper. Sections 3 and 4 are devoted to providing the characterizations based on minimal forbidden subgraphs and cycle-chord conditions for graphs in *s*DH(2), respectively. In Section 5, we provide the algorithm for detecting quasi-holes and hence for solving the recognition problem for the class *s*DH(2). Finally, Section 6 provides some concluding remarks.

#### **2. Notation and Basic Concepts**

We consider finite, simple, loop-less, undirected, and unweighted graphs *G* = (*V*, *E*) with vertex set *V* and edge set *E*. A *subgraph* of *G* is a graph having all its vertices and edges in *G*. Given *S* ⊆ *V*, the *induced subgraph G*[*S*] of *G* is the maximal subgraph of *G* with vertex set *S*. Given *u* ∈ *V*, *NG*(*u*) denotes the set of *neighbors* of *u* in *G*, and *NG*[*u*] = *NG*(*u*) ∪ {*u*}.

A sequence of pairwise distinct vertices (*x*0, *x*1, . . . , *x<sup>k</sup>* ) is a *path* in *G* if (*x<sup>i</sup>* , *xi*+1) ∈ *E* for 0 ≤ *i* < *k*; vertex *x<sup>i</sup>* , for each 0 < *i* < *k*, is an *internal vertex* of that path. A *chord* of a path is any edge joining two non-consecutive vertices in the path, and a path is an *induced path* if it has no chords. We denote by *P<sup>k</sup>* any induced path with *k* ≥ 3 vertices (e.g., an induced path on three vertices is denoted as *P*<sup>3</sup> whereas an induced path on four vertices is denoted as *P*4). Two vertices *x* and *y* are *connected* in *G* if there exists a path (*x*, . . . , *y*) in *G*. A graph is *connected* if every pair of vertices is connected.

A *cycle* in *G* is a path (*x*<sub>0</sub>, *x*<sub>1</sub>, . . . , *x*<sub>*k*−1</sub>) where also (*x*<sub>0</sub>, *x*<sub>*k*−1</sub>) ∈ *E*. Two vertices *x*<sub>*i*</sub> and *x*<sub>*j*</sub> are *consecutive* in the cycle (*x*<sub>0</sub>, *x*<sub>1</sub>, . . . , *x*<sub>*k*−1</sub>) if *j* = (*i* + 1) mod *k* or *i* = (*j* + 1) mod *k*. A *chord* of a cycle is an edge joining two non-consecutive vertices in the cycle. We denote by *C*<sub>*k*</sub> any cycle with *k* ≥ 3 vertices, whereas *H*<sub>*k*</sub> denotes a *hole*, i.e., a cycle *C*<sub>*k*</sub>, *k* ≥ 5, without chords. The *chord distance* of a cycle *C*<sub>*k*</sub> is denoted by *cd*(*C*<sub>*k*</sub>) and is defined as the minimum number of consecutive vertices in *C*<sub>*k*</sub> such that every chord of *C*<sub>*k*</sub> is incident to at least one of them (see Figure 1 for an example of chord distance). We assume *cd*(*H*<sub>*k*</sub>) = 0.
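The definition of chord distance translates directly into a small exhaustive procedure. The following is a minimal sketch, assuming the cycle is given as a vertex sequence and the graph as an edge list; the function name `chord_distance` and the hexagon examples are illustrative choices, not taken from the paper.

```python
def chord_distance(cycle, edges):
    """Chord distance of the cycle (x_0, ..., x_{k-1}) in the graph with the given edges.

    Returns the smallest number of consecutive cycle vertices that together
    touch every chord; a chordless cycle yields 0.
    """
    k = len(cycle)
    es = {frozenset(e) for e in edges}
    # Chords: edges of the graph joining two non-consecutive cycle vertices.
    chords = [
        (cycle[i], cycle[j])
        for i in range(k)
        for j in range(i + 2, k)
        if not (i == 0 and j == k - 1)  # x_0 and x_{k-1} are consecutive
        and frozenset((cycle[i], cycle[j])) in es
    ]
    for size in range(k + 1):
        for start in range(k):
            # A window of `size` consecutive vertices, wrapping around the cycle.
            window = {cycle[(start + t) % k] for t in range(size)}
            if all(a in window or b in window for a, b in chords):
                return size
    return k


# Hypothetical examples on a 6-cycle with vertices 0..5.
HEXAGON = [(i, (i + 1) % 6) for i in range(6)]
cd_hole = chord_distance(list(range(6)), HEXAGON)
cd_fan = chord_distance(list(range(6)), HEXAGON + [(0, 2), (0, 3)])
cd_two = chord_distance(list(range(6)), HEXAGON + [(0, 2), (1, 3)])
```

In the third example no single vertex touches both chords, while the consecutive vertices 0 and 1 do, so two vertices are needed.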

The length of any shortest path between two vertices *x* and *y* in a graph *G* is called *distance* and is denoted by *d*<sub>*G*</sub>(*x*, *y*). Moreover, the length of any longest induced path between them is denoted by *D*<sub>*G*</sub>(*x*, *y*). If *x* and *y* are distinct vertices, we use the symbols *p*<sub>*G*</sub>(*x*, *y*) and *P*<sub>*G*</sub>(*x*, *y*) to denote any shortest and any longest induced path between *x* and *y*, respectively. Sometimes, when no ambiguity occurs, we also use *p*<sub>*G*</sub>(*x*, *y*) and *P*<sub>*G*</sub>(*x*, *y*) to denote the sets of vertices belonging to the corresponding paths. If *d*<sub>*G*</sub>(*x*, *y*) ≥ 2, then {*x*, *y*} is a *cycle-pair* if there exist two induced paths *p*<sub>*G*</sub>(*x*, *y*) and *P*<sub>*G*</sub>(*x*, *y*) such that *p*<sub>*G*</sub>(*x*, *y*) ∩ *P*<sub>*G*</sub>(*x*, *y*) = {*x*, *y*}. In other words, if {*x*, *y*} is a cycle-pair, then there exist induced paths *p*<sub>*G*</sub>(*x*, *y*) and *P*<sub>*G*</sub>(*x*, *y*) such that the vertices in *p*<sub>*G*</sub>(*x*, *y*) ∪ *P*<sub>*G*</sub>(*x*, *y*) form a cycle in *G*; this cycle is denoted by *G*[*x*, *y*]. In Figure 1, {*v*<sub>3</sub>, *v*<sub>6</sub>} is a cycle-pair that induces the cycle (*v*<sub>3</sub>, *v*<sub>4</sub>, *v*<sub>5</sub>, *v*<sub>6</sub>, *v*<sub>1</sub>); in particular, *G*[*v*<sub>3</sub>, *v*<sub>6</sub>] is induced by *p*<sub>*G*</sub>(*v*<sub>3</sub>, *v*<sub>6</sub>) = (*v*<sub>3</sub>, *v*<sub>1</sub>, *v*<sub>6</sub>) and *P*<sub>*G*</sub>(*v*<sub>3</sub>, *v*<sub>6</sub>) = (*v*<sub>3</sub>, *v*<sub>4</sub>, *v*<sub>5</sub>, *v*<sub>6</sub>). We use the symbol S(*G*) to denote the set containing all pairs {*u*, *v*} of connected vertices that induce the stretch number of *G*, namely S(*G*) = {{*x*, *y*} : *s*<sub>*G*</sub>(*x*, *y*) = *s*(*G*)}. The following lemma states that cycle-pairs are useful to determine the stretch number.

**Figure 1.** The chord distance of this *C*<sub>6</sub> graph is two because: (i) vertices *v*<sub>1</sub> and *v*<sub>2</sub> are consecutive in the cycle, (ii) every chord is incident to one of these vertices, and (iii) no set with fewer than two vertices has the same properties.

**Lemma 1** (from [12])**.** *Let G be a graph such that s*(*G*) > 1*. The following relationships hold:*


This lemma suggests that studying *s*(*G*) amounts to analyzing the cycles in *G*. In particular, if {*u*, *v*} is a cycle-pair that belongs to S(*G*), then the cycle *G*[*u*, *v*] is called an *inducing-stretch cycle* for *G*. In Figure 1, the represented graph *G* belongs to DH(3/2); moreover, both {*v*<sub>3</sub>, *v*<sub>5</sub>} and {*v*<sub>3</sub>, *v*<sub>6</sub>} are cycle-pairs in S(*G*), and (*v*<sub>1</sub>, *v*<sub>3</sub>, *v*<sub>4</sub>, *v*<sub>5</sub>, *v*<sub>6</sub>) is the corresponding inducing-stretch cycle.
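For very small graphs, the quantities *d*<sub>*G*</sub> and *D*<sub>*G*</sub> introduced in this section can be computed exhaustively. The sketch below assumes that the stretch number is the largest ratio *D*<sub>*G*</sub>(*x*, *y*)/*d*<sub>*G*</sub>(*x*, *y*) over all pairs of connected vertices, which is how we read the relationship from [12] recalled in the Introduction. All function names are hypothetical, and the search over induced paths takes exponential time.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def distance(adj, x, y):
    """Length of a shortest x-y path (BFS), or None if x and y are not connected."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def longest_induced_path(adj, x, y):
    """Length of a longest induced x-y path, found by exhaustive extension."""
    best = None

    def extend(path):
        nonlocal best
        last = path[-1]
        if last == y:
            best = max(best or 0, len(path) - 1)
            return
        for w in adj[last]:
            # Keep the path induced: w may be adjacent only to the last vertex.
            if w not in path and all(w not in adj[p] for p in path[:-1]):
                extend(path + [w])

    extend([x])
    return best


def stretch_number(vertices, edges):
    """Assumed formula: max over connected pairs of D_G(x, y) / d_G(x, y)."""
    adj = adjacency(vertices, edges)
    best = Fraction(1)
    for x, y in combinations(list(vertices), 2):
        d = distance(adj, x, y)
        if d is not None:
            best = max(best, Fraction(longest_induced_path(adj, x, y), d))
    return best


def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]
```

For instance, the hole *H*<sub>5</sub> built with `cycle_edges(5)` has ratio 3/2 for any two vertices at distance two, in agreement with the admissible value 2 − 1/2 of Theorem 1.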

#### **3. A Characterization Based on Forbidden Subgraphs**

A well known characterization based on *minimal forbidden subgraphs* has been provided for the class of distance-hereditary graphs.

**Theorem 2** (from [2])**.** *A graph G is a distance-hereditary graph if and only if it does not contain, as an induced subgraph, any of the following graphs: the hole Hn, n* ≥ 5*, the house, the fan, and the domino (cf. Figure 2).*

**Figure 2.** The minimal forbidden subgraphs of distance-hereditary graphs: from left to right, the *hole*, the *house*, the *fan*, and the *domino*. Dashed lines represent paths of length at least one.

This result can be easily reformulated, and simplified, by using the notion of chord distance. In particular, it is possible to characterize in a compact way all the forbidden subgraphs by using just the notion of chord distance as follows:

	- (*i*) *Hn, for each n* ≥ 5*;*
	- (*ii*) *cycles C*<sup>5</sup> *with cd*(*C*5) = 1*;*
	- (*iii*) *cycles C*<sup>6</sup> *with cd*(*C*6) = 1*.*

It is worth noting that in this way we do not consider the minimal subgraphs only (cf. Figure 3).

**Figure 3.** The forbidden subgraphs of DH(1) expressed according to the notion of chord distance. Dashed lines represent paths of length at least one. Dotted lines represent chords that may or may not exist.

In the following we provide a characterization similar to that of Theorem 2 for any graph *G* ∈ *s*DH(2). Before giving such a result, we need to recall the following technical lemma.

**Lemma 2.** *Let G be a graph and let G*[*x*, *y*] *be an inducing-stretch cycle of G defined by the induced paths PG*(*x*, *y*) = (*x*, *u*1, *u*2, . . . , *up*−1, *y*) *and pG*(*x*, *y*) = (*x*, *v*1, *v*2, . . . , *vq*−1, *y*)*. If d*(*x*, *y*) ≥ 3 *then v*<sup>1</sup> *must be incident to chords of the cycle G*[*x*, *y*]*.*

**Proof.** Since *G*[*x*, *y*] is an inducing-stretch cycle of *G*, we have *s*(*G*) = *p*/*q*. If *v*<sub>1</sub> is not incident to any chord of *G*[*x*, *y*], then the induced paths *P*<sub>*G*</sub>(*v*<sub>1</sub>, *y*) = (*v*<sub>1</sub>, *x*, *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*p*−1</sub>, *y*) and *p*<sub>*G*</sub>(*v*<sub>1</sub>, *y*) = (*v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>*q*−1</sub>, *y*) imply *s*<sub>*G*</sub>(*v*<sub>1</sub>, *y*) = (*p* + 1)/(*q* − 1) > *p*/*q*, a contradiction.

Let *G* be any graph. According to Lemma 1, let us consider an inducing-stretch cycle *G*[*x*, *y*] of *G*. Assume that *G*[*x*, *y*] is formed by the vertices of the induced paths *P*<sub>*G*</sub>(*x*, *y*) = (*x*, *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*p*−1</sub>, *y*) and *p*<sub>*G*</sub>(*x*, *y*) = (*x*, *v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>*q*−1</sub>, *y*). Since *P*<sub>*G*</sub>(*x*, *y*) and *p*<sub>*G*</sub>(*x*, *y*) are induced paths, each chord of *G*[*x*, *y*] (if any) joins vertices *v*<sub>*i*</sub> and *u*<sub>*j*</sub>, with 1 ≤ *i* ≤ *q* − 1 and 1 ≤ *j* ≤ *p* − 1. When some vertex *v*<sub>*i*</sub> is incident to chords of *G*[*x*, *y*], we denote by (*v*<sub>*i*</sub>, *u*<sub>*l*<sub>*i*</sub></sub>) and (*v*<sub>*i*</sub>, *u*<sub>*r*<sub>*i*</sub></sub>) the *leftmost* and *rightmost* chords of *v*<sub>*i*</sub>, respectively. Formally, the indices *l*<sub>*i*</sub> and *r*<sub>*i*</sub> are defined as follows:


**Theorem 3.** *Let G be a graph. G* ∈ *s*DH(2) *if and only if the following graphs are not induced subgraphs of G:*


**Proof.** (⇒) Each provided hole and cycle has stretch number greater than or equal to 2, and hence it cannot be an induced subgraph of *G*.

(⇐) We prove that if *s*(*G*) ≥ 2, then *G* contains one of the subgraphs in items (*i*), (*ii*), (*iii*), or (*iv*), or *G* contains a proper induced subgraph *G*′ such that *s*(*G*′) ≥ 2. In the latter case, we can recursively apply the following proof to *G*′.

According to Lemma 1, consider an inducing-stretch cycle *G*[*x*, *y*] of *G* and assume it is formed by the vertices of the induced paths *PG*(*x*, *y*) = (*x*, *u*1, *u*2, . . . , *up*−1, *y*) and *pG*(*x*, *y*) = (*x*, *v*1, *v*2, . . . , *vq*−1, *y*). Notice that, since *PG*(*x*, *y*) and *pG*(*x*, *y*) are induced paths, each possible chord of *G*[*x*, *y*] joins vertices *v<sup>i</sup>* and *u<sup>j</sup>* , with 1 ≤ *i* ≤ *q* − 1 and 1 ≤ *j* ≤ *p* − 1.

Since *p*/*q* ≥ 2 by hypothesis, then *q* ≥ 2 by Item (*i*) of Lemma 1, and hence *p* ≥ 4. According to the value of *q*, we analyze two different cases:

*q* = 2**:** In this case, if *G*[*x*, *y*] is chordless, then it corresponds to a hole as described in Item (*i*). If the chord distance of *G*[*x*, *y*] is equal to 1, all chords are incident to *v*1. According to *p*, we have:


$$s_{G'}(u_{l_1}, y) \ge \frac{p - l_1}{2} \ge \frac{7 - 3}{2} = 2.$$

Hence, *G*′ is a proper subgraph of *G* with *s*(*G*′) ≥ 2. The statement follows by recursively applying this proof to *G*′.

	- *r*<sub>1</sub> ≥ 4**:** Consider the subgraph *G*″ induced by the vertices in the cycle (*v*<sub>1</sub>, *x*, *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*r*<sub>1</sub></sub>). In this case, the induced paths *P*″ = (*x*, *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*r*<sub>1</sub></sub>) and *p*″ = (*x*, *v*<sub>1</sub>, *u*<sub>*r*<sub>1</sub></sub>) provide the following lower bound for *s*<sub>*G*″</sub>: *s*<sub>*G*″</sub>(*x*, *u*<sub>*r*<sub>1</sub></sub>) ≥ *r*<sub>1</sub>/2 ≥ 2. Hence, *G*″ is a proper subgraph of *G* with *s*(*G*″) ≥ 2. The statement follows by recursively applying this proof to *G*″.
	- *r*<sub>1</sub> ≤ 3**:** In this case the induced paths *P*‴ = (*v*<sub>1</sub>, *u*<sub>*r*<sub>1</sub></sub>, *u*<sub>*r*<sub>1</sub>+1</sub>, . . . , *u*<sub>*p*−1</sub>, *y*) and *p*‴ = (*v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>*q*−1</sub>, *y*) provide the following lower bound for *s*<sub>*G*</sub>(*v*<sub>1</sub>, *y*):

$$s_G(v_1, y) \ge \frac{p-2}{q-1}.$$

Since (*p* − 2)/(*q* − 1) ≥ *p*/*q* is equivalent to *p*/*q* ≥ 2 (which holds by hypothesis), the subgraph *G*‴ induced by the vertices in both *P*‴ and *p*‴ is a proper subgraph of *G* with stretch *p*<sup>∗</sup>/*q*<sup>∗</sup> ≥ 2 and *q*<sup>∗</sup> = *q* − 1. Hence, the statement follows by recursively applying this proof to *G*‴.
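For completeness, the equivalence used in the last step can be checked by cross-multiplying, which is legitimate because both denominators are positive (*q* ≥ 2):

$$\frac{p-2}{q-1} \ge \frac{p}{q} \iff q(p-2) \ge p(q-1) \iff p \ge 2q \iff \frac{p}{q} \ge 2.$$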

This concludes the proof.

Figures 3 and 4 summarize the characterizations based on forbidden subgraphs for classes DH(1) and *s*DH(2), respectively. Figure 5 provides the list of all the *minimal* forbidden subgraphs of any graph in *s*DH(2).

**Figure 4.** The forbidden subgraphs of graphs having stretch number less than 2. Dashed (dotted, respectively) lines represent paths of length at least one (chords that may or may not exist, respectively).

**Figure 5.** The *minimal* forbidden subgraphs of any graph with stretch number less than 2. Dashed lines represent paths of length at least one.

#### **4. A Characterization Based on Cycle-Chord Conditions**

For the class of distance-hereditary graphs, Howorka provided the following well known characterization based on cycle-chord conditions.

**Theorem 4** (from [1])**.** *Let G be a graph. G* ∈ DH(1) *if and only if each cycle Cn, n* ≥ 5*, of G has two crossing chords.*

In [12], this result has been reformulated in terms of chord distance:

**Theorem 5** (from [12])**.** *Let G be a graph. G* ∈ DH(1) *if and only if cd*(*Cn*) > 1 *for each cycle Cn, n* ≥ 5*, of G.*

In the remainder of this section, we provide a similar characterization for graphs belonging to *s*DH(2).

**Lemma 3.** *Let G be a graph. If s*(*G*) = 2 *then G contains, as an induced subgraph, a cycle C*<sub>6</sub> *with chord distance at most 1.*

**Proof.** According to Lemma 1, consider an inducing-stretch cycle *G*[*x*, *y*] of *G*. Since *s*(*G*) = 2, assume that *G*[*x*, *y*] is formed by the vertices of the induced paths *PG*(*x*, *y*) = (*x*, *u*1, *u*2, . . . , *u*2*s*−1, *y*) and *pG*(*x*, *y*) = (*x*, *v*1, *v*2, . . . , *vs*−1, *y*), with *s* ≥ 2.

If *s* = 2 then the proof is concluded. In fact, cycle *G*[*x*, *y*] has 6 vertices and every chord of *G*[*x*, *y*] (if any) is incident to *v*1.

In the remainder of the proof assume *s* ≥ 3. In this case, according to Lemma 2, *v*<sup>1</sup> is incident to chords of *G*[*x*, *y*]. Let (*v*1, *ur*<sup>1</sup> ) be the rightmost chord incident to *v*1. We analyze different cases according to the value of *r*1.


$$s_G(v_1, y) \ge \frac{2s - r_1 + 1}{s - 1} \ge \frac{2s - 2 + 1}{s - 1} = 2 + \frac{1}{s - 1}.$$

This contradicts *s*(*G*) = 2.

It follows that either *r*<sup>1</sup> = 4 or *r*<sup>1</sup> = 3. In the first case the cycle (*v*1, *x*, *u*1, *u*2, *u*3, *u*4) represents the requested cycle *C*6: chords of *G*[*x*, *y*] (if any) are all incident to *v*1. In the second case consider the induced paths (*v*1, *ur*<sup>1</sup> , *ur*1+1, . . . , *u*2*s*−1, *y*) and (*v*1, *v*2, . . . , *vs*−1, *y*). These paths induce the following lower bound on *sG*(*v*1, *y*):

$$s_G(v_1, y) \ge \frac{2s - r_1 + 1}{s - 1} = \frac{2s - 3 + 1}{s - 1} = 2.$$

Hence, the above paths induce a proper subgraph *G*′ of *G* with stretch number 2, and this proof can be recursively applied to *G*′.

**Lemma 4.** *Let G be a graph. s*(*G*) ≥ 2 *if and only if G contains, as an induced subgraph, a cycle Cn, n* ≥ 6*, with chord distance at most 1.*

**Proof.** (⇐) Trivial.

(⇒) If *s*(*G*) = 2, then it is sufficient to use Lemma 3. Now, let us assume that *s*(*G*) = *p*/*q* > 2 such that *p* and *q* are coprime. By Lemma 1, if *G*[*x*, *y*] is an inducing-stretch cycle of *G*, according to the hypotheses, we may assume that *G*[*x*, *y*] is formed by the vertices of the induced paths *PG*(*x*, *y*) = (*x*, *u*1, *u*2, . . . , *up*·*s*−1, *y*) and *pG*(*x*, *y*) = (*x*, *v*1, *v*2, . . . , *vq*·*s*−1, *y*), with *s* ≥ 1.

If *d*(*x*, *y*) = 2, then *G*[*x*, *y*] contains at least 6 vertices and all its chords (if any) are incident to *v*1. Then, *G*[*x*, *y*] corresponds to the requested cycle.

In the remainder, assume that *d*(*x*, *y*) ≥ 3. In this case, by Lemma 2, vertex *v*<sup>1</sup> is incident to chords of *G*[*x*, *y*]: let (*v*1, *ur*<sup>1</sup> ) be the rightmost chord incident to it.

If *r*<sup>1</sup> ≤ 3, then the two induced paths (*v*1, *ur*<sup>1</sup> , *ur*1+1, . . . , *up*·*s*−1, *y*) and (*v*1, *v*2, . . . , *vq*·*s*−1, *y*) provide the following lower bound for *sG*(*v*1, *y*):

$$s_G(v_1, y) \ge \frac{p \cdot s - r_1 + 1}{q \cdot s - 1}.$$

Now we show that

$$\frac{p \cdot s - r_1 + 1}{q \cdot s - 1} > \frac{p}{q}.\tag{1}$$

It can be easily observed that Equation (1) is equivalent to

$$\frac{p}{q} > r_1 - 1.\tag{2}$$

Since *r*<sup>1</sup> ≤ 3 and *p*/*q* > 2 by hypothesis, then Equation (2) holds. This implies that *sG*(*v*1, *y*) > *p*/*q*, a contradiction.
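For completeness, the equivalence between (1) and (2) can be verified by cross-multiplying, using *q* > 0 and *q* · *s* − 1 > 0:

$$\frac{p \cdot s - r_1 + 1}{q \cdot s - 1} > \frac{p}{q} \iff q\,(p \cdot s - r_1 + 1) > p\,(q \cdot s - 1) \iff p > q\,(r_1 - 1) \iff \frac{p}{q} > r_1 - 1.$$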

Then, it follows that *r*<sub>1</sub> ≥ 4. In this case, *C* = (*x*, *u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*r*<sub>1</sub></sub>, *v*<sub>1</sub>) is an induced cycle with *r*<sub>1</sub> + 2 ≥ 6 vertices and chord distance at most 1 (in *C*, all the possible chords are incident to *v*<sub>1</sub>). This concludes the proof.

This lemma can be reformulated so that it directly provides a characterization for the graphs under consideration.

**Theorem 6.** *Let G be a graph. G* ∈ *s*DH(2) *if and only if cd*(*Cn*) > 1 *for each cycle Cn, n* ≥ 6*, of G.*

Compare Theorems 5 and 6 to observe the similarity between the cycle-chord characterizations of graphs with stretch number equal to 1 and graphs with stretch number less than 2, respectively.

#### **5. Recognition Algorithm**

The distance-hereditary graphs, i.e., graphs in DH(1), can be recognized in linear time [16], while the recognition problem for the generic class DH(*k*), *k* not fixed, is co-NP-complete [12]. For small and fixed values of *k*, a partial answer to this basic problem is given in [14]. In particular, Theorem 1 states that for *k* < 2, only specific rational numbers may act as stretch numbers. In [14], a characterization for each class DH(2 − 1/*i*), *i* > 1, has been provided, and such a characterization led to a polynomial-time algorithm for the recognition problem for the class DH(2 − 1/*i*), with fixed *i* > 1. Unfortunately, the running time of this algorithm is bounded by *O*(*n*<sup>3*i*+2</sup>).

In this section, we propose a polynomial-time algorithm for solving the recognition problem for the class *s*DH(2) according to the following approach. Lemma 4 provides a characterization for all graphs not belonging to *s*DH(2). It is based on detecting whether a given graph *G* contains or not an induced cycle *Cn*, *n* ≥ 6, with chord distance at most 1. Now, assume that we have an algorithm A returning *true* if and only if a given graph *G* contains such a cycle. Then, to recognize whether *G* ∈ *s*DH(2) we can simply use A on *G* and certify the membership if and only if A return *false*. In the remainder of this section we show that such an algorithm A can be defined.
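The reduction described above can be illustrated with a brute-force stand-in for algorithm A. The following is a minimal sketch, not the polynomial-time procedure developed in this section: it enumerates all simple cycles and therefore takes exponential time. It assumes that a cycle has chord distance at most 1 exactly when all of its chords are incident to a single vertex (or there are no chords), and the names `cycle_graph`, `simple_cycles`, `chords_share_a_vertex` and `in_sdh2` are hypothetical.

```python
from itertools import combinations


def cycle_graph(n, extra_edges=()):
    """Adjacency sets of the cycle 0-1-...-(n-1)-0 plus optional extra edges."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for a, b in extra_edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def simple_cycles(adj, min_len):
    """Yield each simple cycle with at least min_len vertices exactly once."""
    def extend(path, on_path):
        start, last = path[0], path[-1]
        for nxt in adj[last]:
            if nxt == start:
                # Close the cycle; keep only one of its two orientations.
                if len(path) >= max(3, min_len) and path[1] < path[-1]:
                    yield tuple(path)
            elif nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                yield from extend(path, on_path)
                on_path.remove(nxt)
                path.pop()

    # Every cycle is reported from its smallest vertex.
    for start in sorted(adj):
        yield from extend([start], {start})


def chords_share_a_vertex(adj, cycle):
    """True if the cycle has no chord or all chords meet in one vertex."""
    k = len(cycle)
    chords = [
        (cycle[i], cycle[j])
        for i, j in combinations(range(k), 2)
        if j - i not in (1, k - 1) and cycle[j] in adj[cycle[i]]
    ]
    return not chords or any(all(v in c for c in chords) for v in cycle)


def in_sdh2(adj):
    """Membership test: no cycle on >= 6 vertices with chord distance <= 1."""
    return not any(
        chords_share_a_vertex(adj, cycle) for cycle in simple_cycles(adj, 6)
    )
```

For instance, `in_sdh2(cycle_graph(5))` accepts the 5-cycle, while `in_sdh2(cycle_graph(6))` rejects the 6-cycle, which is itself a chordless cycle on six vertices. Such an exhaustive check is only practical on very small graphs, but it can serve as a reference when testing a faster detector.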

#### *5.1. An Existing Hole Detection Algorithm*

Recall that *H*<sub>*k*</sub> denotes a hole, i.e., a chordless cycle with *k* ≥ 5 vertices. In [15], Nikolopoulos and Palios provided the following result about the hole detection problem.

**Theorem 7** (from [15])**.** *Given any connected graph G* = (*V*, *E*) *with* |*V*| = *n and* |*E*| = *m, it is possible to determine whether G contains a hole in O*(*m*<sup>2</sup> ) *time and O*(*nm*) *space.*

They also extended their result to larger versions of holes.

**Corollary 1** (from [15])**.** *Let G* = (*V*, *E*) *be a connected graph with* |*V*| = *n and* |*E*| = *m, and let k* ≥ 5 *be a constant. It is possible to determine whether G contains a hole on at least k vertices in O*(*nm*<sup>*p*−1</sup>) *time and O*(*m*<sup>*p*−1</sup>) *space if k* = 2*p, and in O*(*n* + *m*<sup>*p*</sup>) *time and O*(*nm*<sup>*p*−1</sup>) *space if k* = 2*p* + 1*.*

Therefore, according to this corollary, it is possible to check whether *G* contains a hole *H*<sub>*k*</sub> with *k* ≥ 6 vertices in *O*(*nm*<sup>2</sup>) time and *O*(*m*<sup>2</sup>) space.

#### *5.2. Quasi-Hole Detection Algorithm*

We call *quasi-hole* any cycle *C*<sub>*k*</sub> such that *k* ≥ 5 and *cd*(*C*<sub>*k*</sub>) ≤ 1. In what follows, we show that the hole-detection algorithms recalled in Theorem 7 and Corollary 1 can be adapted to detect quasi-holes in any connected graph *G*. This adapted version is called QuasiHoleDetection and it is described in pseudo-code as shown in Algorithms 1 and 2. The strategy behind QuasiHoleDetection is based on the following result:

**Lemma 5.** *A connected graph G contains a quasi-hole if and only if there exists a cycle* (*v*<sub>0</sub>, *v*<sub>1</sub>, . . . , *v*<sub>ℓ</sub>)*,* ℓ ≥ 4*, in G such that each path* (*v*<sub>*i*</sub>, *v*<sub>*i*+1</sub>, *v*<sub>*i*+2</sub>, *v*<sub>*i*+3</sub>)*, i* = 1, . . . , ℓ − 3*, is a P*<sub>4</sub> *of G.*


a "length" defined as *length*(*v*<sub>*i*</sub>, *v*<sub>*j*</sub>) = |*j* − *i*|. Now, let (*v*<sub>*l*</sub>, *v*<sub>*r*</sub>), with *l* < *r*, be a chord with minimum length. By definition, 0 < *l* < *r* ≤ ℓ holds. Since (*v*<sub>*l*</sub>, *v*<sub>*l*+1</sub>, *v*<sub>*l*+2</sub>, *v*<sub>*l*+3</sub>) is a *P*<sub>4</sub>, then *r* ≥ *l* + 4, and hence *C*′ = (*v*<sub>*l*</sub>, *v*<sub>*l*+1</sub>, . . . , *v*<sub>*r*</sub>) is a cycle with at least 5 vertices. Moreover, no edge can exist between *v*<sub>*i*</sub> and *v*<sub>*j*</sub>, for each *l* ≤ *i* < *i* + 2 ≤ *j* ≤ *r* with (*i*, *j*) ≠ (*l*, *r*), since otherwise it would be a chord with length smaller than *length*(*v*<sub>*l*</sub>, *v*<sub>*r*</sub>).

Since *C*′ is a cycle with at least 5 vertices and with chord distance zero, it contradicts the fact that *C* is the shortest among the cycles fulfilling the conditions of the statement. Hence, *cd*(*C*) ≤ 1.

Since both the properties at points (*i*) and (*ii*) hold, it follows that *C* is a quasi-hole.
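The condition of Lemma 5 is easy to test on a given candidate cycle. The sketch below does only that: it checks one cycle rather than searching for one. It assumes the graph is stored as a dictionary of adjacency sets, and the names `is_induced_p4` and `fulfills_lemma5` are hypothetical.

```python
def is_induced_p4(adj, a, b, c, d):
    """True if (a, b, c, d) is an induced path on four distinct vertices."""
    return (
        len({a, b, c, d}) == 4
        and b in adj[a] and c in adj[b] and d in adj[c]
        and c not in adj[a] and d not in adj[a] and d not in adj[b]
    )


def fulfills_lemma5(adj, cycle):
    """Check the condition of Lemma 5 on cycle = (v_0, v_1, ..., v_l)."""
    l = len(cycle) - 1
    if l < 4 or len(set(cycle)) != len(cycle):
        return False
    # The sequence must be a cycle of G: consecutive vertices are adjacent,
    # including the closing edge (v_l, v_0).
    closed = all(
        cycle[(i + 1) % (l + 1)] in adj[cycle[i]] for i in range(l + 1)
    )
    # Only the windows starting at i = 1, ..., l - 3 are constrained,
    # so chords incident to v_0 are tolerated.
    return closed and all(
        is_induced_p4(adj, *cycle[i:i + 4]) for i in range(1, l - 2)
    )
```

Note that the outcome depends on which vertex plays the role of *v*<sub>0</sub>: on a 5-cycle with the single chord (0, 2), the sequence (0, 1, 2, 3, 4) passes the test, whereas the rotation (1, 2, 3, 4, 0) fails because the window (2, 3, 4, 0) contains the chord.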


The above lemma is used by the provided algorithm for the detection of quasi-holes in *G*. To this end, we associate to *G* a directed graph *G*<sup>+</sup> defined as follows:


If (*a*, *b*, *c*) is a path *P*<sub>3</sub> of *G*, then both the vertices *v*<sub>*abc*</sub> and *v*<sub>*cba*</sub> belong to *G*<sup>+</sup>. In a similar way, if (*a*, *b*, *c*, *d*) is a path *P*<sub>4</sub> of *G*, then the edges (*v*<sub>*abc*</sub>, *v*<sub>*bcd*</sub>) and (*v*<sub>*dcb*</sub>, *v*<sub>*cba*</sub>) must be contained in *G*<sup>+</sup>. Hence, visiting *G*<sup>+</sup> is equivalent to proceeding along *P*<sub>4</sub>s of *G*. It follows that the conditions of Lemma 5 on *G* can be verified by performing a revised DFS on *G*<sup>+</sup> (cf. [17]). In turn, the following lemma holds:

**Lemma 6.** *Let G be any connected graph, and let G*<sup>+</sup> *be its associated directed graph. By performing a DFS on G*<sup>+</sup>*, if the DFS-path is v*<sub>*u*<sub>0</sub>*u*<sub>1</sub>*u*<sub>2</sub></sub>, *v*<sub>*u*<sub>1</sub>*u*<sub>2</sub>*u*<sub>3</sub></sub>, . . . , *v*<sub>*u*<sub>*k*−2</sub>*u*<sub>*k*−1</sub>*u*<sub>*k*</sub></sub>*, where u*<sub>*i*</sub> ≠ *u*<sub>*j*</sub> *for each* 0 ≤ *i* < *j* < *k and u*<sub>*k*</sub> = *u*<sub>ℓ</sub> *for some* ℓ *such that* 0 ≤ ℓ < *k, then u*<sub>ℓ</sub>, *u*<sub>ℓ+1</sub>, . . . , *u*<sub>*k*−1</sub> *are vertices forming a cycle in G that fulfills Lemma 5. Conversely, if G contains a quasi-hole, the DFS on G*<sup>+</sup> *will meet a sequence of vertices in G*<sup>+</sup> *whose corresponding P*<sub>3</sub>*s in G produce a path as the path* (*v*<sub>1</sub>, *v*<sub>2</sub>, . . . , *v*<sub>ℓ</sub>) *in the cycle as in Lemma 5.*
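The two membership rules stated in this paragraph can be turned into an explicit construction. The following is a minimal sketch that builds the auxiliary directed graph from exactly those rules, assuming *P*<sub>3</sub> and *P*<sub>4</sub> denote induced paths and that *G* is given as a dictionary of adjacency sets. It does not model the relaxation for chords incident to the starting vertex that the revised DFS applies, and the name `build_g_plus` is hypothetical.

```python
def build_g_plus(adj):
    """Return (nodes, arcs) of the auxiliary directed graph of G.

    A node (a, b, c) stands for v_abc, i.e. an induced path P3 of G.
    An arc joins (a, b, c) to (b, c, d) whenever (a, b, c, d) is an
    induced path P4 of G.
    """
    nodes = {
        (a, b, c)
        for b in adj
        for a in adj[b]
        for c in adj[b]
        if a != c and c not in adj[a]
    }
    arcs = {
        ((a, b, c), (b, c, d))
        for (a, b, c) in nodes
        for d in adj[c]
        if d != b and d not in adj[a] and d not in adj[b]
    }
    return nodes, arcs
```

Both orientations of every *P*<sub>3</sub> appear as separate nodes, so a path on four vertices yields four nodes and two arcs. Materializing the graph in this way is useful for small experiments only, since the number of nodes grows with the number of induced *P*<sub>3</sub>s.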

#### **Algorithm 2:** A recursive procedure used by QuasiHoleDetection to perform an adapted DFS.

```
Procedure Visit
  Input: four vertices base, u1, u2, and u3 of G
 1  AP[u3] ← 1;
 2  walked_P3[(u1, u2), u3] ← 1;
 3  foreach (u3, u4) ∈ E \ {(u3, u2)} do
 4      if u4 = base then
 5          if AP.size ≥ 5 then
                // the active path determines a quasi-hole
 6              print "true";
 7              exit;
 8          else
 9              break
10          end
11      else
12          if (u2, u4) ∉ E and (u1 = base or (u1, u4) ∉ E) then
                // here, when u1 ≠ base, (u1, u2, u3, u4) forms a P4 in G
13              if AP[u4] = 1 then
                    // the active path determines a hole
14                  print "true";
15                  exit;
16              end
17              if walked_P3[(u2, u3), u4] = 0 then
18                  Visit(base, u2, u3, u4);
19              end
20          end
21      end
22  end
23  AP[u3] ← 0;
```
By following the same strategy used in [15], to reduce the space complexity required by *G*<sup>+</sup>, the DFS on *G*<sup>+</sup> is simulated by performing a revised DFS directly on *G*. This revised DFS on *G* is implemented by Algorithm QuasiHoleDetection (cf. Figure 1).

At Line 1, the algorithm computes the adjacency matrix *M*[ ] of *G* from its adjacency list (we assume that *G* is provided as input according to this representation). *M*[ ] is used to check the adjacency in constant time. At Line 2, each vertex *v*<sub>1</sub> of *G* is checked against the following possible role: *v*<sub>1</sub> belongs to a quasi-hole *C* and all the chords of *C*, if any, are incident to *v*<sub>1</sub>. To perform this check, at Line 6 we consider each edge (*v*<sub>2</sub>, *v*<sub>3</sub>) in *G*: if this edge, along with (*v*<sub>1</sub>, *v*<sub>2</sub>) (cf. Line 7) or (*v*<sub>1</sub>, *v*<sub>3</sub>) (cf. Line 12), forms a path with three vertices, then the algorithm tries to extend this path into the requested cycle by recursively calling the Procedure Visit (see Algorithm 2).

Visit works according to Lemma 5: in any step, it attempts to extend a path *P*<sub>3</sub> defined by (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) into *P*<sub>4</sub>s of the form (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>); then, for each such *P*<sub>4</sub>, the procedure proceeds by extending the *P*<sub>3</sub> formed by (*u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>) into *P*<sub>4</sub>s of the form (*u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>, *u*<sub>5</sub>), and so on. In this situation, the *active-path* is first extended from (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) to (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>), then to (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>, *u*<sub>5</sub>) and so on. In case of backtracking, the last vertex is removed from the current active-path. By proceeding in this way, two cases may occur:

• the initial vertex *v*<sub>1</sub> (called *base* in the algorithm) is added again to the active-path (cf. Line 4). If the length of the active-path is 5 or more (cf. Line 5), then the graph contains a cycle fulfilling the conditions of Lemma 5 and hence a quasi-hole is found;
• at the end of the active-path there is a vertex different from *base* but already inserted in the active-path (cf. Lines 12–13). In this case, again the conditions of Lemma 5 apply, but now we are sure that a hole is found.

It is worth remarking that the ongoing active-path *P* on *G* and the ongoing DFS-path *P*<sup>+</sup> on *G*<sup>+</sup> contain exactly the same vertices: the elements of *P* correspond to the vertices of the *P*<sub>3</sub>s associated with the elements of *P*<sup>+</sup> (in *P*, the repeated vertices of *G* in adjacent *P*<sub>3</sub>s are present only once).

We now explain the role of the additional data structures *AP*[·] and *walked*\_*P*3[(·, ·), ·]. The former is an auxiliary array of size *n* used to check if a vertex appears in the "active path" computed so far; given *u*, *AP*[*u*] is equal to 1 if *u* appears in the active path, 0 otherwise. Concerning the latter, during the visit on *G*<sup>+</sup>, vertices that correspond to paths *P*<sub>3</sub> of *G* are recorded so that they are not "visited" again. The entry *walked*\_*P*3[(*u*<sub>1</sub>, *u*<sub>2</sub>), *u*<sub>3</sub>] equals one if and only if the vertices *u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub> induce (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) as a path *P*<sub>3</sub> of *G* already encountered during the DFS, otherwise it equals zero. Since *walked*\_*P*3[(·, ·), ·] has entries *walked*\_*P*3[(*u*<sub>1</sub>, *u*<sub>2</sub>), *u*<sub>3</sub>] and *walked*\_*P*3[(*u*<sub>2</sub>, *u*<sub>1</sub>), *u*<sub>3</sub>] for each edge (*u*<sub>1</sub>, *u*<sub>2</sub>) ∈ *E* and for each *u*<sub>3</sub> ∈ *V*, its size is 2*m* · *n*. Notice that Visit registers the entry of *walked*\_*P*3[ ] at the beginning, thus avoiding another execution on the same path *P*<sub>3</sub>. In this way, Visit() is executed at most once for each path *P*<sub>3</sub> of *G*.

Notice that the description of Visit() ensures that starting from a *P*<sub>3</sub> formed by (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) we proceed to a *P*<sub>3</sub> formed by (*u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>) only if (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>) is a path *P*<sub>4</sub> of *G*. The only exception is when *u*<sub>1</sub> coincides with the starting vertex *v*<sub>1</sub> selected at Line 2 by QuasiHoleDetection: in such a case (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>) may have chords from *u*<sub>1</sub>. For this purpose, the initial vertex *v*<sub>1</sub> is assigned to the variable *base* (cf. Line 4 of the main algorithm) and it is later passed to Visit (cf. Lines 9 and 14 of the main algorithm).
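The layout of the two auxiliary structures can be sketched as follows. This is one possible realization, assuming the graph is given as a dictionary of adjacency sets; the helper name `make_visit_tables` and the index maps `row` and `col` are hypothetical and only serve to show where the 2*m* · *n* size comes from.

```python
def make_visit_tables(adj):
    """Allocate AP[.] and walked_P3[(., .), .] for a graph of adjacency sets."""
    vertices = sorted(adj)
    # Column index of each vertex u3.
    col = {v: i for i, v in enumerate(vertices)}
    # One row per ordered pair (u1, u2) of adjacent vertices: 2m rows in total.
    row = {}
    for u1 in vertices:
        for u2 in sorted(adj[u1]):
            row[(u1, u2)] = len(row)
    ap = [0] * len(vertices)                     # size n
    walked = [[0] * len(vertices) for _ in row]  # size 2m x n
    return ap, walked, row, col
```

With these maps, marking the path (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) amounts to `walked[row[(u1, u2)]][col[u3]] = 1`, and the reversed pair (*u*<sub>2</sub>, *u*<sub>1</sub>) addresses a different row, matching the two entries per edge mentioned above.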

We can now provide the following statement:

**Theorem 8.** *Given any connected graph G* = (*V*, *E*) *with* |*V*| = *n and* |*E*| = *m, it is possible to determine whether G contains a quasi-hole in O*(*nm*<sup>2</sup> ) *time and O*(*nm*) *space.*

**Proof.** According to the above description of QuasiHoleDetection, its correctness follows from Lemmas 5 and 6, and from the inherent execution of DFS on *G*<sup>+</sup>. In the remainder of the proof we analyze the time and space complexity of the algorithm.

As *G* is a connected graph, we get *n* = *O*(*m*). Concerning the data structures used by the algorithm, we assume that from any edge (*v*<sub>1</sub>, *v*<sub>2</sub>) it is possible to access in constant time both its endpoints; likewise, from any entry in the adjacency matrix *M*[ ] of *G* corresponding to *v*<sub>1</sub> and *v*<sub>2</sub> it is possible to access in constant time the edge (*v*<sub>1</sub>, *v*<sub>2</sub>).

Consider first the time complexity of performing the revised DFS of *G*. The visit starts at Line 6, and proceeds by recursive calls to Visit. This recursive procedure checks each path (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) of *G* which is a *P*<sub>3</sub> and tries to extend it into a *P*<sub>4</sub> of the form (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>). Notice that each set of vertices {*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>, *u*<sub>4</sub>} where (*u*<sub>1</sub>, *u*<sub>2</sub>, *u*<sub>3</sub>) is a *P*<sub>3</sub> and *u*<sub>4</sub> is adjacent to *u*<sub>3</sub> is uniquely characterized by the ordered pair ((*u*<sub>1</sub>, *u*<sub>2</sub>), (*u*<sub>3</sub>, *u*<sub>4</sub>)) where (*u*<sub>1</sub>, *u*<sub>2</sub>) and (*u*<sub>3</sub>, *u*<sub>4</sub>) are ordered pairs of adjacent vertices in *G*. Hence, the time required to perform the whole visit according to the recursive executions of Visit is *O*(*m*<sup>2</sup>). We can now determine the time complexity of QuasiHoleDetection. The step at Line 1 clearly takes *O*(*n*<sup>2</sup>) time. The subsequent loop at Line 2 is repeated *O*(*n*) times, and for each step the algorithm requires *O*(*nm*) time for the initialization at Line 3 and, as described before, *O*(*m*<sup>2</sup>) time for visiting *G* according to the recursive calls to Visit.

It follows that the final time complexity is *O*(*nm*<sup>2</sup>). The algorithm requires *O*(*nm*) space: *O*(*n*) and *O*(*nm*) for the arrays *AP*[ ] and *walked*\_*P*3[ ], respectively, and *O*(*n*<sup>2</sup>) for the adjacency matrix *M*[ ] and the adjacency list used to represent *G*.
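The arithmetic behind these bounds can be written out explicitly; the only fact used beyond the per-line costs listed above is *n* = *O*(*m*), which holds because *G* is connected. For the running time:

$$O(n^2) + n \cdot \bigl( O(nm) + O(m^2) \bigr) = O(n^2 m + n m^2) = O(n m^2),$$

where the last step uses *n*<sup>2</sup>*m* = *n* · (*nm*) = *O*(*n* · *m*<sup>2</sup>). For the space:

$$O(n) + O(nm) + O(n^2) = O(nm),$$

since *n*<sup>2</sup> = *n* · *n* = *O*(*nm*).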

#### *5.3. Detecting Quasi-Hole on at Least k Vertices*

As in [15], the strategy described above to define a quasi-hole detection algorithm can be generalized to build algorithms for the detection of quasi-holes on at least *k* vertices, with *k* ≥ 5. For any input graph *G*, we consider the following family of directed graphs *G*<sup>(*t*)</sup>:


By definition, *G* ≡ *G*<sup>(2)</sup> and *G*<sup>+</sup> ≡ *G*<sup>(4)</sup>, where *G*<sup>+</sup> is the directed graph associated to *G* in Section 5.2. Therefore, in the same way that running DFS on *G*<sup>+</sup> ≡ *G*<sup>(4)</sup> allowed us to detect quasi-holes (on at least five vertices), running DFS on *G*<sup>(*k*−1)</sup> allows us to detect (extended) quasi-holes on at least *k* vertices, for each constant *k* ≥ 5. This is ensured by the following statement, which represents a generalization of Lemma 5:

**Lemma 7.** *Given a constant k* ≥ 5*, a graph G contains a quasi-hole on at least k vertices if and only if G contains a cycle* (*u*<sub>0</sub>, *u*<sub>1</sub>, . . . , *u*<sub>*t*</sub>)*, with t* ≥ *k* − 1*, such that* (*u*<sub>*i*</sub>, *u*<sub>*i*+1</sub>, . . . , *u*<sub>*i*+*k*−2</sub>) *is an induced path P*<sub>*k*−1</sub> *of G for each i* = 1, 2, . . . , *t* − *k* + 2*.*

Lemmas 6 and 7 induce the following statement:

**Corollary 2.** *Let G be a connected graph and let k* ≥ 5 *be a constant. Assume that a DFS is executed on G*<sup>(*k*−1)</sup>*, the directed graph associated to G. If the active path computed by the DFS is v*<sub>*u*<sub>0</sub>*u*<sub>1</sub>···*u*<sub>*k*−3</sub></sub>, *v*<sub>*u*<sub>1</sub>*u*<sub>2</sub>···*u*<sub>*k*−2</sub></sub>, . . . , *v*<sub>*u*<sub>*r*−*k*+3</sub>*u*<sub>*r*−*k*+4</sub>···*u*<sub>*r*</sub></sub>*, where u*<sub>*i*</sub> ≠ *u*<sub>*j*</sub> *for all* 0 ≤ *i* < *j* < *r, and u*<sub>*r*</sub> = *u*<sub>*p*</sub> *for some p such that* 0 ≤ *p* < *r, then u*<sub>*p*</sub>, *u*<sub>*p*+1</sub>, . . . , *u*<sub>*r*−1</sub> *are vertices forming a cycle in G that fulfills the conditions of Lemma 7. Conversely, if G contains a quasi-hole on at least k vertices, the DFS on G*<sup>(*k*−1)</sup> *will meet a sequence of vertices whose associated P*<sub>*k*−2</sub>*s in G form a path as the path* (*u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*t*</sub>) *in the cycle of Lemma 7.*

Also in this situation we do not build *G*<sup>(*k*−1)</sup>, since we implicitly run DFS on this associated graph. In particular, we process each unvisited *P*<sub>*k*−2</sub> of *G* as follows: we try to extend the induced path *P*<sub>*k*−2</sub> formed by (*u*<sub>0</sub>, *u*<sub>1</sub>, . . . , *u*<sub>*k*−3</sub>) into *P*<sub>*k*−1</sub>s of the form (*u*<sub>0</sub>, *u*<sub>1</sub>, . . . , *u*<sub>*k*−3</sub>, *u*<sub>*k*−2</sub>); then, for each such *P*<sub>*k*−1</sub>, we proceed by extending the *P*<sub>*k*−2</sub> (*u*<sub>1</sub>, *u*<sub>2</sub>, . . . , *u*<sub>*k*−2</sub>) into *P*<sub>*k*−1</sub>s, and so on. Since there exist *O*(*m*<sup>*a*</sup>) induced paths on 2*a* vertices and *O*(*nm*<sup>*a*</sup>) on 2*a* + 1 vertices, and it requires *O*(*k*) time to detect whether a vertex extends a *P*<sub>*k*−1</sub> into a *P*<sub>*k*</sub>, we have the following corollary:

**Corollary 3.** *Let G* = (*V*, *E*) *be a connected graph with* |*V*| = *n and* |*E*| = *m, and let k* ≥ 5 *be a constant. By implicitly running DFS on G*<sup>(*k*−1)</sup> *it is possible to detect whether G contains a quasi-hole on at least k vertices in O*(*n*<sup>2</sup>*m*<sup>*p*−1</sup>) *time when k* = 2*p, and in O*(*n*<sup>2</sup> + *nm*<sup>*p*</sup>) *time when k* = 2*p* + 1*.*

The space required is *O*(*m*<sup>*p*−1</sup>) when *k* = 2*p*, and *O*(*nm*<sup>*p*−1</sup>) when *k* = 2*p* + 1. According to Lemma 4 and Corollary 3, we finally get the following result:

**Theorem 9.** *Let G* = (*V*, *E*) *be a connected graph with* |*V*| = *n and* |*E*| = *m. It is possible to recognize whether G* ∈ *s*DH(2) *in O*(*n*<sup>2</sup>*m*<sup>2</sup>) *time and O*(*m*<sup>2</sup>) *space.*
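The bounds of Theorem 9 are the instance of Corollary 3 for quasi-holes on at least six vertices, which is the threshold appearing in the cycle-chord characterization of Theorem 6. As a short worked substitution, taking *k* = 6 = 2*p* gives *p* = 3, and hence

$$O(n^2 m^{p-1}) = O(n^2 m^2) \quad \text{time}, \qquad O(m^{p-1}) = O(m^2) \quad \text{space}.$$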

#### **6. Conclusions**

In this paper, we studied the class *s*DH(2). It contains each graph *G* with stretch number less than two, that is *s*(*G*) < 2. These graphs form a superclass of the well-studied distance-hereditary graphs, which correspond to graphs with stretch number equal to one.

For the class *s*DH(2) we provided: (1) a characterization based on listing all the minimal forbidden subgraphs, (2) a characterization based on cycle-chord properties, and (3) a recognition algorithm that works in *O*(*n*<sup>2</sup>*m*<sup>2</sup>) time and *O*(*m*<sup>2</sup>) space. This algorithm exploits the result based on the cycle-chord property to detect quasi-holes in a graph; it is a simple adaptation of the algorithm provided in [15] for detecting holes.

The characterizations found seem to suggest that the graphs in *s*DH(2) and those in DH(1) may be very similar in structure and hence in properties. As a consequence, it would be interesting to determine whether the class *s*DH(2) can also be characterized according to generative operations (recall that generative properties proved the most fruitful for devising efficient algorithms for distance-hereditary graphs). This problem has been partially addressed in [18,19].

On the contrary, Theorem 1 could suggest that graphs with stretch number greater than or equal to two may have a completely different structure with respect to those in DH(1).

Another possible extension of this work could be to investigate in the class *s*DH(2) other specific combinatorial problems that have been solved in the class of distance-hereditary graphs.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflicts of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Algorithms* Editorial Office E-mail: algorithms@mdpi.com www.mdpi.com/journal/algorithms
