*Article* **Dynamic Programming Algorithms for Computing Optimal Knockout Tournaments**

**Amelia B ˘adic ˘a 1, Costin B ˘adic ˘a 2,\*, Ion Buligiu 1, Liviu Ion Ciora <sup>1</sup> and Doina Logof ˘atu <sup>3</sup>**


**Abstract:** We study competitions structured as hierarchically shaped single-elimination tournaments. We define optimal tournaments by maximizing attractiveness such that the topmost players will have the chance to meet in higher stages of the tournament. We propose a dynamic programming algorithm for computing optimal tournaments and we provide its sound complexity analysis. Based on the idea of the dynamic programming approach, we also develop more efficient deterministic and stochastic sub-optimal algorithms. We present experimental results obtained with the Python implementation of all the proposed algorithms regarding the optimality of solutions and the efficiency of the running time.

**Keywords:** optimization; knockout tournament; dynamic programming algorithm; computational complexity; combinatorics

#### **1. Introduction**

Tournament design is a combinatorial problem with many theoretical implications, as well as with a lot of practical applications. There are many types of tournaments that have been theoretically analyzed and practically used in various contexts. Basically, there are two main principles used in tournament design: "round-robin" principle and "knockout" principle. They can be used in isolation or combined for obtaining different tournaments designs, depending on various factors, such as the number of players, time available to carry out the tournament, and application domain.

In this paper we propose a formal definition of competitions that have the shape of single-elimination tournaments, also known as knockout tournaments. We introduce methods to quantitatively evaluate the attractiveness and competitiveness of a given tournament. We consider that a tournament is more attractive if competition is encouraged in higher stages, i.e., higher-ranked players will have the chance to meet in higher stages of the tournament, thus increasing the stake of their matches.

In knockout tournaments, the result of each match is always a win of one of the two players, i.e., draws are not possible. A knockout tournament is hierarchically structured as a binary tree such that each leaf represents one player or team that is enrolled in the tournament, while each internal node represents a game of the tournament.

The tournament is carried out in a series of rounds. If there is a number *N* of players equal to a power of 2, for example *N* = 8 = 23 then the tournament tree is a complete binary tree with all the players entering the tournament in the first round; however, in the general case, the number of players might not be a power of 2, for example *N* = 9. In this case some of the players will receive waivers thus entering the tournament directly in the the second round, while the rest of the players will enter the tournament in the first round.

In this paper we significantly extend our preliminary results reported in [1] for fully balanced tournaments (the number of players is *N* = 2*k*) to general tournaments where the

**Citation:** B ˘adic ˘a, A.; B ˘adic ˘a, C.; Buligiu, I.; Ciora, L.I.; Logof ˘atu, D. Dynamic Programming Algorithms for Computing Optimal Knockout Tournaments. *Mathematics* **2021**, *9*, 2480. https://doi.org/10.3390/ math9192480

Academic Editor: Fabio Caraffini

Received: 7 September 2021 Accepted: 27 September 2021 Published: 4 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

number of players can be an arbitrary natural number, not necessarily a power of 2. Our new results are summarized as follows:


#### **2. Related Works**

Tournament design attracted research in operations research, combinatorics, and statistics. The problem is also related to intelligent planning and activity scheduling, broadly covered also by artificial intelligence.

There are two main principles used in tournament design, namely the "round-robin" principle and "knockout" principle, and they can be used in isolation or combined for obtaining different tournaments designs. The "round-robin" principle states that in a tournament, each two players should meet at least once, sometimes exactly once. "Knockout", also known as the "elimination" principle states that players are eliminated after a certain number of games, sometimes exactly after one game.

For example, in a round-robin tournament in which each two players should meet exactly once, an important aspect is the scheduling of the tournament, a problem also known as league scheduling [2]. This is an important component of the tournament design. Note that the "round robin" principle can also be applied with restrictions. Consider for example a two-team tournament, in which each team has the same number of players. Each game involves two players from different teams and any two players from different teams must play exactly once. In this case, we can still apply the round-robin principle, but players of the same team are not allowed to play. This type of tournament is called a bipartite tournament. A good coverage of the combinatorial aspects of round-robin tournaments can be found in monograph [2].

On the other hand, in a knockout tournament where two players will play at most one game and after each game exactly one player advances in the tournament, while the other is kicked off, the tournament schedule results directly from the tournament design, i.e., no separate scheduling stage is needed.

Note also that the round-robin and knock-out principles can be combined into a single tournament design. Consider for example the UEFA Champions League football tournament. In the groups' phase, round-robin is used inside each group, while after the group phase, knockout is used to determine the tournament winner.

In this paper we consider knockout tournaments in which two players meet at most once. These are specialized tournaments with possible applications in sports (e.g., football and tennis tournaments), online games (e.g., online poker), and election processes. While in the former case there is a relatively low number of players, thus not raising special computational challenges, the number of players in massive multiplayer online games can grow such that computing the optimal tournament becomes a more difficult problem.

A comprehensive analysis of knockout tournaments is proposed in [3,4]. Traditionally, the method for designing a tournament involves two stages: (i) tournament structure design and (ii) seeding. In the first stage, the structure of the tournament tree is proposed.

In the second stage, players are assigned to each leaf of the tournament tree; however, as we can argue that this method of tournament design has some limitations, we proposed different "integrated" approach. One limitation is for example the fact that once fixed, the tournament structure cannot be changed. On one hand this will result in a smaller search space during the seeding process, on the other hand it limits the total number of tournament designs. So our model of tournament trees includes both the tree structure, as well as the players' seeding; a separate seeding process is not necessary.

Research in combinatorics of knockout tournaments also produced interesting mathematical results. The structure of a knockout tournament can be modeled as a special kind of binary tree, called an Otter tree [5]. The number of knockout tournament structures (i.e., prior to seeding) for *N* players is given by the Wedderburn—Etherington number [6] of order *<sup>N</sup>* that is known to have an exponential growth approximately equal to 0.3188 <sup>×</sup> 2.4832*<sup>N</sup> <sup>N</sup>*1.5 .

An important aspect concerns the factors that can be used to evaluate a tournament design. This problem has been also considered in previous works [3,4]. An interesting discussion of economic aspects of tournament attractiveness, such as spectator interest, is provided by [7].

A method for augmenting a tournament with probabilistic information based on tournament results was proposed in [8]. The effectiveness of tournament plans based on dominance graphs is studied in [6]. The tournament problem was also a source of inspiration for programming competitions [9]. A more recent work addressing competitiveness development and ranking precision of tournaments is [10].

For example, the more recent work [3,4] proposes a probabilistic approach to define the tournament cost, by including in their model the win–loss probabilities of each game between two players *i* and *j*. On one hand, we can question the robustness of such values. On the other hand, we recognize that some approximations of such values might be empirically obtained based on several factors, e.g., global player rankings (when available) or on the history of games between the players (if a nonempty history exists). In our work we do not use this information, i.e., we assume by default that there are equal winning chances for the players of each game. While this simplification clearly has drawbacks, it has the advantage of enabling a clean algorithm design based on dynamic programming principles. Our approach can be extended by adding probabilistic information to the cost function, but then it will require further analysis of algorithmic solutions within our "integrated" approach.

A theoretical investigation of knockout tournaments is provided by [11]. Their analysis is focused only on tournaments with a power of 2 number of players, i.e., similar with [1], but definitely less general than in the current work, where an arbitrary number of players is considered. Interesting results of this work concern the discussion of new seeding approaches named "equal gap" and "increasing competitive", as well as the investigation of their theoretical properties.

There has been also theoretical interest in analyzing the possible outcomes of knockout tournaments. Upper and lower bounds of winning probabilities of players of a random knockout tournament are provided in [12]. Note that the analysis is focused on the random knockout tournament where the definition of matches to be carried out in each round is defined randomly. Moreover, this work assumes as [3,4], that the win–loss probabilities of each match between two players are known.

There is also interest in the literature in designing new formats of knockout tournaments. For example, a new format based on actively involving the teams in defining the tournament format, was recently proposed in [13] for the specific competition of UEFA Champions League. The proposed format was coined "Choose your opponent" with the claimed benefit to make group stages more exciting. The authors also show how this model can be used for the objective of maximizing the number of home games during the knockout stage.

Knockout tournament structures are sometimes called tournament brackets. According to [14], two types of tournament brackets are possible: fixed and adaptive. In fixed

brackets, the tournament structure is fixed, while in adaptive brackets pairings in stage *i* + 1 are defined based on winners of stage *i*. Our approach is clearly fixed, with the difference that we use an integrated approach to define both the structure as well as the seeding. What is different in [14] is the fact that authors look for optimizing a fixed bracket by using utility functions and Bayesian optimal design. They propose a simulated annealing algorithm to optimize the expected value of a given utility function on a fixed tournament bracket. While interesting, this endeavor is clearly different from our approach. We plan however to investigate in the future the suitability of extending integrated approach and proposed algorithms by incorporating probabilistic information.

Clearly tournaments have a lot of practical applications, for example in the sports' domain. In this context, the recent work [15] provides an interesting discussion on the economics of sports from operations research, as well as practical applicability perspectives. The discussion is centered around several paradoxes of tournament rankings, with clear examples from the practice of tournament design.

#### **3. Knockout Tournaments**

We consider hierarchically structured knockout tournaments such that the result of each match is always a win of one of the two players, i.e., draws are not possible. A tournament is modeled as a binary tree such that each leaf node represents a unique player and each internal node represents a game between two players and its winner.

**Definition 1** (Tournaments)**.** *Let* Σ *be a finite nonempty set of players. We define the set of trees* T (Σ) *with leaves* Σ *as follows:*


$$\mathcal{T}(\Sigma) = \{ t | t = \{ t\_1, t\_2 \}, t\_1 \in \Sigma\_1, t\_2 \in \Sigma\_2 \} \tag{1}$$

Note that the set notation in Equation (1) implies that the trees are not ordered, i.e., the order of the left and right branches does not matter.

**Example 1.** *We consider examples of tournaments for sets of players with* 1, 2, 3 *and* 4 *elements:*

	- {{{1, 2}, 3}, 4},...}*. It is not difficult to see that in this case there are* 15 *trees.*

Some of the tournaments introduced in Example 1 are depicted graphically in Figure 1. Observe that the tournaments on the first row (labeled "a" and "b") involve a number of elements that is a power of two (2 = 21 and 4 = 22, respectively) and are fully balanced. However, the tournaments on the second row are not fully balanced, although the lower rightmost tournament involves 4 = 22 players. However, intuitively, the tournament with three players (labeled "c") should be accepted, as player 3 will enter the tournament only 1 round after players 1 and 2, i.e., it has a sense of "balancing". However, the lower rightmost tournament with four players (labeled "d") is not acceptable, as player 4 received an exemption from playing in the first two rounds, and this is considered unfair.

**Figure 1.** Tournaments of two and three players (first column) and four players (second column).

**Proposition 1** (Counting tournaments)**.** *The set* T (Σ) *with* |Σ| = *N players contains:*

$$\frac{(2N-2)!}{(N-1)! \times 2^{N-1}}\tag{2}$$

*elements.*

**Proof.** The number of full binary tree structures with *N* leaves is equal to *CN*−<sup>1</sup> where *CN* is Catalan's number [16] defined by:

$$C\_N = \frac{1}{N+1} \binom{2N}{N} \tag{3}$$

Now, each permutation of the *N* players can be attached to the leaves of a binary tree, thus obtaining *N*! · *CN*−<sup>1</sup> trees. However, the branches of each internal node can be exchanged, resulting in the same tree. There are *N* − 1 internal nodes and therefore a total number of 2*N*−<sup>1</sup> exchanges, resulting a number of trees given by:

$$\frac{N! \cdot \mathbb{C}\_{N-1}}{2^{N-1}} = \frac{(2N-2)!}{(N-1)! \cdot 2^{N-1}}\tag{4}$$

q.e.d.

**Example 2.** *For example, if N* = 3 *we obtain* 4! 2!×<sup>22</sup> <sup>=</sup> <sup>3</sup> *trees, while if <sup>N</sup>* <sup>=</sup> <sup>4</sup> *we obtain* 6! 3!×<sup>23</sup> <sup>=</sup> <sup>15</sup> *trees. These results are consistent with Example 1.*

A valid tournament should be balanced, i.e., each player should play (almost) the same number of games to win the tournament.

Analyzing the tournaments from Example 1 and Figure 1 we can observe that if |Σ| ≤ 3 then each element of T (Σ) represents a valid tournament. However, if |Σ| = 4 then only 3 trees of T (Σ) represent valid tournaments. For example, {{1, 2}, {3, 4}} is a valid tournament as each player should play exactly two games to win the tournament. In this case we have a fully balanced tournament consisting of *N* = 22 players. Moreover, {{1, 2}, 3} is also considered a valid tournament, as players 1 and 2 must play two games to win, while player 3 must play one game to win, i.e., has an exemption for the first round (the difference between the number of games played by each player is at most 1). However, {{{1, 2}, 3}, 4} is not a valid tournament, as players 1 and 2 must play three games to win the tournament, while player 4 must play a single match to win the tournament (the

difference between the number of games played by each player is above 1, i.e., more than one exemption for a player is considered unfair).

Observe that a tree representing a valid tournament has the property that all its leaves are of height *n* or *n* + 1 for a suitable value of *n*. Actually, the value of *n* can be determined from the given number of players *N* of the tournament and it represents the number of rounds of the tournament.

Let us consider a tournament with *n* rounds. It is not difficult to see that the maximum number of players is *Nmax* = 2*<sup>n</sup>* and it is obtained when in the first round we have a maximum number of 2*n*−<sup>1</sup> games; therefore, for a tournament with *n* rounds we have:

$$2^{n-1} < N \le 2^n \tag{5}$$

Observe that from Equation (5) it follows that:

$$n = \lceil \log\_2 N \rceil \tag{6}$$

**Definition 2** (Balanced (valid) tournaments)**.** *Let <sup>n</sup>* <sup>∈</sup> <sup>N</sup> *be the number of rounds. Let* <sup>Σ</sup> *be a nonempty set of N players such that conditions (5) and (6) are fulfilled. Then the set* T*n*(Σ) *of balanced trees with n layers representing the set of balanced (valid) tournaments with n rounds is defined as follows:*


If *<sup>n</sup>* <sup>≥</sup> 1 then there are *<sup>N</sup>* <sup>∈</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>+</sup> <sup>1</sup> ... <sup>2</sup>*<sup>n</sup>* players. Then a tree *<sup>t</sup>* ∈ T*n*(Σ) can be obtained either (i) by joining two balanced trees with *n* − 1 layers or (ii) by joining one balanced tree with *n* − 1 layers and one fully balanced tree with *n* − 2 layers (all its leaves are on layer *n* − 2), so in both cases the balancing condition of *t* is properly preserved.

**Proposition 2** (Structure of a balanced tournament)**.** *Let t* ∈ T*n*(Σ) *be a tournament of N players such that n is defined by Equation (6). Then the number of players starting in the first round is <sup>β</sup>* <sup>=</sup> <sup>2</sup>*<sup>N</sup>* <sup>−</sup> <sup>2</sup>*<sup>n</sup> and the number of players starting in the second round (waivers) is <sup>γ</sup>* <sup>=</sup> <sup>2</sup>*<sup>n</sup>* <sup>−</sup> *N. Moreover, if <sup>n</sup>* <sup>≥</sup> <sup>1</sup> *then the number of internal nodes of level* <sup>2</sup> *in the tree is equal to <sup>α</sup>* <sup>=</sup> *<sup>N</sup>* <sup>−</sup> <sup>2</sup>*n*−1*, i.e., <sup>β</sup>* <sup>=</sup> <sup>2</sup>*<sup>α</sup> and <sup>γ</sup>* <sup>=</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>−</sup> *<sup>α</sup>.*

**Proof.** First observe that if the number of players is a power of 2, i.e., *N* = 2*n*, then *α* = 2*n*−<sup>1</sup> = *N*/2, *β* = *N*, and *γ* = 0. This is trivially true, as in this case the tournament is fully balanced and all the players start in the first round (there are no exemptions).

The proof for the general case can be shown by induction on *<sup>n</sup>* <sup>∈</sup> <sup>N</sup>.

For *n* = 0 the tournament has *N* = 1 players. In this case there is a single balanced tournament with *γ* = 0 and *β* = 1, so the property trivially holds.

For *n* = 1 the tournament has *N* = 2 players. In this case there is a single balanced tournament with *α* = 1, *β* = 2 and *γ* = 0, so the property trivially holds.

Let us now assume that the property holds for *k* = 0, 1, ... , *n* and let us prove it for *k* = *n* + 1. There are two cases.

Case 1. If *n* ≥ 1, *t*<sup>1</sup> ∈ T*n*(Σ1), *t*<sup>2</sup> ∈ T*n*(Σ2), Σ<sup>1</sup> ∩ Σ<sup>2</sup> = ∅ and Σ<sup>1</sup> ∪ Σ<sup>2</sup> = Σ, let us consider *t* = {*t*1, *t*2}∈T*n*+1(Σ) such that the second condition of Definition 2 is fulfilled. According to the induction hypothesis we have *<sup>γ</sup><sup>i</sup>* <sup>=</sup> <sup>2</sup>*<sup>n</sup>* <sup>−</sup> *Ni*, *<sup>β</sup><sup>i</sup>* <sup>=</sup> <sup>2</sup>*Ni* <sup>−</sup> <sup>2</sup>*n*, *<sup>α</sup><sup>i</sup>* <sup>=</sup> *Ni* <sup>−</sup> <sup>2</sup>*n*−<sup>1</sup> for *<sup>i</sup>* <sup>=</sup> 1, 2 and *<sup>N</sup>* <sup>=</sup> *<sup>N</sup>*<sup>1</sup> <sup>+</sup> *<sup>N</sup>*2. Then *<sup>γ</sup>* <sup>=</sup> *<sup>γ</sup>*<sup>1</sup> <sup>+</sup> *<sup>γ</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>*n*+<sup>1</sup> <sup>−</sup> (*N*<sup>1</sup> <sup>+</sup> *<sup>N</sup>*2) = <sup>2</sup>*n*+<sup>1</sup> <sup>−</sup> *<sup>N</sup>*. Similarly *<sup>β</sup>* <sup>=</sup> *<sup>β</sup>*<sup>1</sup> <sup>+</sup> *<sup>β</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>*<sup>N</sup>* <sup>−</sup> <sup>2</sup>*n*+<sup>1</sup> and *<sup>α</sup>* <sup>=</sup> *<sup>α</sup>*<sup>1</sup> <sup>+</sup> *<sup>α</sup>*<sup>2</sup> <sup>=</sup> *<sup>N</sup>* <sup>−</sup> <sup>2</sup>*<sup>n</sup>* q.e.d.

Case 2. If *<sup>n</sup>* <sup>≥</sup> 2, *<sup>t</sup>*<sup>1</sup> ∈ T*n*(Σ1), *<sup>t</sup>*<sup>2</sup> ∈ T*n*−1(Σ2) is a fully balanced tree (i.e., <sup>|</sup>Σ2<sup>|</sup> <sup>=</sup> <sup>2</sup>*n*−1), Σ<sup>1</sup> ∩ Σ<sup>2</sup> = ∅ and Σ<sup>1</sup> ∪ Σ<sup>2</sup> = Σ, let us consider *t* = {*t*1, *t*2}∈T*n*+1(Σ) such that the third condition of Definition 2 is fulfilled. According to the induction hypothesis we

have *<sup>γ</sup>*<sup>1</sup> <sup>=</sup> <sup>2</sup>*<sup>n</sup>* <sup>−</sup> *<sup>N</sup>*1, *<sup>β</sup>*<sup>1</sup> <sup>=</sup> <sup>2</sup>*N*<sup>1</sup> <sup>−</sup> <sup>2</sup>*n*, *<sup>α</sup>*<sup>1</sup> <sup>=</sup> *<sup>N</sup>*<sup>1</sup> <sup>−</sup> <sup>2</sup>*n*−1, *<sup>γ</sup>*<sup>2</sup> <sup>=</sup> 0, *<sup>β</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>*n*−<sup>1</sup> and *<sup>α</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>*n*−2, *<sup>N</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>(*<sup>n</sup>* <sup>−</sup> <sup>1</sup>), and *<sup>N</sup>* <sup>=</sup> *<sup>N</sup>*<sup>1</sup> <sup>+</sup> <sup>2</sup>*n*−1. Then *<sup>β</sup>* <sup>=</sup> *<sup>β</sup>*<sup>1</sup> <sup>=</sup> <sup>2</sup>(*N*<sup>1</sup> <sup>+</sup> <sup>2</sup>*n*−1) <sup>−</sup> <sup>2</sup>*<sup>n</sup>* <sup>−</sup> <sup>2</sup>*<sup>n</sup>* <sup>=</sup> <sup>2</sup>*<sup>N</sup>* <sup>−</sup> <sup>2</sup>*n*<sup>+</sup>1. Similarly *<sup>γ</sup>* <sup>=</sup> *<sup>γ</sup>*<sup>1</sup> <sup>+</sup> *<sup>β</sup>*<sup>2</sup> <sup>=</sup> <sup>2</sup>*<sup>n</sup>* <sup>−</sup> *<sup>N</sup>*<sup>1</sup> <sup>+</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>=</sup> <sup>2</sup>*<sup>n</sup>* <sup>+</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>+</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>−</sup> *<sup>N</sup>* <sup>=</sup> <sup>2</sup>*n*+<sup>1</sup> <sup>−</sup> *<sup>N</sup>* and similarly for *<sup>α</sup>* <sup>=</sup> *<sup>α</sup>*<sup>1</sup> <sup>=</sup> *<sup>N</sup>*<sup>1</sup> <sup>−</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>=</sup> *<sup>N</sup>* <sup>−</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>−</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>=</sup> *<sup>N</sup>* <sup>−</sup> <sup>2</sup>*<sup>n</sup>* q.e.d.

The relations *<sup>β</sup>* <sup>=</sup> <sup>2</sup>*<sup>α</sup>* and *<sup>γ</sup>* <sup>=</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>−</sup> *<sup>α</sup>* can be now easily checked.

**Example 3** (Tournament structure design)**.** *Let us consider a tournament with N* = 5 *players. In this case <sup>n</sup>* <sup>=</sup> <sup>3</sup>*, <sup>γ</sup>* <sup>=</sup> 23 <sup>−</sup> <sup>5</sup> <sup>=</sup> <sup>3</sup>*, <sup>α</sup>* <sup>=</sup> 22 <sup>−</sup> <sup>3</sup> <sup>=</sup> <sup>1</sup>*, <sup>β</sup>* <sup>=</sup> <sup>2</sup> <sup>×</sup> <sup>5</sup> <sup>−</sup> 23 <sup>=</sup> <sup>2</sup>*. A tree representing a tournament with five players will have three layers such that the first layer consists of β* = 2 *leaves (players) and the second layer consists of* 2*n*−<sup>1</sup> = 22 = 4 *nodes among which there is α* = 1 *internal node and γ* = 3 *leaves (players). One such a balanced tournament is depicted in Figure 2.*

**Figure 2.** A balanced tournament of five players.

**Proposition 3** (Counting balanced tournaments)**.** *The set* T*n*(Σ) *with* |Σ| = *N players contains:*

$$\frac{N! \cdot \binom{2^{n-1}}{\gamma}}{2^{N-1}} \tag{7}$$

*elements.*

**Proof.** There are ( 2*n*−<sup>1</sup> *<sup>γ</sup>* ) ways of choosing how the *γ* players will enter second round. Their ordering matters so we multiply with *γ*!. Moreover those *γ* players are arbitrarily chosen from the set of *N* players, so we multiply with ( *N <sup>γ</sup>*). Finally, the ordering of those *β* remaining players that enter first round matters, so we also multiply with *β*!. For each internal node of the tree, exchanging its left and right sub-tree is a tournament invariant. There are 2*N*−<sup>1</sup> independent ways of exchanging left and right sub-tree of the tree, so we must divide by 2*N*−1. We obtain:

$$\frac{\binom{N}{\gamma} \cdot \binom{2^{n-1}}{\gamma} \cdot \gamma! \cdot \beta!}{2^{N-1}} = \frac{N! \cdot \binom{2^{n-1}}{\gamma}}{2^{N-1}} \tag{8}$$

A simpler proof is obtained by thinking about structures (i.e., "shapes") of tournament trees. The selection of the "locations" of those *γ* players entering second round can be achieved in ( 2*n*−<sup>1</sup> *<sup>γ</sup>* ) ways. For each tree structure defined in this way there are *N*! permutations of the leaves (players), thus defining a total number of *N*! · ( 2*n*−<sup>1</sup> *<sup>γ</sup>* ) balanced tournament trees. Finally we divide by 2*N*−<sup>1</sup> and we obtain Equation (7).

**Example 4.** *Let us check the number of balanced tournaments for several cases.*

*1. The number of balanced tournaments with N* = 5 *players can be obtained as follows:*

$$\frac{5! \times \binom{2^2}{3}}{2^4} = 30\tag{9}$$

*It is not difficult to verify that this result is correct. In this case β* = 2*. There are* ( 5 <sup>2</sup>) = 10 *ways of selecting those two players that will enter first round. We have one separate tournament* *by letting the winner of the game of these two players playing against each of the remaining three players in the second round. So in total there are* 10 × 3 = 30 *balanced tournaments with five players.*

*2. For n* = 3 *players we obtain:*

$$\frac{3! \times \binom{2^1}{1}}{2^2} = 3\tag{10}$$

*3. If N* = 2*<sup>n</sup> then γ* = 0*, thus we obtain our result for fully balanced tournaments from [1] stating that the total number of fully balanced tournaments is given by:*

$$\frac{(2^n)!}{2^{2^n-1}}.\tag{11}$$

*The number of fully balanced tournaments with two rounds is* (22)! <sup>2</sup>22−<sup>1</sup> <sup>=</sup> 4! <sup>8</sup> = 3*. Observe that applying the formula, we obtain* 315 *fully balanced tournaments with three rounds. Let us obtain this result using a different reasoning. Let us count the number of a set with eight elements consisting of two subsets of four elements each. There are* ( 8 <sup>4</sup>)/2 = 35 *possibilities, as we consider the four combinations of eight elements, and we divide by two as the order of the subsets of a partition does not matter; however, for each set of each partition there are three fully balanced tournaments of three rounds, so multiplying we obtain a total of nine possibilities. So the number of three-stage tournaments is* 9 × 35 = 315*.*

#### **4. Optimal Tournaments**

Each round of a tournament with *n* rounds defines possible games between players. Note that in a given tournament any two players can play in a game at one and only one of its rounds. This follows from the fact that for any two leaves of a binary tree there is a unique closest common ancestor. It follows that the tournament round *si*,*<sup>j</sup>* where players *i*, *j* can meet is a unique value in 1, 2, ... , *n* and it is well defined. For example, referring to the tournament shown in Figure 2, *s*2,4 = 3, *s*1,5 = 2, and *s*4,5 = 1.

Intuitively, the higher the quotations of players *i* and *j*, the better it is to let them meet in a higher stage of the tournament in order to increase the stakes of their games.

We assume in what follows that a quotation *qi* ∈ (0, +∞) is available for each player *i* ∈ Σ. Quotations can be obtained from the players' current ranking (as for example in international tennis tournaments ATP and WTA) or by other means.

**Definition 3** (Tournament cost)**.** *Let t* ∈ T*n*(Σ) *be a tournament with n rounds and let st <sup>i</sup>*,*<sup>j</sup>* ∈ {1, 2, ... , *n*} *be the stage of t where players i*, *j can meet. Let qi* > 0 *be the quotations of players for all i* ∈ Σ*. The cost of t is defined as:*

$$Cost(t) = \sum\_{i,j \in \Sigma, i < j} q\_i q\_j \mathbf{s}\_{ij}^t \tag{12}$$

**Definition 4** (Optimal tournament)**.** *A tournament such that its cost computed with Equation (12) is maximal is called an* optimal tournament *and it is defined by:*

$$\begin{array}{lcl}\text{OptC}(\Sigma) &= \max\_{t \in \mathcal{T}\_n(\Sigma)} \text{Cost}(t) \\ t^\* &= \text{argmax}\_{t \in \mathcal{T}\_n(\Sigma)} \text{Cost}(t) \end{array} \tag{13}$$

Obviously, better ranked players have a higher quotation. We assume that if player *i* has rank *ri* then its quota is *qi* such that whenever *ri* < *rj* we have *qi* > *qj*. For example, if there are *<sup>N</sup>* <sup>=</sup> <sup>2</sup>*<sup>n</sup>* players then we can choose *qi* <sup>=</sup> *<sup>N</sup>* <sup>+</sup> <sup>1</sup> <sup>−</sup> *ri* for all *<sup>i</sup>* <sup>=</sup> 1, . . . , *<sup>N</sup>*.

**Example 5.** *Let us consider the tournaments t*1*, t*2*, and t*<sup>3</sup> *with three players shown in Figure 3. Let us introduce:*

$$\begin{array}{rcl} A &= q\_1 q\_2 + q\_1 q\_3 + q\_2 q\_3 \\ S &= q\_1 + q\_2 + q\_3 \end{array} \tag{14}$$

*We obtain:*

$$\begin{array}{l} \text{Cost}(t\_1) &= q\_2 q\_3 1 + q\_1 q\_2 2 + q\_1 q\_3 2 = A + q\_1 (q\_2 + q\_3) = A + q\_1 (S - q\_1) \\ \text{Cost}(t\_2) &= A + q\_2 (S - q\_2) \\ \text{Cost}(t\_3) &= A + q\_3 (S - q\_3) \end{array} \tag{15}$$

*The ordering of the costs depends on the ordering of the values of the function q*(*S* − *q*) *for q* = *q*1, *q*2, *q*<sup>3</sup> ∈ [0, *S*]*. This function is monotonically increasing on* [0, *S*/2] *and monotonically decreasing on* [*S*/2, *S*]*. Observe that if qi* ≤ *S*/2*, i.e., if neither player gets more than a half of the total quotation stake, then the ordering of the costs is given by the ordering of the quotations qi.*

**Figure 3.** Balanced tournaments with three players.

**Example 6.** *Let us consider four players (see Table 1). We assume that each player has a unique rank from 1 to 4. Now, if we choose qi* = 5 − *ri then, using this approach for defining players' quota, player 2 with rank 4 is assigned quotation q*<sup>2</sup> = 1*. We consider the three tournaments t*1, *t*2, *t*<sup>3</sup> ∈ *T*2({1, 2, 3, 4}) *from Figure 4. According to Equation (12), the cost of a tournament t* ∈ *T*2({1, 2, 3, 4}) *is:*

$$\begin{aligned} \text{Cost}(t) &= \sum\_{1 \le i < j \le 4}^{4} s\_{ij}^{t} q\_{i} q\_{j} \\ \text{Cost}(t) &= s\_{12}^{t} \cdot 4 \times 1 + s\_{13}^{t} \cdot 4 \times 2 + s\_{14}^{t} \cdot 4 \times 3 + s\_{23}^{t} \cdot 1 \times 2 + s\_{24}^{t} \cdot 1 \times 3 + s\_{34}^{t} \cdot 2 \times 3 \end{aligned} \tag{16}$$

*Substituting stage values s<sup>t</sup> ij for each tournament from Table 2 into Equation (16) we obtain the tournaments' cost values from Table 2. We observe that in this case the best tournament is t*3*. Actually it can be easily checked that the best tournament is t*<sup>3</sup> *for whatever values of the quota that are decreasingly ordered according to the ranks.*

**Table 1.** Players' ranking and quotation for *n* = 4.



**Table 2.** Games playing stages for each tournament of Figure 4 and their costs.

**Figure 4.** Balanced tournaments with four players.

#### **5. Dynamic Programming Algorithm for Computing Optimal Tournaments**

For any set of players Σ we denote by *q*<sup>Σ</sup> the sum of quotations of the players in Σ.

$$q\_{\Sigma} = \sum\_{i \in \Sigma} q\_i \tag{17}$$

**Proposition 4** (Recurrence for tournament cost)**.** *Let t* ∈ T*n*(Σ) *be an n-stage tournament such that* <sup>Σ</sup> *is a finite nonempty set with N* <sup>∈</sup> <sup>2</sup>*n*−<sup>1</sup> <sup>+</sup> 1...2*<sup>n</sup> elements. Then:*

$$\text{Cost}(t) = \begin{cases} 0 & n = 0\\ q\_i \cdot q\_j & n = 1, N = 2, \Sigma = \{i, j\} \\ \text{Cost}(t\_1) + \text{Cost}(t\_2) + nq\_{\Sigma\_1}q\_{\Sigma\_2} & n \ge 2, t = \{t\_1, t\_2\}, t\_1 \in \mathcal{T}(\Sigma\_1), \\ & t\_2 \in \mathcal{T}(\Sigma\_2), \Sigma\_1 \cup \Sigma\_2 = \Sigma\_r \Sigma\_1 \cap \Sigma\_2 = \mathcal{Q}\_r \\ & t\_1 \text{ and } t\_2 \text{ are balanced, at most one has } n - 2 \\ & \text{levels and in this case it is fully balanced} \end{cases} \tag{18}$$

**Proof.** If *n* = 0 then Σ has a single player so the result is obvious, as no games are played to determine the winner of the tournament.

If *n* = 1 then Σ has two players *i* and *j* so the result is obvious, as a single game is played to determine the winner of the tournament, between player *i* and player *j*.

If *<sup>n</sup>* <sup>≥</sup> 2 then *<sup>t</sup>* <sup>=</sup> {*t*1, *<sup>t</sup>*2}. If *<sup>i</sup>* <sup>∈</sup> <sup>Σ</sup><sup>1</sup> and *<sup>j</sup>* <sup>∈</sup> <sup>Σ</sup><sup>2</sup> then *<sup>s</sup><sup>t</sup> ij* = *n*. So Equation (12) gives:

$$\begin{split} \text{Cost}(t) &= \sum\_{i,j \in \Sigma\_1, i < j} q\_i q\_j s\_{ij}^{t\_1} + \sum\_{i,j \in \Sigma\_2, i < j} q\_i q\_j s\_{ij}^{t\_2} + \sum\_{i \in \Sigma\_1, j \in \Sigma\_2} q\_i q\_j s\_{ij}^{t} = \\ &\text{Cost}(t\_1) + \text{Cost}(t\_2) + n \sum\_{i \in \Sigma\_1, j \in \Sigma\_2} q\_i q\_j = \text{Cost}(t\_1) + \text{Cost}(t\_2) + n q\_{\Sigma\_1} q\_{\Sigma\_2} \end{split} \tag{19}$$

The conditions from the Equation (18) follow directly from the recursive definition of balanced tournaments (Definition 2).

**Proposition 5** (Recurrences for optimal tournaments)**.**

*1. The optimal tournament cost OptC introduced by Equation (13) can be defined recursively as follows:*

$$Opt\mathbb{C}(\Sigma) = \begin{cases} 0 & n = 0, |\Sigma| = 1 \\ q\_i \cdot q\_j & n = 1, \Sigma = \{i, j\} \\ \max\_{\begin{subarray}{c} \Sigma\_1 \cup \Sigma\_2 = \Sigma \\ \Sigma\_1 \cap \Sigma\_2 = \mathcal{Q} \end{subarray}} Opt\mathbb{C}(\Sigma\_1) + Opt\mathbb{C}(\Sigma\_2) + nq\_{\Sigma\_1}q\_{\Sigma\_2} & n \ge 2, 2^{n-1} < |\Sigma| \le 2^n \\ & |\Sigma| \le 2^{n-2} \tag{20} \\ & |\Sigma\_2| \le 2^{n-1} \end{cases} \tag{20}$$

*2. The optimal tournament can be determined by recording the pairs of subsets Opt*(Σ)=(Σ1, Σ<sup>2</sup> = Σ \ Σ1) *that maximize OptC in Equation (20) for n* ≥ 2*,* <sup>2</sup>*n*−<sup>1</sup> <sup>&</sup>lt; <sup>|</sup>Σ| ≤ <sup>2</sup>*<sup>n</sup> as follows:*

$$OptS(\Sigma) = \underset{\Sigma\_1, \Sigma\_2 \subseteq \Sigma}{\text{argmax}} \, Opt\mathbb{C}(\Sigma\_1) + Opt\mathbb{C}(\Sigma\_2) + nq\_{\Sigma\_1}q\_{\Sigma\_2} \tag{21}$$

*Note that the limits of* argmax *in Equation (21) must satisfy the conditions from the third branch of Equation (20). Note also that it is enough to record Opt*(Σ) = Σ<sup>1</sup> *as* Σ<sup>2</sup> = Σ \ Σ1*.*

**Proof.** The proof follows by applying the maximization operation in Equation (18) and observing that the term *nq*Σ<sup>1</sup> *q*Σ<sup>2</sup> does not depend on *t* = {*t*1, *t*2}. The condition |Σ1|≤|Σ2| ensures that a pair {Σ1, Σ2} is uniquely considered (otherwise each pair will be considered twice as {Σ1, Σ2} and {Σ2, Σ1}).

Moreover, the sets *OptS*(Σ) can be used to construct an optimal tournament. Let Σ*<sup>n</sup>* = Σ. We define: Σ*n*−<sup>1</sup> = *OptS*(Σ*n*), ..., Σ<sup>0</sup> = *OptS*(Σ1). Then the optimal tournament *t* ∗ can be defined recursively as follows:

$$\begin{aligned} t^\* &= t^\hbar(\Sigma^\hbar) \\ t^i(\Sigma^i) &= \left\{ \begin{array}{ll} \{t\_1^{i-1}(\Sigma^{i-1}), t\_2^{i-1}(\Sigma^i \backslash \Sigma^{i-1})\} & i \ge 1 \\ \hline \end{array} \right. \\ &\left. \begin{array}{ll} i \ge 1 \\ i = 0, \Sigma^0 = \{j\} \end{array} \right. \end{aligned} \tag{22}$$

Proposition 5 (Equation (20) in particular) can be used to design a dynamic programming algorithm for computing the optimal tournament and its cost. The dynamic programming algorithm can be implemented either with a bottom-up approach or using a top-down approach with memoization [17]. We will explore these possibilities in what follows by deriving a bottom-up dynamic programming algorithm for fully balanced tournaments as well as a top-down dynamic programming algorithm with memoization for the general case.

#### *5.1. Top-Down Dynamic Programming Algorithm with Memoization*

**Proposition 6** (Top-down recursive application of Equation (20))**.** *Let us assume that we want to compute OptC*(Σ) *for* <sup>|</sup>Σ<sup>|</sup> <sup>=</sup> *N, <sup>N</sup>* <sup>≥</sup> <sup>2</sup>*,* <sup>2</sup>*n*−<sup>1</sup> <sup>+</sup> <sup>1</sup> <sup>≤</sup> *<sup>N</sup>* <sup>≤</sup> <sup>2</sup>*n. According to Proposition 5, we must recursively explore all tournaments of shape t* = {*t*1, *t*2} *such that ti* ∈ *T*(Σ*i*)*,* |Σ*i*| = *Ni, i* = 1, 2*, N* = *N*<sup>1</sup> + *N*2*. Then we should recursively apply Equation (20) only for the following values of N*1*:*

$$\max\{N - 2^{n-1}, 2^{n-2}\} \le N\_1 \le \lfloor N/2 \rfloor \tag{23}$$

**Proof.** As *N* = *N*<sup>1</sup> + *N*<sup>2</sup> and *N*<sup>1</sup> ≤ *N*<sup>2</sup> we obtain:

$$N\_1 \le \lfloor N/2 \rfloor \tag{24}$$

As *<sup>N</sup>* <sup>=</sup> *<sup>N</sup>*<sup>1</sup> <sup>+</sup> *<sup>N</sup>*<sup>2</sup> and 2*n*−<sup>2</sup> <sup>≤</sup> *<sup>N</sup>*<sup>1</sup> <sup>≤</sup> <sup>2</sup>*n*−<sup>1</sup> we obtain:

$$2^{n-2} \le N\_1 \le 2^{n-1}$$

$$N - 2^{n-1} \le N\_1 \le N - 2^{n-2}$$

Combining (24) and (25) we obtain:

$$\max\{N - 2^{n-1}, 2^{n-2}\} \le N\_1 \le \min\{2^{n-1}, N - 2^{n-2}, \lfloor N/2 \rfloor\}\tag{26}$$

It is not difficult to see that *N*/2 ≤ *<sup>N</sup>*/2 <sup>≤</sup> <sup>2</sup>*n*−<sup>1</sup> and *N*/2 ≤ *<sup>N</sup>*/2 <sup>&</sup>lt; *<sup>N</sup>* <sup>−</sup> <sup>2</sup>*n*−<sup>2</sup> so the value of right-hand side of Equation (26) is *N*/2, q.e.d.

Observe that for fully balanced tournaments *N* = 2*n*, so this top-down recursive process will generate subsets of sizes 2*n*−1, 2*<sup>n</sup>*−<sup>2</sup> . . . 2, 1.

**Example 7.** *Let us illustrate the application Equation (20) for N* = |Σ| = 25 *players. The results are summarized in Table 3. It follows that solving the problem for a set of* 25 *players requires the solving of all subproblems corresponding to its subsets of* 1, 2, ... , 16 *players; however, solving the problem for a set of* 15 *players requires the solving of all subproblems corresponding to its subsets of* 1, 2, 3, 4, 7, 8 *players. Moreover, solving the problem for a set of* 14 *players requires the solving of all subproblems corresponding to its subsets of* 1, 2, 3, 4, 6, 7, 8 *players, while solving the problem for a set of* 12 *or* 13 *players requires the solving of all subproblems corresponding to its subsets of* 1, 2, 3, 4, 5, 6, 7, 8 *players.*

**Table 3.** Games playing stages for each tournament and tournament costs.


Proposition 6 sets the iteration bounds for exploring the subsets of players in the topdown approach. Combining the results of Propositions 5 and 6, we obtain the top-down approach for computing optimal balanced tournaments; see Algorithm 1.

**Algorithm 1** *OptTourCostTD*(Σ, *N*, *q*) top-down dynamic programming algorithm with memoization for computing the cost of the optimal tournament.

**Global:** *OptC*, initially ∅, maps subsets of players to costs of optimal tournaments. *OptS*, initially ∅, maps subsets to sub-subsets for building optimal tournaments.

```
Input: N ∈ N∗ represents the number of players.
       q. Vector of size N representing the players' quota.
       Σ such that |Σ| = N. Σ represents the set of players.
```

```
Output: Cmax. Cost of the optimal tournament for set Σ of players.
  n ← log2 N
  if n = 0 then
     Cmax ← 0
     Smax ← ∅
  else if n = 1 (i.e., Σ = {i, j}) then
     Cmax ← qi ∗ qj
     Smax ← {i}
  else
     P1 ← 2n−2
     P2 ← 2 ∗ P1
     kmax ← N/2
     if N ≤ P1 + P2 then
         kmin ← P1
     else
         kmin ← N − P2
     end if
     Cmax ← −∞
     for k = kmin, kmax do
         for Σ1 ⊆ Σ s.t. |Σ1| = k do
            if Σ1 	∈ OptC then
               C1 ← OptTourCostTD(Σ1, k, q)
            else
               C1 ← OptC[Σ1]
            end if
            Σ2 ← Σ \ Σ1
            if Σ2 	∈ OptC then
               C2 ← OptTourCostTD(Σ2, k, q)
            else
               C2 ← OptC[Σ2]
            end if
            C ← C1 + C2
            ql ← 0
            for i ∈ Σ1 do
               ql ← ql + qi
            end for
            qr ← 0
            for i ∈ Σ2 do
               qr ← qr + qi
            end for
            C ← C + k ∗ ql ∗ qr
            if C > Cmax then
               Cmax ← C
               Smax ← Σ1
            end if
         end for
     end for
  end if
  OptC[Σ] ← Cmax
  OptS[Σ] ← Smax
```
#### *5.2. Bottom-Up Dynamic Programming Algorithm for Fully Balanced Tournaments*

The cost of the optimal tournament is computed with the help of *OptC* vector that is indexed by all the subsets of Σ generated recursively by Equation (20), starting from the topmost set <sup>Σ</sup>. Note that for a fully balanced tournament we have <sup>|</sup>Σ<sup>|</sup> <sup>=</sup> <sup>2</sup>*<sup>n</sup>* and the process will generate exactly all the subsets of Σ of cardinal: 20, 21,...2*<sup>n</sup>*. Note that in this case the size of *OptC* can be determined as:

$$S\_n = \sum\_{i=0}^{2^n} \binom{2^n}{2^i} \tag{27}$$

Additionally we must save in vector *OptS* of size *Sn* the subsets Σ determined using Equation (21), such that we can reuse them to construct the optimal tournament using Equation (22). Our proposed algorithm is presented as Algorithm 2.


**Input:** *<sup>N</sup>* <sup>=</sup> <sup>2</sup>*n*, *<sup>n</sup>* <sup>∈</sup> <sup>N</sup>. *<sup>N</sup>* represents the number of players. *q*. Vector of size *N* representing the players' quota. <sup>Σ</sup> such that <sup>|</sup>Σ<sup>|</sup> <sup>=</sup> <sup>2</sup>*n*. <sup>Σ</sup> represents the set of players. **Output:** *OptC*. Vector of costs of the optimal sub-tournaments. *OptS*. Vector of sets to construct the optimal tournament. 1: **for** *i* = 1, *N* **do** 2: *OptC*[{*i*}] ← 0 3: **end for** 4: **for** *k* = 1, *n* **do** 5: **for** <sup>Σ</sup><sup>1</sup> <sup>⊆</sup> <sup>Σ</sup> s.t. <sup>|</sup>Σ1<sup>|</sup> <sup>=</sup> <sup>2</sup>*<sup>k</sup>* **do** 6: *Cmax* ← −∞ 7: **for** Σ- ⊆ Σ<sup>1</sup> s.t. |Σ- <sup>|</sup> <sup>=</sup> <sup>2</sup>*k*−<sup>1</sup> **do** 8: *C* ← *OptC*(Σ- ) + *OptC*(Σ<sup>1</sup> \ Σ- ) 9: *ql* ← 0 10: **for** *i* ∈ Σ **do** 11: *ql* ← *ql* + *qi* 12: **end for** 13: *qr* ← 0 14: **for** *i* ∈ Σ<sup>1</sup> \ Σ **do** 15: *qr* ← *qr* + *qi* 16: **end for** 17: *C* ← *C* + *k* ∗ *ql* ∗ *qr* 18: **if** *C* > *Cmax* **then** 19: *Cmax* ← *C* 20: *Smax* ← Σ- 21: **end if** 22: **end for** 23: *OptC*[Σ1] ← *Cmax* 24: *OptS*[Σ1] ← *Smax* 25: **end for** 26: **end for**

#### *5.3. Computing an Optimal Tournament*

Note that both Algorithms 1 and 2 determine the *OptS* structure that records the split points for each subset of players according to Equation (20). The *OptS* structure can be used to actually build an optimal tournament according to Algorithm 3 using Equation (22).

**Algorithm 3** *OptTour*(Σ, *N*,*OptS*) algorithm for computing the optimal tournament.

**Input:** Σ representing the set of players.

*<sup>N</sup>* <sup>∈</sup> <sup>N</sup><sup>∗</sup> representing the number of players.

*OptS*. Structure determined either by Algorithm 1 or by Algorithm 2.

**Output:** Returns the optimal tournament.

1: **if** *N* = 1 (i.e., Σ = {*j*}) **then**

2: **return** *j* 3: **end if** 4: Σ<sup>1</sup> ← *OptS*(Σ) 5: *N*<sup>1</sup> ← |Σ1| 6: Σ<sup>2</sup> ← Σ \ Σ<sup>1</sup> 7: *t*<sup>1</sup> ← *OptTour*(Σ1, *N*1,*OptS*) 8: *t*<sup>2</sup> ← *OptTour*(Σ2, *N* − *N*1,*OptS*) 9: **return** {*t*1, *t*2}

*5.4. Correctness and Complexity Results*

**Proposition 7** (Correctness of Algorithms 1–3)**.**


#### **Proof.**

Proof of a. Algorithms 1 and 2 compute the values of *OptC* and *OptS* either in topdown or bottom-up fashion for all the subsets that are required to determine the optimal tournament for the set Σ of players. The computation follows Equations (20) and (21); therefore the correctness of this point follows from Propositions 5 and 6.

Proof of b. Algorithm 3 computes the optimal tournament using Equations (22). As values of *OptS* are correctly determined according to point "a", it follows that the tournament computed by Algorithm 3 is the optimal tournament.

**Proposition 8** (Complexity of Algorithms 2 and 3)**.** *Let us consider tournaments with N players.*


**Proof.** The proof is using the Stirling approximation of the factorial, written in inequality form ([18]), in fact showing that *p*! = Θ <sup>√</sup>*<sup>p</sup> p e p* :

$$\sqrt{2\pi} \le \frac{p!}{\sqrt{p}\left(\frac{p}{e}\right)^p} \le e \tag{28}$$

Using this observation it is not difficult to prove that:

$$
\begin{pmatrix} 2p \\ p \end{pmatrix} = \Theta \left( \frac{2^{2p}}{\sqrt{p}} \right) \tag{29}
$$

Proof of a. The space complexity of Algorithms 2 and 3 is given by the size of structures *OptC* and *OptS* (see Equation (27)); however, the asymptotically dominant term of this summation is ( <sup>2</sup>*<sup>n</sup>* <sup>2</sup>*n*−<sup>1</sup>). Then the result follows using Equation (29) for *<sup>p</sup>* = <sup>2</sup>*n*−1.

Proof of b. Algorithm 2 contains one "for" loop (lines 4–26) including other three nested "for" loops. The first inner "for" loop (lines 5–25) is executed (*<sup>N</sup>* <sup>2</sup>*<sup>k</sup>* ) times. The second inner "for" loop (lines 7–22) is executed ( <sup>2</sup>*<sup>k</sup>* <sup>2</sup>*k*−<sup>1</sup>) times. The third inner "for" loop (lines 10–12 and 14–16) is executed 2*k*−<sup>1</sup> times. The total number of steps of Algorithm 2 is given by:

$$\sum\_{k=1}^{n} \binom{N}{2^k} \binom{2^k}{2^{k-1}} 2^{k-1} \tag{30}$$

Observe that the asymptotically dominant term of this summation is obtained for *k* = *n* − 1 and it can be transformed using Equation (29), thus concluding the proof:

$$
\Theta\begin{pmatrix} N \\ 2^{n-1} \end{pmatrix} \binom{2^{n-1}}{2^{n-2}} 2^{n-2} = \Theta\left(\frac{2^n}{\sqrt{2^{n-1}}} \frac{2^{2^{n-1}}}{\sqrt{2^{n-2}}} 2^{n-2}\right) = \Theta\left(\frac{1}{\sqrt{2}} 2^N 2^{N/2}\right) = \Theta((2\sqrt{2})^N) \tag{31}
$$

Proof of c. If *N* is a power of two, i.e., we have a fully balanced tournament, the memory consumption is exactly as in case a, so the first result follows trivially. Otherwise, the space consumption of tables *OptC* and *OptS* has an upper bound given by the size of the power set of Σ, and the result follows immediately.

Proof of d. Let us first observe that in order to determine the cost of an optimal tournament with *N* players we need to know the costs of optimal tournaments for *N*<sup>1</sup> and *N*<sup>2</sup> players such that *N* = *N*<sup>1</sup> + *N*<sup>2</sup> and condition of Equation (23) holds. It is not difficult to observe that:

$$\frac{N}{4} < N\_1 \le N\_2 < \frac{3N}{4} \tag{32}$$

First note that the upper bound of *N*<sup>2</sup> follows from the lower bound of *N*1, so it is enough to show the lower bound of *N*1. Let us assume by contradiction that:

$$\frac{N}{4} \ge \max\{N - 2^{n-1}, 2^{n-2}\}\tag{33}$$

It follows that:

$$\frac{N}{2} = \frac{N}{4} + \frac{N}{4} \ge N - 2^{n-1} + 2^{n-2} = N - 2^{n-2} \tag{34}$$

so:

$$N \le 2^{n-1} \tag{35}$$

and thus contradicting (5).

It is easier to analyze the complexity of Algorithm 1 by thinking "bottom-up" rather than "top-down". The complexity will be the same, as the role of the memorization technique is just to evaluate exactly once the cost of a tournament for each subset of players. So we must determine the cost for a subset of *i* = *<sup>N</sup>* <sup>4</sup> , *<sup>N</sup>* <sup>4</sup> <sup>+</sup> 1, ... , <sup>3</sup>*<sup>N</sup>* <sup>4</sup> players (· can be omitted without losing generality); therefore the total running time has the following upper bound:

$$\sum\_{p=\frac{N}{4}}^{p=\frac{3N}{4}} \binom{N}{p} \sum\_{k=\frac{P}{4}}^{\frac{3p}{4}} \binom{p}{k} k \tag{36}$$

Although ( *p <sup>k</sup>*) <sup>=</sup> ( *<sup>p</sup> p*−*k* ), so grouping terms with complementary binomial coefficients of inner sum of (36), noticing that ( *p <sup>k</sup>*) ≤ ( *p p* 2 ) and using (29), the inner sum has an upper bound of: *p*

$$p \cdot \sum\_{k=\frac{p}{4}}^{\frac{\Delta}{2}} \binom{p}{k} \le p \cdot \frac{p}{4} \cdot \Theta(\frac{2^p}{\sqrt{p}}) \le N\sqrt{N} \cdot \Theta(2^p) \tag{37}$$

Now, substituting (37) in (36) we obtain the following upper bound of the running time:

$$N\sqrt{N}\cdot\sum\_{p=\frac{N}{4}}^{p=\frac{3N}{4}} \binom{N}{p} \cdot \Theta(2^p) \le N\sqrt{N}\cdot\sum\_{p=0}^{p=N} \binom{N}{p} \cdot \Theta(2^p) = \Theta(3^N \cdot N\sqrt{N})\tag{38}$$

As this is in fact only an upper bound of our running time, the result of point e follows (i.e., with *O* rather than Θ).

Proof of e. Observe that the time complexity of Algorithm 3 satisfies the recursive equation *T*(|Σ|) = *T*(|Σ1|) + *T*(|Σ2|). Unfolding this equation with the substitution method yields an asymptotic execution time Θ(|Σ|) = Θ(*N*).

#### *5.5. Sub-Optimal Algorithms*

The dynamic programming approach for construing optimal tournaments has the disadvantage that the full exploration of the search space becomes prohibitive for larger tournaments. Our experiments (see Section 6) clearly show that this approach is unfeasible for tournaments of more than *n* = 16 players; however, the dynamic programming algorithms can be easily adapted to explore a smaller size of the search space, leading to sub-optimal solutions. The exploration strategy can be used to tune the trade-off between the complexity of the computation and the "gap" between the provided sub-optimal solution and the actual optimal solution.

The resulting sub-optimal algorithms follow a strictly top-down approach that can be the best described as divide-and-conquer. At each decision point, rather than exploring all pairs of subsets (Σ1, Σ2) satisfying conditions of Equation (20), only few such pairs (ideally only 1) are selected for exploration. This selection strategy can be deterministic, based on heuristic principles, or stochastic based on stochastic sampling subsets of Σ satisfying the conditions of Equation (20). The strictly top-down approach has the advantage that it avoids the use of temporary structures *OptS* and *OptC* and of the additional algorithm *OptTour* to build the solution. Rather, the top-down approach will build the solution directly, using the recursive divide-and-conquer approach.

The general approach of a sub-optimal algorithm following a top-down divide and conquer approach is presented as Algorithm 4. Note that this algorithm is using a specific strategy to explore only a few subsets of <sup>Σ</sup> defined by *Strategy*(Σ, *kmin*, *kmax*) <sup>⊆</sup> <sup>2</sup><sup>Σ</sup> and satisfying the conditions of Equation (20).

**Proposition 9** (Complexity of Algorithm 4)**.** *Let us consider tournaments for N players and let n be the number of stages of a balanced tournament. Then the time complexity of Algorithm 4 is O*(*N*1+log2 *<sup>s</sup>*) *where s is the average number of subsets of* Σ *explored by the strategy of the algorithm.*

**Proof.** It is not difficult to observe that the time complexity of Algorithm 4 satisfies the following recurrence:

$$T(|\Sigma|) = \mathbf{s} \cdot \left( T(|\Sigma\_1|) + T(|\Sigma \backslash \Sigma\_1|) \right) \tag{39}$$

Applying the substitution method for Equation (39) we obtain:

$$T(|\Sigma|) = \sum\_{i \in \Sigma} s^{h\_i} \cdot O(1) \tag{40}$$

For each *i* ∈ Σ, *hi* from Equation (40) denotes the height of leaf *i* in the tournament tree, so *hi* ≤ *n* ≤ log2 *N*. So:

$$T(|\Sigma|) = N \cdot O(s^{\log\_2 N}) = O(N^{1 + \log\_2 s})\tag{41}$$

q.e.d.

**Algorithm 4** *OptTourCostSubOpt*(Σ, *N*, *q*, *S*) top-down divide-and-conquer algorithm for computing a sub-optimal tournament.

```
Input: N ∈ N∗ represents the number of players.
```
*q*. Vector of size *N* representing the players' quota.

Σ such that |Σ| = *N*. Σ represents the set of players.

*<sup>S</sup>* = <sup>∑</sup>*i*∈<sup>Σ</sup> *qi*. Sum of players' quotations.

**Output:** *Cmax*. Cost of the sub-optimal tournament for set Σ of players.

*tmax*. Sub-optimal tournament tree.

1: *n* ← log2 *N*

2: **if** *n* = 0 (i.e." Σ = {*i*}) **then** 3: *Cmax* ← 0 4: *tmax* ← *i* 5: **else if** *n* = 1 (i.e., Σ = {*i*, *j*}) **then** 6: *Cmax* ← *qi* ∗ *qj* 7: *tmax* ← {*i*, *j*} 8: **else** 9: *<sup>P</sup>*<sup>1</sup> <sup>←</sup> <sup>2</sup>*n*−<sup>2</sup> 10: *P*2 ← 2 ∗ *P*1 11: *kmax* ← *N*/2 12: **if** *N* ≤ *P*1 + *P*2 **then** 13: *kmin* ← *P*1 14: **else** 15: *kmin* ← *N* − *P*2 16: **end if** 17: *Cmax* ← −∞ 18: **for** Σ<sup>1</sup> ∈ *Strategy*(Σ, *kmin*, *kmax*) **do** 19: *S*<sup>1</sup> ← 0 20: **for** *i* ∈ Σ<sup>1</sup> **do** 21: *S*<sup>1</sup> ← *S*<sup>1</sup> + *qi* 22: **end for** 23: *k* ← |Σ1| 24: (*C*1, *t*1) ← *OptTourCostSubOpt*(Σ1, *k*, *q*, *S*1) 25: (*C*2, *t*2) ← *OptTourCostSubOpt*(Σ \ Σ1, *N* − *k*, *q*, *S* − *S*1) 26: *C* ← *C*<sup>1</sup> + *C*<sup>2</sup> + *k* ∗ *S*<sup>1</sup> ∗ (*S* − *S*1) 27: **if** *C* > *Cmax* **then** 28: *Cmax* ← *C* 29: *tmax* ← (*t*1, *t*2) 30: **end if** 31: **end for** 32: **end if** 33: **return** (*Cmax*, *tmax*)

Observe that if *s* = 1 then the time complexity of Algorithm 4 is linear in the number *N* of players. Moreover, if *s* > 1 then the time complexity of the algorithm is polynomial in *N* and the degree of the polynomial grows logarithmically with *s*.

#### 5.5.1. Deterministic Sub-Optimal Algorithms

We define a deterministic sub-optimal algorithm by letting Σ<sup>1</sup> consist of the smallest set of players 1, 2, . . . , *k* such that *k* ≥ *kmin* and *q*Σ<sup>1</sup> > *q*Σ/2.

The rationale of this choice is to try to make the product *q*Σ<sup>1</sup> *q*Σ<sup>2</sup> from Equation (20) as high as possible. As *q*Σ<sup>1</sup> + *q*Σ<sup>2</sup> = *q*<sup>Σ</sup> is constant, we try to make the values *q*Σ<sup>1</sup> and *q*Σ<sup>2</sup> as close as possible, while maintaining the constraints on the size of subset Σ1.

We can define three variants of the deterministic sub-optimal algorithm by considering the sequence of players' quotations to be: (i) unchanged, i.e., as it was provided as input; (ii) increasingly sorted; (iii) decreasingly sorted.

#### 5.5.2. Stochastic Sub-Optimal Algorithms

We define a stochastic sub-optimal algorithm by letting Σ<sup>1</sup> consist of a family of randomly chosen subsets of players of Σ<sup>1</sup> ⊆ Σ such that *kmax* ≥ |Σ1| ≥ *kmin*. This is easily achieved by randomly choosing the number of players *k* uniformly distributed in [*kmin*, *kmax*] and then randomly choosing a subset Σ<sup>1</sup> of *k* elements and uniformly distributed in Σ.

The number of chosen subsets Σ<sup>1</sup> explored by the algorithm is a parameter denoted by *Nsample* and it usually has a low number, as it directly influences the complexity of the algorithm according to Proposition 9, *s* = *Nsampl*. For example, if *Nsampl* = 1 the complexity of the algorithm is *O*(*N*), if *Nsampl* = 2 the complexity of the algorithm is *O*(*N*2) and if *Nsampl* = 4 the complexity of the algorithm is *O*(*N*3).

#### **6. Implementation and Experiments**

#### *6.1. Implementation Issues*

There were several issues that we had to address by our experimental implementation of Algorithms 1–3.

Firstly, we have chosen to represent sets as arrays of bits, as well as using the integer value that is equivalent to the binary representation as an array of bits.

Secondly, for generating subsets of given size (i.e., combinations) we have used Algorithm 7.2.1.2L from [19] for generating permutations with repetitions of binary arrays. Basically the subsets representing combinations of *k* elements of a set with *n* ≥ *k* elements are all the permutations with repetitions of a binary vector of *n* elements containing exactly *k* elements equal to 1.

Thirdly, we had to choose an efficient representation of *OptC* and *OptS* structures. Their operation is crucial for the efficient implementation of some of our algorithms. As for our implementation we have chosen Python platform, we decided to implement *OptC* and *OptS* using subset-indexed dictionaries that map subsets of Σ to costs and to subsets necessary for building optimal tournaments, respectively. The subsets representing the dictionary keys are defined as integer values of their characteristic vector in binary format. As Python dictionaries are efficiently implemented using hash tables, an average *O*(1) time complexity is expected for lookup operations.

Finally, for the implementation of the random selection of subsets we have used the array of bits representation of sets and we have applied the *random.permute* function from *NumPy* package to return a randomly permuted array representing a random subset.

#### *6.2. Experimental Results*

Our experiments were developed in Python 3.7.3 using Jupyter Notebook on an *x64* based PC with a 2 cores / 4 threads Intel© Core™i7-5500U CPU at 2.40 GHz running Windows 10 (The experimental code is available at http://software.ucv.ro/~cbadica/tour.zip) (accessed on 2 September 2021).

According to our findings, there are no algorithms directly available to be compared with our own proposals. There are two main causes for this. First, we consider an integrated approach of tournament design, rather than a process involving two separated stages for structure design and seeding. Secondly, we do not use probabilistic information in our cost function, thus hindering the direct comparison of tournament cost values.

We took a different path for evaluating our proposals. We have implemented optimal algorithms, as well as several versions of sub-optimal algorithms and then compared their outcomes in terms of running time and optimality. So finally we have implemented and experimentally evaluated eight algorithms, as presented in Table 4.

**Table 4.** Table presenting the list of implemented optimal and sub-optimal algorithms for balanced tournaments.


Note that for the optimal algorithms there are at least two restrictions that hinder a complete experimental comparison with the rest of the algorithms. Firstly, their high computational complexity limits their applicability only to small number of players. Secondly, the dynamic programming bottom-up algorithm works only for a number of players that is an exact power of 2. We have only checked it for *N* = 4, 8, 16.

Our data set includes multiple sequences of players' quotations. For each *N* = 3, 4, ... 50 we generated a sequence of quotations *q*1, *q*2, ... , *qN* with integer values *qi* randomly chosen with a uniform distribution in the interval [*qmin* = 1, *qmax* = 9]. This data set was used to experimentally evaluate the algorithms from Table 4, as follows:


For each algorithm, we recorded the (sub-)optimal cost of the output tournament, as well as the running time. Stochastic algorithms *SubOptSi* for *i* = 1, 2, 3 were evaluated by repeating their execution 10 times for each input sequence of quotations from the data set and recording the minimum, maximum, and average costs, as well as the average running times.

Figure 5 presents the sub-optimal costs produced by *SubOptDi* and *SubOptSi* algorithms for *i* = 1, 2, 3. The figure plots costs *Ci* produced by algorithms *SubOptDi* for *i* = 1, 2, 3, as well as average costs *CSai* produced by algorithms *SubOptSi* for *i* = 1, 2, 3 and the maximum cost *CSM*<sup>3</sup> produced by the 10-times repeated execution of algorithm *SubOptS*3. Note that we included maximum cost only for this case, as it should be obvious that it is expected that stochastic algorithm *SubOptS*<sup>3</sup> will produce the best results among *SubOptSi* for *i* = 1, 2, 3 because it uses the highest number of samplings *Nsampl* = 3.

**Figure 5.** Sub-optimal tournament costs determined by sub-optimal algorithms on different quotations' sequences for various number of players.

Analyzing Figure 5, we first observe that the relative difference of the costs produced by the various algorithms on the same input sequence is rather low. This is expected, as quotations *qi* were generated as integer values from a small interval 1 ≤ *qi* ≤ 9 while the cost tends to reach significantly higher values. For example, analyzing in detail the results obtained for the sample with *N* = 50 players we observe that the relative difference between the smallest and the highest cost obtained (147,555 and 156,203) is of only 5.53%. We can also notice that the best results among sub-optimal algorithms were obtained by algorithm *SubOptS*3, while the worst results were obtained by algorithms *SubOptD*<sup>2</sup> and *SubOptD*3. It might look a bit surprising that algorithm *SubOptD*<sup>1</sup> appears to be superior to *SubOptD*<sup>2</sup> and *SubOptD*3; however, taking into account how the data set was generated, this could be explained by the fact that algorithm *SubOptD*<sup>1</sup> is actually using a random permutation of the quotations' sequence that provides a better balance of the total quotation distribution between the two subsets Σ<sup>1</sup> and Σ \ Σ<sup>1</sup> (see Algorithm 4) than algorithms *SubOptD*<sup>2</sup> and *SubOptD*3. One final remark, also observed experimentally, is that algorithms *SubOptD*<sup>2</sup> and *SubOptD*<sup>3</sup> produce the same sub-optimal costs if the number of players is an exact power of 2 (i.e., 16 and 32 on Figure 5). This is an immediate consequence on the logic behind their strategy definition.

Figure 6 presents the running times *TDi* and *TSi* of algorithms *SubOptDi* and *SubOptSi* for *i* = 1, 2, 3. The time figures are given in milliseconds and presented on a logarithmic scale and they were computed by taking the average for 10 executions of the algorithm on the same input data. First observe that deterministic versions *SubOptDi* are the fastest and they have virtually almost the same running times. This can be easily explained by the low computational complexity of the implementation of their underlying strategies. Basically, their strategies use the same mechanism, while the additional sorting of the sequence of quotations adds a negligible cost as it is performed before the actual core processing of the algorithms. Second, the highest execution time is achieved, as expected, by *SubOptS*3. This algorithm has the highest computational complexity among sub-optimal algorithms, as it is using three subset samples during the top-down search. From Figure 6 it also follows that the highest average execution time was obtained for the sequence of quotations with *N* = 46 players and its value was 2.49 s.

**Figure 6.** Running times on logarithmic scale of sub-optimal algorithms on different quotations' sequences for a variable number of players.

Figure 7 presents results obtained with optimal algorithms *OptTD* and *OptBU*, as well as their comparison with results obtained by sub-optimal algorithm *SubOptS*<sup>3</sup> for *N* = 3, 4, . . . , 16 players, on our input data set.

In Figure 7a we show the comparison of relative maximum and average costs obtained by algorithm *SubOptS*<sup>3</sup> (*CSM*3 and *CSa*3) with the actual optimal cost obtained by algorithm *OptTD*. The relative sub-optimal cost is a measure *Cr* ∈ (0, 1] computed with Equation (42) using the absolute values of sub-optimal cost *C*<sup>0</sup> and optimal cost *C*<sup>1</sup> ≥ *C*<sup>0</sup> > 0. Observe that *Cr* = 1 if and only if *C*<sup>1</sup> = *C*0, i.e., if the algorithm providing sub-optimal cost *C*<sup>0</sup> is in fact optimal. Note that the computation of the relative sub-optimal cost assumes the the exact value *C*<sup>1</sup> of the optimal cost is known. In our

case, this value is known, as it was determined using the *OptTD* optimal algorithm for *N* = 3, 4, . . . , 16 players. *Cr* <sup>=</sup> *<sup>C</sup>*<sup>0</sup>

*C*1

(42)

(**a**). Comparison of relative maximum and average costs obtained by algorithm *SubOptS*<sup>3</sup> (*CSM*3 and *CSa*3) with optimal cost obtained by algorithm *SubOptD*<sup>1</sup> for *N* = 3, 4, ... , 16 players, on our input data set.

(**b**). Comparison of running times of algorithms *OptTD*, *OptBU*, and *SubOptS*<sup>3</sup> for *N* = 3, 4, ... , 16 players, on our input data set. Time values are plotted on a logarithmic scale.

**Figure 7.** Comparing costs and running times of our implemented algorithms.

In Figure 7b we show the comparison of running times *TTD*, *TBU*, and *TS*3 of algorithms *OptTD*, *OptBU*, and *SubOptS*<sup>3</sup> for *N* = 3, 4, ... , 16 players. The running times were evaluated by repeating the algorithm execution 10 times for the same input data. They are plotted on a logarithmic scale. Observe that by far the most efficient among them is algorithm *SubOpt*3. The linear increasing trend of *TTD* and *TBU* on the logarithmic scale is consistent to our findings that the complexity of algorithms *OptTD* and *OptBU* is exponential with the number of tournament players. Note that this tendency is also observed on the plot of *TBU*, for which the values were recorded only for an exact power of two of the number of players, i.e., *N* = 4, 8, 16. Moreover, the sub-linear increasing trend of *TS*3 on the logarithmic scale is consistent with the fact that algorithm *SubOpt*<sup>3</sup> has a polynomial time complexity.

#### **7. Conclusions**

In this paper we defined optimal competitions structured as hierarchically shaped single-elimination tournaments. The optimality criterion aimed to maximize tournament attractiveness by letting the topmost players meet in higher stages of the tournament. We proposed a dynamic programming algorithm for computing optimal tournaments and we provided a thorough analysis of its correctness and computational complexity. Based on the idea of the dynamic programming approach, we also developed deterministic and stochastic sub-optimal algorithms. We realized an experimental evaluation of the proposed algorithms by providing experimental results that we obtained with their Python implementation. The results addressed the optimality of solutions and the efficiency of the running time.

**Author Contributions:** Conceptualization, A.B. and C.B.; methodology, A.B. and C.B.; software, C.B.; formal analysis, C.B. and A.B.; investigation, I.B., L.I.C. and D.L.; writing, C.B. and A.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

