Article

Entropy-Based Strategies for Multi-Bracket Pools

1 Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
2 Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Department of Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
* Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 615; https://doi.org/10.3390/e26080615
Submission received: 5 June 2024 / Revised: 19 July 2024 / Accepted: 20 July 2024 / Published: 23 July 2024
(This article belongs to the Section Multidisciplinary Applications)

Abstract: Much work in the parimutuel betting literature has discussed estimating event outcome probabilities or developing optimal wagering strategies, particularly for horse race betting. Some betting pools, however, involve betting not just on a single event, but on a tuple of events. For example, pick six betting in horse racing, March Madness bracket challenges, and predicting a randomly drawn bitstring each involve making a series of individual forecasts. Although traditional optimal wagering strategies work well when the size of the tuple is very small (e.g., betting on the winner of a horse race), they are intractable for more general betting pools in higher dimensions (e.g., March Madness bracket challenges). Hence we pose the multi-brackets problem: supposing we wish to predict a tuple of events and that we know the true probabilities of each potential outcome of each event, what is the best way to tractably generate a set of n predicted tuples? The most general version of this problem is extremely difficult, so we begin with a simpler setting. In particular, we generate n independent predicted tuples according to a distribution having optimal entropy. This entropy-based approach is tractable, scalable, and performs well.

1. Introduction

Parimutuel betting or pool betting involves pooling together all bets of a particular type on a given event, deducting a track take or vigorish, and splitting the pot among all winning bets. Prime examples are horse race betting and the March Madness bracket challenge (which involves predicting the winner of each game in the NCAA Division I men’s basketball March Madness tournament). Profitable parimutuel wagering systems have two components: a probability model of the event outcome and a bet allocation strategy. The latter uses the outcome probabilities as inputs to a betting algorithm that determines the amount to wager on each potential outcome. There is a large body of literature on estimating outcome probabilities for pool betting events. For instance, we provide an overview of estimating outcome probabilities for horse races and college basketball matchups in Appendix B.1. There is also a large body of literature on developing optimal wagering strategies, particularly for betting on horse race outcomes. Notably, assuming the outcome probabilities are known, Isaacs [1] and Kelly [2] derive the amount to wager on each horse so as to maximize expected profit and expected log wealth, respectively. Rosner [3] derives a wagering strategy for a risk-averse decision maker, and Willis [4] and Hausch et al. [5] derive other wagering strategies. On the other hand, there has been very limited work, and no literature to our knowledge, on deriving optimal strategies for generating multiple predicted March Madness brackets. Existing work focuses on generating a single predicted bracket (see Appendix B.2 for details).
Existing wagering strategies for pools that involve betting on the outcome of a single event (e.g., the winner of a horse race) have been successful. For instance, Benter [6] reported that his horse race gambling syndicate made significant profits during its five-year gambling operation. However, many betting pools in the real world involve betting not just on a single event but on a tuple of events. For example, the pick six bet in horse racing involves predicting each winner of six horse races. Also, the March Madness bracket challenge involves predicting the winner of each game in the NCAA Division I men’s basketball tournament. Another compelling example to the pure mathematician is predicting each of the bits in a randomly drawn bitstring. In each of these three prediction contests, the goal is to predict as best as possible a tuple of events, which we call a bracket. We suppose it is permissible to generate multiple predicted brackets, so we call these contests multi-bracket pools. In developing wagering strategies for multi-bracket pools, the literature on estimating outcome probabilities for each event in the bracket still applies. However, given these probabilities, the wagering strategy the literature developed for betting on single events does not extend to general multi-bracket pools. Although these methods work well in low-dimensional examples such as betting on the winner of a horse race, they are intractable for general multi-bracket pools having larger dimension (e.g., March Madness bracket challenges; see Appendix C for details).
Hence, we pose the multi-brackets problem. Suppose we wish to predict a bracket (a tuple of events) and suppose we know the “true” probabilities of each potential outcome of each event. Then, what is the best way to tractably generate a set of n predicted brackets? More concretely, how can we construct a set of n brackets that maximize an objective function such as expected score, win probability, or expected profit? The most general version of the multi-brackets problem, which finds the optimal set of n brackets across all such possible sets, is extremely difficult. To make the problem tractable, possible, and/or able to be visualized, depending on the particular specification of the multi-bracket pool, we make simplifying assumptions. First, we assume we (and optionally a field of opponents) predict i.i.d. brackets generated according to a bracket distribution. The task becomes to find the optimal generating bracket distribution. For higher-dimensional examples (e.g., March Madness bracket challenges), we make another simplifying assumption, optimizing over an intelligently chosen low-dimensional subspace of generating bracket distributions. In particular, we optimize over brackets of varying levels of entropy. We find that this entropy-based approach is sufficient to generate well-performing sets of bracket predictions. We also learn the following high-level lessons from this strategy: we should increase the entropy of our bracket predictions as n increases and as our opponents increase entropy.
The remainder of this paper is organized as follows. In Section 2, we formally introduce the multi-brackets problem. Then, in Section 3, we propose an entropy-based solution to what we consider a canonical example of a multi-bracket pool: guessing a randomly drawn bitstring. Using this example to aid our understanding of multi-bracket pools, in Section 4, we connect the multi-brackets problem to Information Theory, particularly via the Asymptotic Equipartition Property. Then, in Section 5, we propose entropy-based solutions to real-world examples of multi-bracket pools, including the pick six bet in horse racing in Section 5.1 and March Madness bracket challenges in Section 5.2. We conclude in Section 6.

2. The Multi-Brackets Problem

In this section, we formally introduce the multi-brackets problem. The goal of a multi-bracket pool is to predict a tuple of m outcomes $\tau = (\tau_1, \ldots, \tau_m)$, which we call the "true" observed reference bracket. We judge how "close" a bracket prediction $x = (x_1, \ldots, x_m)$ is to $\tau$ by a bracket scoring function $f(x, \tau)$. One natural form for the scoring function is
$$f(x, \tau) = \sum_{i=1}^{m} w_i \cdot \mathbb{1}\{x_i = \tau_i\}, \tag{1}$$
which is the number of outcomes predicted correctly weighted by $\{w_i\}_{i=1}^{m}$. Another is
$$f(x, \tau) = \mathbb{1}\{x = \tau\}, \tag{2}$$
which is one if and only if the predicted bracket is exactly correct. The contestants who submit the highest-scoring brackets win the pool.
The multi-brackets problem asks the general question: if we could submit n brackets to the pool, how should we choose which brackets to submit? This question takes on various forms depending on the information available to us and the structure of a particular multi-bracket pool. In the absence of information about opponents' predicted brackets, how should we craft our submitted set $B_n$ of n bracket predictions in order to maximize the expected maximum score? Formally, we solve
$$B_n^* := \underset{\{B_n \subseteq \mathcal{X} \,:\, |B_n| = n\}}{\arg\max}\ \mathbb{E}_{\tau}\Big[\max_{x \in B_n} f(x, \tau)\Big]. \tag{3}$$
Or, assuming a field of opponents submits a set $O_k$ of k bracket predictions to the pool according to some strategy, how should we craft our submitted set $B_n$ of n brackets in order to maximize our probability of having the best bracket? Formally, we solve
$$B_n^* := \underset{\{B_n \subseteq \mathcal{X} \,:\, |B_n| = n\}}{\arg\max}\ \mathbb{P}_{\tau, O_k}\Big(\max_{x \in B_n} f(x, \tau) \ge \max_{y \in O_k} f(y, \tau)\Big). \tag{4}$$
Another version of a multi-bracket pool offers a carryover C of initial money in the pot, charges b dollars per submitted bracket, and removes a fraction α from the pot as a track take or vigorish. The total pool of money entered into the pot is thus
$$T = C + b(n + k)(1 - \alpha), \tag{5}$$
which is split among the entrants with the highest-scoring brackets. The question becomes the following: how should we craft our submitted set $B_n$ of n brackets in order to maximize expected profit? Formally, we solve
$$B_n^* := \underset{\{B_n \subseteq \mathcal{X} \,:\, |B_n| = n\}}{\arg\max}\ \Big\{ T \cdot \mathbb{P}_{\tau, O_k}\Big(\max_{x \in B_n} f(x, \tau) > \max_{y \in O_k} f(y, \tau)\Big) - b \cdot n \Big\}. \tag{6}$$
This variant assumes no ties but is easily extended to incorporate ties (see Section 5.1). The optimization problems in Equations (3), (4) and (6) and related variants define the multi-brackets problem.
In upcoming sections, we explore specific examples of the multi-brackets problem. In guessing a randomly drawn bitstring (Section 3) and the March Madness bracket challenge (Section 5.2), we explore the multi-brackets problem via scoring function (1) and objective functions (3) and (4). In pick six betting in horse racing (Section 5.1), we explore the multi-brackets problem via scoring function (2) and objective function (6).
The most general version of the multi-brackets problem, which finds the optimal set of n brackets across all such possible sets, is extremely difficult. To make the problem tractable, possible, and/or able to be visualized, depending on the particular specification of the multi-bracket pool, we make simplifying assumptions. We assume we (and the field of opponents) submit i.i.d. brackets generated from some bracket distribution. As the size of a bracket increases, solving the multi-brackets problem under this assumption quickly becomes intractable, so we optimize over intelligently chosen low-dimensional subspaces of bracket distributions. We find this entropy-based strategy is sufficient to generate well-performing sets of submitted brackets.

3. Canonical Example: Guessing a Randomly Drawn Bitstring

In this section, we delve into what we consider a canonical example of a multi-bracket pool: guessing a randomly drawn bitstring. In this contest, we want to predict the sequence of bits in a reference bitstring, which we assume is generated according to some known probability distribution. We submit n guesses of the reference bitstring with the goal of being as “close” to it as possible or of being “closer” to it than a field of k opponents’ guesses, according to some distance function. With some assumptions on the distribution P from which the reference bitstring is generated, the distribution Q from which we generate bitstring guesses, and the distribution R from which opponents generate bitstring guesses, the expected maximum score and win probability are analytically computable and tractable. By visualizing these formulas, we discern high-level lessons relevant to all multi-bracket pools. To maximize the expected maximum score of a set of n submitted randomly drawn brackets, we should increase the entropy of our submitted brackets as n increases. To maximize the probability that the maximum score of n submitted randomly drawn brackets exceeds that of k opposing brackets, we should increase the entropy of our brackets as our opponents increase entropy.
The objective of this multi-bracket pool is to predict a randomly drawn bitstring, which is to predict a sequence of bits. Here, a bracket is a bitstring consisting of m bits divided into R rounds with $m_{rd}$ bits in each round $rd \in \{1, \ldots, R\}$. For concreteness, we let there be $m_{rd} = 2^{R - rd}$ bits in each of the $R = 6$ rounds (i.e., 32 bits in round 1, 16 bits in round 2, 8 bits in round 3, ..., 1 bit in round 6, totaling 63 bits), but the analysis in this section holds for other choices of $m_{rd}$ and R. The "true" reference bracket that we are trying to predict is a bitstring $\tau = (\tau_{rd,i} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. A field of opponents submits k guesses of $\tau$, the brackets $(y^{(1)}, \ldots, y^{(k)})$, where each bracket is a bitstring $y^{(\ell)} = (y_{rd,i}^{(\ell)} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. We submit n guesses of $\tau$, the brackets $(x^{(1)}, \ldots, x^{(n)})$, where each bracket is a bitstring $x^{(j)} = (x_{rd,i}^{(j)} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. The winning submitted bracket among $\{x^{(j)}\}_{j=1}^{n} \cup \{y^{(\ell)}\}_{\ell=1}^{k}$ is "closest" to the reference bracket $\tau$ according to a scoring function $f(x, \tau)$ measuring how "close" x is to $\tau$. Here, we consider
$$f(x, \tau) = \sum_{rd=1}^{R} \sum_{i=1}^{m_{rd}} w_{rd,i} \cdot \mathbb{1}\{x_{rd,i} = \tau_{rd,i}\},$$
which is the weighted number of bits guessed correctly. This scoring function encompasses both the Hamming score and the ESPN score. The Hamming score measures the number of bits guessed correctly, weighting each bit equally ($w_{rd,i} \equiv 1$). The ESPN score weights each bit by $w_{rd,i} = 10 \cdot 2^{rd-1}$ so that the maximum accruable score in each round is the same ($10 \cdot 2^{R-1}$).
Suppose the “true” reference bitstring τ is generated according to some known distribution P and opponents’ bitstrings are generated according to some known distribution R . Our task is to submit n predicted bitstrings so as to maximize the expected maximum score
$$\mathbb{E}\Big[\max_{j=1,\ldots,n} f(x^{(j)}, \tau)\Big]$$
or the probability that we do not lose the bracket challenge
$$\mathbb{P}\Big(\max_{j=1,\ldots,n} f(x^{(j)}, \tau) \ge \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau)\Big).$$
In particular, we wish to submit n bitstrings generated according to some distribution Q, and it is our task to find a suitable Q. For tractability, we consider the special case in which bits are drawn independently with probabilities varying by round. We suppose that each bit $\tau_{rd,i}$ in the reference bitstring is an independently drawn Bernoulli($p_{rd}$) coin flip. The parameter $p_{rd} \in [0.5, 1]$ controls the entropy of the contest: lower values correspond to a higher-entropy (more variable) reference bitstring that is harder to predict. By symmetry, our strategy need only vary by round. So, we assume that each of our submitted bits $x_{rd,i}^{(j)}$ is an independently drawn Bernoulli($q_{rd}$) coin flip and each of our opponents' submitted bits $y_{rd,i}^{(\ell)}$ is an independently drawn Bernoulli($r_{rd}$) coin flip. The parameters $q_{rd}$ and $r_{rd}$ control the entropy of our submitted bitstrings and our opponents' submitted bitstrings, respectively. Our task is to find the optimal strategy, or entropy level, $(q_{rd})_{rd=1}^{R}$. In this setting, the expected maximum score and win probability are analytically computable and tractable (see Appendix E).
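The exact formulas are derived in Appendix E, but the setup is also easy to check by simulation. The following is a minimal Monte Carlo sketch, not the paper's code: each round rd contributes $2^{R-rd}$ independent bits, the reference bits are Bernoulli($p_{rd}$), our n guesses are Bernoulli($q_{rd}$), and the specific values of p, q, n, and the number of simulations are illustrative.

```r
set.seed(1)
R_rounds <- 6
m_rd <- 2^(R_rounds - 1:R_rounds)   # bits per round: 32, 16, 8, 4, 2, 1
rd_of_bit <- rep(1:R_rounds, times = m_rd)
m <- sum(m_rd)                      # 63 bits in total

# Draw one bitstring whose bit in round rd is Bernoulli(prob_by_round[rd]).
draw_bitstring <- function(prob_by_round) {
  rbinom(m, size = 1, prob = prob_by_round[rd_of_bit])
}

# Monte Carlo estimate of E[max_j Hamming(x^(j), tau)].
expected_max_hamming <- function(p_rd, q_rd, n, n_sims = 2000) {
  sims <- replicate(n_sims, {
    tau <- draw_bitstring(p_rd)
    max(replicate(n, sum(draw_bitstring(q_rd) == tau)))
  })
  mean(sims)
}

# Example: constant p = 0.75; compare a chalky strategy (q = 1) with matching
# the reference entropy (q = p) for n = 1 and n = 100 guesses.
p <- rep(0.75, R_rounds)
expected_max_hamming(p, q_rd = rep(1.00, R_rounds), n = 1)
expected_max_hamming(p, q_rd = rep(0.75, R_rounds), n = 1)
expected_max_hamming(p, q_rd = rep(1.00, R_rounds), n = 100)
expected_max_hamming(p, q_rd = rep(0.75, R_rounds), n = 100)
```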
We first visualize the case where the entropy of the reference bitstring, our submitted bitstrings, and our opponents' submitted bitstrings do not vary by round: $p \equiv p_{rd}$, $q \equiv q_{rd}$, and $r \equiv r_{rd}$. In Figure 1, we visualize the expected maximum Hamming score of n submitted bitstrings as a function of p, q, and n. We find that we should increase the entropy of our submitted brackets (decrease q) as n increases, transitioning from pure chalk ($q = 1$) for $n = 1$ bracket to the true amount of randomness ($q = p$) for large n. Specifically, for small n, the green line $q = p$ lies below the blue lines (large q), and for large n, the green line lies above all the other lines.
In Figure 2, we visualize the win probability as a function of q, r, and n for k = 100 and p = 0.75 . The horizontal gray dashed line q = p = 0.75 represents that we match the entropy of the reference bitstring, the vertical gray dashed line r = p = 0.75 represents that our opponents match the entropy of the reference bitstring, and the diagonal gray dashed line q = r represents that we match our opponents’ entropy. We should increase entropy (decrease q) as n increases, visualized by the green region moving downwards as n increases. Further, to maximize win probability, we should increase entropy (decrease q) as our opponents’ entropy increases (as r decreases), visualized by the triangular form of the green region. In other words, we should tailor the entropy of our brackets to the entropy of our opponents’ brackets. These trends are similar for other values of k and n (see Figure A3 of Appendix E).
These trends generalize to the case where the entropy of each bitstring varies by round (i.e., general $q_{rd}$, $r_{rd}$, and $p_{rd}$). It is difficult to visualize the entire $R = 6$ dimensional space of $p = (p_1, \ldots, p_6)$, $q = (q_1, \ldots, q_6)$, and $r = (r_1, \ldots, r_6)$, so we instead consider a lower-dimensional subspace. Specifically, we visualize a two-dimensional subspace of q parameterized by $(q_E, q_L)$, where $q_E$ denotes q in early rounds and $q_L$ denotes q in later rounds. For example, $q_E = q_1 = q_2 = q_3$ and $q_L = q_4 = q_5 = q_6$ is one of the five possible partitions into $(q_E, q_L)$. We similarly visualize a two-dimensional subspace of r parameterized by $(r_E, r_L)$. Finally, we let the reference bitstring have constant entropy across rounds, $p_{rd} \equiv p$.
In Figure 3, we visualize the expected maximum ESPN score of n bitstrings as a function of $q_E$, $q_L$, and n for $p = 0.75$. The three columns display the results for $n = 1$, $n = 10$, and $n = 100$, respectively. The five rows display the results for the five partitions into $(q_E, q_L)$. For instance, the first row shows the partition $q_E = q_1$ and $q_L = q_2 = q_3 = q_4 = q_5 = q_6$. As n increases, the expected maximum ESPN score increases. We visualize this as the lines moving upwards as we move right across the grid of plots. As E increases (i.e., as $q_E$ encompasses a larger number of early rounds), the impact of the late-round strategy $q_L$ decreases. We visualize this as the lines becoming more clumped together as we move down the grid of plots in Figure 3. For $n = 1$, the best strategy is pure chalk ($q_E = 1$, $q_L = 1$), and as n increases, the optimal values of $q_E$ and $q_L$ decrease. In other words, as before, we want to increase the entropy of our submitted brackets as n increases. We visualize this as the circle (i.e., the best strategy in each plot) moving leftward and having a more reddish color as n increases.
In Figure 4, we visualize the win probability as a function of $q_E$, $q_L$, $r_E$, and $r_L$, for $n = k = 100$, $p = 0.75$, and ESPN score. Figure 4a uses the partition in which the first three rounds are the early rounds (e.g., $q_E = q_1 = q_2 = q_3$ and $r_E = r_1 = r_2 = r_3$). In this scenario, the early-round strategies $q_E$ and $r_E$ are much more impactful than the late-round strategies $q_L$ and $r_L$. We visualize this as each subplot looking the same. The green triangle within each subplot illustrates that we should increase early-round entropy (decrease $q_E$) as our opponents' early-round entropy increases (i.e., as $r_E$ decreases). Figure 4b uses the partition in which just the first round is an early round (e.g., $q_E = q_1$ and $r_E = r_1$). In this scenario, both the early-round strategies $q_E$ and $r_E$ and the late-round strategies $q_L$ and $r_L$ are impactful. The green triangle appears again in each subplot, illustrating that we should increase early-round entropy as our opponents' early-round entropy increases. But the green triangle grows as $r_L$ decreases, indicating that we should also increase late-round entropy (decrease $q_L$) as our opponents' late-round entropy increases.

4. An Information Theoretic View of the Multi-Brackets Problem

The multi-brackets problem is intimately connected to Information Theory. Viewing the multi-brackets problem under an information theoretic lens provides a deeper understanding of the problem and elucidates why certain entropy-based strategies work. In particular, the Asymptotic Equipartition Property from Information Theory helps us understand why it makes sense to increase entropy as the number of brackets increases and as our opponents’ entropy increases. In this section, we give an intuitive explanation of the Equipartition Property and discuss implications, relegating the formal mathematical details to Appendix F.
To begin, we partition the set of all brackets $\mathcal{X}$ into three subsets: the low-entropy "chalky" brackets $\mathcal{C} \subseteq \mathcal{X}$, the typical brackets $\mathcal{T} \subseteq \mathcal{X}$, and the high-entropy "rare" brackets $\mathcal{R} \subseteq \mathcal{X}$.
We visualize this partition of $\mathcal{X}$ under three lenses in Figure 5.
First, the probability mass of an individual low-entropy or chalky bracket is much larger than the probability mass of an individual typical bracket, which is much larger than the probability mass of an individual high-entropy or rare bracket. In symbols, if $x_1 \in \mathcal{C}$, $x_2 \in \mathcal{T}$, and $x_3 \in \mathcal{R}$, then $P(x_1) \gg P(x_2) \gg P(x_3)$. "Rare" is a good name for high-entropy brackets because they are highly unlikely. "Chalk", a term from sports betting, is a good name for low-entropy brackets because it refers to betting on the heavy favorite (i.e., the outcome with the highest individual likelihood). Most of the individual forecasts within a low-entropy bracket must consist of the most probable outcomes. For example, in the "guessing a bitstring" contest, assuming the reference bitstring consists of independent Bernoulli(p) bits where $p > 0.5$, low-entropy brackets are bitstrings consisting mostly of ones. In real-world examples of multi-bracket pools, people are drawn to these low-entropy chalky brackets because they have high individual likelihoods.
Second, there are exponentially more rare brackets than typical brackets, and there are exponentially more typical brackets than chalky brackets. In symbols, $|\mathcal{R}| \gg |\mathcal{T}| \gg |\mathcal{C}|$. In the "guessing a bitstring" contest with $p > 0.5$, the overwhelming majority of possible brackets are high-entropy brackets having too many zeros, and very few possible brackets are low-entropy brackets consisting almost entirely of ones. Typical brackets toe the line, having the "right" number of ones. March Madness is analogous: the overwhelming majority of possible brackets are rare brackets with too many upsets (e.g., a seed above 8 winning the tournament) and relatively few possible brackets are chalky brackets with few upsets (there are only so many distinct brackets with favorites winning nearly all the games). Typical brackets toe the line, having the "right" number of upsets.
Lastly, the typical set of brackets contains most of the probability mass. In symbols, $P(\mathcal{T}) \gg P(\mathcal{C})$ and $P(\mathcal{T}) \gg P(\mathcal{R})$. This is a consequence of the previous two inequalities. Although $|\mathcal{R}|$ is massive, $P(x)$ for $x \in \mathcal{R}$ is so small that $P(\mathcal{R})$ is small. Also, although $P(x)$ for $x \in \mathcal{C}$ is relatively large, $|\mathcal{C}|$ is so small that $P(\mathcal{C})$ is small. Hence, the remainder of the probability mass, $P(\mathcal{T})$, is large. "Typical" is thus a good name for brackets whose entropy is not too high or too low because a randomly drawn bracket typically has this "right" amount of entropy. For example, the observed March Madness tournament is almost always a typical bracket featuring a typical number of upsets.
Drilled down to its essence, the Equipartition Property tells us that, as the number of forecasts m within each bracket grows, the probability mass of the set of brackets becomes increasingly concentrated in an exponentially small set, the "typical set." See Appendix F for a more formal treatment of the Equipartition Property.
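A small numerical illustration of this concentration, assuming the bitstring setting with independent Bernoulli(p) bits (the constants below are illustrative, not from the paper): the per-bit surprisal of a randomly drawn bitstring concentrates around the entropy H(p), while even the single most probable (chalkiest) bracket has a tiny absolute probability.

```r
set.seed(1)
p <- 0.75
m <- 63
entropy_bits <- -(p * log2(p) + (1 - p) * log2(1 - p))   # H(p), about 0.811 bits

# Per-bit surprisal -(1/m) log2 P(x) of a bitstring x; by the AEP it
# concentrates around H(p) as m grows.
surprisal <- function(x) {
  -(sum(x) * log2(p) + sum(1 - x) * log2(1 - p)) / m
}
draws <- replicate(10000, surprisal(rbinom(m, 1, p)))
c(H = entropy_bits, mean = mean(draws), sd = sd(draws))

# The chalkiest bracket (all ones) is the single most probable bracket, yet its
# probability is tiny in absolute terms; a single typical bracket is rarer still.
p^m                      # about 1.4e-08
2^(-m * entropy_bits)    # rough probability scale of one typical bracket
```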
This information theoretic view of the multi-brackets problem sets up a tradeoff between chalky and typical brackets. Typical brackets have the “right” entropy but consist of less likely individual outcomes, whereas chalky low-entropy brackets have the “wrong” entropy but consist of more likely individual outcomes. The former excels when n is large, the latter excels when n is small, and for moderate n we interpolate between these two regimes; so, we should increase the entropy of our set of predicted brackets as the number of brackets n increases. We justify this below using the Equipartition Property.
As the typical set contains most of the probability mass, the reference bracket is highly likely to be a typical bracket. So, when n is large, we should generate typical brackets as guesses since it is likely that at least one of these guesses is “close” to the reference bracket. When n is small, generating typical brackets as guesses does not produce as high an expected maximum score as chalky brackets. To understand, recall that a bracket consists of m individual forecasts. A single randomly drawn typical bracket has the same entropy as the reference bracket but is not likely to correctly predict each individual forecast. For instance, in our “guessing a bitstring” example, a single randomly drawn bitstring has, on average, a similar number of ones as the reference bitstring, but not the “right” ones in the “right” locations. A chalky bracket, on the other hand, predicts highly likely outcomes in most of the individual forecasts. The chalkiest bracket, which predicts the most likely outcome in each individual forecast, matches the reference bracket for each forecast in which the reference bracket realizes its most likely outcome. This, on average, yields more matches than that of a typical bracket because more forecasts realize their most likely outcome than any other single outcome. For instance, in our “guessing a bitstring” example, a chalky bracket consists mostly of ones (assuming p > 0.5 ) and so correctly guesses the locations of ones in the reference bitstring. This is better, on average, than guessing a typical bracket, which, on average, has the “right” number of ones but in the wrong locations.

5. Real-World Examples

Now, we discuss real world examples of multi-bracket pools: pick six betting in horse racing and March Madness bracket challenges. Both contests involve predicting a tuple of outcomes. An individual pick six bet (ticket) involves predicting each winner of six horse races, and an individual March Madness bet (bracket) involves predicting the winner of each game in the NCAA Division I Men’s Basketball March Madness tournament. In both contests, it is allowed, but not necessarily commonplace (outside of horse racing betting syndicates), to submit many tickets or brackets. We demonstrate that the entropy-based strategies introduced in the previous sections are particularly well-suited for these problems. In particular, optimizing over strategies of varying levels of entropy is tractable and yields well-performing solutions.

5.1. Pick Six Horse Race Betting

Horse race betting is replete with examples of multi-bracket pools. A prime example is the pick six bet, which involves correctly picking the winner of six horse races. Similar pick three, pick four, and pick five bets, which involve correctly picking the winner of three, four, or five horse races, respectively, also exist. Due to the immense difficulty of picking six consecutive horse race winners coupled with a large number of bettors in these pools, payoffs for successful pick six bets can be massive (e.g., in the millions of dollars). In this section, we apply our entropy-based strategies to pick six betting, demonstrating the massive profit potential of these bets.
To begin, let $s \in \{3, 4, 5, 6\}$ denote the number of races comprising the pick-s bet (for the pick three, four, five, and six contests, respectively). Suppose, for simplicity, that one pick-s ticket, consisting of s predicted horse race winners, costs USD 1 (typically a pick-s bet costs USD 1 or USD 2). Indexing each race by $j = 1, \ldots, s$, suppose there are $m_j$ horses in race j, and let $\mathbf{m} = (m_1, \ldots, m_s)$. There is a fixed carryover C, an amount of money left over from previous betting pools in which no one won, that is added to the total prize pool for the pick-s contest. As assumed throughout this paper, assume the "true" win probability $P_{ij}$ that horse i wins race j is known for each i and j. As our operating example in this section, we set P to be the win probabilities implied by the Vegas odds from the pick six contest at Belmont Park on 21 May 2023 (https://entries.horseracingnation.com/entries-results/belmont-park/2023-05-21, accessed on 1 June 2023), which we visualize in Figure 6.
Suppose the public purchases k entries according to some strategy. In particular, we assume the public submits k independent tickets according to R, where $R_{ij}$ is the probability that an opponent selects horse i to win race j. We purchase n entries according to strategy Q. Specifically, we submit n independent tickets according to Q, where $Q_{ij}$ is the probability that we select horse i to win race j. The total prize money is thus
$$T = C + (n + k)(1 - \alpha),$$
where $\alpha$ is the track take (vigorish). Let W be our number of winning tickets and let $W_{\mathrm{opp}}$ be our opponents' number of winning tickets. Under our model, both W and $W_{\mathrm{opp}}$ are random variables. Formally, denote the "true" observed s winning horses by $\tau = (\tau_1, \ldots, \tau_s)$, our n tickets by $(x^{(1)}, \ldots, x^{(n)})$, where each $x^{(\ell)} = (x_1^{(\ell)}, \ldots, x_s^{(\ell)})$, and the public's k tickets by $(y^{(1)}, \ldots, y^{(k)})$, where each $y^{(\ell)} = (y_1^{(\ell)}, \ldots, y_s^{(\ell)})$. Then,
$$W = \sum_{\ell=1}^{n} \mathbb{1}\{x^{(\ell)} = \tau\} = \sum_{\ell=1}^{n} \mathbb{1}\{x_1^{(\ell)} = \tau_1, \ldots, x_s^{(\ell)} = \tau_s\}$$
and
$$W_{\mathrm{opp}} = \sum_{\ell=1}^{k} \mathbb{1}\{y^{(\ell)} = \tau\} = \sum_{\ell=1}^{k} \mathbb{1}\{y_1^{(\ell)} = \tau_1, \ldots, y_s^{(\ell)} = \tau_s\}.$$
Then, the amount we profit is also a random variable,
$$\mathrm{Profit} = \frac{W}{W + W_{\mathrm{opp}}} \cdot T - n,$$
where we treat $\frac{0}{0}$ as 0 (i.e., if both $W = 0$ and $W_{\mathrm{opp}} = 0$, the fraction $W/(W + W_{\mathrm{opp}})$ is 0). Here, the randomness is over $\tau \sim P$, $x \sim Q$, and $y \sim R$.
Our task is to solve for the optimal investment strategy Q given all the other variables n, k, P , R , C, and α . Formally, we wish to maximize expected profit,
$$\mathbb{E}[\mathrm{Profit}] = -n + T \cdot \mathbb{E}\left[\frac{W}{W + W_{\mathrm{opp}}}\right].$$
In Appendix G, we compute a tractable lower bound for the expected profit.
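The expected profit can also be approximated directly by Monte Carlo. The sketch below is illustrative rather than the paper's implementation: it uses made-up win probabilities in place of the Belmont Park odds, and it exploits the fact that, conditional on the winning combination $\tau$, the counts W and $W_{\mathrm{opp}}$ are Binomial$(n, \pi_Q(\tau))$ and Binomial$(k, \pi_R(\tau))$, where $\pi_Q(\tau) = \prod_j Q_{\tau_j j}$.

```r
set.seed(1)
s <- 6
m_j <- c(8, 10, 7, 9, 8, 8)   # horses per race (illustrative)
# Illustrative "true" win probabilities: each race's probabilities sum to 1,
# sorted so that horse 1 is the favorite.
P <- lapply(m_j, function(mj) { w <- rexp(mj); sort(w / sum(w), decreasing = TRUE) })

expected_profit_mc <- function(P, Q, R, n, k, C, alpha, n_sims = 5000) {
  T_pool <- C + (n + k) * (1 - alpha)    # total prize money (USD 1 tickets)
  profits <- replicate(n_sims, {
    winners <- sapply(P, function(pj) sample(seq_along(pj), 1, prob = pj))
    pi_Q <- prod(mapply(function(qj, w) qj[w], Q, winners))  # P(one of our tickets hits)
    pi_R <- prod(mapply(function(rj, w) rj[w], R, winners))  # P(one opposing ticket hits)
    W     <- rbinom(1, n, pi_Q)
    W_opp <- rbinom(1, k, pi_R)
    share <- if (W + W_opp == 0) 0 else W / (W + W_opp)
    share * T_pool - n
  })
  mean(profits)
}

# Example: we bet the "true" probabilities; the public is chalkier (lower entropy).
Q <- P
R <- lapply(P, function(pj) pj^3 / sum(pj^3))
expected_profit_mc(P, Q, R, n = 1000, k = 25000, C = 500000, alpha = 0.05)
```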
We are unable to analytically optimize the expected profit to find an optimal strategy $Q^*$ given the other variables, and we are unable to search over the entire high-dimensional Q-space for an optimal strategy. Instead, we apply the entropy-based strategies described in the previous sections. The idea is to search over a subspace of Q that explores strategies of varying entropies, finding the optimal entropy given the other variables. To generate n pick six tickets at varying levels of entropy, we let $Q = Q(\lambda, \phi)$ vary according to parameters $\lambda$ and $\phi$ that control the entropy. Assuming without loss of generality that in each race j the "true" win probabilities are sorted in decreasing order, $P_{1j} \ge P_{2j} \ge \cdots \ge P_{m_j j}$, we define $Q(\lambda, \phi)$ for $\lambda > 0$ and $\phi \in [0, 1]$ by
$$\tilde{Q}_{ij}(\lambda, \phi) = \begin{cases} \big(P_{ij} / P_{\mathrm{round}(\phi \cdot m_j),\, j}\big)^{\lambda} & \text{if } \lambda < 1, \\[4pt] P_{ij} \cdot \big(\lambda \cdot \mathbb{1}\{i \le \mathrm{round}(\phi \cdot m_j)\} + \tfrac{1}{\lambda} \cdot \mathbb{1}\{i > \mathrm{round}(\phi \cdot m_j)\}\big) & \text{if } \lambda \ge 1, \end{cases}$$
$$Q_{ij}(\lambda, \phi) = \tilde{Q}_{ij}(\lambda, \phi) \Big/ \sum_{i'=1}^{m_j} \tilde{Q}_{i'j}(\lambda, \phi),$$
recalling that there are $m_j$ horses in race j. We visualize these probabilities for race $j = 6$ in Figure 7. For fixed $\phi$, smaller values of $\lambda$ push the distribution $Q_{\cdot j}$ closer towards a uniform distribution, increasing its entropy. Conversely, increasing $\lambda$ lowers its entropy. In lowering its entropy, we shift probability from some horses onto other horses in a way that makes the distribution less uniform. The parameter $\phi$ controls the number of horses to which we transfer probability as $\lambda$ increases. For instance, there are $m_j = 8$ horses in race $j = 6$, so when $\phi = 3/8$, we transfer successively more probability to the top $3 = \mathrm{round}(\phi \cdot m_j)$ horses as $\lambda$ increases.
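Under the reconstruction of $\tilde{Q}$ above, the family is straightforward to implement for a single race. In the sketch below, `p` is that race's vector of "true" win probabilities sorted in decreasing order, and the example race is hypothetical.

```r
# Entropy-tuned ticket distribution Q_{.j}(lambda, phi) for one race, given the
# race's "true" win probabilities p sorted in decreasing order.
ticket_distribution <- function(p, lambda, phi) {
  if (lambda < 1) {
    # lambda < 1: flatten toward uniform (higher entropy). Dividing by the
    # constant P_{round(phi * m_j), j} cancels after normalization.
    q_tilde <- p^lambda
  } else {
    # lambda >= 1: shift mass onto the top round(phi * m_j) favorites (lower entropy).
    top <- round(phi * length(p))
    q_tilde <- p * ifelse(seq_along(p) <= top, lambda, 1 / lambda)
  }
  q_tilde / sum(q_tilde)
}

p6 <- c(0.30, 0.22, 0.15, 0.12, 0.09, 0.06, 0.04, 0.02)  # illustrative 8-horse race
ticket_distribution(p6, lambda = 0.2, phi = 3/8)  # near-uniform, high entropy
ticket_distribution(p6, lambda = 5.0, phi = 3/8)  # mass concentrated on the top 3 horses
```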
Further, we assume we play against opponents who generate brackets according to the strategy $R_{ij}(\lambda_{\mathrm{opp}}) = Q_{ij}(\lambda = \lambda_{\mathrm{opp}}, \phi = 1/8)$. In other words, low-entropy opponents bet mostly on the one or two favorite horses (depending on $m_j$), high-entropy opponents are close to a uniform distribution, and moderate-entropy opponents lie somewhere in the middle. The exact specification of the opponents' distribution is not important, as we use it to illustrate a general point. In future work, one can try to model the distribution of the public's ticket submissions to obtain more precise results.
In Figure 8, we visualize the expected profit for a pick six horse racing betting pool in which we submit n tickets according to strategy $Q(\lambda, \phi)$ against a field of k = 25,000 opponents who use strategy $R(\lambda_{\mathrm{opp}})$, assuming a track take of $\alpha = 0.05$ and carryover C = 500,000, as a function of $\lambda_{\mathrm{opp}}$ and n. Given these variables, we use the strategy $(\lambda, \phi)$ that maximizes expected profit over a grid of values. We see that the entropy of the optimal strategy increases as n increases (i.e., $\lambda$ decreases and $\phi$ increases as n increases). Further, we see that submitting many brackets at a smart entropy level is hugely profitable. This holds true particularly when the carryover is large enough, which occurs fairly regularly.

5.2. March Madness Bracket Challenge

March Madness bracket challenges are prime examples of multi-bracket pools. In a bracket challenge, contestants submit an entire bracket, that is, a complete specification of the winner of each game in the NCAA Division I Men's Basketball March Madness tournament. The winning bracket is the one closest to the observed NCAA tournament according to some metric. Popular March Madness bracket challenges from ESPN, BetMGM, and DraftKings, for instance, offer large cash prizes: BetMGM offered USD 10 million to a perfect bracket or USD 100,000 to the closest bracket, DraftKings sent USD 60,000 in cash prizes spread across the best 5096 brackets last year, and ESPN offered USD 100,000 to the winner of a lottery among the entrants who scored the most points in each round of the tournament (https://www.thelines.com/best-march-madness-bracket-contests/, accessed on 1 June 2023). To illustrate the difficulty of perfectly guessing the observed NCAA tournament, Warren Buffett famously offered USD 1 billion to anyone who filled out a flawless bracket (https://bleacherreport.com/articles/1931210-warren-buffet-will-pay-1-billion-to-fan-with-perfect-march-madness-bracket, accessed on 1 June 2023). In this section, we apply our entropy-based strategies to March Madness bracket challenges, demonstrating the impressive efficacy of this strategy.
To begin, we denote the set of all brackets by $\mathcal{X}$, which consists of $N = 2^{63} = 2^{2^6 - 1}$ brackets since there are 63 games through six rounds in the NCAA tournament (excluding the four-game play-in tournament). We define an atomic probability measure P on $\mathcal{X}$, where $P(x)$ is the probability that bracket $x \in \mathcal{X}$ is the "true" observed NCAA tournament, as follows. Given that match $m \in \{1, \ldots, 63\}$ involves teams i and j, we model the outcome of this match by $i \cdot b_m + j \cdot (1 - b_m)$, where $b_m \overset{\mathrm{ind}}{\sim} \mathrm{Bernoulli}(P_{ij})$. In other words, with probability $P_{ij}$, team i wins the match; else, team j wins the match. Prior to the first round (games 1 through 32), the first 32 matchups are set. Given these matchups, the 32 winning teams in round one are determined by Bernoulli coin flips according to P. These 32 winning teams from round one then uniquely determine the 16 matchups for the second round of the tournament. Given these matchups, the 16 winning teams in round two are also determined by Bernoulli coin flips according to P. These winners then uniquely determine the matchups for round three. This process continues until the end of round six, when one winning team remains.
In this work, we assume we know the "true" win probabilities P. As our operating example in this section, we set P to be the win probabilities implied by FiveThirtyEight's Elo ratings from the 2021 March Madness tournament (https://projects.fivethirtyeight.com/2021-march-madness-predictions/, accessed on 5 June 2024). We scrape FiveThirtyEight's pre-round-one 2021 Elo ratings $\{\beta_i\}_{i=1}^{64}$ and index the teams by $i \in \{1, \ldots, 64\}$ in decreasing order of Elo rating (e.g., the best team, Gonzaga, is 1 and the worst team, Texas Southern, is 64). Then, we define P by $P_{ij} = 1/\big(1 + 10^{-(\beta_i - \beta_j) \cdot 30.464/400}\big)$. In Figure 9a, we visualize $\{\beta_i\}_{i=1}^{64}$. The Elo ratings range from 71.1 (Texas Southern) to 96.5 (Gonzaga), which is rated particularly highly. In Figure 9b, we visualize P via the functions $j \mapsto P_{ij}$ for each team i. For instance, Gonzaga's win probability function is the uppermost orange line, which is considerably higher than the other teams' lines. See Appendix D for a discussion of the robustness of our results to this choice of P.
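To make the simulation model concrete, the sketch below draws one full 64-team bracket from Elo-implied win probabilities using the formula quoted above. The Elo ratings are randomly generated stand-ins rather than the 2021 FiveThirtyEight ratings, and teams are simply placed in bracket order 1 through 64, ignoring the real seeding structure.

```r
set.seed(1)
beta <- sort(rnorm(64, mean = 85, sd = 6), decreasing = TRUE)  # stand-in Elo ratings
p_win <- function(i, j) 1 / (1 + 10^(-(beta[i] - beta[j]) * 30.464 / 400))

# Simulate one tournament given a win-probability function win_prob(i, j):
# each round halves the field via independent Bernoulli coin flips.
simulate_tournament <- function(win_prob) {
  alive <- 1:64
  winners_by_round <- vector("list", 6)
  for (rd in 1:6) {
    nxt <- integer(length(alive) / 2)
    for (g in seq_along(nxt)) {
      i <- alive[2 * g - 1]; j <- alive[2 * g]
      nxt[g] <- if (runif(1) < win_prob(i, j)) i else j
    }
    winners_by_round[[rd]] <- nxt
    alive <- nxt
  }
  winners_by_round   # 32 + 16 + 8 + 4 + 2 + 1 = 63 game winners
}

# How often the top-rated team wins the simulated tournament:
champions <- replicate(2000, simulate_tournament(p_win)[[6]])
mean(champions == 1)
```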
Suppose a field of opponents submits k brackets $(y^{(1)}, \ldots, y^{(k)}) \subseteq \mathcal{X}$ to the bracket challenge according to some strategy R. In particular, we assume the public submits k independent brackets according to R, where $R_{ij}$ is the probability that an opponent selects team i to beat team j in the event that they play. We submit n brackets $(x^{(1)}, \ldots, x^{(n)}) \subseteq \mathcal{X}$ to the bracket challenge according to strategy Q. Specifically, we submit n independent brackets according to Q, where $Q_{ij}$ is the probability that we select team i to beat team j in the event that they play. The goal is to be as "close" to the "true" reference bracket $\tau \in \mathcal{X}$, or the observed NCAA tournament, as possible according to a bracket scoring function. The most common such scoring function in these bracket challenges is what we call the ESPN score, which credits $10 \cdot 2^{rd-1}$ points for correctly predicting the winner of a match in round $rd \in \{1, \ldots, 6\}$. Since there are $2^{6-rd}$ matches in each round rd, the ESPN score ensures that the maximum accruable points in each round is the same (320). Formally, our task is to submit n brackets so as to maximize the probability that we do not lose the bracket challenge,
$$\mathbb{P}\Big(\max_{j=1,\ldots,n} f(x^{(j)}, \tau) \ge \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau)\Big).$$
Alternatively, in the absence of information about our opponents, our task is to submit n brackets so as to maximize expected maximum score,
$$\mathbb{E}\Big[\max_{j=1,\ldots,n} f(x^{(j)}, \tau)\Big].$$
Under this model, it is intractable to explicitly evaluate these formulas for the expected maximum score or win probability for general P, Q, and R, even when we independently draw brackets from these distributions. This is because the scores $f(x^{(1)}, \tau)$ and $f(x^{(2)}, \tau)$ of two submitted brackets $x^{(1)}$ and $x^{(2)}$ both depend on $\tau$, and integrating over $\tau$ yields a sum over all $2^m = 2^{63}$ possible true brackets, which is intractable. Hence, we use Monte Carlo simulation to approximate the expected maximum score and win probability. We approximate the expected maximum score via
$$\mathbb{E}\Big[\max_{j=1,\ldots,n} f(x^{(j)}, \tau)\Big] \approx \frac{1}{B_1} \sum_{b_1=1}^{B_1} \frac{1}{B_2} \sum_{b_2=1}^{B_2} \max_{j=1,\ldots,n} f(x^{(j, b_2)}, \tau^{(b_1)}),$$
where the $\tau^{(b_1)}$ are independent samples from P and the $x^{(j, b_2)}$ are independent samples from Q. We use a double Monte Carlo sum, with $B_1 = 250$ draws of $\tau$ and $B_2 = 100$ draws of $(x^{(1)}, \ldots, x^{(n)})$, because it provides a smoother and more stable approximation than a single Monte Carlo sum. Similarly, we approximate the win probability via
$$\mathbb{P}\Big(\max_{j=1,\ldots,n} f(x^{(j)}, \tau) \ge \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau)\Big) \approx \frac{1}{B_1} \sum_{b_1=1}^{B_1} \frac{1}{B_2} \sum_{b_2=1}^{B_2} \mathbb{1}\Big\{\max_{j=1,\ldots,n} f(x^{(j, b_2)}, \tau^{(b_1)}) \ge \max_{\ell=1,\ldots,k} f(y^{(\ell, b_2)}, \tau^{(b_1)})\Big\},$$
where the $\tau^{(b_1)}$ are independent samples from P, the $x^{(j, b_2)}$ are independent samples from Q, and the $y^{(\ell, b_2)}$ are independent samples from R. We again use a double Monte Carlo sum, with $B_1 = 250$ draws of $\tau$ and $B_2 = 100$ draws of $(x^{(1)}, \ldots, x^{(n)})$ and $(y^{(1)}, \ldots, y^{(k)})$, because it provides a smooth and stable approximation.
We are unable to analytically optimize these objective functions to find an optimal strategy $Q^*$ given the other variables, and we are unable to search over the entire high-dimensional Q-space for an optimal strategy. These problems are even more difficult than simply evaluating these objective functions, which itself is intractable. Thus, we apply the entropy-based strategies from the previous sections, which involve generating successively higher-entropy brackets as n increases. The idea is to search over a subspace of Q that explores strategies of varying entropies, finding the optimal entropy given the other variables. To generate n brackets at varying levels of entropy, we let $Q = Q(\lambda)$ vary according to the parameter $\lambda$ that controls the entropy. In a game in which team i is favored against team j (so $i < j$, since we indexed the teams in decreasing order of team strength, and $P_{ij} \in [0.5, 1]$), the lowest-entropy (chalkiest) strategy features $Q_{ij} = 1$, the "true"-entropy strategy features $Q_{ij} = P_{ij}$, and the highest-entropy strategy features $Q_{ij} = 1/2$. We construct a family for Q that interpolates between these three poles,
$$Q_{ij}(\lambda) := \begin{cases} (1 - 2\lambda) \cdot \tfrac{1}{2} + 2\lambda \cdot P_{ij} & \text{if } \lambda \in [0, \tfrac{1}{2}] \text{ and } i < j, \\[4pt] \big(1 - 2(\lambda - \tfrac{1}{2})\big) \cdot P_{ij} + 2(\lambda - \tfrac{1}{2}) \cdot 1 & \text{if } \lambda \in [\tfrac{1}{2}, 1] \text{ and } i < j, \end{cases}$$
where $\lambda \in [0, 1]$. The entropy of $Q(\lambda)$ increases as $\lambda$ decreases.
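Under the reconstruction above, this interpolation is a one-line function; the probability 0.8 in the example is arbitrary.

```r
# Interpolate between maximum entropy (lambda = 0), the "true" probabilities
# (lambda = 1/2), and pure chalk (lambda = 1), for the favored team i < j.
q_lambda <- function(p_ij, lambda) {
  if (lambda <= 0.5) {
    (1 - 2 * lambda) * 0.5 + 2 * lambda * p_ij
  } else {
    (1 - 2 * (lambda - 0.5)) * p_ij + 2 * (lambda - 0.5) * 1
  }
}

q_lambda(0.8, 0)    # 0.5  (highest entropy)
q_lambda(0.8, 0.5)  # 0.8  ("true" probability)
q_lambda(0.8, 1)    # 1.0  (chalkiest)
```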
Further, we assume we play against colloquially chalky opponents, who usually bet on the better-seeded team. Each team in the March Madness tournament is assigned a numerical ranking from 1 to 16, its seed, prior to the start of the tournament by the NCAA Division I Men's Basketball committee. The seeds determine the matchups in round one and are a measure of team strength (i.e., lower-seeded teams are considered better by the committee). We suppose colloquially chalky opponents generate brackets according to a distribution $R^{(\mathrm{cc})}$ based on the seeds $s_i$ and $s_j$ of teams i and j,
$$R_{ij}^{(\mathrm{cc})} = \begin{cases} 0.9 & \text{if } s_i - s_j < -1, \\ 0.5 & \text{if } |s_i - s_j| \le 1, \\ 0.1 & \text{if } s_i - s_j > 1, \end{cases}$$
so they usually bet on the better-seeded team. The exact specification of the colloquially chalky distribution is not important, as we use $R^{(\mathrm{cc})}$ to illustrate a general point. In future work, one can try to model the distribution of the public's bracket submissions to obtain more precise results.
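Putting the pieces together, the sketch below approximates the win probability with the double Monte Carlo sums described earlier. It reuses the hypothetical `p_win`, `simulate_tournament`, and `q_lambda` helpers from the previous sketches, assigns stand-in seeds by strength order, and uses much smaller values of $B_1$, $B_2$, n, and k than the paper for speed; all of these choices are illustrative.

```r
# ESPN score between two brackets stored as lists of game winners by round.
espn_score <- function(x, tau) {
  sum(sapply(1:6, function(rd) 10 * 2^(rd - 1) * sum(x[[rd]] == tau[[rd]])))
}

# Stand-in seeds: with teams indexed by strength, teams 1-4 are the 1 seeds,
# teams 5-8 the 2 seeds, and so on.
seed_of <- rep(1:16, each = 4)

# Colloquially chalky opponents (under the reconstruction of R^(cc) above):
# probability that team i beats team j, based on seeds.
r_cc <- function(i, j) {
  d <- seed_of[i] - seed_of[j]
  if (d < -1) 0.9 else if (abs(d) <= 1) 0.5 else 0.1
}

# Our entropy-tuned strategy Q(lambda) as a win-probability function.
q_fun_factory <- function(lambda) function(i, j) {
  if (i < j) q_lambda(p_win(i, j), lambda) else 1 - q_lambda(p_win(j, i), lambda)
}

# Double Monte Carlo estimate of P(our best bracket ties or beats the field's best).
# The paper uses B1 = 250, B2 = 100, and k = 10,000; smaller values here for speed.
win_probability <- function(lambda, n, k, B1 = 20, B2 = 5) {
  q_fun <- q_fun_factory(lambda)
  mean(replicate(B1, {
    tau <- simulate_tournament(p_win)
    mean(replicate(B2, {
      ours   <- max(replicate(n, espn_score(simulate_tournament(q_fun), tau)))
      theirs <- max(replicate(k, espn_score(simulate_tournament(r_cc), tau)))
      as.numeric(ours >= theirs)
    }))
  }))
}

win_probability(lambda = 0.6, n = 10, k = 100)   # small illustrative run
```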
In Figure 10a, we visualize the expected max score of n brackets generated according to $Q(\lambda)$ as a function of n and $\lambda$. In Figure 10b, we visualize the probability that the max score of n brackets generated according to $Q(\lambda)$ exceeds that of k = 10,000 colloquially chalky brackets generated according to $R^{(\mathrm{cc})}$ as a function of n and $\lambda$. In both, we again see that we should increase entropy (decrease $\lambda$) as n increases. In particular, the small circle (indicating the best strategy given n and k) moves leftward as n increases. Further, we see that tuning the entropy of our submitted bracket set given the other variables yields an excellent win probability, even when n is much smaller than k.

6. Discussion

In this work, we pose and explore the multi-brackets problem: how should we submit n predictions of a randomly drawn reference bracket (tuple)? The most general version of this question, which finds the optimal set of n brackets across all such possible sets, is extremely difficult. To make the problem tractable, possible, and/or able to be visualized, depending on the particular specification of the multi-bracket pool, we make simplifying assumptions. First, we assume we (and optionally a field of opponents) submit i.i.d. brackets generated according to a bracket distribution. The task becomes to find the optimal generating bracket distribution. For some multi-bracket pools, this is tractable, and for others, it is not. For those pools, we make another simplifying assumption, searching over an intelligently chosen low-dimensional subspace of generating bracket distributions covering distributions of various levels of entropy. We find this approach is sufficient to generate well-performing sets of submitted brackets. We also learn the following high-level lessons from this strategy: we should increase the entropy of our bracket predictions as n increases and as our opponents increase entropy.
We leave much room for future work on the multi-brackets problem. First, it is still an open and difficult problem to find the optimal set of n bracket predictions across all such possible subsets, where optimal could mean maximizing expected maximum score, win probability, or expected profit. Second, in this work, we assume the “true” probabilities P and our opponents’ generating bracket strategy R exist and are known. A fruitful extension of this work would revisit the problems posed in this work under the lens that, in practice, these distributions are either estimated from data or are unknown (e.g., as in Metel [7]). Finally, we suggest exploring more problem-specific approaches to particular multi-bracket pools. For instance, in March Madness bracket challenges, we suggest exploring strategies of varying levels of entropy within each round. Perhaps the public’s entropy is too low in early rounds and too high in later rounds, suggesting we should counter by increasing our entropy in earlier rounds and decreasing our entropy in later rounds.

Author Contributions

Conceptualization, R.S.B.; methodology, R.S.B., A.J.W. and I.J.B.; software, R.S.B.; validation, R.S.B.; formal analysis, R.S.B.; investigation, R.S.B.; data curation, R.S.B.; writing—original draft preparation, R.S.B.; writing—review and editing, R.S.B.; visualization, R.S.B.; supervision, A.J.W. and I.J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Our Code

The code and data for this project are publicly available at https://github.com/snoopryan123/entropy_ncaa (accessed on 22 July 2024).

Appendix B. Previous Work Details

Appendix B.1. Estimating Outcome Probabilities in Horse Racing and March Madness

There has been a plethora of research on estimating horse race outcome probabilities. A line of research beginning with Harville [8] estimates the probabilities of the various possible orders of finish of a horse race, assuming knowledge of just the win probabilities of each individual horse. Henery [9], Stern [10], Bacon-Shone et al. [11], Lo and Bacon-Shone [12], and Lo and Bacon-Shone [13] extend this work, developing better and more tractable models. There has also been extensive research on the favorite-longshot bias, the phenomenon that the public typically underbets favored horses and overbets longshot horses, skewing the win probabilities implied by the odds that each horse wins the race. For instance, Asch et al. [14], Ali [15], Rosenbloom [16], Brown and Lin [17], and Hausch et al. [18] illustrate and explain the favorite-longshot bias, and some of these works attempt to adjust for this bias in estimating horse racing outcome probabilities. Yet another line of research focuses on comprehensively estimating these horse finishing probabilities. Bolton and Chapman [19] model outcome probabilities using a multinomial logistic regression model, forming the basis for most modern prediction methods. Chapman [20], Benter [6], Edelman [21], Lessmann et al. [22], and Lessmann et al. [23] extend this work (Benter in particular reported that his team has made significant profits during their five-year gambling operation). Dayes [24] searches for covariates that are predictive of horse race outcome probabilities even after adjusting for the odds, Silverman [25] estimates a hierarchical Bayesian model of these probabilities, and Gulum [26] uses machine learning and graph-based features to estimate these probabilities. For a more comprehensive review of this literature, see Hausch et al. [18].
Similarly, there has been a plethora of research on estimating win probabilities for March Madness matchups. Publicly available NCAA basketball team ratings have been around for decades, for instance, from Massey [27], Pomeroy [28], Sagarin [29], Moore [30], and Georgia Institute of Technology [31]. Other early approaches from Schwertman et al. [32] and Kvam and Sokol [33] use simple logistic regression models to rate teams. Since then, modelers have aggregated existing team rating systems into ensemble models. For instance, Carlin [34] uses a prediction model that merges Vegas point spreads with other team strength models, Lopez and Matthews [35] merge point spreads with possession-based team efficiency metrics, and FiveThirtyEight [36] combines some of these publicly available ratings systems with their own ELO ratings. Today, people use machine learning or other more elaborate modeling techniques to build team ratings systems. For instance, ESPN’s BPI uses a Bayesian hierarchical model to predict each team’s projected point differential against an average Division I team on a neutral court [37]. Further, Ji et al. [38] use a matrix completion approach, Goto [39] uses gradient boosting, reHOOPerate [40] uses a neural network, Forsyth and Wilde [41] use a random forest, and Gumm et al. [42], Deshpande [43], and Kim et al. [44] compare various machine learning models. Finally, each year there is a popular Kaggle competition in which contestants submit win probabilities for each game and are evaluated on the log-loss of their probabilities [45].

Appendix B.2. Previous Approaches to Submitting Multiple Brackets to a March Madness Bracket Challenge

For March Madness bracket challenges, there has been limited research on what we should do with team ratings and win probability estimates once we obtain them. Most existing research has focused on filling out one optimal bracket after obtaining win probabilities. For instance, Kaplan and Garstka [46] find the bracket which maximizes expected score and Clair and Letscher [47] find the bracket which maximizes expected return conditional on the behavior of other entrants’ submitted brackets. There has also been some work on filling out multiple brackets in the sports analytics community. For instance, Scott Powers and Eli Shayer created an R package mRchmadness (https://github.com/elishayer/mRchmadness, accessed on 1 June 2023) which uses simulation methods to generate an optimal set of brackets. Also, Tauhid Zaman at the 2019 Sports Analytics Conference (https://www.youtube.com/watch?v=mAgb8A2GDAQ, accessed on 1 June 2023) used integer programming to greedily generate a sequence of “optimal” brackets subject to “diversity” constraints, which force the next bracket in the sequence to be meaningfully different from prior brackets. Nonetheless, we are not aware of any papers which focus on filling out multiple brackets so as to optimize maximum score or win probability.

Appendix C. The Difficulty of Generating Multiple Predicted Brackets

In this section, we discuss the limitations of existing single-event wagering strategies when extended to multi-bracket pools. To start, there are a few key differences between betting on single events and multi-bracket pools. First, in betting on single events, there is no need to submit multiple predictions. In betting on the winner of a basketball game, either one or zero of the teams will have positive expected profit (positive expected value, or EV), depending on the probabilities and odds. Similarly, to maximize EV in betting on a horse race, bet on the highest-EV horse. Second, in betting on single events, there is usually no notion of win probability or a field of opponents because these bets are against the house. Third, whereas multi-bracket pools are one-time events, betting on single events can take a sequential nature in which bettors place a sequence of many bets over a long time horizon. To maximize expected log bankroll in the long run, Kelly [2] devised an algorithm that spreads the bankroll across multiple horses. Betting only on positive-EV horses can be suboptimal if there are many other horses with nontrivial win probability. The existence of other horses can create the need to hedge in order to avoid going bankrupt. The optimization problem that Kelly solved boils down to calculus.
Directly extending these strategies to betting on the outcome of a random tuple (or bracket) in a bracket pool involves optimally generating a single predicted bracket. Generating a bracket in order to maximize expected score or expected profit, ignoring a field of opponents, is straightforward. First, we consider tournaments featuring a smaller number of possible brackets. For instance, in a pick six contest with $m_j$ horses in race j, there are $\prod_j m_j$ tickets (e.g., 1 million tickets for $m_j \equiv 10$). To maximize expected score, we simply enumerate tickets by their probability and select the highest one. We can then maximize expected profit, a function of the probabilities and the odds, similarly. For tournaments consisting of a larger number of possible brackets, such as a March Madness bracket challenge, we cannot enumerate the brackets by their likelihood. But we can find the bracket that maximizes expected score: it is the most probable bracket, which features zero upsets.
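For a pick-s pool with modest field sizes, this enumeration is easy to carry out explicitly. The sketch below, with made-up win probabilities for an illustrative pick three, lists every ticket, computes each ticket's probability of hitting, and selects the most probable one.

```r
set.seed(1)
m_j <- c(8, 10, 7)   # horses per race in an illustrative pick three
P <- lapply(m_j, function(mj) { w <- rexp(mj); w / sum(w) })  # made-up win probabilities

# Enumerate all prod(m_j) tickets and each ticket's probability of hitting.
tickets <- expand.grid(lapply(m_j, seq_len))
hit_prob <- apply(tickets, 1, function(tk) prod(mapply(function(pj, i) pj[i], P, tk)))

nrow(tickets)                    # 8 * 10 * 7 = 560 possible tickets
tickets[which.max(hit_prob), ]   # the most probable ticket: each race's favorite
max(hit_prob)
```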
Generating a bracket in order to maximize win probability or expected profit in a way that accounts for a field of opponents takes more work. To account for opponents, existing work (e.g., Clair and Letscher [47]) assumes opponents pick according to some probability distribution. Then, it is intractable to even evaluate the expected return or win probability of any given bracket, since that involves integrating over $2^m$ realizations of the observed bracket, where m is the size of a bracket (e.g., $m = 63$). Existing work instead approximates these quantities; Clair and Letscher use a normal approximation to do so. Then, it is still intractable to enumerate the $2^m$ brackets by approximated expected score and select the best one. Existing work instead approximates this search; Clair and Letscher use a hill-climbing greedy algorithm to do so. Beginning with a subset of brackets, they examine all "neighboring brackets" (brackets that differ in one match outcome), choose the most optimal neighbor, and repeat until they converge to a local optimum.
These strategies do not extend to general multi-bracket pools. For instance, consider March Madness. It is intractable to enumerate the set of $\binom{2^m}{n}$ size-n bracket subsets, let alone to list them by likelihood or in order of expected score. Further, approximations such as the greedy hill-climbing algorithm from Clair and Letscher [47] quickly become ineffectual as n increases, as any tractable number of starting seeds for the algorithm is substantially smaller than the search space. The set of size-n subsets of brackets is simply too large.

Appendix D. Robustness of Our Results to P

We would like an idea of how our entropy-based strategies are affected by variations in the "true" win probabilities P. Due to the high-dimensional nature of the problem, this is a difficult question. Considering our March Madness tournament example from Section 5.2, to show that our results are not entirely dependent on the idiosyncrasies of the 2021 Elo ratings from FiveThirtyEight, such as Gonzaga having a very high Elo score, we re-run our simulations using alternative win probabilities, denoted P′. In particular, we generate new Elo scores $\{\beta_i'\}_{i=1}^{64}$ from a Gaussian using the sample mean and standard deviation of the 64 FiveThirtyEight Elo ratings. We visualize the new Elo scores and their implied win probabilities P′ in Figure A1. Although specific numbers that depend on P change accordingly, such as the optimal entropy parameter $\lambda$, we find that the qualitative results of our study do not change. In particular, the overall shapes of the optimal entropy curves for P′ in Figure A2 mimic those that use the original P from Figure 10 in Section 5.2.
Figure A1. On the left, we plot the distribution of the new Elo ratings. On the right, for each team i, we plot the function $j \mapsto P'_{ij}$.
Figure A2. An analogous plot to Figure 10 in Section 5.2 that uses $P'$ rather than $P$. (a): The expected max ESPN score (y-axis) of n brackets generated according to $Q(\lambda)$ as a function of n (color) and $\lambda$ (x-axis). (b): The probability (y-axis) that the max ESPN score of n brackets generated according to $Q(\lambda)$ exceeds that of k = 10,000 colloquially chalky brackets generated according to $R(\mathrm{cc})$ as a function of n (color) and $\lambda$ (x-axis). The small circle indicates the best strategy given n and k.

Appendix E. Details for “Guessing a Randomly Drawn Bitstring”

Appendix E.1. Expected Maximum Score

The expected maximum score of n submitted brackets is
$$
\begin{aligned}
\mathbb{E}\Big[ \max_{j=1,\ldots,n} f(x^{(j)}, \tau) \Big]
&= \sum_{a=0}^{m} \mathbb{P}\Big( \max_{j=1,\ldots,n} f(x^{(j)}, \tau) > a \Big) \qquad \text{(by the tail-sum formula)} \\
&= \sum_{a=0}^{m} \Big[ 1 - \mathbb{P}\Big( \max_{j=1,\ldots,n} f(x^{(j)}, \tau) \le a \Big) \Big] \\
&= \sum_{a=0}^{m} \Big[ 1 - \sum_{\mathbf{u}} \mathbb{P}\Big( \max_{j=1,\ldots,n} f(x^{(j)}, \tau) \le a \,\Big|\, \mathbf{u} \Big)\, \mathbb{P}(\mathbf{u}) \Big],
\end{aligned}
$$
where $\mathbf{u} = (u_1, \ldots, u_R)$ and $u_{rd}$ is the number of zeros in $\tau$ in round $rd$. With this definition of $\mathbf{u}$, the scores $\{ f(x^{(j)}, \tau) \}_{j=1}^{n}$ are conditionally i.i.d. given $\mathbf{u}$, and
$$
\mathbb{P}(\mathbf{u}) = \prod_{rd=1}^{R} \mathbb{P}(u_{rd}) = \prod_{rd=1}^{R} \mathrm{dbinom}(u_{rd},\, m_{rd},\, 1 - p_{rd}).
$$
Thus,
$$
\begin{aligned}
\mathbb{E}\Big[ \max_{j=1,\ldots,n} f(x^{(j)}, \tau) \Big]
&= \sum_{a=0}^{m} \Big[ 1 - \sum_{\mathbf{u}} \mathbb{P}\Big( f(x^{(j)}, \tau) \le a \ \text{for all } j \,\Big|\, \mathbf{u} \Big)\, \mathbb{P}(\mathbf{u}) \Big] \\
&= \sum_{a=0}^{m} \Big[ 1 - \sum_{\mathbf{u}} \mathbb{P}\big( f(x^{(1)}, \tau) \le a \,\big|\, \mathbf{u} \big)^{n}\, \mathbb{P}(\mathbf{u}) \Big].
\end{aligned}
$$
The CDF of the score given $\mathbf{u}$ is
$$
\begin{aligned}
\mathbb{P}\big( f(x^{(1)}, \tau) \le a \,\big|\, \mathbf{u} \big)
&= \mathbb{P}\Big( \sum_{rd=1}^{R} \sum_{i=1}^{m_{rd}} w_{rd} \cdot \mathbb{1}\{ x^{(1)}_{rd,i} = \tau_{rd,i} \} \le a \,\Big|\, \mathbf{u} \Big) \\
&= \mathbb{P}\Big( \sum_{rd=1}^{R} w_{rd} \cdot \big[ \mathrm{Binom}(u_{rd},\, 1 - q_{rd}) + \mathrm{Binom}(m_{rd} - u_{rd},\, q_{rd}) \big] \le a \Big).
\end{aligned}
$$
This is the CDF of a generalized Poisson Binomial distribution, which we compute in R using the PoissonBinomial package [48].
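As a sanity check on the exact computation, the expected maximum score can also be estimated by direct simulation in base R. The round weights and bit probabilities below are illustrative choices, not the paper's exact settings.

```r
# Monte Carlo estimate of the expected maximum score of n i.i.d. submitted brackets.
m_rd <- c(32, 16, 8, 4, 2, 1)      # games per round in a 64-team bracket
w_rd <- c(1, 2, 4, 8, 16, 32)      # ESPN-style round weights (illustrative)
p_rd <- rep(0.75, 6)               # reference bracket bit probabilities
q_rd <- rep(0.65, 6)               # submitted bracket bit probabilities
n <- 100

w <- rep(w_rd, m_rd)               # per-game weights, length 63
score <- function(x, tau) sum(w * (x == tau))

sim_max <- replicate(2000, {
  tau <- rbinom(sum(m_rd), 1, rep(p_rd, m_rd))                        # reference bracket
  max(replicate(n, score(rbinom(sum(m_rd), 1, rep(q_rd, m_rd)), tau)))
})
mean(sim_max)                      # estimate of E[max_j f(x^(j), tau)]
```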

Appendix E.2. Win Probability

The probability that the maximum score of our n submitted brackets exceeds or ties that of k opposing brackets is
$$
\begin{aligned}
\mathbb{P}\Big( \max_{j=1,\ldots,n} f(x^{(j)}, \tau) \ge \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau) \Big)
&= 1 - \mathbb{P}\Big( \max_{j=1,\ldots,n} f(x^{(j)}, \tau) < \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau) \Big) \\
&= 1 - \mathbb{P}\Big( f(x^{(j)}, \tau) < \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau) \ \forall j \Big) \\
&= 1 - \sum_{\mathbf{u}} \mathbb{P}\Big( f(x^{(j)}, \tau) < \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau) \ \forall j \,\Big|\, \mathbf{u} \Big)\, \mathbb{P}(\mathbf{u}) \\
&= 1 - \sum_{\mathbf{u}} \sum_{a=0}^{m} \mathbb{P}\Big( f(x^{(j)}, \tau) < a \ \forall j \,\Big|\, \mathbf{u} \Big)\, \mathbb{P}\Big( \max_{\ell=1,\ldots,k} f(y^{(\ell)}, \tau) = a \,\Big|\, \mathbf{u} \Big)\, \mathbb{P}(\mathbf{u}) \\
&= 1 - \sum_{\mathbf{u}} \sum_{a=0}^{m} \mathbb{P}\big( f(x^{(1)}, \tau) < a \,\big|\, \mathbf{u} \big)^{n} \Big[ \mathbb{P}\Big( \max_{\ell} f(y^{(\ell)}, \tau) \le a \,\Big|\, \mathbf{u} \Big) - \mathbb{P}\Big( \max_{\ell} f(y^{(\ell)}, \tau) \le a - 1 \,\Big|\, \mathbf{u} \Big) \Big] \mathbb{P}(\mathbf{u}) \\
&= 1 - \sum_{\mathbf{u}} \sum_{a=0}^{m} \mathbb{P}\big( f(x^{(1)}, \tau) \le a - 1 \,\big|\, \mathbf{u} \big)^{n} \Big[ \mathbb{P}\big( f(y^{(1)}, \tau) \le a \,\big|\, \mathbf{u} \big)^{k} - \mathbb{P}\big( f(y^{(1)}, \tau) \le a - 1 \,\big|\, \mathbf{u} \big)^{k} \Big] \mathbb{P}(\mathbf{u}).
\end{aligned}
$$
Here, we condition on $\mathbf{u} = (u_1, \ldots, u_R)$, where $u_{rd}$ is the number of zeros in $\tau$ in round $rd$. With this definition of $\mathbf{u}$, both $\{ f(x^{(j)}, \tau) \}_{j=1}^{n}$ and $\{ f(y^{(\ell)}, \tau) \}_{\ell=1}^{k}$ are conditionally i.i.d. given $\mathbf{u}$. We compute the generalized Poisson binomial CDFs of the scores $f(x^{(1)}, \tau)$ and $f(y^{(1)}, \tau)$ given $\mathbf{u}$ as described in Appendix E.1.
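The same simulation machinery gives a quick Monte Carlo check of this win probability. The sketch below reuses m_rd, w_rd, p_rd, q_rd, n, and score() from the sketch in Appendix E.1; the opponents' bit probabilities r_rd and field size k are again illustrative.

```r
# Monte Carlo estimate of P(max of our n scores >= max of the k opposing scores).
r_rd <- rep(0.85, 6)               # opponents' bit probabilities (illustrative)
k <- 100

win <- replicate(2000, {
  tau  <- rbinom(sum(m_rd), 1, rep(p_rd, m_rd))
  ours <- max(replicate(n, score(rbinom(sum(m_rd), 1, rep(q_rd, m_rd)), tau)))
  opp  <- max(replicate(k, score(rbinom(sum(m_rd), 1, rep(r_rd, m_rd)), tau)))
  ours >= opp
})
mean(win)
```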
In Figure A3, we visualize this win probability as a function of q and r for p = 0.75 and various values of k and n.
Figure A3. The probability (color) that the maximum Hamming score of n submitted Bernoulli(q) brackets relative to a reference Bernoulli(p) bracket exceeds that of k opposing Bernoulli(r) brackets as a function of q (y-axis), r (x-axis), n (facet), and k (panel letter) for p = 0.75 in the “guessing a randomly drawn bitstring” contest with $p \equiv p_{rd}$, $q \equiv q_{rd}$, $r \equiv r_{rd}$, and R = 6 rounds. (a) features k = 100, (b) features k = 1000, (c) features k = 10,000, (d) features k = 100,000, (e) features k = 1,000,000, (f) features k = 10,000,000.

Appendix F. The Asymptotic Equipartition Property

Let $\mathcal{X}$ denote the set of all brackets of length $m$. Each bracket $x \in \mathcal{X}$ consists of $m$ individual forecasts $x = (x_1, \ldots, x_m)$. Each forecast $x_i$ has $o \ge 2$ possible outcomes. Let $P$ be a probability measure on $\mathcal{X}$. The entropy of $(\mathcal{X}, P)$ is $H := \mathbb{E}\big[ -\tfrac{1}{m} \log_2 P(X) \big]$, where $X$ is a bracket randomly drawn from $\mathcal{X}$ according to $P$, and the entropy of a bracket $x \in \mathcal{X}$ is $H(x) := -\tfrac{1}{m} \log_2 P(x)$. For instance, in our “guessing a bitstring” example from Section 3, $\mathcal{X}$ is the set of all bitstrings of length $m$, each individual forecast is a bit, and, supposing the bitstring is drawn as $m$ independent Bernoulli($p$) coin flips, the entropy is $H = -\big( p \log_2(p) + (1-p) \log_2(1-p) \big)$.
Letting $\epsilon > 0$, we partition the set of all brackets $\mathcal{X}$ into three subsets,
$$
\begin{aligned}
\epsilon\text{-low-entropy (chalky) brackets:}\quad & C_\epsilon := \{ x \in \mathcal{X} : P(x) \ge 2^{-m(H - \epsilon)} \}, \\
\epsilon\text{-typical brackets:}\quad & T_\epsilon := \{ x \in \mathcal{X} : 2^{-m(H + \epsilon)} < P(x) < 2^{-m(H - \epsilon)} \}, \\
\epsilon\text{-high-entropy (rare) brackets:}\quad & R_\epsilon := \{ x \in \mathcal{X} : P(x) \le 2^{-m(H + \epsilon)} \}.
\end{aligned}
$$
Under this definition, an individual chalky bracket is more probable than an individual typical bracket, which is more probable than an individual rare bracket.
The Asymptotic Equipartition Property (A.E.P.) from information theory, stated here as Theorem A1, quantifies our intuition about chalky, typical, and rare brackets from Section 4: as m tends to infinity, the probability mass over the set of brackets becomes increasingly concentrated on an exponentially small subset, the typical set. The proof of Theorem A1 is adapted from Cover and Thomas [49].
The primary mathematical takeaway from Theorem A1 is as follows. The set $\mathcal{X}$ of all possible length-$m$ brackets has exponential size $o^m$, recalling that $o$ is the number of possible outcomes of each individual forecast (e.g., $o = 2$ in “guessing a bitstring”). The $\epsilon$-typical set $T_\epsilon$ comprises a tiny fraction of $\mathcal{X}$, having size $|T_\epsilon| \approx 2^{mH}$ by part (b) of the theorem. Therefore, $T_\epsilon$ is exponentially smaller than $\mathcal{X}$: $|T_\epsilon| / |\mathcal{X}| \approx 2^{mH} / o^m = 2^{-m(\log_2 o - H)}$. In the “guessing a bitstring” example with $p = 0.75$, in which the reference bitstring consists of $m$ independent Bernoulli($p$) bits, $o = 2$ and $H \approx 0.81$. Thus, $|T_\epsilon| / |\mathcal{X}| \approx 2^{-m(0.19)} \approx 0.88^m$. When $m = 63$ (as in March Madness), $|T_\epsilon| / |\mathcal{X}| \approx 0.00026$, so the typical set of brackets is about 4000 times smaller than the full set. This factor increases exponentially as $m$ increases.
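The arithmetic in the preceding paragraph is easy to reproduce:

```r
# Size of the typical set relative to the full set of brackets, p = 0.75, m = 63.
p <- 0.75; m <- 63; o <- 2
H <- -(p * log2(p) + (1 - p) * log2(1 - p))   # ~0.811 bits per forecast
ratio <- 2^(m * H) / o^m                      # |T_eps| / |X|, ignoring the eps terms
c(H = H, ratio = ratio, inverse = 1 / ratio)  # ratio ~ 0.00026, inverse ~ 3800
```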
Theorem A1 
(Asymptotic Equipartition Property). Let ϵ > 0 .
(a) 
The typical set asymptotically contains most of the probability mass:
$$
\mathbb{P}(T_\epsilon) \to 1 \ \text{ as } \ m \to \infty.
$$
In other words, for large m, a randomly drawn reference bracket is likely a typical bracket.
(b) 
For m sufficiently large, we can bound the sizes of the sets of chalky, typical, and rare brackets in terms of the entropy:
$$
|C_\epsilon| < 2^{m(H - \epsilon)}, \qquad (1 - \epsilon) \cdot 2^{m(H - \epsilon)} < |T_\epsilon| < 2^{m(H + \epsilon)}, \qquad |R_\epsilon| > o^m - 2^{m(H + \epsilon)} - 2^{m(H - \epsilon)}.
$$
In other words, most brackets are rare, exponentially fewer brackets are typical, and exponentially fewer of those are chalky.
(c) 
For m sufficiently large, the typical set is essentially the smallest high-probability set: letting $\delta > 0$ and $B_\delta \subseteq \mathcal{X}$ be any high-probability set with $\mathbb{P}(B_\delta) \ge 1 - \delta$, the sets $B_\delta$ and $T_\epsilon$ have comparable sizes, $|B_\delta| \ge (1 - \epsilon - \delta)\, 2^{m(H - \epsilon)}$. When $\delta$ is small, $B_\delta$ is a high-probability set, and in that case both $|B_\delta|$ and $|T_\epsilon|$ are essentially bounded below by $(1 - \epsilon)\, 2^{m(H - \epsilon)}$.
Proof of Theorem A1. 
$$
\mathbb{P}(T_\epsilon) = \mathbb{P}_{X \sim P}(X \in T_\epsilon) = \mathbb{P}\big( 2^{-m(H + \epsilon)} < P(X) < 2^{-m(H - \epsilon)} \big) = \mathbb{P}\Big( \big| -\tfrac{1}{m} \log_2 P(X) - H \big| < \epsilon \Big),
$$
which converges to 1 as $m \to \infty$ by the law of large numbers, since $H = \mathbb{E}\big[ -\tfrac{1}{m} \log_2 P(X) \big]$. This proves part (a).
Now,
$$
1 = \sum_{x \in \mathcal{X}} P(x) \ge \sum_{x \in T_\epsilon} P(x) > \sum_{x \in T_\epsilon} 2^{-m(H + \epsilon)} = 2^{-m(H + \epsilon)} \cdot |T_\epsilon|,
$$
so $|T_\epsilon| < 2^{m(H + \epsilon)}$. By part (a), for $m$ sufficiently large,
$$
1 - \epsilon \le \mathbb{P}(T_\epsilon) = \sum_{x \in T_\epsilon} P(x) < \sum_{x \in T_\epsilon} 2^{-m(H - \epsilon)} = 2^{-m(H - \epsilon)} \cdot |T_\epsilon|,
$$
so $|T_\epsilon| > (1 - \epsilon) \cdot 2^{m(H - \epsilon)}$. Similarly,
$$
1 = \sum_{x \in \mathcal{X}} P(x) \ge \sum_{x \in C_\epsilon} P(x) > \sum_{x \in C_\epsilon} 2^{-m(H - \epsilon)} = 2^{-m(H - \epsilon)} \cdot |C_\epsilon|,
$$
so $|C_\epsilon| < 2^{m(H - \epsilon)}$. Therefore,
$$
|R_\epsilon| = |\mathcal{X} \setminus (T_\epsilon \cup C_\epsilon)| = o^m - |T_\epsilon| - |C_\epsilon| > o^m - 2^{m(H + \epsilon)} - 2^{m(H - \epsilon)}.
$$
This proves part (b).
Finally, by part (a), for m sufficiently large,
$$
1 - \delta - \epsilon = (1 - \epsilon) + (1 - \delta) - 1 \le \mathbb{P}(T_\epsilon) + \mathbb{P}(B_\delta) - 1 \le \mathbb{P}(T_\epsilon \cap B_\delta).
$$
Thus,
$$
1 - \delta - \epsilon \le \mathbb{P}(T_\epsilon \cap B_\delta) = \sum_{x \in T_\epsilon \cap B_\delta} P(x) < \sum_{x \in T_\epsilon \cap B_\delta} 2^{-m(H - \epsilon)} = |T_\epsilon \cap B_\delta| \cdot 2^{-m(H - \epsilon)} \le |B_\delta| \cdot 2^{-m(H - \epsilon)},
$$
so $|B_\delta| \ge (1 - \epsilon - \delta)\, 2^{m(H - \epsilon)}$. This proves part (c). □

Appendix G. Pick Six Details

We can explicitly and quickly compute a tractable lower bound for the expected profit (Formula (15)) under our pick six model from Section 5.1. We begin with
$$
\begin{aligned}
\mathbb{E}\left[ \frac{W}{W + W_{\mathrm{opp}}} \right]
&= \mathbb{E}_{\tau \sim P,\, x \sim Q,\, y \sim R}\left[ \frac{W}{W + W_{\mathrm{opp}}} \right] \\
&= \sum_{\tau} P(\tau)\, \mathbb{E}\left[ \frac{W}{W + W_{\mathrm{opp}}} \,\Big|\, \tau \right] \\
&= \sum_{\tau} P(\tau) \sum_{w, w'} \frac{w}{w + w'}\, \mathbb{P}(W = w, W_{\mathrm{opp}} = w' \mid \tau) \\
&= \sum_{\tau} P(\tau) \sum_{w, w'} \frac{w}{w + w'}\, \mathbb{P}(W = w \mid \tau)\, \mathbb{P}(W_{\mathrm{opp}} = w' \mid \tau),
\end{aligned}
$$
since $W$ is conditionally independent of $W_{\mathrm{opp}}$ given $\tau$;
$$
= \sum_{\tau} P(\tau) \sum_{w'} \sum_{w \ge 1} \frac{w}{w + w'}\, \mathbb{P}(W = w \mid \tau)\, \mathbb{P}(W_{\mathrm{opp}} = w' \mid \tau),
$$
since the terms with $w = 0$ vanish ($w/(w + w') = 0$ when $w = 0$);
$$
\ge \sum_{\tau} P(\tau) \sum_{w'} \sum_{w \ge 1} \frac{1}{1 + w'}\, \mathbb{P}(W = w \mid \tau)\, \mathbb{P}(W_{\mathrm{opp}} = w' \mid \tau),
$$
since $w/(w + w') \ge 1/(1 + w')$ for $w \ge 1$, which is essentially to say that we will not submit duplicate tickets;
$$
\begin{aligned}
&= \sum_{\tau} P(\tau)\, \mathbb{P}(W \ne 0 \mid \tau) \sum_{w'} \frac{1}{1 + w'}\, \mathbb{P}(W_{\mathrm{opp}} = w' \mid \tau) \\
&= \sum_{\tau} P(\tau)\, \mathbb{P}(W \ne 0 \mid \tau)\, \mathbb{E}\left[ \frac{1}{1 + W_{\mathrm{opp}}} \,\Big|\, \tau \right] \\
&\ge \sum_{\tau} P(\tau)\, \mathbb{P}(W \ne 0 \mid \tau)\, \frac{1}{1 + \mathbb{E}[ W_{\mathrm{opp}} \mid \tau ]},
\end{aligned}
$$
by Jensen's inequality, since $x \mapsto 1/(1 + x)$ is convex for $x > 0$.
Now,
$$
P(\tau) = \mathbb{P}_{\tau \sim P}(\tau) = \mathbb{P}(\tau_1, \ldots, \tau_s) = \prod_{j=1}^{s} \mathbb{P}(\tau_j) = \prod_{j=1}^{s} P_{\tau_j j}.
$$
Also,
$$
\begin{aligned}
\mathbb{P}(W \ne 0 \mid \tau)
&= \mathbb{P}_{\tau \sim P,\, x \sim Q}(W \ne 0 \mid \tau) \\
&= \mathbb{P}\big( \exists\, \ell \in \{1, \ldots, n\} \text{ such that } x^{(\ell)} = \tau \mid \tau \big) \\
&= 1 - \mathbb{P}\big( \forall\, \ell \in \{1, \ldots, n\},\ x^{(\ell)} \ne \tau \mid \tau \big) \\
&= 1 - \mathbb{P}\big( x^{(1)} \ne \tau \mid \tau \big)^{n},
\end{aligned}
$$
since the $\{ x^{(\ell)} \}$ are i.i.d.;
$$
\begin{aligned}
&= 1 - \mathbb{P}\big( \exists\, j \in \{1, \ldots, s\} \text{ such that } x^{(1)}_j \ne \tau_j \mid \tau \big)^{n} \\
&= 1 - \Big[ 1 - \mathbb{P}\big( \forall\, j \in \{1, \ldots, s\},\ x^{(1)}_j = \tau_j \mid \tau \big) \Big]^{n} \\
&= 1 - \Big[ 1 - \prod_{j=1}^{s} \mathbb{P}\big( x^{(1)}_j = \tau_j \mid \tau \big) \Big]^{n},
\end{aligned}
$$
since each of the $s$ races is independent;
$$
= 1 - \Big[ 1 - \prod_{j=1}^{s} Q_{\tau_j j} \Big]^{n}.
$$
Then, by similar logic,
$$
\begin{aligned}
\mathbb{E}[ W_{\mathrm{opp}} \mid \tau ]
&= \mathbb{E}_{\tau \sim P,\, y \sim R}[ W_{\mathrm{opp}} \mid \tau ] \\
&= \mathbb{E}\Big[ \sum_{\ell=1}^{k} \mathbb{1}\{ y^{(\ell)} = \tau \} \,\Big|\, \tau \Big] \\
&= \sum_{\ell=1}^{k} \mathbb{P}\big( y^{(\ell)} = \tau \mid \tau \big) \\
&= k \cdot \mathbb{P}\big( y^{(1)} = \tau \mid \tau \big) \\
&= k \cdot \prod_{j=1}^{s} \mathbb{P}\big( y^{(1)}_j = \tau_j \mid \tau \big) \\
&= k \cdot \prod_{j=1}^{s} R_{\tau_j j}.
\end{aligned}
$$
Combining all these formulas, we can explicitly and quickly evaluate a lower bound for the expected profit,
$$
\begin{aligned}
\mathbb{E}[\mathrm{Profit}]
&= -n + T \cdot \mathbb{E}\left[ \frac{W}{W + W_{\mathrm{opp}}} \right] \\
&\ge -n + T \cdot \sum_{\tau} P(\tau)\, \mathbb{P}(W \ne 0 \mid \tau)\, \frac{1}{1 + \mathbb{E}[ W_{\mathrm{opp}} \mid \tau ]} \\
&= -n + T \cdot \sum_{\tau} \Big( \prod_{j=1}^{s} P_{\tau_j j} \Big) \Big[ 1 - \Big( 1 - \prod_{j=1}^{s} Q_{\tau_j j} \Big)^{n} \Big] \frac{1}{1 + k \cdot \prod_{j=1}^{s} R_{\tau_j j}}.
\end{aligned}
$$
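A direct R translation of this final bound is given below. The race sizes, probability matrices, pool size, and field size are illustrative placeholders rather than the Belmont Park values used in Section 5.1, and we simply set Q and R equal to P for the sake of the sketch.

```r
# Evaluate the expected-profit lower bound for a small pick six by summing over
# all prod(m_j) outcomes tau. P, Q, R hold per-race selection probabilities.
set.seed(3)
m_j <- c(6, 7, 5, 6, 7, 8)                         # horses per race (illustrative)
rand_probs <- function(mj) { p <- rexp(mj); p / sum(p) }
P <- lapply(m_j, rand_probs)                       # "true" win probabilities
Q <- P                                             # our ticket-generating distribution
R <- P                                             # opponents' distribution
n <- 50; k <- 25000; T_pool <- 1e6                 # our tickets, opponents, total pool

outcomes <- expand.grid(lapply(m_j, seq_len))      # every possible tau
prob_of <- function(L, tau) prod(mapply(function(pj, hj) pj[hj], L, tau))

terms <- apply(outcomes, 1, function(tau) {
  prob_of(P, tau) *                                # P(tau)
    (1 - (1 - prob_of(Q, tau))^n) *                # P(W != 0 | tau)
    1 / (1 + k * prob_of(R, tau))                  # 1 / (1 + E[W_opp | tau])
})
profit_lower_bound <- -n + T_pool * sum(terms)
profit_lower_bound
```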

References

1. Isaacs, R. Optimal Horse Race Bets. Am. Math. Mon. 1953, 60, 310–315.
2. Kelly, J.L. A new interpretation of information rate. Bell Syst. Tech. J. 1956, 35, 917–926.
3. Rosner, B. Optimal Allocation of Resources in a Pari-Mutuel Setting. Manag. Sci. 1975, 21, 997–1006.
4. Willis, K.E. Optimum No-Risk Strategy for Win-Place Pari-Mutuel Betting. Manag. Sci. 1964, 10, 574–577.
5. Hausch, D.B.; Ziemba, W.T.; Rubinstein, M. Efficiency of the Market for Racetrack Betting. Manag. Sci. 1981, 27, 1435–1452.
6. Benter, W. Computer Based Horse Race Handicapping and Wagering Systems: A Report. In Efficiency of Racetrack Betting Markets; World Scientific Publishing Co. Pte. Ltd.: Singapore, 2008; Chapter 19; pp. 183–198.
7. Metel, M. Kelly Betting on Horse Races with Uncertainty in Probability Estimates. Decis. Anal. 2017, 15, 47–52.
8. Harville, D.A. Assigning Probabilities to the Outcomes of Multi-Entry Competitions. J. Am. Stat. Assoc. 1973, 68, 312–316.
9. Henery, R.J. Permutation Probabilities as Models for Horse Races. J. R. Stat. Soc. Ser. B (Methodol.) 1981, 43, 86–91.
10. Stern, H. Models for Distributions on Permutations. J. Am. Stat. Assoc. 1990, 85, 558–564.
11. Bacon-Shone, J.; Lo, V.S.Y.; Busche, K. Logistics Analyses of Complicated Bets; Research Report 11; Department of Statistics, The University of Hong Kong: Hong Kong, China, 1992.
12. Lo, V.S.Y.; Bacon-Shone, J. A Comparison between Two Models for Predicting Ordering Probabilities in Multiple-Entry Competitions. J. R. Stat. Soc. Ser. D (Stat.) 1994, 43, 317–327.
13. Lo, V.S.Y.; Bacon-Shone, J. Approximating the Ordering Probabilities of Multi-Entry Competitions by a Simple Method. In Handbook of Sports and Lottery Markets; Hausch, D.B., Ziemba, W.T., Eds.; Handbooks in Finance; Elsevier: San Diego, CA, USA, 2008; Chapter 4; pp. 51–65.
14. Asch, P.; Malkiel, B.G.; Quandt, R.E. Market Efficiency in Racetrack Betting. J. Bus. 1984, 57, 165–175.
15. Ali, M.M. Probability models on horse-race outcomes. J. Appl. Stat. 1998, 25, 221–229.
16. Rosenbloom, E. A better probability model for the racetrack using Beyer speed numbers. Omega 2003, 31, 339–348.
17. Brown, L.D.; Lin, Y. Racetrack betting and consensus of subjective probabilities. Stat. Probab. Lett. 2003, 62, 175–187.
18. Hausch, D.B.; Lo, V.S.; Ziemba, W.T. (Eds.) Efficiency of Racetrack Betting Markets; World Scientific Publishing Co. Pte. Ltd.: Singapore, 2008.
19. Bolton, R.; Chapman, R. Searching for Positive Returns at the Track. Manag. Sci. 1986, 32, 1040–1060.
20. Chapman, R.G. Still searching for positive returns at the track: Empirical results from 2000 Hong Kong races. In Efficiency of Racetrack Betting Markets; World Scientific Publishing Co. Pte. Ltd.: Singapore, 2008; Chapter 18; pp. 173–181.
21. Edelman, D. Adapting support vector machine methods for horserace odds prediction. Ann. Oper. Res. 2007, 151, 325–336.
22. Lessmann, S.; Sung, M.C.; Johnson, J. Adapting Least-Square Support Vector Regression Models to Forecast the Outcome of Horseraces. J. Predict. Mark. 2007, 1, 169–187.
23. Lessmann, S.; Sung, M.C.; Johnson, J. Identifying winners of competitive events: A SVM-based classification model for horserace prediction. Eur. J. Oper. Res. 2009, 196, 569–577.
24. Dayes, V.S. Model Considerations for Multi-Entry Competitions. Ph.D. Thesis, San Diego State University, San Diego, CA, USA, 2010.
25. Silverman, N. A Hierarchical Bayesian Analysis of Horse Racing. J. Predict. Mark. 2012, 6, 1–13.
26. Gulum, M.A. Horse Racing Prediction Using Graph-Based Features. Ph.D. Thesis, University of Louisville, Louisville, KY, USA, 2018.
27. Massey, K. Massey Ratings: Frequently Asked Questions. Available online: https://masseyratings.com/faq.php (accessed on 5 June 2024).
28. Pomeroy, K. Ratings Explanation. Available online: https://kenpom.com/blog/ratings-explanation/ (accessed on 5 June 2024).
29. Sagarin, J. Jeff Sagarin’s College Basketball Ratings. Available online: http://sagarin.com/sports/cbsend.htm (accessed on 5 June 2024).
30. Moore, S. Sonny Moore’s Computer Power Ratings. Available online: http://sonnymoorepowerratings.com/m-basket.htm (accessed on 5 June 2024).
31. Georgia Institute of Technology. LRMC (Classic) Results through Games of 3/5/2023; Georgia Institute of Technology: Atlanta, GA, USA, 2023. Available online: https://www2.isye.gatech.edu/~jsokol/lrmc/ (accessed on 5 June 2024).
32. Schwertman, N.C.; Schenk, K.L.; Holbrook, B.C. More Probability Models for the NCAA Regional Basketball Tournaments. Am. Stat. 1996, 50, 34–38.
33. Kvam, P.; Sokol, J. A logistic regression/Markov chain model for NCAA basketball. Nav. Res. Logist. 2006, 53, 788–803.
34. Carlin, B.P. Improved NCAA basketball tournament modeling via point spread and team strength information. In Anthology of Statistics in Sports; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2005; pp. 149–153.
35. Lopez, M.J.; Matthews, G.J. Building an NCAA men’s basketball predictive model and quantifying its success. J. Quant. Anal. Sport. 2015, 11, 5–12.
36. FiveThirtyEight. 2022 March Madness Predictions. Available online: https://projects.fivethirtyeight.com/2022-march-madness-predictions/ (accessed on 5 June 2024).
37. ESPN Sports Analytics Team. BPI and Strength of Record: What Are They and How Are They Derived? Available online: https://www.espn.com/blog/statsinfo/post/_/id/125994/bpi-and-strength-of-record-what-are-they-and-how-are-they-derived (accessed on 5 June 2024).
38. Ji, H.; O’Saben, E.; Boudion, A.; Li, Y. March Madness Prediction: A Matrix Completion Approach. In Modeling, Simulation, and Visualization Student Capstone Conference; Department of Computer Science, Old Dominion University: Norfolk, VA, USA, 2015.
39. Goto, K. Predicting March Madness Using Machine Learning. Available online: https://towardsdatascience.com/kaggle-march-madness-silver-medal-for-two-consecutive-years-6207ff63b86c (accessed on 5 June 2024).
40. reHOOPerate. Training a Neural Network to Fill Out My March Madness Bracket. Available online: https://medium.com/re-hoop-per-rate/training-a-neural-network-to-fill-out-my-march-madness-bracket-2e5ee562eab1 (accessed on 5 June 2024).
41. Forsyth, J.; Wilde, A. A Machine Learning Approach to March Madness. 2014. Available online: https://axon.cs.byu.edu/~martinez/classes/478/stuff/Sample_Group_Project3.pdf (accessed on 5 June 2024).
42. Gumm, J.; Barrett, A.; Hu, G. A machine learning strategy for predicting march madness winners. In Proceedings of the 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Takamatsu, Japan, 1–3 June 2015; pp. 1–6.
43. Deshpande, A. Applying Machine Learning To March Madness. Available online: https://adeshpande3.github.io/Applying-Machine-Learning-to-March-Madness (accessed on 5 June 2024).
44. Kim, J.W.; Magnusen, M.; Jeong, S. March Madness prediction: Different machine learning approaches with non-box score statistics. Manag. Decis. Econ. 2023, 44, 2223–2236.
45. Sonas, J.; Maggie, W.C. March Machine Learning Mania 2023. 2023. Available online: https://www.kaggle.com/competitions/march-machine-learning-mania-2023/discussion/475046 (accessed on 5 June 2024).
46. Kaplan, E.H.; Garstka, S.J. March Madness and the Office Pool. Manag. Sci. 2001, 47, 369–382.
47. Clair, B.; Letscher, D. Optimal Strategies for Sports Betting Pools. Oper. Res. 2007, 55, 1163–1177.
48. Junge, F. PoissonBinomial: Efficient Computation of Ordinary and Generalized Poisson Binomial Distributions; R package version 1.2.5; 2022. Available online: https://rdrr.io/cran/PoissonBinomial/ (accessed on 5 June 2024).
49. Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); Wiley-Interscience: Hoboken, NJ, USA, 2006.
Figure 1. The expected maximum Hamming score (y-axis) of n submitted Bernoulli(q) bitstrings relative to a reference Bernoulli(p) bitstring as a function of p (x-axis), q (color), and n (facet) in the “guessing a randomly drawn bitstring” contest with $p \equiv p_{rd}$, $q \equiv q_{rd}$, $r \equiv r_{rd}$, and R = 6 rounds. As n increases, we want to increase the entropy of our submitted brackets.
Figure 2. The probability (color) that the maximum Hamming score of n submitted Bernoulli(q) brackets relative to a reference Bernoulli(p) bracket exceeds that of k opposing Bernoulli(r) brackets as a function of q (y-axis), r (x-axis), and n (facet) for p = 0.75 and k = 100 in the “guessing a randomly drawn bitstring” contest with $p \equiv p_{rd}$, $q \equiv q_{rd}$, $r \equiv r_{rd}$, and R = 6 rounds. We should increase entropy as n increases and as our opponents’ entropy increases.
Figure 3. The expected maximum ESPN score (y-axis) of n submitted bitstrings, with Bernoulli($q_E$) bits in early rounds and Bernoulli($q_L$) bits in later rounds, relative to a reference Bernoulli(p) bitstring as a function of $q_E$ (x-axis), $q_L$ (color), n (columns), and the partition $(q_E, q_L)$ (rows) in the “guessing a randomly drawn bitstring” contest with R = 6 rounds and p = 0.75. The circles indicate the best strategy in each setting. As n increases, we want to increase the entropy of our bracket predictions in both the early and late rounds.
Figure 4. The probability (color) that the maximum ESPN score of n bitstrings, with Bernoulli($q_E$) bits in early rounds and Bernoulli($q_L$) bits in later rounds, relative to a reference Bernoulli(p) bitstring exceeds that of k opposing bitstrings, with Bernoulli($r_E$) bits in early rounds and Bernoulli($r_L$) bits in later rounds, as a function of $q_E$ (y-axis), $r_E$ (x-axis), $q_L$ (rows), and $r_L$ (columns) for p = 0.75, k = 100, and n = 100 in the “guessing a randomly drawn bitstring” contest with R = 6 rounds. (a) uses the partition where the first three rounds are the early rounds (e.g., $q_E = q_1 = q_2 = q_3$ and $r_E = r_1 = r_2 = r_3$) and (b) uses the partition where just the first round is an early round (e.g., $q_E = q_1$ and $r_E = r_1$). We should still increase the entropy of our bracket predictions as our opponents increase entropy.
Figure 5. Note that these figures are not drawn to scale. First line: the probability mass of an individual low-entropy (chalky) bracket is much larger than the probability mass of an individual typical bracket, which is much larger than the probability mass of an individual high-entropy (rare) bracket. Second line: there are exponentially more rare brackets than typical brackets, and there are exponentially more typical brackets than chalky brackets. Third line: the typical brackets occupy most of the probability mass on aggregate.
Figure 6. The “true” win probability $P_{ij}$ (y-axis) that horse i (x-axis) wins race j (facet) for each $i, j$. These probabilities are implied by the Vegas odds for the pick six contest at Belmont Park on 21 May 2023.
Figure 7. The probability $Q_{ij} = Q_{ij}(\lambda, \phi)$ (y-axis) that we select horse i (x-axis) to win race j = 6 for various values of $\lambda$ (column) and $\phi$ (row). For fixed $\phi$, entropy increases as $\lambda$ decreases. For fixed $\lambda$, the probabilities of successively fewer horses are upweighted as $\phi$ decreases.
Figure 8. A lower bound of our expected profit (y-axis) for a pick six horse racing betting pool in which we submit n tickets according to strategy $Q(\lambda, \phi)$ against a field of k = 25,000 opponents who use strategy $R(\lambda_{\mathrm{opp}})$, assuming a track take of $\alpha = 0.05$ and carryover C = 500,000, as a function of $\lambda_{\mathrm{opp}}$ (x-axis) and n (color). Given these variables, we use the strategy $(\lambda, \phi)$ that maximizes expected profit over a grid of values.
Figure 9. (a): A histogram of FiveThirtyEight’s pre-round-one Elo ratings for the 2021 March Madness tournament. (b): The function $j \mapsto P_{ij}$ for each team i (color) implied by these Elo ratings.
Figure 10. (a): The expected max ESPN score (y-axis) of n brackets generated according to $Q(\lambda)$ as a function of n (color) and $\lambda$ (x-axis). (b): The probability (y-axis) that the max ESPN score of n brackets generated according to $Q(\lambda)$ exceeds that of k = 10,000 colloquially chalky brackets generated according to $R(\mathrm{cc})$ as a function of n (color) and $\lambda$ (x-axis). The small circle indicates the best strategy given n and k. We want to increase entropy (decrease $\lambda$) as n increases.