1. Introduction
Parimutuel betting, or pool betting, involves pooling together all bets of a particular type on a given event, deducting a track take or vigorish, and splitting the pot among all winning bets. Prime examples are horse race betting and the March Madness bracket challenge (which involves predicting the winner of each game in the NCAA Division I men’s basketball March Madness tournament). Profitable parimutuel wagering systems have two components: a probability model of the event outcome and a bet allocation strategy. The latter uses the outcome probabilities as inputs to a betting algorithm that determines the amount to wager on each potential outcome. There is a large body of literature on estimating outcome probabilities for pool betting events. For instance, we provide an overview of estimating outcome probabilities for horse races and college basketball matchups in Appendix B.1. There is also a large body of literature on developing optimal wagering strategies, particularly for betting on horse race outcomes. Notably, assuming the outcome probabilities are known, Isaacs [1] and Kelly [2] derive the amount to wager on each horse so as to maximize expected profit and expected log wealth, respectively. Rosner [3] derives a wagering strategy for a risk-averse decision maker, and Willis [4] and Hausch et al. [5] derive other wagering strategies. On the other hand, there has been very limited work, and to our knowledge no literature, on deriving optimal strategies for generating multiple predicted March Madness brackets. Existing work focuses on generating a single predicted bracket (see Appendix B.2 for details).
Existing wagering strategies for pools that involve betting on the outcome of a single event (e.g., the winner of a horse race) have been successful. For instance, Benter [6] reported that his horse race gambling syndicate made significant profits during its five-year gambling operation. However, many betting pools in the real world involve betting not just on a single event but on a tuple of events. For example, the pick six bet in horse racing involves predicting the winner of each of six horse races. Also, the March Madness bracket challenge involves predicting the winner of each game in the NCAA Division I men’s basketball tournament. Another example, compelling to the pure mathematician, is predicting each of the bits in a randomly drawn bitstring. In each of these three prediction contests, the goal is to predict, as well as possible, a tuple of events, which we call a bracket. We suppose it is permissible to generate multiple predicted brackets, so we call these contests multi-bracket pools. In developing wagering strategies for multi-bracket pools, the literature on estimating outcome probabilities for each event in the bracket still applies. However, given these probabilities, the wagering strategies developed in the literature for betting on single events do not extend to general multi-bracket pools. Although these methods work well in low-dimensional examples such as betting on the winner of a horse race, they are intractable for general multi-bracket pools of higher dimension (e.g., March Madness bracket challenges; see Appendix C for details).
Hence, we pose the multi-brackets problem. Suppose we wish to predict a bracket (a tuple of events) and suppose we know the “true” probabilities of each potential outcome of each event. Then, what is the best way to tractably generate a set of n predicted brackets? More concretely, how can we construct a set of n brackets that maximizes an objective function such as expected score, win probability, or expected profit? The most general version of the multi-brackets problem, which finds the optimal set of n brackets across all such possible sets, is extremely difficult. To make the problem tractable (and, in some cases, possible to visualize), we make simplifying assumptions that depend on the particular specification of the multi-bracket pool. First, we assume we (and optionally a field of opponents) predict i.i.d. brackets generated according to a bracket distribution. The task then becomes to find the optimal generating bracket distribution. For higher-dimensional examples (e.g., March Madness bracket challenges), we make another simplifying assumption, optimizing over an intelligently chosen low-dimensional subspace of generating bracket distributions. In particular, we optimize over bracket distributions of varying levels of entropy. We find that this entropy-based approach is sufficient to generate well-performing sets of bracket predictions. We also learn the following high-level lessons from this strategy: we should increase the entropy of our bracket predictions as n increases and as our opponents increase their entropy.
The remainder of this paper is organized as follows. In Section 2, we formally introduce the multi-brackets problem. Then, in Section 3, we propose an entropy-based solution to what we consider a canonical example of a multi-bracket pool: guessing a randomly drawn bitstring. Using this example to aid our understanding of multi-bracket pools, in Section 4, we connect the multi-brackets problem to Information Theory, particularly via the Asymptotic Equipartition Property. Then, in Section 5, we propose entropy-based solutions to real-world examples of multi-bracket pools, including the pick six bet in horse racing in Section 5.1 and March Madness bracket challenges in Section 5.2. We conclude in Section 6.
  2. The Multi-Brackets Problem
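In this section, we formally introduce the multi-brackets problem. The goal of a multi-bracket pool is to predict a tuple of m outcomes $\tau = (\tau_1, \ldots, \tau_m)$, which we call the “true” observed reference bracket. We judge how “close” a bracket prediction $x = (x_1, \ldots, x_m)$ is to $\tau$ by a bracket scoring function $f(x, \tau)$. One natural form for the scoring function is
      $$f(x, \tau) = \sum_{i=1}^{m} w_i \, \mathbb{1}\{x_i = \tau_i\}, \tag{1}$$
      which is the number of outcomes predicted correctly weighted by $\{w_i\}_{i=1}^{m}$. Another is
      $$f(x, \tau) = \mathbb{1}\{x = \tau\}, \tag{2}$$
      which is one if and only if the predicted bracket is exactly correct. The contestants who submit the highest-scoring brackets win the pool.
The multi-brackets problem asks the general question: if we could submit n brackets to the pool, how should we choose which brackets to submit? This question takes on various forms depending on the information available to us and the structure of a particular multi-bracket pool. In the absence of information about opponents’ predicted brackets, how should we craft our submitted set $\{x^{(1)}, \ldots, x^{(n)}\}$ of n bracket predictions in order to maximize the expected maximum score? Formally, we solve
      $$\operatorname*{arg\,max}_{\{x^{(1)}, \ldots, x^{(n)}\}} \; \mathbb{E}\Big[\max_{1 \le \ell \le n} f\big(x^{(\ell)}, \tau\big)\Big]. \tag{3}$$
Or, assuming a field of opponents submits a set $\{y^{(1)}, \ldots, y^{(k)}\}$ of k bracket predictions to the pool according to some strategy, how should we craft our submitted set $\{x^{(1)}, \ldots, x^{(n)}\}$ of n brackets in order to maximize our probability of having the best bracket? Formally, we solve
      $$\operatorname*{arg\,max}_{\{x^{(1)}, \ldots, x^{(n)}\}} \; \mathbb{P}\Big(\max_{1 \le \ell \le n} f\big(x^{(\ell)}, \tau\big) \ge \max_{1 \le \ell' \le k} f\big(y^{(\ell')}, \tau\big)\Big). \tag{4}$$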
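Another version of a multi-bracket pool offers a carryover C of initial money in the pot, charges b dollars per submitted bracket, and removes a fraction $\alpha$ from the pot as a track take or vigorish. The total pool of money entered into the pot is thus
      $$T = C + b\,(n + k)\,(1 - \alpha), \tag{5}$$
      which is split among the entrants with the highest-scoring brackets. The question becomes the following: how should we craft our submitted set $\{x^{(1)}, \ldots, x^{(n)}\}$ of n brackets in order to maximize expected profit? Formally, we solve
      $$\operatorname*{arg\,max}_{\{x^{(1)}, \ldots, x^{(n)}\}} \; \mathbb{E}\Big[T \cdot \mathbb{1}\{\text{one of our } n \text{ brackets attains the highest score}\}\Big] - b\,n. \tag{6}$$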
This variant assumes no ties but is easily extended to incorporate ties (see Section 5.1). The optimization problems in Equations (3), (4), and (6), together with related variants, define the multi-brackets problem.
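The objectives above can be approximated numerically for any fixed candidate set of brackets. The following is a minimal Monte Carlo sketch, not the authors' implementation: the function names (`weighted_score`, `estimate_objectives`) and the sampler arguments are hypothetical, the score is a weighted count of correct outcomes as in scoring function (1), and ties with the best opposing bracket are counted as "not losing", which is an assumption.

```python
import random

def weighted_score(x, tau, w=None):
    """Weighted number of outcomes in x that match the reference bracket tau."""
    w = w or [1.0] * len(tau)  # default: every outcome has weight 1
    return sum(wi for xi, ti, wi in zip(x, tau, w) if xi == ti)

def estimate_objectives(ours, sample_tau, sample_opponents, score,
                        trials=10_000, seed=0):
    """Estimate E[max score] and P(our best >= opponents' best) for a fixed set `ours`.

    sample_tau(rng) draws a reference bracket; sample_opponents(rng) draws the
    list of opposing brackets. Both are user-supplied (hypothetical) samplers.
    """
    rng = random.Random(seed)
    total_max, not_lose = 0.0, 0
    for _ in range(trials):
        tau = sample_tau(rng)
        best_ours = max(score(x, tau) for x in ours)
        best_opp = max(score(y, tau) for y in sample_opponents(rng))
        total_max += best_ours
        not_lose += best_ours >= best_opp  # assumption: a tie counts as not losing
    return total_max / trials, not_lose / trials

# Usage with a degenerate reference (always (1, 1, 0)) and one fixed opponent.
ours = [(1, 1, 1), (0, 0, 0)]
exp_max, win_prob = estimate_objectives(
    ours, lambda rng: (1, 1, 0), lambda rng: [(0, 0, 1)], weighted_score, trials=10
)
```

Searching over all candidate sets with such an estimator is exactly what becomes infeasible as the bracket grows, which motivates the simplifying assumptions made next.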
In upcoming sections, we explore specific examples of the multi-brackets problem. In guessing a randomly drawn bitstring (Section 3) and the March Madness bracket challenge (Section 5.2), we explore the multi-brackets problem via scoring function (1) and objective functions (3) and (4). In pick six betting in horse racing (Section 5.1), we explore the multi-brackets problem via scoring function (2) and objective function (6).
The most general version of the multi-brackets problem, which finds the optimal set of n brackets across all such possible sets, is extremely difficult. To make the problem tractable (and, in some cases, possible to visualize), we make simplifying assumptions that depend on the particular specification of the multi-bracket pool. We assume we (and the field of opponents) submit i.i.d. brackets generated from some bracket distribution. As the size of a bracket increases, solving the multi-brackets problem even under this assumption quickly becomes intractable, so we optimize over intelligently chosen low-dimensional subspaces of bracket distributions indexed by their entropy. We find this entropy-based strategy is sufficient to generate well-performing sets of submitted brackets.
  3. Canonical Example: Guessing a Randomly Drawn Bitstring
In this section, we delve into what we consider a canonical example of a multi-bracket pool: guessing a randomly drawn bitstring. In this contest, we want to predict the sequence of bits in a reference bitstring, which we assume is generated according to some known probability distribution. We submit n guesses of the reference bitstring with the goal of being as “close” to it as possible or of being “closer” to it than a field of k opponents’ guesses, according to some distance function. With some assumptions on the distribution $\mathcal{P}$ from which the reference bitstring is generated, the distribution $\mathcal{Q}$ from which we generate bitstring guesses, and the distribution $\mathcal{R}$ from which opponents generate bitstring guesses, the expected maximum score and win probability are analytically computable and tractable. By visualizing these formulas, we discern high-level lessons relevant to all multi-bracket pools. To maximize the expected maximum score of a set of n submitted randomly drawn brackets, we should increase the entropy of our submitted brackets as n increases. To maximize the probability that the maximum score of n submitted randomly drawn brackets exceeds that of k opposing brackets, we should increase the entropy of our brackets as our opponents increase entropy.
The objective of this multi-bracket pool is to predict a randomly drawn bitstring, that is, a sequence of bits. Here, a bracket is a bitstring consisting of m bits divided into R rounds with $m_{rd}$ bits in each round $rd \in \{1, \ldots, R\}$. For concreteness, we let there be $m_{rd} = 2^{R - rd}$ bits in each of the $R = 6$ rounds (i.e., 32 bits in round 1, 16 bits in round 2, 8 bits in round 3, ..., 1 bit in round 6, totaling 63 bits), but the analysis in this section holds for other choices of $m_{rd}$ and R. The “true” reference bracket that we are trying to predict is a bitstring $\tau = (\tau_{rd,i} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. A field of opponents submits k guesses of $\tau$, the brackets $y^{(1)}, \ldots, y^{(k)}$, where each bracket is a bitstring $y^{(\ell')} = (y^{(\ell')}_{rd,i} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. We submit n guesses of $\tau$, the brackets $x^{(1)}, \ldots, x^{(n)}$, where each bracket is a bitstring $x^{(\ell)} = (x^{(\ell)}_{rd,i} : 1 \le rd \le R,\ 1 \le i \le m_{rd})$. The winning submitted bracket among $\{x^{(1)}, \ldots, x^{(n)}\} \cup \{y^{(1)}, \ldots, y^{(k)}\}$ is “closest” to the reference bracket $\tau$ according to a scoring function $f(x, \tau)$ measuring how “close” x is to $\tau$. Here, we consider
      $$f(x, \tau) = \sum_{rd=1}^{R} \sum_{i=1}^{m_{rd}} w_{rd,i} \, \mathbb{1}\{x_{rd,i} = \tau_{rd,i}\},$$
      which is the weighted number of bits guessed correctly. This scoring function encompasses both the Hamming score and ESPN score. The Hamming score measures the number of bits guessed correctly, weighting each bit equally ($w_{rd,i} \equiv 1$). The ESPN score weights each bit by $w_{rd,i} = 2^{rd-1}$ so that the maximum accruable score in each round is the same ($m_{rd} \cdot 2^{rd-1} = 2^{R-1}$).
Suppose the “true” reference bitstring $\tau$ is generated according to some known distribution $\mathcal{P}$ and opponents’ bitstrings are generated according to some known distribution $\mathcal{R}$. Our task is to submit n predicted bitstrings so as to maximize the expected maximum score
      $$\mathbb{E}\Big[\max_{1 \le \ell \le n} f\big(x^{(\ell)}, \tau\big)\Big]$$
      or the probability that we do not lose the bracket challenge
      $$\mathbb{P}\Big(\max_{1 \le \ell \le n} f\big(x^{(\ell)}, \tau\big) \ge \max_{1 \le \ell' \le k} f\big(y^{(\ell')}, \tau\big)\Big).$$
In particular, we wish to submit n bitstrings generated according to some distribution $\mathcal{Q}$, and it is our task to find suitable $\mathcal{Q}$. For tractability, we consider the special case that bits are drawn independently with probabilities varying by round. We suppose that each bit $\tau_{rd,i}$ in the reference bitstring is an independently drawn $\text{Bernoulli}(p_{rd})$ coin flip. The parameter $p_{rd} \in [0.5, 1]$ controls the entropy of the contest: lower values correspond to a higher-entropy (more variable) reference bitstring that is harder to predict. By symmetry, our strategy just needs to vary by round. So, we assume that each of our submitted bits $x^{(\ell)}_{rd,i}$ is an independently drawn $\text{Bernoulli}(q_{rd})$ coin flip and each of our opponents’ submitted bits $y^{(\ell')}_{rd,i}$ is an independently drawn $\text{Bernoulli}(r_{rd})$ coin flip. The parameters $q_{rd}$ and $r_{rd}$ control the entropy of our submitted bitstrings and our opponents’ submitted bitstrings, respectively. Our task is to find the optimal strategy or entropy level $(q_1, \ldots, q_R)$. In this setting, expected maximum score and win probability are analytically computable and tractable (see Appendix E).
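To make the round structure and the two weightings concrete, here is a small sketch. It assumes 2^(R - rd) bits in round rd (matching the 32, 16, ..., 1 layout above) and takes the ESPN weight of a round-rd bit to be 2^(rd - 1); that weight is an assumption chosen so that every round has the same maximum score, and all function names are hypothetical.

```python
def round_sizes(R=6):
    """Number of bits in rounds 1..R: 32, 16, 8, 4, 2, 1 when R = 6."""
    return [2 ** (R - rd) for rd in range(1, R + 1)]

def hamming_weights(R=6):
    """Hamming score: every bit counts equally."""
    return [[1] * m_rd for m_rd in round_sizes(R)]

def espn_weights(R=6):
    """ESPN-style score (assumed weight 2**(rd - 1) per bit in round rd)."""
    return [[2 ** (rd - 1)] * m_rd
            for rd, m_rd in enumerate(round_sizes(R), start=1)]

def bitstring_score(x, tau, weights):
    """Weighted number of correctly guessed bits; x, tau, weights are per-round lists."""
    return sum(w
               for x_rd, tau_rd, w_rd in zip(x, tau, weights)
               for xi, ti, w in zip(x_rd, tau_rd, w_rd)
               if xi == ti)

# Usage: a perfect guess of the all-ones reference under both weightings.
tau = [[1] * m_rd for m_rd in round_sizes()]
perfect_hamming = bitstring_score(tau, tau, hamming_weights())
perfect_espn = bitstring_score(tau, tau, espn_weights())
```

Under this weighting each round contributes at most 2^(R - 1) points, so late-round bits matter far more individually than early-round bits.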
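For the simplest case (Hamming score, with the same Bernoulli parameter in every round), the expected maximum score can be computed exactly with a few lines of code. The sketch below is one possible derivation and is not taken from Appendix E: it conditions on the number of ones u in the reference bitstring, under which a single guess scores Binomial(u, q) + Binomial(m - u, 1 - q) and the n guesses are conditionally independent. The function names are hypothetical.

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def expected_max_hamming(m, p, q, n):
    """Exact E[max Hamming score] of n i.i.d. Bernoulli(q) guesses of a
    Bernoulli(p) reference bitstring of length m (constant p and q)."""
    total = 0.0
    for u, prob_u in enumerate(binom_pmf(m, p)):  # u = number of ones in the reference
        hits_on_ones = binom_pmf(u, q)            # guessed 1 where the reference is 1
        hits_on_zeros = binom_pmf(m - u, 1 - q)   # guessed 0 where the reference is 0
        score_pmf = [0.0] * (m + 1)               # single-guess score given u
        for i, a in enumerate(hits_on_ones):
            for j, b in enumerate(hits_on_zeros):
                score_pmf[i + j] += a * b
        # E[max] = sum_{s=0}^{m-1} (1 - F(s)^n), where F is the single-guess CDF
        cdf, e_max = 0.0, 0.0
        for s in range(m):
            cdf += score_pmf[s]
            e_max += 1.0 - min(cdf, 1.0) ** n
        total += prob_u * e_max
    return total

# Usage: pure chalk (q = 1) versus matching the reference entropy (q = p).
chalk_one = expected_max_hamming(63, 0.75, 1.0, 1)
typical_one = expected_max_hamming(63, 0.75, 0.75, 1)
```

With a single guess, pure chalk scores m·p in expectation, whereas matching the reference entropy scores only m·(p² + (1 - p)²); increasing n in the call above lets one check how the comparison changes as more guesses are allowed.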
We first visualize the case where the entropy of the reference bitstring, our submitted bitstrings, and our opponents’ submitted bitstrings do not vary by round: $p_{rd} \equiv p$, $q_{rd} \equiv q$, and $r_{rd} \equiv r$. In Figure 1, we visualize the expected maximum Hamming score of n submitted bitstrings as a function of p, q, and n. We find that we should increase the entropy of our submitted brackets (decrease q) as n increases, transitioning from pure chalk ($q = 1$) for $n = 1$ bracket to the true amount of randomness ($q = p$) for large n. Specifically, for small n, the green line ($q = p$) lies below the blue lines (large q), and for large n, the green line lies above all the other lines.
In 
Figure 2, we visualize the win probability as a function of 
q, 
r, and 
n for 
 and 
. The horizontal gray dashed line $q = p$ represents that we match the entropy of the reference bitstring, the vertical gray dashed line $r = p$ represents that our opponents match the entropy of the reference bitstring, and the diagonal gray dashed line $q = r$ represents that we match our opponents’ entropy. We should increase entropy (decrease
q) as 
n increases, visualized by the green region moving downwards as 
n increases. Further, to maximize win probability, we should increase entropy (decrease 
q) as our opponents’ entropy increases (as 
r decreases), visualized by the triangular form of the green region. In other words, we should tailor the entropy of our brackets to the entropy of our opponents’ brackets. These trends are similar for other values of 
k and 
n (see 
Figure A3 of 
Appendix E).
These trends generalize to the case where the entropy of each bitstring varies by round (i.e., general , , and ). It is difficult to visualize the entire  dimensional space of , , and , so we instead consider a lower-dimensional subspace. Specifically, we visualize a two-dimensional subspace of q parameterized by , where  denotes q in early rounds and  denotes q in later rounds. For example,  and  is one of the five possible partitions of . We similarly visualize a two-dimensional subspace of r parameterized by . Finally, we let the reference bitstring have a constant entropy across each round, .
In 
Figure 3, we visualize the expected maximum ESPN score of 
n bitstrings as a function of 
, 
, and 
n for 
. The three columns display the results for 
, 
, and 
, respectively. The five rows display the results for the five partitions of 
. For instance, the first row shows one partition 
 and 
. As 
n increases, the expected maximum ESPN score increases. We visualize this as the lines moving upwards as we move right across the grid of plots. As 
E increases (i.e., as 
 encompasses a larger number of early rounds), the impactfulness of the late round strategy 
 decreases. We visualize this as the lines become more clumped together as we move down the grid of plots in 
Figure 3. For 
, the best strategy is pure chalk (
, 
), and as 
n increases, the optimal values of 
 and 
 decrease. In other words, as before, we want to increase the entropy of our submitted brackets as 
n increases. We visualize this as the circle (i.e., the best strategy in each plot) moving leftward and having a more reddish color as 
n increases.
In 
Figure 4, we visualize the win probability as a function of 
, 
, 
, and 
, for 
, 
, and ESPN score. 
Figure 4a uses the partition where the first three rounds are the early rounds (e.g., 
 and 
). In this scenario, early round strategy 
 and 
 are much more impactful than late round strategy 
 and 
. We visualize this as each subplot looking the same. The green triangle within each subplot illustrates that we should increase early round entropy (decrease 
) as our opponents’ early round entropy increases (i.e., as 
 decreases). 
Figure 4b uses the partition where just the first round is an early round (e.g., 
 and 
). In this scenario, both early round strategy 
 and 
 and late round strategy 
 and 
 are impactful. The green triangle appears again in each subplot, illustrating that we should increase early round entropy as our opponents’ early round entropy increases. But the green triangle grows as 
 decreases, indicating that we should increase late round entropy (decrease 
) as our opponent’s entropy increases.
  4. An Information Theoretic View of the Multi-Brackets Problem
The multi-brackets problem is intimately connected to Information Theory. Viewing the multi-brackets problem under an information theoretic lens provides a deeper understanding of the problem and elucidates why certain entropy-based strategies work. In particular, the Asymptotic Equipartition Property from Information Theory helps us understand why it makes sense to increase entropy as the number of brackets increases and as our opponents’ entropy increases. In this section, we give an intuitive explanation of the Equipartition Property and discuss implications, relegating the formal mathematical details to 
Appendix F.
To begin, we partition the set of all brackets $\mathcal{X}$ into three subsets,
      $$\mathcal{X} = \mathcal{X}_{\text{chalk}} \cup \mathcal{X}_{\text{typ}} \cup \mathcal{X}_{\text{rare}},$$
      consisting of the low-entropy (chalky) brackets, the typical brackets, and the high-entropy (rare) brackets, respectively. We visualize this partition of $\mathcal{X}$ under three lenses in Figure 5.
First, the probability mass of an individual low-entropy or chalky bracket is much larger than the probability mass of an individual typical bracket, which is much larger than the probability mass of an individual high-entropy or rare bracket. In symbols, if $x_{\text{chalk}} \in \mathcal{X}_{\text{chalk}}$, $x_{\text{typ}} \in \mathcal{X}_{\text{typ}}$, and $x_{\text{rare}} \in \mathcal{X}_{\text{rare}}$, then $\mathbb{P}(x_{\text{chalk}}) \gg \mathbb{P}(x_{\text{typ}}) \gg \mathbb{P}(x_{\text{rare}})$. “Rare” is a good name for high-entropy brackets because they are highly unlikely. “Chalk”, a term from sports betting, is a good name for low-entropy brackets because it refers to betting on the heavy favorite (i.e., the outcome with the highest individual likelihood). Most of the individual forecasts within a low-entropy bracket must consist of the most probable outcomes. For example, in the “guessing a bitstring” contest, assuming the reference bitstring consists of independent Bernoulli(p) bits where $p > 0.5$, low-entropy brackets are bitstrings consisting mostly of ones. In real-world examples of multi-bracket pools, people are drawn to these low-entropy chalky brackets because they have high individual likelihoods.
Second, there are exponentially more rare brackets than typical brackets, and there are exponentially more typical brackets than chalky brackets. In symbols, $|\mathcal{X}_{\text{rare}}| \gg |\mathcal{X}_{\text{typ}}| \gg |\mathcal{X}_{\text{chalk}}|$. In the “guessing a bitstring” contest with $p > 0.5$, the overwhelming majority of possible brackets are high-entropy brackets having too many zeros, and very few possible brackets are low-entropy brackets consisting almost entirely of ones. Typical brackets toe the line, having the “right” number of ones. March Madness is analogous: the overwhelming majority of possible brackets are rare brackets with too many upsets (e.g., a seed above 8 winning the tournament) and relatively few possible brackets are chalky brackets with few upsets (there are only so many distinct brackets with favorites winning nearly all the games). Typical brackets toe the line, having the “right” number of upsets.
Lastly, the typical set of brackets contains most of the probability mass. In symbols, $\mathbb{P}(\mathcal{X}_{\text{typ}}) \approx 1$ and $\mathbb{P}(\mathcal{X}_{\text{chalk}} \cup \mathcal{X}_{\text{rare}}) \approx 0$. This is a consequence of the previous two inequalities. Although $|\mathcal{X}_{\text{rare}}|$ is massive, $\mathbb{P}(x)$ for $x \in \mathcal{X}_{\text{rare}}$ is so small that $\mathbb{P}(\mathcal{X}_{\text{rare}})$ is small. Also, although $\mathbb{P}(x)$ for $x \in \mathcal{X}_{\text{chalk}}$ is relatively large, $|\mathcal{X}_{\text{chalk}}|$ is so small that $\mathbb{P}(\mathcal{X}_{\text{chalk}})$ is small. Hence, the remainder of the probability mass, $\mathbb{P}(\mathcal{X}_{\text{typ}})$, is large. “Typical” is thus a good name for brackets whose entropy is not too high or too low because a randomly drawn bracket typically has this “right” amount of entropy. For example, the observed March Madness tournament is almost always a typical bracket featuring a typical number of upsets.
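The three lenses can be checked numerically in the “guessing a bitstring” contest. The sketch below groups all 63-bit strings by their number of ones and reports, for each group, how many strings it contains and how much probability mass it carries under independent Bernoulli(p) bits. The cutoffs `lo` and `hi` that separate rare, typical, and chalky strings are illustrative assumptions (roughly two standard deviations around the expected number of ones), not the formal definition in Appendix F, and the function name is hypothetical.

```python
from math import comb

def partition_by_ones(m, p, lo, hi):
    """Group m-bit strings by number of ones k:
    rare (k < lo), typical (lo <= k <= hi), chalky (k > hi).
    Returns {name: [number_of_strings, total_probability_mass]}."""
    groups = {"rare": [0, 0.0], "typical": [0, 0.0], "chalky": [0, 0.0]}
    for k in range(m + 1):
        name = "rare" if k < lo else ("typical" if k <= hi else "chalky")
        count = comb(m, k)                         # strings with exactly k ones
        groups[name][0] += count
        groups[name][1] += count * p**k * (1 - p)**(m - k)
    return groups

# Usage: m = 63 bits, p = 0.75, so about 47 ones are expected.
groups = partition_by_ones(63, 0.75, lo=41, hi=54)
for name, (count, mass) in groups.items():
    print(f"{name:8s} strings: {count:.3e}   probability mass: {mass:.4f}")
```

With these cutoffs, the rare group contains by far the most strings and the chalky group the fewest, yet the typical group holds most of the probability mass, mirroring the three statements above.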
Boiled down to its essence, the Equipartition Property tells us that, as the number of forecasts m within each bracket grows, the probability mass of the set of brackets becomes increasingly concentrated in an exponentially small set, the “typical set.” See
Appendix F for a more formal treatment of the Equipartition Property.
This information theoretic view of the multi-brackets problem sets up a tradeoff between chalky and typical brackets. Typical brackets have the “right” entropy but consist of less likely individual outcomes, whereas chalky low-entropy brackets have the “wrong” entropy but consist of more likely individual outcomes. The former excels when n is large, the latter excels when n is small, and for moderate n we interpolate between these two regimes; so, we should increase the entropy of our set of predicted brackets as the number of brackets n increases. We justify this below using the Equipartition Property.
As the typical set contains most of the probability mass, the reference bracket is highly likely to be a typical bracket. So, when n is large, we should generate typical brackets as guesses since it is likely that at least one of these guesses is “close” to the reference bracket. When n is small, generating typical brackets as guesses does not produce as high an expected maximum score as generating chalky brackets. To understand why, recall that a bracket consists of m individual forecasts. A single randomly drawn typical bracket has the same entropy as the reference bracket but is not likely to correctly predict each individual forecast. For instance, in our “guessing a bitstring” example, a single randomly drawn typical bitstring has, on average, a similar number of ones as the reference bitstring, but not the “right” ones in the “right” locations. A chalky bracket, on the other hand, predicts highly likely outcomes in most of the individual forecasts. The chalkiest bracket, which predicts the most likely outcome in each individual forecast, matches the reference bracket for each forecast in which the reference bracket realizes its most likely outcome. This, on average, yields more matches than a typical bracket because more forecasts realize their most likely outcome than any other single outcome. For instance, in our “guessing a bitstring” example, a chalky bracket consists mostly of ones (assuming $p > 0.5$) and so correctly guesses the locations of the ones in the reference bitstring. This is better, on average, than guessing a typical bracket, which, on average, has the “right” number of ones but in the wrong locations.
  5. Real-World Examples
Now, we discuss real-world examples of multi-bracket pools: pick six betting in horse racing and March Madness bracket challenges. Both contests involve predicting a tuple of outcomes. An individual pick six bet (ticket) involves predicting the winner of each of six horse races, and an individual March Madness bet (bracket) involves predicting the winner of each game in the NCAA Division I Men’s Basketball March Madness tournament. In both contests, submitting many tickets or brackets is allowed but not necessarily commonplace (outside of horse racing betting syndicates). We demonstrate that the entropy-based strategies introduced in the previous sections are particularly well-suited for these problems. In particular, optimizing over strategies of varying levels of entropy is tractable and yields well-performing solutions.
  5.1. Pick Six Horse Race Betting
Horse race betting is replete with examples of multi-bracket pools. A prime example is the pick six bet, which involves correctly picking the winner of six horse races. Similar pick three, pick four, and pick five bets, which involve correctly picking the winner of three, four, or five horse races, respectively, also exist. Due to the immense difficulty of picking six consecutive horse race winners coupled with a large number of bettors in these pools, payoffs for successful pick six bets can be massive (e.g., in the millions of dollars). In this section, we apply our entropy-based strategies to pick six betting, demonstrating the massive profit potential of these bets.
To begin, let $s \in \{3, 4, 5, 6\}$ denote the number of races comprising the pick-s bet (for the pick three, four, five, and six contests, respectively). Suppose, for simplicity, that one pick-s ticket, consisting of s predicted horse race winners, costs USD 1 (typically a pick-s bet costs USD 1 or USD 2). Indexing each race by
, suppose there are 
 horses in race 
j, and let 
. There is a fixed carryover 
C, an amount of money left over from previous betting pools in which no one won, that is added to the total prize pool for the pick-s contest. As throughout this paper, we assume the “true” win probability
 that horse 
i wins race 
j is known for each 
i and 
j. As our operating example in this section, we set 
 to be the win probabilities implied by the Vegas odds from the pick six contest from Belmont Park on 21 May 2023 (
https://entries.horseracingnation.com/entries-results/belmont-park/2023-05-21, accessed on 1 June 2023), which we visualize in 
Figure 6.
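Suppose the public purchases k entries according to some strategy. In particular, we assume the public submits k independent tickets according to $\mathcal{R}$, where $r_{ij}$ is the probability an opponent selects horse i to win race j. We purchase n entries according to strategy $\mathcal{Q}$. Specifically, we submit n independent tickets according to $\mathcal{Q}$, where $q_{ij}$ is the probability we select horse i to win race j. The total prize money is thus
        $$T = C + (n + k)(1 - \alpha),$$
        where $\alpha$ is the track take (vigorish). Let W be our number of winning tickets and let $W_{\text{opp}}$ be our opponents’ number of winning tickets. Under our model, both W and $W_{\text{opp}}$ are random variables. Formally, denote the “true” observed s winning horses by $\tau = (\tau_1, \ldots, \tau_s)$, our n tickets by $x^{(1)}, \ldots, x^{(n)}$, where each $x^{(\ell)} = (x^{(\ell)}_1, \ldots, x^{(\ell)}_s)$, and the public’s k tickets by $y^{(1)}, \ldots, y^{(k)}$, where each $y^{(\ell')} = (y^{(\ell')}_1, \ldots, y^{(\ell')}_s)$. Then,
        $$W = \sum_{\ell=1}^{n} \mathbb{1}\{x^{(\ell)} = \tau\}$$
        and
        $$W_{\text{opp}} = \sum_{\ell'=1}^{k} \mathbb{1}\{y^{(\ell')} = \tau\}.$$
Then, the amount we profit is also a random variable,
        $$\text{Profit} = \frac{W}{W + W_{\text{opp}}} \, T - n,$$
        where we treat $0/0$ to be 0 (i.e., if both $W = 0$ and $W_{\text{opp}} = 0$, the fraction $W / (W + W_{\text{opp}})$ is 0). Here, the randomness is over $\tau \sim \mathcal{P}$, $x \sim \mathcal{Q}$, and $y \sim \mathcal{R}$.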
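For a single realization of the race results and the submitted tickets, the profit calculation can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formula: the prize pool is taken to be the carryover plus ticket sales net of the track take (so the take is assumed not to apply to the carryover), the pool is split in proportion to the number of winning tickets, and the function and argument names are hypothetical.

```python
def pick_six_profit(tau, ours, theirs, carryover, take, price=1.0):
    """Profit from one realized pick-s pool.

    tau    : tuple of the s winning horses
    ours   : list of our tickets (each a tuple of s horses)
    theirs : list of the public's tickets
    take   : track take as a fraction of ticket sales (assumption: not applied to carryover)
    """
    n, k = len(ours), len(theirs)
    pool = carryover + price * (n + k) * (1.0 - take)   # assumed form of the prize pool
    w = sum(1 for x in ours if tuple(x) == tuple(tau))        # our winning tickets
    w_opp = sum(1 for y in theirs if tuple(y) == tuple(tau))  # public's winning tickets
    share = w / (w + w_opp) if (w + w_opp) > 0 else 0.0       # treat 0/0 as 0
    return share * pool - price * n

# Usage: we hold one of the two winning tickets in a five-ticket pool.
tau = (1, 2, 3, 4, 5, 6)
ours = [tau, (1, 1, 1, 1, 1, 1)]
theirs = [tau, (2, 2, 2, 2, 2, 2), (3, 3, 3, 3, 3, 3)]
profit = pick_six_profit(tau, ours, theirs, carryover=100.0, take=0.2)
```

Averaging this quantity over draws of the race results and of both sets of tickets gives a Monte Carlo estimate of the expected profit that the strategy is meant to maximize.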
Our task is to solve for the optimal investment strategy $\mathcal{Q}$ given all the other variables n, k, $\mathcal{P}$, $\mathcal{R}$, C, and $\alpha$. Formally, we wish to maximize expected profit,
        $$\max_{\mathcal{Q}} \; \mathbb{E}[\text{Profit}].$$
In Appendix G, we compute a tractable lower bound for the expected profit.
We are unable to analytically optimize the expected profit to find an optimal strategy 
 given the other variables, and we are unable to search over the entire high-dimensional 
-space for an optimal strategy. Instead, we apply the entropy-based strategies described in the previous sections. The idea is to search over a subspace of 
 that explores strategies of varying entropies, finding the optimal entropy given the other variables. To generate 
n pick six tickets at varying levels of entropy, we let 
 vary according to parameters 
 and 
 that control the entropy. Assuming without loss of generality that in each race 
j the “true” win probabilities are sorted in decreasing order, 
, we define 
 for 
 and 
 by
        
        recalling that there are 
 horses in race 
j. We visualize these probabilities for race 
 in 
Figure 7. For fixed 
, smaller values of 
 push the distribution 
 closer towards a uniform distribution, increasing its entropy. Conversely, increasing 
 lowers its entropy. In lowering its entropy, we shift the probability from some horses onto other horses in a way that makes the distribution less uniform. The parameter 
 controls the number of horses to which we transfer probability as 
 increases. For instance, there are 
 horses in race 
, so when 
, we transfer successively more probability to the top 
 horses as 
 increases.
Further, we assume we play against opponents who generate brackets according to the strategy . In other words, low-entropy opponents bet mostly on the one or two favorite horses (depending on ), high-entropy opponents are close to a uniform distribution, and moderate-entropy opponents lie somewhere in the middle. The exact specification of the opponents’ distribution is not important, as we use it to illustrate a general point. In future work, one can try to model the distribution of the public’s ticket submissions to obtain more precise results.
In 
Figure 8, we visualize the expected profit for a pick six horse racing betting pool in which we submit 
n tickets according to strategy 
 against a field of 
k = 25,000 opponents who use strategy 
, assuming a track take of 
 and carryover 
C = 500,000, as a function of 
 and 
n. Given these variables, we use the strategy 
 that maximizes expected profit over a grid of values. We see that the entropy of the optimal strategy increases as 
n increases (i.e., 
 decreases and 
 increases as 
n increases). Further, we see that submitting many brackets at a smart entropy level is hugely profitable. This holds true particularly when the carryover is large enough, which occurs fairly regularly.
  5.2. March Madness Bracket Challenge
March Madness bracket challenges are prime examples of multi-bracket pools. In a bracket challenge, contestants submit an entire 
bracket, that is, a complete specification of the winner of each game in the NCAA Division I Men’s Basketball March Madness tournament. The winning bracket is the one closest to the observed NCAA tournament according to some metric. Popular March Madness bracket challenges from ESPN, BetMGM, and DraftKings, for instance, offer large cash prizes: BetMGM offered USD 10 million for a perfect bracket or USD 100,000 for the closest bracket, DraftKings awarded USD 60,000 in cash prizes spread across the best 5096 brackets last year, and ESPN offered USD 100,000 to the winner of a lottery among the entrants who scored the most points in each round of the tournament (
https://www.thelines.com/best-march-madness-bracket-contests/, accessed on 1 June 2023). To illustrate the difficulty of perfectly guessing the observed NCAA tournament, Warren Buffett famously offered USD 1 billion to anyone who filled out a flawless bracket (
https://bleacherreport.com/articles/1931210-warren-buffet-will-pay-1-billion-to-fan-with-perfect-march-madness-bracket, accessed on 1 June 2023). In this section, we apply our entropy-based strategies to March Madness bracket challenges, demonstrating the impressive efficacy of this strategy.
To begin, we denote the set of all brackets by $\mathcal{X}$, which consists of $2^{63}$ brackets since there are 63 games through six rounds in the NCAA tournament (excluding the four-game play-in tournament). We define an atomic probability measure $\mathcal{P}$ on $\mathcal{X}$, where $\mathcal{P}(x)$ is the probability that bracket $x \in \mathcal{X}$ is the “true” observed NCAA tournament, as follows. Given that a match involves teams i and j, we model the outcome of this match as a Bernoulli coin flip with success probability $P_{ij}$. In other words, with probability $P_{ij}$, team i wins the match; else, team j wins the match. Prior to the first round (games 1 through 32), the first 32 matchups are set. Given these matchups, the 32 winning teams in round one are determined by Bernoulli coin flips according to $\{P_{ij}\}$. These 32 winning teams from round one then uniquely determine the 16 matchups for the second round of the tournament. Given these matchups, the 16 winning teams in round two are also determined by Bernoulli coin flips according to $\{P_{ij}\}$. These winners then uniquely determine the matchups for round three. This process continues until the end of round six, when one winning team remains.
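The round-by-round generative process described above can be sketched in a few lines. This is a minimal illustration, assuming teams are listed in bracket order so that adjacent entries meet in round one and winners of adjacent games meet in the next round; `simulate_bracket` and `win_prob` are hypothetical names, and `win_prob(i, j)` stands in for the pairwise win probability that team i beats team j.

```python
import random

def simulate_bracket(teams, win_prob, rng=None):
    """Draw one single-elimination bracket.

    teams    : list of team ids in bracket order (length must be a power of two)
    win_prob : function (i, j) -> probability that team i beats team j
    Returns a list of rounds, each a list of that round's winners.
    """
    rng = rng or random.Random()
    rounds, alive = [], list(teams)
    while len(alive) > 1:
        winners = []
        for i, j in zip(alive[0::2], alive[1::2]):   # adjacent survivors play each other
            winners.append(i if rng.random() < win_prob(i, j) else j)
        rounds.append(winners)
        alive = winners                               # winners set the next round's matchups
    return rounds

# Usage: a 64-team field in which the lower-numbered team always wins (pure chalk).
rounds = simulate_bracket(list(range(64)),
                          lambda i, j: 1.0 if i < j else 0.0,
                          random.Random(0))
champion = rounds[-1][0]
```

Replacing the degenerate `win_prob` with estimated pairwise probabilities yields draws from the bracket distribution, and applying the same routine with a different probability table yields draws from a submission strategy.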
In this work, we assume we know the “true” win probabilities 
. As our operating example in this section, we set 
 to be the win probabilities implied by FiveThirtyEight’s Elo ratings from the 2021 March Madness tournament (
https://projects.fivethirtyeight.com/2021-march-madness-predictions/, accessed on 5 June 2024). We scrape FiveThirtyEight’s pre-round-one 2021 Elo ratings 
 and index the teams by 
 in decreasing order of Elo rating (e.g., the best team Gonzaga is 1 and the worst team Texas Southern is 64). Then, we define 
 by 
. In 
Figure 9a, we visualize 
. The Elo ratings range from 71.1 (Texas Southern) to 96.5 (Gonzaga, which is rated particularly highly). In 
Figure 9b, we visualize 
 via the functions 
 for each team 
i. For instance, Gonzaga’s win probability function is the uppermost orange line, which is considerably higher than the other teams’ lines. See 
Appendix D for a discussion of the robustness of our results to this choice of 
.
Suppose a field of opponents submits 
k brackets 
 to the bracket challenge according to some strategy 
. In particular, we assume the public submits 
k independent brackets according to 
, where 
 is the probability an opponent selects team 
i to beat team 
j in the event that they play. We submit 
n brackets 
 to the bracket challenge according to strategy 
. Specifically, we submit 
n independent brackets according to 
, where 
 is the probability we select team 
i to beat team 
j in the event that they play. The goal is to come as “close” as possible to the “true” reference bracket 
, that is, the observed NCAA tournament, as measured by a bracket scoring function. The most common such scoring function in these bracket challenges is what we call 
ESPN score, which credits $10 \cdot 2^{r-1}$ points for correctly predicting the winner of a match in round $r \in \{1, \dots, 6\}$. Since there are $2^{6-r}$ matches in each round $r$, ESPN score ensures that the maximum number of points available in each round is the same (320). Formally, our task is to submit 
n brackets so as to maximize the probability we do not lose the bracket challenge,
        
Alternatively, in the absence of information about our opponents, our task is to submit 
n brackets so as to maximize expected maximum score,
        
Under this model, it is intractable to explicitly 
evaluate these formulas for expected maximum score or win probability for general 
, 
, and 
, even when we independently draw brackets from these distributions. This is because the scores 
 and 
 of two submitted brackets 
 and 
 relative to 
 are both dependent on 
, and integrating over 
 yields a sum over all 
$2^{63}$ possible true brackets for 
, which is intractable. Hence, we use Monte Carlo simulation to approximate expected maximum score and win probability. We approximate expected maximum score via
        
        where the 
 are independent samples from 
 and the 
 are independent samples from 
. We use a double Monte Carlo sum, with 
 draws of 
 and 
 draws of 
, because it provides a smoother and more stable approximation than a single Monte Carlo sum. Similarly, we approximate win probability via
        
        where the 
 are independent samples from 
, the 
 are independent samples from 
, and the 
 are independent samples from 
. We again use a double Monte Carlo sum, with 
 draws of 
 and 
 draws of 
 and 
, because it provides a smooth and stable approximation.
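As a rough illustration of the ESPN score and the two Monte Carlo approximations, consider the following sketch. It is one plausible reading of the double Monte Carlo sum (an outer loop over sampled reference brackets and an inner loop over sampled sets of submissions), not necessarily the authors' exact estimator. The function names, the sampler callables `sample_truth`, `sample_ours`, and `sample_opp`, the default draw counts, and the representation of a bracket as a list of per-round winner lists are all assumptions.

```python
def espn_score(bracket, truth):
    """ESPN score: 10 * 2**(r - 1) points per correct winner in round r = 1, 2, ...

    Both arguments are lists of rounds; each round lists winners by game slot.
    """
    return sum(
        10 * 2 ** r * sum(pick == actual for pick, actual in zip(picks, actuals))
        for r, (picks, actuals) in enumerate(zip(bracket, truth))  # r is 0-indexed here
    )

def mc_expected_max_score(sample_truth, sample_ours, n, n_truth=100, n_sets=100):
    """Double Monte Carlo estimate of the expected maximum score of n brackets."""
    total = 0.0
    for _ in range(n_truth):            # outer draws of the reference bracket
        truth = sample_truth()
        for _ in range(n_sets):         # inner draws of our n submitted brackets
            total += max(espn_score(sample_ours(), truth) for _ in range(n))
    return total / (n_truth * n_sets)

def mc_win_probability(sample_truth, sample_ours, sample_opp, n, k,
                       n_truth=100, n_sets=100):
    """Double Monte Carlo estimate of the probability that we do not lose."""
    wins = 0
    for _ in range(n_truth):
        truth = sample_truth()
        for _ in range(n_sets):
            ours = max(espn_score(sample_ours(), truth) for _ in range(n))
            theirs = max(espn_score(sample_opp(), truth) for _ in range(k))
            wins += ours >= theirs      # a tie counts as not losing
    return wins / (n_truth * n_sets)
```

Each estimate costs on the order of `n_truth * n_sets * (n + k)` bracket scorings, so a practical implementation with k = 10,000 opponents would likely vectorize the scoring step.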
We are unable to analytically optimize these objective functions to find an optimal strategy 
 given the other variables, and we are unable to search over the entire high-dimensional 
-space for an optimal strategy. These problems are even more difficult than simply evaluating these objective functions, which itself is intractable. Thus, we apply the entropy-based strategies from the previous sections, which involve generating successively higher entropy brackets as 
n increases. The idea is to search over a subspace of 
 that explores strategies of varying entropies, finding the optimal entropy given the other variables. To generate 
n brackets at varying levels of entropy, we let 
 vary according to the parameter 
 that controls the entropy. In a game in which team 
i is favored against team 
j (so 
, since we indexed the teams in decreasing order of team strength, and 
), the lowest entropy (chalkiest) strategy features 
, the “true” entropy strategy features 
, and the highest entropy strategy features 
. We construct a family for 
 that interpolates between these three poles,
        
        where 
. The entropy of 
 increases as 
 decreases.
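To make the three poles concrete, the sketch below implements one piecewise-linear family that passes through them. The parameterization is an assumption chosen for illustration and may differ from the family used in this work: the entropy parameter `lam` is assumed to lie in [0, 1], with `lam = 1` giving the chalkiest pick, `lam = 1/2` the “true” win probability, and `lam = 0` a fair coin flip. The names `q_lambda` and `binary_entropy` are hypothetical.

```python
import math

def q_lambda(p, lam):
    """Probability of picking the favorite, whose true win probability is p >= 1/2.

    Hypothetical piecewise-linear family through the three poles:
    lam = 1 -> 1 (chalk), lam = 1/2 -> p (true), lam = 0 -> 1/2 (coin flip).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if lam <= 0.5:
        # Interpolate between the coin flip (1/2) and the true probability p.
        return (1 - 2 * lam) * 0.5 + 2 * lam * p
    # Interpolate between the true probability p and the chalk pick (1).
    return 2 * (1 - lam) * p + (2 * lam - 1) * 1.0

def binary_entropy(q):
    """Entropy in bits of a single Bernoulli(q) pick."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))
```

For a favorite with p = 0.8, this family gives pick probabilities of 0.5, 0.8, and 1 at `lam` = 0, 1/2, and 1, and the per-game entropy falls from 1 bit to 0 bits as `lam` grows, matching the statement that entropy increases as the parameter decreases.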
Further, we assume we play against 
colloquially chalky opponents, who usually bet on the higher-seeded team. Each team in the March Madness tournament is assigned a numerical ranking from 1 to 16, its 
seed, prior to the start of the tournament by the NCAA Division I Men’s Basketball Committee. The seeds determine the matchups in round one and are a measure of team strength (i.e., the committee considers teams with numerically lower seeds to be better, so the “higher-seeded” team in a matchup is the one with the smaller seed number). We suppose colloquially chalky opponents generate brackets according to a distribution 
 based on the seeds 
 and 
 of teams 
i and 
j,
        
        so they usually bet on the higher-seeded team. The exact specification of the colloquially chalky distribution is not important, as we use 
 to illustrate a general point. In future work, one can try to model the distribution of the public’s bracket submissions to obtain more precise results.
In 
Figure 10a, we visualize the expected maximum score of 
n brackets generated according to 
 as a function of 
n and 
. In 
Figure 10b, we visualize the probability that the maximum score of 
n brackets generated according to 
 exceeds that of 
k = 10,000 colloquially chalky brackets generated according to 
 as a function of 
n and 
. In both, we again see that we should increase entropy (decrease 
) as 
n increases. In particular, the small circle (indicating the best strategy given 
n and 
k) moves leftward as 
n increases. Further, we see that tuning the entropy of our submitted bracket set given the other variables yields an excellent win probability, even when 
n is much smaller than 
k.
6. Discussion
In this work, we pose and explore the multi-brackets problem: how should we submit n predictions of a randomly drawn reference bracket (tuple)? The most general version of this question, which seeks the optimal set of n brackets across all possible such sets, is extremely difficult. To make the problem tractable, computationally feasible, or possible to visualize, depending on the particular specification of the multi-bracket pool, we make simplifying assumptions. First, we assume we (and optionally a field of opponents) submit i.i.d. brackets generated according to a bracket distribution. The task then becomes finding the optimal generating bracket distribution. For some multi-bracket pools this is tractable, and for others it is not. For the latter, we make another simplifying assumption: we search over an intelligently chosen low-dimensional subspace of generating bracket distributions that covers distributions of various levels of entropy. We find that this approach is sufficient to generate well-performing sets of submitted brackets. We also learn the following high-level lessons from this strategy: we should increase the entropy of our bracket predictions as n increases and as our opponents increase their entropy.
We leave much room for future work on the multi-brackets problem. First, it is still an open and difficult problem to find the optimal set of 
n bracket predictions across 
all possible such sets, where optimal could mean maximizing expected maximum score, win probability, or expected profit. Second, in this work, we assume the “true” probabilities 
 and our opponents’ generating bracket strategy 
 exist and are known. A fruitful extension would revisit these problems in the more realistic setting in which these distributions are either estimated from data or unknown (e.g., as in Metel [
7]). Finally, we suggest exploring more problem-specific approaches to particular multi-bracket pools. For instance, in March Madness bracket challenges, we suggest exploring strategies of varying levels of entropy within each round. Perhaps the public’s entropy is too low in early rounds and too high in later rounds, suggesting we should counter by increasing our entropy in earlier rounds and decreasing our entropy in later rounds.