2. Number of Unique Topologies from Combinatorics
Let us estimate how many unique event topologies are expected to be produced in $pp$ collisions at
TeV. Assume LHC collisions produce events with light-flavor jets ($j$), jets associated with $b$-quarks ($b$-jets), electrons ($e$), muons ($\mu$), tau leptons ($\tau$), and photons ($\gamma$). In addition, neutrinos can lead to missing transverse energy, referred to by the acronym “MET” (denoted in the numerical calculation by the letter $m$). All events can be grouped into exclusive classes denoted as
where $N$ is an integer that defines the number of objects of a certain type in a collision event. In the following, the word “object” will be used for jets, $b$-jets, leptons, and photons. In the case of MET, $N$ is either 0 (no significant MET) or 1 (when MET is above 200 GeV). Thus, an event class marked with $1m$ corresponds to one (or several) produced neutrinos. Using this notation,
represents a class of events with large MET ($1m$), two light-flavor jets ($2j$), one $b$-jet ($1b$), and one photon ($1\gamma$).
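The labeling scheme described above can be illustrated with a short Python sketch. The ordering of the object types in the label, the spelled-out names `mu`, `tau`, and `gamma`, and the function name `event_class` are assumptions made for this illustration; only the 200 GeV MET threshold and the counting rule come from the text.

```python
from collections import Counter

# Hypothetical ordering of the object types inside a label (an assumption of
# this sketch, not necessarily the ordering used in the paper).
TYPE_ORDER = ("m", "j", "b", "e", "mu", "tau", "gamma")
MET_THRESHOLD_GEV = 200.0  # from the text: MET above 200 GeV gives 1m


def event_class(objects, met_gev):
    """Build an exclusive event-class label from a list of object types."""
    counts = Counter(objects)
    unknown = set(counts) - set(TYPE_ORDER[1:])
    if unknown:
        raise ValueError(f"unknown object types: {sorted(unknown)}")
    # MET is binary: 0 (no significant MET) or 1 (above threshold).
    counts["m"] = 1 if met_gev > MET_THRESHOLD_GEV else 0
    return " ".join(f"{counts[t]}{t}" for t in TYPE_ORDER)


# Large MET, two light-flavor jets, one b-jet, and one photon.
print(event_class(["j", "j", "b", "gamma"], met_gev=250.0))
# prints: 1m 2j 1b 0e 0mu 0tau 1gamma
```

Two events belong to the same exclusive class exactly when their labels are identical, so counting unique topologies in a sample reduces to counting distinct labels.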
To estimate how many unique topologies are expected from the SM, we produced a sample of Pythia8 version 8.307 [1] MC events with $pp$ collisions at
TeV after enabling all SM processes of this generator. Similar to [2], the simulation used 44 physics sub-processes at leading-order QCD, such as light-flavor dijet production, all top production, weak single and double boson production, prompt photons, and all Higgs SM processes. The cut on the two-body matrix elements in Pythia8 was set to 100 GeV. The total integrated luminosity of the simulation was 154 fb$^{-1}$, i.e., larger than the LHC Run2 data sample of 140 fb$^{-1}$. As light-flavor QCD dijets cannot create the required complexity of the event classes, the generation of such events was relatively suppressed compared with other sub-processes with lower cross sections. The total number of generated events was 0.53 billion. Stable particles with a lifetime larger than
seconds were considered, while neutrinos were excluded from the consideration. The NNPDF 2.3 LO [3] parton density function, interfaced with Pythia8 via the LHAPDF library [4], was used. A detector simulation was not applied. The object reconstruction was the same as in [2]. Hadronic jets were reconstructed using the anti-$k_T$ algorithm [5] with a distance parameter of
implemented in the FastJet package [6]. The transverse momenta ($p_T$) of the jets must be greater than 20 GeV, and the pseudorapidity ($\eta$) must satisfy
. A jet is classified as a $b$-jet if its four-momentum matches the momentum of a $b$-quark, and the $b$-quark contributes more than 50% of the total jet energy. Leptons and photons are required to be isolated. A cone of size 0.2 in azimuthal angle ($\phi$) and $\eta$ is defined around the true direction of the lepton, and the energies of all particles inside this cone are summed. A lepton is considered isolated if it carries more than 90% of the cone energy. The transverse momentum cut and the $\eta$ cut were the same as for the jets. The requirement for MET was 200 GeV, i.e., the threshold at which $0m$ becomes $1m$ in the symbolic calculation.
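The isolation requirement described above can be sketched in Python as follows. This is a minimal truth-level illustration, not the code used for the study: the `Particle` container and the function names are hypothetical, and the sketch assumes that the list of particles passed to `is_isolated` includes the lepton itself, so that the cone energy contains the lepton energy.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    energy: float  # GeV
    eta: float     # pseudorapidity
    phi: float     # azimuthal angle, radians


def delta_r(a, b):
    """Distance in the (eta, phi) plane, with phi wrapped to [-pi, pi]."""
    dphi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)


def is_isolated(lepton, particles, cone=0.2, min_fraction=0.9):
    """True if the lepton carries more than min_fraction of the cone energy.

    `particles` is assumed to contain all stable particles, lepton included.
    """
    cone_energy = sum(p.energy for p in particles if delta_r(lepton, p) < cone)
    return lepton.energy > min_fraction * cone_energy


lepton = Particle(energy=50.0, eta=0.0, phi=0.0)
soft_neighbour = Particle(energy=2.0, eta=0.1, phi=0.05)
print(is_isolated(lepton, [lepton, soft_neighbour]))
```

In this example the cone contains 52 GeV, of which the lepton carries 50 GeV (about 96%), so the lepton passes the 90% requirement.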
According to the above SM MC simulation, the number of non-identical event topologies was 3537. The maximum number of observed objects was
,
,
,
,
, and
. We did not observe more than 20 objects per event. The total number of light-flavor jets and $b$-jets was always less than 19. In addition, the total number of leptons was never larger than 5. Together, these restrictions can be called the “boundary” condition, which limits the number of possible event classes. They are summarized below:
where
is the total number of leptons. Events with
and
(but with transverse momenta larger than
GeV used in this paper) and four-lepton events have recently been studied at the LHC [7, 8]; therefore, our MC simulation should be a reasonable representation of reality.
We will keep the conservative view that a BSM phenomenon does not violate the boundary condition; otherwise, it could easily be found by looking at inclusive single-particle distributions of identified particles or jets. For example, events with five muons would already have alerted observers in the past, so such high-multiplicity events do not pose an experimental challenge for detection. New phenomena can, however, be “hidden” in exclusive combinations of objects, which are more difficult to discover experimentally. We will come back to this point later.
Let us calculate how many combinations are expected when the SM boundary condition, Equation (1), is preserved. First, we set the maximum number of objects to be observed to
. There are up to
objects per event (where MET is counted as an additional “object”). The total number of combinations, where items can be repeated more than once and the ordering of items is not important, is
Thus, the total number of unique combinations is 888,022. Imposing the boundary condition, Equation (1), from the SM MC simulation is not straightforward analytically. However, the count can be obtained numerically, as shown in Appendix A, which gives 19,497 combinations.
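The two counting steps can be sketched in Python. This is an illustration under stated assumptions, not the Appendix A code: it assumes seven object types (six jet or particle types plus MET) and at most 20 objects in total, and the per-type caps and the extra rule in the second part are hypothetical placeholders, because the actual values of Equation (1) are not reproduced here. Under these assumptions the closed-form count is $\binom{27}{7} = 888{,}030$, close to the 888,022 quoted above; the small difference presumably reflects a convention of the original formula that this sketch does not capture.

```python
from itertools import product
from math import comb


def count_unrestricted(n_types, max_total):
    """Multiplicity vectors of length n_types whose entries sum to <= max_total.

    Stars-and-bars result: C(max_total + n_types, n_types).
    """
    return comb(max_total + n_types, n_types)


def count_with_boundary(max_per_type, max_total, extra=lambda v: True):
    """Brute-force count with per-type caps, a total cap, and an extra rule."""
    ranges = [range(cap + 1) for cap in max_per_type]
    return sum(1 for v in product(*ranges)
               if sum(v) <= max_total and extra(v))


# Assumption: 7 object types, at most 20 objects per event.
print(count_unrestricted(7, 20))  # prints: 888030

# Hypothetical caps for (j, b, e, mu, tau, gamma, m); NOT the paper's values.
toy_caps = (4, 3, 2, 2, 1, 2, 1)


def toy_rule(v):
    j, b, e, mu, tau, gamma, m = v
    # Placeholder limits on the jet and lepton totals.
    return j + b < 6 and e + mu + tau <= 3


print(count_with_boundary(toy_caps, max_total=8, extra=toy_rule))
```

The brute-force enumeration is adequate for small caps; for caps close to 20 per type, a recursive count that prunes on the running total would be preferable.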
The difference between the MC prediction (3537) for SM processes and what, potentially, can be expected for the number of event classes from combinatorics (19,497) demonstrates that the MC event sample does not include all possible event classes. For example, event topologies such as
have never been seen in the generated event sample for SM processes. The author does not know which BSM scenario can lead to such event classes. Note that the event topologies are defined in a restricted phase space, i.e., in the limited kinematic region defined by the transverse momentum and pseudorapidity selection. Thus, such event topologies do not violate charge, lepton number, energy-momentum conservation, or other constraints. The difference of about a factor of 5 between the number of event classes predicted by the Pythia8 generator and by the numeric combinatorics may indicate that the event generation requires more events. In addition, not all physics processes are included in the event generator. For example, next-to-leading order QCD effects may be in play. It should be noted that Pythia8 agrees well with alternative MC simulations up to six jets [9], but the other event topologies need to be verified too. We will put this question aside and assume that the total number of possible event classes is 19,497, as derived in the numeric computation with the Pythia8 boundary condition, Equation (1), rather than the number of event classes predicted by Pythia8 itself. More realistic event generators may reduce the discrepancy between our numeric estimate and the generator predictions for the number of unique event classes, but they cannot change the conclusion of this paper, which does not rely on MC simulations.
How can we be sure that previous LHC studies were able to explore all such event topologies? According to the publication record of the ATLAS and CMS experiments, $pp$ collision events have been studied in about 600 publicly available results using 140 fb$^{-1}$ of data. For the sake of argument, let us assume that 5 non-identical event classes were scrutinized in each paper,$^{1}$ and that they were found to match the SM predictions. This yields 3000 investigated event classes. Note that one ATLAS publication [10] contains studies of more than 700 event classes, but that analysis used a small fraction of the LHC Run2 data, and these event classes are expected to be a subset of the 3000 event topologies assumed above. Therefore, the number of unexplored event topologies, out of 19,497 expected, is close to
.
If one considers fully reconstructed (identified) heavy SM particles, such as $Z$ and $W$ bosons and top quarks, the number of event classes increases. This can easily be checked by adding these particles to our numeric notation after reducing the maximum multiplicities of leptons from 4 to 3 (i.e., accounting for decays) and reducing the number of jets ($b$-jets) by one. We also require that the total number of $W$, $Z$, and top quarks not exceed 3; this boundary condition makes the example more realistic. In this case, the number of event topologies increases to more than 140,000.
3. Discussion
When discussing the coverage of the event classes by the LHC studies, it is assumed that new phenomena predominantly contribute to a single event class, rather than to many event classes. Given our model-agnostic approach, this assumption is quite reasonable, since we do not know much about what can be expected from BSM physics.
From the standpoint of QCD, even if a BSM model is characterized by a very specific event class (say, with a fixed number $N$ of jets plus some fixed number of leptons and photons), additional event classes with extra jets can be produced by the parton shower. The rate of events with $N$ + 1 jets is suppressed with respect to events with $N$
jets by the strong coupling constant $\alpha_s$ (times the number of jets). However, ignoring softer jets involves the additional supposition that there is nothing interesting in high-jet-multiplicity events because they originate from QCD parton showering. This assumption is incompatible with a general search strategy, as it must involve an undefined cutoff parameter that limits the number of jets and the entire scope of model-agnostic searches. This is why an exclusive approach to jet multiplicity has been adopted by the ATLAS general searches [10].
From an experimental perspective, it is not unreasonable to think that some inclusive measurements may have a certain sensitivity to the 19,497 event classes reported for the condition Equation (1). This is because many distributions studied at the LHC are a “mix” of different event classes. In our view, inclusive measurements cannot effectively pinpoint a specific event topology produced with a small cross-section. Generally, searches in events with exclusive definitions of jets and particles, where any event class with a fixed jet multiplicity is treated as a unique hadronic final-state signature, are better motivated. For example, it is difficult to understand how an inclusive two-jet measurement can pinpoint an event class with two additional jets and a few leptons, as shown in Equation (3), which may have a cross-section several orders of magnitude smaller than that of the inclusive two-jet measurements. Thus, it is necessary to carry out dedicated measurements focusing on such exclusive event topologies.
7. Kinematic Consideration
It is more difficult to address the kinematic side of the argument, beyond the object-multiplicity combinatorics. So far, we have assumed that all objects are produced in any detector region, following some density distributions expressed in terms of $p_T$, $\eta$, and $\phi$, and that only the composition of their multiplicities can separate one event topology from another. It is natural to expect that some BSM phenomena may also be distinguished from SM events by their distinct kinematics. For example, heavy particles can predominantly decay into two jets or particles in the central detector region, whereas other BSM models may “prefer” to populate the forward detector regions.
We will use a simple toy consideration to calculate the number of possible kinematic features using combinatorics with substitution. Assume that all objects in the 19,497 distinct event classes populate the detector phase space according to the SM expectations. We define a new phenomenon as one in which two objects are produced close to each other, i.e., they are the decay products of low-mass states.$^{2}$ Such objects are still counted as two separate objects, but they form ensembles of kinematically unique events, and their production rate should be larger than that obtained from pure statistical noise around the SM-defined densities.
Let us count how many such unique kinematic topologies can exist by grouping jets and particles. For example, consider the event topology with one jet, one $b$-jet, and one electron, such as:
where we shorten the notation after removing $0m$, $0\mu$, $0\tau$, and $0\gamma$. This event topology creates three kinematically distinct classes:
where the four-character strings,
,
, and
, represent three two-body groups with a certain dynamic correlation between the objects in each group. For example, such objects can be close to each other in a statistically significant number of events because they stem from exotic low-mass states. Experimentally, these three combinations can be viewed as invariant masses of jet+($b$-jet), $e$+($b$-jet), and jet+$e$, with associated production of other objects anywhere in the detector following the SM single-particle densities. We can now ask: how many such sub-classes of events exist among the 19,497 total combinations? The number obtained using numeric combinatorics is 159,674 (see Appendix A).
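The two-body grouping for a single event class can be sketched in Python. This is a hedged illustration rather than the Appendix A code: it assumes that only jets, $b$-jets, leptons, and photons are paired (MET is left out), that two objects of the same type form a group only when at least two of them are present, and the `a+b` labels and the function name `two_body_groups` are hypothetical.

```python
from itertools import combinations_with_replacement


def two_body_groups(multiplicities):
    """List the distinct two-body groups available in one event class.

    `multiplicities` maps an object type to its count in the class.
    """
    present = [t for t, n in multiplicities.items() if n > 0]
    groups = []
    for a, b in combinations_with_replacement(present, 2):
        # A same-type pair needs at least two objects of that type.
        if a == b and multiplicities[a] < 2:
            continue
        groups.append(f"{a}+{b}")
    return groups


# One jet, one b-jet, and one electron: three two-body groups.
print(two_body_groups({"j": 1, "b": 1, "e": 1}))
# prints: ['j+b', 'j+e', 'b+e']
```

Summing `len(two_body_groups(c))` over all allowed event classes would give the total number of two-body sub-classes under these assumptions; whether this reproduces the quoted 159,674 depends on the exact pairing convention of the original calculation.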
As before, now we need to estimate how many two-body distributions have been analyzed at the LHC. We assume that for each of the 3000 event classes studied at the LHC, at least one relevant two-body kinematic distribution (such as an invariant mass) has been inspected, and no deviations from the SM have been found. Therefore, for the expected 159,674 event classes with two-body correlation, the chances that the LHC will encounter one of these topologies, which may have an excess over the SM background, are about 2%. This assumes that such events with correlations are explored uniformly across all the event topologies. This estimate can only be used as a conservative guide or as an upper limit on the actual LHC coverage of new phenomena, as this calculation does not consider charge topologies, correlations beyond the two-prong decays, known heavy SM particles, or other possibilities.
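The arithmetic behind the quoted coverage is a simple ratio, shown here as a short Python check. The function name `coverage` is introduced only for this illustration; the inputs are the 3000 studied event classes and the 159,674 two-body sub-classes given in the text.

```python
def coverage(explored, total):
    """Fraction of classes already explored, assuming uniform exploration."""
    return explored / total


fraction = coverage(3000, 159_674)
print(f"{fraction:.1%}")  # prints: 1.9%
```

The result, 1.9%, rounds to the "about 2%" stated above.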
For the boundary condition Equation (4), which is motivated by recent LHC studies, the calculated number of two-particle sub-classes is 53,108. This leads to
kinematic distributions potentially explored at the LHC.
8. Conclusions
The modern approach to searches for new physics at the LHC is usually based on event signatures proposed by model builders. It is quite clear that the LHC has good coverage of event topologies with low jet/particle multiplicities and hard-QCD jets. For events with large multiplicities, where jets are treated exclusively, the experimental coverage of the LHC is limited.
Nature can be more unpredictable, and more model-agnostic approaches can also be useful for discovering new physics in the LHC data. Our numeric analysis, guided by large-scale SM simulations, reveals that the non-observation of new phenomena at the LHC is not surprising. If a BSM signal with unusual two-particle correlations can equally be found in any of the event classes discussed in this paper, then the chance that the LHC could detect such a new phenomenon is rather small, that is, about (or ), depending on the boundary condition used in the numeric calculation. If we are only interested in jet/particle multiplicities, then the fraction of unexplored event classes is 81% (55%), leading to a probability of 19% (45%) for the observation of a new event topology at the LHC. These estimates assume that a BSM phenomenon can equally be found in any of those unexplored event classes and that jets are treated exclusively. These values represent upper bounds on the probability of finding new phenomena because only the lightest identified particles were taken into account, and the calculation of kinematically distinct event classes includes only one feature (i.e., two-particle correlations). Despite the approximate nature of our calculations, they represent the first quantitative estimates obtained under the assumptions proposed in this paper. Thus, the LHC is still at the beginning of the journey to discover new physics.
As we mentioned in the introduction, this study discusses the purely theoretical side of the problem, namely, the number of exclusive event classes that could potentially exist for LHC collisions within the reasonable boundary conditions expected from truth-level MC simulations and the initial LHC studies [7, 8]. The question of experimental feasibility has also been partially addressed before, but to arrive at a realistic conclusion, full simulations of the detectors are required. Studies of this kind cannot easily be conducted using fast detector simulations with smearing of truth-level particles, as this approach cannot mimic the jet reconstruction purity and fake rates for leptons.
Another caveat is that LHC searches for new physics use event signatures that extend beyond simple object counting or even two-particle correlations (such as the two-body invariant masses discussed in this paper). The relevance of existing studies to the signatures explored here is not straightforward to determine. On the other hand, the presence of new signatures beyond those discussed in this paper, such as three-particle correlations or signatures associated with searches for effective field theory operators, could expand the scope of unexplored phase space.
On the experimental side, when searches are performed in the large number of event classes discussed in this article, special care should be taken in addressing the “look elsewhere” effect, which reduces the statistical significance of potential excesses due to the large number of event classes. This effect needs to be incorporated into the statistical tools used to evaluate the significance of possible deviations from SM backgrounds. This question can only be answered using experimental data and MC simulations of all relevant physics processes, where background rates and detector effects are well controlled.
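The size of the look-elsewhere effect can be illustrated with the standard trials-factor formula for independent tests, $p_{\mathrm{global}} = 1 - (1 - p_{\mathrm{local}})^{N}$. The Python sketch below is only a rough guide under the assumption that the event classes are statistically independent, which real analyses would need to verify; the function names are hypothetical.

```python
import math


def local_p_from_sigma(z):
    """One-sided Gaussian tail probability for a z-sigma excess."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def global_p_value(p_local, n_trials):
    """Probability of at least one excess this large in n independent trials.

    Computes 1 - (1 - p_local)**n_trials in a numerically stable way.
    """
    return -math.expm1(n_trials * math.log1p(-p_local))


n_classes = 19_497  # number of event classes quoted in the text
for z in (3.0, 5.0):
    p_loc = local_p_from_sigma(z)
    print(z, p_loc, global_p_value(p_loc, n_classes))
```

With 19,497 classes, a local 3-sigma excess in some class is practically guaranteed by chance alone, which is why the global significance must be evaluated before any deviation is interpreted.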
To tackle the problem of searching for new phenomena in the vast number of possible event topologies reported in this paper, novel methods of data analysis that rely less on expectations from BSM models should be widely used. For example, unsupervised machine-learning methods can automatically label unusual event classes as anomalies. Such anomalous events can then be compared with the SM predictions. Only very recently have the LHC experiments [11, 12] started a physics program using anomaly detection and fully unsupervised machine learning for complete event kinematics. For studies of the multiplicities of event classes, one can train a neural network to reproduce the shape of the event-class rates as a function of their multiplicities using a small fraction of data or some control region. Comparing such shapes with actual data would provide a useful tool for understanding the “missing information” problem. We hope that machine learning or novel event-counting methods, aimed at discovering model-independent new signatures within the exclusive event classes that constitute LHC events, will reach their full potential in uncovering new physics in the near future.