1. Introduction
Measurements of physical characteristics of various natural phenomena can, in many cases, be considered as a sampling of a stochastic process in the time and magnitude domains. This sampling produces time series which reflect the dynamics of the underlying physical process. From these measurements, sequences of record-breaking events with respect to their size (magnitude) can be extracted, where each such event is larger (smaller) than all previous events [1,2,3,4,5]. The most prominent example is daily temperature measurements and the corresponding high (low) temperature records set during a particular historical time interval. From the sequence of daily temperatures it is possible to extract a subsequence of record-breaking temperatures. The analysis of these record-breaking temperatures is of critical importance for understanding future trends and variations in weather patterns and climate change [3,6,7,8,9,10,11].
Record-breaking events extracted from physical measurements, thought experiments, or computer simulations form a subsequence in time and are distinguished based on their magnitudes. At a given time step, a record-breaking event is defined as an event larger than all previous events; among the subsequent events, the first one that exceeds the current record becomes the next record-breaking event [1,3,4,5,8]. The statistical analysis of records was developed for sequences of independent and identically distributed (i.i.d.) random variables and is based on extreme value statistics [1,4,5,12]. For such i.i.d. sequences it is known that some statistical measures of records are independent of the underlying distribution and can be derived analytically [2]. However, the effects of correlations and memory between events introduce complications into the theory.
In recent years, some progress has been made in analyzing the record-breaking statistics of i.i.d. random sequences with time-varying underlying distributions, as well as of non-i.i.d. random sequences with correlations. The daily record temperatures in Philadelphia were analyzed to establish the trends and correlations in their variations [7]. Records drawn from independent random variables but with progressively broadening or sharpening distributions were investigated [9]. To consider the effects of correlations in time series, record-breaking events were extracted from sequences generated by random walks and Lévy flights [13]. Record-breaking events were also observed and studied in models and experiments describing processes of rupture and failure [14,15,16,17,18,19].
The standard model of the occurrence of records assumes that a single event is added at each time step. A generalization of this model can also be considered in which the number of events grows stochastically in time; in particular, several models with deterministic growth of the number of events were analyzed [20]. The effects of long-term correlations were studied in the context of extreme events to quantify how the distribution of maxima is affected by the length of the time series and the presence of persistence [21]. The same authors also analyzed the statistics of return intervals between extreme events extracted from long-term correlated time series [22].
Record-breaking statistics has also been applied to seismicity. To analyze clustering in both space and time, record-breaking statistics was used to quantify the recurrence times between earthquakes [23]. This was generalized to events occurring in space and time by analyzing their recurrences, which form a record-breaking process [24]. By assuming that global earthquakes are independent and their magnitudes follow an exponential distribution, sequences of record-breaking earthquakes were extracted and analyzed for worldwide earthquakes with magnitudes above a given threshold [25,26]. Record-breaking events can also be studied in the context of natural time analysis [27,28,29,30]. Recently, there has been interest in developing forecasting or nowcasting approaches for natural seismicity, where record-breaking events can also play a prominent role [31,32,33,34,35,36,37,38].
In the present work, sequences of i.i.d. random numbers following the Weibull distribution were generated, and the corresponding subsequences of record-breaking events were extracted to analyze their statistical properties. The main goal of the work was to derive analytically, and confirm through numerical Monte Carlo simulations, several statistical measures describing the distribution of magnitudes and the temporal structure of record-breaking events. The temporal structure of the record-breaking events does not depend on the underlying distribution function from which the records are extracted. In this work, a convolution operation was used to derive a recursive formula for the distribution of the occurrence times of records. In addition, a non-normalized cumulative log-normal distribution function was used to approximate the average occurrence time of the kth record. On the other hand, the distribution of magnitudes and the corresponding averages of record-breaking events are distribution specific. In this respect, the Weibull distribution was used to study several statistical measures of records. As a result, the distribution of record magnitudes, the average magnitude of the kth record, and the average record value at a given time step were derived analytically and confirmed through numerical simulations.
The Weibull distribution plays a prominent role in studies of various problems in physics, geophysics, and engineering. It has been reported that the interoccurrence times of characteristic earthquakes on a single fault follow the Weibull distribution [39,40]. It has also been shown that the recurrence statistics of long-range correlated time series follow the stretched exponential distribution, which is the Weibull distribution with a shape parameter between zero and one. The stretched exponential distribution also plays an important role in the context of nucleation phenomena [41].
The paper has the following structure. In Section 2, the basic known facts concerning the statistics of record-breaking events extracted from sequences of i.i.d. random variables are introduced, and several fundamental expressions for different measures of records are derived. In Section 3, the analysis of record-breaking events generated from the Weibull distribution is presented; several analytical results are derived and confirmed through numerical simulations. Section 4 concludes the analysis.
2. Statistics of Record-Breaking Events
In this section, I provide an overview of several known fundamental statistical measures that characterize record-breaking events. These measures are independent of the underlying distribution from which the records are drawn and are valid when the events are i.i.d. In addition, I derive an expression for …
Physical observations or computer simulations can produce a sequence of measurements of a particular observable at specific instances of time. Examples abound: daily temperature measurements, concentrations of carbon dioxide in the atmosphere, flood areas, sports records, the occurrence of earthquakes and volcanic eruptions, etc. These measurements can be considered as a stochastic variable. A record-breaking event up to time n has the largest magnitude among all previous events [1]. A subsequent event becomes a record-breaking one if it exceeds the current record-breaking event. In this work, a discrete time is assumed to mark the times of the occurrence or generation of events. A subscript k is used to label the record-breaking events in a sequence; for example, x_k specifies the magnitude of the kth record-breaking event, which occurs at time step n_k. As a result, for a given sequence of random events one can extract the subsequence of record-breaking events x_1 < x_2 < … < x_k < …, occurring at times n_1 < n_2 < … < n_k < ….
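The extraction of the record subsequence described above can be sketched in a few lines of code (a minimal illustration, not part of the original analysis; the Weibull shape parameter and sequence length are arbitrary choices):

```python
import numpy as np

def extract_records(x):
    """Return (times, magnitudes) of the record-breaking events in x.

    Times are 1-based step indices; the first event is always a record.
    """
    times, magnitudes = [], []
    current_max = -np.inf
    for n, value in enumerate(x, start=1):
        if value > current_max:   # strictly exceeds all previous events
            current_max = value
            times.append(n)
            magnitudes.append(value)
    return np.array(times), np.array(magnitudes)

rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=1000)   # i.i.d. draws from an assumed Weibull shape
n_k, x_k = extract_records(x)
assert n_k[0] == 1                # the first event is a record by definition
assert np.all(np.diff(x_k) > 0)   # record magnitudes increase monotonically
```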
Several fundamental measures can be defined to study the statistics of record-breaking events. The theory of record-breaking events typically assumes that the events from which the records are extracted are i.i.d. random numbers [4,5] drawn from a given distribution with a density function f(x). The distribution can be bounded or unbounded depending on the problem; record-breaking events extracted from a bounded distribution are bounded as well. The probability for the records not to exceed x can be written as

F(x) = \int_{x_l}^{x} f(y)\, dy,   (1)

where x_l specifies the lower bound of the distribution function.
2.1. Frequency-Magnitude Statistics of Record-Breaking Events
It is possible to compute the distribution of record magnitudes for each order k. The probability density function for the kth record has the form [7]:

p_k(x) = f(x) \int_{x_l}^{x} \frac{p_{k-1}(y)}{1 - F(y)}\, dy,   (2)

where F(x) is the distribution function given in Equation (1) from which the records are drawn. Equation (2) is a recursive formula to compute the distribution of magnitudes for the kth record given the distribution for the (k−1)st record-breaking event. The distribution of the first record, p_1(x), coincides with the distribution from which the random variables are drawn, p_1(x) = f(x). For the second record order, k = 2, by noticing that p_1(y) = f(y), one can compute

p_2(x) = f(x) \int_{x_l}^{x} \frac{f(y)}{1 - F(y)}\, dy = -f(x) \ln[1 - F(x)].   (3)

This generalizes to an arbitrary order k, and it can be proved by induction that the general form of Equation (2) is [42]

p_k(x) = f(x)\, \frac{\{-\ln[1 - F(x)]\}^{k-1}}{(k-1)!}.   (4)

Equation (4) is valid for records drawn from i.i.d. random variables with the underlying distribution function F(x).
Using the above-derived Equation (4), one can also compute the average magnitude, ⟨x_k⟩, of the kth record-breaking event:

\langle x_k \rangle = \int_{x_l}^{x_u} x\, p_k(x)\, dx,   (5)

where [x_l, x_u] is the support of the distribution function f(x). In the next section, I will show that the integral in Equation (5) can be computed exactly in the case of Weibull distributed random variables.
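As a concrete check, for the unit exponential distribution, F(x) = 1 − e^{−x}, the general i.i.d. record density reduces to a Gamma(k, 1) density, so the average magnitude of the kth record is simply k. A short Monte Carlo sketch of this special case (the trial count and sequence length are arbitrary simulation choices):

```python
import numpy as np

# Monte Carlo check of the k-th record magnitude in the exponential special
# case F(x) = 1 - exp(-x), where the record density is Gamma(k, 1) with
# mean <x_k> = k. Finite sequence length biases the estimate slightly
# downward, echoing the finite-size effects discussed in the text.
rng = np.random.default_rng(1)
k, trials, T = 3, 2000, 5000

kth_records = []
for _ in range(trials):
    x = rng.exponential(size=T)
    running_max = np.maximum.accumulate(x)
    mags = np.unique(running_max)      # distinct record magnitudes, increasing
    if len(mags) >= k:
        kth_records.append(mags[k - 1])

mean_k = float(np.mean(kth_records))
print(mean_k)   # close to k = 3, slightly below due to truncation at T
```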
Similarly, the average magnitude ⟨x(n)⟩ of a record-breaking event at a given time step n has the form:

\langle x(n) \rangle = \int_{x_l}^{x_u} x\, p(x, n)\, dx,   (6)

where p(x, n) specifies the probability density function for the record values at time n. Therefore, p(x, n)\,dx is the probability that the magnitude of the record lies between x and x + dx at time step n. This probability density function p(x, n) is related to F(x) as [8]

p(x, n) = n\, [F(x)]^{n-1} f(x).   (7)

Noticing that n [F(x)]^{n-1} f(x)\, dx = d[F(x)]^n, Equation (6) can be written in the following form

\langle x(n) \rangle = \int_{x_l}^{x_u} x\, d[F(x)]^n.   (8)

The reviewed results, Equations (4)–(8), are valid only for i.i.d. random variables.
2.2. Temporal Structure of Record-Breaking Events
To characterize the occurrence of records in time, one can estimate the average number of record-breaking events, ⟨N_n⟩, that occurred up to a time step n. The quantity N_n is a random variable. In the case of i.i.d. events from which the sequence of record-breaking events is extracted, the probability that the jth event is a record is 1/j [4]. Therefore, the probability decreases harmonically with increasing time steps.
It can be shown that the average number of records, ⟨N_n⟩, is [4]

\langle N_n \rangle = H_n = \sum_{j=1}^{n} \frac{1}{j} = \ln n + \gamma + O(1/n),   (9)

where γ ≈ 0.5772 is the Euler–Mascheroni constant and H_n is a harmonic number [43]. This signifies that the average number of records increases as ∼ ln n for large n and is independent of the underlying distribution function f(x). When the records are extracted from processes with memory or long-range correlations, the average number can deviate from the growth given in Equation (9).
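This logarithmic growth is easy to verify numerically. In the sketch below (sample sizes are arbitrary choices), records are detected by comparing each entry with the running maximum, and the empirical mean count is compared with the harmonic number:

```python
import numpy as np

# Empirical average number of records up to step n versus the harmonic
# number H_n = sum_{j=1}^{n} 1/j ~ ln(n) + 0.5772.
rng = np.random.default_rng(2)
trials, n = 5000, 1000

x = rng.random((trials, n))                        # any continuous i.i.d. draws work
is_record = x == np.maximum.accumulate(x, axis=1)  # True exactly at record positions
mean_records = is_record.sum(axis=1).mean()

H_n = (1.0 / np.arange(1, n + 1)).sum()
print(mean_records, H_n)   # both close to ln(1000) + 0.5772 ≈ 7.49
```

Any continuous distribution gives the same answer, since only the ranks of the events matter.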
To quantify the variability of the number of records, one can compute the variance [9]

\mathrm{Var}(N_n) = \sum_{j=1}^{n} \frac{1}{j}\left(1 - \frac{1}{j}\right) = H_n - H_n^{(2)},   (10)

where H_n^{(2)} = \sum_{j=1}^{n} 1/j^2. For i.i.d. record-breaking events, the ratio of the variance, Equation (10), to the mean, Equation (9), approaches unity as n → ∞, and the distribution of the number of records at each time step n becomes Poisson with the mean value ln n. This signifies that the occurrence of records follows a log-Poisson process [9].
The distribution of times between two subsequent record-breaking events (interevent times) characterizes the process of the occurrence of records. The interevent time between the kth and (k+1)st record-breaking events is defined as m_k = n_{k+1} − n_k. For records drawn from i.i.d. random events, the distribution of interevent times is independent of the underlying distribution of magnitudes, and the corresponding non-normalized histogram follows a power law, ∼ 1/m [8]. This power-law histogram is obtained by considering all interevent times between successive records in a given sequence.
In addition, it is possible to consider the probability that the kth record is broken after m time steps. The corresponding distribution function, P(k, m), provides a more detailed structure of the times between consecutive records [5,7]:

P(k, m) = \int_{x_l}^{x_u} dx\, f(x) \int_{x_l}^{x} dy\, p_k(y)\, [F(y)]^{m-1},   (11)

where F(x) is the underlying distribution function of the random variables from which the records are drawn. The expression p_k(y) [F(y)]^{m-1} f(x) gives the probability that the previous, kth, record has magnitude y and is broken after m time steps by the new, (k+1)st, record of value x > y. Equation (11) is obtained by averaging this probability over all possible values of x.
The probability that the 1st record (k = 1) is broken after m time steps, P(1, m), can be computed explicitly. This can be achieved by using the fact that p_1(x) = f(x). Substituting this into Equation (11) and performing integration by parts, one has

P(1, m) = \int_{x_l}^{x_u} f(x)\, [F(x)]^{m-1} [1 - F(x)]\, dx = \frac{1}{m(m+1)}.   (12)

This distribution is a power law; as a result, the average interevent time between the first and the second records is infinite, \sum_m m\, P(1, m) = \infty.
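The heavy tail of this waiting-time distribution is visible directly in simulation. The sketch below (the underlying Weibull shape and the sample sizes are arbitrary choices) measures how long the first record survives and compares the result with the standard i.i.d. expression 1/(m(m+1)):

```python
import numpy as np

# Waiting time m until the first record (the first event) is broken,
# compared with the i.i.d. result P(1, m) = 1/(m(m+1)). The underlying
# distribution is irrelevant; a Weibull is used for illustration.
rng = np.random.default_rng(3)
trials, T = 20000, 500

x = rng.weibull(0.8, size=(trials, T))
exceeds = x[:, 1:] > x[:, :1]      # does step 1+j exceed the first event?
broken = exceeds.any(axis=1)       # first record broken within T steps
waits = exceeds.argmax(axis=1)[broken] + 1

for m in (1, 2, 3):
    print(m, float(np.mean(waits == m)), 1.0 / (m * (m + 1)))
```

Conditioning on the record being broken within T steps removes only ≈ 1/(T+1) of the probability mass, so the truncation bias is small.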
Using Equation (4), Equation (11) can be rewritten in terms of the cumulative distribution function u = F(x), with the result

P(k, m) = \int_{0}^{1} \frac{[-\ln(1-u)]^{k-1}}{(k-1)!}\, u^{m-1} (1-u)\, du.   (13)

The integration can be performed explicitly, yielding a closed-form expression for P(k, m), Equation (14) [42]. For k = 2 and 3, one obtains Equations (15) and (16), which can be written in terms of harmonic numbers and the digamma function [43]. The obtained results illustrate that, for records drawn from i.i.d. random variables, the probability distribution P(k, m) is independent of the magnitude distribution f(x) of the underlying events.
Finally, it is also possible to define and analyze the probability distribution, P_k(n), for the occurrence of the kth record-breaking event at a given time step n. Knowing this probability distribution, one can compute the average time ⟨n_k⟩ of the occurrence of the kth record-breaking event. The first event is always a record-breaking event; as a result, P_1(n) = δ_{n,1} and ⟨n_1⟩ = 1. The distribution for the time of the occurrence of the second record-breaking event, P_2(n), is the same as the distribution, P(1, m), of interevent times between the first (k = 1) and second (k = 2) records, shifted by one time step: P_2(n) = P(1, n − 1). The occurrence time of the kth record, n_k, is a random variable. It can be computed as a sum of two random variables, n_k = n_{k−1} + m_{k−1}, where n_{k−1} is the occurrence time of the (k−1)st record and m_{k−1} is the interevent time between the (k−1)st and kth records. Therefore, the distribution of the occurrence times of the kth record, P_k(n), can be computed recursively using the discrete convolution of the two densities P_{k−1}(j) and P(k−1, m), with the result:

P_k(n) = \sum_{j=k-1}^{n-1} P_{k-1}(j)\, P(k-1, n-j).   (17)

In practice, this distribution cannot be evaluated explicitly, except for k = 2, where it coincides with P(1, n − 1). Instead, one can use a numerical approximation by using long but finite sequences of events to perform the convolution operation and compute the distributions recursively for specific values of k. This can also be achieved through Monte Carlo simulations of events drawn from a known distribution to compute the distributions of the occurrence times directly. This is illustrated in the next section.
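A Monte Carlo estimate of the occurrence-time distributions is straightforward. The sketch below (all parameter values are arbitrary choices) records, for each simulated sequence, the step at which the kth record occurs; for k = 2 the occurrence time is one step plus the waiting time of the first record, so the probability of occurrence at step 2 should be close to 1/2:

```python
import numpy as np

# Monte Carlo estimate of P_k(n), the distribution of the occurrence time
# of the k-th record, illustrated for k = 2.
rng = np.random.default_rng(4)
trials, T, k = 10000, 1000, 2

x = rng.random((trials, T))
is_rec = x == np.maximum.accumulate(x, axis=1)
counts = np.cumsum(is_rec, axis=1)
reached = counts[:, -1] >= k                      # k-th record occurs within T
n_k = (counts[reached] >= k).argmax(axis=1) + 1   # 1-based occurrence time

print(float(np.mean(n_k == 2)))   # empirical P_2(2), close to 0.5
```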
Next, I consider the record-breaking events extracted from the sequences of random variables drawn from the Weibull distribution.
3. Weibull Record-Breaking Events
A particular example of record-breaking events can be analyzed by constructing a sequence of i.i.d. random numbers drawn from the Weibull distribution. The probability density function, f(x), of the Weibull distribution is

f(x) = \frac{\beta}{\tau} \left(\frac{x}{\tau}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\tau}\right)^{\beta}\right],   (18)

where β and τ are the shape and scaling parameters, respectively. When β = 1, this reduces to the exponential distribution. In the case of 0 < β < 1, it defines the stretched-exponential distribution. The corresponding distribution function is given by

F(x) = 1 - \exp\left[-\left(\frac{x}{\tau}\right)^{\beta}\right].   (19)

In order to investigate various statistical measures of record-breaking events drawn from the Weibull distribution, I performed Monte Carlo simulations and compared the numerical results with the theoretical ones. As stated in the previous section, the temporal structure of record-breaking events is independent of the underlying distribution. On the other hand, the distribution of magnitudes and the corresponding averages are not, and in the case of the Weibull distribution they can be derived analytically. This is illustrated in this section.
First, I illustrate the known results for the evolution of records drawn from an arbitrary underlying distribution. The average number of record-breaking events ⟨N_n⟩, which occurred up to time step n, is shown in Figure 1a and follows Equation (9). Next, the index of dispersion of the number of records, defined as the ratio of the variance to the mean number of records, is shown in Figure 1b as solid symbols; it is computed as the ratio of Equation (10) to Equation (9) at each time step n.
The probability density function, p_k(x), for the magnitude of the kth record-breaking event can be computed analytically using Equations (4), (18), and (19) and has the form:

p_k(x) = \frac{\beta}{\tau} \left(\frac{x}{\tau}\right)^{\beta k - 1} \frac{\exp[-(x/\tau)^{\beta}]}{(k-1)!},   (20)

with

-\ln[1 - F(x)] = \left(\frac{x}{\tau}\right)^{\beta}.   (21)

The comparison of the Monte Carlo simulations of the Weibull random variables and Equation (20) for several record orders k is given in Figure 2a for fixed values of β and τ. This confirms the validity of the simulation results. It also shows deviations from the theoretical distributions given by Equation (20) for record orders larger than about the eighth. This is attributed to the finite length, T, of the generated sequences.
The mean value, ⟨x_k⟩, of the kth record-breaking event can be evaluated exactly using Equation (20), with the result:

\langle x_k \rangle = \tau\, \frac{\Gamma(k + 1/\beta)}{\Gamma(k)},   (22)

where Γ is the gamma function. When β = 1, this reduces to the known result for the exponential distribution, ⟨x_k⟩ = τ k [7]. The comparison of the record-breaking events constructed from the Monte Carlo simulations of the Weibull random variables and a plot of Equation (22) is given in Figure 2b for several values of β and fixed τ. The simulated values start to deviate from the theoretical ones from the 8th record order onward for these particular simulations, which used sequences of T time steps. The finite length of the sequences plays an important role in the statistics of record-breaking events and has to be taken into account when comparing with the analytical results. This is also evident in Figure 2a, where the distributions of the magnitudes of the record-breaking events deviate from the ones given by Equation (20) starting from the same record order.
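The gamma-function expression can be checked directly against simulation. In the sketch below (the values of β, τ, k, and the sample sizes are arbitrary choices), the empirical mean of the kth Weibull record is compared with τ Γ(k + 1/β)/Γ(k), the standard closed form for Weibull records that Equation (22) should reproduce:

```python
import math
import numpy as np

# Monte Carlo check of the mean magnitude of the k-th Weibull record
# against tau * Gamma(k + 1/beta) / Gamma(k). Finite sequence length T
# biases the estimate slightly downward, as discussed in the text.
rng = np.random.default_rng(5)
beta, tau = 1.5, 1.0
trials, T, k = 2000, 5000, 4

vals = []
for _ in range(trials):
    x = tau * rng.weibull(beta, size=T)
    mags = np.unique(np.maximum.accumulate(x))   # record magnitudes, increasing
    if len(mags) >= k:
        vals.append(mags[k - 1])

empirical = float(np.mean(vals))
theory = tau * math.gamma(k + 1.0 / beta) / math.gamma(k)
print(empirical, theory)
```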
The probability density function p(x, n), Equation (7), can be computed analytically in the case of the Weibull random variables. Using Equations (7), (18), and (19), one obtains:

p(x, n) = n \left[1 - e^{-(x/\tau)^{\beta}}\right]^{n-1} \frac{\beta}{\tau} \left(\frac{x}{\tau}\right)^{\beta - 1} e^{-(x/\tau)^{\beta}}.   (23)

The average value ⟨x(n)⟩ of the record-breaking events at time step n can be derived by substituting Equation (23) into Equation (6):

\langle x(n) \rangle = \int_{0}^{\infty} x\, p(x, n)\, dx.   (24)

The integral in Equation (24) can be evaluated analytically with the result (see Appendix A):

\langle x(n) \rangle = \tau\, \Gamma(1 + 1/\beta) \sum_{j=1}^{n} (-1)^{j+1} \binom{n}{j} j^{-1/\beta}.   (25)

It is worthwhile to note that ⟨x(1)⟩ = τ Γ(1 + 1/β) is equal to the mean of the Weibull distribution for given parameters β and τ.
For particular values of β, Equation (25) has simpler forms, Equations (26) and (27); for β = 1 it reduces to ⟨x(n)⟩ = τ H_n, and the corresponding expressions can be written in terms of harmonic numbers and the polygamma function [43].
It is also possible to obtain the asymptotic limit of Equation (24) in the case of large time steps:

\langle x(n) \rangle \simeq \tau\, (\ln n)^{1/\beta}.   (28)

Using the above Equation (24), one can compute the average value ⟨x(n)⟩ of the record-breaking events at different time steps n. The results are illustrated in Figure 2c, where the values are computed using the numerical integration of Equation (24) for several values of β and fixed τ.
In addition, I estimated the distributions P(k, m), for which the kth record is broken after m time steps, from the numerical simulations of Weibull random variables. I also compared them with the ones given by Equation (13), which is valid for i.i.d. random variables drawn from any underlying distribution function. The results of the Monte Carlo simulations and the evaluation of Equation (13) are shown in Figure 3a for the first several record orders k, which confirms the derived formulas.
I also computed the distribution of interevent times between all subsequent record-breaking events, P(m). These distributions were constructed by counting all interevent times for all record orders k. The results are illustrated in Figure 3b for the Weibull random variables for several values of β and τ. For comparison, the non-normalized distribution ∼ 1/m is also plotted as a dashed line. Finite-size effects are present in the distributions P(m) as well: for large values of m, they are influenced by the finite length T of the sequences.
The probability density functions, P_k(n), for the occurrence time of the kth record-breaking event versus the time step n are shown in Figure 4a for the first several record orders k. As mentioned above, the distribution function P_2(n) coincides with the interevent-time distribution P(1, m) with n = m + 1. This is confirmed by plotting Equation (12) as a solid purple line in Figure 4a. The subsequent solid green curve, for the third record-breaking event, was computed using the recursive formula, Equation (17). For higher-order record distributions, the computations using the recursive formula become very time consuming. In addition, the average times ⟨n_k⟩ of the occurrence of the kth record-breaking event are given in Figure 4b. These times are independent of the parameters of the Weibull distribution. This reflects the fact that the temporal structure of record-breaking events drawn from i.i.d. random variables does not depend on the underlying distribution from which the random variables are drawn.
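This distribution-independence can be made explicit: records depend only on the ordering of the events, so any strictly increasing transformation of a sequence, such as mapping the same uniform draws through inverse Weibull CDFs with different shape parameters, leaves the record times unchanged. A minimal demonstration (parameter values are arbitrary):

```python
import numpy as np

# Record occurrence times depend only on the ranks of the events, so two
# sequences obtained from the SAME uniform draws through different strictly
# increasing transformations (inverse Weibull CDFs with different shapes)
# have identical record times.
rng = np.random.default_rng(6)
u = rng.random(10000)

def inv_weibull_cdf(u, beta, tau=1.0):
    """Inverse CDF of the Weibull distribution, x = tau * (-ln(1-u))**(1/beta)."""
    return tau * (-np.log1p(-u)) ** (1.0 / beta)

def record_times(x):
    return np.flatnonzero(x == np.maximum.accumulate(x)) + 1   # 1-based

t1 = record_times(inv_weibull_cdf(u, beta=0.8))
t2 = record_times(inv_weibull_cdf(u, beta=2.5))
assert np.array_equal(t1, t2)   # identical temporal structure
print(len(t1))                  # number of records, roughly ln(10000) + 0.5772
```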
To approximate the functional form of the average times ⟨n_k⟩, the following function was considered and fitted to the simulated data given in Figure 4b:

\langle n_k \rangle \approx A\, \Phi\!\left(\frac{\ln k - \mu}{\sigma}\right),   (29)

where Φ is the standard normal cumulative distribution function. The parameters A, μ, and σ were estimated from the fit together with the corresponding 95% confidence intervals, and Equation (29) is plotted as a solid black line in Figure 4b. Equation (29) is in fact the cumulative distribution function of the log-normal distribution multiplied by the parameter A.
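A fit of this kind can be sketched with the standard-normal CDF from the standard library. Here a scaled log-normal CDF is fitted by a coarse least-squares grid search to Monte Carlo averages of record occurrence times from truncated sequences; all sample sizes, grid ranges, and the resulting parameter values are illustrative only and are not the fitted values reported in the text:

```python
import numpy as np
from statistics import NormalDist

# Sketch: fit A * Phi((ln k - mu) / sigma) -- a scaled log-normal CDF --
# to Monte Carlo averages <n_k> of record occurrence times from sequences
# truncated at length T.
rng = np.random.default_rng(7)
trials, T, k_max = 5000, 2000, 6

x = rng.random((trials, T))
counts = np.cumsum(x == np.maximum.accumulate(x, axis=1), axis=1)

ks = np.arange(1, k_max + 1)
avg_times = []
for k in ks:
    reached = counts[:, -1] >= k                  # k-th record occurs within T
    n_k = (counts[reached] >= k).argmax(axis=1) + 1
    avg_times.append(n_k.mean())
avg_times = np.array(avg_times)

phi = NormalDist().cdf
def model(k, A, mu, sigma):
    return A * phi((np.log(k) - mu) / sigma)

# coarse grid search for the least-squares parameters
best = None
for A in np.linspace(avg_times[-1], 10 * avg_times[-1], 30):
    for mu in np.linspace(0.5, 5.0, 30):
        for sigma in np.linspace(0.2, 3.0, 20):
            resid = avg_times - np.array([model(k, A, mu, sigma) for k in ks])
            sse = float(resid @ resid)
            if best is None or sse < best[0]:
                best = (sse, A, mu, sigma)

print(best[1:])   # illustrative (A, mu, sigma), NOT the paper's values
```

In practice, a proper nonlinear least-squares routine would replace the grid search; the sketch only illustrates the model form.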
4. Conclusions
In this work, I analyzed both analytically and numerically the statistics of record-breaking events extracted from sequences of i.i.d. random variables drawn from the Weibull distribution. I derived several analytical results concerning the magnitude distribution and the corresponding averages of records and confirmed them through numerical simulations. The numerical simulations revealed that the finite length, T, of the sequences considered plays an important role for higher record orders, k. This is particularly evident in Figure 2a,b, where one observes significant deviations from the theoretical results for record orders above approximately the eighth. This is attributed to the fact that the statistics of higher-order records come from entries generated closer to the end of the sequences; therefore, the finiteness of the sequences influences these statistics.
I derived exact analytical expressions for the distribution of magnitudes, Equation (20), and the average magnitude, Equation (22), of the record-breaking events of a given order k. Similarly, I derived an exact analytical expression for the average record magnitude at a given time step n, reported in Equation (25). This formula has simpler representations for particular values of the shape parameter, reported in Equations (26) and (27). I also provided the asymptotic form of Equation (25) for large values of n, given in Equation (28). In addition, a recursive formula was derived for the distribution of record occurrence times, Equation (17). All the obtained results were compared with numerical simulations to confirm their validity.
The presented analysis confirmed that the temporal structure of the studied record-breaking events extracted from the Weibull random variables did not depend on the underlying distribution function. On the other hand, the magnitude distributions and the corresponding average values were controlled by the shape of the underlying distribution from which record sequences were extracted.