1. Introduction
In the realm of econometrics and econophysics, researchers have long explored the intriguing properties of long-range memory in natural and social systems, often characterized by self-similarity and power-law statistical distributions. In financial markets, the abundance of data related to volatility, trading activity, and order flow has provided fertile ground for empirical investigations into long-range memory properties [1,2,3,4,5]. Various models with fractional noise have been proposed in econometrics to describe volatility time series [1,6,7,8,9,10,11]. However, from the perspective of econophysics, these models tend to lack sufficient microscopic reasoning and primarily serve as macroscopic descriptions of complex social systems, often based on assumptions of long-range memory. As a result, predicting stock price movements, despite the application of advanced trading algorithms and machine learning techniques, remains an enduring challenge for researchers [12,13,14].
To deepen our understanding of long-range memory in social systems, it becomes essential to compare the macroscopic modeling with empirical analyses. Our previous review [15] raised the crucial question of whether observed long-range memory in social systems is a result of genuine long-range memory processes or merely an outcome of the non-linearity of Markov processes. In our endeavor to explore this question, we have reduced the macroscopic dynamics of financial markets to a set of stochastic differential equations (SDEs) and related them to a microscopic agent-based model capable of reproducing empirical probability density functions (PDFs) and power spectral densities (PSDs) of absolute returns [16,17,18,19]. Moreover, we have employed this model to interpret the scaling behavior of volatility return intervals [20]. This approach could also find relevance in other social systems, where non-linear SDEs derived from agent-based models describing opinion or population dynamics lead to macroscopic descriptions [21,22]. Given these complexities, selecting the most appropriate model for interpreting empirical time series poses a significant challenge.
A promising line of inquiry comes from the observation that market-order flows exhibit long-range persistence, attributed to the order-splitting behavior of individual traders [23]. This finding reinforces the presence of genuine long-range memory in financial systems, as recently confirmed in a comprehensive investigation [24]. Consequently, we anticipate the manifestation of persistence in the limit-order flow as well.
In our previous contribution, we explored order disbalance time series in financial markets from the perspective of fractional Lévy stable motion (FLSM) [25]. Although FLSM and auto-regressive fractionally integrated moving average (ARFIMA) processes offer generalized models for self-similar fractional time series [26,27,28], they require more comprehensive approaches to explain the observed statistical properties of order disbalance in financial markets.
In this study, we continue our analysis of limit order flow using LOBSTER data (see the technical description [29] or the short description at https://lobsterdata.com/info/HowDoesItWork.php, accessed on 20 July 2023), aiming to demonstrate the possibility of simple, yet specific modeling of empirical time series. A key innovation of our approach lies in the discovery of a general statistical distribution governing limit order cancellation times. We propose the application of Tsallis statistics, a generalization of Boltzmann–Gibbs statistics [30,31,32], to fit the histograms of limit order cancellation times. Remarkably, the distribution's parameters for various stocks and time periods appear close, suggesting a universal nature of the observed statistical property.
To augment the identification of the q-exponential distribution of limit order cancellation times as a new stylized fact in limit order statistical properties, we further consider the assumption of limit order flow as fractional Lévy noise (FLN). Empirical data analysis supports the plausibility of such an assumption. Ultimately, we present a relatively straightforward model with empirical grounding, serving as an artificial model defining statistical properties of order flow and disbalance in financial markets. This new approach addresses certain contradictions uncovered in previous investigations from the perspective of FLSM [25].
The rest of this paper is organized as follows: Section 2 introduces the discrete Tsallis q-exponential distribution; Section 3 defines and investigates the empirical statistical properties of limit order cancellation times; Section 4 examines a modified version of the order disbalance time series and its statistical properties; Section 5 studies the statistical properties of the limit order submission sequence; Section 6 introduces the artificial order disbalance model; and, finally, Section 7 discusses the results and summarizes our findings.
2. The Discrete Tsallis q-Exponential Distribution
Let us introduce the discrete Tsallis q-exponential distribution to capture the probability mass function (PMF) for discrete variables based on the continuous Tsallis q-exponential probability density function (PDF) [30], given by:
where
. First, we explore an approach using the survival function of the q-exponential distribution [33]:
With this survival function, we define the PMF of the discrete q-exponential distribution as:
The PMF (3) is normalized for
. Interestingly, this power-law PMF converges to the geometric distribution as q → 1, which aligns with the expectation that it provides one more q-generalization of the geometric distribution [34,35]. We can express this convergence as:
where we denote
. This result confirms the suitability of the PMF (3) as a Tsallis q-generalization of the geometric distribution, making it a potential candidate for modeling event waiting times in social and other complex systems.
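For orientation, the construction described above can be written out under standard conventions. The following is only a sketch: it assumes the usual Tsallis form of the continuous q-exponential with 1 < q < 2 and λ > 0, and it assumes that the discrete variable starts at k = 0. The symbols λ and p and the choice of support are assumptions of this sketch, so the original Equations (1)–(4) may differ in parametrization or support.

```latex
% Assumed standard forms; a sketch, not a copy of Equations (1)-(4).
\begin{align*}
P_{\lambda,q}(x) &= (2-q)\,\lambda\,\bigl[1+(q-1)\lambda x\bigr]^{-\frac{1}{q-1}}, \qquad x \ge 0,\\
S_{\lambda,q}(x) &= \int_x^{\infty} P_{\lambda,q}(u)\,\mathrm{d}u
                  = \bigl[1+(q-1)\lambda x\bigr]^{-\frac{2-q}{q-1}},\\
P_{\lambda,q}(k) &= S_{\lambda,q}(k) - S_{\lambda,q}(k+1), \qquad k = 0, 1, 2, \dots,\\
\lim_{q\to 1} P_{\lambda,q}(k) &= e^{-\lambda k}\bigl(1-e^{-\lambda}\bigr)
                  = (1-p)^{k}\,p, \qquad p = 1-e^{-\lambda}.
\end{align*}
```

Under these assumptions the sum over k telescopes to S(0) = 1, which gives the normalization, and the q → 1 limit follows from S(x) → exp(−λx).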
The PMF for the discrete variable k is often defined as follows [36]:
After substitution of Equation (1) into Equation (5), we obtain a version of the discrete Tsallis q-exponential distribution as:
The explicit form of the limit of this PMF when q → 1 is currently unclear; thus, we will use Equation (3) instead of Equation (6) in this investigation.
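A PMF built from differences of the survival function can be checked numerically. The sketch below assumes the standard q-exponential survival function with 1 < q < 2 and a support starting at k = 0; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def q_exp_survival(x, lam, q):
    """Survival function of the continuous q-exponential (assumes 1 < q < 2)."""
    return (1.0 + (q - 1.0) * lam * x) ** (-(2.0 - q) / (q - 1.0))


def discrete_q_exp_pmf(k, lam, q):
    """PMF as a difference of survival values; support k = 0, 1, 2, ... is assumed."""
    return q_exp_survival(k, lam, q) - q_exp_survival(k + 1, lam, q)


def geometric_pmf(k, lam):
    """Geometric PMF with success probability p = 1 - exp(-lam)."""
    p = 1.0 - math.exp(-lam)
    return (1.0 - p) ** k * p


lam, q = 0.3, 1.5  # hypothetical values, not fitted parameters

# The partial sum telescopes to 1 - S(K), so adding the tail S(K) should give 1.
K = 10_000
partial_sum = sum(discrete_q_exp_pmf(k, lam, q) for k in range(K))
tail = q_exp_survival(K, lam, q)
print("normalization check:", partial_sum + tail)

# Close to q = 1 the PMF should approach the geometric PMF.
near_one = discrete_q_exp_pmf(3, lam, 1.0 + 1e-6)
print("q -> 1 check:", near_one, geometric_pmf(3, lam))
```

The heavy tail makes the plain partial sum converge slowly, which is why the check adds the remaining survival mass explicitly.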
3. Cancellation Times of Limit Orders in the Order Flow of Financial Markets
In this section, we analyze the cancellation times of limit orders in the order flow of financial markets using the LOBSTER data for all NASDAQ-traded stocks [29]. The LOB data that LOBSTER reconstructs originates from NASDAQ's Historical TotalView-ITCH files (http://nasdaqtrader.com, accessed on 20 July 2023). Here we construct the daily time series of order flow from 3 to 31 August 2020, a total of 21 working days. These data exemplify an empirical social system appropriate for investigating power-law statistical properties. The statistical properties of the cancellation times of limit orders are of particular interest in this study as they play a vital role in understanding order flow and disbalance dynamics.
We retrieve LOBSTER data files: message.csv and orderbook.csv for each selected trading day and ticker (stock). These files contain the complete list of events causing an update of LOB up to the ten levels of prices. Any event j changing the LOB state has a time value counted in the event space. Thus, we deal with a discrete time scale and avoid daily seasonality related to the fluctuating activity of traders. Every limit order has its identification code; therefore, it is straightforward to pair limit order submission and cancellation events. Let us define the limit order cancellation time as the time difference between the order cancellation and submission event. Seeking a simplified approach to the order flow modeling, we consider order cancellation and execution as equivalent events leading to the complete deletion of the previously submitted limit order. In the LOBSTER message.csv file, these events are denoted as event types 3 and 4.
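The pairing of submission and removal events described above can be sketched as follows. The sketch works on a toy list of (event type, order ID) pairs rather than on a real message.csv file. It assumes that event type 1 marks a submission (this follows the LOBSTER documentation, not the text above), that types 3 and 4 are the removal events named in the text, and that the event counter runs over every row; the function name is hypothetical.

```python
def cancellation_times(events):
    """Return removal-minus-submission event counts for each paired limit order.

    events: list of (event_type, order_id) tuples in file order.
    Assumed coding: 1 = submission; 3 and 4 = removal (cancellation or execution).
    """
    submitted_at = {}
    times = []
    for j, (event_type, order_id) in enumerate(events):
        if event_type == 1:
            submitted_at[order_id] = j
        elif event_type in (3, 4) and order_id in submitted_at:
            # The first removal event closes the order; later events for the
            # same ID are ignored, matching the "complete deletion" simplification.
            times.append(j - submitted_at.pop(order_id))
    return times


toy_events = [(1, "a"), (1, "b"), (3, "a"), (1, "c"), (4, "b"), (3, "c")]
print(cancellation_times(toy_events))  # [2, 3, 2]
```

Orders submitted before the start of the file have no recorded submission index and are skipped, which is one possible way to handle them.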
We can calculate the list of all cancellation times
from empirical data or group these data according to the price levels or the volumes of the limit orders. The two sub-figures in Figure 1 illustrate the empirical PDFs of the cancellation times calculated separately for the four price levels of the two stocks AMZN and MA. The dependence of the cancellation time PDF on the price level is weak, with only slight differences observed between the first price level (red) and the subsequent levels (blue, green, and black).
We group the cancellation times
into four categories based on the volumes
of the limit orders: (1)
; (2)
; (3)
; (4)
. Each group provides sufficient
values from the joint period of 21 trading days, enabling the calculation of histograms and the evaluation of empirical PDFs. The two sub-figures in Figure 2 illustrate the empirical PDFs of the cancellation times of the AMZN and MA stocks, calculated separately for the four groups of limit order volumes. Once again, the empirical distributions show weak dependence on the limit order volumes, supporting our modeling assumption of independence from price and volume.
Based on the independence assumption, we fit the empirical histograms of the cancellation times of all limit orders for ten stocks (NVDA, HD, AMZN, NFLX, MA, LLY, TSLA, ADBE, V, JNJ) using the PMF (3). The fitting is performed using the Maximum Likelihood Estimator (MLE) method [37]. The data for every stock are joined from the 21 trading days of August 2020. Only limit orders up to the tenth price level are included in this MLE calculation.
Table 1 provides the calculated λ and q values for each stock from the joint empirical histograms.
The mean values of parameters for these ten stocks are
and
.
Figure 3 demonstrates how closely the fitted PMFs of all ten stocks agree. The thin colored lines represent the PMFs for individual stocks, while the thick black line represents the PMF with the mean values of the parameters.
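A maximum likelihood fit of a discrete q-exponential PMF can be illustrated on synthetic data. The sketch below is not the estimation procedure of [37]: it assumes the survival-difference PMF with support k = 0, 1, 2, ..., draws a synthetic sample by inverse-transform sampling, and replaces a proper optimizer with a coarse grid search. All names, grids, and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def log_pmf(k, lam, q):
    """log P(k) for the assumed PMF S(k) - S(k+1), in a numerically stable form."""
    a = (q - 1.0) * lam
    e = -(2.0 - q) / (q - 1.0)                   # exponent of the survival function
    log_s = e * math.log1p(a * k)                # log S(k)
    ratio = e * math.log1p(a / (1.0 + a * k))    # log(S(k+1) / S(k)), negative
    return log_s + math.log(-math.expm1(ratio))


def sample_times(n, lam, q, rng):
    """Inverse-transform sampling: floor of a continuous q-exponential variate."""
    e = -(q - 1.0) / (2.0 - q)
    scale = (q - 1.0) * lam
    return [int(((1.0 - rng.random()) ** e - 1.0) / scale) for _ in range(n)]


def fit_grid(times, lam_grid, q_grid):
    """Return the (lam, q) grid point with the largest log-likelihood."""
    counts = Counter(times)
    best = None
    for lam in lam_grid:
        for q in q_grid:
            ll = sum(c * log_pmf(k, lam, q) for k, c in counts.items())
            if best is None or ll > best[0]:
                best = (ll, lam, q)
    return best[1], best[2]


rng = random.Random(0)
times = sample_times(20_000, lam=0.3, q=1.4, rng=rng)  # hypothetical true values
lam_grid = [0.05 * i for i in range(1, 21)]            # 0.05 ... 1.00
q_grid = [1.05 + 0.05 * j for j in range(18)]          # 1.05 ... 1.90
lam_hat, q_hat = fit_grid(times, lam_grid, q_grid)
print("estimated lam, q:", lam_hat, q_hat)
```

A grid search keeps the sketch dependency-free; with real data a gradient-free optimizer started from the best grid point would refine the estimate.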
4. Live Limit Orders and Order Disbalance
The analysis of limit order cancellation times revealed intriguing empirical properties with remarkably close values of parameters for different stocks. This suggests the assumption that the observed probability mass function (PMF) might be stable over time, which can be valuable in constructing simplified order flow and disbalance models. To achieve this objective, we reconstruct the order disbalance time series from the data in the message files. We adopt an alternative approach to form disbalance time series to explore more aspects of potential memory effects in empirical order disbalance time series.
As mentioned in Section 3, the LOBSTER data's message files contain sufficient information about limit orders submitted to the exchange, including their prices, volumes, event times of submission, cancellation, and full execution. With this information, we can construct a list of live limit orders at any discrete event step. We use the notation
to denote a limit order, where
represents the order submission event time,
is the order cancellation or full execution time, and
is the volume (positive for buy limit orders and negative for sell limit orders). To simplify the modeling approach, we ignore the indexing of price levels, as it is not crucial for the investigation of the time series. We include limit orders up to the tenth level of prices on both the buy and sell sides. Using this notation, we can rewrite the order disbalance as follows:
Here, the first sum is over all the live limit orders, including all the limit order volumes submitted before event j and waiting for cancellation or execution. A sequence of limit order submissions of length N generates a series of order disbalance of length 2N, since each submission is paired with a cancellation or execution event.
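The construction of the disbalance series from paired events can be sketched in a few lines. The sketch assumes that each order is given as a (submission index, removal index, signed volume) triple on a common event counter in which every event occupies its own index; the function name and the toy data are hypothetical.

```python
from itertools import accumulate


def disbalance_series(orders, n_events):
    """Build the order disbalance as the running sum of signed volume increments.

    orders: iterable of (t_submit, t_remove, volume) with indices in [0, n_events);
    volume is positive for buy orders and negative for sell orders.
    """
    increments = [0] * n_events
    for t_submit, t_remove, volume in orders:
        increments[t_submit] += volume   # the order becomes live
        increments[t_remove] -= volume   # the order is cancelled or executed
    return list(accumulate(increments)), increments


toy_orders = [(0, 3, 100), (1, 2, -50), (4, 5, 200)]
series, increments = disbalance_series(toy_orders, n_events=6)
print(series)      # [100, 50, 100, 0, 200, 0]
print(increments)  # [100, -50, 50, -100, 200, -200]
```

Because every increment is later undone by its paired removal, the series returns to zero once all orders are closed, which is the mechanism behind the boundedness discussed in this section.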
This modified definition of order disbalance series provides a slightly simplified version compared to the previous approach used in [25], which relied on the data from the orderbook.csv file. By excluding order book events related to the partial execution of orders, we can focus on exploring the statistical properties of the time series with a simplified model. The mean squared displacement (MSD) and Hurst exponents are evaluated for these empirical time series, considering two levels of random (reshuffled) time series. The first level involves reshuffling only the empirical sequence of volumes
, yielding new series
and
. This reshuffling destroys the correlation contained in the limit order submission sequence. The second level involves additional complete reshuffling of all increments
, leading to the memory-less time series
, where the anti-correlation arising from the limit order cancellation events is destroyed as well.
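The two reshuffling levels can be sketched as follows. The sketch reads the description as: level one permutes volumes among orders while keeping each order's submission and removal indices, and level two additionally permutes all increments of the level-one series. This reading, the order representation, and all names are assumptions of the sketch.

```python
import random
from itertools import accumulate


def increments_from_orders(orders, n_events):
    """Signed volume increments: +v at submission, -v at removal."""
    inc = [0] * n_events
    for t_submit, t_remove, volume in orders:
        inc[t_submit] += volume
        inc[t_remove] -= volume
    return inc


def shuffle_volumes(orders, rng):
    """Level 1: permute volumes across orders, keeping the event timing."""
    volumes = [v for _, _, v in orders]
    rng.shuffle(volumes)
    return [(s, r, v) for (s, r, _), v in zip(orders, volumes)]


def shuffle_increments(increments, rng):
    """Level 2: permute all increments, which breaks the submission-removal pairing in time."""
    shuffled = list(increments)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(42)
toy_orders = [(0, 3, 100), (1, 2, -50), (4, 7, 200), (5, 6, -300)]
level1 = increments_from_orders(shuffle_volumes(toy_orders, rng), n_events=8)
level2 = shuffle_increments(level1, rng)
series1 = list(accumulate(level1))
series2 = list(accumulate(level2))
print(series1)
print(series2)
```

Both surrogate series keep the same multiset of increments as the original, so any change in the scaling exponents comes from the ordering alone.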
Table 2 presents the MSD and Hurst exponents calculated for the order disbalance
of ten stocks: NVDA, HD, AMZN, NFLX, MA, LLY, TSLA, ADBE, V, and JNJ. The Hurst exponents of the different series are evaluated using the absolute value estimator (AVE) and Higuchi's method. The empirical analysis shows minor changes in scaling parameters compared to the results in [25].
Our previous exploration of the order flow from the perspective of FLSM or ARFIMA models [25] has sparked new questions and raised concerns about the applicability of these models. Notably, while considering the time series
as an accumulated ARFIMA(0,d,0) process provided valuable insights, it was observed that empirical order disbalance time series exhibit strict boundedness. This discrepancy indicates the existence of diffusion reversion mechanisms that fall outside the scope of the ARFIMA model.
Another critical challenge that emerged during our analysis is related to the auto-codifference of empirical
time series [38]. The order disbalance increments
demonstrate clear auto-dependence, as shown in
Figure 4. However, fitting the empirical auto-codifference with the expected theoretical form
derived for fractional Lévy noise [39] presented difficulties. Qualitatively, the asymptotic auto-codifference of the ARFIMA(0,d,0) process only matched the empirical data for memory parameter values in the region of
. However, within this region, the accumulated ARFIMA process exhibited strong unbounded behavior, which is inconsistent with the empirical order disbalance time series.
To gain a more specific empirical perspective, we investigated the potential application of the FLSM approach to the order disbalance series, assuming that the distribution of limit order volumes for the most liquid stocks followed a stable Lévy distribution with parameter
[25]. However, this assumption turned out to be only a rough approximation because of the distinctive resonance structure in the PDF of volumes. This observation led us to reevaluate the limit order flow from a different angle.
One specific issue arising from the FLSM perspective was related to the sample mean squared displacement (MSD), which has an exponent
for large samples and lags [26]. However, the values of the parameter d defined by the empirical
in
Table 2 range from
for AMZN to
for JNJ. These values contradict the empirical analysis in [25], which suggested
for all stocks.
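The sample MSD and its scaling exponent can be estimated with a short routine. The sketch below uses the common time-averaged definition, the mean of squared displacements over all windows of a given lag, and takes the exponent as the least-squares slope on a log-log scale; the choice of lags and the function names are assumptions, and the mapping from the exponent to the memory parameter d is not part of the sketch.

```python
import math


def sample_msd(x, lag):
    """Time-averaged mean squared displacement of the series x at the given lag."""
    n = len(x)
    return sum((x[k + lag] - x[k]) ** 2 for k in range(n - lag)) / (n - lag)


def msd_exponent(x, lags):
    """Least-squares slope of log MSD against log lag."""
    log_lags = [math.log(lag) for lag in lags]
    log_msd = [math.log(sample_msd(x, lag)) for lag in lags]
    mean_x = sum(log_lags) / len(lags)
    mean_y = sum(log_msd) / len(lags)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_lags, log_msd))
    var = sum((a - mean_x) ** 2 for a in log_lags)
    return cov / var


# Sanity check on a deterministic linear trend, where MSD(lag) = lag^2 exactly.
trend = [float(k) for k in range(1000)]
print(msd_exponent(trend, lags=[1, 2, 4, 8, 16]))
```

For a linear trend the exponent is 2, while an uncorrelated finite-variance random walk would give a value near 1; bounded series flatten the MSD at large lags and push the fitted exponent down.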
To address limitations of the previous approach and gain deeper insights into the auto-dependence of increments
, we introduced a modified order disbalance definition, Equation (7). The pairing of limit order increments of opposite signs, separated by cancellation times assumed to follow a q-exponential PMF, provides a new quantitative mechanism contributing to the observed auto-dependence. This mechanism is fundamentally different from the fractionally integrated increments in the ARFIMA(0,d,0) series. Thus, the simplification introduced here helps to interpret memory effects in the order disbalance time series more precisely. In Figure 5, we visualize the results of the empirical analysis presented in Table 2.
The auto-dependence in the
series can also have other origins; for example, some dependence can arise from the original sequence of the limit order volumes
. Such memory effects in market order flow have been investigated in [14,23,40], and competing interpretations have been provided. To evaluate the possible contribution of the auto-dependence in the limit order volume sequence, we use the random reshuffling procedure of the limit order volumes to obtain
and
series with zero volume correlation. The evaluated Hurst exponents of the
series, see Table 2 and Figure 5, are slightly shifted toward smaller values. Notably, time series
remained strictly bounded, in contrast to the unbounded behavior observed in
, suggesting that the diffusion of the order disbalance series is self-reverted due to every limit order being canceled or executed. The observed auto-dependence in
and its origin from both the cancellation times and the sequence of limit order volumes raise intriguing questions about the underlying mechanisms of market dynamics. Further theoretical consideration and empirical studies may help uncover the nature of the long-range memory in order disbalance and contribute to a more comprehensive understanding of market order flow.
5. Time Series of Limit Order Submissions
It is essential to investigate the statistical properties of time series that comprise only a sequence of limit order submissions to the market, denoted as
,
where
represents the volume of the submitted limit order, and order cancellation or execution is not included in the series. In this case, we can confidently consider the series from the perspective of FLSM, as the order flow remains uninterrupted by cancellation events. The analysis results are provided in Table 3.
The first method used to evaluate the parameter
d is based on the assumption that the series
exhibits FLSM-like behavior, with the memory parameter
derived from the sample mean squared displacement (MSD) exponent
, as referenced in [25,26]. The second method,
, involves calculating the sample auto-codifference
with lag
t of series
, as defined in Equation (9). It is worth noting that this method is sensitive to the evaluation of parameter
due to its reliance on the asymptotic form of the auto-codifference for fractional Lévy noise, as given in Equation (10), leading to the definition
[38].
where i denotes the imaginary unit and N is the length of the series. The third method,
, relies on the relation used in [25], where
, assuming that the time series
is FLSM-like.
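A sample auto-codifference can be computed from empirical characteristic functions. The sketch below assumes the standard codifference of two variables X and Y, ln E exp(i(X − Y)) − ln E exp(iX) − ln E exp(−iY), applied to the pair (y[k + lag], y[k]) with expectations replaced by sample means. Whether this coincides term by term with Equation (9), and how the series should be rescaled beforehand, is not settled by the text here, so both are left as assumptions.

```python
import cmath
import math


def sample_autocodifference(y, lag):
    """Sample auto-codifference of the series y at the given lag (assumed estimator)."""
    n = len(y) - lag
    joint = sum(cmath.exp(1j * (y[k + lag] - y[k])) for k in range(n)) / n
    ahead = sum(cmath.exp(1j * y[k + lag]) for k in range(n)) / n
    behind = sum(cmath.exp(-1j * y[k]) for k in range(n)) / n
    # Principal branch of the complex logarithm; a vanishing sample mean would
    # raise an error, so the input should be scaled to moderate amplitudes.
    return cmath.log(joint) - cmath.log(ahead) - cmath.log(behind)


toy = [math.sin(k) for k in range(200)]  # deterministic toy series
for lag in (0, 1, 5):
    print(lag, sample_autocodifference(toy, lag))
```

At lag zero the estimator reduces to minus twice the logarithm of the modulus of the empirical characteristic function at unit argument, so it is real and non-negative, which gives a quick sanity check of an implementation.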
The evaluation of the memory parameter d using three different methods, with results given in Table 3, supports the idea that the limit order series
is FLSM-like. Fluctuations of the memory parameter between methods are considerably smaller than fluctuations between different stocks. The major problem in this consideration remains the assumption that the limit order volumes follow a Lévy stable distribution. We fit the
parameter of this distribution to only the tail part
of the empirical histogram. The results provided in Table 3 show very stable values for all stocks
.
6. Artificial Order Disbalance Time Series
We propose an artificial order disbalance time series model that captures the main observations of this study. The model involves two random sequences: (a) A sequence of limit order volumes generated as ARFIMA{0,d,0}{
}, where
d is the memory parameter,
is the stability index,
N is the length of the sequence, and
defines the maximum possible absolute value of the volume, selected from the observed empirical data. (b) A corresponding sequence of limit order cancellation times with the same length
N generated using the probability mass function (PMF)
defined by Equation (3).
With these two independent sequences, we can calculate the model sequence of events
defined by
, which includes cancellation and execution events. This generated random sequence represents the artificial analog of the order disbalance time series introduced in Equation (7), and it is used for the empirical analysis. We choose the artificial model parameters aiming to reproduce the empirical data:
;
;
;
;
.
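The two-sequence construction can be sketched end to end. This is an illustration under several assumptions that do not come from the text: the ARFIMA(0,d,0) filter is truncated to a finite memory, the symmetric stable noise is drawn with the Chambers-Mallows-Stuck method, volumes are scaled and clipped to a maximum absolute value, cancellation times are at least one tick, and order i is submitted at tick i and removed at tick i plus its cancellation time. All names and parameter values are hypothetical.

```python
import math
import random
from itertools import accumulate


def symmetric_stable(alpha, rng):
    """Chambers-Mallows-Stuck draw of a symmetric alpha-stable variate (alpha != 1)."""
    u = math.pi * (rng.random() - 0.5)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def arfima_0d0(n, d, alpha, rng, memory=200):
    """Truncated moving-average form of ARFIMA(0,d,0) driven by stable noise."""
    psi = [1.0]
    for j in range(1, memory):
        psi.append(psi[-1] * (j - 1 + d) / j)
    noise = [symmetric_stable(alpha, rng) for _ in range(n + memory)]
    return [sum(psi[j] * noise[t + memory - j] for j in range(memory))
            for t in range(n)]


def cancellation_times(n, lam, q, rng):
    """Discrete q-exponential waiting times, shifted so that each is at least 1."""
    e = -(q - 1.0) / (2.0 - q)
    scale = (q - 1.0) * lam
    return [1 + int(((1.0 - rng.random()) ** e - 1.0) / scale) for _ in range(n)]


def artificial_disbalance(n, d, alpha, lam, q, v_scale, v_max, rng):
    """Return the 2n-long disbalance series and the n signed volumes."""
    raw = arfima_0d0(n, d, alpha, rng)
    volumes = [int(round(max(-v_max, min(v_max, v_scale * x)))) for x in raw]
    waits = cancellation_times(n, lam, q, rng)
    events = []
    for i, (v, k) in enumerate(zip(volumes, waits)):
        events.append((i, 1, v))        # submission at tick i
        events.append((i + k, 0, -v))   # removal k ticks later
    increments = [dv for _, _, dv in sorted(events)]
    return list(accumulate(increments)), volumes


rng = random.Random(1)
series, volumes = artificial_disbalance(
    n=500, d=0.2, alpha=1.7, lam=0.3, q=1.5, v_scale=100.0, v_max=1000, rng=rng)
print(len(series), series[-1], max(abs(v) for v in volumes))
```

Sorting the events by tick turns the two independent sequences into a single event-ordered series, so the same MSD, Hurst, and auto-codifference estimators can be applied to it as to an empirical series.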
We compare results generated by the artificial series
and
with those from the empirical series. In Table 4, we present the MSD and Hurst exponents, together with the memory parameter
, for the artificially generated series and empirical series of stocks MA, NFLX, and TSLA.
The results in Table 4 are averaged over five realizations of artificial series and five daily empirical series. The model results with
closely match the series of stock MA, while the model results with
are similar to the empirical series of NFLX and TSLA. Thus, we conclude that the empirically established q-exponential nature of the limit order cancellation times helps to reconstruct the properties of the order disbalance time series
. Such reconstructions are crucial for interpreting order disbalance time series from the perspective of FLSM.
In Figure 6, we demonstrate the application of the sample auto-codifference [38], as defined in Equation (9), for the empirical and artificial model time series. The left sub-figure shows the NFLX auto-codifference of the limit order flow
for the first five trading days of August 2020, along with the best fit by the asymptotic curve from Equation (10). The fluctuations in the estimated parameters
and
are considerable for the daily series, but the average
d is close to
. The right sub-figure compares three auto-codifference curves, averaged over five sample series related to NFLX data. The red curve represents the auto-codifference of the empirical order disbalance increment series
, the black curve of the corresponding order disbalance artificial model series with
, and the green curve of the artificial limit order flow series
. The similar behavior of all three curves and good correspondence between the empirical and synthetic series indicate the usefulness of auto-codifference in the research of persistence in financial and other social systems.
An intriguing result is that the auto-codifference of two very different processes in terms of self-similarity exhibits the expected behavior. Positively correlated fractional Lévy noise-like series display a similar auto-codifference as series that exhibit anti-persistence from a self-similarity perspective (). For example, our preliminary investigation of empirical series based only on the limit order signs shows that the persistence of limit order flow disappears when cancellation and execution of orders are included in the series.
In conclusion, the artificial order disbalance time series model provides valuable insights into the persistence and memory properties of the series. The comparison with empirical data demonstrates the usefulness of the model and supports the conclusion that the q-exponential nature of limit order cancellation times contributes to the observed persistence in order disbalance time series. The application of auto-codifference further enhances our understanding of the self-similarity behavior in these financial systems, opening up new avenues for research in this domain.
7. Discussion and Conclusions
In this study, we have delved into the statistical properties of limit order cancellation times in financial markets to better understand the peculiarities of order disbalance time series from the perspective of fractional Lévy stable motion (FLSM). Our previous investigation of order disbalance time series within the framework of FLSM yielded contradictory conclusions [25]. However, empirical time series often exhibit specific characteristics that necessitate careful consideration during empirical analysis. To address the question of why order disbalance time series in financial markets are strictly bounded, we have focused on the statistical properties of limit order cancellation times, treating them as discrete events.
To this end, we have introduced the concept of a discrete q-exponential distribution, presented in Equation (3), as a q-extension of the geometric distribution, based on the theoretical foundations of generalized Tsallis statistics [41]. This distribution allows for a better fit of empirical limit order cancellation times, revealing their weak sensitivity to order sizes and price levels. Remarkably, the parameters of the fitted discrete q-exponential PMF,
, and
, have proven consistent across ten stocks and trading days analyzed. Building on this unique statistical property of cancellation times, we model and empirically investigate limit order flow and order disbalance time series.
The clear distinction between the series of limit order flow and the series of order disbalance, which includes order cancellations, is essential in this research. Limit order flow series display persistence and remain unbounded, making them FLSM-like. On the other hand, order disbalance series, which include order cancellations and executions, are bounded and exhibit anti-persistence. It is important to acknowledge that limit order flow in financial markets serves as a prime example of time series requiring thorough empirical analysis to validate the use of econometric methods for time series analysis.
By combining fractional Lévy stable limit order flow with the q-exponential cancellation time distribution, we propose a relatively straightforward model of order disbalance in financial markets. This model also serves as an illustrative example of the broader approach to modeling opinion dynamics in various social systems. Our research highlights the significance of social system modeling to ensure the proper utilization of formal mathematical methods.
In summary, our study contributes to a better understanding of order disbalance time series and their memory effects in financial markets. The incorporation of the discrete q-exponential distribution for modeling cancellation times provides valuable insights into the persistence of the order disbalance time series and helps address the question of their boundedness. Furthermore, the combination of FLSM and the q-exponential distribution proves to be a promising approach for modeling social systems, which can be explored further in future research.
In conclusion, the statistical properties of limit order cancellation times and their impact on order disbalance time series have shed light on the dynamics of financial markets. This study not only enhances our understanding of complex financial systems but also highlights the importance of empirical analysis when applying mathematical methods to social system modeling. By bridging the gap between theory and empirical observations, we contribute to the development of more accurate models and deeper insights into the behavior of financial markets and social systems as a whole.