*3.1. Model*

This section presents an overview of the Hawkes process and introduces our model.

#### 3.1.1. Mathematical Notation

Let us consider point process {*ti*}, which is a sequence of non-negative random variables such that ∀*i* ∈ N, *ti* < *ti*+1. For point process {*ti*}, the conditional intensity function is defined as follows [42]:

$$\lambda(t \mid \mathcal{H}\_t) = \lim\_{\Delta t \to 0} \frac{\mathbb{P}\{N(t + \Delta t) - N(t) = 1 \mid \mathcal{H}\_t\}}{\Delta t} \tag{1}$$

where P{*A*|*B*} represents the probability of *A* under condition *B*, *N*(*t*) is the cumulative number of event occurrences at time *t* (i.e., a counting process), and H*t* is the history of the process up to time *t* containing a sequence of event times {*ti*} (i.e., a filtration). As can be seen from the definition, *λ*(*t* | H*t*)<sup>Δ</sup>*<sup>t</sup>* represents the probability of an event occurring in time interval [*t*, *t* + <sup>Δ</sup>*t*). Here, we use the shorthand notation *λ*(*t*) ≡ *λ*(*t* | H*t*), assuming the history up to time *t*, and we call the Poisson parameter *λ*(*t*) an intensity function.

#### 3.1.2. Overview of Hawkes Process

The Hawkes process is a point process in which the intensity function is affected by the occurrence of past events. Let {*ti*} be a point process, and *N*(*t*) be the associated counting process, and the intensity function of the generalized Hawkes process is defined as follows [43]:

$$\lambda(t) = \mathfrak{c} + \int\_{-\infty}^{t} \phi(t - s) dN(s) = \mathfrak{c} + \sum\_{t\_i < t} \phi(t - t\_i) \tag{2}$$

where *c* is a positive constant showing a base intensity, and *φ*(*t*) is a kernel function that expresses the effect of event *ti* from the past on the current intensity [44]. There are various types of kernel functions, and their properties have been well studied. In this study, we applied an exponential kernel *<sup>α</sup>e*<sup>−</sup>*β<sup>t</sup>*, which is a popular kernel function that was originally proposed by Hawkes [26]. We call this Hawkes process a univariate Hawkes process because it is affected by its own events.

The Hawkes process can be extended to a multivariate model in which several types of point processes interact with each other. Let {**<sup>t</sup>***i*} ≡ {{*<sup>t</sup>*1,*<sup>i</sup>*}, {*<sup>t</sup>*2,*<sup>i</sup>*}, ... , {*tM*,*<sup>i</sup>*}} be M-variable point processes, and **N**(*t*) = {*<sup>N</sup>*1(*t*), *<sup>N</sup>*2(*t*), ... , *NM*(*t*)} be the associated counting process, the intensity function of a multivariate Hawkes process for point process {*tn*,*<sup>i</sup>*} is defined as follows [43]:

$$\lambda\_n(t) = \mathfrak{c}\_n + \sum\_{m=1}^{M} \int\_{-\infty}^{t} \phi\_{n,m}(t-s) dN\_m(s) = \mathfrak{c}\_n + \sum\_{m=1}^{M} \sum\_{t\_{m,i} < t} \phi\_{n,m}(t - t\_{m,i}) \tag{3}$$

As in the case of the univariate Hawkes process, *cn* is a positive constant showing a base intensity, and *φ<sup>n</sup>*,*<sup>m</sup>*(*t*) is a kernel function that expresses the effect of event *tm*,*<sup>i</sup>* from the past on the current intensity. In this case, *φ<sup>n</sup>*,*<sup>m</sup>*(*t*) = *<sup>α</sup>n*,*me*<sup>−</sup>*β<sup>n</sup>*,*<sup>m</sup>t*, with positive constant parameters *<sup>α</sup>n*,*<sup>m</sup>* and *β<sup>n</sup>*,*m*, and the intensity function for point process {*tn*,*<sup>i</sup>*} is given as follows:

$$\begin{aligned} \lambda\_n(t) &= \quad c\_n + \sum\_{m=1}^M \int\_{-\infty}^t \alpha\_{n,m} e^{-\beta\_{n,m}(t-s)} dN\_m(s) \\ &= \quad c\_n + \sum\_{m=1}^M \sum\_{t\_{m,i} < t} \alpha\_{n,m} \exp\left(-\beta\_{n,m}(t-t\_{m,i})\right) \end{aligned} \tag{4}$$

Figure 5 is a schematic of this intensity function (Equation (4)). The intensity function, *<sup>λ</sup>n*(*t*), increases *αnm* at event time *tn*,*<sup>i</sup>* and exponentially decays with a time constant of

1/*βnm*. Thus, *<sup>λ</sup>n*(*t*) is excited not only by its own events, but also by other events, and the multivariate Hawkes process can represent such mutual interactions.

**Figure 5.** Schematic showing an example of the time evolution of the intensity function for a multivariate Hawkes process with the exponential kernel *φ<sup>n</sup>*,*<sup>m</sup>*(*t*) = *<sup>α</sup>n*,*me*<sup>−</sup>*β<sup>n</sup>*,*<sup>m</sup><sup>t</sup>*

The quantity, *ρ<sup>n</sup>*,*m*, expressed in the following equation (Equation (5)) in the case of an exponential kernel is called the branching ratio [45,46]. This is the expectation of the number of occurrences of event *n* caused by the occurrence of event *m*. A larger value for this number represents a greater impact of event *m* on event *n*, and this value is an important quantity for interpreting the Hawkes process:

$$\rho\_{n,\mathfrak{m}} \equiv \frac{\mathfrak{a}\_{\mathfrak{m},\mathfrak{m}}}{\beta\_{n,\mathfrak{m}}} = \int\_{t\_{m,i}}^{\infty} \mathfrak{a}\_{n,\mathfrak{m}} \exp\left(-\beta\_{n,\mathfrak{m}}(t - t\_{\mathfrak{m},i})\right) dt \tag{5}$$

#### 3.1.3. Trader Model

In this study, the buy and sell limit order processes of 134 HFTs are modeled by eight-variable Hawkes processes with exponential kernels that are excited by a total of eight-point processes. These eight types were the target HFTs' sell limit (TS) and buy limit (TB), and six types of orders in the order book: sell limit (SL); buy limit (BL); sell cancel (SC); buy cancel (BC); hit sell (HS); and hit buy (HB).

Let {**<sup>t</sup>***i*} ≡ {{*tTS*,*<sup>i</sup>*}, {*tTB*,*<sup>i</sup>*}, {*tSL*,*<sup>i</sup>*}, {*tBL*,*<sup>i</sup>*}, {*tSC*,*<sup>i</sup>*}, {*tBC*,*<sup>i</sup>*}, {*tHS*,*<sup>i</sup>*}, {*tHB*,*<sup>i</sup>*}} be an eightvariable point process, the intensity functions for {*tTS*,*<sup>i</sup>*} and {*tTB*,*<sup>i</sup>*} are given as follows:

$$\lambda\_{TS}(t) = \mathcal{c}\_{TS} + \sum\_{m \in \mathcal{M}} \sum\_{t\_{m,i} \le t} \mathfrak{a}\_{TS,m} \exp\left(-\beta\_{TS,m}(t - t\_{m,i})\right) \tag{6}$$

$$\lambda\_{TB}(t) = \mathfrak{c}\_{TB} + \sum\_{m \in \mathcal{M}} \sum\_{t\_{m,i} < t} \mathfrak{a}\_{TB,m} \exp\left(-\beta\_{TB,m}(t - t\_{m,i})\right) \tag{7}$$

where M ≡ {*TS*, *TB*, *SL*, *BL*, *SC*, *BC*, *HS*, *HB*}. We assume that the above intensity functions (Equations (6) and (7)) represent the buy and sell limit order processes of each HFT and examine which events affect their order generation processes. Hereafter, the abbreviations listed in Table 2 are used for the order events.

**Table 2.** Abbreviations for eight types of order events. Note the six types of orders in the order book do not include the orders of target HFTs. Therefore, {**<sup>t</sup>***i*} differs for each HFT.


#### *3.2. Parameter Estimation Using Maximum Likelihood Estimation*

For Equations (6) and (7), we apply the maximum likelihood estimation method for the parameter estimation. Because the functional forms of the intensity functions are the same for {*tTS*,*<sup>i</sup>*} and {*tTB*,*<sup>i</sup>*}, we solve the log-likelihood functions in the following manner. For the point process {*tTS*,*<sup>i</sup>*}, the log-likelihood function in time interval [0, *T*] is given by the following [47]:

$$\begin{aligned} \log L(c\_{TS}, \mathbf{a}\_{TS}, \boldsymbol{\theta}\_{TS}) &= -\int\_0^T \lambda\_{TS}(t)dt + \sum\_{i=1}^n \log \lambda\_{TS}(t\_i) \\ &= -c\_{TS}T + \sum\_{m \in \mathcal{M}} \frac{a\_{TS,m}}{\mathcal{\beta}\_{TS,m}} \left[ \sum\_{t\_{m,i} < T} \left\{ \exp \left( -\beta\_{TS,m}(T - t\_{m,i}) - 1 \right) \right\} \right] \\ &+ \sum\_{i=1}^n \log \left[ c\_{TS} + \sum\_{m \in \mathcal{M}} \sum\_{t\_{m,j} < t\_{TS,j}} a\_{TS,m} \exp \left( -\beta\_{TS,m}(t\_{TS,i} - t\_{m,j}) \right) \right] \tag{8} \end{aligned}$$

The same formulation is also applied for log *<sup>L</sup>*(*cTB*, *αTB*, *βTB*).

These log-likelihood functions are differentiable by each parameter, and we optimized them by Adam [48], which is a type of gradient descent method, to obtain the maximum likelihood estimators. Here, the initial values are (*cTS*, *αTS*, *βTS*)=(*cTB*, *αTB*, *βTB*) = (0.1, **0**.**1**, **<sup>10</sup>**), and the various parameters required for Adam are set following the values in the original reference [48]. Here, *αTS*, *βTS*, *αTB*, and *βTB* are vectors of eight variables (e.g., *αTS* = (*<sup>α</sup>TS*,*TS*, *<sup>α</sup>TS*,*TB*,..., *<sup>α</sup>TS*,*HS*)).

It is also known that the computation of the gradients of the log-likelihood function of the Hawkes process usually requires the computation of *<sup>O</sup>*(*N*<sup>2</sup>). However, using a recursive formulation that can be used when the kernel function is an exponential function, we perform maximum likelihood estimation with *O*(*N*) computational complexity (see Ogata [49] for the recursive formulation).

In addition, the HFTs do not always continuously place orders. Therefore, if an HFT did not place any orders for more than 15 min, we considered such period as not participating in the trade and performed the maximum likelihood estimation for each trader by ignoring such inactive periods.

#### *3.3. Validity of Estimation Results*

Based on a residual analysis [50], we assessed the goodness-of-fit of the point process model. Let the intensity function for point process {*ti*} be *<sup>λ</sup>*(*t*), then the sequence, {*<sup>τ</sup>i*}, of random variables, where each element is transformed by *τi* ≡ *ti* 0 *<sup>λ</sup>*(*t*)*dt*, has the distribution of a stationary Poisson process with intensity 1, and the transformed residual Δ*τi* ≡ *<sup>τ</sup>i*+1 − *τi* has an exponential distribution with the unit mean.

Therefore, if the estimated HFT intensities, *λ TS*(*t*) and *λ*ˆ *TB*(*t*), are good approximations of the true intensities, *<sup>λ</sup>TS*(*t*) and *<sup>λ</sup>TB*(*t*), respectively, then the transformed residuals, <sup>Δ</sup>*τ*<sup>ˆ</sup>*TS*,*<sup>i</sup>* and <sup>Δ</sup>*τ*<sup>ˆ</sup>*TB*,*i*, are expected to follow the exponential distributions with the unit mean. Here, the transformed residuals are defined as <sup>Δ</sup>*τ*<sup>ˆ</sup>*TS*,*<sup>i</sup>* ≡ *tTS*,*i*+<sup>1</sup> *tTS*,*<sup>i</sup> λ*ˆ *TS*(*t*)*dt* and <sup>Δ</sup>*τ*<sup>ˆ</sup>*TB*,*<sup>i</sup>* ≡ *tTB*,*i*+<sup>1</sup> *tTB*,*<sup>i</sup> λ*ˆ *TB*(*t*)*dt*, respectively, which can be derived in the case of <sup>Δ</sup>*τ*<sup>ˆ</sup>*TS*,*<sup>i</sup>* as an example (Equation (9)) because the intensity function has the same form as described above:

$$\begin{array}{c} \Delta t\_{TS,i} = \pounds\_{TS} \Delta t\_{TS,i} - \sum\_{m \in \mathcal{M}} \sum\_{t\_{m,i} < t\_{TS,i+1}} \frac{\hbar\_{\text{n,m}}}{\beta\_{\text{n,m}}} (\exp(-\hat{\boldsymbol{\beta}}\_{TS,\mathcal{m}}(t\_{TS,i+1} - t\_{\textbf{m},i})) \\ - \exp(-\hat{\boldsymbol{\beta}}\_{TS,m}(t\_{TS,i} - t\_{\textbf{m},i}))) \end{array} \tag{9}$$

ˆ

Figure 6a shows the cumulative distribution of the three HFTs' original residuals, <sup>Δ</sup>*tTS*,*i*, and Figure 6b shows the cumulative distribution of the same three HFTs' transformed residuals, <sup>Δ</sup>*τ*<sup>ˆ</sup>*TS*,*i*. We confirmed that the transformed residuals for the three HFTs approximately followed the exponential distribution with the unit mean, which implied that the intensities of {*tTS*,*<sup>i</sup>*} were properly approximated. Because not all of the HFTs' order generation processes could be modeled by the Hawkes process proposed here, the above operations were performed for the sell and buy order processes of the 134 HFTs.

**Figure 6.** Cumulative distributions of <sup>Δ</sup>*tTS*,*<sup>i</sup>* (**a**) and <sup>Δ</sup>*τ*<sup>ˆ</sup>*TS*,*<sup>i</sup>* (**b**) for three typical HFTs (blue: HFT with 2nd highest order frequency; orange: HFT with 4th highest order frequency; and red: HFT with 7th highest order frequency).

Here, we apply the Kullback–Leibler divergence between the distribution of the transformed residuals and the exponential distribution with the unit mean as a criterion to determine whether the approximations of the intensities of each HFTs' {*tTS*,*<sup>i</sup>*} and {*tTB*,*<sup>i</sup>*} are appropriate. Since we estimate the intensity functions of the two-point processes for each trader, we calculate the sum of the respective Kullback–Leibler divergences (*DKL TS*+*TB*), as defined by Equation (10) below:

$$D\_{TS+TB}^{KL} = \sum\_{j} p\_j(\Delta \mathfrak{k}\_{TS,j}) \log \frac{p\_j(\Delta \mathfrak{k}\_{TS,j})}{q\_j(\Delta \mathfrak{k}\_{TS,j})} + \sum\_{j} p\_j(\Delta \mathfrak{k}\_{TB,i}) \log \frac{p\_j(\Delta \mathfrak{k}\_{TB,i})}{q\_j(\Delta \mathfrak{k}\_{TB,i})} \tag{10}$$

Here, *qj*(*τ*) is the discrete exponential distribution with the unit mean, and *pj*(*τ*) is the sampling distribution. The bin size of the discrete distribution is assumed to be 1. The threshold of accepting *DKL TS*+*TB*error will be determined in the next section.
