*2.2. Model Estimation*

The fitting of the AR(p) model is introduced in this part, followed by the extension of the procedure to the fitting of the TAR(p) model. Both procedures are fitted with the maximum likelihood procedure, as it has been shown that the maximum likelihood estimators (MLE) are consistent for Gamma random error (Li and McLeod 1988).

The MLE for AR(p) model are derived by *l*(*α*ˆ, *β*ˆ, *ϕ*ˆ) = *argmin*(−*l*), where *l* denotes the log-likelihood function, in the form of

$$\ln l(a,\beta,\varphi) = -n \ast \ln(\Gamma(a)) - n \ast a \ast \ln(\beta) + (a-1) \ast \sum\_{t=1}^{n} \ln(\varepsilon\_{t}) - \frac{\sum\_{t=1}^{n} \varepsilon\_{t}}{\beta},\tag{5}$$

where

$$
\varepsilon\_t = RV\_t - \sum\_{i=1}^p \varphi\_i \* RV\_{t-i} \tag{6}
$$

is the random error.

To further reduce the dimension of estimation, a profile likelihood method is used. The Gamma parameters *α* and *β* are replaced by the MLE of *α* and *β*, using the result of Wilk et al. (1962) and the approximation *dln*(*γ*(*α*)) *dα* ≈ *ln*(*α* − 12 ). Thus, the final estimates of *α* and *β* are as follows:

$$
\hat{\mathfrak{a}} = \frac{A}{2\*(A-G)} \qquad \text{and} \qquad \hat{\mathfrak{f}} = \frac{A}{\hat{\mathfrak{a}}} \,' \tag{7}
$$

where *A* stands for the arithmetic mean of the random error and *G* is the geometric mean. Thus, the estimation of the model is achieved by estimating *ϕ*ˆ = *argmin*(−<sup>ˆ</sup>*l*), where

$$\hat{I}(\varphi) = -n \ast \ln(\Gamma(\hbar)) - n \ast \hbar \ast \ln(\hat{\beta}) + (\hbar - 1) \ast \sum\_{t=1}^{n} \ln(\varepsilon\_t) - \frac{\sum\_{t=1}^{n} \varepsilon\_t}{\hat{\beta}},\tag{8}$$

where *α*ˆ and *β* ˆ are estimated by Equation (7) and, by Equation (6), thus depend on *ϕ* .

The Nelder–Mead method, which is a non-gradient optimization method, is proposed to optimize the negative log-likelihood function. Although such a procedure is heuristic and may converge to non-stationary points, its performance is much more stable than traditional gradient methods, such as the Hessian matrix method, which may not be easily calculated (even numerically) given the dependency of the log-likelihood function and as *ϕ* is quite complicated.

Additionally, before simply applying the method and carrying out the optimization, it should be noticed that, as the random error *εt* is assumed to be Gamma, it is required to be greater than zero, which is also evident from the term *ln*(*<sup>ε</sup>t*) in the expression of the log-likelihood function. To reflect this non-negativity constraint, a penalty method is applied and the log-likelihood function becomes:

$$\hat{I} = \left(-n \ast \ln(\Gamma(\hat{\mathfrak{a}})) - n \ast \hat{\mathfrak{a}} \ast \ln(\hat{\mathfrak{b}}) + (\mathfrak{a} - 1) \ast \sum\_{t=1}^{n} \ln(\varepsilon\_{t}) - \frac{\sum\_{t=1}^{n} \varepsilon\_{t}}{\hat{\mathfrak{b}}}\right) \ast I\_{\text{all } \varepsilon\_{t} \ge 0} - M \ast I\_{\text{some } \varepsilon\_{t} < 0}, \tag{9}$$

where *M* is some large-enough number.

As the Nelder–Mead method is a heuristic search method, the choice of initial point may greatly affect the result and, thus, the estimation process takes various initial points and returns the result that yields a best fit, using the AIC or BIC . Furthermore, a candidate set of AR order p is given and the procedure searches for the best AR order within the set, again by AIC and BIC. Specifically, in the scope of the simulation study in this report, the initial points for *ϕ* are set uniformly within [0,1] and the initial points for T are set within [*μ* − *n* ∗ *σ*, *μ* + *n* ∗ *σ*], where *μ* the sample mean of the RV, *σ* is the sample variance, and *n* is a pre-determined number to control the range, here set as 0.5. The step size of *ϕ* is set to be 0.25 and that of T to be 0.05*σ*. For empirical data analysis, values of *ϕ* in the ranges [0,0.5] and [0.5,1] are tested, with step size 0.125, and the results showed that the outcome from [0,0.5]

almost always dominated that from [0.5,1] and, thus, the range [0,0.5] and step size 0.125 were used for *ϕ*.

The fitting of the TAR(p) model is essentially the same, except that the random errors are classified into two different regimes. Thus, the log-likelihood function is expressed as:

$$\begin{split} \hat{I} = \left( -n\_1 \ast \ln(\Gamma(\hat{\mathfrak{a}}\_1)) - n\_1 \ast \hat{\mathfrak{a}}\_1 \ast \ln(\hat{\mathfrak{f}}\_1) + (\hat{\mathfrak{a}}\_1 - 1) \ast \sum\_{t=1}^{n\_1} \ln(\varepsilon\_{1,t}) - \frac{\sum\_{t=1}^{n\_1} \varepsilon\_{1,t}}{\hat{\mathfrak{f}}\_1} \right. \\ -n\_2 \ast \ln(\Gamma(\hat{\mathfrak{a}}\_2)) - n\_2 \ast \hat{\mathfrak{a}}\_2 \ast \ln(\hat{\mathfrak{f}}\_2) + (\hat{\mathfrak{a}}\_2 - 1) \ast \sum\_{t=1}^{n\_2} \ln(\varepsilon\_{2,t}) - \frac{\sum\_{t=1}^{n\_2} \varepsilon\_{2,t}}{\hat{\mathfrak{f}}\_2} \\ \ast \boldsymbol{I}\_{\text{all }\varepsilon\_{1,t}, \varepsilon\_{2,t} \ge 0} - \boldsymbol{M} \ast \boldsymbol{I}\_{\text{some }\varepsilon\_{1,t}, \varepsilon\_{2,t} < 0} \end{split} \tag{10}$$

where *<sup>ε</sup>*1,*t* are the random errors corresponding to the observations in the first regime, *n*1 is the number of observations in the first regime, and *<sup>ε</sup>*2,*t* and *n*2 the corresponding counterparts in the second regime, respectively.

A final concern regarding the model estimation would be that, for the first few observations, the AR model may not be properly initiated, as there are no earlier observations. Therefore, the sample estimates are essentially estimated by a sample, with the first few observations serving only as the independent variable, but not the dependent variable; that is,

$$RV\_{t+n} = \sum\_{i=1}^{p} q\_i \ast RV\_{t+n-i} + \varepsilon\_{t+n} \tag{11}$$

with *n* being the truncated size. Additionally, as the AIC and BIC are typically applied on the same sample with the same sample size, to allow for the comparison between models of different AR order and lag, a common truncation of size 10 is applied in the scope of this study, as the AR order and lag investigated did not exceed this reasonably.

As with the process of fitting the AR(p) model, the fitting for TAR(p) searches for the best model of AR order p and lag *d*, where p and *d* are given in the pre-determined candidate set and the threshold T.

#### *2.3. Empirical Data Analysis Preparation*

The data used in this paper were the consolidated realized volatility data from Shen et al. (2018), which are the realized volatilities for 30 stocks traded on the New York Stock Exchange (NYSE).

Graphs of PACF and the corresponding naive 95% confidence bound, proposed by Quenouille (1949), were first examined for the stock data, which showed that the PACF of the stocks were mostly significant within a lag of 5 and demonstrated a somewhat cut-off property; thus suggesting the fitting the AR model was potentially a good starting point. Non-linear threshold type AR models were also considered as a supplement to the AR model.

After considering the practical reasonableness of the model and the computational power available, an AR order up to 5 and lag order up to 3 were considered.

The final model for each stock was determined by both considering the AIC and BIC and the associated Ljung–Box test for each criterion. If the model selected by the two criteria differed with a similar goodness of fit, a simpler model was preferred. Otherwise, the model that gave a better goodness of fit result was preferred.

The data set and R code used for the study are available upon request, from either author.
