**4. Empirical Analysis**

## *4.1. Prediction Process*

### 4.1.1. Step 1: Calculating the Fluctuation Values of the Stock Time Series and Converting Them to a Neutrosophic Time Series

This study must both select the model parameters and estimate the model's performance. Many studies in fuzzy time series forecasting have used data from January to October as the training set and data from November to December as the test set. To allow comparison with these studies, we also used November–December as the test set. Because time series are temporally ordered, traditional cross-validation methods such as *k*-fold cross-validation are poorly suited to them: a subset of data that follows the training subset in time must be held out to validate model performance. We therefore used a nested cross-validation scheme in which the outer layer estimates model performance and the inner layer selects the parameters. Taking the TAIEX data for 1999 as an example, the closing prices from 1 January to 31 October formed the training set. Within it, January–August served as the training subset and September–October as the validation subset. Logical relationships were constructed between each data point and its nine most recent historical values (ninth order). The closing prices from 1 November to 31 December were used as the test data, and performance was evaluated by comparing the forecasts with the actual values.
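The data split and the ninth-order windows described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the closing prices are available as `(datetime.date, float)` pairs, and the helper names `split_by_month` and `lagged_windows` are hypothetical. The inner split (training subset vs. validation subset) is where parameters such as the similarity threshold would be chosen; the outer split (January–October vs. November–December) is used only to estimate performance.

```python
from datetime import date


def split_by_month(series):
    """Split (date, close) pairs by month: Jan-Aug train, Sep-Oct validation, Nov-Dec test."""
    train, valid, test = [], [], []
    for day, close in series:
        if day.month <= 8:
            train.append((day, close))
        elif day.month <= 10:
            valid.append((day, close))
        else:
            test.append((day, close))
    return train, valid, test


def lagged_windows(values, order=9):
    """Pair each value with its `order` most recent predecessors (oldest first)."""
    return [(values[t - order:t], values[t]) for t in range(order, len(values))]


# Toy series with one (hypothetical) price per month, only to exercise the split.
prices = [(date(1999, m, 1), 7000.0 + m) for m in range(1, 13)]
train, valid, test = split_by_month(prices)
print(len(train), len(valid), len(test))
```

Because the split is chronological, no validation or test observation ever precedes the data used to fit the model, which is the property *k*-fold cross-validation fails to guarantee.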

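For example, when the fluctuation value is $U\_{12} = 28.7$, the corresponding ninth-order sequence of linguistic variables is $l\_4, l\_5, l\_3, l\_3, l\_2, l\_2, l\_2, l\_5, l\_3$. Each probability is the share of the nine labels equal to $l\_i$ (e.g., $l\_2$ occurs three times, so $p\_{U\_{12}}(l\_2) = 3/9$), giving $p\_{U\_{12}}(l\_1) = 0$, $p\_{U\_{12}}(l\_2) = 0.3333$, $p\_{U\_{12}}(l\_3) = 0.3333$, $p\_{U\_{12}}(l\_4) = 0.1111$, and $p\_{U\_{12}}(l\_5) = 0.2222$. The ninth-order fuzzy fluctuation information entropy is then calculated as follows: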

$$E(U\_{12}) = E(28.7) = -\sum\_{i=1}^{5} p\_{U\_{12}}(l\_i) \log\_2 p\_{U\_{12}}(l\_i) = 1.8911 \tag{10}$$

$$E(U\_{13}) = E(-106.5) = -\sum\_{i=1}^{5} p\_{U\_{13}}(l\_i) \log\_2 p\_{U\_{13}}(l\_i) = 1.5307 \tag{11}$$

$$E(U\_{14}) = E(-33.89) = -\sum\_{i=1}^{5} p\_{U\_{14}}(l\_i) \log\_2 p\_{U\_{14}}(l\_i) = 1.3923 \tag{12}$$

The fluctuation information entropy proposed in this paper serves as the intermediate (indeterminacy) term of the NS. To keep it on the same scale as the other two terms, the results above must be normalized. The normalized information entropy, based on the maximum value of the information entropy, is calculated as follows:

...

$$E'(U\_{12}) = \frac{1.8911}{3.7000} = 0.5111 \tag{13}$$

$$E'(U\_{13}) = \frac{1.5307}{3.7000} = 0.4137 \tag{14}$$

$$E'(U\_{14}) = \frac{1.3923}{3.7000} = 0.3763 \tag{15}$$
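The entropy and normalization steps can be checked with a short sketch. It assumes the linguistic variables $l\_1, \dots, l\_5$ are encoded as the integers 1 to 5, and it takes the normalizing constant 3.7000 directly from the denominators of Equations (13)–(15) rather than deriving it; `fluctuation_entropy` and `normalized_entropy` are hypothetical helper names.

```python
import math
from collections import Counter


def fluctuation_entropy(labels, n_levels=5):
    """Base-2 Shannon entropy of the linguistic labels (1..n_levels) in one window."""
    counts = Counter(labels)
    total = len(labels)
    entropy = 0.0
    for level in range(1, n_levels + 1):
        p = counts.get(level, 0) / total
        if p > 0:  # terms with p = 0 contribute nothing (0 * log2 0 is taken as 0)
            entropy -= p * math.log2(p)
    return entropy


def normalized_entropy(entropy, max_entropy=3.7):
    """Scale the entropy by the maximum value used in Eqs. (13)-(15)."""
    return entropy / max_entropy


# Ninth-order window for U12 = 28.7: l4, l5, l3, l3, l2, l2, l2, l5, l3
window = [4, 5, 3, 3, 2, 2, 2, 5, 3]
e12 = fluctuation_entropy(window)
print(round(e12, 4), round(normalized_entropy(e12), 4))  # 1.8911 0.5111
```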

To convert the numerical stock market fluctuation time series into NS, the truth-membership and falsity-membership terms of the NS must also be calculated; both follow from Equation (7). For example, for the fluctuation value $U\_{12} = 28.7$, the truth-membership of $X\_{12}$ is $T\_{X\_{12}} = \frac{28.7}{\frac{3}{2} \times len} + \frac{1}{3} = 0.5584$ and its falsity-membership is $F\_{X\_{12}} = \frac{-28.7}{\frac{3}{2} \times len} + \frac{1}{3} = 0.1082$, where $len = 85$. The fluctuations can then be represented as neutrosophic sets as follows:

...

$$X\_{12}(28.7) \rightarrow (0.5584, 0.5111, 0.1082) \tag{16}$$

$$X\_{13}(-106.5)\rightarrow(0.0000, 0.4137, 1.0000)\tag{17}$$

$$X\_{14}(-33.89)\rightarrow(0.0675, 0.3763, 0.5991)\tag{18}$$

$$X\_{223}(148.18) \to (1.0000, 0.3910, 0.0000) \tag{19}$$
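The conversion of a fluctuation value into a neutrosophic triple can be sketched as below. The membership form $T = u/(\tfrac{3}{2}\,len) + \tfrac{1}{3}$ and $F = -u/(\tfrac{3}{2}\,len) + \tfrac{1}{3}$, clipped to $[0, 1]$, is an assumption inferred from the worked values in Equations (16)–(19) (the clipping is what turns $-106.5$ into $(0.0000, \cdot, 1.0000)$); it is not copied from Equation (7). The value $len = 85$ comes from Equation (24), and `to_neutrosophic` is a hypothetical name.

```python
def _clip(value, low=0.0, high=1.0):
    """Clamp a membership degree into [low, high]."""
    return min(high, max(low, value))


def to_neutrosophic(u, indeterminacy, length=85.0):
    """Map fluctuation u to (T, I, F); I is the normalized entropy computed separately."""
    scaled = u / (1.5 * length)
    truth = _clip(scaled + 1 / 3)
    falsity = _clip(-scaled + 1 / 3)
    return truth, indeterminacy, falsity


# Fluctuations and normalized entropies taken from Eqs. (13)-(19).
for u, i in [(28.7, 0.5111), (-106.5, 0.4137), (-33.89, 0.3763), (148.18, 0.3910)]:
    t, _, f = to_neutrosophic(u, i)
    print(f"{u:>8}: ({t:.4f}, {i:.4f}, {f:.4f})")
```

Under this assumed form, the four printed triples match Equations (16)–(19) to four decimal places.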

### 4.1.2. Step 2: According to Definition 5, Establishing Mapping Relationships Based on Historical Values, Historical Trends, and Current Values

This step establishes neutrosophic logical relationships (NLRs) between the feature and target sets; for example, $X\_{12}$ is the feature item of $X\_{13}$.

...

...

$$X\_{12}(\mathbf{x}) \to X\_{13}(\mathbf{x}) = D\_{12}(\mathbf{x}) \tag{20}$$

$$X\_{13}(\mathbf{x}) \to X\_{14}(\mathbf{x}) = D\_{13}(\mathbf{x}) \tag{21}$$

$$\cdots$$
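In code, the relationships in Equations (20)–(21) can be represented as (LHS, RHS) pairs of consecutive neutrosophic points. This is a simplified sketch that does not reproduce Definition 5 itself; it assumes the historical-value, trend, and current-value information is already encoded in each $(T, I, F)$ triple, and `build_nlrs` is a hypothetical name.

```python
from typing import List, Tuple

Triple = Tuple[float, float, float]


def build_nlrs(points: List[Triple]) -> List[Tuple[Triple, Triple]]:
    """Pair each neutrosophic point X_t (LHS) with its successor X_{t+1} (RHS)."""
    return list(zip(points[:-1], points[1:]))


# Triples from Eqs. (16)-(18).
x12 = (0.5584, 0.5111, 0.1082)
x13 = (0.0000, 0.4137, 1.0000)
x14 = (0.0675, 0.3763, 0.5991)
nlrs = build_nlrs([x12, x13, x14])
print(nlrs[0])  # the X12 -> X13 relationship of Eq. (20)
```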

### 4.1.3. Step 3: Calculating the Jaccard Similarity

Jaccard similarity is commonly used to measure the similarity between finite sample sets; the higher the value, the greater the similarity. We used it to compare the current logical group with the logical groups in the training set in order to identify similar groups. $S\_{X\_{223,12}}$ denotes the similarity between the 223rd and 12th groups, computed in its vector form as the inner product divided by the sum of the squared norms minus the inner product:

$$\begin{aligned} S\_{X\_{223,12}} &= \frac{0.5584 \times 1.0000 + 0.5111 \times 0.3910 + 0.1082 \times 0.0000}{0.5584^2 + 0.5111^2 + 0.1082^2 + 1.0000^2 + 0.3910^2 + 0.0000^2 - (0.5584 \times 1.0000 + 0.5111 \times 0.3910 + 0.1082 \times 0.0000)} \\ &= 0.7742 \end{aligned} \tag{22}$$
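Equation (22) is the vector (extended) form of the Jaccard similarity and can be reproduced as follows. The function name `jaccard_similarity` is hypothetical, and returning 1.0 when both vectors are zero is an assumption made only to avoid division by zero.

```python
def jaccard_similarity(a, b):
    """Vector Jaccard similarity: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is zero only when both vectors are zero; treat them as identical.
    return dot / denom if denom else 1.0


x223 = (1.0000, 0.3910, 0.0000)  # Eq. (19)
x12 = (0.5584, 0.5111, 0.1082)   # Eq. (16)
print(round(jaccard_similarity(x223, x12), 4))  # 0.7742
```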

### 4.1.4. Step 4: Forecasting the Neutrosophic Fluctuation Point Using the Aggregation Operator

First, we applied the Jaccard similarity measure to locate NLRs whose left-hand sides (LHSs) are similar to the current group. Different similarity thresholds were tested on the training data; in this example, the threshold was set to 0.89, and 65 groups met this criterion.

We then calculated the forecast NFTS point using the aggregation operator:

$$\mathbf{D}\_{224} = (0.5005, 0.5067, 0.3401)$$
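The threshold-and-aggregate step can be sketched as below. This sketch does not implement the paper's aggregation operator, which is not reproduced in this section. It substitutes a plain component-wise arithmetic mean of the matched RHSs, and it assumes the criterion is "similarity ≥ threshold". `forecast_point` and the toy NLRs are hypothetical.

```python
def jaccard_similarity(a, b):
    """Vector Jaccard similarity: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 1.0


def forecast_point(current, nlrs, threshold=0.89):
    """Aggregate the RHSs of NLRs whose LHS is similar enough to `current`.

    Stand-in aggregation: component-wise arithmetic mean (an assumption).
    Returns None when no LHS reaches the threshold.
    """
    matched = [rhs for lhs, rhs in nlrs if jaccard_similarity(current, lhs) >= threshold]
    if not matched:
        return None
    n = len(matched)
    return tuple(sum(point[k] for point in matched) / n for k in range(3))


# Hypothetical NLRs: the first two LHSs are close to the current point, the third is not.
nlrs = [((0.5, 0.5, 0.2), (0.6, 0.4, 0.1)),
        ((0.5, 0.4, 0.2), (0.4, 0.6, 0.3)),
        ((0.0, 0.4, 1.0), (0.2, 0.3, 0.8))]
print(forecast_point((0.5, 0.45, 0.2), nlrs))
```

A weighted operator (for example, weighting each RHS by its similarity) would slot into the same place as the mean.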

### 4.1.5. Step 5: Calculating the Forecasted Value

We then calculated the predicted fuzzy fluctuation as the difference between the truth-membership and falsity-membership terms:

$$Y'(t+1) = 0.5005 - 0.3401 = 0.1604\tag{23}$$

This fuzzy fluctuation was then converted back into a real-valued fluctuation:

$$U'(t+1) = Y'(t+1) \times len = 0.1604 \times 85 = 13.63 \tag{24}$$

Finally, the predicted value was obtained from the actual value of the previous day and the predicted fluctuation value:

$$V'(t+1) = V(t) + \mathcal{U}'(t+1) = 7854.85 + 13.63 = 7868.47\tag{25}$$
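Equations (23)–(25) chain together as in the sketch below; `predict_next_value` is a hypothetical name, and `length=85` is the $len$ value from Equation (24). The inputs here are the rounded four-decimal values from Equation (23), so the result can differ from Equation (25) in the last digit.

```python
def predict_next_value(point, previous_close, length=85.0):
    """Eqs. (23)-(25): Y' = T - F, U' = Y' * len, V'(t+1) = V(t) + U'."""
    truth, _indeterminacy, falsity = point
    fuzzy_fluctuation = truth - falsity        # Eq. (23)
    fluctuation = fuzzy_fluctuation * length   # Eq. (24)
    return previous_close + fluctuation        # Eq. (25)


v = predict_next_value((0.5005, 0.5067, 0.3401), 7854.85)
print(round(v, 2))
```

The indeterminacy term does not enter the point forecast directly; it affects the result only through the similarity matching in Step 4.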

The complete forecasting results for the sample dataset are compared with the actual values in Table 1 and Figure 2. They show that NFM-IE, using the logical rules derived from the training data, successfully forecast the TAIEX from 1 November to 30 December 1999.


**Table 1.** Forecasting results from 1 November 1999–30 December 1999.

**Figure 2.** Forecasting results from 1 November 1999–30 December 1999.

## *4.2. Performance Assessments*

To quantify prediction accuracy, we used four error measures that are common in forecasting: the mean squared error (MSE), the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE).

They are defined in Equations (26)–(29), respectively:

$$MSE = \frac{\sum\_{t=1}^{n} \left( forecast\_t - actual\_t \right)^2}{n} \tag{26}$$

$$RMSE = \sqrt{\frac{\sum\_{t=1}^{n} \left(forecast\_t - actual\_t\right)^2}{n}} \tag{27}$$

$$MAE = \frac{\sum\_{t=1}^{n} \left| forecast\_t - actual\_t \right|}{n} \tag{28}$$

$$MAPE = \frac{\sum\_{t=1}^{n} \left| forecast\_t - actual\_t \right| / actual\_t}{n} \tag{29}$$

where $forecast\_t$ is the predicted value at time $t$, $actual\_t$ is the actual value at time $t$, and $n$ is the number of forecasts.
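The four measures can be computed together as sketched below. `error_metrics` is a hypothetical helper. MAPE is returned as a fraction, exactly as in Equation (29), without multiplying by 100.

```python
import math


def error_metrics(forecast, actual):
    """Return MSE, RMSE, MAE and MAPE as defined in Eqs. (26)-(29)."""
    if not actual or len(forecast) != len(actual):
        raise ValueError("forecast and actual must be non-empty and of equal length")
    n = len(actual)
    errors = [f - a for f, a in zip(forecast, actual)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


print(error_metrics([102.0, 98.0], [100.0, 100.0]))
```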

Theil's U index [31] measures the deviation between predicted and actual values. It takes a value between zero and one: zero means that the predicted values equal the actual values (a perfect forecast), whereas values approaching one indicate poor predictive performance. Theil's U index is expressed as follows:

$$U = \frac{\sqrt{\frac{\sum\_{t=1}^{n} \left(forecast\_t - actual\_t\right)^2}{n}}}{\sqrt{\frac{\sum\_{t=1}^{n} forecast\_t^2}{n}} + \sqrt{\frac{\sum\_{t=1}^{n} actual\_t^2}{n}}} \tag{30}$$
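A sketch of Theil's U in the bounded form described above (the RMSE divided by the sum of the root mean squares of the forecasts and the actual values) follows; `theil_u` is a hypothetical name. The denominator is zero only when every forecast and every actual value is zero, a case that cannot arise for stock index levels.

```python
import math


def theil_u(forecast, actual):
    """Theil's U: RMSE / (RMS of forecasts + RMS of actuals), bounded in [0, 1]."""
    n = len(actual)
    rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rmse / (rms_forecast + rms_actual)


print(theil_u([100.0, 101.0], [100.0, 101.0]))  # 0.0 (perfect forecast)
```

The bound of one is reached, for example, when every forecast is zero or is the exact negation of the actual value.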

We forecast the TAIEX separately for each year from 1997 to 2005 and computed the errors for each year using Equations (26)–(30).

As Table 2 shows, the different error measures consistently indicate that NFM-IE successfully forecasts the TAIEX time series for each year from 1997 to 2005.

**Table 2.** Comparing results of different error statistics methods for Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) data collected from 1997–2005.

