**AI and Financial Markets**

Special Issue Editors **Shigeyuki Hamori Tetsuya Takiguchi**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Special Issue Editors* Shigeyuki Hamori Kobe University Japan

Tetsuya Takiguchi Kobe University Japan

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Journal of Risk and Financial Management* (ISSN 1911-8074) (available at: https://www.mdpi.com/journal/jrfm/special_issues/AI_financial_markets).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03936-224-0 (Pbk) ISBN 978-3-03936-225-7 (PDF)**

Cover image courtesy of Sonia Rocca.

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.



## **About the Special Issue Editors**

**Shigeyuki Hamori** is Professor of Economics, Graduate School of Economics, Kobe University, Japan. He received his PhD in Economics from Duke University, United States. His research interests include applied time-series analysis, empirical finance, data science, and international finance.

**Tetsuya Takiguchi** is Professor of Information Science, Graduate School of System Informatics, Kobe University, Japan. He received his Dr. Eng. degree in Information Science from Nara Institute of Science and Technology, Japan. His research interests include signal processing, machine learning, pattern recognition, and statistical modeling.

## **Preface to "AI and Financial Markets"**

The application of AI technology to the analysis of financial markets has received significant attention due to the development of information processing technology, especially deep learning. This Special Issue is a collection of 10 articles on "AI and Financial Markets": four articles on machine learning ([1–4]), two articles based on agent-based artificial market simulation ([5] and [6]), three articles on the application of other approaches ([7–9]), and one concept article ([10]). This Special Issue provides an overview of "AI and Financial Markets" and, we hope, will serve as inspiration for both researchers and practitioners in financial technology.

The first group applies machine learning techniques such as supervised learning, reinforcement learning, and text analysis ([1–4]). Suimon et al. [1] built a three-factor model of the Japanese yield curve using the machine learning approach of an autoencoder. Yono et al. [2] developed a model to measure macroeconomic uncertainty based on news text. Zengeler and Handmann [3] used deep reinforcement learning for automatic high-frequency trading of contracts for difference on indices. Zhang and Hamori [4] combined machine learning methodologies with traditional economic models and examined whether such combinations exceed the predictive ability of a random walk.

The second block of papers deals with agent-based artificial market simulations ([5,6]). This approach is useful for creating unobserved market states and analyzing the impact of hypothetical investment actions on the market. Hirano et al. [5] developed a model with continuous double auction markets with three types of trading agents (stylized trading agents and two kinds of portfolio trading agents) to analyze the effects of the regulation of capital adequacy ratio on markets. Maeda et al. [6] developed a framework for training deep reinforcement learning models in agent-based artificial price–order–book simulations.

Third, three papers deal with other interesting approaches ([7–9]). Kim et al. [7] used the hidden Markov model to identify the phases of individual assets and proposed an investment strategy that uses price trends effectively. Yue et al. [8] outlined a network model of non-banking financial institutions in Romania. Using historical data, Vezeris et al. [9] implemented a trading system based on the Turtle rules and examined its efficiency when trading selected assets.

Last, but not least, Deng's [10] concept paper is on the Artificial Intelligence BlockCloud (AIBC). AIBC is an artificial intelligence and blockchain technology-based large-scale decentralized ecosystem that allows system-wide low-cost sharing of computing and storage resources.

These studies address various aspects of AI and financial markets. We hope these papers will be a valuable resource and inspiration for anyone working in the exciting new field of financial technology.


**Shigeyuki Hamori, Tetsuya Takiguchi** *Special Issue Editors*

## *Article* **Autoencoder-Based Three-Factor Model for the Yield Curve of Japanese Government Bonds and a Trading Strategy**

#### **Yoshiyuki Suimon 1,2,\*, Hiroki Sakaji 1, Kiyoshi Izumi 1 and Hiroyasu Matsushima 1**


Received: 2 March 2020; Accepted: 21 April 2020; Published: 23 April 2020

**Abstract:** Interest rates are representative indicators that reflect the degree of economic activity. The yield curve, which combines government bond interest rates by maturity, fluctuates to reflect various macroeconomic factors. Central bank monetary policy is one of the significant factors influencing interest rate markets. Generally, when the economy slows down, the central bank tries to stimulate the economy by lowering the policy rate to establish an environment in which companies and individuals can easily raise funds. In Japan, the shape of the yield curve has changed significantly in recent years following major changes in monetary policy. Therefore, an increasing need exists for a model that can flexibly respond to the various shapes of yield curves. In this research, we construct a three-factor model to represent the Japanese yield curve using the machine learning approach of an autoencoder. In addition, we focus on the model parameters of the intermediate layer of the neural network that constitute the autoencoder and confirm that the three automatically generated factors represent the "Level," "Curvature," and "Slope" of the yield curve. Furthermore, we develop a long–short strategy for Japanese government bonds by setting their valuation with the autoencoder, and we confirm good performance compared with the trend-follow investment strategy.

**Keywords:** yield curve; term structure of interest rates; machine learning; autoencoder; interpretability

#### **1. Introduction**

The interest rate on government bonds is a representative indicator of macroeconomic fundamentals. Generally, when an economy is active, household consumption and corporate capital investment are also active. In such situations, the demand for funds increases even when the interest rate level is high. As a result, this demand leads to a further rise in interest rates on loans, corporate bonds, and government bonds. In addition, the interest rates of government and corporate bonds are affected by the credit risk of the issuing country or company. For example, when concern increases regarding the issuer's financial and fiscal situation, upward pressure on interest rates also increases due to the rise in fund procurement costs. Meanwhile, nominal interest rates are also affected by inflation: the relationship whereby nominal interest rates rise when people's inflation expectations rise is known as the Fisher equation. Thus, the market price of government bonds and the corresponding interest rates change as they incorporate shifts in the macroeconomic environment. In addition to the macroeconomic environment of the market's home country, various factors in foreign markets are transmitted to the domestic interest rate market through interest rate arbitrage transactions.

Another factor significantly impacting the interest rate market is the monetary policy of the central bank. Traditionally, central banks adjust the level of short-term interest rates as the policy rate. For example, when the economy slows down, the central bank tries to stimulate the economy by lowering the policy rate to establish an environment in which companies and individuals can easily raise funds. Conversely, when the economy is overheating, the central bank will leverage the opposite mechanism by raising the policy rate. For these purposes, the central bank generally implements monetary policy by inducing short-term interest rates through open market operations.

However, in 2016, the Bank of Japan (BOJ) introduced "Quantitative and Qualitative Monetary Easing with Yield Curve Control," which sets targets for long-term and short-term interest rates (Bank of Japan 2016). The "yield curve" is formed by the interest rates of government bonds by maturity, with movements that reflect the interest rate fluctuation factors described above. The BOJ's yield curve control policy significantly affects the fluctuation characteristics of the yield curve because it sets a guidance target for long-term and short-term interest rates. Under this monetary policy, the medium-term and short-term interest rates of Japanese government bonds are currently negative. We show the change of the Japanese yield curve in Figure 1. In the past, the yield curve fluctuated in positive territory, but recently it has become negative in not only short-term but also long-term interest rates. Furthermore, the short- and medium-term segment of the curve has become almost flat in negative territory, and the volatility of short-term to long-term interest rates has declined. There is therefore an increasing need for a yield curve model that can flexibly cope with these changing yield curve shapes.

**Figure 1.** Japanese government bond (JGB) yield curve.

Analysis of the yield curve shape is important for forecasting fluctuations for trading purposes and risk management for bondholders to understand the characteristics of the market environment. With these backgrounds, this research focuses on the Japanese interest rate market. We propose a model that flexibly describes the shape of the yield curve and serves as a reference for evaluating the asset price of government bonds relative to the market environment.

Specifically, we develop a three-factor model that represents the yield curve of the Japanese government bond market using an autoencoder, a type of machine learning method structured as an artificial neural network. To propose such a method, we first design a simple self-encoder model with one hidden layer. By considering the model parameters of the neural network, we confirm that the three automatically generated factors represent the "Level," "Curvature," and "Slope" of the yield curve.

Various studies using complex neural network models exist today for financial market analysis and forecasting. For example, using neural network models, Krauss et al. (2017), analyzed the US stock market, Suimon (2018), forecasted the Japanese bond market, and Matthew et al. (2017), focused on commodity and foreign exchange markets. However, the outputs of complex neural network models such as deep learning are difficult to interpret, which creates problems for risk management. In financial businesses, accountability arises in many situations when forming investment strategy decisions and valuing assets held. Therefore, this research proposes a method that contributes to the transparency of decision making by constructing an interpretable yield curve model. Furthermore, we propose a method of using the autoencoder as a discriminator for overvalued and undervalued government bonds, and verify the prediction accuracy when using these judgments in investment strategies.

#### **2. Literature Review**

In this research, we propose a factor model for the JGB yield curve using an autoencoder to develop a trading strategy for judging the overvaluation or undervaluation of each maturity's government bonds. With respect to yield curve modeling, previous studies provide several methods to estimate the yield curve from government bond market prices. According to Kikuchi and Shintani (2012), these methods are classified into the following four types: (1) the piecewise polynomial method (McCulloch 1971, 1975), (Steeley 1991), whose models use piecewise polynomials to express the discount function; (2) the non-parametric method (Tanggaard 1997), which does not assume any specific structure for the discount function; (3) the polynomial method (Schaefer 1981), which models the discount function with polynomials; and (4) the parsimonious function method (Nelson and Siegel 1987), (Svensson 1995), which assumes specific functional forms for the yield curve term structure.

Regarding yield curve forecasting models, Diebold and Li (2006), modeled the US yield curve by using the Nelson–Siegel model (Nelson and Siegel 1987) and predicted changes in the model's three factors by using an AR (autoregressive) model. In addition, Diebold et al. (2006), analyzed the relationship between the Nelson–Siegel factors and macroeconomic variables. They demonstrated that the level factor is related to inflation and the slope factor is related to economic activity, whereas the curvature factor did not have a clear relationship with macroeconomic variables.
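The Diebold–Li style of factor forecasting described above can be sketched in a few lines: fit an AR(1) model to a factor series by least squares and iterate it forward. The sketch below is illustrative only, and the factor series is synthetic toy data, not an actual Nelson–Siegel factor.

```python
import numpy as np

def fit_ar1(x):
    """Estimate AR(1) coefficients (c, phi) by least squares: x_t = c + phi * x_{t-1}."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return c, phi

def forecast_ar1(x, steps=1):
    """Iterate the fitted AR(1) recursion forward from the last observation."""
    c, phi = fit_ar1(x)
    f = x[-1]
    for _ in range(steps):
        f = c + phi * f
    return f

# Toy mean-reverting "level" factor series (hypothetical data).
rng = np.random.default_rng(0)
level = np.zeros(200)
for t in range(1, 200):
    level[t] = 0.1 + 0.9 * level[t - 1] + 0.05 * rng.standard_normal()

print(round(float(forecast_ar1(level, steps=1)), 3))
```

The same recursion applied to each of the three factors yields a forecast of the whole curve once the factors are mapped back through the model's loadings.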

The research of Reisman and Zohar (2004), extracted three principal component factors with PCA, simulated the time series of each factor, and predicted the yield curve changes based on the time series model factors. In addition, Moench (2008), and Ludvigson and Ng (2009), conducted PCA on multiple macro indexes and predicted the interest rates based on the established macro-based PCA factors.

Furthermore, Suimon (2018), also proposed a yield curve model based on machine learning methods, and expressed the interest rate term structure by focusing on the relative relationship of the three periods (5-, 10-, and 20-year interest rates) of the yield curve instead of the three Nelson–Siegel factors. Then, using Long Short-Term Memory (LSTM), they forecasted the long-term interest rate. Extending this research, Suimon et al. (2019a, 2019b), incorporated Japanese and US interest rate information directly into a neural network model. Moreover, based on interest rate parity theory, Suimon et al. (2019c), incorporated the Dollar–Yen exchange rate in addition to the US and Japanese interest rates into a neural network model.

Machine learning methods for modeling financial markets have been developed, particularly for forecasting stock markets. For example, multiple investigations on stock price prediction with neural networks have been reported (Soni 2011). Olson and Mossman (2003), demonstrated the utility of a neural network model for forecasting the Canadian stock market with multiple accounting ratios as the input data. Research by Krauss et al. (2017), predicted next-day US stock market pricing by combining three machine learning methods: deep neural networks, gradient-boosted trees, and random forests. They reported that the prediction accuracy of this combined method exceeded the individual methods' prediction accuracies. In another recent study on forecasting stock returns in the cross-section, Abe and Nakayama (2018), showed that deep neural networks outperform shallow neural networks as well as other typical machine learning models. Deep learning research on financial markets apart from the stock market was performed by Matthew et al. (2017), who focused on commodity and foreign exchange markets.

#### **3. Term Structure Model of the Yield Curve**

#### *3.1. Changes in Government Bond Interest Rates*

Interest rates (yields of government bonds) are representative indicators of the macroeconomic environment. When economic activity is booming, interest rates on bank loans, corporate bonds, and government bonds experience upward pressure due to the growing demand for various funds. In addition, interest rates on government bonds and corporate bonds are affected by the credit risk of the issuing country or company. Interest rate fluctuation factors in overseas markets also affect the domestic market through financial arbitrage transactions. Through these interactions, the market price of the government bonds is formed while incorporating changes in the market environment, and the interest rates calculated from government bond prices change.

#### *3.2. Term Structure Model of the Japanese Government Bond Yield Curve*

Figure 2 plots a history of several interest rates with varying maturities. Currently in Japan, in addition to short-term discount government bonds, government bonds with maturities of 2, 5, 10, 20, 30, and 40 years are issued, and there are market interest rates on government bonds with a range of maturities (Ministry of Finance 2019). The "yield curve" shown in Figure 3 combines the interest rates of these maturities.

**Figure 2.** Historical data of Japanese government bond interest rates.

**Figure 3.** Japanese government bond (JGB) yield curve data and the Nelson–Siegel model curve.

The shape of the yield curve can be expressed using a term structure model for interest rates. For example, using the Nelson–Siegel model (Nelson and Siegel 1987), the term structure of interest rates can be expressed by the following functional form, where *y* is the interest rate, τ is the term, and λ is a constant. The Nelson–Siegel model curve is fitted in Figure 3 to the actual Japanese government bond (JGB) yield curve according to

$$y(\tau) = F_1 + F_2 \left(\frac{1 - e^{-\lambda \tau}}{\lambda \tau}\right) + F_3 \left(\frac{1 - e^{-\lambda \tau}}{\lambda \tau} - e^{-\lambda \tau}\right) \tag{1}$$

This model function consists of three factors *F*1, *F*2, and *F*3, representing the level, slope, and curvature of the yield curve. The coefficients for each factor by maturity are shown in Figure 4. The Nelson–Siegel model represents the shape of the yield curve approximately with a simple function; while this is convenient, it also imposes restrictions on the shapes the model can take.
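Equation (1) can be implemented directly. The sketch below is illustrative (not the authors' code): the factor values and the decay constant λ are hypothetical, and the maturities match those used later in the paper.

```python
import numpy as np

def nelson_siegel(tau, f1, f2, f3, lam=0.6):
    """Nelson-Siegel yield at maturity tau: level (f1), slope (f2), curvature (f3)."""
    tau = np.asarray(tau, dtype=float)
    decay = (1 - np.exp(-lam * tau)) / (lam * tau)   # slope loading, 1 at tau->0, 0 at tau->inf
    return f1 + f2 * decay + f3 * (decay - np.exp(-lam * tau))

# Maturities in years and illustrative factor values (hypothetical).
maturities = np.array([2, 5, 7, 10, 15, 20])
curve = nelson_siegel(maturities, f1=1.5, f2=-1.2, f3=0.8)
print(np.round(curve, 3))
```

With a negative slope factor the curve is upward sloping, and the long end approaches the level factor *F*1, consistent with the loadings plotted in Figure 4.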

**Figure 4.** Three factors of the Nelson–Siegel model.

Principal factors can also be extracted by performing principal component analysis (PCA) directly on interest rate data. Applied to weekly Japanese government bond yield data for the 2-, 5-, 7-, 10-, 15-, and 20-year maturities from 1992 onward, the first three PCA factors have a cumulative contribution of nearly 99%, suggesting that three factors can almost fully express the shape of the yield curve. Figure 5 plots the shape of the eigenvector for each PCA factor, showing that the first, second, and third factors represent the level, slope, and curvature of the yield curve, respectively.

**Figure 5.** Eigenvectors of the principal component analysis (PCA) factors 1–3.
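The PCA exercise above can be sketched on a synthetic yield panel standing in for the JGB data (the panel, loadings, and noise level below are all hypothetical). With three latent factors plus small noise, the first three principal components account for nearly all of the variance, mirroring the roughly 99% cumulative contribution reported.

```python
import numpy as np

rng = np.random.default_rng(1)
maturities = np.array([2, 5, 7, 10, 15, 20], dtype=float)

# Hypothetical weekly yield panel driven by level/slope/curvature factors plus small noise.
n = 500
level = np.cumsum(0.02 * rng.standard_normal(n))
slope = np.cumsum(0.01 * rng.standard_normal(n))
curv = np.cumsum(0.01 * rng.standard_normal(n))
x = maturities / 20
loadings = np.column_stack([np.ones(6), x, x * (1 - x)])   # level, slope, hump
yields = np.column_stack([level, slope, curv]) @ loadings.T
yields += 0.001 * rng.standard_normal((n, 6))

# PCA via SVD of the de-meaned panel.
X = yields - yields.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(np.round(np.cumsum(explained)[:3], 4))  # cumulative contribution of factors 1-3
```

The rows of `Vt` are the eigenvectors whose shapes correspond to Figure 5.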

#### **4. Autoencoder-Based Model Design for the Yield Curve**

#### *4.1. Autoencoder*

We next construct a model that expresses the yield curve of the Japanese government bond market using an autoencoder, an algorithm for dimension compression using neural networks (Hinton and Salakhutdinov 2006). Principal component analysis is an example of linear dimension compression. An autoencoder, by contrast, is a neural network trained so that its output layer reproduces the same training data presented to its input layer. By increasing the number of nodes in the hidden layer, more complex yield curve shapes can be expressed. In this research, we construct neural network models with 2, 3, and 4 nodes in the hidden layer for comparison.

Figure 6 illustrates how we incorporate 2-, 5-, 7-, 10-, 15-, and 20-year interest rate data into a learning model. In this autoencoder model, *Y* is the vector of the input information (the 2-, 5-, 7-, 10-, 15-, and 20-year interest rates), and the activation function of the hidden layer is the hyperbolic tangent. The output of the model is *Y*′. We estimate the model parameters *a* and *b* so that the output *Y*′ matches the input *Y*, using weekly data from July 1992 to July 2019. We interpret each node of the hidden layer in the autoencoder by assigning a linear function to represent the path to the output layer and considering the function's coefficients.

$$\mathbf{Y}' = b\mathbf{F} = b[\tanh(a\mathbf{Y})] \tag{2}$$
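A toy version of Equation (2) can be trained with plain gradient descent. This is an illustrative sketch, not the authors' implementation: the data are synthetic rank-three yields, biases are omitted for clarity, and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy yield panel (400 weeks x 6 maturities) generated from three latent factors
# (hypothetical data standing in for the 2-20-year JGB yields used in the paper).
n, m, k = 400, 6, 3
latent = rng.standard_normal((n, k)) * np.array([1.0, 0.5, 0.2])
mix = rng.standard_normal((k, m))
Y = latent @ mix

# One-hidden-layer autoencoder Y' = b tanh(aY), trained by gradient descent on MSE.
a = 0.3 * rng.standard_normal((m, k))
b = 0.3 * rng.standard_normal((k, m))
lr = 0.03
for _ in range(8000):
    H = np.tanh(Y @ a)                               # three hidden factors
    err = (H @ b) - Y                                # reconstruction error
    grad_b = H.T @ err / n
    grad_a = Y.T @ (err @ b.T * (1 - H**2)) / n      # backprop through tanh
    a -= lr * grad_a
    b -= lr * grad_b

mse = np.mean(((np.tanh(Y @ a) @ b) - Y) ** 2)
print(round(float(mse), 4))
```

Inspecting the rows of `b` after training corresponds to the analysis of the output-layer coefficients in Figure 7.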

First, we analyze the model with a hidden layer comprised of three nodes. Figure 7 shows the coefficient *b* of the linear function representing the output from the hidden layer, which provides the correspondence between the hidden layer's nodes and the output layer's nodes. Each node in the hidden layer can be interpreted as the level, slope, or curvature of the yield curve.

**Figure 6.** Autoencoder model constructed with interest rate data.

Based on these results, Figure 8 compares the value of each node with the actual interest rate level and interest rate spread (i.e., the interest rate difference). For example, comparing Node 2, representing the level, with the two-year interest rate at the short end of the yield curve, the two are approximately linked. Comparing Node 1, interpreted as the slope, with the 2–20-year interest rate spread (i.e., 20-year yield − 2-year yield), both move similarly. Node 3, interpreted as the curvature of the yield curve centered on long-term interest rates, compared with the 2-10-20-year butterfly spread (i.e., 2 × 10-year yield − 2-year yield − 20-year yield), also moves similarly in Figure 8.

**Figure 7.** Coefficients of the linear function output from the hidden layer (for three nodes).

**Figure 8.** Plots of the hidden layer node output with the corresponding interest rate levels and interest rate spread.
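The spreads used for these comparisons are simple linear combinations of yields and can be computed directly; the yield values below are hypothetical examples, not data from the paper.

```python
# Yields by maturity in percent (hypothetical values).
yields = {2: 0.10, 10: 0.45, 20: 1.05}

# Slope proxy: the 2-20-year spread compared with the "slope" hidden node.
slope = yields[20] - yields[2]

# Curvature proxy: the 2-10-20-year butterfly spread, 2 x belly minus the two wings.
butterfly = 2 * yields[10] - yields[2] - yields[20]

print(round(slope, 2), round(butterfly, 2))  # → 0.95 -0.25
```

A negative butterfly indicates the belly (10-year) is rich relative to the wings, which is the kind of distortion the curvature node tracks.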

Next, we review the results with only two nodes in the hidden layer. Figure 9 plots the coefficients of the linear function representing each hidden node's path to the output layer. With only two hidden nodes, the nodes are interpreted as the curvature of the yield curve and the combined slope and level of the yield curve.

**Figure 9.** Coefficients of the linear function output from the hidden layer with two nodes.

Figure 10 plots comparisons between the value of each node in the hidden layer and the actual interest rate spreads. Node 1, interpreted as the slope, moves similarly to the 2–20-year interest rate spread (i.e., 20-year yield − 2-year yield). Node 2, interpreted as a curvature of the yield curve centered on the long-term interest rate, is approximately linked to the 2-7-20-year butterfly spread (i.e., 2 × 7-year yield − 2-year yield − 20-year yield).

**Figure 10.** Plots of the hidden layer nodes' output corresponding to the interest rate spreads.

Finally, we analyze the model with four nodes in the hidden layer. According to the coefficients of the linear function representing the output from the hidden layer, as shown in Figure 11, Node 1 represents the level, Node 3 represents the curvature, and Node 2 and Node 4 represent the slope. However, according to the shape of each coefficient vector, Node 1, Node 2, and Node 4 also include a curvature element. So, with four nodes in the hidden layer, the interpretation of each is not as straightforward as the models with two or three nodes. In the principal component analysis (PCA) described in Section 3, the cumulative contribution to the shape of the yield curve when using the third principal component factor was about 99%. So, as suggested by these results, the autoencoder can best represent the shape of the yield curve with three nodes in the hidden layer.

**Figure 11.** Coefficients of the linear function output from the hidden layer with four nodes.

In this research, we proposed a yield curve model using an autoencoder. Like the Nelson–Siegel model (Nelson and Siegel 1987) and other known factor models (Svensson 1995) (Dai and Singleton 2000), the proposed model can express the shape of the yield curve by combining three factors: level, slope, and curvature. Factor models such as the Nelson–Siegel and Svensson models need an explicitly specified functional form for the shape of the yield curve. In contrast, autoencoder-based and other neural network-based models are highly flexible in expressing the yield curve because their functional forms can be set flexibly. With significant changes in monetary policy and other factors, the fluctuation characteristics of the yield curve also change, so a flexible functional form is required for yield curve modeling. When using an autoencoder-based or neural network-based model, not only the model parameters but also the hyperparameters and the number of nodes can be changed to increase the flexibility of the model function that expresses the shape of the yield curve.

Furthermore, when using PCA for yield curve modeling and specifying the number of principal component factors, the contribution of the remaining PCA factors is discarded. However, when using an autoencoder, all input information is used via the network model. These points are advantages of the autoencoder-based model as a yield curve factor model compared to other factor models.

#### *4.2. Autoencoder-Based Yield Curve Model and Trading Strategy*

From the viewpoint of asset price evaluation and investment strategies for government bonds, we propose using an autoencoder that models the shape of the yield curve. The interest rate output by the trained autoencoder is calculated based on the relative relationship with the interest rates of the other maturities. So, in this section, we apply the trained autoencoder as a discriminator for overvalued or undervalued government bonds compared to other maturities. We also construct a long–short strategy for government bonds based on these overvalued and undervalued evaluations to verify its performance.

The interest rate data for each maturity at the time of investment is input to the trained autoencoder, and we define the output interest rate as the reference interest rate. For each maturity, if the interest rate at the time of investment is higher than the reference interest rate, we judge the government bond to be undervalued, as shown in Figure 12, so we go long (buy) the bond. On the other hand, if the interest rate is lower than the reference interest rate, we go short (sell) the bond. The investment period for each position is one or three months. For training the autoencoder, we use data from the previous 2, 5, or 10 years, excluding data at the time of the model update, and update the models annually. Weekly interest rate data from July 1992 to July 2019 is used for the investment simulation.

**Figure 12.** Judgments of overvaluation or undervaluation using an autoencoder.
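The long–short rule and the capital gain accounting in bp can be sketched as follows. This is a minimal illustration of the decision rule described above; the rates, function names, and the single-maturity example are hypothetical.

```python
import numpy as np

def long_short_signals(market_rates, reference_rates):
    """+1 = long (market rate above the autoencoder's reference rate, i.e. the bond
    looks undervalued); -1 = short (market rate below the reference rate)."""
    return np.where(np.asarray(market_rates) > np.asarray(reference_rates), 1, -1)

def capital_gain_bp(signal, rate_now, rate_later):
    """Capital gain proxy in bp: a long gains when rates fall, a short when rates rise."""
    return signal * (rate_now - rate_later) * 100

# Hypothetical 10-year example: market 0.48% vs. reference 0.45% -> long;
# the rate then falls to 0.40%, giving a gain of 8 bp.
sig = long_short_signals([0.48], [0.45])[0]
print(sig, round(capital_gain_bp(sig, 0.48, 0.40), 1))  # → 1 8.0
```

As in the paper's simulation, this accounting deliberately ignores carry, rolldown, and repo costs so that only the directional forecast is evaluated.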

Figure 13 shows the simulation results for the autoencoder with three hidden layer nodes: the average capital gain over one month for each long–short strategy. The unit of the capital gain is bp (0.01%). For a long position, a decrease in interest rates during the investment period is the capital gain; for a short position, an increase in interest rates is the capital gain. To verify the accuracy of the model's interest rate forecasting, we do not consider the effects of carry and rolldown, or the repo cost of holding a short position. For comparison with this trading strategy, we present the results of a trend-follow investment strategy (i.e., long if the interest rate declined from the previous week at each investment period, and short if it rose) along with the results of strategies that always hold a long (short) position.

The performance of the investment simulation depends on the number of nodes in the hidden layer, the learning period of the model, and the maturity of the government bonds to be invested. Figure 13 shows the result of the proposed strategy with the three-node model with a learning period of about 5 years and an investment period of 1 month. For the 10-year and 20-year government bonds' investment strategies, these results suggest that the proposed model has a higher investment return than the trend-follow investment strategy.

The results from the models with two or four nodes in the hidden layer are included in Appendix A. For both cases, the performances are similar to the model with three hidden layer nodes. However, for the one-month investment strategy of 10-year and 20-year government bonds using 5 years for the learning period, the performance of the three-node case is better.

**Figure 13.** Average monthly capital gains based on different learning periods on a three-node hidden layer model with investment periods of 1 or 3 months.

In this strategy, the trained autoencoder calculates the reference interest rate from the relationships with the other maturities' interest rates. If the interest rate of the target maturity is distorted compared to the other rates, the autoencoder's judgment can automatically construct the investment position that corrects the distortion. However, such interest rate distortions between maturities are corrected within a relatively short period, which is why the one-month investment period performs better than the three-month period in these results.

The performance of the model is good when the learning period is approximately 5 years, which is likely due to the frequency of monetary policy changes that significantly affect the characteristics of the interest rate market. Based on the Yu-cho Foundation (2018), Figure 14 shows the timing of recent major Japanese monetary policy changes and illustrates that the monetary policy framework changes every two to five years. Given this frequency of change, a model learning period of 10 years makes it difficult to respond to changes in market characteristics. On the other hand, with a learning period of 2 years, the number of samples is too small given the weekly granularity of the data. From these observations, a learning period of 5 years is presumed to offer the best performance.

**Figure 14.** Monetary policy change with the medium- and short-term interest rates.

In summary, we confirmed the relative effectiveness of the 10-year and 20-year government bond strategies with a learning period of 5 years and an investment period of 1 month. Figure 15 shows the cumulative performance of these strategies, verifying positive cumulative returns for both. However, for the 10-year government bond strategy, the cumulative returns decline significantly after the beginning of 2016. Figure 15 also includes the 10-year and 20-year bond yields, showing that the 10-year bond yield fell to nearly 0% after the introduction of the "Quantitative and Qualitative Monetary Easing with a Negative Interest Rate" policy in January 2016. Furthermore, after the introduction of the "Quantitative and Qualitative Monetary Easing with Yield Curve Control" policy in September 2016, the 10-year bond yield has remained tightly around 0%. We therefore see that such rigidity in market interest rates can cause the sluggish performance of the long–short strategy for 10-year government bonds.

**Figure 15.** Interest rates and the result of the autoencoder-based investment strategies.

#### *4.3. Comparison with Other Strategy Models*

In the previous section, we confirmed the performance of the investment strategies that use the autoencoder as a discriminator to judge overvaluation or undervaluation of government bonds at each maturity. In this strategy, as shown in Figure 16, the investment decision is based on the interest rate data at the time of the investment, without directly using the historical data before the investment day. Nevertheless, a relatively stable return is obtained.

**Figure 16.** The differences in interest rate information used for each investment strategy.

On the other hand, this investment strategy is ancillary, as it was devised during the development of the autoencoder for the yield curve factor model. Typically, an investment strategy for government bonds would incorporate historical interest rate data. For example, Suimon (2018) and Suimon et al. (2019b) proposed an investment strategy based on a neural network model trained on historical interest rate time series. These studies demonstrated the usefulness of an LSTM (Long Short-Term Memory) model (Hochreiter and Schmidhuber 1997) that learns to make predictions from interest rate time series data. In addition, VAR (vector autoregression)-based strategies (Estrella and Mishkin 1997; Ang and Piazzesi 2003; Afonso and Martins 2012) are also known investment strategies for the government bond market that use historical interest rate data. Building on these studies, we next implement investment strategies using LSTM and VAR models trained on historical interest rate data and compare their investment performance with that of the autoencoder model.

The LSTM model is a type of recurrent neural network (RNN) that takes past time series information as sequential input. Figure 17 illustrates the relationship between the input and output interest rate data for the LSTM model we implement here; the structure of the LSTM block in Figure 17 is shown in Figure 18. As shown in Figure 16, in addition to the interest rate data at the time of investment, the interest rate data of the previous weeks are used as input to the model. The LSTM model then learns the correspondence between these past interest rates and future interest rates.
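The windowing described above can be illustrated with a short sketch (a minimal pure-Python illustration with hypothetical names, not the authors' TensorFlow implementation): each training example pairs the investment-week rate and the preceding weeks with the rate roughly one month (four weeks) ahead.

```python
def make_supervised_pairs(weekly_rates, n_lags=4, horizon=4):
    """Build (input window, target) pairs from a weekly rate series.

    Each input is the `n_lags` most recent weekly rates (the investment
    week plus the preceding weeks); the target is the rate `horizon`
    weeks (about one month) ahead.
    """
    pairs = []
    for t in range(n_lags - 1, len(weekly_rates) - horizon):
        window = weekly_rates[t - n_lags + 1 : t + 1]
        target = weekly_rates[t + horizon]
        pairs.append((window, target))
    return pairs
```

Such pairs are what the LSTM (or any sequence model) is fitted on: past windows as input, the one-month-ahead rate as the supervised target.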

**Figure 17.** Structure of the Long Short-Term Memory (LSTM) model.

**Figure 18.** Structure of the LSTM block.

Furthermore, as with the LSTM, we implement a VAR model that uses the interest rate information of the past few weeks as input and the interest rate one month ahead as output. Here, let $y_t$ be the interest rate one month ahead and $y_{t-i}$ ($i = 1, \ldots, 4$) be the weekly interest rates of the investment week and the three preceding weeks. $\Phi_i$ and $C$ are the model parameters.

$$y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \Phi_3 y_{t-3} + \Phi_4 y_{t-4} + C + \varepsilon_t \tag{3}$$

Using the LSTM model and VAR model, we predict the interest rate one month later and decide to buy or sell the government bonds for each maturity based on the relationship between the actual interest rate at the time of investment and the forecasted future interest rate. For example, if the predicted interest rate one month ahead according to the model is higher than the interest rate at the time of investment, we expect the interest rate will rise, so we short (sell) the government bond. On the other hand, if the predicted interest rate is lower than the current interest rate, we expect the interest rate will fall, and we long (buy) the government bond.
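As a minimal sketch of this decision rule (hypothetical function names; shown for the single-maturity, scalar case of Equation (3), with coefficients assumed already estimated elsewhere):

```python
def forecast_one_month(recent_rates, phi, c):
    """One-month-ahead forecast in the spirit of Equation (3),
    single-maturity case: y_t = phi1*y_{t-1} + ... + phi4*y_{t-4} + c.

    `recent_rates` holds the four most recent weekly rates, newest first,
    and `phi` the corresponding coefficients (assumed pre-estimated).
    """
    return sum(p * y for p, y in zip(phi, recent_rates)) + c

def trade_signal(current_rate, predicted_rate):
    """Short the bond if rates are expected to rise (prices to fall),
    long if rates are expected to fall (prices to rise)."""
    if predicted_rate > current_rate:
        return "short"
    elif predicted_rate < current_rate:
        return "long"
    return "flat"
```

The same `trade_signal` rule applies whether the forecast comes from the VAR above or from the LSTM model.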

Figure 19 shows the results of our investment simulation. As with the autoencoder strategy, the learning period of the LSTM and VAR models is 5 years, and we retrain the models every year. The investment strategies using these models demonstrate relatively high investment performance compared to the strategy using the autoencoder.
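The rolling setup described here — train on the previous 5 years, invest for the following year, then retrain — can be sketched as follows (a hypothetical helper, not the authors' code):

```python
def rolling_retrain_schedule(first_year, last_year, train_years=5):
    """Yearly retraining windows: train on the previous `train_years`
    years of data, then invest during the following year.

    Returns (train_start_year, train_end_year, invest_year) tuples.
    """
    schedule = []
    for invest_year in range(first_year + train_years, last_year + 1):
        schedule.append((invest_year - train_years, invest_year - 1, invest_year))
    return schedule
```

Each tuple defines one train/invest cycle of the backtest; the model is refitted at the start of every investment year.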

**Figure 19.** Average monthly capital gains using the LSTM, VAR and autoencoder-based models with a learning period of 5 years and an investment period of 1 month.

Considering the cumulative returns presented in Figure 20, the LSTM and VAR models, which utilize historical time series information, provide stable returns. The strategy using the autoencoder, which does not use past interest rate time series as input, is inferior in terms of trading performance.

**Figure 20.** Cumulative returns from each investment strategy.

However, based on the yield curve shape at the time of investment, the autoencoder determines whether the government bonds are overpriced or underpriced, enabling a sell or buy decision based on this valuation. The cumulative return of the autoencoder strategy is stably positive, so its evaluation of overpricing or underpricing for each bond at the time of investment is reasonable. Therefore, the proposed autoencoder model is effective from the viewpoint of evaluating government bonds relative to the market environment.

In addition, the autoencoder-based model we propose has merit from the viewpoint of interpretability. The model expresses the yield curve by three factors, which are interpreted as the level, slope, and curvature of the yield curve. The proposed trading strategy constructs a position in the direction in which the deviation between the actual curve and the theoretical curve reconstructed by the autoencoder is corrected, so we can clearly interpret what the strategy is betting on. By contrast, the LSTM- and VAR-based strategies shown in this paper predict future interest rates directly from historical interest rate information and decide to sell or buy based on that prediction. It is therefore difficult to interpret whether an LSTM or VAR position is betting on a pattern of interest rate changes from the past or on the correction of a distortion in the yield curve at the time of investment. In short, the merit of the proposed autoencoder-based model lies in the interpretability both of the model itself and of what the strategy is betting on.

Finally, we note the programming tools used for the analysis and simulation in this research. The programming language used throughout is Python; the neural network models (the proposed autoencoder-based yield curve model and the LSTM model) were implemented with the TensorFlow library, and PCA was implemented with scikit-learn.

#### **5. Conclusions**

We proposed a factor model for JGB yield curves using an autoencoder. In Japan, the shape of the yield curve has changed significantly in recent years following major adjustments in monetary policy, such as the "Quantitative and Qualitative Monetary Easing with Yield Curve Control" introduced by the BOJ in 2016. Under this monetary policy, the medium-term and short-term interest rates of JGBs are currently negative. Recently, interest rates have been declining due to slow global economic growth, and central bank easing has pushed rates negative in some countries other than Japan. In addition to adjusting short-term policy rates, some central banks have introduced policies that directly affect long-term interest rates, such as purchases of long-term government bonds. As a result, yield curve shapes have diversified globally, and there is an increasing need for a yield curve model that can flexibly accommodate these changing shapes. The neural network-based autoencoder model offers this flexibility through the choice of the number of nodes and the activation function.

When a complex neural network model, such as a deep learning model, is used, the model and its outputs are often difficult to interpret. In this research, we focused on the parameters of the intermediate layer of the autoencoder and confirmed that the three automatically generated factors represent the "Level," "Curvature," and "Slope" of the yield curve. We consider this interpretability of the yield curve model significant from the viewpoint of risk management in financial businesses.

Furthermore, we developed a long–short strategy for JGBs that uses the autoencoder to determine whether they are overpriced or underpriced, and we confirmed its good performance relative to a trend-following investment strategy. In particular, for the 10-year and 20-year government bonds, the cumulative return of the one-month investment strategy based on the autoencoder model (three hidden-layer nodes, learning period of 5 years) is stably positive, so the evaluation of overpricing or underpricing at the time of investment is reasonable in these cases. Therefore, our proposed autoencoder model is effective for evaluating long-term government bonds relative to the market environment.

On the other hand, in terms of prediction accuracy, the LSTM model using past interest rate time series data offered better performance. Based on this result, future work will customize the neural network structure of the yield curve model to improve prediction accuracy while retaining the interpretability proposed here. Furthermore, since we analyzed interest rate data only for the Japanese government bond market, we will in the future conduct similar analyses on other markets, such as the United States and Europe, and compare the results globally.

**Author Contributions:** Conceptualization, Y.S.; methodology, Y.S.; software, Y.S.; validation, Y.S., H.S., K.I. and H.M.; formal analysis, Y.S.; investigation, Y.S.; resources, Y.S.; data curation, Y.S.; writing—original draft preparation, Y.S.; writing—review and editing, Y.S., H.S., K.I. and H.M.; visualization, Y.S.; supervision, H.S. and K.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

In Section 3.2, we simulated the investment strategy using the autoencoder model proposed in this research. The following figures show the results for models with hidden layers of two and four nodes, respectively.

**Figure A1.** Average monthly capital gains with a two-node hidden layer for investment periods of 1 or 3 months.

**Figure A2.** Average monthly capital gains with a four-node hidden layer with investment periods of 1 or 3 months.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Construction of Macroeconomic Uncertainty Indices for Financial Market Analysis Using a Supervised Topic Model †**

#### **Kyoto Yono \*, Hiroki Sakaji , Hiroyasu Matsushima , Takashi Shimada and Kiyoshi Izumi**

School of Engineering, The University of Tokyo, Tokyo 113-8654, Japan; sakaji@sys.t.u-tokyo.ac.jp (H.S.); matsushima@sys.t.u-tokyo.ac.jp (H.M.); shimada@sys.t.u-tokyo.ac.jp (T.S.); izumi@sys.t.u-tokyo.ac.jp (K.I.)


Received: 9 January 2020; Accepted: 15 April 2020; Published: 19 April 2020

**Abstract:** Uncertainty in the financial market, such as whether the US–China trade war will slow down the global economy or whether the Federal Reserve Board (FRB) will raise interest rates, can have a crucial impact on the purchase or sale of financial assets. In this study, we aim to build a model for measuring macroeconomic uncertainty based on news text. We propose an extended topic model that uses not only news text data but also numeric data as a supervised signal for each news article, and we use the proposed model to construct macroeconomic uncertainty indices. These indices correspond well to historical macroeconomic events. The correlation between market volatility and an uncertainty index was higher for indices with a larger expected supervised signal than for those with a smaller one. We also applied the impulse response function to analyze the impact of the uncertainty indices on financial markets.

**Keywords:** uncertainty; economic policy; text mining; topic model

#### **1. Introduction**

Macroeconomic uncertainty is a factor that influences the purchase and sale of financial assets. For investors, macroeconomic uncertainty is the degree of unpredictability of the future direction of the economy, spanning topics such as monetary and fiscal policy in each country and trade friction between countries.

Bank of America Merrill Lynch surveys institutional investors all over the world every month about their views of the world economy and their asset class allocation. When uncertainty about a particular country is high, institutional investors reduce the weight of that country's risky assets and reallocate to safe assets such as bonds and cash.

For example, when uncertainty about the EU economy is high and investors are uncertain about economic growth outside the US, they shift their allocations toward US assets and increase safe assets such as bonds and cash. Conversely, when the uncertainty about a country dissipates, investors buy that country's assets and reduce their holdings of safe assets.

#### *1.1. Measurement of the Macroeconomic Uncertainty*

In a modern economic environment, several macroeconomic uncertainties co-exist; investors can improve their investment strategies if they can quantitatively classify uncertainty by its source and measure it, allowing them to hedge the risk associated with high macroeconomic uncertainty. Investors may also utilize such uncertainty measures for stress testing.

Historical volatility of a financial asset is partially influenced by uncertainty (Chuliá et al. 2017). One way to measure the magnitude of uncertainty is therefore to measure the magnitude of historical volatility. However, if assets are influenced by several macroeconomic uncertainties, or if volatility is also driven by demand and supply, measuring volatility alone is insufficient to evaluate uncertainty.

Recently, alternative methods have been proposed to measure macroeconomic uncertainty using text mining. (Baker et al. 2016) developed an economic policy uncertainty index, and (Manela and Moreira 2017) developed a news-implied volatility measure. We introduce the related studies in the subsequent section.

#### *1.2. Our Contributions*

The objective of this study is to construct macroeconomic uncertainty indices based on news text. We propose an extended topic model using both news text data and numeric data as a supervised signal for each news article.

For our research, we used supervised Latent Dirichlet Allocation (sLDA), which generates the uncertainty topics without requiring pre-defined words for each type of uncertainty (sLDA is one of the topic models discussed in Section 5).

One benefit of our model is that it can show the market impact of each uncertainty by using the VIX index as the supervised signal for the sLDA model; the market impact is not automatically estimated in other models. As a result of the model parameter inference, the estimated parameter η is computed as the average magnitude of the VIX for each uncertainty index. The relation between the estimated parameter η and market impact is discussed in Section 6.3. We further analyze the macroeconomic uncertainty indices against historical macroeconomic events, and we conduct qualitative as well as quantitative analyses of selected uncertainty indices.

#### **2. Related Works**

#### *2.1. Research Related to Volatility*

In financial markets, the correlation between historical market volatility and uncertainty is very high: when uncertainty is high, historical market volatility is also high. (Chuliá et al. 2017) used stock prices to construct an uncertainty index for the market, separating variation (market volatility) into expected variation and uncertainty (unexpected variation). Many previous studies have predicted market volatility using numeric data: (Castelnuovo and Tran 2017) used Google Trends data to forecast weekly stock market volatility, and (Manela and Moreira 2017) used news text to predict market volatility.

#### *2.2. Research Using Text Data*

(Baker et al. 2016) introduced the Economic Policy Uncertainty Index, which is constructed from news text data. They used pre-defined words for three categories ("policy", "economic", and "uncertainty") to count the number of news articles containing these words, and they conducted monthly aggregate seasonal adjustments. This Economic Policy Uncertainty Index has been explored further by many studies. (Jin et al. 2019) analyzed the relation between stock price crash risk and the uncertainty index; (Pástor and Veronesi 2013) explored how stock prices respond to political uncertainty; and (Brogaard and Detzel 2015) used the uncertainty index to forecast market returns. The uncertainty index has been studied not only from the financial market side but also from the economic side. For example, (Gulen and Ion 2016) analyzed the relationship between the uncertainty index and corporate investment, and (Bachmann et al. 2013; Fernández-Villaverde et al. 2015) analyzed economic activity using the uncertainty index. Furthermore, (Bloom 2014) discussed stylized facts about the uncertainty index.

Many studies have extended Baker's methodology to other countries' news text and built Economic Policy Uncertainty Indices for specific countries. (Arbatli 2017) created the Japan Policy Uncertainty Index, (Manela and Moreira 2017) developed the Belgium Policy Uncertainty Index, (Azqueta-Gavaldón 2017) created an Economic Policy Uncertainty index for the UK, and (Jin et al. 2019) constructed a Chinese economic policy uncertainty index with a method similar to that of (Baker et al. 2016).

Extensions of the uncertainty index using different data have also been discussed. (Husted et al. 2019; Saltzman and Yung 2018) extracted texts from Federal Reserve Beige Books, (Bloom 2014) constructed the World Uncertainty Index based on news for 143 individual countries, (Baker et al. 2019) constructed the Equity Market Volatility tracker, and (Hamid 2015) developed an uncertainty index using Google Trends data.

#### *2.3. Topic Model*

LDA, proposed by (Blei et al. 2003), is one of the most popular topic models; it assumes that every document is a mixture of latent topics and that each topic is a probability distribution over words. Text analysis based on the bag-of-words representation has a high dimension (the number of dimensions is the number of distinct words in the corpus); topic models reduce this dimensionality to the number of topics. Many researchers have applied the LDA model to financial news. (Hisano et al. 2013) used LDA to classify business news. (Shirota et al. 2014) used LDA to extract financial policy topics from the proceedings of the Policy Board of the Bank of Japan's monetary policy meetings. (Mueller and Rauh 2018) used an extended LDA model to classify Federal Open Market Committee text and discussed the effect of FOMC communications on US Treasury rates. (Kanungsukkasem and Leelanupab 2019) introduced an extended LDA model called FinLDA. (Thorsrud 2018) used LDA to classify newspaper topics and construct a daily business cycle index, and (Larsen and Thorsrud 2019) used LDA to classify newspaper topics and predict economic variables.

#### *2.4. Topic Model Applying to Uncertainty Index*

(Azqueta-Gavaldón 2017) used topic models to build an uncertainty index. He used topic models to separate news text into 30 topics and selected the topics equivalent to the categories identified by (Baker et al. 2016), successfully replicating Baker's Economic Policy Uncertainty Index in a less costly and more flexible way.

(Rauh 2019) used regional newspapers to build uncertainty indices at the regional level, using a topic model to separate news text into 30 topics and extracting five topics (Independence, Energy, Investment, Federal, Government, Topic index) as an uncertainty index.

The difference between this study and previous studies is that we constructed multiple uncertainty indices using a supervised LDA model (described in Section 5). By using the supervised LDA model, uncertainty indices that have a strong relationship with market volatility can be separated from the others.

#### **3. Datasets**

In this study, we extract uncertainty indices from news text by applying a topic model to classify the topic of uncertainty. The news text forms the main dataset for the model; in addition, numeric data serve as the supervised signal for each article.

#### *3.1. Text Data*

The text data were obtained from Japanese Reuters news articles; we extracted global economy news articles from the Reuters website. In total, we collected 33,000 articles from August 2009 to November 2019. More than ten global economy articles are published on the Reuters website per day, and each article contains an average of 1200 words. News articles in the global economy category focus on economic events and monetary policy in each major country; comments and columns by economists are also available on the website. We used the global economy category because the text corpus should contain articles related to global economic uncertainty rather than individual firm issues or market price movements. This condition is almost the same as Baker's condition of extracting articles that contain terms related to the "economic" and "policy" categories.

#### *3.2. Numeric Data*

We selected the volatility index (VIX) as the supervised signal for the sLDA model. The VIX measures the market's expectation of the volatility implied by the S&P 500 index and is calculated and disseminated on a real-time basis by the Chicago Board Options Exchange; it allows investors to gauge uncertainty about future market trends. We collected daily VIX data over the same period as the text data.

#### **4. Materials and Methods**

Figure 1 divides the entire process into three parts. The first part is the preprocessing of the input data, and the second part is the topic classification performed using sLDA. The final part is the measurement of the uncertainty indices: we take the topic distribution of each document and normalize it within each month.

**Figure 1.** Overall process.

#### *4.1. Text Data Preprocessing*

After obtaining Reuters news articles from the global economy category of the Reuters website, we extracted the articles that refer to uncertainty. First, we defined a set of uncertainty terms (Table 1) and kept only the articles containing at least one of them. Because the Reuters corpus includes articles that are not related to uncertainty but rather present views about the economic outlook, such articles should be eliminated from the corpus. This condition is the same as Baker's condition of extracting articles that contain terms related to the "uncertainty" category.

**Table 1.** Term sets for uncertainty.


Subsequently, we conducted Japanese morphological analysis and extracted nouns using MeCab (a Japanese part-of-speech and morphological analyzer). After preprocessing, the corpus contains 3115 documents and approximately 2.95 million words, with 1786 distinct terms.
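The term-based filtering step described above can be sketched as follows (a minimal pure-Python illustration; the English terms below are hypothetical stand-ins for the Japanese terms of Table 1):

```python
def filter_uncertainty_articles(articles, uncertainty_terms):
    """Keep only the articles containing at least one uncertainty term,
    mirroring the corpus-construction step described above."""
    return [a for a in articles if any(term in a for term in uncertainty_terms)]
```

The retained articles then proceed to morphological analysis and noun extraction.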

#### *4.2. Numeric Data Preprocessing*

For the numeric data, we use the VIX index as the supervised signal for sLDA. We normalized the VIX index to zero mean and unit standard deviation and used the converted values as the supervised signal for each article.
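This normalization is a standard z-score; a minimal sketch (hypothetical function name, not the authors' code):

```python
from statistics import mean, pstdev

def normalize_vix(vix_series):
    """Standardize the VIX series to zero mean and unit standard deviation."""
    mu = mean(vix_series)
    sigma = pstdev(vix_series)
    return [(v - mu) / sigma for v in vix_series]
```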

#### *4.3. Topic Classification*

The topic classification is conducted using sLDA. The supervised signal for each article is the normalized VIX value on the date the article was published.

#### *4.4. Uncertainty Measurement*

After the topic classification is completed using sLDA, the probability $\theta_{d,k}$ of topic $k$ occurring in document $d$ ($d = 1 \ldots M$, $k = 1 \ldots K$) is inferred, where $M$ is the total number of documents and $K$ is the total number of topics. Next, we calculate the average probability of topic $k$ occurring in the documents of month $t$ by the following equation.

$$S_{t,k} = \frac{1}{n_t} \sum_{d_t \in D_t} \theta_{d_t,k}, \quad t = 1 \ldots T,\; k = 1 \ldots K. \tag{1}$$

Here, $n_t$ denotes the number of documents in month $t$ and $D_t$ denotes the collection of documents in month $t$. Finally, the score of the uncertainty index of topic $k$ in month $t$, $UI_{t,k}$, is the normalized value of $S_{t,k}$: we normalize $S_{t,k}$ so that the average score of each topic over the whole period equals 100, by the following equation.

$$
UI_{t,k} = \frac{S_{t,k}}{\frac{1}{T}\sum_{t'=1}^{T} S_{t',k}} \times 100, \quad t = 1 \ldots T,\; k = 1 \ldots K. \tag{2}
$$
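Equations (1) and (2) can be sketched in a few lines of Python (a hypothetical data layout; not the authors' code): per-document topic distributions are averaged within each month and then rescaled so the per-topic mean over all months is 100.

```python
def uncertainty_indices(theta_by_month):
    """Compute UI_{t,k} from per-document topic distributions.

    `theta_by_month` maps each month t to a list of per-document topic
    distributions theta_d (each a list of K probabilities).
    Step 1 (Eq. 1): S_{t,k} = average of theta_{d,k} over the month's docs.
    Step 2 (Eq. 2): UI_{t,k} = S_{t,k} / (mean over t of S_{t,k}) * 100.
    """
    months = sorted(theta_by_month)
    K = len(theta_by_month[months[0]][0])
    S = {t: [sum(doc[k] for doc in theta_by_month[t]) / len(theta_by_month[t])
             for k in range(K)]
         for t in months}
    mean_S = [sum(S[t][k] for t in months) / len(months) for k in range(K)]
    return {t: [S[t][k] / mean_S[k] * 100 for k in range(K)] for t in months}
```

By construction, averaging a topic's index over all months returns 100, matching the normalization in Equation (2).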

#### **5. Topic Model**

#### *Supervised Latent Dirichlet Allocation*

Since LDA was introduced by (Blei et al. 2003), many extensions of it have been proposed. sLDA, proposed by (Mcauliffe and Blei 2008), extends LDA by adding a response variable associated with each document; the documents and responses are modeled jointly in order to find latent topics that best predict the response variables of future unlabeled documents.

Figure 2 presents the graphical model representation of sLDA and Table 2 presents the sLDA notation. In our research, the converted VIX index is used as the supervised signal for each article.


**Table 2.** Notations in sLDA.

**Figure 2.** The graphical model representation of sLDA.

The generative process of sLDA consists of the following steps:

• For each document *d*, the topic distribution *θ<sup>d</sup>* is drawn as follows, where *α* is the hyperparameter of the Dirichlet distribution.

$$
\theta_d \sim Dirichlet(\alpha).\tag{3}
$$

• For each topic *k*, the word distribution *ϕ<sup>k</sup>* is drawn as follows, where *β* is the hyperparameter of the Dirichlet distribution.

$$
\varphi_k \sim Dirichlet(\beta). \tag{4}
$$

• For each word *i* in document *d*:

**–** the topic $z_{d,i}$ is sampled from the following distribution.

$$z_{d,i} \sim Multinomial(\theta_d). \tag{5}$$

**–** the word $w_{d,i}$ is sampled from the following distribution.

$$w_{d,i} \sim Multinomial(\varphi_{z_{d,i}}).\tag{6}$$

• For each document *d*, the response variable $Y_d$ is sampled from the following normal distribution, where $\bar{z}_d := (1/N_d) \sum_{n=1}^{N_d} z_{d,n}$.

$$Y_d \sim N(\eta^{T} \bar{z}_d, \sigma^2). \tag{7}$$
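The generative process above can be simulated in a few lines (a pure-Python sketch with hypothetical names, not the authors' inference code; the Dirichlet draw uses the standard normalized-Gamma construction):

```python
import random

def sample_dirichlet(alpha, K):
    """Draw from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(alpha, 1.0) for _ in range(K)]
    total = sum(g)
    return [x / total for x in g]

def generate_document(phi, eta, alpha, n_words, sigma=1.0):
    """Generate one sLDA document: a word list plus a response variable.

    `phi` is a K x V list of topic-word distributions, `eta` the K
    response coefficients; follows steps (3)-(7) of the generative process.
    """
    K, V = len(phi), len(phi[0])
    theta = sample_dirichlet(alpha, K)                               # Eq. (3)
    words, z_counts = [], [0] * K
    for _ in range(n_words):
        z = random.choices(range(K), weights=theta)[0]               # Eq. (5)
        words.append(random.choices(range(V), weights=phi[z])[0])    # Eq. (6)
        z_counts[z] += 1
    z_bar = [c / n_words for c in z_counts]
    y = random.gauss(sum(e * zb for e, zb in zip(eta, z_bar)), sigma)  # Eq. (7)
    return words, y
```

Inference runs in the opposite direction: given the words and responses, the topic assignments and η are estimated.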

#### **6. Results**

#### *6.1. Topic Classification*

We performed topic classification by sLDA with the following parameters: *α* = 0.35, *β* = 0.10, *σ* = 1.0, *K* = 10. We set the variance of the supervised signal to *σ* = 1.0 because we use the normalized VIX index as the supervised signal. For *α* and *β*, we performed a parameter search over *α* ∈ [0.2, 0.25, 0.3, 0.35, 1.0, 3.0, 6.25] and *β* ∈ [0.1, 0.05, 0.01, 0.005], selecting *α* = 0.35 and *β* = 0.10 because the resulting topics were most clearly separated.

The perplexity of sLDA was calculated for different numbers of topics (*K*). Figure 3 shows the relationship between the perplexity of the model and the number of topics.

We set *K* = 10 because the perplexity does not improve much for *K* > 10, while too many topics are hard to interpret.
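The perplexity used here is the standard held-out measure for topic models, exp(−log-likelihood / word count); a minimal sketch (the log-likelihood itself must come from the fitted model):

```python
import math

def perplexity(total_log_likelihood, total_words):
    """Corpus perplexity from the held-out log-likelihood:
    exp(-loglik / N_words); lower values indicate a better fit."""
    return math.exp(-total_log_likelihood / total_words)
```

For example, if the model assigns every word probability 0.5, the perplexity is exactly 2.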

The topic classification results obtained using the sLDA model are presented in Table 3.


**Table 3.** Top 10 words for each topic word distribution *ϕk*.

From the top words of each topic word distribution *ϕk*, we interpret the topics as follows. Topic 0 relates to uncertainty about monetary policy in the EU; topic 1 to uncertainty about international economic events affecting the world economy; and topic 2 to the Great East Japan Earthquake. Although some of the most frequent words in topic 2 come from news disclaimers, others are related to the Great East Japan Earthquake, and the time series of topic 2 (Figure 4) shows a peak after the earthquake in 2011.

Topic 3 relates to uncertainty about fiscal policy in the US; topic 4 to uncertainty about economic growth in China and emerging countries; topic 5 to uncertainty about financial system risk in the EU; topic 6 to uncertainty about monetary policy in Japan; topic 7 to uncertainty in financial markets; topic 8 to uncertainty about monetary policy and economic growth in Japan; and topic 9 to uncertainty about monetary policy in the US.

The time series of the monthly average percentage of document topic distribution of each topic is shown in Figure 4.

**Figure 4.** Time series of percentage of document topic distribution.

The weight of topic 1, which relates to uncertainty about international economic events affecting the world economy, increases over time, while the weight of topic 5, which relates to uncertainty about financial system risk in the EU, decreases over time. Except for these topics, almost all topics remain stable. The lowest weight over the entire period belongs to topic 2, which relates to the Great East Japan Earthquake.

#### *6.2. Uncertainty Indices with Macroeconomic Event*

In this section, we consider time-series generated uncertainty indices *U It*,*<sup>k</sup>* with the related macroeconomic event.

Figure 5 shows the time series of topic 1, which relates to uncertainty about international economic events affecting the world economy. A: September 2014, the Scottish independence referendum in the UK. B: June 2016, the United Kingdom European Union membership referendum. C: March 2018, US–China trade friction.

**Figure 5.** Topic 1: uncertainty on international economic events affecting the world economy.

Figure 6 shows the time series of topic 3, which relates to uncertainty about fiscal policy in the US. D: November 2012, Obama re-elected as president. E: November 2016, Trump elected as president.

**Figure 6.** Topic 3: uncertainty on fiscal policy in the US.

Figure 7 shows the time series of topic 4, which relates to uncertainty about economic growth in China and emerging countries. F: January 2011, economic overheating and concerns about inflation in emerging countries and China. G: August 2012, the slowdown in China's industrial production index. H: March 2014, the Ukrainian crisis. I: August 2015, the China shock. J: March 2018, US–China trade friction.

**Figure 7.** Topic 4: uncertainty on economic growth in China and emerging countries.

Figure 8 shows the time series of topic 5, which is related to uncertainty on financial system risk in the EU.

**Figure 8.** Topic 5: uncertainty on financial system risk in the EU.

K: April 2010. The Greek debt crisis. L: November 2010. Rating firms downgraded Greek government bonds. M: November 2011. The referendum on accepting support measures from the European Union. N: March 2013. The Cyprus shock. O: January 2015. The Greek general election increased uncertainty about the future of negotiations with the EU. P: November 2015. Portuguese government bonds were excluded from the ECB bond purchase program as uncertainty over Portugal's political situation increased.

Figure 9 shows the time series of topic 6, which is related to uncertainty on monetary policy in Japan. Q: April 2012. The Bank of Japan increased funds for asset purchases by about 10 trillion yen. R: October 2012. The Bank of Japan again increased funds for asset purchases by about 10 trillion yen. S: April 2013. The Bank of Japan decided to introduce its quantitative and qualitative monetary easing policy. T: January 2016. The Bank of Japan decided to introduce quantitative and qualitative monetary easing with negative interest rates. U: July 2016. The Bank of Japan decided on additional monetary easing by increasing the ETF purchase amount, among other measures.

**Figure 9.** Topic 6: uncertainty on monetary policy in Japan.

The country-specific uncertainty index built with Baker's model contains macroeconomic uncertainties that occurred in other countries. For example, the economic policy uncertainty (EPU) index for Japan contains the macroeconomic uncertainty of the European debt crisis in 2011 (Arbatli 2017). Such uncertainties should not be contained in the EPU index for Japan but only in the EPU index for the EU. Our model instead uses sLDA to separate the uncertainties by topic: the macroeconomic uncertainty caused by the European debt crisis contributes only to topic 5 (uncertainty on financial system risk in the EU), not to the other country-specific uncertainty indices.

#### *6.3. Comparison with Baker's Model*

In this section, we compare the uncertainty indices constructed by the proposed model with the uncertainty index constructed by Baker's model. Due to limited access to news text corpora, we used the same news text corpus as for our model (Section 3.1) to build Baker's model. Whereas (Arbatli 2017) used several Japanese newspapers, we used Japanese Reuters news from the Reuters website. We constructed the Japan Economic Policy Uncertainty Index following the same model as (Baker et al. 2016): it counts the number of articles that contain Japanese words from all three categories (economy, policy, uncertainty). The resulting Japan Economic Policy Uncertainty Index is shown in Figure 10. We compare this index constructed by Baker's model with the uncertainty indices constructed by our proposed model.
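The Baker-style counting rule can be sketched as below. The term sets are illustrative English stand-ins for the Japanese category words, and the function name is ours:

```python
# An article is flagged as an EPU article if it contains at least one term
# from each of the three categories (economy, policy, uncertainty), following
# the counting rule of Baker et al. (2016). Term sets here are illustrative.
ECONOMY = {"economy", "economic"}
POLICY = {"policy", "government", "tax"}
UNCERTAINTY = {"uncertain", "uncertainty"}

def is_epu_article(tokens):
    toks = set(tokens)
    return bool(toks & ECONOMY) and bool(toks & POLICY) and bool(toks & UNCERTAINTY)

articles = [
    ["economic", "policy", "uncertainty", "rises"],   # matches all categories
    ["economic", "growth", "steady"],                 # no policy/uncertainty term
]
epu_count = sum(is_epu_article(a) for a in articles)
print(epu_count)  # 1
```

The monthly index is then the count of flagged articles, typically normalized by the total number of articles in that month.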

First, we examined the term frequencies in the sentences at each peak of the Japan Economic Policy Uncertainty Index by Baker's model (peaks A to E in Figure 10). The top 10 words by frequency of occurrence are shown in Table 4.

**Figure 10.** Japan Economic Policy Uncertainty Index constructed by Baker's model.

**Table 4.** Top 10 words for each peak of the Japan Economic Policy Uncertainty Index by Baker's model.


As Table 4 shows, the table contains keywords related to countries other than Japan ("USA", "Europe", "FRB", "emerging countries", "China", etc.) as well as keywords related to the market ("financial market", "yen appreciation", "dollar", etc.).

The major events at each peak relate not only to economic events in Japan but also to events in other countries. A: Fiscal and financial instability in Greece and other EU countries. B: The Great East Japan Earthquake. C: The U.S. debt ceiling issue. D: The contraction of QE3 in the U.S. and concerns about emerging economies. E: The slowdown in China and other emerging economies and falling resource prices.

We compared the above results with Table 3, the keywords of the uncertainty indices constructed by our proposed model. For topic 6, the uncertainty index related to monetary policy in Japan, the top 10 keywords relate only to Japan and contain no keywords related to other countries. Similarly, for topic 9, the uncertainty index related to monetary policy in the US, the top 10 keywords relate only to the US and contain no keywords related to other countries.

As a quantitative comparison between the Japan Economic Policy Uncertainty Index by Baker's model and that by our proposed model, we calculate the ratios of keywords related to the local country, to other countries, and to the market.

As Table 5 shows, our model has a higher percentage of local-country keywords than Baker's model. In this section, we compared the Japanese uncertainty indices constructed by the proposed model with the Japanese uncertainty index constructed by Baker's model. The index by Baker's model is more likely to include keywords related to other countries than keywords related to the local country; it therefore captures more uncertainty driven by global economic and market factors than by country factors. Conversely, the index by our model is divided by topic, and the uncertainty index for a specific country does not contain elements of uncertainty caused by foreign countries or markets.


**Table 5.** The ratio of keywords related to the country, other countries, and financial market.

The effect of the difference between the index by Baker's model and the index by our proposed model on the impulse response functions is discussed further in Section 6.5.

#### *6.4. Correlation with Other Indices*

In this subsection, we discuss the relationship of our indices with financial market indices and with the uncertainty index created by (Baker et al. 2016).

In Table 6, we present *η*¯ of each index in the first two columns and, in the last four columns, the Pearson correlation coefficients between the macroeconomic uncertainty indices and the volatilities of financial market indices (US 10-year bond, S&P 500, USD/JPY) and the VIX index. The volatility of each financial market index is computed as the standard deviation of daily returns within the target month. Note that *η*¯ is the estimated *η* after inference and indicates the expected value of the supervised signal, the normalized VIX index, for each uncertainty index.
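The monthly volatility computation can be sketched as follows; whether the authors use the population or sample standard deviation is not stated, so the population version below is an assumption:

```python
import math

def monthly_volatility(prices):
    """Std. dev. of daily simple returns within one month (population version)."""
    returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
    mean = sum(returns) / len(returns)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))

# illustrative daily closes for one month
vol = monthly_volatility([100.0, 110.0, 99.0])
print(round(vol, 6))  # returns are +10% and -10%, so the std. dev. is 0.1
```

The Pearson correlations in Table 6 are then computed between these monthly volatility series and the monthly uncertainty index series.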

**Table 6.** The *η*¯ and Pearson correlation coefficient between uncertainty indices and the volatilities of financial market indices.


The results show that uncertainty indices with a larger parameter *η* have a higher positive correlation with market volatility and the VIX index, while uncertainty indices with a smaller parameter *η* have a higher negative correlation with them.

For example, topic 5, which is related to the uncertainty on financial system risk in the EU (Table 3, Figure 8), and topic 7, which is related to the uncertainty on financial markets (Table 3), have larger *η* values than the other topics and show stronger positive correlations with the volatilities of the financial market indices.

In contrast, topic 1, which is related to the uncertainty on international economic events affecting the world economy (Table 3, Figure 5), and topic 3, which is related to the uncertainty on fiscal policy in the US (Table 3, Figure 6), have smaller *η* values than the other topics and show stronger negative correlations with the volatilities of the financial market indices.

In addition, topic 0, which is related to the uncertainty on monetary policy in the EU (Table 3), and topic 6, which is related to the uncertainty on monetary policy in Japan (Table 3, Figure 9), have *η* values close to 0 and show no correlation with the volatilities of the financial market indices.

Note that topic 2, the topic related to the Great East Japan Earthquake, has a higher positive correlation with USG10 volatility than the other topics. This is because, according to its topic word distribution, topic 2 captures both the disaster and the uncertainty related to the Great East Japan Earthquake, and the heightened uncertainty in the Japanese economy caused by the 2011 earthquake coincided with the fall in U.S. interest rates triggered by the European debt crisis. If we exclude the peak in 2011, the correlation between topic 2 and USG10 volatility decreases from 0.37 to 0.14, which is lower than that of the other topics.

The relationship between *η* and the correlation coefficients with the financial market indices is shown more clearly in the scatter plots (Figure 11).

**Figure 11.** The *η*¯ and Pearson correlation coefficient between uncertainty indices and market indices.

The results in Table 6 also show that topic 9, which is related to US monetary policy, has a negative correlation with the volatility of the S&P 500 index. Although the S&P 500 is a US stock index, it is influenced not only by US monetary policy but also by other global economic events, such as the China shock (topic 4) and the EU debt crisis, which explains the negative correlation.

In Table 7, we present *η*¯ of each index in the first two columns and, in the last four columns, the Pearson correlation coefficients between our macroeconomic uncertainty indices and four Economic Policy Uncertainty Indices (EPUI) by (Baker et al. 2016).

In contrast to the relationship between *η* and the correlation coefficients with market indices (Table 6), the results show that uncertainty indices with a smaller parameter *η* have a higher positive correlation with the EPUI, and uncertainty indices with a larger parameter *η* have a higher negative correlation with the EPUI.


**Table 7.** The *η*¯ and the Pearson correlation coefficient between uncertainty indices and Economic Policy Uncertainty Index (EPUI) by (Baker et al. 2016).

Figure 12 shows the scatter plots between *η* and the correlation coefficients with the EPUI by (Baker et al. 2016).

**Figure 12.** The *η*¯ and correlation coefficient with EPUI by (Baker et al. 2016).

The results in Table 7 also show that topic 9, which is related to US monetary policy, has a very weak correlation with Baker's indices. Within Baker's economic policy uncertainty indices, monetary policy accounts for less than 30% of all policy categories. This is why the correlation between Baker's economic policy uncertainty index and our topic 9, which is related to US monetary policy, is very weak.

The above analysis shows:


#### *6.5. Impulse Response Analysis*

We conducted a VAR analysis between the uncertainty indices and the macroeconomy, using the Japan industrial production index (Figure 13).

To analyze the impact of the uncertainty indices by our model on the industrial production index, we constructed three VAR models as follows:


Due to data limitations, the sample period runs from January 2013 to November 2019. The estimation method is OLS; the impulse response functions are identified by Cholesky decomposition, with the variables ordered as the uncertainty index first, followed by the related market variable.
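The orthogonalized impulse responses under this identification can be sketched as below for a bivariate VAR(1). The coefficient matrix and residual covariance are illustrative numbers, not estimates from the paper:

```python
import numpy as np

# Bivariate VAR(1): y_t = A y_{t-1} + u_t, Cov(u) = Sigma.
# Ordering (uncertainty index, production index) follows the paper's
# Cholesky identification; A and Sigma below are illustrative only.
A = np.array([[0.5, 0.0],
              [-0.2, 0.7]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
P = np.linalg.cholesky(Sigma)  # lower triangular, so the ordering matters

def irf(A, P, horizon):
    """Responses to one-standard-deviation structural shocks up to `horizon`."""
    out, M = [], np.eye(A.shape[0])
    for _ in range(horizon + 1):
        out.append(M @ P)  # Psi_h = A^h P
        M = A @ M
    return out

responses = irf(A, P, horizon=4)
# responses[h][i, j]: response of variable i, h periods after a shock to j
```

With the uncertainty index ordered first, a shock to it is allowed to affect the production index contemporaneously, but not vice versa.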

The VIX index has an impact of about 0.4 standard deviations on the industrial production index after four months.

The uncertainty index created based on Baker's model does not show a significant impact. Among our topic-specific indices, topic 4 (global economy uncertainty index) and topic 6 (Japan uncertainty index) show a significant impact on the industrial production index, with a negative impact comparable to that of the VIX index.

The Japanese uncertainty index constructed by Baker's model pushed down the Japanese industrial production index, but the effect was not significant. This is because the uncertainty index by Baker's model fails to break down into country-specific uncertainties, as described in Section 6.3.

Although (Arbatli 2017) showed that the Japanese uncertainty index constructed with Baker's model significantly affects the industrial production index, we could not reproduce that significance in our analysis, which uses a different text corpus and a different data period for the VAR model.

Although with our proposed model each country- or topic-specific uncertainty index is constructed from less news text data than in (Arbatli 2017), some uncertainty indices show a significant impact on the industrial production index.

**Figure 13.** Impulse Responses to Unit Standard Deviation Uncertainty Index Innovation.

#### **7. Discussion and Conclusions**

In this study, we applied the sLDA model, with the VIX index as a supervised signal, to extract uncertainty indices from news text. We constructed uncertainty indices based on the topics generated by sLDA. Further, we conducted a correlation analysis based on the volatility of the market indices and an impulse response analysis based on the related market indices. The results show that the macroeconomic uncertainty indices with a larger parameter *η* have a higher positive correlation with financial market volatility and the VIX index, which indicates that the sLDA model can extract topics highly linked to market fluctuations.

Currently, our research uses Japanese news articles and is limited to Reuters news. In future work, we will expand our news corpus to several sources and conduct the same analysis on English-language news articles.

**Author Contributions:** Conceptualization, K.Y., H.S., T.S., H.M. and K.I.; methodology, K.Y.; software, K.Y.; validation, K.Y., H.S., T.S., H.M. and K.I.; formal analysis, K.Y.; investigation, K.Y.; resources, K.Y.; data curation, K.Y.; writing—original draft preparation, K.Y.; writing—review and editing, K.Y.; visualization, K.Y.; supervision, K.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Contracts for Difference: A Reinforcement Learning Approach**

#### **Nico Zengeler \* and Uwe Handmann**

Hochschule Ruhr West, University of Applied Sciences, 46236 Bottrop, Germany; uwe.handmann@hs-ruhrwest.de

**\*** Correspondence: nico.zengeler@hs-ruhrwest.de

Received: 21 March 2020; Accepted: 16 April 2020; Published: 17 April 2020

**Abstract:** We present a deep reinforcement learning framework for the automatic trading of contracts for difference (CfD) on indices at high frequency. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, such approaches depend on low latency. In a real-world example, we show that an increased model size may compensate for higher latency. As the noisy nature of economic trends complicates predictions, especially for speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.

**Keywords:** contract for difference; CfD; reinforcement learning; RL; neural networks; long short-term memory; LSTM; Q-learning; deep learning

#### **1. Introduction**

High-frequency trading (HFT) accounts for a high proportion of market activity but receives little academic attention Aldridge (2013); Brogaard (2010). Artificial intelligence applications, especially machine learning algorithms, offer new perspectives, possibilities and tools for economic modelling and reasoning Aghion et al. (2017). In particular, high-frequency trading of speculative assets such as contracts for difference (CfD) relies on a statistically appropriate, automated handling of risk. We therefore simulate a derivative market as a partially observable Markov decision process (POMDP) for reinforcement learning of CfD trading policies at high frequency.

To determine a reward for the agent's action, the environment evaluates the trade action on historical market data and returns the financial profit or loss. The agent then tries to find an optimal policy that maximizes the expected reward. To approximate an optimal policy, we use deep neural networks and evaluate both a feedforward neural network and a recurrent long short-term memory (LSTM) network. As a reinforcement learning method, we propose a Q-learning approach with prioritized experience replay. To evaluate the real-world applicability of our approach, we also perform a test under real market conditions.

We begin this paper by introducing contracts for difference and presenting relevant state-of-the-art research in Section 2. In Section 3, we explain our implementation in detail, and in Section 4 we describe our evaluation process and the results we obtained. Section 5 presents a real-world application, and Section 6 closes the paper with a summary, a short discussion and possible future work.

#### *Contracts for Difference*

A Contract for Difference (CfD), a form of total return swap contract, allows two parties to exchange the performance and income of an underlying asset for interest payments. In other words, economic players may bet on rising or falling prices and profit if the real price development matches their bet. Due to the possibility of highly leveraged bets, high wins may occur as well as high losses.

In contrast to other derivatives, such as knockout certificates, warrants or forward transactions, a CfD allows the independent setting of stop-loss and take-profit values. Setting a take-profit and stop-loss value automatically closes the deal if the underlying price strikes the corresponding threshold. If the asset does not develop in line with the bet but moves in the opposite direction, a debt arises which can lead to additional funding obligations. A security deposit, also referred to as a margin, serves to hedge the transaction. Since additional funding obligations in the event of default can easily exceed the margin, individual traders can suffer very high losses in a very short time if they have not set a stop-loss value.

Concerning legal aspects, CfD trading currently faces a ban in the United States of America. According to a general ruling of the German Federal Financial Supervisory Authority (Bundesanstalt für Finanzdienstleistungsaufsicht), a broker in Germany may only offer such speculative options to its customers if they have no additional liability in case of default, but instead only lose their security deposit.

#### **2. State of the Art**

State-of-the-art stock market prediction on longer time scales usually incorporates external textual information from news feeds or social media Bollen et al. (2011); Ding et al. (2015); Vargas et al. (2017). Using only historical trade data, Chen et al. (2015) investigate an LSTM-based price prediction model for the Chinese stock market that predicts rising or falling prices on daily time scales and achieves accuracies between 64.3% and 77.2%. A deep learning LSTM implementation by Akita et al. (2016) learns to predict stock prices from news text combined with raw pricing information, allowing for profitable trading applications.

Considering high-frequency trading algorithms and strategies, a vast variety of applications exists, including many classic machine learning approaches Aldridge (2013). Concerning reinforcement learning in high-frequency trading, Moody and Saffell (1999) proposed a reinforcement learning system to optimize a strategy that incorporates long, short and neutral positions, based on financial and macroeconomic data. There also exist deep reinforcement learning approaches for high-frequency trading on foreign exchange markets Dempster and Romahi (2002); Gold (2003); Lim and Gorse (2018).

Yet, to the best of our knowledge, no deep reinforcement learning research for high-frequency trading on contracts for difference exists.

#### **3. Method**

We aim to find optimal trading policies in adjustable environment setups, using Q-learning as proposed by Watkins and Dayan (1992). To boost training efficiency, we employ a prioritized experience replay memory for all our models Schaul et al. (2015). We use AdaGrad updates as the weight update rule Duchi et al. (2011). As an observed state *s*, the underlying POMDP presents a tick chart of length *l*. The tick chart consists of a sequence of ask and bid prices with the corresponding ask and bid trade volumes. We denote the price values per tick as *p<sub>ask</sub>*, *p<sub>bid</sub>* and the trade volumes as *v<sub>ask</sub>*, *v<sub>bid</sub>*.

#### *3.1. Models*

We investigate both a feedforward and an LSTM architecture. Both architectures feature the same input and output layer setup but have different hidden layers. The input layer contains the state *s* in the form of a tick data sequence of length *l*. At the output layer, the neural network approximates the Q-values *Q*(*s*, *a*) for each action in the action space *a* ∈ *A*. To approximate these values, we use an output layer of |*A*| neurons, each with linear activation. Each action *a* ∈ *A* may invoke a different trade order.

#### 3.1.1. Feedforward

Our feedforward neural network features a hidden part of three dense layers, as sketched in Figure 1. The first two dense layers consist of 500 rectifying linear units with a small bias of 0.1. We use He weight initialization with a uniform distribution to initialize the weights. To obtain roughly the same number of weights as in our LSTM architecture, we append a third layer with 180 rectifying linear units, also with a bias of 0.1. For an input length of *l* = 500 and an action space size of |*A*| = 3, the feedforward network has a total of 840,540 parameters.
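A NumPy sketch of this forward pass follows. The text does not state the per-tick input layout; one reading that reproduces the stated 840,540 parameters is two price values per tick (500 ticks → 1000 inputs) with biases fixed at 0.1 and excluded from the count, which is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_uniform(fan_in, fan_out):
    """He initialization with a uniform distribution."""
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Layer sizes from the text: two dense layers of 500 ReLU units, a third of
# 180, and |A| = 3 linear outputs. The 1000-unit input is our assumption.
sizes = [1000, 500, 500, 180, 3]
Ws = [he_uniform(m, n) for m, n in zip(sizes, sizes[1:])]
n_weights = sum(W.size for W in Ws)
print(n_weights)  # 840540, matching the stated parameter count

def q_values(state):
    h = state
    for W in Ws[:-1]:
        h = np.maximum(h @ W + 0.1, 0.0)  # ReLU hidden layers with bias 0.1
    return h @ Ws[-1]                     # linear Q-values, one per action

q = q_values(rng.standard_normal(1000))
print(q.shape)  # (3,)
```

This is an illustrative reconstruction, not the authors' Theano/Lasagne code.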

**Figure 1.** Our feedforward architecture.

#### 3.1.2. LSTM

We use an LSTM network with forget gates as proposed by Gers et al. (1999). As for the feedforward network, the input layer consists of a sequence of trade data. During training and testing, we keep the sequence length constant. The output layer approximates the linear *Q*-values using the hidden state of the LSTM layer, inspired by an architecture found in Mirowski et al. (2016). The hidden LSTM layer consists of a single recurrent layer with 100 rectifying linear units, as shown in Figure 2. We initialize the gate weights using the normal distribution. For a fixed input length of *l* = 500 and an action space size of |*A*| = 3, the LSTM network has a total of 840,300 parameters.
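A single LSTM step with a forget gate can be sketched as below. We assume four input features per tick (ask/bid price and volume) and use the standard tanh cell; the stated parameter count implies a different input encoding, so this is an illustrative sketch, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 4, 100  # 4 tick features assumed; 100 hidden units as stated

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Concatenated input-to-hidden and hidden-to-hidden weights for the four
# gates (input, forget, cell, output), drawn from a normal distribution.
W = rng.normal(0.0, 0.1, size=(n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

def lstm_step(x, h, c):
    z = np.concatenate([x, h]) @ W + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)   # forget gate scales the old cell state
    h_new = o * np.tanh(c_new)
    return h_new, c_new

h = c = np.zeros(n_hidden)
for x in rng.standard_normal((500, n_in)):  # process l = 500 ticks
    h, c = lstm_step(x, h, c)
print(h.shape)  # (100,)
```

The final hidden state `h` would then feed the linear output layer that produces the |*A*| Q-values.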

**Figure 2.** Our LSTM architecture.

#### *3.2. Environment*

We implement a simple market logic that operates on historical trading data on a tick time scale as the basis for a POMDP. The environment processes the agent's trading actions without delay, which simplifies analytical investigations but discards the important factor of latency. To increase the information content presented to the agent in each observation, we remove equal successive ticks. This reduces the input sequence length but discards the information about how long a particular state lasts.

As a state *s*, the environment presents a sequence of *l* unique ticks, starting from a random point *t* in trade history *x*. We adjust each state for the mean value:

$$s = x[t:t+l] - \overline{x[t:t+l]},$$

For each state *s*, the agent chooses an action *a*. If the agent chooses the action *a* = 0, opening no trade, it receives a reward of 0 and observes the next state. An episode in this environment terminates when the agent chooses to open a deal with an action *a* ≠ 0. When the agent performs such an action, the simulation runs forward until the market price reaches either the take-profit or the stop-loss value. The environment then returns the achieved financial profit or loss as a reward, scaled by a constant factor. The pseudocode in Appendix A describes the whole training algorithm for a Q-learning agent on a simulated CfD market.
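The terminal-reward logic can be sketched as below, for a long position on a single price series. The function name and the scan-forward simplification are ours:

```python
# After an open order, scan forward in the price history until take-profit
# or stop-loss is struck, and return the scaled profit or loss as the reward.
def simulate_long(prices, t, take_profit, stop_loss, c=0.25):
    entry = prices[t]
    for p in prices[t + 1:]:
        if p >= entry + take_profit:
            return c * take_profit   # take-profit struck: terminal reward
        if p <= entry - stop_loss:
            return -c * stop_loss    # stop-loss struck: terminal loss
    return 0.0                       # history exhausted: episode ends flat

prices = [100.0, 100.5, 101.2, 99.0]
print(simulate_long(prices, t=0, take_profit=1.0, stop_loss=2.0))  # 0.25
```

A short position works symmetrically, with the thresholds mirrored around the entry price.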

#### **4. Evaluation**

To evaluate our approach, we use a DE30 CfD index with a nominal value of €25 per lot at a 5% leverage. We reflect the boundary conditions of the chosen asset in our simulation by setting an adequate reward scaling factor. A small trade volume of 0.01 lot leads to a reward scaling factor of *c* = 0.25 for the training and testing procedure.

As a data basis for our market simulation, we have recorded the corresponding market history from July 2019, using the interface provided by X Open Hub xAPI (n.d.). We recorded five data points per second and removed unchanged successor data points. After recording for a month, we have split the data into a set for the training simulation and a set for the test simulation. This led us to a data basis of about three million unique tick values for the training environment, and about half a million unique ticks for the market logic in our testing procedure.

In this evaluation, we use the models as described in Section 3 with an action space of size |*A*| = 3. The action *a* = 0 does not cause a trade order but makes the agent wait and observe the next tick. To open a long position, the agent would choose *a* = 1, while the action *a* = 2 would cause the opening of a short position.

To find good training parameters for our models, we conducted a grid search over batch size, learning rate and input sequence length. We evaluate batch sizes *b* ∈ {10, 50, 100}, learning rates *η* ∈ {10<sup>−4</sup>, 10<sup>−5</sup>, 10<sup>−6</sup>} and input sequence lengths *l* ∈ {50, 100, 250}. By comparing the final equities after 1000 test trades, we find an optimal parameter configuration. For the feedforward architecture, the optimal training parameter configuration is (*b* = 100, *l* = 50, *η* = 10<sup>−5</sup>). Concerning the single-layer LSTM network, we find the best test result for (*b* = 10, *l* = 50, *η* = 10<sup>−4</sup>).

#### *4.1. Training*

For each memory record, we have a starting state *s*<sub>1</sub>, the chosen action *a*, the follow-up state *s*<sub>2</sub>, the achieved reward *r* and a variable *e*. The variable *e* tells us whether the replayed experience features a closed trade, thereby ending in a terminal state. In a training run, the agent performs a total of 250,000 learning steps. For each learning step, we sample a batch of *b* independent experiences (*s*<sub>1</sub>, *a*, *s*<sub>2</sub>, *r*, *e*) from the prioritized replay memory. Then, we apply an AdaGrad weight update to the neural network, based on the difference between the predicted and the actual *Q*-values. On a standard current workstation, the training of the smallest feedforward model takes about 15 min, while the training of the large two-layer LSTM model took about two days, using an implementation in Theano and Lasagne Dieleman et al. (2015); Theano Development Team (2016).
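The Q-learning target for a sampled experience can be sketched as below; the discount factor and the numbers in the usage lines are illustrative, not values from the paper:

```python
# Standard Q-learning target for an experience (s1, a, s2, r, e): terminal
# transitions (e == True, i.e., a closed trade) use the raw reward, otherwise
# the discounted maximum next-state Q-value is added. GAMMA is illustrative.
GAMMA = 0.99

def q_target(r, q_next, terminal):
    """q_next: the network's Q-value estimates for the follow-up state s2."""
    return r if terminal else r + GAMMA * max(q_next)

print(q_target(0.25, [0.1, -0.2, 0.05], terminal=True))   # 0.25
print(q_target(0.0, [0.1, -0.2, 0.05], terminal=False))   # ≈ 0.099
```

The weight update then minimizes the difference between this target and the predicted *Q*(*s*<sub>1</sub>, *a*) for each sampled experience in the batch.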

#### *4.2. Test*

To evaluate our models, we perform tests on unseen market data. If, for an optimal action *a*, the expected reward *Q*(*s*, *a*) < 0, the agent does not execute the order, as we want our agent to achieve a profit and not a minimal loss. This increases the time between trades to the benefit of more likely success. We test each feedforward and LSTM network by performing a total of 1000 test trades on unseen data. Each test run starts with an equity of €1000.

From the action distribution in Figure 3, we can see that both the feedforward and the LSTM agent tend to open short positions more frequently. To perform a trade, the feedforward network observes for 2429 ticks on average, while the LSTM network waits for 4654 tick observations before committing to any trade action. While the LSTM network tends to wait and observe, by choosing the action *a* = 0 more frequently, the feedforward network makes decisions faster. We can see an increase in equity for both our models in Figure 3. Furthermore, the LSTM network seems to have a conceptual advantage due to its immanent handling of sequences. Looking at the differences in the profit distribution, as shown in Figure 3, we find that the LSTM network achieves lower profits.

**Figure 3.** Test results. (**Top**) action distribution. (**Mid**) equity development. (**Bottom**) profit distribution.

#### **5. Real-World Application**

For a real-world proof of concept with a demo account, we use an LSTM architecture, as it seems to outperform a feedforward network. At first, we tried to apply the best model we found without further boundary conditions. Latency issues caused the agent to decide upon a past observation, lowering the temporal precision and causing more negative profits. Also, the agent's orders arrive late, such that the state has already changed and the agent misses its intended price for setting the take-profit and stop-loss values.

To accommodate the latency problems, we designed an LSTM architecture with an additional layer of 250 LSTM units, as shown in Figure 4. Also, we increased the action space size to |*A*| = 11 (actions 0 to 10) and introduced a function *δ* = *d<sub>profit</sub>*(*a*) that maps each action *a* to a certain *δ* added to the stop-loss and take-profit values, thus allowing the anticipation of high spreads:

$$d\_{profit}(a) := \begin{cases} 0, & a = 0 \\ 2, & a \in \{1, 6\} \\ 5, & a \in \{2, 7\} \\ 10, & a \in \{3, 8\} \\ 25, & a \in \{4, 9\} \\ 50, & a \in \{5, 10\} \end{cases}$$

This workaround introduced some slack into the strategy to accommodate the various problems introduced by latency. Conceptually, these adjustable delta values allow the agent to anticipate different magnitudes of price changes. This decreases the risk of immediate default but potentially leads to a high loss.
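The piecewise mapping above can be written as a plain lookup table; the reading that each delta is shared by a long action (1 to 5) and a short action (6 to 10) is ours:

```python
# Action-to-delta mapping d_profit from the piecewise definition above.
D_PROFIT = {0: 0,
            1: 2, 6: 2,
            2: 5, 7: 5,
            3: 10, 8: 10,
            4: 25, 9: 25,
            5: 50, 10: 50}

def d_profit(a):
    """Delta added to the stop-loss and take-profit values for action a."""
    return D_PROFIT[a]

print(d_profit(3))  # 10
```

Action 0 still opens no trade; the remaining actions choose both a trade direction and a spread tolerance.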

Using these settings, we trained the LSTM network with a training parameter configuration of (*b* = 50, *l* = 250, *η* = 10<sup>−5</sup>). In the corresponding learning dynamics, as shown in Figure 5, we can see that the agent maintains potentially high wins while trying to reduce losses, which results in an overall surplus during training. To keep the agent up to date with real-world market data, we trained the network outside of trading hours. We let our real-world test run for ten trading days, in which the agent opened and closed 16 trades without manual interference. Also, we superimpose a rule-based hedging system, which allows the agent to open one long position, one short position and a third arbitrary position contemporaneously. Figure 6 shows the profits achieved by the agent and the corresponding increase in equity.

**Figure 4.** A more sophisticated LSTM architecture for a real-world application.

**Figure 5.** The learning dynamics of our real-world LSTM example.

**Figure 6.** (**Top**) The profits achieved in our real-world example. (**Bottom**) The equity development of our demo account.

#### **6. Discussion**

Comparing the results of the recurrent LSTM network to those of the simple feedforward network, we can state that assuming sequences in trading data may slightly improve the results of a machine learning system. Although our contribution proves that artificially intelligent trading automata may learn different strategies if given an appropriate training environment, we did not investigate the effects of changing the training environment. We did not investigate the effect of adjusting the chance-risk ratio at training time, nor did we study the effect of larger replay memory sizes or learning schemes other than Q-learning. Instead of using batch normalization with a division by the standard deviation, we only subtracted the mean value.

We did not compare our method to the related concepts presented in the state-of-the-art section, and leave the comparison against different baselines for future research. So far, we have only considered the price data of a single trading symbol.

As for the real-world example, we used a demo account with a limited validity of only 20 trading days, of which only ten were available for testing our approach in a real-world setting. Consequently, we did not observe enough trades to make a reliable statement about the long-term reliability of this concrete strategy in a real-world setting.

#### *6.1. Future Work*

For baseline comparison research, a market simulation needs an interface that can provide its state data in various representations and interpret different kinds of order types. Such an environment also needs to provide trading logic for various assets as well as their derivatives. A simulated market environment for baseline comparisons may also consider the market effects of trades.

Considering our simulated market, we may improve the environment in various ways. First, the introduction of an artificial latency would allow experiments that take high-frequency trading requirements into account, enabling the simulation of various agents that compete under different latency conditions. Secondly, we may gather input data from more than one asset to make use of correlations. Third, we currently neglect the influence of the agent's trading decisions on the price development.

In future work, we may study the benefit of aggregated input sequences of different assets, for example a composite input of gold, oil, index charts, and foreign exchange markets. We may also provide the input at different temporal resolutions, for example on a daily, weekly, or monthly chart. A convolutional neural network may correlate price data from different sources in order to improve the optimal policy. Given a broader observation space, a reinforcement learning agent may also learn more sophisticated policies, for example placing orders on more than one asset. More sophisticated methods of transfer learning may enable us to reuse already acquired knowledge to improve the performance on unknown assets. For instance, we may use progressive neural networks to transfer knowledge into multiple action spaces. This allows the learning of trading on multiple assets simultaneously, making use of correlations in a large input space.

Furthermore, future work may consider the integration of economic news text as input word vectors. The trade proposals made by our agents may then serve as input for more sophisticated trading algorithms that employ prior market knowledge. For example, a rule-based system that incorporates long-term market knowledge may use the agent's proposals to provide a fully automatic, reliable trading program. Such a system might prevent the agent from opening positions above or below a certain threshold that a human operator sets according to his or her prior knowledge of the market.

#### *6.2. Conclusions*

To conclude, our studies show that there is a high-frequency trading system which, in a simulation, significantly outperforms the market, given a near-zero latency. Our real-world application shows that additional model parameters may compensate for a higher latency. We have contributed a parametrizable training environment that allows the training of such reinforcement learning agents for CfD trading policies. Our neural network implementations serve as a proof of concept for artificially intelligent trading automata that operate at high frequencies. As our approach conceptually allows the learning of trading strategies on arbitrary time scales, a user may also provide minutely, hourly, or even daily closing prices as a data basis for training.

Another key observation of our investigation is the importance of using a training setup that matches the real trading conditions. Furthermore, we observe that if we perform trades without setting stop-loss values, the CfD remains open at a position that matches our observations of resistance lines on the index price.

**Author Contributions:** Experiments and Writing, N.Z.; Supervision, U.H.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

We have published the source code and data used in this paper at cfd\_src (n.d.).

*Appendix A.1. Pseudocode*

In this pseudocode, we denote a subvector from the batch of observations, for example the rewards vector, as *batch*[*rewards*].

#### **Algorithm A1** Training in market simulation


#### **Algorithm A2** Market logic

```
procedure MARKET LOGIC(action)
    Price deltas for take-profit and stop-loss values dprofit, dloss
    Reward scaling factor c
    Historic trade data X
    if action = no trade then
        t ← t + 1
        reward ← 0
        terminal ← 0
    if action = long then
        take-profit ← X[t + l] + dprofit(action)
        stop-loss ← X[t + l] − dloss(action)
        while t ≤ len(X) − l do
            t ← t + 1
            price ← X[t + l][bid price]
            if price ≥ take-profit then
                reward ← c · (price − take-profit + dprofit(action))
                terminal ← 1
            if price ≤ stop-loss then
                reward ← c · (price − stop-loss − dloss(action))
                terminal ← 1
    if action = short then
        take-profit ← X[t + l] − dprofit(action)
        stop-loss ← X[t + l] + dloss(action)
        while t ≤ len(X) − l do
            t ← t + 1
            price ← X[t + l][ask price]
            if price ≤ take-profit then
                reward ← c · (take-profit − price + dprofit(action))
                terminal ← 1
            if price ≥ stop-loss then
                reward ← c · (stop-loss − price − dloss(action))
                terminal ← 1
    state2 ← X[t : t + l] − mean(X[t : t + l])
    return state2, reward, t, terminal
```
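A runnable Python sketch of Algorithm A2 may look as follows. This is a simplified, illustrative translation: the `(bid, ask)` tuple layout of `X`, the entry-price convention, and the function and parameter names are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of Algorithm A2 (market logic). X is a list of
# (bid, ask) price pairs, l is the observation window length, d_profit and
# d_loss map an action to its take-profit/stop-loss delta, and c scales the
# reward. All names and the data layout are assumptions for this sketch.

def market_logic(action, t, X, l, d_profit, d_loss, c):
    reward, terminal = 0.0, 0
    if action == "no_trade":
        t += 1
    elif action in ("long", "short"):
        bid, ask = 0, 1                       # column indices in X
        sign = 1 if action == "long" else -1
        entry = X[t + l][ask] if action == "long" else X[t + l][bid]
        take_profit = entry + sign * d_profit[action]
        stop_loss = entry - sign * d_loss[action]
        while t < len(X) - l - 1:
            t += 1
            price = X[t + l][bid] if action == "long" else X[t + l][ask]
            if sign * (price - take_profit) >= 0:    # take-profit reached
                reward = c * (sign * (price - take_profit) + d_profit[action])
                terminal = 1
                break
            if sign * (price - stop_loss) <= 0:      # stop-loss reached
                reward = c * (sign * (price - stop_loss) - d_loss[action])
                terminal = 1
                break
    window = [row[0] for row in X[t:t + l]]          # next observation window
    mean = sum(window) / len(window)
    state2 = [p - mean for p in window]              # mean-centred state
    return state2, reward, t, terminal
```

The `sign` variable folds the symmetric long/short branches of the pseudocode into one loop; the returned state is mean-centred exactly as in the last line of the algorithm.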
#### **Algorithm A3** Q-Learning step


#### **Algorithm A4** Append to Prioritized Replay Memory

```
procedure APPEND TO MEMORY(experience)
    Lists state1, state2, action, reward, terminal
    Priority vector P
    Probability distribution p
    Trust factor α = 7/10
    Experience (s1, s2, a, r, t)
    i ← (i + 1) mod |M|
    (state1[i], state2[i], action[i], reward[i], terminal[i]) ← (s1, s2, a, r, t)
    P[i] ← max(P)
    pi ← P[i]^α / ∑i P[i]^α
```

#### **Algorithm A5** Update Prioritized Replay Memory

```
procedure UPDATE PRIORITY(batch)
    Priority vector P
    Temporal difference errors batch[delta]
    for each δi in batch[delta] do
        P[i] ← |δi|
```
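Algorithms A4 and A5 can be sketched together as a small Python class. This is an illustrative reconstruction: the class layout, the initial priority of 1.0, and the small `eps` term in the priority update are our assumptions; the α = 0.7 priority weighting follows the pseudocode.

```python
# Sketch of Algorithms A4/A5: a prioritized replay memory. New experiences
# enter with the current maximum priority; sampling probabilities follow
# p_i = P[i]^alpha / sum_j P[j]^alpha with alpha = 0.7. The class layout and
# the eps constant are assumptions, not the authors' code.

class PrioritizedReplayMemory:
    def __init__(self, capacity, alpha=0.7):
        self.capacity = capacity
        self.alpha = alpha
        self.memory = [None] * capacity      # (s1, s2, action, reward, terminal)
        self.priority = [0.0] * capacity
        self.index = -1
        self.size = 0

    def append(self, experience):            # Algorithm A4
        self.index = (self.index + 1) % self.capacity
        self.memory[self.index] = experience
        # New samples get the maximum existing priority (1.0 when empty).
        self.priority[self.index] = max(self.priority) or 1.0
        self.size = min(self.size + 1, self.capacity)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priority[:self.size]]
        total = sum(scaled)
        return [s / total for s in scaled]

    def update_priority(self, indices, td_errors, eps=1e-6):
        # Algorithm A5: priority is the absolute temporal-difference error
        # (eps keeps every priority strictly positive; an assumption here).
        for i, delta in zip(indices, td_errors):
            self.priority[i] = abs(delta) + eps
```

Giving new samples the maximum existing priority ensures every experience is replayed at least once before its priority is refined by its TD error.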

#### **References**


Watkins, Christopher, and Peter Dayan. 1992. Q-learning. *Machine Learning* 8: 279–92. [CrossRef]

xAPI. n.d. Documentation of the Interface Used for Data Gathering and Real World Test. Available online: http://developers.xstore.pro/documentation/ (accessed on 12 April 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **The Predictability of the Exchange Rate When Combining Machine Learning and Fundamental Models**

#### **Yuchen Zhang and Shigeyuki Hamori \***

Graduate School of Economics, Kobe University, 2-1, Rokkodai, Nada-Ku, Kobe 657-8501, Japan; zhangyuchen0227@yahoo.co.jp

**\*** Correspondence: hamori@econ.kobe-u.ac.jp

Received: 30 January 2020; Accepted: 1 March 2020; Published: 4 March 2020

**Abstract:** In 1983, Meese and Rogoff showed that traditional economic models developed since the 1970s do not perform better than the random walk in predicting out-of-sample exchange rates when using data obtained after the beginning of the floating rate system. Subsequently, whether traditional economic models can ever outperform the random walk in forecasting out-of-sample exchange rates has received scholarly attention. Recently, a combination of fundamental models with machine learning methodologies was found to outperform the random walk in predictability (Amat et al. 2018). This paper focuses on combining modern machine learning methodologies with traditional economic models and examines whether such combinations can outperform the prediction performance of the random walk without drift. More specifically, this paper applies the random forest, support vector machine, and neural network models to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). We performed a thorough robustness check using six government bonds with different maturities and four price indexes, which demonstrated the superior performance of fundamental models combined with modern machine learning in predicting future exchange rates in comparison with the random walk. These results were examined using the root mean squared error (RMSE) and a Diebold–Mariano (DM) test. The main findings are as follows. First, when comparing the performance of fundamental models combined with machine learning with that of the random walk, the RMSE results show that the fundamental models with machine learning outperform the random walk. In the DM test, the results are mixed, as most of the results show significantly different predictive accuracies compared with the random walk. Second, when comparing the performance of the fundamental models combined with machine learning, the models using the producer price index (PPI) consistently show good predictability. Meanwhile, the consumer price index (CPI) appears to be comparatively poor in predicting exchange rates, based on its poor results in the RMSE test and the DM test.

**Keywords:** exchange rates; fundamentals; prediction; random forest; support vector machine; neural network

#### **1. Introduction**

Despite the existence of various economic theories explaining the fluctuation of future exchange rates, as shown in Meese and Rogoff (1983a, 1983b), the random walk often produces better predictions for future exchange rates. More specifically, it has been shown that traditional economic models developed since the 1970s do not perform better than the random walk in predicting the out-of-sample exchange rate when using data obtained after the beginning of the floating rate system. Since the publication of these papers, many researchers have investigated this puzzle. Cheung et al. (2005) confirmed the work of Meese and Rogoff (1983a, 1983b), and demonstrated that the interest rate parity, monetary, productivity-based, and behavioral exchange rate models do not outperform the random walk for any time-period. Similarly, Rossi (2013) could not find a model with strong out-of-sample forecasting ability. On the contrary, Mark (1995) showed that the economic exchange-rate models perform better than the random walk in predicting long-term exchange rates. Amat et al. (2018) also found that combining machine learning methodologies, traditional exchange-rate models, and Taylor-rule exchange rate models could be useful in forecasting future short-term exchange rates in the case of 12 major currencies.

There have been similar attempts by researchers using stock market data. These studies show the predictability of future stock price using machine learning methodologies (Cervelló-Royo et al. 2015; Chong et al. 2017), and stock market trends (Chang et al. 2012; García et al. 2018). Hamori et al. (2018) also analyzed the default risk using several machine learning techniques.

Following on from these previous studies, this paper focuses on a combination of modern machine learning methodologies and economic models. The purpose of this paper is to determine whether such combinations outperform the prediction performance of random walk without drift. This model has been used as the comparison in most studies in this field since Meese and Rogoff (1983a, 1983b). The most influential study in this field is Amat et al. (2018). What distinguishes the present paper from previous studies is that instead of using an exponential weighted average strategy and sequential ridge regression with discount factors, this paper applies the random forest, support vector machine (SVM), and neural network models to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). Furthermore, the robustness of the results is thoroughly examined using six government bonds with different maturities (1, 2, 3, 5, 7, and 10 years) and four price indexes (the producer price index (PPI), the consumer price index (CPI) of all items, CPI excluding fresh food, and CPI excluding fresh food and energy) individually in three machine learning models. Together, these elements should provide concrete evidence for the results that were obtained.

In the empirical analysis, a rolling window analysis was used for a one-period-ahead forecast for the JPY/USD exchange rate. The sample data range from August 1980 until August 2019. The window size was set as 421. Hence, in total, the rolling window analysis was conducted 47 times for the individual fundamental models. The main findings of this study are as follows. First, when comparing the performance of the fundamental models combined with machine learning to that of the random walk, the root mean squared error (RMSE) results show that the fundamental models with machine learning outperform the random walk (the mean absolute percentage error (MAPE) also confirmed this result). In the Diebold–Mariano (DM) test, most of the results show significantly different predictive accuracies compared to the random walk, while some of the random forest results show the same accuracy as the random walk. Second, when comparing the performance of the fundamental models combined with machine learning, the models using the PPI show fairly good predictability in a consistent manner. This is indicated by both the RMSE and the DM test results. However, the CPI is not appropriate for predicting exchange rates, based on its poor results in the RMSE test and DM test. This result seems reasonable given that the CPI includes volatile price indicators such as food, beverages and energy.

The rest of the paper is organized as follows. Section 2 explains the fundamental models, Section 3 describes the data used in the empirical studies, Section 4 describes the methodology of machine learning, Section 5 shows the results and evaluation, and Section 6 summarizes the main findings of the paper.

#### **2. Fundamental Models**

Following Rossi (2013) and Amat et al. (2018), this paper uses four basic models to predict the exchange rate: uncovered interest rate parity (UIRP), purchasing power parity (PPP), the monetary model, and the Taylor rule models.

#### *2.1. Uncovered Interest Rate Parity*

The UIRP theorem used in the following section was proposed by Fisher (1896). This theorem analyzes how interest rates are altered by expected changes in the relative value of the currencies involved. UIRP is based on the following assumption: in a world with only two currencies and where market participants possess perfect information, investors can buy 1/*S*<sub>*t*</sub> units of foreign government bonds using one unit of their home currency. When investors buy a foreign bond between time *t* and time *t* + *h*, the earnings from the foreign bond are the bond premium plus the foreign interest rate *i*<sup>∗</sup><sub>*t*+*h*</sub>. At the end of the period, investors can collect the return converted to the home currency, which in expectation is *S*<sub>*t*+*h*</sub>(1 + *i*<sup>∗</sup><sub>*t*+*h*</sub>)/*S*<sub>*t*</sub>. Additional transaction costs during the whole process are ignored in this analysis, and the bond return should be the same whether the investors buy the home bond or the foreign bond. Hence, the following equation is given:

$$(1 + i\_{t+h}^\*)E\_t(S\_{t+h}/S\_t) = 1 + i\_{t+h} \tag{1}$$

By taking logarithms, the previous UIRP equation can be rewritten as

$$E\_t(s\_{t+h} - s\_t) = \alpha + \beta(i\_{t+h} - i\_{t+h}^\*) \tag{2}$$

where *s*<sub>*t*</sub> is the logarithm of the exchange rate *S*<sub>*t*</sub>, and *h* is the horizon.

Another uncovered interest rate parity equation used in Taylor (1995) is as follows:

$$
\Delta\_k s\_{t+k}^c = i\_t - i\_t^\* \tag{3}
$$

where *s*<sub>*t*</sub> denotes the logarithm of the spot exchange rate (domestic price for foreign currency) at time *t*, and *i*<sub>*t*</sub> and *i*<sup>∗</sup><sub>*t*</sub> are the nominal interest rates on domestic and foreign securities, respectively (with *k* periods to maturity).

It is worth noting that in both equations, maturity is denoted as *k*, meaning that if we follow the equation faithfully to predict the one-month ahead exchange rate, we should use the one-month maturity of the government bond to predict that rate. However, the focus here is on the relationship between interest rate differences and the exchange rate. Thus, the above equations are rewritten as

$$\mathbf{s}\_{t+1} - \mathbf{s}\_t = \mathbf{i}\_t - \mathbf{i}\_t^\*. \tag{4}$$

The above equation is used in the following empirical analysis.

Meese and Rogoff (1983a, 1983b), who used Equation (1) to forecast sample real exchange rates using the real interest rates and compared their performance with the predictions using random walk, found that the latter provided better forecasting results.

#### *2.2. Purchasing Power Parity*

The PPP was first proposed in Cassel (1918). The concept of PPP is that the same amount of goods or services can be purchased in either currency with the same initial amount of currency. That is, a unit of currency in the home country would have the same purchasing power in the foreign country.

The absolute purchase power parity can be expressed as the following equation:

$$S = \frac{P}{P^\*} \tag{5}$$

where *S* denotes the exchange rate in period *t*, *P* denotes the price level in the home country, and *P*∗ denotes the price level in the foreign country.

Assuming that the absolute purchasing power parity holds in period *t* + 1, we can obtain the following equation:

$$S\_{t+1} = \frac{P\_{t+1}}{P\_{t+1}^\*}.\tag{6}$$

Assuming that the inflation rate from period *t* to period *t* + 1 is π, we can obtain following equation:

$$S\_{t+1} = \frac{(1+\pi)P\_t}{(1+\pi^\*)P\_t^\*} = \frac{1+\pi}{1+\pi^\*}S\_t \tag{7}$$

which means that

$$\frac{S\_{t+1}}{S\_t} = \frac{1+\pi}{1+\pi^\*}.\tag{8}$$

Assuming that the rate of the change in the exchange rate is ρ, then

$$\frac{S\_{t+1}}{S\_t} = \rho + 1.\tag{9}$$

Using Equations (8) and (9), we can obtain

$$
\rho + \rho \pi^\* + 1 + \pi^\* = 1 + \pi. \tag{10}
$$

Since ρπ∗ is a very small value, it is ignored in the following analysis. Then, we obtain

$$\frac{S\_{t+1} - S\_t}{S\_t} = \pi - \pi^\*. \tag{11}$$

From Equation (11), we can see that there is a clear relationship between the rate of change in the exchange rate and the inflation rate. This paper uses four indexes to calculate the inflation rate. These indexes are the PPI, the CPI of all items, the CPI excluding fresh food, and the CPI excluding fresh food and energy. Most papers use the CPI when describing the PPP theorem. However, Hashimoto (2011) mainly uses the PPI for purchasing power parity, since it includes business activities in both home and foreign markets.

#### *2.3. Monetary Model*

The monetary model was first introduced by Frenkel (1976) and Mussa (1976). The monetary approach determines the exchange rate as a relative price of two currencies and models the exchange rate behavior in terms of the relative demand for and the supply of money in the two countries. The long-run money market equilibrium in the domestic and foreign country is given by

$$m\_t = p\_t + ky\_t - h i\_t \tag{12}$$

$$m\_t^\* = p\_t^\* + ky\_t^\* - hi\_t^\*.\tag{13}$$

From Equations (12) and (13), we can obtain

$$m\_t - m\_t^\* = p\_t - p\_t^\* + k\left(y\_t - y\_t^\*\right) - h\left(i\_t - i\_t^\*\right) \tag{14}$$

where *m*<sub>*t*</sub> denotes the logarithm of the money supply, *p*<sub>*t*</sub> denotes the logarithm of the price level, *y*<sub>*t*</sub> denotes the logarithm of income, and *i*<sub>*t*</sub> denotes the interest rate. *k* denotes the income elasticity. Assuming that *k* is 1 and using the uncovered interest rate parity condition *i*<sub>*t*</sub> − *i*<sup>∗</sup><sub>*t*</sub> = *S*<sub>*t*+1</sub> − *S*<sub>*t*</sub>, we get

$$S\_{t+1} - S\_t = p\_t - p\_t^\* + y\_t - y\_t^\* - (m\_t - m\_t^\*). \tag{15}$$

This paper mainly focuses on the relationship between the change rate of the exchange rate and other variables. Thus, the following equation is used:

$$S\_{t+1} - S\_t = f\left(p\_t - p\_t^\*,\ y\_t - y\_t^\*,\ m\_t - m\_t^\*\right).\tag{16}$$

*2.4. Taylor Rule Models*

Engel and West (2005, 2006) and Molodtsova and Papell (2009) improved the original Taylor rule for monetary policy (Taylor 1993), which describes the change in the exchange rate.

The concept in the original Taylor model (Taylor 1993) is that the monetary authority sets the real interest rate as a function of the difference between the real inflation and the target level and also as a function of the output gap *yt*.

Taylor (1993) proposed the following equation:

$$i\_t^T = \pi\_t + \phi(\pi\_t - \pi^\*) + \gamma y\_t + r^\* \tag{17}$$

where *i*<sub>*t*</sub><sup>*T*</sup> denotes the target for the short-term nominal interest rate, π<sub>*t*</sub> is the inflation rate, π<sup>∗</sup> is the target level of inflation, *y*<sub>*t*</sub> is the output gap, and *r*<sup>∗</sup> is the equilibrium level of the real interest rate.

Following Molodtsova and Papell (2009), assuming that μ = *r*<sup>∗</sup> −φπ<sup>∗</sup> , and λ = 1 + φ, the following equation is obtained:

$$i\_t^T = \mu + \lambda \pi\_t + \gamma y\_t. \tag{18}$$

Since the monetary policy also depends on the real exchange rate, the real exchange rate variable *qt* is added into the previous equation:

$$i\_t^T = \mu + \lambda \pi\_t + \gamma y\_t + \delta q\_t. \tag{19}$$

On top of Equation (19), we added another feature so that the interest rate adjusts gradually to achieve its target level (Clarida et al. 1998). This means that the actual observable interest rate *it* is partially adjusted to the target, as follows:

$$i\_t = (1 - \rho)i\_t^T + \rho i\_{t-1} + v\_t \tag{20}$$

where ρ is the smoothing parameter, and *vt* is a monetary shock.

By substituting Equation (19) into Equation (20), we get the following equation:

$$i\_t = (1 - \rho)(\mu + \lambda \pi\_t + \gamma y\_t + \delta q\_t) + \rho i\_{t-1} + v\_t \tag{21}$$

where for the US, δ = 0, and *vt* is the monetary policy shock. Thus, we can obtain the following two equations using asterisks to denote foreign country variables:

$$i\_t = (1 - \rho)(\mu + \lambda \pi\_t + \gamma y\_t) + \rho i\_{t-1} + v\_t \tag{22}$$

$$i\_t^\* = (1 - \rho^\*)\left(\mu^\* + \lambda^\* \pi\_t^\* + \gamma^\* y\_t^\* + \delta^\* q\_t\right) + \rho^\* i\_{t-1}^\* + v\_t^\* \tag{23}$$

By taking the difference of Equations (22) and (23), using the UIRP model and re-defining the coefficients, we get

$$S\_{t+1} - S\_t = \tilde{\mu} + \tilde{\delta}q\_t + \tilde{\lambda}\pi\_t + \tilde{\gamma}y\_t - \tilde{\lambda}^\*\pi\_t^\* - \tilde{\gamma}^\*y\_t^\* + \rho i\_{t-1} - \rho^\* i\_{t-1}^\*.\tag{24}$$

In Molodtsova and Papell (2009), the strongest result was found in the symmetric Taylor rule model, in which the coefficient of the real exchange rate is δ̃ = 0. Therefore, the Taylor fundamentals take the inflation, output gaps, and lagged interest rates into consideration.

In Rossi (2013), Giacomini and Rossi (2010), and Jamali and Yamani (2019), lagged interest rates are not included, while the coefficient is defined as in Equation (24), so

$$S\_{t+1} - S\_t = \tilde{\mu} + \tilde{\lambda}(\pi\_t - \pi\_t^\*) + \tilde{\gamma}(y\_t - y\_t^\*). \tag{25}$$

Since Molodtsova and Papell (2009) used Equation (24), while Rossi (2013) used Equation (25), this paper uses both equations for the Taylor rule models.

#### **3. Data**

The data used to describe the macroeconomies were taken from the DataStream database. All data are at a monthly frequency. This paper used government bonds with different maturities (1, 2, 3, 5, 7, and 10 years) for each country. The producer price index (PPI), the consumer price index (CPI) of all items, the CPI excluding fresh food, and the CPI excluding fresh food and energy were used to calculate the inflation rate. For the money stock, we used each country's M1. To measure output, we used the industrial production index, as GDP is only available quarterly. Following Molodtsova and Papell (2009), we used the Hodrick–Prescott filter to calculate the potential output and obtain the output gap. The exchange rates were taken from the BOJ Time-Series Data Search. The data cover the period from August 1980 to August 2019 and are described in Table 1.


**Table 1.** Data description.

This paper used a rolling window analysis for the one-period-ahead forecast. A rolling window analysis runs an estimation iteratively while shifting a fixed-size window forward by one period at a time. The whole sample dataset ranges from August 1980 until August 2019. Here, the window size was set to 421. For example, the first window, from August 1980 to August 2015, was used to forecast September 2015. Hence, the model uses the training data from period 1 to 421 to predict period 422, then uses the training data from period 2 to 422 to predict period 423, and so on until the end of the time series. In total, the rolling window analysis is run 47 times for each model.
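The rolling-window scheme can be sketched generically; the `fit`/`predict` interface below is a placeholder for any of the three machine learning models, and only the window size of 421 follows the text.

```python
# Sketch of the one-period-ahead rolling-window forecast: with T observations
# and a window of 421, the loop produces T - 421 forecasts (47 in the paper).
# `model` is any object with scikit-learn-style fit/predict (an assumption).

def rolling_forecast(X, y, model, window=421):
    forecasts = []
    for start in range(len(y) - window):
        # Re-estimate on the current window, then forecast the next period.
        train_X, train_y = X[start:start + window], y[start:start + window]
        model.fit(train_X, train_y)
        forecasts.append(model.predict([X[start + window]])[0])
    return forecasts
```

Each iteration discards the oldest observation and adds the newest one, so every forecast is produced by a model estimated on exactly 421 periods.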

There are two reasons why we used the end of month exchange rate rather than the monthly average exchange rate. First, the end of month exchange rate is often used in this field of study. Second, as mentioned in Engel et al. (2019), although replacing the monthly average exchange rate with the end of month exchange rate reduces the forecasting power of the Taylor rule fundamentals compared to that of the random walk (Molodtsova and Papell 2009), it is highly possible that changes in the monthly average exchange rate are serially correlated. Thus, following Engel et al. (2019), this study also used the end of month exchange rate.

#### **4. Methodologies**

Here, we use the result from random walk as the benchmark test and compare its performance to three types of machine learning: random forest, support vector machine, and neural network. The results are examined using the RMSE and a DM test.

#### *4.1. Random Forest*

Random forest (Breiman 2001) is an ensemble learning method that builds multiple decision trees by analyzing data features and then merges them to improve prediction performance. This method helps avoid overfitting as more trees are added to the forest, because each tree is drawn from the original sample using bootstrap resampling and is grown based on a randomly selected subset of features. The resulting weakly correlated trees improve prediction performance because their individual errors tend to cancel out, so a single tree's error does not prevent the ensemble from moving in the correct direction. The random forest produces regression trees through the following steps (Figure 1):

**Figure 1.** Mechanism of random forest.

Assume that there is a dataset *D* = {(*x*<sub>1</sub>, *y*<sub>1</sub>), ... , (*x*<sub>*n*</sub>, *y*<sub>*n*</sub>)} and the target is to find the function *f* : *X* → *Y*, where *X* is the input space, and *Y* is the output space. Let *M* be the number of features.


A prediction is produced by taking the average of the predictions from all trees in the forest (in the case of a classification problem, a prediction is decided by the majority).

In this paper, *X* indicates the fundamental economic features, and *Y* is the exchange rate. D refers to all data.
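As a concrete illustration of the mechanism described above, the following sketch fits scikit-learn's `RandomForestRegressor` on synthetic data; the feature interpretation and all parameter values are ours, not the paper's dataset or settings.

```python
# Sketch: random forest regression of an exchange-rate change on fundamental
# features, using synthetic data (feature interpretation is illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Columns: e.g. interest-rate, inflation, and output-gap differentials.
X = rng.standard_normal((421, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.standard_normal(421)

# Each tree is grown on a bootstrap resample with randomly selected features;
# the forest's regression prediction is the average over all trees.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
pred = forest.predict(X[:1])
```

Averaging over many decorrelated trees is what stabilizes the forecast relative to a single decision tree.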

#### *4.2. Support Vector Machine*

The original SVM algorithm was introduced by Vapnik and Lerner (1963). Boser et al. (1992) suggested an alternative way to create nonlinear classifiers by applying the kernel functions to maximum-margin hyperplanes.

The primary concept of SVM regression is discussed first with a linear model and then extended to a non-linear model using kernel functions. Given the training data (*x*<sub>1</sub>, *y*<sub>1</sub>), ... , (*x*<sub>*l*</sub>, *y*<sub>*l*</sub>), *x* ∈ **R**<sup>*n*</sup>, *y* ∈ **R**, the SVM regression can be given by

$$f(\mathbf{x}) = w^\top \mathbf{x} + b, \quad w \in \mathbb{R}^n, \ b \in \mathbb{R} \tag{26}$$

where |ξ|<sub>ε</sub> is the ε-insensitive loss function considered in SVM, described as

$$\left|\xi\right|\_{\varepsilon} = \left|y - f(\mathbf{x})\right|\_{\varepsilon} = \begin{cases} 0 & \text{if } \left|y - f(\mathbf{x})\right| \le \varepsilon \\ \left|y - f(\mathbf{x})\right| - \varepsilon & \text{otherwise} \end{cases}.\tag{27}$$

The principal objective of SVM regression is to find function *f*(*x*) with the minimum value of the loss function and also to make it as flat as possible. Thus, the model can be expressed as the following convex optimization problem:

$$\min \frac{1}{2} \|w\|^2 + C \left(\sum\_{i=1}^{l} \xi\_i + \sum\_{i=1}^{l} \xi\_i^\*\right) \tag{28}$$

subject to

$$y\_i - w^\top x\_i - b \le \varepsilon + \xi\_i \tag{29}$$

$$w^\top x\_i + b - y\_i \le \varepsilon + \xi\_i^\* \tag{30}$$

$$\xi\_i, \xi\_i^\* \ge 0 \tag{31}$$

where C determines the trade-off between the flatness of *f*(*x*) and the amount up to which deviations larger than ε are tolerated; ξ<sub>*i*</sub> and ξ<sup>∗</sup><sub>*i*</sub> are the slack variables capturing these deviations.

After forming the Lagrangian of Equations (28)–(31) and introducing the kernel function, the dual form of the SVM model can be expressed as follows:

$$\max -\frac{1}{2} \sum\_{i,j=1}^{l} (\alpha\_i - \alpha\_i^\*)(\alpha\_j - \alpha\_j^\*) k(x\_i, x\_j) + \sum\_{i=1}^{l} y\_i (\alpha\_i - \alpha\_i^\*) - \varepsilon \sum\_{i=1}^{l} (\alpha\_i + \alpha\_i^\*) \tag{32}$$

subject to

$$\sum\_{i=1}^{l} \left(\alpha\_i - \alpha\_i^\*\right) = 0,\tag{33}$$

$$\alpha\_i, \alpha\_i^\* \in [0, C]. \tag{34}$$

where *k*(*x*<sub>*i*</sub>, *x*<sub>*j*</sub>) is the kernel function, and α<sub>*i*</sub>, α<sup>∗</sup><sub>*i*</sub> are the Lagrangian multipliers. SVM can be performed with various kernel functions, such as the linear, polynomial, radial basis function (RBF), and sigmoid kernels. This paper uses the RBF SVM model. The radial basis function can be expressed as follows:

$$k(\mathbf{x}\_i, \mathbf{x}\_j) = \exp\left(-\sigma \left|\mathbf{x}\_i - \mathbf{x}\_j\right|^2\right). \tag{35}$$

Here, the best C and σ are determined using a grid search. The C parameter controls a trade-off between fitting the training examples closely and keeping a smooth decision boundary: a larger C does not tolerate deviations, giving a more complicated decision function, while a smaller C tolerates them, giving a simpler decision function. The σ parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. A larger σ gives a great deal of weight to nearby observations, so the decision boundary becomes wiggly; with a smaller σ, the decision boundary resembles a linear boundary, since distant observations are also taken into consideration.
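The RBF regression and grid search described above can be sketched with scikit-learn (synthetic data; the grid values for C and the kernel width are illustrative, and scikit-learn names the width parameter `gamma`, which plays the role of σ in Equation (35)):

```python
# Sketch: RBF-kernel support vector regression with a grid search over C and
# the kernel width. Data and grid values are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Larger C penalizes deviations beyond epsilon more heavily; larger gamma
# narrows each example's influence, producing a wigglier fit.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": [0.1, 1, 10]},
    cv=5,
)
grid.fit(X, y)
best = grid.best_params_
```

Cross-validated grid search selects the (C, width) pair with the best out-of-fold fit, which mirrors the tuning procedure described in the text.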

#### *4.3. Neural Network*

The feedforward neural network is the first and simplest type of neural network model. General references for this model include Bishop (1995), Hertz et al. (1991), and Ripley (1993, 1996). This paper uses a one-hidden-layer model, the simplest variant, as shown in Figure 2.

**Figure 2.** The mechanism of a neural network.

As shown in Figure 2, the information moves forward from the input nodes, through the hidden nodes, and then reaches the output nodes.

Inputs are summed by individual nodes. Then, after a bias is added (*w<sub>ij</sub>* in Figure 2), the result is passed through a fixed function φ*<sub>h</sub>* (Equation (37)). The results of the output units are produced by the same process with the output function φ*<sub>o</sub>*. Thus, the equation of a neural network is written as follows:

$$y\_k = \phi\_o \left( a\_k + \sum\_h w\_{hk} \phi\_h \left( a\_h + \sum\_i w\_{ih} \mathbf{x}\_i \right) \right) \tag{36}$$

The activation function φ*<sub>h</sub>* of the hidden-layer units is usually the logistic function

$$l(z) = \frac{1}{1 + e^{-z}},\tag{37}$$

and the output function φ*<sub>o</sub>* usually takes a linear form in regression (in the case of a classification problem, the output function often takes a logistic form).

Here, we adjust two hyper-parameters, the number of units in the hidden layer and the weight-decay parameter, using a grid search. The latter is a regularization parameter used to avoid over-fitting (Venables and Ripley 2002).
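As a sketch, the grid search over the hidden-layer size and weight decay might look as follows, using scikit-learn's MLPRegressor, whose `alpha` (L2 penalty) plays the role of weight decay and which, like the model above, uses a logistic hidden layer with a linear output in regression. The data and grid values are illustrative assumptions, not the paper's settings.

```python
# Illustrative grid search over hidden-layer size and weight decay for a
# one-hidden-layer feedforward network. Data and grids are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                 # placeholder fundamentals
y = np.tanh(X @ np.array([1.0, -0.5, 0.2])) + 0.05 * rng.normal(size=200)

param_grid = {"hidden_layer_sizes": [(2,), (4,), (8,)],
              "alpha": [1e-4, 1e-3, 1e-2]}                    # weight decay
search = GridSearchCV(
    MLPRegressor(activation="logistic", max_iter=2000, random_state=0),
    param_grid, cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_root_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```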

#### **5. Results and Evaluation**

#### *5.1. Root Mean Squared Error*

For the random walk, the RMSE is calculated using the following equation:

$$\left\{ \sum\_{s=0}^{46} \left[ A(t+s+1) - A(t+s) \right]^2 / 47 \right\}^{\frac{1}{2}}.\tag{38}$$

For the other machine learning models, the following equation is used:

$$\left\{ \sum\_{s=0}^{46} \left[ F(t+s+1) - A(t+s+1) \right]^2 / 47 \right\}^{\frac{1}{2}} \tag{39}$$

where *A*(*t*) denotes the actual value of the change rate in the exchange rate, and *F*(*t*) is the predicted value. If *s* = 0, the forecast corresponds to September 2015.
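The two RMSE formulas can be computed as follows. This is an illustrative sketch: the series are random placeholders, not the paper's data, while the 47-period horizon matches Equations (38) and (39).

```python
# Out-of-sample RMSE over the 47 forecast periods of Eqs. (38)-(39).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(scale=0.02, size=48)      # actual change rates A(t), ..., A(t+47)
F = A[1:] + 0.01 * rng.normal(size=47)   # placeholder model forecasts F(t+s+1)

# Random walk benchmark, Eq. (38): predicting no change, so the error
# at each step is A(t+s+1) - A(t+s).
rmse_rw = np.sqrt(np.mean((A[1:] - A[:-1]) ** 2))

# Machine learning model, Eq. (39): error is F(t+s+1) - A(t+s+1).
rmse_model = np.sqrt(np.mean((F - A[1:]) ** 2))

print(rmse_rw, rmse_model)
```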

#### *5.2. Modified Diebold–Mariano Test*

The DM test was proposed by Diebold and Mariano (1995). This test examines whether the null hypothesis that the competing models have the same predictive accuracy can be rejected. Let us define the forecast error *e<sub>it</sub>* as

$$e\_{it} = \hat{y}\_{it} - y\_t, \quad i = 1, \ 2 \tag{40}$$

where *ŷ<sub>it</sub>* and *y<sub>t</sub>* are the predicted and actual values at time *t*, respectively.

Let *g*(*e<sub>it</sub>*) denote the loss function. In this paper, it is defined as follows:

$$g(e\_{it}) = e\_{it}^2. \tag{41}$$

Then, the loss differential *dt* can be written as

$$d\_t = g(e\_{1t}) - g(e\_{2t}).\tag{42}$$

The statistic for the DM test is defined as follows:

$$\text{DM} = \frac{\overline{d}}{\sqrt{\frac{s}{N}}} \tag{43}$$

where *d̄*, *s*, and *N* represent the sample mean of *d<sub>t</sub>*, the variance of *d<sub>t</sub>*, and the sample size, respectively. The null hypothesis is H<sub>0</sub>: E[*d<sub>t</sub>*] = 0 ∀*t*, which means that the two forecasts have the same accuracy, while the alternative hypothesis is H<sub>1</sub>: E[*d<sub>t</sub>*] ≠ 0 ∀*t*, meaning that the two forecasts have different levels of accuracy. If the null hypothesis is true, then the DM statistic is asymptotically distributed as *N*(0, 1), the standard normal distribution.

A modified DM test was proposed by Harvey et al. (1997), who found that the modified DM test performs better than the original one. They defined the statistic for the modified DM test as follows:

$$DM^\* = \left[\frac{n+1-2h+n^{-1}h(h-1)}{n}\right]^{\frac{1}{2}}DM\tag{44}$$

where *h* denotes the forecast horizon, and DM represents the original statistic, as in Equation (43). In this study, we predict one period ahead, meaning that *h* = 1, so $DM^\* = \left(\frac{n-1}{n}\right)^{\frac{1}{2}} DM$.
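The modified DM test of Equations (40)–(44) can be sketched as follows, under squared-error loss and *h* = 1. The forecast series are placeholder assumptions; comparing DM* with a Student-t distribution with *n* − 1 degrees of freedom follows Harvey et al. (1997).

```python
# Illustrative modified Diebold-Mariano test, Eqs. (40)-(44).
import numpy as np
from scipy import stats

def modified_dm_test(actual, pred1, pred2, h=1):
    e1 = pred1 - actual                           # forecast errors, Eq. (40)
    e2 = pred2 - actual
    d = e1**2 - e2**2                             # loss differential, Eqs. (41)-(42)
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)    # DM statistic, Eq. (43)
    correction = np.sqrt((n + 1 - 2*h + h*(h - 1)/n) / n)  # Eq. (44)
    dm_star = correction * dm
    # Harvey et al. (1997): compare DM* with a t distribution (n-1 df).
    p = 2 * (1 - stats.t.cdf(abs(dm_star), df=n - 1))
    return dm_star, p

rng = np.random.default_rng(1)
y = rng.normal(size=47)                           # placeholder actuals
dm_star, p = modified_dm_test(y, y + 0.1 * rng.normal(size=47),
                              y + 0.5 * rng.normal(size=47))
print(dm_star, p)
```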

Tables 2–6 indicate the following. According to the RMSE, the fundamental models using machine learning outperform the random walk with regard to error size. This is also confirmed by the MAPE (Appendix A, Table A1). Since this holds regardless of the government bonds' time to maturity and the price level measurements used, these findings are robust. Furthermore, models using PPI always show better predictability than those using CPI. This empirical result is in line with that of Hashimoto (2011). Because of the close trading relationship between Japan and the US, the fluctuation of JPY/USD tends to be influenced by PPI rather than CPI. The poor performance of CPI, in terms of both its error size and the significance of its predictive accuracy, could be explained by its inclusion of volatile price components, such as food, beverages, and energy, which makes it difficult to measure an accurate inflation rate gap. In addition, in the case of the Taylor rule models, both Equations (24) and (25) present reasonable results for this empirical study. This demonstrates that either equation can be used to predict the exchange rate.


**Table 2.** Results for the uncovered interest rate parity (UIRP) model.

*Note*. "UIRP" indicates uncovered interest rate parity, "SVM" indicates support vector machine, "DM" indicates the modified Diebold–Mariano test statistic as in Equation (44), "bond\_1y" indicates the government bond with 1 year to maturity. "bond\_2y" indicates the government bond with two years to maturity. "bond\_3y" indicates the government bond with three years to maturity. "bond\_5y" indicates the government bond with five years to maturity. "bond\_7y" indicates the government bond with seven years to maturity. "bond\_10y" indicates the government bond with ten years to maturity.


**Table 3.** Results for the purchasing power parity (PPP) model.

*Note.* "PPP" indicates purchasing power parity, "DM" indicates modified Diebold–Mariano test statistic as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI\_CORE" indicates the CPI excluding fresh food, and "CPI\_CORECORE" indicates the CPI excluding fresh food and energy.


**Table 4.** Results for the monetary model.

*Note.* "DM" indicates modified Diebold–Mariano test statistic, as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI\_CORE" indicates the CPI excluding fresh food, and "CPI\_CORECORE" indicates the CPI excluding fresh food and energy.


**Table 5.** Results for the Taylor model (Equation (25)).

*Note.* "SVM" indicates support vector machine, "DM" indicates modified Diebold–Mariano test statistic as in Equation (44), "Taylor1" indicates using Equation (25), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI\_CORE" indicates the CPI excluding fresh food, and "CPI\_CORECORE" indicates the CPI excluding fresh food and energy.

From the perspective of the modified DM test, most of the results show predictive accuracies significantly different from that of the random walk, while some of the random forest results show the same predictive accuracy as the random walk. Random forest is thus a weak tool for predicting the out-of-sample exchange rate compared with the other machine learning models. This seems reasonable, as the random forest model ignores two characteristics of time series data, namely the inherent time trend and the interdependency among variables. However, random forest can still be useful for predicting time series data in some cases, such as in Dudek (2015).


**Table 6.** Result for the Taylor model (Equation (24)).

*Note.* "SVM" indicates support vector machine, "Taylor2" indicates using Equation (24), "DM" indicates modified Diebold–Mariano test statistic as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI\_CORE" indicates the CPI excluding fresh food, and "CPI\_CORECORE" indicates the CPI excluding fresh food and energy. "Taylor2\_PPI\_2y" indicates using the PPI to calculate the inflation rate and using a government bond with two years to maturity to calculate the lagged interest rate. "bond\_2y" indicates the government bond with two years to maturity. "bond\_3y" indicates the government bond with three years to maturity. "bond\_5y" indicates the government bond with five years to maturity. "bond\_7y" indicates the government bond with seven years to maturity. "bond\_10y" indicates the government bond with ten years to maturity.

#### **6. Conclusions**

Since the work of Meese and Rogoff (1983a, 1983b), there have been many attempts by researchers to solve the puzzle of why traditional economic models are unable to outperform the random walk in predicting out-of-sample exchange rates. In recent years, Amat et al. (2018) found that, in combination with machine learning methodologies, traditional exchange-rate models and Taylor-rule exchange-rate models can be useful for forecasting short-term exchange rates across 12 major currencies.

In this paper, we analyzed whether combining modern machine learning methodologies with economic models could outperform the prediction performance of a random walk without drift. More specifically, this paper sheds light on the application of the random forest method, support vector machines, and neural networks to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). The robustness of the results was also thoroughly examined using six government bonds with different maturities and four different price indexes in the three machine learning models. This provides concrete evidence of predictive performance.

In the empirical analysis, a rolling window analysis was used for the one-period-ahead forecast of JPY/USD. Using sample data from August 1980 to August 2019, there were two main findings. First, comparing the performance of the fundamental models combining machine learning with that of the random walk, the RMSE results show that the former models outperform the random walk. In the DM test, most of the results show a predictive accuracy significantly different from that of the random walk, while some of the random forest results show the same accuracy as the random walk. Second, comparing the performance of the fundamental models combined with machine learning, the models using PPI consistently show fairly good predictability, in terms of both the size of their errors and their predictive accuracy. However, CPI does not appear to be a useful index for predicting the exchange rate, given its poor results in the RMSE and DM tests.

**Author Contributions:** Investigation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, S.H.; project administration, S.H.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by JSPS KAKENHI Grant Number 17H00983.

**Acknowledgments:** We are grateful to two anonymous referees for their helpful comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

The results using mean absolute percentage error (MAPE) are shown in Table A1. As shown below, the UIRP model also outperforms the random walk in terms of MAPE. The results for PPP, the monetary model, and the Taylor model are omitted here since they all show the same results.


**Table A1.** Mean absolute percentage error (MAPE) of UIRP Model.

*Note.* "UIRP" indicates uncovered interest rate parity, "SVM" indicates support vector machine, "bond\_1y" indicates the government bond with 1 year to maturity. "bond\_2y" indicates the government bond with two years to maturity. "bond\_3y" indicates the government bond with three years to maturity. "bond\_5y" indicates the government bond with five years to maturity. "bond\_7y" indicates the government bond with seven years to maturity. "bond\_10y" indicates the government bond with ten years to maturity.

#### **References**


Taylor, Mark P. 1995. The economics of exchange rates. *Journal of Economic Literature* 33: 13–47.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Impact Analysis of Financial Regulation on Multi-Asset Markets Using Artificial Market Simulations**

## **Masanori Hirano <sup>1,\*</sup>, Kiyoshi Izumi <sup>1</sup>, Takashi Shimada <sup>1,2</sup>, Hiroyasu Matsushima <sup>1</sup> and Hiroki Sakaji <sup>1</sup>**


Received: 31 March 2020; Accepted: 14 April 2020; Published: 17 April 2020

**Abstract:** In this study, we assessed the impact of the capital adequacy ratio (CAR) regulation in the Basel regulatory framework. This regulation was established to make the banking network robust. However, a previous work argued that CAR regulation has a destabilizing effect on financial markets. To assess such destabilizing effects, we conducted artificial market simulations, i.e., computer simulations that imitate real financial markets. For the simulations, we proposed and used a new model with continuous double auction markets, stylized trading agents, and two kinds of portfolio trading agents. Both kinds of portfolio trading agents had trading strategies incorporating Markowitz's portfolio optimization. Additionally, one kind of portfolio trading agent was under regulation. From the simulations, we found that portfolio optimization as each trader's strategy stabilizes markets, while CAR regulation destabilizes markets in various respects. These results show that CAR regulation can have negative effects on asset markets. As future work, we should confirm these effects empirically and consider how to balance the positive and negative aspects of CAR regulation.

**Keywords:** artificial market; simulation; CAR regulation; portfolio

#### **1. Introduction**

The growing complexity of financial technology has increased the risks in financial markets, especially systemic risk. Systemic risk arises from the interaction between the various components of financial markets. It is one of the most significant dangers in financial markets because, where systemic risk exists, small triggers can cause huge shocks.

One famous example of systemic risk is the financial crisis of 2007–2008. The crisis began with a local default of American subprime mortgage loans. However, the failures in subprime mortgages spread widely and also affected stock markets. Moreover, the crisis affected not only American financial markets but also global financial markets.

Another example of such risks is flash crashes. One of the most famous was the 2010 Flash Crash in the U.S. stock market on 6 May 2010, in which the S&P 500 and the Dow Jones Industrial Average dropped rapidly. The main trigger of this crash is said to have been one big sell order, and the many algorithmic sell orders that followed it are said to have caused the significant crash. Supposedly, improvements in information technology contributed to these risks in the financial market. However, the ground truth of these crashes has not been revealed.

These systemic risks should be reduced to stabilize financial markets. Once a risk becomes a reality and causes shocks, the spread of the shocks cannot be stopped with a small amount of effort. Even if the origin could be identified, it would be almost impossible to cut the chain of failures, because the risks and shocks come from the complex structure of financial markets. To avoid these shocks, regulations or structural improvements in financial markets are necessary beforehand.

Therefore, predicting and identifying potential risks and making regulations for them are essential for market stabilization. Of course, it is also essential to learn from what happened before.

In terms of predicting and identifying potential risks, there are many ways to do so using artificial intelligence (AI) technologies. One of the most significant approaches is simulation, itself an AI technology and a promising one: unlike the real financial market, a simulation can test hypothetical situations and be run again and again.

In particular, agent-based simulation is useful for social science (Moss and Edmonds 2005). Agent-based simulations aim to imitate the real world by building an imaginary world with agents on computers.

The importance of simulation for financial markets has also been argued (Farmer and Foley 2009; Battiston et al. 2016). For example, Lux and Marchesi (1999) showed that interaction between agents in financial market simulations is necessary to replicate stylized facts in financial markets. Moreover, some simulation models have been built on empirical research or validated mathematically (Avellaneda and Stoikov 2008; Nagumo et al. 2017). These models can help us interpret empirical findings about complex systems.

Moreover, one promising agent-based approach to simulating financial markets is artificial market simulation. In such simulations, agents are constructed by imitating traders in the real financial market. This type of approach was used in previous work such as Mizuta et al. (2016) and Torii et al. (2015). In addition, Mizuta (2019) argued that artificial market simulation can contribute to improvements in the structures, rules, and regulations of financial markets.

In terms of regulations, there is a well-known framework called the "Basel regulatory framework." This framework has been updated repeatedly and has remained active for over three decades in financial markets. The series of regulations started as "Basel I" in 1988. In 1996, a modified Basel I was agreed upon (Basle Committee on Banking Supervision 1996a); this version included market operation risks for the first time. After the agreement on Basel II and its modifications, the Capital Adequacy Ratio (CAR) regulation was established (Basel Committee on Banking Supervision 2006). This regulation aimed to prevent banks from going bankrupt due to financial shocks and obligated banks to hold sufficient capital. Because of this regulation, banks have had to perform risk management when they hold risk assets such as equities and derivatives. Then, in December 2017, the Basel Committee on Banking Supervision reached an agreement on a new international regulatory framework for banks (Basel Committee on Banking Supervision 2017). This agreement, called "Basel III", aimed to avoid systemic risks in the financial market as a whole.

However, the CAR regulation introduced in Basel II is said to destabilize markets in some situations. When markets fluctuate remarkably, the risks of assets increase, and banks cannot keep holding those assets due to risk management, so they have to sell them. This mechanism can destabilize markets. This possibility was pointed out by Benink et al. (2008), according to whom the CAR regulation of the Basel II accord can destabilize markets when the market has uncertainties.

Other studies have also focused on this destabilizing effect using simulations. Hermsen (2010) showed a destabilizing effect of Basel II on markets through simulations. However, that research tested only a single asset, which is far from the reality of real markets.

In this paper, we present an artificial market simulation with multi-asset markets and the CAR regulation, and we test the effect of CAR regulation on financial markets. Markets with multiple assets are a more realistic scenario than single-asset markets. Using the artificial market simulations, we investigated the destabilizing effects of CAR regulation and also sought its other effects.

As a result, we confirmed the possibility of the destabilizing effect of CAR regulation and revealed the mechanism. We found that: (1) adopting portfolio optimization as each agent's strategy stabilizes markets; (2) the existence of CAR regulation can destabilize markets; (3) CAR regulation can cause significant price shocks; (4) CAR regulation can also suppress price increases; and (5) CAR regulation pushes down market prices. From these results, although CAR regulation might have a positive effect as a regulation for banks, we conclude that CAR regulation can have negative effects on asset markets.

The contribution of this study is an impact assessment of CAR regulation in situations that have not yet happened. Usually, empirical studies can only assess the impact on realized situations, so it is difficult to estimate the future impact of introducing the regulation more strictly. As mentioned above, our simulation study can contribute to this issue.

#### **2. Related Works**

There are many works based on a promising numerical-simulation approach. Moss and Edmonds (2005) argued that agent-based simulation is useful for this kind of social science. In financial markets, the importance of agent-based simulation has been argued in (Battiston et al. 2016; Farmer and Foley 2009). For example, Lux and Marchesi (1999) showed that interaction between agents in financial market simulations is necessary to replicate stylized facts in financial markets. Moreover, some simulation models have been built on empirical research or validated mathematically (Avellaneda and Stoikov 2008; Nagumo et al. 2017). These models can help us interpret empirical findings about complex systems.

Moreover, there are also many works on financial market simulation based on multi-agent simulation. This approach is called artificial market simulation. Mizuta (2019) demonstrated that a multi-agent simulation of the financial market can contribute to the implementation of rules and regulations in actual financial markets. Torii et al. (2015) used this approach to reveal how the flow of a price shock is transferred to other stocks. Their study was based on Chiarella and Iori (2002), which presented stylized trader models including only fundamental, chartist, and noise factors. Mizuta et al. (2016) tested the effect of tick size, i.e., the price unit for orders, which led to a discussion of tick-size reduction on the Tokyo Stock Exchange. As a platform for artificial market simulation, Torii et al. (2017) proposed the platform "Plham", which we partially used in this study; an updated version called "PlhamJ" (Torii et al. 2019) is now available. In addition, Hermsen (2010) showed through simulation the possibility that the regulation has a destabilizing effect on markets. As mentioned in the introduction, this paper aims to validate the same effects in multi-asset markets.

There are also works related to financial markets other than simulations. For example, in Japan, the Japan Exchange Group provides tick data, which include all the order data of the stock exchanges in Japan (Japan Exchange Group 2017). These order data, called "Flex Full" data, are detailed and serve several purposes. For example, Miyazaki et al. (2014) proposed the use of a Gaussian mixture model and Flex Full data for the detection of illegal orders and trades in the financial market. Tashiro and Izumi (2017) proposed a short-term price prediction model using neural networks. In their work, the authors processed Flex Full order data on a millisecond time scale with a recurrent neural network known as long short-term memory (Hochreiter and Schmidhuber 1997). Further, the authors recently extended this method (Tashiro et al. 2019) using a convolutional neural network (Krizhevsky et al. 2012). Moreover, Nanex (2010) mined and reported some distinguishing ordering patterns from order data. Cont (2001) obtained stylized facts regarding the real financial market through statistical analytics such as volatility clustering. In this study, we used some findings and analysis from this work.

#### **3. Model**

Here, we describe the new multi-asset model of the artificial market. The model contains markets and agents of different types, which act like traders in real financial markets and use their own strategies. The types of agents are as follows:

• stylized trading agents;
• non-regulated portfolio trading agents;
• regulated portfolio trading agents.


Each agent can buy or sell any asset in the markets subject to restrictions such as cash, leverage, or CAR regulation (Figure 1). The markets are based on a multi-asset artificial market of Torii et al. (2015). The stylized trading agents are based on a model of Chiarella and Iori (2002).

**Figure 1.** Model outline. The model consists of a number of markets in which three types of agent trade assets.

In every step of the simulation, some agents are chosen to place their orders. The number of chosen agents equals the number of markets. Thus, neither the number of orders placed nor the number of orders contracted in one step is fixed.

#### *3.1. Markets*

Every asset has a unique market with a continuous double auction mechanism for setting the market price. Any sell or buy order can be placed, and order prices are real numbers (not limited to integers). Examples of order books are shown in Figure 2. Before a new order is input, there are usually other orders on both the sell and buy order books (Figure 2a). Then, if an agent inputs 80 buy orders at 400.53, the orders are placed on the buy order book (Figure 2b). In the case of the order books shown in Figure 2, the 80 new buy orders can be contracted with existing sell orders: the 50 sell orders at the price of 400.5 and 30 of the 38 sell orders at the price of 400.528 (Figure 2c). After contracting, only 8 of the 38 sell orders at the price of 400.528 remain on the sell book, and all contracted orders vanish (Figure 2d).




**Figure 2.** Continuous double auction mechanism for setting the market price. (**a**) Before inputting a new order. (**b**) Inputting 80 buy orders at 400.53 (highlighted in yellow). (**c**) Contracting 50 orders at 400.5 and 30 orders at 400.528 (highlighted in yellow). (**d**) After contracting.

Market prices are decided separately for every asset in each market through the continuous double auction mechanism. Transaction fees are ignored.
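The contracting process in the example above can be sketched as a toy price-priority matching routine. This is an illustrative sketch, not the actual matching engine used in the simulations; the `match_buy` function and its order-book representation are assumptions.

```python
# Toy continuous double auction: match a new buy order against an
# existing sell book, reproducing the Figure 2 example.
def match_buy(sell_book, price, qty):
    """sell_book: list of [price, quantity], sorted ascending by price."""
    trades = []
    while qty > 0 and sell_book and sell_book[0][0] <= price:
        best = sell_book[0]
        traded = min(qty, best[1])
        trades.append((best[0], traded))   # contract at the resting price
        best[1] -= traded
        qty -= traded
        if best[1] == 0:
            sell_book.pop(0)               # fully contracted orders vanish
    # Any remainder would rest on the buy book (omitted in this sketch).
    return trades

# Figure 2: 50 sell orders at 400.5 and 38 at 400.528; a buy of 80 at
# 400.53 contracts 50 @ 400.5 and 30 @ 400.528, leaving 8 at 400.528.
sells = [[400.5, 50], [400.528, 38]]
trades = match_buy(sells, 400.53, 80)
print(trades)   # [(400.5, 50), (400.528, 30)]
print(sells)    # [[400.528, 8]]
```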

Every asset has a fundamental price. This price is used by agents to decide their own estimated value for each asset, and it is determined in accordance with a multivariate geometric Brownian motion (a random walk). In the simulations, the starting point of the Brownian motion was set at 400.0, and the variance was set at 1.0 × 10<sup>−6</sup>, based on Torii et al. (2015).

In addition, we assumed that there was no correlation between any pairs of assets.
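A minimal sketch of this fundamental-price process follows, under the stated settings (start at 400.0, per-step variance 1.0 × 10<sup>−6</sup>, no drift, no cross-asset correlation). The number of assets and steps are illustrative assumptions.

```python
# Independent geometric Brownian motions for the fundamental prices.
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_steps = 5, 1000
sigma = np.sqrt(1.0e-6)               # per-step standard deviation of log returns

# Log returns are i.i.d. normal: zero drift, no cross-asset correlation.
log_returns = rng.normal(0.0, sigma, size=(n_steps, n_assets))
fundamentals = 400.0 * np.exp(np.cumsum(log_returns, axis=0))

print(fundamentals[0], fundamentals[-1])
```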

#### *3.2. Agents*

The three types of agents roughly imitate traders in real markets. Each has its own trading strategy or trading algorithm, as explained below. Initially, all agents hold 50 assets per market.

#### *3.3. Agents: Stylized Trading Agents*

The stylized trading agents are based on those of Chiarella and Iori (2002).

In every step, the stylized trading agents estimate their reasonable price for each asset by using historical data and fundamental prices. They use three types of index, which they calculate by themselves using their own unique parameters.


A stylized trading agent, agent *i*, decides its reasonable price for asset (market) *s* at time *t*, as follows.

Agent *i* has three unique weights: *w<sub>F</sub><sup>i</sup>* ≥ 0 for *F*, *w<sub>C</sub><sup>i</sup>* ≥ 0 for *C*, and *w<sub>N</sub><sup>i</sup>* ≥ 0 for *N* (where *w<sub>F</sub><sup>i</sup>* + *w<sub>C</sub><sup>i</sup>* + *w<sub>N</sub><sup>i</sup>* > 0). The estimated logarithmic return *r<sub>t</sub><sup>i,s</sup>* is

$$r\_t^{i,s} = \frac{1}{w\_F^i + w\_C^i + w\_N^i} (w\_F^i F\_t^{i,s} + w\_C^i C\_t^{i,s} + w\_N^i N\_t^{i,s}).\tag{1}$$

In the simulations, *w<sub>F</sub><sup>i</sup>*, *w<sub>C</sub><sup>i</sup>*, and *w<sub>N</sub><sup>i</sup>* were drawn from exponential distributions with means of 10.0, 1.0, and 10.0, respectively.

The fundamental factor *F<sub>t</sub><sup>i,s</sup>*, chartist factor *C<sub>t</sub><sup>i,s</sup>*, and noise factor *N<sub>t</sub><sup>i,s</sup>* for agent *i* at time *t* are calculated as follows:

• Fundamental factor:

$$F\_t^{i,s} = \frac{1}{\tau\_r^i} \ln \left(\frac{p\_t^{\*s}}{p\_t^s}\right),\tag{2}$$

where *p<sub>t</sub><sup>s</sup>* is the price at time *t*, *p<sub>t</sub><sup>∗s</sup>* is the fundamental price at time *t* (given by the geometric Brownian motion mentioned above), and *τ<sub>r</sub><sup>i</sup>* is agent *i*'s mean-reversion-time constant (this indicates how long the agent assumes it takes for the price to go back to the fundamental price).

• Chartist factor:

$$C\_{t}^{i,s} = \frac{1}{\tau^i} \sum\_{j=1}^{\tau^i} \ln \left( \frac{p\_{t-j}^s}{p\_{t-j-1}^s} \right),\tag{3}$$

where *τ<sup>i</sup>* is agent *i*'s time window size, which determines how far back into the past the historical data used by the agent extends.

• Noise factor:

$$N\_t^{i,s} \sim N(0, \sigma\_N^i),\tag{4}$$

which means that *N<sub>t</sub><sup>i,s</sup>* obeys a normal distribution with a zero mean and variance (*σ<sub>N</sub><sup>i</sup>*)<sup>2</sup>.

In the simulations, *τ<sub>r</sub><sup>i</sup>* was drawn from a uniform distribution on [50, 150], *τ<sup>i</sup>* from a uniform distribution on [100, 200], and *σ<sub>N</sub><sup>i</sup>* = 1.0 × 10<sup>−3</sup>, based on Torii et al. (2015).

On the basis of the equations above, the agent decides its reasonable price *p<sub>t+Δt</sub><sup>i,s</sup>* at time *t* + Δ*t*:

$$p\_{t+\Delta t}^{i,s} = p\_t^s \exp\left(r\_t^{i,s} \Delta t\right). \tag{5}$$

Then, on the basis of this price, the agent decides whether to buy or sell one unit share: if *p<sub>t+Δt</sub><sup>i,s</sup>* > *p<sub>t</sub><sup>s</sup>*, the agent buys one unit of asset *s* at time *t*; otherwise, it sells one unit.

Stylized trading agents are intended to generate enough orders for contracts to occur: if there were few orders on the markets' order books, an agent's order would not be contracted when the portfolio trading agents place orders. Thus, only stylized trading agents are allowed to "sell short" and "margin trade." Selling short means placing sell orders without holding the stock; margin trading means placing orders without holding enough money.

In addition, stylized trading agents can access and buy or sell any asset in the markets. They make their buy-or-sell decisions separately for each asset.
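Equations (1)–(5) can be sketched as a single agent's decision rule. This is an illustrative sketch under the parameter distributions above; the price history, the `decide` function, and the random seed are assumptions.

```python
# One stylized trading agent's decision, Eqs. (1)-(5).
import math
import random

random.seed(0)

# Agent parameters, drawn as in the simulation settings above.
w_f = random.expovariate(1 / 10.0)     # fundamental weight (mean 10.0)
w_c = random.expovariate(1 / 1.0)      # chartist weight (mean 1.0)
w_n = random.expovariate(1 / 10.0)     # noise weight (mean 10.0)
tau_r = random.uniform(50, 150)        # mean-reversion time constant
tau = int(random.uniform(100, 200))    # chartist time window
sigma_n = 1.0e-3

def decide(prices, fundamental, dt=1.0):
    """prices: history up to time t (len > tau); returns 'buy' or 'sell'."""
    p_t = prices[-1]
    f = (1 / tau_r) * math.log(fundamental / p_t)          # Eq. (2), toward p*
    c = sum(math.log(prices[-j] / prices[-j - 1])
            for j in range(1, tau + 1)) / tau              # Eq. (3)
    n = random.gauss(0.0, sigma_n)                         # Eq. (4)
    r = (w_f * f + w_c * c + w_n * n) / (w_f + w_c + w_n)  # Eq. (1)
    p_next = p_t * math.exp(r * dt)                        # Eq. (5)
    return "buy" if p_next > p_t else "sell"

history = [400.0 * (1 + 0.0001 * i) for i in range(300)]   # toy price history
print(decide(history, fundamental=400.0))
```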

#### *3.4. Agents: Non-Regulated Portfolio Trading Agents*

Non-regulated portfolio trading agents have a trading strategy based on portfolio optimization. Like stylized trading agents, they can access all markets and can buy or sell any asset. However, in contrast to stylized trading agents, non-regulated portfolio trading agents choose their position, that is, how many shares of each asset to buy or sell, jointly: whether they buy or sell an asset depends on the other assets and on the results of the portfolio optimization.

Non-regulated portfolio trading agents optimize their position every *τ<sub>p</sub><sup>i</sup>* steps, where *τ<sub>p</sub><sup>i</sup>* is the term for which the current position is kept. This variable is given at random to each agent, following an exponential distribution whose mean and standard deviation are both 150. Thus, the agents take actions only every *τ<sub>p</sub><sup>i</sup>* steps. The actions are as follows.


Markowitz's mean-variance approach (Markowitz 1952) is used in the optimization phase (the third phase above). The expected utility function is defined with a position vector **x**, a vector of reasonable values *P<sub>rsn</sub>*, and the variance-covariance matrix Ω of every asset over the last *τ* steps. *A*<sup>⊤</sup> denotes the transpose of *A*.

$$\mathbb{E}\mathbb{E}\mathbb{U}(\mathbf{x}) = P\_{\text{res}}^{\top}\mathbb{x} - \frac{1}{2}\mathbb{x}^{\top}\Omega\mathbb{x} \tag{6}$$

The constraints are as follows: the agents have a budget, and they are not allowed to sell short, so all components of **x** are nonnegative. Agents find the **x** that satisfies the constraints and maximizes EU(**x**).

Non-regulated portfolio trading agents have capital and leverage. In the simulations, the capital was 6000 per market and the leverage limit was 10, which means that the budget limit is 60,000 per market. For example, if there are five markets, the budget limit is 300,000.
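As a concrete sketch, the mean-variance optimization of Equation (6) under the no-short-selling and budget constraints can be approximated with projected gradient ascent. The solver is an illustrative choice, not the one used in the paper; the reasonable values and covariance numbers are made up, and only the capital and leverage figures follow the text.

```python
# Mean-variance utility of Eq. (6): EU(x) = Prsn' x - 0.5 x' Omega x,
# maximized subject to x >= 0 (no short selling) and a budget cap.
def expected_utility(x, prsn, omega):
    n = len(x)
    lin = sum(prsn[i] * x[i] for i in range(n))
    quad = sum(x[i] * omega[i][j] * x[j]
               for i in range(n) for j in range(n))
    return lin - 0.5 * quad

def optimize_position(prsn, omega, budget, steps=2000, lr=0.5):
    n = len(prsn)
    x = [0.0] * n
    for _ in range(steps):
        # gradient of EU: Prsn - Omega x
        grad = [prsn[i] - sum(omega[i][j] * x[j] for j in range(n))
                for i in range(n)]
        x = [max(0.0, x[i] + lr * grad[i]) for i in range(n)]
        spent = sum(x)          # toy assumption: one unit of cash per share
        if spent > budget:      # rescale back onto the budget
            x = [xi * budget / spent for xi in x]
    return x

prsn = [1.0, 1.2]                      # hypothetical reasonable values
omega = [[0.04, 0.01], [0.01, 0.09]]   # hypothetical var-cov matrix
budget = 6000 * 10                     # capital 6000 x leverage 10 (per text)
x = optimize_position(prsn, omega, budget)
eu = expected_utility(x, prsn, omega)
print([round(xi, 2) for xi in x], round(eu, 2))
```

With these toy numbers, the budget never binds, and the iteration converges to the unconstrained optimum Ω<sup>−1</sup>*P<sub>rsn</sub>*.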

#### *3.5. Agents: Regulated Portfolio Trading Agents*

Regulated portfolio trading agents are almost the same as non-regulated portfolio trading agents. However, they have additional constraints, which model the CAR regulation. The CAR regulation is based on Basel I (Basle Committee on Banking Supervision 1996a) and Basel II (Basel Committee on Banking Supervision 2006).

The CAR regulation in this model uses the Value at Risk (VaR) and regulates agents when the value of assets held by each agent falls.

$$\text{(CAR)} = \frac{\text{(Capital)}}{12.5 \times \text{(Markets Risk)}} \ge 8\% \tag{7}$$

This equation follows the rule modeling Basel I after the introduction of market risk regulation (Basle Committee on Banking Supervision 1996a). Markets risk is calculated as

$$(\text{Markets Risk}) = (\text{Current Total Share Price}) \times (1 - \exp\left(\text{VaR}\right)). \tag{8}$$

The current total share price does not include cash because cash is meant to be a non-risk property. According to this equation, the more leverage agents take on, the more risk they have to take on as well. The "(1 − exp (VaR))" part indicates the possible amount of the decrease in asset value using VaR. Because VaR is calculated on a logarithmic scale, it is included as an exponent in this equation.

VaR is calculated as

$$(\text{VaR}) = -1.0 \times (99\% \text{ One-sided Confidence Interval}) \times \sqrt{T} \times \sqrt{\mathbf{x}^{\top}\Omega\mathbf{x}} \tag{9}$$

This calculation takes the 99% one-sided confidence bound and uses the root-T multiplication method (FFR+ 2008). **x** is the position matrix of all assets. The holding term of an asset, *T*, is set to 10 steps, and Ω is the variance-covariance matrix of all assets for the last 250 steps, as described in Basel I and II (Basle Committee on Banking Supervision 1996b), whose CAR regulation uses historical data for 250 business days and assumes a holding term of 10 business days.
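Equations (7)-(9) can be chained together as follows. This is a hedged sketch: the 2.33 multiplier for the 99% one-sided confidence level, and all position, price, and covariance numbers, are illustrative assumptions rather than values from the paper.

```python
import math

def value_at_risk(x, omega, z=2.33, T=10):
    """Eq. (9): VaR = -z * sqrt(T) * sqrt(x' Omega x), on a log scale."""
    n = len(x)
    var_p = sum(x[i] * omega[i][j] * x[j]
                for i in range(n) for j in range(n))
    return -z * math.sqrt(T) * math.sqrt(var_p)

def markets_risk(total_share_price, var):
    """Eq. (8): possible fall in asset value implied by the (log) VaR."""
    return total_share_price * (1.0 - math.exp(var))

def car(capital, risk):
    """Eq. (7): CAR = capital / (12.5 * markets risk)."""
    return capital / (12.5 * risk)

x = [20.0, 10.0]                        # example position (shares)
omega = [[4e-4, 1e-4], [1e-4, 9e-4]]    # example var-cov of log returns
v = value_at_risk(x, omega)
risk = markets_risk(total_share_price=9000.0, var=v)
c = car(capital=6000.0, risk=risk)
print(c >= 0.08)  # does the portfolio satisfy the regulation?
```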

When the agents' portfolio violates the regulation, they revise their portfolios as follows.

1. Calculate CAR based on the current position **x** (referred to as CAR(**x**)).

2. Calculate *R*.

$$R = \frac{0.08}{\text{CAR}(\mathbf{x})} \tag{10}$$


$$B = \frac{1}{\frac{R-1}{2} + 1}\,\text{Val}(\mathbf{x}) \tag{11}$$


$$\text{CAR}(\mathbf{z}) \ge 0.08. \tag{12}$$

7. If the portfolio still violates Equation (12), go back to step 1. If not, **z** is the final position, which does not violate the CAR regulation.

#### *3.6. Parameters*

All agents have parameters. Some parameters were decided beforehand; others were decided by conducting a parameter search over a number of simulations. The agent parameters decided by parameter search are


These parameters were decided by simulating with many parameter sets. In this study, the candidates for the amount of cash were 4000, 6000, 8000, and 12,000, while the candidates for the weights were exponential distributions with means of 1.0, 3.0, 5.0, and 10.0. Then, the target kurtosis of the market price change was set to about 15 for five markets, 1000 stylized trading agents, and 100 non-regulated portfolio trading agents. Kurtosis, *κ*, is defined with the logarithmic return *r<sub>t</sub>*:

$$\kappa = \frac{\overline{(r\_t - \overline{r\_t})^4}}{\left(\overline{(r\_t - \overline{r\_t})^2}\right)^2} - 3. \tag{13}$$

Kurtosis was used in the parameter search because a large kurtosis is a feature of "fat-tailed" distributions and a stylized fact characterizing real markets (Cont 2001). Here, events that rarely occur in accordance with a normal (Gaussian) distribution occur more often when they follow a fat-tailed distribution. Because *κ* = 0 for the normal (Gaussian) distribution, it is a good index for indicating a fat tail.
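The excess kurtosis of Equation (13) can be computed directly. The following sketch (with made-up samples) illustrates why *κ* ≈ 0 for Gaussian data while a fat-tailed mixture yields a large positive value.

```python
import random

def excess_kurtosis(r):
    """Eq. (13): fourth central moment over squared variance, minus 3."""
    n = len(r)
    m = sum(r) / n
    m2 = sum((x - m) ** 2 for x in r) / n
    m4 = sum((x - m) ** 4 for x in r) / n
    return m4 / m2 ** 2 - 3.0

random.seed(0)
gauss = [random.gauss(0, 1) for _ in range(100_000)]
# Fat tails via a scale mixture: 5% of draws have 5x the volatility
fat = [random.gauss(0, 5 if random.random() < 0.05 else 1)
       for _ in range(100_000)]

kg = excess_kurtosis(gauss)   # near 0
kf = excess_kurtosis(fat)     # large and positive
print(round(kg, 2), round(kf, 2))
```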

According to Tables 1 and 2, plausible kurtosis values appear to be under 20. The test with five markets, 1000 stylized trading agents, and 100 non-regulated portfolio trading agents showed price changes with comparatively higher kurtosis than the other simulations in this study, so we set the target kurtosis to about 15.

**Table 1.** Kurtosis of 5 min interval (Cont 2001).



**Table 2.** Kurtosis of daily price changes in Japanese markets.

In addition, we checked for "absence of autocorrelations" and "volatility clustering", which are two of Cont's stylized facts (Cont 2001).

The parameter search yielded the following results.


#### **4. Experiments**

We designed two different experiments: one for assessing the effect of portfolio trading agents, the other for checking the effect of the CAR regulation.

Before explaining each experiment, we should explain the evaluation index, which counts the steps in which the market price drops abnormally. We define "abnormal" as:

$$DV\_t = \ln \left( \frac{p\_t}{p\_t^{\*}} \right) \le -0.1. \tag{14}$$

In this equation, *p<sub>t</sub>* is the market price and *p*<sup>∗</sup><sub>*t*</sub> is the fundamental price. *NDV* is defined as the number of steps in which Equation (14) is satisfied.
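A minimal sketch of this index, counting steps whose log price deviation falls below the −0.1 threshold of Equation (14); the price series here is made up.

```python
import math

def ndv(prices, fundamentals, threshold=-0.1):
    """NDV: number of steps with DV_t = ln(p_t / p*_t) <= threshold."""
    return sum(1 for p, f in zip(prices, fundamentals)
               if math.log(p / f) <= threshold)

fundamentals = [100.0] * 5
prices = [100.0, 95.0, 88.0, 90.0, 101.0]
# exp(-0.1) * 100 = 90.48..., so only the 88 and 90 steps count
print(ndv(prices, fundamentals))  # → 2
```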

The common settings for the simulations were as follows.


#### *4.1. Experiments for Assessing Effects of Portfolio Trading Agents*

First, using only stylized trading agents and non-regulated portfolio trading agents, as listed below, we examined the effect of portfolio trading agents.


#### *4.2. Experiments for Assessing Effects of the CAR Regulation*

Second, we included regulated portfolio trading agents. Here, the total number of non-regulated and regulated portfolio trading agents was fixed at 100. We experimented by changing the ratio of non-regulated to regulated portfolio trading agents.


#### **5. Results & Discussion**

#### *5.1. Stylized Fact Checks*

Before presenting the results of each experiment mentioned above, we show the results of checking whether stylized facts also appear in our simulation. This process is essential to ensure that our simulations are not unrelated to the actual market. As stylized facts for checking our simulation, we employed the absence of autocorrelations, heavy tails (fat tails), and volatility clustering, as listed in Cont (2001) and Dacorogna et al. (2001). To confirm that our simulation works well, we checked these facts for all parameter sets in this study by inspecting plots. Here, however, we show the results from the simulation with 1000 stylized trading agents, 50 non-regulated portfolio agents, 50 regulated portfolio agents, and 5 markets.

In the following, we check these stylized facts one by one.

First, we checked the absence of autocorrelation by calculating the autocorrelation of logarithmic returns.
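This check can be reproduced with a plain sample autocorrelation function. The i.i.d. Gaussian returns below are a stand-in for simulated returns, and the ±1.96/√n band is the usual large-sample 95% interval; both are illustrative assumptions.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    m = sum(series) / n
    c0 = sum((x - m) ** 2 for x in series) / n
    return [sum((series[t] - m) * (series[t + lag] - m)
                for t in range(n - lag)) / (n * c0)
            for lag in range(1, max_lag + 1)]

random.seed(1)
returns = [random.gauss(0.0, 0.01) for _ in range(5000)]  # i.i.d. stand-in
band = 1.96 / math.sqrt(len(returns))                     # 95% band
inside = sum(1 for a in acf(returns, 20) if abs(a) <= band)
print(inside, "of 20 lags inside the 95% band")
```

For uncorrelated returns, roughly 19 of 20 lags should fall inside the band; simulated returns with genuine autocorrelation would violate it at many lags.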

Figure 3 shows the autocorrelation function (ACF) of logarithmic returns. This result clearly shows the absence of autocorrelation, because the autocorrelation is statistically significant only at very small lags. This agrees with the stylized fact reported in Cont (2001) and Dacorogna et al. (2001).

**Figure 3.** Autocorrelation function (ACF) of logarithmic returns. Error bars show the standard deviation at each lag among 500 markets in 100 simulations. The green area shows the 95% confidence interval corresponding to each lag.

Next, we checked volatility clustering by testing the autocorrelations of 50-step standard deviations of logarithmic returns.

Figure 4 shows the autocorrelation function (ACF) of 50-step standard deviations of logarithmic returns. This clearly shows volatility clustering: the autocorrelation function is statistically significant, meaning that the standard deviation, i.e., the volatility, is autocorrelated. This result also agrees with the stylized fact in Cont (2001) and Dacorogna et al. (2001).

**Figure 4.** Autocorrelation function (ACF) of 50-step standard deviations of logarithmic returns. Error bars show the standard deviation at each lag among 500 markets in 100 simulations. The green area shows the 95% confidence interval corresponding to each lag.

Finally, we confirmed heavy tails (fat tails) in the return distributions. To investigate their existence, we processed the time series of market prices in the simulations: we calculated logarithmic returns from market prices, normalized these returns within each market in each simulation, and built frequency distributions from the normalized returns. Figure 5 shows these frequency distributions; for comparison, it also plots the Gaussian (normal) distribution.

**Figure 5.** Frequency distribution of normalized log returns. The blue plot shows the results from simulations. Error bars show the standard deviation in each bin among 500 markets in 100 simulations. The orange plot shows the Gaussian (normal) distribution.

According to Figure 5, the simulation results have significantly heavy tails (fat tails). This shows that our simulations reproduce another of the stylized facts.

According to the results shown above, we can assume that our simulations reproduce some stylized facts of real markets, which indicates that they are reliable to a certain degree.

As future work, we should further verify the reliability of our simulations. Although it is almost impossible to prove that simulations are entirely reliable, we should keep trying to establish their reliability by methods such as checking stylized facts.

*5.2. Effects of Portfolio Trading Agents*

Figure 6 shows the results of the experiments assessing the effect of portfolio trading agents.

**Figure 6.** *NDV* of experiments assessing the effect of portfolio trading agents. The horizontal axis is the number of non-regulated portfolio trading agents, and the vertical axis in the number of steps in which the market price is below exp (−0.1) times the fundamental price. (Details are explained with Equation (14)). Each series of plots shows results for the corresponding number of markets in the simulation. Error bars show standard deviations.

According to Figure 6, the more non-regulated portfolio trading agents there are in the markets, the fewer steps there are in which the market price goes below exp (−0.1) times the fundamental price. This means that non-regulated portfolio trading agents stabilize markets when there are multiple markets (assets). Naturally, this tendency does not appear when there is only one market (asset).

Ordinarily, portfolio optimization is thought to stabilize only individual trading returns. However, these results show that portfolio optimization can also stabilize whole markets.

In our opinion, because portfolio optimization can find market equilibria points, it can improve the efficiency of markets. In short, portfolio optimization might help to find the correct value for each asset, and it has a stabilizing effect on markets.

In addition, Figure 6 shows that the more assets there are in portfolios, the more stable the markets become, although in the case of 5–10 markets the stabilizing effect saturates. On the other hand, when there is only one market (asset) and agents cannot perform portfolio optimization, the stabilizing effect does not appear.

These results also suggest that the number of portfolio trading agents is in an appropriate range for examining these effects, because the figure clearly shows changes in *NDV*.

#### *5.3. Effects of CAR Regulation*

Figure 7 shows the results of the experiments assessing the effect of CAR regulation.

**Figure 7.** *NDV* graph assessing the effect of capital adequacy ratio (CAR) regulation. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is the number of steps in which the market price is below exp (−0.1) times the fundamental price. (Details are explained with Equation (14)). Each series of plots shows a simulation series for a certain number of markets. Error bars show standard deviations.

The far left of Figure 7 shows *NDV* when the percentage of regulated portfolio trading agents is 0%. This condition is the same as the case of 100 portfolio trading agents in Figure 6. Strictly speaking, the results differ slightly from those in Figure 6 because these experiments were run as a separate, consecutive series, but they are almost the same.

According to this graph, regulated portfolio trading agents make *NDV* larger, which means CAR regulation causes more and more price shocks.

Moreover, when the percentage of portfolio trading agents regulated by Basel is 100%, the difference in *NDV* between one market and ten markets is smaller than when the percentage is 0%. This suggests that CAR regulation can suppress the stabilizing effect of portfolio trading agents.

These results show that CAR regulation may destabilize markets even if there are multiple markets (assets) and portfolio trading agents have great stabilizing effects on markets.

Let us consider why this happens. CAR regulation would be invoked when market prices fall and the risks of holding assets increase; this means that agents would have to sell their assets. In turn, the regulation would affect more and more other agents: the regulation causes asset selling, and that selling triggers other agents' regulation. Thus, under these conditions, selling induces more selling and leads to significant price shocks.

To confirm these results, we also checked for more significant shocks by modifying the threshold in Equation (14) as follows:

$$DV\_t = \ln \left( \frac{p\_t}{p\_t^{\*}} \right) \le -0.5. \tag{15}$$

Then, we made a plot of *NDV* in the same manner as in Figure 7. The result is shown in Figure 8.

**Figure 8.** *NDV* graph assessing the effect of CAR regulation. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is the number of steps in which the market price is below exp (−**0**.**5**) times the fundamental price. (Details are explained with Equation (15)). Each series of plots shows a simulation series for a certain number of markets. Error bars show standard deviations. Please note that the vertical axis is logarithmic.

Figure 8 shows the same tendency as Figure 7. Even with a threshold of exp (−0.5) ≈ 0.607 times the fundamental price, there are some steps in which the market price falls below the threshold; drops to about 60% of the fundamental price are quite significant price shocks.

Moreover, the tendency that more regulated portfolio trading agents cause more price shocks is stronger when there are fewer markets. When there are ten markets and no regulated portfolio trading agents, the significant price shocks defined in Equation (15) happened on average in only about 1 of 60,000 steps. However, when all the portfolio trading agents are regulated, shocks happened in more than 40 steps, even with ten markets.

Now, let us look for other effects of CAR regulation by examining Figures 9 and 10.

**Figure 9.** Kurtosis of price changes in the experiments for assessing the effect of CAR regulation. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is kurtosis. Each series of plots shows results for a certain number of markets in the simulation. Error bars show standard deviations.

**Figure 10.** Mean *DV* in experiments for assessing the effect of CAR regulation. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is the mean of *DVt*, which is defined in Equation (14). Each series of plots shows results for a certain number of markets in the simulation. Error bars show standard deviations.

Figure 9 shows the kurtosis of price changes. The definition of kurtosis is the same as in Equation (13).

According to this figure, having more regulated portfolio trading agents decreases kurtosis. This suggests that CAR regulation eliminates the fat-tail tendency from markets.

The results depicted in the *NDV* and kurtosis graphs suggest the following hypothesis: CAR regulation depresses prices and decreases the chance of market prices rising. We have already confirmed that CAR regulation causes more price shocks while eliminating the fat-tail tendency; we therefore suspect that an overall decline in prices led to the lower kurtosis even though price shocks happened more frequently. Moreover, we suspect that CAR regulation decreases the chance of market prices rising. The reason is that a rising market price implies a greater chance of a subsequent decrease, and, under the CAR regulatory framework, agents sell assets during sudden price rises because the chance of a price shock also increases.

Thus, CAR regulation may depress whole markets.

Figure 10 shows the changes in the mean of *DV<sub>t</sub>*. *DV<sub>t</sub>*, defined in Equation (14), is the logarithmic ratio of the market price to the fundamental price.

Figure 10 shows that a greater number of regulated portfolio trading agents depresses the mean of *DV<sub>t</sub>*, which clearly indicates that CAR regulation depresses whole markets.

In addition, we also checked the qualitative price fluctuations. Figures 11 and 12 show the average numbers of steps in which market prices went up or down.

**Figure 11.** Steps in which the market prices went up. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is the mean number of steps in which the market prices went up. Each series of plots shows results for a certain number of markets in the simulation. Error bars show standard deviations.

**Figure 12.** Steps in which the market prices went down. The horizontal axis is the percentage of regulated portfolio trading agents among all non-regulated and regulated portfolio trading agents. The vertical axis is the mean number of steps in which the market prices went down. Each series of plots shows results for a certain number of markets in the simulation. Error bars show standard deviations.

According to the results, the more regulated portfolio trading agents there are in the markets, the more the market prices fluctuate. In both Figures 11 and 12, increasing the percentage of regulated portfolio trading agents increases the number of steps in which the market prices went up or down. Accordingly, there are fewer steps in which the market prices stayed the same when there are more regulated agents; that is, markets fluctuate more under CAR regulation. Moreover, this tendency is unlikely to come from the price shocks confirmed above, because if it did, only the steps in which market prices went down would increase.
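The tallies behind Figures 11 and 12 amount to a simple three-way count per step, sketched below with a made-up price path.

```python
def tally_moves(prices):
    """Count steps where the price went up, down, or stayed flat."""
    up = down = flat = 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            up += 1
        elif cur < prev:
            down += 1
        else:
            flat += 1
    return up, down, flat

print(tally_moves([100, 101, 101, 99, 100, 100]))  # → (2, 1, 2)
```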

#### *5.4. Summary of Results and its Discussion*

The results obtained by the simulations and their interpretation are summarized in Table 3, where the descriptions are simplified for ease of understanding. "Less Regulated" refers to cases with fewer regulated portfolio trading agents (e.g., 0%), while "More Regulated" refers to cases with more regulated portfolio trading agents (e.g., 100%). The other descriptions are likewise simplified and relative.

As mentioned before, price shocks happened more often with more regulated agents, which presumably enhances the fat tail on the lower side.

Moreover, kurtosis was lower when there were more regulated agents, meaning that the fat tail of the price change (return) distribution was diminished.

Putting these facts together, we can assume that the regulation also diminished the fat tail on the upper side of the price change (return) distribution; that is, CAR regulation can depress the chance of prices rising.

Since we also confirmed a decrease in the mean of market prices, the depressing effects of CAR regulation appear to be manifold.

Moreover, we confirmed additional fluctuation due to the regulation through the change in the frequency of price movements (up and down).


**Table 3.** Summary of simulated facts and interpretations about the effect of CAR regulation.

In summary, we suppose that four dynamics due to CAR regulation are at play:


#### *5.5. Future Work*

In this subsection, we discuss future work.

First, the negative effect of CAR regulation shown in this paper should be confirmed empirically. Simulation is helpful for testing hypothetical situations and for understanding what happened, but combining simulation with empirical research on CAR regulation would make our argument more solid.

Moreover, we should examine other ways of calculating risk. Here, we employed VaR (Value at Risk), but there are other risk measures, such as the expected shortfall.

In terms of CAR regulation, we should consider how to balance its positive and negative aspects. In this paper, we showed and emphasized only the negative aspect. However, CAR regulation and the Basel regulatory framework were originally designed to prevent chain bankruptcies in bank networks, so their impact on financial markets is secondary. Thus, we should not focus only on the negative effect on financial markets but should also consider the system as a whole, including bank networks.

#### **6. Conclusions**

In this paper, we examined the effect of CAR regulation in agent-based simulations. Using three agent types in artificial markets (agents who buy or sell at reasonable prices, agents whose strategy is portfolio optimization, and agents whose strategy is portfolio optimization under the regulation), we simulated two scenarios. The results lead us to the following conclusions:


Thus, we conclude that CAR regulation has broadly negative effects on markets. Although it might be adequate for preventing systemic risk and chain bankruptcies, CAR regulation can have negative effects, at least on asset markets. Moreover, if significant price shocks occur because of CAR regulation, those shocks could themselves cause bankruptcies. Thus, we should balance the positive and negative aspects of CAR regulation.

In this study, we contribute to assessing the impact of CAR regulation via simulations. Our simulation study makes it possible to assess situations that have never yet occurred, whereas empirical studies can usually assess only situations that have been realized.

Regarding future work, our finding that CAR regulation can suppress price increases is new and should be verified empirically. Moreover, we should examine other ways of calculating risk, e.g., by using the expected shortfall of the latest CAR regulation instead of the VaR used in this study.

**Author Contributions:** Conceptualization, K.I. and M.H.; methodology, M.H.; software, M.H.; validation, M.H.; formal analysis, M.H.; investigation, M.H.; resources, M.H.; data curation, M.H.; writing—original draft preparation, M.H.; writing—review and editing, M.H., K.I., T.S., and H.M.; visualization, M.H.; supervision, K.I.; project administration, K.I. and H.S.; funding acquisition, K.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by KAKENHI (no. 15H02745) and MEXT via Exploratory Challenges on Post-K computer (study on multilayered multiscale space-time simulations for social and economic phenomena).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Deep Reinforcement Learning in Agent Based Financial Market Simulation**

**Iwao Maeda <sup>1,\*</sup>, David deGraw <sup>2</sup>, Michiharu Kitano <sup>3</sup>, Hiroyasu Matsushima <sup>1</sup>, Hiroki Sakaji <sup>1</sup>, Kiyoshi Izumi <sup>1</sup> and Atsuo Kato <sup>3</sup>**


Received: 28 February 2020; Accepted: 8 April 2020; Published: 11 April 2020

**Abstract:** Prediction of financial market data with deep learning models has achieved some recent success. However, historical financial data suffer from an unknowable state space and limited observations, and the inability to model the impact of one's own actions on the market can be prohibitive when trying to find investment strategies using deep reinforcement learning. One way to overcome these limitations is to augment real market data with agent-based artificial market simulation. Artificial market simulations designed to reproduce realistic market features may be used to create unobserved market states, to model the impact of one's own investment actions on the market itself, and to train models with as much data as necessary. In this study, we propose a framework for training deep reinforcement learning models in agent-based artificial price-order-book simulations that yields non-trivial policies under diverse conditions with market impact. Our simulations confirm that the proposed deep reinforcement learning model with a unique task-specific reward function was able to learn a robust investment strategy with an attractive risk-return profile.

**Keywords:** deep reinforcement learning; financial market simulation; agent based simulation

#### **1. Introduction**

In recent years, applications of deep learning to predicting financial market data have achieved some level of success (Chong et al. 2017; Long et al. 2019; Lahmiri and Bekiros 2019, 2020). However, issues such as heteroskedasticity (Nelson 1991), low signal-to-noise ratios (Aaker and Jacobson 1994), and the large observer effect seen in market impact make their use in real-world applications challenging. Furthermore, the predictive ability of deep learning models is highly dependent on the data used in training, and for samples with "unknown" features (also known as out-of-distribution samples (Lee et al. 2018)), these models can make surprisingly nonsensical predictions, which is a critical issue when applying deep learning models to real-world tasks (Sensoy et al. 2018).

In financial areas, there are models for providing automated financial advice or investment management, called robo-advisors (Leong and Sung 2018; Sironi 2016). Robo-advisors predict future market states and determine actions according to their predictions; therefore, poor predictions could lead to unacceptable levels of risk and large losses of capital. As a common financial disclaimer states, "past performance is not a guarantee of future returns": the fundamental difficulty is that past data are often a poor predictor of the future (Silver 2012), and the complex dynamical properties of financial markets (Arthur 1999) make it difficult to incorporate prior knowledge about the target distribution. Financial practitioners have traditionally been limited to training models with past data and do not have many options for improving their predictive models (Bailey et al. 2014). Moreover, such "back-testing" of models cannot account for transaction costs or market impact, both of which can be of comparable magnitude to the forecast returns (Hill and Faff 2010).

In order to overcome such issues, we argue that artificial market simulation provides a promising avenue for improving the predictive ability of deep reinforcement learning models. Agent-based artificial market simulation is an established and widely studied methodology (Raberto et al. 2001; Streltchenko et al. 2005) that has been shown to be of practical significance as alternatives to real markets (Brewer et al. 2013; Muranaga et al. 1999; Raberto et al. 2001). One of the main advantages of simulated markets is that they can be adapted to create realistic scenarios and regimes that have never been realized in the past (Lux and Marchesi 1999; Silva et al. 2016). Multi-agent simulation is a system of environment and agents where the actions of agents are informed by the environment and the state of the environment evolves, in turn, through the actions of the agents. Typically, in financial simulations, the market is designated as the environment where assets are transacted by an ensemble of agents modeled after investors and whose behavior is governed by reductive formulas.

Efforts to learn deep reinforcement learning (DRL) strategies on environmental simulators have been successful in various domains (Gu et al. 2017; Gupta et al. 2017; Li et al. 2016) and have been shown to attain or exceed human-level performance on highly complex reward-maximizing gameplay tasks (Mnih et al. 2013, 2015) such as Go and StarCraft II (Vinyals et al. 2019). Deep reinforcement learning agents base their actions on predictions about the environment and train their networks to maximize the cumulative rewards obtained through their individual actions. As mentioned previously, models trained in simulation with DRL are understood to derive their superior predictive ability from having trained across more scenarios than any one human could experience in a lifetime.

Another advantage of agent simulations is that DRL models can be trained in an environment with realistic transaction costs and market impact (Donier et al. 2015). Market impact is the influence of your own investment actions on market states and is understood to adversely affect investment returns. Market impact is something that cannot be reproduced from historical data alone since the relationships between actions and effects cannot be replicated in such back-testing.

Previous simulation and deep reinforcement learning research in finance (Raman and Leidner 2019; Ritter 2018; Spooner et al. 2018) has been limited by overly simplistic agents with limited action spaces. In contrast, our proposed agent-based framework for training DRL agents yields sophisticated trading strategies that achieve robust risk-return profiles under diverse conditions with market impact and transaction costs.

The main contributions of our current work are as follows:


#### **2. Related Work**

#### *2.1. Stock Trading Strategies*

Stock trading strategies have been studied for a long time (LeBaron 2002). Traditionally, dynamic model-based approaches apply simple rules and formulas to describe trader behavior, such as the sniping strategy (Rust et al. 1992), the zero-intelligence strategy (Ladley 2012), and the risk-based bidding strategy (Vytelingum et al. 2004). In recent years, reinforcement learning (RL) methods (Dayan and Balleine 2002; Sutton et al. 1998), especially deep reinforcement learning (DRL) methods (Mnih et al. 2013, 2015), have been applied to learning investment strategies (Meng and Khushi 2019; Nevmyvaka et al. 2006), such as DRL for financial portfolio management (Jiang et al. 2017), market making via reinforcement learning (Spooner et al. 2018), and DRL for price trailing (Zarkias et al. 2019).

#### *2.2. Deep Reinforcement Learning*

DRL (as well as RL) is roughly classified into two types: value-based and policy-based approaches. Value-based deep reinforcement learning methods (Littman 2001) approximate value functions called Q-functions using deep neural networks, and actions with the maximum Q-value are selected. Rainbow (Hessel et al. 2018) showed that combining major DRL methodologies, such as double DQN (Van Hasselt et al. 2016) and the dueling network (Wang et al. 2015), drastically improves the performance of DQN. The general reinforcement learning architecture (Gorila) (Nair et al. 2015) provided a parallel training procedure for fast training. Ape-X (Horgan et al. 2018) proposed an algorithm for distributed processing of prioritized experience replay (Schaul et al. 2015). On the other hand, policy-based deep reinforcement learning methods (Sutton et al. 2000) approximate the optimal policy directly using deep neural networks. In particular, actor-critic methods, which train actor and critic functions simultaneously, have been intensely studied, such as asynchronous advantage actor-critic (A3C) (Mnih et al. 2016), deep deterministic policy gradient (DDPG) (Silver et al. 2014), and trust region policy optimization (TRPO) (Schulman et al. 2015).

DRL has been applied to various tasks such as table games (Silver et al. 2016, 2017), video games (Lample and Chaplot 2017; Mnih et al. 2013), autonomous driving (Pan et al. 2017; Sallab et al. 2017), and robotic manipulation (Gu et al. 2017; Kalashnikov et al. 2018).

#### *2.3. Financial Market Simulation*

Financial market simulation has been used for investigating market microstructure (Muranaga et al. 1999) and financial market regulations (Mizuta 2016). In particular, multi-agent financial market simulation (LeBaron et al. 2001; Lux and Marchesi 1999; Samanidou et al. 2007) is commonly used. Early agent-based simulations, such as the rebalancers and portfolio insurers model (Kim and Markowitz 1989) and the econophysics approach (Levy et al. 1994), were not able to reproduce the realistic time-series characteristics called stylized facts (Harvey and Jaeger 1993; Levine and Demirgüç-Kunt 1999). In subsequent studies, however, stylized facts have been observed with approaches such as the application of percolation theory (Solomon et al. 2000; Stauffer 2001) and the fundamentalist and chartist model (Lux and Marchesi 1999). In this study, virtual markets are created using such simulation theory.

#### **3. Simulation Framework Overview**

An overview of our proposed framework is shown in Figure 1. The simulator consists of markets and agents, where the markets play the role of the environment whose state evolves through the actions of agents. Each agent, in turn, decides its action according to observations of the markets' states. We incorporate a degree of randomness in the agents' selection of actions, and the objective of the agents is to maximize their capital amount *cap*, calculated with the following equation:

$$cap = cash + \sum_{i} p_{\text{Mid},i} \, pos_i, \tag{1}$$

where *cash*, $p_{\text{Mid},i}$, and $pos_i$ are the amount of cash, the mid price of instrument *i*, and the quantity of instrument *i*, respectively. In each simulation step, we sample from a distribution of agents until we find one that submits an order. Subsequently, the market orderbook is updated with the submitted order. The parameters of the DRL agent model are trained at fixed time intervals of the simulation.
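As a concrete illustration, Equation (1) can be sketched in a few lines of Python (the function name and data layout are our own hypothetical choices; the paper does not specify an implementation):

```python
def capital(cash, mid_prices, positions):
    """Capital amount per Equation (1): cash plus the mark-to-mid value
    of the positions held in every instrument."""
    return cash + sum(mid_prices[i] * positions[i] for i in positions)

# Example: 10,000 in cash plus 5 units of an instrument whose mid price is 300
print(capital(10_000.0, {"A": 300.0}, {"A": 5}))  # -> 11500.0
```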

**Figure 1.** Overview of the simulation framework proposed in this research. The simulator consists of markets and agents, where the markets play the role of the environment whose state evolves through the actions of the agents. There are two types of agents: the deep reinforcement learning (DRL) agent and fundamental-chart-noise (FCN) agents. The objective of the agents is to maximize their capital amount. In each step of the simulation, an agent is sampled to submit an order, the agent submits the order, and the markets process orders and update their orderbooks. The DRL model in the DRL agent is trained at fixed time intervals.

#### **4. Simulator Description**

#### *4.1. Markets*

Our simulated orderbook market consists of an instrument, prices, quantities, and side (buy or sell). An order must specify these four properties as well as an order type from the three below:

- LMT (limit order)
- MKT (market order)
- CXL (cancel order)
The exceptions are that MKT orders do not need to specify a price, and CXL orders do not need to specify quantity since we do not allow volume to be amended down in our simulation (i.e., CXL orders can only remove existing LMT or MKT orders).

Market pricing follows a continuous double auction (Friedman and Rust 1993). A transaction occurs if there is a prevailing order on the other side of the orderbook at a price equal to or better than that of the submitted order. If not, the order is added to the market orderbook. When multiple orders meet the condition, execution priority is given by price first, followed by time. CXL orders remove the corresponding limit order from the orderbook and fail if the target order has already been executed.

The transaction volume *v* is determined by:

$$v = \min(v_{\text{buy}}, v_{\text{sell}}), \tag{2}$$

where $v_{\text{buy}}$ and $v_{\text{sell}}$ are the submitted volumes of the buy and sell orders.

After *v* is determined, stock and cash are exchanged between the buyer and seller according to the price and transaction volume. Executed buy and sell orders are removed from the orderbook if $v = v_{\text{buy}}$ or $v = v_{\text{sell}}$, respectively, and their volumes are reduced to $v_{\text{buy}} - v$ or $v_{\text{sell}} - v$ otherwise.
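The volume determination of Equation (2) and the residual-order update above can be sketched as follows (a hypothetical helper; the simulator's actual code is not given in the paper):

```python
def match(v_buy, v_sell):
    """Return the transaction volume per Equation (2) together with the
    residual volumes left after the trade; a residual of 0 means the
    corresponding order is removed from the orderbook."""
    v = min(v_buy, v_sell)
    return v, v_buy - v, v_sell - v

v, rest_buy, rest_sell = match(3, 5)
print(v, rest_buy, rest_sell)  # -> 3 0 2
```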

Additionally, we define a fundamental price *pF* for each market. The fundamental price represents the fair price of the asset/market, is observable only by the FCN agents (not the DRL agent), and is used to predict future prices. The fundamental price evolves according to a geometric Brownian motion (GBM) process (Eberlein et al. 1995).
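A GBM fundamental-price path can be simulated as below. This is a minimal sketch: the drift `mu` and step size `dt` are illustrative assumptions, since the paper names only the initial price and volatility as market parameters.

```python
import math
import random

def gbm_path(p0, sigma, n_steps, mu=0.0, dt=1.0, seed=42):
    """Simulate a fundamental price following geometric Brownian motion:
    each step multiplies the price by exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z),
    where Z is a standard normal draw."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                            + sigma * math.sqrt(dt) * z))
    return prices

path = gbm_path(p0=1000.0, sigma=0.0001, n_steps=5)
print(len(path))  # -> 6
```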

Markets require the following hyperparameters:

- Tick size
- Initial fundamental price
- Fundamental volatility
Tick size is the minimum price increment of the orderbook. The initial fundamental price and fundamental volatility are parameters of the GBM used to determine the fundamental price.

#### *4.2. Agents*

Agents registered to the simulator are classified into the following two types:

- The DRL agent
- FCN agents
The DRL agent is the agent we seek to train, while the FCN agents comprise the environment of agents in the artificial market. Details of the DRL agent are described in Section 5.1.

The FCN agent (Chiarella et al. 2002) is a commonly used financial agent that predicts the log return *r* of an asset with a weighted average of fundamental, chart, and noise terms:

$$r = \frac{1}{w_F + w_C + w_N} (w_F F + w_C C + w_N N). \tag{3}$$

Each term is calculated by the equations below. The fundamental term *F* represents the difference between the price the agent considers reasonable and the current market price, the chart term *C* represents the recent price change, and the noise term *N* is sampled from a normal distribution.

$$F = \frac{1}{\tau} \log\left(\frac{p_t^*}{p_t}\right) \tag{4}$$

$$C = \frac{1}{\tau} \log\left(\frac{p_t}{p_{t-\tau}}\right) \tag{5}$$

$$N \sim \mathcal{N}(\mu, \sigma^2). \tag{6}$$

Here, $p_t$ and $p_t^*$ are the current market price and the fundamental price, respectively, and $\tau$ is the time window size. $p_t^*$ changes according to a geometric Brownian motion (GBM). The weight values $w_F$, $w_C$, and $w_N$ are independently sampled from exponential distributions for each agent, whose scale parameters $\sigma_F$, $\sigma_C$, and $\sigma_N$ are specified for each simulation. The parameters of the normal distribution, $\mu$ and $\sigma$, are fixed at 0 and 0.0001.

The FCN agents predict the future market price $p_{t+\tau}$ from the predicted log return with the following equation:

$$p_{t+\tau} = p_t \exp(r \tau). \tag{7}$$

The agent submits a buy limit order with price $p_{t+\tau}(1 - k)$ if $p_{t+\tau} > p_t$, and a sell limit order with price $p_{t+\tau}(1 + k)$ if $p_{t+\tau} < p_t$. The parameter *k* is called the order margin and represents the amount of profit that the agent expects from the transaction. The submitted volume *v* is sampled from a discrete uniform distribution *U*{1, 5}. To control the number of outstanding orders in the market orderbook, each order submitted by an FCN agent has a time window size, after which the order is automatically canceled.
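Putting Equations (3)-(7) and the order rule together, one FCN decision can be sketched as follows (function and parameter names are our own hypothetical choices; volume sampling and the order margin follow the description above):

```python
import math
import random

def fcn_order(p_t, p_fund, p_past, w_f, w_c, w_n, tau, k, rng):
    """One FCN agent decision: predict the log return (Eq. 3) from the
    fundamental (Eq. 4), chart (Eq. 5), and noise (Eq. 6) terms, project
    the future price (Eq. 7), then quote a limit order with margin k."""
    F = math.log(p_fund / p_t) / tau           # fundamental term
    C = math.log(p_t / p_past) / tau           # chart term
    N = rng.gauss(0.0, 0.0001)                 # noise term, mu=0, sigma=0.0001
    r = (w_f * F + w_c * C + w_n * N) / (w_f + w_c + w_n)
    p_pred = p_t * math.exp(r * tau)           # predicted price p_{t+tau}
    volume = rng.randint(1, 5)                 # discrete uniform U{1, 5}
    if p_pred > p_t:                           # expects a rise: buy
        return ("BUY", p_pred * (1 - k), volume)
    if p_pred < p_t:                           # expects a fall: sell
        return ("SELL", p_pred * (1 + k), volume)
    return None                                # no mispricing, no order

# Fundamental price above the market price and no chart/noise weight -> buy
print(fcn_order(100.0, 110.0, 100.0, w_f=1.0, w_c=0.0, w_n=0.0,
                tau=100, k=0.01, rng=random.Random(0))[0])  # -> BUY
```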

#### *4.3. Simulation Progress*

Simulation proceeds by repeating order submission by the agents, order processing by the markets, and an update of the fundamental prices at the end of each step. The first 1000 steps are pre-market-open steps used to build the market orderbook; order processing is not performed during these steps.

Each step consists of order actions by the FCN and DRL agents. An FCN agent is randomly selected and given a chance to submit an order. The FCN agent submits an order according to the strategy described in Section 4.2 with probability 0.5 and does nothing otherwise. If the FCN agent submits an order, the order is added to the orderbook of the corresponding market. Similarly, an actionable DRL agent is selected, and the DRL agent submits an order according to the prediction made from observing the market state. The DRL agent may act again after an interval sampled from a normal distribution $\mathcal{N}(100, 10^2)$.

Once an agent submits an order, the market processes the order according to the procedure described in Section 4.1. After processing the orders, each market deletes orders that have been posted longer than the time window size of the relevant agent.

In the final step, each market updates its fundamental price according to geometric Brownian motion.

Additionally, training of the DRL agent is performed at a fixed interval. The DRL agent collects recent predictions and rewards from the simulation path and updates its network parameters via policy gradients.

#### **5. Model Description**

#### *5.1. Deep Reinforcement Learning Model*

DRL uses deep learning neural networks with reinforcement learning algorithms to learn optimal policies for maximizing an objective reward. In this study, deep reinforcement learning models are trained to maximize the financial reward, or returns, of an investment policy derived from observations of, and agent actions in, a simulated market environment.

Various types of deep reinforcement learning methods are used in financial applications depending on the task (Deng et al. 2016; Jiang and Liang 2017; Jiang et al. 2017). In this study, we used an advantage actor-critic (A2C) network (Mnih et al. 2016). An A2C network is a version of the actor-critic network (Konda and Tsitsiklis 2000) and has two prediction paths, one for the actor and one for the critic network. The actor network approximates an optimal policy $\pi(a_t|s_t;\theta)$ (Sutton et al. 2000), while the critic network approximates the state value $V(s_t;\theta_v)$. The gradient with respect to the actor parameters $\theta$ takes the form $\nabla_\theta \log \pi(a_t|s_t;\theta)\,(R_t - V(s_t;\theta_v)) + \beta \nabla_\theta H(\pi(a_t|s_t;\theta))$, where $R_t$ is the reward, $H$ is the entropy, and $\beta$ is the entropy coefficient. The gradient with respect to the critic parameters $\theta_v$ takes the form $\nabla_{\theta_v} (R_t - V(s_t;\theta_v))^2$.
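The two gradients correspond to minimizing the per-step losses below (a sketch with scalar inputs; in practice `log_prob`, `entropy`, and `value` come from the network, and the advantage is treated as a constant for the actor update):

```python
import math

def a2c_losses(log_prob, entropy, reward, value, beta=0.01):
    """Per-step A2C loss terms. Minimizing actor_loss follows the policy
    gradient with an entropy bonus weighted by beta; minimizing
    critic_loss is the squared advantage (reward minus state value)."""
    advantage = reward - value
    actor_loss = -(log_prob * advantage + beta * entropy)
    critic_loss = advantage ** 2
    return actor_loss, critic_loss

a, c = a2c_losses(log_prob=math.log(0.5), entropy=0.6, reward=1.0,
                  value=0.4, beta=0.0)
print(round(a, 4), round(c, 4))  # -> 0.4159 0.36
```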

#### *5.2. Feature Engineering*

The price series comprises 20 contiguous time steps, taken at 50-time-step intervals, of four market prices: the last (or current) trade price, the best ask (lowest price in the sell orderbook), the best bid (highest price in the buy orderbook), and the mid price (average of the best ask and best bid). Each price series is normalized by the price values of its first row.

The orderbook features are arranged in a matrix built from the latest orderbook, summarizing the order volumes at prices above and below the mid price. The order volumes of all agents and of the predicting agent are aggregated at each price level. To distinguish buy and sell orders, buy order volumes are recorded as negative values. The shape of each orderbook feature is 20 × 2.

The agent features consist of the cash amount and the stock inventory in each market. The shape of the agent feature is 1 + *n*Market, where *n*Market is the number of markets.

#### *5.3. Actor Network*

The actor network outputs action probabilities. Each action has the following five parameters:

- Side (Stay, Buy, Sell)
- Market
- Order type (LMT, MKT, CXL)
- Price
- Volume
Side indicates the intent of the order. When Side is "Stay", the other four parameters are ignored and no action is taken. Market indicates the target market. Price is the difference of the submitted price from the best price (i.e., the depth in the orderbook), and the submitted price *p* is calculated by the following equation:

$$p = \begin{cases} p_{\text{BestAsk}} - \text{Price} & (\text{Side} = \text{Buy}) \\ p_{\text{BestBid}} + \text{Price} & (\text{Side} = \text{Sell}). \end{cases} \tag{8}$$

Both Price and Volume are used when the order type is LMT, and only Volume is used when the type is MKT. When the type is CXL, the agent chooses whether to cancel its order with the highest or the lowest price. Finally, the total number of actions *n*All is calculated as:

$$n_{\text{All}} = 2 n_{\text{Market}} (n_{\text{Price}} n_{\text{Volume}} + n_{\text{Volume}} + 2) + 1, \tag{9}$$

where *n*Market, *n*Price, and *n*Volume indicate the number of markets, price categories, and volume categories, respectively.
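Equation (9) is easy to check numerically. As an illustration only (the paper's exact category counts are given in its parameter tables), one combination consistent with the 33 actions shown later in Figure 6 is a single market with 6 price categories and 2 volume categories:

```python
def n_all(n_market, n_price, n_volume):
    """Action count per Equation (9): for each market and each side,
    LMT orders (n_price * n_volume), MKT orders (n_volume), and two CXL
    variants (highest or lowest price), plus the single "Stay" action."""
    return 2 * n_market * (n_price * n_volume + n_volume + 2) + 1

print(n_all(n_market=1, n_price=6, n_volume=2))  # -> 33
```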

Agents select actions by roulette-wheel selection over the output probabilities. Additionally, DRL agents perform random action selection with a small probability according to the epsilon-greedy strategy (Sutton and Barto 2018).

#### *5.4. Reward Calculation*

A typical objective for financial trading agents is to maximize their capital amount, calculated by the following equation:

$$cap = cash + \sum_{i} p_{\text{Mid},i} \, pos_i. \tag{10}$$

The reward should then be some function proportional to the change in capital amount. In this study, two reward functions are used: the capital-only reward *R*CO and the liquidation-value reward *R*LV. *R*CO is calculated as the difference between the capital after the investment and the baseline capital, where the baseline capital is calculated from the amount of cash and inventory without the investment:

$$R_{\text{CO}} = cap_{t+\tau} - cap_t \tag{11}$$

$$cap_{t+\tau} = cash_{t+\tau} + \sum_{i} p_{\text{Mid},i,t+\tau} \, pos_{i,t+\tau} \tag{12}$$

$$cap_t = cash_t + \sum_{i} p_{\text{Mid},i,t} \, pos_{i,t}, \tag{13}$$

where *t* and *τ* are the time of action and the time constant, respectively.

On the other hand, *R*LV is calculated as the difference between the liquidation value after the investment and the baseline liquidation value. The liquidation value is defined as the cash amount that would remain if the entire inventory were instantly liquidated. If the inventory cannot be liquidated within the constraints of the current market orderbook, the remaining inventory is liquidated at a penalty price (0 for long inventory and 2*p*Market for short inventory).

$$R_{\text{LV}} = LV_{t+\tau} - LV_t. \tag{14}$$

The liquidation value *LV* strongly correlates with *cap* but carries a penalty proportional to the magnitude of the inventory. The capital-only reward includes no penalty on the inventory value, and its use may cause the training of risky strategies (Fama and French 1993). We anticipate that the use of *R*LV may be effective in training DRL strategies that balance inventory risk with trading reward.
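The liquidation value of a long position can be sketched by walking the buy side of the book, best bid first, and applying the penalty price of 0 to any unfilled remainder (a hypothetical helper; the short-side penalty of 2*p*Market would be handled symmetrically):

```python
def liquidation_value(cash, position, bids):
    """Liquidation value of a long position: cash plus the proceeds of
    selling the inventory into the resting bids; any remainder the book
    cannot absorb is liquidated at the penalty price of 0."""
    remaining = position
    for price, volume in bids:          # bids sorted from best (highest) down
        filled = min(remaining, volume)
        cash += filled * price
        remaining -= filled
        if remaining == 0:
            break
    return cash                         # leftover inventory adds nothing

# 7 units against 5 @ 99 and 1 @ 98; the last unit gets the penalty price 0
print(liquidation_value(100.0, 7, [(99.0, 5), (98.0, 1)]))  # -> 693.0
```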

In training, calculated rewards are normalized for each training batch with the following equation.

$$R_i' = \frac{R_i}{\sqrt{E[R^2]}}. \tag{15}$$

In addition, a penalty value *Rp* is added to the normalized reward when the action selected by the DRL agent was infeasible at the action phase (e.g., inappropriate cancel orders). *Rp* was set to −0.5.

#### *5.5. Network Architecture*

An overview of the A2C network used in this research is shown in Figure 2. As the number of markets is 1 in this research, the length of the agent features is 2. The network consists of three sub-networks: a market feature network, an actor network, and a critic network. The market feature network uses a long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) layer to extract features from the market price series, following previous studies (Bao et al. 2017; Fischer and Krauss 2018), as well as convolutional neural network (CNN) layers to extract orderbook features that have positional information (Tashiro et al. 2019; Tsantekidis et al. 2017). The agent features are extracted by dense layers, and the actor network outputs action probabilities while the critic network outputs predicted state values. ReLU activation is applied to the convolutional and dense layers in the network, except for the last layers of the actor and critic networks; softmax activation is applied to the actor output so that the action probabilities sum to unity.

**Figure 2.** Overview of the A2C network used in this research (*n*Market = 1). The network takes the price series, orderbook feature, and agent feature as input variables and outputs action probabilities and state values. Purple, blue, orange, green, and yellow boxes in the network represent LSTM, convolutional, max pooling, dense, and merge layers, respectively. LSTM layers have hyperbolic tangent activation. Convolutional and dense layers, except the last layers of the actor and critic networks, have ReLU activation. Action probabilities are calculated by applying softmax activation to the output of the actor network.

#### **6. Experiments**

Experiments were performed using the simulator and deep reinforcement learning model described in the previous sections. The purpose of the experiments is to investigate the following questions:


DRL models with the capital-only (CO) and liquidation-value (LV) reward functions (the CO-DRL and LV-DRL models) were trained in one simulation and validated in a separate simulation. In the validation simulations, the DRL models select actions only by their network outputs and do not perform random action selection. Each simulation consists of 100 sub simulations of 1,000,000 steps each (100,000,000 steps in total). Each sub simulation has different simulator and agent parameter settings. For comparison, a model that selects actions randomly (with equal probability) from the same action set as the DRL models was run in the same simulations.

Model performances were compared along the following benchmarks:

- Average reward
- Sharpe ratio
- Maximum drawdown (MDD)
The average reward is the average over all actions of the agent in a simulation. The Sharpe ratio *Sp* (Sharpe 1994) is a metric measuring investment efficiency and is calculated by the following equation:

$$S_p = \frac{E[R_a - R_b]}{\sqrt{\text{Var}[R_a - R_b]}}, \tag{16}$$

where $R_a$ and $R_b$ are the returns of the investment and the benchmark, respectively. In this study, $R_b = 0$ was assumed. Maximum drawdown (MDD) (Magdon-Ismail et al. 2004) is another metric used to measure investment performance and is calculated by the following equation:

$$MDD = \frac{P - L}{P}, \tag{17}$$

where *P* and *L* are the highest and lowest capital amount before and after the largest capital drop.
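Both benchmarks can be computed directly from a return or capital series. The sketch below evaluates Equation (16) with the population standard deviation and the paper's assumption $R_b = 0$, and Equation (17) by tracking the running peak:

```python
import statistics

def sharpe_ratio(returns, r_b=0.0):
    """Sharpe ratio per Equation (16) against a constant benchmark return."""
    excess = [r - r_b for r in returns]
    return statistics.mean(excess) / statistics.pstdev(excess)

def max_drawdown(capital):
    """Maximum drawdown per Equation (17): the largest peak-to-trough
    capital drop expressed as a fraction of the peak P."""
    peak, mdd = capital[0], 0.0
    for c in capital:
        peak = max(peak, c)
        mdd = max(mdd, (peak - c) / peak)
    return mdd

print(max_drawdown([100, 120, 90, 110]))  # -> 0.25
```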

#### *6.1. Simulator Settings*

Some parameters of simulator and agents were fixed in all sub simulations, while others were randomly sampled in each sub simulation.

The values/distributions of market parameters are shown below.


The values/distributions of parameters for FCN agents are shown below.


The values/distributions of parameters for DRL agent are shown below.


#### *6.2. Results*

The average rewards of each sub simulation in training and validation are shown in Figures 3 and 4. Both figures represent the average rewards of the random, CO-DRL, and LV-DRL models. As shown, the average reward gradually improves as the simulation progresses in both DRL models, but at a lower rate and overall magnitude in the CO-DRL model. Similarly, in validation, the average rewards of the LV-DRL model are orders of magnitude higher than those of the other models across all sub simulations.

**Figure 3.** Average rewards of each sub simulation in training. Blue, orange, and green lines represent the average rewards of the random, capital-only deep reinforcement learning (CO-DRL), and liquidation-value deep reinforcement learning (LV-DRL) models.

**Figure 4.** Average rewards of each sub simulation in validation. Orange and blue lines represent the average rewards of the deep reinforcement learning (DRL) and random (baseline) models.

Model performance in the training and validation simulations is shown in Table 1, which lists the mean and standard deviation of the average rewards, Sharpe ratios, and maximum drawdowns across the sub simulations. The evaluation metrics clearly show that the LV-DRL agent outperforms the others across all benchmarks. We can see that the liquidation-value reward function was instrumental for the DRL agent to learn a profitable trading strategy that simultaneously mitigates inventory risk.

**Table 1.** Model performances of the random, CO-DRL, and LV-DRL models. *R*, *Sp*, and *MDD* are the average reward, Sharpe ratio, and maximum drawdown. Values in the table are the mean and standard deviation across sub simulations.


The capital changes during sub simulations in validation are shown in Figure 5 and illustrate important features of the LV-DRL model under different scenarios. In result (a), the capital of the LV-DRL model continues to rise through the sub simulation despite a falling market. We see that the LV-DRL model tends to keep the absolute value of its inventory near 0, and that its capital swings are smaller than those of the other two models. Result (b) shows a case where the CO-DRL model slightly outperforms in accumulated capital, but we see that this strategy takes excessive inventory risk, causing large fluctuations in capital. In contrast, the LV-DRL model achieves a similar level of capital without large swings at a much lower level of inventory, leading to a higher Sharpe ratio.

**Figure 5.** Example capital changes of sub simulations in validation. Both left and right columns show changes of capital, inventory, and mid price in one typical sub simulation.

A histogram of the action probabilities of the three models in validation is shown in Figure 6. The horizontal axis shows the 33 types of actions that can be selected by the DRL agent. As explained in Section 5.3, actions are parameterized by five factors: Intent (Stay, Buy, Sell), Market, Order type (LMT, MKT, CXL), Price (difference from the base price in LMT and MKT orders, lowest or highest in CXL orders), and Volume. We ignore the Market factor since we only have one market in these experiments. The horizontal axis labels of Figure 6 express actions as "Action-OrderType-Price-Volume".

As shown in Figure 6, the LV-DRL agent avoids inefficient actions such as orders with unfavorable prices and large volumes, or cancel orders. The agent prefers buy and sell limit orders with price differences 4 and 6 and volume 5, while limit orders with price difference 0 and volume 5, market orders with volume 5, and cancel orders were rarely selected. Previous studies of high-frequency trading and market making indicate that traders in real markets have investment strategies similar to our LV-DRL model (Hirano et al. 2019). On the other hand, since the reward function of the CO-DRL model does not consider the inventory of the agent, the CO-DRL model seems to pursue a momentum-trend-following strategy without much regard for risk. We also observe that the CO-DRL model submits more aggressive market orders than the LV-DRL model; another indication that the CO-DRL agent disregards costs and risk in favor of capital gains.

**Figure 6.** Action probabilities of three models in validation. The horizontal axis represents possible actions which are expressed by "Action-Order type-Price(difference from the price of market order in limit (LMT) and market (MKT) orders, lowest or highest in cancel (CXL) order)-Volume".

#### **7. Conclusions**

In this study, we showed that, with an appropriate reward function, deep reinforcement learning can be used to learn an effective trading strategy that maximizes capital accumulation without excessive risk in a complex agent-based artificial market simulation. We confirmed that the learning efficiency differs greatly depending on the reward function, and that the action probability distributions of well-trained strategies were consistent with investment strategies used in real markets. While it remains to be seen whether our proposed DRL model can perform in a live financial market, our research shows that detailed simulation design can replicate certain features of real markets, and that DRL models can optimize strategies that adapt to those features. We believe that further consideration of realistic markets (such as multiple markets, agents with various behavioral principles, and exogenous price fluctuation factors) will bring simulations closer to reality and enable the creation of various DRL agents that perform well in the real world, as well as more advanced analyses of real markets.

One of the limitations of the current work is that there is only one DRL agent. In order to consider a more realistic market, we must simulate the interaction between various dynamic agents with differing reward functions. In future work, we plan to look at the effect of introducing multiple DRL agents and examining emergent cooperative and adversarial dynamics between the various DRL agents (Jennings 1995; Kraus 1997) and how that affects market properties as well as learned strategies.

**Author Contributions:** Conceptualization, I.M.; methodology, I.M., D.d., M.K., K.I., H.S. and A.K.; investigation, I.M.; resources, H.M. and H.S.; writing–original draft preparation, I.M. and D.d.; supervision, K.I.; project administration, K.I. and A.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Hirano, Masanori, Kiyoshi Izumi, Hiroyasu Matsushima, and Hiroki Sakaji. 2019. Comparison of behaviors of actual and simulated HFT traders for agent design. Paper presented at the 22nd International Conference on Principles and Practice of Multi-Agent Systems, Torino, Italy, October 28–31.

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long short-term memory. *Neural Computation* 9: 1735–80. [CrossRef]

Horgan, Dan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado Van Hasselt, and David Silver. 2018. Distributed prioritized experience replay. *arXiv* arXiv:1803.00933.


Ladley, Dan. 2012. Zero intelligence in economics and finance. *The Knowledge Engineering Review* 27: 273–86. [CrossRef]


LeBaron, Blake. 2001. A builder's guide to agent-based financial markets. *Quantitative Finance* 1: 254–61. [CrossRef]


Magdon-Ismail, Malik, Amir F. Atiya, Amrit Pratap, and Yaser S. Abu-Mostafa. 2004. On the maximum drawdown of a Brownian motion. *Journal of Applied Probability* 41: 147–61. [CrossRef]

Meng, Terry Lingze, and Matloob Khushi. 2019. Reinforcement learning in financial markets. *Data* 4: 110. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Global Asset Allocation Strategy Using a Hidden Markov Model**

#### **Eun-chong Kim <sup>1,\*</sup>, Han-wook Jeong <sup>2</sup> and Nak-young Lee <sup>1</sup>**


Received: 29 August 2019; Accepted: 2 November 2019; Published: 6 November 2019

**Abstract:** This study uses a hidden Markov model (HMM) to identify the phases of individual assets and proposes an investment strategy that uses price trends effectively. We conducted an empirical analysis covering 15 years, from January 2004 to December 2018, on universes of global assets divided into 10 classes and a more detailed 22 classes. In both universes, the strategy using the HMM showed superior performance. An examination of the portfolio weights shows that weight shifts between asset classes occur dynamically: the HMM increases the weight of stocks when stock prices rise and increases the weight of bonds when stock prices fall. Performance analysis showed that the HMM effectively captures the asset selection effect in terms of Jensen's alpha, Fama's net selectivity, and the Treynor-Mazuy model. In addition, the HMM strategy has a positive gamma value in the Treynor-Mazuy model. Ultimately, the HMM is expected to enable more stable management than existing momentum strategies by combining an asset selection effect with market forecasting ability.

**Keywords:** price momentum; hidden Markov model; asset allocation

#### **1. Introduction**

In asset allocation, predicting the price movements of the investment targets is a very important consideration. Such movements are also related to the phenomenon of price momentum, and investment strategies reflecting the price trends of assets have been developed to exploit it. Methods using price momentum have performed well in the majority of past markets (Antonacci 2013). However, they have the disadvantage that investment decisions are made only after a certain period of price trend has passed. We expect that artificial intelligence methods, whose application in financial markets has recently expanded, can help on this point. Freitas et al. (2009) showed that a prediction-based portfolio optimization model using neural networks can capture short-term investment opportunities and outperform the mean-variance model. They used neural networks to predict stock returns and employed the predicted returns to compose the portfolio. Maknickienė (2014) found an ensemble of Evolino recurrent neural networks (RNNs) to be effective in portfolio management.

The purpose of this study is to investigate whether the use of an artificial intelligence method can empirically improve portfolio performance in global asset allocation. In this paper, we propose a method of asset allocation using the hidden Markov model (HMM). In addition, we examine whether asset selection through the HMM can yield significant excess returns. The asset universe is divided into four groups (stocks, bonds, real estate, and commodities) for the empirical analysis. We used exchange traded funds (ETFs) to allocate assets, which provide the effect of investing in real assets and in various sectors as well as indices. ETFs have the advantage of lower transaction costs and taxes (Poterba and Shoven 2002).

This study aims to demonstrate empirically that investment using the HMM is superior to existing investment methods.

The composition of this paper is as follows. Section 2 discusses asset allocation, momentum investing, and the HMM. Section 3 describes the investment method using the HMM proposed in this paper. Section 4 analyzes the results of the portfolio and evaluates its usefulness. Finally, Section 5 presents the conclusion and remaining tasks.

#### **2. Literature Review**

#### *2.1. Asset Allocation*

Asset allocation is a series of processes that optimize the portfolio of risk assets. The purpose of asset allocation is to create an efficient portfolio through diversified investment based on Markowitz's portfolio theory published in 1952 (Markowitz 1952).

Asset allocation has a significant impact on the performance of the portfolio and the use of exchange-traded fund (ETF) makes portfolio construction relatively easy. Since the ETF consists of securities that follow the index, it can achieve the effect of diversified investment. Miffre (2007) found that ETFs are effective in global asset allocation strategies.

The mean-variance portfolio or modern portfolio theory proposed by Markowitz pursues maximum returns under a given risk. The mean-variance portfolio provides a foundation of modern finance theory and has numerous extensions and applications (Yin and Zhou 2004). Optimal asset allocation across many assets is important and difficult. Simaan (1997) compared the mean-variance model with the mean absolute deviation model. Elliott et al. (2010) investigated a mean-variance portfolio selection problem under a Markovian regime-switching Black-Scholes-Merton economy.

One of the portfolios most frequently used by academics is the traditional 60/40 stock/bond portfolio (Chaves et al. 2011; Asness et al. 2012). This portfolio invests 60% in stocks, a risky asset, and 40% in bonds, a safe asset. Asness (1996) showed that the 60/40 portfolio outperformed a 100% stock portfolio.

The equally weighted (1/N) portfolio, which assigns the same weight to each asset, is also often used. DeMiguel et al. (2007) found that the equally weighted portfolio has lower turnover, a higher Sharpe ratio, and higher returns than the classical mean-variance portfolio.

We compare the proposed strategy with the 60/40 portfolio and the equally weighted portfolio. In addition, this study constructs a portfolio using ETFs/ETNs, which are effective in global asset allocation strategies.

#### *2.2. Momentum Investing*

Price momentum was first documented by Jegadeesh and Titman (1993). Momentum means that the future returns of winner stocks, whose prices have risen over the past 3 to 12 months, are higher than the future returns of loser stocks, whose prices have fallen over the same period.

Momentum can be broadly classified into time-series momentum and cross-sectional momentum. Cross-sectional momentum refers to the tendency of stocks with low returns relative to their peers over the past period to continue underperforming, while stocks with relatively high past returns continue to outperform (Grinblatt and Moskowitz 2003; Chordia and Shivakumar 2006; Sadka 2006; Zhu and Zhou 2009; Novy-Marx 2012; Fama and French 2012).

Time-series momentum, in contrast, uses only an asset's own past information. Moskowitz et al. (2012) analyzed financial assets including stocks, bonds, futures, currencies, and derivatives, and showed that trading strategies based on time-series momentum in past returns yield statistically significant profits. This effect has been reported to be a common phenomenon not only in the US market but also in global equity markets (Asness et al. 2013).

A momentum investment strategy seeks excess returns by buying stocks that had high returns in the past and selling stocks that had low ones. For time-series momentum, historical data from individual assets are used to identify current trends and make investment decisions. In this study, we compare a strategy using the most common form of time-series momentum with a strategy that identifies current trends by applying the HMM, a machine learning method.
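A simple time-series momentum rule of the kind compared in this study can be sketched as follows. This is an illustrative Python sketch (the paper's own analysis was done in R); the function name and the 12-month lookback default are our assumptions, and the price series is hypothetical.

```python
import numpy as np

def momentum_signal(prices, lookback=12):
    """Return True if the trailing `lookback`-period return is positive
    (a buy signal under a simple time-series momentum rule)."""
    prices = np.asarray(prices, dtype=float)
    if len(prices) < lookback + 1:
        raise ValueError("not enough price history")
    past_return = prices[-1] / prices[-1 - lookback] - 1.0
    return past_return > 0

# Hypothetical monthly closing prices for one asset
prices = [100, 102, 101, 105, 107, 110, 108, 112, 115, 117, 116, 120, 124]
print(momentum_signal(prices))  # trailing 12-month return is +24% -> True
```

Under this rule an asset is held only while its own trailing return is positive, independently of how other assets performed.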

#### *2.3. Hidden Markov Model (HMM)*

The HMM is a probabilistic model that estimates the current state of sequential data under the assumption that the present state is affected only by the previous state. Importantly, each state in the HMM is assumed to follow a Markov chain but to be hidden. That is, as shown in Figure 1, the HMM consists of two elements, a hidden state value (S) and an observable value (O), with transitions between hidden states following the Markov chain. The model is parameterized by the transition probabilities connecting each pair of hidden states. Estimating an HMM raises three problems: probability (likelihood) estimation, optimal state estimation, and model parameter estimation. These are solved with the forward-backward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, respectively (Ramage 2007).

**Figure 1.** Hidden Markov Model Trellis Diagram.

The HMM is mainly used for time-series data; in finance, it is widely applied to asset price prediction and to modeling transitions between market regimes. Hassan and Nath (2005) provide empirical evidence that stock price prediction using the HMM is meaningful. The HMM approximates observed asset prices as a Markov process to infer the current state of the asset, which can help predict future asset prices using the current state and the transition probabilities (Nguyen 2018). Kritzman et al. (2012) showed that a two-state HMM is effective in detecting regime shifts in market turbulence.

#### **3. Model Specification**

This study aims to identify the phase of each asset and to construct a portfolio using the HMM. The HMM is suited to situations in which the state variable cannot be observed directly. Figure 2 shows the overall process of the analysis.

The return on each asset class ETF is used to train the HMM and to identify the asset class regime through the distinct hidden states. Based on the identified regime, we select the assets to invest in and construct the investment portfolio.

**Figure 2.** Empirical Analysis Model Process.

#### *3.1. Data Source*

To construct the portfolio models compared with the HMM model in this study, ETF price data were used. A total of 23 ETFs representing the asset classes were used for asset allocation. Since we assume monthly rebalancing, we calculated monthly returns from dividend-adjusted closing prices. The data were retrieved from Yahoo Finance using the "quantmod" package in R.

#### *3.2. Learning of Hidden Markov Model*

#### 3.2.1. Markov Chain

The hidden Markov model is based on the Markov chain. A Markov chain is a discrete-time stochastic process with the Markov property: the probability of one state depends only on the state immediately before it, so a transition does not require a long history of previous states and can be estimated from the last state alone.

$$\mathbf{P}(q_i \mid q_1 \cdots q_{i-1}) = \mathbf{P}(q_i \mid q_{i-1})$$
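The Markov property can be seen numerically: the next state distribution is obtained by a single matrix product with the current distribution, with no reference to earlier history. A minimal NumPy sketch with a hypothetical two-state transition matrix:

```python
import numpy as np

# Hypothetical 2-state transition matrix: rows = current state, cols = next state
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Under the Markov property, the next distribution depends only on the current one:
pi_today = np.array([1.0, 0.0])   # certainly in state 0 today
pi_next = pi_today @ A            # one step ahead
pi_two = pi_next @ A              # two steps ahead: only pi_next is needed

print(pi_next, pi_two)
```

Each row of `A` sums to one, and iterating the product propagates the state distribution forward step by step.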

#### 3.2.2. Hidden Markov Model

The hidden Markov model represents changes in phenomena as a probabilistic model. Each state is assumed to follow a Markov chain but to be hidden, and the hidden states are inferred indirectly from observations. The phases that can occur in the stock market are defined as the hidden states of the model, and the returns of the asset classes are used as input data. The model used in this study is as follows.

$$
\lambda = (A, \ B, \ \pi, \ N).
$$

where λ is the hidden Markov model, $A = (a_{ij})$ is the transition probability matrix from state $i$ to state $j$, $B = (b_i(k))$ is the observation probability matrix in state $i$, π is the vector of initial state probabilities, and $N$ is the number of states.

Based on the input data, the hidden Markov model calculates the 'state probability' and the 'transition probability' for each hidden state. In this study, three hidden states are assumed. Applying the hidden Markov model in practice involves three problems: (1) evaluating the likelihood of an observation sequence given the model, (2) finding the optimal hidden state sequence, and (3) estimating the model parameters.

The first and second problems are solved with the dynamic-programming-based forward algorithm and the Viterbi algorithm, respectively. The third problem is solved with the Baum-Welch algorithm. In practice, we solved these problems using the depmixS4 package in R. The observation data are asset prices; from them, the model estimates the observation probability matrix (B) for each hidden state and the transition matrix (A).
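The first of these problems, computing the likelihood of the observations, can be sketched with the forward algorithm. This is a minimal NumPy illustration with a discrete-emission toy model, not the depmixS4 code used in the paper; all matrices are hypothetical.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Forward algorithm: P(obs | model) for a discrete-emission HMM.
    A[i, j]: transition probability, B[i, k]: probability of emitting
    symbol k in state i, pi[i]: initial state probability,
    obs: sequence of observed symbol indices."""
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
    return alpha.sum()                 # termination

# Toy 2-state, 2-symbol model (all numbers illustrative)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(forward_likelihood(A, B, pi, [0, 1, 0]))
```

The recursion sums over all hidden paths in O(N²T) time rather than enumerating the Nᵀ possible state sequences.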

#### 3.2.3. HMM Parameter Learning: Baum-Welch Algorithm

In this study, we used the Baum-Welch algorithm (also called the forward-backward algorithm) to learn the model parameters by maximizing the likelihood. The observation sequence O is the asset price data. Given the model parameters, i.e., the transition probabilities A and the emission probabilities B, the forward and backward probabilities are calculated from the observations O; these are then used to re-estimate the probability that each observation is emitted in each hidden state and the probability of shifting between hidden states. Figure 3 illustrates this process.

**Figure 3.** Forward Probability (α) & Backward Probability (β).
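The forward and backward passes can be combined to give the state posteriors that Baum-Welch re-estimates at each iteration. The following is a minimal NumPy sketch (again illustrative, not the paper's R code), reusing a hypothetical discrete-emission toy model:

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Forward-backward pass: returns the posterior P(state_t = i | obs),
    the quantity re-estimated at each Baum-Welch iteration."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                                    # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # normalize per time step

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
gamma = forward_backward(A, B, pi, [0, 1, 0])
print(gamma)  # each row sums to 1
```

In a full Baum-Welch iteration these posteriors (together with pairwise state posteriors) are used to update A, B, and π until the likelihood converges.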

#### *3.3. Estimation of Asset Phases & Portfolio Composition*

To interpret the hidden states, we compute the average Sharpe ratio in each state. If the computed Sharpe ratio is positive, we regard the state as an increasing phase. We trained the HMM with a sliding window method: for each asset, the model is trained on two years of data, and the window is moved forward at intervals of one month. A schematic diagram of the sliding window method is shown in Figure 4. This methodology discounts the influence of older data by not using the entire price history, allowing quicker adaptation to changing market conditions (Kaastra and Boyd 1996).

**Figure 4.** Sliding Window Method.
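The sliding window scheme (24 training months, one-month step) can be sketched as a simple index generator. This is an illustrative Python sketch; the function name and the exact indexing convention are our assumptions.

```python
def sliding_windows(n_months, train=24, step=1):
    """Yield (train_start, train_end) index pairs: each window trains on
    `train` months and the portfolio is rebalanced at month `train_end`."""
    t = 0
    while t + train < n_months:
        yield t, t + train   # train on months [t, t+train); rebalance at t+train
        t += step

windows = list(sliding_windows(n_months=30))
print(windows[0], len(windows))  # (0, 24) 6
```

For 30 months of data this yields six windows, each shifted one month forward, so every rebalancing date uses exactly the two preceding years.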

Each window is trained by asset class using the two years of data immediately preceding portfolio rebalancing, and rebalancing is based on the results. Asset selection through the HMM proceeds as follows: if the hidden state at the present time has the highest (positive) Sharpe ratio, the asset is considered to be in an increasing phase; conversely, if the Sharpe ratio is negative, the asset is considered to be in a decreasing phase. Assets judged to be in an increasing phase constitute a portfolio with equal weights.
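The per-state Sharpe ratio rule can be sketched as follows. This is a hedged Python illustration with hypothetical returns and decoded states; `state_sharpe` is our own helper name, not a function from the paper.

```python
import numpy as np

def state_sharpe(returns, states):
    """Average monthly return / sample std per hidden state; an asset is
    held only if the state active in the most recent month has a positive
    Sharpe ratio."""
    returns, states = np.asarray(returns, float), np.asarray(states)
    out = {}
    for s in np.unique(states):
        r = returns[states == s]
        out[s] = r.mean() / r.std(ddof=1)
    return out

# Hypothetical monthly returns and decoded hidden states for one asset
rets = [0.02, 0.03, -0.04, -0.05, 0.025, -0.045, 0.01, 0.02]
states = [1, 1, 2, 2, 1, 2, 1, 1]
sharpe = state_sharpe(rets, states)
hold = sharpe[states[-1]] > 0   # last month's state decides inclusion
print(sharpe, hold)
```

Here state 1 has a positive Sharpe ratio (increasing phase) and state 2 a negative one (decreasing phase); since the latest month is in state 1, the asset would enter the equally weighted portfolio.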

To illustrate the overall process, we give an example. We apply SPY ETF adjusted close data from January 2004 to December 2005 to the HMM. Figure 5 shows the division into hidden states resulting from training.

**Figure 5.** January 2004–December 2005 Optimal state of S&P500 Index.

We then compute the average monthly return and Sharpe ratio of each hidden state to judge its phase. Looking at Table 1, in state 1, the average monthly return and Sharpe ratio are negative (−), so the state can be considered a decreasing phase. In state 2, the average monthly return and Sharpe ratio are positive, so it can be considered an increasing phase. In state 3, the average monthly return and Sharpe ratio are positive, but less clearly so than in state 2; it may be considered stationary.


**Table 1.** Regime Detection by January 2004–December 2005 Optimal state of S&P500 Index.

The latest date of the learning window, December 2005, is not in an increasing phase, so the asset cannot be included in the portfolio of the next window, January 2006. If the Sharpe ratio of the current state is positive (+), indicating an increasing phase, the asset can be included in the portfolio. Assets selected by this process constitute a portfolio with equal weights.


#### *3.4. Analyzing Effect of Asset Selection*

In this study, various performance measures are used to verify the effectiveness of the proposed strategy and to understand the sources of its performance. The main measures of excess return used in this paper are described below.

#### 3.4.1. Information Ratio

The information ratio is the ratio of the excess return resulting from active investment to the standard deviation of that active return. Grinold and Kahn (1995), the basis of performance evaluation using the information ratio, measured it using the excess return and residual risk after accounting for systematic risk through a regression of portfolio returns on benchmark returns, and argued that the magnitude of this value can determine the superiority of performance.

$$IR = \frac{R_p - R_{BM}}{TE_{p,BM}}, \qquad TE = \sqrt{\frac{\sum_{i=1}^{n}\left[\left(R_p - \overline{R}_p\right) - \left(R_{BM} - \overline{R}_{BM}\right)\right]^2}{n-1}}$$

where $R_p$ is the return of the portfolio, $\overline{R}_p$ is the average return of the portfolio, $R_{BM}$ is the return of the benchmark, and $\overline{R}_{BM}$ is the average return of the benchmark.

A positive IR means that the manager exploited information to achieve excess return; it is called the information ratio because it reflects the manager's information capabilities.
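The formula above can be computed directly from per-period return series. A minimal Python sketch (illustrative data, function name our own):

```python
import numpy as np

def information_ratio(rp, rbm):
    """Information ratio: mean active return divided by the tracking error
    (sample standard deviation of the active return)."""
    active = np.asarray(rp, float) - np.asarray(rbm, float)
    te = active.std(ddof=1)          # tracking error with n-1 denominator
    return active.mean() / te

# Hypothetical monthly portfolio and benchmark returns
rp  = [0.02, 0.01, 0.03, 0.015, 0.025]
rbm = [0.015, 0.012, 0.02, 0.016, 0.018]
print(round(information_ratio(rp, rbm), 3))
```

Note the `ddof=1` choice matches the $n-1$ denominator in the tracking error formula above.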

#### 3.4.2. Jensen's Alpha

Jensen's alpha, also known as the Jensen performance index, measures the excess return earned by the portfolio relative to the return suggested by the CAPM (Jensen 1968). Jensen's alpha can be calculated with the following formula:

$$\alpha_p = R_p - \left[R_f + \beta\left(R_m - R_f\right)\right]$$

where $R_p$ is the return of the portfolio, $R_f$ is the risk-free rate, $R_m$ is the market return, and β is the portfolio's beta.

A positive Jensen's alpha indicates that the portfolio outperformed the market on a risk-adjusted basis and reflects asset selection ability. In this study, we use the 10-year US Treasury bond yield as the risk-free rate.
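Jensen's alpha is the intercept of a regression of portfolio excess returns on market excess returns. A minimal Python sketch with synthetic data (all numbers hypothetical):

```python
import numpy as np

def jensens_alpha(rp, rm, rf):
    """Jensen's alpha: intercept of the regression of portfolio excess
    returns on market excess returns (per-period series, constant rf)."""
    ex_p = np.asarray(rp, float) - rf
    ex_m = np.asarray(rm, float) - rf
    beta, alpha = np.polyfit(ex_m, ex_p, 1)   # degree-1 fit: (slope, intercept)
    return alpha, beta

# Synthetic series where the portfolio beats the market by 1% each period
rp = [0.03, 0.01, 0.04, -0.02, 0.05]
rm = [0.02, 0.00, 0.03, -0.03, 0.04]
alpha, beta = jensens_alpha(rp, rm, rf=0.002)
print(alpha > 0)   # positive alpha -> risk-adjusted outperformance
```

With this construction the regression recovers a beta of 1 and an alpha of exactly the 1% per-period margin.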

#### 3.4.3. Fama's Net Selectivity

Fama's Net Selectivity measure provides a breakdown of portfolio performance (Fama 1972). It is the return of the fund, less the risk-free yield, minus the expected market premium per unit of market risk multiplied by the total risk of the portfolio under review. Fama's Net Selectivity can be calculated with the following formula:

$$Net\ Selectivity_p = \left[R_p - R_f\right] - \frac{\left[R_m - R_f\right]}{\sigma_m}\,\sigma_p$$

where $R_p$ is the return of the portfolio, $R_f$ is the risk-free rate, $R_m$ is the market return, $\sigma_p$ is the standard deviation of the portfolio return over the period, and $\sigma_m$ is the standard deviation of the market return over the period.

Fama's Net Selectivity gives the excess return obtained by the manager that could not have been obtained by investing in the market portfolio. It compares the extra return obtained by the portfolio manager, who bears specific risk, with the extra return that could have been obtained with the same amount of systematic risk.
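The formula translates directly into code. An illustrative Python sketch using the same hypothetical series as before (function name our own assumption):

```python
import numpy as np

def net_selectivity(rp, rm, rf):
    """Fama's net selectivity: portfolio excess return minus the market
    excess return scaled by relative total (not just systematic) risk."""
    rp, rm = np.asarray(rp, float), np.asarray(rm, float)
    sigma_p, sigma_m = rp.std(ddof=1), rm.std(ddof=1)
    return (rp.mean() - rf) - (rm.mean() - rf) / sigma_m * sigma_p

# Hypothetical per-period returns (portfolio = market + 1% each period)
rp = [0.03, 0.01, 0.04, -0.02, 0.05]
rm = [0.02, 0.00, 0.03, -0.03, 0.04]
print(net_selectivity(rp, rm, rf=0.002) > 0)
```

Because total risk $\sigma_p$ appears in place of beta, a positive value here is a stricter test of selection skill than Jensen's alpha.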

#### 3.4.4. Treynor-Mazuy Measure

Jensen's alpha cannot separate asset selection ability from market timing ability. For a more detailed analysis, we evaluate the HMM asset allocation strategy using not only Jensen's alpha but also the Treynor-Mazuy measure. The magnitude of the Treynor-Mazuy measure depends on two variables: the return of the fund and the variability of its risk sensitivity. The measure suggests that portfolios displaying good market timing will be more exposed in up markets and less exposed in down markets (Treynor and Mazuy 1966). The Treynor-Mazuy measure can be calculated with the following formula:

$$R_p - R_f = a + \beta\left(R_m - R_f\right) + \gamma\left(R_m - R_f\right)^2 + \epsilon$$

where $R_p - R_f$ is the excess return of the portfolio, $R_m - R_f$ is the market premium, $a$ is selection ability, β is beta, and γ is market timing performance.

If a portfolio has selection ability, $a$ is positive, and if it has market timing ability, γ is positive.
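The quadratic regression can be fitted by ordinary least squares. A minimal Python sketch on synthetic data built with known coefficients (all values hypothetical, function name our own):

```python
import numpy as np

def treynor_mazuy(rp, rm, rf):
    """Treynor-Mazuy regression: fits R_p - R_f = a + b*(R_m - R_f)
    + g*(R_m - R_f)^2 by least squares; g > 0 signals market timing."""
    ex_p = np.asarray(rp, float) - rf
    ex_m = np.asarray(rm, float) - rf
    X = np.column_stack([np.ones_like(ex_m), ex_m, ex_m ** 2])
    a, b, g = np.linalg.lstsq(X, ex_p, rcond=None)[0]
    return a, b, g

# Synthetic data generated with a = 0.001, b = 1.0, g = 2.0
rf = 0.002
rm = np.array([0.02, -0.01, 0.03, -0.03, 0.04, 0.00])
rp = rf + 0.001 + 1.0 * (rm - rf) + 2.0 * (rm - rf) ** 2
a, b, g = treynor_mazuy(rp, rm, rf)
print(a, b, g)  # a ~ 0.001, b ~ 1.0, g ~ 2.0
```

The positive γ makes the fitted excess-return curve convex in the market premium, i.e., more exposed in up markets and less in down markets, as the measure intends.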

#### **4. Empirical Analysis**

The HMM strategy proposed in this study aims to capture the phase of each asset and to construct a portfolio of assets likely to rise. For the empirical analysis, we apply the strategy not only to stocks and bonds but also to alternative asset classes such as commodities and REITs. We construct the portfolio and rebuild the universe repeatedly with the sliding window methodology over January 2001–September 2019, training on two years of data and sliding the window forward one month at a time. The investment period is January 2003–September 2019, excluding the initial two years used for learning.

#### *4.1. Global Asset Allocation Investment Universe*

To test the robustness of our strategy, we composed two universes. Tables 2 and 3 list the asset classes used as universes. To approximate actual investment as closely as possible, we tested ETF prices tracking the asset class indices. We set the trading cost by applying an assumed median bid-ask spread of 0.10% to the portfolio's turnover, so a portfolio with higher turnover incurs greater trading costs. We set trading commissions to \$0, assuming that investors use a discount broker offering commission-free ETFs.



We further refined the first 10 asset classes and tested them again. In the case of alternative investments, we added some asset classes because each class could not be subdivided further. Table 3 shows the investment universe with more granular asset classes.


**Table 3.** Global Asset Allocation Investment Universe (Asset 22).

We denote the universe of Table 2 as 'Asset 10' and the universe of Table 3 as 'Asset 22'. The HMM strategy is applied to both universes, and we verify that the results are consistent across them.

#### *4.2. Summary of Investment Result*

We chose the equally weighted (EW) portfolio, the 60/40 portfolio, and the mean-variance (MV) portfolio, which are commonly used for asset allocation, as benchmarks for analyzing portfolio performance. In addition, by comparison with the momentum strategy, we examine how judging individual financial instruments with the HMM translates into investment performance.

In the EW portfolio, each asset class in the universe receives a weight of 1/N. Since the weights of an equally weighted portfolio drift in proportion to returns, we applied monthly rebalancing to maintain equal weights. For the 60/40 portfolio, 60% was allocated to the S&P 500 and 40% to the 10-year U.S. Treasury bond; this proportion also drifts with the returns on stocks and bonds, so monthly rebalancing was applied to maintain the fixed 60/40 ratio. The MV portfolio is Markowitz's mean-variance optimized portfolio, with weights optimized on the previous two years of data and monthly rebalancing applied. Lastly, the comparison MOM strategy selects assets with positive 12-month momentum and excludes those with negative momentum; monthly rebalancing was likewise applied.
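With monthly rebalancing back to fixed weights, a benchmark's period return is simply the weighted average of the component returns each month. A minimal Python sketch for the 60/40 benchmark (hypothetical return series):

```python
import numpy as np

def sixty_forty_returns(stock_rets, bond_rets, w=0.6):
    """Monthly returns of a 60/40 portfolio rebalanced back to fixed
    weights every month, so the weights never drift between periods."""
    s, b = np.asarray(stock_rets, float), np.asarray(bond_rets, float)
    return w * s + (1 - w) * b

# Hypothetical monthly S&P 500 and 10-year Treasury returns
stock = [0.05, -0.02, 0.03]
bond  = [0.01, 0.005, -0.002]
print(sixty_forty_returns(stock, bond))
```

Without the monthly rebalancing the stock weight would drift above 60% after up months, which is exactly what the rebalancing rule in the text prevents.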

Figures 6 and 7 show the cumulative portfolio returns and drawdown charts for the two universes covered in this study. In both universes, the HMM strategy outperforms the momentum strategy. In terms of drawdown, the HMM and MOM strategies are superior to the traditional strategies used as benchmarks. It is remarkable that the drawdown of the HMM strategy decreased while that of the benchmarks increased during the 2008 financial crisis. Thus the HMM strategy also stabilizes portfolio risk.

**Figure 7.** Portfolio Return and Drawdown Chart of Asset 22.

Table 4 summarizes the investment results. The strategy proposed in this study shows investment performance superior to the momentum strategy and the benchmarks, and universe 1 (Asset 10) and universe 2 (Asset 22) exhibit the same tendency. This implies that investment with the HMM yields better performance than strategies using classic asset allocation or momentum.


**Table 4.** Portfolio Performance Summary.

Figures 8 and 9 show the change in investment weights for the strategies using momentum and the HMM; the left side shows the momentum strategy and the right side the HMM strategy. Both strategies turned defensive by increasing the weight of bonds during the 2008 financial crisis. However, in the momentum strategy, the weight of risky assets does not increase immediately after the index bottoms out, because the MOM strategy requires an established trend. The HMM strategy, in contrast, increases the weight of bonds immediately in declining phases and the proportion of stocks in rising phases. This shows that the HMM strategy detects phase changes quickly, implying a market timing capability relative to the MOM strategy.

**Figure 8.** Change of Weights in Asset 10.

**Figure 9.** Change of Weights in Asset 22.

Table 5 reports rolling returns used to calculate the probability that the strategy's return outperforms each benchmark. Rolling returns are useful for examining returns over holding periods similar to those actually experienced by investors. Rolling periods from 1 month (short term) to 36 months (long term) were examined. Overall, the HMM strategy is more likely to outperform the benchmarks than the MOM strategy.


**Table 5.** Portfolio Outperformance Probability.
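The outperformance probability behind Table 5 can be sketched as the share of rolling windows in which the strategy's cumulative return beats the benchmark's. An illustrative Python sketch with hypothetical monthly returns (function name our own):

```python
import numpy as np

def outperformance_prob(rp, rbm, window):
    """Share of rolling `window`-month periods in which the strategy's
    cumulative return beats the benchmark's cumulative return."""
    rp, rbm = np.asarray(rp, float), np.asarray(rbm, float)
    n = len(rp) - window + 1
    wins = 0
    for t in range(n):
        cp = np.prod(1 + rp[t:t + window]) - 1    # strategy cumulative return
        cb = np.prod(1 + rbm[t:t + window]) - 1   # benchmark cumulative return
        wins += cp > cb
    return wins / n

rp  = [0.02, 0.01, 0.03, -0.01, 0.04, 0.00]
rbm = [0.015, 0.012, 0.02, 0.00, 0.018, 0.01]
print(outperformance_prob(rp, rbm, window=3))
```

Repeating this for windows from 1 to 36 months reproduces the kind of holding-period comparison summarized in the table.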

The information ratio is widely used for assessing the performance of benchmarked funds; it is a fraction whose numerator is the excess return and whose denominator is the standard deviation of the excess return. Table 6 summarizes the information ratio results. In this study, we set the EW, 60/40, and MV portfolios as benchmarks, and the information ratio of each strategy was calculated against each benchmark in turn.


**Table 6.** Information Ratio Result.

#### *4.3. Validation of Selection Effect of HMM*

The shape of the return distribution is closely related to a portfolio's asset selection ability and market prediction ability. In this study, Jensen's alpha, Fama's Net Selectivity, and the Treynor-Mazuy measure are computed. First, Table 7 shows Jensen's alpha for each universe. The benchmarks used to measure Jensen's alpha were the equally weighted, 60/40, and mean-variance portfolios of the assets in each universe. In both universes, Jensen's alpha for the HMM strategy is about 2% larger than for the momentum strategy, and this tendency holds against the EW, 60/40, and MV portfolios alike. This shows that both the HMM and momentum strategies have an instrument selection effect, and that the effect is larger for the HMM strategy.


**Table 7.** Jensen's Alpha Result.

Table 8 summarizes Fama's measure. Both the momentum and HMM strategies are positive when compared with each benchmark. Fama's Net Selectivity, like Jensen's alpha, indicates selection ability when it is positive. The results show that the HMM strategy is superior to the existing MOM strategy against all benchmarks, the same conclusion as when the selection effect was measured with Jensen's alpha.


Next, we use the Treynor-Mazuy model to examine instrument selection ability and market prediction ability. The benchmarks are likewise the EW, 60/40, and MV portfolios of each universe. Table 9 summarizes the Treynor-Mazuy results. First, the alpha measuring selection ability is positive for both the HMM and MOM strategies, the same conclusion as from Jensen's alpha and Fama's Net Selectivity. For gamma, which captures market timing capability, all HMM strategies are positive, while some MOM strategies are negative; a positive gamma indicates market timing capability. The magnitude of these values shows that the HMM strategy is superior to the MOM strategy.


**Table 9.** Treynor-Mazuy Measure Result.

#### **5. Conclusions**

The purpose of this study is to verify the effect of asset allocation using an artificial intelligence method. In particular, the HMM identifies the phases of change of individual financial products, and we examine the usefulness of the resulting investment strategy. The hidden states of the HMM provide insight into the current state of an asset class, similar to the way existing momentum strategies identify up or down trends via price momentum. We proposed a strategy that includes an asset in the portfolio if the current state derived from the HMM is judged to be an upward phase and excludes it otherwise. For comparison, we considered a momentum strategy that uses trends in the historical prices of individual financial instruments. The benchmarks are the EW portfolio, the 60/40 portfolio, and Markowitz's MV portfolio, the most widely used asset allocation methods. To confirm the robustness of the model, two investment universes were set up.

Looking at portfolio performance, the HMM and momentum strategies in both universes show better average annual returns and Sharpe ratios than the traditional asset allocation methods, and both improved performance in down markets, such as the financial crisis, by reducing the maximum drawdown. Comparing the HMM and momentum strategies, the HMM strategy is superior overall. Since this outperformance could be concentrated in particular intervals, we calculated the probability that the rolling returns of each strategy outperform the benchmarks. In all cases, the HMM strategy is more likely to outperform the benchmark than the momentum strategy, and this probability increases with the holding period; this is common to both universes. These results show that the HMM strategy is relatively robust and can be useful in real investment.

We then investigated the sources of this performance using three representative portfolio performance measures. First, we examined excess returns through Jensen's alpha: it is positive for both the HMM and MOM strategies, demonstrating selection ability. Next, we examined selection ability through Fama's Net Selectivity; this value is also positive, indicating a selection effect. In both cases, the values for the HMM strategy exceed those of the MOM strategy, so the HMM strategy's selection ability is stronger. Finally, the strategies were evaluated with the Treynor-Mazuy model: its alpha shows the same selection effect, and its gamma, capturing market timing capability, is also larger for the HMM strategy than for the MOM strategy.

Summarizing the above results, both the MOM and HMM strategies outperform traditional portfolio construction strategies, because, as discussed in this study, they have a greater selection effect. In particular, the HMM strategy is more effective than the MOM strategy widely used by managers.

The implications of this study are as follows. MOM strategies, which invest in the trend of each financial instrument, are well known and frequently used; they identify past trends from price data and invest accordingly. The HMM strategy achieves better investment results than the MOM strategy by identifying the trends of financial instruments. This method can therefore be applied widely, to bond and asset allocation portfolios as well as to equity portfolios.

It is meaningful that this process was implemented with the HMM, an artificial intelligence method. This indicates that artificial intelligence can improve performance beyond traditional investment methods. Further research should therefore apply asset allocation strategies using deep learning, such as recurrent neural networks (RNNs).

The limitations and future challenges of this study are as follows. First, we judged the phases by learning only the returns of individual assets; since macroeconomic variables also influence the returns of individual stocks and assets, these factors should be considered as well. Second, investment decisions were made on a monthly basis; since price trends also exist intraday, the method could be applied to higher-frequency settings.

**Author Contributions:** All the authors made important contribution to this paper. Conceptualization, E.-c.K.; Methodology, E.-c.K.; Software, E.-c.K. and H.-w.J.; Validation, H.-w.J. and N.-y.L.; Formal Analysis, E.-c.K. and N.-y.L.; Investigation, N.-y.L. and H.-w.J.; Resources, E.-c.K. and H.-w.J.; Data Curation, H.-w.J.; Writing-Original Draft Preparation, E.-c.K. and N.-y.L.; Writing-Review & Editing, H.-w.J.; Visualization, N.-y.L.; Supervision, E.-c.K.

**Funding:** This research received no external funding.

**Acknowledgments:** We would like to thank the three anonymous reviewers who carefully reviewed our paper and provided intuitive suggestions and constructive comments. We are grateful to the editors for their support and effort on behalf of our manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Social and Financial Inclusion through Nonbanking Institutions: A Model for Rural Romania**

### **Xiao-Guang Yue <sup>1,2</sup>, Yong Cao <sup>3</sup>, Nelson Duarte <sup>4</sup>, Xue-Feng Shao <sup>5</sup> and Otilia Manta <sup>6,7,\*</sup>**


Received: 19 September 2019; Accepted: 26 October 2019; Published: 29 October 2019

**Abstract:** The challenges of financial systems have immediate or medium-term social effects. The financial industry is constantly searching for measures to reduce these challenges, especially for those with little or no access to financial services. While current communication technologies make services more accessible through digital mobile platforms, there are still difficulties in establishing viable customer arrangements. In addition to the increased investment in financial technologies, nonbanking financial institutions have now expanded to offer more flexible services tailored to individual circumstances, especially those in isolated rural areas. This research outlines the network model of nonbanking financial institutions in Romania, as well as a microfinance model, based on the financial analysis of four national indicators of nonbanking financial institutions. Data used are presented in absolute values, from the annual numerical series for the reference period 2007–2017. The new initiatives and features incorporated in this Romanian model should be applicable elsewhere and will actively contribute to the expansion and sustainability of financial services, with a positive inclusive impact on society.

**Keywords:** community finances; fiscal flexibility; individualized financial arrangements; sustainable financial services

**JEL Classification:** F65; G23; G32

#### **1. Introduction**

In a global context where large populations have inadequate access to financial services and slow local and regional development, this study assesses the expansion of the nonbanking financial institution (NFI) sector in recent years; it also briefly reviews innovative financial models designed to improve access to and the relevance of financial services for people previously not participating in regulated financial systems. These investigations give indications of successful ways to promote social inclusion and its benefits.

With the emergence of digital financial technologies and the societal challenges created by globalization, it is essential to develop economic models that improve economic, social, and financial inclusion (Bartels 2017; Shaikh et al. 2017; Nanda and Kaur 2016). Digitization has had an extraordinary impact on the development of financial innovations, especially in the fields of digital finance and financial institutions in virtual space, namely, the so-called "virtual banks" (Pennathur 2001).

Artificial intelligence is another major development in the financial field. This is a natural trend because many jobs in the financial banking system are repetitive in nature, and robotic components have already been widely implemented (e.g., processing/transfer of payments online or via ATM, account opening/closure). At the same time, the financial technology (FinTech) concept has created new opportunities for clients, such as rapid identification of individual businesses and high flexibility of the award conditions (Kuo Chuen and Teo 2015). This financial revolution has expanded the context of microfinance and provided digital opportunities for the next generation (Arp 2018; Arp et al. 2017). The emergence of new systems such as GlobTech—an innovation of global economic and financial technologies, as well as FinTech—blockchain technology has stimulated confidence in new concepts and financial products (Manta 2017). Financial regulators are also tightening measures against money laundering and terrorism financing.

#### **2. Literature Review**

The current financial reality requires assessing how existing financial systems adapt to new technological trends. In general, this new reality is reflected in new business concepts, such as FinTech. In order to adapt to these challenges, users need to understand the major technological trends, including cryptocurrencies, artificial intelligence, blockchain, and database management, from both the business and regulatory points of view. We should also understand how to analyze and evaluate technological innovation in finance, and how new technologies impact economies, markets, companies, and individuals (Sanicola 2017; Zhang et al. 2019).

The existing literature uses the term FinTech, generally defined as "a new financial industry that applies technology to improve financial activities" (Schüffel 2017; Government and Economy 2018). FinTech introduces new approaches to financing applications, processes, products, and business finance; it emerged from the need to optimize the funding process through technology and is made up of one or more complementary financial services, provided as an end-to-end process via the Internet (Australian Government 2010; BRW 2014; Sanicola 2017). Financial technology has been used to automate insurance, transactions, banking, and risk management (Aldridge and Krawciw 2017; Wesley-James et al. 2015). Financial services supporting social and financial inclusion can be provided by innovative financial service providers, by authorized banking institutions, or even by insurers. Delivering financial services to beneficiaries is currently possible through online platforms built on financial-banking programming interfaces, supported by regulations such as the European Payment Services Directive (Scholten 2016).

In trading on the capital markets, innovative electronic trading platforms have emerged as a result of the optimization of the financing process, both in terms of time and in the operational management of financial transactions in the virtual space. Social trading networks allow investors to observe the trading behavior of traders and to follow their investment strategies on financial markets (Voronkova and Bohl 2005). The platforms require little knowledge about financial markets and have been described as "a cheap, sophisticated alternative to traditional wealth managers", according to the World Economic Forum (McWaters 2015). Moreover, given the standardization of repetitive activities, a new class of robo-advisers has emerged: automated financial advisers that provide online investment management with minimal to moderate human intervention (Lieber 2014; Redheering 2016).

In less than a decade, global investment in financial technology increased by more than 2200%, from \$930 million in 2008 to more than \$22 billion in 2015 (Accenture 2019). The financial technology industry has also seen rapid growth in recent years: according to the mayor's office in London, 40% of London's workforce is engaged in financial and technological services (Accenture 2019). Accenture, a leading global professional services company, analyzed data obtained from CB Insights, a global venture-finance data and analytics firm, and found that global investments in FinTech increased to \$55.3 billion in 2018, with investments in China taking the lead. The total value of transactions doubled, while the number of transactions grew by close to 20%. Deal values in the United States and the United Kingdom increased by 46% and 56%, respectively; fundraising also rose in Canada, Australia, Japan, and Brazil (Accenture 2019).

The significant increase was largely due to a ninefold rise in the value of transactions in China, to \$25.5 billion, nearly as much as the \$26.7 billion of all FinTech investments globally in 2017. China accounted for 46% of all FinTech investments in 2018. More than half of China's FinTech investment came from Ant Financial's \$14 billion financing round in May; the company manages the largest money market fund in the world but is best known for its Alipay mobile payment service. In Europe, \$1.5 billion was invested in financial technology companies in 2014: London-based companies received \$539 million, Amsterdam-based companies \$306 million, and Stockholm-based companies \$266 million. FinTech deal activity in Europe reached a five-quarter high, rising from 37 deals in the fourth quarter of 2015 to 47 in the first quarter of 2016 (Wesley-James et al. 2015). With the exit of Great Britain from the European Union, Lithuania is becoming a Northern European center for financial technology companies; it has issued 51 FinTech licenses since 2016, 32 of them in 2017 (Government and Economy 2018; Financial Stability Board 2019). FinTech companies in the United States raised \$12.4 billion in 2018, an increase of 43% over 2017 (Kauflin 2019).

The idea of "financial inclusion" has gained acceptance and importance since the early 2000s, when exclusion from financial services was identified as directly correlated with poverty (Armenion 2016; Scholten 2016; Sarma and Pais 2011). In 2018, it was estimated that over 2.2 billion working-age adults globally did not have access to financial services provided by regulated financial institutions. For example, in Sub-Saharan Africa, only 24% of adults have a bank account, even though the formal African financial sector has grown in recent years (Sanicola 2017). The United Nations (UN) defines the objectives of financial inclusion (McWaters 2015) as access, at a reasonable cost, for all households to a complete range of financial services, including savings or deposit, payment, credit transfer, and insurance services. The UN, through partnerships with financial institutions, supports the financial inclusion of the many in need. It has adapted and developed personalized financial products for the poor and has promoted integrated financial education programs on innovative financial services and products, which strengthen knowledge of financial services, in particular by involving women. The UN financial inclusion product is funded through the United Nations Development Programme (Accenture 2019).

Financial exclusion is a major form of social exclusion, preventing individuals or social groups from gaining access to the formal financial system (Sarma 2008). Those who promote financial inclusion argue that financial services have positive effects when more investors and businesses are involved (Chakelian 2016; UNDP 2012; Williams-Grut 2015). This is also confirmed by the regulatory policy on NFIs, which supports financial products and services aimed directly at social inclusion. Through direct involvement in social inclusion and a sense of belonging in their community, firms can benefit in different ways. For example, Dunbar et al. (2019) found that enterprises open to corporate social responsibility (CSR) are likely to gain reputational intangibles with risk-reduction incentives at the managerial level (Dunbar et al. 2019; Liu et al. 2017). However, there is also skepticism about the effectiveness of financial inclusion initiatives (Schüffel 2017). Research on microfinance initiatives indicates that broad credit availability for microenterprises can produce informal intermediation, a form of unintended entrepreneurship (Aldridge and Krawciw 2017). Nevertheless, from a contingency perspective, firms undertaking CSR activities see more investment opportunities, and stronger governance secures their financial services within an all-inclusive financial system, which, in return, improves future social responsibility (Ikram et al. 2019).

Clearly, at a global level, NFIs play an important role for nonbank customers. Through innovative financing instruments, the financial regulations governing their activity, and their programs and funding models, nonbanking financial institutions will better support those who are financially excluded, for instance, small and marginal farmers and certain social groups (Sunstar Philippines 2016). In Romania, there is an open approach to digital financial technologies, but several steps are necessary before these new technologies can be adopted. The first step is to promote knowledge of the advantages of FinTech. This research provides further analysis of the network of nonbanking financial services, as well as indicators of its development in Romania. Using a case study, this paper identifies specific assumptions and criteria for the design and development of a new microfinance model for rural areas.

#### **3. Methodology**

#### *3.1. Descriptive Data and Statistics*

In an increasingly dynamic financial services industry, banking institutions, NFIs, and other digital financial institutions (FinTech) form the essential parts of a solid and stable financial system. These financial institutions were initially established in different financial sectors; however, they need to be merged into a well-coordinated and complementary system. In some countries, the banking system dominates, whereas in others, nonbank financial institutions (including digital ones) provide an alternative and a complement to the banking system, creating easier access to finance for businesses and households. A range of financial products and services is currently provided by banks, nonbank financial institutions, and many other organizations, including insurance, leasing, factoring, and venture capital companies, mutual funds, and pension funds. The ratio of stock market capitalization to banking system assets is very high in most economically advanced countries.

In order to meet the challenge in national financial systems, we constructed the NFI indicators of Romania for 2007–2017 (Ministry of Public Finance 2018; National Institute of Statistics 2014). Data sources include Romania's statistical yearbook, national accounts, national financial accounts, National Bank of Romania statistics, and the financial statements of NFIs and the Financial Supervisory Authority (FSA). The absence of complete NFI data required the use of correlation coefficients between the financial sectors and subsectors: investment funds other than money market funds, other financial intermediaries excluding companies' insurance and pension funds, and financial auxiliaries.

#### *3.2. Indicators and Models*

Explanations regarding the indicators used in our research are summarized below in Table 1.



The asset formation indicator (IF) highlights the financial effort of an NFI to build up total assets, and financial assets in particular, and can help differentiate between organizations in terms of asset formation. It allows for assessment of the effectiveness of a policy for mobilizing financial resources and of the management of their allocation by type of financial asset (investment) in an economic analysis. In this study, IF created possibilities for correlation with performance and development indicators, as well as openings for interpreting the penetration potential of NFIs in their specific financial markets.

The asset use indicator (IU), also referred to as the asset rotation indicator, measures both the economic profitability of NFIs and the efficiency of asset use, independently of the financial structure, the fiscal policy taxing profit, and exceptional items. IU highlights the overall, generative relationship between the balance sheet and the results, a relationship dependent on the set of internal and external factors influencing the activity of NFIs, including the managerial factor. It provides a quantitative approach to the interaction between the financial variables of NFIs, the elements of the asset base, and the elements of total productive results. This indicator could be useful for building a synthetic financial assessment indicator of NFIs.

The NFI sector expansion indicator (IS) highlights the importance of the NFI sector in the economy and its share, through inclusion, in the financial sector in terms of financial potential. It quantifies the extent of the NFI sector, offering the possibility of evaluating the sector's connection to the national economy as a whole by processing information additional to the financial sector components of the financial markets. In economic analysis, this indicator has the advantage of allowing correlation with efficiency, performance, robustness, and stability indicators. It provides indirect information on the interconnection between the result component (efficiency and performance) and the potential component (robustness and stability) of the NFI sector. Matrix integration of the indicator into the network of interactions between the evaluation indicators can be used to model their dynamic codeterminations and to construct a synthetic indicator for the financial evaluation of NFIs.

The debt sustainability indicator (IT) expresses the ability of NFIs to cover the servicing of their debt from the revenue generated by their financial investments; it indirectly expresses the borrowing capacity of an NFI and, at the same time, its margin of maneuver in structuring financial debt. It allows for a correlated analysis of financial leverage, performance, and debt recovery indicators, providing information on the potential for improving the market position of NFIs.

#### 3.2.1. Credit Constraints Model (Model 1)

In this subsection, we propose four indicators to evaluate financial system developments. Our choice of the indicators is in accordance with the literature and also takes into account the fact that microfinance and other financial innovations are still in their infancy in Romania. The input variables were obtained by generalizing and integrating items in the balance sheet and NFI result account, representing the state and, sometimes, dynamics of financial aggregates. The indicators were calculated from primary indicators and represent absolute values of input variables. Relative weights of the calculated indicators are determined on the basis of the logical conditioning, causal correspondence, absolute values of input variables, or comparability of data in terms of content, coverage, processing methodology, units of measure, and sources of information. The primary and calculated indicators used in this paper are expressed in absolute terms and listed in Table 2.


#### **Table 2.** Terms of primary and calculated indicators.

3.2.2. Network Model of Nonbank Financial Institutions (Including Digital Ones)

To investigate the flexibility of the nonbanking financial system, the proposed model intuitively represents a real financial system while retaining the essential features relevant to modeling its impact. Based on a few exogenous parameters, the simulation must represent the financial system, allow study of the system's resilience to shocks, and indicate how that resilience depends on the key parameters of the system.

The structure of the financial system is based on two exogenous parameters describing a random graph: the number of nodes representing nonbanking/digital financial institutions (*N*) and the probability (*pij*) that NFI *i* is linked to NFI *j*, assumed equal for all pairs of nonbanking/digital financial institutions. The graph was simulated according to the specified parameters; Figure 1 shows the resulting number of links *Z* for a representative network of nine NFIs.

**Figure 1.** The representation of the network of nine NFIs. Source: Otilia Manta, based on her own contribution.

For any representation of the random graph, the balance sheet of individual nonbank financial institutions was completed in a consistent manner with the level of the financial institution and the aggregate balance sheet identity. For the purpose of the detailed description, the following notation was inserted for clarity. The small letters are used for variables at the level of the individual NFI, the big letters for the aggregates, and the Greek letters for the rates.

An individual NFI's assets, denoted by *a*, include external assets, denoted by *e*, and interbank assets, denoted by *i*. Thus, for NFI *i*, we have *ai* = *ei* + *ii*, where *i* = 1, ... , N.

An NFI's liabilities, denoted by *l*, consist of the net assets of the NFI denoted by *c*, customer deposits denoted by *d*, and borrowings between financial institutions denoted by *b*. Thus, for NFIs, we have *li* = *ci* + *di* + *bi*, where *i* = 1, ... , N. According to the NFI balance, we have *ai* = *li*, for *i* = 1, ... , N.

The asset side of the NFI balance sheet, as well as the interinstitutional/interbank borrowing received (*b*) on the liabilities side, is thus determined. The determination of the two remaining components on the liabilities side, the net assets (*c*) and the deposits (*d*), is relatively direct. The net assets are set as a fixed proportion (γ) of total assets at the level of the NFI, *ci* = γ × *ai*. Customer deposits are derived as the residual of the NFI balance identity (i.e., *di* = *ai* − *ci* − *bi*).

The network model of nonbank financial institutions, together with each component of the NFI balance sheet, completes the construction of the system of nonbank financial institutions, supporting financial inclusion at national and global levels. The financial systems constituted by NFIs can be simply described by the following set of architectural parameters (γ, *p*, *N*, and *E*), where γ signifies net assets as a percentage of total assets, *p* is the probability that any two nodes or financial institutions are connected, *N* is the number of NFIs, and *E* is the total external assets of the financial institutions.
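The balance-sheet construction described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: the even split of external assets across institutions and the unit size of each interbank loan are simplifications introduced here; only the parameters (γ, *p*, *N*, *E*) and the identities *ai* = *ei* + *ii*, *li* = *ci* + *di* + *bi*, *ci* = γ × *ai*, and *di* = *ai* − *ci* − *bi* come from the text.

```python
import random

def simulate_nfi_network(N=9, p=0.3, gamma=0.1, E=1000.0, seed=42):
    """Simulate a random network of N NFIs (Erdos-Renyi graph with link
    probability p) and fill in balance sheets consistent with the
    identities in the text. gamma is net assets as a fraction of total
    assets; E is total external assets, here split evenly (a
    simplification not specified in the paper)."""
    rng = random.Random(seed)
    # Directed interbank links: edge (i, j) means NFI i lends to NFI j,
    # each pair linked with the same probability p.
    links = [(i, j) for i in range(N) for j in range(N)
             if i != j and rng.random() < p]
    unit = 1.0  # size of each interbank loan (illustrative choice)
    ib_assets = [0.0] * N   # i_i: interbank assets of lender i
    ib_borrow = [0.0] * N   # b_i: interbank borrowing of borrower j
    for i, j in links:
        ib_assets[i] += unit
        ib_borrow[j] += unit
    e = E / N  # external assets per institution
    sheets = []
    for k in range(N):
        a = e + ib_assets[k]        # a_i = e_i + i_i
        c = gamma * a               # c_i = gamma * a_i
        d = a - c - ib_borrow[k]    # d_i = a_i - c_i - b_i
        l = c + d + ib_borrow[k]    # l_i = c_i + d_i + b_i
        sheets.append({"assets": a, "net": c, "deposits": d,
                       "borrowed": ib_borrow[k], "liabilities": l})
    return sheets

sheets = simulate_nfi_network()
# The balance identity a_i = l_i holds for every NFI by construction.
assert all(abs(s["assets"] - s["liabilities"]) < 1e-9 for s in sheets)
```

Because deposits are derived as the balance-sheet residual, the identity *ai* = *li* holds exactly for any realization of the random graph, which is the consistency property the model relies on.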

Another financing model, complementary to the network model, is the microfinance model (Manta 2018). The construction of the microfinance model described here was based on research tools such as interviews and questionnaires covering over 15,000 nonbank users in rural Romania. The research was carried out in eight designated development regions in 2012–2016. The work included testing the microfinance model as a basis for social and financial inclusion, as well as for facilitating entrepreneurship in rural areas of Romania.

#### **4. Results**

In this section, we present the situation of the main indicators of the NFIs, as well as a microfinance model developed from a conceptual and testable point of view in the rural area for the Romanian market.

#### *4.1. Empirical Evaluation of NFI (Calculation of Indicators)*

The calculation of NFI values was carried out for the period 2007–2017, for which consistent data are available in Table 3. Sources of data used have been previously listed; however, in some cases, the absence of complete NFI data required the use of correlation coefficients, as explained at the end of Section 3.1.


**Table 3.** Absolute values of primary indicators (million RON <sup>1</sup>).

Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

<sup>1</sup> RON is the national currency of Romania; the average annual rate for 2019 is 1 Euro to 4.7383 RON, according to the National Bank of Romania.

#### 4.1.1. Relevant Indicators for NFI Sector Development

Indicator 1: Asset Formation Indicator (IF)

Formula:

IF = CHN/AFN = total financial expenses/total financial assets

Values of the asset formation indicator (IF) were calculated and are reported in Table 4.

**Table 4.** Asset formation indicator (IF) values for the period 2007–2017.


Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

Over the analyzed period, the increased effort of NFIs was reflected in the expenditures incurred to build up financial assets: the expense of setting up one unit of financial assets increased. This highlights the effects of the financial crisis, which intensified competition in the NFI product market, as well as an ineffective policy for mobilizing and allocating financial resources. The crisis not only caused the collapse of financial institutions but also impaired global credit markets and required intensive government intervention in NFIs (Li et al. 2018). This trend affected the performance of NFIs and their development, robustness, and stability.

Indicator 2: Asset Use Indicator (IU)

Formula:

IU = VTN/AFN = total income/total financial assets

Values of the asset use indicator (IU) were calculated and are reported in Table 5.


**Table 5.** Asset use indicator (IU) values for the period 2007–2017.

Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

The evolution of this performance indicator reflects both asset efficiency and overall economic efficiency. Its evolution was influenced by factors internal and external to NFIs, especially the managerial factor. The slight deterioration of the indicator's value is due to the offsetting effects of related indicators and the complementary impact of the asset items in the NFI portfolio.

Indicator 3: NFI Sector Expansion Indicator (IS)

Formula:

IS = AFN/GDP = total financial assets/GDP

Values of the NFI expansion indicator (IS) were calculated and are reported in Table 6.


**Table 6.** NFI expansion indicator (IS) values for the period 2007–2017.

Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

The evolution of this development indicator reflects the growth of the NFI sector in terms of financial assets, driven both by GDP growth and by an absolute increase in the sector's financial assets. This development highlights the potential for increased sector participation in GDP formation, as well as the relative disconnection of NFI development from efficiency and performance indicators. The indicator also reveals the effects of inadequate portfolio management and policy, as well as the deterioration of the financial situation.

Indicator 4: Debt Sustainability Indicator (IT)

Formula:

IT = DFN/VTN = total financial debt/total income

Values of the debt sustainability indicator (IT) were calculated and are reported in Table 7 and Figure 2.

**Table 7.** NFI debt sustainability indicator (IT) values for the period 2007–2017.


Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

**Figure 2.** Debt sustainability values of the NFI (IT) sector.

The continuous decline in the value of this indicator over the period 2007–2011 highlights the reduced capacity of financial investments to support, through the total revenues generated, the servicing of financial liabilities. The insufficient exploitation of leverage potential, determined by the increase in equity and income, reveals the inability of asset management to capitalize the asset portfolio profitably. In 2014–2017, the IT indicator increased, reflecting an increase in investment capacity. The indicator values for the full 2007–2017 period are summarized in Table 8.


**Table 8.** Values of nonbank financial institutions' indicators for the period 2007–2017.

Source: values based on financial statements of NFI and national financial accounts for 2007–2017 (Ministry of Public Finance).

The evolution of the indicators of nonbank financial institutions at the national level in Romania shows how these institutions can be involved in financing the economic activities of nonbanked populations. Moreover, the working hypothesis that nonbank financial institutions become a direct source involved in the inclusion process is confirmed by the financial indicators calculated here.
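For concreteness, the four ratios defined above can be expressed as a small helper function. This is a sketch only: the argument names follow the primary indicators (CHN, VTN, AFN, DFN), and the sample figures are illustrative, not drawn from Table 3.

```python
def nfi_indicators(chn, vtn, afn, dfn, gdp):
    """Compute the four NFI evaluation ratios from primary aggregates.

    chn: total financial expenses; vtn: total income;
    afn: total financial assets; dfn: total financial debt;
    gdp: gross domestic product (all in the same currency unit).
    """
    return {
        "IF": chn / afn,  # asset formation: expense per unit of assets
        "IU": vtn / afn,  # asset use: income per unit of assets
        "IS": afn / gdp,  # sector expansion: assets relative to GDP
        "IT": dfn / vtn,  # debt sustainability: debt per unit of income
    }

# Illustrative figures only (million RON); not Romania's actual data.
ratios = nfi_indicators(chn=1200.0, vtn=2500.0, afn=30000.0,
                        dfn=9000.0, gdp=850000.0)
```

Reading the results is direct: for these sample inputs, each RON of financial assets costs 0.04 RON to constitute (IF) and yields about 0.083 RON of income (IU), the sector holds assets worth about 3.5% of GDP (IS), and 3.6 years of total income would be needed to cover total financial debt (IT).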

#### *4.2. The Microinnovation and Entrepreneurship (MIT) Model*

After three years of research, documentation, forecasting, testing, and implementation, we propose a microfinance model, the MIT. This model not only meets international standards and practices but also responds directly to the needs of entrepreneurs in rural areas of Romania.

In Romania, there are over 3.5 million small firms, peasant farms, and other types of household businesses that need access to financing. Moreover, according to the Romanian National Statistics 2014, 5.5 million people live in relative or absolute poverty. Our field interviews showed a strong inclination among the younger generation to start their own businesses: the population aged between 16 and 24 numbers 420,000, and the unemployment rate among them is over 20%. Following the 2008 financial crisis, banks considerably reduced their rural branch networks, making the scarcity of microfinance more acute. It is therefore vital for the economic development of rural Romania to form new types of financial institutions and offer innovative financial products. The following observations and considerations demonstrate the need to develop the MIT model.

The nonbanking financial sector has grown as a result of legislative interventions between 2007 and 2015. Applied research has been undertaken on microfinance in rural areas, including analyzing and testing the entrepreneurial capacity to set up microfinance institutions there. Alongside the impact of financing policies on Romanian rural areas, there is also considerable scope for setting up a network of 1580 microenterprises specialized in rural microfinance services for communes with more than 3000 inhabitants. Furthermore, past and current agricultural credit, viewed in terms of the number of Romanian communes classified by number of inhabitants, indicates the potential direct beneficiaries of microfinance.

The MIT model, which incorporates elements from each of the microfinance models, is considered the most applicable to the rural area of Romania. The MIT model takes into account traditional instruments and products that have been transposed to beneficiaries through microfinance instruments using financial technologies. The program in this model is designed especially for the small entrepreneur from Romanian rural areas, following the test for entrepreneurial capacity, acquisition of competencies, and coordinated implementation. In addition, a network with an architecture of financial flows and stocks (transmission mechanism, flow, transmitters, and receivers) is clearly identified in the MIT model. A detailed description of the model development is beyond the scope of this paper. For further detail, see Manta (2018); however, the features of the model are summarized in Figure 3.

**Figure 3.** The microfinance entrepreneur model.

The emergence of new systems such as GlobTech and FinTech has stimulated confidence in new concepts of financial products from microfinance institutions (i.e., microfinance enterprises (MSMs)). The beneficiaries of microfinance products include entrepreneurs in rural areas, FSZ—semisubsistence agricultural farms, TA—young entrepreneurs, and the financially excluded (the poor). The funding in this system could be sourced from (1) FT—traditional donors (banks, NFIs, international financial institutions, etc.); (2) FI—investment funds; (3) DG—government donors (government, government agencies, etc.), especially in the case of programs stimulating lending to the population through zero-interest social credit, with a major impact on financial support, financial education, social integration, fiscal consolidation, and sustainability in the rural environment; and (4) DNG—nongovernmental donors (international and national associations, foundations, etc.).

Microfinance products and tools related to the model were distributed to recipients through classic channels (the network of 1580 MSMs) and/or current digital financial platforms. From 2017 onwards, microcredit products and fund transfers have been the tools/products most easily handled by current FinTech technology. These include


Support services provided by MSM in the MIT model include


• ASS—other support services.

Just as the family is the basic cell of society, the small entrepreneur (the SME and/or microenterprise) is the basic cell of the rural economy, and this basic-unit concept determined the emergence of the MIT model. The procedure for setting up an MSM is the same as for any limited liability company (LLC), here referred to as the specialized microfinance company, MSM SRL. In 2017, an MSM SRL could be established within three working days of submitting the file to the Trade Registry in the area where the company's registered office is located. The steps to set up an MSM-type firm in 2017 are listed below:

(1) Select and reserve the business name, with MSM specially mentioned. To save time, the entrepreneur can prepare at least three names when checking name availability. The National Trade Register Office charges a fee for the registration procedure. A specialized MSM SRL must include the name of its locality in its name, for identification within the network.

(2) Establish the main object of activity. Firms specializing in microfinance can use CAEN code 6612, Financial intermediation activities, and/or other specific CAEN activity codes as their core code. A lawyer should verify all CAEN codes.

(3) Deposit the social capital with the bank, which will also serve as the treasury bank. The minimum social capital for an LLC is 200 RON (equivalent to 45 EUR) and must be deposited in the company's account with a commercial bank.

(4) Set up the registered office. The law requires an MSM to have an active working place. Eligible places include the company owner's personal property (sales contract or heir certificate needed as proof), rented spaces (rental or sublease agreements registered with the territorial fiscal units needed as proof), spaces with commodity contracts (commodity contract, usufruct), or leased real estate (real estate leasing contract needed as proof).

(5) Draft the company charter. Prepare the document of the specialized MSM SRL following the standards, including all the specific clauses for a limited liability company.

(6) Prepare the entrepreneur's own declaration, showing that he/she fulfills the legal conditions for the status of associate and/or administrator, and obtain the specimen signature.

#### **5. Discussion and Conclusions**

Globally, an enormous expansion of NFIs has occurred to meet the demand for financial services from huge populations that are currently inadequately served. This trend has been facilitated by new technologies and by international acknowledgement of the benefits of financial inclusion that participation in regulated financial systems brings. Concurrently, global investment in FinTech has increased dramatically. At the national level, it has been demonstrated that the financial development of the NFI sector has a relevant impact on the long-term performance and growth of the economy, measured by factors such as the size, depth, access, efficiency, and stability of the NFI sector and the financial system. As part of this development, we have proposed the network model of NFIs, based on the indicators calculated by the authors, while the new microfinance model is oriented towards improving the social performance of entrepreneurs in rural Romania and is explained in terms of defined assumptions, identified needs, model components, and interrelationships. A short, simple process for entering the program is also outlined.

The development of NFIs was evaluated using indicators that express the weight of the sector's financial potential (assets) in the macroeconomic or macrofinancial situation, as well as the financial structure of the sector and its institutions. By calculating indicators for the nonbanking financial sector, this study has demonstrated the size of the NFI sector in Romania and indicated an early stage of development with considerable potential. We emphasize that financial system development is a process of consolidation and diversification of NFIs to provide services that meet the specific requirements of disparate customers (or economic units) in an effective and practical way. Once established, the provision of uninterrupted and unlimited services must be ensured by the relevant institutions.

Social and economic sustainability requires the creation of a social system that supports the objectives of raising real incomes, raising educational standards, and improving health and the quality of life. If development is restricted by resources, the priority should involve renewable natural resources, as well as respecting the limits of the development process, allowing for the fact that those limits may be adjusted by technology. The financial sustainability of the NFI sector is achieved when the levels and standards of NFI services are provided in line with long-term objectives, without increasing customer payments or reducing the quality of services.

Therefore, our work contributes to the scientific literature by establishing network operating mechanisms for financial institutions involved in supporting social and financial inclusion, as well as microfinance models designed and adapted to local specificities but structured on the basis of existing empirical research at a global level. In addition, our study presents an applicative direction, since the microfinance business model for Romanian rural entrepreneurs is currently testable at the level of NFIs. From an institutional decision point of view, the proposed network contributes to the financial inclusion of small entrepreneurs in rural areas of Romania. Furthermore, it can be used as a supporting instrument for the process of regulating financial services provided to beneficiaries through NFIs. In our future research, we will gather concrete evidence of the effects that these microfinance models have on entrepreneurs. We will also propose the development of innovative mechanisms and tools for the social and financial inclusion of small entrepreneurs through nonbanking institutions in the context of financial technologies, with a direct impact on the sustainability of small businesses in rural areas.

In summary, in many rural areas where people are financially excluded, the role of the financial network of NFIs is very important, with both social and financial impacts. Both banking and nonbanking institutions should view financial inclusion both as a business opportunity and as a social responsibility. This research has made a scientific contribution to financial models, providing indicators that could be used within financial banking institutions and a pragmatic package oriented to microfinance solutions for local entrepreneurs. This study can be extended in future research to present the global evolution of both the financial and social inclusion processes, as well as the implications for financial institutions that support this process.

**Author Contributions:** Conceptualization, O.M., X.-F.S., and X.-G.Y.; methodology, Y.C.; software, O.M.; validation, X.-G.Y. and N.D.; writing—original draft preparation, O.M. and X.-G.Y.; writing—review and editing, X.-F.S. and X.-G.Y.; funding acquisition, O.M.

**Funding:** This research received no external funding.

**Acknowledgments:** We deeply appreciate the three anonymous reviewers for their intuitive suggestions and constructive comments. We are grateful to the editors for their continued support and effort for our manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **AdTurtle: An Advanced Turtle Trading System**

#### **Dimitrios Vezeris \* , Ioannis Karkanis and Themistoklis Kyrgos**

COSMOS4U, 67100 Xanthi, Greece; g.karkanis@cosmos4u.net (I.K.); thkyrgos@kyrgos.gr (T.K.) **\*** Correspondence: d.vezeris@ioniki.net; Tel.: +30-254-120-0220

Received: 9 May 2019; Accepted: 5 June 2019; Published: 8 June 2019

**Abstract:** For this research, we implemented a trading system based on the Turtle rules and examined its efficiency when trading selected assets from the Forex, Metals, Commodities, Energy and Cryptocurrency Markets using historical data. Afterwards, we enhanced our Turtle-based trading system with additional conditions for opening a new position. Specifically, we added an exclusion zone based on the ATR indicator, in order to have controlled conditions for opening a new position after a stop loss signal was triggered. Thus, AdTurtle was developed: a Turtle trading system with advanced algorithms for opening and closing positions. To the best of our knowledge, this is the first time this variation of the Turtle trading system has been developed and examined.

**Keywords:** algorithmic trading; Stop Loss; Turtle; ATR

#### **1. Introduction**

Thanks to advances in technology, automated trading systems have become a tool frequently employed by institutional investors as well as individual day traders. In this research, we implemented such an automated trading system based on the Turtle trading strategy, and examined its behavior with exclusion barriers added to the Turtle's stop loss strategy in order to prevent a new position from opening right after a stop loss signal was triggered.

Every trade executed in a market carries an inherent risk of the asset's price moving in the direction opposite to the one originally anticipated, which could result in substantial losses. To mitigate this risk, various stop loss strategies have been introduced and examined over the years. In this paper, we examine the efficiency of sliding\* and variable\* stop loss strategies, combined with exclusion barriers based on the ATR indicator (a measure of the volatility of an asset's price), added to the automated trading system described above.

\* Sliding: the stop loss barrier slides in the same direction as the price as it moves to more profitable levels.

\* Variable: the width of the stop loss zone is adjusted using the latest ATR value.

#### **2. Materials and Methods**

#### *2.1. Related Work and Background*

Systems based on Donchian channels were examined—without parameter optimization—by (Beyoglu and Ivanov 2008), who delved—among other things—into two simple strategies based on Donchian channel breakouts. The first, with a 20-day period, and the second, with a 55-day period, were implemented on stocks picked using the CAN SLIM method. They found that these strategies yielded moderate results but also had small drawdowns, with the 20-day period securing higher profits, but also bigger drawdowns, than the 55-day period. (Jackson 2006) also used two systems based on Donchian channels, with a wide range of periods, to examine through statistical methods whether technical analysis systems in general can yield profits when trading on bond markets. He found technical analysis to be a profitable tool when trading bond futures.

Donchian channels have also been used instead of other indicators as inputs in machine learning systems for trading, as was done by (Fletcher et al. 2010). They used, among other things, the highest and lowest prices of various periods as inputs to a multiple kernel learning model and found them to be highly relevant when making predictions regarding the EURUSD exchange rate, provided they were used in combination with other indicators. (Chandrinos and Lagaros 2018) developed a Donchian channel breakout strategy on renko diagrams instead of price diagrams for Forex pairs, with parameters optimized over a 4-year period. The results of the optimized systems over a following 7-year period were extremely positive and promising, and the authors went on to use these systems to construct forex portfolios.

The ATR indicator has also been used as an input in machine learning systems for trading, as was done by (Ghosh and Purkayastha 2017), who concluded that the XGBoost model outperforms the Support Vector Machine and Random Forest models in predicting the possibility of profit for a stock on the National Stock Exchange of India. (Vanstone and Finnie 2006) also utilized the ATR indicator in order to show that artificial neural networks can be trained on the basis of technical indicators to identify stocks whose price will potentially rise significantly.

The ATR indicator has also been used in stop loss strategies. (Wilcox and Crittenden 2005) examined a strategy that entailed buying a stock at an all-time high, which we could argue is a form of a Donchian channel with a very high period, and using ATR to calculate exit price levels for selling it. These stop loss levels follow the price as it moves upwards after the long position has been opened. They prove that a (−10 × ATR) stop loss level generates favorable results on average and that, in general, trend following on stocks does have positive results and can be used as a building block for a trading strategy. (Gilligan 2009) used a similar system on stocks picked by the CAN SLIM method. As an entry point, he employed the breakout from a Donchian high barrier with a period of 20 weeks. Exiting when price becomes lower than a barrier set using some multiple of the ATR value turned out to generate optimal results in comparison with all the other exit strategies examined.

(Levene et al. 2014) have also used ATR to calculate barriers to entry as well as to exit a position, on selected stocks. This system outperformed a gap strategy which was examined and yielded worse results than a trend following system they also examined, but when used in a combined system with the other two, the final result was a robust and consistent system. A different method of using the ATR indicator, known as ATR Ratchet, was scrutinized by (Cekirdekci and Iliev 2010). In ATR Ratchet, the stop loss barriers calculated using the ATR value became progressively narrower over the days following the purchase of a stock. They concluded that the ATR Ratchet exit strategy caused a lot of premature exits because of the price fluctuations during the trading session, which resulted in low returns when this exit strategy was used.

The ATR indicator combined with the Turtle trading strategy was also used by (Swart 2016) for position pyramiding as well as calculating stop loss levels. Through back-testing, he also examined the behavior of different ATR periods and multipliers on a variety of assets, and came to conclusions similar to ours, namely that bigger ATR multipliers when calculating the stop loss price barriers generate better results than smaller ones.

Something that had not been examined prior to our research was the behavior of the sliding and variable ATR-based stop loss technique combined with exclusion barriers when used on an automated trading system based on a Turtle strategy, similar to the one described by (Vezeris et al. 2018b).

#### *2.2. Automated Trading Strategy Development*

#### 2.2.1. The Donchian Channels

Donchian Channels are barriers formed around the price series by the high and low prices over a past period. For a period of *n* hours, the upper line indicates the highest price during the past *n* hour period, while the lower line indicates the lowest price during that past *n* hour period. An example of Donchian Channels can be seen in Figure 1.
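As an illustration, the two Donchian lines can be computed directly from the rolling highs and lows. The following is a minimal Python sketch (ours, not the paper's Metatrader 5 implementation); the sample prices are hypothetical:

```python
def donchian_channel(highs, lows, n):
    """Donchian Channel: for each bar t, the upper line is the highest
    high and the lower line the lowest low of the previous n bars."""
    upper, lower = [], []
    for t in range(len(highs)):
        if t < n:                      # not enough history yet
            upper.append(None)
            lower.append(None)
        else:
            upper.append(max(highs[t - n:t]))
            lower.append(min(lows[t - n:t]))
    return upper, lower

# hypothetical hourly highs/lows
highs = [1.10, 1.12, 1.11, 1.15, 1.14, 1.13]
lows = [1.08, 1.09, 1.10, 1.11, 1.12, 1.10]
up, lo = donchian_channel(highs, lows, 3)
```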

**Figure 1.** Donchian Channels of period 20 on a EURUSD H1 chart.

#### 2.2.2. The Average True Range Indicator

The Average True Range Indicator measures the volatility of an asset's price over a previous period. True Range is defined as

$$TR = \max\left[(\text{high} - \text{low}),\ \mathrm{abs}(\text{high} - \text{close}_{\text{prev}}),\ \mathrm{abs}(\text{low} - \text{close}_{\text{prev}})\right]$$

and the Average True Range is calculated as

$$\text{ATR} = \frac{1}{N} \sum_{i=1}^{N} TR_{i}$$

for the initial value, and thereafter as

$$\text{ATR}_{t} = \frac{\text{ATR}_{t-1} \times (N-1) + TR_{t}}{N}$$

In the original Turtle trading rules, the ATR indicator was used as a measure of volatility in order to determine which markets to enter and what size the positions should have. Additionally, it was used in order to set a stop loss barrier for each position as well as a barrier at which the Turtle trading system would add to the initial position. In our research, our choice of markets was determined by other reasons (outlined in Section 2.3). The use of ATR here, in addition to the uses mentioned above, is for the purpose of adding exclusion barriers to the Stop Loss strategy of the basic trading system.
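The two ATR formulas above translate directly into code. The following Python sketch (ours, not the paper's Metatrader implementation) computes the initial simple average of the first N true ranges and then applies the recursive smoothing:

```python
def true_range(high, low, close_prev):
    # TR = max(high - low, abs(high - close_prev), abs(low - close_prev))
    return max(high - low, abs(high - close_prev), abs(low - close_prev))

def atr_series(highs, lows, closes, n):
    """First ATR value: simple average of the first n true ranges.
    Afterwards: ATR_t = (ATR_{t-1} * (N - 1) + TR_t) / N."""
    trs = [true_range(highs[i], lows[i], closes[i - 1])
           for i in range(1, len(closes))]
    atr = sum(trs[:n]) / n
    series = [atr]
    for tr in trs[n:]:
        atr = (atr * (n - 1) + tr) / n
        series.append(atr)
    return series
```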

#### 2.2.3. The Turtle Trading Strategy

The Turtle trading system was introduced by Richard Dennis and William Eckhardt during the 1980s, as described by (Curtis 2007). It utilizes trend following indicators, such as multiple Donchian Channels of high and low lines with different periods, to recognize and follow trends in an asset's price in order to enter or exit a position. The general rule is to open a long position when the price breaks above the high line of the past *n* days and hold the position until the price line breaks below the low line of the past *m* days, where *m* < *n*. The same rule applies to short positions: when the price line breaks the low line of the past *n* days, a short position is opened, and the position is held until the price line breaks the high line of a shorter *m*-day period.

A very important element in the trading system's profitability is the use of price levels to gradually invest more as the price continues to follow a trend. We did not implement hedging and non-hedging trading strategies as described by (Vezeris et al. 2018a). Instead, the Turtle trading system invests enough to risk losing only 4% of its account equity should the price move in the opposite direction by X × ATR(N) after a new position is opened, and it continues to add to the initial position, investing enough to risk losing an additional 4% of account equity, each time the price moves by Z × ATR(N) in the direction of the trend, where N is the period over which the ATR is calculated, X is a constant adjusting the width of the stop loss barrier, and Z is a constant adjusting the width of the new position barrier at which the trading strategy adds to the initial position. The maximum number of permitted additions to the initial position is four, for a maximum risk of 20% of account equity.
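The 4% risk rule can be expressed as a sizing formula: the volume is chosen so that an adverse move of X × ATR(N) loses 4% of equity. The following is a hedged Python sketch; the `point_value` conversion and all names are our assumptions, not the paper's:

```python
def units_for_4pct_risk(equity, atr_n, x, point_value):
    """Number of units such that a price move of X * ATR(N) against the
    position loses approximately 4% of account equity.

    point_value: account-currency loss per unit per 1.0 of price move
    (an assumption of this sketch; brokers define it per instrument).
    """
    risk_per_unit = x * atr_n * point_value
    return 0.04 * equity / risk_per_unit
```

With up to four additions to the initial position, the total equity at risk is capped at 20%, as stated above.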

To avoid opening positions of the same type during periods of high volatility, the period of the high or low line used to open a new position is extended when the new position is of the same type as the previous one and when the previous one was profitable. For example, if we entered a long position because the price broke the high (40) line and then exited that long position because the price broke the low (20) line, then in order to open a new long position the price would have to break a high (60) line and in order to close that new long position the price would have to fall below the low (30) line. This rule is applied only when the previous trade had a profitable result. If the previous position closed with a negative profit or because a stop loss signal is triggered, then a new position of the same type can open again with the same period of high or low line.
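The period-extension rule above can be condensed into a small helper (an illustrative Python sketch; *x* and *y* here denote the normal and extended opening periods, matching the high(40)/high(60) example in the text):

```python
def opening_period(x, y, prev_side, new_side, prev_profit, stop_loss_triggered):
    """Return the Donchian opening period for a candidate entry.

    The extended period y (e.g. high(60) instead of high(40) in the
    text's example) is required only when the new position has the same
    type as the previous one and that previous trade was profitable and
    was not closed by a stop loss; otherwise the normal period x applies.
    """
    if prev_side == new_side and prev_profit > 0 and not stop_loss_triggered:
        return y
    return x
```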

This means that we have a total of 4 high indicator lines:

- open high line
- open extended period high line
- close high line
- close extended period high line

and 4 low indicator lines:

- open low line
- open extended period low line
- close low line
- close extended period low line

There are also 4 different periods for these lines:

- *x*, the opening period
- *y*, the extended opening period
- *x*/*n*, the closing period
- *y*/*m*, the extended closing period

The various lines the Turtle trading strategy uses can be better seen in Figure 2.

In addition to the rules above, the Turtle trading system has a stop loss strategy on its own. It is based on the sliding and variable ATR Stop Loss zone described in Section 2.2.4, but without the exclusion barriers.

For our research, we chose to use the above automated trading strategy in the hourly timeframe (H1), as we wanted to examine the profitability of the system using exclusion barriers in combination with the Stop Loss strategy in a High Frequency Trading mode.

From this point onwards, we will refer to the abovementioned basic trading strategy as "classic Turtle expert advisor".

**Figure 2.** The lines used by the Turtle indicator as described above, on a EURUSD chart with *x* = 24, *y* = 60 and *n* = *m* = 2.

2.2.4. The Turtle Expert Advisor Combined with the ATR Indicator for the Stop Loss Strategy

The ATR indicator is combined with the Turtle expert advisor so that stop loss barriers can be created in addition to the barriers used by the Turtle expert advisor itself. For example, when a new long position is opened then a stop loss barrier can be set at

Stop Loss Barrier = Opening Price − X × ATR(N)

or when a new short position is opened a new stop loss barrier can be set at

Stop Loss Barrier = Opening Price + X × ATR(N)

where X is a constant adjusting the width of the stop loss zone and N is the period over which the ATR is calculated.

As the price moves to more profitable levels after a position is opened, it could be smart to slide the ATR stop loss window accordingly, so that some of the profits are secured in the event of a downtrend.

Therefore, each time the price breaks out of the ATR stop loss zone towards a profitable direction, the ±X × ATR(N) stop loss zone is redrawn around that price.

The ATR stop loss zones have a width that is determined by a constant, as well as by the ATR value at the time the position was opened and the zone was created. However, volatility (and consequently the value of ATR) can change over the time a position is held, so it is meaningful to adjust the width of the stop loss zone using the latest ATR value for each timeframe.

Therefore, in addition to sliding, we can allow the ATR window to change its width as well, using the most recent ATR value for each timeframe.
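Both behaviors, sliding and variable width, can be combined in one update step. This is a simplified Python sketch of the zone logic described above (the function and variable names are ours):

```python
def update_stop_zone(side, base_price, price, atr_latest, x):
    """Sliding + variable ATR stop loss zone.

    The zone is base_price +/- X * ATR; when the price breaks out of the
    zone in the profitable direction, the zone is redrawn around that
    price (sliding). The width always uses the latest ATR (variable).
    Returns (new_base_price, stop_loss_level).
    """
    width = x * atr_latest
    if side == "long":
        if price > base_price + width:   # profitable breakout: slide up
            base_price = price
        return base_price, base_price - width
    else:
        if price < base_price - width:   # profitable breakout: slide down
            base_price = price
        return base_price, base_price + width
```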

Moreover, ATR can form zones of exclusion after a position is closed due to a stop loss, so that new positions are not immediately reopened, especially in periods of high volatility. Thus, after a position is closed, in order to open a new long position the price must rise above

New Position Barrier = Closing Price + Y × ATR(N)

and in order to open a new short position the price must drop below

New Position Barrier = Closing Price − Y × ATR(N)

where Y is a constant for adjusting the width of the new position barrier.
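The exclusion check then reduces to a comparison against these barriers. A minimal Python sketch of the entry filter (names are ours):

```python
def entry_allowed(side, price, closing_price, atr_n, y):
    """After a stop loss closes a position, entries are blocked inside the
    exclusion zone closing_price +/- Y * ATR(N): a new long requires the
    price to rise above the upper barrier, a new short requires it to
    drop below the lower barrier."""
    upper = closing_price + y * atr_n
    lower = closing_price - y * atr_n
    return price > upper if side == "long" else price < lower
```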

Flowcharts of the advanced Turtle Expert Advisor can be seen in Figures 3–6.

**Figure 3.** The Turtle basic strategy, showing when a position is opened or closed based on the open high or low lines as described in the sections above.

**Figure 4.** Initialization process after opening the first position. N\_ATR is the ATR value at the time the position is opened, X is a constant adjusting the width of the stop loss zone and Z is a constant adjusting the width of the new position zone at which the trading strategy adds to the initial position.

#### *2.3. Data and Implementation*

To examine the stop loss strategy described above we used the Metatrader 5 trading terminal by Metaquotes to conduct the back tests and Microsoft SQL Server to store and initially process the results.

We experimented on a total of 8 assets, namely AUDUSD, EURUSD, GBPUSD, USDCHF, USDJPY, XAUUSD, OIL and BTCUSD, over a six-month period, from 3 September 2017 to 24 February 2018, with data from ForexTime, GEBinvest and OctaFX. We chose these assets because we wanted to examine the performance of the ATR Stop Loss strategy in High Frequency Trade mode (hourly timeframe) on markets that trade globally on a 24-hour basis, or close to it. This is also the reason for not choosing assets from categories such as equities or rates that were also traded in the original Turtle trading system. As these automatic trading systems would be used in High Frequency Trading, we determined that a six-month testing period would be enough for the examined strategies to reveal their potential and for us to draw conclusions about their performance.

We set the initial capital for each test at \$10,000 for all assets except for BTCUSD, whose high price demanded a high margin, so in this case we set the initial capital at \$10,000,000 and adjusted the results to make them comparable to the other assets.

We also decided that it is best not to hold open positions over the weekend based on the results by (Vezeris et al. 2018b).

**Figure 5.** Recalculation and checking point of the stop loss and open new position barriers in order to close or open a position if needed.

**Figure 6.** Creating the exclusion zone after a stop loss was triggered and preventing the opening of a new position until the price level escapes the upper or lower exclusion barrier.

#### **3. Results and Discussion**

#### *3.1. Default*/*Selected Parameters*

Initially we set the frequency divisors for closing lines to the following values: *n* = 2 and *m* = 2, the ATR period to 24 and the constant parameter X of the stop loss barrier to 2 as per the Turtle original rules. We used three different sets of parameters for fast, medium and slow paced opening lines, that were selected randomly instead of using the d-Backtest PS method described by (Vezeris et al. 2016) that was later used by (Vezeris and Schinas 2018) to compare the performance of different automated systems. The parameters can be seen in Table 1. We then compared results from (a) the classic Turtle Expert Advisor with the default parameters of *x* = 24 and *y* = 60, (b) the classic Turtle Expert Advisor with the fast selected parameters, (c) the classic Turtle Expert Advisor with the medium selected parameters, and (d) the classic Turtle Expert Advisor with the slow selected parameters. The results of these four experiments can be seen in Table 2, Figures 7 and 8.

**Table 1.** Randomly selected parameters.



**Table 2.** Classic Turtle's gross profits, gross losses and net profits for experiments (a), (b), (c) and (d).

As Figures 7 and 8 illustrate, results from using the slow parameters are generally more profitable and have less drawdown compared with those using the default, fast or medium parameters.

From now on in our experiments, we will use our selected parameters. The pseudocode of the Advanced Turtle trading system can be found in Appendix A. Detailed results for the experiments that will follow can be found in Appendix B.

**Figure 7.** Net profits for experiments (a), (b), (c) and (d).

**Figure 8.** Drawdown as percentage of equity for experiments (a), (b), (c) and (d).

#### *3.2. Sliding and Variable ATR Zone*

Next, we examined (e) the stop loss strategy of the sliding and variable ATR zone, where a stop loss barrier is formed at ±X × ATR(N) after a new position is opened. We tested every combination of parameters for N: {12, 24, 36, 48} and X: {1, 2, 3, 4} for every set of the selected parameters and compared the results with those from (f) the stop loss strategy of the sliding and variable ATR zone as described in Section 2.2.4, where, apart from the stop loss barrier, an exclusion barrier is formed at ±Y × ATR(N) after a position is closed due to a stop loss being triggered. Again, we tested every combination of parameters for N: {12, 24, 36, 48}, X: {1, 2, 3, 4} and Y: {1, 2, 3, 4} for every set of the selected parameters. Figures 9–14 show the averages of profits and the averages of drawdowns of the assets for every combination of N, X and Y for the fast, medium and slow parameters, respectively.

**Figure 9.** Averages of profits between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure 10.** Averages of drawdowns as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure 11.** Averages of profits between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure 12.** Averages of drawdowns as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure 13.** Averages of profits between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

It can be observed that the advanced Turtle expert advisor performs better than the classic Turtle expert advisor. This is apparent in Figures 9 and 11 with the fast and medium parameters, but we can also clearly distinguish many peaks in Figure 13 with the slow parameters, where, with the addition of exclusion barriers, the Turtle expert advisor yields better results.

Additionally, a trend is evident, as profits tend to be higher for bigger values of X. Less drawdown also coincides with higher profits, as can be seen when contrasting Figures 9 and 10. Similar results can be observed by contrasting Figures 11 and 12, as well as Figures 13 and 14.

At this point, it is established that the Turtle expert advisor can benefit from the introduction of the sliding and variable ATR stop loss strategy combined with exclusion barriers. We also obtain better results with higher values of the stop loss multiplier X and the exclusion zone multiplier Y. Different values of the ATR period N do not seem to influence the results much, except for N = 12, which yields worse results than the other values of N examined.

**Figure 14.** Averages of drawdowns as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

#### **4. Exceptions**

We tried using extraordinary values of the stop loss multiplier X: {5, 6, 7} and examined the classic and the advanced Turtle using the slow parameters {80, 160} as they provided better results. Figure 15 shows the average of profits of the assets and Figure 16 the average of drawdowns of the assets for every combination of N, X and Y.

**Figure 15.** Averages of profits between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure 16.** Averages of drawdowns as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

Using extraordinary values for the stop loss multiplier X, the classic Turtle outperforms the advanced Turtle. Additionally, comparing Figures 13 and 15, profits are higher for higher values of X, which widen the stop loss zone and make the chances of a stop loss being triggered slimmer. In other words, the less our Turtle expert advisors use the ATR stop loss windows, the better the results are bound to be, and the exclusion zone becomes redundant.

#### **5. Conclusions**

In our research, we examined exclusion barriers added to a classic Turtle automated trading system used in trading 8 assets from the Forex, Metals, Commodities, Energy and Cryptocurrencies categories.

With this research, we concluded that an automated trading system based on a Turtle strategy can benefit from the introduction of exclusion barriers added to its stop loss strategy, for values of the stop loss multiplier X less than 5. Thus, the Advanced Turtle trading system, or AdTurtle, was developed.

On the contrary, in cases where extraordinary values of X (more than or equal to 5) are used, the Classic Turtle system turns out to be more profitable.

**Author Contributions:** Conceptualization, D.V.; Investigation, D.V.; Methodology, D.V.; Software, I.K. and T.K.; Supervision, D.V.; Visualization, I.K.

**Funding:** This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH—CREATE—INNOVATE (project code: T1EDK-02342).

**Acknowledgments:** We would like to thank the anonymous referees who carefully reviewed our paper and provided us with valuable insights and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

#### **Appendix A**

Pseudocode of the basic Turtle trading strategy as described in Section 2.2.3.

```
OnTick() {
    if(currentPosition == None) {
        if(previousPosition == None) {
            if(price > open_high_line) openLongPosition();
            else if(price < open_low_line) openShortPosition();
        }
        else if(StopLossTriggered == true) {
            if(price < high_Exclusion_Barrier && price > low_Exclusion_Barrier)
                return;
        }
        else if(previousPosition == Long) {
            if(lastProfit <= 0 || StopLossTriggered == true) {
                if(price > open_high_line) openLongPosition();
                else if(price < open_low_line) openShortPosition();
            }
            else {
                if(price > open_extended_period_high_line) openLongPosition();
                else if(price < open_low_line) openShortPosition();
            }
        }
        else if(previousPosition == Short) {
            if(lastProfit <= 0 || StopLossTriggered == true) {
                if(price > open_high_line) openLongPosition();
                else if(price < open_low_line) openShortPosition();
            }
            else {
                if(price > open_high_line) openLongPosition();
                else if(price < open_extended_period_low_line) openShortPosition();
            }
        }
    } // continues in next section ...
```

```
    else if(currentPosition == Long) {
        if(previousPosition == Long && lastProfit > 0 &&
           StopLossTriggered == false) {
            if(price < close_extended_period_low_line) closeLongPosition();
        }
        else {
            if(price < close_low_line) closeLongPosition();
        }
    }
    else if(currentPosition == Short) {
        if(previousPosition == Short && lastProfit > 0 &&
           StopLossTriggered == false) {
            if(price > close_extended_period_high_line) closeShortPosition();
        }
        else {
            if(price > close_high_line) closeShortPosition();
        }
    }
}
```
Pseudocode of the Turtle trading strategy initializing the barriers after opening the first position.

```
OnTick() {
    if(position_opened_for_the_first_time) {
        N_ATR = ATR(N);
        if(currentPosition == Long) {
            StopLossLine = openPrice - X*N_ATR;
            NewPositionLine = openPrice + Z*N_ATR;
        }
        else {
            StopLossLine = openPrice + X*N_ATR;
            NewPositionLine = openPrice - Z*N_ATR;
        }
        basePrice = baseOpenPrice = openPrice;
        openPositions = 1;
    }
}
```
*JRFM* **2019**, *12*, 96

Pseudocode of the Turtle trading strategy with the ATR Stop Loss strategy, as described in Section 2.2.4.

```
OnTick() { 
 if(positionIsOpened) { 
     N_ATR = ATR(N); 
     if(currentPosition == Long) { 
         StopLossLine = basePrice – X*N_ATR; 
         newPositionLine = baseOpenPrice + Z*N_ATR; 
         if(currentPrice <= StopLossLine) { 
             StopLossTriggered = true; 
                N_ATR = ATR(N); 
                CloseLongPositions(); 
                return; 
            } 
            if(currentPrice >= newPositionLine) { 
             basePrice = currentPrice; 
             baseOpenPrice = currentPrice; 
             if(openPositions < 5) { 
                 OpenLongPosition(); 
                  openPositions += 1;
                } 
            } 
        } 
 // continues in next section ...
```

```
else{ 
         StopLossLine = basePrice + X*N_ATR; 
         newPositionLine = baseOpenPrice - Z*N_ATR; 
         if(currentPrice >= StopLossLine) { 
             StopLossTriggered = true; 
                N_ATR = ATR(N); 
                CloseShortPositions(); 
                return; 
            } 
            if(currentPrice <= newPositionLine) { 
             basePrice = currentPrice; 
             baseOpenPrice = currentPrice; 
             if(openPositions < 5) { 
                  OpenShortPosition();
                  openPositions += 1;
                } 
            } 
        } 
    } 
    if(noPositionIsOpened) { 
     if(StopLossTriggered == true) { 
     high_Exclusion_Barrier = closingPrice + Y*N_ATR; 
      low_Exclusion_Barrier = closingPrice - Y*N_ATR;
        } 
    } 
}
```
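For readers who prefer a runnable form, the ATR computation and barrier placement in the pseudocode above can be sketched in Python. This is an illustrative approximation only (a simple average of true ranges and a hypothetical `barriers` helper), not the authors' actual implementation:

```python
def atr(highs, lows, closes, n):
    """Average True Range: mean of the true ranges over the last n bars.

    The true range of a bar is the largest of (high - low),
    |high - previous close|, and |low - previous close|.
    """
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-n:]) / n

def barriers(open_price, n_atr, x, z, long_position):
    """Return (StopLossLine, NewPositionLine), mirroring the pseudocode:
    long positions place the stop X*ATR below and the add-on line Z*ATR above
    the open price; short positions mirror the two lines."""
    if long_position:
        return open_price - x * n_atr, open_price + z * n_atr
    return open_price + x * n_atr, open_price - z * n_atr
```

For example, with an ATR of 1.5 and an open at 100, a long position with X = 2 and Z = 0.5 would place the stop at 97.0 and the add-on line at 100.75.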
## **Appendix B**

**Figure A1.** Net profits of AUDUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A2.** Drawdowns of AUDUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A3.** Net profits of EURUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A4.** Drawdowns of EURUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A5.** Net profits of GBPUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A6.** Drawdowns of GBPUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A7.** Net profits of USDCHF between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A8.** Drawdowns of USDCHF as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A9.** Net profits of USDJPY between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A10.** Drawdowns of USDJPY as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A11.** Net profits of XAUUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A12.** Drawdowns of XAUUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A13.** Net profits of OIL between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A14.** Drawdowns of OIL as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A15.** Net profits of BTCUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A16.** Drawdowns of BTCUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the fast parameters (20, 40).

**Figure A17.** Net profits of AUDUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A18.** Drawdowns of AUDUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A19.** Net profits of EURUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A20.** Drawdowns of EURUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A21.** Net profits of GBPUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A22.** Drawdowns of GBPUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A23.** Net profits of USDCHF between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A24.** Drawdowns of USDCHF as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A25.** Net profits of USDJPY between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A26.** Drawdowns of USDJPY as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A27.** Net profits of XAUUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A28.** Drawdowns of XAUUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A29.** Net profits of OIL between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A30.** Drawdowns of OIL as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A31.** Net profits of BTCUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A32.** Drawdowns of BTCUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the medium parameters (40, 80).

**Figure A33.** Net profits of AUDUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A34.** Drawdowns of AUDUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A35.** Net profits of EURUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A36.** Drawdowns of EURUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A37.** Net profits of GBPUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A38.** Drawdowns of GBPUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A39.** Net profits of USDCHF between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A40.** Drawdowns of USDCHF as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A41.** Net profits of USDJPY between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A42.** Drawdowns of USDJPY as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A43.** Net profits of XAUUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A44.** Drawdowns of XAUUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A45.** Net profits of OIL between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A46.** Drawdowns of OIL as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A47.** Net profits of BTCUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A48.** Drawdowns of BTCUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160).

**Figure A49.** Net profits of BTCUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A50.** Drawdowns of BTCUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A51.** Net profits of EURUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A52.** Drawdowns of EURUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A53.** Net profits of GBPUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A54.** Drawdowns of GBPUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A55.** Net profits of USDCHF between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A56.** Drawdowns of USDCHF as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A57.** Net profits of USDJPY between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A58.** Drawdowns of USDJPY as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A59.** Net profits of XAUUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A60.** Drawdowns of XAUUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A61.** Net profits of OIL between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A62.** Drawdowns of OIL as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A63.** Net profits of BTCUSD between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

**Figure A64.** Drawdowns of BTCUSD as percentage of equity between the classic and the advanced Turtle systems for every combination of N, X, Y for the slow parameters (80, 160), where X ≥ 5.

#### **References**


Wilcox, Cole, and Eric Crittenden. 2005. *Does Trend Following Work on Stocks?* Phoenix: Blackstar Funds, LLC.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Concept Paper*

## **Blockchain Economical Models, Delegated Proof of Economic Value and Delegated Adaptive Byzantine Fault Tolerance and Their Implementation in Artificial Intelligence BlockCloud**

**Qi Deng 1,2,3**


Received: 12 October 2019; Accepted: 21 November 2019; Published: 25 November 2019

**Abstract:** The Artificial Intelligence BlockCloud (AIBC) is an artificial intelligence and blockchain technology based large-scale decentralized ecosystem that allows system-wide low-cost sharing of computing and storage resources. The AIBC consists of four layers: a fundamental layer, a resource layer, an application layer, and an ecosystem layer (the latter three are the collective "upper layers"). The AIBC layers have distinct responsibilities and thus different performance and robustness requirements. The upper layers need to follow a set of economic policies strictly and run on a deterministic and robust protocol, while the fundamental layer needs to follow a protocol with high throughput without sacrificing robustness. As such, the AIBC implements a two-consensus scheme to enforce economic policies and achieve performance and robustness: Delegated Proof of Economic Value (DPoEV) incentive consensus on the upper layers, and Delegated Adaptive Byzantine Fault Tolerance (DABFT) distributed consensus on the fundamental layer. The DPoEV uses the knowledge map algorithm to accurately assess the economic value of digital assets. The DABFT uses deep learning techniques to predict and select the most suitable BFT algorithm in order to enforce the DPoEV, as well as to achieve the best balance of performance, robustness, and security. The DPoEV-DABFT dual-consensus architecture, by design, makes the AIBC attack-proof against risks such as double-spending, short-range and 51% attacks; it has a built-in dynamic sharding feature that allows scalability and eliminates the single-shard takeover.
Our contribution is four-fold: that we develop a set of innovative economic models governing the monetary, trading and supply-demand policies in the AIBC; that we establish an upper-layer DPoEV incentive consensus algorithm that implements the economic policies; that we provide a fundamental layer DABFT distributed consensus algorithm that executes the DPoEV with adaptability; and that we prove the economic models can be effectively enforced by AIBC's DPoEV-DABFT dual-consensus architecture.

**Keywords:** blockchain; BlockCloud; Artificial Intelligence; consensus algorithms

#### **1. Introduction**

After the outbreak of the 2008 financial crisis, Satoshi Nakamoto published a paper titled "Bitcoin: A Peer-to-Peer Electronic Cash System," symbolizing the birth of cryptocurrencies (Nakamoto 2008). Vitalik Buterin (Buterin 2013) improved upon Bitcoin with Ethereum, a public platform that provides a Turing-complete computing language and introduces the concept of smart contracts, allowing anyone to author decentralized applications with their own arbitrary rules for ownership, transaction formats, and state transition functions. Bitcoin and Ethereum are the first batch of practical blockchains that make use of distributed consensus, decentralized

ledger, data encryption and economic incentives afforded by the underlying blockchain technology. Essentially, the blockchain technology enables trustless peer-to-peer transactions and decentralized coordination and collaboration among unrelated parties, providing answers to many challenges unsolvable by traditional centralized institutions, including but not limited to low efficiency, high cost, and low security.

Bitcoin, the pioneer of the blockchain's distributed ledger and distributed database revolution, is widely regarded as "Blockchain 1.0." "Blockchain 2.0" is represented by Ethereum, which adds a smart contract mechanism to the Bitcoin foundation. The blockchain is entering its 3.0 era: it seeks to create ecosystems with a proliferation of application scenarios with no apparent scope limitation. It has the potential to become the low-level protocol of the "Internet of Everything," and is particularly friendly to applications that require process management, such as supply chain finance, transportation and logistics, property right certification, charity and donation management, etc.

"Blockchain 3.0" is not without challenges. To begin with, as of today there are only a few "proven" blockchain consensus algorithms to choose from, yet there is a practically unlimited number of blockchain applications. To make things worse, each existing blockchain employs only one (predetermined) consensus algorithm. As a result, the vast majority of applications have to rely on consensus algorithms that are not optimized for them, greatly reducing their efficiency and effectiveness.

Furthermore, while the blockchain started out as a bottom-layer technology, its true promise goes beyond technology. "Blockchain 3.0" seeks to create ecosystems of numerous interconnected applications that perform at the highest level collectively; these ecosystems are thus "economies" in the digital world, and the applications their "agents." Therefore, a well-designed "Blockchain 3.0" implementation needs to include economic models that provide "rules" for the ecosystem it creates, especially a macroeconomic monetary policy that enforces real-time synchronization between "economic growth (within the ecosystem)" and "money (token) supply." To date, none of the existing self-proclaimed "Blockchain 3.0" solutions has been successful in this regard.

We propose the Artificial Intelligence BlockCloud (AIBC), an Artificial Intelligence (AI) based blockchain ecosystem. The AIBC is an attempt at addressing the aforementioned challenges "Blockchain 3.0" faces. Anchored on the principles of decentralization, scalability, and controllable cost, the AIBC seeks to provide a perfect platform for Distributed SOLutions (DSOLs) by leveraging the basic blockchain technology and sharing computing power and storage space system-wide.

The AIBC emphasizes ecosystem expansion. Our goal is to build a cross-application distributed and trusted ecosystem. Based on our economic model, the upper-layer Delegated Proof of Economic Value (DPoEV) incentive consensus enables connections among diverse computing, data, and information entities. The value in the AIBC is essentially the knowledge that exists in and is accumulated by the participating entities. The entities participate in exchanges of value through resource-sharing activities, facilitated by token (unit of economic value) transfers. The benefits of the AIBC are then value creation and exchange across entities.

The AIBC also stresses application support. It provides a flexible technical support infrastructure of distributed services for large business scenarios. Its AI-based fundamental layer Delegated Adaptive Byzantine Fault Tolerance (DABFT) distributed consensus allows individualized real-time customization of protocols. Thus, application scenarios in the AIBC ecosystem can be optimized according to the differentiated requirements of multiple entities on a public chain that provides common bottom-layer services.

Our contribution is four-fold: that we develop a set of innovative economic models governing the monetary, trading and supply-demand policies in the AIBC; that we establish an upper-layer DPoEV incentive consensus algorithm that implements the economic policies; that we provide a fundamental layer DABFT distributed consensus algorithm that executes the DPoEV with adaptability; and that we prove the economic models can be effectively enforced by AIBC's DPoEV-DABFT dual-consensus architecture.

The rest of the paper is organized as follows: Section 2 provides an overview of the AIBC, Section 3 surveys the existing proven blockchain consensus algorithms, Section 4 presents the AIBC economic models, Sections 5 and 6 give the details of the DPoEV and DABFT consensus algorithms, and Section 7 concludes the paper.

#### **2. AIBC Overview**

#### *2.1. AIBC Key Innovation*

The AIBC is an Artificial Intelligence and blockchain technology based decentralized ecosystem that allows resource sharing among participating nodes. The primary resources shared are the computing power and storage space. The goals of the AIBC ecosystem are efficiency, fairness, and legitimacy.

The key innovation of the AIBC is separating the fundamental (blockchain) layer distributed consensus and the application layer incentive mechanism. The AIBC implements a two-consensus scheme to enforce upper-layer economic policies and achieve fundamental layer performance and robustness: The DPoEV incentive consensus to create and distribute award on the application and resource layers; the DABFT distributed consensus for block proposition, validation and ledger recording on the fundamental layer.

The DPoEV consensus is derived from a model of cooperative economics (macroeconomics, microeconomics, and international trade). It uses the knowledge map algorithm (a branch of Artificial Intelligence) to accurately assess the economic value of digital assets (knowledge).

The DABFT is the fundamental layer distributed consensus algorithm. It improves upon the ADAPT algorithm (Bahsoun et al. 2015) and uses deep learning (a branch of Artificial Intelligence) techniques to predict and dynamically select the most suitable Byzantine Fault Tolerant (BFT) algorithm for the current application scenario in order to achieve the best balance of performance, robustness and security. The DABFT is currently the most adaptive distributed consensus solution that meets various technical needs among public chains.

#### *2.2. AIBC Architecture*

The AIBC consists of four layers: a fundamental layer that conducts the essential blockchain functions, a resource layer that provides the shared services, an application layer that initiates requests for resources, and an ecosystem layer that comprises physical/virtual identities that own or operate nodes.


Figure 1 illustrates the AIBC layer structure and corresponding consensus algorithms.

**Figure 1.** AIBC Layer Structure and Consensus Algorithms. The AIBC consists of four layers: a fundamental layer that conducts the essential blockchain functions, a resource layer that provides the shared services, an application layer that initiates requests for resources, and an ecosystem layer that comprises physical/virtual identities that own or operate nodes.

#### *2.3. AIBC Two-Consensus Implementation*

The AIBC layers have distinct responsibilities and thus different performance and robustness requirements. For example, once a task is initiated, the application and resource layers are primarily concerned with delivering resources and distributing reward. Therefore, these layers need to follow the economic policies strictly and run on a deterministic and robust protocol, but not necessarily a high-performance one (in terms of speed). On the other hand, the fundamental layer is the workhorse providing basic blockchain services such as consensus building, block proposition and validation, transaction tracking, and ledger recording. Therefore, it needs to follow an adaptive protocol with high throughput without sacrificing robustness.

As such, the AIBC implements a two-consensus approach: the DPoEV incentive consensus to create and distribute awards on the application and resource layers, and the DABFT distributed consensus responsible for blockchain functions on the fundamental layer. The DPoEV is deterministic and does not necessarily require high-performance as most of the application scenarios do not demand real-time reward distribution. On the other hand, the DABFT has to be real-time and adaptive, as block validation and record bookkeeping need to be done quickly and robustly.

The two-consensus implementation is a distinguishing feature of the AIBC. It enforces upper-layer economic policies and bottom-layer consensus building, a perfect combination for resource-sharing application scenarios. By contrast, most of the existing and proposed public chains adopt one-consensus schemes, which do not provide flexibility in the performance and robustness tradeoff and are vulnerable to risks such as 51% attacks.

#### **3. Review of Literature and Practice on Major Consensus Algorithms**

Most of the existing blockchains adopt one-consensus schemes; in this section, we survey those that have actually been used in "mainstream" blockchains. The majority of these consensus algorithms are proof-based (PoX), while some are vote-based. Most of the vote-based consensus algorithms are flavors of Byzantine Fault Tolerance (BFT), but a noticeable few utilize the Crash Fault Tolerance (CFT) approach.

#### *3.1. Proof-Based Consensus Algorithms*

#### 3.1.1. PoW (Proof of Work) Workload Proof Consensus

The PoW consensus behind Bitcoin plays a zero-sum game of SHA256 hashing among miners to win the ledger recording privilege. With the increased difficulty of block mining, the PoW consumes a tremendous amount of computing power (and electricity) with a great reduction of throughput. Even worse, the more miners there are, the higher the mining difficulty and the lower the probability that a given miner wins the ledger recording privilege, which induces yet higher energy consumption and longer latency. This is the key reason why Ethereum has long considered using a PoS (Proof-of-Stake) algorithm, Casper, instead of the PoW. Therefore, from the perspective of mining speed and cost, the PoW is not conducive to the long-term and rapid development of blockchain-based ecosystems. Other mainstream PoW-based blockchains include Litecoin (LTC 2018).
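The hash puzzle at the heart of the PoW can be sketched in a few lines of Python. This is a toy illustration only (a hypothetical `mine` function and a simplified difficulty encoding), not Bitcoin's actual block format or target arithmetic:

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce so that SHA256(block_data || nonce) interpreted
    as an integer falls below a target with `difficulty_bits` leading zero bits.
    Expected work doubles with every additional difficulty bit."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1
```

At 8 difficulty bits a valid nonce is found after roughly 256 hashes on average; each extra bit doubles the expected work, which is the mechanism behind the rising energy cost described above.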

#### 3.1.2. PoS (Proof of Stake) Equity Proof Consensus and DPoS

The PoS consensus measures the amount and age of wealth in the ecosystem in order to grant the ledger recording privilege (Buterin 2013). PeerCoin (King and Nadal 2012), NXT (NXT 2015), as well as Ethereum's Casper implementation (Buterin 2014), adopt the PoS. Although the PoS consumes a much lower level of energy than the PoW, it amplifies the impact of accumulated wealth; as such, in a PoS ecosystem, participants with a higher level of wealth can easily monopolize ledger recording. In addition, block confirmations are probabilistic, not deterministic; thus, in theory, a PoS ecosystem may be exposed to other attacks. Therefore, from the perspective of miner composition, the PoS is not conducive to the interests of participants in the ecosystem.

The DPoS is derived from the PoS and is used by EOS (EOS 2018). The main difference is that, in the DPoS regime, all asset holders elect a number of representatives and delegate consensus building to them. The regulatory compliance, performance, resource consumption, and fault tolerance of the DPoS are similar to those of the PoS. The key advantage of the DPoS is that it significantly reduces the number of nodes required for block verification and ledger recording, and is thus capable of reaching consensus in seconds. However, the DPoS inherits the PoS's major shortcomings: it is probabilistic and does not prevent monopolization.
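The stake-weighted selection underlying the PoS/DPoS family can be sketched as follows. This is an illustrative simplification (a hypothetical `pick_validator` helper), not any particular chain's validator election, and it shows concretely why large stakes dominate: selection probability is directly proportional to stake.

```python
import random

def pick_validator(stakes: dict, rng: random.Random) -> str:
    """Pick a validator with probability proportional to its stake
    (roulette-wheel selection over the cumulative stake distribution)."""
    total = sum(stakes.values())
    r = rng.uniform(0, total)
    cumulative = 0.0
    for node, stake in stakes.items():
        cumulative += stake
        if r <= cumulative:
            return node
    return node  # guard against floating-point edge cases
```

A node holding 75% of the stake wins roughly 75% of the elections, which is the monopolization effect the paper criticizes.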

#### 3.1.3. PoI (Proof of Importance) Importance Proof Consensus

The PoI introduces the concept of account importance, which is used as a measure to allocate the ledger recording privilege (NEM 2018). The PoI partly resolves the wealth monopolization dilemma of the PoS. However, it is exposed to a nothing-at-stake scenario, which makes cheating rather low-cost. Therefore, the PoI deviates from the AIBC goal of legitimacy and the DPoEV requirement of the "rule of relevancy."

#### 3.1.4. PoD (Proof of Devotion) Contribution Proof Consensus

The PoD introduces the concept of contribution and awards the ledger recording privilege according to the contributions of accounts (NAS 2018). However, the PoD uses otherwise meaningless pseudo-random numbers to determine the ledger recording privilege among participants, which is not consistent with the concept of utilizing resources only for meaningful and productive endeavors. Moreover, due to the limitations of its design, the PoD cannot achieve the desired level of efficiency.

#### 3.1.5. PoA (Proof of Authority) Identity Proof Consensus

The PoA is similar to the PoS (VET 2018). However, unlike the PoS, PoA nodes are not required to hold assets to compete for the ledger recording privilege; rather, they are required to be known and verified identities. With their identities at stake, nodes are disincentivized from acting maliciously in their own interest. The PoA is cheaper, more secure, and offers higher TPS than the PoS.

#### 3.1.6. PoET (Proof of Elapsed Time) Sample Size Proof Consensus

The PoET (Intel 2017a) is used in Intel's Hyperledger Sawtooth blockchain; it utilizes a "trusted execution environment" to improve the efficiency and reduce the power consumption of the PoW. The PoET stochastically elects individual nodes to execute requests at a given target rate. These nodes sample an exponentially distributed random variable and wait for the amount of time dictated by the sample. The node with the smallest sample wins the election.
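The PoET election can be sketched directly from its definition. This toy illustration (a hypothetical `poet_election` helper) ignores the trusted-execution-environment attestation that Sawtooth actually relies on to make the wait times verifiable:

```python
import random

def poet_election(nodes, rng, rate=1.0):
    """Each node draws a wait time from an exponential distribution with the
    given target rate; the node with the shortest wait publishes the block.
    With i.i.d. draws, every node wins with equal probability."""
    waits = {node: rng.expovariate(rate) for node in nodes}
    return min(waits, key=waits.get)
```

Because the draws are independent and identically distributed, each of N nodes wins with probability 1/N, which is what makes the lottery fair at the target rate.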

#### 3.1.7. PoSpace (Proof of Space) Disk Space Proof Consensus

The PoSpace (Park et al. 2015) was proposed to improve the inefficient mining of the PoW and inexpensive mining of the PoS, and is used in the SpaceMint blockchain. To mine blocks in a PoSpace blockchain, miners invest disk space instead of computing power, and dedicating more disk space yields a proportionally higher expectation of successfully mining a block.

#### *3.2. Vote-Based Consensus Algorithms*

#### 3.2.1. BFT Distributed Consistency Consensus Algorithms

All the above proof-based consensus algorithms are susceptible to a variety of attacks, especially variations of the 51% attack, which can be partially addressed by vote-based consensuses.

The BFT tolerates *F* = ⌊(*N* − 1)/3⌋ faulty validators: consistency can be achieved in the case of *N* ≥ 3*F* + 1, where *N* is the total number of validators and *F* is the number of faulty validators. After information is exchanged between the validators, each validator has a list of the information obtained, and the information held by a 2/3 majority of validators prevails. The BFT's advantage is that consensus can be reached efficiently with safety and stability (Lamport et al. 1982; Driscoll et al. 2003). Its disadvantages are that, when one third or more of the validators stop working, the system will not be able to provide services; and that when one third or more of the validators behave maliciously and the nodes happen to be divided into two isolated islands, the malicious validators can fork the system, though they will leave cryptographic evidence behind. The decentralization level of the BFT is not as high as that of the other consensuses; thus, it is more suitable for multi-centered application scenarios.
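The fault-tolerance arithmetic above can be made concrete with a minimal sketch (hypothetical function names). Note that the quorum N − F equals the 2F + 1 matching votes exactly when N = 3F + 1:

```python
def bft_fault_tolerance(n: int) -> int:
    """Maximum number of faulty validators tolerated: F = floor((N - 1) / 3)."""
    return (n - 1) // 3

def bft_quorum(n: int) -> int:
    """Matching votes needed for agreement: N - F, a strict 2/3-plus majority."""
    return n - bft_fault_tolerance(n)
```

For example, 4 validators tolerate 1 fault and need 3 matching votes; 7 validators tolerate 2 faults and need 5 matching votes.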

A high-performance variant of the BFT, the PBFT (Practical BFT), can achieve a consensus delay of two to five seconds, which satisfies the real-time processing requirements of many commercial applications (Castro and Liskov 2002). The PBFT's high consensus efficiency enables it to meet high-frequency trading needs. The PBFT assumes a "permissioned" blockchain among a set of known, identified participants; it provides a way to secure the interactions among a group of entities that have a common goal but do not fully trust each other, such as businesses that exchange funds, goods, or information. Thus, it may not be suitable for public blockchains. Also, the PBFT is a network-intensive algorithm and therefore not scalable to large networks. Hyperledger Fabric utilizes the PBFT (Androulaki et al. 2018).

The DBFT (Delegated BFT) improves upon the PBFT: validators are selected by their stake in the ecosystem, and the selected validators then reach consensus through the BFT algorithm (BTS 2018; NEO 2018). The DBFT makes several improvements over the BFT. It updates the BFT's client/server architecture to a peer-node mode suitable for P2P networks. It evolves from static consensus to dynamic consensus, in which validators can dynamically enter and exit. It incorporates a voting mechanism based on the validators' stakes for ledger recording. It also introduces digital certificates, which resolve the issue of validator identity authentication. The DBFT has many desirable features, such as specialized bookkeepers, tolerance of any type of error, and no bifurcation. Just as with the BFT, when one third or more of the validators behave maliciously and all nodes happen to be divided into two isolated islands, the malicious validators can fork the system, though they will leave cryptographic evidence behind.

The Ripple (Schwartz et al. 2014) and its newer version, the XRP (Chase and MacBrough 2018), were proposed to reduce the latency of the more basic BFT algorithms while still maintaining robustness in the face of Byzantine failures. The Ripple/XRP is used in the Ripple blockchain.

#### 3.2.2. CFT Distributed Consistency Consensus Algorithms

The Raft (Ongaro and Ousterhout 2014) is a leader-based consensus algorithm. It defines an election process whereby a leader is established and recognized by all followers. Only one node (the leader) publishes blocks, which are then validated and agreed upon by the other nodes in the network (the followers).

The Raft is a Crash Fault Tolerant (CFT) algorithm, i.e., it is not a BFT. It continues to make progress as long as a majority of its nodes are available. However, the Raft only guarantees safety and availability under non-Byzantine conditions, which makes it ill-suited for networks that require BFT. It is implemented as Sawtooth Raft in Intel's Hyperledger Sawtooth as one of the consensus engines (Intel 2017b).
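As a rough illustration of the contrast with BFT, Raft's guarantees rest on simple majorities rather than 2/3 super-majorities. The sketch below is our own (it is not Sawtooth Raft code) and captures just the two majority rules: a candidate needs a strict majority of votes to become leader, and the cluster makes progress only while a strict majority of its crash-prone nodes is up.

```python
def majority(cluster_size: int) -> int:
    """Smallest strict majority of the cluster."""
    return cluster_size // 2 + 1

def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate becomes leader with a strict majority of votes."""
    return votes_received >= majority(cluster_size)

def can_make_progress(nodes_up: int, cluster_size: int) -> bool:
    """Raft is CFT: it only needs a majority of nodes alive, but none Byzantine."""
    return nodes_up >= majority(cluster_size)
```

A five-node Raft cluster therefore survives two crashed nodes, whereas a five-validator BFT network tolerates only one Byzantine validator.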

#### *3.3. Flaws of Existing Consensus Algorithms*

In the preceding sections, we surveyed a number of the most popular consensus algorithms. As the blockchain is a very dynamic field, more comprehensive surveys are available for interested readers (e.g., Nguyen and Kim 2017; Wang et al. 2019).

All existing consensus algorithms function well as standalone protocols. However, none of the blockchains built on these consensus algorithms offers both balanced (high) performance and (resilient) robustness. The reason is simple: different blockchain layers have conflicting performance requirements that cannot be satisfied by any single consensus algorithm.

For example, the AIBC layers have distinct responsibilities and thus distinct performance and robustness requirements. Once a task is initiated, the application and resource layers are primarily concerned with delivering resources and distributing rewards. Therefore, these layers need to follow the economic policies strictly and run on a deterministic and robust protocol, but not necessarily a high-performance one (in terms of speed). On the other hand, the fundamental layer is the workhorse providing basic blockchain services such as consensus building, block proposition and validation, transaction tracking, and ledger recording. Therefore, it needs to follow a protocol with high throughput without sacrificing robustness. As such, a multi-protocol AND adaptive approach is necessary, which gives rise to the AIBC's DPoEV-DABFT dual-consensus architecture.

While academic literature addressing the above concerns is still lacking, there are some noticeable efforts, mainly on explaining the mechanisms of blockchains from a scholarly perspective. (Tschorsch and Scheuermann 2016) study the Bitcoin protocol, as well as its building blocks and its applications, for the purpose of establishing academic research directions. (Herlihy 2018) provides a tutorial on the basic notions and mechanisms underlying blockchains, stressing that blockchains are not mirrored images of distributed computing. (Sultan et al. 2018) aim to address the gap and present an overview of blockchain technology, identifying blockchain's key characteristics with discussions of blockchain applications.

Some academic studies provide alternatives to existing consensus algorithms. (Bonneau et al. 2015) seek to identify key components of Bitcoin's design that can be decoupled to enable a more insightful analysis of Bitcoin's properties and future stability. Other academic literature provides guidance on how to design or improve blockchains in order to use their respective consensus algorithms more effectively. (Li et al. 2018) conduct a study on security threats to blockchains, survey the corresponding real attacks by examining popular blockchains, and review security enhancement solutions that could be used in future blockchain development. (Belotti et al. 2019) come up with a "vademecum" guiding developers to the right decisions on when, which, and how to adopt blockchains and consensus algorithms.

Other scholars focus on the impact of business logic on blockchain implementation. (Governatori et al. 2018) analyze how concepts pertinent to legal contracts can influence certain aspects of their digital implementation through distributed ledger technology and smart contracts.

Again, academic research on dual-consensus architecture is warranted.

#### *3.4. Multi-Protocol Consensus Algorithms*

While a multi-protocol and adaptive approach is necessary, it has not been studied thoroughly in academic circles. Nor is such an approach readily available in any of today's blockchains.

There are "dual-token" blockchains. Ontology (ONT 2017) is a public blockchain for a peer-to-peer trust network. It has two tokens: ONT is tied to its consensus algorithm (VBFT<sup>1</sup>) and is the token that represents "ownership" of a stake of the chain itself, while ONG is the measure of payment for services. LuckyBlock (LKC 2018) is a blockchain designed specifically for decentralized gaming with two tokens: LKC follows the consensus algorithm (a mix of PoW and PoS) and functions as the stakeholder of the chain, while LuckyS is issued and used as the measure of payment on a number of sidechains. Other similar dual-token blockchains include Crysto, XMT, etc. These dual-token blockchains merely separate ownership and the measure of payment in order to achieve a higher level of convenience in ecosystem management. They do not offer multi-protocol consensus algorithms that seek to balance the network-level requirements of performance and robustness.

To the author's best knowledge, there has been only one instance of a "true" multi-protocol blockchain in practice at the time of this draft<sup>2</sup>. VeriCoin (VRC) and Verium (VRM) Reserve (Pike et al. 2015) are dual blockchain protocols pairing a digital currency with a digital commodity using a protocol called "binary chain." It utilizes a variation of PoW (PoWT, Proof of Work Time) on the VeriCoin side and a variation of PoS (PoST, Proof of Stake Time) on the Verium side, trying to be both a fast currency and a secure store of value. The VeriCoin/Verium essentially serves the rather narrow purpose of providing a faster Bitcoin alternative; thus, unlike the AIBC, it does not seek to address the performance and robustness challenge that Blockchain 3.0 faces. That is, the VeriCoin does not offer an economic model that is attack-proof, and the Verium does not offer a consensus protocol that is adaptive.

Table 1 compares the mainstream blockchains and their consensus algorithms to the AIBC.

<sup>1</sup> VBFT is proposed by Ontology. It is a consensus algorithm that combines PoS, VRF (Verifiable Random Function), and BFT. <sup>2</sup> November, 2019.


**Table 1.** Comparisons of Blockchains and Their Consensus Algorithms.

The table compares mainstream blockchains, and the aspects of their consensus algorithms in which we are interested, against the AIBC.

#### **4. AIBC Economic Models**

The AIBC ecosystem is essentially a closed economy, of which the operations run on a set of carefully designed economic models. These economic models, at the minimum, must include a macroeconomic model that governs monetary policy (token supply), a trade economic model that enforces fair trade policy, and a microeconomic model that manages supply and demand policy.

#### *4.1. Economic Model Overview*

The most important economic model is the macroeconomic model that provides tools to govern the monetary policy, which principally deals with money (token) supply.

Before the birth of modern central banks, money essentially took the form of precious metals, particularly gold and silver. Thus, money supply was basically sustained by the physical mining of precious metals. Paper money in the modern sense did not come into existence until after the creation of the world's first central bank, the Bank of England, in 1694. With the creation of the central banks, modern monetary policy was born. Initially, the goal of monetary policy was to defend the so-called gold standard, which central banks maintained by their promise to buy or sell gold at a fixed price in terms of the paper money (Kemmerer 1994). The mechanism by which the central banks maintained the gold standard was the setting/resetting of interest rates, which they adjusted periodically and on special occasions.

However, the gold standard has been blamed for inducing deflationary risks because it limits money supply (Keynes 1920). The argument gained merit during the economic turmoil of the 1920s and 1930s, as the gold standard might have prolonged the Great Depression by preventing the central banks from expanding the money supply to stimulate the economy (Eichengreen 1995; American Economic Association 1936). The "physical" reason behind the gold standard's deflationary pressure on the economy is the scarcity of gold, which limits the ability of monetary policy to supply needed capital during economic downturns (Mayer 2010). In addition, the unequal geographic distribution of gold deposits makes the gold standard disadvantageous for countries with limited natural resources, which compounds the money supply problem when their economies are in contrarian mode (Goodman 1981).

An obvious way to combat the gold standard's natural tendency toward deflation is to issue paper money that is not backed by the gold standard, the so-called fiat money. A fiat money has no intrinsic value and is used as legal tender only because its issuing authority (a central government or a central bank) backs its value with non-precious-metal financial assets, or because parties engaging in exchange agree on its value (Goldberg 2005). While fiat money seems to be a good solution to the deflation problem, central governments have always had a variety of reasons to oversupply money, which causes inflation (Barro and Grilli 1994). Even worse, as fiat money has no intrinsic value, it can become practically worthless if the issuing authorities either are unable or refuse to guarantee its value, which induces hyperinflation. A case in point is the German mark hyperinflation in the Weimar Republic in 1923 (Board of Governors of the Federal Reserve System 1943).

Therefore, neither the gold standard nor the fiat currency can effectively create a "perfect" monetary policy that closely matches the money supply with the state of the economy. After the breakdown of the Bretton Woods framework, all economies, developed and developing alike, still struggle with choices of monetary policy instruments to combat the money supply issues du jour. In addition, because of the physical world's "stickiness (of everything)," all money supply instruments (e.g., central bank interest rates, reserve policies, etc.) lag behind the economic reality, making real-time policy adjustment impossible.

Therefore, eradication of deflation and inflation will always be impractical, unless a commodity money with the following properties can be found or created:


Such a commodity does not exist in the physical world. However, things might be different in the digital world, if digital assets can be monetized into digital currencies.

There have been discussions about a "Bitcoin standard." For example, (Weber 2015) of the Bank of Canada explores the possibility and scenarios in which central banks return to a commodity money standard, only this time with Bitcoin instead of gold as the commodity. However, just like gold, Bitcoin faces a scarcity challenge in that its quantity is finite, and just like gold it needs to be mined, at a pace that may lag far behind economic growth (Nakamoto 2008). As such, other than the fact that Bitcoin resides in the digital world, it offers no obvious and significant benefit over gold as the anchor for a non-deflationary commodity money standard.

However, such a digital currency can be created; it satisfies the requirement that it can be put into and taken out of circulation instantaneously, in sync with economic reality.

The requirements that the digital currency must have gold-like intrinsic value but not gold's physical scarcity, and that it must be mined at exactly the pace of economic growth, are not trivial. First of all, there must be an agreement that digital assets are indeed assets with intrinsic value, as if they were physical assets. While such an agreement is more of a political and philosophical nature, and therefore beyond the scope of our practicality-oriented interest, it is not a far stretch to regard knowledge as something with intrinsic value; and since all knowledge can be digitized, it can thus form the base of a digital currency with intrinsic value. This is what we call the "knowledge is value" principle.

Based on our "knowledge is value" principle, there is some merit to Warren Buffett's argument that Bitcoin has no intrinsic value, "because [Bitcoin] does not produce anything (Buffett 2018)." Warren Buffett's remarks refer to the facts that during the Bitcoin mining process, nothing of value (e.g., knowledge) is actually produced, and that holding Bitcoin itself does not produce returns the way traditional investment vehicles backed by physical assets do (i.e., through value-added production processes that yield dividends and capital appreciation).

Therefore, again based on our "knowledge is value" principle, a digital currency that forms the base for a commodity money standard must have intrinsic value in and unto itself; thus it not only is knowledge, it also produces knowledge. This is the fundamental thesis upon which a digital ecosystem that uses a quantitative unit of knowledge as its value measurement, and thus its currency, can be built.

In a digital ecosystem, there is both knowledge in existence and knowledge in production. If the value of knowledge in existence can be directly measured by a quantitative and constant unit, then the unit itself can be regarded as a currency. Furthermore, the value of knowledge in production can also be measured by the constant unit (currency) in an incremental manner; thus, the expansion of knowledge is in sync with the expansion of the currency base. Effectively, the value measurement system is an autonomous monetary policy that automatically synchronizes economic output (knowledge mining) and money supply (currency mining), because the currency is not a stand-alone money, but merely a measurement unit of the value of knowledge. Thus, this digital currency simultaneously satisfies the requirements that it must have gold-like intrinsic value without gold's physical scarcity and that it must be mined at exactly the pace of economic growth, as the currency (measurement unit) and the economic growth (knowledge) are now one and the same; they are unified. In the next section, we discuss how to develop the measurement unit.

The trade economic model provides tools to enforce fair trade policy among participants in a "globalized" economic environment. In a conventional open and free trade regime with no restrictions, it is quite likely that a few "countries" over-produce (export) and under-consume (import), and thus accumulate vast surpluses with regard to their trading partners. These countries would eventually appropriate all the wealth in the global economy, reducing their trade partners to an extreme level of poverty. Therefore, there must be a fair trade policy, enforced by a collection of bilateral and multilateral trade agreements, which penalizes the parties with unreasonable levels of surplus and provides incentives to the parties with unreasonable levels of deficit. The penalization can take the form of tariff levies and other means to encourage consumption and curb production. The incentives can be tariff credits to encourage production and curb consumption. They are essentially wealth-rebalancing devices that a "World Trade Organization (WTO)"-like body would deploy to guarantee that trade is both free and fair (WTO 2015).
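The penalize-surplus, subsidize-deficit idea can be made concrete with a deliberately simplified sketch. The paper specifies no formulas for the fair trade policy, so the threshold and rate below are entirely hypothetical, and a real implementation would be enforced by consensus rather than by a single function.

```python
def rebalance(balance: float, threshold: float = 100.0, rate: float = 0.1) -> float:
    """Illustrative wealth-rebalancing rule (hypothetical parameters).

    balance   : a party's trade balance (positive = surplus, negative = deficit)
    threshold : tolerated surplus/deficit before intervention
    rate      : tariff (or credit) rate applied to the excess

    Returns the adjustment: negative = tariff levied on an over-surplus party,
    positive = tariff credit granted to an over-deficit party.
    """
    if balance > threshold:          # unreasonable surplus: penalize
        return -rate * (balance - threshold)
    if balance < -threshold:         # unreasonable deficit: subsidize
        return rate * (-threshold - balance)
    return 0.0                       # within tolerance: no intervention
```

A party 100 units over the surplus threshold would pay a tariff of 10 under these toy parameters, while a symmetric deficit would earn a credit of 10.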

The microeconomic model provides tools to help manage supply and demand policy in order to set market-driven transaction prices between participants. When there are multiple products simultaneously competing for consumers, the price of a product is set at the point of supply-demand equilibrium. The supply and demand policy discourages initially high-value products from dominating production capacity and encourages initially low-value products to be produced. Therefore, consumers can find any product that serves their particular needs at reasonable price points.

#### *4.2. Economic Model Implementation Overview*

Because of the physical world's "stickiness (of everything)," all monetary policy instruments (e.g., central bank interest rates, reserve policies, etc.), fair trade devices and supply-demand balancing tools lag behind the economic reality. This means these economic models can never dynamically track economic activities and adjust economic policies accordingly on a real-time basis. To make things more complicated, because all economic policies are controlled by centralized authorities (central banks, WTO, etc.), they may not necessarily reflect the best interests of majority participants in economic activities.

The Internet, however, provides a leveling platform that makes real-time economic policy adjustment practical. This is because the digital world can utilize advanced technological tools in order not to suffer from the reality stickiness and policy effect lag that are unavoidable in the physical world, as well as the potential conflict of interest that cannot be systematically eliminated with centralized authorities. The most important tool of them all, in this sense, is the blockchain technology, which provides a perfect platform for a decentralized digital economy capable of real-time economic policy adjustment.

On the upper layer, the AIBC ecosystem is an implementation of the "knowledge is value" macroeconomic model through a DPoEV incentive consensus algorithm. The DPoEV consensus establishes a digital economy in which a quantitative unit that measures the value of knowledge, the CFTX token, is used as the medium of value storage and transactions. Since token issuance and knowledge expansion are unified and therefore always in sync on a real-time basis, neither deflation nor inflation exists in the ecosystem by design. Along with the trade and microeconomic models, the AIBC provides a framework of decentralized, consensus-based digital economy with real-time policy adjustment that enables resource sharing.

On the bottom layer, the AIBC implements a DABFT distributed consensus algorithm that enforces the upper-layer DPoEV policies. It combines some of the best features of the existing consensus algorithms and is adaptive, capable of selecting the most suitable consensus for any application scenario. The DABFT is the blockchain foundation upon which the AIBC ecosystem is built.

#### **5. Delegated Proof of Economic Value (DPoEV)**

#### *5.1. DPoEV Overview*

Inside the AIBC ecosystem, all activities create (or destroy) economic value. Therefore, there is a need for a logical and universal way to assess the economic value of an activity, measured by the community's value storage and transaction medium, the CFTX token. The DPoEV incentive consensus algorithm creates and distributes rewards to participating nodes in the AIBC ecosystem. The DPoEV, in turn, is established upon an innovative Economic Value Graph (EVG) approach, which is derived from the knowledge graph algorithm (a branch of Artificial Intelligence and deep learning). The EVG is designed to measure the economic value ("wealth") of the ecosystem in a dynamic way. The EVG will be explained in the next sub-section.

The implementation of the DPoEV is as follows:


resource nodes with a low probability of winning assignments. Thus, the DPoEV also functions as a "world trade organization" that enforces fair trade in a decentralized ecosystem.


#### *5.2. Economic Value Graph (EVG) Overview*

Up to this point, we still have not answered the question of how the value of knowledge is actually measured. The pursuit of a public blockchain is to create an ecosystem that supports a variety of application scenarios, and one of the challenges is to define a universal measurement of economic value.

We propose an innovative Economic Value Graph (EVG) mechanism to dynamically measure the economic value ("wealth") of knowledge in the AIBC ecosystem. The EVG is derived from the knowledge graph algorithm, which is very relevant in the context of the AIBC.

#### 5.2.1. Knowledge Graph Overview

A knowledge graph (or knowledge map) consists of a series of graphs that illustrate the relationship between the subject's knowledge structure and its development process. The knowledge graph constructs complex interconnections in a subject's knowledge domain through data mining, information processing, knowledge production and measurement in order to reveal the dynamic nature of knowledge development and integrate multidisciplinary theories (Watthananona and Mingkhwanb 2012).

A knowledge graph consists of interconnected entities and their attributes; in other words, it is made of pieces of knowledge, each represented as an SPO (Subject-Predicate-Object) triad. In knowledge graph terminology, this ternary relationship is known as Resource Description Framework (RDF). The process of constructing a knowledge graph is called knowledge mapping. Figure 2 illustrates Knowledge Graph Subject-Predicate-Object triad.

**Figure 2.** Knowledge Graph Subject-Predicate-Object Triad.

The knowledge graph algorithm is consistent with the EVG. There are two steps in knowledge mapping for an ecosystem: realization of initial knowledge, and dynamic valuation of additional knowledge.

For an ecosystem, at the realization of initial knowledge stage, the knowledge graph algorithm assesses the *i*th node's initial economic value of knowledge, which is a combination of the explicit and implicit economic values of all relevant knowledge pieces at and connected to that node. The total economic value of the entire ecosystem is thus the sum of all node-level economic values.

$$v0_i = \prod_{j=1}^{M} P\left(v0_{i,j} \mid v0_{i,j-1}\right) v0_{i,j} \tag{1}$$

$$V0 = \sum_{i=1}^{N} v0_i \text{, } i = 1, \dots, N, \; j = 1, \dots, M \tag{2}$$

where *v*0*i*,*j* is the initial economic value of the *j*th knowledge piece at the *i*th node, *P*(*v*0*i*,*j* | *v*0*i*,*j*−1) is the probability of *v*0*i*,*j* given all knowledge pieces prior to the *j*th, and Π denotes the product over the knowledge pieces.
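A literal reading of Equations (1) and (2) can be sketched in a few lines of Python. This is our own illustration: the per-piece probabilities and values below are made up, and a real EVG would extract them from the knowledge graph rather than take them as inputs.

```python
from math import prod

def node_initial_value(pieces):
    """Eq. (1): product over a node's knowledge pieces of P(piece | history) * value.

    pieces: list of (probability, value) pairs for one node.
    """
    return prod(p * v for p, v in pieces)

def ecosystem_initial_value(nodes):
    """Eq. (2): the ecosystem's initial total value is the sum of node-level values.

    nodes: list of per-node piece lists.
    """
    return sum(node_initial_value(pieces) for pieces in nodes)

# Two hypothetical nodes with two knowledge pieces each:
nodes = [
    [(0.9, 2.0), (0.5, 4.0)],   # node 1: (0.9*2.0)*(0.5*4.0) = 3.6
    [(1.0, 1.5), (0.8, 2.5)],   # node 2: (1.0*1.5)*(0.8*2.5) = 3.0
]
print(ecosystem_initial_value(nodes))  # ≈ 6.6
```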

Once the initial economic value of the ecosystem is realized, in a task-driven ecosystem, as the tasks start to accumulate, a collection of knowledge graphs of the tasks is then created to assess the incremental economic value of the new knowledge. Finally, the knowledge graph of the entire ecosystem is updated. This dynamic valuation of additional knowledge requires automatic extraction of relationships between the tasks and participating nodes, as well as relationship reasoning and knowledge representation realization. The total economic value of the entire ecosystem is thus the sum of all node level updated economic values.

$$t1_i = \prod_{k=1}^{K} P\left(t1_{i,k} \mid t1_{i,k-1}\right) t1_{i,k} - \prod_{k=1}^{K} C\left(t1_{i,k} \mid t1_{i,k-1}\right) t1_{i,k} \tag{3}$$

$$T1 = \sum_{i=1}^{N} t1_i \tag{4}$$

$$V1 = V0 + T1 = \sum_{i=1}^{N} (v0_i + t1_i), \; i = 1, \dots, N, \; k = 1, \dots, K \tag{5}$$

where *t*1*i*,*k* is the incremental economic value of the *k*th knowledge piece of the task at the *i*th node, *P*(*t*1*i*,*k* | *t*1*i*,*k*−1) is the probability of *t*1*i*,*k* given all knowledge pieces prior to the *k*th, *C*(*t*1*i*,*k* | *t*1*i*,*k*−1) is the covariance of *t*1*i*,*k* with the knowledge pieces prior to the *k*th, and Π denotes the product over the knowledge pieces.
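Equations (3) through (5) can be sketched the same way. Again, this is our own illustration with made-up probabilities, covariances, and values: the incremental value at a node is the probability-weighted product minus the covariance-weighted product, and the updated ecosystem value adds the total increment to the initial value.

```python
from math import prod

def node_incremental_value(pieces):
    """Eq. (3): probability-weighted product minus covariance-weighted product.

    pieces: list of (probability, covariance, value) triples for one node.
    """
    gain = prod(p * t for p, _, t in pieces)
    cost = prod(c * t for _, c, t in pieces)
    return gain - cost

def updated_ecosystem_value(v0_total, tasks_per_node):
    """Eqs. (4)-(5): V1 = V0 + sum over nodes of t1_i."""
    t1_total = sum(node_incremental_value(p) for p in tasks_per_node)
    return v0_total + t1_total

# One hypothetical node: gain (0.9*2.0)*(0.5*4.0) = 3.6, cost (0.1*2.0)*(0.2*4.0) = 0.16
task = [(0.9, 0.1, 2.0), (0.5, 0.2, 4.0)]
print(updated_ecosystem_value(10.0, [task]))  # ≈ 13.44
```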

#### 5.2.2. EVG Implementation

The essence of the EVG is "knowledge is value," and it assesses the entire ecosystem's economic value dynamically.

At the genesis of the AIBC ecosystem, there are no side blockchains, as no task has been initiated yet, and the EVG mechanism simply depicts a knowledge graph of each and every node (super, tasking, computing, and storage node) in the blockchain. The EVG then aggregates the knowledge graphs of all nodes and establishes a global knowledge graph. At this juncture, the EVG has already assessed the original knowledge depository of the entire ecosystem. Furthermore, in order to quantify this original wealth, the EVG equates it to an initial supply of CFTX tokens, issued by the DPoEV consensus. This process establishes a constant measurement unit of economic value (the token) for the future growth of the ecosystem. The EVG then creates a credit table, which contains all nodes in the ecosystem and their initial economic values. When a new node joins the ecosystem, the EVG appends a new entry to the credit table for it, with its respective initial economic value. The credit table resides in all super nodes, and its creation and updates need to be validated and synchronized by all super nodes through the fundamental-layer DABFT distributed consensus algorithm. The DABFT will be discussed in the next section.
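The credit table itself can be pictured as a simple keyed ledger. The sketch below is our own minimal illustration: in the real system, the table would be replicated across all super nodes and every mutation validated by the DABFT consensus, neither of which is modeled here.

```python
class CreditTable:
    """Toy model of the EVG credit table: node address -> economic value (tokens)."""

    def __init__(self):
        self.credits = {}

    def add_node(self, address: str, initial_value: float) -> None:
        """Append a new entry when a node joins the ecosystem."""
        self.credits[address] = initial_value

    def credit(self, address: str, reward: float) -> None:
        """Credit a reward to a participating node (DPoEV token issuance)."""
        self.credits[address] += reward

    def total_wealth(self) -> float:
        """The ecosystem's measured economic value, i.e., the outstanding token supply."""
        return sum(self.credits.values())
```

Because every issuance is a credit against a measured contribution, the table's total always equals the token supply, which is the in-sync money supply property claimed above.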

The wealth generation is driven by tasks, and the super nodes are the ones that are responsible for handling them. As the tasks continue to be initiated, side chains continue to grow and accumulate from the super nodes. These side chains are the containers of the incremental knowledge, and the EVG measures the economic value of this incremental knowledge with the measurement unit (token). Upon the acceptance of every task, the DPoEV consensus issues a fresh supply of CFTX tokens proportional to the newly created economic value to ensure that the money supply is in sync with the economic growth in order to avoid macroeconomic level inflation or deflation.

Each task is tracked by a dedicated task blockchain, which is a side chain whose root block is connected to the task's handling super node. Each block in the task blockchain tracks the status of the task. The root block contains information including the initial estimate of the economic value of the task. Each subsequent block provides updated information on contributions from the task validation, handling, and resource nodes. When the task blockchain reaches its finality, the EVG has a precise measure of the economic value generated by this task. Furthermore, the blocks contain detailed information on contributions from participating nodes, and on transactions. Thus, the EVG can accurately determine the size of the reward (amount of tokens) issued to each participating node. The DPoEV then credits a respective amount of tokens to each participating node, which is recorded in the credit table validated by the DABFT consensus.

The EVG enables the DPoEV to manage the economic policy of the ecosystem on a real-time basis through the credit table. The DPoEV can dynamically determine the purchase price of a task, which covers the overall cost paid to the super and resource nodes. It can also set the transaction cost for each assignment. The overall effect is that all macroeconomic, microeconomic and trade policies are closely monitored and enforced. Table 2 is an example of the EVG Node Credit Table.



**Table 2.** EVG Node Credit Table.

#### *5.3. Economic Relevancy Ranking (ERR)*

While the EVG measures the economic value of knowledge created by tasks, it does not assess the validation, handling, computing, and storage capabilities of participating nodes, as these capabilities are not necessarily based on knowledge. This gap can be fatal because the DPoEV assigns tasks to super nodes and resource nodes first and foremost through a "rule of relevancy" ranking scheme. The issue is resolved by the Economic Relevancy Ranking (ERR) mechanism.

The ERR ranks tasks as well as the super nodes and resource nodes (collectively known as "service nodes"). Based on the ERR rankings, the DPoEV provides a matchmaking service that pairs tasks and service nodes.

The ERR assesses each newly created task by the following factors:


The ERR ranking score of a task is thus given as:

$$TR_{ERR} = \sum_{i=1}^{N} \frac{w_i TR_i}{n_i}, \quad \sum_{i=1}^{N} w_i = 1 \tag{6}$$

where *TRi* is the ranking score of the *i*th factor, *wi* is that factor's weight, and *ni* is that factor's normalization coefficient.
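Equation (6) is a normalized weighted sum and is straightforward to implement. In the sketch below (our own illustration), the factor scores, weights, and normalization coefficients are hypothetical; Equation (7) for service nodes has exactly the same shape.

```python
def err_score(scores, weights, norms):
    """TR_ERR = sum_i w_i * TR_i / n_i, with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * s / n for s, w, n in zip(scores, weights, norms))

# Three hypothetical ranking factors, each normalized to [0, 1] by n_i = 100:
print(err_score(scores=[80.0, 60.0, 90.0],
                weights=[0.5, 0.3, 0.2],
                norms=[100.0, 100.0, 100.0]))  # 0.5*0.8 + 0.3*0.6 + 0.2*0.9 ≈ 0.76
```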

As tasks start to accumulate, they are ranked by the above criteria. The ERR then creates a task ranking table, which contains the addresses of all tasks (root blocks of side chains) and their ranking scores. When a new task is initiated, the ERR appends a new entry to the task ranking table for it, with its respective ranking score. The task ranking table resides in all super nodes, and its creation and update need to be validated and synchronized by all super nodes based on the DABFT consensus. Table 3 gives an example of the ERR Task Ranking Score Table.


**Table 3.** ERR Task Ranking Score Table.


In parallel, the ERR assesses the capabilities of the service nodes based on the same criteria. It then creates a service node ranking table, which contains the addresses of all service nodes and their ranking scores. When a new service node joins, the ERR appends a new entry to the service node ranking table for it, with its respective ranking score. The service node ranking table resides in all super nodes, and its creation and update need to be validated and synchronized by all super nodes with the DABFT consensus.

The ERR algorithm has three major properties:


The ERR ranking score of a service node is thus given as:

$$SNR_{ERR} = \sum_{j=1}^{M} \frac{w_j\, SNR_j}{n_j}, \quad \sum_{j=1}^{M} w_j = 1 \tag{7}$$

where *SNRj* is the ranking score of the *j*th property, *wj* is that property's weight, and *nj* is that property's normalization coefficient.

Based on the ERR ranking scores of tasks and service nodes, the DPoEV provides a matchmaking service that pairs tasks with the service nodes with the closest ranking scores. Thus, the "rule of relevancy" in service node selection is observed, and service nodes with the highest rankings cannot dominate task handling and assignment. Rather, they have to be "relevant" to the tasks for which they compete. In addition, the "rule of wealth" and "rule of fairness" are used to enforce economic principles. Table 4 is an instance of the ERR Service Node Ranking Score Table.


**Table 4.** ERR Service Node Ranking Score Table.


The service node selected (out of *N* service nodes) for a given task *j* follows the equation:

$$SN_T = \arg\min_{1 \le i \le N} \left| SNR_{ERR,i} - TR_{ERR,j} \right| \tag{8}$$

where *SNRERR,i* is the ERR ranking score of the *i*th service node, and *TRERR,j* is the ERR ranking score of the *j*th task.
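A minimal sketch of the matchmaking rule in Equation (8), pairing a task with the service node whose ERR score is closest to the task's; the node and task scores are hypothetical:

```python
# DPoEV matchmaking sketch: pick the service node minimizing
# |SNR_ERR,i - TR_ERR,j| for incoming task j (Equation (8)).

def match_service_node(node_scores, task_score):
    """Return the index of the node with the closest ERR score."""
    return min(range(len(node_scores)),
               key=lambda i: abs(node_scores[i] - task_score))

nodes = [0.95, 0.72, 0.40]   # SNR_ERR for three service nodes
task = 0.76                  # TR_ERR for the incoming task
best = match_service_node(nodes, task)  # node 1 (score 0.72) is closest
```

Note that the highest-ranked node (0.95) does not win; relevancy, not raw rank, decides the pairing, which is exactly the "rule of relevancy" described above.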

It is important to note that, unlike the EVG, the ERR does not measure the economic value of tasks and service nodes. Rather, it ranks them by their requirements and capabilities, which are not the bearers of economic value but its producers. As such, the ERR plays no role in money supply policy in the DPoEV framework.

#### *5.4. DPoEV Advantages*

The DPoEV incentive consensus algorithm creates and distributes awards to participating nodes in the AIBC ecosystem in the form of CFTX tokens. It eliminates the possibility of macroeconomic-level inflation and deflation, enforces free and fair trade, and balances microeconomic-level supply and demand.

With the EVG and ERR, by design, the DPoEV enforces the economic policies and the "rules of relevancy, wealth and fairness." It thus guarantees that no tasking nodes can dominate task initiation, no super nodes can dominate task handling, and no resource nodes can dominate task assignment.

A key benefit of the DPoEV is that it effectively eliminates the possibility of a 51% attack based on the amount of work (as in Bitcoin's Proof-of-Work) or on wealth accumulation (as in Ethereum's Proof-of-Stake). As a matter of fact, it has the potential to eliminate 51% attacks of any kind.

#### **6. Delegated Adaptive Byzantine Fault Tolerance (DABFT)**

While the DPoEV algorithm provides the application layer incentive consensus, it needs to work with a high-performance fundamental layer distributed consensus protocol that actually provides blockchain services. This bottom layer consensus is the "real" blockchain enabler.

Therefore, unlike most of the existing public chains, the AIBC establishes a two-consensus approach: on the application layer, the DPoEV consensus is responsible for economic policy enforcement, and on the fundamental layer, a Delegated Adaptive Byzantine Fault Tolerance (DABFT) distributed consensus algorithm is responsible for managing each and every transaction in terms of block generation, validation, and ledger recording. While the DPoEV does not need to be real-time as most of the application scenarios do not demand real-time reward distribution, the DABFT has to be real-time, as block validation and ledger recording need to be done quickly and robustly. The goal of DABFT is to achieve up to hundreds of thousands of TPS (Transactions per Second).

#### *6.1. DABFT Design Goals*

The DABFT implements the upper-layer DPoEV economic policies on the fundamental layer and provides the blockchain services of block generation, validation, and ledger recording. It focuses on the AIBC's goals of efficiency, fairness, and legitimacy. Unlike the dominant consensus algorithms (e.g., PoW) that waste a vast amount of energy just for the purpose of winning ledger recording privilege, the DABFT utilizes resources only for meaningful and productive endeavors that produce economic value.

#### *6.2. DABFT Adaptive Approach*

In Section 3 we surveyed the existing blockchain consensus algorithms. In view of their advantages and disadvantages, we conclude that, although some of them offer useful features, none of them alone can fully meet the AIBC goals of efficiency, fairness, and legitimacy, or comply with the DPoEV economic models.

We thus propose the DABFT, which combines the best features of the existing consensus algorithms. Conceptually, the DABFT implements certain PoS features to strengthen the legitimacy of the PoI, and certain PoI features to improve the fairness of PoS. It also improves the PoD's election mechanism with the BFT algorithm.

In addition, the DABFT is extended with a feature of adaptiveness. The DABFT is a delegated mechanism with a higher level of efficiency, essentially a more flexible DBFT that is capable of selecting the BFT flavor most suitable for particular (and parallel) tasks on the fly. The adaptiveness is achieved with deep learning techniques: real-time choices of consensus algorithms for new tasks are inferred from models trained on previous tasks.

Therefore, the DABFT is the perfect tool with which to build an efficient, legitimate, and fair AIBC ecosystem that conducts only meaningful and productive activities.

#### *6.3. DABFT Algorithm Design*

#### 6.3.1. New Block Generation

Upon the release of a new task, a subset of super nodes that are most relevant to the task is selected as the representatives (task validators), who then elect among themselves a single task handler responsible for managing the task. The task handler then selects a number of resource nodes that are most relevant to the task and distributes the task to them. Upon successful release of the new task, the task handler proposes a new block that is then validated by the task validators. A new block is thus born.

Because of the "rule of relevancy," it is highly likely that each new task is assigned a completely different set of task validators and task handler. However, once the task handler and validators are selected, they manage the task from inception to completion (from the root block to the final block of the side chain). Therefore, there is no need for the periodical system-wide reelection of representatives. The key benefit of this arrangement is that no dynasty management is required, which reduces the system's complexity and improves its efficiency.

The real-time selection of task validators and handler for a new task based on the "rule of relevancy" means the DABFT has a built-in "dynamic sharding" feature, which will be explained in a later subsection.

#### 6.3.2. Consensus Building Process

After a task handler proposes a new block, the task validators participate in a round of BFT voting to determine the legitimacy of the block.

At present, none of the mainstream BFT algorithms is optimal for all tasks. The DABFT utilizes a set of effectiveness evaluation algorithms, through AI-based deep learning, to determine the optimal BFT mode for the task at hand. The flavors of BFT algorithms for the DABFT to choose from include, but are not limited to, DBFT and PBFT (Practical BFT), as well as Q/U (Abd-El-Malek et al. 2005), HQ (Cowling et al. 2006), Zyzzyva (Kotla et al. 2009), Quorum (Guerraoui et al. 2010), Chain (Guerraoui et al. 2010), Ring (Guerraoui et al. 2011), and RBFT (Redundant BFT) (Aublin et al. 2013). Figure 3 shows the consensus process for several mainstream BFT algorithms.

Through machine learning prediction, the DABFT dynamically switches the system to the optimal BFT consensus for the present task. The DABFT improves upon the ADAPT (Bahsoun et al. 2015) and is similar to it in several ways. Like the ADAPT, the DABFT has a modular design consisting of three important modules: the BFT System (BFTS), the Event System (ES), and the Quality Control System (QCS). The BFTS is essentially an algorithm engine that modularizes the aforementioned BFT algorithms. The ES collects factors that have a significant impact on the system's performance and security, such as the number of terminals, requests, sizes, etc., and sends task information to the QCS. The QCS drives the system through either a static (Shoker and Bahsoun 2013), dynamic, or heuristic mode, and evaluates a set of Key Performance Indicators (KPIs) and Key Characteristics Indicators (KCIs) to select the optimal BFT flavor for the task at hand.

The QCS computes the evaluation scores of the competing BFT protocols for a particular task and then selects the protocol with the highest score. For a given task *t* and protocol *pi* ∈ *BFTS* with evaluation score *Ei,t* (an element of matrix *E*), the best protocol *pt* is given as:

$$p_t = p_i, \;\text{s.t.}\; E_{i,t} = \max_{1 \le j \le n} E_{j,t} \tag{9}$$

$$\text{where} \begin{cases} E = C \circ P \\ C = \left\lfloor \frac{1}{a}\left(A \circ (e_n - U)\right) \right\rfloor \\ P = B \mathbin{\dot{\vee}} (V \circ W) \end{cases} \tag{10}$$

where *C* is the KCI matrix and *P* the KPI matrix. Matrix *A* represents the profiles (i.e., the KCIs) of the protocols; column matrix *U* represents the KCI user preferences (i.e., the weights); column matrix *en* is a unit matrix used to invert the values of matrix *U* to −*U*. The use of 1/*a* within the integer-value operator rules out protocols that do not match all user preferences in matrix *U*. Matrix *B* represents the KPIs of the protocols, one protocol per row; column matrix *V* represents the user-defined KPI weights for evaluation; column matrix *W* is used in the heuristic mode only, with the same constraints as matrix *V*. The operator "◦" denotes Hadamard (element-wise) multiplication, and the operator "∨̇" denotes Boolean multiplication.

**Figure 3.** Mainstream BFT Algorithm Consensus Processes.
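The selection step of Equations (9) and (10) can be sketched as follows, assuming toy KCI/KPI values rather than real ADAPT profiles; the KCI row acts as a 0/1 filter, and the protocol with the highest Hadamard-product score wins:

```python
# Sketch of the QCS protocol-selection step (Equations (9)-(10)).
# C and P below are illustrative per-protocol score vectors, not the
# full ADAPT matrices.
import numpy as np

protocols = ["PBFT", "Zyzzyva", "RBFT"]
C = np.array([1.0, 1.0, 0.0])     # KCI filter: 0 rules a protocol out
P = np.array([0.62, 0.81, 0.90])  # KPI scores per protocol for this task

E = C * P                         # Hadamard product, per-protocol scores
best = protocols[int(np.argmax(E))]  # "Zyzzyva": RBFT was filtered out
```

Note how RBFT, despite the best raw KPI score, is eliminated by the KCI filter because it fails a user preference; this is the role of the floor/filter term in Equation (10).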

There is one major shortcoming in the ADAPT design. The ADAPT employs the Support Vector Regression (SVR) method (Smola and Schölkopf 2004) with five-fold cross-validation to predict the KPI parameters for the elements of matrix *B*. There are six fields in the dataset: number of clients, request size, response size, throughput, latency, and capacity. While this methodology is useful in BFT settings in and of itself, it is not designed for highly complex blockchain application scenarios with many interactions between participants, and is thus not particularly effective for them. For example, in the AIBC context, at any given time there are multiple tasks (handled by different handlers) that compete for resources. The ever-increasing number of tasks and the interactions between them affect the key KPI parameters (throughput, latency, and capacity) of individual tasks continuously (as time series), for the purpose of achieving the best system-level performance. It is therefore necessary to introduce a mechanism that incorporates time-varying conditional correlations across tasks in order to adjust the KPI parameters on the fly. What sets the DABFT apart from the ADAPT is that the DABFT has such a mechanism built in.

The DABFT implements the time-varying conditional correlation mechanism in the QCS. First, for task *t*, the QCS trains on the existing data to produce the initial matrix *B̂t* (essentially matrix *B* in the ADAPT, but specific to task *t*). It then calculates a residual matrix *Et* as follows<sup>3</sup>:

$$E_t = \left| \hat{B}_t - B_t \right| \tag{11}$$

where *Bt* is the "real" KPI parameter matrix derived from empirical tests.

The specification with the time-varying multi-dimensional correlation matrix for task *t* is thus given as<sup>4</sup>:

$$\begin{aligned} E_t \mid \Psi_{t-1} &\sim \mathcal{N}(0,\; \Omega_t = H_t P_t H_t) \\ H_t^2 &= H_0^2 + \mathcal{K}\, E_{t-1} E_{t-1}^T + \mathcal{A}\, H_{t-1}^2 \\ P_t &= Q_t^{*-1} Q_t Q_t^{*-1} \\ \Xi_t &= H_t^{-1} E_t \\ Q_t &= (1-a-b)\,\overline{Q} + a\, \Xi_{t-1} \Xi_{t-1}^T + b\, Q_{t-1} \\ a + b &< 1 \end{aligned} \tag{12}$$

where:


It is worth mentioning that Equations (11) and (12) only propagate from task *t* back to task *t* − 1, for the purpose of reducing computational complexity.
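The correlation recursion at the heart of Equation (12), i.e., the standard DCC(1,1) update, can be sketched as follows; the parameter values and residuals are illustrative assumptions, not fitted estimates:

```python
# DCC(1,1) update sketch for Equation (12): Q_t mixes the long-run matrix
# Q_bar, the lagged outer product of standardized residuals, and lagged Q.
import numpy as np

def dcc_update(Q_bar, xi_prev, Q_prev, a=0.05, b=0.90):
    """Q_t = (1 - a - b) * Q_bar + a * xi_{t-1} xi_{t-1}^T + b * Q_{t-1}."""
    assert a + b < 1, "stationarity constraint a + b < 1"
    return (1 - a - b) * Q_bar + a * np.outer(xi_prev, xi_prev) + b * Q_prev

Q_bar = np.eye(2)               # long-run correlation target (toy value)
xi = np.array([0.5, -0.2])      # standardized residuals Xi_{t-1}
Q = dcc_update(Q_bar, xi, Q_prev=np.eye(2))

# The correlation matrix P_t rescales Q_t by its own diagonal, so that
# P_t has unit diagonal (the Q*^-1 Q Q*^-1 step of Equation (12)):
P = Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
```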

Finally, the predicted KPI matrix for task *t*, *B̄t*, is given as:

$$\overline{B}_t = \hat{B}_t + \Omega_t \tag{13}$$

From this point onward, the DABFT is similar to the ADAPT, and proceeds to select the BFT protocol with the highest evaluation score based on Equations (9) and (10). For any BFT choice, the DABFT provides fault tolerance of *F* = ⌊(*N* − 1)/3⌋ for a consensus set consisting of *N* task validators. This fault tolerance covers security and availability and is resistant to general and Byzantine faults in any network environment. The DABFT offers deterministic finality: a confirmation is a final confirmation, the chain cannot be forked, and transactions cannot be revoked or rolled back.
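The fault bound for a consensus set of *N* task validators can be checked directly:

```python
# Byzantine fault tolerance of a consensus set: F = floor((N - 1) / 3).

def bft_fault_tolerance(n_validators: int) -> int:
    """Number of Byzantine validators tolerated by a set of N."""
    return (n_validators - 1) // 3

assert bft_fault_tolerance(4) == 1    # 3F + 1 = 4 is the classic minimum
assert bft_fault_tolerance(10) == 3
```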

Under the DABFT consensus mechanism, it is estimated that a block is generated every 0.1 to 0.5 s. The system has a theoretical sustainable transaction throughput of 30,000 TPS and, with proper optimization, has the potential to achieve 100,000 TPS and beyond, making the AIBC ecosystem capable of supporting high-frequency large-scale commercial applications.

<sup>3</sup> The *B̂t* and *Bt* are full matrices made of row vectors for individual BFT flavors, while *Et* is actually a column matrix. The mathematical representation in this subsection is simplified to illustrate the analysis process without losing "high-level" accuracy.

<sup>4</sup> Essentially, this is a Dynamic Conditional Correlation (DCC) model for multivariate time-series analysis with a DCC(1,1) specification (Engle and Sheppard 2001; Engle 2002).

The DABFT has the option to incorporate digital identification technology so that the AIBC can be real-name based, making it possible to freeze, revoke, inherit, retrieve, and transfer assets under judicial decisions. This feature makes the issuance of financial products with compliance requirements possible.

#### 6.3.3. Fork Selection

The DABFT selects the authority chain for each task with a block score at each block height. Under the principle of fairness and legitimacy, the forked chain of blocks with the highest economic value is selected to join the authority chain. The economic value of a forked chain is the sum of the economic value of the forked block and the descendants of that block. This is achievable because all tasks are tracked by their corresponding side chain blocks that will eventually reach finality.
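A minimal sketch of this fork-selection rule, with a hypothetical block structure in which each block carries an economic value and a list of descendant blocks:

```python
# Fork selection sketch: a forked chain's score is the economic value of
# the fork block plus that of all its descendants; the highest-scoring
# fork joins the authority chain. Block layout and values are toy data.

def chain_value(block):
    """Sum economic value over a fork block and all its descendants."""
    return block["value"] + sum(chain_value(c) for c in block["children"])

fork_a = {"value": 5.0, "children": [{"value": 3.0, "children": []}]}
fork_b = {"value": 4.0, "children": [{"value": 2.0, "children": []},
                                     {"value": 4.0, "children": []}]}

authority = max([fork_a, fork_b], key=chain_value)  # fork_b: 10.0 > 8.0
```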

#### 6.3.4. Voting Rules

In order to defend against malicious attacks on the consensus process, the DABFT borrows Casper's concept of a minimum penalty mechanism to constrain task validators' behavior. The voting process observes the following basic rules:


#### 6.3.5. Incentive Analysis

The task validators (including the task handler) participating in the DABFT of a task receive rewards in the form of CFTX tokens according to the DPoEV incentive consensus. The total number of tokens awarded to the task validators is a percentage of the overall number of tokens allocated to the task and is shared by all participating task validators and handler. The number of tokens awarded to the task handler and each task validator is determined by its contribution to the completion of the task. These numbers are dynamically determined by the DPoEV, particularly its EVG engine.

#### *6.4. Attack-Proof*

There are several attacks of particular interest in distributed consensus, and three of the most analyzed are the double-spending attack, the short-range attack, and the 51% attack. In the DPoEV-DABFT dual-consensus AIBC ecosystem, by design, none of these attacks has a chance to succeed.

A double-spending attack happens when a malicious node tries to spend the same tokens through two transactions to two distinct destinations. In a delegated validation regime (e.g., DPoS or DBFT), for such an attack to succeed, the malicious node must first become a validator through the election (with a deposit paid) and then bribe at least one-third of the other validators in order for both transactions to reach finality. It is impossible to succeed at double spending in the DPoEV-DABFT dual-consensus AIBC ecosystem. The reasons are that the validators (super nodes) are chosen by their relevancy to tasks, not by their deposits; that the validators are not allowed to initiate tasks; and that the validators are rewarded based on their levels of contribution, not by other validators. Essentially, the conditions for a double-spending attack do not exist.

A short-range attack is initiated by a malicious node that fakes a chain (the A-chain) to replace the legitimate chain (the B-chain) before the H+1 block expires. In a delegated regime, for this attack to be successful, the attacker needs to bribe the validators in order to make block A1 score higher than B1. Thus, essentially, the short-range attack is very much like a double-spending attack at the A1/B1 block level, and it has no chance to succeed for the same reason that makes the double-spending attack futile.

In the PoW, a 51% attack requires a malicious node to own 51% of the total computing power in the system; in the PoS, 51% of the deposits; and in the PoD, 51% of the certified accounts. In the DPoEV-DABFT dual-consensus AIBC ecosystem, restrained by the economic model, there is no possibility for any node to own more than 51% of the economic value. More importantly, since the validators are not allowed to initiate tasks (and thus transactions), a validator with bad intentions must bribe its compatriots even to launch such an attack. However, the validators are rewarded based on their levels of contribution, not by other validators. Essentially, the conditions for a 51% attack do not exist either.

#### *6.5. Dynamic Sharding*

One of the challenges the mainstream blockchains face is scalability, which is key to performance improvement. Ethereum seeks to resolve the scalability issue with the so-called sharding approach, in which a shard is essentially "an isolated island" (Rosic 2018; Buterin 2018). The DABFT, by design, has a built-in dynamic sharding feature.

First of all, the AIBC ecosystem is a 2D BlockCloud with super nodes that track the status of tasks through side chains. Once a task is initiated, a set of task validators are then selected according to the "rule of relevancy". A task handler is then chosen among the task validators to handle the task. The task handler and validators manage the task from the beginning to the end with no dynasty change. Thus, effectively, from a task's perspective, the task validators form a shard that is responsible for managing it, with the task handler being its leader.

In addition, due to the "rule of relevancy," it is highly likely that each new task is assigned a different set of task validators from the previous task, although overlapping is possible, especially when the number of super nodes is small. Once a task is completed and its associated side chain reaches finality, its shard dissolves automatically. Thus, in the AIBC, no periodic "re-sharding" is necessary. Such fluidity affords the AIBC a "dynamic" sharding feature.

The dynamic sharding feature makes the so-called single-shard takeover attack against the AIBC impossible to succeed. First, shards are formed directly by tasks in a highly random fashion, due to the unpredictable nature of the "rule of relevancy." Second, shards have very short lifespans, as they last only until their tasks are completed. Practically, malicious nodes never have a chance to launch attacks.

The AIBC also maintains a 1D "main chain" at each super node, in which the blocks of the shards' side chains are interleaved. A Merkle tree structure over the 1D blockchain makes it topologically identical to the 2D BlockCloud.
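As an illustration of how a Merkle tree can commit a set of side-chain blocks to a single root, here is a generic sketch; the pairwise SHA-256 scheme is a common convention assumed for illustration, not the AIBC specification:

```python
# Generic Merkle root over side-chain block payloads: hash the leaves,
# then hash pairs level by level until a single root remains.
import hashlib

def merkle_root(leaves):
    """Pairwise SHA-256 hashing up to one root (odd leaf duplicated)."""
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last on odd count
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"side-chain-block-%d" % i for i in range(5)])
```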

#### **7. Conclusions**

The AIBC is an Artificial Intelligence and blockchain technology based large-scale decentralized ecosystem that allows system-wide low-cost sharing of computing and storage resources. The AIBC consists of four layers: a fundamental layer, a resource layer, an application layer, and an ecosystem layer.

The AIBC layers have distinct responsibilities, and thus distinct performance and robustness requirements. The application and resource layers need to follow the economic policies strictly and run on a deterministic and robust protocol; the fundamental layer needs to follow an adaptive protocol with high throughput without sacrificing robustness. As such, the AIBC implements a two-consensus approach: the DPoEV incentive consensus creates and distributes awards on the upper layers, and the DABFT distributed consensus is responsible for blockchain functions on the fundamental layer. The DPoEV is deterministic and does not necessarily require high performance, as most application scenarios do not demand real-time reward distribution. The DABFT is real-time and adaptive, as block validation and ledger recording need to be done quickly and robustly.

The DPoEV follows a set of economic policies (especially the macroeconomic policy that governs monetary policy, and thus token supply), and uses the knowledge map algorithm to accurately assess the economic value of digital assets. The DABFT uses deep learning techniques to predict and select the most suitable BFT algorithm in order to enforce the economic policies on the fundamental layer, as well as to achieve the best balance of performance, robustness, and security. In addition, by design, the DABFT has a built-in dynamic sharding feature, which affords the AIBC scalability while eliminating the possibility of a single-shard takeover.

With the DPoEV-DABFT dual-consensus architecture, the AIBC has a theoretical sustainable transaction throughput of 30,000 TPS and, with proper optimization, the potential to achieve 100,000 TPS and beyond, making the AIBC ecosystem capable of supporting high-frequency large-scale commercial applications. In addition, the dual-consensus architecture, by design, makes the AIBC attack-proof against risks such as double-spending, short-range, and 51% attacks.

Our contribution is four-fold: that we develop a set of innovative economic models governing the monetary, trading and supply-demand policies in the AIBC; that we establish an upper-layer DPoEV incentive consensus algorithm that implements the economic policies; that we provide a fundamental layer DABFT distributed consensus algorithm that executes the DPoEV with adaptability; and that we prove the economic models can be effectively enforced by AIBC's DPoEV-DABFT dual-consensus architecture.

**Funding:** This research was funded by Cofintelligence Financial Technology Ltd. (Hong Kong and Shanghai, China).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Guerraoui, Rachid, Nikola Knezevic, Vivien Quema, and Marko Vukolic. 2011. *Stretching BFT*. Lausanne: EPFL.


© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
