**Financial Distress Prediction and Feature Selection in Multiple Periods by Lassoing Unconstrained Distributed Lag Non-linear Models**

#### **Dawen Yan <sup>1</sup>, Guotai Chi <sup>2</sup> and Kin Keung Lai <sup>3,</sup>\***


Received: 16 July 2020; Accepted: 29 July 2020; Published: 3 August 2020

**Abstract:** In this paper, we propose a new framework for a financial early warning system that combines the unconstrained distributed lag model (DLM) with widely used financial distress prediction models, such as the logistic model and the support vector machine (SVM), for the purpose of improving the performance of an early warning system for listed companies in China. We simultaneously introduce the 3~5-period-lagged financial ratios and macroeconomic factors in the consecutive time windows *t* − 3, *t* − 4 and *t* − 5 into the prediction models; thus, the influence of early, continued changes within and outside the company on its financial condition is detected. Further, by introducing a lasso penalty into the logistic-distributed lag and SVM-distributed lag frameworks, we implement feature selection and exclude potentially redundant factors, considering that a long original list of accounting ratios is used in the financial distress prediction context. We conduct a series of comparison analyses to test the predictive performance of the models proposed in this study. The results show that our models outperform logistic, SVM, decision tree and neural network (NN) models in a single time window, which implies that models incorporating indicator data in multiple time windows convey more information for financial distress prediction than the existing single time window models.

**Keywords:** financial distress prediction; unconstrained distributed lag model; multiple periods; Chinese listed companies

#### **1. Introduction**

Over the last four decades, models and methods for the prediction of corporate financial distress have attracted considerable interest among academics as well as practitioners. Financial distress prediction models can be used for many purposes including: monitoring of the solvency of regulated companies, assessment of loan default risk and the pricing of bonds, credit derivatives, and other securities exposed to credit risk (see [1–4]).

Different countries have different accounting procedures and rules; thus, the definition of financial distress put forward by different scholars is not always the same (see [4–7]). Bankruptcy is one of the most commonly used outcomes of financial distress of a company [5]. The nature of a bankrupt firm is that the owners can abandon the firm and transfer ownership to the debt holders, and bankruptcy occurs whenever the realized cash flow is less than the debt obligations [8]. It is generally agreed that financial failure leads to a substantive weakening of the profitability of the company over time, but it is also possible that a financially distressed firm does not change its formal status to bankrupt [9]. Therefore, in this paper, as done by [3] and [4], we identify a financially distressed company as one at risk of failing, but which remains a viable entity at the present time. More specifically, "special treatment" (ST) is used to measure the financial distress status of a listed company. Further, in this paper, we provide a group of financial distress prediction models that incorporate the panel data of financial and macroeconomic indicators to implement early financial distress prediction.

In the existing studies, classical statistical methods such as discriminant analysis [1], logistic regression and multinomial logit models (see [2–4,10]), heuristic algorithms such as the genetic algorithm and particle swarm optimization [11], and currently popular machine learning techniques such as the support vector machine, decision tree and neural networks (see [7,12–16]) have been widely applied to develop financial distress prediction models. The relevant studies also recognize that a set of indicators can be used to predict financial distress, including financial indicators (e.g., see [7,17–20]) and macroeconomic indicators (e.g., see [4]). For example, accounting models such as Altman's 5-variable (Z-score) model (see [1]) and the 7-variable model (see [21]) have gained popularity in both academic and industrial fields due to their discriminating ability and predictive power.

A few extant studies have predicted financial distress by using the accounting ratios from one or more years prior to its observation, given that early changes in financial indicators may provide a warning sign of deteriorating financial conditions. The authors of [1] provide evidence that bankruptcy can be predicted two years prior to the event, while those of [4] construct two groups of financial distress prediction models for two periods: one year and two years before the observation of the financial distress event. They found that the models in both time windows have good predictive performance. Other similar examples can be found in [7,20,22].

In the relevant literature, financial indicators in different time windows have proven their contribution to the performance of distress prediction models, although the degree of their impact tends to change over time. However, the procedures are performed for each lag period separately, i.e., using only the information of one specific year prior to the date of the distress event. To the best of the authors' knowledge, no previous study of the financial distress prediction problem for listed companies takes into account the impacts of relevant indicators in different, consecutive lag periods.

In this study, we bring the distributed lag model (DLM), in addition to classical classification techniques, into the financial distress prediction problem and propose a group of distress prediction models, including the logistic-distributed lag model and the SVM-distributed lag model, that can be treated as generalized distributed lag models. We construct the linkage between multiple lagged values of financial ratios and macroeconomic indicators and the current financial status in order to capture the dynamic nature of the relevant data. Further, we propose to implement the penalized logistic-distributed lag financial distress model with the least absolute shrinkage and selection operator (lasso) penalty via the algorithm framework of the alternating direction method of multipliers (ADMM), which yields the global optimum for convex and non-smooth optimization problems. The lasso-type penalty is applied for three purposes: to avoid the collinearity problem in applying the distributed lag models directly, to simultaneously select significant variables and estimate parameters, and to address the problem of over-fitting. We conduct a series of empirical studies to illustrate the application of our distributed lag financial distress models, including a comparison of the predictive performances of the two distributed lag financial distress models proposed in this paper as well as comparisons of the predictive performances of our models with a group of widely used classification models in different time windows. The results show that all distributed lag financial distress models aggregating the data in three consecutive time windows outperform the ones that incorporate the data in any single period (3 years, 4 years or 5 years before the observation year of financial distress) when the financial and macroeconomic factors are included.
This paper may provide a means of improving the predictive performance of financial distress model by incorporating data of financial and macroeconomic indicators in consecutive and multiple periods before the observation of financial distress.

The rest of this paper is organized as follows. Section 2 briefly reviews the previous financial distress prediction literature and distributed lag modeling. Section 3 constructs a group of generalized distributed lag models composed of lagged explanatory variables and *l*<sub>1</sub> regularization, namely the logistic regression-distributed lag model and the lasso–SVM model with lags, and proposes the ADMM algorithm framework for coefficient estimation and variable selection at the same time. Section 4 provides a description of the data. Section 5 presents the empirical results and compares the predictive performance of our models, which reflect the extended lag effects of the indicators used, with the existing financial distress prediction models. Section 6 concludes the paper.

#### **2. Background**

#### *2.1. Literature on Financial Distress Prediction*

#### 2.1.1. Financial Factors and Variable Selection

There is a large amount of theoretical and empirical microeconomic literature pointing to the importance of financial indicators in financial distress forecasting. The authors of [1] selected five financial indicators of strong predictive ability from an initial set of 22 financial indicators using stepwise discriminant analysis: earnings before interest and taxes/total assets, retained earnings/total assets, working capital/total assets, market value equity/book value of total debt and sales/total assets, the last of which measures the productivity of the assets of a firm. Other studies that consider similar accounting ratios in financial distress prediction can be found in [23,24]. Furthermore, the current-liabilities-to-current-assets ratio used to measure liquidity (see [22,25]), the total-liabilities-to-total-assets ratio used to measure the degree of indebtedness of a firm (see [22,25,26]) and cash flow (see [18]) have been incorporated into distress prediction models because of their predictive performance. The authors of [20] considered nine financial indicators and used them to predict regulatory financial distress in Brazilian electricity distributors. The authors of [7] introduced 31 financial indicators and found that the most important financial indicators may be related to net profit, earnings before income tax, cash flow and net assets. Along this line, with the development of machine learning methods, more diversified financial indicators have been considered in the very recent studies of Hosaka [27], Korol et al. [28], and Gregova et al. [29].

Another very important recent research line in the area of financial distress prediction is the suitability problem of these financial ratios used as explanatory variables. For example, Kovacova et al. [30] discussed the dependence between explanatory variables and the Visegrad Group (V4) and found that enterprises of each country in V4 prefer different explanatory variables. Kliestik [31] chose eleven explanatory financial variables and proposed a bankruptcy prediction model based on local law in Slovakia and business aspects.

In this paper, we construct an original financial dataset including 43 financial ratios. The ratios are selected on the basis of their popularity in the previous literature (see [1,7,27]) and potential relevancy to this study. Then, like many relevant studies [32–34], we use the lasso method and conduct feature selection in order to exclude the potentially redundant factors.

#### 2.1.2. Macroeconomic Conditions

Macroeconomic conditions are relevant for the business environment in which firms operate; thus, the deterioration of macroeconomic conditions may induce financial distress. Macroeconomic variables have been found to impact corporate default and bankruptcy risk significantly, and good examples can be found in [35–39]. Regarding the financial distress of listed companies, the authors of [4] consider the macroeconomic indicators of the retail price index and the short-term bill rate adjusted for inflation in addition to the accounting variables. Their results suggest that all macroeconomic indicators have a significant impact on the likelihood of a firm's financial distress. In this paper, we control for macroeconomic conditions, namely GDP growth, inflation, the unemployment rate in urban areas and consumption level growth, over the sample period. GDP growth is widely understood to be an important variable to measure economic strength and prosperity, and an increase in GDP growth may decrease the likelihood of distress. The authors of [22,40] have pointed out that a decline in GDP is significantly linked to the tightening of a firm's financial conditions, especially during a financial crisis period. The unemployment rate and inflation are two broadly used measures of the overall health of the economy. High unemployment and high inflation, which reflect a weaker economy, may increase the likelihood of financial distress. Their impacts on financial distress have been examined in [16,37].

Unlike the existing relevant studies, which consider the lagged effect of macroeconomic variables only in a fixed window, such as 3 years prior to financial distress [16], this paper imposes a distributed lag structure on macroeconomic data, in addition to financial ratio data, and considers the lagged effects of the factors over multiple periods. Particular attention is devoted to the lag structure and to whether the predictive performance can be improved after introducing a series of lagged macroeconomic variables. The theoretical and empirical investigations in this study may complement the literature on financial distress prediction concerned with applying dynamic macro and financial data.

#### 2.1.3. Related Literature on Chinese Listed Companies

The Chinese stock market has grown to over 55 trillion in market capitalization as of February 2020, and the number of listed firms has surpassed 3200, making it the world's second largest market. Its ongoing development and the parallel evolution of regulations have made China's stock market an important subject for mainstream research in financial economics [41]. In April 1998, the Shanghai and Shenzhen stock exchanges implemented a special treatment (ST) system for stock transactions of listed companies with abnormal financial conditions or other abnormal conditions. According to the regulations, there are three main reasons for the designation of an *ST* company: (1) a listed company has negative net profits for two consecutive years; (2) the shareholders' equity of the company is lower than the registered capital; (3) a firm's operations have stopped and there is no hope of restoring operations in the next 3 months due to natural disasters, serious accidents, or lawsuits and arbitration [7]. *ST* status is therefore commonly used as a proxy for financial distress (e.g., [7,16,42–45]).

Researchers regard the financial distress prediction of Chinese listed companies as a data-mining task, and use data mining, machine learning or statistical methods to construct a series of prediction models incorporating financial data (see [7,42,43]) or financial plus macroeconomic data [16] in a single time period, but not in multiple periods. In the very recent study of [45], the authors proposed a financial distress forecast model that combines multi-period forecast results. First, with commonly used classifiers such as the support vector machine (SVM) and the decision tree (DT), the two- to five-year-ahead financial distress forecast models are established one by one and denoted as the *T*-2 to *T*-5 models, respectively. Then, the forecast results of these single-period models are combined into multi-period forecast results, computed as a weighted average over a fixed window with exponentially declining weights. This is clearly different from our model, as we introduce multi-period lagged explanatory variables and simultaneously detect the effects of the variables in different prior periods on financial distress in the process of modeling.

#### *2.2. Distributed Lag Models*

Sometimes the effect of an explanatory variable on a specific outcome, such as the changes in mortality risk, is not limited to the period when it is observed, but is delayed in time [46,47]. This introduces the problem of modeling the relationship between a future outcome and a sequence of lags of explanatory variables, specifying the distribution of the effects at different times before the outcome. Among the various methods that have been proposed to deal with delayed effects, distributed lag models (DLMs), a major econometric approach, have been applied in diverse research fields, including assessing the distributed lag effects of air pollutants on children's health [48], hospital admission scheduling [49], and economic and financial time series analysis [50,51].

DLMs model the response *Y*<sub>*t*</sub>, observed at time *t*, in terms of past values of the independent variable *X*, and have a general representation given by

$$g(Y_t) = \alpha_0 + \sum_{l=0}^{L} s_l(x_{t-l}; \alpha_l) + \varepsilon_t, \quad t = L + 1, \dots, T \tag{1a}$$

where *Y*<sub>*t*</sub> is the response at time *t* and *g* is a monotonic link function; the functions *s*<sub>*l*</sub> denote a smoothed relationship between the explanatory vector *x*<sub>*t*−*l*</sub> and the parameter vector α<sub>*l*</sub>; α<sub>0</sub> and ε<sub>*t*</sub> denote the intercept term and the error term with zero mean and constant variance σ<sup>2</sup>; *l* and *L* are the lag number and the maximum lag. The form of the link function *g* determines whether the model is a distributed lag linear model or a non-linear one. For example, a linear *g* with a continuous response *Y*<sub>*t*</sub> gives a distributed lag linear model, while a logit *g* with a binary response *Y*<sub>*t*</sub> gives a distributed lag non-linear model. In model (1a), the parametric function *s*<sub>*l*</sub> is applied to model the shape of the lag structure, usually through polynomials (see [49,52]) or, less often, regression splines [53] or more complicated smoothing techniques such as penalized splines within generalized additive models [47,48]. In fact, *s*<sub>*l*</sub> was originally introduced to address the problem that successive past observations may be collinear. If *L*, the number of relevant lagged values of *X*, is small, as may well be the case when annual data are involved, then model (1a) reduces to an unconstrained distributed lag model with the following general representation

$$g(Y_t) = \alpha_0 + \sum_{l=0}^{L} \alpha_l^{T} x_{t-l} + \varepsilon_t, \quad t = L + 1, \dots, T \tag{1b}$$

In (1b), the definitions of the variables are the same as those in (1a). Correspondingly, the coefficients in model (1b) can be estimated directly by pooled least squares in the linear case or by pooled maximum likelihood in the non-linear case (e.g., with a logit link function *g*), under the assumption that *x*<sub>*t*−*l*</sub> is strictly exogenous [54].
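To make the unconstrained structure in (1b) concrete, the following minimal sketch stacks the lagged explanatory vectors of one unit into a single design row per period, so that each lag receives its own freely estimated coefficient block. The function name `build_lagged_rows` and the toy series are hypothetical illustrations, not part of the original study.

```python
def build_lagged_rows(series, lags):
    """Stack lagged copies of a per-period feature series into design rows.

    series: list of feature vectors; series[t] plays the role of x_t.
    lags:   non-negative lag indices, e.g. (3, 4, 5).
    Returns (times, rows): rows[k] concatenates x_{t-l} for each lag l,
    for every period t = times[k] that has all requested lags available.
    """
    lags = list(lags)
    max_lag = max(lags)
    times, rows = [], []
    for t in range(max_lag, len(series)):
        row = []
        for l in lags:
            row.extend(series[t - l])  # one coefficient block per lag
        times.append(t)
        rows.append(row)
    return times, rows


# Hypothetical toy data: one firm, two ratios, seven years (t = 0..6).
x = [[float(t), 10.0 * t] for t in range(7)]
times, rows = build_lagged_rows(x, lags=(3, 4, 5))
print(times)    # periods that have a complete lag window
print(rows[0])  # stacked features x_{t-3}, x_{t-4}, x_{t-5} for the first one
```

With *p* features and three lags, each row has 3*p* entries, which is why the number of coefficients (and the risk of collinearity between adjacent lags) grows quickly with the lag window.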

In this paper, logistic regression with an unconstrained distributed lag structure is used to identify the relationship between financial and macroeconomic indicators and future outcome of financial distress. The logistic regression may be the most frequently used technique in the financial distress prediction field ([4,20]) because logistic regression relies on fewer assumptions due to the absence of the need for multivariate normality and homogeneity in the variance–covariance matrices of the explanatory variables [23]. Further, a lasso penalty is introduced to conduct simultaneous parameter estimation and variable selection, considering that the lasso penalty method has good performance for solving the overfitting problem caused by the introduction of factors in adjacent windows and selecting the features and the corresponding exposures with relatively significant influence on the response. In fact, the lasso method has been applied for linear-distributed lag modelling (e.g., [55]).

#### **3. Methodology**

In this section, by combining the logistic regression method and the unconstrained distributed lag model, we seek to estimate which indicators, and in which periods prior to the distress event, best predict financial distress. First, we construct Model 1, which represents the "accounting-only" model and incorporates the financial ratios. We introduce the financial ratios of three lagged periods as independent variables into Model 1 and use the model to predict the financial distress event in year *t* by using the data of the relevant indicators in the consecutive years *t* − 3, *t* − 4 and *t* − 5 simultaneously. Note that *t* refers to the current year in this paper. Then, we construct Model 2, which represents the "accounting plus macroeconomic indicators" model and includes, in addition to the accounting variables, the macroeconomic indicators of the same three lagged periods. Next, we introduce a lasso penalty into the models and implement coefficient estimation and feature selection. Further, we provide the algorithm framework of the alternating direction method of multipliers (ADMM), which yields the global optimum for convex and non-smooth optimization problems, to obtain the optimal estimates of the coefficients. Finally, we propose a support vector machine model that includes the lagged variables of the accounting ratios and macroeconomic factors. This model is used for comparison with the predictive performance of the logistic model with distributed lags of the variables.

#### *3.1. Logistic Regression Framework with Distributed Lags*

#### 3.1.1. The Logistic Regression-Distributed Lag Model with Accounting Ratios Only

The logistic regression may be the most frequently used technique in the financial distress prediction field and has been widely recognized ([11,20]). We propose a logistic model composed of lagged explanatory variables. Similar to the distributed lag linear model, the model has the following general form:

$$P\left(Y_{i,t} = 1 \mid X_{i,t-l}\right) = \left(1 + \exp\left(-\left(\alpha_0 + \sum_{l=0}^{L} \alpha_{t-l}^{T} X_{i,t-l}\right)\right)\right)^{-1}, \quad t = t_0 + L, t_0 + L + 1, \dots, t_0 + L + d \tag{2}$$

In (2), *Y*<sub>*i*,*t*</sub> is a binary variable: *Y*<sub>*i*,*t*</sub> = 1 means that firm *i* (*i* = 1, 2, ... , *n*) is a financially distressed company at time *t*, and *Y*<sub>*i*,*t*</sub> = 0 means that it is a financially healthy company; α<sub>0</sub> is the intercept, and α<sub>*t*−*l*</sub> = (α<sub>*t*−*l*,1</sub>, α<sub>*t*−*l*,2</sub>, ... , α<sub>*t*−*l*,*p*</sub>)<sup>T</sup> is the coefficient vector for the explanatory variable vector *X*<sub>*i*,*t*−*l*</sub> at time *t* − *l*; *X*<sub>*i*,*t*−*l*</sub> is the *p*-dimensional accounting ratio vector for firm *i* at time *t* − *l*, *l* = 0, 1, 2, ... , *L*; *l* and *L* are the lag number and the maximum lag; *t*<sub>0</sub> is the beginning of the observation period and *d* is the duration of observation. The idea in (2) is that the likelihood of financial distress at time *t* for a listed company may depend on *X* measured not only at the current time *t*, but also in the previous time windows *t* − 1 through *t* − *L*.

In Formula (2), we assume a five-year effect and set the maximum lag *L* = 5, given that (1) the effect of an explanatory variable on the response variable may decline to zero in the time series data scenario; (2) the considered lag length is no more than 5 years in most previous studies of financial distress prediction (see [1,7,37]). In addition, we directly set the coefficients α<sub>*t*−0</sub>, α<sub>*t*−1</sub> and α<sub>*t*−2</sub> for the variables in the current year and the previous two years before ST to be zero vectors, because (1) the financial statement of the current year (year *t*) is not available for companies labeled as financially distressed in year *t*, since the financial statement is published at the end of the year, whereas special treatment probably occurs before its publication; (2) the designation of an ST company depends on the financial and operating situations of the year before the ST label. Put simply, it is not meaningful to forecast ST risk 0, 1 or 2 years ahead (see [7,45]). Therefore, the logistic model containing the 3~5-period-lagged financial indicators, defined as Model 1, is presented as follows:

$$P(Y_{i,t} = 1) = \left(1 + \exp\left(-\alpha_0 - \alpha_{t-3}^{T} X_{i,t-3} - \alpha_{t-4}^{T} X_{i,t-4} - \alpha_{t-5}^{T} X_{i,t-5}\right)\right)^{-1} \tag{3}$$

In Equation (3), *Y*<sub>*i*,*t*</sub> is the binary response defined as in (2); *X*<sub>*i*,*t*−3</sub>, *X*<sub>*i*,*t*−4</sub> and *X*<sub>*i*,*t*−5</sub> are the *p*-dimensional financial indicator vectors of firm *i* observed in years *t* − 3, *t* − 4 and *t* − 5; α<sub>0</sub> is the intercept term, and α<sub>*t*−3</sub>, α<sub>*t*−4</sub> and α<sub>*t*−5</sub> are the coefficient vectors for the explanatory vectors *X*<sub>*i*,*t*−3</sub>, *X*<sub>*i*,*t*−4</sub> and *X*<sub>*i*,*t*−5</sub>, respectively. Each component of α<sub>*t*−*l*</sub> (*l* = 3, 4, 5) stands for the average effect of a one-unit increase in the corresponding component of *X*<sub>*i*,*t*−*l*</sub> on the log odds of the financial distress event, holding the others constant. Thus, Model 1 considers the effect of changes in financial ratios on the financial distress probability over three consecutive years (*t* − 3, *t* − 4, *t* − 5).
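As a small illustration of how Equation (3) turns lagged ratios into a distress probability, the sketch below evaluates the logistic function on the sum of the three lag-specific linear terms. The function name `distress_probability` and all coefficient and ratio values are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math


def distress_probability(alpha0, alphas, x_lags):
    """P(Y = 1) for a Model 1 style logistic model with lagged ratio vectors.

    alphas: dict mapping lag -> coefficient list (one block per lag).
    x_lags: dict mapping lag -> observed ratio list for that lag.
    """
    z = alpha0
    for lag, coef in alphas.items():
        z += sum(a * x for a, x in zip(coef, x_lags[lag]))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and ratios for lags 3, 4 and 5 (p = 2).
alphas = {3: [1.5, -0.5], 4: [0.8, 0.0], 5: [0.2, 0.0]}
x_lags = {3: [0.6, 0.1], 4: [0.5, 0.2], 5: [0.4, 0.3]}
p = distress_probability(-2.0, alphas, x_lags)
print(round(p, 4))
```

A zero entry in a coefficient block (as for the second ratio at lags 4 and 5 here) means that ratio at that lag does not enter the prediction, which is the pattern the lasso penalty introduced later is meant to produce.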

#### 3.1.2. The Logistic Regression-Distributed Lag Model with Accounting Plus Macroeconomic Variables

We further add the macro-economic factors into Equation (3) to detect the influence of macroeconomic conditions, in addition to financial indicators. Model 2, including both accounting variables and macroeconomic variables, takes the following form:

$$P(Y_{i,t} = 1) = \left(1 + \exp\left(-\alpha_0 - \alpha_{t-3}^{T} X_{i,t-3} - \alpha_{t-4}^{T} X_{i,t-4} - \alpha_{t-5}^{T} X_{i,t-5} - \eta_{t-3}^{T} Z_{i,t-3} - \eta_{t-4}^{T} Z_{i,t-4} - \eta_{t-5}^{T} Z_{i,t-5}\right)\right)^{-1} \tag{4}$$

In Equation (4), *Z*<sub>*i*,*t*−*l*</sub> (*l* = 3, 4, 5) represents the *m*-dimensional macroeconomic factor vector of year *t* − *l*; η<sub>*t*−*l*</sub> (*l* = 3, 4, 5) is the coefficient vector for *Z*<sub>*i*,*t*−*l*</sub>; the others are defined as in Equation (3). Similarly, η<sub>*j*,*t*−3</sub> + η<sub>*j*,*t*−4</sub> + η<sub>*j*,*t*−5</sub> represents the cumulative effect of the *j*-th (*j* = 1, 2, ... , *m*) macroeconomic factor on the log odds of the distress event.
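The cumulative-effect interpretation can be read off by taking the logit of Equation (4); this is a direct algebraic rearrangement rather than an additional modeling assumption:

$$\ln\frac{P(Y_{i,t} = 1)}{1 - P(Y_{i,t} = 1)} = \alpha_0 + \sum_{l=3}^{5} \alpha_{t-l}^{T} X_{i,t-l} + \sum_{l=3}^{5} \eta_{t-l}^{T} Z_{i,t-l}$$

Because the log odds are linear in every lagged variable, raising the *j*-th macroeconomic factor by one unit in each of the three lagged years, with everything else held fixed, changes the log odds by the sum of its three lag coefficients.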

Models 1 and 2, marked as Equations (3) and (4), can reflect the continued influence of the financial statement and macroeconomic conditions for multi-periods on the response; however, a considerable amount of potentially helpful financial ratios, macroeconomic factors and their lags may bring redundant information, thus decreasing the models' forecast performances. In the following section, we implement feature selection by introducing lasso penalty into the financial distress forecast logistic models. Further, we provide an ADMM algorithm framework to obtain the optimal estimation for the coefficients.

#### *3.2. The Lasso–Logistic Regression-Distributed Lag Model*

There is currently much discussion about the lasso method. Lasso, as an *l*1-norm penalization approach, has been actively studied. In particular, lasso has been used on the distributed lag linear model, and lasso estimators for coefficients are obtained through minimizing the residual sum of squares and the *l*1-norm of coefficients simultaneously (e.g., [55]). For the logistic model with lagged financial variables (3), we can extend to logistic–lasso as follows in Equation (5):

$$(\hat{\alpha}_0, \hat{\alpha}) = \underset{\alpha_0, \alpha}{\arg\min}\; f(\alpha_0, \alpha \mid X_{i,t-3}, X_{i,t-4}, X_{i,t-5}, Y_{i,t}) + \lambda \|\alpha\|_1 \tag{5}$$

where

$$f(\alpha_0, \alpha \mid X_{it}, Y_{i,t}) = \sum_{t=t_0+5}^{t_0+d} \sum_{i=1}^{n} \left( -Y_{i,t}\left(\alpha_0 + \alpha^{T} X_{it}\right) + \ln\left(1 + \exp\left\{\alpha_0 + \alpha^{T} X_{it}\right\}\right) \right)$$

and α̂<sub>0</sub> and α̂ denote the lasso-penalized maximum likelihood estimates of the intercept α<sub>0</sub> and the coefficient vector α; *f* denotes the negative log-likelihood function of Model 1 and can be regarded as the loss function of the observations; α = (α<sub>*t*−3</sub><sup>T</sup>, α<sub>*t*−4</sub><sup>T</sup>, α<sub>*t*−5</sub><sup>T</sup>)<sup>T</sup> collects the unknown coefficients of the explanatory variables; *X*<sub>*it*</sub> = (*X*<sub>*i*,*t*−3</sub><sup>T</sup>, *X*<sub>*i*,*t*−4</sub><sup>T</sup>, *X*<sub>*i*,*t*−5</sub><sup>T</sup>)<sup>T</sup> and *Y*<sub>*i*,*t*</sub> are the known training observations defined as above; λ is the tuning parameter; ‖·‖<sub>1</sub> denotes the *l*<sub>1</sub>-norm of a vector, i.e., the sum of the absolute values of its elements; *t*<sub>0</sub> and *d* are defined as before; *n* is the number of observed company samples.
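The objective in (5) can be evaluated directly from its definition, as in the following minimal sketch. It assumes that the firm-year observations have already been flattened into a list of stacked rows (one row per pair of firm and year); the names `softplus` and `lasso_logistic_objective` and the toy data are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable ln(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def lasso_logistic_objective(alpha0, alpha, X, y, lam):
    """Negative log-likelihood plus the l1 penalty on alpha.

    X: list of stacked lagged feature rows; y: list of 0/1 labels.
    The intercept alpha0 is not penalized.
    """
    nll = 0.0
    for x_row, y_i in zip(X, y):
        z = alpha0 + sum(a * x for a, x in zip(alpha, x_row))
        nll += -y_i * z + softplus(z)
    return nll + lam * sum(abs(a) for a in alpha)


# Hypothetical toy data with two stacked features.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [1, 0]
print(lasso_logistic_objective(0.0, [0.0, 0.0], X_toy, y_toy, lam=0.5))
print(lasso_logistic_objective(0.0, [1.0, -1.0], X_toy, y_toy, lam=0.5))
```

With all coefficients at zero, every observation contributes ln 2 to the loss and the penalty vanishes; moving coefficients away from zero lowers the loss on well-fitted observations but is charged λ per unit of absolute coefficient, which is what drives weak lagged features to exactly zero.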

Introducing the auxiliary variable β ∈ *R*<sup>3*p*</sup>, the lasso–logistic model (5) can be explicitly rewritten as follows:

$$\min_{\alpha_0, \alpha, \beta} f(\alpha_0, \alpha \mid X_{i,t-3}, X_{i,t-4}, X_{i,t-5}, Y_{i,t}) + \lambda \|\beta\|_1 \quad \text{s.t.} \quad \alpha = \beta \tag{6}$$

In this paper, we solve the optimization problem (6) by using the alternating direction method of multipliers (ADMM) algorithm, which was first introduced by [56]. ADMM is a simple but powerful algorithm and can be viewed as an attempt to blend the benefits of dual decomposition [57] and augmented Lagrangian methods for constrained optimization [58]. The ADMM algorithm has become a benchmark first-order solver, especially for convex and non-smooth minimization models with separable objective functions (see [59,60]); thus, it is applicable to problem (6).

The augmented Lagrangian function of the optimization problem (6) can be defined as

$$L_{\rho}(\alpha_0, \alpha, \beta, \theta) = f(\alpha_0, \alpha \mid X_{it}, Y_{i,t}) + \lambda \|\beta\|_1 - \theta^{T}(\alpha - \beta) + \frac{\rho}{2} \|\alpha - \beta\|_2^2 \tag{7}$$

where *L*<sub>ρ</sub> is the augmented Lagrangian function; θ is the Lagrange multiplier vector and ρ (> 0) is the penalty parameter of the augmented Lagrangian. In this paper, ρ is fixed at 1 for simplicity. Then, the iterative scheme of ADMM for the optimization problem (6) reads as

$$\left(\alpha_0^{k+1}, \alpha^{k+1}\right) = \underset{\alpha_0, \alpha}{\arg\min}\; L_{\rho}\left(\alpha_0, \alpha, \beta^k, \theta^k\right) \tag{8a}$$

*Mathematics* **2020**, *8*, 1275

$$\beta^{k+1} = \underset{\beta}{\arg\min}\; L_{\rho}\left(\alpha_0^{k+1}, \alpha^{k+1}, \beta, \theta^{k}\right) \tag{8b}$$

$$\theta^{k+1} = \theta^k - \rho \left(\alpha^{k+1} - \beta^{k+1}\right) \tag{8c}$$

In (8a)–(8c), α<sub>0</sub><sup>*k*+1</sup>, α<sup>*k*+1</sup>, β<sup>*k*+1</sup> and θ<sup>*k*+1</sup> are the values of α<sub>0</sub>, α, β and θ at the (*k* + 1)-th iterative step of the ADMM algorithm, respectively. Further, the ADMM scheme (8a)–(8c) can be specified as

$$\left(\alpha_0^{k+1}, \alpha^{k+1}\right) = \underset{\alpha_0, \alpha}{\arg\min}\; f(\alpha_0, \alpha) - \left(\theta^k\right)^{T} \left(\alpha - \beta^k\right) + \frac{\rho}{2} \left\|\alpha - \beta^k\right\|_2^2 \tag{9a}$$

$$\beta^{k+1} = \underset{\beta}{\arg\min}\; \lambda \left\|\beta\right\|_1 - \left(\theta^k\right)^{T} \left(\alpha^{k+1} - \beta\right) + \frac{\rho}{2} \left\|\alpha^{k+1} - \beta\right\|_2^2 \tag{9b}$$

$$\theta^{k+1} = \theta^k - \rho \left(\alpha^{k+1} - \beta^{k+1}\right) \tag{9c}$$

The sub-problem in (9a), that is, the convex and smooth optimization problem, can be fast solved by the Newton method [61], after setting the initial θ, β to be arbitrary constants. More specifically, let α*\* <sup>k</sup>*+<sup>1</sup> = (α<sup>0</sup> *<sup>k</sup>*+1*;* α*k*<sup>+</sup>1) and α*\* <sup>k</sup>*+<sup>1</sup> be calculated via the following process:

$$
\alpha\_\*^{k+1} = \alpha\_\*^k - \left(\nabla^2 l\right)^{-1} \nabla l \tag{10}
$$

where

$$l(\alpha\_\*) = l(\alpha\_0; \alpha) = f(\alpha\_0, \alpha) - \left(\theta^k\right)^T \left(\alpha - \beta^k\right) + \frac{\rho}{2} \|\alpha - \beta^k\|\_2^2$$

and ∇<sup>2</sup>*l* ∈ *R*<sup>(3*p*+1)×(3*p*+1)</sup> and ∇*l* ∈ *R*<sup>3*p*+1</sup> are the Hessian matrix and the gradient of the differentiable function *l* with respect to α<sub>\*</sub>, respectively. For sub-problem (9b), its solution is analytically given by

$$\beta\_r^{k+1} = \begin{cases} \alpha\_r^{k+1} - \frac{\lambda + \theta\_r^k}{\rho}, & \alpha\_r^{k+1} > \frac{\lambda + \theta\_r^k}{\rho} \\ 0, & \frac{-\lambda + \theta\_r^k}{\rho} < \alpha\_r^{k+1} \le \frac{\lambda + \theta\_r^k}{\rho} \\ \alpha\_r^{k+1} - \frac{-\lambda + \theta\_r^k}{\rho}, & \alpha\_r^{k+1} \le \frac{-\lambda + \theta\_r^k}{\rho} \end{cases} \tag{11}$$

where β<sub>*r*</sub><sup>*k*+1</sup>, α<sub>*r*</sub><sup>*k*+1</sup> and θ<sub>*r*</sub><sup>*k*</sup> are the *r*-th components of β<sup>*k*+1</sup>, α<sup>*k*+1</sup> and θ<sup>*k*</sup>, respectively, for the *k*-th iterative step and *r* = 1, 2, ... , 3*p*.
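The closed-form update (11) is a component-wise soft-thresholding step. The following is a minimal sketch, not the authors' code; the function name `beta_update` is hypothetical, and the lowest branch is written as α<sub>*r*</sub> − (−λ + θ<sub>*r*</sub>)/ρ, which is what the first-order condition of (9b) gives for negative β<sub>*r*</sub>.

```python
def beta_update(alpha, theta, lam, rho):
    """Component-wise closed-form solution of sub-problem (9b), as in (11).

    alpha, theta: lists holding alpha^{k+1} and theta^k (same length).
    lam, rho: lasso tuning parameter and augmented Lagrange parameter.
    """
    beta = []
    for a_r, th_r in zip(alpha, theta):
        upper = (lam + th_r) / rho    # above this value, beta_r is positive
        lower = (-lam + th_r) / rho   # at or below this value, beta_r is non-positive
        if a_r > upper:
            beta.append(a_r - upper)
        elif a_r > lower:
            beta.append(0.0)          # coefficient is shrunk exactly to zero
        else:
            beta.append(a_r - lower)
    return beta
```

Components that fall inside the band between the two thresholds are set exactly to zero, which is how the lasso penalty removes indicators from the model.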

The choice of tuning parameters is important. In this study, we find an optimal tuning parameter λ by the 10-fold cross validation method. We then compare the forecast accuracy of each method based on the mean area under the curve (*MAUC*) given as follows:

$$MAUC(\lambda) = \sum\_{j=1}^{10} AUC^j(\lambda) / 10 \tag{12}$$

where *AUC<sup>j</sup>*(λ) denotes the area under the receiver operating characteristic (ROC) curve on the *j*-th validation set for each tuning parameter λ.
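Equation (12) can be illustrated with a short sketch. It assumes each validation fold supplies labels (1 for distressed, 0 for healthy) and predicted scores; the helper names `auc` and `mauc` are hypothetical, and the AUC is computed with the usual pairwise-ranking formula rather than any routine named in the paper.

```python
def auc(labels, scores):
    """AUC as the share of (distressed, healthy) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mauc(fold_results):
    """Equation (12): average the per-fold AUC over all validation folds."""
    return sum(auc(y, s) for y, s in fold_results) / len(fold_results)


# Two toy folds for one value of lambda (illustrative numbers only).
folds = [([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]), ([1, 0], [0.2, 0.7])]
print(mauc(folds))
```

With 10 folds, `fold_results` would hold 10 such pairs, and the value of λ with the largest `mauc` would be retained.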

So far, the lasso estimators for the logistic model (5) including 3~5-period-lagged financial ratios have been obtained by following the above procedures. For the convenience of readers, we summarize the whole optimization procedures in training the lasso–logistic with lagged variables and describe them in Algorithm 1.

**Algorithm 1.** An alternating direction method of multipliers (ADMM) algorithm framework for lasso–logistic with lagged variables (5). 1: The dual residual and the primal residual denote ‖β<sup>*k*+1</sup> − β<sup>*k*</sup>‖<sub>2</sub> and ‖α<sup>*k*+1</sup> − β<sup>*k*+1</sup>‖<sub>2</sub>, respectively. 2: *N* denotes the maximum iterative number of the ADMM algorithm.

#### **Require:**


#### **Ensure:**


For the logistic model (4) with lag variables of the financial ratio and macroeconomic indicators, we can also extend the lasso as follows in (13):

$$\left(\hat{\alpha}\_0, \hat{\gamma}\right) = \underset{\alpha\_0, \gamma}{\arg\min} f\left(\alpha\_0, \gamma \mid X\_{i,t-3}, X\_{i,t-4}, X\_{i,t-5}, Z\_{i,t-3}, Z\_{i,t-4}, Z\_{i,t-5}, Y\_{i,t}\right) + \lambda \|\gamma\|\_{1} \tag{13}$$

where γ̂ = (α̂, η̂) is the lasso estimator vector for the coefficients of the lagged financial ratios and macroeconomic indicators; γ = (α<sup>T</sup>, η<sup>T</sup>)<sup>T</sup> represents the unknown coefficients of the explanatory variables; α, η = (η<sub>*t*−3</sub><sup>T</sup>, η<sub>*t*−4</sub><sup>T</sup>, η<sub>*t*−5</sub><sup>T</sup>)<sup>T</sup> and the other symbols are defined as in Equations (4) and (5). The lasso estimator for model (13) can also be found by using the ADMM algorithm presented above.
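The ADMM iteration (9a)–(9c) with the Newton step (10) and the soft-thresholding update (11) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: *f* is taken to be the summed negative log-likelihood of the logistic model with labels in {0, 1}, the function names (`solve`, `soft`, `admm_lasso_logistic`) and the stopping tolerance are hypothetical, and the residuals follow the definitions given with Algorithm 1.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def soft(v, t):
    """Soft-thresholding operator, equivalent to the three cases in (11)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def admm_lasso_logistic(X, y, lam, rho=1.0, n_iter=200, tol=1e-6):
    """X: rows of lagged covariates (no intercept column); y: labels in {0, 1}."""
    p = len(X[0])
    a = [0.0] * (p + 1)                    # a[0] is alpha_0, a[1:] is alpha
    beta, theta = [0.0] * p, [0.0] * p
    for _ in range(n_iter):
        for _newton in range(20):          # (9a): Newton steps (10)
            g = [0.0] * (p + 1)
            H = [[0.0] * (p + 1) for _ in range(p + 1)]
            for xi, yi in zip(X, y):
                z = [1.0] + list(xi)
                mu = 1.0 / (1.0 + math.exp(-sum(aj * zj for aj, zj in zip(a, z))))
                for j in range(p + 1):
                    g[j] += (mu - yi) * z[j]
                    for k in range(p + 1):
                        H[j][k] += mu * (1.0 - mu) * z[j] * z[k]
            for j in range(p):             # terms from -theta^T(alpha-beta) + rho/2 ||.||^2
                g[j + 1] += -theta[j] + rho * (a[j + 1] - beta[j])
                H[j + 1][j + 1] += rho
            step = solve(H, g)
            a = [aj - sj for aj, sj in zip(a, step)]
            if max(abs(s) for s in step) < 1e-10:
                break
        beta_old = beta
        beta = [soft(a[j + 1] - theta[j] / rho, lam / rho) for j in range(p)]  # (9b)
        theta = [theta[j] - rho * (a[j + 1] - beta[j]) for j in range(p)]      # (9c)
        primal = math.sqrt(sum((a[j + 1] - beta[j]) ** 2 for j in range(p)))
        dual = math.sqrt(sum((beta[j] - beta_old[j]) ** 2 for j in range(p)))
        if primal < tol and dual < tol:
            break
    return a[0], beta
```

The sparse vector β, rather than α, is returned as the coefficient estimate because only the β-update produces exact zeros. For model (13), the columns of `X` would simply be extended with the lagged macroeconomic indicators.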

#### *3.3. The Lasso–SVM Model with Lags for Comparison*

The support vector machine (SVM) is a widely used linear classifier with high interpretability. In this sub-section, we construct a lasso–SVM model that includes the 3~5-period-lagged financial indicators for comparison with the lasso–logistic-distributed lag model. The SVM formulation combining the original soft-margin SVM model [62] and a 3~5-period-lagged financial ratio variable vector is as follows:

$$\begin{cases} \min\_{\alpha\_0, \alpha, \xi} \frac{1}{2} \|\alpha\|\_{2}^{2} + C \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^{n} \xi\_{i,t} \\ \text{s.t. } Y\_{i,t} \left(\alpha\_0 + \alpha\_{t-3}^{T} X\_{i,t-3} + \alpha\_{t-4}^{T} X\_{i,t-4} + \alpha\_{t-5}^{T} X\_{i,t-5}\right) \ge 1 - \xi\_{i,t}, \ \xi\_{i,t} \ge 0, \\ \quad i = 1, 2, \ldots, n, \ t = t\_0 + 5, \ldots, t\_0 + d \end{cases} \tag{14}$$

In (14), α<sub>0</sub> (intercept) and α = (α<sub>*t*−3</sub>; α<sub>*t*−4</sub>; α<sub>*t*−5</sub>) (normal vector) are the unknown coefficients of the hyperplane *f*(*X<sub>it</sub>*) = α<sub>0</sub> + α<sup>T</sup>*X<sub>it</sub>*; ‖·‖<sub>2</sub> denotes the *l*<sub>2</sub>-norm of a vector; *C* is the penalty parameter, a predetermined positive value; ξ<sub>*i*,*t*</sub> is the unknown slack variable; *Y<sub>i,t</sub>* is a binary variable with *Y<sub>i,t</sub>* = 1 when firm *i* is a financially distressed company in year *t* and *Y<sub>i,t</sub>* = −1 otherwise; *X<sub>it</sub>* = (1; *X<sub>i,t−3</sub>*; *X<sub>i,t−4</sub>*; *X<sub>i,t−5</sub>*) denotes the observation vector of 3~5-period-lagged financial indicators for firm *i*; *n* represents the number of observations; *t*<sub>0</sub> and *d* denote the beginning and the length of the observation period, respectively.

By introducing the hinge loss function, the optimization problem (14) has the equivalent form as follows [63]:

$$\min\_{\alpha\_\*} \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^{n} \left[1 - Y\_{i,t} \left(\alpha\_\*^T X\_{it}\right)\right]\_+ + \lambda \|\alpha\_\*\|\_2^2 \tag{15}$$

where α<sub>\*</sub> = (α<sub>0</sub>; α), [·]<sub>+</sub> indicates the positive part, i.e., [*x*]<sub>+</sub> = max{*x*, 0}, and the tuning parameter is λ = 1/(2*C*).

Because it is regularized by the *l*<sub>2</sub>-norm, the SVM in (15) yields nonzero estimates for all coefficients and is therefore unable to select significant features. Thus, to limit the influence of noise features, we replace the *l*<sub>2</sub>-norm in the optimization problem (15) with the *l*<sub>1</sub>-norm, which conducts feature selection and classification simultaneously. Furthermore, for computational convenience, we replace the hinge loss function in (15) with its squared form, and present the optimization problem combining the SVM model and the lasso method (*l*<sub>1</sub> regularization) as follows:

$$\hat{\alpha}\_\* = \underset{\alpha\_\*}{\arg\min} \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^{n} \left( \left[1 - Y\_{i,t} \alpha\_\*^T X\_{it} \right]\_{+} \right)^2 + \lambda \|\alpha\_\*\|\_1 \tag{16}$$

In (16), α̂<sub>\*</sub> is the optimal estimated value for the coefficients of the SVM model, and the other symbols are defined as above. Similar to the solution process for problem (5) presented previously, by first introducing an auxiliary variable β<sub>\*</sub> ∈ *R*<sup>3*p*+1</sup>, the lasso–SVM model (16) can be rewritten explicitly as follows:

$$\min\_{\alpha\_\*, \beta\_\*} \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^n \left( \left[1 - Y\_{i,t}\alpha\_\*^T X\_{it}\right]\_+ \right)^2 + \lambda \|\beta\_\*\|\_1 \quad \text{s.t. } \alpha\_\* = \beta\_\* \tag{17}$$

Then, the augmented Lagrangian function of the optimization problem (17) can be accordingly specified as

$$L\_{\rho}(\alpha\_\*, \beta\_\*, \theta\_\*) = \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^{n} \left( \left[1 - Y\_{i,t}\alpha\_\*^{T}X\_{it}\right]\_{+} \right)^2 + \lambda \|\beta\_\*\|\_{1} - \theta\_\*^{T}(\alpha\_\* - \beta\_\*) + \frac{\rho}{2} \|\alpha\_\* - \beta\_\*\|\_{2}^{2} \tag{18}$$

where θ<sub>\*</sub> ∈ *R*<sup>3*p*+1</sup> and ρ ∈ *R* are the Lagrange multiplier vector and the augmented Lagrange multiplier, respectively. Then, the iterative scheme of ADMM for the optimization problem (17) is similar to (8a)–(8c) and can be accordingly specified as

$$\alpha\_\*^{k+1} = \underset{\alpha\_\*}{\arg\min} \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^{n} \left( \left[1 - Y\_{i,t} \left(\alpha\_\*^{T} X\_{it}\right)\right]\_{+} \right)^2 - \left(\theta\_\*^{k}\right)^T \left(\alpha\_\* - \beta\_\*^{k}\right) + \frac{\rho}{2} \left\|\alpha\_\* - \beta\_\*^{k}\right\|\_{2}^{2} \tag{19a}$$

$$\beta\_\*^{k+1} = \underset{\beta\_\*}{\arg\min} \lambda \left\| \beta\_\* \right\|\_1 - \left(\theta\_\*^{k}\right)^T \left( \alpha\_\*^{k+1} - \beta\_\* \right) + \frac{\rho}{2} \left\| \alpha\_\*^{k+1} - \beta\_\* \right\|\_2^2 \tag{19b}$$

$$
\theta\_\*^{k+1} = \theta\_\*^{k} - \rho \left( \alpha\_\*^{k+1} - \beta\_\*^{k+1} \right) \tag{19c}
$$

The finite Armijo–Newton algorithm [61] is applied to solve the α-sub-problem (19a), which is a convex piecewise quadratic optimization problem. Its objective function is first-order differentiable but not twice differentiable with respect to α<sub>\*</sub>, which precludes the use of a regular Newton method. Let *F*(α<sub>\*</sub>) be the objective function of the sub-optimization problem (19a); its gradient and generalized Hessian matrix are given in Equations (20) and (21):

$$\nabla F(\alpha\_\*) = -2\sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^n Y\_{i,t} X\_{it} \left(1 - Y\_{i,t} \alpha\_\*^T X\_{it}\right)\_+ - \theta\_\*^k + \rho \left( \alpha\_\* - \beta\_\*^k \right) \tag{20}$$

$$\partial^2 F(\alpha\_\*) = 2 \sum\_{t=t\_0+5}^{t\_0+d} \sum\_{i=1}^n \operatorname{diag} \left( 1 - Y\_{i,t} \alpha\_\*^T X\_{it} \right)\_\* X\_{it} X\_{it}^T + \rho I \tag{21}$$

where **I** is the identity matrix of order 3*p* + 1 and *diag*(1 − *Y<sub>i,t</sub>* α<sub>\*</sub><sup>T</sup>*X<sub>it</sub>*)<sub>\*</sub> is a diagonal matrix in which the *j*-th (*j* = 1, 2, ... , 3*p* + 1) diagonal entry is a sub-gradient of the plus function (·)<sub>+</sub>, given by

$$\left( \operatorname{diag} \left( 1 - Y\_{i,t} \alpha\_\*^T X\_{it} \right)\_\* \right)\_{jj} \begin{cases} = 1 & \text{if } 1 - Y\_{i,t} \alpha\_\*^T X\_{it} > 0, \\ \in \left[ 0, 1 \right] & \text{if } 1 - Y\_{i,t} \alpha\_\*^T X\_{it} = 0, \\ = 0 & \text{if } 1 - Y\_{i,t} \alpha\_\*^T X\_{it} < 0. \end{cases} \tag{22}$$
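The gradient and generalized Hessian of the objective in (19a) can be checked numerically with a small sketch. The code below is illustrative only: the function names are hypothetical, each row of `X` is assumed to start with 1 for the intercept, labels are in {−1, +1}, and the sub-gradient in (22) is taken as 1 for observations with a positive margin violation and 0 otherwise (one admissible choice at the kink).

```python
def plus(v):
    """Positive part [v]_+ = max(v, 0)."""
    return v if v > 0 else 0.0


def margin(a, xi, yi):
    """1 - Y * alpha_*^T X for one observation."""
    return 1.0 - yi * sum(aj * xj for aj, xj in zip(a, xi))


def objective(a, X, y, theta, beta, rho):
    """F(alpha_*): squared hinge loss plus the augmented Lagrangian terms of (19a)."""
    loss = sum(plus(margin(a, xi, yi)) ** 2 for xi, yi in zip(X, y))
    lin = sum(t * (aj - bj) for t, aj, bj in zip(theta, a, beta))
    quad = 0.5 * rho * sum((aj - bj) ** 2 for aj, bj in zip(a, beta))
    return loss - lin + quad


def gradient(a, X, y, theta, beta, rho):
    """Gradient of F: -2 * sum(Y X [.]_+) - theta + rho * (alpha_* - beta_*)."""
    g = [-t + rho * (aj - bj) for t, aj, bj in zip(theta, a, beta)]
    for xi, yi in zip(X, y):
        r = plus(margin(a, xi, yi))
        for j, xj in enumerate(xi):
            g[j] -= 2.0 * yi * xj * r
    return g


def generalized_hessian(a, X, y, rho):
    """Generalized Hessian: 2 * sum over active observations of X X^T, plus rho * I."""
    m = len(a)
    H = [[rho if j == k else 0.0 for k in range(m)] for j in range(m)]
    for xi, yi in zip(X, y):
        if margin(a, xi, yi) > 0:          # sub-gradient equals 1 on the active set
            for j in range(m):
                for k in range(m):
                    H[j][k] += 2.0 * xi[j] * xi[k]
    return H
```

A finite-difference comparison of `objective` against `gradient` away from the kink is a quick way to confirm the signs of the θ and ρ terms before the Newton direction is computed.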

The whole optimization procedure applied to solve the α-sub-problem (19a) is described in Algorithm 2.

**Algorithm 2.** A finite Armijo–Newton algorithm for the sub-problem (19a). 1: δ is the parameter of the finite Armijo–Newton algorithm and takes a value between 0 and 1.

#### **Require:**


#### **Ensure:**




The finite Armijo–Newton algorithm guarantees convergence to the unique global minimum solution in a finite number of iterations. The details of the proof of the global convergence of the sequence to the unique solution can be found in [61]. For the sub-problem (19b), the solution can also be given analytically by (11) presented above, after replacing α, β and θ with α<sub>\*</sub>, β<sub>\*</sub> and θ<sub>\*</sub>.

So far, the lasso estimators for the SVM model (16), including 3~5-period-lagged financial ratios, have been obtained by following the above procedures. For the convenience of readers, we summarize the whole optimization procedure for training the lasso–SVM with lagged variables and describe it in Algorithm 3. It is worth noting that the estimators for the lasso–SVM model that contains 3~5-period-lagged financial ratios and macro-economic indicators can be obtained in the same way by the following algorithm.

**Algorithm 3.** An ADMM algorithm framework for lasso–support vector machine (SVM) with lagged variables (16)

#### **Require:**


#### **Ensure:**


#### **4. Data**

#### *4.1. Sample Description*

The data used in the study are limited to manufacturing corporations. The manufacturing sector plays an important role in contributing to the economic growth of a country, especially a developing country [64]. According to the data released by the State Statistical Bureau of China, manufacturing accounts for 30% of the country's GDP. China's manufacturing sector has the largest number of listed companies as well as the largest number of ST companies each year. On the other hand, according to the data disclosed by the China Banking Regulatory Commission, in the Chinese manufacturing sector, the non-performing loan ratio has been increasing. For example, there was a jump in the non-performing loan ratio from 3.81% in December of 2017 to 6.5% in June of 2018. Therefore, it is quite important to establish an effective early warning system aiming to assess financial stress and prevent potential financial fraud of a listed manufacturing company for market participants, including investors, creditors and regulators.

In this paper, we selected 234 listed manufacturing companies from the Wind database. Among these, 117 companies are financially healthy and 117 are financially distressed, i.e., labeled as "special treatment" (ST). The samples were selected from 2007 to 2017, since the Ministry of Finance of the People's Republic of China issued the new "Accounting Standards for Business Enterprises" (new guidelines), which all listed companies were required to implement fully from January 1, 2007. Similar to [7], [16] and [45], all 117 financially distressed companies received the ST label because of negative net profit for two consecutive years. There were respectively 10, 9, 17, 24, 26 and 31 companies labeled as ST or \*ST in each year from 2012 to 2017. The same number of financially healthy companies was selected in each year. Considering the regulatory requirement and the availability of qualified data for listed companies, our data sample uses 2007 (*t*<sub>0</sub>) as the earliest estimation window available for forecasting a listed company's financial distress. Meanwhile, the maximum lag order used in our models is 5 years; that is, the maximum horizon is 5 years, so the ST companies are counted from 2012 (*t*<sub>0</sub> + 5) onwards. Furthermore, we divided the whole sample into two groups: the training sample and the testing sample. The training sample is from 2012 to 2016, includes the data of 172 companies and is used to construct the models and estimate the coefficients. Correspondingly, the testing sample is from 2017, includes the data of 62 companies and is used to evaluate the predictive performance of the models.
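The sample sizes quoted above follow directly from the yearly ST counts and the one-to-one matching with healthy companies. A short arithmetic check (the variable names are illustrative):

```python
# Number of companies newly labeled ST or *ST in each year, as reported in the text.
st_per_year = {2012: 10, 2013: 9, 2014: 17, 2015: 24, 2016: 26, 2017: 31}

total_st = sum(st_per_year.values())                       # distressed companies
total_sample = 2 * total_st                                # matched 1:1 with healthy ones
train = 2 * sum(n for year, n in st_per_year.items() if year <= 2016)
test = 2 * st_per_year[2017]

print(total_st, total_sample, train, test)
```

The totals agree with the 117 distressed companies, the 234 companies overall, the 172 training companies and the 62 testing companies stated in the text.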

#### *4.2. Covariate*

In this paper, we use the factors measured in the consecutive time windows *t* − 3, *t* − 4 and *t* − 5 to predict a listed company's financial status at time *t* (*t* = 2012, 2013, ... , 2017). Therefore, we define the response *y* as whether a Chinese listed manufacturing company was labeled as "special treatment" by the China Securities Regulatory Commission at time *t* (*t* = 2012, 2013, ... , 2017), and the input explanatory variables as the corresponding financial indicators based on the financial statements reported at *t* − 3, *t* − 4 and *t* − 5. For example, we define the response *y* as whether a Chinese listed manufacturing company was labeled as "special treatment" during the period from January 1, 2017 to December 31, 2017 (denoted as year *t*) and (1) input explanatory variables as the corresponding financial indicators based on the financial statements reported on December 31, 2014 (denoted as year *t* − 3), in December 2013 (denoted as year *t* − 4) and in December 2012 (denoted as year *t* − 5); in this way, the time lags between the considered financial indicators and the response are between 3 and 5 years; (2) input explanatory variables as macroeconomic indicators based on the statements reported on December 31, 2014, 2013 and 2012 by the Chinese National Bureau of Statistics; in this way, the time lags between the considered macroeconomic indicators and the response are also between 3 and 5 years. The effect of time lags of 3 to 5 years of financial indicators on the likelihood of financial distress has been suggested separately by previous research on early warning of listed companies' financial distress, but the varying effects of these time lags within one prediction model have not yet been considered in the existing studies.
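The alignment of a response in year *t* with covariates from *t* − 3, *t* − 4 and *t* − 5 can be sketched as follows. The data layout (dictionaries keyed by firm and year) and the function name `build_lagged_rows` are assumptions made for illustration, not the authors' data pipeline.

```python
def build_lagged_rows(panel, status, lags=(3, 4, 5)):
    """Pair each firm-year response with its lagged covariates.

    panel[(firm, year)]  -> list of indicator values reported for that year
    status[(firm, year)] -> 1 if the firm is labeled ST in that year, else 0
    Returns tuples (firm, year, features, label), where features concatenate
    the indicator vectors at t-3, t-4 and t-5 in that order.
    """
    rows = []
    for (firm, year), label in sorted(status.items()):
        keys = [(firm, year - lag) for lag in lags]
        if all(k in panel for k in keys):      # all three lagged windows are required
            features = [v for k in keys for v in panel[k]]
            rows.append((firm, year, features, label))
    return rows


# Toy example: firm "A" with two indicators per year (made-up values).
panel = {("A", 2012): [0.1, 0.2], ("A", 2013): [0.3, 0.4], ("A", 2014): [0.5, 0.6]}
status = {("A", 2016): 0, ("A", 2017): 1}
rows = build_lagged_rows(panel, status)
```

In this toy example only the 2017 response is kept, because the 2016 response would need 2011 data that the panel does not contain.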

#### 4.2.1. Firm-Idiosyncratic Financial Indicator

Because a large number of financial ratios have been found to be significant indicators of corporate problems in past studies, an original list of 43 potentially helpful ratios is compiled for prediction and provided in Table 1. These indicators are classified into five categories: solvency, operational capability, profitability, structural soundness, and business development and capital expansion capacity. All variables used for the calculation of financial ratios are obtained from the balance sheets, income statements or cash flow statements of the listed companies. The financial data for financially distressed companies are collected in the third, fourth and fifth years before the companies receive the ST label. For example, when the selected financially distressed companies receive ST in 2017, the financial data are obtained for 2014, 2013 and 2012. Similarly, the data for financially healthy companies are also collected for 2014, 2013 and 2012. Model 1 (the accounting-only model) will be constructed using all the data in the following context. The model is used to predict whether a company is labeled as ST in year *t*, incorporating the financial data of three consecutive time windows, *t* − 3, *t* − 4 and *t* − 5 (*t* = 2012, 2013, ... , 2017).


**Table 1.** List of financial indicators.

#### 4.2.2. Macroeconomic Indicator

Besides considering three consecutive period-lagged financial ratios for the prediction of financial distress of Chinese listed manufacturing companies, we also investigated the associations between macro-economic conditions and the possibility of falling into financial distress of these companies. The macro-economic factors include GDP growth, inflation, unemployment rate in urban areas and consumption level growth, as described in Table 2. GDP growth is widely understood to be an important variable to measure economic strength and prosperity; the increase in GDP growth may decrease the likelihood of distress. High inflation and high unemployment that reflect a weaker economy may increase the likelihood of financial distress. Consumption level growth reflects the change in consumption level and its increase may reduce the likelihood of financial distress.


**Table 2.** List of macroeconomic factors.

1: All data of the macro-economic covariates are collected from the National Bureau of Statistics of China.

In the following empirical part, Model 2 represents the "accounting plus macroeconomic indicators" model and includes, in addition to the accounting variables, 3-period-lagged macroeconomic indicators. We collected the corresponding macroeconomic data in each year from 2007 to 2012 for all 234 company samples and the raw macroeconomic data are from the database of the Chinese National Bureau of Statistics.

#### *4.3. Data Processing*

The results in the existing studies suggest that prediction models built on standardized data generally yield better results [65]. Therefore, before the construction of the models, the data are standardized based on the following linear transformations:

$$x\_{ij}(t) = \frac{u\_{ij}(t) - \min\_{1 \le i \le 234} \{ u\_{ij}(t) \}}{\max\_{1 \le i \le 234} \{ u\_{ij}(t) \} - \min\_{1 \le i \le 234} \{ u\_{ij}(t) \}} \tag{23}$$

where *xij*(*t*) denotes the standardized value of the *j*-th financial indicator for the *i*-th firm in year *t*, and *j* = 1, 2, ... , 43, *i* = 1, 2, ... , 234, and *t* = 2007, 2008, ... , 2012; *uij*(*t*) denotes the original value of the *j*-th indicator of the *i*-th company in year *t*. Linear transformation scales each variable into the interval [0, 1]. Similarly, the following formula is used for data standardization of the macro-economic factor:

$$z\_{ij}(t) = \frac{v\_{ij}(t) - \min\_{1 \le i \le 234} \{ v\_{ij}(t) \}}{\max\_{1 \le i \le 234} \{ v\_{ij}(t) \} - \min\_{1 \le i \le 234} \{ v\_{ij}(t) \}} \tag{24}$$

In formula (24), *zij*(*t*) denotes the standardized value of the *j*-th macro-economic factor in year *t*; *vij*(*t*) denotes the original value of the *j*-th indicator of the *i*-th company in year *t*, where *j* = 1, 2, 3, 4, *i* = 1, 2, ... , 234, and *t* = 2007, 2008, ... , 2012. It is worth noting that the assignment to *vij*(*t*) for each company is based on the data of the macroeconomic condition of the province where the company operates (registration location).
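The transformations (23) and (24) are min-max scalings applied to one indicator across the companies in a given year. A minimal sketch follows; the function name `min_max_scale` is hypothetical, and the treatment of a constant column (mapped to 0) is an assumption, since the formulas leave that case undefined.

```python
def min_max_scale(values):
    """Scale the cross-section of one indicator in one year into [0, 1], as in (23)-(24)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: one ratio observed for three firms in the same year (made-up values).
scaled = min_max_scale([2.0, 4.0, 6.0])
```

Here `scaled` equals `[0.0, 0.5, 1.0]`. The same function would be applied separately to each indicator *j* and each year *t*.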

#### **5. Empirical Results and Discussion**

In this section, we establish a financial early warning system for Chinese listed manufacturing companies by using two groups of lasso-generalized distributed lag models, i.e., a logistic model and an SVM model including 3~5-period-lagged explanatory variables, and implement financial distress prediction and feature selection simultaneously. For the selected sample set, the sample data from 2007 to 2016 were used as the training sample and the sample from 2017 as the test sample. The tuning parameter was identified by cross-validation on the training set, and the performance of the chosen method was evaluated on the testing set by the area under the receiver operating characteristic curve (AUC), G-mean and Kolmogorov–Smirnov (KS) statistics.
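G-mean and the KS statistic are not defined in this section, so the sketch below uses their common definitions as an assumption: G-mean is the geometric mean of sensitivity and specificity for thresholded predictions, and KS is the largest gap between the true positive rate and the false positive rate over all score thresholds. The function names are hypothetical.

```python
import math


def g_mean(labels, preds):
    """Geometric mean of sensitivity and specificity; labels and preds are 0/1."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))


def ks_statistic(labels, scores):
    """Maximum distance between TPR and FPR over all candidate thresholds."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for cut in sorted(set(scores)):
        tpr = sum(1 for s in pos if s >= cut) / len(pos)
        fpr = sum(1 for s in neg if s >= cut) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best
```

Both measures are computed on the held-out testing sample; G-mean additionally requires a classification threshold on the predicted score, which is a modelling choice not specified here.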

#### *5.1. Preparatory Work*

It is necessary to choose a suitable value for the tuning parameter λ, which controls the trade-off between bias and variance. As mentioned before, 10-fold cross-validation is used on the training dataset in order to obtain the optimal tuning parameter λ. First, we compare the prediction performance of the lasso–logistic-distributed lag model (5) including only the 43 firm-level financial indicators (the accounting-only model) as the tuning parameter λ changes. The results show that the mean AUCs on the validation data are 0.9075, 0.9095, 0.9091, 0.9112, 0.8979, 0.8902 and 0.8779, respectively, corresponding to λ = 0.01, 0.1, 0.5, 1, 2, 3, 4. Second, we compare the prediction performance of the logistic-distributed lag model (4) incorporating the lasso penalty with 43 firm-level financial indicators and 4 macro-economic factors (the model of accounting plus macroeconomic variables). The results show that the mean AUCs on the validation data are 0.9074, 0.8018, 0.9466, 0.9502, 0.9466, 0.9466 and 0.8498, respectively, corresponding to λ = 0.01, 0.1, 0.5, 1, 2, 3, 4. Panels (a) and (b) in Figure 1 also show the average predictive accuracy of cross-validation that results from using seven different values of the tuning parameter λ in the accounting-only model and the model of accounting plus macroeconomic variables.

**Figure 1.** (**a**,**b**) are the Cross-validation performances that result from applying lasso–logistic-distributed lag regression to the listed manufacturing companies' data with various values of λ.

Generally speaking, the two kinds of models yield the best performance when λ = 1. Therefore, in the following, we fit and evaluate the lasso–logistic-distributed lag models by using the tuning parameter of 1.
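The choice of λ = 1 can be reproduced from the mean validation AUCs reported above by taking the value of λ with the largest mean AUC. The helper name `best_lambda` is illustrative.

```python
lambdas = [0.01, 0.1, 0.5, 1, 2, 3, 4]

# Mean validation AUCs reported in the text for each value of lambda.
mauc_accounting_only = [0.9075, 0.9095, 0.9091, 0.9112, 0.8979, 0.8902, 0.8779]
mauc_accounting_macro = [0.9074, 0.8018, 0.9466, 0.9502, 0.9466, 0.9466, 0.8498]


def best_lambda(lams, scores):
    """Return the tuning parameter with the highest mean validation AUC."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return lams[best_index]


print(best_lambda(lambdas, mauc_accounting_only))
print(best_lambda(lambdas, mauc_accounting_macro))
```

Both calls return 1, matching the tuning parameter used for the lasso–logistic-distributed lag models.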

#### *5.2. Analyses of Results*

This study develops a group of ex-ante models for estimating financial distress likelihood in the time window of *t* to test the contribution of financial ratios and macroeconomic indicators in the consecutive time windows of *t* − 3, *t* − 4 and *t* − 5. In the following, Table 3 presents the results from lasso–logistic-distributed lag (LLDL) regressions of the financial distress indicator on the predictor variables and Table 4 presents the results from the lasso–SVM-distributed lag model. Furthermore, we compare the predictive performance of the existing widely used ex-ante models, including neural networks (NN), decision trees (DT), SVM, and logistic models estimated in a time period from *t* − 3 to *t* − 5, with that of our models. The comparative results are shown in Tables 5 and 6 as well as Figure 2.




**Table 3.** *Cont.*

respectively, when λ = 1. 2: "×" in the table means that the corresponding factor is not selected. 3: The values in brackets are the standard errors of the estimated coefficients. "\*", "\*\*" and "\*\*\*" indicate that the corresponding variable is significant at the levels of 0.1, 0.05 and 0.01, respectively.





**Figure 2.** Predictive performance of NN, DT, lasso–SVM, and lasso–logistic models on three different time window datasets, respectively, and our models on three consecutive time window datasets, evaluated by AUC for (**a**) and (**b**), G-Mean for (**c**) and (**d**), and KS for (**e**) and (**f**).


**Table 5.** Prediction results of the neural network (NN), decision tree (DT), lasso–SVM and lasso–logistic in the single year time window versus the lasso–SVM-distributed lag (LSVMDL) and lasso–logistic-distributed lag (LLDL) models (financial ratios only).

**Table 6.** Prediction results of NN, DT, lasso–SVM and lasso–logistic models in the single year time window versus the lasso–SVM-distributed lag (LSVMDL) model and the lasso–logistic-distributed lag (LLDL) model (financial ratios plus macroeconomic indicators).


#### 5.2.1. The Results of the Accounting-Only Model and Analyses

In Table 3, Model 1 represents the "accounting-only" lasso–logistic-distributed lag (LLDL) regression model including the 43 financial statement ratios in 3 adjacent years; the results of financial indicator selection and the estimates of the coefficients are listed in the first three columns. By using Algorithm 1, 23 indicators in total are chosen from the original indicator set. More specifically, two indicators (numbers 1 and 2) are selected from the solvency category; five indicators (numbers 3 to 7) are selected from the operational capability category; six indicators (8–13) from operational capability; eight indicators (14–21) from profitability; and two indicators (22–23) from structural soundness and business development and capital expansion capacity. It can also be found that nine financial indicators that are not used in [7] have a quite significant influence on the future financial distress risk, namely, (1) sales revenue/average total assets, (2) impairment losses/sales profit, (3) sales cost/average net inventory, (4) shareholders' equity/net profit, (5) net profit/total profit, (6) net cash flow from operating activities/total assets, (7) main business profit/net income from main business, (8) net profit attributable to shareholders of the parent company/net profit and (9) operating capital/total assets.

The potentially helpful ratios, such as the leverage ratio (total liabilities/total assets), shareholders' equity/net profit (ROE), net profit/average total assets (ROA) and current liabilities/total liabilities, have significant effects on the occurrence of financial distress of Chinese listed manufacturing companies. For example, as shown in Table 3, the leverage ratio in year *t* − 3, a very early time period, is selected as a significant predictor, and the estimated value of its coefficient is 3.1671. This implies that an increase in the leverage ratio in the fifth year before *ST* increases the financial risk of the listed manufacturing companies. The indicator of ROA for year *t* − 4 is selected, and the estimated value of its coefficient is −1.1919, which implies that the probability of a company falling into financial distress decreases as the company's ROA, i.e., net profit/average total assets, increases.

Besides, the results in Table 3 also show that the changes in the indicator of sales revenue/average total assets in all three consecutive time periods have significant effects on the future financial distress risk. Different weights are assigned to the variables of sales revenue/average total assets with different time lags, and the coefficient estimates for the indicator in the time windows of *t* − 3, *t* − 4 and *t* − 5 are −0.4367, −5.7393 and −1.8312, respectively. This implies that increases in sales revenue in different time windows have positive and significant (but different) effects on the future financial status of a listed company. The result for the indicator of "net cash flow from operating activities/total assets", presented in row 13 and the first 3 columns of Table 3, illustrates that changes in this indicator in different time windows have different effects on the future occurrence of financial distress in terms of significance level and magnitude of influence. The estimated coefficients for the variable measured in the previous time windows, *t* − 3, *t* − 4 and *t* − 5, are −4.8561, −2.6798 and −1.0999, at the significance levels of 0.01, 0.05 and (>) 0.1, respectively. This indicates that (1) the higher the ratio of net cash flow from operating activities to total assets for a listed manufacturing company, the lower the likelihood of the firm's financial distress; (2) the changes in net cash flow from operating activities/total assets in the time windows *t* − 3 and *t* − 4 have a significant influence on the risk of financial distress, and the magnitude of influence increases as the length of the lag decreases; (3) the influence of this indicator declines over time, and a change in this indicator 5 years before the observation of the financial distress event has no significant effect on financial risk when compared with relatively recent changes.

#### 5.2.2. The Results and Analyses of the Model of Accounting Plus Macroeconomic Variables

In Table 3, Model 2 represents the "accounting plus macroeconomic factor" model, including the original 43 financial ratios and 4 macroeconomic indicators in 3 adjacent years, and the results of indicator selection and the coefficient estimates are listed in the last three columns. It can be found that for Model 2, the same group of financial variables is selected and included in the final model. Time lags of the selected financial variables and the signs (but not magnitudes) of the estimated coefficients for the variables are almost consistent for Model 1 and 2.

In addition to the accounting ratios, three macroeconomic factors are selected as significant predictors and included in the final model: GDP growth, consumption level growth and unemployment rate in the time window of *t* − 3. The estimates of the coefficients of the selected GDP growth and unemployment rate are −2.4867 and 2.7262, respectively, which means that high GDP growth should decrease the financial distress risk, whereas high unemployment will worsen the financial condition of a listed manufacturing company. These results are consistent with expectations. The estimate of the coefficient of consumption level growth is −0.9931, which implies that high consumption level growth should decrease the possibility of financial deterioration of a listed company. Finally, Consumer Price Index (CPI) growth is not found to have a significant influence on the financial distress risk.

The 4-year-lagged and 5-year-lagged GDP growth and the 4-year-lagged consumption level growth are also selected and included in the final model, but not as very significant predictors, which implies the following: (1) the changes in macroeconomic conditions have a continuous influence on the financial distress risk; (2) however, the effect of changes in macroeconomic conditions on the financial distress risk declines as the length of the lag window increases.

#### 5.2.3. The Results of Lasso–SVM-Distributed Lag (LSVMDL) Models and Analyses

We introduce 3-period lags of the financial indicators presented in Table 1, i.e., TL/TA<sub>*t*−3</sub>, TL/TA<sub>*t*−4</sub> and TL/TA<sub>*t*−5</sub>, CA/CL<sub>*t*−3</sub>, CA/CL<sub>*t*−4</sub> and CA/CL<sub>*t*−5</sub>, ... , NICCE/NOS<sub>*t*−3</sub>, NICCE/NOS<sub>*t*−4</sub> and NICCE/NOS<sub>*t*−5</sub>, into the model (16) and implement the indicator selection and the coefficient estimation by using Algorithm 3. The corresponding results are presented in the first three columns of Table 4. Then, we introduce 3-period lags of the financial and macroeconomic indicators presented in Tables 1 and 2 into the model (16), and the coefficient estimates of the selected indicators are presented in the last three columns of Table 4.

Twenty-four financial indicators are selected and included in the final SVM-distributed lag model, denoted as Model 1 in Table 4; 17 indicators among them are also included in the final logistic-distributed lag model. For convenience of comparison, the 17 indicators, such as total liabilities/total assets, current liabilities/total assets and sales revenue/average current assets etc., are italicized and shown in the "selected indicator" column of Table 4.

According to the relation between response variables and predictors in the SVM model, as mentioned before, the increase (decrease) in the factors should increase (decrease) the financial distress risk when the coefficient estimates are positive. Therefore, let us take the estimated results in the first three rows and columns as an example: (1) the increase in the total liabilities to total assets ratio should increase the financial distress risk of a listed manufacturing company; (2) the increase in current liabilities to total assets ratio should decrease the financial distress risk; (3) the changes in the indicators in the period closer to the time of obtaining ST have a more significant effect on the likelihood of financial distress in terms of magnitudes of estimates of the coefficients.

Four macroeconomic factors, in addition to 24 financial indicators, are selected and included in the final SVM-distributed lag model, denoted as Model 2 in Table 4. The results show that (1) the effects of the selected financial ratios on the response, i.e., the financial status of a company, are consistent with the results in the SVM-distributed lag model including only financial ratios, i.e., Model 1, in terms of time lags of the selected financial variables and the signs of the estimated coefficients for the explanatory variables; (2) high GDP growth and high consumption level growth should decrease the financial distress risk, whereas high unemployment should worsen the financial condition of a listed manufacturing company.

From Table 4, it can be found that different indicators have different influences on the financial status of a company. The effects of some indicators on financial distress risk increase with the decrease in the time lag, e.g., total liabilities to total assets ratio, current liabilities/total assets and net cash flow from operating and investing activities/total liabilities etc., while the effects of some other indicators decrease with the decrease in the time lag, e.g., fixed assets/total assets, GDP growth and consumption level growth etc. However, for some indicators, the direction of the effect changes across time windows. For example, the coefficients for current assets/current liabilities (current ratio) in Model 1 are 13.7838 for time window *t* − *4* and −23.2184 for time window *t* − *5*, which implies that a high current ratio in time window *t* − *5* should decrease the financial distress risk; this, however, would not be the case in time window *t* − *4*. A similar case can be found for CPI growth in Model 2. Thus, the SVM-distributed lag models may be difficult to interpret and would therefore be inferior to the logistic-distributed lag models in terms of interpretability.

#### 5.2.4. Comparison with Other Models

For the purpose of comparison, the prediction performances of the ex-ante models for the estimation of financial distress likelihood developed by the existing studies are shown in Tables 5 and 6. The existing widely used ex-ante models include the neural network (NN), decision tree (DT), SVM, and logistic models estimated in different time periods of *t* − 3, *t* − 4, and *t* − 5, called *t* − 3 models, *t* − 4 models and *t* − 5 models. The construction of these three groups of models is similar to [7]. Let us take the construction of the *t* − 5 model as an example. For the 10 financially distressed companies that received ST in 2012 and the 10 healthy companies selected as a control group until 2012, their financial and macroeconomic data in 2007 (5 years before 2012) were collected. For the 9 financially distressed companies that received ST in 2013 and the 9 selected healthy companies, their financial and macroeconomic data in 2008 (5 years before 2013) were collected. Similarly, for the 17, 24 and 26 financially distressed companies that received the ST label in 2014, 2015 and 2016, respectively, and the non-financially distressed companies randomly selected at a 1:1 ratio in each year for matching with the ST companies, their data in 2009 (5 years before 2014), 2010 (5 years before 2015) and 2011 (5 years before 2016) were collected. By using the labels of these 172 companies and the data that were obtained 5 years prior to the year when the companies received the ST label, we construct *t* − 5 financial distress forecast models combined with a neural network (NN), decision tree (DT), SVM, and logistic regression. Similarly, *t* − 3 models and *t* − 4 models can be built. The data of the financially distressed companies that received ST in 2017 and of non-financially distressed companies were used to evaluate these models' predicting performance.

As mentioned in the beginning of this section, three measures of prediction performances are reported in these two tables, namely, AUC, G-mean, and Kolmogorov–Smirnov statistics. In the above scenarios based on different time periods as well as division of the whole dataset, we compare respectively the predicting performance of those one-time window models (*t* − 3 models, *t* − 4 models and *t* − 5 models) including financial ratios only and financial ratios plus macroeconomic factors with our lasso–SVM-distributed lag (LSVMDL) model and lasso–logistic-distributed lag (LLDL). The prediction results are presented in Table 5 for the case of "financial ratios only" and Table 6 for the case of "financial ratio plus macroeconomic factors".
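For readers who wish to reproduce the three measures, the following sketch computes AUC, G-mean and the KS statistic from predicted scores using their standard textbook definitions. It is only an illustration: the 0.5 threshold used for G-mean, the function names and the toy labels and scores are assumptions, and the exact computation behind Tables 5 and 6 may differ.

```python
import math

def auc(y_true, scores):
    """Probability that a distressed firm (label 1) scores above a healthy one; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rates(y_true, scores, threshold):
    """True-positive and false-positive rates when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg

def g_mean(y_true, scores, threshold=0.5):
    """Geometric mean of sensitivity and specificity at one threshold."""
    tpr, fpr = rates(y_true, scores, threshold)
    return math.sqrt(tpr * (1.0 - fpr))

def ks_statistic(y_true, scores):
    """Largest gap between true-positive and false-positive rates over all thresholds."""
    return max(abs(tpr - fpr)
               for tpr, fpr in (rates(y_true, scores, t) for t in set(scores)))

# Toy example: 1 = financially distressed, 0 = healthy (made-up scores).
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(y, p), g_mean(y, p), ks_statistic(y, p))
```

AUC and KS are threshold-free, whereas G-mean depends on the chosen cut-off, which is one reason the three measures can rank models differently.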

In Table 5, panel A presents the predictive performances of NN, DT, lasso–SVM and lasso–logistic models including the original 43 financial ratios shown in Table 1 in the period *t* − 3 as predictors of financial distress status in period *t*, while the results in the last two columns are the performances of the two groups of distributed lag financial distress predicting models including the same original 43 financial ratios but in periods *t* − 3, *t* − 4 and *t* − 5, i.e., our models. Panel B and C of Table 5 present the prediction performance of the models used for comparison purposes estimated in *t* − 4 and *t* − 5, respectively. The results for our models retain the same values because these models include simultaneously the 3-year-, 4-year- and 5-year-lagged financial ratios.

The only difference between Tables 5 and 6 is that all models, in addition to the 43 original accounting ratios, incorporate 4 macroeconomic indicators in different time windows. For example, for time window *t* − 3, the NN, DT, lasso–SVM and lasso–logistic models include the 3-year-lagged macroeconomic indicators shown in Table 2 in addition to the financial statement ratios shown in Table 1. The cases of time windows *t* − 4 and *t* − 5 are similar for these models. As for the LSVMDL and LLDL models, i.e., our models, they include 3-period-lagged macroeconomic indicators in the time windows *t* − 3, *t* − 4 and *t* − 5 in addition to the accounting ratios.

From Table 5, the prediction accuracy of NN or DT is highest in the time windows *t* − 3 and *t* − 4; our models outperform the others in time window *t* − 5 for predicting accuracy. Generally speaking, the accuracy for time period *t* − 3 is relatively higher than the other two time periods for the NN, lasso–SVM and lasso–logistic models. Furthermore, the prediction results based on time period *t* − 3 are the most precise for NN when compared with other models in a single time period and even our models, which implies that the selected financial ratios in the period closer to the time of obtaining ST may contain more useful information for the prediction of financial distress, and may be applicable to NN. The AUC of 91.52% of the lasso–logistic-distributed lag model (LLDL) ranked second, close to the accuracy of 93.56% obtained by using NN. Therefore, the LLDL model should be competitive in terms of interpretability and accuracy in the case of "accounting ratio only".

From Table 6, the prediction accuracy of all used models is higher than the results in Table 5. For example, the AUC, G-mean and KS of the NN model in time window *t* − 3 increase from 93.56%, 86.73% and 88.00% in Table 5 to 94.00%, 90.87% and 89.00% in Table 6, respectively. The changing tendency of the prediction accuracy is retained for the other models, including macroeconomic indicators in addition to the accounting ratios. All results in Table 6 indicate that the introduction of the macroeconomic variables can improve predictive performance of all used models for the purpose of comparison; the changes in macroeconomic conditions do affect the likelihood of financial distress risk. On the other hand, the LLDL model performs best with the AUC of over 95% when compared with the best NN (in time period *t* − 3, 94%), the best DT (in time period *t* − 4, 92.24%), the best lasso–SVM (in time period *t* − 4, 93.64%), the best lasso–logistic (in time period *t* − 5, 90.68%) and LSVMDL (93.12%). The LSVMDL model is the best performing model in terms of G-mean and KS statistics.

Figure 2 also shows the comparative results of the accuracy of the six models. The predictive performances of all the models including accounting ratios only, indicated by the dotted lines in panels (a), (c) and (e) of Figure 2, are worse than those of the models including macroeconomic indicators as well as accounting ratios, which are illustrated by the solid lines in panels (b), (d) and (f) of Figure 2. Panels (a) and (b) present AUC, panels (c) and (d) present G-mean, and panels (e) and (f) present KS for all of the examined models. The models used for comparison, namely, NN, DT, lasso–SVM and lasso–logistic models, were those that yielded the highest accuracy based on the different time window dataset. For example, based on the results of panel (b), the AUC of the NN (the yellow solid line), DT (the pink solid line), and lasso–logistic (the red one) models is highest in time windows *t* − 3, *t* − 4 and *t* − 5, respectively. We cannot conclude that the prediction results based on financial and macroeconomic data of one specific time window, e.g., *t* − 3 (see [7]), are the most accurate. However, from the results in (b), (d) and (f), our models, the LLDL or LSVMDL model incorporating financial and macroeconomic data in three consecutive time windows, yielded relatively robust and higher prediction performances.

Put simply, the two groups of generalized distributed lag financial distress predicting models proposed by this paper outperform the other models in each time period, especially when the accounting ratios and macroeconomic factors were introduced into the models. We demonstrated that our models provide an effective way to deal with multiple time period information obtained from changes in accounting and macroeconomic conditions.

#### 5.2.5. Discussion

Logistic regression and multivariate discriminant methods are among the most popular statistical techniques used in financial distress risk prediction modelling for enterprises in different countries, e.g., American enterprises [1] and European enterprises [4,30,31], because of their simplicity, good predictive performance and interpretability. The main statistical approach involved in this study is logistic regression rather than multivariate discriminant analysis, given that multivariate discriminant analysis relies on strict assumptions regarding the normal distribution of explanatory variables. The results in this study confirm that logistic regression models still perform well for predicting Chinese listed enterprises' financial distress risks.

The major contribution to financial distress prediction literature made by this paper is that an optimally distributed lag structure of macroeconomic data in the multi-periods, in addition to financial ratio data, is imposed on the logistic regression model by minimizing the loss function, and the heterogeneous lagged effects of the factors in the different periods are presented. The results unveil that financial indicators, such as total liabilities/total assets, sales revenue/total assets, and net cash flow from operating activities/total assets, tend to have a significant impact over relatively longer periods, e.g., 5 years before the financial crisis of a Chinese listed manufacturing company. This finding is in accordance with the recent research of [30,31], in which the authors claim that the process of going bankrupt is not a sudden phenomenon; it may take as long as 5–6 years. In the very recent study of Korol et al. [30], the authors built 10 group models comprising 10 periods: from 1 year to 10 years prior to bankruptcy. The results in [30] indicate that a bankruptcy prediction model such as the fuzzy set model maintained an effectiveness level above 70% until the eighth year prior to bankruptcy. Therefore, our model can be extended by introducing more lagged explanatory variables, e.g., 6- to 8-year-lagged financial variables, which may bring a better distributed lag structure of explanatory variables and improve the predicting ability of the models.

The findings of this study allow managers and corporate analysts to prevent the financial crisis of a company by monitoring early changes in a few sensitive financial indicators and taking actions, such as optimizing the company's asset structure, increasing cash flow and sales revenue, etc. They also help investors make investment decisions by tracking continuous changes in the accounting conditions of a company of interest and predicting its risk of financial distress.

Another major contribution of this study is the confirmation of the importance of macroeconomic variables in predicting the financial distress of a Chinese manufacturing company, although scholars still argue about the significance of macro variables. For example, Kacer et al. [66] did not recommend the use of macro variables in the financial distress prediction for Slovak Enterprises, while Hernandez Tinoco et al. [4] confirmed the utilization of macro variables in the financial distress prediction for listed enterprises of the United Kingdom. The results in Section 5.2.4 of this study show that the prediction performance of all models (including both the models used for comparison and our own models) was increased when the macro variables were included in each model. The findings of this study allow regulators to tighten the supervision of Chinese listed companies when macroeconomic conditions change, especially in an economic downturn.

One of the main limitations of this study is that we limited the research only to the listed manufacturing companies. Both Korol et al. [28] and Kovacova [30] emphasized that the type of industry affects the risk of deterioration in the financial situation of companies. More specifically, distinguished by factors such as intensity of competition, life cycle of products, demand, changes in consumer preferences, technological change, reducing entry barriers into the industry and susceptibility of the industry to business cycles, different industries are at different levels of risk [28]. The manufacturing sector, which includes the metal, mining, automotive, aerospace and housing industries, is highly susceptible to demand, technological changes and macroeconomic conditions, which places it at a high level of risk, while agriculture may be at a relatively low risk level. The risk parameter assigned to the service sector, including restaurants, tourism, transport and entertainment etc., has seen significant changes following the outbreak of the Coronavirus. Therefore, the applicability of our models to predicting the financial distress risk of companies operating in other industries, and even in other countries, needs to be further examined.

#### **6. Conclusions**

In this paper, we propose a new framework of a financial early warning system by introducing a distributed lag structure to widely used financial distress prediction models such as the logistic regression and SVM models. Our models are competitive in terms of predictive performance when compared with the conventional financial distress forecast models, which incorporate data from only one period of *t* − 3 or *t* − 4 or *t* − 5. Furthermore, our models are superior to the conventional one-time window financial distress forecast models, in which macroeconomic indicators of GDP growth, consumption level growth and unemployment rate, in addition to accounting factors, are incorporated. The empirical findings of this study indicate that the changes in macroeconomic conditions do have significant and continuous influence on the financial distress risk of a listed manufacturing company. This paper may provide an approach for examining the impacts of macroeconomic information from multiple periods and improving the predictive performance of financial distress models.

We implement feature selection to remove redundant factors from the original list of 43 potentially helpful ratios and their lags by introducing lasso penalty into the financial distress forecast logistic models with lags and SVM models with lags. Furthermore, we provide an ADMM algorithm framework that yields the global optimum for the convex and non-smooth optimization problem to obtain the optimal estimation for the coefficients of these financial distress forecast models with financial and macroeconomic factors and their lags. Results from the empirical study show that not only widely used financial indicators (calculated from accounting data), such as leverage ratio, ROE, ROA, and current liabilities/total liabilities, have significant influence on the financial distress risk of a listed manufacturing company, but also the indicators that are rarely seen in the existing literature, such as net profit attributable to shareholders of the parent company and net cash flow from operating activities/total assets, may play very important roles in financial distress prediction. The closer to the time of financial crisis, the more net profit attributable to shareholders of the parent company and net cash flow from operating activities may considerably decrease the financial distress risk. These research findings may provide more evidence for company managers and investors in terms of corporate governance or risk control.

The main limitation of this research is that we limited the research only to listed manufacturing companies. Sensitivity of financial distress models and suitability of both financial and macroeconomic variables to the enterprises that operate in other industries, e.g., service companies, need to be further discussed. On the other hand, given that the utilization of financial and macroeconomic variables in predicting the risk of financial distress of Chinese listed manufacturing companies is confirmed, we intend to continue the research toward the use of interaction terms of financial and macroeconomic variables in the context of the multiple period. Furthermore, the heterogeneous effect of changes in macroeconomic conditions on the financial distress risk of a company under different financial conditions can be discovered.

**Author Contributions:** We attest that all authors contributed significantly to the creation of this manuscript. The conceptualization and the methodology were formulated by D.Y., data curation was completed by G.C., and the formal analysis was finished by K.K.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China under grant numbers 71731003, 71301017 and by the Fundamental Research Funds for the Central Universities under grant numbers DUT19LK50 and QYWKC2018015. The authors wish to thank the organizations mentioned above.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Deep Learning Methods for Modeling Bitcoin Price**

**Prosper Lamothe-Fernández 1, David Alaminos 2,\*, Prosper Lamothe-López <sup>3</sup> and Manuel A. Fernández-Gámez <sup>4</sup>**


Received: 25 June 2020; Accepted: 28 July 2020; Published: 30 July 2020

**Abstract:** A precise prediction of Bitcoin price is an important aspect of digital financial markets because it improves the valuation of an asset belonging to a decentralized control market. Numerous studies have examined the accuracy of models built from a set of factors. However, previous literature shows that models for the prediction of Bitcoin suffer from poor predictive performance and do not select the most significant variables; therefore, more progress is needed on predictive models. This paper presents a comparison of deep learning methodologies for forecasting Bitcoin price and, as a result, a new prediction model with the ability to estimate accurately. A sample of 29 initial factors was used, which has made possible the application of explanatory factors of different aspects related to the formation of the price of Bitcoin. To the sample under study, different methods have been applied to achieve a robust model, namely, deep recurrent convolutional neural networks, which have shown the importance of transaction costs and difficulty in Bitcoin price, among others. Our results have a great potential impact on the adequacy of asset pricing against the uncertainties derived from digital currencies, providing tools that help to achieve stability in cryptocurrency markets. Our models offer high and stable success results for a future prediction horizon, something useful for asset valuation of cryptocurrencies like Bitcoin.

**Keywords:** bitcoin; deep learning; deep recurrent convolutional neural networks; forecasting; asset pricing

#### **1. Introduction**

Bitcoin is a cryptocurrency built by free software based on peer-to-peer networks as an irreversible private payment platform. Bitcoin lacks a physical form and is not backed by any public body; therefore, no intervention by a government agency or other agent is necessary to transact [1]. These transactions are made through the blockchain system. Blockchain is an open ledger that records transactions between two parties efficiently and leaves a permanent record that cannot be erased, making this tool a decentralized validation protocol that is difficult to manipulate and carries a low risk of fraud. The blockchain system is not subject to any individual entity [2].

The concept of Bitcoin originated from the concept of cryptocurrency, or virtual currency [3]. Cryptocurrencies are a monetary medium that is neither affected by public regulation nor subject to a regulatory body; they are governed only by the activity and rules set by their developers. Cryptocurrencies are virtual currencies that can be created and stored only electronically [4]. A cryptocurrency is a subset of digital currency designed to function as a medium of exchange, and it uses cryptography to secure transactions and control the future creation of the cryptocurrency.

Forecasting Bitcoin price is vitally important for both asset managers and independent investors. Although Bitcoin is a currency, it cannot be studied like a traditional currency, for which economic theories about uncovered interest rate parity, the future cash-flows model, and purchasing power parity matter, since the standard factors of the relationship between supply and demand cannot be applied in a digital currency market like that of Bitcoin [5]. Bitcoin also has different characteristics that make it useful for the agents who invest in it, such as transaction speed, dissemination, decentralization, and the large virtual community of people interested in talking and providing relevant information about digital currencies, mainly Bitcoin [6].

Velankar and colleagues [7] attempted to predict the daily price change sign as accurately as possible using Bayesian regression and generalized linear model. To do this, they considered the daily trends of the Bitcoin market and focused on the characteristics of Bitcoin transactions, reaching an accuracy of 51% with the generalized linear model. McNally and co-workers [8] studied the precision with which the direction of the Bitcoin price in United States Dollar (USD) can be predicted. They used a recurrent neural network (RNN), a long short-term memory (LSTM) network, and the autoregressive integrated moving average (ARIMA) method. The LSTM network obtains the highest classification accuracy of 52% and a root mean square error (RMSE) of 8%. As expected, non-linear deep learning methods exceeded the ARIMA method's prognosis. For their part, Yogeshwaran and co-workers [9] applied convolutional and recurrent neural networks to predict the price of Bitcoin using data from a time interval of 5 min to 2 h, with convolutional neural networks showing a lower level of error, at around 5%. Demir and colleagues [10] predicted the price of Bitcoin using methods such as long short-term memory networks, naïve Bayes, and the nearest neighbor algorithm. These methods achieved accuracy rates between 97.2% and 81.2%. Rizwan, Narejo, and Javed [11] continued with the application of deep learning methods with the techniques of RNN and LSTM. Their results showed an accuracy of 52% and an 8% RMSE by the LSTM. Linardatos and Kotsiantis [12] had the same results, after using eXtreme Gradient Boosting (XGBoost) and LSTM; they concluded that this last technique yielded a lower RMSE of 0.999. Despite the superiority of computational techniques, Felizardo and colleagues [13] showed that ARIMA had a lower error rate than methods, such as random forest (RF), support vector machine (SVM), LSTM, and WaveNets, to predict the future price of Bitcoin. 
Finally, other works showed new deep learning methods, such as Dutta, Kumar, and Basu [14], who applied both LSTM and the gated recurrent unit (GRU) model; the latter showed the best error result, with an RMSE of 0.019. Ji and co-workers [15] predicted the price of Bitcoin with different methodologies such as deep neural network (DNN), the LSTM model, and convolutional neural network. They obtained a precision of 60%, leaving the improvement of precision with deep learning techniques and a greater definition of significant variables as a future line of research. These authors show the need for stable prediction models, not only with data in and out of the sample, but also in forecasts of future results.

To contribute to the robustness of Bitcoin price prediction models, the present study develops a comparison of deep learning methodologies to predict and model the Bitcoin price and, as a consequence, a new model that generates better forecasts of the Bitcoin price and its behavior in the future. This model achieves accuracy levels above 95% and was constructed from a sample of 29 variables. Different methods were applied in the construction of the Bitcoin price prediction model to build a reliable model, which is contrasted with various methodologies used in previous works to check which technique achieves a high predictive capacity; specifically, the methods of deep recurrent convolutional neural networks, deep neural decision trees, and deep support vector machines were used. Furthermore, this work seeks a model that is not only highly accurate but also robust and stable over the future horizon when predicting new observations, something that has not yet been reported by previous works [7–15], but which some authors demand for the development of these models and their real contribution [9,12].

We make two main contributions to the literature. First, we consider new explanatory variables for modeling the Bitcoin price, testing the importance of variables that have not been considered so far. This has important implications for investors, who will know which indicators provide reliable and accurate forecasts of the Bitcoin price. Second, we improve the prediction accuracy relative to that obtained in previous studies by using innovative methodologies.

This study is structured as follows: Section 2 explains the theory of methods applied. Section 3 offers details of the data and the variables used in this study. Section 4 develops the results obtained. Section 5 provides conclusions of the study and the purposes of the models obtained.

#### **2. Deep Learning Methods**

As previously stated, different deep learning methods have been applied for the development of Bitcoin price prediction models. We use this type of methodology because of the high predictive capacity it has shown in the previous literature on asset pricing, which helps to meet one of the objectives of this study, namely, to achieve a robust model. Specifically, deep recurrent convolution neural network, deep neural decision trees, and deep learning linear support vector machines have been used. The characteristics of each classification technique used are detailed below. In addition, the sensitivity analysis method used in the present study, namely, the method of Sobol [16], is described; it is necessary to determine the level of significance of the variables used in the prediction of Bitcoin price, and it addresses the need for feature selection identified in the previous literature [15].

#### *2.1. Deep Recurrent Convolution Neural Network (DRCNN)*

Recurrent neural networks (RNN) have been applied in different fields for prediction due to their strong predictive performance. In an RNN, the result at each step depends on the calculations made at previous steps [17]. Given an input sequence vector *x*, the hidden nodes of a layer *s* and the output of a hidden layer *y* can be estimated as explained in Equations (1) and (2).

$$\mathbf{s}_t = \sigma(\mathbf{W}_{xs}\mathbf{x}_t + \mathbf{W}_{ss}\mathbf{s}_{t-1} + \mathbf{b}_s) \tag{1}$$

$$\mathbf{y}_t = o(\mathbf{W}_{so}\mathbf{s}_t + \mathbf{b}_y) \tag{2}$$

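where *W<sub>xs</sub>*, *W<sub>ss</sub>*, and *W<sub>so</sub>* define the weights from the input layer *x* to the hidden layer *s*, from the hidden layer to itself, and from the hidden layer to the output layer, respectively; *b<sub>s</sub>* and *b<sub>y</sub>* are the biases of the hidden layer and output layer; and σ and *o* are the activation functions. Equation (3) gives the short-time Fourier transform (STFT) of the input signal.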

$$\mathrm{STFT}(z(t))(\tau,\omega) \equiv T(\tau,\omega) = \int_{-\infty}^{+\infty} z(t)\,\omega(t-\tau)\,e^{-j\omega t}\,dt \tag{3}$$

where *z*(*t*) is the vibration signal, ω(*t*) is the Gaussian window function centered around 0, and *T*(τ, ω) is the function that expresses the vibration signals. To calculate the hidden layers with the convolutional operation, Equations (4) and (5) are applied.

$$\mathbf{S}_t = \sigma(\mathbf{W}_{TS} \ast \mathbf{T}_t + \mathbf{W}_{ss} \ast \mathbf{S}_{t-1} + \mathbf{B}_s) \tag{4}$$

$$\mathbf{Y}_t = o(\mathbf{W}_{YS} \ast \mathbf{S}_t + \mathbf{B}_y) \tag{5}$$

where *W* indicates the convolution kernels.
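To make the recurrence concrete, the following sketch runs the plain (non-convolutional) update of Equations (1) and (2) over a short input sequence in pure Python. It assumes a logistic sigmoid for σ and the identity for *o*; the layer sizes and weight values are arbitrary illustrative numbers rather than parameters from this study. The convolutional form of Equations (4) and (5) would replace the matrix products with convolutions.

```python
import math

def sigmoid(v):
    """Element-wise logistic function, used here as the hidden activation sigma."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]

def rnn_step(x_t, s_prev, W_xs, W_ss, W_so, b_s, b_y):
    """One time step: Equation (1) for the hidden state, Equation (2) for the output."""
    s_t = sigmoid(add(matvec(W_xs, x_t), matvec(W_ss, s_prev), b_s))  # Eq. (1)
    y_t = add(matvec(W_so, s_t), b_y)  # Eq. (2), with o assumed to be the identity
    return s_t, y_t

# Arbitrary toy parameters: 2 inputs, 2 hidden nodes, 1 output.
W_xs = [[0.5, -0.2], [0.1, 0.3]]
W_ss = [[0.0, 0.4], [-0.3, 0.2]]
W_so = [[1.0, -1.0]]
b_s, b_y = [0.0, 0.0], [0.1]

s = [0.0, 0.0]  # initial hidden state
for x_t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    s, y = rnn_step(x_t, s, W_xs, W_ss, W_so, b_s, b_y)
    print(s, y)
```

The hidden state `s` is carried from one step to the next, which is how earlier inputs influence later outputs.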

Recurrent convolutional neural network (RCNN) layers can be stacked to establish a deep architecture, called the deep recurrent convolutional neural network (DRCNN) [18,19]. To use the DRCNN method in the predictive task, Equation (6) determines how the last phase of the model serves as a supervised learning layer.

$$
\mathbf{f} = \sigma(\mathbf{W\_h} \* \mathbf{h} + \mathbf{b\_h}) \tag{6}
$$

where *Wh* is the weight and *bh* is the bias. In the training stage, the model calculates the residuals, that is, the differences between the predicted and the actual observations [20]. Stochastic gradient descent is applied to learn the parameters. Denoting the actual data at time *t* by *r* and its prediction by *r̂*, the loss function is determined as shown in Equation (7).

$$\mathbf{L(r,\hat{r})} = \frac{1}{2} \|\mathbf{r} - \hat{\mathbf{r}}\|\_2^2 \tag{7}$$
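As a rough illustration of how the loss in Equation (7) drives gradient-based learning of the supervised layer in Equation (6), the sketch below runs plain gradient descent on a single observation. The identity activation, the learning rate, and the toy values of `h` and `r` are assumptions chosen for illustration only.

```python
import numpy as np

def half_squared_error(r, r_hat):
    """Eq. (7): L(r, r_hat) = 0.5 * ||r - r_hat||_2^2."""
    diff = r - r_hat
    return 0.5 * float(diff @ diff)

def predict(W_h, b_h, h):
    """Supervised layer of Eq. (6) with an identity activation (assumption)."""
    return W_h @ h + b_h

# Hypothetical activation of the last hidden layer and target value.
h = np.array([0.5, -1.0, 2.0])
r = np.array([1.0, -2.0])

W_h = np.zeros((2, 3))
b_h = np.zeros(2)
learning_rate = 0.1

initial_loss = half_squared_error(r, predict(W_h, b_h, h))
for _ in range(20):
    residual = predict(W_h, b_h, h) - r        # dL/d(r_hat)
    W_h -= learning_rate * np.outer(residual, h)
    b_h -= learning_rate * residual
final_loss = half_squared_error(r, predict(W_h, b_h, h))
```

In practice the gradient would be averaged over mini-batches and propagated back through all layers; the single-observation loop only shows the direction of the update.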

#### *2.2. Deep Neural Decision Trees (DNDT)*

Deep neural decision trees (DNDT) are decision tree (DT) models implemented as deep neural networks, in which each set of DNDT weights corresponds to a specific decision tree, so its information can be interpreted [21]. All parameters are optimized at the same time with stochastic gradient descent (SGD); this allows learning in mini-batches, and the DNDT can be attached to a larger standard neural network (NN) model for end-to-end learning with backpropagation. Standard DTs, by contrast, learn through greedy and recursive splitting of features, which can make feature selection more efficient [22]. The method starts by applying a soft binning function to compute the residual rate for each node, which makes the split decisions of the DNDT possible [23]. A binning function takes a real scalar *x* as input and returns the index of the bin to which *x* belongs.

The activation function of the DNDT algorithm is carried out based on the NN represented in Equation (8).

$$
\pi = f\_{w,b,\tau}(x) = \text{softmax}((wx + b) / \tau) \tag{8}
$$

where *w* is a constant with value *w* = [1, 2, ..., *n* + 1], τ > 0 is a temperature factor, and *b* is defined in Equation (9) in terms of the *n* cut points β<sub>1</sub> < β<sub>2</sub> < ... < β<sub>*n*</sub>.

$$b = [0, -\beta\_1, -\beta\_1 - \beta\_2, \dots, -\beta\_1 - \beta\_2 - \dots - \beta\_n] \tag{9}$$

The binning function of *x* is encoded by the NN according to the expression in Equation (9) [24]. The key idea is to build the DT by applying the Kronecker product to the binning functions defined above. Connecting every feature *xd* with its NN *fd*(*xd*), we can determine all the final nodes of the DT, as shown in Equation (10).

$$z = f\_1(x\_1) \otimes f\_2(x\_2) \otimes \cdots \otimes f\_D(x\_D)\tag{10}$$

where *z* expresses the leaf node index obtained by instance *x* in vector form. The complexity parameter of the model is determined by the number of cut points of each node. There may be inactive points since the values of the cut points are usually not limited.
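The soft binning of Equations (8) and (9) and the Kronecker product of Equation (10) can be sketched in a few lines of NumPy. The cut points, the temperature, and the function names `soft_bin` and `leaf_membership` are hypothetical and serve only to make the construction concrete.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def soft_bin(x, cut_points, tau=0.1):
    """Eq. (8)-(9): soft one-hot membership of scalar x over n + 1 bins."""
    beta = np.sort(np.asarray(cut_points, dtype=float))
    n = beta.size
    w = np.arange(1, n + 2, dtype=float)            # w = [1, 2, ..., n + 1]
    b = np.concatenate(([0.0], -np.cumsum(beta)))   # Eq. (9)
    return softmax((w * x + b) / tau)

def leaf_membership(x, cut_points_per_feature, tau=0.1):
    """Eq. (10): Kronecker product of the per-feature binnings."""
    z = np.ones(1)
    for x_d, cuts in zip(x, cut_points_per_feature):
        z = np.kron(z, soft_bin(x_d, cuts, tau))
    return z

# One feature with two cut points gives three bins.
middle = soft_bin(0.5, [0.33, 0.66])

# Two features with one cut point each give 2 x 2 = 4 leaves.
z = leaf_membership([0.1, 0.9], [[0.5], [0.5]])
leaf_index = int(np.argmax(z))
```

As τ approaches 0 the soft membership approaches a hard one-hot vector, so `z` concentrates on a single leaf; larger τ keeps the assignment smooth, which is what allows training by gradient descent.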

#### *2.3. Deep Learning Linear Support Vector Machines (DSVR)*

Support vector machines (SVMs) were created for binary classification. Given training data and their labels (*xn*, *tn*), *n* = 1, ... , *N*, with *xn* ∈ R<sup>*D*</sup> and *tn* ∈ {−1, +1}, SVMs are optimized according to Equation (11).

$$\begin{array}{c} \min\limits\_{\mathbf{w},\, \xi\_n} \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum\_{n=1}^N \xi\_n\\ \text{s.t. } \mathbf{w}^T \mathbf{x}\_n t\_n \ge 1 - \xi\_n \;\; \forall n\\ \xi\_n \ge 0 \;\; \forall n \end{array} \tag{11}$$

where *ξn* are slack variables that penalize observations that do not meet the margin requirements [25]. The equivalent unconstrained optimization problem is given in Equation (12).

$$\min\_{\mathbf{w}} \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum\_{n=1}^{N} \max(1 - \mathbf{w}^T \mathbf{x}\_n t\_n, 0) \tag{12}$$

*Mathematics* **2020**, *8*, 1245

Usually, the Softmax or 1-of-*K* encoding method is applied in the classification task of deep learning algorithms. When working with 10 classes, the Softmax layer is composed of 10 nodes, denoted by *pi*, where *i* = 1, ..., 10; *pi* specifies a discrete probability distribution, so that ∑<sub>*i*</sub><sup>10</sup> *pi* = 1.

Let *h* be the activation of the penultimate layer nodes and *W* the weights connecting the penultimate layer to the Softmax layer. The total input into the Softmax layer, *a*, is given by Equation (13), and the resulting probabilities by Equation (14).

$$a\_i = \sum\_k h\_k W\_{ki} \tag{13}$$

$$p\_i = \frac{\exp(a\_i)}{\sum\_{j}^{10} \exp(a\_j)}\tag{14}$$

The predicted class î would be as follows in Equation (15).

$$
\hat{i} = \arg\max\_{i} p\_{i} = \arg\max\_{i} a\_{i} \tag{15}
$$
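Equations (13) to (15) can be checked numerically with a short sketch. The layer sizes and the random values of `h` and `W` are assumptions for illustration; the point is that the class with the largest total input is also the class with the largest Softmax probability.

```python
import numpy as np

def softmax_layer(h, W):
    """Return total inputs a (Eq. 13) and probabilities p (Eq. 14)."""
    a = h @ W                      # a_i = sum_k h_k W_ki
    e = np.exp(a - a.max())        # shift by the maximum for numerical stability
    p = e / e.sum()
    return a, p

# Hypothetical penultimate layer with 6 nodes feeding 10 classes.
rng = np.random.default_rng(1)
h = rng.normal(size=6)
W = rng.normal(size=(6, 10))

a, p = softmax_layer(h, W)
predicted_class = int(np.argmax(p))   # Eq. (15)
```

Because the exponential is monotonic and the denominator is shared by all classes, taking the arg max over `p` or over `a` gives the same class.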

Since the hinge loss of the linear SVM in Equation (12) is not differentiable everywhere, a popular variation, referred to here as the DSVR, minimizes the squared hinge loss, as indicated in Equation (16).

$$\min\_{\mathbf{w}} \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum\_{n=1}^{N} \max\left(1 - \mathbf{w}^T \mathbf{x}\_n t\_n, 0\right)^2\tag{16}$$

The target of the DSVR is to train deep neural networks for prediction [24,25]. Let *l*(*w*) denote the objective in Equation (12), with the input *x* replaced by the activation *h* of the penultimate layer; Equation (17) gives its derivative with respect to that activation.

$$\frac{\partial l(\mathbf{w})}{\partial \mathbf{h}\_n} = -C t\_n \mathbf{w}\left(\mathbb{I}\{1 > \mathbf{w}^T \mathbf{h}\_n t\_n\}\right)\tag{17}$$

where I{·} is the indicator function. Likewise, for the DSVR, we have Equation (18).

$$\frac{\partial l(\mathbf{w})}{\partial \mathbf{h}\_n} = -2C t\_n \mathbf{w}\left(\max(1 - \mathbf{w}^T \mathbf{h}\_n t\_n, 0)\right)\tag{18}$$
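The hinge and squared hinge objectives of Equations (12) and (16), and the gradients with respect to the penultimate activation in Equations (17) and (18), can be written compactly as below. This is a sketch under assumed toy values for `w`, `h_n`, and `t_n`; the function names are hypothetical.

```python
import numpy as np

def svm_objective(w, H, t, C=1.0, squared=False):
    """Eq. (12) when squared=False, Eq. (16) when squared=True."""
    margins = np.maximum(1.0 - (H @ w) * t, 0.0)
    penalty = margins ** 2 if squared else margins
    return 0.5 * float(w @ w) + C * float(penalty.sum())

def grad_wrt_activation(w, h_n, t_n, C=1.0, squared=False):
    """Gradient of the loss for one observation with respect to h_n."""
    margin = 1.0 - float(w @ h_n) * t_n
    if squared:
        return -2.0 * C * t_n * w * max(margin, 0.0)    # Eq. (18)
    return -C * t_n * w * float(margin > 0.0)            # Eq. (17)

w = np.array([1.0, -2.0])

# Observation inside the margin: w.h = 0, so the margin term equals 1.
h_in = np.array([0.2, 0.1])
g_hinge = grad_wrt_activation(w, h_in, 1.0)
g_squared = grad_wrt_activation(w, h_in, 1.0, squared=True)

# Observation beyond the margin: w.h = 2, so both gradients vanish.
h_out = np.array([2.0, 0.0])
g_zero = grad_wrt_activation(w, h_out, 1.0, squared=True)
```

The squared hinge gradient scales with the size of the margin violation, whereas the plain hinge gradient is constant whenever the margin is violated; this gradient is what gets backpropagated into the lower layers.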

#### *2.4. Sensitivity Analysis*

Data mining methods can offer a considerable amount of explanation of the problem under study. To quantify it, a sensitivity analysis is performed, which measures the relative importance of the independent variables with respect to the dependent variable [26,27]. The aim is to reduce the initial set of variables, keeping only the most significant ones. A variance-based criterion is followed, under which a variable is considered significant if its variance increases with respect to the rest of the variables as a whole. The Sobol method [16] is applied to decompose the total output variance *V*(*Y*), as expressed in Equation (19).

$$V(Y) = \sum\_{i} V\_i + \sum\_{i} \sum\_{j>i} V\_{ij} + \dots + V\_{1,2,\dots,k} \tag{19}$$

where *Vi* = *V*(*E*(*Y*|*Xi*)) and *Vij* = *V*(*E*(*Y*|*Xi*, *Xj*)) − *Vi* − *Vj*.

*Si* = *Vi*/*V* and *Sij* = *Vij*/*V* define the sensitivity indexes, with *Sij* being the effect of interaction between two variables. The Sobol decomposition allows the estimation of a total sensitivity index, *STi*, which measures the sum of all the sensitivity effects involved in the independent variables.
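To make the first-order index *Si* = *Vi*/*V* concrete, the sketch below estimates it by Monte Carlo with a standard pick-and-freeze estimator. The estimator, the uniform independent inputs, and the toy model *Y* = *X*1 + 2*X*2 are assumptions for illustration and are not the study's fitted models. For this toy model the analytical indices are 0.2 and 0.8.

```python
import numpy as np

def first_order_sobol(f, k, n_samples=20000, seed=0):
    """Monte Carlo estimate of S_i = V(E(Y|X_i)) / V(Y) for i = 1..k."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, k))       # two independent input samples
    B = rng.random((n_samples, k))
    f_A, f_B = f(A), f(B)
    var_Y = np.var(np.concatenate([f_A, f_B]))
    S = np.empty(k)
    for i in range(k):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]             # A with column i taken from B
        V_i = np.mean(f_B * (f(AB_i) - f_A))
        S[i] = V_i / var_Y
    return S

def toy_model(X):
    """Hypothetical additive model used only to check the estimator."""
    return X[:, 0] + 2.0 * X[:, 1]

S = first_order_sobol(toy_model, k=2)
```

Since the toy model is additive, the interaction term *Sij* is zero and the first-order indices sum to approximately 1; with interacting inputs the total index *STi* would exceed *Si*.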

#### **3. Data and Variables**

The sample period selected is from 2011 to 2019, with a quarterly frequency of data. To obtain the information for the independent variables, data from the IMF's International Financial Statistics (IFS), the World Bank, FRED (Federal Reserve Bank of St. Louis), Google Trends, Quandl, and Blockchain.info were used.

The dependent variable used in this study is the Bitcoin price and is defined as the value of Bitcoin in USD. In addition, we used 29 independent variables, classified into demand and supply variables, attractiveness, and macroeconomic and financial variables, as possible predictors of the Bitcoin future price (Table 1). These variables were used throughout the previous literature [1,3,4,14].


**Table 1.** Independent variables.

The sample is divided into three mutually exclusive parts: training (70% of the data), validation (10%), and testing (20%). The training data are used to build the models, the validation data are used to assess whether those models are overtrained, and the test data are used to evaluate the built models and measure their predictive capacity. Accuracy is reported as the percentage of correctly classified cases, and RMSE measures the level of the errors made. Furthermore, for the distribution of the sample data across these three phases, cross-validation was carried out 10 times with 500 iterations [28,29].
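A 70/10/20 partition into mutually exclusive subsets can be sketched as follows. The random shuffling, the seed, and the function name `split_indices` are assumptions; the text does not state whether the observations were shuffled or kept in chronological order.

```python
import numpy as np

def split_indices(n, train=0.7, val=0.1, seed=0):
    """Return disjoint index arrays for training, validation, and testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # random order (assumption)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]         # remaining 20%
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(100)
```

Repeating the call with different seeds gives different partitions, which is one simple way to repeat the evaluation several times.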

#### **4. Results**

#### *4.1. Descriptive Statistics*

Table 2 shows a statistical summary of the independent variables for predicting Bitcoin price. For all variables, the standard deviation does not exceed the mean, so the data show initial stability. On the other hand, there is a larger difference between the minimum and maximum values. Variables such as mining commissions and cost per transaction show a small minimum value compared to their mean value, and the same is true of the hash variable. These extremes, however, do not affect the standard deviations of the respective variables.



#### *4.2. Empirical Results*

Table 3 and Figures 1–3 show the level of accuracy, the root mean square error (RMSE), and the mean absolute percentage error (MAPE). In all models, accuracy on the testing data always exceeds 92.61%, and the RMSE and MAPE levels are adequate. The model with the highest accuracy is the deep recurrent convolution neural network (DRCNN), with 97.34%, followed by the deep neural decision trees (DNDT) model, with 96.94% on average by regions. Taken together, these results show a level of accuracy far superior to that of previous studies: Ji and co-workers [15] report an accuracy of around 60%; McNally and co-workers [8], close to 52%; and Rizwan, Narejo, and Javed [11], approaching 52%. Finally, Table 4 shows the most significant variables for each method after applying the Sobol method for the sensitivity analysis.

**Table 3.** Results of accuracy evaluation: classification (%).


DRCNN: deep recurrent convolution neural network; DNDT: deep neural decision trees; DSVR: deep learning linear support vector machines; Acc: accuracy; RMSE: root mean square error; MAPE: mean absolute percentage error.
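For reference, the three evaluation measures abbreviated above can be computed as in the following sketch, which uses the standard textbook definitions. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def accuracy_pct(y_true, y_pred):
    """Percentage of correctly classified cases."""
    return 100.0 * float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

acc = accuracy_pct([1, 0, 1, 1], [1, 0, 0, 1])
err_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
err_mape = mape([100.0, 200.0], [110.0, 180.0])
```

Accuracy applies to class labels, whereas RMSE and MAPE apply to numeric predictions, so the two kinds of measure describe different aspects of model quality.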



**Figure 1.** Results of accuracy evaluation: classification (%).

**Figure 2.** Results of accuracy evaluation: RMSE.

**Figure 3.** Results of accuracy evaluation: MAPE.

Table 4 shows additional information on the significant variables. Block size, cost per transaction, and difficulty were significant in all three methods applied. This demonstrates the importance of the cost of carrying out a Bitcoin transaction, of the block of Bitcoins to buy, and of the difficulty miners face in finding new Bitcoins as the main factors in determining the price of Bitcoin. This contrasts with the results of previous studies, in which these variables were either not significant or not included in the initial set of variables [5,7,8]. The best results were obtained by the DRCNN method, in which, in addition to the aforementioned variables, the transaction value, transaction volume, block size, dollar exchange rate, Dow Jones, and gold were also significant. This shows that the demand and supply variables of the Bitcoin market are essential to predict its price, as some previous works have shown [1,30]. However, other recent works [30,31] did not find macroeconomic and financial variables to be important factors, since they appeared as variables that did not influence Bitcoin price fluctuations. In our results, the macroeconomic variables Dow Jones and gold were significant in all methods.

On the other hand, the models built by the DNDT and DSVR methods show high levels of precision, although lower than those obtained by the DRCNN. Furthermore, these methods show some different significant variables. One such variable is forum posts, which is popularly used as a proxy for the level of future demand for Bitcoin, although previous works diverge on its significance for predicting the price of Bitcoin, and some find it not significant [11,14]. Finally, in these methods another macroeconomic variable, the dollar exchange rate, is more significant. This indicates that changes in the price of the USD relative to Bitcoin can be decisive in estimating the possible demand and, therefore, a change in price. This variable, like the rest of the macroeconomic variables, has not been shown to be significant in previous works [5,31].

This set of variables observed as significant represents a group of novel factors that determine the price of Bitcoin and therefore, is different from that shown in the previous literature.

#### *4.3. Post-Estimations*

In this section, we estimate models to generate forecasts over a future horizon. For this, we used the framework of multiple-step-ahead prediction with the iterative strategy, in which models built to predict one step forward are trained [32]. At time *t*, a prediction is made for moment *t* + 1, and this prediction is used to predict moment *t* + 2, and so on. This means that the predicted data for *t* + 1 are treated as real data and are appended to the end of the available data [33]. Table 5 and Figures 4–6 show the accuracy and error results for the *t* + 1 and *t* + 2 forecasting horizons. For *t* + 1, the range of precision for the three methods is 88.34–94.19% on average, where the percentage of accuracy is higher in the DRCNN (94.19%). For *t* + 2, this range of precision is 85.76–91.37%, where the percentage of accuracy is once again higher in the DRCNN (91.37%). These results show the high precision and great robustness of the models.
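The iterative strategy for multiple-step-ahead prediction can be sketched with any one-step model. In the example below a least-squares first-order autoregression stands in for the deep learning models; this stand-in, the toy series, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of y[t+1] = a * y[t] + c (hypothetical stand-in model)."""
    y_prev, y_next = series[:-1], series[1:]
    X = np.column_stack([y_prev, np.ones_like(y_prev)])
    (a, c), *_ = np.linalg.lstsq(X, y_next, rcond=None)
    return lambda last_value: a * last_value + c

def iterative_forecast(series, one_step_model, horizon):
    """Feed each prediction back in as if it were observed data."""
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(history[-1])   # predict one step ahead
        forecasts.append(y_hat)
        history.append(y_hat)                 # append the prediction to the data
    return np.array(forecasts)

series = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # toy series that doubles each step
model = fit_ar1(series)
forecasts = iterative_forecast(series, model, horizon=2)
```

Because each later forecast is built on earlier forecasts, errors can accumulate along the horizon, which is consistent with the lower precision reported for *t* + 2 than for *t* + 1.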


Acc: accuracy.

**Figure 4.** Multiple-step ahead forecasts in forecast horizon: accuracy.

**Figure 5.** Multiple-step ahead forecasts in forecast horizon: RMSE.

**Figure 6.** Multiple-step ahead forecasts in forecast horizon: MAPE.

#### **5. Conclusions**

This study compared methodologies for predicting the Bitcoin price and, on that basis, created a new model to forecast this price. The period selected was from 2011 to 2019. To achieve a robust model, we applied different deep learning methods in the construction of the Bitcoin price prediction model: deep recurrent convolutional neural networks, deep neural decision trees, and deep support vector machines. The DRCNN model obtained the highest levels of precision. Our aim was to increase the performance of Bitcoin price prediction models relative to the previous literature. This research has shown significantly higher precision than previous works, achieving a precision hit range of 92.61–95.27%. Likewise, it was possible to identify a new set of significant variables for the prediction of the price of Bitcoin, offering great stability in the models developed when predicting over future horizons of one and two years.

This research extends the results and conclusions of previous works on the price of Bitcoin, in terms of precision and error as well as significant variables. A set of significant variables has been selected for each methodology applied, and some of these variables recur in all three methods. This represents an important addition to the field of cryptocurrency pricing. The conclusions are relevant to central bankers, investors, asset managers, private forecasters, and business professionals in the cryptocurrency market, who are generally interested in knowing which indicators provide reliable and accurate forecasts of price changes. Our study suggests new and significant explanatory variables that allow these agents to predict the Bitcoin price. These results have provided a new Bitcoin price forecasting model developed using three methods, with the DRCNN model as the most accurate, thus contributing to existing knowledge in the field of machine learning and, especially, deep learning. This new model can be used as a reference for asset pricing and for improved investment decision-making.

In summary, this study contributes to the field of finance, since the results obtained have significant implications for the future decisions of asset managers, making it possible to avoid large price-change events and the potential associated costs. It also helps these agents send warning signals to financial markets and avoid massive losses derived from an increase in price volatility.

Opportunities for further research in this field include developing predictive models that consider the volatility correlation of other new alternative assets and of safe-haven assets such as gold or stable currencies, and that evaluate different scenarios of portfolio choice and optimization.

**Author Contributions:** Conceptualization, P.L.-F., D.A., P.L.-L. and M.A.F.-G.; Data curation, D.A. and M.A.F.-G.; Formal analysis, P.L.-F., D.A. and P.L.-L.; Funding acquisition, P.L.-F., P.L.-L. and M.A.F.-G.; Investigation, D.A. and M.A.F.-G.; Methodology, D.A.; Project administration, P.L.-F. and M.A.F.-G.; Resources, P.L.-F. and M.A.F.-G.; Software, D.A.; Supervision, D.A.; Validation, D.A. and P.L.-L.; Visualization, P.L.-F. and D.A.; Writing—original draft, P.L.-F. and D.A.; Writing—review & editing, P.L.-F., D.A., P.L.-L. and M.A.F.-G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Cátedra de Economía y Finanzas Sostenibles, University of Malaga, Spain.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
