1. Introduction
The study of crisis events in international finance has received considerable attention in economics over the last two decades, especially the prediction of sovereign debt and currency crises, given their enormous importance for economic activity. This research effort has produced a wide range of prediction models, supported in turn by varied methodologies [1,2,3,4].
The current relevance of models for predicting crisis events has been heightened by the last global financial crisis, which showed that even developed countries, that is, those that in theory enjoy a better economic situation and greater stability, can suffer such episodes. The globalization process and economic development have led to the emergence of greater complexity in the macroeconomic and financial environment [1]. This has created a new space for research and a demand to build new models to forecast these events, not just at the level of a single country but to explain the common characteristics of these crises across a wide geographic spectrum [4,5].
One of the paths initially taken by the literature on the prediction of international financial crisis events was the development of models built with samples made up of emerging economies, since these tend to be more vulnerable countries and have statistically suffered a higher frequency of crises. At this initial stage, however, the samples comprised only one country or a reduced set of countries, and the resulting models could therefore be considered regional models. The subsequent development of the literature on the construction of regional models arose out of sheer necessity [6]. More recently, various so-called global models have also appeared, built on samples of economies from different regions of the world. Almost all of these global models have been designed to predict situations in emerging economies, although their samples include some advanced economies [7].
The results obtained by studies such as [8] confirm the advantages, in both explanatory power and classification capacity, of global models for predicting these crisis events compared with regional models or models based on information from a single country. In addition, there is a demand for more research on global models aimed at increasing both accuracy and the scope of the information used, since the studies that have obtained high levels of precision relied on very small samples, mainly from a single country, and therefore offered conclusions of limited scope [6,9,10]. Many of these works have also lacked methodological comparisons to establish which empirical technique, or which type of method, could be the most appropriate for prediction [11,12,13]. The literature therefore shows how necessary it is to deepen the use of computational techniques, also called 'machine learning' techniques, to find more precise alternatives for anticipating and preventing future financial crises [3,4].
These sovereign debt and currency crisis prediction models can also be useful for assessing a country's reputation in the world more accurately [14,15,16,17,18,19]. Country reputation describes how the most important characteristics of a country, for example social and economic factors, shape the image or brand that the country projects to the world. In particular, it can influence the market expectations of energy companies, where bilateral relations between countries are key and where reputation reflects the international trust placed in the country. Various authors [16,17,18,19,20,21,22] have expressed the need to incorporate data and variables on the economic and financial stability of countries as another important factor in country reputation.
The present study addresses the research question of whether it is possible to make global crisis prediction models more accurate than those in the previous literature, considering not only statistical techniques such as logistic regression or probit models but also computational techniques that have yielded excellent classification results in economic prediction over recent decades [23]. To offer greater explanatory and comparative diversity, both a global model and regional models for Africa and the Middle East, Asia, Latin America, and Europe have been built. The results verify the greater precision of computational methods compared with traditional statistical techniques; even very novel computational techniques have shown interesting potential for predicting these crisis events.
The structure of this research work is as follows. Section 2 reviews the previous literature on the prediction of the crisis events mentioned: sovereign debt crises and currency crises. Section 3 presents the methodology used. Section 4 details the variables and data used in the research, and the results achieved are examined in Section 5. Lastly, the conclusions of the investigation and its implications are presented.
3. Methodologies
As previously stated, to answer the research question we used a variety of methods in the design of the crisis prediction models. The application of different methods aims to achieve a robust model, tested not through a single classification technique but through the set of classification techniques that have proved successful in the previous literature. Specifically, multilayer perceptron, support vector machines, fuzzy decision trees, AdaBoost, extreme gradient boosting, random forests, deep belief networks, and deep neural decision trees have been applied. The following is a summary of the methodological aspects of each of these classification techniques.
3.1. Multilayer Perceptron
The multilayer perceptron (MLP) is an artificial neural network (ANN) methodology composed of a layer of input units, an output layer, and one or more intermediate layers, also called hidden layers; these last layers have no connections with the outside. The network is trained in a supervised manner with error feedback. All the layers are connected so that the input nodes are connected to the nodes of the second layer, these in turn to those of the third layer, and so on. The aim of the methodology is to establish a correspondence between a set of observations presented at the input and the set of desired outputs at the output layer.
The work in [52] develops the MLP learning scheme for the case in which there is initially no knowledge about the underlying model of the data. This scheme needs to find a function that captures the learning patterns, as well as a generalization process able to handle observations not included in the learning stage [53]. Assuming the network architecture is given, the weights must be adjusted from the sample data, the objective being to find the weights that minimize the learning error. Therefore, given a set of learning pattern pairs {(x1, y1), (x2, y2), …, (xp, yp)} and an error function ε(W, X, Y), the training stage consists of identifying the set of weights that minimizes the learning error E(W) [54], as shown in Equation (1).
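As a minimal sketch of this training scheme (scikit-learn, with randomly generated indicators and crisis labels that are illustrative assumptions rather than the paper's data), the weights of an MLP with one hidden layer are adjusted to reduce the error on the training pairs:

```python
# Minimal sketch, not the authors' exact configuration: an MLP trained on
# hypothetical crisis data (X: macro-financial indicators, y: 1 = crisis, 0 = no crisis).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                   # placeholder indicators
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # placeholder crisis label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer; the weights W are adjusted to minimize the training error E(W).
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))
```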
3.2. Support Vector Machines
Support vector machines (SVM) have produced good results when applied to problems of a very diverse nature in which the generalization error needs to be minimized. SVM can be defined as the attempt to find a decision surface (σi) that separates the positive and negative data by as large a margin as possible [55].
Among all possible surfaces (σ1, σ2, …) in the |A|-dimensional space that separate the positive from the negative training observations, the one whose minimum distance to the training data is largest is sought. When the positive and negative data are linearly separable, the decision surfaces are (|A| − 1)-dimensional hyperplanes. The best decision surface is identified through a small subset of the training data called support vectors. SVM also allows the construction of non-linear classifiers, since the algorithm can map the training data into a high-dimensional space.
In our analysis, the sequential minimal optimization (SMO) method is applied to train the SVM algorithm. The SMO technique decomposes the large quadratic programming (QP) problem that arises in SVM training into a series of smaller QP subproblems.
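As an illustrative sketch (not the authors' exact configuration), scikit-learn's SVC, whose libsvm backend solves the SVM QP problem with an SMO-type decomposition, can be trained on the same hypothetical data as in the MLP sketch above:

```python
# Sketch only: non-linear SVM classifier; the libsvm solver behind SVC uses an
# SMO-type decomposition into small QP subproblems, as described above.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm_clf = make_pipeline(
    StandardScaler(),                          # SVMs are sensitive to feature scales
    SVC(kernel="rbf", C=1.0, gamma="scale"),   # non-linear classifier via RBF kernel
)
svm_clf.fit(X_train, y_train)                  # X_train, y_train from the MLP sketch above
print("Test accuracy:", svm_clf.score(X_test, y_test))
```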
3.3. Fuzzy Decision Trees (FDT)
This is an algorithm based on the well-known C4.5 technique, in which a decision tree is built by recursively splitting the data into smaller subsets, basing the construction of the tree on the information value that can be derived from each split [56]. The algorithm can extract hidden information from large data sets and produce its own rules for optimal classification [57]. C4.5 is thus characterized by the selection of an attribute as the root, the creation of a branch for each value of that attribute, and the repetition of the process for each branch until all the cases in a branch belong to the same class. The attribute with the highest gain is selected as the root, as expressed in Equation (2):
where S is the set of cases, A is an attribute, n represents the number of partitions of attribute A, and Si represents the number of cases in the i-th partition.
Entropy is computed as shown in Equation (3): where S is the set of cases, n identifies the number of partitions of S, and pi represents the proportion of S in partition i.
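Since Equations (2) and (3) follow the standard C4.5 definitions of gain and entropy, the following illustrative sketch (NumPy; the toy data are an assumption, not the paper's) computes both quantities as described above:

```python
import numpy as np

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), with p_i the proportion of S in class/partition i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i), where the S_i partition S by attribute A."""
    total = entropy(labels)
    for value in np.unique(attribute_values):
        subset = labels[attribute_values == value]
        total -= (len(subset) / len(labels)) * entropy(subset)
    return total

# Toy example: the attribute with the highest gain would be chosen as the root.
y_toy = np.array([1, 1, 0, 0, 1, 0])
a_toy = np.array(["low", "low", "high", "high", "low", "high"])
print("Gain:", gain(y_toy, a_toy))
```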
Fuzzy decision trees share their initial architecture with the decision trees originally developed. They allow an observation to descend through different branches of a node at the same time, with different degrees of satisfaction in the interval [0, 1] [58,59]. Fuzzy decision trees differ from standard decision trees in that they apply division criteria based on fuzzy constraints, their inference techniques are different, and the fuzzy sets representing the observations do not have to change. The construction of a fuzzy decision tree comprises two components: a procedure to build the fuzzy decision tree and an inference procedure for decision making. This fuzzy modification has achieved better results than the C4.5 algorithm in previous studies [59].
3.4. AdaBoost
AdaBoost is a meta-algorithm that can be applied on top of other learning algorithms to improve their classification accuracy. The procedure combines the outputs of those algorithms, called weak classifiers, into a weighted sum that constitutes the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak classifiers are adjusted in favor of the cases misclassified by the previous ones. AdaBoost can be sensitive to noisy samples and outliers, although in some classification problems it may be less sensitive than other classifiers [60].
AdaBoost implements a particular technique for training a boosted classifier [61]. A boosted classifier is a classifier of the form shown in Equation (4): where ft represents a weak learner that takes an object x as input and returns a real value indicating the class of the object. The sign of the weak learner's output identifies the predicted object class, while its absolute value gives the confidence in that classification. Likewise, the output of the T-stage classifier is positive if the sample belongs to the positive class and negative otherwise.
Each weak classifier produces an output hypothesis h(xi) for every sample in the training set. At iteration t, a weak classifier ft is chosen and assigned a coefficient αt such that the training error Et of the resulting boosted classifier is minimized, as shown in Equation (5): where Ft−1 represents the boosted classifier built in the previous training steps, E(F) is the error function, and ft(x) = αt h(x) is the weak learner added to the final classifier.
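A minimal sketch of this boosting scheme (scikit-learn, reusing the hypothetical data from the MLP sketch; the hyperparameters are assumptions) is shown below; the default weak learner in scikit-learn is a depth-1 decision tree, and the ensemble is the weighted sum of weak learners described above:

```python
# Sketch: boosted classifier of the form F_T(x) = sum_t alpha_t * h_t(x); the default
# weak learner h_t in scikit-learn is a depth-1 decision tree (a "stump").
from sklearn.ensemble import AdaBoostClassifier

ada = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0)
ada.fit(X_train, y_train)               # data as in the MLP sketch above
print("Test accuracy:", ada.score(X_test, y_test))
```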
3.5. Extreme Gradient Boosting (XGBoost)
XGBoost is a gradient boosting algorithm that has shown superior predictive power to many computational methodologies widely used in the previous literature [58,62,63]. It can be applied to supervised learning problems and is made up of ensembles of classification and regression trees (CART). Denoting the variable to be predicted by yi, XGBoost is defined as shown in Equation (6): where K represents the total number of trees, fk is the function associated with the k-th tree in the functional space F, and F is the set of all possible CARTs.
Each newly trained CART tries to fit the residuals left by the model in the previous training step. The objective function optimized at step (t + 1) is defined in Equation (7):
where l(·) represents the loss function in the training step, yi is the observed value, ŷi(t) is the prediction value at step t, and Ω(ft) is the regularization term defined in Equation (8).
In Equation (8), T represents the number of leaves and wj defines the score obtained for the j-th leaf. To optimize (8), a Taylor expansion is applied to carry out the gradient descent and accommodate different loss functions. Significant variables are chosen during the training step as nodes of the trees, while non-significant variables are discarded.
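A minimal sketch with the xgboost library (the hyperparameters are illustrative assumptions, not those reported in the paper; data reused from the MLP sketch) shows how such a regularized additive tree ensemble is trained:

```python
# Sketch only: gradient-boosted CART ensemble; gamma and reg_lambda control the
# regularization over the number of leaves T and the leaf scores w_j.
from xgboost import XGBClassifier

xgb_clf = XGBClassifier(
    n_estimators=200,      # K trees, one added per boosting step
    max_depth=4,
    learning_rate=0.1,
    gamma=1.0,             # penalty per additional leaf
    reg_lambda=1.0,        # L2 penalty on leaf scores
    eval_metric="logloss",
)
xgb_clf.fit(X_train, y_train)   # data as in the MLP sketch above
print("Test accuracy:", xgb_clf.score(X_test, y_test))
```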
3.6. Random Forests
Random forests (RF) are an ensemble method that averages the forecasts of a large number of decorrelated decision trees [64,65]. They usually display good performance, with better generalization properties than individual trees, are generally robust to outliers, and need virtually no parameter tuning [66]. Random forests rest on two main ideas: bagging, whereby each tree is built on a different bootstrap sample of the training data, and random feature selection, which decorrelates the trees. The training algorithm is quite simple and can be described as follows. For each of the trees in the ensemble, a bootstrap sample Z of the training data is drawn. While growing the tree Tb over Z, the features available as candidates for the split at each node are chosen at random [67]. Lastly, the grown tree Tb is added to the ensemble. During inference, each of the trees provides a prediction of the class label for the new observation x, and the final random forest prediction is the majority vote of the trees.
Inspired by [64], our RF comprises 100 trees, each with a maximum depth of 10 for this study. All the trees use cross-entropy as the split criterion, and m = √p features are set as the default option for the classification configuration of this algorithm.
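This configuration corresponds, for example, to the following scikit-learn sketch (settings beyond those named in the text, and the data, are assumptions):

```python
# Sketch of the configuration described above: 100 trees, maximum depth 10,
# cross-entropy split criterion, and m = sqrt(p) features considered per split.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    criterion="entropy",     # cross-entropy impurity
    max_features="sqrt",     # m = sqrt(p)
    random_state=0,
)
rf.fit(X_train, y_train)     # data as in the MLP sketch above
print("Test accuracy:", rf.score(X_test, y_test))
```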
3.7. Deep Belief Network
The deep belief network (DBN) is a variant of a deep neural network whose two upper layers are joined together as an undirected bipartite associative memory, a restricted Boltzmann machine (RBM). The lower layers form a directed graphical model called a sigmoid belief network. The difference between sigmoid belief networks and DBNs lies in the way the hidden layers are parameterized [68], as indicated in Equation (9): where v represents the vector of visible units, the remaining factors are the conditional probabilities of the units at each level k given the layer above, and the joint distribution of the two top layers is an RBM. Another way of expressing the DBN as a generative model is given in expression (10):
A DBN is built by stacking RBMs: the visible layer of each RBM in the stack corresponds to the hidden layer of the previous RBM. When the model is fitted to a data set, the aim is to approximate the true posterior distribution over the hidden units. The approximations of the higher-level posteriors are obtained layer by layer, and the top-level RBM makes it possible to compute the inference procedure [68].
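A full DBN with generative pretraining is not available in scikit-learn; as a hedged approximation of the stacking idea described above, the sketch below chains two BernoulliRBM feature extractors (each RBM's hidden layer feeding the next) with a logistic regression on top. It illustrates the layer-wise composition only and omits the generative fine-tuning of a true DBN; all settings and data are assumptions, not the authors' implementation.

```python
# Rough sketch of RBM stacking (greedy layer-wise feature learning), not a full DBN:
# each RBM's hidden layer feeds the next RBM as its visible layer.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

dbn_like = Pipeline([
    ("scale", MinMaxScaler()),   # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),   # supervised layer on top
])
dbn_like.fit(X_train, y_train)   # data as in the MLP sketch above
print("Test accuracy:", dbn_like.score(X_test, y_test))
```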
3.8. Deep Neural Decision Trees
Deep neural decision trees (DNDT) are decision tree models implemented with deep learning neural networks. In these trees, a particular combination of weights of the DNDT corresponds to a specific decision tree, which makes its output interpretable [69]. DNDT starts from a 'soft binning' function [70], applied to the observations reaching each node, which makes it possible to obtain the split decisions in DNDT. The binning function takes a real scalar x as input and produces an index of the 'bin' to which x belongs.
For a variable x, the input can be binned into n + 1 intervals. This requires n cut points, which are trainable variables of the algorithm. These cut points are denoted [β1, β2, …, βn] and are monotonically increasing, so that β1 < β2 < … < βn.
The activation function of the DNDT method is computed with a neural network, as defined in Equation (11).
where w is a constant whose value is set to w = [1, 2, …, n + 1], τ > 0 is a temperature factor, and b is constructed as defined in Equation (12).
The neural network defined in Equation (12) produces an encoding of the binned value of x. When τ approaches 0 (the most common case), the output vector is sampled via the straight-through (ST) Gumbel-Softmax method [71]. Given the binning function defined above, the main idea is to construct the decision tree through the Kronecker product. Suppose we have an input instance x ∈ RD with D features. Binning each feature xd with its own neural network fd(xd), we can find all the leaf nodes of the decision tree, as expressed in Equation (13): where z is now also a vector indicating the index of the leaf node at which instance x arrives. Finally, a linear classifier at each leaf z classifies the instances that arrive there. DNDT scales well with the number of instances thanks to the mini-batch training of the neural network. However, a key design drawback is that, owing to the use of the Kronecker product, it does not scale well with the number of features. In our implementation, we avoid this problem for large data sets by building a forest with random subspace training [65].
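The soft binning at the core of DNDT can be sketched as follows (NumPy, following the construction described in [69]; the cut points, temperature, and data are arbitrary illustrative choices): each scalar feature is mapped to a soft assignment over its n + 1 bins, and the Kronecker product of these per-feature assignments would then index the leaves of the tree.

```python
import numpy as np

def soft_binning(x, cut_points, tau=0.1):
    """Differentiable binning of a scalar feature into n + 1 bins (sketch of Eqs. (11)-(12)).

    x          : array of feature values, shape (batch,)
    cut_points : increasing trainable cut points [beta_1, ..., beta_n]
    tau        : temperature; as tau -> 0 the output approaches a one-hot bin indicator
    """
    n = len(cut_points)
    w = np.arange(1, n + 2, dtype=float)                 # w = [1, 2, ..., n + 1]
    b = np.concatenate(([0.0], -np.cumsum(cut_points)))  # b = [0, -b1, -b1-b2, ...]
    logits = (np.outer(x, w) + b) / tau
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)              # softmax over the n + 1 bins

# Example: two cut points create three bins for one feature.
print(soft_binning(np.array([-0.7, 0.1, 0.9]), cut_points=np.array([-0.3, 0.5])))
```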
3.9. Sensitivity Analysis
In machine learning techniques, as in traditional statistical techniques, it is necessary to analyze the impact of the variables when data samples containing a wide variety of variables are used. To carry out this evaluation, a sensitivity analysis must be applied. The objective of this procedure is to determine the level of significance of the independent variables with respect to the dependent variable [72,73]. It therefore seeks to keep the models made up of the most important variables, eliminating those that are not significant. For a variable to be considered significant, it must have a variance greater than the mean of the rest of the variables that make up the model. The Sobol method [74] is the technique chosen to decompose the total variance V(Y), as given by the equations expressed in (14).
where Vi = V(E(Y|Xi)) and Vij = V(E(Y|Xi, Xj)) − Vi − Vj.
The sensitivity indices are obtained as Sij = Vij/V, where Sij denotes the interaction effect between two factors. The Sobol decomposition also makes it possible to estimate a total sensitivity index STi, which measures the total sensitivity effect associated with each independent variable.
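A hedged sketch of how such a decomposition can be computed in practice is given below, using the SALib package; the variable names, bounds, and toy output are illustrative assumptions and stand in for the crisis model's inputs and predictions, not the paper's actual setup.

```python
# Sketch only: Sobol decomposition of V(Y) into first-order (S1), second-order (S2),
# and total (ST) sensitivity indices for a toy model standing in for the classifier.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["TDEB", "FXR", "M2R"],   # illustrative variable names
    "bounds": [[0.0, 1.0]] * 3,
}

X = saltelli.sample(problem, 1024, calc_second_order=True)
Y = X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]   # placeholder model output

Si = sobol.analyze(problem, Y, calc_second_order=True)
print(Si["S1"])   # first-order indices
print(Si["ST"])   # total sensitivity indices S_Ti
```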
6. Conclusions
The present study has developed robust global and regional models to predict international financial crises, specifically sovereign debt and currency crises. It has also sought to show the superiority of computational techniques over statistical ones in terms of precision. The aim has been to clarify these issues by overcoming the previous absence of definitive conclusions caused by the lack of homogeneity in methodologies, approaches, available databases, periods, and countries, among other issues.
The results of the study allow us to draw the following conclusions. First, they confirm the existence of differences between the global and regional models, and show that the global models can reach a precision similar to the mean of the regional models. The sovereign debt prediction models for the global sample and the studied regions (Africa and the Middle East, Asia, Latin America, and Europe) obtained accuracies of 97.80%, 100%, 96.82%, 98.85%, and 99.76%, respectively. In turn, the models built for the currency crisis show accuracies of 98.43%, 98.24%, 98.54%, 96.90%, and 99.07% for the global sample, Africa and the Middle East, Asia, Latin America, and Europe, respectively. This shows the high level of robustness of the models built with respect to previous works.
Second, regarding the objective postulating that the application of computational methods could improve on the precision of statistical techniques, our empirical evidence allows us to accept it for the crises analyzed, based on the comparison of success rates on the test samples and the RMSE values obtained. The best methods for the sovereign debt crisis have been FDT, AdaBoost, XGBoost, and DNDT, while for the prediction of the currency crisis the best techniques have been DNDT, XGBoost, RF, and DBN.
Regarding the explanatory variables of sovereign debt crises, some variables have appeared as consistently significant across the set of estimated models. These are the variables related to the country's debt exposure, more specifically TDEB, which shows the importance of a high level of public debt in sovereign default, and IMFC, which indicates that a high dependence on credit provided by the IMF increases the probability of default. In addition, the foreign sector variables related to the stock of foreign exchange reserves accumulated by a country, such as FXR and M2R, show the importance of a high level of reserves with which to meet international debt payments. Lastly, the SCFR and SBS variables are also consistently significant, indicating that interest paid and credit rating are important factors when evaluating the possibility of a sovereign default.
The results of the currency crisis prediction models also show that a small group of variables is consistently significant. This is the case of the FCF variable, which indicates how a low level of dynamism in net investment in the country can cause a sharp drop in the currency's value. Similarly, the variables representing the money supply, such as M2M and M2R, indicate that an increase in the money supply in the market erodes the currency's value. Foreign sector variables such as TRO and CACC also appear as significant, reflecting the importance of a country's trade openness for the value of its currency. Finally, among the political variables, DUR and YEAR are the most significant, indicating a higher incidence of currency crises in countries where political regimes are perpetuated, i.e., those closer to totalitarianism.
6.1. Implications
The above conclusions have important theoretical and practical implications. From a theoretical standpoint, the models developed can provide tools for foreseeing sovereign debt and currency crises and thus help avoid international financial crises both at the regional level and as a whole (globally), since these models show a high level of robustness with respect to previous works. This study is a substantial contribution to the field of international finance, as the results presented have considerable implications for further decisions, providing tools that help governments and financial markets achieve financial stability. Given the need of countries to obtain financing and establish international relations, our models can help foresee sovereign debt and currency crises, avoiding financial disturbances and imbalances and reducing the possibility of damage to the financial intermediation process. All this implies an improvement in the functioning of financial markets, debt sustainability, the profitability of credit institutions, and the non-banking financial sector, such as investment funds.
From a practical point of view, our sovereign debt and currency crisis prediction models can be useful for assessing the reputation of a country more accurately. In a globalized world, companies constantly try to expand into markets beyond their own, which makes it vital to enjoy a good country-of-origin image in order to improve the perception of the goods and services offered. A poor reputation of a country in terms of meeting its debt obligations, as well as an unstable currency, can have a negative impact on companies from that country in other markets when seeking financing, suppliers, and partnerships with other companies. Therefore, a better perception of the country's financial management can improve, on the one hand, its position in the financial markets and, on the other, the country's reputation for the benefit of its companies.
6.2. Limitations and Further Research
This investigation has certain limitations, principally the limited historical data available for emerging economies. As this research was conducted from a global perspective, it requires a much larger range of information than other studies in this field. Future studies may delve into other types of political information to deepen the analysis of its influence both on the financial crises studied and on the country's reputation. It would also be worthwhile to relate the financial crises suffered by a country to its exports or tourism, important dimensions of country reputation, through modifications of the country strength models that serve as the main tools for measuring reputation. Likewise, to increase the generalizability of the results on country reputation, further analysis could address the impact of a country's financial strength on corporate reputation, both for large companies and for those that wish to expand internationally, for instance energy companies.
Machine learning techniques show a great capacity to absorb observations, and the use of large data samples is vital to obtain a high level of accuracy. They therefore offer a greater margin for achieving better precision ratios and a lower level of error. However, among the weaknesses of these computational techniques compared with statistical ones are their higher computational cost and the greater difficulty of interpreting some methodologies. Therefore, notwithstanding the superiority of machine learning methodologies demonstrated in this study and in a multitude of previous works, it is necessary to find new techniques that can mitigate the weaknesses described. An interesting technique for testing powerful alternatives for predicting international financial crises, as well as their effect on the management of country reputation, would be dynamic systems. This technique has been used in different areas of management, obtaining very satisfactory results in simulations of medium- and long-term scenarios [81,82,83,84].