Since the 1960s, several quantitative methods have been applied to model sovereign credit risk and to quantify the probability of sovereign default. Multivariate statistical and stochastic process-based sovereign default forecasting thus has an approximately 50-year developmental history. Based on this historical development, it can be concluded that the applied quantitative methods can reliably model relationships between explanatory and target variables and provide a sound basis for forecasting the probability of sovereign default.
Historical development is evaluated through 50 empirical publications that achieved significant, recognized scientific results. Articles that appeared in highly rated journals and were substantially cited, and/or that underpin the most widely applied current models with subsequent outstanding results, are regarded by the author as historically relevant.
2.2.1. Multivariate Classification Methods
Multivariate classification methods may be divided into traditional parametric statistical methods, mostly applied in the earlier phase of sovereign default forecasting history, and non-parametric machine learning methods, which are currently widely applied. Among the traditional parametric classification methods, discriminant analysis (DA), logistic regression (logit), probit, and Tobit analysis are regarded as significant in the historical development of the field of sovereign default forecasting.
The first multivariate sovereign default forecast model was developed by Frank and Cline (1971) using multivariate DA. The target variable was sovereign debt restructurings occurring between 1960 and 1968. Following the success of this first model, DA became widely applied to predict sovereign default. Publications by Grinols (1976), Sargen (1977), Saini and Bates (1978), Taffler and Abassi (1984), and Burton and Inoue (1987) stand out as significant articles in this regard. After the 1980s, the role of DA in the literature and in practice was taken over by more advanced techniques offering higher classification power and less rigorous application assumptions.
In parallel with the development of corporate and bank failure prediction, the first logit-based sovereign default forecast models appeared at the end of the 1970s. In contrast to DA, the logit method did not require rigorous normality and variance assumptions. The first sovereign default logit model was published by Feder and Just (1977); the predictive power of their six-variable logit model was judged superior to that of any previously published model. Mayo and Barett (1978), Feder et al. (1981), Citron and Nickelsburg (1987), Oral et al. (1992), Sommerville and Taffler (1995), Ciarlone and Trebeschi (2005), Fuertes and Kalotychou (2006, 2007b), and Kaminsky and Vega-Garcia (2016) all produced major logit models that contributed to the literature. The Noise-To-Signal (NTS) approach, combined with logit models, also proved strongly applicable to sovereign default forecasting. Observing that several variables behaved differently before crises, Kaminsky et al. (1998) categorized variables according to whether they exceeded predefined thresholds. A similar methodology was applied by Dawood et al. (2017) and by Wijayanti and Rachmanira (2020). Overall, it can be concluded that logit methods are still widely applied for sovereign default forecasting, both as a standalone and as a benchmark method. Their popularity has remained unbroken, even though several recent empirical studies have revealed that machine learning methods can achieve far superior predictive power.
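The threshold idea behind the NTS approach can be sketched as follows. The indicator series, crisis labels, and threshold grid below are synthetic illustrations, not the variables or cut-offs of the cited studies:

```python
import numpy as np

def noise_to_signal(indicator, crisis, threshold):
    """Noise-to-signal ratio of a one-variable early warning signal.

    A signal is issued whenever the indicator exceeds the threshold:
    noise  = share of tranquil periods with a (false) signal,
    signal = share of crisis periods correctly signalled.
    Lower ratios indicate a more informative threshold.
    """
    fired = indicator > threshold
    noise = fired[~crisis].mean()    # false-alarm rate
    signal = fired[crisis].mean()    # hit rate
    return np.inf if signal == 0 else noise / signal

rng = np.random.default_rng(0)
crisis = rng.random(500) < 0.1                        # ~10% crisis periods
# hypothetical indicator (e.g. short-term debt/reserves), elevated pre-crisis
indicator = rng.normal(loc=np.where(crisis, 1.5, 0.0))

# grid-search the threshold that minimises the noise-to-signal ratio
grid = np.linspace(indicator.min(), indicator.max(), 200)[:-1]
best = min(grid, key=lambda t: noise_to_signal(indicator, crisis, t))
print(f"best threshold: {best:.2f}")
```

In practice, the resulting signal indicators are then fed into a logit model alongside (or instead of) the raw variables.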
The probit method was first applied to sovereign default prediction by Kharas (1984), who examined the long-term creditworthiness of developing countries between 1965 and 1976 by concentrating on the relationship between capital accumulation and external debt. Similar probit models were developed by Balkan (1992), de Bondt and Winder (1996), Reinhart (2002), and Szetela et al. (2016). Tobit analysis was applied to sovereign default forecasting by Lloyd-Ellis et al. (1990), Lanoie and Lemarbre (1996), and Gür (2001). Although probit and Tobit methods are applied less frequently in the literature and in practice than logit methods, they can still be regarded as substantial in the historical development of the field of sovereign default forecasting.
Within non-parametric and machine learning classification, the following methods may also be regarded as historically significant in the development of sovereign default forecasting:
Decision Trees (CART and C4.5 trees)
Neural Networks (NN)
Support Vector Machine (SVM)
Random Forest (RF)
Least Absolute Shrinkage and Selection Operator (LASSO)
Multivariate Adaptive Regression Splines (MARS)
Extremely Randomized Trees (ERT)
Extreme Gradient Boosting (XGBoost)
Deep Neural Decision Trees (DNDT)
The first decision tree-based sovereign default model was developed by Cosset and Roy (1988) by applying classification and regression trees (CART) to observations between 1983 and 1985. The target variable was that of sovereign rating, and the explanatory variables were exchange rate changes, inflation rates, and infant mortality rates. Results indicated that regression trees were able to effectively manage hidden relationships in the database and better handle multicollinearity as compared with previously applied techniques.
Manasse et al. (2003) produced an article with a significant effect on the development of the field, which applied CART to locate early warning indicators of sovereign debt crises. In total, 1276 observations were examined between 1970 and 2002, 54 of which were default occurrences. The target variable was defined in parallel with the default definition of S&P and the excess of the IMF non-concessional limit. A logit model was developed as a benchmark and the six-level regression tree achieved 89% classification power, whereas the logit model achieved only 74%.
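The benchmark design of such comparisons, a tree against a logit fitted on the same sample, can be sketched on synthetic data. The sample shape, features, and the interaction-style default rule below are assumptions for illustration, not the authors' dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1276, 6))           # six synthetic macro indicators
# default only when two conditions hold jointly -- an interaction that a
# linear logit cannot represent, but axis-aligned tree splits can
y = ((X[:, 0] > 1.0) & (X[:, 1] < 0.0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
logit = LogisticRegression().fit(X_tr, y_tr)
print(f"tree accuracy:  {tree.score(X_te, y_te):.2f}")
print(f"logit accuracy: {logit.score(X_te, y_te):.2f}")
```

The tree's advantage here reflects the mechanism emphasized in these studies: default rules defined by joint threshold breaches are naturally expressed by recursive partitioning.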
Manasse and Roubini (2009) applied CART to examine macroeconomic, financial, and political factors explaining sovereign debt crises. The initial 50 variables were reduced to 10 utilizing decision trees, whereby rules were developed to recognize features of defaulting countries. It was concluded that not all crises were similar and that they could be differentiated in terms of solvency, liquidity, and macroeconomic risks. Decision trees also explored factor-groups to identify relatively risk-free zones.
Savona and Vezzoli (2015) sought the best compromise between in-sample model fit and out-of-sample predictive power. The authors examined developing countries using regression trees between 1975 and 2010, supplemented with data from Greece, Ireland, Portugal, and Spain. Crisis risk was captured by flagging variables that exceeded predefined thresholds. The classification power of the model outperformed that of the benchmark NTS logit model. The strongest variables were short-term excessive indebtedness, default history, real GDP growth, and US interest rates.
Alaminos et al. (2019) applied fuzzy C4.5 decision tree methodology to predict sovereign debt crises using data collected between 1970 and 2017, applying 30 variables and ten-fold cross validation. The area under the ROC curve (AUROC) of the global model was 94%, indicating very strong predictive power.
It can be concluded that decision trees also fulfilled a critical role in the historical development of the field of sovereign default forecasting. However, static decision trees have since evolved into more advanced machine learning ensembles such as RF and ERT, which are presented later.
The first NN-based sovereign default model was published by Cosset and Roy (1994) using data collected from 76 countries between 1983 and 1985, with a forecast made until 1986. Results were compared with the logit method. The NN model, incorporating reserve import ratios, net external debt ratios to exports, per capita GNP levels, current account to GNP levels, investment willingness, export changes, and political instability as explanatory variables, outperformed the logit model as a means of classifying sovereign default. A similar NN model was developed by Chattopadhyay (1997) by using the net position change of US foreign direct investment as a target variable.
Within the framework of comparative analysis, Cooper (1999) examined the performance of a back propagation NN on data collected between 1960 and 1982 by comparing results with DA, logit, and probit methods. Debt restructuring was set as a target variable and the author demonstrated superiority of the NN method with 90% classification accuracy, in contrast to 85% achieved with logit and probit methods, and 80% with DA.
Yim and Mitchell (2005) attempted to forecast changes in sovereign ratings by utilizing back propagation NN, hybrid NN, and by employing DA, logit, and probit as benchmark methods. The deployed hybrid network integrated variables and outputs from statistical models and back propagation NN combined with Ward clustering and self-organizing maps (SOM). The strongest model variable emerged to be political risk, and the best hybrid model was determined to be the combination of NN–logit–probit, which provided a perfect classification of the out-of-sample set.
In the recent global financial crisis, timely recognition of sovereign default became particularly important. By researching debt crises of developing countries between 1980 and 2004, Fioramanti (2008) developed an NN-based early warning model. This author emphasized the high flexibility and non-linear approximation capability of NN, thereby outperforming earlier methodologies.
Frascaroli et al. (2009) attempted to reconstruct sovereign ratings with resilient propagation neural networks (RBPRO-NN) using macroeconomic data collected between 1975 and 2005. The model was tested in multiple scenarios on the Brazilian economy, producing a precise indication of which indicators the country would need to improve in order to receive better ratings.
Zhou and Wang (2019) experimented with deep learning neural networks (DL-NN) using a database of 183 countries with data collected between 1970 and 2015. Target variables included sovereign default events, IMF excess limits, implicit severe domestic sovereign indebtedness, and loss of market confidence. By also paying careful attention to prevent overtraining, the model was able to achieve almost perfect classification accuracy.
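As one illustration of the overtraining safeguard mentioned above, a small feed-forward network can be fitted with early stopping on a held-out validation split. The data, architecture, and hyper-parameters here are assumptions for illustration, not those of the cited study:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 12))          # synthetic macro-style features
y = (np.tanh(X[:, 0]) + X[:, 1] * X[:, 2] > 0).astype(int)  # non-linear rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# early_stopping holds out part of the training data and stops training once
# the validation score no longer improves, limiting overtraining
nn = MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                   max_iter=500, random_state=0).fit(X_tr, y_tr)
print(f"out-of-sample accuracy: {nn.score(X_te, y_te):.2f}")
```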
Ultimately, experience has demonstrated NN to be among the best sovereign default forecasting methods. This has been manifested in various standalone models, constantly developing learning algorithms, and the consideration of useful benchmark models.
Demand for even more reliable and effective sovereign default forecasting models has been further corroborated by events since 2010. This is especially relevant since a great number of models developed before 2008 failed to forecast the severity and duration of the 2008–2010 global economic crisis (Candelon et al. 2014). Several improved econometric models were published with the express goal of increasing predictive power in out-of-sample periods, while several critiques were simultaneously made of the applicability of earlier models. As a consequence, since around 2015, artificial intelligence-based machine learning procedures have unambiguously become dominant in sovereign default forecasting. Their continued development and creative combination currently comprise the most interesting research challenges in the field of sovereign default forecasting.
Pisula (2017) experimented with different ensemble classifier machine learning methods on data collected from 133 countries between 1980 and 2014, using macroeconomic and financial indicators and three-fold cross validation. The single target variable was debt service difficulty and the forecast horizon was 3 years. A balanced sample was created in which 1281 of the 2562 observations were default occurrences. The author combined a stacking ensemble classifier with NN, SVM, G-logit, and MARS methods, a bagging ensemble classifier with the RF method, and an AdaBoost ensemble classifier with the CART method. The best predictive power (97% AUROC) was achieved by the AdaBoost–CART model, followed by the RF method with 96%; other models significantly underperformed in comparison.
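A minimal sketch of the AdaBoost-on-CART idea, boosting shallow CART learners and scoring by AUROC, is shown below on synthetic data. The sample shape, signal, and hyper-parameters are illustrative assumptions, not Pisula's specification:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(2562, 8))           # balanced synthetic sample
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=2562) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# AdaBoostClassifier's default base learner is a one-level CART stump;
# boosting reweights observations the previous stumps misclassified
model = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUROC: {auroc:.2f}")
```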
Huang and Sethi (2017) developed NN, SVM, RF, and logit models by using an IMF database containing 1200 observations. Variables were reduced via Principal Component Analysis (PCA), and results were backtested by use of a ten-fold cross validation method. RF emerged to be the best predictive model with 91% classification accuracy, followed by the SVM (89%), NN (88%), and logit methods (87%).
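The shape of that pipeline, PCA for dimension reduction feeding a classifier and scored by ten-fold cross-validation, can be sketched as follows. The synthetic data and the choice of five components are assumptions for illustration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(1200, 20))          # 1200 observations, 20 raw variables
X[:, :3] *= 3.0                          # give the informative variables
                                         # enough variance for PCA to keep them
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("logit", LogisticRegression())]:
    pipe = make_pipeline(PCA(n_components=5), clf)
    acc = cross_val_score(pipe, X, y, cv=10).mean()
    print(f"{name}: mean 10-fold accuracy {acc:.2f}")
```

Fitting PCA inside the pipeline ensures the components are re-estimated on each training fold, avoiding leakage into the held-out fold.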
Nyman and Ormerod (2018) predicted economic crises by applying RF methods using macroeconomic and market indicators to ex post reproduce the best possible ex-ante forecasting cases. One hundred decision trees were built with the help of bagging on data collected between 1970 and 2010. Variables were lagged, where considered reasonable, thus enabling retrospective forecasting of multiple periods with the average result being considered as the prediction. The RF model was able to forecast the financial crisis at the beginning of 2009 from data collected 18 months previously and did not forecast a crisis for any period when it actually did not happen.
da Silva et al. (2019) attempted to reproduce sovereign ratings by applying machine learning procedures using data collected from 137 countries between 1958 and 2017. Following clustering and by applying PCA, the authors developed an RF model that was fine-tuned by testing crisis impacts and which achieved 98% classification accuracy.
Lucia et al. (2019) analyzed the behavior of sovereign CDS spreads between 2009 and 2013 to identify various turning points. The authors explained the time-dependent behavior of CDS spreads using real-time, country-specific macroeconomic variables and market indicators, to which the LASSO machine learning procedure was applied. It was suggested that the influence of fundamental conditions decreases significantly during a crisis breakout period, given the panic in markets, whereby certain countries are punished for their assumed or actual vulnerability.
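The variable-selection role that LASSO plays in such a setting can be sketched as follows. The "fundamentals" and the spread-generating rule are synthetic assumptions, not the authors' data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
fundamentals = rng.normal(size=(300, 10))   # 10 candidate macro/market series
# only variables 0 and 3 truly drive the synthetic 'CDS spread'
spread = (2.0 * fundamentals[:, 0] - 1.5 * fundamentals[:, 3]
          + rng.normal(scale=0.1, size=300))

# the L1 penalty shrinks irrelevant coefficients exactly to zero,
# so the surviving indices are the selected explanatory variables
lasso = Lasso(alpha=0.1).fit(fundamentals, spread)
print(np.flatnonzero(lasso.coef_))
```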
Bluwstein et al. (2020) constructed machine learning models using an extended-period (1870–2016) database of data collected from 17 countries, containing macroeconomic, financial, and market indicators. Of 2499 observations, 90 were classified as default occurrences. Target variables were defined as crisis indicators occurring in the banking sectors of the examined countries. In addition to benchmark logit and CART models, the authors applied RF, ERT, SVM, and NN methods. The ‘black box’ aspect of the machine learning methods was resolved by use of the Shapley regression method. Each model found similar variables relevant to forecasting financial crises, among which the slope of the yield curve was emphasized. The best model was found to be ERT with 87% AUROC, followed by the RF, SVM, NN, logit, and CART methods, respectively.
Alaminos et al. (2021) found that an accuracy limitation of several existing models could be due to a lack of geographic diversity. The authors used a wide global sample differentiated according to major geographical regions and attempted to use several machine learning methods to locate the best model. Separate models were built to predict sovereign debt crises and currency crises. The best sovereign debt crisis model was achieved by use of the fuzzy decision trees model (97.8% accuracy), followed by the AdaBoost model (96.1%), and the XGBoost model (94.4%). The most reliable currency crisis model was developed by use of the DNDT model (98.4% accuracy), followed by the XGBoost model (97.3%), and then by the fuzzy decision trees model (95.8%).
2.2.3. Rating-Based Approaches
Given that sovereign rating is a complex, forward-looking measure of a sovereign issuer's debt servicing capacity, it is widely used as an important characteristic expressing sovereign risk and as a basis for credit risk undertaking decisions. Rating agencies provide valuable databases for sovereign default forecasting, primarily through frequently published empirical default rate time series and through transition matrices expressing the probability of changes in sovereign rating. Various time series forecasting methods can be applied to the published historical sovereign default rates, and, starting from transition matrices, a great number of matrix function-based stochastic methods are available to forecast sovereign default, of which the Markov chain is the best-known methodological tool. This section focuses further on the Markov chain method.
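Under the homogeneous Markov chain assumption, the h-period transition matrix is the one-period matrix raised to the h-th power, so cumulative default probabilities can be read from the default column. The matrix below is illustrative, not an agency-published one:

```python
import numpy as np

# One-year transition matrix over three coarse states:
# investment grade (IG), speculative grade (SG), default (D).
# The probabilities are illustrative, not agency-published figures.
P = np.array([[0.95, 0.04, 0.01],   # IG -> IG, SG, D
              [0.10, 0.85, 0.05],   # SG -> IG, SG, D
              [0.00, 0.00, 1.00]])  # default is an absorbing state

# homogeneous chain: the 5-year matrix is P raised to the 5th power
P5 = np.linalg.matrix_power(P, 5)
print(f"5-year PD starting from IG: {P5[0, 2]:.3f}")
print(f"5-year PD starting from SG: {P5[1, 2]:.3f}")
```

The literature reviewed below largely concerns when this homogeneity assumption fails and how time-inhomogeneous or regime-dependent chains repair it.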
It is important to note that rating agencies fundamentally focus on longer-term horizons by using ‘through-the-cycle’ rating methodology. As a result, they primarily provide insight into the durable components of perceived rating changes (Altman and Rijken 2004).
Hu et al. (2002) constructed transition matrices from sovereign ratings. Recognizing that several sovereign entities with unfavorable ratings do not possess decades-long historical transition data, the authors recommended combining and supplementing the matrices with data from other actual historical default events.
Wei (2003) produced a general, multi-factor Markov chain applied to rating migrations and credit risk spreads, which was also applied to corporate and sovereign debtors. The time-dependent transition matrix was constructed with the help of latent variables representing the economic cycle and economic environment based on observed transitions between 1981 and 1998.
The application of Markov chains was also recommended by Kiefer and Larson (2004), who examined their applicability to local governmental bonds, commercial debt letters, and sovereign debts. They recommended the use of Markov chains to predict default over a maximum five-year period for local government bonds and six months for commercial debt letters. However, the study identified no such horizon limit for forecasting changes of sovereign ratings, including the quantification of migration to sovereign default. The authors highly appreciated the database scope and default definition used by S&P.
Fuertes and Kalotychou (2007a) constructed three Markov chain models using Moody’s rating changes identified in 72 countries between 1981 and 2004: a discrete time-homogeneous chain, a continuous time-homogeneous hazard chain, and a time-inhomogeneous continuous hazard chain. The bias and variance of the model estimates in the finite sample were tested via a bootstrap simulation exercise. The duration dependence and momentum characteristics of upgrading and downgrading were examined by panel logit methods, and non-Markovian processes were identified in sovereign rating changes. For countries with worse ratings, the non-homogeneous continuous Markov chain indicated the best performance.
Bhaumik and Landon-Lane (2013) examined a Moody’s sovereign rating migration database with data collected between 1996 and 2005 by using Markov chains for different country groups and different economic conditions. The homogeneity assumption was rejected, and distinct samples were created for each rating migration by using the Bayes decomposition method. Non-homogeneous Markov chains were constructed by using mobility indices to achieve promising results.
Oh et al. (2019) developed a Regime Switching Markov chain (RSMC) model in which regime states were derived from a hidden Markov model expressing the dynamics of sovereign rating transitions. The authors first demonstrated that the estimation of the RSMC is superior to the homogeneous Markov chain, and then applied the model to a monthly time series database of sovereign ratings of 41 countries between 1994 and 2018, also considering the status of the specific economic environment. Results indicated that during economic recessions, countries with worse ratings received a higher probability of downgrading.
Szetela et al. (2019) researched interrelationships between sovereign defaults by using data collected from 42 European countries between 1994 and 2013. Since traditional statistical methods failed to adequately model relationships, the authors applied the Copula method to rate sovereign financial instruments. The Markov chain was applied as a dynamic variable in order to quantify transitions among sovereign ratings. It was found to be challenging to manage low default rates and predict the probability of default for developed European countries. Eventually, the best model was achieved by the t-Copula method.