*Review* **A Comprehensive Review of Corporate Bankruptcy Prediction in Hungary**

#### **Tamás Kristóf \* and Miklós Virág**

Department of Enterprises Finances, Corvinus University of Budapest, Fővám tér 8, 1093 Budapest, Hungary; miklos.virag@uni-corvinus.hu

**\*** Correspondence: tamas.kristof@uni-corvinus.hu

Received: 12 January 2020; Accepted: 17 February 2020; Published: 19 February 2020

**Abstract:** The article provides a comprehensive review of the theoretical approaches, methodologies and empirical research of corporate bankruptcy prediction, laying emphasis on the 30-year development history of Hungarian empirical results. In ex-socialist countries corporate bankruptcy prediction became possible more than 20 years later than in western countries; however, based on its historical development after the political system change, it can be argued that it has already caught up to the level of international best practice. Throughout the development history of Hungarian bankruptcy prediction, it can be tracked how the initial bankruptcy prediction, based on small cross-sectional samples and classic methodology, has evolved into today's corporate rating systems meeting the requirements of dynamic, through-the-cycle economic capital calculation models. Contemporary methodological development is characterized by the domination of artificial intelligence, data mining, machine learning, and hybrid modelling. On the basis of empirical results, the article draws several normative proposals on how to assemble a bankruptcy prediction database and select the right classification method(s) to accomplish efficient corporate bankruptcy prediction.

**Keywords:** bankruptcy prediction; classification; credit risk modelling; corporate failure; rating systems

#### **1. Introduction**

In recent years, the increasing relevance of corporate bankruptcy prediction as a research field has also been corroborated by the fact that several comprehensive reviews have emerged in the literature with the aim of summarising the key findings of earlier published results. Central-Eastern Europe is no exception to global tendencies, see inter alia (Kliestik et al. 2018; Pavol et al. 2018; Popescu and Dragota 2018; Prusak 2018; Marek et al. 2019). Corporate bankruptcy prediction in ex-socialist countries became possible more than 20 years later than in western countries, since before the political system change no bankruptcy event in today's market economy sense existed in the centrally managed planned economies. However, based on the historical development of corporate bankruptcy prediction after the political system change, it can be argued that it has already caught up to the level of international best practice regarding the examined research problems, applied methods, and empirical results.

In Hungary the legal system had to be adjusted to the new socio-economic processes in a relatively short time after the political system change. The establishment of bankruptcy regulations had almost no dogmatic precedents, since the legal field of insolvency had been completely missing from the Hungarian legislative system for forty years. The Bankruptcy Act of 1991 qualified a company as insolvent if its debts exceeded its assets, it did not pay obligations 60 days after maturity, the foreclosure of liabilities was unsuccessful, and/or it stopped making payments. The Act has been modified several times since 1991; however, the fundamental concept of insolvency has not substantially changed. In the current Hungarian legislation system legal failure can have four forms:


Hungary can be proud of the fact that corporate bankruptcy prediction began as early as possible, and has already achieved a 30-year development history with an extensive range of results. Many of them, however, were published only in Hungarian, making them difficult for international scholars to analyse, and so far no comprehensive review has been written in an international journal to evaluate them. In our opinion the time has come to fill this gap.

The article attempts to synthesize the historical development tendencies of theoretical approaches, methodologies, and empirical research of corporate bankruptcy prediction, laying emphasis on the 30-year development history of Hungarian empirical corporate bankruptcy prediction models. Throughout the development history of Hungarian bankruptcy prediction, it can be tracked how the initial bankruptcy prediction, based on small cross-sectional samples and classic methodology, has evolved into today's corporate rating systems meeting the requirements of dynamic, through-the-cycle economic capital calculation models. Contemporary methodological development is characterized by the domination of artificial intelligence, data mining, machine learning and hybrid modelling.

The article evaluates the development of bankruptcy prediction methodology, starting from the linear statistical methods and arriving at the contemporary artificial intelligence-based machine learning procedures, and provides Hungarian empirical results for the application of each method.

The research method of completing the literature review was to collect and evaluate all theoretical, methodological and empirical publications that appeared in the field of Hungarian corporate bankruptcy prediction. Considering the fact that Hungary is a relatively small country and the research field is comparatively narrow, it has been possible to provide a review encompassing the all-inclusive set of studies. The range of studies also included the works of Hungarian researchers published abroad together with the publications of Transylvanian-Hungarian bankruptcy modellers.

In our opinion this article might serve as an instructive story for other countries in a similar position and for scholars interested in development histories and case studies of the professional field. Since it soon turned out that well-known international corporate bankruptcy models did not perform well in Hungary, emphasis was laid on original empirical model development efforts, leading to diverse experimentation with several approaches and techniques.

#### **2. Theoretical Considerations**

Corporate failure has been a research focus for social scientists for a long time. One of the fundamental questions of management and organisation sciences is why certain organisations survive, whereas others disappear (Virág et al. 2013). In recent decades a substantial number of publications have emerged in the literature in the fields of business failure, corporate survival, bankruptcy prediction, organisational mortality, financial distress, default prediction, and credit scoring. At first glance these might seem to be different things; however, they share a common aim, namely to predict the occurrence of a failure event with the help of corporate descriptive variables by applying similar methods (Kristóf and Virág 2019a).

It can be concluded that bankruptcy prediction primarily supports the empirical research of corporate survival and failure by exploring the reasons for failure, and by constantly developing the multivariate classification and forecast methodology (Kristóf and Virág 2019b). Bankruptcy prediction has made significant progress over the past 50 years.

In the economic system, the continuous inflow and outflow of economic organizations is a natural phenomenon. According to Schumpeter (1934), corporate failure is a necessary element of an effective market economy, since it enables human, physical, and financial resources to be transferred to other, more productive companies.

Organizational termination has been explained by many approaches of organisational theory (Mellahi and Wilkinson 2004). Classic industrial organisation and organisational ecology emphasise the deterministic role of the environment, and scholars in this field argue that external industrial and environmental conditions leave managers limited freedom to make decisions, which is why it is not the managers who are responsible for corporate failure. On the other side, representatives of the behaviourist, political, decision theoretic, and organisational psychologist schools pursue a voluntarist approach and blame the activities, decisions, and perceptions of managers for failure. The truth obviously lies somewhere between the deterministic and voluntarist approaches.

Two tendencies might be distinguished in the research field of organisational survival (Anheier and Moulton 1999). The greater part of studies examining organisational survival/failure has been carried out at the macro level. Besides modelling financial solvency, the relevant studies survey the dynamics of the organisational population, together with entrance to and exit from the population. The most extensive survival studies have been conducted by the representatives of the population ecology approach. A smaller part of studies examining organisational survival has been performed in the form of organisation-specific analyses, with emphasis laid on organisational efficiency and performance criteria. On the management side, different behavioural characteristics, inadequate organisational structure, information asymmetry, unfounded decisions, lack of foresight and the self-conceit effect, inter alia, might also play a role in failure (Jáki 2013a, 2013b). In the management literature organisational survival is often presented in the form of the 'rise and fall' of different companies (Kristóf 2008b).

On the basis of case studies and quantitative analyses, several theories were born to explain organisational survival (Virág et al. 2013). However, generalisations derived from empirical research did not converge into a unified theory of organisational survival; they rather remained competitive and complementary streams. Under such circumstances, theories are regarded as 'good' if they reveal organisational survival from as many aspects as possible, namely if they are simultaneously able to deal with the contingency, transaction cost, principal-agent, political, life cycle, cognitive, structural, resource-based, evolutionary and decision theoretic sides of survival, and do not aim at a groundlessly high level of abstraction. A deep analysis of the relevant organisational theoretic schools was accomplished by Kristóf (2005b). Considering the fact that the findings of organizational theoretic schools and empirical models partially arrived at contradictory results, it is not recommended to define a generic, unified theoretical-methodological framework to research organisational survival.

An interesting theoretical problem is how the elaborated mathematical-statistical bankruptcy prediction models can contribute to the economic theories explaining organisational survival and failure. According to Blaug (1980) it can be observed in many fields of economic sciences that different econometric studies arrive at contradicting conclusions, and, taking into account the available data, no best method exists on the basis of which it could be decided which conclusion harmonizes best with reality (Scott 1981). Consequently, contradicting hypotheses might be examined throughout several decades (Westgaard 2005).

Despite the fact that enormous model development efforts have uncovered a great number of recognised relationships, throughout the decades-long history of bankruptcy prediction no consensus has been reached on which explanatory variables best predict corporate failure. The exceptionally wide range of forecast methods, together with the different modelling databases from diverse countries, industries and periods, makes it remarkably challenging to hypothesise what causes corporate failure and how. The lack of theoretical background for the explanatory variables is a true obstacle to elaborating a general comprehensive theory of bankruptcy prediction. Without a generally accepted theory, it would be hasty to conclude that any empirically developed model could operate well in a different period and in a different economic environment. Accordingly, it can be argued that no bankruptcy prediction model functions independently of time, space, and economic environment (Kristóf 2008b). In this respect Hungary is a special case even among the Central-Eastern European countries: world-famous and widely applied models showed substantially worse performance there compared to their original samples and to experiences gained with them in other countries (Režňáková and Karas 2015; Altman et al. 2017). It is no wonder that country-specific bankruptcy prediction models might differ significantly from one another even when estimated using the same modelling techniques and variables (Laitinen and Suvas 2013).

The scientific predictability problem of bankruptcy forecasting is not a unique phenomenon in the field of social sciences. Predictability in social sciences has served as a basis of scientific discussions for a long time (Kristóf 2006). Until the end of the 1950s scientific theories were judged based on their ability to make predictions. Only in the 1970s did the evaluation of heuristic power supersede the predictive power. The possibility of exact bankruptcy prediction is to be rejected from the theoretical side, since in society and economy there are no universalities like the laws of nature, on the basis of which long-run generalizations could be formed; this holds only for some trivial regularities. If it were possible to exactly predict bankruptcy and similar socio-economic events, then it would in principle also be possible to list future economic events. However, if this list became known, it would surely inspire several actors to take actions that would obstruct the occurrence of the predicted event.

Hence it is impossible to give an unambiguous explanation of corporate survival/failure from the viewpoint of philosophy of science. The solution to this problem is multi-sided theory-building, the concurrent consideration of several approaches, and the simultaneous application of several forecast methods. Theory must drive empirical model development; in addition, the examination of statistical assumptions should be carried out in a theoretical context (Virág et al. 2013). To support the development of the scientific field, the results of hypothesis testing have to be fed back into theory formulation.

In accordance with the goals of the article, from this point the general term 'organisation' used in organisational theoretical approaches will be restricted to economic organisations (companies). Beyond the theoretical explanations, it is worthwhile to consider which methods might be applicable to accomplish efficient corporate bankruptcy prediction.

The use of financial ratios in corporate failure prediction is based on the assumption that the failure process is characterised by a systematic deterioration in the values of the ratios (Laitinen 1991). It can be argued that different financial predictors might be efficient in the different phases of the corporate failure process (Laitinen 1993). Accordingly, firm failure processes have become more and more important concepts, since they allow considering the behaviour of failing firms in the longer perspective, leading to the breakthrough role of dynamization approaches in bankruptcy prediction (Lukason and Laitinen 2019).

#### **3. Methodological Development in the International Literature of Corporate Bankruptcy Prediction**

Corporate bankruptcy prediction has attracted substantial attention in science for many decades. According to the research of Du Jardin (2010), throughout the historical development of bankruptcy prediction, models were published worldwide by applying more than 50 different methods and 500 variables. The article covers the most widespread methods having the greatest impact on scientific research and practical application.

From a methodological point of view, bankruptcy prediction is a binary classification problem with the aim of differentiating between solvent and insolvent groups of companies as well as possible (Virág 2004). Bankruptcy prediction is regarded as a boundary discipline between corporate finance and statistics (data mining), which attempts to predict the future solvency of companies using financial ratios as explanatory variables and applying multivariate methods (Nyitrai 2015a).

Throughout the first half of the 20th century, there were no sophisticated statistical methods and computers available to predict bankruptcy. The financial ratios of failing and non-failing companies were compared, and it was concluded that in the case of bankrupt companies the most frequently used ratios behaved worse (Fitzpatrick 1932). The first methodological breakthrough came when Durand (1941) published a univariate discriminant analysis (DA)-based credit scoring model. This method later spread worldwide with the univariate DA model of Beaver (1966).

Realising that the classification of observations using one variable does not provide a reliable result, Myers and Forgy (1963) applied multivariate regression analysis and DA to elaborate a credit rating system for banking clients. In the case of riskier clients multivariate DA showed better results, in particular compared to the earlier applied expert rating system, so more and more attention was given to the method. The breakthrough success was achieved by the world-famous multivariate DA model of Altman (1968), which was able to classify the companies in the sample with 95 percent classification accuracy. Since its first publication, the model has gone through several revisions. Despite its great number of successful applications, however, the limitations of the model have become apparent. These can be traced first to the rigorous statistical assumptions of DA, second to the application of a hard default definition as a target variable, and third to the fact that the model was developed on a relatively narrow range of companies (American stock exchange corporations), which limits its applicability to populations different from the modelling database.

Since the 1970s the development of the field has been dominated by the modernisation of mathematical-statistical classification methods and the IT solutions supporting them (Nyitrai 2015a). Bypassing the distribution and variance assumptions of DA, logistic regression (logit) has become a more and more popular bankruptcy prediction method, which was first applied by Chesser (1974) on a credit risk database. In the global spread of logit, the publication of Ohlson (1980) represented a milestone: the author developed a logit model on a sample of 105 insolvent and 2058 solvent companies, thereby reflecting that insolvent companies represent a smaller share of the population than solvent ones. The application of probit regression began in the 1980s for similar methodological reasons (Zmijewski 1984).
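For orientation, the logit model referred to above is conventionally written as follows. The notation is introduced here for illustration and is not taken from the cited studies: $x_1, \dots, x_k$ denote the financial ratios of a company and $\beta_0, \dots, \beta_k$ the estimated coefficients.

$$
P(\text{insolvent} \mid x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}}
$$

Because the output always lies strictly between 0 and 1, it can be read directly as an estimated probability of insolvency, without the normality and equal-covariance assumptions that DA imposes on the explanatory variables.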

Nonparametric methods, which rest on no underlying statistical assumptions, have also appeared in bankruptcy prediction since the 1980s. Decision trees, which are even today widespread tools to solve classification problems and to accomplish efficient data mining, were first applied to bankruptcy prediction by Frydman et al. (1985).

The 1990s brought new challenges to bankruptcy forecasting scholars and practitioners (Prusak 2005). Several critiques concerned linear or linearizable, robust models and the earlier applied methods. As a result, neural networks (NN) belonging to the family of artificial intelligence methods were given a boost to improve the reliability of bankruptcy forecast models (Kristóf 2005a). NNs were first applied to bankruptcy prediction by Odom and Sharda (1990). The authors showed that three-layer backpropagation networks outperformed the earlier methods. Since then NNs have spread widely, have gone through substantial developments, and represent one of the most popular methods of today.

In parallel with the spread of NNs, modern visual clustering procedures have been gaining a wide role in bankruptcy prediction. Self-organising maps (SOM), operating on the principle of unsupervised NNs, made it possible to cluster databases with unknown output into solvent and insolvent classes (Kiviluoto 1998). Multidimensional scaling (MDS) visualizes the hidden relationships between data by mapping them to coordinates in a low-dimensional space (Neophytou and Molinero 2004).

The bankruptcy prediction application of neuro-fuzzy systems has become an intensively researched topic since the beginning of the 2000s, providing better results compared to traditional NNs (Vlachos and Tolias 2003). In parallel, the support vector machine (SVM) procedure has also been proven to achieve higher classification accuracy than earlier applied methods, which was first published based on a sample of Australian companies using twenty-fold cross-validation (Fan and Palaniswami 2000). In addition, the methods of rough set theory (RST) (Dimitras et al. 1999), k nearest neighbour (KNN) (Ardakhani et al. 2016), Bayes-networks (Sun and Shenoy 2007), genetic algorithms (GA) (Lensberg et al. 2006), learning vector quantization (LVQ) (Neves and Vieira 2016) and case-based reasoning (CBR) (Bryant 1997) also began to spread in the 2000s.

By the 2010s ensemble methods, as a special case of method combinations, had gained significance over the individual application of certain classification methods (Marqués et al. 2012). Their essence is multiple bootstrapping and applying classification procedures on several subsamples. The classification power of the final model is the average of that of the individual models, usually outperforming the classification power achieved without ensemble methods. The most frequently applied ensemble methods are boosting, bagging, random subspace, random forest, Gaussian processes and autoencoder, belonging to the family of machine learning procedures (Nyitrai 2015a; Wang 2017). Today's bankruptcy prediction research is unambiguously dominated by machine learning, data mining, artificial intelligence and hybrid modelling through creatively combining different new methods (Barboza et al. 2017). Bankruptcy prediction as a multivariate classification problem is a very popular topic in data mining competitions aiming at finding more and more reliable and contemporary algorithms. Accordingly, a constantly widening range of innovative solutions is becoming public day by day.
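The bootstrapping-and-averaging idea behind bagging can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reconstruction of any cited model: the base classifier is a one-variable threshold rule (a decision stump), the single explanatory ratio and the toy data are invented, and the names `fit_stump`, `bagged_predict` and `train` are hypothetical.

```python
import random


def fit_stump(sample):
    """Fit a one-variable threshold rule on (ratio, label) pairs.

    Label 1 means insolvent. The rule predicts insolvency when the ratio
    is below the threshold; the threshold with the fewest training errors
    among the observed ratio values is returned.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((1 if x < threshold else 0) != y for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_predict(train, x, n_models=25, seed=0):
    """Average the votes of stumps fitted on bootstrap resamples of train."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        # Bootstrap: draw len(train) observations with replacement.
        boot = [rng.choice(train) for _ in train]
        threshold = fit_stump(boot)
        votes += 1 if x < threshold else 0
    return votes / n_models  # share of models voting "insolvent"


# Invented toy data: (equity ratio, 1 = insolvent / 0 = solvent).
train = [(0.05, 1), (0.10, 1), (0.12, 1), (0.20, 1),
         (0.35, 0), (0.40, 0), (0.55, 0), (0.60, 0)]

low_ratio_vote = bagged_predict(train, 0.02)
high_ratio_vote = bagged_predict(train, 0.90)
```

A company whose ratio lies below every observed value is voted insolvent by all resampled models, and one above every observed value by none; for values in between, the averaged vote smooths out the instability of any single stump.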

#### **4. Empirical Development of Hungarian Corporate Bankruptcy Prediction**

Under Hungarian conditions, it became possible to scientifically examine bankruptcy prediction at the beginning of the 1990s with the appearance of the Bankruptcy Act regulating the cases of legally going into bankruptcy. Over the past thirty years the Hungarian literature and practice of bankruptcy prediction have gone through a substantial improvement. Considering the various research goals and databases, however, the empirically measured differences between the classification powers of the elaborated models have to be interpreted in light of the range and definition of explanatory and target variables. The importance of the scientific field is well represented by the fact that so far fourteen PhD theses in Hungary have dealt with the theoretical backgrounds, methodological challenges and/or the practical application of corporate bankruptcy prediction (Virág 1993; Arutyunjan 2002; Kiss 2003; Imre 2008; Kristóf 2008b; Oravecz 2009; Kotormán 2009; Felföldi-Szűcs 2011; Hámori 2014; Madar 2014; Nyitrai 2015a; Bozsik 2016; Fejér-Király 2016; Koroseczné Pavlin 2016). The year 2016 was particularly strong, with three PhD theses published.

#### *4.1. The Era of Classic DA and Logit Models*

The first Hungarian corporate bankruptcy prediction study was elaborated in 1989 by Péter Futó, who worked in the Industrial Economic Institution. The research applied variance analysis (VA) and simplified DA to annual report data of Hungarian industrial companies from 1986–1987 and to the occurrence of insolvency events in 1988. An insolvency event was defined as the company being unable to pay its obligations in at least two months during the first six months of 1988. The study was not published; its results were interpreted later by Virág and Hajdu (1998). Empirical results revealed that under Hungarian circumstances it became possible to examine which financial ratios might be extensively applied to predict bankruptcy.

The first published Hungarian bankruptcy models were elaborated by Miklós Virág after a 10-month study trip in the United States, using annual report data from 1990 and 1991 and applying DA and logit (Virág 1993). The author applied 15 financial ratios. Of the 154 manufacturing companies involved in the research, 77 were solvent and 77 became insolvent in 1992 (in line with the novel Bankruptcy Act, insolvent companies had to declare bankruptcy against themselves). The four-variate DA model achieved 78 percent and the five-variate logit model 82 percent classification accuracy (Virág 1996).

Virág and Hajdu (1996) created an early warning bankruptcy model family indicating bankruptcy dangers for different sectors and branches of the economy using DA, based on the financial data of 10,000 economic units. Altogether 41 bankruptcy models were built: one for the total economy, ten for the national economic sectors, and thirty for the branches. The accuracy of the 1996 bankruptcy model family covering national economic sectors and branches was well above that of the earlier models owing to the more detailed breakdown by range of activities; all of the models had more than 90 percent classification accuracy. The authors drew the conclusion, for the first time in Hungary, that in financial classification it was reasonable to examine how the financial situation of a company compared to that of companies operating in the same industry, and whether or not those companies became bankrupt (Hajdu and Virág 2001).

Hámori (2001) transformed the financial ratios for his logit model in such a way that they could be evaluated monotonically. The author defined certain limits for the value range of the ratios, and he replaced outlier data with predefined theoretical maximum values. To avoid multicollinearity, he created four factors from the ratios. The sample consisted of 685 solvent and 72 insolvent companies. The classification accuracy of the four-factor model was 95 percent.

Arutyunjan (2002) tested the applicability of foreign DA models on Hungarian agricultural firms. All in all, the author did not regard foreign models as reliable on the database and developed a new logit model instead, achieving 92 percent classification accuracy.

Virág and Dóbé (2005) examined the solvency of national economic sectors applying the earlier elaborated bankruptcy model family. Input variables were the sector-level aggregated ratios, taking into account 30 national economic sectors and 15 financial ratios. The authors defined the average ratio values as units of analysis (centroids). It was concluded that the average picture of the majority of sectors resembled their own surviving companies more closely than the bankrupt ones.

#### *4.2. The Era of NNs and Basel II*

Kiss (2003) approached the problem from the viewpoint of credit score modelling, defining a mutual comprehensive development framework between bankruptcy prediction and credit scoring. The results of his PhD thesis were the hierarchical ordering of statistical methods, in addition to the elaboration of the organisational, IT and decision support framework of scoring systems.

Using the database of the first Hungarian bankruptcy model, Virág and Kristóf (2005a) developed NN-based models. After experimenting with different structures, a four-layer backpropagation network showed the best result, outperforming the DA model by 9 percentage points and the logit model by 5 percentage points (Virág and Kristóf 2005b). The authors later performed more complex empirical research on the same database, comparing the performance of four classification procedures using the industrial mean relative ratios, and again found that NNs outperformed the traditional methods (Virág and Kristóf 2006).

Because the Hungarian introduction of Basel II was approaching, the Supervisory Authority of Financial Institutions launched a tender in 2006 to elaborate databases supporting the application of risk management methods in financial institutions. The winning study (Info-Datax 2006) first attempted to explore the problems of statistical methods applied to probability of default (PD) prediction from the methodological side, and then used principal component analysis (PCA) for data reduction. Within the framework of empirical research, the authors compared the performance of DA, logit and decision trees on a sample of 1500 companies. All three models showed 87–88 percent classification accuracy.

Certain methodological reviews of bankruptcy prediction were provided by Halas (2004); Szabadosné Németh and Dávid (2005); Oravecz (2007); Szűcs (2014); Ratting (2015); Reizinger-Ducsai (2016); however, the authors did not carry out empirical model development of their own. The applicability of earlier published international models was examined on small samples, with more or less success, by Kotormán (2009) on agricultural enterprises, Rózsa (2014) on dairy firms, Pető and Rózsa (2015) on meat processing companies, Dorgai et al. (2016) on commercial enterprises and Fenyves et al. (2016) on hotels. Small-sample model development was performed by Sütő (2018) and Ékes and Koloszár (2014).

Organisational theoretic approaches explaining corporate survival, the theoretical, methodological and practical problems of bankruptcy prediction, and the best-practice application of corporate failure models were brought together by Kristóf (2008b). Considering the industrial mean corrected variables, and comparing the results of models built with and without PCA, the NN models showed the best result overall with an area under the receiver operating characteristic curve (AUROC) of 84 percent, followed by the logit model developed on the principal components (83 percent) and then by decision trees developed using the original variables (81 percent). In addition, MDS and SOM were applied for the first time in Hungary for bankruptcy modelling purposes in the same study, proving the clustering and variable selection capabilities of the two methods.
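AUROC can be computed from any set of model scores and observed outcomes. The sketch below is a minimal illustration, not the procedure used in the cited study: it computes AUROC as the share of (insolvent, solvent) pairs in which the insolvent company receives the higher risk score, counting ties as one half. The function name `auroc` and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of insolvent and solvent companies.

    scores: predicted risk scores (higher = more likely insolvent)
    labels: 1 for insolvent, 0 for solvent
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # insolvent company ranked as riskier
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Invented toy example: three insolvent and three solvent companies.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auroc = auroc(scores, labels)  # 8 of 9 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why AUROC, unlike plain classification accuracy, does not depend on the chosen cut-off value.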

Meanwhile, bankruptcy prediction in Transylvania also attempted to catch up to international best practice. The first Transylvanian-Hungarian bankruptcy prediction models were developed by Benyovszki and Kibédi (2008) on a sample of 129 companies from Baia Mare using logit and probit, achieving 81 percent classification accuracy with both models. The most comprehensive theoretical, methodological and empirical research was carried out in Szeklerland in the 2010s, when different logit and NN-based models were developed on a sample of companies from Harghita County (Fejér-Király 2015, 2016, 2017). Based on the empirical findings it can be concluded that the behaviour of Harghita County companies differs from Hungarian experiences, since no size variable became significant in Transylvania, whereas turnover ratios showed real added value, in contrast to earlier experiences in Hungary. In addition, it was proven that applying PCA and including macroeconomic variables provided better models.

Felföldi-Szűcs (2015) researched the predictability of buyers' non-performance arising from the granting of commercial loans on a sample of 905 Hungarian small and medium enterprises (SMEs). The target variable was the buyers' 90-days-past-due event. In line with banking credit risk models, the author proved by logit that behavioural, non-financial variables contributed to better discriminatory power compared to models developed using the traditional financial ratios (Felföldi-Szűcs 2011). It was an important finding in Hungary, and corroborated the results gained in other European countries, especially for SMEs (see i.a. Lukason and Andresson 2019).

#### *4.3. The Challenges of Data Transformations and Method Combinations*

Besides the substantial number of comparative analytical bankruptcy prediction studies, increasing emphasis was placed on the importance of data preparation and data transformation procedures (Kristóf 2008a). Hámori (2014) drew attention to the detection and handling of different data preparation anomalies (missing values, outliers, division by zero, division of a negative by a negative, zero divided by zero, etc.), and provided handbook-like methodological guidance and case studies to resolve these problems.

The representativeness of the modelling sample and the problem of sampling bias were researched in depth by Oravecz (2009). Her PhD thesis defined missing data handling techniques and elaborated reject inference methods applicable in credit score modelling to manage sampling bias. Using logit on a sample of 2279 observations, the author demonstrated that stronger sampling bias led to weaker model performance.

In a small-sample empirical study, Virág and Kristóf (2009) projected the dissimilarities between solvent and insolvent observations into the coordinates of a lower-dimensional space by applying MDS, and developed a logit model on the reduced-dimension coordinates that achieved outstanding classification accuracy.

The impact of relating stock-type balance sheet items to flow-type profit-and-loss statement items on the performance of bankruptcy prediction models was researched in depth by Nyitrai (2017). The effects of different outlier handling approaches on model performance were examined by Nyitrai and Virág (2019). It was concluded that categorisation by Chi-square automatic interaction detection (CHAID) decision trees handled outliers more effectively than capping values at extreme percentiles or at the mean ± a given number of standard deviations.
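The two benchmark outlier treatments mentioned above can be illustrated as follows. This is only a sketch under stated assumptions: the function names, the 5th/95th percentile limits, the two-standard-deviation band and the toy ratio values are all hypothetical choices for illustration and are not taken from Nyitrai and Virág (2019). The CHAID-based categorisation that the study found more effective is not reproduced here.

```python
from statistics import mean, pstdev


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q between 0 and 100) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * weight


def cap_by_percentiles(values, lower_q=5.0, upper_q=95.0):
    """Replace values outside the chosen percentiles with the percentile limits."""
    ordered = sorted(values)
    low, high = percentile(ordered, lower_q), percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


def cap_by_std(values, k=2.0):
    """Replace values outside mean +/- k standard deviations with the band limits."""
    centre, spread = mean(values), pstdev(values)
    low, high = centre - k * spread, centre + k * spread
    return [min(max(v, low), high) for v in values]


# Invented liquidity ratios with one extreme value at each end.
ratios = [0.8, 1.1, 1.3, 0.9, 1.0, 1.2, 25.0, -14.0, 1.4, 0.7, 1.05]
capped_pct = cap_by_percentiles(ratios)
capped_std = cap_by_std(ratios)
```

In this toy example the standard deviation is itself inflated by the extreme values, so the band-based rule caps less aggressively than the percentile rule, which hints at why the choice of outlier treatment can affect model performance.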

Examining further the favourable impact of decision trees on model performance, it was demonstrated by Kristóf and Virág (2012) on a sample of 504 Hungarian companies that the performance of logit and NN models can be further improved by applying variables discretized by CHAID decision trees compared to the application of original variables. However, PCA did not provide added value to the classification power of the models.

The efficiency of combining decision trees and NNs was demonstrated by Bozsik (2011). The author assigned single-layer perceptron networks to the nodes of C4.5 decision trees on a sample of 250 companies using 17 variables. Both the brute-force and the fine-tuned slim models achieved 84 percent classification accuracy.

The impacts of company size and industry on bankruptcy models were examined by Nyitrai (2018), using the sample of annual report data from 2007–2015 of 2614 Hungarian enterprises. On the basis of the developed logit models it was proven that both company size and industry influence the design and performance of bankruptcy models.

#### *4.4. Dynamization and Through-the-Cycle Modeling*

In line with the through-the-cycle modelling requirements of the Basel Capital Accord, Imre (2008) was the first in Hungary to apply time-series input variables, covering 2000 companies from 2002–2006 and supplementing the customary variables with company form, county and industry. The target variable of the decision tree, logit and NN models was the occurrence of a 90-days-past-due event. In the static approach (without dynamizing the variables), the AUROC on the testing sample was 90 percent for the logit and NN models, in contrast to 83 percent for decision trees. By applying dynamized variables expressing change over time, however, the performance of NN improved to 92 percent, that of logit to 91 percent, and that of decision trees to 84 percent. This showed for the first time in Hungary that the application of dynamized variables had a positive impact on the classification power of bankruptcy prediction models.

Insolvency prediction of Trans-Danubian companies with revenue of 10–250 million HUF was researched by Bareith et al. (2014) applying NNs with one hidden layer each. Because of the impact of the financial crisis, the database was partitioned into two economic cycles: 2002–2008 and 2009–2012. In both periods the financial ratios of three historical years were considered, and non-relevant variables were filtered out by relative importance (RI) based variable selection. Several dynamic variables were included in both periods. The model developed on the 2002–2008 period achieved 85 percent classification accuracy, compared to 79 percent for the model developed on data from the 2009–2012 period. Two years later the authors performed similar empirical research on companies from Csongrád County without partitioning the data collection period, and achieved an even better-performing neural network model (Bareith et al. 2016). The financial ratios of liquidated small enterprises in the years before liquidation were examined in depth by Koroseczné Pavlin (2016), yielding notable empirical results in the field.

In line with the Basel requirements, Madar (2014) elaborated a corporate rating system applying logit, which was suitable to estimate long-term PD and economic capital, using the database of a credit institution portfolio from 2007–2012 containing 78,516 observations. The author converted the original variables with the help of weight-of-evidence (WOE) transformation. The target variable was the censored default rate. The study revealed the importance of model stability and PD calibration, and proposed techniques to resolve the problems, considering the fact that in crisis periods substantially higher PD can be measured compared to the periods of economic growth.
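The weight-of-evidence transformation mentioned above replaces each category (or bin) of a variable with the logarithm of the ratio between its share of non-defaulted and its share of defaulted observations. The sketch below assumes this common sign convention (positive values indicate lower risk) and adds a small smoothing constant so that empty cells do not produce infinite values; the function name, the smoothing value and the toy data are illustrative assumptions and do not reproduce the implementation of Madar (2014).

```python
import math
from collections import defaultdict


def weight_of_evidence(bins, defaults, smoothing=0.5):
    """Return {bin: WOE}, where WOE = ln(share of non-defaults / share of defaults)."""
    good = defaultdict(float)  # non-defaulted observations per bin
    bad = defaultdict(float)   # defaulted observations per bin
    for b, d in zip(bins, defaults):
        if d == 1:
            bad[b] += 1.0
        else:
            good[b] += 1.0
    categories = sorted(set(good) | set(bad))
    # Additive smoothing (an assumption of this sketch) keeps the logarithm finite.
    total_good = sum(good[c] for c in categories) + smoothing * len(categories)
    total_bad = sum(bad[c] for c in categories) + smoothing * len(categories)
    woe = {}
    for c in categories:
        good_share = (good[c] + smoothing) / total_good
        bad_share = (bad[c] + smoothing) / total_bad
        woe[c] = math.log(good_share / bad_share)
    return woe


# Invented example: leverage categories and default flags (1 = default).
leverage_bin = ["low"] * 4 + ["mid"] * 4 + ["high"] * 4
default_flag = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
woe_by_bin = weight_of_evidence(leverage_bin, default_flag)
```

Because the transformed values are on a log-odds scale, they enter a logit model linearly, which is one reason the technique is popular in rating system development.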

In the field of dynamic modelling, Bauer and Endrész (2016) published an outstanding study that used a very long historical database covering 1996–2014. Combining micro and macro variables, the authors developed a probit model for the population of Hungarian double-entry bookkeeping companies, specifying legal failure as the target variable and handling heterogeneity by company size. The AUROC of the model was 86 percent.

For central bank and credit institution sector research purposes, Banai et al. (2016) developed PD models for the total population of credited Hungarian SMEs by linking the database of the Central Credit Information (CCI) and financial report data, supplemented with macroeconomic variables. The data covered the period 2007–2014, and the target variable was the non-performing event derived from delinquent loan payment. Dynamic logit models were segmented by company size. Certain variables were categorized, lagged or discretized. The AUROC was 75 percent for the micro enterprise model, 79 percent for the small enterprise model and 84 percent for the medium-sized enterprise model. Model performance was less favourable than that of the previously developed model using legal failure as the target variable, since the non-performing event of CCI (60 days past due) is a significantly softer criterion than legal failure.

Similar research was carried out by Nyitrai and Virág (2017b) on time-series financial ratios of 1542 Hungarian companies from the period 2001–2014. Logit was applied with ten-fold cross-validation. Variables were retrospectively dynamized for all historical periods using the formula published earlier by Nyitrai (2014). AUROC tended to improve as more historical years were considered in model development. It was concluded that for companies younger than 10 years it was worthwhile to use as many years as available, whereas for companies older than 10 years using the last 10 years resulted in the best model performance. The same authors performed similar empirical research on a different sample of 1354 companies, which corroborated the findings (Nyitrai and Virág 2017a). The findings were also consistent with earlier modelling research performed with three different decision trees on a sample of 1082 enterprises (Virág and Nyitrai 2015).

The positive impact of dynamization on predictive power was again demonstrated by Nyitrai (2019b) in a recent Hungarian empirical study. Trends of financial ratios were expressed by indicator variables, and the minimum and maximum values of previous periods were represented as benchmarks in the models. Applying ten-fold cross-validation, the developed DA, logit and decision tree models showed that dynamized variables improved classification accuracy compared to models developed from the original static variables. In addition, it was demonstrated by Nyitrai (2019a) that creating categorical variables from the number of nodes of CHAID decision trees coming from subsequent years yielded better predictive power than using the original data as input variables.
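To make the idea of dynamized variables more concrete, the sketch below derives two features from a company's ratio history: a trend indicator and the position of the latest value relative to the minimum and maximum of earlier periods. This is only one plausible reading of the description above; the exact formulas of Nyitrai (2014, 2019b) are not reproduced in this review, so the function name, the feature definitions and the toy data should all be treated as assumptions.

```python
def dynamize(history):
    """Derive two illustrative dynamic features from one company's ratio history.

    history: chronological list of yearly ratio values, oldest first.
    Returns (trend_up, relative_position):
      trend_up          -- 1 if the latest value exceeds the previous year's, else 0
      relative_position -- latest value scaled by the min-max range of earlier years;
                           below 0 means a new low, above 1 a new high
    """
    if len(history) < 2:
        raise ValueError("at least two periods are needed")
    *previous, latest = history
    trend_up = 1 if latest > previous[-1] else 0
    low, high = min(previous), max(previous)
    if high == low:
        # No spread in the earlier years: 0.5 is an arbitrary neutral value.
        relative_position = 0.5
    else:
        relative_position = (latest - low) / (high - low)
    return trend_up, relative_position


# Invented return-on-assets history of a deteriorating company.
roa_history = [0.10, 0.14, 0.08, 0.05]
trend_flag, position = dynamize(roa_history)  # falling trend, new historical low
```

Features of this kind let a static classifier such as logit see whether a ratio is unusually weak by the company's own historical standards, rather than only by cross-sectional comparison.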

To meet the requirements of the IFRS 9 international accounting standard for financial instruments, it became necessary to extend the one-year horizon of the failure event to the long term. Looking forward over the term of financial instruments, Kristóf and Virág (2017) and Kristóf (2018b) developed 20-year PD forecast models for Hungarian companies applying continuous, non-homogeneous Markov chains.
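The logic of deriving a multi-year PD curve from a Markov chain can be sketched in a simplified form. The example below uses a discrete-time chain with year-specific transition matrices and an absorbing default state, which is a deliberate simplification of the continuous-time, non-homogeneous models of the cited studies. The three rating states, the transition probabilities and the function names are invented for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cumulative_pd(yearly_matrices, start_state, default_state):
    """Cumulative default probability after each year for a given starting state.

    yearly_matrices: one transition matrix per year (non-homogeneous chain);
    the default state is assumed to be absorbing in every matrix.
    """
    n = len(yearly_matrices[0])
    product = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    curve = []
    for matrix in yearly_matrices:
        product = mat_mul(product, matrix)
        curve.append(product[start_state][default_state])
    return curve


# Invented matrices for states 0 = good rating, 1 = weak rating, 2 = default.
year_1 = [[0.90, 0.08, 0.02], [0.10, 0.80, 0.10], [0.0, 0.0, 1.0]]
year_2 = [[0.85, 0.10, 0.05], [0.10, 0.75, 0.15], [0.0, 0.0, 1.0]]
pd_curve = cumulative_pd([year_1, year_2], start_state=0, default_state=2)
```

Letting the matrices differ by year is what makes the chain non-homogeneous; in an IFRS 9 setting such differences would typically reflect forward-looking macroeconomic expectations.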

#### *4.5. Machine Learning and Data Mining*

SVM was applied to a Hungarian corporate database for the first time by Virág and Nyitrai (2013), using the sample of the first bankruptcy model. Using different kernel functions, the SVM model classified the observations 5 percentage points better overall than the best benchmark NN model.

As part of experimenting with machine learning procedures on Hungarian companies, Virág and Nyitrai (2014a) applied the RST method to the database of the first Hungarian bankruptcy model. In addition, the authors examined whether it was worthwhile to give up model interpretability to achieve higher classification accuracy. The results showed that RST, which generates easily interpretable 'if-then' rules, provided results similar to SVM; accordingly, no trade-off between interpretability and performance was necessary.

Virág and Nyitrai (2014b) compared the performance of the two most frequently applied ensemble methods (AdaBoost and bagging) with C4.5 decision trees, using a sample of 976 Hungarian companies with financial report data for the period 2001–2012. The performance of the model built on the original financial ratios was compared to that of the model built on industry-mean-corrected ratios and to that of the model built on dynamized ratios. To avoid sampling problems, hundred-fold cross-validation was applied. The best result was achieved by bagging, which outperformed AdaBoost by 1 percentage point and the standalone C4.5 by 6 percentage points. The empirical results again confirmed the favourable impact of dynamized variables on model performance; however, industry-mean-corrected ratios did not contribute to improvement.

KNN was first applied to Hungarian bankruptcy prediction by Nyitrai (2015b). The study examined the classification accuracy of different models developed on a balanced sample of 1000 observations using different k values, distance definitions, and variables taken 1, 2 and 3 years before bankruptcy as well as variables derived from multiple periods. The best result was achieved by the model using the multi-period variables (80 percent), followed by the model using variables from 1 year before bankruptcy (77 percent). The results also revealed that certain financial ratios give early warning of potential bankruptcy in the short run, whereas others do so in the long run. The author performed empirical research in the same year using CHAID decision trees and arrived at similar conclusions (Nyitrai 2015a).
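The KNN principle, including the role of the k value and the distance definition examined in the study, can be shown in a few lines. This is a minimal sketch assuming Euclidean distance and majority voting; the two features (a debt ratio and a liquidity ratio), the toy companies and the function name are hypothetical, and in practice the variables would normally be standardised before distances are computed.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Distances are Euclidean; an odd k avoids tied votes for two classes.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Invented companies: (debt ratio, liquidity ratio); label 1 = bankrupt.
train_x = [(0.10, 2.0), (0.20, 1.8), (0.15, 2.2), (0.90, 0.6), (0.80, 0.5), (0.95, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]

distressed = knn_predict(train_x, train_y, (0.85, 0.55), k=3)
healthy = knn_predict(train_x, train_y, (0.12, 2.10), k=3)
```

Because the prediction depends entirely on which observations count as "near", the choice of k, the distance measure and the input variables can each change the result, which is what the cited study varied systematically.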

CBR, a method related to KNN, was applied to Hungarian bankruptcy prediction by Kristóf (2018a) on a sample of 1828 micro-enterprises. To make the input variables orthogonal to one another, the study applied PCA. The nearest neighbours were determined by the reduced dimensionality tree (RDT) method. Although the classification accuracy of the CBR model exceeded that of the decision trees and was similar to that of logit, it remained below that of the benchmark NN model.

After carrying out the proper data preparation steps on a balanced sample of 1534 Hungarian small enterprises, Boda et al. (2016) applied the component-based object comparison for objectivity (COCO) proximity analysis with different step functions and the WizWhy data mining procedure with different layers and rule systems, with logit and NN as benchmark models. COCO, logit and NN each provided 80 percent classification accuracy; however, the WizWhy model, optimised with different logics and hybrid rule systems built on partial results already obtained, achieved 92 percent classification accuracy.

Recognising the opportunities of flexible and adaptive artificial intelligence modelling, Bozsik (2016) developed several hybrid artificial intelligence-based bankruptcy models combining the advantages of different methods. In this innovative study, the fuzzy system combined with SVM (FSVM) using a Gaussian kernel function showed exceptional classification accuracy (93 percent). Another remarkable hybrid model was the five-layer adaptive neuro-fuzzy inference system (ANFIS) developed with Gaussian membership functions, which achieved 84 percent classification accuracy.

Boros (2018) experimented with several machine learning algorithms, examining their impact on credit risk models using a sample of 10,000 companies. After variable selection, PCA and WOE categorisation, the NN model with an AUROC of 82 percent performed better than the SVM model (81 percent), followed by stochastic gradient boosting (76 percent). Initial models developed using variables without categorisation showed significantly worse performance.

#### *4.6. Summary of Hungarian Bankruptcy Models*

Evaluating the most important features and results of Hungarian corporate bankruptcy prediction, it can be argued that the country can be proud of its rich set of empirical models and methodological development throughout the analysed period. Table 1 provides a systematic summary of the studies in chronological order, giving a comprehensive picture of how development took place over time.


**Table 1.** Summary of Hungarian empirical corporate bankruptcy models.



<sup>1</sup> According to classification matrix or AUROC (see the applied model performance indicator in the body text of the article for each model). The best model performance is presented, if more than one model was developed.

#### **5. Conclusions**

Following the comprehensive review of the relevant literature and the complete analysis of 30 years of Hungarian empirical results, the following normative proposals can be drawn for researchers and practitioners working in the field of corporate bankruptcy prediction.


event derived from delinquent payment; represent a substantially lower criterion compared to legal failure. In addition, if a large modelling sample is available, it is worthwhile to develop separate models for segments and/or industries.

• With regard to model development methodology, the two most widespread techniques today are logit and NN-based bankruptcy modelling, followed by decision trees. Considering that artificial intelligence and data mining-based methodologies are constantly emerging, it is recommended that the classification power of these three most frequently applied methods be compared, at least as benchmark models, to the performance of any new model. The development of innovative hybrid models is explicitly encouraged, since they successfully combine the advantages of certain methods with those of others, thereby contributing to better model performance. In addition, it has to be recognised that traditional bankruptcy prediction methods that impose rigorous mathematical-statistical criteria (DA) may well raise model performance problems, which is a substantial argument against them despite the interpretability for which they have been valued in recent decades.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Boda, Dániel, Martin Luptak, László Pitlik, Gábor Szűcs, and István Takács. 2016. Prediction of insolvency of Hungarian micro enterprises. In *Proceedings of the ENTRENOVA–ENTerprise Research InNOVAtion Conference, Rovinj, Croatia, September 8–9*. Zagreb: IRENET–Society for Advancing Innovation and Research in Economy, pp. 352–59.

Boros, Bence. 2018. *Artificial Intelligence and Automation in Credit Scoring*. Budapest: KPMG Tanácsadó Kft.


Kristóf, Tamás. 2018a. A case-based reasoning alkalmazása a hazai mikrovállalkozások csődelőrejelzésére. *Statisztikai Szemle* 96: 1109–28.


Ratting, Anita. 2015. Fizetésképtelenség-előrejelzési megközelítések. *Társadalom és Gazdaság* 7: 53–73.

Reizinger-Ducsai, Anita. 2016. Bankruptcy prediction and financial statements. The reliability of a financial statement for the purpose of modeling. *Prace Naukowe Uniwersytetu Ekonomicznego We Wrocławiu* 441: 202–13.


Schumpeter, Joseph. 1934. *The Theory of Economic Development*. Cambridge: Harvard Business Press.


Szabadosné Németh, Zsuzsanna, and László Dávid. 2005. A kis- és középvállalati szegmens mulasztási valószínűségének előrejelzése magyarországi környezetben. *Hitelintézeti Szemle* 4: 39–58.


Virág, Miklós. 2004. A csődmodellek jellegzetességei és története. *Vezetéstudomány* 35: 24–32.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
