*Article* **An Ensemble Classifier-Based Scoring Model for Predicting Bankruptcy of Polish Companies in the Podkarpackie Voivodeship**

### **Tomasz Pisula**

Department of Quantitative Methods, Faculty of Management, Rzeszow University of Technology, al. Powstancow W-wy 10, 35-959 Rzeszow, Poland; tpisula@prz.edu.pl

Received: 21 December 2019; Accepted: 15 February 2020; Published: 19 February 2020

**Abstract:** This publication presents the methodological aspects of designing a scoring model for the early prediction of bankruptcy using ensemble classifiers. The main goal of the research was to develop a scoring model (with good classification properties) that can be applied in practice to assess the risk of bankruptcy of enterprises in various sectors. For a data sample of 1739 Polish businesses (of which 865 were bankrupt and 875 had no risk of bankruptcy), a genetic algorithm was applied to select the optimum set of 19 bankruptcy indicators, on the basis of which the classification accuracy of a number of ensemble classifier model variants (boosting, bagging and stacking) was estimated and verified. The classification effectiveness of the ensemble models was compared with that of eight classical individual models based on single classifiers. A GBM-based ensemble classifier model offering superior classification capabilities was used to design a scoring model, which was applied in a comparative evaluation and bankruptcy risk analysis of businesses of different sizes from various sectors in the Podkarpackie Voivodeship in 2018 (over a time horizon of up to two years). The approach can also be used to assess credit risk for corporate borrowers.

**Keywords:** bankruptcy prediction; ensemble classifiers; boosting; bagging; stacking; scoring models

#### **1. Introduction**

According to statistical data from 2018–2019, 30–60 businesses in Poland declare bankruptcy each month. Business bankruptcy is invariably an adverse phenomenon for the business itself and its employees, but it is also a problem for its creditors, banks and partners. A high number of bankruptcies may also have negative consequences locally, for the economic development and economic circumstances of the region, and on the national scale, for the economy of the whole country. For this reason, the early prediction of business bankruptcy, i.e., the ability to forecast the risk of business bankruptcy over a long time horizon (even up to several years), is a very important financial and economic problem. In its financial and economic dimension, bankruptcy (i.e., business default) is defined as a situation in which a business is unable (for various reasons) to meet its liabilities towards creditors. For businesses operating under market economy conditions, a potential risk of bankruptcy always exists. This risk is most commonly defined as the probability of defaulting on liabilities incurred (probability of default, PD). Modeling bankruptcy risk is also of enormous importance for institutions granting corporate loans, for which the bankruptcy of a corporate debtor means a potential loss of the loan granted.

The main objective of this study was to design a scoring model based on ensemble classifiers that could be used to forecast the risk of bankruptcy for Polish businesses operating in the Podkarpackie Voivodeship over a time horizon of up to two years. One reason for applying such a model to companies from the Podkarpackie region is that, just after Poland's political transformation from socialism to a market economy, the Podkarpackie Voivodeship (along with several other Polish regions) lagged notably behind in development. It belonged to the group of eastern regions (voivodeships) forming the so-called "eastern wall", which was overlooked and underestimated in the policies pursued by successive governments. The choice of the region was also influenced by the fact that the Podkarpackie Voivodeship is currently one of the "development tigers" in Poland and is catching up quickly, mainly owing to the more effective policies of the current government aimed at equalizing the development opportunities of Polish regions. The Podkarpackie Voivodeship is not large in relation to other regions of Poland: with an area of 17,846 km², it ranks 11th among all 16 voivodeships (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). The attractiveness of the voivodeship, however, is enhanced by its geographical location, which is conducive to the development of ecological agriculture and tourism (including international tourism in the Bieszczady Mountains). Another major advantage of the region is its border location: it borders Ukraine and Slovakia (the latter also an EU member).
In terms of population, the Podkarpackie region is one of the medium-populated regions of Poland, ranking 8th with approximately 2.1 million inhabitants (source: Główny Urząd Statystyczny (2019) (Statistics Poland), Local Data Bank, https://bdl.stat.gov.pl). Compared with the more industrialized regions of Poland, the voivodeship also lacks highly developed industries. Nevertheless, the Podkarpackie Voivodeship is one of the fastest-developing regions of Poland. In terms of income per capita, it ranked 2nd among the 16 Polish voivodeships in 2018, with revenues at the level of PLN 562.4 per inhabitant (source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl). Also in 2018, the Podkarpackie region was the most dynamically developing region of Poland in terms of GDP growth, recording a 7.8% increase in GDP compared with the previous year. In 2018, the GDP generated in the Podkarpackie region already constituted 3.9% of Poland's GDP, ranking 9th among the regions (ranking source: Statistics Poland, Local Data Bank, https://bdl.stat.gov.pl/BDL/start). This indicates that the region's economy is already very dynamic and is still progressing. The economy of the Podkarpackie region stands out positively, and its potential is strongly influenced by the cluster of aviation industry enterprises belonging to the so-called Aviation Valley, the dynamic development of road and transport infrastructure (e.g., the route of the international European North-South communication line Via Carpatia) and the development of innovation (innovative technologies) in the region.
The companies that drive development in the region belong to Stowarzyszenie Dolina Lotnicza (2019) (Aviation Valley Association), which includes many aviation industry companies providing services to major aviation manufacturers around the world (e.g., Boeing and Airbus; source: http://www.dolinalotnicza.pl/en/business-card). These include companies such as 3M Poland, 3D Robot, Boeing Distribution Services, Pratt & Whitney Poland, Collins Aerospace, Goodrich Aerospace Poland, General Electric Company Poland, Hamilton Sundstrand Poland, Heli-One, Safran Transmission Systems Poland and MTU Aero Engines Poland. The very dynamic development of economic potential in the Podkarpackie region also has a positive effect on the quality of life of its inhabitants, and the Podkarpackie Voivodeship has ranked high in quality-of-life rankings for several years. All these factors justify a comprehensive analysis and assessment of the bankruptcy risk of enterprises operating in the Podkarpackie region using the most effective models for forecasting and assessing that risk. The work therefore first focused on developing an adequate scoring model for bankruptcy forecasting using ensemble classifiers and on analyzing and verifying its prognostic capacity (classification efficiency), and only then on using it in practice to comprehensively assess the bankruptcy risk of enterprises from the Podkarpackie region belonging to various sectors of the economy (according to the declared classification of their activities) and of different sizes.

The article details the stages in which the scoring model was designed and implemented in practice. In the design stage, the predictive capability of the ensemble models used in this study was compared with that of conventional single classifiers. Previous work by many authors (see, e.g., Anwar et al. 2014; Barboza et al. 2017; Tsai et al. 2014) indicates that models based on ensemble classifiers achieve more accurate results and improve the discriminant capability of the model. Using the scoring model designed, a bankruptcy risk assessment was then carried out for businesses from the Podkarpackie Voivodeship according to their sector of activity and size.

The main novelty of the research presented in this article is the practical use of a scoring model based on ensemble classifiers for a comprehensive analysis of the bankruptcy risk of companies (including companies from different sectors) operating in the Podkarpackie region, which previous studies by other authors have not discussed.

#### **2. Literature Review**

The various problems of business bankruptcy are widely described in the literature, and the significance of the issue has motivated many authors to focus on it in their research. Modeling business bankruptcy and forecasting its likelihood began to attract wide attention in the economic and financial literature in the late 1960s; one of the first and best-known studies on bankruptcy risk modeling was published by Altman (1968). Early bankruptcy prediction studies applied statistical methods, mainly different variants of discriminant analysis or logistic regression (Ohlson 1980; Begley et al. 1996). Since these models had significant limitations, artificial intelligence and machine learning methods that had been successfully applied in image recognition tasks were gradually implemented in bankruptcy forecasting as well. Machine learning techniques such as neural networks (NNet), support vector machines (SVM) and ensemble classifier methods were found to have better forecasting capabilities and higher classification effectiveness than conventional approaches. Overviews of previous research on the application of statistical methods and machine learning techniques in business bankruptcy prediction can be found in, e.g., Kumar and Ravi (2007) and Lessmann et al. (2015). Alaka et al. (2018) presented a comprehensive literature review and a systematic classification of the predictive models used in business bankruptcy forecasting, covering the purpose of the research, the method of selecting variables for the model, the size of the sample of analyzed businesses (including bankrupt ones) and a comparison of the models' classification effectiveness measures.

Some works address forecasting and assessing the bankruptcy risk of enterprises while taking into account the specific features of their sector of activity. Rajin et al. (2016) conducted a bankruptcy risk assessment for Serbian agricultural enterprises, agriculture being one of the most significant sectors of the Serbian economy. They compared the classification efficiency of several models based on linear discriminant analysis. Their research shows that models taking into account the specifics of economies and market characteristics (e.g., the European market, as in the DF Kralicek model) give better results for the Serbian economy than models created for American markets (e.g., the classic Altman Z-Score model). Karas et al. (2017) addressed similar problems and showed that classic scoring models developed for the US economy (Altman's Z-Score and the Altman-Sabato model) and the IN05 model designed for Czech enterprises are less effective than their original validation results indicated. This forces researchers to develop more adequate models that take into account, in particular, the specific financial indicators of the agricultural sector and the features of the national economy that affect enterprise bankruptcy. Receiver Operating Characteristic (ROC) curves were used to measure the effectiveness of the models. Chen et al. (2013) addressed forecasting the bankruptcy risk of industrial enterprises in the Chinese manufacturing sector. They used a modified variant of the Multi-Criteria Linear Programming algorithm (the so-called MC2LP algorithm) to forecast the bankruptcy risk of 1499 Chinese enterprises from this sector and selected 36 financial indicators to assess their financial condition. The classification efficiency of their model was compared with that of the classic MCLP model and the SVM approach, with confusion matrices used as measures of classification accuracy. The model proposed by the authors makes it possible to set a variable cut-off point (determining the expected class membership of objects) and thus to systematically correct misclassification errors. Topaloglu (2012) forecast the bankruptcy of American manufacturing enterprises using multi-period logistic regression models, the so-called hazard models. The research period covered bankruptcies from 1980 to 2007, and the results show that macroeconomic diagnostic variables such as GDP have a very large impact on the assessment of bankruptcy. The study shows that the accounting indicators of financial condition used in the model lose their predictive power (become irrelevant) when global market and macroeconomic indicators are taken into account. Achim et al. (2012) studied the financial risk of bankruptcy of Romanian manufacturing enterprises in the period 2000–2011 using Principal Component Analysis, thus taking into account the impact of the global financial crisis. The research sample included 53 enterprises registered in Romania and operating in the manufacturing sector, and the study used 16 of the most frequently used financial indicators. The study shows good predictive quality of the model tested and presents its potential applications. The literature also contains works on modeling bankruptcy risk for enterprises in other sectors of the economy, e.g., Marcinkevicius and Kanapickiene (2014) for construction companies, and Kim and Gu (2010), Youn and Gu (2010) and Diakomihalis (2012) for companies in the hotel and restaurant sector.

Another important aspect of research on enterprise bankruptcy risk is accounting for the impact of economic cycles and selected macroeconomic variables, i.e., considering the effect of the cyclical economic conditions of countries. Vlamis (2007) used logistic and probit regression models to forecast the bankruptcy risk of American real estate companies in the period 1980–2001. It was shown that financial indicators such as profitability, debt service and liquidity are important determinants of the bankruptcy risk of the surveyed enterprises. A number of key macroeconomic and financial variables were also used, because the risk of borrowers' bankruptcy depends on the state of the economy and the current phase of the business cycle. Hol (2007) dealt with similar issues, studying the impact of business cycles on the probability of bankruptcy of Norwegian companies, and showed that models accounting for the impact of economic cycles have better prognostic properties than models that only take into account companies' financial indicators. In a similar study, Bruneau et al. (2012) analyzed the relationship between macroeconomic shocks and the exposure of French companies from different sectors of activity to bankruptcy risk. The dependence of bankruptcy risk on economic cycles was studied using a two-equation VAR model based on data from 1990–2010.

At this point, the significant contribution of Polish authors to the development of bankruptcy forecasting models that take into account the specific nature of the Polish economy should also be mentioned. Their research is mostly based on classical techniques, using statistical methods or machine learning tools to predict and evaluate the risk of business bankruptcy. Results obtained by Polish authors in bankruptcy risk modeling can be found in publications by Korol (2010), Hadasik (1998), Hamrol and Chodakowski (2008), Mączyńska (1994), Prusak (2005) and Ptak-Chmielewska (2016). In the context of the research done by Polish authors, a very interesting and detailed comparative analysis of enterprise bankruptcy forecasting in East-Central Europe, with an overview of the models applied from the perspective of the developing economies of the region in the transformation period, was presented by Kliestik et al. (2018).

In recent years, ensemble classifiers have been successfully used to predict business bankruptcy. Studies of this type include Barboza et al. (2017), Brown and Mues (2012) and Zięba et al. (2016), which demonstrate that ensemble classifiers offer better forecasting properties and accuracy than conventional statistical methods. Moreover, Kim et al. (2015) showed that ensemble models are more resistant to the problem of an imbalanced sample of bankrupt businesses and businesses at no risk of bankruptcy, which arises during the statistical data preparation phase.

Many studies on the application of ensemble classifiers in business bankruptcy forecasting refer to boosting and bagging methods (sequential correction and minimization of classification errors, and sampling and combining the results of component classifiers, respectively) in order to increase the classification performance of the entire forecasting system. Cortes et al. (2007) and Heo and Yang (2014) applied AdaBoost (an adaptive boosting algorithm) with decision trees as base classification models. Ensemble classifiers that boost neural network classifiers were discussed by Alfaro et al. (2008), Fedorova et al. (2013), Kim and Kang (2010) and West et al. (2005). A different approach was adopted by Kim et al. (2015) and Sun et al. (2017), who used support vector machines (SVM) as base classifiers, boosted as a group of ensemble classifiers. Bagging is also frequently used in practical applications of ensemble classifiers; studies by Hua et al. (2007), Zhang et al. (2010) and Twala (2010) analyze the classification effectiveness of such ensemble classifiers built on several types of base classifiers. Ensemble classifiers that combine (stack) the results of several classifiers in a single meta-classifier were discussed in studies such as those by Iturriaga and Sanz (2015), Tsai and Wu (2008) and Tsai and Hsu (2013). Furthermore, many studies are dedicated to various techniques of combining the classification results of base models, such as neural networks in the form of self-organizing maps (SOMs), rough set techniques, case-based reasoning and classifier consensus methods. Examples of such ensemble classifiers were examined by Ala'raj and Abbod (2016), Du Jardin (2018), Chuang (2013) and Li and Sun (2012).

#### **3. Environmental Background of the Research Conducted**

#### *3.1. Statistical Description of Bankruptcies in Poland*

According to data from Ogólnopolski Monitor Upadłościow (2019) (Coface Nationwide Bankruptcy Monitor; source: http://www.emis.com, http://www.coface.pl/en), a total of 798 businesses declared bankruptcy in 2018. Most bankruptcies were reported in October and September (76 and 74, respectively), followed by March, April and May (67, 66 and 65, respectively), with 61 bankruptcies reported in January. The fewest bankruptcies were declared in August (42) and in February and December (45 each). Comparing the structure of business bankruptcies by voivodeship in 2018 (Figure 1), the highest number of bankrupt businesses was reported in the Mazowieckie Voivodeship: 156 (22% of all bankrupt enterprises). Further positions in the ranking, with a significantly lower number of bankruptcies, are held by the Śląskie (84; 12%), Wielkopolskie (68; 10%), Dolnośląskie (61; 9%), Podkarpackie (47; 7%) and Małopolskie (44; 6%) voivodeships. The lowest numbers of bankruptcies were reported in the Lubuskie (13; 2%), Opolskie (15; 2%), Podlaskie (19; 3%) and Świętokrzyskie (20; 3%) voivodeships. According to the latest available data, for Q1 2019, the largest numbers of bankruptcies were recorded in the Mazowieckie (27), Dolnośląskie (15), Śląskie (14), Wielkopolskie (11) and Łódzkie (10) voivodeships, and the smallest in the Świętokrzyskie (1), Opolskie (2), Podlaskie and Lubuskie (3 each) and Warmińsko-Mazurskie (4) voivodeships. For comparison, there were 6 bankruptcies in the Podkarpackie Voivodeship.

**Figure 1.** Number of bankruptcies in Polish voivodeships in Q1 2019 and in 2018. Source: own elaboration based on the data analyzed from Coface Nationwide Bankruptcy Monitor (http://www.emis.com).

An analysis of the number of bankrupt businesses in Poland in 2019 by type of business activity shows that the highest number of bankruptcies concerned businesses carrying out individual activity (one-person businesses, self-employment): 108 (27% of bankruptcies), followed by commercial law companies from the commerce (trade) sector, 91 (22%), and from the industrial and service sectors, 70 (17%) and 63 (16%), respectively. In 2019, bankruptcy was also declared by 51 (13%) businesses from the construction sector, 11 (3%) transport and logistics businesses and 9 (2%) businesses involved in other activities. Figure 2 shows the distribution of the number of bankrupt businesses by type and sector of activity.

**Figure 2.** Number of business bankruptcies in Poland in 2019 by sector of activity. Source: own elaboration based on the data analyzed from Coface Nationwide Bankruptcy Monitor (http://www.emis.com).

#### *3.2. Characteristics of Companies Operating in the Podkarpackie Voivodeship*

According to Emerging Markets Information Service (2019)—EMIS (http://www.emis.com), in 2018 about 3679 companies and partnerships were registered and operating in the Podkarpackie Voivodeship (the number of available financial statements for 2018 in the database). Their reported sector of activity belonged to one of the following 18 areas: A—farming, forestry and fishing, B—mining and extraction, C—industrial processing, D—production of energy, supply of water, gas and other energy sources, E—waste, waste water and sewage management, F—construction, G—wholesale and retail, and servicing vehicles and motorcycles, H—transport and storage management, I—accommodation and food services, J—information and communications, K—finance and insurance, L—services for the property market, M—scientific, specialist and technological activity, N—administration and support, P—education, Q—health and social care, R—culture, entertainment and leisure, S—other services. Figure 3 presents the structure of the number of businesses operating in the Podkarpackie Voivodeship by sector.

**Figure 3.** Businesses in the Podkarpackie Voivodeship by sector of activity. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

In 2018, the largest group of businesses in the Podkarpackie Voivodeship (997) operated in the wholesale and retail sector. Economic activity in industrial processing was declared by 718 enterprises, followed by the construction sector (344), scientific, specialist and technological activity (321) and services for the property market (270). The lowest numbers of businesses operated in education (37), production of energy and supply of energy sources (35), culture (27), other services (24) and mining and extraction (21).

Figure 4 presents the structure of businesses in the Podkarpackie Voivodeship by the duration of their activity (in years). The largest group, 1792 businesses (49% of all analyzed entities), have operated on the market for a very long time (10 years). Nearly as many businesses, 1720 (47% of the total), have been active for a medium number of years, whereas 'young' enterprises (167), established in the period from 2017 to 2019 and active for up to two years, constituted only 4% of all businesses analyzed.

**Figure 4.** The structure of the number of businesses in the Podkarpackie Voivodeship by the duration of business activity (in years). Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

An analysis of businesses operating in the Podkarpackie Voivodeship by size (Figure 5) shows that 40% (1461) of all enterprises are very small, i.e., so-called micro-enterprises. Small businesses constituted a further 15%, so over half of all businesses (55%) were either micro or small enterprises. The numbers of medium and large enterprises were roughly equal, corresponding to 22% and 23% of all entities analyzed, respectively. Enterprise size was identified in accordance with the legal provisions for the classification of Polish enterprises adapted to EU law and directives. Micro enterprises were identified by the rule: number of employees < 10 and annual turnover ≤ €2 m. Small enterprises were identified as those that are not micro enterprises and meet the conditions: number of employees < 50 and annual turnover ≤ €10 m. Medium enterprises were identified as those that are not small and meet the conditions: number of employees < 250 and annual turnover ≤ €50 m. Large enterprises were therefore those not meeting the conditions for medium enterprises, i.e., number of employees ≥ 250 or annual turnover > €50 m (source: https://ec.europa.eu/growth/smes/business-friendly-environment/sme-definition\_pl).
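The size rules above can be written as a small decision function. The sketch below is a hypothetical helper (`enterprise_size` is not part of the study's software) that applies only the headcount and turnover thresholds quoted in the text, with turnover given in millions of euro; because the rules are checked from the smallest class upwards, an enterprise that fails the medium-enterprise test falls through to "large".

```python
def enterprise_size(employees: int, turnover_eur_m: float) -> str:
    """Classify an enterprise by the headcount and turnover rules quoted in the text.

    turnover_eur_m is annual turnover in millions of euro. Only these two
    criteria are used; any other criteria of the full EU definition are ignored.
    """
    if employees < 10 and turnover_eur_m <= 2:
        return "micro"
    if employees < 50 and turnover_eur_m <= 10:       # not micro (checked above)
        return "small"
    if employees < 250 and turnover_eur_m <= 50:      # not small (checked above)
        return "medium"
    # Fails the medium test: 250 or more employees, or turnover above 50 m EUR.
    return "large"


print(enterprise_size(5, 1.5))     # micro
print(enterprise_size(9, 3.0))     # small: turnover exceeds the micro limit
print(enterprise_size(300, 20.0))  # large: headcount exceeds the medium limit
```

Ordering the checks this way encodes the "is not micro / is not small" conditions implicitly, so each rule only needs its own thresholds.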

**Figure 5.** Structure of the number of businesses operating in the Podkarpackie Voivodeship by enterprise size. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

Among all businesses operating in the Podkarpackie Voivodeship, only 7 were listed on the stock exchange, while 3672 were non-listed companies. An analysis of the legal forms of businesses in the Podkarpackie Voivodeship (Figure 6) shows that the vast majority (73.6%) are limited liability companies (private limited companies). Public limited companies account for 2.9% of enterprises and limited partnerships for only 0.5%. The remaining businesses, which have other legal forms, constitute 23% of all enterprises analyzed in this study.

**Figure 6.** Distribution of the structure of businesses operating in the Podkarpackie Voivodeship by legal form of activity. Source: own elaboration based on the data analyzed from 2018 (http://www.emis.com).

#### **4. Materials and Methods**

As the above literature review shows, business bankruptcy risk assessment in practice makes use of various classifier models: both classical statistical methods and more advanced non-statistical methods based on various machine learning techniques. So-called ensemble classifiers, i.e., classifiers designed to increase classification efficiency relative to the conventional approach based on single classifiers, are becoming increasingly popular for precisely this reason. Table 1 contains an overview of the business bankruptcy risk forecasting models most often used in practice.

Classical business bankruptcy forecasting models based on single classifiers are very well known and presented in many publications, so the present study focuses mainly on a detailed presentation of the ensemble classifier methodology. A detailed discussion of the classical models used in business bankruptcy forecasting can be found, e.g., in the monographs by Kuhn and Johnson (2013) and Hastie et al. (2013).


**Table 1.** List of methods applied in forecasting business bankruptcy risk.

Source: own elaboration based on the literature analyzed.

#### *4.1. Ensemble Classifier Methodology*

The ensemble classifier methodology involves combining several single classifiers performing the same task into an ensemble in order to improve the effectiveness of classification (the discriminant capability of the entire model), defined as the correct assignment of objects to the expected classes. This is done by suitably aggregating (often by weighting) the classification results obtained from the component classifiers to arrive at a resultant classifier with the best possible forecasting capabilities (surpassing those of all base classifiers in use). Figure 7 shows a functional diagram of ensemble classifiers.

**Figure 7.** A diagram presenting the idea of using ensemble classifiers. Source: own elaboration.
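The aggregation step can be illustrated with a minimal Python sketch of three common voting rules. It assumes a two-class coding in which 1 stands for "bankrupt" and 0 for "non-bankrupt"; the classifier weights and probabilities in the example are hypothetical numbers, not results of the study.

```python
from collections import Counter
from typing import Sequence

# Assumed coding: 1 = "bankrupt", 0 = "non-bankrupt".


def majority_vote(labels: Sequence[int]) -> int:
    """Majority (plurality) voting on crisp labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Weighted voting: each base classifier's vote counts with its weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def soft_vote(probas: Sequence[float], cutoff: float = 0.5) -> int:
    """Soft voting: average the base classifiers' probabilities of class 1."""
    return 1 if sum(probas) / len(probas) >= cutoff else 0


# Three hypothetical base classifiers assessing the same company.
print(majority_vote([1, 0, 0]))                   # 0
print(weighted_vote([1, 0, 0], [0.7, 0.2, 0.2]))  # 1
print(soft_vote([0.95, 0.45, 0.40]))              # 1
```

The soft vote returns 1 here although only one of the three classifiers exceeds the 0.5 cut-off on its own, which shows how soft voting uses the strength of each classifier's output rather than only its label.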

A detailed description of the ensemble classifier methodology, its types and characteristics, and numerous practical applications can be found in the monographs by Zhang and Ma (2012) and Zhou (2012). Three well-known approaches are applied in ensemble classifier methods in practice: boosting, bagging and combining. Boosting refers to a broad class of algorithms that boost "weak classifiers", turning them into "strong classifiers" (with excellent classification performance approaching that of perfect models). An example of this approach is AdaBoost, an adaptive boosting algorithm (Freund and Schapire 1997). In AdaBoost, classifiers of the same type, e.g., classification trees, serve as base classifiers. To determine object classes, their output classifications are most commonly aggregated using voting strategies such as majority voting, plurality voting, weighted voting or soft voting. For object classification into two classes, the AdaBoost.M1 adaptive boosting algorithm contains the following steps (see: Algorithm 1, Zhang and Ma 2012, p. 14).

#### **Algorithm 1 AdaBoost.M1 algorithm**
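The steps of AdaBoost.M1 for two classes can be sketched in Python as follows. The sketch follows the usual formulation of Freund and Schapire (1997): uniform initial weights, a weak classifier trained on the weighted sample, its weighted error ε_t, β_t = ε_t/(1 − ε_t), down-weighting of correctly classified objects by β_t, and a final vote in which each classifier counts with weight log(1/β_t). The choice of one-level decision stumps as weak classifiers, the {0, 1} class coding and all function names are illustrative assumptions; this is not the implementation used in the study.

```python
import math
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, class predicted when x[j] <= threshold).
Stump = Tuple[int, float, int]


def stump_predict(h: Stump, x: Sequence[float]) -> int:
    j, thr, left = h
    return left if x[j] <= thr else 1 - left


def fit_stump(X: List[Sequence[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Weak classifier: the stump with the lowest weighted training error."""
    best, best_err = (0, X[0][0], 0), math.inf
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for left in (0, 1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, left), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, left), err
    return best, best_err


def adaboost_m1(X, y, T: int = 10):
    """Return the ensemble as a list of (weak classifier, voting weight log(1/beta_t))."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform initial distribution over objects
    ensemble = []
    for _ in range(T):
        h, eps = fit_stump(X, y, w)        # weights sum to 1, so eps is the weighted error
        if eps >= 0.5:                     # no better than chance: stop boosting
            break
        perfect = eps == 0
        eps = max(eps, 1e-10)              # guard against division by zero
        beta = eps / (1.0 - eps)
        ensemble.append((h, math.log(1.0 / beta)))
        # Shrink the weights of correctly classified objects and renormalise,
        # so the next weak classifier concentrates on the misclassified ones.
        w = [wi * beta if stump_predict(h, xi) == yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if perfect:                        # a perfect weak classifier needs no successors
            break
    return ensemble


def adaboost_predict(ensemble, x: Sequence[float]) -> int:
    """Final hypothesis: the class with the largest sum of log(1/beta_t) votes."""
    votes = {0: 0.0, 1: 0.0}
    for h, alpha in ensemble:
        votes[stump_predict(h, x)] += alpha
    return max(votes, key=votes.get)


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 1, 0, 0]   # class 1 only in the middle: no single stump can fit it
model = adaboost_m1(X, y, T=3)
print([adaboost_predict(model, x) for x in X])  # [0, 0, 1, 1, 0, 0]
```

In the toy example no single stump separates the classes, but three boosted stumps with weights log 2, log 3 and log 5 classify all six objects correctly, which is the "weak to strong" effect described in the text.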




The name of the second group of ensemble classifiers, which use the bagging method, comes from the acronym Bootstrap AGGregatING (Breiman 1996). In this group, bootstrap sampling is used to obtain training subsets for the base classifiers, so each classifier is trained on a different training sample and the results are aggregated. Classifiers of the same type are most often used as base classifiers here; a well-known example of this type of ensemble is Random Forest. The bootstrap aggregation algorithm for classifying objects into two classes has the following steps (see: Algorithm 2, Zhang and Ma 2012, p. 12).


- Randomly select a replication subset, the training sample *St*, by drawing R % of *S* at random
- Train the base classifier on subset *St* to obtain the hypothesis (trained classifier) *ht*
- Add *ht* to the ensemble: ε ← ε ∪ {*ht*}
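The bagging steps above can be sketched as follows, assuming sampling with replacement (a bootstrap replicate) of size R·|S|, a deliberately simple hypothetical base classifier (a single-feature threshold rule) and aggregation by majority voting. A Random Forest would additionally draw a random subset of features at each tree split, which this sketch omits; names and defaults are illustrative only.

```python
import random
from collections import Counter

def fit_threshold_rule(X, y):
    """Hypothetical base classifier: best one-feature rule 'x[j] > thr' on 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for above in (0, 1):             # label assigned when x[j] > thr
                errors = sum((above if xi[j] > thr else 1 - above) != yi
                             for xi, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    _, j, thr, above = best
    return lambda x: above if x[j] > thr else 1 - above

def bagging(X, y, T=25, R=1.0, seed=0):
    """Train T base classifiers, each on a bootstrap replicate S_t of size R*|S|."""
    rng = random.Random(seed)
    m = max(1, int(R * len(X)))
    ensemble = []
    for _ in range(T):
        idx = [rng.randrange(len(X)) for _ in range(m)]    # sampling with replacement
        h_t = fit_threshold_rule([X[i] for i in idx], [y[i] for i in idx])
        ensemble.append(h_t)                                # epsilon <- epsilon U {h_t}
    return ensemble

def bagging_predict(ensemble, x):
    """Aggregate the base classifiers by majority voting."""
    return Counter(h(x) for h in ensemble).most_common(1)[0][0]
```

Because each rule sees a different bootstrap replicate, individual rules disagree near the class boundary, and the majority vote smooths out that variance.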

A group of methods called ensemble combining represents an entirely different approach. It includes combined methods that take the classification functions of single (base) classifiers and aggregate them into a resultant classification function by averaging (simple or weighted averaging of base classifier results), by voting (using various voting strategies, e.g., majority voting) or by stacked generalization. The stacking ensemble methodology, pioneered by Wolpert (1992), is a combined approach in which base classifiers (level 1 classifiers) are trained on the same random samples, and their classification results (classification functions) are then used as the training sample for a new meta-classifier (level 2 classifier), which aggregates them into the resultant classification.
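A minimal sketch of stacked generalization. It assumes a common variant in which the level 1 outputs used to train the meta-classifier are out-of-fold predictions, so the meta-classifier never learns from outputs of models that saw the same object. The two base learners are hypothetical placeholders; the k-NN meta-classifier mirrors the choice reported in Section 5.2, but the code is not the caretEnsemble implementation.

```python
import math
from collections import Counter

def fit_centroid(X, y):
    """Hypothetical base learner: nearest class centroid, labels 0/1."""
    cents = {}
    for c in (0, 1):
        pts = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min((0, 1), key=lambda c: math.dist(x, cents[c]))

def fit_1nn(X, y):
    """Hypothetical base learner: 1-nearest neighbour on the raw features."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: math.dist(x, d[0]))[1]

def fit_knn_meta(Z, y, k=3):
    """Level 2 meta-classifier: k-NN majority vote on level 1 outputs."""
    data = list(zip(Z, y))
    def predict(z):
        nearest = sorted(data, key=lambda d: math.dist(z, d[0]))[:k]
        return Counter(t for _, t in nearest).most_common(1)[0][0]
    return predict

def stacking(X, y, base_fitters, n_folds=5, k=3):
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    Z = [None] * n
    for fold in folds:
        # level 1 outputs for objects in `fold` come from models trained without them
        train = [i for i in range(n) if i not in fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fitters]
        for i in fold:
            Z[i] = [m(X[i]) for m in models]
    meta = fit_knn_meta(Z, y, k)
    full = [fit(X, y) for fit in base_fitters]   # base models refitted on all data
    return lambda x: meta([m(x) for m in full])
```

In practice the level 1 outputs would usually be class probabilities rather than hard labels, which gives the meta-classifier more information to combine.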

#### *4.2. Feature Selection Process in Bankruptcy Prediction*

A highly significant classification-related issue is the choice of an appropriate (optimal) set of diagnostic variables (i.e., the feature selection problem). Detailed characteristics of methods for selecting relevant variables for forecasting models can be found in John et al. (1994) and Jovic et al. (2015). Wrapper methods are frequently used techniques that evaluate candidate predictor subsets by the effectiveness of the model built on them in explaining the dependent variable; they combine a search algorithm, which proposes subsets, with the classification method applied, which scores them. Because the search algorithm is 'wrapped' around the classification model to explore variable subsets, the group is called wrapper methods. Wrapper feature selection methods use various strategies for searching for the optimal subset of predictors, which can be divided into two basic groups: deterministic and randomized. Deterministic methods apply various sequential algorithms, e.g., forward stepwise selection or backward stepwise elimination. Randomized methods, which wrapper feature selection uses most frequently, include simulated annealing, genetic algorithms and ant colony optimization. A genetic algorithm searching for the optimal subset of predictors is often used to select variables for bankruptcy models. The genetic algorithm for feature selection (see: Algorithm 3) is executed according to the procedure designed by Kuhn and Johnson (2013).

#### **Algorithm 3 Genetic Algorithm Feature Selection (GAFS)**


- Tune and train a model and compute each chromosome's fitness
- Select two chromosomes based on the fitness criterion
- *Crossover:* Randomly select a locus and exchange each chromosome's genes beyond the locus
- *Mutation:* Randomly change binary values of each gene in each new child chromosome with probability *pm*
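The steps listed above can be sketched in plain Python. This is a minimal illustration, assuming binary tournament selection for the "select two chromosomes" step and a user-supplied fitness function (in the study, the accuracy of an LDA model estimated with caret's gafs()); it is not the caret implementation, and the function name and defaults are hypothetical apart from the parameter values reported in Section 5.1.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=50, iters=30,
                         p_crossover=0.8, p_mutation=0.1, seed=0):
    """Binary-chromosome GA: gene j = 1 means predictor j is in the subset."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(iters):
        # fitness of each chromosome (e.g. cross-validated accuracy of a model)
        scores = [fitness(ch) for ch in pop]
        for ch, s in zip(pop, scores):
            if s > best_fit:
                best, best_fit = ch[:], s

        def select():
            # binary tournament: the fitter of two randomly drawn chromosomes
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_crossover:
                locus = rng.randrange(1, n_features)     # single-point crossover
                c1, c2 = p1[:locus] + p2[locus:], p2[:locus] + p1[locus:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # flip each gene independently with probability p_mutation
                children.append([1 - g if rng.random() < p_mutation else g
                                 for g in child])
        pop = children[:pop_size]   # elite = 0: best chromosomes are not carried over
    return best, best_fit
```

With a toy fitness such as the negative Hamming distance to a known target mask, the search recovers the target; with a real data set, `fitness` would wrap a cross-validated classifier trained on the selected columns.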


#### *4.3. Data Samples Description*

The original research sample used in the study included data for 1739 Polish enterprises (bankrupt and not threatened with bankruptcy). For each enterprise, the sample contained the calculated values of 19 financial indicators describing its financial condition (characterized in detail in Section 5.1 and selected using the wrapper search technique and genetic algorithm discussed in Section 4.2). For bankrupt enterprises, the values of the diagnostic variables were taken 1 or 2 years before the actual period of bankruptcy. Statistical data came from the 2010–2018 financial statements of enterprises available in the EMIS database (http://www.emis.com). Bankruptcy episodes were identified on the basis of statistics from the EMIS database source Ogólnopolski Monitor Upadłościowy (2019) (Coface Polish National Bankruptcy Monitor, source: http://www.emis.com, http://coface.pl/en). The balanced sample included a total of 1739 research cases from all major sectors of the economy: 865 cases of bankrupt enterprises and 874 randomly selected cases of enterprises not at risk of bankruptcy and in strong financial condition. The condition of the non-defaulted enterprises was established through careful analysis of the values of many financial indicators, such as profitability ratios, debt ratios and management performance indicators, which determined their financial condition and low exposure to bankruptcy risk. A 70% training sample (1217 enterprises: 592 bankrupt and 625 not threatened with bankruptcy) was drawn from the research sample and used to train and calibrate the parameters of the bankruptcy models. The remaining cases formed a random 30% test-validation sample (522 enterprises: 273 bankrupt and 249 not threatened with bankruptcy), which was used at the validation stage to check the models' predictive properties for new, unknown cases.
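The 70/30 division described above can be reproduced in outline as a simple random split of case indices. The seed and function name are arbitrary assumptions, and the split is assumed to be unstratified because the bankrupt shares reported for the two subsets differ (592/1217 ≈ 0.486 versus 273/522 ≈ 0.523).

```python
import random

def split_sample(n, train_frac=0.7, seed=2020):
    """Simple (unstratified) random split of case indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    return sorted(idx[:n_train]), sorted(idx[n_train:])

train_idx, test_idx = split_sample(1739)
print(len(train_idx), len(test_idx))              # 1217 522
print(round(592 / 1217, 3), round(273 / 522, 3))  # 0.486 0.523
```

The first of these shares is the value p_B = 0.486 used later when determining the optimal cut-off point.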
A separate research sample was compiled for enterprises from the Podkarpackie Voivodeship; it included 2133 enterprises of various sizes registered in various sectors of economic activity. This sample covered all enterprises operating in the Podkarpackie region for which 2018 financial statements were available in the EMIS database. It was used to assess the risk of bankruptcy of the analyzed enterprises (over a 2-year horizon, up to 2020) on the basis of an estimated scoring model built with the ensemble classifier approach.

#### *4.4. Procedure of Mapping PD into Scores*

In this study, the score scaling approach discussed in detail in the literature was used (see e.g., Siddiqi 2017, pp. 240–41). The relationship between the score and the logarithm of the so-called odds ratio, $Odds = \frac{1 - PD}{PD}$, which expresses the odds (1 − PD) that the business in question will be classified as healthy against the odds (PD) that it will go bankrupt, is:

$$Score = a\_0 + a\_1 \cdot \ln(Odds). \tag{1}$$

By introducing the concept of *pdo*—the number of points in the scoring system which doubles the value of the odds ratio, for a given value of the score we obtain the following relationship:

$$Score + pdo = a\_0 + a\_1 \cdot \ln(2 \cdot Odds). \tag{2}$$

Solving the system of Equations (1) and (2) gives the coefficients of the linear score-scaling relationship in ln(*Odds*), and therefore in the probability of default (PD), where $Odds_0$ denotes the odds assigned to the reference score $Score_0$:

$$\begin{aligned} a_1 &= \frac{pdo}{\ln(2)}, \\ a_0 &= Score_0 - a_1 \cdot \ln(Odds_0). \end{aligned} \tag{3}$$
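Equation (3) can be checked numerically. The sketch below assumes the calibration later used in Section 5.3 (a reference score of 600 at odds of 50:1 and pdo = 20) and also inverts Equation (1) to recover PD from a score; the function names are hypothetical.

```python
import math

def scaling_params(score0=600.0, odds0=50.0, pdo=20.0):
    """Equation (3): slope a1 from pdo, intercept a0 from the reference point."""
    a1 = pdo / math.log(2)
    a0 = score0 - a1 * math.log(odds0)
    return a0, a1

def score_from_pd(pd, a0, a1):
    """Equation (1) with Odds = (1 - PD) / PD."""
    return a0 + a1 * math.log((1 - pd) / pd)

def pd_from_score(score, a0, a1):
    """Inverse of Equation (1): Odds = exp((Score - a0) / a1), PD = 1 / (1 + Odds)."""
    return 1.0 / (1.0 + math.exp((score - a0) / a1))

a0, a1 = scaling_params()
print(round(a0, 2), round(a1, 2))              # 487.12 28.85
print(round(pd_from_score(486, a0, a1), 2))    # 0.51
```

A score of 486 thus corresponds to PD ≈ 0.51, which agrees with the PD reported for the upper end of the gray zone in Section 5.5; a PD of exactly 0.5 maps to the intercept a0.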

#### *4.5. Validation Measures of Bankruptcy Prediction Models*

Commonly used measures of classification accuracy were applied to validate the estimated bankruptcy models; Siddiqi (2017) and Thomas (2009) describe them clearly and in detail. The confusion matrix is probably the most common tool for assessing the classification accuracy of models. Table 2 presents the general form of the confusion matrix.



Source: own elaboration.

The quantities shown in the table have the following meaning: TN is the number of actually bankrupt businesses correctly classified by the model as bankrupt, TP is the number of healthy businesses correctly classified by the model as healthy, FN is the number of actually bankrupt businesses incorrectly classified by the model as healthy, and FP is the number of actually healthy businesses incorrectly classified by the model as bankrupt. The overall effectiveness of correct classification is measured by $AC = \frac{TN + TP}{TN + FN + FP + TP} \cdot 100\%$. The effectiveness of correct classification for the 'bankrupt' class alone is $AC_B = \frac{TN}{TN + FN} \cdot 100\% = (1 - Err_B) \cdot 100\%$, where $Err_B$ is the so-called type I error, i.e., the share of incorrect classifications in the class of bankrupt businesses. Likewise, the effectiveness of correct classification for businesses at no risk of bankruptcy alone is $AC_{NB} = \frac{TP}{FP + TP} \cdot 100\% = (1 - Err_{NB}) \cdot 100\%$, where $Err_{NB}$ is the so-called type II error, i.e., the share of incorrect classifications in the class of healthy businesses. The higher the values of these classification accuracy measures, the more effective the assessed model.
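These measures follow directly from the four cells of the confusion matrix. A minimal sketch that keeps the labelling used in the text (TN and FN refer to actually bankrupt businesses, TP and FP to actually healthy ones); the function name and the counts in the example are hypothetical.

```python
def classification_accuracy(TN, FN, FP, TP):
    """Accuracy measures from Section 4.5, all expressed in percent."""
    total = TN + FN + FP + TP
    AC = 100.0 * (TN + TP) / total
    AC_B = 100.0 * TN / (TN + FN)     # correctly recognised bankrupt businesses
    AC_NB = 100.0 * TP / (FP + TP)    # correctly recognised healthy businesses
    return {"AC": AC, "AC_B": AC_B, "AC_NB": AC_NB,
            "Err_B": 100.0 - AC_B,    # type I error
            "Err_NB": 100.0 - AC_NB}  # type II error

print(classification_accuracy(TN=45, FN=5, FP=10, TP=40))
# {'AC': 85.0, 'AC_B': 90.0, 'AC_NB': 80.0, 'Err_B': 10.0, 'Err_NB': 20.0}
```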

The GINI coefficient and the related area under the ROC (Receiver Operating Characteristic) curve, AUCROC, are also often used to measure the classification effectiveness of bankruptcy models (see e.g., Agarwal and Taffler 2008, Barboza et al. 2017). The ROC curve is plotted in a coordinate system with Y = Sensitivity and X = (1 − Specificity). For the *i*th predicted point-score category ($score_i$) in the contingency table, it relates the cumulative percentage (structural ratio) of bankrupt businesses, $\omega_{skB,i} = \sum_{j=1}^{i} \frac{n_{B,j}}{n_B}$, to the corresponding cumulative structural ratio for businesses at no risk of default, $\omega_{skNB,i} = \sum_{j=1}^{i} \frac{n_{NB,j}}{n_{NB}}$. When the classification results in the contingency table are ordered by score into k different scoring categories, the GINI coefficient, and thus AUCROC, is determined by the following formula (see e.g., Thomas 2009, pp. 117–18):

$$\text{GINI} = 1 - \sum_{i=1}^{k-1} (\omega_{skB,i+1} - \omega_{skB,i}) \cdot (\omega_{skNB,i+1} + \omega_{skNB,i}) = 2 \cdot \text{AUC}_{\text{ROC}} - 1. \tag{4}$$

The GINI coefficient takes values in the interval [0,1]. Values approaching 1 mean that the model being assessed is highly effective (nearly perfect). Correspondingly, the area under the ROC curve, AUCROC, ranges from 0.5 to 1. A value of 0.5 means that the model assigns businesses to the analyzed classes in a completely random way (i.e., its use is pointless), while 1 is attained by the best model, which identifies class membership perfectly.
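Formula (4) can be evaluated from binned score counts. In this sketch the categories are assumed to be ordered by ascending score (so bankrupt businesses should concentrate in the first categories), and the cumulative shares are assumed to start from 0 below the lowest category; AUCROC then follows from GINI = 2 · AUCROC − 1. Function names and counts are hypothetical.

```python
from itertools import accumulate

def cumulative_shares(counts):
    """omega values: cumulative share per category, with omega_0 = 0 prepended."""
    total = sum(counts)
    return [0.0] + [c / total for c in accumulate(counts)]

def gini_from_bins(n_B, n_NB):
    """Formula (4); categories ordered by ascending score. Returns (GINI, AUC_ROC)."""
    w_B, w_NB = cumulative_shares(n_B), cumulative_shares(n_NB)
    s = sum((w_B[i + 1] - w_B[i]) * (w_NB[i + 1] + w_NB[i])
            for i in range(len(n_B)))
    gini = 1.0 - s
    return gini, (gini + 1.0) / 2.0

print(gini_from_bins([30, 10], [10, 30]))  # (0.5, 0.75)
```

For these counts a pairwise comparison gives the same AUC: a randomly chosen bankrupt business scores lower than a randomly chosen healthy one with probability 0.75 when ties count as one half.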

Information Value (IV), the Kolmogorov-Smirnov (KS) statistic and, less frequently, the divergence coefficient (Div) are also used to evaluate the effectiveness of bankruptcy forecasting models at the validation stage. IV is calculated by the following formula (see e.g., Thomas 2009, p. 106):

$$IV = \sum\_{i=1}^{k} \left( \frac{n\_{\rm NB,i}}{n\_{\rm NB}} - \frac{n\_{\rm B,i}}{n\_{\rm B}} \right) \cdot \ln \left( \frac{n\_{\rm NB,i} / n\_{\rm NB}}{n\_{\rm B,i} / n\_{\rm B}} \right) \tag{5}$$

where $n_B$ is the number of bankrupt businesses, $n_{NB}$ is the number of businesses with no risk of bankruptcy, $n_{B,i}$ is the number of bankrupt businesses in the *i*th scoring category and $n_{NB,i}$ is the corresponding number of businesses with no risk of bankruptcy in that category. The higher the IV value, the better the discriminant properties of the assessed model.
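Formula (5) in code, as a minimal sketch with hypothetical counts per scoring category. It assumes every category contains businesses from both classes, because otherwise the logarithm in (5) is undefined; in practice such empty cells are often merged with neighbouring categories or smoothed.

```python
import math

def information_value(n_NB, n_B):
    """Formula (5): counts per scoring category for healthy (NB) and bankrupt (B)."""
    tot_NB, tot_B = sum(n_NB), sum(n_B)
    iv = 0.0
    for nb_i, b_i in zip(n_NB, n_B):
        if nb_i == 0 or b_i == 0:
            raise ValueError("every category needs both classes for ln() in (5)")
        p_NB, p_B = nb_i / tot_NB, b_i / tot_B
        iv += (p_NB - p_B) * math.log(p_NB / p_B)
    return iv

print(round(information_value([10, 30], [30, 10]), 4))  # 1.0986
```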

The Kolmogorov-Smirnov (KS) statistic compares the empirical distributions of populations containing bankrupt businesses and healthy businesses (a goodness of fit measure). The greater the differences in cumulative distribution functions for the score (higher KS values), the better discriminant capabilities of the model (i.e., the better the scoring model is in separating bankrupt businesses from healthy ones). KS statistic values are calculated by the following formula (see e.g., Thomas 2009, p. 111):

$$KS = \max_{i=1,\ldots,k} \left| \omega_{skB,i} - \omega_{skNB,i} \right|. \tag{6}$$
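Formula (6) evaluated on binned counts. This sketch returns the KS value together with the index of the score category at which the maximum difference occurs (the analogue of the score at which KS is read off in Figure 11). The counts are hypothetical and ordered by ascending score.

```python
from itertools import accumulate

def ks_statistic(n_B, n_NB):
    """Formula (6): max |cumulative share of bankrupt - cumulative share of healthy|."""
    tot_B, tot_NB = sum(n_B), sum(n_NB)
    cum_B = [c / tot_B for c in accumulate(n_B)]
    cum_NB = [c / tot_NB for c in accumulate(n_NB)]
    diffs = [abs(b - g) for b, g in zip(cum_B, cum_NB)]
    i_max = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[i_max], i_max

print(ks_statistic([30, 10], [10, 30]))  # (0.5, 0)
```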

The last validation measure applied to the bankruptcy forecasting models assessed is distribution divergence (Div) given by the formula (see e.g., Siddiqi 2017, p. 261):

$$Div = \frac{\left(\mu\_{\rm NB} - \mu\_{\rm B}\right)^2}{0.5 \cdot \left(var\_{\rm NB} + var\_{\rm B}\right)}\tag{7}$$

where: μ*NB*—mean score distribution value for the healthy businesses population, μ*B*—mean score distribution value for bankrupt businesses population and *varNB*, *varB*—respective variances of these distributions.
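Formula (7) for two lists of scores. The sketch assumes population variances (`statistics.pvariance`); using sample variances instead would change Div slightly for small samples. The scores in the example are hypothetical.

```python
from statistics import fmean, pvariance

def divergence(scores_NB, scores_B):
    """Formula (7) with population variances of the two score distributions."""
    gap = (fmean(scores_NB) - fmean(scores_B)) ** 2
    return gap / (0.5 * (pvariance(scores_NB) + pvariance(scores_B)))

print(divergence([4, 6], [0, 2]))  # 16.0
```

Div grows both when the class means move apart and when each class's scores become more concentrated, which is why a near-perfect model such as the GBM can reach values far above those of the other models.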

#### *4.6. Optimal Cut-Off Point for Scoring Determination*

There are several methods for determining the optimal cut-off point for scoring models, and they are described in depth in the literature (see e.g., Zweig and Campbell 1993). The method used in this research determines the optimal cut-off point as the score value that maximizes the following expression:

$$\max_{score_i} \left\{ M_1(score_i) = \omega_{skB,i}(score_i) - \frac{k_{NB}}{k_B} \cdot \frac{1 - p_B}{p_B} \cdot \omega_{skNB,i}(score_i) \right\} \tag{8}$$

where $k_B$ is the cost of a type I error (the model incorrectly classifies a bankrupt business as healthy), $k_{NB}$ is the cost of a type II error (the model incorrectly classifies a healthy business as bankrupt), and $p_B$ is the probability of membership in the bankrupt class estimated on the basis of the training sample (the percentage of bankrupt businesses in the sample).
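Formula (8) amounts to a one-dimensional search over candidate scores. A minimal sketch, assuming the cumulative shares ω are already computed for each candidate score and using the cost ratio k_NB/k_B = 1/2 and p_B = 0.486 reported in Section 5.5 as defaults; the function name and example arrays are hypothetical.

```python
def optimal_cutoff(scores, w_B, w_NB, cost_ratio=0.5, p_B=0.486):
    """Formula (8). scores ascending; w_B[i], w_NB[i] are cumulative shares of
    bankrupt / healthy businesses with score <= scores[i]; cost_ratio = k_NB / k_B."""
    factor = cost_ratio * (1.0 - p_B) / p_B
    m1 = [b - factor * g for b, g in zip(w_B, w_NB)]
    i_best = max(range(len(scores)), key=m1.__getitem__)
    return scores[i_best], m1[i_best]

cut, m1_value = optimal_cutoff([300, 400, 500], [0.8, 0.95, 1.0], [0.0, 0.05, 1.0])
print(cut)  # 400
```

A smaller cost ratio (bankrupt misclassifications treated as more costly) lowers the penalty on ω_skNB and therefore pushes the optimal cut-off towards higher scores.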

#### **5. Research Results**

The ensemble classifier methodology will be applied to design a scoring model in order to predict bankruptcy events of Polish businesses operating in the Podkarpackie Voivodeship. Each stage of design will be presented in detail together with its potential for a practical application.

The process of designing a scoring model using ensemble classifiers for businesses operating in the Podkarpackie Voivodeship was divided into several stages:


#### *5.1. Feature Selection Stage—Selection of Ratios*/*Bankruptcy Risk Determinants*

Twenty-two financial ratios commonly applied in financial analysis of businesses were initially proposed for the assessment of the financial standing of analyzed business entities:


turnover = (Revenues from sales + Inventories) to Short-term liabilities total: (RS+I)/STL, X19—Working capital turnover = Revenues from sales to (Current assets − Short-term liabilities total): RS/(CA-STL)

• Capital structure ratios: X20—Structure of Equity to Assets total (Balance sheet total): E/BST [%], X21—Structure of Fixed assets to total assets (Balance sheet total): FA/BST [%], X22—Structure of Fixed assets to Current assets: FA/CA [%]

With the help of *wrapper* techniques (discussed in Section 4.2 above), an optimal subset of predictors was selected by means of the genetic algorithm, yielding a potentially best set of financial ratios for training the bankruptcy forecasting models. Linear discriminant analysis (LDA) was used as the forecasting model in the search algorithm, and the overall classification accuracy (AC) was used to measure the effectiveness of predictor subsets. The calculations were performed in the *R* statistical environment using the *gafs()* function from the *caret* package. The genetic algorithm parameters were as follows: *popSize* = *50* (the number of subsets assessed in each iteration), *pcrossover* = *0.8* (crossover probability; a high probability that the new generation will not be an exact copy of the parent chromosomes from the previous generation), *pmutation* = *0.1* (mutation probability; a low probability of chromosome alteration in the subsequent mutation) and *elite* = *0* (the number of best subsets that survive into each new generation). The genetic algorithm, randomly searching for the best subset of diagnostic variables with a 5-fold cross-validation (CV) procedure, selected a set of 19 optimal financial ratios (AC = 0.89 for this set). Table 3 contains values of selected measures of discriminant capability and significance for individual diagnostic variables.


**Table 3.** Discriminant capability measures—ranking of predictors.

Source: own elaboration using Statistica software.

#### *5.2. Calibration of the Parameters of Bankruptcy Risk Forecast Models (Calibration Stage)*

Eight single classifier models were used to forecast the probability of default (PD) (Table 3). The classification functions of these models, the so-called level 1 classifiers, served as inputs for a level 2 ensemble meta-classifier, which aggregated them into final classification results; a k-NN (k-Nearest Neighbors) classifier served as this stacking meta-classifier. Boosting and bagging ensemble classifier approaches were applied as alternatives. The boosting ensemble classifiers used were GBM, the Stochastic Gradient Boosting Machine (Friedman 2002), and the boosted logistic regression classifier (LogitBoost). The Random Forest (RF) model and averaged neural networks (avNNet) were used as bagging classifiers (Breiman 2001). The bankruptcy prediction models were calibrated on the samples described in detail in Section 4.3. Calculations were performed with procedures written using R packages (https://cran.r-project.org/), in particular *caret*, *caretEnsemble*, *caTools*, *pROC*, *MASS*, *nnet*, *kernlab*, *rpart*, *earth*, *mgcv*, *klaR*, *gbm*, *plyr*, *randomForest* and other auxiliary ones. A 5-fold cross-validation (k = 5) approach was employed to calibrate the optimal models, with the area under the ROC curve (AUCROC) as the measure of the models' discriminant quality (effectiveness). Figure 8 illustrates how the classification effectiveness of the boosting ensemble model increases with the number of iterations of the boosting algorithm for classification trees of various complexity. It shows clearly how adding successive trees improves quality, which is why ensemble classifiers can surpass single (individual) classifiers.
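To illustrate what the boosting iterations in Figure 8 do, the following is a minimal stochastic gradient boosting sketch in the spirit of Friedman (2002). Each iteration fits a depth-1 regression tree (stump) to the negative gradient of the log-loss on a random subsample drawn without replacement and adds it with a learning rate (shrinkage). Leaf values here are plain residual means rather than the Newton-type estimates that production implementations commonly use, and all names and defaults are hypothetical; this is not the gbm package.

```python
import math
import random

def fit_regression_stump(X, r):
    """Least-squares stump fitted to residuals r (thresholds at midpoints)."""
    best = None
    for j in range(len(X[0])):
        vals = sorted({x[j] for x in X})
        for thr in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            left = [ri for xi, ri in zip(X, r) if xi[j] <= thr]
            right = [ri for xi, ri in zip(X, r) if xi[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((ri - lv) ** 2 for ri in left)
                   + sum((ri - rv) ** 2 for ri in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:                           # all sampled objects identical
        m = sum(r) / len(r)
        return 0, float("inf"), m, m
    return best[1:]

def sgb_fit(X, y, n_trees=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for log-loss; y coded 1 = bankrupt, 0 = healthy."""
    rng = random.Random(seed)
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))               # initial log-odds
    F = [f0] * len(X)
    trees = []
    for _ in range(n_trees):
        # random subsample without replacement (the "stochastic" part)
        idx = rng.sample(range(len(X)), max(2, int(subsample * len(X))))
        # negative gradient of the log-loss: y - sigmoid(F)
        resid = [y[i] - 1 / (1 + math.exp(-F[i])) for i in idx]
        j, thr, lv, rv = fit_regression_stump([X[i] for i in idx], resid)
        trees.append((j, thr, lv, rv))
        F = [f + learning_rate * (lv if x[j] <= thr else rv) for f, x in zip(F, X)]
    return f0, learning_rate, trees

def sgb_predict_proba(model, x):
    """Predicted probability of class 1 (here: PD)."""
    f0, lr, trees = model
    f = f0 + sum(lr * (lv if x[j] <= thr else rv) for j, thr, lv, rv in trees)
    return 1 / (1 + math.exp(-f))
```

Tracking a validation AUC after each added tree would reproduce, in miniature, the kind of learning curve shown in Figure 8; tree depth, learning rate and number of trees are the tuning parameters reported for the GBM model in Appendix A.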

**Figure 8.** GBM model training process with the use of the stochastic gradient boosting algorithm. Source: own elaboration using R package.

A table in Appendix A (Table A1) presents the final best configurations of the considered bankruptcy prediction models and optimum values of their parameters.

#### *5.3. Determining Score for the Optimum Model (Score Scaling Stage)*

The forecast values of the classification functions of the analyzed models (probability of default, PD) should be transformed into corresponding score values through appropriate scaling. In the calculations, it was assumed that a score of $Score_0 = 600$ corresponds to odds of 50:1 (Odds = 50) that the business is not at risk of default, and that the number of points which doubles these odds is pdo = 20. With these assumptions, the scaling parameters were estimated and the score function is given by $Score = 487.12 + 28.85 \cdot \ln\left(\frac{1 - PD}{PD}\right)$. Figure 9 illustrates the resulting score scaling for the GBM ensemble model on the training sample.

**Figure 9.** Score scaling in relation to the corresponding probability of default (PD) for the GBM model. Source: own elaboration using Excel.

#### *5.4. Model Validation (Validation Stage)*

Figure 10 presents ROC curves for the five classification models assessed. On the training sample, the GBM model predicted the membership of businesses in either class (bankrupt and healthy) perfectly, in 100% of cases (AUC = 1). The weakest of the models compared, Naive Bayes (NB), also had high prediction accuracy (AUC = 0.92), although it was clearly inferior to the other models.

**Figure 10.** ROC curves for models LDA, NB, C&RT, avNNet and GBM for the training sample. Source: own elaboration using the R package.

Figure 11 presents a graphic interpretation of KS = 0.89 (attained at score = 468) for the LDA model, assessed on the test and validation sample. A KS value this high indicates that the model separates the two classes rather effectively.

**Figure 11.** Interpretation of the Kolmogorov-Smirnov validation statistic for the LDA model and the test and validation sample: (**a**) Difference in cumulative distribution function for both classes relative to score; (**b**) Relationship of KS as the maximum difference between cumulative distribution functions for both classes relative to score. Source: own elaboration using Excel.

Figure 12 compares the very high discriminant capability of the ensemble GBM model (divergence Div = 92.1) with the relatively weaker discriminant capability of the LDA model (divergence Div = 2.6), both assessed on the training sample.

**Figure 12.** Score distribution for healthy and bankrupt businesses: (**a**) for the GBM model and very high divergence of distributions Div = 92.1; (**b**) for the LDA model and low divergence of distributions Div = 2.6. Source: own elaboration using Excel.

#### *5.5. Optimal Cut-Off Point Determination Stage*

The next step, carried out for the ensemble GBM forecasting model, which had the best classification properties as expressed by the validation measures, was to determine the optimal cut-off point below which the analyzed businesses are regarded as being at risk of default (bankrupt). In the calculations, the cost ratio was assumed to be $k_{NB}/k_B = 1/2$ (misclassifying a bankrupt business costs twice as much, since this error appears to be more detrimental in practical applications of the model), and the share of bankrupt businesses in the training sample was $p_B = 0.486$. Using Formula (8), the optimal cut-off point was found to be $score_{cut\text{-}off} = 386$. Therefore, all businesses with a score of 386 or less are forecast as members of the bankruptcy (B) class, and the remaining ones as members of the non-bankruptcy (NB) class. Nevertheless, for the estimated optimal ensemble GBM model, scores in the interval [387, 486] still carry a very high potential risk of default (PD > 0.5), with PD values determined on the basis of the training sample falling in the interval [0.51, 0.96]. Consequently, under the classical rule that treats a business with PD > 0.5 as bankrupt (at risk of default), the score interval 387 ≤ score ≤ 486 should be defined as a "gray zone", in which a business cannot be clearly assigned to either the bankruptcy or the non-bankruptcy class. Businesses in this zone were assessed as uncertain, leaning towards potential bankruptcy (contingent on unfavorable circumstances affecting their financial health).

Figure 13 presents an interpretation of the optimum cut-off point for the score, determined in the above manner.

**Figure 13.** Optimal score cut-off point for the GBM model. Source: own elaboration using Excel.

#### *5.6. Classification of Enterprises from the Podkarpacie Region (Prediction Stage) Depending on the Risk of Their Bankruptcy*

Applying the classification rule:

$$\text{class}(score) = \begin{cases} \text{bankrupt within } h \le 2 \text{ years}, & \text{if } score \le 386, \\ \text{uncertain (gray zone)}, & \text{if } 386 < score \le 486, \\ \text{healthy}, & \text{if } score > 486, \end{cases} \tag{9}$$

a forecast of bankruptcy (membership in one of the risk classes) was determined over a time horizon of at most 2 years (up to 2020) for businesses operating in the Podkarpackie Voivodeship, by sector of economic activity and by enterprise size. Table 4 is a contingency table presenting the forecast number of businesses classified into each of the 3 bankruptcy risk classes, by economic activity sector.
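Rule (9) can be written as a small function. The thresholds are the values reported in Section 5.5 (cut-off 386, upper gray-zone bound 486); the function name is hypothetical.

```python
def risk_class(score, cutoff=386, gray_upper=486):
    """Rule (9): three bankruptcy-risk classes from the GBM-based score."""
    if score <= cutoff:
        return "bankrupt"      # bankruptcy forecast within h <= 2 years
    if score <= gray_upper:
        return "uncertain"     # gray zone: above the cut-off but PD still > 0.5
    return "healthy"

print([risk_class(s) for s in (350, 386, 387, 486, 487, 600)])
# ['bankrupt', 'bankrupt', 'uncertain', 'uncertain', 'healthy', 'healthy']
```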

**Table 4.** Predicted number of businesses at risk of bankruptcy in time horizon h = 2 (until 2020) and predicted number of businesses in an uncertain condition in the Podkarpackie Voivodeship for various sectors.




Source: own elaboration using Statistica software.

Figure 14 presents the forecast probability of potential bankruptcy (over a horizon of up to h = 2 years) for the surveyed enterprises from the Podkarpacie region, by sector of activity, estimated with the optimal ensemble model (GBM) whose classification functions were used in the developed scoring model.

**Figure 14.** Descriptive statistics characterizing the probability distribution of bankruptcy (over a 2-year time horizon) for the surveyed enterprises from the Podkarpacie for various sectors of their business activities. Source: own elaboration using Statistica.

Figure 15, in turn, shows the predicted probabilities of bankruptcy for the surveyed enterprises from the Podkarpackie Voivodeship by enterprise size.

**Figure 15.** Descriptive statistics characterizing the probability distribution of bankruptcy (over a 2-year time horizon) for the surveyed enterprises from Podkarpackie depending on the size of the enterprise. Source: own elaboration using Statistica.

Table 5 presents an ex-post assessment of the classification effectiveness of the developed bankruptcy early warning model for 39 actual enterprises from the Podkarpackie Voivodeship whose bankruptcies were declared in 2019 and confirmed by court decisions available at the time of the study. These enterprises were included in the test sample of 2133 enterprises. The results indicate fairly good quality of the model: the ex-post effectiveness of the scoring model in recognizing new (not used at the calibration stage) actually bankrupt enterprises is about 79%, which seems an acceptable result, while for enterprises not threatened with bankruptcy the effectiveness is much better, at 95%.



#### **6. Discussion**

The comparative analysis of the classification effectiveness of ensemble models and several classical bankruptcy forecasting methods indicates that ensemble classifiers achieve considerably better values of validation measures, for both the training sample and the test sample, surpassing all of the analyzed base classifiers in terms of accuracy. The best ensemble classifier, GBM (decision trees supported by a stochastic gradient boosting algorithm), correctly classified all bankrupt and healthy businesses in the training sample (AC = 100%, ACB = 100%, ACNB = 100%) and over 99% of them in the test sample (Tables A2 and A3). Other validation statistics also indicate nearly perfect predictive capability of the GBM ensemble model, both for the training sample (AUCROC = 1, KS = 1, divergence Div = 92.1, information value IV = 5.3) and for the test sample (AUCROC = 0.99, KS = 0.99, Div = 22.1, IV = 7.1). The Generalized Additive Model (GAM) appears to be the best classical model, yet it displays inferior values of most validation statistics, both for the training sample (AC = 97%, AUCROC = 0.99, KS = 0.96, Div = 5.8, IV = 5.3) and for the test sample (AC = 97%, AUCROC = 0.99, KS = 0.96, Div = 43.0, IV = 7.1). This is consistent with the earlier findings of other authors and suggests that, in practical applications, bankruptcy models based on ensemble classifiers outperform classical approaches and are an interesting alternative to the conventional use of single classifiers.

Based on the probability of bankruptcy (Figure 14) estimated for the surveyed enterprises of the Podkarpackie Voivodeship in individual sectors of business activity (using the best ensemble classifier model, GBM, which has the best forecasting and classification capabilities) and on their predicted membership in the three bankruptcy risk classes (Table 4), the following comparative analysis assesses the exposure to bankruptcy risk of enterprises operating in the region in 2018 in view of their potential bankruptcy by 2020.

In sector A (farming, forestry and fishing), with a total of 50 enterprises surveyed, the developed scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 4% of all enterprises in this sector. When uncertain enterprises from the second bankruptcy risk class (the so-called "gray zone", i.e., with a significant probability of bankruptcy, $PD_{t=2} > 50\%$) are included, the percentage of potentially bankrupt enterprises (over a 2-year horizon) increases to 10%. The average probability of bankruptcy for enterprises in this sector is 11% (min = 0%, max = 99.9%). One in ten enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 43%–99.9%. The sector is therefore quite heavily exposed to the risk of bankruptcy.

In sector B (mining and extraction), with a total of 12 enterprises, the scoring model classified all enterprises as not threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 1% (min = 0%, max = 6.8%). One in ten enterprises in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) in the range of 2.3%–6.8%. It was therefore the first of the three least risky sectors of the region's economy.

In sector C (industrial processing), with a total of 581 enterprises, the scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 2% of all enterprises in this sector; when uncertain enterprises from the second bankruptcy risk class (the "gray zone") are included, the share of potentially bankrupt enterprises increases to 7%. The average probability of bankruptcy for enterprises in this sector is 7.4% (min = 0%, max = 100%). One in ten enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.7%.

Sector D (energy, water, gas and other energy sources), with a total of 25 enterprises, was the second of the three least risky sectors in the region's economy. The scoring model classified all enterprises as not threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 2.2% (min = 0%, max = 43.3%). One in ten enterprises in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) greater than 1.7%.

In sector E (waste, wastewater and sewage management), with a total of 67 enterprises, the scoring model classified 97% of enterprises as not threatened with bankruptcy and 3% as uncertain. The average probability of bankruptcy for enterprises in this sector is 3.8% (min = 0%, max = 83.7%). One in ten enterprises in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) greater than 9.2%.

In sector F (construction), with a total of 220 enterprises, the scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 3% of all enterprises in this sector; after including uncertain enterprises from the second bankruptcy risk class (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 8.6% (min = 0%, max = 100%). One in ten enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.5%.

In sector G (wholesale and retail), with a total of 734 enterprises, the scoring model predicted bankruptcy within a time horizon of up to two years (up to 2020) for 2% of all enterprises in this sector. After including uncertain enterprises from the second bankruptcy risk class (the "gray zone"), the percentage of potentially bankrupt enterprises rose to 5%. The average probability of bankruptcy for enterprises in this sector is 5.7% (min = 0%, max = 100%). One in ten enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 9.8%.

In sector H (transport and storage management), with a total of 75 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a horizon of up to two years (up to 2020); after including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increased to 7%. The average probability of bankruptcy for enterprises in this sector is 8.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 23.2%.

Sector I (accommodation and gastronomy), with a total of 56 enterprises, was the sector most exposed to the risk of bankruptcy. The scoring model predicts bankruptcy within a horizon of up to two years (up to 2020) for as much as 13% of all enterprises in this sector; after including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 20%. The average probability of bankruptcy for enterprises in this sector is 22.2% (min = 0%, max = 100%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 98.6%.

In sector J (information and communication), with a total of 55 enterprises, the scoring model classified 95% of enterprises as not threatened with bankruptcy and 5% as uncertain. The average probability of bankruptcy for enterprises in this sector is 6.1% (min = 0%, max = 89.4%). Every tenth enterprise in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) greater than 11.6%.

In sector K (finance and insurance), with a total of 12 enterprises, the scoring model classified 67% of enterprises as not threatened with bankruptcy and 33% as uncertain. The average probability of bankruptcy for enterprises in this sector is 28.2% (min = 0%, max = 96%). The top tenth of enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 95.2–96%. This is a very specific sector (the financial sector), so the model's assignment of its enterprises to risk classes is open to more than one interpretation.
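For the smallest sectors, and for sector L, a range is reported instead of a single threshold. One reading consistent with the figures is that the top tenth of n enterprises is rounded up to ceil(n/10) enterprises (two for n = 11 or 12, eight for n = 73) and the reported range spans their probabilities. The sketch below illustrates that reading only; the rounding rule, the function name `top_decile_range` and the sample values are assumptions, not a documented part of the method.

```python
import math

def top_decile_range(pds):
    """Range of PDs among the top tenth of enterprises, rounding the group size up.

    Assumption: the "top tenth" of n enterprises is taken as ceil(n / 10) of them.
    """
    if not pds:
        raise ValueError("empty sector")
    k = math.ceil(len(pds) / 10)
    top = sorted(pds, reverse=True)[:k]
    return min(top), max(top), k

# Made-up 12-enterprise sample whose two largest values mirror the sector K range.
print(top_decile_range([0.0] * 10 + [0.952, 0.96]))
```

With very few enterprises per sector, a single firm can move such a threshold substantially, which fits the cautious interpretation given for sectors K, R and S.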

In sector L (services for the property market), with a total of 73 enterprises, the scoring model predicted bankruptcy for 3% of all enterprises in this sector within a 2-year horizon (up to 2020); after including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 17%. The average probability of bankruptcy for enterprises in this sector is 15.8% (min = 0%, max = 99.7%). The top tenth of enterprises in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) in the range of 72.1%–99.7%. It is therefore also one of the sectors with high exposure to the risk of bankruptcy.

In sector M (scientific, specialist and technological activity), with a total of 61 enterprises, the scoring model predicted bankruptcy for 2% of all enterprises in this sector within a time horizon of up to two years (up to 2020). After including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increased to 5%. The average probability of bankruptcy for enterprises in this sector is 6.9% (min = 0%, max = 98.1%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 17.4%.

Sector N (administration and support), with a total of 43 enterprises, was also one of the sectors with high exposure to the risk of bankruptcy. The scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 5% of all enterprises in this sector; after including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 12%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 99.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 51%.

Sector P (education), with a total of only nine enterprises, was the third of the three least risky sectors in the region's economy. The scoring model classified all enterprises in this sector as not threatened with bankruptcy. The average probability of bankruptcy for enterprises in this sector is 3% (min = 0%, max = 19.1%). Every tenth enterprise in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) greater than 19%.

In sector Q (health and social care), with a total of 38 enterprises, the scoring model predicted bankruptcy within a 2-year horizon (up to 2020) for 3% of all enterprises in this sector; after including uncertain enterprises from the second class of bankruptcy risk (the "gray zone"), the percentage of potentially bankrupt enterprises increases to 8%. The average probability of bankruptcy for enterprises in this sector is 10.3% (min = 0%, max = 98.8%). Every tenth enterprise in this sector had a probability of bankruptcy over a 2-year horizon (up to 2020) greater than 39%.

In sector R (entertainment and leisure), with a total of 11 enterprises, the scoring model classified 82% of enterprises as not threatened with bankruptcy and as much as 18% as uncertain. The average probability of bankruptcy for enterprises in this sector is 11.3% (min = 0%, max = 61%). The top tenth of enterprises in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) in the range of 54.9%–61%. This is therefore another sector in which the model's assignment of enterprises to risk classes is open to more than one interpretation.

In the last sector, S (other services), with a total of 11 enterprises, the scoring model classified 91% of enterprises as not threatened with bankruptcy and 9% as uncertain. The average probability of bankruptcy for enterprises in this sector is 12% (min = 0%, max = 91.7%). Every tenth enterprise in this sector had a probability of bankruptcy over the 2-year horizon (up to 2020) greater than 17.7%. This is also a sector in which the model's assignment of enterprises to risk classes is open to more than one interpretation.

Based on the results in Table 4 and on the analysis of the estimated probability of bankruptcy (Figure 15) for the surveyed enterprises by size, the following relationships describing their exposure to the risk of bankruptcy can be observed. In the group of very small (micro) enterprises (535 of which were included in the study), the developed scoring model classified 89% of these enterprises as not threatened with bankruptcy, 4% as bankrupt and a further 7% as uncertain (from the "gray zone"), but potentially with a significant risk of bankruptcy (a probability above 50%). In the group of small enterprises (356 included in the study), the scoring model classified 91% of such enterprises as not threatened with bankruptcy, 3% as bankrupt and another 6% as uncertain (from the "gray zone"). In the group of medium-sized enterprises (606 included in the study), the scoring model classified 96% of enterprises as not threatened with bankruptcy, 2% as bankrupt and another 2% as uncertain (from the "gray zone"). Similarly, for large enterprises (636 included in the study), the scoring model classified 94% of enterprises as not at risk of bankruptcy, 2% as bankrupt and another 4% as uncertain (from the "gray zone").
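The size-class percentages above can be turned into approximate enterprise counts and a combined "potentially bankrupt" share (bankrupt plus gray zone). The sketch below uses only the figures quoted in the text; because those percentages are rounded, the counts are approximations, and the names `SIZE_CLASSES` and `size_class_summary` are illustrative.

```python
# (number of enterprises, % not threatened, % bankrupt, % gray zone), from the text
SIZE_CLASSES = {
    "micro": (535, 89, 4, 7),
    "small": (356, 91, 3, 6),
    "medium": (606, 96, 2, 2),
    "large": (636, 94, 2, 4),
}

def size_class_summary(n, pct_non, pct_bankrupt, pct_gray):
    """Approximate counts per risk class; inputs are rounded percentages."""
    return {
        "n": n,
        "non_bankrupt": round(n * pct_non / 100),
        "bankrupt": round(n * pct_bankrupt / 100),
        "gray": round(n * pct_gray / 100),
        # Share flagged as bankrupt or gray zone, in percentage points.
        "potentially_bankrupt_pct": pct_bankrupt + pct_gray,
    }

if __name__ == "__main__":
    for size, row in SIZE_CLASSES.items():
        print(size, size_class_summary(*row))
```

Read this way, about 11% of micro and 9% of small enterprises fall into the bankrupt or gray-zone classes, against 4% of medium-sized and 6% of large ones.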

Attention should also be paid to the limitations of the analyses presented. One limitation is that the developed and implemented scoring model was estimated on statistical data for enterprises from various sectors of activity. It is very difficult to develop a model with good accuracy (a sufficiently high classification efficiency) in such a situation, since individual sectors are often very specific and not directly comparable. On the other hand, the results obtained (Table 5) for the 39 actual bankruptcies of enterprises in the Podkarpackie Voivodeship observed and confirmed in 2018 show that the scoring model correctly recognizes about 79% of the truly bankrupt enterprises, while for non-bankrupt enterprises the corresponding figure is 95%. An effectiveness of 79% for the bankrupt class alone is sufficient and acceptable, although it is of course open to further discussion. It should be noted that the designed model distinguishes three classes of bankruptcy risk: bankrupt, non-bankrupt and the "gray zone" (difficult to classify, but potentially also bankrupt). In the classic approach with only two classes (bankrupt, non-bankrupt), another 8% should be added to the model's effectiveness, namely the uncertain "gray zone" enterprises, for which the probability of bankruptcy is high (greater than 0.5). The efficiency of correct classification of bankrupt enterprises by the estimated model then increases to 87%, which seems to be a good result. The overall accuracy of the model (without division into classes) is 94%.
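The 79% and 87% figures in the paragraph above are recall values for the actual-bankrupt class, computed first with only a "bankrupt" prediction counted as a hit and then with gray-zone predictions also counted. The sketch below shows this calculation on labeled predictions. The example counts for the 39 bankruptcies (31 bankrupt, 3 gray-zone and 5 non-bankrupt predictions) are back-calculated from the rounded percentages and are not reported in the text; the 100 non-bankrupt enterprises and the function name `bankruptcy_recalls` are purely hypothetical.

```python
from collections import Counter

def bankruptcy_recalls(actual, predicted):
    """Recall for actual bankruptcies (strict and with gray zone) and for non-bankrupt firms.

    actual: labels "bankrupt" / "non-bankrupt"
    predicted: labels "bankrupt" / "gray" / "non-bankrupt"
    """
    pairs = Counter(zip(actual, predicted))
    n_bankrupt = sum(c for (a, _), c in pairs.items() if a == "bankrupt")
    n_healthy = sum(c for (a, _), c in pairs.items() if a == "non-bankrupt")
    hits = pairs[("bankrupt", "bankrupt")]
    gray_hits = pairs[("bankrupt", "gray")]
    strict = hits / n_bankrupt                    # three-class view
    with_gray = (hits + gray_hits) / n_bankrupt   # two-class view: gray counts as bankrupt
    specificity = pairs[("non-bankrupt", "non-bankrupt")] / n_healthy
    return strict, with_gray, specificity

# 39 bankruptcies: counts back-calculated from the rounded 79% / 8% figures (assumption).
actual = ["bankrupt"] * 39 + ["non-bankrupt"] * 100  # 100 healthy firms: hypothetical
predicted = (["bankrupt"] * 31 + ["gray"] * 3 + ["non-bankrupt"] * 5
             + ["non-bankrupt"] * 95 + ["gray"] * 3 + ["bankrupt"] * 2)
strict, with_gray, spec = bankruptcy_recalls(actual, predicted)
print(f"{strict:.1%} {with_gray:.1%} {spec:.1%}")  # 79.5% 87.2% 95.0%
```

Reporting both recall values makes explicit how much of the two-class effectiveness depends on treating the gray zone as a bankruptcy signal.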

Moreover, the selection of as many as 19 indicators as determinants of the financial condition of enterprises raises the question of whether the set should be limited to only a few of the most important indicators. Such a large set may raise suspicions that many of the variables are strongly correlated with each other, which may affect the quality of the models, especially classical ones such as LDA. In this study, the size of the set resulted from the selection carried out with the wrapper method and a genetic algorithm, and from the final choice of ensemble classifiers, which are less sensitive to interdependence between variables. For the sake of accuracy, it is worth emphasizing that no correlation between the variables was greater than 0.87. Nevertheless, in future research it is worth considering reducing the number of bankruptcy predictors.
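Screening predictors for strong pairwise correlation, as in the 0.87 check mentioned above, can be sketched with plain Pearson correlations. This is an illustrative check rather than the study's procedure: the function names and the toy columns are hypothetical, and in practice the columns would be the 19 selected indicators, which are not listed in this section. The sketch assumes no column is constant (a constant column would make the correlation undefined).

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_correlation(columns):
    """Largest |r| over all pairs of predictors and the pair that attains it."""
    best_r, best_pair = 0.0, None
    for (na, xa), (nb, xb) in itertools.combinations(columns.items(), 2):
        r = abs(pearson(xa, xb))
        if r > best_r:
            best_r, best_pair = r, (na, nb)
    return best_r, best_pair

def pairs_above(columns, threshold):
    """Predictor pairs whose |r| exceeds the threshold (candidates for pruning)."""
    return [
        (na, nb)
        for (na, xa), (nb, xb) in itertools.combinations(columns.items(), 2)
        if abs(pearson(xa, xb)) > threshold
    ]

# Toy indicator columns (hypothetical, not the study's 19 indicators).
cols = {"x1": [1, 2, 3, 4, 5], "x2": [2, 4, 6, 8, 10], "x3": [5, 3, 4, 1, 2]}
print(max_abs_correlation(cols))  # (1.0, ('x1', 'x2'))
print(pairs_above(cols, 0.87))    # [('x1', 'x2')]
```

Dropping one variable from each flagged pair is one simple way to start the reduction of predictors suggested above, before re-estimating the models.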

#### **7. Conclusions**

The results of the analyses presented in the paper lead to several general conclusions that summarize the research:


**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflicts of interest.

#### **Appendix A**

**Table A1.** Optimum configuration and set of parameters for bankruptcy models applied.




Source: own elaboration and calculations using R and Statistica software.

**Table A2.** Validation statistics for selected classical models of single bankruptcy classifiers in comparison to ensemble classifiers applied for the training sample.




Source: own elaboration and calculations using R and Statistica software.

**Table A3.** Validation statistics for selected classical models of single bankruptcy classifiers in comparison to ensemble classifiers applied for the test/validation sample.




Source: own elaboration and calculations using R and Statistica software.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
