Article

The Application of Machine Learning in Diagnosing the Financial Health and Performance of Companies in the Construction Industry

by Jarmila Horváthová, Martina Mokrišová * and Alexander Schneider
Faculty of Management and Business, University of Prešov, Konštantínova 16, 080 01 Prešov, Slovakia
* Author to whom correspondence should be addressed.
Information 2024, 15(6), 355; https://doi.org/10.3390/info15060355
Submission received: 15 May 2024 / Revised: 9 June 2024 / Accepted: 13 June 2024 / Published: 14 June 2024
(This article belongs to the Special Issue AI Applications in Construction and Infrastructure)

Abstract

Diagnosing the financial health of companies and their performance is currently one of the key issues attracting the attention of researchers and experts in the fields of finance and management. In this study, we focused on proposing models for measuring the financial health and performance of businesses. These models were built for companies doing business within the Slovak construction industry. Construction companies are characterized by higher liquidity and a different capital structure compared to other industries. Therefore, simple classifiers are not able to effectively predict their financial health. In this paper, we investigated whether boosting ensembles are a suitable alternative for performance analysis. The result of the research is the finding that deep learning is a suitable approach for measuring the financial health and performance of the analyzed sample of companies. The developed models achieved perfect classification accuracy when using the AdaBoost and Gradient-boosting algorithms. The application of a decision tree as a base learner also proved to be very appropriate. The result is a decision tree with adequate depth and very good interpretability.

1. Introduction

The analysis of the financial health and performance of companies and the prognosis of their potential financial failure, even bankruptcy, are continually debated and examined topics among both academics and practitioners. Global problems, the pandemic, and internal problems within countries and companies create the preconditions for possible corporate failure. Therefore, companies must face many challenges in this regard. To avoid bankruptcy, they must apply increasingly advanced and sophisticated methods of diagnosing their financial health. By applying these methods, managers obtain the necessary information about their companies' financial health, which can be a useful tool for managing and improving performance.
In economic sciences, as well as in other leading fields, nonparametric methods are increasingly used in research instead of multi-criteria parametric methods. This is primarily due to the quality and usability of economic and financial data, which often do not meet the requirements for the application of parametric methods. Another important factor in the change in applied methods is that the amount of processed data is growing, which increases the demands on the tools and methods used to process and analyze these data. This is also confirmed by Berezigar Masten and Masten [1], who point out that large datasets are available and have grown significantly over the last 20 years. Equally important is the speed of development in the field of applicable technologies and statistical methods. These facts encourage the application of increasingly sophisticated methods in various scientific fields. The limitations of parametric methods were already pointed out in 1999 by Dimitras et al. [2], who applied a method that eliminates these limitations and thus achieved better results than parametric methods. Several nonparametric methods may be used to solve this problem. To illustrate, Gepp et al. [3] suggest that decision trees can achieve better classification accuracy than discriminant analysis (DA), and their application does not need to meet the assumptions required when applying DA. Olson et al. [4] argue that logistic regression (LR) provides less accurate results than decision trees when predicting bankruptcy. Durica et al. [5] applied decision trees (CART, CHAID) to classify prosperous and non-prosperous businesses and achieved 98% classification accuracy. Liu and Wu [6] pointed out the simplicity of interpreting the results of decision trees. Lee et al. [7] highlighted the decision tree as an important tool for predictive and classification analysis. The authors argue that although predictive analytics has existed for decades, it gained particular attention during the last decade of the 20th century with the application of new nonparametric methods, mainly data-mining methods and big data analysis.
Ensemble methods have been developed and studied to create models that predict outcomes with higher precision. The aim of these methods is to achieve better classification accuracy than single models by combining a set of simple or weak models [8]. Well-known combination methods include bagging and boosting [9]. Several studies have compared the classification accuracy of these methods with single models. For example, Karas and Režňáková [10] achieved much better classification accuracy when using the nonparametric method of boosted trees compared to linear discriminant analysis (LDA). Ben Jabeur et al. [11] proposed an improved XGBoost algorithm based on a selection of important features (FS-XGBoost) and confirmed that this model, combined with traditional feature selection methods, outperformed discriminant analysis, partial least squares discriminant analysis, logistic regression, support vector machines, the Multilayer Perceptron, and the Restricted Boltzmann Machine in terms of AUC.
Despite significant achievements in knowledge acquisition, traditional machine-learning methods (LDA, LR) may not achieve satisfactory performance when handling complex information, such as unbalanced, high-dimensional, or noisy data. This is due to the difficulty of capturing additional characteristics and underlying data structures, which is a limitation of these methods. Under these circumstances, the question of how to build an effective mining model for knowledge discovery comes to the fore. The concept of ensemble learning revolves around combining data fusion, data modeling, and data mining to create a single connected and comprehensive system.
Another of the possible applications in the given area, which falls under the issue of machine learning, is artificial neural networks. Results of their application are presented in studies [4,12,13,14].
When predicting the financial health of companies, methods that fall into a special field of machine learning—deep learning—have recently come to the fore. The application of these methods has greatly improved the state of the art in speech and visual object recognition and detection, as well as in many other areas, such as drug discovery and the prognosis of corporate bankruptcies. Deep convolutional networks brought a breakthrough in the processing of corporate insolvency forecasts, as well as in other areas involving image, video, speech, and sound processing [15].
Nevertheless, discriminant analysis and logistic regression are methodologies most often used in predicting a company’s financial health [3].
In this paper, an analysis of the financial health and performance of businesses in the construction industry was performed. This industry is unique because of its high capital intensity, the uniqueness of its projects, and long project periods [16,17]. Companies doing business within it achieve high liquidity values and have different capital structures [18] compared to businesses from other industries. For this reason, simple bankruptcy prediction methods are not suitable for companies in this industry. Ensemble methods appear to be an appropriate alternative. The application of these methods in the construction industry of transition economies is rare. This paper strives to fill this gap in scientific research by presenting insolvency prediction models for the construction industry built with ensemble algorithms. We assume that these models will achieve higher performance than simple ones. Based on this, we asked the following research question: Will ensemble bankruptcy prediction models for the construction industry achieve higher performance than simple models?
Ensemble models such as boosted trees are often used for bankruptcy prediction due to the high variance of the financial indicators that express the financial health of companies. The reason for this variance is the relatively limited sample size: most values of the selected indicators are concentrated in a narrow range, but some companies exhibit extreme values. This creates problems for gradient-based models such as neural networks or logistic regression and can lead to less effective predictions. Even after normalizing or standardizing the data, this problem is difficult to overcome. Boosted tree models, on the other hand, consider the order of feature values rather than the values themselves. This means that the results are not affected by extreme values and do not require any pre-processing effort.
The rest of the paper is organized as follows: Section 2 lists selected machine-learning methods and studies dealing with bankruptcy prediction. Section 3 specifies the sample used in the research and the initial set of financial features and describes Lasso regression and the selected simple and ensemble classifiers from the methodological point of view. Section 4 offers the results of bankruptcy prediction achieved by neural networks, decision trees, AdaBoost, and Gradient-boosting algorithms. Section 5 discusses the results achieved in this study in the context of similar studies and presents conclusions.

2. Literature Review

Currently, artificial intelligence (AI) techniques are coming to the fore as alternative methods to established approaches or as part of combined procedures. They are increasingly used to find answers to complex economic assignments, including predicting the failure of businesses. Artificial intelligence techniques manage to overcome the shortcomings of nonlinear programs and deal with corrupt and incoherent data. Their advantage is the ability to learn from examples; once trained, they can predict and provide generalized notions with ease and greater speed [19]. A subfield of AI is machine learning, which allows computers to learn even if they are not directly programmed [20]. There are two basic types of learning methodologies used in machine learning—supervised learning and unsupervised learning. The difference between them is that supervised learning provides a target value to guide training with the data, while unsupervised learning works without a target value. The second difference lies in their application to different problems. While supervised learning is applied in regression or classification settings, unsupervised learning is rather dedicated to solving association and clustering problems [21]. Often used machine-learning algorithms include Multiple Discriminant Analysis, Logistic Regression, Naïve Bayes, Bayesian Network, Support Vector Machines, Decision Tree, Random Forest, Bootstrapped aggregation (Bagging), AdaBoost, Gradient Boosting, K-Nearest Neighbor and Artificial Neural Network (ANN) [22].
Artificial neural networks are a category of parallel information processing models motivated by biological neural networks, which take into account various significant simplified adaptations [23]. They are able to learn through previous exposure by generalizing acquired knowledge to form new conclusions, and in this way, they can make useful decisions [9]. Recent studies show that ANNs have nonlinear, nonparametric adaptive learning properties that enable them to recognize and classify patterns [12] (p. 16). During the recent period, they have been used successfully to solve and predict numerous financial difficulties, including the prognosis of insolvency [12].
The pioneering attempts at artificial intelligence models imitating the organic nervous system can be traced back to the 1920s. It took only two decades for the basis of the scientific area dealing with artificial neural networks to be created. The first theory devoted to this cause was authored by McCulloch and Pitts [24]. In their work, they highlighted the possibility of an artificial neural network able to deal with arithmetic and logical algorithms. This work was influential for further scholars, who started to examine the practical application of neural networks [25]. In 1958, Rosenblatt proposed a definition of a neural network structure termed the perceptron. It was possibly the earliest "honest-to-goodness neural network tool", since it ran as a detailed simulation on an IBM 704 computer [26]. The perceptron was developed in line with biological fundamentals and demonstrated an ability to learn [27]. These attempts were halted because it was assumed that the linear neural network method was insufficient for dealing with more complex problems [25]. A breakthrough in the development of neural networks occurred during the 1970s and 1980s, when Rumelhart et al. [28] rediscovered the backpropagation procedure previously developed by Werbos in 1974 and established it as a widely accepted tool for training a multilayer perceptron. The aim is to find the minimum of the error function with respect to the connection weights [27,29]. Multilayer perceptrons (MLPs) with backpropagation learning algorithms, also known as multilayer feedforward neural networks, can handle a large class of problems that a single-layer perceptron is unable to resolve. Therefore, they have been used more than other types of neural networks for a wide variety of problems [30,31].
Another widely used machine-learning method is decision trees (DT). They create a hierarchical tree structure that divides data into leaves [9]. Branch nodes in decision trees store rules for classifications, which are used to cluster comparable samples into identical leaf nodes. As a result, they are used in both classification and regression tasks [32]. Commonly used decision tree methods encompass ID3, C4.5, C5, CHAID (Chi-squared Automatic Interaction Detection), CART (Classification and Regression Trees), Best First Decision Tree or AD Decision Tree [4,33]. Decision trees were first used to predict business failure by Frydman et al. [34]. These authors found that DT outperforms discriminant analysis in predicting financial failure. DTs were also used in corporate health prediction by Nazemi Ardakani et al. [35]. Olson et al. [4] used DT and confirmed its significant classification accuracy. The excellent prediction and classification accuracy of DT models was also confirmed by Gepp and Kumar [36].
Among the well-known machine-learning algorithms applicable to both classification and regression tasks, the Support Vector Machine (SVM) is highly acknowledged [37]. It is one of the most widely used classification models because of its positive results in large feature spaces [32]. The method was first introduced by Vapnik and Vapnik [38]. SVM attempts to identify the hyperplane with the largest margin to split the inputs into binary clusters, called the optimal separating hyperplane. In the case of linearly inseparable data, SVM maps the data onto a higher-dimensional feature space by means of a kernel function [32].
Over the past few decades, the area of machine learning has produced several significant advances in sophisticated learning procedures and powerful pre-processing methods. Among others, this includes the further development of ANNs towards deeper neural network architectures with advanced learning capability, summarized as deep learning [39]. A deep ANN is based on a nonlinear model with multiple hidden layers, allowing it to capture complex relationships between input and output [40]. The benefit deep learning provides over machine learning is that manually extracted or hand-crafted features are no longer needed. Deep learning derives features automatically from the raw input, processes them, and determines further actions that depend on them [40].
In the discipline of pattern recognition and machine learning, combinations of multiple classifiers have frequently been utilized. These combined approaches are known as ensemble classifiers [9]. They have repeatedly delivered better results than a single classifier [6]. The easiest way to combine different classifiers is majority voting: the outputs of the k individual classifiers are merged, and the class with the highest count of votes is chosen as the final classification outcome. Another way of combining classifiers is bagging, in which a number of classifiers are trained separately on distinct training sets generated with the bootstrap method [9]. Another well-known method of combining classifiers is boosting. With this method, the importance of incorrectly predicted training instances is increased in subsequent iterations, and thus the classifiers learn from the errors of previous iterations [41]. Examples of boosting ensembles are AdaBoost and Gradient Boosting [41]. These algorithms have been used in previous studies, and their performance has been compared with that of base learners. Kim and Upneja [33] used decision trees and AdaBoosted decision trees to determine the critical features of the financial problems of publicly traded US restaurants in the period 1988–2010. AdaBoosted decision trees achieved superior prediction results and the lowest type I error. The outcome of this model showed that restaurants in financial distress were heavily dependent on debt, achieved worse current ratios, grew their assets more slowly, and had lower net profit margins than non-distressed restaurants. The authors suggested applying the AdaBoosted decision tree as an early warning system for predicting restaurants' financial distress. Wyrobek and Kluza [42] compared the performance of Gradient-boosted trees with other methods. They found that Gradient-boosted trees performed much better than Linear Discriminant Analysis and Logistic Regression and slightly better than Random Forest. According to these authors, the main advantages of Gradient-boosted trees lie in dealing very well with outliers, missing data, and non-normally distributed variables. They also automatically expose nonlinear interactions among features and adjust to them. Alfaro et al. [43] conducted an empirical study comparing the performance of AdaBoost and neural networks for predicting bankruptcy. Their findings, based on a set of European firms, indicate that the AdaBoost method can lower the generalization error by approximately 30% compared to the error generated by a neural network.
An overview of selected AI methods applied to predict corporate financial collapse is listed in Table 1.

3. Methodology

The sample under consideration included businesses from the Slovak construction industry, SK NACE 41—Construction of buildings (NACE is the classification of economic activities in the European Union). Financial data were obtained from the CRIF Agency [68]. When preparing the research sample for the analysis, enterprises with incomplete records and zero sales were removed from the set of all enterprises doing business within SK NACE 41. In the next step, it was necessary to identify and remove outliers from the analyzed sample. For this purpose, kernel density estimates were created for all analyzed indicators. After removing enterprises with incomplete records, zero sales, and outliers, we continued to work with a set of 1349 enterprises. The analysis was performed using 24 financial features from all areas of financial health assessment: profitability, asset management, liquidity, and capital structure. The selected financial features and the formulas for their calculation are listed in Table 2.
Three criteria [69] were used to determine the assumption of prosperity for the analyzed sample of companies (see Figure 1). Businesses were classified as non-prosperous if they met all these criteria. In this paper, we supposed that non-prosperity is a prerequisite for bankruptcy. The research sample consisted of 1282 prosperous and 67 non-prosperous businesses.
In this paper, Lasso regression was used to find the most appropriate financial features in terms of bankruptcy prediction. Lasso regression was introduced by Tibshirani [70]. It identifies the variables and their associated regression coefficients that lead to a model with the minimum prediction error. This is achieved by constraining the parameters of the model, which "shrinks" the regression coefficients towards zero [71]. Lasso was performed in the software Statistica 14.1.0.8 using Lasso logistic regression. The penalized log-likelihood function of Lasso logistic regression that needs to be maximized can be written as (1) [72]:

$$l_\lambda^L(\beta) = \sum_{i=1}^{n} \left[ y_i x_i \beta - \log\left(1 + e^{x_i \beta}\right) \right] - \lambda \sum_{j=1}^{k} \left|\beta_j\right|, \tag{1}$$

where $\lambda \geq 0$ is the penalty parameter, $\beta$ is the column vector of the regression coefficients, $x_i$ are the independent variables, $y_i$ is the binomial dependent variable, $n$ is the number of observations, and $k$ is the number of variables.
The penalty parameter λ was determined based on the minimum prediction error in cross-validation (λmin).
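As an illustration, the feature selection step could be reproduced in Python with scikit-learn, the library used later for the prediction models (the paper itself performed Lasso in Statistica 14.1.0.8). This is a minimal sketch under stated assumptions: the data matrix `X`, the labels `y`, the solver, and the grid size are placeholders, not the study's actual data or settings.

```python
# Sketch: L1-penalized (Lasso) logistic regression with the penalty chosen by
# 10-fold cross-validation, analogous to selecting lambda_min as described above.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1349, 24))                 # placeholder for the 24 financial features
y = (rng.random(1349) < 0.05).astype(int)  # placeholder labels (~5% non-prosperous)

# Cs is a grid of inverse penalty strengths (C = 1/lambda); cross-validation
# picks the value minimizing the log-loss, mirroring the lambda_min rule.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=50, cv=10, penalty="l1", solver="saga",
                         scoring="neg_log_loss", max_iter=10000),
)
model.fit(X, y)

coefs = model.named_steps["logisticregressioncv"].coef_.ravel()
selected = np.flatnonzero(np.abs(coefs) > 1e-8)  # features not shrunk to zero
print("Selected feature indices:", selected)
```

Features whose coefficients are shrunk exactly to zero are discarded; the remaining ones correspond to the Lasso-selected predictors reported in Section 4.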
Two nonparametric methods were applied to predict bankruptcy—neural networks and decision trees.
Neural networks are currently considered one of the best machine-learning algorithms. The theory of neural networks is based on neurophysiological knowledge. It tries to explain behavior based on the principle of processing information in nerve cells. ANNs are sometimes called brain-without-mind models [73]. Recently, machine-learning techniques, especially ANNs, have been widely investigated with respect to bankruptcy prediction, as they have been confirmed as good predictors and classifiers, especially when classifying companies according to their risk of possible bankruptcy [17]. In this research, we applied feedforward ANN with multiple hidden layers.
The basic structure of a neural network is a directed graph with vertices (neurons), which are arranged in layers and connected by edges (synapses). The input layer consists of all the inputs in separate neurons, and the output layer consists of the dependent variables. The function of the simplest multilayer perceptron (MLP) can be written with the use of the following Formula (2) [74]:

$$o(x) = f\left(w_0 + \sum_{i=1}^{n} w_i x_i\right) = f\left(w_0 + \mathbf{w}^T \mathbf{x}\right) \tag{2}$$

where $w_0$ is the intercept, $\mathbf{w} = (w_1, \ldots, w_n)$ is the vector of all synaptic weights except the intercept, and $\mathbf{x} = (x_1, \ldots, x_n)$ is the vector of all inputs. The flexibility of the model can be improved by adding hidden layers. The MLP function with a hidden layer comprising $J$ neurons can be written as follows (3):

$$o(x) = f\left(w_0 + \sum_{j=1}^{J} w_j \cdot f\left(w_{0j} + \sum_{i=1}^{n} w_{ij} x_i\right)\right) = f\left(w_0 + \sum_{j=1}^{J} w_j \cdot f\left(w_{0j} + \mathbf{w}_j^T \mathbf{x}\right)\right) \tag{3}$$

where $w_0$ is the intercept of the output neuron, $w_j$ is the synaptic weight of the $j$th hidden neuron, $w_{0j}$ is the intercept of the $j$th hidden neuron, and $\mathbf{w}_j = (w_{1j}, \ldots, w_{nj})$ is the vector of all synaptic weights leading to the $j$th hidden neuron. All hidden and output neurons calculate the output $f(g(z_0, z_1, \ldots, z_k)) = f(g(\mathbf{z}))$ from the outputs of all preceding neurons $z_0, z_1, \ldots, z_k$, where the integration function $g$ and the activation function $f$ are $g: \mathbb{R}^{k+1} \rightarrow \mathbb{R}$ and $f: \mathbb{R} \rightarrow \mathbb{R}$. The integration function $g$ can be defined as (4):

$$g(\mathbf{z}) = w_0 z_0 + \sum_{i=1}^{k} w_i z_i = w_0 + \mathbf{w}^T \mathbf{z} \tag{4}$$
As an activation output function, SoftMax was used, which is a combination of several sigmoid functions. The SoftMax function, as opposed to the sigmoid functions used in binary classification, can be applied to multiclass classification problems. The output of the SoftMax function is a value between 0 and 1, while the difference compared to the sigmoid functions is that the sum of the output neurons is 1. The formula for the activation function is as follows (5) [75]:
$$f(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K, \text{ while } K > 2 \tag{5}$$
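To make Formula (5) concrete, the following numeric sketch (plain NumPy, not tied to the study's data) shows the two properties noted above: outputs fall between 0 and 1 and, unlike independent sigmoid outputs, sum to 1.

```python
# Numeric illustration of the SoftMax activation in Formula (5).
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - np.max(z)   # shifting by the max improves numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # arbitrary output-layer activations
probs = softmax(scores)
print(probs)                         # approx. [0.659 0.242 0.099]
print(probs.sum())                   # 1.0
```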
Decision trees are nonparametric discrimination algorithms that are increasingly popular because of their intuitive interpretability characteristics [4]. When building a decision tree, recursive splitting is used to fit a tree to the training sample. During this process, the training sample is gradually divided into increasingly homogenous subsets using specific criteria [76]. Various decision tree algorithms can be used. One of the most successful ones is the CART algorithm [77] applied in this study. The CART algorithm introduced by Breiman [78] is used to build both classification and regression trees. The construction of the classification tree using CART is based on binary partitioning of attributes [79].
Let us denote the training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, and a label vector $y \in \mathbb{R}^l$. A decision tree recursively splits the feature space in such a way that samples with similar or identical features and target values are grouped together. Let the data at node $m$ be represented by $Q_m$ with $n_m$ samples. Each candidate split $\theta = (j, t_m)$, formed by a feature $j$ and a threshold $t_m$, splits the data into the subsets $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$ as follows (6) [80]:

$$Q_m^{left}(\theta) = \left\{(x, y) \mid x_j \leq t_m\right\}, \qquad Q_m^{right}(\theta) = Q_m \setminus Q_m^{left}(\theta) \tag{6}$$
A crucial decision when creating a classification tree is the selection of the feature on which to perform further partitioning. A specific measure of the “impurity” of a set of cases can be used. It is the extent to which training cases from several classes are contained in a node [76]. In this paper, the Gini index (GI) was used as an impurity measure.
At the moment when a node splits into two child nodes, GI is calculated for each child node $GI(i)$ as follows (7):

$$GI(i) = \sum_{c=1}^{J} p_{tc}\left(1 - p_{tc}\right) = 1 - \sum_{c=1}^{J} p_{tc}^2 \tag{7}$$

where $GI(i)$ is the Gini index at child node $i$ and $p_{tc}$ is the share of cases from class $c$ at node $t$.
The total value of the Gini index for a given split ($GI_{total}$) (8) is equal to the weighted sum of the GI indices of the individual child nodes, with weights set according to the size of each child node. Therefore, we calculate $GI_{total}$ as the sum of the $GI(i)$ of the child nodes, each multiplied by the share of observations in the given child node out of the total number of observations in the parent node [76,81]:

$$GI_{total} = \sum_{i=1}^{K} \frac{N_i}{N_t} GI(i), \tag{8}$$

where $K$ is the number of child nodes (for a binary tree, $K = 2$), $N_t$ is the number of observations at node $t$, and $N_i$ is the number of observations at child node $i$.
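A short worked example of Formulas (7) and (8), with purely illustrative counts, shows how CART scores a candidate binary split; the split with the lowest weighted Gini index is preferred.

```python
# Worked example of Formulas (7) and (8): Gini impurity of two child nodes
# and their weighted total for one candidate split. Counts are illustrative.
def gini(class_counts):
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

left = [917, 3]    # hypothetical child node: 917 prosperous, 3 non-prosperous
right = [73, 13]   # hypothetical child node: 73 prosperous, 13 non-prosperous
n_total = sum(left) + sum(right)

gi_total = (sum(left) / n_total) * gini(left) + (sum(right) / n_total) * gini(right)
print(f"GI(left)={gini(left):.4f}, GI(right)={gini(right):.4f}, GI_total={gi_total:.4f}")
```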
To enhance the classification accuracy of weak learners (in our case, decision trees), one of the ensemble techniques—boosting—was used in this paper. The first and still the most widely used boosting algorithm today is AdaBoost [82], which was presented in the study of Freund and Schapire [83].
The AdaBoost algorithm is designed to classify tasks into two classes. In this algorithm, trees "grow" gradually. Each tree uses the information from the previous tree, so instead of random selections with repetition, a modification of the original data file is used. The algorithm works by gradually adjusting the weights of the learning examples based on previous results. In the first iteration, the learning algorithm formulates a hypothesis from learning examples with equal weights: if the total number of observations is n, each starting weight equals 1/n. In the following steps, the model is created using a weight vector that increases the weights of incorrectly classified observations and decreases the weights of correctly classified ones. The classification method thus increasingly "focuses attention" on "difficult" observations that cannot be assigned to the correct class. When creating the trees, residuals are used as the dependent variables [84,85].
Let us denote the indicator function $\mathbb{1}_m(y_i)$, which takes on the value 1 if observation $i$ is misclassified in step $m$ and 0 otherwise. Next, denote the weight of observation $i$ in step $m$ as $w_m(i)$. The weights of the observations are always non-negative, and their sum is equal to 1. Subsequently, let us calculate $e_m$, the weighted error in step $m$, as follows (9):

$$e_m = \sum_{i=1}^{n} w_m(i)\, \mathbb{1}_m(y_i) \tag{9}$$
In the next step, the weights of the correctly classified observations are multiplied by a constant calculated according to the following Formula (10):

$$w_m(i) \leftarrow \frac{e_m}{1 - e_m} \times w_m(i) \tag{10}$$

All weights then need to be multiplied by a suitable constant to maintain the relationship (11):

$$\sum_{i=1}^{n} w_m(i) = 1 \tag{11}$$
Usually, the algorithm is stopped after a sufficient number of iterations, e.g., 100, or when the error $e_m$ leaves the interval $(0, 0.5)$.
For the final classification of the possible failure of enterprises, the classifications of the individual trees are combined, with each tree assigned a weight. Let us denote the individual trees $T_m \in T$. The weight assigned to each tree is as follows (12):

$$w_m(T) = \text{learning rate} \times \log\left(\frac{1 - e_m}{e_m}\right) \tag{12}$$

The learning rate is a parameter specified by the user. Subsequently, the individual weights are calculated, and the class variants are evaluated. The variant that achieves the highest sum of weights is selected.
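The boosting scheme above is available off the shelf in scikit-learn, the library used for the models in this study. The sketch below is illustrative only: the depth of the base tree, the number of iterations, and the learning rate are assumptions, not the study's actual settings, and the synthetic data merely mimic the sample's class imbalance.

```python
# Sketch: AdaBoost with a shallow CART base learner, following the algorithm
# described above. Hyperparameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real sample: ~95% prosperous, ~5% non-prosperous.
X, y = make_classification(n_samples=1349, n_features=9, weights=[0.95],
                           random_state=0)

abt = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # weak learner; named
                                                    # base_estimator in older
                                                    # scikit-learn versions
    n_estimators=100,   # stop after 100 boosting iterations, as mentioned above
    learning_rate=0.1,  # the user-specified learning-rate constant in (12)
)
abt.fit(X, y)
print(f"Training accuracy: {abt.score(X, y):.4f}")
```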
Gradient boosting, introduced by Friedman [86], is a numerical optimization algorithm that aims to find a model that eliminates the errors of the previous models. At each step, it iteratively adds a new DT that best reduces the loss function [87]. The methodology of applying Gradient boosting with a decision tree as a base learner was inspired by Friedman [86] and Bentéjac et al. [88].
Given a training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, Gradient boosting aims to find an approximation $\hat{F}(x)$ of the function $F^*(x)$ mapping instances $x$ to their output values $y$ that minimizes the expected value of a chosen loss function $L(y, F(x))$. The models of the ensemble (e.g., DTs) serve as these functions. The approximation is built iteratively. Let us initialize the model with a constant-value prediction $F_0$ (13):

$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \tag{13}$$

where $L$ is the loss function and $\gamma$ is the predicted value; $\arg\min_{\gamma}$ means we search for the value of $\gamma$ that minimizes $\sum_{i=1}^{n} L(y_i, \gamma)$.
For $m = 1, 2, \ldots, M$, where $M$ is the number of decision trees, the aim is to:

1. Find the pseudo-residuals (observed value − predicted value), where the predicted value is the value forecasted by the prior model;
2. Compute the pseudo-residuals using Formula (14):

$$r_{mi} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n \tag{14}$$

where $F_{m-1}(x)$ is the forecast of the base model;

3. Fit a decision tree $h_m(x)$ to the pseudo-residuals and find the output value $\gamma_m$ of each leaf using Formula (15):

$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right) \tag{15}$$

where $h_m(x_i)$ is the decision tree fitted to the pseudo-residuals;

4. Update the model as follows (16):

$$F_m(x) = F_{m-1}(x) + \nu_m h_m(x) \tag{16}$$

where $\nu_m$ is the weight of the $m$th function $h_m(x)$.
If the iteration process is not properly regularized, the above algorithm may suffer from overfitting. With some loss functions, the model can perfectly fit the pseudo-residuals; in that case, the pseudo-residuals in the next iteration are equal to zero, and the process is terminated prematurely. Several regularization hyperparameters are considered to control the additive Gradient-boosting process. A natural way to regularize Gradient boosting is to use shrinkage to scale down each gradient step by $\nu \in (0, 1]$. The value of $\nu$ is most often set to 0.1 [88].
Furthermore, regularization can also take place by reducing the complexity of the trained models. In the case of decision trees, it is possible to regulate the depth of the tree as well as the minimum number of instances needed to split a node [88].
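In scikit-learn, the regularization handles just described map directly onto constructor parameters. The following sketch shows this mapping; the specific values are illustrative assumptions rather than the settings used in the study.

```python
# Sketch: Gradient boosting over CART base learners with the regularization
# discussed above: shrinkage (learning_rate), tree depth, and the minimum
# number of instances needed to split a node. Values are illustrative.
from sklearn.ensemble import GradientBoostingClassifier

gbt = GradientBoostingClassifier(
    n_estimators=100,      # M, the number of decision trees in (13)-(16)
    learning_rate=0.1,     # shrinkage nu, commonly set to 0.1 [88]
    max_depth=3,           # limits the complexity of each trained tree
    min_samples_split=20,  # minimum instances required to split a node
)
# gbt.fit(X_train, y_train) then runs the iterative scheme of Formulas (13)-(16).
```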
When building the bankruptcy prediction models, 20% of the data were used for testing and the remaining 80% for training. K-fold cross-validation (k = 5) was applied to ensure that the observations in each set were equally distributed between prosperous and non-prosperous businesses. The following classification accuracy measures were used to measure the performance of the models: Accuracy, which measures the percentage of correctly classified cases [89]; Precision (also called confidence), which expresses the proportion of predicted positive cases that are true positives; Recall (also called sensitivity), which expresses the proportion of true positive cases that are correctly predicted positive [90]; and the F1-score, which is defined as the harmonic mean of precision and recall [91]. In addition, AUC was used, as it is not affected by the prior class distributions [92] and is considered a better measure than accuracy [93]. The models were built in the Python library scikit-learn.
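The evaluation protocol can be sketched as follows; the estimator, the synthetic data, and the random seed are placeholders, and `StratifiedKFold` is used so that each fold preserves the prosperous/non-prosperous ratio.

```python
# Sketch of the evaluation protocol: an 80/20 train/test split, stratified
# 5-fold cross-validation, and the five reported metrics.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

X, y = make_classification(n_samples=1349, n_features=9, weights=[0.95],
                           random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)  # 80% train / 20% test

cv = StratifiedKFold(n_splits=5)  # equal class proportions in every fold
scores = cross_validate(GradientBoostingClassifier(), X_train, y_train, cv=cv,
                        scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])
for metric in ("accuracy", "precision", "recall", "f1", "roc_auc"):
    print(metric, round(scores[f"test_{metric}"].mean(), 4))
```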

4. Results

The analyzed sample of companies from the construction industry achieves acceptable liquidity results, which is in line with the findings of several authors. The median of the current ratio, as a representative of financial risk, reaches a value of 1.26 (see Table 3). This result is higher than 1.2, which rating agencies consider to be the threshold in financial risk assessment. Extreme values were also recorded for this indicator, with a maximum of 32.91 and a minimum of 0.15. Such extreme values are a shortcoming of the sample of companies. Therefore, the application of methods such as DT, which are not affected by extreme values, is very suitable. The safety of the analyzed sample of companies is low, which can be seen in the values of the NWCTA and NWCCA indicators, whose averages do not reach the required values. The profitability indicators, as important predictors of possible bankruptcy, achieve positive results on average. However, even for these indicators, some companies achieve extremely low values and represent outliers in the given area. The ROE and ROSEBITDA indicators achieve the best results. The problematic area of the analyzed sample of businesses is the capital structure since, in this industry, debt is higher than equity. A significant shortcoming affecting the financial health of the sample under investigation is the long receivables turnover period.
The most relevant features in terms of bankruptcy prediction were identified by the Lasso logistic regression performed in software Statistica 14.1.0.8. These features were selected based on the optimal value of λ determined according to the minimum prediction error of the model when using 10-fold cross-validation (λmin). The most relevant features at the optimal value of λ m i n = 0.003 , and their coefficients are as follows: TDTA (4.321), ROC (−4.175), ROE (−1.729), SLTA (1.812), NWCTA (−1.449), NCFTA (−1.310), AT (−0.203), FL (0.008), ELLFAR (−0.004). A graphical representation of these results is shown in Figure 2. The coefficients of the other features were shrunk to 0.
A frequently applied AI method is a neural network. In this paper, a five-layer feedforward ANN was built. The structure of the network was chosen with two conditions in mind: to make the network robust enough to extract features and to prevent its overfitting. The optimization algorithm Adam (Adaptive Moment Estimation) was applied. The architecture of this network with three hidden layers can be seen in Figure 3.
The neural network takes 7 inputs, which are the financial features selected by the Lasso logistic regression. The hidden layers comprise 16, 32, and 16 neurons plus threshold values (biases). The output layer represents the dependent variable—bankruptcy—and contains two neurons giving the final decision. As a result, there are two groups of businesses—businesses that are threatened with bankruptcy and businesses that are not. The ANN parameters are listed in Table 4.
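The architecture just described could be expressed in scikit-learn (the library used for the models in this study) roughly as follows. The hidden activation, iteration cap, and random seed are assumptions not specified in the text; the output activation over the two classes is handled internally by `MLPClassifier`.

```python
# Sketch of the feedforward network described above: 7 Lasso-selected inputs,
# hidden layers of 16, 32, and 16 neurons, and the Adam optimizer.
from sklearn.neural_network import MLPClassifier

ann = MLPClassifier(
    hidden_layer_sizes=(16, 32, 16),  # three hidden layers, as in Figure 3
    activation="relu",                # assumed hidden-layer activation
    solver="adam",                    # Adam (Adaptive Moment Estimation)
    max_iter=1000,                    # assumed training budget
    random_state=0,
)
# ann.fit(X_train_selected, y_train) would train on the 7 selected features.
```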
The classification accuracy of the neural network was evaluated using several performance measures. The overall accuracy of the network reached 97.04% (see Table 5). The network reached higher precision (98.74%) than recall (69.23%). The F1-score of the neural network was 0.69, while according to the AUC results, the classification accuracy of the network was good.
Another artificial intelligence method applied to predict bankruptcy in this study is a decision tree built by the CART algorithm. It used the features selected by the Lasso logistic regression. DT ranked these features according to their importance for improving financial health and preventing possible bankruptcy. The most important feature was the Return on costs (ROC), followed by the capital structure indicators TDTA and SLTA (see Figure 4).
The results of the DT are graphically illustrated by the diagram (Figure 5). The decomposition of the DT takes place hierarchically. The left branch of the tree is limited by the threshold value of TDTA ($TDTA \leq 1.007$), and 1006 businesses were assessed by that rule, i.e., most of the prosperous businesses (990 of 1025) and a part of the non-prosperous businesses (16 of 54). To increase the accuracy of the model, node 2 was further split, and another branch was created. Businesses that reached $TDTA \leq 0.932$ were classified as not at risk of bankruptcy; there were 917 such businesses. A total of 89 businesses that reached $TDTA > 0.932$ were further classified using other variables. We can state that businesses with $0.932 < TDTA \leq 1.007$ are at risk of bankruptcy if $ROC \leq 0.006$ and, at the same time, $NWCTA \leq 0.007$.
The inequality $TDTA \leq 1.007$ was not satisfied by 73 businesses, 35 of them prosperous and 38 non-prosperous. Node 3 was therefore further split, and another branch was created. Businesses that reached a positive value of ROC were classified as not at risk of bankruptcy; there were 31 such businesses. Businesses with a negative ROC value were further classified based on the value of NWCTA. We can state that businesses with $TDTA > 1.007$ are at risk of bankruptcy if $ROC \leq 0$ and, at the same time, $NWCTA \leq 0.091$.
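Because the tree is shallow, its decision logic can be transcribed directly as code, which illustrates the interpretability advantage discussed later. The function below is a reading of the thresholds quoted above from Figure 5, not an independently validated model.

```python
# The classification rules read off the decision tree (Figure 5), transcribed
# directly from the thresholds quoted in the text.
def at_risk_of_bankruptcy(tdta: float, roc: float, nwcta: float) -> bool:
    if tdta <= 1.007:
        if tdta <= 0.932:
            return False                  # classified as not at risk
        # 0.932 < TDTA <= 1.007: decided by profitability and safety
        return roc <= 0.006 and nwcta <= 0.007
    # TDTA > 1.007
    if roc > 0:
        return False                      # positive ROC: not at risk
    return nwcta <= 0.091                 # ROC <= 0 and low NWCTA: at risk

print(at_risk_of_bankruptcy(tdta=1.05, roc=-0.02, nwcta=0.05))  # True
```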
The performance metrics of the DT are listed in Table 6. The overall accuracy of the model was 98.89%, while the model achieved higher precision (92.05%) than recall (85.27%). The F1-score of the model was equal to 0.88, while according to the results of its AUC, the model achieved perfect classification accuracy.
Two ensemble techniques were used to enhance the decision trees' classification accuracy—AdaBoosted trees (ABT) and Gradient-boosted trees (GBT). The feature importance results in the case of ABT identified the three most important features at about the same level—the capital structure indicator TDTA, the profitability indicator ROC, and the safety indicator NWCTA (see Figure 6).
When applying ABT, there was a slight improvement in overall accuracy compared to DT (see Table 7). However, precision (from 92.05% to 100%), recall (from 85.27% to 94.07%), and thus F1-score increased significantly (from 0.88 to 0.97). The value of AUC 0.97 indicates perfect classification accuracy of the model.
In the case of the GBT application, profitability (ROC) appears to be the most important factor in predicting bankruptcy (see Figure 7). It is followed by the capital structure indicators TDTA and SLTA. The safety indicator NWCTA is also significant.
When using GBT, an increase in classification accuracy compared to DT was also demonstrated, while this model achieved the same performance measures as ABT (see Table 8).

5. Discussion

Empirical research focused on the verification of decision trees, ensemble methods, and artificial neural networks in predicting the financial failure of businesses provides interesting results. The summary of the results is listed in Table 9. The table also shows the results of MDA and LR, which are not part of this empirical study but arise from our earlier research studies.
The ranking of the applied methods according to the results of the individual evaluation criteria is shown in Table 10.
Table 10 shows that the ensemble models achieved the best results in all performance criteria. ANN achieved significantly worse results, ranking just behind DT in classification accuracy. The application of the boosting methods to DT clearly increased its classification accuracy. Compared to the ensemble models, MDA achieved the worst results, while LR achieved a slightly better position. The ensemble models achieved the best classification accuracy for both prosperous and non-prosperous businesses: 100% for prosperous businesses and 94.07% for non-prosperous businesses. The overall classification accuracy of the ensemble models was high as well, reaching 99.7%.
However, some problems can occur when training DTs. These include determining the depth of the decision tree, choosing an appropriate method of attribute selection, and dealing with training data with missing values. Overfitting problems can occur as well. A disadvantage of DT arises when it is built with many features while the research sample is small. However, DT also has advantages, such as generating understandable rules, not requiring difficult calculations, working with continuous and categorical variables, and, last but not least, achieving excellent classification and prediction accuracy [94].
The results of this study can be compared with the study by Tsai et al. [9], who applied DT ensembles on three datasets and achieved 88.36% accuracy, while ANN achieved 86.6% accuracy. They confirmed that ensemble models achieve better classification accuracy compared to simple models, with DT ensembles using boosting methods performing best. Kim and Upneja [33] found that AdaBoosted decision trees constructed to address financial problems achieved superior prediction results and the lowest type I error compared to decision trees. Alfaro et al. [43] concluded that the ensemble model based on AdaBoost achieved higher classification accuracy than ANN. The better classification accuracy of ensemble-learning models for credit scoring compared to traditional ones was also confirmed by the findings of Li et al. [95]. Hung and Chen [32] proposed a selective ensemble of three classifiers, i.e., the decision tree, the backpropagation neural network, and the support vector machine, and concluded that it performs better than other weighting or voting ensembles for bankruptcy prediction.
The results of this study are supported mainly by the study of Heo and Yang [18], who pointed out that many studies focus on the application of different models for the classification of bankrupt companies. However, according to the authors, these are mainly applications for general companies. They focused their study on construction companies, which differ from general companies in some financial characteristics, especially in liquidity and capital structure. The authors pointed out the significant benefit of applying AdaBoost to construction companies. Sun et al. [96] used a backpropagation neural network as a base learner and constructed two ensemble models based on AdaBoost and bagging. These models were constructed to predict the financial distress of Chinese construction and real estate companies. They also confirmed that ensemble models significantly outperform single ones.
The higher classification accuracy of decision trees compared to NN was also confirmed by Olson et al. [4]. They also pointed to the fact that DT results are more understandable for users than ANN results. A shortcoming of DT can be the large number of rules contained in the tree. The high classification and prediction accuracy of DT, which exceeded the classification accuracy of NN, was confirmed by Golbayani et al. [97]. Alam et al. [98], in their study, focused on the prediction of corporate bankruptcy and confirmed the high classification accuracy of the random forest (98.7%). Even though we did not apply this method in this study, the significant classification accuracy of decision forests and trees can also be confirmed. The significant classification accuracy of AdaBoost was confirmed by Lahmiri et al. [60]. According to the authors, it is a significant classification tool in view of its limited complexity, lower classification error, and short data processing time. Furthermore, the results show that this classification system outperforms models that have been validated on the same data in the recent period. Papíková and Papík [99] state that individual resampling and feature selection methods do not enhance the performance of the model compared to the results of the original unbalanced sample. Even if the sample is unbalanced with a minority of failed businesses, many classification algorithms can cope with this imbalance and bring significant results. These findings are also applied in practice, helping stakeholders to detect truly failing companies.
A limitation of this research is the data, which contain deficiencies such as missing values, extreme values, and incorrect records. For this reason, adapting the data for a given analysis is quite time-consuming. In future research, we may focus on resampling the dataset to achieve better classification accuracy, even though the classification accuracy of the models built in this study was excellent. We will also focus on the application of non-numeric input data so that deep learning methods can be applied.
A significant contribution of the ensemble models is the fact that their results also provide a ranking of the important features that determine the prosperity or failure of the analyzed enterprises (see Table 11).
These are the most important features that should be applied in the prediction of bankruptcy and the classification of businesses into prosperous and non-prosperous.
All entities cooperating with the analyzed businesses can check the results achieved by selected features and thus prevent possible losses. On the other hand, business managers can monitor the development of the mentioned indicators and thus manage and improve the financial health and performance of companies.
Research into the application of decision trees in bankruptcy prediction is still ongoing. It is primarily focused on improving the accuracy of the models without worsening their interpretability. An interesting challenge for the future is, therefore, the effort to improve the classification accuracy of decision tree models without reducing the quality of their interpretation, which tends to be considered their main advantage. The solution to this research problem is twofold: to build traditional decision trees with higher classification accuracy without significantly changing their structure or to propose new interpretation options for non-traditional tree structures [100].

Author Contributions

Conceptualization, J.H., M.M. and A.S.; methodology, J.H., M.M. and A.S.; software, J.H.; validation, J.H. and M.M.; formal analysis, M.M. and A.S.; investigation, J.H. and M.M.; resources, M.M.; data curation, J.H.; writing—original draft preparation, J.H., M.M. and A.S.; writing—review and editing, J.H., M.M. and A.S.; visualization, J.H.; supervision, J.H.; project administration, J.H.; funding acquisition, J.H. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Scientific Grant Agency of the Ministry of Education, Science, Research and Sport of the Slovak Republic and the Slovak Academy of Sciences (VEGA), Grant No. 1/0449/24.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Berezigar Masten, A.; Masten, I. Comparison of Parametric, Semi-Parametric and Non-Parametric Methods in Bankruptcy Prediction. In Semi-Parametric and Non-Parametric Methods in Bankruptcy Prediction; SSRN: Rochester, NY, USA, 2007. [Google Scholar] [CrossRef]
  2. Dimitras, A.I.; Slowinski, R.; Susmaga, R.; Zopounidis, C. Business failure prediction using rough sets. Eur. J. Oper. Res. 1999, 114, 263–280. [Google Scholar] [CrossRef]
  3. Gepp, A.; Kumar, K.; Bhattacharya, S. Business failure prediction using decision trees. J. Forecast. 2010, 29, 536–555. [Google Scholar] [CrossRef]
  4. Olson, D.L.; Delen, D.; Meng, Y. Comparative analysis of data mining methods for bankruptcy prediction. Decis. Support Syst. 2012, 52, 464–473. [Google Scholar] [CrossRef]
  5. Durica, M.; Frnda, J.; Svabova, L. Decision tree based model of business failure prediction for Polish companies. Oeconomia Copernic. 2019, 10, 453–469. [Google Scholar] [CrossRef]
  6. Liu, J.; Wu, C. A gradient-boosting decision-tree approach for firm failure prediction: An empirical model evaluation of Chinese listed companies. J. Risk Model Valid. 2017, 11, 43–64. [Google Scholar] [CrossRef]
  7. Lee, C.S.; Cheang, P.Y.S.; Moslehpour, M. Predictive analytics in business analytics: Decision tree. Adv. Decis. Sci. 2022, 26, 1–29. [Google Scholar]
  8. Zhou, L.; Lai, K.K. AdaBoost Models for Corporate Bankruptcy Prediction with Missing Data. Comput. Econ. 2017, 50, 69–94. [Google Scholar] [CrossRef]
  9. Tsai, C.-F.; Hsu, Y.-F.; Yen, D.C. A comparative study of classifier ensembles for bankruptcy prediction. Appl. Soft Comput. 2014, 24, 977–984. [Google Scholar] [CrossRef]
  10. Karas, M.; Režňáková, M. A parametric or nonparametric approach for creating a new bankruptcy prediction model: The evidence from the Czech Republic. Int. J. Math. Models Methods Appl. Sci. 2014, 8, 214–223. [Google Scholar]
  11. Ben Jabeur, S.; Stef, N.; Carmona, P. Bankruptcy Prediction using the XGBoost Algorithm and Variable Importance Feature Engineering. Comput. Econ. 2023, 61, 715–741. [Google Scholar] [CrossRef]
  12. Zhang, G.; Hu, M.Y.; Patuwo, B.E.; Indro, D.C. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. Eur. J. Oper. Res. 1999, 116, 16–32. [Google Scholar] [CrossRef]
  13. Fletcher, G.; Goss, E. Forecasting with neural networks: An application using bankruptcy data. Inf. Manag. 1993, 24, 159–167. [Google Scholar] [CrossRef]
  14. Du Jardin, P. Predicting bankruptcy using neural networks and other classification methods: The influence of variable selection techniques on model accuracy. Neurocomputing 2010, 73, 2047–2060. [Google Scholar] [CrossRef]
  15. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  16. Karas, M.; Srbová, P. Predicting bankruptcy in construction business: Traditional model validation and formulation of a new model. J. Int. Stud. 2019, 12, 283–296. [Google Scholar] [CrossRef]
  17. Jang, Y.; Jeong, I.; Cho, Y.K. Identifying impact of variables in deep learning models on bankruptcy prediction of construction contractors. Eng. Constr. Archit. Manag. 2021, 28, 3282–3298. [Google Scholar] [CrossRef]
  18. Heo, J.; Yang, J.Y. AdaBoost based bankruptcy forecasting of Korean construction companies. Appl. Soft Comput. 2014, 24, 494–499. [Google Scholar] [CrossRef]
  19. Mellit, A.; Kalogirou, S.A. Artificial intelligence techniques for photovoltaic applications: A review. Prog. Energy Combust. Sci. 2008, 34, 574–632. [Google Scholar] [CrossRef]
  20. Lee, A.; Taylor, P.; Kalpathy-Cramer, J.; Tufail, A. Machine Learning Has Arrived! Ophthalmology 2017, 124, 1726–1728. [Google Scholar] [CrossRef]
  21. Setiowati, S.; Zulfanahri, F.E.L.; Ardiyanto, I. A review of optimization method in face recognition: Comparison deep learning and non-deep learning methods. In Proceedings of the 9th International Conference on Information Technology and Electrical Engineering (ICITEE), Piscataway, NJ, USA, 12–13 October 2017; IEEE: New York, NY, USA, 2017; pp. 1–6. [Google Scholar] [CrossRef]
  22. Shinde, P.P.; Shah, S. A Review of Machine Learning and Deep Learning Applications. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 16–18 August 2018; IEEE: New York, NY, USA, 2018; pp. 1–6. [Google Scholar] [CrossRef]
  23. Kriegeskorte, N. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 2015, 1, 417–446. [Google Scholar] [CrossRef]
  24. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  25. Vochozka, M.; Jelínek, J.; Váchal, J.; Straková, J.; Stehel, V. Využití Neuronových Sítí Při Komplexním Hodnocení Podniků [The Use of Neural Networks in the Comprehensive Evaluation of Companies]; C.H.Beck: Praha, Czech Republic, 2017. [Google Scholar]
  26. Eberhart, R.C.; Dobbins, R.W. Early neural network development history: The age of Camelot. IEEE Eng. Med. Biol. Mag. 1990, 9, 15–18. [Google Scholar] [CrossRef]
  27. Macukow, B. Neural Networks–State of Art, Brief History, Basic Models and Architecture. In Proceedings of the CISIM 2016: Computer Information Systems and Industrial Management, Vilnius, Lithuania, 14–16 September 2016; Saeed, K., Homenda, W., Eds.; Springer: Cham, Switzerland, 2016. [Google Scholar] [CrossRef]
  28. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  29. Popescu, M.-C.; Balas, V.; Perescu-Popescu, L.; Mastorakis, N.E. Multilayer perceptron and neural networks. WSEAS Trans. Circuits Syst. 2009, 8, 579–588. [Google Scholar] [CrossRef]
  30. Volná, E. Neuronové Sítě I [Neural Networks I]; Ostravská Univerzita v Ostravě: Ostrava, Czech Republic, 2008; Available online: https://web.osu.cz/~Volna/Neuronove_site_skripta.pdf (accessed on 1 May 2024).
  31. Park, Y.S.; Lek, S. Artificial Neural Networks: Multilayer Perceptron for Ecological Modeling. In Developments in Environmental Modelling; Jørgensen, S.E., Ed.; Elsevier: Amsterdam, The Netherlands, 2016; Volume 28, pp. 123–140. [Google Scholar] [CrossRef]
  32. Hung, C.; Chen, J.-H. A selective ensemble based on expected probabilities for bankruptcy prediction. Expert Syst. Appl. 2009, 36, 5297–5303. [Google Scholar] [CrossRef]
  33. Kim, S.Y.; Upneja, A. Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models. Econ. Model. 2014, 36, 354–362. [Google Scholar] [CrossRef]
  34. Frydman, H.; Altman, E.I.; Kao, D. Introducing recursive partitioning for financial classification: The case of financial distress. J. Financ. 1983, 40, 269–291. [Google Scholar] [CrossRef]
  35. Nazemi Ardakani, M.; Zare MehrJardi, V.; Mohammadi-Nodooshan, A. A Firms’ Bankruptcy Prediction Model Based on Selected Industries by Using Decision Trees Model. J. Asset Manag. Financ. 2018, 6, 121–138. [Google Scholar] [CrossRef]
  36. Gepp, A.; Kumar, K. Predicting Financial Distress: A Comparison of Survival Analysis and Decision Tree Techniques. Procedia Comput. Sci. 2015, 54, 396–404. [Google Scholar] [CrossRef]
  37. Bhutta, R.; Regupathi, A. Predicting corporate bankruptcy: Lessons from the past. Asian J. Multidiscip. Stud. 2020, 8, 4–10. [Google Scholar]
  38. Vapnik, V.N.; Vapnik, V. Statistical Learning Theory; Wiley: New York, NY, USA, 1998; Volume 1. [Google Scholar]
  39. Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
  40. Chauhan, N.K.; Singh, K. A Review on Conventional Machine Learning vs Deep Learning. In Proceedings of the 2018 International Conference on Computing, Power, and Communication Technologies (GUCON), Uttar Pradesh, India, 28–29 September 2018; IEEE: New York, NY, USA, 2018; pp. 347–352. [Google Scholar] [CrossRef]
41. González, S.; García, S.; Del Ser, J.; Rokach, L.; Herrera, F. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Inf. Fusion 2020, 64, 205–237. [Google Scholar] [CrossRef]
  42. Wyrobek, J.; Kluza, K. Efficiency of Gradient Boosting Decision Trees Technique in Polish Companies’ Bankruptcy Prediction. In Information Systems Architecture and Technology: Proceedings of 39th International Conference on Information Systems Architecture and Technology–ISAT 2018, Advances in Intelligent Systems and Computing; Wilimowska, Z., Borzemski, L., Świątek, J., Eds.; Springer: Cham, Switzerland, 2019; Volume 854, pp. 24–35. [Google Scholar] [CrossRef]
  43. Alfaro, E.; García, N.; Gámez, M.; Elizondo, D. Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks. Decis. Support Syst. 2008, 45, 110–122. [Google Scholar] [CrossRef]
  44. Zhao, Z.; Xu, S.; Kang, B.H.; Kabir, M.M.J.; Liu, Y.; Wasinger, R. Investigation and improvement of multi-layer perceptron neural networks for credit scoring. Expert Syst. Appl. 2015, 42, 3508–3516. [Google Scholar] [CrossRef]
45. Tsai, C.-F.; Wu, J.-W. Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Syst. Appl. 2008, 34, 2639–2649. [Google Scholar] [CrossRef]
  46. Mai, F.; Tian, S.; Lee, C.H. Deep learning models for bankruptcy prediction using textual disclosures. Eur. J. Oper. Res. 2019, 274, 743–775. [Google Scholar] [CrossRef]
  47. Ansari, A.; Ahmad, I.S.; Bakar, A.A.; Yaakub, M.R. A Hybrid Metaheuristic Method in Training Artificial Neural Network for Bankruptcy Prediction. IEEE Access 2020, 8, 176640–176650. [Google Scholar] [CrossRef]
  48. Brenes, R.-F.; Johannssen, A.; Chukhrova, N. An intelligent bankruptcy prediction model using a multilayer perceptron. Intell. Syst. Appl. 2022, 16, 200136. [Google Scholar] [CrossRef]
49. Wu, D.; Ma, X.; Olson, D.L. Financial distress prediction using integrated Z-score and multilayer perceptron neural networks. Decis. Support Syst. 2022, 159, 113814. [Google Scholar] [CrossRef]
  50. Matsumaru, M.; Kawanaka, T.; Katagiri, K.; Kaneko, S. Prediction of bankruptcy on industry classification. Int. J. Jpn. Assoc. Manag. Syst. 2018, 10, 1–12. [Google Scholar] [CrossRef]
  51. Lee, V.C.H. Genetic Programming Decision Tree for Bankruptcy Prediction. In Proceedings of the 9th Joint International Conference on Information Sciences (JCIS-06), Taiwan, China, 8–11 October 2006; Atlantis Press: Amsterdam, The Netherlands, 2006; pp. 33–36. [Google Scholar] [CrossRef]
  52. Chen, M.-Y. Bankruptcy prediction in firms with statistical and intelligent techniques and a comparison of evolutionary computation approaches. Comput. Math. Appl. 2011, 62, 4514–4524. [Google Scholar] [CrossRef]
  53. Alaka, H.A.; Oyedele, L.O.; Owolabi, H.A. Systematic review of bankruptcy prediction models: Towards a framework for tool selection. Expert Syst. Appl. 2018, 94, 164–184. [Google Scholar] [CrossRef]
  54. Park, M.S.; Son, H.; Hyun, C.; Hwang, H.J. Explainability of Machine Learning Models for Bankruptcy Prediction. IEEE Access 2021, 9, 124887–124899. [Google Scholar] [CrossRef]
  55. Kim, M.-J.; Kang, D.-K.; Kim, H.B. Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction. Expert Syst. Appl. 2015, 42, 1074–1082. [Google Scholar] [CrossRef]
  56. Zięba, M.; Tomczak, S.B.; Tomczak, J.M. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst. Appl. 2016, 58, 93–101. [Google Scholar] [CrossRef]
  57. Jones, S.; Johnstone, D.; Wilson, R. Predicting Corporate Bankruptcy: An Evaluation of Alternative Statistical Frameworks. J. Bus. Financ. Account. 2017, 44, 3–34. [Google Scholar] [CrossRef]
  58. Sigrist, F.; Hirnschall, C. Grabit: Gradient tree-boosted Tobit models for default prediction. J. Bank. Financ. 2019, 102, 177–192. [Google Scholar] [CrossRef]
  59. García, V.; Marqués, A.I.; Sánchez, J.S. Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction. Inf. Fusion 2019, 47, 88–101. [Google Scholar] [CrossRef]
  60. Lahmiri, S.; Bekiros, S.; Giakoumelou, A.; Bezzina, F. Performance assessment of ensemble learning systems in financial data classification. Intell. Syst. Account. Financ. Manag. 2020, 27, 3–9. [Google Scholar] [CrossRef]
  61. Jabeur, S.M.; Gharib, C.; Mefteh-Wali, S.; Arfi, W.B. CatBoost model and artificial intelligence techniques for corporate failure prediction. Technol. Forecast. Soc. Chang. 2021, 166, 120658. [Google Scholar] [CrossRef]
  62. Smith, M.; Alvarez, F. Predicting Firm-Level Bankruptcy in the Spanish Economy Using Extreme Gradient Boosting. Comput. Econ. 2022, 59, 263–295. [Google Scholar] [CrossRef]
  63. Chen, T.K.; Liao, H.H.; Chen, G.-D.; Kang, W.-H.; Lin, Y.-C. Bankruptcy prediction using machine learning models with the text-based communicative value of annual reports. Expert Syst. Appl. 2023, 233, 120714. [Google Scholar] [CrossRef]
  64. Mattos, E.D.; Shasha, D. Bankruptcy prediction with low-quality financial information. Expert Syst. Appl. 2024, 237, 121418. [Google Scholar] [CrossRef]
  65. Hosaka, T. Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Syst. Appl. 2019, 117, 287–299. [Google Scholar] [CrossRef]
  66. Jabeur, S.M.; Serret, V. Bankruptcy prediction using fuzzy convolutional neural networks. Res. Int. Bus. Financ. 2023, 64, 101844. [Google Scholar] [CrossRef]
  67. Du Jardin, P. Designing topological data to forecast bankruptcy using convolutional neural networks. Ann. Oper. Res. 2023, 325, 1291–1332. [Google Scholar] [CrossRef]
  68. CRIF. Financial Statements of Analyzed Businesses; CRIF-Slovak Credit Bureau, Ltd.: Bratislava, Slovakia, 2023. [Google Scholar]
  69. Valášková, K.; Švábová, L.; Ďurica, M. Verification of prediction models in conditions of the Slovak agricultural sector. Econ. Manag. Innov. 2017, 9, 30–38. [Google Scholar]
  70. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  71. Ranstam, J.; Cook, J.A. LASSO regression. Br. J. Surg. 2018, 105, 1348. [Google Scholar] [CrossRef]
72. Pereira, J.M.; Basto, M.; Ferreira da Silva, A. The Logistic Lasso and Ridge Regression in Predicting Corporate Failure. Procedia Econ. Financ. 2016, 39, 634–641. [Google Scholar] [CrossRef]
  73. Clark, J.W.; Rafelski, J.; Winston, J.V. Brain without mind: Computer simulation of neural networks with modifiable neuronal interactions. Phys. Rep. 1985, 123, 215–273. [Google Scholar] [CrossRef]
  74. Günther, F.; Fritsch, S. Neuralnet: Training of neural networks. R J. 2010, 2, 30–38. [Google Scholar] [CrossRef]
  75. Sharma, S.; Sharma, S.; Athaiya, A. Activation functions in neural networks. Int. J. Eng. Appl. Sci. Technol. 2020, 4, 310–316. [Google Scholar] [CrossRef]
  76. Pompe, P.P.M.; Feelders, A.J. Using Machine Learning, Neural Networks and Statistics to Predict Corporate Bankruptcy: A Comparative Study. In Artificial Intelligence in Economics and Management; Ein-Dor, P., Ed.; Springer: Boston, MA, USA, 1996. [Google Scholar] [CrossRef]
  77. Ghiasi, M.M.; Zendehboudi, S.; Mohsenipour, A.A. Decision tree-based diagnosis of coronary artery disease: CART model. Comput. Methods Programs Biomed. 2020, 192, 105400. [Google Scholar] [CrossRef]
  78. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Taylor & Francis: New York, NY, USA, 1984. [Google Scholar]
  79. Anyanwu, M.N.; Shiva, S.G. Comparative analysis of serial decision tree classification algorithms. Int. J. Comput. Sci. Secur. 2009, 3, 230–240. [Google Scholar]
  80. Scikit-Learn. Decision Trees. 2023. Available online: https://scikitlearn.org/stable/modules/tree.html (accessed on 12 February 2024).
  81. Komprdová, K. Rozhodovací Stromy a Lesy [Decision Trees and Forests]. Multimedia Support for the Teaching of Clinical and Medical Fields: Portal of the Faculty of Medicine of Masaryk University. 2012. Available online: https://portal.med.muni.cz/clanek-596-rozhodovaci-stromy-a-lesy.html (accessed on 14 January 2024).
  82. Mayr, A.; Binder, H.; Gefeller, O.; Schmid, M. The evolution of boosting algorithms. From machine learning to statistical modelling. Methods Inf. Med. 2014, 53, 419–427. [Google Scholar] [CrossRef]
  83. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  84. Murphy, K.P. Machine Learning: A Probabilistic Perspective; The MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
  85. Kononenko, I.; Kukar, M. Machine Learning and Data Mining: Introduction to Principles and Algorithms; Horwood Publishing: Chichester, UK, 2007. [Google Scholar]
  86. Friedman, J. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  87. Touzani, S.; Granderson, J.; Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 2018, 158, 1533–1543. [Google Scholar] [CrossRef]
  88. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  89. Devi, S.S.; Radhika, Y. A survey on machine learning and statistical techniques in bankruptcy prediction. Int. J. Mach. Learn. Comput. 2018, 8, 133–139. [Google Scholar] [CrossRef]
  90. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar] [CrossRef]
  91. Flach, P.; Kull, M. Precision-Recall-Gain Curves: PR Analysis Done Right. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R., Eds.; Massachusetts Institute of Technology (MIT) Press: Cambridge, MA, USA, 2015; Volume 28, pp. 838–846. Available online: https://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right (accessed on 15 January 2024).
  92. LeDell, E.; Van der Laan, M.J.; Petersen, M. AUC-Maximizing Ensembles through Metalearning. Int. J. Biostat. 2016, 12, 203–218. [Google Scholar] [CrossRef]
  93. Huang, J.; Lu, J.; Ling, C.X. Comparing naive Bayes, decision trees, and SVM with AUC and accuracy. In Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, USA, 19–22 November 2003; IEEE Computer Society: Washington, DC, USA, 2003; pp. 553–556. [Google Scholar] [CrossRef]
  94. Kostík, Ľ.; Saloky, T. Niektoré z problémov pri získavaní dát pomocou rozhodovacích stromov [Some of the problems in data acquisition using decision trees]. ATP J. 2006, 4, 55–57. [Google Scholar]
  95. Li, Y.; Chen, W. A Comparative Performance Assessment of Ensemble Learning for Credit Scoring. Mathematics 2020, 8, 1756. [Google Scholar] [CrossRef]
  96. Sun, J.; Liao, B.; Li, H. AdaBoost and bagging ensemble approaches with neural network as base learner for financial distress prediction of Chinese construction and real estate companies. Recent Pat. Comput. Sci. 2013, 6, 47–59. [Google Scholar] [CrossRef]
  97. Golbayani, P.; Florescu, I.; Chatterjee, R. A comparative study of forecasting corporate credit ratings using neural networks, support vector machines, and decision trees. N. Am. J. Econ. Financ. 2020, 54, 101251. [Google Scholar] [CrossRef]
  98. Alam, T.M.; Shaukat, K.; Mushtaq, M.; Ali, Y.; Khushi, M.; Luo, S.; Wahab, A. Corporate Bankruptcy Prediction: An Approach Towards Better Corporate World. Comput. J. 2021, 64, 1731–1746. [Google Scholar] [CrossRef]
  99. Papíková, L.; Papík, M. Effects of classification, feature selection, and resampling methods on bankruptcy prediction of small and medium-sized enterprises. Intell. Syst. Account. Financ. Manag. 2022, 29, 254–281. [Google Scholar] [CrossRef]
  100. Costa, V.G.; Pedreira, C.E. Recent advances in decision trees: An updated survey. Artif. Intell. Rev. 2023, 56, 4765–4800. [Google Scholar] [CrossRef]
Figure 1. Prosperity criteria. Source: authors.
Figure 2. Results of Lasso regression for λmin. Source: authors.
Figure 3. Neural network architecture. Source: authors.
Figure 4. Feature importance—DT. Source: authors.
Figure 5. DT diagram. Source: authors.
Figure 6. Feature importance—ABT. Source: authors.
Figure 7. Feature importance—GBT. Source: authors.
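Figure 2 reports the Lasso coefficients retained at λmin. As a rough illustration of how such a selection can be reproduced, the sketch below uses scikit-learn's LassoCV on synthetic data; the matrix dimensions, seed, and variable names are illustrative assumptions, not the authors' CRIF dataset or their exact Lasso variant.

```python
# Illustrative sketch of Lasso feature selection at lambda_min (cf. Figure 2).
# The data here are synthetic placeholders, not the study's CRIF dataset.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 24))            # e.g., the 24 financial ratios of Table 2
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(float)

X_std = StandardScaler().fit_transform(X)  # Lasso penalties are scale-sensitive
lasso = LassoCV(cv=10, random_state=42).fit(X_std, y)

print("lambda_min:", lasso.alpha_)
print("retained features:", np.flatnonzero(lasso.coef_))  # non-zero coefficients survive
```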
Table 1. Selected AI methods applied to predict the financial failure of businesses.

Applied Method | Authors
ANN | Tsai et al. [9]; Zhao et al. [44]; Tsai and Wu [45]; Mai et al. [46]; Ansari et al. [47]; Brenes et al. [48]; Wu et al. [49]
SVM | Tsai et al. [9]; Matsumaru et al. [50]; Lee [51]; Chen [52]
Decision trees | Tsai et al. [9]; Karas and Režňáková [10]; Matsumaru et al. [50]; Lee [51]; Chen [52]; Alaka et al. [53]; Park et al. [54]
Decision trees and ensemble-learning algorithms | Heo and Yang [18]; Kim et al. [55]; Zięba et al. [56]; Jones et al. [57]; Sigrist and Hirnschall [58]; García et al. [59]; Lahmiri et al. [60]; Jabeur et al. [61]; Park et al. [54]; Smith and Alvarez [62]; Chen et al. [63]; Mattos and Shasha [64]
Convolutional neural networks | Hosaka [65]; Jabeur and Serret [66]; Du Jardin [67]
Source: authors.
Table 2. Formulas for the calculation of financial features.

Indicator | Abb. | Formula
Cash ratio | CaR | financial assets / short-term liabilities
Quick ratio | QR | (short-term receivables + financial assets) / short-term liabilities
Current ratio | CuR | short-term assets / short-term liabilities
Net working capital to current assets ratio | NWCCA | net working capital / current assets
Net working capital to total assets | NWCTA | net working capital / assets
Return on assets | ROA | EBIT / assets × 100
Return on equity | ROE | EAT / equity × 100
Return on sales | ROS | EAT / sales × 100
Return on costs | ROC | EAT / costs × 100
Return on assets with EAT | ROAEAT | EAT / assets × 100
Return on sales with EBITDA | ROSEBITDA | EBITDA / sales × 100
Net cash flow to total assets | NCFTA | net cash flow / assets
Net cash flow to debt | NCFD | net cash flow / debt
Net cash flow to short-term debt | NCFSD | net cash flow / short-term debt
Assets turnover | AT | sales / assets
Receivables turnover ratio | RTR | sales / short-term receivables
Short-term liabilities turnover ratio | SLTR | sales / short-term debt
Total debt to total assets | TDTA | debt / assets × 100
Financial leverage | FL | total assets / equity
Debt to equity ratio | DER | debt / equity
Equity to fixed assets ratio | EFAR | equity / fixed assets
Equity and long-term liabilities to fixed assets ratio | ELLFAR | (equity + long-term liabilities) / fixed assets
Short-term liabilities to total assets | SLTA | short-term liabilities / assets
Cost ratio | CoR | costs / revenues
Source: authors.
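The ratios in Table 2 are straightforward to compute from balance-sheet and income-statement items. A minimal sketch for a few of them follows; the dictionary keys and the sample figures are assumptions made for illustration, not the CRIF data schema.

```python
# Compute a handful of the Table 2 ratios from statement items.
# Field names and the sample figures below are illustrative assumptions.
def financial_ratios(s: dict) -> dict:
    return {
        "CaR": s["financial_assets"] / s["short_term_liabilities"],
        "QR": (s["short_term_receivables"] + s["financial_assets"]) / s["short_term_liabilities"],
        "ROA": s["ebit"] / s["assets"] * 100,          # in %, as in Table 2
        "TDTA": s["debt"] / s["assets"] * 100,
        "NWCTA": s["net_working_capital"] / s["assets"],
        "CoR": s["costs"] / s["revenues"],
    }

statement = {"financial_assets": 120.0, "short_term_receivables": 300.0,
             "short_term_liabilities": 400.0, "ebit": 80.0, "assets": 1000.0,
             "debt": 640.0, "net_working_capital": 180.0, "costs": 950.0,
             "revenues": 1000.0}
print(financial_ratios(statement))
```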
Table 3. Financial features' descriptive statistics.

Feature | Mean | Median | Minimum | Maximum | Std. Dev.
CaR | 1.50 | 0.32 | −0.76 | 32.91 | 3.69
QR | 2.45 | 1.11 | 0.02 | 32.91 | 4.24
CuR | 2.75 | 1.26 | 0.15 | 32.91 | 4.42
NWCCA | 0.07 | 0.21 | −5.84 | 0.97 | 0.87
NWCTA | 0.18 | 0.15 | −1.08 | 0.97 | 0.40
ROA | 0.08 | 0.04 | −0.58 | 0.79 | 0.15
ROE | 0.11 | 0.09 | −1.97 | 1.87 | 0.42
ROS | 0.03 | 0.02 | −2.58 | 1.74 | 0.21
ROC | 0.06 | 0.02 | −0.66 | 2.01 | 0.21
ROAEAT | 0.05 | 0.03 | −0.58 | 0.65 | 0.13
ROSEBITDA | 0.10 | 0.06 | −2.02 | 9.28 | 0.33
NCFTA | 0.10 | 0.07 | −0.58 | 0.80 | 0.14
NCFD | 0.33 | 0.11 | −5.05 | 7.59 | 0.83
NCFSD | 0.41 | 0.14 | −5.14 | 7.66 | 0.95
AT | 1.81 | 1.43 | 0.00 | 9.72 | 1.51
RTR | 12.35 | 4.71 | 0.03 | 195.11 | 24.50
SLTR | 5.34 | 2.99 | 0.00 | 47.03 | 6.97
TDTA | 0.64 | 0.68 | 0.03 | 1.50 | 0.31
FL | 5.78 | 2.69 | −46.42 | 136.61 | 13.46
DER | 4.78 | 1.69 | −47.42 | 135.61 | 13.46
EFAR | 3.25 | 1.01 | −45.69 | 116.55 | 9.53
ELLFAR | 3.82 | 1.34 | −44.97 | 125.01 | 10.09
SLTA | 0.56 | 0.57 | 0.00 | 1.50 | 0.31
CoR | 0.96 | 0.98 | 0.28 | 2.97 | 0.20
Source: authors.
Table 4. ANN parameters.

Layer | Parameter | Value
Input layer | Financial features | TDTA, ROE, ROC, SLTA, NWCTA, NCFTA, AT
Input layer | Number of neurons | 7
Hidden layers | Number of HLs | 3
Hidden layers | Number of neurons in HL1 | 16
Hidden layers | Number of neurons in HL2 | 32
Hidden layers | Number of neurons in HL3 | 16
Hidden layers | Activation function | Rectified Linear Unit (ReLU)
Output layer | Dependent variable | Bankruptcy
Output layer | Number of neurons | 2
Output layer | Activation function | Normalized exponential function (SoftMax)
Output layer | Error function | Binary cross-entropy
Explanatory notes: HL—Hidden Layer. Source: authors.
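A compact sketch of the architecture in Table 4 using scikit-learn's MLPClassifier, whose classification head applies a logistic/softmax output with cross-entropy loss, in line with the table; the solver, iteration cap, and seed are assumptions here, not reported settings.

```python
# Sketch of the Table 4 MLP: 7 standardized ratio inputs, hidden layers of
# 16/32/16 ReLU units, and a cross-entropy-trained classification output.
# max_iter and random_state are illustrative assumptions.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["TDTA", "ROE", "ROC", "SLTA", "NWCTA", "NCFTA", "AT"]

ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 32, 16),  # HL1, HL2, HL3 from Table 4
                  activation="relu",
                  max_iter=1000,
                  random_state=42),
)
# ann.fit(X_train[FEATURES], y_train)   # y_train: 0/1 bankruptcy flag
# proba = ann.predict_proba(X_test[FEATURES])[:, 1]
```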
Table 5. Performance metrics—ANN.

Metric | Value
Accuracy | 97.04%
Precision | 98.74%
Recall | 69.23%
F1-score | 0.69
AUC | 0.84
Source: authors.
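The metrics in Tables 5–9 can be computed directly with scikit-learn. In the sketch below, y_true, y_pred, and y_score are placeholders for the test labels, predicted classes, and predicted probabilities of the positive class; treating the non-prosperous class as positive is an assumed labeling convention.

```python
# Evaluation metrics as reported in Tables 5-9; inputs are placeholders.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def report(y_true, y_pred, y_score):
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),  # positive class assumed = non-prosperous
        "Recall": recall_score(y_true, y_pred),
        "F1-score": f1_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),         # needs scores/probabilities, not labels
    }
```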
Table 6. Performance metrics—DT.

Metric | Value
Accuracy | 98.89%
Precision | 92.05%
Recall | 85.27%
F1-score | 0.88
AUC | 0.92
Source: authors.
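The decision tree's interpretability (Figure 5) comes from its readable rule structure. A sketch of how such a tree can be fit and printed with scikit-learn follows; the depth limit is an illustrative assumption, not the model's reported setting.

```python
# Sketch of an interpretable classification tree (cf. Figure 5 and Table 6).
# max_depth=4 is an illustrative assumption, not the reported depth.
from sklearn.tree import DecisionTreeClassifier, export_text

dt = DecisionTreeClassifier(max_depth=4, random_state=42)
# dt.fit(X_train, y_train)
# print(export_text(dt, feature_names=["TDTA", "ROE", "ROC", "SLTA",
#                                      "NWCTA", "NCFTA", "AT"]))
```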
Table 7. Performance metrics—ABT.

Metric | Value
Accuracy | 99.70%
Precision | 100.00%
Recall | 94.07%
F1-score | 0.97
AUC | 0.97
Source: authors.
Table 8. Performance metrics—GBT.

Metric | Value
Accuracy | 99.73%
Precision | 100.00%
Recall | 94.07%
F1-score | 0.97
AUC | 0.97
Source: authors.
Table 9. Selected performance evaluation criteria of prediction models.

Evaluation Criterion | MDA | LR | ANN | DT | ABT | GBT
Precision (%) | 98.05 | 99.63 | 98.74 | 92.05 | 100.00 | 100.00
Recall (%) | 61.54 | 61.54 | 69.23 | 85.27 | 94.07 | 94.07
Accuracy (%) | 96.30 | 97.78 | 97.04 | 98.89 | 99.70 | 99.70
AUC | 0.79 | 0.79 | 0.84 | 0.92 | 0.97 | 0.97
F1 | 0.58 | 0.58 | 0.69 | 0.88 | 0.97 | 0.97
Source: authors.
Table 10. The ranking of methods according to the achieved results.

Bankruptcy Prediction Method | Precision | Recall | Accuracy | AUC | F1 | Score
MDA | 5 | 5 | 6 | 5 | 5 | 26
LR | 3 | 5 | 4 | 5 | 5 | 22
ANN | 4 | 4 | 5 | 4 | 4 | 21
DT | 6 | 3 | 3 | 3 | 3 | 18
ABT | 1 | 1 | 1 | 1 | 1 | 5
Source: authors.
Table 11. Features ranking.

Method | Features Ranking
AdaBoosted Trees | TDTA, ROC, NWCTA, SLTA, NCFTA, ROE, AT
Gradient-boosted Trees | ROC, TDTA, SLTA, NWCTA, ROE, NCFTA, AT
Source: authors.
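The rankings in Table 11 follow from the impurity-based feature importances of the fitted ensembles. A minimal sketch of both boosting models with a decision tree as base learner is given below; the tree depth, number of estimators, and seed are illustrative assumptions, not the study's reported hyperparameters.

```python
# Boosted-tree ensembles behind Tables 7, 8, and 11, with decision trees as
# base learners; hyperparameters here are illustrative assumptions.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["TDTA", "ROE", "ROC", "SLTA", "NWCTA", "NCFTA", "AT"]

abt = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),  # scikit-learn >= 1.2 API
                         n_estimators=200, random_state=42)
gbt = GradientBoostingClassifier(max_depth=3, n_estimators=200, random_state=42)

# After abt.fit(X_train, y_train) and gbt.fit(X_train, y_train), the Table 11
# orderings correspond to sorting the impurity-based importances:
# ranked = sorted(zip(FEATURES, abt.feature_importances_), key=lambda p: -p[1])
```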
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
