Does ESG Predict Business Failure in Brazil? An Application of Machine Learning Techniques

Kaleem, Mehwish; Raza, Hassan; Ashraf, Sumaira; Almeida, António Martins; Machado, Luiz Pinto

doi:10.3390/risks12120185

by Mehwish Kaleem ¹,², Hassan Raza ³,*, Sumaira Ashraf ⁴,⁵, António Martins Almeida ⁶,⁷,* and Luiz Pinto Machado ⁷

¹ Faculty of Business & Management, Universiti Sultan Zainal Abidin, Kampung Gong Badak 21300, Terengganu, Malaysia

² Faculty of Management Sciences, University of Gujrat, Gujrat 50700, Punjab, Pakistan

³ Department of Management Sciences, Shaheed Zulfikar Ali Bhutto Institute of Science & Technology University, Islamabad 44000, Pakistan

⁴ ADVANCE/CSG Research Center, ISEG - Institute of Economics and Management, University of Lisbon, 1649-004 Lisbon, Portugal

⁵ CEFAGE Research Center, University of Evora, 7004-516 Evora, Portugal

⁶ CEEAplA (Centre of Applied Economic Studies of the Atlantic), University of Madeira, 9000-072 Funchal, Portugal

⁷ CITUR (Centre for Tourism Research, Development and Innovation), University of Madeira, 9000-072 Funchal, Portugal

* Authors to whom correspondence should be addressed.

Risks 2024, 12(12), 185; https://doi.org/10.3390/risks12120185

Submission received: 5 October 2024 / Revised: 4 November 2024 / Accepted: 20 November 2024 / Published: 25 November 2024

Abstract

This study explores the influence of environmental, social, and governance (ESG) factors on business failure in Brazil using advanced machine learning techniques. We collected data from 235 companies and conducted principal component analysis (PCA) on 40 variables previously used in the bankruptcy and business failure literature, which yielded seven variables that predict business failure. The results indicate that ESG factors significantly predict business failure in Brazil. This study has implications for investors, policymakers, and business leaders, offering a more precise tool for risk assessment and strategic decision-making.

Keywords: environmental, social, and governance (ESG); business failure in Brazil; principal component analysis (PCA); machine learning techniques; confusion matrix; precision score

1. Introduction

Predicting business failure is of paramount importance for economic stability in developing economies such as Brazil, because accurate prediction of corporate failure protects investments as well as the wider economy. Over time, researchers have developed robust frameworks for safeguarding financial stability, but developing nations present distinct obstacles (Michela 2023). This justifies the need for more accurate, geographically localized models that account for these distinct economic obstacles (Habib 2023). According to Michalkova et al. (2018), business failure prediction makes it possible to evaluate the well-being of a company and to make informed decisions while avoiding the risk of economic downturn.

Building on the categorization by Richardson et al. (1994), which outlines four types of business failure with distinct origins and necessary managerial responses, we observe that Brazilian firms predominantly exhibit characteristics of the “boiled frog” type. These firms show a gradual decline, failing to adapt to market changes, leading to strategic entrapment and deteriorating financial health. This observation aligns with our definition of business failure, where firms that exhibit continuous losses over successive years are deemed failures, while those maintaining or improving financial health are considered successful. This nuanced understanding underscores the importance of tailored intervention solutions that address specific failure processes to enhance economic resilience.

It is important to understand how local factors lead to bankruptcy and to establish efficient forecasting tools that could enhance the financial planning and sustainability of businesses. The ability to predict bankruptcy accurately is essential for managing risks effectively and for communicating information that helps stakeholders understand the factors necessary for sustainable economic development. Domicián et al. (2023) highlight the increasing occurrence of bankruptcies in emerging markets, indicating a need for new models tailored to such environments. The dynamic nature of Brazil’s economy makes predicting bankruptcy crucial for stakeholders to assess companies’ financial health, enabling informed investment decisions and preemptive actions to prevent economic downturns. According to Bateni and Asghari (2016), bankruptcy prediction models are important in identifying early signs of financial crisis and hence allow for early intervention that could prevent failures. As argued by Neumeyer and Perri (2005), Furman and Stiglitz (1998), and Sachs et al. (1996), this would in turn enhance the accuracy of predictions, which is essential for stabilizing the economy in unpredictable emerging markets. Integrating environmental, social, and governance (ESG) factors into bankruptcy prediction models augments traditional financial analysis with non-traditional metrics. ESG factors are emerging as key indicators of a firm’s financial soundness and long-run sustainability.

Building on the need for enhanced models in emerging markets, traditional tools such as Altman’s Z-Score and the O-Score require critical examination. These models often underperform in emerging markets such as Brazil, where economic volatility and regional factors significantly impact business outcomes (Ashraf et al. 2019; Srebro et al. 2021). Misclassifications from these traditional methods can lead to severe economic repercussions, including misguided investment decisions and flawed financial planning, which heighten the risks of financial distress (Santoso et al. 2024; Rehman et al. 2021). For example, the Z-Score, originally designed for stable manufacturing firms, fails to consider sector-specific risks and the economic instability prevalent in Brazil, resulting in substantial deviations in predictive accuracy (Srebro et al. 2021). This study addresses these limitations by integrating advanced machine-learning techniques with environmental, social, and governance (ESG) factors, thereby creating a more robust framework that enhances prediction accuracy and reliability within Brazil’s dynamic economic context (Srinivas and Bharathi 2024; Ranjan and Goldsztein 2022). This approach aims not only to improve the accuracy of business failure predictions but also to promote sustainable business practices, contributing to the region’s overall economic stability (Lisin et al. 2022).

Research has indicated that the relationship between ESG transparency and firm value is positive and that ESG plays a major role in supporting investment decision processes. Additional evidence on the importance of ESG comes from Lim (2024), Kaleem et al. (2024), and Helminen (2023); specifically, ESG improves the accuracy of risk appraisal and the performance of predictive models. Integrating ESG factors into bankruptcy prediction models not only provides a broader assessment of a company’s operational, social, and environmental health but also significantly boosts the precision of these models in predicting financial distress, particularly in Brazil. This study further narrows its focus to slow-onset failures, employing ESG criteria to detect early warning signs and evaluate strategic adjustments necessary to avert these prolonged declines. This approach allows for a nuanced analysis that can identify and address the specific challenges faced by Brazilian firms, enhancing both predictive accuracy and strategic responsiveness.

By concentrating on these slow-onset failures, our research leverages ESG factors to identify early warning signs and evaluate the effectiveness of strategic adjustments aimed at preventing such gradual declines. This method enhances the predictive accuracy of our machine-learning models and provides actionable insights for stakeholders, enabling timely interventions that support sustainable business practices and financial stability. Through a detailed exploration of slow-onset failures, our study distinguishes them from sudden-onset failures in both their development and the required preventative strategies.

Following ESG principles not only helps companies build a good reputation through ongoing transparency but also attracts more investment from stakeholders (Alsayegh et al. 2020). Neglecting ESG risks signals indifference to broader societal and environmental issues and can result in those risks materializing financially (Liu and Lin 2022). Similarly, Chouaibi et al. (2022) affirmed a strong relationship between a firm’s commitment to corporate social responsibility (CSR) and its financial strength, which highlights the role ESG efforts play in financial positioning. Research also finds that companies with higher ESG scores face lower financial distress risk, which underlines the value of ESG metrics as precursors of financial instability (Friede et al. 2015).

The inclusion of ESG considerations in predictive frameworks aligns well with the need to promote resilient and sustainable business practices among Brazil’s non-financial firms (Ahmad et al. 2021). Integrating ESG variables into ML models for bankruptcy prediction has been found to improve forecasting precision and to add value by offering a richer understanding of solvency problems. Kaleem et al. (2024) supported this integration in the context of Chinese firms, stressing that it makes models more precise and fills gaps left by traditional financial data. Studies have also pointed out that ESG performance can reduce financial risks, enhance corporate financial performance, and improve firms’ value (Kim and Li 2021; Zhao et al. 2018; Aydoğmuş et al. 2022). In addition, ESG criteria are increasingly being incorporated into regulatory control mechanisms to improve the stability of financial institutions (Capelli et al. 2021). Overall, this move toward integrating ESG factors into risk management strategies and financial models reflects a larger trend toward sustainable and responsible business practices. By using ESG information, firms can mitigate their risks and at the same time improve financial performance, which may in turn attract socially aware investors. In recent decades, with the rise of machine-learning techniques, these methods have been applied to bankruptcy prediction, improving the precision and stability of predictive models.

The promising results of machine-learning (ML) approaches have prompted researchers such as Olatunji Akinrinola et al. (2024), Le et al. (2019), Domicián et al. (2023), and Jadhav et al. (2018) to investigate various ML models and methodologies, which points to the potential of these techniques to transform bankruptcy prediction. Studies focusing on the accuracy of bankruptcy prediction, such as Shekar et al. (2022), report better results for these models than for conventional statistical methods. Additionally, the study of imbalanced learning methods and the use of artificial neural networks by Ansari et al. (2020) suggest a growing trend in the use of advanced computational technologies in financial analysis. Integrating ESG scores, as proposed by Elklawy (2024), thus represents the convergence of cutting-edge analytics with sustainability considerations, aiming at a more accurate and comprehensive prediction of financial distress.

Despite the progress made in bankruptcy prediction models and techniques, a gap remains in the full integration of ESG into these models, particularly given the specific economic environment for ESG disclosure in Brazil. This research seeks to close this gap by building a predictive model that combines financial and accounting ratios with ESG scores to obtain a more accurate and comprehensive forecast of business failures within the Brazilian economy.

This study raises several research questions, starting with how far ESG scores can improve the precision of bankruptcy models. A further question is how machine-learning techniques can be applied to integrate ESG scores into financial forecasting effectively. The contribution of this study is to develop a comprehensive model for predicting corporate insolvency, based on financial and accounting ratios alongside the ESG scores of every firm, and to examine the application of diverse machine-learning methodologies.

This study thus pursues twin objectives: first, to identify the key factors that best predict the risk of corporate failure in the Brazilian market from a wide array of financial metrics and ESG (environmental, social, and governance) scores; second, to build a prediction model that incorporates these factors with a view to improving forecast precision. The study aims to meet these objectives by demonstrating the usefulness of ESG scores, financial and accounting ratios, and machine-learning methodologies in predicting bankruptcies. This would contribute to economic stability as well as sustainable business practices.

The importance of the study goes beyond its scholarly contribution: it offers policymakers, investors, and business leaders more advanced tools for evaluating the financial well-being of a firm and making better decisions about it. Moreover, by integrating ESG considerations, the study aligns with international trends toward progressive and ethical business conduct, supporting an approach to economic development that is ecologically sensitive and socially accountable.

The rest of this study is organized as follows. Section 2 provides a comprehensive review of machine-learning approaches for business failure. Section 3 explains the data and methodology used in this study. Section 4 explains the results and analysis. The last section concludes the study.

2. A Review of Machine-Learning Techniques for Business Failure Prediction

This study adopts a suite of 11 machine-learning algorithms, each chosen for its ability to predict company failure from data with diverse dimensions and complex structure. This section reviews these methodologies. Machine learning is a leading-edge branch of artificial intelligence with considerable power in the analysis and forecasting of complex datasets. Achieving higher accuracy and reliability in bankruptcy prediction for non-financial companies in the Brazilian environment therefore calls for deploying several models from the machine-learning palette. The following subsections discuss the key methodologies and elaborate on their use, strengths, and limitations in financial stability analysis.

2.1. Logistic Regression

Logistic regression is one of the most basic tools for binary classification tasks such as bankruptcy prediction. The model is simple and clear to interpret: it represents the probability of an event by fitting data to a logistic curve (Bapat and Nagale 2014). It evaluates the chances of bankruptcy using several financial ratios and indicators, thereby providing a clear evaluation of risk (Ogachi et al. 2020). Nonetheless, its effectiveness is limited when the relationship between the independent variables and the log-odds of the dependent variable is non-linear, because the model then cannot capture the extremely complex dynamics associated with financial distress (Youn and Gu 2010).
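
To make the logistic curve and the log-odds concrete, the following minimal Python sketch maps a linear score to a failure probability. The two ratios (leverage and return on assets), the coefficients, and the intercept are hypothetical values chosen for illustration; they are not estimates from this study.

```python
import math

def failure_probability(features, coefficients, intercept):
    """Map a linear score (the log-odds) to a probability via the logistic curve."""
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for two illustrative ratios (assumptions, not
# fitted values): leverage (debt / assets) and return on assets.
COEFFICIENTS = [2.0, -5.0]
INTERCEPT = -1.0

healthy_firm = [0.30, 0.10]       # low leverage, positive profitability
distressed_firm = [0.90, -0.15]   # high leverage, losses

p_healthy = failure_probability(healthy_firm, COEFFICIENTS, INTERCEPT)
p_distressed = failure_probability(distressed_firm, COEFFICIENTS, INTERCEPT)
print(f"healthy: {p_healthy:.3f}, distressed: {p_distressed:.3f}")
```

Because the log-odds are linear in the inputs, each coefficient can be read as the change in log-odds per unit change in its ratio, which is the interpretability advantage noted above.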

Studies have found logistic regression useful in bankruptcy prediction because it measures how much each variable contributes to the classification (G. Zhang et al. 1999), and it is among the most common bankruptcy prediction models. On multiple occasions, however, neural networks have outperformed logistic regression in prediction accuracy and classification rate (Kristanti et al. 2019). Compared with discriminant models, logistic regression has shown more discriminating power and better predictive ability in corporate financial distress settings (Uğurlu and Aksoy 2006).

Logistic regression has also been used to predict financial distress across several sectors, which indicates its flexibility and practicality in diverse industries. For non-financial corporations in particular, it has proved to be an effective method of assessing financial risk (Zizi et al. 2021).

2.2. K-Nearest Neighbors (KNN)

The K-Nearest Neighbors (KNN) algorithm assigns an entity to a class according to the characteristics of its nearest neighbors in the dataset. Because of its simplicity and flexibility, KNN is well suited to capturing complex, non-linear relationships between features without a predefined model (Gaber et al. 2016). However, its results depend on the value of the parameter “k” and on the choice of dissimilarity metric (W. Zhang et al. 2020). In addition, the curse of dimensionality reduces accuracy, and large datasets pose a computational challenge (Hjaltason and Samet 1999). The challenge, therefore, is to develop a KNN model that is more accurate and robust.

KNN is one of the most commonly used simple algorithms, with wide application in areas including machine learning, disease risk prediction, and traffic prediction (Kuang et al. 2019; Sun et al. 2018; Nguyen 2021; Sarker et al. 2020). It works by computing the distances between a test observation and the training observations and then classifying the test observation according to its K nearest neighbors (Paramita et al. 2022).
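
The distance-and-vote procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the Euclidean metric, k = 3, and the toy (leverage, return on assets) observations are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance is one possible dissimilarity metric (an assumption here).
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (leverage, return on assets) pairs for six firms.
TRAIN_POINTS = [(0.20, 0.12), (0.30, 0.08), (0.25, 0.10),
                (0.85, -0.10), (0.90, -0.20), (0.80, -0.05)]
TRAIN_LABELS = ["healthy", "healthy", "healthy", "failed", "failed", "failed"]

prediction_a = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (0.82, -0.08))
prediction_b = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (0.28, 0.09))
print(prediction_a, prediction_b)
```

Changing `k` or the distance function changes which neighbors vote, which is why both choices matter for the results.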

2.3. Decision Trees

Decision trees are a valuable tool in predictive modeling because they represent decisions and outcomes in a clear, structured tree format. They are well suited to both numerical and categorical data. However, their most common weakness is overfitting, which leads to poor generalization on unseen data. Pruning techniques or ensemble methods such as bagging, boosting, or random forest can be applied to mitigate overfitting and underfitting (Rokach 2016).
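
As an illustration of how a single node of a tree might be grown, the sketch below searches for the threshold on one feature that minimizes the weighted Gini impurity of the two resulting branches. Gini impurity is one common splitting criterion and is assumed here for the example; the leverage values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = failed, 0 = healthy)."""
    if not labels:
        return 0.0
    p_failed = sum(labels) / len(labels)
    return 2.0 * p_failed * (1.0 - p_failed)

def best_split(values, labels):
    """Return the (threshold, weighted impurity) of the best split on one feature."""
    best_threshold, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical leverage ratios and failure labels for six firms.
leverage = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
failed = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(leverage, failed)
print(threshold, impurity)
```

A full tree repeats this search recursively on each branch; without a depth limit or pruning it can keep splitting until it memorizes the training data, which is the overfitting risk discussed above.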

Decision trees have been used very widely in bankruptcy prediction, for example by Aoki and Hosonuma (2004). Their simplicity and clear interpretability explain why they have been applied in bankruptcy prediction models (Jacobs 2024). Moreover, decision trees have shown promising results compared with other methods, such as support vector machines (SVMs) (Masanobu et al. 2019), and some research has found that they surpass other methods applied to bankruptcy prediction (Baranyi et al. 2018).

2.4. Support Vector Machine (Linear Kernel)

The support vector machine (SVM) with a linear kernel is a popular supervised learning method for classification tasks, bankruptcy prediction among them. With a linear kernel, the SVM draws a linear boundary to separate the classes of data (Min and Lee 2005). The effectiveness of the approach has been demonstrated in various applications, for example, financial distress prediction in real estate (Ayuni et al. 2022). SVM is effective at both classification and regression; hence it is well known in the machine-learning field as a supervised learning model (Karatzoglou et al. 2004).
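
The linear boundary can be illustrated with a short Python sketch of the decision function and the hinge loss that a linear SVM penalizes during training. The weights and bias below are hypothetical and are not fitted to any data; the sketch shows how a trained boundary would be applied, not how it is learned.

```python
def decision_value(weights, bias, x):
    """Signed score w . x + b; its sign gives the side of the linear boundary."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x):
    """Return +1 (failure) or -1 (non-failure) according to the side of the boundary."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1

def hinge_loss(weights, bias, x, y):
    """Zero when the point is on the correct side with margin of at least 1."""
    return max(0.0, 1.0 - y * decision_value(weights, bias, x))

# Hypothetical boundary over (leverage, return on assets); not fitted values.
WEIGHTS = [4.0, -6.0]
BIAS = -2.0

label_distressed = classify(WEIGHTS, BIAS, [0.90, -0.15])
label_healthy = classify(WEIGHTS, BIAS, [0.30, 0.10])
# A firm just past the boundary is classified correctly but still incurs loss,
# because it lies inside the margin.
borderline_loss = hinge_loss(WEIGHTS, BIAS, [0.60, 0.00], 1)
print(label_distressed, label_healthy, round(borderline_loss, 2))
```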

Research has shown that, in bankruptcy prediction, SVMs can achieve higher accuracy than traditional tools (Min and Lee 2005). In the financial sector, SVMs have been applied to forecast the financial condition of a company (Ding et al. 2008). Furthermore, support vector machines have been reported to perform better than other classification models, including logistic regression and decision trees (Rainarli 2019). Researchers have also combined SVMs with feature selection techniques to improve their accuracy in predicting bankruptcy, especially during financial crises (Dellepiane et al. 2015). SVMs have also been investigated in combination with other methods, such as Gaussian processes, for probabilistic modeling in bankruptcy prediction scenarios (Antunes et al. 2017).

Specifically, linear-kernel SVMs have shown strong performance in binary classification, with reported accuracies varying between 77.8% and 91.2% in different scenarios (Chui and Lytras 2019). Additionally, SVMs have been applied in multiclass sentiment analysis and can deal with both linearly and non-linearly separable datasets, although the latter must first be transformed into a higher-dimensional space (Mukarramah et al. 2021).

2.5. Support Vector Machine (Non-Linear Kernel)

Non-linear kernelized support vector machines (SVMs) have an advantage when datasets are not extremely large. They perform non-linear classification, usually with a gamma hyperparameter that is tuned to improve predictive ability (Kalaiarasi and Maheswari 2021). In bankruptcy prediction, SVMs are in common use because, compared with traditional statistical methods, they are easy to interpret, data-driven, and distribution-free (Muñoz-Izquierdo et al. 2019). Optimizing the Gaussian kernel parameter (gamma) and the penalty factor is fundamental to SVM performance (Min and Lee 2005). Research has found that SVMs with well-optimized kernel function parameters are highly effective in the bankruptcy prediction task (Wang et al. 2017).
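
To show what the gamma hyperparameter controls, the sketch below evaluates a Gaussian (RBF) kernel for two hypothetical firms at several gamma values. The firm vectors and the gamma grid are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical (leverage, return on assets) vectors for two firms.
firm_a = [0.30, 0.10]
firm_b = [0.90, -0.15]

# A larger gamma makes similarity decay faster with distance, so the decision
# surface becomes more local and more flexible (and easier to overfit).
similarities = {gamma: rbf_kernel(firm_a, firm_b, gamma) for gamma in (0.1, 1.0, 10.0)}
for gamma, value in similarities.items():
    print(f"gamma={gamma}: similarity={value:.4f}")
```

In practice gamma is tuned jointly with the penalty factor, for instance by a grid search with cross-validation, since the two interact in determining how closely the boundary follows the training data.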

SVMs are widely regarded as among the most popular classifiers because they obtain a globally optimal solution and show high generalization performance (Wang et al. 2017). SVMs have also been applied with great success in domains other than bankruptcy prediction, such as landslide hazard analysis and wind energy forecasting (Moreno et al. 2020; He and Li 2021). Further comparisons between SVMs and other classification methods, including artificial neural networks, indicate that SVMs perform better in some situations (Kalantar et al. 2018; Byvatov et al. 2003). Consequently, SVMs have been studied extensively in the field of bankruptcy prediction, covering areas from feature selection and ensemble learning to the application of fuzzy SVMs (Smiti and Soui 2020; Lin et al. 2018; Chaudhuri and De 2011).

Overall, the case for SVMs with non-linear kernels in bankruptcy prediction rests on their power, reliability, and flexibility when dealing with complex decision surfaces.

2.6. Artificial Neural Networks (ANNs)

Artificial neural networks, including their deep learning variants, are currently at the forefront of models for non-linear and complicated relationships. They are therefore very effective at capturing the intricate patterns present in large and diverse datasets. ANN models have also been successful in bankruptcy prediction, since they can generalize, learn, and identify patterns in financial data (Özparlak and Özdemir Dilidüzgün 2022). Despite this effectiveness, ANNs have often been criticized as ‘black box’ systems that require large amounts of training data and are computationally costly (Ansari et al. 2020).
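
A minimal sketch of how such a network turns inputs into a failure probability is given below: a forward pass through one hidden layer. The architecture (two tanh hidden units and a sigmoid output) and every weight are hypothetical; in practice the weights would be learned from data by backpropagation.

```python
import math

def forward_pass(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tanh activations, followed by a sigmoid output unit."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    score = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights over (leverage, return on assets); not trained values.
HIDDEN_WEIGHTS = [[3.0, -4.0], [-2.0, 5.0]]
HIDDEN_BIASES = [-1.0, 0.5]
OUTPUT_WEIGHTS = [2.0, -2.0]
OUTPUT_BIAS = 0.0

risk_distressed = forward_pass([0.90, -0.15], HIDDEN_WEIGHTS, HIDDEN_BIASES,
                               OUTPUT_WEIGHTS, OUTPUT_BIAS)
risk_healthy = forward_pass([0.30, 0.10], HIDDEN_WEIGHTS, HIDDEN_BIASES,
                            OUTPUT_WEIGHTS, OUTPUT_BIAS)
print(f"{risk_distressed:.3f} {risk_healthy:.3f}")
```

Even in this tiny network, the contribution of each input passes through nested non-linear functions, which hints at why larger networks are hard to interpret.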

Several studies have found ANNs effective in bankruptcy prediction. Strategies suggested for enhancing their efficacy include neuro-genetic approaches, hybrid metaheuristic methods, and differential evolution-based pruning models (Gaytan et al. 2022; Tang et al. 2019). All of these aim to train ANNs more effectively and so increase their accuracy in financial distress prediction.

In addition, the use of ANNs in financial applications has attracted the attention of scholars over the years. Researchers have applied ANNs to wide-ranging problems in finance, including, but not limited to, financial crisis prediction (Shin and Lee 2004) and capital structure analysis (Lim and Nam 2006). This inherent flexibility makes ANNs adaptable to a broad spectrum of financial scenarios, and their use can even be extended to some nonstandard cases to gain valuable insights into complex economic phenomena.

2.7. Random Forest

Random forest is a flexible model that is applied to both regression and classification tasks because of its strong performance and structural simplicity (DeSalvo and Mohri 2016). As described by Xu et al. (2023), this ensemble method combines the bagging approach with the random subspace method (DeSalvo and Mohri 2016). Random forest is also known for its high classification performance, feature selection, and quick training and classification speeds (Mishina et al. 2015), which can make it a useful tool in the prediction of bankruptcy.
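
The two sampling ideas mentioned above, bagging and the random subspace method, can be sketched as follows. The sketch shows only the sampling and voting steps, not tree construction; the 235 rows and seven candidate variables echo the figures in the abstract, while the subset size of three, the seed, and the example votes are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(n_rows, rng):
    """Bagging: draw row indices with replacement, one draw per original row."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, subset_size, rng):
    """Random subspace method: pick a subset of features without replacement."""
    return sorted(rng.sample(range(n_features), subset_size))

def majority_vote(votes):
    """Combine the class votes of the individual trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
rows_for_tree = bootstrap_sample(235, rng)
features_for_tree = random_feature_subset(7, 3, rng)

# Hypothetical votes from five trees (1 = failure, 0 = non-failure).
forest_prediction = majority_vote([1, 0, 1, 1, 0])
print(len(set(rows_for_tree)), features_for_tree, forest_prediction)
```

Because each tree sees a different resample of firms and a different subset of variables, the trees make partly independent errors, and averaging their votes reduces the variance of a single deep tree.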

This model is a combination of many classification trees, and variables with greater importance scores play a bigger role in the classification decision (Prusak 2018). Furthermore, random forest copes better with the curse of dimensionality and obtains accurate estimates of feature importance without costly iterative model training that requires cross-validation (Ahsan et al. 2022). In the field of bankruptcy prediction, machine-learning models such as random forest are increasingly preferred to conventional methods, including artificial neural networks. Kovacova and Kliestikova (2017) note this shift toward machine-learning models for bankruptcy prediction, including support vector machines, bagging, boosting, and random forest itself.

2.8. Gradient Boosting

Gradient boosting is a machine-learning technique for building an additive model in a forward stage-wise manner. In this technique, introduced by Friedman in 2001 (Cheng et al. 2018), each new prediction model is trained on the errors of the model ensemble generated in the previous iteration (Pérez et al. 2020). The gradient boosting decision tree (GBDT) is a classical boosting-based model that minimizes the prediction loss through regression-like training (Feng et al. 2020).
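
The stage-wise idea can be illustrated with a small Python sketch in which each stage fits a one-split regression stump to the current residuals. Squared-error loss (for which the negative gradient equals the residual), the learning rate, the number of stages, and the toy data are all assumptions made for the example; production libraries add many refinements.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    ordered = sorted(set(xs))
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Additive model: start from the mean, then add stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_stages):
        # With squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical leverage ratios and failure indicators for six firms.
leverage = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
failed = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

model = gradient_boost(leverage, failed)
print(round(model(0.25), 3), round(model(0.85), 3))
```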

XGBoost, an implementation of gradient boosting, is particularly useful in bankruptcy prediction. Studies by Yotsawat et al. (2023) and Smiti et al. (2022) report the successful application of XGBoost in banking as the best method for handling imbalanced data, in both bankruptcy prediction and credit risk assessment. XGBoost has also performed well in other areas, including credit scoring (Yotsawat et al. 2021).

More generally, the gradient boosting algorithm is one of the best machine-learning techniques for prediction tasks. It has been applied in diverse fields, such as price prediction for agricultural commodities (Hegde et al. 2023), slope stability prediction in mining (Saadaari et al. 2020), and the development of models for predicting soil moisture (Hwase and Fofanah 2021). Because the algorithm can learn a functional mapping from inputs to outputs, several studies have noted its very high prediction accuracy (Vanhaeren et al. 2020). It is, therefore, a very useful tool for predictive modeling.

2.9. XGBoost Classifier

XGBoost has become popular because it is a highly optimized version of the gradient-boosting algorithm, is used in many fields, and tends to control overfitting better than traditional gradient-boosted methods (Chen and Guestrin 2016). XGBoost has been used with great success in a range of problems, such as healthcare, urban classification, fault diagnosis, and financial risk assessment (Torlay et al. 2017; Georganos et al. 2018; Abdi 2020; Deif et al. 2021). Research findings suggest that XGBoost is more accurate than other classifiers, such as the classical support vector machine, random forests, and neural networks (Jafarzadeh et al. 2021; Ramdani and Furqon 2022; Asselman et al. 2023; Sahin 2020). For example, a study on the classification of urban forests found that XGBoost outperformed the other models, with a lower root mean square error (Ramdani and Furqon 2022).

XGBoost has also been applied to bankruptcy prediction, which shows how flexible it can be in financial applications and how it can help researchers improve the accuracy of their prediction models (Smiti et al. 2022). In this setting, XGBoost acts as an ensemble classifier that combines weak learners through boosting to produce a strong classifier.

Moreover, XGBoost can achieve high computational efficiency and accuracy with minimal computation, for example in the recognition of indoor activities and the monitoring of breathing patterns (Purnomo et al. 2021). This combination of fast data processing and high accuracy may be the main reason for choosing it in machine-learning applications. Other studies show that XGBoost can be further optimized with additional methods, such as the artificial bee colony algorithm, to raise its classification accuracy (Wang et al. 2022b).

2.10. AdaBoost Classifier

The AdaBoost classifier belongs to the class of iterative ensemble methods, whose objective is to combine several weak classifiers into a single boosted classifier (Kadkhodaei et al. 2020; Kumar et al. 2021). By contrast, boost-by-majority classifiers combine weak hypotheses by summing their probabilistic predictions (Ganatra and Kosta 2010). The resulting ensemble is a linear combination of weak classifiers, each of which focuses on classifying different input features (Hu 2020). The approach exploits the strengths of the weak classifiers, provided that each predicts better than random guessing (Ferreira and Figueiredo 2012; Wyner et al. 2017).

AdaBoost has been used to predict bankruptcy among Brazilian firms, and the same technique has been applied to firms in other settings, specifically Korean construction companies (Barboza et al. 2022). Research indicates that combining AdaBoost with SMOTE can further improve performance in financial bankruptcy prediction (Sun et al. 2014; Faris et al. 2020). AdaBoost has also been used in fault diagnosis for aviation cables (Wang et al. 2022a) and in structural health monitoring frameworks (Buckley et al. 2023), which indicates its potential across different domains.
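As a minimal sketch of the weighting idea described above, the following Python snippet runs a single round of the standard discrete AdaBoost update on a toy example. The labels, the weak-learner predictions, and the function name `adaboost_round` are illustrative assumptions and are not taken from this study's pipeline.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    Returns the weak learner's vote (alpha) and the renormalized sample weights.
    """
    # Weighted error of the weak learner under the current sample weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must beat random guessing and not be perfect")
    # Vote of this weak learner in the final linear combination.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are up-weighted, correctly classified ones down-weighted.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

# Toy data (hypothetical): five firms, one of which the weak learner misclassifies.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5
alpha, weights = adaboost_round(weights, y_true, y_pred)
```

After this round the misclassified firm carries half of the total weight, so the next weak learner is pushed to classify it correctly. The final classifier is the sign of the alpha-weighted sum of all weak learners.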

2.11. CatBoost Classifier

CatBoost is a powerful gradient-boosting algorithm known for its ability to handle categorical data directly, without lengthy preprocessing. This is relevant to bankruptcy prediction, where the data often contain categorical as well as numerical variables. Its efficiency and advanced regularization make the algorithm well suited to predictive tasks in such contexts (Prokhorenkova et al. 2018).

The use of machine-learning classifiers to anticipate business failure is well covered in the literature. Algorithms such as CatBoost have been used to develop models that predict financial insolvency risk (Zou et al. 2022), fraudulent financial reporting (Arshad et al. 2015), and the long-term business failure of construction companies (Choi et al. 2017). Moreover, profitability, debt-related, and liquidity-related factors have been found to be important in predicting business failure and financial distress. Other predictors of success and failure in small businesses include the age of the business and first-year sales (Marom and Lussier 2014).

To compare the machine-learning algorithms discussed above, Table 1 summarizes each method’s advantages and disadvantages, as well as its specific application in this study. The table serves as a quick reference for assessing the suitability of each algorithm for different aspects of bankruptcy prediction, clarifying their practical uses and inherent limitations in predicting financial distress.

3. Data and Methodology

3.1. Data and Variables

In this study, we began our analysis with 40 variables previously recognized as effective in predicting business failure among non-financial firms in Brazil. These variables comprise a comprehensive set of financial and accounting ratios as well as environmental, social, and governance (ESG) scores. They were selected for their established predictive value in the literature, as illustrated by classic models such as Altman’s Z-score, Ohlson’s O-score, and the probit model. Each ratio has historically been validated as a robust indicator of a firm’s financial health and potential distress. The ratios were sourced from the Refinitiv Eikon database, and Python was used for the analysis.

Our study adopts a dual approach to categorize Brazilian firms into ‘failed’ and ‘non-failed’ groups, based on distinct criteria inspired by Richardson et al. (1994). This framework identifies four specific types of business failures, each requiring tailored managerial responses. Predominantly, Brazilian firms fall into what is described as the ‘boiled frog’ syndrome, where businesses deteriorate gradually and fail to adapt to changing market environments. This gradual decline and strategic entrapment, marked by continuous financial losses over successive years, define our criteria for business failure. Conversely, we identify non-failing firms as those showing financial stability or improvement over time.

To underpin our categorization with empirical evidence, we re-checked the selected firms against the criteria used by Laeven and Levine (2009), Delis and Staikouras (2011), and Chiaramonte and Casu (2017). These studies analyze the mean and standard deviation of return on assets (ROA) across rolling five-year periods and calculate a Z-score, a widely used metric for assessing financial health and bankruptcy risk. For our purposes, the Z-score is coded as ‘0’ if a firm encounters financial difficulties during period ‘t’, and as ‘1’ otherwise. This method recognizes the absence of a universal threshold for differentiating between financially healthy and distressed firms.
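The rolling calculation can be sketched as follows. In the cited banking literature the Z-score is usually built as return on assets plus a capital ratio, divided by the standard deviation of ROA; including the equity-to-assets term here is our assumption about the variant used, and the data, function names, and cutoff are hypothetical. Because the text notes that no universal threshold exists, the cutoff is left as a free parameter.

```python
from statistics import mean, stdev

def rolling_z_scores(roa, equity_to_assets, window=5):
    """Z-score per period: (mean ROA + capital ratio) / std of ROA over the window."""
    scores = []
    for end in range(window, len(roa) + 1):
        window_roa = roa[end - window:end]
        sigma = stdev(window_roa)  # sample standard deviation over the window
        if sigma == 0:
            scores.append(float("inf"))  # no ROA volatility in this window
            continue
        # Capital ratio is taken from the last year of the window.
        scores.append((mean(window_roa) + equity_to_assets[end - 1]) / sigma)
    return scores

def code_distress(scores, cutoff):
    """Binary coding used in the text: 0 = financial difficulty, 1 = otherwise."""
    return [0 if z < cutoff else 1 for z in scores]

# Hypothetical firm: six years of ROA and equity-to-assets ratios.
roa = [0.04, 0.05, 0.03, -0.02, -0.06, -0.08]
equity_to_assets = [0.30, 0.31, 0.29, 0.25, 0.20, 0.15]
z = rolling_z_scores(roa, equity_to_assets)
labels = code_distress(z, cutoff=3.0)  # cutoff chosen for illustration only
```

With six years of data and a five-year window, the sketch yields two Z-scores. The second is lower because losses deepen and the capital buffer shrinks, so the hypothetical firm moves from the stable to the distressed category.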

Based on this methodology, we classified 235 companies into either financially troubled or financially stable categories. Firms with higher Z-scores are deemed financially healthy or not in financial trouble, while those with lower Z-scores are categorized as financially troubled. This classification system supports our analysis and facilitates the training and testing of various machine-learning methods to accurately predict bankruptcy potential.

These firms were systematically selected from an extensive database of Brazilian non-financial firms, using criteria intended to yield a sample representative of Brazil’s broader economy. This approach supports the generalizability and relevance of our findings to local market conditions. The selection covers a diverse cross-section of industries, which enhances the applicability of our results across sectors.

This study employs several machine-learning algorithms (logistic regression, KNN, decision trees, linear SVM, RBF SVM, neural networks, random forest, gradient boosting, XGBoost, AdaBoost, and CatBoost classifiers) to assess the bankruptcy risk of non-financial companies in Brazil. The models were applied only after a detailed screening process that reduced multicollinearity and applied principal component analysis to the data.

3.2. Data Screening Process

We began the data analysis with feature screening, conducting a thorough evaluation of multicollinearity, which is crucial for the accuracy and dependability of the prediction model. Multicollinearity arises when independent variables are strongly correlated; it inflates the variance of model coefficients and makes the individual effect of each variable difficult to isolate. To address it, we used the variance inflation factor (VIF) and eliminated redundant features, focusing on variable pairs with a correlation coefficient of 0.9 or above in absolute value. Excluding 12 attributes in this way improved the model’s interpretability and predictive accuracy.
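The pairwise rule described above can be sketched in a few lines of Python. This covers only the 0.9 correlation cutoff, not the VIF computation, and the greedy keep-first ordering, the column names, and the data are illustrative assumptions rather than the procedure actually used in the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_highly_correlated(columns, threshold=0.9):
    """Greedily keep a column only if |r| < threshold against every kept column."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical ratios for six firms; "roa_alt" is almost a negative copy of "roa".
data = {
    "roa": [0.02, 0.05, -0.01, 0.08, 0.03, -0.04],
    "roa_alt": [-0.021, -0.049, 0.012, -0.081, -0.03, 0.041],
    "quick_ratio": [1.2, 0.8, 1.5, 0.9, 1.1, 1.0],
}
selected = drop_highly_correlated(data)
```

Using the absolute value of the coefficient means that strongly negative pairs are removed as well, which matches the statement that the sign of the correlation does not matter.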

The removal of these characteristics also simplified the feature set, which is crucial for improving bankruptcy prediction capabilities by highlighting key indicators of financial stability in Brazilian firms. This step emphasized the importance of critical feature selection and multicollinearity reduction in complex predictive models used for financial assessments.

To further streamline the dataset, we applied principal component analysis (PCA) to the remaining 28 variables. This technique reduces dataset complexity by reducing the number of variables and retaining those that capture the largest share of the data’s variance. Such reduction eases model development and analysis, lowers computational demands, and improves the efficiency of subsequent machine-learning applications. After applying PCA, as reported in Figure 1, our analysis concentrated on seven variables that captured nearly 60% of the total variance; these were used in the subsequent machine-learning models.
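A minimal sketch of this reduction step, assuming NumPy is available and that the variables are standardized before decomposition (the article does not state its preprocessing). The data are synthetic random numbers with the same shape as the screened dataset, so the share of variance they yield is not comparable to the figure reported above; `pca_fit` is a hypothetical helper name.

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA via SVD on standardized data; returns scores and explained-variance ratios."""
    # Standardize each column so that variables on different scales are comparable.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xs @ Vt[:n_components].T
    return scores, explained

rng = np.random.default_rng(0)
# Synthetic stand-in for the screened data: 235 firms x 28 variables.
X = rng.normal(size=(235, 28))
scores, explained = pca_fit(X, n_components=7)
# Share of total variance captured by the first seven components.
cumulative = float(explained[:7].sum())
```

In practice the number of components is often chosen by inspecting the cumulative sum of `explained`, which is the quantity a scree or cumulative-variance plot displays.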

3.3. Descriptive Statistics

Table 2 reports descriptive statistics for the seven variables retained for detailed examination. Each shows a substantial difference in its statistical characteristics between failed and non-failed firms, which points to the potential usefulness of these variables in forecasting the financial stability of Brazilian non-financial firms.

The environmental, social, and governance (ESG) scores were markedly higher for non-failed firms, with significantly higher averages than for failed firms and slightly more variability. In other words, companies with strong ESG scores may benefit from greater operational stability and better governance, features that can help reduce financial risk. In the Brazilian context, where ESG issues are increasingly influential, sustainability affects both business operations and investor decisions. Higher ESG scores may therefore reflect better risk management and long-term strategic planning.

Similarly, the net profit margin (P1) had a negative mean in both groups, which points to difficult economic conditions, or to sector-typical conditions, that weigh heavily on profitability. Variability was much higher among failed firms, which indicates greater susceptibility to extreme financial distress and underlines the importance of this metric for firm health.

In terms of operational efficiency, as captured by operating return on assets (P7) and return on capital employed (P8), non-failed firms achieved positive returns, which indicates more efficient management and use of assets and capital. The negative returns of failed firms, meanwhile, point to poor capital management, a serious weakness in the capital-intensive contexts that characterize many sectors in Brazil.

The liquidity metrics, quick ratio (L2) and cash to current liabilities (L3), showed less variability among failed firms, although both groups had negative mean values. This suggests a uniformly poor liquidity position among failed firms, which contributes to financial instability. The higher variability among non-failed firms, on the other hand, may signal divergent liquidity management practices, with some firms maintaining liquidity buffers large enough to cushion operational and financial pressures.

Lastly, the inventory turnover ratio (AC1) gave insight into operational efficiency: the non-failed group had relatively high turnover but extreme variability. This suggests that, while some non-failed firms manage their inventory effectively, meeting market demand and saving costs, others may have been experimenting with strategies to optimize inventory levels.

Together, these variables give a more comprehensive picture of the financial situation and operational efficiency of firms in Brazil and point to the financial indicators that matter most for predicting stability or distress. This offers useful insight to stakeholders such as investors, creditors, and regulatory bodies, who can make better-informed judgments about the financial viability of firms in this emerging market.

3.4. Pooled Within-Groups Correlation

In building our predictive model, we computed the pooled within-groups correlation matrix reported in Table 3. This matrix was instrumental in refining the selection of variables that discriminate robustly between failed and non-failed firms. In general, the relationships among the seven variables (ESG, P1, P7, P8, L2, L3, AC1) are weak, which confirms that there is little redundancy among the features. This absence of strong correlations is beneficial: it signifies that each variable contributes unique information and enhances the model’s ability to discriminate between groups on the basis of distinct financial indicators. Notably, the weak positive correlation (0.046) between the ESG score and net profit margin (P1) indicates that the two variables contribute largely independently to the model, which allows a more fine-grained interpretation of financial health. The comparatively high negative correlation (−0.628) between operating return on assets (P7) and return on capital employed (P8) indicates a trade-off and adds to the model information about opposite movements in profitability and capital utilization efficiency.

Further, the negative relationship between the liquidity ratios (L2, L3) and the profitability ratios (P1, P7, P8) points to a tension between liquidity and profitability that firms need to balance, and that matters for forecasting. Such relationships can indicate financial weaknesses or strengths, depending on how the firm manages them. The correlation matrix informs feature selection by identifying the most informative independent variables, minimizing multicollinearity problems, and ensuring that the model can accommodate a wide range of financial health indicators. Principal component analysis (PCA) was then carried out on these variables to reduce dimensionality and extract the variables best able to capture the critical aspects of data variance, which is important for both computational efficiency and predictive accuracy.

The insight drawn from the presented matrix is paramount in training our machine-learning models to make effective predictions of the financial stability of Brazilian non-financial firms. This structured approach helps us derive a more robust model that not only predicts with higher accuracy but also yields clear and actionable insights for stakeholders into the financial dynamics within the Brazilian market.

4. Results and Analysis

4.1. Application of Machine-Learning Models for Predicting Business Failure

In this study, the reduced data obtained after extensive pre-processing, including multicollinearity checks and principal component analysis, were used to predict the bankruptcy potential of non-financial firms in Brazil with the machine-learning models listed in Table 4. Each model is evaluated on precision, recall, F1-score, and overall accuracy. Taken together, these metrics help ensure that our findings are statistically sound and practically relevant.

The logistic regression model showed high precision and recall: 0.92 and 0.94, respectively, for non-failed firms and 0.94 and 0.92, respectively, for failed firms, giving an F1-score of 0.93 for both groups. With an overall accuracy of 93.39%, logistic regression is slightly less accurate than the more sophisticated models but remains a good choice for this kind of binary classification task.

Among the more advanced models, K-Nearest Neighbors, decision trees, and a series of ensemble methods (random forest, gradient boosting, XGBoost, AdaBoost, and CatBoost) performed very strongly. Random forest and CatBoost in particular scored 100% on all metrics for differentiating between failed and non-failed firms. These models capture the patterns hidden in the data well and thus show a good ability to handle complex, non-linear relationships and interactions among the financial indicators.

The support vector machine (SVM) models with linear and radial basis function (RBF) kernels also performed robustly. The linear-kernel SVM gave slightly higher precision and recall for non-failed firms than for failed firms, which makes it slightly more sensitive in identifying non-failed firms. The SVM with the RBF kernel, on the other hand, identified slightly more failed firms, owing to its higher recall in this category.

For the neural network model, precision and recall ranged from 0.97 to 0.98, with F1-scores around 0.97 and an overall accuracy of 97.65%. This indicates that the model generalizes well across both classes, without serious bias towards either. Overall, the random forest and XGBoost classifiers were the most reliable. Random forest achieved a precision and recall of 100%, indicating that it differentiated between failed and non-failed firms without misclassification. Similarly, XGBoost performed strongly across metrics, and its robust handling of imbalanced data makes it particularly valuable for predicting business failures in environments with varying data scales. These models are recommended for future studies because they deliver precise and interpretable results, which are crucial for early warning systems in financial health assessments.

In sum, the prediction models applied in our study contribute to financial risk management by offering precise and reliable predictions of bankruptcy. The high performance metrics reflect the effectiveness of our feature selection and data processing techniques and indicate that the variables retained after PCA carry critical predictive power. These insights enrich our understanding of the financial stability indicators of Brazilian firms and provide a tool that helps stakeholders, including investors, creditors, and regulatory bodies, make informed decisions. Our findings underline the value of advanced analytical techniques for assessing the financial health of firms and reinforce the potential of machine learning in financial forecasting and risk assessment.
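The F1-scores reported for logistic regression can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. A quick check in Python (the function name is ours):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for logistic regression in the text.
f1_non_failed = f1_score(0.92, 0.94)
f1_failed = f1_score(0.94, 0.92)
# Both values round to 0.93, in line with the reported F1-score.
```

Because the harmonic mean is symmetric, swapping precision and recall between the two classes leaves the F1-score unchanged, which is why both groups share the same value.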

4.2. Receiver Operating Characteristic (ROC) Curves

The receiver operating characteristic (ROC) curve in Figure 2 is one of the most important tools for evaluating a classifier. In this study, the ROC curve shows graphically the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) of the classification models used. Each curve represents a different model, and the area under the ROC curve (AUC) gives a single measure of overall performance. The closer the curve follows the left-hand border and then the top border of the ROC space, the better the classifier. Notably, the random forest and CatBoost classifiers both yield high sensitivity and specificity, with ROC curves close to the top-left corner, the mark of an excellent prediction model in financial assessments.

The ROC curves in this study represent the performance of the machine-learning models in separating failed from non-failed Brazilian firms. Models whose ROC curves approach the top-left corner of the graph, such as the random forest and CatBoost classifiers, perform better. These curves, together with the high AUC values, demonstrate the discriminative power of the models, which is of great importance to investors and regulatory bodies seeking precise financial risk assessments.
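To make the construction concrete, the sketch below builds ROC points by sweeping a decision threshold over predicted scores and integrates the curve with the trapezoidal rule. The labels and scores are invented for illustration, and treating label 1 as the positive class is an assumption.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = positive class) and predicted scores for six firms.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)  # 8/9 here: eight of nine positive-negative pairs are ranked correctly
```

An AUC of 1.0 corresponds to a curve that passes through the top-left corner, while 0.5 corresponds to the diagonal of a classifier that ranks firms at random.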

4.3. Confusion Matrices

The confusion matrices in Figure 3 give a detailed breakdown of each model’s predictions against the actual labels. They report true positives, true negatives, false positives, and false negatives. This information is essential for understanding a model’s precision (correct positive predictions divided by all positive predictions) and recall (correct positive predictions divided by all actual positives).

The confusion matrices of the ensemble methods, such as random forest and CatBoost, showed near-perfect classification, with high counts of true positives and true negatives and very few false positives or false negatives. This demonstrates the reliability of these models for stakeholders making critical financial decisions.

Fewer false positives and false negatives show that the models are effective at minimizing the cost of misclassification. This is of particular interest to creditors, whose major concern is to avoid being wrongly assured that a firm is stable and performing well.
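The definitions above translate directly into code. The following sketch derives the four evaluation metrics used in this study from the cells of a binary confusion matrix; the counts are hypothetical and do not come from Figure 3.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)   # correct positive predictions / all positive predictions
    recall = tp / (tp + fn)      # correct positive predictions / all actual positives
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for a test set of 100 firms.
metrics = classification_metrics(tp=45, fp=3, fn=5, tn=47)
```

With these counts, precision is 45/48, recall is 45/50, and accuracy is 92/100, which shows how a model can be more precise than it is sensitive when false negatives outnumber false positives.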

4.4. Feature Importance

The feature importance graph in Figure 4 shows the relative contribution of each variable to the models’ predictions. In many models, the ESG score is a significant predictor, which points to the growing importance of environmental, social, and governance factors for the financial stability of firms. The high feature importance of ESG across models suggests that sustainable practices may be closely linked to corporate financial health, a key insight for investors and regulators interested in both financial performance and corporate responsibility. This result is in line with the current emphasis on sustainability and with stakeholders’ interest in long-term financial returns and sustainable business practices. In addition, the importance of return on capital employed (P8) and net profit margin (P1) varied across models, which implies that these indicators of financial health play nuanced roles.
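The article does not state how the importances in Figure 4 were computed. As a model-agnostic stand-in, the sketch below uses permutation importance: each feature column is shuffled in turn, and the resulting drop in accuracy is taken as that feature's importance. The data, the toy classifier, and all names are hypothetical.

```python
import random

def accuracy(model, X, y):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: column 0 stands in for an ESG score, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(40)]
y = [1 if row[0] > 0 else 0 for row in X]  # label driven by column 0 only

def toy_model(row):
    """Stand-in classifier that uses only column 0."""
    return 1 if row[0] > 0 else 0

importances = permutation_importance(toy_model, X, y)
```

Shuffling the noise column leaves every prediction unchanged, so its importance is exactly zero, whereas shuffling the informative column breaks its link to the label and lowers accuracy. Tree-based models also expose built-in importance measures, which may rank features differently from this approach.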

5. Conclusions

This work examines, with the support of machine-learning techniques, the predictive ability of ESG scores for Brazilian non-financial firms. The results indicate that models incorporating ESG ratings substantially improve the precision of bankruptcy predictions, which underlines the relevance of sustainability to business. These insights matter for stakeholders who aim to shore up economic stability and foster responsible investment strategies, especially in dynamic and emerging markets like Brazil.

To enhance the assessment of ESG factors, our study suggests integrating a more granular approach to capturing ESG nuances, such as differentiating between governance, social, and environmental impacts separately. Additionally, employing feature importance techniques within machine-learning models, as done in our study, can highlight which ESG components are most predictive of financial distress, allowing researchers and practitioners to focus on specific areas of ESG that influence corporate performance the most.

The implications of this work are substantial. It demonstrates that ESG scores belong among the sustainability metrics used in standard financial analysis, with potential impacts at the policy and corporate governance levels. The findings support more nuanced risk assessments and investment decisions, that is, decisions that weigh social and environmental impacts on an equal footing with economic considerations. In addition, such an approach is likely to encourage companies to improve their ESG performance once they recognize its direct relation to their financial soundness and to investor confidence.

AI, particularly advanced machine-learning models like those applied in our study, can significantly enhance decision-making in business contexts by providing high-accuracy predictions of potential business failures. These technologies enable early identification of risk factors that are not apparent through traditional analysis methods. Investing in AI facilitates deeper insights into complex relationships between variables, such as those involving ESG factors, which are increasingly relevant in today’s business environment. This not only aids in avoiding financial distress but also supports sustainable business practices, making a strong case for its broader adoption.

Nonetheless, this study has some limitations. First, it focuses only on non-financial firms in Brazil, which limits its generalizability to other sectors and geographies. Second, reliance on machine-learning models, although it strengthens the analysis, can make predictions difficult to interpret; a common criticism of complex models is the “black box” problem. This may make such models cumbersome in practice for stakeholders who need clear, easily interpretable decision-support tools.

Future studies should therefore replicate this analysis across several markets and industries. A more in-depth analysis of ESG factors should also be conducted, taking into account component scores such as the total ESG risk score, environmental risk score, governance risk score, social risk score, and controversy level score. This would further refine the predictive model by identifying the factors that most affect financial performance. Future work should also develop approaches that improve the clarity and interpretability of ML models’ predictions. One such approach could be hybrid models that combine the predictive strength of deep learning with the explanatory power of traditional statistical methods. Finally, longitudinal study designs could help assess the temporal stability of ESG scores as predictors of financial health and, thereby, clarify the impact of sustained good sustainability practices on financial outcomes over several economic cycles.

Author Contributions

Conceptualization, H.R.; Methodology, H.R.; Software, H.R. and L.P.M.; Formal analysis, M.K.; Resources, A.M.A.; Data curation, A.M.A. and L.P.M.; Writing—review & editing, M.K. and S.A.; Visualization, S.A.; Supervision, A.M.A.; Project administration, S.A.; Funding acquisition, A.M.A. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

Scientific Research Fund provided by FCT, Fundação para a Ciência e Tecnologia (Portugal), National Funding through Research Grant, Grant/Award Number: UIDB/00685/2020, CEEAplA (António Almeida), UIDB/04521/2020; UIDB/04007/202 (Sumaira Ashraf).

Data Availability Statement

The data supporting the findings of this study are available from the Datastream database, accessed via our university account. Due to licensing agreements and privacy restrictions, the raw data cannot be publicly shared. Researchers can access these data through their institutional subscriptions to Datastream. For further details on the dataset and access procedures, interested researchers may contact the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Abdi, Abdulhakim Mohamed. 2020. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GIScience and Remote Sensing 57: 1–20.
Ahmad, Nisar, Asma Mobarek, and Naheed Nawazesh Roni. 2021. Revisiting the impact of ESG on financial performance of FTSE350 UK firms: Static and dynamic panel data analysis. Cogent Business and Management 8: 1900500.
Ahsan, Muhammad, Arif Khoirul Anam, Erdi Julian, and Andi Indra Jaya. 2022. Interpretable Predictive Model of Network Intrusion Using Several Machine Learning Algorithms. BAREKENG: Jurnal Ilmu Matematika Dan Terapan 16: 057–064.
Akinrinola, Olatunji, Wilhelmina Afua Addy, Adeola Olusola Ajayi-Nifise, Olubusola Odeyemi, and Titilola Falaiye. 2024. Application of machine learning in tax prediction: A review with practical approaches. Global Journal of Engineering and Technology Advances 18: 102–17.
Alsayegh, Maha Faisal, Rashidah Abdul Rahman, and Saeid Homayoun. 2020. Corporate economic, environmental, and social sustainability performance transformation through ESG disclosure. Sustainability 12: 3910.
Ansari, Abdollah, Ibrahim Said Ahmad, Azuraliza Abu Bakar, and Mohd Ridzwan Yaakub. 2020. A hybrid metaheuristic method in training artificial neural network for bankruptcy prediction. IEEE Access 8: 176640–50.
Antunes, Francisco, Bernardete Ribeiro, and Francisco Pereira. 2017. Probabilistic modeling and visualization for bankruptcy prediction. Applied Soft Computing Journal 60: 831–43.
Aoki, Shigeo, and Yukio Hosonuma. 2004. Bankruptcy Prediction Using Decision Tree. In The Application of Econophysics. Tokyo: Springer, pp. 299–302.
Arshad, Roshayani, Sharinah Mohamed Iqbal, and Normah Omar. 2015. Prediction of business failure and fraudulent financial reporting: Evidence from Malaysia. Indian Journal of Corporate Governance 8: 34–53.
Ashraf, Sumaira, Elisabete G. S. Félix, and Zélia Serrasqueiro. 2019. Do Traditional Financial Distress Prediction Models Predict the Early Warning Signs of Financial Distress? Journal of Risk and Financial Management 12: 55.
Asselman, Amal, Mohamed Khaldi, and Souhaib Aammou. 2023. Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interactive Learning Environments 31: 3360–79.
Aydoğmuş, Mahmut, Güzhan Gülay, and Korkmaz Ergun. 2022. Impact of ESG performance on firm value and profitability. Borsa Istanbul Review 22: S119–S127.
Ayuni, Ni Wayan Dewinta, Ni Nengah Lasmini, and Agus Adi Putrawan. 2022. Support Vector Machine (SVM) as Financial Distress Model Prediction in Property and Real Estate Companies. In Proceedings of the International Conference on Applied Science and Technology on Social Science 2022 (ICAST-SS 2022). Amsterdam: Atlantis Press, pp. 397–402.
Bapat, Varadraj, and Abhay Nagale. 2014. Comparison of Bankruptcy Prediction Models: Evidence from India. Accounting and Finance Research 3: 91–98.
Baranyi, Aranka, Csaba Faragó, Csilla Fekete, and Zsuzsanna Szeles. 2018. The Bankruptcy Forecasting Model of Hungarian Enterprises. Advances in Economics and Business 6: 179–89.
Barboza, Flávio Luiz De Moraes, Denize Lemos Duarte, and Michele Aparecida Cunha. 2022. Anticipating corporate’s distresses. Exacta 20: 470–96.
Bateni, Leila, and Farshid Asghari. 2016. Bankruptcy Prediction Using Logit and Genetic Algorithm Models: A Comparative Analysis. Computational Economics 55: 335–48.
Buckley, Tadhg, Bidisha Ghosh, and Vikram Pakrashi. 2023. A Feature Extraction & Selection Benchmark for Structural Health Monitoring. Structural Health Monitoring 22: 2082–127.
Byvatov, Evgeny, Uli Fechner, Jens Sadowski, and Gisbert Schneider. 2003. Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification. Journal of Chemical Information and Computer Sciences 43: 1882–89.
Capelli, Paolo, Federica Ielasi, and Angeloantonio Russo. 2021. Forecasting volatility by integrating financial risk with environmental, social, and governance risk. Corporate Social Responsibility and Environmental Management 28: 1483–95.
Chaudhuri, Arindam, and Kajal De. 2011. Fuzzy Support Vector Machine for bankruptcy prediction. Applied Soft Computing Journal 11: 2472–86.
Chen, Tianqi, and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. Paper presented at the Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17; pp. 785–94.
Cheng, Christine, Stewart Jones, and William J. Moser. 2018. Abnormal trading behavior of specific types of shareholders before US firm bankruptcy and its implications for firm bankruptcy prediction. Journal of Business Finance and Accounting 45: 1100–38.
Chiaramonte, Laura, and Barbara Casu. 2017. Capital and liquidity ratios and financial distress. Evidence from the European banking industry. British Accounting Review 49: 138–61.
Choi, Hyunchul, Hyeonwoo Sung, Hyukman Cho, Sungwook Lee, Hyojoo Son, and Changwan Kim. 2017. Comparison of single classifier models for predicting long-term business failure of construction companies using finance-based definition of the failure. Paper presented at the ISARC 2017: Proceedings of the 34th International Symposium on Automation and Robotics in Construction, Taipei, Taiwan, June 28–July 1; pp. 282–87.
Chouaibi, Salim, Matteo Rossi, Dario Siggia, and Jamel Chouaibi. 2022. Exploring the moderating role of social and ethical practices in the relationship between environmental disclosure and financial performance: Evidence from Esg companies. Sustainability 14: 209. [Google Scholar] [CrossRef]
Chui, Kwok Tai, and Miltiadis D. Lytras. 2019. A novel MOGA-SVM multinomial classification for organ inflammation detection. Applied Sciences 9: 2284. [Google Scholar] [CrossRef]
Deif, Mohanad A., Ahmed A. A. Solyman, Mohammed H. Alsharif, and Peerapong Uthansakul. 2021. Automated triage system for intensive care admissions during the covid-19 pandemic using hybrid xgboost-ahp approach. Sensors 21: 6379. [Google Scholar] [CrossRef] [PubMed]
Delis, Manthos D., and Panagiotis K. Staikouras. 2011. Supervisory effectiveness and bank risk. Review of Finance 15: 511–43. [Google Scholar] [CrossRef]
Dellepiane, Umberto, Michele Di Marcantonio, Enrico Laghi, and Stefania Renzi. 2015. Bankruptcy Prediction Using Support Vector Machines and Feature Selection During the Recent Financial Crisis. International Journal of Economics and Finance 7: 182–95. [Google Scholar] [CrossRef]
DeSalvo, Giulia, and Mehryar Mohri. 2016. Random composite forests. Paper presented at the 30th AAAI Conference on Artificial Intelligence, AAAI 2016, Phoenix, AZ, USA, February 12–17; pp. 1540–46. [Google Scholar] [CrossRef]
Ding, Yongsheng, Xinping Song, and Yueming Zen. 2008. Forecasting financial condition of chinese listed companies based on support vector machine. Expert Systems with Applications 34: 3081–89. [Google Scholar] [CrossRef]
Domicián, Máté, Raza Hassan, and Ahmad Ishtiaq. 2023. Comparative Analysis of Machine Learning Models for Bankruptcy Prediction in the Context of Pakistani Companies. Paper presented at the 2023 3rd International Conference on Intelligent Technologies, CONIT 2023, Hubli, India, June 23–25. [Google Scholar]
Elklawy, Mohamed. 2024. Is ESG a Determinant of Banks’ Resilience and Growth Everywhere? A Response from an AI-Aided Approach. Master’s thesis, The American University in Cairo, New Cairo, Egypt. [Google Scholar]
Faris, Hossam, Ruba Abukhurma, Waref Almanaseer, Mohammed Saadeh, Antonio M. Mora, Pedro A. Castillo, and Ibrahim Aljarah. 2020. Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: A case from the Spanish market. Progress in Artificial Intelligence 9: 31–53. [Google Scholar] [CrossRef]
Feng, Ji, Yi-Xuan Xu, Yuan Jiang, and Zhi-Hua Zhou. 2020. Soft Gradient Boosting Machine. arXiv arXiv:2006.04059. [Google Scholar]
Ferreira, Artur J., and Mário A. T. Figueiredo. 2012. Boosting Algorithms: A Review of Methods, Theory, and Applications. In Ensemble Machine Learning. New York: Springer. [Google Scholar] [CrossRef]
Friede, Gunnar, Timo Busch, and Alexander Bassen. 2015. ESG and financial performance: Aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance and Investment 5: 210–33. [Google Scholar] [CrossRef]
Furman, Jason, and Joseph E. Stiglitz. 1998. Economic crises: Evidence and Insights from East Asia. Brookings Papers on Economic Activity 1998: 1–135. [Google Scholar] [CrossRef]
Gaber, Tarek, Alaa Tharwat, Aboul Ella Hassanien, and Vaclav Snasel. 2016. Biometric cattle identification approach based on Weber’s Local Descriptor and AdaBoost classifier. Computers and Electronics in Agriculture 122: 55–66. [Google Scholar] [CrossRef]
Ganatra, Amit P., and Yogesh P. Kosta. 2010. Comprehensive Evolution and Evaluation of Boosting. International Journal of Computer Theory and Engineering 2: 931–36. [Google Scholar] [CrossRef]
Gaytan, Jesus Cuauhtemoc Tellez, Karamath Ateeq, Aqila Rafiuddin, Haitham M. Alzoubi, Taher M. Ghazal, Tariq Ahamed Ahanger, Sunita Chaudhary, and G. K. Viju. 2022. AI-Based Prediction of Capital Structure: Performance Comparison of ANN SVM and LR Models. Computational Intelligence and Neuroscience 2022: 8334927. [Google Scholar] [CrossRef]
Georganos, Stefanos, Tais Grippa, Sabine Vanhuysse, Moritz Lennert, Michal Shimoni, and Eleonore Wolff. 2018. Very High Resolution Object-Based Land Use-Land Cover Urban Classification Using Extreme Gradient Boosting. IEEE Geoscience and Remote Sensing Letters 15: 607–11. [Google Scholar] [CrossRef]
Habib, Ahmed Mohamed. 2023. Do business strategies and environmental, social, and governance (ESG) performance mitigate the likelihood of financial distress? A multiple mediation model. Heliyon 9: e17847. [Google Scholar] [CrossRef]
He, Shaofu, and Fei Li. 2021. Artificial Neural Network Model in Spatial Analysis of Geographic Information System. Mobile Information Systems 2021: 1166877. [Google Scholar] [CrossRef]
Hegde, Girish, Vishwanath R. Hulipalled, and Jay B. Simha. 2023. Data driven algorithm selection to predict agriculture commodities price. International Journal of Electrical and Computer Engineering 13: 4671–82. [Google Scholar] [CrossRef]
Helminen, Niilo. 2023. ESG Momentum and Stock Performance in U.S. During 2018–2023 Does Market Reward Companies for Improving ESG Scores? LUT University. Available online: https://urn.fi/URN:NBN:fi-fe20231204150949 (accessed on 2 November 2024).
Hjaltason, Gísli R., and Hanan Samet. 1999. Distance browsing in spatial databases. ACM Transactions on Database Systems 24: 265–318. [Google Scholar] [CrossRef]
Hu, Yi-Chung. 2020. A multivariate grey prediction model with grey relational analysis for bankruptcy prediction problems. Soft Computing 24: 4259–68. [Google Scholar] [CrossRef]
Hwase, Tesyon Korjo, and Abdul Joseph Fofanah. 2021. Machine Learning Model Approaches for Price Prediction in Coffee Market Using Linear Regression, XGB, and LSTM Techniques. International Journal of Scientific Research in Science and Technology 8: 10–48. [Google Scholar] [CrossRef]
Jacobs, Michael, Jr. 2024. Benchmarking alternative interpretable machine learning models for corporate probability of default. Data Science in Finance and Economics 4: 1–52. [Google Scholar] [CrossRef]
Jadhav, Swati, Hongmei He, and Karl Jenkins. 2018. Information gain directed genetic algorithm wrapper feature selection for credit rating. Applied Soft Computing Journal 69: 541–53. [Google Scholar] [CrossRef]
Jafarzadeh, Hamid, Masoud Mahdianpari, Eric Gill, Fariba Mohammadimanesh, and Saeid Homayouni. 2021. Bagging and Boosting Ensemble Classifiers for Classification of Comparative Evaluation. Remote Sensing 13: 4405. [Google Scholar] [CrossRef]
Kadkhodaei, Hamid Reza, Amir Masoud Eftekhari Moghadam, and Mehdi Dehghan. 2020. HBoost: A heterogeneous ensemble classifier based on the Boosting method and entropy measurement. Expert Systems with Applications 157: 113482. [Google Scholar] [CrossRef]
Kalaiarasi, Ganesan, and Sureshbabu Maheswari. 2021. Deep proximal support vector machine classifiers for hyperspectral images classification. Neural Computing and Applications 33: 13391–415. [Google Scholar] [CrossRef]
Kalantar, Bahareh, Biswajeet Pradhan, Seyed Amir Naghibi, Alireza Motevalli, and Shattri Mansor. 2018. Assessment of the effects of training data selection on the landslide susceptibility mapping: A comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomatics, Natural Hazards and Risk 9: 49–69. [Google Scholar] [CrossRef]
Kaleem, Mehwish, Hashim bin Jusoh, Hassan Raza, Misbah Sadiq, and Ahmad Husni bin Hamzah. 2024. A Machine Learning Approach to Predict Bankruptcy in Chinese Companies with ESG Integration. Pakistan Journal of Commerce and Social Sciences 18: 335–57. [Google Scholar]
Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Achim Zeileis. 2004. A survey on patents of typical robust image watermarking. Advanced Materials Research 271–273: 389–93. [Google Scholar] [CrossRef]
Kim, Sang, and Zhichuan Li. 2021. Understanding the impact of esg practices in corporate finance. Sustainability 13: 3746. [Google Scholar] [CrossRef]
Kovacova, Maria, and Jana Kliestikova. 2017. Modelling bankruptcy prediction models in Slovak companies. SHS Web of Conferences 39: 01013. [Google Scholar] [CrossRef]
Kristanti, Farida Titik, Deannes Isynuwardhana, and Sri Rahayu. 2019. Market concentration, diversification, and financial distress in the Indonesian banking system. Jurnal Keuangan Dan Perbankan 23: 514–24. [Google Scholar] [CrossRef]
Kuang, Li, Han Yan, Yujia Zhu, Shenmei Tu, and Xiaoliang Fan. 2019. Predicting duration of traffic accidents based on cost-sensitive Bayesian network and weighted K-nearest neighbor. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations 23: 161–74. [Google Scholar] [CrossRef]
Kumar, Mukesh, Karan Bajaj, Bhisham Sharma, and Sushil Narang. 2021. A Comparative Performance Assessment of Optimized Multilevel Ensemble Learning Model with Existing Classifier Models. Big Data 10: 371–87. [Google Scholar] [CrossRef] [PubMed]
Laeven, Luc, and Ross Levine. 2009. Bank governance, regulation and risk taking. Journal of Financial Economics 93: 259–75. [Google Scholar] [CrossRef]
Le, Tuong, Minh Thanh Vo, Bay Vo, Mi Young Lee, and Sung Wook Baik. 2019. A Hybrid Approach Using Oversampling Technique and Cost-Sensitive Learning for Bankruptcy Prediction. Complexity 2019: 8460934. [Google Scholar] [CrossRef]
Lim, Se Hun, and Kyungdoo Nam. 2006. Artificial Neural Network Modeling in Forecasting a Successful Implementation of ERP Systems. International Journal of Computational Intelligence Research 2: 115–19. [Google Scholar] [CrossRef]
Lim, Tristan. 2024. Environmental, social, and governance (ESG) and artificial intelligence in finance: State-of-the-art and research takeaways. Artificial Intelligence Review 57: 76. [Google Scholar] [CrossRef]
Lin, Wei-Chao, Yu-Hsin Lu, and Chih-Fong Tsai. 2018. Feature selection in single and ensemble learning-based bankruptcy prediction models. Expert Systems 36: e12335. [Google Scholar] [CrossRef]
Lisin, Anton, Andrei Kushnir, Alexey G. Koryakov, Natalia Fomenko, and Tatyana Shchukina. 2022. Financial Stability in Companies with High ESG Scores: Evidence from North America Using the Ohlson O-Score. Sustainability 14: 479. [Google Scholar] [CrossRef]
Liu, Yuting, and Minghua Lin. 2022. Research on The Influence of ESG Information Disclosure on Enterprise Financial Risk. Frontiers in Business, Economics and Management 5: 264–71. [Google Scholar] [CrossRef]
Marom, Shaike, and Robert N. Lussier. 2014. A Business Success Versus Failure Prediction Model for Small Businesses in Israel. Business and Economic Research 4: 63. [Google Scholar] [CrossRef]
Masanobu, Matsumaru, Kaneko Shoichi, Katagiri Hideki, and Kawanaka Takaaki. 2019. Bankruptcy prediction for Japanese corporations using support vector machine, artificial neural network, and multivariate discriminant analysis. International Journal of Industrial Engineering and Operations Management 1: 78–96. [Google Scholar] [CrossRef]
Michalkova, Lucia, Katarina Valaskova, Katarina Frajtova Michalikova, and Andrea Drugau Constantin. 2018. The Holistic View of the Symptoms of Financial Health of Businesses. Paper presented at the Third International Conference on Economic and Business Management (FEBM 2018), Hohhot, China, October 20–22; Amsterdam: Atlantis Press, vol. 56, pp. 90–18. [Google Scholar] [CrossRef]
Michela, Porta. 2023. The relationship between ESG Performance and Financial Stability: Are Sustainable Companies Less Likely to Fail? University of Vaasa-Vaasan yliopisto Research Repository. Available online: https://osuva.uwasa.fi/bitstream/handle/10024/16837/Uwasa_2024_Porta_Michela.pdf?sequence=2&isAllowed=y (accessed on 2 November 2024).
Min, Jae H., and Young Chan Lee. 2005. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications 28: 603–14. [Google Scholar] [CrossRef]
Mishina, Yohei, Ryuei Murata, Yuji Yamauchi, Takayoshi Yamashita, and Hironobu Fujiyoshi. 2015. Boosted random forest. IEICE Transactions on Information and Systems E98-D: 1630–36. [Google Scholar] [CrossRef]
Moreno, Sinvaldo Rodrigues, Ramon Gomes Silva, Matheus Henrique dal Molin Ribeiro, Naylene Fraccanabbia, Viviana Cocco Mariani, and Leandro dos Santos Coelho. 2020. Very Short-term Wind Energy Forecasting Based on Stacking Ensemble. Paper presented at the 14th Brazilian Computational Intelligence Meeting (CBIC), Belem, Brazil, November 16–20; pp. 1–22. [Google Scholar] [CrossRef]
Mukarramah, Rifqatul, Dedy Atmajaya, and Lutfi Budi Ilmawan. 2021. Performance comparison of support vector machine (SVM) with linear kernel and polynomial kernel for multiclass sentiment analysis on twitter. ILKOM Jurnal Ilmiah 13: 168–74. [Google Scholar] [CrossRef]
Muñoz-Izquierdo, Nora, María Del Mar Camacho-Miñano, María Jesús Segovia-Vargas, and David Pascual-Ezama. 2019. Is the external audit report useful for bankruptcy prediction? Evidence using artificial intelligence. International Journal of Financial Studies 7: 20. [Google Scholar] [CrossRef]
Neumeyer, Pablo A., and Fabrizio Perri. 2005. Business cycles in emerging economies: The role of interest rates. Journal of Monetary Economics 52: 345–80. [Google Scholar] [CrossRef]
Nguyen, Phong Thanh. 2021. Application Machine Learning in Construction Management. TEM Journal 10: 1385–89. [Google Scholar] [CrossRef]
Ogachi, Daniel, Richard Ndege, Peter Gaturu, and Zeman Zoltan. 2020. Corporate Bankruptcy Prediction Model, a Special Focus on Listed Companies in Kenya. Journal of Risk and Financial Management 13: 47. [Google Scholar] [CrossRef]
Özparlak, Gerçek, and Menevşe Özdemir Dilidüzgün. 2022. Corporate Bankruptcy Prediction Using Machine Learning Methods: The Case of the Usa. International Journal of Management Economics and Business 18: 1007–31. [Google Scholar] [CrossRef]
Paramita, Adi Suryaputra, Indra Maryati, and Laura Mahendratta Tjahjono. 2022. Implementation of the K-Nearest Neighbor Algorithm for the Classification of Student Thesis Subjects. Journal of Applied Data Sciences 3: 128–36. [Google Scholar] [CrossRef]
Pérez, Fernando A. Acosta, Gabriel E. Rodríguez Ortiz, Everson Rodríguez Muñiz, Fernando J. Ortiz Sacarello, Jee Eun Kang, and Daniel Rodriguez-Roman. 2020. Predicting Trip Cancellations and No-Shows in Paratransit Operations. Transportation Research Record 2674: 774–84. [Google Scholar] [CrossRef]
Prokhorenkova, Liudmila, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. Catboost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems. pp. 6638–48. Available online: https://api.semanticscholar.org/CorpusID:5044218 (accessed on 2 November 2024).
Prusak, Błażej. 2018. Review of Research into Enterprise Bankruptcy Prediction in Selected Central and Eastern European Countries. International Journal of Financial Studies 6: 60. [Google Scholar] [CrossRef]
Purnomo, Ariana Tulus, Ding Bing Lin, Tjahjo Adiprabowo, and Willy Fitra Hendria. 2021. Non-contact monitoring and classification of breathing pattern for the supervision of people infected by covid-19. Sensors 21: 3172. [Google Scholar] [CrossRef] [PubMed]
Rainarli, E. 2019. The Comparison of Machine Learning Model to Predict Bankruptcy: Indonesian Stock Exchange Data. IOP Conference Series: Materials Science and Engineering 662: 052019. [Google Scholar] [CrossRef]
Ramdani, Fatwa, and Muhammad Tanzil Furqon. 2022. The simplicity of XGBoost algorithm versus the complexity of Random Forest, Support Vector Machine, and Neural Networks algorithms in urban forest classification. F1000Research 11: 1069. [Google Scholar] [CrossRef]
Ranjan, Adyant, and Guillermo Goldsztein. 2022. An Optimization of Machine Learning Approaches in the Forecasting of Global Financial Stability. Journal of Student Research 11: 1–15. [Google Scholar] [CrossRef]
Rehman, Sana ur, Hani Baloch, and Abdul Qayyum. 2021. Anticipating Bankruptcy for Non-Financial Sector Firms: A Case of Pakistan. Journal of Accounting and Finance in Emerging Economies 7: 401–9. [Google Scholar] [CrossRef]
Richardson, Bill, Sonny Nwankwo, and Susan Richardson. 1994. Understanding the Causes of Business Failure Crises: Generic Failure Types: Boiled Frogs, Drowned Frogs, Bullfrogs and Tadpoles. Management Decision 32: 9–22. [Google Scholar] [CrossRef]
Rokach, Lior. 2016. Decision forest: Twenty years of research. Information Fusion 27: 111–25. [Google Scholar] [CrossRef]
Saadaari, F. Saadaari, Daniel Mireku-Gyimah, and Boluwaji Muriana Olaleye. 2020. Development of a Stope Stability Prediction Model Using Ensemble Learning Techniques—A Case Study. Ghana Mining Journal 20: 18–26. [Google Scholar] [CrossRef]
Sachs, Jeffrey, Aaron Tornell, and Andres Velasco. 1996. Financial Crises in Emerging Markets in 1995. Brookings Papers on Economic Activity 1: 147–215. [Google Scholar] [CrossRef]
Sahin, Emrehan Kutlug. 2020. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN Applied Sciences 2: 1308. [Google Scholar] [CrossRef]
Santoso, Nurilen Wuri, Ratih Kusumawardhani, and Alfiatul Maulida. 2024. Comparative Analysis Of The Altman, Ohlson, And Zmijewski Models To Predict Financial Distress During The Covid-19 Pandemic. Maksimum 14: 13–21. [Google Scholar] [CrossRef]
Sarker, Iqbal H., Faisal Faruque, Hamed Alqahtani, and Asra Kalim. 2020. K-Nearest Neighbor Learning based Diabetes Mellitus Prediction and Analysis for eHealth Services. EAI Endorsed Transactions on Scalable Information Systems 7: e4. [Google Scholar] [CrossRef]
Shekar, Shetty, Musa Mohamed, and Brédart Xavier. 2022. Bankruptcy Prediction Using Machine Learning Methods. Information Society. 1: 56–67. [Google Scholar] [CrossRef]
Shin, Kyung Shik, and Kyoung Jun Lee. 2004. Neuro-genetic approach for bankruptcy prediction modeling. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Berlin and Heidelberg: Springer, vol. 3214, pp. 646–52. [Google Scholar] [CrossRef]
Smiti, Salima, and Makram Soui. 2020. Bankruptcy Prediction Using Deep Learning Approach Based on Borderline SMOTE. Information Systems Frontiers 22: 1067–83. [Google Scholar] [CrossRef]
Smiti, Salima, Makram Soui, and Khaled Ghedira. 2022. Tri-XGBoost Model: An Interpretable Semi-supervised Approach for Addressing Bankruptcy Prediction. Authorea, 1–44. [Google Scholar] [CrossRef]
Srebro, Bosiljka, Bojan Mavrenski, Vesna Bogojević Arsić, Snežana Knežević, Marko Milašinović, and Jovan Travica. 2021. Bankruptcy risk prediction in ensuring the sustainable operation of agriculture companies. Sustainability 13: 7712. [Google Scholar] [CrossRef]
Srinivas, T. Aditya Sai, and Munnuru Bharathi. 2024. Financial Oracle: Unlocking Credit Scores. Journal of IoT and Machine Learning 1: 9–11. [Google Scholar] [CrossRef]
Sun, Bin, Wei Cheng, Prashant Goswami, and Guohua Bai. 2018. Short-term traffic forecasting using self-adjusting k-nearest neighbours. IET Intelligent Transport Systems 12: 41–48. [Google Scholar] [CrossRef]
Sun, Jie, Hui Li, Qing Hua Huang, and Kai Yu He. 2014. Predicting financial distress and corporate failure: A review from the state-of-the-art definitions, modeling, sampling, and featuring approaches. Knowledge-Based Systems 57: 41–56. [Google Scholar] [CrossRef]
Tang, Yajiao, Junkai Ji, Yulin Zhu, Shangce Gao, Zheng Tang, and Yuki Todo. 2019. A Differential Evolution-Oriented Pruning Neural Network Model for Bankruptcy Prediction. Complexity 2019: 8682124. [Google Scholar] [CrossRef]
Torlay, Laurent, Marcela Perrone-Bertolotti, Elizabeth Thomas, and Monica Baciu. 2017. Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain Informatics 4: 159–69. [Google Scholar] [CrossRef] [PubMed]
Uğurlu, Mine, and Hakan Aksoy. 2006. Prediction of corporate financial distress in an emerging market: The case of Turkey. Cross Cultural Management: An International Journal 13: 277–95. [Google Scholar] [CrossRef]
Vanhaeren, Thomas, Federico Divina, Miguel García-Torres, Francisco Gómez-Vela, Wim Vanhoof, and Pedro Manuel Martínez García. 2020. A comparative study of supervised machine learning algorithms for the prediction of long-range chromatin interactions. Genes 11: 985. [Google Scholar] [CrossRef]
Wang, Falin, Gang Yuan, Chaoyang Guo, and Zhinong Li. 2022. Research on fault diagnosis method of aviation cable based on improved Adaboost. Advances in Mechanical Engineering 14: 1–14. [Google Scholar] [CrossRef]
Wang, Mingjing, Huiling Chen, Huaizhong Li, Zhennao Cai, Xuehua Zhao, Changfei Tong, Jun Li, and Xin Xu. 2017. Grey wolf optimization evolving kernel extreme learning machine: Application to bankruptcy prediction. Engineering Applications of Artificial Intelligence 63: 54–68. [Google Scholar] [CrossRef]
Wang, Nannan, Dayao Li, Dengfeng Cui, and Xiaolong Ma. 2022. Governance disclosure and corporate sustainable growth: Evidence from China. Frontiers in Environmental Science 10: 1015764. [Google Scholar] [CrossRef]
Wyner, Abraham J., Matthew Olson, Justin Bleich, and David Mease. 2017. Explaining the success of adaboost and random forests as interpolating classifiers. Journal of Machine Learning Research 18: 1–33. [Google Scholar]
Xu, Yuhong, Zhiwen Yu, Wenming Cao, and C. L. Philip Chen. 2023. A Novel Classifier Ensemble Method Based on Subspace Enhancement for High-Dimensional Data Classification. IEEE Transactions on Knowledge and Data Engineering 35: 16–30. [Google Scholar] [CrossRef]
Yotsawat, Wirot, Kanyalag Phodong, Thawatchai Promrat, and Pakaket Wattuya. 2023. Bankruptcy prediction model using cost-sensitive extreme gradient boosting in the context of imbalanced datasets. International Journal of Electrical and Computer Engineering 13: 4683–91. [Google Scholar] [CrossRef]
Yotsawat, Wirot, Pakaket Wattuya, and Anongnart Srivihok. 2021. Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization. International Journal of Electrical and Computer Engineering 11: 5477–87. [Google Scholar] [CrossRef]
Youn, Hyewon, and Zheng Gu. 2010. Predict US Restaurant Firm Failures: The Artificial Neural Network Model versus Logistic Regression Model. Tourism and Hospitality Research 10: 171–87. [Google Scholar] [CrossRef]
Zhang, Guoqiang, Michael Y. Hu, B. Eddy Patuwo, and Daniel C. Indro. 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research 116: 16–32. [Google Scholar] [CrossRef]
Zhang, Wei, Xiaohui Chen, Yueqi Liu, and Qian Xi. 2020. A Distributed Storage and Computation K-Nearest Neighbor Algorithm Based Cloud-Edge Computing for Cyber-Physical-Social Systems. IEEE Access 8: 50118–30. [Google Scholar] [CrossRef]
Zhao, Changhong, Yu Guo, Jiahai Yuan, Mengya Wu, Daiyu Li, Yiou Zhou, and Jiangang Kang. 2018. ESG and corporate financial performance: Empirical evidence from China’s listed power generation companies. Sustainability 10: 2607. [Google Scholar] [CrossRef]
Zizi, Youssef, Amine Jamali-Alaoui, Badreddine El Goumi, Mohamed Oudgou, and Abdeslam El Moudden. 2021. An optimal model of financial distress prediction: A comparative study between neural networks and logistic regression. Risks 9: 200. [Google Scholar] [CrossRef]
Zou, Yao, Changchun Gao, and Han Gao. 2022. Business Failure Prediction Based on a Cost-Sensitive Extreme Gradient Boosting Machine. IEEE Access 10: 42623–39. [Google Scholar] [CrossRef]

Figure 1. Principal component analysis.

Figure 2. Receiver operating characteristic (ROC) curves.

Figure 3. Confusion matrices.

Figure 4. Feature importance graph.

Table 1. Pros and Cons of Machine-Learning Algorithms for Bankruptcy Prediction.

Algorithm	Pros	Cons	Application in Study
Logistic Regression	Simple, interpretable, and effective for linear relationships	May fail with non-linear relationships, prone to overfitting if not regularized	Used for basic binary classification of bankruptcy
K-Nearest Neighbors (KNN)	Simple, effective for non-linear relationships	Sensitive to the local data structure, high computational cost for large datasets	Employed for its flexibility in capturing complex, varied relationships
Decision Trees	Easy to interpret, can handle both numerical and categorical data	Prone to overfitting, can create overly complex trees that do not generalize well	Utilized for their interpretability and effectiveness in categorical data handling
Support Vector Machine (Linear)	Effective in high-dimensional spaces, memory-efficient	Requires feature-scaling, not suitable for larger datasets	Applied for linearly separable data, used primarily for classification tasks
Support Vector Machine (Non-Linear)	Can model non-linear relationships, robust against overfitting in high-dimensional spaces	Computationally intensive, requires careful tuning of parameters	Used to handle datasets with complex decision boundaries
Artificial Neural Networks (ANNs)	Highly flexible, can model complex non-linear relationships	“Black box” nature, computationally expensive, requires large datasets	Deployed for their ability to learn and model non-linear and complex relationships effectively
Random Forest	Handles overfitting well, effective for large datasets, provides feature importance	Can be slow to generate predictions if consisting of many trees	Used for its robustness and efficiency in handling different types of data
Gradient Boosting	Often achieves very high predictive accuracy, very flexible	Can overfit on noisy datasets, sensitive to outliers, computationally expensive	Applied for its strength in sequential correction of predecessors’ errors
XGBoost	Handles large datasets well, provides feature importance, efficient and flexible	Can overfit if not correctly tuned, complex parameter tuning required	Utilized for its efficiency in large datasets and high performance in bankruptcy prediction
AdaBoost	Improves classification accuracy, combines multiple weak learners to form a strong learner	Sensitive to noisy data and outliers, can overfit if the weak learners are too complex	Employed for its ability to adaptively focus on misclassified instances
CatBoost	Excels with categorical data without extensive data preprocessing, reduces overfitting	Less interpretable compared to simpler models, parameter tuning can be complex	Used for its advanced handling of categorical features and robustness against overfitting

Table 2. Descriptive analysis.

Variable	Failed Firm (Mean)	Failed Firm (Std. Deviation)	Non-Failed Firm (Mean)	Non-Failed Firm (Std. Deviation)
ESG	34.532	22.247	53.848	21.425
P1	−1692	11511	−2.061	173
P7	−0.447	1.604	0.102	0.535
P8	−0.190	0.990	0.109	0.419
L2	−1.700	1.946	−1.588	2.783
L3	0.500	0.883	1.209	1.220
AC1	41.16	124.37	88.83	1466.57

ESG = ESG score, P1 = Net profit margin, P7 = Operating return on assets, P8 = Return on capital employed, L2 = Quick ratio, L3 = Cash to current liabilities, AC1 = Inventory turnover ratio.

Table 3. Pooled Within-Groups Correlation.

	ESG	P1	P7	P8	L2	L3	AC1
ESG	1
P1	0.046	1
P7	−0.029	0.079	1
P8	0.021	0.024	−0.628	1
L2	−0.083	−0.031	0.065	0.026	1
L3	−0.108	−0.039	−0.003	−0.009	0.08	1
AC1	−0.058	−0.001	−0.004	−0.006	−0.069	0.014	1

ESG = ESG score, P1 = Net profit margin, P7 = Operating return on assets, P8 = Return on capital employed, L2 = Quick ratio, L3 = Cash to current liabilities, AC1 = Inventory turnover ratio.
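
The pooled within-groups correlations in Table 3 can be illustrated with a short sketch. It assumes the standard discriminant-analysis definition, in which each variable is centered on its own group mean (failed or non-failed) before the cross-products are summed across groups; the source does not state how the table was computed. The function name and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pooled_within_corr(groups, i, j):
    """Pooled within-groups correlation between columns i and j.

    `groups` is a list of groups; each group is a list of numeric rows.
    Deviations are taken from each group's own mean, so differences
    between group means do not affect the result. The usual divisor
    (N - number of groups) cancels in the ratio and is omitted.
    """
    sxy = sxx = syy = 0.0
    for rows in groups:
        n = len(rows)
        mean_i = sum(row[i] for row in rows) / n
        mean_j = sum(row[j] for row in rows) / n
        for row in rows:
            di = row[i] - mean_i
            dj = row[j] - mean_j
            sxy += di * dj
            sxx += di * di
            syy += dj * dj
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: two variables observed in two groups.
failed = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
non_failed = [[10.0, 0.0], [11.0, 1.0], [12.0, 2.0]]

r = pooled_within_corr([failed, non_failed], 0, 1)
print(round(r, 3))
```

In this toy example the two variables move together perfectly inside each group, so the pooled within-groups correlation is 1 even though the group means differ sharply. An ordinary correlation computed on the combined sample would mix the within-group relationship with the between-group difference.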

Table 4. Application of machine-learning models for predicting business failure.

Model	Precision (Failed)	Precision (Non-Failed)	Recall (Failed)	Recall (Non-Failed)	F1-Score (Failed)	F1-Score (Non-Failed)	Overall Accuracy
Logistic Regression	0.92	0.94	0.94	0.92	0.93	0.93	93.39%
K-Nearest Neighbors	0.99	0.99	0.99	0.99	0.99	0.99	99.15%
Decision Trees	0.99	0.98	0.98	0.99	0.99	0.99	98.51%
SVMs (Linear Kernel)	0.93	0.95	0.95	0.92	0.94	0.94	93.60%
SVMs (RBF Kernel)	0.97	0.93	0.93	0.97	0.95	0.95	95.10%
Neural Networks	0.97	0.98	0.98	0.97	0.98	0.98	97.65%
Random Forest	1.00	1.00	1.00	1.00	1.00	1.00	99.57%
Gradient Boosting	0.99	0.99	0.99	0.99	0.99	0.99	98.93%
XGBoost Classifier	0.99	0.99	0.99	0.99	0.99	0.99	99.15%
AdaBoost Classifier	0.98	0.95	0.95	0.98	0.96	0.96	96.38%
CatBoost Classifier	1.00	1.00	1.00	1.00	1.00	1.00	99.57%
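
The per-class precision, recall, and F1-score columns in Table 4, together with overall accuracy, all follow from a model's confusion matrix. The sketch below shows the arithmetic under the usual definitions. The counts are hypothetical and are not the study's data, and the function names are illustrative only.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class from its own counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def report(tp, fn, fp, tn):
    """Metrics for both classes, with 'failed' as the positive class.

    tp: failed firms predicted failed
    fn: failed firms predicted non-failed
    fp: non-failed firms predicted failed
    tn: non-failed firms predicted non-failed
    """
    failed = class_metrics(tp, fp, fn)
    # For the non-failed class the roles swap: tn counts its correct
    # predictions, fn its false positives, and fp its false negatives.
    non_failed = class_metrics(tn, fn, fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"failed": failed, "non_failed": non_failed, "accuracy": accuracy}


# Hypothetical confusion-matrix counts for 100 test firms.
result = report(tp=45, fn=5, fp=3, tn=47)
for label in ("failed", "non_failed"):
    p, r, f1 = result[label]
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={result['accuracy']:.2%}")
```

Because each class has its own false positives and false negatives, precision and recall can differ between the failed and non-failed columns even when overall accuracy is a single number.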

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
