**1. Introduction**

The financial distress of 2007 started in the U.S., quickly escalated into full-blown systemic risk there, and rapidly spread to other developed countries, becoming a serious global economic crisis [1–5]. Because the resultant financial distress had enormous impacts on firms and economies, such as destabilizing stock markets, decimating corporate values, and impeding the circulation of resources, financial distress prediction has attracted considerably greater interest. If the constructed prediction model is trustworthy and reliable, then managers can take corrective action before financial trouble erupts, avoiding further deterioration and supporting the goal of sustainable development, while market participants can adjust their investment strategies to maximize profitability under their anticipated risk exposure [6–9].

From the seminal work by Altman [10], which is grounded in multivariate discriminant analysis (MDA), a large volume of statistical techniques and operations research approaches have been successively applied to both credit risk and financial distress prediction. However, these techniques must satisfy strict statistical assumptions, such as linear separability, multivariate normality, and independence of predictors, which are often violated in real-life applications [11–13]. With the rapid improvement of innovative data-driven technologies, computational intelligence techniques (such as the neural network (NN), decision tree (DT), and support vector machine (SVM)) not only can cope with non-linearity, but can also extract meaningful information from vague, imprecise data and identify implicit trends that are too complicated to be discovered by either users or traditional systems [7,8]. This study adopts computational intelligence because it (1) offers superior generalization capability and (2) is not bound by statistical assumptions.

Compared with the well-established literature on financial distress and credit risk prediction, work on performance forecasting is quite rare, even though it is widely recognized that the critical trigger for financial distress is poor operating performance [14–16]. Kamei [17] likewise indicated that poor operating performance is responsible for almost 99% of financial distress cases. In other words, events such as firm insolvency and defaults on promissory notes do not happen by coincidence. Instead, notable root causes precede corporate financial distress, and it is the inability to deal with such events properly at an early stage that triggers the demise of these corporations [18]. In short, poor operating performance is not only an inevitable stage preceding financial distress, but also undermines the goal of corporate sustainable development, because a corporation with good financial performance normally possesses superior operating efficiency. An increase in a corporation's operating efficiency (through initiatives such as improving energy efficiency, reducing CO2 emissions in production and transportation, reducing water use, eliminating the use of virgin materials, and reusing waste) means that it can reduce operational costs and thereby increase its profitability [19]. For example, Coca-Cola established a monitoring and targeting system in its plants to control and evaluate energy and water use, leading to a 15% reduction in water use alongside a 6% increase in production. The solar park of Volkswagen AG in North America is expected to supply electricity to all of its plants when the manufacturing line is not running. Furthermore, Wanger and Bloom [20] identified the relationship between financial performance and sustainability. They divided their samples into two subgroups based on financial performance.
Their findings showed that financially well-performing corporations have more resources to invest in sustainability, which, in turn, leads to better financial results. Eccles et al. [21] also indicated that corporations with good financial performance outperform those with poor financial performance in terms of sustainable initiatives and accounting rate of return. That is, a corporation with good financial performance is more likely to implement social and environmental sustainability and thereby reach the goal of sustainable development.

How to evaluate and appropriately determine corporate operating performance is an attractive research topic. Past studies mainly focused on analyzing financial ratios such as return on assets (ROA) and return on investment (ROI), but these measures belong to the category of one-input and one-output techniques. However, merely implementing a one-input and one-output measurement to describe the whole facets of a corporation's inherent operations is not reliable. In order to provide an overarching assessment to determine the corporation's inherent operating performance, one can adopt data envelopment analysis (DEA), which handles multiple inputs and multiple outputs simultaneously without a pre-defined production function (e.g., profit maximization or cost minimization) [22,23].
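To make the DEA mechanics concrete, the input-oriented CCR efficiency of one decision-making unit (DMU) can be obtained by solving a small linear program over all DMUs. The sketch below is a minimal illustration, not the exact model used in this study; it assumes `scipy` is available, and the function name `ccr_efficiency` is ours.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o (envelopment form).

    X: (n_dmus, n_inputs) input matrix; Y: (n_dmus, n_outputs) output matrix.
    Decision variables: theta followed by the n_dmus intensity weights lambda.
    """
    n, gi = X.shape
    go = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                      # minimize theta
    # inputs:  sum_j lambda_j * x_ij <= theta * x_io  ->  -theta*x_io + X^T lambda <= 0
    A_in = np.c_[-X[o].reshape(gi, 1), X.T]
    b_in = np.zeros(gi)
    # outputs: sum_j lambda_j * y_rj >= y_ro          ->  -Y^T lambda <= -y_ro
    A_out = np.c_[np.zeros((go, 1)), -Y.T]
    b_out = -Y[o]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]
```

A DMU on the efficient frontier receives a score of 1, while a dominated DMU receives a score below 1.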

In today's big data environment, users gather and disseminate data from many different sources, but too much data without proper handling confuses users and may even push them toward inappropriate decisions; this is often referred to as the curse of dimensionality. DEA has the advantage of handling multiple inputs and multiple outputs simultaneously, but it comes with a weakness: too many variables deteriorate its discriminant ability, so that it cannot tell the difference between superior and inferior operating performance. To overcome this challenge, a novel dimensionality reduction technique, non-linear fuzzy robust principal component analysis (NFRPCA), was applied. The reduced data can then be fed into the restricted Boltzmann machine (RBM) to construct a model for corporate operating performance forecasting.

There is a clear need for precise decision-making support both for investments and for the ongoing monitoring of corporate health. Even a fractional improvement in forecasting accuracy can translate into a tremendous amount of future savings [24]. An ensemble learning architecture is a set of individual classifiers whose predictions are integrated, leading to superior forecasting performance compared with any individual classifier [25]. The fundamental idea of ensemble learning is that the individual models compensate for one another's errors, which is widely recognized as one of the most efficient ways to improve the performance and robustness of a forecasting model. Thus, this study used an ensemble learning strategy.

Managers can view the model as a decision support system (DSS) to assist them in forming better decisions under an anticipated risk level as well as reaching the goal of sustainable development. Investors can take the model as an investment guideline to modify their financing strategy and investment portfolio with the target of profit maximization. Policy makers can consider the potential implication of this research outcome and formulate future policies so as to solidify the stability of financial markets and upgrade a nation's industrial level.

The aim of this study was therefore four-fold as follows:


The rest of this study is organized as follows. Section 2 expresses the literature review. Section 3 briefly describes the implemented methodologies. Section 4 conducts some experiments to examine and compare the performance of the proposed model. Finally, Section 5 offers conclusions and some future work ideas.

### **2. Literature Review**

How to diagnose the nature of financial distress is still an open topic in the domain of corporate finance and accounting. As it is widely acknowledged that bad operating performance is the main trigger for financial distress, performance assessment has therefore become an important issue for the business world and academics for several decades. The three most commonly used criteria to describe corporate operating performance are return on assets (ROA), return on investment (ROI), and return on equity (ROE), but they are solely derived from financial statements that could hide some essential information about true financial troubles through different estimation methods and selective accounting principles. Moreover, these criteria belong to the category of one-input and one-output measurement. Merely utilizing these simple criteria to depict the whole structure of a corporation's operation is not reliable and trustworthy. DEA can deal with this obstacle by simultaneously handling multiple inputs and multiple outputs, and providing a final performance rank for each decision-making unit (DMU). However, the performance rank determined by DEA is affected by the inclusion or exclusion of an input or an output [26,27]—that is, the utilization of different inputs or outputs will lead to different performance ranks. Rather than employ a single DEA specification, this study preferred to go beyond a single performance rank and extended into multiple DEA specifications (i.e., it combined inputs and outputs in several dissimilar ways). By doing so, we achieved two objectives at the same time: (1) the assessment mechanism is useful for examining the robustness of the results, and (2) a bundle of performance ranks yields comprehensive information for classifying the units of observations [28].

Although multiple DEA specifications can provide overarching information for decision makers, they still pose some challenges: too much information confuses decision makers, and too much data fed into DEA deteriorates its discriminant ability. To overcome this challenge, one can utilize NFRPCA (a dimensionality reduction technique). Passing the original data through the NFRPCA procedure not only facilitates the users' decision-making process, but also enhances DEA's discriminant ability. By combining DEA specifications, financial indicators, and a dimensionality reduction technique, we were able to classify corporations into four categories and designate those with superior operating performance and those with inferior operating performance. In other words, we transformed the performance assessment task into a conventional binary classification task. The resulting labels were then fed into the restricted Boltzmann machine (RBM) to construct the model for corporate operating performance forecasting.

### **3. Methodology**

#### *Restricted Boltzmann Machine Ensemble (RBME)*

The restricted Boltzmann machine (RBM), a probabilistic graphical mechanism that can be deemed as a stochastic neural network, has gained considerable attention in the artificial intelligence (AI) community due to its remarkable advantages, such as outstanding representation capability, fast extraction of helpful information from a complex dataset, and quick identification of abstract features [29–32]. Furthermore, with the great improvement and advancement in computing power and the development of efficient learning algorithms (i.e., contrastive divergence), the RBM has demonstrated its effectiveness in numerous forecasting tasks [29].

The RBM has a two-layer architecture with a set of visible units **v** of dimension *G* depicting the observable data, and a set of binary hidden units **h** of dimension *B* that learn to express features capturing higher-order correlations in the observed data. The two layers are linked by a symmetric weight matrix *E* of size *G* × *B*, whereas there are no linkages within the same layer. The conceptual architecture of the two-layer RBM is represented in Figure 1. The RBM was initially introduced with binary visible and hidden units, but it can also be extended to numerous dissimilar sorts of units, such as Gaussian units, binomial units, softmax units, and so forth [30].

**Figure 1.** The conceptual structure of restricted Boltzmann machine (RBM).

The RBM, in reality, can be deemed as a Markov random field that attempts to express input data with hidden units. The weights between two layers are encoded by a statistical relationship. Given the energy function *K*(**v,h**) of the state (**v,h**), the joint distribution over the visible units and hidden units is described in Equation (1):

$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-K(\mathbf{v}, \mathbf{h})),\tag{1}$$

where *Z* denotes the normalization constant. For binary visible and hidden units, *Z* is depicted in Equation (2):

$$Z = \sum\_{\mathbf{v}} \sum\_{\mathbf{h}} \exp(-K(\mathbf{v}, \mathbf{h})).\tag{2}$$
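For a toy binary RBM, the sums in Equations (2) and (4) can be enumerated directly, which is a useful sanity check even though it is intractable at realistic sizes. A minimal sketch (function names are illustrative, not from the study):

```python
import itertools
import numpy as np

def energy(v, h, E, m, n):
    """Energy K(v, h) of a binary RBM, Equation (4)."""
    return -(v @ E @ h) - m @ h - n @ v

def partition_function(E, m, n):
    """Brute-force Z over all binary states, Equation (2)."""
    G, B = E.shape
    total = 0.0
    for vs in itertools.product([0.0, 1.0], repeat=G):
        for hs in itertools.product([0.0, 1.0], repeat=B):
            total += np.exp(-energy(np.array(vs), np.array(hs), E, m, n))
    return total
```

With all-zero parameters every state has energy 0, so *Z* simply equals the number of states, 2^(*G*+*B*).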

For real-valued visible units and binary hidden units, the normalization constant *Z* can be determined by Equation (3):

$$Z = \int\_{\mathbf{v}} \sum\_{\mathbf{h}} \exp(-K(\mathbf{v}, \mathbf{h})) \, \mathrm{d}\mathbf{v}.\tag{3}$$

When the visible units are binary-valued, the energy function is given in Equation (4):

$$K(\mathbf{v}, \mathbf{h}) = -\sum\_{i=1}^{G} \sum\_{j=1}^{B} v\_i E\_{ij} h\_j - \sum\_{j=1}^{B} m\_j h\_j - \sum\_{i=1}^{G} n\_i v\_i, \tag{4}$$

where the biases of the hidden and visible units are depicted as *mj* and *ni*, respectively. If the visible units are real-valued, then the energy function *K*(**v,h**), incorporated with a quadratic term, can be expressed in Equation (5):

$$K(\mathbf{v}, \mathbf{h}) = \frac{1}{2} \sum\_{i=1}^{G} v\_i^2 - \sum\_{i=1}^{G} \sum\_{j=1}^{B} v\_i E\_{ij} h\_j - \sum\_{j=1}^{B} m\_j h\_j - \sum\_{i=1}^{G} n\_i v\_i. \tag{5}$$

According to the abovementioned energy function, the hidden units *hj* are independent of each other when conditioned on **v**, since there are no direct linkages among the hidden units. In the same vein, the visible units *vi* are independent of each other when conditioned on **h**. Conditioned on the visible layer, the units of a binary hidden layer are independent Bernoulli random variables, and the probability that each hidden unit *j* takes the binary state *hj* = 1 is given in Equation (6):

$$p(h\_j = 1 | \mathbf{v}) = \rho\left(\sum\_{i} E\_{ij} v\_i + m\_j\right), \tag{6}$$

where ρ(q) = 1/(1 + exp(−q)) denotes the sigmoid activation function.

When the visible units are binary-valued, the visible units conditioned on the hidden layer are likewise independent Bernoulli random variables, and the probability that each visible unit *i* takes the binary state *vi* = 1 is given in Equation (7):

$$p(v\_i = 1 | \mathbf{h}) = \rho\left(\sum\_j E\_{ij} h\_j + n\_i\right). \tag{7}$$

When the visible units are real-valued, the visible units conditioned on the hidden layer are independent Gaussian random variables, as represented in Equation (8):

$$p(v\_i | \mathbf{h}) = D\left(\sum\_j E\_{ij} h\_j + n\_i,\ 1\right), \tag{8}$$

where *D*(·, 1) denotes the Gaussian distribution with the given mean and unit variance.

The RBM is a generative model, and its parameters can be determined by a stochastic gradient descent algorithm. By summing over all possible hidden vectors, the probability that the network assigns to a visible vector can be obtained:

$$p(\mathbf{v}) = \frac{1}{Z} \sum\_{\mathbf{h}} \exp(-K(\mathbf{v}, \mathbf{h})). \tag{9}$$

The derivative of the log probability of a training vector with respect to a weight is given in Equation (10):

$$\frac{\partial \log p(\mathbf{v})}{\partial E\_{ij}} = \langle v\_i h\_j \rangle\_{\text{data}} - \langle v\_i h\_j \rangle\_{\text{model}}, \tag{10}$$

where the brackets denote expectations under the distribution specified by the subscript. The subscripts data and model in Equation (10) depict the data distribution and the equilibrium distribution defined by the RBM, respectively. Based on this concept, the learning rule for the weights follows directly, as represented in Equation (11):

$$
\Delta E\_{ij} = \lambda \left( \langle v\_i h\_j \rangle\_{\text{data}} - \langle v\_i h\_j \rangle\_{\text{model}} \right), \tag{11}
$$

where λ depicts the learning rate.

Obtaining an exact sample of the model expectation of *vihj*, as required to compute the exact gradient of the log probability of the training data, is very complicated. To overcome this obstacle, an efficient and faster learning rule, called contrastive divergence (CD), was introduced:

$$
\Delta E\_{ij} = \lambda \left( \langle v\_i h\_j \rangle\_{\text{data}} - \langle v\_i h\_j \rangle\_{\text{recon}} \right), \tag{12}
$$

where recon denotes the distribution obtained by running alternating Gibbs sampling for one step, with the Gibbs chain initialized at the data.

According to a similar concept for the learning rule of *Eij*, the biases of hidden and visible units (i.e., *mj* and *ni*) can be updated by the following equations:

$$
\Delta m\_j = \lambda \left( \langle h\_j \rangle\_{\text{data}} - \langle h\_j \rangle\_{\text{recon}} \right) \text{ and} \tag{13}
$$

$$
\Delta n\_i = \lambda \left( \langle v\_i \rangle\_{\text{data}} - \langle v\_i \rangle\_{\text{recon}} \right). \tag{14}
$$
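The conditional probabilities of Equations (6) and (7) and the CD-1 updates of Equations (11)–(14) translate into a short training step. The NumPy sketch below is a simplified, hypothetical implementation for binary units only; class and method names such as `BinaryRBM` and `cd1_update` are ours, not from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(q):
    return 1.0 / (1.0 + np.exp(-q))

class BinaryRBM:
    def __init__(self, G, B):
        self.E = rng.normal(0.0, 0.01, (G, B))  # weights E (G x B)
        self.m = np.zeros(B)                    # hidden biases m_j
        self.n = np.zeros(G)                    # visible biases n_i

    def p_h(self, v):   # Equation (6): p(h_j = 1 | v)
        return sigmoid(v @ self.E + self.m)

    def p_v(self, h):   # Equation (7): p(v_i = 1 | h)
        return sigmoid(h @ self.E.T + self.n)

    def cd1_update(self, v0, lam=0.1):
        """One contrastive-divergence step (CD-1), Equations (12)-(14)."""
        ph0 = self.p_h(v0)                                            # data phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)              # sample hidden
        v1 = (rng.random(self.n.shape) < self.p_v(h0)).astype(float)  # reconstruction
        ph1 = self.p_h(v1)
        self.E += lam * (np.outer(v0, ph0) - np.outer(v1, ph1))       # Eq. (12)
        self.m += lam * (ph0 - ph1)                                   # Eq. (13)
        self.n += lam * (v0 - v1)                                     # Eq. (14)
```

Repeatedly calling `cd1_update` over the training vectors drives the model distribution toward the data distribution, as described above.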

Ensemble learning is an active research field in AI owing to its great potential to enhance the forecasting quality of a singular classifier. Even a fractional improvement in forecasting accuracy can translate into a considerable amount of future monetary savings. Due to the urgent requirement for forecasting quality improvement, this study extended the singular classifier to an ensemble one (i.e., the RBM ensemble: RBME) by implementing bagging and the random subspace method (RSM). By doing so, decision makers can receive more accurate and unbiased forecasting outcomes and protect their personal or investment wealth.
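The combination of bagging (bootstrap resampling of training cases) and the random subspace method (random resampling of features) can be sketched as follows. A simple nearest-centroid learner stands in for the RBM base classifier, so this illustrates only the ensemble strategy; all names and parameters are ours.

```python
import numpy as np

rng = np.random.default_rng(42)

class NearestCentroid:
    """Tiny stand-in base classifier: predict the class of the nearest centroid."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]

def bagging_rsm_predict(X_train, y_train, X_test, n_models=11, feat_frac=0.6):
    """Majority vote over base learners, each trained on a bootstrap sample
    (bagging) restricted to a random feature subset (random subspace method)."""
    G = X_train.shape[1]
    k = max(1, int(feat_frac * G))
    votes = []
    for _ in range(n_models):
        rows = rng.integers(0, len(X_train), len(X_train))  # bootstrap resample
        cols = rng.choice(G, size=k, replace=False)         # random subspace
        clf = NearestCentroid().fit(X_train[rows][:, cols], y_train[rows])
        votes.append(clf.predict(X_test[:, cols]))
    votes = np.array(votes)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

Because each base learner sees a different bootstrap sample and feature subset, their errors are partly decorrelated, and the majority vote tends to be more robust than any single learner.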

### **4. Research Design and Analysis**

#### *4.1. The Data*

The capital-intensive electronics industry in Taiwan has gained considerable attention due to its great impact and influence on the global supply chain. The Taiwan government has also allocated considerable resources and provided numerous financing incentives to upgrade this sector's industrial level. Electronics firms in Taiwan constitute an important capital market for global investors and market participants, as the related listed firms typically account for over 60% of domestic stock market turnover. Therefore, we chose the electronics industry in Taiwan from 2016 to 2018 as our research sample. After removing observations with missing or extreme values, 1200 samples were retained. All the data were collected from public sources, such as the Taiwan Economic Journal Data Bank (TEJ), the Taipei Exchange (TE), and the Taiwan Stock Exchange Corporation (TSEC), from the period of 2015–2017.

#### *4.2. The DEA Specifications and Predictors*

By executing DEA to determine corporate operating performance, informative input and output variables must first be decided. In accordance with prior work by Xu and Wang [31], total assets (TA), total liabilities (TL), and cost of goods sold (COGS) were selected as input variables, and earnings before interest and tax (EBIT) and total sales (TS) as output variables. To examine the representativeness of the chosen variables, we computed Pearson correlations (Table 1); a higher coefficient implies a closer relation between two variables. The results in Table 1 show that all the selected variables are significantly and positively correlated, so none of them needed to be removed.


**Table 1.** The Pearson correlation results.

Note: TA: total assets, TL: total liabilities; COGS: cost of goods sold; EBIT: earnings before interest and tax; TS: total sales. \* denotes *p* < 0.1; \*\* denotes *p* < 0.05; \*\*\* denotes *p* < 0.01.

It is necessary, however, to look beyond a singular DEA score so as to provide an overarching assessment of corporate operating performance. To this end, we implemented a set of DEA specifications (from the three input variables and two output variables, 14 different combinations were generated) that combined the input and output variables in several dissimilar ways. Table 2 shows all DEA specifications.

**Table 2.** The modules of each data envelopment analysis (DEA) specification.


Note: TA: Total assets, TL: total liabilities, COGS: cost of goods sold, EBIT: earnings before interest and tax, TS: total sales.

Corporate operating performance forecasting is highly related to corporate financial distress prediction, so the predictors commonly used in that literature were designated as the condition variables in this study. Table 3 shows the descriptive statistics of the selected predictors.



#### *4.3. The Assessment Criteria*

Assessment criteria have dissimilar influences on financial risk prediction. For example, the false negative (FN) count is the number of positive (abnormal) cases misclassified as normal. Since positive cases are distressed, inefficient, or fraudulent accounts in financial risk prediction, a forecasting mechanism with a high FN rate can lead to tremendous losses for stock market participants and negatively influence the stability of the economy and stock market. Therefore, compared with the false positive (FP) rate, the FN rate should carry higher importance in a forecasting task. One of the most commonly used assessment criteria is the overall accuracy/error rate. However, relying on a single criterion to determine the appropriate forecasting mechanism is not reliable. To yield a more comprehensive measurement, we utilized three further assessment criteria: recall, precision, and F-measure. Their mathematical formulations are depicted as follows, and the confusion matrix is shown in Table 4 [32–35].

Overall accuracy (*OA*): this criterion is the percentage of correctly classified cases:

$$OA = \frac{(TN + TP)}{(TP + FP + FN + TN)}.$$

Precision (*Pre*): this criterion is the fraction of cases predicted as positive that are truly positive:

$$Pre = \frac{TP}{\left(TP + FP\right)}.$$

Recall (*Rec*): this criterion is the fraction of truly positive cases that are correctly predicted:

$$\text{Rec} = \frac{TP}{(TP + FN)}.$$

*F*-measure: this criterion is the harmonic mean of precision and recall, which has been widely implemented in AI fields:

$$F = \frac{2 \times Pre \times Rec}{(Pre + Rec)}.$$
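The four criteria follow directly from the confusion-matrix counts; a small sketch (the function name is ours):

```python
def classification_metrics(tp, fp, fn, tn):
    """Overall accuracy, precision, recall, and F-measure from confusion counts."""
    oa = (tp + tn) / (tp + fp + fn + tn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * pre * rec / (pre + rec)
    return oa, pre, rec, f
```

For example, hypothetical counts TP = 40, FP = 10, FN = 20, TN = 30 give OA = 0.7, Pre = 0.8, Rec ≈ 0.667, and F ≈ 0.727.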


**Table 4.** The confusion matrix.

#### *4.4. The Forecasting Outcome*

To provide a sounder measurement, two types of information were considered: five indicators derived from financial statements (i.e., gross profit ratio, sales growth rate, inventory turnover rate, earnings per share, and net profit margin) and the performance scores obtained from the 14 DEA specifications (see Table 2). To make the results more understandable to non-specialists, a visualization via a dimensionality reduction technique, namely NFRPCA, was conducted. The first component was by far the most essential, accounting for 65.84% of the variability in the data; adding the second component increased this percentage to 85.16% (see Table 5). The first two components therefore provide an adequate representation of the data. According to this structure, the data can be classified into four categories: observations located in quadrant I are designated as having superior operating performance, and observations located in quadrant III are deemed to have inferior operating performance. The resulting labels were fed into the RBME to construct the forecasting model. We conducted five-fold cross-validation (CV) to mitigate over-fitting. To test the usefulness of NFRPCA, the experiments were separated into two scenarios: (1) with principal component analysis (PCA) and (2) with non-linear fuzzy robust principal component analysis (NFRPCA). To ensure the result did not occur by coincidence, a statistical test was performed; Table 6 shows the result. This finding is in accordance with Luukka [36], who indicated that NFRPCA can project data sets into a more feasible form. Furthermore, the introduced model can be represented not only as a singular structure, but can also be extended to an ensemble structure.
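Cumulative-variance figures of this kind can be reproduced in spirit with ordinary PCA via the singular value decomposition; standard PCA is used here only as a stand-in, since NFRPCA itself is a non-linear, fuzzy, robust variant, and the function name is ours.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # center the data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values of centered data
    var = s ** 2                              # proportional to component variances
    return var / var.sum()
```

Summing the first two entries of the returned ratio gives the cumulative share of variance retained by a two-component projection, analogous to the 85.16% reported in Table 5.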

To examine the effectiveness of ensemble strategies, the experiments were divided into two scenarios: (1) with ensemble strategies and (2) without ensemble strategies. Table 7 presents the results. The model with ensemble strategies not only attained superior forecasting quality under all assessment criteria, but can also help decision makers form appropriate judgments [37]. This finding is in line with prior work by Wozniak et al. [38], who indicated that an ensemble structure can achieve enhanced forecasting performance by integrating many singular classifiers and exploiting their individual strengths. The result can also be explained by the famous "no free lunch" theorem introduced by Wolpert [39], which states that no single classifier modelling method is optimal for all forecasting tasks, as each method has its specific competitive edge.


**Table 5.** Non-linear fuzzy robust principal component analysis (NFRPCA) results.


**Table 6.** The forecasting result (PCA vs. NFRPCA).

Note: OA: overall accuracy; Pre: precision; Rec: recall; F: F-measure. \* denotes *p* < 0.1; \*\* denotes *p* < 0.05; \*\*\* denotes *p* < 0.01.


**Table 7.** The forecasting result (With ensemble strategy vs. Without ensemble strategy).

Note: OA: overall accuracy; Pre: precision; Rec: recall; F: F-measure. \* denotes *p* < 0.1; \*\* denotes *p* < 0.05; \*\*\* denotes *p* < 0.01.

To reach a more reliable research finding, this study took the introduced model as a benchmark and compared it with four other AI-based techniques: decision tree (DT), rough set theory (RST), random forest (RF), and extreme learning machine (ELM). However, it is widely recognized that no individual forecasting mechanism performs best under all assessment criteria (see Table 8). Rokach [40] stated that classifier selection can be transformed into a multiple criteria decision analysis (MCDA) task. Before executing the MCDA algorithm, the competitive score of each classifier under each assessment criterion must be decided. The competitive score was computed by a paired t-test for each pair of classifiers at the 5% significance level, which assesses whether the superiority or inferiority of one classifier over another is statistically significant. We applied one of the MCDA algorithms, TOPSIS, to solve this task, because it has a straightforward computational process, easy-to-understand decision logic, and a rational, understandable mathematical formulation. Table 9 shows the competitive score of each classifier, and Figure 2 represents the ranking priority of each classifier. The proposed model reached the 1st rank, meaning that it outperformed the other four models.
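TOPSIS itself is compact enough to sketch: normalize the score matrix, locate the ideal and anti-ideal points, and rank alternatives by relative closeness. The implementation below is a generic TOPSIS with equal weights by default, not the exact configuration of this study; names and defaults are ours.

```python
import numpy as np

def topsis(scores, weights=None, benefit=None):
    """Rank alternatives (rows) over criteria (columns) by relative closeness."""
    X = np.asarray(scores, dtype=float)
    n_alt, n_crit = X.shape
    w = np.full(n_crit, 1.0 / n_crit) if weights is None else np.asarray(weights, float)
    b = np.ones(n_crit, dtype=bool) if benefit is None else np.asarray(benefit, bool)
    V = X / np.linalg.norm(X, axis=0) * w          # weighted vector normalization
    ideal = np.where(b, V.max(axis=0), V.min(axis=0))
    anti = np.where(b, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)      # distance to ideal point
    d_neg = np.linalg.norm(V - anti, axis=1)       # distance to anti-ideal point
    return d_neg / (d_pos + d_neg)                 # closeness in [0, 1]
```

The alternative with the highest closeness value receives the 1st rank.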


**Table 8.** The forecasting result under four different assessment criteria.

Note: ELM: extreme learning machine; RST: rough set theory; RF: random forest; DT: decision tree; OA: overall accuracy; Pre: precision; Rec: recall; F: F-measure.


**Table 9.** The competitive score.

**Figure 2.** The ranking priority of each classifier. (Note: ELM: extreme learning machine; RST: rough set theory; RF: random forest; DT: decision tree.)

#### *4.5. Robustness Test*

Most previous works utilized only a single pre-determined database to reach their final conclusions, which cannot necessarily be fully trusted in today's highly fluctuating business environment. To prevent users from reaching a biased outcome, we tested the introduced model on another two databases: (1) Condition 1: the performance rank is decided by a traditional financial ratio (i.e., ROA); and (2) Condition 2: the performance rank is determined by a singular DEA score. The results are shown in Figures 3 and 4. The proposed model performed best on both databases.

**Figure 3.** The ranking priority of each classifier in Condition 1: the performance rank is decided by a traditional financial ratio.

**Figure 4.** The ranking priority of each classifier in Condition 2: the performance rank is determined by a singular DEA score.

To ensure that the selected performance measure was fairly representative, we considered each corporation's credit rating status. Corporate credit rating status is evaluated by an independent institution that aims to determine how well a corporation can meet its financial obligations, relying on a detailed and comprehensive analysis of all the risk factors of the rated object [41]. Credit rating status is widely taken as a measure of a corporation's risk and creditworthiness [42,43]. The rating status can be divided into 10 ranks, ranging from best to worst (ranks 1 to 4 express low risk; 5 to 6, middle risk; and 7 to 10, high risk). The results under three different performance measures are shown in Figure 5. The corporations with superior performance derived from multiple DEA specifications have better credit rating status. In contrast, most corporations with good ROA performance still have bad credit rating status. That is, the discriminant ability of multiple DEA specifications outperforms that of the other two measures.

**Figure 5.** The discriminant ability of each performance measure.

Corporations should understand how corporate sustainability is established in a specific context, how the concept of sustainable development can be implemented at the business level, and what corporations should do when they intend to reach the goal of sustainability [44]. For example, efficient utilization of the corporation's valuable resources, handling the requirements of all stakeholders appropriately, and maximizing the corporation's long-term profitability should all be covered in the domain of corporate governance [45]. When integrating sustainability with corporate governance, the corporation should propose an avenue that results in the creation of social, environmental, and economic value [46]. It is an urgent and necessary task to establish corporate governance that responds positively to the corporation's social, economic, and environmental risks and opportunities that have the potential to affect its financial outcomes [47,48]. Thus, corporate governance can be viewed as an integral part of sustainability. We divided our research samples into two subgroups based on their corporate governance performance. The corporations with strong corporate governance normally perform better than those with weak corporate governance in terms of financial performance (see Table 10). This finding is in accordance with the work of Eccles et al. [21]; that is, corporations with superior financial performance have higher potential to invest in sustainability.

**Table 10.** The comparison results (Average).

