*Article* **A New DEA Model for Evaluation of Supply Chains: A Case of Selection and Evaluation of Environmental Efficiency of Suppliers**

#### **Evelin Krmac \* and Boban Djordjević**

Faculty of Maritime Studies and Transport, University of Ljubljana, 6320 Portoroz, Slovenia; bbn.djordjevic@gmail.com

**\*** Correspondence: evelin.krmac@fpp.uni-lj.si

Received: 31 March 2019; Accepted: 15 April 2019; Published: 18 April 2019

**Abstract:** Supply Chain Management (SCM) represents an example of a complex multi-stage system. SCM involves and connects different activities, from the customer's order to the received service, all with the aim of satisfying customers. The evaluation of a particular SCM is a complex problem because of its internally linked hierarchical activities and multiple entities. The main contribution of this paper is the introduction of a non-radial DEA (Data Envelopment Analysis) model for the evaluation of different components of SCM, primarily in terms of sustainability. In order to confirm the novelty and benefits of this new model in the field of SCM, a literature review of past applications of DEA-based models and methods is also presented. The non-radial DEA model was applied to the selection and evaluation of the environmental efficiency of suppliers, considering undesirable inputs and outputs, resulting in a better ranking of suppliers. Through a sensitivity analysis based on perturbation of the data used, the behavior, benefits, and weaknesses of the introduced model are presented.

**Keywords:** Supply Chain Management; Data Envelopment Analysis; Non-radial DEA model; Supplier; Efficiency Evaluation; Environment

#### **1. Introduction**

A prerequisite for providing products and services of high quality at the lowest cost is effective supply chain management (SCM) [1].

The efficiency of the supply chain (SC) depends significantly on coordination both across and within firms, because each part can influence the SC. When any of the parts lacks coordination, dramatic effects on the SC can result [2]. Therefore, measuring and monitoring the efficiency of the SC represents one of the most important steps towards its improvement. The DEA method is one of the most often used multi-criteria decision making (MCDM) methods for SC efficiency evaluation, which is why the model used in this paper is based on it.

The DEA method originated from the work of Charnes et al. [3] and was originally applied to the evaluation of the relative efficiency of similar units with multiple inputs and outputs. It is one of the most effective approaches to measuring the efficiency of an SC and its components [4]. After the first application of DEA in the field of SCM, various approaches were presented in the literature [4]. The main reason for modifications of DEA lies in the fact that traditional DEA models cannot be directly employed in SC evaluation because they consider only inputs and outputs; they must be modified to include intermediate products. Moreover, in real applications, the production process can generate undesirable (bad) outputs. A good example of such outputs, pointed out by Mahdiloo et al. [5], is suppliers' carbon emissions. Mahdiloo et al. [5] highlighted that different DEA approaches that consider undesirable outputs, primarily for the evaluation of green or sustainable SCM, have been developed and presented in the literature. However, a DEA model that, besides undesirable outputs, can also evaluate efficiency using undesirable inputs is missing.

Because of the importance and complexity of the SC, as well as the possibility of including undesirable inputs and outputs in the evaluation of different parts of the SC, the aim of this paper is to contribute to the existing literature by introducing a non-radial DEA model for the efficiency evaluation of individual components of SCM or of the SCM as a whole. The main contribution of the paper is thus a new DEA model for the evaluation of SCM that also considers undesirable outputs. The benefit of the introduced model is that it can consider undesirable inputs and outputs simultaneously. With such a model, the evaluation of SCMs or their components in terms of sustainability becomes possible. In order to check and confirm the novelty of the proposed DEA model, a comprehensive literature review of past applications of the DEA method in SCM and particular areas of SCM is presented. The applicability of the introduced model is demonstrated through the selection and evaluation of the environmental efficiency of suppliers using data taken from Mahdiloo et al. [5]. Although the data were reused from an existing study, the aim was to provide an overview of the behavior of the proposed model and to compare it with other models on the same data as Mahdiloo et al. [5]. Because the data were reused from an existing study, testing of the data before applying the DEA model was not performed.

With the aim of presenting the novelty of this paper and better describing the process of introducing our model into SC efficiency evaluation, a sequence of steps, represented in Figure 1, was performed: (1) systematic literature research; (2) selection of the paper with the most appropriate data set; (3) description of the non-radial DEA model itself; (4) application of the non-radial DEA model on the selected data set and comparison of the results; and (5) sensitivity analysis of the proposed model.

The following section describes the methodology of the literature review. Section 3 presents the results of the literature review together with the classification of papers according to particular evaluated areas of SCM. Section 4 presents the basics of the non-radial DEA model and its introduction for the evaluation of particular areas of SCM. Within Section 5, the results of the proposed model and sensitivity analysis of the model are presented. Section 6 discusses the methodology and obtained results. Finally, in Section 7, we offer our conclusions, summarizing the literature review, presenting the model, and suggesting future research.

**Figure 1.** The research process.

#### **2. Previous Research**

With the aim of confirming the novelty of the introduced non-radial DEA model, an overview of papers related to the application of the DEA method to SCM was performed. The only review of the application of DEA in SCM found was that of Soheilirad et al. [4], but it covers only literature published until 2016.

#### *Methodology of Literature Review*

The methodology of the literature review was taken from the papers by Krmac and Djordjević [6] and Djordjević et al. [7]. Accordingly, the literature review was conducted following the fundamental guidelines of the systematic literature review. Since ScienceDirect and Scopus represent the two most important (and largest) scientific databases [6], a review of papers published in peer-reviewed journals, without limitation on the time period of publishing, was performed. However, in order to avoid bias towards the top or most cited journals, a literature review based on meta-analysis was not conducted.

The review of open-access studies focused on titles, abstracts, and keywords for English-written full-text free-available scientific journal papers, and was performed in December 2018.

The search of both databases was performed using keywords such as "Data Envelopment Analysis AND Supply Chain", "Data Envelopment Analysis AND Supply Chain Management", and variations of both search strings using the abbreviations DEA, SC, and SCM. Papers that presented an application of DEA in the SCM field were shortlisted by first reading the title, keywords, and abstract. After this initial reading, 222 papers were extracted. After full-text reading of the extracted papers, 109 were selected and considered relevant. Based on the review, the selected papers were classified into main areas such as the evaluation of the SC, the evaluation and selection of suppliers, and the consideration of the SC and the evaluation of suppliers in terms of sustainability. For each of these categories, the application of DEA in combination with other methods was also presented. However, papers that used the DEA method for analyzing more than one aspect, or in which DEA was combined with another method or methods, were classified as non-categorized. The overall search process is shown in Figure 2.

**Figure 2.** The overall search process.

#### **3. Results of the Literature Review and Classification**

In the literature, numerous papers applying the DEA method in the area of SCM were found. Within this paper, they were classified according to the purpose of the application of DEA and the combination of DEA with other techniques or approaches. Papers that did not fall into any of the defined categories were classified as "non-categorized works".

#### *3.1. Efficiency and Performance Evaluation of SC with the DEA Method*

The DEA method has been extensively employed for the evaluation of SC performance. Early applications of the DEA method used only the initial inputs and final outputs to measure the efficiency of SCs, while intermediate products were ignored. However, the applicability of the DEA method to measuring the efficiency of the entire SC and all its components at all levels was recognized in ref. [8]. The application of DEA for performance and efficiency evaluation of SCs is summarized in Figure 3.

**Figure 3.** DEA used in performance and efficiency evaluation of SCs [1,2,8–50].

Each paper is described through a short statement of its goal, followed by the DEA approach in the first square brackets and the reference in the second. Because the functioning of an SC is to a large degree linked with the selection of the best suppliers, the papers considering methods, models, and approaches for supplier evaluation and selection were separately classified under the label "DEA in supplier evaluation and selection". Over the years, several techniques such as the Analytic Hierarchy Process (AHP), Analytic Network Process (ANP), Linear Programming (LP), Mathematical Programming, Multi-objective Programming, and DEA have been developed to solve this problem efficiently [25]. The papers that considered the evaluation of suppliers using the DEA technique can also be seen in Figure 3.

In order to improve some characteristics of DEA in the evaluation of SCs and their parts, DEA was also used in combination with other methods. Shafiee et al. [34] created a network DEA model for efficiency evaluation with a balanced scorecard (BSC) approach, where the DEMATEL method was employed to obtain a network structure of the four BSC perspectives. Many other combinations for the evaluation of the SC and its different parts were also used in the literature; they are likewise summarized in Figure 3, under the label "Combination of DEA and other techniques".

#### *3.2. Application of DEA in the Evaluation of SC Parts*

Papers that applied the DEA method in order to evaluate parts or components of SCs are presented in Figure 4 under the label "DEA evaluation of SC components".

**Figure 4.** DEA in the evaluation of SC parts, in SC network design, and in the improvement of information sharing among SC stakeholders [51–65].

#### *3.3. Application of DEA in SC Network Design*

One of the areas of the SC where the DEA method was applied is the problem of SC network design or selection of optimal network solutions. The group of papers that considers the application of DEA in the SC network field is presented in Figure 4 under the label "DEA for SC network design".

#### *3.4. Evaluation of Information Sharing in SC with DEA*

With the information technology developed over the previous decade, every firm can now improve its SC strategies and the performance of its SCM through well-organized information sharing. In the literature, there are studies, such as the papers by Chen et al. [64] and Yu et al. [65], that considered the effects of information sharing on the efficiency of SCs (see Figure 4).

#### *3.5. Application of DEA in Sustainable SCM*

Recently, many papers regarding sustainable SCM (SSCM) can be found in the literature. Papers that considered SSCM with DEA are summarized in Figure 5.

**Figure 5.** Application of DEA and combinations of DEA and other methods in SSCM [5,66–96].

SSCM is focused on the improvement of economic, social, and environmental performances at the same time. Therefore, sustainable SCM evaluation has become a significant task for each organization. As one of the methods, DEA was recognized as suitable for the evaluation of sustainable SCM [67].

According to ref. [79], supplier evaluation and selection plays an important role in developing an effective SC. In the approaches developed for supplier selection, the main goal was the reduction of SC costs, while environmental criteria were neglected. Nevertheless, environmental criteria should include the comprehensive carbon footprint in supplier selection approaches, i.e., consideration of the environmental impact of suppliers. Therefore, the authors of ref. [80] proposed an integrated buyer-initiated approach for supplier selection that considers two objectives: cost cutting and environmental efficiency. Other DEA approaches to evaluating the sustainability of suppliers are presented in Figure 5 under the label "DEA in supplier evaluation and selection".

With the aim of improving previously used methods, Chen [78] introduced a structured methodology for supplier selection and evaluation based on supply chain integration architecture. Besides this paper, other papers that combine DEA with additional techniques for the analysis of different areas of SCs in terms of sustainability can be found in the literature. They are presented in Figure 5, grouped under the label "DEA and other methods in the evaluation of SC from a sustainability viewpoint".

#### *3.6. Non-Categorized Works*

In order to provide a hybrid method for supplier selection, Sevkli et al. [97] used the DEAHP method—the DEA method embedded into the AHP methodology—because DEA still lacked a real application case in which its implications could be evaluated.

Risk evaluation models, which also represent examples of tools for supplier selection, spanning chance-constrained programming (CCP), multi-objective programming (MOP), and DEA, were considered by Wu and Olson [98]. Azadeh and Alem [99] presented three types of models for SC risk and supplier selection under certain, uncertain, and probabilistic conditions: DEA, fuzzy DEA, and chance-constrained DEA. From these studies, it can be seen that DEA has been employed in supplier selection. Further, a new approach also based on DEA, called DEA VaR (value-at-risk), was developed by Wu and Olson [100] for the selection of vendors in enterprise risk management. Visani et al. [101] used a DEA approach to approximate supplier total cost of ownership. Boudaghi and Saen [102] presented a novel data envelopment analysis–discriminant analysis (DEA–DA) model to predict the group membership of suppliers in a sustainable SC context.

Based on the developed fuzzy network DEA model, Pournader et al. [103] evaluated the risk resilience of overall SCs and their individual tiers. The DEA method was also used by Azadeh et al. [104] for analyzing the impact of macro-ergonomic factors in a healthcare SC. Further, Amalnick and Saffar [24] conducted an analysis of the impacts of resilience engineering and ergonomic factors in an aerospace SC using DEA.

Saranga and Moser [105] presented a comprehensive performance measurement framework using the classical and two-stage Value Chain DEA models for estimating the performance of purchasing and supply management. For measuring the performance of suppliers and manufacturers in SC operations, Amirteimoori and Khoshandam [106] developed a DEA model within their study. A model for performance assessment of an outsourcer's processes in an SC comprised of several internal and external entities was provided by Pournader et al. [107] based on the Slacks-based Measure incorporated into a Hybrid Network DEA. Since a transportation system can be disrupted within the SC, Azadeh et al. [108] designed and simulated an echelon SC in which the preferred scenario was identified using fuzzy DEA.

The DEA method was also used for comparing different aspects. For example, Bayraktar et al. [109] compared the SCM and information system practices of small and medium-sized enterprises operating in food products and beverages in Turkey and Bulgaria. Analyses of SCs were also conducted by combining the DEA method with other methods. Jalalvand et al. [110] combined DEA and PROMETHEE II as tools to compare SCs at the process level, business stage level, and SC level.

#### **4. The Proposal of a Non-Radial DEA Model in SC**

In this part of the paper, the non-radial DEA model M is introduced. The DEA method is a linear programming-based method popularized by Charnes et al. [3] for efficiency evaluation. In the literature, the DEA method has been applied for the relative efficiency evaluation of a comparable set of entities with multiple inputs and outputs, called decision-making units (DMUs), i.e., units that transform multiple inputs into multiple outputs. Based on the results obtained by the application of the DEA method, DMUs are classified as efficient or inefficient. One of the advantages of the DEA method is that it does not require any prior assumptions about the underlying functional relationships between inputs and outputs [7]. The mathematical formulation of the classical input-oriented Charnes, Cooper, and Rhodes (CCR) DEA model [111] can be written as:

$$\min \theta \\ \text{s.t. } X\lambda \le \theta x\_0, \quad Y\lambda \ge y\_0, \quad \lambda \ge 0,$$

under the assumption that there are *n* DMUs, *s* inputs, and *m* outputs, where *X* and *Y* represent the matrices of inputs and outputs, respectively, and *x*0 and *y*0 the input and output vectors of the evaluated DMU. θ ∈ [0, 1] represents an indicator of technical efficiency and indicates how much the evaluated entity could potentially reduce its input vector while holding its outputs constant.
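The envelopment form above can be solved directly as a linear program. The following is a minimal sketch (not from the paper) of the input-oriented CCR model using Python's `scipy.optimize.linprog`; the four-DMU data set is invented for demonstration only.

```python
# Minimal input-oriented CCR DEA sketch; illustrative data, not from the paper.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """Technical efficiency theta of DMU k.
    X: inputs (s x n), Y: outputs (m x n)."""
    s, n = X.shape
    m = Y.shape[0]
    # Decision vector: [theta, lambda_1 .. lambda_n]; minimize theta
    c = np.zeros(n + 1)
    c[0] = 1.0
    # X @ lam - theta * x_k <= 0
    A_in = np.hstack([-X[:, [k]], X])
    # Y @ lam >= y_k, written as -Y @ lam <= -y_k
    A_out = np.hstack([np.zeros((m, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(s), -Y[:, k]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun

# Invented data: 2 inputs, 1 output, 4 DMUs
X = np.array([[2.0, 4.0, 3.0, 5.0],
              [3.0, 1.0, 2.0, 4.0]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
scores = [ccr_efficiency(X, Y, k) for k in range(4)]
```

For this invented data, the first three DMUs lie on the constant-returns frontier (score 1), while the fourth obtains a score below 1, i.e., it could proportionally reduce its inputs while keeping its output.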

#### *A Brief Description of the Non-Radial DEA Model*

As can be seen, the CCR DEA model, as one of the classical DEA models, is strongly related to, and can be presented through, production theory, in which raw materials and resources are treated as inputs, while products are treated as outputs of the production process [112]. However, in some real applications, the production process may also use undesirable inputs and generate undesirable outputs. A method for treating both undesirable inputs and outputs simultaneously in non-radial DEA models was introduced by Djordjević et al. [7]. One non-radial DEA model was presented by Wu et al. [113] in the field of energy and environmental efficiency. In addition to the advantages of the non-radial DEA model already described, this model was extended in ref. [7] for the evaluation of safety at railway level crossings. Considering the ability of the DEA method in efficiency evaluation and the advantages of the non-radial DEA model, the proposed model M was chosen for application and evaluation in the SC.

The same model could be used in the SC for the evaluation of its different parts/components, such as the selection of a supplier, where inputs can be considered desirable. However, each part of the SC can also involve primarily undesirable factors. Therefore, in ref. [7], in order to allow for the simultaneous reduction of desirable inputs and to obtain an accurate picture of efficiency, the non-radial DEA model of the authors of ref. [113] was improved in the following way:

$$\min \; W\_n \frac{1}{N} \sum\_{n=1}^{N} \theta\_n + W\_l \frac{1}{L} \sum\_{l=1}^{L} \theta\_l + W\_j \frac{1}{J} \sum\_{j=1}^{J} \theta\_j$$

s. t.

$$\sum\_{k=1}^{K} \lambda\_k \mathbf{x}\_{nk} \le \theta\_n \mathbf{x}\_{n0}, \ n = 1 \dots N \tag{1}$$

$$\sum\_{k=1}^{K} \lambda\_k e\_{lk} \le \theta\_l e\_{l0}, \quad l = 1 \dots L \tag{2}$$

$$\sum\_{k=1}^{K} \lambda\_k y\_{mk} \ge y\_{m0}, \quad m = 1 \dots M \tag{3}$$

$$\sum\_{k=1}^{K} \lambda\_k u\_{jk} = \theta\_j u\_{j0}, \; j = 1 \dots J \tag{4}$$

$$\lambda\_k \ge 0, \; k = 1 \dots K \tag{5}$$

(M)

under the assumption that there are *K* DMUs, each of which uses *N* desirable and *L* undesirable inputs to produce *M* desirable and *J* undesirable outputs, denoted for DMU *k* as

$$x\_k = (x\_{1k}, \dots, x\_{Nk}), \; e\_k = (e\_{1k}, \dots, e\_{Lk}), \; y\_k = (y\_{1k}, \dots, y\_{Mk}), \; u\_k = (u\_{1k}, \dots, u\_{Jk}).$$

This non-radial model M could be projected for efficiency evaluation either of SC components or of the whole SC. The non-radial model M proportionally decreases undesirable inputs and undesirable outputs as much as possible for the given level of desirable inputs and desirable outputs. The optimal values of unified efficiency lie in the interval between 0 and 1. An entity with a higher value has better efficiency compared to the others. If an entity has an objective function value equal to 1, it is the best, located on the frontier, and cannot further reduce its undesirable inputs and undesirable outputs.

Such a non-radial model M could therefore be suitable for the efficiency evaluation of SC components or a whole SC in terms of sustainability or its dimensions, because it has relatively strong discriminating power and the capability to expand desirable outputs while simultaneously reducing undesirable outputs. Additionally, unified efficiency is calculated through decision-maker-specified (user-specified) weights assigned to each of the three efficiency scores and depends on the preferences between undesirable input utilization and undesirable outputs.

However, as with any model, there are some risks related to the application of the non-radial model M. First, because the unified efficiency depends on the selected user-specified weights, the results can be subjective; for example, a greater weight for an undesirable output implies a greater reduction of that output. A second risk of model M is the resulting inaccuracy if not all necessary variables (inputs and outputs) are included: the unified efficiency results can be inaccurate if the data set is not comprehensive. The improved non-radial DEA model M was applied in this paper to the evaluation of the environmental efficiency of suppliers.
A detailed description of these and other characteristics of model M can be found in Djordjević et al. [7].
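As an illustration of how model M might be implemented, the following sketch solves the linear program (1)–(5) for one DMU using Python's `scipy.optimize.linprog`, with equal weights of 1/3 per factor group as used later in the paper. The three-supplier data set is invented; the paper itself solved the model with the Excel Solver tool.

```python
# Illustrative sketch of the non-radial model M for one DMU; invented data.
import numpy as np
from scipy.optimize import linprog

def model_m(X, E, Y, U, k, w=(1/3, 1/3, 1/3)):
    """Unified efficiency of DMU k under model M.
    X: desirable inputs (N x K), E: undesirable inputs (L x K),
    Y: desirable outputs (M x K), U: undesirable outputs (J x K)."""
    N, K = X.shape
    L, M, J = E.shape[0], Y.shape[0], U.shape[0]
    nv = N + L + J + K  # variables: theta_n, theta_l, theta_j, lambda
    c = np.zeros(nv)
    c[:N] = w[0] / N             # W_n * (1/N) * sum theta_n
    c[N:N+L] = w[1] / L          # W_l * (1/L) * sum theta_l
    c[N+L:N+L+J] = w[2] / J      # W_j * (1/J) * sum theta_j
    # (1) X lam <= theta_n * x_k
    A_x = np.hstack([-np.diag(X[:, k]), np.zeros((N, L + J)), X])
    # (2) E lam <= theta_l * e_k
    A_e = np.hstack([np.zeros((L, N)), -np.diag(E[:, k]), np.zeros((L, J)), E])
    # (3) Y lam >= y_k, written as -Y lam <= -y_k
    A_y = np.hstack([np.zeros((M, N + L + J)), -Y])
    A_ub = np.vstack([A_x, A_e, A_y])
    b_ub = np.concatenate([np.zeros(N + L), -Y[:, k]])
    # (4) U lam = theta_j * u_k (equality constraint)
    A_eq = np.hstack([np.zeros((J, N + L)), -np.diag(U[:, k]), U])
    b_eq = np.zeros(J)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * nv, method="highs")
    return res.fun

# Invented example: 3 suppliers, one variable of each type
X = np.array([[2.0, 4.0, 3.0]])   # desirable input
E = np.array([[1.0, 2.0, 3.0]])   # undesirable input
Y = np.array([[1.0, 1.0, 1.0]])   # desirable output
U = np.array([[1.0, 1.0, 2.0]])   # undesirable output
scores = [model_m(X, E, Y, U, k) for k in range(3)]
```

For this toy data, the first supplier obtains unified efficiency 1 (on the frontier), while the other two receive lower scores driven mainly by their undesirable input and output levels.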

#### **5. Illustration of Application of the Non-Radial Model M—Numerical Example**

In this part of the paper, the non-radial DEA model M was applied to the selection and evaluation of the environmental efficiency of suppliers, with the aim of demonstrating the applicability of model M within the SC field. Because fresh data were not available, the non-radial model M was applied to the data used in ref. [5], using the Excel Solver tool. The main advantage and purpose of reusing the data is the comparison of the results obtained by different models on the same data. Because the data were reused primarily in order to introduce and present the behavior of model M, testing of these data before applying the model was not performed. The data, inputs, and outputs used in the paper of Mahdiloo et al. [5] are presented in Table 1. In the study by Mahdiloo et al. [5], the number of employees (N1) and energy consumption (L1) were considered as inputs; sales, return on assets (ROA), and environmental R&D investment were considered as desirable outputs; and carbon dioxide (CO2) emission as an undesirable output. For the application of model M, however, energy consumption was treated as an undesirable input.

The basic equation for the evaluation of environmental efficiency of suppliers (EES) of the model M can be written as:

$$\text{EES} = \frac{\text{Desirable outputs}}{\text{Desirable inputs, undesirable inputs, and undesirable outputs}},\tag{6}$$

where the goal function of the model M can be written as:

$$\min \frac{\sum\_{m=1}^{M} \lambda\_k y\_{mk}}{W\_n \frac{1}{N} \sum\_{n=1}^{N} \theta\_n + W\_l \frac{1}{L} \sum\_{l=1}^{L} \theta\_l + W\_j \frac{1}{J} \sum\_{j=1}^{J} \theta\_j} \tag{7}$$

or, more simply as

$$\min \frac{M1 + M2 + M3}{\frac{1}{3}N1 + \frac{1}{3}L1 + \frac{1}{3}I1}.\tag{8}$$
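Read literally, the ratio of Eq. (6) with the equal weights of Eq. (8) can be computed as below. This is only an illustration of the indicator's structure with invented numbers; the variable names mirror Table 1 (N1 = number of employees, L1 = energy consumption, I1 = CO2 emission, M1–M3 = sales, ROA, and environmental R&D investment), and the actual efficiency scores in the paper come from solving the linear program M, not from this ratio alone.

```python
# Toy illustration of the EES ratio (Eqs. 6 and 8); all numbers are invented.
def ees(m1, m2, m3, n1, l1, i1, w=(1/3, 1/3, 1/3)):
    """Ratio of desirable outputs to equally weighted desirable input,
    undesirable input, and undesirable output."""
    return (m1 + m2 + m3) / (w[0] * n1 + w[1] * l1 + w[2] * i1)

# Invented values for one hypothetical supplier
score = ees(m1=120.0, m2=8.0, m3=4.0, n1=50.0, l1=30.0, i1=20.0)
```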


**Table 1.** Dataset taken from Mahdiloo et al. [5] for application of model M.

Comparison of the results of the application of three models, namely Model 2, Model 4, and Model 5, performed by ref. [5], with the results of the use of the introduced non-radial DEA model M on the same data is presented in Table 2.


**Table 2.** Results of the efficiency from models 2, 4, and 5 presented in Mahdiloo et al. [5] and model M.

The unified efficiency of model M was obtained using the same weights, i.e., 1/3, for the desirable input, the undesirable input, and the undesirable output, with the aim of reducing subjective bias. The results of the non-radial model M were obtained using the Excel Solver tool. From Table 2, it can be seen that for Model 2, suppliers 2, 15, 16, 17, and 18 were rated as the most efficient, followed by suppliers 13, 14, 19, and 20. Regarding Model 4, the most efficient suppliers were 2, 15, and 18, while a greater number of suppliers had efficiency close to 1 compared to Model 2. The most efficient suppliers under Model 5 were the same as under Model 2. Comparing the results obtained by the different models, it can be seen that, regarding the environmental efficiency of suppliers, model M gave results similar to those of Model 4. Hence, suppliers rated as efficient by Model 4 were also efficient under model M, and the suppliers identified as inefficient by Model 4 were the same under model M. The main difference between these two models is the efficiency score: the scores obtained by model M for each supplier are lower than those obtained by Model 4. The main reason for this lies in the relatively strong discriminating power of model M.

#### *Validation of Non-Radial DEA Model M*

The sensitivity analysis of the non-radial DEA model M was performed to check its validity. It was conducted on the data from Mahdiloo et al. [5] shown in Table 2, using the Excel Solver tool. The main aim of the validation of model M and, therefore, of the sensitivity analysis was the examination of the model's behavior, so the particular data used have no influence on the sensitivity analysis. The sensitivity analysis was conducted in the same way as in Djordjević et al. [7], and its realization for the non-radial model M is presented in Table 3. The sensitivity analysis was based on perturbing the data by certain percentages, i.e., 2%, 5%, and 10%, until the status of at least one supplier changed [7]. It was carried out through three cases. In Case 1, the desirable and undesirable inputs, as well as the undesirable output, were improved for suppliers with efficiency 1 and worsened for suppliers with efficiency under 1. In Case 2, the perturbation of the data focused on increasing the undesirable input and output for suppliers with efficiency 1 and reducing them for those with lower efficiency, while the desirable input was fixed. The behavior of model M was then checked through Case 3, where the desirable outputs were decreased and the desirable input increased for suppliers with efficiency 1, and vice versa for inefficient suppliers. For each case, after the data were changed, model M was solved using Excel Solver. The results of the three cases of the sensitivity analysis are presented in Table 3.
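The perturbation scheme for, e.g., Case 1 can be sketched as follows. The helper below is an illustration only (the paper performed the perturbations and re-solving with Excel Solver); the data, function name, and score vector are invented.

```python
# Sketch of the Case 1 perturbation scheme; all data are invented.
import numpy as np

def perturb_case1(inputs, bad_outputs, scores, pct):
    """Case 1: improve (reduce) the inputs and undesirable outputs of
    suppliers with efficiency 1, and worsen (increase) them for the rest."""
    factors = np.where(np.isclose(scores, 1.0), 1 - pct, 1 + pct)
    return inputs * factors, bad_outputs * factors

# Invented data: 4 suppliers, one input row and one undesirable-output row
inputs = np.array([[2.0, 4.0, 3.0, 5.0]])
bad = np.array([[1.0, 2.0, 1.5, 2.5]])
scores = np.array([1.0, 0.8, 1.0, 0.6])
for pct in (0.02, 0.05, 0.10):
    new_in, new_bad = perturb_case1(inputs, bad, scores, pct)
    # here model M would be re-solved on (new_in, new_bad) and the
    # efficient/inefficient status of each supplier compared with Table 3
```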


**Table 3.** Results of Sensitivity Analysis of non-radial DEA model M.

<sup>1</sup> Remarks: Show the relationships between the results of the efficiency calculated for each supplier and for each data perturbation through Cases 1, 2, and 3.

For most suppliers with efficiency under 1, the efficiency score improved (see Table 3). However, for some inefficient suppliers, such as suppliers 12 and 13, the efficiency score did not change significantly.

With increments of the data by 5% and 10% in Case 1, suppliers 15 and 18 became efficient, while the efficiency scores of inefficient suppliers changed gradually. In Case 2, a transformation of the results can be noticed with only a 5% decrement of the undesirable input and output for inefficient suppliers, some of which, such as suppliers 14 and 16, became efficient, while some efficient suppliers became inefficient. In comparison with Case 1, the efficiency status of a large number of suppliers changed. This indicates that the results of the non-radial DEA model M probably depend on the undesirable input and output.

For Case 3, the results showed that inefficient suppliers became efficient with a 5% decrement/increment. Through comparison with Case 1, it can be concluded that some efficient suppliers became inefficient with a change of the desirable outputs and input. These results indicate that the suppliers are more sensitive to the data of the undesirable output and input (see Case 2). To obtain further changes from inefficient to efficient or vice versa, it is necessary to apply higher percentages of data perturbation. Meanwhile, it should be pointed out that the efficiency of suppliers such as 2 and 17 remained unchanged regardless of the percentage of data perturbation.

The comparison of the results of the three cases is given in the remarks column of Table 3. It was conducted based on the percentage of data perturbation for each supplier. The results show that under 2% data perturbation the efficiency of a particular supplier was mainly unchanged, i.e., the efficiency scores were the same in all cases. With 5% data perturbation, the efficiency in Case 2 was lower than in Cases 1 and 3, while it was mainly the same in Cases 1 and 3. Finally, with 10% data perturbation, the efficiency was mainly lower in Case 2 compared to Cases 1 and 3.

#### **6. Discussion**

As can be seen from the literature review, many studies have applied the DEA method for efficiency evaluation in SCM. The main contribution of this research is the introduction of a non-radial DEA model for the efficiency evaluation of different components of the SC. The applicability of the introduced non-radial model M was demonstrated through the evaluation and selection of suppliers using the data from ref. [5]. The proposed tool, i.e., the non-radial model M, is relevant for the selection and evaluation of suppliers. However, it can also be a good tool for considering best practices of all components of the SC in terms of sustainability, because the model is able to measure efficiency while considering the undesirable inputs and undesirable outputs which appear in real applications.

Through comparison of the results obtained by the non-radial DEA model M and the models developed by Mahdiloo et al. [5] (see Table 2), it can be seen that model M has the closest efficiency scores to those of Model 2. However, regarding which suppliers are efficient and inefficient, the results of model M are most similar to those of Model 4. Further, the efficiency scores of model M for each supplier were lower in comparison with the other models. The main reason for this is the higher discriminating power of model M.

Based on the results of model M, obtained using the Excel Solver tool, it can be said that this model is more appropriate for efficiency evaluation. First of all, model M simultaneously reduces desirable inputs, undesirable inputs and undesirable outputs; hence, model M can minimize desirable inputs as well. However, applying the non-radial model M without the efficiency score θ*<sup>n</sup>* and weight *Wn* for desirable inputs can give an unrealistic picture of efficiency. With model M, the environmental evaluation and selection of suppliers and other components of the SC with regard to sustainability is more precise, providing better relative efficiency. Further, through the selection of the set of preference weights, the desired degree of adjustment of the input and output levels can be achieved. For example, the weight selected for an undesirable output will affect the reduction of that output. Consequently, based on their preferences and the goal of the evaluation, decision makers should select the weights carefully, because the selection of weights can influence the results of model M. In this paper, all weights were set to 1/3.
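To make the role of the preference weights concrete, the following is a minimal sketch of a *weighted non-radial (Russell-type), input-oriented* DEA program solved as a linear program. It is an illustration of the general technique, not the exact model M of the paper (model M additionally separates desirable and undesirable factors); the supplier data below are hypothetical.

```python
# Weighted non-radial input-oriented DEA under constant returns to scale:
#   min  sum_i w_i * theta_i
#   s.t. sum_j lambda_j * x_ij <= theta_i * x_io   (each input i)
#        sum_j lambda_j * y_rj >= y_ro             (each output r)
#        0 <= theta_i <= 1, lambda_j >= 0
import numpy as np
from scipy.optimize import linprog

def non_radial_efficiency(X, Y, o, w=None):
    """X: (m inputs x n DMUs), Y: (s outputs x n DMUs), o: DMU index.
    Returns the weighted average of per-input contraction factors
    theta_i (1.0 = efficient)."""
    m, n = X.shape
    s = Y.shape[0]
    w = np.full(m, 1.0 / m) if w is None else np.asarray(w, float)
    # Decision vector z = [theta_1..theta_m, lambda_1..lambda_n]
    c = np.concatenate([w, np.zeros(n)])
    A_ub, b_ub = [], []
    for i in range(m):            # sum_j lam_j x_ij - theta_i x_io <= 0
        row = np.zeros(m + n)
        row[i] = -X[i, o]
        row[m:] = X[i]
        A_ub.append(row); b_ub.append(0.0)
    for r in range(s):            # -sum_j lam_j y_rj <= -y_ro
        row = np.zeros(m + n)
        row[m:] = -Y[r]
        A_ub.append(row); b_ub.append(-Y[r, o])
    bounds = [(0, 1)] * m + [(0, None)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return float(res.fun) / w.sum()

# Two hypothetical suppliers, one input (e.g. energy), one output (e.g. sales):
X = np.array([[1.0, 2.0]])
Y = np.array([[1.0, 1.0]])
print(non_radial_efficiency(X, Y, 0))  # supplier 0 is non-dominated -> 1.0
print(non_radial_efficiency(X, Y, 1))  # supplier 1 needs twice the input -> 0.5
```

Changing the weight vector `w` changes which input's contraction dominates the objective, which is the mechanism the paragraph above describes: decision makers steer the reduction effort by choosing the weights.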

In our case, the environmental efficiency of suppliers was evaluated based on the data taken from Mahdiloo et al. [5]. The suppliers with the greatest relative efficiency were 2, 15 and 18 (see Table 2). Consequently, based on model M and the results, it can be said that, for the given levels of Sales, ROA and Environmental R&D investment, these suppliers have the lowest Energy consumption, CO2 emissions and Number of employees in comparison with the other suppliers.

Consideration of the results of the sensitivity analysis shows that model M is valid. Nevertheless, model M is sensitive even to small data transformations, which reduces the stability of the model. The reason lies in the fact that model M has greater discriminatory power. At the same time, the efficiency score of some suppliers remained unchanged, which can be linked to the fact that model M minimizes inputs for a given level of outputs; the scores of these suppliers were unchanged despite the data transformation because they have a lower level of desirable outputs in comparison with the other suppliers. It can therefore be concluded that for a higher percentage of data transformation, model M shows larger changes in the efficiency scores. In the case of inaccurate data, the application of model M can present an unrealistic picture of the best suppliers. The comparison given in the Remarks column of Table 3 confirms these observations. The comparison was conducted based on the percentage of data perturbation for each supplier. The results show that under 2% data perturbation, the efficiency of a particular supplier was mostly unchanged, i.e., the efficiency scores were the same in all cases. At higher percentages of data perturbation, the efficiency in Case 2 was lower than in Cases 1 and 3. The obtained results therefore confirm that the behavior and results of model M can be affected by the accuracy of the data and the selection of inputs and outputs.

Nevertheless, the application of model M with accurate data shows that the model can be a good tool for the evaluation of suppliers and other parts of SCs in terms of sustainability. Further details of the weaknesses of model M that can appear during supplier evaluation and selection can be found in Djordjević et al. [7].

#### **7. Conclusions**

The literature review presented various DEA models for evaluation within SCM. However, only a few of them considered undesirable inputs, which are an inseparable part of real production processes and applications, while consideration of undesirable outputs within DEA models is still missing. Therefore, in order to extend the existing literature, a non-radial DEA model that simultaneously considers undesirable inputs and outputs was proposed. The introduction and presentation of this new DEA model for the evaluation of different components of SCs is the main contribution of this paper. The applicability of the proposed model was presented through the evaluation of the environmental efficiency of suppliers. In order to confirm the novelty of the introduced non-radial model M, a comprehensive literature review of the application of the DEA method in SCM was performed. Numerous papers have applied a wide variety of DEA approaches in the field of SCM and its components. These papers were categorized according to the purpose of the application of the DEA method. The application of the DEA method in combination with other methods in the field of SCM was presented as a separate category. Finally, papers that used DEA as a part of a developed framework or method, as well as for analyzing two or more aspects of the SC, were grouped as non-categorized works. Many papers addressed the evaluation of SC performance and supplier selection in terms of sustainability, so different modifications of the DEA method in SCM are available in the literature. Besides modifications of the DEA model, there are also papers that only considered some inputs and outputs as undesirable factors.

However, it can be concluded that few papers have considered undesirable factors within SCs as they appear in real applications. The papers that did include undesirable factors in the evaluation of the SC, or of parts of it, focused only on undesirable outputs. Therefore, in this paper we introduced a DEA model M that, besides undesirable outputs, can also consider and evaluate undesirable inputs. The proposed non-radial DEA model M for the environmental evaluation of suppliers and other components of the SC is, in addition, able to include desirable inputs in the goal function, all with appropriate weights.

With the introduced non-radial model M, a better picture in terms of efficiency scores can be achieved. The application of model M was presented based on the data taken from Mahdiloo et al. [5]. The results of model M, obtained using the Excel Solver tool, and the results obtained by the models applied in Mahdiloo et al. [5] are presented in Table 2. Because the data were reused primarily in order to introduce and present the behavior of model M, these data were not tested before applying model M. From Table 2, it can be seen that for Model 2, suppliers 2, 15, 16, 17 and 18 were rated as most efficient. For Model 4, the most efficient suppliers were 2, 15 and 18, while the most efficient suppliers within Model 5 were the same as within Model 2. Comparing the results of the different models, it can be seen that model M yielded similar results for the environmental efficiency of suppliers as Model 4; the picture regarding the inefficiency of suppliers is the same. The main difference between model M and the other models is in the efficiency score: in the case of model M, it is lower than in the case of the other models. Considering these results, it can be concluded that model M provides more precise results because of its higher discriminatory power.

In order to check the behavior of model M, a sensitivity analysis was performed with the Excel Solver tool, using the same data as in Mahdiloo et al. [5] and three cases with given percentages of data perturbation (2%, 5% and 10%). The results of the sensitivity analysis are presented in Table 3. In Case 1, both desirable and undesirable inputs, as well as undesirable outputs, were improved for suppliers with efficiency 1 and worsened for suppliers with efficiency under 1. In Case 2, the perturbation of the data was focused on the increment of undesirable inputs and outputs for suppliers with efficiency 1 and on their reduction for those with lower efficiency, while the desirable input was fixed. The behavior of model M was then checked in Case 3, where desirable outputs were decreased and the desirable input increased for suppliers with efficiency 1, and vice versa for inefficient suppliers. Based on the results obtained through the sensitivity analysis, it can be concluded that for most suppliers with efficiency under 1, the efficiency score improved. However, for some inefficient suppliers, for example suppliers 12 and 13, the efficiency score did not change significantly. Comparing the efficiency of particular suppliers across the different perturbations (Table 3), it can be seen that under 2% data perturbation the efficiency was mostly the same in all cases. However, with 5% data perturbation, the efficiency of each supplier was lower in Case 2 than in Cases 1 and 3, while the efficiency was mostly the same in Cases 1 and 3. With 10% data perturbation, the efficiency of the suppliers was likewise mostly lower in Case 2 than in Cases 1 and 3.
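The perturbation scheme underlying the three cases can be sketched as follows. This is a simplified illustration with hypothetical data, not the paper's worksheet: efficient units (score 1) have their data shifted one way by a fixed percentage, inefficient units the other way.

```python
# Sketch of the percentage-based data perturbation used in the
# sensitivity analysis (2%, 5%, 10%). Data are hypothetical.
import numpy as np

def perturb(values, scores, pct, increase_efficient=True):
    """Shift each unit's value by pct percent: efficient units
    (score >= 1) up and inefficient units down, or vice versa."""
    values = np.asarray(values, float)
    eff = np.asarray(scores) >= 1.0
    if increase_efficient:
        factor = np.where(eff, 1 + pct / 100.0, 1 - pct / 100.0)
    else:
        factor = np.where(eff, 1 - pct / 100.0, 1 + pct / 100.0)
    return values * factor

undesirable_input = [100.0, 120.0, 90.0]   # e.g. energy consumption
scores = [1.0, 0.8, 1.0]                   # hypothetical efficiency scores
for pct in (2, 5, 10):                     # the three perturbation levels
    print(pct, perturb(undesirable_input, scores, pct))
```

Re-solving the DEA program on each perturbed data set and tabulating the score changes per case reproduces the kind of comparison reported in Table 3.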

Model M was taken from Wu et al. [113] and then improved and applied in Djordjević et al. [7]. The overall observation was that the efficiency obtained by the non-radial DEA model M from ref. [7] differed from that of the model developed in ref. [113], and that model M gives better efficiency because of the inclusion of the efficiency score θ*<sup>n</sup>* with weight *Wn* in the model. The main reason for the different results in comparison with refs. [113] and [5] lies in the fact that model M can simultaneously reduce desirable inputs. The application of model M for the evaluation of SCs can give a better picture of relative efficiency because the model is more representative and strict. The proposed model M can, therefore, be a good tool for the efficiency evaluation of SCs and the identification of best practices.

Specifically, model M is able to measure the efficiency of individual components of an SC, such as supplier selection, and to compare the efficiency of SCs at different (micro and macro) levels over time.

Further, one of the major advantages of the proposed model M is the weights that can be assigned to desirable and undesirable inputs and outputs. Through the application of particular weights for inputs and outputs, the level of desirability can be determined, which influences the reduction or improvement effects on inputs or outputs. One of the important steps in applying the non-radial model M should therefore be the careful selection of weights, relying on the preferences of the decision makers and the aim of the evaluation, since the weights can influence the results of the non-radial DEA model M. Considering the results of the sensitivity analysis of model M presented in Table 3, it can be concluded that the model is valid. However, the results of the sensitivity analysis also illustrate reduced stability even under small data transformations.

Bearing in mind the overview of the literature related to the application of the DEA approach in SCM and the introduced non-radial DEA model M, future work can be outlined. For instance, model M may be applied for the evaluation of components of the SC, such as supplier selection, using experimental or real data. Further, the non-radial DEA model can also be applied to specific companies within the EU countries or the US. Finding best practices among companies, as well as comparisons between companies or countries, can also be realized with model M. In the future, model M can be extended to the evaluation of the whole SC through the inclusion of intermediate variables. Besides environmental efficiency, model M can be applied for measuring other components or the whole SC from the perspective of the other dimensions of sustainability, such as the economic and social dimensions. Moreover, the combination of DEA with other economic measures, such as ROE (Return on Equity) and ROA (Return on Assets), for the purposes of evaluation in SCM in terms of different views of sustainability can be one of the future tasks.

**Author Contributions:** Both authors contributed equally to this work and have read and approved the final manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Modelling Construction Site Cost Index Based on Neural Network Ensembles**

#### **Michał Juszczyk \* and Agnieszka Leśniak**

Cracow University of Technology, Faculty of Civil Engineering, Warszawska 24, 31-155 Cracow, Poland; alesniak@L3.pk.edu.pl

**\*** Correspondence: mjuszczyk@L3.pk.edu.pl

Received: 11 February 2019; Accepted: 18 March 2019; Published: 20 March 2019

**Abstract:** Construction site overhead costs are key components of cost estimation in construction projects. The estimates are expected to be accurate, but there is a growing demand to shorten the time necessary to deliver cost estimates. The balancing (symmetry) between time of calculation and satisfaction of reliable estimation was the reason for developing a new model for cost estimation in construction. This paper reports some results from the authors' broad research on modelling processes in engineering related to the estimation of construction costs using artificial intelligence tools. The aim of this work was to develop a model capable of predicting a construction site cost index that would benefit from combining several artificial neural networks into an ensemble. Combining selected neural networks into ensemble-based models balanced their strengths and weaknesses. With the use of data including training patterns collected on the basis of studies of completed construction projects, the authors investigated various types of neural networks in order to select the members of the ensemble. Finally, three models that were assessed in terms of performance and prediction quality were proposed. The results revealed that the developed models based on ensemble averaging and stacked generalisation met the expectations of knowledge generalisation and accuracy of prediction of the site overhead cost index. The proposed models offer predictions of cost in an accepted error range and prove to deliver better predictions than those based on single neural networks. The developed tools can be used in the decision-making process regarding construction cost estimation.

**Keywords:** cost decision making; construction site overhead costs; neural network ensembles; ensemble averaging; stacked generalisation; cost estimation; construction cost management

#### **1. Introduction**

The success of a construction project is determined by obtaining three fundamental goals of a project—completion within the budget, completion within planned time, and achieving the expected quality of construction works. For the budget issue, cost estimation is a key process. On the one hand, the estimates are expected to be accurate; on the other hand, there is a growing demand to shorten the time necessary to deliver cost estimates. These needs justify attempts to employ various tools in fast cost analyses and modelling. The aim of this paper is to present the results of the research on artificial neural network (ANN) ensembles as artificial intelligence tools for fast analysis and prediction of site overhead costs. This research is a continuation and extension of previous studies, including prediction of these costs with the use of multilayer perceptron neural networks [1]. It is worth mentioning that mathematical tools—which are constantly being developed—are present in the investigations of a broad variety of problems in the field of construction management and technology. Some interesting examples are applications of fuzzy sets theory and fuzzy logic in construction project risk [2–4], the evaluation of a construction safety management system [5], processes in a construction enterprise [6], the investigation of flow-shop scheduling problems [7], and using multiple criteria decision-making methods for supporting the decision process in construction and building technology [8–10]. There have also been a number of attempts to apply artificial neural networks in the management of construction projects—predicting the completion period of building contracts [11], analysing efficiency and productivity in construction projects [12,13], predicting the maintenance cost of construction equipment [14], supporting bidding decisions [15,16], and facilitating decision making [17–19]. Comprehensive discussion on innovative solutions in the construction industry can be found in Reference [20].

The solutions and models that support cost estimates in construction are explored in many scientific publications. The authors propose a variety of methods, for instance multivariate regression [21], analysis of the selected cost-effectiveness factors [22], a case-based reasoning method [23], fuzzy logic [24], and genetic algorithms [25]. In terms of ANNs, there have been attempts to apply these tools in the field of construction cost management. Some examples are forecasting costs of motorways in different aspects [26], predicting cost deviations in reconstruction, alteration, and rebuilding projects [27], estimating the costs of construction projects [28,29], cost estimates of residential buildings [30], prediction of overhead costs [31,32], cost estimates of buildings' floor structural frames as a higher level of aggregation elements of building information model [33], construction cost of sports fields [34], and shovel capital cost estimation [35].

According to the research presented in Reference [36], the influence of an improper calculation of the overhead costs can create a significant negative financial situation for the contracting company. Generally, the building contractor's overhead costs are divided into two categories—site (project) overhead costs and company's (general) overhead costs [37]. The site (project) overhead costs include items that can be identified with a particular job, but not materials, labour, or production equipment. The company's overhead costs are items that represent the cost of doing business and are often considered fixed expenses that must be paid by the contractor. On the other hand, an overhead cost of a construction project can be defined as a cost that cannot be identified with or charged to a construction project or to a unit of construction production [38]. A new classification of construction companies into competitiveness classes according to the relative value of overhead costs was proposed in Reference [39]. As far as accuracy is concerned, it is more advantageous to calculate both components separately—as is done in Great Britain [40], the US, and Canada [41]. The unstable construction market makes it difficult for construction companies to decide on the optimum level of overhead costs [42].

A number of empirical studies relate to the determination of the project overhead costs. In Reference [43], it is indicated that the method of work is a critical factor affecting the amount spent on project overheads. In Reference [44], the authors pointed out that the location of the site could affect a number of project overhead items. In References [31,45,46], research carried out in different countries allowed for the identification of different factors that should be taken into account in site (project) overhead costs.

Studies on construction project overheads and factors that influence their estimates report that it is difficult to determine unambiguously which of the cost components are of the highest importance. Most attention is paid to a detailed calculation of site overheads; however, it is a time-consuming task to take into account all of the possible components of site overhead costs [36].

The aim of the authors' work was to develop a regression model based on ANN ensembles, capable of predicting the site overhead cost index and, thus, able to support the estimation of site overhead costs in construction projects. An additional research objective was to explore the capabilities of ANN ensembles for this problem. In the application of ANNs, a very common approach is to select one network to be the core of a developed model. The selection is preceded by the training and performance assessment of numerous networks—compare, e.g., Reference [47]. As an alternative, the employment of a combination of networks, i.e., an ensemble of ANNs, offers significant capabilities. Despite their advantages, ANN ensembles are rarely reported in research papers for the prediction of broadly understood construction costs.

Site overhead costs can be estimated with the use of preliminaries (compare References [40,41])—such a method is accurate but time-consuming, as all of the cost items must be assessed separately. On the other hand, index methods (compare Reference [36]) allow for quick estimation of site overhead costs; however, the accuracy depends on the assumption of the index. The novelty of the approach proposed in this paper relies on the use of knowledge and information from completed construction projects to train several neural networks, combine them into an ensemble, and assess the site overhead costs on the basis of the predictions produced by the ensemble of neural networks.

The paper content includes an introduction and review of the literature in the above section. Section 2 presents the theoretical background of the problem, and a discussion of the site overhead cost index prediction as a regression problem is presented in Section 3. In Section 4, the authors propose a methodology for the implementation of an ensemble of neural networks (with the use of ensemble averaging and stacked generalisation approaches) for prediction of site overhead cost index, present the results of the studies, and discuss the results. Section 5 includes a summary and conclusions.

#### **2. Background of The Problem, Methods, and Main Assumptions**

The development of the proposed model comes down to solving a regression problem and approximation of the true regression function, which is the relationship between the site overhead costs index (as the dependent variable of the model) and a set of selected predictors (as independent variables of the model). The theory of ANNs is widely presented in the literature—for instance, References [47–49]. ANNs, as mathematical tools applied in regression problems, offer an approximation of the true regression function *g*(*xj*) of multiple variables *xj* where *j* = 1, . . . ,n:

$$g(\mathbf{x}\_j) = f(\mathbf{x}\_j) + \varepsilon,\tag{1}$$

In Equation (1), function *f*(*xj*), as an approximation of *g*(*xj*), is assumed to be implemented implicitly by a trained single ANN, selected from a number of trained candidate networks, where *ε* denotes an error of approximation. There are two disadvantages of an approach based on the selection of a single ANN and discarding the rest of the candidate networks [47,48]. First, the effort required for the training and assessment of the many candidate networks is wasted. Second, the generalisation performance of the chosen network is biased with respect to some part of the input space, due to the selection of the learning, testing, and validating subsets from the overall number of patterns available for the training process, the structure of the network, its parameters, and the conditions of training process initialisation. An alternative approach is to combine a number of different ANNs that share a common input *xj* and form an ensemble (the ANNs may differ in their structures, parameters, and way of training, and the ensemble may even include different kinds of networks). In this paper, the authors consider two alternative approaches that are based on ensembles of neural networks—the first approach is termed *ensemble averaging*, and the second one *stacked generalisation*—compare, e.g., References [47,48]. In the next three subsections, the authors systematically present the background of the research and the main assumptions of the model development process.

#### *2.1. Ensemble Averaging*

The main assumption of the ensemble averaging approach is that the approximation of *g*(*xj*) is done with the use of a linear combination of *K* trained ANNs. The formal notation is given by Equation (2):

$$g(\mathbf{x}\_j) = \frac{1}{K} \sum\_{i=1}^{K} f\_i(\mathbf{x}\_j) + \varepsilon\_i, \tag{2}$$

where *fi*(*xj*) stands for the approximation and *ε<sup>i</sup>* denotes the error of approximation by the *i*-th neural network for *i* = 1, ... ,*K*. Such a mechanism, in which the individual outputs of the ANNs are combined to produce an overall output without involving the input signals, belongs to the class of static structures (compare Reference [48]). The following assumptions can be made [47]—the sum-of-squares error for *fi*(*xj*) can be given as:

$$E\_i^{sos} = \sum \left( \left\{ g(\mathbf{x}\_j) - f\_i(\mathbf{x}\_j) \right\}^2 \right), \tag{3}$$

where *Ei sos* corresponds to an integration over *xj*, weighted by unconditional density *p*(*xj*):

$$E\_i^{sos} \equiv \int \dots \int \varepsilon\_i^2(\mathbf{x}\_j)\, p(\mathbf{x}\_j)\, d\mathbf{x}\_1 \dots d\mathbf{x}\_j \dots d\mathbf{x}\_n, \tag{4}$$

The average error by the networks acting individually can be written as

$$E\_{av}^{\rm sos} = \frac{1}{K} \sum\_{i=1}^{K} E\_i^{\rm sos}. \tag{5}$$

Supposing that the output of the ensemble of networks is the average of outputs of *K* networks that belong to the ensemble, we have the prediction of the ensemble *fens*(*xj*):

$$f\_{ens}(\mathbf{x}\_j) = \frac{1}{K} \sum\_{i=1}^{K} f\_i(\mathbf{x}\_j). \tag{6}$$

Under the assumption that *εi*(*xj*) are uncorrelated and have zero mean, the relation of the ensemble error to the average error of the networks working separately is:

$$E\_{ens}^{sos} = \frac{1}{K^2} \sum\_{i=1}^{K} E\_i^{sos} = \frac{1}{K} E\_{av}^{sos}.\tag{7}$$

In practice, *εi*(*xj*) are highly correlated and the reduction of the error is much smaller. Typically, some useful reduction of the error is obtained, as the ensemble averaging cannot produce an increase in the expected error:

$$E\_{ens}^{sos} \le E\_{av}^{sos}.\tag{8}$$

The expectation is that differently trained networks converge to different local minima on the error surface, and the overall performance is improved by combining the outputs in some way [47]. The employment of neural networks ensembles may lead to satisfactory results, especially when the number of training patterns is relatively low or the training data is noisy [47,50].
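The error relations in Equations (5)–(8) can be illustrated numerically. The sketch below is not the paper's model; the "networks" are simply noisy predictors on synthetic data, which is enough to show that the averaged ensemble's squared error never exceeds the average individual error.

```python
# Numerical illustration of Eqs. (5)-(8) on a synthetic regression task.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
g = np.sin(2 * np.pi * x)            # true regression function g(x)

K = 5
# f_i(x): K "networks" = true function plus independent noise
preds = [g + rng.normal(0, 0.3, x.size) for _ in range(K)]

E_i = [np.mean((g - f) ** 2) for f in preds]   # per-network SOS error
E_av = np.mean(E_i)                            # Eq. (5)
f_ens = np.mean(preds, axis=0)                 # Eq. (6): averaged output
E_ens = np.mean((g - f_ens) ** 2)              # ensemble error

# Eq. (8) always holds (Jensen's inequality); with uncorrelated,
# zero-mean errors E_ens is close to E_av / K, as in Eq. (7).
print(E_ens <= E_av)  # True
```

With correlated errors the improvement shrinks, as noted above, but the inequality in Equation (8) still holds.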

#### *2.2. Stacked Generalisation*

The stacked generalisation approach (compare Reference [47]) is based on combining several trained networks together into a two-level model. The general expectation of such an approach is to improve the generalisation capabilities of the networks acting in isolation. The two-step procedure includes training a set of *K* level-0 networks, whose outputs are then used to train a level-1 network. One can say that the level-0 networks form an ensemble, and the level-1 network acts as a combiner of the outputs of the networks belonging to the ensemble. The general idea of the approach is presented in Figure 1.

A stacked generalisation-based model combines the outputs of the level-0 networks trained with the *xj* inputs; the outputs of the level-0 networks can be written as *y*ˆ*<sup>i</sup>* = *fi*(*xj*), with the level-1 network used to give the final output. Formally, the model can be given as

$$g(\mathbf{x}\_j) = h(f\_i(\mathbf{x}\_j)) + \varepsilon\_{sg}.\tag{9}$$

Consequently, prediction on new data is also a two-step procedure. Predictions are made by presenting new input data to the level-0 networks and computing their outputs, which are then presented to the level-1 network, which computes the final output. The general suggestion for the stacked generalisation approach is that the ensemble of level-0 networks should consist of various networks that differ from each other, whilst the level-1 network should have a relatively simple structure [47].

**Figure 1.** General idea of stacked generalisation approach.
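The two-level scheme can be sketched numerically as follows. This is a deliberately simplified, hypothetical illustration: the level-0 "networks" are replaced by two polynomial regressors and the level-1 combiner is a linear least-squares fit, whereas the paper uses neural networks at both levels, and in proper stacked generalisation the level-1 model is trained on out-of-sample level-0 outputs.

```python
# Minimal sketch of two-level stacking: level-0 models produce
# y_hat_i = f_i(x); a simple level-1 combiner h(.) gives y_hat_sg.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = 1.5 * x ** 2 + rng.normal(0, 0.05, x.size)   # noisy target

# Level-0 models: a linear and a cubic polynomial fit
f0 = np.poly1d(np.polyfit(x, y, 1))
f1 = np.poly1d(np.polyfit(x, y, 3))
level0 = np.column_stack([f0(x), f1(x)])         # level-0 outputs

# Level-1 combiner: linear map fitted by least squares (cf. Eq. (13))
coef, *_ = np.linalg.lstsq(level0, y, rcond=None)
y_sg = level0 @ coef                             # final stacked prediction

# On the training data the stack is at least as good as either member,
# since each member lies in the combiner's search space.
print(np.mean((y - y_sg) ** 2) <= np.mean((y - f0(x)) ** 2))  # True
```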

#### **3. Construction Site Overhead Cost Index Prediction as a Regression Analysis Problem—Assumptions for Ensemble Averaging and Stacked Generalisation**

The prediction of site overhead cost index by the neural networks ensemble and ensemble averaging approach can be formally given with the following Equations (10) and (11):

$$y = \frac{1}{K} \sum\_{i=1}^{K} f\_i(\mathbf{x}\_j) + \varepsilon\_i, \tag{10}$$

$$\hat{y}\_{ens} = \frac{1}{K} \sum\_{i=1}^{K} f\_i(\mathbf{x}\_j), \tag{11}$$

where *y*—real-life values of the site overhead cost index (dependent variable), *y*ˆ*ens*—values of *y* predicted by the ensemble of neural networks, *fi*—the *i*-th mapping function implemented implicitly by the *i*-th neural network belonging to the ensemble, *xj*—independent variables, the input shared by all of the members of the ensemble for *j* = 1, ... ,m, and *εi*—the error of approximation by the *i*-th member of the ensemble for *i* = 1, . . . , *K*.

On the other hand, the prediction by neural networks ensemble and stacked generalisation approach is denoted with Equations (12) and (13):

$$y = h(f\_i(\mathbf{x}\_j)) + \varepsilon\_{sg}, \tag{12}$$

$$\hat{y}\_{sg} = h(f\_i(\mathbf{x}\_j)), \tag{13}$$

where *y*—as in (10), *y*ˆ*sg*—values of *y* predicted by the stacked generalisation-based two-level model, *h*—the mapping function implemented implicitly by the level-1 neural network, *fi*—the *i*-th mapping function implemented implicitly by the *i*-th level-0 neural network, *xj*—as in (10), and *εsg*—the error of approximation by the model.

The relationship between the set of selected predictors and the site overhead cost index was investigated by the authors. Eleven independent variables of the model were selected on the basis of studies of the literature [28,31,46] and investigations of a number of projects completed in Poland. The training data included samples of real-life values of the dependent variable, *y*, and corresponding vectors of independent variables, *xj*. The value of the dependent variable in the *p*-th sample (*p* = 1, . . . ,143) was calculated as follows:

$$y^p = SOC\_{ind}^p = \frac{SOC^p}{LC^p + MC^p + EC^p + SC^p} \cdot 100\%, \tag{14}$$

where *SOCind<sup>p</sup>*—site overhead costs index, *SOCp*—site overhead costs observed in reality, *LCp*—labour costs observed in reality, *MCp*—material costs observed in reality, *ECp*—equipment costs observed in reality, and *SCp*—subcontractors' costs observed in reality for the *p*-th observation (sample). Some exemplary data, including the cost components present in Equation (14), in thousands of Euros, and the corresponding site overhead cost indexes, are presented in Table 1.
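Equation (14) translates directly into code. The cost figures in the example below are hypothetical, not values from Table 1.

```python
# Site overhead cost index per Equation (14):
# SOC_ind = SOC / (LC + MC + EC + SC) * 100%
def soc_index(soc, lc, mc, ec, sc):
    """Return the site overhead cost index in percent, given site
    overhead, labour, material, equipment and subcontractor costs."""
    return soc / (lc + mc + ec + sc) * 100.0

# e.g. 120k EUR of overheads on 2,000k EUR of direct costs:
print(soc_index(120.0, 900.0, 600.0, 300.0, 200.0))  # -> 6.0
```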


**Table 1.** Exemplary values of site overhead costs index.

Independent variables of the model were selected on the basis of studies of the literature and investigations of a number of projects completed in Poland. As a result, a set of independent variables was proposed; these variables were denoted as *xj*, where *j* = 1, ... ,11. Three variables brought information about the types of work executed in the project to the model:


Four further variables brought to the model information about the construction site location:


One variable brought to the model information about the duration of construction works:

• *x*8—overall duration of construction works.

Another two variables brought to the model information about the execution of works in winter and about subcontracted works:


The last variable brought to the model information about the main contractor:

• *x*11—size and necessary potential of the main contractor.

(When compared to the authors' earlier studies on the problem [1,32], the set of ten independent variables has been expanded. A thorough review of the data collected in the earlier phases of the research allowed the authors to select an additional variable that brings to the model information about the capabilities of the contractor, namely its size and potential.)

Variables *x*1–*x*6 were of the nominal type. A binary method of coding was applied in the case of *x*1, *x*2, and *x*3—their range of values was 0 or 1. In the case of *x*4, *x*5, and *x*6, a "1 of n" method of coding was applied—the range of values, considered for the three variables altogether, was 1, 0, 0 or 0, 1, 0 or 0, 0, 1.

Variables *x*7–*x*10 were of the quantitative type, whereas *x*11 was of the nominal type. A pseudo-fuzzy scaling method of coding was applied to transform the original values or information into numerical values in the range 0.1–0.9 for the variables presented in Table 2; for the variable *x*9, the values were scaled into the range 0.0–1.0. The transformation for these variables is presented in Table 2. The rationale for the transformation was to provide a common scale for all of the variables.
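The three coding schemes described above can be sketched as follows. The category labels and scaled values used here are hypothetical illustrations; the actual mappings are those defined in Table 2:

```python
def encode_binary(flag):
    """x1..x3: binary coding, values 0 or 1."""
    return 1 if flag else 0

def encode_one_of_n(category, categories):
    """x4..x6: '1 of n' coding, e.g. 1,0,0 / 0,1,0 / 0,0,1."""
    return [1 if category == c else 0 for c in categories]

def encode_pseudo_fuzzy(value, mapping):
    """x7..x11: pseudo-fuzzy scaling of descriptive values into
    the range 0.1-0.9 (0.0-1.0 for x9), per a Table-2-style mapping."""
    return mapping[value]

# Hypothetical categories and scale values, for illustration only:
print(encode_one_of_n("urban", ["urban", "suburban", "rural"]))  # -> [1, 0, 0]
print(encode_pseudo_fuzzy("short", {"short": 0.1, "medium": 0.5, "long": 0.9}))  # -> 0.1
```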


**Table 2.** Transformation of the descriptive values into the numerical values for variables *x*7–*x*11.

The database of 143 samples was built on the basis of a survey completed by Polish contractors. The survey investigated the factors that influence site overhead costs as well as the scope and complexity of construction works for completed building projects. The study of the returned surveys resulted in the gathering and ordering of the data used in the process of ANN training. Table 3 presents some samples of the training data; exemplary records from the database are given.


**Table 3.** Exemplary samples of the training data.

The strategy of the models' development, as well as the assumptions about the training, testing, and performance analysis, are explained in the next section.

#### **4. Models' Development, Results, and Discussion**

#### *4.1. Models' Development Strategy*

The strategy of the model development included conducting multiple training and testing of a number of different types of single ANNs as candidates to become members of the ensemble, forming the ensemble, and then investigating the two approaches discussed earlier. The strategy is presented schematically in the chart in Figure 2 and then discussed in detail.

**Figure 2.** Scheme of the strategy of the models' development.

The whole set of collected data was divided into two main subsets used for training and testing purposes. The testing subset, later referred to as *T*, was selected carefully to be statistically representative of the whole data collection and included 20% of the samples from the whole set of collected data. The data belonging to this subset did not take part in the training of the ANNs and was used for the examination of the single ANNs, as well as of the ensemble models built upon the ensemble averaging and stacked generalisation approaches. Samples belonging to the subset *T* also play the role of new cases in the prediction performance analysis.

The remaining data was used for training, i.e., for supervised learning and validation of the single ANN candidates for ensemble membership. These subsets are later referred to as *L* and *V*, respectively, whilst the whole training subset is referred to as *L&V*. The strategy involved dividing the remaining data in the proportion *L*/*V* = 80%/20%, repeated five times, so that five folds of data were available for training purposes. Moreover, each of the samples belonging to the *L&V* subset took part in supervised learning in four folds and in validation in one fold, so the networks for each fold were trained with data that varied in terms of which samples fell into the *L* or *V* subsets.
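The repeated 80%/20% splitting of the *L&V* data can be sketched as below. This is an assumption-laden illustration: the paper does not state the exact sampling scheme, so independent random shuffles are used here as a stand-in:

```python
import random

def make_folds(samples, n_folds=5, val_share=0.2, seed=0):
    """Five repeated 80/20 L/V splits of the L&V data, as described
    above. Each fold is an independent random split (an assumption;
    the paper does not specify the sampling scheme)."""
    rng = random.Random(seed)
    folds = []
    for _ in range(n_folds):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_share)
        folds.append((shuffled[n_val:], shuffled[:n_val]))  # (L, V)
    return folds

# ~80% of the 143 samples form the L&V subset, here 114 sample indices:
folds = make_folds(list(range(114)))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # -> 5 92 22
```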

Another key assumption was to select one ANN for each fold of the *L* and *V* subsets to become a member of the ensemble. The selection was made on the basis of a two-step performance analysis and assessment of the ANNs within the sets of networks trained with each fold of the *L* and *V* subsets. The rationale for this assumption was not only to choose the best networks but also to minimise the risk that the prediction of the model is biased due to the sampling of the *L* and *V* subsets. The employed error function and the criteria for the assessment of the trained networks are presented in Table 4. For the purposes of performance assessment and analysis of single ANNs, Pearson's correlation coefficient (15) and error measures (16)–(20) were calculated for the *L*, *V*, *L&V*, and *T* subsets.

Selection of the ensemble members was preceded by an investigation of a number of various multilayer perceptron (MLP) ANNs with one hidden layer, whose structures included 11 neurons in the input layer, h neurons in the hidden layer, and 1 neuron in the output layer. The choice of the MLP networks relied on their applicability to regression problems (compare References [29,49]).

The networks varied in the number of neurons in the hidden layer (*h* ranged from 4 to 11), in the types of activation functions employed in the neurons of the hidden and output layers (sigmoid, hyperbolic tangent, exponential, and linear functions), and in the initial weights of the neurons at the beginning of the training process. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm was used for training the individual networks—details about the algorithm can be found in the literature, e.g., Reference [47]. The choice of the training algorithm was dictated by its availability in the software that was used for the neural simulations. Of the three available algorithms, BFGS offered the fastest performance and the best convergence of the training and testing processes for the investigated problem. Altogether, over 100 networks with different combinations of activation functions and numbers of neurons in the hidden layer were trained for each of the five folds of the *L* and *V* subsets.

The first step of the selection included an assessment of the correlation coefficient between the expected and predicted outputs and of the root mean squared error (*RMSE*) values. From the set of networks that fulfilled the conditions *RL* > 0.90, *RV* > 0.90, *RL&V* > 0.90, and *RT* > 0.90, the authors initially selected 20 networks for which the differences between *RMSEL*, *RMSEV*, *RMSEL&V*, and *RMSET* were the smallest.
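The first selection step amounts to a filter followed by a ranking, which can be sketched as below. The dictionary layout for per-subset *R* and *RMSE* values, and the use of the max–min spread of the RMSEs as the "smallest differences" criterion, are assumptions made for illustration:

```python
def preselect(networks, r_threshold=0.90, keep=20):
    """First selection step: keep networks with R > 0.90 on all four
    subsets (L, V, L&V, T), then take those whose per-subset RMSE
    values differ the least (smallest max-min spread, an assumption)."""
    eligible = [n for n in networks
                if all(r > r_threshold for r in n["R"].values())]
    spread = lambda n: max(n["RMSE"].values()) - min(n["RMSE"].values())
    return sorted(eligible, key=spread)[:keep]

# Two hypothetical trained networks; network 2 fails the R_V condition:
nets = [
    {"id": 1, "R": {"L": 0.95, "V": 0.93, "LV": 0.94, "T": 0.92},
     "RMSE": {"L": 1.0, "V": 1.4, "LV": 1.1, "T": 1.2}},
    {"id": 2, "R": {"L": 0.96, "V": 0.89, "LV": 0.94, "T": 0.91},
     "RMSE": {"L": 0.9, "V": 1.1, "LV": 1.0, "T": 1.0}},
]
print([n["id"] for n in preselect(nets)])  # -> [1]
```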

The second step of the selection relied on a thorough review of the initially selected networks for each of the five folds of the *L* and *V* subsets. The authors carried out a residual analysis, in terms of both the measures presented in Table 4 and the distributions, dispersions, and values of the errors for the samples belonging to the training and testing subsets.


**Table 4.** Error function and models' performance assessment criteria.

#### *4.2. Results and Discussion*

A review and comparison of the network's performance, based on the methodical analysis, allowed for finally choosing five networks—one for each fold of *L* and *V* subsets. The five selected networks—later referred to as ANN1, ANN2, ANN3, ANN4, and ANN5—are presented in Table 5.


**Table 5.** Details of the five networks selected to be the members of the ensemble.

Table 6 presents the results of training and testing of the five selected networks acting separately. The results are given according to the criteria presented in Table 4. The results in Table 6 are satisfactory; however, one can easily see that there are some differences between the performances of the five networks.

Figure 3 presents scatterplots of the expected and predicted values of *SOCind*—points of coordinates (*yp*, *yˆp*)—for the training and testing subsets, drawn for the five selected networks acting individually. One can see that, in terms of the criteria shown in Table 4 and according to the results presented in Table 6, the performance of the five networks acting individually was similar and the errors were comparable. However, Figure 3a,b and the analysis of the location and distribution of the points in the graphs reveal that the predictions would depend strongly on the choice of a single network acting separately. Although most of the points were distributed along the line of a perfect fit, some points (marked with ellipses) were placed outside of the cone delimited by percentage errors of +25% and −25%.

**Table 6.** Training results and performance of the selected networks.


**Figure 3.** Scatterplots of *y* and *y*ˆ for the five selected neural networks acting separately: (**a**) scatterplot for samples belonging to the training subset, (**b**) scatterplot for samples belonging to the testing subset.

Table 7 presents the maximal values of absolute percentage errors (20) calculated for the five selected ANNs. The values in Table 7 reveal significant errors of predictions, which also justify employment of ensembles of neural networks in the problem.


**Table 7.** *APEmax* errors obtained for the five selected networks.

The five chosen networks were combined to form the ensemble. The rules presented earlier—Equations (10) and (11)—were employed for implementation of the ensemble averaging approach and the outputs of the model were computed as well as the errors and error measures. This model is later called ENS AV.
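Ensemble averaging reduces to taking the mean of the member networks' outputs, as the following one-liner illustrates. The five sample predictions are hypothetical values standing in for the outputs of ANN1–ANN5:

```python
def ensemble_average(member_outputs):
    """Ensemble averaging (Equations (10)-(11) in the text): the
    ensemble prediction is the mean of the member networks' outputs."""
    return sum(member_outputs) / len(member_outputs)

# Hypothetical predictions of ANN1..ANN5 for one sample:
print(ensemble_average([10.2, 11.0, 9.8, 10.6, 10.4]))  # -> 10.4
```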

To complete the development of the model based on the stacked generalisation approach, the authors investigated a number of artificial neural network candidates for the level-1 network. The investigated network structures included five neurons in the input layer (a consequence of the selection of five ensemble member networks), *h* neurons in the hidden layer, and one neuron in the output layer. The number of neurons in the hidden layer, *h*, ranged from one to three, as the structure of the level-1 network was supposed to be simple (compare Section 2.2). The types of activation functions employed and the training algorithm were the same as in the case of training the ensemble candidate networks (as presented in Section 3).

The training patterns—which included the outputs of the five ensemble member networks as the inputs of the level-1 networks, and the accompanying real-life values of *SOCind* as the expected outputs—were divided randomly for each investigated network into learning and validating subsets in the proportion *L*/*V* = 60%/40%. The investigated networks also varied in the initial weights of the neurons at the beginning of the training process. Altogether, around 100 networks were trained and examined. For the purposes of testing, the authors used the *T* subset, which was selected in the initial stage of the research (as presented in Section 3). The criteria for the two-step selection of the level-1 networks were similar to those used for the ensemble candidate networks (as presented in Section 3). The final choice of two level-1 networks of comparable quality, namely MLP 5-2-1 and MLP 5-3-1, led to the introduction of two alternative stacked generalisation-based models. These models are later called ENS SG1 and ENS SG2, respectively. The details of the selected level-1 networks are presented in Table 8.

**Table 8.** Details of the two level-1 networks selected for the stacked generalisation-based models.


All three proposed models based on ensembles of networks, namely ENS AV, ENS SG1, and ENS SG2, were assessed in terms of performance and prediction quality. The overall results are presented in Table 9. For the purposes of performance assessment and analysis of the ensemble averaging and stacked generalisation-based models, Pearson's correlation coefficient (16) and error measures (17)–(20) were calculated for the *L&V* and *T* subsets.


**Table 9.** Performance measures for the three developed models based on the ensembles of networks.

When the values in Table 9 are collated with the values in Tables 6 and 7, the improvements in the error measures can easily be seen. The performance of all three models based on ensembles of networks is better than the performance of the networks acting in isolation. The most evident improvement is achieved for *APEmax*.

Figures 4–6 depict scatterplots of the expected and predicted values of *SOCind* for the ENS AV, ENS SG1, and ENS SG2 models. Figures 4–6 present the points of coordinates (*yp*, *y*ˆ *p ens*) for the training and testing subsets separately. When compared to Figure 3, these graphs show that combining the five selected ANNs allowed for the compensation of errors made by the ANNs acting in isolation in the case of the ENS AV as well as the ENS SG1 and ENS SG2 models. Although an improvement has been achieved in the case of all three introduced models, one can see that the best performance is provided by ENS SG2, where all of the points are distributed within the cone of acceptable errors. In the case of ENS AV and ENS SG1, there are single points located outside of the cone.

**Figure 4.** Scatterplot of *y* and *y*ˆ*ens* for the ensemble, ENS AV, performing ensemble averaging; (**a**) scatterplot for samples belonging to the training subset, (**b**) scatterplot for samples belonging to the testing subset.

**Figure 5.** Scatterplot of *y* and *y*ˆ*sg* for the ensemble, ENS SG1; (**a**) scatterplot for samples belonging to the training subset, (**b**) scatterplot for samples belonging to the testing subset.

**Figure 6.** Scatterplot of *y* and *y*ˆ*sg* for the ensemble, ENS SG2: (**a**) scatterplot for samples belonging to the training subset, (**b**) scatterplot for samples belonging to the testing subset.

Figures 7–9 depict the frequencies and distributions of *APEp* errors computed for the training and testing subsets for the models based on ensembles of networks. The errors were accumulated and counted in five intervals of 5% width each; one additional interval accumulated errors greater than 25%.


The columns in Figures 7–9 show the percentage frequencies of the errors that fell into each of the intervals. The polylines show the distribution of the errors (cumulative frequencies according to the accepted order of the intervals). In Figures 7–9, one can see that, in the case of ENS AV and ENS SG1, only a few *APEp* errors (19) are greater than 25%, and in the case of ENS SG2, none of them fall into this range. On the contrary, for the networks acting separately, a significant number of errors is greater than 25%. These results can be explained through the analysis of the *APEp* errors for the networks acting separately (ANN1, ANN2, ANN3, ANN4, ANN5): many of their *APEp* errors belonging to the first interval were relatively small and close to 0%. On the other hand, these small errors were accompanied by a significant number of errors *APEp* ≥ 25% and by high values of *APEmax* (compare Table 7). In the case of the ensemble-based models, these errors were compensated due to the ensemble averaging (ENS AV) or stacked generalisation (ENS SG1, ENS SG2).
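The binning of errors behind Figures 7–9 can be sketched as below. The sample *APE* values are hypothetical; the interval scheme (five 5%-wide intervals plus one for errors above 25%) follows the description above:

```python
def ape_frequencies(errors, width=5.0, n_intervals=5):
    """Count APE errors in five 5%-wide intervals plus one interval
    for errors greater than 25%, and return percentage frequencies,
    as in Figures 7-9."""
    counts = [0] * (n_intervals + 1)
    for e in errors:
        idx = min(int(e // width), n_intervals)  # last slot: e >= 25%
        counts[idx] += 1
    return [100.0 * c / len(errors) for c in counts]

# Hypothetical APE values (in %) for eight samples:
apes = [1.2, 3.4, 6.0, 7.5, 12.0, 18.0, 26.0, 30.0]
print(ape_frequencies(apes))  # -> [25.0, 25.0, 12.5, 12.5, 0.0, 25.0]
```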

**Figure 7.** Frequencies and distributions of absolute percentage errors for the ENS AV model computed for the training and testing subsets.

**Figure 8.** Frequencies and distributions of absolute percentage errors for the ENS SG1 model computed for the training and testing subsets.

**Figure 9.** Frequencies and distributions of absolute percentage errors for the ENS SG2 model computed for the training and testing subsets.

The compensation resulted in the collection of most of the prediction errors in the first five intervals. One cost of this compensation is a decrease in the number of small errors, close to 0%, in the first interval. The benefit of the compensation, however, is an improvement in the overall prediction performance and better knowledge generalisation. As mentioned previously, the best performance is offered by the ENS SG2 model, as there were no errors *APEp* ≥ 25%.

The analysis of the research results leads to the conclusion that the employment of only one of the five selected networks (as presented in Table 5) to support the prediction of *SOCind* would burden the predictions with the choice of a network—this is confirmed by the distribution of points that represent expected and predicted values (*yp*, *y*ˆ*p*) in Figure 3.

On the other hand, combining these five networks to form an ensemble balances the strengths and weaknesses of the five ANNs—for some data, certain single-acting networks offered good predictions, whilst for other data their predictions were weak. Combining these networks into an ensemble allows for synergy. The decrease in *APEmax*, as well as more stable predictions, are the main benefits of employing ensembles in the models. Furthermore, the risk of errors exceeding the critical level of 25%, in terms of percentage errors, is reduced. These benefits have been achieved at some cost, mainly due to the compensation of the very small and very high errors offered by certain networks acting separately for certain training and testing patterns. However, the compensation of the errors by the ensemble-based models reduces the unwanted oversensitivity of the networks acting separately to certain training patterns.

#### **5. Summary and Conclusions**

The authors developed three original models based on ensembles of neural networks aimed at the prediction of the site overhead cost index for construction projects. One of the models employed ensemble averaging and two employed stacked generalisation. The developed models are capable of predicting the site overhead cost index with satisfactory accuracy and, thus, of supporting estimates of site overhead costs. In the light of the presented research, the general conclusion is that employing an ensemble of neural networks in the models proved to be superior to the approach based on a single neural network. Moreover, the effort—which is unavoidable in training, verifying, and selecting a number of networks of similar quality—is not wasted. In practical terms, prediction using ensemble averaging is simple—it needs only an averaging of the outputs of the networks belonging to the ensemble. On the other hand, stacked generalisation requires some additional computational effort, which includes the training and selection of level-1 networks.

In the proposed approach, the authors employed an ensemble of neural networks as the core of all three models. All of the proposed models consist of five different MLP networks, chosen from over 500 trained networks (over 100 networks were trained and investigated for each of the five folds of training data). The five networks vary in their structure, the activation functions employed, and the initial conditions of the training processes. The performance of the five chosen networks is comparable. However, the predictions made by the networks acting separately are burdened with the conditions of the training processes, the sampling of the learning and validating subsets, and the specificity of each of the networks. Combining the five networks leads to improvements in predictions by balancing the strengths and weaknesses of the five networks. Predicting the site overhead cost index using the ensemble-based models allowed for the compensation of the errors made by the single networks. The predictions based on the three models, for which the proposed ensemble is the core, are more stable, and the risk of exceeding a critical range of errors is minimised.

The ensembles of neural networks proved their superiority over single neural networks acting in isolation. *MAPE* testing errors for the five networks acting in isolation ranged between 9.15% and 13.69%, whereas *APE*max ranged between 26.6% and 76.1%. In the case of the proposed ensembles of networks, both *MAPE* and *APE*max errors for testing were lower; values of *MAPE* ranged between 6.3% and 9.2%, whereas values of *APE*max ranged between 18.5% and 23.4%. The quality of the ensemble-based model is also visible in the distribution of errors for each of them—more than 90% of the *APE* testing errors were smaller than 25%.

The three developed models, namely ENS AV (based on the ensemble averaging approach) and ENS SG1 and ENS SG2 (based on the stacked generalisation approach), offer comparable prediction quality and performance; however, the best results were achieved for ENS SG2. ENS SG2 is a two-level model that employs the five selected ANNs as level-0 networks and a simple ANN as the level-1 network.

The authors' finding, justified by the analysis of the models' performance, is that the developed models are capable of supporting the estimation of site overhead costs in building construction projects. In the case of other types of construction, e.g., bridges, roads, infrastructure, etc., the specificity of the projects must be taken into account and separate models must be developed.

Further research will include studies on the development of models, supporting the cost estimation process on different levels (for certain facilities, construction works, and cost components), that are based on the concept of ensembles of neural networks.

**Author Contributions:** Conceptualisation, M.J. and A.L.; data curation, A.L.; formal analysis, M.J.; investigation, M.J. and A.L.; methodology, M.J. and A.L.; validation, M.J. and A.L.; visualization, M.J.; writing—original draft, M.J. and A.L.; writing—review and editing, M.J. and A.L.

**Funding:** No external funding has been received—this research was funded by statutory activities of Cracow University of Technology.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
