Increasing the Discriminatory Power of DEA Using Shannon’s Entropy

Xie, Qiwei; Dai, Qianzhi; Li, Yongjun; Jiang, An

doi:10.3390/e16031571

by Qiwei Xie ¹,², Qianzhi Dai ³,*, Yongjun Li ³,* and An Jiang ⁴

¹ Department of Electronics and Information, Toyota Technological Institute, Nagoya 468-8511, Japan
² Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
³ School of Business, University of Science and Technology of China, Hefei 230026, Anhui Province, China
⁴ Research Center for Eco-Environment Sciences, Chinese Academy of Sciences, Beijing 100085, China
* Authors to whom correspondence should be addressed.

Entropy 2014, 16(3), 1571-1585; https://doi.org/10.3390/e16031571

Submission received: 8 November 2013 / Revised: 14 January 2014 / Accepted: 5 March 2014 / Published: 20 March 2014

Abstract

In many data envelopment analysis (DEA) applications, the analyst confronts the difficulty that the selected data set is not suitable for traditional DEA models because of their poor discrimination. This paper presents an approach that uses Shannon’s entropy to improve the discrimination of traditional DEA models. In this approach, DEA efficiencies are first calculated for all possible variable subsets and analyzed using Shannon’s entropy theory to calculate the degree of importance of each subset in the performance measurement. The obtained efficiencies and degrees of importance are then combined to generate a comprehensive efficiency score (CES), which can noticeably improve the discrimination of traditional DEA models. Finally, the proposed approach is applied to some data sets from the prior DEA literature.

MSC2000 Codes: 62B10, 94A17

Keywords: data envelopment analysis (DEA); discrimination improvement; Shannon’s entropy

1. Introduction

Data envelopment analysis (DEA) has proven to be an effective tool for performance evaluation and benchmarking since it was first introduced in [1]. According to Google Scholar, [1] has been cited over 16,000 times. Since the first CCR model, a number of different DEA models have been proposed and widely applied to various performance evaluation problems (e.g., [2–4]). In DEA models, each input or output variable is assigned a weight, and the relative efficiency of each decision-making unit (DMU) is defined as the ratio of its weighted sum of outputs to its weighted sum of inputs; DEA efficiencies are thus relative to the set of input-output data available [5]. The DEA method allows each DMU under evaluation to maximize its relative efficiency by freely choosing its weights, subject to the constraint that the efficiency of no DMU exceeds one. If the efficiency score of a DMU is one, it is DEA (weakly) efficient. Accordingly, for a given number of DMUs, the efficiency score of each DMU relies heavily on the dimensionality of the weight space. Adding variables to a DEA model results in a higher-dimensional weight space and higher efficiency scores, as well as an expanded set of efficient DMUs [6]. In other words, the more variables a DEA model has, the more DMUs are rated efficient, and the less discerning the DEA analysis is [7]. This situation suggests the need to select as few variables as possible for DEA models.

A guideline commonly applied to variable selection is that the number of variables should be less than one third of the number of DMUs [8]. However, a great number of practical applications in performance measurement are inconsistent with this guideline. For example, [9] evaluated the performance of seven university departments with six variables. Reference [10] presented an example where DEA was used to evaluate eight hotels with eight variables, and the same data set has also been used by [7] and [11]. In [12], a series of DEA models was applied to measure the performance of 22 solid waste disposal alternatives with eight variables, and that data set has also been used by [7]. The authors of [13] presented a banking example with two inputs and four outputs to compare ten different banks. Reference [14] measured the eco-efficiencies of 17 Chinese cities with 30 variables. These data sets share a common characteristic: the number of variables is too large for DEA methods to be applied directly, where “large” is judged against the guideline of [8].
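The guideline can be checked mechanically against the applications cited above. The short MATLAB sketch below does this; the (DMUs, variables) pairs are the ones quoted in the preceding paragraph, while the variable names `cases` and `satisfies` are illustrative only.

```matlab
% [number of DMUs, number of variables] for the applications cited above:
% [9], [10], [12], [13] (2 inputs + 4 outputs) and [14].
cases = [7 6; 8 8; 22 8; 10 6; 17 30];

% Guideline of [8]: number of variables < (number of DMUs) / 3.
satisfies = cases(:, 2) < cases(:, 1) / 3;

disp(satisfies');   % a 0 entry means the guideline is violated
```

None of the five applications meets the guideline, which is the situation the rest of the paper addresses.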

Several factors account for this common characteristic. Firstly, the initial list of potential variables to be considered for DEA is often very large [11]. For example, any resource used in the process should be treated as an input [11], and many environmental variables should also be taken into account in the efficiency measurement [15], as they can influence the availability of resources. Secondly, the bounded rationality of human beings and the potential relationships among variables make it hard for the analyst to determine the best choice among all variables, whether related or unrelated to the production process, although the analyst can use prior knowledge and experience to omit some variables that have no effect on the production process [16]. These practical conditions lead the analyst to choose as many variables as possible.

These practical conditions conflict with the requirement of traditional DEA methods to use few variables. As a result, the selected data set is frequently unsuitable for traditional DEA methods because of their poor discrimination. This raises a natural question:

How can the discrimination of traditional DEA methods be improved to create a complete ranking of all DMUs when this situation occurs?

Note that if we cannot identify the production frontier from preliminary surveys, it may be risky to rely on one particular DEA model that considers all variables from the selected data set, especially when we are unsure which variables should be chosen to characterize the production process. Hence, it is wise to try different models and combine their results [17]. To this end, this paper presents an approach to improve the discrimination of traditional DEA methods without losing variable information. We consider all possible DEA model specifications in the final result, where each specification corresponds to an alternative variable combination. Efficiencies are calculated for all possible DEA model specifications and analyzed using Shannon’s entropy theory to generate a comprehensive efficiency score (CES) for each DMU, which can then be used to give a complete ranking. The proposed approach is applied to some data sets from the prior DEA literature.

The rest of this paper is organized as follows: Section 2 includes a literature review on the methods for improving the discrimination. Section 3 describes how to combine traditional DEA methods and Shannon’s entropy theory to generate a CES. In Section 4, we illustrate the proposed approach using some data sets from the literature. Section 5 gives our conclusions.

2. Literature Review

The methods of prior DEA studies in discrimination improvement can be divided into two categories. The first one is to enlarge the number of DMUs while keeping the number of variables fixed. One approach to this is to use pooled cross section and time series data. However, this approach assumes no technological change over the sample periods, which may be a major problem in practice [18].

The second one is to reduce the dimensionality of the selected data set by using variable reduction (VR) based on partial covariance analysis or principal component analysis (PCA). Reference [7] introduced a multivariate statistical approach, called the VR method, using the partial covariance analysis in terms of the principle of minimizing information reduction. They concluded that the major impact of the calculated DEA efficiencies could be maintained even when deleting highly correlated variables (see also [19]). The other discrimination-improving method uses PCA to transform the data set into a set of principal components; a limited number of components are then selected and analyzed by traditional DEA models to calculate the efficiency score of each DMU. References [13] and [20] proposed a PCA/DEA method that applies PCA directly to the original input or output variables, respectively, and uses the principal components as a reduced set of variables for a subsequent data envelopment analysis. This approach has been applied to the performance measurement of deregulated airline networks [13] and airport quality [21]. The authors of [14] proposed an Assurance Region (AR) PCA/DEA model, in which the AR restrictions reflect the differences in the relative importance of the principal components generated from the ratios of a single output to a single input. They applied their approach to measuring the eco-efficiencies of 17 Chinese cities. In [22], Monte Carlo simulation was applied to compare the PCA-DEA and VR methodologies, and it was demonstrated that the former provides a more powerful tool than the latter, with consistently more accurate results.

However, the use of principal components instead of the original variables makes the subsequent analysis more opaque [13], for example the measurement of the efficient level of each original variable and the directions for performance improvement of inefficient DMUs. In short, this kind of approach improves the discrimination ability of DEA at the expense of losing some variable information.

3. Discrimination Improvement Using Shannon’s Entropy

3.1. Traditional DEA Models

Suppose there are n independent DMUs, and each DMU_j ( j ∈ N = {1,2,..., n} ) consumes m inputs x_ij (i ∈ M = {1,2,..., m}) to generate s outputs y_rj (r ∈ S = {1,2,..., s}). The efficiency for any given DMU_d can be computed using the following CCR DEA model [1]:

\begin{array}{l} E_{d} (M, S) = min θ - ɛ (\sum_{i = 1}^{m} s_{i}^{-} + \sum_{r = 1}^{s} s_{r}^{+}) \\ s . t . \\ \sum_{j = 1}^{n} λ_{j} x_{i j} + s_{i}^{-} = θ x_{i d}, i = 1, 2, \dots, m; \\ \sum_{j = 1}^{n} λ_{j} y_{r j} - s_{r}^{+} = y_{r d}, r = 1, 2, \dots, s; \\ λ_{j} \geq 0, j = 1, 2, \dots, n, \\ s_{i}^{-}, s_{r}^{+} \geq 0, i = 1, 2, \dots, m, r = 1, 2, \dots, s \end{array}

(1)

where θ, λ_j, $s_{i}^{-}$ and $s_{r}^{+}$ are the decision variables, ɛ is the non-Archimedean infinitesimal, and E_d (M, S) is the optimal efficiency of DMU_d when the input set M and the output set S are considered. The CCR DEA model is based on the constant returns to scale (CRS) assumption. If we add the constraint $\sum_{j = 1}^{n} λ_{j} = 1$ to model (1), we obtain the BCC DEA model [23], which is based on the variable returns to scale (VRS) assumption.
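As a rough illustration of model (1), the following MATLAB sketch solves only its radial part (min θ) for a single DMU, leaving out the ɛ-weighted slack term, and relies on the same `linprog` solver (Optimization Toolbox) that the Appendix uses. The data in `X` and `Y` and the index `d` are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data: n = 3 DMUs, m = 1 input, s = 1 output (one row per DMU).
X = [2; 4; 5];
Y = [2; 3; 5];
d = 2;                          % index of the DMU under evaluation (assumed)
[n, m] = size(X);
s = size(Y, 2);

% Decision vector z = [theta; lambda_1; ...; lambda_n].
f = [1; zeros(n, 1)];           % objective: minimize theta

% Input rows:  sum_j lambda_j * x_ij - theta * x_id <= 0
% Output rows: -sum_j lambda_j * y_rj <= -y_rd
A = [-X(d, :)', X';
     zeros(s, 1), -Y'];
b = [zeros(m, 1); -Y(d, :)'];

lb = zeros(n + 1, 1);           % theta >= 0 and lambda_j >= 0

% For the BCC (VRS) variant, pass Aeq = [0, ones(1, n)] and beq = 1
% in place of the two empty arguments.
z = linprog(f, A, b, [], [], lb);
theta_d = z(1);                 % radial efficiency of DMU d
```

With a single input and a single output under CRS, the score reduces to the DMU's output/input ratio divided by the best ratio in the sample, so the second DMU of this made-up data set should obtain 0.75.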

Moreover, if the decision maker wants to make DMU_d efficient (i.e., to improve the efficiency of DMU_d) under the variable set (M, S), the following equations give the optimal levels of inputs and outputs:

\left\{ \begin{matrix} {\hat{x}}_{i d} = θ^{*} x_{i d} - s_{i}^{- *}, i = 1, 2, \dots, m; \\ {\hat{y}}_{r d} = y_{r d} + s_{r}^{+ *}, r = 1, 2, \dots, s . \end{matrix} \right.

(2)
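As a quick numerical illustration of equations (2), assume purely hypothetical optimal values $θ^{*} = 0.75$, $s_{1}^{- *} = 0.5$ and $s_{1}^{+ *} = 0$ for a DMU with a single input $x_{1 d} = 4$ and a single output $y_{1 d} = 3$:

```latex
\hat{x}_{1d} = \theta^{*} x_{1d} - s_{1}^{-*} = 0.75 \times 4 - 0.5 = 2.5,
\qquad
\hat{y}_{1d} = y_{1d} + s_{1}^{+*} = 3 + 0 = 3 .
```

Under these assumed values, the DMU would have to reduce its input from 4 to 2.5 while keeping its output at 3 in order to become efficient under this variable set.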

3.2. Shannon’s Entropy DEA Models

The following paragraphs describe how to calculate the degree of importance of each model in the efficiency measurement via Shannon’s entropy and combine the results to be a CES.

Theoretically, a DEA model must have at least one input and one output [11]. Accordingly, the number of all different combinations of nonempty input subsets of M and nonempty output subsets of S is K = (2^m −1) × (2^s −1). Denote model (1) based upon the kth combination of variables as M_k, and the model set as Ω = {M₁, M₂,..., M_K }. We denote the efficiency score of DMU_j based on M_k as E_jk, j = 1, …, n, k = 1, …, K. If we solve model (1) K times, each time with a different combination of variables, we get an efficiency matrix [E_jk]_{n×K} as follows:

[\begin{matrix} & M_{1} & M_{2} & \dots & M_{K} \\ DMU_{1} & E_{11} & E_{12} & \dots & E_{1 K} \\ DMU_{2} & E_{21} & E_{22} & \dots & E_{2 K} \\ ⋮ & ⋮ & ⋮ & & ⋮ \\ DMU_{n} & E_{n 1} & E_{n 2} & \dots & E_{n K} \end{matrix}]

(3)
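The count K and the list of specifications can be enumerated with binary masks, in the same 0/1 notation that Table 2 uses for variable combinations. The MATLAB sketch below is only one possible way of doing this; the names `inMasks`, `outMasks` and `specs` are illustrative, and m = 3, s = 2 are taken from the example in Section 4.1.

```matlab
m = 3;                                    % number of inputs (as in Section 4.1)
s = 2;                                    % number of outputs (as in Section 4.1)

% Each row is a 0/1 inclusion mask; starting at 1 skips the empty subset.
inMasks  = dec2bin(1:2^m - 1, m) - '0';   % (2^m - 1)-by-m
outMasks = dec2bin(1:2^s - 1, s) - '0';   % (2^s - 1)-by-s

K = size(inMasks, 1) * size(outMasks, 1);
specs = zeros(K, m + s);                  % row k = [input mask, output mask] of M_k
k = 0;
for a = 1:size(inMasks, 1)
    for b = 1:size(outMasks, 1)
        k = k + 1;
        specs(k, :) = [inMasks(a, :), outMasks(b, :)];
    end
end

fprintf('K = %d specifications\n', K);
```

Given data matrices `X` and `Y`, the columns belonging to specification k could then be selected with `X(:, logical(specs(k, 1:m)))` and `Y(:, logical(specs(k, m+1:end)))`.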

The concept of Shannon’s entropy [24] plays a central role in information theory, where it is sometimes referred to as a measure of uncertainty. This concept has been extended to different scientific fields, such as physics and the social sciences (e.g., [25–28]). To the best of our knowledge, the first work integrating Shannon’s entropy and DEA is [17], in which the authors combined the efficiency scores of a DMU obtained from many different DEA models (such as CCR and BCC) into a comprehensive efficiency score, using Shannon’s entropy to calculate the degree of importance of each model. In [29], an entropy-based approach was proposed to deal with the problem of distorted efficiency measurement in the non-proportional radial measure. There are also many other studies based on Shannon’s entropy and DEA, such as [30,31]. Because these studies focus on how to integrate the performance evaluation results calculated by different DEA models into a unified result by using Shannon’s entropy, they do not consider the discrimination problem of DEA. Their issue is therefore entirely different from the one addressed in this work, and we do not discuss them further. In this paper, we introduce an approach to evaluate the importance of each variable combination and obtain the CES. The computing procedure for obtaining a CES is as follows:

Step 1: Calculate the efficiency matrix [E_jk]_{n×K} based upon model (1) for all alternative combinations of variables.
Step 2: Normalize the efficiency matrix [E_jk]_{n×K} by setting $e_{j k} = E_{j k} / \sum_{j = 1}^{n} E_{j k}$ , j = 1,2,..., n, k = 1,2,..., K.
Step 3: Compute the entropy f_k as $f_{k} = - {(ln n)}^{- 1} \sum_{j = 1}^{n} e_{j k} ln (e_{j k})$ , k = 1,2,..., K.
Step 4: Calculate the degree of diversification of M_k as d_k = 1 – f_k, k = 1,2,..., K.
Step 5: Normalize the values of d_k as $W_{k} = d_{k} / \sum_{k = 1}^{K} d_{k}$ , k = 1,2,..., K, such that $\sum_{k = 1}^{K} W_{k} = 1$ .
Step 6: Calculate the CES as $θ_{j} = \sum_{k = 1}^{K} W_{k} E_{j k}$ , j = 1,2,..., n.
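Steps 2 to 6 can be sketched in a few vectorized MATLAB lines. The efficiency matrix `E` below is hypothetical (it is not taken from any table in this paper), and all of its entries are assumed to be strictly positive so that the logarithm is finite.

```matlab
% Hypothetical n-by-K efficiency matrix (rows: DMUs, columns: specifications).
% Entries are assumed strictly positive; a zero entry would make log() infinite.
E = [1.00 1.00 1.00;
     0.50 1.00 0.90;
     0.25 1.00 1.00];
[n, K] = size(E);

e = E ./ repmat(sum(E, 1), n, 1);   % Step 2: column-wise normalization
f = -sum(e .* log(e), 1) / log(n);  % Step 3: entropy of each specification
d = 1 - f;                          % Step 4: degree of diversification
W = d / sum(d);                     % Step 5: normalized weights
theta = E * W';                     % Step 6: CES, one score per DMU

disp(W);
disp(theta');
```

In this made-up example the second specification rates every DMU identically, so its entropy is one and its weight is numerically zero, while the first specification, which separates the DMUs most strongly, receives the largest weight. Only the first DMU, which is efficient under every specification, reaches a CES of one.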

Definition 1

If θ_j = 1, then DMU_j ( j = 1,2,..., n) is comprehensively DEA efficient.

Theorem 1

There is a negative correlation between the entropy value and the dispersion of the DEA efficiencies for a given variable subset. In particular, if all efficiencies under a variable subset are equal, then the subset receives the minimal weight.

Proof

For a given variable subset k (k ∈ {1,2,.., K}), if the DEA efficiencies differ little, i.e., e_jk → 1 / n (j ∈ {1,2,..., n}), then its entropy value is $f_{k} = - {(ln n)}^{- 1} \sum_{j = 1}^{n} (1 / n) ln (1 / n) \to 1$ , so that d_k = 1 – f_k → 0 and $W_{k} = d_{k} / \sum_{k = 1}^{K} d_{k} \to 0$ .
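The limiting value used in the proof can be verified directly by substituting the uniform case $e_{j k} = 1 / n$ into the entropy of Step 3:

```latex
\begin{aligned}
f_{k} &= -(\ln n)^{-1} \sum_{j=1}^{n} \frac{1}{n} \ln\frac{1}{n}
       = -(\ln n)^{-1} \cdot n \cdot \frac{1}{n} \cdot (-\ln n)
       = 1, \\
d_{k} &= 1 - f_{k} = 0 .
\end{aligned}
```

Since Shannon’s entropy attains its maximum only for the uniform distribution, any dispersion among the normalized efficiencies gives f_k < 1 and hence d_k > 0.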

Theorem 1 indicates that the value of d_k is appropriate for representing the importance of M_k relative to the other models in Ω. The less discerning the DEA analysis is, the smaller the value of d_k will be. If the CES of a given DMU is one, then it is efficient under every model in Ω.

It should also be noted that if the data set only has one input and one output, the proposed method is equivalent to the traditional models, and the proposed method in this section does not rely on the particular form of the DEA model. This method can be used with either constant or variable returns to scale, or with either an input or an output orientation.

Theorem 2

A DMU is efficient in the CES if and only if it is efficient in all variable combinations.

Proof

Suppose that a given DMU_{j₀} ( j₀ ∈ {1,2,..., n}) under evaluation is efficient in the CES, that is, $θ_{j_{0}} = \sum_{k = 1}^{K} W_{k} E_{j_{0} k} = 1$ . Subtracting this from the identity $\sum_{k = 1}^{K} W_{k} = 1$ , we obtain the following equation:

\sum_{k = 1}^{K} W_{k} (1 - E_{j_{0} k}) = 0

(4)

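From the computing procedure, we know that W_k > 0 ( k = 1,2,..., K ), and from model (1) that E_{j₀k} ≤ 1, so every term in equation (4) is nonnegative. As a result, E_{j₀k} ( k = 1,2,..., K ) must be equal to 1 in equation (4). That is, DMU_{j₀} is efficient in all variable combinations. Conversely, if E_{j₀k} = 1 for all k, then $θ_{j_{0}} = \sum_{k = 1}^{K} W_{k} = 1$ .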

Theorem 2 implies that our approach is a DEA discrimination improving method. When some DMUs are efficient under one subset but inefficient under another, their CES is below one, i.e., they are not comprehensively DEA efficient. When some DMUs are efficient under all subsets, we consider their performance to be indeed efficient and in no need of further adjustment. In general, it would be very rare for two (or more) DMUs to be efficient simultaneously under our proposed approach. Such ties are common in traditional DEA models, by contrast, because they evaluate the performance of DMUs under a single variable set, and their discrimination power is heavily influenced by the variable dimensionality [11]. Moreover, from the computing procedure, if the CES cannot distinguish two (or more) efficient DMUs, then the traditional simple DEA method must also be unable to distinguish them. Therefore, in terms of discrimination, the DEA approach combined with Shannon’s entropy performs at least as well as the simple DEA method in all circumstances.

From the perspective of the slack analysis, all DMUs can obtain the optimal inputs and outputs under each variable subset by applying the system of equations (2). Based on the optimal inputs and outputs, all DMUs would be efficient under each subset. From Theorem 2, we then get θ_j = 1, i.e., each DMU is comprehensively DEA efficient. We do not obtain a unified improvement plan by directly calculating the W_k-weighted sum of the optimal inputs and outputs of each M_k, because such a plan cannot ensure that all DMUs are efficient after the efficiency improvement.

4. Numerical Examples

4.1. Simple Data Set From [32]

Table 1 shows a simple data set from [32] with five DMUs, three inputs and two outputs. The number of variables equals the number of DMUs, which is inconsistent with the guideline that the number of variables should be less than one third of the number of DMUs [8]. Thus, as mentioned above, the discrimination of traditional DEA models might be decreased [7]. In the following paragraphs, we employ this example to illustrate our proposed approach. The results for this data set are based upon the input-oriented, constant returns to scale DEA model described above. The number of all possible variable subsets is K = (2³ −1) × (2² −1) = 21.

The second and third columns of Table 2 show the combinations of input and output variables and the efficiencies for all specifications, respectively. The number “1” in the second column of Table 2 means that the corresponding variable is included in the combination, while “0” means that the variable is removed. According to the computing procedure described in Section 3.2, the degrees of relative importance of all specifications are calculated as shown in the last column. Table 2 is sorted by these degrees in descending order. The results show that the efficiencies increase as variables are added to the data set, and the discrimination of DEA becomes poorer and poorer. The largest degree of importance is W₁ = 0.11103, and its corresponding subset M₁ includes only two variables (X₃ and Y₂). The smallest degree, W₂₁ = 0.007768, corresponds to M₂₁, which includes all variables from the data set.

Using the expression of Step 6 in Section 3.2, the comprehensive efficiency score of each DMU can be calculated as shown in the second column of Table 3. For the convenience of DEA model comparison, the last four columns of Table 3 show the efficiencies resulting from the traditional input-oriented CCR model, the game cross DEA model [32], the super-efficiency DEA model [33] and the SBM model [34], respectively.

The CCR efficiencies show that DMU₂ and DMU₃ are both DEA efficient and should be given the same rank, while the other three models’ results report that DMU₃ performs better than DMU₂. In fact, paying attention to the results in the third column of Table 2, we find that DMU₃ is DEA efficient under all DEA specifications except model M₈. As to DMU₂, it is efficient under 12 DEA models, while the other nine models rate it as inefficient. Besides, we find that the SBM efficiencies cannot distinguish efficient DMUs, such as DMU₂ and DMU₃, as is the case with the CCR efficiencies.

Another interesting finding concerns how the four different models distinguish inefficient DMUs. Both the CCR and super-CCR efficiencies show that DMU₄ and DMU₅ have the same efficiency score (0.85714), but the game cross efficiencies show that DMU₄ ranks below DMU₅, which conflicts with the results of our proposed approach. Turning again to the results in the third column of Table 2, DMU₄ has an efficiency score of 0.85714 under eight DEA specifications, and so does DMU₅. However, the efficiency scores of DMU₄ are more volatile than those of DMU₅ when different DEA model specifications are used. For example, the results from the last 14 models show that its efficiencies are at least 0.75, but its efficiencies are lower than 0.23 under the first seven DEA models. The efficiencies of DMU₅ are relatively steady, ranging from a lower bound of 0.35714 to an upper bound of 0.85714 when different DEA models are tried. Therefore, it is risky to rely on models that use all variables in a single calculation, especially when we are unsure which variables should be chosen to characterize the production process.

Besides, when these DEA models are extended from the constant returns to scale (CRS) version to the variable returns to scale (VRS) version, the game cross DEA model may be problematic because negative efficiency scores can occur [32], and the super-efficiency DEA model may be infeasible [35,36]. However, the proposed approach is independent of the CRS/VRS assumption and can be used in all cases.

4.2. Hotel Chain

Our last example is from a textbook [10], which measures the efficiencies of eight hotel chains. Each chain (DMU) has gathered information regarding six input variables (service, climate control, price, convenience, room comfort, and food quality) and two output variables (overall satisfaction and value). The same data set has also been applied by [7] and [11]. The following results are based upon input-oriented, constant returns to scale DEA models.

Table 4 reports the results of applying the proposed approach, the traditional CCR model, the game cross DEA model, the super-efficiency DEA model and the SBM model to the data set. It shows that, except for the traditional CCR model and the SBM model, the other three models give a complete ranking of all DMUs. The results of these three models report that DMU₂ ranks above the other DMUs, while the results from the proposed model and the game cross DEA model show that DMU₃ ranks next to any other DMUs.

Here we can see that the superCCR results also have high discrimination power. However, they are obtained under only a single variable set with six inputs and two outputs. Moreover, as demonstrated in [37], the super-efficiency model is inappropriate for ranking efficient DMUs because the efficiencies of efficient DMUs are not calculated under a common platform (i.e., the efficient frontier changes for each efficient DMU).

In the case of DMU₅ and DMU₆, the ranking order under the CES differs from that under the game cross efficiency (GCE) and superCCR, because both the GCE and superCCR are calculated under the same single variable set with six inputs and two outputs. We consider that a single variable set is not enough to represent the actual performance of DMUs. In fact, in the calculation of the CES, we find that DMU₅ performs worse than DMU₆ under many variable subsets. The CES integrates the efficiencies of all subsets and is therefore more comprehensive and seems more representative than the GCE and superCCR.

The Pearson’s linear correlation coefficient matrix of the five different models’ efficiencies is obtained as shown in Table 5. The correlation matrix indicates that the efficiencies of the proposed model are highly correlated with the ones based upon the traditional SBM model and the game cross CCR model.
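A correlation matrix of this kind can be obtained with MATLAB's built-in `corrcoef`, which treats each column as one variable. The scores below are hypothetical placeholders (they are not the values of Table 4), and the matrix name `scores` is illustrative.

```matlab
% Hypothetical efficiency scores (rows: DMUs, columns: models);
% these are NOT the values reported in Table 4.
scores = [0.95 1.00 1.20;
          0.80 1.00 1.05;
          0.60 0.85 0.85;
          0.40 0.70 0.70];

% R(p, q) is the Pearson linear correlation between columns p and q.
R = corrcoef(scores);
disp(R);
```

The result is a symmetric matrix with ones on the diagonal, one row and column per model.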

5. Conclusions

This paper presents an approach to improve the discrimination of traditional DEA methods by considering all possible specifications. The approach yields a ranking with greater discriminating ability for all individual DMUs, and numerical examples show that it has some advantages in ranking DMUs compared with traditional DEA models, the game cross DEA model, and the so-called super-efficiency model. In this paper, we select an input-oriented, constant returns to scale DEA model to illustrate the examples. It can also be replaced by other DEA models. For example, various DEA models have been proposed to deal with undesirable outputs/inputs, such as in [38–42]. Thus, our proposed approach has good compatibility.

Acknowledgments

This research is supported by the National Science Foundation of China (NSFC) for Distinguished Youth Scholars (No. 71225002), National Natural Science Foundation of China under Grants (No. 61101219, 71271196, 61201050 and 21307150), Science Funds for Creative Research Groups of the National Natural Science Foundation of China and University of Science and Technology of China (No. 71121061 and WK2040160008) and the Fund for International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 71110107024).

Appendix: The Matlab Code for the Proposed Shannon’s Entropy DEA Approach

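function [All_Efficiency, CES, Weights] = Entropy_DEA(X, Y)
% X: n-by-m input matrix; Y: n-by-s output matrix (one row per DMU).
% All_Efficiency: the efficiency matrix based upon the standard input-oriented
% constant returns to scale model with all alternative combinations of the
% variable set.
% CES: comprehensive efficiency scores; Weights: degrees of importance.
m = size(X, 2);
[n, s] = size(Y);
set = {};
set1 = {};
set2 = {};
containsX = zeros(1, m);
containsY = zeros(1, s);
% Note: the helper function subset is not part of this listing. Judging from
% the checks below, it is assumed to return copies of its first argument in
% which the columns of the excluded variables are set to zero.
set1 = subset(X, set1, containsX, 1);
set2 = subset(Y, set2, containsY, 1);
set1{end+1, 1} = X;
set2{end+1, 1} = Y;
for i = 1 : size(set1, 1)
    for j = 1 : size(set2, 1)
        mark = ones(1, m+s);
        % Skip combinations without any input or without any output.
        if isequal(set1{i, 1}, zeros(n, m)) || isequal(set2{j, 1}, zeros(n, s))
            continue;
        else
            set{end + 1, 1} = set1{i, 1};
            set{end, 2} = set2{j, 1};
            % mark(k) = 0 flags a variable that is excluded from this combination.
            for k = 1 : m
                if isequal(set1{i, 1}(:, k), zeros(n, 1))
                    mark(1, k) = 0;
                end
            end
            for k = 1 : s
                if isequal(set2{j, 1}(:, k), zeros(n, 1))
                    mark(1, m+k) = 0;
                end
            end
            set{end, 3} = mark;
        end
    end
end
for i = 1 : size(set, 1)
    subX = set{i, 1};
    subY = set{i, 2};
    All_Efficiency(:, i) = CCR_I(subX, subY); % It can be replaced by any other DEA model.
end
[CES, Weights] = Shannon_Effciency(All_Efficiency);

function [CES, Weights] = Shannon_Effciency(All_Efficiency)
% Steps 2-6 of Section 3.2: entropy-based weights and the CES.
[n, l] = size(All_Efficiency);
for j = 1 : l
    for i = 1 : n
        e_new(i, j) = All_Efficiency(i, j)/sum(All_Efficiency(:, j)); % Step 2
    end
end
e_log = log(e_new);
for j = 1 : l
    for i = 1 : n
        e_new2(i, j) = e_new(i, j) * e_log(i, j);
    end
end
e_l = -sum(e_new2)/log(n); % Shannon entropy (Step 3)
d_l = ones(1, l) - e_l; % Step 4
for i = 1 : l
    Weights(i, 1) = d_l(i)/sum(d_l); % Step 5
    if Weights(i, 1) < 0.00009
        Weights(i, 1) = 0; % negligible weights are set to zero
    end
end
CES = All_Efficiency * Weights; % Step 6

function [e, w] = CCR_I(X, Y)
% Input-oriented CCR model in multiplier form, solved once per DMU.
% w(:, i) = [output weights; input weights] of DMU i; e(i) is its efficiency.
m = size(X, 2);
[n, s] = size(Y);
for i = 1:n
    Aeq = [zeros(1, s) X(i, :)]; % weighted inputs of DMU i equal one
    beq = 1;
    f = -[Y(i, :) zeros(1, m)]; % maximize the weighted outputs of DMU i
    A = [Y -X]; % weighted outputs - weighted inputs <= 0 for every DMU
    b = zeros(n, 1);
    LB = zeros(s+m, 1);
    w(:, i) = linprog(f, A, b, Aeq, beq, LB);
    e(i, 1) = -f * w(:, i);
end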

Conflicts of Interest

The authors declare no conflict of interest.

Author Contributions

Qiwei Xie and Yongjun Li jointly conceived and designed this study and drafted the article together. Qianzhi Dai collected and analyzed the data, and revised the study critically for important intellectual content. An Jiang wrote the programs used for the calculations. All authors approved the final version to be published.

References

1. Charnes, A.; Cooper, W.W.; Rhodes, E. Measuring the efficiency of decision making units. Eur. J. Oper. Res. 1978, 2, 429–444.
2. Cook, W.D.; Seiford, L.M. Data envelopment analysis (DEA)–Thirty years on. Eur. J. Oper. Res. 2009, 192, 1–17.
3. Cooper, W.W.; Seiford, L.M.; Tone, K. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA Solver Software, 2nd ed.; Springer: New York, NY, USA, 2007.
4. Zhu, J. Quantitative Models for Performance Evaluation and Benchmarking: DEA with Spreadsheets and DEA Excel Solver, 2nd ed.; Kluwer Academic Publishers: Boston, MA, USA, 2003.
5. Lozano, S. Information sharing in DEA: A cooperative game theory approach. Eur. J. Oper. Res. 2012, 222, 558–565.
6. Cinca, S.; Molinero, C.M. Selecting DEA specifications and ranking units via PCA. J. Oper. Res. Soc. 2004, 55, 521–528.
7. Jenkins, L.; Anderson, M. A multivariate statistical approach to reducing the number of variables in data envelopment analysis. Eur. J. Oper. Res. 2003, 147, 51–61.
8. Friedman, S.-S. Combining ranking scales and selecting variables in the DEA context: The case of industrial branches. Comput. Oper. Res. 1998, 25, 781–791.
9. Wong, Y.H.B.; Beasley, J.E. Restricting weight flexibility in data envelopment analysis. J. Oper. Res. Soc. 1990, 41, 829–835.
10. Ragsdale, C.T. Spreadsheet Modeling and Decision Analysis, 3rd ed.; Thomson Nelson: Cincinnati, OH, USA, 2006; p. 132.
11. Wagner, J.M.; Shimshak, D.G. Stepwise selection of variables in data envelopment analysis: Procedures and managerial perspectives. Eur. J. Oper. Res. 2007, 180, 57–67.
12. Sarkis, J. A comparative analysis of DEA as a discrete alternative multiple criteria decision tool. Eur. J. Oper. Res. 2000, 123, 543–557.
13. Adler, N.; Golany, B. Evaluation of deregulated airline networks using data envelopment analysis combined with principal component analysis with an application to Western Europe. Eur. J. Oper. Res. 2001, 132, 260–273.
14. Liang, L.; Li, Y.; Li, S. Increasing the discriminatory power of DEA in the presence of the undesirable outputs and large dimensionality of data sets with PCA. Expert Syst. Appl. 2009, 36, 5895–5899.
15. Boussofiane, A.; Dyson, R.G.; Thanassoulis, E. Applied data envelopment analysis. Eur. J. Oper. Res. 1991, 52, 1–15.
16. Pastor, J.T.; Ruiz, J.L.; Sirvent, I. A statistical test for nested radial DEA models. Oper. Res. 2002, 50, 728–735.
17. Soleimani-Damaneh, M.; Zarepisheh, M. Shannon’s entropy for combining the efficiency results of different DEA models: Method and application. Expert Syst. Appl. 2009, 36, 5146–5150.
18. Hughes, A.; Yaisawarng, S. Sensitivity and dimensionality tests of DEA efficiency scores. Eur. J. Oper. Res. 2004.
19. Dyson, R.; Allen, R.; Camanho, A.S.; Podinovski, V.V.; Sarrico, C.S.; Shale, E.A. Pitfalls and protocols in DEA. Eur. J. Oper. Res. 2001, 132, 245–259.
20. Adler, N.; Golany, B. Including principal component weights to improve discrimination in data envelopment analysis. J. Oper. Res. Soc. Jpn. 2003, 46, 66–73.
21. Adler, N.; Berechman, J. Measuring airport quality from the airlines’ viewpoint: An application of data envelopment analysis. Transport Pol. 2001, 8, 171–181.
22. Adler, N.; Yazhemsky, E. Improving discrimination in data envelopment analysis: PCA–DEA. Eur. J. Oper. Res. 2010, 202, 273–284.
23. Banker, R.D.; Charnes, A.; Cooper, W.W. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manage. Sci. 1984, 30, 1078–1092.
24. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
25. Mathai, A.M.; Haubold, H.J. On a generalized entropy measure leading to the pathway model with a preliminary application to solar neutrino data. Entropy 2013, 15, 4011–4025.
26. Mistry, K.H.; Lienhard, J.H. An economics-based second law efficiency. Entropy 2013, 15, 2736–2765.
27. Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
28. Shao, Y.S.; Brooks, D. ISA-independent workload characterization and its implications for specialized architectures. Entropy 2012, 2, 2.
29. Hsiao, B.; Chern, C.C.; Chiu, C.R. Performance evaluation with the entropy-based weighted Russell measure in data envelopment analysis. Expert Syst. Appl. 2011, 38, 9965–9972.
30. Bian, Y.; Yang, F. Resource and environment efficiency analysis of provinces in China: A DEA approach based on Shannon’s entropy. Energ. Pol. 2010, 38, 1909–1917.
31. Hosseinzadeh Lotfi, F.; Toloie Eshlaghy, A.; Shafiee, M. Providers ranking using data envelopment analysis model, cross efficiency and Shannon entropy. Appl. Math. Sci. 2012, 6, 153–161.
32. Liang, L.; Wu, J.; Cook, W.D.; Zhu, J. The DEA game cross-efficiency model and its Nash equilibrium. Oper. Res. 2008, 56, 1278–1288.
33. Andersen, P.; Petersen, N.C. A procedure for ranking efficient units in data envelopment analysis. Manage. Sci. 1993, 39, 1261–1294.
34. Tone, K. A slacks-based measure of efficiency in data envelopment analysis. Eur. J. Oper. Res. 2001, 130, 498–509.
35. Seiford, L.M.; Zhu, J. Infeasibility of super-efficiency data envelopment analysis models. INFOR 1999, 37, 174–187.
36. Cook, W.D.; Liang, L.; Zha, Y.; Zhu, J. A modified super-efficiency DEA model for infeasibility. J. Oper. Res. Soc. 2008, 60, 276–281.
37. Banker, R.D.; Chang, H. The super-efficiency procedure for outlier identification, not for ranking efficient units. Eur. J. Oper. Res. 2006, 175, 1311–1320.
Scheel, H. Undesirable outputs in efficiency valuations. Eur. J. Oper. Res 2001, 132, 400–410. [Google Scholar]
Seiford, L.M.; Zhu, J. Modeling undesirable factors in efficiency evaluation. Eur. J. Oper. Res 2002, 142, 16–20. [Google Scholar]
Yang, H.; Pollitt, M. Incorporating both undesirable outputs and uncontrollable variables into DEA: The performance of Chinese coal-fired power plants. Eur. J. Oper. Res 2009, 197, 1095–1105. [Google Scholar]
Gomes, E.G.; Lins, M.E. Modelling undesirable outputs with zero sum gains data envelopment analysis models. J. Oper. Res. Soc 2007, 59, 616–623. [Google Scholar]
Liu, W.B.; Meng, W.; Li, X.X.; Zhang, D.Q. DEA models with undesirable inputs and outputs. Ann. Oper. Res 2010, 173, 177–194. [Google Scholar]

**Table 1.** Characteristics of the data set from [32]. Reprinted by permission. Copyright 2008 INFORMS.
DMU	X₁	X₂	X₃	Y₁	Y₂
1	7	7	7	4	4
2	5	9	7	7	7
3	4	6	5	5	7
4	5	9	8	6	2
5	6	8	5	3	6

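**Table 2.** Efficiencies and degrees of importance of all DEA specifications. Columns X₁ to Y₂ give the variable combination of each specification M_k (1 = included, 0 = excluded); columns DMU₁ to DMU₅ give the efficiencies under that specification; W_k is the degree of importance.
M_k	X₁	X₂	X₃	Y₁	Y₂	DMU₁	DMU₂	DMU₃	DMU₄	DMU₅	W_k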
1	0	0	1	0	1	0.40816	0.71429	1	0.17857	0.85714	0.11103
2	1	0	0	0	1	0.32653	0.8	1	0.22857	0.57143	0.10836
3	0	1	1	0	1	0.4898	0.71429	1	0.19048	0.85714	0.096124
4	1	0	1	0	1	0.40816	0.8	1	0.22857	0.85714	0.09486
5	0	1	0	0	1	0.4898	0.66667	1	0.19048	0.64286	0.091951
6	1	1	1	0	1	0.4898	0.8	1	0.22857	0.85714	0.084255
7	1	1	0	0	1	0.4898	0.8	1	0.22857	0.64286	0.082092
8	1	0	0	1	0	0.40816	1	0.89286	0.85714	0.35714	0.066891
9	1	0	0	1	1	0.40816	1	1	0.85714	0.57143	0.044747
10	1	1	0	1	0	0.68571	1	1	0.85714	0.45	0.032067
11	0	1	0	1	0	0.68571	0.93333	1	0.8	0.45	0.029345
12	0	0	1	1	0	0.57143	1	1	0.75	0.6	0.024453
13	1	0	1	1	0	0.57143	1	1	0.85714	0.6	0.024033
14	0	1	1	1	0	0.68571	1	1	0.8	0.6	0.017232
15	1	1	1	1	0	0.68571	1	1	0.85714	0.6	0.017074
16	0	0	1	1	1	0.57143	1	1	0.75	0.85714	0.017036
17	1	0	1	1	1	0.57143	1	1	0.85714	0.85714	0.01547
18	1	1	0	1	1	0.68571	1	1	0.85714	0.64286	0.014303
19	0	1	0	1	1	0.68571	0.93333	1	0.8	0.64286	0.012432
20	0	1	1	1	1	0.68571	1	1	0.8	0.85714	0.008484
21	1	1	1	1	1	0.68571	1	1	0.85714	0.85714	0.007768
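The W_k column and the CES values can be approximated from the efficiencies in Table 2 alone. The sketch below is a minimal illustration, assuming the usual entropy-weighting steps: normalize each specification's efficiencies over the DMUs, compute the normalized Shannon entropy, take one minus the entropy as the degree of diversification, and rescale so the weights sum to one. The function names `entropy_weights` and `comprehensive_scores` are hypothetical, and the inputs are the rounded table values, so results agree with the published figures only approximately.

```python
import math

# Efficiencies copied from Table 2: one row per specification M_1..M_21,
# one column per DMU_1..DMU_5 (values are rounded as printed in the table).
E = [
    [0.40816, 0.71429, 1.0, 0.17857, 0.85714],
    [0.32653, 0.8, 1.0, 0.22857, 0.57143],
    [0.4898, 0.71429, 1.0, 0.19048, 0.85714],
    [0.40816, 0.8, 1.0, 0.22857, 0.85714],
    [0.4898, 0.66667, 1.0, 0.19048, 0.64286],
    [0.4898, 0.8, 1.0, 0.22857, 0.85714],
    [0.4898, 0.8, 1.0, 0.22857, 0.64286],
    [0.40816, 1.0, 0.89286, 0.85714, 0.35714],
    [0.40816, 1.0, 1.0, 0.85714, 0.57143],
    [0.68571, 1.0, 1.0, 0.85714, 0.45],
    [0.68571, 0.93333, 1.0, 0.8, 0.45],
    [0.57143, 1.0, 1.0, 0.75, 0.6],
    [0.57143, 1.0, 1.0, 0.85714, 0.6],
    [0.68571, 1.0, 1.0, 0.8, 0.6],
    [0.68571, 1.0, 1.0, 0.85714, 0.6],
    [0.57143, 1.0, 1.0, 0.75, 0.85714],
    [0.57143, 1.0, 1.0, 0.85714, 0.85714],
    [0.68571, 1.0, 1.0, 0.85714, 0.64286],
    [0.68571, 0.93333, 1.0, 0.8, 0.64286],
    [0.68571, 1.0, 1.0, 0.8, 0.85714],
    [0.68571, 1.0, 1.0, 0.85714, 0.85714],
]


def entropy_weights(eff):
    """Degree of importance of each specification (assumed entropy weighting)."""
    n = len(eff[0])  # number of DMUs
    diversification = []
    for row in eff:
        total = sum(row)
        shares = [e / total for e in row]  # normalize over DMUs
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        diversification.append(1.0 - entropy)
    scale = sum(diversification)
    return [d / scale for d in diversification]


def comprehensive_scores(eff, weights):
    """Weighted sum of efficiencies over all specifications, per DMU."""
    n = len(eff[0])
    return [sum(w * row[j] for w, row in zip(weights, eff)) for j in range(n)]


W = entropy_weights(E)
CES = comprehensive_scores(E, W)
print([round(w, 4) for w in W])
print([round(s, 4) for s in CES])
```

Specifications whose efficiencies are spread unevenly over the DMUs (such as M_1) receive a large weight, while those in which most DMUs score close to one (such as M_21) receive a small weight, which is why the combined score separates the DMUs more sharply than the full model alone.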

**Table 3.** Efficiencies based upon four different DEA models.
DMU	CES	CCR	GCE	SuperCCR	SBM
1	0.47997	0.68571	0.6384	0.68571	0.47619
2	0.83347	1	0.97664	1.12	1
3	0.99283	1	1	1.5	1
4	0.41582	0.85714	0.79878	0.85714	0.32179
5	0.69064	0.85714	0.66703	0.85714	0.56863

**Table 4.** Results of applying four different models to the data set from [10].
DMU	CES	CCR	GCE	SuperCCR	SBM
1	0.29138	0.88542	0.823	0.88542	0.4156
2	0.92407	1	1	4.2144	1
3	0.24193	0.87312	0.80414	0.87312	0.36333
4	0.34309	0.88315	0.85682	0.88315	0.47566
5	0.65948	1	0.99719	1.8971	1
6	0.82922	1	0.96786	1.2222	1
7	0.47185	0.85697	0.8493	0.85697	0.60733
8	0.59083	1	0.96464	1.8165	1
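One simple way to read Table 4 in terms of discrimination is to count, for each model, how many DMUs are tied at a score of exactly one and how many distinct scores occur. The sketch below does this on the values printed in Table 4; the helper name `discrimination_summary` and the choice of these two counts as indicators are assumptions made for illustration, not measures defined in the paper.

```python
# Scores copied from Table 4, DMUs 1..8 in order.
table4 = {
    "CES": [0.29138, 0.92407, 0.24193, 0.34309, 0.65948, 0.82922, 0.47185, 0.59083],
    "CCR": [0.88542, 1.0, 0.87312, 0.88315, 1.0, 1.0, 0.85697, 1.0],
    "GCE": [0.823, 1.0, 0.80414, 0.85682, 0.99719, 0.96786, 0.8493, 0.96464],
    "SuperCCR": [0.88542, 4.2144, 0.87312, 0.88315, 1.8971, 1.2222, 0.85697, 1.8165],
    "SBM": [0.4156, 1.0, 0.36333, 0.47566, 1.0, 1.0, 0.60733, 1.0],
}


def discrimination_summary(values, digits=5):
    """Return (number of scores equal to 1, number of distinct scores)."""
    rounded = [round(v, digits) for v in values]
    tied_at_one = sum(1 for v in rounded if v == 1.0)
    distinct = len(set(rounded))
    return tied_at_one, distinct


summary = {name: discrimination_summary(vals) for name, vals in table4.items()}
for name, (tied, distinct) in summary.items():
    print(f"{name:9s} tied at 1: {tied}  distinct scores: {distinct}")
```

On these values, CCR and SBM each leave four DMUs tied at one, whereas CES assigns eight distinct scores, so a full ranking is possible without a separate tie-breaking step.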

**Table 5.** Correlation matrix of the five models’ efficiencies for the hotel case.
Corr	CES	CCR	GCE	SuperCCR	SBM
CES	1	0.85083	0.91554	0.75718	0.91618
CCR	0.85083	1	0.9582	0.65395	0.94433
GCE	0.91554	0.9582	1	0.71683	0.97824
SuperCCR	0.75718	0.65395	0.71683	1	0.63874
SBM	0.91618	0.94433	0.97824	0.63874	1
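A correlation matrix of this kind can be computed directly from per-model score columns. The sketch below applies the Pearson correlation coefficient to the columns of Table 4. That Table 5 was obtained exactly this way is an inference rather than something the table states, although a hand check of the CCR/SuperCCR and CCR/SBM entries agrees with Table 5 to about four decimals. The helper name `pearson` is hypothetical.

```python
import math

models = ["CES", "CCR", "GCE", "SuperCCR", "SBM"]

# Scores copied from Table 4, DMUs 1..8 in order.
table4 = {
    "CES": [0.29138, 0.92407, 0.24193, 0.34309, 0.65948, 0.82922, 0.47185, 0.59083],
    "CCR": [0.88542, 1.0, 0.87312, 0.88315, 1.0, 1.0, 0.85697, 1.0],
    "GCE": [0.823, 1.0, 0.80414, 0.85682, 0.99719, 0.96786, 0.8493, 0.96464],
    "SuperCCR": [0.88542, 4.2144, 0.87312, 0.88315, 1.8971, 1.2222, 0.85697, 1.8165],
    "SBM": [0.4156, 1.0, 0.36333, 0.47566, 1.0, 1.0, 0.60733, 1.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


corr = {a: {b: pearson(table4[a], table4[b]) for b in models} for a in models}

# Print the matrix in the same layout as Table 5.
print("Corr\t" + "\t".join(models))
for a in models:
    print(a + "\t" + "\t".join(f"{corr[a][b]:.5f}" for b in models))
```

Because the inputs are the rounded values printed in Table 4, small differences from Table 5 in the last decimal places are to be expected.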

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Xie, Q.; Dai, Q.; Li, Y.; Jiang, A. Increasing the Discriminatory Power of DEA Using Shannon’s Entropy. Entropy 2014, 16, 1571-1585. https://doi.org/10.3390/e16031571
