1. Introduction
Data Envelopment Analysis (DEA) measures the relative efficiency of Decision Making Units (DMUs) that convert inputs into outputs. It was originally proposed by Charnes et al. [1] as a nonparametric approach that makes no assumptions about the production frontier or the weights assigned to the factors relevant to the analysis. To assess the efficiency of DMUs, they are compared to the best-practice frontier determined by the group of units with the most favorable input–output performance. Traditional methods divide the units into efficient ones, i.e., those on the efficient frontier, and inefficient ones, i.e., those below the frontier. Owing to its versatility, DEA has been widely used in areas such as management, economics, agriculture, education, healthcare, and logistics [2]. Recent example applications concern the assessment of public administration [3] and the urban rail transit network [4].
Since its first formulation, DEA has been extended in multiple ways [5,6]. For example, various efficiency models have been introduced to admit static or dynamic analysis or handle constant or variable returns to scale. In particular, an additive model was formulated to guarantee that the units it indicates as efficient satisfy this property in Koopmans' sense [7]. However, this model has also been criticized for assuming equal weights of all factors, vulnerability to the factors’ scale differences, and nonintuitive interpretation of the efficiency scores. These drawbacks have motivated the development of an additive value-based efficiency analysis [8,9], inspired by Multi-Attribute Value Theory (MAVT) [10]. This model transforms the input and output values using marginal value functions. Such per-criterion components are aggregated into a comprehensive efficiency measure with an additive value model incorporating weights assigned to the factors. The units that attain the greatest comprehensive value for at least one feasible weight vector are deemed efficient. Such an analysis is insensitive to scale problems because it applies value functions with a common scale. Moreover, the efficiency scores have an intuitive interpretation built on the notion of “min–max regret”. Hybrid methods combining ideas from DEA and Multiple Criteria Decision Analysis (MCDA) have become increasingly popular in recent years (see, e.g., [11,12]).
This paper contributes to the literature on additive value-based efficiency analysis in three ways. The existing methodology handles only flat structures, in which all inputs and outputs are considered at the same level without subcategories [
8]. Hence, our first contribution consists of adjusting it to handle hierarchical structures of factors used to assess the performance of DMUs. This is useful in real-world decision analysis for a few reasons. First, it helps to structure inputs and outputs logically and systematically. The higher-level factors are more general, whereas those at lower hierarchy levels are more specific. Moreover, when new information becomes available, the hierarchy can be easily modified or updated, allowing it to handle evolving decision problems. Second, a hierarchical decomposition of factors allows for the breaking down of complex problems into manageable, coherent pieces representing different levels of abstraction. By analyzing the efficiency at various levels of the hierarchy, it is possible to understand the strengths and weaknesses of DMUs and explain the comprehensive results taking into account their evolution along the hierarchy. Third, a hierarchical structure of factors makes efficiency analysis more transparent, flexible, and adaptable. In particular, we support the trade-off analysis, where weights can be associated with lower and higher-level categories of factors, and hence, their relative and absolute impact can be controlled more easily. In this regard, we incorporate the preferences elicited at each hierarchy level into the analysis. These preferences form the linear weight restrictions between factor categories at the same level.
The benefits of using a hierarchical structure have been explored in MCDA. Example methods that handle such a decomposition include the Analytic Hierarchy Process (AHP) [
13], the Multiple Criteria Hierarchy Process (MCHP) [
14,
15], and ELECTRE-III-H [
16]. In the DEA context, the first attempt was made with a two-layer nonlinear model [
17] and its linear counterpart [
18]. Then, Ref. [
19] proposed a multiple-layer DEA model (MLDEA) handling an arbitrary number of levels of inputs and outputs. Further, MLDEA was combined with AHP to consider relative priorities of various factors, mainly in the scenarios where DEA is used as a mathematical tool for constructing so-called composite indicators [
20,
21]. Finally, the latter approach was generalized to the setting of Network and Fuzzy DEA [
22]. The above-mentioned DEA models require inputs and outputs to be considered in separate hierarchies. We fill this research gap by admitting a single multiple-layer hierarchical structure containing inputs and outputs. In this way, the properly defined efficiency can be analyzed in each hierarchy node.
Second, in the proposed framework, we go beyond classifying the DMUs only into efficient and inefficient, as in the original value-based efficiency analysis [
8]. This is attained by verifying the robustness of efficiency results observable for the entire space of feasible input and output weights. We focus on three perspectives: distances to the efficient DMU, ranks, and pairwise preference relations. For each of them, we compute the exact (necessary, possible, and extreme) outcomes by solving dedicated mathematical programming models. Moreover, we estimate the distribution of results using Monte Carlo simulations. The proposed framework is inspired by [
23], being adapted to the multiple-level hierarchy value-based efficiency analysis. The formulations of dedicated procedures and types of considered results are similar to those considered in [
24]. However, we allow stakeholders to study the stability of efficiency outcomes in each hierarchy node instead of forcing them to consider all inputs and outputs simultaneously. In this regard, the main challenge is adequately handling the indicator weights considered at different hierarchy levels. We also formulate the properties of the exact efficiency outcomes observed along the hierarchy tree. Typically, they relate the results observed in all child nodes of a more general category to the outcomes obtained in the parent node. Hence, they help understand the evolution of necessary, possible, and extreme conclusions.
Third, we apply the proposed framework to a case study concerning healthcare. As noted in [
25], the efficiency of using resources to ensure a decent level of healthcare has become one of the most critical public policy issues in recent decades. Its assessment can be conducted from the perspective of the entire system or of individual organizations (e.g., hospitals). A detailed review of the applications of efficiency analysis in healthcare can be found in [
26], and a detailed description of the healthcare system in Poland is given in [
25].
We analyzed nine indicators capturing the quality of healthcare systems in sixteen Polish voivodeships. The indicators were grouped under three main categories: inhabitants’ health improvement, financial management, and consumer satisfaction. We elicited the preferences in the form of marginal value functions for all inputs and outputs and weight constraints. We report the results given a comprehensive efficiency index encompassing all relevant viewpoints and the three subproblems that allow for an understanding of each voivodeship’s strong and weak points. The paper’s three significant contributions, along with their essential aspects, are summarized in
Figure 1.
The paper’s remainder is organized as follows.
Section 2 describes an additive value-based efficiency model.
Section 3 defines a hierarchical structure of inputs and outputs, while
Section 4 describes a respective framework for robustness analysis. In
Section 5, we report the outcomes of a case study concerning the efficiency assessment of the healthcare system in Poland. The last section concludes the paper.
3. A Hierarchical Structure of Inputs and Outputs
The DMUs consume m inputs and produce n outputs . To simplify the notation, we aggregate all inputs and outputs into a single set of factors . Set forms level 0 of the hierarchy. These factors are grouped into categories of the first level, named . Analogously, the first-level categories can be grouped into second-level categories, forming a set , etc. The entire structure contains L levels. In the last (L-th) level, there is only a single category (), called a root.
From the mathematical viewpoint, the factors and categories form a tree (see
Figure 2). The set of all nodes in the tree (factors and categories) is denoted by
. For each node
, we define its parent
as a category in which it is directly contained. The set of direct children of category
is marked as
:
. The set of indirect children of category
contains all direct children of
and their direct and indirect children until reaching the tree’s leaves. For each category at hierarchy level
, we define set
as a subset of
(inputs and outputs), which are the indirect children of
. In particular, all factors are indirect children of the root category, i.e.,
. On the contrary, for an elementary factor
f,
is a singleton, i.e.,
. To maintain the spirit of DEA, for each category
, the respective set of factors (
) needs to contain at least one input and one output, i.e.,
and
, for
.
To illustrate the notation used in the paper, we will describe it using a simple hierarchy of factors in
Figure 3. This example involves two inputs (
) and two outputs (
). The set of factors
containing all elements from
and
is
. Overall, there are four factors (
), two first-level categories (
;
), and one root corresponding to a second-level category (
;
). The hierarchy contains two levels of categories (
). The parent category for input
is
(
). For
and
, the parent category is
, i.e.,
. The considered sets of factors
for example categories are the following:
,
,
, and
.
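The tree notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names (`x1`, `y1`, `c1`, `root`, and so on) and the assignment of factors to categories are hypothetical and are not taken from Figure 3. The sketch shows how the set of elementary factors below a category can be collected recursively and how the requirement of at least one input and one output per category can be checked.

```python
# Hypothetical hierarchy: two inputs, two outputs, two first-level
# categories, and a root. All names and assignments are assumptions.
PARENT = {
    "x1": "c1", "y1": "c1",
    "x2": "c2", "y2": "c2",
    "c1": "root", "c2": "root",
}
INPUTS = {"x1", "x2"}
OUTPUTS = {"y1", "y2"}


def direct_children(node):
    """Nodes whose parent is `node`."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def leaf_factors(node):
    """Elementary factors that are indirect children of `node`."""
    children = direct_children(node)
    if not children:  # an elementary factor yields a singleton set
        return {node}
    factors = set()
    for child in children:
        factors |= leaf_factors(child)
    return factors


def is_valid_category(node):
    """A category must cover at least one input and at least one output."""
    factors = leaf_factors(node)
    return bool(factors & INPUTS) and bool(factors & OUTPUTS)


categories = sorted(set(PARENT.values()))
print({c: sorted(leaf_factors(c)) for c in categories})
print(all(is_valid_category(c) for c in categories))
```

Storing only the parent of each node keeps the structure compact, and the children and factor sets can always be derived from it.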
Given a hierarchy of relevant factors and categories, we assign weight to each node t except the root. Moreover, we admit specifying the linear constraints for these weights at each hierarchy level. Factors or categories involved in a single constraint must have a common parent. For example, for the considered hierarchy, the constraint can take the form or , as and or and have the same parent. On the contrary, the example constraint is not allowed, because and have different parents (). The space of weight vectors that meet these restrictions is denoted by .
To introduce weight restrictions, we consider additional variables (
), representing the aggregated weights of elementary factors in the hierarchy. They are defined as the products of all weights on the path from the analyzed category (
) at the hierarchy level
l to the analyzed factor
f:
For factor in the considered example, when taking into account the root category (), the above formula takes the following form , and when considering category , it is expressed as follows .
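The product along the path can be illustrated with a short Python sketch. The hierarchy, node names, and numerical weights below are hypothetical; the only property assumed from the text is that the aggregated weight of a factor with respect to a category multiplies the weights of all nodes on the path from the factor up to, but not including, that category.

```python
# Hypothetical hierarchy and local weights (siblings sum to one).
PARENT = {
    "x1": "c1", "y1": "c1",
    "x2": "c2", "y2": "c2",
    "c1": "root", "c2": "root",
}
LOCAL_WEIGHT = {
    "x1": 0.4, "y1": 0.6,   # children of c1
    "x2": 0.7, "y2": 0.3,   # children of c2
    "c1": 0.5, "c2": 0.5,   # children of the root
}


def aggregated_weight(factor, category):
    """Product of the local weights on the path from `factor` up to `category`.

    The weight of `category` itself is not included in the product.
    """
    weight = 1.0
    node = factor
    while node != category:
        if node not in PARENT:
            raise ValueError(f"{factor} is not a descendant of {category}")
        weight *= LOCAL_WEIGHT[node]
        node = PARENT[node]
    return weight


print(aggregated_weight("x1", "root"))  # weight of x1 times weight of c1
print(aggregated_weight("x1", "c1"))    # weight of x1 only
```

With sibling weights normalized to one, the aggregated weights of all factors below a category also sum to one, which keeps the efficiencies in different nodes on a common scale.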
We analyze the efficiency of
in each node of the hierarchy. For category
, such efficiency is defined as follows:
The true weights assigned to each hierarchy category
from the set of the indirect children of the analyzed category
are defined as the ratio of the sum of weights of indicators contained in this category and the sum of weights of indicators in the parent category:
Note that the value of weight is always the same, regardless of the considered category (), so we replace symbol with . For example, when considering the root category (), the weight of indicator in the considered example can be calculated as , whereas the weight of category is .
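The following Python sketch illustrates both ideas on hypothetical data. It assumes that the efficiency in a category is the sum of marginal values weighted by the aggregated weights, as in an additive value model, and it checks numerically that the ratio of summed aggregated weights recovers the weight of a node regardless of the reference category. The node names, weights, and marginal values are invented for illustration.

```python
# Hypothetical hierarchy, local weights, and marginal values in [0, 1].
PARENT = {"x1": "c1", "y1": "c1", "x2": "c2", "y2": "c2",
          "c1": "root", "c2": "root"}
LOCAL_WEIGHT = {"x1": 0.4, "y1": 0.6, "x2": 0.7, "y2": 0.3,
                "c1": 0.5, "c2": 0.5}
MARGINAL_VALUE = {
    "A": {"x1": 0.9, "y1": 0.5, "x2": 0.2, "y2": 0.8},
    "B": {"x1": 0.4, "y1": 0.7, "x2": 0.6, "y2": 0.6},
}


def leaves(node):
    """Elementary factors below `node` (the node itself if it is a leaf)."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def aggregated_weight(factor, category):
    """Product of local weights on the path from `factor` up to `category`."""
    weight, node = 1.0, factor
    while node != category:
        weight *= LOCAL_WEIGHT[node]
        node = PARENT[node]
    return weight


def efficiency(dmu, category):
    """Assumed additive efficiency of `dmu` restricted to `category`."""
    return sum(aggregated_weight(f, category) * MARGINAL_VALUE[dmu][f]
               for f in leaves(category))


def recovered_weight(node, reference="root"):
    """Ratio of summed aggregated weights: node versus its parent."""
    own = sum(aggregated_weight(f, reference) for f in leaves(node))
    parent = sum(aggregated_weight(f, reference) for f in leaves(PARENT[node]))
    return own / parent


print(efficiency("A", "c1"), efficiency("A", "root"))
print(recovered_weight("c1"), recovered_weight("x1"), recovered_weight("x1", "c1"))
```

In this sketch, the efficiency in the root equals the weighted sum of the efficiencies in the first-level categories, which is what makes node-by-node analysis consistent with the comprehensive result.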
4. Robustness Analysis for Additive Value-Based Efficiency Analysis with a Hierarchical Structure of Factors
The standard value-based efficiency model verifies if each DMU is efficient. Such an analysis builds on the weight vector that is the most advantageous for a given DMU, allowing it to minimize the distance from some efficient DMU. In this section, we introduce a suite of methods that investigate the robustness of efficiency outcomes given all feasible weights. They can be divided into two groups. First, the exact approaches use mathematical programming to find the extreme outcomes for each DMU. In turn, the probabilistic methods estimate the stochastic acceptability indices based on Monte Carlo simulations, reflecting the distributions of possible results. Each group concerns three relevant viewpoints: distances to the efficient unit, ranks, and pairwise preference relations. In what follows, we present the approaches that are flexible enough to determine the relevant results in each hierarchy node.
4.1. Exact Methods
In this section, we present the mathematical programming models that determine the exact robust results. These include extreme (the most and the least advantageous), necessary (observable for all feasible weight vectors), and possible (holding for at least one feasible weight vector) conclusions. Let us first focus on verifying the stability of distances to the efficient unit. The best (minimal) distance
for
considering category
can be computed by solving the following model:
s.t.
Similarly to the standard efficiency analysis, with is deemed efficient, given category , while implies inefficiency.
To compute the worst (maximal) distance
for
, given category
, we solve the following Mixed-Integer Linear Programming (MILP) model:
s.t.
where
C is a large positive constant. The above model uses binary variables
,
, to ensure that
is equal to the efficiency difference between
and some
,
, for which
. Maximizing
guarantees that we obtain the greatest possible difference observable in the set of feasible weights
. Note that when
, the respective constraint is satisfied for all possible variable values; hence, it is relaxed.
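To make the notion of extreme distances concrete, the sketch below evaluates the distance to the best unit for a fixed weight vector and then scans a grid of weights. It is not the LP or MILP model described above: it assumes a flat problem with one input and one output, weights that are nonnegative and sum to one, no further weight restrictions, and hypothetical marginal values. The grid scan only approximates the exact bounds that a solver would return.

```python
# Hypothetical marginal values (u_input, u_output) of three DMUs.
MARGINAL_VALUE = {"A": (1.0, 0.2), "B": (0.3, 0.9), "C": (0.5, 0.5)}


def efficiency(dmu, w):
    """Additive efficiency for input weight `w` and output weight 1 - w."""
    u_in, u_out = MARGINAL_VALUE[dmu]
    return w * u_in + (1.0 - w) * u_out


def distance(dmu, w):
    """Gap between the best efficiency and the efficiency of `dmu`."""
    best = max(efficiency(k, w) for k in MARGINAL_VALUE)
    return best - efficiency(dmu, w)


def extreme_distances(dmu, steps=100):
    """Approximate minimal and maximal distance over a uniform weight grid."""
    values = [distance(dmu, i / steps) for i in range(steps + 1)]
    return min(values), max(values)


for name in MARGINAL_VALUE:
    print(name, extreme_distances(name))
```

In this toy instance, A and B reach a minimal distance of zero and are therefore efficient, whereas C stays below the frontier for every weight on the grid.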
The second perspective concerns the bounds of efficiency ranks attained by
. To find the best (minimal) rank
of
, given category
, we minimize the number of other DMUs with greater efficiencies than
:
s.t.
Note that when , , the respective constraint ensures that is ranked not worse than since . When , is ranked better than , deteriorating its best rank by one.
To obtain the worst (maximal) rank
for
, given category
, we maximize the number of DMUs with the efficiencies not worse than
:
s.t.
Note that when , , the respective constraint ensures that is ranked no worse than since . This deteriorates the worst rank of by one. When , the respective constraint is satisfied for all variable values; hence, it is relaxed.
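The counting rules behind the two rank models can be illustrated for a single, fixed weight vector. The sketch below is an assumption-laden simplification: it takes already computed efficiencies (hypothetical numbers) and shows how ties are treated differently in the optimistic and pessimistic case. The models above optimize these counts over all feasible weights, which the sketch does not attempt.

```python
def best_rank(efficiencies, dmu):
    """One plus the number of other DMUs with strictly greater efficiency."""
    own = efficiencies[dmu]
    return 1 + sum(1 for k, e in efficiencies.items() if k != dmu and e > own)


def worst_rank(efficiencies, dmu):
    """One plus the number of other DMUs with efficiency not worse."""
    own = efficiencies[dmu]
    return 1 + sum(1 for k, e in efficiencies.items() if k != dmu and e >= own)


# Hypothetical efficiencies for one weight vector; A and B are tied.
scores = {"A": 0.8, "B": 0.8, "C": 0.5}
for name in scores:
    print(name, best_rank(scores, name), worst_rank(scores, name))
```

A tie is resolved in favor of the analyzed unit when computing the best rank and against it when computing the worst rank, so the two values differ only when equal efficiencies occur.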
The third perspective focuses on the pairwise comparisons between DMUs using two relations: necessary and possible. Given the uncertainty of selecting a specific weight vector, the necessary relation can be considered robust. Specifically,
is necessarily preferred to
, given category
(
), when
is not worse at level
in terms of efficiency than
for all feasible weight vectors. Its truth for pair
and category
can be verified using the following model:
Its optimal solution is equal to the minimal difference between efficiencies of and observable in the set of feasible weights , given category . If , then for all feasible weights , and hence . Otherwise, because there is at least one feasible weight vector, such that .
Furthermore,
is possibly preferred to
, given category
(
), when
is not worse at level
in terms of efficiency than
for at least one feasible weight vector. Its truth for pair
and category
is verified using the following model:
Its optimal solution is equal to the maximal difference between efficiencies of and observable in the set of feasible weights , given category . If , then for at least one feasible weight , and hence, . Otherwise, because there is no feasible weight vector, such that .
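In the simplest setting, both relations can be verified without a solver. The sketch below assumes a flat set of factors, weights that are nonnegative and sum to one, and no additional weight restrictions. Under these assumptions the efficiency difference is linear in the weights, so its extremes over the weight simplex are attained at the vertices, i.e., at the per-factor differences of marginal values. With hierarchical weights or extra linear restrictions, a linear programming solver would be needed instead. The marginal values are hypothetical.

```python
def difference_bounds(values_o, values_k):
    """Minimal and maximal efficiency difference over the weight simplex."""
    diffs = [values_o[f] - values_k[f] for f in values_o]
    return min(diffs), max(diffs)


def necessarily_preferred(values_o, values_k):
    """True if the first DMU is not worse for all weight vectors."""
    return difference_bounds(values_o, values_k)[0] >= 0.0


def possibly_preferred(values_o, values_k):
    """True if the first DMU is not worse for at least one weight vector."""
    return difference_bounds(values_o, values_k)[1] >= 0.0


# Hypothetical marginal values of three DMUs on two factors.
A = {"x": 0.9, "y": 0.6}
B = {"x": 0.5, "y": 0.6}
C = {"x": 0.2, "y": 0.9}
print(necessarily_preferred(A, B), possibly_preferred(B, A))
print(necessarily_preferred(A, C), possibly_preferred(A, C))
```

The necessary relation implies the possible one, and two units can be possibly preferred to each other at the same time, as A and C are here.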
The relevant properties of the exact robust results given the hierarchical structure are presented in
Appendix A. The formulations of example mathematical programming models that support understanding the general formulations are given in
Appendix B.
4.2. Simulation-Based Methods
The results determined with mathematical programming are often insufficiently conclusive. In particular, the difference between extreme distances or ranks may be significant, the necessary relation may be poor, and the possible relation may be very rich. If so, it would be helpful to determine the distribution of results observed for the set of feasible weight vectors. Unfortunately, such a distribution cannot be computed exactly. However, using Monte Carlo simulations, we can estimate the share of feasible weight space confirming a particular outcome. Specifically, we use the Hit-And-Run algorithm to generate a predefined number of weight vector samples [
27]. We generate weights for all categories and factors while respecting that the sum of weights of categories or factors with the same parent must be equal to one. In the example problem, the sum of the weights of two categories
and
must be equal to one (
, and the sums of weights assigned to the elementary indicators in the same category also need to be one (
and
). Moreover, we obey the provided weight constraints for all hierarchy levels. After generating a predefined number of weight samples, we compute the efficiencies of all DMUs for each. This lets us calculate the relevant stochastic acceptability indices estimating the respective shares of feasible weight vectors.
In what follows, when considering category and referring to weight vectors, we mean the weights assigned to all categories and factors that are direct or indirect children of in the hierarchy. The most interesting stochastic acceptabilities are defined as follows:
Distance Acceptability Interval Index () for unit , interval , and category is the share of feasible weight vectors for which belongs to . Note that all intervals must be disjoint (), and their sum must cover the space of possible distances (; z—the number of intervals).
Efficiency Rank Acceptability Index () for unit and rank r is the share of feasible weight vectors for which attains r-th position in the efficiency ranking of all DMUs given category .
Pairwise Efficiency Outranking Index () for pair , and category is the share of feasible weight vectors for which is at least as efficient as at level , i.e., .
Moreover, we compute the expected distance
to the efficient unit and expected rank
for each DMU [
28]. This is done by averaging the distances or ranks observed over all samples. By default, we sample weights from a uniform distribution. The weights can also be generated from any other predefined distribution, although specifying one is difficult because it requires in-depth knowledge of the application domain.
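The estimation of rank acceptabilities, pairwise outranking indices, and expected ranks can be sketched as follows. Note the substitution: instead of Hit-And-Run, the sketch draws the weights of each sibling group uniformly from its simplex by normalizing exponential variates, which is adequate only when there are no additional weight restrictions. The hierarchy, marginal values, sample size, and seed are hypothetical, and only the root category is evaluated.

```python
import random

# Hypothetical hierarchy and marginal values of three DMUs.
PARENT = {"x1": "c1", "y1": "c1", "x2": "c2", "y2": "c2",
          "c1": "root", "c2": "root"}
FACTORS = ["x1", "y1", "x2", "y2"]
MARGINAL_VALUE = {
    "A": {"x1": 0.9, "y1": 0.5, "x2": 0.2, "y2": 0.8},
    "B": {"x1": 0.4, "y1": 0.7, "x2": 0.6, "y2": 0.6},
    "C": {"x1": 0.3, "y1": 0.4, "x2": 0.1, "y2": 0.5},
}


def sample_local_weights(rng):
    """Sibling weights drawn uniformly from each simplex (sum to one)."""
    weights = {}
    for parent in sorted(set(PARENT.values())):
        siblings = sorted(n for n, p in PARENT.items() if p == parent)
        draws = [rng.expovariate(1.0) for _ in siblings]
        total = sum(draws)
        weights.update({n: d / total for n, d in zip(siblings, draws)})
    return weights


def root_efficiency(dmu, weights):
    """Additive efficiency in the root using path products of weights."""
    total = 0.0
    for f in FACTORS:
        w, node = 1.0, f
        while node in PARENT:
            w *= weights[node]
            node = PARENT[node]
        total += w * MARGINAL_VALUE[dmu][f]
    return total


def simulate(n_samples=2000, seed=0):
    """Estimate rank acceptabilities, outranking indices, expected ranks."""
    rng = random.Random(seed)
    dmus = sorted(MARGINAL_VALUE)
    rank_hits = {d: [0] * len(dmus) for d in dmus}
    outranks = {(a, b): 0 for a in dmus for b in dmus if a != b}
    for _ in range(n_samples):
        weights = sample_local_weights(rng)
        eff = {d: root_efficiency(d, weights) for d in dmus}
        for d in dmus:
            rank = 1 + sum(1 for k in dmus if eff[k] > eff[d])
            rank_hits[d][rank - 1] += 1
        for a, b in outranks:
            outranks[(a, b)] += eff[a] >= eff[b]
    erai = {d: [h / n_samples for h in hits] for d, hits in rank_hits.items()}
    peoi = {pair: hits / n_samples for pair, hits in outranks.items()}
    expected = {d: sum((r + 1) * s for r, s in enumerate(erai[d])) for d in dmus}
    return erai, peoi, expected


erai, peoi, expected_rank = simulate()
print(erai)
print(expected_rank)
```

In this toy data set, C is dominated by both other units on every factor, so all samples place it last, while the shares for A and B depend on the drawn weights.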
In
Appendix C, we illustrate the process of computing the stochastic results on a small sample of weight vectors.
5. Case Study concerning Evaluation of Healthcare System in Poland
This section reports the results of a case study concerning an assessment of the quality of the healthcare system in Poland. This sector faces the challenge of improving the quality of provided services. This can be attained by advancing some indicators reflecting both the system’s functioning and the perception by patients. We consider sixteen voivodeships (provinces) in Poland as DMUs (see
Table 1). These administrative areas govern their healthcare independently, so it makes sense to highlight their differences using a uniformly computed set of indicators. Such an evaluation is critical, given the rapid development of new technologies and major transformations in the healthcare sector.
Following [
29], we consider three main categories of factors representing desirable characteristics of the complex healthcare system. Two areas, health improvement and financial management, are based on objective indicators and parameters, whereas the system’s evaluation by patients is, to some extent, subjective. In particular, improving health is the ultimate aim of the healthcare system. In this regard, it is relevant to consider example dimensions of the health status that are affected by how the system is operated, given the ever-growing needs of patients. Financial management is essential in healthcare, as this sector operates with limited resources. Hence, it is vital to assess the financial situation of medical facilities, the management of infrastructure, and the economic efficiency of treatments and therapies. Finally, consumer satisfaction is becoming increasingly important in evaluating the healthcare sector. Thus, it is desirable to consider the quality of services, patients’ comfort in using the services, and patient rights.
The hierarchy of inputs and outputs for the case study is presented in
Figure 4. Among the nine factors, there are six outputs (
,
,
,
,
, and
) and three inputs (
,
, and
). The selected indicators are representative of the three dimensions and the viewpoints of the most important stakeholders. The indicators in the health improvement category (
H) are representative of the dimensions of preventing diseases (
), their exacerbation (
), and deaths (
). The factors considered in the financial management category stand for the financial situation of healthcare units (
) and infrastructure management (
and
). The inputs and outputs in the system’s evaluation category (
S) represent the waiting time (
), official quality system (
), and patient satisfaction (
). Moreover, we verified that the trends that these indicators confirm also represent other factors that could be considered in the three categories. Note that an analysis including all of the over 40 indicators available for assessing the healthcare system in Poland [
29] would not make much sense in the context of DEA, as the number of inputs and outputs would be too large compared to the number of DMUs. Typically, such analyses indicate that all or almost all units are efficient, as even the worst performers tend to specialize in some particular aspects. Hence, we opted for an analysis with a reduced—though carefully selected—set of indicators. The performances of the sixteen voivodeships in terms of nine considered factors are given in
Table 1.
For all factors, we elicited marginal value functions from experts in the healthcare system in Poland. They are provided in
Figure 5. They are decreasing for inputs and increasing for outputs. Moreover, they differ in shape. The function is, e.g., close to linear for
, convex for
, concave for
, and S-shaped for
. Moreover, we incorporated the relative and absolute weight constraints. The category of
inhabitants’ health improvement is more important than the two other categories. Hence, we introduced the following constraints:
and
, where
,
, and
are the weights of the three categories. Finally, we wanted to avoid both the minor and dominating roles of any individual factor or category in the analysis. Hence, we restricted the weight of each category to be not less than
and the weights of second-level elementary factors to the interval
. In what follows, we discuss the results attained in the root hierarchy level and for each of the three categories separately.
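A marginal value function of the kind described above can be sketched as a piecewise-linear interpolation between breakpoints. This representation is an assumption made for illustration: the text does not state how the elicited functions are encoded, and the breakpoints below are hypothetical rather than taken from Figure 5. The sketch shows a decreasing function for an input and an increasing, concave function for an output.

```python
def piecewise_linear(breakpoints):
    """Return u(x) interpolating linearly between (x, value) breakpoints.

    Breakpoints must have distinct x coordinates. Arguments outside the
    covered range are clamped to the value at the nearest end.
    """
    points = sorted(breakpoints)

    def u(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (x0, v0), (x1, v1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return v0 + (v1 - v0) * (x - x0) / (x1 - x0)

    return u


# Hypothetical input: lower consumption is better, so the value decreases.
u_input = piecewise_linear([(10.0, 1.0), (30.0, 0.0)])
# Hypothetical output: increasing and concave (diminishing gains).
u_output = piecewise_linear([(0.0, 0.0), (50.0, 0.8), (100.0, 1.0)])

print(u_input(20.0), u_output(25.0), u_output(75.0))
```

Because every marginal function maps raw performances to the same value scale, factors expressed in different units become directly comparable before the weights are applied.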
5.1. Comprehensive Evaluation of the Quality of Healthcare Systems
In this section, we discuss the results of the comprehensive assessment of Polish voivodeships, taking into account all nine indicators.
Figure 6 presents the extreme and expected distances to the best unit for each analyzed province. Three voivodeships are efficient: POM, LBU, and WLKP. POM also attains the lowest maximal distance (
), which confirms its most favorable evaluation of the healthcare system for all feasible weights. Moreover, the distances for POM are the most stable, as it is characterized by the narrowest range (
). Among the efficient provinces, WLKP has the worst pessimistic distance to the best province. However, its expected distance is better than that of LBU, meaning that for some weight vectors, WLKP performs worse than LBU, but its efficiency score is closer to the best province on average. The worst provinces in the most and the least favorable scenarios are DSL (
,
) and SL (
,
). The greatest sensitivity of the distances depending on the selected weight vector is observed for SW, as the width of its distance interval equals
.
The analysis of extreme distances can be enriched with the distribution of distances over all feasible weight vectors (see
Table 2). The efficient provinces (POM, LBU, and WLKP) are the only ones whose distances were not greater than
for some samples. When considering these three voivodeships, only for LBU, the distance was greater than
for some marginal share of weights (
). Hence, these provinces are robustly better than the remaining ones. Among the inefficient units, the most favorable results were attained by ZPM and KP.
Some provinces are characterized by rather stable distance values. For example, for MAZ and WM, for over
of weight vectors, the distance is the interval
. On the contrary, the distance for PKR varies more depending on the chosen weight vector. In this case, positive
s were observed for all buckets between
and
with the greatest values for the intervals
(
) and
(
). The complete ranking determined by the expected distances (see
Figure 6) indicates POM (
) and WLKP (
) as the best units and DSL (
) and OPO (
) as the worst.
The results of robustness analysis for efficiency ranks are presented in
Figure 7 and
Table 3. The three efficient voivodeships attain the first rank in the most favorable scenario. POM and LBU are ranked third in their worst scenario, while WLKP falls fourth in the pessimistic case. These units are also the best, given their expected ranks. In this regard, POM (
) is followed by WLKP (
) and LBU (
). Among the inefficient units, KP is the most advantageous when considering the most favorable ranks (
). Other inefficient units ranked relatively high in their best scenario are ZPM, WM, LBL, LDZ, SW, MLP, and PKR (
). However, all inefficient provinces are ranked low in the least favorable scenario. The best maximal rank among them is observed for KP and LBL (
), while five provinces can be ranked at the bottom (PDL, DSL, OPO, SL, and PKR). Finally, the best expected ranks among inefficient units are attained by ZPM (
), KP (
), and SW (
), while the worst expected positions are associated with DSL (
) and OPO (
).
The analysis of efficiency rank acceptability indices (see
Table 3) confirms the superiority of POM over other provinces. It is ranked first for most feasible weight vectors (
), and it is in the top two for almost
of samples. Similarly, WLKP is at least second for over
of samples, though its most frequent position is second rather than first. In turn, LBU is ranked third for most scenarios (
). Even though the best possible rank for KP is second, such a position was not observed for any sampled weight vector. The highest rank for which the ERAI of KP is positive is fourth (
). However, this place is predominantly occupied by ZPM, which is ranked in the interval
for over
of feasible scenarios.
Table 4 presents the results of exact robustness analysis from the perspective of pairwise comparisons. For clarity of presentation, the necessary relation is also presented in the form of the Hasse diagram in
Figure 8. For efficient provinces (POM, WLKP, and LBU), no other unit is necessarily preferred to them. As expected, these provinces are necessarily preferred to the greatest number of other units. POM is robustly better than all thirteen inefficient provinces, and WLKP and LBU prove their superiority over all units but KP. Among the inefficient provinces, KP is necessarily preferred to the highest number (4) of other provinces. Eight voivodeships are not robustly as good as any other unit. The least favorable among them is DSL, which is not even possibly preferred to eight other provinces.
For pairs of voivodeships that are not related by the necessary preference, it is worth analyzing the pairwise efficiency outranking indices (see
Table 5). For some, one province proves better for most scenarios (see, e.g., ZPM and PKR with
or MAZ and MLP with
). For other pairs, the shares of feasible scenarios confirming the preference in both directions are more balanced (see, e.g., MAZ and LBL with
and
, or PKR and SL with
and
).
5.2. The Category of Inhabitants’ Health Improvement
In this section, we focus on the results attained when considering only inputs and outputs from the inhabitants’ health improvement category. In addition, we emphasize the differences with respect to the comprehensive level.
Figure 9 presents the extreme and expected distances to the best unit for all considered voivodeships. WLKP is the only efficient province given this category, so its distance to the best unit always equals zero. Hence, POM and LBU lose the status of efficient units. However, these two voivodeships and ZPM have relatively low distances to the best one: ZPM (
), POM (
), and LBU (
). In general, the distance intervals are notably narrower than when considering all relevant factors jointly. The distances are also more diverse across provinces, encompassing a greater range. In particular, seven provinces have maximal distances greater than
, with three even exceeding the threshold of
. At the comprehensive level, this was not observed for any voivodeship.
Among the inefficient provinces, ZPM can be considered the best, as it has the lowest distances to the best unit in both optimistic and pessimistic settings. Moreover, its expected distance () is half of that at the comprehensive level, letting it overtake POM and LBU, which were judged efficient in the hierarchy’s root. The other twelve inefficient provinces are significantly worse. The least favorable among them is OPO, with the distance to the best unit in its optimistic scenario equal to and an expected distance of . In the average case, OPO is directly preceded by MLP () and PKR (). Note that at the comprehensive level, the worst maximal distances were attained by SL, DSL, and PDL. They all prove slightly better regarding inhabitants’ health improvement results.
Table 6 presents the distribution of distances for all voivodeships when considering the inhabitants’ health improvement level. The only two provinces for which this distance is lower than
are WLKP (
) and ZPM (
). Such a favorable result was not attained by ZPM at the comprehensive level for any feasible weight vector. There, its distance could even exceed
. For the two units mentioned above, as well as POM and LBU, all samples confirm distances not higher than
. Furthermore, for all provinces, we can indicate a single bucket in which the unit’s distance falls for most samples. For example, it is
for LBU,
for MAZ, and
for OPO. For these three provinces, the predominating distance buckets at the comprehensive level were better. However, for other voivodeships, including SW, MLP, and PKR, the most often repeated distance range worsened when limiting the analysis to the inhabitants’ health improvement level.
The extreme and expected efficiency ranks at the inhabitants’ health improvement level are provided in
Figure 10. WLKP, as the only efficient province, is always ranked first. Both ZPM and POM are ranked between second and fourth, while for LBU, this range is slightly wider (
). Conversely, only two voivodeships (OPO and PKR) can fall to the bottom of the ranking, and three others (WM, DSL, and MLP) are ranked fifteenth in the least favorable scenario. Finally, for DSL and MAZ, the difference between their extreme ranks is the greatest. For example, DSL is ranked between sixth and fifteenth. According to the expected ranks, WLKP (
), ZPM (
), POM (
), and LBU (
) are the best, and PKR (
), MLP (
), and OPO (
) are the worst.
When compared to the comprehensive level, the greatest improvement in the attained ranks can be observed for ZPM ( rather than ) and SL ( rather than ). Conversely, the greatest deterioration of possible positions is noted for OPO ( rather than ) and MLP ( rather than ). For many provinces, including WM, POL, MAZ, LBL, LDZ, and SW, the ranking intervals became significantly narrower, confirming the lower diversity of results when limiting the scope of the analysis to health improvement.
The distribution of efficiency ranks given the inhabitants’ health improvement level is presented in Table 7. The most stable individual positions were observed for WLKP (1–), SL (5–), OPO (16–), and ZPM (2–). Such high acceptabilities were not observed at the comprehensive level for any position and unit. There, the greatest share of weights () supported the third position of LBU. Returning to the health improvement category, for some other voivodeships, the vast majority of weights indicate a pair of ranks (e.g., MLP and PKR are ranked 14th or 15th for or samples, respectively). Finally, the ranks of some units are more dependent on the chosen weight vector. For example, KP attained positions between 8th and 12th, with no exceeding . Similarly, SW is ranked within the range with s not less than and not greater than . Still, these outcomes exhibit less diversity than at the comprehensive level.
The graph of the necessary relation at the inhabitants’ health improvement level is shown in Figure 11. The robust conclusions are richer than at the comprehensive level. For example, the number of pairs of different provinces that are related by the necessary preference increased from 47 to 85. Moreover, the number of levels in the respective Hasse diagram increased from 3 to 7.
In particular, WLKP is necessarily preferred to all other provinces, and four other voivodeships (POM, ZPM, LBU, and SL) prove to be necessarily better than the remaining eleven units. The worst unit is OPO, which is not necessarily preferred to any other province while being possibly preferred only to PKR. Further, MLP and PKR are necessarily worse than 10 and 9 other provinces, respectively. Finally, SL benefited the most from limiting the scope to the health improvement level, because in the hierarchy’s root, it was not robustly better than any other voivodeship.
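The two summary statistics used above, the number of related pairs and the number of levels in the Hasse diagram, can be read off a necessary relation once it is available. The sketch below assumes the relation is given as a set of ordered pairs forming a strict partial order (irreflexive and transitive). Computing the relation itself requires the mathematical programming models and is not shown. The function names and the four-unit example are hypothetical.

```python
def hasse_edges(relation):
    """Return the covering pairs of a strict partial order.

    relation is a set of (a, b) pairs meaning 'a is necessarily preferred to b'.
    A pair is kept only if no third unit c lies strictly between a and b.
    """
    nodes = {x for pair in relation for x in pair}
    return {
        (a, b)
        for a, b in relation
        if not any((a, c) in relation and (c, b) in relation for c in nodes)
    }


def hasse_levels(relation, nodes):
    """Assign level 1 to units with no unit necessarily preferred to them,
    level 2 to units that become such once level 1 is removed, and so on."""
    level = {}
    remaining = set(nodes)
    current = 1
    while remaining:
        top = {u for u in remaining
               if not any((v, u) in relation for v in remaining)}
        if not top:
            raise ValueError("relation contains a cycle; a strict order is assumed")
        for u in top:
            level[u] = current
        remaining -= top
        current += 1
    return level


# Hypothetical necessary relation on four units.
necessary = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}
edges = hasse_edges(necessary)
levels = hasse_levels(necessary, {"A", "B", "C", "D"})
n_pairs = len(necessary)
n_levels = max(levels.values())
```

Here the pair (A, D) is implied by transitivity, so it is counted among the related pairs but is not drawn as an edge of the diagram.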
The respective s are given in Table 8. As at the comprehensive level, for some pairs, one unit is significantly better than the other (see, e.g., ZPM and POM with or LBU and SL with ). In turn, other pairs are characterized by more balanced stochastic acceptabilities, indicating no clear advantage of either voivodeship (see, e.g., PKR and MLP with or KP and SW with ). However, the absolute values of s for some pairs differ vastly compared to the hierarchy’s root. For example, POM is necessarily better than ZPM at the comprehensive level, but when considering only health improvement, ZPM attains no worse efficiency for of feasible weights.
5.3. The Category of Financial Management
In this section, we discuss the results attained at the level of financial management. Instead of comparing them to the outcomes at the comprehensive level, we emphasize how managers can use them to improve the relative efficiency of provinces. Similar improvement strategies can be designed for other categories or hierarchy nodes.
Figure 12 presents the extreme and expected distances to the best province when limiting the scope to financial management. The most important result derived from their analysis is the division of provinces into efficient and inefficient. The minimal distance equals zero only for two units: POM and LBU. This means that they are the best performers among the sixteen voivodeships for at least one feasible weight vector. However, LBU has slightly better maximal and expected distances to the best province than POM. When looking at their inputs and outputs within the financial category, they perform equally well () on output . It is the best value among all provinces. Moreover, the other two financial factors are greater for LBU, confirming that it transforms more beds (input ) into a greater profit (output ). These two voivodeships should serve as ultimate peers for the remaining inefficient units in terms of financial management.
Among them, the overall good performers are WLKP () and MLP (). They do not optimize one specific input or output, but perform decently on all indicators. In the optimistic, pessimistic, and expected scenarios, SL attains the least favorable results, with by far the greatest distances to the best province. In turn, KP has the broadest range of distances (), confirming its performance’s sensitivity to the selection of particular priorities. This results from an imbalanced performance profile with a highly favorable value on output and a relatively poor value on output . Finally, when a complete order is desired, it can be imposed by the expected distances. In this case, LBU () and POM () are safely ranked at the top, and DSL () and SL () are ranked at the bottom.
The distance distribution at the financial management level is presented in Table 9. For both efficient provinces (POM and LBU), the distance is always within the first bucket (). Among the inefficient provinces, WLKP confirms its superiority over the remaining units, as for over samples, its distance from the efficient unit is not greater than . The only two other provinces with positive s for bucket are KP () and MLP (). The greatest stability of distances among inefficient provinces is observed for ZPM with . There are only two provinces, PDL and KP, for which no single distance bucket is confirmed by most samples. Furthermore, KP is the only unit with positive s for more than three buckets ().
The results of the robustness analysis for efficiency ranks at the financial management level are presented in Figure 13 (extreme and expected positions) and Table 10 (efficiency rank acceptability indices). They provide additional insights into the comparisons of efficient provinces. Specifically, although both LBU and POM can be ranked first for some weight vectors, the former is ranked first almost twice as often as the latter. This makes it the most favorable province regarding financial management, as additionally confirmed by the expected ranks ( vs. ). The remaining provinces are at most third in the best case. Again, WLKP proves to be the most advantageous among them, with . Further, KP has the broadest range of efficiency positions, being ranked between third and eleventh. s confirm that its ranks are fairly evenly distributed between these extreme positions, with the maximal value for the fifth rank () and acceptabilities greater than for all ranks within the range . Such great diversity is a consequence of its extreme performances. In fact, it is the best among all provinces on while being in the bottom three on . Thus, KP is focused on the profit attained by healthcare institutions rather than on the number of treated patients. This aspect needs to be improved when aiming for higher ranks.
There are three other provinces with relatively wide possible efficiency rank intervals: ZPM (), PDL (), and OPO (). Among them, only ZPM attains a single rank for most samples (). Finally, SL is ranked at the bottom regardless of the weight vector. This is related to its having the greatest value on input , the lowest value on output , and a relatively low value on output . Thus, even if the financial input of SL is the greatest, its outputs are less favorable than for provinces with fewer financial resources. The complete ranking established with the expected efficiency ranks aligns with the one based on expected distances, with LBU, POM, WLKP, and MLP being ranked among the best provinces and SL, DSL, and PKR placed at the bottom.
Figure 14 presents the necessary ranking at the financial management level. POM and LBU are necessarily preferred to all other provinces, confirming their superiority given the financial category. The second level includes KP, WLKP, and MLP, and the lowest one contains PKR, DSL, and SL. They are possibly preferred to 4, 3, and 0 other units, respectively. Moreover, SL is necessarily worse than all other voivodeships.
The pairwise comparisons are useful for analysts particularly familiar with some provinces. For example, the authorities of OPO can compare it with other provinces when searching for possible improvements. They can note that OPO is robustly worse than PDL or LBU and robustly better than SL. Notably, the necessary ranking (see Figure 14) is a good starting point to find the improvement paths for provinces. OPO has multiple paths to achieve efficiency. For example, it can take PDL as the first benchmark, then follow WLKP, and finally refer to POM or LBU. An alternative improvement path runs through KP and POM.
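Improvement paths of this kind correspond to upward walks in the Hasse diagram, from an inefficient unit to an efficient one. The sketch below enumerates them by depth-first search. The function name is hypothetical, and the graph encodes only the benchmark steps named in the text above. It is an assumed fragment and not a reproduction of the complete diagram in Figure 14.

```python
def improvement_paths(benchmarks, start, targets):
    """List all upward paths from `start` to any unit in `targets`.

    benchmarks[u] holds the units placed directly above u in the
    (acyclic) Hasse diagram, i.e., its immediate benchmarks.
    """
    paths = []

    def walk(unit, path):
        if unit in targets:
            paths.append(path)
            return
        for nxt in benchmarks.get(unit, []):
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Assumed fragment: only the steps mentioned in the text for OPO.
benchmarks = {
    "OPO": ["PDL", "KP"],
    "PDL": ["WLKP"],
    "WLKP": ["POM", "LBU"],
    "KP": ["POM"],
}
paths = improvement_paths(benchmarks, "OPO", {"POM", "LBU"})
for p in paths:
    print(" -> ".join(p))
```

Each listed path is a sequence of intermediate peers, so a province can target the nearest benchmark first instead of comparing itself directly with an efficient unit.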
The analysis of s (see Table 11) is helpful for pairs related by mutual possible preference. For some, one province attains greater efficiency more often (e.g., ). For other pairs, indicating a better voivodeship is more challenging, as the shares of weights confirming the advantage of either unit are similar. An example of such a pair is LDZ and KP with and .
5.4. The Category of Consumer Satisfaction
In this section, we present the results of provinces’ performance analysis at the level of consumer satisfaction with the healthcare system. To save space, we refer only to the exact and expected results without reporting the complete tables with stochastic acceptabilities.
Figure 15 presents the extreme and expected distances when considering the satisfaction of consumers. There are five efficient provinces with : POM, KP, LBL, SW, and PKR. This is the greatest number for any node in the considered hierarchy. Among them, two provinces, LBL and PKR, have particularly narrow distance intervals, being close to the efficient units regardless of the weight vector ( for LBL and for PKR). They are also the best regarding the expected distance ( for PKR and for LBL). In turn, the maximal (worst) possible distances for SW () and POM () are vastly greater, emphasizing their sensitivity to the selection of a particular weight vector. Among the inefficient units, the best minimal (optimistic) distance is achieved by OPO (), and the best maximal (pessimistic) distance is attained by WLKP (). The three provinces that are the worst considering both minimal and maximal distance are PDL (, ), DSL (, ), and ZPM (, ). Their poor performance is also reflected in the bottom ranks according to the expected distances.
The extreme and expected efficiency ranks in consumer satisfaction are presented in Figure 16. Among the five efficient provinces, PKR has the narrowest rank interval; in the least favorable scenario, it is ranked third. On the contrary, even if POM is efficient, it can drop even to the tenth position, making its performance the least stable among all voivodeships. LBU is the best inefficient province when it comes to the best rank (), and the best-ranked inefficient units in the pessimistic settings are WM, LBU, and OPO (). The worst provinces considering both the minimal and maximal ranks are ZPM (), PDL (), and DSL (). Notably, their rank intervals are very narrow. As far as the expected ranks are concerned, the best province is PKR (), followed by LBL () and KP (). On the other extreme, PDL (), DSL (), and ZPM () are the least favorable.
The necessary efficiency preference relation is presented as the Hasse diagram in Figure 17. The provinces form a multiple-level structure, with efficient ones (POM, PKR, LBL, KP, and SW) at the top. Among them, PKR and LBL are necessarily preferred to the largest number of other provinces (eleven), while POM proves robustly better than only six other provinces. Five other voivodeships (WM, LBU, OPO, WLKP, and MLP) are placed in the second level. All of them are necessarily preferred to four or five other provinces. Finally, DSL and ZPM are confirmed as the worst provinces, as they are not necessarily preferred to any other province. In fact, DSL is possibly preferred to only two other units (ZPM and PDL), while ZPM is at least as efficient for at least one weight vector only when compared to DSL.
5.5. Complete Efficiency Rankings of Voivodeships
The expected distances () and ranks () allow for the construction of a complete ranking of voivodeships. In this section, we compare such orders with the ones obtained with the most commonly used ranking procedures for DEA, i.e., Cross-efficiency (CE) [30] and Super-efficiency (SE) [31]. We adapted them to a value-based additive efficiency model and ran them on each category of indicators.
The rankings generated by all four procedures for the level of inhabitants’ health improvement are provided in Table 12. To quantify the agreement level of such rankings for all hierarchy nodes, we used Kendall’s τ coefficient [28,32]. Its values are shown in Table 13. Note that −1 means the rankings are inverse, whereas 1 denotes a pair of the same rankings.
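For reference, the agreement between two complete rankings without ties can be computed by counting concordant and discordant pairs of units. The sketch below implements this basic variant of Kendall's coefficient. The exact variant used in the paper is not stated in this section, and the function name and four-unit rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's rank correlation for two rankings of the same units.

    rank_a and rank_b map each unit to its position (1 = best).
    Ties, if any, count as neither concordant nor discordant.
    """
    units = list(rank_a)
    concordant = discordant = 0
    for u, v in combinations(units, 2):
        sign = (rank_a[u] - rank_a[v]) * (rank_b[u] - rank_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(units) * (len(units) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four units.
base = {"A": 1, "B": 2, "C": 3, "D": 4}
swapped = {"A": 2, "B": 1, "C": 3, "D": 4}   # one adjacent swap
reverse = {"A": 4, "B": 3, "C": 2, "D": 1}

tau_same = kendall_tau(base, base)
tau_swap = kendall_tau(base, swapped)
tau_reverse = kendall_tau(base, reverse)
```

With four units there are six pairs, so a single adjacent swap leaves five concordant pairs and one discordant pair, giving (5 − 1)/6 ≈ 0.67.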
All four procedures provide highly correlated rankings. The two methods proposed in this paper ( and ) offer the most similar orders of provinces. The rankings constructed with these two procedures are the same for the comprehensive analysis of the healthcare systems and the consumer satisfaction category. For the remaining two categories, Kendall’s coefficient equals .
The similarity between the rankings based on , , and CE is between for the comprehensive analysis and for health improvement and financial management perspectives. The average measure value is . When comparing the orders imposed by SE and the two robustness-based methods, Kendall’s τ is between and () for (), with an average value of (). Given that the maximal possible value of Kendall’s τ is 1, the observed similarity levels are very high.
5.6. Discussion
Three provinces, namely POM, WLKP, and LBU, prove to be the best in the comprehensive analysis of the healthcare system in Poland. However, the results for the three subcategories differ. This section discusses the conclusions that can be derived from the cross-category analysis, indicating various provinces’ strong and weak points. For illustrative purposes, we refer to three example voivodeships representing the top (WLKP), middle (PKR), and bottom (OPO) performers in the hierarchy’s root.
WLKP proves to be the best at the inhabitants’ health improvement level; at the financial management level, POM and LBU are necessarily preferred to it; and from the consumer satisfaction perspective, its ranks are between fifth and twelfth. This suggests that despite the decent quality of medical services, there is room for improvement in patient satisfaction and financial management. In particular, managers can conduct some training in soft skills for medical staff to improve consumer assessment of the healthcare system.
PKR is the best province regarding consumer satisfaction, as confirmed by its favorable expected distance and rank. However, when considering the comprehensive results and conclusions drawn for the remaining two categories, it performs poorly. Its expected rank is greater than 13 for inhabitants’ health improvement and financial management categories, while in the hierarchy’s root, its average rank is . Hence, the healthcare system managers in this province should focus on improvements in medical decisions and financial management. In turn, other provinces can consider the healthcare system in PKR as the benchmark of proper communication with the consumers.
Such a cross-category analysis can serve as the basis for designing an improvement plan for provinces that proved to be relatively bad in the comprehensive analysis. In particular, the analysis of OPO’s poor performance points to the inhabitants’ health improvement category and financial management. It is the worst province for the former perspective for of weight vectors, and its expected rank is only for the latter category. Hence, managers should first focus on improving inhabitants’ health, due to the high importance of this category in the analysis. Then, they should design a plan for advancing financial management.
6. Conclusions
This paper introduces a novel framework for robustness analysis in the context of additive value-based efficiency analysis with a hierarchical structure. It admits a multiple-layer organization of relevant factors from the most general to the most detailed ones while tolerating both inputs and outputs in the same node. We accept the linear weight restrictions concerning the importance and trade-offs between various factors or subcategories with the common predecessor in the hierarchy. The results can be considered in each hierarchy node, letting the analyst view the comprehensive outcomes and draw conclusions for the subproblems where the relevant factors are limited to concise subsets of inputs and outputs reflecting a particular perspective. The proposed framework can be used in the standard efficiency analysis and the decision contexts requiring the consideration of composite indicators.
We derived the results by considering three perspectives: score-based distances to the efficient unit, ranks, and pairwise preference relations. For each of them, we proposed a pair of methods. The first group was based on mathematical programming, offering the exact, extreme outcomes that can be attained in the set of feasible input/output weights. The other group was based on Monte Carlo simulations, providing the distribution of efficiency outcomes through stochastic acceptabilities that estimate the share of the weight subspaces confirming a given result. These approaches are complementary because the exact outcomes are often not conclusive enough, whereas the stochastic indices, even if approximated with high accuracy, may fail to capture some extreme results.
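The simulation-based group of methods can be illustrated with a short sketch that estimates rank acceptabilities and expected ranks for an additive value model. It rests on assumptions not taken from the paper: weights are sampled uniformly from the unit simplex with no linear restrictions, a single flat set of factors replaces the hierarchy, and the function names and data are hypothetical.

```python
import random


def sample_simplex(n, rng):
    """Draw one weight vector uniformly from the unit simplex."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def rank_acceptabilities(values, n_samples=2000, seed=0):
    """Estimate the share of weight vectors placing each unit at each rank,
    together with the expected rank of each unit.

    values[unit] is a list of marginal values in [0, 1], one per factor.
    """
    rng = random.Random(seed)
    units = list(values)
    n_factors = len(values[units[0]])
    counts = {u: [0] * len(units) for u in units}
    for _ in range(n_samples):
        w = sample_simplex(n_factors, rng)
        score = {u: sum(wi * vi for wi, vi in zip(w, values[u])) for u in units}
        for u in units:
            # Rank = 1 + number of units with a strictly greater value.
            rank = 1 + sum(score[v] > score[u] for v in units)
            counts[u][rank - 1] += 1
    shares = {u: [c / n_samples for c in counts[u]] for u in units}
    expected = {u: sum((r + 1) * s for r, s in enumerate(shares[u]))
                for u in units}
    return shares, expected


# Hypothetical marginal values: A and B excel on different factors, C on none.
example = {"A": [1.0, 0.2], "B": [0.2, 1.0], "C": [0.1, 0.1]}
shares, expected = rank_acceptabilities(example)
```

In this example, unit C is last for every sampled weight vector, while A and B share the first two positions. The sampled indices show how often each unit leads, but only the exact models can confirm whether a rank that never appears in the sample is truly unattainable.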
We illustrated the framework’s applicability by assessing the quality of the healthcare system of sixteen Polish voivodeships. The analysis included nine indicators of different natures and concerned four levels. The comprehensive results were based on all relevant characteristics considered jointly, while the three subproblems captured the perspectives of inhabitants’ health improvement, financial management, and consumer satisfaction. We reported three provinces—Pomorskie (POM), Wielkopolskie (WLKP), and Lubuskie (LBU)—as the most efficient ones. We presented their strong and weak points by referring to the results in all hierarchy nodes. Moreover, we discussed the practical usefulness of the robustness analysis in terms of managerial implications.