**1. Introduction**

Transit-oriented development (TOD) is an urban development strategy that maximizes transit accessibility for urban areas within walking distance [1]. The main purpose of TOD is to increase the comfort of transit users and reduce automobile use by creating an accessible public transportation environment within the city [2]. By moving toward a transit-oriented approach, dependence on personal vehicles can be reduced and, in the long term, more sustainable cities can be built [3].

TOD is practiced in densely populated areas with high demand for transit facilities such as subway and bus stations. When planning a new public transportation system, planners must first consider the socio-economic characteristics of the area and plan transportation facilities according to its population density, land use, commercial facilities, and residences [4]. The travel demand for transit systems increases when stations are newly introduced into an urban area. Early TODs focused on increasing the connectivity between urban planning and public transit to address problems such as urban sprawl, traffic congestion, and environmental degradation [5]. The development of the TOD concept was aimed at urban planning with a focus on public transit within the city [6]. Historically, TOD has been recognized as an efficient development strategy in terms of the transit environment and socio-economic characteristics [7]. Calthorpe [1] stated that research on public transit use via high-density, multi-purpose land-use patterns could help shape a transit-friendly culture. This concept considered regional factors such as population density, complex land use, trip purpose, trip frequency, trip demand, and mode [2]. In addition, various studies established TOD evaluation criteria to achieve a more efficient and sustainable transit environment and to solve complex urban problems. Most previous research stated that efficient and sustainable transit must balance the social and economic aspects of the transportation environment [8–10]. This means that land use, population density, and the residential environment should be closely examined in urban planning [9]. To maximize efficiency at the economic level, transit capacity must be met and maintained so as to minimize the consumption of resources [11]. The efficiency of TOD has also been explored in previous studies [12–18], which established one or more indicator variables to quantify the efficiency of TOD strategies. Renne and Wells [16] identified various useful TOD indicators from monitoring successive TODs. Galelo et al. [17] found that travel volume was one of the most representative indicators for evaluating TOD efficiency. However, Yu et al. [18] suggested that a single indicator is limited in its ability to measure TOD efficiency effectively and that multiple indicators are required.

The massive data created by the Internet of Things (IoT) in cities now enable data scientists to analyze the objective functions of TOD. As a data-driven approach, data envelopment analysis (DEA) has been widely used to measure the effectiveness of the operation or management of transportation systems [19]. DEA has a significant advantage over parametric models in that it does not require weight parameters to measure efficiency [20]. A further merit of DEA is its simplicity in estimating efficiency with multiple inputs and outputs [19]. A parametric model assumes a specific production function for the relationship between input and output [18]. DEA, however, makes no assumptions about the production function and uses the given data to estimate the production relationship between input and output [21]. It is therefore possible to avoid the error of setting the type of distribution according to the arbitrary judgment of the analyst. The network DEA model can be structured as a sequence of stages that follows the evaluation process [22,23]. It can also be combined with the slacks-based measure (SBM) model for direct comparisons between observations [24]. Early DEA models have the disadvantage that inefficiency cannot be compared directly between different observations [25], whereas the SBM model allows such comparison by measuring efficiency based on the slack ratio [26].

The efficiency of transit systems has been evaluated in previous studies using DEA models [27–31]. With the introduction of automatic fare collection (AFC) systems, data-driven transit analysis has become possible [32]. Various kinds of data, i.e., smartcard data, socio-economic data, and geographical data, were combined to evaluate the efficiency of TOD [33]. Transit efficiency was defined through the relationship between multiple inputs and outputs [34]. Transit efficiency refers to how well the transit system was introduced and managed with respect to the socio-economics, transit infrastructure, and transit trips of each station area [8]. Regarding the TOD concept, both transit design and efficiency must be considered to determine transit efficiency [16,17].

The purpose of this research is to evaluate transit efficiency in Seoul using the network slacks-based measure (NSBM) DEA model. Smartcard data and socio-economic data were used to evaluate the transit efficiency of 352 subway station areas in Seoul. The evaluation process is structured as a two-stage network consisting of a transit design stage and a transit efficiency stage, and the two-stage NSBM DEA model was used to measure efficiency at each stage. With the results from each stage, the overall efficiency was measured to evaluate the transit efficiency of the subway station areas. The evaluated stations were then grouped by Seoul administrative unit and ranked based on their TOD efficiency.

### **2. Methodology for Evaluating Transit Efficiency**

### *2.1. Concept of Data Envelopment Analysis (DEA)*

DEA is a nonparametric method for estimating production frontiers. The DEA model identifies relative efficiencies using a number of input and output variables [20,21]. The purpose of measuring efficiency using the DEA model is to determine the strategy of an enterprise, organization, or industry. The first DEA model was developed to evaluate the efficiency and increase the production of farm yield in the UK [25]. With respect to productivity, DEA has been formulated as a linear programming problem and has been used in various fields [35]. The relative efficiencies of decision-making units (DMUs) are determined and their performances compared. The original model is known as the Charnes, Cooper & Rhodes (CCR) model, which assumes constant returns to scale (CRS). Since the CRS condition assumes that the unit of production is kept constant at the optimal scale, the input and output are scaled proportionally. The CCR model is the most fundamental model, as it shows the methodological features of DEA in their most compact form. The CCR model estimates the ratio by which the input can be reduced as much as possible while keeping the output constant, or vice versa. As an example, consider estimating the efficiency score with the input-oriented CCR model. The efficiency score is estimated as the weighted sum of the output variables, which lies between 0.0 and 1.0. With *J* observed stations (*j* = 1, ... , *J*), each station produces *R* outputs using *M* inputs. The efficiency score θ relates the input value to the output value, and the objective function minimizes θ*i*, the ratio by which the input variables of target station *i* can be reduced. The input-oriented CCR model therefore determines the weights of the input and output variables that minimize θ*i*, and the efficiency score is estimated from these weights. Under the constraints *y*, *x* > 0 and λ ≥ 0, the efficiency score cannot exceed 1.0. The efficient stations constitute the production frontier, and the inefficient stations improve efficiency by moving toward the nearest part of the frontier. In other words, the efficient stations are the reference group that an inefficient station benchmarks to improve its efficiency. The mathematical expression of the input-oriented CCR model is as follows:

$$\theta^{i*} = \min \left\{ \theta^i - \varepsilon \left( \sum_{m=1}^{M} s_m^- + \sum_{r=1}^{R} s_r^+ \right) \right\} \tag{1}$$

subject to:

$$\begin{aligned} \theta^i x_m^i &= \sum_{j=1}^J x_m^j \lambda^j + s_m^-, \quad m = 1, \ldots, M \\ y_r^i &= \sum_{j=1}^J y_r^j \lambda^j - s_r^+, \quad r = 1, \ldots, R \\ \lambda^j &\ge 0, \; s_m^- \ge 0, \; s_r^+ \ge 0 \end{aligned}$$

where $\theta^i$ is the efficiency score of the target station area *i*, *j* (*j* = 1, ... , *J*) indexes the observed station areas, $x_m^j$ is the input variable *m* (*m* = 1, ... , *M*) of station area *j*, $y_r^j$ is the output variable *r* (*r* = 1, ... , *R*) of station area *j*, $\lambda^j$ is the intensity weight of station area *j*, $s_m^-$ is the slack of the input variable $x_m$, $s_r^+$ is the slack of the output variable $y_r$, and ε is a small positive constant.
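To illustrate the input-oriented CCR model, the following Python sketch solves the envelopment problem for one station area at a time with `scipy.optimize.linprog`. It is a minimal sketch under stated assumptions and not the implementation used in this study: the function name `ccr_input_efficiency` and the toy data are hypothetical, and the ε-weighted slack term of Equation (1) is omitted, so only θ is minimized and inequality constraints stand in for the slack variables.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y, i):
    """Input-oriented CCR score (CRS) of station i.

    X: (J, M) array of inputs, Y: (J, R) array of outputs.
    Decision variables are [theta, lambda_1, ..., lambda_J].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    J, M = X.shape
    R = Y.shape[1]

    # Objective: minimize theta only (slack term of Eq. (1) omitted).
    c = np.zeros(1 + J)
    c[0] = 1.0

    # Inputs:  sum_j x_mj * lambda_j - theta * x_mi <= 0
    A_in = np.hstack([-X[i].reshape(M, 1), X.T])
    b_in = np.zeros(M)

    # Outputs: -sum_j y_rj * lambda_j <= -y_ri
    A_out = np.hstack([np.zeros((R, 1)), -Y.T])
    b_out = -Y[i]

    res = linprog(
        c,
        A_ub=np.vstack([A_in, A_out]),
        b_ub=np.concatenate([b_in, b_out]),
        bounds=[(0, None)] * (1 + J),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0]


# Hypothetical toy data: three station areas, one input, one output.
X = np.array([[2.0], [4.0], [4.0]])
Y = np.array([[2.0], [2.0], [4.0]])
scores = [ccr_input_efficiency(X, Y, i) for i in range(len(X))]
```

In this toy data the first and third station areas have the highest output per unit of input and therefore form the frontier, while the second station area would need to halve its input to reach it.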

### *2.2. Network Slacks-Based Measure DEA*

The NSBM DEA is used to measure the efficiency of a network that consists of two or more stages. In general, measuring efficiency with a DEA model involves two stages, an input stage and an output stage. However, network DEA has more than two stages, which include intermediate processes [22,23]. These intermediate processes are linking activities that occur between stages of production or internally in DMUs. In other words, the outputs of the first stage can be applied to the second stage as inputs for the final result [22]. When the network becomes complex, early DEA models are limited because they contain only one stage, whereas network DEA solves this problem by having multiple stages [24]. Since the transit system also has a complex network, network DEA is suitable for measuring transit efficiency.

In terms of transit design, the level of transit infrastructure relative to socio-economic conditions is important [36]. Transit efficiency is defined as the number of transit trips relative to the infrastructure [37]. In this research, the transit design and transit efficiency stages were designed to measure transit efficiency. The process of TOD proceeds with three factors, i.e., socio-economic factors, transit infrastructure, and transit trips [11]. First, the socio-economic factors are examined to find the area where transit is needed. Second, the transit infrastructure is built in the area selected on the basis of the socio-economic factors. Finally, the transit trips are generated by the socio-economic factors and the transit infrastructure. These three factors are grouped into two stages, the transit design stage and the transit efficiency stage. Figure 1 shows the framework of the process used to evaluate transit efficiency. In the first stage, the transit design stage, the transit infrastructure is compared with the socio-economic factors. In the second stage, transit efficiency is estimated by comparing the transit trips with the transit infrastructure. The overall efficiency score is obtained as the average, or weighted sum, of the scores of the design and efficiency stages. The weight of each stage can be determined by the purpose of the research or the characteristics of the subject [38]. Regarding TOD evaluation, a number of previous studies considered the transit-related factors at the same level [11–13]. Since the concept of TOD considers design and efficiency at the same level, the importance of both stages is considered equal. In this research, identical weights are therefore given to both stages, i.e., *w*<sup>1</sup> = 0.5 and *w*<sup>2</sup> = 0.5. In other words, the assumption is made that the transit design and efficiency stages contribute equally to the overall efficiency score. With this assumption, the comprehensive efficiency of the station area can be obtained with consideration of both transit design and efficiency.
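The role of the stage weights can be illustrated with a short Python sketch. It assumes the simple weighted-average reading of the paragraph above; the NSBM objective in Equation (8) applies the weights inside a ratio, so this only shows how the weights act and is not the scoring procedure of this study. The function name `overall_efficiency` and the stage scores are hypothetical.

```python
def overall_efficiency(stage_scores, weights=(0.5, 0.5)):
    """Weighted average of stage scores (illustrative assumption).

    stage_scores: scores of the design and efficiency stages.
    weights: stage weights w^1, w^2, which must sum to 1.
    """
    if len(stage_scores) != len(weights):
        raise ValueError("one weight is required per stage")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, stage_scores))


# Hypothetical station area: design score 0.8, efficiency score 0.6.
equal = overall_efficiency((0.8, 0.6))                 # about 0.7
design_heavy = overall_efficiency((0.8, 0.6), (0.7, 0.3))  # about 0.74
```

With equal weights neither stage dominates; shifting weight toward the design stage raises the combined score of a station area whose design score exceeds its efficiency score.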

**Figure 1.** Network framework for measuring transit efficiency.

The production possibility set of network DEA is denoted as $P_{network}$, and its mathematical expression is shown in Equations (2)–(7). The term $z^{(k,h)}$ is an intermediate measure for evaluating transit efficiency, and Equation (5) shows its mathematical expression. In this study, $z^{(k,h)}$ takes the form $z^{(1,2)}$, since the network framework for measuring TOD efficiency requires a connection link from the transit design stage to the transit efficiency stage. Equation (5) describes the intermediate measure both as the weighted outputs from the design stage and as the weighted inputs to the efficiency stage. Equation (6) imposes the variable returns to scale (VRS) condition. In the absence of Equation (6), CRS is assumed:

$$P_{network} = (\mathbf{x}^k, \mathbf{y}^k, \mathbf{z}^{(k,h)}) \tag{2}$$

$$\mathbf{x}^{k} \ge \sum_{j=1}^{J} \mathbf{x}_{j}^{k} \lambda_{j}^{k}, \ \forall k \tag{3}$$

$$\mathbf{y}^{k} \le \sum_{j=1}^{J} \mathbf{y}_{j}^{k} \lambda_{j}^{k}, \ \forall k \tag{4}$$

$$\mathbf{z}^{(k,h)} = \begin{cases} \sum_{j=1}^{J} \mathbf{z}_j^{(k,h)} \lambda_{j}^{k}, \ \forall k, h \text{ (as outputs from } k\text{)} \\ \sum_{j=1}^{J} \mathbf{z}_j^{(k,h)} \lambda_{j}^{h}, \ \forall k, h \text{ (as inputs to } h\text{)} \end{cases} \tag{5}$$

$$\sum_{j=1}^{J} \lambda_j^k = 1, \ \forall k \tag{6}$$

$$\lambda_j^k \ge 0, \ \forall j, k \tag{7}$$

where $P_{network}$ is the production possibility set, *k* (*k* = 1, 2) indexes the stages, with *k* = 1 the transit design stage and *k* = 2 the transit efficiency stage, *j* (*j* = 1, ... , *J*) indexes the observed station areas, $\mathbf{x}_j^k \in R_+^{m_k}$ is the input vector of station area *j* at stage *k*, $\mathbf{y}_j^k \in R_+^{r_k}$ is the output vector of station area *j* at stage *k*, $(k, h) \in L$ is the connection link from the transit design stage to the transit efficiency stage, $\mathbf{z}^{(k,h)} \in R_+^{(k,h)}$ is an intermediate measure from the transit design stage to the transit efficiency stage, $\lambda^k$ is the intensity vector corresponding to stage *k*, $\mathbf{z}_j^{(k,h)} \lambda_j^k$ represents the outputs from the transit design stage, and $\mathbf{z}_j^{(k,h)} \lambda_j^h$ represents the inputs to the transit efficiency stage.

The SBM DEA is a widely used model for evaluating efficiency [23]. Network DEA with a slacks-based approach was first developed by Tone and Tsutsui [22], and this model is called the NSBM DEA model. The NSBM DEA model evaluates efficiency using the input and output slacks. A slack is the difference between the desired value and the actual value of an input or output variable [24]. The two slack values are estimated irrespective of the variable unit and are converted into an efficiency measure using the average of the reduced inputs and the average of the increased outputs. Since input and output variables have different units, slack values are converted to ratios by dividing them by the original observed values. NSBM DEA is referred to as a slacks-based measure because it is calculated from the slack between the observations and their targets on the production frontier. The most important feature of NSBM DEA is that the measure of efficiency does not change even when the units of the input or output variables change, because each slack enters as a ratio and is thus independent of the unit. Compared to the early DEA models, the SBM model has the advantage of allowing direct comparison between different DMUs [23]. Since the efficiency score of an early DEA model is estimated by adding the slacks of variables with different units, inefficiency cannot be compared directly between different DMUs [28]. The efficiency of the SBM model is instead measured by adding the slack ratios, so it can be measured regardless of the units of the variables [24].
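The unit invariance described above can be checked numerically. The following Python sketch computes the output-side term of a slacks-based measure, one plus the mean of the slack ratios, for the same station area expressed in two different units. The function name `output_slack_term`, the variable names, and all numbers are hypothetical and chosen only for illustration.

```python
def output_slack_term(outputs, slacks):
    """Return 1 + mean(s_r / y_r) over the output variables."""
    ratios = [s / y for s, y in zip(slacks, outputs)]
    return 1.0 + sum(ratios) / len(ratios)


# Hypothetical outputs: boardings per day and transfers per day.
outputs = [12000.0, 300.0]
slacks = [3000.0, 60.0]  # shortfall of each output relative to its target
term = output_slack_term(outputs, slacks)

# Same station area with boardings expressed in thousands per day.
# A slack carries the unit of its variable, so it is rescaled as well.
outputs_rescaled = [12.0, 300.0]
slacks_rescaled = [3.0, 60.0]
term_rescaled = output_slack_term(outputs_rescaled, slacks_rescaled)

# The output-oriented score is the reciprocal of this term.
score = 1.0 / term
score_rescaled = 1.0 / term_rescaled
```

Both calls give the same term because each ratio $s_r / y_r$ is dimensionless, whereas a plain sum of the slacks (3060 versus 63) would change with the unit.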

NSBM DEA has three variations, i.e., the non-oriented, input-oriented, and output-oriented models. These three models can be employed depending on the objective or the features of the variables. In this research, we used the output-oriented SBM, in which efficiency is improved in the output direction. The output-oriented SBM measures efficiency by fixing the inputs and maximizing the outputs. Since the outputs of each stage of transit analysis are required by the given conditions, the use of the output-oriented SBM is reasonable for determining the transit design and efficiency scores. It is difficult to change land use or eliminate existing facilities. Therefore, it is necessary to derive efficiency rankings by maximizing the output variables of each stage. NSBM DEA is widely used for evaluating relative efficiencies because it measures the efficiencies of DMUs. Given the transit system features mentioned above, the NSBM DEA model is suitable for evaluating transit efficiency. The mathematical expression for measuring transit efficiency is shown in Equation (8).

$$\theta_i^* = \min \frac{\sum_{k=1}^K w^k \left[1 - \frac{1}{m_k} \left(\sum_{m=1}^{m_k} s_{m}^{k-} / x_{mi}^k\right)\right]}{\sum_{k=1}^K w^k \left[1 + \frac{1}{r_k} \left(\sum_{r=1}^{r_k} s_{r}^{k+} / y_{ri}^k\right)\right]} \tag{8}$$

subject to:

$$\begin{aligned} \sum_{k=1}^{K} w^k &= 1 \\ w^k &\ge 0, \ \forall k \\ x_{mi}^k &= \sum_{j=1}^{J} x_{mj}^k \lambda_j^k + s_m^{k-}, \ \forall m, k \\ y_{ri}^k &= \sum_{j=1}^{J} y_{rj}^k \lambda_j^k - s_r^{k+}, \ \forall r, k \\ \sum_{j=1}^{J} \lambda_j^k &= 1, \ \forall k \\ \lambda_j^k &\ge 0, \ \forall j, k \\ s_m^{k-} &\ge 0, \ \forall m, k \\ s_r^{k+} &\ge 0, \ \forall r, k \end{aligned}$$

where $\theta_i^*$ is the overall efficiency score of station area *i*, *k* (*k* = 1, 2) indexes the stages, with *k* = 1 the transit design stage and *k* = 2 the transit efficiency stage, $w^k$ is the relative weight of stage *k*, $m_k$ and $r_k$ are the numbers of input and output variables of stage *k*, $x_{mi}^k$ is the input variable *m* of station area *i* at stage *k*, $y_{ri}^k$ is the output variable *r* of station area *i* at stage *k*, $\lambda^k$ is the intensity vector corresponding to stage *k*, $s_m^{k-}$ is the slack of the input variable $x_m$ of stage *k*, and $s_r^{k+}$ is the slack of the output variable $y_r$ of stage *k*.
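A two-stage, output-oriented evaluation of this kind can be sketched in Python with `scipy.optimize.linprog`. The sketch rests on several assumptions that are not taken from this study: inputs are held fixed, so only the denominator of Equation (8) is optimized and the score is its reciprocal; the linking constraint of Equation (5) is omitted, so the two stages decouple and are solved separately under VRS; and the function names `sbm_output_term` and `nsbm_output_score` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def sbm_output_term(X, Y, i):
    """Maximize 1 + (1/R) * sum_r s_r^+ / y_ri for station i (VRS).

    X: (J, M) inputs, Y: (J, R) outputs.
    Variables are [lambda (J), s_minus (M), s_plus (R)], all >= 0.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    J, M = X.shape
    R = Y.shape[1]
    # linprog minimizes, so the slack ratios enter with a negative sign.
    c = np.concatenate([np.zeros(J + M), -1.0 / (R * Y[i])])
    A_eq = np.vstack([
        np.hstack([X.T, np.eye(M), np.zeros((M, R))]),      # x_i = X'lam + s-
        np.hstack([Y.T, np.zeros((R, M)), -np.eye(R)]),     # y_i = Y'lam - s+
        np.hstack([np.ones((1, J)), np.zeros((1, M + R))]),  # sum(lam) = 1
    ])
    b_eq = np.concatenate([X[i], Y[i], [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return 1.0 - res.fun


def nsbm_output_score(stages, i, weights=(0.5, 0.5)):
    """Overall score: reciprocal of the weighted sum of stage terms."""
    total = sum(w * sbm_output_term(X, Y, i)
                for w, (X, Y) in zip(weights, stages))
    return 1.0 / total


# Hypothetical toy data for three station areas.
socio = np.array([[10.0], [10.0], [20.0]])     # stage 1 input
infra = np.array([[2.0], [4.0], [4.0]])        # stage 1 output, stage 2 input
trips = np.array([[100.0], [100.0], [400.0]])  # stage 2 output
stages = [(socio, infra), (infra, trips)]
overall = [nsbm_output_score(stages, i) for i in range(3)]
```

In this toy data only the third station area is efficient in both stages; the first falls short in the design stage and the second in the efficiency stage, so both receive an overall score below 1.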
