*3.3. Data Preprocessing*

The station areas of the 352 subway stations in Seoul were designated as DMUs in order to measure transit efficiency. The Enforcement Rules of the Urban Planning Ordinance of Seoul define the station area as "an area within a 500 m radius from the center of stations such as subway, national railway, and light rail" [39]. This standard has been employed in several previous studies on transit in Seoul [40,41]. Data preprocessing involved compiling the data by station area. Transit data were obtained from smartcard data, and socio-economic data were obtained from the open data portal.
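As an illustration of the 500 m station-area definition, the following minimal sketch tests whether a point lies inside a station area using the great-circle (haversine) distance. The function names and the coordinates in the usage note are hypothetical, and the source does not state which GIS procedure was actually used to delineate the station areas.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in metres
STATION_AREA_RADIUS_M = 500.0  # radius taken from the ordinance's definition


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_station_area(station, point, radius_m=STATION_AREA_RADIUS_M):
    """Return True if `point` (lat, lon) is within `radius_m` of `station` (lat, lon)."""
    return haversine_m(station[0], station[1], point[0], point[1]) <= radius_m
```

For example, with a hypothetical station center at (37.5547, 126.9706), a point 0.003 degrees of latitude to the north is roughly 334 m away and falls inside the station area, whereas a point 0.006 degrees to the north (roughly 667 m) falls outside it.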

Socio-economic data from the open data portal were also compiled by station area. Among the various open data, the population density, land value, number of households, and number of companies were obtained for this research. Since all the obtained data were provided in census area units, it was necessary to aggregate the values by station area. The population density and land value were compiled by averaging, and the numbers of households and companies were aggregated by summing. After preprocessing, the station-area averages of population density, land value, number of households, and number of companies were 34,249 persons/km², 5903 (1000 won/m²), 5249 households, and 134 companies, respectively. Table 2 lists the descriptive statistics of the socio-economic data from the open data portal.
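The aggregation rule described above (averaging for density and land value, summing for households and companies) can be sketched as follows. This is a minimal illustration only: the field names, the `station_id` key, and the assumption that each census unit has already been assigned to a station area are hypothetical and are not taken from the source.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical field names; the source does not specify the data schema.
MEAN_FIELDS = ("population_density", "land_value")  # compiled by averaging
SUM_FIELDS = ("households", "companies")            # aggregated by summing


def aggregate_by_station_area(census_units):
    """Aggregate census-unit records into one record per station area.

    Each record is a dict holding a 'station_id' plus the fields above.
    A census unit lying in two overlapping station areas would appear
    once per station area (an assumption, not stated in the source).
    """
    groups = defaultdict(list)
    for unit in census_units:
        groups[unit["station_id"]].append(unit)

    result = {}
    for station_id, units in groups.items():
        row = {f: mean(u[f] for u in units) for f in MEAN_FIELDS}
        row.update({f: sum(u[f] for u in units) for f in SUM_FIELDS})
        result[station_id] = row
    return result
```

Averaging suits intensive quantities such as density and unit land value, whereas counts such as households and companies are additive across census units, which is presumably why the two rules differ.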


**Table 2.** Descriptive statistics of the socio-economic data.

The transit data consist of transit infrastructure and transit trips for each station. The infrastructure variables include the numbers of subway lines, bus lines, and bus stations, and the average distance between a bus stop and the subway station. The transit trip data include the numbers of subway trips, bus trips, and transfer trips between subway and bus, and the energy consumption of transit trips.

Since some station areas overlap, the average distance between bus stops and the subway station and the number of transfer trips are included as variables. Energy consumption is defined as the energy consumed by each transit mode per trip [42]. From this definition, the transit energy consumption of each individual station area can be calculated. Since the transit modes consist of subway and bus, the energy consumption is obtained as the sum of each mode's trips multiplied by its conversion factor. For transfer trips, the conversion factor of each mode is multiplied by each trip. The energy consumption of each station area was calculated using conversion factors of 0.7 Mcal/trip for a subway trip and 3.2 Mcal/trip for a bus trip. These factors are provided by the Ministry of Trade, Industry and Energy of the Republic of Korea [42]. Figure 2 shows the heat-map of transfer trips on station area.
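The energy calculation described above can be sketched as follows. The conversion factors are those given in the text; the function name is hypothetical, and the treatment of transfer trips (counted separately from single-mode trips, with each transfer trip consuming one subway leg and one bus leg) is one reading of the description and should be regarded as an assumption.

```python
SUBWAY_MCAL_PER_TRIP = 0.7  # conversion factor for a subway trip [42]
BUS_MCAL_PER_TRIP = 3.2     # conversion factor for a bus trip [42]


def station_energy_mcal(subway_trips, bus_trips, transfer_trips=0):
    """Estimate the transit energy consumption (Mcal) of one station area.

    Assumption: transfer trips are counted separately from the subway and
    bus trip totals, and each transfer trip uses both modes once.
    """
    single_mode = (SUBWAY_MCAL_PER_TRIP * subway_trips
                   + BUS_MCAL_PER_TRIP * bus_trips)
    transfers = (SUBWAY_MCAL_PER_TRIP + BUS_MCAL_PER_TRIP) * transfer_trips
    return single_mode + transfers
```

Under these assumptions, a hypothetical station area with 1000 subway trips, 2000 bus trips, and 100 transfer trips would consume about 0.7 × 1000 + 3.2 × 2000 + 3.9 × 100 = 7490 Mcal.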

**Figure 2.** Heat-map of transfer trips on station area.

After preprocessing by station area, the numbers of subway lines, bus lines, and bus stations and the distance between bus stops and subway stations averaged 1.6 lines, 34 lines, 70 stations, and 254 m, respectively. To identify the relationship between transit modes, the numbers of bus lines and stations were counted by bus type, i.e., main bus, branch bus, or circulation bus. The numbers of subway trips, bus trips, and transfer trips and the energy consumption averaged 36,640 trips, 96,239 trips, 6164 trips, and 377,910 Mcal/trip, respectively. Table 3 lists the descriptive statistics of the transit data obtained from smartcard data.


**Table 3.** Descriptive statistics of the transit efficiency data.
