#### **1. Introduction**

With the expansion of urban scale and the rapid growth in the number of motor vehicles, traffic congestion has become an increasingly serious urban problem, and the tidal traffic generated by residents' commuting is believed to be one of its major causes [1,2]. Traffic congestion not only causes energy waste and environmental pollution [3], but is also believed to negatively affect public health [4]. Common approaches to this problem are economic or policy measures based on econometric models, such as congestion pricing [1,5–7] and encouraging the use of public transport [8–10]. However, the models commonly used in these approaches are static and seldom consider residents' distribution and mobility in space, which makes their results inaccurate [11]. The increasing availability and utilization of urban spatial big data, especially location-based service (LBS) data from GPS devices, smart cards, and mobile phones, now makes it possible to describe urban residents' travel more accurately and on a finer scale [12]. Data on residents' mobility over time and space can be used for urban geographic mapping [13], epidemiological analysis [14,15], and real-time urban monitoring [16], as well as for the recognition of urban spatial features [17–19] or the measurement of urban vibrancy [20]. Another important application is the study of residents' commuting and urban transport, including the identification of commuting areas and commuting distances [21,22] and the acquisition of commuter origin-destination (OD) matrices [23–25]. Among the various sources of location data, mobile phone data have been widely employed in studies of residents' commuting thanks to their extensive coverage, passive data collection, and the fact that their acquisition requires no extra equipment. In comparison, alternative data sources, such as smart card data or taxi GPS data, have equivalent difficulties in data coverage but much smaller population coverage [26]. Studies based on these new data have been shown to achieve higher accuracy than those based on statistical or measured data, demonstrating the effectiveness of big data in urban studies. In general, however, most of these studies are at an early stage of describing urban phenomena through data; few attempt to go further, for example by using big data to identify the connection between residents' travel and traffic congestion, or to predict and evaluate measures for traffic improvement [27,28].

Once relatively accurate data on residents' commuting have been obtained, modeling and simulation can help identify the mechanisms and rules behind the functioning of urban spaces. The agent-based model (ABM) is considered one of the most effective techniques for simulating complex systems and is therefore well suited to the study of cities, which are typical complex systems [29]. The distributed nature of ABM enables it to reflect differences in the behaviors of different types of individuals [30]. The model can therefore be used to simulate traffic flow and residents' behaviors in urban transport, including residents' choice of travel modes [31] and carpooling models [32]. Other research applications include the optimization of bus routes [33,34], the simulation of the functioning of the urban composite transportation system [35], and the evaluation of the impact of intercity high-speed railways on the ecological environment [36]. In terms of research trends, studies are moving from the simulation of individual decision-making to that of the composite flow of urban traffic, with increasing simulation complexity. However, most of these simulations are still based on survey data, which are not only expensive and time-consuming to acquire but also lack details of residents' travel behaviors. For these reasons, big data are considered a better data source for studies of residents' travel behaviors [37].

Call detail records (CDRs) are one commonly employed kind of urban spatial big data. Compared with traditional survey data, they offer higher sample coverage, faster acquisition, and higher time resolution [38]. ABM, although an effective tool for studying urban spaces, was long constrained by the lack of data in its earlier development, and it shows great potential in the current context of smart city development [39]. Using CDRs in ABM therefore offers great promise for traffic simulation that reflects the actual urban spatial environment and the spatial distribution of residents. The simulation, in turn, can be used to analyze the causes of traffic congestion and even to predict traffic conditions under different application scenarios. Although big data have gradually been applied in previous studies to generate OD matrices of residents' travel and to predict traffic pressure on the actual urban road network, two weaknesses remain. First, most of these studies were conducted on the macro scale of the whole city. At such a scale, road capacity is often ignored, despite the fact that it is crucial for relieving traffic congestion. Second, most studies presume that all commuting trips begin simultaneously, without addressing the differences in traffic volume between time periods. As a result, considerable errors may occur when traffic conditions are predicted on the micro scale. These two points were therefore taken into consideration in the model employed in the present study.

#### **2. Materials and Methods**

The present research comprises two major parts: the acquisition of the features of residents' commuting behavior and simulation of commuting behavior of urban residents.

#### *2.1. Mobile Phone Data Processing*

As mobile phone data are directly tied to the spatial distribution of base stations, their positioning accuracy is determined by the density of base stations and varies across different areas. In addition, because users differ in how they work and live, the recorded phone call behavior is fuzzy and unevenly distributed over time. Mobility studies using mobile phone data therefore usually take areas with densely distributed base stations, such as city centers, as case studies. A user's location is represented by the location of the base station that has recorded the user's most frequent phone calls within a specific period (one month or several months) at a specific time (working hours or an interval of several hours). By linking the locations of these base stations along the timeline, the user's mobility trajectory can then be generated from CDR data [17,19,21,23–25,38,40–42]. The present study uses a similar approach while focusing on the rush hours and dividing the timeline on an hourly basis. Data are processed with the ArcGIS platform, and the raw mobile phone data (Table 1) comprise the CDRs of 7 million users in the case city over a period of one month.

Data processing followed the procedures below:

a. Invalid data and users who made fewer than three phone calls per month were removed. After the screening, 3.8 million users remained.

b. Users' CDR data were sorted into working hours (7:00 to 18:00, Monday to Friday) and non-working hours (Saturday and Sunday, and 19:00 to 6:00 on weekdays). The base station locations with the highest call frequency during the two time periods were identified as the place of work and the place of residence, respectively.

c. The frequency of phone calls on working days (Monday to Friday) was calculated from each user's CDRs for each of the 24 hours of the day, and the ID of the base station with the highest call frequency in each hour was extracted (as shown in Table 2).

d. The four hourly time slots from 6:00 to 9:00 (6:00, 7:00, 8:00, and 9:00) were identified as the peak commuting period. Based on the division of space units and the base station locations, each base station ID was associated with a space unit code (Figure 1). By comparing a user's space unit code at each hour with the code of the user's residence, the departure time was determined and the base station ID matrix of origins and destinations was obtained. (The decision rule is as follows. If a user's space unit code is 0 at 6:00 and differs from the code of the residence space unit at 7:00, then 6:00 is taken as the departure time, the residence space unit as the origin, and the space unit at 7:00 as the destination. If the space unit code at 7:00 is still 0, the departure time is moved to the next time period, and so on, until the code is not 0 and differs from the residence code. If the space unit code at 6:00 is not 0, users with space unit codes at 6:00 and 7:00, 8:00 and 9:00, or 6:00, 7:00, 8:00, 9:00 and 11:00 are searched respectively. Whenever the space unit code changes, the two differing units are taken as the origin and the destination. Finally, an OD matrix is generated for each hour.)
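Step b can be illustrated with a minimal Python sketch. It is not the authors' ArcGIS workflow: the function names, the record layout `(timestamp, station_id)`, and the treatment of the interval boundaries (for example, whether a call at exactly 18:00 counts as working time) are assumptions made for illustration. Calls made on weekdays between 6:00 and 7:00 or between 18:00 and 19:00 fall into neither period as the periods are defined above, so the sketch ignores them.

```python
from collections import Counter
from datetime import datetime


def period(ts: datetime):
    """Classify a call time as 'working', 'non_working', or None (neither)."""
    if ts.weekday() >= 5:               # Saturday or Sunday
        return "non_working"
    if 7 <= ts.hour < 18:               # 7:00 to 18:00, Monday to Friday
        return "working"
    if ts.hour >= 19 or ts.hour < 6:    # 19:00 to 6:00 on weekdays
        return "non_working"
    return None                         # 6:00-7:00 and 18:00-19:00 are unassigned


def work_and_home_station(records):
    """records: iterable of (timestamp, station_id) for a single user.

    Returns (work_station, home_station), the most frequently used station
    in each period; a value is None when the user has no calls in that period.
    """
    counts = {"working": Counter(), "non_working": Counter()}
    for ts, station in records:
        p = period(ts)
        if p is not None:
            counts[p][station] += 1

    def top(counter):
        return counter.most_common(1)[0][0] if counter else None

    return top(counts["working"]), top(counts["non_working"])
```

In this sketch, ties between equally frequent stations are resolved in favor of the station seen first; the source does not say how ties were handled.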

Subsequently, the OD matrix of residents' travel in the case area was imported into the ABM as basic data.
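The decision rule in step d and the resulting hourly OD matrices can be sketched in Python as follows. This is one possible reading of the rule, not the authors' implementation: the names `detect_trip` and `hourly_od_matrices`, the dictionary layout `{hour: space_unit_code}`, and the choice to report only the first trip per user are assumptions. A code of 0 is taken to mean that no location was recorded in that hour, in which case the user is assumed to still be at the last known unit (initially the residence).

```python
from collections import defaultdict

PEAK_HOURS = (6, 7, 8, 9)   # hourly slots of the peak commuting period
NO_RECORD = 0               # space unit code 0: no location recorded


def detect_trip(home_code, hourly_codes, hours=PEAK_HOURS):
    """Return (departure_hour, origin, destination) or None if no trip is found.

    hourly_codes maps an hour to the user's space unit code in that hour.
    """
    origin = home_code  # until a record is seen, assume the user is at home
    for prev_hour, hour in zip(hours, hours[1:]):
        prev_code = hourly_codes.get(prev_hour, NO_RECORD)
        if prev_code != NO_RECORD:
            origin = prev_code          # last known location becomes the origin
        code = hourly_codes.get(hour, NO_RECORD)
        if code != NO_RECORD and code != origin:
            return prev_hour, origin, code
    return None


def hourly_od_matrices(users):
    """users: iterable of (home_code, hourly_codes).

    Returns {departure_hour: {(origin, destination): trip_count}}.
    """
    od = defaultdict(lambda: defaultdict(int))
    for home_code, hourly_codes in users:
        trip = detect_trip(home_code, hourly_codes)
        if trip is not None:
            hour, origin, destination = trip
            od[hour][(origin, destination)] += 1
    return od
```

For a user with residence code 5 and no record at 6:00, a code of 8 at 7:00 yields a departure at 6:00 from unit 5 to unit 8; if the code at 7:00 is also 0 and unit 8 first appears at 8:00, the departure moves to 7:00.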


**Table 1.** Sample of mobile phone call data.


**Table 2.** Sample statistics of base stations assigned to user ID at different hours of a day.

**Figure 1.** Procedures of assigning a user to a spatial unit.

#### *2.2. Agent-Based Model*

ABM is often used to study complex giant systems such as cities. Generally speaking, a Multi-Agent System (MAS) contains many types of agents, including mobile agents such as urban residents and static agents such as urban roads. Agents act according to pre-defined rules and interact with one another, producing movement and dynamic changes that propagate from individual agents to the whole system. As this mechanism resembles the interactions among individuals, and between people and space, in the city, ABM is considered one of the best tools for understanding urban functioning [43]. The model in the present study is built on the Repast S platform, and the settings of the external environment and agent mobility draw on the open source model RepastCity [44–46]. Since residents' travel is the only behavior studied in this research, the modeling of the urban environment can be simplified to the spatial units of travel (i.e., origin and destination) and the urban roads. The agent behavior rules described below are coded in Java and added to the RepastCity model so that it runs as designed. The basic rule is that a resident agent moves from one spatial unit (the origin) to another spatial unit (the destination) at a specific time point. While the resident agent is on the road, it may lower its speed according to preset traffic congestion conditions (Figure 2).

**Figure 2.** Diagram of rules for resident Agent behavior.

#### *2.3. Model Hypothesis and Parameter Setting*

The following hypotheses and rules are adopted regarding the generation of Agents in the model and their behaviors:

First, each Agent (urban resident) generated has a specific origin and destination of travel and a specific departure time. In the model, all resident Agents are generated simultaneously, but each is assigned a delay value according to its departure time. For example, residents who depart at 6:00 have a delay value of 0 s, while those who depart at 7:00 have a delay value of 3600 s. In addition, each resident Agent is represented by a private car whose initial travel speed is based on the driving speed of a normal motor vehicle.

Second, a certain number of Agents is generated on each plot. This number is calculated by dividing the number of residents acquired through phone data by the operator's market share and then multiplying the result by the share of residents' trips made by motor vehicle.

Third, traffic congestion emerges when a certain number of resident Agents concentrate at the same road intersection, and the travel speed of residents varies according to the level of congestion. Roads and nodes in the model are generated from a shapefile built in ArcGIS and are converted to Agents in RepastCity.

Fourth, residents choose the shortest route to their destination and do not change route before they arrive. The Agents' choice of path is based on the Dijkstra algorithm. The space unit codes of the origin and destination are taken from the OD matrix, and the shortest route is calculated from the road network using this algorithm.
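The first and fourth rules can be illustrated with a short Python sketch of the departure delay and of Dijkstra's algorithm on a small road graph. It is a stand-alone illustration rather than the Java code used with RepastCity: the function names, the adjacency-list layout, and the use of road length in meters as the edge weight are assumptions.

```python
import heapq


def departure_delay_s(departure_hour, first_hour=6):
    """Delay in seconds relative to the first departure slot (6:00 -> 0 s)."""
    return (departure_hour - first_hour) * 3600


def shortest_route(graph, origin, destination):
    """Dijkstra's algorithm on graph = {node: [(neighbor, length_m), ...]}.

    Returns (list_of_nodes, total_length_m), or (None, inf) if unreachable.
    """
    dist = {origin: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                    # stale heap entry
        visited.add(node)
        if node == destination:
            break
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if destination not in dist:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:           # walk predecessors back to the origin
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[destination]


# Hypothetical three-node network: going via B is shorter than the direct link.
roads = {
    "A": [("B", 100.0), ("C", 250.0)],
    "B": [("A", 100.0), ("C", 100.0)],
    "C": [("A", 250.0), ("B", 100.0)],
}
route, length_m = shortest_route(roads, "A", "C")
```

Because the route is computed once from the static road lengths, it matches the rule that an Agent does not change route before arrival, even when congestion later slows it down.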

There are three major parameter variables in the commuting travel model of urban residents in the study area:

The first is the commuter travel data of residents in each plot, acquired from the OD travel matrix, together with the corresponding number of Agent trips. The number of Agents is determined by two factors. The first is the conversion of the number of residents observed in the phone data into the number of commuting residents and then into the number of trips made by motor vehicle. As the Baishazhou area is located on the outskirts of the city near the Third Ring road, has no subway lines, and has far fewer bus lines than the inner city, it is assumed that the majority of trips are made by private car. The number of residents acquired through phone data is therefore divided by the market share of the telecom operator, and the result is converted to the number of private car trips at a factor of 1 to 1. The second factor is the ratio of the number of Agents in the simulation to the actual number of residents traveling by car. In the statistics, we observed that there are ~15,000 people traveling in the 9:00–10:00 period, when the number of trips in the study area is at its lowest. Preliminary test runs showed that, as the number of Agents increases, the simulation speed drops significantly, while the precision of the simulation results does not increase accordingly. In the present simulation, the number of Agents is therefore reduced to improve simulation efficiency, and the traffic capacity of the roads is adjusted proportionally. The final Agent-to-resident ratio is set at 1:10, that is, one Agent represents 10 residents.
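The conversion from observed phone users to Agents can be written out as a small Python sketch. The function names are illustrative, and the market share of 0.5 used in the example is a hypothetical value, since the source does not report the operator's actual share; only the 1-to-1 car conversion factor and the 1:10 Agent-to-resident ratio come from the text.

```python
def agents_for_plot(observed_users, market_share,
                    car_trip_ratio=1.0, residents_per_agent=10):
    """Number of Agents to generate for one plot.

    observed_users      commuters seen in the phone data for this plot
    market_share        operator's market share in (0, 1]; not given in the source
    car_trip_ratio      share of trips made by private car (1.0 in the study)
    residents_per_agent residents represented by one Agent (10 in the study)
    """
    if not 0 < market_share <= 1:
        raise ValueError("market_share must be in (0, 1]")
    car_travellers = observed_users / market_share * car_trip_ratio
    return round(car_travellers / residents_per_agent)


def scaled_road_capacity(real_capacity, residents_per_agent=10):
    """Road capacity expressed in Agents, scaled by the same ratio."""
    return real_capacity / residents_per_agent


# Hypothetical plot: 300 observed commuters, assumed market share of 0.5.
n_agents = agents_for_plot(300, market_share=0.5)
```

With these illustrative inputs, 300 observed users scale to 600 car travellers and 60 Agents. Applied to the roughly 15,000 travellers of the quietest period, and assuming that figure already counts car travellers, the 1:10 ratio would give about 1,500 Agents.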

The second parameter is the speed setting. Because the roads in the study area form a hierarchy (urban expressways, arterial roads, etc.), the Agents' speed is differentiated by road type. Two speeds are set in the study: 50 km/h on expressways and 30 km/h on arterial roads. This parameter is implemented by specifying a road attribute field in GIS, with the corresponding speed values of 13.9 m/s and 8.3 m/s, respectively. The speed setting also ties the Agent's travel speed to real time: each operation cycle (1 tick in the simulation) is equal to 1 second of real time.
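The unit conversion behind these values, and its link to simulation ticks, can be checked with a few lines of Python. The helper names and the example link lengths are illustrative assumptions; the speeds and the 1 tick = 1 s convention are taken from the text.

```python
import math


def kmh_to_ms(speed_kmh):
    """Convert km/h to m/s (1 km/h = 1000 m / 3600 s)."""
    return speed_kmh * 1000 / 3600


def ticks_to_traverse(length_m, speed_kmh, seconds_per_tick=1.0):
    """Whole ticks needed to cover a road link at free-flow speed."""
    metres_per_tick = kmh_to_ms(speed_kmh) * seconds_per_tick
    return math.ceil(length_m / metres_per_tick)


expressway_ms = round(kmh_to_ms(50), 1)   # 13.9 m/s
arterial_ms = round(kmh_to_ms(30), 1)     # 8.3 m/s
```

At 1 tick per second, an Agent on an expressway advances about 13.9 m per tick, so a hypothetical 100 m link takes 8 ticks at free-flow speed.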

The third parameter is the road congestion setting. As the present study is conducted on a meso- to macro-scale area, roads are not categorized at a finer level, and the overlapping of vehicles is not considered. Congestion is defined by the instantaneous density of Agents on the road, because vehicle density directly reflects the congestion level of a road and road occupancy is often used as a quantitative indicator in traffic analysis [47]. Following the methods used in previous literature [48], the present study defines a road occupancy of 0.5–1 as serious congestion, 0.3–0.5 as slight congestion, and below 0.3 as no congestion.
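The occupancy thresholds translate directly into a classification function, sketched here in Python. Two points are assumptions rather than statements from the source: the boundary value 0.5 is assigned to serious congestion, and the helper `road_occupancy`, which divides the number of Agents on a road by its capacity in Agents, is a hypothetical way of obtaining the occupancy, since the text does not give the exact formula.

```python
def road_occupancy(agents_on_road, capacity_in_agents):
    """Hypothetical occupancy measure: share of the road's capacity in use."""
    if capacity_in_agents <= 0:
        raise ValueError("capacity_in_agents must be positive")
    return min(agents_on_road / capacity_in_agents, 1.0)


def congestion_level(occupancy):
    """Map a road occupancy in [0, 1] to the study's three congestion classes."""
    if not 0 <= occupancy <= 1:
        raise ValueError("occupancy must be between 0 and 1")
    if occupancy >= 0.5:
        return "serious"    # 0.5-1
    if occupancy >= 0.3:
        return "slight"     # 0.3-0.5
    return "none"           # below 0.3
```

A road holding 40 Agents with a capacity of 100 Agents has an occupancy of 0.4 and is classed as slightly congested. The speed reduction applied at each level is not specified in this section and is therefore left out of the sketch.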
