**3. Methodology**

The DNNs used in this study are a type of artificial neural network consisting of an input layer, an output layer, and multiple hidden layers in between. Like conventional artificial neural networks, DNNs can model complex non-linear relationships; in addition, they can represent basic elements in hierarchical configurations, with each added layer combining the features of the layers below it. Moreover, non-linear combinations of input variables are easy to analyze regardless of whether the variables are continuous or categorical, and automatic feature extraction reduces the burden of variable selection. In this study, these properties are used to extract ambient situation factors such as weather information and other external factors [39].
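To make the layered structure concrete, the following is a minimal sketch of a forward pass through a small fully connected network that takes one categorical and two continuous inputs. It is not the architecture used in this study: the layer sizes, the ReLU and sigmoid activations, the feature scaling, the random untrained weights, and all names (`one_hot`, `dense`, `forward`, `WEATHER`) are assumptions chosen only for illustration.

```python
import math
import random


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Hidden layers use ReLU; the single output unit uses a sigmoid."""
    activation = features
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activation, weights, biases)[0])


WEATHER = ["clear", "rain", "snow"]  # hypothetical category list
rng = random.Random(0)

# One categorical input (weather) plus two scaled continuous inputs (speed, lanes).
features = one_hot("rain", WEATHER) + [15.0 / 100.0, 4 / 10.0]

# Input (5) -> hidden (8) -> hidden (4) -> output (1); sizes are arbitrary.
layers = [init_layer(5, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
score = forward(features, layers)
```

Because the weights are untrained, `score` carries no meaning here; the sketch only shows how categorical and continuous inputs pass through stacked non-linear layers.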

Traffic accidents are directly related to human life, and legal and ethical problems are likely to be inevitable if an accident results from a model that has learned and predicted inaccurate content. Accordingly, the German Ethics Guidelines for AVs state that AVs should be designed to prevent accidents in advance, and they allow the use of AI technology to improve safety. We therefore consider that the top priority in addressing AV traffic accidents is learning to prevent accidents by recognizing accident situations in advance. For this purpose, this study uses a DNN, which has higher predictability than conventional machine learning algorithms.

### *3.1. Data Collection & Pre-Processing*

The data used in this study are accident data for Seoul collected from 2017 to 2018. The main dataset is the Traffic Accident Analysis System (TAAS) [48] data provided by the Road Traffic Authority (KoROAD). Information on traffic accident conditions can be obtained from the TAAS website in the form of Excel data. However, identifying the detailed traffic situation at the time of an accident, such as the number of lanes and the speed, required the location (coordinates) of each accident. We therefore crawled the location coordinates from TAAS so that link attributes could be merged with the traffic accident condition data. The crawled data include location coordinates, accident number, date of the accident, day of the accident, the content of the accident, the number of deaths, the number of severe injuries, the number of light injuries, the number of wounded, accident type, violation of the law, road conditions, weather, road type, offender information, and victim information (see Table 1).
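One way to combine the accident condition records with the crawled coordinates is a left join on the accident number. The sketch below assumes that key, along with hypothetical field names (`accident_no`, `x`, `y`, `weather`, `road_type`) and made-up sample values; the actual TAAS column names and the join procedure used in the study are not specified here.

```python
def merge_on_accident_number(conditions, coordinates):
    """Left-join condition rows with crawled coordinates by accident number.

    Rows without a matching coordinate keep x and y as None.
    """
    coords_by_id = {row["accident_no"]: row for row in coordinates}
    merged = []
    for row in conditions:
        coord = coords_by_id.get(row["accident_no"], {})
        merged.append({**row, "x": coord.get("x"), "y": coord.get("y")})
    return merged


# Illustrative records only; not real TAAS data.
conditions = [
    {"accident_no": "A001", "weather": "clear", "road_type": "intersection"},
    {"accident_no": "A002", "weather": "rain", "road_type": "single road"},
]
coordinates = [{"accident_no": "A001", "x": 126.978, "y": 37.566}]

merged = merge_on_accident_number(conditions, coordinates)
```

Keeping unmatched rows, rather than dropping them, makes it easy to count how many accidents could not be located before link attributes are attached.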


**Table 1.** Crawled Traffic Accident Analysis System (TAAS) data.

However, TAAS has a limitation in that it does not provide data on the traffic environment, such as the number of lanes and the speed. Accordingly, we used the Transport Operation & Information Service (TOPIS) [49] of Seoul and the Korea Transport DataBase (KTDB) [50] provided by MOLIT to get the traffic environment data for each node and link. The link speed data were drawn from TOPIS (see Table 2), and the data on the number of lanes were extracted from KTDB (node-link data). The TOPIS and KTDB data were obtained in Excel form from the above-mentioned sites.



TOPIS provides link speed data only down to the minor arterial road level and not for collector roads. Thus, we utilized KTDB to get the data for collector roads. The data on speed and number of lanes were merged based on the TAAS coordinate system, and we used the hourly (1-h) data for training.
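The coordinate-based merge can be sketched as a nearest-link lookup followed by an hourly speed lookup. This is only one plausible reading of the procedure: matching by straight-line distance to a single representative point per link, and all names (`nearest_link`, `attach_link_attributes`, `link_id`, `lanes`, `speed_kmh`, `hour`), are assumptions for illustration.

```python
import math


def nearest_link(x, y, links):
    """Return the link whose representative point is closest to (x, y)."""
    return min(links, key=lambda link: math.hypot(link["x"] - x, link["y"] - y))


def attach_link_attributes(accident, links, hourly_speed):
    """Add lane count and hourly link speed to one accident record."""
    link = nearest_link(accident["x"], accident["y"], links)
    # Speed is None when no hourly record exists for this link and hour.
    speed = hourly_speed.get((link["link_id"], accident["hour"]))
    return {**accident, "lanes": link["lanes"], "speed_kmh": speed}


# Illustrative values in an arbitrary planar coordinate system.
links = [
    {"link_id": "L1", "x": 0.0, "y": 0.0, "lanes": 4},
    {"link_id": "L2", "x": 10.0, "y": 10.0, "lanes": 2},
]
hourly_speed = {("L1", 8): 23.5}  # keyed by (link_id, hour of day)

on_arterial = attach_link_attributes({"x": 1.0, "y": 1.5, "hour": 8}, links, hourly_speed)
on_collector = attach_link_attributes({"x": 9.0, "y": 9.5, "hour": 8}, links, hourly_speed)
```

In this sketch `on_collector` has no speed record, which corresponds to the empty cells that have to be filled in a later pre-processing step.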

After the collection, we refined the data for the analysis. We excluded the X and Y coordinates, local area name, and accident number from the dataset because they are useful only for merging. The casualty data, such as the number of deaths and severe injuries, were also excluded because they are unsuitable for this study, which aims to prevent accidents in advance. The basic statistics of the data showed that the number of accidents appeared roughly constant on a monthly and daily basis. In addition, the seasonality that we wanted to check in the monthly accidents seems to be well reflected in the "weather" factor, and the "day" factor is likewise reflected in the "day of the week" factor. Therefore, of the temporal variables, we used only the time and the day of the week in the analysis.
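The exclusions described above amount to a column filter. The following minimal sketch uses hypothetical field names (`area_name`, `day_of_month`, `severe_injuries`, and so on) and a made-up record; the real column names in the merged dataset may differ.

```python
# Hypothetical column names grouped by the reason for exclusion.
MERGE_ONLY = ("x", "y", "area_name", "accident_no")
CASUALTIES = ("deaths", "severe_injuries", "light_injuries", "wounded")
CALENDAR = ("month", "day_of_month")  # covered by weather and day of the week
DROPPED = set(MERGE_ONLY) | set(CASUALTIES) | set(CALENDAR)


def refine(record):
    """Keep only the fields that remain after the exclusions."""
    return {key: value for key, value in record.items() if key not in DROPPED}


# Illustrative record only.
raw = {
    "accident_no": "A001", "x": 126.978, "y": 37.566, "area_name": "Jung-gu",
    "month": 7, "day_of_month": 14, "hour": 8, "day_of_week": "Fri",
    "deaths": 0, "severe_injuries": 1, "light_injuries": 0, "wounded": 0,
    "weather": "rain", "lanes": 4, "speed_kmh": 23.5,
}
refined = refine(raw)
```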

Since TOPIS data do not include speed data below the minor arterial road level, a value of 10 to 20 km/h, which is the average speed in Seoul, was assigned to the empty data cells. Finally, approximately 77,000 pre-processed records were used (77,421 in total), with 38,625 cases from 2017 and 38,796 cases from 2018.
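How a value in the 10 to 20 km/h range is assigned to each empty cell is not specified. The sketch below shows one possible interpretation, a seeded uniform draw from that range; the function name `impute_speed`, the field `speed_kmh`, and the choice of a uniform draw (rather than, say, a fixed midpoint) are assumptions.

```python
import random


def impute_speed(records, low=10.0, high=20.0, seed=0):
    """Fill missing speeds with a value drawn uniformly from [low, high] km/h.

    The uniform draw is an assumption; a fixed value such as the midpoint
    would be an equally valid reading of "10 to 20 km/h".
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    filled = []
    for record in records:
        speed = record.get("speed_kmh")
        if speed is None:
            speed = rng.uniform(low, high)
        filled.append({**record, "speed_kmh": speed})
    return filled


# Illustrative records only.
records = [
    {"link_id": "L1", "speed_kmh": 27.3},
    {"link_id": "L9", "speed_kmh": None},
]
filled = impute_speed(records)

# The yearly counts reported above sum to the total number of records.
total_records = 38_625 + 38_796
```

The function returns new records and leaves the input untouched, so the share of imputed values can still be reported afterwards.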
