#### **1. Introduction**

Detection of road anomalies like potholes, road cracks, and road safety features/obstacles like speed bumps (also called speed humps) has been investigated by the research community in recent years using a variety of sensors including cameras, Light Detection And Ranging (LiDAR) sensors, and Inertial Measurement Units (IMUs). One of the main applications for the detection of road anomalies is the monitoring of the road conditions, which can be used to repair the road surface once a road anomaly (e.g., pothole) is detected or to improve the comfort and safety of the vehicle in Advanced Driver Assistance Systems (ADAS). An additional aspect is to use the identified road anomalies as landmarks to enhance maps and improve localization to a high degree of accuracy. The upcoming evolution of modern vehicles to autonomous vehicles could benefit from this information and use it for a variety of purposes. It could be used to improve the positioning of the autonomous vehicle. It could be used to improve its travel plan for the comfort of the passengers (e.g., the autonomous vehicle could slow down before a known pothole). It could even be used to mitigate cybersecurity attacks on the positioning information (e.g., the position information would be correlated with the detected road anomaly to mitigate Global Navigation Satellite Systems (GNSS) spoofing attacks). The use of the position of the road anomaly to support localization algorithms was already mentioned in [1,2]. In addition, studies like [3] suggest that artificial position measurements obtained by detecting road anomalies associated with map locations will be increasingly used in the automotive sector and in automated vehicles, also in synergy with Simultaneous Localization And Mapping (SLAM) techniques developed by the robotics community.

Detection of road anomalies can be performed using different techniques including machine learning algorithms [4,5], and in recent times, Deep Learning (DL) algorithms have been adopted with significant success. In many cases, DL is used to detect and identify road anomalies on the basis of the images collected by the camera installed on the vehicle [6,7]. Processing of image data can be a time consuming and error prone task [5], and it is limited by the lighting (e.g., darkness) or environmental conditions (e.g., fog, rain) [8]. An alternative approach, which is not affected by these issues, is to use the data provided by the IMU, and DL was applied to such data in recent studies [9]. The application of DL for road anomaly detection using data from accelerometers and gyroscopes is therefore a recent research area, which is gaining significant momentum in the research community, and it is further explored in this paper.

This paper proposes the combination of time-frequency transform and CNN (called CNN-SP) for the detection and identification of road anomalies in the road infrastructure, which has not been proposed in the literature yet (to the knowledge of the authors) for this specific problem. As shown in the results provided in this paper, CNN-SP provides a superior classification performance to the direct application of CNN to the original time domain signal. This paper uses a relatively large set of vehicles (12) in the data collection phase to improve generalization of the results. The application of CNN-SP is evaluated using an experimental dataset collected by the authors with many hours of driving on a realistic road path with various types of road anomalies and obstacles. The approach is evaluated using data both from the accelerometer and the gyroscope. Finally, this study provides an extensive evaluation of the different sampling rates on the identification accuracy.

The structure of this paper is as follows: Section 2 provides a literature review on the detection of road anomalies using sensors installed on the vehicles with a particular focus on accelerometers and gyroscopes. Section 3 describes the materials and methods used for the analysis including the description of the adopted machine learning algorithms and the related evaluation metrics. Section 4 provides the results of the analysis including the optimization of the CNN-SP approach and a comparison among the results provided by the different machine learning algorithms. Finally, Section 5 provides the conclusions.

#### **2. Literature Review**

Detection of road anomalies and road surface conditions using the sensors installed on the vehicle has been investigated by the research community in recent years. Detection of road anomalies through cameras can provide high accuracy, especially with the recent application of deep learning. Examples of the application of deep learning (and convolutional neural networks in particular) in combination with camera images to assess road surface condition were provided in [6,7]. The processing of image data can be a time consuming and error prone task [5], and it is limited by the lighting (e.g., darkness) or environmental conditions (e.g., fog, rain) [8]. For this reason, this paper does not use an approach based on camera images, and other sensors are used.

The application of accelerometers, gyroscopes, and magnetometers to detect road anomalies like potholes and obstacles (e.g., speed humps) has been investigated by the research community in recent years [10]. This is also due to the decreasing costs of IMUs and their increasing use in the automotive sector either because they are inserted and used in the vehicle systems or because smartphones (which are equipped with IMUs) can be deployed and installed in vehicles. A very recent and detailed survey on the use of accelerometers and gyroscopes (as inertial sensing sources of information) was presented in [4], with an analysis focused on identifying methods that capture signals provided by inertial sensors to recognize transient or persistent events associated with the vehicle's movement. The results of the survey show that only a limited number of the reviewed papers used time-frequency analysis, and no paper in the review used the combination of time-frequency analysis and convolutional neural networks, as proposed in this paper.

Detection of road anomalies like potholes using accelerometers was implemented by the authors in [11,12], where a high detection rate was achieved using the data collected by the accelerometers in the Z direction, but no deep learning approach was used. In a similar way, the authors in [13] used a generic smartphone application reading data from built-in accelerometer sensors to map and measure the locations of potholes and speed bumps, which can be used to evaluate the road conditions. The data were collected in a cloud system, where the analysis was performed. The authors in [5] proposed the Android application RoadSense, which automatically predicts the quality of the road based on a triaxial accelerometer and a gyroscope. The study used frequency domain features for the classification combined with different machine learning algorithms. As in previous papers, the focus was on pothole detection to monitor the smoothness of the road surface, but the identification of the road anomalies was not attempted.

Four recent works are very similar to the study presented in this paper. In [14], different road anomalies like bump, pothole, and normal conditions (flat road) were detected and identified using an improved Gaussian background model and the K Nearest Neighbor (KNN) algorithm applied to the accelerometers' recording. The described approach provides a high recognition accuracy for road surface potholes (96.03%) and for road surface bumps (94.12%). In comparison to [14], this paper applies a more sophisticated deep learning approach in combination with a time-frequency transform, which is shown to provide a higher identification accuracy than the time domain representation (i.e., original accelerometer signal). In addition, gyroscope data are used together with the accelerometer data. Finally, this paper uses a set of 12 vehicles rather than the two vehicles used in [14], which improves the generalization of the results.

The authors of [8] proposed a novel approach to identify the profile of a pothole, which is a more challenging task than the detection alone. The depth of the pothole was used as a profile metric, and the accelerometer data collected by a smartphone were used to conduct the analysis together with the GPS data. The authors of [8] used four different vehicles to collect the data with two sampling rates of 100 and 200 Hz. Twenty-three different types of potholes were considered on an overall set of 2760 segments. In comparison to [8], this paper uses a larger set of vehicles (12), a similar set of road anomalies/obstacles (19), and different sampling rates (50, 100, 150, 200 and 250 Hz). In addition, this paper proposes a deep learning approach, while the analysis in [8] was done with Cumulative Distribution Functions (CDFs). On the other hand, the authors of [8] also investigated different placements of the smartphone in the vehicle with the result that the placement on the dashboard provided the best identification accuracy. Based on the results of [8], this paper adopts the position of the smartphone on the dashboard as well.

The authors of [15] used a DL approach based on CNN (as in this paper) to perform the detection of road anomalies. The results provided in [15] confirmed the superior performance of the adoption of CNN in comparison to shallow machine learning algorithms, which prompted the authors of this paper to apply a CNN based approach as well. In comparison to the CNN approach used by the authors of [15], where the CNN was applied directly to the data collected from the accelerometers, this paper first transforms the accelerometer data into the spectral domain using the spectrogram defined in Section 3.4, then the spectral representation is fed to a CNN (called CNN-SP in the rest of this paper). The results presented in this paper show that CNN-SP outperforms the application of CNN directly on the source data (called CNN-1D in the rest of this paper) both for accelerometer and gyroscope data and across different sampling rates. The application of CNN-SP is evaluated using an experimental dataset collected by the authors through many hours of driving on a realistic road path with various types of road anomalies. The study results presented in this paper use the dataset from [16], where it was used for the different goal of automotive vehicle authentication. Additional details on the dataset are presented in Section 3.
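As a rough illustration of the time-frequency step that distinguishes CNN-SP from CNN-1D, the sketch below computes a power spectrogram of a single 4 s segment. It is not the definition from Section 3.4, which is not reproduced here: the function name `spectrogram`, the Hann window, the window length, and the hop size are all assumptions chosen for the example, and a plain DFT is used only to keep the code free of dependencies.

```python
import cmath
import math


def spectrogram(signal, window_len, hop):
    """Power spectrogram of a real signal via a windowed short-time DFT.

    Returns a list of frames; each frame holds window_len // 2 + 1 power
    values, one per non-negative frequency bin.
    """
    # Hann window (an assumed choice) tapers each frame to limit leakage.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_len - 1))
            for n in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window_len)]
        power = []
        for k in range(window_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                        for n in range(window_len))
            power.append(abs(coeff) ** 2)
        frames.append(power)
    return frames


# Toy 4 s segment sampled at 50 Hz (200 samples) with a 5 Hz oscillation.
fs = 50
segment = [math.sin(2 * math.pi * 5 * t / fs) for t in range(4 * fs)]
image = spectrogram(segment, window_len=40, hop=20)

# Bin k corresponds to k * fs / window_len Hz, i.e. 1.25 Hz per bin here.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

The resulting two-dimensional array (time frames by frequency bins) is the kind of image-like input a two-dimensional CNN can consume, whereas a CNN-1D would receive the raw `segment` directly. A practical implementation would use an FFT routine rather than the direct DFT shown here.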

In [17], the authors used both accelerometers and gyroscopes like the study done in this paper to detect speed bumps. The authors applied a number of features to the data recorded by the accelerometer and gyroscope mounted on the vehicle. Then, a genetic algorithm was used to find a logistic model that accurately detects road abnormalities. The results were quite encouraging as the approach was able to achieve an accuracy of 0.9714 in a blind evaluation.

In [9], the authors proposed a road anomaly detection approach based on the application of three different DL algorithms: Deep Feedforward Network (DFN), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), and the results were compared. The results showed that the application of DL algorithms was very effective in detecting road anomalies, thus supporting the approach also used in this paper. In comparison to this paper, the authors in [9] used a larger set of signals beyond the accelerometers and gyroscopes used in this paper since they also included the shock responses of the four absorbers and the rotation speed. From this point of view, reference [9] represents significant progress in comparison to the literature. The authors of [9] also compared three different DL algorithms. On the other hand, this paper uses a set of 12 vehicles in comparison to the single vehicle used in [9]. This paper also evaluates the impact of different sampling rates on the identification accuracy, while [9] used only the sampling rate of 100 Hz. Finally, this paper uses a time-frequency CNN approach rather than the direct application of DL to the signals in the time domain.

Next, we summarize the progress introduced by this paper in comparison to the references identified in the previous paragraphs. As shown in the results provided in this paper, the time-frequency CNNs provide a superior classification performance to the direct application of CNN to the original time domain signal, as proposed in the literature [9,15]. In comparison to the literature [8,9,13–15,17], which used only one or two vehicles (with the exception of [8], where four vehicles were used), this paper uses a relatively large set of vehicles (12) to improve the generalization of the results. The focus of the paper is thus to investigate how accelerometer and gyroscope readings collected from different vehicles can be used to detect road anomalies and obstacles. The impact of the various speeds of the different vehicles is also mitigated to support the application of time-frequency CNN. In comparison to the literature, this study uses the data both from the accelerometers and the gyroscopes, which was done only in a few studies [9,17], as most studies used the accelerometer data alone [15]. Finally, this study provides what is, to the knowledge of the authors, a more extensive evaluation of the impact of different sampling rates on the identification accuracy.

#### **3. Materials and Methods**

#### *3.1. Materials*

As mentioned before, the dataset used for this experiment was collected with 12 different cars. The specifications of the vehicles are listed in Table 1. The same driver and co-driver were present in every car during the experimental data collection to minimize the potential bias of the driving behavior. Although it is acknowledged that in practical operations the co-driver may be different or may be absent, we wanted to limit the number of variables in the study. Future developments will investigate the presence of different passengers and their number (see Section 5). For the IMU, we used the microelectromechanical system based motion tracker supplied by Xsens (Enschede, The Netherlands) with Model Number MTi-100-2A8G4. The technical specifications of the Xsens sensor are reported in Table 2.


**Table 1.** Order and specifications (brand and model) of the cars used in the data collection.

**Table 2.** Technical specifications of the sensor used in the data collection: Xsens with Model Number MTi-100-2A8G4.


The Xsens MTi-100 sensor used for data collection is designed to measure the three axis acceleration and rate of turn at a 2000 Hz sampling rate and the three axis magnetic field at a 100 Hz sampling rate. The sampling rate used in the analysis was smaller to emulate the sampling rate of a smartphone. In particular, increasing sampling rates of 50, 100, 150, 200 and 250 Hz were used, as this is the range from low cost phones to more sophisticated phones. The IMU was mounted using a strong double sided foam tape at the same spot and orientation for every car. We decided to place the sensor on the top of the car's dashboard in the middle of the car, because it is a common placement position for the IMU in the literature [12]. In the recent review by Menegazzo et al. [4], the placement of the IMU on the top of the car's dashboard was the one most often adopted in the literature. The results from [8] also showed that the placement on the dashboard provided the smallest classification error in comparison to the placement of the IMU in other locations on the vehicle (e.g., left or right side of the car). The placement of the IMU in three vehicles is shown in Figure 1.
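The relation between the native 2000 Hz rate and the emulated smartphone rates can be made concrete with a small calculation. The sketch below is only illustrative: the helper name `resampling_plan` is hypothetical, and the 4 s segment duration is taken from Section 3.2. It expresses each target rate as a reduced up/down ratio, which shows that four of the five rates correspond to integer decimation factors, while 150 Hz does not divide 2000 Hz evenly and would need a rational (3/40) resampling step or an equivalent; how this case was handled in the data processing is not stated here.

```python
from fractions import Fraction

SOURCE_RATE_HZ = 2000                     # native IMU rate (from the text)
TARGET_RATES_HZ = (50, 100, 150, 200, 250)
SEGMENT_SECONDS = 4                       # segment duration from Section 3.2


def resampling_plan(source_hz, target_hz, segment_s):
    """Return the reduced up/down ratio and the samples per segment."""
    ratio = Fraction(target_hz, source_hz)  # automatically in lowest terms
    return {
        "up": ratio.numerator,
        "down": ratio.denominator,
        "samples_per_segment": target_hz * segment_s,
    }


plans = {rate: resampling_plan(SOURCE_RATE_HZ, rate, SEGMENT_SECONDS)
         for rate in TARGET_RATES_HZ}

for rate, plan in plans.items():
    print(rate, "Hz:", plan)
```

For 200 Hz the plan gives `up = 1` and `down = 10`, matching the factor of 10 quoted in the methodology, and a 4 s segment at 50 Hz contains 200 samples.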

**Figure 1.** Placement of the sensor used for the data collection in the vehicles for three vehicle models: (a) Fiat Panda 3; (b) Mazda3; (c) Fiat Panda 4.

The data collection was performed using the controlled positioning strategy. As described in [4], this strategy consists of a simple technique where the sensor is placed on the vehicle so that the axes in both reference frames coincide, i.e., the sensor axes are aligned with the vehicle axes, and it is not necessary to apply preprocessing for reorientation. This technique was used for both accelerometer and gyroscope data. Then, the *x*-axis of the IMU was always pointing towards the driving direction and the *z*-axis in the vertical direction. A description of the reference frames used in this paper is shown in Figure 2 with a depiction of the body frame and the world frame. The data collection and analysis were performed on all three axes' data both for the accelerometers and gyroscopes, but the accelerometer *Z* and the gyroscope *Y* provided the optimal classification accuracy.

**Figure 2.** Reference frames.

The description of the brand and model of each of the 12 cars is shown in Table 1. The models were chosen to have a significant number of cars of the same model (the Fiat Panda), but to include as well car models (Mazda3, Octavia) that are different from the Fiat Panda regarding the weight and engine power. The use of 12 different vehicles mitigated the problem of defining a model for pothole detection based on a single vehicle because the data collected by each vehicle were shuffled in the classification process.

The path where the vehicles were driving was a loop in the European Commission Joint Research Centre (JRC) premises. The path is slightly longer than 2 km. Since the data collection was performed for each of the 12 vehicles driving 20 times on the loop, the entire driving data collection was performed on more than 480 km of road. The JRC campus was built in 1960–1965, and the road infrastructure was more than 50 years old at the time of writing this paper. The road infrastructure is well maintained, and it is all asphalted, even if the asphalt is relatively old and presents various road anomalies like potholes, cracks, transverse cracks, and patches. Different parts of the loop also have different maintenance records because the loop crosses different sectors of the JRC campus. As a result, the road surface condition is uneven across the different parts of the loop. The loop includes 15 main road anomalies and 4 obstacles of two different types, which were selected for the data analysis. With reference to the classification of exteroceptions identified in [4], the road anomalies included potholes, cracks, transverse cracks, and patches, while the obstacles were of types rumble strips and speed humps/speed bumps. Pictures of some examples of the speed bumps and the road anomalies (potholes in the road) are shown in Figure 3, with the related recordings of the accelerometer in the *Z* direction in Figure 4.

**Figure 3.** Examples of obstacles and road anomalies (picture).

The road anomalies are identified in the rest of the paper with the identifier RFX with X = 01, ..., 15 while the obstacles are identified with SBY with Y = 01, ..., 04. The segments of the loop, which are outside RFX and SBY, are identified as NORMZ with Z = 01, ..., 22. See Section 3.2 for further details. Note that even the NORM road segments are not exempt from the presence of small road anomalies or uneven surfaces. The map with the pictorial description of the driving path and the position of the road anomalies/obstacles is shown in Figure 5.

**Figure 4.** Examples of obstacles and road anomalies (recordings with the accelerometer in the Z vertical direction).

**Figure 5.** Map of the driving path/loop with the position of the road anomalies/obstacles.

#### *3.2. Methodology*

The overall methodology is described in Figure 6, and each step is described in the following paragraphs:

• Normalization and synchronization: As a first step, the data collected from the IMU in each of the 12 vehicles were synchronized among the 20 laps and normalized. A GNSS receiver was also installed in the vehicle and synchronized with the IMU. This process was repeated for different sampling rates: 50, 100, 150, 200 and 250 Hz. All these sampling rates were obtained by downsampling the initial data collected at 2000 Hz from the IMU by the related factor (e.g., a factor of 10 for 200 Hz). In this paper, we consider only the analysis of the accelerometer data in the *Z* direction (the vertical direction) and the gyroscope in the *Y* direction (in the direction of the vehicle). The reason for this choice was to minimize the degrees of freedom in the analysis and because a heuristic analysis of the data related to the other axes of the accelerometers and gyroscopes showed that the obtained accuracy was inferior to the one obtained using the accelerometer in the *Z* direction and the gyroscope in the *Y* direction. This is to be expected because the IMUs were mostly stimulated along those axes by the roughness of the road surface, and this is also consistent with the literature [11]. In the rest of this paper, the accelerometer data in the *Z* direction is called AccZ, and the gyroscope data in the *Y* direction is called GyroY. The relationship between the RPY angles and the ENU coordinates was the same as described in [16]. The synchronization was performed by applying the moving variance to the data from the accelerometer and by correlating the results across the laps and the vehicles. As is well known in the literature, full synchronization is not always possible because the vehicles move at different speeds over the road anomalies, and each vehicle has its own response to the stimulus created by the road anomaly. On the other hand, these are problems derived from data collection in a realistic environment, and such processing issues are likely to be present in any real scenario.

• Segmentation and labeling: The entire loop was divided into three different types of segments, and each segment was labeled as normal, with a road anomaly identifier (e.g., an asphalt break is RF15), or with an obstacle identifier (e.g., SB01). There were 4 obstacles (identified with SB01, SB02, SB03 and SB04), 15 road anomalies or road features (identified with RFX and X = 01, ..., 15), and 22 normal segments (identified with NORM), which made up the majority of the loop. Each segment had a time duration of 4 s. This time duration was chosen because it was long enough to include the driving time of a vehicle over each of the road anomalies/obstacles considered in the study. The approach was tested on different driving speeds since each vehicle (of the dataset of 12 vehicles) was driving at a different speed and the speed was different in each loop (20 loops), even for the same vehicle. The data collection was therefore representative of real-world conditions, where vehicle speeds may be different. Indeed, a focus of the study was to evaluate whether the proposed approach was able to compensate for data collected at different speeds. As the segment duration was fixed, the sample length of each segment was longer for higher sampling rates (e.g., a segment was 200 samples long for a sampling rate of 50 Hz). The labels were generated by using the GNSS position and by manually checking that the road anomaly or the obstacle was correctly assigned to each segment. This manual step was needed because the GNSS position may not be accurate enough to identify the precise location of the road anomaly/feature.
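The synchronization idea described in the steps above (a moving variance followed by a correlation across laps) can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing actually used for the dataset: the function names `moving_variance` and `best_lag`, the window length, the lag search range, and the toy signals are all hypothetical.

```python
from statistics import pvariance


def moving_variance(signal, window):
    """Population variance over each full sliding window of `window` samples."""
    return [pvariance(signal[i:i + window])
            for i in range(len(signal) - window + 1)]


def best_lag(reference, other, max_lag):
    """Lag (in samples) by which `other` trails `reference`.

    The lag is the shift in [-max_lag, max_lag] that maximizes the raw
    cross-correlation between the two sequences.
    """
    def score(lag):
        return sum(reference[i] * other[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)


# Toy example: the same "bump" appears 5 samples later in the second lap.
bump = [1.0, -1.0, 1.0, -1.0, 1.0]
lap_a = [0.0] * 20 + bump + [0.0] * 25
lap_b = [0.0] * 25 + bump + [0.0] * 20

lag = best_lag(moving_variance(lap_a, 5), moving_variance(lap_b, 5), max_lag=10)
aligned_lap_b = lap_b[lag:]  # shift the second lap so that the bumps coincide
```

Correlating the variance profiles rather than the raw signals makes the alignment depend on where the road excites the sensor, not on the exact waveform, which differs between vehicles and speeds. As noted above, such an alignment can only be approximate when the speed over the anomaly changes from lap to lap.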

As we had 12 vehicles for 20 loops, the analysis took into consideration a total of 12 × 20 × 41 segments = 9840 segments based on (4 + 15 + 1) = 20 different classes.
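The segment and class counts above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the per-class counts are derived values that assume every segment of every lap is retained.

```python
VEHICLES = 12
LAPS_PER_VEHICLE = 20
SEGMENTS_PER_LAP = {"obstacle": 4, "road_anomaly": 15, "normal": 22}

segments_per_lap = sum(SEGMENTS_PER_LAP.values())
total_segments = VEHICLES * LAPS_PER_VEHICLE * segments_per_lap

# Each obstacle and each road anomaly is its own class; all normal
# segments share the single NORM class.
labels = ([f"SB{i:02d}" for i in range(1, 5)]
          + [f"RF{i:02d}" for i in range(1, 16)]
          + ["NORM"])
num_classes = len(labels)

# Derived per-class sample counts (assuming no segment is discarded).
samples_per_anomaly_class = VEHICLES * LAPS_PER_VEHICLE
samples_norm_class = samples_per_anomaly_class * SEGMENTS_PER_LAP["normal"]

print(segments_per_lap, total_segments, num_classes)
print(samples_per_anomaly_class, samples_norm_class)
```

Under these assumptions each obstacle or road anomaly class contributes 240 segments while the NORM class contributes 5280, so the class distribution is strongly imbalanced towards normal road segments.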


**Figure 6.** Overall methodology.
