1. Introduction
Terrain classification is an active area of research with a wide range of applications, e.g., outdoor terrain navigation, the recommendation of floor types for health care environments, sports flooring, consumer suggestion systems and autonomous driving [1,2]. In the literature, several terrain classification approaches have been proposed based on different types of data (e.g., visual, acoustic, physical touch, etc.), typically acquired using optical cameras, 3D laser scanners and on-board sensors mounted on humanoid robots [3,4,5], autonomous off-road driving vehicles [6,7,8] and aerial platforms [9].
The traditional camera-based approaches employ visual features to distinguish different terrain types. One of the earliest approaches was proposed by Weszka et al. [10], who performed terrain classification using automatic texture measures. Anantrasirichai et al. [11] also presented an algorithm that used visual features captured during human walking. Similarly, Ma et al. employed aerial image data for terrain classification to support off-road navigation of ground vehicles using low-rank sparse representation. Peterson et al. [12] also used imagery acquired from an aerial vehicle for ground robot navigation. With the aim of terrain classification, Dornik et al. [6] classified different soil types using geographic object-based analysis of images. Laible et al. [7] fused color information with 3D scans obtained from LiDAR to perform terrain classification. Similarly, Ojeda et al. [13] employed fused data acquired from a suite of sensors consisting of a microphone, accelerometer, gyroscope, infrared sensor and motor current sensor to train a feedforward neural network for terrain classification.
From a robotics perspective, Wu et al. [3] proposed a small legged robot that used an array of miniature capacitive tactile sensors to directly measure ground reaction forces (GRF) and used them to classify terrains. Zhang et al. [14] also sensed ground forces using a force/torque sensor for biomimetic hexapod robots walking on unstructured terrains. Similarly, Giguere et al. [4] described a tactile probe with a single-axis accelerometer for surface classification by mobile robots. Belter et al. [5] also addressed the issue of terrain perception and classification using noisy range data acquired via a laser scanner and a terrain mapping module in humanoid robots. Valada et al. [15] performed robotic acoustic-based terrain classification that exploited the sound waves originating from the terrain–vehicle interaction to build a spectrogram image, which was fed to a convolutional neural network to learn deep features for subsequent terrain classification. Rothrock et al. [16] framed terrain classification as a semantic segmentation problem in which they employed DeepLab to visually classify terrain types and then registered them with slope angles and wheel slip data to generate a prediction model for the Mars rover mission. Similarly, for planetary missions, Brooks et al. [17] also analyzed the vibration patterns arising from terrain–vehicle interaction to distinguish different terrain types. Yaguang et al. [18] proposed a terrain classification method based on a combination of speeded-up robust features for an autonomous multi-legged walking robot.
In the context of autonomous off-road driving, the terrain classification problem has been actively studied in the last decade. For instance, Manduchi et al. [1] presented a technique for terrain classification and obstacle detection for autonomous navigation using a single-axis LiDAR and a stereo sensor pair. Dupont et al. [19] and Lu et al. [20] also proposed terrain classification for autonomous ground vehicles based on vehicle-induced vibrations and a laser-stripe-based structured light sensor. Further, Delmerico et al. [21] performed terrain classification using rapid training of a classifier for autonomous air and ground robots. In robotic terrain classification, the performance of the proposed methods depends on an appropriate navigation strategy, machine vibration and obstacles along the path.
Although these approaches work well, their accuracy is constrained by several challenges. For example, in the case of visual sensors, motion, occlusion and changes of appearance in the visual data due to varying illumination conditions degrade performance. Similarly, the robotics- and autonomous off-road driving-based approaches involve trade-offs in terms of accuracy, cost, and restrictive ambient conditions [3]. Owing to these issues, reliably classifying terrain types while maintaining accuracy with a low-cost solution is a highly challenging task. Moreover, there is a need to explore alternatives to traditional terrain classification approaches. For instance, it is well known that human beings capture information about terrain during walking by sensing it with their feet and by the sound of their footsteps [22]. The kinematic properties of human motion patterns allow capturing motion data for gait analysis, which in turn has been used as a reliable source for activity recognition [23] and for estimating soft biometrics, including gait-based age estimation [24,25], gender classification [24,26], emotion recognition [27] and human authentication/identification [28,29]. Moreover, ubiquitously available modern devices such as smartphones and wearables are typically equipped with many sensors. For instance, their on-board/embedded inertial measurement units (IMUs), consisting of tri-axial accelerometers, tri-axial gyroscopes and tri-axial magnetometers, provide inertial data (i.e., 3D accelerations and angular velocities with an acceptably low noise level) at no additional cost. Furthermore, together with such sensors, these smart devices are usually equipped with powerful processors capable of performing computationally demanding tasks and can thus capture and analyze inertial data without compromising the normal use of the device. This has created many opportunities to solve real-life problems [30], including soft biometrics classification [31], reconstruction of human motion [32], and measuring physical health and basic activities.
The use of smartphone inertial data for terrain classification has not yet been extensively explored. To the best of our knowledge, there exist only a few methods that utilize inertial data of human gait for terrain classification [33,34].
Table 1 highlights the main approaches in terrain classification. In this context, this paper proposes a novel terrain classification framework that uses tempo-spectral features extracted from inertial data collected with a smartphone. The extracted features are used to train machine learning classifiers (support vector machine and random forest) and predict terrain types. More precisely, the gait patterns of normal human walking over six different terrain types with variations in hardness and friction were recorded with inertial sensors. The main contributions of the proposed approach are the following:
We collected gait data of 40 healthy participants using body-mounted inertial sensors (embedded in smartphones) attached at two body locations, i.e., the chest and the lower back. The data were collected on six different types of terrain: carpet, concrete floor, grass, asphalt, soil, and tiles (as explained in Section 2.4). The data can be freely accessed by sending an email to the corresponding author.
We propose a set of 194 hand-crafted tempo-spectral features per stride, which can be used to train different supervised learning classifiers (random forest and support vector machine) to predict terrain types, as sketched in the illustrative example after this list. The prediction accuracy remained above 90% for terrains grouped into different classes such as indoor–outdoor, hard–soft, and combinations of binary, ternary, quaternary, quinary and senary terrain classes (details in Section 2.4 and Section 3).
From the experimental results, we found that the lower back is a more suitable sensor placement than the chest for the task of terrain classification, as it produced the highest classification accuracies (details in Section 4.1).
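To make the feature representation more concrete, the snippet below sketches how a few time-domain and spectral-domain descriptors could be computed for a single stride of one inertial axis. This is a minimal illustrative Python example, not the authors' implementation; the function name, the chosen descriptors and the sampling rate are assumptions, and the full 194-feature set proposed in this work is defined later in the paper.

```python
# Minimal sketch (illustrative only) of per-stride tempo-spectral feature
# extraction from one inertial axis. The paper's full feature set contains
# 194 features (110 time-domain + 84 spectral-domain); these are examples.
import numpy as np

def stride_features(stride, fs=100):
    """Return a few example time-domain and spectral-domain features."""
    feats = {
        # time-domain examples
        "mean": np.mean(stride),
        "std": np.std(stride),
        "rms": np.sqrt(np.mean(stride ** 2)),
        "range": np.ptp(stride),
    }
    # spectral-domain examples computed from the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(stride - np.mean(stride)))
    freqs = np.fft.rfftfreq(len(stride), d=1.0 / fs)
    psd = spectrum ** 2
    p = psd / (psd.sum() + 1e-12)
    feats.update({
        "dominant_freq": freqs[np.argmax(spectrum)],
        "spectral_energy": psd.sum(),
        "spectral_entropy": -np.sum(p * np.log2(p + 1e-12)),
    })
    return feats
```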
4. Discussion
4.1. Summary of Findings
The goal of this work was to classify the type of terrain from the inertial data of strides collected with a single body-mounted IMU. 6D inertial signals (3D accelerations and 3D angular velocities) were recorded with two Android-based smartphones (on-board MPU-6500 IMUs). The sensors were mounted at two different body locations, i.e., the chest (smartphone) and the lower back (smartphone). A total of 40 volunteers participated in the data collection sessions, and their gait data were recorded on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles. A valley detection method was used to segment the low-level 6D gait signals into strides. A total of 194 tempo-spectral features were computed for each stride, comprising 110 time-domain features and 84 spectral-domain features. Random forest and support vector machine were chosen as predictors, and 10-fold cross-validation was used as the validation scheme.
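The sketch below illustrates the two core steps summarized above: segmenting a gait signal into strides at local minima (valleys) and evaluating random forest and SVM classifiers with 10-fold cross-validation. It is a minimal Python/scikit-learn illustration under assumed parameters (sampling rate, minimum stride duration, classifier hyperparameters, placeholder data), not the authors' code.

```python
# Illustrative sketch (not the authors' exact implementation): valley-based
# stride segmentation and 10-fold cross-validation with RF and SVM.
import numpy as np
from scipy.signal import find_peaks
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def segment_strides(acc_vertical, fs=100, min_stride_s=0.8):
    """Split a vertical-acceleration signal into strides at local minima (valleys)."""
    # Valleys of the signal are peaks of its negation; a minimum stride
    # duration prevents noise from creating spurious segment boundaries.
    valleys, _ = find_peaks(-acc_vertical, distance=int(min_stride_s * fs))
    return [acc_vertical[s:e] for s, e in zip(valleys[:-1], valleys[1:])]

# X: (n_strides, 194) tempo-spectral feature matrix, y: terrain labels
X, y = np.random.rand(600, 194), np.random.randint(0, 6, 600)  # placeholder data

for name, clf in [("RF", RandomForestClassifier(n_estimators=100)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```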
Figure 9 compares the classification accuracies achieved with SVM and RF for each sensor location. The highest classification accuracy was achieved with the lower back sensor data, followed by the chest data. Furthermore, SVM performed better with fewer classes, i.e., indoor–outdoor, hard–soft, binary and ternary classification, whereas RF performed better with more classes, i.e., quaternary, quinary and senary classification. This holds for the feature sets computed from the data collected with sensors attached at both the lower back and chest positions. The gradual drop in classification accuracy as the number of classes increases is due to natural similarities between different types of surfaces, which caused a significant number of samples to be misclassified.
4.2. Comparison with Existing Approaches
Terrain classification has been extensively studied in the domain of humanoid robots and autonomous mobile vehicles; however, we found only a few studies that focus on terrain classification using inertial data of human gait. In the experimental setup of Hu et al. [33], an IMU was mounted on the L5 vertebra of the subjects, and the gait data of 35 subjects were recorded on two different types of surfaces: flat surfaces and uneven bricks. A deep learning model with long short-term memory units was used for training and prediction, achieving a surface classification accuracy of 96.3%. Anantrasirichai et al. [11] used visual features captured from body-mounted cameras during human locomotion. They considered three classes of terrain, i.e., hard surfaces, soft surfaces, and unwalkable surfaces, and reported a classification accuracy of 82%. Diaz et al. [34] proposed a terrain identification and surface inclination estimation system for a prosthetic leg using visual and inertial sensors. They recorded data on six different surfaces and achieved an average classification accuracy of 86%. Libby et al. [48] performed acoustic-based terrain classification for robots using the sound generated by the vehicle–terrain interaction and reported a classification rate of 92%. They used a sliding-window operation for feature extraction, which is less time-efficient than the proposed per-stride method. In comparison to these approaches, the proposed approach was tested on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles, and its classification accuracies outperformed the others in all cases, as shown in Table 8.
4.3. Limitations
In our experiments, we only collected data on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles. However, many other practical terrains exist, such as pebbles, sand, gravel, exercise mats, footpaths, etc., which should be considered in future data collection. This would help in analyzing the behavior of the proposed approach over a broader spectrum of surfaces. Similarly, our database consists of only 40 subjects with a male-to-female ratio of 30:10, which is unbalanced, as only 25% of the population is female. Extending the database, including more female participants to balance the population, and testing the proposed approach on the extended dataset is another important direction for future work. Another limitation of the proposed approach is the placement of the sensors, i.e., the chest and the lower back. Many other practical sensor placement locations exist, such as the wrists, side and back pockets, and chest pockets, which should also be considered. This would enable terrain classification in a more ubiquitous manner. Another important aspect is that, although the data from both sensors were collected simultaneously, there was no stringent requirement for synchronization, as the two time series were independent of each other. Human intervention was only needed in the pre-processing step for training data preparation, to retain the portions of the time series in which the subject was walking. Since we were interested in terrain classification from single strides, the strides were extracted automatically using local minima in the time series data. For practical applications, automatic segmentation of strides and decision making (for terrain classification) using sequential analysis are indeed directions for future work; a simple illustration of such sequential decision making is sketched below.
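As a rough illustration of what such sequential decision making could look like, the snippet below aggregates per-stride predictions with a sliding majority vote. This is an assumed, minimal sketch and not part of the proposed method; the window length and the majority-vote rule are arbitrary choices made for illustration only.

```python
# Assumed illustration (not part of the proposed method): aggregate
# per-stride terrain predictions with a sliding-window majority vote.
from collections import Counter

def sequential_decision(stride_predictions, window=5):
    """Return a terrain decision for each stride based on the last `window` predictions."""
    decisions = []
    for i in range(len(stride_predictions)):
        recent = stride_predictions[max(0, i - window + 1): i + 1]
        decisions.append(Counter(recent).most_common(1)[0][0])
    return decisions

# Example: noisy per-stride predictions over a grass-to-asphalt transition
preds = ["grass", "grass", "soil", "grass", "asphalt", "asphalt", "grass", "asphalt"]
print(sequential_decision(preds))
```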