2.2.2. Terrain Classification Module

An online terrain classification system is needed to collect terrain information through the Kinect sensor and to extract key points from a color image of the terrain. Because the terrain classifier is based on the bag-of-words (BoW) model [26], all extracted features are grouped by a clustering algorithm so that the features within each cluster are highly similar. The resulting cluster centers form the visual vocabulary. The terrain images are then encoded to form the visual dictionary and a visual-vocabulary frequency histogram corresponding to each terrain type. Finally, this information is used to train a support vector machine (SVM) [27], which finds an optimal separating hyperplane for each terrain type so that all terrain types can be classified. The SVM retains the key samples (the support vectors) and eliminates many redundant samples.

The main structure of the terrain classification system is shown in Figure 3. The system is divided into two parts: training and testing. In the training part, whose data flow is shown on the right in Figure 3, the information of all terrain types is collected and stored in memory. Local features are then extracted from the stored images and clustered by the k-means algorithm to generate a certain number of visual words [28]. Next, the terrain images are encoded using the BoW model to form the visual dictionary and the visual-vocabulary frequency histogram corresponding to each terrain type, and this information is used to train the SVM. The testing part, shown on the left in Figure 3, validates the terrain classification system established in the training part. Local features are extracted from the images in the testing set and encoded with the visual word dictionary. The resulting frequency histograms are input to the trained SVM to obtain the terrain label. This part is used by the hexapod robot for terrain recognition, and the identified terrain guides the robot's gait transform algorithm.
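The vocabulary-building and encoding steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`build_vocabulary`, `encode_image`, `nearest_word`), the two-dimensional toy descriptors, the random initialization, and the iteration count are all assumptions chosen for illustration, and the SVM training step is omitted. Real SURF descriptors would have 64 or 128 dimensions.

```python
import math
import random


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, n_words, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the final cluster centers act as visual words."""
    rng = random.Random(seed)
    # Assumption: initialize centers from randomly chosen descriptors.
    centers = [list(d) for d in rng.sample(descriptors, n_words)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_words)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode_image(descriptors, vocabulary):
    """Normalized visual-word frequency histogram for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data standing in for local features pooled over the training images.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(30)]
blob_b = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(30)]

vocabulary = build_vocabulary(blob_a + blob_b, n_words=2)
histogram = encode_image(blob_a[:10] + blob_b[:5], vocabulary)
```

In this sketch, each image is reduced to a fixed-length vector whose length equals the vocabulary size, regardless of how many key points the image contains. This is what allows histograms from different images to be passed to a single classifier.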

**Figure 3.** Diagram of terrain classification system. BoW: bag-of-words; SVM: support vector machine.

In this paper, a dataset was created using six common terrains: grass, asphalt, sand, gravel, tile, and soil. Each terrain image set contained 50 samples, which were acquired by a Kinect camera. A set of sample terrain images under different illumination and weather conditions is shown in Figure 4. *K*-fold cross-validation with *K* = 5 was used [29]. All images were randomly partitioned into five equally sized groups. Each group in turn served as the validation data for testing the classifier, while the other four groups formed the training set.
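With six terrains and 50 samples each, the dataset holds 300 images, so each of the five groups contains 60 images and each training set contains 240. The partitioning can be sketched as follows. The helper names (`make_folds`, `cross_validation_splits`), the image identifiers, and the random seed are hypothetical and only illustrate the procedure.

```python
import random

TERRAINS = ["grass", "asphalt", "sand", "gravel", "tile", "soil"]
SAMPLES_PER_TERRAIN = 50
K = 5


def make_folds(items, k, seed=0):
    """Shuffle the items and split them into k equally sized groups."""
    if len(items) % k != 0:
        raise ValueError("number of items must be divisible by k")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // k
    return [shuffled[i * size:(i + 1) * size] for i in range(k)]


def cross_validation_splits(folds):
    """Yield (training, validation) pairs, holding out each group once."""
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


# Hypothetical image identifiers: (terrain label, sample index).
images = [(t, n) for t in TERRAINS for n in range(SAMPLES_PER_TERRAIN)]
folds = make_folds(images, K)
splits = list(cross_validation_splits(folds))
```

Because the partition here is purely random over all images, a group is not guaranteed to contain exactly 10 images of each terrain. Whether the original partition was balanced per terrain is not stated, so this sketch makes no such assumption.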

**Figure 4.** Different terrains and corresponding numbers of speeded up robust features (SURF) key points.
