2.3.2. Image Infilling for Mixed Terrain

After the first round of classification, both the terrain label and the confidence score of the classified image are obtained. In an SVM, the confidence score represents the geometric margin (distance) between the classified image and the separating hyperplane of each terrain type. These raw distances are not bounded, so the confidence scores are normalized to the interval [0, 1] before analysis to facilitate comparison. Set *Sd* contains the confidence scores of all terrain types, and *di* is the confidence of the test image for the *i*-th terrain class before normalization. Set *SD* contains the confidence scores after normalization, and *Di* is the normalized confidence. After normalization we obtain

$$S_D = \left\{ D_i \,\middle|\, D_i = \frac{|d_i|}{\sum_{j=1}^{6} |d_j|},\ i = 1, 2, \dots, 6 \right\}. \tag{10}$$
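The normalization in Equation (10) can be illustrated with a minimal sketch. The function name `normalize_confidences` and the six raw scores are hypothetical and chosen only for illustration; they are not values from the experiments.

```python
def normalize_confidences(raw_scores):
    """Map raw SVM scores d_i to weights D_i = |d_i| / sum_j |d_j|."""
    magnitudes = [abs(d) for d in raw_scores]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all confidence scores are zero; cannot normalize")
    return [m / total for m in magnitudes]


# Hypothetical raw scores for six terrain classes (illustrative values only).
raw = [2.0, -0.5, 0.5, -1.0, 0.25, 0.75]
weights = normalize_confidences(raw)
```

Because Equation (10) uses the absolute value, the sign of each raw score does not affect its normalized weight, and the weights always sum to 1.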

Moreover, a pie chart of the normalized confidence scores clearly shows the contribution of each terrain type. Pie charts of the confidence scores after the first round of classification are shown in Figure 8. In images of single terrain, the weight of one terrain is much higher than the weights of the others. A series of experiments demonstrated that if the highest terrain weight is larger than 30% and more than 10% higher than the second-highest weight, the image can be considered single terrain; otherwise, it is mixed terrain. For mixed terrain, it is difficult to identify the category from the weights in the pie chart. Mixed terrain usually appears at the intersection of different terrains. Traditional methods are not practical for images that contain two or more terrain types, because they output only one label. Some approaches identify the boundaries between different terrains in an image before making a decision. In practice, however, terrain boundaries are difficult to determine accurately, and such algorithms are computationally expensive, which degrades real-time performance and affects the robot's outdoor walking. As the robot moves forward, the terrain type changes gradually, so different types of terrain appear one above the other in the images. Taking this into consideration, a new method for identifying mixed terrain based on super-pixel image infilling (SPI) is presented.

**Figure 8.** Distribution of terrain types: (**a**) single terrain; (**b**) mixed terrain.
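The single-versus-mixed decision rule described above can be sketched as follows. This is a minimal illustration, assuming that "more than 10% higher" means a gap of more than 10 percentage points between the two largest normalized weights; the function name `is_single_terrain` and the example weights are hypothetical.

```python
def is_single_terrain(weights, min_top=0.30, min_gap=0.10):
    """Return (is_single, top_index) for normalized terrain weights.

    Assumption: the 10% criterion is an absolute gap in normalized weight.
    """
    if len(weights) < 2:
        raise ValueError("need at least two terrain classes")
    # Indices of the classes, sorted from highest to lowest weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    top, second = weights[order[0]], weights[order[1]]
    single = top > min_top and (top - second) > min_gap
    return single, order[0]


# Illustrative weights, not measured data.
single_case = [0.55, 0.15, 0.10, 0.08, 0.07, 0.05]
mixed_case = [0.32, 0.28, 0.15, 0.10, 0.08, 0.07]
print(is_single_terrain(single_case))
print(is_single_terrain(mixed_case))
```

In the second example the top weight exceeds 30%, but its lead over the runner-up is too small, so the image would be passed on to the mixed-terrain procedure.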

In the field of image segmentation, super-pixels have become a fast-developing image preprocessing technique. Ren et al. [35] first proposed the concept of super-pixels, which quickly divide an image into a number of semantically meaningful subareas. Compared with traditional processing methods, extracting and representing super-pixels is more conducive to collecting the local characteristics of the image, and it can greatly reduce the computation and the complexity of subsequent processing. Existing segmentation algorithms generally differ in the controllability of the number of super-pixels, the compactness, the segmentation quality, and the practicability. Song et al. [36] evaluated the existing super-pixel segmentation algorithms. Their results indicate that the simple linear iterative clustering (SLIC) super-pixel segmentation algorithm performs well in terms of the controllability of both the number of super-pixels and their compactness. The SLIC algorithm is therefore used to segment mixed-terrain regions. The largest super-pixel is selected as the target area, smaller areas are filtered out, and the coordinates of its boundary pixels are fitted with a line, which is taken as the terrain boundary of the mixed-terrain image. The procedure and results are shown in Figure 9.

**Figure 9.** Segmentation result of mixed terrain images: (**a**) simple linear iterative clustering (SLIC) algorithm; (**b**) maximum super-pixel extraction; (**c**) filtering out smaller areas; (**d**) finding the boundary and fitting the line; (**e**) results.
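Steps (b) and (d) of this procedure can be sketched on a label map that is assumed to come from a SLIC implementation (for example, `skimage.segmentation.slic`). The helper names, the 4-neighbour boundary test, and the toy label map below are illustrative assumptions; the filtering of smaller areas in step (c) is omitted.

```python
from collections import Counter


def largest_superpixel(labels):
    """Return the label that covers the most pixels in a 2-D label map."""
    counts = Counter(label for row in labels for label in row)
    return counts.most_common(1)[0][0]


def boundary_points(labels, target):
    """Collect (row, col) of target pixels that touch a different label."""
    height, width = len(labels), len(labels[0])
    points = []
    for r in range(height):
        for c in range(width):
            if labels[r][c] != target:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if inside and labels[rr][cc] != target:
                    points.append((r, c))
                    break
    return points


def fit_line(points):
    """Least-squares fit of row = slope * col + intercept."""
    n = len(points)
    mean_c = sum(c for _, c in points) / n
    mean_r = sum(r for r, _ in points) / n
    var_c = sum((c - mean_c) ** 2 for _, c in points)
    if var_c == 0:
        raise ValueError("vertical boundary; fit col as a function of row")
    cov = sum((c - mean_c) * (r - mean_r) for r, c in points)
    slope = cov / var_c
    return slope, mean_r - slope * mean_c


# Toy 6x6 label map: label 0 fills the upper two rows, label 1 the rest.
labels = [[0] * 6 for _ in range(2)] + [[1] * 6 for _ in range(4)]
target = largest_superpixel(labels)
slope, intercept = fit_line(boundary_points(labels, target))
```

Fitting the row coordinate as a function of the column suits the up-and-down arrangement of terrains described earlier, where the boundary is expected to run roughly horizontally across the image.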

Classification results after image segmentation are shown in Table 3. The output labels do not match the actual terrain types. To examine this mismatch, the number of points of interest extracted from the segmented images is shown in Figure 10. Compared with the original terrain images in Figure 4, the number of feature points in the segmented images is still related to terrain type but is much smaller. An accurate prediction cannot be made from the segmented image alone, because the feature points are inadequate. Thus, copies of the segmented image are spliced together to enhance the terrain features. A segmented color image retains only some of the pixels of the original color image collected by the Kinect camera, and the blank pixels are infilled by duplicating the segmented image. In the test, the rotation–inversion operation is used for image infilling. The results are shown in Figure 11.
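The text does not spell out the exact rotation–inversion operation, so the following is only one possible reading: each blank pixel is filled from a copy of the segmented image rotated by 180 degrees. The function name `infill_by_rotation`, the boolean mask representation, and the toy image are assumptions made for illustration.

```python
def infill_by_rotation(image, mask, blank=None):
    """Fill pixels where mask is False from the 180-degree rotated image.

    image: 2-D list of pixel values; mask: 2-D list of bools (True = kept).
    Pixels whose rotated counterpart is also blank are set to `blank`.
    """
    height, width = len(image), len(image[0])
    filled = []
    for r in range(height):
        row = []
        for c in range(width):
            if mask[r][c]:
                row.append(image[r][c])
            else:
                # Position of this pixel's counterpart after a 180-degree turn.
                rr, cc = height - 1 - r, width - 1 - c
                row.append(image[rr][cc] if mask[rr][cc] else blank)
        filled.append(row)
    return filled


# Toy 4x3 image: the segmentation kept only the lower two rows.
image = [[0, 0, 0], [0, 0, 0], [1, 2, 3], [4, 5, 6]]
mask = [[False] * 3, [False] * 3, [True] * 3, [True] * 3]
result = infill_by_rotation(image, mask)
```

In this sketch the kept region covers half of the image, so the rotated copy fills every blank pixel. When the kept region is smaller, some pixels stay blank and would need further duplicated copies.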


**Table 3.** Image infilling and confidence scores.

**Figure 10.** Number of feature points in segmented images.

**Figure 11.** Number of feature points in spliced images.

The numbers of feature points in Figures 10 and 11 show that the proposed method can enhance the local features of segmented images. The classification results for spliced images using this approach are shown in Table 3. With the image infilling approach (rotation–inversion), the erroneous results of the first-round classification can be corrected. The confidence scores of the correct terrain type increased after image infilling, whereas the confidence scores of the wrong terrain types decreased. This indicates that the proposed method can effectively magnify image features for the classifier.
