3.1.1. Discrete Object Recognition
The YOLO family of detectors can extract category information from images. Among them, YOLOv5 performs well in traffic target detection, and it needs only GPU training to produce a low-cost, robust, real-time, and high-quality target detection model. YOLOv5 consists of four parts: input, backbone, neck, and head. In this paper, we use YOLOv5 to recognize the type and location of specific obstacles and individual pieces of garbage.
Many objects on the roadway, such as leaves or bottles, tend to lie close together and thus form a group or cluster. Treating them as discrete objects would only increase the number of nodes to be visited in the TSP, leading to a steep increase in computation cost. Therefore, as shown in Figure 3, we introduce groups, in which neighboring targets of the same type are treated as one cluster, so as to lower the computational cost.
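The grouping idea can be illustrated with a minimal sketch. The text does not specify the grouping rule, so the example below assumes single-linkage clustering with a distance threshold; the function name `group_detections`, the `(label, x, y)` input format, and the `radius` parameter are hypothetical.

```python
from collections import defaultdict
from math import hypot

def group_detections(detections, radius):
    """Group same-label detections whose centers are chained within `radius`.

    detections: list of (label, x, y) tuples (hypothetical format).
    Returns a list of clusters with label, member indices, and centroid.
    """
    parent = list(range(len(detections)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of same-label detections that lie within `radius`.
    for i, (li, xi, yi) in enumerate(detections):
        for j in range(i + 1, len(detections)):
            lj, xj, yj = detections[j]
            if li == lj and hypot(xi - xj, yi - yj) <= radius:
                parent[find(i)] = find(j)

    members = defaultdict(list)
    for i in range(len(detections)):
        members[find(i)].append(i)

    clusters = []
    for idx in members.values():
        cx = sum(detections[i][1] for i in idx) / len(idx)
        cy = sum(detections[i][2] for i in idx) / len(idx)
        clusters.append({"label": detections[idx[0]][0],
                         "members": idx,
                         "centroid": (cx, cy)})
    return clusters
```

Each cluster centroid can then stand in for its members as a single node, which is how grouping would reduce the number of nodes passed to the TSP.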
Further, in practical scenarios, irregularly distributed garbage often stacks into piles that are higher than the brush height of the sweeper, which makes it impossible for the sweeper to complete the sweeping work. To handle this situation, we process the LIDAR point cloud. LIDAR emits laser beams to detect the position, velocity, and other characteristic quantities of a target. Figure 4 shows the process of converting the LIDAR point cloud into a local elevation map: the gray rectangle is the model of the ego vehicle, and the orange cubes are the recognized elevated terrain. Once the height of a garbage pile is detected to exceed the threshold value, the pile is treated as an obstacle and the avoidance strategy is executed.
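The height check can be sketched as follows, under the assumption that the point cloud is given as `(x, y, z)` tuples with `z` measured above the road surface. The grid resolution `cell_size`, the `height_threshold` value, and both function names are hypothetical; the text does not give them.

```python
def elevation_map(points, cell_size):
    """Rasterize (x, y, z) points into a 2-D grid, keeping the max height per cell."""
    grid = {}
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        if key not in grid or z > grid[key]:
            grid[key] = z
    return grid

def obstacle_cells(grid, height_threshold):
    """Return the sorted grid cells whose height exceeds the threshold."""
    return sorted(cell for cell, h in grid.items() if h > height_threshold)
```

Cells returned by `obstacle_cells` would be handed to the avoidance strategy, while the remaining cells stay sweepable.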
The inverse perspective transformation is the inverse process of perspective imaging: it acquires a bird's-eye view of the road plane by back-projecting image coordinates to spatial coordinates. Ref. [26] proposed an inverse perspective transformation based on the pitch angle and yaw angle of the vanishing point, which solves for the correspondence between the image plane and the road plane captured while the sweeper is working.
In this paper, the inverse perspective transformation is used to eliminate the perspective effect and obtain a top view of the real road surface, and the YOLOv5 algorithm then recognizes individual pieces of garbage and obstacles. Figure 5 shows the result of recognizing obstacles and individual pieces of garbage with YOLOv5 after the inverse perspective transformation of multiple initial images.
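The back-projection step can be illustrated with a simplified sketch. It is not the vanishing-point method of Ref. [26]: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known mounting height, a pitch angle only (zero yaw and roll), and a flat road. All names are hypothetical.

```python
import math

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch):
    """Back-project pixel (u, v) onto a flat road plane.

    Camera frame: x right, y down, z forward. Ground frame: X right,
    Y forward, Z up, with the origin on the road directly below the camera.
    `pitch` is the downward tilt of the optical axis in radians.
    Returns (X, Y) in the units of `cam_height`, or None if the pixel
    lies at or above the horizon and its ray never meets the road.
    """
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    s, c = math.sin(pitch), math.cos(pitch)
    down = dy * c + s            # downward component of the viewing ray
    if down <= 0:
        return None
    t = cam_height / down        # ray scale at which it reaches Z = 0
    return (t * dx, t * (c - dy * s))
```

Applying this mapping to every pixel below the horizon yields the top view on which detection is then run.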
YOLOv5 is used to recognize individual garbage targets. To make the recognition information denser and to reduce redundant planning points, we further fuse the recognized feature points with a GMM-based skeleton feature dimensionality reduction method, which maps the skeleton features of the garbage onto a two-dimensional plane and makes the feature points more representative. A Gaussian mixture model (GMM) is a K-means-like model that describes data with a weighted sum of Gaussian probability density functions [27].
First, we discretize the recognized sweeping targets, i.e., we sparsify the features of a single target by selecting random points inside the target box. The random points in the perception domain form a set, and the region formed by similar targets can often be characterized by independent normal distributions, so we adopt the GMM to characterize each discrete region. As shown in Figure 6, the initial image is converted into a Gaussian mixture distribution, which accomplishes accurate detection and localization of the garbage.
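The sampling and fitting steps can be sketched in a few lines. The example is a simplified illustration, not the paper's implementation: it assumes axis-aligned boxes `(x0, y0, x1, y1)`, uniform sampling, diagonal covariances, and user-supplied initial means; `sample_box_points` and `fit_gmm` are hypothetical names.

```python
import math
import random

def sample_box_points(boxes, n_per_box, rng):
    """Draw uniform random points inside each (x0, y0, x1, y1) detection box."""
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1))
            for (x0, y0, x1, y1) in boxes for _ in range(n_per_box)]

def fit_gmm(points, init_means, n_iter=50, min_var=1e-6):
    """Fit a 2-D GMM with diagonal covariances by EM.

    Returns (weights, means, variances), one entry per component.
    """
    k, n = len(init_means), len(points)
    means = [tuple(m) for m in init_means]
    variances = [(1.0, 1.0)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x, y in points:
            logp = [math.log(w)
                    - 0.5 * (math.log(2 * math.pi * vx) + (x - mx) ** 2 / vx)
                    - 0.5 * (math.log(2 * math.pi * vy) + (y - my) ** 2 / vy)
                    for w, (mx, my), (vx, vy) in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            mx = sum(r[j] * x for r, (x, _) in zip(resp, points)) / nj
            my = sum(r[j] * y for r, (_, y) in zip(resp, points)) / nj
            vx = sum(r[j] * (x - mx) ** 2 for r, (x, _) in zip(resp, points)) / nj
            vy = sum(r[j] * (y - my) ** 2 for r, (_, y) in zip(resp, points)) / nj
            weights[j] = nj / n
            means[j] = (mx, my)
            variances[j] = (max(vx, min_var), max(vy, min_var))
    return weights, means, variances
```

The fitted means and variances summarize each discrete garbage region with a handful of parameters instead of the full set of sampled points.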
3.1.2. Continuous Sensing of Complex Garbage Area Based on PE Analysis
Through the fusion of object detection and GMM, we can obtain the distribution of discrete targets; however, real environments also contain garbage regions with sparser features. As shown in Figure 7a, the road garbage is continuously distributed over the road and is not composed of distinct independent targets, and the pixel characteristics of the garbage area are irregular. In this case, methods that rely only on deep features struggle to accomplish the area judgment task.
To form a dense identification of road garbage regions, inspired by [28,29], we design a road image information entropy identification method that predicts the distribution of garbage regions by recognizing their heterogeneity. As shown in Figure 7b, we first perform gradient histogram detection on the camera front-view image and select the centroid of each anomalous partition, then gradually expand outward and calculate the complexity of each partition. We use permutation entropy (PE), a method of approximating entropy by ordinal analysis, to evaluate the complexity of a given partition and ultimately determine whether it belongs to a complex garbage heap.
The specific calculation steps are as follows. We first perform gradient histogram detection on the camera's front-view image to extract possible garbage distribution areas and select the center of each garbage area. The lower half of the front-view image, with a size of 256 × 128, is transformed into a BEV image, and the whole image is then divided into 32 regions of size 32 × 32. The histogram of each 32 × 32 image block is calculated, and the potential garbage distribution region is predicted from the features of this histogram. As shown in Figure 8, the road area tends to have a very narrow histogram of pixel features, i.e., it is relatively pure. In contrast, the histogram of a potential garbage area tends to have a very wide distribution. Based on this difference, the potential garbage areas can be selected preliminarily.
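The block-wise preselection can be sketched as below. The text does not state how histogram width is measured, so this example assumes a simple proxy: the number of occupied bins in a 16-bin intensity histogram, compared against a hypothetical `min_bins` threshold. The image is assumed to be a list of rows of 8-bit values.

```python
def split_blocks(image, block=32):
    """Yield ((block_row, block_col), pixels) for each non-overlapping block."""
    h, w = len(image), len(image[0])
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            yield (r // block, c // block), [row[c:c + block]
                                             for row in image[r:r + block]]

def occupied_bins(block_pixels, bins=16, levels=256):
    """Count histogram bins that contain at least one pixel (a width proxy)."""
    hist = [0] * bins
    for row in block_pixels:
        for v in row:
            hist[v * bins // levels] += 1
    return sum(1 for n in hist if n > 0)

def candidate_blocks(image, block=32, min_bins=4):
    """Return indices of blocks whose histogram is wide enough to be suspicious."""
    return [idx for idx, px in split_blocks(image, block)
            if occupied_bins(px) >= min_bins]
```

For a 256 × 128 image, `split_blocks` produces the 32 regions described above; a clean road block occupies one or two bins, whereas a cluttered block spreads over many.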
Since each image block is not necessarily fully covered by garbage, we need to further determine the specific intra-block garbage regions by fine scanning. Thus, we propose an inference method based on region PE evaluation [28]. For each potential garbage distribution region, as shown in Figure 7b, the encoding process scans the area from the inside out in an expanding search that starts from the center of the garbage area, and the image features of each scanned circle are encoded as a sequence. The scanned sequences are stacked into a sequence set, and the PE value is calculated for it, which describes the change of the image features [29].
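The inside-out encoding can be sketched as the extraction of concentric square rings around the block center. The traversal order within a ring (clockwise from the top-left corner) and the growth of the ring side by 2 pixels per step are assumptions; the function names are hypothetical.

```python
def square_ring(block, size):
    """Return the border pixels of the centered size x size square.

    Pixels are listed clockwise starting from the top-left corner, so a
    ring of side `size` contains 4 * size - 4 pixels.
    """
    n = len(block)
    lo = (n - size) // 2
    hi = lo + size - 1
    ring = [block[lo][c] for c in range(lo, hi + 1)]           # top edge
    ring += [block[r][hi] for r in range(lo + 1, hi + 1)]      # right edge
    ring += [block[hi][c] for c in range(hi - 1, lo - 1, -1)]  # bottom edge
    ring += [block[r][lo] for r in range(hi - 1, lo, -1)]      # left edge
    return ring

def ring_sequences(block, start=4, step=2):
    """Encode a square block as a list of ring sequences, from inside out."""
    return [square_ring(block, s) for s in range(start, len(block) + 1, step)]
```

With these assumptions, a 32 × 32 block yields 15 sequences whose lengths grow from 12 to 124 pixels.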
The original sequence is a set of RGB points, and RGB only represents the details of the color composition, with very weak order features. Therefore, before the PE calculation, we convert the pixel features of the image to HSV format and retain only H, which characterizes the color interval. S and V characterize the saturation and luminance of the color, and these two values are homogenized as the average over all pixels within the color block. By converting the image feature format, we preserve the image features and give them sequential attributes with the H value as the ordering criterion.
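The conversion can be sketched with Python's standard `colorsys` module. The example assumes 8-bit RGB tuples as input; the function name `hue_sequence` is hypothetical, and `colorsys` returns H, S, and V scaled to the range [0, 1].

```python
import colorsys

def hue_sequence(rgb_pixels):
    """Convert 8-bit RGB pixels to a hue sequence plus block-mean S and V.

    Returns (hues, mean_s, mean_v); hues keeps the input pixel order.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
           for r, g, b in rgb_pixels]
    hues = [h for h, _, _ in hsv]
    mean_s = sum(s for _, s, _ in hsv) / len(hsv)
    mean_v = sum(v for _, _, v in hsv) / len(hsv)
    return hues, mean_s, mean_v
```

Only the hue sequence is passed on to the ordinal analysis, while the averaged S and V describe the block as a whole.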
For each 32 × 32 × 1 pixel block, the PE calculation starts from the inner 4 × 4 circle (12 pixels) and ends with the outer 32 × 32 circle (124 pixels). The specific calculation method is shown in Figure 8. The HSV-based sequence is treated as a time series with a time delay of 1. Assuming that a one-dimensional time sequence represents the HSV features of one circle of the image, its elements are sorted in ascending order within each reconstructed sequence, and the computational steps are shown in Algorithm 1.
It is worth noting that after calculating one 32 × 32 region according to the steps of the algorithm, we further decompose the potential region into four regions of the same size, i.e., 8 × 8 regions, for which further PE calculations are carried out to determine a more detailed garbage distribution region.
Algorithm 1: PE
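For reference, the standard permutation entropy computation can be sketched as follows. This is the textbook ordinal-pattern formulation and may differ in detail from Algorithm 1. The embedding order (`order=3`) and the normalization by log(order!) are assumptions, while the delay of 1 follows the text.

```python
from collections import Counter
from math import factorial, log

def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D sequence (natural log).

    Each window of `order` samples spaced by `delay` is mapped to the
    permutation that sorts it in ascending order (ties keep their original
    order); the entropy of the pattern frequencies is returned.
    """
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    probs = [count / n for count in patterns.values()]
    h = -sum(p * log(p) for p in probs)
    return h / log(factorial(order)) if normalize else h
```

A monotonic sequence produces a single ordinal pattern and therefore an entropy of zero, whereas a sequence in which all patterns occur equally often reaches the normalized maximum of 1.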
As shown in Figure 7b, the PE detection box on the right does not detect pixel anomalies, so it does not continue to expand outward. After the detection box on the left detects a pixel anomaly, it continues to select the center point in the direction of maximum PE and keeps expanding the detection in all directions until no pixel anomaly remains. Further, this method can also be used to circumvent anomalous regions caused by continuous waterlogging and road reflections.
Figure 9 takes three 32 × 32 pixel regions as an example: Figure 9a is the junction of garbage and road, Figure 9b is part of the clean road, and Figure 9c is a complex garbage pile. For each region, the histogram information is first extracted to obtain the potential garbage distribution region. Then, the PE is calculated for the collection of sequences to obtain their complexity. As shown in the line graph of Figure 9, the PE value of the sequence of Figure 9c keeps growing, which indicates that the pixels of the current circle are still complex and that the expansion has not yet reached the background region with a more uniform color. The PE value of the sequence of Figure 9a decreases to a certain extent, which indicates that the current expansion circle already lies in the solid-color background region on the periphery of the garbage region.
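The interpretation of the PE curve can be expressed as a simple stopping rule. The sketch below assumes that a list of per-circle PE values is already available and flags the first circle whose PE falls a given amount below the running peak. The `drop` threshold and the function name are hypothetical, and the text does not state the exact criterion used.

```python
def expansion_stop_index(pe_values, drop=0.1):
    """Return the index of the first circle whose PE falls at least `drop`
    below the running peak, or None if the PE never drops that far."""
    peak = float("-inf")
    for i, pe in enumerate(pe_values):
        peak = max(peak, pe)
        if peak - pe >= drop:
            return i
    return None
```

A steadily rising curve, as in the complex garbage pile case, returns `None`, meaning the expansion has not yet reached the uniform background.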