This section realizes the stitching of the overwater and underwater images of an obstacle through the matching and calculation of image coding points. In addition, 3D fusion detection of the obstacle is carried out using the overall contour of the stitched obstacle as the image driver.
3.3.1. Image Edge Stitching
An edge contour can be described by a chain code, which characterizes the edge of an image by encoding its boundary. The eight-direction code symbols used in this paper are shown in
Figure 6. The eight numbers 0–7 in the eight-direction chain code represent eight directions, respectively. Under the eight-direction chain code rule, the boundary of a pixel in the image is shown in
Figure 7.
The principle of fast matching and stitching of the binarized edge-detection images of the overwater and underwater parts of an unmanned-ship obstacle by chain code is as follows. First, the characteristic original codes of the regions of interest (ROIs) of the edge-detected overwater and underwater images are obtained, respectively. Then, the original codes of the two images are compared: if they are the same, the images are matched; otherwise, they are not. Finally, the ROIs of the two binarized edge-detection images are stitched. According to the original codes matched in the above steps, the corresponding subcode space is established, and the subcodes are matched through the complementary relationship between them, which is established by the original codes to be matched. If a subcode can be matched, the pixels of the overwater and underwater binarized edge-detection images corresponding to that subcode can be stitched; otherwise, they are not stitched.
The complementary relationship of the subcodes of the eight-direction chain code is shown in
Table 1. To avoid computing unnecessary pixels, only the pixels of the edges to be stitched, and their neighborhoods, in the overwater and underwater obstacle images are coded. For example, for the chain code shown in
Figure 8, the original code is 10276534 and its complement is 54632170.
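The complementary relationship in Table 1 amounts to offsetting each code value by half the number of directions; a one-line check in Python (the function name is illustrative):

```python
def complement(code: str, n: int = 8) -> str:
    """Complement of an n-direction chain code: each value is offset by n/2 (mod n)."""
    return "".join(str((int(d) + n // 2) % n) for d in code)

assert complement("10276534") == "54632170"  # matches the Figure 8 example
```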
To avoid excessive coding of the image and reduce the possible influence of noise, this paper takes the image to be stitched as the benchmark, uses a disc model to define the characteristics of each pixel, and determines the starting point of coding and the coding principle of the image edge. Two types of points need to be encoded in the binarized edge-detection images of the overwater and underwater obstacle: feature points whose distance from the disc center is less than the disc radius, and the end points of the longest encoded line segments.
The detection criterion for feature points within the disc radius is expressed in terms of a code point detection function, a function matrix for corner scanning, two rectangular window functions, and a curve-fitting function for the contour.
The detection criterion for the end point of the longest encoded line segment is expressed in terms of a direction function and a function that can be fine-tuned.
The specific process is as follows. First, the disc model is established for the edge-detection images of the overwater and underwater obstacle. Then, the edges of the images to be stitched and their extended neighborhoods are set as the coded ROI. The disc center point O determined by the stitching model is taken as the starting point, and the disc radius R is taken as the maximum coding length to determine the first coding point. This point is then taken as a new starting point, and the coding direction is set within the allowed range with a coding length less than or equal to R to select the next coding point. According to the characteristics of the obstacle images, when selecting the next coding point, the direction is determined first and then the coding length. Following the coding rules, the next coding point is selected in turn until all the points suitable for coding in the disc model have been selected. The flowchart is shown in
Figure 9.
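The selection loop in Figure 9 might be sketched as follows; this is a simplified rendering under assumptions (edge pixels are known ROI coordinates, the direction is decided before the length, and the length is snapped to the candidates R/4, R/2, and R discussed next), with all names illustrative:

```python
import math

def next_coding_point(current, direction, edge_pixels, R):
    """Select the next coding point: decide the direction first, then snap the
    measured distance to the nearest allowed coding length (R/4, R/2, or R)."""
    candidates = [(p, math.dist(current, p)) for p in edge_pixels
                  if 0 < math.dist(current, p) <= R]
    if not candidates:
        return None  # all suitable coding points in the disc have been selected
    # Direction first: keep the candidate closest to the current coding direction
    # (angle wrap-around handling omitted for brevity).
    def angle(p):
        return math.atan2(p[1] - current[1], p[0] - current[0])
    point, dist = min(candidates, key=lambda c: abs(angle(c[0]) - direction))
    # Length second: the allowed coding length with the shortest Euclidean
    # distance to the measured length is taken as the decision coding length.
    length = min((R / 4, R / 2, R), key=lambda l: abs(l - dist))
    return point, length
```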
It is assumed that all coding points have a total of n decision directions, and the ordering and coding length are determined according to the coding points. Each coding point then has a coding direction $\theta_i$ and a coding length $l_i$. For the coding length $l_i$, the Euclidean distances to R/4, R/2, and R are calculated, and the candidate corresponding to the shortest distance is selected as the final decision coding length, so as to obtain the original code of the obstacle images. The n-direction differential code $d_i$ and the n-direction complement code $c_i$ are solved from the original code as follows:

$$d_i = (a_i - a_{i-1}) \bmod n, \qquad c_i = \left(a_i + \frac{n}{2}\right) \bmod n,$$

where $a_i$ refers to the original code value of the i-th point of the n-direction code and $d_i$ refers to the code value of the i-th point after differencing (indices are taken cyclically). The normalized differential code is then obtained by taking the cyclic rotation of the differential code with the minimum value.
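These formulas can be checked directly against the Figure 8 example; a minimal Python implementation (function names are illustrative):

```python
def complement_code(code, n=8):
    """n-direction complement code: c_i = (a_i + n/2) mod n."""
    return [(a + n // 2) % n for a in code]

def differential_code(code, n=8):
    """Cyclic first difference: d_i = (a_i - a_{i-1}) mod n."""
    return [(a - code[i - 1]) % n for i, a in enumerate(code)]

def normalized(code):
    """Normalization: the cyclic rotation of the code with the minimum value."""
    return min(code[i:] + code[:i] for i in range(len(code)))

original = [int(d) for d in "10276534"]          # the Figure 8 example
assert complement_code(original) == [int(d) for d in "54632170"]
print(normalized(differential_code(original)))   # [1, 5, 7, 2, 5, 7, 7, 6]
```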
Next, the subcode space is established to obtain the subcodes. First, three space sets $A$, $D$, and $C$ are defined, which are used to represent the space sets of the original code, the normalized differential code, and the complement code, respectively; the three correspond through the differencing and complement operations defined above, where the subcode spaces $D'$ and $C'$ are subsets of the sets $D$ and $C$, respectively.
As a test, the underwater image of an iceberg model is taken as an example to calculate its normalized differential code.
Figure 10 shows the coding model of the underwater image of the iceberg model, where 1 refers to the relative position, 2 refers to the stage radius, and 3 refers to the concentric angle. Because the stitching of the overwater and underwater images of the obstacle is based on edge features, this paper selects the edge and its extended neighborhood as the coding points. The final selection of coding points for the underwater image of the obstacle is shown in
Figure 11. From this, the original code of the edge of the underwater image of the obstacle, coded from the disc center, is 4541474345435414541460, and the corresponding complement code is 0105030701071050105024.
A normalized chain code is conducive to the unique representation of a code point, but it is not rotation invariant. In the process of image stitching, the chain code belonging to a given edge is required to be unique and unchanged after translation and rotation, and the normalized differential code has this property. Therefore, in order to obtain a chain code that meets the stitching requirements, the original code is first differenced and then normalized to obtain the normalized differential code. Next, taking the original code of the underwater image of the obstacle as an example, the normalized differential code is solved, as shown in
Figure 12.
The normalized differential code of the underwater image of the obstacle is calculated as 1177275317532241753357, while that of the overwater image, calculated by the same method, is 1177275317462241753357.
It can be seen that, because image acquisition and processing are affected by the water environment, the perception equipment, and the processing algorithm, the normalized differential codes of the edges of the overwater and underwater images are not exactly the same. However, the matching ratio of the two codes exceeds 90% (20 of the 22 code values agree), which meets the conditions of edge matching within the allowable error range.
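The quoted matching ratio can be reproduced directly from the two normalized differential codes:

```python
under = "1177275317532241753357"
over  = "1177275317462241753357"
matches = sum(u == o for u, o in zip(under, over))
print(matches, len(under), matches / len(under))  # 20 22 0.909...
```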
After the normalized differential codes of the extended-neighborhood edges of the overwater and underwater images of the obstacle are obtained, fast matching and stitching of the images can be carried out. Feature extraction and matching calculation must satisfy $d_o = d_u$ with $d_o \in D_o$ and $d_u \in D_u$, while edge stitching must additionally satisfy $c_s \in C_s$ with the matched subcodes complementary to $c_s$, where $D_o$ and $D_u$, respectively, refer to the sets of edge normalized differential codes to be matched of the overwater and underwater images of the obstacle, $d_o$ and $d_u$ are their subcodes, $C_s$ is the set of complementary codes to be stitched, and $c_s$ is the subcode of $C_s$. Due to the existence of errors, a completely consistent normalized differential code of the edges to be stitched cannot be obtained for the overwater and underwater images. Therefore, in the stitching process, the subcode space is used for matching and stitching. The following takes the 5-bit normalized differential code 01234 as an example to illustrate the establishment of the subcode space, which is shown in
Table 2.
It can be seen from the table that the subcode space is established by cyclically shifting the code according to the sequence of coding points, which ensures the continuity and accuracy of each subcode during stitching. In the matching process, the complexity of the feature calculation method also directly affects the final stitching result: a more complex calculation method represents the features more accurately and further reduces the stitching error rate, but it also reduces the speed of matching and stitching.
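If Table 2 follows the cyclic-shift rule just described, the subcode space can be generated as in the following sketch:

```python
def subcode_space(code: str):
    """Subcode space: all cyclic shifts, following the sequence of coding points."""
    return [code[i:] + code[:i] for i in range(len(code))]

print(subcode_space("01234"))
# ['01234', '12340', '23401', '34012', '40123']
```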
3.3.2. Fusion Algorithm
Obstacle perception based on visible-light images obtains only the contour and texture of obstacles and lacks depth information, while the point clouds obtained by radar and sonar contain depth information. Image information can help radar and sonar extract and segment targets faster, and images that incorporate depth information carry information such as obstacle size and position. Therefore, more comprehensive obstacle information can be obtained by fusing the image information with the point cloud information obtained by radar and sonar.
PointNet [30] is a classic architecture that uses neural networks to process target point clouds. The architecture applies a max-pooling function to extract features from the target point cloud. To effectively utilize the spatial relationships between points, PointNet also concatenates the point-by-point features and the global feature of the target, and uses the resulting aggregated information to classify and segment the target. Because it processes the target point cloud directly, the loss of information is greatly reduced. However, since PointNet extracts features point by point with a multilayer perceptron, the relationships between points are not fully considered, which can easily cause misclassification and mis-segmentation. Therefore, according to the needs of this paper, an image-driven Frustum–PointNet algorithm is proposed.
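The core of PointNet's segmentation branch described above (a shared per-point MLP, max pooling into a global feature, and concatenation of point-wise and global features) can be sketched in PyTorch as follows; the layer sizes follow the common PointNet configuration but are assumptions here, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class PointNetSegBackbone(nn.Module):
    """Per-point MLP + max pooling, with the global feature concatenated
    back onto each point feature for segmentation (sizes are illustrative)."""
    def __init__(self, in_ch=3, local_ch=64, global_ch=1024):
        super().__init__()
        # Shared per-point MLPs implemented as 1x1 convolutions over points.
        self.local_mlp = nn.Sequential(
            nn.Conv1d(in_ch, local_ch, 1), nn.ReLU(),
            nn.Conv1d(local_ch, local_ch, 1), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Conv1d(local_ch, 128, 1), nn.ReLU(),
            nn.Conv1d(128, global_ch, 1), nn.ReLU(),
        )

    def forward(self, pts):              # pts: (B, 3, N)
        local = self.local_mlp(pts)      # (B, 64, N) point-by-point features
        feat = self.global_mlp(local)    # (B, 1024, N)
        glob = feat.max(dim=2).values    # (B, 1024) max-pooled global feature
        glob = glob.unsqueeze(2).expand(-1, -1, pts.size(2))
        return torch.cat([local, glob], dim=1)  # (B, 64 + 1024, N)

x = torch.randn(2, 3, 500)               # batch of 2 clouds, 500 points each
print(PointNetSegBackbone()(x).shape)    # torch.Size([2, 1088, 500])
```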
In the obtained point cloud data, the redundant water surface, background, and other information are unimportant. In order to bias the point cloud network's processing toward the target area, this paper also introduces an attention module, which makes the network learn the location of the target and enhances the expression of that area. The point cloud of an obstacle target is an n × 3 array with two dimensions, point and channel, and the attention mechanism is added to both dimensions.
Figure 13 is the schematic diagram of the channel attention module, and
Figure 14 is the schematic diagram of the point attention module.
For the channel attention module, the obtained point cloud features are first input, the channel features of the points are computed by max pooling and average pooling, and multilayer perceptron learning is then carried out to obtain the attention distribution over the channels.
The processing of the point attention module is similar to that of the channel attention module. The acquired channel attention features are sent to a multilayer perceptron and activated to obtain the attention weights of the points, which are then multiplied with the input features to obtain the point attention features. In this paper, the features of the target area are obtained first through the channel attention module, and the important points of the target area are then obtained through the point attention module, so as to suppress invalid information and enhance the role of effective information.
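A possible realization of the two modules, following the described pooling-plus-MLP design (analogous to CBAM; the exact layers and sizes are assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: pool over points, learn per-channel weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, N)
        mx = self.mlp(x.max(dim=2).values)     # max pooling over points
        av = self.mlp(x.mean(dim=2))           # average pooling over points
        w = torch.sigmoid(mx + av).unsqueeze(2)
        return x * w                           # channel-weighted features

class PointAttention(nn.Module):
    """Point attention: pool over channels, learn per-point weights."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size=1)

    def forward(self, x):                      # x: (B, C, N)
        mx = x.max(dim=1, keepdim=True).values # (B, 1, N)
        av = x.mean(dim=1, keepdim=True)       # (B, 1, N)
        w = torch.sigmoid(self.conv(torch.cat([mx, av], dim=1)))
        return x * w                           # point-weighted features

feats = torch.randn(2, 64, 500)
out = PointAttention()(ChannelAttention(64)(feats))  # channel first, then point
print(out.shape)                                     # torch.Size([2, 64, 500])
```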
Compared with view-projection or voxelization methods, the PointNet point cloud network minimizes the loss of target information, so that the learned 3D information of the target is more complete, and it also achieves better detection performance, especially for small obstacles.
The Frustum–PointNet algorithm can utilize both 2D image and point cloud information for obstacle detection. First, the 2D image of the obstacle is lifted into 3D space through a dimension-raising projection, and the frustum (view cone) point cloud is extracted. The input frustum point cloud is segmented in 3D by the Frustum–PointNet network to obtain the target area of the frustum point cloud; the coordinates of the obtained target point cloud are transformed through T-Net; and, finally, 3D bounding box regression is carried out through PointNet to obtain the specific parameters of the obstacle. The stitched image of the obstacle is used to drive the obstacle detection of the 3D point cloud, which realizes accurate positioning of the point cloud. The specific algorithm flowchart is shown in
Figure 15.
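As an illustration of the first step, lifting a 2D detection box into a frustum point cloud amounts to keeping the points whose projection falls inside the box; a minimal sketch assuming a pinhole camera with intrinsic matrix K and a point cloud already in the camera frame (all names and conventions are illustrative):

```python
import numpy as np

def frustum_points(points, box, K):
    """Keep the 3D points whose projection falls inside the 2D box.
    points: (N, 3) in the camera frame; box: (xmin, ymin, xmax, ymax);
    K: 3x3 camera intrinsic matrix."""
    uvw = points @ K.T                   # project to homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]        # normalize by depth
    xmin, ymin, xmax, ymax = box
    mask = (
        (uv[:, 0] >= xmin) & (uv[:, 0] <= xmax) &
        (uv[:, 1] >= ymin) & (uv[:, 1] <= ymax) &
        (points[:, 2] > 0)               # keep points in front of the camera
    )
    return points[mask]                  # the frustum point cloud

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
cloud = np.random.uniform(-5, 5, size=(1000, 3)) + np.array([0, 0, 6.0])
print(frustum_points(cloud, (200, 150, 440, 330), K).shape)
```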