1. Introduction
Remote sensing image processing is an important branch of remote sensing target detection and recognition, which mainly addresses the salient targets in a remote sensing image region. This technology is widely used in the vision systems of satellites, missiles, and ships. In order to give full play to the perception ability of the vision system, the speed and accuracy of detection and recognition have become important standards for evaluating the vision system [1,2,3,4,5,6]. However, remote sensing image transmission loses some feature information, such as color and texture, due to the influence of natural factors and the external environment.
As an important feature of remote sensing targets, the shape has good anti-interference ability against natural factors such as the environment. Therefore, the shape has been used by a large number of scholars to describe remote sensing target features [7,8,9,10,11,12], becoming an important tool for remote sensing target detection and recognition. However, most existing shape description methods focus mainly on the accuracy and precision of target recognition but ignore the recognition speed, which makes them difficult to apply in remote sensing target measurement systems and has become an application bottleneck. For example, LP (label propagation) [13], LCDP (locally constrained diffusion process) [14], and co-transduction (cooperative transduction) [15] are based on graph conduction and diffusion process technology, achieving dataset retrieval reordering by iteration; although the recognition accuracy is high, the time cost is very high. Although the MMG (modified mutual graph) [16] algorithm is known for its speed and does not adopt iteration-based graph conduction technology, it needs to find the shortest path between any two nodes in a sparse graph, which still consumes a lot of computation time. There are also many shape description methods based on deep learning and neural networks, but a neural network is a relatively complex framework. In addition, shape datasets are generally small in volume and cannot meet the large training requirements of neural networks. Such methods require relatively high time complexity and do not easily meet the needs of practical applications.
Shape descriptors are generally divided into global and local descriptors. Because shape description methods need to satisfy three invariances (translation invariance, scale invariance, and rotation invariance), and it is difficult for local shape descriptors to ensure all three, most existing shape description methods are based on the global features of the target. Examples include CCD (centroid contour distance) [17], FD-CCD (FD-based CCD) [18], FPD (farthest point distance) [19], SC (shape context) [20], IDSC (shape context based on inner distance) [21], and AICD (affine-invariant curve descriptor) [22]. There have also been efficient global shape description methods in recent years, such as CBW (chord bunch walks) proposed by Bin Wang et al. in 2019 [23,24] and TCDs (triangular centroid distances) proposed by Chengzhuan Yang et al. in 2017 [25]. Remote sensing targets can be occluded, as shown in Figure 1a, where the shape contour of the plane target in the remote sensing image is occluded. In this case, the obtained shape will be a severely deformed or partially missing shape contour, as shown in Figure 1b,c. If a global shape description method is still used, the recognition accuracy will be greatly reduced, along with the practical applicability of the target recognition method. In this case, in order to correctly identify the target through shape, it is necessary to obtain local shape features to describe the target. Therefore, for the occluded remote sensing target, the local shape description method plays an extremely important role, and the study of local shape description methods is of great value for occluded targets.
However, existing local shape description methods, such as dynamic space warping (DSW) proposed by N. Alajlan et al. [26] and contour convexities and concavities (CCC) proposed by T. Adamek et al. [27], mostly increase the calculation amount in order to satisfy the three invariances (and thus increase the recognition time cost), which means they cannot be applied in practice. Therefore, it is critical to develop a fast and accurate local shape recognition method for occluded targets. A fast local shape description method based on feature richness (denoted FEW), aimed at the local shape features of occluded targets, is proposed by using walking MBRs, where feature richness includes pixel richness, direction richness, and distance richness. It provides theoretical support for the practical application of occluded remote sensing target recognition. The authors' contributions include:
(1) A feature richness principle of the walking MBRs for the local shape contour of the occluded remote sensing target is proposed, which satisfies the three invariances of shape recognition;
(2) A local contour pixel richness feature based on the feature principle is proposed;
(3) A local contour orientation richness feature based on the feature principle is proposed;
(4) A local contour distance richness feature based on the feature principle is proposed;
(5) A fast local strong-feature shape description method based on constraint reduction of the feature structure is proposed for occluded remote sensing targets.
Detailed explanations and procedures of the proposed method are provided in the following sections. The second section presents a detailed introduction to the proposed FEW method. The third section presents the computational complexity analysis. The fourth section presents the performance and test results of the method, including a self-built small remote sensing target dataset and three internationally used shape datasets. The fifth section presents a discussion of the proposed method. The last section presents the conclusion.
2. The Proposed Method
In this paper, a shape recognition method based on the feature richness of the local contour using walking MBRs is proposed for partially occluded remote sensing targets. Firstly, a contour segment of fixed length, starting from a given contour point, is taken from the target contour to obtain its MBR; then the feature richness of the contour inside the MBR is calculated, including the contour pixel richness, contour orientation richness, and contour distance richness. Finally, the minimum and maximum feature richness of the local contours are obtained as the target shape features, which form the local feature of the target shape. The structural framework of the proposed method is shown in Figure 2 below, and the algorithm framework is shown in Algorithm 1 below.
The proposed method can be divided into six parts, from the input shape contour to the output shape recognition result, representing the complete steps of shape recognition. See
Section 3 for a detailed comparison and evaluation.
Algorithm 1
Input: Shape contour sampling points;
Output: Shape recognition similarity after constraint reduction;
1: Create the sets of local contour segments;
2: Let the number of MBRs be given;
3: for each contour starting point do
4:   Calculate the walking MBR of the local contour from that point over a fixed length;
5:   for each local contour segment do
6:     Calculate the contour pixel richness, contour direction richness, and contour distance richness of the segment;
7:     Calculate the feature richness of the segment;
8:   end for
9: end for
10: Obtain the feature richness of the whole shape;
11: Calculate the maximum feature richness and the minimum feature richness;
12: Reduce the constraints to obtain the strong feature richness;
13: Perform shape matching and shape recognition to obtain the recognition accuracy;
14: Return the recognition result.
2.1. New Concepts
Feature richness: Feature is an important weapon for the computer to describe remote sensing targets. The richer the feature of the target shape, the more easily the target shape can be recognized and perceived by the computer. Feature richness is defined as the information richness of features used by the computer to describe the target shape, also known as feature information strength.
Contour pixel richness: Pixels are important information for a computer to describe images, and they are also a kind of feature of the target shape contour, because the shape contour in an image is accumulated from pixels with different parameters. Contour pixel richness is defined as the ratio of the number of pixels on a certain contour to the number of pixels in a given area (in this paper, the area is the whole set of four sides of the MBR) when the recognition system describes the target. The larger the ratio of the number of contour pixels to the number of four-side pixels, the greater the pixel richness of the contour.
Contour orientation richness: Orientation is an important feature for the computer to describe the image. Contour orientation richness is defined as the change in relative position between points on the contour, which is expressed as the average directional change in this paper. The greater the change, the greater the contour orientation richness.
Contour distance richness: Distance is an important feature for the computer to describe target parameters in a target image, especially for the points of a shape contour. Contour distance richness is defined as the distribution of the distances between a contour and a given reference when the recognition system describes a target shape. The wider the distance distribution, the greater the distance richness of the contour.
Feature structure reduction constraint: Shape recognition is the recognition of shape features. The feature structure reduction constraint is defined as the factor that reduces the performance of shape recognition after obtaining shape features. In this paper, constraint reduction of the feature structure is expressed as removing the weak features from the feature structure and leaving only the strong features (the minimum and maximum feature richness).
2.2. Local Contour MBR
The MBR (minimum bounding rectangle) refers to the smallest rectangle, represented in two-dimensional coordinates, that encloses a given two-dimensional shape (such as points, lines, or polygons). Its boundary is determined by the maximum abscissa, minimum abscissa, maximum ordinate, and minimum ordinate among the vertices of the given shape; such a rectangle contains the given shape and has minimum area. The MBR can reflect feature information of the target, such as direction, size, and position, and a structure carrying the feature information of the target can be used to describe the target, so the MBR can be used as one of the target features [28,29].
In general, there are two types of MBR: the minimum area rectangle and the minimum perimeter rectangle. In this article, the minimum area rectangle is used. The following are the key steps in calculating the MBR [28].
Step 1: Find the maximum and minimum points of the abscissa and ordinate of the shape contour points respectively;
Step 2: Use these four points to construct four tangents to the shape;
Step 3: If one or two lines coincide with an edge, calculate the area of the rectangle determined by the four lines and save it as the current minimum value. Otherwise, the current minimum value is defined as infinity;
Step 4: Rotate the lines clockwise until one of them coincides with the edge of the polygon;
Step 5: Calculate the area of the new rectangle and compare it with the current minimum area. If it is less than that, update and save the area of the new rectangle as the minimum area;
Step 6: Repeat steps 4 and 5 until the lines have been rotated by more than 90 degrees;
Step 7: Obtain the MBR.
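As a sketch of the idea behind Steps 1–7 (not the authors' implementation), the minimum-area rectangle can be approximated by sweeping candidate orientations over [0°, 90°) and keeping the axis-aligned box of the rotated points with the smallest area; the function name, angular step, and point format are illustrative assumptions:

```python
import math

def min_area_rect(points, step_deg=0.5):
    """Approximate the minimum-area bounding rectangle (MBR) of a point set
    by rotating the axes in small angular steps over [0, 90) degrees and
    keeping the orientation whose axis-aligned box has the smallest area."""
    best = None
    ang = 0.0
    while ang < 90.0:
        t = math.radians(ang)
        c, s = math.cos(t), math.sin(t)
        # project every point onto the rotated axes
        xs = [x * c + y * s for x, y in points]
        ys = [-x * s + y * c for x, y in points]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0]:
            best = (w * h, ang, w, h)
        ang += step_deg
    return best  # (area, angle_deg, width, height)

# a 2x1 rectangle rotated by 30 degrees should be recovered (minimum area 2)
t = math.radians(30)
rect = [(0, 0), (2, 0), (2, 1), (0, 1)]
pts = [(x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))
       for x, y in rect]
area, ang, w, h = min_area_rect(pts)
```

The exact procedure of Steps 4–6 rotates only to edge-coincident orientations (rotating calipers), so it is both exact and faster; the angular sweep above merely illustrates the same area minimization.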
Figure 3 shows the MBRs of some local contours of the target shape.
The contour is represented by its set of contour points, where the number of contour points is given; all the contours in this article are extracted by the Canny operator. The MBR of the contour obtained at this time can be expressed as Equation (1), which is also the basic criterion of the feature richness method proposed in this paper.
where the constants represent the x- and y-axis coordinates of the four vertices of the MBR of the contour, respectively.
Assuming that a certain contour segment is given and that the inner region of its MBR is defined, the contour segment is contained within that region.
2.3. Contour Pixel Richness
The contour pixel count refers to the number of pixels contained in a certain contour, which can represent the contour feature [30]. In this paper, contour pixel richness is expressed as the proportion of the number of contour pixels to the number of pixels on the whole four sides of the MBR. Let the number of pixels on the contour segment and the number of pixels contained on the four sides of the MBR of the contour be counted; then the contour pixel richness is calculated as shown in Equation (2), and the calculation result is a constant.
In Figure 4 below, (a) shows a specified contour of the aircraft, and (b) shows the MBR of the contour. After the calculation, the contour pixel richness of this contour segment is 0.52.
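As a minimal illustration of Equation (2) (the function name and the pixel counts below are hypothetical, chosen only so the ratio matches the 0.52 example), the richness is simply the contour's pixel count divided by the pixel count of the four MBR sides:

```python
def contour_pixel_richness(n_contour_pixels, mbr_width, mbr_height):
    """Contour pixel richness: pixels on the local contour segment divided by
    pixels on the four sides of its MBR (a sketch of Equation (2); the paper's
    rasterized side-pixel count may differ slightly from 2*(w+h))."""
    n_side_pixels = 2 * (mbr_width + mbr_height)
    return n_contour_pixels / n_side_pixels

# hypothetical counts: 52 contour pixels inside a 30x20-pixel MBR
r = contour_pixel_richness(52, 30, 20)
```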
2.4. Contour Orientation Richness
Contour orientation richness is defined as the intensity of position variation between contour points. In this paper, the contour orientation is obtained by coding. The chain code is a method of describing the orientation of a curve by the coordinates of its starting point and the direction codes of its contour points. It is often used in image processing, computer graphics, pattern recognition, and other fields to represent image features [31,32]. Chain codes are generally divided into original chain codes, differential chain codes, and normalized chain codes.
2.4.1. Original Chain Code
Starting from a certain starting point of the boundary (curve or line), the orientation of each line segment is calculated in a fixed traversal direction and expressed with the corresponding symbol from a fixed number of orientations. The result forms a digital sequence representing the boundary (curve or line), which is called the original chain code. The original chain code has translation invariance (the direction codes do not change under translation), but when the starting point S is changed, different chain code representations are obtained, i.e., it is not unique.
2.4.2. Differential Chain Code
The differential chain code encodes a code sequence M such that, except for the first element, each element is represented as the difference between the current element and the previous element. Similar to the original chain code, the differential chain code has translation invariance and scale invariance. The calculation formula of the "difference" code is shown in Equation (3), where N represents the total number of encoded directions and the operands are the values of the elements in the original chain code.
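A minimal sketch of Equation (3), assuming an 8-direction code for brevity (the method itself uses 20 directions); the modulo keeps each difference within the valid code range:

```python
def differential_chain_code(codes, n_dirs=8):
    """Differential chain code: keep the first element, and replace every
    other element by (current - previous) mod n_dirs, as in Equation (3)."""
    return [codes[0]] + [(codes[i] - codes[i - 1]) % n_dirs
                         for i in range(1, len(codes))]

d = differential_chain_code([0, 1, 1, 2, 0])
```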
2.4.3. Normalized Differential Chain Code
For a closed boundary, the original chain code can be obtained from any starting point. The chain code is regarded as an N-digit natural number formed by the direction codes, and the code is cyclically shifted in one direction until this N-digit number is minimized; the resulting chain code has a unique starting point and is called the normalized chain code. Normalized differential chain codes have translation invariance, rotation invariance, and uniqueness; they are obtained by normalizing the differential chain code. The calculation formula of the normalized differential chain code is shown in Equation (4) below, where the number of differential codes and the value of each element in the differential chain code are as defined above, and the sequence can be shifted step by step in one direction starting from any code value. Based on the feature requirements of shape recognition, a shape representation should satisfy the three invariances. Considering the three coding methods and the uniqueness of the coding results, the normalized differential chain code meets the requirements of a shape descriptor: it has translation, scale, and rotation invariance. Therefore, it can be used as a feature vector to describe the shape contour.
2.4.4. 20-Chain Code Coding
The connectivity orientation diagram of the 20-chain code designed in this method is shown in Figure 5a below. The 20 orientations corresponding to the current contour point are divided into 20 regions. According to the relative direction rule between the current contour point and the fourth following contour point, each possible next direction is represented by one of the letters A to T; in the calculation process, the 20 orientations correspond to the natural numbers 0–19, respectively. The starting orientation is set as the rightward orientation of the image, and the starting point of the coding is the contour point closest to the top left of the contour part. All the contours on the shape are coded in the clockwise orientation according to the above rules until the initial contour point is reached again. Finally, the orientation feature of the contour is represented by the relative position relationship of all orientations of the contour. Figure 5b shows the local contour of a target, and (c) shows the orientation code of the contour obtained by coding according to the above rules.
Contour orientation richness represents the intensity of variation of the contour orientation; in other words, it accumulates the variation between the code value of the next direction and that of the current direction. For the whole local contour, it is expressed as the ratio of the total variation to the number of changes, i.e., the average variation of orientation is chosen as the contour orientation richness. Let the code values of all orientations on the contour be given, with the number of codes on the contour known; then the contour orientation richness can be calculated by Equation (5). The result is also a constant.
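Equation (5) then reduces to the mean change between successive direction codes; a sketch under the assumption that plain absolute differences are used (a circular difference over the 20 directions would be a natural variant):

```python
def orientation_richness(codes):
    """Contour orientation richness: the average variation between successive
    direction codes along the local contour (a sketch of Equation (5))."""
    if len(codes) < 2:
        return 0.0
    changes = [abs(codes[i] - codes[i - 1]) for i in range(1, len(codes))]
    return sum(changes) / len(changes)

rho = orientation_richness([0, 2, 2, 5])  # changes 2, 0, 3 -> mean 5/3
```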
2.5. Contour Distance Richness
Distance is an important feature of an image [20,21,33]. Contour distance richness refers to the distribution of the distances between the points on the contour and the long and short sides of the MBR. For each contour point, a pair of distances to the two long sides and a pair of distances to the two short sides can be obtained; the minimum of each pair is taken as the distance of the point with respect to that side (since each pair sums to a fixed side length of the MBR, the choice is unambiguous). In this method, the smaller of these two minima is selected as the final distance of the point.
Assume a point P on the contour, and find its distances to the long side and the short side of the MBR. The minimum distance to the long side and the minimum distance to the short side are obtained, and the smaller of the two is taken as the ultimate distance value of the point. After the distance values of all contour points are obtained according to this rule, a vector set of distances is obtained. The distance set is divided into bins according to the distance distribution, yielding a new vector set, which is defined as the contour distance richness of the MBR; the number of bins in this method is 10. The contour distance richness is then expressed as Equation (6). This vector is also a constant vector.
Figure 6 below shows a schematic diagram of a contour and its MBR, where the green point P is a point on the contour, the red rectangle is the MBR of the contour, and the red points are the feet of the perpendiculars from P to the short and long sides of the MBR, respectively. The green line segments are the shortest distances from P to the short and long sides of the MBR, respectively. According to the above rules, the minimum distance of each contour point with respect to the MBR can be obtained; after computing it for all contour points, the minimum distance set of all points is obtained.
The histogram is widely used to represent the distribution features of an image [34,35,36]. In this method, the histogram is also used to represent the distance distribution vector of the contour. For the contour shown in Figure 6, the obtained distance distribution is shown in Figure 7 below.
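The rule above can be sketched as follows, assuming the contour points are expressed in the MBR's own frame with the long axis along x (the function name, binning range, and the guard for degenerate rectangles are illustrative assumptions):

```python
def distance_richness(points, rect_w, rect_h, n_bins=10):
    """Contour distance richness: for each point, take the smaller of its
    minimum distance to the short sides (x = 0, x = rect_w) and to the long
    sides (y = 0, y = rect_h), then bin the distances into a histogram
    (a sketch of Equation (6)); rect_w >= rect_h is assumed."""
    dists = []
    for x, y in points:
        d_short = min(x, rect_w - x)  # min distance to the two short sides
        d_long = min(y, rect_h - y)   # min distance to the two long sides
        dists.append(min(d_short, d_long))
    top = min(rect_w, rect_h) / 2 or 1.0  # largest possible min-distance
    hist = [0] * n_bins
    for d in dists:
        hist[min(int(n_bins * d / top), n_bins - 1)] += 1
    return hist

h = distance_richness([(0, 1), (5, 1), (2, 1)], rect_w=10, rect_h=2)
```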
2.6. Feature Richness for Walking MBRs
The method proposed in this paper is based on the local features of the occluded contour. In order to describe the shape contour features of the occluded target more effectively, the authors use the principle of the walking MBR to obtain the contour feature richness of the entire occluded target shape. The walking principle is as follows: starting from the first point of the occluded contour, all the MBRs are calculated according to a certain point interval and a certain number of points N per segment; then the feature richness of each fixed-length contour with respect to its MBR is calculated, until the feature richness of the last contour segment of length N is obtained.
Figure 8 below shows the walking diagram of the MBR to the occluded target contour. In
Figure 8a, there is an occluded aircraft contour. On this contour, the MBR of the local contour is obtained from the upper left corner according to the principles described above, as shown in the red, green, yellow, and brown rectangles. The order of selection is counterclockwise, as shown by the fluorescent green circular arrow in (a). The walking principle of the MBR is shown in
Figure 8b (the dashed box area in
Figure 8a). The direction pointed by the fluorescent blue dashed arrow is the walking direction of the MBR, also known as the walking principle.
Based on the above walking principles of the MBR, the MBRs of all local contours and the corresponding feature richness can be obtained. According to the aforementioned richness calculations and the walking principles of the MBR, the pixel richness, orientation richness, and distance richness of the entire occluded target contour can be computed with the corresponding equations: Equation (7) represents the pixel richness of the entire occluded target contour, Equation (8) the orientation richness, and Equation (9) the distance richness, where the number of local contour segments of the entire occluded target equals the number of MBRs, and the number of histogram bins is as defined in Section 2.5.
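The walking principle amounts to sliding a fixed-length window along the occluded contour; a sketch is given below (names and parameters are illustrative; each resulting window would then receive its MBR and the three richness values of Equations (7)–(9)):

```python
def walking_windows(contour, seg_len, step):
    """Enumerate the walking local contour segments: windows of `seg_len`
    points taken every `step` points from the first contour point onward."""
    return [contour[i:i + seg_len]
            for i in range(0, len(contour) - seg_len + 1, step)]

w = walking_windows(list(range(10)), seg_len=4, step=3)
```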
According to the above three equations, the feature richness of the entire occluded target contour can be calculated, as expressed in Equation (10).
In order to reduce the complexity of the feature structure of the shape contour, this method uses the concept of constraint reduction to select strong features as the final occluded target contour features, namely the maximum contour richness and the minimum contour richness. It is well known that strong features are more suitable for describing target features and distinguish different targets more easily. The maximum and minimum contour richness are expressed in Equations (11) and (12), respectively.
Then, the feature richness finally used to represent the occluded contour can be obtained, as expressed in Equation (13).
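Constraint reduction then keeps only the two strong features of the richness sequence; a minimal sketch (treating each segment's richness as a scalar for simplicity, in the spirit of Equations (11)–(13)):

```python
def strong_features(richness_values):
    """Constraint reduction: discard the weak features and keep only the
    minimum and maximum feature richness over all walking segments."""
    return (min(richness_values), max(richness_values))

sf = strong_features([0.3, 0.9, 0.5])
```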
2.7. Feature Matching
The feature matching stage is the process of similarity matching after obtaining the final shape feature, which is also an important step after feature extraction. According to the sections described above, the feature richness of the occluded target contour can be obtained, and feature matching can be carried out once the feature vector is available. Based on the feature structure of the occluded shape in this method, the Euclidean distance is used to measure the shape similarity, after which feature matching is carried out and the recognition result is obtained. In order to ensure the integrity of the feature vectors, the difference between the maximum and the minimum feature richness is calculated as the basis for the similarity measure of two shapes. The specific matching equation can be expressed as Equation (14).
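A sketch of the similarity measure of Equation (14), assuming each shape is summarized by its strong-feature vector (e.g., minimum and maximum richness); smaller distance means higher similarity:

```python
import math

def shape_similarity(feat_a, feat_b):
    """Euclidean distance between two strong-feature vectors; the actual
    Equation (14) may additionally use the max-min difference as described."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))

s = shape_similarity((0.0, 1.0), (3.0, 5.0))
```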
5. Discussion
In this paper, a method of shape recognition for occluded remote sensing targets is proposed using the local contour strong feature richness with respect to the walking MBR. The feature richness of the shape contour includes contour pixel richness, contour orientation richness, and contour distance richness. The larger the richness of the contour, the more feature information the contour contains (and the easier the shape is to recognize). The walking MBR is computed for contour segments of fixed length and walks over the whole occluded shape contour in a clockwise direction. The MBR itself can represent the direction, size, position, and other features of a local contour, which makes it very suitable for describing shape features. It is worth mentioning that, in the last stage of the feature structure, the strategy of constraint reduction is used to remove the weak feature constraints of the feature structure, so that only the strong features of the shape are used to describe it; this greatly reduces the matching time of shape recognition and improves the recognition speed. Since there is currently no shape dataset for occluded remote sensing targets, in order to verify the efficiency of the proposed method, validation is carried out on a self-built remote sensing target shape dataset and three internationally used datasets. For the first three shape datasets, artificial partial occlusion is applied to the target shapes to be recognized, in order to better illustrate the effectiveness of the proposed method for occluded remote sensing target recognition. The last dataset already exhibits large deformation, so no artificial occlusion is added. The recognition time on each of these four datasets is less than 1 ms, and the recognition accuracy is significantly improved compared with other state-of-the-art and well-known shape descriptors.
The recognition result on the self-built occlusion remote sensing target is even close to 100%, which provides strong theoretical support for the recognition application of the occluded remote sensing target. In
Section 4.5 and
Section 4.6, the comparative experiments further validate the efficiency of the proposed method, which performs well for both occluded and non-occluded target shapes.
In addition, readers may wonder whether the external environment affects the quality of the acquired target shape image in the contour extraction stage. However, before the object recognition stage there must be an object detection stage; these are two completely different stages. In the object detection stage, it is necessary to consider the influence of external factors on the object image, but in the object shape recognition stage, a shape image that can be recognized by the computer is assumed to have already been acquired during detection, so the influence of the external environment on shape recognition need not be considered. Finally, in future research, we will continue to study the shape recognition of occluded objects, build a large dataset specifically for occluded remote sensing target shape recognition, and publish relevant papers.
Moreover, as can be seen from this article, the proposed shape features are in the spatial domain rather than the frequency domain, and the images in the datasets used here are noise-free, so it is not necessary to consider the impact of noise on the recognition results. Even if there were noise in the images, it would affect all shape recognition methods, and our proposed method would still have the best performance, which again demonstrates its effectiveness for remote sensing target shape recognition.
6. Conclusions
FEW is a shape recognition method with low computational cost, high recognition efficiency, and strong robustness. In this method, shape recognition is performed on the occluded remote sensing target shape contour, and the final shape features are invariant to translation, rotation, and scale transformation. The method obtains the feature richness of the local contour within the MBR based on the walking MBRs, including contour pixel richness, contour orientation richness, and contour distance richness. The obtained feature richness is a one-dimensional constant vector, which greatly reduces the matching cost in the feature matching stage and gives the method a faster recognition speed. In addition, the final occluded target shape feature structure is simplified into the recognition of strong features (minimum richness and maximum richness) by the strategy of constraint reduction, which greatly reduces the complexity of the feature structure and accelerates recognition. The final matching time is less than 1 ms. Since an occluded remote sensing target shape dataset is difficult to obtain, this paper uses a self-built remote sensing target shape dataset and three general shape recognition datasets to verify the performance of the proposed method. It is worth mentioning that the authors artificially occluded the shapes to be recognized in both the self-built dataset and two of the general datasets, in order to better verify the proposed method for occluded remote sensing target shape recognition. The experimental results demonstrate the strong recognition performance of FEW. Compared with some state-of-the-art shape recognition methods, including those for occluded shapes, FEW not only guarantees higher recognition accuracy but also greatly enhances the recognition speed.
The recognition speed of the proposed method is more than 1000 times faster than that of some other methods, and the recognition accuracy is close to 100% on the self-built dataset, which greatly enhances the recognition speed for occluded remote sensing targets and provides powerful performance support for the practical application of remote sensing target recognition.