Article

Imaging Parameters-Considered Slender Target Detection in Optical Satellite Images

1 The Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(6), 1385; https://doi.org/10.3390/rs14061385
Submission received: 31 January 2022 / Revised: 2 March 2022 / Accepted: 10 March 2022 / Published: 13 March 2022

Abstract

Existing methods for detecting slender targets in optical satellite images are strongly affected by the satellite perspective and the solar perspective, and the limited data sources make a fully data-driven approach difficult. This work introduces the imaging parameters of optical satellite images, which greatly reduces the influence of the satellite and solar perspectives as well as the amount of data required. We improve an oriented bounding box (OBB) detector based on faster R-CNN (region convolutional neural networks) and propose an imaging parameters-considered detector (IPC-Det) that is better suited to this task. Specifically, in the first stage, the umbra and the shadow are each extracted with horizontal bounding boxes (HBBs), and the umbra is then matched to its shadow according to the imaging parameters. In the second stage, the paired umbra and shadow features are used for classification and regression, and the target is output as an OBB. In experiments, introducing the imaging parameters improves detection accuracy by 3.9% (up to 87.5%), showing that this work is a successful attempt to introduce imaging parameters for slender target detection.

1. Introduction

Slender targets are special targets whose height is much larger than their length and width, making them prominent in remote sensing images; examples include high-voltage transmission towers, wind power towers, industrial chimneys, and single high-rise buildings. They generally serve special purposes and are an indispensable part of production and life, so detecting and supervising them effectively is of great significance. In the early days, manual inspection was generally used, which was time-consuming and labor-intensive and also carried considerable risk in some areas. The introduction of drones greatly improved detection speed and supervision efficiency. However, slender targets are usually spread over a wide area with an overall sparse distribution, and the coverage of a single UAV image is limited [1,2,3], so large-scale monitoring remains difficult. In recent years, with the development of satellite remote sensing, space-based monitoring has matured, and the volume and resolution of satellite imagery have continuously improved. Combined with the broad coverage of a single scene, this provides a foundation for monitoring large-scale, sparsely distributed targets.
Currently, deep learning is still the mainstream method for object detection [4,5,6,7]. In the field of remote sensing, deep learning-based object detection can be divided into two categories according to the type of bounding box: HBB detection [8,9,10,11,12,13,14] and OBB detection [15,16,17,18,19,20]. HBB detection is the most commonly used approach. It does not distinguish the orientation of targets and is generally suited to images of natural scenes; in remote sensing object detection and text recognition, a horizontal box may erroneously contain multiple object instances. OBB detection builds on HBB detection and additionally estimates the orientation of targets. In remote sensing scenes and text recognition, it describes the region of a target more accurately and can extract features that are robust to rotation, scale, and aspect ratio. This paper improves an OBB detector based on faster R-CNN [8] to make it better suited to our task and greatly improves detection accuracy.
OBB detectors are widely used in aerial images and ship detection [16,17,19,20,21,22,23,24]. Rotated RoI Align (RRoI Align) is used to extract rotation-invariant features and can warp region features precisely according to the bounding boxes of rotated RoIs (RRoIs) in the 2D plane [17]. Ref. [20] introduces an end-to-end trained ship detection system capable of detecting ships in arbitrary orientations; it utilizes a rotation proposal network (R2RN) to generate multidirectional proposal boxes. Ref. [24] proposed a simple yet effective framework to detect multi-oriented objects: instead of regressing four vertices directly, it slides the vertices of the horizontal bounding box along the corresponding edges to accurately describe an oriented object, which further resolves the confusion for objects that are nearly horizontal. Ref. [25] incorporates a rotation-equivariant network in the detector to extract rotation-equivariant features, which can accurately predict orientation and lead to a large reduction in model size; based on the rotation-equivariant features, a rotation-invariant RoI alignment (RiRoI Align) is also proposed, which adaptively extracts rotation-invariant features from the equivariant features according to the orientation of the RoI.
Much research has been carried out on the detection of slender targets in remote sensing images. Zhou et al. [26] used machine learning methods to detect power transmission towers in SAR images, and Tragulnuch et al. [27] used methods such as Canny edge detection and the Hough transform to detect power transmission towers in aerial images. Wang et al. [28] established an aerial photography dataset and used deep learning to detect power transmission towers in aerial images, and Tian et al. [29] used deep learning to detect transmission towers in satellite remote sensing images. These studies did not combine prior information such as the characteristics of targets in remote sensing images and the imaging parameters, and thus did not fully exploit the advantages of remote sensing. Huang et al. [30] and Huang et al. [31] analyzed the problem of slender targets in optical satellite images and used shadows to make up for the lack of target features in the imaging process, effectively alleviating the impact of the satellite perspective on imaging. However, the solar perspective still greatly affects the imaging results: at different solar azimuth angles, the distribution of targets and shadows differs, and the size of the shadow is directly determined by the solar elevation angle. Eliminating the influence of the solar perspective with a purely data-driven method would require a large amount of data covering different solar perspectives, which is unrealistic in the field of satellite remote sensing.
According to the imaging geometric relationship of optical satellite images, once the satellite perspective and the solar perspective are known, we can obtain the mapping from the target to the umbra and from the target to the shadow, and then calculate the size and relative position of the umbra and the shadow. This information is brought into the neural network as prior knowledge to make up for the lack of data. Because the satellite and solar perspectives differ from image to image, the relative positions of the umbra and shadow also differ. During detection, we need to match the umbra and shadow, which effectively treats them as a whole. Without imaging parameters such as the satellite and solar perspectives, a large amount of data covering different relative positions of umbra and shadow would be required, which is unrealistic in the field of remote sensing. After introducing these imaging parameters, we can directly calculate the approximate relative positions of the umbra and shadow for each image, further reducing the data requirements and the influence of the satellite and solar perspectives on detection.
Based on the above analysis, we redesigned the detection scheme. In data processing, we use L1 data as the new data source and introduce imaging parameters such as satellite azimuth, satellite elevation, solar azimuth, and solar elevation. The relative positions of the umbra and the shadow are then calculated from these imaging parameters and passed into the network as prior information. The umbra is mainly affected by the satellite perspective, while the shadow is mainly affected by the solar perspective; detecting them separately reduces the interaction between the two perspectives, and thereby the cost of detection and the amount of data required. In the network design, we use a two-stage OBB detector. In the first stage, the umbra and the shadow are detected as two separate categories and then matched using the prior information calculated from the imaging parameters, which reduces the influence of the satellite and solar perspectives. In the second stage, oriented prediction boxes containing the targets are regressed, and the prediction results are given using the features extracted in the first stage. The contributions of this paper are as follows:
  • Imaging parameters such as satellite perspectives and solar perspectives are introduced as prior information, which makes up for the data volume of optical satellite images and reduces the influence of imaging parameters on detection.
  • The regression and classification of the network have been redesigned. A double-head structure is adopted: conv-head is used for regression, FC-head is used for classification, and channel attention is introduced before classification to improve feature utilization.
  • A complete set of data processing procedures and detection schemes are designed to improve the detection accuracy of slender targets in optical satellite images.

2. Materials and Methods

In this section, we first introduce the geometric imaging model of optical satellites and its application in this paper, and then describe our proposed IPC-Det and its novel modules (MRPN and TowerHead) in detail.

2.1. The Imaging Geometry Model of Optical Satellite Images

The imaging geometric model of optical satellite remote sensing is shown in Figure 1. The height of slender targets, such as high-voltage transmission towers, is much greater than their length and width. According to the imaging geometry, the umbra of a slender target is strongly affected by the satellite perspective but is essentially unaffected by the solar perspective. The shadow of a slender target is cast by the sun; on flat ground, the shadow is a flat target. The satellite perspective has little effect on the shadows of slender targets, whereas the solar perspective has a large effect on them. In the image, the reflectivity of the umbra is high and the reflectivity of the shadow is low, so the difference between the two is obvious.
Among the imaging parameters, the elevation and azimuth angles of the sun (satellite) have the greatest impact on imaging; the corresponding geometry is shown in Figure 2. The solar (satellite) azimuth is the angle of clockwise rotation from north to the projection of the sun (satellite) on the ground. The solar (satellite) elevation angle is the angle between the incident direction of the sunlight (satellite line of sight) and the ground plane. To simplify the subsequent description, the solar azimuth angle is denoted θ_1, the solar elevation angle θ_2, the satellite azimuth angle θ_3, and the satellite elevation angle θ_4.
The elevation angles of the satellite and the sun determine the lengths of the umbra and the shadow: the larger the elevation angle, the shorter the corresponding projection in the image. Denoting the height of the target as High, the relationships between the umbra length L_umbra and the satellite elevation angle, and between the shadow length L_shadow and the solar elevation angle, follow from the imaging geometric model shown in Figure 1:
$L_{umbra} = \frac{High}{\tan\theta_4}$  (1)
$L_{shadow} = \frac{High}{\tan\theta_2}$  (2)
The satellite azimuth and solar azimuth determine the orientations of the umbra and the shadow. Generally, the umbra and shadow are distributed at an angle and partially intersect, but there are two extreme cases. When the satellite azimuth and solar azimuth are similar, the umbra and shadow in the image overlap each other, and when the solar elevation angle is small, the shadow candidate region almost wraps the umbra candidate region, as shown in Figure 3a; in this case, the umbra covers part of the shadow information. When the satellite azimuth and solar azimuth differ by about 180°, the umbra and shadow in the image point in nearly opposite directions and hardly overlap, as shown in Figure 3b.
From the imaging geometry and the analysis above, the relative offset Loc between the umbra and the shadow can be calculated given the imaging parameters (satellite azimuth, satellite elevation, solar azimuth, and solar elevation angles) and the height of the target. To obtain the offset from the umbra of the target to the shadow, we establish a plane coordinate system with the bottom of the target as the origin, the positive x-axis pointing east, and the positive y-axis pointing north, as shown in Figure 4.
Then, the shadow vertex coordinates P_s and the target vertex coordinates P_t under the satellite projection can be obtained:
$P_s = \left(L_{shadow} \cdot \cos(270^\circ - \theta_1),\ L_{shadow} \cdot \sin(270^\circ - \theta_1)\right)$  (3)
$P_t = \left(L_{umbra} \cdot \cos(270^\circ - \theta_3),\ L_{umbra} \cdot \sin(270^\circ - \theta_3)\right)$  (4)
The offset Loc from the target vertex to the shadow vertex under the satellite projection is
$Loc = P_s - P_t$  (5)
The positive direction of the y-axis of the coordinate system in Figure 4 is exactly opposite to that of the image coordinate system (since only the relative values of the satellite and solar azimuths are used in the formula, the effect of geometric correction can be ignored). With a ground resolution of R, the offset Loc′ from the umbra center to the shadow center in the image is
$Loc' = \left(\frac{Loc_x}{2R},\ -\frac{Loc_y}{2R}\right)$  (6)
where Loc′ is the mapping from the center of the umbra to the center of the shadow; it represents the relative position of the shadow with respect to the umbra in the image under the given imaging parameters.
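To make the derivation above concrete, the sketch below evaluates Formulas (1)–(6) for a given set of imaging parameters. It is a minimal illustration, not the authors' code: the function name, the example angles, and the sign convention for the image y-axis are our assumptions.

```python
import math

def umbra_shadow_offset(sun_az, sun_elev, sat_az, sat_elev, height, resolution):
    """Offset (in pixels) from the umbra center to the shadow center,
    following Formulas (1)-(6). Angles in degrees, height in meters,
    resolution in meters per pixel."""
    # Lengths of the umbra and the shadow on the ground (Formulas (1)-(2)).
    l_umbra = height / math.tan(math.radians(sat_elev))
    l_shadow = height / math.tan(math.radians(sun_elev))

    # Vertex positions in the local east/north coordinate system (Formulas (3)-(4)).
    p_s = (l_shadow * math.cos(math.radians(270 - sun_az)),
           l_shadow * math.sin(math.radians(270 - sun_az)))
    p_t = (l_umbra * math.cos(math.radians(270 - sat_az)),
           l_umbra * math.sin(math.radians(270 - sat_az)))

    # Vertex-to-vertex offset on the ground (Formula (5)).
    loc = (p_s[0] - p_t[0], p_s[1] - p_t[1])

    # Center-to-center offset in image pixels (Formula (6)); the image y-axis
    # points south, hence the assumed sign flip on the y component.
    return (loc[0] / (2 * resolution), -loc[1] / (2 * resolution))

# Example with illustrative imaging parameters (assumed 80 m height, 0.8 m resolution).
print(umbra_shadow_offset(sun_az=150.0, sun_elev=45.0,
                          sat_az=190.0, sat_elev=70.0,
                          height=80.0, resolution=0.8))
```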

2.2. IPC-Det

IPC-Det takes the imaging parameters as prior information to combine the umbra and shadow of a target, so that the detector does not have to learn the relative location of the umbra and the shadow. This simplifies the detection process and reduces the amount of data required. At the same time, IPC-Det improves the classification and regression modules of faster R-CNN, increasing feature utilization and detection accuracy.
By analyzing the characteristics of slender targets such as high-voltage transmission towers in optical satellite images, we find that the umbra of a slender target is mainly affected by the satellite perspective, while its shadow is mainly affected by the solar perspective. In view of the special relationship between umbra and shadow (their structures are complementary under different imaging parameters, and their relative position is determined by the imaging parameters and can be calculated from them), we build on faster R-CNN [8] and propose IPC-Det, shown in Figure 5a. IPC-Det is a two-stage detector: the first stage generates paired umbra and shadow candidate regions, and the second stage uses the paired candidate boxes for classification and regresses the targets by OBB.
The backbone of IPC-Det is the same as that of faster R-CNN and is used to extract multi-scale features f = {f_1, f_2, f_3, f_4, f_5}. We improve the region proposal network (RPN) of the original faster R-CNN and design MRPN, whose structure is shown in Figure 5b. It contains two RPN channels that extract the umbra proposals and the shadow proposals by HBB, respectively; the two extraction processes are independent of each other. The extracted umbra and shadow candidate regions are input to the target matching module, which matches umbra and shadow and outputs the paired candidate regions.
The parameter settings of RoIAlign are the same as those of faster R-CNN, except that RoIAlign is used three times to extract features. First, the umbra candidate regions are enlarged by a factor of 1.2 and their features are extracted and fed directly into the regression branch of TowerHead. Then, the features corresponding to the umbra candidate regions and the shadow candidate regions are extracted and fed into the classification branch of TowerHead to determine the category of the corresponding candidate regions.
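To make this step concrete, the following sketch (our own, not the authors' code) enlarges horizontal umbra proposals by a factor of 1.2 around their centers and pools 7 × 7 features with torchvision's RoIAlign; the feature-map stride and tensor shapes are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_align

def enlarge_boxes(boxes, factor=1.2):
    """Scale (x1, y1, x2, y2) boxes around their centers by `factor`."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = (boxes[:, 2] - boxes[:, 0]) * factor
    h = (boxes[:, 3] - boxes[:, 1]) * factor
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

# One feature map of a single image and a few umbra proposals in image coordinates.
feat = torch.randn(1, 256, 200, 200)             # e.g., a stride-4 FPN level (assumed)
umbra_boxes = torch.tensor([[100., 120., 140., 260.],
                            [300., 310., 330., 420.]])
rois = torch.cat([torch.zeros(len(umbra_boxes), 1), enlarge_boxes(umbra_boxes)], dim=1)

# 7x7 pooled features for the regression branch; spatial_scale maps image to feature coords.
pooled = roi_align(feat, rois, output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```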
The TowerHead adopts a double-head structure, shown in Figure 5c. The regression branch uses the conv-head branch of the double-head network [15], replacing the FC-head of faster R-CNN with conv-head, which pays more attention to edges. For the classification branch, we improve the classification branch of the original faster R-CNN: its inputs are the features of the umbra and the shadow, and in order to extract the effective information of both, we add a channel attention module (CAM) in front of the original FC-head, which greatly improves the utilization of the umbra and shadow information.
The loss of the network is calculated as follows:
$Loss = L_{MRPN} + 2 \times L_{TowerHead}$  (7)
The loss of the network mainly comes from two parts: MRPN loss and TowerHead loss. Among them, MRPN is used to generate paired shadow and umbra proposals, and TowerHead uses the extracted proposals for classification and regression. The detailed calculation of the two-part loss is shown in Formulas (10)–(13) and (15).
Below, we mainly introduce the two innovative modules, MRPN and TowerHead.

2.3. MRPN

The structure of MRPN is shown in Figure 5b, which is mainly used to extract pairs of umbra and shadow proposals. Extraction is divided into two steps. First, the proposals of umbra and shadow are extracted, respectively, by RPN, then the umbra candidate regions and the shadow candidate regions are matched by prior information such as imaging parameters.

2.3.1. Extracting the Proposals of Umbra and Shadow

MRPN contains two parallel RPN modules that extract the proposals of the umbra and the shadow, respectively. Each RPN consists of a 3 × 3 convolutional layer and two 1 × 1 convolutional layers. The five levels of multi-scale features generated by the backbone are input into the two RPNs, and all feature levels share the convolution parameters within each RPN. Finally, through the two RPN modules, we obtain the umbra proposals B_t = {(b_tx^i, b_ty^i, b_tw^i, b_th^i)}, i = 1, 2, …, N_t, and the shadow proposals B_s = {(b_sx^i, b_sy^i, b_sw^i, b_sh^i)}, i = 1, 2, …, N_s, where (b_tx, b_ty) and (b_sx, b_sy) are the center coordinates of the proposals, b_tw and b_sw their widths, and b_th and b_sh their heights.
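A minimal sketch of one such RPN branch is given below. The 3 × 3 convolution followed by two 1 × 1 convolutions and the three aspect-ratio anchors follow the description above, while the 256 input channels, the single objectness logit per anchor, and the module name are our assumptions.

```python
import torch
import torch.nn as nn

class RPNBranch(nn.Module):
    """One RPN channel of MRPN: a 3x3 conv followed by two 1x1 convs that
    predict per-anchor objectness logits and box regression deltas."""
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feats):
        # `feats` is the list of multi-scale backbone features f1..f5;
        # the same convolution parameters are shared across all levels.
        scores, deltas = [], []
        for f in feats:
            x = torch.relu(self.conv(f))
            scores.append(self.objectness(x))
            deltas.append(self.bbox_deltas(x))
        return scores, deltas

umbra_rpn, shadow_rpn = RPNBranch(), RPNBranch()   # two parallel, independent branches
```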

2.3.2. Target Matching

The target matching block follows the two parallel RPN blocks. It matches the extracted umbra proposals with the shadow proposals and finally outputs the paired proposals B = {(b_t^i, b_s^i)}, i = 1, 2, …, N, where b_t^i and b_s^i are a matched pair of umbra and shadow proposals.
By analyzing the optical satellite imaging geometry and the target structure, we obtain the offset Loc′ = (F_x(hig, θ_1, θ_2, θ_3, θ_4), F_y(hig, θ_1, θ_2, θ_3, θ_4)), where hig is the height of the target and θ_1, θ_2, θ_3, and θ_4 are the solar azimuth, solar elevation, satellite azimuth, and satellite elevation angles, respectively. Within a scene image, these four angles are fixed and can be used as prior information, but the height of each tower in the image is variable and unknown. High-voltage transmission towers are sparsely and widely distributed, their backgrounds are complex, and their heights are generally between 20 and 100 m. After testing, an assumed height of 80 m works best.
From the umbra candidate regions B_t and the shadow candidate regions B_s, we obtain the position of the umbra in the image (P_umbra) and the position of the shadow in the image (P_shadow), and from the umbra position we form the position vector P_t. The theoretical position of the shadow, P′_shadow, corresponding to the umbra is then obtained by Formula (8), which converts P_s from the local coordinate system into the image coordinate system:
$P'_{shadow} = P_t + Loc'$  (8)
After obtaining P′_shadow, we compare this theoretical shadow position with the actual shadow positions P_shadow predicted by the network, and a shadow whose position is consistent with the theoretical value is selected. However, since the height of the tower is assumed and both P_umbra and P_shadow are predicted by the network, there are always errors between the theoretical and actual values. To achieve better matching, we give P′_shadow a tolerance range a (a > 0) and obtain an expanded shadow prediction position P″_shadow:
$P''_{shadow} = P'_{shadow} + a$  (9)
We use P″_shadow to match the actual shadows: when a predicted shadow position P_shadow falls within the range of P″_shadow, it is considered a candidate match. Finally, among all candidate matches, we select the one with the highest confidence as the paired shadow.
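The matching rule can be sketched as follows. The Euclidean-distance test against the tolerance a, the center-based box representation, and all function and variable names are our assumptions; the paper only specifies that the actual shadow must lie within the range of P″_shadow and that the highest-confidence candidate is kept.

```python
import torch

def match_shadow(umbra_centers, shadow_centers, shadow_scores, loc, a):
    """For each umbra proposal, pick the highest-scoring shadow proposal whose
    center lies within `a` pixels of the theoretical position P_t + Loc'."""
    loc = torch.as_tensor(loc)
    expected = umbra_centers + loc                        # theoretical shadow centers, (N_t, 2)
    # Pairwise distances between theoretical and actual shadow centers, (N_t, N_s).
    dist = torch.cdist(expected.float(), shadow_centers.float())
    matches = []
    for i in range(len(umbra_centers)):
        candidates = (dist[i] <= a).nonzero(as_tuple=True)[0]
        if len(candidates) == 0:
            matches.append(-1)                            # no shadow found within the range
        else:
            best = candidates[shadow_scores[candidates].argmax()]
            matches.append(int(best))
    return matches
```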

2.3.3. Loss Function

During training, the main task of the RPN modules is to extract the proposals of the umbra and the shadow, without matching them. We assign a binary label b ∈ {0, 1} to each anchor. When the IoU of an anchor with every ground truth is less than 0.3, its label is 0; when the IoU of an anchor with one or more ground truths is greater than 0.7, its label is 1; all other anchors are discarded. The loss of MRPN, L_MRPN, is then
$L_{MRPN} = L_{UMBRA} + L_{SHADOW}$  (10)
$L_{UMBRA}(t_t, a_t) = \frac{1}{N_t}\sum_i F_{cls}(t_t^i, t_t^{i*}) + \frac{1}{N_t}\sum_i t_t^{i*} F_{reg}(a_t^i, a_t^{i*})$  (11)
$L_{SHADOW}(t_s, a_s) = \frac{1}{N_s}\sum_i F_{cls}(t_s^i, t_s^{i*}) + \frac{1}{N_s}\sum_i t_s^{i*} F_{reg}(a_s^i, a_s^{i*})$  (12)
where L_UMBRA is the loss of the RPN that extracts the umbra proposals, L_SHADOW is the loss of the RPN that extracts the shadow proposals, t^{i*} is the label of the anchor, t^i is the category predicted by the RPN, a^{i*} is the offset between the anchor and the ground truth, a^i is the regression output of the RPN, and N_t and N_s are the numbers of anchors for the umbra and the shadow, respectively. F_cls is the cross-entropy loss, used for classification, and F_reg is the smooth L1 loss (Formula (13)), used for regression. Each part of the loss has a weight of 1.
$SmoothL1Loss(x, y) = \begin{cases} 0.5(x - y)^2, & \text{if } |x - y| < 1 \\ |x - y| - 0.5, & \text{otherwise} \end{cases}$  (13)
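For reference, the smooth L1 term of Formula (13) can be written out directly; PyTorch's built-in torch.nn.functional.smooth_l1_loss with its default beta of 1.0 computes the same quantity. This is an illustrative sketch, not the authors' code.

```python
import torch

def smooth_l1(x, y):
    """Elementwise smooth L1 loss of Formula (13)."""
    diff = torch.abs(x - y)
    return torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)

# Example: regression deltas of positive anchors against their targets (illustrative values).
pred = torch.tensor([0.2, 1.6, -0.4])
target = torch.tensor([0.0, 0.1, -0.5])
print(smooth_l1(pred, target).mean())
```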

2.4. TowerHead

The double-head structure is widely used in the R-CNN stage for classification and regression. Ref. [14] shows that classification and regression tasks have different focuses: classification typically requires translational invariance, while regression requires translational sensitivity. The double-head structure satisfies both requirements at the same time. Ref. [15] shows that FC-head is more spatially sensitive than conv-head, so FC-head is better suited to classification and conv-head to regression. In this paper, we extract features of both the umbra and the shadow; we use only the umbra features for regression, while for classification, using both umbra and shadow features significantly improves performance. Therefore, we retain the conv-head branch of [15] for the regression task, and in the classification branch we add a channel attention module to the design of [15] to improve the utilization of umbra and shadow features.

2.4.1. CAM

A channel attention module (CAM) is added to the classification channel, as shown in Figure 6. This module is similar to the attention module proposed in [32], except that only the channel attention part is used and the subsequent spatial attention is discarded, because our feature size is only 7 × 7, so spatial attention is unnecessary.
The input of the CAM is the umbra features and shadow features extracted by RoIAlign. The block has two channels: one performs a MaxPool operation on the features and the other an AvgPool operation. The two results are then fed into a two-layer convolutional network whose parameters are shared between the channels. The outputs of the two channels are summed and passed through a sigmoid activation to obtain the channel weights. Finally, the weights are multiplied by the input features to obtain the result of the channel attention processing.
The calculation process of CAM is
$F_{cls} = Concat(F_{umbra}, F_{shadow}) \times Sigmoid\big(MLP(MaxPool(Concat(F_{umbra}, F_{shadow}))) + MLP(AvgPool(Concat(F_{umbra}, F_{shadow})))\big)$  (14)
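A minimal PyTorch sketch of this computation is shown below, assuming 256-channel 7 × 7 RoI features for the umbra and the shadow and a shared two-layer MLP realized as 1 × 1 convolutions (the first halving the channels, as described in Section 3.2). The ReLU between the two layers and the class and variable names are our assumptions.

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention over the concatenated umbra and shadow features (Formula (14))."""
    def __init__(self, channels=512, reduction=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, f_umbra, f_shadow):
        x = torch.cat([f_umbra, f_shadow], dim=1)            # (N, 512, 7, 7)
        max_desc = torch.amax(x, dim=(2, 3), keepdim=True)   # MaxPool over the spatial dims
        avg_desc = torch.mean(x, dim=(2, 3), keepdim=True)   # AvgPool over the spatial dims
        weight = torch.sigmoid(self.mlp(max_desc) + self.mlp(avg_desc))
        return x * weight                                    # reweighted features

cam = CAM()
out = cam(torch.randn(8, 256, 7, 7), torch.randn(8, 256, 7, 7))
print(out.shape)  # torch.Size([8, 512, 7, 7])
```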

2.4.2. The Design of Conv-Head

In regression, we use the regression branch designed in [15]. This branch includes six residual modules: the first two (Figure 7a) increase the feature dimension from 256 to 1024, and the last four are non-local blocks (Figure 7b). Finally, a series of oriented bounding boxes B = {(x_i, y_i, w_i, h_i, θ_i)}, i = 1, 2, …, N is obtained, where (x_i, y_i) are the center coordinates of a prediction box, w_i its width, h_i its height, and θ_i its rotation angle.
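As an illustration of the first kind of residual module (Figure 7a), a bottleneck block that raises the channel dimension from 256 to 1024 with a 1 × 1 projection on the shortcut might look as follows; the internal kernel sizes, normalization layers, and names are our assumptions rather than the published design.

```python
import torch
import torch.nn as nn

class UpDimResidual(nn.Module):
    """Residual block that increases the feature dimension from 256 to 1024."""
    def __init__(self, in_channels=256, out_channels=1024, mid_channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
        )
        # 1x1 projection so the shortcut matches the increased dimension.
        self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1),
                                      nn.BatchNorm2d(out_channels))

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

print(UpDimResidual()(torch.randn(2, 256, 7, 7)).shape)  # torch.Size([2, 1024, 7, 7])
```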

2.4.3. Loss Function

The loss of the TowerHead part consists of a regression loss and a classification loss. TowerHead contains two channels for classification and regression: the input of the classification channel is a pair of umbra and shadow features, and the input of the regression channel is only the umbra features. The loss of TowerHead is shown in Formula (15).
$L_{TowerHead}(C, B) = \frac{1}{N}\sum_i L_{cls}(C_i, C_i^{*}) + \frac{1}{N}\sum_i p_{ti}^{*} L_{reg}(B_i, B_i^{*})$  (15)
where B is the regression result of the network and C is the classification result. L_cls is the classification loss, using cross-entropy, and L_reg is the regression loss, using smooth L1 (Formula (13)).

3. Experiment

3.1. Datasets

The experiments take high-voltage transmission towers as the targets. The data are taken from the L1 level products of JL-2 and GF-2, with a resolution of 0.8 m. The image sizes range from 406 × 358 to 1119 × 837 pixels, with 2584 images in the training set and 915 in the test set. The dataset contains three types of transmission towers, namely the wine-glass tower, the dry-type tower, and the cat-head tower, in a ratio of roughly 1:1:1.
Since IPC-Det needs additional imaging parameters as prior information, existing annotation schemes are not directly applicable. On the basis of the original COCO annotation [33], we added the imaging parameters ("ImagingParameters" in Figure 8) and the corresponding shadow box annotation ("Shadow" in Figure 8), as shown in Figure 8. The new annotation satisfies the requirements of IPC-Det without compromising its usability with other networks.
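For illustration only, a single annotation entry under such an extended format might look like the sketch below. Apart from the standard COCO fields and the "ImagingParameters" and "Shadow" keys named above, the exact key names, units, and value layout are our assumptions, not the published schema.

```python
# Hypothetical annotation entry (COCO-style dict extended with the two new fields).
annotation = {
    "image_id": 1385,
    "category_id": 1,                       # e.g., wine-glass tower
    "bbox": [412.0, 233.0, 38.0, 95.0],     # standard COCO umbra box: [x, y, w, h]
    "Shadow": [395.0, 310.0, 61.0, 47.0],   # added: HBB of the matching shadow
    "ImagingParameters": {                  # added: per-image imaging parameters
        "SolarAzimuth": 152.3,              # degrees, clockwise from north
        "SolarElevation": 41.7,
        "SatelliteAzimuth": 188.6,
        "SatelliteElevation": 72.4,
        "Resolution": 0.8,                  # meters per pixel
    },
}
```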

3.2. Training Configurations

All experiments in the paper are performed in the PyTorch environment on two RTX 2080Ti GPUs with a batch size of 4. The long side of each image is resized to 800. Training runs for a total of 12 epochs. The initial learning rate is set to 5 × 10⁻⁶ and is increased linearly to 0.005 over the first 500 steps; from the 8th to the 11th epoch the learning rate is reduced to 5 × 10⁻⁴, and in the 12th epoch it is 5 × 10⁻⁵. In the MRPN part, the anchor scale for both umbra and shadow is set to 8, with aspect ratios [0.5, 1, 2]. When calculating the offset from the umbra to the shadow from the imaging parameters, the height of the tower is assumed to be 80 m. In TowerHead, when conv-head regresses the umbra, the umbra RoIs extracted by MRPN are enlarged by a factor of 1.2 before the corresponding features are extracted. Conv-head uses six residual modules, of which the first two increase the channel dimension and the last four are non-local blocks; details of the two modules are shown in Figure 7. In CAM, we use two convolutional layers: the first halves the number of input feature channels and the second restores it.
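The learning-rate schedule described above can be reproduced roughly as follows; the choice of SGD, the momentum and weight-decay values, and the exact milestone epochs are our assumptions.

```python
import torch

model = torch.nn.Linear(10, 2)   # stand-in for the detector parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=1e-4)

# Linear warmup from 5e-6 to 5e-3 over the first 500 iterations
# (warmup.step() is called once per training iteration during warmup).
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=5e-6 / 0.005,
                                           total_iters=500)

# Decay by 10x after epoch 8 and again after epoch 11: 5e-3 -> 5e-4 -> 5e-5
# (decay.step() is called once per epoch).
decay = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)
```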

3.3. Ablation Experiment

In order to verify the influence of different detection heads and of the shadow features on the detection results, we run experiments with the different detection heads shown in Figure 9, using either only the umbra features or both the umbra and shadow features as input. The results are shown in Table 1. Among the detection heads in Figure 9, only (a) is a single-head structure; the rest are double-head structures. In (b), both branches are fully connected; in (c), the regression branch is conv-head and the classification branch is fully connected; in (d), the regression branch is fully connected and the classification branch uses a CAM; and (e) is the detection head of IPC-Det (ours).
It can be seen from Table 1 that, with the same detection head, the accuracy when using both umbra and shadow features is higher than when using only umbra features, indicating that the shadow features matched via the imaging parameters do improve detection accuracy. Overall, SFCHead and CHead have the lowest accuracy, DFCHead and CAHead are second, and TowerHead is the highest. This shows that the double-head structure is indeed better than the single-head structure, and that the two-branch FC structure is more suitable for this task than the single-branch FC structure. The accuracy of CHead is similar to that of SFCHead, indicating that conv-head alone cannot exert its advantage here: improving only the regression strength does not raise the final accuracy. Compared with DFCHead, the accuracy of CAHead improves only slightly, indicating that CAM alone is also insufficient: increasing only the classification strength cannot greatly improve the final accuracy. TowerHead achieves the highest precision; it uses conv-head to strengthen regression and CAM to strengthen classification, showing that the highest accuracy is obtained by improving both at the same time.
Analyzing different network inputs under the same detection head in Table 1 shows that, for SFCHead and CHead, there is little difference between using only the umbra features and using both the umbra and shadow features. Neither SFCHead nor CHead changes the classification branch much, which is consistent with this result. For CAHead and TowerHead, the accuracy when inputting both umbra and shadow features is clearly higher than when inputting only umbra features, indicating that CAM does improve the utilization of the umbra and shadow features.

3.4. Comparison with Other Networks

In the comparison experiments there are six networks: faster R-CNN [8], oriented R-CNN [16], ReDet [25], RoITransformer [17], double-head [15], and IPC-Det (ours). Five experimental indicators are reported: "mAP", "AP", "AR", "Params", and "FPS". "AP" is the precision of a network, "AR" its average recall, "Params" its number of parameters (in millions, M), and "FPS" the amount of data it can process per second. "Precision" and "recall" are computed as in Formulas (16) and (17), and the calculation of these metrics is consistent with the MS COCO dataset [33]. The performance of the networks on the dataset is shown in Table 2.
$Precision = \frac{TruePositives}{TruePositives + FalsePositives}$  (16)
$Recall = \frac{TruePositives}{TruePositives + FalseNegatives}$  (17)
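As a small worked example of Formulas (16) and (17), with illustrative counts of our own:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

# E.g., 87 correct detections, 13 false alarms, 9 missed targets (illustrative numbers).
print(precision_recall(87, 13, 9))  # (0.87, 0.90625)
```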
As can be seen from Table 2, IPC-Det achieves the highest precision among the networks, and its recall is second only to ReDet, which shows that the proposed method is effective. Compared with faster R-CNN, the precision of IPC-Det is higher by 3.9% and its recall by 0.9%, and its detection accuracy on all three types of target is higher. The detection speeds of double-head and IPC-Det are the lowest among the six networks, with double-head slightly faster than IPC-Det. This is because double-head uses conv-head for regression, and conv-head contains a multi-layer residual network whose convolutions greatly slow down detection; IPC-Det additionally uses both umbra and shadow features for classification and adds a channel attention layer before the original fully connected layers, which further reduces the detection speed. The detection results of the six networks are shown in Figure 10; in these cases, all six networks can basically complete the detection task.

4. Conclusions

Imaging parameters such as the satellite perspective and the solar perspective have a great influence on optical satellite images. These parameters vary, and the same target has different umbras and shadows under different imaging parameters. Unlike in computer vision, it is difficult to obtain large amounts of high-quality data in remote sensing, so purely data-driven methods are severely limited. By analyzing the imaging geometric model of optical satellite images, we found that the size and relative position of the umbra and shadow can be roughly predicted when the imaging parameters are known. This avoids directly learning the relative positions of umbras and shadows, thereby separating the influence of the satellite and solar perspectives on the image and indirectly reducing it. In this paper, the data processing flow and the network are redesigned, and imaging parameters are introduced as prior information. With the imaging parameters, their influence on detection is greatly reduced, which lowers the demand for data and improves detection accuracy.
This work is an initial attempt to introduce imaging parameters for extracting slender targets in optical satellite images. The experiments show that this direction is promising and deserves follow-up research. At present, the samples in the dataset are not very difficult; in the future, the dataset can be expanded with harder samples. In addition, the detection speed of the existing model needs to be improved and the model refined.

Author Contributions

Z.H. and F.W. designed the algorithm; Z.H. implemented the algorithm in Python; H.Y. and Y.H. guided the overall project containing this one; Z.H. and F.W. wrote this paper; H.Y. and F.W. read and revised this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ruan, L.; Wang, J.; Chen, J.; Xu, Y.; Yang, Y.; Jiang, H.; Zhang, Y.; Xu, Y. Energy-efficient multi-UAV coverage deployment in UAV networks: A game-theoretic framework. China Commun. 2018, 15, 194–209. [Google Scholar] [CrossRef]
  2. Yang, X.; Lin, D.; Zhang, F.; Song, T.; Jiang, T. High Accuracy Active Stand-off Target Geolocation Using UAV Platform. In Proceedings of the 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019; pp. 1–4. [Google Scholar] [CrossRef]
  3. Wang, S.; Han, Y.; Chen, J.; Zhang, Z.; Wang, G.; Du, N. A Deep-Learning-Based Sea Search and Rescue Algorithm by UAV Remote Sensing. In Proceedings of the 2018 IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), Xiamen, China, 10–12 August 2018; pp. 1–5. [Google Scholar] [CrossRef]
  4. Lin, M.; Chen, Q.; Yan, S. Network in Network. arXiv 2013, arXiv:1312.4400. [Google Scholar]
  5. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  7. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. arXiv 2019, arXiv:1905.05055. [Google Scholar]
  8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
  10. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  11. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  12. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019. [Google Scholar]
  13. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  14. Song, G.; Liu, Y.; Wang, X. Revisiting the Sibling Head in Object Detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  15. Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking Classification and Localization for Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  16. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
  17. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
  18. Yang, X.; Liu, Q.; Yan, J.; Li, A.; Zhang, Z.; Yu, G. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv 2019, arXiv:1908.05612. [Google Scholar]
  19. Ma, J.; Shao, W.; Hao, Y.; Li, W.; Hong, W. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multimed. 2017, 20, 3111–3122. [Google Scholar] [CrossRef] [Green Version]
  20. Zhang, Z.; Guo, W.; Zhu, S.; Yu, W. Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1745–1749. [Google Scholar] [CrossRef]
  21. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar] [CrossRef] [Green Version]
  22. Ding, J.; Xue, N.; Xia, G.S.; Bai, X.; Yang, W.; Yang, M.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 1. [Google Scholar] [CrossRef] [PubMed]
  23. Huang, H.; Huo, C.; Wei, F.; Pan, C. Rotation and Scale-Invariant Object Detector for High Resolution Optical Remote Sensing Images. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1386–1389. [Google Scholar] [CrossRef]
  24. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A Rotation-equivariant Detector for Aerial Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2785–2794. [Google Scholar] [CrossRef]
  26. Zhou, X.; Liu, X.; Chen, Q.; Zhang, Z. Power Transmission Tower CFAR Detection Algorithm Based on Integrated Superpixel Window and Adaptive Statistical Model. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 2326–2329. [Google Scholar] [CrossRef]
  27. Tragulnuch, P.; Kasetkasem, T.; Isshiki, T.; Chanvimaluang, T.; Ingprasert, S. High Voltage Transmission Tower Identification in an Aerial Video Sequence using Object-Based Image Classification with Geometry Information. In Proceedings of the 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Chiang Rai, Thailand, 18–21 July 2018; pp. 473–476. [Google Scholar] [CrossRef]
  28. Wang, H.; Yang, G.; Li, E.; Tian, Y.; Zhao, M.; Liang, Z. High-Voltage Power Transmission Tower Detection Based on Faster R-CNN and YOLO-V3. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8750–8755. [Google Scholar] [CrossRef]
  29. Tian, G.; Meng, S.; Bai, X.; Liu, L.; Zhi, Y.; Zhao, B.; Meng, L. Research on Monitoring and Auxiliary Audit Strategy of Transmission Line Construction Progress Based on Satellite Remote Sensing and Deep Learning. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; pp. 73–78. [Google Scholar] [CrossRef]
  30. Huang, Z.; Wang, F.; You, H.; Hu, Y. Shadow Information-Based Slender Targets Detection Method in Optical Satellite Images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
  31. Huang, Z.; Wang, F.; You, H.; Hu, Y. STC-Det: A Slender Target Detector Combining Shadow and Target Information in Optical Satellite Images. Remote Sens. 2021, 13, 4183. [Google Scholar]
  32. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module; Springer: Cham, Switzerland, 2018. [Google Scholar]
  33. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context; Springer International Publishing: New York, NY, USA, 2014. [Google Scholar]
Figure 1. Imaging geometric model of optical satellites. The size and orientation of the umbra in an optical image are affected by the satellite perspective, and the size and orientation of the shadow are affected by the solar perspective. The same target has different umbras and shadows under different satellite and solar perspectives. In this figure, "Target" is the slender target, "Umbra" is the umbra of the target under the satellite, and "Shadow" is the shadow of the target under the sun.
Figure 2. Schematic diagram of the solar (satellite) elevation and azimuth definitions. In this figure, "Elevation Angle" is the elevation angle and "Azimuth Angle" is the azimuth angle.
Figure 3. Two special distributions of the umbra and shadow under the influence of the satellite and solar azimuths. "Umbra" is the umbra in the image and "Shadow" is the shadow in the image. (a) The distribution when the satellite azimuth and solar azimuth are similar. (b) The distribution when the satellite azimuth and solar azimuth differ by about 180°.
Figure 4. Schematic illustration of the umbra and shadow distribution of a slender target in an optical satellite image. The coordinate system takes the bottom of the target as the origin, east as the positive x-axis, and north as the positive y-axis. "P_t" is the vector representation of the umbra in this coordinate system, "P_s" is that of the shadow, and "Loc" is the offset from the umbra to the shadow.
Figure 5. Overall architecture of the proposed method. (a) Overall structure of IPC-Det. We first extract multi-scale features with a deep convolutional backbone, then use MRPN to extract umbra and shadow candidate regions, and finally use RoIAlign to extract the corresponding features and TowerHead for classification and regression. (b) Detailed structure of MRPN. Two RPN modules extract umbra and shadow candidate regions, respectively; the imaging parameters are then used to match umbra and shadow, and paired candidate regions are output. (c) Schematic diagram of TowerHead. TowerHead adopts a double-head structure. The upper channel is used for classification and consists of a channel attention module (CAM) and fully connected (FC) layers; CAM improves feature utilization and the FC layers classify the features. The lower channel performs regression and, as in [15], uses the conv-head structure.
Figure 6. Detailed structure of the channel attention module (CAM) in TowerHead. The blue shape on the left is the shadow feature, the red shape is the umbra feature, and the yellow shape on the right is the feature after channel attention processing. The umbra and shadow features pass through a two-layer shared convolutional network after MaxPool and AvgPool operations. The results are added, a sigmoid operation gives the channel weights, and the weights are multiplied by the input features to obtain the output of this module.
Figure 7. The residual modules in conv-head. (a) Residual module used to increase the feature dimension (from 256 to 1024). (b) Non-local block.
Figure 8. Our redesigned annotation format. The annotation adds "ImagingParameters" and "Shadow" to the COCO format [33], where "ImagingParameters" contains the imaging parameters of the data and "Shadow" is the shadow label corresponding to the umbra.
Figure 9. The TowerHead structures compared in Table 1. (a) The detection head of faster R-CNN. (b) Double-head structure; both classification and regression are performed by their own FC layers. (c) Double-head structure; classification is performed by FC layers and regression by conv-head. (d) Double-head structure; classification is performed by CAM and FC layers and regression by FC layers. (e) Double-head structure; classification is performed by CAM and FC layers and regression by conv-head [15].
Figure 10. Visualization results of the six networks on the dataset: (a) faster R-CNN [8], (b) oriented R-CNN [16], (c) RoITransformer [17], (d) double-head [15], (e) ReDet [25], and (f) IPC-Det (ours).
Table 1. Experiments with different inputs and structures of TowerHead, corresponding to the detection head structures shown in Figure 9. TowerHead has two kinds of inputs, "Umbra" and "Umbra + Shadow", which denote using only umbra features and using both umbra and shadow features as input, respectively.

Input            Structure of TowerHead   mAP     AR
Umbra            SFCHead                  0.834   0.920
Umbra + Shadow   SFCHead                  0.838   0.915
Umbra            DFCHead                  0.839   0.922
Umbra + Shadow   DFCHead                  0.844   0.922
Umbra            CHead                    0.833   0.921
Umbra + Shadow   CHead                    0.834   0.916
Umbra            CAHead                   0.838   0.920
Umbra + Shadow   CAHead                   0.847   0.932
Umbra            TowerHead                0.862   0.926
Umbra + Shadow   TowerHead                0.875   0.931
Table 2. Experimental results on the dataset. Each column is an experimental indicator and each row an experimental network. A, B, and C are the three types of tower in the dataset.

Model             A: AP / AR      B: AP / AR      C: AP / AR      mAP     AR      FPS (Task/s)   Params (M)
Faster R-CNN      0.890 / 0.944   0.808 / 0.890   0.808 / 0.930   0.836   0.922   30.5           41.13
Oriented R-CNN    0.875 / 0.936   0.774 / 0.897   0.778 / 0.914   0.809   0.916   30.4           41.13
RoITransformer    0.876 / 0.944   0.802 / 0.876   0.810 / 0.937   0.830   0.919   26.1           55.05
Double-Head       0.890 / 0.946   0.808 / 0.897   0.812 / 0.921   0.837   0.921   7.9            46.72
ReDet             0.842 / 0.931   0.758 / 0.944   0.719 / 0.967   0.773   0.947   19.5           31.56
IPC-Det (ours)    0.892 / 0.958   0.887 / 0.911   0.846 / 0.923   0.875   0.931   4.6            59.83
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
