Article

Feature Enhanced Anchor-Free Network for School Detection in High Spatial Resolution Remote Sensing Images

1 Key Laboratory of Digital Earth Science, Aerospace Information Research Institute (AIR), Chinese Academy of Sciences, Beijing 100094, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(6), 3114; https://doi.org/10.3390/app12063114
Submission received: 14 February 2022 / Revised: 13 March 2022 / Accepted: 16 March 2022 / Published: 18 March 2022

Abstract

Object detection in remote sensing images (RSIs) is currently one of the most important topics; it can promote understanding of the Earth and better serve the construction of the digital Earth. In addition to single objects, RSIs contain many composite objects, notably primary and secondary schools (PSSs), which are composed of several parts and surrounded by complex backgrounds. Existing deep learning methods have difficulty detecting composite objects effectively. In this article, we propose a feature enhanced network (FENet) based on an anchor-free method for PSSs detection. FENet not only realizes more accurate pixel-level detection based on enhanced features but also simplifies the training process by avoiding anchor-related hyper-parameters. First, an enhanced feature module (EFM) is designed to improve the representation ability of complex features. Second, a context-aware strategy is used to alleviate the interference of background information. In addition, complete intersection over union (CIoU) loss is employed for bounding box regression, which yields better convergence speed and accuracy. At the same time, we build a PSSs dataset for composite object detection, containing 1685 images of PSSs in the Beijing–Tianjin–Hebei region. Experimental results demonstrate that FENet outperforms several object detectors and achieves 78.7% average precision, confirming the advantage of our proposed method for PSSs detection.

1. Introduction

Object detection has been a basic and popular task in remote sensing image interpretation. The purpose of object detection is to locate objects and predict their categories. In recent years, outstanding achievements have been made in remote sensing image object detection [1,2,3,4,5]. For one, a large number of high-resolution remote sensing images (RSIs) can now be obtained, benefitting from advances in earth observation satellite technology, so more detailed information about geospatial objects can be observed and analyzed. Furthermore, extremely powerful tools have been proposed thanks to the increase in diverse data and computational resources [6]. Recent research has proved that deep learning is a powerful tool for feature learning and big data analysis [7,8,9,10]. In particular, the excellent performance of convolutional neural networks (CNNs) has aroused great interest among researchers in the field of remote sensing, and deep learning methods have been extensively used in remote sensing imagery interpretation. However, object detection in RSIs still faces many challenges:
  • Deep CNN can extract mid-level and high-level features. However, feature representation in RSIs using single CNN is limited due to scale variation, complicated scenes, and noise.
  • Deep learning methods rely heavily on massive amounts of annotated data for learning. Despite the explosive growth of satellite imagery in quality and quantity, much of the data must be labeled manually.
  • Different from objects in natural scene images, objects in RSIs are usually photographed from an overhead view, so details cannot be captured from multiple angles. In addition, affected by illumination, shadow, scale variations, resolution, and so on, objects in RSIs are difficult to detect accurately using deep learning methods directly.
  • Many geospatial objects in RSIs are combinations of some objects, such as airports, schools, and thermal power plants. These composite objects contain several parts and the feature representation is not fixed. It is challenging to detect composite objects due to their diverse appearance, irregular boundaries, and complex background.
Based on the problems mentioned above, object detection methods in RSIs have been extensively explored in the academic community. Some studies [11,12,13,14] applied feature pyramid structures for multi-scale object detection: top-down or bottom-up pathways across feature layers can be used to achieve multi-scale prediction, particularly for small object detection. The attention mechanism [15] can emphasize the relevant information of objects and improve feature representation. To improve the capacity for discriminative features, some research [16,17,18] introduced attention mechanisms into CNNs for object detection in RSIs and effectively improved the discrimination of features. However, attention methods are usually applied to all convolutional layers, which incurs a large computational cost. Additionally, these methods are mainly built on anchor-based [8,9] networks that require anchor parameters to be set for new objects. In the field of composite object detection, some studies used hard example mining [19,20] or improved RPNs [21] for airport detection. Sun et al. [22] proposed a part-based network for composite object detection and achieved great performance.
However, the objects studied in these articles differ from composite objects such as primary and secondary schools (PSSs). Figure 1 shows some school samples in Gaofen (GF) images. PSSs in China are composed of a ground track field and some buildings of various shapes and sizes, with relatively clear boundaries. In urban regions, PSSs are surrounded by clustered residential areas; in remote rural regions, the color and texture of playgrounds differ from those in urban areas, and PSSs are surrounded by villages and farmland. In addition, PSSs are relatively small, and their internal components are more compact than those of other composite objects. As a typical composite object, PSSs vary in appearance and scale and are easily disturbed by complex environments, which brings huge challenges for detection. Furthermore, PSSs detection plays a crucial role in social development. PSSs are the foundation of national education and shoulder the responsibility of educating the country's future generations. Analyzing the spatial distribution of PSSs is of great significance for regional construction and economic evaluation. Therefore, PSSs detection is an important and meaningful task for applications of remote sensing images.
Anchor-free networks are simple and effective methods for object detection. In general, most existing object detectors adopt anchors for generating predicted boxes, which results in an excessive number of hyper-parameters that need to be carefully tuned before training. When applying these methods to new datasets, it is necessary to analyze the sizes and aspect ratios of the bounding boxes. Moreover, anchor-based detectors generate a large number of anchor boxes during training to achieve a high recall rate, which leads to an excessive number of negative samples. Anchor boxes also involve complicated computation, repeatedly calculating the intersection-over-union (IoU) between predicted boxes and ground-truth boxes [23]. Therefore, some research has investigated anchor-free networks [23,24,25,26,27]. Anchor-free detectors can be divided into two categories: methods based on key points, such as CenterNet [25], and methods based on dense boxes, such as FCOS [23]. CenterNet detects an object as a point and predicts the location offsets of the center points and the sizes of the bounding boxes. FCOS classifies each pixel on the feature map, inspired by semantic segmentation methods, and proposes a center-ness strategy for suppressing low-quality detected bounding boxes. However, original anchor-free methods such as FCOS cannot precisely detect composite objects in complex environments without stronger feature representation and a suitable constraint strategy for predicted boxes.
We previously proposed an attention-guided dense network (ADNet) [28] for PSSs detection. ADNet is an anchor-based method and requires setting hyper-parameters related to anchors. In contrast, anchor-free methods completely give up the guidance of anchors and can still achieve promising performance, which gives them more potential in object detection tasks. Therefore, we studied PSSs detection based on the anchor-free method.
In this article, we propose a feature enhanced network (FENet) based on FCOS for PSSs detection. The model captures more semantic information by modeling foreground objects with enhanced feature modules and optimizes the loss function to filter background information. By eliminating anchor boxes, the model avoids the hyper-parameters and complicated computation related to anchors. For PSSs detection, FENet enhances the model's ability to obtain critical information and improves localization accuracy. Meanwhile, a PSSs dataset is also presented for composite object detection, which collects 1685 images from GF images in the Beijing–Tianjin–Hebei region.
The main contributions of our work are summarized as follows:
  • We propose a feature enhanced network (FENet) for PSSs detection. The proposed method can improve the performance of PSSs detection in RSIs and effectively avoid the influence of negative samples. Compared with other object detection methods, our proposed method can locate the objects precisely without complex computation related to anchors. This simple anchor-free method also provides a new idea for object detection.
  • An enhanced feature module (EFM) is proposed to enlarge the receptive field in high-level layers and enhance discrimination of features. EFM contains two parts: one is a multi-scale local attention (MSLA) module for extracting multi-scale features, and the other is a channel attention-guided unit (CAU) for re-weighting features to obtain global attention and improve the semantic consistency among multiscale features. Through critical information extraction of high-level layers and further feature fusion, EFM can improve the classification and localization during PSSs detection.
  • A context-aware strategy and complete IoU (CIoU) loss are introduced to our network for further optimizing predicted bounding boxes. The context-aware strategy can make full use of foreground information and generate more positive samples. The CIoU loss considers the relationship between predicted boxes and ground-truth boxes in many cases and achieves faster regression of bounding boxes. These strategies are suitable for anchor-free methods and can effectively predict positive samples while ignoring the negative samples.
  • We build a PSSs dataset for composite object detection. This dataset is based on GF satellites with 2 m resolution and includes 1685 annotated images. The PSSs dataset provides a benchmark for future composite object detection.
The remainder of this study is organized as follows: Section 2 introduces the proposed method in detail, including the basic feature extractor, enhanced feature module, context-aware strategy, and multitask loss function. The experiments and results are presented and analyzed in Section 3. Section 4 discusses the results and the generalization ability of our method. Finally, the conclusions of this article are presented in Section 5.

2. Methods

Three key factors are considered for PSSs detection. First, the detector should achieve multi-scale object detection without additional prior anchors. Second, the detector should mine the critical information of targets and learn the powerful feature representation. Third, the detector needs to balance the background and foreground samples.
To address these issues, we propose a feature enhanced anchor-free network for PSSs detection in RSIs. The network architecture mainly consists of three parts: a backbone, enhanced feature layers, and a detection head. An enhanced feature module (EFM) is designed to improve the discrimination ability of the model. A context-aware strategy is introduced to suppress more negative samples. Additionally, focal loss is used to balance the positive and negative samples, and CIoU loss is used to further optimize model training.

2.1. Network Architecture

Figure 2 shows the proposed network architecture and its dataflow. The proposed network adopts an FCOS-based encoder–decoder architecture. Given an input image, we first feed it into the feature extractor, ResNet-101, and obtain four feature maps {C2, C3, C4, C5}. Details of the backbone outputs are shown in Table 1.
Subsequently, two flows proceed in parallel. In one flow, the output of conv5 is fed into the enhanced feature module (EFM) to improve the feature representation. The proposed EFM contains two key components, the multi-scale local attention (MSLA) module and the channel attention-guided unit (CAU), which promote discrimination between objects and background.
In the other flow, the feature pyramid layers {P3, P4, P5, P6, P7} with 256 channels are generated by the original FPN. The layers P3 and P4 are generated by lateral connections and a top-down pathway. The layers P6 and P7 are generated by successive down-sampling convolution layers.
After that, the two flows converge: the enhanced feature map X output from EFM joins all feature pyramid layers, each followed by a 1 × 1 convolution, to generate the final enhanced feature layers {E1, E2, E3, E4, E5}. This fusion is important for feature reconstruction. Additionally, a context-aware strategy is used for better suppressing negative samples and low-quality predicted bounding boxes. Finally, focal loss is used for the classification branch, while CIoU loss and center-ness loss are used for bounding box regression.
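To make the convergence step concrete, the sketch below shows one plausible way to combine the EFM output X (described in Section 2.2) with each pyramid level. The resize-then-add fusion and the channel projection are assumptions; the paper only states that X joins every level before a 1 × 1 convolution.

```python
import torch.nn as nn
import torch.nn.functional as F

class PyramidEnhancer(nn.Module):
    """Fuse the EFM output X with each FPN level to produce E1..E5 (sketch)."""
    def __init__(self, x_channels=2048, pyr_channels=256, num_levels=5):
        super().__init__()
        self.proj = nn.Conv2d(x_channels, pyr_channels, 1)  # match channel counts
        self.out_convs = nn.ModuleList(
            nn.Conv2d(pyr_channels, pyr_channels, 1) for _ in range(num_levels))

    def forward(self, x, pyramid):
        # pyramid: [P3, P4, P5, P6, P7], each with 256 channels.
        x = self.proj(x)
        enhanced = []
        for p, conv in zip(pyramid, self.out_convs):
            # Resize X to this level's resolution, add it, then apply a 1x1 conv.
            x_i = F.interpolate(x, size=p.shape[-2:], mode='bilinear',
                                align_corners=False)
            enhanced.append(conv(p + x_i))
        return enhanced  # [E1, E2, E3, E4, E5]
```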

2.2. Enhanced Feature Module

Feature enhancement is essential for object detection. In recent years, deep CNNs have been extensively used in image recognition. A deep convolutional network can learn the semantic information of an image, which helps distinguish the objects in it. With the outstanding breakthroughs of deep learning algorithms, many scholars have explored feature enhancement methods to improve the feature representation ability of CNNs. One efficient feature enhancement method is multi-scale feature fusion. High-level features generated by deep CNNs carry stronger semantic information, while shallow features contain more edge and contour information, which is conducive to localization. Usually, only the last feature layer of a CNN is used for detection, so the information in the top and bottom layers is not fully exploited. Therefore, how to better integrate high-level semantic features and low-level spatial information is the focus of feature enhancement methods. The feature pyramid network (FPN) [9] is a classical network for multi-scale feature enhancement. The goal of FPN is to create a feature pyramid structure that integrates low-level and high-level layers through a top-down path; conversely, a bottom-up path transfers low-level details from shallow layers to deep layers. PANet [29] enhances the entire feature expression with accurate localization information from lower layers through bottom-up path augmentation, shortening the information path between low-level and top-level layers. In addition, Cheng et al. [14] proposed a cross-layer network to further aggregate features across top and bottom layers.
The attention mechanism [15], another novel feature enhancement tool in deep learning, has been widely used in object detection [30,31,32]. Its principle is to generate a new layer of weights that identifies the key features in the image data. Classical attention methods include SENet [30], CBAM [33], SKNet [34], and GCNet [31]. However, most of these attention modules are applied on each layer of the backbone, which generates a large computation requirement. In our method, we introduce channel attention into our proposed enhanced feature module to re-weight the output of multi-level local features.
Generally, the object to be detected occupies only part of the whole image, while most of the rest is background. Thus, extracting effective information while ignoring background areas is crucial for object detection. In addition, high-level feature maps only contain single-scale context information and do not benefit from different receptive fields. To tackle these two problems, we use an enhanced feature module (EFM) to improve the feature discrimination between objects and the background. EFM contains two components, a multi-scale local attention (MSLA) module and a channel attention-guided unit (CAU). The MSLA module operates on the top-level feature map C5, which not only avoids much computation but also captures rich context information from different receptive fields by using multi-scale dilated convolution layers. Then, the CAU re-weights the channel features and explores the interdependence and correlation of the multi-scale features.
As shown in Figure 3, EFM includes two parts: the MSLA module, used for capturing multi-scale local features from the high-level feature map, and the CAU, used for reducing the aliasing effects caused by multi-scale information and mining the interdependence among multi-level features. Given the input deep features $C_5 \in \mathbb{R}^{C \times H \times W}$, the output of MSLA is $F \in \mathbb{R}^{C \times H \times W}$, and the output of EFM is $X \in \mathbb{R}^{C \times H \times W}$. The process of EFM can be defined as:

$$X = \mathrm{Conv}_{1 \times 1}\left(L_i \otimes \mathrm{CA}(F)\right) \oplus C_5$$

where $L_i$ denotes the $i$-th layer of MSLA, $\mathrm{CA}$ denotes the process of CAU, $\mathrm{Conv}_{1 \times 1}$ denotes a 1 × 1 convolution layer, $\otimes$ denotes element-wise multiplication, and $\oplus$ denotes element-wise summation.
The context information obtained by a single convolution kernel is limited. Dilated convolution [35] can expand the receptive field of the convolution kernel. Thus, we apply dilated convolutions to generate multi-scale spatial attention and obtain a local attention description by re-encoding the critical information of the feature maps. The MSLA module consists of five parallel branches, namely a 1 × 1 convolution layer $L_1$, three parallel dilated convolution layers $\{L_2, L_3, L_4\}$ with dilation rates of $\{3, 6, 12\}$, and a global average pooling layer $L_5$. The 1 × 1 convolution and the global average pooling are used for achieving cross-channel information integration and generating a global feature description, respectively. The process of MSLA is defined as:

$$F = \mathrm{concat}\left(L_1(C_5),\, L_2(C_5),\, L_3(C_5),\, L_4(C_5),\, L_5(C_5)\right)$$

$$L_5(C_5) = \mathrm{UpSampling}\left(\mathrm{Conv}_{1 \times 1}(\mathrm{AVP}(C_5))\right)$$

where $\mathrm{Conv}_{1 \times 1}$ represents the convolution operation and $\mathrm{AVP}$ represents global average pooling.
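A minimal PyTorch sketch of the MSLA branches follows. The dilation rates {3, 6, 12}, the 1 × 1 branch, and the pooled global branch follow the description above; the branch output width of 256 channels is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSLA(nn.Module):
    """Multi-scale local attention: five parallel branches over C5 (sketch)."""
    def __init__(self, in_channels=2048, out_channels=256):
        super().__init__()
        self.l1 = nn.Conv2d(in_channels, out_channels, 1)                          # L1
        self.l2 = nn.Conv2d(in_channels, out_channels, 3, padding=3, dilation=3)   # L2
        self.l3 = nn.Conv2d(in_channels, out_channels, 3, padding=6, dilation=6)   # L3
        self.l4 = nn.Conv2d(in_channels, out_channels, 3, padding=12, dilation=12) # L4
        self.l5_conv = nn.Conv2d(in_channels, out_channels, 1)                     # L5 projection

    def forward(self, c5):
        h, w = c5.shape[-2:]
        # L5: global average pool to 1x1, project, then upsample back to (h, w).
        l5 = F.interpolate(self.l5_conv(F.adaptive_avg_pool2d(c5, 1)),
                           size=(h, w), mode='bilinear', align_corners=False)
        # Concatenate the five branches along the channel dimension.
        return torch.cat([self.l1(c5), self.l2(c5), self.l3(c5),
                          self.l4(c5), l5], dim=1)
```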
Subsequently, a global average pooling and a global max pooling are applied independently on the concatenated multi-scale features $F$ to aggregate two different forms of context information. Then, the global average pooling descriptor $D_{gap}$ and the global max pooling descriptor $D_{gmp}$ undergo squeeze and expansion to learn the interdependence among the multi-scale features. The squeeze and expansion operations are inspired by SENet [30] and retain the essential information of the multi-scale feature maps. Next, $D_{gap}$ and $D_{gmp}$ pass through two convolution layers, followed by element-wise summation and the sigmoid function. The CAU is computed as:

$$\mathrm{CA} = \mathrm{sigmoid}\left(\mathrm{Conv}_2\left(\delta(\mathrm{Conv}_1(D_{gap}(F)))\right) \oplus \mathrm{Conv}_2\left(\delta(\mathrm{Conv}_1(D_{gmp}(F)))\right)\right)$$

where $\delta(\cdot)$ represents the ReLU activation function, and $\mathrm{Conv}_1$ and $\mathrm{Conv}_2$ represent 1 × 1 convolution layers for channel squeeze and expansion, respectively.
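The CAU can be sketched as below. Since the equation reuses the same symbols for both pooling branches, shared Conv1/Conv2 weights are assumed; the squeeze ratio of 16 is also an assumption not fixed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAU(nn.Module):
    """Channel attention-guided unit over GAP and GMP descriptors (sketch)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // reduction, 1)  # channel squeeze
        self.conv2 = nn.Conv2d(channels // reduction, channels, 1)  # channel expansion

    def forward(self, f):
        d_gap = F.adaptive_avg_pool2d(f, 1)   # global average pooling descriptor
        d_gmp = F.adaptive_max_pool2d(f, 1)   # global max pooling descriptor
        ca = (self.conv2(F.relu(self.conv1(d_gap))) +
              self.conv2(F.relu(self.conv1(d_gmp))))
        return torch.sigmoid(ca)              # per-channel attention weights
```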
Finally, the enhanced feature map X is assigned to the feature pyramid layers {P3, P4, P5, P6, P7} in Figure 2 to improve the feature representation of each layer; a combined sketch of the module follows.
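Putting the two components together per the EFM equation above, a hedged assembly might look like this (reusing the MSLA and CAU sketches; the 1 × 1 fusion conv projects the concatenated branches back to the channel width of C5 so the residual sum is well defined):

```python
import torch.nn as nn

class EFM(nn.Module):
    """Enhanced feature module: X = Conv1x1(F * CA(F)) + C5 (sketch)."""
    def __init__(self, in_channels=2048, branch_channels=256):
        super().__init__()
        self.msla = MSLA(in_channels, branch_channels)
        self.cau = CAU(branch_channels * 5)                 # 5 concatenated branches
        self.fuse = nn.Conv2d(branch_channels * 5, in_channels, 1)

    def forward(self, c5):
        f = self.msla(c5)               # multi-scale local features F
        x = self.fuse(f * self.cau(f))  # channel re-weighting, then 1x1 fusion
        return x + c5                   # element-wise summation with C5
```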

2.3. Context-Aware Strategy

FCOS proposed an effective strategy for suppressing low-quality predicted bounding boxes, called center-ness. By calculating the degree to which each pixel deviates from the center of the ground-truth box, the predicted bounding boxes far from the center can be filtered. In FCOS, all proposal points inside the ground-truth boxes are regarded as positive samples.
A location $(x, y)$ is defined as a positive point if it lies within any ground-truth box. Here, $\mathit{left}$, $\mathit{right}$, $\mathit{top}$, and $\mathit{bottom}$ refer to the distances from the positive point to the four edges of the ground-truth bounding box. The center-ness is defined as:

$$\mathrm{Centerness} = \sqrt{\frac{\min(\mathit{left}, \mathit{right})}{\max(\mathit{left}, \mathit{right})} \times \frac{\min(\mathit{top}, \mathit{bottom})}{\max(\mathit{top}, \mathit{bottom})}}$$
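For reference, the center-ness target is a direct transcription of this equation; the tensors are assumed to hold the four regression distances for each positive location:

```python
import torch

def centerness_target(left, right, top, bottom):
    """Center-ness of each positive location from its distances to the box edges."""
    lr = torch.min(left, right) / torch.max(left, right)
    tb = torch.min(top, bottom) / torch.max(top, bottom)
    return torch.sqrt(lr * tb)
```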
However, the edges of bounding boxes usually contain background information, and points on the edges of bounding boxes usually correspond to negative samples. Building on the center-ness branch, we use a context-aware strategy that restricts the range of positive samples. As shown in Figure 4, the dashed yellow rectangle represents the bounding box, named the ground-truth box, and the red circle represents the context region, named the context-aware region. Taking the center point of the ground-truth box as the center, a circular region with a given radius is defined, and only the points falling into this circular region are regarded as positive samples. Then, the model calculates the center-ness of each point to constrain the predicted bounding boxes to be close to the center of the ground-truth box. The circular region filters out negative samples to improve the discrimination ability of the model.
Each layer $E_i$ is used to detect objects at different scales. According to the sizes of the bounding boxes, objects are assigned to the corresponding $E_i$ for detection, and the radius multiplied by the stride of the layer gives the true radius at that layer. Due to the multi-scale feature pyramid structure, each layer therefore has a different area for filtering negative samples:

$$\mathrm{area} = \pi \times (\mathit{radius} \times \mathit{stride})^2$$
In our experiments, the radius is set to 2.0, and the strides are {8, 16, 32, 64, 128}. In Section 3.5, we conduct experiments to validate the effectiveness of the context-aware strategy under different radii.
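A sketch of the resulting positive-sample selection is shown below; the box format (x1, y1, x2, y2) and the tensor shapes are assumptions.

```python
import torch

def context_aware_positive_mask(points, gt_boxes, radius=2.0, stride=8):
    """Mark locations within radius * stride of a ground-truth center as positive.

    points:   (N, 2) feature-map locations mapped back to image coordinates.
    gt_boxes: (M, 4) ground-truth boxes as (x1, y1, x2, y2).
    Returns a boolean (N, M) mask of positive point/box pairs.
    """
    centers = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2  # (M, 2) box centers
    dists = torch.cdist(points, centers)               # (N, M) Euclidean distances
    return dists <= radius * stride                    # circular context-aware region
```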

2.4. Multitask Loss Function

We use a multitask loss function $\mathrm{Loss}_{total}$ to train FENet. The total loss is defined as a weighted sum of the classification loss ($\mathrm{Loss}_{cls}$), the localization loss ($\mathrm{Loss}_{loc}$), and the center-ness loss ($\mathrm{Loss}_{centerness}$):

$$\mathrm{Loss}_{total} = \lambda_1 \mathrm{Loss}_{cls} + \lambda_2 \mathrm{Loss}_{loc} + \lambda_3 \mathrm{Loss}_{centerness}$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the balance parameters between the task losses. In this study, we set $\lambda_1 = \lambda_2 = \lambda_3 = 1.0$. The center-ness loss $\mathrm{Loss}_{centerness}$ is the cross-entropy loss used in FCOS.
The classification loss $\mathrm{Loss}_{cls}$ is the focal loss [36], defined as:

$$\mathrm{Loss}_{cls}(p) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t),$$

$$p_t = \begin{cases} p & \text{if } \mathit{label} = 1 \\ 1 - p & \text{otherwise} \end{cases}$$

where $\alpha_t \in [0, 1]$ is a balance factor between positive and negative samples, and $\gamma \in [0, 5]$ is a focusing parameter for easy and hard samples.
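A binary focal loss matching these equations might be sketched as follows. The values α = 0.25 and γ = 2.0 are the common defaults from the focal loss paper and are assumptions here, since this paper only gives the admissible ranges.

```python
import torch

def focal_loss(pred_logits, targets, alpha=0.25, gamma=2.0, eps=1e-8):
    """Focal loss for the classification branch (sketch of the equations above)."""
    p = torch.sigmoid(pred_logits)
    p_t = torch.where(targets == 1, p, 1 - p)  # p if label = 1, else 1 - p
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    # Down-weight easy samples via the (1 - p_t)^gamma modulating factor.
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=eps))).mean()
```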
IoU loss is commonly used for bounding box regression, but it cannot accurately reflect the degree of overlap between the predicted box and the ground-truth box [23]. As shown in Figure 5a,b, the same IoU value can correspond to detections of very different validity. Complete-IoU (CIoU) loss can indicate the relationship between the ground-truth and predicted boxes by also calculating the distance between the two box centers [37]. Figure 5c,d show a situation in which an incorrectly predicted box results in a smaller loss. The red box represents the predicted box $b_{pred}$, and the green box represents the ground-truth box $b_{gt}$; $d$ denotes the distance between the centers of $b_{pred}$ and $b_{gt}$, and $c$ denotes the diagonal length of the smallest enclosing box that covers $b_{pred}$ and $b_{gt}$. When $d$ is unchanged, the ratio of $d$ to $c$ in Figure 5c is higher than that in Figure 5d, which means that the predicted box in Figure 5d is better than that in Figure 5c. Obviously, a model using only IoU cannot recognize the optimal predicted box in this case.
CIoU loss additionally evaluates the quality of the predicted box according to the aspect ratio. Therefore, we use CIoU loss for bounding box regression, which takes three geometric factors into consideration simultaneously:

$$\mathrm{Loss}_{loc} = 1 - IoU + \frac{\rho^2(\mathrm{center}_{b_{pred}}, \mathrm{center}_{b_{gt}})}{c^2} + \alpha \nu,$$

$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^2,$$

$$\alpha = \frac{\nu}{(1 - IoU) + \nu}$$

where $\mathrm{center}_{b_{pred}}$ and $\mathrm{center}_{b_{gt}}$ denote the center points of the predicted and ground-truth boxes, respectively, $\rho(\cdot)$ denotes the Euclidean distance between the two points, $\nu$ measures the consistency of the box aspect ratios, and $\alpha$ is a trade-off parameter.
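The localization loss can be transcribed directly from the three equations above; a self-contained sketch for (x1, y1, x2, y2) boxes, with the box format as an assumption:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    """Complete-IoU loss: 1 - IoU + d^2 / c^2 + alpha * v (sketch)."""
    # IoU term.
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # Normalized center distance: rho^2 / c^2 with the enclosing-box diagonal.
    d2 = ((pred[:, :2] + pred[:, 2:]) / 2 - (gt[:, :2] + gt[:, 2:]) / 2).pow(2).sum(dim=1)
    enc_lt = torch.min(pred[:, :2], gt[:, :2])
    enc_rb = torch.max(pred[:, 2:], gt[:, 2:])
    c2 = (enc_rb - enc_lt).pow(2).sum(dim=1) + eps

    # Aspect-ratio consistency v and trade-off weight alpha.
    w_p = pred[:, 2] - pred[:, 0]
    h_p = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    w_g = gt[:, 2] - gt[:, 0]
    h_g = (gt[:, 3] - gt[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_g / h_g) - torch.atan(w_p / h_p)).pow(2)
    with torch.no_grad():
        alpha = v / ((1 - iou) + v + eps)
    return (1 - iou + d2 / c2 + alpha * v).mean()
```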

3. Experiments and Results

3.1. Datasets

Existing publicly available remote sensing datasets cover limited categories of objects. Specifically, GF satellites are of great significance to RSI research in China, but to the best of our knowledge there are few annotated GF datasets of PSSs.
We therefore constructed a primary-and-secondary-schools dataset of GF images for object detection, with a resolution of 2 m. The GF satellite slice images used in our study are the fused data of GF-1 and GF-6 from the Strategic Priority Research Program of the Chinese Academy of Sciences. The crop size was set to 512 × 512 pixels, and samples were extracted from the GF slice images in the Beijing–Tianjin–Hebei region, yielding 1685 images in total. The split ratios of the training, validation, and test sets are 70%, 20%, and 10%, respectively. The objects are annotated with horizontal bounding boxes. To improve the generalization ability of the model, we used three augmentation methods, namely color change, flipping, and rotation, to extend the training samples. The study area is shown in Figure 6.

3.2. Evaluation Metrics

We evaluated our proposed method using the average precision (AP), i.e., the area under the precision–recall curve:

$$AP = \int_0^1 \mathrm{precision}(\mathrm{recall}) \, d(\mathrm{recall})$$

where precision and recall are defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
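A simple all-point AP computation consistent with these definitions is sketched below in NumPy; the IoU-matching step that labels each detection as a true or false positive is assumed to have been done beforehand.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one class (sketch).

    scores: (D,) confidence of each detection; is_tp: (D,) 1 if the detection
    matches a ground truth at IoU >= 0.5, else 0; num_gt: number of ground truths.
    """
    order = np.argsort(-scores)            # rank detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(1 - is_tp[order])
    recall = tp / num_gt
    precision = tp / np.maximum(tp + fp, 1e-8)
    # Make precision monotonically non-increasing before integrating.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    return np.trapz(precision, recall)
```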

3.3. Implementation Details

All experiments were implemented in PyTorch, and the network was trained and tested on an NVIDIA Tesla P100 GPU. We adopted ResNet-101 pretrained on the ImageNet dataset as the backbone network. In our experiments, stochastic gradient descent (SGD) was used as the optimizer, with a momentum of 0.9 and a weight decay of 0.0005. The batch size was set to 1, the initial learning rate was set to 0.001, and the number of training epochs was set to 12. The learning rate was dropped by a factor of 10 after epochs 8 and 11.
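This setup translates directly into PyTorch; in the sketch below, `model` and `train_loader` are assumed placeholders, and the loss-returning forward signature is an assumption about the training wrapper.

```python
import torch

# Hyper-parameters from Section 3.3: SGD with momentum 0.9, weight decay 5e-4,
# initial lr 0.001 dropped by 10x after epochs 8 and 11, 12 epochs in total.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[8, 11], gamma=0.1)

for epoch in range(12):
    for images, targets in train_loader:  # batch size 1, as in the paper
        loss = model(images, targets)     # assumed: returns the total multitask loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```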

3.4. Ablation Studies on Different Structures

We conducted comparative experiments on the test set to analyze the effect of each component of FENet. The overall ablation studies are reported in Table 2. In the experiments, we used FCOS as the baseline and ResNet-101 as the backbone, and we gradually added the EFM, the context-aware strategy, and the CIoU loss to the baseline, where the baseline is the original FCOS architecture without any of our components. The last row of Table 2 shows that FENet improves AP by 4.7% compared with the baseline. The improvement brought by the components demonstrates the effectiveness of our proposed method.
Baseline-1 in the second row of Table 2 represents the detection performance of the FCOS detector with EFM. It shows that the proposed EFM with the integration map X improves the AP by about 2.2%; therefore, multi-scale feature enhancement is significant for PSSs detection. To intuitively illustrate the effects of EFM, we applied Grad-CAM [38] to the enhanced feature maps output by FENet. Figure 7 shows that the information extracted by EFM becomes more abundant and accurate: the enhanced maps highlight the locations of objects in the feature maps. Our model can substantially improve the discrimination ability of the feature maps and suppress redundant context information. In our feature pyramid, each layer is used for predicting objects at a different scale. Figure 7b–d presents the heatmaps of E3, E4, and E5 of FENet. It can be seen that the lower layers obtain more attention when predicting smaller objects, and the higher layers obtain more effective information when predicting larger objects. The detection performance is significantly improved by assigning objects of different sizes to different feature pyramid layers.
Next, we implemented the context-aware strategy on the baseline. Baseline-2 in the third row of Table 2 shows that the context-aware strategy increases the AP by 1.9%. The ablation studies illustrate that the context-aware strategy can capture effective regions and locate objects more precisely.
Additionally, we conducted experiments with different IoU loss functions. Baseline-3 in the fourth row of Table 2 shows that the CIoU loss increases the AP by 1.4%. The original bounding box regression uses IoU loss, which only works well when the predicted box overlaps the target box [13]. To further evaluate the performance of CIoU loss, we applied IoU loss, generalized-IoU (GIoU) loss [39], and CIoU loss [37] to FENet. Table 3 shows that the proposed network reaches the best performance when the CIoU loss function is adopted: compared with IoU loss and GIoU loss, the AP of the model using CIoU loss improves by 2.2% and 0.8%, respectively. These results indicate that CIoU loss is more conducive to model training.

3.5. The Experiments on Different Radii in the Context-Aware Strategy

We conducted experiments to analyze the performance of FENet using different radii in the context-aware strategy. As shown in Table 4, the proposed method achieves the best accuracy when the radius is set to 2.0. The last column represents the detection result when using all pixels in the ground-truth box. When the radius is set to less than 2, the detection accuracy is lower, which indicates that positive-sample information is lost if the effective area is too small. When the radius is set too high, the detection accuracy also decreases, meaning that a larger region introduces interference from the background. Considering the target size and the experimental results, we set the radius to 2.0.

3.6. Comparison to State-of-the-Art Methods

We compared our proposed method with anchor-based methods (FPN [9], RetinaNet [36], and ADNet [28]) and anchor-free methods (CenterNet [25] and FCOS [23]) on the same training set, as shown in Table 5. All methods were implemented in PyTorch. As can be seen from Table 5, ADNet achieves the best AP among the anchor-based methods, 79.2%, higher by 6.8% and 6.0% than RetinaNet and FPN, while FENet achieves the best AP among the anchor-free methods, 78.7%, higher by 6.6% and 4.7% than CenterNet and FCOS. RetinaNet and FPN are effective anchor-based methods. RetinaNet uses the focal loss function to eliminate class imbalance and enhance feature learning on difficult samples; however, it cannot learn the critical features of composite objects and in some cases cannot effectively distinguish positive from negative samples, resulting in lower accuracy. FPN only uses a basic CNN, which cannot solve the problem of composite object detection, and it needs suitable anchors to be set for detection. CenterNet only uses a center point to represent the object and takes a long time to train. These methods need stronger representation ability for composite objects. In particular, we compared FENet with ADNet on the same training dataset. The experiments show that the AP obtained by FENet is close to that of ADNet, which indicates that FENet can achieve acceptable AP without prior knowledge about anchors. ADNet requires more pre-processing of the data during training, which results in a huge amount of work; it also requires statistical analysis of the bounding boxes when the model is trained on a new dataset, which limits the extension and application of the model. Although the accuracy of FENet is slightly lower, it is more flexible in bounding box regression and does not require extra data processing before training. The detection results demonstrate that FENet is superior to the typical object detectors.
Figure 8 presents some detection results of a typical anchor-based method and an anchor-free method on the test set. Figure 8a–c shows the results of FPN, (d–f) shows the detection results of FCOS, and (g–i) shows the detection results of FENet. Comparing (a,d) and (b,e) in Figure 8, we can see that FPN and FCOS detect some small buildings containing a vacant lot as PSSs, indicating that these methods cannot model the critical features of PSSs, which leads to misjudgments. In addition, these methods still fail to distinguish foreground from background in rural areas, as shown in Figure 8c,f. The results indicate that FPN and FCOS do not perform well and cannot correctly detect objects in complex backgrounds: they cannot distinguish PSSs from small open spaces and clustered buildings. After adding the EFM and a suitable bounding box regression strategy, FENet can detect the whole object effectively.
The experimental results convincingly prove that FENet can locate PSSs precisely by capturing the critical features and excluding false positives. Compared with anchor-based methods, FENet does not require complex design of prior anchors and avoids complex computation regarding anchor boxes. Additionally, FENet can achieve pixel-level detection and can be extended to semantic recognition in the future.

4. Discussion

In our experiments, we obtained more accurate results on PSSs using a feature-enhanced anchor-free method. With the score threshold and IoU threshold in FENet set to 0.01 and 0.5, the proposed model achieves a 94.4% recall rate and a 79.5% precision rate.
During detection, the model outputs a classification confidence (0.1–1.0) for each predicted box. Precision and recall are affected by the score threshold: a high score threshold can filter out negative samples and yield high precision, but the recall rate is then lower because the number of detected positive samples decreases correspondingly. Compared with anchor-based methods, our proposed method generates fewer negative samples during prediction, which means it can use a relatively low score threshold to obtain a high recall rate and more positive samples. In future research, composite object detection should pay more attention to the setting of thresholds to better evaluate model performance.
Due to the complex environments in RSIs, the appearance of PSSs varies across regions. Based on the experiments above, we also applied the trained model to a dataset of the Huadong region to verify the robustness and generalization ability of FENet. We collected 705 images of the Huadong region, each 512 × 512 pixels, from the same GF data used in our study. Figure 9 shows the detection results of our proposed method on the Huadong dataset. It can be seen that our method can detect PSSs with small sizes, unclear features, and complex backgrounds.
The experimental results demonstrate that FENet can effectively improve discriminative ability and successfully weaken the influence of complex background information. Meanwhile, our proposed method can effectively balance positive and negative samples. Compared with anchor-based methods, FENet can achieve significant performance without pre-designed anchors.
Our study indicates that the anchor-free method can tackle the problems of complex objects in RSIs and provides a new idea for object detection in RSIs. Feature enhancement of CNNs and precise boundary detection are important for PSSs detection. However, there are still some problems with PSSs detection. Figure 10 shows some failure cases of FENet, where the ground truth, detection results, false positives, and false negatives are marked with green, red, blue, and orange rectangles, respectively. It is still challenging for our method to deal with small objects with unclear features in the PSSs dataset. In addition, some ground objects have characteristics similar to PSSs, which may cause false positives. For future work, it is important to use higher-resolution remote sensing images and further improve the distinguishing ability of the model.

5. Conclusions

We proposed a feature enhanced network (FENet) based on an anchor-free framework. Compared with typical object detection networks, our method can effectively separate foreground from background and suppress negative samples. FENet applies a new attention module to the top-level feature map and integrates multi-scale features, making better use of high-level semantic information to enhance the discrimination of features. Moreover, an effective regression strategy is used to optimize the predicted bounding boxes and achieve more accurate localization. Compared with the anchor-based method for PSSs detection, our proposed method achieves competitive performance while avoiding the complex computation related to anchors and artificial prior knowledge. To validate the effect of FENet, we created a PSSs dataset in the Beijing–Tianjin–Hebei region. Experimental results demonstrated that FENet exhibits better performance for PSSs detection than other methods.

Author Contributions

Methodology, H.F., X.F., Z.Y., X.D. and H.J.; Z.Y. and X.D. contributed to the conception of the study, and performed the analysis with constructive discussion; H.F. and C.X. processed the data; H.F. performed the experiments and wrote the original manuscript; then the manuscript was reviewed and edited by Z.Y. and X.D.; funding acquisition, X.F., Z.Y. and X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences, grant number XDA 19080101, XDA 19080103; the National Natural Science Foundation of China, grant number 41974108; Innovation Drive Development Special Project of Guangxi, grant number GuikeAA20302022.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset presented in this study will be publicly available at https://github.com/AIRCAS-FU accessed on 30 June 2022.

Acknowledgments

The authors are grateful for the anonymous reviewers’ critical comments and constructive suggestions. The authors would also like to thank the developers in MMDetection for their open source deep learning framework.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-Class Geospatial Object Detection and Geographic Image Classification Based on Collection of Part Detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
  2. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
  3. Chen, S.; Zhan, R.; Zhang, J. Geospatial Object Detection in Remote Sensing Imagery Based on Multiscale Single-Shot Detector with Activated Semantics. Remote Sens. 2018, 10, 820.
  4. Chen, Z.; Zhang, T.; Ouyang, C. End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images. Remote Sens. 2018, 10, 139.
  5. Pang, J.; Li, C.; Shi, J.; Xu, Z.; Feng, H. R2-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5512–5524.
  6. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
  7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
  8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  9. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
  10. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  11. Zhang, X.; Zhu, K.; Chen, G.; Tan, X.; Zhang, L.; Dai, F.; Liao, P.; Gong, Y. Geospatial Object Detection on High Resolution Remote Sensing Imagery Based on Double Multi-Scale Feature Pyramid Network. Remote Sens. 2019, 11, 755.
  12. Zhu, M.; Xu, Y.; Ma, S.; Li, S.; Ma, H.; Han, Y. Effective Airplane Detection in Remote Sensing Images Based on Multilayer Feature Fusion and Improved Nonmaximal Suppression Algorithm. Remote Sens. 2019, 11, 1062.
  13. Zhuang, S.; Wang, P.; Jiang, B.; Wang, G.; Wang, C. A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection. Remote Sens. 2019, 11, 594.
  14. Cheng, G.; Si, Y.; Hong, H.; Yao, X.; Guo, L. Cross-Scale Feature Fusion for Object Detection in Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1–5.
  15. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention Mechanisms in Computer Vision: A Survey. arXiv 2021, arXiv:2111.07624.
  16. Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024.
  17. Chen, J.; Wan, L.; Zhu, J.; Xu, G.; Deng, M. Multi-Scale Spatial and Channel-Wise Attention for Improving Object Detection in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2020, 17, 681–685.
  18. Dong, R.; Jiao, L.; Zhang, Y.; Zhao, J.; Shen, W. A Multi-Scale Spatial Attention Region Proposal Network for High-Resolution Optical Remote Sensing Imagery. Remote Sens. 2021, 13, 3362.
  19. Cai, B.; Jiang, Z.; Zhang, H.; Zhao, D.; Yao, Y. Airport Detection Using End-to-End Convolutional Neural Network with Hard Example Mining. Remote Sens. 2017, 9, 1198.
  20. Li, S.; Xu, Y.; Zhu, M.; Ma, S.; Tang, H. Remote Sensing Airport Detection Based on End-to-End Deep Transferable Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1–5.
  21. Xu, Y.; Zhu, M.; Li, S.; Feng, H.; Ma, S.; Che, J. End-to-End Airport Detection in Remote Sensing Images Combining Cascade Region Proposal Networks and Multi-Threshold Detection Networks. Remote Sens. 2018, 10, 1516.
  22. Sun, X.; Wang, P.; Wang, C.; Liu, Y.; Fu, K. PBNet: Part-Based Convolutional Neural Network for Complex Composite Object Detection in Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 173, 50–65.
  23. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. arXiv 2019, arXiv:1904.01355.
  24. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. arXiv 2019, arXiv:1808.01244.
  25. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189.
  26. Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point Set Representation for Object Detection. arXiv 2019, arXiv:1904.11490.
  27. Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. FoveaBox: Beyond Anchor-Based Object Detector. IEEE Trans. Image Process. 2020, 29, 7389–7398.
  28. Fu, H.; Fan, X.; Yan, Z.; Du, X. Detection of Schools in Remote Sensing Images Based on Attention-Guided Dense Network. ISPRS Int. J. Geo-Inf. 2021, 10, 736.
  29. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534.
  30. Hu, J.; Shen, L.; Sun, G.; Albanie, S. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  31. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; pp. 1971–1980.
  32. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
  33. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
  34. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. arXiv 2019, arXiv:1903.06586.
  35. Yu, F.; Koltun, V.; Funkhouser, T. Dilated Residual Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 636–644.
  36. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2018, arXiv:1708.02002.
  37. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019, arXiv:1911.08287.
  38. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
  39. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. arXiv 2019, arXiv:1902.09630.
Figure 1. Samples of PSSs in different regions. The first four images show PSSs in urban areas; the last two images show PSSs in remote regions.
Figure 2. The architecture of the proposed method, which is based on FCOS. The multi-scale enhanced layers are generated by the enhanced feature module and multi-scale feature fusion. For bounding box regression, a context-aware strategy and CIoU loss are introduced to achieve better performance.
Figure 3. The architecture of EFM. The EFM consists of two parts: a multi-scale local attention module and a channel attention-guided unit.
Figure 4. Context-aware strategy. Yellow box: ground-truth. Red circle: context-aware region. Black boxes: possible proposals without context-aware strategy.
Figure 5. Illustration of CIoU loss. CIoU loss takes the IoU, distance of centers, and aspect ratio into consideration and is conducive to model training. (a,b) indicate that the relationship of ground-truth box and predicted box in different cases can be evaluated by calculating the distance of the two box centers; (c,d) illustrate that the detection results can be evaluated by the aspect ratio when the distance of centers is unchangeable.
Figure 6. Display of study area: (a) the study area in China; (b) the Beijing–Tianjin–Hebei region; (c) the datasets used in our study. Note: the secondary and primary schools are plotted by the red circles and blue triangles, respectively.
Figure 7. The effect of EFM shown through visualized attention maps: (a) examples of the input images; (b–d) the heatmaps of E3, E4, and E5 for the corresponding images, respectively.
Figure 8. Results of different methods on the test set: (a–c) detection results of the FPN detector; (d–f) detection results of FCOS; (g–i) detection results of our proposed method. The ground-truth boxes are plotted in green, and the detection results of the proposed method are plotted in red.
Figure 9. Visualization of the detection results of Huadong region. The ground truth boxes are plotted in green and the detection results of the proposed method are plotted in red.
Figure 10. Some detection results of the proposed method on the test set. The false positives and false negatives are plotted in blue and orange, respectively.
Table 1. The architecture of the backbone.

Stage | Backbone                                            | Output
C2    | [1 × 1 conv, 64; 3 × 3 conv, 64; 1 × 1 conv, 256] × 3     | 128 × 128, 256
C3    | [1 × 1 conv, 128; 3 × 3 conv, 128; 1 × 1 conv, 512] × 4   | 64 × 64, 512
C4    | [1 × 1 conv, 256; 3 × 3 conv, 256; 1 × 1 conv, 1024] × 23 | 32 × 32, 1024
C5    | [1 × 1 conv, 512; 3 × 3 conv, 512; 1 × 1 conv, 2048] × 3  | 16 × 16, 2048
Table 2. The performance of different components of our proposed method.

Method     | EFM | Context-Aware Strategy | CIoU Loss | AP (%)
Baseline   |     |                        |           | 74.0
Baseline-1 | ✓   |                        |           | 76.2
Baseline-2 |     | ✓                      |           | 75.9
Baseline-3 |     |                        | ✓         | 75.4
FENet      | ✓   | ✓                      | ✓         | 78.7
Table 3. Comparison of detection performance among different loss functions for bounding box regression.

Method  | Regression | AP (%)
FENet-1 | IoU        | 76.5
FENet-2 | GIoU       | 77.9
FENet-3 | CIoU       | 78.7
Table 4. Model performance using different radii in the context-aware strategy.

Radius | 0.5  | 1.0  | 1.5  | 2.0  | 2.5  | 3.0  | All Area
AP (%) | 69.1 | 75.7 | 77.2 | 78.7 | 77.1 | 75.5 | 76.7
Table 5. Detection results of different methods.

Methods   | AP (%)
RetinaNet | 72.4
FPN       | 73.2
ADNet     | 79.2
CenterNet | 72.1
FCOS      | 74.0
FENet     | 78.7
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
