Article

Application of Instance Segmentation to Identifying Insect Concentrations in Data from an Entomological Radar

1 School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
2 Advanced Technology Research Institute, Beijing Institute of Technology, Jinan 250300, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(17), 3330; https://doi.org/10.3390/rs16173330
Submission received: 25 July 2024 / Revised: 1 September 2024 / Accepted: 2 September 2024 / Published: 8 September 2024
(This article belongs to the Topic Radar Signal and Data Processing with Applications)

Abstract

Entomological radar is one of the most effective tools for monitoring insect migration, capable of detecting migratory insects concentrated in layers and facilitating the analysis of insect migration behavior. However, traditional entomological radar, with its low resolution, can only provide a rough observation of layer concentrations. The advent of High-Resolution Phased Array Radar (HPAR) has transformed this situation. With its high range resolution and high data update rate, HPAR can generate detailed concentration spatiotemporal distribution heatmaps. This technology facilitates the detection of changes in insect concentrations across different time periods and altitudes, thereby enabling the observation of large-scale take-off, landing, and layering phenomena. However, the lack of effective techniques for extracting insect concentration data of different phenomena from these heatmaps significantly limits detailed analyses of insect migration patterns. This paper is the first to apply instance segmentation technology to the extraction of insect data, proposing a method for segmenting and extracting the concentration data of different phenomena from spatiotemporal distribution heatmaps. To address the characteristics of concentrations in spatiotemporal distributions, we developed the Heatmap Feature Fusion Network (HFF-Net). In HFF-Net, we incorporate the Global Context (GC) module to enhance feature extraction of concentration distributions, utilize the Atrous Spatial Pyramid Pooling with Depthwise Separable Convolution (SASPP) module to extend the receptive field for understanding various spatiotemporal distributions of concentrations, and refine segmentation masks with the Deformable Convolution Mask Fusion (DCMF) module to enhance segmentation detail. Experimental results show that our proposed network can effectively segment concentrations of different phenomena from heatmaps, providing technical support for detailed and systematic studies of insect migration behavior.

1. Introduction

Hundreds of millions of aerial insects migrate long distances each year, involving complex aerodynamic behaviors and flight strategies [1,2]. However, our understanding of insect migratory behavior and its patterns remains limited. Therefore, further monitoring and analysis of these migratory behaviors are necessary to enhance our knowledge in this area. Radar is one of the most effective means of monitoring insect migration, offering advantages such as all-weather and all-day operation without interfering with the insects’ migratory behavior. The unique capabilities of this technology have led to the emergence of the interdisciplinary field of radar entomology. Since the 1970s, each advancement in entomological radar monitoring technology has significantly propelled the development of radar entomology and the progress of insect migration research [3,4].
Early entomological radars were all scanning systems. In 1968, Professor G. W. Schaefer of the United Kingdom constructed the world’s first entomological radar and successfully observed nocturnal desert locust migrations in the Sahara [5]. This radar utilized an X-band frequency (wavelength 3.2 cm) and a pencil-beam parabolic antenna with variable elevation angles. Schaefer’s design laid the technical foundation for subsequent developments in entomological radar technology. Similarly, in Australia, Drake developed a scanning entomological radar and documented the classical method of entomological radar scanning. This method involved sequential azimuthal scanning at several discrete elevation angles to achieve aerial sampling [6]. Using this scanning radar, they observed locusts taking off at dusk and forming layers at specific altitudes [7]. However, while scanning radar can observe entire swarms, it is difficult for it to resolve individual insects in high-density scenes. Due to this limitation, researchers had to use estimation methods to analyze insect migration [8]. Moreover, early scanning entomological radars were hindered by time-consuming and labor-intensive operation and data processing, making them suitable only for short-term research on insect migratory behaviors rather than for long-term observation of migratory insects.
In the 1970s, UK-based scientists pioneered the development of Vertical-Looking Radar (VLR) for insect monitoring. This radar system utilized linear polarization, directing its beam vertically upwards and rapidly rotating around its central axis, enabling the detection of insects’ body-axis orientation and wingbeat frequency [9]. In the 1980s, these scientists enhanced the first-generation VLR with precession beam technology, forming the second-generation VLR with a ZLC (Zenith-pointing Linear-polarized Conical scan) configuration. This advanced VLR could measure individual insects’ body-axis orientation, wingbeat frequency, speed, displacement direction, and three RCS parameters related to the polarization pattern shape [10]. The advent of the second-generation VLR endowed entomological radar with the capacity for long-term automated observation. The Rothamsted Experimental Station in the UK employed VLR for more than 20 years of automated monitoring of migratory insects transiting the UK [11]. Similarly, Professor Drake in Australia constructed two VLRs with the ZLC configuration, which he refers to as IMRs (Insect Monitoring Radars) [12,13]. Additionally, these scientists tried to combine the VLR with wind- and temperature-measuring radars for joint observations of insect migration [14]. The monitoring data from the VLRs has enabled scholars in both the UK and Australia to identify the phenomena of migratory insects concentrated in layers and to initially interpret the effects of atmospheric structure and air movement on layer concentrations [15,16,17]. Nevertheless, the VLRs of this period exhibited deficiencies in their working mechanisms, system functions, and performance indicators. Their signal processing is complex, employing time-shared signal acquisition and processing for intermittent observation. Target echoes are collected through range gates, resulting in low range resolution (approximately 40–50 m) and discontinuity between height layers. In 2017, Drake et al. upgraded their IMR again; the new version, known as IMRU (IMR Upgraded), achieves a range resolution of about 10 m [18,19]. Even so, these design limitations impede the ability of such systems to meet the requirements for high-resolution spatiotemporal measurements.
In order to address the shortcomings of VLR, Beijing Institute of Technology has developed a new generation High-Resolution Multi-Dimensional Radar Measurement System (MDR). This system comprises a High-Resolution Phased Array Radar (HPAR) and three High-Resolution Multi-Frequency and Fully Polarimetric Radars (HMFPR) [20,21].
The HMFPR is an advanced radar system with multi-frequency, fully phase-referenced, and fully polarimetric tracking capabilities. The system extends the capabilities of the Ku-HPR to multiple bands, enabling simultaneous operation in X, Ku, and Ka bands. This allows the acquisition of full-polarization echo signals of targets across five sub-bands within these three bands. This enables the precise measurement of multi-dimensional biological and behavioral parameters, such as three-dimensional body–axis orientation, wingbeat frequency, body length, body mass, and speed, further enabling species identification [22,23,24].
The HPAR is a Ku-band, three-coordinate, full-phase-referenced active phased-array scanning radar. It achieves a high range resolution of 0.2 m by employing stepping-frequency broadband synthesis waveforms. Additionally, with high-power phased-array electronic scanning, HPAR performs wide-area detection with low residence time, resulting in a high data update rate. This combination of high range resolution and rapid data update rate allows it to function effectively even during dense migration events, facilitating the measurement of high-resolution spatiotemporal distribution structures of aerial insect populations.
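As a point of reference (the system bandwidth is not stated in the text), the quoted 0.2 m range resolution implies, under the standard pulse-compression relation for a synthesized bandwidth, a total bandwidth on the order of 750 MHz:
$$ \Delta R = \frac{c}{2B} \;\Rightarrow\; B = \frac{c}{2\,\Delta R} = \frac{3\times 10^{8}\ \mathrm{m/s}}{2 \times 0.2\ \mathrm{m}} = 750\ \mathrm{MHz}. $$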
The use of high-resolution spatiotemporal measurements enables the generation of detailed spatiotemporal distribution heatmaps of insect concentrations. Each pixel in these heatmaps is assigned a color, representing the concentration at specific times and altitudes. The majority of spatiotemporal regions demonstrate a relatively low concentration. However, at specific times and altitudes, the concentrations increase significantly, resulting in a patch of highlighted pixels. This indicates the occurrence of insects concentrated in layers within those spatiotemporal regions or that insects are taking off or landing over a wide area. Through these heatmaps, we can clearly observe the variation in insect concentration over time and altitude, particularly the formation of insects concentrated in layers or large-scale take-off and landing phenomena, revealing the intricate and complex structure of their spatiotemporal distribution. These behaviors can be influenced by intricate interactions with meteorological factors, such as wind speed and temperature. To gain a deeper understanding of the behavioral mechanisms driving insect migration, it is essential to obtain and analyze insect data from different migration phenomena. The concentration varies significantly over time and space, manifesting in the heatmap as clear edges in high-density layers, while low-density layers blend with the background, making them difficult to delineate. For different layers in adjacent spatiotemporal regions, this issue is even more pronounced. Furthermore, the layers may be very close to each other, or even overlap or merge with other layers or with insects’ take-off or landing. Simple threshold-based methods are insufficient for extracting these data. Instead, manual segmentation of the heatmap is required to extract the spatiotemporal distribution and density data of different migration phenomena. However, as the volume of monitoring data continues to grow, this manual approach has become increasingly time-consuming and inefficient. Therefore, it is crucial to develop an efficient segmentation algorithm for the automated segmentation and extraction of data on different migration phenomena.
The rapid development of deep learning, especially convolutional neural networks (CNNs), has greatly advanced the classification, recognition, and segmentation of image targets, and these technologies have gradually shown tremendous influence in more fields [25,26,27,28,29]. In 2014, Ross Girshick et al. proposed a Region-based Convolutional Neural Network (R-CNN), which introduced CNNs into the field of target detection for the first time [30]. R-CNN utilizes convolutional layers to extract features from candidate regions and subsequently determines the class and location of the target. However, this method falls short in achieving precise target edge segmentation. In the same year, Jonathan Long et al. proposed a Fully Convolutional Network (FCN), which applies CNN to semantic segmentation of images [31]. FCN conducts feature extraction using convolutional layers and achieves pixel-wise classification by upsampling the feature map to the original image size. Nevertheless, semantic segmentation struggles to distinguish between different objects within the same category. In 2017, He et al. proposed Mask R-CNN, which is a classic two-stage instance segmentation network that combines object detection and semantic segmentation tasks [32]. Mask R-CNN can distinguish instances within different categories and achieves pixel-level segmentation based on instance-level target localization results. Subsequently, researchers proposed two-stage instance segmentation methods, including Cascade Mask R-CNN, Hybrid Task Cascade, and QueryInst, which significantly enhanced the accuracy of instance segmentation [33,34,35]. Furthermore, single-stage instance segmentation methods, such as YOLACT and SOLO, have also been developing in terms of operational speed and architectural innovations, which further extend the applicability and performance of instance segmentation [36,37,38]. Instance segmentation techniques demonstrate considerable potential for distinguishing and segmenting different targets, particularly when segmenting insects with different migration phenomena on spatiotemporal distribution heatmaps.
This paper introduces the concept of instance segmentation to the field of insect data extraction, transforming the task into one of instance segmentation of heatmaps. We propose a method for segmenting and extracting insect data from spatiotemporal distribution heatmaps of concentrations across different migration phenomena. We first construct spatiotemporal distribution heatmaps from HPAR monitoring data, utilizing data visualization enhancement and augmentation techniques to build a robust and effective dataset. To address the fine and complex characteristics of a concentration’s spatiotemporal distribution, we propose the Heatmap Feature Fusion Network (HFF-Net), which effectively segments and extracts insect data. In HFF-Net, we introduce the Global Context (GC) module to enhance the backbone network’s ability to extract spatiotemporal distribution features of insects with different migration phenomena [39]. We also employ the Atrous Spatial Pyramid Pooling with Depthwise Separable Convolution (SASPP) module, which utilizes convolutions with different receptive fields to help the network perceive information about layers of varying sizes [40]. Additionally, the Deformable Convolution Mask Fusion (DCMF) module is used to refine the accuracy of the segmentation masks.
The remainder of this paper is organized as follows: Section 2 provides a comprehensive account of the data acquisition and initial processing. Section 3 presents the architecture of our proposed network. Section 4 provides a detailed analysis of the performance of the proposed model and a comparison with previous studies. Section 5 discusses the results and future works.

2. Data Acquisition and Initial Processing

2.1. Radar Data

The MDR was deployed and put into operation at the Yellow River Delta Comprehensive Modern Agricultural Experimental Demonstration Base of Shandong Academy of Agricultural Sciences, Dongying, Shandong Province, China. The data used in this paper are insect monitoring data recorded by the HPAR from September 2021 to November 2023, when it was operated in statistical mode.
In statistical mode, HPAR performs a two-dimensional scan using continuous mechanical scanning in the azimuth direction and electronic scanning in the elevation direction. It is capable of covering an azimuth angle from 0° to 360° and an elevation angle from 5° to 85°, thereby achieving a scanning range that covers approximately a hemisphere. It can detect all individual targets with an RCS greater than −64 dBsm within a cylindrical airspace with a radius of 1 km and an altitude range of 0.1 to 1.5 km, and it can obtain their 3D coordinates within a scanning cycle of 66 s. Figure 1 illustrates the HPAR and its operational mode. In Figure 1b, the black hemisphere represents the entire airspace scanned by HPAR, while the cylinder marked in blue is the area we used to obtain insect data.
To extract information on different migration phenomena from HPAR monitoring data, we need to construct spatiotemporal distribution heatmaps of concentrations. We first constructed a time–altitude matrix to record the distribution of insect numbers at different times and altitudes. The horizontal axis of the matrix represents the time span, which can be up to 12 h, adjusted according to the actual operating duration of the radar; the vertical axis indicates the altitude span, ranging from 1400 m (from 100 m to 1500 m). Each matrix cell represents the statistical count of insects within an 8 m altitude range over a period of 66 s. These insect data are derived from our HPAR statistical module, covering a full range of angles and including the altitude range from 100 to 1500 m. To mitigate the impact of radar blind spots and the variation in monitoring coverage at different altitudes due to the hemispherical scanning mode, we specifically selected data within a horizontal distance range of 525 to 1000 m. This matrix was subsequently quantized into a 256-level grayscale image and visualized as a pseudo-color spatiotemporal distribution heatmap. The heatmap, enhanced by data visualization, clearly reveals the phenomena of insects concentrated in layers and insect take-off or landing over a wide area.
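As an illustration of how such a heatmap can be assembled, the following Python sketch bins per-target detections into a time–altitude count matrix and quantizes it to 256 grey levels. The array names, random inputs, and plotting choices are illustrative assumptions, not the actual HPAR processing code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative inputs: one row per detected target (assumed names, not real HPAR output)
# det_time_s: detection time in seconds from the start of the observation window
# det_alt_m:  target altitude in metres (already filtered to the 525-1000 m horizontal range)
det_time_s = np.random.uniform(0, 12 * 3600, size=50_000)
det_alt_m = np.random.uniform(100, 1500, size=50_000)

time_bin_s = 66          # one radar scan cycle
alt_bin_m = 8            # altitude bin of the statistics
time_edges = np.arange(0, 12 * 3600 + time_bin_s, time_bin_s)
alt_edges = np.arange(100, 1500 + alt_bin_m, alt_bin_m)

# Time-altitude count matrix: rows = altitude bins, columns = scan cycles
counts, _, _ = np.histogram2d(det_alt_m, det_time_s, bins=[alt_edges, time_edges])

# Quantize to a 256-level greyscale image, then render as a pseudo-colour heatmap
grey = np.round(255 * counts / counts.max()).astype(np.uint8)
plt.imshow(grey, origin="lower", aspect="auto", cmap="jet",
           extent=[0, 12, 0.1, 1.5])   # hours vs. kilometres
plt.xlabel("Time (h)")
plt.ylabel("Altitude (km)")
plt.colorbar(label="Quantized insect count")
plt.show()
```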
Figure 2 presents two cases of heatmaps illustrating the spatiotemporal distribution of concentrations. In Figure 2a, we can clearly observe that around sunrise, at approximately 5 a.m., there was a phenomenon of insect take-offs over a large area. Subsequently, these insects began to concentrate into layers, forming multiple layers at low altitudes, with an additional layer slowly emerging at higher altitudes. By 9 a.m., the multiple low-altitude layers merged. In Figure 2b, diurnal migratory insects concentrate into layers around 6 p.m. Near dusk at approximately 8 p.m., the concentration of layers descends noticeably. Subsequently, nocturnal migratory insects concentrate into layers at different times and altitudes, with some layers maintaining a high concentration until dawn.
HPAR has been accumulating detection data since September 2021. In 2021 and 2022, radar monitoring was conducted only at night, but from 2023 onwards, it was expanded to full-day operation. By the end of 2023, this resulted in over two years of nighttime data and one year of daytime data. After excluding instances of rain, short monitoring periods, and periods when insect concentrations were low and relatively uniform, 332 valid spatiotemporal distribution heatmaps were obtained. With these heatmaps, we manually annotated the three insect migration phenomena observed on the heatmaps: large-scale take-off, large-scale landing, and insects concentrating into layers. Subsequently, these annotated data are used to train an instance segmentation network to achieve automatic segmentation of insect data for different migration phenomena. However, before initiating the instance segmentation network training, we need to build a robust and effective dataset through data visualization enhancement and data augmentation. These steps are detailed in the following subsections.

2.2. Image Visualization Enhancement

The spatiotemporal distribution heatmap of concentrations is inherently limited in its ability to express detailed information, as it primarily highlights migration phenomena with high concentrations while often overlooking those with relatively low concentrations. This limitation can negatively impact both the accuracy of manual labeling and the segmentation performance of instance segmentation networks. Therefore, enhancing the visualization of heatmaps is necessary to improve the detail and clarity of the images for subsequent processing. Figure 3 illustrates the effect of image visualization enhancement.

2.2.1. Image Contrast Enhancement

Firstly, the overall contrast of the image is enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE) [41]. In heatmaps, the majority of pixels are predominantly concentrated in the lower grayscale range, causing many migration phenomena with relatively low concentrations to have grayscale levels similar to those of the sparsely distributed insects in the surrounding spatiotemporal environment. This similarity makes effective differentiation in the heatmap challenging, necessitating an augmentation of the grayscale distribution to a wider range.
Histogram equalization is an image enhancement technique that improves contrast by transforming the grayscale histogram of an image into a uniform grayscale probability density distribution. Specifically, it enhances contrast by expanding the levels with a high number of pixels and merging those with fewer pixels.
CLAHE is an enhanced variant of traditional histogram equalization. It divides the image into multiple local regions, calculates the histogram within each region, and applies contrast-limited histogram equalization to each separately. This approach mitigates the over-enhancement and noise amplification issues associated with global histogram equalization, while preserving local details.
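A minimal sketch of this step with OpenCV is shown below; the file name, clip limit, and tile grid size are illustrative assumptions rather than the values used in the paper.

```python
import cv2

# Load the quantized greyscale heatmap (illustrative file name)
gray = cv2.imread("heatmap_gray.png", cv2.IMREAD_GRAYSCALE)

# Contrast Limited Adaptive Histogram Equalization: local equalization with a
# clipped histogram to limit over-enhancement and noise amplification
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

cv2.imwrite("heatmap_clahe.png", enhanced)
```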
As shown in Figure 3c,d, after applying overall contrast enhancement, many low-concentration layers and take-off phenomena became visible. In Figure 3c, the low-altitude region between 6 a.m. and 8 a.m. clearly shows two distinct layers, and around 9 a.m., more closely adjacent layers are observable. In Figure 3d, it is also possible to see some of the layers that would otherwise not be visible.

2.2.2. Image Texture Enhancement

To further enhance the visualization of texture details, we apply a two-dimensional Gabor filter [42]. In heatmaps, insects concentrated in layers appear as stripe-like textures, necessitating the enhancement of this texture information to highlight layer details. The Gabor transform, a key method in time-frequency analysis, is particularly effective for this purpose. In the spatial domain, a two-dimensional Gabor filter comprises a Gaussian kernel function modulated by a sinusoidal wave, widely used for extracting and enhancing image texture features. It is defined as follows.
$$ g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\!\left(-\frac{x'^{2} + \gamma^{2} y'^{2}}{2\sigma^{2}}\right)\exp\!\left(i\!\left(2\pi\frac{x'}{\lambda} + \psi\right)\right) $$
$$ x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta $$
The parameter θ represents the main orientation of the filter, with 0° being horizontal and 90° being vertical. The parameters λ and ψ denote the wavelength and phase offset of the sinusoidal wave within the filter, respectively. The standard deviation σ of the Gaussian factor, along with the filter kernel size, determines the effective range of the filter. The spatial aspect ratio γ defines the ellipticity of the filter’s support; when γ = 1, the filter has a circular effective range.
The imaginary part of the two-dimensional Gabor filter, which contains the sinusoidal component, is more sensitive to phase changes in the image. This sensitivity allows it to better capture regions with significant variations in concentrations in the heatmap. By combining two-dimensional imaginary Gabor filters at different angles, we can enhance the texture details of the layers.
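The sketch below illustrates this idea with OpenCV’s Gabor kernels: setting the phase offset to −π/2 turns the cosine carrier into a sine, i.e., the odd (imaginary) component of the complex filter, and responses at several orientations are combined by taking their maximum. The kernel size, wavelength, and orientation set are illustrative assumptions, not the parameters used in the paper.

```python
import cv2
import numpy as np

def gabor_texture_enhance(gray, thetas=(0, np.pi / 6, np.pi / 3, np.pi / 2)):
    """Enhance stripe-like layer textures with imaginary-part Gabor filters."""
    responses = []
    for theta in thetas:
        # psi = -pi/2 yields the sine (odd/imaginary) component, which responds
        # strongly to intensity transitions such as layer edges
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=-np.pi / 2)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    # Combine orientations and rescale to 8-bit for visualization
    combined = np.max(np.abs(np.stack(responses, axis=0)), axis=0)
    return cv2.normalize(combined, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

enhanced = gabor_texture_enhance(cv2.imread("heatmap_clahe.png", cv2.IMREAD_GRAYSCALE))
```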
As shown in Figure 3e,f, the clarity of the layers has significantly improved, particularly for the low-concentration, narrow-band layers, whose details have been greatly enhanced. For instance, in Figure 3e, the layers distributed around 800 m between 9 a.m. and 11 a.m., and in Figure 3f, the multiple layers between 2 a.m. and 4 a.m., exhibit much clearer details.

2.2.3. Background Elimination

In the time–altitude count matrix, cells with a value of zero indicate the absence of detected insect targets within that spatiotemporal range. However, during quantization, these regions can be assigned grayscale values similar to those of areas with very low insect densities, which may lead to confusion. To address this issue, it is necessary to define regions with no targets as white in the heatmap, thereby excluding background information.
Figure 3g,h illustrates the results of background removal. In Figure 3g, removing the background allows the insect take-off over a wide area to be clearly observed, with altitudes ranging from below 100 m up to 1400 m. Additionally, the layers at higher altitudes are depicted more distinctly.

2.3. Data Augmentation

The small size of our dataset, however, severely limits the performance of deep learning methods in instance segmentation tasks. To increase the data volume, prevent model overfitting, and enhance reliability, we used the Style-Based Generator Architecture for Generative Adversarial Networks Version 2 (StyleGANv2) for data augmentation [43].
The initial Generative Adversarial Network (GAN) model, proposed by Ian Goodfellow et al. in 2014, comprises a generator that produces synthetic data and a discriminator that attempts to distinguish between real and synthetic data [44]. Through adversarial training, these two networks can learn the feature distribution of the original training dataset and generate similar images. StyleGANv2 represents the state-of-the-art in GAN technology, capable of generating high-quality and realistic samples at relatively low computational costs.
We used a dataset consisting of 332 heatmaps that had undergone image visualization enhancement. During training, we employed Adaptive Discriminator Augmentation (ADA) for data preprocessing and evaluated the training effectiveness using the Frechet Inception Distance (FID), a commonly used metric for assessing the quality of generated images [45,46]. The experiments were conducted using the PyTorch framework on a system equipped with an NVIDIA GeForce RTX 3060 GPU (Santa Clara, CA, USA). Training involved 20,000 iterations, resulting in a final FID score of 27.64. Figure 4 shows two cases of heatmaps generated using StyleGANv2.

3. Method

With the rapid advancement of deep learning technology, its applications have become increasingly widespread across various fields. Among these, instance segmentation has shown exceptional potential in distinguishing and segmenting different instances. Consequently, we have transformed the process of extracting insect data into an instance segmentation problem on spatiotemporal distribution heatmaps of concentrations. However, due to the diverse shapes, sizes, and visibilities of different layers and other take-off and landing phenomena on the heatmaps, which reflect intricate and complex spatiotemporal distribution characteristics, traditional instance segmentation methods face significant challenges. To better capture these features and improve the effectiveness of segmentation, we propose a novel instance segmentation framework—the Heatmap Feature Fusion Network (HFF-Net). The overall structure is illustrated in Figure 5.
HFF-Net enhances the traditional cascade instance segmentation network with three new modules. First, we integrated the Global Context (GC) module into the backbone network, utilizing a global context attention mechanism to enhance the extraction of heatmap features. Next, feature maps of different sizes are fused within the Feature Pyramid Network (FPN) structure [47]. At the front end of the FPN structure, we added the Atrous Spatial Pyramid Pooling with Depthwise Separable Convolution (SASPP) module, which employs convolutional kernels with different dilation rates to achieve multi-scale feature extraction and fusion. The fused feature maps are then processed through the Region Proposal Network (RPN) and Non-Maximum Suppression (NMS) methods to extract Region Proposals, followed by ROI Align to extract different instance feature maps [48]. Finally, these feature maps undergo multiple iterations through the BBox Head and Mask Head to generate prediction boxes and segmentation masks [34]. During the iterative generation of segmentation masks, we applied the Deformable Convolution Mask Fusion (DCMF) module within the Mask Head, significantly enhancing the accuracy and effectiveness of segmentation masks. We will next detail the design of the GC, SASPP, and DCMF modules.

3.1. GC Module

The Global Context (GC) module is a computational unit that combines the advantages of the Simplified Nonlocal (SNL) and lightweight Squeeze-and-Excitation (SE) modules. It effectively captures long-range dependencies and enhances the response to key features while maintaining low computational complexity. Specifically, the GC module consists of three parts: Context Modeling, which uses a 1 × 1 convolution and SoftMax to obtain attention weights, followed by attention pooling to extract global context features; Bottleneck Transform, which captures inter-channel dependencies in an excitatory manner; and Feature Aggregation, which adds the global context features to the features at each position. The entire GC module can be represented as follows:
$$ z_i = x_i + W_{v2}\,\mathrm{ReLU}\!\left(\mathrm{LN}\!\left(W_{v1}\sum_{j=1}^{N_p}\frac{e^{W_k x_j}}{\sum_{m=1}^{N_p} e^{W_k x_m}}\,x_j\right)\right) $$
where $x$ and $z$ denote the input and output of the block, $i$ is the index of query position elements, and $j$ enumerates all possible position elements. $W_k$, $W_{v1}$, and $W_{v2}$ denote linear transform matrices. $\alpha_j = e^{W_k x_j} / \sum_{m=1}^{N_p} e^{W_k x_m}$ represents the attention pooling weights, which aggregate the features of all positions to obtain the global context feature. $\delta(\cdot) = W_{v2}\,\mathrm{ReLU}(\mathrm{LN}(W_{v1}(\cdot)))$ represents the Bottleneck Transform process, where the first 1 × 1 convolution is used for channel compression, reducing the original $c$ channels to $c/r$ (where $r$ is the channel compression ratio). This is followed by LayerNorm and ReLU activation functions, and then a second 1 × 1 convolution restores the number of channels. Finally, the result is added back to the original feature map. Figure 6 illustrates the internal structure of the GC module and its position within the backbone network.
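For concreteness, a minimal PyTorch sketch of a GC block following the formulation above is given below; the layer sizes and placement are assumptions based on the description, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Minimal sketch of a Global Context (GC) block."""
    def __init__(self, channels, ratio=4):
        super().__init__()
        hidden = max(channels // ratio, 1)
        # Context modeling: 1x1 conv (W_k) + softmax over spatial positions
        self.context_conv = nn.Conv2d(channels, 1, kernel_size=1)
        self.softmax = nn.Softmax(dim=2)
        # Bottleneck transform: compress (W_v1) -> LayerNorm -> ReLU -> restore (W_v2)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.size()
        # attention weights over all N = h*w positions
        attn = self.context_conv(x).view(b, 1, h * w)          # [B, 1, N]
        attn = self.softmax(attn).unsqueeze(-1)                 # [B, 1, N, 1]
        feats = x.view(b, c, h * w).unsqueeze(1)                # [B, 1, C, N]
        context = torch.matmul(feats, attn).view(b, c, 1, 1)    # [B, C, 1, 1]
        # broadcast-add the transformed global context to every position
        return x + self.transform(context)
```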
In analyzing the spatiotemporal distribution heatmaps of concentrations, it is crucial to focus on the varying spatiotemporal distributions of different migration phenomena. Understanding these contextual features is key to their identification and segmentation. To this end, we integrate the GC module into the backbone network to capture contextual information within the images, enhancing the network’s responsiveness to the distinct spatiotemporal distribution characteristics of different instances.

3.2. SASPP Module

Our spatiotemporal distribution heatmaps of concentrations span a duration of 12 h and encompass altitudes up to 1400 m. However, the instances exist only in localized spatiotemporal regions, with considerable variation in size and range as depicted in the images. Traditional feature extraction methods with small receptive fields are thus inadequate for effectively capturing the necessary spatiotemporal distribution characteristics of instances. To address this limitation, we replace the initial 1 × 1 convolution in the FPN with the SASPP module to enhance network performance. Figure 7 illustrates the structure of the FPN with the SASPP module.
The SASPP module processes input feature maps in parallel using convolution operations with different dilation rates, capturing contextual semantic information at multiple scales and fusing them. It consists of four parallel branches, each of which takes the backbone feature map $C_i$ as input and produces outputs that are concatenated along the channel dimension. A 1 × 1 convolution then adjusts these concatenated features to produce the final output. The four branches include a global average pooling layer followed by a 1 × 1 convolution, two 3 × 3 depthwise separable convolutions with dilation rates of 3 and 6, and a 1 × 1 convolution. Figure 8 illustrates the internal structure of the SASPP module.
The depthwise separable atrous convolution used in the SASPP module is an enhanced convolution operation. Atrous convolution, by introducing a “dilation rate” parameter, effectively expands the receptive field to capture a wider range of contextual information. Depthwise separable atrous convolution further reduces the number of parameters and computational complexity while maintaining model performance. It consists of two steps: (1) depthwise atrous convolution, which applies atrous convolution to each input channel separately and (2) pointwise convolution, which uses a 1 × 1 convolution to combine the outputs of the depthwise atrous convolutions along the channel dimension, producing the final output.
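A minimal PyTorch sketch of the SASPP module as described (image-level pooling, a 1 × 1 branch, and two depthwise separable atrous branches with dilation rates 3 and 6, concatenated and fused by a 1 × 1 convolution) might look as follows; channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableAtrousConv(nn.Module):
    """Depthwise atrous conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, 3, padding=dilation,
                                   dilation=dilation, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SASPP(nn.Module):
    """Sketch of the SASPP module: four parallel branches, concat, 1x1 fuse."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_channels, out_channels, 1))
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, 1)
        self.atrous3 = DepthwiseSeparableAtrousConv(in_channels, out_channels, dilation=3)
        self.atrous6 = DepthwiseSeparableAtrousConv(in_channels, out_channels, dilation=6)
        self.project = nn.Conv2d(4 * out_channels, out_channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        # image-level context, upsampled back to the input resolution
        pooled = F.interpolate(self.image_pool(x), size=(h, w), mode="bilinear",
                               align_corners=False)
        out = torch.cat([pooled, self.conv1x1(x),
                         self.atrous3(x), self.atrous6(x)], dim=1)
        return self.project(out)
```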

3.3. DCMF Module

In traditional cascade Mask Head architectures, the previous stage’s mask features are simply added to the new stage’s feature map using a 1 × 1 convolution. This approach results in insufficient accuracy of the mask information, limiting the network’s mask prediction capability. To address this limitation, we propose the DCMF module, which enables more effective integration of the previous stage’s mask features with the new stage’s feature map, thereby enhancing mask prediction performance. The DCMF module consists of two key components: (1) Feature Calibration, using deformable convolutions to adjust the mask features and correct the shape of the regions of interest and (2) Adaptive Fusion, which adaptively incorporates mask information into the regions of interest based on the feature map’s requirements.
Deformable Convolution enhances standard convolution by introducing additional two-dimensional offsets within the receptive field, thereby increasing spatial sampling flexibility and better adapting to geometric transformations in images [49]. In our architecture, we construct the necessary offsets by concatenating the previous stage’s mask feature with the feature map obtained through ROI Align. These offsets are then applied to the deformable convolution of the previous stage’s mask feature to achieve feature calibration. Subsequently, we employ an adaptive fusion module to integrate the calibrated mask features with the feature map. This process involves, first, applying a 1 × 1 convolution to the feature map to generate a spatial attention map; second, processing the attention map with a sigmoid function to assign weights to both the feature map and the mask features; and finally, fusing the features based on these weights. Figure 9 shows the internal structure of the DCMF module.
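The following PyTorch sketch captures one plausible reading of this description, using torchvision’s DeformConv2d for feature calibration and a sigmoid spatial attention map for adaptive fusion; the exact wiring of HFF-Net’s DCMF module may differ.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCMF(nn.Module):
    """Sketch of Deformable Convolution Mask Fusion: calibrate the previous
    stage's mask feature with a deformable conv whose offsets come from the
    concatenated features, then fuse via a sigmoid spatial attention map."""
    def __init__(self, channels):
        super().__init__()
        # offsets predicted from [roi_feat, prev_mask_feat]; 2*3*3 = 18 channels
        self.offset_conv = nn.Conv2d(2 * channels, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size=3, padding=1)
        # 1x1 conv -> sigmoid gives per-pixel fusion weights
        self.attn_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, roi_feat, prev_mask_feat):
        offsets = self.offset_conv(torch.cat([roi_feat, prev_mask_feat], dim=1))
        calibrated = self.deform_conv(prev_mask_feat, offsets)   # feature calibration
        attn = torch.sigmoid(self.attn_conv(roi_feat))           # spatial attention map
        # adaptive fusion of the ROI feature and the calibrated mask feature
        return attn * calibrated + (1.0 - attn) * roi_feat
```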

4. Experiments and Results

4.1. Dataset

We randomly divided 332 real spatiotemporal distribution heatmaps of concentrations into a training set and a test set with a 65% to 35% split, yielding 215 and 117 images, respectively. We then augmented the training set by adding 215 and 430 synthetic heatmaps in 1:1 and 1:2 ratios, respectively, to create two distinct mixed training sets.

4.2. Evaluation Metrics

In instance segmentation tasks, the manually annotated segmentation results are typically referred to as the ground truth. The higher the pixel overlap between the predicted results and the ground truth, the better the prediction is considered to be. Intersection over Union (IoU) is the metric used to quantify this overlap. IoU is defined as the ratio of the intersection area to the union area between the predicted results and the ground truth. The IoU value ranges from 0 to 1, with values closer to 1 indicating a higher degree of overlap between the predicted results and the ground truth.
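As a simple illustration, the IoU between a predicted and a ground-truth binary mask can be computed as follows (a sketch, not the evaluation code used in our experiments):

```python
import numpy as np

def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 0.0
```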
To evaluate a single prediction result, we can use IoU. However, to assess the overall effectiveness of our proposed model, we employed mean Average Precision (mAP), a standard metric in instance segmentation, to measure accuracy. mAP is the mean of the Average Precision (AP) values across all classes. AP is derived by plotting the precision–recall curve and calculating the area under the curve (AUC). Precision refers to the proportion of true positives among all samples predicted as positive, while recall represents the proportion of true positives identified among all actual positive samples. Therefore, AP provides a comprehensive evaluation by considering both precision and recall. Here, true positives are defined as predictions with an IoU greater than a certain threshold.
To evaluate the proposed method, we used four metrics: BBox mAP, BBox mAP50, Mask mAP, and Mask mAP50. BBox mAP and BBox mAP50 assess the performance of bounding box predictions, while Mask mAP and Mask mAP50 evaluate pixel-level segmentation. mAP represents the average precision across IoU thresholds from 0.5 to 0.95 in 0.05 increments, while mAP50 indicates the average precision at an IoU threshold of 0.5. Higher values, closer to 1, indicate better segmentation accuracy.
In addition, we supplement these standard metrics with the number of learnable parameters (denoted as Params) to evaluate the complexity of the network. Params is a standard measure of the number of learnable parameters when training a CNN-based model; a larger value indicates a more complex model. Assuming the number of channels in the input features is $C_{\mathrm{in}}$, the size of the convolutional kernel is $M \times M$, and the number of output channels is $C_{\mathrm{out}}$, the parameter count of a convolutional layer is defined as:
$$ \mathrm{Params} = \left(C_{\mathrm{in}} M^{2} + 1\right) C_{\mathrm{out}} $$
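For example, assuming a single 3 × 3 convolution with 256 input and 256 output channels (values chosen purely for illustration):
$$ \mathrm{Params} = (256 \times 3^{2} + 1) \times 256 = 590{,}080. $$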

4.3. Implementation Details

All experiments were conducted using MMDetection, an open-source object detection toolbox based on PyTorch provided by the OpenMMLab platform [50]. Training and testing were performed on a PC equipped with an NVIDIA RTX A6000 GPU. The batch size was set to 8, and we employed Stochastic Gradient Descent (SGD) with a weight decay of 0.0001 and a momentum of 0.9. Images were resized to 645 pixels on the long side and 175 pixels on the short side without altering the aspect ratio. Training was performed over 22 epochs with an initial learning rate of 0.01, using a warm-up strategy with a coefficient of 0.01. The learning rate was reduced by a factor of 10 at the 12th, 16th, and 19th epochs. Specific training parameters varied for some networks, as detailed in Table 1. For evaluation, the score threshold was set to 0.05 and the Non-Maximum Suppression (NMS) threshold to 0.3, with other parameters at MMDetection default settings. MMDetection version 3.2.0 and PyTorch version 1.12.1 were used. Each experiment was repeated 10 times to mitigate the impact of random initialization, with the average result reported as the final outcome.
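For reference, the optimizer and schedule settings above might be expressed in an MMDetection 3.x-style config roughly as follows. This is a sketch of the config format, not the authors’ actual configuration, and field names should be checked against the installed version.

```python
# Sketch of MMDetection 3.x-style training settings (illustrative only)
train_dataloader = dict(batch_size=8)

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))

param_scheduler = [
    # warm-up: linear ramp with a start factor of 0.01
    dict(type='LinearLR', start_factor=0.01, by_epoch=False, begin=0, end=500),
    # step decay by a factor of 10 at epochs 12, 16, and 19
    dict(type='MultiStepLR', by_epoch=True, milestones=[12, 16, 19], gamma=0.1),
]

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=22, val_interval=1)
```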

4.4. Results of Dataset Augmentation Experiments

To validate the effectiveness of our data augmentation method, we used the classical instance segmentation network Mask R-CNN to train and test three different datasets. Based on the varying sizes of these datasets, we adjusted the hyperparameters accordingly to achieve optimal training results. The hyperparameter settings and experimental results are detailed in Table 2. The results indicate that the training sets enhanced with data augmentation significantly outperformed the original datasets on the test sets across four metrics, with notable improvements in the Mask mAP50 metric, where the two augmented datasets showed increases of 4.73% and 8.24%, respectively. These findings confirm the effectiveness of our data augmentation method. In subsequent experiments, we will use datasets augmented at a 1:2 ratio.

4.5. Main Results

We conducted comparative experiments between our network and advanced two-stage and one-stage instance segmentation methods to demonstrate the superior performance of the proposed HFF-Net in segmenting instances. All methods employed ResNet-50 as the backbone network [51].
As shown in Table 3, compared to the classic instance segmentation network Mask R-CNN, our network achieved improvements of 4.33%, 4.1%, 2.95%, and 5.81% in BBox mAP, BBox mAP50, Mask mAP, and Mask mAP50, respectively. Additionally, even though the HTC network without a semantic segmentation branch is a strong baseline and a leading method in instance segmentation, HFF-Net still achieved significant improvements of 1.89%, 2.38%, 2.08%, and 4.93% in these four metrics.
In addition to using the standard ResNet-50 backbone, we also included experimental results for stronger backbones such as ResNet-101, ResNeXt-101, and ResNeXt-101-DCN applied to more precise two-stage instance segmentation networks. The detailed results are presented in Table 4. Compared to the baseline model, our method improved the Mask mAP50 metric by 5.6% and 2.78% when using the ResNet-101 and ResNeXt-101 backbones, respectively. Our network demonstrated significant improvements over the baseline model across different backbones, validating the effectiveness of our approach. Notably, when using the ResNeXt-101-DCN backbone, our segmentation accuracy reached 19.56% and 51.76%, the highest quantitative results among current state-of-the-art methods. These findings indicate that the proposed HFF-Net significantly outperforms various advanced algorithms in instance segmentation capability and demonstrates its broad applicability across different backbones.
In addition to quantitative results, we visualized the instance segmentation outcomes and qualitatively assessed the performance of different networks by comparing their segmentation results on the same heatmap, as shown in Figure 10. Given the high accuracy of the methods used for classifying different migration phenomena (approaching 100%), classification labels are not displayed in the visualized results. Consequently, to illustrate the segmentation outcomes, each instance is assigned a different color.
As shown in Figure 10, existing segmentation methods often suffer from issues such as segmentation gaps (yellow rectangles), aliasing masks (red rectangles), false positives (green rectangles), and under-segmentation (white rectangles). By incorporating advanced feature extraction and fusion modules, our proposed HFF-Net significantly mitigates these issues, further enhancing the quality of the segmentation results.
In Figure 10a, there are insect take-offs over a wide area around 5 a.m. which subsequently form four layers. All three instance segmentation methods can identify this initial take-off instance and the following four layers. However, the two traditional instance segmentation methods exhibit segmentation insufficiencies at the boundary between the take-off instance and the highest layer. Additionally, there are overlapping issues at the beginning of the second-highest layer. In contrast, our proposed network does not encounter these problems and accurately segments each instance.
A more complex situation is presented in Figure 10b. As previously described in Section 2, there is a distinct take-off instance around 5 a.m., followed by the formation of multiple layers after 8 a.m. The close proximity of these layers poses a significant challenge for clear segmentation. All methods, regardless of which one is used, inevitably suffer from overlapping and segmentation insufficiencies. However, our network effectively reduces these errors and excels in capturing detailed segmentation shapes. Similarly, in Figure 10c, our network also significantly reduces segmentation errors compared to traditional methods, enhancing overall performance. These cases further demonstrate the superior performance of our method in handling complex spatiotemporal distribution.

4.6. Ablation Experiments

To gain a deeper understanding of HFF-Net’s performance, we conducted extensive ablation studies to evaluate the effectiveness of its modules and the integrated model. In these experiments, we used the traditional HTC network (without the semantic segmentation branch) and ResNet-50 as baselines. We first detail the experiments for each module and then analyze how each module contributes to enhancing HFF-Net’s instance segmentation performance.

4.6.1. Experiments on GC Module

The Bottleneck Transform in the GC module compresses and restores channels to reduce parameter redundancy and balance performance. We added the GC module after the bottleneck layers in the third, fourth, and fifth stages of the backbone network and varied the bottleneck ratio to find the optimal parameters. The results in Table 5 show that reducing the ratio from 16 to 4, with only a 7M increase in parameters, leads to a significant improvement in all metrics (0.98% for BBox mAP50 and 2% for Mask mAP50). Consequently, we set the ratio to 4 in subsequent experiments. Notably, even with a ratio of 16, the network’s performance exceeded that of the baseline, underscoring the effectiveness of the GC module.
We also compared the GC module with two common attention mechanisms, Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM), by replacing the GC module with each of these alternatives. In all configurations, the ratio hyperparameter was set to 4. The results are shown in Table 6. Experimental results indicate that adding the SE and CBAM modules did not significantly improve the model’s instance segmentation performance and, in some metrics, even caused a decline. Specifically, the model with the SE module saw a decrease of 0.76% and 1.42% in BBox mAP50 and Mask mAP50, respectively, while the model with the CBAM module experienced a 3.23% drop in BBox mAP50. Furthermore, while the addition of these three modules resulted in a minimal increase in parameters compared to the baseline, the GC module significantly outperformed the other two attention mechanisms. This demonstrates that the GC module effectively models global context and establishes long-range dependencies with minimal computational overhead, thereby enhancing instance segmentation performance.

4.6.2. Experiments on SASPP Module

In this section, we analyze how the SASPP module functions within the FPN framework through ablation studies. The conventional FPN structure consists of a set of 1 × 1 convolutions for lateral connections, a set of top-down feature map aggregations, and a set of 3 × 3 convolutions. We replaced the 1 × 1 convolutions with the SASPP module, allowing it to capture contextual information from the backbone feature maps at multiple scales. To validate the effectiveness of this approach, we embedded the SASPP module at various positions within the FPN and tested its impact on instance segmentation performance. Specifically, at position 1, we replaced the 1 × 1 convolution modules that receive the backbone network input in the FPN structure with the SASPP module. At position 2, we substituted the 3 × 3 convolutions after feature map aggregation with the SASPP module. At position 3, we directly added the SASPP module at the end of the FPN structure, a common optimization technique for various tasks. The structure of the FPN and the positions of the module are illustrated in Figure 11.
We also supplemented our study with experiments using the traditional ASPP structure embedded at various positions. The results indicate that both the SASPP and ASPP modules perform best at position 1, with significant improvements of 4.19% and 2.97% in the Mask mAP50 metric, respectively. At position 3, where feature fusion structures are commonly added, both modules only showed a slight improvement. At position 2, the segmentation metrics even declined. This confirms the correctness of our choice for embedding this module. Furthermore, SASPP outperforms ASPP in terms of parameter efficiency and instance segmentation accuracy. Detailed results are presented in Table 7.
Additionally, we conducted experiments to assess the impact of different dilation rates within the SASPP module on network performance. We compared three sets of dilation rate parameters: [1,3,6], [1,6,12], and [1,3,6,12]. The results are presented in Table 8. The experimental results indicate that all three parameter sets effectively enhance network performance. Among them, the set [1,3,6] achieves the best performance across most metrics, only slightly lagging behind the set [1,3,6,12] by 0.06% in Mask mAP. Taking all factors into account, the [1,3,6] dilation rate parameters offer the optimal performance for the SASPP module.

4.6.3. Experiments on DCMF Module

The DCMF module consists of feature calibration and adaptive fusion. In this section, we first analyze the computation process of the deformable convolution in the feature calibration module. We compare two different ways to perform the offset computation: (1) by directly using the previous stage’s mask feature and (2) by using the two feature maps mentioned in the methodology of Section 3 after splicing them together to perform the offset calculation. The results are shown in Table 9. The experimental results show that using the feature map splicing method can better integrate the information of the mask features from the previous stage, thereby improving the segmentation accuracy.
We also analyze their impact on mask segmentation performance. Using the structure in Figure 12 as the baseline, we incrementally add these submodules. Table 10 shows that both submodules significantly enhance segmentation accuracy. Combined, they improve Mask mAP and Mask mAP50 from 17.09% and 44.87% to 17.34% and 46.6%, respectively.

4.6.4. Module Analysis

In the previous three sections, we analyzed the impact of adding each module individually to the baseline, demonstrating that each key module significantly enhances network performance. In this section, we present the results of progressively integrating different modules into the baseline, as shown in Table 11. The integration of these modules markedly improves the network’s performance on instance segmentation tasks. Our model, HFF-Net, incorporating all key modules, achieves substantial improvements across all four metrics compared to the baseline.

5. Discussion

Entomological radars can observe the phenomena of insects concentrated in layers, thereby enabling the analysis of insect migration behaviors. Traditional entomological radars can only roughly observe the layering phenomena of insects. In contrast, our HPAR offers high-resolution spatiotemporal measurements, allowing us to clearly observe the complex and fine-grained spatiotemporal distribution structures.
Extracting high-resolution clustered insect data is a crucial step toward systematic and detailed studies of insect migration behavior and patterns. Therefore, we propose HFF-Net to segment and extract insect data on different migration phenomena from the spatiotemporal distribution heatmaps of concentrations. Compared to traditional instance segmentation networks, our network more effectively extracts and integrates the spatiotemporal distribution features, resulting in significant improvements in segmentation performance. However, we observed a noticeable difference between the quantitative metrics mAP50 and mAP, indicating that while our instance segmentation method successfully identifies and segments most targets, it remains inadequate in segmenting the edges and fine details of instances. This limitation is also evident in the qualitative analysis results. We attribute this issue to two main factors. First, the boundaries of instances are indistinct. Despite using heatmap visualization enhancement to improve identification, some boundaries remain blurred, making accurate annotation challenging. Additionally, the sample size of our dataset is insufficient. Although we have significantly improved segmentation accuracy through dataset augmentation, the overall scale of the current dataset remains limited. Insect monitoring and migration analysis are long-term tasks, and with the continuous accumulation of data, the effectiveness of segmentation is expected to progressively improve over time.
At present, we have applied automatic segmentation technology to insect monitoring, laying the foundation for in-depth research into insect migration behaviors. However, HPAR does not possess the capability to measure multidimensional biological and behavioral parameters of individual targets, which is precisely the strength of HMFPR. Consequently, future research should integrate the spatiotemporal information obtained from instance segmentation with the insect monitoring data from HMFPR to extract individual data within insect populations for more detailed studies and analyses.
Our research demonstrates the potential of HPAR in observing the spatiotemporal distribution of insects across different migration phenomena and proves the effectiveness of deep learning techniques, particularly instance segmentation, in processing insect radar observation data. In future works, we aim to incorporate and apply more advanced technologies to further enhance the accuracy of instance segmentation and expand the possibilities for insect radar data processing and migration analysis.

Author Contributions

R.W. developed the method; J.R. wrote the manuscript; J.R., F.Z. and J.W. designed and carried out the experiments; R.W., W.L. and T.Y. reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2023YFB3906200.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chilson, P.B.; Bridge, E.; Frick, W.F.; Chapman, J.W.; Kelly, J.F. Radar Aeroecology: Exploring the Movements of Aerial Fauna through Radio-Wave Remote Sensing. Biol. Lett. 2012, 8, 698–701. [Google Scholar] [CrossRef] [PubMed]
  2. Hu, G.; Lim, K.S.; Horvitz, N.; Clark, S.J.; Reynolds, D.R.; Sapir, N.; Chapman, J.W. Mass Seasonal Bioflows of High-Flying Insect Migrants. Science 2016, 354, 1584–1587. [Google Scholar] [CrossRef] [PubMed]
  3. Drake, V.A.; Reynolds, D.R. Radar Entomology: Observing Insect Flight and Migration; Cabi: Oxford, UK, 2012; ISBN 9781845935566. [Google Scholar]
  4. Chapman, J.W.; Drake, V.A.; Reynolds, D.R. Recent Insights from Radar Studies of Insect Flight. Annu. Rev. Entomol. 2011, 56, 337–356. [Google Scholar] [CrossRef]
  5. Schaefer, G.W. Radar Studies of Locust, Moth and Butterfly Migration in the Sahara. Proc. R. Entomol. Soc. Lond. Ser. C 1969, 34, 39–40. [Google Scholar]
  6. Drake, V.A. Quantitative Observation and Analysis Procedures for a Manually Operated Entomological Radar; Commonwealth Scientific and Industrial Research Organization: Canberra, Australia, 1981. [Google Scholar]
  7. Drake, V.A.; Farrow, R. The Nocturnal Migration of the Australian Plague Locust, Chortoicetes Terminifera (Walker) (Orthoptera: Acrididae): Quantitative Radar Observations of a Series of Northward Flights. Bull. Entomol. Res. 1983, 73, 567–585. [Google Scholar] [CrossRef]
  8. Drake, V.A. Target Density Estimation in Radar Biology. J. Theor. Biol. 1981, 90, 545–571. [Google Scholar] [CrossRef]
  9. Riley, J.R.; Reynolds, D.R. Radar-Based Studies of the Migratory Flight of Grasshoppers in the Middle Niger Area of Mali. Proc. R. Soc. Lond. Ser. B Biol. Sci. 1979, 204, 67–82. [Google Scholar] [CrossRef]
  10. Smith, A.D.; Riley, J.R.; Gregory, R.D. A Method for Routine Monitoring of the Aerial Migration of Insects by Using a Vertical-Looking Radar. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1993, 340, 393–404. [Google Scholar] [CrossRef]
  11. Chapman, J.W.; Nesbit, R.L.; Burgin, L.E.; Reynolds, D.R.; Smith, A.D.; Middleton, D.R.; Hill, J.K. Flight Orientation Behaviors Promote Optimal Migration Trajectories in High-Flying Insects. Science 2010, 327, 682–685. [Google Scholar] [CrossRef]
  12. Drake, V.A. Automatically Operating Radars for Monitoring Insect Pest Migrations. Insect Sci. 2002, 9, 27–39. [Google Scholar] [CrossRef]
  13. Drake, V.A.; Wang, H.K.; Harman, I.T. Insect Monitoring Radar: Remote and Network Operation. Comput. Electron. Agric. 2002, 35, 77–94. [Google Scholar] [CrossRef]
  14. Drake, V.A.; Harman, I.T.; Bourne, I.A. Simultaneous Entomological and Atmospheric Profiling: A Novel Technique for Studying the Biometeorology of Insect Migration. In Proceedings of the 21st Conference on Agricultural and Forest Meteorology & 11th Conference on Biometeorology and Aerobiology, San Diego, CA, USA, 7–11 March 1994. [Google Scholar]
  15. Schaefer, G.W. Radar observations of insect flight. In Insect Flight; Rainey, R.C., Ed.; Blackwell Scientific: Oxford, UK, 1976; pp. 157–197. [Google Scholar]
  16. Feng, H.; Wu, K.; Ni, Y.; Cheng, D.; Guo, Y. Nocturnal Migration of Dragonflies over the Bohai Sea in Northern China. Ecol. Entomol. 2006, 31, 511–520. [Google Scholar] [CrossRef]
  17. Wood, C.R.; Clark, S.J.; Barlow, J.F.; Chapman, J.W. Layers of Nocturnal Insect Migrants at High-Altitude: The Influence of Atmospheric Conditions on Their Formation. Agric. For. Entomol. 2010, 12, 113–121. [Google Scholar] [CrossRef]
  18. Drake, V.A.; Wang, H. Ascent and Descent Rates of High-Flying Insect Migrants Determined with a Non-Coherent Vertical-Beam Entomological Radar. Int. J. Remote Sens. 2018, 40, 883–904. [Google Scholar] [CrossRef]
  19. Drake, V.A.; Hatty, S.; Symons, C.; Wang, H. Insect Monitoring Radar: Maximizing Performance and Utility. Remote Sens. 2020, 12, 596. [Google Scholar] [CrossRef]
  20. Long, T.; Hu, C.; Wang, R.; Zhang, T.; Kong, S.; Li, W.; Cai, J.; Tian, W.; Zeng, T. Entomological Radar Overview: System and Signal Processing. IEEE Aerosp. Electron. Syst. Mag. 2020, 35, 20–32. [Google Scholar] [CrossRef]
  21. Hu, C.; Yan, Y.; Wang, R.; Jiang, Q.; Cai, J.; Li, W. High-Resolution, Multi-Frequency and Full-Polarization Radar Database of Small and Group Targets in Clutter Environment. Sci. China Inf. Sci. 2023, 66, 227301. [Google Scholar] [CrossRef]
  22. Hu, C.; Li, W.; Wang, R.; Long, T.; Liu, C.; Alistair Drake, V. Insect Biological Parameter Estimation Based on the Invariant Target Parameters of the Scattering Matrix. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6212–6225. [Google Scholar] [CrossRef]
  23. Hu, C.; Zhang, F.; Li, W.; Wang, R.; Yu, T. Estimating Insect Body Size from Radar Observations Using Feature Selection and Machine Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5120511. [Google Scholar] [CrossRef]
  24. Jiang, Q.; Wang, R.; Zhang, J.; Zhang, R.; Li, Y.; Hu, C. A Multisubobject Approach to Dynamic Formation Target Tracking Using Random Matrices. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7334–7351. [Google Scholar] [CrossRef]
  25. Yan, C.; Gong, B.; Wei, Y.; Gao, Y. Deep Multi-View Enhancement Hashing for Image Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1445–1451. [Google Scholar] [CrossRef]
  26. Yan, C.; Li, Z.; Zhang, Y.; Liu, Y.; Ji, X.; Zhang, Y. Depth Image Denoising Using Nuclear Norm and Learning Graph Model. ACM Trans. Multimed. Comput. Commun. Appl. 2020, 16, 122. [Google Scholar] [CrossRef]
  27. Yan, C.; Hao, Y.; Li, L.; Yin, J.; Liu, A.; Mao, Z.; Chen, Z.; Gao, X. Task-Adaptive Attention for Image Captioning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 43–51. [Google Scholar] [CrossRef]
  28. Yan, C.; Teng, T.; Liu, Y.; Zhang, Y.; Wang, H.; Ji, X. Precise No-Reference Image Quality Evaluation Based on Distortion Identification. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 110. [Google Scholar] [CrossRef]
  29. Yan, C.; Meng, L.; Li, L.; Zhang, J.; Wang, Z.; Yin, J.; Zhang, J.; Sun, Y.; Zheng, B. Age-Invariant Face Recognition by Multi-Feature Fusion and Decomposition with Self-Attention. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 29. [Google Scholar] [CrossRef]
  30. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  31. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  32. He, K.M.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  33. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498. [Google Scholar] [CrossRef]
  34. Chen, K.; Pang, J.M.; Wang, J.Q.; Xiong, Y.; Li, X.X.; Sun, S.Y.; Feng, W.S.; Liu, Z.W.; Shi, J.P.; Ouyang, W.L.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 4969–4978. [Google Scholar]
  35. Fang, Y.; Yang, S.; Wang, X.; Li, Y.; Fang, C.; Shan, Y.; Feng, B.; Liu, W. Instances as Queries. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 6890–6899. [Google Scholar]
  36. Bolya, D.; Zhou, C.; Xiao, F.Y.; Lee, Y.J. YOLACT: Real-Time Instance Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9156–9165. [Google Scholar]
  37. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting Objects by Locations. In Proceedings of the European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; pp. 649–665. [Google Scholar]
  38. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and Fast Instance Segmentation. In Proceedings of the Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Online, 6–12 December 2020; pp. 17721–17732. [Google Scholar]
  39. Cao, Y.; Xu, J.R.; Lin, S.; Wei, F.Y.; Hu, H. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1971–1980. [Google Scholar]
  40. Chen, L.C.E.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851. [Google Scholar]
  41. Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. Graph. Gems 1994, 4, 474–485. [Google Scholar] [CrossRef]
  42. Bovik, A.C.; Clark, M.; Geisler, W.S. Multichannel Texture Analysis Using Localized Spatial Filters. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 55–73. [Google Scholar] [CrossRef]
  43. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116. [Google Scholar]
  44. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  45. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training Generative Adversarial Networks with Limited Data. In Proceedings of the Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Virtual, Vancouver, BC, Canada, 6–12 December 2020; pp. 12104–12116. [Google Scholar]
  46. Hensel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  47. Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  48. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  49. Dai, J.F.; Qi, H.Z.; Xiong, Y.W.; Li, Y.; Zhang, G.D.; Hu, H.; Wei, Y.C. Deformable Convolutional Networks. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  50. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar]
  51. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. (a) High-Resolution Phased Array Radar (HPAR); (b) schematic diagram of the statistical mode.
Figure 2. Spatiotemporal distribution heatmap cases. We have labeled the large-scale insect take-off, landing, and layering phenomena observed in the heatmap using red, white, and yellow boxes, respectively. (a) Heatmap generated from monitoring data between 4:00 and 12:00 on 19 April 2023; (b) heatmap generated from monitoring data between 17:00 on 10 August 2023, and 4:00 on 11 August 2023.
Figure 3. Heatmap visualization enhancement cases. Each column shows the processing results for one case. The first column represents the heatmap generated from monitoring data between 4:00 and 12:00 on 19 April 2023. Specifically, (a) the original heatmap, (c) the result after contrast enhancement, (e) the result after texture detail enhancement, and (g) the final result. The second column represents the heatmap generated from monitoring data between 17:00 on 10 August 2023, and 4:00 on 11 August 2023. Specifically, (b) the original heatmap, (d) the result after contrast enhancement, (f) the result after texture detail enhancement, and (h) the final result.
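The caption of Figure 3 describes the enhancement only as contrast enhancement followed by texture-detail enhancement; the cited CLAHE [41] and Gabor filtering [42] indicate one plausible realization. The OpenCV sketch below is an illustration under that assumption, applied to a grayscale rendering of a heatmap; the clip limit, tile size, and Gabor parameters are invented for the example and are not the authors' settings.

```python
import cv2
import numpy as np

def enhance_heatmap(gray: np.ndarray) -> np.ndarray:
    """Illustrative two-step enhancement of an 8-bit grayscale heatmap."""
    # Step 1: contrast enhancement with CLAHE (cf. [41]); parameters are guesses.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    contrast = clahe.apply(gray)

    # Step 2: texture-detail boost via a small Gabor filter bank (cf. [42]);
    # kernel size, sigma, wavelength, and orientations are arbitrary here.
    detail = np.zeros_like(contrast, dtype=np.float32)
    for theta in np.arange(0, np.pi, np.pi / 4):
        kernel = cv2.getGaborKernel((15, 15), sigma=3.0, theta=theta,
                                    lambd=8.0, gamma=0.5)
        detail = np.maximum(detail, cv2.filter2D(contrast, cv2.CV_32F, kernel))

    # Blend the texture response back onto the contrast-enhanced image.
    out = cv2.addWeighted(contrast.astype(np.float32), 0.8, detail, 0.2, 0.0)
    return cv2.normalize(out, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

if __name__ == "__main__":
    heatmap = cv2.imread("heatmap.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    cv2.imwrite("heatmap_enhanced.png", enhance_heatmap(heatmap))
```
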
Figure 4. Two cases of heatmaps generated using StyleGANv2. (a) A case including two instances of layering; (b) a case including one instance of a large-scale take-off and two instances of layering.
Figure 5. Overall architecture of HFF-Net.
Figure 6. The internal structure of the GC module and its integration into the backbone network.
Figure 7. Backbone network and FPN architecture with SASPP modules.
Figure 8. Internal structure of the SASPP module.
Figure 9. Internal structure of the DCMF module and its embedding process.
Figure 10. Visualization of segmentation results for three heatmap cases, with each column showing the results of different networks for the same heatmap. The first row shows the heatmap, the second row the ground-truth annotations, and the third to fifth rows the segmentation results from Mask R-CNN, HTC, and HFF-Net, respectively. Colored rectangles mark errors in the segmentation results: segmentation gaps (yellow), aliasing masks (red), false positives (green), and under-segmentation (white). (a) Monitoring data from 14 March 2023, 4:00 to 10:00; (b) monitoring data from 19 April 2023, 4:00 to 12:00; and (c) monitoring data from 2 August 2023, 4:30 to 10:00.
Figure 11. Structure of the FPN, with the candidate module-embedding positions numbered.
Figure 12. Basic internal structure of the mask head.
Table 1. Training parameter configurations for different networks.

| Model | Initial Learning Rate | Total Epochs | MultiStepLR Milestones |
|---|---|---|---|
| Default | 0.01 | 22 | 12/16/19 |
| QueryInst | 0.0001 | 30 | 20/25/28 |
| Yolact | 0.001 | 45 | 35/40/43 |
| Solo/Solov2 | 0.005 | 26 | 16/21/24 |

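For readers wishing to reproduce the schedules in Table 1, the step-decay policy maps directly onto PyTorch's MultiStepLR. The sketch below instantiates the "Default" row (initial learning rate 0.01, 22 epochs, milestones at 12/16/19); the SGD optimizer and its momentum and weight-decay values are assumptions (MMDetection's common defaults), not settings taken from this paper.

```python
import torch

# Hypothetical model stand-in; the actual networks are built with MMDetection [50].
model = torch.nn.Linear(8, 2)

# "Default" row of Table 1: initial LR 0.01, 22 epochs, milestones 12/16/19.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[12, 16, 19], gamma=0.1)

for epoch in range(22):          # total epochs from Table 1
    # ... one training epoch over the heatmap dataset would run here ...
    optimizer.step()             # placeholder update
    scheduler.step()             # decay LR by 10x at epochs 12, 16, and 19
```
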
Table 2. Instance segmentation results of Mask R-CNN on different datasets.

| Dataset | Total Epochs | MultiStepLR Milestones | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 |
|---|---|---|---|---|---|---|
| Original dataset | 34 | 24/29/32 | 11.51 | 35.96 | 13.22 | 35.75 |
| Expanded dataset (1:1) | 26 | 16/21/24 | 14.3 | 41.26 | 15.51 | 40.48 |
| Expanded dataset (1:2) | 22 | 12/16/19 | 16.31 | 46.8 | 16.22 | 43.99 |

Table 3. Instance segmentation results of different networks.

| Type | Model | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|---|
| Single-Stage | Yolact | 7.67 | 22.66 | 6.18 | 21.62 | 34.741 M |
| | Solo | - | - | 2.79 | 10.14 | 36.301 M |
| | Solov2 | - | - | 11.93 | 36.29 | 46.593 M |
| Two-Stage | QueryInst | 12.71 | 31.07 | 12.91 | 34 | 0.172 G |
| | Mask R-CNN | 16.31 | 46.8 | 16.22 | 43.99 | 43.982 M |
| | Cascade Mask R-CNN | 18.69 | 48.49 | 16.61 | 44.1 | 77.028 M |
| | HTC (baseline) | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| | HFF-Net (ours) | 20.64 | 50.9 | 19.17 | 49.8 | 93.533 M |

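The BBox/Mask mAP and mAP50 columns follow COCO-style evaluation as implemented in MMDetection [50]. A minimal pycocotools sketch of that protocol is given below; the annotation and result file names are placeholders, not files provided with this article.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground truth and detections exported by MMDetection.
coco_gt = COCO("heatmap_val_annotations.json")
coco_dt = coco_gt.loadRes("hff_net_results.segm.json")

for iou_type in ("bbox", "segm"):      # BBox mAP and Mask mAP, respectively
    evaluator = COCOeval(coco_gt, coco_dt, iou_type)
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()              # prints AP@[0.50:0.95] (mAP) and AP@0.50 (mAP50)
```
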
Table 4. Instance segmentation results of different networks on stronger backbone networks.

| Model | Backbone | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|---|
| Mask R-CNN | ResNet-101 | 17.66 | 46.68 | 16.55 | 44.46 | 62.974 M |
| Cascade Mask R-CNN | ResNet-101 | 18.9 | 48.98 | 17.31 | 45.04 | 96.021 M |
| HTC | ResNet-101 | 18.96 | 48.94 | 17.85 | 45.55 | 96.152 M |
| HFF-Net | ResNet-101 | 20.68 | 52.55 | 19.35 | 51.15 | 0.121 G |
| Mask R-CNN | ResNeXt-101 | 18.22 | 49.39 | 17.52 | 47.91 | 62.603 M |
| Cascade Mask R-CNN | ResNeXt-101 | 19.36 | 50.53 | 17.98 | 48.47 | 95.649 M |
| HTC | ResNeXt-101 | 19.35 | 50.55 | 18.3 | 48.76 | 95.781 M |
| HFF-Net | ResNeXt-101 | 21.55 | 53.72 | 19.43 | 51.54 | 0.121 G |
| HTC | ResNeXt-DCN-101 | 20.08 | 51.25 | 18.64 | 48.86 | 98.353 M |
| HFF-Net | ResNeXt-DCN-101 | 21.74 | 54.49 | 19.56 | 51.76 | 0.124 G |

Table 5. Instance segmentation results with different bottleneck ratios for the GC module.

| | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|
| Baseline | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| ratio 4 | 19.72 | 50.41 | 17.7 | 48.17 | 87.161 M |
| ratio 8 | 19.23 | 49.8 | 17.44 | 46.27 | 82.175 M |
| ratio 16 | 18.78 | 49.03 | 17.24 | 45.7 | 79.682 M |

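Table 5 varies the bottleneck ratio r of the GC module, i.e., the channel reduction C to C/r inside the two-layer transform of the global context block [39]. The PyTorch sketch below shows a generic GCNet-style block with this ratio exposed; it illustrates the cited design and is not code extracted from HFF-Net.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """GCNet-style global context block (cf. [39]); illustrative only."""
    def __init__(self, channels: int, ratio: int = 4):
        super().__init__()
        hidden = max(channels // ratio, 1)          # bottleneck width C/r (Table 5)
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global context modeling: attention-pool the feature map into a C x 1 x 1 vector.
        weights = torch.softmax(self.context_mask(x).view(b, 1, h * w), dim=-1)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2)).view(b, c, 1, 1)
        # Transform the context and fuse it by broadcast addition (GCNet's "add" fusion).
        return x + self.transform(context)

# Example: a ratio-4 block applied to a ResNet stage output.
feat = torch.randn(2, 256, 64, 64)
print(GlobalContextBlock(256, ratio=4)(feat).shape)   # torch.Size([2, 256, 64, 64])
```
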
Table 6. Instance segmentation results of different attention mechanisms.

| | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|
| Baseline | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| GC | 19.72 | 50.41 | 17.7 | 48.17 | 87.161 M |
| SE | 18.76 | 47.76 | 16.89 | 43.45 | 87.147 M |
| CBAM | 17.82 | 45.29 | 16.62 | 44.93 | 87.123 M |

Table 7. Instance segmentation results of ASPP and SASPP modules at different embedding positions within the FPN.

| Module | Position | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|---|
| Baseline | / | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| ASPP | 1 | 19.18 | 49.53 | 17.31 | 47.84 | 96.899 M |
| ASPP | 2 | 17.56 | 46.58 | 16.86 | 43.57 | 81.104 M |
| ASPP | 3 | 18.81 | 48.73 | 17.37 | 45.04 | 83.464 M |
| SASPP | 1 | 19.66 | 49.86 | 18.23 | 49.06 | 81.25 M |
| SASPP | 2 | 17.94 | 46.9 | 17.08 | 43.65 | 76.929 M |
| SASPP | 3 | 18.81 | 49.05 | 17.82 | 45.44 | 79.288 M |

Table 8. Instance segmentation results for SASPP with different dilation rate configurations.

| Dilation Rates | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|
| Baseline | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| 1, 3, 6 | 19.66 | 49.86 | 18.23 | 49.06 | 81.25 M |
| 1, 6, 12 | 19.54 | 49.43 | 17.83 | 47.94 | 81.25 M |
| 1, 3, 6, 12 | 19.12 | 49.34 | 17.66 | 47.96 | 82.54 M |

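Table 8 indicates that dilation rates (1, 3, 6) performed best. As a rough illustration of the idea behind SASPP, namely parallel depthwise separable atrous convolutions whose outputs are fused, consider the PyTorch sketch below; the branch widths, the 1x1 fusion convolution, and the omission of an image-pooling branch are assumptions, and the exact module layout is the one shown in Figure 8.

```python
import torch
import torch.nn as nn

class SeparableAtrousBranch(nn.Module):
    """Depthwise separable 3x3 convolution with a given dilation rate."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=dilation,
                                   dilation=dilation, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

class SASPPLike(nn.Module):
    """ASPP-style block with separable atrous branches; illustrative sketch only."""
    def __init__(self, channels: int = 256, rates=(1, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList([SeparableAtrousBranch(channels, r) for r in rates])
        # Fuse the concatenated branches back to the FPN channel width (assumed 1x1 conv).
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

# Example on an FPN level: 256 channels, spatial size preserved.
p3 = torch.randn(1, 256, 100, 168)
print(SASPPLike()(p3).shape)   # torch.Size([1, 256, 100, 168])
```
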
Table 9. Instance segmentation results with different offset calculation methods.

| Adaptive Fusion | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|
| Base | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| without concat | 18.74 | 48.52 | 17.20 | 45.55 | 78.81 M |
| concat | 18.73 | 48.56 | 17.26 | 45.9 | 79.244 M |

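Table 9 compares two ways of predicting the sampling offsets used by the deformable convolution during fusion: from the incoming feature alone ("without concat") or from the concatenation of the two features being fused ("concat"). The sketch below illustrates both variants with torchvision's DeformConv2d; it is a generic stand-in rather than the DCMF module itself, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableFusion(nn.Module):
    """Sketch of deformable-convolution feature fusion with two offset variants.

    concat=True predicts the offsets from the concatenation of the two mask
    features (the "concat" row of Table 9); concat=False predicts them from
    the incoming feature alone. Channel widths are illustrative assumptions.
    """
    def __init__(self, channels: int = 256, concat: bool = True):
        super().__init__()
        self.concat = concat
        offset_in = channels * 2 if concat else channels
        # A 3x3 deformable kernel needs 2 * 3 * 3 = 18 offset channels.
        self.offset_conv = nn.Conv2d(offset_in, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        offset_src = torch.cat([feat, prev], dim=1) if self.concat else feat
        offset = self.offset_conv(offset_src)
        return self.deform(feat, offset) + prev   # fuse with the earlier-stage feature

x, y = torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28)
print(DeformableFusion(concat=True)(x, y).shape)   # torch.Size([1, 256, 28, 28])
```
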
Table 10. Ablation study results for DCMF.

| Calibration | Adaptive Fusion | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|---|
| | | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| | ✓ | 18.73 | 48.56 | 17.26 | 45.9 | 79.244 M |
| ✓ | ✓ | 18.79 | 48.55 | 17.34 | 46.6 | 79.442 M |

Table 11. Ablation study results of HFF-Net.

| GC | SASPP | DCMF | BBox mAP | BBox mAP50 | Mask mAP | Mask mAP50 | Params |
|---|---|---|---|---|---|---|---|
| | | | 18.75 | 48.52 | 17.09 | 44.87 | 77.16 M |
| ✓ | | | 19.72 | 50.41 | 17.7 | 48.17 | 87.161 M |
| ✓ | ✓ | | 20.63 | 50.94 | 18.69 | 49.03 | 91.251 M |
| ✓ | ✓ | ✓ | 20.64 | 50.9 | 19.17 | 49.8 | 93.533 M |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
