Article

SPCN: An Innovative Soybean Pod Counting Network Based on HDC Strategy and Attention Mechanism

by Ximing Li, Yitao Zhuang, Jingye Li, Yue Zhang, Zhe Wang, Jiangsan Zhao, Dazhi Li and Yuefang Gao
1 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2 Key Laboratory of Smart Agricultural Technology in Tropical South China, Ministry of Agriculture and Rural Affairs, Guangzhou 510642, China
3 Department of Agricultural Technology, Norwegian Institute of Bioeconomy Research (NIBIO), P.O. Box 115, NO-1431 Ås, Norway
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(8), 1347; https://doi.org/10.3390/agriculture14081347
Submission received: 23 June 2024 / Revised: 4 August 2024 / Accepted: 10 August 2024 / Published: 12 August 2024
(This article belongs to the Special Issue Computer Vision and Artificial Intelligence in Agriculture)

Abstract
Soybean pod count is a crucial aspect of soybean plant phenotyping, offering valuable reference information for breeding and planting management. Traditional manual counting methods are not only costly but also prone to errors. Existing detection-based soybean pod counting methods face challenges due to the crowded and uneven distribution of soybean pods on the plants. To tackle this issue, we propose a Soybean Pod Counting Network (SPCN) for accurate soybean pod counting. SPCN is a density map-based architecture that uses a Hybrid Dilated Convolution (HDC) strategy and an attention mechanism for feature extraction, and an Unbalanced Optimal Transport (UOT) loss function for supervising density map generation. Additionally, we introduce a new diverse dataset, BeanCount-1500, comprising 24,684 images of 316 soybean varieties with various backgrounds and lighting conditions. Extensive experiments on BeanCount-1500 demonstrate the advantages of SPCN in soybean pod counting, with a Mean Absolute Error (MAE) of 4.37 and a Mean Squared Error (MSE) of 6.45, outperforming current competing methods by a substantial margin. Its excellent performance on the Renshou2021 dataset further confirms its outstanding generalization potential. Overall, the proposed method can provide technical support for intelligent breeding and planting management of soybean, promoting the digital and precise management of agriculture in general.

1. Introduction

Soybeans are one of the most important sources of protein in the world [1]. Consequently, accelerating the breeding and cultivation of new high-yield, high-quality, and disease-resistant varieties is essential. Previous studies have indicated that the number of pods is a crucial factor in soybean yield estimation, providing vital information for plant breeding [2,3,4]. However, traditional manual counting methods are time-consuming and labor-intensive, particularly in large-scale cultivation scenarios where the workload becomes immense and unmanageable. Moreover, manual counting is susceptible to subjective errors, undermining the accuracy of decisions and assessments. These challenges not only diminish the management efficiency of agricultural practices but also hinder scientific efforts in precision breeding [5,6,7].
Computer vision, as a non-invasive, rapid, and objective technique, holds significant promise across various domains, including soybean pod counting. Through automated image analysis, computer vision technology can effectively lower labor costs and enhance the speed and accuracy of the counting procedure. In recent years, the emergence and advancement of deep learning techniques have notably propelled their application in vision-based crop counting methods [8,9,10,11,12]. Object detection-based counting methods first identify objects in a digital image or video, usually in the form of bounding boxes, and then count the detections. This approach has been predominantly employed in recent counting studies. Riera et al. [13] introduce a deep learning-based multi-view image fusion framework tailored for processing and analyzing multi-angle videos obtained by ground robots in fields to accurately count soybean pods for yield estimation. Xu et al. [14] propose a VFNet detector-based Deformable Attention Recursive Feature Pyramid Network for Soybean Pod Counting (DARFP-SD). Mathew et al. [15] investigate the application of the object detection models YOLOv7 and YOLOv7-E6E for soybean pod counting. Xiang et al. [16] propose YOLO POD, a method based on the YOLOX framework, to achieve accurate identification and counting of soybean pods without compromising inference speed. He et al. [17] propose a YOLOv5-based soybean pod recognition and weight estimation model to accurately estimate the pod weight of a single plant. Yu et al. [3] introduce PodNet for accurately counting and localizing soybean pods with a minimized number of parameters. However, object detection-based counting methods struggle in scenarios characterized by high density and substantial occlusion. In contrast, counting objects based on a density map can circumvent the necessity of pinpointing objects’ exact locations, thereby achieving higher prediction accuracy [18,19,20].
The density map-based counting method estimates the number of objects by analyzing the density distribution within an image: the extracted features are mapped to a density map, whose integral gives the object count. This approach has been widely applied in scenarios with densely located objects, such as crowd counting [18,19,20,21] and cell counting [22,23,24]. In the context of soybean pod counting, the research scenario is plagued by challenges such as mutual occlusion and uneven density distribution of soybean pods, akin to the complexities encountered in crowd counting. This suggests that the density map-based counting method is inherently well-suited for soybean pod counting. However, the potential of density map-based soybean pod counting methods has been relatively little studied.
Therefore, this paper introduces a novel density map-based pod counting framework, named Soybean Pod Counting Network (SPCN). SPCN integrates both the attention mechanism and the hybrid dilated convolution (HDC) technique to enhance counting accuracy, effectively addressing pod occlusion by using spatial contextual information in crowded regions. Additionally, the study introduces an Unbalanced Optimal Transport (UOT) based loss function for direct density map generation to further enhance counting accuracy and system robustness. To demonstrate the effectiveness of the proposed method, it is tested not only on our large-scale soybean pod counting dataset, BeanCount-1500, but also on two other publicly available counting datasets.
The contributions of this paper are summarized as follows:
Proposed SPCN: This paper pioneers the density map-based SPCN architecture, which notably enhances the accuracy and efficiency of soybean pod counting by integrating the attention mechanism with the HDC strategy and supervising density map generation using the UOT loss function. The source code is available at https://github.com/johnhamtom/soybean_counting_SPCN (accessed on 10 August 2024).
Construction of the BeanCount-1500 dataset: We constructed the large-scale BeanCount-1500 soybean pod counting dataset, encompassing diverse lighting conditions, backgrounds, clarity levels, and occlusion levels between plants and pods, as well as instances of pod shatter, providing a valuable resource for soybean pod counting research.

2. Materials and Methods

2.1. Data Acquisition and Preprocessing

To meet the needs of the soybean breeding team at the College of Agriculture, South China Agricultural University, it is necessary to accurately count the number of pods of different soybean varieties. In collaboration with the College of Agriculture, we collected a large number of images of genetically sequenced soybean plants and developed a deep network model to achieve robust automatic soybean pod counting.
Specifically, the collected data cover 316 soybean varieties from different countries of origin. They were all planted at the planting base of South China Agricultural University in June 2019 and harvested in October of the same year for image collection. We used various mobile devices, including the HUAWEI Mate 20 (Shenzhen, Guangdong, China) and iPhone 13 (Cupertino, CA, USA), to capture images of these soybean plants. The images were taken under natural light conditions, with backgrounds set in both a clean black-curtain environment and an open, noisy environment. For each variety, we randomly selected 4–6 samples and took 10–15 images of each sample from different angles, then manually counted the number of pods on each soybean plant. After thoroughly cleaning and annotating the collected data, we created a high-quality dataset named BeanCount-1500. Figure 1 shows some example images from BeanCount-1500.
As illustrated in Figure 2a, two categories of scenes were defined during image acquisition: one features simple scenes, where soybean plants were photographed against a simple background such as a black backdrop or a clear wall; the other is characterized by complex real scenes involving authentic backgrounds such as fields or corridors. In the complex real scenes, images also vary in clarity, captured with camera jitter or defocus, as exemplified in Figure 2b. Additionally, soybean images are categorized into single-plant and multi-plant images, as shown in Figure 2c. Mutual shading caused by dense pod clusters and pod shattering (Figure 2d), together with the image diversities mentioned above, create extremely challenging conditions for accurate pod counting.
To ensure the accuracy of the data, we invited experienced technicians to annotate pods with points inside the images using the labelme tool [25] and to register the number of pods per soybean plant at the same time. For point annotation, a point is put at the center of the visible part of each pod to make sure representative features can be extracted from the pod. Images of soybean plants with no clear pod outlines are deleted during annotation. In addition, cases of pod shatter and partial pod occlusion are also included to cover the diversity of real-world scenarios. As shown in Figure 3, pods whose shape can be shown completely in the picture are labelled with the point right at the center, while pods that are slightly occluded but still show approximate outlines are still labelled with the point close to the center. The complete annotated dataset, BeanCount-1500, contains a total of 24,684 accurately labelled images of 1500 soybean plants with a total of 987,391 pods across 316 varieties. These images of soybean pods are taken under varying conditions, such as different light, background, clarity, number of plants, and pod shatter, to facilitate building a more robust soybean pod counting model.
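The annotation format is not detailed beyond the use of labelme point labels, so the following is a minimal sketch, assuming standard labelme JSON output, of how the per-pod points and the per-image pod count could be read back; the file name in the usage comment is hypothetical.

```python
import json
import numpy as np

def load_labelme_points(json_path):
    """Read one labelme annotation file and return the pod points and the count."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    # Each pod is annotated as a single point near the center of its visible part.
    points = np.asarray(
        [shape["points"][0] for shape in ann["shapes"] if shape.get("shape_type") == "point"],
        dtype=np.float32,
    )
    return points, len(points), ann["imageHeight"], ann["imageWidth"]

# Hypothetical usage:
# points, count, h, w = load_labelme_points("IMG_2262.json")
# print(f"{count} pods annotated in a {w}x{h} image")
```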
We randomly sampled 16,456 soybean pod images from the BeanCount-1500 dataset for training. A validation set comprising 4113 images was used for model selection. Another set of 4115 soybean pod images was used to evaluate the model’s final performance (Table 1).
To elucidate the characteristics of the BeanCount-1500 dataset more comprehensively, we employ three visualization techniques: three-dimensional scatter plots, box plots, and two-dimensional scatter plots. First, Figure 4a depicts a three-dimensional scatter plot of BeanCount-1500, with the x-axis, y-axis, and z-axis representing image width (w), height (h), and the number of pod labels, respectively. The lighter the color, the larger the number of pods per image. This plot illustrates the relationship between pod label count and image resolution, revealing discernible trends and distribution patterns. Figure 4b presents a box plot of the distribution of pod annotations across the dataset. It highlights that most images contain approximately 50 pod labels, alongside a notable number of images with either no pods or more than 150 pods per image. Additionally, Figure 4c shows a two-dimensional scatter plot of image resolutions categorized into three groups using K-means clustering, represented by purple, yellow, and cyan dots. The analysis indicates that 11,844 images have resolutions ranging from 512 × 512 to 1728 × 1376 (yellow dots), 9708 images have resolutions from 512 × 1129 to 1024 × 4348 (purple dots), and 3132 images have resolutions from 1690 × 512 to 3795 × 1024 (cyan dots).
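Figure 4c groups image resolutions into three clusters with K-means. Below is a minimal sketch of that grouping, assuming scikit-learn and raw width/height pairs as features; the exact preprocessing used by the authors is not specified, and the example resolutions are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_resolutions(sizes, n_clusters=3, seed=0):
    """Cluster (width, height) pairs so each image gets a resolution-group label."""
    X = np.asarray(sizes, dtype=np.float32)                  # shape: (num_images, 2)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    return kmeans.labels_, kmeans.cluster_centers_

# Hypothetical resolutions for illustration:
sizes = [(512, 512), (1728, 1376), (1024, 4348), (3795, 1024), (640, 480), (900, 2200)]
labels, centers = cluster_resolutions(sizes)
print(labels)            # cluster index per image
print(centers.round(0))  # representative width/height of each cluster
```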

2.2. Algorithm Description

This paper introduces a novel model architecture named SPCN, which is a dilated convolution network incorporating an attention module. The architecture, depicted in Figure 5, can be divided into three main modules: the front-end module, the attention module, and the back-end module.
The front-end module consists of a series of stacked convolutional and pooling layers that perform 2D feature extraction. The CBAM (Convolutional Block Attention Module) enhances the extracted features by introducing both channel attention and spatial attention mechanisms, effectively weighting and focusing on the more important features before forwarding them to the back-end network. The back-end module employs a set of dilated convolutions to expand the receptive field. Finally, the output is supervised using the UOT loss function to guide density map generation. This architecture facilitates accurate soybean pod counting by combining advanced feature extraction, attention mechanisms, and loss function supervision.

2.2.1. Front-End Module

As depicted in Figure 6, to exploit the exceptional feature extraction capabilities of the VGG19 network [26], we discard its last layers, including the final max-pooling layer, the adaptive average-pooling layer, the three fully connected layers, and the softmax layer, and treat the remaining layers as the core of the front-end module. The module then applies an up-sampling operation to increase the resolution of the feature map before passing it to the attention module. In this process, the convolutional layers are responsible for mining the deep features of the input image, while the up-sampling operation enlarges the feature map so that rich information is captured more effectively.
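As a concrete illustration, the sketch below builds such a front end from torchvision’s VGG19 (torchvision ≥ 0.13 weights API assumed) by dropping the classifier head and the final max-pooling layer and then up-sampling the feature map by a factor of two; the exact truncation point, the pretrained-weights flag, and the bilinear up-sampling mode are assumptions rather than the authors’ exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class FrontEnd(nn.Module):
    """VGG19-based feature extractor: conv/pool stack with the classifier head removed,
    followed by 2x up-sampling of the feature map."""
    def __init__(self, pretrained=True):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT if pretrained else None)
        # Keep everything in vgg.features except the final max-pooling layer.
        self.features = nn.Sequential(*list(vgg.features.children())[:-1])

    def forward(self, x):
        x = self.features(x)                     # (B, 512, H/16, W/16)
        return F.interpolate(x, scale_factor=2,  # enlarge the feature map
                             mode="bilinear", align_corners=False)

# x = torch.randn(1, 3, 512, 512); print(FrontEnd(False)(x).shape)  # -> (1, 512, 64, 64)
```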
In conclusion, the design of this front-end network structure demonstrates a high degree of flexibility and extensibility, making it easy to adjust or replace according to specific needs in different application scenarios.

2.2.2. CBAM Module

We use the CBAM attention module [27] to improve model performance. CBAM has been widely used in deep learning to strengthen feature representation, attending to both the channel and spatial information of the input feature map. The channel attention sub-module improves the attention to important features by adaptively weighting each channel, while the spatial attention sub-module weights features at different locations to improve the model’s attention to important regions. The combination of these two sub-modules enables the model to focus on more meaningful information in the input features, improving both the accuracy and the generalization ability of the derived model.
For a given input feature map, channel attention and spatial attention compute complementary attentions focusing on ‘what’ and ‘where’, respectively. Furthermore, it has been shown that a sequential arrangement gives better results than a parallel one, and that applying channel attention before spatial attention is slightly better [27]. As shown in Figure 7, the input feature map F is therefore first channel-weighted by channel attention and then spatially weighted by spatial attention.
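A compact sketch of this sequential channel-then-spatial arrangement is given below; the reduction ratio of 16 and the 7 × 7 spatial-attention kernel are the common defaults from Woo et al. [27], and the rest is illustrative rather than the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled channel descriptor
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention first, then spatial attention (sequential arrangement)."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # re-weight channels ('what')
        return x * self.sa(x)  # re-weight locations ('where')
```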

2.2.3. Back-End Module

Dilated convolution, a special convolution operation, introduces additional spacing within a conventional convolution kernel, thereby expanding the effective receptive field without inflating parameters or computational load. This technique enables the model to capture a broader range of contextual information without sacrificing resolution. However, dilated convolution’s spaced sampling of the feature map can lead to a grid effect, causing loss of feature information. Drawing inspiration from Wang et al. [28], this paper employs the HDC strategy in the back-end module, which involves a strategic combination of dilated convolutions with varying dilation rates to mitigate the grid effect and expand the receptive field.
As illustrated in Figure 8a, a 3 × 3 convolution kernel with a dilation rate of 1 is depicted in the upper left corner, while a kernel with a dilation rate of 2 is shown in the upper right corner. When processed by three convolutional kernels with a size of 3 × 3 and a dilation rate of [2,2,2], the receptive field surrounding the red pixel point in the center appears grid-like. However, with a dilation rate of [1,2,3], the receptive field is more evenly distributed.
The back-end module, depicted in Figure 8b, comprises four layers of 3 × 3 convolutional kernels. The first layer outputs 256 channels with a dilation rate of 1, the second layer outputs 128 channels with a dilation rate of 2, the third layer outputs 64 channels with a dilation rate of 3, and the final layer serves as the output layer with a single output channel, which is treated as the density map. By arranging the first three convolution layers with dilation rates [1,2,3] in this manner, the grid effect is effectively mitigated and the receptive field is significantly expanded.
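A sketch of this back end is given below, assuming the 512-channel feature map from the CBAM stage as input, ReLU activations between layers, and padding equal to the dilation rate so that the spatial size is preserved; these details are assumptions where the text is silent.

```python
import torch
import torch.nn as nn

class BackEnd(nn.Module):
    """HDC back-end: 3x3 convolutions with dilation rates [1, 2, 3] and channel
    widths [256, 128, 64], followed by a single-channel output layer interpreted
    as the density map."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1, dilation=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=3, dilation=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),               # single-channel density map
        )

    def forward(self, x):
        return torch.relu(self.body(x))  # non-negative density (an assumption)

# y = BackEnd()(torch.randn(1, 512, 64, 64)); print(y.shape)  # -> (1, 1, 64, 64)
```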

2.2.4. Loss Function

In our work, we employ the Generalized Loss function based on UOT, as proposed by Wan et al. [29], to supervise density map generation. This loss function enables the trained network to generate high-quality density maps by minimizing the transport cost between the predicted density map and the ground truth (GT) point annotations. Additionally, L1 and L2 losses are used to measure the difference between the predicted density map and the GT point annotations, as illustrated in Figure 9. This approach ensures that the trained network produces high-quality density maps with desirable characteristics, such as sparser densities and more precise localization.
In this paper, $\mathbf{a} = \{a_i\}_i$ and $\mathbf{b} = \{b_j\}_j$ are defined as the predicted density map and the GT point map, respectively. The Generalized Loss is given in Equation (1):

$$\mathcal{L}_{\mathbf{C}}^{\tau}(\mathbf{a}, \mathbf{b}) = \min_{\mathbf{P} \in \mathbb{R}_{+}^{n \times m}} \langle \mathbf{C}, \mathbf{P} \rangle - \varepsilon H(\mathbf{P}) + \tau D_1(\mathbf{P}\mathbf{1}_m \,\|\, \mathbf{a}) + \tau D_2(\mathbf{P}^{T}\mathbf{1}_n \,\|\, \mathbf{b}) \tag{1}$$
The loss function can be divided into four parts. The first part is $\min_{\mathbf{P} \in \mathbb{R}_{+}^{n \times m}} \langle \mathbf{C}, \mathbf{P} \rangle$, where $\mathbf{C} \in \mathbb{R}_{+}^{n \times m}$ is the transport cost matrix with entries $C_{ij} = C(x_i, y_j) = \exp\!\big(\tfrac{1}{\eta(x_i, y_j)} \| x_i - y_j \|_2\big)$, the cost of transporting the density at pixel $x_i$ to the GT point coordinate $y_j$. Here $\eta(x_i, y_j)$ is an adaptive perspective factor derived from $\tfrac{1}{2}(h_{x_i} + h_{y_j})$, which makes the generated density of distant pods sparser for those images in BeanCount-1500 with perspective effects. $\mathbf{P}$ is the transport matrix, whose optimal solution is unique [30]; it determines the optimal transfer scheme between density map pixels and GT points that minimizes the transport cost. Minimizing this cost encourages the predicted density to concentrate in the vicinity of the annotations, pushing the predicted density towards the GT point annotations during training.
The second part, $\varepsilon H(\mathbf{P})$ with $H(\mathbf{P}) = -\sum_{i,j} P_{ij} \log P_{ij}$, is the entropy regularization term. It measures how spread out the transport plan is, discouraging ineffective transport and ultimately bringing the predicted density values closer to the GT points; the larger $\varepsilon$, the less compact the density map predicted by the network model.
The third part, $D_1(\mathbf{P}\mathbf{1}_m \,\|\, \mathbf{a})$, is the pixel-wise loss between the predicted density map $\mathbf{a}$ and the intermediate density map $\hat{\mathbf{a}} = \mathbf{P}\mathbf{1}_m$ constructed from the transport matrix $\mathbf{P}$ and the GT point coordinates. In this paper, the L2 loss is used to measure the difference between them (see Equation (2)). This term penalizes any predicted density that is not associated with an annotation.

$$D_1(\mathbf{P}\mathbf{1}_m \,\|\, \mathbf{a}) = \left\| \mathbf{P}\mathbf{1}_m - \mathbf{a} \right\|_2^2 \tag{2}$$
The fourth part, $D_2(\mathbf{P}^{T}\mathbf{1}_n \,\|\, \mathbf{b})$, is the point-wise loss between the intermediate annotation vector $\hat{\mathbf{b}} = \mathbf{P}^{T}\mathbf{1}_n$, constructed from the transport matrix $\mathbf{P}$ and the predicted density map, and the GT point annotations $\mathbf{b}$. In this paper, the L1 loss is used to measure the gap between them (see Equation (3)). This term ensures that all GT point annotations are taken into account, i.e., used in the transport scheme.

$$D_2(\mathbf{P}^{T}\mathbf{1}_n \,\|\, \mathbf{b}) = \left\| \mathbf{P}^{T}\mathbf{1}_n - \mathbf{b} \right\|_1 \tag{3}$$
The network trained under the Generalized Loss outputs a predicted density map, and the predicted pod count is obtained by summing its pixel values (see Equation (4)), where $D_{est}(x)$ is the predicted density map and $W$ and $L$ are its width and height, respectively.

$$Count = \sum_{w=1}^{W} \sum_{l=1}^{L} D_{est}(x) \tag{4}$$
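The inner minimization over $\mathbf{P}$ in Equation (1) requires an unbalanced optimal transport solver (see [29,30]) and is not reproduced here. The sketch below only evaluates the four terms of Equation (1) for a given transport plan and computes the count of Equation (4); the `eps` and `tau` defaults are arbitrary illustration values, not the paper’s settings.

```python
import torch

def generalized_loss_terms(P, C, a, b, eps=0.01, tau=0.1):
    """Evaluate Equation (1) for a given transport plan.
    P: (n, m) transport plan from an unbalanced OT solver (assumed given),
    C: (n, m) transport cost matrix, a: flattened predicted density (n,),
    b: GT point masses (m,), typically all ones."""
    transport_cost = (C * P).sum()                    # <C, P>
    entropy = -(P * P.clamp_min(1e-12).log()).sum()   # H(P)
    d1 = ((P.sum(dim=1) - a) ** 2).sum()              # Equation (2): L2 marginal penalty
    d2 = (P.sum(dim=0) - b).abs().sum()               # Equation (3): L1 marginal penalty
    return transport_cost - eps * entropy + tau * d1 + tau * d2

def predicted_count(density_map):
    """Equation (4): the predicted pod count is the sum of the density map."""
    return density_map.sum(dim=(-2, -1))
```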

2.3. Algorithm Implementation, Parameter Setting, and Evaluation Metrics

In summary, the feature extraction pipeline of SPCN can be outlined in three key steps: first, the input image undergoes feature extraction and spatial up-sampling in the front-end network; next, the representation power of the extracted features is improved by the CBAM module; finally, the receptive field is expanded through dilated convolution in the back-end network to extract more contextual information. The resulting output density map is then used for counting.
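Putting the three stages together, a minimal end-to-end sketch is shown below; it reuses the FrontEnd, CBAM, and BackEnd sketches from Section 2.2 (assumed to be in scope) and is illustrative wiring rather than the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class SPCNSketch(nn.Module):
    """Front-end (truncated VGG19 + up-sampling) -> CBAM -> dilated back-end."""
    def __init__(self):
        super().__init__()
        self.front_end = FrontEnd()               # Section 2.2.1 sketch
        self.cbam = CBAM(channels=512)            # Section 2.2.2 sketch
        self.back_end = BackEnd(in_channels=512)  # Section 2.2.3 sketch

    def forward(self, x):
        feat = self.front_end(x)    # feature extraction + spatial up-sampling
        feat = self.cbam(feat)      # channel and spatial re-weighting
        return self.back_end(feat)  # (B, 1, h, w) predicted density map

# model = SPCNSketch()
# density = model(torch.randn(1, 3, 512, 512))
# print(density.shape, density.sum().item())  # the sum is the predicted count
```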
In the model, the SPCN architecture is initially configured and trained using a specified number of epochs, datasets, and batch sizes within the training module. Subsequently, once the target image completes the feature extraction process, the resulting output density map is passed to the generalized loss function for density estimation and counting. The pseudo-code outlining this process is presented in Algorithm 1.
Algorithm 1: Training Module
Input: w × h × 3 image from the soybean dataset
Output: 1-channel output density map
model = SPCN()   // initialize the model as SPCN
Train(epochs):
 model.train(dataset, batch_size)
 // Feature extraction
 for x in dataset:
  x = FrontEnd(x)       // feat = [64×2, M, 128×2, M, 256×4, M, 512×4, M, 512×4] (M = max pooling)
  x = CBAM(x)           // feat = [512]
  x = BackEnd(x)        // feat = [256, 128, 64]
  x = output_layer(x)   // channels = 1
  return x
 // Loss function computation
 loss = GeneralizedLoss(model output, GT point annotations)
 // Evaluate and print MAE, MSE
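A PyTorch-style rendering of Algorithm 1 is sketched below, using the optimizer settings reported in Section 3.1 (Adam, learning rate 1 × 10⁻⁵, weight decay 1 × 10⁻⁴, batch size 1). The data loader and the GeneralizedLoss criterion are placeholders (the latter is assumed to come from the authors’ repository or an equivalent implementation), so this is an illustrative sketch rather than the exact training code.

```python
import torch

def train_spcn(model, train_loader, criterion, epochs=100, device="cuda"):
    """Minimal training loop mirroring Algorithm 1.
    criterion(density, gt_points) is assumed to implement the generalized (UOT) loss."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-4)
    for epoch in range(epochs):
        abs_err, sq_err, n = 0.0, 0.0, 0
        for image, gt_points in train_loader:     # batch size 1
            image = image.to(device)
            density = model(image)                # (1, 1, h, w) density map
            loss = criterion(density, gt_points)  # supervised by GT point annotations
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            pred = density.sum().item()           # predicted count (Equation (4))
            gt = float(gt_points.shape[1])        # GT count; assumes gt_points is (1, N, 2)
            abs_err += abs(pred - gt)
            sq_err += (pred - gt) ** 2
            n += 1
        print(f"epoch {epoch}: MAE={abs_err / n:.2f}, MSE={(sq_err / n) ** 0.5:.2f}")
```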
The code was built and the experiments were performed in the runtime environment shown in Table 2.
To assess the model’s performance, we employed two evaluation metrics, Mean Absolute Error (MAE) and Mean Squared Error (MSE), on the test dataset. These metrics gauge the model’s accuracy and robustness, respectively, offering valuable insights for further refinement. The formulas for MAE and MSE are as follows, where $y_i$ is the true number of pods in the $i$-th image and $\tilde{y}_i$ is the predicted number of pods in the $i$-th image:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \tilde{y}_i - y_i \right|$$

$$\mathrm{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \tilde{y}_i - y_i \right)^2}$$

3. Results

3.1. Training Details

The model implemented in this paper is built upon the PyTorch framework. To enhance training efficacy, we employed the Adam optimizer with an initial learning rate set to 1 × 10−5, a weight decay of 0.0001, and a momentum parameter of 0.9.
During the training process, we used a batch size of 1 and conducted 100 training epochs for soybean pod counting. We set validation to start at epoch 10 and to occur every 2 epochs thereafter. Despite the extensive size of the BeanCount-1500 dataset, the SPCN sufficiently converged after 100 training epochs, as depicted in Figure 10. Additionally, the MAE and MSE on the training and validation sets gradually converge during the training process, as shown in Figure 11.

3.2. Comparative Experimental Results

Table 3 shows the counting performance of three models on our BeanCount-1500 dataset: the original GL [29], GL with the feature extraction module from CSRNet [19], and SPCN. GL uses VGG19 as its feature extraction network, whereas the feature extraction module of CSRNet is composed of multiple stacked dilated convolutions; both use the Generalized Loss as the loss function.
As can be seen from Table 3, SPCN achieves the best performance with the lowest MAE of 4.37 and MSE of 6.45. GL (CSRNet) performs worst, which is due to the gridding issue caused by stacking a large number of dilated convolutions with a fixed dilation rate, resulting in a substantial loss of feature information.
Three randomly selected images of soybean pods with varying densities are presented in Figure 12, where “Origin” denotes the input image and “GL”, “GL (CSRNet)”, and “SPCN” denote the different models. Upon visual inspection, it is evident that the predicted counts and density maps generated by the proposed SPCN align better with the point labels than those of the other two models. One notable strength of the SPCN model is demonstrated by the predictions inside the yellow circles. For instance, in IMG_2262, the SPCN model successfully detects a pod in the yellow circle that closely resembles a branch, which neither GL nor GL (CSRNet) manages. Similarly, in IMG_10481, where numerous pods overlap with leaves in the circled region, the SPCN model exhibits superior detection capability compared to its counterparts. In IMG_10726, the region highlighted by the yellow circle, with overlapping pods and yellow leaves, poses a considerable challenge to most counting models: GL suffers a high misdetection rate and GL (CSRNet) generates a blurred density map, whereas the proposed SPCN model successfully generates a high-quality density map.
Another set of experiments compares SPCN with different crowd counting models, including the classical models CSRNet [19] and HRNet [31] as well as models with different loss functions, namely BL [20], DM-Count [32], CCTrans [33], and BCCMA [21]; the results are shown in Table 4. We used the default optimal parameters of these open-source models in our experiments. As can be seen in Table 4, all compared models perform worse than the proposed SPCN in soybean pod counting, highlighting the superiority of SPCN.

3.3. Analysis of Ablation Experiments

To further analyze the contribution of each component to the model’s performance, we conducted ablation studies on the different modules of SPCN.
Compared with models using the same Generalized Loss (GL) function but different feature extraction modules, SPCN outperforms the GL (VGG19), GL (HRNet), and GL (CSRNet) models.
In addition, the CSRNet version D used in this study has a back end comprising six convolutional layers with a kernel size of 3 × 3 and a dilation rate of 4. To validate the effectiveness of the HDC strategy, the dilation rates of these six 3 × 3 convolutional kernels are adjusted from [4,4,4,4,4,4] to [1,2,3,5,7,7], which mitigates grid effects while preserving a comparable receptive field size; this variant is denoted CSRNetHDC. The lower MAE and MSE obtained after this adjustment demonstrate the effectiveness of the HDC strategy. The models employed and the corresponding experimental results are presented in Table 5.
As illustrated in Table 6, the MAE and MSE of the GL model in soybean pod counting are 4.89 and 7.23, respectively. Integrating CBAM reduces the MAE and MSE by 0.38 and 0.50, respectively. Incorporating both CBAM and the HDC design achieves the best performance, with an MAE of 4.37 and an MSE of 6.45, which substantiates the efficacy of the SPCN design.
Finally, we conducted ablation studies on the HDC strategy by comparing three dilation-rate arrangements for the back-end module. Strategy 1 sets all dilation rates to 2 ([2,2,2]), yielding a receptive field similar to that of the original back-end module but introducing the grid effect. Strategy 2 uses a fixed dilation rate of 3 ([3,3,3]), which yields a larger receptive field than Strategy 3 but, compared with the back-end module in SPCN, also suffers from the grid effect. Table 7 demonstrates that the arrangement of dilated convolution layers within the back-end module of SPCN (Strategy 3, [1,2,3]) circumvents the grid effect and achieves the highest counting accuracy.
To verify the generalization ability of the SPCN model in different soybean pod counting scenarios, we trained it on our proposed BeanCount-1500 dataset and tested it on the open-source soybean pod counting datasets of [16]. The first is the Chongzhou dataset, composed of 570 images captured by a Canon 700D with a resolution of 4752 × 3168 pixels at the Chongzhou Experimental Base of Sichuan Agricultural University in 2021. The second is the Renshou2021 dataset, comprising 878 images taken by a Canon 750D with a resolution of 5184 × 2196 pixels at Sichuan Agricultural University’s Renshou Farm in 2021.
In this paper, we used the training set of BeanCount-1500 to train the SPCN model, and the 878 images of the Renshou2021 dataset were used as the test set, maintaining the same test benchmark as the YOLO POD model [16] and the PodNet model [3]. Except for SPCN, all models were trained on the Chongzhou dataset. As can be seen from Table 8, the SPCN model shows strong generalization capability, with an MAE of 4.43, only 0.25 away from the best MAE, and an RMSE of 6.18, the best among the compared models. Figure 13 shows four example test results; the SPCN model is able to accurately capture pod features and provide excellent counting accuracy. Although SPCN’s MAE on the Renshou2021 dataset is slightly higher than that of YOLO POD, it performs better in terms of RMSE.

4. Discussion

Compared with mainstream methods [19,20,21,31,32,33], SPCN achieved an MAE of 4.37 and an MSE of 6.45 on the BeanCount-1500 dataset (see Table 4 for details). In the generalization experiment on the Renshou2021 dataset, SPCN’s MAE and RMSE were 4.43 and 6.18, respectively; compared with PodNet [3], the RMSE of SPCN was lower by 1.47, demonstrating its significant generalization potential in different scenarios.
Both the attention mechanism and the HDC strategy are incorporated in SPCN. The attention mechanism enables the model to dynamically focus on important areas of the image, improving the efficiency of feature extraction. The HDC strategy increases the receptive field while maintaining computational efficiency by using convolution kernels with different dilation rates. This enlarged receptive field is particularly important for processing soybean plant images with complex backgrounds and diverse features, as it helps the model distinguish between different soybean pods more effectively. Additionally, SPCN uses the UOT loss function to supervise the generation of density maps; this function efficiently evaluates the minimum transport cost between the target distribution and the predicted distribution, resulting in more accurate and refined density maps.
To evaluate algorithm performance, we constructed the BeanCount-1500 dataset, which includes a total of 24,684 accurately annotated images of 1500 soybean plants, covering different plant morphologies and growth environments under various weather conditions and light intensities. However, in complex situations where pods are occluded by leaves and neighboring pods, the counting performance of SPCN decreases to a certain extent. With only a consumer-level NVIDIA GeForce RTX GPU (Santa Clara, CA, USA), an image can be processed within 600 ms on average. We have also deployed SPCN in the WeChat Mini Program of the College of Agriculture, South China Agricultural University, for easy soybean plant phenotyping.
Therefore, there is still room for improvement in the following directions:
1. Introduce advanced data augmentation methods to cope with the challenges of complex scenarios.
2. Further improve the loss function to generate finer and more dispersed density maps, retaining the model’s excellent counting performance in high-density scenes while providing more precise coordinate information for individual soybean pods.
3. Density map-based counting methods hold great promise for pod counting, especially for dense counting tasks. Future research could explore ways to simplify such models, for example by integrating techniques such as knowledge distillation, to develop lighter models without compromising counting accuracy and thus better meet the needs of practical agricultural applications.
Through these efforts, new research directions are expected to advance pod counting technology and bring higher practical value to the agricultural sector.

5. Conclusions

Soybean pod counting, a pivotal technique in precision agriculture, holds considerable practical significance yet encounters several challenges, including mutual occlusion and the absence of dedicated large datasets. To tackle these challenges, this study introduces an innovative approach, the density map based SPCN model, which integrates the HDC strategy as well as CBAM for feature extraction and UOT loss function for supervising density map generation to achieve highly accurate soybean pod counting. Additionally, we curated BeanCount-1500, a comprehensive dataset dedicated to soybean pod counting encompassing images of different soybean varieties with diverse backgrounds, lighting conditions, and varying numbers of plants, thereby providing a valuable resource for research in this domain.
Through extensive experiments on the BeanCount-1500 dataset, the SPCN model showcases superior counting performance over existing mainstream counting methods, achieving an MAE and MSE of 4.37 and 6.45, respectively. Detailed ablation experiments further affirm the effectiveness of the HDC strategy in soybean pod counting. Furthermore, the SPCN model trained on BeanCount-1500 achieved a high counting accuracy when tested on the Renshou2021 dataset, demonstrating its excellent generalization capability, achieving an MAE and RMSE of 4.43 and 6.18, respectively. This indirectly underscores the tremendous diversity covered by the BeanCount-1500 dataset, validating its valuable potential for future model development.
We anticipate that this study will make significant contributions to the advancement of precision agriculture and the development of smart counting systems. Furthermore, we aim to extend the BeanCount-1500 dataset in future research endeavors and to explore the feasibility of deploying the model on lightweight mobile devices. By doing so, we aim to preserve the benefits of density map-based counting methods in addressing dense scene counting while satisfying the demand for real-time counting in agricultural fields. Ultimately, this initiative will facilitate the seamless translation of this technology into practical applications, thereby enhancing efficiency and productivity in agricultural practices.

Author Contributions

Conceptualization, X.L., Y.Z. (Yitao Zhuang) and Y.G.; Data curation, J.L. and D.L.; Formal analysis, Y.Z. (Yue Zhang); Funding acquisition, X.L.; Investigation, X.L., D.L. and Y.G.; Methodology, X.L., Y.Z. (Yitao Zhuang), J.L., Y.Z. (Yue Zhang), Z.W. and Y.G.; Project administration, X.L. and Y.G.; Resources, J.L. and Y.G.; Software, Y.Z. (Yitao Zhuang); Supervision, X.L.; Validation, Y.Z. (Yue Zhang), Z.W., J.Z. and Y.G.; Writing—original draft, Y.Z. (Yitao Zhuang) and J.Z.; Writing—review and editing, Y.Z. (Yitao Zhuang), Z.W. and Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Key Research and Development Program of Guangzhou (No. 2024B03J1358).

Data Availability Statement

The original findings detailed in this study are contained within the article. For additional inquiries, please contact the corresponding authors directly. We will open source the code at https://github.com/johnhamtom/soybean_counting_SPCN (accessed on 10 August 2024).

Acknowledgments

Thanks to everyone who participated in photographing and labeling soybean plants.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pojić, M.; Mišan, A.; Tiwari, B. Eco-Innovative Technologies for Extraction of Proteins for Human Consumption from Renewable Protein Sources of Plant Origin. Trends Food Sci. Technol. 2018, 75, 93–104. [Google Scholar] [CrossRef]
  2. Wei, M.C.F.; Molin, J.P. Soybean Yield Estimation and Its Components: A Linear Regression Approach. Agriculture 2020, 10, 348. [Google Scholar] [CrossRef]
  3. Yu, Z.; Wang, Y.; Ye, J.; Liufu, S.; Lu, D.; Zhu, X.; Yang, Z.; Tan, Q. Accurate and Fast Implementation of Soybean Pod Counting and Localization from High-Resolution Image. Front. Plant Sci. 2024, 15, 1320109. [Google Scholar] [CrossRef] [PubMed]
  4. He, H.; Ma, X.; Guan, H. A Calculation Method of Phenotypic Traits of Soybean Pods Based on Image Processing Technology. Ecol. Inform. 2022, 69, 101676. [Google Scholar] [CrossRef]
  5. Li, Y.; Jia, J.; Zhang, L.; Khattak, A.M.; Sun, S.; Gao, W.; Wang, M. Soybean Seed Counting Based on Pod Image Using Two-Column Convolution Neural Network. IEEE Access 2019, 7, 64177–64185. [Google Scholar] [CrossRef]
  6. Zhang, C.; Lu, X.; Ma, H.; Hu, Y.; Zhang, S.; Ning, X.; Hu, J.; Jiao, J. High-Throughput Classification and Counting of Vegetable Soybean Pods Based on Deep Learning. Agronomy 2023, 13, 1154. [Google Scholar] [CrossRef]
  7. Zhao, J.; Kaga, A.; Yamada, T.; Komatsu, K.; Hirata, K.; Kikuchi, A.; Hirafuji, M.; Ninomiya, S.; Guo, W. Improved Field-Based Soybean Seed Counting and Localization with Feature Level Considered. Plant Phenomics 2023, 5, 0026. [Google Scholar] [CrossRef] [PubMed]
  8. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  9. Xu, D.; Chen, J.; Li, B.; Ma, J. Improving Lettuce Fresh Weight Estimation Accuracy through RGB-D Fusion. Agronomy 2023, 13, 2617. [Google Scholar] [CrossRef]
  10. Turečková, A.; Tureček, T.; Janků, P.; Vařacha, P.; Šenkeřík, R.; Jašek, R.; Psota, V.; Štěpánek, V.; Komínková Oplatková, Z. Slicing Aided Large Scale Tomato Fruit Detection and Counting in 360-Degree Video Data from a Greenhouse. Measurement 2022, 204, 111977. [Google Scholar] [CrossRef]
  11. Khaki, S.; Pham, H.; Han, Y.; Kuhl, A.; Kent, W.; Wang, L. DeepCorn: A Semi-Supervised Deep Learning Method for High-Throughput Image-Based Corn Kernel Counting and Yield Estimation. Knowl.-Based Syst. 2021, 218, 106874. [Google Scholar] [CrossRef]
  12. Maji, A.K.; Marwaha, S.; Kumar, S.; Arora, A.; Chinnusamy, V.; Islam, S. SlypNet: Spikelet-Based Yield Prediction of Wheat Using Advanced Plant Phenotyping and Computer Vision Techniques. Front. Plant Sci. 2022, 13, 889853. [Google Scholar] [CrossRef]
  13. Riera, L.G.; Carroll, M.E.; Zhang, Z.; Shook, J.M.; Ghosal, S.; Gao, T.; Singh, A.; Bhattacharya, S.; Ganapathysubramanian, B.; Singh, A.K.; et al. Deep Multiview Image Fusion for Soybean Yield Estimation in Breeding Applications. Plant Phenomics 2021, 2021, 9846470. [Google Scholar] [CrossRef] [PubMed]
  14. Xu, C.; Lu, Y.; Jiang, H.; Liu, S.; Ma, Y.; Zhao, T. Counting Crowded Soybean Pods Based on Deformable Attention Recursive Feature Pyramid. Agronomy 2023, 13, 1507. [Google Scholar] [CrossRef]
  15. Mathew, J.; Delavarpour, N.; Miranda, C.; Stenger, J.; Zhang, Z.; Aduteye, J.; Flores, P. A Novel Approach to Pod Count Estimation Using a Depth Camera in Support of Soybean Breeding Applications. Sensors 2023, 23, 6506. [Google Scholar] [CrossRef] [PubMed]
  16. Xiang, S.; Wang, S.; Xu, M.; Wang, W.; Liu, W. YOLO POD: A Fast and Accurate Multi-Task Model for Dense Soybean Pod Counting. Plant Methods 2023, 19, 8. [Google Scholar] [CrossRef]
  17. He, H.; Ma, X.; Guan, H.; Wang, F.; Shen, P. Recognition of Soybean Pods and Yield Prediction Based on Improved Deep Learning Model. Front. Plant Sci. 2023, 13, 1096619. [Google Scholar] [CrossRef] [PubMed]
  18. Kang, D.; Ma, Z.; Chan, A.B. Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks—Counting, Detection, and Tracking. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 1408–1422. [Google Scholar] [CrossRef]
  19. Li, Y.; Zhang, X.; Chen, D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 1091–1100. [Google Scholar]
  20. Ma, Z.; Wei, X.; Hong, X.; Gong, Y. Bayesian Loss for Crowd Count Estimation with Point Supervision. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 6141–6150. [Google Scholar]
  21. Lin, H.; Ma, Z.; Ji, R.; Wang, Y.; Hong, X. Boosting Crowd Counting via Multifaceted Attention. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 19596–19605. [Google Scholar]
  22. Jiang, N.; Yu, F. A Cell Counting Framework Based on Random Forest and Density Map. Appl. Sci. 2020, 10, 8346. [Google Scholar] [CrossRef]
  23. Jiang, N.; Yu, F. A Two-Path Network for Cell Counting. IEEE Access 2021, 9, 70806–70815. [Google Scholar] [CrossRef]
  24. He, S.; Minn, K.T.; Solnica-Krezel, L.; Anastasio, M.A.; Li, H. Deeply-Supervised Density Regression for Automatic Cell Counting in Microscopy Images. Med. Image Anal. 2021, 68, 101892. [Google Scholar] [CrossRef] [PubMed]
  25. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [Google Scholar] [CrossRef]
  26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7 May 2015. [Google Scholar]
  27. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. ISBN 978-3-030-01233-5. [Google Scholar]
  28. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; IEEE: New York, NY, USA, 2018; pp. 1451–1460. [Google Scholar]
  29. Wan, J.; Liu, Z.; Chan, A.B. A Generalized Loss Function for Crowd Counting and Localization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 1974–1983. [Google Scholar]
  30. Peyré, G.; Cuturi, M. Computational Optimal Transport. Found. Trends® Mach. Learn. 2019, 11, 355–607. [Google Scholar] [CrossRef]
  31. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. arXiv 2019, arXiv:1902.09212. [Google Scholar]
  32. Wang, B.; Liu, H.; Samaras, D.; Hoai, M. Distribution Matching for Crowd Counting. In Proceedings of the NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; Volume 135, pp. 1595–1607. [Google Scholar]
  33. Tian, Y.; Chu, X.; Wang, H. CCTrans: Simplifying and Improving Crowd Counting with Transformer. arXiv 2021, arXiv:2109.14483. [Google Scholar]
  34. Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics (Version 8.0.0) [Computer Software]. Available online: https://github.com/ultralytics/ultralytics (accessed on 13 March 2024).
  35. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 6568–6577. [Google Scholar]
  36. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  37. Yu, Z.; Ye, J.; Li, C.; Zhou, H.; Li, X. TasselLFANet: A Novel Lightweight Multi-Branch Feature Aggregation Neural Network for High-Throughput Image-Based Maize Tassels Detection and Counting. Front. Plant Sci. 2023, 14, 1158940. [Google Scholar] [CrossRef]
  38. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 10778–10787. [Google Scholar]
  39. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 9626–9635. [Google Scholar]
Figure 1. Sample images of BeanCount-1500.
Figure 2. Characteristics of the BeanCount-1500. (a) Different illuminations and backgrounds. (b) Diverse levels of clarity. (c) Diverse number of plants in each picture. (d) Shattering.
Figure 3. Demonstration of the characteristics of the labelling process, where the red dots represent the labeled points.
Figure 4. Distribution of BeanCount-1500. (a) Scatter plot of Resolution and soybean pod count distribution. (b) Box plot of soybean pod number distribution. (c) Scatter plot of resolution distribution. Note: For (a), the xyz-axis distribution represents the image height, width, and number of annotations. The lighter the color, the more annotations. For (c), we use the K-means clustering algorithm to classify the resolutions of different images, represented by different colors.
Figure 5. Structure of the SPCN.
Figure 6. Illustration of the Front-End module.
Figure 7. The structure of CBAM.
Figure 8. Diagrammatic representation of the Back-End module. (a) Schematic of the receptive field. (b) Back-end module diagram. Note: For (a), the more cells a color covers, the larger its receptive field.
Figure 9. Schematic diagram of the loss function.
Figure 10. Decreasing training loss graph.
Figure 11. Decreasing training and validation MAE and MSE graph. (a) Decreasing MAE. (b) Decreasing MSE. Note: The blue line shows the convergence of the metric during training, whereas the red line shows the convergence of the metric during validation.
Figure 12. Visualization of model predictions. The first column shows the input image with the actual number of pods in the upper right corner. The second to fourth columns display the density maps predicted by different models with the predicted number of pods in the upper right corner. The fifth column shows the distribution of annotation points. Yellow circles highlight areas where the SPCN model performs better.
Figure 13. Prediction results on the Renshou2021 dataset. The first and third columns represent the input images, with the red font above them indicating the actual number of pods. The second and fourth columns represent the density maps predicted by SPCN, with the red font above them indicating the predicted number of pods.
Table 1. The details of BeanCount-1500.
| Subset | Number of Pictures | Total Number of Labels | Median Number of Labels | Average Number of Labels |
|---|---|---|---|---|
| Training set | 16,456 | 658,033 | 36 | 39.987 |
| Validation set | 4113 | 164,341 | 36 | 40.121 |
| Test set | 4115 | 165,017 | 36 | 39.937 |
Table 2. Operating environment table.
| Software and Drivers | Hardware |
|---|---|
| System: Windows 10 | CPU: Intel(R) 12th Gen Core i7-12700KF, 3.6 GHz |
| PyTorch: 1.10.1 | GPU: GeForce RTX 3060 Ti |
| CUDA: 11.3 + cuDNN: 8.2 | Memory: 16 GB |
| Python: 3.8.15 | Disk: 3 TB |
Table 3. Comparison of soybean pod counting experiments. Bold numbers indicate the best-performing metrics.
| Method | MAE | MSE |
|---|---|---|
| GL | 4.89 | 7.23 |
| GL (CSRNet) | 5.06 | 7.45 |
| Ours (SPCN) | **4.37** | **6.45** |
Table 4. Experimental comparison with mainstream counting models. Bold numbers indicate the best-performing metrics.
| Method | MAE | MSE |
|---|---|---|
| CSRNet | 16.29 | 21.89 |
| BL | 6.11 | 8.94 |
| HRNet | 32.90 | 41.41 |
| BCCMA | 39.94 | 47.98 |
| DM-Count | 7.26 | 10.76 |
| GL | 4.89 | 7.23 |
| CCTrans | 4.91 | 7.28 |
| Ours (SPCN) | **4.37** | **6.45** |
Table 5. Comparison of ablation experiment 1. Bold numbers indicate the best-performing metrics.
| Method | MAE | MSE |
|---|---|---|
| GL | 4.89 | 7.23 |
| GL (HRNet) | 21.41 | 28.60 |
| GL (CSRNet) | 5.06 | 7.45 |
| GL (CSRNetHDC) | 4.91 | 7.24 |
| Ours (SPCN) | **4.37** | **6.45** |
Table 6. Comparison of ablation experiment 2. Bold numbers indicate the best-performing metrics.
| Method | MAE | MSE |
|---|---|---|
| GL | 4.89 | 7.23 |
| With CBAM | 4.51 | 6.73 |
| With HDC | 4.45 | 6.57 |
| With CBAM and HDC | **4.37** | **6.45** |
Table 7. Comparison of ablation experiment 3. Bold numbers indicate the best-performing metrics.
| Methodology | MAE | MSE |
|---|---|---|
| Strategy 1: [2,2,2] | 4.54 | 6.70 |
| Strategy 2: [3,3,3] | 4.45 | 6.53 |
| Strategy 3: [1,2,3] | **4.37** | **6.45** |
Table 8. Performance of the proposed method on the Renshou2021 dataset. Bold numbers indicate the best-performing metrics.
| Method | MAE | RMSE |
|---|---|---|
| YOLOv8 [3,34] | 6.16 | 8.89 |
| CenterNet [3,35] | 12.67 | 17.77 |
| Faster R-CNN [3,36] | 16.56 | 19.22 |
| TasselLFANet [3,37] | 5.92 | 8.83 |
| EfficientDet [3,38] | 22.74 | 38.41 |
| FCOS [3,39] | 11.88 | 22.00 |
| YOLO POD [16] | **4.18** | 10.04 |
| PodNet [3] | 4.52 | 7.65 |
| SPCN (trained on BeanCount-1500) | 4.43 | **6.18** |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
