Article

SC-DiatomNet: An Efficient and Accurate Algorithm for Diatom Classification

by Jiongwei Li 1, Chengshuo Jiang 1, Lishuang Yao 1,2 and Shiyuan Zhang 1,3,*

1 College of Science, Shantou University, Shantou 515063, China
2 Guangdong Provincial Key Laboratory of Automotive Display and Touch Technologies, Shantou Goworld Display Technology Co., Ltd., Shantou 515041, China
3 Engineering Research Center of Digital Graphic and Next-Generation Printing, Soochow University, Suzhou 215006, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1862; https://doi.org/10.3390/jmse12101862
Submission received: 28 August 2024 / Revised: 12 October 2024 / Accepted: 15 October 2024 / Published: 17 October 2024
(This article belongs to the Section Ocean Engineering)

Abstract

Detecting the quantity and diversity of diatoms is of great significance in areas such as climate change, water quality assessment, and oil exploration. Here, an efficient and accurate object detection model, named SC-DiatomNet, is proposed for diatom detection in complex environments. This model is based on the YOLOv3 architecture and uses the K-means++ algorithm for anchor box clustering on the diatom dataset. A convolutional block attention module is incorporated in the feature extraction network to enhance the model’s ability to recognize important regions. A spatial pyramid pooling module and adaptive anchor boxes are added to the encoder to improve detection accuracy for diatoms of different sizes. Experimental results show that SC-DiatomNet can successfully detect and classify diatoms accurately without reducing detection speed. The recall, precision, and F1 score were 94.96%, 94.21%, and 0.94, respectively, and the mean average precision (mAP) of YOLOv3 was improved by 9.52% on the diatom dataset. The detection accuracy also exceeded that of other advanced deep learning algorithms. SC-DiatomNet has potential applications in water quality analysis and monitoring of harmful algal blooms.

1. Introduction

Diatoms [1] are a significant group of eukaryotic microalgae [2], widely distributed in various aquatic environments such as oceans, lakes, and rivers. Globally, there are approximately 250 genera and about 100,000 species of diatoms [3]. Through photosynthesis, they convert carbon dioxide into organic matter and release large quantities of oxygen, thus maintaining the energy cycle and oxygen supply in aquatic ecosystems. Due to their siliceous cell wall structure, diatoms are highly sensitive to environmental changes. The structure and diversity of diatom communities can reflect changes in nutrient levels, pollution status, and other environmental factors in water bodies. Consequently, diatoms are extensively used in water quality assessment [4] and environmental monitoring [5]. In addition, diatoms hold significant value in various fields, including petroleum exploration [6] and biotechnology [7]. Currently, the identification of diatoms is mainly performed by experts through microscopic observation and classification. However, due to the vast diversity in diatom species, varying sizes, algal stacking, and environmental factors, manual classification is a challenging task. Thus, utilizing deep learning for classification of diatoms [8] would significantly advance related research.
The advent of deep learning has greatly accelerated the development of object detection algorithms [9], resulting in remarkable improvements in performance, particularly through the use of advanced convolutional architectures. Currently, object detection algorithms that utilize a Convolutional Neural Network (CNN) [10] are primarily categorized into two types: two-stage and one-stage detection algorithms. Specifically, two-stage detection algorithms, such as the Region-based Convolutional Neural Network (R-CNN) model proposed by Ross Girshick (2014) [11], generally exhibit accurate detection performance through region proposals and boundary box refinement. In contrast, one-stage detection algorithms, for example, the You Only Look Once (YOLO) model proposed by Redmon et al. (2016) [12], achieve exceptionally fast detection speeds in a single inference. They treat object detection as a direct regression problem, quickly completing both bounding box regression and classification tasks in one pass. However, due to the lack of a region proposal process, YOLO may not match the accuracy of two-stage algorithms in localization and classification, particularly when handling small objects or complex backgrounds. In general, existing object detection algorithms have either excessive computational costs or insufficient performance, limiting their application in diatom classification.
Inspired by visual attention mechanisms [13] and pooling mechanisms, we developed an efficient and accurate object detection algorithm named SC-DiatomNet. This study focuses on one-stage detection methods and explores the potential for improving their performance. The algorithm is based on the YOLOv3 framework [14] and integrates the Convolutional Block Attention Module (CBAM) [15] and the Spatial Pyramid Pooling (SPP) module [16]. The inclusion of the CBAM module allows the model to capture global feature relationships and learn the importance of features at various spatial locations. Meanwhile, the SPP module enhances the model’s ability to extract features at multiple scales, effectively retaining discriminative features while maintaining the original image resolution. Additionally, the K-means++ algorithm is used to generate specialized adaptive anchor boxes for the diatom dataset, further enhancing SC-DiatomNet’s detection performance in complex scenarios.
The main contributions of this paper are as follows:
(1)
SC-DiatomNet was successfully developed by integrating the CBAM and SPP modules with the YOLOv3 architecture. According to experimental results, this algorithm successfully combines YOLOv3 with attention mechanisms and applies them to diatom object detection.
(2)
By introducing the K-means++ algorithm to create specialized adaptive anchor boxes tailored to the diatom dataset, a more accurate match to the size and shape of diatoms is achieved, thereby improving detection performance.
(3)
Comprehensive experiments were performed using a diatom dataset, and evaluation results indicate that SC-DiatomNet significantly outperforms YOLOv3 in detection accuracy. Additionally, compared to other popular algorithms, it provides an improved equilibrium between detection accuracy and speed.

2. Materials and Methods

2.1. Diatom Dataset

The dataset used in the experiment comes from the open-source Diatom Dataset [17], which contains a total of 3027 microscopic images of diatoms from 68 species. All images were captured using an optical microscope, with a spatial resolution of 2112 × 1584 pixels. From the Diatom Dataset, we selected the six species with the largest numbers of samples to form a new diatom dataset comprising 1043 microscopic images. The six species are Encyonema silesiacum, Fragilaria recapitellata, Gomphonema olivaceum, Navicula cryptotenella, Navicula reichardtiana, and Planothidium lanceolatum. All image data are distributed into training, validation, and testing sets with a ratio of 8:1:1. The training set is used directly as input images for training the neural network. After each training epoch, the validation set is employed to optimize the training process by adjusting hyperparameters or applying techniques such as early stopping. The test set, on the other hand, is used solely to evaluate the performance of the neural network after training and is not included in the training process. Comprehensive information regarding the diatom dataset is presented in Table 1, and images of the six diatom samples are shown in Figure 1 [18].
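For readers reproducing the split, the sketch below shows one way to partition a list of image paths into 8:1:1 subsets with a fixed random seed; the function and variable names are illustrative and not taken from the authors' code.

```python
import random

def split_dataset(image_paths, train_frac=0.8, val_frac=0.1, seed=0):
    """Split a list of image paths into train/val/test subsets (8:1:1 by default)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)        # fixed seed so the split is reproducible
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    train_set = paths[:n_train]               # used to fit the network
    val_set = paths[n_train:n_train + n_val]  # used for hyperparameter tuning / early stopping
    test_set = paths[n_train + n_val:]        # held out for the final evaluation only
    return train_set, val_set, test_set

# usage with a hypothetical list of image files:
# train_set, val_set, test_set = split_dataset(all_diatom_image_paths)
```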

2.2. Data Augmentation

To improve the model’s recognition accuracy and the algorithm’s robustness, data augmentation [19] was applied during model training. Techniques such as Gaussian blur, horizontal flip, vertical flip, brightness adjustment, random translation, and random cropping were combined randomly to expand the training set. Since the validation set is only used to optimize the training process through hyperparameter tuning or early stopping, data augmentation is applied exclusively during training, and no random augmentation is applied during validation. Examples of data augmentation can be seen in Figure 2.
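As an illustration of how such a pipeline can be composed, the following torchvision-based sketch chains the listed transforms, each applied with some probability. The specific parameters (kernel size, jitter strength, crop scale, 416-pixel input size) are assumptions, and for detection training the bounding boxes would need to be transformed together with the image.

```python
import torchvision.transforms as T

# Illustrative augmentation pipeline applied only to the training set; each transform
# fires with some probability, so different combinations are produced every epoch.
# For object detection, box coordinates must be updated consistently (not shown here).
train_augment = T.Compose([
    T.RandomApply([T.GaussianBlur(kernel_size=5)], p=0.3),   # Gaussian blur
    T.RandomHorizontalFlip(p=0.5),                           # horizontal flip
    T.RandomVerticalFlip(p=0.5),                             # vertical flip
    T.ColorJitter(brightness=0.3),                           # brightness adjustment
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),         # random translation
    T.RandomResizedCrop(size=416, scale=(0.8, 1.0)),         # random cropping to the network input size
])
```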

2.3. Convolutional Neural Network

A Convolutional Neural Network (CNN) typically includes an input layer, convolutional layers, pooling layers, and fully connected layers. CNN models can be constructed using diverse algorithms, pooling techniques, and activation functions to meet different application needs. One of the main advantages of CNNs over traditional image processing and classification methods is their ability to automatically extract useful features from image data, eliminating the cumbersome manual process. More importantly, CNNs can learn accurate recognition capabilities by deeply learning complex image patterns from specific datasets, significantly improving identification accuracy [20,21].
YOLO is an object detection algorithm that applies the powerful feature extraction capabilities of a CNN to object detection tasks, incorporating feature extraction, encoding, and decoding. The first two components correspond to the feature extraction network and encoder within the CNN, respectively. The decoding component processes the neural network’s output through a decoder, translating this information into specific annotations of items within an image. The output includes bounding boxes for object locations, type identification, and their predicted accuracy.
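To make the decoding step concrete, the sketch below shows, under common YOLOv3 conventions, how one output feature map can be turned into box coordinates, objectness, and class scores. It is a simplified illustration under an assumed tensor layout and naming, not the implementation used in this work.

```python
import torch

def decode_yolo_head(pred, anchors, num_classes, stride):
    """Decode one YOLO output map into boxes (cx, cy, w, h), objectness, and class scores.

    pred:    raw head output of shape (batch, num_anchors * (5 + num_classes), S, S)
    anchors: list of (w, h) anchor sizes in pixels for this scale
    stride:  input_size / S, e.g. 416 / 13 = 32
    """
    b, _, s, _ = pred.shape
    na = len(anchors)
    pred = pred.view(b, na, 5 + num_classes, s, s).permute(0, 1, 3, 4, 2)

    grid_y, grid_x = torch.meshgrid(torch.arange(s), torch.arange(s), indexing="ij")
    grid = torch.stack((grid_x, grid_y), dim=-1)                    # cell offsets
    anchor_wh = torch.tensor(anchors, dtype=torch.float32).view(1, na, 1, 1, 2)

    xy = (torch.sigmoid(pred[..., 0:2]) + grid) * stride            # box centre in pixels
    wh = torch.exp(pred[..., 2:4]) * anchor_wh                      # box size scaled from the anchor prior
    obj = torch.sigmoid(pred[..., 4:5])                             # objectness confidence
    cls = torch.sigmoid(pred[..., 5:])                              # per-class confidence
    return torch.cat((xy, wh), dim=-1), obj, cls
```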

3. Improved YOLOv3 Network Structure

3.1. SC-DiatomNet

Here, we propose a novel CNN model, SC-DiatomNet, based on YOLOv3 for diatom classification tasks. The YOLO object detection algorithm has multiple versions [22,23,24,25,26], and we selected YOLOv3 as the foundational model primarily because of its impressive equilibrium between speed and accuracy. This is especially important in scenarios with limited computational resources, where YOLOv3 has demonstrated outstanding performance in diatom classification, making it well-suited to our existing hardware capabilities and application requirements. While more advanced models, such as YOLOv8, offer improvements in certain areas, they come with higher demands for computational resources, particularly in practical applications where hardware requirements are more stringent. YOLOv3’s feature extraction network, DarkNet53 [27], incorporates a large number of residual structures [28], which effectively mitigate the vanishing gradient problem as the network depth increases. The residual structure can map directly from a previous feature layer to a subsequent feature layer (skip connection) without convolution, aiding in model training and feature extraction. However, the neural network model trained directly with YOLOv3 performs poorly in practical tests, struggling to accurately localize targets against complex environments or when targets are small. It also fails to detect densely packed targets effectively, resulting in poor detection performance and an inability to meet the requirements of detecting multiple species in a single diatom microscopic image. This leads to unsatisfactory detection accuracy, making it unfit for practical use.
Therefore, SC-DiatomNet retains parts of the YOLOv3 structure but improves the feature extraction and encoding components. Figure 3 shows the overall architecture of SC-DiatomNet. The blue boxes (DBL) represent the basic convolutional blocks, consisting of a convolutional layer, batch normalization (BN), and a Leaky ReLU activation function. These blocks serve to extract the initial characteristics from the input image. The green boxes (Res) represent residual blocks, which successfully tackle the vanishing gradient issue in deep networks through skip connections, thereby improving the training performance. The orange boxes (CBAM) denote the attention modules we added. These modules automatically learn which regions of the image are important and perform more in-depth feature extraction on these areas, enhancing the network’s ability to perceive critical features. This helps SC-DiatomNet achieve precise object localization under any circumstances. The yellow boxes (SPP) represent the spatial pyramid pooling module, which pools at multiple scales, enabling the network to more effectively detect objects of different sizes. This improves SC-DiatomNet’s accuracy in detecting small targets. The light pink boxes and brown boxes represent feature map concatenation and upsampling operations, respectively. By integrating multi-level feature information, the model’s detection performance in complex environments is further enhanced. Finally, the red boxes represent the output layers of the model. These output three different feature map sizes (52 × 52, 26 × 26, and 13 × 13) to accommodate the detection needs of objects of various sizes. Notably, the K-means++ algorithm [29,30,31] was also introduced in the encoder, allowing the generation of anchor boxes better suited to the experimental dataset. This resulted in more precise handling of objects at different scales, further improving the detection performance of SC-DiatomNet.
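As a concrete illustration of the DBL and Res building blocks described above, the following PyTorch sketch shows one conventional way to implement them. The layer names and the LeakyReLU slope of 0.1 are assumptions drawn from common DarkNet53 implementations rather than the authors' code.

```python
import torch.nn as nn

class DBL(nn.Module):
    """Basic block used throughout the backbone: Conv -> BatchNorm -> LeakyReLU."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Res(nn.Module):
    """Residual block: 1x1 bottleneck, 3x3 conv, and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(DBL(channels, channels // 2, kernel_size=1),
                                   DBL(channels // 2, channels, kernel_size=3))

    def forward(self, x):
        return x + self.block(x)   # the skip connection eases gradient flow in deep networks
```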

3.2. Attention Mechanism

As background complexity increases and target deformation becomes more pronounced, a model’s detection accuracy might be affected. To ensure that the model can precisely locate diatoms while overall detection accuracy is improved, we introduced the Convolutional Block Attention Module (CBAM) into the backbone feature extraction architecture. The concept of attention mechanisms originates from natural language processing (NLP) and has been successfully applied to machine translation [32,33,34]. It has gradually expanded to many fields, such as audio recognition [35] and object detection [36]. CBAM is a module combining spatial and channel attention, capable of automatically identifying and emphasizing the most important features while suppressing irrelevant information, as shown in Figure 4.
In the channel attention module, the input feature map is $X \in \mathbb{R}^{C \times H \times W}$, where C, H, and W denote the number of channels, height, and width. Global max pooling and global average pooling are first applied to obtain two one-dimensional feature vectors: the max-pooling output $X_{\max} \in \mathbb{R}^{C}$ and the average-pooling output $X_{\mathrm{avg}} \in \mathbb{R}^{C}$. These two vectors are then passed through a shared fully connected layer $W \in \mathbb{R}^{C \times C/r}$, where r is the reduction ratio, to learn a weight for each channel. The two resulting vectors are summed to produce the channel attention weight vector $M_c(X) \in \mathbb{R}^{C}$, with the Sigmoid function applied so that the weights fall within the range [0, 1]. Finally, $M_c(X)$ is applied to each channel of the input feature map to emphasize the important channels. In the spatial attention module, the input feature map first undergoes max pooling and average pooling along the channel dimension, yielding two two-dimensional feature maps $X_{\max}^{\mathrm{spatial}} \in \mathbb{R}^{1 \times H \times W}$ and $X_{\mathrm{avg}}^{\mathrm{spatial}} \in \mathbb{R}^{1 \times H \times W}$, which represent the maximum and average feature values at each pixel location, respectively. These two maps are concatenated along the channel dimension to form a two-channel feature map, and a convolution is then applied to generate the spatial attention weight $M_s(X) \in \mathbb{R}^{H \times W}$. After Sigmoid activation, the resulting weight map is applied to the feature map, highlighting the important spatial regions.
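The following PyTorch sketch shows one way to implement the channel and spatial attention described above. The reduction ratio r = 16 and the 7 × 7 spatial-attention kernel are common defaults from the original CBAM paper and are assumptions here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both the max-pooled and average-pooled descriptors
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, x):                                   # x: (B, C, H, W)
        max_desc = self.mlp(torch.amax(x, dim=(2, 3)))      # global max pooling  -> (B, C)
        avg_desc = self.mlp(torch.mean(x, dim=(2, 3)))      # global average pooling -> (B, C)
        weights = torch.sigmoid(max_desc + avg_desc)        # channel weights in [0, 1]
        return x * weights[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        max_map = torch.amax(x, dim=1, keepdim=True)        # (B, 1, H, W)
        avg_map = torch.mean(x, dim=1, keepdim=True)        # (B, 1, H, W)
        weights = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * weights

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described above."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))
```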

3.3. Pooling Mechanism

The Spatial Pyramid Pooling (SPP) module is an important neural network component with a pooling mechanism. The key feature of the SPP module is its ability to process input feature maps X∈ℝC×H×W in parallel using pooling layers of different sizes (such as 5 × 5, 9 × 9, and 13 × 13). The results of these layers are then concatenated to produce fixed-size outputs, as shown in Figure 5. This design is inspired by the concept of a spatial pyramid, allowing it to simultaneously capture both local and global features, thereby enriching the information representation of the feature map.
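A minimal PyTorch sketch of such an SPP block is given below. Following the usual YOLOv3-SPP layout, the pooled maps are concatenated with the unpooled input, and stride-1 pooling with half-kernel padding preserves the spatial resolution; the inclusion of the identity branch is an assumption about Figure 5.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel max pooling at several kernel sizes,
    concatenated with the original feature map along the channel dimension."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # stride 1 with half-kernel padding keeps the spatial resolution unchanged
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes])

    def forward(self, x):                                   # x: (B, C, H, W)
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)   # -> (B, 4C, H, W)
```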

3.4. Adaptive Anchor Box

The YOLOv3 model uses anchor boxes as priors and compares them with the ground-truth boxes to adjust and obtain the prediction boxes. Appropriate anchor boxes can improve the model’s detection accuracy. Typically, YOLOv3 presets nine default anchor box sizes, denoted as $\{A_i\}_{i=1}^{9}$. Considering the large variation in target sizes in the diatom dataset, the K-means++ algorithm was used to generate new anchor boxes that better fit the experimental dataset, and these were applied to the training of all models. These anchor boxes are denoted as $\{A_i^{*}\}_{i=1}^{k}$, where k is the number of anchor boxes generated by the K-means++ clustering algorithm, as shown in Figure 6. By introducing prior anchor boxes adapted to the diatom dataset, as listed in Table 2, the network no longer needs to randomly create anchors of various sizes for object prediction, which results in faster training and quicker convergence.
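The anchor-clustering step can be sketched with scikit-learn's K-means using k-means++ initialization on the ground-truth box widths and heights, as below. This illustrative version clusters in (w, h) space with Euclidean distance, whereas some anchor-clustering implementations use 1 - IoU as the distance; the helper name and the sorting by area are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(box_sizes, k=9, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor boxes with K-means++.

    box_sizes: array of shape (N, 2) holding box widths and heights in pixels.
    Returns the k cluster centres sorted by area, for assignment to the detection scales.
    """
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    km.fit(np.asarray(box_sizes, dtype=float))
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]
```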

3.5. Evaluation Metrics

To thoroughly assess the model’s performance during training and objectively evaluate the differences between SC-DiatomNet and other models (YOLOv3, YOLOv3-SPP, and YOLOv3-CBAM), a well-designed loss function is essential. Specifically, the loss function in this study consists of three components: the bounding box regression loss ($L_{\mathrm{bbox}}$), the confidence loss ($L_{\mathrm{conf}}$), and the classification loss ($L_{\mathrm{cls}}$). The bounding box regression loss measures the difference between the predicted bounding boxes and the ground-truth boxes. The confidence loss evaluates whether the predicted bounding box contains the object. Lastly, the classification loss assesses the difference between the predicted and actual class labels for each grid cell. The detailed definition is as follows:
$$\mathrm{Loss} = L_{\mathrm{bbox}} + L_{\mathrm{conf}} + L_{\mathrm{cls}}$$
In addition to introducing the loss function, we also incorporated evaluation metrics such as precision, recall, F1 score, and mean average precision (mAP). The following are the formulas for calculating precision, recall, and F1 score:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where true positives (TP) refer to the number of target samples the model correctly predicts. False positives (FP) represent the number of non-target samples the model incorrectly predicts as targets, and false negatives (FN) denote the number of target samples the model fails to detect.
Precision and recall are used to plot the PR curve, and the area under this curve is the average precision (AP). The mAP is the mean of the AP values over all classes and therefore summarizes the other evaluation metrics. Observing how the mAP changes across training epochs helps in understanding the model’s training status and detection accuracy; a higher mAP value indicates better detection performance. The formulas for AP and mAP are:
$$\mathrm{AP} = \int_{0}^{1} P(R)\,dR$$
$$\mathrm{mAP} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{AP}_i$$
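For illustration, the sketch below evaluates these formulas from raw detection counts and a sampled PR curve. The AP function uses a simple trapezoidal integral of P(R); standard VOC/COCO implementations additionally interpolate the precision envelope, so this is a simplified stand-in.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw detection counts (see the equations above)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Trapezoidal approximation of the area under the PR curve (AP)."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```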

4. Results

4.1. Model Training

The computer used for training was configured with an Intel i7-10750H CPU and an Nvidia GeForce GTX 1650 Ti GPU with 4 GB of VRAM. The software environment included the PyCharm 2023.1 IDE, Python 3.9.6 as the programming language, and PyTorch 1.13.1+cu116 as the deep learning framework.
For a fair comparison, this study used the official default settings to train the models: an initial learning rate of 0.001 with an exponential decay strategy, the Adam optimizer with a momentum term of 0.9, and 200 training epochs. In addition, the following modifications were made during the training phase. (1) Transfer learning was utilized by fine-tuning a model pre-trained on ImageNet rather than training from scratch. (2) To accelerate training and prevent disruption of the pre-trained weights, the first 30 epochs used frozen training, followed by 170 epochs of unfrozen training. (3) The learning rate was scaled using warm-up and cosine annealing schedules to prevent gradient explosion or vanishing, and an early stopping strategy was adopted to prevent overfitting. Based on this computer configuration and the hyperparameters mentioned above, the training times for each model on the diatom dataset are provided in Table 3. These training times are specific to this hardware configuration and may differ significantly on other hardware.
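The learning-rate schedule described above can be sketched as follows. The warm-up length, the Adam betas, and the assumption that the model exposes a .backbone attribute are illustrative choices, not the exact settings used in this study.

```python
import math
import torch

def build_optimizer_and_schedule(model, base_lr=1e-3, warmup_epochs=5, total_epochs=200):
    """Adam optimizer with linear warm-up followed by cosine annealing of the learning rate."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, betas=(0.9, 0.999))

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                            # warm-up: ramp the LR up linearly
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine decay towards zero

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

def set_backbone_frozen(model, frozen):
    """Freeze the pretrained backbone for the first epochs, then unfreeze it."""
    for p in model.backbone.parameters():    # assumes the model exposes a .backbone attribute
        p.requires_grad = not frozen
```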

4.2. Model Comparison

The trained models were evaluated on the test set, and the detection results were compared in two groups. The first group included the same type of object detection algorithm with different network structures, namely, YOLOv3, YOLOv3-CBAM, YOLOv3-SPP, and SC-DiatomNet. As shown in Table 4, our proposed SC-DiatomNet model, an improved version of YOLOv3, exhibits significant improvement across all evaluation metrics, particularly in terms of F1 score (0.945), recall (94.96%), precision (94.21%), and mAP (97.66%). In contrast, the base YOLOv3 model shows an F1 score of 0.795, a recall of 81.59%, a precision of 78.89%, and an mAP of 88.14%, lower than those of the improved models in all aspects. In Figure 7a, a rapid increase in the mAP value is observed at the initial stage of training. However, after 160 epochs, the mAP value starts to fluctuate within a narrow range, with no further notable improvement; at this point, the neural network can be considered to have reached its optimal training effect. The highest mAP value of YOLOv3, 0.888, appears around 150 epochs, eventually stabilizing around 0.88. This indicates that even when the network is trained to near-optimal performance, it still does not meet the requirement for accurate identification of microalgae (mAP above 0.9). The improved YOLOv3-SPP and YOLOv3-CBAM models achieved maximum mAP values exceeding 0.95 and ultimately stabilized around 0.93, representing a 5% improvement over YOLOv3. This indicates that the addition of the SPP and CBAM modules optimizes the original convolutional neural network to a certain extent. SC-DiatomNet, which incorporates both the SPP and CBAM modules, performed best, achieving a maximum mAP value of 0.97 and eventually stabilizing around 0.96, a 7% improvement over YOLOv3. This result meets the requirements for accurate diatom identification. In Figure 7b, SC-DiatomNet exhibits a rapid decline in validation loss, ultimately stabilizing at a low level, which demonstrates that the model converges quickly and maintains good generalization performance.
The second group compares different object detection algorithms, namely the Single Shot MultiBox Detector (SSD) [37], Faster R-CNN [38], and SC-DiatomNet. From Table 5, it is evident that SC-DiatomNet significantly outperforms both SSD and Faster R-CNN, demonstrating the best performance in terms of F1 score (0.945), precision (94.21%), and mAP (97.66%). Although Faster R-CNN achieves the highest recall (97.872%), its precision (63.524%) is relatively low, leading to more false detections and thus affecting its overall performance. SSD shows the weakest performance, particularly in recall (72.782%) and mAP (79.516%). Overall, SC-DiatomNet offers a superior balance between detection accuracy and overall performance compared with the other models. Figure 8 shows the trends in detection performance and validation loss across the training process for all three models, and Figure 9 compares the detection results of each model on the same image, providing a visual comparison of their detection capabilities.

4.3. Detection Results

The study carried out a thorough assessment of the SC-DiatomNet model’s effectiveness. As shown in Table 6, SC-DiatomNet’s performance in detecting the six diatom species is remarkable. In particular, the detection of Encyonema silesiacum stands out, with an F1 score of 0.99, a recall of 100%, a precision of 97.33%, and an AP of 99.52%. Fragilaria recapitellata, Gomphonema olivaceum, Navicula cryptotenella, and Navicula reichardtiana also exhibit F1 scores of 0.94 or higher, with recall and precision above 91%, indicating that the model performs consistently and accurately across these species. The only species with slightly lower performance is Planothidium lanceolatum, with an F1 score of 0.89, a recall of 87.88%, and a precision of 90.62%, though its AP still reached 93.26%, meeting the detection accuracy requirements. Figure 10a shows the model’s loss during training and validation; both curves decline steadily and converge, indicating a stable and consistent learning process on both datasets. Figure 10b shows that the model achieves high precision across different recall levels. Figure 11 illustrates how SC-DiatomNet extracts diatom feature information and detects diatoms: the feature information for the six diatom species is visualized, and the detection results are output.

4.4. Ablation Experiments

The YOLOv3 network was redesigned, and three improvement schemes were proposed. To demonstrate the impact of these improvements on the model’s performance and avoid any counterproductive modifications that could degrade network performance, we conducted ablation experiments on a diatom test set.
In terms of computational cost, we assessed GFLOPs, parameter size, speed (model inference time), and frames per second (FPS). For detection effectiveness, we used mAP@0.5 and mAP@0.5:0.95 as evaluation metrics, comparing the improved methods with other algorithms. The number after the “@” indicates the IoU threshold (or threshold range) at which the metric is computed.
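As a reminder of how these thresholds are applied, the sketch below computes the IoU of two axis-aligned boxes; a prediction counts as correct at a given threshold when its IoU with a matched ground-truth box meets or exceeds that threshold.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# A prediction is a true positive at IoU threshold 0.5 when box_iou(pred, gt) >= 0.5;
# averaging over thresholds from 0.5 to 0.95 in steps of 0.05 gives the 0.5:0.95 metrics.
```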
Table 7 presents the computational cost of the different models. After incorporating the SPP and CBAM modules, SC-DiatomNet requires 67.456 GFLOPs, has a parameter size of 67.653 MB, an inference time of 60.57 ms, and a frame rate of 17 FPS. Although the computational load and the number of parameters increase slightly and the inference time rises only modestly, the detection performance is notably enhanced. In contrast, SSD has the lowest computational cost, with just 15.092 GFLOPs, the fastest inference time (15 ms), and the highest frame rate (66 FPS), though its performance metrics are the weakest. Faster R-CNN has significantly higher computational costs than the other models, with 941.169 GFLOPs, an inference time of 218 ms, and a frame rate of only 4 FPS, highlighting its disadvantage in terms of computational resource demand and real-time capability. Overall, SC-DiatomNet strikes a good balance between performance and computational efficiency. Compared with the base YOLOv3 model and lightweight models such as SSD, it delivers higher detection accuracy while remaining much faster than Faster R-CNN. The proposed improvements carefully balance computational cost while ensuring detection performance, making the model more efficient for practical applications.
Table 8 presents a comparison of different models in terms of detection performance, focusing on key metrics, including [email protected]:0.95, [email protected]:0.95, [email protected], and [email protected]. The results indicate that SC-DiatomNet outperforms all other models, achieving an [email protected]:0.95 of 87.2%, a [email protected]:0.95 of 82.9%, a [email protected] as high as 97.2%, and a [email protected] of 94.4%. In contrast, the baseline YOLOv3 model has an [email protected]:0.95 of only 80.1%, a [email protected]:0.95 of 71.5%, a [email protected] of 87.5%, and a [email protected] of 81.6%, showing significantly weaker performance. Although YOLOv3-CBAM and YOLOv3-SPP demonstrate some improvement, they do not match the overall performance of SC-DiatomNet. SSD shows the poorest detection results, with a [email protected]:0.95 of only 50.4% and a [email protected] of 64.5%. While Faster R-CNN performs at 88.5% on [email protected], its overall performance still falls short of that of SC-DiatomNet. The proposed improvements in this study excel across all key detection metrics, ensuring high detection accuracy in practical applications, particularly in complex scenarios and multi-scale object detection, where precision is notably enhanced.

5. Conclusions

This paper delves into the performance of advanced, deep learning-based object detection algorithms for diatom detection in microscopic images. The YOLOv3 anchor box algorithm and network architecture were optimized to enhance performance in detecting multi-scale objects and handling detection tasks in complex environments. Numerous experiments were conducted, comparing the proposed SC-DiatomNet with the original YOLOv3 model, its optimized variants (YOLOv3-CBAM and YOLOv3-SPP), and other advanced models (SSD and Faster R-CNN). SC-DiatomNet surpassed all other models, recording an mAP of 97.66%, an F1 score of 0.94, a recall of 94.96%, and a precision of 94.21%. The study has demonstrated the feasibility and effectiveness of the YOLOv3-based SC-DiatomNet architecture for detecting various diatom species in optical microscopic images. In the future, SC-DiatomNet will be further optimized, with a focus on enhancing the model’s adaptability to varying water quality, lighting conditions, and environmental changes to ensure high-precision detection capabilities in more complex and variable underwater environments. Additionally, efforts will be focused on simplifying the SC-DiatomNet model architecture and minimizing inference time, allowing it to operate on resource-constrained hardware devices to meet the requirements for real-time performance and efficiency in practical applications. Furthermore, the application range of SC-DiatomNet will be expanded to include detection tasks for a broader range of aquatic organisms, such as fish and zooplankton, to further validate its performance in multi-species detection. Based on this, newer versions of the model family (YOLOv5, YOLOv8, etc.) will also be considered to leverage their advanced features for further optimization of SC-DiatomNet, thereby enhancing detection accuracy and efficiency.

Author Contributions

Conceptualization, J.L. and S.Z.; methodology, J.L.; software, C.J.; validation, J.L., S.Z. and C.J.; formal analysis, J.L.; investigation, C.J.; resources, J.L.; data curation, J.L.; writing—original draft preparation, J.L. and C.J.; writing—review and editing, S.Z.; visualization, J.L.; supervision, S.Z.; project administration, S.Z.; funding acquisition, S.Z. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2021YFB3600300); Natural Science Foundation of Guangdong Province (No. 2023A1515011826); Special Projects in Key Fields of Colleges and Universities of Guangdong Province (No. 2021ZDZX1051); STU Scientific Research Initiation Grant (NTF. 22022; NTF. 19038); Engineering Research Center of digital graphic and next-generation printing; Jiangsu Province, Soochow University (SDGC2248).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Lishuang Yao was employed by the Shantou Goworld Display Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Behrenfeld, M.J.; Halsey, K.H.; Boss, E.; Karp-Boss, L.; Milligan, A.J.; Peers, G. Thoughts on the evolution and ecological niche of diatoms. Ecol. Monogr. 2021, 91, e01457.
2. Orefice, I.; Di Dato, V.; Sardo, A.; Lauritano, C.; Romano, G. Lipid mediators in marine diatoms. Aquat. Ecol. 2022, 56, 377–397.
3. Mann, D.G.; Vanormelingen, P. An Inordinate Fondness? The Number, Distributions, and Origins of Diatom Species. J. Eukaryot. Microbiol. 2013, 60, 414–420.
4. Solak, C.N.; Peszek, Ł.; Yilmaz, E.; Ergül, H.A.; Kayal, M.; Ekmekçi, F.; Várbíró, G.; Yüce, A.M.; Canli, O.; Binici, M.S.; et al. Use of Diatoms in Monitoring the Sakarya River Basin, Turkey. Water 2020, 12, 703.
5. Dahiya, P.; Makwana, M.D.; Chaniyara, P.; Bhatia, A. A Comprehensive Review of Forensic Diatomology: Contemporary Developments and Future Trajectories. Egypt J. Forensic Sci. 2024, 14, 2.
6. Paniagua-Michel, J.; Banat, I.M. Unravelling Diatoms’ Potential for the Bioremediation of Oil Hydrocarbons in Marine Environments. Clean Technol. 2024, 6, 93–115.
7. Sharma, N.; Simon, D.P.; Diaz-Garza, A.M.; Fantino, E.; Messaabi, A.; Meddeb-Mouelhi, F.; Germain, H.; Desgagné-Penix, I. Diatoms Biotechnology: Various Industrial Applications for a Greener Tomorrow. Front. Mar. Sci. 2021, 8, 636613.
8. Pedraza, A.; Bueno, G.; Deniz, O.; Cristóbal, G.; Blanco, S.; Borrego-Ramos, M. Automated Diatom Classification (Part B): A Deep Learning Approach. Appl. Sci. 2017, 7, 460.
9. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276.
10. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019.
11. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; IEEE: Columbus, OH, USA, 2014; pp. 580–587.
12. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640.
13. Soydaner, D. Attention Mechanism in Neural Networks: Where It Comes and Where It Goes. Neural Comput. Appl. 2022, 34, 13371–13385.
14. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
15. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–19.
16. Xie, J.; Zhu, M.; Hu, K. Improved seabird image classification based on dual transfer learning framework and spatial pyramid pooling. Ecol. Inform. 2022, 72, 101832.
17. Gunduz, H.; Gunal, S. A Lightweight Convolutional Neural Network (CNN) Model for Diatom Classification: DiatomNet. PeerJ Comput. Sci. 2024, 10, e1970.
18. Gündüz, H.; Solak, C.N.; Gunal, S. Segmentation of Diatoms Using Edge Detection and Deep Learning. Turk. J. Electr. Eng. Comput. Sci. 2022, 30, 18.
19. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
20. Krichen, M. Convolutional Neural Networks: A Survey. Computers 2023, 12, 151.
21. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712.
22. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
23. Wang, X.; Lv, F.; Li, L.; Yi, Z.; Jiang, Q. A novel optimized tiny YOLOv3 algorithm for the identification of objects in the lawn environment. Sci. Rep. 2022, 12, 15124.
24. Yu, Q.; Han, Y.; Lin, W.; Gao, X. Detection and Analysis of Corrosion on Coated Metal Surfaces Using Enhanced YOLOv5 Algorithm for Anti-Corrosion Performance Evaluation. J. Mar. Sci. Eng. 2024, 12, 1090.
25. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976.
26. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–23 June 2023; IEEE: Vancouver, BC, Canada, 2023; pp. 7464–7475.
27. Deng, L.; Li, H.; Liu, H.; Gu, J. A Lightweight YOLOv3 Algorithm Used for Safety Helmet Detection. Sci. Rep. 2022, 12, 10981.
28. Shafiq, M.; Gu, Z. Deep Residual Learning for Image Recognition: A Survey. Appl. Sci. 2022, 12, 8972.
29. Gul, M.; Rehman, M. Big Data: An Optimized Approach for Cluster Initialization. J. Big Data 2023, 10, 120.
30. Mussabayev, R.; Mladenovic, N.; Jarboui, B.; Mussabayev, R. How to Use K-means for Big Data Clustering? Pattern Recognit. 2023, 137, 109269.
31. Zhu, A.; Hua, Z.; Shi, Y.; Tang, Y.; Miao, L. An Improved K-Means Algorithm Based on Evidence Distance. Entropy 2021, 23, 1550.
32. Subakan, C.; Ravanelli, M.; Cornell, S.; Bronzi, M.; Zhong, J. Attention Is All You Need In Speech Separation. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 11 June 2021; pp. 21–25.
33. Galassi, A.; Lippi, M.; Torroni, P. Attention in Natural Language Processing. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4291–4308.
34. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
35. Zhang, Z.; Xu, S.; Zhang, S.; Qiao, T.; Cao, S. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing 2021, 453, 896–903.
36. Cheng, M.; Liu, M. Image Convolution Techniques Integrated with YOLOv3 Algorithm in Motion Object Data Filtering and Detection. Sci. Rep. 2024, 14, 7651.
37. Kumar, A.; Srivastava, S. Object Detection System Based on Convolution Neural Networks Using Single Shot Multi-Box Detector. Procedia Comput. Sci. 2020, 171, 2610–2617.
38. Xu, X.; Zhao, M.; Shi, P.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack Detection and Comparison Study Based on Faster R-CNN and Mask R-CNN. Sensors 2022, 22, 1215.
Figure 1. Sample images of six diatom species [18].
Figure 2. Data augmentation samples.
Figure 3. Overall framework of SC-DiatomNet. “DBL” blocks represent the basic convolutional blocks, consisting of a convolutional layer (Conv), batch normalization (BN), and a Leaky ReLU activation function. “Res” blocks represent residual blocks. CBAM (Convolutional Block Attention Module) and SPP (Spatial Pyramid Pooling) are used to enhance feature extraction and multi-scale representation in the network.
Figure 4. CBAM structure. ⊗ represents the attention weights multiplied element-wise with the input features, producing the weighted feature map. ⊕ represents the feature map obtained by adding the results of max pooling and average pooling. σ represents the Sigmoid activation function, whose output is normalized between 0 and 1 and is used for final feature aggregation.
Figure 5. SPP structure.
Figure 6. K-means++ clustering of anchors. The “×” represents the centroid of each cluster.
Figure 7. The YOLOv3 model’s (a) mAP and (b) Val loss curve.
Figure 8. Different object detection algorithms’ (a) mAP and (b) Val loss curves.
Figure 9. Comparison of detection results from different models.
Figure 10. SC-DiatomNet’s (a) loss and (b) PR curve.
Figure 11. Visualization and detection of diatoms.
Table 1. Information on the diatom dataset.

Diatom Species | Number of Images | Percentage
Encyonema silesiacum | 165 | 15.8%
Fragilaria recapitellata | 163 | 15.6%
Gomphonema olivaceum | 246 | 23.6%
Navicula cryptotenella | 201 | 19.3%
Navicula reichardtiana | 148 | 14.2%
Planothidium lanceolatum | 120 | 11.5%
Total | 1043 | 100%
Table 2. Anchor box sizes for the diatom dataset.

Feature Map Size | Anchor Box Sizes
13 × 13 | (60, 29) (22, 83) (41, 57)
26 × 26 | (64, 40) (35, 78) (55, 61)
52 × 52 | (103, 45) (45, 128) (82, 90)
Table 3. Model training time.

Model | Training Time
YOLOv3 | 7.926 h
YOLOv3-CBAM | 8.794 h
YOLOv3-SPP | 7.953 h
SC-DiatomNet | 8.171 h
Table 4. Comparison of YOLOv3 models.

Model | F1 Score | Recall/% | Precision/% | mAP/%
YOLOv3 | 0.79 | 81.59 | 78.89 | 88.14
YOLOv3-CBAM | 0.84 | 84.16 | 85.81 | 93.88
YOLOv3-SPP | 0.83 | 84.09 | 83.17 | 92.86
SC-DiatomNet | 0.94 | 94.96 | 94.21 | 97.66
Table 5. Comparison of different object detection algorithms.

Model | F1 Score | Recall/% | Precision/% | mAP/%
SSD | 0.73 | 72.78 | 76.29 | 79.51
Faster R-CNN | 0.78 | 97.87 | 63.52 | 91.77
SC-DiatomNet | 0.94 | 94.96 | 94.21 | 97.66
Table 6. The detection performance of SC-DiatomNet.

Diatom Species | F1 Score | Recall/% | Precision/% | AP/%
Encyonema silesiacum | 0.99 | 100.00 | 97.33 | 99.52
Fragilaria recapitellata | 0.94 | 94.57 | 93.05 | 98.30
Gomphonema olivaceum | 0.95 | 95.34 | 97.10 | 99.24
Navicula cryptotenella | 0.95 | 95.59 | 95.24 | 97.99
Navicula reichardtiana | 0.94 | 96.36 | 91.91 | 97.64
Planothidium lanceolatum | 0.89 | 87.88 | 90.62 | 93.26
mAP | | | | 97.66%
Table 7. Partial ablation experiment results for computational costs.

Model | SPP | CBAM | GFLOPs | Parameters (MB) | Speed (ms) | FPS
YOLOv3 | × | × | 65.634 | 61.551 | 45.310 | 22
YOLOv3-CBAM | × | ✓ | 65.634 | 62.409 | 58.870 | 17
YOLOv3-SPP | ✓ | × | 67.406 | 66.795 | 46.940 | 21
SC-DiatomNet | ✓ | ✓ | 67.456 | 67.653 | 60.570 | 17
SSD | × | × | 15.092 | 14.344 | 15.000 | 66
Faster R-CNN | × | × | 941.169 | 28.480 | 218.000 | 4
Table 8. Partial ablation experiment results for detection performance.

Model | SPP | CBAM | [email protected]:0.95 | [email protected]:0.95 | [email protected] | [email protected]
YOLOv3 | × | × | 80.1% | 71.5% | 87.5% | 81.6%
YOLOv3-CBAM | × | ✓ | 83.7% | 78.1% | 93.2% | 91.2%
YOLOv3-SPP | ✓ | × | 84.6% | 78.6% | 92.1% | 91.4%
SC-DiatomNet | ✓ | ✓ | 87.2% | 82.9% | 97.2% | 94.4%
SSD | × | × | 77.5% | 50.4% | 64.5% | 62.1%
Faster R-CNN | × | × | 74.2% | 61.0% | 88.5% | 71.3%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
