Article

Microorganism Detection in Activated Sludge Microscopic Images Using Improved YOLO

School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(22), 12406; https://doi.org/10.3390/app132212406
Submission received: 29 October 2023 / Revised: 13 November 2023 / Accepted: 14 November 2023 / Published: 16 November 2023

Abstract

Wastewater has detrimental effects on the natural environment. The activated sludge method, a widely adopted approach for wastewater treatment, has proven highly effective. Within this process, microorganisms play a pivotal role, which makes continuous monitoring of their quantity and diversity necessary. Conventional methods, such as microscopic observation, are time-consuming. With the widespread integration of computer vision into object detection, deep learning-based object detection algorithms, notably the You Only Look Once (YOLO) model, have attracted substantial interest for their speed and precision. In this research, we applied the YOLO model to detect microorganisms in microscopic images of activated sludge. To address the irregular shapes of microorganisms, we developed an improved YOLO model that incorporates deformable convolutional networks and an attention mechanism to enhance its detection capabilities. We trained and tested the model on a custom dataset comprising five distinct object classes. Performance was evaluated with metrics including the mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@0.5); the improved YOLO model achieved an mAP@0.5 of 93.7%, a 4.3% improvement over the YOLOv5 model. A comparison with other object detection algorithms on the same dataset showed that the improved YOLO model achieved higher accuracy. These results demonstrate the superior performance of the improved YOLO model in detecting microorganisms in activated sludge and provide an effective auxiliary method for wastewater treatment monitoring.

1. Introduction

The quality of water is a critical factor for human health, ecological balance, and sustainable development. Numerous factors contribute to water pollution, the most significant of which stem from daily human life and from industrial and agricultural practices. Wastewater generated by social production activities contains substantial amounts of organic compounds, heavy metals, hazardous chemicals, and other pollutants [1]. These substances are toxic and bioaccumulate in aquatic ecosystems, where they can exert long-term effects on aquatic organisms and on the ecological food chain. Consequently, applying effective, scientifically grounded wastewater treatment technologies has become an urgent task.
Researchers have proposed various effective wastewater treatment methods, including physical-chemical processes, biological treatment, and membrane separation. Among these, the activated sludge method has received significant attention as a widely applied biological treatment. Its fundamental principle is that, under suitable environmental conditions, microorganisms degrade the organic substances and contaminants in wastewater and transform them into harmless compounds [2]. Microorganisms within the activated sludge, including protozoa and metazoans, play a pivotal role in achieving the treatment effect. Notably, their quantity and diversity fluctuate noticeably as process operating conditions change [3]. Monitoring changes in the microbial community is therefore important in wastewater treatment: changes in microbial species and their abundance can guide appropriate operational measures.
Several techniques have been applied to monitor microorganisms in activated sludge, such as microscopic observation, DNA sequencing [4], and fluorescence in situ hybridization (FISH) [5]. Microscopic observation, although effective, is time-consuming, demands substantial human effort, and is prone to error. DNA sequencing and FISH, in turn, require meticulous sample preparation and have high implementation costs. All of these methods are therefore costly in practical application. Computer vision, with its rapid development and widespread application, offers an automated, efficient, and accurate approach to identifying microorganisms in activated sludge.
Object detection methods based on convolutional neural networks (CNNs) are essential in computer vision research [6]. They aim to automatically detect objects in images using deep learning models [7]. These methods have significantly improved object detection performance and are now the mainstream approach. Early CNN-based object detection methods followed a two-stage process [8]. First, they generated candidate object regions using methods such as selective search. Then, they extracted features from each candidate region and used a classifier for object classification and bounding box regression. Regions with CNN features (R-CNN) is a representative method of this kind: it uses support vector machines (SVMs) for object classification and introduces bounding box regression to fine-tune the positions of candidate regions for better accuracy [9]. However, two-stage object detection models have limitations. Candidate region generation and object classification are computed independently, which results in high computational complexity and slower detection. To address these issues and improve both speed and accuracy, the single-stage YOLO (You Only Look Once) model was developed. YOLO treats object detection as an end-to-end regression problem, directly predicting object positions and classes from the input image. Because single-stage methods perform feature extraction and prediction over the entire image at once, YOLO avoids the candidate region generation and redundant feature extraction of two-stage methods, which substantially improves its detection speed.
There is currently limited research on microorganism detection in microscopic images of activated sludge. However, progress has been made in detecting and classifying objects in similar fields [10]. In early research, Tao et al. developed an effective classification method for red tide algae, achieving favorable results with conventional image processing techniques and machine learning methods such as SVM [11]. Jalba et al. studied the shape features of diatoms and proposed an automatic recognition method based on morphological curvature scale spaces, which achieved efficient automatic identification of diatoms [12]. As CNN-based deep learning methods gained widespread recognition, research gradually shifted towards deep learning. Zhang et al. analyzed existing conventional image processing methods and explored the feasibility of deep learning for microbial counting [13]. Al-Barazanchi et al. applied a hybrid approach that combined CNN models to classify plankton images and demonstrated that CNN models achieved higher classification accuracy than conventional image processing methods [14]. To further improve classification accuracy, Wahid et al. increased the depth of the CNN model and confirmed that networks with more layers yielded better classification accuracy [15]. However, simply adding layers to a network has performance limits, so researchers have begun to develop different network architectures to enhance the performance of deep learning models [16]. These models perform not only image classification but also object detection. Ren et al. developed Faster R-CNN, which uses a region proposal network to generate candidate regions and then locates specific objects in an image [9].
Meanwhile, the YOLO object detection model, with its single-stage, end-to-end detection process, has gained widespread recognition in practical applications. Park et al. applied a recent version of the YOLO model to identify algae and achieved an average accuracy of 89.8% [17], which illustrates the YOLO model's strength in such detection tasks.
In practical object detection tasks, there are various challenges, such as the irregular shapes of objects. To address them, this study proposes an improved YOLOv5 model that uses deformable convolutional networks (DCN) and an attention mechanism, aiming to improve detection performance specifically for irregular objects. First, we collected and annotated a dataset of microscopic images of activated sludge. Then, we augmented the dataset using a deep convolutional generative adversarial network (DC-GAN). Subsequently, we trained the improved YOLO model on the custom dataset. To verify the effectiveness of the proposed improvements, we conducted a comparative analysis with other existing object detection models. The results show that our improved model outperforms the baseline YOLOv5 as well as the other object detection models in accurately detecting irregularly shaped objects. The four main contributions of this study are as follows:
  • The collection and annotation of microscopic images of activated sludge.
  • The augmentation of a dataset by implementing DC-GAN.
  • The improvement of the YOLO model based on DCN and an attention mechanism.
  • A comparative analysis of the performance of different object detection models.

2. Materials and Methods

This section describes the materials and methods used in this study, including the data source, data augmentation, the improved YOLO model based on DCN and an attention mechanism, transfer learning, and performance measurement. First, microscopic images of activated sludge were collected as a dataset. Then, we augmented the dataset using DC-GAN and improved the YOLOv5 model using DCN and an attention mechanism.

2.1. Data Source

The dataset used in this research was collected at the Environment Laboratory of Xi’an University of Architecture and Technology, China. It consists of 650 microscopic images of activated sludge covering five object classes, i.e., granular sludge, granular sludge flocs, filamentous bacteria, rotifers, and epistylis, all of which are of substantial significance in the activated sludge process. Sample images of these classes are shown in Figure 1.

2.2. Data Augmentation

Data preprocessing is a fundamental part of developing a computer vision model. Preprocessing techniques such as data augmentation, data normalization, and resizing of raw data are essential for good model performance. For deep learning models, the size of the dataset significantly affects performance. In our dataset, the number of instances varied across object classes, with filamentous bacteria and rotifers being less prevalent. Data augmentation was therefore necessary to address this class imbalance.
There are various methods for data augmentation. One of them, DC-GAN, has shown promising results in generating realistic images. In this study, we applied DC-GAN to augment the less prevalent classes in the dataset. DC-GAN consists of two main parts: the generator and the discriminator [18]. The generator takes random noise vectors as input and gradually transforms them into synthetic images through a series of transposed convolution layers. Each transposed convolution layer progressively increases the spatial size of its input feature maps, so that the stack produces increasingly detailed images. The discriminator takes images as input, extracts features using a series of convolutional layers, and classifies each input image as real or synthetic. During training, the weights and biases of both the generator and the discriminator are updated by backpropagation. This optimization drives the generator to produce more realistic images while improving the discriminator's ability to distinguish real images from synthetic ones [18]. Figure 2 shows the workflow of DC-GAN.
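The paper does not list the layer configuration of its DC-GAN, so the following is only a sketch under an assumed, commonly used DCGAN-style layout: a generator that projects a 1 × 1 noise input to 4 × 4 and then doubles the spatial size with stride-2 transposed convolutions (kernel 4, padding 1) up to a hypothetical 64 × 64 image, mirrored by a stride-2 convolutional discriminator. It only traces feature-map sizes, using the standard output-size formulas for convolution and transposed convolution with dilation 1, to show how each transposed convolution layer enlarges its input.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Spatial output size of a 2-D transposed convolution with dilation 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2-D convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer settings as (kernel, stride, padding); not taken from the paper.
GENERATOR_LAYERS = [(4, 1, 0)] + [(4, 2, 1)] * 4      # 1 -> 4 -> 8 -> 16 -> 32 -> 64
DISCRIMINATOR_LAYERS = [(4, 2, 1)] * 4 + [(4, 1, 0)]  # 64 -> 32 -> 16 -> 8 -> 4 -> 1


def trace_sizes(start, layers, size_fn):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(size_fn(sizes[-1], kernel, stride, padding))
    return sizes


if __name__ == "__main__":
    print("generator:    ", trace_sizes(1, GENERATOR_LAYERS, conv_transpose_out))
    print("discriminator:", trace_sizes(64, DISCRIMINATOR_LAYERS, conv_out))
```

Running it prints [1, 4, 8, 16, 32, 64] for the generator and [64, 32, 16, 8, 4, 1] for the discriminator; the discriminator's final 1 × 1 output is the single real-versus-synthetic score.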
During augmentation, we fed both the synthetic images produced by the generator and the real images into the discriminator. Through iterative adversarial training, the generator continually improved its ability to produce synthetic images, while the discriminator improved its ability to discriminate between real and synthetic images. As training progressed, both networks gradually improved until they reached a balance in which the discriminator could no longer distinguish real images from synthetic ones [18]. The images produced by the trained generator were then used to balance the class distribution of the dataset.
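The paper does not state the loss functions used for adversarial training. A common choice, assumed here, is binary cross-entropy: the discriminator D is trained to output 1 for real images and 0 for synthetic ones, and the generator is trained with the widely used non-saturating loss, which rewards it when D scores its images as real. The sketch below evaluates both losses for single discriminator outputs (in practice they are averaged over a mini-batch). When the generated distribution matches the real one, the optimal discriminator outputs 1/2 for every input, which is the balance point described above; there the discriminator loss equals ln 4.

```python
import math


def discriminator_loss(d_real, d_fake):
    """BCE loss for D: d_real = D(x) on a real image, d_fake = D(G(z)) on a synthetic one."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: small when D scores the synthetic image as real."""
    return -math.log(d_fake)


if __name__ == "__main__":
    # Early training: D separates real from synthetic images easily, so G's loss is large.
    print(discriminator_loss(0.99, 0.01), generator_loss(0.01))
    # Balance: D can no longer tell the images apart and outputs 0.5 for both.
    print(discriminator_loss(0.5, 0.5), math.log(4))
```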
Before it could be used for training, the dataset was annotated. We used annotation tools to label the five object classes in the dataset, as shown in Figure 3. The completed annotations were saved as text files, one per microscopic image. Each annotation file records, for every labeled object in the image, its category ID together with the location and size of its bounding box.
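The paper does not specify the annotation tool or the exact file layout. YOLOv5 conventionally reads one text file per image in which each line holds `class_id x_center y_center width height`, with coordinates normalized to [0, 1] by the image width and height; assuming that layout, the sketch below converts such lines to pixel corner coordinates. The ordering of `CLASS_NAMES` is hypothetical and simply follows the order in which the classes are listed in Section 2.1.

```python
# Hypothetical class-ID ordering; the paper does not give the actual mapping.
CLASS_NAMES = [
    "granular sludge",
    "granular sludge flocs",
    "filamentous bacteria",
    "rotifers",
    "epistylis",
]


def parse_yolo_label(text, img_w, img_h):
    """Convert YOLO-format label lines into (class_name, (x1, y1, x2, y2)) in pixels.

    Assumes each non-empty line is 'class_id x_center y_center width height' with
    coordinates normalized by the image width and height.
    """
    boxes = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        class_id, xc, yc, w, h = line.split()
        xc, yc = float(xc) * img_w, float(yc) * img_h
        w, h = float(w) * img_w, float(h) * img_h
        corners = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
        boxes.append((CLASS_NAMES[int(class_id)], corners))
    return boxes


if __name__ == "__main__":
    label = "2 0.5 0.5 0.25 0.5\n3 0.1 0.2 0.05 0.1"
    for name, box in parse_yolo_label(label, img_w=640, img_h=480):
        print(name, box)
```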

2.3. Deformable Convolutional Networks

In conventional deep learning models, convolution is primarily used for image feature extraction. However, the conventional convolution process has fixed sampling points, which make it difficult to fully capture the features of irregularly shaped objects. Therefore, it becomes necessary to adaptively adjust the convolution scale and receptive field size to improve feature extraction and object localization.
DCN can improve the model's adaptability to irregularly shaped objects. Unlike conventional convolution, DCN adds learnable positional offsets to the regular sampling positions of the convolution kernel [19]. These offsets dynamically shift the sampling positions, enabling features to be extracted from irregular regions, as shown in Figure 4.
Computing a 3 × 3 deformable convolution involves two steps: (1) sampling the input feature map $x$ with a 3 × 3 grid $\mathcal{R}$; and (2) summing the sampled values weighted by the convolutional kernel $w$. The grid $\mathcal{R}$ defines the size and dilation of the receptive field [19]. Equation (1) gives $\mathcal{R}$ for a 3 × 3 kernel with a dilation of 1.
$$\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\} \tag{1}$$
Equation (2) gives the output value $y(p_0)$ at each position $p_0$ on the output feature map.
$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n) \tag{2}$$
where $p_n$ enumerates the sampling locations in $\mathcal{R}$.
In deformable convolution, the regular grid $\mathcal{R}$ is augmented with offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, where $N = |\mathcal{R}|$. Equation (2) then becomes Equation (3).
$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \tag{3}$$
Because the offsets $\Delta p_n$ are typically fractional, the sampling locations $p_0 + p_n + \Delta p_n$ generally fall between pixels, so $x$ must be evaluated by bilinear interpolation, as given in Equations (4)–(6).
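$$x(p) = \sum_{q} G(q, p) \cdot x(q) \tag{4}$$
$$G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y) \tag{5}$$
$$g(a, b) = \max(0,\ 1 - |a - b|) \tag{6}$$
where $p = p_0 + p_n + \Delta p_n$ denotes an arbitrary (fractional) location and $q$ enumerates all integral spatial locations in the feature map $x$.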
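To make Equations (3)–(6) concrete, the following NumPy sketch evaluates a single-channel 3 × 3 deformable convolution at one output position p0. The offsets are passed in directly; in an actual DCN they are predicted by an additional convolutional layer and learned end to end, which is omitted here. Locations outside the feature map are treated as zero (an assumed zero-padding convention), and the sum in Equation (4) is restricted to the four integer neighbours of p, the only locations where G(q, p) can be non-zero.

```python
import numpy as np


def g(a, b):
    """One-dimensional bilinear kernel, Equation (6)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, py, px):
    """Equations (4)-(5): value of feature map x at fractional location (py, px).

    G(q, p) is non-zero only for the (at most) four integer neighbours of p,
    so the sum over q is restricted to them. Out-of-range neighbours count as zero.
    """
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                value += g(qy, py) * g(qx, px) * x[qy, qx]
    return value


# Regular grid R for a 3 x 3 kernel with dilation 1, Equation (1), as (dy, dx) pairs.
GRID_R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deformable_conv_at(x, weight, p0, offsets):
    """Equation (3): y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n).

    x: 2-D feature map; weight: (3, 3) kernel; p0: (row, col);
    offsets: (9, 2) array holding (delta_row, delta_col) for each point of GRID_R.
    """
    y = 0.0
    for n, (dy, dx) in enumerate(GRID_R):
        py = p0[0] + dy + offsets[n, 0]
        px = p0[1] + dx + offsets[n, 1]
        y += weight[dy + 1, dx + 1] * bilinear_sample(x, py, px)
    return y


if __name__ == "__main__":
    x = np.arange(25, dtype=float).reshape(5, 5)
    weight = np.ones((3, 3))
    # With zero offsets, Equation (3) reduces to the regular convolution of Equation (2).
    print(deformable_conv_at(x, weight, (2, 2), np.zeros((9, 2))), x[1:4, 1:4].sum())
    # Fractional offsets are resolved by bilinear interpolation.
    print(deformable_conv_at(x, weight, (2, 2), np.full((9, 2), 0.5)))
```

The zero-offset case is a useful sanity check: it shows that deformable convolution generalizes, rather than replaces, the regular convolution of Equation (2).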

2.4. YOLO Model

In the field of object detection, there are two main architectures: two-stage models, including R-CNN, Fast R-CNN, and Faster R-CNN, and single-stage models, including YOLO and the Single Shot MultiBox Detector (SSD) [16]. Each architecture has its own advantages. Single-stage models offer an end-to-end training process that avoids complex two-stage computation. They also typically have simpler architectures and fewer parameters, which reduces model complexity and the computational resources needed for training and inference, and makes them easier to deploy and use. Consequently, single-stage models such as YOLO have found wide practical application across various fields.
The core idea of the YOLO model is to transform object detection into a regression problem, using a neural network that directly predicts the class and location of each object [20]. In contrast to conventional object detection algorithms, YOLO does not slide windows of varying sizes across the entire image. Instead, the image is divided into an S × S grid of cells, and each grid cell is responsible for detecting the objects whose centers fall within it. An input image passes through a convolutional neural network that produces an output tensor of dimensions S × S × (B × 5 + C), where B is the number of bounding boxes predicted per grid cell and C is the number of object classes. This output tensor can be interpreted as a three-dimensional matrix: the first and second dimensions correspond to the rows and columns of the grid, and the third dimension contains the predicted bounding boxes and class information for each grid cell. Each grid cell predicts B bounding boxes, and each bounding box consists of 5 elements: the center coordinates (x, y), the width, the height, and a confidence score. Furthermore, later YOLO versions typically use multiple detection layers, each responsible for objects of different sizes and aspect ratios, which enables the model to detect large, medium, and small objects and makes detection more adaptable [20].
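As an illustration of the S × S × (B × 5 + C) layout described above, the sketch below decodes a YOLOv1-style output tensor with NumPy. It assumes, as in the original YOLO formulation, that each box's (x, y) is an offset within its grid cell, that (w, h) is relative to the whole image, and that a box's class-specific score is its confidence times the cell's class probability; the channel ordering and the 0.5 score threshold are assumptions for illustration, and non-maximum suppression is omitted. It does not reproduce YOLOv5's anchor-based, multi-scale head. With this study's five classes, the hypothetical choice S = 7 and B = 2 gives a 7 × 7 × 15 tensor.

```python
import numpy as np


def yolo_v1_output_shape(S, B, C):
    """Shape of a YOLOv1-style output tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def decode_yolo_v1(pred, S, B, C, score_threshold=0.5):
    """Turn a YOLOv1-style tensor into (class_id, cx, cy, w, h, score) tuples.

    Assumed channel layout per cell: B blocks of [x, y, w, h, confidence], then C class
    probabilities. Returned box coordinates are normalized to [0, 1].
    """
    assert pred.shape == yolo_v1_output_shape(S, B, C)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]
            class_id = int(np.argmax(class_probs))
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[class_id]
                if score >= score_threshold:
                    cx, cy = (col + x) / S, (row + y) / S  # cell offset -> image coordinates
                    detections.append(
                        (class_id, float(cx), float(cy), float(w), float(h), float(score))
                    )
    return detections


if __name__ == "__main__":
    S, B, C = 7, 2, 5
    pred = np.zeros(yolo_v1_output_shape(S, B, C))
    pred[3, 4, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]  # first box of the cell in row 3, column 4
    pred[3, 4, B * 5 + 2] = 1.0                   # class 2 has probability 1 in that cell
    print(decode_yolo_v1(pred, S, B, C))
```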
The YOLO model has been continuously upgraded and improved from YOLOv1 to YOLOv5 [21]. Each version attempts to address the limitations of the previous ones and enhance performance. Among these versions, YOLOv5 performs best: compared with its predecessors, it offers faster inference and higher accuracy, making it more suitable for practical applications. The YOLOv5 architecture has three main parts, namely the backbone network, the neck network, and the detection head, as shown in Figure 5.
Detecting objects in the microscopic images was challenging because of their irregular shapes, which the conventional convolutional modules of the original YOLOv5 model struggled to handle. We therefore introduced DCN, which dynamically adjusts the sampling positions of the convolutional kernels during convolution and thereby improves feature capture. Specifically, we replaced all the standard convolutions in the backbone network of YOLOv5 with DCN modules.

2.5. Channel Attention Mechanism

The attention mechanism is a technique frequently employed in machine learning and deep learning, inspired by the human visual and perceptual systems. It directs a model to concentrate selectively on parts of the input data rather than ignoring them entirely or treating all inputs uniformly. By assigning different weights to different parts of the input according to their importance, attention mechanisms enable models to process information more efficiently. In this study, an attention mechanism plays a pivotal role in enhancing the microorganism detection capability of the YOLO model.
The Squeeze-and-Excitation (SE) module is a channel attention mechanism designed to amplify useful features and suppress irrelevant ones. It consists of three parts, squeeze, excitation, and reweight, as shown in Figure 6. The squeeze part compresses the global spatial information of each channel into a single value by a pooling operation, producing a global feature descriptor vector that summarizes the entire feature map. In the excitation part, fully connected layers use this descriptor to compute a weight for each channel; during training, the model thereby learns the importance of each channel. The output of the excitation part is a vector that indicates the influence of every channel. In the reweight part, the learned channel weights are used to rescale the corresponding channel feature maps. Adjusting the channel responses according to their weights directs more of the model's attention to useful features, so the reweighted feature maps carry more useful detail, which improves the performance and generalization of the model.
Concretely, the SE module operates as follows. First, global average pooling compresses the input feature map X from H × W × C into a 1 × 1 × C tensor. In the standard SE design, this tensor then passes through a fully connected layer, a ReLU non-linearity, and a second fully connected layer followed by a sigmoid activation, which normalizes it into C real numbers between 0 and 1. These values represent the importance of each channel, with values near 1 denoting high importance and values near 0 denoting low importance. Finally, each channel of X is multiplied by its weight to produce the output X′.
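The three SE steps can be sketched in NumPy as a forward pass over a single H × W × C feature map. The sketch follows the standard SE design (global average pooling, two fully connected layers with a ReLU between them and a channel reduction ratio r, then a sigmoid); the reduction ratio of 4 and the random weights in the example are assumptions, since the paper does not report them, and training of the weights is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_block(X, W1, b1, W2, b2):
    """Squeeze-and-Excitation forward pass for one feature map.

    X: (H, W, C); W1: (C, C // r); b1: (C // r,); W2: (C // r, C); b2: (C,).
    Returns the reweighted map X' and the channel weights s.
    """
    z = X.mean(axis=(0, 1))                 # squeeze: global average pooling -> (C,)
    hidden = np.maximum(0.0, z @ W1 + b1)   # excitation: FC layer + ReLU -> (C // r,)
    s = sigmoid(hidden @ W2 + b2)           # excitation: FC layer + sigmoid -> (C,), in (0, 1)
    return X * s, s                         # reweight: scale each channel of X by its weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, r = 8, 8, 16, 4
    X = rng.standard_normal((H, W, C))
    W1, b1 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C // r)
    W2, b2 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C)
    X_out, s = se_block(X, W1, b1, W2, b2)
    print(X_out.shape, s.round(3))
```

Because the reweighting broadcasts a length-C vector over the H × W positions, the output keeps the input shape, which is why an SE module can be inserted into an existing network without changing the layers around it.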

2.6. Transfer Learning

Transfer learning is an important concept in machine learning and deep learning. It refers to a learning method in which a model previously trained on one task (the source task) is applied to a different but related task (the target task) [22]. Transfer learning helps address problems in the target domain such as poor model generalization, leading to better learning outcomes. By technical approach, transfer learning methods can be classified into four types: sample-based, feature-based, model-based, and relation-based transfer [23]. In this paper, we applied model-based transfer, in which parameters are shared between the source model and the target model.
Three training approaches are commonly used in transfer learning. The first freezes all convolutional layers of the pre-trained model and trains only the fully connected layers. The second freezes some of the convolutional layers of the pre-trained model and trains the remaining convolutional layers and the fully connected layers. The third loads all the pre-trained parameters and trains the entire network [23]. In this research, we applied the second approach. First, we trained the entire improved YOLO model on large, publicly available datasets, such as the MAFF 239992 microorganism dataset. Next, we froze half of the convolutional layers in the feature extraction part of the pre-trained improved YOLO model, so that their weights were no longer updated in subsequent training. Finally, we trained the model on our custom activated sludge dataset, during which the weights of the unfrozen part of the feature extractor were updated. Figure 7 shows the transfer learning workflow.
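The freezing step can be illustrated with a framework-free NumPy sketch: a hypothetical backbone represented as a list of layers, a function that freezes the first half of them, and a gradient update that skips frozen layers. The paper does not say which half of the layers was frozen; freezing the earlier (shallower) layers, which tend to learn more generic features, is assumed here, and plain gradient descent stands in for the Adam optimizer for brevity. In PyTorch the same effect is usually obtained by setting `requires_grad = False` on the parameters of the frozen layers.

```python
import numpy as np


def make_backbone(n_layers, rng):
    """Hypothetical backbone: each layer holds one 3 x 3 weight array and a trainable flag."""
    return [
        {"name": f"conv{i}", "weight": rng.standard_normal((3, 3)), "trainable": True}
        for i in range(n_layers)
    ]


def freeze_first_half(layers):
    """Mark the first half of the layers as frozen so their weights are no longer updated."""
    for layer in layers[: len(layers) // 2]:
        layer["trainable"] = False


def sgd_step(layers, grads, lr):
    """Plain gradient-descent update applied only to trainable layers."""
    for layer, grad in zip(layers, grads):
        if layer["trainable"]:
            layer["weight"] -= lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    backbone = make_backbone(8, rng)  # stands in for weights pre-trained on the source data
    freeze_first_half(backbone)
    grads = [np.ones((3, 3)) for _ in backbone]  # placeholder gradients from the target data
    sgd_step(backbone, grads, lr=0.001)
    print([layer["name"] for layer in backbone if layer["trainable"]])
```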

2.7. Performance Measurement

The performance of the object detection models was evaluated using precision, recall, and the mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@0.5). The average precision (AP) summarizes the precision-recall curve of a single object class, and the mAP is the mean of the AP values over all object classes, computed here at an IoU threshold of 0.5.
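The IoU criterion behind mAP@0.5 can be sketched as follows: a predicted box counts as a match for a ground-truth box of the same class when their IoU reaches the threshold. Boxes are assumed to be given as (x1, y1, x2, y2) pixel corners, and the one-to-one matching of predictions to ground truths that a full evaluation performs is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough at this IoU threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    for pred in [(0, 0, 10, 10), (2, 0, 12, 10), (5, 0, 15, 10), (20, 20, 30, 30)]:
        print(pred, round(iou(pred, gt), 3), is_match(pred, gt))
```

A box shifted by 2 pixels still matches (IoU of about 0.67), whereas a box shifted by half its width does not (IoU of 1/3), which shows how the 0.5 threshold tolerates modest localization errors.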
Precision, recall, and mAP are defined as shown in Equations (7)–(9).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$
$$\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{9}$$
where TP is the number of true positives, FP the number of false positives, FN the number of false negatives, N the number of object classes, and $AP_i$ the average precision of class i.
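Equations (7)–(9) can be computed with the short sketch below. The per-class AP is taken as the area under the precision-recall curve obtained by ranking detections by confidence, using all-point interpolation (precision made monotonically non-increasing), which is one common convention; YOLOv5's own evaluation code may interpolate the curve differently, so values need not match exactly. Each detection is assumed to be pre-labelled as a true or false positive, for example by the IoU ≥ 0.5 rule.

```python
def precision_recall(tp, fp, fn):
    """Equations (7) and (8)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, n_ground_truth):
    """AP for one class from (confidence, is_true_positive) pairs, all-point interpolation."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the step curve: precision times each increase in recall.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


def mean_average_precision(per_class_ap):
    """Equation (9): mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    print(precision_recall(tp=8, fp=2, fn=2))
    ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], n_ground_truth=2)
    print(ap, mean_average_precision([ap, 1.0]))
```

In the example, the false positive ranked second lowers precision only until the next true positive is found, giving an AP of 5/6 for that class.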

3. Results

This section describes the experimental settings used in this study, including dataset splitting and model training settings. We trained several object detection models and compared their performance on the custom dataset.

3.1. Experimental Settings

The model was trained on the augmented custom dataset, which contains 1000 activated sludge micrographs annotated with five object classes. The dataset was split into training, testing, and validation sets in a ratio of 6:2:2. The model was trained for 300 epochs with a batch size of 16, a learning rate of 0.001, and the Adam optimizer.
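The split and training settings can be summarized in a small sketch. How the 1000 images were assigned to the subsets is not stated, so a seeded random shuffle is assumed here, and the file names are hypothetical placeholders; the hyperparameters are the ones reported above. With 600 training images and a batch size of 16, each epoch amounts to 38 iterations, the last batch holding 8 images.

```python
import math
import random

# Training configuration reported in the paper.
CONFIG = {"epochs": 300, "batch_size": 16, "learning_rate": 0.001, "optimizer": "Adam"}


def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle with a fixed seed and split into (train, validation, test) by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


if __name__ == "__main__":
    images = [f"sludge_{i:04d}.jpg" for i in range(1000)]  # hypothetical file names
    train, val, test = split_dataset(images)
    steps_per_epoch = math.ceil(len(train) / CONFIG["batch_size"])
    print(len(train), len(val), len(test), steps_per_epoch)
```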

3.2. Models Evaluation

To validate the effectiveness of data augmentation, the improved YOLO model was trained separately on the original dataset and on the augmented dataset. The numbers of labeled microorganisms in each dataset and the resulting model performance are shown in Table 1 and Table 2. The model trained on the augmented dataset performed better than the one trained on the original dataset: its precision, recall, and mAP@0.5 improved by 2.8%, 4.4%, and 3.9%, respectively.
Irregularly shaped objects such as rotifers and filamentous bacteria reduce model performance in detecting microorganisms in activated sludge micrographs. To enhance the detection of such objects, the YOLO model was improved with a DCN module and an SE module. The performance of the original YOLOv5 model was then compared with that of the improved YOLO model on the same dataset, as shown in Table 3. The improved YOLO model achieved higher precision, recall, and mAP@0.5 than the original YOLOv5 model, making it well suited to detecting activated sludge microorganisms. Specifically, the improved YOLO model increased precision by 5.0% and recall by 6.1% for rotifers, and precision by 5.7% and recall by 6.8% for filamentous bacteria. These results confirm that the DCN and SE modules enable the YOLO model to detect irregularly shaped objects more precisely.
The detection results of the improved YOLO model on the custom dataset are shown in Figure 8. They demonstrate that the improved YOLO model can effectively detect and classify microorganisms in microscopic images of activated sludge.

3.3. Comparison with Existing Models

To validate the advantages of the improved YOLO model, we compared it with existing models, namely R-CNN [24], Faster R-CNN [25], and SSD [26]. Table 4 shows that the improved YOLO model outperforms these models on activated sludge micrographs, with particularly large improvements for irregularly shaped objects such as rotifers and filamentous bacteria.

4. Conclusions

The activated sludge method is a commonly used and highly efficient wastewater treatment method that effectively removes harmful substances from wastewater. The microorganisms in activated sludge are crucial to this process, yet detecting them in micrographs consumes significant time and human resources in daily operation. This study therefore proposes a method for detecting microorganisms in activated sludge micrographs. To handle irregularly shaped microorganisms, we developed an improved YOLO model based on DCN and an attention mechanism.
Our study can be summarized as follows. First, we collected 650 activated sludge micrographs covering five object classes. Next, we preprocessed the micrograph dataset and applied DC-GAN data augmentation to the object classes with fewer instances. After augmentation and annotation, the custom dataset consisted of 1000 micrographs.
The results indicate that models trained on the augmented dataset outperform those trained on the original dataset, demonstrating the benefit of the DC-GAN augmentation technique. In the comparison with the original YOLOv5 model, the improved YOLO model based on DCN and an attention mechanism performed better, particularly in detecting irregularly shaped objects. In the final comparison with existing object detection models, the improved YOLO model achieved the best performance, validating the effectiveness of our improvements.
The performance of object detection models is affected by various factors, including the datasets, data preprocessing methods, network model structures, etc. Therefore, the model proposed in this paper may not be directly applicable to other application domains or datasets. However, the analytical methods used in this paper can provide insights for other object detection tasks in different fields.

Author Contributions

Conceptualization, Z.S. and Y.K.; methodology, Z.S. and Y.K.; software, Z.S.; validation, Z.S.; formal analysis, Z.S. and Y.K.; investigation, Z.S.; resources, Z.S. and Y.K.; data curation, Z.S.; writing—original draft preparation, Z.S. and Y.K.; writing—review and editing, Z.S. and Y.K.; visualization, Z.S.; supervision, Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available because they are part of ongoing research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Burzio, C.; Ekholm, J.; Modin, O.; Falås, P.; Svahn, O.; Wilén, B.-M. Removal of organic micropollutants from municipal wastewater by aerobic granular sludge and conventional activated sludge. J. Hazard. Mater. 2022, 438, 129528.
2. Shin, D.-C.; Yoon, S.-C.; Park, C.-H. Biological characteristics of microorganisms immobilization media for nitrogen removal. J. Water Process Eng. 2019, 32, 100979.
3. Boujelben, I.; Sabri, S.; van Pelt, J.; ben Makhlouf, M.; Gdoura, R.; Khannous, L. Functional selection of bacteria in an activated sludge reactor for application in saline wastewater treatment in Kerkennah, Tunisia. Int. J. Environ. Sci. Technol. 2021, 18, 1561–1578.
4. Liu, B.; Gao, X.; Zhang, H. BioSeq-Analysis2.0: An updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res. 2019, 47, e127.
5. Young, A.P.; Jackson, D.J.; Wyeth, R.C. A technical review and guide to RNA fluorescence in situ hybridization. PeerJ 2020, 8, e8806.
6. Bengtsson, S.; de Blois, M.; Wilén, B.-M.; Gustavsson, D. A comparison of aerobic granular sludge with conventional and compact biological treatment technologies. Environ. Technol. 2019, 40, 2769–2778.
7. Rani, P.; Kotwal, S.; Manhas, J.; Sharma, V.; Sharma, S. Machine learning and deep learning based computational approaches in automatic microorganisms image recognition: Methodologies, challenges, and developments. Arch. Comput. Methods Eng. 2022, 29, 1801–1837.
8. Hay, E.A.; Parthasarathy, R. Performance of convolutional neural networks for identification of bacteria in 3D microscopy datasets. PLoS Comput. Biol. 2018, 14, e1006628.
9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf (accessed on 12 October 2023).
10. Aydin, A.S.; Dubey, A.; Dovrat, D.; Aharoni, A.; Shilkrot, R. CNN based yeast cell segmentation in multi-modal fluorescent microscopy data. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 753–759.
11. Tao, J.; Chen, W.; Wang, B.; Jiezhen, X.; Nianzhi, J.; Luo, T. Real-time red tide algae classification using naive bayes classifier and SVM. In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18 May 2008; pp. 2888–2891.
12. Jalba, A.C.; Wilkinson, M.H.; Roerdink, J.B.; Bayer, M.M.; Juggins, S. Automatic diatom identification using contour analysis by morphological curvature scale spaces. Mach. Vis. Appl. 2005, 16, 217–228.
13. Zhang, J.; Li, C.; Yin, Y.; Zhang, J.; Grzegorzek, M. Applications of artificial neural networks in microorganism image analysis: A comprehensive review from conventional multilayer perceptron to popular convolutional neural network and potential visual transformer. Artif. Intell. Rev. 2023, 56, 1013–1070.
14. Al-Barazanchi, H.A.; Verma, A.; Wang, S. Performance evaluation of improved CNN for SIPPER plankton image calssification. In Proceedings of the 2015 Third International Conference on Image Information Processing (ICIIP), Waknaghat, India, 21–24 December 2015; pp. 551–556.
15. Wahid, M.F.; Ahmed, T.; Habib, M.A. Classification of microscopic images of bacteria using deep convolutional neural network. In Proceedings of the 2018 10th International Conference on Electrical and Computer Engineering (ICECE), Dhaka, Bangladesh, 20–22 December 2018; pp. 217–220.
16. Zhang, J.; Li, C.; Rahaman, M.M.; Yao, Y.; Ma, P.; Zhang, J.; Zhao, X.; Jiang, T.; Grzegorzek, M. A comprehensive review of image analysis methods for microorganism counting: From classical image processing to deep learning approaches. Artif. Intell. Rev. 2022, 55, 2875–2944.
17. Park, J.; Baek, J.; Kim, J.; You, K.; Kim, K. Deep Learning-Based Algal Detection Model Development Considering Field Application. Water 2022, 14, 1275.
18. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
19. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
20. Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P. ultralytics/yolov5: v3.0. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 12 October 2023).
21. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
22. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
23. Devan, K.S.; Walther, P.; von Einem, J.; Ropinski, T.; Kestler, H.A.; Read, C. Detection of herpesvirus capsids in transmission electron microscopy images using transfer learning. Histochem. Cell Biol. 2019, 151, 101–114.
24. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529.
25. Kim, J.-A.; Sung, J.-Y.; Park, S.-H. Comparison of Faster-RCNN, YOLO, and SSD for real-time vehicle type recognition. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Republic of Korea, 1–3 November 2020; pp. 1–4.
26. Tan, L.; Huangfu, T.; Wu, L.; Chen, W. Comparison of RetinaNet, SSD, and YOLO v3 for real-time pill identification. BMC Med. Inform. Decis. Mak. 2021, 21, 324.
Figure 1. Sample images of all object classes. (a) Sample microscopic image of activated sludge; (b) granular sludges; (c) epistylis; (d) rotifers; (e) filamentous bacteria; (f) granular sludge flocs.
Figure 2. The workflow of DC-GAN.
Figure 3. Image annotations.
Figure 4. The architecture of a 3 × 3 deformable convolution.
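Figure 4 depicts a 3 × 3 deformable convolution (Dai et al., reference 19): each of the nine kernel taps samples the input at its regular grid position plus a learned fractional offset, and the value at that non-integer location is obtained by bilinear interpolation, i.e., y(p0) = Σ w(pn) · x(p0 + pn + Δpn) over the nine grid positions pn. The single-channel Python sketch below illustrates only that sampling rule under stated assumptions: the function names are hypothetical, the offsets are passed in by hand rather than predicted by an extra convolutional layer as in the original design, and it is not the implementation used in the improved YOLO model. Library versions such as `torchvision.ops.deform_conv2d` additionally handle batches, channels and strides.

```python
import math

# Regular 3x3 sampling grid R, as (dy, dx) displacements around the output location p0.
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x, py, px):
    """Sample the 2D list `x` at fractional (py, px); positions outside the map read as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Each of the four neighbours is weighted by its linear distance in y and x.
                weight = max(0.0, 1 - abs(py - yy)) * max(0.0, 1 - abs(px - xx))
                total += weight * x[yy][xx]
    return total


def deform_conv3x3_at(x, weights, offsets, p0):
    """One output value of a single-channel 3x3 deformable convolution (hypothetical helper).

    x       : input feature map as a 2D list (H x W)
    weights : 9 kernel weights w(p_n), in GRID_3X3 order
    offsets : 9 (dy, dx) offsets, one per kernel tap (learned in a real network)
    p0      : (row, col) of the output location
    """
    return sum(
        w_n * bilinear(x, p0[0] + gy + oy, p0[1] + gx + ox)
        for w_n, (gy, gx), (oy, ox) in zip(weights, GRID_3X3, offsets)
    )


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ones = [1.0] * 9
    print(deform_conv3x3_at(x, ones, [(0.0, 0.0)] * 9, (1, 1)))  # 45.0, same as a plain 3x3 conv
    print(deform_conv3x3_at(x, ones, [(0.0, 0.5)] * 9, (1, 1)))  # 39.0, every tap shifted half a pixel right
```

With all offsets at zero the function reduces to an ordinary 3 × 3 convolution; shifting every tap half a pixel to the right lowers the sum because the rightmost taps now partly read the zero-padded border. In a deformable layer those offsets vary per location, which is what lets the receptive field follow irregular shapes such as filamentous bacteria.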
Figure 5. The architecture of YOLOv5.
Figure 6. The architecture of the squeeze-and-excitation (SE) module.
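The SE module in Figure 6 recalibrates channels in three steps: it squeezes each channel to a single number by global average pooling, passes that vector through a two-layer bottleneck (reduction ratio r) with ReLU and sigmoid activations, and multiplies each channel by the resulting weight. The pure-Python sketch below follows that recipe under illustrative assumptions: biases are omitted, the weights are supplied by hand instead of being learned, and the reduction ratio r = 2 in the usage example is hypothetical, not the value used in the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def se_block(feature_maps, w1, w2):
    """Squeeze-and-excitation recalibration of C channels (biases omitted).

    feature_maps : list of C channels, each a 2D list (H x W)
    w1           : (C // r) x C weights of the reduction layer
    w2           : C x (C // r) weights of the expansion layer
    Returns the reweighted channels and the per-channel scales.
    """
    # Squeeze: global average pooling collapses each channel to one descriptor z_c.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation of channel c by scales[c].
    out = [[[s * v for v in row] for row in ch] for s, ch in zip(scales, feature_maps)]
    return out, scales


if __name__ == "__main__":
    # Two 2x2 channels (C = 2) with a hypothetical reduction ratio r = 2 (C // r = 1).
    maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    _, s = se_block(maps, w1=[[1.0, 0.0]], w2=[[2.0], [-2.0]])
    print([round(v, 3) for v in s])  # [0.881, 0.119]
```

Because the scales lie in (0, 1), the block can only emphasize some channels relative to others; it does not change the spatial layout of the feature map, which is why it can be inserted into an existing backbone without altering tensor shapes.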
Figure 7. The workflow of transfer learning.
Figure 8. The detection results of the improved YOLO model.
Table 1. Number of labeled microorganisms in the original and augmented datasets.
| Dataset | Granular sludges | Granular sludge flocs | Filamentous bacteria | Rotifers | Epistylis |
|---|---|---|---|---|---|
| Original dataset | 2631 | 2484 | 803 | 811 | 2992 |
| Augmented dataset | 3500 | 3500 | 3500 | 3500 | 3500 |
Table 2. Model performance on the original and augmented datasets.
| Dataset | Precision | Recall | mAP@0.5 |
|---|---|---|---|
| Original dataset | 89.5% | 88.4% | 89.8% |
| Augmented dataset | 92.3% | 92.8% | 93.7% |
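The mAP@0.5 metric in Table 2 is the mean, over the five object classes, of the average precision (AP) obtained when a detection counts as correct only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The Python sketch below shows one common way to compute it, assuming greedy matching in descending confidence order and all-point interpolation of the precision-recall curve (the Pascal VOC convention from 2010 onward). The function names are hypothetical, and the evaluation code shipped with YOLOv5 may interpolate the curve differently, so this illustrates the definition rather than reproducing the reported numbers.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_thr=0.5):
    """AP of one class: detections are (score, box) pairs, gt_boxes a list of boxes."""
    if not gt_boxes:
        return 0.0
    matched, tp_flags = set(), []
    # Greedy matching in descending confidence: a detection is a true positive if it
    # overlaps a not-yet-matched ground-truth box with IoU >= iou_thr.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
        tp_flags.append(best_j is not None)
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / len(gt_boxes))
    # All-point interpolation: at each recall step use the best precision reached at
    # that recall or any higher recall, and sum the rectangles under that envelope.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        if r > prev_recall:
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
    return ap


def map50(per_class):
    """mAP@0.5: unweighted mean of the per-class APs at IoU threshold 0.5."""
    aps = [average_precision(dets, gts, iou_thr=0.5) for dets, gts in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 31, 31))]
    print(round(average_precision(dets, gt), 4))  # 0.8333
```

In the example, the second-ranked box overlaps nothing and is a false positive, while the third-ranked box is shifted by one pixel but still reaches an IoU of about 0.68, so it counts as a hit at the 0.5 threshold.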
Table 3. Performance of YOLOv5 and the improved YOLO model.
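| Model | Object class | Precision | Recall | mAP@0.5 (all classes) |
|---|---|---|---|---|
| YOLOv5 | Granular sludges | 92.3% | 90.7% | 89.4% |
| | Granular sludge flocs | 91.2% | 90.6% | |
| | Filamentous bacteria | 85.9% | 85.3% | |
| | Rotifers | 86.3% | 85.7% | |
| | Epistylis | 91.9% | 90.1% | |
| Improved YOLO | Granular sludges | 93.6% | 93.0% | 93.7% |
| | Granular sludge flocs | 92.1% | 93.2% | |
| | Filamentous bacteria | 91.6% | 92.1% | |
| | Rotifers | 91.3% | 91.8% | |
| | Epistylis | 93.3% | 93.9% | |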
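Table 3 reports precision and recall per class but a single mAP@0.5 per model. The short Python calculation below, a sketch using values transcribed from Table 3 (the variable names are our own), computes the unweighted (macro) averages of the per-class values and the per-class gains. It shows that the improved model raises both precision and recall for every class, with the largest gains for filamentous bacteria (5.7 and 6.8 percentage points) and rotifers (5.0 and 6.1 points), alongside the 4.3-point increase in mAP@0.5.

```python
# Per-class (precision %, recall %) transcribed from Table 3.
TABLE3 = {
    "YOLOv5": {
        "Granular sludges": (92.3, 90.7),
        "Granular sludge flocs": (91.2, 90.6),
        "Filamentous bacteria": (85.9, 85.3),
        "Rotifers": (86.3, 85.7),
        "Epistylis": (91.9, 90.1),
    },
    "Improved YOLO": {
        "Granular sludges": (93.6, 93.0),
        "Granular sludge flocs": (92.1, 93.2),
        "Filamentous bacteria": (91.6, 92.1),
        "Rotifers": (91.3, 91.8),
        "Epistylis": (93.3, 93.9),
    },
}
MAP50 = {"YOLOv5": 89.4, "Improved YOLO": 93.7}  # one mAP@0.5 value per model


def macro_average(per_class):
    """Unweighted mean of precision and recall over the object classes."""
    n = len(per_class)
    return (sum(p for p, _ in per_class.values()) / n,
            sum(r for _, r in per_class.values()) / n)


def per_class_gain(base, new):
    """Percentage-point change in (precision, recall) for each class."""
    return {c: (new[c][0] - base[c][0], new[c][1] - base[c][1]) for c in base}


gains = per_class_gain(TABLE3["YOLOv5"], TABLE3["Improved YOLO"])
map_gain = MAP50["Improved YOLO"] - MAP50["YOLOv5"]

if __name__ == "__main__":
    for model, per_class in TABLE3.items():
        p, r = macro_average(per_class)
        print(f"{model}: precision {p:.2f}%, recall {r:.2f}%")
    print(f"mAP@0.5 gain: {map_gain:.1f} points")  # 4.3 points
    for cls, (dp, dr) in gains.items():
        print(f"{cls}: +{dp:.1f} precision, +{dr:.1f} recall")
```

The macro averages for the improved model (about 92.4% precision and 92.8% recall) are close to the augmented-dataset row of Table 2; small differences of this kind can arise from rounding or from weighting classes by their number of instances rather than equally.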
Table 4. Comparison of the improved YOLO model with existing methods.
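| Model | Object class | Precision | Recall | mAP@0.5 (all classes) |
|---|---|---|---|---|
| R-CNN | Granular sludges | 82.6% | 81.0% | 83.5% |
| | Granular sludge flocs | 80.1% | 79.4% | |
| | Filamentous bacteria | 74.6% | 73.8% | |
| | Rotifers | 76.3% | 77.2% | |
| | Epistylis | 83.4% | 82.7% | |
| Faster R-CNN | Granular sludges | 87.3% | 86.9% | 87.6% |
| | Granular sludge flocs | 85.4% | 84.3% | |
| | Filamentous bacteria | 81.8% | 80.9% | |
| | Rotifers | 82.3% | 82.7% | |
| | Epistylis | 86.8% | 84.2% | |
| SSD | Granular sludges | 90.8% | 91.1% | 91.3% |
| | Granular sludge flocs | 90.6% | 90.2% | |
| | Filamentous bacteria | 88.1% | 89.0% | |
| | Rotifers | 87.3% | 88.0% | |
| | Epistylis | 91.7% | 90.2% | |
| Improved YOLO | Granular sludges | 93.6% | 93.0% | 93.7% |
| | Granular sludge flocs | 92.1% | 93.2% | |
| | Filamentous bacteria | 91.6% | 92.1% | |
| | Rotifers | 91.3% | 91.8% | |
| | Epistylis | 93.3% | 93.9% | |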
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kong, Y.; Shen, Z. Microorganism Detection in Activated Sludge Microscopic Images Using Improved YOLO. Appl. Sci. 2023, 13, 12406. https://doi.org/10.3390/app132212406

AMA Style

Kong Y, Shen Z. Microorganism Detection in Activated Sludge Microscopic Images Using Improved YOLO. Applied Sciences. 2023; 13(22):12406. https://doi.org/10.3390/app132212406

Chicago/Turabian Style

Kong, Yueping, and Zhiyuan Shen. 2023. "Microorganism Detection in Activated Sludge Microscopic Images Using Improved YOLO" Applied Sciences 13, no. 22: 12406. https://doi.org/10.3390/app132212406

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
