Article

An Improved YOLOv5 Algorithm for Drowning Detection in the Indoor Swimming Pool

1 School of Aeronautics and Astronautics, Tiangong University, Tianjin 300387, China
2 School of Mechanical Engineering, Tiangong University, Tianjin 300387, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(1), 200; https://doi.org/10.3390/app14010200
Submission received: 17 November 2023 / Revised: 6 December 2023 / Accepted: 22 December 2023 / Published: 25 December 2023

Abstract

To mitigate the risk of irreversible drowning injuries, this study introduces an enhanced YOLOv5 algorithm aimed at improving the efficacy of indoor swimming pool drowning detection and facilitating the timely rescue of endangered individuals. To simulate drowning and swimming positions accurately, four swimmers were deliberately chosen and observed, with monitoring conducted by drones flying above the swimming pool. The study was approved by the ethics committee of our institution (registration number 2022024). The images captured by the drones were carefully screened, and only those deemed suitable were selected to construct the self-made dataset, comprising a total of 8572 images. Furthermore, two enhancements were implemented in the YOLOv5 algorithm. First, an improved coordinate attention (ICA) module, derived from the coordinate attention (CA) module, was added to strengthen category classification and the localization of water behavioral postures. Second, the PAN module was replaced with the bi-directional feature pyramid network (BiFPN). The improved YOLOv5 algorithm was then trained on the self-made dataset. Evaluation of the algorithm's performance revealed a detection precision of 98.1%, a recall of 98.0%, and a mean average precision (mAP) of 98.5%. The improved YOLOv5 algorithm surpasses the original YOLOv5 algorithm in recognition accuracy for instances of drowning.

1. Introduction

Drowning poses a major public safety concern, claiming over 230,000 lives annually according to a report by the World Health Organization (WHO). Children between the ages of 1 and 4 have the highest drowning rates, followed by children between the ages of 5 and 9 [1]. In China, drowning is responsible for over 59,000 deaths annually, with more than 95% being minors, as reported by the People's Daily Online Opinion Data Center. The proportion of minors dying from drowning has risen year after year, drawing significant public attention to the issue. When a drowning incident occurs, it often results in irreversible injuries [2].
Indoor swimming pools exhibit a higher prevalence of drowning incidents [3]. The lack of adequate safety facilities in swimming pools is a significant contributor to drownings. One study revealed that sixty-eight percent of the childhood drownings investigated occurred in pools lacking four-sided fencing, which was associated with an almost twofold increase in risk [4]. Another study examined the impact of proper fencing around outdoor swimming pools on drowning incidents, finding that if all residential pools had proper fencing, 19% of pool-related drownings among children under 5 years old might have been prevented [5]. Proper pool fencing serves as a preventive measure, restricting a child's access to a swimming pool in the absence of a responsible adult [5]. Additionally, fatal drownings may also be caused by swimming pool drainage systems [6], necessitating constant lifeguard surveillance. However, a lifeguard's visual search task is difficult: it becomes more challenging as the number of swimmers increases, causing visual clutter within the supervision zone and delaying reaction times [7]. Swimmers engage in various activities, such as chatting, floating, and deliberate submersion, leading to increased processing and judgment times [8]. Continuous water surface monitoring can induce dizziness in lifeguards, compromising the reliability of their supervision [9]. Several wearable devices have emerged to assess swimmers' behaviors, such as a proposed wearable pulse oximeter designed for swimming pool safety [10]. This device collects real-time data, including pulse oximetry, via sensors to analyze a swimmer's current condition. Researchers have also proposed a wearable self-rescue alarm system that monitors children's heart rates, water depth, and immersion time to promptly detect drowning accidents [11]. However, one drawback of wearable devices is their potential impact on the overall swimming experience. Additionally, wearable devices rely on methods such as the transmission of acoustic signals, which present inevitable challenges: swimming pools suffer from severe multipath propagation of underwater acoustic signals, which can weaken transmission [12], and strong noise in a pool can hinder the detection and accurate positioning of emergency signals, leading to positioning deviations [12]. Hence, there is a pressing need for an automated, unsupervised drowning detection and identification system.
Drowning recognition has recently emerged as a significant application area for machine learning. Previous research predominantly relied on various cameras, including wall-mounted cameras, overhead cameras, and underwater cameras, to capture imagery within pool environments [13,14]. Hasan et al. [15] introduced a water behavior dataset captured both above and below the water, comprising a water surface dataset and an underwater dataset for drowning detection. Their analysis and experiments reveal that models perform better on the water surface dataset than on the underwater dataset. In real-time water surface image analysis, expedited identification of drowning signs facilitates early detection, potentially saving valuable rescue time, minimizing submersion duration, and decreasing the number of drowning accidents. In practical rescue scenarios, optimal outcomes are achieved when the duration of submersion remains below six minutes [16,17]. Incorporating deep learning techniques thus enhances the effectiveness and feasibility of drowning detection.
While common, the deployment of wall-mounted cameras, overhead cameras, and underwater cameras is not without limitations. The fixed angles of these cameras may constrain the capture of comprehensive and adaptable visual data. In contrast, drones with aerial perspectives present a promising solution to overcome these limitations, providing increased flexibility. Concurrently, some existing drowning detection models exhibit low accuracy, rendering them inadequate for practical drowning detection and rescue applications. Claesson et al. [18] demonstrated the practical and efficient use of uncrewed aerial vehicles (UAVs) and online machine learning models to detect simulated drowning victims. However, a potential drawback is that such an online machine learning model can be trained on only a limited amount of data, which restricts its precision. Seguin et al. [19] advocated for the use of UAVs in sea rescue missions, enhancing the quality and speed of rescue operations while keeping lifeguards away from perilous sea conditions. An enhanced Mask R-CNN algorithm [20] was introduced for timely detection of drowning persons within swimming pools, achieving a detection rate of 94.1% with a false detection rate of 5.9%. A drowning risk detection method based on YOLOv4 was proposed to establish an effective early-warning system [13], attaining a precision of 80.84% in identifying drowning categories. A video-based drowning detection system [21], rooted in background scene modeling, generated a large number of parameters, resulting in a red-alert delay of 8.4 to 12.5 s from the onset of a crisis and wasting valuable rescue time. These examples highlight the potential of drones as replacements for wall cameras in drowning detection, yet their classification accuracy for water behaviors remains insufficient, diminishing their value in drowning detection and rescue tasks. Drowning detection models with high accuracy are needed.
This paper proposes an improved YOLOv5 algorithm to increase the effectiveness of drowning detection, achieving a peak precision of 98.1%. To assess the improved algorithm's accuracy, drones were employed to monitor swimming pools and create a self-made dataset. Experiments on this dataset demonstrate that the improved YOLOv5 algorithm yields better drowning detection results, with higher accuracy than the original YOLOv5 algorithm.
In summary, the contributions of this paper are as follows: (1) proposing an improved coordinate attention (ICA) module and an improved YOLOv5 algorithm to augment drowning detection effectiveness and accuracy; (2) utilizing a self-made dataset created with drones to evaluate the algorithm's accuracy; and (3) demonstrating via experiments on the self-made dataset that the improved YOLOv5 algorithm achieves superior drowning detection results with higher accuracy than the original YOLOv5 algorithm, and that the ICA module outperforms the CA module.

2. Literature Review

With the rapid development of the economy and technology, drones are becoming increasingly compact and accessible to civilians. Their applications have grown significantly, and they are widely employed for efficient item transportation and effective target detection. Budiharto et al. [22] utilized drones to deliver vital medical assistance to emergency patients, demonstrating that drone-based item transport is fast and efficient. Seguin et al. [19] employed drones to swiftly and safely deliver flotation devices to simulated victims, greatly enhancing the quality and speed of first aid. Additionally, Çetin et al. [23] utilized guard drones to capture images of random areas and applied artificial intelligence for object detection, accomplishing the identification and classification of malicious drones flying in the airspace. Furthermore, Claesson et al. [18] proposed using drones along with online machine learning models to detect simulated drowning victims, demonstrating the feasibility of this approach.
Target-detection algorithms are advancing rapidly, with the YOLOv5 algorithm, a classical method, finding widespread use across various detection applications. He et al. [24] conducted a study to explore the advantages of Faster R-CNN and a series of YOLOv5 algorithms, aiming for swift and accurate detection of infant drowning in real-world scenarios. Furthermore, Ellen et al. [25] utilized the YOLOv5 algorithm to search for submerged victims but encountered limitations in accuracy. On a different note, Xu et al. [26] and Xue et al. [27] enhanced the YOLOv5 algorithm specifically for detecting forest fires, resulting in an improved algorithm that performs well in fire detection.

3. Methods

3.1. ICA Module and BiFPN Mechanism

In contrast to channel attention, which only reweighs the importance of different channels, the coordinate attention (CA) module also encodes spatial information [28,29]. This enables the CA module to locate the object of interest more precisely, enhancing the model's overall recognition capability. Our research identifies that targets on the surface of swimming pools are prone to deformation and posture changes, producing more non-linear features and increasing the complexity of recognition. The sigmoid-weighted linear unit (SiLU) is beneficial for capturing such non-linear characteristics and accounting for non-linear factors, facilitating the capture of spatial dependencies and target position information [30]. Integrating the CA module with SiLU therefore helps the model recognize deformed targets and changing postures. A convolutional layer, a batch normalization layer, and the SiLU activation function further activate and integrate the features, yielding the proposed ICA module shown in Figure 1a. The ICA module contributes to enhanced feature learning and increased attention to the feature information of swimmers' behaviors. Its lightweight design suits mobile networks, and its capacity to model non-linear factors improves the learning of deformed targets and posture changes.
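As a concrete illustration, the sketch below shows one plausible PyTorch form of the ICA module: coordinate attention [28] with its ReLU activation swapped for SiLU. The reduction ratio, layer sizes, and class name are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ICA(nn.Module):
    """Sketch of the improved coordinate attention (ICA) module:
    coordinate attention with its ReLU activation replaced by SiLU."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # "X Avg Pool": average over the width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # "Y Avg Pool": average over the height
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()  # the key ICA change: SiLU instead of ReLU
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                      # attention along the height
        a_w = self.conv_w(y_w.permute(0, 1, 3, 2)).sigmoid()  # attention along the width
        return x * a_h * a_w  # reweight features with positional attention
```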
The bi-directional feature pyramid network (BiFPN), depicted in Figure 1b, integrates bidirectional cross-scale connections and fast normalized fusion [31]. It introduces learnable weights to learn the importance of different input features. BiFPN seamlessly fuses the original feature information from the backbone feature extraction, mitigating the loss of original feature information. Consequently, it achieves higher-level feature fusion, leading to a further improvement in algorithmic accuracy.
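For intuition, the fast normalized fusion at the core of a two-input BiFPN fusion node (BIF-Add2) can be sketched as follows. This mirrors the standard BiFPN formulation [31]; treating the paper's BIF-Add2 unit as exactly this form is an assumption.

```python
import torch
import torch.nn as nn

class BifAdd2(nn.Module):
    """Sketch of a two-input BiFPN fusion node using fast normalized fusion
    with learnable non-negative weights. A convolution typically follows the
    weighted sum; it is omitted here for brevity."""

    def __init__(self, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))  # learnable importance of each input
        self.eps = eps

    def forward(self, x0, x1):
        w = torch.relu(self.w)         # keep the weights non-negative
        w = w / (w.sum() + self.eps)   # fast normalized fusion
        return w[0] * x0 + w[1] * x1   # inputs must share the same shape
```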
In Figure 1a, the structure of the ICA module is depicted, highlighting areas of improvement. The “X Avg Pool” represents average pooling in the X direction, while the “Y Avg Pool” indicates average pooling in the Y direction. “Conv2d” denotes standard convolution, and “BatchNorm” refers to batch normalization. Figure 1b illustrates the structure of the BiFPN. The top-down blue lines signify the transmission of high-level semantic features, while the bottom-up green lines convey precise positional information. The purple line represents the fusion of peer characteristic layers [32]. “Layer 1–3” refers to the feature layers, and “output 1–3” denotes the resulting feature layers. “BIF-Add2” signifies the characteristic fusion layer combining two inputs, whereas “BIF-Add3” combines three inputs.

3.2. Improved YOLOv5 Algorithm

YOLOv5, a one-stage object detection algorithm, was developed by the Ultralytics LLC team. Compared to other networks of similar size, YOLO demonstrates superior performance [26]. Moreover, the YOLOv5s model has a compact memory footprint [26]. This paper adopts the YOLOv5s architecture to facilitate future deployment on embedded devices. As shown in Figure 2, the structure of the improved YOLOv5 algorithm is divided into four parts: Input, Backbone, Neck, and Output [26,27]. The Input encompasses Mosaic data augmentation, adaptive anchor frame calculation, and adaptive image scaling. The Backbone network comprises Conv units, C3 units, and a spatial pyramid pooling-fast (SPPF) unit [27]. The Conv unit sequentially performs a convolution, batch normalization, and the SiLU activation function. The C3 unit divides the input tensor equally into two branches: one branch passes through a Conv unit and then through multiple Resunits [33], each comprising a residual structure (Figure 2), while the other performs a direct convolution. The outputs of the two branches are then concatenated and passed through a final Conv unit. The Neck primarily integrates the ICA module, BiFPN, Conv units, and C3 units. The BiFPN comprises BIF-Add2, BIF-Add3, and other units, with "BN" representing batch normalization. In contrast, the Neck of the original YOLOv5 algorithm uses the path aggregation network (PAN) [34]. In the Output, three detection layers are dedicated to identifying large, medium, and small targets.
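For reference, a minimal PyTorch sketch of the Conv unit named above (convolution, then batch normalization, then SiLU) might look as follows; the class name and default kernel size are illustrative assumptions.

```python
import torch.nn as nn

class ConvUnit(nn.Module):
    """Sketch of YOLOv5's Conv unit: Conv2d -> BatchNorm -> SiLU."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```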
However, when applying YOLOv5s to detect drowning targets on the surface of swimming pools, certain limitations become apparent, such as its restricted capability to handle target deformation and posture changes. These factors impede detection efficacy and increase the difficulty of the task. Overcoming these issues while preserving the algorithm's efficiency is a significant challenge in drowning detection. Hence, this study introduces an improved YOLOv5 algorithm specifically designed for water surface target detection, as illustrated in Figure 2. Its key innovations are (1) inserting the ICA module, which enhances the model's feature extraction capability and facilitates learning of deformed targets and posture changes, and (2) replacing the PAN with the bi-directional feature pyramid network (BiFPN). Integrating BiFPN into the YOLOv5 model enables more robust feature fusion in the Neck, strengthens connections between features across different layers, and notably improves the model's recognition accuracy [27,35].

3.3. Self-Made Dataset

Pertinent public datasets addressing drowning incidents are scarce. To assess the accuracy of the improved YOLOv5 algorithm, this study therefore created a self-made dataset via simulated drowning scenarios, as illustrated in Figure 3. Using drones, the project recorded videos and extracted images at predefined intervals to form the self-made dataset. Drowning behavior is characterized by involuntary movements, encompassing lateral arm extension and struggling at the water's surface [36].
Four healthy college students (mean age ± standard deviation: 21.2 ± 1.0 years; BMI: 22.1 ± 1.2) willingly participated in the study to simulate drowning (refer to Figure 3a). The study was approved by the Human Research Ethics Committee for Non-Clinical Faculties of the School of Aeronautics and Astronautics, Tiangong University (registration number 2022024). The experimental procedures and precautions were explained verbally to all subjects the day before the experiment, and each provided written informed consent. Participants abstained from alcohol, coffee, and vigorous exercise for 24 h before the experiment. All subjects actively contributed to the design, execution, reporting, or dissemination plans of our research. To ensure dataset diversity, various postures were included: treading water, breaststroke, backstroke, freestyle, and group configurations with two, three, and four individuals (refer to Figure 3b–h). The study took place in a swimming pool under normal lighting conditions, using a DJI Mini3pro drone manufactured by Shenzhen Dajiang Innovation Technology Co., Ltd. (Shenzhen, China) (refer to Figure 3i).
Our self-made dataset was then established from the videos captured by the drones. One image was extracted every five frames, and the LabelImg tool, a graphical image annotation tool developed in Python, was employed to label the images. The resulting dataset comprises 8572 labeled images representing drowning, treading water, and swimming. To assess the performance of the improved YOLOv5 algorithm, 7000 images were randomly chosen for the training set, and the remaining 1572 images were allocated to the validation set. The distribution of the positions and aspect ratios of the labeled targets (Figure 4) shows that the targets are widely distributed yet concentrated in the middle region of the image, with aspect ratios mainly falling within 0.3 of the input image size.
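A minimal sketch of this extraction and split procedure is shown below. The file paths, naming scheme, and OpenCV-based approach are illustrative assumptions; only the every-fifth-frame stride and the 7000/1572 split come from the text.

```python
import random
from pathlib import Path

import cv2

def extract_frames(video_path, out_dir, stride=5):
    """Save every `stride`-th frame of a drone video as a JPEG."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(str(out / f"{Path(video_path).stem}_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()

# Random 7000/1572 train/validation split over the 8572 labeled images.
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)
train_set, val_set = images[:7000], images[7000:]
```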

4. Experiments

4.1. Experimental Environment and Configuration

In this study, experiments were carried out on a Windows 10 system equipped with an Intel(R) Xeon(R) W-2223 CPU (Intel, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 3060 GPU (NVIDIA, Santa Clara, CA, USA). The PyTorch 1.13 deep learning framework was employed for model implementation. The software environment encompassed CUDA 11.7 and Python 3.9. The total number of training epochs was set to 100. Specific experimental parameters are listed in Table 1.
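Under these settings, a training run might be launched as sketched below, assuming the working directory is a clone of the ultralytics/yolov5 repository containing the modified model definition; the dataset and model YAML file names are placeholders.

```python
# train.py from a local clone of the ultralytics/yolov5 repository,
# assumed to contain the modified Neck; YAML file names are placeholders.
import train

train.run(
    data="drowning.yaml",          # hypothetical dataset definition
    cfg="yolov5s_ica_bifpn.yaml",  # hypothetical improved-model config
    weights="yolov5s.pt",
    imgsz=640,
    batch_size=16,
    epochs=100,
    workers=5,
    optimizer="SGD",  # momentum 0.937 and learning rate 0.01 come from the hyp YAML
)
```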

4.2. Evaluation Indicators

To assess the recognition capability of the proposed model, three metrics were employed: precision, recall, and mean average precision (mAP).
$$\mathrm{precision} = \frac{TP}{TP + FP}$$

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
where N is the number of categories and AP_i is the average precision of category i, i.e., the area under its precision–recall curve. Two variants of mAP are employed: mAP@0.5 and mAP@0.5:0.95. mAP@0.5 averages the AP values of all categories at an intersection-over-union (IOU) threshold of 0.5. For mAP@0.5:0.95, the IOU threshold ranges from 0.5 to 0.95 in increments of 0.05, and the resulting values are averaged. Sample classification in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) is detailed in Table 2.
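As a minimal illustration of these definitions, the following sketch computes the IOU of a box pair and precision/recall from raw counts. It is a simplified stand-in for a full mAP evaluation, which additionally requires ranking detections by confidence and integrating the precision–recall curve.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts, per the equations above."""
    return tp / (tp + fp), tp / (tp + fn)

# mAP@0.5:0.95 averages AP over the ten IOU thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = [0.5 + 0.05 * i for i in range(10)]
```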

4.3. Comparison of Detection Results on the Self-Made Dataset

Table 3 presents a comparative analysis between the improved YOLOv5 algorithm and the original YOLOv5 algorithm on the self-made dataset. The improved YOLOv5 algorithm exhibited superior precision (98.1%) compared to the original, with slightly lower recall (98.0%) and mAP@0.5 (98.5%). Relative to the original YOLOv5 algorithm, the changes were +1.3%, −0.1%, −0.4%, and +0.1% in precision, recall, mAP@0.5, and mAP@0.5:0.95, respectively. Noteworthy is the simultaneous increase in the number of parameters by 253,857. Despite this parameter increment, the improved YOLOv5 algorithm's precision displayed a significant enhancement.
Furthermore, the modest decreases in recall and mAP values, along with the parameter increase, were all within acceptable ranges, and the improvements in detection accuracy are worthwhile. The improved YOLOv5 model excels in achieving higher precision in drowning detection, aligning with the demand for heightened precision in this specific task.

4.4. Ablation Experiments

To validate the efficacy of our enhancements, we conducted an ablation experiment on the object detection network based on YOLOv5s. The Neck network of YOLOv5s was augmented with CA, ICA, and BiFPN. The results of the ablation experiments on the self-made dataset are presented in Table 4. Compared to the baseline YOLOv5s network, ICA yielded a 0.8% improvement in precision with a marginal decrease of 0.3% in mAP@0.5, whereas CA yielded a 0.5% improvement in precision with the same 0.3% reduction in mAP@0.5. These outcomes demonstrate the effectiveness of the ICA improvement. ICA also increased the parameter count by 1.9%. The improved YOLOv5 demonstrated a noteworthy 1.3% enhancement in precision at the cost of a 0.2 ms increase in inference time and a 3.6% increase in parameters, with minimal change in mAP@0.5:0.95. Considering the marginal decrease in mAP@0.5 and the modest increase in inference time, the improvements in detection accuracy are worthwhile.
The optimal placement of improvements significantly influences model performance. To assess the impact of introducing the ICA and BiFPN in the backbone section on model performance, an identical ablation experiment was conducted using YOLOv5s. As depicted in Table 5, incorporating ICA and BiFPN in the backbone results in performance enhancements. Specifically, in comparison to the original YOLOv5s, ICA demonstrates a 0.7% improvement in precision with a 5.9% increase in parameters. The combination of ICA and BiFPN leads to a 1.0% improvement in precision, accompanied by an 8.1% increase in parameters. There is minimal change in mAP@0.5:0.95, but a marginal decrease of 0.3% is observed in mAP@0.5. This ablation experiment confirms that incorporating ICA and BiFPN in the Neck section achieves the highest precision improvement, outperforming their insertion in the backbone section.
To assess the accuracy of the improved YOLOv5, we employ the commonly utilized confusion matrix for evaluation, as illustrated in Figure 5 on the self-made dataset. The diagonal entries signify the probability of the model correctly detecting a specific class within the self-made dataset, aligning with its true label. The off-diagonal areas denote false or missed detections. The results indicate that the improved YOLOv5 exhibits an exceptionally low error detection rate, particularly excelling in the identification of drowning incidents, showcasing its value in the field of rescue operations.
Figure 6 illustrates the box loss, which represents the deviation between the true anchor boxes and the predicted anchor boxes, along with the precision curve during the training of the improved YOLOv5. The box loss convergence curve decreases gradually and smoothly, stabilizing around the 90th epoch. The precision curve fluctuates from the 15th to the 50th epoch and stabilizes gradually thereafter, particularly beyond roughly the 60th epoch. This behavior signifies the stability of the improved YOLOv5, highlighting its robustness and applicability to drowning detection tasks.
Grad-CAM, a technique for generating "visual explanations" of decisions made by Convolutional Neural Network (CNN)-based models [37], highlights the regions of an image on which a prediction concentrates. As depicted in Figure 7, we apply Grad-CAM to four models: Figure 7a–d correspond to the original YOLOv5, the original YOLOv5 with CA, the original YOLOv5 with ICA, and the improved YOLOv5, respectively. Deeper shades of red indicate stronger model attention. All four models focus on regions associated with water behaviors, yet subtle differences exist. In Figure 7a, the highlighted area is smaller and fails to cover the target completely. In Figure 7b, the hand area is more prominently covered in the upper image, while the lower image differs little from Figure 7a. In Figure 7c, the upper image remains largely unchanged from Figure 7b, but in the lower image the attention extends further along the body edge. In contrast, Figure 7d shows the widest target coverage in the upper image, and in the lower image the brightest region aligns precisely with the center of the target, fully covering the head area. The improved YOLOv5 algorithm thus outperforms the other three models in classification performance.
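For reference, a minimal Grad-CAM computation [37] can be sketched with forward and backward hooks as below. This is an illustrative stand-in for the visualization pipeline, not the authors' code, and it assumes the wrapped model returns a plain score tensor; a YOLO detection head would need a small adapter that selects a single box or class score.

```python
import torch

def grad_cam(model, target_layer, image):
    """Minimal Grad-CAM sketch after Selvaraju et al. [37]: weight the chosen
    layer's activations by the gradient of the top score."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image).max()                         # strongest prediction score
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # globally pooled gradients
    cam = torch.relu((weights * acts[0]).sum(dim=1))   # weighted activation map
    return cam / (cam.max() + 1e-8)                    # normalize to [0, 1]
```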

5. Limitations

The pool of simulation personnel is fixed, and their age range is limited. Incorporating additional sets of personnel photographs could enhance the model's performance, and a more extensive dataset featuring photos from diverse settings has the potential to improve both sensitivity and specificity. It is important to note that the postures of simulated personnel in the self-made dataset may differ from those of individuals in real rescue situations who are conscious and in distress. Factors such as pool-bottom conditions, lighting, and the attire of personnel in the water may also affect the results.
It should also be recognized that the use of drones may face constraints, including power shortages and equipment resource limitations. Policies and regulations concerning drones vary by region: some areas restrict drone flights during specific times or in particular zones for safety or protection reasons, so familiarity and compliance with local laws and restrictions are necessary before using a drone. Some regions also place a high emphasis on privacy, necessitating consent regarding drone surveillance before use. Our experiments were conducted in public settings where no restrictions on drone use or privacy concerns applied. Outdoors, weather conditions such as high winds can limit drone operation, and drones with strong wind resistance are necessary to handle such conditions. In future research, we will focus on two main areas of work. First, the application scenarios will expand from swimming pools to outdoor areas and beaches, encompassing individuals of different age groups. Second, the model will undergo lightweight processing to use memory and storage efficiently, facilitating its application and deployment; the limited memory and small size of some peripheral devices could otherwise restrict the application of our research. After lightweighting the model, drowning detection can be achieved by transferring data to a control system.

6. Conclusions

This article introduces a novel self-made dataset obtained via drone-based data collection specifically designed for drowning detection. The dataset comprises two broad categories: drowning and non-drowning postures (Figure 3). While previous research primarily relied on fixed cameras, such as wall-mounted cameras, overhead cameras, and underwater cameras, to capture imagery within pool areas [13,14], fixed cameras inherently suffer from a rigid perspective that makes them sensitive to variations in distance, which readily leads to challenges such as target deformation and posture changes. Our study reveals the exceptional flexibility of drones in tracking swimmers and efficiently collecting surface data from swimming pools. Compared to manual supervision, drone-based aerial surveys save time and provide more convenient, expedited access to critical swimming posture data. The aerial perspective not only facilitates the identification of high-risk drowning behaviors, such as near-vertical body postures or ineffective movements, but also enhances drowning detection accuracy, thereby improving the quality and efficacy of rescue efforts.
Additionally, the original YOLOv5 inadequately addressed water behavior deformation and posture changes, necessitating the integration of an attention module. The coordinate attention (CA) module effectively enhances the representation of pertinent objects without imposing substantial computational complexity, making it an ideal candidate for integration with conventional networks. Previous studies have successfully employed the CA module to improve detection accuracy and robustness across various tasks. For example, adding the CA module to vehicle detection improved accuracy to almost 95% when used with the BiFPN structure for enhanced feature fusion [31]. Similarly, the CA module's position information enabled accurate detection of dense crowds by obtaining the effective width and height of the feature layer image [38]. In sea vessel detection, the CA module was applied to emphasize important ship features, suppress irrelevant information, and increase the model's robustness to scale changes and noise interference, effectively improving overall performance [39]. However, the CA module pays insufficient attention to the non-linear characteristics of water activity postures. The ICA module therefore combines the CA module with the SiLU activation function. The self-stabilizing SiLU activation function outperforms the ReLU activation function [30]; substituting ReLU with SiLU in the CA module enhances the processing of important non-linear features and the attention paid to postures. The ICA module not only captures cross-channel, direction-aware, and position-sensitive information like the CA module but also enhances non-linear sensitivity, helping the model locate and identify objects of interest more accurately and improving target recognition accuracy. Incorporating the ICA module to sharpen attention to important feature details in swimming postures increased the model's precision by 0.8%, as demonstrated in Table 4. Combined with replacing the original PAN structure with the bi-directional feature pyramid network (BiFPN) to improve feature fusion efficiency, the model's precision increased by 1.3% overall, as illustrated in Table 3.
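For reference, the two activation functions compared here are defined as

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

Unlike ReLU, SiLU is smooth and passes small negative values, which relates to the added non-linear sensitivity discussed above.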
While several existing drowning datasets have utilized traditional camera setups, the accuracy of existing models for drowning detection remains insufficient, limiting their value in applications. In contrast, our improved YOLOv5 model achieved an impressive precision of 98.1%, meeting the stringent criteria for high accuracy required in drowning recognition. This heightened accuracy instills greater confidence in the precise detection of drowning incidents, providing crucial support to rescue operations. The model’s fast inference speed of 3.7 ms allows rapid screening of real-time images. The combination of high accuracy and rapid inference speed positions the model as a valuable asset for real-time drowning detection at swimming pools, ensuring timely identification of incidents and contributing significantly to drowning prevention efforts.
In this paper, we present an improved YOLOv5 algorithm designed for the timely detection of drowning incidents in indoor swimming pools.
(1) Two key improvements were implemented to augment the original YOLOv5 algorithm. First, the ReLU activation function in the coordinate attention (CA) module was replaced with the SiLU activation function, yielding the improved coordinate attention (ICA) module. Second, the PAN module was replaced with the bi-directional feature pyramid network (BiFPN).
(2) To evaluate the accuracy of the improved YOLOv5 algorithm, a self-made dataset was generated. Four college students simulated drowning scenarios and various water poses under drone surveillance, and the relevant images were extracted to form a dataset comprising 8572 images.
(3) The improved YOLOv5 algorithm exhibited a noteworthy 1.3% improvement in precision compared to the original YOLOv5 algorithm. It achieved a recall of 98.0% and mean average precision (mAP) values of 98.5% at an IOU threshold of 0.5 (mAP@0.5) and 73.3% over IOU thresholds from 0.5 to 0.95 (mAP@0.5:0.95), meeting the stringent accuracy requirements for drowning detection.
Consequently, the use of the self-made dataset in conjunction with the improved YOLOv5 algorithm proves to be a viable approach to drowning recognition. The model's capability to accurately detect individuals drowning on the water surface, with a precision of 98.1%, underscores its high value in drowning detection and rescue operations.

Author Contributions

Conceptualization, R.Y.; methodology, R.Y. and K.W.; software, K.W. and L.Y.; validation, K.W.; formal analysis, R.Y., K.W. and L.Y.; investigation, R.Y. and L.Y.; resources, K.W. and L.Y.; data curation, K.W.; writing—original draft preparation, K.W.; writing—review and editing, K.W. and L.Y.; visualization, K.W.; supervision, R.Y.; project administration, R.Y.; funding acquisition, R.Y. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee for Non-Clinical Faculties of the School of Aeronautics and Astronautics, Tiangong University (registration number 2022024, 10 September 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data and codes presented in this study are publicly available on GitHub: https://github.com/Wang-Kaikai/drowning-detection-dataset (accessed on 17 December 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

BatchNorm, BN: batch normalization
BiFPN: bi-directional feature pyramid network
BIF-Add2: BiFPN feature fusion of 2 inputs
BIF-Add3: BiFPN feature fusion of 3 inputs
CA: coordinate attention module
Conv2d: ordinary convolution
DJI Mini3pro: drone made by Shenzhen Dajiang Innovation Technology Co., Ltd., China
FN: false negative
FP: false positive
ICA: improved coordinate attention module
IOU: intersection over union
mAP: mean average precision
PAN: path aggregation network
ReLU: rectified linear unit
SiLU: sigmoid-weighted linear unit
SPPF: spatial pyramid pooling-fast
TN: true negative
TP: true positive
X Avg Pool: average pooling in the X direction
Y Avg Pool: average pooling in the Y direction

References

1. World Health Organization (WHO). Available online: https://www.who.int/publications-detail-redirect/9789240046726 (accessed on 23 October 2023).
2. People's Daily Public Opinion Data Center and People's Online. Available online: https://www.1608.cn/pptx/70444.html (accessed on 25 October 2023).
3. Alshbatat, A.I.N.; Alhameli, S.; Almazrouei, S.; Alhameli, S.; Almarar, W. Automated vision-based surveillance system to detect drowning incidents in swimming pools. In Proceedings of the Advances in Science and Engineering Technology International Conferences, Dubai, United Arab Emirates, 20–23 February 2020; pp. 1–5.
4. Stevenson, M.R.; Rimajova, M.; Edgecombe, D.; Vickery, K. Childhood drowning: Barriers surrounding private swimming pools. Pediatrics 2003, 111, E115–E119.
5. Logan, P.; Branche, C.M.; Sacks, J.J.; Ryan, G.; Peddicord, J. Childhood drownings and fencing of outdoor pools in the United States, 1994. Pediatrics 1998, 101, E3.
6. Atilgan, M.; Bulgur-Kirbas, D.; Akman, R.; Deveci, C. Fatal drowning caused by a swimming pool drainage system. Am. J. Forensic Med. Pathol. 2021, 42, 275–277.
7. Lanagan-Leitzel, L.K.; Skow, E.; Moore, C.M. Great expectations: Perceptual challenges of visual surveillance in lifeguarding. Appl. Cogn. Psychol. 2015, 29, 425–435.
8. Victoria, L.; David, C. The effect of lifeguard experience upon the detection of drowning victims in a realistic dynamic visual search task. Appl. Cogn. Psychol. 2017, 32, 14–23.
9. Lei, F.; Zhu, H.; Tang, F.; Wang, X. Drowning behavior detection in swimming pool based on deep learning. Signal Image Video Process. 2022, 16, 1683–1690.
10. Kałamajska, E.; Misiurewicz, J.; Weremczuk, J. Wearable pulse oximeter for swimming pool safety. Sensors 2022, 22, 3823.
11. Jalalifar, S.; Kashizadeh, A.; Mahmood, I.; Belford, A.; Drake, N.; Razmjou, A.; Asadnia, M. A smart multi-sensor device to detect distress in swimmers. Sensors 2023, 22, 1059.
12. Misiurewicz, J.; Bruliński, K.; Klembowski, W.; Kulpa, K.S.; Pietrusiewicz, J. Multipath propagation of acoustic signal in a swimming pool—Source localization problem. Sensors 2022, 22, 1162.
13. Niu, Q.; Wang, Y.; Yuan, S.; Li, K.; Wang, X. An indoor pool drowning risk detection method based on improved YOLOv4. In Proceedings of the IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, Chongqing, China, 16–18 December 2022; pp. 1559–1563.
14. The Drowning Detection System. Available online: https://poseidon-tech.com/ (accessed on 28 October 2023).
15. Hasan, S.; Joy, J.; Ahsan, F.; Khambaty, H.; Agarwal, M.; Mounsef, J. A water behavior dataset for an image-based drowning solution. In Proceedings of the IEEE Green Energy and Smart Systems Conference, Long Beach, CA, USA, 1–2 November 2021; pp. 1–5.
16. Quan, L.; Bierens, J.J.L.M.; Lis, R.; Rowhani-Rahbar, A.; Morley, P.; Perkins, G.D. Predicting outcome of drowning at the scene: A systematic review and meta-analyses. Resuscitation 2016, 104, 63–75.
17. Quan, L.; Mack, C.; Schiff, M.A. Association of water temperature and submersion duration and drowning outcome. Resuscitation 2014, 85, 790–794.
18. Claesson, A.; Schierbeck, S.; Hollenberg, J.; Forsberg, S.; Nordberg, P.; Ringh, M.; Jansson, A.; Nord, A. The use of drones and a machine-learning model for recognition of simulated drowning victims: A feasibility study. Resuscitation 2020, 156, 196–201.
19. Seguin, C.; Blaquiere, G.; Loundou, A.; Michelet, P.; Markarian, T. Unmanned aerial vehicles (drones) to prevent drowning. Resuscitation 2018, 127, 63–67.
20. Hayat, M.A.; Yang, G.; Iqbal, A. Mask R-CNN based real time near drowning person detection system in swimming pools. In Proceedings of the Mohammad Ali Jinnah University International Conference on Computing, Karachi, Pakistan, 27–28 October 2022; pp. 1–6.
21. Kam, A.H.; Lu, W.; Yau, W.Y. A video-based drowning detection system. In Proceedings of the Computer Vision—ECCV 2002: 7th European Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002; Volume 2353, pp. 297–311.
22. Budiharto, W.; Gunawan, A.A.S.; Suroso, J.S.; Chowanda, A.; Patrik, A.; Utama, G. Fast object detection for quadcopter drone using deep learning. In Proceedings of the 3rd International Conference on Computer and Communication Systems, Nagoya, Japan, 27–30 April 2018; pp. 192–195.
23. Çetin, E.; Barrado, C.; Pastor, E. Improving real-time drone detection for counter-drone systems. Aeronaut. J. 2021, 125, 1871–1896.
24. He, Q.; Mei, Z.; Zhang, H.; Xu, X. Automatic real-time detection of infant drowning using YOLOv5 and Faster R-CNN models based on video surveillance. J. Social Comput. 2023, 4, 62–73.
25. Ellen, D.A.R.; Kristalina, P.; Hadi, M.Z.S.; Patriarso, A. Effective searching of drowning victims in the river using deep learning method and underwater drone. In Proceedings of the International Electronics Symposium, Denpasar, Indonesia, 8–10 August 2023; pp. 569–574.
26. Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A forest fire detection system based on ensemble learning. Forests 2021, 12, 217.
27. Xue, Z.; Lin, H.; Wang, F. A small target forest fire detection model based on YOLOv5 improvement. Forests 2022, 13, 1332.
28. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
29. Li, J.; Liu, C.; Lu, X.; Wu, B. CME-YOLOv5: An efficient object detection network for densely spaced fish and small targets. Water 2022, 14, 2412.
30. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
31. Lin, M.; Wang, Z.; Huang, L. Analysis and research on YOLOv5s vehicle detection with CA and BiFPN fusion. In Proceedings of the IEEE 4th Eurasia Conference on IOT, Communication and Engineering, Yunlin, Taiwan, 28–30 October 2022; pp. 201–205.
32. Li, S.; Zhang, S.; Xue, J.; Sun, H. Lightweight target detection for the field flat jujube based on improved YOLOv5. Comput. Electron. Agric. 2022, 202, 107391.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
35. Hong, W.; Ma, Z.; Ye, B.; Yu, G.; Tang, T.; Zheng, M. Detection of green asparagus in complex environments based on the improved YOLOv5 algorithm. Sensors 2023, 23, 1562.
36. Carballo-Fazanes, A.; Bierens, J.J.; the International Expert Group to Study Drowning Behaviour. The visible behaviour of drowning persons: A pilot observational study using analytic software and a nominal group technique. Int. J. Environ. Res. Public Health 2020, 17, 6930.
37. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
38. Chen, J.; Wei, Y.; Zhou, Y. Dense crowd detection algorithm for YOLOv5 based on coordinate attention mechanism. In Proceedings of the International Conference on Algorithms, High Performance Computing and Artificial Intelligence, Guangzhou, China, 21–23 October 2022; pp. 187–190.
39. Xie, F.; Lin, B.; Liu, Y. Research on the coordinate attention mechanism fuse in a YOLOv5 deep learning detector for the SAR ship detection task. Sensors 2022, 22, 3370.
Figure 1. The structure of ICA and BiFPN. (a) ICA, the parts highlighted in yellow denote improvements; (b) BiFPN.
Figure 2. Improved YOLOv5 structure diagram.
Figure 3. Images of the self-made dataset and the drone. (a) drowning; (b–h) diverse swimming postures, including treading water, breaststroke, backstroke, freestyle, and group configurations featuring two, three, and four individuals; (i) the DJI Mini3pro.
Figure 4. The distribution of the positions and aspect ratios of the labeled targets in the self-made dataset. The black areas indicate the densest distribution, followed by dark blue; light blue shows the most dispersed distribution. (a) the position distribution of the labeled targets; (b) the distribution of the aspect ratios of the labeled targets.
Figure 5. The confusion matrix of the improved YOLOv5 on the self-made dataset.
Figure 6. The box loss convergence curve and precision curve during the training of the improved YOLOv5. (a) the box loss convergence curve; (b) the precision curve.
Figure 7. The results of Grad-CAM; the red areas indicate where the models' attention is most concentrated. (a) Results of the original YOLOv5; (b) results of the original YOLOv5 with CA; (c) results of the original YOLOv5 with ICA; (d) results of the improved YOLOv5.
Table 1. Experimental parameter settings.

Training Configuration | Value
image size             | 640
batch size             | 16
workers                | 5
momentum               | 0.937
learning rate          | 0.01
optimizer              | SGD
Table 2. Sample classification.

Real Value | Predicted Value (Positive) | Predicted Value (Negative)
Positive   | True Positive (TP)         | False Negative (FN)
Negative   | False Positive (FP)        | True Negative (TN)
Table 3. Comparison of the improved YOLOv5 algorithm and the original YOLOv5 algorithm.

Algorithm       | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameters
YOLOv5          | 96.8          | 98.1       | 98.9        | 73.2             | 7,018,216
improved YOLOv5 | 98.1          | 98.0       | 98.5        | 73.3             | 7,272,073
Table 4. The impact of the location of the added modules in the Neck.

Algorithm       | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameters | Inference Speed (ms)
YOLOv5          | 96.8          | 98.1       | 98.9        | 73.2             | 7,018,216  | 3.5
YOLOv5 + CA     | 97.3          | 97.8       | 98.6        | 73.4             | 7,156,648  | 3.5
YOLOv5 + ICA    | 97.6          | 97.2       | 98.6        | 73.3             | 7,156,352  | 3.6
improved YOLOv5 | 98.1          | 98.0       | 98.5        | 73.3             | 7,272,073  | 3.7
Table 5. The impact of the location of the added modules in the Backbone.

Algorithm            | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameters
YOLOv5               | 96.8          | 98.1       | 98.9        | 73.2             | 7,018,216
YOLOv5 + CA          | 97.3          | 97.5       | 98.8        | 73.3             | 7,437,848
YOLOv5 + ICA         | 97.5          | 97.8       | 98.9        | 73.2             | 7,437,848
YOLOv5 + ICA + BiFPN | 97.8          | 97.5       | 98.6        | 73.3             | 7,586,337