1. Introduction
Drowning is a major public safety concern, claiming over 230,000 lives annually according to the World Health Organization (WHO). Children aged 1 to 4 have the highest drowning rates, followed by children aged 5 to 9 [1]. In China, drowning is responsible for over 59,000 deaths annually, more than 95% of them minors, as reported by the People’s Daily Online Opinion Data Center. The proportion of minors dying from drowning is rising year after year, drawing significant attention to the issue from society. When a drowning incident occurs, it often results in irreversible injury [2].
Indoor swimming pools exhibit a higher prevalence of drowning incidents [
3]. The lack of adequate safety facilities in swimming pools is a significant contributor to drownings. One study found that 68% of the childhood drownings investigated occurred in pools lacking four-sided fencing, corresponding to an almost twofold increase in risk [
4]. Another study examined the impact of proper fencing around outdoor swimming pools on drowning incidents, finding that if all residential pools had proper fencing, 19% of pool-related drownings among children under 5 years old might have been prevented [
5]. Proper pool fencing serves as a preventive measure, restricting a child’s access to a swimming pool in the absence of a responsible adult [
5]. Additionally, fatal drownings may be caused by swimming pool drainage systems [6], necessitating constant lifeguard surveillance. However, a lifeguard’s visual search task is difficult: it becomes harder as the number of swimmers grows, causing visual clutter within the supervision zone and delaying reaction times [7]. Swimmers engage in varied activities, such as chatting, floating, and deliberate submersion, which lengthens lifeguards’ processing and judgment times [8]. Continuous monitoring of the water surface can also induce dizziness in lifeguards, compromising the reliability of their supervision [
9]. Several wearable devices have emerged to assess swimmers’ behaviors, such as a proposed wearable pulse oximeter designed for swimming pool safety [
10]. This device collects real-time data, including pulse oximetry, via sensors to analyze swimmers’ current conditions. Researchers have also proposed a wearable self-rescue alarm system as a potential solution. This system monitors children’s heart rates, water depth, and immersion time to promptly detect drowning accidents [
11]. However, one drawback of wearable devices is their potential impact on the swimming experience. Additionally, wearable devices often rely on methods such as acoustic signal transmission, which pose inherent challenges: swimming pools suffer from severe multipath propagation of underwater acoustic signals, which can weaken transmission [12], and strong noise in a pool can hinder the detection and accurate positioning of emergency signals, leading to positioning deviations [12]. Hence, there is a pressing need for the development of an automated, unsupervised drowning detection and identification system.
Drowning recognition has recently emerged as a significant area of application in machine learning. Previous research predominantly relied on various cameras, including wall-mounted cameras, overhead cameras, and underwater cameras, for imagery capture within pool environments [
13,
14]. Hasan et al. [
15] introduced a water behavior dataset captured both above and below the water using cameras, comprising a water surface dataset and an underwater dataset for drowning detection. Their analysis and experiments [15] show that the water surface dataset yields better detection performance than the underwater dataset. For real-time water surface image analysis, rapid identification of drowning signs enables early detection, saving valuable rescue time, minimizing submersion duration, and reducing the number of drowning accidents. In practical rescue scenarios, optimal outcomes are achieved when submersion lasts less than six minutes [
16,
17]. It is evident that incorporating deep learning techniques enhances the effectiveness and feasibility of drowning detection.
While common, the deployment of wall-mounted cameras, overhead cameras, and underwater cameras is not without limitations. Fixed angles associated with these cameras may impose constraints on capturing comprehensive and adaptable visual data. In contrast, the use of drones equipped with aerial perspectives presents a promising solution to overcome these limitations, providing increased flexibility. Concurrently, some existing drowning detection models exhibit low accuracy, rendering them inadequate for practical drowning detection and rescue applications. Claesson et al. [
18] proposed the practical and efficient use of uncrewed aerial vehicles (UAVs) and online machine learning models to detect simulated drowning victims. However, such online machine learning models can be trained on only a limited amount of data, which constrains their precision. Seguin et al. [
19] advocated for the use of UAVs in sea rescue missions, enhancing the quality and speed of rescue operations while ensuring the safety of lifeguards by keeping them away from perilous sea conditions. An enhanced Mask R-CNN algorithm [
20] was introduced for timely drowning person detection within swimming pools, achieving a detection rate of 94.1% with a false detection rate of 5.9%. A novel drowning risk detection method based on YOLOv4 was proposed to establish an effective early-warning system [
13], attaining a precision of 80.84% in identifying drowning categories. A video-based drowning detection system [
21], rooted in background scene modeling, generated a large number of parameters, resulting in a red alert delay of 8.4 to 12.5 s from the onset of a crisis, wasting valuable rescue time. These examples highlight the potential use of drones instead of wall cameras for drowning detection, yet their classification accuracy in water behaviors remains insufficient, diminishing their value in drowning detection and rescue tasks. There is a need for drowning detection models with high accuracy.
This paper proposes an improved YOLOv5 algorithm to enhance the effectiveness of drowning detection, achieving a peak precision of 98.1%. To assess the improved algorithm’s accuracy, drones were employed to monitor swimming pools and build a self-made dataset. Experiments on this dataset demonstrate that the improved YOLOv5 algorithm yields better drowning detection results with higher accuracy than the original YOLOv5 algorithm.
In summary, the contributions of this paper encompass the following: (1) Proposing an improved coordinate attention (ICA) module and an improved YOLOv5 algorithm to augment drowning detection effectiveness and accuracy. (2) Utilizing a self-made dataset created by drones to evaluate the algorithm’s accuracy. (3) Demonstrating via experiments on self-made datasets that the improved YOLOv5 algorithm achieves superior drowning detection results with higher accuracy than the original YOLOv5 algorithm, and the ICA module outperforms the CA module.
2. Literature Review
With the rapid development of the economy and technology, drones are becoming increasingly compact and accessible to civilians. Their applications have grown significantly, being widely employed for efficient item transportation and effective target detection. Budiharto et al. [
22] utilized drones to deliver vital medical assistance to emergency patients, demonstrating that drone-based item transport is fast and efficient. Seguin et al. [
19] employed drones to swiftly and safely deliver flotation devices to simulated victims, greatly enhancing the quality and speed of first aid. Additionally, Çetin et al. [
23] used guard drones to capture images of random areas and applied artificial-intelligence object detection to identify and classify malicious drones flying in the airspace. Furthermore, Claesson et al. [
18] proposed using drones along with online machine learning models to detect simulated drowning victims, demonstrating the feasibility of this approach.
Target-detection algorithms are advancing rapidly, with the YOLOv5 algorithm, a classical method, finding widespread use across various detection applications. He et al. [
24] conducted a study to explore the advantages of Faster R-CNN and a series of YOLOv5 algorithms, aiming for swift and accurate detection of infant drowning in real-world scenarios. Furthermore, Ellen et al. [
25] utilized the YOLOv5 algorithm to search for submerged victims but encountered limitations in accuracy. On a different note, Xu et al. [
26] and Xue et al. [
27] enhanced the YOLOv5 algorithm specifically for detecting forest fires, resulting in an improved algorithm that performs well in fire detection.
5. Limitations
The simulated personnel are fixed, and the current age range is limited. Incorporating additional sets of personnel photos could enhance the model’s performance, and a more extensive dataset featuring photos from diverse settings has the potential to improve both sensitivity and specificity. It is important to note that the postures of simulated personnel in the self-made dataset may differ from those of individuals in real rescue situations, who are conscious and in distress. Factors such as pool-bottom conditions, lighting, and the attire of people in the water may also affect the results.
The use of drones may also face constraints, including power shortages and equipment resource limitations. Policies and regulations concerning drones vary by region: some areas restrict drone flights during specific times or in particular zones for safety or to protect sensitive areas, so familiarity and compliance with local laws and restrictions are necessary before flying. Some regions place a high emphasis on privacy, requiring consent regarding drone surveillance before use. Our experiments were conducted in public settings, where there are no restrictions on drone use and no privacy concerns. Outdoors, weather conditions such as high winds can limit drone operation, so drones with strong wind resistance are necessary to handle such conditions.
In future research, we will focus on two main areas. First, the application scenarios will expand from swimming pools to outdoor areas and beaches, encompassing individuals of different age groups. Second, the model will undergo lightweight processing to use memory and storage efficiently, facilitating its application and deployment; some peripheral devices have limited memory and small size, which could otherwise restrict the application of our research. After lightweighting the model, drowning detection can be achieved by transferring data to a control system.
6. Conclusions
This article introduces a novel self-made dataset obtained via drone-based data collection specifically designed for drowning detection. The dataset comprises two distinct categories: drowning and non-drowning postures (
Figure 3). While previous research primarily relied on fixed camera types such as wall-mounted cameras, overhead cameras, and underwater cameras for capturing imagery within pool areas or interiors [
13,
14], fixed cameras inherently suffer from the limitations of a rigid perspective, making them sensitive to variations in distance; this sensitivity readily leads to challenges such as target deformation and posture changes. Our study reveals the exceptional flexibility of drones in tracking swimmers and efficiently collecting surface data from swimming pools. Compared with manual supervision, drone-based aerial surveys save time, providing quicker and more convenient access to critical swimming posture data. The aerial perspective not only facilitates the identification of high-risk drowning behaviors, such as near-vertical body postures or ineffective movements, but also enhances drowning detection accuracy, thereby improving the quality and efficacy of rescue efforts.
Additionally, the original YOLOv5 inadequately addressed water behavior deformation and posture changes, necessitating the integration of an attention module. The coordinate attention (CA) module effectively enhances the representation of pertinent objects without imposing substantial computational complexity, making it an ideal candidate for integration with conventional networks. Previous studies have successfully employed the CA module to improve detection accuracy and robustness across various tasks. For example, adding the CA module to vehicle detection improved accuracy to almost 95% when used with the BiFPN structure for enhancing feature fusion [
31]. Similarly, the CA module with position information accurately detected dense crowds by obtaining the effective width and height of the feature layer image [
38]. In sea vessel detection, the CA module was applied to detect important ship features, suppress irrelevant information, increase the model’s robustness to scale changes and noise interference, and effectively improve the model’s overall performance [
39]. However, the CA module pays insufficient attention to the non-linear characteristics of water activity postures. The ICA module therefore combines the CA module with the SiLU activation function, which is self-stabilizing and outperforms the ReLU activation function [
30]. Substituting the more stable SiLU activation function for ReLU in the CA module improves the processing of important non-linear features and the attention paid to postures. Like the CA module, the ICA module captures cross-channel, direction-aware, and location-sensitive information, but it also enhances non-linear sensitivity, helping the model locate and identify objects of interest more accurately and thereby improving target recognition accuracy. Incorporating the ICA module to sharpen attention to important feature details in swimming postures, as demonstrated in
Table 4, increased the model’s accuracy by 0.8%. Furthermore, replacing the original PAN structure with the Bi-directional Feature Pyramid Network (BiFPN) structure to enhance feature fusion efficiency resulted in a 1.3% increase in the model’s accuracy, as illustrated in
Table 3.
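The activation swap at the heart of the ICA module can be illustrated numerically. The sketch below (plain Python, not the paper’s implementation) contrasts ReLU with SiLU, defined as x·sigmoid(x); unlike ReLU, SiLU is smooth and passes a small signal for negative inputs, which is the non-linear sensitivity the ICA module exploits.

```python
import math

def relu(x: float) -> float:
    # ReLU: zero for negative inputs, identity for positive inputs.
    return max(0.0, x)

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x). Smooth and non-monotonic,
    # so small negative inputs still carry gradient information.
    return x / (1.0 + math.exp(-x))

# ReLU discards all negative activations; SiLU keeps a small,
# smooth negative response near zero.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  silu={silu(x):+.4f}")
```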
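The BiFPN’s benefit comes from weighted bi-directional feature fusion. As a simplified sketch (plain Python on scalar values rather than feature maps, following the fast normalized fusion formula from the EfficientDet work that introduced BiFPN, not this paper’s actual implementation), each fused output is a learned, non-negative weighted average of its inputs:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse input features with learned non-negative weights.

    Each weight is clamped to be non-negative (as with a ReLU), then the
    weighted sum is normalized so the fused output stays bounded:
        O = sum(w_i * I_i) / (eps + sum(w_j))
    """
    w = [max(0.0, wi) for wi in weights]
    total = sum(w) + eps
    return sum(wi * fi for wi, fi in zip(w, features)) / total

# Fusing two feature values (e.g., a top-down path and a lateral input):
# the weight ratio decides how much each path contributes.
print(fast_normalized_fusion([2.0, 4.0], [1.0, 3.0]))  # ~3.5: 3/4 of the weight on the second input
```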
While several existing drowning datasets rely on traditional camera setups, the accuracy of current drowning detection models remains insufficient, limiting their practical value. In contrast, our improved YOLOv5 model achieved a precision of 98.1%, meeting the stringent accuracy requirements of drowning recognition. This heightened accuracy instills greater confidence in the precise detection of drowning incidents, providing crucial support to rescue operations. The model’s fast inference speed of 3.7 ms enables rapid screening of real-time images. Together, the high accuracy and rapid inference position the model as a valuable asset for real-time drowning detection at swimming pools, ensuring timely identification of incidents and contributing significantly to drowning prevention efforts.
In this paper, we present an improved YOLOv5 algorithm designed for the timely detection of drowning incidents in indoor swimming pools.
- (1)
Two key improvements were implemented to augment the original YOLOv5 algorithm. First, the ReLU activation function in the coordinate attention (CA) module was replaced with the SiLU activation function, yielding the improved coordinate attention (ICA) module. Second, the PAN module was replaced with the bi-directional feature pyramid network (BiFPN).
- (2)
To evaluate the accuracy of the improved YOLOv5 algorithm, a self-made dataset was generated. Four college students simulated drowning scenarios and various water poses under drone surveillance, with relevant images extracted to form a dataset comprising 8572 images.
- (3)
The improved YOLOv5 algorithm exhibited a noteworthy 1.3% improvement in precision over the original YOLOv5 algorithm. It achieved a recall of 98.0% and mean average precision (mAP) values of 98.5% and 73.3% at IoU thresholds of 0.5 and 0.9, respectively, meeting the stringent accuracy requirements for drowning detection.
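The IoU thresholds above determine when a predicted bounding box counts as a correct detection. As a minimal illustration (plain Python, not tied to this paper’s evaluation code), the sketch below computes the intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction half-overlapping a ground-truth box: IoU = 50/150 ≈ 0.333,
# so this detection would be rejected at both the 0.5 and 0.9 thresholds.
pred = (0.0, 0.0, 10.0, 10.0)
gt = (0.0, 5.0, 10.0, 15.0)
print(iou(pred, gt))
```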
Consequently, the use of a self-made dataset in conjunction with the improved YOLOv5 algorithm for drowning recognition proves to be a viable approach. The model’s ability to accurately detect individuals drowning on the water surface, with a precision of 98.1%, underscores its high value in drowning detection and rescue operations.