1. Introduction
Marine waste poses severe and far-reaching threats to marine life. Of particular concern is the pervasive distribution of microplastics in the oceans, which presents an insidious danger to marine ecosystems and the delicate balance of our planet [1,2,3]. The rapid increase in ocean plastic waste poses multifaceted threats, imperiling marine ecosystems, human health, and crucial ecosystem services [4]. Among the sources of ocean plastic pollution, plastic waste from terrestrial river basins, which accounts for roughly 80% of ocean plastic waste, has an outsized impact and constitutes a severe global environmental issue that must be urgently addressed [5]. Widespread urbanization and human activities in river basins have caused massive waste outfluxes via rivers and streams, resulting in unprecedented ocean plastic pollution. Detecting, classifying, and quantitatively counting river plastic waste is therefore essential for assessing river quality and the resulting environmental impacts on the ocean [6].
Traditionally, water quality control has relied on manual assessments of river conditions, with subsequent efforts dedicated to collecting, monitoring, and quantifying floating waste through labor-intensive field surveys [7,8,9]. The quantifying approach [7] used a net with a 2.5 cm mesh at Noda Bridge on the Edo River in Japan to measure the waste; according to their findings, 6% of the total waste by weight was anthropogenic. The collecting approach [8] reported on the prevalence of plastic waste, ranging from 0.8 to 5.1% of the total macro-plastics, and intercepted an annual average of 22 to 36 tons of floating plastic waste using floating waste-retention booms. As for the monitoring approach [9], a visual observation method was developed to systematically collect data on floating macro-plastics through collaboration with European countries. While informative, these manual methodologies were labor-intensive and incurred substantial costs. As the demand for more efficient and cost-effective water quality control methods intensifies, there is a growing imperative to explore and implement advanced technologies and automated systems that can augment or replace these traditional approaches, ensuring more comprehensive and sustainable management of water resources on a global scale [10,11].
An automated waste detection system has emerged as a linchpin in the optimization of river cleaning and waste removal efforts. Grounded in cutting-edge computer vision technologies, such a system is poised to revolutionize the monitoring of polluted waterways, offering heightened efficiency and accuracy. At its technological core lies object detection, a computer vision technique adept at pinpointing the exact locations of waste items within images or video frames [12,13]. This capability not only ensures the early identification of pollution but also facilitates swift response and cleanup operations, transforming the landscape of environmental conservation. Despite these advancements, challenges persist in continuously monitoring the temporal fluctuations in waste quantities. Kataoka et al. [14] addressed the problem of tracking waste in rivers by developing an image analysis method designed specifically for this task, making it easier to monitor and manage river waste. However, this method faced hurdles in fully detecting plastic waste across various water types, as it relied on distinguishing color differences in each water body to classify the type of waste. While these automated systems excel at identifying and categorizing types of waste, the accurate quantification of the volume of debris flowing from rivers into the ocean remains a complex challenge, necessitating ongoing refinements and innovations in this dynamic and vital field of environmental technology.
In advancing the automated waste detection system for river pollution monitoring, the incorporation of object tracking emerges as a transformative element, particularly in the dynamic context of flowing water. Continuously tracing the motion of identified waste objects across successive video frames allows the system not only to detect pollution but also to monitor its trajectory and assess downstream implications. Applications in other domains illustrate the versatility of such tracking technologies: L. Gatelli et al. [15] and Jingyi et al. [16] used YOLOv4 for vehicle detection at road intersections, and S. Charran et al. [17] proposed automating the ticketing process for traffic violations using image recognition. Similarly, Y. Ge et al.'s [18] study on tomato growth monitoring with the YOLO-DeepSORT model underlines the potential for accurate data collection and decision-making in diverse fields. By combining object detection and tracking, the system not only identifies and categorizes waste but also quantifies it, enabling precise waste counting and offering a comprehensive understanding of pollution levels. This innovation empowers environmental authorities to optimize resource allocation, prioritize cleanup initiatives, and contribute to the sustainable preservation of aquatic ecosystems, providing a robust tool for comprehending the types and quantities of waste flowing from rivers into the ocean.
This paper aims to pioneer the development of an automated waste measurement method tailored for real river environments. To achieve this, we considered different scenarios involving waste conditions using an innovative automated waste detection system, leveraging advancements in object detection methods and introducing enhancements to the established YOLOv5 [19] model architecture. The proposed system aspires to surpass conventional methods by incorporating cutting-edge features and methodologies, with a specific focus on addressing the challenges associated with monitoring flowing river environments. Beyond its object detection capabilities, the paper introduces DeepSORT [20] as an object-tracking method tailored for video frames of flowing water, offering a solution to the intricate task of waste counting in dynamic river scenarios. By amalgamating these enhancements into the model, the system aims not only to detect but also to accurately quantify waste, providing a more comprehensive understanding of pollution levels in water bodies. This innovative approach holds potential for significant advancements in river pollution monitoring, offering a robust tool for environmental conservation and resource management.
2. Materials and Methods
Figure 1 depicts a system diagram outlining the proposed floating waste measurement method. It illustrates the flow from image input to YOLOv5-based detection, then to DeepSORT tracking, and finally, to data analysis and output.
2.1. Implementation of the Proposed Method
The method we propose in this section combines the strengths of YOLOv5 for detecting waste and YOLOv5_DeepSORT for counting waste, creating an efficient system to monitor and assess river waste in video frames. The research project consists of four important parts. First, we created a diverse and expanded dataset covering seven different waste classes to ensure that our model is well trained across various types of waste. The second part focuses on detecting and classifying waste, using the powerful YOLOv5 architecture to accurately identify different types of river waste. The third part introduces innovation by smoothly integrating the DeepSORT tracking algorithm, allowing real-time tracking of detected waste objects in video frames; this ensures not only precise identification but also continuous monitoring and counting of waste over time. The study concludes with a thorough presentation and analysis of results, showing how effective the integrated YOLOv5-DeepSORT system is. This comprehensive approach is a useful tool for environmental managers, providing real-time insights into the changing nature of river waste and contributing to cleaner and more sustainable water ecosystems. Beyond its environmental use, this research highlights the versatility of the method, demonstrating its potential for accurate object detection and counting in other areas, which is a significant step forward in the field.
2.2. Dataset and Environmental Scenario
The dataset for this research was created from a diverse array of waste types and quantities for both training and prediction in deep learning. The collection process involved capturing images of waste, particularly plastics, under controlled conditions. A 12 MP RGB camera (MAPIR, Inc., San Diego, CA, USA) with specific spectral bands (red: 660 nm, green: 550 nm, blue: 475 nm) was employed for image capture, ensuring high resolution and an accurate representation of the waste in the laboratory setting. The dataset categorized simulated waste into seven distinct classes: cans, cartons, plastic bottles, foam, glass, paper, and plastic, reflecting a comprehensive spectrum of environmental debris. This categorization serves as the foundation for training and validating the deep learning model. The dataset not only provides diversity in waste types but also encompasses varying quantities, enriching the model's ability to discern and classify different levels of pollution in controlled laboratory conditions.
The study places a particular emphasis on scenario diversity in the creation of the dataset, incorporating three waste detection scenarios that emulate different environmental conditions. In the first scenario (Case 1), characterized by clear visibility, individual items of waste are easily distinguishable. The second scenario (Case 2) considers partially submerged waste, evaluating the model’s adaptability to changes in visibility and underwater scenarios. The third scenario (Case 3) involves waste forming a collective mass, requiring the model to accurately detect individual items within clusters. By incorporating these distinct scenarios, the study comprehensively evaluates the model’s behavior in response to variations in environmental conditions and the morphology of the waste. This approach has the potential to enhance the reliability of the detection system in real river scenarios.
2.3. Waste Detection Algorithms
Waste detection models represent a crucial frontier in leveraging advanced technologies to tackle environmental challenges. These models, employing computer vision and machine learning techniques, play a pivotal role in swiftly and accurately identifying and categorizing waste within images and video data. The landscape of available models includes prominent ones such as YOLOv5, the Faster Region-based Convolutional Neural Network (Faster R-CNN) [21], the Single-Shot Multibox Detector (SSD) [22], and Mask R-CNN [23]. Each model brings its unique strengths to the table, striking a different balance between detection speed and accuracy: YOLO is known for its real-time capabilities, Faster R-CNN offers high accuracy, SSD provides rapid scanning, and Mask R-CNN adds pixel-level segmentation, exemplifying the diverse methodologies employed in waste detection.
In the context of this research, YOLOv5 emerges as the model of choice for waste detection, a well-established model in the You Only Look Once (YOLO) series. Its selection is driven by its exceptional balance between speed and accuracy, making it particularly well suited to the dynamic requirements of waste detection in environmental monitoring. YOLOv5 excels in the simultaneous detection of multiple objects in a single inference, a critical feature for real-time responsiveness. Additionally, the model demonstrates high accuracy in identifying various types of waste, aligning with the research's objectives. The choice of YOLOv5 reflects a strategic decision based on its robust performance, adaptability to diverse environmental conditions, and efficiency in delivering accurate and timely insights.
The implementation of YOLOv5 involves a meticulous fine-tuning process, starting with the preparation of a comprehensive waste detection dataset containing images and corresponding annotations. The subsequent setup of YOLOv5 includes the installation of essential libraries and dependencies, followed by the initialization of the model using either pretrained weights or custom weight configurations tailored to the specific dataset. Fine-tuning is a crucial step that requires adjustments to hyperparameters in
Table S1, including the initial learning rate (lr0) set to 0.00872, the number of epochs set to 80, and batch size set to 8. This process allows the model to adapt to the nuances of the dataset and refine its performance for accurate waste detection. The choice of hyperparameters, including learning rates and regularization techniques, is pivotal in optimizing the model’s performance. The evaluation phase, utilizing a test dataset, ensures that the fine-tuned model meets the desired standards of accuracy and precision. The adaptability, speed, and accuracy of YOLOv5 position it as an ideal solution for automating waste detection across a spectrum of environmental monitoring scenarios, providing a robust tool for addressing environmental challenges.
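For concreteness, the fine-tuning described above can be expressed as a call to the training entry point of the Ultralytics YOLOv5 repository. The sketch below is illustrative rather than the authors' exact configuration: the dataset and hyperparameter file names (waste.yaml, hyp.waste.yaml) are hypothetical placeholders, while the numeric values follow Table S1.

```python
# Hypothetical fine-tuning call, run from inside a clone of
# github.com/ultralytics/yolov5; file names are placeholders.
import train  # yolov5/train.py

train.run(
    data="waste.yaml",       # train/val image paths and the seven waste class names
    hyp="hyp.waste.yaml",    # hyperparameter file with lr0 set to 0.00872 (Table S1)
    weights="yolov5s.pt",    # start from COCO-pretrained YOLOv5 weights
    imgsz=640,               # training image size
    epochs=80,               # number of epochs (Table S1)
    batch_size=8,            # batch size (Table S1)
)
```

The same run can equivalently be launched from the command line with the repository's train.py script; the fine-tuned weights produced by either route are what the detection and tracking stages below consume.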
2.4. Waste Tracking and Counting Algorithms
Waste tracking models, including SORT (Simple Online and Realtime Tracking) [24] and its extension, DeepSORT, are pivotal in continuously monitoring and tracking the movement and positions of waste objects. SORT employs Kalman filtering and the Hungarian algorithm for real-time online tracking, predicting object motion and associating detections effectively. DeepSORT, extending SORT, incorporates deep learning for appearance-based re-identification matching, mitigating identity switching and ensuring robust tracking even in scenarios involving occlusions. By integrating appearance information with its tracking components, DeepSORT excels at the real-time tracking of multiple objects in video streams. These models enhance the precision of waste outflow measurements, contributing to effective environmental monitoring and waste management with minimized environmental impact.
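To make the association step concrete, the following simplified sketch shows how predicted track boxes can be matched to new detections by minimizing a (1 − IoU) cost with the Hungarian algorithm, as SORT does; DeepSORT additionally blends an appearance (re-identification) distance into the same cost matrix. This is an illustrative reimplementation for exposition, not the tracker used in this study.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(track_boxes, detection_boxes, iou_threshold=0.3):
    """Match predicted track boxes to new detections (Hungarian algorithm).

    Returns (matched index pairs, unmatched track indices, unmatched detection indices).
    """
    if not track_boxes or not detection_boxes:
        return [], list(range(len(track_boxes))), list(range(len(detection_boxes)))
    # Cost = 1 - IoU, so the optimal assignment maximizes total overlap.
    cost = np.array([[1.0 - iou(t, d) for d in detection_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(track_boxes)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detection_boxes)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections would spawn new tracks and unmatched tracks would age out, which is the mechanism that later allows each waste item to be counted exactly once.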
In this study, we chose DeepSORT (SORT with a Deep Association Metric) as the waste-tracking model. Building upon SORT, DeepSORT uses deep learning to improve tracking, especially in recognizing and matching object appearances. It helps overcome challenges such as objects being occluded for long periods or changing in appearance, making tracking more accurate and reducing confusion when re-identifying objects. The model's ability to smoothly combine appearance cues with its tracking components aligns with the study's focus on handling complicated waste-tracking situations. Including DeepSORT in the study allows waste flow to be measured more precisely, advancing environmental monitoring and waste management efforts with a focus on accurate tracking in real-time situations.
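The sketch below outlines how the detection, tracking, and counting stages can be chained in a single loop. It assumes a fine-tuned YOLOv5 weights file and the third-party deep-sort-realtime package; the file names, parameter values, and the det_class attribute used to recover each track's class label are assumptions for illustration, not the authors' exact implementation.

```python
# Sketch of a detection -> tracking -> counting loop; illustrative only.
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # fine-tuned YOLOv5
tracker = DeepSort(max_age=30)   # drop tracks unseen for 30 frames
counted_ids = set()              # unique waste objects seen so far
class_counts = {}                # per-class totals

cap = cv2.VideoCapture("river_clip.mp4")  # placeholder input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # BGR -> RGB for inference
    detections = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        # deep-sort-realtime expects ([left, top, width, height], confidence, class_label)
        detections.append(([x1, y1, x2 - x1, y2 - y1], conf, int(cls)))
    tracks = tracker.update_tracks(detections, frame=frame)
    for track in tracks:
        if not track.is_confirmed() or track.track_id in counted_ids:
            continue
        counted_ids.add(track.track_id)          # count each track ID only once
        label = model.names[track.det_class]     # class attached at detection time (assumed attribute)
        class_counts[label] = class_counts.get(label, 0) + 1
cap.release()
print(len(counted_ids), class_counts)
```

Counting unique confirmed track IDs, rather than per-frame detections, is what prevents the same floating item from being counted repeatedly as it drifts through successive video frames.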
2.5. Performance and Evaluation
Evaluating the performance of object detection and tracking models, such as YOLOv5, is a critical aspect of deep learning and computer vision. This assessment relies on several essential metrics, including precision (P), recall (R), the F1 score, the precision–recall (PR) curve, average precision (AP), and mean average precision (mAP) [25]. Precision measures the model's ability to correctly identify positive instances, recall gauges its capability to detect all actual positives, and the F1 score offers a balanced evaluation that avoids favoring one metric over the other. The PR curve graphically illustrates the trade-off between precision and recall at different confidence thresholds, aiding in threshold selection. AP quantifies object detection accuracy by measuring the area under the precision–recall curve for a specific class, while mAP averages the APs across multiple classes [25]. In the mAP calculation, 'n' indexes the confidence thresholds and 'class' denotes the number of waste classes. Together, these metrics empower researchers and engineers to optimize YOLOv5, tailoring it to specific needs, whether emphasizing accuracy, comprehensiveness, or a balance between the two, thereby ensuring precise and efficient object detection and tracking in diverse applications. Precision, recall, the F1 score, and mAP can be calculated using Equations (1)–(4).
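Since Equations (1)–(4) are not reproduced here, the standard definitions assumed for these metrics are summarized below, where TP, FP, and FN denote true positives, false positives, and false negatives, P_n and R_n are the precision and recall at the n-th confidence threshold on the PR curve, and class is the number of waste classes.

```latex
% Standard definitions assumed for Equations (1)-(4).
\begin{align}
P  &= \frac{TP}{TP + FP} \\
R  &= \frac{TP}{TP + FN} \\
F1 &= \frac{2 \cdot P \cdot R}{P + R} \\
\mathrm{mAP} &= \frac{1}{\mathit{class}} \sum_{k=1}^{\mathit{class}} \mathrm{AP}_k,
\qquad \mathrm{AP}_k = \sum_{n} \left( R_n - R_{n-1} \right) P_n
\end{align}
```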
4. Discussion
The proposed research achieved an 88.0% mAP for seven different waste classes, demonstrating high accuracy (Table 2). This result shows excellent performance on a large test image set, and it maintains a very high level of accuracy when compared to other studies. Other studies have focused on different waste classes or object classes, and they have achieved varying mAP scores. For instance, the YOLOv3 study focused on four waste classes and achieved a 77.2% mAP on 37 test images. In contrast, the Faster R-CNN study concentrated on three waste classes and achieved an 81.0% mAP. Furthermore, the YOLOv4 and DeepSORT study centered on different vehicle classes, attaining a 78% mAP. The primary advantage of the proposed research is its ability to achieve high accuracy while focusing on the seven distinct waste classes. This study's significant improvement in accuracy is attributed to factors such as fine-tuning and the abundance of training images. This makes it highly effective in detecting and classifying a wide range of waste types.
This study, similar to the work by Jingyi et al. [16], underscores the challenges associated with accurately classifying objects that share similar shapes. Notably, the present study shows the lowest accuracy in detecting plastic bottles and plastic objects. Both studies recognize the difficulty of accurately classifying objects with similar shapes and share an emphasis on addressing this challenge. To address this issue, future work should incorporate specialized training data that helps the model discern subtle differences in shape and features.
The presented results across these cases shed light on how the characteristics of the water environment and the morphology of the waste affect detection. In Case 1, where visibility is clear, individual waste items are easily discernible, leading to high recall and precision scores across most waste categories; this scenario represents ideal conditions for detection. Case 2 introduces partially submerged waste, simulating decreased visibility and underwater scenarios. Despite the added complexity, the model demonstrates adaptability, albeit with slightly lower performance metrics than in Case 1. Case 3 presents a more challenging scenario, with waste forming clusters, demanding that the model accurately detect individual items within these masses; here, performance varies, with some categories experiencing decreased precision and recall. Overall, these findings underline the importance of scenario diversity in dataset creation for waste detection models. By encompassing various environmental conditions and waste morphologies, such as clear water, underwater, and clustered waste, the model's robustness and reliability in real river scenarios can be significantly enhanced, supporting effective waste management and environmental preservation efforts.
The model demonstrates an overall precision of 80%, indicating that it is capable of maintaining a consistent level of accuracy despite environmental influences. However, Y. Ge et al.'s study [18] highlighted challenges associated with adverse environmental conditions during video capture, potentially resulting in missed detections and negative impacts on target tracking and counting. This raises concerns about the practical applicability of the model in real-world scenarios, where unpredictable weather or complex scenes may hinder its performance and reliability. To address these concerns, future efforts should prioritize enhancing the model's robustness so that it can handle challenging real-world conditions effectively. Improving reliability and stability, especially in adverse environmental contexts, emerges as a pivotal focus for upcoming research.
In practical implementation, our study utilized a high-performance computer for machine learning training, which lasted approximately 6 h. Remarkably, results were obtained in less than 30 s when the fine-tuned model was executed, as indicated in Table 1. While acknowledging the need for powerful computers during training, the feasibility of real-time use on actual rivers is promising. A future challenge involves adapting the model for execution on portable devices or smartphones, ensuring convenience and user-friendliness for broader applicability.
Overall, this research addresses the critical issue of riverine waste management through an innovative approach that integrates cutting-edge technologies, namely YOLOv5 and DeepSORT. The system's ability to accurately detect and classify various types of waste, combined with its real-time tracking capabilities, marks a significant advancement in environmental monitoring. The comprehensive dataset, consideration of seasonal and weather conditions, and incorporation of underwater waste contribute to the model's robustness. However, challenges related to dataset diversity and regional adaptability remain. As outlined in the PC specifications, the reliance on a high-performance computer underscores the computational demands of the approach and the importance of computational efficiency.
5. Conclusions
The proposed research successfully demonstrated the ability to accurately quantify waste across seven categories (cans, cartons, plastic bottles, foam, glass, paper, and plastic) by integrating deep learning architectures, namely YOLOv5 and DeepSORT. Its practicality in natural river environments and its accurate classification and object-tracking capabilities would make it a valuable tool for environmental conservation. Additionally, it offers a promising solution for addressing the pressing issue of plastic pollution in rivers and oceans. However, it is essential to acknowledge the system's limitations, particularly in detecting and counting small objects, objects submerged in water, and objects forming clusters. Furthermore, since this study was conducted in a laboratory setting, considerations such as data requirements, the types of waste encountered, and camera deployment in natural river environments are necessary for its practical application.