7.1. Comparison of the Three Proposed Droplet Tracking Methods
To rigorously evaluate the effectiveness of the proposed droplet detection and tracking methods, we conducted a series of experiments designed to mimic real-world scenarios. The three methods were tested on a dataset composed of high-resolution video sequences capturing a variety of droplet movement patterns and interactions. The dataset was partially annotated by a domain expert to create a ground truth for tracking accuracy assessment. The evaluation was structured to provide a multifaceted view of each method’s performance. First, we computed precision–recall curves, which served as the primary indicator of detection and tracking accuracy. A perfect tracking method would achieve a precision and recall of 1.0, indicating that all droplets were tracked without any false positives or misses.
In addition to precision and recall, we employed Intersection over Union (IoU) heatmaps to visualize the spatial accuracy of tracking on a frame-by-frame basis. The IoU metric is particularly useful for understanding how well a tracking algorithm aligns with the actual droplet locations over successive frames. The heatmaps provide a color-coded representation of the tracking accuracy, with warmer colors indicating higher overlap between the predicted and ground truth droplet locations. To complement these metrics, the Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) scores were calculated. The MOTA accounts for all errors made by the tracker, including false positives, missed targets, and identity switches, while the MOTP measures the alignment precision between predicted and actual droplet positions.
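For reference, the standard CLEAR MOT definitions of these two scores are reproduced below, where FN_t, FP_t, and IDSW_t denote the missed droplets, false positives, and identity switches in frame t, GT_t the number of ground truth droplets in that frame, IoU_{t,i} the overlap of the i-th matched pair, and c_t the number of matched pairs. Because the MOTP values reported later (Table 1) treat higher as better, the overlap (IoU) form of MOTP is shown; the distance-based form differs only in this convention.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_{t}\left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_{t}\mathrm{GT}_t},
\qquad
\mathrm{MOTP} = \frac{\sum_{t}\sum_{i=1}^{c_t}\mathrm{IoU}_{t,i}}{\sum_{t} c_t}
```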
Our evaluation concluded with an analysis of the computational efficiency of the three methods. We performed inference timing on different hardware platforms to assess the real-world applicability of our approaches in time-sensitive environments. The models were also subjected to a pruning process to evaluate the impact of model simplification on inference time without significantly compromising tracking accuracy. The following sections detail the experimental procedures, results, and analyses that substantiate the performance and efficiency claims of our proposed droplet tracking methods.
The precision–recall curves provided in Figure 3 offer a comprehensive evaluation of the three tracking methods. Precision reflects the proportion of correctly identified and tracked droplets (true positives) among all detections labeled as droplets (true and false positives), while recall represents the proportion of actual droplets that were correctly identified and tracked (true positives) out of all actual droplets in the images (true positives and false negatives). The BSET-DT method, depicted by the solid blue line with circle markers, demonstrates high precision at lower recall levels, indicating its effectiveness in accurately detecting and tracking droplets with minimal misidentification. As recall increases, BSET-DT maintains commendable consistency in precision, suggesting robustness in tracking droplets across various scenarios. The OCDT method, represented by the dashed orange line with square markers, begins with slightly lower precision, which then decreases more sharply at higher recall levels, indicating more frequent misidentifications or tracking errors. However, OCDT performs relatively well at high recall levels, maintaining a good balance between precision and recall. The DTAS method, denoted by the dash-dot green line with triangle markers, competes closely with OCDT at lower recall levels. However, as recall increases, OCDT outperforms DTAS, maintaining higher precision at high recall levels. This suggests that while DTAS performs well when a balanced tradeoff between precision and recall is acceptable, it is less advantageous than OCDT when high recall is the priority and precision must be preserved at those levels. DTAS therefore remains useful where recall is important, but OCDT offers a better overall balance of precision and recall when high recall is desired.
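For readers wishing to reproduce such curves, the minimal sketch below shows how a precision–recall curve can be traced by sweeping the detection confidence threshold with scikit-learn. The arrays are toy placeholders rather than our data, and a full evaluation must additionally count ground-truth droplets that received no detection at any threshold as false negatives.

```python
# Minimal sketch: a precision-recall curve from scored detections (toy data).
# Each detection carries a confidence score and a flag indicating whether it
# was matched to a ground-truth droplet; both arrays are hypothetical.
import numpy as np
from sklearn.metrics import precision_recall_curve

detection_scores = np.array([0.95, 0.90, 0.82, 0.75, 0.60, 0.40])  # model confidences
matched_to_gt = np.array([1, 1, 1, 0, 1, 0])                       # 1 = true positive

precision, recall, thresholds = precision_recall_curve(matched_to_gt, detection_scores)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")
```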
In essence, this graph highlights the tradeoffs between precision and recall for three different droplet tracking methods. BSET-DT is a consistent performer, with high precision across a range of recall levels, making it a suitable choice for applications where accuracy is paramount. Although it starts with slightly lower precision compared to BSET-DT, OCDT maintains a good balance between precision and recall, demonstrating improved resilience at higher recall levels. This makes OCDT a robust choice for applications that require a balance between precision and recall across various scenarios. While not as precise as BSET-DT and OCDT at the lower recall levels, DTAS shows a degree of resilience by maintaining moderate precision as recall increases, before experiencing a precipitous drop. This suggests that DTAS might be preferable in situations where a higher recall is necessary, despite a potential loss in precision. Overall, the graph serves as a crucial tool for evaluating which tracking method best aligns with the specific requirements of accuracy and completeness in droplet tracking tasks. By considering the strengths of each method—BSET-DT’s high precision, OCDT’s balance and resilience, and DTAS’s performance at high recall values—users can make informed decisions based on the specific needs of the application.
In conjunction with the precision–recall analysis presented in Figure 3, the spatial distribution heatmaps in Figure 4 provide a more granular understanding of the tracking performance. These heatmaps display the average Intersection over Union (IoU) values across a discretized grid over the spatial domain of the video frames. The heatmaps were constructed as follows (a short illustrative sketch of the procedure is given after the list):
Grid Creation: Each video frame was divided into a 10 × 10 grid, with each cell representing a specific region of the frame. The grid was chosen to balance spatial resolution with computational efficiency, ensuring large enough regions to capture meaningful tracking data while still providing detailed spatial distribution.
IoU Calculation: For each detected droplet in the ground truth, the IoU with the corresponding detected droplet in the tracking data was calculated. The Intersection over Union (IoU) quantifies the degree of overlap between the predicted bounding box and the actual ground truth bounding box, with a higher IoU value signifying greater alignment and accuracy.
Data Aggregation: The IoU values were aggregated for each grid cell based on the normalized center coordinates of the droplets. This aggregation allows an average IoU value to be calculated for each cell across the entire video sequence.
Visualization: The final heatmaps were visualized using a colormap; warmer colors (e.g., yellow, red) indicate higher average IoU values, representing regions where the tracking algorithm performs well, while cooler colors and gray areas respectively indicate lower IoU values and the absence of tracking data.
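The grid creation, IoU aggregation, and visualization steps can be summarized in the sketch below. The record format, toy values, and colormap are illustrative assumptions and do not reproduce the exact plotting configuration used for Figure 4.

```python
# Illustrative sketch of the heatmap construction described above.
# Each record holds the normalized (x, y) center of a ground-truth droplet and
# the IoU of its matched predicted box; the records below are toy values.
import numpy as np
import matplotlib.pyplot as plt

GRID = 10  # 10 x 10 grid over the frame

def iou_heatmap(records):
    """records: iterable of (x_norm, y_norm, iou) with coordinates in [0, 1]."""
    iou_sum = np.zeros((GRID, GRID))
    count = np.zeros((GRID, GRID))
    for x, y, iou in records:
        col = min(int(x * GRID), GRID - 1)  # grid column from normalized x
        row = min(int(y * GRID), GRID - 1)  # grid row from normalized y
        iou_sum[row, col] += iou
        count[row, col] += 1
    # Cells with no droplets are left as NaN so they render as gray (no data).
    return np.where(count > 0, iou_sum / np.maximum(count, 1), np.nan)

records = [(0.52, 0.48, 0.91), (0.55, 0.50, 0.88), (0.10, 0.90, 0.42)]
heat = iou_heatmap(records)

cmap = plt.cm.hot.copy()
cmap.set_bad("gray")  # gray = regions without tracking data
plt.imshow(heat, cmap=cmap, vmin=0.0, vmax=1.0)
plt.colorbar(label="mean IoU")
plt.savefig("iou_heatmap.png")
```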
Notably, BSET-DT and OCDT demonstrate a high density of accurate tracking (IoU close to 1) in the central regions where droplets are predominantly present. In contrast, DTAS exhibits lower IoU values in similar regions, consistent with its lower MOTA score, as shown in Table 1. The presence of gray areas in the heatmaps indicates regions without any droplet tracking data. This absence is primarily due to two factors: first, the natural distribution of droplets, which is densest in the center of the frame, and second, the potential limitations of the tracking algorithms in detecting droplets near the frame’s periphery. The heatmaps underscore the robustness of BSET-DT and OCDT in maintaining high tracking accuracy where it matters most, confirming their suitability for droplet tracking applications that demand high precision.
Table 1 above presents the experimental results of the three methods proposed earlier in this paper for droplet tracking. The performance of the methods is evaluated using two metrics, MOTA and MOTP. The results show that BSET-DT and OCDT achieve about the same MOTA, with 0.899 and 0.896, respectively, while DTAS achieves the lowest MOTA among the three proposed methods, at 0.804. In terms of MOTP, BSET-DT achieves the highest score of 0.833, followed by OCDT with a score of 0.823 and DTAS with 0.815. These results indicate that BSET-DT and OCDT outperform DTAS in terms of MOTA, which is a widely used metric for measuring the overall performance of an object tracking system. MOTA takes into account false positives, false negatives, and identity switches, providing a comprehensive measure of the system’s performance. The excellent MOTA scores of BSET-DT and OCDT suggest that they can track droplets with greater accuracy and robustness compared to DTAS. Overall, the experimental results demonstrate that BSET-DT and OCDT offer the most effective frameworks for droplet tracking, as they achieve the highest MOTA and MOTP scores among the proposed methods. The remaining methods (BoT-SORT, OC-SORT, and StrongSORT) show significantly lower performance in terms of both MOTA and MOTP. BoT-SORT and OC-SORT achieve MOTA scores of 0.640 and 0.627, respectively, while their respective MOTP scores are also relatively low at 0.450 and 0.514. StrongSORT performs the worst among these three, with an MOTA of 0.571 and MOTP of 0.638. The Kalman filter method, often used as a baseline, shows the lowest performance across both metrics, with an MOTA of 0.412 and MOTP of 0.411, further highlighting the effectiveness of the new methods proposed in this work.
7.2. Improved Efficiency via Pruning
Our evaluation demonstrates that model pruning, which reduced the model’s size by 30%, led to measurable improvements in the inference times for droplet detection and tracking across various computing devices. The process of pruning optimizes the model by removing less significant parameters and connections, which simplifies the model architecture, thereby reducing the computational load and enhancing efficiency during inference.
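Our pruning procedure is specified in Algorithm 4. Purely to illustrate the general idea, the sketch below applies global L1-magnitude pruning at 30% sparsity to the convolutional layers of a placeholder backbone using PyTorch's built-in utilities; the model, layer selection, and sparsity target are stand-ins rather than our actual detector configuration. Note that the unstructured zeroing shown here reduces the effective parameter count but does not by itself shrink dense inference time; structured removal of channels or sparse-aware kernels is typically needed for wall-clock gains such as those reported below.

```python
# Illustrative sketch (not Algorithm 4): global L1-magnitude pruning of the
# convolutional layers of a placeholder model at 30% sparsity with PyTorch.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models

model = models.resnet18(weights=None)  # placeholder backbone, not our detector

# Collect (module, parameter name) pairs for every convolutional weight tensor.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]

# Zero the 30% of conv weights with the smallest absolute magnitude, globally.
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.30)

# Fold the pruning masks into the weight tensors (makes the pruning permanent).
for module, name in to_prune:
    prune.remove(module, name)

zeros = sum(int((m.weight == 0).sum()) for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"conv weight sparsity: {zeros / total:.1%}")
```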
Table 2 and Table 3 display the comparative inference times in milliseconds per frame across devices with different GPU and CPU configurations. The devices in question include a Jetson AGX Orin mobile computer, an HPC cluster, and a standard Work Station, each with unique hardware specifications (refer to Table 2).
Following the application of the pruning algorithm (Algorithm 4), we observed the following improvements (see the arithmetic check after the list):
On the Jetson AGX Orin, the CPU inference time saw a modest improvement of 5.8%, from 8193.4 ms to 7718.6 ms, whereas the GPU inference time saw a decrease of approximately 3.6% from 47.8 ms to 46.1 ms.
The HPC AI.Panther Supercomputer showed a 2.2% decrease in CPU inference time, from 7921.5 ms to 7745.5 ms, and a more significant 12% reduction in GPU inference time from 24.1 ms to 21.2 ms.
The Work Station experienced slight improvements post-pruning, with the CPU and GPU inference times dropping by 1.6% (from 8119.2 ms to 7992.4 ms) and 12.1% (from 35.6 ms to 31.28 ms), respectively.
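These percentages are simply the relative reduction (t_before − t_after) / t_before computed from the timings quoted above, as the short check below reproduces.

```python
# Reproducing the reported reductions from the quoted timings (ms per frame).
timings = {
    "Jetson AGX Orin CPU": (8193.4, 7718.6),
    "Jetson AGX Orin GPU": (47.8, 46.1),
    "AI.Panther CPU": (7921.5, 7745.5),
    "AI.Panther GPU": (24.1, 21.2),
    "Work Station CPU": (8119.2, 7992.4),
    "Work Station GPU": (35.6, 31.28),
}
for device, (before, after) in timings.items():
    print(f"{device}: {(before - after) / before:.1%} reduction")
```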
The observed enhancements in inference times post-pruning are indicative of the technique’s effectiveness in reducing memory access demands and computational complexity. By decreasing computational overhead, this optimization not only enables faster processing but also potentially enhances parallelism and execution efficiency. These improvements are especially pronounced when utilizing GPU resources, as evidenced by the notable decreases in GPU inference times on both the HPC Supercomputer and the standard Work Station.
Our findings highlight the potential of model pruning as a valuable technique for optimizing performance in droplet detection and tracking systems. The reduction in inference time contributes to the feasibility of deploying these systems in real-world scenarios where rapid processing is crucial. The efficiencies gained through pruning are particularly relevant for applications requiring real-time analysis, such as in-field agricultural assessments, where such systems must operate under resource-constrained conditions.
7.3. Comparison with an Existing Droplet Tracking Method
Figure 5 presents a comparative visualization of droplet tracking performance across four sequential frames using the three deep learning-based methods (BSET-DT, OCDT, DTAS) and the Kalman filter-based approach proposed in [20]. The Kalman filter-based method (Column 1) serves as a preliminary benchmark. Although it successfully tracks multiple droplets across frames, it exhibits shortcomings in consistently identifying all droplets. This is evidenced by missing detections within these frames. Furthermore, the Kalman filter-based method demonstrates a limitation in its ability to recover droplet identification; when a droplet is momentarily undetected and then reappears in a subsequent frame, the Kalman filter fails to reassign the previously established ID, leading to potential inaccuracies in tracking continuity.
In contrast, the BSET-DT method (Column 2) shows a marked improvement in detection precision, excelling in droplet detection and tracking. With its advanced appearance descriptor, it not only detects a higher number of droplets but also demonstrates the remarkable ability to recover the IDs of droplets even after they are temporarily undetected for several frames. This feature significantly enhances the tracking accuracy over time. In addition, the tracking boxes maintain consistency across frames, suggesting robust tracking ability. OCDT (Column 3) follows closely behind BSET-DT, albeit detecting fewer droplets; however, its performance is consistent, and it too can recover the IDs of droplets thanks to its tracking mechanism. This ability to recover identifications after momentary detection loss showcases the resilience of OCDT. Finally, DTAS (Column 4) shows a level of performance comparable to OCDT. While it may not detect as many droplets as BSET-DT, it maintains consistent tracking and exhibits the capability of reacquiring droplet IDs after momentary lapses in detection, albeit not as effectively as the BSET-DT method. Overall, the three deep learning-based methods demonstrate enhanced ability to detect and track droplets compared to the traditional Kalman filter-based method, particularly in maintaining track IDs and ensuring consistent detection across frames. This comparative analysis underscores the advancements offered by deep learning approaches in the field of precision tracking for agricultural applications.
7.4. Validation against Actual Measurements
To validate the effectiveness and accuracy of our deep learning-based droplet tracking methods, we conducted experiments comparing actual measurements of droplet distance, size, and trajectory with the results calculated by our algorithm. These measurements were initially obtained in pixel units from the high-speed video frames. Using a calibration process involving a reference object with known dimensions, we converted these pixel-based measurements into real-world units (millimeters). This comparison is crucial to demonstrate the real-world applicability and precision of our approach.
Measurement Setup
For our experiments, we used a high-speed camera capable of recording at 2000 frames per second to capture detailed footage of droplets emitted from agricultural spray nozzles. The camera was calibrated using a reference object with known dimensions in order to accurately convert the pixel measurements into millimeters. This setup allowed us to track the motion of droplets with high precision in terms of both size and movement over time. The equipment used in this experiment included the following:
High-Speed Camera: The camera was capable of capturing images at 2000 frames per second, ensuring that even the fastest-moving droplets could be tracked.
Calibration Object: A reference object with precisely known dimensions was placed within the camera’s field of view to allow for accurate conversion from pixel measurements to millimeters.
Spray Nozzles: Agricultural nozzles were used to emit droplets under controlled conditions, ensuring consistent droplet characteristics.
The distance traveled by each droplet between consecutive frames was measured in pixels and then converted to millimeters using the calibration data. The size of each droplet was measured by calculating the area of its bounding box in pixels, which was then converted to a physical size in millimeters. The trajectory of each droplet was determined by tracking its position across multiple frames. The displacement over time was used to calculate the trajectory in real-world units.
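To make the pixel-to-millimeter conversion concrete, the sketch below shows one way to derive a scale factor from a reference object of known size and apply it to droplet distance, bounding-box size, and trajectory length. The reference dimensions, box coordinates, and helper names are illustrative assumptions, not values from our setup; whether droplet size is ultimately reported as an area or an equivalent diameter is a reporting choice, and the sketch converts the bounding-box area.

```python
# Illustrative sketch: converting pixel measurements to millimeters using a
# calibration object of known physical size (all values here are hypothetical).
import math

REF_LENGTH_MM = 10.0   # known length of the reference object
REF_LENGTH_PX = 250.0  # its measured length in the image
MM_PER_PX = REF_LENGTH_MM / REF_LENGTH_PX

def distance_mm(p1, p2):
    """Displacement between two (x, y) pixel positions, in millimeters."""
    return math.dist(p1, p2) * MM_PER_PX

def box_area_mm2(box):
    """Bounding-box area of a droplet, converted from px^2 to mm^2."""
    w_px, h_px = box[2] - box[0], box[3] - box[1]
    return w_px * h_px * MM_PER_PX ** 2  # area scales with the square of the factor

def trajectory_mm(centers):
    """Cumulative path length of a tracked droplet across frames, in millimeters."""
    return sum(distance_mm(a, b) for a, b in zip(centers, centers[1:]))

centers = [(120, 340), (124, 332), (129, 323)]  # toy per-frame centers (pixels)
print(distance_mm(centers[0], centers[1]))      # frame-to-frame displacement (mm)
print(box_area_mm2((118, 336, 126, 344)))       # droplet bounding-box area (mm^2)
print(trajectory_mm(centers))                   # total trajectory length (mm)
```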
The measurements obtained from the high-speed camera were directly compared to the results produced by our three deep learning-based tracking methods (BSET-DT, OCDT, and DTAS). This comparison was made using three key metrics: the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and correlation coefficient (R), which together evaluate the agreement between the actual measurements and the algorithm’s predictions.
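For clarity, the three metrics follow their standard definitions, where y_i is an actual measurement, ŷ_i the corresponding estimate from a tracking method, bars denote means, and n is the number of compared droplets:

```latex
\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}, \qquad
R = \frac{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)\left( \hat{y}_i - \bar{\hat{y}} \right)}
         {\sqrt{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2}\,\sqrt{\sum_{i=1}^{n}\left( \hat{y}_i - \bar{\hat{y}} \right)^2}}
```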
The results shown in Table 4, Table 5, and Table 6 indicate that our deep learning-based tracking methods achieve high accuracy in measuring droplet distance, size, and trajectory. In particular, the BSET-DT method shows the lowest errors and highest correlation with actual measurements, demonstrating its superior performance. Specifically, the BSET-DT method achieves a mean absolute error (MAE) of 0.50 mm for droplet distance, 0.30 mm for droplet size, and 0.40 mm for droplet trajectory, all of which are lower than the corresponding values for the OCDT and DTAS methods. This indicates that BSET-DT provides the most precise measurements compared to the ground truth.
The Root Mean Square Error (RMSE) further supports this conclusion, with BSET-DT exhibiting the smallest RMSE values across all three metrics. For droplet distance, BSET-DT has an RMSE of 0.70 mm, better than 0.80 mm for OCDT and 0.90 mm for DTAS. Similarly, for droplet size, BSET-DT’s RMSE is 0.40 mm, compared to 0.50 mm for OCDT and 0.60 mm for DTAS. For droplet trajectory, BSET-DT has an RMSE of 0.60 mm, again indicating better performance than OCDT and DTAS.
The correlation coefficient (R) values also highlight the strong agreement between our algorithm’s predictions and the actual measurements. The BSET-DT method achieves correlation coefficients of 0.90, 0.92, and 0.91 for droplet distance, size, and trajectory, respectively. These high values demonstrate that the predictions from BSET-DT are closely aligned with the actual measurements. While OCDT and DTAS also show strong correlation coefficients, BSET-DT consistently outperforms them, indicating its robustness and reliability.
Overall, these findings provide strong evidence that our approach can effectively measure and track droplets in real time, making it a valuable tool for optimizing agricultural spray systems. This detailed comparison gives readers a deeper understanding of, and more convincing evidence for, the accuracy and innovation of our tracking methods.
The particularly robust performance of the BSET-DT method suggests that it is well suited for practical applications in agricultural spraying systems. Its high accuracy and precision in measuring droplet characteristics can contribute significantly to improving the efficiency and effectiveness of pesticide and fertilizer applications, ultimately supporting more sustainable agricultural practices.