Based on the platform technology described above, this section trains the proposed algorithm, designs comparative and ablation experiments, and finally integrates the object detection model into the robot for testing. The experimental procedure and results are as follows.
3.4.3. Comparative Experiment
The specific indicators used to evaluate the performance of the proposed algorithm are precision (P), recall (R), and mAP, where mAP@0.5 denotes the mean average precision at an intersection over union (IoU) threshold of 50%, and mAP@0.5–0.95 denotes the mean average precision averaged over IoU thresholds from 50% to 95%. These metrics can be described as follows:
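$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_{i}$$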
In the formulas, TP denotes the number of correctly detected positive samples (true positives), FP the number of incorrectly detected positive samples (false positives), FN the number of missed positive samples (false negatives), and N the number of object classes in the sample.
Params (the number of model parameters) is an important indicator of model complexity: the more parameters a model has, the more computing resources and data it requires for training and inference. GFLOPs reflects the computational cost and speed of a model; it denotes the number of floating-point operations (in billions) needed to run the network once, i.e., during a single forward pass.
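As a minimal sketch of how these two indicators can be measured for a PyTorch model (the thop profiler, the torch.hub model, and the 640x640 input size are illustrative assumptions, not part of the original experiments):

```python
import torch
from thop import profile  # pip install thop; third-party FLOPs/params profiler

# Illustrative model: a YOLOv5s architecture loaded via torch.hub (requires network access).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=False)
model.eval()

# Params: total number of learnable parameters.
params = sum(p.numel() for p in model.parameters())

# thop reports multiply-accumulate operations for one forward pass;
# following the common YOLOv5 convention, multiply by 2 to express them as FLOPs.
dummy = torch.zeros(1, 3, 640, 640)
macs, _ = profile(model, inputs=(dummy,), verbose=False)
gflops = macs * 2 / 1e9

print(f"Params: {params / 1e6:.2f} M, GFLOPs: {gflops:.2f}")
```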
In order to verify the effectiveness of the attention-mechanism improvement, the added attention mechanism is replaced with several alternatives, including squeeze-and-excitation (SE) [27], the convolutional block attention module (CBAM) [28], efficient channel attention (ECA) [29], coordinate attention (CA) [30], and EMA. The effects of the different attention mechanisms on detection performance are compared and analyzed; "-" indicates that no attention mechanism is applied to the model. The same training parameters are used, and the experiments are performed on the VOC dataset. The results are shown in Table 3.
SE is a classic channel attention mechanism that reweights feature channels through squeeze and excitation operations. ECA, another form of channel attention, enhances feature representation by efficiently capturing cross-channel interactions but ignores spatial location information. CA integrates location information into channel attention by encoding features along the two spatial directions, thereby generating weights that fuse channel and spatial information. CBAM combines the advantages of channel and spatial attention, modeling channel and spatial weights separately, which strengthens inter-channel relationships while also capturing spatial interactions, achieving comprehensive feature refinement. The experimental results reveal the specific effects of the different attention mechanisms on model performance. With the SE and CBAM mechanisms, detection accuracy decreased by 0.4% and 0.2%, respectively, indicating that these two mechanisms did not improve performance on the current dataset. In contrast, with the CA, ECA, and EMA mechanisms, detection accuracy improved by 0.4%, 0.3%, and 1.1%, respectively, with EMA yielding the largest gain. Overall, introducing the EMA attention mechanism not only accelerates detection but also effectively improves the detection accuracy of the model, making it more advantageous in practical applications.
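As an illustration of the channel-attention idea these modules share, the following is a minimal PyTorch sketch of an SE block (the layer names and reduction ratio are illustrative choices, not taken from the paper's implementation):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global spatial average
        self.fc = nn.Sequential(                      # excitation: per-channel gating weights
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                   # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)               # channel weights in [0, 1]
        return x * w                                  # rescale feature channels

# Example: apply SE to a backbone feature map.
feat = torch.randn(2, 64, 80, 80)
print(SEBlock(64)(feat).shape)  # torch.Size([2, 64, 80, 80])
```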
In order to verify the performance of the proposed ROS–YOLOv5–FleetEMA model, it is compared with the traditional deep-learning-based YOLOv5-Lite model. Using the same dataset and experimental environment, the improvement in average precision is shown in Table 4.
Analysis of the results shows that the mAP@0.5 of the proposed ROS–YOLOv5–FleetEMA model is 2.7% higher than that of the traditional YOLOv5-Lite model, and over the wider threshold range mAP@0.5–0.95 it is 4.3% higher.
In order to evaluate the lightweight improvement of the ROS–YOLOv5–FleetEMA model more comprehensively, the traditional YOLOv5s model is introduced as a comparison benchmark. The experimental results are shown in Table 5.
The results show that the proposed ROS–YOLOv5–FleetEMA model achieves significant optimization in the two key indicators, GFLOPs and Params. Compared with the traditional YOLOv5s model, the GFLOPs of the ROS–YOLOv5–FleetEMA model are reduced by 79.3% and the parameter count by 81.1%. This optimization not only reduces the consumption of computing resources but also makes the model more suitable for deployment on resource-constrained devices. Compared with the YOLOv5-Lite model, the GFLOPs of the ROS–YOLOv5–FleetEMA model are reduced by 13.2% and the parameter count by 15.1%.
By comparing the experimental results, it is verified that the ROS–YOLOv5–FleetEMA model shows significant advantages in computational efficiency and resource consumption while maintaining high detection accuracy, which proves its practicability and effectiveness in a resource-constrained environment.
3.4.4. Ablation Experiment
In order to further verify the effectiveness of the proposed ROS–YOLOv5–FleetEMA improvements, ablation experiments are designed in which EMA, C3Ghost, and MPDIoU are combined with the traditional YOLOv5-Lite model in different ways. By comparing models with different configurations, the ablation experiments show how each component affects overall performance, including detection accuracy, computational efficiency, and resource consumption. Systematically removing or replacing individual components and observing the impact on performance provides an empirical basis for the model design decisions and ensures the practicality and effectiveness of the proposed model in resource-constrained environments. In the experimental design, "−" indicates that an improvement is not applied to the model, while "+" indicates that it is integrated, so the specific impact of each combination on model performance can be clearly demonstrated. The specific results are shown in Table 6 and Figure 8.
Through experimental analysis, introducing the EMA attention module into the traditional YOLOv5-Lite model significantly improves mAP@0.5 while adding only a small number of parameters. In addition, replacing the traditional CIoU loss with the MPDIoU loss further improves bounding-box localization. By taking the distances between corresponding corner points of the predicted and ground-truth boxes into account, the MPDIoU loss makes the predicted boxes coincide more closely with the real boxes; the experimental results show that mAP@0.5 increases by 0.4%, indicating that MPDIoU makes bounding-box regression more stable and prediction more accurate. After introducing the C3Ghost module, the parameters and GFLOPs of the model are significantly reduced while a high detection accuracy is maintained; C3Ghost reduces the consumption of computing resources by optimizing the feature extraction process without affecting the detection effect. Finally, when all of these improvements are applied to the YOLOv5-Lite model, mAP@0.5 is significantly improved and the number of parameters is reduced by 15.1%. This shows that the optimization strategies significantly reduce the computational complexity and resource consumption of the model without sacrificing detection accuracy, making it more suitable for deployment on resource-constrained devices such as mobile devices and embedded systems.
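A minimal sketch of an MPDIoU-style loss is given below, following the commonly published formulation (IoU penalized by the squared top-left and bottom-right corner distances, normalized by the squared image diagonal); the function name and box format are illustrative assumptions, not the paper's exact implementation:

```python
import torch

def mpdiou_loss(pred, target, img_w: float, img_h: float, eps: float = 1e-7):
    """MPDIoU-style loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).

    Common formulation: MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2),
    where d1, d2 are the distances between the top-left and bottom-right corners
    of the predicted and ground-truth boxes, and (w, h) is the input image size.
    """
    # Intersection area
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h

    # IoU = intersection / union
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared corner-point distances, normalized by the squared image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2  # top-left corners
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2  # bottom-right corners
    diag2 = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / diag2 - d2 / diag2
    return (1.0 - mpdiou).mean()

# Example usage with dummy boxes on a 640x640 image.
pred = torch.tensor([[100., 100., 200., 200.]])
gt = torch.tensor([[110., 90., 210., 190.]])
print(mpdiou_loss(pred, gt, 640, 640))
```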
3.4.5. Integrated Experiment
The Jilin Provincial Key Laboratory of Human Health Status Identification and Function Enhancement was selected as the experimental site.
In order to realize the object detection function, the deep learning object detection model is deployed on the robot. To this end, an object detection function package based on ROS–YOLOv5–FleetEMA is developed. In the src directory of the catkin_ws workspace, a terminal is opened and conda activate yolo is run to enter the virtual environment; then, in the function package directory, sudo pip install -r requirements.txt is run to install the dependency libraries required for object detection.
After the installation is completed, the roslaunch yolov5_ros yolo.launch command starts usb_cam and the ROS–YOLOv5–FleetEMA object detection node at the same time. usb_cam is a ROS package that drives USB cameras and publishes their images as ROS topics, allowing other nodes to subscribe to them. By subscribing to the image topic published by usb_cam, we employ cv_bridge to convert the ROS image messages into the OpenCV image format. Within the callback function, we run YOLOv5 object detection on the converted image, convert the annotated image back into a ROS image message, and publish it to a new YOLOv5 topic.
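A minimal sketch of such a detection node is shown below (the topic names, weights path, and node name are illustrative assumptions; the actual yolov5_ros package may differ):

```python
#!/usr/bin/env python3
import rospy
import torch
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class Yolov5Node:
    def __init__(self):
        self.bridge = CvBridge()
        # Hypothetical weights path; load a custom YOLOv5 model via torch.hub.
        self.model = torch.hub.load('ultralytics/yolov5', 'custom',
                                    path='weights/ros_yolov5_fleetema.pt')
        # Subscribe to the usb_cam image topic and publish annotated frames.
        self.sub = rospy.Subscriber('/usb_cam/image_raw', Image, self.callback,
                                    queue_size=1, buff_size=2**24)
        self.pub = rospy.Publisher('/yolov5/image_out', Image, queue_size=1)

    def callback(self, msg: Image):
        # ROS Image -> OpenCV BGR array
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        rgb = frame[:, :, ::-1].copy()                # BGR -> RGB for the detector
        results = self.model(rgb)                     # run YOLOv5 inference
        annotated = results.render()[0][:, :, ::-1].copy()  # draw boxes, back to BGR
        # OpenCV array -> ROS Image, published on the new YOLOv5 topic.
        out = self.bridge.cv2_to_imgmsg(annotated, encoding='bgr8')
        out.header = msg.header
        self.pub.publish(out)

if __name__ == '__main__':
    rospy.init_node('yolov5_ros')
    Yolov5Node()
    rospy.spin()
```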
At this point, the ROS-based object detection service platform is opened, the IP address of the car is set, and the device is connected; the Qt button then subscribes to the newly established YOLOv5 topic with one click, and the results are displayed on the ROS-based object detection platform, as shown in Figure 9.
Through experimental analysis, the application of the proposed ROS–YOLOv5–FleetEMA model in the ROS robot system is verified. The model not only performs well in a resource-constrained environment but also integrates with the ROS-based object detection platform to achieve efficient and fast object detection. Specifically, the system can accurately identify and track multiple targets, such as pedestrians and monitors. With the car moving at 0.5 to 1.5 m per second, a single detection takes 34.4 ms, corresponding to nearly 30 FPS, which keeps the detection process fluent. This not only improves the robot's perception ability in complex environments but also provides strong support for further decision-making and execution.