### *4.5. Results*

The method achieves good results on the missing faults dataset: a total of 8064 test samples are correctly classified by type. Detection results for some examples are shown in Figures 12 and 13. Besides good classification accuracy, the method also performs well in localization, timeliness and energy efficiency. Previous defect detection studies of this kind did not achieve a detection accuracy above 99%. Our algorithm improves on that accuracy and can also be deployed at an enterprise level to carry out cutting-edge detection work in a fast, cheap, stable and green way. This matters because, in addition to accuracy, manufacturers are concerned with the efficiency, stability, scalability and overall cost performance of large-scale deployment.

**Figure 12.** Comparison of our algorithm and classical YOLOv3.

*Micromachines* **2022**, *13*, x FOR PEER REVIEW 14 of 17

**Figure 13.** Effect of preprocessing and data augmentation on our algorithm.

In Figures 12 and 13, the *x*-axis represents the number of training rounds and the *y*-axis represents the value of the measurement index.

This study is the first to use an FPGA to implement a deep learning missing installation detection system. The hardware scheme offers moderate performance at a moderate price. Deployment is difficult, but the advantages are high stability and obvious design flexibility. A comparison of the performance of this algorithm with that of previous algorithms is shown in Table 4.

**Table 4.** A comparison of the performance of this algorithm with that of other representative industry defect inspection studies.


| Algorithm | Accuracy | | mAP |
|---|---|---|---|
| Junfeng Jing [16] | 98% | 0.046 | - |
| Our algorithm | 99.2% | 0.010 | 0.991 |

The performance impact of an attention mechanism on the algorithm is shown in Table 5.

**Table 5.** Performance impact of an attention mechanism.


A comparison of the main performance indices for our detection algorithm deployed on FPGA and for similar chips is shown in Table 6.

**Table 6.** Comparison of three representative chips.


\* The power consumption data are taken from the official documentation of each chip.

### *4.6. Further Discussion and Analysis*

We also carried out several related experiments to further verify the validity of the algorithm.

Transfer learning: When training this model, the weights are trained from scratch. However, some studies report that the classical YOLOv3 algorithm can be developed quickly through transfer learning. We therefore compare two training schemes: (a) training the network from scratch; (b) training the network by transfer learning, starting from a classical network pretrained on the Common Objects in Context (COCO) dataset. The two schemes take very similar training times, approximately 48 h, and both reach an mAP of approximately 0.991. We attribute the lack of a clear benefit from cross-dataset transfer learning to the large difference between the dataset of this study and the COCO dataset.
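The difference between schemes (a) and (b) lies only in how the weights are initialized before training. The following is a minimal, framework-free sketch of that step, not the training code used in this study: the function `init_weights`, the layer names and the layer sizes are all hypothetical, and weights are represented as flat Python lists for illustration.

```python
import random


def init_weights(layer_sizes, pretrained=None, seed=0):
    """Return (weights, reused_layer_names) for a toy network.

    Scheme (a): pretrained is None, so every layer is randomly initialized.
    Scheme (b): a layer is copied from `pretrained` when its name and size
    match; any other layer is randomly initialized.
    """
    rng = random.Random(seed)
    pretrained = pretrained or {}
    weights, reused = {}, []
    for name, size in layer_sizes.items():
        source = pretrained.get(name)
        if source is not None and len(source) == size:
            weights[name] = list(source)  # reuse the pretrained values
            reused.append(name)
        else:
            # Small random values, as in training from scratch.
            weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
    return weights, reused


# Hypothetical layers: a backbone layer shared with the pretrained network
# and an output layer whose size differs from the pretrained one.
layer_sizes = {"backbone.conv1": 4, "head.out": 3}
coco_like = {"backbone.conv1": [0.1, 0.2, 0.3, 0.4], "head.out": [0.0] * 5}

_, reused_a = init_weights(layer_sizes)             # scheme (a)
_, reused_b = init_weights(layer_sizes, coco_like)  # scheme (b)
print(reused_a, reused_b)
```

In this sketch only the layers that match the pretrained network start from its values, which is why the benefit of scheme (b) depends on how well the pretrained features suit the target dataset.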

Precision analysis: Compared with the classical YOLOv3 network, our algorithm has 14 fewer neural network layers. Nevertheless, after 300 rounds of training, accuracy improved from 95% to 99.2%. We think this is because the optimized version has fewer network parameters and converges more easily than the classical version.

Error analysis: The 99.2% recognition accuracy of this method holds only for the 10 existing types of images. An image taken from a new angle must first be recognized and preprocessed automatically. Even so, recognition does not improve, and we find that accuracy on such images cannot reach 99.2%. Therefore, it is necessary to train a new network through transfer learning.

FPGA side performance improvement: On the host side, the size of the input image has little impact on algorithm efficiency. On the FPGA side, however, recognition efficiency differs greatly between images of different sizes. The key problem is that image preprocessing is not implemented in a programmable hardware circuit but runs in software on the ARM processor. The ARM CPU of the PYNQ-Z2 has poor performance and is slow. If image preprocessing were also implemented as a programmable hardware circuit, hardware processing on the FPGA should effectively improve the overall performance of the algorithm.
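How much the whole pipeline would gain from moving preprocessing into hardware can be estimated with Amdahl's law. The sketch below is illustrative only: this study does not report what share of the per-frame time is spent in preprocessing, so the fraction and the speedup factor used in the example are hypothetical values, and `overall_speedup` is a helper introduced here.

```python
def overall_speedup(pre_fraction, pre_speedup):
    """Amdahl's law: pipeline speedup when only preprocessing is accelerated.

    pre_fraction: share of the per-frame time spent in preprocessing (0..1).
    pre_speedup:  factor by which preprocessing alone becomes faster (> 0).
    """
    if not 0.0 <= pre_fraction <= 1.0:
        raise ValueError("pre_fraction must be between 0 and 1")
    if pre_speedup <= 0.0:
        raise ValueError("pre_speedup must be positive")
    return 1.0 / ((1.0 - pre_fraction) + pre_fraction / pre_speedup)


# Hypothetical example: preprocessing takes half of the frame time and
# becomes 10 times faster once implemented in programmable logic.
print(f"{overall_speedup(0.5, 10.0):.2f}x")
```

The estimate shows that the gain is bounded by the share of time preprocessing occupies, so measuring that share on the board would be the first step before committing to a hardware implementation.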

### **5. Conclusions**

Traditional inspection algorithms for missing faults are limited by materials, safety, cost and other factors. In this paper, we propose a new and efficient attention-based algorithm for defect inspection that can be deployed on an FPGA. The algorithm correctly identifies missing faults while achieving high precision, fast speed and low energy consumption. The experimental results show an accuracy of 99.2%, a processing speed of 1.54 FPS and a power consumption of 10 W. The algorithm can be widely deployed in the industrial field as cutting-edge equipment.
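The reported speed and power figures can be combined into a per-frame latency and a per-frame energy cost. The short calculation below assumes that the 10 W figure is the average power drawn while frames are processed at 1.54 FPS; the function name `per_frame_cost` is introduced here for illustration.

```python
def per_frame_cost(fps, power_w):
    """Return (latency in seconds, energy in joules) for one frame."""
    latency_s = 1.0 / fps           # time spent on one frame
    energy_j = power_w * latency_s  # energy = power x time
    return latency_s, energy_j


latency_s, energy_j = per_frame_cost(fps=1.54, power_w=10.0)
print(f"{latency_s:.3f} s/frame, {energy_j:.2f} J/frame")
# prints "0.649 s/frame, 6.49 J/frame"
```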

By the second quarter of 2022, a title search on Web of Science returned 366, 173 and 95 articles for YOLOv3, YOLOv4 and YOLOv5, respectively. YOLOv3 is a very classical algorithm, while YOLOv4 and YOLOv5 represent an inevitable trend. In particular, YOLOv5 has made new progress in the fields of robotics [27] and dynamic object capture [28], which points out the direction for future improvement of this research.

**Author Contributions:** Conceptualization, L.Y. and Q.Z.; methodology, L.Y., J.Z. and Q.Z.; software, L.Y. and Q.Z.; validation, J.Z.; formal analysis, L.Y. and J.Z.; investigation, L.Y.; data curation, L.Y. and Z.W.; writing—original draft preparation, L.Y.; writing—review and editing, Q.Z. and J.Z.; supervision, J.Z.; Q.Z. contributed equally to this work and should be considered co-first authors. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the R&D Project in Key Areas of Guangdong Province, grant number 2020B0101050001. This research was also funded by the Qingdao City Philosophy and Social Science Planning 885 Project, grant number QDSKL1801166.

**Acknowledgments:** The authors would like to thank the anonymous referees and journal editors for their valuable and constructive feedback.

**Conflicts of Interest:** The authors declare no conflict of interest.
