3.3.1. Analysis of Twin-NMF Transfer Effectiveness
This section outlines six experimental groups designed to assess the efficacy of the Twin-NMF transfer method. Experiment 1, termed ‘No-transfer’, trains directly on the ECTI dataset for 200 epochs from random initialization, without loading any pre-trained weights. Experiment 2 loads YOLOv8n with the universal detection weights (v8n) obtained by pre-training on the large-scale COCO dataset and then fine-tunes on the ECTI dataset for 200 epochs. Experiment 3, referred to as No-TNMF(NEU), omits the Twin-NMF method: all 1800 NEU-DET images are used for 50 epochs of pre-training to develop the pre-training weights, which are subsequently fine-tuned over 200 epochs on the ECTI dataset. Experiment 4, referred to as TNMF(NEU), applies the Twin-NMF method to perform NMF on both the NEU-DET and ECTI-50 datasets, screens the NEU-DET samples using cosine similarity, trains for 50 epochs on the selected source domain samples to generate pre-training weights, and finally fine-tunes these weights for 200 epochs on the ECTI dataset. Experiments 5 and 6 repeat the latter two settings with GC10-DET as the source domain and are referred to as No-TNMF(GC10) and TNMF(GC10), respectively.
The premise underlying the six experimental settings is that the feature distributions of the source domain samples and the target domain ECTI samples differ: not all images from the source domain contain prior knowledge relevant to the target task associated with ECTI. This discrepancy, characterized as redundant deviation, can negatively affect the model’s performance, especially when the model is not effectively adapted to the target task. The TNMF method is designed to identify the source domain samples most pertinent to the target domain, facilitating a more direct and precise transfer of domain knowledge.
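The decomposition step of Experiment 4 can be outlined as follows. This is a minimal sketch, not the authors’ implementation: the feature matrices `V_src` and `V_tgt` are random stand-ins for non-negative image features extracted from NEU-DET and ECTI-50, and the rank of 8 is illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (samples x features) as W @ H
    using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update per-sample weights
    return W, H

# Hypothetical stand-in features: 60 source images and 10 target images,
# each described by 32 non-negative features.
rng = np.random.default_rng(1)
V_src = rng.random((60, 32))   # NEU-DET stand-in
V_tgt = rng.random((10, 32))   # ECTI-50 stand-in

# "Twin" factorization: both domains are decomposed with the same rank,
# so their per-image weight vectors (rows of W) are directly comparable.
W_src, H_src = nmf(V_src, rank=8)
W_tgt, H_tgt = nmf(V_tgt, rank=8)
```

Under this reading, the rows of `W_src` and `W_tgt` are the per-image representations that the subsequent cosine similarity screening compares.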
Table 1 presents the data from ablation experiments that evaluate the effectiveness of this transfer method on the ECTI dataset.
As illustrated in Table 1, Experiment 2, which utilized the general detection weights (v8n), exhibited a 0.1% decrease in precision and a 1.1% decrease in recall compared to Experiment 1, which trained the model from scratch without pre-trained weights. This suggests that the COCO dataset, from which the v8n weights are derived, is not closely aligned with the ECTI dataset. In contrast, Experiment 4, which employed the Twin-NMF method to screen NEU-DET, demonstrated a 0.8% increase in precision and a 0.7% increase in [email protected] compared to Experiment 3, where Twin-NMF was not used. Notably, the recall in Experiment 3 was 1.6% lower than in Experiment 1, indicating a significant rate of missed detections. This issue may be attributed to the redundant bias in NEU-DET, which affects parameter transfer. With GC10-DET as the source domain dataset, Experiment 6, which applied TNMF, shows the same trend relative to Experiment 5, where TNMF was not used, with precision 1.2% higher and [email protected] also 1.2% higher. As indicated by Table 1, the Twin-NMF method leads to a notable improvement in model performance on the target domain task, particularly in detection precision, where TNMF(NEU) reached 97.9% and TNMF(GC10) reached 97.5%.
Figure 5 displays the loss and precision curves for the six experimental groups. As shown in Figure 5, Experiment 2, which utilizes the pre-trained v8n weights from the COCO dataset, exhibits significantly lower loss than the other five experiments; this low starting loss reflects the benefit of pre-training on a large generic dataset. Compared with the experiments that load pre-training weights, Experiment 1, trained directly from scratch, converges noticeably later on the precision curve, and its loss curve descends from the highest initial value. The distinction between Experiments 3 and 4 lies in the use of the TNMF method: although the difference in the loss curves is marginal, the divergence in the precision curves is more pronounced. Experiment 4, which employs TNMF, outperforms Experiment 3, most noticeably in its milder early oscillations and smoother convergence in the later stages. With GC10-DET as the source domain dataset, TNMF(GC10) likewise outperforms No-TNMF(GC10) on the precision curve. The results on both source domain datasets corroborate the effectiveness of the TNMF method in transfer.
To screen the source domain NEU-DET samples using cosine similarity, the relevant cosine similarity values are extracted and the distribution of the screened samples within the NEU-DET dataset is plotted. This distribution is illustrated in Figure 6.
After completing the joint Twin-NMF process on the NEU-DET and ECTI-50 datasets, cosine similarity is utilized in the screening stage. For each image in ECTI-50, the cosine similarity with every image in NEU-DET is calculated and the results are ranked by magnitude; the top 30 most similar NEU-DET images are selected, and those exceeding the threshold are treated as samples strongly associated with the ECTI target domain. In total, 1181 NEU-DET images appeared in the top 30 cosine similarity rankings, of which 552 were identified as having strong correlations above the threshold value.
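The screening step described above can be sketched as follows. The feature vectors here are random stand-ins (in the actual pipeline they would be the per-image NMF coefficient vectors), and the threshold of 0.9 is illustrative, not the paper’s value.

```python
import numpy as np

def cosine_sim_matrix(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T

rng = np.random.default_rng(0)
tgt = rng.random((50, 16))     # stand-in vectors for the 50 ECTI-50 images
src = rng.random((1800, 16))   # stand-in vectors for the 1800 NEU-DET images

sims = cosine_sim_matrix(tgt, src)   # shape (50, 1800)
top_k, threshold = 30, 0.9           # threshold is illustrative

selected, strong = set(), set()
for row in sims:
    order = np.argsort(row)[::-1][:top_k]   # top-30 most similar source images
    selected.update(int(i) for i in order)
    strong.update(int(i) for i in order if row[i] > threshold)
# `selected` plays the role of the 1181 top-30 images and `strong` of the
# 552 above-threshold images (the counts differ for this synthetic data).
```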
The distribution of the NEU-DET data before and after applying TNMF is visualized with the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm [25], as shown in Figure 7.
In three-dimensional space, the green ECTI dataset forms a relatively centralized cluster and overlaps with the other two datasets (NEU-DET and TNMF-NEU, the result of applying the TNMF screening process) in certain areas. The TNMF-NEU dataset retains part of the feature distribution of the original NEU-DET dataset. At the same time, judging from the overlap between clusters and the tightness within each cluster, TNMF-NEU converges toward the ECTI dataset. This suggests that TNMF introduces new features while preserving the original data features, bringing the data distribution closer to that of the target domain. However, feature tuning alone may not fully compensate for the distributional differences between the source and target domains. Thus, while TNMF can facilitate adaptation from the source domain to the target domain task to some extent, it does not completely eliminate the variability between the two.
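A projection like the one in Figure 7 can be reproduced in outline with scikit-learn’s t-SNE. The three Gaussian blobs below are synthetic stand-ins for the NEU-DET, TNMF-NEU, and ECTI feature sets; only the embedding mechanics are shown, and the group sizes and perplexity are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-ins for the three groups plotted in Figure 7.
feats = np.vstack([
    rng.normal(0.0, 1.0, (40, 16)),   # NEU-DET
    rng.normal(0.5, 1.0, (30, 16)),   # TNMF-NEU (shifted toward the target)
    rng.normal(1.0, 1.0, (20, 16)),   # ECTI
])
labels = np.repeat([0, 1, 2], [40, 30, 20])

# Embed into 3-D, matching the three-dimensional view described above.
emb = TSNE(n_components=3, perplexity=10.0, random_state=0).fit_transform(feats)
# emb gives one 3-D point per image; colouring by `labels` yields the
# cluster/overlap picture discussed in the text.
```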
3.3.2. Analysis of SimAM Effectiveness
This section evaluates the addition of the SimAM module to the six previously described experimental groups and analyzes its effect on the model. The results are detailed in Table 2. Notably, incorporating SimAM into the neck of the model yielded significant performance improvements across all groups, with the most substantial enhancement observed in the final model, which combines Twin-NMF transfer and SimAM. The TNMF(NEU) + SimAM combination improved precision by 1%, [email protected] by 0.5%, and [email protected] by 1%, and achieved an F1 score of 0.98. Similarly, TNMF(GC10) + SimAM showed a 1.5% improvement in precision and a 0.4% improvement in [email protected] compared to the baseline.
The introduction of SimAM aims to mitigate the accuracy collapse observed when the model detects intermediate-scale defects. More detailed experimental results across the various defect types are presented in Table 3.
Table 3 shows that medium defects exhibit reduced accuracy compared to the larger and smaller categories, a phenomenon referred to in this paper as “accuracy collapse”. This issue may arise from the model’s inability to differentiate medium defects accurately, often misclassifying them as small or large, which in turn lowers the overall model accuracy. Compared to the original model, integrating SimAM has notably improved detection rates for medium defects, thereby strengthening the model’s weakest aspect. We compared the results before and after adding SimAM for the medium defect types; red marks indicate improvements, while blue marks indicate decreases.
Meanwhile, a comparative analysis between SimAM and other attention methods was performed, with the findings detailed in Table 4. The data where our method performs best are shown in bold.
Table 4 compares the performance metrics and parameter overhead of SimAM against several other common attention mechanisms (SE, CBAM, and ECA). In terms of performance, the precision and recall of SimAM are slightly higher than those of the other mechanisms. Moreover, unlike the other attention mechanisms, SimAM introduces no extra parameters: it derives three-dimensional attention weights for each neuron from an energy function rather than from learned layers, enhancing overall model performance at zero parameter cost. SimAM thus strikes an ideal balance between performance and efficiency, making it more attractive than mechanisms such as SE, CBAM, and ECA.
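The parameter-free weighting can be sketched as follows. This is a minimal NumPy rendering of SimAM’s energy-based gating on a single feature map, not the model’s actual module; the regularization constant `lam` uses the commonly cited 1e-4, which may differ from the setting in this paper.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention on a feature map x of shape (C, H, W).

    A 3-D weight is computed for every neuron from the channel-wise energy
    function and applied through a sigmoid gate; no learnable parameters."""
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel spatial mean
    d = (x - mu) ** 2
    var = d.sum(axis=(1, 2), keepdims=True) / n    # per-channel variance
    e_inv = d / (4.0 * (var + lam)) + 0.5          # inverse energy per neuron
    return x * (1.0 / (1.0 + np.exp(-e_inv)))      # sigmoid-gated reweighting

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16)).astype(np.float32)
out = simam(feat)   # same shape as the input, reweighted, zero extra parameters
```

Neurons that deviate most from their channel mean receive the largest gate values, which is how SimAM highlights distinctive regions without adding parameters.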