1. Introduction
The knee joint [1] functions as a hinge joint and is stabilized by four key ligaments. These ligaments [2] are essential for connecting bones and regulating joint movements. The knee is supported by two collateral ligaments on its sides and two cruciate ligaments within the joint. The cruciate ligaments, namely the anterior cruciate ligament (ACL) [3] and the posterior cruciate ligament (PCL) [4], connect the distal end of the femur to the proximal end of the tibia.
Figure 1 provides a detailed illustration of the knee joint’s anatomical structure.
The anterior cruciate ligament (ACL) is a crucial component of the knee joint, connecting the femur to the tibia. It is one of the most vulnerable ligaments in the human body and frequently sustains injuries that often necessitate surgical repair [5]. ACL tears are the most prevalent type of knee injury [6]. Given the intricate structure of the knee joint, the ACL is essential for maintaining stability and function. Consequently, prompt and precise diagnosis and treatment of ACL injuries are critically important.
ACL tears are among the most common knee injuries, with a reported incidence rate in the United States of up to 74.6% [7]. These injuries are particularly prevalent among athletes, especially those involved in soccer, basketball, and volleyball. Rising health awareness and the growing number of sports participants have led to an annual increase in ACL injury cases [8]. Research indicates that ACL injuries significantly compromise knee stability and adversely impact patients’ quality of life [9,10,11,12]. Hence, timely and precise diagnosis, along with effective treatment, is essential for managing ACL injuries.
Current clinical methods for diagnosing ACL injuries predominantly involve imaging techniques [13,14,15], including ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI). While ultrasound and CT can present challenges in accurately localizing injuries, MRI is regarded as the preferred diagnostic tool: it clearly depicts the structures and tissue changes within the knee joint, making it particularly advantageous for diagnosing ACL injuries. Consequently, MRI is extensively utilized in clinical settings for the assessment of ACL damage.
In recent years, the widespread application of deep learning technology [16,17,18,19,20,21,22] has led to significant advancements in medical imaging, particularly in diagnosing ACL injuries. Bien et al. developed the MRnet dataset and employed ImageNet transfer learning weights to train multiple AlexNet networks to detect ACL tears, meniscal tears, and other injuries [23]. Their model, validated on the external KneeMRI dataset by Stajduhar et al., achieved an area under the curve (AUC) of 91%, markedly surpassing previous results [24]. Awan et al. redesigned the ResNet14 architecture for ACL injury diagnosis on the KneeMRI dataset; by using hybrid class balancing and real-time data augmentation, they addressed class imbalance and achieved a three-class accuracy (ACC) of 92% [25]. More recent work introduced the lightweight ELNet model, which, combined with multi-scale normalization and blur pooling, achieved an AUC of 96% for ACL tear detection on the MRnet dataset [26]. Dunnhofer et al. employed a feature pyramid model to extract the region of interest (ROI) from each MRI slice, increasing the AUC for ACL injury detection to 97.6% [27]. Additionally, Belton et al. used spatial attention and feature concatenation to enhance the model’s ability to capture critical features within each MRI slice, achieving an AUC of 97.2% [28]. These studies highlight the immense potential of deep learning in medical imaging diagnostics, paving the way for more accurate diagnosis of ACL injuries.
Despite tremendous advances in applying deep learning to diagnose ACL injuries, some hurdles remain, most notably limited dataset variety and heterogeneous sample distributions. These limitations impede models’ generalization capabilities, reducing their performance in real-world clinical contexts. Domain adaptation and transfer learning offer promising solutions: by transferring knowledge from one domain to another, these techniques can improve model performance on new datasets while effectively handling distribution shifts between datasets. Domain adaptation approaches allow information to be transferred between source and target domains even when data distributions differ substantially. They can improve model accuracy and robustness, reduce the need for large volumes of labeled data in annotation-scarce settings, and lower data collection costs and time. This study investigates and utilizes domain adaptation and transfer learning strategies to address the limitations of current ACL injury diagnosis models. By combining these methodologies, we aim to improve diagnostic accuracy and reliability across a wide range of datasets and clinical settings, providing more effective and practical tools for medical practice, improving patient diagnosis and treatment outcomes, and advancing medical imaging diagnostics.
This study uses the MRnet and KneeMRI datasets to model MRI images with deep learning methods. By combining these datasets as source and target domain data, we enable domain adaptation and transfer learning for ACL injury detection, improving the model’s generalization performance across diverse data conditions. This work also provides useful reference material for ACL domain transfer models and has potential implications for clinical auxiliary diagnostics. The primary contributions of this study are as follows:
(i) This is the first study to investigate domain adaptation and transfer learning for ACL diagnosis using the MRnet and KneeMRI datasets. The proposed strategy overcomes the limitations of single-dataset models by improving generalization performance across diverse data conditions.
(ii) We provide a novel source and target domain data processing technique based on contrastive learning, which yields a one-to-many domain adaptation framework. This approach aligns the feature distributions of the source and target domains, making it well suited to the study’s specific requirements.
(iii) The experimental results, using a ResNet-18 backbone and a spatial attention mechanism, reveal that our contrastive learning-based domain transfer strategy outperforms both fine-tuned and hybrid training models.
The remaining sections of this work are organized as follows. Section 2 summarizes relevant work on transfer learning models and feature transfer approaches. Section 3 describes the proposed technique. Section 4 explains the datasets and experimental methodologies in depth. Finally, Section 5 summarizes the findings and suggests areas for future investigation.
3. Proposed Method
3.1. Framework
Bien et al. [23] applied the MRnet model to the KneeMRI dataset through fine-tuning, but without explicitly considering the relationship between samples from different domains. In clinical practice, MRI equipment varies across hospitals, resulting in distributional differences in data. Consequently, models trained on source domain data may not generalize well to target domain data. There is therefore a critical need for more sophisticated domain adaptation methods for ACL diagnosis. Such methods should leverage the relationships between features from the source and target domains to transfer knowledge effectively to target domain data.
This study introduces a contrastive learning-based domain-adaptive model for ACL diagnosis. By employing contrastive learning between samples from the source and target domains, the disparity between domains is minimized.
Figure 2 illustrates the key steps of the proposed approach. The architecture consists of two main components: data preprocessing and model training.
The data preparation procedure is divided into two major tasks: feature extraction from the target domain and sample filtering in the source domain. During the filtering phase, the model predicts the source domain samples, and those that are mistakenly classified are then processed. Meanwhile, samples from the target domain are processed by the model’s backbone network to extract features that are then used to calculate the contrastive loss.
During the model training phase, samples from both the source and target domains are used as inputs. For the target domain samples, only the classification cross-entropy loss is calculated. In contrast, for the source domain samples, in addition to the classification cross-entropy loss, the contrastive loss is calculated using the target domain-extracted features.
The network used in the workflow consists of two parts: a feature extraction network and a classifier network. The MRI data take the form N × H × W, where N represents the number of slices in the sample and H × W represents the image size. The feature extraction network converts this input into a feature vector through a backbone network, average pooling, fully connected layers, and max pooling. The classifier network, composed of fully connected layers and a sigmoid activation function, then predicts the probability that the vector belongs to the positive class.
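As a rough illustration of this pipeline, the NumPy sketch below stubs the backbone as a single linear map (our experiments use ResNet-18, which is not reproduced here) and folds the spatial average pooling into that stub; the dimensions N, H, W, D and all weight names are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not fix H, W, or the feature width D.
N, H, W, D = 8, 32, 32, 16

# Illustrative stand-ins for learned weights.
W_bb = rng.standard_normal((H * W, D)) * 0.01   # backbone stubbed as a linear map
W_fc = rng.standard_normal((D, D)) * 0.1        # fully connected layer
w_cls = rng.standard_normal(D) * 0.1            # classifier weights
b_cls = 0.0

def extract_feature(volume):
    """Feature extraction network: per-slice backbone features (spatial
    average pooling folded into the linear stub above), a fully
    connected layer with ReLU, then max pooling over the N slices."""
    per_slice = volume.reshape(volume.shape[0], -1) @ W_bb  # (N, D)
    hidden = np.maximum(per_slice @ W_fc, 0.0)              # (N, D)
    return hidden.max(axis=0)                               # (D,)

def classify(feature):
    """Classifier network: fully connected layer + sigmoid -> P(positive)."""
    z = feature @ w_cls + b_cls
    return 1.0 / (1.0 + np.exp(-z))

volume = rng.standard_normal((N, H, W))  # one MRI sample of shape N x H x W
p = classify(extract_feature(volume))
```

The per-slice max pooling at the end mirrors MRnet-style aggregation: the slice with the strongest evidence dominates the sample-level feature vector.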
The upcoming sections will offer a comprehensive description of the specific components within the framework.
3.2. Data Preprocessing
3.2.1. Source Data Filter
In contrastive learning, positive samples are those that belong to the same class as the anchor sample, while negative samples belong to different classes. The main idea is to rearrange samples in the feature space, bringing positive samples closer to the anchor sample and pushing negative samples away. When the model, fine-tuned on the target domain, predicts a source domain sample correctly, that sample is already closely aligned with the target domain positives in feature space; when the prediction is wrong, the sample lies closer to the negatives. To bring wrongly classified samples closer to the positive samples, corrective measures are required. The detailed process is illustrated in Figure 3.
Our experimental strategy includes testing the model on the entire source domain dataset before each training iteration. Samples that are misclassified, i.e., those for which the model’s predictions do not align with the actual labels, are gathered into a set M. This set is then combined with the target domain training set for subsequent training iterations. The detailed process is outlined in Algorithm 1.
The essence of this strategy is to guide the misclassified source domain samples so that the model can update their feature representations during training. This modification seeks to better align the feature distributions of the source and target domains. By correcting these misclassified samples, the difference in distribution between the source and target domains can be successfully reduced, improving the model’s performance in the target domain.
Algorithm 1 Source Data Filter
Require: Source training set S, source training set labels Y_S, network G
Ensure: Misclassified sample set M
1: M ← ∅ ▹ Initialize the set of misclassified samples
2: for (x_i, y_i) ∈ (S, Y_S) do
3:  ŷ_i ← G(x_i) ▹ Predict label using network G
4:  if ŷ_i ≠ y_i then
5:   M ← M ∪ {x_i} ▹ Add misclassified sample to M
6:  end if
7: end for
8: return M
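The filtering loop above can be sketched in a few lines of Python; here the `predict` callable stands in for the network G, and the toy thresholded "model" and data are purely illustrative.

```python
import numpy as np

def filter_source(samples, labels, predict):
    """Source data filter sketch: run the current model over the whole
    source training set and keep the samples it misclassifies."""
    misclassified = []
    for x, y in zip(samples, labels):
        if predict(x) != y:                 # prediction disagrees with label
            misclassified.append((x, y))    # collect for the next iteration
    return misclassified

# Toy usage with a hypothetical thresholded "network".
samples = [np.array([0.9]), np.array([-0.4]), np.array([0.2])]
labels = [1, 0, 0]                          # only the third sample is misclassified
predict = lambda x: int(x.sum() > 0.0)
kept = filter_source(samples, labels, predict)
```

In the full method, `kept` is the set M that participates in contrastive training against the target domain features.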
3.2.2. Target Feature Extraction
In traditional contrastive learning methods, an equal number of source domain and target domain samples are placed into a single batch for training. However, the unique format (N × H × W) of MRI dataset samples typically results in each batch containing only one source domain sample and one target domain sample. This setup makes direct implementation of contrastive learning difficult. Alternatively, extracting features from all target domain samples in real time and computing the contrastive loss against the current source domain sample at every training step would be highly time-consuming, significantly affecting the experiment’s efficiency and feasibility. The detailed process is illustrated in Figure 4.
To tackle this challenge, we adopted a more streamlined and time-efficient experimental approach. Prior to each training iteration, we first used the model to extract features from the target domain training set. Based on the sample labels, we then partitioned the target domain features into two distinct sets, F_0 and F_1. These sets were used in the subsequent training iteration to calculate the contrastive loss alongside the source domain samples. The detailed process is outlined in Algorithm 2.
This preprocessing technique enabled us to extract and organize the target domain’s features before training began. This allowed us to bypass the batch size constraint and use pre-extracted features for contrastive learning computations while training. Furthermore, our technique dramatically reduces processing costs by eliminating the need for repeated feature extraction in each training session. This change increases the practicality and efficiency of employing contrastive learning on MRI data.
Algorithm 2 Target Feature Extraction
Require: Target training set T, target training set labels Y_T, feature extraction network G_f
Ensure: Target class 0 feature set F_0, target class 1 feature set F_1
1: F_0 ← ∅ ▹ Initialize the feature set for class 0
2: F_1 ← ∅ ▹ Initialize the feature set for class 1
3: for (x_i, y_i) ∈ (T, Y_T) do
4:  f_i ← G_f(x_i) ▹ Extract features using network G_f
5:  if y_i = 0 then
6:   F_0 ← F_0 ∪ {f_i} ▹ Add features to class 0 feature set
7:  else if y_i = 1 then
8:   F_1 ← F_1 ∪ {f_i} ▹ Add features to class 1 feature set
9:  end if
10: end for
11: return F_0, F_1
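A minimal Python sketch of this pre-extraction step; `extract` stands in for the feature extraction network, and the mean-pooling extractor in the usage example is a placeholder, not the actual backbone.

```python
import numpy as np

def split_target_features(samples, labels, extract):
    """Target feature extraction sketch: compute target-domain features
    once per iteration and split them into class-0 and class-1 sets so
    the contrastive loss can reuse them without re-running the backbone."""
    class0, class1 = [], []
    for x, y in zip(samples, labels):
        f = extract(x)                      # feature from the backbone
        (class1 if y == 1 else class0).append(f)
    return class0, class1

# Toy usage with a hypothetical mean-over-slices "extractor".
samples = [np.ones((2, 4)), np.zeros((2, 4)), 2 * np.ones((2, 4))]
labels = [1, 0, 1]
f0, f1 = split_target_features(samples, labels, lambda x: x.mean(axis=0))
```

Caching the two sets up front is what removes the batch-size constraint: each source sample can later be contrasted against every cached target feature.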
3.3. Contrastive Learning
In the previous section, we discussed data preprocessing methodologies that prepare the source and target domain MRI data for contrastive learning. The remaining challenge is how to train effectively on the preprocessed data. To address this, we apply different loss calculations during training depending on whether the data come from the source or the target domain. This enables us to make the best use of label information in a supervised contrastive learning framework while also effectively exploiting inter-domain information.
For the target domain data, the model computes the classification loss using the binary cross-entropy loss function:

L_cls = −[y log p + (1 − y) log(1 − p)],

where p denotes the predicted probability of the positive class and y represents the actual label.
For the source domain data, the model also calculates the contrastive loss relative to the target domain data. This method facilitates bringing the source domain data closer to the target domain data within the feature space.
Specifically, the target domain features and source domain features are first paired as positive and negative samples. Each current source domain feature, paired with a same-class target domain feature, constitutes a positive sample pair; paired with a different-class target domain feature, it forms a negative sample pair. Here, f_i denotes the feature extracted by the network from the i-th filtered source domain sample, and F_0 and F_1 denote the feature vector sets of the two target domain classes extracted during preprocessing. P_i denotes the set of target domain positive sample features corresponding to f_i, and N_i denotes the set of target domain negative sample features corresponding to f_i. |P_i| and |N_i| denote the numbers of positive and negative sample pairs, respectively.
For each positive and negative sample pair, we first calculate the cosine similarity:

sim(u, v) = (u · v) / (‖u‖ ‖v‖).
Next, the cosine similarity is divided by the temperature hyperparameter τ and exponentiated. The InfoNCE (Noise Contrastive Estimation) loss is then computed for each positive pair and averaged to obtain the final contrastive loss:

L_con = −(1/|P_i|) Σ_{f_p ∈ P_i} log [ exp(sim(f_i, f_p)/τ) / ( exp(sim(f_i, f_p)/τ) + Σ_{f_n ∈ N_i} exp(sim(f_i, f_n)/τ) ) ].
The final contrastive loss is weighted and added to the classification loss of the target domain to compute the overall loss:

L = L_cls + λ · L_con,

where λ is a weighting hyperparameter.
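Putting the pieces together, the loss computation for one filtered source domain feature can be sketched as follows; the default values for the temperature `tau` and the weight `lam` are illustrative, not the tuned hyperparameters.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy for one sample (classification loss)."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cosine_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def info_nce(f_src, positives, negatives, tau=0.1):
    """InfoNCE contrastive loss for one filtered source-domain feature
    against the pre-extracted target feature sets, averaged over the
    positive pairs (tau is the temperature hyperparameter)."""
    pos = np.array([np.exp(cosine_sim(f_src, p) / tau) for p in positives])
    neg_sum = sum(np.exp(cosine_sim(f_src, n) / tau) for n in negatives)
    return float(np.mean(-np.log(pos / (pos + neg_sum))))

def total_loss(cls_loss, con_loss, lam=0.5):
    """Overall objective: classification loss plus weighted contrastive loss."""
    return cls_loss + lam * con_loss

# Sanity check: a source feature aligned with its positives should incur
# a much smaller contrastive loss than one aligned with its negatives.
anchor = np.array([1.0, 0.0])
well_aligned = info_nce(anchor, [np.array([1.0, 0.1])], [np.array([-1.0, 0.0])])
misaligned = info_nce(anchor, [np.array([-1.0, 0.0])], [np.array([1.0, 0.1])])
```

Lowering `tau` sharpens the softmax over similarities, so hard negatives dominate the gradient; this is the usual trade-off when choosing the temperature.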
To leverage the intrinsic knowledge in target domain samples, a supervised classification loss is used for them. For source domain samples, a hybrid of classification loss and contrastive loss is used. This strategy exploits both the intrinsic properties of the source domain and the inter-domain information shared with the target domain. As a result, it reshapes the distribution of source and target domain samples within the feature space, increasing the similarity of source domain samples to those from the target domain. In this way, the method aims to improve the model’s performance in the target domain by efficiently combining contrastive learning with MRI data.
5. Discussion
In this research, we address the problem of limited generalization in transfer learning for ACL diagnostic models by proposing a strategy that combines contrastive learning with source domain sample filtering. First, we apply data preparation techniques tailored to the properties of the MRI data in the source and target domains. We iteratively modify the source domain training set by identifying mispredicted samples, which reveal feature space discrepancies that need to be adjusted, and fine-tuning the network. Because the target domain has a unique sample format, traditional contrastive learning methods are not directly applicable; instead, before training starts, we use the network to pre-extract features from the target domain training samples. We then build a loss function that combines the contrastive loss with a weighted classification loss for the source domain samples.
Our methodology’s fundamental components extend beyond the KneeMRI and MRnet datasets. This approach can be used for various medical imaging applications that need transfer learning. For example, extending this method to different types of MRI scans, such as those used for brain or spine diagnoses, could illustrate its adaptability. It can also be applied to other imaging modalities, such as CT or PET scans, by tailoring the feature extraction and loss methods to the unique properties of each. Our method’s flexibility and adaptability suggest that it has the potential to increase model performance and generalization in a wide range of medical imaging applications.
Our strategy has substantial advantages over classic contrastive learning techniques and existing multimodal models. Traditional contrastive learning frequently employs a one-size-fits-all strategy that may neglect domain-specific difficulties, whereas our method pre-extracts features from the target domain, better tailoring the model to its unique characteristics. Furthermore, by integrating contrastive loss and weighted classification loss, our technique outperforms single-loss methods in terms of feature discrimination and handling sample distribution imbalance. Compared to multimodal models, which require significant computational resources and large datasets, our method is more resource-efficient. It offers a practical solution by focusing on single-modality data while efficiently managing domain-specific features, making it especially useful in situations where multimodal data are limited or unavailable.
Contrastive learning has gained traction in medical imaging, mainly due to its ability to learn robust feature representations from limited data. In recent years, several studies have used contrastive learning to improve model generalization in various medical imaging tasks. Zhang et al. [34] demonstrated the use of contrastive learning in multimodal medical image segmentation, successfully improving segmentation accuracy by using unlabeled data to learn invariant features across different imaging modalities. Similarly, Ammar et al. [35] applied contrastive learning to cardiac MRI classification, which helped distinguish between different cardiac conditions by effectively managing class imbalances and improving feature discrimination.
However, our work introduces significant changes that distinguish it from previous investigations. Unlike standard techniques, we extract features from target domain samples prior to performing contrastive learning, an essential adaptation given the target domain’s specific properties in ACL diagnosis. In addition, we combine the contrastive loss with a classification loss, which not only enhances feature discrimination but also tackles the issues caused by sample distribution imbalances between the source and target domains.
In conclusion, while contrastive learning is frequently employed, the adaptations in our methodology represent meaningful advancements. Situating these components within the current literature clarifies the innovative aspects of our work and emphasizes its significance for transfer learning in medical imaging.
Nevertheless, this study has limitations, prompting several avenues for future research:
1. Improving source domain data filtering procedures to ensure the accurate identification of samples that need adjustment, especially those that differ significantly from positive samples in the target domain.
2. Exploring more effective feature extraction techniques for target domain features to better align with real-time properties, potentially impacting the design of contrastive loss.
3. Refining contrastive loss calculation methods to better accommodate the one-to-many matching nature of source and target domain samples in contrastive learning.
4. Moving away from supervised learning approaches in the target domain and toward unsupervised learning methods that better represent real-world data issues in medical diagnostics.
These directions aim to advance the robustness and applicability of our proposed methodology in real-world medical imaging applications.