1. Introduction
Magnetic resonance imaging (MRI) of the shoulder is essential for evaluating shoulder joint pathologies [1,2,3,4,5]. The technology’s excellent soft tissue contrast and multiplanar acquisition capabilities enable optimal assessments of muscles, tendons, hyaline and fibrous cartilage, joint capsules, fat, bursae, and bone marrow [1,2,3,4,5]. Unlike ultrasound, which is effective primarily for assessing rotator cuff injuries, MRI offers a comprehensive evaluation of the bone marrow, cartilage, and glenoid labrum, establishing it as the most reliable imaging modality [3,4,5]. Common indications for shoulder MRI include suspected rotator cuff tears, shoulder instability, osteonecrosis, neoplasms, and infections. Additionally, MRI is proficient in diagnosing adhesive capsulitis and impingement syndromes [5,6,7].
Despite its advantages, shoulder MRI faces challenges, notably the high costs of prolonged acquisition times, which also elevate the risk of motion artifacts. This concern is particularly relevant when images with high spatial and contrast resolution are required to detect subtle tears in tendinous and ligamentous structures.
Recent advancements in artificial intelligence (AI), particularly deep learning (DL) algorithms, have been proposed to enhance image reconstruction from undersampled data, thereby reducing scanning times [8,9,10,11,12,13,14,15,16,17]. AI and DL techniques can be applied across various anatomical regions with different acceleration factors. These approaches differ from traditional acceleration techniques such as parallel imaging and compressed sensing in their ability to learn complex patterns from training data to reconstruct high-quality images from significantly undersampled acquisitions [9,10,11,12,13,14].
Deep learning reconstruction offers several advantages over conventional methods, including improved signal-to-noise ratios (SNRs) and reduced artifacts [8,15,16]. However, these benefits must be balanced against potential trade-offs in image quality, particularly at higher acceleration factors. A modest acceleration factor may provide only limited time savings while preserving fine detail. More aggressive acceleration, capable of reducing sequence durations by a factor of four to six, could significantly enhance MRI workflows but may compromise the detection of subtle pathologies.
Different MR sequences may respond differently to DL reconstruction techniques. For instance, T1-weighted sequences, which typically have inherently high SNRs and relatively short acquisition times, may benefit less dramatically from DL reconstruction compared to T2-weighted or STIR sequences, which are often more signal-limited and time-consuming [8]. This differential benefit across sequence types is an important consideration when implementing accelerated protocols.
The recent literature has explored accelerated MRI protocols utilizing DL and AI for knee and shoulder examinations, revealing that DL can substantially reduce acquisition times while maintaining image quality and diagnostic confidence comparable to that of standard turbo spin-echo (TSE) MRI [17,18,19]. However, to date, there have been no studies comparing different acceleration levels from commercially available MRI scanners.
The objective of this study is to evaluate the diagnostic accuracy of 2-fold and 4-fold accelerated shoulder MRI protocols, using the standard protocol as the reference for diagnosis.
2. Materials and Methods
2.1. Study Population
Between June 2023 and January 2024, we considered 92 consecutive patients with clinically suspected rotator cuff tears, labral lesions, or bone edema during orthopedic or sports medicine visits for inclusion. The inclusion criteria were shoulder pain and no history of previous surgery. All patients underwent a standard MRI protocol (standard of care) as well as two accelerated protocols (ACS) on the same day.
The exclusion criteria included incomplete imaging data, examinations that were limited in their assessment due to significant motion artifacts, and previous surgery. Following the application of the exclusion criteria, four patients were removed from the analysis due to incomplete data acquisition (n = 2) and significant imaging artifacts (n = 2). The final study cohort comprised 88 patients, with a demographic distribution of 49 males and 39 females and a mean age of 51 years (range: 22–78 years).
2.2. MRI Standard Protocols
All patients were examined using a 3.0 Tesla MR scanner (uMR Omega, United Imaging Healthcare, Shanghai, China). The institutional review board validated the data collection for this prospective clinical study involving all adult patients (>18 years). All data were gathered as part of routine clinical care.
2.3. Technical Parameters
All MRI examinations were performed using a 16-channel dedicated shoulder coil. The standard protocol sequences included T1-weighted turbo spin-echo (TSE), T2-weighted TSE, and TIRM sequences in multiple planes. For T1-weighted TSE, the parameters were TR/TE/FA: 650.0 ms/18.0 ms/150°, slice thickness: 3 mm, acquisition planes: axial and coronal, acquisition time: 3 min 42 s per plane. For T2-weighted TSE, the parameters were TR/TE: 4300.0 ms/124.0 ms for coronal plane; TR/TE: 3500.0 ms/39.0 ms for axial plane, slice thickness: 3 mm, acquisition time: 4 min 18 s and 3 min 56 s, respectively. For TIRM sequences, the parameters were TR/TE: 4800.0 ms/46.0 ms, TI: 150 ms, slice thickness: 3 mm, acquisition time: 4 min 52 s.
The DL2 and DL4 protocols used the same base parameters as the standard protocol but incorporated the uAI deep learning reconstruction algorithm (United Imaging Healthcare) with acceleration factors of 2-fold and 4-fold, respectively. This reduced the total acquisition times by 50% for the DL2 protocol and 75% for the DL4 protocol compared to the standard protocol. The specific acquisition times for DL2 were T1-weighted TSE: 1 min 51 s per plane, T2-weighted TSE: 2 min 9 s (coronal) and 1 min 58 s (axial), TIRM: 2 min 26 s. For DL4, the times were further reduced to T1-weighted TSE: 56 s per plane, T2-weighted TSE: 1 min 5 s (coronal) and 59 s (axial), TIRM: 1 min 13 s.
No modifications to the base sequence parameters were required for the accelerated protocols. The DL reconstruction was automatically applied during image acquisition using the manufacturer’s uAI algorithm, which employs a convolutional neural network trained on paired datasets of fully sampled and undersampled images [8,9,10,11,12,13]. Overall, the duration of each examination with all three protocols was shorter than the combined time of two standard protocols, and patients experienced no issues while remaining inside the gantry.
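The timing arithmetic in this section can be checked with a short sketch. It assumes that T1-weighted TSE is acquired in two planes and TIRM once (the number of TIRM planes is not stated), and that acceleration divides each sequence time by the nominal factor; the function names are illustrative only.

```python
# Standard-protocol acquisition times in seconds, taken from Section 2.3.
# Assumption: T1 TSE runs in two planes and TIRM once; the text does not
# state the number of TIRM planes.
STANDARD_S = {
    "T1 TSE axial": 3 * 60 + 42,
    "T1 TSE coronal": 3 * 60 + 42,
    "T2 TSE coronal": 4 * 60 + 18,
    "T2 TSE axial": 3 * 60 + 56,
    "TIRM": 4 * 60 + 52,
}


def accelerated_time(seconds, factor):
    """Nominal sequence time at a given acceleration factor."""
    return seconds / factor


def protocol_total(factor):
    """Total nominal acquisition time of the whole protocol, in seconds."""
    return sum(accelerated_time(s, factor) for s in STANDARD_S.values())


def fmt(seconds):
    """Format a duration in seconds as 'M min SS s'."""
    minutes, rest = divmod(round(seconds), 60)
    return f"{minutes} min {rest:02d} s"


standard_total = protocol_total(1)
dl2_total = protocol_total(2)
dl4_total = protocol_total(4)
all_three = standard_total + dl2_total + dl4_total

for name, total in [
    ("standard", standard_total),
    ("DL2", dl2_total),
    ("DL4", dl4_total),
    ("all three protocols", all_three),
    ("two standard protocols", 2 * standard_total),
]:
    print(f"{name:>24}: {fmt(total)}")
```

Under these assumptions, running all three protocols takes about 1.75 times the standard acquisition time, which is consistent with the statement that the full examination was shorter than two standard protocols.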
2.4. MRI Postprocessing
The accelerated protocols do not necessitate any additional post-acquisition processing since all acquisition parameters are comparable to those of the standard protocol. The MR system automatically employs the uAI deep learning reconstruction algorithm during image acquisition to reconstruct high-quality images from undersampled k-space data [14,15,16].
The uAI algorithm utilizes a multi-scale convolutional neural network architecture that was trained on paired datasets of fully sampled and undersampled acquisitions to learn optimal image reconstruction [8,15,16]. This approach differs from conventional parallel imaging and compressed sensing techniques by learning complex image features rather than relying on predefined mathematical constraints.
The DL reconstruction occurs in real time during the acquisition process, adding only minimal computational time (typically less than 10 s per sequence). There is no significant delay in obtaining and visualizing the accelerated images, ensuring that the workflow remains uninterrupted.
2.5. Image Analysis
Prior to the study, all four radiologists participated in a standardized training session that included an evaluation of 15 test cases (not included in the study cohort) to ensure consistent application of the evaluation criteria. This training session covered the specific scoring systems for each pathology and established agreement on borderline cases.
As the reference standard, all standard MRI images were evaluated in consensus by two experienced readers (L.R. and G.F.), who possess 25 and 15 years of experience, respectively, to determine the ground truth. Any disagreements between these two readers were resolved through discussion until consensus was reached. Subsequently, the 2-fold and 4-fold accelerated MRI (ACS) protocols were assessed on a dedicated offline workstation by four independent radiologists (A.S., E.O., L.R., and G.F., with 4, 10, 25, and 15 years of experience, respectively).
The evaluation was conducted blind and in random order. In the first step, image quality was assessed using a scale from 1 to 4, where 1 indicated poor quality, with the examination considered non-diagnostic, while 4 indicated optimal quality. Following this, the images were clinically evaluated for the diagnosis of structural abnormalities of the shoulder. Specifically, the readers assessed the presence of the following pathologies: bone edema, rotator cuff tears, and labral tears.
For each alteration, the results of the standard sequences and the accelerated sequences were deemed comparable only if there was a perfect anatomical match in the location of the alteration. All lesions were graded on a 4-point scale, where 1 indicated the absence of alterations and 4 indicated the presence of marked alterations.
The diagnosis of bone edema was based on the presence of increased water content, reflected by a signal increase in sequences with long repetition times (TRs) or a signal decrease in T1-weighted images. The scoring system was as follows: 1 for definitively no edema; 2 for presumably no edema; 3 for presumably the presence of edema; and 4 for definitively the presence of edema.
For tendon evaluation, the following scoring system was utilized: 1 for definitively no alterations; 2 for mild signal changes without tears; 3 for partial tears; and 4 for complete tears. Signal changes were noted in cases of tendon signal increases (in long-TR images), with or without thickening. Rotator cuff tears were identified by a loss of substance in one or more tendons of the cuff or the long head of the biceps. Each tendon was assessed individually, differentiating partial tears (articular or bursal) characterized by partial interruption of tendon fibers from complete ruptures, with or without tendon retraction. Given the non-arthrographic study protocol and the low incidence of labral abnormalities, the glenoid labrum was evaluated as a single anatomical unit without subdivision into its various anatomical sections. For labral tears, the following scoring system was adopted: 1 for definitively no alterations; 2 for mild signal changes without tears; 3 for partial tears; and 4 for complete tears.
2.6. Statistical Analysis
A statistical analysis was conducted using R software (version 4.3.0, R Foundation for Statistical Computing, Vienna, Austria). The sample size was determined assuming 90% power to detect a 10% difference in diagnostic accuracy between protocols (α = 0.05), requiring a minimum of 82 patients. A post hoc power analysis confirmed adequate statistical power (97%, β = 0.03) for detecting differences in diagnostic accuracy between protocols at the 0.05 significance level.
Standard protocol readings by two experienced radiologists (15 and 25 years of experience) served as the reference standard. For each reader and imaging parameter, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy with 95% confidence intervals (CIs).
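The per-reader metrics can be sketched from a 2 × 2 table as follows. The confidence interval method is not stated in the text, so the Wilson score interval used here is an assumption, and the counts in the example are hypothetical rather than study data.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_metrics(tp, fp, fn, tn):
    """Return {metric: (estimate, (ci_low, ci_high))} from 2x2 counts."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {
        name: (k / n if n else float("nan"), wilson_ci(k, n))
        for name, (k, n) in pairs.items()
    }


# Hypothetical counts for one reader and one pathology (not study data).
metrics = diagnostic_metrics(tp=45, fp=2, fn=5, tn=48)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name:>11}: {estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With few positive cases the interval is wide: under this method, detecting 9 of 9 lesions still gives a lower bound near 0.70, which illustrates why low-prevalence findings carry limited statistical weight.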
Reader scores were dichotomized (scores 1–2 = negative, 3–4 = positive) for the statistical analysis. We constructed receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC) to assess the overall diagnostic performance of each protocol. AUC values were computed and compared using DeLong’s test. McNemar’s test was used to assess paired differences in diagnostic accuracies between protocols.
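A minimal sketch of this step is given below, assuming the empirical (Mann–Whitney) AUC on the 4-point scores and the exact binomial form of McNemar’s test; the text does not specify which variant was used, and DeLong’s comparison is omitted. All scores in the example are hypothetical.

```python
from math import comb


def dichotomize(score):
    """Scores 1-2 are negative, scores 3-4 are positive."""
    return score >= 3


def auc_from_scores(scores, truth):
    """Empirical AUC: P(positive case scores higher), ties count 0.5."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect calls."""
    b = sum(a and not bb for a, bb in zip(correct_a, correct_b))
    c = sum(bb and not a for a, bb in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical reference standard and reader scores (not study data).
truth = [True, True, True, True, False, False, False, False]
dl2 = [4, 3, 4, 3, 1, 2, 1, 1]
dl4 = [4, 2, 3, 2, 1, 2, 1, 3]

auc_dl2 = auc_from_scores(dl2, truth)
auc_dl4 = auc_from_scores(dl4, truth)
correct_dl2 = [dichotomize(s) == t for s, t in zip(dl2, truth)]
correct_dl4 = [dichotomize(s) == t for s, t in zip(dl4, truth)]
p_value = mcnemar_exact(correct_dl2, correct_dl4)
print(f"AUC DL2 = {auc_dl2:.3f}, AUC DL4 = {auc_dl4:.3f}, McNemar p = {p_value:.3f}")
```

McNemar’s test uses only the discordant pairs, that is, cases read correctly with one protocol and incorrectly with the other, which is why it suits the paired design in which every patient underwent all protocols.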
Inter-reader agreement was evaluated using Kendall’s coefficient of concordance (W), with agreement strength classified as slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.00). We chose Kendall’s W over other agreement measures (such as kappa) because it can accommodate multiple raters simultaneously and provides a single consensus measure across all readers.
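For illustration, Kendall’s W can be computed as sketched below. Because 4-point scores produce many tied ranks, the sketch applies the standard tie correction; whether the original analysis did so is not stated, so this is an assumption, and the ratings shown are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[r][i] is rater r's score for case i."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    totals = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie term: sum of (t^3 - t) over each rater's groups of tied scores.
    ties = sum(sum(t**3 - t for t in Counter(r).values()) for r in ratings)
    denom = m**2 * (n**3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives constant scores")
    return 12 * s / denom


def agreement_strength(w):
    """Map W onto the strength categories used in this section."""
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if w <= upper:
            return label
    return "almost perfect"


# Hypothetical scores from four readers on six cases (not study data).
ratings = [
    [1, 1, 3, 4, 2, 4],
    [1, 2, 3, 4, 2, 4],
    [1, 1, 3, 4, 1, 4],
    [1, 1, 4, 4, 2, 3],
]
w = kendalls_w(ratings)
print(f"W = {w:.2f} ({agreement_strength(w)})")
```

W equals 1 when all readers rank the cases identically and 0 when their rankings cancel out, so a single value summarizes concordance across all four readers.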
A secondary analysis stratified readers by experience level (≤10 years vs. >10 years) to assess whether reader experience moderated the effect of protocol on diagnostic performance. Diagnostic confidence scores were compared using the Kruskal–Wallis test with post hoc Dunn’s test, applying Benjamini–Hochberg correction for multiple comparisons to control the false discovery rate. Statistical significance was set at p < 0.05.
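The Benjamini–Hochberg step can be sketched as follows; the p-values in the example are hypothetical, and the Kruskal–Wallis and Dunn’s tests that would produce them are not shown.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pairwise p-values from post hoc comparisons (not study data).
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

Each raw p-value is scaled by the number of tests divided by its rank, and a comparison is declared significant when its adjusted value falls below 0.05, which controls the expected proportion of false discoveries rather than the chance of any single false positive.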
4. Discussion
In this study, we focused on demonstrating that DL-driven acquisitions can significantly reduce the acquisition time for shoulder MRI studies without a substantial compromise in diagnostic accuracy. Our findings provide evidence that moderate acceleration (2-fold) maintains diagnostic performance comparable to that of standard protocols, while more aggressive acceleration (4-fold) shows slightly reduced sensitivity for certain pathologies.
The perfect diagnostic performance for bone marrow edema detection across both accelerated protocols aligns with recent findings by Xie et al. [17], who reported comparable sensitivity (98.2%) and specificity (97.9%) between standard and DL-reconstructed sequences. This consistency across studies suggests that DL reconstruction effectively preserves the contrast characteristics necessary for bone marrow pathology evaluation.
Our findings of slightly superior performance with DL2 (AUC = 0.94) compared to DL4 (AUC = 0.90) provide important insights into the optimal acceleration levels for tendon evaluation. These results support Chang and Chow’s [18] emphasis on the “delicate balance between acceleration and fidelity”, particularly for subtle pathologies. This was especially evident in partial-thickness tears, where DL4 showed a minor degradation in sensitivity (96.8% vs. 99.5% for DL2), a finding that parallels Xie et al.’s observation of slightly reduced detection rates for partial-thickness supraspinatus tears in their accelerated protocols.
A novel aspect of this study is the analysis of reader experience’s impact on diagnostic performance, particularly with the DL4 protocol. While Xie et al. reported high inter-reader agreement (κ = 0.82) across their cohort of three readers [17], they did not stratify the results by experience level. Our observation of reduced diagnostic confidence with DL4 among less experienced readers suggests the need for targeted training when implementing higher acceleration protocols. This finding is particularly relevant in labral tear detection, where DL4 showed lower sensitivity (91.7%) compared to DL2 (100%), extending Chang and Chow’s observations regarding the challenges of accelerated protocols in detecting subtle labral pathologies [18].
The maintained high inter-reader agreement across the protocols (W = 0.92–0.98) supports DL reconstruction’s robustness, though the slight reduction in concordance with DL4 suggests that moderate acceleration (2×) might represent an optimal balance between time savings and diagnostic confidence. These findings align with Xie et al.’s conclusion that a “sweet spot” exists in acceleration factors where diagnostic quality is preserved while achieving meaningful time savings [17]. These findings have significant clinical implications, particularly regarding workflow optimization and resource utilization. With the DL2 protocol reducing acquisition times by 50% while maintaining diagnostic accuracy, healthcare centers could potentially increase patient throughput significantly while improving patient comfort and reducing motion artifacts. Chang and Chow specifically highlighted this potential for “democratizing access to MRI” through reduced scan times [18], a vision supported by our findings. To better understand the overall impact of reader experience on diagnostic performance across all evaluated pathologies, we analyzed the aggregated accuracy data for both DL protocols (Figure 3). This analysis revealed that while both protocols maintained high diagnostic accuracy, the performance gap between DL2 and DL4 was more pronounced among less experienced readers, suggesting that expertise may partially compensate for increased acceleration rates.
The consistency in diagnostic accuracy across reader experience levels suggests broad applicability across various clinical settings, from academic centers to community practices. As noted by Xie et al. [17], this robustness is crucial for the widespread implementation of AI-assisted imaging protocols. The potential for reduced scan times to expand MRI accessibility in emergency settings represents another significant advantage, particularly for acute shoulder trauma assessment.
4.1. Mechanisms Underlying Reduced Performance with Higher Acceleration
The slight reduction in diagnostic performance observed with the DL4 protocol, particularly for subtle pathologies such as partial-thickness rotator cuff tears and labral lesions, can be attributed to several factors. First, higher acceleration factors naturally result in greater k-space undersampling, reducing the amount of acquired raw data. While DL reconstruction attempts to compensate for this data loss, there are fundamental information-theoretic limits to what can be recovered [12,13,14,15].
Second, the DL reconstruction process at higher acceleration factors may introduce subtle blurring or smoothing effects that can obscure fine structural details that are critical for detecting partial tears. This phenomenon was particularly evident in partial-thickness supraspinatus tears near the footprint, as illustrated in Figure 7. The subtle hyperintensity representing a small articular-sided tear is clearly visible on standard images, slightly less conspicuous on DL2 images, and often not discernible on DL4 images.
Third, the effect of noise amplification at higher acceleration factors, although mitigated by DL reconstruction compared to traditional parallel imaging, may still impact the contrast-to-noise ratio in areas of subtle signal change. This particularly affects structures with inherently lower signals, such as the labrum.
4.2. Study Limitations and Future Directions
Several limitations should be considered when interpreting our findings. First, the single-center, single-vendor design substantially limits the generalizability of our results. Different MRI platforms employ distinct DL reconstruction algorithms, which may perform differently at similar acceleration factors. Multi-center, multi-vendor validation studies are essential before widespread clinical implementation.
Second, while methodologically sound, our reference standard based on consensus reading by two experienced radiologists lacks surgical correlation. Although consensus reading by expert radiologists is an established approach for defining imaging reference standards, it cannot account for pathologies that might be missed by standard MRI but detected during surgery. Future studies would benefit from surgical correlation when available, particularly for labral pathologies where MRI has known limitations.
Third, the relatively low prevalence of certain pathologies in our cohort (teres minor tears n = 1, labral lesions n = 9) limits our statistical power for these specific findings. The wide confidence intervals for sensitivity in detecting these lesions reflect this limitation. Larger cohort studies with more balanced pathology distributions or targeted recruitment of specific pathologies would strengthen future investigations.
Fourth, our study focused on a limited set of shoulder pathologies (BME, rotator cuff tears, and labral tears). While these represent common and clinically significant findings, shoulder MRI evaluates numerous other structures and pathologies that warrant investigation.
Fifth, our assessment of reader experience was binary (≤10 years vs. >10 years) and included only four readers in total. A more granular assessment of experience levels with more readers would provide better insights into the learning curve associated with interpreting DL-accelerated images.
Finally, we did not evaluate the long-term clinical impact of implementing DL-accelerated protocols. Studies assessing patient outcomes, workflow efficiency, and cost-effectiveness are needed to fully understand the clinical value of these techniques.
Future research should focus on multi-center, multi-vendor validation studies to establish the generalizability of our findings across different MRI platforms and DL reconstruction algorithms. Also, the evaluation of a broader range of shoulder pathologies, including ligamentous, capsular, and cartilaginous injuries, should be considered, possibly including surgical correlation to better establish the true diagnostic performance of accelerated protocols.
5. Conclusions
DL-accelerated shoulder MRI protocols demonstrate high diagnostic accuracy, with DL2 showing performance nearly identical to that of the standard protocol across all evaluated parameters. While DL4 maintains acceptable diagnostic accuracy, it shows a slight degradation in sensitivity for subtle pathologies, particularly among less experienced readers. These findings suggest that the DL2 protocol represents an optimal balance between acquisition time reduction and diagnostic confidence, potentially improving workflow efficiency without compromising diagnostic quality.
Based on our findings, we recommend the following for clinical implementation: (1) The DL2 protocol can be safely implemented for routine shoulder MRI examinations, offering a 50% reduction in acquisition time without compromising diagnostic accuracy. (2) The DL4 protocol should be used with caution, particularly when evaluating patients with suspected subtle pathologies such as partial-thickness rotator cuff tears or labral lesions. (3) When implementing accelerated protocols, institutions should consider providing targeted training for radiologists, especially those with less experience in interpreting DL-reconstructed images.
In conclusion, DL-accelerated shoulder MRI protocols, particularly the DL2 protocol with 2-fold acceleration, offer a promising approach to increase scanner efficiency and patient throughput without compromising diagnostic quality. While more aggressive acceleration with DL4 maintains acceptable performance, the optimal clinical implementation will likely involve tailoring the acceleration factor to the specific clinical question and suspected pathology.