Article

Inadequate Annotation and Its Impact on Pelvic Tilt Measurement in Clinical Practice

1 Sydney Musculoskeletal Health and The Kolling Institute, Northern Clinical School, Faculty of Medicine and Health and the Northern Sydney Local Health District, Sydney, NSW 2006, Australia
2 Department of Orthopaedics and Traumatic Surgery, Royal North Shore Hospital, St. Leonards, NSW 2065, Australia
3 Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
* Author to whom correspondence should be addressed.
J. Clin. Med. 2024, 13(5), 1394; https://doi.org/10.3390/jcm13051394
Submission received: 16 January 2024 / Revised: 16 February 2024 / Accepted: 21 February 2024 / Published: 28 February 2024

Abstract

Background: Accurate pre-surgical templating of the pelvic tilt (PT) angle is essential for hip and spine surgery, yet the reliability of PT annotations is often compromised by human error, inherent subjectivity, and variations in radiographic quality. This study aims to identify the challenges that lead to inadequate annotations at the landmark level and to evaluate their impact on PT. Methods: We retrospectively collected 115 consecutive sagittal radiographs and measured PT using two definitions: one based on the anterior pelvic plane and one based on a line connecting the centre of the femoral head to the midpoint of the sacral plate. Five annotators performed the measurements, followed by a secondary review to assess the adequacy of the annotations across all annotators. Results: Over 60% of images had at least one landmark considered inadequate by the majority of the reviewers, with poor image quality, outliers, and unrecognized anomalies being the primary causes. Such inadequacies led to discrepancies in the PT measurements ranging from −2° to 2°. Conclusion: Landmarks annotated from clear anatomical references were more reliable than those that were estimated. This study also underscores the prevalence of suboptimal annotations in PT measurements, a problem that traditional statistical analysis does not capture and that could result in significant deviations in individual cases, potentially affecting clinical outcomes.

1. Introduction

The practice of patient-specific templating prior to surgical interventions has become a mainstay for many surgeons [1]. This process often necessitates an examination of radiographic landmark annotations [2,3]. Pelvic tilt (PT) is a routinely measured radiographic parameter in hip and spine surgeries [4,5] for assessing spinopelvic alignment and implant navigation [6,7].
In current clinical practice, a single annotator manually evaluates these anatomical landmarks and calculates the corresponding parameters in a high-pressure clinical environment [8]. The susceptibility of these measurements to human error is a well-acknowledged concern, yet the clinical implications of such errors have not been fully explored [9]. The inherent subjectivity of these assessments, coupled with ambiguous landmark definitions, variations in patient anatomy, and inconsistencies in radiographic quality, contributes to this challenge [10]. Existing studies on the reliability of radiographic annotations focus on parameter-level analysis, comparing the lengths or angles derived from paired landmarks through statistical methods such as mean absolute error, correlation, and reproducibility analyses [11,12,13,14]. Researchers then establish thresholds to determine whether a measurement dataset is reliable [15,16]. However, this approach may overlook instances where an inadequate landmark annotation does not markedly affect the overall parameter, obscuring landmark-specific inadequacies that could compromise patient care. To our knowledge, no analysis of the accuracy of radiographic landmark annotation at a point-wise level has been reported in the literature.
In spinopelvic radiographic analysis, the literature reports mixed findings on the reliability of spinal sagittal parameter measurements among experienced surgeons. Some studies have highlighted poor reliability due to inadequate visualization of anatomical landmarks [10,17], while others report high reliability [16]. Despite this academic debate, clear guidelines for improving the accuracy of landmark annotation in clinical settings remain elusive. Research has identified factors such as pathological changes and obesity that obscure certain landmarks [17,18], and different anatomical regions within a single image may exhibit different error patterns [18,19]. Yet the impact of suboptimal annotations at the level of individual landmarks has not been thoroughly investigated [20]. To address these challenges, some researchers have proposed alternative parameters and image augmentation techniques to improve measurement precision [19,21]. However, work specifically evaluating the accuracy of PT landmark annotations remains notably sparse.
In the rapidly advancing field of artificial intelligence (AI), particularly deep learning, popular semantic segmentation algorithms often output landmark predictions as heatmaps that reflect the model's confidence at each point [22,23]. Despite the widespread adoption of such models, the point-wise accuracy of the human-generated landmark annotations used to train them has received little analysis. As a result, label noise is inadvertently incorporated into AI models without being understood. The current literature cites “precise” PT measurements by comparing AI-derived results with “gold standard” datasets produced through manual image annotation [24]. This approach, however, neglects the inherent uncertainties and potential inadequacies of manual annotation. Deviations from the gold standard may therefore be misattributed to AI inaccuracies, disregarding the possibility that the “gold standard” itself contains errors stemming from human annotation [25]. Given these gaps in the current literature, there is a pressing need to understand the inadequacies inherent in human landmark annotation at a point-wise level.
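To make the heatmap idea concrete, the sketch below builds a toy Gaussian heatmap and reads the landmark back as the highest-scoring pixel, treating the peak height as a rough confidence proxy. It is a minimal illustration under stated assumptions, not the method of any model cited here: the Gaussian target, its width `sigma`, and the helper names `gaussian_heatmap` and `decode_heatmap` are hypothetical. The relevant point is that such a training target is centred on a human annotation, so any error in that annotation becomes the model's notion of ground truth.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Toy training target: an isotropic Gaussian peaked at pixel (cx, cy).

    In practice (cx, cy) would come from a human annotation, so an
    annotation error moves the peak the model learns to reproduce.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_heatmap(heatmap):
    """Return (x, y, peak) for the highest-scoring pixel of a 2D heatmap.

    Using the peak value as a confidence proxy is an assumption; it is only
    meaningful if the model's scores are reasonably calibrated.
    """
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


heatmap = gaussian_heatmap(32, 24, cx=12, cy=7)
print(decode_heatmap(heatmap))  # (12, 7, 1.0)
```

Argmax decoding is the simplest choice; it recovers the annotated pixel exactly, which is precisely why annotation noise passes straight through to the model.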
This study aims to bridge this gap by analysing the accuracy of PT landmark annotations on a landmark-wise basis. To this end, we recruited multiple annotators to measure PT parameters in a simulated clinical setting and then performed secondary reviews in which the annotations from different annotators were evaluated collectively. This approach aims to identify and rectify instances of inadequate annotation, provide insights into the accuracy of PT landmark annotations, elucidate the specific factors contributing to annotation inadequacies, and determine their impact on PT parameters.

2. Materials and Methods

2.1. Study Design

This retrospective study collected 126 consecutive sagittal radiographs acquired with the EOS system (EOS Imaging, Paris, France) [26] at an academic surgical clinic. Patients were enrolled between November 2020 and July 2021. Eleven radiographs were subsequently excluded: three for not meeting clinical image quality standards, seven because bilateral implants obscured landmarks, and one because of a disease (developmental dysplasia) that could influence the measurements. Thus, a total of 115 lateral pelvic radiographs from 93 consecutive patients (62 males and 31 females, aged 64.6 ± 11.4 years) were included. The de-identified patient data were collected from a research database approved by the St. Vincent’s Hospital Human Research Ethics Committee (2019/ETH09656) in Sydney, and all participants provided informed consent for the use of their anonymized data for research purposes.

2.2. Landmark Annotations

Five annotators took part in the landmark annotation process: two orthopaedic engineers, two orthopaedic fellows, and one surgeon. Each annotator had completed at least 100 pre-surgical templating cases, ensuring their competency in measuring the corresponding parameters. All annotations were performed using custom code built on the Image Processing Toolbox in MATLAB (2022b, MathWorks, Natick, MA, USA). Two pelvic tilt definitions were used (Figure 1) [7]. The anatomical definition (PTa) is the angle between the gravity line and the line connecting the centre of the anterosuperior iliac spines (ASISs) and the pubic tubercles. The mechanical definition (PTm) is the angle between the gravity line and the line connecting the centre of the femoral heads and the midpoint of the sacral plate [7].
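Both definitions reduce to the angle between the vertical gravity line and a line through two landmarks. The sketch below computes that angle from landmark coordinates; it is not the authors' MATLAB code, and the function names are hypothetical. It assumes a coordinate frame with x pointing anteriorly and y pointing superiorly (image pixel rows usually increase downwards, so axes may need flipping first). Sign conventions vary between studies and are assumptions here: PTm is taken as positive when the sacral plate midpoint lies posterior to the femoral head centre, and PTa as positive when the ASIS centre lies anterior to the pubic symphysis.

```python
import math


def angle_from_vertical(lower, upper):
    """Signed angle in degrees between the vertical gravity line and the
    segment from `lower` to `upper`.

    Assumed frame: x increases anteriorly, y increases superiorly.
    The result is positive when `upper` lies anterior to `lower`.
    """
    dx = upper[0] - lower[0]
    dy = upper[1] - lower[1]
    return math.degrees(math.atan2(dx, dy))


def pt_anatomical(asis_centre, pubic_symphysis):
    """PTa: tilt of the line from the pubic symphysis to the ASIS centre.
    Assumed sign: positive when the ASIS centre is anterior to the pubis."""
    return angle_from_vertical(pubic_symphysis, asis_centre)


def pt_mechanical(femoral_head_centre, sacral_midpoint):
    """PTm: tilt of the line from the femoral head centre to the sacral
    plate midpoint. Assumed sign: positive when the sacral midpoint lies
    posterior to the femoral head centre."""
    return -angle_from_vertical(femoral_head_centre, sacral_midpoint)


# Hypothetical coordinates in millimetres.
print(round(pt_mechanical((0.0, 0.0), (-10.0, 100.0)), 2))  # 5.71
print(round(pt_anatomical((0.0, 100.0), (0.0, 0.0)), 2))    # 0.0
```

Using `atan2` rather than `atan` keeps the sign correct in every quadrant and avoids dividing by a zero vertical offset.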
Mirroring standard clinical practice with digital radiographs from the picture-archiving and communication system (PACS), annotators enlarged each image on screen until the anatomical region of interest was sufficiently discernible to place all landmark points with confidence. The PTa annotations consisted of one point for the centre of the ASISs and another for the pubic symphysis. The PTm annotations were made using two methods: (1) the calculation method, with three points on each femoral head contour (six points in total) and two points for the anterior and posterior ends of the sacral endplate; and (2) the estimation method, with one point for the centre of the two femoral heads and one point for the midpoint of the sacral endplate. These coordinates were then used to compute the PT parameters. Where feasible, PTm values obtained semi-automatically by radiographic technicians using the sterEOS® software (EOS Imaging, Paris, France) were collected for comparison with our manual annotations [26].
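The calculation method can be sketched as follows: each femoral head centre is recovered as the centre of the circle passing through its three contour points, the two head centres are averaged, and the sacral plate midpoint is the midpoint of its two end points. This is a minimal sketch under stated assumptions: fitting an exact circle through three points and averaging the two head centres are our reading of the method, and all function names are hypothetical. The resulting two points are the inputs to the PTm angle.

```python
def circle_centre(p1, p2, p3):
    """Centre of the circle through three points (the circumcentre).

    Raises ValueError if the points are collinear (to within floating-point
    tolerance), in which case no unique circle exists.
    """
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("contour points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def ptm_landmarks_by_calculation(head1, head2, sacral_anterior, sacral_posterior):
    """Return (femoral head centre, sacral plate midpoint) from the six
    femoral contour points and the two sacral endplate ends.

    Averaging the two head centres is an assumption about how the
    'centre of the femoral heads' is formed.
    """
    hip_centre = midpoint(circle_centre(*head1), circle_centre(*head2))
    sacral_mid = midpoint(sacral_anterior, sacral_posterior)
    return hip_centre, sacral_mid


# Hypothetical contour points (mm): heads centred at (5, 5) and (7, 5), radius 5.
head1 = [(0.0, 5.0), (5.0, 0.0), (10.0, 5.0)]
head2 = [(2.0, 5.0), (7.0, 0.0), (12.0, 5.0)]
print(ptm_landmarks_by_calculation(head1, head2, (20.0, 100.0), (0.0, 90.0)))
# ((6.0, 5.0), (10.0, 95.0))
```

The three contour points should be well spread around each head: nearly collinear points make the circle centre numerically unstable, which is one reason a calculated centre depends on clearly visible contours.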
Each image thus received five independent annotations, each displayed in a different colour (one per annotator) and shape (circle for the calculation method, triangle for the estimation method; Figure 2). The fully annotated images were then subjected to an error analysis.

2.3. Error Analysis

Four weeks after the annotation process was completed, three assessors (one surgeon, one orthopaedic fellow, and one orthopaedic engineer) evaluated all the images containing the five distinct annotations. The assessment was blinded to the identity of the annotators, as exemplified in Figure 2. By comparing the annotations against one another, the assessors could examine the distribution of the landmarks and gain collective insight into the underlying causes of inadequate annotation. They rated each landmark as either (1) all five annotations meeting the clinical standard for that landmark (all satisfactory); or (2) specific annotations failing to meet the clinical standard because of low image quality in the landmark region, the presence of anatomical anomalies, placement away from the intended target while the target was identifiable (outliers), or other factors not covered by these categories. The assessments were recorded on a custom survey form in REDCap (Vanderbilt University, Nashville, TN, USA).
Upon the initial data collection and review, we adopted a “majority rules” approach for data categorization, where opinions from two or more assessors were regarded as the ground truth. The inadequacies in landmark annotation were analysed in the following manner:
(1) Landmark-wise inadequacy: instances where two or more assessors identified at least one out of five annotations of a landmark as inadequate.
(2) Reason-wise inadequacy: cases where two or more assessors highlighted the same reason for a landmark’s inadequacy.
(3) Parameter-wise impact: if two or more assessors identified the same annotation (both colour and shape) as inadequate, the corresponding annotation and its PT parameter were subsequently excluded from the “adequate” dataset group. Comparative statistics between the “full” dataset group and the “adequate” dataset group were then conducted.
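The three majority-rules categories can be expressed compactly. In the hypothetical data layout below, each assessor's rating of one landmark is a dictionary mapping an annotation, identified by its colour and shape, to the reason it was judged inadequate; an empty dictionary means that assessor rated all five annotations satisfactory. The threshold of two out of three assessors follows the text; the data layout, the function name, the reason labels, and the example ratings are illustrative assumptions.

```python
from collections import Counter

MAJORITY = 2  # "two or more" of the three assessors


def categorise_landmark(ratings):
    """Apply the majority-rules analysis to one landmark in one image.

    `ratings` holds one dict per assessor, mapping an annotation id such as
    ("black", "circle") to the reason it was judged inadequate. An empty
    dict means that assessor rated all five annotations satisfactory.
    """
    flagged_by = sum(1 for r in ratings if r)
    # Count each reason at most once per assessor.
    reason_votes = Counter(reason for r in ratings for reason in set(r.values()))
    annotation_votes = Counter(ann for r in ratings for ann in r)
    return {
        # (1) landmark-wise inadequacy
        "inadequate": flagged_by >= MAJORITY,
        # (2) reason-wise inadequacy
        "reasons": sorted(x for x, n in reason_votes.items() if n >= MAJORITY),
        # (3) annotations dropped from the "adequate" dataset group
        "excluded": sorted(a for a, n in annotation_votes.items() if n >= MAJORITY),
    }


# Hypothetical ratings for an ASIS landmark from three assessors.
ratings = [
    {("black", "circle"): "outlier", ("purple", "circle"): "outlier"},
    {("black", "circle"): "outlier"},
    {},
]
print(categorise_landmark(ratings))
# {'inadequate': True, 'reasons': ['outlier'], 'excluded': [('black', 'circle')]}
```

In this example the purple annotation is flagged by only one assessor, so it stays in the "adequate" group even though the landmark as a whole counts as inadequate.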
Figure 2 serves as an illustrative case. The annotations for the centre of the femoral heads and the midpoint of the sacral plate did not show any discernible outliers. Although they did not overlap perfectly, and a few annotations may have neglected the contour of the second femoral head on the right-hand side because the two contours largely overlapped, these annotations were clinically deemed adequate. In contrast, the annotations for the centre of the ASISs revealed notable discrepancies. The black dot clearly overlooked the ASIS on the right side, and the purple dot was placed too imprecisely; both were therefore classified as inadequate outliers for this study. These inadequate annotations, which most surgeons and radiologists would recognize, had minimal impact on the PTa parameter, which explains why traditional statistical methods failed to detect them. The blue dot, positioned slightly above the clustered red and green dots, was considered adequate because it lay within the centre of the two anterosuperior curves of the iliac spines, a region that allows some subjective interpretation. In the pubic symphysis region, only the red dot risked being an outlier, as it was located at the anterosuperior edge of the pubic tubercles, whereas the other four dots were aligned at the anterior end of the curvature. Definitions of this landmark may vary across studies [27], and these variations are generally not considered to affect clinical judgment. Consequently, all annotations in the pubic symphysis region were judged adequate.
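One geometric reason why a clearly misplaced landmark can leave PTa almost unchanged is that the angle responds only to the component of the error perpendicular to the landmark line. The sketch below uses a hypothetical 100 mm ASIS-to-pubic-symphysis segment and a hypothetical 10 mm landmark error, with x anterior and y superior; the numbers are illustrative, not measurements from this study, and `tilt_deg` is a hypothetical helper.

```python
import math


def tilt_deg(lower, upper):
    """Angle in degrees between the vertical and the lower->upper segment.
    Assumed frame: x anterior, y superior."""
    return math.degrees(math.atan2(upper[0] - lower[0], upper[1] - lower[1]))


pubis = (0.0, 0.0)
asis = (0.0, 100.0)  # hypothetical 100 mm segment, vertical (tilt 0 degrees)
error_mm = 10.0      # hypothetical landmark error

baseline = tilt_deg(pubis, asis)
along = tilt_deg(pubis, (asis[0], asis[1] + error_mm))    # error along the line
across = tilt_deg(pubis, (asis[0] + error_mm, asis[1]))   # error perpendicular to it
print(f"baseline {baseline:.2f}, along {along - baseline:.2f}, across {across - baseline:.2f}")
# baseline 0.00, along 0.00, across 5.71
```

For small errors the change in angle is roughly the perpendicular error divided by the segment length (in radians), so an error along the line, or on a long segment, can be largely invisible in the parameter even when the landmark itself is obviously wrong.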

2.4. Statistical Analysis

The PT parameters of both the “full” and the “adequate” dataset groups were evaluated. Case-wise average parameters were compared using correlation analysis, the mean absolute difference, the maximum absolute difference, 95% confidence intervals (CIs), and paired t-tests. Pearson’s correlation coefficient was interpreted as “poor” for r < 0.3, “fair” for 0.3 < r < 0.5, “moderate” for 0.5 < r < 0.6, “moderately strong” for 0.6 < r < 0.8, and “very strong” for r > 0.8 [28]. The reliability of the PTm measurements was assessed with the intraclass correlation coefficient (ICC) by comparing our measurements with the values reported by the radiographic technician [29]. An ICC above 0.9 was interpreted as excellent agreement [29]. All statistical analyses were conducted using SPSS (IBM, Tulsa, OK, USA).

3. Results

Our landmark-wise analysis indicated that in 61.74% of cases at least one landmark was not rated “all adequate” (Table 1); that is, for at least one landmark per image, the majority of assessors identified an annotation as inadequate for various reasons. The ASIS and the femoral head centre (determined using the estimation method) each contained inadequate annotations in over 30% of cases, as judged by two or more assessors. Annotations based on clearly identified anatomical contours and calculated landmark positions were more accurate, with fewer instances of inadequacy, than those relying on estimated landmark locations. Specifically, the femoral head centre had a 7.83% inadequacy rate when calculated from identified contours versus 30.43% when estimated, and the midpoint of the sacral plate had an 11.30% inadequacy rate when calculated versus 13.04% when estimated.
Our reason-wise analysis revealed that landmarks in different anatomical regions displayed distinct error tendencies (Table 2). ASIS annotations were frequently inadequate because of compromised image quality and outlying placement. When the femoral head centre was estimated, a higher incidence of outlier annotations became apparent once the annotations from multiple annotators were viewed together. Additionally, anomalies of the sacral plate were occasionally overlooked (in 5–7% of cases). The “Other” reason, cited in two cases, was excessive axial rotation, which rendered the radiograph insufficiently sagittal for assessment. These findings are discussed further in Section 4.
A subsequent analysis revealed that inadequate annotations typically resulted in measurement discrepancies ranging from −2° to 2° (95% CIs, Table 3). The mean absolute difference between the “full” and “adequate” dataset groups was minimal, ranging from only 0.35° to 0.52°. Consistent with this, all correlation coefficients exceeded 0.9, and the paired t-tests revealed no statistically significant differences (all p-values > 0.05). Notably, however, the maximum difference observed in our cohort reached 13.47°.
Of the 115 images, the radiographic technician reported PTm measurements for 113 using the sterEOS® software. The reliability analysis indicated excellent agreement between our manual methods and the software’s semi-automated measurements, with ICC values ranging from 0.91 to 0.94.
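The text does not specify which ICC form was used for the manual-versus-software comparison. As one common choice for comparing two measurement methods, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss) from a table with one row per image and one column per method. It is an illustrative plain-Python sketch with a hypothetical function name and data, not the SPSS procedure used in the study.

```python
def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single
    measurement (Shrout and Fleiss). `table` has one row per image and one
    column per method, e.g. [manual_ptm, software_ptm]."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with >= 2 rows and columns")
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical PTm values (degrees): the second method reads 1 degree higher.
print(round(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]), 3))  # 0.667
```

Absolute agreement penalizes a systematic offset between methods: in the example, the second method reads exactly 1° higher for every image, which would leave a consistency-type ICC at 1 but lowers ICC(2,1) to 2/3.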

4. Discussion

Our study sheds light on potential inadequacies in current radiographic landmark annotation practice, specifically for pelvic tilt parameters in hip and spine surgery [2]. A key finding was that, in a dataset whose measurement reliability would be rated excellent by traditional criteria, more than 60% of cases still contained at least one landmark annotation rated inadequate by two or more assessors. This result underscores the need to review and potentially revise current annotation practices [21,30]. The ASIS and the femoral head centre (when estimated) were inadequately annotated in over 30% of cases, which underlines the prevalence of substandard radiographic landmark annotation in these regions [20,31]. The use of estimation to determine landmark locations appeared to be a major contributor to this inadequacy. In contrast, landmark positions were more accurate when anatomical contours were identified and the landmark locations were calculated from them.
Our reason-wise analysis helped to unravel the causes of these inadequacies. When estimating the centre of the femoral heads and annotating the centre of the ASISs, annotations were frequently deemed outliers because the contralateral side of the bone was overlooked [31,32]. This often stemmed from the assumption that sagittal radiographs are strictly “sagittal”, which led annotators to treat one side of the bone as overlapped by the other [33]. The assessors and annotators observed that poor image quality and outlier annotations in the ASIS region were primarily due to patients’ high BMI, which obscured the abdominal region and complicated annotation [18,31]. The literature also suggests that pathological changes of the femoral heads likely contribute to the low accuracy of femoral head centre identification by estimation [17]. Although our study did not demonstrate this effect, future studies could explore whether specific pathologies are more prone to inaccuracies in femoral head centre identification. The overlooked anomalies of the sacral plate further underscore the need for thorough anatomical knowledge in the annotation process [34]. Specifically, lumbosacral transitional anomalies and variations in the fusion of L5 and S1 are prevalent in the general population, and sacral slope (SS) is consequently often measured inaccurately by surgeons [17]. The literature estimates the prevalence of lumbosacral transitional vertebrae at 4.0–35.9%, with a mean of 12.3%, a value close to our findings [19]. Given how common anatomical variations are in this region, annotators must take them into account when identifying sacral landmarks. Moreover, factors such as the presence of hardware, anatomical deformities, and calibration issues with imaging equipment can adversely affect measurements [20]; however, such cases were excluded from our cohort in line with our clinical protocols.
Traditional statistical analysis of the data showed that inadequate annotations typically led to PT discrepancies of −2° to 2°. The mean absolute difference was small (0.35° to 0.52°), the correlation was very strong (above 0.9), and no statistically significant differences were observed (all p-values > 0.05). These findings indicate that the measurement dataset meets clinical standards even when it includes annotations deemed inadequate on secondary review. Inadequate annotations therefore appear to have had minimal impact on the overall parameter analysis of our dataset, suggesting that suboptimal annotation practices may have limited effects on overall clinical decisions [30,35]. Although this result is supported by other studies [31,34,36], extreme errors could still influence clinical outcomes, especially in surgeries requiring high-precision measurements [2,32]. The reasons behind these inadequacies should therefore be flagged to improve the quality of both education and data, particularly because such errors could inadvertently propagate into the training of machine learning algorithms. Notably, although Dimar et al. identified inconsistencies between surgeons’ manual measurements of sagittal parameters, they found that a computer-aided program yielded consistent, reliable measurements with a high degree of correlation. When comparing the surgeons’ measurements with the computer-aided ones, they found poor correlation for most parameters [10,37]. Our findings further highlight irregularities in manual measurement and strengthen the rationale for implementing machine-based measurement in the future.
The excellent agreement between our manual PTm annotations and those generated by the sterEOS® software across 113 images supports the validity of our manual annotation technique relative to automated software measurement. The sterEOS® software, which combines semi-automated statistical shape recognition, a divergence correction algorithm, and 3D parameter calibration with the option of manual adjustment by radiographic technicians, showed no significant statistical differences from our radiographic annotation method based on the MATLAB Image Processing Toolbox. This finding highlights the comparability of sophisticated automated systems and meticulous manual annotation, emphasizing the precision and reliability of human expertise in radiographic measurement and analysis.
Despite these findings, several limitations of this study must be noted. First, we did not consider differences in clinical experience among annotators and assessors or their personal preferences during radiographic analysis. These differences could have affected the annotation process and may be worth investigating in future studies. Second, the design of our assessment questionnaire remained subjective, and the decision, made after consultation with a senior surgeon, to follow a “majority rule” rather than an “authority rule” may not be applicable to other studies. Third, we did not further subcategorize the reasons behind inadequate annotations, such as poor image quality due to patient movement, obesity, or low-dose X-rays, or outlying annotations due to misidentifying the contralateral bone, compromised image quality, or carelessness. Although distinguishing these subcategories could offer additional scientific insight, the subjective nature of image assessment and the potential for overlapping causes prevent the reliable isolation of each factor. Consequently, we did not attempt to differentiate the inadequate annotations quantitatively by these underlying reasons. Fourth, the radiographs in this study were acquired with a fan-beam radiation source. Although this newer technology delivers lower radiation doses and has been shown to be equivalent to traditional radiography [38,39], hence its routine use in our clinical practice, replicating this study with traditional cone-beam radiography would help confirm the applicability and generalizability of our findings to conventional radiographic techniques. These factors may warrant further consideration in future research.

5. Conclusions

This study highlights the high incidence of suboptimal radiographic annotations in measuring pelvic tilt parameters and shows that such inadequacies persist even in datasets with excellent measurement reliability and minimal clinical impact at the population level. Importantly, in individual cases the measurement discrepancy can reach 13°, a deviation large enough to potentially alter clinical outcomes [40]. The prevalence of inadequate annotations therefore cannot be ignored, and our results point to important avenues for improving clinical practice in this area.
To improve radiographic annotation practice in PT measurement, high-quality radiographs are fundamental. Beyond this, several targeted strategies can help:
(1) Rigorous identification of anatomical landmarks: Practitioners should precisely identify anatomical contours and calculate landmark locations from them. Rather than approximating landmark locations from visual estimates, they should use well-defined anatomical markers within small, clear regions as the basis for calculation. This approach grounds landmark identification in tangible anatomical features and minimises subjective interpretation.
(2) Attention to bilateral anatomy: Special attention should be given to examining both sides of the anatomy, particularly the contralateral bones. This helps annotators recognize critical features that would otherwise be overlooked and that inform the precise location of landmarks, and it guards against assuming that the contralateral anatomy is fully overlapped.
(3) Enhanced recognition of anatomical anomalies: Comprehensive knowledge of anomalies that can alter the appearance of anatomical structures is crucial, because such anomalies may affect the precision of landmark identification and, consequently, the accuracy of PT measurements.
By adopting these practices, clinicians and radiographers can improve the accuracy of PT measurement and thereby support clinical decision making in spinopelvic surgery. Our analysis also contributes to orthopaedic education by identifying annotation inadequacies that traditional statistics cannot reveal [1]. Furthermore, these improvements are not limited to clinical applications; they also play a vital role in advancing medical imaging by providing high-quality data for training automated deep learning models for landmark localization. This synergy between manual and automated processes underscores the importance of precise, careful radiographic annotation in medical imaging and orthopaedic care more broadly.
Future research should investigate the influence of annotator experience on the precision of landmark annotation, develop a more objective protocol for establishing gold-standard annotations, and examine the underlying causes of inadequate annotation in more depth, including the determinants of outliers and poor image quality and their effects on the accuracy of radiographic parameters.

Author Contributions

Y.C.: study conception, data acquisition, study design, data analysis, material preparation, visualization, software, manuscript draft, and manuscript revision; V.M.: study conception, data collection, data analysis, material preparation, and manuscript revision; A.M.B.: data acquisition, data analysis, technical support, and manuscript revision; B.R.: material preparation and manuscript revision; W.L.W.: study concept, study design, data acquisition, manuscript revision, supervision, and resources. All authors have read and agreed to the published version of the manuscript.

Funding

This publication received sponsorship from the Institute of Bone and Joint Research, University of Sydney, St. Leonards, Australia, for the article processing charges.

Institutional Review Board Statement

The de-identified patient data were collected from a research database that was ethically approved by the St. Vincent’s Hospital Human Research Ethics Committee (2019/ETH09656) in Sydney, Australia.

Informed Consent Statement

All the participants provided informed consent for using their anonymized data for research purposes.

Data Availability Statement

Measurement data are freely available under the CC0 license. De-identified imaging data and parameter data are deposited at Figshare for non-identifiable research purposes only (https://doi.org/10.6084/m9.figshare.23938398, accessed in August 2023). The transfer, storage, and use of radiographic data must comply with our ethics approval (2019/ETH09656).

Acknowledgments

We thank John Farey for professional consultations and landmark annotations, and Lynette McDonald from the Royal North Shore Hospital, St. Leonards, NSW, Australia, for clinical support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chai, Y.; Li, W.R.; Simic, R.; Smith, P.N.; Valter, K.; Limaye, A. An Easy to Use Workflow of 3D Medical Reconstruction for Preoperative Planning and Surgical Education. Ann. Surg. Educ. 2021, 2, 29–32.
2. Houston, W. The analysis of errors in orthodontic measurements. Am. J. Orthod. 1983, 83, 382–390.
3. Chai, Y.; Boudali, A.; Farey, J.; Walter, W. The “Good Doctor” Performance of Radiological Annotation: Measuring Pelvic Tilt as an Example. In Orthopaedic Proceedings; Bone & Joint: Southington, CT, USA, 2023; p. 114.
4. Vrtovec, T.; Janssen, M.M.; Likar, B.; Castelein, R.M.; Viergever, M.A.; Pernuš, F. A review of methods for evaluating the quantitative parameters of sagittal pelvic alignment. Spine J. 2012, 12, 433–446.
5. Chai, Y.; Boudali, A.M.; Walter, W.L. Correlations Analysis of Different Pelvic Tilt Definitions: A Preliminary Study. HSS J. 2023, 19, 187–192.
6. Lembeck, B.; Mueller, O.; Reize, P.; Wuelker, N. Pelvic tilt makes acetabular cup navigation inaccurate. Acta Orthop. 2005, 76, 517–523.
7. Legaye, J.; Duval-Beaupere, G.; Hecquet, J.; Marty, C. Pelvic incidence: A fundamental pelvic parameter for three-dimensional regulation of spinal sagittal curves. Eur. Spine J. 1998, 7, 99–103.
8. Di Martino, A.; Rossomando, V.; Brunello, M.; D’Agostino, C.; Pederiva, D.; Frugiuele, J.; Pilla, F.; Faldini, C. How to perform correct templating in total hip replacement. Musculoskelet. Surg. 2023, 107, 19–28.
9. Wilson, J.; Eardley, W.; Odak, S.; Jennings, A. To what degree is digital imaging reliable? Validation of femoral neck shaft angle measurement in the era of picture archiving and communication systems. Br. J. Radiol. 2011, 84, 375–379.
10. Dimar, J.R.; Carreon, L.Y.; Labelle, H.; Djurasovic, M.; Weidenbaum, M.; Brown, C.; Roussouly, P. Intra- and inter-observer reliability of determining radiographic sagittal parameters of the spine and pelvis using a manual and a computer-assisted methods. Eur. Spine J. 2008, 17, 1373–1379.
11. Mast, N.H.; Impellizzeri, F.; Keller, S.; Leunig, M. Reliability and agreement of measures used in radiographic evaluation of the adult hip. Clin. Orthop. Relat. Res. 2011, 469, 188–199.
12. Kyrölä, K.K.; Salme, J.; Tuija, J.; Tero, I.; Eero, K.; Arja, H. Intra- and interrater reliability of sagittal spinopelvic parameters on full-spine radiographs in adults with symptomatic spinal disorders. Neurospine 2018, 15, 175.
13. Berthonnaud, E.; Labelle, H.; Roussouly, P.; Grimard, G.; Vaz, G.; Dimnet, J. A variability study of computerized sagittal spinopelvic radiologic measurements of trunk balance. Clin. Spine Surg. 2005, 18, 66–71.
14. Lin, A.; Manral, N.; McElhinney, P.; Killekar, A.; Matsumoto, H.; Kwiecinski, J.; Pieszko, K.; Razipour, A.; Grodecki, K.; Park, C. Deep learning-enabled coronary CT angiography for plaque and stenosis quantification and cardiac risk prediction: An international multicentre study. Lancet Digit. Health 2022, 4, e256–e265.
15. Jang, J.-S.; Kim, J.I.; Ku, B.; Lee, J.-H. Reliability Analysis of Vertebral Landmark Labelling on Lumbar Spine X-ray Images. Diagnostics 2023, 13, 1411.
16. Fleiderman Valenzuela, J.G.; Cirillo Totera, J.I.; Turkieltaub, D.H.; Echaurren, C.V.; Álvarez Lemos, F.L.; Arriagada Ramos, F.I. Spino-pelvic radiological parameters: Comparison of measurements obtained by radiologists using the traditional method versus spine surgeons using a semi-automated software (Surgimap). Acta Radiol. Open 2023, 12, 20584601231177404.
17. Khalsa, A.S.; Mundis Jr, G.M.; Yagi, M.; Fessler, R.G.; Bess, S.; Hosogane, N.; Park, P.; Than, K.D.; Daniels, A.; Iorio, J. Variability in assessing spinopelvic parameters with lumbosacral transitional vertebrae: Inter- and intraobserver reliability among spine surgeons. Spine 2018, 43, 813–816.
18. Ouchida, J.; Nakashima, H.; Kanemura, T.; Ito, K.; Tsushima, M.; Machino, M.; Ito, S.; Segi, N.; Nagatani, Y.; Kagami, Y. Impact of Obesity, Osteopenia, and Scoliosis on Interobserver Reliability of Measures of the Spinopelvic Sagittal Radiographic Parameters. Spine Surg. Relat. Res. 2023, 7, 519–525.
19. Jancuska, J.M.; Spivak, J.M.; Bendo, J.A. A review of symptomatic lumbosacral transitional vertebrae: Bertolotti’s syndrome. Int. J. Spine Surg. 2015, 9, 42.
20. Wang, W.; Wu, M.; Liu, Z.; Xu, L.; Zhu, F.; Zhu, Z.; Weng, W.; Qiu, Y. Sacrum pubic incidence and sacrum pubic posterior angle: Two morphologic radiological parameters in assessing pelvic sagittal alignment in human adults. Eur. Spine J. 2014, 23, 1427–1432.
21. Chai, Y.; Boudali, A.M.; Khadra, S.; Walter, W.L. The Sacro-femoral-pubic Angle Is Unreliable to Estimate Pelvic Tilt: A Meta-analysis. Clin. Orthop. Relat. Res. 2022, 481, 1928–1936.
22. Yeh, Y.-C.; Weng, C.-H.; Huang, Y.-J.; Fu, C.-J.; Tsai, T.-T.; Yeh, C.-Y. Deep learning approach for automatic landmark detection and alignment analysis in whole-spine lateral radiographs. Sci. Rep. 2021, 11, 7618.
23. Weng, C.-H.; Huang, Y.-J.; Fu, C.-J.; Yeh, Y.-C.; Yeh, C.-Y.; Tsai, T.-T. Automatic recognition of whole-spine sagittal alignment and curvature analysis through a deep learning technique. Eur. Spine J. 2022, 31, 2092–2103.
24. Vrtovec, T.; Ibragimov, B. Spinopelvic measurements of sagittal balance with deep learning: Systematic review and critical evaluation. Eur. Spine J. 2022, 31, 2031–2045.
25. Reyes, M.; Meier, R.; Pereira, S.; Silva, C.A.; Dahlweid, F.-M.; Tengg-Kobligk, H.v.; Summers, R.M.; Wiest, R. On the interpretability of artificial intelligence in radiology: Challenges and opportunities. Radiol. Artif. Intell. 2020, 2, e190043.
26. Melhem, E.; Assi, A.; El Rachkidi, R.; Ghanem, I. EOS® biplanar X-ray imaging: Concept, developments, benefits, and limitations. J. Child. Orthop. 2016, 10, 1–14.
27. Chai, Y.; Boudali, A.M.; Khadra, S.; Dasgupta, A.; Maes, V.; Walter, W.L. Evaluating Pelvic Tilt Using the Pelvic Antero-posterior Projection Images-A Systematic Review. J. Arthroplast. 2023, in press.
28. Chan, Y. Biostatistics 104: Correlational analysis. Singapore Med. J. 2003, 44, 614–619.
29. Koo, T.K.; Li, M.Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 2016, 15, 155–163.
30. Tyrakowski, M.; Yu, H.; Siemionow, K. Pelvic incidence and pelvic tilt measurements using femoral heads or acetabular domes to identify centers of the hips: Comparison of two methods. Eur. Spine J. 2015, 24, 1259–1264.
31. Imai, N.; Ito, T.; Suda, K.; Miyasaka, D.; Endo, N. Pelvic flexion measurement from lateral projection radiographs is clinically reliable. Clin. Orthop. Relat. Res. 2013, 471, 1271–1276.
32. Eckman, K.; Hafez, M.A.; Jaramaz, B.; Levison, T.J.; DiGioia III, A.M. Accuracy of pelvic flexion measurements from lateral radiographs. Clin. Orthop. Relat. Res. 2006, 451, 154–160.
33. Tyrakowski, M.; Wojtera-Tyrakowska, D.; Siemionow, K. Influence of pelvic rotation on pelvic incidence, pelvic tilt, and sacral slope. Spine 2014, 39, E1276–E1283.
34. Sun, W.; Zhou, J.; Qin, X.; Xu, L.; Yuan, X.; Li, Y.; Qiu, Y.; Zhu, Z. Grayscale inversion radiographic view provided improved intra- and inter-observer reliabilities in measuring spinopelvic parameters in asymptomatic adult population. BMC Musculoskelet. Disord. 2016, 17, 411.
35. Fletcher, J.P.; Bandy, W.D. Intrarater reliability of CROM measurement of cervical spine active range of motion in persons with and without neck pain. J. Orthop. Sports Phys. Ther. 2008, 38, 640–645.
36. Zhu, J.; Wan, Z.; Dorr, L.D. Quantification of pelvic tilt in total hip arthroplasty. Clin. Orthop. Relat. Res. 2010, 468, 571–575.
37. Berthonnaud, E.; Dimnet, J.; Roussouly, P.; Labelle, H. Analysis of the Sagittal Balance of the Spine and Pelvis Using Shape and Orientation Parameters. J. Spinal Disord. Tech. 2005, 18, 40–47.
38. Bittersohl, B.; Freitas, J.; Zaps, D.; Schmitz, M.R.; Bomar, J.D.; Muhamad, A.R.; Hosalkar, H.S. EOS imaging of the human pelvis: Reliability, validity, and controlled comparison with radiography. JBJS 2013, 95, e58.
39. McKenna, C.; Wade, R.; Faria, R.; Yang, H.; Stirk, L.; Gummerson, N.; Sculpher, M.; Woolacott, N. EOS 2D/3D X-ray imaging system: A systematic review and economic evaluation. Health Technol. Assess. 2012, 16, 1.
40. Esposito, C.I.; Gladnick, B.P.; Lee, Y.-y.; Lyman, S.; Wright, T.M.; Mayman, D.J.; Padgett, D.E. Cup position alone does not predict risk of dislocation after hip arthroplasty. J. Arthroplast. 2015, 30, 109–113.
Figure 1. The anatomical definition (PTa) and mechanical definition (PTm) of pelvic tilt angles on sagittal pelvic radiographs.
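As a worked illustration of the two definitions in Figure 1, each angle can be computed from 2D landmark coordinates as the signed angle between the vertical and a landmark line: the anterior pelvic plane (pubic tubercle to ASIS) for PTa, and the line from the femoral head centre to the sacral plate midpoint for PTm. This is a minimal sketch under stated assumptions, not the software used in this study: it assumes image coordinates with x increasing anteriorly (patient facing right) and y increasing downward, takes the sacral plate midpoint as the midpoint of its posterior and anterior corners, and uses a sign convention (posterior tilt positive) that may differ from the one applied here. The function names are hypothetical.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    """Landmark in image pixels; assumed x grows anteriorly, y grows downward."""
    x: float
    y: float


def tilt_from_vertical(lower: Point, upper: Point) -> float:
    """Signed angle in degrees between the vertical and the line lower -> upper.

    Positive when the upper landmark lies posterior to the lower one
    (assumed convention: posterior pelvic tilt is positive).
    """
    dx_posterior = lower.x - upper.x  # > 0 when the upper landmark is posterior
    dy_up = lower.y - upper.y         # > 0 when the upper landmark is higher in the image
    return math.degrees(math.atan2(dx_posterior, dy_up))


def pt_mechanical(femoral_head_centre: Point,
                  sacral_posterior: Point, sacral_anterior: Point) -> float:
    """PTm: line from the femoral head centre to the midpoint of the sacral plate."""
    sacral_mid = Point((sacral_posterior.x + sacral_anterior.x) / 2.0,
                       (sacral_posterior.y + sacral_anterior.y) / 2.0)
    return tilt_from_vertical(femoral_head_centre, sacral_mid)


def pt_anatomical(asis: Point, pubic_tubercle: Point) -> float:
    """PTa: anterior pelvic plane, from the pubic tubercle up to the ASIS."""
    return tilt_from_vertical(pubic_tubercle, asis)


if __name__ == "__main__":
    # Hypothetical landmark coordinates, not taken from the study data.
    print(round(pt_mechanical(Point(500, 800), Point(380, 500), Point(460, 460)), 2))  # 14.04
    print(round(pt_anatomical(Point(600, 500), Point(620, 800)), 2))                   # 3.81
```

Because both angles share one helper, a landmark displaced by a few pixels shifts the result through the same atan2 geometry, which makes it easy to see why a misplaced sacral or ASIS landmark changes PT directly.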
Figure 2. The radiographs with annotations from five annotators for a secondary assessment. Different colors indicate distinct annotators. Round dots represent annotations placed by directly clicking on the relevant location, while triangles represent annotations derived from calculating locations based on annotated bone contours.
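Figure 2 distinguishes landmarks clicked directly from landmarks calculated from annotated bone contours. This section does not state how the calculation was performed; one common option for a femoral head centre is a least-squares circle fit to the contour points. The sketch below assumes that approach (the algebraic Kasa fit, solved in closed form in mean-centred coordinates), and the function name is hypothetical; it illustrates the idea rather than reproducing the method used in this study.

```python
import math
from typing import Sequence, Tuple


def fit_circle(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Algebraic (Kasa) least-squares circle fit; returns (cx, cy, radius)."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three contour points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Mean-centred coordinates: the constant term decouples from the centre terms.
    u = [p[0] - mx for p in points]
    v = [p[1] - my for p in points]
    z = [ui * ui + vi * vi for ui, vi in zip(u, v)]
    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suz = sum(ui * zi for ui, zi in zip(u, z))
    svz = sum(vi * zi for vi, zi in zip(v, z))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("contour points are (nearly) collinear")
    # Solve the 2x2 normal equations for D, E in u^2 + v^2 + D*u + E*v + F = 0.
    d = (-suz * svv + suv * svz) / det
    e = (-svz * suu + suv * suz) / det
    cu, cv = -d / 2.0, -e / 2.0
    # F = -mean(z), so radius^2 = cu^2 + cv^2 - F.
    radius = math.sqrt(cu * cu + cv * cv + sum(z) / n)
    return cu + mx, cv + my, radius
```

An algebraic fit is used here because it needs no iteration or external library; a geometric (orthogonal-distance) fit is usually less biased on short, noisy arcs and could replace it.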
Table 1. Cases containing inadequate annotations.
| | ASIS | Pubic Tubercle | Femoral Head Centre (cal) | Femoral Head Centre (est) | Midpoint of Sacral Slope (cal) | Midpoint of Sacral Slope (est) | At Least One Landmark Was Inadequate |
|---|---|---|---|---|---|---|---|
| Cases containing inadequate annotations | 42 | 11 | 9 | 35 | 13 | 15 | 71 |
| Percentages | 36.52% | 9.57% | 7.83% | 30.43% | 11.30% | 13.04% | 61.74% |
cal = measured from bone contour annotations and calculating the location; est = measured from estimating the landmark location.
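The percentages in Table 1 are counts over the 115 radiographs (for example, 42/115 = 36.52% for the ASIS). The sketch below shows one way to tabulate them, assuming, as described in the Abstract, that a landmark on an image counts as inadequate when more than half of the reviewers judge it so. The vote data layout and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: votes[image_id][landmark] is a list of reviewer flags,
# where True means that reviewer judged the landmark's annotation inadequate.
Votes = Dict[str, Dict[str, List[bool]]]


def inadequate_by_majority(flags: List[bool]) -> bool:
    """True when more than half of the reviewers flagged the landmark."""
    return sum(flags) * 2 > len(flags)


def summarise(votes: Votes, landmarks: List[str]) -> Dict[str, float]:
    """Percentage of images with each landmark, and with any landmark, inadequate."""
    n_images = len(votes)
    if n_images == 0:
        raise ValueError("no images to summarise")
    counts = {lm: 0 for lm in landmarks}
    any_count = 0
    for per_landmark in votes.values():
        flagged = [lm for lm in landmarks
                   if inadequate_by_majority(per_landmark.get(lm, []))]
        for lm in flagged:
            counts[lm] += 1
        if flagged:
            any_count += 1
    result = {lm: 100.0 * c / n_images for lm, c in counts.items()}
    result["at_least_one"] = 100.0 * any_count / n_images
    return result
```

The final column is not the sum of the per-landmark columns (42 + 11 + 9 + 35 + 13 + 15 = 125 exceeds 71) because a single image can have several inadequate landmarks; counting images with any flagged landmark, as above, avoids double counting.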
Table 2. Cases of inadequate annotations categorized by underlying reasons.
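| Landmark | Reason | Cases | Percentages |
|---|---|---|---|
| ASIS | Bad quality | 17 | 14.78% |
| | Anomaly | 0 | 0% |
| | Outlier | 40 | 34.78% |
| | Other | 2 | 1.74% |
| Pubic tubercle | Bad quality | 3 | 2.61% |
| | Anomaly | 0 | 0% |
| | Outlier | 9 | 7.83% |
| | Other | 0 | 0% |
| Femoral head centre (cal) | Bad quality | 0 | 0% |
| | Anomaly | 0 | 0% |
| | Outlier | 9 | 7.83% |
| | Other | 0 | 0% |
| Femoral head centre (est) | Bad quality | 2 | 1.74% |
| | Anomaly | 0 | 0% |
| | Outlier | 35 | 30.43% |
| | Other | 0 | 0% |
| Midpoint of sacral slope (cal) | Bad quality | 0 | 0% |
| | Anomaly | 6 | 5.22% |
| | Outlier | 7 | 6.09% |
| | Other | 1 | 0.87% |
| Midpoint of sacral slope (est) | Bad quality | 0 | 0% |
| | Anomaly | 8 | 6.96% |
| | Outlier | 8 | 6.96% |
| | Other | 0 | 0% |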
cal = measured from bone contour annotations and calculating the location; est = measured from estimating the landmark location.
Table 3. Parameter-wise comparison between the full dataset and the “adequate” dataset.
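| | PTm_cal | PTm_est | PTa |
|---|---|---|---|
| Correlation coefficient | 0.98 | 0.99 | 0.99 |
| Mean absolute difference | 0.39° | 0.35° | 0.52° |
| Maximum difference | 10.64° | 13.47° | 5.55° |
| 95% confidence interval, lower limit | −2.16° | −1.93° | −1.92° |
| 95% confidence interval, upper limit | 2.22° | 2.47° | 1.93° |
| p-value of the paired t-test | 0.86 | 0.25 | 0.96 |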
cal = measured from bone contour annotations and calculating the location; est = measured from estimating the landmark location.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chai, Y.; Maes, V.; Boudali, A.M.; Rackel, B.; Walter, W.L. Inadequate Annotation and Its Impact on Pelvic Tilt Measurement in Clinical Practice. J. Clin. Med. 2024, 13, 1394. https://doi.org/10.3390/jcm13051394

AMA Style

Chai Y, Maes V, Boudali AM, Rackel B, Walter WL. Inadequate Annotation and Its Impact on Pelvic Tilt Measurement in Clinical Practice. Journal of Clinical Medicine. 2024; 13(5):1394. https://doi.org/10.3390/jcm13051394

Chicago/Turabian Style

Chai, Yuan, Vincent Maes, A. Mounir Boudali, Brooke Rackel, and William L. Walter. 2024. "Inadequate Annotation and Its Impact on Pelvic Tilt Measurement in Clinical Practice" Journal of Clinical Medicine 13, no. 5: 1394. https://doi.org/10.3390/jcm13051394

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
