1. Introduction
Canine hip dysplasia (CHD) is one of the most frequent orthopedic diseases in dogs. This multifactorial and progressive disease manifests as hip laxity and osteoarthritis [1,2]. The main cause of CHD is still under study; however, researchers have reported heredity, breed, body size, weight, rapid growth, and hormonal changes as predisposing factors [1,2,3,4]. Laxity of the coxofemoral joint and subluxation of the femoral head are the primary signs of CHD, as well as the underlying causes of osteoarthritis [5,6,7]. Degeneration of the coxofemoral joint capsule and the ligament of the head of the femur, together with ossification disorders of the acetabulum, are the main underlying causes of hip laxity [5,8]. Pain, lameness, exercise intolerance, a decreased range of motion of the coxofemoral joints, and osteoarthritis are the main symptoms of CHD.
The early detection of CHD is crucial for timely intervention: the earlier hip laxity is detected, the better preventive or therapeutic methods can be planned to minimize the symptoms of the disease and improve the welfare of the dog. Juvenile pubic symphysiodesis (JPS) is one of the preventive methods that can be performed on dogs with mild–moderate hip laxity from 12 to 20 weeks of age [9]. The outcome of JPS depends on age, and the best results are expected between 12 and 16 weeks of age [10,11]. Furthermore, double or triple pelvic osteotomy (DPO or TPO, respectively) can be performed on patients with higher grades of hip laxity between 16 and 32 weeks of age [11].
Globally, veterinarians use different diagnostic programs to diagnose CHD. Most of these programs are based on radiographic and clinical examinations, and veterinarians perform them on adult dogs (from a minimum of 12 months of age). The FCI (Fédération Cynologique Internationale) scoring method is used in many European countries, Asia, Africa, and South America for dogs from a minimum of 12 months of age (18 months for large and giant breeds); it is based on the measurement of the Norberg angle (NA), the evaluation of the shape of the acetabulum, signs of degenerative joint disease (DJD), and the subluxation angle (SA) [12]. The OFA (Orthopedic Foundation for Animals) scoring program is used in the United States and Canada, and it consists of the measurement of the SA and the assessment of the presence of DJD in dogs from 24 months of age [13]. The BVA/KC (British Veterinary Association/Kennel Club) scoring program is used in Britain, Ireland, Australia, and New Zealand, and it is based on the assessment of the NA, SA, and signs of DJD in dogs from 12 months of age [14,15].
Unlike the aforementioned screening programs, which are applicable to adult dogs (from at least 1 or 2 years of age), the PennHIP (Pennsylvania Hip Improvement Program) screening method is an early diagnostic program that can be performed on dogs from 4 months of age [16]. This program is based on the evaluation of DJD, the congruity of the femoral head and acetabulum, and the measurement of the laxity of the femoral head through the calculation of the distraction index (DI). The DI is calculated on distraction-view radiographs as the distance between the centers of the acetabulum and the femoral head divided by the radius of the femoral head [16]. The DI is a number between 0 (sound hip) and 1 (severe laxity). The probability of CHD in dogs with a DI ≤ 0.3 is low, whereas dogs with a DI ≥ 0.6 are at an increased risk of hip dysplasia [17].
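As a worked illustration of the DI definition and the two cut-offs quoted above, the following minimal Python sketch computes the DI from landmark coordinates. The function names, the coordinate inputs, and the label used for values between 0.3 and 0.6 are assumptions made for illustration only; they are not part of the PennHIP protocol.

```python
import math


def distraction_index(acetabulum_center, femoral_head_center, femoral_head_radius):
    """Return DI = d / r, where d is the distance between the two centers
    and r is the radius of the femoral head (same length unit for both)."""
    if femoral_head_radius <= 0:
        raise ValueError("femoral_head_radius must be positive")
    d = math.dist(acetabulum_center, femoral_head_center)
    return d / femoral_head_radius


def risk_band(di):
    """Map a DI to the bands quoted in the text (DI <= 0.3 and DI >= 0.6).
    The 'intermediate' label for values in between is a hypothetical
    placeholder; the text does not name that range."""
    if di <= 0.3:
        return "low"
    if di >= 0.6:
        return "increased"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical landmark coordinates in millimetres.
    di = distraction_index((0.0, 0.0), (3.0, 4.0), 10.0)
    print(di, risk_band(di))
```

Because the DI is a ratio of two lengths measured on the same radiograph, it does not depend on the unit or on image magnification, provided that both lengths are measured in the same way.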
In addition to radiography, which is a primary diagnostic method for the evaluation of the hip joints, veterinarians also frequently use computed tomography (CT) to assess the coxofemoral joints. A prominent advantage of this method is that it allows a three-dimensional evaluation of the acetabulum and femur. Researchers have used CT to assess the skeletal structure of the coxofemoral joints, predict the occurrence of osteoarthritis at older ages, and measure hip laxity in dogs [18,19,20]. Despite the high cost and low availability of CT in veterinary clinics, veterinarians are increasingly using this method, and more studies based on it are likely in the future.
The aforementioned radiographic programs are the most popular and frequently used diagnostic protocols worldwide; however, in some cases, veterinarians might utilize a combination of different methods, or even other methods, such as the measurement of the subluxation index (SI), for the diagnosis of CHD [21]. These systems are not only relevant for the diagnosis and treatment of CHD, but they also help veterinarians to exclude dogs with severe grades of CHD from breeding programs [3,22]. Recently, researchers have confirmed that long-term selection strategies could reduce the prevalence of CHD in some breeds [23,24]. Although genetics is important in this disease, the final score does not reflect only the heritable component of CHD, because environmental factors also play a role in the incidence of this disease [1,2].
Despite all these screening programs, the early detection of CHD is still challenging. Developing new early diagnostic methods is not only important in veterinary medicine, but it might also be applicable to human medicine, as CHD is an experimental model for the evaluation of hip dysplasia and osteoarthritis in human medicine [25,26]. Alongside attempts to develop molecular and paraclinical diagnostic methods that identify adequate biomarkers for the early diagnosis of CHD [25,27], efforts to develop sensitive clinical methods, such as radiography, are still in progress [28]. The major challenges in the early diagnosis of CHD are the lack of information on the pathophysiology of the disease and its progressive and irreversible nature [25].
Given that many of the evaluated values in radiographic screening programs are qualitative, the interpretation of the findings might vary between observers. Thus, standard protocols with high interobserver agreement might help to reduce the scoring bias and increase the homogeneity of the results. In our previous study, we showed high interobserver correlations for the quantitative values and poor interobserver correlations for the qualitative values in Rottweilers at 4 months of age [29]. Considering that many of the symptoms of hip laxity or signs of DJD may not be visible at 4 months of age, we decided to evaluate the interobserver agreement at older ages as well.
The first aim of this study was to investigate the interobserver agreement of the frequently used radiographic criteria for the diagnosis of hip laxity in Rottweiler dogs at 8 and 12 months of age. Furthermore, we aimed to compare the results reported in our previous article for the same Rottweilers by the same observers at 4 months of age [29] with our findings at 8 and 12 months of age. We hypothesized that the interobserver agreement increases with the age of the dogs; thus, we expected higher agreements for the older dogs.
4. Discussion
In this study, our aim was to investigate the interobserver reliability of the eight most commonly used radiographic criteria, including four quantitative (NA, DI, DARS, and CEA) and four qualitative (SCAR, LCFH, GDJD, and GDAR) parameters, for the diagnosis of CHD in purebred Rottweilers at 8 and 12 months of age, and to compare these results with those that we previously reported for the dogs at 4 months of age [29]. We hypothesized that the interobserver agreement would increase with the age of the dogs.
The results recorded for the quantitative parameters confirmed our hypothesis, particularly in comparison with those at 4 months of age that we previously reported [29]. For the qualitative values, most of the interobserver agreements improved from poor/weak to weak/moderate at the older ages, and, except in a few cases, we did not observe any good or excellent agreements. Therefore, our hypothesis was not confirmed for the qualitative values in this study.
According to the results of our study, there were excellent interobserver agreements for all the quantitative values at 8 and 12 months of age. The ICC values of the NA and DI were the same at both ages (NA = 0.96 and DI = 0.98), whereas the ICC values recorded for the DARS and CEA, despite being excellent at both ages, slightly increased at 12 months of age. These results are more striking when we compare them with the results for the same dogs at 4 months of age by the same observers, which we reported in our previous study [29].
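An ICC summarizes how much of the total variance in a measurement is attributable to differences between dogs rather than to differences between observers. As a hedged illustration, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), from a dogs-by-observers table. The choice of this ICC form, the function name, and the example numbers are assumptions made for illustration; they are not taken from the statistical methods of this study.

```python
def icc2_1(ratings):
    """ICC(2,1) for a table with one row per subject (dog) and one column
    per observer. Uses the two-way ANOVA mean squares:
    MSR (rows/subjects), MSC (columns/observers), and MSE (residual)."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Undefined (division by zero) if every rating in the table is identical.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical Norberg angles (degrees): 4 dogs measured by 3 observers.
    na = [
        [101.0, 102.0, 100.5],
        [95.0, 96.5, 95.5],
        [108.0, 107.0, 108.5],
        [99.0, 99.5, 98.0],
    ]
    print(round(icc2_1(na), 3))
```

In this form, a systematic offset between observers (a large MSC) lowers the ICC even when the observers rank the dogs identically, which is why it is described as measuring absolute agreement rather than consistency.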
We observed an excellent interobserver agreement for the NA at 4 months of age [29]; this agreement increased further at 8 months of age and remained the same at 12 months of age. Similarly, we recorded an excellent interobserver agreement for the DI at 4 months of age [29], which increased at 8 months of age and remained the same at 12 months.
We recorded good interobserver agreement for the DARS at 4 months of age [29], which improved to excellent agreement at 8 months of age and increased further at 12 months of age; this was the largest improvement among the quantitative values. The interobserver agreement reported for the CEA at 4 months of age was excellent [29]. This excellent agreement increased slightly at 8 months of age and again at 12 months of age.
According to the multiple comparisons between the mean values of the NA, DI, and DARS, there were no significant differences between the measurements at 8 and 12 months of age, whereas the same values between 4 [29] and 8 months of age, as well as between 4 [29] and 12 months of age, were significantly different (p < 0.001). In contrast to these parameters, the results for the CEA followed a different pattern. We did not record any significant differences between the measurements of the CEA at 4 [29] and 8 months of age, or between 4 [29] and 12 months of age; however, the measurements between 8 and 12 months of age significantly differed (p = 0.013). From these findings, we can conclude, on the one hand, that the evaluation of the NA, DI, and DARS at 8 months of age did not differ from that at 12 months of age, so approximately the same results can be expected at both ages; on the other hand, the measurements of the CEA differed at the older ages, and we should expect even higher interobserver agreements at 12 months of age.
One of the reasons for the increased interobserver agreement of the DARS and CEA at the older ages might be the skeletal maturity of the dogs. Dogs reach their primary skeletal growth and muscle consistency at 4 months of age, and from this age, veterinarians can perform predictive radiographic and orthopedic examinations, such as the Ortolani maneuver [36,37,38]. At 8 months of age, dogs reach the main peak of their skeletal maturity, which is mostly completed by 12 months of age (18 months in some breeds). Because the center of the femoral head, the craniolateral edge of the acetabulum, and the dorsal acetabular rim are difficult to detect on DAR-view radiographs of juvenile dogs, the interobserver agreements increased with the age of the dogs, as the observers could better detect these landmarks at the older ages.
Based on the multiple comparisons between the mean values of each quantitative parameter at 4 [29], 8, and 12 months of age, there were no significant differences for the NA measurements, and we did not report any significant differences for the DI at 4 months of age [29]. We only recorded one significant difference at 8 months of age and two significant differences at 12 months of age for the DI measurements. According to the comparisons between the DARS measurements on all the examination dates, the number of significant differences between the observers was similar at 4 (seven differences) [29] and 12 (six differences) months of age, whereas this number was slightly lower at 8 months of age (four differences). We recorded most of these differences between Observer 3 (radiologist), Observer 5 (radiologist), and the other observers. The number of significant differences recorded for the CEA increased from one at 4 months of age [29] to three at 8 months of age and four at 12 months of age. At 4 [29] and 8 months of age, these differences regarding the CEA were not specific to a particular observer group or person, and they existed equally between the surgeons and radiologists. However, at 12 months of age, all the differences that we recorded were between Observer 1 (surgeon) and the other observers. In general, Observers 1, 2, and 4 had the lowest number of differences between each other in our previous [29] and the current study. The similar results between these observers might be due to their daily repetition of these measurements or to their experience in the early diagnosis of hip laxity.
The interobserver agreements recorded for the qualitative values were mostly poor to moderate at both examination times in this study. Most of the interobserver agreements recorded for the SCAR at 8 months of age were poor, similar to those recorded at 4 months of age [29]; however, the number of poor agreements was reduced at 12 months of age, when we recorded more moderate agreements. In comparison with our previous study [29], the proportion of poor agreements was reduced from 80% at 4 months of age to 30% at 12 months of age, while we did not observe any excellent or good agreements at the older ages. Thus, apart from a negligible improvement, increasing age did not increase the interobserver agreement of the SCAR evaluations.
We recorded a high number of moderate interobserver agreements at 8 and 12 months of age regarding the LCFH. The proportions of moderate agreements at 4 [29] and 12 months of age were the same (60%). The difference between these two ages was that there were fewer weak and more good agreements at 12 months of age.
The number of poor interobserver agreements of the GDJD was high at 4 [29] and 8 months of age. We observed a minimal improvement at 12 months of age, when the proportion of poor results was reduced from 100% at 8 months to 50%, although we did not observe any good agreements at all.
Similar to the GDJD, we recorded a high number of poor results for the GDAR. The proportion of poor agreements at 4 months of age [29] was 90%; however, it decreased to 70% and 50% at 8 and 12 months of age, respectively. We did not observe any excellent or good agreements regarding the GDAR at any age in our study.
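The percentages reported for the qualitative parameters are consistent with a tally over observer pairs: five observers form 10 pairs, so each pair would contribute 10%. The following sketch shows how such pairwise agreements could be computed with the unweighted Cohen's kappa. The use of Cohen's kappa, the function names, and the example scores are assumptions for illustration; the agreement statistic and the thresholds for the poor-to-excellent categories used in this study are not restated here and are therefore not encoded.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers scoring the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("both observers must score the same non-empty set of cases")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal category frequencies.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when both use a single category
    return (p_obs - p_exp) / (1.0 - p_exp)


def pairwise_kappa(scores):
    """Return {(observer_i, observer_j): kappa} for every pair of observers."""
    return {
        (i, j): cohen_kappa(scores[i], scores[j])
        for i, j in combinations(sorted(scores), 2)
    }


if __name__ == "__main__":
    # Hypothetical grades (0 = normal, 1 = mild, 2 = severe) for six hips.
    scores = {
        1: [0, 1, 1, 2, 0, 1],
        2: [0, 1, 2, 2, 0, 1],
        3: [1, 1, 1, 2, 0, 0],
        4: [0, 1, 1, 2, 0, 1],
        5: [0, 0, 1, 1, 0, 2],
    }
    for pair, kappa in pairwise_kappa(scores).items():
        print(pair, round(kappa, 2))
```

For ordered grading scales, a weighted kappa that penalizes large disagreements more than adjacent-grade disagreements is a common alternative; the unweighted form is shown only because it is the simplest.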
As reported in previous studies, qualitative values are more observer-dependent than quantitative values, and different observers might have different opinions on the same case; thus, we can expect a broad range of results [29,39]. The standardization of the film-reading process might increase the interobserver agreements and help the veterinarian to reduce the number of false-negative or false-positive results [39]. False-negative or false-positive results mostly occur on borderline hips because they are more difficult to classify than healthy or severely affected hips. In the case of a false-negative diagnosis, dysplastic dogs will be included in breeding programs, which may contribute to the slow progress in reducing the CHD incidence, despite screening programs [39,40]. False-positive results will exclude nondysplastic dogs from breeding programs, which reduces the population of nondysplastic dogs and may also increase the risk of other genetic disorders due to the reduction in the genetic variation in dogs [39]. The main reasons for false-negative or false-positive diagnoses are the low interobserver agreement of the screening programs and the late onset of the degenerative disease [40]. Using screening programs with high interobserver agreements should decrease the number of false-negative or false-positive results.
In a study performed by Verhoeven et al. [39], the authors reported low interobserver agreements for the diagnosis of CHD with the FCI scoring method. Paster et al. [41], Saunders et al. [42], Smith et al. [43], and Fortrie et al. [44] reported the same results for subjective hip scoring with ventrodorsal hip-extended radiographs. These studies support our findings regarding the low interobserver agreements for the qualitative values.
In contrast to the low interobserver agreements for the qualitative parameters, researchers have reported good agreements in the literature for most of the quantitative values. In a study performed by Broeckx et al. [45], despite a substantial measurement bias between the observers regarding the NA, the authors observed good interobserver agreements regarding the DI and laxity index. Bertal et al. [46] confirmed these findings. In another study performed by Klever et al. [47], the authors recorded high interobserver agreements for the measurements of the NA and DI between novice and experienced observers. These results support our findings regarding the high interobserver agreements for the quantitative values.
The experience of the examiners is an important topic in intra- and inter-observer studies. In previous studies, researchers reported various results regarding the effect of the experience of the observers on the intra- and inter-observer agreements. In studies of qualitative or subjective parameters, such as the DJD, the interobserver agreements were better between experienced observers than between novice observers [39]; in other studies, in which the researchers evaluated quantitative values, such as the DI or NA, they reported no significant correlation between the degree of experience of the observers and the interobserver agreements. Therefore, the experience of the observers, or even the self-learning of each observer, did not increase the intra- and inter-observer agreements for the quantitative parameters in these studies [46,47].
In our study, we did not record any significant differences between the observers regarding the NA and DI measurements (except in a few cases), which is consistent with our previous studies [46,47]; however, we did record significant differences between the observers regarding the DARS and CEA measurements at 8 and 12 months of age. Observers 3 and 5, despite being experts in diagnostic imaging, had performed these measurements less often than the other three observers had; thus, they had less experience in the early diagnosis of hip laxity. Most of the differences recorded for the CEA at 12 months of age were between one of the experienced observers (Observer 1) and the others; thus, although the experience of the observers is an important aspect, we cannot consider it an absolute factor, because observers with a lot of experience might also produce different results. We only recorded these differences for the complex measurements (the DARS and CEA) among the quantitative values. Considering that the complexity of the structures can decrease the intra- and inter-observer agreements [48], the recommendation is to use simple parameters, such as the DI or NA, which had high intra- and inter-observer agreements, or to use a combination of different methods, such as the total score reported by Merca et al. (2020) [49] together with other methods, or other modified methods, such as the modified FCI reported by Mostafa et al. (2022) [28]. In addition, we recommend reducing the number of observers to reduce the interobserver variability [46,50]. Our findings on the effect of the experience of the observers on the qualitative values were contrary to previous studies: we recorded the same results (poor–moderate agreements) between the novice and experienced observers. Further investigations are required to assess the relationship between the degree of experience of the observers and the different parameters, especially the subjective parameters.
The quality of the radiographs and positioning errors are important issues in radiographic studies. Experienced technicians took all of the radiographs included in this study, and an experienced radiologist monitored the process to ensure that the inclusion criteria were met. High-quality radiographs facilitate the radiographic evaluation and film-reading process; however, they do not influence the agreements, as all the observers receive and evaluate the same radiographs [51].
In this study, we evaluated the interobserver variability, which provided information on the reproducibility of the method in 8- and 12-month-old Rottweilers. In the future, researchers could use the intraobserver variability at the same ages to investigate the repeatability of the methods. Furthermore, findings from only one specific breed may not be generalizable to other dog breeds; therefore, we recommend investigations into the intra- and inter-observer agreements for other dog breeds as well.
According to Observer 5, some quantitative parameters could not be determined in our study. These missing data could have had a potentially negligible effect on the statistical findings.