**1. Introduction**

Despite the clinical benefits of modularity in total hip replacement (THR) implants, modular interfaces such as the head-neck taper junction sustain mechanically assisted crevice corrosion due to relative micro-motions at the metallic interface and the presence of corrosive body fluid [1,2]. Previous studies [3–5] have reported that the solid and soluble wear debris and corrosion products released from the head-neck junction may elicit untoward host body reactions such as osteolysis, peri-prosthetic fracture, and metallosis. Depending on the severity of these postoperative complications, revision surgery may be needed to replace the failed prosthesis.

In large-scale retrieval studies, the surface damage sustained by retrieved implants is assessed, and possible associations between implant/patient factors and the extent and location of the damage are investigated. The severity of the damage is quantified using visual scoring methods [6,7]. To date, many studies have applied these methods (with or without modifications) to various modular junctions [7–10]. After scoring the damage, each study employs causal-explanatory statistical modelling to investigate the effect of a particular set of factors (predictors) on the damage score.

In the head-neck junction, stems have a tapered geometry which can be divided into several zones (e.g., anterior, medial, posterior, and lateral quadrants). A deeper level of score granularity can provide more details about the severity and spatial distribution of damage. Distribution of the corrosion damage over the distinct zones of tapers has been investigated by a limited number of studies [8,11–19].

The number of zones scored at stem tapers has seldom gone beyond four (anterior, medial, posterior, and lateral quadrants). One reason could be the complexity of conducting pairwise comparisons among the levels of the zone factor. With four zones, six pairwise comparisons (order disregarded) are required. If the distal and proximal regions of each quadrant are also considered, 28 (i.e., $\frac{8!}{(8-2)!\times 2!}$) pairwise comparisons are required to investigate the damage thoroughly. The studies that scored the distal and proximal regions separately have observed different damage patterns within these regions [15,18,19]. Therefore, it is necessary to examine stem taper zones at a higher level of granularity in order to explore whether any significant difference exists between the distal and proximal regions of the quadrants.

This study introduces a method for addressing this gap. Using this approach, eight individual corrosion scores are assigned to eight distinct zones of each metallic stem taper. Next, an ordinal logistic regression (OLR) model is used to quantitatively compare the severity of corrosion damage at these eight zones.

#### **2. Materials and Methods**

#### *2.1. Retrieved Implants Information*

This study was approved by the Southern Adelaide Clinical Human Research Ethics Committee (Reference No. 485.13); 137 total hip replacement implants retrieved between 1995 and 2015 at the Royal Adelaide Hospital (RAH), Adelaide, Australia were selected. The selection was limited to detached head-neck junctions so that the stem tapers were accessible for assessment. The retrieved implants had been disinfected by immersion in 70% ethanol for four days followed by a 4% Biogram solution (polyphenolic disinfectant and detergent with 18% phenol) for 48–72 h. Biologic debris (blood or proteinaceous films) had been removed using a cotton bud without abrasion. The stem tapers selected for this research were further cleaned with acetone followed by a gentle wipe with a soft nylon brush. Eleven implant/patient factors were retrieved from Our Patient Management and Outcomes Database (OPMOD) of the RAH. Table 1 summarises these categorical and continuous factors. For each factor, the reported counts together with the number of missing records add up to 137. This study only looks at the distribution and severity of corrosion; therefore, the missing patient and implant information did not pose any concern.




**Table 1.** *Cont.*

#### *2.2. Visual Assessment of Corrosion Damage*

Goldberg's scoring method [7] was used to inspect and rate corrosion on the stem tapers (Table 2). Based on this method, eight distinct zones of the retrieved stem tapers were scored individually. Fretting wear was not scored because several studies have reported that fretting may be masked by corrosion damage and is therefore hard to identify visually [14,15,19,20].

**Table 2.** Visual criteria for scoring corrosion damage.


Also, it is thought that the severity of fretting in Goldberg's method cannot be measured consistently because the pitch of the machined threads over the taper surface varies among different stem designs [14,21]. Lastly, fretting scars can be mixed up with scratches caused by attaching or detaching the head intraoperatively [7,12,21].

To ensure consistent scoring, one trained investigator (RM) evaluated the damage. The stem tapers were visually scored twice in a random order. Each stem taper was photographed, and eight zones (posterior-distal (PD), posterior-proximal (PP), medial-distal (MD), medial-proximal (MP), anterior-distal (AD), anterior-proximal (AP), lateral-distal (LD), and lateral-proximal (LP)) were identified according to our previous study [22]. Figure 1 displays an exemplary taper for each score level.

**Figure 1.** Corrosion damage scores of 1 through 4 for stem tapers.

#### *2.3. Statistical Analysis*

In this study, SPSS (version 25) was used for the statistical analysis, and a *p*-value of <0.05 was taken as the level of statistical significance. Weighted kappa (κW) with quadratic weights was used to determine the single-observer repeatability of the corrosion scores, and a confusion matrix was established to quantify the disagreements. With quadratic weights, the further a disagreement lies from perfect agreement, the more heavily it is penalised. The strength of agreement, based on the magnitude of κW, was interpreted according to the guideline reported by Landis et al. [23].
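The quadratic weighting described above can be illustrated with a minimal sketch. This is not the SPSS procedure used in the study; the function name `quadratic_weighted_kappa` and the example scores are hypothetical, and the four score levels are assumed to be coded 1 to 4.

```python
def quadratic_weighted_kappa(first, second, levels=(1, 2, 3, 4)):
    """Weighted kappa with quadratic weights for two scoring rounds.

    `first` and `second` are equally long lists of ordinal scores
    (hypothetical data; the score levels are assumed to be 1..4).
    """
    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}
    n = len(first)

    # Confusion matrix: rows = first round, columns = second round.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: grows with the squared distance
            # between the two scores, so distant disagreements cost more.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_obs += weight * observed[i][j]
            # Expected count under chance agreement, from the marginals.
            weighted_exp += weight * row_totals[i] * col_totals[j] / n

    return 1.0 - weighted_obs / weighted_exp


# Hypothetical example: two scoring rounds that differ on one taper zone.
round_1 = [1, 2, 3, 4, 2, 3]
round_2 = [1, 2, 3, 4, 3, 3]
kappa = quadratic_weighted_kappa(round_1, round_2)
```

A value of 1 indicates perfect agreement between the two rounds. The sketch is undefined when both rounds assign the same single score to every item, because the expected weighted disagreement is then zero.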

Because the response is an ordinal dependent variable (DV), OLR was employed to capture the ordered nature of the DV levels. The OLR model in this study uses cumulative logits. Cumulative logits were chosen over other models (e.g., adjacent or continuation categories) because this study is interested in using the entire response scale regardless of the score level.

Consequently, the cumulative odds OLR comes with the proportional odds constraint, which ensures that the regression lines across the DV levels are parallel. This OLR model dichotomises the categories of the ordinal DV to form cumulative logits, as demonstrated in Table 3.

**Table 3.** An ordinal dependent variable (DV) with four levels giving three cumulative probabilities and consequently logits.


With a four-level DV, this OLR model outputs three binomial logistic regressions, according to Equations (1)–(3) that predict the probability of being classified into the 'lower' categories as opposed to the 'higher' categories for each dichotomization of the ordinal DV based on the cumulative probabilities.

$$\text{logit}(\text{success}) = \ln\left(\frac{\text{Prob}(\text{score} \le 1)}{\text{Prob}(\text{score} > 1)}\right) \tag{1}$$

$$\text{logit}(\text{success}) = \ln\left(\frac{\text{Prob}(\text{score} \le 2)}{\text{Prob}(\text{score} > 2)}\right) \tag{2}$$

$$\text{logit}(\text{success}) = \ln\left(\frac{\text{Prob}(\text{score} \le 3)}{\text{Prob}(\text{score} > 3)}\right) \tag{3}$$
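As a small numerical illustration of Equations (1)–(3), the sketch below converts the probabilities of the four score levels into the three cumulative logits. The probabilities are hypothetical and the function name `cumulative_logits` is an assumption made for this example only.

```python
import math


def cumulative_logits(probs):
    """Return the K-1 cumulative logits for K ordered category probabilities.

    `probs` lists Prob(score = 1), ..., Prob(score = K) in order and is
    assumed to sum to 1 with every cumulative probability strictly
    between 0 and 1.
    """
    logits = []
    cumulative = 0.0
    for p in probs[:-1]:
        cumulative += p  # Prob(score <= j)
        # ln( Prob(score <= j) / Prob(score > j) )
        logits.append(math.log(cumulative / (1.0 - cumulative)))
    return logits


# Hypothetical probabilities for scores 1, 2, 3 and 4.
example_probs = [0.4, 0.3, 0.2, 0.1]
example_logits = cumulative_logits(example_probs)
```

Because the cumulative probabilities can only increase from one dichotomization to the next, the three logits are always in ascending order.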

#### 2.3.1. The Ordinal Logistic Regression (OLR) Assumptions

Before deploying an OLR model, four assumptions (constraints) needed to be considered to ensure the validity of the results. The first assumption requires the DV (visual scores) to have an ordinal level of measurement, which is satisfied here. Under the second assumption, there should be at least one independent variable (IV) that is continuous, ordinal or categorical (including dichotomous variables), which is satisfied as well.

The other two assumptions are related to the characteristics of the data. The third assumption requires no multi-collinearity between the IVs. This was checked using the collinearity diagnostics available under linear regression, which return the variance inflation factor (VIF). The VIF indicates to what extent a particular IV contributes to multi-collinearity within the dataset. In this study, as a rule of thumb, VIF values beyond 10 were taken to indicate multi-collinearity.
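For intuition about the VIF threshold, the following sketch covers only the simplest case of two predictors, where the VIF reduces to 1/(1 − r²) with r the Pearson correlation between them. It is an illustration under that assumption, not the SPSS diagnostic used in the study; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model has exactly two predictors.

    Regressing x1 on x2 (with an intercept) gives R^2 = r^2, so
    VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors: nearly collinear, then uncorrelated.
vif_collinear = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
vif_independent = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
```

With more than two predictors, each IV would instead be regressed on all the others to obtain its own R², which this sketch does not attempt.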

The fourth assumption checks for having proportional odds. Here, the test of parallel lines was used to compare the fit of the proportional odds model to a model with varying slope coefficients. It was desired not to reject the null hypothesis that states the slope coefficients are the same across the three cumulative regression models. If true, the effect of each IV will be identical at each cumulative logit which is desired here.

#### 2.3.2. Overall Parameter Estimates

As pointed out earlier, the type of OLR model used in this study produces an equation for each cumulative logit. As there are four categories of the DV, three cumulative logits (Equations (1)–(3)) are expected. The assumption of proportional odds constrains the slope coefficients to be the same in all three equations, so only the thresholds may differ between them.
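Written out, the proportional odds constraint described above takes the following general form. The symbols $\theta_j$ (threshold of the $j$-th cumulative logit), $\beta_k$ (slope of the $k$-th IV) and $x_k$ (value of the $k$-th IV) are notation introduced here for illustration, and the minus sign before the slopes is an assumed convention that may differ between software packages.

$$\ln\left(\frac{\text{Prob}(\text{score} \le j)}{\text{Prob}(\text{score} > j)}\right) = \theta_j - \sum_{k} \beta_k x_k, \qquad j = 1, 2, 3$$

Only $\theta_j$ carries the index $j$, which is what makes the three regression lines parallel.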

Since changes in log odds do not have much intuitive meaning, the ratio of the odds between any two categories, or for a unit change in a numerical IV, is reported. The odds ratio (OR) was calculated as the exponential of the slope coefficient, which is expressed in log odds. The 95% confidence intervals of the OR and the significance levels are also reported.
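The conversion from a slope coefficient to an odds ratio can be sketched as follows. The confidence interval shown is a Wald interval, which is an assumption about how the interval is formed; the coefficient and standard error are hypothetical values, not results from this study.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.959964):
    """Odds ratio and Wald confidence interval from a log-odds coefficient.

    `z` is the standard normal quantile; the default corresponds to a
    two-sided 95% interval.
    """
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return odds_ratio, lower, upper


# Hypothetical slope coefficient (log odds) and standard error.
example_or, example_lower, example_upper = odds_ratio_with_ci(math.log(2.0), 0.25)
```

An interval that excludes 1 corresponds to a slope coefficient that differs significantly from zero at the chosen level.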

Unlike the numerical and dichotomous IVs, zone, as a polytomous IV, demands additional calculations to complete an overall test of statistical significance. To exhaust all pairwise comparisons of the categories, one category was taken as the reference and the remaining categories were compared against it as primary categories. For each significance test, zone had to be recoded into a new variable in which the desired reference category was coded as the last category (highest level).
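The recoding and reference-rotation scheme above can be sketched as follows. The zone abbreviations come from Section 2.2; the function names, the integer coding and the order in which references are rotated are assumptions made for illustration.

```python
from itertools import combinations

ZONES = ["PD", "PP", "MD", "MP", "AD", "AP", "LD", "LP"]


def recode_with_reference_last(zones, reference):
    """Map each zone to an integer code, with the reference coded highest."""
    others = [zone for zone in zones if zone != reference]
    codes = {zone: code for code, zone in enumerate(others, start=1)}
    codes[reference] = len(zones)  # reference category = last (highest) level
    return codes


def reference_runs(zones):
    """For each reference zone, list the comparisons not covered by earlier runs."""
    runs = {}
    covered = set()
    for reference in zones:
        new_pairs = []
        for other in zones:
            pair = frozenset((reference, other))
            if other != reference and pair not in covered:
                covered.add(pair)
                new_pairs.append((other, reference))
        if new_pairs:
            runs[reference] = new_pairs
    return runs


runs = reference_runs(ZONES)
total_comparisons = sum(len(pairs) for pairs in runs.values())
codes_for_lp = recode_with_reference_last(ZONES, "LP")
```

In this sketch, seven model runs with different reference zones are enough to cover all 28 unique pairs, because the last zone has already been compared with every other zone by the time it would serve as the reference.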
