## **3. Results**

For the assessment of intra-observer repeatability, the weighted kappa (κW) with quadratic weights indicated statistically significant agreement between the two sets of scores, κW = 0.64 (95% CI, 0.59 to 0.69), *p* < 0.001. According to [23], the strength of the agreement was classified as good. Before using the OLR model, preliminary data analysis was carried out by examining frequency histograms of the scores in the different zones. Using various bin sizes and definitions, a number of histograms were generated to graphically summarise the distribution of scores across the eight taper zones.
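For readers unfamiliar with the statistic, the following is a minimal sketch of a quadratic-weighted kappa for two scoring sessions on a four-level ordinal scale. It is not the software used in this study; the function name `quadratic_weighted_kappa` and the example scores are hypothetical, and the confidence interval and *p*-value are not computed.

```python
from collections import Counter


def quadratic_weighted_kappa(first, second, levels=(1, 2, 3, 4)):
    """Cohen's kappa with quadratic disagreement weights for two ordinal ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(first)
    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}

    # Joint counts of (session 1, session 2) scores and the two marginals.
    observed = Counter((index[a], index[b]) for a, b in zip(first, second))
    row = Counter(index[a] for a in first)
    col = Counter(index[b] for b in second)

    weighted_observed = 0.0  # weighted disagreement actually observed
    weighted_expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = ((i - j) / (k - 1)) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * (row[i] / n) * (col[j] / n)

    if weighted_expected == 0:
        return float("nan")  # undefined when every score falls in a single level
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for illustration only (not data from this study).
session_1 = [1, 2, 2, 3, 4, 2, 1, 3]
session_2 = [1, 2, 3, 3, 4, 2, 2, 3]
kappa = quadratic_weighted_kappa(session_1, session_2)
```

Quadratic weights penalise a disagreement of two score levels four times as heavily as a disagreement of one level, which suits an ordinal scale where near misses are less serious than large discrepancies.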

#### *3.1. Distribution of Corrosion Scores*

Visual scoring of the 137 stem tapers across the eight zones resulted in 1096 corrosion scores. Table 4 summarizes the frequency of each score level. Score level 2 had the highest quantity (512) while the lowest quantity (51) belonged to score level 4.


**Table 4.** The quantity of the zones having each score level.

Figure 2 illustrates the distribution of the corrosion scores at each zone. This figure can be used to compare the variability of each score level across the eight zones. The values are the percentage of each score out of 137 in every zone (the percentages in each zone add up to 100%). Score levels 1, 2, 3, and 4 reached their highest percentages at zones PP, AD, MD, and MP, respectively.

*Metals* **2018**, *8*, 840

**Figure 2.** Distribution of corrosion score levels across the entire eight stem taper zones of the 137 retrieved implants.

Considering the unbalanced score levels, the first two score levels that are higher in quantity (i.e., 359 and 512) always show higher percentages compared with score levels 3 and 4 within each zone.

To better compare the severity of damage across the zones, two more configurations of scores (by combining the original score levels) were also explored. The first configuration groups the first and the last two score levels into low and high groups, respectively. Figure 3 visualizes this configuration and compares each score group across the eight zones.

**Figure 3.** The quantity of the double score levels at each zone (scores 1 and 2 versus scores 3 and 4 combined).

As expected, the low score group, which comprises 359 + 512 scores, has a higher frequency than the high score group (174 + 51). This configuration can better show which zones have more severe corrosion damage (for example, the MD and LD zones). Also, the smallest and largest gaps between these two combined score groups were observed at zones MD and PP, respectively.

The third configuration preserves score level 1 and combines the other three score levels to form two new score groups of intact and corroded stem tapers. Figure 4 illustrates the frequencies of these two score groups.
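The two regroupings can be checked arithmetically from the overall score counts reported in the text (359, 512, 174, and 51 for score levels 1 to 4). The sketch below works on these totals only, not on the per-zone counts plotted in the figures; the helper name `regroup` is hypothetical.

```python
# Overall score-level counts reported in the text (137 stems x 8 zones).
score_counts = {1: 359, 2: 512, 3: 174, 4: 51}
total = sum(score_counts.values())


def regroup(counts, groups):
    """Sum score-level counts into named groups; `groups` maps name -> levels."""
    return {
        name: sum(counts[level] for level in levels)
        for name, levels in groups.items()
    }


# Second configuration: scores 1-2 (low) versus scores 3-4 (high).
low_high = regroup(score_counts, {"low": (1, 2), "high": (3, 4)})

# Third configuration: score 1 (intact) versus scores 2-4 (corroded).
intact_corroded = regroup(score_counts, {"intact": (1,), "corroded": (2, 3, 4)})

for label, grouping in (("low/high", low_high), ("intact/corroded", intact_corroded)):
    shares = {name: round(100 * n / total, 1) for name, n in grouping.items()}
    print(label, grouping, shares)
```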

The medial distal zone had the largest difference between these two score groups which confirms that this particular zone is most damaged. Also, the posterior-proximal zone had the smallest difference between the two score groups (thus least damaged). As a key finding, the distal regions of the four quadrants showed more corrosion damage compared with the proximal regions.

These findings from the histograms can shed light on the likely outcome of the OLR model. In particular, when the number of DV levels is higher, cumulative logit models may become infeasible. Histograms can help determine which score levels are more important to compare using other types of OLR models, such as adjacent-categories models.

**Figure 4.** Distribution of corroded stem tapers against the intact group.

#### *3.2. Comparison of Corrosion in the Zones*

Cumulative odds OLR with proportional odds was employed to conduct pairwise comparisons between the zones. First, it was established whether zone is statistically significant overall. From the model test performed in SPSS, zone was found to be a statistically significant predictor of corrosion scores (*p* = 0.002) in this univariate regression model.
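As a reminder of what such a model estimates, a common way of writing the cumulative odds model with proportional odds for a four-level score $Y$ and a categorical zone predictor is sketched below. The symbols $\theta_j$ (thresholds) and $\beta_z$ (zone coefficients), and the sign convention, are assumptions made here for illustration; the parameterisation used by the software may differ.

$$
\log\frac{P(Y \le j \mid \text{zone} = z)}{P(Y > j \mid \text{zone} = z)} = \theta_j - \beta_z, \qquad j = 1, 2, 3, \qquad \beta_{\text{ref}} = 0 .
$$

Under this convention, the odds of a score above level $j$ at zone $z$ are $\exp(\beta_z - \theta_j)$, so the odds ratio of zone $z$ against the reference zone is

$$
\mathrm{OR}_{z\,\text{vs ref}} = \frac{\exp(\beta_z - \theta_j)}{\exp(0 - \theta_j)} = \exp(\beta_z),
$$

which does not depend on $j$. This is the proportional odds assumption: a single odds ratio per pair of zones describes the shift towards higher scores at every cut-point of the scale.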

Since no specific zone was of prior interest, 28 pairwise comparisons had to be undertaken, which incurred additional calculations beyond the overall omnibus statistical test. Table 5 summarizes the OR, *p*-values, and confidence intervals. Significant OR values are highlighted in grey. In this table, each zone has been used seven times, either as the primary or the reference (inside brackets) group, to exhaust the combinations. OR values below 1 indicate that, for the primary category, the odds of having a higher corrosion score are lower than those of the reference category.
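The count of 28 follows from choosing 2 zones out of 8 without regard to order. A short sketch that enumerates the pairs is given below; the labels PP, AD, MD, MP, and LD appear in the text, while AP, PD, and LP are assumed abbreviations that follow the same pattern.

```python
from itertools import combinations

# PP, AD, MD, MP and LD are used in the text; AP, PD and LP are assumed labels.
ZONES = ["AD", "AP", "MD", "MP", "PD", "PP", "LD", "LP"]

# Each unordered pair is one (primary, reference) comparison.
pairs = list(combinations(ZONES, 2))

# Every zone takes part in seven comparisons, as primary or as reference.
appearances = {zone: sum(zone in pair for pair in pairs) for zone in ZONES}

print(len(pairs), appearances)
```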


**Table 5.** The odds of observing a higher corrosion score at a primary zone compared with a reference zone.



The reciprocal of odds ratios can be calculated to compare a reference group with a primary group. To compare the severity of corrosion across the entire eight zones, the odds ratios were sorted and plotted (Figure 5). The red and blue bars indicate the significant and insignificant OR values, respectively. An OR equal to 1 indicates equal odds of observing a higher corrosion score at the primary and reference zone groups. By moving away from unity, the odds ratios that are first insignificant later on become significant. The speed by which this transition takes place is a function of the presumed statistical significance level.
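Swapping the primary and reference groups can be sketched as follows. The function name `invert_odds_ratio` and the numbers are hypothetical and are not taken from Table 5. Note that the confidence limits swap roles when inverted, because taking a reciprocal reverses the ordering of positive numbers.

```python
def invert_odds_ratio(odds_ratio, ci_low, ci_high):
    """Return the OR and confidence limits with primary and reference swapped."""
    if not 0 < ci_low <= odds_ratio <= ci_high:
        raise ValueError("expected 0 < ci_low <= odds_ratio <= ci_high")
    # The reciprocal of the upper limit becomes the new lower limit.
    return 1 / odds_ratio, 1 / ci_high, 1 / ci_low


# Hypothetical example: OR = 0.50 (0.25 to 0.80) for zone A versus zone B.
or_ba, low_ba, high_ba = invert_odds_ratio(0.50, 0.25, 0.80)
print(or_ba, low_ba, high_ba)
```

An interval that excludes 1 before inversion also excludes 1 afterwards, so the significance of a comparison does not depend on which zone is treated as the reference.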

**Figure 5.** The 28 odds ratios sorted and colour-coded for 28 pairwise comparisons.

The severity of corrosion at each zone with respect to the other zones was assessed based on its corresponding OR values. For each zone, Table 5 has provided seven OR values wherein that particular zone appears as either primary or reference.

Table 6 sorts the eight zones from the least to the most severely damaged according to the value of C1 + C2. This value quantifies how many times each zone had a higher likelihood of damage compared with the other seven zones throughout the 28 pairwise comparisons. C1 indicates how many times a particular zone, as the primary, had an OR value above 1, while C2 indicates how many times that same zone, as the reference, had an OR value below 1. Therefore, both C1 and C2 reflect the frequency of each zone appearing as more severely damaged with respect to the other zones.
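The C1 + C2 tally can be sketched as below. The function names and the three-zone odds ratios are hypothetical and are not values from Table 5; whether only statistically significant odds ratios should be counted is not modelled here.

```python
def damage_counts(odds_ratios, zones):
    """Map each zone to (C1, C2) given {(primary, reference): OR}."""
    counts = {}
    for zone in zones:
        # C1: zone is the primary group and the OR is above 1.
        c1 = sum(1 for (primary, _), value in odds_ratios.items()
                 if primary == zone and value > 1)
        # C2: zone is the reference group and the OR is below 1.
        c2 = sum(1 for (_, reference), value in odds_ratios.items()
                 if reference == zone and value < 1)
        counts[zone] = (c1, c2)
    return counts


def rank_zones(counts):
    """Sort zones from least to most damaged by C1 + C2."""
    return sorted(counts, key=lambda zone: sum(counts[zone]))


# Hypothetical odds ratios for three zones (illustration only).
example = {("A", "B"): 0.5, ("A", "C"): 0.8, ("B", "C"): 1.6}
counts = damage_counts(example, ["A", "B", "C"])
ranking = rank_zones(counts)
print(counts, ranking)
```

Each comparison with an OR different from 1 adds exactly one count to one of its two zones, so the C1 + C2 values sum to the number of comparisons.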

**Table 6.** The frequency of each zone showing statistically significant odds ratio (OR).




Zones PP and MD were identified as having the lowest and highest severity of corrosion, respectively. Interestingly, the proximal and distal zones grouped together in this table, with the distal region showing more damage than the proximal region across the four quadrants of the studied stem tapers.

## **4. Discussion**

Eight distinct zones of the stem tapers including anterior-distal, anterior-proximal, medial-distal, medial-proximal, posterior-distal, posterior-proximal, lateral-distal, and lateral-proximal were scored and statistically compared to identify the zone(s) with the most severe corrosion damage in the retrieved implants studied in this work. It is noted that there are several studies in the literature that chose to score stem tapers holistically, not locally [9,10,24–26].

Within the studies [11,12,15–19] that scored stem tapers locally, the pools of implants had a limited diversity in terms of implant properties (e.g., head diameter, articulation type, and stem design). Therefore, it was deemed necessary to explore whether a similar distribution of corrosion damage can be seen in a more heterogeneous pool of implants.

To the best of our knowledge, there are only two studies [18,19] in the literature that, similar to this work, have assigned eight local scores to the stems with the rest using lower numbers of zones. In those two studies, one did not compare the scores between the zones [18]. The other compared the four quadrants, and the two distal and proximal regions separately in terms of corrosion severity and did not determine which zone(s) had the most severe damage [19].

Routine causal-explanatory statistical analyses require only one score as the descriptor of damage for each implant. The majority of these studies have chosen to combine the local scores by calculating an overall value [8,11–14]. This approach has led to the presumption that this global score is a continuous variable; and, thus, the statistical analyses for continuous variables have been utilised. Analysing a continuous variable with an interval or ratio level of measurement is generally less complex in nature. However, an increased number of levels in the global score does not necessarily imply a known "distance" between the score levels. Therefore, this approach was treated with suspicion in this study and was not adopted.

Here, the corrosion scores were analysed using a univariate OLR model, and the odds ratios along with their *p*-values were reported. Since there was no particular hypothesis about the relative level of corrosion at the eight zones, 28 pairwise comparisons were carried out to exhaust all pairwise combinations of the zones. The distal region of the medial quadrant was found to have the highest odds of receiving a higher corrosion score, which aligns with previous findings in the literature that identified the distal region [19,20,27] and the medial quadrant [7,10,16,28] as having the highest corrosion scores. Also, this study shows that the distal region of all four quadrants had more corrosion damage than the proximal region of those quadrants. Therefore, regardless of the quadrant, corrosion damage was found to be more pronounced distally than proximally.

Generally, the higher severity of wear or corrosion at a specific zone has been attributed to several factors such as increased micro-motions at the interface, head or stem materials, head diameter, high friction moments, and poor lubrication of the bearing articulation. While some act as root causes, the others play the role of causal factors. Also, damage at the head-neck taper junction usually appears as a combination of wear and corrosion mechanisms. Some of these factors may only contribute to a specific mode of damage, while others may contribute towards a set of damage mechanisms.

In a retrieval study of 231 implants [7], the stem tapers received four fretting and corrosion scores corresponding to the four quadrants. The medial and lateral scores were observed to be significantly higher than the scores at the other two quadrants (posterior and anterior). This was explained to be due to a higher likelihood of micro-motions between the head and neck about an axis in the sagittal plane. Similar to the present study, the pool of implants in that work had a wide diversity, and the higher corrosion scores at the medial quadrant suggest that this could be a phenomenon independent of the included patient and implant factors.

Wilson et al. [29] explained how at the double-tapered cone design of Profemur Z, the proximal end of the neck experiences an almost pure compression and shear loading. High frictional moments at taper junctions were related to poor lubrication of the articulation interfaces by another study [30].

The medial quadrant was identified to have higher corrosion scores in a retrieval study of 52 S-ROM components [16]. It was hypothesised that greater micro-motions at this quadrant could result in a more frequent disruption of the passive oxide layer and, consequently, more severe corrosion damage. Similar to the conclusion of the Wilson et al. [29] study, they reported that this region is generally under a compression-loading regime. Computational modelling of the stresses in stem tapers paired with large diameter heads confirmed this hypothesis, showing maximum levels of principal stresses at the medial quadrant [31]. In that work, a 3D model of a 12/14 titanium taper was paired with cobalt-chromium and alumina heads. Increasing the head diameter significantly increased this quadrant's stresses distal to the junction. It was highlighted that the pairing of a small taper and a large head leads to a larger moment arm transmitting a higher force to a small surface area, which facilitates tribo-corrosion.

A relatively higher amount of load and stress at the medial quadrant causes elastic strains which appear as surface compression. This condition may lead to micro-motions of approximately 5 to 40 μm [32] which in turn may result in abrasion or fracture of the oxide layer. The subsequent changes in the metal surface potential and the continuous re-passivation of the oxide layer change the chemistry of the crevice solution. Ultimately, the deaeration and pH decrease of the solution initiate crevice corrosive attacks [33,34]. Crevice corrosion has been reported to occur near the bore opening which may explain observing more severe corrosion at the distal region [35].

Besides micro-motions, galvanic corrosion at this interface due to using mixed metal components is a potential source of material loss. In this study, 18 (13.1%) implants had mixed head and stem materials, whereas 45 (32.8%) had similar materials. Therefore, galvanic corrosion cannot be nominated as the sole mechanism of corrosion.

These studies have used relatively homogenous pools of implants, yet they observed higher levels of corrosion at the medial quadrant or distal zones of stem tapers. Based on the findings of the present study which shows that the distal region of the medial quadrant sustains the most severe corrosion damage, it is understood that this particular zone is most severely damaged versus all the other zones regardless of the properties and patient characteristics of the investigated pool of implants.
