1. Introduction
There is growing interest within the medical community in developing biomimetic materials for the creation of task trainers (referred to as trainers or simulations). Over the past two decades, the utilisation of such trainers has gained significant traction, particularly in the context of teaching and assessing medical students and training physicians. The primary educational objective of these trainers is to provide an environment wherein trainees can acquire and refine their skills with reduced risk to patients.
However, a critical concern arises with the fidelity of the tactile feedback (also referred to as haptic fidelity) provided by these trainers [
1]. Trainers failing to replicate the tactile feedback encountered in real-world scenarios may result in trainees inadvertently employing aberrant forces or manoeuvres during procedures, potentially leading to patient harm [
2]. Such deviations from the proper technique may prolong the learning process because trainees must “unlearn” these erroneous behaviours [
2].
To address the challenge of haptic fidelity in simulations, it is essential that the incorporated synthetic materials (also referred to as biomimetics) mimic the biomechanical and physical properties of human tissues [
2,
3]. However, selecting the appropriate properties for biomimetics presents a complex challenge. The replication of a single tissue type entails a broad spectrum of biomechanical and physical characteristics that can be influenced by an individual’s anatomy, physiology, and body composition. The task is further complicated by the scarcity of research elucidating which biomechanical properties specifically influence the perceived tactile feedback during medical procedures.
With the increasing prevalence and affordability of three-dimensional (3D) printing, the mechanical properties of materials utilised in the design of trainers have garnered significant attention [
4]. A proposed starting point for assessing the biomechanical properties of human tissue is Shore hardness [
4]. By leveraging Shore hardness as a metric, 3D materials can be selected based on comparative Shore hardness readings from the relevant mimicked tissues. Incorporating 3D-printed mimics into trainers may facilitate the creation of lifelike simulations, offering trainees a more representative learning experience.
The general method of measuring Shore hardness employs a durometer, a device that gauges hardness when its metal indenter is pressed onto a material’s surface [
5]. The shape of the metal indenter varies based on (1) the type of hardness test (e.g., Vickers, Knoop, or Shore) and (2) the hardness scale (e.g., Shore A or Shore OO). Hardness measurement quantifies a material’s resistance to permanent (plastic) deformation and has been used to establish empirical relationships between hardness and other mechanical properties such as yield stress and elastic modulus [
5].
Aside from offering mechanical insights into human tissues, Shore hardness stands out as an appealing methodology because of its simplicity, speed, and non-destructive nature. In industrial contexts, Shore hardness measurements adhere to standards such as ASTM D2240 [
6], which ensure consistent and dependable material data acquisition. However, comparable standards for human tissue testing are notably absent. This absence is likely due to the intricate nature of biological tissues [
7]. For instance, when using Shore durometers, ASTM D2240 recommends a minimum material thickness of 6 mm for elastomers [
6], a requirement often impractical for tissues like fascia. Moreover, industry standards stipulate minimum distances between both repeated Shore measurements and the material’s edge [
6], which are challenging in certain anatomical regions.
Despite the ease of use of Shore hardness, there is a paucity of published studies utilising it as a methodology. Furthermore, the data from studies employing Shore hardness may be limited in their utility for selecting comparable 3D materials.
Table 1 presents data extracted from five distinct studies where Shore hardness was utilised. Notably, the table exhibits limitations in the type of tissues tested and shows wide ranges of the reported values. For instance, concerning normal pancreatic tissues, Tejo-Otero reported an average Shore O value of 13, whereas Belyaev documented median values of 27 and 30 [
8,
9].
The discrepancy in pancreatic readings between the studies published by Tejo-Otero and Belyaev may stem from adherence (or lack thereof) to the ASTM D2240 standard. Belyaev et al. conducted freehand measurements, a method that may have influenced their readings. The use of a stage, however, is not explicitly detailed in the published methodology by Tejo-Otero [
8,
9]. The ASTM D2240 standard provides precise guidelines for durometer usage, particularly emphasising the use of durometer stands during measurements. This emphasis is likely due to the potential introduction of factors creating variability in Shore readings when measured freely by hand. Such an example includes inconsistency in the applied force when indenting the test specimen.
As the use of physical simulations grows, particularly in the context of 3D printing and do-it-yourself approaches, the demand for standardised tissue values becomes increasingly important. Shore hardness is appealing as a mechanical metric for material selection in clinical training tools, and some studies have also briefly explored its ability to inform clinical decision making [
9,
11,
13,
14]. Therefore, it is also imperative to establish whether measuring Shore hardness is a reliable method for use among a clinician population, as there are significant research gaps in this area.
To address these research gaps, this study aims to evaluate the reliability of Shore hardness for human tissue measurement by (1) assessing inter-rater reliability across various tissue samples, (2) examining the influence of tissue thickness on Shore hardness readings, and (3) analysing the variability introduced by freehand measurements. It is hypothesised that stage-assisted Shore hardness measurements will exhibit consistency across scales and raters, whereas variations in tissue thickness and freehand techniques are expected to result in greater variability in mean hardness readings.
3. Results
3.1. Inter-Rater Reliability
A total of 1152 Shore A and Shore OO readings were measured by three different raters, and five different tissue types were tested from a total of fourteen different cadavers. The mean hardness values from all three raters on the Shore A scale were 62.25 (ICA), 75.39 (IJV), 70.23 (VN), 44.12 (skin), and 23.90 (SCM). The mean hardness values on the Shore OO scale were 55.71 (ICA), 54.88 (IJV), 55.52 (VN), 59.64 (skin), and 57.76 (SCM).
To determine the inter-rater reliability of Shore hardness readings, the ICC was calculated using Case 2A ICC(A,1) [
19].
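As an illustration of the ICC(A,1) calculation (two-way random effects, absolute agreement, single rater), the following minimal sketch computes the coefficient from a subjects-by-raters table using the mean squares of a two-way ANOVA. The function name and the example readings are hypothetical and are not data from this study.

```python
def icc_a1(ratings):
    """Return ICC(A,1) for a table with one row per subject and one column per rater.

    Uses the two-way ANOVA mean squares:
    ICC(A,1) = (MSR - MSE) / (MSR + (k - 1) * MSE + (k / n) * (MSC - MSE))
    where n is the number of subjects and k the number of raters.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Hypothetical Shore readings: 4 specimens (rows) measured by 3 raters (columns).
example = [
    [62.0, 60.5, 64.0],
    [75.0, 73.0, 78.5],
    [44.0, 47.5, 43.0],
    [24.0, 22.5, 27.0],
]
print(round(icc_a1(example), 3))
```

Because ICC(A,1) measures absolute agreement, a constant offset between raters lowers the coefficient even when the raters rank the specimens identically.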
Figure 4 presents the results of the ICC study, with the bars representing the ICC value for a given Shore scale and tissue type. The error bars are the 95% CI around the ICC, and the stars indicate statistical significance (
p < 0.05). As seen in
Figure 4, Shore A has a higher inter-rater agreement for tissues compared to the same tissues measured using the Shore OO scale. However, despite the higher agreement, the CIs are wide for all of the ICC values and are also much wider than those predicted during the power calculation.
The overall ICC values can be qualified using different benchmark scales such as Landis and Koch, Fleiss, and Altman. These categorical descriptions of the ICC values can include broad cut-off values and do not take into consideration the number of subjects, raters, categories, or margin of error [
21]. For example, all three benchmark scales would qualify the ICC value for the skin using the Shore A scale as substantial (Landis and Koch), excellent (Fleiss), and good (Altman) despite a CI width of 0.53. To mitigate this, ICC qualifications were made by adapting Gwet’s probabilistic approach [
21].
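To illustrate why a fixed benchmark label can be misleading when the CI is wide, the sketch below compares deterministic labels with a simplified probabilistic label. It is not Gwet’s exact procedure nor the adaptation used in this study: it assumes the ICC estimate is normally distributed, approximates the standard error as the CI width divided by 2 × 1.96, assigns any probability mass above the top cut-off to the top category, and treats each band as upper-inclusive. The input ICC of 0.80 is an illustrative value, and all function names are hypothetical.

```python
import math

# Benchmark cut-offs as (lower bound, label), highest category first.
LANDIS_KOCH = [(0.8, "almost perfect"), (0.6, "substantial"), (0.4, "moderate"),
               (0.2, "fair"), (0.0, "slight"), (-math.inf, "poor")]
FLEISS = [(0.75, "excellent"), (0.40, "fair to good"), (-math.inf, "poor")]
ALTMAN = [(0.8, "very good"), (0.6, "good"), (0.4, "moderate"),
          (0.2, "fair"), (-math.inf, "poor")]


def normal_cdf(x, mean, se):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (se * math.sqrt(2.0))))


def deterministic_label(icc, scale):
    """Label the point estimate only, ignoring its uncertainty."""
    for lower, label in scale:
        if icc > lower:
            return label
    return scale[-1][1]


def probabilistic_label(icc, se, scale, threshold=0.95):
    """Return the highest category whose cumulative membership probability,
    accumulated from the top category downwards, reaches the threshold."""
    cumulative = 0.0
    upper = math.inf
    for lower, label in scale:
        p_upper = 1.0 if upper == math.inf else normal_cdf(upper, icc, se)
        p_lower = 0.0 if lower == -math.inf else normal_cdf(lower, icc, se)
        cumulative += p_upper - p_lower
        if cumulative >= threshold:
            return label
        upper = lower
    return scale[-1][1]


icc = 0.80                   # illustrative value, not a reported result
se = 0.53 / (2 * 1.96)       # assumed: SE approximated from a CI width of 0.53
for name, scale in [("Landis and Koch", LANDIS_KOCH), ("Fleiss", FLEISS), ("Altman", ALTMAN)]:
    print(name, deterministic_label(icc, scale), probabilistic_label(icc, se, scale))
```

Under these assumptions, the probabilistic label drops below the deterministic one because a substantial share of the probability mass lies in lower bands.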
Table 5 provides a colour-coded summary of the ICC qualifications after applying the probabilistic approach. As seen in the table, the majority of the ICC values using the Shore A scale are moderate, whilst those using the Shore OO scale are poor.
In summary, with the exception of SCM, hardness measurements of human tissue had better agreement using the Shore A scale compared to measurements on the Shore OO scale. Both scales exhibit wide CIs, raising concern about the reliability of this particular method. For hardness measurements of the SCM, the null hypothesis of no agreement between raters (ICC = 0) could not be rejected on either scale.
3.2. Tissue Thickness and Shore Hardness Values
The correlations between tissue thickness and Shore hardness for Shore A and OO scales are shown in
Figure 5. The mean thickness (SD) values were 0.98 mm (0.19) for ICA, 0.33 mm (0.10) for IJV, 1.13 mm (0.26) for VN, 1.58 mm (0.37) for skin, and 5.52 mm (1.17) for SCM.
It should be noted that because the tissue thickness measures for the ICA were not normally distributed, Spearman’s correlation (ρ) was conducted instead of Pearson’s correlation coefficient (r); otherwise, all of the associations between thickness and hardness are Pearson’s correlation coefficient.
With the exception of the SCM, all tissues exhibited a negative relationship between thickness and hardness; that is, thicker tissues have lower hardness readings, and thinner tissues have higher hardness readings. Spearman’s correlation demonstrated that the ICA maintained a significant, inverse relationship between tissue thickness and Shore A hardness values (ρ = −0.69, p = 0.02).
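The distinction between the two coefficients can be sketched as follows: Spearman’s coefficient is Pearson’s coefficient computed on ranks, which makes it insensitive to the shape of the distribution. This is a minimal standard-library sketch; the function names and the thickness and hardness values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical thickness (mm) and Shore A readings for five specimens.
thickness = [0.7, 0.9, 1.0, 1.2, 1.4]
hardness = [80.0, 70.0, 69.0, 50.0, 49.0]
print(round(pearson_r(thickness, hardness), 3), round(spearman_rho(thickness, hardness), 3))
```

In this hypothetical example, hardness decreases monotonically but not linearly with thickness, so the rank-based coefficient reaches −1 whilst Pearson’s r does not.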
For Shore OO values, the results were varied. Although the ICA, VN, and skin tissues demonstrated the expected inverse relationships between Shore OO values and tissue thickness, the null hypothesis (r = 0) could not be rejected. In fact, with the exception of the IJV, the null hypothesis could not be rejected for any of the correlation coefficients using the Shore OO scale, which means that no relationship between thickness and hardness could be demonstrated for those tissues. In addition to being significant, the IJV Shore OO results exhibited a positive correlation coefficient. This finding indicates that increases in tissue thickness are correlated with increases in hardness, which is contrary to the Shore A results.
3.3. Freehand vs. Stand Measurements
A total of 60 Shore OO hardness readings from the surface of a soft-embalmed liver were measured by a single rater. In total, 30 measurements were taken using a durometer mounted on a stand (mean = 50.80, IQR = 17.40), and 30 measurements were taken freehand (median = 8.55, IQR = 5.50) (
Figure 6). A paired Wilcoxon signed-rank test was conducted to evaluate whether there was a significant difference in Shore OO values between measurements taken with the durometer stand and freehand.
The Wilcoxon signed-rank test revealed a significant difference between the two methods of measurement (Z = 4.78,
p < 0.001), with a rank-based effect size of 0.62. These results are unsurprising given that the median difference between measurements with the stand and freehand was 38.20 Shore OO units.
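One common rank-based effect size for the Wilcoxon signed-rank test is r = |Z| / √N. The sketch below applies that formula; it is an assumption that this is the formula used here and that N is the total number of observations (60), although those inputs do reproduce the reported value when rounded to two decimals. The function name is hypothetical.

```python
import math


def rank_effect_size(z, n_observations):
    """Effect size r = |Z| / sqrt(N) for a rank-based test statistic."""
    return abs(z) / math.sqrt(n_observations)


# Z is taken from the text; N = 60 total readings is an assumption.
r_total = rank_effect_size(4.78, 60)
# Alternative convention: N = 30 pairs, shown for comparison only.
r_pairs = rank_effect_size(4.78, 30)
print(round(r_total, 2), round(r_pairs, 2))
```

The choice of N matters: using the number of pairs rather than the number of observations yields a noticeably larger value, so the convention should be stated alongside the effect size.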
Figure 7 illustrates how different the medians are between the two measurement methods. It is also evident, from
Figure 7, that the spread of the stand-mounted data is more even than that of the freehand measurements, illustrating the utility of the stand as a method of reducing measurement bias and random error.
4. Discussion
This study evaluated (1) the inter-rater reliability of Shore hardness measurements across various tissues, (2) the relationship between tissue thickness and Shore hardness, and (3) differences in measurements obtained using freehand versus mounted durometers. Overall, whilst the inter-rater reliability (ICC values) of clinicians using the Shore A scale was higher than that of the Shore OO scale, the Shore A ICC values were, at best, moderate. Few studies have assessed the inter-rater reliability of Shore hardness among clinicians; however, one relevant study was identified [
14]. Similar to this work, it reported mixed reliability and inconsistent correlations between tissue thickness and hardness [
14]. Specifically, ICC values ranged from 0.30 to 0.80 for Shore OO measurements taken by two raters on the plantar skin of 20 healthy adults [
14].
The findings of the previous study are noteworthy for three reasons. First, the absolute Shore hardness values reported are for fresh, living skin using the Shore OO scale (15–30 [median], 4–41 [min–max]) [
14] and are lower than the values reported in this study for embalmed cadaveric tissues (60.17 [median], 52.08–66.46 [min–max]). These discrepancies can likely be attributed to (1) the freehand measurement method used in the previous study [
14], (2) differences in skin thickness and region (0.8–1.1 mm in the previous study vs. 1.58 mm in this study), and (3) the use of embalmed tissues in this study.
Second, comparisons of inter-rater reliability between the two studies reveal critical insights. The previous study reported ICC values on plantar skin ranging from 0.30 to 0.80, with confidence interval (CI) values of 0.40 to 0.72 [
14]. From these results, four of the six plantar locations were classified as having good or excellent reliability [
14]. In contrast, this study reported Shore OO ICC values ranging from 0.00 to 0.40, with CI values of 0.16 to 0.81, across different tissue types in the neck. For the Shore A scale, ICC values ranged from 0.21 to 0.80, with CI values of 0.43 to 0.71. Despite some overlap in ICC ranges between the two studies, this study’s Shore A values were classified as moderate, underscoring how benchmark qualifications can influence the interpretation of results. This raises the question of whether prioritising ICC values near 1.00 with wide CIs is a defensible practice.
Third, the demographics of raters in the previous study were not explicitly reported, which has implications for interpreting ICC results. Rater demographics provide critical information about the generalisability of findings to the intended user population. In this study, the raters represent (1) a potential “floor value” for clinician reliability, as all participants were in the early stages of their medical careers, and (2) the demographic profile of training clinicians likely to use task trainers.
Overall, this study contributes to addressing research gaps in three main ways. First, it emphasises the importance of careful interpretation of inter-rater reliability results when benchmark cut-offs do not account for factors such as the number of subjects, raters, categories, or margin of error. Furthermore, detailed reporting of sample and rater characteristics is essential for ensuring study generalisability. Second, this study highlights the need for implementing or adapting measurement standards, such as ASTM D2240, when evaluating the mechanical properties of tissues. Finally, it provides insights into the impact of tissue thickness and freehand measurement techniques on hardness measurements, which are often used in work utilising Shore hardness as a clinical tool or in the development of tissue mimics for task trainers.
With respect to the results studying hardness and tissue thickness, Shore hardness measurements on the A scale exhibited an expected inverse correlation, consistent with previous studies [
22]. In contrast, no consistent relationship between thickness and hardness was observed on the Shore OO scale, suggesting that this scale may be inappropriate for measuring tissue hardness. However, the generalisability of this finding may be limited to tissues from the necks of embalmed cadavers.
Finally, this study underscores the limitations of freehand Shore hardness measurements. Without additional measures to standardise the applied force, freehand methods introduce significant variability, raising concerns about the validity of results. As shown in
Figure 7, Shore hardness measurements differed markedly between freehand and mounted durometer methods, both in median values and data distribution. Prior research has established that small variations in applied force can significantly affect Shore hardness readings; for example, a difference of 0.045 N (4.597 g of force) resulted in a 10.27-unit difference in Shore hardness [
23]. This highlights both the sensitivity of durometers and the importance of adhering to standardised methods to ensure reliable measurements.
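For a sense of scale, the force difference quoted above can be converted between newtons and gram-force using standard gravity (9.80665 m/s²). This is a minimal sketch with hypothetical function names; small differences from the quoted figures are expected because the quoted values are rounded.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2, conventional standard value


def newtons_to_gram_force(force_n):
    """Convert a force in newtons to gram-force."""
    return force_n / STANDARD_GRAVITY * 1000.0


def gram_force_to_newtons(force_gf):
    """Convert a force in gram-force to newtons."""
    return force_gf * STANDARD_GRAVITY / 1000.0


print(round(newtons_to_gram_force(0.045), 3))
print(round(gram_force_to_newtons(4.597), 4))
```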
4.1. Limitations
This study has several limitations that may have influenced the results and are primarily related to (1) the specimens used, (2) the Shore hardness methodology and equipment, and (3) the raters. The limitations associated with the specimens can be categorised as (1) macro-level limitations, including sample size, donor characteristics, and preservation, and (2) micro-level limitations, such as variations in factors affecting tissue quality (e.g., pathology, collagen content, and senescence).
At the macro level, power calculations indicated that the overall sample size (
n = 14) was suboptimal, which was evident in the results. For example, using Equation (
1), “reasonable” confidence interval (CI) values for the Shore A ICC results in this work are 0.28 (ICA), 0.30 (IJV), 0.19 (VN), 0.32 (skin), and 0.08 (SCM). However, the actual CI values for Shore A measurements ranged from 0.48 to 0.73, as shown in
Figure 4. For the Shore OO scale, the mismatch was even greater. To achieve reasonable CI values for the Shore A scale, sample sizes of 16–70 would be required. For the Shore OO scale, excluding the SCM, sample sizes would need to range from 30 to 170. Notably, no reasonable CI value could be calculated for the SCM on the Shore OO scale given the study results, highlighting that the study was likely underpowered.
In addition to the small sample size, donor variability likely contributed to the wide CIs. Human tissues exhibit considerable variability in mechanical properties due to structural protein content, age, sex, and underlying diseases. For instance, arterial calcification is a normal part of ageing and is exacerbated by conditions such as atherosclerosis and hyperlipidaemia [
24]. This introduces two major implications in our study: (1) variability in hardness may result from unaccounted disease states, and (2) the findings may not generalise to populations with different demographic profiles, particularly younger or healthier individuals. It should be noted that stratifying or controlling for vessel calcification was not feasible due to unavailable donor medical histories, lack of concurrent histopathological analyses, and the inevitable presence of calcification in our donor population.
Another factor impacting generalisability is that all tissues in this study were preserved. Previous work reported a mean Shore OO value of 41 for arterial tissues [
25], compared to this study’s mean of 54.88. This suggests that Shore OO may be more reliable for fresh or living tissues, whilst Shore A might be more suitable for preserved tissues. Future studies should explore the reliability of these scales for unpreserved tissues.
A further limitation was the relationship between tissue dimensions and durometer readings. During indentation, the durometer’s foot induces localised plastic deformation, which affects not only the material beneath the indentation but also the surrounding material because of the increases in strain-induced dislocation density [
26]. Testing standards account for such effects by prescribing minimum distances from edges of the tested material and material thickness. However, most of the tissue dimensions in this study did not meet these standards, potentially affecting both the precision and validity of the results. Future research should investigate how tissue dimensions impact hardness measurements and determine whether these effects significantly alter the required precision for mimic development.
The use of a digital durometer was another limitation. A post hoc investigation revealed potential digital drift in the two durometers used. However, it was too difficult to distinguish between the effects of drift and rater variability. For example, Shore A readings fluctuated and generally increased over time, but the raters did not take measurements of the same tissues on the same day to ensure blinding. Future studies should consider (1) whether analogue durometers provide better reliability than digital durometers, and (2) controlling for digital drift as much as possible.
Finally, there were limitations in the number and selection of raters. Only three raters were used for ICC calculations. Such a small number of raters may not fully capture the variability across different users and thus limits the generalisability of the inter-rater reliability results. A further limitation lies in the ICC calculation. A two-way random effects layout was employed, but it could be argued that the raters were not truly random representatives of the overall population collecting Shore hardness measurements. Future studies may want to consider the impacts of employing random effects versus mixed-effect layouts in inter-rater reliability studies using Shore hardness.
4.2. Future Considerations and Research
The limitations highlighted in the previous section underscore areas for improvement, particularly related to (1) sample size and characteristics, (2) accuracy and precision of digital durometers, and (3) the number and selection of raters.
Previous studies have investigated the use of Shore hardness durometers as a clinical tool due to their simplicity, non-destructive nature, and ease of use for freehand measurements [
9,
11,
13,
14]. However, this study serves as a cautionary tale, emphasising the importance of carefully considering the available standards, Shore scales, and the reliability of the employed methods. Future studies where hardness is a critical measure should investigate the effects of rater training, expertise, and environmental factors (e.g., clinical vs. research settings) on inter-rater reliability. Additionally, refined methods involving multiple raters and applied force measuring tools could provide more insight into the accuracy and precision of durometers as clinical tools when using a freehand approach.
Whilst some groundwork has been laid to relate different biomechanical properties to Shore hardness [
23,
27,
28], more work is needed. Future directions include exploring the relationships between Shore hardness and other mechanical properties, such as viscoelasticity or creep, through different types of loading (e.g., compression or shear testing). Including a broader variety of tissue types (e.g., cartilage, tendons, neurovascular tissue) and employing different Shore scales could provide deeper insights into the generalisability of Shore hardness. Furthermore, given that both mechanical properties and geometry of tissues are heterogeneous, standardisation efforts in collaboration with biomechanical and material science organisations would also help to ensure reproducible and reliable results in future studies.
The relationship between mechanical properties and tactile perception is another area that warrants further exploration. Establishing a deeper understanding of this relationship could enhance clinical assessment tools and training methodologies. Future studies using Shore hardness as a basis for mimicking tissue properties should incorporate functional testing that replicates clinical applications, such as needle puncture, suturing, incision force, and palpation. Such work could also expand to (1) determine whether Shore hardness is a reliable measure of haptic fidelity and (2) evaluate training outcomes (e.g., skill acquisition and error reduction) in procedural task trainers specifically designed to mimic the Shore hardness of tissues.
5. Conclusions
In conclusion, this study highlights the need for caution when using Shore hardness to select tissue mimics. Whilst it is an accessible and convenient method, the findings of this work demonstrate that Shore hardness measurements are influenced by the Shore scale, tissue type, and tissue thickness. Overall, clinician reliability using the Shore A scale was found to be moderate for vascular tissues and skin but poor for nervous and muscle tissues. For the Shore OO scale, reliability was consistently poor across all tissue types. These results, however, may not be generalisable due to the use of tissues from embalmed cadavers. Further investigations are needed to elucidate whether these findings are true for fresh tissues.
Despite this limitation, the study underscores the importance of standardised methods, such as ASTM D2240, and advocates for a multidisciplinary group to develop standards that address the challenges of measuring the Shore hardness of biological tissues. This work provides critical insights for clinicians considering the use of Shore hardness durometers in clinical settings or for the development of training tools. Future research should focus on (1) correlating functional tests (e.g., suturing, palpation) with Shore hardness, (2) evaluating Shore hardness as a measure of haptic fidelity, and (3) assessing the training outcomes of devices that utilise Shore hardness to replicate human tissues.