### **3. Limitations**

Like any tool, grimace scales have caveats and limitations. Pain and grimace scales take considerable time to develop [3,41,61,62,99]. FAUs can be species-specific, with each FAU requiring validation and, ideally, statistical modelling and weighting to determine its significance in the system [8]. This means the number of FAUs can vary amongst species, with mice having five FAUs [100], rats four [41], sheep three to five [70,82], lambs five [107], equines six [37], ferrets three [106], cattle three to four [68,85], rabbits five [45] and pigs and piglets three [91,114]. While some FAU movements appear to be tightly conserved (e.g., orbital tightening), others vary amongst species. These variations can be contradictory between species and may be due to the age of the animal [107,115] and/or the musculature of the face. The nose and philtrum tend to be the areas with greater variation amongst mammals [37,82]. For example, rats and rabbits [41,45] flatten their noses when in pain, while mice and ferrets bulge theirs [100,106]. Therefore, each species requires the development of its own precise facial or grimace scoring system. Currently, several commonly used mammalian research species either do not have grimace scales or have scales that are yet to be fully developed. These include hamsters, dogs, guinea pigs, and non-human primates. Further work is needed to develop and determine the validity of grimace scales in these species.

Pain expression and threshold levels can also vary slightly amongst breeds or strains [67,68]. To minimise these variations, baseline grimace scores need to be taken for every cohort daily for approximately three days before the start of an experiment or potentially painful stimulus [75]. False positives are known to occur in a small range of scenarios, such as sedation/anaesthesia, sleep [41,56,75,100], or bouts of aggression [32]. Therefore, grimace scales should not be used at those times. Additionally, facial variations may occur between individuals. As a result, absolute scores may be less important than a change in score of two points or more (i.e., trends) [133], and a 'trends-based' approach could be more useful. Grimace scales can also produce false negatives, with animals not demonstrating a pain face during a known painful procedure. For example, ear clipping in mice did not produce any change in grimace scores [57], and neither did experimentally induced gastrointestinal mucositis in rats [58]. There is also discussion about how long after a painful stimulus an animal may display a pain face and hence a grimace score. Early peer-reviewed publications questioned whether grimace scales are useful for more than 24 h after a painful event [59,100]. More recent studies have demonstrated that pain can be identified in animals via grimace scales for more than 24 h, and even more than 14 days, after a painful stimulus [55,82,108,120,128,129]. From the recent literature, it is clear this technique has applications beyond its initial use.
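The trends-based approach described above can be illustrated with a minimal sketch. It assumes that each animal's baseline is the mean of its daily pre-procedure scores and that scores are totals on a single numeric scale; the function name, the averaging step, and the example values are hypothetical and are not taken from the cited studies. Only the two-point threshold comes from the text [133].

```python
from statistics import mean


def flag_score_change(baseline_scores, current_score, threshold=2.0):
    """Compare one animal's current grimace score with its own baseline.

    baseline_scores: daily scores recorded before the procedure
                     (assumption: roughly three days of scores).
    current_score:   score recorded after the procedure.
    threshold:       minimum increase treated as a meaningful change
                     (two points, following the text; adjust as needed).
    Returns (change, flagged).
    """
    if not baseline_scores:
        raise ValueError("at least one baseline score is required")
    baseline = mean(baseline_scores)  # assumption: baseline = mean of daily scores
    change = current_score - baseline
    return change, change >= threshold


# Hypothetical example values, for illustration only.
change, flagged = flag_score_change([1, 0, 1], 3)
print(f"change from baseline: {change:.2f}, flagged: {flagged}")
```

Comparing each animal with its own baseline, rather than with a fixed cut-off, is one way to account for the individual and strain-level variation noted above. Such a flag would still need to be read alongside the animal's history and other welfare parameters.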

The history of the animal, the species, breed/strain, environmental context, procedures performed, and general parameters of wellbeing must be considered when using grimace scores [7,64,75,81]. Research is still needed to explore the use of grimace scales. Currently, not every grimace scale has been fully validated (ferret, piglet, lamb), and additional species may yet benefit from their development (e.g., goats or other small mammals). Preliminary work does suggest that guinea pigs do not appear to be good candidates for facial pain scales. These studies used behavioural ethograms which included elements of commonly and strongly conserved facial expressions (e.g., orbital tightening) and did not find any significant correlation of these expressions as indicators of pain [63,78,90]. It may be that grimace scales are not appropriate for this species, or that the FAUs associated with pain differ from those of other mammalian species. Many scales have only been used in specific settings or studies and need further work to determine whether they are affected by common agricultural or animal procedures, such as restraint in lambs [107] or piglets [115]. There is still variability in the available literature as to the length of time a grimace score can be detected in some species and studies [58,128], as well as its applicability [56], which should be further explored. While grimace scales have been developed and validated for several mammalian species, it is known that there are species-specific variations in the expression of pain faces (e.g., guinea pigs), which may make the development of a grimace scale unsuitable or require a different approach.

### **4. Application and Summary**

In an ideal situation, a single pain identification technique would be sufficient across all species and scenarios; however, this currently does not exist and may never exist, given that pain is an individual, multifactorial experience [61]. Nonetheless, the growing body of literature is demonstrating that, overall, pain faces in mammalian species are often expressed and can be identified during most procedures, pain types, and contexts. Most of the variation found when using grimace scores to identify and assess pain is in the strength of association, the magnitude of certainty and the consistency of grimace score expression. Even with these variations, grimace scales appear to be good at detecting pain in mammals [41,45,64,70,85,87,91,100–102,106,114]. However, a more standardised approach to studies and to the use of grimace scales may help to reduce minor confounding elements (e.g., handling, conspecifics) or to identify areas for improvement. Future studies and the day-to-day practical and experimental application of this technique would benefit from a formally validated and consistent training program, complete with video and photographic materials. A standard training program would be useful for grimace score users and has been useful for other pain scoring systems [38,46,86]. Part of the development, training and implementation of grimace scales could be enhanced by the use of various technologies, such as automated or semiautomated software for scale development and scoring via video surveillance [41,59,134,135]. These nascent technologies are often unfeasible due to cost, infrastructure constraints and a lack of development, but in the future they may play a greater role in grimace scoring systems.

The identification and mitigation of pain fulfil an essential and required aspect of refinement when working with animals in research. As yet, no single indicator or technique is considered sufficient for the identification and assessment of pain. Several peer-reviewed publications have advocated that multiple measures of animal welfare and pain should be employed to mitigate the potential negative effects of pain on animal welfare and research outcomes [3,4,9,15,21,61,64]. Using a combination of relevant retrospective and spontaneous techniques, applied on a case-by-case basis, can maximise the opportunity to detect and assess pain in research animals. It minimises the chance of pain going undetected and maximises the opportunity to preserve animal welfare and research outcomes. While there are known limitations, grimace scales are useful tools to at least identify potential indicators of pain [60]. The use of grimace scales with other parameters of pain and/or animal wellbeing is likely to increase the ability of research staff to identify and assess pain in mammals and offer appropriate humane interventions. At this time, grimace scales are a potentially promising and important pain identification tool; however, further work should be performed in a consistent manner to validate existing work as well as explore new applications to other species, conditions and experimental studies.

To achieve good animal welfare and research outcomes and meet legal and ethical obligations, it is paramount to utilise a consistent and accurate pain identification method. The use of a grimace score can assist in fulfilling these obligations by identifying pain and allowing a timely intervention via analgesia or humane endpoints. Grimace scales are thus proving to be a valuable tool with a myriad of applications. Their use can offer improvements in animal welfare and more robust animal research outcomes [9,64]. While grimace scales are not without limitations, there is a growing body of literature and evidence to suggest they can be a significantly useful adjunct in the detection and assessment of pain in a variety of species and research studies [7,35,41,66,76,100]. When used correctly by trained individuals along with an animal's history and basic wellbeing criteria, grimace scales can be a practical, accurate and easy method to identify pain in research animals and to provide refinements in experimental animal welfare and outcomes [38,61,75]. Future applications could focus on different types of experimental studies, new species, neonates, standardisation of training protocols, and correlation of multiple observations over time.
