1. Introduction
Behavioural problems (here encompassing behaviours that horse riders, owners and caregivers consider undesirable or dangerous) compromise rider and handler safety and can jeopardise horse welfare. Stereotypic and redirected behaviours have long been associated with suboptimal management practices [1]. However, non-stereotypic behavioural problems, including unwelcome responses in-hand and under-saddle, are often encountered by horse riders and handlers, with a reported 28% of the 200,000 horses in the United States being relinquished to rescue organisations for this reason [2]. Normando et al. [3] found links between riding style and reported behavioural problems, including stereotypies. Specifically, they found associations between English-style riding and restrictive stabling practices, also associated with more frequently reported locomotion stereotypies. Hockenhull and Creighton [4] found that 91% of UK horse owners reported ridden behavioural problems with their horses, among which shying was the most frequently cited behaviour. Hockenhull and Creighton [5] also identified risk factors associated with stable-related and handling behavioural problems when investigating horse management practices. Risk factors included confinement in stables, turn-out schedules, social isolation and the amount of time that owners spent with their horse each day. These researchers also revealed that training styles and equipment were risk factors for ridden behavioural problems, including failing to slow or jump and extreme conflict such as bucking, bolting and rearing [6].
Understanding how and why problem behaviours develop will advance safety and welfare. Horse owners and riders have an ethical obligation to be aware of how their training affects their horse because equitation largely relies on the appropriate use of pressure during negative reinforcement [
7], the application of aversive stimuli (usually pressure applied via the rider’s hands and legs) until the horse offers the desired response, at which point the pressure must be removed to reinforce conditioning of the correct response [
8]. The use of prolonged or excessive pressure is contraindicated because it leads to habituation (an outcome which horses are especially prone to) and the consequent need for more pressure in future [
9].
When problems manifest, handlers and riders often find themselves in a downward spiral of punishment and increasingly dangerous behaviour [
10,
11]. While ethical equitation employs combined reinforcement, riding the horse obliges the use of negative reinforcement with seat, leg and rein signals. Horses, as prey animals vulnerable to attack when injured or distressed, are easily habituated to aversive stimuli, becoming non-responsive [
12]. Handlers and riders can then resort to punishment when the horse fails to respond to applied stimuli.
In domestic animals, there is a triadic relationship between training, management and behaviour. For equids, when training and management flaws are left unresolved, the horse’s deteriorating behavioural responses can result in suboptimal care and welfare. For example, the horse that has been confined to a stable for an extended period is likely to express post-inhibitory rebound locomotory behaviour [13,14] in the form of bolting and bucking. This may unseat the rider and, as a result, prompt the use of inappropriate punishment [14]. While considerable research has focused on the training of horses [15,16,17,18,19,20,21,22,23,24,25,26] and management [3,4,27,28,29,30,31,32,33,34,35,36], a deeper understanding of how the triadic elements interact is now required to improve horse welfare and advance ethical equitation. By gathering baseline data on the domestic equine triad, researchers will be able to define what constitutes normal behaviour for horses and identify best-practice interventions according to an evidence base.
Horses vary widely in their responses to various types of standardised behavioural testing based on equine and equitation science [
37]. Furthermore, researchers using behavioural tests to assess temperament or reactivity often study cohorts of horses that have been previously exposed to diverse management and training regimes. These diverse regimes could contribute variably to the ontogeny of the behaviours observed [
9]. Small nuanced changes in behaviours can be reliably observed and recorded by animal-owners and caregivers [
38], and the use of online surveys to gather such data has transformed how research is conducted across many industries, from science to marketing [
39]. To define what constitutes normal equine behaviour, there is a clear need for an accessible, objective, standardised and validated data-collection tool, especially one that facilitates comparison between horse populations [
40].
There is a paucity of research on the ontogeny and epidemiology of behavioural problems in horses. Published studies in this domain have typically been geographically specific and have investigated particular behavioural attributes without the benefit of baseline behavioural data [
41], such as reports on how independent the horse appears, how it interacts with other horses and humans and how responsive it is to signals and cues. Many studies also suffer from small sample numbers, making generalisation across populations difficult. There are several reasons for this, the main one being that horses are large and expensive, thus costly to incorporate into experimental designs that impose uniform management on experimental subjects. For this reason, as has been done with other species [
42,
43], low-cost equine studies often gather data from horses in their home environment and harvest behavioural observations reported by their owners and caregivers.
To understand how and why behaviours, particularly those that can jeopardise rider or handler safety and horse welfare, develop and persist over time, large-scale data collection and analyses are required. To date, data collection in the area of horse behaviour has rarely taken a holistic approach that investigates training, management and behaviour at the same time, nor has longitudinal data collection been a priority.
To optimise questionnaire reliability, owners’ reports must focus on the frequency of observations, negating the need for participants to explain the behaviour or comment on the horse’s motivations for performing behaviours [41]. Historically, owner-sourced data on horse temperament often required respondents to interpret behaviour. For example, questions about how “anxious” a horse is in a given environment require owners to observe behaviours, decide which are relevant and, finally, score the behaviour without a reference point against which to benchmark its current frequency [44]. Individuals likely differ in how they interpret behaviour and in the assumptions they make about what motivates the horse to behave in a particular way [45,46], thus compromising survey reliability.
Such limitations on data collection led to the development of the Equine Behavior Assessment and Research Questionnaire (E-BARQ), a sister project to both the Canine Behavioral Assessment and Research Questionnaire (C-BARQ) [
43] for dogs and the Feline Behavioral Assessment and Research Questionnaire (Fe-BARQ) [
42] for cats. The C-BARQ, launched in 2005, has collected data on over 85,000 dogs and been used in more than 100 published studies. The Fe-BARQ has been collecting data on the behaviour of domestic cats since 2016 and currently has over 7000 cats reported through the database. E-BARQ provides participants with the opportunity to return to the questionnaire and update their responses, a feature that offers researchers longitudinal data. This is arguably a critical feature when investigating horse behaviour as horses change owners and disciplines more regularly than their canine or feline counterparts.
The interaction between training, management and behaviour has been labelled the domestic equine triad [41]. Providing researchers and the equine industry with a reliable, standardised and validated behavioural assessment tool that collates and processes their own data across the domestic equine triad will improve horse rider and handler safety, and horse welfare. However, the importance of demonstrating reliability cannot be overlooked. The tool must be shown to describe what it purports to measure, it must show consistency when repeated, and finally, it must allow different participants to score focal horses in a similar way. For the survey instrument to be widely accessible, relevant and reliable, the tests described in this article are required.
The aim of the current study was to validate the E-BARQ questionnaire as a reliable, standardised behavioural assessment tool for horses. To demonstrate the validity of the E-BARQ as a behavioural assessment instrument, we undertook three testing protocols [
47]. Construct validity, demonstrating that the E-BARQ was measuring what it set out to measure, was evaluated by comparing owners’ subjective assessment of their horses’ behaviour with the detailed scores obtained from the questionnaire. Inter-rater reliability was evaluated by comparing rider pairs’ scores for the same horse. Finally, intra-rater reliability was assessed by comparing the scores given by the same assessor to the same horse over time.
4. Discussion
The E-BARQ performed well in the construct validity testing. Owners reporting moderate or serious problems with their horses during the six months before taking the E-BARQ scored significantly worse than those reporting no problems or minor problems. For the ridden horse questionnaire (Figure 1 and Table 2), respondents reporting “no behavioural problems” gave fewer high scores for problem behaviours (where 1 = N/A, 2 = never and 6 = always). In contrast, with each of the three reported problem levels (mild, moderate and serious problems), the frequency of reported problems increased significantly (see Table 2). Those reporting no problems or mild behavioural problems therefore scored better on items relating to problem behaviours, which supports the construct validity of the E-BARQ questionnaire. Results for the non-ridden horse questionnaire (Figure 2 and Table 3) were similar: respondents reporting “no behavioural problems” gave fewer high scores for problem behaviours, and the frequency of reported problems increased significantly with each of the three reported problem levels (see Table 3).
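The construct validity comparison described above can be illustrated with a minimal sketch. It assumes each respondent supplies a self-reported problem level and a list of item scores on the 1 to 6 scale, and that N/A responses (coded 1) are dropped before averaging. The level names, the function names and the example data are hypothetical, and the sketch only checks whether group means rise with problem level; it does not reproduce the significance testing used in the study.

```python
from statistics import mean

# Owner-reported problem levels, ordered from least to most severe (labels assumed).
LEVELS = ["none", "mild", "moderate", "serious"]

def mean_frequency_by_level(records):
    """Pool item scores per reported problem level and return the mean of each pool.

    records: iterable of (level, item_scores) pairs, with scores on the
    1-6 scale where 1 = N/A, 2 = never and 6 = always.
    """
    pooled = {level: [] for level in LEVELS}
    for level, scores in records:
        # Assumption: N/A (coded 1) carries no frequency information, so drop it.
        pooled[level].extend(s for s in scores if s != 1)
    return {level: mean(vals) for level, vals in pooled.items() if vals}

def rises_with_level(means):
    """True if the mean score strictly increases from one level to the next."""
    ordered = [means[level] for level in LEVELS if level in means]
    return all(a < b for a, b in zip(ordered, ordered[1:]))

# Hypothetical responses, for illustration only.
records = [
    ("none", [2, 2, 3, 1]),
    ("mild", [2, 3, 3, 2]),
    ("moderate", [3, 4, 3, 1]),
    ("serious", [4, 5, 4, 3]),
]
means = mean_frequency_by_level(records)
print(means)
print(rises_with_level(means))
```

A rising pattern of this kind is only descriptive; whether the differences between levels are significant would still need a formal test on the real data.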
Horses reported to have only minor problems were by far the most common and it is unlikely that even those horses with reportedly severe problems would have poor scores in all areas of behaviour (see Supplementary File S1 for a full list of items). It is noted that this type of validation is not truly independent, because it relies on behavioural information supplied by the owner and not all owners are likely to perceive “problem behaviours” in the same way. Nevertheless, it is a highly relevant attribute to test for validity because behavioural problems are known to be the biggest risk to the welfare of the pleasure riding horse [4,52].
When interacting with almost all items within the E-BARQ, respondents report on the frequency of behavioural observations and no interpretation of behaviour is required. The current inter-rater reliability testing demonstrates that the E-BARQ encourages this style of reporting consistently, suggesting that E-BARQ results are generalisable to disparate populations of horses. Among the Cohen’s Kappa scores for the 215 items, 33 scored more than 0.860 (almost perfect agreement), 40 scored <0.646 (substantial agreement), 123 scored <0.416 (moderate agreement), five scored <0.4 (fair agreement), four scored 0.361 (slight agreement), and ten, relating to the horse showing anxiety when away from home, showed no agreement with a score of zero (see
Supplementary File S2 for a full list of question items and Kappa alignment scores).
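For readers unfamiliar with the statistic, the following is a minimal sketch of how an unweighted Cohen’s Kappa could be computed for a single item from two riders’ scores across the same set of horses. The function names and the example scores are hypothetical. The agreement bands use the widely cited Landis and Koch cut-offs, which is an assumption here and may not match the thresholds applied in the study, and treating a score of zero or below as “no agreement” simply follows the wording used above.

```python
import math
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of horses on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category
    return (observed - expected) / (1.0 - expected)

def agreement_band(kappa):
    """Label a kappa value using Landis and Koch style bands (assumed cut-offs)."""
    if math.isnan(kappa):
        return "undefined"
    if kappa <= 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical scores from two riders for one item across eight horses.
rider_1 = [2, 2, 3, 4, 6, 2, 3, 5]
rider_2 = [2, 3, 3, 4, 6, 2, 2, 5]
kappa = cohen_kappa(rider_1, rider_2)
print(round(kappa, 3), agreement_band(kappa))  # prints: 0.667 substantial
```

Because the E-BARQ scale is ordinal, a weighted Kappa that gives partial credit for near-misses would be an alternative; the unweighted form is shown here only because it is the simplest to follow.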
No agreement was found for the ten items relating to the horse showing signs of anxiety when away from home (κ = 0) (see Supplementary File S2). The behaviours that could be reported in this part of the E-BARQ included restlessness, pacing, vocalising, bucking, rearing and moving about or pulling back when tied. When respondents reported on similar behaviours observed in the same horse when separated from other horses at home, their scores showed moderate alignment (κ = 0.458). This may reflect an operator effect on behaviour, in that it supports the findings of other researchers [
53] that horses react differently when handlers and riders are anxious. Horses can be taken away from home for a variety of reasons, from high-level competition to pleasure riding outings, and these differences are likely reflected in the handlers’ interaction with the horse at the time. For example, a rider at a high-level competition may feel anxious themselves, changing the way they handle the horse. It may also reveal the limited experience that some participants could have had in taking the horse away from home and the different types of events attended, if they did not typically take the horse away from home. This effect, originating from the arousal level of the handler, may also apply when respondents were asked whether the horse will stand for veterinary and farrier procedures, in that only slight alignment was found. Veterinary visits can be stressful events for horse owners, particularly when the horse is unwell or requires invasive procedures, possibly making horses difficult to manage [
54]. Owners of horses that have not been appropriately trained to stand for the veterinarian or farrier may also alter their handling behaviour during farrier visits as a result of their expectations [
55]. Further investigations into equine manifestations of anxiety displayed at different events and venues, especially with different handlers, would be of interest.
The other notable finding here was the score of fair alignment when respondents were asked how quickly their horse learns. There were four items, including how quickly the horse learned with food rewards, positive reinforcement (other than food), pressure release and punishment/correction. With the exception of this question set, E-BARQ simply asks respondents to report on the frequency of behavioural observations. This more subjective question set was added because the authors recognised the influence that riders’ and handlers’ preconceived ideas can have on the horse–human relationship [
45,
46]. For example, if a rider believes the horse is being stubborn, rather than confused, they may become more reactive and thus more likely to punish or correct the horse. Further studies should explore how the interpretations of videos of a focal horse undertaking a task differ when viewed by riders and handlers with different beliefs about how well the horse learns.
Intra-rater reliability testing revealed very good agreement between ratings. Interestingly, none of the intra-rater associations were within the perfect or near perfect range, presumably because horse behaviour changes over time. However, of the 215 items assessed, only four scored a Cohen’s Kappa lower than 0.485 (moderate agreement). These four items, scoring only 0.262 (fair agreement [
50]), related to horses pulling or lagging behind when being led. It could be argued that this question set is slightly more subjective than some others in the E-BARQ questionnaire and, by asking the respondent to interpret “pulling” or “lagging”, may introduce inconsistencies within respondents. Furthermore, as leading is a frequent activity when handling horses, these particular items may be more subject to recall bias than others. For example, if a horse that does not normally pull on the lead suddenly does so one day, close to the respondent taking an E-BARQ, the respondent is likely to remember the recent unwelcome behaviour. Other behaviours, such as bucking or problematic loading onto a trailer, are less frequently encountered and, presumably, are rarely forgotten. Such behaviours are also more objectively assessed, not requiring the handler to interpret the horse’s behaviour. Further research, comparing larger numbers of E-BARQ results over longer periods, would be of interest in this area.
It is also of interest that the E-BARQ has, arguably, only four subjective items: the question set exploring how quickly the horse learns from combined reinforcement. This set scored 0.555 (moderate agreement) in the intra-rater reliability tests, whereas the same items scored 0.174 (slight agreement) when different riders assessed a focal horse. This finding underlines the need for questions to be objective and to avoid respondents having to interpret behaviour, as suggested by Fenner et al. [41].
The current study did have some limitations. Participation was voluntary and, as a result, the results may be exposed to inherent subject and response bias and could over-represent views of horse owners who engage with online platforms. Some measures of performance are open to individual interpretation and, as discussed, can introduce subjectivity into survey responses, for example, personal assessment of a horse’s response to the rider’s cues. Our results suggest that intra- and inter-rater reliability are good for almost all questions within the E-BARQ survey across two sampling periods and between respondents. However, inter- and intra-rater reliability testing with larger sample sizes and across different geo-cultural areas would be advantageous.