Animal welfare assessment is an essential tool for maintaining positive animal wellbeing. Validated welfare assessment protocols have been developed for farm, laboratory, zoo, and companion animals, including horses in managed care. However, wild and free-roaming equines have received relatively little attention, despite populations occurring worldwide. In the UK, free-roaming ponies inhabit areas of Exmoor, Dartmoor, and the New Forest in England, and Snowdonia National Park in Wales, amongst others. Visitors and local members of the public who encounter free-roaming ponies occasionally raise concerns about their welfare, as the ponies are not provided with additional food, water, or shelter. In this study, we evaluated the feasibility, reliability, and repeatability of welfare indicators that can be applied to a population of free-roaming Carneddau Mountain ponies, to address such concerns. Our findings indicate that many of the trialed indicators were successfully repeated and had good levels of inter-assessor reliability. Reliable and repeatable welfare indicators for free-roaming and semi-free-roaming ponies will enable population managers and conservation grazing schemes to manage the welfare of free-roaming horses and ponies.
1. Introduction
Knowledge of the welfare of animals under human care is integral to their successful management; equally important is an understanding of the welfare of free-living animals, to guide how we interact with wildlife and their habitats [1]. To gather the knowledge needed to improve animals’ welfare, a validated, reliable, and repeatable method of assessment is required [2,3]. Recently, welfare assessment has moved beyond resource-based indicators, or simple indicators of environmental parameters, to include indicators that monitor the behavioral responses and physiological condition of individual animals over time [4]. Animal-based indicators are particularly relevant to the welfare assessment of wild or free-ranging animals. Indicators based only on environmental parameters do not capture the animal’s behavioral or physical responses to those conditions and are therefore not representative of its welfare state [4]. An animal’s ability to adjust to both predictable and unpredictable changes in its environment is vital to maintaining welfare [5]. For example, seasonal changes in forage and grass availability may lead grazing animals to undergo periods of fasting (hunger) or seasonal weight gain (fat reserves) to cope with the resources available [4]. Although periods of hunger may be considered a welfare issue, they may not be a problem for the animal itself if its adaptive capacity (physical and mental abilities) has not been exceeded [4]. A multifactorial approach that combines animal-based (AB) indicators (physical/physiological outcomes) with resource-based (RB) indicators (what is available in the environment) enables the evaluator to quantify levels of individual welfare [6]. A range of welfare audit protocols evaluating RB and AB indicators have recently been developed for farm [7,8], companion [9], laboratory [10], and zoo animals [11,12,13,14]. In contrast, there are few protocols for extensively managed animals [3,15,16] or, indeed, free-living wild populations [1].
Populations of free-living and free-roaming horses are found throughout the world. Significant numbers occur in Australia, where there are estimated to be over 300,000 free-roaming brumbies [17]. Smaller populations of free-roaming horses occur in the US: just over 79,000 mustangs and 15,546 burros are managed by the United States Department of the Interior’s Bureau of Land Management (BLM), inhabiting approximately 31.5 million acres of land across 10 states in the Western United States [18]. In the UK, free-roaming ponies inhabit areas of Exmoor, Dartmoor, and the New Forest in England, and the Carneddau mountains in Wales, amongst others.
Public concern regarding the welfare of free-roaming equids has become more prevalent in recent years. Whilst increased public awareness of equine welfare, and demand for its improvement, are evident across various equine disciplines (e.g., sport horses, working horses, racehorses, and those kept for pleasure) [19,20], there is also concern regarding the welfare of feral populations. Equine stakeholders participating in a study by Horseman et al. [20] identified overbreeding, a lack of food in winter, and gatherings (rounding-up for health checks) as areas of welfare concern specific to free-roaming ponies. Public concern for the welfare of mustangs in the US has been particularly well documented, with many groups urging the Bureau of Land Management to cease all gatherings, removals, and contraceptive strategies [21]. Visitors and local members of the public who encounter free-ranging ponies in the Carneddau mountains also occasionally raise concerns about their welfare because they are not provided with additional food, water, or shelter (Carneddau Pony Society, personal communication, 3 November 2019). There is therefore a need for objective indicators of welfare in these populations at both the individual and group levels. Although several validated equine welfare assessment protocols exist for sport, pleasure, and working horses (e.g., [22,23,24,25,26]), there is currently only one audit available for free-roaming horses. It describes a 10-step protocol based on the Five Domains model that can serve as a template for welfare assessment in free-living terrestrial species, using Australia’s brumbies as an example [1]. Here, we therefore trialed specific welfare indicators to ascertain their feasibility, reliability, and repeatability for the welfare assessment of free-living horses, using free-roaming Carneddau Mountain ponies as an example population.
2. Preliminary Testing of Individual Indicators: BCS and Mobility
There was a good degree of reliability between the two observers in body condition scoring (BCS, modified Henneke score) of the 34 ponies tested in the preliminary phase: Cohen’s weighted kappa (κw) was 0.78 (95% CI: 0.78–0.78; Table 1). The reliability of mobility scoring on a 3-point scale was very good, with 100% agreement between the two observers across all 34 ponies and a κw of 1 (Table 1).
Table 1. Weighted kappa (κw) values and their 95% confidence intervals for all preliminary indicators tested for inter-observer reliability in a sample of 34 ponies. κw is 1 when there is perfect agreement and 0 when agreement is no better than chance. All κw values of 0.61 (good) and above are shown in bold.
Welfare Indicator | Percentage of Agreement | κw (95% CI) | Interpretation w/CI
BCS | 97% | 0.78 (0.78–0.78) | Good
Mobility | 100% | 1.0 (1.0–1.0) | Very good
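To illustrate how a κw value, its percentage of agreement, and its interpretation label relate to one another, the Python sketch below computes Cohen's weighted kappa and maps the result onto Altman's descriptive bands (poor ≤ 0.20, fair 0.21–0.40, moderate 0.41–0.60, good 0.61–0.80, very good 0.81–1.00). It is a minimal sketch under stated assumptions, not the analysis code used in this study: linear disagreement weights are assumed because the weighting scheme is not given here, scores are assumed to be integer-coded categories 0 to k-1, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of animals given the same score by both assessors."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int) -> Optional[float]:
    """Cohen's weighted kappa with linear disagreement weights (assumed scheme).

    a, b: paired scores from two assessors, coded as integers 0..k-1.
    Returns None when kappa is undefined, i.e. when the chance-expected
    weighted disagreement is zero (e.g. both assessors used one category only).
    """
    if k < 2 or len(a) != len(b) or not a:
        raise ValueError("need k >= 2 and two equal-length, non-empty score lists")
    n = len(a)
    joint = Counter(zip(a, b))          # observed counts O_ij
    row, col = Counter(a), Counter(b)   # marginal counts for each assessor
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)    # 0 on the diagonal, 1 for the most distant categories
            observed += w * joint[(i, j)]
            expected += w * row[i] * col[j] / n  # chance-expected count E_ij
    if expected == 0:
        return None
    return 1.0 - observed / expected


def altman_band(kappa: float) -> str:
    """Map a kappa value onto Altman's descriptive bands."""
    if kappa <= 0:
        return "No agreement"  # label the tables use for kappa = 0
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


if __name__ == "__main__":
    # Hypothetical 3-point scores (0-2) for 20 ponies: assessors differ on one pony.
    a = [2] * 19 + [1]
    b = [2] * 20
    kw = weighted_kappa(a, b, 3)
    print(percent_agreement(a, b), kw, altman_band(kw))  # 0.95 0.0 No agreement
```

The usage example shows why a high percentage of agreement can coexist with κw = 0: when almost every pony falls in one category, chance alone is expected to produce 95% agreement, so the observed 95% is no better than chance. When both assessors use a single category, the expected disagreement is zero and κw is undefined; the function returns None, which corresponds to entries marked NT.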
3. Inter-Assessor Reliability
In this study, several of the welfare indicators in the audit were not observed (alopecia of mane or tail, nasal discharge, and coughing); they were therefore excluded from the results because their reliability could not be determined. The Horse Grimace Scale (HGS) [27] was indicated only infrequently, because the prescribed criterion for conducting it required the observed pony to score zero in one or more of the following categories: mobility, ocular injury/discharge, wounds, fecal consistency, and BCS. Therefore, no conclusions about its reliability could be drawn. Hoof shape/condition assessment was indicated in only two instances, where mobility was impaired. Agreement between assessors in these instances was 100%, but not all specific categories were observed (A = overgrown and B = cracked/chipped), so reliability could not be tested. Fecal consistency data were collected opportunistically, and only 12 of the 35 ponies defecated. Assessors reached 100% agreement on fecal consistency, but all ponies had normal feces; the other categories (0 = watery and 1 = abnormal) were not observed, so reliability across all categories could not be confirmed. All assessors scored water quality as three (fresh spring forming a stream, pond, or lake) for n = 35 ponies. As categories 1 = no water detected and 2 = stagnant pool/puddle were not observed, this indicator could not be further tested for reliability. Ocular discharge and/or swelling were encountered only infrequently, and although all assessors agreed 100%, category 1 = discharge with an open eye was not observed. Environment ease of movement (people/bikes/dogs) had 100% agreement between all assessors; however, no ponies were scored in category 1 (high footfall). The same applied to wounds and swelling, where no ponies received a score of 0 = open wound involving deeper tissue/muscle (acute).
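The rule that decides when the HGS is carried out can be expressed as a simple check over a pony's indicator scores. The Python sketch below is a minimal illustration, not part of the published protocol: the indicator key names are hypothetical, unobserved indicators (for example, fecal consistency when a pony did not defecate) are represented as None and treated as non-triggering, and the hoof-check rule (mobility below 2 on the 0–2 scale, i.e. impaired mobility) is an assumption inferred from the description above.

```python
from typing import Mapping, Optional

# Indicators for which a score of 0 triggers the HGS (hypothetical key names).
HGS_TRIGGERS = ("mobility", "ocular", "wounds", "fecal_consistency", "bcs")


def hgs_required(scores: Mapping[str, Optional[int]]) -> bool:
    """True if any triggering indicator was scored 0; unobserved (None) never triggers."""
    return any(scores.get(name) == 0 for name in HGS_TRIGGERS)


def hoof_check_required(scores: Mapping[str, Optional[int]]) -> bool:
    """Assumed rule: assess hoof shape/condition when mobility is impaired (score 0 or 1)."""
    mobility = scores.get("mobility")
    return mobility is not None and mobility < 2


# Hypothetical pony: normal mobility, no defecation observed, BCS scored 0.
pony = {"mobility": 2, "ocular": 2, "wounds": 1, "fecal_consistency": None, "bcs": 0}
print(hgs_required(pony), hoof_check_required(pony))  # True False
```

Because both checks depend on a score of 0 (or impaired mobility) being recorded, a sample in which few ponies reach those scores yields few HGS or hoof assessments to compare, which is why their reliability could not be evaluated.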
The indicators that could be fully evaluated showed mixed reliability across assessors (Table 2). Assessor 1 (primary investigator) and Assessor 2 (equine surgeon), with n = 18 assessments, had very good agreement for five of the seven remaining indicators (BCS, ease of movement, social contact, and human approach), moderate agreement for comfort around resting, and poor agreement for thermal comfort. Assessors 1 and 3 (scientist), with n = 11 assessments, achieved good to very good reliability for social contact and human approach. However, for ease of movement, comfort around resting, and thermal comfort, both assessors assigned scores in only one category, so although they agreed 100%, reliability across all categories could not be assessed. Finally, Assessors 1 and 4 (scientist), with n = 6 assessments, had good to very good agreement for BCS, social contact, and human approach, whereas thermal environment and comfort around resting had kappa scores of 0, indicating agreement no better than chance. Indicators with 100% agreement between assessors were not investigated further; details are provided in Table 3.
Table 2. κw estimates and their 95% confidence intervals for welfare indicators in the prototype assessment that were tested for inter-assessor reliability between the primary investigator and each of the other assessors. Assessor 1 (A1, primary investigator) assessed all 35 ponies together with one of the other assessors (A2–A4). NT: data could not be tested because too few categories were used. All values of 0.61 (good) and above are shown in bold. κw is 1 when there is perfect agreement and 0 when agreement is no better than chance. Interpretation follows Altman [28].
Welfare Indicator | Assessor Pair (A1 and A2–A4) | Percentage of Agreement | κw (95% CI) | Interpretation w/CI
BCS | 1 and 2 | 100 | 1.0 (1.0–1.0) | Very good
BCS | 1 and 3 | 83 | 0.57 (−0.12–1.0) | Poor–Moderate
BCS | 1 and 4 | 100 | 1.0 (1.0–1.0) | Very good
Ease of movement (hazards) | 1 and 2 | 94 | 0.89 (0.67–1.0) | Good–Very good
Ease of movement (hazards) | 1 and 3 | 100 | - | NT
Ease of movement (hazards) | 1 and 4 | 100 | - | NT
Comfort around resting | 1 and 2 | 70 | 0.42 (0.07–0.76) | Poor–Moderate
Comfort around resting | 1 and 3 | 100 | - | NT
Comfort around resting | 1 and 4 | 72 | 0.0 | No agreement
Thermal environment and comfort | 1 and 2 | 61 | 0.18 (0.25–0.61) | Poor
Thermal environment and comfort | 1 and 3 | 100 | - | NT
Thermal environment and comfort | 1 and 4 | 81 | 0.0 | No agreement
Social contact | 1 and 2 | 100 | 1.0 (1.0–1.0) | Very good
Social contact | 1 and 3 | 100 | 1.0 (1.0–1.0) | Very good
Social contact | 1 and 4 | 90 | 0.65 (0.4–1.0) | Poor–Good
Human approach test | 1 and 2 | 80 | 0.83 (0.61–1.0) | Very good
Human approach test | 1 and 3 | 81 | 0.73 (0.32–1.0) | Fair–Good
Human approach test | 1 and 4 | 83 | 0.85 (0.63–1.0) | Good–Very good
Table 3. Welfare scores for indicators with 100% agreement, listed by assessor. These indicators were excluded from further analysis. Assessor 1 (A1, primary investigator) assessed n = 35 ponies along with one of the other assessors: A2 (n = 18), A3 (n = 11), and A4 (n = 6). Each cell gives the number of ponies in each score category, in the order shown in the Score column.
Welfare Indicator | Score | A1 | A2 | A1 | A3 | A1 | A4
Ocular discharge/swelling (score 0–2) | 0 / 1 / 2 | 1 / 0 / 17 | 1 / 0 / 17 | 0 / 0 / 6 | 0 / 0 / 6 | 0 / 0 / 11 | 0 / 0 / 11
Mobility (score 0–2) | 0 / 1 / 2 | 1 / 1 / 16 | 1 / 1 / 16 | 0 / 0 / 6 | 0 / 0 / 6 | 0 / 0 / 11 | 0 / 0 / 11
Skin/coat condition (head/body) (score 1–2) | 1 / 2 | 2 / 16 | 2 / 16 | 0 / 6 | 0 / 6 | 0 / 11 | 0 / 11
Reproductive status (NA, A = lactating, or B = not lactating) | NA / A / B | 9 / 2 / 7 | 9 / 2 / 7 | 3 / 1 / 2 | 3 / 1 / 2 | 5 / 2 / 4 | 5 / 2 / 4
4. Intra-Assessor Reliability (Test/Retest)
As in the inter-assessor trial, some welfare indicators were not observed during the test/retest phase (alopecia of mane or tail, nasal discharge, and coughing). The HGS and hoof condition assessments were not warranted because no ponies scored zero on any of the indicators that trigger the HGS. Additionally, there were no mobility scores of 0 = immobile or 1 = minor impairment during the test/retest phase. As in the inter-assessor reliability assessment, all ponies scored three for water quality and availability during the test/retest phase; therefore, not all categories could be tested for reliability, but repeatability was confirmed. For the remaining indicators, the primary investigator’s repeated observations of the 20 ponies showed mixed reliability. BCS, reproductive status, and skin condition all had a kappa estimate of 1.0 (very good agreement). Wounds and swelling had fair to good agreement, as did ease of movement (dogs/bikes/people). Social contact had a kappa score of 0; however, agreement was 95%, with 19 of the 20 ponies receiving the same score in the initial and repeated welfare assessments. Similarly, comfort around resting had 90% agreement but a kappa indicating no agreement beyond chance, with 18 of the 20 ponies receiving the same score in the test and retest phases. All other indicators attained moderate reliability (Table 4).
Table 4. Kappa estimates and their 95% confidence intervals for welfare indicators in the prototype assessment (test/retest reliability). All kappa values of 0.61 (good) and above are shown in bold. Interpretation follows Altman [28].
Welfare Indicator | Percentage of Agreement | κ (95% CI) | Interpretation w/CI
BCS | 100 | 1.0 (1.0–1.0) | Very good
Reproductive status | 100 | 1.0 (1.0–1.0) | Very good
Resting comfort | 90 | 0 | No agreement
Ease of movement (people and dogs) | 85 | 0.67 (0.33–0.99) | Fair–Good
Ease of movement (hazards) | 85 | 0.40 (0.11–0.87) | Poor–Moderate
Thermal environment | 85 | 0.48 (0.10–0.85) | Poor–Moderate
Skin condition (head, neck, body, and limbs) | 100 | 1.0 (1.0–1.0) | Very good
Wounds and swelling | 95 | 0.65 (0.25–1.0) | Fair–Good
Social contact | 95 | 0 (0–0) | No agreement
Human approach | 72 | 0.53 (0.19–0.87) | Poor–Moderate