1. Introduction
Poaching continues to fuel the illegal wildlife trade and is currently on the rise, making it a global conservation issue [1,2]. It drives species towards extinction, causes large reductions in species’ abundance, and has cascading consequences for economies, international security, and the natural world itself [3,4]. Evidence from the 2016 Great Elephant Census (GEC) showed that one African elephant is killed every 15 min, causing their numbers to dwindle rapidly [5]. One of the main drivers of this is the increasing demand for ivory and other illegal wildlife products, particularly in Asian countries, driven by the belief that products such as rhino horn carry significant status and by their use in traditional medicine. Rhino horn is currently valued on the black market in Vietnam at between USD 30,000 and 65,000 per kilogram [6,7,8].
At present, a number of anti-poaching techniques are employed in affected countries, such as ground ranger patrols, rhino de-horning operations, community and education projects, and schemes focused on enforcing illegal wildlife trade laws [4,9]. Whilst these strategies are crucial in reducing poaching, curbing the demand for ivory products is equally important to ensure the long-term effectiveness of anti-poaching methods [10]. More recently, drones have been considered as an addition to these methods due to their decreasing cost, their higher safety compared with manned aircraft and ranger patrols, and their flexibility to carry a variety of payloads, including high-resolution cameras operating at different wavelengths. Drones present an opportunity to survey and control larger areas than ground-based patrols, which is useful when ‘boots on the ground’ and other resources are limited [11,12,13].
Drones have been increasingly utilised as a conservation tool over the last few decades. They have been used successfully to estimate animal densities and distributions, to map land use, and to monitor wildlife and environmental health [14,15,16,17]. For example, Vermeulen et al. [18] used drones fitted with an RGB (red, green, and blue light) camera to successfully monitor and estimate the density of elephants (Loxodonta africana) in southern Burkina Faso, West Africa. RGB cameras capture images within the visible light spectrum, are generally more affordable than multispectral cameras, are found on the majority of consumer-grade drones, and also produce images with a higher resolution than multispectral cameras [13]. TIR (thermal infrared) cameras, on the other hand, operate by detecting thermal radiation emitted from objects, making them useful for detecting activity that occurs at night, such as poaching, and for events in which detection of heat sources is important [13,19]. For example, studies have demonstrated that drone-mounted thermal cameras can be used to study arboreal mammals: Kays et al. [20] studied mantled howler monkeys (Alouatta palliata) and black-handed spider monkeys (Ateles geoffroyi) with this method and found that thermal cameras were effective for observing troops moving amongst the dense canopy, particularly during the night and early morning. A similar study by Spaan et al. [21], also on spider monkeys (Ateles geoffroyi), found that in 83% of surveys drone-mounted thermal cameras obtained greater counts than ground surveys, which is thought to be due to the larger area covered by drones.
In the last few years, the proven success of drones in conservation and the increase in wildlife crime have sparked research into the use of drones to detect and reduce illegal activity such as poaching and illegal hunting [22,23,24]. However, there is a lack of studies investigating the factors that influence detection, which are instrumental for understanding the environmental situations in which poachers may elude detection as well as the technical attributes that aid successful detection. Hambrecht et al. [25] contributed to research on this topic, building on a study by Patterson et al. [12], who examined the effect of different variables on the detection of boreal caribou (Rangifer tarandus caribou). Hambrecht et al. [25] used similar variables but adapted the focus towards poacher detection. Several conclusions were drawn from this research: using a thermal camera significantly improved detection, whereas vegetation cover had a negative impact on detection during thermal flights. Time of day was not found to be a significant factor, although the results suggested that cooler times of day improved detection. During RGB flights, the contrast of test subjects against their surroundings (e.g., the colour of the t-shirt they were wearing), drone altitude, and canopy density all significantly affected detection.
Despite these conclusive results, there are a number of knowledge gaps which, if investigated, would provide protected area managers, NGOs, and other stakeholders with additional reliable scientific information, allowing them to make informed decisions about whether to allocate the time and resources needed to incorporate drones into anti-poaching operations and how to do so in a way that maximises success. One of the knowledge gaps identified was the effect of camera angle on poacher detection. Perroy et al. [26] conducted a study to determine how drone camera angle affected the aerial detection of an invasive understorey plant, miconia (Miconia calvescens), in Hawaii and found that an oblique angle significantly improved the detection rate. Based on this, the present study aimed to investigate whether the same effect could be found with poacher detection. Furthermore, Hambrecht et al. [25] did not find a significant effect of time of day on detection, which could be due to the small number of flights that were conducted. Other studies, such as one by Witczuk et al. [27], found that the timing of drone flights had a significant influence on detection probabilities depending on the camera type, suggesting that dawn, dusk, and night-time were the optimal times for obtaining sufficiently high-quality images with a thermal camera. In addition to these factors, this study aimed to investigate whether walking test subjects were more easily detected than stationary ones, particularly through TIR imaging, as walking test subjects are more easily differentiated from objects with a similar heat signature. This was also found in a study by Spaan et al. [21], who successfully used drone-mounted TIR cameras to detect spider monkeys (Ateles geoffroyi) amongst the dense canopy.
Finally, this study also tested the efficiency of a machine-learning model for automated detection, using trained deep learning neural networks as an alternative to the manual analysis of collected data [28]. Thus far, research has relied on data recorded on-board the drone for later manual analysis, which can be labour intensive and time consuming, particularly when studying animal abundance or distribution [13,17]. Automated detection methods have previously been used successfully to detect and track various species, including birds and domestic animals, and they are being investigated further as a potential means of detecting threats to wildlife, such as poaching and illegal logging, in real time [17,29,30,31]. For example, a study by Bondi, Fang et al. [3] explored the use of drones mounted with thermal cameras and an artificial intelligence (AI) application called SPOT (Systematic Poacher de-Tector) as a method for automatically detecting poachers in near real-time, incorporating offline training of the system and subsequent online detection. More recently, a two-part study by Burke et al. [32,33] evaluated the challenges faced in automatically detecting poachers with threshold algorithms, addressing the effects of environmental factors such as thermal radiation, flying altitude, and vegetation cover on the success of automated detection. A number of recommendations were made to overcome these challenges, which are discussed and evaluated later in this paper.
This study builds upon the research by Hambrecht et al. [25], investigating these knowledge gaps as well as collecting a larger amount of data to increase the statistical power of the results. It also aimed to add to existing knowledge on the use of automated detection to combat wildlife crime, a field that is still in its infancy. It was predicted that, provided the deep learning model was trained sufficiently, automated detection would prove to be at least as successful in detecting poachers as manual analysis. Various studies have found similar results, such as Seymour et al. [34], who used automated detection to survey two grey seal (Halichoerus grypus) breeding colonies in eastern Canada, where automated detection successfully identified 95–98% of human counts. In addition, it was hypothesised that, for both types of analysis, the variables having a significant effect on detection would be time of day, canopy density, and walking/stationary subjects.
2. Materials and Methods
2.1. Study Area and Flight Plan
The study took place at the Greater Mahale Ecosystem Research and Conservation (GMERC) field site in the Issa Valley, western Tanzania (latitude: −5.50, longitude: 30.56). The main vegetation type in this region is miombo woodland, dominated by the tree genera Brachystegia and Julbernardia, and the region is also characterised by a mosaic of other vegetation types such as riverine forest, swampland, and grassland [35,36]. The vegetation was dense and green because the study was conducted in early March 2020, towards the end of the rainy season.
An area of miombo woodland of approximately 30 × 30 m was chosen for its proximity to the field station and for its visual characteristics, as it offered a variety of canopy densities as well as open-canopy areas to use as take-off and landing zones. This was approximately the same area in which the study by Hambrecht et al. [25] was conducted, which also influenced the choice of site, as it offered the opportunity for standardisation and for building upon that research.
Within the study area, five different sequences of locations were selected each day. Each sequence consisted of 4 locations, marked with blue tape, representing open, low, medium, and high canopy densities. The sequences were changed each day to create a larger sample size; therefore, over the 7 days of data collection, a total of 35 different sequences were used, and the GNSS coordinates and canopy density of each location were recorded. A diagram of the study site and example sequences is shown in Appendix A. A total of 20 drone flights were conducted over the 7 days, with 3 flights planned per day at dawn (7:00), midday (13:00–13:30), and dusk (19:15). All dawn and dusk flights were conducted with a drone-mounted TIR camera, and all midday flights were conducted with a drone-mounted RGB camera. One midday flight did not proceed due to rain. The drone hovered at a consistent altitude of 50 m for all but 2 thermal flights (which were conducted at 70 m) and in approximately the same position above the study site for each flight.
2.2. Drones and Cameras
The drones used for this study were a DJI Mavic Enterprise with an RGB camera for the midday flights and a DJI Inspire 1 with a FLIR Zenmuse XT camera (focal length: 6.8 mm, resolution: 336 × 256) for the TIR flights. A TIR camera was not used at midday because of the thermal conditions at that particular time of day, as it is already known that high surrounding temperatures significantly hinder the chances of detection [20]. The cameras mounted on both drones recorded footage continuously throughout the flights, which lasted a maximum of 10 min. All flights were conducted by KD and SAW.
2.3. Canopy Density
The canopy density of each location was classified by first taking photos of the canopy using a Nikon Coolpix P520 camera with a NIKKOR lens (focal length: 4.3–180 mm) mounted onto a tripod set at a height of 1 m. The camera was mounted at a 90° angle so that the camera lens was facing directly upwards. The photos were then converted to a monochrome BMP format using Microsoft Paint.
Following this, the canopy densities were calculated in ImageJ by importing each photograph, obtaining the black pixel count from the histogram analysis, and converting this into a percentage of total pixels. The densities were then classified into open (0–25%), low (25–50%), medium (50–75%), and high (75–100%) canopy density categories.
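As an illustration, the same black-pixel percentage and category assignment can be reproduced in R; the sketch below is not the procedure used in this study (which relied on ImageJ) and assumes the 'magick' package, a 50% brightness threshold for 'black' pixels, and a hypothetical file name.

```r
# Illustrative sketch only: the study used ImageJ, but the same black-pixel
# percentage and category assignment can be reproduced with the 'magick'
# package. The 128 brightness threshold and file name are assumptions.
library(magick)

canopy_density <- function(path) {
  img  <- image_read(path)                                 # monochrome canopy photo
  gray <- as.integer(image_data(img, channels = "gray"))   # pixel brightness values, 0-255
  pct_black <- 100 * mean(gray < 128)                      # percentage of dark (canopy) pixels
  cut(pct_black,
      breaks = c(0, 25, 50, 75, 100),
      labels = c("open", "low", "medium", "high"),
      include.lowest = TRUE)
}

# e.g., canopy_density("location_A1_canopy.bmp")
```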
2.4. Stationary or Walking Test Subjects
Five volunteer test subjects were recruited for each flight, and each test subject was randomly assigned to one of the five sequences of locations. Beginning at the open canopy location, the test subjects were instructed to walk between the locations on command, remaining stationary at each location for 10 s. They then repeated the same routine in reverse, starting at the high canopy density location and finishing at the open canopy location (see Appendix A for a visual explanation). Ethical approval reference: 20/NSP/010.
2.5. Camera Angle
During each flight, the camera was first set at a 90° angle, during which time the test subjects walked from the open to the high canopy density location. The camera was then tilted to 45°, and the drone was moved slightly off-centre from the study site for the second half of the flight, during which the test subjects walked from the high to the open canopy density location. The camera angle was changed via the drone’s remote controller, which had a tablet attached giving a first-person view (FPV) from the camera as well as a readout of the camera angle, allowing the pilot to adjust it remotely when required.
2.6. Image Processing and Manual Analysis
The drone footage for each flight was recorded as one continuous video. Each video was split into sections representing the conditions of the flight. For example, one video was split into 14 smaller videos: 7 for the first half of the flight (90° camera angle) and 7 for the second half of the flight (45° camera angle). The 7 videos from each half represented the periods when the test subjects were stationary (4 different canopy densities) and when they were walking between points (3 point-to-point walks).
Each video was then converted into JPG images of the same resolution using an online conversion website: https://www.onlineconverter.com/ (accessed 15 July 202). Due to the high volume of images produced per video, these were condensed to 5 images per video, leaving 1400 images overall to be analysed. The images were then split amongst 5 voluntary analysts, all of whom had never seen the images before and had no previous knowledge of the research or experience conducting this type of analysis. Each analyst received 1 of the 5 images per video, meaning each individual was given 280 images in total to analyse. The images were presented in a random order and in controlled stages (i.e., 20 per day), and the analysts were not told how many test subjects were in each image; they simply reported the number of subjects they could see, which was recorded along with false positives and false negatives. In this study, false negatives were classed as subjects that were identifiable in images with a trained eye but were missed during analysis.
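A minimal sketch of this image-handling step is given below for illustration; it assumes a hypothetical file layout (one folder of exported JPG frames per clip under "frames/") and hypothetical object names rather than the actual files used in the study.

```r
# Hypothetical sketch of the frame-condensing and analyst-assignment step,
# assuming one folder of exported JPG frames per video clip under "frames/".
set.seed(42)
clips <- list.dirs("frames", recursive = FALSE)

assignments <- do.call(rbind, lapply(clips, function(clip) {
  frames <- list.files(clip, pattern = "\\.jpg$", full.names = TRUE)
  keep   <- sort(sample(frames, 5))              # condense each clip to 5 images
  data.frame(clip    = basename(clip),
             image   = keep,
             analyst = sample(1:5))              # deal one of the 5 images to each analyst
}))

# Present each analyst's 280 images in a random order
assignments <- assignments[order(assignments$analyst, runif(nrow(assignments))), ]
```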
2.7. Automated Detection Software
Prior to the study, a machine-learning model was trained using a Faster Region-based Convolutional Neural Network (Faster-RCNN) and transfer learning [37]. The model was trained by tagging approximately 6000 aerial-view TIR and RGB images of people, cars, and African animal species (elephants, rhinos, etc.) via the framework www.conservationai.co.uk (accessed 9 April 2020) using the Visual Object Tagging Tool (VoTT) version 1.7.0. In order to classify objects within new images, the deep neural network extracts and ‘learns’ various parameters from these labelled images [28,38].
Following the training of the model, the 1400 drone images used for manual analysis were uploaded into the model for testing. The developed algorithm then analysed the characteristics within each image, comparing them with previously tagged images, so that positive identifications of test subjects could be automatically labelled, giving the results of the automated detection [29].
2.8. Rock Density vs. False Positives
In addition to the core analysis of this study, the data were analysed further to establish whether more false positives occurred in automated detection images with a higher rock density. All 1400 images were split into three categories of rock density: low (0–40% ground cover), medium (40–70%), and high (>70%). The number of false positives in each image was recorded and, because the number of images per category differed, the total number of detections was also recorded in order to calculate a percentage of false positives. To statistically compare the three categories, a three-proportion Z-test was conducted.
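For illustration, a three-proportion comparison of this kind can be run in R with prop.test(), which performs the chi-square test of equal proportions equivalent to a multi-group Z-test; the counts below are placeholders, not the study's values.

```r
# Placeholder counts for illustration only (not the study's values).
false_pos  <- c(low = 12, medium = 31, high = 58)     # false positives per rock-density category
detections <- c(low = 410, medium = 486, high = 504)  # total automated detections per category

# Chi-square test of equal proportions across the three categories
prop.test(false_pos, detections)
```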
2.9. Statistical Analysis
All statistical analyses were carried out in R Studio using the glm2, MuMIn, and ggplot2 packages [39,40]. The statistical analyses described below were repeated for both the manual and the automated detection data. Any data entries that contained missing values were removed from the data set (15 entries out of 7001 were excluded). The data set was split into two separate base data sets, representing TIR flight data and RGB flight data. The variables used for statistical analysis are shown in Table 1. Because the test subjects transitioned between canopy density classes when walking from point to point, 6 additional canopy density levels were added to the ‘open’, ‘low’, ‘med’, and ‘high’ levels used for stationary subjects. These represented canopy density for walking subjects at a 90° camera angle (open-low, low-med, med-high) and at a 45° camera angle (high-med, med-low, low-open). The ‘open’ canopy density category was used as the reference level for all canopy density analyses in R, using the relevel() function [41]. Time of day was not included in the analysis of the RGB data, as RGB flights were only conducted at midday. Analyst was not included as a random intercept due to the controlled environment in which the analysis took place and the fact that all analysts had no prior experience.
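The data preparation described above can be sketched in R as follows; the file, object, and column names (e.g., 'detections', 'camera', 'canopy') are assumptions made for illustration rather than the variable names used in the study.

```r
# Data-preparation sketch; file, object, and column names are illustrative.
library(glm2)
library(MuMIn)
library(ggplot2)

detections <- read.csv("detection_data.csv")      # hypothetical export of the detection records
detections <- na.omit(detections)                 # drop entries containing missing values
tir_data   <- subset(detections, camera == "TIR")
rgb_data   <- subset(detections, camera == "RGB")

# Set 'open' as the reference level for the canopy density factor
tir_data$canopy <- relevel(factor(tir_data$canopy), ref = "open")
rgb_data$canopy <- relevel(factor(rgb_data$canopy), ref = "open")
```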
As the response variable was binary (i.e., detected (1)/not detected (0)), a global generalised linear model with a logit link function was created for both the RGB and the thermal data [42,43]. This was done using the glm2() function of the glm2 package [40]. Following the methods described by Grueber et al. [44], sub-models were created for both global models using the dredge() function from the MuMIn package [40]. This produced a list of models with every possible combination of predictor variables, along with the Akaike Information Criterion corrected for small samples (AICc), the Akaike weight, the log-likelihood (LogLik), and the delta value. For both manual and automated analysis, a total of 8 sub-models were produced for the RGB data and 16 sub-models for the TIR data. The AICc and the weight allowed the sub-models to be compared, as a lower AICc value describes a better fit to the data and a higher Akaike weight indicates a more parsimonious fit overall [12]. To select the sub-models with the best fit to the data, the get.models() function from the MuMIn package was used with a cut-off of 2 AICc. This function ranks the sub-models by their AICc values and Akaike weights [44,45]. The best-fitting model was then tested using a generalised linear model with a logit link function, producing beta coefficient estimates and a p-value derived from a Wald chi-square test for each predictor variable [46]. The 95% confidence intervals were also calculated for each variable in the best-fitting model. RGB and TIR detection data were then compared using a Wald chi-square test.
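A sketch of this model-selection workflow is shown below for the TIR data, continuing from the data-preparation sketch above; the predictor names are illustrative assumptions, and the steps mirror the procedure described rather than reproducing the exact scripts used.

```r
# Model-selection sketch for the TIR data; predictor names are assumptions.
# dredge() requires na.action = na.fail on the global model.
global_tir <- glm2(detected ~ canopy + time_of_day + movement + camera_angle,
                   family = binomial(link = "logit"),
                   data = tir_data, na.action = na.fail)

sub_models <- dredge(global_tir)                          # every predictor combination, ranked by AICc
best       <- get.models(sub_models, subset = delta < 2)  # sub-models within 2 AICc of the top model

summary(best[[1]])   # beta coefficient estimates and Wald z-tests for the best-fitting model
confint(best[[1]])   # 95% confidence intervals
```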
For further analysis, camera angle data were incorporated into the canopy density analysis for both camera types, to test whether detection probabilities for varying canopy densities differed between the two camera angles.