Phase-Type Distributions of Animal Trajectories with Random Walks

Vera-Amaro, Rodolfo; Rivero-Ángeles, Mario E.; Luviano-Juárez, Alberto

doi:10.3390/math11173671


by Rodolfo Vera-Amaro ¹,*, Mario E. Rivero-Ángeles ² and Alberto Luviano-Juárez ³

¹ Academia de Telemática-UPIITA, Instituto Politécnico Nacional, Av. IPN 2580, Col. Barrio la Laguna Ticomán, Ciudad de Mexico 07740, Mexico

² CIC, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Ciudad de Mexico 07740, Mexico

³ SEPI-UPIITA, Instituto Politécnico Nacional, Av. IPN 2580, Col. Barrio la Laguna Ticomán, Ciudad de Mexico 07740, Mexico

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(17), 3671; https://doi.org/10.3390/math11173671

Submission received: 20 July 2023 / Revised: 15 August 2023 / Accepted: 22 August 2023 / Published: 25 August 2023

(This article belongs to the Special Issue Latest Advances in Random Walks Dating Back to One Hundred Years)


Abstract:

Animal monitoring systems often rely on expensive and challenging GPS-based systems to obtain accurate trajectories. However, an alternative approach is to generate synthetic trajectories that exhibit similar statistical properties to real trajectories. These synthetic trajectories can be used effectively in the design of surveillance systems such as wireless sensor networks and drone-based techniques, which aid in data collection and the delineation of areas for animal conservation and reintroduction efforts. In this study, we propose a data generation method that utilizes simple phase-type distributions to produce synthetic animal trajectories. By employing probability distribution functions based on the exponential distribution, we achieve highly accurate approximations of the movement patterns of four distinct animal species. This approach significantly reduces processing time and complexity. The research primarily focuses on generating animal trajectories for four endangered species, comprising two terrestrial and two flying species, in order to demonstrate the efficacy of the proposed method.

Keywords:

random walk; animal monitoring; animal trajectory generation; phase-type distributions

MSC:

60-08; 60-11

1. Introduction

In recent years, animal movement monitoring has been of great importance in various scientific areas, such as population control, endangered species monitoring, pest control, and biological diversity. Additionally, studies on the ecological impact of animals on specific regions, humans’ effects on animal behavior, and virus dissemination among different species, among others [1,2,3,4,5], require the use of animal movement observations.

Without reliable animal movement patterns, many of the aforementioned applications cannot be effectively implemented. For instance, sensor nodes may be placed in areas where animals do not cross, or cameras may be positioned in areas with low animal traffic, limiting the surveillance capabilities [6,7].

Animals can be monitored by collecting specific information, such as vital signs (temperature, blood pressure), DNA, body mass, or GPS coordinates, depending on the requirements of the application [8,9]. The most common methods for collecting this information include direct observation of species, the use of radar, GPS systems, thermal cameras, capture–mark–recapture techniques, and attaching or implanting monitoring devices (e.g., collars, leg bands, backpacks) to animals [10]. The key advantage of these methods lies in their high accuracy, primarily stemming from their reliance on human observations. However, a significant drawback lies in their invasiveness, potentially leading to challenges due to the inherent difficulty and risks associated with accessing the natural habitats of the species.

Remote methods can also be employed, such as utilizing wireless sensor networks (WSNs) to detect animal positions and extract biological parameters using biological sensors. The collected data from these devices can later be retrieved through ground and air vehicles such as trucks, remote-controlled cars, helicopters, satellites, or unmanned aerial vehicles (UAVs), as described in [11,12], or a combination of these techniques. These methods offer the advantage of being non-invasive to animals, eliminating the need for a human presence on site or, at the very least, minimizing their proximity. However, a significant disadvantage lies in their susceptibility to reduced accuracy in animal monitoring, encompassing aspects such as animal counts and positions. This reduced accuracy stems from their reliance on sensors, such as camera traps, sound microphones, radio transceivers, and even artificial intelligence software designed for autonomous species detection.

For example, consider a system where WSNs make use of drones to efficiently collect animal information. Figure 1 illustrates a clustered WSN formed by cluster heads ($CHs$) that collect data from their cluster members ($CMs$). The $CMs$ store information from sensors attached to moving animals whenever they pass by the static nodes.

Then, the UAV flies to the main cluster head ($CH_{sink}$), which has previously gathered all the data from the other $CHs$, collects the information, and finally returns to its base station (BS). This approach reduces long-range transmissions from energy-constrained nodes, effectively increasing the system’s lifetime.

These methods have varying impacts on the animal environment and present different challenges in collecting information from the animals. For instance, the Arctic long-tailed duck, a newly considered endangered species [13], needs to be remotely monitored to obtain its geolocation data. This information is crucial for understanding the regions it visits and ensuring its safety from predators, hunters, or polluted environments.

Many of these methods face difficulties, including dangerous environments for both humans and animals. Local cameras or radars need to be strategically installed, and data collection must take place on site, with the possibility that the animal may not pass within the transceiver’s coverage. Animals often need to be captured to attach uncomfortable monitoring devices before being reintroduced into their natural habitat, causing stress to the animals and potential risks to humans, as observed in the case of ocelots in Barro Colorado [14].

Commonly, WSNs are used for monitoring without the need for attached sensors, particularly in large areas. However, factors such as energy consumption of the nodes and other parameters must be taken into account. The use of large vehicles such as trucks or helicopters incurs high costs and requires highly specialized human intervention [15,16,17]. Moreover, the natural habitats of animals are likely to be disturbed. Methods such as remote-controlled aerial or terrestrial small vehicles offer reduced monetary costs for data collection, but the observation area may be limited or not accessible, and the lifetime of small unmanned vehicles is also restricted.

Given these considerations, it is crucial to know the animal trajectories in advance before installing, building, or acquiring the necessary devices, tools, or programs to optimize the monitoring system’s performance. For example, if a group of scientists is studying mangabey monkeys’ migration in the jungle [18], which spans large areas, they can attach sensors to the animals that will be detected by a WSN consisting of static sensors randomly placed on the ground. This information allows them to predict the zone through which the animals will pass, even if it is an approximation. Consequently, they can position the sensors in a way that minimizes energy consumption [19,20]. However, obtaining real animal trajectories is a challenging task since only a small number of animals are being tracked, and even fewer are reported in the literature. Therefore, it is often not feasible to consider thousands of different trajectories for Monte Carlo simulations or other realistic simulators such as the NS-3 network simulator software [21,22]. In this regard, the proposed methodology aims to generate as many virtual animal traces [23,24,25] as required by the research team to conduct statistically accurate simulations.

The performance and accuracy of the monitoring methods described above highly depend on certain characteristics of the monitored animals, such as animal movement statistics [26]. Factors such as the duration of animal inactivity, walking patterns, and the chosen direction of movement directly influence the probability of animal detection, i.e., the likelihood of the animal being within or outside the coverage radius of a node in a WSN or camera. Therefore, it is essential to develop accurate random walks that closely resemble the movement of each animal.

It is advisable for any animal tracking system to consider the mobility statistics of each specific animal [27,28]. For illustrative purposes, four different animals are considered: African mangabey monkeys, ocelots of Barro Colorado, bats of Ghana, and Arctic long-tailed ducks. These animals are chosen for the following reasons:

The mangabey, a rare terrestrial monkey, is classified as endangered on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (RLTS). A recent assessment by the IUCN indicates that they are a step closer to extinction. The primate was known to inhabit specific sites in western Ghana, eastern Cote d’Ivoire, and southern Burkina Faso, but it was recently discovered in the Atewa Forest [29].
The ocelot occupies a wide range of habitats, from mangroves to high-altitude cloud forests, but it is commonly associated with areas of thick vegetation. Classified as “concern” by the IUCN, the ocelot is protected in most countries within its distribution range. Hunting of this species is prohibited in many countries in North, Central, and South America. The ocelot has been exploited in the wild by the pet trade, often involving killing the mother to obtain the kittens. The population of ocelots declined significantly from the 1960s to the 1980s due to the extensive fur trade, with over 566,000 ocelot pelts officially sold. Implementation of protection measures in 1989, including import bans on all spotted cat species, slowed down the trade [14].
Ghana is home to approximately 32% of bat species found on the African continent, making it a global hotspot for bat conservation. There are about 86 different species of bats in Ghana, nearly twice the number found in the whole of Europe. The IUCN classifies 45 African bats as near-threatened, vulnerable, endangered, or critically endangered, some of which are found in Ghana. Factors threatening their survival include habitat degradation due to human activities, loss of foraging grounds and roosting places, extensive use of agrochemicals that kill insect prey, hunting, climate change, and indiscriminate killing due to superstitious beliefs [30].
Studies on Arctic animal species, such as the long-tailed duck, indicate that it is considered a vulnerable species. The global population has declined by at least 30% over three generations (1993–2020) [13].

Therefore, it is important to maintain a reliable record of these endangered species using appropriate technologies. This work serves as a general tool to characterize the movement of endangered species with specific types of movement and environments. The study includes both terrestrial and aerial species to evaluate the accuracy of the proposed method.

The main contributions of this work are as follows:

A statistical analysis using the random walk (RW) model for endangered animals in large-scale areas. The procedure is based on real animal trajectory data.
Phase-type distributions to characterize animal movement. The memoryless property and Markovian analysis are used to simplify the problem of generating virtual trajectories. Specifically, the study focuses on hyper-exponential, Erlang, and exponential distributions, which can describe the movement of the animals as demonstrated by comparing the real and simulated traces.
Based on this characterization, clear guidelines are provided for generating virtual traces with the same statistical characteristics as the real traces, as required.

This article is organized as follows: Section 2 discusses relevant related works. Section 3 presents the system model of the trajectory for specific endangered species, together with the case study and its associated assumptions. Section 4 presents the design and development of the simulation and the validation of the results using random routes for each of the selected species: African mangabey monkey, ocelot or mottled leopard from Barro Colorado, bat from Ghana, and Arctic long-tailed duck. Numerical results and comparisons for different cases are presented in Section 5. Finally, conclusions and future work are provided.

2. Related Work

In [31], the authors propose a composite stochastic process where the periods of active dispersal of animals alternate with periods of passivity. They derive a general equation that determines the probability density function (PDF) of this movement process. The equation is analyzed in detail for two important cases: Brownian motion and Lévy flight, described by Gaussian and Cauchy distributions, respectively. This work considers an animal moving in an idealized, uniform, and stationary environment. However, in large areas such as ours, a uniform environment may be inadequate. The authors suggest that this model may work well at small and intermediate scales, such as a small rodent foraging in a large crop. They also calculate the expected probability density function of the walker’s position, which provides a global description of the composite movement but not for a specific animal.

The work in [23] reviews the statistical models used to analyze individual animal movement data, considering discrete-time hidden Markov models (HMM). These models are central components for researchers working with the random walk of individual animals, as they are more accessible than complex models that ecologists may not have the resources to implement or that deviate from usual statistical practices. Simple analysis techniques developed within the ecological community often overlook essential properties of animal trajectories, leading to erroneous conclusions about animal random walks. The authors propose HMM models as a middle ground between these approaches. However, the HMM framework is suitable only when animal positions are monitored at regular intervals. If the sampling protocol varies or observations are made at random times, the HMM may not be suitable.

New approaches are emerging, such as the one presented in [32], where a research group addresses network issues such as efficient energy consumption, network lifetime, coverage, and communication link disconnection among nodes. They propose an efficient data collection method called location-based clustering and opportunistic geographic routing (LCOGR), which ensures stable connectivity and complete coverage of the sensing area. This work utilizes the NS-2 commercial network simulator and does not focus on any specific species.

Moreover, recent research suggests that the concept of random walk has been applied in the realm of machine learning algorithms [33]. While parametric models, such as the one we propose, are commonly utilized to comprehend the influence of environmental factors on movement behavior and to forecast animal movement patterns, it is noteworthy that machine learning and deep learning algorithms, known for their potency and adaptability in predictive modeling, have been infrequently employed with animal movement data. Studies such as [34,35] have successfully formulated models for predicting animal random walks, attaining significant precision in forecasting animal movement patterns. However, the notable drawbacks associated with these algorithms are their high computational demands and reliance on data quality. In scenarios involving species’ natural habitats, achieving the required data quality can prove to be exceptionally challenging. The authors in [36,37,38] developed novel models based on machine and deep learning algorithms. These algorithms are designed for diverse applications, ranging from the internet of things (IoT) to monitoring and profiling users’ daily activities, and even influencing decision making in internet routing protocols. Importantly, these advancements have the potential to be adapted and integrated into the realm of animal random walk processes.

A novel study discussed in [39] introduces innovative fast-estimation tools. The authors show that a mixed-effects model within a simple random walk movement process can infer behavior relationships found in environmental movement based on individual variability. They demonstrate this approach using southern elephant seal telemetry data. However, it remains an open question whether this model can be applied to different species.

One of the works most similar to the proposed method is [24], where three types of sheep movement (foraging, resting, and moving) are distinguished. For each animal, the authors quantify an individual movement path. By selecting a set of specific movement parameters, they develop a method to define movement states reflected in the movement parameters. Their proposed method is validated with field observations of movement behavior, and it is found that, on average, this method accurately matches the observational data. The authors also use the Kolmogorov–Smirnov test to compare the real trajectories of four sheep with the “virtual” ones obtained from their proposed model, considering rest, walk, and foraging behaviors.

The work in [25] focuses on representing the home range size and movement patterns of king cobras using a method based on dynamic Brownian bridge movement models (dBBMMs). However, the authors consider only this specific species. Their method is suitable for researchers studying mobile reptile hunting species with well-sampled animals. While their dBBMMs method may not work perfectly with certain telemetry datasets, it can still reveal patterns that are important for conservation and management priorities.

Finally, in [40], the author develops simulation software for 3-dimensional random walks of white storks. They use three main parameters to describe the movement: (a) the turn–lift–step combination, which describes the turning angle $t_{ang}$, (b) the lift angle $l_{ang}$, and (c) the step length $d_{step}$. The author also calculates auto-differences, which describe the dependency of steps on previous ones, and height distributions, which represent vertical limits.

Contrary to the aforementioned related works, this proposal focuses on studying the real trajectories of four different species by obtaining three random variables that characterize their movements: rest, walk, and turning angle. The probability density functions and cumulative distribution functions for each specific animal are then compared with three different phase-type distributions using the chi-squared and Kolmogorov–Smirnov goodness of fit tests. Although different species are studied, it is possible to compare the precision of the model in [24] with our approach in terms of animal movement: rest and walk.

In this work, for practical purposes, the proposal is focused only on 2-dimensional movements (residence, moving and turning angle states), which are considered sufficient for generating virtual traces of animals for monitoring purposes. The most commonly utilized random walk parameters for predicting animal trajectories include speed, step length (measured in distance or time) during rest or movement, and angle [41]. Nevertheless, there exist additional metrics that can be statistically examined, such as persistence velocity [42], mean squared displacement [43], and first passage time [44]. While these metrics offer valuable insights, basic movement parameters such as resting and movement points, velocity, and turning angle remain pivotal for efficiently describing and analyzing movement paths [42,45] and also reduce the model complexity.

3. System Model

In this section, we provide a detailed description of the animal trajectory model along with the assumptions made. We consider a realistic scenario where animals move with random trajectories that are characterized and approximated using phase-type distributions.

Multiple virtual trajectories will be generated, possessing the same (or similar) statistical properties as the real animal paths. These virtual traces allow for the evaluation of the monitoring system performance without the need to collect large amounts of scarce and difficult-to-obtain data.

To begin, we identify the region where the animal of interest naturally moves, denoted as $A_{obs}$. Real traces of the four animals considered in this study are obtained from available studies in the Movebank database [18,32,46,47,48]. These traces are depicted in Figure 2.

Using the real traces, we compare the statistical characteristics of the virtual traces generated by the proposed random walk model. If the virtual traces demonstrate similar characteristics, they can serve as alternatives to the real traces.

Based on the available data, we analyze different observation areas, $A_{obs}$, depending on the animals’ movement patterns. Specifically, an area ranging from 20,000 × 20,000 m to 200,000 × 200,000 m is selected for the monkey, ocelot, and bat. For the long-tailed duck, an area ranging from 1,000,000 × 1,000,000 m to 9,000,000 × 9,000,000 m is chosen, as this species does not venture beyond this range in its natural habitat [49].

Individual virtual trajectories of each animal within the aforementioned areas are illustrated in Figure 3a–d. The black line represents the virtual path, while the blue and red dots represent the trajectory’s start and end points, respectively.

The procedure is depicted in the flow diagram shown in Figure 4, starting with the selection of a species’ real trajectory available in the database. Random variables (RVs) and the coefficient of variation ($CoV$) are calculated based on the real trajectories of each animal. Suitable phase distributions are selected for each RV according to their $CoV$. Goodness of fit tests are performed to determine if the RVs follow the proposed phase distributions. If the tests yield positive results, the phase distributions for each RV are accepted and can be used to simulate and validate the movement patterns for each animal. If none of the proposed phase distributions are validated, an alternative species from the database must be selected.

Utilizing a model selection method introduces the challenge that a phase-type distribution may not be chosen, in which case the advantages of using Markov chains would be lost. Furthermore, it becomes impossible to measure or detect if the distribution was poorly selected since our aim is to approximate the random walk behavior through the $CoV$. It is possible that the random walk follows a distribution other than phase-type that provides a better approximation, but there is no inherently incorrect selection. Our objective is not to identify the optimal distribution through distribution selection methods, but rather to approximate the observed data using Markov chains with phase-type distributions.

For an exponential PDF, even if the experimental results do not have a coefficient of variation ($CoV$) of exactly 1, the exponential distribution can still be employed for $CoVs$ that are close to 1. This can lead to a good approximation, which can be verified through goodness of fit tests. In fact, if the $CoV$ is sufficiently close to 1, the exponential distribution may provide a better fit compared to the Erlang or hyper-exponential distributions.
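The selection rule used in this section (hyper-exponential for $CoV > 1$, exponential for $CoV$ at or near 1, Erlang for $CoV < 1$) can be sketched as follows. The function name `select_phase_type` and the tolerance `tol` are illustrative assumptions; the text does not state how close to 1 the $CoV$ must be, so the band chosen here is arbitrary and the final decision still rests on the goodness of fit tests.

```python
def select_phase_type(cov: float, tol: float = 0.05) -> str:
    """Pick a phase-type family from the coefficient of variation.

    `tol` is a hypothetical tolerance band around CoV = 1; it is an
    assumption of this sketch, not a value given in the article.
    """
    if cov < 0:
        raise ValueError("CoV must be non-negative")
    if abs(cov - 1.0) <= tol:
        return "exponential"
    return "hyper-exponential" if cov > 1.0 else "Erlang"


for value in (1.8, 0.4, 1.03):
    print(value, select_phase_type(value))
```

Treating the tolerance as a parameter makes it easy to try both the exponential and the neighboring family and keep whichever passes the tests.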

Finally, the Abbreviations section presents the most significant variables utilized in this manuscript.

3.1. Trajectory Model Using Phase-Type Distributions

Here, the phase-type PDF approximation to the animal movement is explained in detail.

First, ten to twenty real trajectories were selected, each consisting of five hundred to one thousand GPS samples, extracted and calculated from the database. The selection of the number of trajectories and the sample size was based on the behavior of the species and the techniques employed by biologists to collect samples from them. For instance, ocelots are solitary animals that live alone for most of their lives, as described in [47], while mangabey monkeys, bats from Ghana, and long-tailed ducks live and move in groups. Therefore, monitoring a single animal over a period of time is sufficient to characterize their movement, as stated in [18,32,46,48].

Similar values were observed among the samples from each species. Consequently, the average of the minimum and maximum travel distance ($d_{min}$ and $d_{max}$, respectively), maximum rest time ($T_{fix,max}$), maximum movement time ($T_{mov,max}$), and the average speed ($v_{avg}$) were calculated.

Using these parameters, we can calculate the time the animals spend in a static state ($T_{fix}$), the time they spend in motion ($T_{mov}$), and the trajectory angles ($A_{traj}$). These parameters are represented as random variables (RVs) and they characterize the trajectories followed by each animal. In general, the parameter $\beta_{i}$ can be obtained using Equation (1).

$$\beta_{i} = \begin{cases} 0, & i = 0, \\ \alpha_{i} - \alpha_{i-1}, & i \neq 0, \end{cases} \tag{1}$$

The relative angles $\beta_{i}$ are obtained using the alpha angles $\alpha_{i}$ with respect to the horizontal axis, as mentioned in [50]. Here, $i = 1, 2, 3, \dots$ indexes the samples from the database. An example illustrating the process of obtaining the movement characterization is presented in Figure 5. This model includes the initial static distance ($d_{fix1}$), fixed time ($t_{fix1}$), and initial location ($x_{1}$, $y_{1}$). When the animal moves to the second waypoint ($x_{2}$, $y_{2}$), it has an initial time ($t_{mov1}$), initial distance ($d_{mov1}$), and initial angle ($\alpha_{1}$). Subsequently, at this second waypoint ($x_{2}$, $y_{2}$), the animal rests with a fixed distance $d_{fix2}$ and spends a rest time $t_{fix2}$. Finally, the animal moves towards the third waypoint ($x_{3}$, $y_{3}$), with a relative angle $\beta_{2}$, a movement distance of $d_{mov2}$, and a movement time of $t_{mov2}$.
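As a minimal sketch of Equation (1), the following code derives the absolute headings and the relative angles from a list of planar waypoints. It assumes that the heading of each step is measured with `math.atan2` against the horizontal axis, in radians, and that coordinates are already projected to a flat plane; the function name `relative_angles` is hypothetical, and no angle wrapping is applied.

```python
import math


def relative_angles(points):
    """Return (alphas, betas) for a list of (x, y) waypoints.

    alphas[i] is the heading of step i with respect to the horizontal axis.
    betas follows Equation (1): 0 for the first step, and the difference
    between consecutive headings for every later step.
    """
    alphas = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    betas = [0.0] + [a1 - a0 for a0, a1 in zip(alphas, alphas[1:])]
    return alphas, betas[: len(alphas)]


# Three waypoints: one step east, then one step north.
alphas, betas = relative_angles([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(alphas, betas)
```

In this toy track the second step turns a quarter circle to the left, so the second relative angle equals the second heading.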

The random variables used in the proposed model are described as follows:

Resting time ($T_{fix}$): Time at resting state. If the animal walks less than 5 m, it is assumed to be in a state of rest.
Moving time ($T_{mov}$): The time that the animal spends in motion.
Angle ($A_{traj}$): The direction in which the animal is moving.
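The definitions above can be turned into a small extraction routine. The sketch below assumes a track given as `(t, x, y)` tuples with planar coordinates in metres, applies the 5 m rest threshold from the text to each step between consecutive samples, and merges consecutive steps of the same state into one duration. The merging rule and the name `rest_and_move_times` are assumptions of this example, not details stated in the article.

```python
import math

REST_THRESHOLD_M = 5.0  # from the text: a displacement below 5 m counts as rest


def rest_and_move_times(samples, threshold=REST_THRESHOLD_M):
    """Split a track of (t, x, y) samples into rest and movement durations.

    Consecutive steps in the same state are merged into a single duration.
    Returns (t_fix, t_mov), two lists of durations in the units of t.
    """
    t_fix, t_mov = [], []
    state, elapsed = None, 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        step_state = "fix" if math.hypot(x1 - x0, y1 - y0) < threshold else "mov"
        if state is not None and step_state != state:
            (t_fix if state == "fix" else t_mov).append(elapsed)
            elapsed = 0.0
        state = step_state
        elapsed += t1 - t0
    if state is not None:
        (t_fix if state == "fix" else t_mov).append(elapsed)
    return t_fix, t_mov


track = [(0, 0, 0), (10, 1, 0), (20, 2, 0), (30, 50, 0), (40, 120, 0), (50, 121, 0)]
t_fix, t_mov = rest_and_move_times(track)
print(t_fix, t_mov)
```

The two resulting lists are the empirical samples of $T_{fix}$ and $T_{mov}$ from which histograms and the coefficient of variation can then be computed.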

To analyze these variables, we first obtain the histograms for each animal, as depicted in Figure 6a–d. From these histograms, we calculate the probability density function (PDF) for each variable, as shown in Figure 7a–d.

Following this, the coefficient of variation ($CoV$) is computed for each random variable of each species using the mean and variance. This is achieved using Equation (2),

$$CoV = \frac{\sigma_{X}}{E_{X}}, \tag{2}$$

where $E_{X}$ is the mean of the random variable $X$ and $\sigma_{X}$ is the standard deviation. Table 1 shows the calculated statistical parameters.
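Equation (2) is straightforward to evaluate on a list of observed values. The sketch below uses the population standard deviation from Python's `statistics` module; whether the article uses the population or the sample estimator is not stated, so that choice is an assumption, as is the function name.

```python
import statistics


def coefficient_of_variation(values):
    """CoV = sigma_X / E_X, as in Equation (2).

    Uses the population standard deviation (an assumption of this sketch).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CoV is undefined for a zero mean")
    return statistics.pstdev(values) / mean


durations = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data, not taken from Table 1
cov = coefficient_of_variation(durations)
print(cov)
```

For the toy data the mean is 5 and the population standard deviation is 2, so the $CoV$ is below 1.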

Now, we propose to use phase-type distributions to model the RVs [51]. The rationale behind using these distributions, such as the Erlang and hyper-exponential distributions, is that they allow us to utilize a Markovian model, specifically Markov chains, to describe the movement of the animals. These distributions can be easily incorporated into tele-traffic analysis, which is commonly used for performance analysis in wireless sensor networks (WSNs) [7].

Remark 1.

In future work, the use of phase-type distributions to calculate the average times and distances between sensors placed on the animals will be considered. This can help to mitigate the effect of areas without transceiver coverage caused by ground-based sensors. However, these studies are beyond the scope of the current contribution, as the main aim of this approach is to present the analytical tool for generating virtual traces of animals.

Hence, we define that RVs with $CoV > 1$ have a hyper-exponential PDF, RVs with $CoV = 1$ have an exponential PDF, and RVs with $CoV < 1$ have an Erlang PDF. The probability density functions (PDFs) of the random variables are calculated using Equations (3)–(5).

$$f_{hyper}(x) = p \lambda_{1} e^{-\lambda_{1} x} + (1 - p) \lambda_{2} e^{-\lambda_{2} x}, \tag{3}$$

$$f_{expn}(x) = \lambda_{3} e^{-\lambda_{3} x}, \tag{4}$$

$$f_{Erlang}(x) = \frac{\lambda_{4}^{k} x^{k-1} e^{-\lambda_{4} x}}{(k-1)!} \tag{5}$$
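The three densities can be written directly as functions. In this sketch the Erlang density is coded in its standard textbook form, $\lambda^{k} x^{k-1} e^{-\lambda x} / (k-1)!$ with rate $\lambda_{4}$ and integer shape $k$. The helper `integrate` is a plain midpoint rule added only as a sanity check that each density integrates to roughly 1; all parameter values are arbitrary illustrations.

```python
import math


def pdf_hyper(x, p, lam1, lam2):
    """Two-branch hyper-exponential density, Equation (3)."""
    return p * lam1 * math.exp(-lam1 * x) + (1 - p) * lam2 * math.exp(-lam2 * x)


def pdf_expn(x, lam3):
    """Negative exponential density, Equation (4)."""
    return lam3 * math.exp(-lam3 * x)


def pdf_erlang(x, k, lam4):
    """Erlang density with integer shape k and rate lam4 (standard form)."""
    return lam4**k * x ** (k - 1) * math.exp(-lam4 * x) / math.factorial(k - 1)


def integrate(f, upper, n=20000):
    """Midpoint-rule integral of f on [0, upper]; used only as a sanity check."""
    h = upper / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h


area_hyper = integrate(lambda x: pdf_hyper(x, 0.3, 0.5, 4.0), 60.0)
area_expn = integrate(lambda x: pdf_expn(x, 2.0), 30.0)
area_erlang = integrate(lambda x: pdf_erlang(x, 3, 2.0), 40.0)
print(area_hyper, area_expn, area_erlang)
```

With $k = 1$ the Erlang density reduces to the exponential one, which is a quick way to check an implementation.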

The parameter $p$ is obtained with Equation (6),

$$p = \frac{\left(E_{X} - \frac{1}{\lambda_{2}}\right) \lambda_{1} \lambda_{2}}{\lambda_{2} - \lambda_{1}}, \tag{6}$$

and $\lambda_{1}$ and $\lambda_{2}$ are proposed constants. The parameter $\lambda_{3}$ of the negative exponential distribution is calculated using Equation (7).

$$\lambda_{3} = \frac{1}{E_{X}} \tag{7}$$

Finally, the parameter $\lambda_{4}$ of the Erlang distribution is calculated using Equation (8),

$$\lambda_{4} = \frac{k}{E_{X}}, \tag{8}$$

where $k$ is an arbitrary positive integer used to calibrate the PDF. None of the RVs obtained in this manuscript has $CoV = 1$ (that is, none follows an exponential distribution). However, we include this case in the event that a different species actually has that coefficient of variation.
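Equations (6)–(8) all match the mean of the fitted distribution to the observed mean $E_{X}$. A minimal sketch is given below; the function names and the numeric inputs are hypothetical, and the range check on $p$ is an addition of this example, since not every pair of proposed rates yields a valid probability.

```python
def hyper_p(mean, lam1, lam2):
    """Branch probability p from Equation (6) for proposed rates lam1, lam2."""
    p = (mean - 1.0 / lam2) * (lam1 * lam2) / (lam2 - lam1)
    if not 0.0 <= p <= 1.0:
        # Added check (assumption): the proposed rates cannot match this mean.
        raise ValueError("lam1 and lam2 do not bracket the observed mean")
    return p


def expn_rate(mean):
    """Exponential rate from Equation (7)."""
    return 1.0 / mean


def erlang_rate(mean, k):
    """Erlang rate from Equation (8) for a chosen integer shape k."""
    return k / mean


mean_x = 1.0  # toy observed mean, not a value from the article
p = hyper_p(mean_x, lam1=0.5, lam2=4.0)
print(p, expn_rate(mean_x), erlang_rate(mean_x, k=3))
```

Substituting the resulting $p$ back into $p/\lambda_{1} + (1-p)/\lambda_{2}$ recovers the observed mean, which is a useful check after choosing $\lambda_{1}$ and $\lambda_{2}$.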

To validate these hypothetical distributions, two theoretical goodness of fit tests were used: the chi-squared (CS) test [52] and the Kolmogorov–Smirnov (KS) test [53]. These tests are used to verify if a random variable follows a certain PDF, as explained in Appendix A.

For the CS test, we use the distribution of each random variable for each species, respectively. The significance level ($\alpha$) for the $\chi^{2}$ test is set to 0.05 (or 5%), as is recommended for statistically significant tests with large sample sizes. A resulting probability $p \leq 0.05$ indicates that the hypothesized PDF should be rejected, while $p > 0.05$ means that the hypothesis is accepted. The degrees of freedom ($df$) are calculated as $df = K - s - 1$, where $K$ is the number of histogram bins and $s$ is the number of parameters of the PDF: $s = 2$ for Erlang, $s = 3$ for hyper-exponential, and $s = 2$ for negative exponential. The same significance level is used for every $\chi^{2}$ test and for all four animals.
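The chi-squared comparison can be sketched with the standard library alone. The code below bins the samples into $K$ equal-width bins, computes the Pearson statistic against a hypothesized CDF, and returns it with $df = K - s - 1$. Equal-width binning, letting the outer bins absorb the tails, and the function name are assumptions of this example; the critical value $\chi_{table}^{2}$ is expected to come from a table, as in the text.

```python
import math


def chi_squared_statistic(samples, cdf, n_bins, n_params):
    """Pearson chi-squared statistic with equal-width bins.

    Returns (chi2, df) with df = K - s - 1.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must not all be equal")
    width = (hi - lo) / n_bins
    observed = [0] * n_bins
    for x in samples:
        observed[min(int((x - lo) / width), n_bins - 1)] += 1
    n = len(samples)
    chi2 = 0.0
    for j, obs in enumerate(observed):
        # Outer bins absorb the tails so that expected counts sum to n.
        f_lo = 0.0 if j == 0 else cdf(lo + j * width)
        f_hi = 1.0 if j == n_bins - 1 else cdf(lo + (j + 1) * width)
        expected = n * (f_hi - f_lo)
        if expected > 0:
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n_bins - n_params - 1


# Toy data: exact quantiles of a unit-rate exponential, so the fit is near perfect.
n = 1000
samples = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
# n_params=2 is the value of s that the text lists for the negative exponential.
chi2, df = chi_squared_statistic(
    samples, lambda x: 1.0 - math.exp(-x), n_bins=10, n_params=2
)
print(chi2, df)
```

The hypothesis is kept when the returned statistic is below the tabulated critical value for the returned degrees of freedom at the chosen significance level.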

For the KS test, we assume that $T_{fix}$, $T_{mov}$, and $A_{traj}$ follow hypothetical cumulative distribution functions (CDFs). We obtain the observed CDFs of the random variables and calculate the maximum absolute distance ($D_{max}$) between the observed and hypothetical CDFs using Equation (A2), which is defined in Appendix A. We also obtain the critical value $D_{table}$ from [54]. As in the chi-squared test, a significance level is fixed in advance; here it is set to 0.01 (or 1%), as is recommended for determining if the hypothesis distribution has sufficient statistical information to be accepted or rejected.
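A minimal sketch of the KS comparison follows. It uses the usual definition of the statistic as the largest gap between the empirical CDF and the hypothesized CDF, which is assumed here to correspond to Equation (A2). The critical value is taken from the common large-sample approximation $\sqrt{-\ln(\alpha/2)/2}/\sqrt{n}$ rather than from the table in [54], so it is an assumption of this example and only appropriate for large $n$.

```python
import math


def ks_statistic(samples, cdf):
    """Largest absolute gap between the empirical CDF and a hypothesized CDF."""
    xs = sorted(samples)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d_max = max(d_max, abs((i + 1) / n - f), abs(f - i / n))
    return d_max


def ks_critical_value(n, alpha=0.01):
    """Large-sample approximation of the critical value (assumption)."""
    return math.sqrt(-math.log(alpha / 2.0) / 2.0) / math.sqrt(n)


# Toy data: exact quantiles of a unit-rate exponential.
n = 1000
samples = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
d_max = ks_statistic(samples, lambda x: 1.0 - math.exp(-x))
d_table = ks_critical_value(n, alpha=0.01)
print(d_max, d_table, d_max < d_table)
```

The hypothesized distribution is kept when $D_{max}$ is smaller than $D_{table}$.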

In addition, we employed an empirical approach that compares the quantiles of the proposed distribution with the quantiles of the sample data, i.e., the observed values of the random variable. This method, known as the quantile–quantile plot (or Q–Q plot), assesses whether two datasets have similar distributions by plotting the quantiles of one dataset against the quantiles of the other. If the points form a straight line, the distributions are similar. Curves indicate differences, with upward curves suggesting heavier tails and downward curves suggesting lighter tails. It is a visual tool to assess distribution match [55].
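The points of a Q–Q plot can be computed without any plotting library. The sketch below pairs sorted sample values with theoretical quantiles of an exponential reference distribution, chosen because its inverse CDF has a closed form; the hyper-exponential and Erlang cases would need a numerical inverse. The plotting positions $(i + 0.5)/n$ and the function name are assumptions of this example.

```python
import math


def qq_points_exponential(samples, rate):
    """Pair theoretical exponential quantiles with the sorted sample values."""
    xs = sorted(samples)
    n = len(xs)
    theoretical = [-math.log(1.0 - (i + 0.5) / n) / rate for i in range(n)]
    return list(zip(theoretical, xs))


points = qq_points_exponential([2.0, 0.1, 0.5], rate=1.0)
for theo, obs in points:
    print(round(theo, 3), obs)
```

Plotting the second coordinate against the first and comparing with the diagonal gives the visual check described above.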

Based on this, let us select and characterize the appropriate phase-type distribution for the RVs (

T_{f i x}

,

T_{m o v}

, and

A_{t r a j}

) for each selected species.

3.1.1. Mangabey Monkey

The parameters of the phase-type distributions fitted to the real mangabey monkey trajectories are shown in Table 2a; they are calculated using Equations (6)–(8).

Specifically, we observe that $T_{fix}$ and $T_{mov}$ have $CoV > 1$, indicating a hyper-exponential distribution, while $A_{traj}$ has $CoV < 1$, indicating an Erlang distribution.
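The selection rule based on the coefficient of variation can be sketched as follows. The function names are hypothetical, the population standard deviation is an assumed choice, and the `tolerance` that decides when a $CoV$ counts as equal to 1 (negative exponential) is an assumed value, not one taken from the study.

```python
import statistics


def coefficient_of_variation(samples):
    """CoV = standard deviation / mean (population standard deviation assumed)."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def select_phase_type(cov, tolerance=0.05):
    """Map a CoV to the phase-type family used in the text.

    `tolerance` (how close to 1 counts as exponential) is an assumed value.
    """
    if abs(cov - 1.0) <= tolerance:
        return "exponential"
    return "hyper-exponential" if cov > 1.0 else "Erlang"
```

For the values quoted later in the text, a $CoV$ of 4.42 maps to the hyper-exponential family and a $CoV$ of 0.65 maps to the Erlang family.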

The level of significance for the

χ^{2}

test is set to

5 %

, with

K = 50

bins. Figure 8a shows the comparison between the proposed and real distributions for the RVs.

The level of significance for the Kolmogorov–Smirnov test (LSK) is set to $1\%$, and the value of $D_{table}$ is obtained as described in Appendix A. Figure 8b shows the comparison between the proposed and real cumulative distributions, where the parameter D is the difference between the values of the proposed and real CDFs.

The value of

χ^{2}

is compared with the

χ_{t a b l e}^{2}

, and the value of

D_{m a x}

is compared to the value

D_{t a b l e}

to validate the hypothesis distribution. The calculated values of the parameters are shown in Table 3. Hence,

T_{f i x}

follows a hyper-exponential,

T_{m o v}

follows a hyper-exponential, and

A_{t r a j}

follows an Erlang distribution.

Finally, in Figure 8c, we can observe that for lower theoretical quantiles, the sample quantiles of the random variable follow a straight line; however, this alignment degrades with larger values.

3.1.2. Ocelot from Barro Colorado

In the same manner as the mangabey monkey, for the ocelot or mottled leopard, the parameters of the PDFs of each RV were calculated using Equations (6)–(8) and adjusted from their respective

C o V

. In Table 2b, the corresponding parameters for each phase-type distribution of each random variable are given.

In this case,

T_{fix}

and

T_{mov}

have

C o V > 1

, hence a hyper-exponential distribution is selected, and

A_{traj}

has a

C o V < 1

, so an Erlang distribution is selected.

For the chi-squared test, Figure 9a presents the hypothetical and real distributions of the RVs. For the Kolmogorov–Smirnov test, the values of

D_{\max}

and

D_{table}

are compared. Figure 9b shows the theoretical and real CDFs.

The parameters

χ^{2}

,

χ_{table}^{2}

,

D_{\max}

, and

D_{table}

are calculated and presented in Table 4. From these results, it can be proposed that

T_{fix}

and

T_{mov}

follow a hyper-exponential distribution, and

A_{traj}

follows an Erlang distribution.

Then, in Figure 9c, the Q–Q plots for the ocelot are presented, where the curves fit better in the lower values than in the larger values of the quantiles of the sample data.

3.1.3. Bat from Ghana

Now, for the bat from Ghana, the parameters of the proposed distributions are presented in Table 2c. In this case,

T_{fix}

,

T_{mov}

, and

A_{traj}

have

C o V > 1

, so a hyper-exponential distribution is used. As opposed to the previous animals, all three variables follow a hyper-exponential distribution.

Based on the results obtained from the chi-squared test, with a chi-squared level of significance (LSC) of

5 %

and its respective

d f

with

K = 50

, the proposed distributions accurately model the real traces. The PDFs corresponding to each RV are shown in Figure 10a. Furthermore, the Kolmogorov–Smirnov test shows a good fit between the real traces and the proposed distributions. Figure 10b shows the respective CDFs for each RV. Then, parameters

χ^{2}

,

χ_{table}^{2}

,

D_{\max}

, and

D_{table}

were calculated and their values are shown in Table 5. Hence,

T_{fix}

,

T_{mov}

, and

A_{traj}

follow a hyper-exponential distribution.

The Q–Q plots are depicted in Figure 10c, where the points follow a straight line only for the lower theoretical quantiles of the respective distribution.

3.1.4. Long-Tailed Duck

For the long-tailed duck, the parameters obtained from the trajectory database are presented in Table 6; they are calculated using Equations (6)–(8). The pause time and movement time, $T_{fix}$ and $T_{mov}$, have $CoV > 1$, hence a hyper-exponential distribution is chosen, and $A_{traj}$ has $CoV < 1$, so an Erlang distribution is selected. Finally, the Q–Q plots presented in Figure 11c show that the sample quantiles agree with those of the theoretical distributions on the left side of the plot, that is, for the lower quantiles.

In summary, our conclusion is that the goodness of fit for the proposed and adjusted theoretical distributions has been validated. However, the Q–Q plots do not exhibit the desired straight line pattern. The presence of a concave downward curvature in a Q–Q plot indicates that the tails of the theoretical distribution are heavier than those of the sample data. While this can indicate a potential mismatch, its significance depends on factors such as the extent of deviation and the analysis’s purpose. For practical considerations, considering the results of the goodness of fit tests, a sufficient match between distributions can be assumed.

Next, we introduce an analysis involving different distributions that share similar coefficients of variation (

C o V s

). This comparison aims to evaluate the distributions, with particular emphasis on phase-type distributions, which are preferred due to their relevance in Markov chain models.

3.2. Trajectory Model Using Other Distributions

In the preceding section (Section 3.1), we introduced the phase-type distribution as a proposed model for this manuscript, primarily due to its memory properties and its applicability in Markov chain models. However, it is imperative to validate this proposal by subjecting our random walk models to a comparative analysis against other similar distributions.

For the purpose of comparison, we have chosen to evaluate our proposed phase-type distribution against widely recognized distributions, namely, the normal, log-normal, and Pareto distributions. These selections were made based on their ability to exhibit different characteristics of the coefficient of variation, where

C o V

values approximate to 0,

C o V

values approximate to 1, and

C o V

values exceed 1, respectively.

To illustrate this comparison, we have specifically considered three random variables sourced from the animals under study, taking into account their associated

C o V

values as presented in Table 1. For instance, for the ocelot, where

A_{t r a j}

possesses a

C o V

of 0.65, we have conducted a comparison against the normal distribution. Similarly, in the case of the mangabey monkey,

A_{t r a j}

, with a

C o V

of 0.92, has been contrasted with the log-normal distribution, while

T_{m o v}

, with a

C o V

of 4.42, has been matched with the Pareto distribution.

The mathematical formulations employed for calculating the normal, log-normal, and Pareto distributions are elucidated in Appendix C. To facilitate a comprehensive understanding, Table 7 presents the

C o V

values of the three animal random variables, accompanied by their respective hypothetical distribution counterparts.

In summary, the comparison undertaken against well-established distributions contributes to the thorough evaluation and validation of our proposed phase-type distribution within the context of our random walk models.

Utilizing the methodology outlined in Section 3.1, we employed a consistent process to compute the PDFs and CDFs for each selected random variable. Following this, based on their respective coefficient of variation values, we established the corresponding hypothetical distribution for each variable. Subsequently, we compared these hypothetical distributions with the computed distributions of the actual animal trajectories through rigorous goodness of fit tests.

Figure 12a compares the PDFs of the actual animal trajectories with the PDFs derived from the corresponding normal, log-normal, and Pareto hypothetical distributions. Additionally, Figure 12b shows the cumulative distribution functions, contrasting the hypothetical CDFs with the CDFs of the actual trajectories for each of the random variables associated with the species.

This systematic approach allows us to assess the concordance between the proposed hypothetical distributions and the empirical data. Employing robust goodness of fit tests, we rigorously evaluate the compatibility and appropriateness of these distributions in capturing the characteristics of the observed animal trajectories.

Goodness of Fit Test Results for Other Distributions

In accordance with the details presented in Table 8, we subjected the proposed hypothetical distributions to a battery of goodness of fit tests, specifically the chi-squared and Kolmogorov–Smirnov tests. This comprehensive analysis aimed to ascertain whether the hypothetical distributions accurately aligned with the observed distributions of the real trajectories. The outcomes of these tests are provided in Table 9.

A notable observation is that, while the

A_{t r a j}

variables for both the ocelot and mangabey monkey successfully met the criteria of the chi-squared test, they fell short of satisfying the Kolmogorov–Smirnov test. Conversely, in an inverse scenario, the

T_{m o v}

variable of the mangabey monkey yielded satisfactory results for the Kolmogorov–Smirnov test, but did not fare well in the chi-squared test.

Subsequently, the Q–Q plots of the chosen random variables are depicted in Figure 12c, illustrating the corresponding proposed distributions: normal, log-normal, and Pareto, as discussed earlier. Notably, the Q–Q plot of the ocelot distribution displays variability in both the lower and upper ends of the theoretical quantiles, while showing a favorable alignment in the central range. Comparatively, the mangabey turning angle distribution exhibits the least compatibility with the log-normal distribution, whereas the distribution of moving times showcases a favorable match with the Pareto distribution, particularly for higher theoretical quantiles.

Finally, these observations are not definitive, since certain random variables may pass two of the three tests. Even so, we refrained from employing these distributions: as outlined in [56], they are deemed inadequate for Markovian processes, in contrast to the phase-type distributions, which align with Markovian characteristics.

4. Simulation Model

Now that the random walk has been characterized, the effectiveness of the proposed method is demonstrated by generating virtual trajectories of the four species with the same statistical properties as the real traces. This is achieved using a custom simulation tool developed in JavaScript. Algorithm 1 describes the process for generating the simulated trajectories. The code was executed M = 1,000,000 times to ensure convergence.

Algorithm 1 Simulation algorithm

1: Start coordinates $(x = 0, y = 0)$ m, $T_{sim} = 0$ s, and choose number of nodes N;
2: Create: $T_{fix}$, $T_{mov}$ and $A_{traj}$ by the trajectory generation Algorithm 2;
3: New event: fixedAnimal, update list, calculate $T_{sim} = T_{sim} + T_{fix}$;
4: Extract the first event;
5: Create virtual circular observation area;
6: for $l = 0$ to $M = 1,000,000$ do
7:     if event = resting then
8:         Program next event: moving, update list, calculate $T_{sim} = T_{sim} + T_{mov}$;
9:         if event = in then
10:             Add event: in, compute $T_{in} = T_{in} + T_{fix}$;
11:             Calculate and update $P_{in}^{k} = \frac{T_{in}}{T_{sim}}$;
12:         else
13:             New event: out, calculate $T_{out} = T_{out} + T_{fix}$;
14:             Calculate and update value $P_{out}^{k} = \frac{T_{out}}{T_{sim}}$;
15:         end if
16:     else
17:         Program next event: resting, refresh list, compute $T_{sim} = T_{sim} + T_{fix}$;
18:         if event = in then
19:             New event: in, compute $T_{in} = T_{in} + T_{mov}$;
20:             Calculate and update $P_{in}^{k} = \frac{T_{in}}{T_{sim}}$;
21:         else
22:             Add event: out, calculate $T_{out} = T_{out} + T_{mov}$;
23:             Calculate and update $P_{out}^{k} = \frac{T_{out}}{T_{sim}}$;
24:         end if
25:     end if
26:     Create: $T_{fix,sim}$, $T_{mov,sim}$ and $A_{traj,sim}$ by Algorithm 2 and obtain and refresh new coordinates $(x, y)$;
27:     if $|P_{out}^{k} - P_{out}^{k-1}| < ϵ$ then
28:         Finish;
29:     else
30:         Skip to 4;
31:     end if
32: end for

For the CS test of the long-tailed duck (Section 3.1.4), Figure 11a shows the theoretical and practical PDFs of each random variable.

To assess the accuracy of the virtual trajectories, a comparison was made between the virtual traces generated by the algorithm and the real animal trajectories obtained from their respective studies (described in Section 3).

For this comparison, an imaginary circular region of radius $R_{cov}$ was placed at random within the area where the animal freely moves. The probability of the animal spending time inside this circular region during an observation time, $T_{sim}$, was calculated using both the virtual and real trajectories. This probability, denoted as $P_{in}$, was calculated in the simulation based on the pseudocode described in Algorithm 1. Using the validated phase-type distributions of $T_{fix}$, $T_{mov}$, and $A_{traj}$, the simulated values of the random variables $T_{fix,sim}$, $T_{mov,sim}$, and $A_{traj,sim}$ were obtained, respectively.
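The residence probability described above can be approximated with a short Python sketch. It is a simplified illustration rather than the authors' JavaScript tool: the function name and the `(x, y, duration)` segment format are hypothetical, and each segment's whole duration is credited to the side of the circle on which its waypoint lies.

```python
import math


def residence_probabilities(segments, center, r_cov):
    """Return (P_in, P_out) from (x, y, duration) segments.

    Each segment's full duration is credited to the side of the circle on
    which its waypoint lies (a simplification assumed for this sketch).
    """
    t_in = t_sim = 0.0
    for x, y, duration in segments:
        t_sim += duration
        if math.hypot(x - center[0], y - center[1]) <= r_cov:
            t_in += duration
    if t_sim == 0.0:
        raise ValueError("total observation time must be positive")
    p_in = t_in / t_sim
    return p_in, 1.0 - p_in
```

Applying the same function to the real and to the virtual trajectories yields the two $P_{in}$ curves that are compared for each value of $R_{cov}$.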

It was assumed that the animal is in a resting state if it moves less than 5 m, taking into account the error of the GPS device. From the simulated trajectories, the average velocity

v_{a v g}

, the resting distance

d_{f i x, s i m}

, and the walking distance

d_{m o v, s i m}

were calculated. Since the random variables were obtained in polar coordinates, they were converted into rectangular coordinates to obtain the waypoint positions (

x_{i}

,

y_{i}

) using the subfunction described in pseudocode Algorithm 2, which computes the simulated trajectories and the new positions

x_{i}

,

y_{i}

, as described in [57].

Algorithm 2 Trajectories algorithm

1: Define the species and their speed v;
2: Define distribution;
3: Input statistical parameters: k, $λ_{4}$, $λ_{1}$, $λ_{2}$ and p;
4: if $T_{fix}$ has Erlang then
5:     $T_{fix,sim} = pdfErlang(k, λ_{4})$;
6: else
7:     $T_{fix,sim} = pdfHyperExpo(p, λ_{1}, λ_{2})$;
8: end if
9: if $T_{mov}$ has Erlang then
10:     $T_{mov,sim} = pdfErlang(k, λ_{4})$;
11: else
12:     $T_{mov,sim} = pdfHyperExpo(p, λ_{1}, λ_{2})$;
13: end if
14: if $A_{traj}$ has Erlang then
15:     $A_{traj,sim} = pdfErlang(k, λ_{4})$;
16: else
17:     $A_{traj,sim} = pdfHyperExpo(p, λ_{1}, λ_{2})$
18:     $= randomValueHyperExpo$;
19: end if
20: if $(distance_{fix} = v \times T_{fix}) < 5$ m then
21:     $dist = 0$ m;
22: else
23:     $dist = v_{avg} \times T_{mov}$;
24: end if
25: $angle_{Real} = A_{traj} + angle_{Real}$
26: $x_{i} = dist \times \cos(angle_{Real})$
27: $y_{i} = dist \times \sin(angle_{Real})$
28: return $x_{i}, y_{i}$
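The sampling and waypoint steps of Algorithm 2 can be illustrated with the Python sketch below. It is written under stated assumptions and is not the authors' code: the function names are hypothetical, Erlang values are drawn as a sum of $k$ exponential phases, hyper-exponential values as a two-branch mixture, the 5 m resting threshold is applied to the step length, angles are taken in degrees, and positions are accumulated from the previous waypoint.

```python
import math
import random


def sample_erlang(k, rate, rng):
    """Erlang(k, rate) as the sum of k independent exponential phases."""
    return sum(rng.expovariate(rate) for _ in range(k))


def sample_hyper_exponential(p, rate1, rate2, rng):
    """Two-branch hyper-exponential: rate1 with probability p, else rate2."""
    return rng.expovariate(rate1 if rng.random() < p else rate2)


def next_waypoint(x, y, heading_deg, turn_deg, speed, t_mov, min_step=5.0):
    """Advance one step; steps shorter than `min_step` metres count as resting."""
    heading_deg = (heading_deg + turn_deg) % 360.0
    dist = speed * t_mov
    if dist < min_step:
        dist = 0.0
    theta = math.radians(heading_deg)
    return x + dist * math.cos(theta), y + dist * math.sin(theta), heading_deg
```

A caller would pass a seeded `random.Random` instance as `rng` so that the generated trajectories are reproducible.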

For the KS test of the long-tailed duck (Section 3.1.4), Figure 11b shows the hypothetical and real CDFs of each random variable. Hence, the approximation is as follows: $T_{fix}$ and $T_{mov}$ are characterized by a hyper-exponential distribution, and $A_{traj}$ has an Erlang distribution.

5. Results

In this section, we present the results of the proposed virtual trajectory generation procedure based on phase-type distributions. To validate our results, we compare the residence time of the different animals in a given region, considering both the virtual and real trajectories.

According to the goodness of fit analysis, we can conclude that there are sufficient samples to approximate the random variables involved in the animal’s movement (movement time, pause time, and direction) using phase-type distributions. The results of this analysis for each selected animal are provided in Table 7.

As described in Section 3, the probabilities of finding the animal inside the region of interest (

R_{c o v}

) (

P_{i n}

) and outside the region (

P_{o u t}

) are calculated for both the virtual and real trajectories, considering a constant restricted area where the animal can freely move (

A_{o b s}

). This calculation is performed in Algorithm 1 for each animal. As an example, Figure 13 presents a screen capture depicting the animated running simulation of the mangabey monkey virtual walk. This visualization offers insights into the virtual random walk, demonstrating instances where it traverses both within and outside the expanding coverage radius, symbolized by the green ring denoted as

R_{c o v}

.

Figure 14a presents the probabilities associated with the mangabey monkey’s presence inside and outside a specified region for

A_{o b s}

= 80,000 m × 80,000 m across varying values of

R_{c o v}

. These probabilities are calculated employing the statistical parameters derived from Section 3. The red line in Figure 14a represents the probability that the actual trajectory of the mangabey monkey falls within or outside the radio coverage area. In contrast, the blue line depicts the

P_{i n}

and

P_{o u t}

values corresponding to the virtual generated trajectory, with the average speed of the animal’s movement. Additionally, magenta and green lines represent the

P_{i n}

and

P_{o u t}

for the animal when its average movement speed is increased. This illustrates that an increased movement speed reduces the probability of remaining within the radio coverage area and increases the probability of remaining outside it.

Likewise, Figure 14b provides a visualization of probabilities pertaining to the ocelot’s behavior. The real traces follow the trend of the red line, while the calculated and augmented ocelot’s average velocities are denoted by the blue, magenta, and green lines, respectively. Notably, it becomes evident that despite the increase in the ocelot’s speed, it remains within its natural habitat.

Moving on to Figure 14c, we present the outcomes for bats in terms of both

P_{i n}

and

P_{o u t}

probabilities. The simulation traces align with the real traces. The red line corresponds to the actual path, while the blue, magenta, and green lines denote the

P_{i n}

and

P_{o u t}

probabilities at varying

v_{avg}

. Notably, for a calculated average speed of 6 m/s, the probability of bats being inside or outside their coverage area closely resembles the real path. However, with increased velocity, it is evident that they deviate from their natural environment. It should be noted that the inside and outside probabilities for the real trajectory of the bat follow a very stable line, as the bat’s trajectory has limited variations.

Finally, Figure 14d shows the probabilities of the long-tailed duck being inside and outside the coverage area for different values of $R_{cov}$. For the duck, an average velocity of 5 m/s was calculated; the virtual trace, depicted with a blue line, is similar to the real trace (red line). If the ducks increase their speed, they remain in the coverage area and the outside probability decreases.

5.1. Validation Results with RMSE

From Section 5, it can be observed that the virtual traces generated by our simulation approximately match the real animal paths when they are inside or outside the sensor coverage. To quantify the accuracy of the virtual traces, the root mean square deviation (RMSD), also known as the root mean square error (RMSE, see Appendix B), is used to calculate the error between the real and virtual trajectories. Specifically, the calculated

P_{i n}

values for each animal are used, considering the real trajectories as the true probability values and the virtual trajectories as the predicted probability values obtained from our simulation.

The RMSD values for the mangabey monkey, ocelot, bat, and long-tailed duck were

0.2131

,

0.3298

,

0.2494

, and

0.1613

, respectively. These values measure the deviation between the real and virtual $P_{in}$, that is, the portion of the total time that each species spends inside the proposed imaginary circular region. The precision of the proposed model can be calculated as in Appendix B, which results in

78.69 %

,

67.02 %

,

75.06 %

, and

83.87 %

for each animal, respectively. These precision values indicate the accuracy of the model in approximating the movement behavior of the animals.
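The reported precision values are numerically consistent with the relation precision = (1 − RMSD) × 100%. This relation is inferred from the reported numbers and is an assumption of the following sketch, which reproduces the four percentages and their mean from the RMSD values quoted above.

```python
rmsd = {
    "mangabey monkey": 0.2131,
    "ocelot": 0.3298,
    "bat": 0.2494,
    "long-tailed duck": 0.1613,
}

# Assumed relation (inferred from the reported values): precision (%) = (1 - RMSD) * 100.
precision = {name: round((1.0 - value) * 100.0, 2) for name, value in rmsd.items()}

# Mean precision over the four species.
mean_precision = round(sum(precision.values()) / len(precision), 2)
```

Under this assumption, the mean precision over the four species is about 76%.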

In the study conducted by Teimouri et al. [24], the validation of their movement model for four sheep was carried out using a unique approach. Their validation methodology involved utilizing the behavioral change point analysis (BCPA) segmentation technique and the Kolmogorov–Smirnov test. By analyzing different segments of the overall distribution, specifically, the persistence velocity distribution derived from random walk parameters, they assessed the precision of their predictions.

To quantify the precision, Teimouri et al. employed a comparison between observed and predicted movements. This involved calculating the percentage of intersecting minutes in various activities (forage, rest, walk) for both real and virtual movements. The resulting precision values were

87.6 %

for forage,

68.7 %

for rest, and

70 %

for walk. These values provide insights into the accuracy of their model predictions within specific behavioral contexts.

Comparing these precision results to our proposed model, a notable observation emerges. The average precision for the overall movement of sheep in their study was

75 %

. This figure aligns well with the precision values achieved in our investigation across four distinct species. This alignment underscores the effectiveness of our proposed model in capturing movement patterns.

Higher precision in the proposed model could be attained through several strategies: increasing the number of samples, enhancing GPS accuracy, maintaining controlled environmental conditions, and ensuring uniform step times between samples.

6. Conclusions

Animal monitoring plays a crucial role in the conservation of endangered species, and various methods have been developed to make it more efficient in terms of time tracking, energy consumption, data collection, coverage area, and habitat perturbation. In this paper, a random walk model based on phase-type distributions was proposed to approximate the trajectories of four different species. The main contribution of this work lies in using phase-type distributions to model the random variables associated with the animals’ movement, including static time, movement time, and turn angle.

Compared to other methods (see Section 5.1), our proposed model achieves an acceptable precision of between $67\%$ and $84\%$ (about $76\%$ on average) for the four studied species, considering various radius values for the observation area. This level of precision is comparable to other studies that focus on a single animal and use different techniques.

Furthermore, the proposed model was validated using well-established goodness of fit tests such as the chi-squared, Kolmogorov–Smirnov, and Q–Q plot tests. The simulations generated virtual trajectories based on the phase-type distributions, and the residence time of the animals inside and outside an imaginary region was calculated and compared to the real trajectories.

It is worth noting that while the random walk approximation provides accurate results, there is room for improvement in calculating the probabilities of the animals being inside or outside the residence coverage range, particularly for species such as bats and ocelots with highly dynamic movement patterns. Further research in these animals could yield interesting insights and improvements.

As a future research direction, the development of a system that integrates data collection from a wireless sensor network (WSN) using unmanned aerial or terrestrial systems could be explored. This system could be designed to optimize trajectory planning, enhance the performance of animal detection processes, improve energy consumption, and reduce implementation costs.

Author Contributions

R.V.-A.: software, formal analysis, writing—original draft, visualization. M.E.R.-Á.: conceptualization, methodology, validation, investigation, resources, writing—review and editing, supervision. A.L.-J.: conceptualization, methodology, formal analysis, writing—review and editing, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by SIP under project 20231158. The project of Mario E. Rivero Ángeles was funded by SIP project 20231239. The project of Alberto Luviano Juárez was funded under SIP project 20231585.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Variable	Description	Value
$A_{obs}$	Observation area	In square meters
$x_{i}, y_{i}$	Coordinates for each point of the animal’s walk	Variable
$t_{fix}$	Random variable of rest	Variable
$t_{mov}$	Random variable of the moving time	Variable
$A_{traj}$	Random variable of the trajectory angle	$0^{\circ}$ to $360^{\circ}$
$CoV$	Coefficient of variation, $CoV = \frac{σ_{X}}{E_{X}}$	$< 1$, $= 1$ or $> 1$
$f_{hyper} (x)$	PDF of the hyper-exponential distribution	Variable
$f_{expn} (x)$	PDF of the exponential distribution	Variable
$f_{Erlang} (x)$	PDF of the Erlang distribution	Variable
p	Parameter of the hyper-exponential PDF	0 to 1
$λ_{1}$ and $λ_{2}$	Statistical parameters of the hyper-exponential PDF	$< 1$
$λ_{3}$ and $λ_{4}$	Statistical parameters of the Erlang PDF	$< 1$
k	Statistical parameter of the Erlang PDF	Positive integers
$P_{suc, sim}$	Probability of successful detection for the simulation	0 to 1
$P_{suc}$	Probability of successful detection	0 to 1
$ϵ_{k}$	Convergence error	$10^{-6}$
M	Total algorithm iterations	$1,000,000$

Appendix A. Goodness of Fit Tests

In this appendix, two goodness of fit tests are presented: chi-squared and Kolmogorov–Smirnov.

Appendix A.1. Chi-Squared

The

χ^{2}

test is performed for each random variable using Equation (A1) to obtain the

χ^{2}

value, which is then compared to the corresponding

χ_{table}^{2}

value obtained from a table (e.g., Table 27 in [52]). The level of significance for the

χ^{2}

test is set to

5 %

(or

0.05

), which is a commonly used value for tests with a large number of samples.

If the resulting probability P is less than or equal to $0.05$, the hypothesis is rejected: the distribution does not fit the data well. On the other hand, if P is greater than $0.05$, the hypothesis is accepted, indicating a good fit between the distribution and the data.

The degrees of freedom

d f

used in the

χ^{2}

test are calculated as

d f = K - s - 1

, where K is the number of bins in the histogram and s is the number of parameters in the distribution. For an Erlang distribution,

s = 2

, for a hyper-exponential distribution,

s = 3

, and for a negative exponential distribution,

s = 2

.

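χ^{2} = \sum \frac{{(\mathrm{obs} - \mathrm{exp})}^{2}}{\mathrm{exp}}

(A1)

where obs and exp denote the observed and expected (hypothetical) counts in each histogram bin, and the sum runs over the K bins.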

Finally, if

χ^{2} < χ_{t a b l e}^{2}

, the real distribution follows approximately the hypothetical distribution.

Appendix A.2. Kolmogorov–Smirnov

To assess the goodness of fit of the proposed distribution to the observed data, the Kolmogorov–Smirnov (KS) test is performed. The cumulative distribution function (CDF) of the hypothetical distribution, denoted as

F (x)

, is compared to the empirical CDF of the observed data, denoted as

F_{n} (x)

, where n is the number of samples for each random variable.

First, the absolute maximum distance (

D_{\max}

) between

F (x)

and

F_{n} (x)

is calculated using Equation (A2). This represents the largest discrepancy between the two distributions.

The level of significance for the KS test is set to

1 %

, which is commonly used for tests with a large number of samples. A critical value, denoted as

D_{table}

, is obtained from a table (e.g., Table 31 in [54]). If

D_{\max}

is less than

D_{table}

, the hypothesis that the observed data follows the proposed distribution is accepted.

In summary, if

D_{\max} < D_{table}

, the goodness of fit test indicates that the proposed distribution is a suitable approximation for the observed data.

D_{\max} = \max_{x} | F (x) - F_{n} (x) |

(A2)

Appendix B. Root Mean Square Deviation

Root mean square error (RMSE), also called root mean square deviation (RMSD), is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. Root mean square error is commonly used in monitoring animals, farming, climatology, forecasting, and regression analysis to verify experimental results. The formula can be written as (A3) [58].

RMSE_{f_{i}} = \sqrt{\sum_{i = 1}^{N} \frac{{(f_{i} - o_{i})}^{2}}{N}}

(A3)

where

f_{i}

are the forecasts (expected values or unknown results),

o_{i}

the observed values (known results), and N the sample size for

i = 1, 2, 3, \dots, N

.
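Equation (A3) translates directly into code. The sketch below is a plain Python version; the function name `rmse` is hypothetical, and the inputs are assumed to be paired sequences of forecasts $f_{i}$ and observations $o_{i}$ of equal length.

```python
import math


def rmse(forecasts, observations):
    """Root mean square error between paired forecasts f_i and observations o_i."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)
```

In the context of Section 5.1, the forecasts would be the $P_{in}$ values from the virtual trajectories and the observations the $P_{in}$ values from the real trajectories.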

Appendix C. Normal, Log-Normal, and Pareto Distributions

Each of these distributions captures specific characteristics of data and phenomena and is used in various fields for modeling and analysis, for example, predicting individuals’ behavior and random walk processes.

Appendix C.1. Normal Distribution

The normal distribution, also known as the Gaussian distribution or the bell curve, is a common probability distribution that describes many natural phenomena. In this distribution, data tend to cluster around a central value, with most values near the mean and fewer values farther away. The shape of the distribution is symmetric and forms a bell-shaped curve. Many real-world observations, such as heights, weights, and test scores, follow this distribution [59]. The probability density function is described by Equation (A4) and it is characterized by two parameters, the mean

μ

and the standard deviation

σ

.

f (x | μ, σ) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{{(x - μ)}^{2}}{2 σ^{2}}}

(A4)

Appendix C.2. Log-Normal Distribution

The log-normal distribution, given by Equation (A5), captures positively skewed data. It models multiplicative processes and is applied to domains where relative changes are of interest, such as economics and biology [60].

f (x | μ, σ) = \frac{1}{x σ \sqrt{2 π}} e^{- \frac{{(\ln (x) - μ)}^{2}}{2 σ^{2}}}

(A5)

Appendix C.3. Pareto Distribution

The Pareto distribution, given by Equation (A6), reflects situations where a minority contributes to the majority of effects. Defined by the shape parameter $α_{Pareto}$ and the scale parameter $θ_{Pareto}$, it is utilized to understand wealth distribution, network connectivity, and other phenomena with heavy-tailed behaviors [61].

f (x | α_{Pareto}, θ_{Pareto}) = \frac{α_{Pareto} \, θ_{Pareto}^{α_{Pareto}}}{x^{α_{Pareto} + 1}}, \quad x \geq θ_{Pareto}

(A6)
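For reference, the three densities of this appendix can be evaluated with the short Python sketch below. The function names are hypothetical, and the Pareto scale parameter is passed as `scale`; the densities are taken as zero outside their support, which is a standard convention but is not spelled out in the equations.

```python
import math


def normal_pdf(x, mu, sigma):
    """Equation (A4): normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Equation (A5): log-normal density, zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def pareto_pdf(x, alpha, scale):
    """Equation (A6): Pareto density, zero below the scale parameter."""
    if x < scale:
        return 0.0
    return alpha * scale ** alpha / x ** (alpha + 1)
```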

References

1. Zuerl, M.; Stoll, P.; Brehm, I.; Raab, R.; Zanca, D.; Kabri, S.; Happold, J.; Nille, H.; Prechtel, K.; Wuensch, S.; et al. Automated video-based analysis framework for behavior monitoring of individual animals in zoos using deep learning—A study on polar bears. Animals 2022, 12, 692.
2. McClintock, B.T.; Abrahms, B.; Chandler, R.B.; Conn, P.B.; Converse, S.J.; Emmet, R.L.; Gardner, B.; Hostetter, N.J.; Johnson, D.S. An integrated path for spatial capture–recapture and animal movement modeling. Ecology 2022, 103, e3473.
3. Nadimi, E.S.; Jørgensen, R.N.; Blanes-Vidal, V.; Christensen, S. Monitoring and classifying animal behavior using ZigBee-based mobile ad hoc wireless sensor networks and artificial neural networks. Comput. Electron. Agric. 2012, 82, 44–54.
4. Warburton, K.; Lazarus, J. Tendency-distance models of social cohesion in animal groups. J. Theor. Biol. 1991, 150, 473–488.
5. Witmer, G.W. Wildlife population monitoring: Some practical considerations. Wildl. Res. 2005, 32, 259–263.
6. Ergunsah, S.; Tümen, V.; Kosunalp, S.; Demir, K. Energy-efficient animal tracking with multi-unmanned aerial vehicle path planning using reinforcement learning and wireless sensor networks. Concurr. Comput. Pract. Exp. 2023, 35, e7527.
7. Vera-Amaro, R.; Rivero-Angeles, M.; Luviano-Juarez, A. Design and analysis of wireless sensor networks for animal tracking in large monitoring polar regions using phase-type distributions and single sensor model. IEEE Access 2019, 7, 45911–45929.
8. Sadeghi, E.; Kappers, C.; Chiumento, A.; Derks, M.; Havinga, P. Improving piglets health and well-being: A review of piglets health indicators and related sensing technologies. Smart Agric. Technol. 2023, 5, 100246.
9. Mennill, D.J.; Battiston, M.; Wilson, D.R.; Foote, J.R.; Doucet, S.M. Field test of an affordable, portable, wireless microphone array for spatial monitoring of animal ecology and behaviour. Methods Ecol. Evol. 2012, 3, 704–712.
10. Prosekov, A.; Kuznetsov, A.; Rada, A.; Ivanova, S. Methods for monitoring large terrestrial animals in the wild. Forests 2020, 11, 808.
11. Handcock, R.N.; Swain, D.L.; Bishop-Hurley, G.J.; Patison, K.P.; Wark, T.; Valencia, P.; Corke, P.; O’Neill, C.J. Monitoring animal behaviour and environmental interactions using wireless sensor networks, GPS collars and satellite remote sensing. Sensors 2009, 9, 3586–3603.
12. Markovic, B.; Nedic, D.; Minic, S. ICT systems for monitoring and protection of wildlife in their natural environment. Vet. J. Repub. Srp. 2018, 18, 132–181.
13. Birdlife International. BirdLife International (2021) Species Factsheet: Clangula Hyemalis. 2021. Available online: http://www.birdlife.org (accessed on 25 October 2021).
14. International Society for Endangered Cats Canada (ISEC). ISEC Ocelot. Online, 2021. Available online: https://wildcatconservation.org/wild-cats/south-america/ocelot/ (accessed on 25 October 2021).
15. Vera-Amaro, R.; Rivero-Ángeles, M.E.; Luviano-Juárez, A. Data collection schemes for animal monitoring using WSNs-assisted by UAVs: WSNs-oriented or UAV-oriented. Sensors 2020, 20, 262.
16. Burman, K.S.; Schmidt, S.; El Houssaini, D.; Kanoun, O. Design and Evaluation of a Low Energy Bluetooth Sensor Node for Animal Monitoring. In Proceedings of the 2021 18th International Multi-Conference on Systems, Signals & Devices (SSD), Monastir, Tunisia, 22–25 March 2021; IEEE: New York, NY, USA, 2021; pp. 971–978.
17. Al-Quayed, F.; Soudani, A.; Al-Ahmadi, S. Lightweight feature extraction method for efficient acoustic-based animal recognition in wireless acoustic sensor networks. EURASIP J. Wirel. Commun. Netw. 2020, 2020, 256.
18. Waser, P. Monthly variations in feeding and activity patterns of the mangabey, Cercocebus albigena (Lydekker). Afr. J. Ecol. 1975, 13, 249–263.
19. Gurarie, E.; Andrews, R.D.; Laidre, K.L. A novel method for identifying behavioural changes in animal movement data. Ecol. Lett. 2009, 12, 395–408.
20. Cagnacci, F.; Focardi, S.; Ghisla, A.; Van Moorter, B.; Merrill, E.H.; Gurarie, E.; Heurich, M.; Mysterud, A.; Linnell, J.; Panzacchi, M.; et al. How many routes lead to migration? Comparison of methods to assess and characterize migratory movements. J. Anim. Ecol. 2016, 85, 54–68.
21. Baig z, T.; Shastry, C. Design of WSN Model with NS2 for Animal Tracking and Monitoring. Procedia Comput. Sci. 2023, 218, 2563–2574.
22. Dorathy, I.; Chandrasekaran, M. Simulation tools for mobile ad hoc networks: A survey. J. Appl. Res. Technol. 2018, 16, 437–445.
23. Patterson, T.A.; Parton, A.; Langrock, R.; Blackwell, P.G.; Thomas, L.; King, R. Statistical modelling of individual animal movement: An overview of key methods and a discussion of practical challenges. AStA Adv. Stat. Anal. 2017, 101, 399–438.
24. Teimouri, M.; Indahl, U.G.; Sickel, H.; Tveite, H. Deriving animal movement behaviors using movement parameters extracted from location data. ISPRS Int. J. Geo-Inf. 2018, 7, 78.
25. Silva, I.; Crane, M.; Suwanwaree, P.; Strine, C.; Goode, M. Using dynamic Brownian Bridge Movement Models to identify home range size and movement patterns in king cobras. PLoS ONE 2018, 13, e0203449.
26. Pollock, K.H.; Nichols, J.D.; Simons, T.R.; Farnsworth, G.L.; Bailey, L.L.; Sauer, J.R. Large scale wildlife monitoring studies: Statistical methods for design and analysis. Environ. Off. Int. Environ. Soc. 2002, 13, 105–119.
27. Moritz, M.; Galehouse, Z.; Hao, Q.; Garabed, R.B. Can one animal represent an entire herd? Modeling pastoral mobility using GPS/GIS technology. Hum. Ecol. 2012, 40, 623–630.
28. Srokowski, T. Random walk in nonhomogeneous environments: A possible approach to human and animal mobility. Phys. Rev. E 2017, 95, 032133.
29. Nolan, R.; Welsh, A.; Geary, M.; Hartley, M.; Dempsey, A.; Mono, J.; Osei, D.; Stanley, C. Camera Traps Confirm the Presence of the White-naped Mangabey Cercocebus lunulatus in Cape Three Points Forest Reserve, Western Ghana. Primate Conserv. 2019, 33, 37–41.
30. Pifworld. Over Batlife Ghana. Online, 2021. Available online: https://www.pifworld.com/en/nonprofits/zFNriU5NsO4/batlife-ghana/about (accessed on 25 October 2021).
31. Tilles, P.F.; Petrovskii, S.V.; Natti, P.L. A random walk description of individual animal movement accounting for periods of rest. R. Soc. Open Sci. 2016, 3, 160566.
32. Karunanithy, K.; Velusamy, B. An efficient data collection using wireless sensor networks and internet of things to monitor the wild animals in the reserved area. Peer-to-Peer Netw. Appl. 2022, 15, 1105–1125.
33. Wang, G. Machine learning for inferring animal behavior from location and movement data. Ecol. Inform. 2019, 49, 69–76.
34. Wijeyakulasuriya, D.A.; Eisenhauer, E.W.; Shaby, B.A.; Hanks, E.M. Machine learning for modeling animal movement. PLoS ONE 2020, 15, e0235750.
35. Torney, C.J.; Morales, J.M.; Husmeier, D. A hierarchical machine learning framework for the analysis of large scale animal movement data. Mov. Ecol. 2021, 9, 6.
36. Yu, X.; Yang, X.; Tan, Q.; Shan, C.; Lv, Z. An edge computing based anomaly detection method in IoT industrial sustainability. Appl. Soft Comput. 2022, 128, 109486.
37. Tian, Z.; Luo, C.; Lu, H.; Su, S.; Sun, Y.; Zhang, M. User and entity behavior analysis under urban big data. ACM Trans. Data Sci. 2020, 1, 1–19.
38. Tian, Z.; Su, S.; Shi, W.; Du, X.; Guizani, M.; Yu, X. A data-driven method for future Internet route decision modeling. Future Gener. Comput. Syst. 2019, 95, 212–220.
39. Jonsen, I.; McMahon, C.; Patterson, T.; Auger-Methe, M.; Harcourt, R.; Hindell, M.; Bestley, S. Movement behaviour responses to environment: Fast inference of individual variation among southern elephant seals with a mixed effects model. Ecology 2019, 100, e02566.
40. Unterfinger, M. 3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk. Master’s Thesis, University of Zurich, Zurich, Switzerland, 2018.
41. Edelhoff, H.; Signer, J.; Balkenhol, N. Path segmentation for beginners: An overview of current methods for detecting changes in animal movement patterns. Mov. Ecol. 2016, 4, 21.
42. Calenge, C.; Dray, S.; Royer-Carenzi, M. The concept of animals’ trajectories from a data analysis perspective. Ecol. Inform. 2009, 4, 34–41.
43. Gutenkunst, R.; Newlands, N.; Lutcavage, M.; Edelstein-Keshet, L. Inferring resource distributions from Atlantic bluefin tuna movements: An analysis based on net displacement and length of track. J. Theor. Biol. 2007, 245, 243–257.
44. Fauchald, P.; Tveraa, T. Using first-passage time in the analysis of area-restricted search and habitat selection. Ecology 2003, 84, 282–288.
45. Dodge, S.; Weibel, R.; Lautenschütz, A.K. Towards a taxonomy of movement patterns. Inf. Vis. 2008, 7, 240–252.
46. Petersen, M.R.; McCafferey, B.; Flint, P.L. Post-breeding distribution of long-tailed ducks Clangula hyemalis from the Yukon-Kuskokwim Delta, Alaska. Wildfowl 2013, 54, 103–113.
47. Moreno, R.; Kays, R.; Giacalone-Willis, J.; Aliaga-Rossel, E.; Mares, R.; Bustamante, A. Ámbito de hogar y actividad circadiana del ocelote (Leopardus pardalis) en la isla de Barro Colorado, Panamá. Mesoamericana 2012, 16, 30–39.
48. Sapir, N.; Horvitz, N.; Dechmann, D.K.; Fahr, J.; Wikelski, M. Commuting fruit bats beneficially modulate their flight in relation to wind. Proc. R. Soc. B Biol. Sci. 2014, 281, 20140018.
49. Wikelski, M.; Kays, R. Movebank: Archive, Analysis and Sharing of Animal Movement Data. World Wide Web Electronic Publication. 2019. Available online: http://www.movebank.org (accessed on 8 May 2023).
50. Marsh, L.; Jones, R. The form and consequences of random walk movement models. J. Theor. Biol. 1988, 133, 113–131.
51. Walck, C. Hand-Book on Statistical Distributions for Experimentalists; University of Stockholm: Stockholm, Sweden, 2007; Volume 10.
52. Balakrishnan, N.; Voinov, V.; Nikulin, M.S. Chi-Squared Goodness of Fit Tests with Applications; Academic Press: Cambridge, MA, USA, 2013.
53. Stephens, M.A. Introduction to Kolmogorov (1933) on the empirical determination of a distribution. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 93–105.
54. Molin, P.; Abdi, H. New Tables and Numerical Approximation for the Kolmogorov-Smirnov/Lillierfors/Van Soest Test of Normality; Technical report; University of Bourgogne: Erasme, France, 1998; Available online: https://personal.utdallas.edu/~herve/MolinAbdi1998-LillieforsTechReport.pdf (accessed on 15 March 2023).
55. Moore, D.S. Introduction to the Practice of Statistics; WH Freeman and Company: New York City, NY, USA, 2009.
56. Maier, R.S. Phase-type distributions and the structure of finite Markov chains. J. Comput. Appl. Math. 1993, 46, 449–453.
57. L’Ecuyer, P. Uniform random number generation. Ann. Oper. Res. 1994, 53, 77–120.
58. Barnston, A.G. Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Weather. Forecast. 1992, 7, 699–709.
59. Fisher, R.A.; Tippett, L.H.C. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1928; Volume 24, pp. 180–190.
60. Aitchison, J.; Brown, J.A.C. The Lognormal Distribution, with Special Reference to Its Uses in Economics; Wiley Online Library: Hoboken, NJ, USA, 1969.
61. Clauset, A.; Shalizi, C.R.; Newman, M.E. Power-law distributions in empirical data. SIAM Rev. 2009, 51, 661–703.

Figure 1. WSN for animal monitoring from a base station using cluster heads and cluster members.

Figure 2. Real trajectories in Google Maps. (a) Mangabey monkey, (b) ocelot, (c) bat, (d) long-tailed duck.

Figure 3. Random walk examples inside the A_{obs} (snapshots of the simulation trajectory animations, captured in real time while they are still in progress and have not yet concluded). (a) Monkey virtual trajectory, (b) ocelot virtual trajectory, (c) bat virtual trajectory, (d) long-tailed duck virtual trajectory.

Figure 4. Flow diagram of the general procedure.

Figure 5. Trajectories model for relative angles β.

Figure 6. Histograms of the RVs for each species. (a) Mangabey monkey, (b) ocelot, (c) bat, (d) long-tailed duck.

Figure 7. Probability density functions of the RVs for each species. (a) Mangabey monkey, (b) ocelot, (c) bat, (d) long-tailed duck.

Figure 8. Hypothetical and real PDFs and CDFs and Q–Q plots for the mangabey monkey. (a) PDF comparison, (b) CDF comparison, (c) Q–Q plot; the red and blue lines are the theoretical and practical quantiles, respectively.

Figure 9. Hypothetical and real PDFs and CDFs and Q–Q plots for the ocelot. (a) PDF comparison, (b) CDF comparison, (c) Q–Q plot; the red and blue lines are the theoretical and practical quantiles, respectively.

Figure 10. Hypothetical and real PDFs and CDFs and Q–Q plots for the bat. (a) PDF comparison, (b) CDF comparison, (c) Q–Q plot; the red and blue lines are the theoretical and practical quantiles, respectively.

Figure 11. Hypothetical and real PDFs and CDFs and Q–Q plots for the long-tailed duck. (a) PDF comparison, (b) CDF comparison, (c) Q–Q plot; the red and blue lines are the theoretical and practical quantiles, respectively.

Figure 12. Hypothetical and real PDFs and CDFs and Q–Q plots for the normal, log-normal and Pareto distributions. (a) PDF comparison of the other distributions, (b) CDF comparison, (c) Q–Q plot; the red and blue lines are the theoretical and practical quantiles, respectively.

Figure 13. Simulation example of the mangabey monkey random walk inside and outside a radio coverage region R_{cov}. Green and red dots are the start and end points, respectively, of the random walk.

Figure 14. P_{in} and P_{out} of the real and simulated traces. (a) P_{in} and P_{out} for the mangabey monkey, (b) P_{in} and P_{out} for the ocelot, (c) P_{in} and P_{out} for the bat, (d) P_{in} and P_{out} for the duck.

Table 1. Statistical parameters of the RVs.

Random Variable	Animal Species	$CoV$	$σ_{X}$	$E_{X}$
$T_{fix}$	Mangabey monkey	$3.86$	$1.22 \times 10^{4}$	$6.10 \times 10^{3}$
$T_{fix}$	Ocelot	$3.44$	$1.48 \times 10^{4}$	$4.32 \times 10^{3}$
$T_{fix}$	Bat	$2.95$	$5.14 \times 10^{3}$	$1.74 \times 10^{3}$
$T_{fix}$	Long-tailed duck	$2.91$	$1.36 \times 10^{4}$	$4.66 \times 10^{3}$
$T_{mov}$	Mangabey monkey	$4.42$	$3.26 \times 10^{5}$	$7.39 \times 10^{4}$
$T_{mov}$	Ocelot	$1.93$	$1.04 \times 10^{5}$	$5.37 \times 10^{4}$
$T_{mov}$	Bat	$4.61$	$4.48 \times 10^{3}$	$9.72 \times 10^{2}$
$T_{mov}$	Long-tailed duck	$2.11$	$2.5 \times 10^{4}$	$1.18 \times 10^{4}$
$A_{traj}$	Mangabey monkey	$0.92$	$91.79$	$99.16$
$A_{traj}$	Ocelot	$0.65$	$79.23$	$121$
$A_{traj}$	Bat	$1.28$	$84.57$	$65.98$
$A_{traj}$	Long-tailed duck	$0.53$	$76.76$	$142.60$
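
The coefficient of variation in Table 1 is the ratio of the standard deviation to the mean, and it can guide the choice of phase-type family. The Python sketch below assumes a common rule of thumb (CoV above 1 suggests a hyper-exponential, CoV below 1 suggests an Erlang); this rule appears consistent with the A_{traj} assignments in Table 7, but it is an assumption here rather than a statement of the exact selection procedure. The function names are hypothetical, and the input values are copied from the A_{traj} rows of Table 1.

```python
def coefficient_of_variation(std_dev, mean):
    """CoV = sigma_X / E_X."""
    return std_dev / mean


def suggest_family(cov):
    """Rule-of-thumb family choice (an assumption, not the paper's stated procedure)."""
    if cov > 1.0:
        return "Hyper-exponential"
    if cov < 1.0:
        return "Erlang"
    return "Exponential"


# (sigma_X, E_X) for A_traj, copied from Table 1.
A_TRAJ = {
    "Mangabey monkey": (91.79, 99.16),
    "Ocelot": (79.23, 121.0),
    "Bat": (84.57, 65.98),
    "Long-tailed duck": (76.76, 142.60),
}

if __name__ == "__main__":
    for species, (sigma, mean) in A_TRAJ.items():
        cov = coefficient_of_variation(sigma, mean)
        print(f"{species}: CoV = {cov:.3f} -> {suggest_family(cov)}")
```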

Table 2. Parameters from the random variables.

(a) Mangabey Monkey
Random Variable	$p$	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	$k$
$t_{fix}$	$0.01247$	$2 \times 10^{-5}$	$4 \times 10^{-4}$	-	-	-
$t_{mov}$	$0.021713$	$1 \times 10^{-6}$	$5 \times 10^{-5}$	-	-	-
$A_{traj}$	-	-	-	-	$0.0100$	$1$
(b) Ocelot
Random Variable	$p$	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	$k$
$t_{fix}$	$0.2484$	$1 \times 10^{-4}$	$3 \times 10^{-3}$	-	-	-
$t_{mov}$	$0.1103$	$3 \times 10^{-6}$	$5 \times 10^{-5}$	-	-	-
$A_{traj}$	-	-	-	-	$0.01650$	$2$
(c) Bat
Random Variable	$p$	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	$k$
$t_{fix}$	$0.03352$	$5 \times 10^{-5}$	$9 \times 10^{-4}$	-	-	-
$t_{mov}$	$0.02275$	$4.7 \times 10^{-5}$	$2 \times 10^{-3}$	-	-	-
$A_{traj}$	$0.6355$	$0.01$	$0.15$	-	-	-
(d) Long-Tailed Duck
Random Variable	$p$	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	$k$
$t_{fix}$	$0.0137$	$1 \times 10^{-5}$	$3 \times 10^{-4}$	-	-	-
$t_{mov}$	$0.08980$	$9 \times 10^{-6}$	$5 \times 10^{-4}$	-	-	-
$A_{traj}$	-	-	-	-	$0.0280$	$4$
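
To illustrate how the parameters in Table 2 can drive a synthetic trajectory generator, the following Python sketch draws samples from a two-phase hyper-exponential and from an Erlang distribution using only exponential variates. It assumes that p is the probability of selecting the phase with rate λ_1 (with 1 − p for λ_2) and that the Erlang distribution has k phases of rate λ_4; under these assumptions the theoretical means for the bat and ocelot A_{traj} are close to the E_X values in Table 1. The function names and the seed are hypothetical.

```python
import random


def sample_hyperexp(rng, p, lam1, lam2):
    """Two-phase hyper-exponential: rate lam1 with probability p, otherwise lam2."""
    rate = lam1 if rng.random() < p else lam2
    return rng.expovariate(rate)


def sample_erlang(rng, k, lam):
    """Erlang-k: sum of k independent exponential phases, each with rate lam."""
    return sum(rng.expovariate(lam) for _ in range(k))


def hyperexp_mean(p, lam1, lam2):
    """Theoretical mean p/lam1 + (1 - p)/lam2."""
    return p / lam1 + (1.0 - p) / lam2


def erlang_mean(k, lam):
    """Theoretical mean k/lam."""
    return k / lam


def sample_mean(values):
    return sum(values) / len(values)


rng = random.Random(2023)  # fixed seed so that runs are repeatable

# Bat A_traj (Table 2c): p = 0.6355, lambda_1 = 0.01, lambda_2 = 0.15.
bat_a_traj = [sample_hyperexp(rng, 0.6355, 0.01, 0.15) for _ in range(100_000)]

# Ocelot A_traj (Table 2b): lambda_4 = 0.0165, k = 2.
ocelot_a_traj = [sample_erlang(rng, 2, 0.0165) for _ in range(100_000)]

if __name__ == "__main__":
    print(sample_mean(bat_a_traj), hyperexp_mean(0.6355, 0.01, 0.15))
    print(sample_mean(ocelot_a_traj), erlang_mean(2, 0.0165))
```

Because both samplers are built from exponential draws alone, a generator of this kind needs only a uniform random number source.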

Table 3. Goodness of fit parameters for mangabey monkey.

Random Variable	$χ^{2}$	$χ_{table}^{2}$	$D_{\max}$	$D_{table}$
$T_{fix}$	$12.6185$	$39.0882$	$0.2735$	$0.3651$
$T_{mov}$	$0.6613$	$39.0882$	$0.0443$	$0.3651$
$A_{traj}$	$0.1163$	$62.9489$	$0.0509$	$0.2828$
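
The entries in Table 3 can be read with the usual decision rule for these tests: a hypothesized distribution is not rejected when the computed statistic is below its tabulated critical value. The short Python sketch below applies that rule to the mangabey monkey values; the rule as coded and the names used are assumptions made for illustration, and the numbers are copied from Table 3.

```python
def not_rejected(statistic, critical_value):
    """True when the statistic is below the tabulated critical value."""
    return statistic < critical_value


# (chi2, chi2_table, D_max, D_table) copied from Table 3 (mangabey monkey).
TABLE_3 = {
    "T_fix": (12.6185, 39.0882, 0.2735, 0.3651),
    "T_mov": (0.6613, 39.0882, 0.0443, 0.3651),
    "A_traj": (0.1163, 62.9489, 0.0509, 0.2828),
}


def evaluate(table):
    """Return, per random variable, the (chi-squared, Kolmogorov-Smirnov) outcomes."""
    return {
        rv: (not_rejected(chi2, chi2_crit), not_rejected(d_max, d_crit))
        for rv, (chi2, chi2_crit, d_max, d_crit) in table.items()
    }


if __name__ == "__main__":
    for rv, (chi2_ok, ks_ok) in evaluate(TABLE_3).items():
        print(f"{rv}: chi-squared passed = {chi2_ok}, K-S passed = {ks_ok}")
```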

Table 4. Goodness of fit parameters for ocelot.

Random Variable	$χ^{2}$	$χ_{table}^{2}$	$D_{\max}$	$D_{table}$
$T_{fix}$	$3.0698$	$39.0882$	$0.0925$	$0.2108$
$T_{mov}$	$1.4118$	$39.0882$	$0.3651$	$0.1153$
$A_{traj}$	$0.5637$	$62.9489$	$0.0855$	$0.0970$

Table 5. Goodness of fit parameters for bats from Ghana.

Random Variable	$χ^{2}$	$χ_{table}^{2}$	$D_{\max}$	$D_{table}$
$T_{fix}$	$2.1452$	$62.9489$	$0.1516$	$0.2828$
$T_{mov}$	$2.1476$	$39.0882$	$0.1011$	$0.1050$
$A_{traj}$	$2.6301$	$62.9489$	$0.2284$	$0.2828$

Table 6. Goodness of fit parameters for long-tailed duck.

Random Variable	$χ^{2}$	$χ_{table}^{2}$	$D_{\max}$	$D_{table}$
$T_{fix}$	$5.7931$	$39.0882$	$0.1245$	$0.3651$
$T_{mov}$	$3.7237$	$39.0882$	$0.1425$	$0.3651$
$A_{traj}$	$14.7590$	$62.9489$	$0.1337$	$0.2828$

Table 7. Results of the goodness of fit tests.

Animal Species	$t_{fix}$	$t_{mov}$	$A_{traj}$
Mangabey monkey from Uganda	Hyper-exponential	Hyper-exponential	Erlang
Ocelot from Barro Colorado	Hyper-exponential	Hyper-exponential	Erlang
Bat from Ghana	Hyper-exponential	Hyper-exponential	Hyper-exponential
Arctic long-tailed duck	Hyper-exponential	Hyper-exponential	Erlang

Table 8. Coefficient of variation for other hypothetical distributions.

Random Variable	Animal Species	$CoV$	Hypothesis
$A_{traj}$	Ocelot	$0.65$	Normal ($CoV \approx 0$)
$A_{traj}$	Mangabey monkey	$0.92$	Log-normal ($CoV \approx 1$)
$T_{mov}$	Mangabey monkey	$4.42$	Pareto ($CoV \geq 1$)

Table 9. Goodness of fit test results for other distributions.

Random Variable	Animal Species	Chi-Squared	Kolmogorov–Smirnov	Q–Q Plot
$A_{traj}$	Ocelot	RV follows a normal distribution.	RV does not follow a normal distribution.	Good match in the center.
$A_{traj}$	Mangabey monkey	RV follows a log-normal distribution.	RV does not follow a log-normal distribution.	Poor match.
$T_{mov}$	Mangabey monkey	RV does not follow a Pareto distribution.	RV follows a Pareto distribution.	Excellent match.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
