In this section, we provide a detailed description of the animal trajectory model along with the assumptions made. We consider a realistic scenario where animals move with random trajectories that are characterized and approximated using phase-type distributions.
Multiple virtual trajectories will be generated, possessing the same (or similar) statistical properties as the real animal paths. These virtual traces allow for the evaluation of the monitoring system performance without the need to collect large amounts of scarce and difficult-to-obtain data.
We compare the statistical characteristics of the virtual traces generated by the proposed random walk model against those of the real traces. If the virtual traces demonstrate similar characteristics, they can serve as alternatives to the real traces.
Based on the available data, we analyze different observation areas, depending on the animals’ movement patterns. Specifically, an area ranging from 20,000 × 20,000 m to 200,000 × 200,000 m is selected for the monkey, ocelot, and bat. For the long-tailed duck, an area ranging from 1,000,000 × 1,000,000 m to 9,000,000 × 9,000,000 m is chosen, as this species does not venture beyond this range in its natural habitat [49].
Using a model selection method introduces the risk that a distribution other than a phase-type one is chosen, which would forfeit the advantages of using Markov chains. Furthermore, it is not possible to measure or detect whether the distribution was poorly selected, since our aim is to approximate the random walk behavior through the . It is possible that a distribution other than phase-type provides a better approximation of the random walk, but there is no inherently incorrect selection. Our objective is not to identify the optimal distribution through distribution selection methods, but rather to approximate the observed data using Markov chains with phase-type distributions.
For an exponential PDF, even if the experimental results do not have a coefficient of variation () of exactly 1, the exponential distribution can still be employed for that are close to 1. This can lead to a good approximation, which can be verified through goodness of fit tests. In fact, if the is sufficiently close to 1, the exponential distribution may provide a better fit compared to the Erlang or hyper-exponential distributions.
Finally, the Abbreviations section presents the most significant variables utilized in this manuscript.
3.1. Trajectory Model Using Phase-Type Distributions
Here, the phase-type PDF approximation to the animal movement is explained in detail.
First, ten to twenty real trajectories were selected, each consisting of five hundred to one thousand GPS samples, extracted and calculated from the database. The selection of the number of trajectories and the sample size was based on the behavior of the species and the techniques employed by biologists to collect samples from them. For instance, ocelots are solitary animals that live alone for most of their lives, as described in [47], while mangabey monkeys, bats from Ghana, and long-tailed ducks live and move in groups. Therefore, monitoring a single animal over a period of time is sufficient to characterize their movement, as stated in [18,32,46,48].
Similar values were observed among the samples from each species. Consequently, the average of the minimum and maximum travel distance ( and , respectively), maximum rest time (), maximum movement time (), and the average speed () were calculated.
Using these parameters, we can calculate the time the animals spend in a static state (
), the time they spend in motion (
), and the trajectory angles (
). These parameters are represented as random variables (RVs) and they characterize the trajectories followed by each animal. In general, the parameter
can be obtained using Equation (
1).
The relative angles
are obtained using the alpha angles
with respect to the horizontal axis, as mentioned in [
50]. Here,
represents the number of samples from the database. An example illustrating the process of obtaining the movement characterization is presented in
Figure 5. This model includes the initial static distance (
), fixed time (
), and initial location (
,
). When the animal moves to the first waypoint (
,
), it has an initial time (
), initial distance (
), and initial angle (
). Subsequently, at the second waypoint (
,
), the animal rests with a fixed distance
and spends a rest time
. Finally, the animal moves towards the third waypoint (
,
), with a relative angle
, a movement distance of
, and a movement time of
.
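The alternating rest and movement process described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_virtual_trace`, the sampler callables, the constant `speed`, and the absence of any boundary handling for the observation area are all hypothetical choices.

```python
import math
import random

def generate_virtual_trace(n_steps, sample_rest, sample_move, sample_angle,
                           speed, start=(0.0, 0.0), rng=None):
    """Alternate rest and movement phases; return a list of (t, x, y) waypoints.

    sample_rest, sample_move and sample_angle are hypothetical callables that
    take a random generator and return a rest time (s), a movement time (s)
    and a relative turning angle (rad), respectively.
    """
    rng = rng or random.Random()
    t, (x, y) = 0.0, start
    heading = 0.0  # assumed initial heading along the horizontal axis
    trace = [(t, x, y)]
    for _ in range(n_steps):
        # Rest phase: time advances, position stays fixed.
        t += sample_rest(rng)
        trace.append((t, x, y))
        # Movement phase: turn by the relative angle, then move at constant speed.
        heading = (heading + sample_angle(rng)) % (2.0 * math.pi)
        move_time = sample_move(rng)
        distance = speed * move_time
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        t += move_time
        trace.append((t, x, y))
    return trace
```

Any sampler can be plugged in, for instance draws from the phase-type distributions fitted to each random variable, so the same loop serves all four species.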
The random variables used in the proposed model are described as follows:
Resting time (): Time at resting state. If the animal walks less than 5 m, it is assumed to be in a state of rest.
Moving time (): The time that the animal spends in motion.
Angle (): The direction in which the animal is moving.
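To make these three definitions concrete, the sketch below extracts rest times, movement times, and relative angles from a sampled trace. It assumes planar coordinates in metres and timestamps in seconds (raw latitude and longitude would need projecting first); the function name and the way consecutive steps are merged into runs are illustrative assumptions, while the 5 m rest threshold comes from the definition above.

```python
import math

REST_THRESHOLD_M = 5.0  # from the text: steps shorter than 5 m count as rest

def characterize_trace(samples, rest_threshold=REST_THRESHOLD_M):
    """Split a trace into rest times, movement times and relative angles.

    samples: list of (t, x, y) tuples, t in seconds, x and y in metres.
    Consecutive steps in the same state are merged into a single run.
    """
    rest_times, move_times, angles = [], [], []
    state, run = None, 0.0
    prev_heading = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dx, dy = x1 - x0, y1 - y0
        step_state = "rest" if math.hypot(dx, dy) < rest_threshold else "move"
        if state is not None and step_state != state:
            # The state changed: close the current run and start a new one.
            (rest_times if state == "rest" else move_times).append(run)
            run = 0.0
        state = step_state
        run += t1 - t0
        if step_state == "move":
            heading = math.atan2(dy, dx)  # angle with respect to the horizontal axis
            if prev_heading is not None:
                # Relative angle: difference of consecutive headings in [0, 2*pi).
                angles.append((heading - prev_heading) % (2.0 * math.pi))
            prev_heading = heading
    if state is not None:
        (rest_times if state == "rest" else move_times).append(run)
    return rest_times, move_times, angles
```

The three returned lists are the raw observations from which histograms and empirical PDFs can then be built.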
To analyze these variables, we first obtain the histograms for each animal, as depicted in
Figure 6a–d. From these histograms, we calculate the probability density function (PDF) for each variable, as shown in
Figure 7a–d.
Following this, the coefficient of variation (
) is computed for each random variable of each species using the mean and variance. This is achieved using Equation (
2)
where
is the mean of the random variable
X and
is the standard deviation.
Table 1 shows the calculated statistical parameters.
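As a small illustration of this computation, the coefficient of variation is the standard deviation divided by the mean. The sketch below assumes the population standard deviation is intended; using the sample standard deviation instead would change the result only slightly for traces of several hundred samples.

```python
import statistics

def coefficient_of_variation(values):
    """Return sigma / mu, using the population standard deviation (an assumption)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean
```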
Now, we propose to use phase-type distributions to model the RVs [51]. The rationale behind using these distributions, such as the Erlang and hyper-exponential distributions, is that they allow us to utilize a Markovian model, specifically Markov chains, to describe the movement of the animals. These distributions can be easily incorporated into tele-traffic analysis, which is commonly used for performance analysis in wireless sensor networks (WSNs) [7].
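The link to Markov chains can be seen in how these distributions are sampled: each draw is a passage through exponential phases, in series for the Erlang case and in parallel for the hyper-exponential case. The following sketch is illustrative only; the function names and the restriction to two hyper-exponential branches are assumptions.

```python
import random

def sample_erlang(k, rate, rng):
    """Erlang-k: traverse k exponential phases in series, all with the same rate."""
    return sum(rng.expovariate(rate) for _ in range(k))

def sample_hyperexponential(p, rate1, rate2, rng):
    """Two-branch hyper-exponential: enter one of two parallel exponential phases.

    With probability p the phase with rate1 is chosen, otherwise the one with rate2.
    """
    rate = rate1 if rng.random() < p else rate2
    return rng.expovariate(rate)
```

Because every phase is exponential, the time spent in each one is memoryless, which is what allows the overall process to be written as a Markov chain over phases.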
Remark 1. In future work, the use of phase-type distributions to calculate the average times and distances between sensors placed on the animals will be considered. This can help to mitigate the effect of areas without transceiver coverage caused by ground-based sensors. However, these studies are beyond the scope of the current contribution, as the main aim of this approach is to present the analytical tool for generating virtual traces of animals.
Hence, we define that RVs with
have a hyper-exponential PDF, RVs with
have an exponential PDF, and RVs with
have an Erlang PDF. The probability density functions (PDFs) of the random variables are calculated using Equations (
3)–(
5).
The parameter
p is obtained with (
6)
and
and
are proposed constants. The parameter
is calculated using (
7).
Finally, the parameter
is calculated using Equation (
8) for the negative exponential distribution,
where
k is an arbitrary number used to calibrate the PDF. None of the RVs obtained in this manuscript have a
(or follow an exponential distribution). However, we include it in case a different species actually has that coefficient of variation.
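The selection rule and the three PDFs can be sketched as follows. Two points are assumptions rather than the authors' method: the tolerance `tol` that decides when a coefficient of variation counts as "close to 1", and the parameter fits, which use the textbook two-moment matches (balanced means for the hyper-exponential case, a rounded shape parameter for the Erlang case) and are not necessarily the same as Equations (6)–(8).

```python
import math

def select_distribution(cov, tol=0.05):
    """Pick a phase-type family from the coefficient of variation (tol is assumed)."""
    if cov > 1.0 + tol:
        return "hyper-exponential"
    if cov < 1.0 - tol:
        return "erlang"
    return "exponential"

def fit_erlang(mean, cov):
    """Two-moment match: shape k close to 1/cov^2, rate chosen to preserve the mean."""
    k = max(1, round(1.0 / cov ** 2))
    return k, k / mean

def fit_hyperexponential(mean, cov):
    """Balanced-means two-moment match; valid only for cov > 1."""
    c2 = cov ** 2
    p = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    return p, 2.0 * p / mean, 2.0 * (1.0 - p) / mean

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x)

def erlang_pdf(x, k, rate):
    return rate ** k * x ** (k - 1) * math.exp(-rate * x) / math.factorial(k - 1)

def hyperexponential_pdf(x, p, rate1, rate2):
    return p * rate1 * math.exp(-rate1 * x) + (1.0 - p) * rate2 * math.exp(-rate2 * x)
```

The balanced-means fit reproduces both the mean and the coefficient of variation exactly, whereas the Erlang fit reproduces the mean exactly and the coefficient of variation only up to the rounding of the shape parameter.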
To validate these hypothetical distributions, two theoretical goodness of fit tests were used: the chi-squared (CS) test [
52] and the Kolmogorov–Smirnov (KS) test [
53]. These tests are used to verify if a random variable follows a certain PDF, as explained in
Appendix A.
For the CS test, we use the distribution of each random variable for each species. The significance level () for the test is set to (or ), as recommended for statistically significant tests with large sample sizes. A resulting probability indicates that the hypothetical PDF should be rejected, while means that the hypothesis is accepted. The degrees of freedom () are calculated as , where K is the number of histogram bins and s is the number of parameters of the PDF: for Erlang, for hyper-exponential, and for negative exponential. The same significance level is used for every test and for all four animals.
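A minimal sketch of the chi-squared computation follows. It assumes the usual convention that the degrees of freedom equal the number of bins minus the number of fitted parameters minus one, and it takes the critical value as an input from a chi-squared table rather than computing it. All names are illustrative.

```python
def chi_squared_statistic(observed_counts, bin_edges, cdf):
    """Pearson statistic comparing histogram counts with a hypothetical CDF.

    bin_edges has one more entry than observed_counts. Expected counts are
    n * (F(hi) - F(lo)), so the bins should cover essentially all of the
    probability mass of the hypothetical distribution.
    """
    n = sum(observed_counts)
    stat = 0.0
    for count, lo, hi in zip(observed_counts, bin_edges, bin_edges[1:]):
        expected = n * (cdf(hi) - cdf(lo))
        if expected == 0:
            if count > 0:
                return float("inf")  # data observed where the model has no mass
            continue
        stat += (count - expected) ** 2 / expected
    return stat

def degrees_of_freedom(n_bins, n_params):
    """Assumed convention: bins minus fitted parameters minus one."""
    return n_bins - n_params - 1

def hypothesis_accepted(stat, critical_value):
    """Accept when the statistic does not exceed the tabulated critical value."""
    return stat <= critical_value
```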
For the KS test, we assume that
,
, and
follow hypothetical cumulative distribution functions (CDFs). We obtain the observed CDFs of the random variables and calculate their absolute maximum distance or variation (
) using Equation (
A2), which is defined in
Appendix A. We also calculate
from [
54]. Similar to the chi-squared test, the significance level is set to
(or
) as is recommended for determining if the hypothesis distribution has sufficient statistical information to be accepted or rejected.
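The KS statistic can be sketched directly from its definition as the largest absolute gap between the empirical and hypothetical CDFs. The critical value below uses the common large-sample approximation instead of a tabulated value, which is an assumption; the significance level is left as a required argument.

```python
import math

def ks_statistic(samples, cdf):
    """Maximum absolute distance between the empirical CDF and a hypothetical CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_value(n, alpha):
    """Large-sample approximation: sqrt(-ln(alpha / 2) / 2) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
```

The hypothesis is retained when the statistic does not exceed the critical value for the chosen significance level and sample size.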
In addition, we employed an empirical approach that compares the quantiles of the proposed distribution with the quantiles of the sample data, that is, of the observed random variable values. This method is called the quantile–quantile plot (or Q–Q plot) and assesses whether two datasets have similar distributions by plotting the quantiles of one dataset against the quantiles of the other. If the points form a straight line, the distributions are similar. Curves indicate differences, with upward curves suggesting heavier tails and downward curves suggesting lighter tails. It is a visual tool to assess distribution match [
55].
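The points of a Q–Q plot can be computed as in the sketch below, which pairs each sorted sample with the theoretical quantile at the plotting position (i − 0.5)/n. That plotting position is one common choice and an assumption here. The exponential inverse CDF is included because it has a closed form; Erlang and hyper-exponential quantiles generally require numerical inversion of the CDF.

```python
import math

def qq_pairs(samples, inverse_cdf):
    """Return (theoretical quantile, sample quantile) pairs for a Q-Q plot."""
    xs = sorted(samples)
    n = len(xs)
    return [(inverse_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def exponential_inverse_cdf(rate):
    """Closed-form quantile function of the exponential distribution."""
    return lambda p: -math.log(1.0 - p) / rate
```

Plotting the pairs and comparing them with the identity line shows where the fit holds and where the tails diverge.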
Based on this, let us select and characterize the appropriate phase-type distribution for the RVs (, , and ) for each selected species.
3.1.1. Mangabey Monkey
According to the real trajectories obtained from the mangabey monkey, the parameters of the phase-type distributions are shown in
Table 2a, and the corresponding parameters are calculated using Equations (
6)–(
8).
Specifically, we observe that and have , indicating a hyper-exponential distribution, while has , indicating an Erlang distribution.
The level of significance for the
test is set to
, with
bins.
Figure 8a shows the comparison between the proposed and real distributions for the RVs.
The proposed level of significance for the Kolmogorov–Smirnov test (LSK) is
, and the value of
is calculated from
Appendix A.
Figure 8b shows the comparison between the proposed and real cumulative distributions. It can be observed that in
Figure 8b, parameter
D is the difference between the values of the proposed and real CDFs.
The value of
is compared with the
, and the value of
is compared to the value
to validate the hypothesis distribution. The calculated values of the parameters are shown in
Table 3. Hence,
follows a hyper-exponential,
follows a hyper-exponential, and
follows an Erlang distribution.
Finally, in
Figure 8c, we can observe that for lower theoretical quantiles, the sample quantiles of the random variable follow a straight line; however, this alignment degrades with larger values.
3.1.2. Ocelot from Barro Colorado
In the same manner as the mangabey monkey, for the ocelot or mottled leopard, the parameters of the PDFs of each RV were calculated using Equations (
6)–(
8) and adjusted from their respective
. In
Table 2b, the corresponding parameters for each phase-type distribution of each random variable are given.
In this case, and have , hence a hyper-exponential distribution is selected, and has a , so an Erlang distribution is selected.
For the chi-squared test,
Figure 9a presents the RV’s hypothetical and real distributions. For the case of the Kolmogorov–Smirnov, the values of
and
are compared.
Figure 9b shows the theoretical and real CDFs.
The parameters
,
,
, and
are calculated and presented in
Table 4. From these results, it can be proposed that
and
follow a hyper-exponential distribution, and
follows an Erlang distribution.
Then, in
Figure 9c, the Q–Q plots for the ocelot are presented, where the fit is better at the lower sample quantiles than at the larger ones.
3.1.3. Bat from Ghana
Now, for the bat from Ghana, the parameters of the proposed distributions are presented in
Table 2c. In this case,
,
, and
have
, so a hyper-exponential distribution is used. As opposed to the previous animals, all three variables follow a hyper-exponential distribution.
Based on the results obtained from the chi-squared test, with an LSC =
and its respective
with
, the proposed distributions accurately model the real traces. The PDFs corresponding to each RV are shown in
Figure 10a. Furthermore, using the Kolmogorov–Smirnov test shows a good fit between the real traces and the proposed distributions.
Figure 10b shows the respective CDFs for each RV. Then, parameters
,
,
, and
were calculated and their values are shown in
Table 5. Hence,
,
, and
follow a hyper-exponential distribution.
The Q–Q plots are depicted in
Figure 10c, where the points follow a straight line only for the lower theoretical quantiles of the respective distribution.
3.1.4. Long-Tailed Duck
From the long-tailed duck trajectories database, the parameters calculated are presented in
Table 6, and the respective parameters are calculated using Equations (
6)–(
8). The pause time and movement time,
and
, have
, hence a hyper-exponential distribution is chosen, and
has a
, so an Erlang distribution is selected. Finally, the Q–Q plots presented in
Figure 11c show that the sample quantiles agree with those of the theoretical distributions on the left side of the plot, that is, for the lower quantiles.
In summary, our conclusion is that the goodness of fit for the proposed and adjusted theoretical distributions has been validated. However, the Q–Q plots do not exhibit the desired straight line pattern. The presence of a concave downward curvature in a Q–Q plot indicates that the tails of the theoretical distribution are heavier than those of the sample data. While this can indicate a potential mismatch, its significance depends on factors such as the extent of deviation and the analysis’s purpose. For practical considerations, considering the results of the goodness of fit tests, a sufficient match between distributions can be assumed.
Next, we introduce an analysis involving different distributions that share similar coefficients of variation (). This comparison aims to evaluate the distributions, with particular emphasis on phase-type distributions, which are preferred due to their relevance in Markov chain models.
3.2. Trajectory Model Using Other Distributions
In the preceding section (
Section 3.1), we introduced the phase-type distribution as a proposed model for this manuscript, primarily due to its memory properties and its applicability in Markov chain models. However, it is imperative to validate this proposal by subjecting our random walk models to a comparative analysis against other similar distributions.
For the purpose of comparison, we evaluate our proposed phase-type distribution against three widely recognized distributions, namely, the normal, log-normal, and Pareto distributions. These were selected because they exhibit different ranges of the coefficient of variation: values close to 0, values close to 1, and values exceeding 1, respectively.
To illustrate this comparison, we have specifically considered three random variables sourced from the animals under study, taking into account their associated
values as presented in
Table 1. For instance, for the ocelot, where
possesses a
of 0.65, we have conducted a comparison against the normal distribution. Similarly, in the case of the mangabey monkey,
, with a
of 0.92, has been contrasted with the log-normal distribution, while
, with a
of 4.42, has been matched with the Pareto distribution.
The mathematical formulations employed for calculating the normal, log-normal, and Pareto distributions are elucidated in
Appendix C. To facilitate a comprehensive understanding,
Table 7 presents the
values of the three animal random variables, accompanied by their respective hypothetical distribution counterparts.
In summary, the comparison undertaken against well-established distributions contributes to the thorough evaluation and validation of our proposed phase-type distribution within the context of our random walk models.
Utilizing the methodology outlined in
Section 3.1, we employed a consistent process to compute the PDFs and CDFs for each selected random variable. Following this, based on their respective coefficient of variation values, we established the corresponding hypothetical distribution for each variable. Subsequently, we compared these hypothetical distributions with the computed distributions of the actual animal trajectories through rigorous goodness of fit tests.
Presented in
Figure 12a are comparisons between the PDFs of the actual animal trajectories and the PDFs derived from the corresponding normal, log-normal, and Pareto hypothetical distributions. Additionally,
Figure 12b offers a visual comparison of the cumulative distribution functions. These figures provide side-by-side representations, contrasting the hypothetical distribution CDFs with the CDFs of the actual trajectories for each of the random variables associated with the species.
This systematic approach allows us to assess the concordance between the proposed hypothetical distributions and the empirical data. Employing robust goodness of fit tests, we rigorously evaluate the compatibility and appropriateness of these distributions in capturing the characteristics of the observed animal trajectories.
Goodness of Fit Test Results for Other Distributions
In accordance with the details presented in
Table 8, we subjected the proposed hypothetical distributions to a battery of goodness of fit tests, specifically the chi-squared and Kolmogorov–Smirnov tests. This comprehensive analysis aimed to ascertain whether the hypothetical distributions accurately aligned with the observed distributions of the real trajectories. The outcomes of these tests are provided in
Table 9.
A notable observation is that, while the variables for both the ocelot and mangabey monkey passed the chi-squared test, they failed the Kolmogorov–Smirnov test. Conversely, the variable of the mangabey monkey passed the Kolmogorov–Smirnov test but failed the chi-squared test.
Subsequently, the Q–Q plots of the chosen random variables are depicted in
Figure 12c, illustrating the corresponding proposed distributions: normal, log-normal, and Pareto, as discussed earlier. Notably, the Q–Q plot of the ocelot distribution displays variability in both the lower and upper ends of the theoretical quantiles, while showing a favorable alignment in the central range. Comparatively, the mangabey turning angle distribution exhibits the least compatibility with the log-normal distribution, whereas the distribution of moving times showcases a favorable match with the Pareto distribution, particularly for higher theoretical quantiles.
Finally, these observations may change for other variables, since some random variables may pass more than one of the three tests. Despite these tendencies, we refrained from employing these distributions. As outlined in [
56], these distributions are deemed inadequate for Markovian processes, in contrast to the phase-type distributions, which align seamlessly with Markovian characteristics.