1. Introduction
The mutual influence of migratory processes in regional systems is a problem of growing significance in the modern world. The socioeconomic statuses of different regions demonstrate higher heterogeneity in response to rising political and military tension. All these factors cause an abrupt redistribution of migration flows and regional population variations, thereby increasing the cost of regional population maintenance [1, 2, 3, 4]. Therefore, it is important to develop different tools (mathematical models, algorithms, and software) for forecasting the distribution of migration flows with adaptation to their dynamics considering available resources.
The authors of [5] suggested a dynamic entropy model for the migratory interaction of regional systems. In comparison with biological reproduction, migration mobility is a rather fast process [1, 6]. Thus, the short-term dynamics of regional population size are described by the locally stationary state of a migratory process [7]. The latter can be simulated under the hypothesis that all migrants have a random and independent spatial distribution over interacting regional systems with given prior probabilities. The mathematical model of a locally stationary state is given by a corresponding entropy operator that maps the space of admissible resources into the space of migratory processes [8].
Mathematical modeling and analysis of interregional migration are considered in numerous publications. First, it seems appropriate to mention the monographs [9, 10], which are dedicated to a wide range of interregional migration problems, including mathematical modeling of migration flows. Note that the problem of migration touches upon many aspects of the socioeconomic, psychological, and political status of the space of migratory movements. Thus, a crucial role is played by the structural analysis of inter- and intraregional migration flows [4] and of the motivations that generate them [2, 11]. The results of structural and motivational analysis of migratory processes are used for computer simulation. There exist three directions of research in this field, each relying on some system of hypotheses. One of the directions involves the stochastic hypothesis about the origin of migratory motivations [12], which is simulated using agent technologies [13, 14]. This direction is adjoined by investigations based on the thermodynamic model of migration flows [3, 8]. Of course, the short list above does not exhaust the whole variety of migration studies, merely outlining some topics of research.
This paper studies a stochastic version of the model in [5], in which random parameters and measurement noises are characterized by probability density functions (PDFs). These functions are estimated using retrospective information on the real dynamics of regional population size with “soft” randomized machine learning [15]. The learned model was implemented in the form of computer simulations, i.e., generation of an ensemble of random trajectories with the entropy-optimal PDFs of the model parameters and measurement noises. The resulting ensemble was used for testing of the model and also for short-term forecasting.
The method developed below is illustrated by an example of the randomized modeling and forecasting of the migratory interaction among three EU countries (Germany, France, and Italy—the system ) and two countries as sources of immigration (Syria and Libya—the system ).
2. Randomized Model of Migratory Interaction
Consider the dynamic discrete-time model of migratory interaction with shared resource constraints that is presented in [5]. The first sub-model represents migration flows within the system
and is described by the dynamic regression equation
where
In these equations, denotes the population distribution in the regional system at a time .
At a time
, the distribution of immigration flows from the regional system
to the regional system
in terms of an entropy operator is modeled by the second sub-model, which can be described by a vector function
with the components
The variable z, which is the exponential Lagrange multiplier in the entropy-optimal distribution problem of immigration flows, satisfies the equation
where
is the amount of a shared resource used by all regions from the system
to maintain immigrants.
In this model, the input data are the amounts ; and the output data are the regional population distributions .
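As a rough illustration of how a multiplier such as z might be computed numerically, the sketch below solves a scalar resource-balance equation by bisection. The specific form used here is an assumption made only for this example: flows of the shape b_i z^{c_i} and a constraint sum_i c_i b_i z^{c_i} = T, which is a common shape for entropy-optimal distributions under a linear resource constraint. The names `solve_multiplier`, `b`, `c`, and `T` are hypothetical, and the model's own equation should be substituted wherever it differs.

```python
from typing import Sequence


def solve_multiplier(b: Sequence[float], c: Sequence[float], T: float,
                     tol: float = 1e-12, max_iter: int = 200) -> float:
    """Solve sum_i c_i * b_i * z**c_i = T for z > 0 by bisection.

    ASSUMED form (not taken from the paper): b_i are prior probabilities,
    c_i are specific costs, T is the shared resource. With all inputs
    positive, the left-hand side increases monotonically in z, so the
    root is unique.
    """
    if T <= 0 or any(x <= 0 for x in b) or any(x <= 0 for x in c):
        raise ValueError("b, c and T must all be positive")

    def used(z: float) -> float:
        # Resource consumed by the flows b_i * z**c_i.
        return sum(ci * bi * z ** ci for bi, ci in zip(b, c))

    lo, hi = 0.0, 1.0
    while used(hi) < T:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if used(mid) < T:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Illustrative values only: with unit costs the equation reduces to z * sum(b) = T.
b = [0.2, 0.3, 0.5]
c = [1.0, 1.0, 1.0]
z = solve_multiplier(b, c, T=0.5)
```

Bisection is chosen here because it needs only monotonicity of the left-hand side; a Newton iteration would converge faster but requires a derivative and a reasonable starting point.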
The dynamic model in Equations (1)–(5) contains the following parameters:
as the shares of mobile population in system regions;
as the prior probabilities of individual migration in the system ;
as the prior probabilities of individual immigration from region k of the system to region n of the system ; and
as the normalized specific generalized cost of immigration maintenance.
Normalization means that .
The parameters form three groups: mobility, migratory movements within the system
, and immigratory movements from the system
to the system
. All these characteristics are specified by the regions of both systems. The dimensionality of the parametric space is reduced using the same approach as in [
5]. The whole essence is to assign a relative regional differentiation of all parameters except for the weights
(mobility) and
(internal migration) of these groups, which are considered as model variables.
This approach leads to the parametric transformation
where
and
are given parameters which characterize the relation of variables.
Then, the dynamic model of migratory interaction in Equations (1)–(5) takes the form
with the matrix
and the diagonal matrix
The vector
consists of the components
For each time
, the variable
z satisfies the equation
i.e., there exist
K values
.
The randomized version of this model is described by Equations (7)–(11), but some of its parameters (variables) are random. These are two randomized parameters,
and
, as well as the variable
, all of the interval type. More specifically, the parameters
and
belong to the intervals
The interval
of the variable
is given by Equation (11).
Theorem 1. Let the parameters and in Equation (11) be positive and Then, the solution of this equation belongs to the interval where
Therefore, the randomized dynamic model in Equations (7)–(11) includes three random parameters
of the interval type that are defined over the three-dimensional cube with faces (Equations (12) and (13)), i.e.,
The probabilistic properties of the randomized parameters are described by a continuously differentiable PDF .
By assumption, real distributions of regional population sizes contain errors that are simulated by a random vector
with the interval components
The probabilistic properties of this vector are described by a continuously differentiable PDF .
The measured output of the randomized model has an additive noise,
3. Characterization of Empirical Risk and Measurement Noises
Construct a synthetic functional that depends on the PDFs of the model parameters and measurement noises for assessing in quantitative terms the empirical risk (the difference between the regional population distribution generated by the model in Equations (7)–(11) and the real counterpart) and the guaranteed power of these noises. The functional must have components characterizing an intrinsic uncertainty of randomized machine learning (RML) procedures, the approximation quality of empirical balances (the empirical risk) and the worst properties of the corresponding random interval-type noises.
1. Uncertainty. In accordance with the general concept of RML, the first component among the listed ones is an entropy functional that describes the level of uncertainty:
The two other functional components are constructed using Hölder’s vector and matrix norms (The vector norm has the form
; the matrix norm, the form
.) [16].
2. Approximate empirical balances. First, consider a characterization of the empirical risk. For the model in Equations (7)–(11), the deviation between the output and real data vectors is given by
Using well-known inequalities for the matrix and vector norms, it is possible to write
Introducing the average matrix and vector norms over the observation interval,
The parameters
and
take values within the intervals
and
(Equation (12)) while the parameter
within the interval
where
Then, the function
takes the form
Note that the coefficients are determined by real data on regional population distributions and also by the characteristics of internal migration within the system and immigration flows from the system .
The equality in Equation (25) defines a function
of random variables. Let its expectation be the characteristic of the empirical risk, i.e.,
where
and the intervals
and
have given limits. At the same time, the limits of the interval
are specified by the equalities in Equation (22).
3. Power of noises. The measurement noises are simulated by random vectors
. The components of these vectors may have different domains (ranges of values) at different times
. For each time, introduce the Euclidean norm
and its expectation
The average expectation of this norm over the time interval has the form
If the measurement noises are the same on the observation interval, then the noise power functional can be written as
This formula involves the Euclidean norm for a quantitative characterization of the noise power. However, it is possible to choose other norms depending on problem specifics.
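The averaged expectation of the Euclidean norm can be estimated by Monte Carlo once noise samples are available. The sketch below is illustrative only: the function name `noise_power`, the array layout (times, draws, regions), and the uniform noise used to fill the array are assumptions, with the uniform distribution standing in as a placeholder for the actual noise PDF.

```python
import numpy as np


def noise_power(samples: np.ndarray) -> float:
    """Monte Carlo estimate of the time-averaged expected Euclidean norm.

    samples has shape (n_times, n_draws, n_regions): for every time step,
    n_draws independent realizations of the noise vector.
    """
    norms = np.linalg.norm(samples, axis=-1)   # Euclidean norm of each draw
    per_time = norms.mean(axis=1)              # empirical expectation per time
    return float(per_time.mean())              # average over the time interval


# Placeholder noise: uniform on symmetric intervals (an assumption for the
# demo, not the entropy-optimal PDF of the noises).
rng = np.random.default_rng(0)
lower = np.array([-0.05, -0.05, -0.05])
upper = np.array([0.05, 0.05, 0.05])
samples = rng.uniform(lower, upper, size=(4, 50_000, 3))
power = noise_power(samples)
```

Replacing `np.linalg.norm(..., axis=-1)` with another norm (for example by passing `ord=1` or `ord=np.inf`) gives the corresponding alternative characterization of the noise power.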
4. Soft Randomized Estimation of Model Parameters
The model characteristics and measurement noises are estimated using a learning data collection: the real cost of immigrant maintenance (input data) and the real distributions of regional population sizes (output data).
In accordance with the general procedure of soft randomized machine learning [15], the optimal probability density functions
(model parameters) and
(measurement noises) are calculated by the constrained minimization of the synthetic functional
that contains the following functionals:
the average empirical risk over the observation interval
and
The soft randomized learning algorithm has the form
The solution of this problem is the optimal PDFs under maximal uncertainty, for the model parameters of the form
where
and for the measurement noises of the form
where
In the case of soft randomization, there is no need for solving the empirical balance equations, which have high complexity and computational intensiveness due to the presence of integral components. Here, computational resources are required for the normalization procedure of the resulting PDFs. On the other hand, the morphology of the optimal PDFs depends on a specific choice of the approximate data balancing criterion and a numerical characterization of the measurement noises.
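The normalization step mentioned above can be sketched numerically for a one-dimensional parameter. The example assumes a PDF known up to a constant through its logarithm on an interval, normalizes it with the trapezoidal rule, and then draws samples by an approximate inverse-CDF lookup on the grid. The function names `normalize_pdf` and `sample_pdf`, the grid size, and the exponential test density exp(-3a) on [0, 1] are all illustrative assumptions, not the PDFs obtained in this paper.

```python
import numpy as np


def normalize_pdf(log_density, lo: float, hi: float, n: int = 2001):
    """Normalize exp(log_density(a)) on [lo, hi] with the trapezoidal rule."""
    grid = np.linspace(lo, hi, n)
    log_w = log_density(grid)
    w = np.exp(log_w - log_w.max())            # shift for numerical stability
    cell = 0.5 * (w[1:] + w[:-1]) * np.diff(grid)
    pdf = w / cell.sum()                       # now integrates to 1 on the grid
    return grid, pdf


def sample_pdf(grid: np.ndarray, pdf: np.ndarray, size: int,
               rng: np.random.Generator) -> np.ndarray:
    """Draw samples by inverting a piecewise-linear approximation of the CDF."""
    cell = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(cell)))
    cdf /= cdf[-1]
    return np.interp(rng.random(size), cdf, grid)


# Illustrative density proportional to exp(-3 a) on the interval [0, 1].
grid, pdf = normalize_pdf(lambda a: -3.0 * a, 0.0, 1.0)
draws = sample_pdf(grid, pdf, 100_000, np.random.default_rng(0))
```

For the three-dimensional parameter PDF the same idea applies on a tensor grid, although the cost of the grid grows quickly with dimension, and rejection sampling over the bounding cube is a common alternative.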
5. Randomized Forecasting of Dynamic Migratory Interaction
Consider randomized forecasting of dynamic migratory interaction using the principle of soft randomization. Let
be the forecasting interval and assume the initial state (the regional population distribution at the initial time
) coincides with the real distribution, i.e.,
. The shared cost of the system
to maintain immigrants is distributed in accordance with a given scenario. For each scenario, the value
and also the interval
in Equations (12), (22), and (23) are determined.
The forecasted trajectories are constructed using the randomized model in Equations (7), (10), and (11)
The randomized parameters
and
take values within the corresponding intervals with the probability density function
(Equation (34)).
An ensemble of the forecasted trajectories for the model’s output is obtained taking into account a random vector
with the PDF
(Equation (36)):
For each scenario an ensemble of random forecasting trajectories is generated via sampling (the transformation of a PDF into a corresponding sequence of random vectors of length I) of the optimal PDFs of the model parameters and measurement noises for each time . The resulting ensemble allows deriving empirical estimates of different numerical characteristics as follows:
the variance pipe, i.e., the set of random trajectories that almost surely (since an ensemble consists of a finite number of trajectories, the matter concerns not probability but its empirical estimate) belong to the domain
the empirical probability distribution and its dynamics on the forecasting interval
where
denotes the number of vectors
whose components are smaller than
; and
the median trajectory
, which satisfies the equation
The ensemble can be used to calculate other characteristics, e.g., -quantiles, confidence probabilities, etc.
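The empirical characteristics listed above can be computed directly from an array of trajectories. In the sketch below, the ensemble layout (trajectories, time steps, regions), the function names, and the reading of the variance pipe as the ensemble mean plus or minus one standard deviation are assumptions made for illustration; the median is taken componentwise, which may differ from the definition used in the text. The synthetic random-walk ensemble is a placeholder, not model output.

```python
import numpy as np


def ensemble_summary(ensemble: np.ndarray) -> dict:
    """Summaries of an ensemble of shape (n_trajectories, n_times, n_regions)."""
    mean = ensemble.mean(axis=0)
    std = ensemble.std(axis=0)
    return {
        "mean": mean,                          # ensemble-average trajectory
        "lower": mean - std,                   # assumed lower limit of the pipe
        "upper": mean + std,                   # assumed upper limit of the pipe
        "median": np.median(ensemble, axis=0)  # componentwise median trajectory
    }


def empirical_cdf(ensemble: np.ndarray, t: int, x) -> float:
    """Share of ensemble vectors at time t whose components are all below x."""
    below = np.all(ensemble[:, t, :] < np.asarray(x, dtype=float), axis=1)
    return float(below.mean())


def quantile_trajectory(ensemble: np.ndarray, q: float) -> np.ndarray:
    """Componentwise q-quantile trajectory, with q in [0, 1]."""
    return np.quantile(ensemble, q, axis=0)


# Placeholder ensemble: 500 synthetic random walks, 6 time steps, 3 regions.
rng = np.random.default_rng(0)
demo = 0.3 + np.cumsum(rng.normal(0.0, 0.01, size=(500, 6, 3)), axis=1)
summary = ensemble_summary(demo)
share = empirical_cdf(demo, t=5, x=[0.35, 0.35, 0.35])
```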
6. Example
The appearance of territories with low economic status always causes the growth of immigration. The early 2000s were remarkable for the formation of several such territories in Northern and Central Africa, the Near East, Afghanistan, etc. As a result, tens of millions of migrants moved to the EU as the standard of living in these territories dropped below the subsistence minimum. The EU countries have to allocate considerable financial resources for the filtering and accommodation of migrants, and the required amounts are often unacceptable. The example below illustrates the use of soft randomization for estimating and forecasting immigration flows from Syria (1) and Libya (2) (the system ) to Germany (1), France (2), and Italy (3) (the system ).
1. Randomized model, parameters, measurement errors, time intervals, and real data collections. Choose the randomized mathematical model (Equation (25)) with the normalized variables
The state variables of the system
and also the immigration flows from the system
are normalized, i.e.,
The variable
characterizes the entropy operator of the immigration process and satisfies the last equation in Equation (46). The values of the parameters
and
are combined in Table 1, where the columns correspond to different values of the corresponding parameter. Recall that the two lowest rows of Table 1 indicate the values of the parameters
. By assumption, the regions of both systems have the same specific cost.
In accordance with this table,
and
. The measurement errors of population sizes
(in normalized units) belong to the intervals
and by assumption they have the same limits for times
.
The normalized observation (model output) has the form
The random parameter model in Equation (46) was employed for estimating parameter characteristics and testing on corresponding time intervals with step
year:
2. Entropy estimation of PDFs of model parameters and measurement noises (interval). This problem was solved using available data on regional population distribution for Germany (
), France (
), and Italy (
) and also on the shared cost of immigrant maintenance on the estimation interval (see Table 2 and the UNdata service at https://data.un.org/).
In this model, the random parameters
and
take values within the intervals
In accordance with Equation (24),
Then, the soft RML procedure yields the following optimal PDFs of the model parameters and measurement noises:
where
The two-dimensional sections of the three-dimensional PDFs of the model parameters are shown in Figure 1a–c, while the graphs of the PDFs of the measurement noises are shown in Figure 2.
3. Model testing. The randomized model in Equation (49) with the optimal PDFs in Equations (52) and (53) was tested using the above data on regional population sizes from the UNdata service (https://data.un.org/) (see Table 3). This table also presents the testing results in terms of the ensemble-average trajectories
and
.
Testing was performed via sampling of the randomized interval parameters with the PDFs in Equations (52) and (53) and construction of the corresponding trajectories by Equation (49).
Figure 3a–c shows ensembles of such trajectories
as well as the ensemble-average trajectories
(Graph 1); the real trajectories
of regional population sizes (Graph 2); and the limits of the variance pipes
(Graph 3).
The testing accuracy was estimated in terms of the relative root-mean-square error
In the example under study, it constituted 4.6% (Region 1), 3.5% (Region 2), and 2.6% (Region 3).
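A relative root-mean-square error of this kind can be computed per region from the ensemble-average and real trajectories. The sketch below uses one common normalization (the RMS deviation divided by the RMS of the real data, taken over time); this choice, the function name `relative_rmse`, and the toy numbers are assumptions, and the exact definition used for the percentages above may differ.

```python
import numpy as np


def relative_rmse(model: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Per-region relative RMSE for arrays of shape (n_times, n_regions).

    ASSUMED definition: sqrt(mean_t (model - real)^2) / sqrt(mean_t real^2).
    """
    model = np.asarray(model, dtype=float)
    real = np.asarray(real, dtype=float)
    num = np.sqrt(np.mean((model - real) ** 2, axis=0))
    den = np.sqrt(np.mean(real ** 2, axis=0))
    return num / den


# Toy data: region 0 is overestimated by 10%, region 1 underestimated by 5%.
real = np.array([[1.0, 10.0], [2.0, 20.0]])
model = real * np.array([1.10, 0.95])
errors = relative_rmse(model, real)
```

Multiplying the result by 100 expresses the error as a percentage for each region.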
7. Conclusions
This paper has developed a mathematical model for dynamic migratory interaction of regional systems with locally stationary states described by corresponding entropy operators. The model incorporates random parameters, and their probabilistic characteristics—the probability density functions of system parameters and measurement noises—have been calculated using soft randomized machine learning. An example of migratory interaction modeling and testing has been given.