In this study, the CA–Markov model is used to predict the occurrence of PWD in Anhui Province, and spatial analysis is used to characterize its spatial and temporal distribution. The overall technical roadmap is shown in Figure 3 and Figure S1. First, Markov transformation and multi-criteria evaluation (MCE) are carried out on the patch data (grid) of pine forest and PWD disasters in 2000 and 2010 to obtain the disaster transfer (change) area matrix, the transition probability matrix, and the suitability atlas of PWD prediction for 2000 and 2010 in Anhui Province. Second, CA–Markov prediction is used to obtain the prediction map of PWD occurrence in 2020. The grid quantity accuracy and Kappa coefficient of the prediction map are verified against the 2020 patch data (grid) of pine forest and PWD occurrence and the field survey data. If the accuracy is low (Kappa < 0.4), the influencing factors and weights of PWD occurrence are re-evaluated and the two steps above are repeated. If the accuracy meets the requirements of practical application, the CA–Markov model is considered suitable for predicting PWD occurrence and the influencing factors and weights are considered reasonable, so they can be used to predict the next decade (2030). Third, based on the patch data (grid) of pine forest and PWD disasters in 2010 and 2020, Markov transformation and MCE are carried out again to obtain the Markov transfer (change) area matrix, the probability matrix, and the 2020 suitability atlas, and CA–Markov prediction is then used to obtain the prediction map of PWD occurrence in 2030. Finally, spatial analysis of the PWD occurrence data for the four periods (2000, 2010, 2020 and 2030) is carried out to reveal the propagation pattern of the disaster in Anhui Province and to inform the direction and measures of future prevention and control.
2.3.1. CA–Markov Model
The CA model is a discrete spatiotemporal dynamic simulation model whose behavior is generated by very simple local rules [54]. A CA system generally includes four elements [55], i.e., unit, state, neighborhood range, and conversion rule, and can be expressed as:

$$S_{t+1} = f(S_t, N)$$

where $S$ is the state set of discrete and finite units (i.e., cells), $N$ is the neighborhood of the cells, $t$ and $t+1$ represent two different moments, and $f$ is the cell state conversion rule.
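As a minimal sketch of the update rule described above, the following Python snippet applies one synchronous CA step to a small grid. The state codes, the 3×3 neighborhood, and the conversion rule (a healthy cell becomes infected when at least one neighbor is infected) are hypothetical choices made only for illustration; they are not the rules used in this study.

```python
import numpy as np

# Hypothetical state codes for illustration only.
HEALTHY, INFECTED, NON_PINE = 0, 1, 2

def infected_neighbours(grid):
    """Count infected cells in each cell's 3x3 neighbourhood, excluding the cell itself."""
    inf = (grid == INFECTED).astype(int)
    padded = np.pad(inf, 1, mode="constant")  # cells outside the grid count as not infected
    rows, cols = grid.shape
    total = np.zeros_like(inf)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            if dr == 1 and dc == 1:
                continue  # skip the centre cell
            total += padded[dr:dr + rows, dc:dc + cols]
    return total

def ca_step(grid, threshold=1):
    """One step of S_{t+1} = f(S_t, N) under an assumed, purely illustrative rule."""
    counts = infected_neighbours(grid)
    new = grid.copy()
    new[(grid == HEALTHY) & (counts >= threshold)] = INFECTED
    return new

grid_t = np.array([
    [0, 0, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
])
grid_t1 = ca_step(grid_t)
print(grid_t1)
```

All cells are updated from the state at time t, so the healthy cell in the lower-right corner, which has no infected neighbor at time t, stays healthy in this step.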
The Markov model uses probability theory to study the probability of things changing from one state to another and thereby predicts their future state. The model focuses on the prediction of future quantity:

$$S_{t+1} = P_{ij} \times S_t$$

where $S_{t+1}$ is the state of things at moment $t+1$, $S_t$ is the state of things at moment $t$, and $P_{ij}$ is the probability that a thing changes from state $i$ to state $j$.
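The Markov projection can be illustrated with a short Python sketch. The transition probabilities and cell counts below are invented for illustration and are not the values in Table 1. The state vector is treated as a row vector, so the projection is computed as the state vector multiplied by the matrix.

```python
import numpy as np

# Hypothetical 10-year transition probabilities (each row sums to 1).
# Row/column order: healthy pine, infected pine, non-pine.
P = np.array([
    [0.85, 0.10, 0.05],
    [0.05, 0.80, 0.15],
    [0.02, 0.00, 0.98],
])

# Hypothetical number of grid cells in each state at time t.
s_t = np.array([1000.0, 100.0, 4000.0])

def markov_step(state, transition):
    """Project state totals one period ahead (row-vector convention)."""
    return state @ transition

s_next = markov_step(s_t, P)
print(s_next)
```

Because every row of the matrix sums to 1, the total number of cells is preserved; only its split among the three states changes. The projection gives quantities only and says nothing about where the changes occur.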
The CA–Markov model combines the two. Markov prediction of random state change is mainly quantitative rather than spatial, whereas the CA model carries spatial information and can simulate dynamic evolution. By combining the advantages of both, the CA–Markov model can better simulate the spatiotemporal pattern of change.
In this study, the CA–Markov model distinguishes three states: healthy pine forest, infected pine forest (pine forest infected with PWD), and non-pine forest. In this way, the three states can be predicted in both time and space.
(1) Markov Transition Matrix
The Markov transfer area matrix records the area converted among healthy pine forest, infected pine forest, and non-pine forest between two periods. The transition probability matrix, which is generated from the transfer area matrix, gives the probability of each conversion. The transfer matrix is the basis of Markov prediction. In this study, the patch data (grid) of pine forest and PWD occurrence in Anhui Province in 2000, 2010, and 2020 were used as input data, and Markov transfer matrices were generated for 2000–2010 and 2010–2020 using the Markov module of IDRISI software (Table 1).
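The relationship between the transfer area matrix and the transition probability matrix can be sketched as a cross-tabulation of two co-registered state maps. This is an illustrative reconstruction, not the internals of the IDRISI Markov module; the function name, state codes, cell area, and toy rasters are assumptions.

```python
import numpy as np

def transition_matrices(before, after, n_states=3, cell_area=1.0):
    """Cross-tabulate two co-registered class rasters.

    Returns (area_matrix, probability_matrix). Rows are states in the
    earlier map, columns are states in the later map.
    """
    before = np.asarray(before).ravel()
    after = np.asarray(after).ravel()
    counts = np.zeros((n_states, n_states), dtype=float)
    np.add.at(counts, (before, after), 1.0)  # count each (from, to) pair
    area = counts * cell_area
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalise each row; rows for states absent from the earlier map stay zero.
    prob = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return area, prob

# Toy rasters (0 = healthy pine, 1 = infected pine, 2 = non-pine); values are invented.
map_2000 = np.array([[0, 0, 0], [0, 1, 2], [2, 2, 2]])
map_2010 = np.array([[0, 1, 0], [1, 1, 2], [2, 2, 0]])

area, prob = transition_matrices(map_2000, map_2010)
print(area)
print(prob)
```

In this toy case, half of the healthy cells in the earlier map are infected in the later map, so the healthy-to-infected entry of the probability matrix is 0.5.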
(2) Designation of Suitability Atlas
The suitability atlas is a spatial representation of the suitability of pine forest state transformation, that is, the comprehensive transformation rule among healthy pine forest, infected pine forest, and non-pine forest. It expresses the degree to which a region is suitable for developing into each state over a future period. Two kinds of factors must be considered when making the suitability atlas: limiting factors, which specify whether a pine forest state change can occur in a region at all, and influencing factors, which determine the continuous trend in suitability for a pine forest state.
Limiting and influencing factors are set separately for healthy pine forest, infected pine forest, and non-pine forest to produce a suitability map for each state, and these maps are then combined into a model-readable suitability atlas (Figure 4). The multi-criteria evaluation (MCE) model is used to set the limiting factors and influencing factors of the suitability atlas (Table 2). The limiting factor for infected pine forest (pine forest infected with PWD) is construction land (the artificial surface class in the land use map), where PWD is unlikely to occur. The influencing factors are NDVI (averaged over November and December of the current year and January and February of the following year), average wind speed, solar radiation intensity, population density, average relative humidity, average rainfall, maximum temperature, DEM, and distance from roads. For healthy pine forest and non-pine forest, the limiting factor is likewise construction land, and the influencing factors are NDVI and DEM, with weights of 0.6 and 0.4, respectively. The three states are transformed into one another according to the conditions set in Table 2 (i.e., limiting factors, influencing factors, functional relationships, and weights).
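A weighted linear combination is one common way to implement MCE, and the sketch below shows how limiting and influencing factors could be combined for the healthy pine forest state. The NDVI and DEM weights (0.6 and 0.4) come from the text; everything else is assumed for illustration, including the linear min-max rescaling (the functional relationships in Table 2 are not reproduced here), the direction of each factor's effect, and the toy rasters.

```python
import numpy as np

def rescale(layer):
    """Linear min-max rescale to [0, 1] (an assumed standardisation)."""
    layer = np.asarray(layer, dtype=float)
    lo, hi = layer.min(), layer.max()
    if hi == lo:
        return np.zeros_like(layer)
    return (layer - lo) / (hi - lo)

def suitability(factors, weights, allowed):
    """Weighted linear combination of factors, zeroed where change is not allowed."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    score = sum(w * rescale(f) for w, f in zip(weights, factors))
    return np.where(allowed, score, 0.0)

# Toy 2x2 rasters; all values are invented.
ndvi = np.array([[0.2, 0.8], [0.5, 0.6]])
dem = np.array([[100.0, 300.0], [500.0, 200.0]])
# Limiting factor: False marks construction land, where no change can occur.
not_construction = np.array([[True, True], [False, True]])

healthy_suit = suitability([ndvi, dem], [0.6, 0.4], not_construction)
print(healthy_suit)
```

The limiting factor acts as a hard mask, while the influencing factors trade off against each other through their weights, which matches the distinction drawn in the text between the two kinds of factors.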
(3) CA–Markov Prediction
CA–Markov prediction is based on the transition probability matrix, the transfer area matrix, and the suitability atlas of the three pine forest states (healthy pine forest, infected pine forest, and non-pine forest). First, the occurrence of PWD in 2020 (Figure 5) is predicted from the 2000–2010 transition probability and area matrices and the 2010 suitability atlas, and the accuracy is verified against the surveyed occurrence of PWD in 2020, so as to evaluate whether the model is applicable in this study and whether the suitability atlas is reasonable. Once the model is verified as usable and the suitability atlas as reasonable, the disaster occurrence in 2030 is predicted using the 2010–2020 transition probability and area matrices and the 2020 suitability atlas as input data. The prediction process therefore involves two runs of the CA–Markov model: PWD in 2020 is predicted from the 2000 and 2010 data, and the occurrence distribution map of PWD in 2030 is then predicted from the 2010 and 2020 data. The number of cycles depends on the time interval between the base year and the forecast year and is usually a multiple of the study period interval. The research periods of this paper are 2000, 2010, 2020, and 2030, so the time interval is 10 years and the cycle number is set to 1, that is, the model is run at an interval of 10 years.
(4) Simulation Accuracy Verification
The grid number error and error matrix (or “confusion matrix”) are used to verify the simulation accuracy.
$$r_i = \frac{S_{in} - S_{im}}{S_{im}} \times 100\%$$

where $r_i$ is the grid number error of class $i$, $S_{im}$ is the actual number of pixels of class $i$, and $S_{in}$ is the number of simulated pixels of class $i$.
The basic statistical indicators of the error matrix are overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and the Kappa coefficient (Kappa):

$$OA = \frac{\sum_{i=1}^{r} x_{ii}}{N}$$

$$UA_i = \frac{x_{ii}}{x_{i+}}, \qquad PA_i = \frac{x_{ii}}{x_{+i}}$$

$$Kappa = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} x_{+i}}{N^2 - \sum_{i=1}^{r} x_{i+} x_{+i}}$$

where $r$ is the total number of columns in the error matrix (i.e., the total number of categories); $x_{ii}$ is the number of pixels in row $i$ and column $i$ of the error matrix (i.e., pixels whose simulated type is the same as the real type, located on the diagonal of the error matrix); $x_{i+}$ and $x_{+i}$ are the sums of row $i$ (simulated type) and column $i$ (real category), respectively; and $N$ is the total number of samples. According to a pertinent study [56], simulation accuracy can be evaluated by Kappa: a Kappa below 0.4 indicates that the simulation accuracy is too low, and a Kappa above 0.60 indicates that the simulation accuracy is high.
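The accuracy indicators can be computed directly from an error matrix, as in the following sketch. It follows the convention stated in the text (rows are simulated types, columns are real categories); the function name and the example matrix are hypothetical and do not reproduce the results of this study.

```python
import numpy as np

def accuracy_metrics(cm):
    """Return (OA, UA per class, PA per class, Kappa) for a square error matrix.

    Rows are simulated classes; columns are reference (real) classes.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    row_tot = cm.sum(axis=1)  # simulated totals
    col_tot = cm.sum(axis=0)  # reference totals
    oa = diag.sum() / n
    ua = diag / row_tot
    pa = diag / col_tot
    chance = np.sum(row_tot * col_tot)
    kappa = (n * diag.sum() - chance) / (n ** 2 - chance)
    return oa, ua, pa, kappa

# Invented error matrix for three classes (100 sample pixels).
cm = np.array([
    [40, 5, 5],
    [10, 20, 0],
    [0, 5, 15],
])
oa, ua, pa, kappa = accuracy_metrics(cm)
print(round(oa, 4), round(kappa, 4))
```

In this invented example the overall accuracy is 0.75 while Kappa is about 0.60, which shows how Kappa discounts the agreement expected by chance from the row and column totals.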
The results show that the grid number error of the disaster simulation is 4.81%, the OA is 93.19%, and the Kappa is 0.65. Taken together, these results indicate that the CA–Markov model simulates PWD in Anhui Province in 2020 with high accuracy and can be used to predict the occurrence of PWD in 2030. The prediction results are shown in Figure 6.
2.3.2. Directional Distribution
The directional distribution (standard deviation ellipse) algorithm, proposed by D. Welty Lefever in 1926, expresses the spatial distribution trend of samples with parameters such as the ellipse center and rotation angle. In this study, the standard deviation ellipse algorithm was used to reveal the spatial distribution characteristics of PWD infection spots. The formulas of the standard deviation ellipse are:
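$$SDE_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}}, \qquad SDE_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n}}$$

where $x_i$ and $y_i$ are the spatial coordinates of each element; $(\bar{X}, \bar{Y})$ is the arithmetic mean center of the elements; $SDE_x$ and $SDE_y$ are the standard deviations of the x- and y-coordinates that define the ellipse; and $n$ is the total number of elements, which in this study is the number of pine forest patches infected by PWD in each period.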
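The rotation angle is calculated as:

$$\tan\theta = \frac{A + B}{C}$$

$$A = \sum_{i=1}^{n} \tilde{x}_i^2 - \sum_{i=1}^{n} \tilde{y}_i^2, \qquad B = \sqrt{\left(\sum_{i=1}^{n} \tilde{x}_i^2 - \sum_{i=1}^{n} \tilde{y}_i^2\right)^2 + 4\left(\sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i\right)^2}, \qquad C = 2\sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i$$

where $\theta$ is the rotation angle of the long axis of the standard deviation ellipse, measured clockwise from true north (12 o'clock direction, 0°), and $\tilde{x}_i$ and $\tilde{y}_i$ are the differences between the x- and y-coordinates of each patch and the mean center.
The standard deviations along the x-axis and y-axis are calculated as follows:

$$\sigma_x = \sqrt{2}\sqrt{\frac{\sum_{i=1}^{n} (\tilde{x}_i \cos\theta - \tilde{y}_i \sin\theta)^2}{n}}, \qquad \sigma_y = \sqrt{2}\sqrt{\frac{\sum_{i=1}^{n} (\tilde{x}_i \sin\theta + \tilde{y}_i \cos\theta)^2}{n}}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations along the X and Y axes.
The size of the ellipse is set by the confidence value $s$, which can be looked up in the chi-square probability table according to the number of elements. In spatial statistics, the direction of the ellipse is determined by the long and short semi-axes. The greater the oblateness of the ellipse, that is, the greater the difference between the long and short semi-axes, the more obvious the directionality of the spatial distribution of the data and the higher the degree of aggregation. Conversely, the smaller the oblateness, the greater the degree of dispersion.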
In this study, the Directional Distribution (Standard Deviational Ellipse) tool in ArcGIS 10.3 was used to analyze the directional trend of PWD infection spots in 2000, 2010, 2020, and 2030; that is, the major axis, minor axis, and oblateness of the ellipse containing 68% of the PWD infection spots in each period were calculated according to the model.
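For readers who want to reproduce the ellipse parameters outside ArcGIS, the calculation can be sketched in Python as follows. This is an illustrative implementation of the commonly documented one-standard-deviation formulas (including the √2 scaling of the axes), not the ArcGIS source code. The function name, the definition of oblateness as (major − minor)/major, and the sample coordinates are assumptions.

```python
import numpy as np

def standard_deviational_ellipse(x, y):
    """Return mean centre, rotation (degrees clockwise from north), and semi-axes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    cx, cy = x.mean(), y.mean()
    dx, dy = x - cx, y - cy  # deviations from the mean centre

    a = np.sum(dx ** 2) - np.sum(dy ** 2)
    c = 2.0 * np.sum(dx * dy)
    b = np.sqrt(a ** 2 + c ** 2)
    theta = np.arctan2(a + b, c)  # solves tan(theta) = (A + B) / C

    sigma_x = np.sqrt(2.0 * np.sum((dx * np.cos(theta) - dy * np.sin(theta)) ** 2) / n)
    sigma_y = np.sqrt(2.0 * np.sum((dx * np.sin(theta) + dy * np.cos(theta)) ** 2) / n)

    # Take max/min defensively so the labels hold regardless of axis order.
    major, minor = max(sigma_x, sigma_y), min(sigma_x, sigma_y)
    # Assumed definition of oblateness (flattening): (major - minor) / major.
    flattening = (major - minor) / major if major > 0 else 0.0
    return {
        "center": (float(cx), float(cy)),
        "rotation_deg": float(np.degrees(theta) % 180.0),
        "major": float(major),
        "minor": float(minor),
        "flattening": float(flattening),
    }

# Invented points lying on a south-west to north-east line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
ellipse = standard_deviational_ellipse(xs, ys)
print(ellipse)
```

For these collinear points the rotation is 45° clockwise from north and the minor axis collapses to zero, which is the limiting case of a strongly directional distribution.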