**4. Discussion**

A wide range of statistical approaches is available for modeling the spatial structure of natural features and phenomena: Resource Selection Functions, Generalized Linear Models, Artificial Neural Networks, Maximum Entropy, and Classification and Regression Trees [21]. It was previously shown for the Moscow Region that Linear Discriminant Analysis gives satisfactory results for modeling the spatial structure of forests based on field data and Landsat 8 spectral reflectance supplemented with a digital elevation model and its derivative parameters [22]. However, linear functions of multispectral data have limited capabilities, as our study demonstrates. The average proportion of matches between the field data and the model was 52%, varying from 20% to 100% across 38 association groups based on 1025 field sample plots. Moreover, the number of sample plots per association group varied from 4 to 114, which is unsatisfactory in terms of sample size [54].

Previous works have shown the advantages of the ecological-phytocenotic classification over the ecological-floristic one for mapping purposes. In a discriminant analysis of the identified syntaxa, the ecological-floristic classification reached a lower accuracy (69.7%) than the ecological-phytocenotic classification (78.6%) [55].

The assessment of the spatial diversity of forest cover is subject to a number of limitations. Within the current study, these limitations can be generalized into three groups. The group of natural factors makes the most irregular and heterogeneous contribution to classification uncertainty. Foremost among them is the presence of multidirectional processes with degression dynamics (recreational impact, road and construction infrastructure) and restoration dynamics (abandonment of tillage, forest silviculture), which significantly disrupt the natural composition of coenotic types. The study area is located in the zone of Eastern European deciduous-coniferous forests, characterized by a mixed polydominant composition, which makes it difficult to analyze the species composition of communities and to classify them, as has also been noted by other authors [56,57]. With regard to deciduous forests, an increase in their proportion due to climate warming, as well as the warming effect of the megalopolis, cannot be ruled out [58]. The polydominance of the tree layer [49], the transitional successional status of most derivative forests [59], and the high proportion of forest silviculture in the region [60], together with the anthropogenic disturbance mentioned above, make it one of the most complicated regional study areas.

Another group of classification uncertainty factors is generated by the quality and properties of the environmental variables, especially Landsat spectral reflectance. Even setting aside the uncertainties related to nadir angle, radiometric correction and atmospheric transparency, each of which ranges from 5% to 7.5% [61], one important issue remains: the area of a Landsat pixel (0.09 hectares) is more than twice the area of a sample plot (0.04 hectares). In statistical terms, this means that no sample of pixels is available within a sample plot from which statistical parameters could be estimated, so the reflectance data are used with all their potential extremes and outliers. Simply put, the Landsat dataset is too coarse for modeling typical 20 × 20 m sample plots [62]. A few solutions can be discussed in this context. One obvious solution is the use of higher resolution imagery, such as Sentinel-2. Filtering of spectral reflectance, e.g., with a median filter, should not be overlooked either [63].
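The mismatch between pixel and plot areas, and the effect of a median filter on an outlying reflectance value, can be illustrated with a minimal sketch. It assumes a 30 m Landsat pixel side (consistent with the 0.09 hectares stated above) and a 3 × 3 filter window; the reflectance values and the function names `area_ha` and `median_filter_3x3` are hypothetical and are not taken from the study.

```python
from statistics import median

PIXEL_SIDE_M = 30.0  # assumed Landsat pixel side (0.09 ha, as in the text)
PLOT_SIDE_M = 20.0   # sample plot side, 20 x 20 m as in the text


def area_ha(side_m):
    """Area of a square with the given side in metres, in hectares."""
    return side_m * side_m / 10_000.0


def median_filter_3x3(grid):
    """Replace each cell of a 2D list by the median of its 3 x 3
    neighbourhood; windows are clipped at the raster edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = median(window)
    return out


# Hypothetical reflectance values with one outlier in the centre pixel.
reflectance = [
    [0.10, 0.11, 0.10],
    [0.12, 0.90, 0.11],
    [0.10, 0.12, 0.11],
]

smoothed = median_filter_3x3(reflectance)
ratio = area_ha(PIXEL_SIDE_M) / area_ha(PLOT_SIDE_M)

print(f"pixel/plot area ratio: {ratio:.2f}")
print(f"centre pixel before: {reflectance[1][1]}, after: {smoothed[1][1]}")
```

Under these assumptions a single pixel covers 2.25 sample plots, and the filter replaces the outlying centre value with the median of its neighbourhood. The window size is a design choice: a larger window suppresses more outliers but also smooths real boundaries between forest stands.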

The third group of uncertainty factors, discussed in the Introduction, concerns the uneven spatial distribution of field data, which is typical for the regions of the Russian Federation and is a critical factor affecting model quality. This group can be characterized as being of human and organizational origin.

The method applied in the current study made it possible to obtain a map of the spatial structure of forest formations that corresponds to the official data for most species (birch, spruce, pine, oak, broad-leaved species and linden), with slightly overestimated results for some rarely distributed species (alder and aspen). The applied set of methods reached an overall accuracy of 0.46 for forest formations.
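For readers unfamiliar with the metric, overall accuracy and the per-class proportion of matches between field data and model output can be computed as in the following minimal sketch. The labels are hypothetical and do not reproduce the study data; the function names `overall_accuracy` and `per_class_agreement` are introduced here for illustration only.

```python
from collections import Counter


def overall_accuracy(observed, predicted):
    """Share of sample plots whose predicted class equals the observed one."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def per_class_agreement(observed, predicted):
    """For each observed class, the share of its plots predicted correctly."""
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


# Hypothetical field observations and model predictions for eight plots.
observed = ["spruce", "spruce", "birch", "birch", "birch", "pine", "oak", "oak"]
predicted = ["spruce", "birch", "birch", "birch", "pine", "pine", "oak", "birch"]

print(overall_accuracy(observed, predicted))
print(per_class_agreement(observed, predicted))
```

With only a few plots per class, as in this toy example, a single misclassified plot shifts the per-class value strongly, which is why small classes can show agreement anywhere between very low and 100%.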

Modeling the more detailed syntaxa of forest cover, the association groups, for which spatial rarefication of points was not used, emphasizes the negative contribution of unevenly distributed field data. This is especially important for coniferous and broad-leaved coniferous formations. Nevertheless, the spatial pattern is quite plausible: the results are consistent with the previously developed forest cover models for part of the Moscow Region based on discriminant analysis [43,44], and also with a map of the vegetation cover of the Moscow Region [18].
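The idea behind spatial rarefication of occurrence points can be sketched as a simple greedy thinning that keeps a point only if it lies at least a minimum distance from every point already kept. This is an illustrative assumption, not the procedure used in the study: the function name `rarefy`, the 1000 m threshold and the coordinates are hypothetical, and coordinates are assumed to be in a projected system with metre units.

```python
from math import hypot


def rarefy(points, min_dist_m):
    """Greedy spatial thinning: keep a point only if it is at least
    min_dist_m away from every point kept so far (input order matters)."""
    kept = []
    for x, y in points:
        if all(hypot(x - kx, y - ky) >= min_dist_m for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical plot coordinates in metres: two dense clusters and one
# isolated plot.
plots = [(0, 0), (100, 0), (250, 0), (1200, 0), (1250, 50), (3000, 3000)]

thinned = rarefy(plots, 1000)
print(thinned)
```

Each dense cluster is reduced to one representative, so heavily surveyed areas no longer dominate model fitting. The cost is a smaller sample, which matters for classes that already have few plots.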

An attempt at large-scale mapping is promising for assessing biodiversity and forest dynamics, but it has limitations in a study area with such characteristic physical and geographical diversity. The overall technology holds promise, but the classification uncertainty is still rather high, and higher spatial resolution datasets along with filtering approaches should be explored. Previous work in the southwestern part of the Moscow Region demonstrated a higher quality of the cartographic model (78.6%) for the distribution of 15 types of forest communities [55]. Thus, for large regions with a complex natural structure and anthropogenic history, it might be useful to perform modeling within individual landscape structures.

The results obtained underline the need to use the resulting map as a stratification matrix and to carry out additional field research, systematic and optimally justified. Additional field research should be aimed at achieving the following objectives:


#### **5. Conclusions**

MaxEnt nonlinear modeling together with additional tools (geographically structured spatial jack-knifing, spatial rarefication of occurrence data and independent testing of model feature classes and regularization parameters) can be used to manage the problem of unevenly distributed field data and to attempt to create a probabilistic cartographic model of forest formations at the regional level. The results of our modeling correspond well to the official forest inventory data despite the high level of modeling uncertainty.

In addition to the limitations mentioned above, the main limitations of identifying and assessing the spatial distribution of forest community types at a more detailed typological level, that of association groups, were formulated, together with the need for a series of studies at the sub-regional level within territorial units of natural zoning. The need to utilize higher spatial resolution datasets along with filtering was emphasized.

The resulting cartographic model of the groups of associations can be used to stratify the study area and plan the optimal number and placement of field routes necessary for the final statistically valid model of forest communities.

**Author Contributions:** I.K. designed the study and performed the modeling; I.K. and T.C. conducted the research and data analysis; T.C. performed the ecological-phytocenotic classification of forest communities. All authors have read and agreed to the published version of the manuscript.

**Funding:** The Russian Science Foundation (project no. 18-17-00129) supported this study. This study was also conducted within the framework of the Institute of Geography RAS (project no. 0148-2019-0007) in terms of studying the composition of forest communities.

**Acknowledgments:** The authors thank all colleagues for participating in field surveys and discussing the modeling design: Olga Morozova, Elena Suslova, Nadejda Beliaeva and Maria Arkhipova.

**Conflicts of Interest:** The authors declare no conflict of interest.
