1. Introduction
Forest disturbances occur at a range of spatial scales and constantly alter the species composition and stand structure of forests [1]. Forest gaps, a common product of small-scale disturbance, can substantially affect light intensity in the understory, which in turn changes the plant composition of forest communities, promotes community succession and ecosystem development, and contributes to the growth and regeneration of trees and ground cover [2,3,4,5,6]. Conceptually, a forest gap is defined as an opening of limited size in the forest canopy that forms after the death of one or more trees due to fire, pests, disease, or similar causes [7,8,9,10]. Forest gaps are one of the most important indicators of forest ecosystem structure dynamics [11,12], one of the most important stages of forest regeneration [13,14], and a core concept of forest cycle theory [15]. Forest gap studies have therefore become an indispensable part of research on the long-term dynamics of forest ecosystems, and accurate classification or extraction of forest gaps from complex forest environments is a prerequisite for such studies.
Traditional forest gap extraction is based on on-site surveys, which usually require substantial manpower and material resources; in remote and steep forested regions, complex topography can also limit access and thus restrict field surveys of forest gaps. Furthermore, this approach suffers from small coverage and spatial discontinuity [7,16], as well as poor repeatability and timeliness over wide forested regions [17,18]. Thus, it is no longer prevalent given the rapid development of remote sensing technology [19]. At present, three remote sensing approaches have replaced the traditional field survey as the mainstream: extracting forest gaps from airborne Light Detection And Ranging (LiDAR) point cloud data with height thresholds, classifying forest gaps from high-resolution multi-spectral data, and combining the two types of data for forest gap classification [20,21].
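The height-threshold idea can be illustrated with a minimal sketch. It assumes a canopy height raster stored as a nested list, and the function name `extract_gaps`, the threshold, and the area limits are hypothetical values chosen only for illustration; they are not taken from the cited studies. Cells below the height threshold are grouped into 4-connected patches, and patches outside the assumed area range are discarded.

```python
from collections import deque


def extract_gaps(chm, height_threshold, pixel_area, min_area, max_area):
    """Return gap patches as lists of (row, col) cells.

    A cell is a gap candidate when its canopy height is below
    height_threshold. Candidates are grouped into 4-connected patches,
    and only patches whose area lies in [min_area, max_area] are kept.
    All thresholds are illustrative assumptions.
    """
    rows, cols = len(chm), len(chm[0])
    seen = [[False] * cols for _ in range(rows)]
    gaps = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or chm[r][c] >= height_threshold:
                continue
            # Breadth-first flood fill over neighbouring low-canopy cells.
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and chm[ny][nx] < height_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Apply the size constraint that is part of the gap definition.
            area = len(patch) * pixel_area
            if min_area <= area <= max_area:
                gaps.append(patch)
    return gaps


# Toy canopy height raster in metres with 1 m x 1 m cells (made-up values).
toy_chm = [
    [20, 21, 19, 22, 20],
    [20, 1, 2, 22, 20],
    [21, 1, 1, 23, 0],
    [22, 20, 21, 20, 21],
]
toy_gaps = extract_gaps(toy_chm, height_threshold=3.0, pixel_area=1.0,
                        min_area=2.0, max_area=10.0)
print(len(toy_gaps), sorted(toy_gaps[0]))
```

In this toy raster the four-cell opening is kept, while the isolated low cell at the right edge falls below the assumed minimum area and is dropped. Raising or lowering `height_threshold` changes which cells count as candidates, so the extracted gaps depend directly on the threshold chosen.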
However, LiDAR data has the disadvantages of high acquisition cost and limited spatio-temporal coverage [22,23]. Related studies have also shown that using LiDAR data for forest gap extraction leads to an underestimation of forest gap area [24], and that the choice of height threshold greatly affects the accuracy of forest gap extraction [25,26]. Airborne LiDAR is therefore not well suited to classifying forest gaps over large areas. High-resolution multi-spectral data can contribute to accurate extraction of forest gaps through its spectral and textural features [27], but it also has a high acquisition cost. In comparison, medium-resolution multi-spectral data, e.g., Sentinel-2 MSI, provides spectral and textural features of forest canopies that LiDAR data lacks; in particular, its free data policy, rich data sources, and high temporal resolution enable forest gap classification over wider regions and longer time frames [18]. Not all medium-resolution data is suitable for extracting forest gaps, however. The definition of a forest gap depends strongly on gap area or size, and canopy openings that are either too large or too small cannot be regarded as forest gaps [28,29,30]. Multi-spectral data at 30 m resolution or coarser, e.g., Landsat, is therefore less suitable for forest gap extraction than 10 m resolution data, because the area of a single Landsat pixel is close to the upper limit of the area threshold of a forest gap [31]. Although medium-resolution multi-spectral data cannot resolve small forest gaps, the advantages mentioned above are encouraging. Even so, only a limited number of studies have used medium-resolution multi-spectral data for forest gap classification [32,33]; these have shown that the effectiveness of forest management operations can be observed in a timely and low-cost manner. Although these studies have demonstrated the ability and potential of medium-resolution multi-spectral data to extract relatively large forest gaps, extraction accuracy may be further improved if other suitable classification features and advanced algorithms are applied.
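The resolution argument comes down to simple pixel-area arithmetic, sketched below. The upper gap-area limit of 1000 m² is a hypothetical value used only for illustration, since the exact threshold is not stated here; the function names are likewise illustrative.

```python
def pixel_area(resolution_m):
    """Ground area of one square pixel, in square metres."""
    return resolution_m ** 2


def pixels_per_gap(gap_area_m2, resolution_m):
    """Number of pixels needed to cover a gap of the given area."""
    return gap_area_m2 / pixel_area(resolution_m)


# Hypothetical upper limit of forest gap area (assumption, not from the source).
ASSUMED_MAX_GAP_AREA_M2 = 1000.0

for sensor, resolution in (("Sentinel-2 (10 m)", 10.0), ("Landsat (30 m)", 30.0)):
    area = pixel_area(resolution)
    count = pixels_per_gap(ASSUMED_MAX_GAP_AREA_M2, resolution)
    print(f"{sensor}: pixel area = {area:.0f} m^2, "
          f"pixels in largest assumed gap = {count:.1f}")
```

Under this assumption, the largest admissible gap spans about ten 10 m pixels but barely more than one 30 m pixel, so most gaps would occupy only a fraction of a single Landsat pixel and could not be delineated.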
Most forest gap extraction studies based on airborne LiDAR and high-resolution multi-spectral data have used the object-based image analysis (OBIA) approach to segment and then classify forest gaps [27,34,35]. These studies are premised on accurate segmentation and extraction of forest gap boundaries and therefore rely heavily on high-quality canopy height model (CHM) data derived from high-accuracy LiDAR data; their classification accuracy may be reduced in the absence of such CHM data [36]. From another perspective, the forest gap patches segmented from high-accuracy CHM data can be regarded as reliable “samples” that support subsequent forest gap classification combined with multi-spectral features. When only multi-spectral data is used for segmentation, the reliability of the “samples” is relatively low, and the resulting errors propagate into the subsequent forest gap classification. We can therefore try to use samples obtained from CHM data to build a multi-spectral image-based forest gap classification model, but we must consider the high cost and limited spatio-temporal availability of CHM data.
Fortunately, canopy height data can also be derived at relatively low cost. One of the most widely used economical methods today is to invert forest canopy height over wide regions from multi-spectral and synthetic aperture radar (SAR) data, using LiDAR-based CHM data or field-observed canopy heights as the reference measurements for model training [37]. Ghosh et al. [38] demonstrated the usefulness of Sentinel series data in canopy height inversion by combining Random Forest (RF) and Symbolic Regression models. Deng [39] demonstrated that machine learning was more accurate than traditional canopy height inversion methods such as the coherent magnitude method and the geopotential method, and that it overcame the need to rely on fully polarimetric data. Meanwhile, most studies on forest canopy height inversion using SAR data or other sources of 3D data have shown that the RF model is the most effective of the machine learning algorithms tested [40,41]. Tree height has also been observed to correlate well with backscattering coefficients and interferometric coherence features calculated from SAR data, as well as with fractional vegetation cover (FVC) and leaf area index (LAI); these feature variables can thus be used to improve the accuracy of canopy height retrievals [38,42,43,44]. Additionally, the wetness component can reflect the density, developmental stage, and moisture content of the vegetation canopy [45], helping to distinguish forest canopy from non-forest canopy areas.
In terms of classification algorithms, most current studies have used only machine learning algorithms, which have been demonstrated to be effective for forest gap classification based on both high- and medium-resolution multi-spectral data [18,27,32,35,36]. Additionally, several deep learning models, including VGG16, ResNet152V2, long short-term memory (LSTM), and 2D convolutional neural networks (2D-CNN), have been widely used in multi-spectral image classification [46,47,48]. Among these, CNNs have been shown to perform well in computer vision tasks such as image classification, object detection, and segmentation, owing to their ability to capture local spatial relationships through convolutional operations [49]. Further, since vegetation remote sensing has multi-temporal and multi-modal characteristics, combining data from multiple sensors or acquisition dates for vegetation analysis has often been a technical challenge. The modularity of the CNN framework facilitates the combination of multi-dimensional data, and its powerful ability to extract information from spatial data offers significant advantages in vegetation-related remote sensing work such as the detection of individual plants or the pixel-wise segmentation of vegetation classes. In contrast to other machine learning algorithms, such as support vector machine and K-means, which require a feature selection step to avoid information redundancy, a CNN filters and learns relevant features by iteratively optimizing its transformations during training [50]. For example, Boston et al. [51] used a CNN algorithm for land cover classification, and Li et al. [47] compared the accuracies of various CNN and LSTM models for crop classification and found that the CNN models consistently performed better. Additionally, combining high-resolution multi-spectral data with the Mask R-CNN algorithm has great potential for extracting forest gaps [52]. These machine-learning-based and deep-learning-based efforts have had varying degrees of success in different application fields, but they did not test the transferability of the proposed models in remote sensing image classification, let alone forest gap classification. Building on these efforts, we assume that medium-resolution multi-spectral data coupled with deep learning, such as 2D-CNN, also has potential for forest gap extraction, although this assumption deserves further testing.
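To make the notion of capturing local spatial relationships concrete, the following minimal sketch applies a single 2D convolution to a toy image band. It is not the network used in this work: the band values and the kernel are hypothetical, and the kernel is hand-picked here, whereas a CNN would learn its kernel weights during training.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the local kh x kw neighbourhood.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Toy single-band patch: one dark pixel surrounded by brighter canopy (made-up values).
toy_band = [
    [5, 5, 5, 5],
    [5, 1, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
# Hand-picked kernel that responds when the centre is darker than its neighbours.
dark_centre_kernel = [
    [1, 1, 1],
    [1, -8, 1],
    [1, 1, 1],
]
response = conv2d_valid(toy_band, dark_centre_kernel)
print(response)        # [[32, -4], [-4, -4]]
print(relu(response))  # [[32, 0.0], [0.0, 0.0]]
```

The strong positive response appears only where the window is centred on the dark pixel, which shows how a convolution encodes a pixel's relation to its neighbourhood rather than its value alone.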
The major objective of this work was to propose a framework that integrates medium-resolution multi-spectral data and limited LiDAR data with deep learning algorithms to construct forest gap extraction models. Two major issues were carefully tested: (1) whether the canopy height data retrieved from Sentinel series data can effectively provide samples for forest gap classification, and (2) whether the transferability of the proposed forest gap classification model is acceptable.