**1. Introduction**

The persistence of forest ecosystem resources is the key to protecting wild coverage of reproductive trees in order to alleviate global warming or the impact of climate change. Specifically, the variance in the area of woods, the accumulation of forest biomass/carbon storage, and the improvement of healthy forests are the periodic evaluation indices of global forest resources for forest sustainability, according to Forest Resources Assessment FAO (Food and Agriculture Organization of the United Nations) [1]. Therefore, using telemetry to monitor the health level of forest ecosystems has a critical effect on the subject of global warming control. Climate change may affect phenological events such as the onset of green-up and dormancy [2]. Trees start to sprout once they sense the growing signals in early spring. After the leaf initiation stage, the newly sprouted leaflet will gradually develop and further facilitate tree growth in crown width, height, diameter, and carbon storage [3]. As a result, the newly grown leaves (NGL) can be seen as the first objects of trees in response to a change in temperature, and can therefore provide critical information for the early detection of climate changes [4]. UAV-sensed images are generally collected at low altitude; the images are supposedly free of atmospheric effects [5], and provide very high spatial resolution for applications. Taking the strengths of UAV-sensed images, NGL over a forest area can be detected via appropriate remote sensing techniques.

In forest science, remote sensing has been applied to investigate species classification [6,7], tree delineation [8,9], and biomass productivity estimation [10,11]. The detection of NGL is a new application in respect to the previous applications. The target of interest may occur under a very low probability or probably has a relatively smaller size than the background, such as the damaged portion of crown in the forest canopy or the new leaf crown in the forest canopy. In this case, the traditional spatial domain (i.e., literal)-based image processing techniques [12–15] may fail to extract these targets effectively, especially when the target size is smaller than the pixel resolution. In contrast, the technique using spectral characteristics to detect the subpixel level is one of the more feasible methods. From the angle of multispectral/hyperspectral detection, the spectral information-based target detection technique [16–18] should be able to solve these problems.

Hyperspectral subpixel detection techniques can be divided into active and passive manners. In the active methods, the detectors only use single or multiple spectral signatures of targets of interest for detection purposes; e.g., Constrained Energy Minimization (CEM) [19], Correlation Mahalanobis Distance (RMD) [7,20], Mahalanobis Distance (KMD) [17,21], Adaptive Coherence Estimator (ACE) [22], Target-Constrained Interference-Minimized Filter (TCIMF) [17,18,23], etc. The correctness of target information plays an important role. Incorrect information results in a false alarm and omission of a target detection result. Therefore, how to provide the correct target object information is a very important step. The Optimal Signature Generation Process (OSGP) that is proposed in this paper increases the accuracy of selecting a target iteratively so as to solve the previously mentioned problem.

Hyperspectral algorithms have been developed for many different target detections in recent years and are used in different areas [24–29]. Many studies have proposed CEM based algorithms [30,31] in the last decade. However, using hyperspectral algorithms to detect targets in RGB images presents several issues since spectral information is insufficient and spatial information is not used, and thus a false alarm is likely to occur where there is a similar spectral characteristic. In order to solve this dilemma, this paper proposes three innovative CEM based detectors-Subset CEM, Sliding Window-based CEM, and Adaptive Sliding Window-based CEM to establish their own autocorrelation matrix, and uses sliding window point-to-point scanning for calculation. As the sliding window passes through, the contrast between a NGL and the background can be enhanced. When compared with traditional CEM results that incur too many false alarms, our proposed methods can solve this issue and produce more stable results.

#### **2. Materials and Methods**

#### *2.1. Constrained Energy Minimization*

Traditional Constrained Energy Minimization (CEM) [16–19] of active hyperspectral target detection is the major technique that is adopted in this paper. The CEM only needs the spectral signature of one specific target of interest during target detection; it is free of the spectral signature of other targets or background. Many target detection methods have been proposed in recent years, among those, the CEM only needs the signature of one target of interest and the detection is stable. Thus, this paper selects CEM to compare local improvement methods. In the CEM algorithm, only one spectral signature (desired signature or target of interest) is given, referred to as **d**, any prior knowledge is not required, e.g., multiple targets of interest or background. In other words, users can extract the specific target of interest without any background information, implementing target detection. This is one of the major advantages of CEM. Another advantage of CEM is that as many signal sources cannot be recognized or observed with the naked eye, some materials may be detected by sensors leading to false alarm. However, the CEM transposes the correlation matrix **R** of data samples before the desired signature **d** is extracted. The sample autocorrelation matrix can be defined as **R** = (1/*N*) ∑ *N i*=1 **r***i***r***i* T, so that the background can be suppressed by **R**, and the filter is matched with signature d to enhance the capability of detecting signature, the execution is more efficient. The CEM is evolved from the LCMV proposed by Frost [32]. If there are *N* pixels **r** in a hyperspectral image

with *L* band, which are {**<sup>r</sup>**1,**r**2,**r**3,...,**r***N*}, where **r***i* = (**<sup>r</sup>***i*1,**r***i*2,**r***i*3,...,**r***iL*) T, the desired target to be looked for is represented by **d**, defined as **d** = (**d**1, **d**2, **d**3,..., **d***L*) T, the desired target to be looked for can be detected by finite impulse response filter (FIR) based on CEM and desired target **d**. The filter coefficient is defined as **w** = (**<sup>w</sup>**1, **w**2, **w**3,..., **<sup>w</sup>***L*) T, the **w** can be obtained with the minimum average energy, defined as **d**T**w** = **w**T**d** = 1. Therefore, if **y***i* is defined as **r***i* and imported into FIR, and then **y***i* can be expressed as

$$\mathbf{y}\_{i} = \sum\_{l=1}^{L} \mathbf{w}\_{l} \mathbf{r}\_{il} = \mathbf{w}^{\mathrm{T}} \mathbf{r}\_{i} = \mathbf{r}\_{i}^{\mathrm{T}} \mathbf{w} \tag{1}$$

The average energy is

$$\left(\mathbf{1}\left(\mathbb{N}\right)\sum\_{i=1}^{N}\mathbf{y}\_{i}^{2} = \left(\mathbf{1}\left(\mathbb{N}\right)\sum\_{i=1}^{N}\left(\mathbf{r}\_{i}^{\mathrm{T}}\mathbf{w}\right)^{2} = \mathbf{w}^{\mathrm{T}}\right]\left(\mathbf{1}\left(\mathbb{N}\right)\sum\_{i=1}^{N}\mathbf{r}\_{i}\mathbf{r}\_{i}^{\mathrm{T}}\right)\mathbf{w} = \mathbf{w}^{\mathrm{T}}\mathbf{R}\_{L\times L}\mathbf{w} \tag{2}$$

With the minimum average energy, the optimal solution of *w* can be obtained

$$\min\_{\mathbf{w}} \left\{ \mathbf{w}^{\mathbf{T}} \mathbf{R}\_{L \times L} \mathbf{w} \right\} \\ \text{subject to } \mathbf{d}^{\mathbf{T}} \mathbf{w} = \mathbf{w}^{\mathbf{T}} \mathbf{d} = 1 \tag{3}$$

According to the theory of Harsanyi [19], the optimal solution to the weight vector of one *L* band is

$$\mathbf{w}^{\text{CEM}} = \frac{\mathbf{R}\_{L}^{-1} \underset{L \times L}{\mathbf{d}} \mathbf{d}}{\mathbf{d}^{\text{T}} \mathbf{R}\_{L}^{-1} \underset{L \times L}{\mathbf{d}}} \tag{4}$$

Equation (3) is substituted in Equation (2), the result of CEM is

$$\boldsymbol{\delta}^{\rm CEM} = \left(\mathbf{w}^{\rm CEM}\right)^{\rm T}\mathbf{r} = \left(\mathbf{d}^{\rm T}\mathbf{R}\_{L}^{-1} \times\_{L} \mathbf{d}\right)^{-1} \left(\mathbf{R}\_{L}^{-1} \times\_{L} \mathbf{d}\right)^{\rm T}\mathbf{r} \tag{5}$$

## *2.2. Subset CEM*

The first target detection algorithm using autocorrelation matrix **S** that is proposed in this paper is a novel method using subsets, known as Subset CEM. The Subset CEM splits the image into several small square images; these small images are the subsets of the original image. The CEM detection is then implemented, and the results of subsets are patched up to obtain a complete resulting image. In other words, the small image of each subset has its own autocorrelation matrix **S**. For example, a 1200 × 1500 image is divided into nine small images; the resolution of each small image is 400 × 500, the autocorrelation matrix **S** is **S**1, **S**2, **S**3, ... , **S**9, respectively, and the corresponding **S***n* of each pixel is substituted into CEM for detection, thus obtaining the result. Figure 1 is the schematic diagram of the autocorrelation matrix in the image. The subset image size of the local autocorrelation matrix **S** is obtained by trial and error. Normally, using five times smaller than the original size is a good first try. The image resolution used in this paper is 1000 × 1300, and thus the image is divided into 200 × 260 subset images.

**Figure 1.** Schematic diagram for the autocorrelation matrix of the original image and subset images.

The results show that this method can effectively suppress the background pixels that are too similar to the desired target. Because the image is divided into the set of small images, the autocorrelation matrix of each small image changes accordingly. There is a different **S** for suppressing the background according to different images, and **S** is calculated according to the pixels in a small area; thus, the small difference between similar spectral signatures is enlarged. It is easier to judge the difference between two RGB values for suppression, so as to increase the detection rate. Figure 2 shows the detection process after the Subset CEM splits the image into subsets.

**Figure 2.** Schematic diagram of the division of subset images by Subset CEM.

#### *2.3. Sliding Window-Based CEM (SW CEM)*

Section 2.2 mentioned that the Subset CEM can effectively reduce the false alarm rate of similar background spectrums using the concept of the local autocorrelation matrix **S**. The Subset CEM divides the image into small square images, where each small image has its own autocorrelation matrix **S**. In other words, every two small images have dissimilar **S**. However, Subset CEM uses non-overlapped windows, which causes artifacts at the borders of subset images. In order to resolve such an issue, this paper proposes the pixel-by-pixel sliding window-based CEM (SW CEM) for detection. The sliding window concept is used in many studies [27,33–35]. The pixel-by-pixel CEM uses a sliding window of fixed size to obtain the RGB values around each pixel, according to different spectral characteristics around each pixel, so as to determine its autocorrelation matrix **S***<sup>n</sup>*. In other words, if the Subset CEM divides the image into small square images, then the pixel-by-pixel CEM divides the image into pixels, combined with a sliding window, to acquire the pixels around the pixel to determine the autocorrelation matrix **S***<sup>n</sup>*. Figure 3 shows the sliding window and direction.


**Figure 3.** Sliding window matrices of **<sup>r</sup>**(*m,n*) and **<sup>r</sup>**(*m+1,n*).

This means that each pixel in the image has its **S***<sup>n</sup>*, and each **S**n is independent and different. Hence, the SW CEM can be defined as:

$$\text{SW}\_{-}\text{CEM} = \frac{\mathbf{d}^{\text{T}}\mathbf{R}\_{nm}^{-1}\mathbf{r}\_{nm}}{\mathbf{d}^{\text{T}}\mathbf{R}\_{nm}^{-1}\mathbf{d}}\tag{6}$$

where, **r***mn* is the current pixel value, and **R***mn* is the autocorrelation matrix of **r***mn*, if the size of the sliding window is 2*k* + 1, as shown in Figure 4.

When the size of a sliding window is known, **R***mn* can be defined as:

$$\mathbf{R}\_{\rm mm} = \frac{1}{\left(2k+1\right)^{2}} \sum\_{i=m-k}^{m+k} \sum\_{j=n-k}^{n+k} \mathbf{x}\_{ij} \mathbf{x}\_{ij}^{\mathbf{T}} \tag{7}$$

where, **<sup>x</sup>***ij* represents each pixel in the sliding window, and 1 (2*k*+<sup>1</sup>)<sup>2</sup> is a constant, if **R***mn* is simplified by **S***mn*.

$$\mathbf{R}\_{mn} = \frac{1}{\left(2k+1\right)^{2}} \sum\_{i=m-k}^{m+k} \sum\_{j=n-k}^{n+k} \mathbf{x}\_{ij} \mathbf{x}\_{ij}^{\mathrm{T}} = \frac{1}{\left(2k+1\right)^{2}} \mathbf{S}\_{\mathbf{mn}} \tag{8}$$

We substitute Equation (8) into Equation (6) to obtain:

$$\mathbf{SW}\\_\mathbf{CEM} = \frac{\mathbf{d}^\mathbf{T}\mathbf{R}\_{mn}^{-1}\mathbf{r}\_{mn}}{\mathbf{d}^\mathbf{T}\mathbf{R}\_{mn}^{-1}\mathbf{d}} = \frac{(2k+1)^2(\mathbf{d}^\mathbf{T}\mathbf{S}\_{mn}^{-1}\mathbf{r}\_{mn})}{(2k+1)^2(\mathbf{d}^\mathbf{T}\mathbf{S}\_{mn}^{-1}\mathbf{d})} = \frac{\mathbf{d}^\mathbf{T}\mathbf{S}\_{mn}^{-1}\mathbf{r}\_{mn}}{\mathbf{d}^\mathbf{T}\mathbf{S}\_{mn}^{-1}\mathbf{d}}\tag{9}$$

$$\mathbf{S}\_{mn} = \sum\_{i=m-k}^{m+k} \sum\_{j=n-k}^{n+k} \mathbf{x}\_{i\uparrow} \mathbf{x}\_{i\downarrow}^{\mathbf{T}} \tag{10}$$

where, **S***mn* is the autocorrelation matrix of current pixel **r***mn*, and the capability of suppressing the background can be enhanced for each region by **S***mn*.

**Figure 4.** Schematic diagram of a sliding window.

#### *2.4. Adaptive Sliding Window-Based CEM (ASW CEM)*

The adaptive window concept has been applied to targets detection in many applications in the last decade, such as vehicles detection [36,37], adaptive filters [38], and anomaly detection [39,40]. When compared to SW CEM, which uses a fixed window size to calculate autocorrelation matrices **S**, AWS CEM determines the window size according to the spatial and spectral characteristics around each pixel, so as to suppress the background. The optimum size of the sliding window varies with the quantity of NGL around each pixel, and thus the sliding window improves based on the local CEM in this paper. The size of sliding window is determined by acquiring the proportion of sprouts around

the current pixel. When the sliding window size 2 *K* + 1 of the current pixel is determined, the result of SW CEM target detection can be obtained. In this case, we developed Adaptive Sliding Window-based CEM (ASW CEM) to combine adaptive window concept in CEM. Figure 5 illustrates the flowchart of ASW CEM. ASW CEM can change the size of the sliding window according to the ratio of the NGL around the current pixel to enhance NGL and suppress background.

The execution of ASW CEM comprises six steps:


The calculation of the rate of the sprout in the sliding window in Step 3 and the sliding window change conditions in Steps 2 and 4 are introduced below.

**Figure 5.** Flowchart of Adaptive Sliding Window-based CEM.

#### 2.4.1. Acquire the Rate of the NGL around the Current Pixel (NGL Map)

In order to enable the sliding window of each pixel to decide whether or not to enlarge or to shrink the current window according to the proportion of peripheral NGL, this paper requires a sprout map for judgment and reading. To set up the sprout distribution map, the spectral comparison is conducted on the he Spectral Information Divergence (SID) [17,41] for experimental multispectral images before ASW CEM, so as to obtain a preliminary sprout detection result. This result contains a small false alarm, but according to this result, when deciding whether or not to enlarge or shrink the sliding window, only the relative rate of the sprout in the sliding window shall be calculated, as the actual number of NGL is not required. The SID resulting image is segmented several times by using Otsu's method [42]. The rate of the sprout in the image is preliminarily estimated at 1~2%, so as to minimize the false alarm, and the major sprout points are maintained for calculation. When the

image with a preliminary estimate of the sprout is obtained, the proportion of NGL can be calculated by using the default sliding window size, and the size of the sliding window is changed according to the number of NGL. Figure 6 is the flowchart of this step.

**Figure 6.** The acquired rate of the sprout in the window from the sprout proportion chart and resizing the window: (**a**) original image; (**b**) preliminarily estimated rate of the sprout; (**c**) calculated rate of the sprout in the sliding window; and, (**d**) changed window size.

#### 2.4.2. Adaptive Sliding Window

To calculate the rate of the sprout around the current pixel, a sliding window of preset size needs to be made. The rate of the sprout in this window decides whether or not to enlarge or shrink the sliding window for subsequent algorithmic detection. Figure 7 is the flowchart of this step. A larger sliding window size is required in the region with a higher rate of the sprout; the optimum size of maximum window is set as *m*2; on the contrary, a smaller sliding window size is required in the region with a lower rate of the sprout, and the optimum size of the smallest window is set as *n*2. When the maximum window *m*<sup>2</sup> and minimum window *n*2 are obtained, the default window is set as an intermediate between maximum and minimum windows, i.e., *m*+*n* 2 2. The advantage is that as the initial window is intermediately sized, the window is enlarged or shrunk to the limit relatively fast. Afterwards, the sliding window is enlarged or shrunk gradually, according to the rate of the sprout in the window.

**Figure 7.** Flowchart of Adaptive Sliding Window CEM: (**a**) extract the newly grown leaves (NGL) around the current pixel from the preliminarily estimated sprout distribution map; (**b**) calculate the proportion of NGL in the sliding window; (**c**) enlarge or shrink the sliding window according to the proportion of NGL; (**d**) extract the pixel values from the window in relative position of the original image; (**e**) calculate the autocorrelation matrix for the pixel values in the window and substitute it in CEM for calculation; if all pixel values in the image are not finished, calculate the next pixel; and, (**f**) export the result of all pixel values.

As the distribution of NGL is not even, when the window is shrunk or enlarged, the rate of the sprout does not always increase or decrease. This method can enlarge and shrink the window. In order to avoid the non-uniform rate of the sprout leading to an infinite circulation of window enlargement or shrinkage, the initial window is used to calculate the sprout as a watershed. When the rate of the sprout in the initial window is higher than a threshold, the sliding window is enlarged gradually until the rate of the sprout in the window is lower than a threshold or the window is maximized before CEM detection. If the rate of the sprout in the initial window is lower than a threshold, then the sliding window is shrunk gradually until the rate of the sprout in the window is higher than a threshold, or the window is minimized before CEM detection. In order to adjust the window size conditionally in the stable level, this paper includes a parameter ε as initial NGL rate in the window, where ε is the proportion of the number of NGL in the window to the total number of pixels in the sliding window. When the rate of the sprout in the window is lower than ε, the window is shrunk; if the rate of the sprout in the window is higher than ε, then the window is enlarged.

#### *2.5. Optimal Signature Generation Process (OSGP)*

In order to remedy the defect in the CEM-related algorithm in that only one desired target can be selected, the Optimal Signature Generation Process (OSGP) is used herein to obtain a stable desired target by the iterative process. The idea of the iterative process is similar to K-means [43], iterative self-organizing data (ISODATA) [44] and iterative FLDA [45]. The OSGP implements Subset CEM target detection for the image iteratively. When the result of CEM is obtained, the image is segmented by using Otsu's method until the number of result pixels is 2–3%, which is the target pixel with the highest probability of the sprout. These pixels correspond to the same pixel RGB values in the original image, averaged as a new target **d'**. If the pixel value of **d'** is not similar enough to that of **d**, then **d'** is substituted in the next CEM, a new desired target is obtained. This is repeated until this and the last Spectral Angle Mapper [46] are smaller than a value θ, and then the current target **d** is exported. The threshold of spectral angle was tested continuously, and the threshold was set as 0.003. Figure 8 is the flowchart of OSGP.

**Figure 8.** Flowchart of Optimal Signature Generation Process (OSGP).

It is noteworthy that CEM uses the global correlation matrix **R** to suppress the background, and it is likely that the background has similar spectral signatures. Simply iterating CEM is the same condition, and some background pixels that do not belong to the desired target will be misrecognized as the result pixels, and the averaging of them influences the new desired target. A stable target **d** is still obtained after iteration, but the misrecognized pixels may result in errors in **d**, thus reducing the detection accuracy slightly. To solve this dilemma, this paper proposes Optimal Signature Generation Process (OSGP) and replaces CEM by subset CEM. The Subset CEM splits the image into many small sets; in the global view of the full image, the small square image of a subset is the local concept. The local algorithm can effectively suppress the pixel RGB values similar to the desired target, which is substituted into the iterative algorithm. This not only reduces the misrecognized result of pixels during iteration, but also obtains a better desired target **d**, so that the precision of detection increases and the probability of a false alarm decreases.

#### *2.6. Parameter Settings of Different Algorithms*

For different parameter settings of algorithms in this paper, the results are different, and so the parameter settings of the multiple CEM algorithm are listed in this section, as shown in Table 1. The input **d** of CEM, Subset CEM, SW CEM, and ASW CEM is the randomly selected desired target. The desired target **d'** is iterated by using OSGP. The OSGP iteration stop condition is that the SAM value of two adjacent targets **d** shall be smaller than *θ*; if tenable, then the desired target **d'** is exported. Here, *ε* is the condition value of the rate of the sprout when the sliding window of ASW CEM is enlarged or shrunk. In the following experiments in this study, *ε* is set to 1%. However, this parameter depends on the proportion of the number of target pixel in the entire image.


**Table 1.** Parameter settings of different algorithms.

In the following experiments the Subset image size is set as 200 × 260 pixel, and each small image uses the same **d** and **d'**, where each small image takes the pixels of its image size to calculate **S**. The subset image size of the local autocorrelation matrix **S** is obtained by trial and error. When each small image has calculated CEM, the results are exported and merged into the original picture size, and the merged picture is the result of Subset CEM.

The window size of SW CEM varied by different application and images. Normally, 5–6 times smaller than the entire images are the good try as the initial setting. This study sets the sliding window size as 151 × 151 pixel. It extracts the pixels of 151 × 151 pixel around each pixel to calculate **S** in Equation (10), which are substituted into CEM in order to work out the result value of the pixel. When each pixel is calculated, the output result of SW CEM is obtained.

ASW CEM uses the original image for SID measurement and gives the preliminarily estimated NGL map, and then ASW CEM employs the sliding window of preset size to calculate the rate of the sprout. When the rate of the sprout mismatches the stop condition, the window size is changed and this rate in the window is recalculated, until the window size reaches the threshold or the rate of the sprout is equivalent to ε. The pixel values in this window are used to calculate S, which is substituted into CEM to work out the result value of the pixel. When each pixel has been calculated, the output result of ASW CEM is obtained. For this experiment, the maximum size of the sliding window is set as 151 × 151, the minimum window is 31 × 31, and the initial window is 91 × 91. The initial window size can be set as the average of the maximum and minimum size.
