1. Introduction
Coffee is one of the most consumed beverages and one of the most traded commodities worldwide, and Brazil is the world's largest coffee producer. Nevertheless, climate change has significantly affected tropical regions and their crops, including coffee. As a result, coffee has been included in Climate Smart Agriculture (CSA) strategies. This decision follows projections of an increased prevalence of pests that thrive under drought conditions [1]. Indeed, changing climatic conditions in various parts of the world lead to ecosystem imbalances, often resulting in the proliferation of pests and pathogens that were previously absent or less damaging. This is the case with the Coffee Leaf Miner (CLM), Leucoptera coffeella (Lepidoptera: Lyonetiidae) [2,3], a monophagous pest that attacks only coffee plants and is the main foliar pest during the dry period in Brazil.
In its caterpillar stage, the CLM causes necrotic lesions on the leaves, followed by the shedding of mined leaves from the top of the plants. The presence of mines triggers multiple physiological responses: it increases the production of ethylene (a plant hormone responsible for leaf abscission) [4], reduces photosynthetic area and auxin levels [5,6], and leads to chlorosis that can take over the entire leaf. These effects can result in yield losses of 30% to 70% [2] or even plant death.
Currently, CLM monitoring in coffee plantations is conducted through systematic field sampling, where the average number of mines per plant is counted using a zigzag pathway or regular sampling grid [7,8].
To facilitate field detection, systems that utilize remote sensing technology are currently being developed [2]. Several authors indicate that remote sensing has been a more sustainable and accurate methodology for discriminating stresses in agriculture [9,10,11], especially in coffee crops [12,13,14,15,16]. Considering the operational challenges associated with cultivating coffee crops on extensive terrains, particularly in the Cerrado region of Minas Gerais, remote sensing emerges as a swifter strategy for CLM detection. However, despite their growing prevalence, remote sensing techniques are underutilized in this area.
To effectively monitor the phytosanitary status of a plantation or an entire region using remote sensing, it is essential to accurately characterize how the crop’s spectral response varies with the intensity of stress from biotic or abiotic disturbances. When dealing with spectroscopy and hyperspectral measurements, the main challenge is to develop a methodology that correlates the agronomic assessment of infested crops with hyperspectral measurements and discriminates the best intervals for classifying disease intensity.
Therefore, characterizing the spectral signature of CLM-infested leaves at different stages of infestation becomes crucial for timely intervention before the plant suffers severe defoliation. By accurately delineating the spectral patterns of CLM-infested leaves, coffee producers can create precise predictive tools capable of anticipating the widespread attack of their production area.
Researchers have taken up this challenge in recent years, mainly due to the devastating nature of the effects of CLM. However, most studies utilizing remote sensing techniques have relied on medium-resolution satellite images (e.g., Sentinel-2 constellation, with a spatial resolution of approximately 10–20 m) with a solid CLM spectral response knowledge. Ref. [15] developed the “Coffee-Leaf-Miner Index” (CLMI), which combines the near-infrared, blue, and red bands to detect CLM infestation in Sentinel-2 orbital images. A determination coefficient of 0.87 and an accuracy of 89.47% were achieved, highlighting the importance of the red and near-infrared (NIR) ranges. More recently, Ref. [17] demonstrated that the random forest algorithm effectively separated healthy from CLM-infested areas using indices derived from Sentinel-2 images. The results achieved an accuracy of 86% and a kappa index of 0.64.
To the authors’ knowledge, there is limited research on higher resolution and, currently, there is no literature describing a strategy to quantify damage levels on individual leaves that could serve as an indicator for estimating the impact of CLM on the entire field. This information could significantly enhance detection using higher-resolution optical imagery, such as those collected by Unmanned Aerial Vehicles (UAVs) or very high-resolution optical satellite constellations, facilitating a more efficient detection of CLM.
In this context, this research aims to define the hyperspectral response of leaves attacked by leaf miners. It not only addresses a critical gap in the current scientific literature but also has profound implications for advancing agricultural practices in the coffee industry.
It is hypothesized that hyperspectral remote sensing can discriminate the different levels of CLM infestation in leaves using specific regions of the electromagnetic spectrum. This study aims to comprehensively characterize the hyperspectral signature of leaves across varying CLM-infestation levels by (i) identifying and delineating the optimal spectral regions that effectively discriminate between different degrees of infestation; (ii) defining an agronomic reference for the degree of CLM infestation at leaf level; and (iii) clustering CLM-infestation intensity according to reflectance.
2. Materials and Methods
The study was carried out in an experimental coffee area located on the Monte Carmelo Campus of the Federal University of Uberlândia. The area consists of one plot with 3.5 m spacing between rows and 0.6 m between plants, with plants averaging 2 m in height (Figure 1). It has approximately 200 coffee trees in 650 m² and is drip irrigated. The site has an average annual rainfall of 1444 mm and a climate classified as tropical with a dry winter, which is the main condition for the natural development of the investigated pest. The experimental area was composed of Coffea arabica L., cultivar Topázio MG-1190, which is susceptible to CLM attack.
To evaluate the hypothesis that remote sensing can be used to discriminate different clusters of leaf miner attacks, the flowchart in Figure 2 was proposed. Radiance measurements were obtained from 80 healthy and 80 infested leaves, specifically those affected by stress caused by the leaf miner pest. These values were subsequently transformed into the Hemispherical Conical Reflectance Factor (HCRF).
For an assessment at leaf level, however, it was necessary to create a new agronomic reference that included the percentage of damage on the mined leaf. Subsequently, the k-means clustering algorithm was employed to explore the optimal number of categories for discriminating among the 80 symptomatic leaves, using the percentage of symptoms as a reference. Then, analysis of variance (ANOVA) was used to determine whether the mean HCRF of infested leaves differed significantly from the mean HCRF of healthy leaves. Finally, three graphs were generated to explore the most important spectral regions for detecting the miner at different infestation levels.
2.1. Data Acquisition
2.1.1. Identification and Field Collection of Healthy and Symptomatic Leaves
The evaluation of leaf miner injuries was carried out in July 2022, a period of lower relative humidity (58% on average) [18] that is considered suitable for the emergence of CLM. For this purpose, 160 leaves were removed from the third or fourth leaf pair of randomly chosen plagiotropic branches in the middle/upper third of the plant, following the same methodology that is applied for evaluation in whole plants, and were classified as healthy (80 leaves) or symptomatic (80 leaves).
2.1.2. Acquisition of Hyperspectral Measurements of the Leaves
The leaves removed from the plants were fixed above a black non-reflective target to minimize the influence of radiation from neighboring surfaces on the spectral radiance measurements. The measuring instrument was a FieldSpec® Handheld spectroradiometer from Analytical Spectral Devices (ASD, Boulder, US), which operates in the spectral range of 325–1075 nm with a spectral resolution of 1.6 nm. A filter limiting the field of view (FOV) to 10° was used at a sensor-to-leaf distance of approximately 114 mm, obtaining an instantaneous field of view (IFOV) of 10 mm.
For every leaf radiance measurement, the average of ten repeated readings was taken, and the target radiance and the radiance of a reference Lambertian surface (Spectralon plate) [19] were measured concurrently under the same lighting and observation conditions, between 11 a.m. and 2 p.m. (Figure 3A).
The sensor was pointed at the adaxial (upper) surface of the leaf, measuring the spectral signature of the central part of the leaf (Figure 3B), for the 80 symptomatic and 80 healthy leaves.
2.2. Data Processing
2.2.1. Determining the Percentage Level of Leaf Damage
The level of leaf damage due to leaf miner attack was calculated as a function of the percentages of injury and yellowing of the leaves. To that end, the symptomatic leaves were photographed using a conventional cell phone camera. The images were processed in AFSoft 1.1, a leaf analysis software developed in Brazil [20] and based on a supervised neural network, which calculates the percentages of (1) mined, (2) green/healthy, and (3) yellow/chlorotic areas. The percentages were calculated relative to the total leaf area, without taking the dimensions in centimeters into account.
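The percentage calculation described above can be illustrated with a minimal sketch. It is not the AFSoft implementation: it assumes that a classifier has already produced a per-pixel label mask, and the label codes and the function name `leaf_area_percentages` are hypothetical choices made only for this example.

```python
import numpy as np

# Hypothetical label codes for an already classified leaf image.
# The codes used internally by the actual software are not known here.
BACKGROUND, MINED, HEALTHY, CHLOROTIC = 0, 1, 2, 3


def leaf_area_percentages(label_mask):
    """Return the share of leaf pixels in each class, ignoring background.

    Percentages are relative to the total leaf area in pixels, so no
    physical scale (centimeters) is required.
    """
    mask = np.asarray(label_mask)
    leaf_pixels = np.count_nonzero(mask != BACKGROUND)
    if leaf_pixels == 0:
        raise ValueError("mask contains no leaf pixels")
    classes = (("mined", MINED), ("healthy", HEALTHY), ("chlorotic", CHLOROTIC))
    return {
        name: 100.0 * np.count_nonzero(mask == code) / leaf_pixels
        for name, code in classes
    }


# Toy 4 x 4 mask: 12 leaf pixels, of which 2 are mined and 1 is chlorotic.
toy_mask = np.array([
    [0, 2, 2, 0],
    [2, 1, 1, 2],
    [2, 3, 2, 2],
    [0, 2, 2, 0],
])
percentages = leaf_area_percentages(toy_mask)
print(percentages)
```

In this sketch the mined and chlorotic percentages of each leaf would be the two features passed on to the clustering step.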
2.2.2. Clustering of Infestation Levels by K-Means
The clustering process was based on the percentages of leaf damage caused by CLM and of yellowing areas. These percentages, representing the extent of damage, were used to determine the optimal number of clusters. The k-means clustering algorithm was implemented in Python using the scikit-learn library to categorize infestation levels into distinct clusters. The ideal number of clusters was determined using the elbow method and the Silhouette Index. The elbow method entails examining the angles formed by the root mean squared error (RMSE) values across the various numbers of clusters (k) tested. The optimal k is identified at the point where adding further clusters does not yield a significant reduction in RMSE. However, comparison with a second method is needed to support the choice of the number of centroids. In this context, the Silhouette Index method [21] was employed to ascertain the optimal number of clusters. This method is particularly valuable as it assesses the quality of clusters by considering individual points situated among multiple clusters.
As the k-means algorithm is sensitive to the scale of the data, a standardization technique was applied to the original data values (X). The StandardScaler (Equation (1)) from the scikit-learn library was used to perform this transformation. This method transforms the data such that each feature has a mean (μ) of 0 and a standard deviation (σ) of 1, and does not restrict values to the range −1 to 1. Transformed values may extend beyond this range, but all variance in the data is preserved, allowing the clustering algorithm to capture the inherent structure of the data more effectively.
Z = (X − μ) / σ (1)
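The clustering procedure in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the damage percentages are synthetic stand-ins generated at random, the range of k values tested is arbitrary, and the RMSE is derived here from the scikit-learn `inertia_` attribute (within-cluster sum of squared distances), which is one possible reading of the elbow criterion described in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for 80 symptomatic leaves (NOT the study data):
# column 0 = % mined area, column 1 = % chlorotic area.
centers = np.array([[5.0, 3.0], [25.0, 15.0], [60.0, 35.0]])
sizes = (40, 28, 12)
damage = np.vstack(
    [rng.normal(c, 2.0, size=(n, 2)) for c, n in zip(centers, sizes)]
)

# Standardize each feature to mean 0 and standard deviation 1.
scaled = StandardScaler().fit_transform(damage)

# StandardScaler is equivalent to (X - mean) / std computed per feature.
manual = (damage - damage.mean(axis=0)) / damage.std(axis=0)
assert np.allclose(scaled, manual)

rmse, silhouette = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    # Assumption: RMSE taken as the root of the mean squared distance
    # between each sample and its assigned centroid.
    rmse[k] = np.sqrt(model.inertia_ / scaled.shape[0])
    silhouette[k] = silhouette_score(scaled, model.labels_)

# The silhouette criterion selects the k with the highest mean score;
# the elbow is read off the rmse values, usually by plotting them.
best_k = max(silhouette, key=silhouette.get)
for k in sorted(rmse):
    print(f"k={k}  RMSE={rmse[k]:.3f}  silhouette={silhouette[k]:.3f}")
print("k selected by silhouette:", best_k)
```

Fixing `random_state` and using several initializations (`n_init`) keeps the comparison across k values reproducible, since a single random start can distort both curves.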
2.2.3. Hyperspectral Processing of Measurements
The HCRF, given by Equation (2), is determined by the ratio between the average radiance reflected by the target (L_target) and that of an ideal diffusing surface (L_surface), represented by a Spectralon plate, under identical geometric and lighting conditions, as demonstrated in [13]. As the Spectralon plate used in the field deviates from the ideal conditions of a laboratory diffusing surface, a calibration factor (k) was calculated in reference to a perfectly conditioned plate available in the laboratory.
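A minimal sketch of this computation is given below. It assumes that the calibration factor multiplies the radiance ratio, that is, HCRF = k · L_target / L_surface, which is the usual form but is an assumption about Equation (2). Whether k is a scalar or varies with wavelength is not stated in the text; the function name `hcrf` and all numeric values are hypothetical.

```python
import numpy as np


def hcrf(target_radiance, panel_radiance, k=1.0):
    """Reflectance factor from repeated radiance readings.

    target_radiance, panel_radiance: arrays of shape
        (n_repetitions, n_wavelengths).
    k: panel calibration factor, scalar or one value per wavelength
        (assumed to multiply the ratio).
    """
    target = np.asarray(target_radiance, dtype=float)
    panel = np.asarray(panel_radiance, dtype=float)
    # Average the repeated readings before taking the ratio.
    mean_target = target.mean(axis=0)
    mean_panel = panel.mean(axis=0)
    return k * mean_target / mean_panel


# Hypothetical readings: 10 repetitions at 3 wavelengths.
leaf_readings = np.tile([0.02, 0.05, 0.40], (10, 1))
panel_readings = np.tile([0.50, 0.50, 0.80], (10, 1))
leaf_hcrf = hcrf(leaf_readings, panel_readings, k=0.98)
print(leaf_hcrf)
```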
To reduce noisy values, only the spectral range of 400 to 1000 nm was used, and a mean filter with a 7 nm (7-point) window was applied to smooth the spectral signatures. A mean filter replaces the value at the center of the window with the average spectral value of all points within the window [22].
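The smoothing step can be sketched as below. Two details are assumptions, since the text does not specify them: the filter is applied to the full spectrum before trimming to 400–1000 nm, and the ends of the spectrum are padded by repeating the edge values. The function name `smooth_spectrum` is hypothetical.

```python
import numpy as np


def smooth_spectrum(wavelengths_nm, reflectance, window=7, lo=400.0, hi=1000.0):
    """Apply a centered moving-average filter, then keep lo..hi nm."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of points")
    wl = np.asarray(wavelengths_nm, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    half = window // 2
    # Assumption: repeat the edge values so the output keeps its length.
    padded = np.pad(refl, half, mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.convolve(padded, kernel, mode="valid")
    keep = (wl >= lo) & (wl <= hi)
    return wl[keep], smoothed[keep]


# Toy spectrum sampled every 1 nm over the instrument range, with a
# single spike at 700 nm to make the effect of the filter visible.
wavelengths = np.arange(325, 1076)
spike = np.zeros(wavelengths.size)
spike[wavelengths == 700] = 7.0
wl_s, refl_s = smooth_spectrum(wavelengths, spike)
print(refl_s[(wl_s >= 695) & (wl_s <= 705)])
```

With a 7-point window, the spike is spread evenly over the seven points centered on 700 nm, which shows both the noise reduction and the loss of very narrow features that a mean filter implies.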
2.2.4. Analysis and Validation Between Groups of Symptomatic and Healthy Leaves
An ANOVA test at the 95% confidence level was used to assess whether the mean spectrum of the symptomatic leaves in each cluster differed significantly from the mean spectrum of healthy leaves. The p-value and F-value were computed. After the significance analysis, the absolute difference (Δspectra (K)) between the average HCRF of healthy leaves (HCRF_healthy) and the average HCRF of each group (K) of symptomatic leaves (HCRF_symptomatic) was computed, Equation (3), to compare the best spectral regions for discerning each symptom level. The ideal spectral regions for discerning symptom levels were then identified by plotting the differences between the mean spectral behavior of healthy leaves and that of each cluster.
Δspectra (K) = |HCRF_healthy − HCRF_symptomatic(K)| (3)
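The comparison between healthy leaves and one cluster can be sketched as follows. The text does not state whether the ANOVA was run per wavelength or on a pooled value, so the per-wavelength test used here is an assumption. The spectra are synthetic, and the function name `compare_to_healthy` is hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def compare_to_healthy(healthy_hcrf, cluster_hcrf, alpha=0.05):
    """Compare one symptomatic cluster with healthy leaves.

    Both inputs have shape (n_leaves, n_wavelengths). Returns the
    absolute difference of the mean spectra, the F and p values at
    each wavelength, and a boolean mask of significant wavelengths.
    """
    healthy = np.asarray(healthy_hcrf, dtype=float)
    cluster = np.asarray(cluster_hcrf, dtype=float)
    # One-way ANOVA with two groups, evaluated at each wavelength
    # (the test runs along the first axis, i.e., across leaves).
    f_values, p_values = f_oneway(healthy, cluster)
    delta = np.abs(healthy.mean(axis=0) - cluster.mean(axis=0))
    return delta, f_values, p_values, p_values < alpha


# Synthetic spectra (NOT the study data): 80 healthy leaves and one
# cluster of 30 leaves that differs only around 700 nm.
rng = np.random.default_rng(42)
wavelengths = np.arange(400, 1001, 100)
healthy = 0.30 + rng.normal(0.0, 0.01, size=(80, wavelengths.size))
cluster = 0.30 + rng.normal(0.0, 0.01, size=(30, wavelengths.size))
cluster[:, wavelengths == 700] += 0.10

delta, f_values, p_values, significant = compare_to_healthy(healthy, cluster)
print("largest difference at", wavelengths[np.argmax(delta)], "nm")
```

Running the function once per cluster gives one difference curve per infestation level, which is what the plotted comparison in this subsection relies on.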
4. Discussion
The study of the hyperspectral response of leaves affected by varying intensities of CLM, a defoliator of coffee plants causing significant production and economic losses in Brazil, revealed that the region around 700 nm in the electromagnetic spectrum is the most indicative of CLM infestations. These findings also underscore the robustness of studies conducted using satellite remote sensing. For instance, the previously mentioned study by [17] utilized data from the Sentinel-2 satellite, which is equipped with the Multispectral Instrument (MSI), and achieved promising results with random forest models, which identified the 700 nm wavelength as the most critical in the model. Similarly, using a UAV and a multispectral sensor, ref. [23] found that spectral variance for CLM detection at the canopy level began to be significant from the 710 nm band.
As shown in Figure 9, the most significant spectral differences are observed at approximately 700 nm. The work by [15], which used Sentinel-2 images and the MSI sensor to develop the CLMI index, employs the NIR, red, and blue bands, achieving remarkable results. In the Δspectra (K) analyses of the differences between the mean of healthy leaves and each symptomatic level, the blue band was not significant, but this wavelength is strongly influenced by the atmosphere, which might explain its importance for a satellite-based indicator but not for our study. While satellite imagery typically has pixel dimensions in the order of meters, the leaf measurements were conducted using narrow spectral regions. This difference in spatial resolution between satellite sensors and high-resolution measurements could influence the comparative results.
A more detailed analysis of the Δspectra (K) shows that the red edge region is the most effective spectral band for distinguishing CLM infestations. Its importance increases as the level of infestation on the leaves rises. There is a significant increase in the reflectance of the spectrum of healthy leaves around 700 nm. As described by [24], this point marks the transition between the process of absorption by chlorophyll in the red region and the process of scattering in the near-infrared, which is influenced by the internal structure of the leaves.
In the near-infrared range, the smallest difference between the spectra is observed around 750 nm, as is clearly visible in Figure 8 and Figure 9, where an overlap occurs that makes it impossible to differentiate between healthy and symptomatic leaves. In a previous study by [25], which focused on the detection of nematodes in coffee plants, there was also a portion of the near-infrared region where the spectra of healthy and symptomatic coffee leaves coincided. However, beyond the overlap and contrary to what was observed in Figure 7B, the mean HCRF of healthy leaves was higher than that of symptomatic leaves. In our results, this region proved incapable of discriminating the levels of infestation on the leaves. These findings suggest that the NIR spectral region, often employed for computing vegetation health indices such as the Normalized Difference Vegetation Index (NDVI), is not significant for discriminating CLM infestation.
It is possible to observe an increase in the importance of the visible region for discriminating infested leaves as they become more infested. It was observed by [26] that powdery mildew, a fungal disease that affects beet leaves, showed an increase in reflectance throughout the visible and near-infrared spectrum. Furthermore, it was found that the higher the infection level, the higher the reflectance and the greater the distance between the average spectra.
These findings are new to the literature on remote sensing of CLM with respect to infestation levels. The three identified clusters demonstrate significant differences from the sample of healthy leaves, as shown in Table 2. Here, the p-values are substantially below the significance thresholds, and the F-scores are consistently high across all clusters. The group of minimally infested leaves shows greater variation in infestation levels or other monitored characteristics (Figure 6A). The group of moderately infested leaves has the smallest average distance among the three clusters, possibly indicating a more similar level of infestation among the leaves in this group (Figure 6B). Additionally, the smaller number of observations for severely infested leaves (Figure 6C) was expected due to the characteristic defoliation of coffee plants caused by the pest attack [2,4]. This supports the selection of K = 3 for clustering the HCRF dataset of symptomatic leaves and is reinforced by the positive results of the Silhouette Index and the Elbow Method (Figure 5).
A significant challenge in this study was the absence of established agronomic references that define CLM infestation levels and their association with spectral response variations. This limitation led us to conduct the study in a single controlled experimental area of limited dimensions, where CLM occurrence was confirmed and other plant stresses were controlled. Future research will focus on expanding the dataset and exploring additional case studies to generalize the application, leveraging this initial foundation towards establishing comprehensive agronomical references.
While orbital data are utilized for extensive and continuous monitoring of plantations, leaf-level results can provide more detailed information demonstrating infestations at various levels. The findings of this study can be applied to proximal sensing applications.