Inverting Chlorophyll Content in Jujube Leaves Using a Back-Propagation Neural Network–Random Forest–Ridge Regression Algorithm with Combined Hyperspectral Data and Image Color Channels

Wu, Jingming; Bai, Tiecheng; Li, Xu

doi:10.3390/agronomy14010140


by Jingming Wu ¹,², Tiecheng Bai ¹,²,*, and Xu Li ¹,²,*

¹ College of Information Engineering, Tarim University, Alaer 843300, China

² Key Laboratory of Tarim Oasis Agriculture, Ministry of Education, Tarim University, Alaer 843300, China

* Authors to whom correspondence should be addressed.

Agronomy 2024, 14(1), 140; https://doi.org/10.3390/agronomy14010140

Submission received: 3 December 2023 / Revised: 28 December 2023 / Accepted: 3 January 2024 / Published: 5 January 2024

(This article belongs to the Section Precision and Digital Agriculture)


Abstract

Chlorophyll content is highly susceptible to environmental changes, and monitoring it can be a crucial tool for optimizing crop management and provides a foundation for research in plant physiology and ecology. Such monitoring is expected to deepen our scientific understanding of plant ecological adaptation mechanisms, offer a basis for improving agricultural production, and contribute to ecosystem management. In this study, hyperspectral data, image data, and SPAD data were collected from jujube leaves. The data were processed using Savitzky–Golay (SG) smoothing and the Isolation Forest algorithm, after which feature values were extracted using a combination of the Pearson correlation coefficient method and the Partial Least Squares Regression–continuous projection method. Seven modeling methods were then compared, with hyperspectral data and color channel data used as independent variables in separate experiments. The integrated BPNN-RF-Ridge Regression algorithm gave the best results, with an R² of 0.8249, an MAE of 2.437, and an RMSE of 2.9724. Including color channel data as independent variables improved R² by 3.2%, while MAE and RMSE increased by 1.6% and 3.9%, respectively. These results demonstrate the effectiveness of integrated methods for determining chlorophyll content in jujube leaves and show that multi-source data can improve model fit with only a minimal effect on errors. Further research is warranted to apply these findings in precision agriculture to optimize jujube yield and income, and the approach may provide insights for similar studies in other plant species.

Keywords:

jujube leaf; soil and plant analyzer development (SPAD); hyperspectral data; color channel; BPNN-RF-Ridge Regression; non-destructive testing

1. Introduction

Chlorophyll, a vital component in the photosynthetic process of plants, is intricately linked to the growth and health of vegetation [1]. Monitoring the chlorophyll content in crops is instrumental for agricultural decision-makers, as it provides insights into the crop’s growth status and enables the optimization of agricultural production management strategies [2]. Through chlorophyll inversion, timely adjustments in fertilization, irrigation, pest control, and other measures can be made, thereby enhancing crop yield and quality [3]. This non-destructive method offers an efficient means for monitoring and managing ecosystems, agricultural production, and environmental protection, having significant implications for sustainable development and environmental conservation [4]. Jujube, a crucial agricultural product, significantly contributes to the agricultural economy of Xinjiang, China [5,6]. The rise and development of the associated industry have propelled related supply chains. Due to Xinjiang’s continual advancement in large-scale farming, rising labor costs, a shortage of relevant experts, and ongoing progress in smart and precision agriculture, the seamless and convenient detection of data related to crops and their yields has become a current trend in agricultural production. This trend is in alignment with the practical needs of modern agricultural development [7,8,9].

Spectroscopic data research, as a non-invasive and non-destructive analytical approach, has been widely applied across various fields [10]. In recent years, optical sensors have become widely used to detect, identify, and quantify various types of data [11,12]. Spectroscopic methods measure the absorption, reflection, or transmission characteristics of a target object or sample at different wavelengths of light, and the resulting spectral information is used to determine the properties, composition, or status of the target. In agriculture, spectral data obtained from plant leaves are used to assess vegetation health, chlorophyll content, and moisture levels, among other factors, supporting precision management practices such as fertilization, irrigation, and pest monitoring [13,14]. Compared with spectral data, RGB images are more readily available and convenient to acquire, making color channel data an efficient, non-destructive source of independent variables [15]. Remote sensing studies commonly invert chlorophyll content through image processing, using the image's color channels as inputs. Christine Y. Chang et al. have systematically evaluated retrieval methods for far-red solar-induced chlorophyll fluorescence (SIF) from tree canopies using high-frequency automated field spectroscopy, providing optimal configuration recommendations for ground-based SIF systems [16]. Lorenzo Cotrozzi et al. have analyzed spectral phenotypes of maize leaf physiological and anatomical traits and concluded that expanding the range of functional traits estimated from hyperspectral data can help improve breeding methods [17]. Dimitrios S. Kasampalis et al. have assessed the maturity stage and freshness of sweet pepper fruits using digital imaging, chlorophyll fluorescence, and visible/near-infrared spectroscopy.
They concluded that accurate inversion requires sensitive bands to be determined using genetic algorithms [18]. C. Gongora-Canul et al. have conducted a spatiotemporal study of wheat blast disease using multispectral imaging, achieving an accuracy of up to 0.96 [19]. Rinku Basak et al. have estimated chlorophyll a concentration in algae using impedance spectroscopy, with a best-fit coefficient of determination of 0.915 [20]. Marta Sá et al. have monitored cell concentration, chlorophyll, and fatty acids in a marine microalga using fluorescence spectroscopy and chemometrics, demonstrating the potential of fluorescence spectroscopy for real-time monitoring of key performance parameters during marine algae cultivation [21]. Oveis Hassanijalilian et al. have estimated soybean leaf chlorophyll content using smartphone digital imaging and machine learning, obtaining optimal results of R² = 0.89 and RMSE = 2.90 [22]. Lorenzo Cotrozzi et al. have used hyperspectral reflectance to study date palm (Phoenix dactylifera L.) under chronic ozone exposure, achieving an accuracy of 81% [23]. Yayi Huang et al. have used unmanned aerial vehicle imagery to estimate chlorophyll content in kale, obtaining an R² of 0.805, an RMSE of 3.343, and an RE of 6.84% [24]. Win Hung Tan et al. have efficiently estimated the quality, chlorophyll, and anthocyanin content of Spirogyra algae from smartphone images, achieving a stable R² of 0.8638 [25]. Aulia M. T. Nasution et al. have used a Raspberry Pi camera as a simple chlorophyll meter to invert rice leaf chlorophyll content, with satisfactory results [26]. Overall, past studies have shown that both spectral and color channel data can successfully invert chlorophyll content in their respective crops, although research combining multiple data sources remains limited.

Ensemble learning, with its advantages in terms of good generalization, robustness, flexibility, diversity, and parallelism, has been proven to be effective in addressing numerous problems [27,28]. In many contemporary challenges, conventional machine learning algorithms often struggle due to issues such as generalization and robustness, while deep learning methods face challenges related to sample size requirements and training speed, demanding higher computational resources. Xiaowan Chen et al. have utilized a combination of genetic algorithms and Partial Least Squares Regression in a reflectance spectroscopy approach to quantitatively analyze hyperspectral features and chlorophyll, achieving significantly improved results compared to the original method [29]. Mohammad M. Hasan et al. have adjusted the hyperparameters for unmanned aerial vehicle target classification using range/micro-Doppler features combined with R-PCA-SVM, ultimately achieving a high accuracy of 98% [30]. Tang et al. achieved improved accuracy in inverting chlorophyll a concentration in Donghu through multi-factor modeling using various machine learning methods, showcasing the potential for advancement in this field [31]. Xu et al. have showcased enhanced accuracy when employing Bayesian methods in the crop radiative transfer model inversion process [32]. Yao Li et al. have estimated actual evapotranspiration in the Tujiang River Basin using a hybrid CNN-RF model, obtaining high precision with correlation coefficients of 0.925 and 0.79 for AET with temperature and precipitation, respectively [33]. Chang Shuran et al., through grey relational analysis, have diagnosed and predicted breast cancer using an improved PSO-SVM model, demonstrating the superior performance of the GP-SVM model [34]. Riccardo Trinchero et al. 
have quantified the uncertainty of electromagnetic interference in a power converter influenced by multiple uncertain parameters using LS-SVM and GP regression, confirming the feasibility and accuracy of the model [35]. Yi Xiao et al. combined drone multispectral imagery with integrated algorithms to monitor the water quality of urban rivers, achieving a best R² of 0.839 [36]. Xin Ma et al. used models based on RF, RR, SVM, and ExtraTrees to forecast PM2.5 concentrations in the Beijing-Tianjin-Hebei region, obtaining a best MAE of 5.91 μg/m³ [37]. These examples demonstrate the diverse and successful applications of various machine learning algorithms in addressing specific real-world problems.

This study began with the collection of hyperspectral data and images of jujube leaves using a ground-based spectrometer and a camera. The hyperspectral data were then exported using the ViewSpecPro software. Next, the data were pre-processed by computing first- and second-order derivatives with the corresponding software and code. To improve data quality, the data set was cleaned using the Isolation Forest algorithm. Sensitive bands were then extracted using the Pearson correlation method and Partial Least Squares Regression combined with the Continuum Projection Method. Back-propagation Neural Network (BPNN), Ridge Regression, and Random Forest (RF) algorithms were then applied to invert leaf chlorophyll content. We compared spectral data combined with color channel data against spectral data alone, and we tested the three algorithms individually, in pairs, and all three together, tuning parameters to obtain the best results and using RMSE, MAE, and R² as evaluation indicators. Section 2 describes the pre-processing methods, the inversion models, and the evaluation metrics. Section 3 presents data collection, data cleaning, data pre-processing, correlation analysis, and feature extraction, followed by chlorophyll inversion using sensitive spectral bands combined with sensitive color channels and using spectral data alone, together with the final results and visualizations for each model and metric. Section 4 further analyzes the advantages of this approach and proposes related future work, and Section 5 reviews the study as a whole and states its significance and future directions.

2. Materials and Methods

2.1. Overview of the Experimental Area

The experimental area, located in Alaer City, Xinjiang, belongs to the First Division of the Xinjiang Production and Construction Corps, which is administered under a combined division-and-city management system. The area extends from the southern foothills of the Tien Shan mountains in the north to the northern edge of the Taklamakan Desert in the south, covering a total area of 6923.4 square kilometers [38]. Alaer City lies on the alluvial fine-soil plain of the Tarim River, with slight uplifts along the riverbanks and on both sides of the alluvial gullies, and the terrain slopes from northwest to southeast. The city spans 281 km from east to west and 180 km from north to south and has a warm temperate, extremely continental, arid desert climate, with extreme maximum temperatures reaching 35 °C (up to 40 °C in some reclamation areas) and extreme minimum temperatures dropping to −28 °C (−33.2 °C in specific reclamation areas). The reclamation area receives an annual average solar radiation of 133.7–146.3 kcal/cm² and annual average sunshine of 2556.3–2991.8 h, with a sunshine rate of 58.69%. Rainfall is scarce, with minimal winter snow and strong surface evaporation: average annual precipitation is 40.1 to 82.5 mm, and average annual evaporation is 1876.6 to 2558.9 mm. The geographic map is depicted in Figure 1.

2.2. Data Acquisition and Pre-Processing

Crop growth indicators are time-sensitive, so hyperspectral data, SPAD data, and image data must be collected synchronously from the same part of the plant within a short time frame to ensure accuracy and consistency.

(1) Hyperspectral data acquisition

Hyperspectral data were obtained using a FieldSpec HandHeld 2 spectrometer (ASD Corporation, Heracles Almelo, The Netherlands). The instrument covers the 325–1075 nm wavelength range with a wavelength accuracy of ±1 nm, a spectral resolution of <3.0 nm at 700 nm, a noise-equivalent radiance of 5 × 10⁻⁹ W/cm²/nm/sr at 700 nm, a minimum integration time of 8.5 ms, and a 25° field of view. An image of the equipment is shown in Figure 2a.

(2) Image data acquisition

Image data were acquired with a Canon EOS 800D (Canon, Tokyo, Japan) equipped with a 24.2-megapixel APS-C CMOS sensor, a DIGIC 7 image processor, and 45 autofocus points. Its metering modes are evaluative, partial, center-weighted average, and spot metering; its sensitivity range is ISO 100–25,600 (expandable to 51,200), and it has an optical viewfinder. An image of the equipment is shown in Figure 2b. Evaluative metering was used in this study.

The leaves were placed in a photographic dark box, measuring 80 cm, for image acquisition. The light color temperature was set to white, and the background plate was covered with white paper before photographing. An image of the equipment is shown in Figure 2c.

(3) Measurement of relative chlorophyll content

Jujube leaves were collected from the study area, and their chlorophyll content was measured with an SPAD-502Plus meter (Konica Minolta, Tokyo, Japan), avoiding the leaf veins. The device reports SPAD values with an error range within ±3.0 SPAD units. Each sample was measured five times, and the average of these readings was recorded as its SPAD value. To ensure reliability, the experiment was repeated three times. An image of the equipment used in the experiment is provided in Figure 2d.

(4) Experimental Crop Information

The experiment was conducted in both an experimental field and a jujube orchard, using the jujube variety “Aksu Red Jujube”. The trees were 5 to 10 years old, and their height was artificially maintained at approximately 2–2.5 m. The experimental field was located at the Tarim University Horticultural Experimental Station, where different types of fertilizer had been applied to the jujube trees; the trees in the orchard were managed by the farm owner. Leaf samples were collected using a five-point sampling method (east, west, south, north, and center positions), focusing on intact leaves from the tree canopy. After initial data screening, a total of 69 sets of jujube tree data were selected: 30 trees from the experimental field under different fertilization treatments (no fertilizer, moderate nitrogen fertilizer, excessive nitrogen fertilizer, organic fertilizer, nitrified nitrogen fertilizer, and phosphorus–potassium fertilizer) and 39 trees from the orchard under normal management. All experimental areas fall within the jurisdiction of Alaer City and share the climate described in Section 2.1.

(5) Collection and Pre-processing of Data

The relevant records were extracted from the data set after the preliminary screening of the jujube tree data. A total of 69 sets of hyperspectral data were collected, with 20 data points in each set, giving 1380 data points in total. The hyperspectral data were first read, and the raw data were exported using the ViewSpecPro software. Both ViewSpecPro and custom code were then used to calculate the first and second derivatives [39,40,41]. The first derivative gives the slope of the spectrum at a given wavelength, that is, the rate of change of the curve at that wavelength, as defined in Equation (1). The second derivative describes local variations in the spectrum, enabling the detection of inflection points and the sharpness of peaks, as defined in Equation (2). In these equations, f(x) is the function relating spectral intensity to wavelength, x is the wavelength, f′(x) is the first derivative, and f″(x) is the second derivative. Computing these derivatives helps to reveal trends and features within the spectral data.

f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}

(1)

f''(x) = \lim_{\Delta x \to 0} \frac{f'(x + \Delta x) - f'(x)}{\Delta x}

(2)
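Measured spectra are sampled at discrete wavelengths, so in practice the limits in Equations (1) and (2) are replaced by finite differences. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes a 1 nm sampling interval over 325–1075 nm and uses central differences via `numpy.gradient`; the sigmoid "red edge" is synthetic data, and the helper name `spectral_derivatives` is hypothetical. ViewSpecPro's own derivative routine may use a different difference scheme.

```python
import numpy as np


def spectral_derivatives(reflectance, wavelengths):
    """First- and second-order derivative spectra by central finite differences."""
    first = np.gradient(reflectance, wavelengths)   # discrete form of Eq. (1)
    second = np.gradient(first, wavelengths)        # discrete form of Eq. (2)
    return first, second


wavelengths = np.arange(325, 1076, dtype=float)     # 325-1075 nm, 1 nm steps (assumed)
# Synthetic sigmoid "red edge" centred at 715 nm, for illustration only
reflectance = 0.05 + 0.45 / (1.0 + np.exp(-(wavelengths - 715.0) / 15.0))
d1, d2 = spectral_derivatives(reflectance, wavelengths)
print(wavelengths[np.argmax(d1)])  # the first-derivative peak marks the inflection point
```

The first-derivative maximum locates the steepest point of the curve, which is why derivative spectra are often used to sharpen red-edge features before band selection.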

For the image data, a total of 345 images corresponding to the hyperspectral data were collected. To increase data diversity, the images were augmented through rotation, deformation, and related techniques that leave the color channel information unchanged. Twelve color features were then extracted from each image using the corresponding code: red (R), green (G), blue (B), hue (H), saturation (S), value (V), lightness (L), the green–red component (a), the blue–yellow component (b), cyan (C), magenta (M), and yellow (Y) [42,43,44].
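The section does not say how a single value per feature was obtained from each image, so the sketch below assumes that the mean sRGB color of the leaf pixels (optionally restricted by a boolean mask) is converted into each color space: HSV with Python's `colorsys`, CIE L*a*b* with the standard sRGB/D65 formulas, and CMY as 1 − RGB. The function names are hypothetical, and pixel values are assumed to be scaled to [0, 1].

```python
import colorsys

import numpy as np

# sRGB (D65) to CIE XYZ matrix and the D65 reference white
_RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def _srgb_to_lab(rgb):
    """Convert one sRGB colour with components in [0, 1] to CIE L*a*b*."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = (_RGB_TO_XYZ @ linear) / _WHITE_D65      # normalise by the white point
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    return 116 * f[1] - 16, 500 * (f[0] - f[1]), 200 * (f[1] - f[2])


def color_features(image, mask=None):
    """Return the 12 colour features of an HxWx3 float image (8-bit images: divide by 255)."""
    pixels = image.reshape(-1, 3) if mask is None else image[mask]
    r, g, b = (float(c) for c in pixels.mean(axis=0))   # mean colour (assumption)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    L, a, b_star = (float(c) for c in _srgb_to_lab([r, g, b]))
    return {"R": r, "G": g, "B": b, "H": h, "S": s, "V": v,
            "L": L, "a": a, "b": b_star,
            "C": 1 - r, "M": 1 - g, "Y": 1 - b}            # simple CMY model


leaf = np.zeros((2, 2, 3))
leaf[..., 1] = 1.0                                          # a pure-green toy "leaf"
print(color_features(leaf))
```

A leaf mask (for example, from thresholding against the white background used in the dark box) would keep background pixels out of the mean; how the mask is produced is left open here.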

(6) Data Smoothing and Cleaning

The Savitzky–Golay (SG) smoothing method was used to de-noise the data set because the data contained substantial interference. SG smoothing, introduced by Savitzky and Golay [45], is a polynomial smoothing technique based on the least-squares principle and is also known as convolution smoothing. A five-point smoothing window was used (half-window m = 2, so the window spans 2m + 1 = 5 points, with polynomial order k = 3). The core idea of the algorithm is to replace each data point X_m with the value of a polynomial fitted to the data points at wavelengths X_{m−2}, X_{m−1}, X_m, X_{m+1}, and X_{m+2}. SG smoothing is an established spectral pre-processing method that removes noise while preserving the shape of spectral features; in this study, it improved spectral quality and the accuracy and stability of subsequent analyses.
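Because the polynomial fit in each window is linear in the data, a five-point SG filter reduces to a fixed set of convolution weights. The sketch below derives those weights by least squares for half-window 2 and polynomial order 3 and applies them along a spectrum. The helper names are hypothetical, and repeating the end values to handle the edges is an assumption of this sketch; `scipy.signal.savgol_filter(y, window_length=5, polyorder=3)` is the library equivalent, although it treats the edges differently by default.

```python
import numpy as np


def savgol_coefficients(half_window=2, order=3):
    """Least-squares SG smoothing weights for a window of 2*half_window + 1 points."""
    offsets = np.arange(-half_window, half_window + 1)
    # Design matrix with columns 1, t, t^2, ..., t^order
    A = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted polynomial's value at t = 0
    return np.linalg.pinv(A)[0]


def savgol_smooth(y, half_window=2, order=3):
    """Replace each point by the centre value of a local polynomial fit."""
    weights = savgol_coefficients(half_window, order)
    padded = np.pad(np.asarray(y, dtype=float), half_window, mode="edge")
    return np.correlate(padded, weights, mode="valid")


rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * x)
noisy = clean + 0.05 * rng.normal(size=x.size)
smoothed = savgol_smooth(noisy)
print(np.round(savgol_coefficients() * 35))   # weights scaled by 35
```

For this window the weights are (−3, 12, 17, 12, −3)/35, and a cubic fit reproduces any cubic segment exactly, which is why the filter suppresses noise without flattening peaks the way a moving average does.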

The Isolation Forest (iForest) algorithm was used for data cleaning because of the large volume of data. This ensemble-learning-based anomaly detection method has linear time complexity and high accuracy, which makes it suitable for big-data processing [46,47]. It applies to continuous data and defines anomalies as “easily isolated outliers”, that is, points far from high-density clusters: because such points are separated by fewer random partitions, they have shorter average path lengths in the isolation trees. A key advantage is that it requires neither an explicit distance or density model nor labeled training data.
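A minimal sketch of this cleaning step, assuming scikit-learn's `IsolationForest` is used. The paper does not report its settings, so the `contamination` value (the expected fraction of anomalies), the number of trees, and the seed are assumptions, and the helper name `isolation_forest_clean` is hypothetical. `fit_predict` labels inliers +1 and outliers −1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def isolation_forest_clean(X, y, contamination=0.05, seed=0):
    """Drop rows that an Isolation Forest flags as anomalous."""
    forest = IsolationForest(n_estimators=100, contamination=contamination,
                             random_state=seed)
    labels = forest.fit_predict(X)          # +1 = inlier, -1 = outlier
    keep = labels == 1
    return X[keep], y[keep], np.flatnonzero(~keep)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 5)),      # typical samples (synthetic)
               rng.normal(8, 0.5, size=(5, 5))])     # five obvious anomalies
y = rng.normal(40, 3, size=X.shape[0])               # stand-in SPAD values
X_clean, y_clean, outlier_idx = isolation_forest_clean(X, y)
print(len(outlier_idx), "rows removed")
```

Returning the removed indices, rather than only the cleaned arrays, makes it possible to drop the same samples from every data source that shares the row order.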

2.3. Sensitive Feature Extraction Method

Sensitive feature extraction, the identification and extraction of the features that most strongly affect task or model performance, helps to reduce redundancy and potential bias in the model. Focusing on sensitive features also improves interpretability by showing which information the model relies on. Given the many candidate features in this study (hundreds of spectral bands plus 12 color channels), training complex models on all of them would invite the curse of dimensionality. The primary objective was therefore to reduce the dimensionality of the data while maintaining model performance by prioritizing the extraction of task-relevant sensitive features.

To improve the accuracy of the final results, the model was fitted on sensitive bands extracted by two methods: the Pearson correlation coefficient method and Partial Least Squares Regression (PLSR) combined with continuous projection.

(1) Pearson Correlation Coefficient

The Pearson correlation coefficient, named after the statistician Karl Pearson, measures the strength and direction of a linear relationship between two variables.

The Pearson correlation coefficient r ranges from −1 to 1. A value of 1 indicates a perfect positive linear correlation (both variables increase together), −1 a perfect negative linear correlation (one variable increases as the other decreases), and 0 no linear correlation. As shown in Equation (3), the numerator is the sum of the products of the paired deviations of X_i and Y_i from their means X̄ and Ȳ, and the denominator is the square root of the product of the two sums of squared deviations. The sign of r gives the direction of the relationship, and its magnitude gives the strength.

r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2} \sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2}}} .

(3)
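The sketch below evaluates Equation (3) for every band of a spectral matrix at once and keeps the k bands with the largest |r|. Ranking by absolute value (so strong negative correlations also count) and the top-20 cut-off, mirroring Table 1, are assumptions about how the selection was made; the data are synthetic and the helper names are hypothetical.

```python
import numpy as np


def pearson_per_band(X, y):
    """Equation (3) evaluated for every column (band) of X against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    numerator = Xc.T @ yc
    denominator = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return numerator / denominator


def top_k_bands(X, y, wavelengths, k=20):
    """Return the k bands with the largest |r|, strongest first."""
    r = pearson_per_band(X, y)
    order = np.argsort(-np.abs(r))[:k]
    return wavelengths[order], r[order]


rng = np.random.default_rng(0)
wavelengths = np.arange(325, 1076)                       # 1 nm bands (assumed)
X = rng.normal(size=(200, wavelengths.size))             # synthetic spectra
y = 2.0 * X[:, 400] - 1.5 * X[:, 100] + 0.3 * rng.normal(size=200)  # stand-in SPAD
bands, r_top = top_k_bands(X, y, wavelengths, k=20)
print(bands[:5], np.round(r_top[:5], 3))
```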

(2) Partial Least Squares Regression (PLS Regression) and Projection Pursuit Regression (PPR)

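Partial Least Squares Regression (PLS Regression) is a statistical method particularly well-suited to high-dimensional data sets and situations involving multi-collinearity [48]. It projects the predictors and the response onto a small number of latent components chosen to maximize their covariance and then regresses the response on these components.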

Projection Pursuit Regression (PPR) is a statistical method for constructing regression models [49]. PPR searches for linear projections (directions) of the independent variables and models the target variable as a sum of smooth non-linear functions of these projections, so that the relationship becomes simpler in the projected space. By identifying the most informative projections, PPR aims to improve both the interpretability and the predictive performance of regression models, making it a valuable tool for data analysis and model building.

2.4. Inversion Modeling

In this study, an inversion model for leaf chlorophyll was constructed using the BPNN, Ridge Regression, and RF algorithms. BPNN was chosen because it can model complex non-linear relationships, including in large-scale and high-dimensional data sets. Ridge Regression is effective when the data exhibit multi-collinearity, because it introduces a penalty term that shrinks the feature coefficients. Random Forest generally provides accurate predictions and handles large data sets with high-dimensional features well. These three methods were therefore tested, and the results were compared and analyzed in seven forms: BPNN, Ridge Regression, RF, BPNN–Ridge Regression, BPNN–RF, Ridge Regression–RF, and BPNN–Ridge Regression–RF.

(1) BPNN

The Back-propagation Neural Network (BPNN) is a type of Artificial Neural Network (ANN) that can be applied to a wide range of problems [50]. Its non-linear activation functions allow it to learn complex non-linear relationships. During training, the error between the network's predictions and the true labels is propagated backward through the network to update the weights of each neuron, bringing the predictions closer to the true labels. This iterative adjustment enables the network to learn from its errors and progressively improve its prediction accuracy.
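To make the forward and backward passes concrete, the sketch below trains a one-hidden-layer network in plain NumPy with a tanh activation, a mean-squared-error loss, and full-batch gradient descent. The layer size, learning rate, epoch count, and helper name `train_bpnn` are illustrative assumptions, not the architecture used in the study; in practice a library implementation such as scikit-learn's `MLPRegressor` would normally be used.

```python
import numpy as np


def train_bpnn(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Train a 1-hidden-layer network by back-propagation (full-batch gradient descent)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(p), size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - target
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule applied to the mean-squared error
        g_pred = 2.0 * err / n
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)    # derivative of tanh
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent weight updates
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new):
        return (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses


rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2
predict, losses = train_bpnn(X, y)
print(round(losses[0], 4), round(losses[-1], 4))   # training loss before and after
```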

(2) Ridge Regression

Ridge Regression is an extension of linear regression, designed to address the issue of multi-collinearity [51]. Multi-collinearity refers to high correlation among input features, which can lead to instability in the linear regression model parameters and over-fitting. Ridge Regression effectively addresses this problem by introducing a regularization term.

Ridge Regression minimizes the same least-squares loss as ordinary linear regression plus an L2-norm regularization term. The objective is given in Equation (4), where n is the number of samples, p is the number of features, y_i is the true label, ŷ_i is the model's prediction, β_j are the model's weight parameters, and α is the regularization parameter. The regularization term, α times the sum of β_j² over j = 1 to p, penalizes large weights, which reduces the influence of correlated features on the parameter estimates and improves model stability. In the presence of multi-collinearity, Ridge Regression therefore curbs over-fitting by shrinking the feature coefficients.

\min_{β} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2} + α \sum_{j = 1}^{p} β_{j}^{2} .

(4)
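For centered data, the minimizer of Equation (4) has the closed form β = (XᵀX + αI)⁻¹Xᵀy, with the unpenalized intercept recovered from the means. The sketch below implements this with a hypothetical helper `ridge_fit` on synthetic data containing two nearly collinear columns, and prints how the coefficients shrink and stabilize as α grows.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Minimise Equation (4) in closed form; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean               # centre so beta excludes the intercept
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


rng = np.random.default_rng(1)
base = rng.normal(size=(50, 1))
# Two nearly collinear columns plus two independent ones (synthetic)
X = np.hstack([base, base + 0.01 * rng.normal(size=(50, 1)), rng.normal(size=(50, 2))])
y = 3.0 * base[:, 0] + X[:, 2] - 0.5 * X[:, 3] + 2.0 + 0.1 * rng.normal(size=50)
for alpha in (0.0, 0.1, 1.0, 10.0):
    beta, b0 = ridge_fit(X, y, alpha)
    print(alpha, np.round(beta, 2), round(float(b0), 2))
```

With α = 0 the estimate is ordinary least squares, and the weight split between the two collinear columns is unstable; even a small α pulls them toward sharing the effect.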

(3) Random Forest

The Random Forest (RF) algorithm is an ensemble learning method for classification and regression problems [52]. It builds many decision trees, each on a bootstrap sample of the data with a random subset of features considered at each split, and aggregates their predictions (by averaging, for regression) to improve the overall accuracy and generalization ability of the model. Random Forest handles large data sets with many features well and is relatively resistant to noise and outliers, which makes it less prone to over-fitting. These properties made it well suited to the multi-feature problem addressed in this experiment.
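For regression, scikit-learn's `RandomForestRegressor` predicts the mean of its individual trees' predictions. The sketch below checks that aggregation directly; the number of trees, the seed, and the synthetic data standing in for the spectral and color features are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # synthetic stand-in features
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Each tree was fitted on a bootstrap sample; the forest prediction is their mean
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
forest_pred = rf.predict(X)
print(np.abs(forest_pred - per_tree.mean(axis=0)).max())
```

Averaging many decorrelated trees lowers variance without raising bias much, which is the source of the robustness described above.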

(4) Advantages of BPNN–RF–Ridge Regression algorithm

In this experiment, fusing the Back-propagation Neural Network (BPNN), Ridge Regression, and Random Forest (RF) models offers advantages from several perspectives. Combining a neural network, a regularized linear model (Ridge Regression is a form of linear regression), and a tree ensemble (Random Forest) can offset the limitations of each individual model and improve overall robustness, because the models differ in their sensitivity to, and ability to fit, the data. Each algorithm suits data with different characteristics: BPNN captures complex non-linear relationships and suits highly non-linear data; Ridge Regression addresses multi-collinearity and provides stability; and Random Forest performs well with large, high-dimensional data sets and is relatively robust. Because the data in this experiment exhibit all of these characteristics, combining the three methods is justified and may yield better performance on complex and diverse data sets. The corresponding process is depicted in Figure 3.
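The text does not specify how the three models' outputs are fused, so the sketch below assumes the simplest scheme, an equal-weight average of member predictions, and scores all seven single and combined forms on a held-out split. `MLPRegressor` stands in for the BPNN; the hyperparameters, the 70/30 split, the synthetic data, and the helper name are all assumptions rather than the study's configuration.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def evaluate_model_combinations(X, y, seed=0):
    """R² on a held-out split for each single model and each equal-weight average."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    models = {
        # MLPRegressor stands in for the BPNN; X is assumed to be standardised
        "BPNN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        "Ridge": Ridge(alpha=1.0),
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    # Fit each base model once and keep its test-set predictions
    preds = {name: model.fit(X_tr, y_tr).predict(X_te) for name, model in models.items()}
    results = {}
    for size in (1, 2, 3):
        for combo in combinations(models, size):
            fused = np.mean([preds[name] for name in combo], axis=0)  # simple averaging
            results["-".join(combo)] = r2_score(y_te, fused)
    return results


rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=150)
scores = evaluate_model_combinations(X, y)
for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} R² = {r2:.3f}")
```

Weighted averaging or a stacked meta-model are common alternatives when one member is clearly stronger than the others.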

2.5. Description of Assessment Indicators

Because the experiment addresses a regression problem, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R² were used for evaluation.

RMSE is a standard measure of a model's prediction error; a smaller RMSE indicates higher prediction accuracy. It is defined in Equation (5), where n is the sample size, y_i is the actual value (true label), and ŷ_i is the model's predicted value.

The Mean Absolute Error (MAE) is another measure of the disparity between predicted and actual values in regression problems. It is the average of the absolute prediction errors, as given in Equation (6), with the same symbols as in Equation (5). Because it does not square the errors, MAE is less sensitive than RMSE to large individual errors.

The coefficient of determination, R², measures how well a regression model fits the data: it is the proportion of the variance in the target variable that is explained by the model, as defined in Equation (7), where ȳ denotes the mean of the actual values and the remaining symbols are as in Equation (5). R² is at most 1, and values closer to 1 indicate a better fit; on held-out data it can fall below 0 when a model predicts worse than the mean of the actual values.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}},

(5)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|,

(6)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(7)
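Equations (5) to (7) translate directly into a few lines of NumPy. The helper name `regression_metrics` is hypothetical; scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R²) as defined in Equations (5)-(7)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))                                      # Eq. (5)
    mae = np.mean(np.abs(resid))                                             # Eq. (6)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)    # Eq. (7)
    return float(rmse), float(mae), float(r2)


# One error of 1 among four samples: RMSE = 0.5, MAE = 0.25, R² = 1 - 1/5 = 0.8
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```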

3. Results

3.1. Results of Data Cleaning

Spectral data processing began with Savitzky–Golay (SG) smoothing for de-noising. First- and second-order derivatives were then calculated from the smoothed data, after which outliers were removed with the Isolation Forest algorithm. The corresponding plots are shown in Figure 4a–d.
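A minimal sketch of this spectral pre-processing with SciPy's `savgol_filter`, assuming each spectrum is stored as one row of a NumPy array. The window length and polynomial order are placeholders (they are not reported here), and computing the derivatives with the SG filter's own `deriv` option is one common choice, not necessarily the one used by the authors.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess_spectra(spectra, window_length=11, polyorder=2):
    """SG-smooth each spectrum (row) and return first/second derivatives along the band axis.

    window_length and polyorder are illustrative defaults, not values reported in the study.
    """
    spectra = np.asarray(spectra, dtype=float)
    smoothed = savgol_filter(spectra, window_length, polyorder, axis=1)
    # Derivatives from the same local polynomial fit, assuming a band spacing of 1
    first = savgol_filter(spectra, window_length, polyorder, deriv=1, axis=1)
    second = savgol_filter(spectra, window_length, polyorder, deriv=2, axis=1)
    return smoothed, first, second
```

Taking the derivative inside the SG fit avoids differentiating raw noise, which is the usual reason for smoothing before differentiating.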

For the color channel data, first- and second-order derivatives were calculated directly from the acquired data, followed by Isolation Forest processing. The spectral and color channel data each comprised 1380 samples. The indices of the abnormal samples identified in the two data sets were merged, and the resulting 162 samples were removed, leaving 1218 samples (1380 − 162) for further processing. The corresponding results are depicted in Figure 4e–h.
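The merge-then-drop step can be sketched with scikit-learn's `IsolationForest`, assuming that the spectral features, color channel features, and SPAD values are row-aligned NumPy arrays. The `contamination` value and the function name are assumed placeholders; in this study the procedure removed 162 of 1380 samples.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def drop_joint_outliers(spectral, color, spad, contamination=0.05, seed=0):
    """Flag outliers separately in each data set, merge the flagged indices, drop them everywhere."""
    flagged = set()
    for X in (spectral, color):
        labels = IsolationForest(contamination=contamination,
                                 random_state=seed).fit_predict(X)
        flagged.update(np.flatnonzero(labels == -1).tolist())  # -1 marks an outlier
    keep = np.array(sorted(set(range(len(spad))) - flagged), dtype=int)
    return spectral[keep], color[keep], spad[keep], sorted(flagged)
```

Removing the union of flagged indices keeps the spectral, color, and SPAD rows aligned, which the later modeling steps require.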

3.2. Correlation Analysis and Selection

We first computed Pearson correlation coefficients between the measured chlorophyll (SPAD) values and the raw data, first-order derivatives, and second-order derivatives of both the spectral data and the color channel data, and then compiled the results. The coefficients of the 20 most highly correlated spectral bands are listed in Table 1, and those of the color channels are shown in Figure 5. Comparing the sensitive bands of the original, first-order derivative, and second-order derivative spectral data, bands 1075 and 1043 appeared in all three sets. A further 13 bands (1014, 1015, 1023, 1044, 1006, 1018, 1024, 1032, 1007, 1029, 1017, 1013, and 992) recurred across the data sets and were retained for further investigation. For the color channels, channels G, L, and a recurred across the original data, first-order derivatives, and second-order derivatives and were therefore selected for further study.
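The ranking-and-comparison step can be sketched in NumPy as follows. This assumes that the raw, first-derivative, and second-derivative data are sample-by-band arrays sharing one list of band identifiers, and it ranks bands by the absolute value of r (an assumption, since Table 1 lists only the coefficients). The function names are hypothetical.

```python
from collections import Counter

import numpy as np


def pearson_r(X, y):
    """Pearson correlation between each column of X and the target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.nan_to_num(r)  # constant columns get r = 0 instead of NaN


def top_k_bands(X, y, band_ids, k=20):
    """Band identifiers of the k columns with the largest |r|."""
    order = np.argsort(-np.abs(pearson_r(X, y)))[:k]
    return [band_ids[i] for i in order]


def recurring_bands(variants, y, band_ids, k=20, min_count=2):
    """Bands ranking in the top k for at least min_count of the data variants."""
    counts = Counter(b for X in variants for b in top_k_bands(X, y, band_ids, k))
    return sorted(b for b, c in counts.items() if c >= min_count)
```

For example, `recurring_bands([raw, d1, d2], spad, band_ids, min_count=3)` would return bands present in all three top-20 lists, while `min_count=2` also admits bands that recur in only two of them.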

We then used Partial Least Squares Regression (PLSR) combined with the Continuum Projection Method to analyze the raw data, first-order derivatives, and second-order derivatives of both the spectral and color channel data. Because the initial selection of the top 20 correlated spectral bands did not identify enough sensitive bands, we extended the selection to the top 30 bands. The processing results for the spectral data are shown in Figure 6a–c, and those for the color channel data in Figure 6d–f. In Figure 6a–c, each color represents a spectral band, and in Figure 6d–f, each color represents a color channel; because there are too many categories, legend labels are omitted, but the overall trends remain readable. The raw spectral and color channel data were widely scattered, whereas after derivative processing the interference was markedly reduced and the data converged clearly. The correlation coefficients of the color channels are shown in Figure 6g–i. After comparing the sensitive bands of the original, first-order derivative, and second-order derivative spectral data, we selected 15 sensitive bands (656, 655, 654, 652, 651, 490, 498, 492, 491, 497, 489, 721, 723, 722, and 720) for further investigation. Likewise, a comparison of the original data, first-order derivatives, and second-order derivatives of the color channels showed that channels G, L, and a were sensitive, and these were selected for subsequent study.

3.3. Summary and Analysis of Model Chlorophyll Inversion Results

The sensitive spectral bands and color channels were then used as inputs for chlorophyll inversion with BPNN, Ridge Regression, RF, and combinations of these algorithms. The data were split into training and testing sets at a ratio of 65:35, and the results were analyzed and summarized.

Each algorithm was first evaluated individually. The BPNN used a single hidden layer of 10 nodes, a ReLU activation function, a maximum of 1000 iterations, an initial learning rate of 0.001, a regularization parameter (alpha) of 0.0001, and a batch size of 32, which gave the best results; it achieved R² = 0.47523, MAE = 4.5661, and RMSE = 6.4783 (Figure 7a). The RF algorithm used 300 trees (n_estimators), an adaptive maximum depth (max_depth), a minimum of 2 samples for internal node splits and for leaf nodes, and an adaptive maximum number of features, achieving R² = 0.53053, MAE = 3.7632, and RMSE = 4.3125 (Figure 7b). For Ridge Regression, the regularization strength alpha was chosen by grid search over [0.1, 1.0, 10.0] with the solver set to “auto”, achieving R² = 0.64312, MAE = 3.8132, and RMSE = 4.6482 (Figure 7c).

Ensemble methods were then explored. BPNN–Ridge Regression, which uses the BPNN for feature extraction and Ridge Regression for training, achieved R² = 0.68745, MAE = 2.728, and RMSE = 3.2951 (Figure 7d). BPNN–RF, which uses the BPNN for feature extraction and RF (with specific parameter settings) for regression, achieved R² = 0.72707, MAE = 2.7693, and RMSE = 4.7631 (Figure 7e). RF–Ridge Regression and BPNN–RF–Ridge Regression were examined in the same way (Figure 7f and Figure 7g, respectively); BPNN–RF–Ridge Regression, the best-fitting model, achieved R² = 0.8249, MAE = 2.437, and RMSE = 2.9724.

The results are summarized in Figure 8a–c. Among the individual algorithms, performance ranked Ridge Regression > RF > BPNN in terms of fit (R²) and RF > Ridge Regression > BPNN in terms of error (lowest error first). Every ensemble method achieved a higher R² than any individual algorithm. In terms of fit, the ensembles ranked BPNN–RF–Ridge Regression > RF–Ridge Regression > BPNN–RF > BPNN–Ridge Regression, whereas in terms of error (lowest first) they ranked RF–Ridge Regression > BPNN–RF–Ridge Regression > BPNN–Ridge Regression > BPNN–RF. These findings highlight the potential of ensemble methods to improve predictive performance.
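The hyperparameters listed above map naturally onto scikit-learn estimators, which the parameter names (n_estimators, max_depth, alpha, “auto”) suggest but the text does not confirm. The sketch below is therefore an assumption throughout: “adaptive” maximum depth is read as `max_depth=None`, “adaptive” maximum features as using all features (`max_features=1.0`), the BPNN inputs are standardized, and the grid search uses 5-fold cross-validation. The 65:35 split follows the text; the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_base_models(seed=0):
    """Individual models with the reported hyperparameters (scikit-learn mapping assumed)."""
    bpnn = make_pipeline(
        StandardScaler(),  # assumption: inputs are standardized before the BPNN
        MLPRegressor(hidden_layer_sizes=(10,), activation="relu", max_iter=1000,
                     learning_rate_init=0.001, alpha=0.0001, batch_size=32,
                     random_state=seed),
    )
    rf = RandomForestRegressor(n_estimators=300, max_depth=None, min_samples_split=2,
                               min_samples_leaf=2, max_features=1.0, random_state=seed)
    ridge = GridSearchCV(Ridge(solver="auto"), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
    return {"BPNN": bpnn, "RF": rf, "Ridge": ridge}


def evaluate_base_models(X, y, seed=0):
    """65:35 train/test split, then test-set R2, MAE, and RMSE for each model."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.35, random_state=seed)
    results = {}
    for name, model in build_base_models(seed).items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "R2": r2_score(y_te, pred),
            "MAE": mean_absolute_error(y_te, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
        }
    return results
```

Wrapping Ridge in `GridSearchCV` means the chosen alpha is refit on the whole training split before predicting, so the three models are compared on the same held-out 35%.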

Experiments were also conducted using only the sensitive spectral bands, without the color channels; the results are shown in Figure 8d. Compared with the experiments that included the color channels, the fit and errors differed only slightly: the overall error was somewhat lower without the color channels, but the fit (R²) was also lower. Because adding the sensitive color channels improved the fit at the cost of only a small increase in error, the color channels appear to benefit chlorophyll inversion.
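This comparison amounts to a feature-set ablation on a fixed train/test split. The sketch below assumes row-aligned NumPy arrays and any scikit-learn-compatible regressor; it refits a fresh copy of the same model on each feature set so that only the inputs differ. The function name is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_feature_sets(model, X_spectral, X_color, y, test_size=0.35, seed=0):
    """Fit the same model on spectral-only and spectral+color inputs using one shared split."""
    idx_train, idx_test = train_test_split(np.arange(len(y)), test_size=test_size,
                                           random_state=seed)
    feature_sets = {
        "spectral": X_spectral,
        "spectral+color": np.hstack([X_spectral, X_color]),
    }
    results = {}
    for name, X in feature_sets.items():
        fitted = clone(model).fit(X[idx_train], y[idx_train])  # fresh, unfitted copy per run
        pred = fitted.predict(X[idx_test])
        results[name] = {
            "R2": r2_score(y[idx_test], pred),
            "MAE": mean_absolute_error(y[idx_test], pred),
            "RMSE": float(np.sqrt(mean_squared_error(y[idx_test], pred))),
        }
    return results
```

Sharing the split indices matters: otherwise differences between the two runs could come from which samples landed in the test set rather than from the color channels.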

4. Discussion

The study first collected hyperspectral data, chlorophyll data, and image data from jujube leaves using dedicated instruments. The hyperspectral and image data were then pre-processed, including data augmentation and cleaning. Next, correlation analyses were performed on the spectral band data and the image color channel data to extract sensitive features. BPNN, RF, Ridge Regression, and ensemble algorithms combining these three methods were then applied, and the results were compiled and analyzed to identify the algorithms with a good fit and small errors.

In recent years, researchers have used spectral and image data to investigate chlorophyll inversion in various crops, with several notable findings. For example, Jay et al. demonstrated the potential of vegetation indices for high-throughput phenotyping by retrieving LAI, chlorophyll, and nitrogen content in sugar beet crops from multi-source data [53]. Lu et al. provided practical insights for the remote monitoring of mite infestation in jujube by inverting leaf chlorophyll content under spider mite stress with ensemble algorithms [54]. Wu et al. used unmanned aerial vehicle hyperspectral remote sensing to assess the multi-parameter health of jujube trees and obtained R² values of 0.726 and up to 0.853 with the RF method [55]. Integrating multi-source data and ensemble learning has thus been shown to improve accuracy in related inversion studies. However, studies of chlorophyll and related nutrients in jujube remain limited. Previous studies mainly used fixed bands and vegetation indices without extracting sensitive bands, relied on a single class of data, and used laboratory equipment to achieve higher accuracy. The present experiment offers an effective approach for the non-destructive detection of chlorophyll in jujube leaves using multi-source data and ensemble learning. Portable equipment was used to acquire data quickly and efficiently in the field, and combining these data with the algorithms enables rapid, non-destructive, and convenient detection. Because rapid and non-destructive detection is crucial for the development of agricultural informatization, these findings are significant for agricultural research.

The acquired data were first cleaned manually to handle missing, redundant, and complex records. Because manual cleaning alone proved inadequate for such complex data, SG smoothing was applied to further remove interference, and the Isolation Forest algorithm was then used to remove outliers, which improved the stability of the data for correlation analysis. A normality analysis indicated that the acquired data approximately followed a normal distribution. Sensitive features were then extracted using Pearson correlation analysis and Partial Least Squares Regression combined with the Continuum Projection Method, and 30 sensitive spectral bands (15 from each method) and three color channels were selected for modeling. Notably, no band was selected by both methods: the bands from the Pearson correlation analysis lay predominantly in the near-infrared region, whereas those from the Partial Least Squares Regression–Continuum Projection Method lay mainly in the red region. Future analyses could compare these bands with existing hyperspectral indices known to be sensitive for jujube.
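The text does not name the normality test. As one possibility, a Shapiro–Wilk test from SciPy could be applied to each variable, for example the SPAD values; the significance level and function name are assumed placeholders. Note that a large p-value means only that normality is not rejected, not that it is confirmed.

```python
import numpy as np
from scipy.stats import shapiro


def normality_check(values, alpha=0.05):
    """Shapiro-Wilk test; returns (normality_not_rejected, W statistic, p-value)."""
    statistic, p_value = shapiro(np.asarray(values, dtype=float))
    return bool(p_value >= alpha), float(statistic), float(p_value)
```

With large samples such as the 1218 retained here, the test can reject normality for small, practically harmless deviations, so a Q-Q plot is a useful companion check.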

We compared models that used only spectral data with models that combined spectral data with color channel data. Adding the color channels increased the R² of each algorithm, most notably by 3.2% for BPNN–RF–Ridge Regression. Although MAE and RMSE increased by 1.6% and 3.9%, respectively, the overall fit improved. Future research could identify sensitive color spaces in images by testing different combinations of color spaces. The results also showed that the ensemble algorithms outperformed the individual algorithms in terms of fit, highlighting their effectiveness for chlorophyll inversion in jujube leaves. BPNN–RF–Ridge Regression slightly outperformed RF–Ridge Regression and showed a larger decline when the color channel data were removed.

Chlorophyll, a key photosynthetic pigment, directly influences plant growth and health. Focusing on apparently normal jujube leaves, this study aimed to enable real-time detection of relevant elements for smart agriculture. By integrating hyperspectral data and image color channel data, the experiments demonstrated the effectiveness of ensemble learning methods for the non-destructive detection of nutrient elements in jujube. Inverting jujube leaf chlorophyll enables real-time monitoring of plant growth status and health, helping to detect growth anomalies or stress promptly. This study therefore supports efforts to raise agricultural yields through smart agriculture, demonstrates the feasibility of data fusion for vegetation monitoring, and provides a reference for ecological environmental monitoring. Overall, it presents an accurate method for chlorophyll inversion in jujube leaves that is relevant to plant growth monitoring and smart agriculture.

Several limitations contributed to the errors observed in this study. First, the spectral data were acquired under natural light; despite frequent calibration and data cleaning, interference caused by variations in natural light could not be completely eliminated. Second, chlorophyll content varies over time, which poses a further challenge: although the interval between the acquisition of the spectral and chlorophyll data was kept as short as possible, some residual interference remained. Third, although the sample data were made as diverse as possible to improve robustness and generalizability, applying the developed methods to other crops will require corresponding experiments. Finally, the machine learning algorithms used here may be less accurate than deep learning algorithms, so choosing between them involves a trade-off between accuracy and efficiency.

These limitations suggest several directions for future research. Controlling variables, for example by using dark boxes outdoors or by collecting data with the corresponding devices simultaneously, could reduce the impact of natural light. Multi-source data inversion, including spectroscopy, could also be used to distinguish different jujube (red date) varieties and support origin tracing. Integrating airborne and ground-based remote sensing data could further improve precision, and improved or new algorithms should be explored to enhance accuracy.

Previous research has focused predominantly on crops other than jujube, and the experiments conducted were often destructive and relied primarily on single-source input data. This study addressed this gap by starting with apparently normal jujube leaves, with the objective of detecting relevant elements for smart agriculture. By combining multi-source data with ensemble learning, our model achieved improved accuracy, offering a potential approach for future research.

In conclusion, the study of chlorophyll inversion in jujube leaves using a combination of hyperspectral data and image color channel data holds great potential for enhancing agricultural productivity, optimizing resource usage, safeguarding the environment, and advancing the utilization of remote sensing technology in agricultural and environmental monitoring. Through the analysis of chlorophyll content, a deeper understanding of plant growth conditions can be acquired, thereby laying a scientific foundation for agricultural production and ecological environment management.

5. Conclusions

In the era of large-scale, mechanized, and intelligent agriculture, precision agriculture and smart farming have become imperative, and traditional manual surveys and destructive detection methods pose numerous challenges. This study combined hyperspectral data with image color channel data to invert chlorophyll content in jujube leaves. The data were first cleaned by visual interpretation, after which the Isolation Forest algorithm, derivative methods, the Pearson method, and PLS + PPR methods were applied to further clean the data and extract features. Combining hyperspectral data and color channels for chlorophyll inversion improved the goodness of fit of the inversion. Compared with previous studies, this experiment proposed combining multiple data sources for chlorophyll inversion in jujube leaves. Unlike past research that relied on fixed vegetation indices, this study extracted features from various color channels and specific spectral bands, providing references for subsequent experiments on sensitive vegetation indices. The approach is non-destructive and provides an accurate method for chlorophyll inversion in jujube leaves. Ensemble methods such as BPNN–RF–Ridge Regression performed well in the inversion, and combining multi-source data, such as spectral band data with image color channel data, improved the model fit. These methods were developed and tested on healthy jujube leaves. The study thus contributes to agricultural monitoring and management and has practical significance for sustainable agricultural development and research.

Author Contributions

Conceptualization, J.W. and X.L.; methodology, J.W.; validation, X.L., T.B. and J.W.; formal analysis, J.W. and X.L.; investigation, J.W.; resources, X.L. and T.B.; data curation, J.W.; writing—original draft preparation, J.W.; writing—review and editing, X.L., T.B. and J.W.; visualization, J.W.; supervision, X.L. and T.B.; project administration, X.L. and T.B.; funding acquisition, X.L. and T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Oasis Ecological Agriculture Corps Key Laboratory Open Project (202002), the Corps Science and Technology Program (2021CB041, 2021BB023, and 2021DB001), the Tarim University Innovation Team Project (TDZKCX202306 and TDZKCX202102), and the National Natural Science Foundation of China (61563046).

Data Availability Statement

The data presented in this study are available on request from any of the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Humphrey, A. Chlorophyll. Food Chem. 1980, 5, 57–67.
2. Haboudane, D.; Miller, J.R.; Tremblay, N.; Zarco-Tejada, P.J.; Dextraze, L. Integrated narrow-band vegetation indices for prediction of crop chlorophyll content for application to precision agriculture. Remote Sens. Environ. 2002, 81, 416–426.
3. Bauriegel, E.; Herppich, W.B. Hyperspectral and chlorophyll fluorescence imaging for early detection of plant diseases, with special reference to Fusarium spec. infections on wheat. Agriculture 2014, 4, 32–57.
4. Kalaji, H.M.; Jajoo, A.; Oukarroum, A.; Brestic, M.; Zivcak, M.; Samborska, I.A.; Cetner, M.D.; Łukasik, I.; Goltsev, V.; Ladle, R.J. Chlorophyll a fluorescence as a tool to monitor physiological status of plants under abiotic stress conditions. Acta Physiol. Plant. 2016, 38, 102.
5. Chen, K.; Fan, D.; Fu, B.; Zhou, J.; Li, H. Comparison of physical and chemical composition of three chinese jujube (Ziziphus jujuba Mill.) cultivars cultivated in four districts of Xinjiang region in China. Food Sci. Technol. 2018, 39, 912–921.
6. Liu, M.; Zhao, Z. Germplasm resources and production of jujube in China. Acta Hortic. 2008, 840, 25–32.
7. Banks, T. Property rights reform in rangeland China: Dilemmas on the road to the household ranch. World Dev. 2003, 31, 2129–2142.
8. Shi, L.; Shi, G.; Qiu, H. General review of intelligent agriculture development in China. China Agric. Econ. Rev. 2019, 11, 39–51.
9. Zhang, N.; Wang, M.; Wang, N. Precision agriculture—A worldwide overview. Comput. Electron. Agric. 2002, 36, 113–132.
10. Milton, E.J.; Schaepman, M.E.; Anderson, K.; Kneubühler, M.; Fox, N. Progress in field spectroscopy. Remote Sens. Environ. 2009, 113, S92–S109.
11. Cotrozzi, L. Spectroscopic detection of forest diseases: A review (1970–2020). J. For. Res. 2022, 33, 21–38.
12. Cotrozzi, L.; Couture, J.J. Hyperspectral assessment of plant responses to multi-stress environments: Prospects for managing protected agrosystems. Plants People Planet 2020, 2, 244–258.
13. Hank, T.B.; Berger, K.; Bach, H.; Clevers, J.G.; Gitelson, A.; Zarco-Tejada, P.; Mauser, W. Spaceborne imaging spectroscopy for sustainable agriculture: Contributions and challenges. Surv. Geophys. 2019, 40, 515–551.
14. Tsuchikawa, S.; Ma, T.; Inagaki, T. Application of near-infrared spectroscopy to agriculture and forestry. Anal. Sci. 2022, 38, 635–642.
15. Dupouy, C.; Whiteside, A.; Tan, J.; Wattelez, G.; Murakami, H.; Andréoli, R.; Lefèvre, J.; Röttgers, R.; Singh, A.; Frouin, R. A Review of Ocean Color Algorithms to Detect Trichodesmium Oceanic Blooms and Quantify Chlorophyll Concentration in Shallow Coral Lagoons of South Pacific Archipelagos. Remote Sens. 2023, 15, 5194.
16. Chang, C.Y.; Guanter, L.; Frankenberg, C.; Köhler, P.; Gu, L.; Magney, T.S.; Grossmann, K.; Sun, Y. Systematic assessment of retrieval methods for canopy far-red solar-induced chlorophyll fluorescence using high-frequency automated field spectroscopy. J. Geophys. Res. Biogeosci. 2020, 125, e2019JG005533.
17. Cotrozzi, L.; Peron, R.; Tuinstra, M.R.; Mickelbart, M.V.; Couture, J.J. Spectral phenotyping of physiological and anatomical leaf traits related with maize water status. Plant Physiol. 2020, 184, 1363–1377.
18. Kasampalis, D.S.; Tsouvaltzis, P.; Ntouros, K.; Gertsis, A.; Gitas, I.; Siomos, A.S. The use of digital imaging, chlorophyll fluorescence and Vis/NIR spectroscopy in assessing the ripening stage and freshness status of bell pepper fruit. Comput. Electron. Agric. 2021, 187, 106265.
19. Gongora-Canul, C.; Salgado, J.; Singh, D.; Cruz, A.; Cotrozzi, L.; Couture, J.; Rivadeneira, M.; Cruppe, G.; Valent, B.; Todd, T. Temporal dynamics of wheat blast epidemics and disease measurements using multispectral imagery. Phytopathology 2020, 110, 393–405.
20. Basak, R.; Wahid, K.A.; Dinh, A. Estimation of the chlorophyll-a concentration of algae species using electrical impedance spectroscopy. Water 2021, 13, 1223.
21. Sá, M.; Bertinetto, C.G.; Ferrer-Ledo, N.; Jansen, J.J.; Wijffels, R.; Crespo, J.G.; Barbosa, M.; Galinha, C.F. Fluorescence spectroscopy and chemometrics for simultaneous monitoring of cell concentration, chlorophyll and fatty acids in Nannochloropsis oceanica. Sci. Rep. 2020, 10, 7688.
22. Hassanijalilian, O.; Igathinathane, C.; Doetkott, C.; Bajwa, S.; Nowatzki, J.; Esmaeili, S.A.H. Chlorophyll estimation in soybean leaves infield with smartphone digital imaging and machine learning. Comput. Electron. Agric. 2020, 174, 105433.
23. Cotrozzi, L.; Lorenzini, G.; Nali, C.; Pellegrini, E.; Saponaro, V.; Hoshika, Y.; Arab, L.; Rennenberg, H.; Paoletti, E. Hyperspectral reflectance of light-adapted leaves can predict both dark-and light-adapted chl fluorescence parameters, and the effects of chronic ozone exposure on date palm (Phoenix dactylifera). Int. J. Mol. Sci. 2020, 21, 6441.
24. Huang, Y.; Ma, Q.; Wu, X.; Li, H.; Xu, K.; Ji, G.; Qian, F.; Li, L.; Huang, Q.; Long, Y. Estimation of chlorophyll content in Brassica napus based on unmanned aerial vehicle images. Oil Crop Sci. 2022, 7, 149–155.
25. Tan, W.H.; Ibrahim, H.; Chan, D.J.C. Estimation of mass, chlorophylls, and anthocyanins of Spirodela polyrhiza with smartphone acquired images. Comput. Electron. Agric. 2021, 190, 106449.
26. Nasution, A.M.; Fajrin, Y.A.; Suyanto, H. Calibrating of simple and low cost Raspberry-Pi camera-based Chlorophyll meter for accurately determining chlorophyll content in paddy leaves. In Proceedings of the Third International Seminar on Photonics, Optics, and Its Applications (ISPhOA 2018), Surabaya, Indonesia, 1–2 August 2018; SPIE: Bellingham, WA, USA, 2019; pp. 36–39.
27. Alber, M.; Buganza Tepole, A.; Cannon, W.R.; De, S.; Dura-Bernal, S.; Garikipati, K.; Karniadakis, G.; Lytton, W.W.; Perdikaris, P.; Petzold, L. Integrating machine learning and multiscale modeling—Perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit. Med. 2019, 2, 115.
28. Garcez, A.d.A.; Gori, M.; Lamb, L.C.; Serafini, L.; Spranger, M.; Tran, S.N. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv 2019, arXiv:1905.06088.
29. Chen, X.; Dong, Z.; Liu, J.; Wang, H.; Zhang, Y.; Chen, T.; Du, Y.; Shao, L.; Xie, J. Hyperspectral characteristics and quantitative analysis of leaf chlorophyll by reflectance spectroscopy based on a genetic algorithm in combination with partial least squares regression. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 243, 118786.
30. Hasan, M.M.; Chakraborty, M.; Raj, A.A.B. A Hyper-parameters-tuned R-PCA+ SVM Technique for sUAV Targets Classification using the Range-/Micro-Doppler Signatures. IEEE Trans. Radar Syst. 2023, 1, 623–631.
31. Tang, X.; Huang, M. Inversion of chlorophyll-a concentration in Donghu Lake based on machine learning algorithm. Water 2021, 13, 1179.
32. Xu, X.; Lu, J.; Zhang, N.; Yang, T.; He, J.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Inversion of rice canopy chlorophyll content and leaf area index based on coupling of radiative transfer and Bayesian network models. ISPRS J. Photogramm. Remote Sens. 2019, 150, 185–196.
33. Li, Y.; Wang, W.; Wang, G.; Tan, Q. Actual evapotranspiration estimation over the Tuojiang River Basin based on a hybrid CNN-RF model. J. Hydrol. 2022, 610, 127788.
34. Shuran, C.; Yian, L. Breast cancer diagnosis and prediction model based on improved PSO-SVM based on gray relational analysis. In Proceedings of the 2020 19th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Xuzhou, China, 16–19 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 231–234.
35. Trinchero, R.; Canavero, F.G. Combining LS-SVM and GP regression for the uncertainty quantification of the EMI of power converters affected by several uncertain parameters. IEEE Trans. Electromagn. Compat. 2020, 62, 1755–1762.
36. Xiao, Y.; Guo, Y.; Yin, G.; Zhang, X.; Shi, Y.; Hao, F.; Fu, Y. UAV multispectral image-based urban river water quality monitoring using stacked ensemble machine learning algorithms—A case study of the Zhanghe river, China. Remote Sens. 2022, 14, 3272.
37. Ma, X.; Chen, T.; Ge, R.; Xv, F.; Cui, C.; Li, J. PM2.5 concentration forecasting in the area of Jing-Jin-Ji using models based on RF, RR, SVM, and ExtraTrees. 2022.
38. Yang, P.; Xia, J.; Zhang, Y.; Zhan, C.; Sun, S. How is the risk of hydrological drought in the Tarim River Basin, Northwest China? Sci. Total Environ. 2019, 693, 133555.
39. Rady, A.; Fischer, J.; Reeves, S.; Logan, B.; James Watson, N. The effect of light intensity, sensor height, and spectral pre-processing methods when using NIR spectroscopy to identify different allergen-containing powdered foods. Sensors 2020, 20, 230.
40. Zhang, Z.; Ding, J.; Zhu, C.; Wang, J. Combination of efficient signal pre-processing and optimal band combination algorithm to predict soil organic matter through visible and near-infrared spectra. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 240, 118553.
41. Blazhko, U.; Shapaval, V.; Kovalev, V.; Kohler, A. Comparison of augmentation and pre-processing for deep learning and chemometric classification of infrared spectra. Chemom. Intell. Lab. Syst. 2021, 215, 104367.
42. Süsstrunk, S.; Buckley, R.; Swen, S. Standard RGB color spaces. In Proceedings of the IS&T/SID 7th Color Imaging Conference, Scottsdale, AZ, USA, 16–19 November 1999; pp. 127–134.
43. Phuangsaijai, N.; Jakmunee, J.; Kittiwachana, S. Investigation into the predictive performance of colorimetric sensor strips using RGB, CMYK, HSV, and CIELAB coupled with various data preprocessing methods: A case study on an analysis of water quality parameters. J. Anal. Sci. Technol. 2021, 12, 19.
44. Schwarz, M.W.; Cowan, W.B.; Beatty, J.C. An experimental comparison of RGB, YIQ, LAB, HSV, and opponent color models. ACM Trans. Graph. (Tog) 1987, 6, 123–158.
45. Press, W.H.; Teukolsky, S.A. Savitzky-Golay smoothing filters. Comput. Phys. 1990, 4, 669–672.
46. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 413–422.
47. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–39.
48. Geladi, P.; Kowalski, B.R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1–17.
49. Friedman, J.H.; Stuetzle, W. Projection pursuit regression. J. Am. Stat. Assoc. 1981, 76, 817–823.
50. Rumelhart, D.E.; Hinton, G.E.; McClelland, J.L. A general framework for parallel distributed processing. Parallel Distrib. Process. Explor. Microstruct. Cogn. 1986, 1, 26.
51. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
52. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
53. Jay, S.; Maupas, F.; Bendoula, R.; Gorretta, N. Retrieving LAI, chlorophyll and nitrogen contents in sugar beet crops from multi-angular optical remote sensing: Comparison of vegetation indices and PROSAIL inversion for field phenotyping. Field Crops Res. 2017, 210, 33–46.
54. Lu, J.; Qiu, H.; Zhang, Q.; Lan, Y.; Wang, P.; Wu, Y.; Mo, J.; Chen, W.; Niu, H.; Wu, Z. Inversion of chlorophyll content under the stress of leaf mite for jujube based on model PSO-ELM method. Front. Plant Sci. 2022, 13, 1009630.
55. Wu, Y.; Zhao, Q.; Yin, X.; Wang, Y.; Tian, W. Multi-parameter health assessment of jujube trees based on unmanned aerial vehicle hyperspectral remote sensing. Agriculture 2023, 13, 1679.

Figure 1. Map of the study area.

Figure 2. The equipment used for data collection: (a) spectral data collection instrument; (b,c) image data collection instruments; and (d) chlorophyll data collection instrument.

Figure 3. Basic flow diagram of the BPNN–RF–Ridge Regression algorithm.
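
Figure 3 summarizes the integrated pipeline, but the caption does not state exactly how the BPNN, RF, and Ridge Regression components are combined. One common pattern, assumed here purely for illustration, is stacking: the predictions of base models (for example, a BPNN and an RF) become the inputs of a Ridge Regression meta-model. The sketch below implements only that hypothetical meta-model step, using the closed-form ridge solution of Hoerl and Kennard on mean-centred data so that the intercept is not penalised. The names `fit_ridge` and `predict`, the value of `alpha`, and the example numbers are illustrative assumptions, not the authors' code or data.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
              alpha: float = 1.0) -> Tuple[List[float], float]:
    """Hypothetical meta-model: w = (Xc^T Xc + alpha*I)^(-1) Xc^T yc on mean-centred
    data, so the intercept is left unpenalised. Returns (weights, intercept)."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[p] * r[q] for r in Xc) + (alpha if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    rhs = [sum(r[p] * t for r, t in zip(Xc, yc)) for p in range(k)]
    w = solve(gram, rhs)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept


def predict(X: Sequence[Sequence[float]], w: Sequence[float], intercept: float) -> List[float]:
    """Linear prediction intercept + w . x for each row of base-model outputs."""
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


if __name__ == "__main__":
    # Hypothetical base-model predictions (columns: BPNN, RF) and measured SPAD values.
    base_preds = [[40.1, 41.0], [45.2, 44.8], [50.3, 49.9], [38.7, 39.5], [47.9, 48.4]]
    spad = [40.5, 45.0, 50.0, 39.0, 48.0]
    w, b = fit_ridge(base_preds, spad, alpha=1.0)
    print(predict([[44.0, 44.5]], w, b))
```

In a stacking setup the base-model predictions passed to the meta-model are normally out-of-fold predictions, so that the ridge weights are not fitted to overfitted training outputs; the ridge penalty also helps because BPNN and RF predictions of the same target tend to be strongly collinear.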

Figure 4. Diagram illustrating the pre-processing of one-dimensional spectral data and color channel data. (a) Raw spectral data. (b) Spectral data after SG smoothing. (c) First-order derivative of spectral data. (d) Second-order derivative of spectral data. (e) Raw color channel data. (f) First-order derivative of color channel data. (g) Second-order derivative of color channel data. (h) Processing with the Isolation Forest algorithm.
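
The pre-processing steps shown in Figure 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Savitzky-Golay (SG) filter uses a 5-point window with a quadratic fit (the caption does not give the window length or polynomial order), derivatives are taken by finite differences on the smoothed curve, and outlier screening follows the Isolation Forest scoring of Liu et al., in which a score close to 1 marks an easily isolated, anomalous sample. The function names, the parameters `n_trees`, `sample_size`, and `seed`, and the choice to score whole samples are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# 5-point quadratic Savitzky-Golay smoothing weights; window and order are assumptions.
SG5_QUAD = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)
EULER_GAMMA = 0.5772156649015329


def sg_smooth(y: Sequence[float]) -> List[float]:
    """Convolve with the SG weights; the two points at each end are copied unchanged
    (a simplification; full SG implementations typically fit the edge windows)."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + k - 2] for k, w in enumerate(SG5_QUAD))
    return out


def derivative(y: Sequence[float], step: float = 1.0) -> List[float]:
    """Finite-difference derivative (central inside, one-sided at the ends); needs len(y) >= 2."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((y[hi] - y[lo]) / ((hi - lo) * step))
    return out


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows: List[Tuple[float, ...]], depth: int, limit: int, rng: random.Random) -> Dict:
    """Grow one isolation tree using a random feature and a random split value per node."""
    if depth >= limit or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo, hi = min(r[dim] for r in rows), max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {"dim": dim, "split": split,
            "left": build_tree([r for r in rows if r[dim] < split], depth + 1, limit, rng),
            "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, limit, rng)}


def path_length(x: Tuple[float, ...], node: Dict, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for the points left unresolved there."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(rows: List[Tuple[float, ...]], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score 2^(-E[h(x)] / c(psi)) for every row; needs at least 2 rows."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    limit = math.ceil(math.log2(psi))  # height limit used by Liu et al.
    trees = [build_tree(rng.sample(rows, psi), 0, limit, rng) for _ in range(n_trees)]
    return [2.0 ** (-sum(path_length(r, t) for t in trees) / n_trees / c_factor(psi))
            for r in rows]


if __name__ == "__main__":
    reflectance = [0.31, 0.33, 0.32, 0.36, 0.41, 0.45, 0.50, 0.52]  # hypothetical spectrum
    smoothed = sg_smooth(reflectance)
    first, second = derivative(smoothed), derivative(derivative(smoothed))
    print(isolation_scores([(0.31,), (0.33,), (0.32,), (0.90,)], n_trees=50))
```

Samples whose score exceeds a chosen threshold would then be dropped before feature extraction; the threshold itself is a further assumption not stated in the caption.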

Figure 5. Color channel correlation coefficients.

Figure 6. Algorithmic processing of correlated spectral and color channel data. (a) Raw spectral data. (b) First-order derivative of spectral data. (c) Second-order derivative of spectral data. (d) Raw color data. (e) First-order derivative of color data. (f) Second-order derivative of color data. (g) Correlation analysis of raw color data. (h) Correlation analysis of first-order derivative color data. (i) Correlation analysis of second-order derivative color data.

Figure 7. Chlorophyll inversion results for each algorithm. (a) BPNN. (b) RF. (c) Ridge Regression. (d) BPNN–Ridge Regression. (e) BPNN–RF. (f) RF–Ridge Regression. (g) BPNN–RF–Ridge Regression.

Figure 8. Plots of assessment indicator results: (a–c) evaluation metrics for chlorophyll content inferred from combined spectral and color channel data; and (d) evaluation metrics for chlorophyll content inferred from spectral data only.
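
The indicators summarized in Figure 8 (R², MAE, and RMSE) can be computed as below. This sketch assumes R² is the coefficient of determination, 1 - SS_res/SS_tot, rather than the squared Pearson correlation between measured and predicted values; the caption does not say which definition was used, and the two can differ when predictions are biased. MAE and RMSE follow their standard definitions, and the function names are illustrative.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target (here SPAD values)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large residuals more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(r2_score(measured, predicted), mae(measured, predicted), rmse(measured, predicted))
    # → 0.8 0.25 0.5
```

Because RMSE squares residuals, it always equals or exceeds MAE on the same data, which is consistent with the RMSE values reported for this study being larger than the corresponding MAE values.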

Table 1. Correlation coefficient results for spectral bands.

Band No.	Raw Data Correlation Coefficient	Band No.	First-Order Derivative Correlation Coefficient	Band No.	Second-Order Derivative Correlation Coefficient
1075	0.975	1075	0.871	1075	0.852
1072	0.938	1043	0.649	1015	0.605
1061	0.931	1014	0.641	1014	0.604
1066	0.931	1015	0.641	1043	0.600
1051	0.926	1023	0.641	1006	0.598
1070	0.926	1044	0.640	1023	0.596
1052	0.924	1006	0.637	1032	0.596
1056	0.924	1018	0.637	1044	0.596
1062	0.923	1024	0.637	1007	0.594
1067	0.923	1032	0.637	1018	0.593
1057	0.922	1007	0.636	992	0.592
1069	0.922	1029	0.636	1024	0.592
1064	0.916	1017	0.633	1017	0.591
1071	0.913	1035	0.630	993	0.586
1063	0.912	1037	0.630	1029	0.586
1043	0.911	1031	0.629	986	0.584
1065	0.911	1036	0.629	1013	0.584
1058	0.910	1013	0.628	987	0.583
1050	0.908	992	0.627	974	0.582
1060	0.907	1010	0.627	975	0.582
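
Table 1 ranks spectral bands by their correlation coefficient, separately for the raw, first-derivative, and second-derivative spectra. A ranking of this kind can be sketched with Pearson's correlation coefficient as follows. Ranking by absolute value (so that strongly negative correlations would also be retained), the layout that maps a band number to its per-sample values, and the default `k = 20` are assumptions made for illustration; the same function would be applied once to each transformed spectrum.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_bands(spectra: Dict[int, Sequence[float]], spad: Sequence[float],
              k: int = 20) -> List[Tuple[int, float]]:
    """Return the k (band, r) pairs with the largest |r| against the SPAD readings.
    `spectra` maps a band number to that band's value in every sample (hypothetical layout)."""
    scored = [(band, pearson(values, spad)) for band, values in spectra.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)  # stable for ties
    return scored[:k]


if __name__ == "__main__":
    spad = [30.0, 35.0, 40.0, 45.0]  # hypothetical SPAD readings for four leaves
    spectra = {1: [1.0, 2.0, 3.0, 4.0], 2: [4.0, 3.0, 2.0, 1.0], 3: [1.0, 3.0, 2.0, 4.0]}
    for band, r in top_bands(spectra, spad, k=2):
        print(band, round(r, 3))
```

Correlation ranking alone tends to select runs of adjacent, highly collinear bands, as the raw-data column of Table 1 suggests; this is one reason a second selection step such as the PLSR–successive projection combination mentioned in the abstract is useful.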

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

