Evaluation of Baby Leaf Products Using Hyperspectral Imaging Techniques

Barrasso, Antonietta Eliana; Perone, Claudio; Romaniello, Roberto

doi:10.3390/app15158532


by Antonietta Eliana Barrasso, Claudio Perone * and Roberto Romaniello

Department of Agriculture, Food, Natural Resource and Engineering, University of Foggia, 71122 Foggia, Italy

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8532; https://doi.org/10.3390/app15158532

Submission received: 5 July 2025 / Revised: 28 July 2025 / Accepted: 29 July 2025 / Published: 31 July 2025

(This article belongs to the Special Issue Advances in Automation and Controls of Agri-Food Systems)



Featured Application

The proposed model may enable the development of a device for the real-time field monitoring of plants. Such a system would be capable of assessing plant health and physiological needs, thereby preventing stress-related issues—such as wilting and yield reduction—and ultimately preserving product quality.

Abstract

The transition to efficient production requires innovative water control techniques to maximize irrigation efficiency and minimize waste. Analyzing and optimizing irrigation practices is essential to improve water use and reduce environmental impact. The aim of the research was to identify a discrimination method to analyze the different hydration levels in baby-leaf products. The species being researched was spinach, harvested at the baby leaf stage. Utilizing a large dataset of 261 wavelengths from the hyperspectral imaging system, the feature selection minimum redundancy maximum relevance (FS-MRMR) algorithm was applied, leading to the development of a neural network-based prediction model. Finally, a mathematical classification model K-NN (k-nearest neighbors type) was developed in order to identify a transfer function capable of discriminating the hyperspectral data based on a threshold value of absolute leaf humidity. Five significant wavelengths were identified for estimating the moisture content of baby leaves. The resulting model demonstrated a high generalization capability and excellent correlation between predicted and measured data, further confirmed by the successful training, validation, and testing of a K-NN-based statistical classifier. The construction phase of the statistical classifier involved the use of the experimental dataset and the critical humidity threshold value of 0.83 (83% of leaf humidity) was considered, below which the baby-leaf crop requires the irrigation intervention. High percentages of correct classification were achieved for data within two humidity classes. Specifically, the statistical classifier demonstrated excellent performance, with 81.3% correct classification for samples below the threshold and 99.4% for those above it. 
The application of advanced spectral analysis and artificial intelligence methods has led to significant progress in leaf moisture analysis and prediction, yielding substantial implications for both agriculture and biological research.

Keywords:

hyperspectral imaging; baby leaf; precision agriculture; water content

1. Introduction

Growing public concern about environmental impact and the unsustainable exploitation of non-renewable resources has stimulated research on strategies to optimize the use of these resources while enhancing product quality [1]. The complex interaction between water resources and agri-food production is driven by the concepts of food security and environmental sustainability [2]. Global population growth, coupled with rapid economic development in many regions, significantly increases the demand for food and agricultural products [3]. According to Food and Agriculture Organization (FAO) projections for 2050, the combined increase in these factors could raise global food demand by more than 70%, with peaks of more than 100% in some developing nations [4]. The goal is to meet growing needs through the development of sustainable agronomic practices that balance technical and economic considerations with environmental impacts and social benefits [5].

Understanding how climate change influences plant growth and water use is essential for appreciating its impact on water use efficiency (WUE). Maximizing WUE is one way to mitigate the effects of a changing climate, and exploring mechanisms to increase WUE in agricultural systems is key to exploiting limited water resources. Understanding the physical and biological factors that interact to create high WUE is therefore essential to ensure agricultural productivity and the resilience of cropping systems in a context of increasing climate change-induced water scarcity and variability [6].

Furthermore, given the intensifying competition for finite water resources among agricultural, industrial, and civil uses (where technologies such as the decanter centrifuge have been developed for the dewatering of sludge), maximizing WUE in agriculture has become critically important [7]. This pressure is intensified by population growth and increasing affluence, in a context where the development of new water sources is economically and ecologically limited. Agriculture, being the largest consumer of water globally and often perceived as inefficient, becomes a prime target for efficiency improvement in a scenario of increasing global water scarcity that may soon limit food production [8].

To maximize WUE in crops, it is essential to accurately determine the water consumption of plants [9]. A range of techniques is currently employed to optimize water use in agriculture; their efficiency varies considerably with the irrigation method used, so achieving maximal water-use efficiency requires optimization measures tailored to each irrigation approach [10]. Furthermore, there is a trend towards more innovative and advanced technologies, such as satellite sensors, drones, GPS, and telemetry systems, by means of which farmers can accurately monitor soil conditions and crop water requirements in real time [11].

It is possible to take an approach that significantly considers crucial factors such as rainfall, evapotranspiration, soil moisture, and crop water requirements, based on the use of vegetation indices that provide information on plant health and water status using the measurements obtained from remote sensors or field instruments. Although such indices offer the advantage of computational simplicity, they do not fully account for the essential variables related to crop needs: consequently, they may not be able to adequately represent agricultural drought [12].

The increasing consumer acceptance of baby leaf vegetables underscores the critical importance of meticulous water management during their cultivation, directly impacting their water content and their quality and shelf-life.

Baby leaf vegetables are a production type that is gaining increasing consumer acceptance [13]. Spinach appears to be a strategic crop of particular interest to the processing industry, since its crop cycle follows all other spring–summer cycle species and occupies a time slot in which no other crops are present [14].

The commercial success of these products is due to several factors, including the service offered to increasingly busy consumers, the reduction in waste due to the complete consumability of the product, and the high quality associated with this type of product in terms of appearance, taste, and nutritional value [13,14].

It is essential to detect the freshness of spinach to safeguard its quality and commercial value. Traditional detection methods involve a number of complex procedures, such as measuring chemical indices such as moisture, chlorophyll, and soluble sugar content; however, these methods are time consuming and require the supervision of experienced operators, in addition to being costly. Therefore, the need to develop a rapid and non-invasive method to detect the freshness of vegetables, including spinach, is crucial. This approach would allow high-quality standards to be maintained throughout the distribution chain, while ensuring excellent nutritional quality and increased consumer satisfaction [15].

Leaf water content is one of the most common physiological parameters that limit the efficiency of photosynthesis and biomass productivity in plants, so it is of great importance to determine or predict water content quickly and non-destructively [16].

The ability to regulate irrigation levels with high precision could increase the nutritional quality of produce through targeted agronomic practices. Baby leaves such as spinach are rich in antioxidant compounds that degrade easily after harvest, and irrigation management has been observed to influence how long these substances persist. A study of the effects of water stress on antioxidant compounds in spinach showed that, compared to the control treatment, water stress significantly slowed the loss of ascorbic acid in spinach leaves at the post-harvest stage [17].

Various approaches have been developed to estimate water content through the analysis of reflectance data. Several studies have explicitly tested the performance of different vegetation indices and have shown that stronger absorbance bands are more sensitive to changes in leaf water content than fainter water absorption bands in the near-infrared region [18].

As reported in the study conducted by Boshkovski et al. [19], hyperspectral data acquisition combined with multivariate statistical analysis and the creation of customized indices made it possible to study the relationship between reflectance and physiological/biochemical parameters in stressed olive trees. The study used hyperspectral reflectance to estimate the health of stressed olive trees (drought/salinity), analyzing varietal tolerance (photosynthesis, water, enzymes), correlation between spectral data and physiological parameters for future drone scans, and identifying wavelengths for customized enzymatic indices. Hyperspectral spectroscopy was used in the study of vegetation in ice-free areas of Antarctica to assess the spectral separability of different taxa. After data processing with PCA and LDA, the visible and near-UV regions (380–700 nm) were identified as the most effective for distinguishing vegetation classes. The results revealed distinct spectral signatures among taxa, confirming the potential of hyperspectral spectroscopy for environmental monitoring in Antarctic ecosystems [20].

Other studies make use of spectral indices that take advantage of the near-infrared (NIR) or shortwave infrared (SWIR) band to evaluate leaf water content, using the NIR band as a reference to normalize the effect of leaf structural variability [21].

As highlighted by Habibullah et al., a portable, low-cost multispectral optical sensor system has been developed for the non-invasive determination of nitrogen and water content in the leaves of various crops. This system acquires data in the visible and near-infrared bands and employs machine learning regression models. Specifically, Gaussian process regression (GPR) demonstrated high reliability, with an R² value exceeding 0.90 in estimating leaf water content, underscoring the promising feasibility of this approach for more efficient irrigation management and crop health monitoring [22].

Another recent study showed that visible and near-infrared (VIS/NIR) spectroscopy is an effective method for the non-destructive determination of leaf water content in Miscanthus. The research involved acquiring reflectance spectra in the 400–2500 nm range and applying various multivariate regression models. These models estimated water content with high precision, even when utilizing only 75 sensitive wavelengths [16].

Currently, both artificial vision and spectroscopic technology are widely used in the field of agricultural product detection due to their non-invasive, fast, and reliable characteristics [15].

Hyperspectral imaging has rapidly risen to prominence in recent years, revolutionizing the way to obtain and analyze complex data. The research conducted by Zhao et al. specifically focuses on the detection of water content in lettuce using a portable hyperspectral imaging system with spectral range from 391.65 to 1018 nm, resulting in the selection of characteristic bands and the construction of a robust predictive model [23].

In a separate study, MIVIS hyperspectral imaging was applied to evaluate the water content of poplar plantations at leaf and canopy scales. This involved collecting data in water-sensitive spectral bands and implementing both empirical models derived from hyperspectral indices and the inversion of radiative transfer models [18].

Another study discusses hyperspectral field imaging and functional linear regression (FLR) for the rapid and nondestructive estimation of water content in grapevine (Vitis vinifera L.) leaves. This approach successfully identified optimal spectral bands (Z3 at 1465 nm and Z4 at 2035 nm), providing a precise tool for irrigation management in vineyards and effectively overcoming the limitations of traditional destructive methods [24].

As reported in the study of Zhang et al., hyperspectral imaging has been used to assess the freshness of leafy vegetables, including spinach and pakchoi. As the water content in plant tissues consistently decreases during storage, leading to deterioration and changes in freshness, hyperspectral imaging emerges as a significant method for rapid and accurate freshness evaluation without exacerbating water loss during the detection process [25].

Nowadays, in the context of food analysis, detectors sensitive to the visible–near-infrared (Vis-NIR) and near-infrared (NIR) regions are commonly used due to their ability to provide detailed information about product composition [26], to detect fraud [27], and even the ripening stage of fruit [28]. Numerous studies have demonstrated the effectiveness of Vis-NIR and NIR-based techniques for accurately identifying, distinguishing, and assessing the quality of fresh fruit in a fast and non-destructive manner [29,30,31,32].

The effectiveness of hyperspectral imaging is supported by results from various studies; for example, one investigation demonstrated the ability of reflectance-mode HSI (400–1000 nm) to accurately monitor soybean flour content in pasta. Samples with increasing percentages of soybean flour (0–50%) were analyzed, and the FS-MRMR algorithm identified the most relevant wavelengths, with 655 nm proving particularly effective. A predictive model based on these wavelengths achieved high accuracy, confirming the potential of HSI for real-time quality control and its suitability for integration into industrial processes aligned with Industry 4.0 principles [33].

The use of hyperspectral imaging in the field of the non-destructive quality detection of fruit and vegetables has achieved significant results in recent years, enabling a detailed analysis of both the internal and external quality of the sample under examination [34]. Due to its ability to simultaneously examine the spectral and spatial characteristics of samples, it has proven particularly effective in assessing freshness, ripeness, the presence of defects, and other quality parameters of agricultural produce. The integration of this information enables producers, traders, and consumers to make decisions on product quality and marketing, helping to improve supply chain efficiency and ensure better consumer satisfaction [15].

Compared to the current state of the art, the innovation of this research lies in the identification of specific, optimized wavelengths for assessing the hydration status of baby leaf vegetables under real-field conditions. This is achieved through an integrated approach combining hyperspectral imaging and simplified mathematical modeling. The primary objective is to develop a reliable, rapid, and non-destructive method for determining leaf water content, meeting key requirements such as sample integrity, measurement repeatability, ease of implementation, and suitability for in-field application.

The study aims to select the most influential wavelengths to accurately estimate leaf moisture levels, using them as input variables in a mathematical model that outputs the hydration status of the plant. In alignment with the Sustainable Development Goals (SDGs) and the Paris Agreement, this work contributes to a more sustainable use of freshwater resources by enabling irrigation practices that are tailored to the actual needs of the crop.

From a precision agriculture perspective, the research supports the development of tools and technologies that enhance efficiency and sustainability in crop management. The real innovation lies in the possibility of building machines that carry out scouting operations and monitor plants in the field in real time, assessing health conditions and needs such as water content. This would prevent the product from reaching a stress phase, which reduces product quality through effects such as wilting and lower product weight.

To summarize, the research had the following main objectives:

-: Identify a rapid, reliable, and non-destructive method to analyze the different hydration levels in young leafy vegetables (baby-leaf), such as spinach.
-: Maximize irrigation efficiency and minimize waste, supporting more efficient production and reducing environmental impact.
-: Develop a model for the real-time monitoring of plant health conditions in the field, preventing water stress and preserving product quality.
-: Identify the specific wavelengths optimized by hyperspectral imaging to accurately estimate leaf moisture.
-: Contribute to the more sustainable use of water resources by promoting irrigation practices based on actual crop needs.

2. Materials and Methods

To set up the trials, two spinach plots were established in an extensive cultivation system. In the first plot, irrigation fully satisfied the field water balance and the evapotranspirative demand. In the second plot, water stress was induced in order to obtain non-transpiring vegetation. The moment at which the crop required irrigation was identified by means of in situ probes.

2.1. Leaf Sampling and Water Content Measurements

The entire project involved 17 experimental trials with a total of 330 samples, collected between March 2023 and May 2024, with sampling once a week.

In each trial, aliquots were collected from the two plots, and then representative samples were made in the laboratory for both irrigated and water-stressed products.

Each sample underwent preliminary operations: removal of free surface water to avoid reflections, arrangement of the leaf groups, and assignment of an identification number. Before being arranged on a black background for scanner capture, the samples were weighed to obtain their wet weight.

Before starting the acquisition, it is necessary to calibrate the instrument to provide a reference system of 0 and 100% reflectance. This calibration was performed by acquiring electronic camera noise (acquisition with the lens shuttered) and then acquiring the image of a calibrated white reference (100% reflectance). After calibration, the system was set to acquire values within 80% of the maximum reflectance in order to avoid image saturation.

To obtain the water content, the samples were completely dried at a temperature of 105 °C for 24 h, following the standardized and AOAC (Association of Official Analytical Chemists) recognized protocol for dry matter determination, used by Sánchez et al. [35].

After that, they were weighed again, and the difference between wet and dry weight was calculated, providing an estimate of the amount of water present.
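The gravimetric calculation can be sketched as follows. This is a minimal illustration, assuming moisture is expressed on a wet basis (water lost divided by fresh weight); the function name and the sample weights are hypothetical and not taken from the study.

```python
def moisture_fraction(fresh_weight_g: float, dry_weight_g: float) -> float:
    """Wet-basis moisture: water lost on drying divided by the fresh weight."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g


# Hypothetical sample: 10.0 g fresh, 1.5 g after oven drying.
example_moisture = moisture_fraction(10.0, 1.5)

# Compare against the 0.83 critical threshold quoted in the abstract.
needs_irrigation = example_moisture < 0.83
```

With these illustrative weights the moisture fraction is 0.85, which lies above the 0.83 threshold, so the sample would not be flagged for irrigation.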

The images were processed using the hyperspectral scanner’s proprietary software, measuring 200 points per sample to obtain an average reflectance spectrum for each batch. Each of these spectra was then correlated to the respective moisture content of the measured sample.

2.2. Hyperspectral Scanner

The hyperspectral images of all samples were acquired by means of a moving-plane hyperspectral scanner (DV Optic s.r.l., Padova, Italy) equipped with two cameras with corresponding spectrographs capable of acquiring reflectance in the visible and near-infrared ranges. The visible sensor acquires reflectance data in a wavelength range between 400 and 1000 nm, and the near-infrared sensor acquires between 900 and 1700 nm.

The primary sensor is a progressive scan camera (AVT F100 B, Allied Vision Technologies, Stadtroda, Germany), which incorporates a 16-bit charge-coupled device (CCD) chip (Kodak KSI 1020, Stuttgart, Germany). This camera is coupled with an ImSpector V10 spectrograph (Specim Ltd., Oulu, Finland) that operates within a wavelength range of 400 to 1000 nm, providing a sampling rate of 5 nm. The spectrograph employs prism and grating optics and is fitted to a C-mount f/1.4 objective. Its function is to disperse the incoming radiation along the scan line, generating spectral information for each spatial position.

The secondary sensor is a short wave infrared (SWIR) camera, featuring an InGaAs sensor (Xeva InGaAs VIS/NIR 320 × 256 FPA, Xenics nv, Leuven, Belgium). This camera is capable of acquiring reflectances at wavelengths from 1000 to 1700 nm, also with a sampling rate of 5 nm. As for the NIR spectrograph (covering 900–1700 nm), the detector utilized is of the complementary metal-oxide semiconductor (CMOS) type. The illumination sources are different for the two types of sensors.

For the VIS/NIR sensor, a 150 W halogen illuminant was employed. This illuminant was connected to a fiber optic guide, which terminated in a linear collimator designed to illuminate the scanner’s line of sight. Conversely, for the SWIR sensor, a distinct halogen source (EKE 9596ER, 150 W) was utilized, characterized by an emission spectrum ranging from 800 to 2000 nm. This illuminant was also equipped with a fiber optic guide and a linear collimator. To facilitate the sensors’ reception of reflected radiation from the object, two mirrors, positioned at a 45° angle, were strategically placed within the line of sight of each sensor.

2.3. Image Pre-Processing and Spectral Wavelength Selection

The spectral data acquired by the hyperspectral scanner underwent electronic and optical calibration. This process ensured that the spectral data were normalized against the maximum reflectance and purged of electronic noise. The instrument software performed the normalization of the spectral data following the equation of ElMasry et al. (2007) [36].

R_{c, λ} = \frac{R_{λ} - D_{λ}}{W_{λ} - D_{λ}}

(1)
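As a minimal sketch of Equation (1), the snippet below applies the correction band by band. It assumes that R, D, and W denote the raw sample signal, the dark reference (lens shuttered), and the white reference, which is consistent with the calibration procedure described in Section 2.1 but is not stated explicitly next to the equation. The function name and the sensor counts are hypothetical.

```python
def calibrate_reflectance(raw, dark, white):
    """Relative reflectance per band: (R - D) / (W - D)."""
    if not (len(raw) == len(dark) == len(white)):
        raise ValueError("all inputs must have one value per wavelength")
    corrected = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        if span <= 0:
            raise ValueError("white reference must exceed dark reference")
        corrected.append((r - d) / span)
    return corrected


# Hypothetical sensor counts for three bands.
raw_counts = [120.0, 560.0, 1010.0]
dark_counts = [20.0, 60.0, 10.0]
white_counts = [1020.0, 1060.0, 2010.0]

relative_reflectance = calibrate_reflectance(raw_counts, dark_counts, white_counts)
```

A value of 0 corresponds to the dark signal and a value of 1 to the white reference, so the three illustrative bands come out at 0.1, 0.5, and 0.5.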

Hyperspectral systems generate a multitude of reflectance data, at many wavelengths. Specifically, in the case of the instrument used and with the operating modes set, about 60,000 spectra with 261 wavelengths each were obtained. Therefore, strategies were put in place to identify the wavelengths that were truly influential, and thus usable for training the mathematical model. Specifically, the large dataset (261 wavelengths) obtained from the hyperspectral imaging system was analyzed using the “Feature Selection Minimum Redundancy Maximum Relevance” (FS-MRMR) algorithm. The method used a progressive addition scheme to populate an initially empty set S with the best wavelengths to be used for the mathematical model. For this purpose, the algorithm used an interdependence function (I) [37], which quantifies the interdependence of a pair of variables, and a mutual information quotient (MIQ) value to rank the features (wavelengths). The MIQ (Equation (2)) combines two indices: V_λ, the relevance of wavelength λ to the response variable y (the actual leaf moisture percentage), and W_λ, the redundancy of λ with respect to the wavelengths z already in S.

MIQ = \frac{V_{λ}}{W_{λ}}

(2)

where

V_{λ} = I (λ, y)

(3)

W_{λ} = \frac{1}{|S|} \sum_{z \in S} I (λ, z)

(4)

A simplified schematic of the algorithm can be seen in Figure 1.

The set S, thus defined, comprises the most influential and usable wavelengths for constructing the prediction model.
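The greedy selection can be sketched as below. This is an illustrative, simplified version and not the implementation used in the study: it assumes I is mutual information estimated from equal-width histograms, takes W_λ as the average interdependence with the wavelengths already in S, and guards a zero redundancy with a small constant. The wavelength names and reflectance values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c * n) / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def mrmr_miq(features, response, n_select, bins=4):
    """Greedy MRMR ranking with the quotient criterion MIQ = V / W."""
    y = discretize(response, bins)
    binned = {name: discretize(vals, bins) for name, vals in features.items()}
    relevance = {name: mutual_information(b, y) for name, b in binned.items()}
    # S starts empty, so the first pick is simply the most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(n_select, len(binned)):
        best_name, best_score = None, -math.inf
        for name, b in binned.items():
            if name in selected:
                continue
            redundancy = sum(mutual_information(b, binned[z])
                             for z in selected) / len(selected)
            score = relevance[name] / max(redundancy, 1e-12)  # zero guard
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical data: "555nm" duplicates the information in "550nm".
features = {
    "550nm": [0.10, 0.10, 0.10, 0.10, 0.20, 0.20, 0.20, 0.20],
    "555nm": [0.11, 0.11, 0.11, 0.11, 0.21, 0.21, 0.21, 0.21],
    "1450nm": [0.40, 0.40, 0.50, 0.50, 0.40, 0.40, 0.50, 0.50],
}
moisture = [0.80, 0.80, 0.82, 0.82, 0.84, 0.84, 0.86, 0.86]
selected = mrmr_miq(features, moisture, n_select=2)
```

In this toy case all three bands are equally relevant to moisture, but the second pick is the band that carries no information shared with the first one, which is the behaviour the quotient criterion is meant to reward.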

Following the selection of these influential variables, it was necessary to eliminate the outliers. An outlier is defined as a data point that deviates from the available data population according to established standards of variation. Mathematically, this concept can be summarized by referencing the mean and standard deviation of a given data population.

μ_{k} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}

(5)

where μ_k is the mean of a population k of data and

σ_{k} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {|x_{i} - μ_{k}|}^{2}}

(6)

where σ_k is the standard deviation of the population of N data.

A data point within a population may be classified as an outlier if it deviates beyond two standard deviations from the mean (i.e., less than μ − 2σ or greater than μ + 2σ). Considering the data population’s structure as a 57,000 × 261 matrix, outlier elimination was strategically focused on the wavelength exhibiting the highest variability: wavelength 213, centered at 1460 nm. Once these outliers were identified, their associated spectra and corresponding response variable values were subsequently removed. This determination and elimination process was meticulously executed through a custom-developed computational algorithm.
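A minimal sketch of this screening step is shown below, assuming spectra are stored as rows of reflectance values with one response value per row. It is not the custom algorithm used in the study; the function name, column index, and data are hypothetical.

```python
import statistics


def remove_outliers(spectra, responses, column, n_sigma=2.0):
    """Drop rows whose value in `column` lies outside mean +/- n_sigma * std."""
    values = [row[column] for row in spectra]
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # N - 1 denominator, as in Equation (6)
    lower, upper = mu - n_sigma * sigma, mu + n_sigma * sigma
    keep = [i for i, v in enumerate(values) if lower <= v <= upper]
    # Remove the whole spectrum together with its response value.
    return [spectra[i] for i in keep], [responses[i] for i in keep]


# Hypothetical two-band spectra; the last row is anomalous in column 1.
spectra = [
    [0.30, 0.50], [0.31, 0.51], [0.29, 0.49], [0.30, 0.50],
    [0.32, 0.52], [0.28, 0.48], [0.30, 0.50], [0.31, 0.95],
]
moisture = [0.84, 0.85, 0.83, 0.84, 0.86, 0.82, 0.84, 0.85]

clean_spectra, clean_moisture = remove_outliers(spectra, moisture, column=1)
```

Screening on a single high-variability band keeps the rule simple, at the cost of not detecting spectra that are anomalous only at other wavelengths.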

The decision to use the FS-MRMR method was dictated by the need to improve on the effectiveness of existing vegetation indices. These indices use simple mathematical functions so that they are universal, working on vegetation in general, and require less computing power. This simplicity, however, comes at the cost of accuracy in measuring the desired parameter. With hyperspectral systems, on the other hand, it is possible to identify the specific wavelengths for the particular crop under consideration and, thanks to methods for reducing hyperspectral variables, to obtain mathematical functions that are ad hoc for that crop. Furthermore, with the computing power available today, hyperspectral systems can be easily managed and return analyses in real time.

2.4. Prediction and Classification Models

Two strategies were chosen for data analysis:

Construction of a mathematical regression model for the purpose of finding a transfer function to determine leaf moisture from the hyperspectral data.
Construction of a mathematical classification model in order to identify a transfer function that can discriminate hyperspectral data based on a threshold value of absolute leaf moisture. To achieve this, the dataset was partitioned into two subsets based on a critical moisture threshold, identified as 83% absolute leaf moisture. This value signifies the point below which the crop enters a state of stress, thereby necessitating irrigation.

For the construction of the regression model, a preliminary analysis was performed on the main existing multi-parametric models. Specifically, the Matlab 2024a toolbox contains the following classes of regression models:

Linear regression models;

Regression trees;

Support vector machines;

Efficiently trained regression models;

Gaussian Process Regression models;

Kernel optimization regression models;

Ensemble of trees;

Neural networks.

All of the above were trained, and only the best-performing one was selected to analyze the data.

The enumerated models necessitate a sequential process of training, validation, and testing. These three distinct steps are paramount for the proper configuration of the mathematical model, endowing it with the crucial ability to generalize from the data. This generalization capability ensures the model can accurately estimate the desired magnitude even when confronted with widely varying input data.

To achieve this, the entire dataset, comprising the reflectance spectra and absolute humidity measurements of the baby leaf samples, is partitioned into three distinct subsets: 70% for training, 20% for validation, and 10% for testing. During the training phase, the chosen model is fitted using both the predictors (i.e., wavelengths) and the response data (i.e., the associated moisture values). This process enables the model to construct optimal fitting functions, according to the selected regression model, with the objective of maximizing the adjusted coefficient of determination (R_{adj}^{2}) and minimizing the root mean square error (RMSE).

R_{adj}^{2} = 1 - \left(\frac{N - 1}{N - p}\right) \cdot \frac{SSE}{SST}

where N is the number of observations, p is the number of regression parameters (for example, p = 2 for simple linear regression), SSE is the sum of squared errors, and SST is the total sum of squares.

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {|p_{i} - y_{i}|}^{2}}

where N is the number of observations, p_i are the ith predicted values from the model, and y_i are the ith experimental values.
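Both metrics can be computed directly from paired predictions and measurements. The sketch below follows the two formulas as written; the function names and the moisture values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predicted, measured)) / n)


def adjusted_r2(predicted, measured, n_params):
    """Adjusted R^2 = 1 - ((N - 1) / (N - p)) * SSE / SST."""
    n = len(measured)
    mean_y = sum(measured) / n
    sse = sum((y - p) ** 2 for p, y in zip(predicted, measured))
    sst = sum((y - mean_y) ** 2 for y in measured)
    return 1 - ((n - 1) / (n - n_params)) * sse / sst


# Hypothetical measured and predicted leaf moisture fractions.
measured = [0.80, 0.82, 0.84, 0.86, 0.88]
predicted = [0.81, 0.82, 0.83, 0.86, 0.88]

error = rmse(predicted, measured)
fit = adjusted_r2(predicted, measured, n_params=2)
```

For these five illustrative points, SSE = 0.0002 and SST = 0.004, so the RMSE is about 0.0063 and the adjusted R² is about 0.933 with p = 2.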

The validation phase, on the other hand, involves verifying the performance of the trained model using a validation scheme. The method employed is k-fold validation, which entails partitioning the validation dataset into k subsets, where each subset contains elements exclusive to itself. The model then performs data training (using the training set) and evaluates the performance of the model with each of the k validation subsets. Finally, the prediction error for each k set is determined. In particular, the k-fold validation method was performed with k = 5, to obtain the highest reliability of the model in terms of the generalization of the response when predicting the moisture data from the hyperspectral data.
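The partitioning step can be sketched as follows, assuming a standard k-fold scheme in which each fold is held out once while the remaining folds are used for fitting. The function names and the sample count are hypothetical; model fitting and scoring are left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (fit_indices, held_out_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# Hypothetical dataset of 23 spectra split with k = 5.
splits = list(k_fold_splits(23, k=5))
fold_sizes = [len(held_out) for _, held_out in splits]
```

The prediction error would then be computed on each held-out fold and averaged over the five folds to estimate how well the model generalizes.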

To proceed with the second strategy (i.e., classification), a statistical classifier using a supervised nonparametric algorithm of the KNN (k-nearest neighbors) type was chosen. The classifier, like the regression model, requires a training, validation, and testing phase. The same wavelengths determined for the regression model were used as inputs. The performance of the classifier is evaluated by constructing the confusion matrix, which compares the actual classes with the classes assigned by the classification model. Classifier performance is quantified by the rate of correctly classified values (TPR, true positive rate) versus the rate of errors (FPR, false positive rate).
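The classification step can be illustrated with a small, self-contained k-nearest-neighbors sketch. It assumes Euclidean distance, majority voting with k = 3, and class labels derived from the 0.83 moisture threshold; none of these settings are confirmed as the configuration used in the study, and the two-band reflectance values are hypothetical.

```python
import math
from collections import Counter

THRESHOLD = 0.83  # critical leaf moisture quoted in the text


def label_from_moisture(moisture):
    """Class label relative to the irrigation threshold."""
    return "below" if moisture < THRESHOLD else "above"


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rates(actual, predicted, positive="below"):
    """True positive rate and false positive rate from paired labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn)}


# Hypothetical reflectance at two selected wavelengths, with measured moisture.
train_x = [(0.30, 0.60), (0.32, 0.58), (0.31, 0.62),
           (0.50, 0.40), (0.52, 0.42), (0.49, 0.38)]
train_moisture = [0.80, 0.81, 0.79, 0.88, 0.90, 0.87]
train_y = [label_from_moisture(m) for m in train_moisture]

test_x = [(0.33, 0.59), (0.51, 0.41), (0.29, 0.61), (0.48, 0.43)]
test_actual = ["below", "above", "below", "above"]
test_predicted = [knn_predict(train_x, train_y, q) for q in test_x]
performance = rates(test_actual, test_predicted)
```

Treating "below threshold" as the positive class means the TPR reports how often plants that need irrigation are correctly flagged.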

In addition, it was useful to visualize the ROC (receiver operating characteristic) curve, which shows the true positive rate (TPR) versus the false positive rate (FPR) for different classification score thresholds of the selected classifier. The operating point of the model shows the false positive rate and true positive rate corresponding to the threshold used by the classifier to classify an observation. The area under the curve (AUC) was also assessed. The AUC is the integral of the ROC curve, that is, of the TPR with respect to the FPR, calculated from FPR = 0 to FPR = 1. This metric summarizes how well the classifier separates the positive class from the negative class across all thresholds, thereby serving as a comprehensive indicator of the classifier’s overall performance. AUC values range from 0 to 1, with higher values signifying superior classifier capabilities.
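As a hedged illustration of how the ROC curve and the AUC are obtained, the sketch below sweeps the score threshold and integrates the TPR over the FPR with the trapezoidal rule. The function names, scores, and labels are illustrative assumptions; the sketch assumes both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points for decreasing score thresholds.

    labels: 1 for the positive class, 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # An observation is predicted positive when its score reaches the threshold
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal integral of TPR with respect to FPR from 0 to 1."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Illustrative classifier scores and true labels (not experimental data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

For these illustrative values the area is 8/9, which matches the fraction of positive/negative pairs in which the positive observation receives the higher score.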

3. Results and Discussions

3.1. Water Content and Spectral Data of Samples

Table 1 presents the average moisture status of the analyzed samples across 17 distinct tests. For each test, the table provides the following mean measurements:

Fresh weight: The average initial weight of the samples, indicating their total mass including water.

Dry weight: The average weight of the samples after complete drying, representing their solid content.

Delta: The average difference between the fresh weight and dry weight, which quantifies the average amount of water lost from the samples.

Std. Dev. (standard deviation): The average standard deviation, representing the variability within the measurements for that specific test.

Weight %: The average moisture content of the samples, expressed as a percentage. This value is calculated as (delta/fresh weight) × 100, providing a normalized measure of water content relative to the fresh mass.
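The relationship between the columns listed above can be sketched as follows. The function name is hypothetical; the example uses the mean fresh and dry weights reported for test 1 in Table 1, and because the table reports averages, recomputing from the averaged columns reproduces the tabulated percentage only approximately.

```python
def moisture_percent(fresh_weight_g, dry_weight_g):
    """Moisture content on a fresh-weight basis, in percent."""
    delta = fresh_weight_g - dry_weight_g  # water lost on drying (g)
    return 100.0 * delta / fresh_weight_g

# Mean fresh and dry weights of test 1 (Table 1)
test1_moisture = moisture_percent(10.0340, 1.2522)
```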

The 17 experimental trials generated 57,000 relative reflectance spectra in the wavelength range of 400 to 1700 nm, with a spectral resolution of 5 nm. Each spectrum is therefore composed of 261 wavelengths.
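The number of wavelengths per spectrum follows directly from the range and the resolution, as the short sketch below shows. The function and variable names are illustrative, and the sketch assumes that both range endpoints are sampled.

```python
def wavelength_grid(start_nm=400, stop_nm=1700, step_nm=5):
    """List of sampled wavelengths in nm, both endpoints included."""
    return list(range(start_nm, stop_nm + 1, step_nm))

bands = wavelength_grid()
n_bands = len(bands)  # (1700 - 400) / 5 + 1
```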

A 3D map of all the spectra obtained is shown in Figure 2, with the wavelength (nm) on the x axis, the relative reflectance (%) on the y axis, and the number of samples from which the individual spectra were derived on the z axis.

The obtained spectra were subjected to an outlier evaluation process. Taking a specific sample wavelength as reference, the 16,695 spectra that exceeded the 2σ threshold at that wavelength were eliminated from the initial dataset. Consequently, the refined dataset, consisting of 40,305 spectra, was retained for subsequent processing.
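A minimal sketch of such a 2σ filter is given below. It assumes that the threshold refers to the deviation from the mean reflectance at the chosen wavelength and that the population standard deviation is used; neither detail is stated in the text, and the function name and the sample spectra are illustrative.

```python
import statistics

def filter_outliers(spectra, band_index, n_sigma=2.0):
    """Keep spectra whose reflectance at one band lies within n_sigma of the mean."""
    values = [spectrum[band_index] for spectrum in spectra]
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [s for s in spectra if abs(s[band_index] - mean) <= n_sigma * sigma]

# Six illustrative two-band spectra; the last one is anomalous at band 0
spectra = [
    [0.50, 0.40], [0.52, 0.41], [0.49, 0.39],
    [0.51, 0.40], [0.50, 0.42], [0.95, 0.40],
]
kept = filter_outliers(spectra, band_index=0)
```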

3.2. Regression Model Results

A neural network, with supervised training, was used for the regression model.

The first computational step was the selection of influential variables using the MRMR algorithm. The algorithm ranked the variables (wavelengths) according to their influence in estimating the response data (absolute humidity %), assigning each a score between 0 and 0.6. Wavelengths with scores greater than 0.5 were retained. Five wavelengths were selected, shown in red in Figure 3: 565 nm, 790 nm, 890 nm, 945 nm, and 1465 nm. The first (565 nm) lies in the visible range, specifically in the green–yellow band, while the other four fall in the infrared range.
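The idea behind minimum redundancy maximum relevance selection can be sketched with a greedy loop. This is not the implementation used in the study: as a simplifying assumption, the absolute Pearson correlation stands in for the information measure, and each candidate is scored as relevance minus mean redundancy with the features already chosen. All names and data are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mrmr_select(features, target, n_select):
    """Greedy selection; features maps a name to its list of values."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < n_select:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative data: band_a_dup nearly duplicates band_a, band_b is independent
features = {
    "band_a": [1, -1, 1, -1, 1, -1, 1, -1],
    "band_a_dup": [1.1, -0.9, 1.1, -0.9, 0.9, -1.1, 0.9, -1.1],
    "band_b": [1, 1, -1, -1, 1, 1, -1, -1],
}
target = [3, -1, 1, -3, 3, -1, 1, -3]  # equals 2 * band_a + band_b
chosen = mrmr_select(features, target, n_select=2)
```

In this toy case the near-duplicate band is skipped in favor of the less relevant but non-redundant one, which is the behavior that distinguishes this kind of selection from ranking by relevance alone.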

Zhang et al. [25] utilized spectral data within the 400–1100 nm range, encompassing water absorption bands around 760 nm and 970 nm, for the simultaneous determination of crucial biochemical parameters such as chlorophyll and nitrogen. This approach effectively optimized the efficiency of plant physiological state assessment. In a separate study, visible near-infrared (VIS-NIR) spectroscopy was employed for the rapid discrimination between organic and non-organic leafy vegetables. Within this context, the 1400–2500 nm wavelength range, primarily governed by plant water content absorption, was leveraged for determining structural characteristics and the overall composition of leafy vegetables [38].

Further spectroscopic analysis has identified specific wavelengths and spectral regions critical for the non-invasive assessment of physiological parameters in leafy plants.

In the visible range, the region around 565 nm (comparable to the ranges 506–524 nm and 566–577 nm) has a significant correlation with water content, suggesting an interconnection with chlorophyll content and indicating the influence of hydration status on plant pigmentation. Moving into the near-infrared (NIR), the 790 nm wavelength (comparable to the 765–784 nm region) is of particular interest because light absorption in this area is predominantly determined by the structural characteristics and intrinsic content of leafy vegetables, showing a strong correlation with water content. Similarly, at 890 nm, within the 780–1400 nm range, light transmission and reflection are primarily modulated by the structural properties of the leaf; although this is not a primary water absorption band, its relevance to the classification of leaf spectra in the 550–910 nm band suggests a contribution to the discrimination of physiological states related to hydration and structural integrity. Continuing in the NIR, the 945 nm wavelength (near the 950–957 nm and 950–970 nm regions) showed superior correlation with water content and is recognized as a reliable indicator of plant water status, with absorption at about 960 nm influenced by the molecular structure of plant contents. Finally, 1465 nm, very close to 1450 nm, corresponds to one of the most pronounced water absorption features, influenced by molecular structural aspects, including O-H bond combinations [23,24,25,38,39].

The neural network was subsequently trained using data from the training set. This training leveraged the absolute leaf moisture data associated with their corresponding reflectance spectra, specifically employing the five selected influential wavelengths. Figure 4 details the training parameters utilized for this neural network.

As shown, the training phase required 82 computational cycles (epochs), in which the operational parameters of the network were set to prepare it for the regression of the validation data. Specifically, the number of intermediate (hidden) layers was determined to be three. These three layers are populated by interconnected nodes, each housing a mathematical function, with connections extending to the nodes of the subsequent layer. The number of nodes, and thus of functions present within the neural network, was determined through an optimization process, which also determined the hyperparameters, including the regularization parameters. The optimization proceeds in successive cycles, in which the regularization parameters are adjusted based on the mean squared error (MSE) obtained in the regression. Figure 5 shows the optimization report of the neural network, where the minimum MSE value, obtained at the 15th iteration, can be noted. The training phase with the optimization procedure resulted in intermediate layers of 300 nodes for the first, 227 for the second, and 272 for the third.
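To give a sense of the size of such a network, the sketch below counts its trainable parameters. It assumes a fully connected feed-forward architecture with one bias per node, five inputs (the selected wavelengths), and a single moisture output; these architectural details are assumptions and are not stated in the text.

```python
def count_parameters(layer_sizes):
    """Number of weights plus biases in a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per node
    return total

# 5 inputs, the three reported intermediate layers, 1 output (assumed)
layers = [5, 300, 227, 272, 1]
n_params = count_parameters(layers)
```

Under these assumptions the network has 132,416 trainable parameters.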

Figure 6 shows the optimization cycle for determining the best regression algorithm used by the neural network. This phase also required many cycles with high computational resource requirements. The results are displayed in Table 2.

The neural network was then validated, using the 5-fold cross-validation method, and tested using data unknown to the neural network. The regression results obtained are displayed in Table 3.

From the analysis in Table 3, it can be seen that the training phase (Train) showed a very close fit of the neural network to the data, with an adjusted R² value of 0.99. The validation value (0.89) is lower than that obtained on the training data but remains high. The network’s good generalization ability is confirmed by the results of the test phase, in which the neural network, through the determined regression functions, achieved an adjusted R² of 0.97.

The application of the mathematical model, based on the determined neural network, resulted in the prediction data of the percent moisture of the analyzed baby-leaf. The correlation with respect to the gravimetrically determined percent of moisture data is displayed in Figure 7.

From the analysis of Figure 7, it can be seen that the regression lines are very close to the identity line, a sign of very small prediction error. It should be kept in mind that the points in the graphs represent the average value of each iso-humidity dataset; each point in the train, validation, and test graphs therefore summarizes part of the input data counted in Table 3.

3.3. Classification Model Results

The statistical classifier used is of the KNN (k-nearest neighbors) type.

The construction phase of the statistical classifier used the experimental dataset (experimentally determined reflectance spectra and percent moisture values) together with the critical moisture threshold below which the baby-leaf crop requires irrigation. This threshold was determined as reported in Section 2.1 and is 0.83 (83% leaf moisture). This value can be taken as an average limit for water deficiency, as an indirect measure of the soil water content gradient expressed as a percentage of field capacity [40].
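The labeling implied by this threshold can be sketched as follows. The class numbering follows the convention used later in this section (class 1 below the threshold, class 2 above it); the function name and the assignment of a value exactly equal to the threshold to class 2 are assumptions.

```python
IRRIGATION_THRESHOLD = 0.83  # critical leaf moisture, as a fraction

def moisture_class(moisture_fraction, threshold=IRRIGATION_THRESHOLD):
    """Class 1: below the threshold (irrigation needed); class 2: at or above it."""
    return 1 if moisture_fraction < threshold else 2

# Mean moisture of test 17 and test 1 from Table 1, as fractions
labels = [moisture_class(0.789177), moisture_class(0.875224)]
```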

The training phase of the statistical classifier required 130 optimization cycles to define the regularization parameters (hyperparameters), arriving at an average classification error of 15 percent in the training phase (Figure 8). The regularization parameters given in Table 4 are the number of neighbors, the type of distance between neighbors, the distance weights, and data standardization. In particular, the Mahalanobis distance was chosen as the distance metric because it is well suited to multivariate data in which several variables are correlated with each other, as is the case with hyperspectral data. Unlike the Euclidean distance, which ignores correlations between variables, the Mahalanobis distance takes into account the covariance of the data, allowing for better identification of outliers and more accurate clustering of data [41].
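A minimal sketch of a k-nearest-neighbors vote with the Mahalanobis distance and squared inverse distance weights is given below. It is illustrative rather than the implementation used in the study: the inverse covariance matrix is assumed to be supplied by the caller, data standardization is omitted, and the function names and training points are hypothetical.

```python
import math

def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T * inv_cov * (x - y)) for an inverse covariance matrix."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    d2 = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(d2)

def knn_predict(query, train_x, train_y, inv_cov, k=10):
    """Weighted vote among the k nearest neighbors (weight = 1 / distance^2)."""
    dists = sorted((mahalanobis(query, x, inv_cov), label)
                   for x, label in zip(train_x, train_y))
    votes = {}
    for d, label in dists[:k]:
        # An exact match gets infinite weight and decides the vote
        weight = 1.0 / (d * d) if d > 0 else float("inf")
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

# Illustrative two-feature training set with class labels 1 and 2
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
train_y = [1, 1, 2, 2]
identity = [[1.0, 0.0], [0.0, 1.0]]  # reduces to the Euclidean distance
predicted = knn_predict([0.2, 0.1], train_x, train_y, identity, k=3)
```

With the identity matrix the distance reduces to the Euclidean one; passing the inverse of the feature covariance instead rescales and decorrelates the axes, which is the property described above.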

The statistical classifier, set with the regularization parameters determined in the optimization phase, was validated with the validation dataset and then tested using the data in the test set. The performance measurement of the statistical classifier was performed by evaluating the confusion matrices for the validation and test phase, as well as the ROC curve for the two moisture classes determined.

Figure 9 and Figure 10 show, respectively, the confusion matrices for the validation and test phases of the KNN classifier. The confusion matrices show the predicted classes on the x axis and the actual classes on the y axis. Each cell displays a percentage value, represented by a color gradient from saturated blue (indicating 100% correct classification) to saturated red (indicating 0% correct classification), with intermediate saturation levels reflecting the corresponding percentages. Adjacent to the main 2 × 2 confusion matrix, a secondary 2 × 2 matrix is displayed on the right, giving the “true positive rate” (TPR) in its left column and the “false negative rate” (FNR) in its right column. These latter values are directly derivable from the primary confusion matrix.

In terms of composition, the confusion matrices (located on the left in Figure 9 and Figure 10) are structured as follows: position 1,1 shows the percentage of correctly classified values belonging to class one (blue box); position 2,2 shows the percentage of correctly classified values belonging to class two (blue box); position 2,1 shows the percentage of class two data incorrectly assigned to class one (red box); and position 1,2 shows the percentage of class one data incorrectly assigned to class two.
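The structure of such a row-normalized confusion matrix can be sketched as follows. The function name and the label lists are illustrative and are not experimental data; the sketch assumes that every class occurs at least once among the actual labels.

```python
def confusion_percentages(actual, predicted, classes=(1, 2)):
    """Row-normalized confusion matrix: table[actual_class][predicted_class] in percent."""
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    table = {}
    for a in classes:
        row_total = sum(counts[a].values())
        table[a] = {p: 100.0 * counts[a][p] / row_total for p in classes}
    return table

# Illustrative labels: 4 observations of class 1 and 6 of class 2
actual = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 2, 2, 2, 1]
table = confusion_percentages(actual, predicted)
tpr_class_1 = table[1][1]          # position 1,1
fnr_class_1 = 100.0 - tpr_class_1  # equals position 1,2 for two classes
```

The diagonal entries are the per-class true positive rates, and the complement of each gives the corresponding false negative rate.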

The analysis of the confusion matrices showed that the statistical classifier can estimate the data in the two moisture classes excellently, with 84.2% and 99.5% for classes 1 and 2, respectively, for the validation phase, and 81.3% and 99.4% for classes 1 and 2, respectively, for the test phase.

An important aspect of the confusion matrices should be noted. In the confusion matrix of the test set, the highest percentage of correct classification was obtained for class 2, that is, for the moisture data above the intervention threshold. In a real-world application, this result can be very useful for determining when to start irrigation, as it can be used as a discriminating value: all spectral data that do not fall into class 2 indicate a probable crop stress condition.

Conversely, had Class 1 achieved the highest percentage of correct classification, the system’s accuracy would not be considered as robust. This is due to the expectation that crops should ideally maintain a state of water well-being (Class 2). Consequently, the monitoring and verification of spectral data are optimally focused on Class 2, which exhibits a very high correct classification percentage of 99.4 percent.

Figure 11 and Figure 12 show the ROC (receiver operating characteristic) curves from the validation and testing phases. ROC curves are a graphical tool used to verify the quality of classifiers. For each class, threshold values in the range [0, 1] are applied to the classifier outputs. For each threshold, two values, TPR (true positive rate) and FPR (false positive rate), are calculated. For a particular class “i”, TPR is the number of outputs whose actual and predicted class is class “i”, divided by the number of outputs whose actual class is class “i”. FPR is the number of outputs whose actual class is not class “i” but whose predicted class is class “i”, divided by the number of outputs whose actual class is not class “i”.

Figure 11 and Figure 12 show the high correct classification ability of the developed KNN model. The blue and red points are shifted toward the top left, which indicates that the model achieved a high true positive rate, i.e., it correctly classified a high percentage of the values actually belonging to each moisture class. In addition, the AUC (area under the curve) is the integral of the curve for each of the two classes and indicates the ability to classify both classes correctly. A value of 1 indicates 100% correct classification for both classes (1 and 2). The AUC value was 0.98 for both the validation and the test data, very close to 1.

4. Conclusions

The research focused on defining an effective method to discriminate the water status of spinach leaves, with a focus on the speed, ease of use, and non-invasiveness of the method. Through the mathematical models used, it is then possible to manage the water resource wisely.

Precision mechanization, including the ability to monitor plant conditions in real time, is critical to ensuring their health and maximizing the quality of agricultural products.

The initial water content of spinach appears to be crucial for ensuring its nutritional and commercial quality; however, its delicate nature and susceptibility to water loss make storage a challenge. Conventional methods for assessing water content are complex and costly, so rapid, non-invasive approaches are needed, and hyperspectral imaging emerges as a solution.

Through the use of a neural network trained with absolute moisture data and reflectance spectra, it was possible to identify five significant wavelengths for estimating baby-leaf moisture. The robustness of the model was confirmed through several rounds of training and validation, showing a high degree of generalization and excellent correlation between predicted and measured data. The practical application of this model showed high accuracy in predicting leaf moisture, as evidenced by the regression lines, which match the identity line almost perfectly in the graphs. These results indicate significant progress in leaf moisture analysis and prediction using advanced spectral analysis and artificial intelligence methods, with significant implications for agriculture and biological research. A further achievement was the training, validation, and testing of a statistical classifier based on the K-NN model, using the same five wavelengths identified for the regression model, which yielded high percentages of correct classification for the data in the two moisture classes. Spectroscopic analysis revealed that specific wavelengths in the visible and near-infrared (such as 565, 790, 890, 945, and 1465 nm) are closely related to the physiological and structural parameters of leafy plants. These bands show direct correlations with water content, chlorophyll, and structural characteristics, offering an effective means of monitoring plant health and freshness. The developed system, by identifying a threshold of 83% in foliar water content through advanced spectroscopic analysis, proves to be a valuable tool for the early identification of the critical time for irrigation intervention, with possible applications in precision irrigation.

In the future, the application of these methods could extend to the testing of innovative agronomic techniques aimed at further improving the nutritional quality of produce through judicious management of water stress. This approach could be an important step toward more sustainable agriculture and the production of high-quality food.

Author Contributions

Conceptualization, R.R.; methodology, R.R. and A.E.B.; software, R.R.; validation, R.R.; formal analysis, R.R., A.E.B. and C.P.; writing: review and editing, R.R., C.P. and A.E.B.; funding acquisition, R.R.; supervision, R.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by Italian Ministry of University and Research (MUR), project “Conservabilità, qualità e sicurezza dei prodotti ortofrutticoli ad alto contenuto di servizio—ARS01_00640—POFACS”, D.D. 1211/2020 and 1104/2021.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cui, S.; Adamowski, J.F.; Albano, R.; Wu, M.; Cao, X. Optimal resource reallocation can achieve water conservation, emissions reduction, and improve irrigated agricultural systems. Agric. Syst. 2024, 221, 104106.
2. Sun, J.; Li, Y.P.; Suo, C.; Liu, J. Development of an uncertain water-food-energy nexus model for pursuing sustainable agricultural and electric productions. Agric. Water Manag. 2020, 241, 106384.
3. Zheng, Z.; Henneberry, S.R. Estimating the impacts of rising food prices on nutrient intake in urban China. China Econ. Rev. 2012, 23, 1090–1103.
4. Ceccon, P.; Fagnano, M.; Grignani, C.; Monti, M.; Orlandini, S. Agronomia, 1st ed.; EdiSES: Napoli, Italy, 2017.
5. Zhong, H.; Zhang, S.; Zhang, X.; Yu, Y.; Li, D.; Wang, S.; Xiao, J.; Tian, P. Water-land-energy efficiency and nexus within global agricultural trade during 1995–2019. Sci. Total Environ. 2024, 951, 175539.
6. Hatfield, J.L.; Dold, C. Water-Use Efficiency: Advances and Challenges in a Changing Climate. Front. Plant Sci. 2019, 10, 103.
7. Leone, A.; Perone, C.; Berardi, A.; Tamborrino, A. Energy analysis and numerical evaluation of the decanter centrifuge for wastewater management to allow a sustainable energy planning of the process. Energy Convers. Manag. X 2024, 22, 100596.
8. Hsiao, T.C.; Steduto, P.; Fereres, E. Water Productivity: Science and Practice. Irrig. Sci. 2007, 25, 209–231.
9. Gao, J.; Liu, N.; Wang, X.; Niu, Z.; Liao, Q.; Ding, R.; Du, T.; Kang, S.; Tong, L. Maintaining grain number by reducing grain abortion is the key to improve water use efficiency of maize under deficit irrigation and salt stress. Agric. Water Manag. 2024, 294, 108727.
10. Preite, L.; Vignali, G. Artificial intelligence to optimize water consumption in agriculture: A predictive algorithm-based irrigation management system. Comput. Electron. Agric. 2024, 223, 109126.
11. Liu, Y.; Razman, M.R.; Syed Zakaria, S.Z.; Lee, K.E.; Khan, S.U.; Albanyan, A. Personalized context-aware systems for sustainable agriculture development using ubiquitous devices and adaptive learning. Comput. Hum. Behav. 2024, 160, 108375.
12. Hernández-López, J.A.; Andrade, H.J.; Barrios, M. Agricultural drought assessment in dry zones of Tolima, Colombia, using an approach based on water balance and vegetation water stress. Sci. Total Environ. 2024, 921, 171144.
13. Voutsinos-Frantzis, O.; Savvas, D.; Antoniadou, N.; Karavidas, I.; Ntanasi, T.; Sabatino, L.; Ntatsi, G. Innovative Cultivation Practices for Reducing Nitrate Content in Baby Leaf Lettuce Grown in a Vertical Farm. Horticulturae 2024, 10, 375.
14. Colelli, G.; Elia, A. I prodotti ortofrutticoli di IV gamma: Aspetti fisiologici e tecnologici. Italus Hortus 2009, 16, 55–78.
15. Zhu, S.; Feng, L.; Zhang, C.; Bao, Y.; He, Y. Identifying Freshness of Spinach Leaves Stored at Different Temperatures Using Hyperspectral Imaging. Foods 2019, 8, 356.
16. Jin, X.; Shi, C.; Yu, C.Y.; Yamada, T.; Sacks, E.J. Determination of Leaf Water Content by Visible and Near-Infrared Spectrometry and Multivariate Calibration in Miscanthus. Front. Plant Sci. 2017, 8, 721.
17. Monaghan, J. Moderate water stress prevents the postharvest decline of ascorbic acid in spinach (Spinacia oleracea L.) but not in spinach beet (Beta vulgaris L.). J. Sci. Food Agric. 2015, 96, 2976–2980.
18. Colombo, R.; Meroni, M.; Marchesi, A.; Busetto, L.; Rossini, M.; Giardino, C.; Panigada, C. Estimation of leaf and canopy water content in poplar plantations by means of hyperspectral indices and inverse modelling. Remote Sens. Environ. 2008, 112, 1820–1834.
19. Boshkovski, B.; Doupis, G.; Zapolska, A.; Kalaitzidis, C.; Koubouris, G. Hyperspectral Imagery Detects Water Deficit and Salinity Effects on Photosynthesis and Antioxidant Enzyme Activity of Three Greek Olive Varieties. Sustainability 2022, 14, 1432.
20. Calviño-Cancela, M.; Martín-Herrero, J. Spectral Discrimination of Vegetation Classes in Ice-Free Areas of Antarctica. Remote Sens. 2016, 8, 856.
21. Cheng, T.; Rivard, B.; Sánchez-Azofeifa, A. Spectroscopic determination of leaf water content using continuous wavelet analysis. Remote Sens. Environ. 2011, 115, 659–670.
22. Habibullah, M.; Mohebian, M.R.; Soolanayakanahally, R.; Wahid, K.A.; Dinh, A. A Cost-Effective and Portable Optical Sensor System to Estimate Leaf Nitrogen and Water Contents in Crops. Sensors 2020, 20, 1449.
23. Zhao, J.; Li, H.; Chen, C.; Pang, Y.; Zhu, X. Detection of Water Content in Lettuce Canopies Based on Hyperspectral Imaging Technology under Outdoor Conditions. Agriculture 2022, 12, 1796.
24. Rodríguez-Perez, J.R.; Ordonez, C.; Gonzalez-Fernandez, A.B.; Sanz-Ablanedo, E.; Valenciano, J.B.; Marcelo, V. Leaf water content estimation by functional linear regression of field spectroscopy data. Biosyst. Eng. 2017, 162, 105–116.
25. Zhang, Q.; Li, Q.; Zhang, G. Rapid Determination of Leaf Water Content Using VIS/NIR Spectroscopy Analysis with Wavelength Selection. Spectrosc. Int. J. 2012, 27, 93–105.
26. Romaniello, R.; Barrasso, A.E.; Perone, C.; Tamborrino, A.; Berardi, A.; Leone, A. Optimisation of an industrial optical sorter of legumes for gluten-free production using hyperspectral imaging techniques. Foods 2024, 13, 404.
27. Romaniello, R.; Perone, C.; Tamborrino, A.; Berardi, A.; Leone, A.; Di Taranto, A.; Iammarino, M. Additives individuation in raw ham using image analysis. Chem. Eng. Trans. 2021, 87, 217–222.
28. Palumbo, M.; Cozzolino, R.; Laurino, C.; Malorni, L.; Picariello, G.; Siano, F.; Stocchero, M.; Cefola, M.; Corvino, A.; Romaniello, R.; et al. Rapid and non-destructive techniques for the discrimination of ripening stages in Candonga Strawberries. Foods 2022, 11, 1534.
29. Fazayeli, H.; Amodio, M.L.; Fatchurrahman, D.; Serio, F.; Montesano, F.F.; Burud, I.; Peruzzi, A.; Colelli, G. Potential application of hyperspectral imaging and FT-NIR spectroscopy for discrimination of soilless tomato according to growing techniques, water use efficiency and fertilizer productivity. Sci. Hortic. 2024, 328, 112928.
30. Liu, Y.; Pu, H.; Sun, D.W. Hyperspectral imaging technique for evaluating food quality and safety during various processes: A review of recent applications. Trends Food Sci. Technol. 2017, 69, 25–35.
31. Nicolaï, B.M.; Beullens, K.; Bobelyn, E.; Peirs, A.; Saeys, W.; Theron, K.I.; Lammertyn, J. Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: A review. Postharvest Biol. Technol. 2007, 46, 99–118.
32. Roberts, C.A.; Workman, J.; Workman, J.; Reeves, J.B. Near-Infrared Spectroscopy in Agriculture; American Society of Agronomy: Madison, WI, USA; Crop Science Society of America: Madison, WI, USA; Soil Science Society of America: Madison, WI, USA, 2004.
33. Romaniello, R.; Barrasso, A.E.; Berardi, A.; Perone, C.; Tamborrino, A.; Catalano, F.; Baiano, A. Hyperspectral imaging system to online monitoring the soy flour content in a functional pasta. J. Agric. Eng. 2023, 54, 1535.
34. Caporaso, N.; Paduano, A.; Nicoletti, G.; Sacchi, R. Capsaicinoids, antioxidant activity, and volatile compounds in olive oil flavored with dried chili pepper (Capsicum annuum). Eur. J. Lipid Sci. Technol. 2013, 115, 1434–1442.
35. Sánchez, M.-T.; Entrenas, J.-A.; Torres, I.; Vega, M.; Pérez-Marín, D. Monitoring texture and other quality parameters in spinach plants using NIR spectroscopy. Comput. Electron. Agric. 2018, 155, 446–452.
36. ElMasry, G.; Wang, N.; ElSayed, A.; Ngadi, M. Hyperspectral imaging for nondestructive determination of some quality attributes for strawberry. J. Food Eng. 2007, 81, 98–107.
37. Darbellay, G.A.; Vajda, I. Estimation of the information by an adaptive partition of the observation space. IEEE Trans. Inf. Theory. 1999, 45, 1315–1321.
38. Wu, Y.; Wu, B.; Ma, Y.; Wang, M.; Feng, Q.; He, Z. Rapid Discrimination of Organic and Non-Organic Leafy Vegetables (Water Spinach, Amaranth, Lettuce, and Pakchoi) Using VIS-NIR Spectroscopy, Selective Wavelengths, and Linear Discriminant Analysis. Appl. Sci. 2023, 13, 11830.
39. He, M.; Li, C.; Cai, Z.; Qi, H.; Zhou, L.; Zhang, C. Leafy vegetable freshness identification using hyperspectral imaging with deep learning approaches. Infrared Phys. Technol. 2024, 138, 105216.
40. Sun, Y.; Wang, J.; Wang, Q.; Wang, C. Responses of the Growth Characteristics of Spinach to Different Moisture Contents in Soil under Irrigation with Magnetoelectric Water. Agronomy 2023, 13, 657.
41. Ghosh, A.; Ghosh, A.K.; Ray, R.S.; Sarkar, S. Classification using global and local Mahalanobis distances. J. Multivar. Anal. 2025, 207, 105417.

Figure 1. Algorithm operation flow diagram.

Figure 2. Absolute reflectance spectra of the analyzed samples.

Figure 3. Score of the 261 wavelengths according to the MRMR algorithm.

Figure 4. Stages of implementation of the neural network. Values obtained at epoch 82: gradient = 1.853; mu = 1 × 10⁻⁵; validation checks = 6.

Figure 5. Neural network optimization cycle.

Figure 6. Optimization cycles for determining the best regression algorithm. The green circle indicates the best regression algorithm point.

Figure 7. Regression lines of observed versus predicted values for the train (blue line), validation (green line), and test (red line) phases. Regression on total data (black line).

Figure 8. Optimization cycles of the KNN statistical classifier in the training phase.

Figure 9. Confusion matrix of the KNN classifier results for validation data.

Figure 10. Confusion matrix of KNN classifier results for test data.

Figure 11. ROC curves related to classification results using validation data.

Figure 12. ROC curves related to classification results using test data.

Table 1. Moisture status of analyzed samples.

	Fresh Weight (g)	Dry Weight (g)	Delta (g)	Std. Dev. (%)	Weight (%)
test 1	10.0340	1.2522	8.7831	6.2098	87.5224
test 2	19.1200	2.2177	16.9010	11.9517	88.4014
test 3	17.5460	2.0228	15.5234	10.9769	88.4719
test 4	11.2684	0.7718	10.4965	7.4222	93.1507
test 5	14.3201	1.4326	12.8874	9.1128	89.9957
test 6	7.5566	0.5473	7.0070	4.9563	92.7573
test 7	6.6514	0.4734	6.1771	4.3685	92.8827
test 8	10.1917	1.0634	9.1283	6.4547	89.5660
test 9	9.5703	0.9006	8.6697	6.1304	90.5900
test 10	11.5250	1.0970	10.4280	7.3737	90.4819
test 11	7.8208	0.4973	7.3225	5.1785	93.6413
test 12	2.0261	0.2463	1.7798	1.2585	87.8422
test 13	6.0364	0.6914	5.3451	3.7795	88.5469
test 14	6.6174	0.7203	5.8971	4.1699	89.1148
test 15	10.0203	0.9953	9.0247	6.3817	90.0675
test 16	5.3568	0.6942	4.6623	3.2968	87.0394
test 17	6.2345	1.3144	4.9202	3.4791	78.9177

Table 2. Optimized parameters of the statistical classifier.

Accuracy (Validation)	Total Cost (Validation)	Accuracy (Test)	Prediction Speed	N. of Neighbors	Distance Metric	Distance Weight
98.26%	631	97.88%	99,791 (obj/sec)	10	Minkowski	Equal

Table 3. Regression model performance data.

Data Sets	Input Data	MSE	$R_{adj}^{2}$
Train	24,183	0.27	0.99
Validation	8061	3.54	0.89
Test	4030	0.88	0.97

Table 4. Hyperparameters of the KNN statistical classifier.

N. of Neighbors	Distance Metric	Distance Weight	Standardize Data
10	Mahalanobis	Squared inverse	true

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
