Article

Nondestructively Determining Soluble Solids Content of Blueberries Using Reflection Hyperspectral Imaging Technique

1 Institute of Facility Agriculture of Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
2 College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
3 Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
4 College of Electronic Information Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China
5 Guangzhou Kaibang Information Technology Co., Ltd., Guangzhou 510100, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(10), 2296; https://doi.org/10.3390/agronomy14102296 (registering DOI)
Submission received: 15 September 2024 / Revised: 28 September 2024 / Accepted: 4 October 2024 / Published: 6 October 2024

Abstract

Effectively detecting the quality of blueberries is crucial for ensuring that high-quality products are supplied to the fresh market. This study developed a nondestructive method for determining the soluble solids content (SSC) of blueberry fruit using a near-infrared hyperspectral imaging technique. Reflection hyperspectral images in the 900–1700 nm waveband range were collected from 480 fresh blueberry samples. An image analysis pipeline was developed to extract the spectrums of blueberries from the hyperspectral images. A regression model for quantifying SSC values was established based on the full range of wavebands, achieving the highest $R_P^2$ of 0.8655 and the lowest RMSEP of 0.4431 °Brix. Furthermore, three variable selection methods, namely the Successive Projections Algorithm (SPA), interval PLS (iPLS), and Genetic Algorithm (GA), were utilized to identify the feature wavebands for modeling. The models calibrated from the feature wavebands generated RMSEP values of 0.4643 °Brix, 0.4791 °Brix, and 0.4764 °Brix, and $R_P^2$ values of 0.8507, 0.8397, and 0.8420 for SPA, iPLS, and GA, respectively. In addition, a pseudo-color distribution diagram of the SSC values within blueberries was generated based on the established models. This study demonstrated a novel approach for blueberry quality detection and inspection by jointly using hyperspectral imaging and machine learning methodologies. It can serve as a valuable reference for the development of grading equipment systems and portable testing devices for fruit quality assurance.

1. Introduction

Blueberry (Vaccinium sect. Cyanococcus) is a widespread group of perennial shrubs belonging to the genus Vaccinium in the Ericaceae family. Many cultivated species are planted worldwide for their refreshing taste, high nutritional value, and health benefits [1,2]. Due to the growing emphasis on healthcare, the demand for blueberry fruits has increased significantly in recent years [3]. According to the latest report from the International Blueberry Organization (IBO) [4], the planted area has grown by over 15 thousand hectares annually in the past three years to meet the market demand worldwide, with total production reaching 1.78 billion tons in 2023. More than two-thirds of this production is for fresh food consumption.
In this case, taste quality becomes a crucial factor when evaluating the quality of blueberries. In particular, the soluble solids content (SSC) is one of the most important indicators of taste quality and a major indicator of the sweetness of blueberry fruits [5]. The SSC of blueberries is a group of dissolvable substances composed of soluble sugar and other soluble substances [6]. In horticulture, it is usually used to calculate the ratio of SSC to titratable acid to determine the maturity of the fruits. This method can be more objective and direct than using either the days after flowering or the appearance color to judge the maturity and flavor of the fruit. However, the conventional method for measuring the SSC is to juice the blueberry fruits and measure the juice with a refractometric saccharometer. Such methods are destructive and laborious, and they cannot meet the detection requirements for a large number of samples. Hence, there is an urgent need to develop nondestructive and rapid methods for determining the SSC of blueberry fruits, to meet the demand for quality detection on the fruit grading and packaging assembly line.
In view of the importance of the SSC to fruit quality, many scholars have studied nondestructive methods for detecting the SSC of blueberries and other berry fruits [7]. Because the vibrational absorption of the functional groups of the SSC falls in the near-infrared waveband region, optical techniques including spectroscopy and spectral imaging have been widely studied for nondestructively detecting the SSC of fruits [8]. Additionally, spectral detection technologies have technical advantages such as their efficiency and accuracy and their non-contact and environment-friendly nature, which suit the needs of actual production scenarios and bring economic benefits. Among these applications, the detection of SSC is always a point of great concern [9,10,11,12,13,14,15]. Leiva-Valenzuela et al. [9] demonstrated that it is feasible to detect both firmness and SSC in blueberries using hyperspectral reflectance images in the range of 500–1000 nm. They reported that the established model predicted firmness (a correlation coefficient of 0.87) better than SSC values (a correlation coefficient of 0.79) for the testing samples. Furthermore, they also reported that the model from the reflectance hyperspectral images gave better results than the transmittance hyperspectral images when predicting the SSC and firmness index [10]. Hu et al. [11] compared the performance of models based on reflectance (410–1113 nm) and transmittance (690–1050 nm) hyperspectral image data for detecting the quality of blueberries. They reported that the reflectance models derived better results than the corresponding transmittance models for a single cultivar of blueberry samples, and the optimal model achieved a correlation coefficient of 0.85 in the prediction set. Qiao et al. [12] demonstrated the feasibility of selecting feature wavebands for nondestructively predicting the SSC of blueberries while modeling with hyperspectral image (450–950 nm) data. They reported that a high correlation coefficient of 0.894 was obtained using only six waveband variables for testing samples in the prediction set. More recently, Zheng et al. [13] proposed that self-adaptive models could improve the prediction accuracy for blueberry SSC when modeling based on reflection near-infrared spectroscopy (NIRS, 1000–2500 nm) data, compared to the individual variation model and the hybrid-variation model. In addition, Bai et al. [14] used a combined model calibration strategy to enhance the model robustness for blueberry SSC estimation using reflectance NIRS (1000–2500 nm). Their experimental results showed that combining global modeling and calibration transfer methods could weaken the influence of external conditions such as the cultivar and season. Other valuable studies on blueberries have developed nondestructive detection methods for the maturity [16,17], freshness [18], firmness [19], mechanical damage [20], sample decay [21], and other physical and chemical properties of the fruits.
As far as we know, few articles in the literature have reported the use of hyperspectral imaging techniques to visualize the distribution of soluble solids content within blueberry fruits. In this study, we attempted to establish regression models for nondestructively determining the SSC values of blueberries based on a reflection hyperspectral imaging technique. This research goal was mainly achieved through the following six steps: (1) a platform system for capturing hyperspectral images of blueberry fruits was built; (2) algorithms for extracting spectrums from hyperspectral images of the blueberries were developed; (3) prediction models for the SSC values were calibrated using the extracted full-range spectral data; (4) models for predicting the SSC values were calibrated using the selected feature wavebands; (5) an independent validation set was used to verify the performance of the established models; (6) a pseudo-color diagram was generated to retrieve the distribution of SSC within the blueberry fruits based on the verified models.

2. Materials and Methods

2.1. Blueberry Sample Collection

The blueberry samples in this study were harvested from Conghua District, Guangzhou City, Guangdong Province, China, in April 2024. In total, 480 blueberry fruits with dark appearance but without exterior defects were selected as experimental materials. All sample materials were transported to the laboratory on the day of harvesting and then stored at 26 ± 1 °C for three hours before the data collection.

2.2. Hyperspectral Image Acquisition

Figure 1 shows the schematic diagram of the hyperspectral imaging system used in this study. Reflection hyperspectral images of blueberry samples were collected using a prototype near-infrared hyperspectral imaging platform system. The core components of the system included a spectral camera, a light source, a camera bracket, a computer, and the control software. The spectral camera (GaiaField N17E, Dualix Company, Wuxi, China) captured hyperspectral image data with a built-in push-broom unit. In this way, each hyperspectral cube was recorded including 512 wavebands in a range of 900–1700 nm, and the image resolution was 640 × 666 pixels. The light source (HSIA-LST200, Dualix Company, Wuxi, China) included four halogen bulbs and each bulb had 60 W of rated power. The camera bracket was used to mount the spectral camera and to change the field of view by adjusting the height. The control software installed in the computer for collecting hyperspectral images was SpecVIEW (Version 2.9.3.26, Dualix Company, Wuxi, China), which was used for parameter setting and data acquisition.
At the beginning of the acquisition process, the spectral camera was mounted at a height of 58.0 cm vertically downward to cover a physical space of 20 × 20 cm below. To minimize the external light interference, the experiment occurred in a dark room that only kept the indoor halogen light source on. The exposure time of the camera was set to 1.5 ms to balance the hyperspectral image of a reference whiteboard so it was neither over-exposed nor over-dark. After the platform system had been warmed up for 0.5 h, the dark frame for the built-in shutter and the white frame for the reference board were recorded for spectral correction. Afterward, the hyperspectral images were captured for the stem side and calyx side of blueberry fruit. To improve the efficiency of acquisition, groups of 30 blueberry fruits were arranged in 6 rows and 5 columns placed on a Teflon board (20 × 20 cm) as a sample holder for capturing hyperspectral images. In this way, each hyperspectral cube with a size of 640 × 666 × 512 contained data of 30 samples. Finally, 16 hyperspectral cubes for the stem side and another 16 hyperspectral cubes for the calyx side of blueberry fruits were recorded. In other words, 32 hyperspectral cubes from 480 blueberry samples were collected as a total data set for further analyses in this study.

2.3. Soluble Solids Content Measurement

After capturing hyperspectral images of all blueberry samples, the SSC value of each blueberry fruit was measured manually using a digital refractometer (PAL-BX/ACID7; ATAGO Co., Ltd., Tokyo, Japan) with a range of 0–60 °Brix and a precision of ±0.1 °Brix. For measuring each blueberry sample, the blueberry fruit was squeezed into juice and dropped into the detection window of the refractometer for recording readings. Each sample was measured three times by reloading the sample juice, and the average of these three values was used as the sample SSC value. Between each measurement of a sample, the refractometer was calibrated with distilled water.

2.4. Spectral Data Extraction and Preprocessing

After the acquisition step was completed, all the original reflection hyperspectral images were first subjected to dark and white correction processing to remove the dark current noise introduced in the camera. Equation (1) states the black and white correction as follows:
$$R_{corrected} = \frac{R_{original} - R_{dark}}{R_{white} - R_{dark}} \tag{1}$$
where $R_{original}$ is the original data, $R_{dark}$ is dark background data, $R_{white}$ is the whiteboard data, and $R_{corrected}$ is the corrected sample data. In this way, the value stored in a data point was converted to the relative reflectance value. By manually defining the region of interest (ROI) where the blueberries were located in hyperspectral images, the average reflectance spectrum from the ROI of each individual blueberry was then automatically extracted by MATLAB function codes. According to the Beer–Lambert law, there is a linear relationship between the relative concentration of the sample components and the spectral absorbance. Hence, all reflectance spectrums of samples were converted to their absorbance spectrums using the method shown in Equation (2):
$$A = \log_{10}\left(\frac{1}{R_{corrected}}\right) \tag{2}$$
where $A$ is the relative absorbance spectrum of a sample. All the absorbance spectrums of samples were used for analyses throughout this study. Notably, since averaging the spectrum over the ROI made the fluctuation noise caused by the detector sensor insignificant, this study did not smooth the spectral data before modeling as most other studies did. Instead, only the autoscaling method was used for preprocessing in this study, which normalized each predictor variable and the response variable to a mean of 0 and a standard deviation of 1 for modeling.
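The correction, absorbance conversion, and autoscaling steps described above can be sketched in a few lines. This is a minimal illustration rather than the MATLAB code used in the study: the function names and the digital numbers are made up, and the use of the sample standard deviation in autoscaling is an assumption.

```python
import math
from statistics import mean, stdev

def correct_reflectance(raw, dark, white):
    """Equation (1): relative reflectance for each waveband."""
    return [(r - d) / (w - d) for r, d, w in zip(raw, dark, white)]

def to_absorbance(reflectance):
    """Equation (2): A = log10(1 / R) for each waveband."""
    return [math.log10(1.0 / r) for r in reflectance]

def autoscale(columns):
    """Scale each variable (one list per waveband) to mean 0 and SD 1.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    scaled = []
    for col in columns:
        m, s = mean(col), stdev(col)
        scaled.append([(v - m) / s for v in col])
    return scaled

# Made-up digital numbers for three wavebands of one ROI-averaged spectrum.
raw = [600.0, 1100.0, 2100.0]
dark = [100.0, 100.0, 100.0]
white = [2100.0, 2100.0, 4100.0]

reflectance = correct_reflectance(raw, dark, white)
absorbance = to_absorbance(reflectance)
```

In practice the same functions would be applied to the ROI-averaged spectrum of every fruit before the autoscaled variables are passed to the modeling step.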

2.5. Partial Least Squares Regression Modeling Algorithm

The partial least squares regression (PLSR) algorithm was adopted as the quantitative modeling method for predicting the SSC values of blueberry fruits. PLSR is a widely used linear modeling method in chemometric analyses because the Beer–Lambert law describes a linear relationship between the absorbance at NIR wavebands and the relative concentration of components. Additionally, the PLSR method can resolve the instability caused by multivariate collinearity between NIR waveband variables, which is a common phenomenon in spectral analysis. It also calculates regression coefficients rapidly and steadily even in underdetermined situations, where the number of samples is small relative to the number of hyperspectral variables.
In theory, the PLSR method calculates a regression model using a given number of components, namely latent variables, to predict one or more response variables from a set of predictor variables. In general, it needs to seek the optimal number of latent variables that both capture variance and achieve correlation between the predictor variables and response variables, such as between the waveband variables and SSC values in this study. A PLSR model attempts to maximize the covariance of predictor variables and response variables during the modeling process. The number of latent variables can be determined by using the cross-validation method, after which the regression coefficients are fitted. The established PLSR model can then be used for predicting unknown samples by arranging the predictor variables collected from an unknown sample into a one-dimensional array as input to calculate the estimated response value.
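To make the role of latent variables concrete, the following is a minimal sketch of single-response PLS fitted with the NIPALS procedure. It is an illustration under stated assumptions, not the implementation used in the study: the function names and toy data are hypothetical, and only mean-centering is applied here.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS on mean-centered data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: the direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores, X loadings, and the regression coefficient of y on the scores.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the response for one sample given as a one-dimensional list."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat

# Toy data in which y = 2 * x1 + 3 * x2 exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [8.0, 7.0, 21.0, 17.0]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [10.0, 10.0])
```

With as many latent variables as independent predictors, the model reproduces the ordinary least squares fit; with fewer, it keeps only the directions most strongly covarying with the response, which is what cross-validation is used to tune.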

2.6. Feature Wavebands Selection Algorithms

To explore a more simplified and optimized model, three variable selection algorithms, namely Successive Projections Algorithm (SPA), interval PLS (iPLS), and Genetic Algorithm (GA), were utilized to identify the feature wavebands for determining the SSC of blueberries.
The SPA method focuses on minimizing the variable collinearity in a vector space [22]. Its purpose is to select wavelengths whose information content is minimally redundant. It is a forward selection method that starts with one waveband and, at each iteration, incorporates the remaining waveband with the maximum projection norm, until a specified number of wavebands is reached. Hence, the initial waveband is a crucial factor for SPA that impacts the selection results. For this reason, each of the 395 wavebands was used in turn as the initial waveband to find the best combination of wavebands by running SPA coupled with the cross-validation method. The maximal number of wavebands was set at 100 to limit the total number of waveband variables. To speed up the selection process, the target number of wavebands was increased from 5 to 100 with a step of 5. Finally, the wavebands in the combination that produced the best result during cross-validation were marked as the feature wavebands identified by the SPA method.
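The projection step at the heart of SPA can be sketched as follows. This is a simplified illustration for a single starting waveband; the function name `spa_select` and the toy matrix are hypothetical, and the outer loops over starting wavebands and target counts with cross-validation are omitted.

```python
def spa_select(columns, start, n_select):
    """Select n_select column indices by successive orthogonal projections.

    columns: one list of values per waveband.
    Assumption: n_select does not exceed the rank of the data.
    """
    cols = [c[:] for c in columns]
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(cols):
            if j in selected:
                continue
            # Project the column onto the subspace orthogonal to the last pick.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(col, ref)]
            norm = sum(v * v for v in cols[j])
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected

# Toy data: column 1 is almost collinear with column 0.
toy_columns = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
picked = spa_select(toy_columns, start=0, n_select=3)
```

Starting from column 0, the nearly collinear column 1 is passed over in favor of columns that carry new information, which is the redundancy-minimizing behavior described above.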
Similar to the SPA algorithm, the iPLS also selects variables stepwise. Unlike SPA, the iPLS can be operated in both a forward selection mode and a reverse removal mode [23]. In the forward mode, the iPLS starts by creating individual PLS models, each using a single subset of variables; each subset can contain one or more variables. In each selection cycle, it adds the one remaining subset for which the model produces the lowest RMSECV. In contrast to the SPA method, the iPLS method automatically determines the initial subset based on the model results, and its selection process takes into account the correlation between the independent and dependent variables. The iPLS algorithm finishes when it reaches the target number of variables. By doing a sequential, exhaustive search, the optimal combination of subsets, for which the model performed best during the cross-validation process, was determined. In this study, each subset contained a single waveband and hence 395 subsets were generated. In order to obtain a result comparable to the SPA method, the target number of wavebands for iPLS was also set between 5 and 100 with a step of 5, and the wavebands used by the best model among all models were marked as the feature wavebands identified by the iPLS method.
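The forward mode can be sketched as a greedy loop around an error function. In this illustration, `cv_error` stands in for the RMSECV of a PLS model built on a candidate set of subsets; it is a hypothetical callable supplied by the user, and the toy error function below is invented purely to make the example runnable.

```python
def forward_select(n_intervals, target, cv_error):
    """Greedy forward selection: add the subset that minimizes cv_error."""
    selected, history = [], []
    remaining = set(range(n_intervals))
    while len(selected) < target and remaining:
        best = min(sorted(remaining), key=lambda j: cv_error(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), cv_error(selected)))
    # Return the combination with the lowest error seen along the path.
    return min(history, key=lambda item: item[1])

# Toy stand-in for RMSECV: subsets 2 and 5 are informative, and every
# extra subset adds a small penalty.
informative = {2, 5}

def toy_cv_error(subset):
    hits = len(informative & set(subset))
    return 1.0 - 0.4 * hits + 0.01 * len(subset)

best_subset, best_error = forward_select(8, target=3, cv_error=toy_cv_error)
```

Because the first subset is chosen by the same error criterion as every later one, no starting waveband has to be specified, unlike in SPA.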
Unlike the SPA and iPLS methods, the GA method does not rely on traversing a whole batch of waveband combinations to search for the optimal predictors. Instead, it looks for the optimal combination of variables from randomly generated combinations that represent the population [24]. The variables in GA are equivalent to the concept of genes; the feature variables are retained and passed to the next iteration, which mirrors the principle of species heredity, and the recombination between combinations of variables is like hybridization in nature. In this study, the pre-defined number of combinations was 50, and a total of 100 iterations were run in the GA method. During each iteration, the worst 20% of combinations were dropped and randomly regenerated to replenish the population of combinations. The mutation rate was set to 0.01 and double cross-over was set as the breeding rule for updating waveband variables in combinations. Finally, the variables included in the combination that performed best in the last iteration were marked as the feature wavebands selected by the GA method.
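A compact sketch of such a genetic search is given below, using the population size, iteration count, drop fraction, and mutation rate stated above as defaults. Several details are assumptions made for illustration and are not taken from the study: parents are drawn at random, the best combination is carried over unchanged (elitism), and `fitness` is a hypothetical callable that would return the RMSECV of a PLSR model for a given waveband mask.

```python
import random

def ga_select(n_vars, fitness, pop_size=50, generations=100,
              drop_frac=0.2, mutation_rate=0.01, seed=0):
    """Search for a boolean waveband mask that minimizes fitness."""
    rng = random.Random(seed)

    def new_mask():
        return [rng.random() < 0.5 for _ in range(n_vars)]

    pop = [new_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)  # lower is better, e.g. RMSECV
        # Drop the worst combinations and regenerate them at random.
        n_drop = int(drop_frac * pop_size)
        pop[pop_size - n_drop:] = [new_mask() for _ in range(n_drop)]
        next_pop = [pop[0][:]]  # assumption: keep the best one unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop, 2)
            # Double (two-point) cross-over between two parents.
            i, j = sorted(rng.sample(range(n_vars + 1), 2))
            child = a[:i] + b[i:j] + a[j:]
            # Flip each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)

# Toy fitness: number of positions that differ from a made-up target mask.
target = {3, 7, 11}

def toy_fitness(mask):
    return sum(g != (i in target) for i, g in enumerate(mask))

best_mask = ga_select(20, toy_fitness)
```

In a real run the fitness call dominates the cost, since each evaluation requires a full cross-validated model.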
The ten-fold cross-validation method was used to test model performance when running the above three waveband selection algorithms. Meanwhile, the PLSR modeling algorithm was used throughout the modeling phase to provide objectively comparable results.

2.7. Evaluation Metrics and Significance Tests for Quantitative Models

Evaluation metrics are necessary to objectively judge the performance of established models, and significance tests are helpful for verifying the statistical significance of regression models.
For regression models, the root mean square error (RMSE), the determination coefficient ($R^2$), and the ratio of prediction to deviation (RPD) are generally used as performance indicators for quantitative models. RMSE is the square root of the average squared difference between the predicted values and the measured values of the samples, which measures the average magnitude of the errors made by the prediction model. $R^2$ represents the proportion of the variation in the response variable that was predictable from the predictor variables, which provides a measure of how well outcomes were replicated by the prediction model. The formulas to calculate RMSE and $R^2$ values are shown in Equations (3) and (4):
$$RMSE\ (RMSEC,\ RMSECV,\ RMSEP) = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}{n}} \tag{3}$$
where $\hat{y}_i$ is the SSC value of the $i$th sample predicted by the model (°Brix), $y_i$ is the measured SSC value of the $i$th sample (°Brix), and $n$ is the number of samples. $R^2$ values are calculated as follows:
$$R^2\ (R_C^2,\ R_{CV}^2,\ R_P^2) = \frac{\sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{4}$$
where $\bar{y}$ is the mean SSC value of the samples in the data set (°Brix).
RPD is related to the ability of the prediction model to predict unknown samples concerning the initial variability of the calibration set and can be calculated according to Equation (5):
$$RPD = \frac{SD}{RMSEP} \tag{5}$$
where $SD$ is the standard deviation of SSC values of samples in the validation set (°Brix), and $RMSEP$ is the root mean square error of prediction for the validation set (°Brix).
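The three metrics can be computed as in the short sketch below. The RMSE follows the prose definition (predicted versus measured values), $R^2$ follows the ratio form of Equation (4), and the use of the sample standard deviation for RPD is an assumption; the SSC values are made up.

```python
import math
from statistics import mean, stdev

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(mean([(p - m) ** 2 for m, p in zip(measured, predicted)]))

def r_squared(measured, predicted):
    """Equation (4): explained sum of squares over total sum of squares."""
    y_bar = mean(measured)
    explained = sum((p - y_bar) ** 2 for p in predicted)
    total = sum((m - y_bar) ** 2 for m in measured)
    return explained / total

def rpd(measured, predicted):
    """Equation (5): SD of the measured values divided by the RMSE."""
    return stdev(measured) / rmse(measured, predicted)

# Made-up SSC values in °Brix.
measured = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 8.5, 10.5, 10.5]

error = rmse(measured, predicted)
r2 = r_squared(measured, predicted)
ratio = rpd(measured, predicted)
```

For these toy values every prediction is off by 0.5 °Brix, so the RMSE is 0.5 °Brix and the RPD is the sample SD of the measured values divided by 0.5.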
In this study, a total of 480 samples were divided into a calibration set (80% = 384 samples) and a validation set (20% = 96 samples). For the model building and verifying process, the calibration set was first used to find the optimal number of latent variables by applying the ten-fold cross-validation method. The results of cross-validation with the optimal number of latent variables were reported with RMSECV and $R_{CV}^2$. Secondly, once the optimal number of latent variables was determined, all 384 samples were used again to recalibrate the PLSR model based on the optimal number of latent variables. Then, the recalibrated model was used to predict the 384 samples themselves to report the RMSEC and $R_C^2$. Thirdly, the 96 samples in the validation set were finally used to test the recalibrated PLSR model. The test results were then reported by RMSEP, $R_P^2$, and RPD, indicating the model performance on unknown samples.
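The partitioning can be illustrated with index lists alone. The sketch below assumes a random shuffle with a fixed seed and interleaved fold assignment; the study does not state how samples were assigned, so both choices and the function names are hypothetical.

```python
import random

def split_indices(n_samples, calibration_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into calibration and validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = round(calibration_fraction * n_samples)
    return idx[:cut], idx[cut:]

def k_folds(indices, k=10):
    """Partition indices into k folds of nearly equal size."""
    return [indices[i::k] for i in range(k)]

calibration, validation = split_indices(480)
folds = k_folds(calibration, k=10)

# In each round one fold is held out and the other nine are used for fitting.
rounds = []
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    rounds.append((train, held_out))
```

Each candidate number of latent variables would be scored by pooling the held-out predictions over the ten rounds, and the validation indices are not touched until the final test.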
For model significance testing, the F-test method was used to verify the statistical significance of the calibrated regression models, by judging the significance of a model at a significance level of 0.01 according to the calculated p-value.

3. Results

3.1. The Soluble Solids Content of Blueberry Samples

Table 1 shows a summary of the statistical results of the blueberry samples in the calibration set and validation set. The calibration set, including 384 samples, was organized to establish regression models for predicting SSC values. The samples in the calibration set had an SSC value range of 5.1–11.9 °Brix with a mean of 8.14 °Brix. Additionally, the validation set containing 96 samples was prepared to verify the performance of established models. The equivalent mean values and standard deviation (SD) values for SSC indicated that two data sets could be selected from the same sample population, which can be observed by the violin plots of the distribution of the measured SSC values of the blueberry samples in Figure 2.

3.2. Spectrums of Blueberry Samples

The absorbance spectrums for a total of 480 samples extracted from their hyperspectral images were plotted in Figure 3. Due to sensor noise fluctuations, the wavebands of the head and tail were truncated and the spectrums in the range of 1000–1650 nm were retained for further analyses. In Figure 3a, two absorption peaks can be observed at wavebands 1150–1250 nm and wavebands 1400–1500 nm in the spectrums for all samples. Figure 3b shows that the mean absorbance of the stem side was overall higher than the calyx side in the waveband range of 1150–1650 nm. This illustrates that there is a difference in the tissue composition between the stem part and the calyx part of blueberry fruit.

3.3. Correlation Analysis of Waveband Variables

After extracting the mean spectrums of the two sides of the blueberry samples, the correlation coefficients between the spectral variables were calculated, and the pseudo-color diagram of the correlation coefficients is plotted in Figure 4. The absolute value of all the coefficients was larger than 0.7, indicating a high linear correlation among the waveband variables. In particular, the coefficients within each of the waveband regions 1000–1150 nm, 1150–1400 nm, and 1400–1500 nm were larger than 0.9. These results reflect a significant multivariate collinearity between waveband variables, which is a key problem that needs to be solved in the modeling process and the feature waveband selection process.
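The pairwise correlation matrix behind such a diagram can be computed as in the sketch below. It assumes Pearson correlation, which the text does not name explicitly, and uses made-up absorbance values for three hypothetical wavebands.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Correlation of every waveband (one list per waveband) with every other."""
    return [[pearson(a, b) for b in columns] for a in columns]

# Made-up absorbance values of four samples at three wavebands.
bands = [
    [0.30, 0.35, 0.40, 0.50],
    [0.32, 0.36, 0.43, 0.52],
    [0.60, 0.58, 0.66, 0.71],
]
corr = correlation_matrix(bands)
```

Neighboring wavebands that rise and fall together across samples give coefficients close to 1, which is the collinearity that PLSR and the waveband selection methods are meant to handle.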

3.4. Regression Models for Detecting SSC Values

Table 2 shows the results of the regression models for determining the SSC values of blueberry fruits, and Figure 5 shows the prediction details of the regression models for both the calibration set and validation set. Figure 5a shows the details of the cross-validation for determining the optimal number of latent variables in the PLSR models. For each type of spectral data, the RMSECV of the models decreased as the number of latent variables increased to 20, and it increased again once the models included more than 35 latent variables. This indicated that the models underfit with fewer than 20 latent variables but overfit with more than 35. The calibrated models performed best during cross-validation when using 25, 20, and 33 latent variables with the different spectral data sets. All three calibrated models generated a high determination coefficient of prediction ($R_P^2$) that was larger than 0.8. The PLSR model built on the spectral data from the stem side produced an $R_P^2$ of 0.8021, with an RMSEP value of 0.5435 °Brix. In contrast, the PLSR model established on the spectral data from the calyx side obtained a higher $R_P^2$ of 0.8463 and a lower RMSEP value of 0.4694 °Brix. Moreover, when using the mean spectrum of the two sides of the samples, the fitted PLSR model reached the highest $R_P^2$ of 0.8655 and the lowest RMSEP of 0.4431 °Brix. Comparing Figure 5b–d, the scatter points of the predicted values against the measured values were more concentrated in Figure 5d than in Figure 5b,c. It can be concluded that using the mean spectrums of the two sides of the blueberry fruits to calibrate the regression model performs better than using the mean spectrums of a single side. In summary, all three calibrated models passed the significance test at a 99.9% confidence level as p-values were lower than 0.001, and they achieved high RPD values that were greater than 2.2 for practical screening.

3.5. Regression Models Based on Feature Wavelength Variables

Figure 6a shows the trends in model performance while modeling with different numbers of wavebands selected by the three feature selection methods, namely SPA, iPLS, and GA. The numbers of wavebands selected by the GA method for modeling were not continuously increasing, so a diamond-shaped scatter plot was used to show them. All three algorithms showed an overall consistent convergence trend, where the RMSECV values of the models decreased as the number of wavebands increased to 70 wavebands, and the performance of the model tended to be stable even if more than 70 wavebands were used. Among these three selection methods, the iPLS method performed more effectively than the SPA method when selecting wavebands to calibrate a better model, since it generated a lower RMSECV while using an equal number of wavebands for modeling. This was mainly because the iPLS method directly selected the target number of wavebands for the best model, and the SPA method only considered the data structure of the predictor variables. For the optimization process of the GA method, the average RMSECV of each generation went down as the algorithm introduced more wavebands in the iterative evolution process. However, the number of wavebands identified by the GA method at the latter stages of the iteration was between 60 and 70, which meant that the optimal combination of feature wavebands for modeling would include a number within this range. This was consistent with the results of the SPA and iPLS methods, where the performance fluctuated weakly when more than 70 wavebands were selected for modeling. Finally, the feature wavebands selected by the GA method could generate a better model than the SPA and iPLS methods. This is probably because the SPA and iPLS methods were restricted in the order in which they selected wavebands, and the GA method had greater random variability, which was more conducive to global optimization. 
Figure 6b marks the feature wavebands identified by the SPA, iPLS, and GA methods for the models that generated the lowest RMSECV. The statistical results of these three models are summarized in Table 3. The calibrated models generated relatively close values for RMSECV, with values of 0.4446 °Brix, 0.4330 °Brix, and 0.4078 °Brix for SPA, iPLS, and GA, respectively.
Afterward, to further compare the computational performance of the different models, the wavebands were extracted on a single-pixel scale as the model inputs to calculate the SSC values. Consequently, the distribution diagram could be drawn by using the calculated SSC value as each pixel's intensity value. The five blueberry fruits shown in Figure 7a were selected for generating the distribution diagram of SSC. With the wavebands extracted from pixels within the fruit ROIs as inputs, the SSC values computed by the models are displayed in Figure 7b. The four rows of samples in Figure 7b denote the results from the full-range waveband model, as well as the three models based on the feature wavebands selected by SPA, iPLS, and GA. It can be observed that the four models generated similar textures and colors for the SSC distributions, which demonstrated that the three feature models could produce output similar to that of the full-waveband model. This indicated that the established feature waveband models performed as robustly as the full waveband model.
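Pixel-wise prediction amounts to applying the fitted linear model to the spectrum of each pixel inside a fruit ROI. The sketch below assumes the model has been reduced to regression coefficients and an intercept on the absorbance scale; the function name and all numbers are made up for illustration.

```python
def ssc_map(cube, mask, coefficients, intercept):
    """Predict one SSC value per pixel inside the ROI.

    cube[r][c] is the absorbance spectrum (a list) of pixel (r, c);
    mask[r][c] is True for pixels inside a fruit ROI.
    Pixels outside the ROI are returned as None.
    """
    result = []
    for row_spectra, row_mask in zip(cube, mask):
        row = []
        for spectrum, inside in zip(row_spectra, row_mask):
            if inside:
                value = intercept + sum(
                    b * a for b, a in zip(coefficients, spectrum))
            else:
                value = None
            row.append(value)
        result.append(row)
    return result

# Toy 1 x 2 image with two wavebands per pixel; only the first pixel is fruit.
toy_cube = [[[1.0, 2.0], [0.5, 0.5]]]
toy_mask = [[True, False]]
toy_map = ssc_map(toy_cube, toy_mask, coefficients=[2.0, 1.0], intercept=3.0)
```

The resulting grid can then be rendered with any pseudo-color scale, with background pixels left blank.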

4. Discussion

There are still many key issues and challenges that need to be addressed to build accurate and robust models while applying hyperspectral imaging techniques. One of the most challenging problems is to identify the characteristic wavebands. Solving this problem is of great help to simplify the complexity of the model and reduce the cost of detection equipment, and plays a positive role in promoting the application of nondestructive technologies in industrial practice. This study realized the nondestructive prediction of the SSC of blueberries by using the reflection hyperspectral imaging technique and quantitative modeling method, as well as using different feature waveband selection algorithms.
Table 2 shows the performance of the models built on the hyperspectral image data captured from the different orientations of the blueberries. It was found that using the mean spectrums from both the stem side and calyx side gave better model results ($R_P^2$ = 0.8655, RMSEP = 0.4431, and RPD = 2.7269) than using the mean spectrums of only one side. Additionally, the model of the calyx side was slightly better than that of the stem side. In contrast, Leiva-Valenzuela et al. [10] reported that the SSC predictions for the stem end direction were slightly better than the calyx end direction ($R_p$ of 0.88 versus 0.87) in the reflectance detection mode. This difference might be mainly due to the uneven distribution of sample compounds. These findings indicate that it is necessary to collect representative hyperspectral image data for different orientations of samples to build the optimal model.
As shown in Table 3, the PLSR models established on feature wavebands also achieved accurate predictions for both the calibration set and the validation set. This demonstrated that it is feasible to reduce spectral data acquisition from the full range of wavebands while still achieving accurate detection. Previous studies have recommended the use of selection algorithms to simplify the model and thereby decrease the computational costs [10]. Such simplified models would be more suitable for the online commercial sorting and grading of blueberries, as well as for the development of handheld detection equipment. Among the three tested feature selection methods, the SPA method ($R_P^2$ = 0.8507, RMSEP = 0.4643 °Brix, and RPD = 2.5883) is recommended since it performed best for the unknown samples in the validation test. Considering the tradeoff between speed and robustness, this study also recommends the GA method ($R_P^2$ = 0.8420, RMSEP = 0.4764 °Brix, and RPD = 2.5156) for feature selection, because it converged much faster than the SPA and iPLS methods when dealing with hundreds of hyperspectral variables. In contrast, the iPLS method ($R_P^2$ = 0.8397, RMSEP = 0.4791 °Brix, and RPD = 2.4976) did not perform as well as the SPA and GA methods in the independent validation set; this was partly due to the limitations of its waveband selection strategy. It should be noted that the above comparisons might not be strictly consistent because of the differences in the detection modes (reflectance, transmittance, or other modes), waveband range, and sample materials. Overall, the simplified models achieved high accuracy while drastically reducing the number of waveband variables compared with the model using the full range of wavebands.
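To make the SPA recommendation more concrete, the core projection step of the algorithm can be sketched as below. This is a simplified illustration rather than the implementation used in the study: it starts from one fixed waveband and greedily adds the waveband with the largest residual norm after projecting out the last chosen one, whereas the published algorithm [22] also compares different starting wavebands and subset sizes by validation error. The function name `spa_select` and the toy matrix are hypothetical.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Greedy successive projections on the columns of X (samples x wavebands)."""
    Xp = np.asarray(X, dtype=float).copy()
    Xp -= Xp.mean(axis=0)                      # mean-center each waveband
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                 # never re-pick a chosen waveband
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data: column 1 is almost a multiple of column 0, column 2 is different
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = np.column_stack([a, 2.0 * a + 0.01 * b, b])
picked = spa_select(X, n_select=2, start=0)    # skips the near-duplicate column 1
```

Because each step removes the component shared with the wavebands already chosen, near-duplicates of a selected waveband are left with a small residual norm and tend not to be picked next.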
According to the distribution maps of the SSC in Figure 7b, similar textures and colors could be observed in the distributions generated by the different models. On one hand, this indicated that the combinations of wavebands selected by the SPA, iPLS, and GA methods had generated models with a prediction performance equivalent to that of the full-range waveband model. Taking a closer look at the marked feature wavebands in Figure 6b, the wavebands selected by iPLS were relatively concentrated in the regions of 1200–1250 nm and 1400–1450 nm, which were the main regions of the vibration absorption of the functional groups within the SSC. Notably, the iPLS-selected wavebands in the series fit the SSC values best within the calibration set, although the wavebands were close to each other, which resulted in collinearity. In contrast, fewer adjacent wavebands were selected by the SPA method. The SPA method only considered the projection information of the waveband matrix instead of the correlation between the wavebands and SSC values [22]. Hence, it could reduce the collinearity of the selected variables to some extent and avoid over-fitting the calibration set. This is probably why the SPA method performed best on the unknown samples in the independent validation set. Owing to its unconstrained choice of variable order and powerful global optimization ability, the GA method identified a more optimal combination of wavebands, and it produced the best results in the cross-validation process. On the other hand, the distribution diagram shows that the SSC was not evenly distributed on individual blueberries and that differences in concentrations existed in different areas of a blueberry fruit. Qiao et al. [12] also found that the distribution of the SSC in blueberries, as predicted by a model built on hyperspectral image data (400–1000 nm), was uneven and concentrated.
They stated that this was mainly because of the exposure problems while capturing hyperspectral images [25]. In view of this, this study used four bulbs from different directions to provide a relatively uniform light source. Moreover, the intensity of the light source was properly adjusted to avoid the over-exposure of the captured hyperspectral images. In this way, no obvious over-exposure regions appeared in the hyperspectral images in this study. Hence, we tend to believe that this uneven distribution was the result of attribute differences in the blueberry samples. However, it is difficult to accurately describe the correlation between the SSC distribution and other fruit attributes since no other attribute information was collected as ground truth in this experiment. Nevertheless, the approach would be a useful tool for agronomists to study the change in fruit quality and even explore the internal metabolism of substances when time-series hyperspectral image data are collected with an in-situ detection mode [26,27]. In general, the approach proposed in this study is not only significant for the nondestructive detection of fruit quality after harvest but also has a positive effect on understanding the quality changes that occur during fruit growth [28]. In summary, this study highly recommends the use of appropriate waveband selection methods and the use of uniform and stable light source conditions during modeling and equipment development for practical applications.
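The collinearity argument made in this discussion for closely spaced versus well-separated wavebands can be checked numerically. The sketch below is an illustration on synthetic data, not on the blueberry spectra: it summarizes a waveband subset by the mean absolute pairwise correlation between its columns, and it uses a random-walk matrix in which neighboring columns are strongly correlated as a rough stand-in for smooth spectra. The function name and the chosen indices are hypothetical.

```python
import numpy as np

def mean_abs_correlation(X, band_idx):
    """Mean absolute pairwise correlation among the selected waveband columns."""
    sub = np.asarray(X, dtype=float)[:, band_idx]
    corr = np.corrcoef(sub, rowvar=False)             # columns are variables
    off_diag = corr[~np.eye(len(band_idx), dtype=bool)]
    return float(np.mean(np.abs(off_diag)))

# Synthetic "spectra": a random walk along the band axis (200 samples, 100 bands)
rng = np.random.default_rng(1)
X = np.cumsum(rng.normal(size=(200, 100)), axis=1)

adjacent = mean_abs_correlation(X, [50, 51, 52, 53])  # tightly clustered bands
spread = mean_abs_correlation(X, [5, 35, 65, 95])     # well-separated bands
```

On this synthetic matrix the clustered subset is far more collinear than the spread-out subset, which mirrors the contrast drawn between interval-based and projection-based selections.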

5. Conclusions

In this study, a method for nondestructively visualizing the soluble solids content (SSC) of blueberries was successfully developed by using a hyperspectral imaging technique. The prediction model for quantifying the SSC value was established based on the full range of wavebands, achieving the highest $R_P^2$ of 0.8655 and the lowest RMSEP value of 0.4431 °Brix. Moreover, this study showed that the feature selection methods could effectively reduce the number of input waveband variables and the complexity of the detection models. This could help to cut production costs by allowing a multispectral system to be used in practical applications. Additionally, benefiting from hyperspectral imaging technology, the method proposed in this study could provide a new perspective for exploring the SSC distribution of blueberry fruits. Meanwhile, hyperspectral imaging technology could inspect multiple small-sized samples simultaneously and efficiently, and the reflection detection mode can greatly optimize the light path design. This mode might be more suitable for production scenarios than the transmittance mode and is expected to be a viable technical solution for the internal quality inspection of small-sized fruits such as berries.
The results of this study can contribute to the development of quality management for blueberries and other fruits in commodity-trading activities. With the increase in blueberry production, there is an urgent need for rapid and nondestructive technologies to grade a large amount of blueberry fruits in a short period. This study demonstrated a novel approach for blueberry quality detection and the advantages of jointly applying hyperspectral imaging and machine learning methodologies, which can serve as a valuable reference for the development of grading and packaging equipment. A hyperspectral or multispectral camera mounted above the fruit conveyor belt can collect and transmit information on fruit samples to the central control system, which in turn directs the grading mechanism to grade the fruit samples. Such a system can be embedded in newly deployed equipment or added to an existing production line, with the advantages of both low cost and high adaptability. This is particularly critical in the context of the increasing demand for sustainable agricultural practices and high-quality agricultural produce.

Author Contributions

Conceptualization, H.L.; methodology, B.L.; software, B.C.; validation, G.Q. and B.C.; formal analysis, X.W.; investigation, G.Q.; resources, B.C.; data curation, H.O.; writing—original draft preparation, G.Q.; writing—review and editing, G.Q.; visualization, X.D.; supervision, X.Y.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific and Technological Innovation Strategic Program of the Guangdong Academy of Agricultural Sciences, grant number ZX202402; the Natural Science Foundation of Guangdong Province, grant number 2022A1515010391; the Innovation Fund of the Guangdong Academy of Agricultural Sciences, grant number 202104; the Science and Technology Commissioners Project of Guangdong Province, grant number KTP20240137; and the Youth Training Program of the Guangdong Academy of Agricultural Sciences, grant number R2020QD-061.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

Author Haishan Ouyang was employed by the company Guangzhou Kaibang Information Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Moroni, A.; Zupo, R.; Castellana, F.; Amirante, F.; Zese, M.; Rondanelli, M.; Riso, P.; Perna, S. Berry Fruits and Their Improving Potential on Skeletal Muscle Health and Performance: A Systematic Review of the Evidence in Animal and in Human Studies. Foods 2024, 13, 142210.
  2. Qi, J.; Sun, S.; Zhang, L.; Zhu, Y.; Zhou, H.; Gan, X.; Li, B.; Chen, Y.; Li, W.; Li, T.; et al. Seasonal variation of antioxidant bioactive compounds in southern highbush blueberry leaves and non-destructive quality prediction in situ by a portable near-infrared spectrometer. Food Chem. 2024, 457, 139925.
  3. Xiao, F.; Wang, H.; Xu, Y.; Shi, Z. A Lightweight Detection Method for Blueberry Fruit Maturity Based on an Improved YOLOv5 Algorithm. Agriculture 2024, 14, 36.
  4. IBO. Global State of the Blueberry Industry Report in 2024. 2024. Available online: https://www.internationalblueberry.org (accessed on 3 September 2024).
  5. Magwaza, L.S.; Opara, U.L. Analytical methods for determination of sugars and sweetness of horticultural products-A review. Sci. Hortic-Amsterdam 2015, 184, 179–192.
  6. Li, J.L.; Sun, D.W.; Cheng, J.H. Recent Advances in Nondestructive Analytical Techniques for Determining the Total Soluble Solids in Fruits: A Review. Compr. Rev. Food Sci. F 2016, 15, 897–911.
  7. Chen, Z.; Wang, J.; Liu, X.; Gu, Y.; Ren, Z. The Application of Optical Nondestructive Testing for Fresh Berry Fruits. Food Eng. Rev. 2024, 16, 85–115.
  8. Zhang, M.; Li, C.; Yang, F. Optical properties of blueberry flesh and skin and Monte Carlo multi-layered simulation of light interaction with fruit tissues. Postharvest Biol. Tec. 2019, 150, 28–41.
  9. Leiva-Valenzuela, G.A.; Lu, R.; Aguilera, J.M. Prediction of firmness and soluble solids content of blueberries using hyperspectral reflectance imaging. J. Food Eng. 2013, 115, 91–98.
  10. Leiva-Valenzuela, G.A.; Lu, R.; Aguilera, J.M. Assessment of internal quality of blueberries using hyperspectral transmittance and reflectance images with whole spectra or selected wavelengths. Innov. Food Sci. Emerg. 2014, 24, 2–13.
  11. Hu, M.H.; Dong, Q.L.; Liu, B.L. Modelling postharvest quality of blueberry affected by biological variability using image and spectral data. J. Sci. Food Agr. 2016, 96, 3365–3373.
  12. Qiao, S.; Tian, Y.; Gu, W.; He, K.; Yao, P.; Song, S.; Wang, J.; Wang, H.; Zhang, F. Research on simultaneous detection of SSC and FI of blueberry based on hyperspectral imaging combined MS-SPA. Eng. Agric. Environ. Food 2019, 12, 540–547.
  13. Zheng, W.; Bai, Y.; Luo, H.; Li, Y.; Yang, X.; Zhang, B. Self-adaptive models for predicting soluble solid content of blueberries with biological variability by using near-infrared spectroscopy and chemometrics. Postharvest Biol. Tec. 2020, 169, 111286.
  14. Bai, Y.; Fang, Y.; Zhang, B.; Fan, S. Model robustness in estimation of blueberry SSC using NIRS. Comput. Electron. Agr. 2022, 198, 107073.
  15. Chen, Y.; Li, Y.; Williams, R.A.; Zhang, Z.; Peng, R.; Liu, X.; Xing, T. Modeling of soluble solid content of PE-packaged blueberries based on near-infrared spectroscopy with back propagation neural network and partial least squares (BP-PLS) algorithm. J. Food Sci. 2023, 88, 4602–4619.
  16. Smrke, T.; Stajner, N.; Cesar, T.; Veberic, R.; Hudina, M.; Jakopic, J. Correlation between Destructive and Non-Destructive Measurements of Highbush Blueberry (Vaccinium corymbosum L.) Fruit during Maturation. Horticulturae 2023, 9, 501.
  17. MacEachern, C.B.; Esau, T.J.; Schumann, A.W.; Hennessy, P.J.; Zaman, Q.U. Detection of fruit maturity stage and yield estimation in wild blueberry using deep learning convolutional neural networks. Smart Agric. Technol. 2023, 3, 100099.
  18. Huang, W.; Wang, X.; Zhang, J.; Xia, J.; Zhang, X. Improvement of blueberry freshness prediction based on machine learning and multi-source sensing in the cold chain logistics. Food Control 2023, 145, 109496.
  19. Varaldo, A.; Chiabrando, V.; Giacalone, G. New approach for blueberry firmness grading to improve the shelf-life along the supply chain. Sci. Hortic-Amsterdam 2022, 304, 111273.
  20. Zheng, Z.; An, Z.; Liu, X.; Chen, J.; Wang, Y. Finite Element Analysis and Near-Infrared Hyperspectral Reflectance Imaging for the Determination of Blueberry Bruise Grading. Foods 2022, 11, 1899.
  21. Shicheng, Q.; Youwen, T.; Qinghu, W.; Shiyuan, S.; Ping, S. Nondestructive detection of decayed blueberry based on information fusion of hyperspectral imaging (HSI) and low-Field nuclear magnetic resonance (LF-NMR). Comput. Electron. Agr. 2021, 184, 106100.
  22. Araújo, M.C.U.; Saldanha, T.C.B.; Galvão, R.K.H.; Yoneyama, T.; Chame, H.C.; Visani, V. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr. Intell. Lab. 2001, 57, 65–73.
  23. Nørgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J.P.; Munck, L.; Engelsen, S.B. Interval Partial Least-Squares Regression (iPLS): A Comparative Chemometric Study with an Example from Near-Infrared Spectroscopy. Appl. Spectrosc. 2016, 54, 413–419.
  24. Bangalore, A.S.; Shaffer, R.E.; Small, G.W.; Arnold, M.A. Genetic Algorithm-Based Method for Selecting Wavelengths and Model Size for Use with Partial Least-Squares Regression: Application to Near-Infrared Spectroscopy. Anal. Chem. 1996, 68, 4200–4212.
  25. Mo, C.; Kim, M.S.; Kim, G.; Lim, J.; Delwiche, S.R.; Chao, K.; Lee, H.; Cho, B. Spatial assessment of soluble solid contents on apple slices using hyperspectral imaging. Biosyst. Eng. 2017, 159, 10–21.
  26. Muñoz, C.; Ávila, J.; Salvo, S.; Huircán, J.I. Prediction of harvest start date in highbush blueberry using time series regression models with correlated errors. Sci. Hortic-Amsterdam 2012, 138, 165–170.
  27. Oh, H.; Pottorff, M.; Giongo, L.; Mainland, C.M.; Iorizzo, M.; Perkins-Veazie, P. Exploring shelf-life predictability of appearance traits and fruit texture in blueberry. Postharvest Biol. Tec. 2024, 208, 112643.
  28. Mengist, M.F.; Pottorff, M.; Mackey, T.; Ferrao, F.; Casorzo, G.; Lila, M.A.; Luby, C.; Giongo, L.; Perkins-Veazie, P.; Bassil, N.; et al. Assessing predictability of post-storage texture and appearance characteristics in blueberry at breeding population level. Postharvest Biol. Tec. 2024, 214, 112964.
Figure 1. Schematic diagram of hyperspectral imaging system for capturing reflection hyperspectral images of blueberry fruits.
Figure 2. Violin plots of the distribution of measured SSC values of blueberry samples from calibration set shown in blue and validation set shown in red.
Figure 3. Mean spectrum of blueberry fruits extracted from hyperspectral images: (a) absorbance spectrums of total 480 samples in different color and (b) mean absorbance spectrums collected from the stem side and calyx side of blueberry fruits.
Figure 4. The pseudo-color diagram of the correlation coefficients between the waveband variables of the blueberry samples.
Figure 5. Prediction results of PLSR models for estimating SSC values with different spectral data: (a) results of cross-validation of PLSR models with different numbers of latent variables, (b) model results from spectral data of stem side, (c) model results from spectral data of calyx side, and (d) model results from spectral data of stem side and calyx side.
Figure 6. The results of feature waveband variables selection generated by SPA, iPLS, and GA methods: (a) the trend of model performance while modeling with different numbers of feature wavebands, (b) the feature wavebands identified by the three algorithms that generated the lowest RMSECV results during cross-validation processes.
Figure 7. Pseudo-color map for retrieving the distribution of SSC within blueberry fruits generated by optimal PLSR models: (a) RGB image of five blueberry samples for generating the pseudo color map, (b) prediction results of models from feature wavebands selected by SPA, iPLS, and GA methods.
Table 1. The statistical results of blueberry samples in the calibration set and validation set.
| Data Sets | Number of Samples | SSC Min (°Brix) | SSC Max (°Brix) | SSC Mean (°Brix) | SSC SD a (°Brix) |
| --- | --- | --- | --- | --- | --- |
| Calibration set | 384 | 5.1 | 11.9 | 8.14 | 1.2066 |
| Validation set | 96 | 5.5 | 11.3 | 8.12 | 1.1999 |
Note: a SD, standard deviation.
Table 2. Results of PLSR models for determining SSC values using full range wavebands extracted from different tissue parts of blueberry fruits.
| Spectral Data | LVs a | RMSEC (°Brix) | $R_C^2$ | RMSECV (°Brix) | $R_{CV}^2$ | RMSEP (°Brix) | $R_P^2$ | RPD | p-Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Stem side | 25 | 0.4290 | 0.8733 | 0.5242 | 0.8121 | 0.5435 | 0.8021 | 2.2481 | <0.001 |
| Calyx side | 20 | 0.4425 | 0.8652 | 0.4955 | 0.8315 | 0.4694 | 0.8463 | 2.5504 | <0.001 |
| Mean of two sides | 33 | 0.3400 | 0.9204 | 0.4534 | 0.8598 | 0.4431 | 0.8655 | 2.7269 | <0.001 |
Note: a LVs, number of latent variables in a PLSR model.
Table 3. Results of PLSR models for determining SSC values using feature wavebands selected by SPA, iPLS, and GA methods.
| Selection Algorithms | Number of Wavebands | LVs a | RMSEC (°Brix) | $R_C^2$ | RMSECV (°Brix) | $R_{CV}^2$ | RMSEP (°Brix) | $R_P^2$ | RPD | p-Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPA | 70 | 33 | 0.3656 | 0.9080 | 0.4446 | 0.8646 | 0.4643 | 0.8507 | 2.5883 | <0.001 |
| iPLS | 55 | 32 | 0.3730 | 0.9042 | 0.4330 | 0.8712 | 0.4791 | 0.8397 | 2.4976 | <0.001 |
| GA | 63 | 31 | 0.3501 | 0.9156 | 0.4078 | 0.8858 | 0.4764 | 0.8420 | 2.5156 | <0.001 |
Note: a LVs, number of latent variables in a PLSR model.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qiu, G.; Chen, B.; Lu, H.; Yue, X.; Deng, X.; Ouyang, H.; Li, B.; Wei, X. Nondestructively Determining Soluble Solids Content of Blueberries Using Reflection Hyperspectral Imaging Technique. Agronomy 2024, 14, 2296. https://doi.org/10.3390/agronomy14102296
