1. Introduction
Tomatoes (Solanum lycopersicum) belong to the Solanaceae family and are the most widely cultivated vegetable crop globally.
Consuming tomatoes offers health benefits due to their low fat and calorie content. They are rich in fiber, natural antioxidants, vitamins, carotenoids, and phenols. Scientific evidence supports the protective role of tomato consumption in preventing chronic degenerative diseases such as cancer, cardiovascular issues, and neurodegenerative pathologies [1].
The taste and appearance of tomatoes significantly influence consumer preferences. These characteristics depend on the content of soluble solids, sugars, and organic acids. Soluble solids are the collective term for soluble compounds in tomato fruits that directly impact taste [2,3]. The sweet taste of the vegetable primarily results from fructose and glucose, while acidity arises from citric and malic acids [4]. Consequently, the sugar and acid content often serve as essential indicators for assessing tomato flavor.
Measuring the quality characteristics of tomatoes is crucial to ensure they meet consumer expectations regarding taste, appearance, and nutritional value. Quality assessment plays a vital role throughout the tomato production and supply chain. It helps growers optimize cultivation practices, enables distributors and retailers to maintain suitable storage conditions, and ensures compliance with food safety requirements.
Near-infrared (NIR) spectroscopy is one of the methods for nondestructive quality assessment of tomato fruits, due to its noninvasive nature and rapid analysis. By evaluating the interaction of light with the tomato surface, this method provides fast and reliable measurements without damaging the fruits, making it suitable for high-throughput analyses in the food and flavor industry. Additionally, NIR spectroscopy does not require chemicals, making it more environmentally friendly and cost-effective compared to other techniques. Through multivariate analysis techniques such as principal component analysis and partial least squares regression, NIR reflection spectra can be used to assess various quality parameters simultaneously, including ripeness, firmness, sugar content, and acidity, facilitating an overall evaluation of vegetable quality. As a drawback, successfully implementing the method requires instrument calibration specific to tomato varieties and the measured parameters. External conditions such as temperature, lighting, and humidity can also influence measurement accuracy, necessitating careful management of experimental conditions. Furthermore, NIR spectroscopy has a limited penetration depth, which may restrict the ability to measure internal parameters in specific tomato varieties. Despite these limitations, NIR spectroscopy remains a reliable method when used in conjunction with other techniques to provide a comprehensive assessment of tomato quality.
It has been shown that Vis–NIR spectroscopy makes it possible to track the biochemical changes occurring in tomatoes during ripening, determine their optimal harvest time, and assess their ripeness after harvest [5,6].
In studies by Clément et al. [7], Bunghez et al. [8], and Szuvandzsiev et al. [9], the assessment of tomato quality is based on lycopene content. Nondestructive measurement by Vis–NIR spectroscopy provides information about lycopene content and the associated color changes in tomatoes. Vis–NIR spectroscopy is an effective method for determining specific quality parameters, but it is not applicable for predicting other parameters such as pH and soluble solids in tomatoes due to sample homogeneity. The use of portable spectrometers for assessing the quality of tomato puree provides rapid, real-time evaluation of soluble solids, lycopene, and polyphenols. The method’s accuracy is influenced by the calibration period of the measuring devices. Moreover, UV–Vis spectrophotometry and FTIR spectroscopy are suitable methods for quantitatively determining lycopene in tomato powder samples. However, FTIR spectroscopy requires prior sample preparation.
In other studies by Saad et al. [10], Governici et al. [11], and Dobrin et al. [12], combining Vis–NIR spectroscopy with partial least squares regression (PLSR) is proposed for tomato quality assessment. The aim is to track the physicochemical parameters during tomato storage and accurately predict the soluble solids content, degree of ripeness, and, to a lesser extent, the lycopene content.
Integrating various techniques for tomato quality assessment enhances the efficiency of spectral imaging for comprehensive evaluation, necessitating additional methods for data processing. Despite the sufficient accuracy in measuring tomato quality using Vis–NIR spectroscopy, the possibility of predicting important characteristics for these vegetables has not been thoroughly explored, including the generalization of the obtained regression models.
Najjar et al. [13] and Duckena et al. [14] investigated the potential of Vis–NIR spectroscopy for the analysis of the internal quality characteristics of tomato fruits. They developed models for predicting the taste index, as well as the content of lycopene, flavonoids, β-carotene, total phenols, and dry matter in intact vegetables based on Vis–NIR reflectance spectra. Their research demonstrates high correlation coefficients for determining lycopene content and dry matter. The authors emphasize the need for additional studies to optimize the application of Vis–NIR spectroscopy under various production conditions. Improving the robustness of models for predicting additional parameters is also necessary to ensure reliable tomato quality control.
Radzevičius et al. [15] investigated the potential of NIR spectroscopy in assessing the maturity of tomatoes. A correlation between the data obtained through NIR and physicochemical analyses was determined. The authors commented on how the quality parameters of tomato fruits change during ripening. The correlation found for the three studied parameters (dry matter, soluble solids, and fruit firmness) is in the range of 0.82–0.96. The study presented important data regarding the potential of NIR spectroscopy for assessing the maturity characteristics of tomatoes, but it calls for a more comprehensive examination of the various physicochemical parameters and possible factors affecting the quality of tomatoes.
In another study, the applicability of NIR spectroscopy was tested in field conditions for predicting dry matter values in tomatoes grown in five regions of Brazil. The regression models obtained through PLS regression showed an accuracy of over 90%. It was also highlighted that, for the commercialization of fresh fruits, improvement of the models and the inclusion of different seasons and varieties, as well as a larger number of samples, are necessary. Moreover, the study focused on a single quality parameter and may not provide a complete picture of the tomatoes’ suitability and nutritional benefits [16]. A team of researchers, applying NIR spectroscopy, investigated the possibility of assessing the level of maturity and predicting the textural properties of “Momotaro” tomatoes. The PCA model with mean normalization demonstrated high distinguishability, classifying ripe green, pink, and red tomatoes with over 90% accuracy. For the assessment of the soluble solids content in fresh tomatoes, the obtained coefficient of determination was 0.8, while the prediction of alcohol-insoluble solids in tomatoes was judged unsuccessful by the authors [17].
From the analysis of the available literature, it can be summarized that assessing the quality of tomatoes is important from the perspective of producers, distributors, and consumers. Vis–NIR spectroscopy is a suitable, fast, and nondestructive method for conducting such quality assessments. However, it has limitations related to the surface sensitivity of tomatoes and the need for calibration of measurement devices. NIR spectroscopy, which also requires calibration, offers several advantages over Vis spectroscopy for tomato quality assessment. Firstly, NIR spectroscopy allows analysis across a broader spectrum of light, enabling the assessment of a greater number of quality parameters in tomatoes when seeking a comprehensive evaluation. Additionally, NIR spectra penetrate deeper into the studied structure, allowing more detailed measurements of internal characteristics related to various nutrients. Due to the specific wavelengths in the NIR range, this technique typically provides greater precision in determining water content, sugars, acids, and other components. Furthermore, NIR spectroscopy enables rapid and efficient scanning of large samples, speeding up the analysis process and allowing for the processing of larger volumes of data. Despite the satisfactory accuracy of Vis–NIR spectroscopy, the method’s potential for predicting important tomato characteristics and generalizing obtained regression models has not been fully explored. Additionally, in numerous studies using the Vis–NIR spectral range, it is observed that it covers only a small portion of the entire NIR spectrum, limiting the informative wavelengths and indices relevant to specific tomato varieties. This fact leads to the need for additional research and the development of models that incorporate a broader NIR spectrum to ensure greater informativeness of measurements and better prediction of individual characteristics. 
Additional efforts to identify suitable indices from predefined informative wavelengths across the entire NIR spectrum can lead to more accurate and generalizable models for predicting tomato quality, which is essential for the agricultural industry. In the conducted review of studies involving the application of NIR spectroscopy for tomato quality assessment, it is evident that most of them focus on one or two evaluation parameters. Integrating techniques for multivariate analysis and encompassing a wider range of parameters could enhance the efficiency of quality assessment. Further research is necessary to overcome limitations associated with using NIR spectra for evaluating the quality of tomato fruits, considering different varieties and cultivation methods.
The purpose of this study is to explore the potential of NIR spectroscopy as a nondestructive method for assessing tomato quality. To achieve this goal, it is necessary to investigate the advantages and limitations of NIR spectroscopy in evaluating various quality parameters such as dry matter, vitamin C, titratable organic acids, total dyes, lycopene, and beta-carotene. Additionally, the integration of multivariate analysis techniques needs to be studied to enhance the efficiency of NIR spectroscopy in the comprehensive assessment of tomato quality.
The results of this study can contribute to the development of more effective and reliable methods for assessing tomato quality, thereby increasing consumer satisfaction and meeting food safety requirements during production and distribution.
2. Materials and Methods
Three varieties of tomatoes were used for the examination: Manusa, Mirsini, and Red Bounty, grown under greenhouse conditions. The fruits were grown under natural sunlight, without additional heating during the summer season. They were harvested at peak ripeness, ensuring their optimal taste and color.
The main characteristics of the tomato fruits were determined using the methodologies presented in [18,19,20] as follows:
Dry matter (%)—by drying at 70 °C until constant mass.
Titratable organic acids (%)—by titration of tomato juice or extract with 0.1 N NaOH (sodium hydroxide), using phenolphthalein as an indicator until reaching neutral pH. The consumed milliliters of 0.1 N NaOH during titration determine the acidity, expressed as a percentage of malic acid.
Vitamin C (mg %) by Tilman’s reaction with 2,6-dichlorophenolindophenol.
Total pigments (mg %)—by extracting lycopene and beta-carotene from tomatoes with hexane, acetone, and ethanol.
Lycopene (mg %) is a red pigment and a powerful antioxidant found in tomatoes. High-performance liquid chromatography (HPLC) is the gold standard for measuring lycopene. This involves extracting lycopene from the tomato sample and analyzing it with HPLC to determine its concentration.
Beta-carotene (mg %) is another carotenoid pigment present in tomatoes. Similar to lycopene, this component is quantitatively determined by HPLC after extraction from the tomato sample.
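The titration step for titratable organic acids listed above can be turned into a percentage with the usual equivalent-weight calculation. The sketch below is an illustration under stated assumptions, not the authors' exact procedure: the function name, the 10 g sample mass, and the 4.5 mL titrant volume are hypothetical, and 0.067 g per milliequivalent is the commonly used factor for malic acid (134.09 g/mol, two acidic protons).

```python
# Assumed conversion factor: grams of malic acid neutralised per milliequivalent of NaOH
MALIC_ACID_MEQ_WEIGHT_G = 0.067


def titratable_acidity_percent(naoh_ml, naoh_normality, sample_g,
                               meq_weight_g=MALIC_ACID_MEQ_WEIGHT_G):
    """Return titratable acidity as g of acid per 100 g of sample (%)."""
    if sample_g <= 0:
        raise ValueError("sample mass must be positive")
    # mL * (meq/mL) gives milliequivalents; times g/meq gives grams of acid
    acid_g = naoh_ml * naoh_normality * meq_weight_g
    return 100.0 * acid_g / sample_g


# Hypothetical example: 4.5 mL of 0.1 N NaOH consumed by 10 g of tomato juice
acidity = titratable_acidity_percent(4.5, 0.1, 10.0)
print(f"Titratable acidity: {acidity:.4f} % (as malic acid)")
```

Passing a different `meq_weight_g` expresses the same titration as another acid, for example citric acid, without changing the rest of the calculation.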
In the paper, the characteristics are denoted by alphanumeric symbols according to the following scheme:
C1 | Dry matter (%) |
C2 | Vitamin C (mg %) |
C3 | Titratable organic acids (%) |
C4 | Total pigments (mg %) |
C5 | Lycopene (mg %) |
C6 | Beta-carotene (mg %) |
The chemical analysis was followed by spectrophotometric measurement of each sample. NIR reflectance spectra of tomato fruits were obtained. Each sample was measured at five independent locations on the object’s surface. The NIRQuest512 spectrophotometer (Ocean Optics Inc., Orlando, FL, USA) was used in the spectral range of 900–1700 nm, with a Hamamatsu G9204-512 InGaAs detector (Hamamatsu Photonics K. K., Japan). Its optical resolution is 3.1 nm.
The RReliefF method [21] was used to select the informative spectral indices. RReliefF is a feature-selection algorithm used in machine learning; it is the regression variant of the Relief family. It evaluates the relevance of features by comparing their values on neighboring instances, rewarding features whose differences track differences in the predicted value. Spectral indices defined by Ju et al. [22] and Mendiguren et al. [23] were used in this work. An advantage of these indices is their adaptability, as they are not limited to fixed spectral wavelengths. Six informative wavelengths (λ, nm) specific to the product under study need to be selected. Subsequently, the calculated indices can serve as input data for classification, regression, and clustering tasks. These indices are shown below (Equations (1)–(15)):
The preliminary assessment of forecasting feasibility was conducted using the PCR and PLSR methods [24]. PCR employs principal component analysis to reduce multicollinearity in regression. PLSR identifies latent variables that explain variation in both the predictors and the responses. A second-order regression model was employed [25], describing the relationship between independent and dependent variables as follows:
$$z = b_0 + b_1 x + b_2 y + b_3 x y + b_4 x^2 + b_5 y^2$$
where $z$ is the dependent variable; the independent variables are $x$ and $y$; model coefficients are denoted as $b_0, \ldots, b_5$.
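A second-order model in two independent variables can be fitted by ordinary least squares once the squared and cross terms are written out as columns of a design matrix. The following is a minimal sketch in Python/NumPy (the study itself used Matlab). The symbols `x`, `y`, `z`, the coefficient ordering, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def quadratic_design(x, y):
    """Columns: 1, x, y, x*y, x^2, y^2 (full second-order model in two variables)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])


def fit_second_order(x, y, z):
    """Least-squares estimate of the six coefficients b0..b5."""
    design = quadratic_design(x, y)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return coeffs


# Synthetic, noise-free check data (not taken from the study)
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = rng.normal(size=60)
true_b = np.array([2.0, 0.5, -1.0, 0.3, 0.1, -0.2])
z = quadratic_design(x, y) @ true_b

b_hat = fit_second_order(x, y, z)
print(np.round(b_hat, 6))
```

Because the model is linear in its coefficients, no iterative optimiser is needed; removing a term from the model amounts to dropping the corresponding column from the design matrix and refitting.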
Model assessment includes metrics such as the coefficient of determination (R²), standard error (SE), p-value, Fisher’s test (F), and residual analysis. Validation of regression models was performed through the relationship between actual and predicted values of tomato characteristics. Validation was conducted using the coefficient of determination (R²), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and standard error (SE). These errors are calculated using the following formulas:
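$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $n$ is the number of data points; $y_i$ are the actual measured values; $\hat{y}_i$ are the predicted values.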
The choice of an error metric depends on the goals and characteristics of the regression problem being solved. Absolute error values are used when the robustness and interpretability of models are crucial.
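The validation metrics named above can be computed directly from paired measured and predicted values. The sketch below uses the standard textbook definitions in Python/NumPy; the function name and the four example values are hypothetical, and SE is left out because its value depends on the degrees of freedom assumed for the model.

```python
import numpy as np


def validation_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAE for measured vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = np.mean(resid**2)                       # mean squared error
    rmse = np.sqrt(mse)                           # root mean squared error
    mae = np.mean(np.abs(resid))                  # mean absolute error
    ss_res = np.sum(resid**2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean())**2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae}


# Hypothetical measured and predicted values of one characteristic
measured = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]
metrics = validation_metrics(measured, predicted)
print(metrics)
```

With these definitions MAE can never exceed RMSE on the same data, which is a quick consistency check when several error metrics are reported together.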
The validation used tomato data that were not included in the creation of the regression models. The processing of the experimental data was performed in the software Matlab 2017b (The Mathworks Inc., Natick, MA, USA). All data were processed at a significance level of α = 0.05.
3. Results
Figure 1 depicts the spectral characteristics of tomatoes from the three varieties (Manusa, Mirsini, and Red Bounty 27 5a), within the range of 900–1700 nm. Across the entire spectral range, there is an overlap in characteristics for the three tomato varieties.
Informative wavelengths were then selected from the spectra.
Figure 2 illustrates graphs of the weighting coefficients of the respective wavelengths for each of the technological characteristics of the tomatoes.
The six informative wavelengths, a result of the selection by the RReliefF method, are presented in Table 1. The wavelengths are sorted according to their informativeness, determined by the corresponding weighting coefficients.
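To make the ranking by weighting coefficients concrete, here is a simplified sketch of RReliefF in Python/NumPy. It is not the routine used in the study: it weights the `k` nearest neighbours uniformly instead of by rank, visits every instance once, and the parameter `k`, the function name, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def rrelieff(X, y, k=10):
    """Simplified RReliefF weights for a continuous target (uniform neighbour weights)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, p = X.shape
    x_range = X.max(axis=0) - X.min(axis=0)
    x_range[x_range == 0] = 1.0        # constant features contribute zero difference
    y_range = y.max() - y.min()
    if y_range == 0:
        raise ValueError("target must not be constant")

    n_dc = 0.0                 # accumulated difference in target
    n_da = np.zeros(p)         # accumulated difference in each feature
    n_dcda = np.zeros(p)       # accumulated joint difference (target and feature)
    for i in range(m):
        diff_a = np.abs(X - X[i]) / x_range     # normalised feature differences, shape (m, p)
        dist = diff_a.sum(axis=1)               # Manhattan distance to instance i
        dist[i] = np.inf                        # never pick the instance itself
        nn = np.argsort(dist)[:k]               # indices of the k nearest neighbours
        diff_y = np.abs(y[nn] - y[i]) / y_range
        n_dc += diff_y.sum() / k
        n_da += diff_a[nn].sum(axis=0) / k
        n_dcda += (diff_y[:, None] * diff_a[nn]).sum(axis=0) / k

    # Features whose differences coincide with target differences get high weights
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)


# Synthetic example: only column 0 drives the target
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 5))
y = X[:, 0] + 0.01 * rng.normal(size=200)
weights = rrelieff(X, y, k=10)
ranking = np.argsort(weights)[::-1]     # most informative column first
print(ranking, np.round(weights, 3))
```

Sorting the columns by descending weight, as in the last lines, mirrors how wavelengths are ordered by informativeness; a fixed number of top-ranked columns can then be kept.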
Fifteen spectral indices were computed using the selected wavelengths of the spectra. Table 2 shows the informativeness of the spectral indices for each of the selected tomato characteristics. As seen from the selection of spectral indices and their informativeness, changes in each tomato characteristic lead to changes in the reflection spectra at different wavelengths. Hence, forecasting these tomato characteristics requires the use of different spectral indices. The smallest number of informative indices was selected for lycopene content (C5), while the largest number was selected for beta-carotene content (C6). Only spectral indices with a weighting factor greater than 0.6 were selected.
The following vectors of informative spectral indices were selected:
A preliminary analysis was conducted on the possibility of forecasting key characteristics of tomatoes using the PCR and PLSR methods. The necessary number of principal components and latent variables was determined to describe over 95% of the variance in the data vectors containing spectral indices from each of the six analyzed tomato characteristics. It was found that two principal components and two latent variables are necessary to describe these data in all cases.
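The component count for a given share of explained variance can be read off the singular values of the mean-centred data. The sketch below, in Python/NumPy, illustrates that step under assumptions: the function name, the 95% threshold argument, and the synthetic eight-column matrix built from two underlying factors are hypothetical and do not reproduce the study's index vectors.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative variance exceeds threshold."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)   # singular values, descending
    explained = s**2 / np.sum(s**2)                # variance share per component
    cumulative = np.cumsum(explained)
    # first position where the cumulative share is strictly above the threshold
    n = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(n, len(s)), cumulative


# Synthetic data: two orthogonal factors spread over eight columns, plus small noise
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
scores = np.column_stack([np.sin(t), np.cos(t)])
loadings = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
indices = scores @ loadings + rng.normal(scale=0.01, size=(40, 8))

n_pc, cumulative = n_components_for_variance(indices, 0.95)
print(n_pc, np.round(cumulative[:3], 4))
```

Highly correlated spectral indices behave like the synthetic columns here: most of the variance collapses onto a few components, which is why a small number of components can suffice.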
The results of this analysis are shown in Table 3. It is evident that the forecasting of the main characteristics of tomatoes does not depend on the method used to reduce the volume of data, but, rather, on the informativeness of the vectors of indices used and their predictive ability. In this case, reduction via principal components is preferred, as PCR shows lower errors and a slightly higher coefficient of determination than PLSR.
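For readers less familiar with the PLSR side of this comparison, the following sketch shows single-response PLS (PLS1) with successive deflation in Python/NumPy. It is an illustrative implementation, not the Matlab routine used in the study; the function name and the synthetic data are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (regression coefficients, intercept) in the original X space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight: direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # latent variable (score vector)
        tt = t @ t
        p = Xk.T @ t / tt              # X loading
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic, noise-free data: with as many latent variables as predictors,
# PLS1 coincides with ordinary least squares
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 3.0
coef, intercept = pls1_fit(X, y, n_components=3)
print(np.round(coef, 6), round(float(intercept), 6))
```

The difference from PCR lies in the first line of the loop: PLS chooses each direction using the response, whereas PCR picks directions from the predictors alone and regresses on them afterwards.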
Nonsignificant coefficients with a p-value > 0.05 were removed from the main model. The following regression models were obtained:
The values of the criteria for the regression models assessment are indicated in Table 4. Each model represents a function of the principal components (PC₁ and PC₂).
The coefficient of determination (R²) indicates how well each model explains the variance in the dependent variable. According to Fisher’s criterion, the computed value of F is much greater than the critical value. Based on this criterion, it can be inferred that the obtained regression models have sufficient accuracy.
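As a reminder of how the two criteria relate, the overall F statistic of a regression with an intercept can be written in terms of the coefficient of determination. The symbols $k$ (number of model terms excluding the intercept) and $n$ (number of observations) are introduced here for illustration and are not the paper's notation.

```latex
F = \frac{R^2 / k}{\left(1 - R^2\right) / \left(n - k - 1\right)},
\qquad
\text{the model is significant at level } \alpha \text{ if } F > F_{\mathrm{crit}}(\alpha;\, k,\, n - k - 1)
```

For a fixed number of terms and observations, a higher coefficient of determination therefore always yields a larger F value.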
In Figure 3, the most informative models for predicting key characteristics of tomatoes are presented in general.
Figure 4 shows the results from the residual analysis for the obtained models. The analysis revealed that the residuals are normally distributed and align closely with the normal probability line. Based on this criterion, it can be concluded that the requirements of the regression analysis are met.
The validation process involves comparing the predictive ability of the regression models under identical measurement conditions.
Figure 5 depicts the distribution of measured and predicted values of tomato characteristics relative to the appropriate regression line.
The errors in validating the obtained regression models for the key characteristics of tomatoes are presented in Table 5. The obtained models show low errors according to the used criteria, indicating strong predictive ability. This was evidenced by high values of R², which signify the proportion of variance explained by the model.
The results from validating the regression models for three tomato characteristics (C2, C4, and C5) across the three varieties consistently demonstrate stable performance. For all three characteristics, the models exhibit low error values, indicating a sufficiently high level of accuracy. The mean squared error (MSE) and the root mean squared error (RMSE) are adequately low, ranging from 0.86 to 0.88 for MSE and from 0.93 to 0.94 for RMSE. These metrics suggest that the models have a small average discrepancy between actual and predicted values, which is a marker of their effectiveness.
The mean absolute error (MAE), ranging from 2.85 to 2.89, further confirms the models’ strong predictive ability, although these values are higher than the error metrics based on squared differences. Additionally, the standard error (SE) values tend to be higher than the MAE values, indicating some variability in prediction errors. However, they remain relatively low, ranging from 2.99 to 3.07.
The high values of the coefficient of determination (R²), ranging from 0.83 to 0.84, imply that a significant portion (83–84%) of the variation in actual tomato characteristics could be explained by the regression models.
5. Conclusions
This study complements and extends the known methods for nondestructive assessment of tomato quality through a systematic investigation of NIR spectroscopy combined with multivariate analysis techniques.
From the selection of spectral indices, it was found that changes in each tomato characteristic lead to variations in reflection spectra at different wavelengths. Therefore, vectors containing different spectral indices should be used to forecast these vegetable characteristics.
It was demonstrated that predicting the main characteristics of tomatoes depends less on the method used to reduce the volume of data and more on the informativeness of the indices vectors used and their predictive ability.
Regression models for the automated prediction of six tomato characteristics based on their spectral features were developed.
It was found that models for vitamin C, titratable organic acids, and lycopene have the highest predictive capability compared to the other investigated tomato characteristics.
The regression models can be used in creating automated systems for assessing tomato quality.
In conclusion, the continuous development of noncontact analysis techniques requires careful consideration of new methods and technologies that have the potential to improve the accuracy and reliability of predictive models of tomato quality. It is also important to recognize that producers, traders, and consumers have different requirements for tomato quality. This requires flexibility and adaptability of prediction models to meet the specific needs and preferences of different stakeholders.