1. Introduction
Ripeness is a crucial factor in determining the best harvest time for fruits, and it strongly influences their taste. During ripening, the skin of many fruits softens, so surface damage occurs easily before and after picking, reducing market quality. Judging ripeness and surface defects is a complex task that must take many factors into account. A reliable, rapid, and non-destructive method would improve work efficiency and reduce labor costs. With the development of computer technology, optical and imaging techniques have greatly simplified these procedures.
Dhiman et al. [1] concluded in a comprehensive review that RGB is currently a widely used color space for crop prediction and detection tasks. May et al. [2] used the RGB color model and a fuzzy logic technique to assess the ripeness of oil palm fruit. Pardede et al. [3] studied features extracted from the RGB, HSV, HSL, and L*a*b* color spaces with a support vector machine (SVM). Anraeni et al. [4] detected strawberry ripeness using RGB features and the k-nearest neighbor (k-NN) method, which performed well in the non-strawberry category. However, conventional fruit surface imaging captures only external information and is insensitive to changes beneath the skin. Cubero et al. [5] and Costa et al. [6] reviewed many commercial vision systems, but these systems only imitate human vision and can therefore only assess the parts of the fruit exposed to the camera. Moreover, because strawberries are herbaceous plants whose fruit grows close to the ground, they are easily contaminated with mud, and the varied colors of such contaminants can make detection difficult. Spectroscopic techniques, by contrast, have been shown to be effective for evaluating the quality and ripeness of fruits and vegetables.
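The color-feature approaches above (e.g. RGB or HSV features fed to a k-NN classifier) can be illustrated with a minimal sketch. It is not the method of any cited study: the feature set (mean RGB, circular mean of hue, mean saturation and value), `k = 3`, and the synthetic "ripe"/"unripe" patches are all hypothetical choices made for illustration.

```python
import colorsys

import numpy as np


def color_features(rgb_patch: np.ndarray) -> np.ndarray:
    """Summarize an (H, W, 3) uint8 RGB patch as a 7-element color feature vector.

    Features: mean R, G, B scaled to [0, 1]; circular mean of hue stored as
    cos/sin (red hues wrap around 0, so a plain mean would be misleading);
    mean saturation; mean value.
    """
    pixels = rgb_patch.reshape(-1, 3).astype(float) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    hue_angle = 2.0 * np.pi * hsv[:, 0]
    return np.array([
        *pixels.mean(axis=0),
        np.cos(hue_angle).mean(),
        np.sin(hue_angle).mean(),
        hsv[:, 1].mean(),
        hsv[:, 2].mean(),
    ])


def knn_predict(train_x: np.ndarray, train_y: np.ndarray, query: np.ndarray, k: int = 3):
    """Majority vote among the k training samples closest (Euclidean) to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Hypothetical synthetic patches: red (ripe) vs. greenish-white (unripe).
rng = np.random.default_rng(0)


def make_patch(base_rgb):
    """8x8 patch of a base color with small uniform noise (illustration only)."""
    noise = rng.integers(-15, 16, size=(8, 8, 3))
    return np.clip(np.array(base_rgb) + noise, 0, 255).astype(np.uint8)


train_x = np.array([color_features(make_patch(c))
                    for c in [(200, 30, 40)] * 5 + [(150, 190, 120)] * 5])
train_y = np.array(["ripe"] * 5 + ["unripe"] * 5)

print(knn_predict(train_x, train_y, color_features(make_patch((190, 40, 45)))))
```

Such features only describe the visible surface, which is exactly the limitation the paragraph above points out.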
Hyperspectral imaging (HSI) is well suited to detecting bruises beneath the skin, and it can also be used to evaluate changes in fruit quality (e.g., soluble solids content (SSC), moisture content (MC), and firmness). Each pixel of a hyperspectral image contains a full spectrum, so the whole image forms a three-dimensional data cube (two spatial dimensions and one spectral dimension) that carries far more information than a visible-light image. Wei et al. [7] extracted hyperspectral features in the 400–1000 nm range to classify the ripeness of persimmon fruit; using linear discriminant analysis (LDA) on a dataset covering different ripeness stages, they reached a correct classification rate of 95.3%. Guo et al. [8] selected optimal wavelengths from principal component analysis (PCA) loadings and fed the resulting features into an SVM to assess strawberry ripeness. Khodabakhshian et al. [9] and Benelli et al. [10] developed regression models to predict fruit quality attributes and then used classification algorithms to determine the maturity grade.
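Selecting wavelengths from PCA loadings, as in the hyperspectral studies above, can be sketched on a synthetic cube. This is a generic illustration under stated assumptions, not the procedure of Guo et al. [8]: the cube size, the 5 nm band spacing over 400–1000 nm, the simulated absorption feature near 680 nm, and the rule "keep the bands with the largest absolute loading on the first component" are all hypothetical.

```python
import numpy as np


def pca_loading_band_selection(cube, wavelengths, n_bands=5, component=0):
    """Return the wavelengths with the largest |loading| on one principal component.

    cube: (H, W, B) reflectance array; wavelengths: (B,) array in nm.
    """
    h, w, b = cube.shape
    spectra = cube.reshape(h * w, b)            # one spectrum per pixel
    centered = spectra - spectra.mean(axis=0)   # PCA operates on mean-centered data
    # Rows of vt are the principal axes; their entries are the per-band loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[component]
    top = np.argsort(np.abs(loadings))[::-1][:n_bands]
    return np.sort(wavelengths[top])


wavelengths = np.arange(400, 1001, 5)  # 121 bands, 400-1000 nm (assumed sampling)
rng = np.random.default_rng(1)

# Hypothetical cube: gently sloping baseline plus a pixel-dependent absorption dip near 680 nm.
baseline = 0.4 + 0.0001 * (wavelengths - 400)
absorption = np.exp(-((wavelengths - 680) / 20.0) ** 2)
depth = rng.uniform(0.0, 0.3, size=(20, 20, 1))
cube = baseline - depth * absorption + rng.normal(0, 0.002, size=(20, 20, wavelengths.size))

print(pca_loading_band_selection(cube, wavelengths))
```

Because most pixel-to-pixel variance in this toy cube comes from the absorption dip, the first component's largest loadings fall around 680 nm; on real data the selected bands depend heavily on preprocessing and on which component is inspected.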
Although HSI provides richer spectral information, it still faces many challenges: suitable spectral bands must be selected in advance, results depend on the selected region of interest (ROI), spatial resolution is low, imaging is slow, and the resulting images are not intuitive to interpret. In addition, similar spectral curves can be ambiguous. Elmasry et al. [11] found that the sharply changing spectral curve between 600 and 800 nm is optimal for evaluating strawberry ripeness, whereas Liu et al. [12] suggested that similarities within the same spectral range could be used to evaluate fungal contamination of strawberries. Unfortunately, such overlaps are common in practice.
Furthermore, NIR imaging captures spectral information that RGB images lack, and it is faster than HSI. Luo et al. [13] showed that the best wavelengths for apple bruise detection lie within the NIR range and that bruises can be detected from the reflectance difference between selected wavelengths. Wu et al. [14] combined spectral preprocessing with a least squares support vector machine (LS-SVM) to detect bruising on tomato surfaces in the 600–1600 nm range. In a narrower spectral range, Wang et al. [15] reported that wavelengths of 700–1000 nm could be used to discriminate the severity of chilling injury symptoms in kiwifruit. NIR wavelengths have also been applied with good results to detecting defects on chilling-injured nectarines [16] and hailstorm-damaged olives [17], as well as sunburn [18] and bruise susceptibility [19] in apples.
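The reflectance-difference idea described by Luo et al. [13] can be sketched in two steps: flat-field calibration of raw counts to relative reflectance, then thresholding the difference between a reference band and a bruise-sensitive band. Everything concrete here is hypothetical: the two bands, the white/dark reference levels, the 0.1 threshold, and the assumption that bruised tissue is darker in the sensitive band.

```python
import numpy as np


def relative_reflectance(raw, white, dark):
    """Flat-field calibration, per pixel: R = (raw - dark) / (white - dark)."""
    denom = np.maximum(white - dark, 1e-6)  # guard against dead pixels (white == dark)
    return (raw.astype(float) - dark) / denom


def band_difference_mask(refl_ref, refl_sensitive, threshold=0.1):
    """Flag pixels that are much darker in the sensitive band than in the reference band."""
    return (refl_ref - refl_sensitive) > threshold


# Hypothetical 32x32 scene with two NIR bands and camera counts up to ~4000.
dark = np.full((32, 32), 100.0)
white = np.full((32, 32), 4000.0)
true_ref = np.full((32, 32), 0.60)
true_sens = np.full((32, 32), 0.60)
true_ref[10:18, 12:20] = 0.58   # simulated 8x8 bruise: almost unchanged in the reference band
true_sens[10:18, 12:20] = 0.42  # but clearly darker in the sensitive band

raw_ref = dark + true_ref * (white - dark)
raw_sens = dark + true_sens * (white - dark)

mask = band_difference_mask(relative_reflectance(raw_ref, white, dark),
                            relative_reflectance(raw_sens, white, dark))
print(mask.sum())  # → 64
```

Using a difference (or ratio) of calibrated bands rather than a single band makes the test less sensitive to overall brightness changes across the fruit surface.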
Unfortunately, current NIR-based methods almost always acquire hyperspectral data across the entire NIR range and then select specific spectra within an ROI for analysis, so the results are likewise limited by the selected area and the final images have poor visibility. Moreover, the filters or spectroscopic prisms in NIR and hyperspectral cameras substantially reduce the light intensity reaching the CCD sensor, which is a major cause of spatial resolution degradation. Taken together, image fusion is needed to obtain a more visual and intuitive whole image.
Deep learning methods have recently gained popularity. Gao et al. [20] built a real-time HSI system with a pretrained AlexNet convolutional neural network (CNN) to classify strawberries as early ripe or ripe. Using laboratory data at two wavelengths (528 and 715 nm), they obtained an accuracy of 98.6% on the test dataset. Su et al. [21] constructed one-dimensional and three-dimensional hyperspectral representations [12] and fed them into 1D and 3D ResNet models to assess the ripeness and SSC of strawberries; they achieved an accuracy of over 84% but were limited by the small size of the dataset. Gulzar [22] used transfer learning to classify fruits, which not only improved accuracy but also effectively addressed the shortage of training data. However, spatial resolution degradation remained unresolved, and it can negatively affect the performance of neural networks. More importantly, when multiple conditions co-exist, such as bruises, diseases, varying maturity, and dirt masking, it is particularly important to prevent extraneous conditions from influencing the results. Existing experiments have rarely considered such co-existing conditions; they are usually designed with only one or two variables, a situation that seldom occurs in reality.
The main objective of this study is to fuse RGB and NIR information into a single, more visually interpretable image and to evaluate strawberry ripeness and surface quality using both spectral and spatial information. To this end, we propose a fusion method for RGB and NIR images. First, filters were used to separate the high-frequency information of the RGB and NIR images from their low-frequency content, and a pretrained VGG-19 network was used to process the high-frequency parts. Next, high-frequency features were extracted from the activations of different ReLU layers, and the pixel-wise fusion results derived from these different features were averaged to merge the high-frequency information into a single image. Finally, the fused high- and low-frequency images were combined. The resulting fused image contains both spatially resolved and spectral information, which improved the accuracy of subsequent detection.
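The fusion pipeline described above (frequency split, separate low- and high-frequency fusion, recombination) can be illustrated with a minimal two-scale sketch. It is not the authors' implementation. Assumptions: single-channel inputs of equal size that are already co-registered (e.g. RGB luminance and one NIR band), a box filter with a hypothetical radius, and a hypothetical `w_nir` weight for the low-frequency average. Most importantly, the paper derives the high-frequency weights from pretrained VGG-19 ReLU activations; here a hand-crafted local-energy map stands in for those features so the example runs without network weights.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2r+1)x(2r+1) window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def two_scale_fuse(rgb_luma, nir, w_nir=0.5, radius=7, eps=1e-8):
    """Fuse two registered single-channel images (hypothetical two-scale scheme)."""
    # 1) Decomposition: low = blurred image, high = residual detail.
    low_rgb, low_nir = box_blur(rgb_luma, radius), box_blur(nir, radius)
    high_rgb, high_nir = rgb_luma - low_rgb, nir - low_nir
    # 2) Low-frequency fusion: weighted average; w_nir sets the NIR share.
    low = (1.0 - w_nir) * low_rgb + w_nir * low_nir
    # 3) High-frequency fusion with per-pixel weights from an activity map.
    #    STAND-IN: local mean of |detail| replaces the VGG-19 ReLU feature activity.
    act_rgb = box_blur(np.abs(high_rgb), 1)
    act_nir = box_blur(np.abs(high_nir), 1)
    w_map = act_rgb / (act_rgb + act_nir + eps)
    high = w_map * high_rgb + (1.0 - w_map) * high_nir
    # 4) Recombine the fused base and detail layers.
    return low + high


rng = np.random.default_rng(2)
rgb_luma = rng.random((64, 64))  # placeholder images, not real strawberry data
nir = rng.random((64, 64))
fused = two_scale_fuse(rgb_luma, nir, w_nir=0.6)
print(fused.shape)  # → (64, 64)
```

In a VGG-19-based variant, one would instead compute an activity map from each selected ReLU layer (for example, the L1 norm across feature channels, resized to the image size), fuse once per layer, and average the per-layer results. The `w_nir` parameter shows one simple way to shift the balance between the two sources.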
4. Conclusions
Strawberries have soft skin that is easily damaged or stained, which reduces their commercial value. Many researchers have detected strawberry ripeness and defects from RGB or NIR images, but these approaches face several limitations: in RGB images, it is difficult to detect defects beneath the skin and contamination whose color resembles that of the fruit; NIR images can capture differences in the reflectance of defects and contamination, but their spatial resolution is severely limited. To combine the ability of NIR images to detect changes beneath the skin with the high spatial resolution of RGB images for non-destructive strawberry detection, we fused the two images with a neural network based on the pretrained VGG-19. By processing the low- and high-frequency parts of the images separately, the method preserves more of the detail in the fused image. The fused images not only retained the high spatial resolution of the RGB images but also contained richer spectral information. This approach also avoids the regional limitations of the ROI-based mean relative reflectance method, making the detection results more objective. Compared with current mainstream algorithms, the proposed fusion method achieved the best structural similarity and information content with the least introduced noise, and its performance was 5.42% and 3.61% higher than that of the RGB-only and NIR-only groups, respectively.
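Structural similarity and information content are commonly quantified in the image fusion literature by SSIM and by the Shannon entropy of the gray-level histogram. This section does not give the exact formulas or settings used, so the sketch below assumes the standard definitions: entropy in bits over 256 gray levels, and a simplified single-window (global) SSIM with the usual constants K1 = 0.01 and K2 = 0.03 instead of the sliding-window version.

```python
import numpy as np


def shannon_entropy(img_u8):
    """Entropy, in bits, of the 8-bit gray-level histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())


def global_ssim(x, y, data_range=255.0):
    """SSIM computed over the whole image as one window (simplification of standard SSIM)."""
    x = x.astype(float)
    y = y.astype(float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


# Two equally frequent gray levels carry exactly one bit per pixel.
half = np.zeros((4, 4), dtype=np.uint8)
half[:, 2:] = 255
print(shannon_entropy(half))  # → 1.0
```

For fusion evaluation, SSIM is usually computed between the fused image and each source image, and higher entropy indicates that the fused image carries more gray-level information.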
In addition, to accommodate multiple detection targets and tasks, the weights of the RGB and NIR parts in the proposed model can be adjusted. Because different detection targets make each information source more or less informative, the specific weights must be determined for each application; the proposed method makes this straightforward, which greatly facilitates multi-task detection. However, because of hardware limitations, the computational speed still needs to be improved when many samples are processed simultaneously. We are optimistic that faster computing platforms and better optimization algorithms will be developed, enabling real-time detection and allowing the model to assess multiple fruit varieties and characteristics simultaneously.