1. Introduction and Related Work
Mineral nutrients play an important role in the growth of plants such as tomatoes. The essential mineral nutrients fall into two groups: macronutrients and micronutrients. The macronutrients include Calcium, Potassium, Nitrogen, Magnesium, Sulfur, and Phosphorus. The micronutrients include Boron, Iron, Manganese, Copper, Chlorine, Zinc, and Molybdenum. A deficiency of any of these nutrients affects the growth, yield, and quality of tomato plants and crops. In this paper, we focus on collecting a dataset and on diagnosing and predicting the symptoms of nutrient deficiencies that affect the tomato fruit. For example, deficiencies in the three main mineral elements manifest as blossom end rot (BER) for Calcium, green/yellow shoulder and blotchy ripening for Potassium, and pale green, uniformly yellow (chlorotic) leaves for Nitrogen. Symptoms of Calcium, Potassium, and Nitrogen deficiency appear early in the development of tomato leaves and fruits, and these three nutrients directly affect the growth and quality of the tomato harvest [1,2,3,4]. Moreover, Camille Bénard et al. [5] analyzed the effects of Nitrogen deprivation over periods of 10 or 19 days and found that the Nitrogen-deprived tomatoes contained higher levels of chlorogenic acid and rutin. In general, symptoms of Calcium, Potassium, and Nitrogen deficiency become visible as the tomato fruit grows. Recognizing these symptoms in tomato fruits and leaves could therefore improve our ability to distinguish and predict deficiencies during farming.
Recently, artificial intelligence (AI) has been widely applied in fields such as industry, agriculture, and biology. In particular, deep learning (DL) is increasingly applied to speech recognition, visual object recognition, object detection, and similar tasks. LeCun et al. noted that deep convolutional nets have brought about breakthroughs in processing images, video, and speech [6]. In agriculture, DL models can be applied to forecasting and classifying plant diseases, crop management, weed detection, farmland management, and so on, as described in [7]. From a technical perspective, Kamilaris et al. surveyed 40 studies that used DL models in agricultural applications, covering several architectures such as popular CNNs (e.g., AlexNet, VGG-16, and Inception-ResNet), the Long Short-Term Memory (LSTM) model, the Differential Recurrent Neural Network (DRNN) model, and Support Vector Machines (SVM) [8].
Recent studies have applied artificial neural networks to visible range images or digital image processing to diagnose and predict symptoms of disease and nutrient deficiency in plants [9]. Izabela A. Samborska et al. [10] reviewed applications of artificial neural networks (ANNs) in agriculture under a variety of conditions. For identifying plant diseases from visible range images, Barbedo [11] discussed a number of intrinsic and extrinsic factors that complicate identification, including image background, image capture conditions, and symptom segmentation.
Recent studies using deep learning have focused on the identification and pathological analysis of plants. Ferentinos used several CNN-based models to identify leaf diseases in 29 crops from a database of 87,848 images and achieved a maximum success rate of 99.53% [12]. Artzai Picon et al. [13] used a neural network model to classify plant diseases, including those of tomato plants, from images taken with mobile devices. In [14], Mads Dyrmann et al. built a convolutional neural network from scratch and trained and tested it on a total of 10,413 images of 22 species of weeds and crops, achieving a classification accuracy of 86.2%. Srdjan Sladojevic et al. [15] developed a model that recognizes 13 different types of plant disease. Using Caffe, a deep learning framework, to train their CNN, they achieved a precision of between 91% and 98% in separate class tests. Alvaro Fuentes et al. [16] proposed a robust DL-based detector for real-time recognition of diseases and pests in tomato plants. The authors considered three families of detectors: Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN), and Single Shot Multibox Detector (SSD). Combining these detectors with feature extractors such as Visual Geometry Group (VGG-16), ResNet-50, ResNet-101, and ResNet-152, the authors compared the prediction results for nine different types of disease and pest.
Anna Chlingaryan et al. surveyed machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture, covering various technologies and ML techniques such as back-propagation neural networks, combinations of CNN or LSTM with Gaussian Processes, M5-Prime Regression Trees, and Least Squares Support Vector Machines [17]. H. Noh et al. estimated crop nitrogen status during side-dressing operations [18]. The authors used a neural network model to evaluate maize leaves from the reflectance of the maize canopy sensed in three channels (green, red, and near-infrared (NIR)) by a multispectral charge-coupled device (CCD) camera. Using aerial images, R. K. Gautam et al. applied two neural network architectures, a multilayer perceptron and a radial basis function network, to predict leaf nitrogen content in corn plants under field conditions [19]. The authors achieved a root mean square error of prediction (RMSEP) of 6.6% and a minimum prediction accuracy (MPA) of 88.8% for leaf nitrogen content. In [20], Guili Xu et al. described an approach that identifies nitrogen and potassium deficiency in tomatoes with an accuracy of 82.5%, using a Genetic Algorithm (GA) based on the color and texture features of leaves.
On the other hand, M.A. Vazquez-Cruz et al. [21] developed an ANN model prototype to estimate the leaf area of tomato plants under varying climate conditions and salicylic acid applications. In addition, Z. Hanxu et al. [22] reported a study based on a Support Vector Machine (SVM) that improved the methodology by using near-infrared spectra of the tomato leaves to classify anti-nematode and normal tomatoes. C.D. Jones et al. estimated leaf nitrogen status with a reflectance spot sensor and multispectral imaging, using Bayes' theorem for the prediction model, which exhibited a root mean square difference (RMSD) of about 4.9% and a coefficient of determination of 0.82 [23]. Furthermore, S. B. Sulistyo et al. [24] proposed a novel computational intelligence approach to vision sensing for estimating nutrient content in wheat leaves. To evaluate the nutrient content, the authors applied deep sparse extreme learning machines (DSELM) and a genetic algorithm (GA) to normalize plant images and reduce color variability caused by varying sunlight intensity. The mean absolute percentage error (MAPE) of the GA-based committee machine with four DSELMs was approximately 2.78%.
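A committee machine of this kind combines the outputs of several member models, and a GA can search for the combination weights. The following is a minimal NumPy sketch of that general idea, not the method of [24]: it assumes the committee output is a weighted average of member predictions with non-negative weights summing to one, a GA with truncation selection, blend crossover, and Gaussian mutation, and MAPE as the fitness. The function names and GA settings are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def combine(weights, member_preds):
    """Committee output: weighted average of member predictions.

    member_preds has shape (n_members, n_samples); weights are made
    non-negative and normalized to sum to one.
    """
    w = np.abs(weights)
    return (w / w.sum()) @ member_preds

def ga_committee_weights(member_preds, y_true, pop_size=30, generations=100,
                         mutation_scale=0.1, seed=0):
    """Search committee weights with a simple GA that minimizes MAPE (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n_members = member_preds.shape[0]
    # Start from one-hot vectors (each member alone) plus random weight vectors.
    pop = np.vstack([np.eye(n_members),
                     rng.random((pop_size - n_members, n_members))])

    def fitness(population):
        return np.array([mape(y_true, combine(w, member_preds)) for w in population])

    for _ in range(generations):
        order = np.argsort(fitness(pop))       # lower MAPE is better
        parents = pop[order[: pop_size // 2]]  # truncation selection; parents survive (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            alpha = rng.random()
            child = alpha * a + (1.0 - alpha) * b                       # blend crossover
            child = child + rng.normal(0.0, mutation_scale, n_members)  # Gaussian mutation
            children.append(np.abs(child) + 1e-12)                      # keep weights positive
        pop = np.vstack([parents, np.array(children)])
    best = pop[np.argmin(fitness(pop))]
    return best / best.sum()
```

Seeding the population with one-hot weight vectors and carrying the best half into each generation means the returned committee is never worse, on the data used for the search, than its best single member; a held-out set would still be needed to check that any gain generalizes.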
S. B. Sulistyo et al. also proposed a method that fuses deep learning multilayer perceptrons (DL-MLPs) by means of committee machines to achieve color normalization and image segmentation. The authors used a genetic algorithm (GA) to optimize the system architecture for high color-normalization and nitrogen-estimation performance. They also stressed that they built a robust statistical model for prediction rather than for image recognition. To estimate the nitrogen content, the authors used several standard MLPs operating on the three color channels (RGB) of the images [25].
S. B. Sulistyo et al. [26] also demonstrated a low-cost, simple, and accurate approach to image-based estimation of nitrogen content. The authors proposed combining neural networks in a committee machine that evaluates input images using twelve statistical RGB color features. They applied a neural network (NN) to distinguish the wheat leaves, and then combined a committee machine and a GA with the NN to estimate the nitrogen content. As in the previous studies, the authors used an NN-based method, albeit a different one, to extract basic colors for estimating nitrogen content. They also compared several methods (e.g., SPAD, univariate linear regression, multivariate linear regression, ANN 30, ANN 35, and committee machines) in terms of nitrogen prediction performance, using error indexes such as MAPE, MAE, RMSE, MSE, and SSE [27]. The committee machine achieved better results than the other methods on these indexes, with a MAPE of 3.15%, an MAE of 0.088, an RMSE of 0.125, an MSE of 0.016, and an SSE of 0.565.
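The error indexes listed above have standard definitions. A minimal NumPy sketch, assuming MAPE is reported in percent and that no measured value is zero; the function name is hypothetical:

```python
import numpy as np

def error_indexes(y_true, y_pred):
    """MAPE (%), MAE, RMSE, MSE and SSE between measured and predicted values.

    MAPE assumes that no measured value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    sse = float(np.sum(err ** 2))  # sum of squared errors
    mse = sse / err.size           # mean squared error
    return {
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(mse)),  # square root of MSE
        "MSE": mse,
        "SSE": sse,
    }
```

Because RMSE is the square root of MSE, the reported committee-machine values are mutually consistent: 0.125² = 0.015625 ≈ 0.016. SSE also grows with the number of samples, so it is only comparable across methods evaluated on the same test set.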
As mentioned above, most studies have focused on distinguishing and predicting certain kinds of disease in plants or crops, such as the effects of nutrient-deficiency symptoms on the growth of tomato plants and crops. Furthermore, previous approaches have not used specific deep neural network models to forecast deficiencies and to evaluate model performance under real conditions. Therefore, the primary goal of this paper is to apply two CNN-based deep neural networks, Inception-ResNet v2 and Autoencoder, to recognize and predict deficiencies in the essential mineral nutrients Calcium, Potassium, and Nitrogen, and to evaluate their effectiveness under real, natural conditions. To collect a dataset for training and validation, we captured images of normal and nutrient-deficient tomato fruits.
Accordingly, Inception-ResNet v2 and Autoencoder were used to train on, recognize, and predict the nutrient status of tomato plants from the images captured in this study. We modified the CNN models to increase the identification and prediction accuracy of the AI system. In contrast to previous studies, we focused on analyzing, evaluating, and predicting nutrient deficiency status across the growth stages of the tomato plant. We aim to improve the predictive performance of DL models in order to achieve high production yields and prevent tomato pathologies caused by a lack of nutrients. Using the modified structures of Inception-ResNet v2 and Autoencoder to predict nutrient deficiency, we achieved an accuracy of 87.27% for Inception-ResNet v2 and 79.09% for Autoencoder; the corresponding error rates were 12.73% and 20.91%, respectively. Moreover, we combined the two models using the Ensemble Averaging method to improve predictive precision, reaching an accuracy of 91%.
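The error rates quoted above equal 100% minus the corresponding accuracy (100 - 87.27 = 12.73; 100 - 79.09 = 20.91), which is the definition of the top-1 error. For reference, a top-k error counts a prediction as correct whenever the true class is among the k highest-scoring classes, so with k or fewer classes it is zero by construction. A minimal NumPy sketch, assuming each model outputs one score per class; the function names are hypothetical:

```python
import numpy as np

def top_k_error(probs, labels, k=1):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes.

    probs: (n_samples, n_classes) scores; labels: (n_samples,) integer class indices.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    top_k = np.argsort(probs, axis=1)[:, -k:]          # indices of the k largest scores
    hit = np.any(top_k == labels[:, None], axis=1)     # true label among them?
    return 1.0 - float(np.mean(hit))

def accuracy(probs, labels):
    """Share of samples whose highest-scoring class equals the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.asarray(labels)))
```

For scores without ties, `top_k_error(probs, labels, 1)` equals `1 - accuracy(probs, labels)`, which is why a single-label error rate is simply the complement of the reported accuracy.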
This paper is structured as follows. Section 2 presents the proposed system and the collection of the dataset with Calcium, Potassium, and Nitrogen deficiencies. Section 3 describes how the CNN-based Inception-ResNet v2 and Autoencoder models are built and how Ensemble Averaging is used to predict and distinguish mineral nutrient deficiencies from the captured images of tomato fruits and leaves. Section 4 presents the forecasting results. Section 5 compares and evaluates the predictive performance of each model. Section 6 summarizes the study and highlights the key contributions of this article.
5. Comparison and Validation
Based on the number of images used to validate forecasting ability in Table 1 and the comparison results in Table 3, it is apparent that the performance of the Inception-ResNet v2 model, whose modified structure uses several sub-networks, is higher than that of the Autoencoder model, which has no residual learning in its structure. Considering both model structures described in Section 3.1 and Section 3.2, as well as the results in Table 3, we evaluate the effectiveness of forecasting under the same conditions.
Overall (see Table 3), Inception-ResNet v2 forecasts more accurately than Autoencoder in our study. The comparison indexes in Table 3 show that Inception-ResNet v2 predicts and classifies the Calcium and Nitrogen groups more accurately than Autoencoder. In Figure 7a and Figure 9a, for example, Inception-ResNet v2 correctly predicts the expression of Calcium deficiency on the tomato fruit, whereas Autoencoder does not. For Potassium, the accuracy rates of the two models are almost equal. For Nitrogen deficiency, however, Autoencoder makes an incorrect prediction, as presented in Figure 9g. Meanwhile, Inception-ResNet v2 achieved a prediction score of 0.747, only moderately higher than the 0.650 achieved by Autoencoder, as shown in Figure 7h and Figure 9h.
Table 4 shows a validation accuracy of 87.27% for the Inception-ResNet v2 model, compared with only 79.09% for the Autoencoder model under the same settings (training and validation datasets, number of epochs, learning rate, and so on).
Figure 10a,b shows the cost function versus training steps for the two predictive models, using only the training dataset. The cost function of Inception-ResNet v2 approaches zero more quickly than that of Autoencoder, showing that it fits the training data faster, which is consistent with its higher accuracy.
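This section does not name the cost function plotted in Figure 10; for a multi-class classifier it is typically the categorical cross-entropy, which the sketch below assumes. It shows why the curve approaches zero only when the model assigns a probability close to 1 to the true class of each training image, which reflects fit to the training data rather than validation accuracy. The function names are hypothetical:

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean categorical cross-entropy of integer class labels under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    log_p = log_softmax(logits)
    # Pick the log-probability of the true class for each sample.
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))
```

With uniform scores over three classes the loss is ln 3 ≈ 1.099, and it falls toward zero as the true-class logit grows relative to the others.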
Taking a closer look at the two predictive models described in Section 3.1 and Section 3.2, and based on the results in Table 3 and the accuracy rates in Table 4, we used the ensemble averaging method to increase forecasting accuracy for classifying nutritional deficiencies in the leaves and fruits of tomato plants. Table 5 compares the prediction performance of the ensemble with that of both models. Based on the accuracy and error rates of each, the ensemble learning technique is more robust than either of the two models alone (see Table 5): its accuracy of 91% is higher than the 87.27% of Inception-ResNet v2 and the 79.09% of Autoencoder in our study.
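In its common form, ensemble averaging averages the class-probability outputs of the member models and selects the class with the highest mean probability. This section does not state the exact averaging rule, so the sketch below assumes each model outputs softmax probabilities and uses equal weights by default; the optional `weights` argument and the function name are hypothetical:

```python
import numpy as np

def ensemble_average(prob_list, weights=None):
    """Average class-probability outputs of several classifiers.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    Returns the averaged probabilities and the predicted class per sample.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (n_models, n, c)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))  # plain average
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    avg = np.tensordot(weights, stacked, axes=1)  # weighted mean over models -> (n, c)
    return avg, avg.argmax(axis=1)

if __name__ == "__main__":
    # Toy softmax outputs for two samples whose true classes are 0 and 1.
    model_a = np.array([[0.50, 0.40, 0.10], [0.45, 0.40, 0.15]])  # predicts 0, 0
    model_b = np.array([[0.38, 0.42, 0.20], [0.10, 0.80, 0.10]])  # predicts 1, 1
    avg, pred = ensemble_average([model_a, model_b])
    print(pred)  # [0 1]
```

In the toy example each model misclassifies one of the two samples, while the averaged output classifies both correctly; averaging helps in this way only when the members' errors differ.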
To evaluate the effectiveness of the DL models, we use confusion matrices to compare the forecast results for both the training and validation datasets. According to Table 6 and Table 7, the Inception-ResNet v2 model achieves better evaluation indicators than the Autoencoder model on the training dataset, which we collected under greenhouse conditions.
For the 461 images captured for the training dataset, Autoencoder achieves an average accuracy of 95.87% during training. Compared with the corresponding value for Inception-ResNet v2 in Table 6, this result shows that Inception-ResNet v2 performs better than Autoencoder in this study. Table 7 compares the observed classes with the classification results of Autoencoder on the training dataset.
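The confusion matrices in this section cross-tabulate observed classes against predicted classes. The sketch below assumes rows are observed and columns predicted, and uses a hypothetical class encoding (0 = Calcium, 1 = Potassium, 2 = Nitrogen). Since this section does not say whether "average accuracy" means the overall accuracy or the mean of the per-class accuracies, both can be derived from the matrix:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = observed class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (observed, predicted) pair
    return cm

def overall_accuracy(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    return cm.trace() / cm.sum()

def per_class_accuracy(cm):
    """Share of each observed class predicted correctly (row-normalized diagonal, i.e., recall).

    Assumes every class occurs at least once among the observed labels.
    """
    return np.diag(cm) / cm.sum(axis=1)
```

For example, with observed labels `[0, 0, 0, 1, 1, 2, 2, 2]` and predictions `[0, 0, 1, 1, 1, 2, 0, 2]`, the overall accuracy is 6/8 = 0.75, while the per-class accuracies are 2/3, 1, and 2/3, whose mean is 7/9 ≈ 0.78; the two summaries diverge whenever classes differ in size or difficulty.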
Table 8 and Table 9 show the results of verifying both models on the validation dataset. Inception-ResNet v2 is more accurate than Autoencoder in classifying all three observed nutrients. In both models, however, Nitrogen deficiency is forecast and classified less accurately than the other deficiencies, at 78.26% for Inception-ResNet v2 and 69.56% for Autoencoder. This partly explains the poor predictions for the randomly tested images of Nitrogen shortage in Figure 7g,h and Figure 9g,h. In addition, according to Table 9, the forecasting accuracy for Calcium in the Autoencoder model is only 80.32%, which is consistent with its incorrect result in Figure 9a (see Table 3).