Article

Crop Growth Monitoring System in Vertical Farms Based on Region-of-Interest Prediction

Yujin Hwang, Seunghyeon Lee, Taejoo Kim, Kyeonghoon Baik and Yukyung Choi
1 Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Korea
2 N.Thing Corporation, Seoul 06020, Korea
* Author to whom correspondence should be addressed.
Agriculture 2022, 12(5), 656; https://doi.org/10.3390/agriculture12050656
Submission received: 20 March 2022 / Revised: 14 April 2022 / Accepted: 27 April 2022 / Published: 30 April 2022
(This article belongs to the Special Issue Remote-Sensing-Based Technologies for Crop Monitoring)

Abstract: Vertical farms are considered the future of agriculture, given that they not only use space and resources efficiently but can also consistently produce large yields. Recently, artificial intelligence has been introduced in vertical farms to boost crop yields, and crop growth monitoring is an essential example of the automation needed to manage a vertical farm system. Region-of-interest prediction is generally used to locate crop regions in the color images captured by a camera for growth monitoring. However, most deep learning-based prediction approaches suffer performance degradation when crop densities are high or when different types of crops are grown together. To address this problem, we introduce a novel method, termed pseudo crop mixing, a model training strategy that targets vertical farms. The proposed method achieves high performance with only a small amount of labeled crop data. This is particularly advantageous for crops with a long growth period, and it also reduces the cost of constructing a dataset that must be frequently updated to support the various crops in existing systems. Additionally, the proposed method demonstrates robustness to new data that were not introduced during the learning process. This advantage suits vertical farms that can be efficiently installed and operated in a variety of environments, and because no transfer learning is required, the construction time for container-type vertical farms can be reduced. In experiments, we show that the proposed model achieved an mAP of 76.9% on a dataset obtained from a container-type indoor vertical farm, 12.5 percentage points better than the existing method. Our code and dataset will be made publicly available.

1. Introduction

A vertical farm [1,2] is a multi-layered structure in which crops are stacked and cultivated, maximizing crop output per unit area while using the minimum resources necessary through a hydroponic system. Additionally, by utilizing artificial light and temperature control systems, a vertical farm can maintain a consistent climate. As a result, this system is regarded as the future of agriculture due to its ability to reduce environmental pollution through resource saving while reliably producing large quantities of crops. To ensure crop yields in a vertical farm system, it is also important to maintain an environment conducive to crop growth via the continuous monitoring of farm conditions. Most current farm monitoring systems [3] are designed to maintain an environment consistent with established practices for growing crops; such methods are based on cultivation knowledge rather than on the status of the crops themselves. However, crop states are critical for determining the optimal cultivation environment, and by continuously monitoring crop status, a properly crop-optimized farm environment can feasibly be maintained. While some vertical farms inspect crops manually, this is a time-consuming operation that may introduce measurement inaccuracies. As a result, a machine-based monitoring system is necessary to automate monitoring and improve measurement accuracy.
Many studies of artificial intelligence have been conducted to automate the monitoring process. Specifically, image-based monitoring systems are used because images are simple to gather and contain a variety of information about crops, such as the number of leaves and the presence or absence of blooms. Region-of-interest prediction is an image processing method that locates user-defined high-value areas in input images, and it is used in image-based monitoring systems to extract crop status information. Various factors can be defined as the information of interest; some studies [4,5] treat blooms and fruits as that information. Monitoring crop states through flowers or fruits, however, cannot be accomplished continuously throughout the agricultural cycle, nor can it be applied to leafy vegetables. Other systems [6,7] regard the crop growth information contained in an image as the information of interest. They extract the crop sizes or the number of leaves from these images, which is more suitable for continual monitoring. Accordingly, our monitoring system is designed to predict crop areas from input images. The crop area can be used to provide the system with the crop size and to complete sub-tasks such as recognizing disease spots within the extracted crop areas. To estimate the crop size from an image, our system uses image segmentation, a region-of-interest prediction method in which an image is categorized pixel by pixel. This approach enables the system to determine the crop area in the image. However, training a segmentation model capable of properly predicting the crop area from vertical farm images is challenging. In comparison with conventional agriculture, vertical farming leaves less space between crops and allows for a greater range of crop species. As a result, unlike traditional agriculture, which can rely on simple deep learning models to predict crop areas, the vertical farm requires a model that is robust to complicated scenarios.
One of the reasons why crop area prediction models struggle in the vertical farm context is a lack of training data. To ensure high performance, deep learning models require a significant amount of training data. However, public datasets produced from vertical farms are rare, and collecting new crop images takes time. In addition, even if sufficient image data can be collected, labeling incurs high costs because pixel-level ground-truth class labels must be created. Additionally, the images collected from vertical farms contain an abundance of diverse cases, and it is impossible to collect all the cases that arise during the monitoring process. For example, as shown in Figure 1a, vertical farms have minimal space between crops because they can supply enough nutrients even with this amount of space. They also include relatively simple cases involving, for example, the farming of a single crop, as shown in Figure 1b, as well as those in which several crops are produced, as shown in Figure 1c. In the case shown in Figure 1c, moreover, the crop area can be covered by other crops. In summary, vertical farms allow for the cultivation of a wide range of crop species in various configurations, resulting in an endless number of possible combinations that can be captured. Based on our investigation, we introduce pseudo crop mixing (SC-Mix), which does not depend on the collected data. SC-Mix is designed to increase the diversity of the training data by enabling unlabeled images to be utilized in the model training process and to improve the general performance over a wide variety of inputs by augmenting the training data with simulated images of vertical farms.
Vertical farms can produce crops efficiently regardless of the weather or location, and they are gaining popularity in both professional and home agriculture. The proposed system automates crop growth monitoring, allowing for accurate crop condition evaluations and reducing the need for human intervention. Our solution is expected to increase the productivity of vertical farms through more sophisticated supervision and to reduce farm operating expenses through automated monitoring.

2. Related Work

2.1. Region-of-Interest Prediction

A region-of-interest prediction is a typical method of image processing that identifies the regions desired by the users in images. Many computer vision tasks belong to this method, including object detection, which detects regions of interest in the form of bounding boxes, and image segmentation, which detects areas at the pixel level. Image segmentation, which can provide detailed information about the area of interest, is separated into two approaches: semantic segmentation [8,9,10], which classifies pixels, and instance segmentation [11,12,13,14,15], which also classifies pixels while distinguishing individual objects. Because instance segmentation can identify individual objects, it has frequently been utilized in agriculture to distinguish fruits, flowers, and crops in an image. The following examples demonstrate how agriculture can use instance segmentation technology. Widiyanto et al. [4] determined the location of tomatoes; Xu et al. [6] measured the health status of crops by identifying the number of leaves; Tian et al. [5] detected flowers and fruitlets to assess thinning; and Champ et al. [16] distinguished plants from weeds. Accordingly, our system tracks the growth stage of crops using the instance segmentation approach.
Instance segmentation is further subdivided into proposal-based approaches [12,13] for creating instance-level prediction masks based on detection results and proposal-free approaches [14,15] for combining the categorized result of the pixel level into instance masks. Proposal-based approaches require fewer computational resources than proposal-free approaches, which require classification operations on each pixel in the image. To reduce the computation costs and increase the object differentiation accuracy, the Mask R-CNN [11] model, a proposal-based approach, is chosen as the crop area prediction model for our system. This enables our system to distinguish between the monitoring crops and the surrounding crops, making growth state monitoring more efficient and precise.

2.2. Semi-Supervised Learning

Supervised learning and unsupervised learning are two typical deep learning methods. Supervised learning is a well-known deep learning approach that trains a model to predict desired labels from labeled data [17]. However, supervised learning suffers from data scarcity due to the high expense of annotating data. On the other hand, unsupervised learning trains the model to recognize relationships between inputs using unlabeled data [18]. Unsupervised learning is usually less accurate than supervised learning. Therefore, applying unsupervised learning in the field is challenging. Semi-supervised learning combines the two approaches in order to decrease the labeling costs while allowing for the use of unlabeled data in model training [19].
Pseudo-labels [20], sometimes referred to as self-training, are a type of semi-supervised learning in which the initialized model labels unlabeled images in order to utilize them during the training process. The model can reinforce and expand existing knowledge by training on the pseudo-labeled dataset processed using the knowledge of the previous model. This strategy is becoming more popular since it allows the performance of the model to improve over time. However, existing pseudo-labeling strategies [21] are based on the assumption that there will be enough unlabeled data to cover all possible cases. They are not effective when collecting image data is itself challenging and capturing images of all cases is impossible. To address this problem, our approach for converting unlabeled images into training data does not rely merely on pseudo-labeling but depends on extra processes.

2.3. Data Augmentation

Data augmentation is a technique for increasing the volume of the training data in a variety of ways in order to prevent deep learning models from overfitting the training data. This approach helps deep learning models generalize to new, unseen data [22]. Data augmentation techniques include geometric transformation, color space transformation, random erasing, and mixing images. Geometric transformations reduce positional biases by increasing the amount of training data generated by flipping, cropping, and rotation. Color space transformations resolve color and lighting biases. Random erasing augments the training data with an additional image in which a part of the original image is removed, allowing models to perceive diverse portions of the target of interest rather than exclusively focusing on a single portion. Mixing images is a method of generating new data by combining multiple images. Unlike traditional methods that increase the variety of the training data by adding noise, mixing images increases the diversity of the training data by directly mixing the components of different images. In this paper, we present a novel data augmentation technique, inspired by mixing images, that generates synthetic data simulating vertical farm images by combining instances from multiple images.
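As a rough illustration of the first three augmentation families, a typical pipeline can be composed from off-the-shelf torchvision transforms; this is a minimal sketch, and the specific magnitudes (rotation degrees, jitter strengths, probabilities) are illustrative assumptions rather than settings from this paper.

```python
# Sketch of common augmentation families using torchvision; parameter
# values are assumed for illustration only.
import torchvision.transforms as T

basic_augmentations = T.Compose([
    T.RandomHorizontalFlip(p=0.5),            # geometric: flipping
    T.RandomRotation(degrees=15),             # geometric: rotation
    T.ColorJitter(brightness=0.3, hue=0.05),  # color space transformation
    T.ToTensor(),                             # PIL image -> tensor
    T.RandomErasing(p=0.5),                   # random erasing (on tensors)
])
```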
The following are representative examples of image mixing research. Contextual Copy-Paste [23] pastes objects whilst manually considering the surrounding context. InstaBoost [24] inserts objects using a heatmap indicating the appearance consistency of objects in images. Cut, Paste and Learn [25] prepares the scene and object images separately and naively pastes the objects without considering the context information. Simple Copy-Paste [26] mixes object instances from one image into another. These methods increase the diversity of the training data in order to avoid the general overfitting issue, but the diversity they can add is limited because each mixed instance must be extracted from an image using its ground-truth label. SC-Mix overcomes this diversity constraint by augmenting the training dataset using pseudo-labels.

3. Materials and Methods

The proposed monitoring system is composed of two phases, as presented in Figure 2. The first phase involves the collection of crop images; the second is crop growth monitoring. Crop growth status monitoring is accomplished by predicting the area of the crops in the acquired images to quantify their sizes. To improve the accuracy of crop segmentation, we propose pseudo crop mixing (SC-Mix), a model training method optimized for vertical farms. Our approach is detailed in the following sections.

3.1. Data Acquisition

The images are obtained using a camera module on a Raspberry Pi. As shown in Figure 3, the camera is attached to the layer above the monitored crop, and the distance between the camera and the grow tray is 258 mm. To allow continuous monitoring, images are captured at regular intervals. Our system sets this interval to 30 min, but it can be adjusted by the user. Each acquired image has a resolution of 3280 × 2464 pixels and is saved in the .png format. The photographed image is transmitted from the board to the server through the secure copy protocol (SCP), together with a timestamp indicating the date and time of the capture. The crop area of the images is calculated on the server. If the owner desires, the model performance can be boosted by incorporating this group of images into the unlabeled pool.
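A minimal sketch of this acquisition loop is given below, assuming the standard picamera library and a subprocess call to scp; the server address and file paths are hypothetical, while the 30 min interval, the 3280 × 2464 PNG output, and the SCP transfer follow the description above.

```python
# Sketch of the capture-and-transfer loop; SERVER and filenames are
# assumptions for illustration.
import subprocess
import time
from datetime import datetime

from picamera import PiCamera  # Raspberry Pi camera module

CAPTURE_INTERVAL_S = 30 * 60  # 30 min, user-adjustable
SERVER = "user@monitoring-server:/data/vertical_farm/"  # hypothetical

camera = PiCamera(resolution=(3280, 2464))

while True:
    # Timestamped filename records the date and time of the capture.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"crop_{stamp}.png"
    camera.capture(filename)
    # Transfer the image to the server over the secure copy protocol.
    subprocess.run(["scp", filename, SERVER], check=True)
    time.sleep(CAPTURE_INTERVAL_S)
```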

3.2. Pseudo Crop Mixing

Pseudo crop mixing (SC-Mix) utilizes the knowledge of the model to perform pseudo-labeling on unlabeled images, convert them into trainable data, and increase the diversity of the training data. SC-Mix improves performance by repeating the training process. First, the pseudo-crop generation (PCG) module extracts a pseudo-crop, which is a crop instance, from an unlabeled image using an initialized model trained on a limited amount of labeled data. Subsequently, the synthetic-image generation (SIG) module creates additional training data that simulate the images captured at vertical farms using the pseudo-crops, and these data are added to the training dataset. Then, the model is trained on this augmented training dataset. This entire process constitutes a single cycle, as illustrated in Figure 4. Because the size of the unlabeled dataset can theoretically grow endlessly, the training procedure can be repeated indefinitely. To conserve computational resources, we restricted the number of training cycles and unlabeled images.

3.2.1. Pseudo-Crop Generation

The pseudo-crop generation (PCG) module, inspired by self-training, extracts pseudo-crops from unlabeled images in order to convert unusable data into trainable instances. A pseudo-crop is the region of interest predicted by the crop segmentation model, and it is created in the following order. First, the crop segmentation model is trained using a small amount of labeled data. The trained model is then used to predict the crop area, which corresponds to the area of interest, from unlabeled images. Finally, the predicted area of the monitored crop is separated from the image; this is the pseudo-crop. Because the accuracy of pseudo-labeling depends on the performance of the model, the pseudo-crop becomes progressively more similar to the actual crop area in the image as the SC-Mix learning cycles advance.
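The PCG step can be sketched as follows, assuming a torchvision-style Mask R-CNN whose output dictionary contains confidence scores and soft instance masks; the 0.7 confidence threshold and the choice of the single most confident instance are illustrative assumptions, not values from the paper.

```python
# Sketch of pseudo-crop extraction from one unlabeled image.
import torch

@torch.no_grad()
def extract_pseudo_crop(model, image, score_thresh=0.7):
    """image: float tensor (3, H, W). Returns (crop pixels, binary mask)
    for the most confident predicted instance, or None."""
    model.eval()
    pred = model([image])[0]  # torchvision-style output dict
    if len(pred["scores"]) == 0 or pred["scores"].max() < score_thresh:
        return None
    best = pred["scores"].argmax()
    mask = pred["masks"][best, 0] > 0.5  # soft mask -> binary pixel mask
    crop = image * mask                  # keep only the crop pixels
    return crop, mask
```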

3.2.2. Synthetic-Image Generation

The synthetic-image generation (SIG) module is inspired by image mixing and creates new images using an instance-paste approach. At this stage, cases are synthesized so that the data correspond to the various scenarios occurring on vertical farms. A synthetic image is generated using the following approach. First, a randomly selected pseudo-crop is positioned in the center of the growth tray image shown in Figure 4. The pseudo-crop in the center is the monitored target and is attached to the center of the image with a small random offset. Then, with a probability of 0.5 at each location, other pseudo-crops are placed at the top, bottom, left, and right of the image to imitate surrounding crops. Before pasting, all pseudo-crops are randomly resized by a factor α. Additionally, with a probability of 0.5, flipping and photometric distortion are applied to portray diverse lighting conditions and environmental variations.
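The SIG step can be sketched as below, assuming pseudo-crops stored as (RGB array, binary mask) NumPy pairs; the offset range and helper structure are assumptions, while the center placement, the 0.5 paste probability for the four neighbors, the α resizing in [0.1, 0.8], and the random flip follow the description above.

```python
# Sketch of synthetic-image generation via instance pasting.
import random
import cv2
import numpy as np

def paste(canvas, crop, mask, cx, cy, alpha):
    """Rescale a pseudo-crop by alpha and paste it centered at (cx, cy)."""
    crop = cv2.resize(crop, None, fx=alpha, fy=alpha)
    mask = cv2.resize(mask.astype(np.uint8), None, fx=alpha, fy=alpha) > 0
    h, w = mask.shape
    H, W = canvas.shape[:2]
    y0, x0 = cy - h // 2, cx - w // 2
    ys, xs = max(y0, 0), max(x0, 0)            # clip to canvas borders
    ye, xe = min(y0 + h, H), min(x0 + w, W)
    if ye <= ys or xe <= xs:
        return
    m = mask[ys - y0:ye - y0, xs - x0:xe - x0]
    canvas[ys:ye, xs:xe][m] = crop[ys - y0:ye - y0, xs - x0:xe - x0][m]

def synthesize(tray_img, pseudo_crops, offset=30):
    """Compose one synthetic vertical-farm image on a tray background."""
    out = tray_img.copy()
    H, W = out.shape[:2]
    # Monitored target: pasted at the center with a small random offset.
    cx = W // 2 + random.randint(-offset, offset)
    cy = H // 2 + random.randint(-offset, offset)
    paste(out, *random.choice(pseudo_crops), cx, cy, random.uniform(0.1, 0.8))
    # Surrounding crops at top/bottom/left/right, each with probability 0.5.
    for nx, ny in [(W // 2, 0), (W // 2, H), (0, H // 2), (W, H // 2)]:
        if random.random() < 0.5:
            paste(out, *random.choice(pseudo_crops), nx, ny,
                  random.uniform(0.1, 0.8))
    if random.random() < 0.5:
        out = cv2.flip(out, 1)  # horizontal flip
    return out
```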

3.2.3. Training Strategy

To improve the initialized model learned on a small amount of labeled data, the training dataset is augmented with the synthesized data. Model training is repeated for three cycles with 60 epochs per cycle. The changes in the training loss and mAP performance during the learning process are shown in Figure 5. Throughout training, the mAP was evaluated on the test data every three epochs. As seen in Figure 5, model training progresses until each cycle converges, and the model performance improves as the learning cycle is repeated using SC-Mix. This experiment is described in Section 4.3.
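The overall schedule can be sketched as a nested loop; the helper callables (pcg, sig, train_one_epoch, evaluate_map) stand in for the modules above and are assumptions, while the three cycles, 60 epochs per cycle, and evaluation every three epochs follow this section.

```python
# Sketch of the SC-Mix training schedule; all helper callables are
# assumed stand-ins for the PCG/SIG modules and a standard training loop.
def run_sc_mix(model, labeled_set, unlabeled_pool, tray_images, test_set,
               pcg, sig, train_one_epoch, evaluate_map,
               num_cycles=3, epochs_per_cycle=60, eval_every=3):
    for cycle in range(num_cycles):
        # PCG: pseudo-label the unlabeled pool with the current model.
        crops = [c for img in unlabeled_pool if (c := pcg(model, img))]
        # SIG: synthesize vertical-farm images and augment the train set.
        train_set = labeled_set + [sig(tray, crops) for tray in tray_images]
        for epoch in range(epochs_per_cycle):
            train_one_epoch(model, train_set)
            if (epoch + 1) % eval_every == 0:  # evaluate every 3 epochs
                print(f"cycle {cycle + 1}, epoch {epoch + 1}: "
                      f"mAP = {evaluate_map(model, test_set):.3f}")
    return model
```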

3.3. Assessment of the Model

The performance of the crop area prediction model is measured by the average recall (AR) and the mean average precision (mAP), metrics generally used to evaluate segmentation models. AR measures how many of the actual crop pixels were recognized and is defined by Equation (1). The true positive value is the number of pixels correctly predicted as crop area; the false negative value is the number of crop-area pixels that were incorrectly missed; the false positive value is the number of pixels incorrectly predicted as crop area; and the true negative value is the number of pixels correctly predicted as non-crop area.

$$AR = \frac{TruePositive}{TruePositive + FalseNegative} \qquad (1)$$

Here, mAP refers to the average of the AP values over all classes and is measured by Equation (2). AP is the average precision of each class, measured according to Equation (3) over ten intersection-over-union (IoU) thresholds: 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95. The $Precision(r)$ value is calculated by Equation (4), and the IoU value, measured by Equation (5), indicates the degree of overlap between the prediction and the ground truth. In Equation (5), the area of intersection is the overlap between the ground truth and the prediction, and the area of union is the combined area they cover.

$$mAP = \frac{1}{N_{class}} \sum_{i}^{N_{class}} AP_i \qquad (2)$$

$$AP = \frac{1}{10} \sum_{r = 0.5, 0.55, \ldots, 0.95} Precision(r) \qquad (3)$$

$$Precision(r) = \frac{TruePositive}{TruePositive + FalsePositive} \qquad (4)$$

$$IoU = \frac{Area\ of\ Intersection}{Area\ of\ Union} \qquad (5)$$
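For concreteness, the pixel-level quantities behind Equations (1)–(5) can be computed directly from binary masks, as in the sketch below; this illustrates the definitions only and is not the COCO-style evaluator used to produce the reported mAP values.

```python
# Sketch of pixel-level recall, precision, and IoU from binary masks.
import numpy as np

def pixel_metrics(pred, gt):
    """pred, gt: boolean arrays of the same shape (True = crop pixel)."""
    tp = np.logical_and(pred, gt).sum()       # correctly predicted crop
    fp = np.logical_and(pred, ~gt).sum()      # wrongly predicted crop
    fn = np.logical_and(~pred, gt).sum()      # missed crop pixels
    recall = tp / (tp + fn)                   # Equation (1), AR
    precision = tp / (tp + fp)                # Equation (4)
    iou = tp / np.logical_or(pred, gt).sum()  # Equation (5)
    return recall, precision, iou
```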

4. Experiments and Results

We report the performance of the model trained by pseudo crop mixing (SC-Mix) and verify the proposed methods. The datasets used in the experiment were acquired in a vertical farm setting, and their descriptions are given in Section 4.1. In addition, Section 4.2 contains the details of the experiment. Three experiments were conducted. The impact of SC-Mix on the model performance is shown in Section 4.3, Section 4.4 verifies the utility of the pseudo-crop, and Section 4.5 presents the analysis of our image-synthesizing methods.

4.1. Vertical Farm Crops Dataset

The vertical farm crop dataset utilized in our experiments was gathered using the system described in Section 3.1. Three distinct crop species were collected: romaine, butterhead, and Batavian lettuce types, with an average collection duration of seven days for each. There are 600 images in the collection, 318 of which are annotated and 282 of which are not. The MATLAB Image Segmenter was used to annotate the crop areas. The dataset is divided into four parts: C_GT, C_U, C_L, and C_test.
C_GT is a set of 151 labeled samples used for the initial training of the vertical farm crop segmentation model. C_U consists of the 282 unlabeled images that compose the unlabeled pool in SC-Mix. C_L is the set used in the experiment described in Section 4.5 and consists of 278 labeled samples, including C_GT; in Section 4.5, C_L is utilized to create crop instances. C_test includes the remaining 40 labeled samples, i.e., those not in C_L or C_GT, and is used to assess the performance of the model.

4.2. Experimental Detail

In the experiments, Mask R-CNN, a proposal-based model, was used to estimate the crop areas for growth monitoring. The backbone of Mask R-CNN is ResNet50-FPN [27,28]. All of the models used in the experiments were pre-trained on the COCO dataset [29] and fine-tuned with our vertical farm crop dataset. The pseudo-crop rescaling parameter α in the synthetic-image generation (SIG) process was randomly selected from the range [0.1, 0.8]. During the training process, basic data augmentation was applied in all experiments, including flipping, cropping, rotation, and photometric distortion.
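A minimal sketch of this configuration using torchvision is shown below; the head replacement follows the standard torchvision fine-tuning recipe, and the class count and mask-head width are assumptions, not values from the paper.

```python
# Sketch of the Mask R-CNN (ResNet50-FPN, COCO-pre-trained) setup.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_crop_model(num_classes=2):  # background + crop (assumed)
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True)  # COCO-pre-trained backbone and heads
    # Replace the COCO box head with one sized for the crop classes.
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    # Replace the mask head likewise before fine-tuning.
    in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, num_classes)
    return model
```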

4.3. Pseudo Crop Mixing Improves the Segmentation Task on Vertical Farms

We report the performance of the segmentation model when pseudo crop mixing (SC-Mix) is applied to the initialized model trained with C_GT. To demonstrate the ability of SC-Mix to enhance performance, the same C_U is used as the unlabeled pool in each cycle, without adding further unlabeled data. The SC-Mix cycle was repeated three times with 60 epochs each, and the results are reported in Table 1 (SC-Mix^{S,A}). In the first cycle, SC-Mix reaches an mAP of 0.656, which is higher than the initial model performance of 0.644. The second cycle reaches an mAP of 0.741 and the third cycle an mAP of 0.769, which is 0.125 mAP greater than the initial model. These results indicate that the performance increases as cycling progresses.

4.4. Effects of Pseudo-Crop on Model Training

The pseudo-crop is used as the crop instance from which the synthetic-image generation (SIG) module produces synthetic data. However, because prediction errors are included in the pseudo-crop, it can have a negative impact on model training. To determine the effect of these inaccuracies on the model, we conducted a self-training experiment utilizing pseudo-labels, which are the basis for creating pseudo-crops. The procedure for the experiment is as follows. The segmentation model is trained on a small set of labeled data, C_GT. The trained model is applied to the unlabeled data, C_U, for labeling; this predicted label is called a pseudo-label. Finally, the model is retrained on the labeled and unlabeled data, utilizing the ground-truth and pseudo-label annotations, respectively. The entire procedure is performed for three cycles, and the results are reported in Table 1 (self-training^S).
The results indicate that when the cycle is repeated, applying the pseudo-labels progressively improves the performance of the model. This implies that the impact of the pseudo-label information is more positive than negative. Additionally, because the accuracy of pseudo-labeling improves as the model performance improves, the gains grow as the cycling process continues. SC-Mix, on the other hand, exceeds the self-training performance in all cycles. This indicates that SC-Mix is more beneficial for model training than naïve pseudo-labeling.

4.5. Effects of Synthetic-Image Generation on Model Training

The synthetic-image generation (SIG) module generates synthetic images using the growth tray images and crop instances. SC-Mix creates varied crop instances by pseudo-labeling unlabeled data with the model. The crop instances created by this procedure, however, contain model prediction errors. In this experiment, the crop instances were therefore created from ground-truth labels in order to precisely assess the degree of performance improvement achieved by the proposed image synthesis approach. The experiment was conducted in the following manner. C_GT was used to initialize the model, which was then fine-tuned using additional data. For comparison, the performance when using the naive C_L as additional data is compared to that of the SIG module using synthetic data formulated from C_L. The results of each experiment are reported in Table 1 (supervised learning using C_L and labeled synthetics^A). They show that the model trained on the augmented data produced by the SIG module outperforms the model trained on the original data.

5. Discussion

Existing agricultural monitoring systems maintain a stable farm environment through the use of environmental sensors such as temperature and humidity sensors. Because these systems do not take crop status into account during the monitoring process, they cannot easily ensure an appropriate environment. Image-based monitoring systems, which take the crop status into account, are the optimal option for crop growth monitoring.
Image-based monitoring systems mostly use region-of-interest prediction to extract crop information from images. In particular, the crop size is calculated using image segmentation technology, and most segmentation models are trained by supervised methods with labeled data. In prior work [30,31,32,33,34], images were captured in a controlled setting, for example, with a clean background or a single cultivated crop, and were used as input for the model. Therefore, a new model is needed that accurately predicts the crop area on vertical farms, where spacing between crops is narrow and crop species are diverse.
As a solution to the issues above, we proposed the pseudo crop mixing (SC-Mix) method as a means of increasing the performance of the segmentation model on vertical farms. As demonstrated in Table 1, SC-Mix is compared to supervised learning trained with only labeled data (C_GT) and to self-training with both labeled (C_GT) and unlabeled data (C_U). Not only does the proposed method expand the quantity of training data without incurring annotation expenses, but it also improves the model robustness by including extra synthetic images. These results indicate that the proposed method outperforms the other methods.
Some qualitative results pertaining to our vertical farm dataset are displayed in Figure 6 and Figure 7. For this analysis, we report two different cases. Figure 6 shows non-overlapping cases in which the monitored crop is captured in its entirety. In contrast, Figure 7 shows overlapping cases in which the monitored crop is partially covered by a surrounding crop. We compare the qualitative results of SC-Mix with the supervised learning and self-training results. When supervised learning is used to train the model, the results are less accurate than SC-Mix in both the overlapping and non-overlapping situations. Because the supervised learning method uses a small amount of data, the model does not learn the general features of the crop; it is therefore difficult to make accurate predictions in untrained situations with different crop shapes and structures. The self-training approach showed high accuracy, similar to SC-Mix, in the non-overlapping cases. However, in cases where the monitored crop and a surrounding crop overlap, the self-training technique incorrectly predicts the surrounding crop as the monitored crop. This indicates that although self-training can improve the prediction performance for trained scenarios, it cannot account for structural variations not included in the training data. The results demonstrate that SC-Mix is not only robust in untrained situations but also capable of accurately predicting crop areas even when crops overlap.
Additionally, Figure 8 shows the model inference results on a synthetic image simulating a congested vertical farm, demonstrating that our method is capable of reliably predicting the crop area in congested environments. These results show that SC-Mix is more accurate than the other approaches at distinguishing other crops and at predicting the areas of monitored crops. We also provide quantitative results to show that SC-Mix is effective at predicting monitored crops. In Table 2, the model accuracy rates for the monitored crops and the surrounding crops are given separately. According to this table, the SC-Mix approach achieved more than 92% AP on the monitored crop and showed significantly increased prediction accuracy for all crop regions.
Through these experiments, we showed that the model trained using the SC-Mix approach is suitable for a real-world vertical farm situation. The accuracy of monitoring systems can be increased by using the proposed training technique, which allows for accurate supervision regarding farming conditions and reflects the crop status.

6. Conclusions

We developed an image-based crop growth monitoring system for vertical farms in this study. However, as a result of the dense crop arrangement on vertical farms, the images obtained from them contain many regions of interest that compete for the focus of the crop segmentation model. SC-Mix was proposed in order to guarantee that the crop segmentation model in our system performs well even when confronted with such confusing input. SC-Mix outperforms prior model training techniques when the same quantity of labeled data is used. The model trained using SC-Mix accurately predicts the crop area, especially when crops overlap. We conducted experiments on a vertical farm dataset in order to demonstrate the applicability of our system to real-world vertical farms. The model trained using SC-Mix produced an mAP of 76.9% and an AP of 92.6% on the monitored target. All results reflect models trained with fixed data for cross comparison; however, in its real-world monitoring phase, our system will benefit significantly from the extra unlabeled data that becomes available for retraining the model. SC-Mix enables more exact crop growth monitoring and can thus increase crop production on vertical farms.

Author Contributions

Conceptualization, Y.C.; methodology, Y.C., S.L. and T.K.; software, S.L. and T.K.; validation, S.L. and Y.H.; formal analysis, S.L.; investigation, S.L. and T.K.; resources, K.B.; data curation, K.B.; writing—original draft preparation, Y.H., S.L. and T.K.; writing—review and editing, Y.H., S.L. and Y.C.; visualization, Y.H., S.L. and T.K.; supervision, Y.C.; project administration, Y.C.; funding acquisition, K.B. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the Starting growth Technological R&D Program funded by the Ministry of SMEs and Startups (MSS, Korea) (No.S2784117, Development of farm environment mirroring and evaluation system and automatic control platform to advance the cultivation recipe for crop in the indoor vertical farm, 50%) and Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2021-0-00755, Dark data analysis technology for data scale and accuracy improvement, 50%).

Data Availability Statement

The datasets generated for this study are accessible upon request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Al-Kodmany, K. The vertical farm: A review of developments and implications for the vertical city. Buildings 2018, 8, 24.
  2. Klerkx, L.; Jakku, E.; Labarthe, P. A review of social science on digital agriculture, smart farming and agriculture 4.0: New contributions and a future research agenda. NJAS-Wagening. J. Life Sci. 2019, 90, 100315.
  3. Ban, B.; Lee, J.; Ryu, D.; Lee, M.; Eom, T.D. Nutrient solution management system for smart farms and plant factory. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Korea, 21–23 October 2020; pp. 1537–1542.
  4. Widiyanto, S.; Nugroho, D.P.; Daryanto, A.; Yunus, M.; Wardani, D.T. Monitoring the Growth of Tomatoes in Real Time with Deep Learning-Based Image Segmentation. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2021, 12, 353–358.
  5. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Instance segmentation of apple flowers using the improved Mask R-CNN model. Biosyst. Eng. 2020, 193, 264–278.
  6. Xu, L.; Li, Y.; Sun, Y.; Song, L.; Jin, S. Leaf instance segmentation and counting based on deep object detection and segmentation networks. In Proceedings of the 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS), Toyama, Japan, 5–8 December 2018; pp. 180–185.
  7. Lu, S.; Song, Z.; Chen, W.; Qian, T.; Zhang, Y.; Chen, M.; Li, G. Counting Dense Leaves under Natural Environments via an Improved Deep-Learning-Based Object Detection Algorithm. Agriculture 2021, 11, 1003.
  8. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  9. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
  10. Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 173–190.
  11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  12. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
  13. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4974–4983.
  14. Liu, S.; Jia, J.; Fidler, S.; Urtasun, R. SGN: Sequential grouping networks for instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3496–3504.
  15. Gao, N.; Shan, Y.; Wang, Y.; Zhao, X.; Yu, Y.; Yang, M.; Huang, K. SSAP: Single-shot instance segmentation with affinity pyramid. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 642–651.
  16. Champ, J.; Mora-Fallas, A.; Goëau, H.; Mata-Montero, E.; Bonnet, P.; Joly, A. Instance segmentation for the fine detection of crop and weed plants by precision agricultural robots. Appl. Plant Sci. 2020, 8, e11373.
  17. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
  18. Alloghani, M.; Al-Jumeily, D.; Mustafina, J.; Hussain, A.; Aljaaf, A.J. A systematic review on supervised and unsupervised machine learning algorithms for data science. In Supervised and Unsupervised Learning for Data Science; Springer: Berlin/Heidelberg, Germany, 2020; pp. 3–21.
  19. Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440.
  20. Lee, D.H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. Available online: https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks (accessed on 26 April 2022).
  21. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 2020, 33, 596–608.
  22. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
  23. Dvornik, N.; Mairal, J.; Schmid, C. Modeling visual context is key to augmenting object detection datasets. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 364–380.
  24. Fang, H.S.; Sun, J.; Wang, R.; Gou, M.; Li, Y.L.; Lu, C. InstaBoost: Boosting instance segmentation via probability map guided copy-pasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 682–691.
  25. Dwibedi, D.; Misra, I.; Hebert, M. Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1301–1310.
  26. Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2918–2928.
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  28. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
  29. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
  30. Kolhar, S.; Jagtap, J. Convolutional neural network based encoder-decoder architectures for semantic segmentation of plants. Ecol. Inform. 2021, 64, 101373.
  31. Quan, L.; Wu, B.; Mao, S.; Yang, C.; Li, H. An Instance Segmentation-Based Method to Obtain the Leaf Age and Plant Centre of Weeds in Complex Field Environments. Sensors 2021, 21, 3389.
  32. Safonova, A.; Guirado, E.; Maglinets, Y.; Alcaraz-Segura, D.; Tabik, S. Olive tree biovolume from UAV multi-resolution image segmentation with Mask R-CNN. Sensors 2021, 21, 1617.
  33. Mohammadi, V.; Minaei, S.; Mahdavian, A.R.; Khoshtaghaza, M.H.; Gouton, P. Estimation of Leaf Area in Bell Pepper Plant using Image Processing techniques and Artificial Neural Networks. In Proceedings of the 2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Terengganu, Malaysia, 13–15 September 2021; pp. 173–178.
  34. Trivedi, M.; Gupta, A. Automatic monitoring of the growth of plants using deep learning-based leaf segmentation. Int. J. Appl. Sci. Eng. 2021, 18, 1–9.
Figure 1. Examples of the vertical farm environment: (a) a vertical farm cultivation environment; and (b,c) images of crop growth monitoring.
Figure 2. Framework of the proposed monitoring system: (a) the data collection procedure; and (b) crop size estimation. The image captured at the top of the growth tray is transferred to the server through the secure copy protocol (SCP), and the server can track the growth stage of the crop by calculating and recording the occupied area of the crop in the image.
Figure 3. Example of a vertical farm environment. Monitoring images are collected from directly above the crop, as shown in the figure.
Figure 4. SC-Mix framework. Our method is composed of two modules: (a) pseudo-crop generation (PCG) module generating the pseudo-crop with an unlabeled image. The PCG module converts only the pseudo-label for the monitored crop into the pseudo-crop, not the surrounding crop; (b) synthetic-image generation (SIG) module synthesizing the image taken on vertical farms using pseudo-crops. The synthetic images are combined with the training dataset to re-train the instance segmentation model of the crop.
Figure 5. Training loss and mAP performance according to training progress. This graph depicts the progress of model training while the SC-Mix cycle is repeated. The top shows the change in the loss value during model training, while the bottom shows the mAP performance on the test data measured every three epochs throughout the learning process.
Figure 6. Supervised vs. self-training vs. SC-Mix. These results compare SC-Mix to other model training approaches for non-overlapping crops. The red box is the predicted area of the model, and the yellow box is the ground truth area of the crop.
Figure 7. Supervised vs. self-training vs. SC-Mix. This result compares SC-Mix to other model training approaches for overlapping crops. The red box is the predicted area of the model, and the yellow box is the ground truth area of the crop.
Figure 8. Supervised vs. self-training vs. SC-Mix. Comparison results of synthetic vertical farm images are shown. The model output images include color-coded predictions for different crops.
Table 1. Crop segmentation performance on the vertical farm dataset. The superscript "S" denotes the use of the self-training scheme, and "A" indicates that data augmentation was employed via the proposed image synthesis method.
Method               | Cycle | Dataset    | mAP   | mAP50 | mAP75 | AR
---------------------|-------|------------|-------|-------|-------|------
Supervised Learning  | 1     | C_GT       | 0.644 | 0.844 | 0.764 | 0.664
Supervised Learning  | 1     | C_L        | 0.637 | 0.840 | 0.734 | 0.666
Self-Training^S      | 1     | C_GT, C_U  | 0.637 | 0.830 | 0.748 | 0.664
Self-Training^S      | 2     | C_GT, C_U  | 0.645 | 0.834 | 0.732 | 0.665
Self-Training^S      | 3     | C_GT, C_U  | 0.675 | 0.848 | 0.779 | 0.703
Labeled Synthetics^A | 1     | C_L        | 0.654 | 0.885 | 0.739 | 0.713
SC-Mix^{S,A}         | 1     | C_GT, C_U  | 0.656 | 0.862 | 0.755 | 0.690
SC-Mix^{S,A}         | 2     | C_GT, C_U  | 0.741 | 0.915 | 0.874 | 0.795
SC-Mix^{S,A}         | 3     | C_GT, C_U  | 0.769 | 0.982 | 0.880 | 0.804
Table 2. Results of AP performance for each category and mAP.
Method              | Cycle | Monitored Crop AP | Surrounding Crop AP | mAP
--------------------|-------|-------------------|---------------------|------
Supervised Learning | -     | 0.835             | 0.452               | 0.644
Self-Training       | 1     | 0.845             | 0.428               | 0.637
Self-Training       | 2     | 0.882             | 0.407               | 0.645
Self-Training       | 3     | 0.892             | 0.457               | 0.675
SC-Mix              | 1     | 0.868             | 0.444               | 0.656
SC-Mix              | 2     | 0.910             | 0.572               | 0.741
SC-Mix              | 3     | 0.926             | 0.611               | 0.769