This subsection is dedicated exclusively to the field testing of the IoT platform, presenting the materials and methods used for its implementation and testing. The validation was performed on a strawberry farm located in a rural area. The entire plantation is approximately 18,896 m² in size, and an aerial view of the whole plantation can be seen in
Figure 13. The plantation has a total of 104,000 plants of the San Andreas variety. Cultivation began on 14 March 2022 because San Andreas is a moderate day-neutral strawberry with a continuous production pattern. Two sensor nodes and a collector node are deployed to cover the area. The sensor nodes collect the local temperature and humidity conditions and send them through LoRa communication to the collector node. The collector node uploads all the data to the internet, presenting the information on a dashboard, analyzing the climate conditions of the plantation, and storing all the data in an online database.
First, the locations of the sensor nodes need to be chosen. The positioning scheme of the sensor nodes must also be sufficient to cover the most critical parts of the plantation. The chosen strawberry farm has a small area, so two sensor nodes are sufficient to validate the IoT platform. Second, the variables collected need to be relevant to the health and quality of the plantation. Like many other crops, strawberries are fragile fruits that can be damaged by certain heat and humidity conditions. For that reason, the DHT11 and hygrometer sensors were chosen to validate the machine learning algorithms.
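For illustration, a sensor-node sampling loop could look like the minimal Python sketch below. The Adafruit_DHT library, the GPIO pin, and the sampling interval are assumptions made here for the example; the actual node firmware is not described in the text.

```python
import time

import Adafruit_DHT  # legacy DHT-series driver; assumed for this sketch

SENSOR = Adafruit_DHT.DHT11
PIN = 4  # assumed GPIO data pin for the DHT11

while True:
    # read_retry re-attempts the read, since DHT11 readings occasionally fail
    humidity, temperature = Adafruit_DHT.read_retry(SENSOR, PIN)
    if humidity is not None and temperature is not None:
        print(f"Temperature: {temperature:.1f} C, Humidity: {humidity:.1f} %")
    time.sleep(60)  # sampling interval is an assumption
```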
Finally, the real-time capture and detection performed by the CV application need to be validated. The CV application can detect seven disease types in strawberries. However, finding all of these diseases on a single farm is challenging. In order to find strawberry specimens with the diseases present in the dataset, a small plantation area was selected and continuously presented in real time to the collector node.
The test roadmap for the IoT platform is as follows: (I) implementation of the sensor nodes, (II) implementation of the collector node, (III) verification of the connection between the nodes, (IV) collection of and access to the collected data, (V) storage and visualization of the data, and (VI) validation of the computer vision model in the field. The sensor nodes are deployed at the two midpoints between the center and the borders of the plantation. This approach maximizes the coverage area through three points and ensures a good platform deployment. Nevertheless, the sensor nodes can be relocated to the more critical areas of the plantation, covering its most relevant points.
LoRa communication is used because the site does not offer Wi-Fi connectivity. The online dashboard and database also cannot be used due to the lack of an internet connection, so offline dashboards and databases are used as an alternative. Because of that, all the data are collected but available only locally. No wireless LAN was created for the application tests, so a notebook is used to check the collected data and the detections performed. The goal of the real-time test is to validate the model's ability to perform processing entirely at the edge, enabling its implementation in smart greenhouses or autonomous rovers.
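The offline storage described above could be implemented with a local database along the lines of the following sketch. SQLite is assumed here, and the table layout and column names are illustrative only; the text does not specify the database used.

```python
import sqlite3
from datetime import datetime, timezone

# Local file-based database; no internet connection is required
conn = sqlite3.connect("plantation.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS readings (
           ts TEXT, node_id INTEGER, temperature REAL, humidity REAL
       )"""
)

def store_reading(node_id: int, temperature: float, humidity: float) -> None:
    """Persist one LoRa packet received from a sensor node."""
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), node_id, temperature, humidity),
    )
    conn.commit()

# Example: a packet received from sensor node 1
store_reading(1, 24.5, 71.0)
```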
Figure 14 shows the connection schema applied for test purposes, and
Figure 15 illustrates a possible rover application.
Detection is performed in real time through a deep learning model installed on the Raspberry Pi. A notebook is connected to the Pi board through an Ethernet cable to view the detections. In real platform implementations, the Pi's screen can be streamed to a smartphone or laptop through a VNC connection over the internet.
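As a sketch of how such a real-time edge loop might be wired together, the code below captures camera frames and feeds them to a detector. OpenCV, PyTorch Hub, a camera at index 0, and a weights file named best.pt are all assumptions; none of these specifics are stated in the text.

```python
import cv2
import torch

# Load the detector once at startup; the weights file name is an assumption
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
cap = cv2.VideoCapture(0)  # camera index 0 is an assumption

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # YOLOv5 expects RGB images, while OpenCV captures BGR
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
    cv2.imshow("detections", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```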
Computer Vision
The performance of the proposed model is evaluated through metrics such as (i) mean average precision, (ii) precision, (iii) recall, (iv) accuracy, and (v) box loss. A detailed description and the equation of each of these metrics are given below:
Mean average precision (mAP): the mean of the average precision (AP) over all classes, where AP is the area under the precision-recall curve;
Precision: the ratio of correct positive classifications to all positive classifications made by the model. In general, it is used in situations where false positives carry more weight than false negatives;
Recall: the ratio of true positives to the sum of true positives and false negatives. It is similar to precision but is used in situations where false negatives are considered more harmful than false positives;
Accuracy: the ratio of correct classifications to all classifications performed by the model; it is a good general indicator of model performance;
Box loss: measures how close the predicted bounding box is to the ground-truth box.
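For reference, these metrics have the following standard forms, written in the usual true/false positive and negative notation (TP, FP, TN, FN), with AP_i denoting the average precision of class i out of N classes; the text does not reproduce its equations here, so these are the conventional definitions:

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP}\\
\text{Recall}    &= \frac{TP}{TP + FN}\\
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}\\
\text{mAP}       &= \frac{1}{N}\sum_{i=1}^{N} AP_i
\end{align}
```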
First, metrics regarding the model performance during training are captured through a training dataset. A training dataset is a set of data used during model learning to adjust parameters such as weights and ensure better detection and classification capabilities. The metrics presented above are used during the training phase to follow the performance of the model at each epoch. An epoch represents one pass of the model through all the images. Moreover, all the metrics are applied to different weight sizes in search of a better-performing model. All the tests between the different weight sizes are summarized in
Table 4.
In brief, all the models performed similarly under the same hyperparameters, and some detections made on the training dataset can be seen in
Figure 19.
However, models with larger weights consume more space on the SD card while offering little difference in the final results. Because of that, YOLOv5s is the model chosen for the task of disease detection on a strawberry farm. The model also showed promising results on the test dataset. A test dataset is a set of data independent of the training dataset and is an excellent way to test the model's detection capability in different scenarios and situations. The performance of the model on the test set is satisfactory, with a mean average precision (mAP@0.5) of 78.7%, an accuracy of 92.8%, a recall of 90.0%, and an F1-score of 76%, as shown in
Figure 20. The confusion matrix presented in
Figure 21 also shows satisfactory results, with the model demonstrating a good ability to detect six of the diseases. However, the model has some problems recognizing the Angular Leafspot disease, scoring only 0.23 in the obtained matrix.
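For illustration, the chosen YOLOv5s model can be loaded and queried through PyTorch Hub roughly as follows; the weights file name, confidence threshold, and image path are assumptions for this sketch:

```python
import torch

# Load the custom-trained YOLOv5s weights via PyTorch Hub
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.25  # minimum confidence for reported detections (assumed value)

# Run detection on a single test image (path is illustrative)
results = model("strawberry_leaf.jpg")
results.print()  # per-class detections with confidence scores
results.save()   # writes annotated copies to runs/detect/exp
```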
Several analyses were performed to find the reason behind the performance of the Angular Leafspot class relative to the other classes. The first test consists of applying the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to the model to see which features were considered for the final classification. In summary, Grad-CAM is a technique that produces visual explanations of decisions in CNN-based models. Through this technique, it is possible to observe, through a heat map, which regions of the image are the most important for the model's classification and prediction processes. During testing, however, the model could not accurately identify the disease's main features (the spots created on the leaves) in all detections.
Figure 22 shows an example of which features were considered relevant in classifying the Angular Leafspot disease.
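For readers who wish to reproduce this kind of analysis, the sketch below applies Grad-CAM using the pytorch-grad-cam package (pip install grad-cam) to a stand-in classifier. The model, target layer, and image path are illustrative only; applying the technique to the trained detector, as done in the text, requires choosing an appropriate target layer of that network.

```python
import numpy as np
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision import models, transforms

# Stand-in CNN for illustration; the paper applies Grad-CAM to its own model
model = models.resnet50(weights="IMAGENET1K_V1").eval()
target_layers = [model.layer4[-1]]  # last conv block; choice is model-specific

img = Image.open("angular_leafspot.jpg").convert("RGB")  # illustrative path
rgb = np.array(img.resize((224, 224))).astype(np.float32) / 255.0
input_tensor = transforms.ToTensor()(rgb).unsqueeze(0)  # 1 x 3 x 224 x 224

cam = GradCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=input_tensor)[0]  # HxW activation map in [0, 1]
overlay = show_cam_on_image(rgb, heatmap, use_rgb=True)
Image.fromarray(overlay).save("gradcam_overlay.jpg")
```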
In further tests using Grad-CAM on the other classes, it is possible to see the model's accuracy in determining which regions present relevant characteristics of each disease. For example, for the Powdery Mildew Fruit disease, the model could perceive which visual changes in the fruit were caused by the disease.
Figure 23 presents an example of Grad-CAM applied to the Powdery Mildew Fruit class.
The dataset was also analyzed, since an unbalanced number of examples between classes in a dataset can cause performance differences between classes. The first step quantified how many samples (images) of each class were provided to the model during the training stage. A summary of this information can be seen in
Table 5. Although the dataset is unbalanced, Angular Leafspot has a considerable number of images compared with the other classes. Thus, tests with a larger dataset and new images should be performed to assess their impact on the model results.
Since this application was developed to run in the field, brightness variation in the captured images can change the detections performed by the application. With this in mind, a test was devised that presents several images with different brightness profiles to the model in order to evaluate its performance under brightness variations. For that, a Python algorithm that iteratively changes the brightness of all images in the test dataset was used, as sketched below. An illustration of the final images presented to the model can be seen in
Figure 24.
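A minimal version of such a brightness-variation script, assuming Pillow and using illustrative directory names and brightness factors, could look as follows:

```python
from pathlib import Path

from PIL import Image, ImageEnhance

SRC = Path("test_dataset")              # original test images (assumed layout)
DST = Path("test_dataset_brightness")   # brightness-varied copies
FACTORS = [0.2, 0.4, 0.6, 0.8, 1.0]     # 1.0 keeps the original brightness

for img_path in SRC.glob("*.jpg"):
    img = Image.open(img_path)
    for factor in FACTORS:
        # enhance(f) scales pixel brightness; f < 1 darkens, f > 1 brightens
        adjusted = ImageEnhance.Brightness(img).enhance(factor)
        out_dir = DST / f"brightness_{int(factor * 100)}"
        out_dir.mkdir(parents=True, exist_ok=True)
        adjusted.save(out_dir / img_path.name)
```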
The model achieved considerable performance even when brightness was reduced by 40%, while difficulties appear when the model is exposed to images or scenes with low lighting, where disease detection is impaired.
Table 6 shows the precision, accuracy, and recall obtained from the model under different brightness conditions, and
Figure 25 shows a sample of the detections performed by the model under the different darkness conditions.
A new dataset containing images with different brightness profiles was developed to address the model's errors under different lighting conditions. This new dataset was used to retrain the model with the parameters described in the previous steps but now with four times more images. In the end, the model was presented with images with different lighting configurations and could accurately detect the diseases in them.
Figure 26 demonstrates the ability of the model to detect and classify diseases under different lighting profiles. An LED next to the collector node pointing directly at the plant may also be enough to avoid problems with illumination.
Finally, another field test was performed at different times of the day to evaluate the model's ability to detect and classify different diseases under different illuminations. Four schedules were defined for capturing and presenting images to the proposed model. The chosen times had to be aligned with the test proposal and the conditions offered by local farmers, so the test times were defined as 7:00 a.m., 9:00 a.m., 2:00 p.m., and 5:00 p.m. At 7:00 a.m., the light is sparser, posing a greater challenge for the model to discern the various diseases that can be detected. At 9:00 a.m., local light is more abundant, and the model appears to have higher confidence in its detections. In contrast, at 2:00 p.m., the environment is so bright that model performance is impaired. Finally, at 5:00 p.m., the color change due to sunset significantly impacts model performance, causing some diseased specimens to go unnoticed by the detector. Some detections made by the model can be seen in
Figure 27.
Benchmarking of the proposed model's performance on the edge was also completed and the metrics captured. The model runs on the edge through a Raspberry Pi 4B board with a Broadcom BCM2711 quad-core Cortex-A72 64-bit SoC @ 1.5 GHz and 4 GB of RAM. Both processor and RAM were monitored at 1-minute intervals for a total of 1 h, resulting in 60 samples of each. The temperature reported by the board, the frame rate and processing time reported by the model, and the total processor usage of the board were also observed. Equation (4) was used to calculate the frame rate presented by the model.
Figure 28 displays the behavior of the board running the detection model in real time.
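A monitoring script consistent with the procedure described above could look like the following sketch. The psutil package and the Raspberry Pi vcgencmd tool are assumed, and the FPS formula at the end is an assumption about the form of Equation (4), which is defined elsewhere in the paper.

```python
import subprocess
import time

import psutil

def soc_temperature() -> float:
    """Read the SoC temperature via vcgencmd (Raspberry Pi OS only)."""
    out = subprocess.check_output(["vcgencmd", "measure_temp"], text=True)
    return float(out.split("=")[1].split("'")[0])  # output looks like "temp=48.3'C"

samples = []
for _ in range(60):  # one sample per minute for one hour
    samples.append(
        {
            "cpu_percent": psutil.cpu_percent(interval=1),
            "ram_percent": psutil.virtual_memory().percent,
            "temp_c": soc_temperature(),
        }
    )
    time.sleep(59)  # cpu_percent already blocked for 1 s

# Frame rate derived from the per-frame processing time reported by the model;
# assuming Equation (4) has the usual form FPS = 1 / t_processing.
t_processing = 2.0  # seconds per frame (the average reported in the text)
print(f"Average frame rate: {1.0 / t_processing:.2f} FPS")
```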
Some improvements must be made to the collector node, the first being the addition of an active heat dissipation method due to the high temperatures reached during the execution of the computer vision application. The temperature of 85 °C that was reached is critical for the Raspberry Pi's health if maintained for long periods. Another point to consider is the significant processing time presented by the model, which is 2 s on average. Thus, improvements to the model are necessary to reduce its need for computational power and thereby increase its performance.
Briefly, the model achieved an accuracy of 92% on the training set and 92.8% on the test dataset. The model also maintained good quality in its detections when used in real time in crop applications and made accurate detections even under different illuminations.