**5. Discussion**

To validate the proposed vine disease detection system, it is necessary to evaluate and compare qualitative and quantitative results for each block of the whole system. For this purpose, several experiments were conducted at each step of the disease detection procedure. The first experiment was carried out on the registration of the multimodal orthophotos. Figure 4 shows the obtained results. As can be seen, the continuity of the vinerows is highly accurate and is respected between the visible and infrared ranges. However, if image acquisition is conducted incorrectly, many registration errors occur. To avoid these problems, two rules must be followed. The first concerns the overlap between visible and infrared images acquired at the same position, which must be greater than 85%. The second is that the overlap between consecutive acquired images must be greater than 70%; this rule must be respected in both ranges. Non-compliance with the first rule affects the building of the registered infrared orthophoto: the latter may contain black holes, i.e., areas for which no data are available. Non-compliance with the second rule affects the photogrammetry processing and the DSM model, which can lead to deformations in the orthophoto patterns (as can be seen on the left side of the visible and infrared orthophotos in Figure 5). When the DSM model is affected, the depth map automatically undergoes the same deformation (as can be seen in the depth map in Figure 5).
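The two acquisition rules can be verified before photogrammetry processing. The sketch below is a minimal illustration (not the authors' code), assuming image footprints are approximated as axis-aligned rectangles in ground coordinates; the function and variable names are hypothetical.

```python
# Minimal sketch of the two acquisition rules, assuming footprints given as
# (xmin, ymin, xmax, ymax) rectangles in ground coordinates.

def overlap_ratio(fp_a, fp_b):
    """Intersection area divided by the area of the smaller footprint."""
    ax0, ay0, ax1, ay1 = fp_a
    bx0, by0, bx1, by1 = fp_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    smaller = min((ax1 - ax0) * (ay1 - ay0), (bx1 - bx0) * (by1 - by0))
    return inter / smaller if smaller > 0 else 0.0

def acquisition_is_valid(visible_fp, infrared_fp, consecutive_fps):
    # Rule 1: visible/infrared images taken at the same position must overlap > 85%.
    if overlap_ratio(visible_fp, infrared_fp) <= 0.85:
        return False
    # Rule 2: consecutive images along the flight line must overlap > 70%
    # (to be checked in both the visible and infrared ranges).
    for fp_prev, fp_next in zip(consecutive_fps, consecutive_fps[1:]):
        if overlap_ratio(fp_prev, fp_next) <= 0.70:
            return False
    return True
```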

The second quality evaluation concerns the building of the depth map (Figure 6). Despite the slight deformation on the left side of the parcel, the depth map is consistent and well aligned with the visible orthophoto, and can be used in the segmentation process.
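The chessboard visualisations of Figures 4 and 6 can be produced by alternating square tiles taken from the two co-registered layers, so that any misalignment appears as broken vinerow continuity at tile borders. The following is an illustrative sketch under the assumption that both layers are NumPy arrays of identical shape; it is not the authors' exact implementation.

```python
import numpy as np

def chessboard_composite(img_a, img_b, tile=64):
    """Alternate square tiles from two co-registered images of identical shape.
    Misalignment shows up as discontinuities at tile borders."""
    assert img_a.shape == img_b.shape
    out = img_a.copy()
    h, w = img_a.shape[:2]
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            if ((i // tile) + (j // tile)) % 2 == 1:
                out[i:i + tile, j:j + tile] = img_b[i:i + tile, j:j + tile]
    return out
```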

**Figure 4.** Qualitative results of orthophotos registration using a chessboard pattern.

**Figure 5.** Qualitative results of orthophotos and depth map.

**Figure 6.** Evaluation of the depth map alignment using a chessboard pattern.

In order to assess the added value of depth map information, two training sessions were performed on the SegNet [33], U-Net [57], DeepLabv3+ [58] and PSPNet [59] networks. The first training session was conducted on multispectral data only, and the second on multispectral data combined with depth map information. Figures 7 and 8 illustrate the qualitative test results comparing the two trainings. The left side of Figure 7 shows an example of a parcel with green ground. The centre of the figure presents the segmentation result of the SegNet model trained on multispectral data only. As can be seen, in some areas of the parcel it is difficult to dissociate the vinerows. The right side of the figure depicts the segmentation result of the SegNet model trained on multispectral data combined with depth map information. This result is better than the previous one, and the vinerows can easily be separated. This improvement is due to the additional depth map information, which allows better learning of the scene environment and a clearer distinction between classes. Figure 8 illustrates other examples obtained under the same conditions as above. The first row shows an area composed of green ground. The segmentation results using the first and second models are displayed in the centre and on the right side, respectively. We can notice in this example a strong confusion between the ground and healthy vine classes, mainly because the ground color is similar to that of the healthy vine. This problem is solved by adding depth map information in the second model, whose result is shown on the right side. The second row of Figure 8 presents an example of a partially diseased area. The first segmentation result shows the diseased class being detected on the ground: the brown color (original ground color) mixed with a slight green color (grass) confused the first model and led it to misclassify the ground. This confusion does not appear in the second segmentation result (right side). From these results, it can be concluded that the second model, trained on multispectral and depth map information, learned that the diseased vine class cannot occur on non-vine areas. Based on these results, the following experiments were conducted using multispectral data combined with depth map information.
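For the second training configuration, one straightforward way to combine the two sources is to append the depth map as an additional input channel alongside the multispectral bands. The sketch below illustrates this idea only; the shapes, normalisation and function name are assumptions, not the authors' exact pipeline.

```python
import numpy as np

def stack_inputs(multispectral_patch, depth_patch):
    """Append the depth map as an extra input channel.
    multispectral_patch: (H, W, C) reflectance bands; depth_patch: (H, W) depth map."""
    depth = depth_patch.astype(np.float32)
    # Normalise depth to [0, 1] so its range is comparable to the reflectance bands.
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
    return np.concatenate([multispectral_patch, depth[..., None]], axis=-1)
```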

**Figure 7.** Difference between a SegNet model trained only on multispectral data and the same model trained on multispectral data combined with depth map information. The presented example is an orthophoto of a healthy parcel with green ground.

**Figure 8.** Difference between a SegNet model trained only on multispectral data and the same model trained on multispectral data combined with depth map information. Two examples are presented here: the first row is an example on a healthy parcel with green ground; the second is an example on a partially diseased parcel with brown ground.

In order to validate the proposed architecture, a comparative study was conducted against the most well-known deep learning architectures: SegNet [33], U-Net [57], DeepLabv3+ [58] and PSPNet [59]. All architectures were trained and tested on the following classes: shadow, ground, healthy and diseased, with the same data (same training and test sets). Table 3 lists the segmentation results of the different architectures. The quantitative evaluation is based on the F1-score and the global accuracy. As can be seen, the shadow and ground classes obtained average scores of 94% and 95%, respectively, with all architectures. These high scores are due to the ease of detecting these classes. The healthy class scored between 91% and 92% for VddNet, SegNet, U-Net and DeepLabv3+. However, PSPNet obtained the worst result, 73.96%, due to strong confusion between the ground and healthy classes; PSPNet was unable to generate a good segmentation model even though the training dataset was rich. The diseased vine class is the most important class in this study. VddNet obtained the best result for this class with a score of 92.59%, followed by SegNet with 88.85%. The scores of the other architectures are 85.78%, 81.63% and 74.87% for U-Net, PSPNet and DeepLabv3+, respectively. VddNet achieved the best result because feature extraction was performed separately for each modality. Indeed, in [21] it was shown that merging visible and infrared segmentations (obtained with two separately trained models) provides better detection than using the visible or infrared range alone. The worst result for the diseased class was obtained with DeepLabv3+, due to its insensitivity to color variation. In fact, the diseased class can correspond to yellow, brown or golden colors, and these colors usually appear among the green of neighbouring healthy leaves; this situation makes classifiers insensitive to the variation. The best global segmentation accuracy was achieved by VddNet, with 93.72%. This score is reflected in the qualitative results of Figures 9 and 10. Figure 9 presents an orthophoto of a parcel (on the left side) partially contaminated with mildew; the right side shows the segmentation result obtained by VddNet. It can be seen that the diseased areas are correctly detected. Figure 10 is an example of a parcel without disease; here, VddNet also performs well in detecting true negatives.
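For reference, the two reported metrics can be computed directly from the predicted and ground-truth label maps. The sketch below is a minimal illustration, assuming integer label maps whose indices follow the class order shown; it is not tied to the authors' evaluation code.

```python
import numpy as np

CLASSES = ("shadow", "ground", "healthy", "diseased")  # assumed index order

def evaluate(pred, gt):
    """Per-class F1-score and global pixel accuracy from integer label maps."""
    pred, gt = pred.ravel(), gt.ravel()
    f1_scores = {}
    for idx, name in enumerate(CLASSES):
        tp = np.sum((pred == idx) & (gt == idx))
        fp = np.sum((pred == idx) & (gt != idx))
        fn = np.sum((pred != idx) & (gt == idx))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1_scores[name] = (2 * precision * recall / (precision + recall)
                           if (precision + recall) else 0.0)
    global_accuracy = float(np.mean(pred == gt))
    return f1_scores, global_accuracy
```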

**Figure 9.** Qualitative result of VddNet on a parcel partially contaminated with mildew and with green ground. The visible orthophoto of the parcel is on the left side, and its disease map on the right side.


**Figure 10.** Qualitative result of VddNet on a healthy parcel with brown ground. The visible orthophoto of the parcel is on the left side, and its disease map on the right side.
