1. Introduction
Manual crop growth monitoring and modeling have been two key ingredients of agricultural practice since the origin of agricultural science, allowing farmers to guide their yields and manage their limited resources as rationally as possible [1]. From ancient times, farmers monitored crop health and growth across stages through direct observation and manual field measurements [2,3]. This, however, is laborious, time-consuming, and prone to human error. Demand for more efficient monitoring methods grew as agriculture scaled up and diversified. From there, the advancement of statistical modeling, and later of computational tools, offered a further opportunity to predict crop yields from manually collected crop data [1,4,5,6,7]. Nevertheless, the primary hurdle remained unchanged: the requirement for a substantial workforce to accurately label and monitor individual plants and rows of crops [8]. The wide range of field conditions, along with the subjective nature of human observation, has produced data that are typically unreliable and inconsistent. As a result, automated and precise alternative approaches are needed.
Smart agriculture technologies currently lack uniformity and compatibility. Developing an integrated framework that enables diverse systems and devices to function together is essential for wider adoption and improved effectiveness. An important challenge is the assimilation of data from diverse sensors and the guarantee of compatibility between different systems and platforms [6]. Standardization of data analysis and interoperability across multiple platforms is necessary for smooth operation. To make models more accurate, they need to be consistently integrated and calibrated; differences in the default parametrization used by different models can otherwise lead to errors [7,8,9]. Efficient collaboration between different autonomous systems is imperative for the successful implementation of autonomous farming operations. By incorporating chemical soil health indicators and crop quality into a hierarchical multinomial logistic regression, a robust model was created that could accurately forecast crop growth from vegetation indices [10]. Additional comparative studies are required to determine the reliability and precision of unmanned aerial vehicle (UAV) sensors relative to conventional ground-based techniques for crop growth analysis over time.
When each crop row is treated as a separate class, temporal crop characterization for plant growth becomes easier, more precise, and more cost-effective. This approach is therefore preferable to classifying individual crop specimens, which suffers from spectral overlap caused by intra-crop occlusions [11]. It has also been demonstrated that an instance segmentation pipeline trained on nadir RGB images for predicting and analyzing plant shapes can be ensembled with other machine learning (ML) models to perform similar and additional analyses over ortho-mosaics, suiting different agricultural environments [12]. Another work on combinational machine learning coupled classification and regression trees to improve the accuracy of ground coverage (GC) estimation from UAV imagery for wheat phenotyping [13]. The capacity of deep learning techniques to greatly enhance the precision and effectiveness of plant segmentation in high-throughput phenotyping, opening possibilities for automated crop annotation based on transfer learning, was demonstrated by [14].
The issue of manual annotation was underscored in research centered on a proximally sensed multispectral imaging environment containing a heterogeneous mixture of wheat and horseradish; the dataset was quantitatively inadequate due to human limitations [15]. The Grounded SAM [16] is an architecture that combines Grounding DINO [17], an open-set object detector, with the Segment Anything Model (SAM) [18] to perform precise detection and segmentation of objects based on arbitrary text inputs. It is a breakthrough in open-world visual perception and provides a robust, flexible framework for a range of visual tasks through the integration of specialized models. It utilizes the Recognize Anything Model (RAM) [19] and image-caption models such as BLIP [20] to automatically provide comprehensive image annotations, greatly minimizing the need for manual annotation. The Grounded SAM can be extended by integrating further models, such as faster inference models and high-quality mask generators, to enhance its capabilities. Ensembling the SAM with Explainable Contrastive Language–Image Pretraining (ECLIP) was demonstrated for plant recognition and phenological characterization, using B-spline curves to quantify plant dimensions [21]. A similar study, addressing sample scarcity in crop mapping from medium-resolution satellite imagery, devised an automated SAM-based sample generation framework to improve crop mapping accuracy and eliminate manual annotation [22]. The generated samples significantly improved crop mapping accuracy in both research areas, particularly in locations with clearly defined parcel borders. This automated workflow mitigated the problem of limited sample availability, yielding crop mapping solutions that are more dependable and scalable. A practical use case of the Grounded SAM was also demonstrated in automated weed annotation over proximally sensed multispectral images [23].
Transfer learning is a machine learning technique in which a pre-trained model developed for one detection or segmentation task serves as the starting point for a deep learning model on a subsequent task. It has been shown to improve performance and reduce training time, leading to efficient utilization of resources. One study integrated Mask R-CNN, an instance segmentation technique, with transfer learning using VGG16 for classification, resulting in improved system robustness and accuracy [24]. Another group applied advanced deep learning techniques to improve automated monitoring and management of lettuce seedlings in controlled agricultural settings, using an enhanced Mask R-CNN model pre-trained on the COCO dataset [25]. The model's learning was then transferred to CB-Net to improve the accuracy and efficiency of seedling segmentation and growth trait estimation, ultimately facilitating automatic sorting of lettuce seedlings [26]. Transfer learning has been shown to facilitate environmental and agricultural monitoring in areas such as land cover mapping, vegetation monitoring, crop yield mapping, and water resource management [27].
Transfer learning, coupled with partially supervised techniques, makes use of bounding box annotations to efficiently enable one-shot image segmentation. These methods greatly decrease the dependency on detailed pixel-level annotations while still achieving high segmentation accuracy. Box2Mask introduces a novel method for instance segmentation that integrates the traditional level-set evolution model with deep neural network training, allowing precise mask prediction using only bounding box supervision [28]. Another work introduced image-aligned style modification to reinforce dual-mode iterative learning for one-shot robust segmentation of brain tissues, achieving notable performance improvements over existing methods [29]. An application in wildlife monitoring and precision livestock farming was demonstrated through a one-shot learning-based approach to segmenting animal videos using only one labeled frame [30]. Integrating a learning approach with noisy annotations in a single framework was observed to improve the segmentation model; pseudo masks created from unlabeled volumes supplied extra unannotated data throughout training. One-shot localization and weakly supervised segmentation were found to reduce the need for substantial annotation, achieving high segmentation accuracy in complicated medical imaging tasks using only one annotated image and a few unannotated volumes [31].
In a nutshell, the SAM has been found robust when coupled with one-shot learning approaches such as PerSAM [32] and PerSAM-F, which enhance its segmentation capabilities in the remote sensing field. These techniques enable the SAM to personalize and refine its segmentation abilities with minimal input. The SAM's performance was assessed and verified using multi-scale imaging, showcasing its ability to process images of varying resolutions and scales, a critical requirement for remote sensing applications [33]. The Grounded SAM can improve crop growth analysis by offering accurate, automated segmentation and comprehensive monitoring capabilities. These features enable informed decision making in crop management, health assessment, and yield forecasting, thereby promoting sustainable agricultural practices. Our study concentrates on temporal crop growth analysis of cauliflower using segmentation masks derived from automatically annotated multi-date aerial images and ortho-mosaics, observing the growth stages of individual crop rows and validating them through statistical examination. The principal idea is to automatically segment these two sets of multi-date images and ortho-mosaics using the Grounded SAM over the inference of a trained YOLOv8x model on a particular date's imagery, which was observed to exhibit the best training parameters. The predicted multi-date mask instances are used to calculate the mean crop pixel count for every crop row as an individual class across both datasets. The pixel counts of the individual crop rows across dates in both datasets are then used to observe temporal growth patterns and the correlation between the type of observation, aerial or ortho-mosaic, and the date of observation.
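The per-row statistic described above can be sketched as follows; this is a minimal illustration assuming each crop row's predicted mask instances are available as boolean NumPy arrays (the function and variable names are ours, not the study's code):

```python
import numpy as np

def mean_row_pixel_count(row_masks):
    """Mean number of segmented crop pixels over the mask instances
    predicted for one crop row on one observation date.

    row_masks: list of boolean arrays, one per predicted mask instance.
    """
    counts = [int(mask.sum()) for mask in row_masks]
    return sum(counts) / len(counts)

# Toy 4x4 masks with 3 and 5 foreground pixels, respectively.
m1 = np.zeros((4, 4), dtype=bool)
m1[0, :3] = True
m2 = np.zeros((4, 4), dtype=bool)
m2[1, :] = True
m2[2, 0] = True

print(mean_row_pixel_count([m1, m2]))  # 4.0
```

Tracking this mean per row and per date yields the time series analyzed in Section 3.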
This article is organized as follows.
Section 2 describes the two kinds of datasets, multi-date aerial imagery and ortho-mosaics, detailing the image acquisition and the annotation of images. It also explains how the dataset was used to train YOLOv8x-seg and how the optimal inference was transferred to the Grounded SAM for automatic annotation and segmentation of crop images. Subsections within this section explain the process of transfer learning, the conversion of COCO-format annotations to PASCAL VOC format, and the creation of instance segmentation masks from bounding box detections.
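At its core, the COCO-to-PASCAL VOC annotation conversion mentioned above is a coordinate-format change; a minimal sketch (the full conversion also rewrites the surrounding annotation files, which is omitted here):

```python
def coco_bbox_to_voc(bbox):
    """Convert a COCO bounding box [x_min, y_min, width, height]
    into PASCAL VOC corner format [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# A 30x40 box anchored at (10, 20).
print(coco_bbox_to_voc([10, 20, 30, 40]))  # [10, 20, 40, 60]
```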
Section 3 presents the results and discussion, examining relative crop growth rates from multi-date aerial imagery and ortho-mosaics to detect growth patterns and variations over time. This section also comprises a time-series analysis, a Pearson correlation analysis, and a two-way ANOVA test that assess the consistency and statistical significance of the observed growth patterns, validating the efficacy of the automated annotation techniques for improving decision making in precision agriculture.
Section 4 concludes the study, summarizing the key findings, addressing the limitations, and suggesting directions for future research.
This work presents an automated segmentation workflow for multi-modal crop growth monitoring of cauliflower using deep learning models, namely YOLOv8 and the Grounded SAM. A single-shot transfer learning technique was employed to enhance Grounding DINO's ability to generate bounding boxes around cauliflowers identified in multi-date aerial images and ortho-mosaics. The Grounded SAM was supported by a pre-trained YOLOv8x-seg model that extracted features from input images, providing comprehensive image detail for further training. The experiment also demonstrates the potential for automated temporal crop growth monitoring based on the segmented pixels of the cauliflower crop rows over a period, showing a high degree of correlation between aerial imagery and ortho-mosaics, validated through statistical techniques such as Pearson correlation and ANOVA. The use of transfer learning improved the pipeline's accuracy, yielding more true positives and facilitating the generation of segmentation masks for temporal growth analysis.
3. Results and Discussion
3.1. Relative Crop Growth Rate Analysis
Initial observations indicate that throughout the period from 8 October to 21 October, the aerial imagery generally exhibits higher growth percentages in most rows than the ortho-mosaics. Both datasets exhibited variations, with peaks observed between 29 October and 11 November for most crop rows. Both datasets, ortho-mosaics and aerial imagery, exhibit a consistent decline towards the end of the studied period, 18–25 November, with their values converging to similar levels. Despite some individual variations, the datasets generally exhibit consistent relative patterns across crop rows; specifically, Row 1 and Row 2 consistently display higher initial values followed by large drops over time.
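The row-wise growth percentages discussed in this section can be computed as the percent change in mean segmented pixel count between consecutive observation dates; a minimal sketch of this standard percent-change form (the exact normalisation behind Tables 3 and 4 may differ, and the numbers below are illustrative):

```python
def relative_growth_percent(pixel_counts):
    """Percent change in mean segmented pixel count between
    consecutive dates for one crop row.

    pixel_counts: chronologically ordered mean pixel counts.
    """
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(pixel_counts, pixel_counts[1:])
    ]

# e.g. a row growing from 500 to 700 to 770 segmented pixels
print(relative_growth_percent([500, 700, 770]))  # [40.0, 10.0]
```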
3.1.1. Multi-Date Aerial Imagery
The relative growth analysis of cauliflower crop rows across multi-date aerial imagery was calculated as percentages to reveal distinct trends for each row (Table 3). Row 1 begins with a high growth percentage of approximately 140%, experiences a sharp decline, then fluctuates, peaking around 29 October–11 November, and finally drops to around 40%. Row 2 starts at 100% and exhibits minor fluctuations but generally trends downward, ending just below the initial value. Row 3 starts at 80%, shows a significant initial drop, then fluctuates before stabilizing slightly above the starting value. Row 4 starts and ends around 80%, maintaining a stable pattern with minor fluctuations. Row 5 and Row 6 both start at 100%, exhibit minor fluctuations, and end slightly below their starting values. Lastly, Row 7 starts at 80%, experiences fluctuations, and ends slightly below the initial value. Overall, while individual rows display unique patterns, most exhibit a general downward trend with notable fluctuations (Figure 5).
3.1.2. Multi-Date Ortho-Mosaics
The relative growth analysis of cauliflower crop rows across multi-date ortho-mosaic imagery highlights significant variations across the observed periods (Table 4). Row 1 starts at a high growth percentage of around 120%, consistently declines, and ends slightly below 40%. Row 2 begins at approximately 60%, with fluctuations leading to a steady decline, ending below 40%. Row 3 starts at 40%, increases slightly, then steadily decreases, ending just above 20%. Row 4 starts at 80%, experiences a sharp drop and fluctuations, ending slightly above the initial value. Row 5 begins around 60%, shows a peak around 29 October–11 November, and ends slightly below the starting value. Row 6 starts at 100%, drops sharply, and fluctuates, ending slightly below the starting value. Row 7 begins at 120%, drops significantly, and stabilizes around 40%. Overall, each row demonstrates unique patterns, but the general trend across all rows indicates a decline in growth percentages over time, with periods of significant fluctuation (Figure 6).
3.2. Time Series Analysis
A multi-modal time-series analysis was conducted on the multi-date aerial imagery and ortho-mosaics using an averaging function. This function computes the mean value for each crop row across the observation dates, thereby providing the average growth over time of the crop rows in both formats: aerial imagery and ortho-mosaics.
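The averaging step can be sketched as follows; a minimal illustration assuming the growth values are held in a mapping from row label to a date-ordered list (the data layout and values shown are illustrative, not the study's own format):

```python
def row_means(growth):
    """Mean growth value per crop row across all observation dates.

    growth: mapping of row label -> list of values ordered by date.
    """
    return {row: sum(vals) / len(vals) for row, vals in growth.items()}

obs = {"Row 1": [1.4, 1.0, 1.2], "Row 2": [1.0, 0.9, 0.8]}
print(row_means(obs))
```

Applying the same function to the aerial and ortho-mosaic series separately gives the two sets of averaged curves compared below.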
3.2.1. Multi-Date Aerial Imagery
The time-series analysis for multi-date aerial images reveals a steady, rising trend in the growth of cauliflower crop rows within the studied timeframe (Figure 7). The growth rates of the rows differ, with Row 1 and Row 2 experiencing much greater increases than the other rows by the end of the period. The growth rates indicate periods of fast expansion, notably from 8 October to 21 October, and an overall pattern of consistent growth with occasional changes. Nevertheless, there is a noticeable decrease in the rate of increase towards the end of the time frame for certain rows, indicating a potential saturation point. In general, the data (Table 5) show a distinct trend of steady growth with significant periods of rapid expansion, which could be linked to particular circumstances or interventions.
3.2.2. Multi-Date Ortho-Mosaics Imagery
The time-series analysis of growth observed across multiple rows and dates in the ortho-mosaics reveals a consistent upward trend, indicating steady growth over the observed period (Figure 8). Rows exhibit varying rates of growth, with Row 7 showing the highest increase by the end of the period. The growth rates indicate significant growth between 8 October and 21 October and between 11 November and 18 November, suggesting intervals of rapid growth. However, some rows exhibit a deceleration or slight decline towards the end of the period, indicating a possible plateau or saturation point. Overall, the data demonstrate a clear pattern of consistent growth with notable periods of rapid increase, which may be influenced by specific conditions or interventions (Table 6).
3.3. Pearson Correlation Analysis
The Pearson correlation coefficients for each crop row between the aerial imagery and ortho-mosaic data were calculated (Table 7). The correlation values for the growth observations from aerial imagery and ortho-mosaics demonstrate consistently high positive correlations across all rows, indicating strong agreement between the two measurement methods. Row 1 (Figure 9a) exhibits an extremely high correlation of 0.9735, suggesting nearly identical measurements, similar to Row 3 with a correlation of 0.9737 (Figure 9c) and Row 4 with a correlation of 0.976 (Figure 9d). Row 2 also shows a strong positive correlation of 0.9457 (Figure 9b), slightly lower than Row 1. Row 5 has the lowest coefficient of 0.943 (Figure 9e); it still indicates a strong positive correlation, suggesting only minor discrepancies. Row 6 has a near-perfect correlation of 0.9938 (Figure 9f), implying almost identical measurements, while Row 7 shows the highest correlation of 0.9982 (Figure 9g), indicating virtually identical growth measurements. The correlations for all rows are very high, mostly above 0.95, which quantifies the strong linear relationship observed in the scatterplots. While the overall trend is very strong, there are slight variations in the degree of correlation across rows. For instance, Row 2 and Row 5 have slightly lower correlation coefficients than the others, indicating minor discrepancies between the two methods in these rows, possibly due to localized factors affecting the crop growth measurements. Across all rows, the consistent pattern is that crop growth observed through ortho-mosaics tends to be higher than that observed through aerial imagery, suggesting that ortho-mosaics provide a more sensitive or higher-resolution method for detecting crop growth variations over time.
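The row-wise coefficients follow the standard Pearson formula; a self-contained sketch (the two series shown are illustrative, not the measured values from Table 7):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Toy aerial vs ortho-mosaic growth series for one crop row.
aerial = [1.0, 1.4, 1.9, 2.6, 3.1, 3.3]
ortho = [1.1, 1.5, 2.1, 2.8, 3.4, 3.6]
print(round(pearson_r(aerial, ortho), 4))
```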
3.4. Two-Way ANOVA Test
The results of the two-way ANOVA indicate that both the date of observation and the type of observation, i.e., aerial imagery vs. ortho-mosaics, significantly affect the observed growth (Table 8). Specifically, the effect of the date on growth is highly significant (F(5, 72) = 80.88, p < 0.001), as is the effect of the observation type (F(1, 72) = 123.13, p < 0.001). Furthermore, there is a significant interaction between the date and the type of observation (F(5, 72) = 7.05, p < 0.001), indicating that the influence of the date on growth depends on whether the data were collected through aerial imagery or ortho-mosaics. These findings suggest that both the timing and the method of observation are crucial in assessing growth patterns.
Effect of Date: The sum of squares (SS) for the effect of date is 46.6858, representing the total variability in growth due to different dates. With degrees of freedom (dF) equal to 5 (the number of dates minus one), the F-value is 80.8775. This high F-value indicates that the variability in growth attributed to the date is much larger than the random error. The p-value for this factor is extremely small, at 3.86 × 10⁻²⁸, indicating that the effect of the date on growth is statistically significant: changes in date have a significant impact on growth measurements.
Effect of Observation Type: For the effect of type, the sum of squares is 14.2154, representing the total variability in growth due to the type of observation (aerial vs. ortho-mosaics). The degrees of freedom for this factor equal one (the number of types minus one). The F-value is very high at 123.1324, indicating that the variability in growth due to the type is significantly larger than the random error. The p-value is 3.02 × 10⁻¹⁷, which is extremely small. This result suggests that the type of observation has a statistically significant effect on growth.
Interaction between date and type of observation: The interaction between date and type has a sum of squares of 4.0712, representing the variability in growth due to the interaction between these two factors. The degrees of freedom for this interaction are 5. The F-value for the interaction effect is 7.053, indicating a significant interaction between date and type. The p-value for this interaction is 2.04 × 10⁻⁵, a small value indicating that the interaction effect on growth is statistically significant. This means that the effects of the date and the type of observation on growth are inter-related.
Residual: The residual sum of squares of 8.3122 represents the unexplained variability, or random error, in the data. The degrees of freedom for the residuals are 72, calculated as the total number of observations minus the total degrees of freedom for the factors and their interaction. F-values and p-values are not applicable to the residuals, as they constitute the error term.
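As a consistency check, the reported F-values follow directly from the tabulated sums of squares and degrees of freedom (mean square = SS / dF; F = MS_factor / MS_residual):

```python
# Reproduce the F-values of Table 8 from the reported SS and dF.
ss = {"date": 46.6858, "type": 14.2154, "interaction": 4.0712, "residual": 8.3122}
df = {"date": 5, "type": 1, "interaction": 5, "residual": 72}

ms_resid = ss["residual"] / df["residual"]  # residual mean square
for factor in ("date", "type", "interaction"):
    f_value = (ss[factor] / df[factor]) / ms_resid
    print(f"{factor}: F = {f_value:.2f}")
# prints: date: F = 80.88, type: F = 123.13, interaction: F = 7.05
```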
4. Conclusions
The study highlighted the effectiveness of using multiple sets of aerial images and ortho-mosaics taken at different dates to monitor the growth of the cauliflower crop. The integration of YOLOv8x for approximate bounding box detection around the cauliflowers was followed by precise segmentation with the Grounded SAM to obtain segmentation masks for each crop row across different dates. This combination leverages YOLOv8's detection strength and the Grounded SAM's segmentation accuracy, improving the overall performance of leaf- and canopy-bound precise segmentation over individual crop rows.
Both datasets yielded consistent and comparable information regarding the relative growth rates and patterns of the crop rows across time. The aerial imagery showed clear patterns with fluctuations and overall decreases in relative growth percentages, while the ortho-mosaics displayed notable variations with a general decrease over time. The time-series analysis revealed intervals of rapid expansion and consistent growth, interspersed with intermittent fluctuations. Both approaches indicated the possibility of reaching saturation points near the conclusion of the period. The Pearson correlation analysis revealed robust positive correlations between the two datasets, indicating a high level of agreement in growth measurements. However, it was observed that the ortho-mosaics exhibited more sensitivity in detecting observed growth differences. The two-way ANOVA test verified that both the date of observation and the manner of observation had a substantial impact on the observed growth patterns.
However, some limitations were observed, such as slight inconsistencies between the datasets in particular rows. These may be attributable to localized factors, like the removal of overlapping weeds and of crops damaged by attacks from predatory birds and foxes, impacting the assessments of crop growth. To enhance future analyses, it would be advantageous to address these inconsistencies by including additional locally specific data points, improving the accuracy of the measurement techniques, and performing the experiments in a controlled environment.
Automated annotation greatly speeds up the examination of crop growth by quickly processing vast amounts of visual data to create segmentation masks. This automation minimizes the requirement for human involvement, enabling faster detection and measurement of crop phenology and growth patterns. Through the utilization of deep learning methods, automatic annotations can provide a uniform and unbiased evaluation, improving the overall effectiveness and precision of crop development monitoring. This not only expedites the analysis process but also facilitates more frequent and meticulous observations, resulting in enhanced agricultural practices and well-informed, data-driven decision making.
This study is useful for interpreting information from segmentation masks derived from automatically annotated crop phenology. It provides a reliable method for monitoring and analyzing crop growth patterns over time. The noteworthy findings from both datasets emphasize the potential of incorporating these methods into automated agricultural monitoring systems.