1. Introduction
Modern agriculture faces diverse challenges that extend beyond traditional farming, encompassing broader social and environmental concerns. For instance, early diagnosis of crop emergencies such as pests, diseases, weeds, and water or nutrient deficiencies is critical, as these issues can lead to severe harvest losses and economic damage, exacerbated by climate-change-induced extreme weather. Early detection enables targeted interventions to mitigate these impacts. Another significant example is precision farming, which involves applying treatments (e.g., pesticides, fertilizers, water) at variable rates based on field heterogeneity. Both challenges share a reliance on chemical products, posing dual drawbacks: (i) high costs for purchase and application and (ii) negative impacts on the environment and human health. Traditionally, diagnosing emergencies involves experts inspecting crops, often in rotation. However, this method is increasingly impractical. For instance, a 1-hectare crop with 2-m inter-row spacing requires inspecting 5 km of linear development. Daily monitoring is infeasible, as the average EU farm has 18 hectares per worker (2020 data, authors’ calculations based on [
1,
2]: this ratio is purely indicative, as the distribution of land among farmers is highly uneven). As for precision farming, while advanced machinery supports such practices, it often remains underutilized, partly due to a lack of the input data required by onboard systems in the form of prescription maps detailing specific treatments. Traditional methods for generating such maps—using satellite images for large-scale grain crops or drones (Unmanned Aerial Vehicles, UAVs) for smaller or detail-intensive cultivation—are subject to technical and practical constraints, including limited image resolution and high service costs.
AI offers a potential approach to these challenges. Deep learning (DL) models, particularly convolutional neural networks (CNNs; see [
3]), excel in automatically classifying images without human input. This theoretically enables the development of automated systems integrated into agricultural machinery and terrestrial rovers (Unmanned Ground Vehicles, UGVs) capable of autonomous operation. These systems could, in principle, continuously monitor crops, identify potential emergencies, and report their nature and location to human experts. Similarly, it is conceivable to equip smart machinery, such as booms, with advanced sensors and electromechanical systems to identify targets (e.g., weeds) on the move and adapt their actions accordingly.
However, these applications, along with many others of significant potential interest, are inherently constrained by a fundamental requirement: real-time processing. Consequently, the traditional cloud-centric IoT approach—where field data are transmitted to remote computers via a local wireless network (and potentially the Internet) for processing before being sent back to the crop—is largely impractical.
Fortunately, recent advancements in AI models and hardware for deep learning acceleration have introduced a significant breakthrough. Image classification and segmentation can now be performed directly on edge devices with limited computational power and low energy consumption, making them ideal for integration into UGVs and agricultural machinery. This synergy of AI and IoT is commonly referred to as edge-based AIoT. Edge-based AIoT may enable the practical implementation of innovative agricultural concepts that were previously unfeasible, with autonomous robotics and real-time precision farming being two prominent examples. Smart UGVs [
4] can mimic human experts by inspecting crops, detecting potential emergencies, providing initial risk classifications, and identifying exact locations within the field. Real-time precision farming, on the other hand, may involve tractors equipped with chemical tanks, hydraulic circuits, and a rear boom featuring RGB cameras and edge computers. These systems leverage GPUs or other hardware accelerators to classify plant images in real time, enabling precise detection of weeds or fungal diseases and triggering targeted, AI-guided spraying. Unlike ideas proposed in the past, which focused on simplified situations manageable with low-complexity methods (suitable for the hardware available at the time), advancements in the performance of next-generation devices now enable true real-time applications, with precision treatments performed during standard field operations.
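As a rough illustration of the real-time constraint such systems must satisfy, the following sketch runs a placeholder classifier over synthetic frames and checks each decision against a per-frame latency budget. The 50 ms budget, the green-channel heuristic, and all function names are our own assumptions for illustration, not taken from any cited system.

```python
# Schematic real-time loop for on-the-go weed detection. A dummy
# classifier stands in for an edge-deployed CNN; frame size, frame
# budget, and the decision rule are illustrative assumptions only.
import time
import numpy as np

def dummy_classifier(frame: np.ndarray) -> bool:
    """Placeholder for an edge-accelerated model: flags 'weed' if the
    mean green-channel intensity exceeds a threshold."""
    return bool(frame[..., 1].mean() > 128)

def process_stream(frames, budget_s: float = 0.05):
    """Classify each frame and verify it fits a 50 ms budget (20 fps),
    as would be needed to trigger a sprayer nozzle on the move."""
    decisions = []
    for frame in frames:
        t0 = time.perf_counter()
        decisions.append(dummy_classifier(frame))
        if time.perf_counter() - t0 > budget_s:
            raise RuntimeError("missed real-time deadline")
    return decisions

rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(4)]
print(process_stream(frames))
```

The point of the sketch is only the control structure: inference must complete within a fixed per-frame budget, or the actuation opportunity is lost as the boom moves past the plant.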
Figure 1 displays technical schematics of the systems described above.
This technology is not yet fully developed or available as an off-the-shelf solution. The examples in
Figure 1 represent research and development prototypes rather than commercial products. However, the current state of the ICT arsenal—including AI models, datasets, dedicated hardware, and advancements in deep learning theory—has matured sufficiently to enable practical implementations of edge-based AIoT systems in agriculture.
This review examines the key components necessary to develop expert systems for agriculture by leveraging advancements in edge-based AIoT technology. Four priorities have been identified for the implementation of such solutions:
High-Quality Training Datasets: cultivated plants exhibit significant phenotypic variability and produce fruit within narrow timeframes, making it challenging to create high-quality datasets. Innovative approaches are essential to address this issue.
Deep Learning Models: the rapid evolution of DL models for computer vision demands careful selection. Choosing the appropriate model can determine the feasibility of real-time inference and enable effective performance even with limited datasets.
Hardware Awareness: a thorough understanding of state-of-the-art hardware is critical. Manufacturers offer diverse solutions based on varying philosophies, making it essential to evaluate the global landscape and select the most suitable option for each application.
Emerging Deep Learning Methods: recent advancements in DL show game-changing potential. While not yet integrated into standard libraries and less accessible than established models, they warrant close monitoring for future agricultural applications.
We aim to provide a comprehensive review of the latest findings on the topics above, drawn from a broad range of research, including both theoretical works and practical references related to ICT technologies. Our goal is also to offer a useful toolkit for those designing and implementing modern agriculture systems based on cutting-edge AIoT solutions. However, given the rapidly evolving nature of these topics, we note that this review may become outdated within a few years. For this reason, we have focused on the most recent papers, prioritizing those aligned with the latest advancements in the field. We have intentionally limited the analysis of certain well-established aspects of ‘classical’ IoT for agriculture, such as sensor technology and wireless networks in the field. The pace of innovation in these areas is not as rapid as in the four priorities outlined above. Additionally, several recent reviews already address these topics, such as [
4] for sensors related to computer vision, [
5] for general-purpose sensors in precision farming, and [
6] for wireless communication technologies in agriculture. Finally, we note that AIoT can support a wide range of applications relying on diverse data types, such as meteorological information, soil parameters, and chemical-physical measurements from plant leaf sensors. This review mostly focuses on image data and real-time methodologies for its analysis, with a particular emphasis on implementations in autonomous systems: nevertheless, in
Section 7.2, we briefly review new trends in wireless communication and sensor technology.
This work is organized as follows. In
Section 2, we present the background and related literature on AIoT and deep learning classifiers for agriculture. In
Section 3, we describe the process used to search and compile the bibliography for this review.
Section 4 covers the limited public image datasets available in this field. In
Section 5, we highlight the most relevant works on computer vision algorithms.
Section 6 reviews findings on synthetic dataset generation. In
Section 7, we discuss recent advances in IoT and real-time systems.
Section 8 outlines the core concepts behind the most promising emerging neural network architectures, detailing their strengths and weaknesses compared to traditional deep learning philosophy. Finally,
Section 9 and
Section 10 address future directions and provide a discussion and conclusions, respectively.
2. Related Work
For a systematic, general picture of digital agriculture and its transition from level 4.0 to 5.0, we recommend [
7]. They provided a clear representation of the current status of ICT technology applications in modern farming, including IoT and AI models for data and image analysis. Only a few studies specifically address AIoT for agriculture. Ref. [
8] presents a comprehensive review of general IoT and AI applications in disease identification, farm monitoring, and agricultural data analysis. Unlike our approach, their work does not emphasize edge computing for agriculture but instead highlights the importance of 5G and broader network infrastructures, focusing primarily on the cloud-centric AIoT paradigm. Their section on challenges in technology adoption is particularly insightful. Notably, one of the issues they address—privacy and data security—can be significantly mitigated by transitioning from a cloud-centric approach to a hybrid model, where most data are collected and processed locally in real time. Another notable work on AIoT for agriculture is [
9], a research topic book encompassing 12 papers that cover a wide range of applications. Among them, Sun et al. [
10] presented a highly interesting approach for rapid pear detection to facilitate robotic harvesting, tested even under nighttime conditions. Zhou et al. [
11] described a method for detecting robot pathways for navigation between vineyard rows (‘road extraction’) and proposed an algorithm for fruit detection. Their proposed solutions are innovative, although the authors acknowledged certain technical limitations and suggested directions for further development. Wang et al. [
12] addressed one of the major challenges in applying deep-learning-based computer vision models in agriculture—the lack of data—by proposing an effective method for adaptive image augmentation.
AI technology for greenhouse agriculture deserves separate consideration, given the significant differences compared to open field crops. These differences include environmental conditions, types of emergencies, data characteristics, sensors, and more. Akbar et al. [
13] provided an exhaustive review of AI technology for smart greenhouses, covering crop growth monitoring; recognition and classification; pest, disease, and insect detection; yield estimation; and weed management. Notably, they also outlined the major limitations affecting current computer vision technology applied to this field, particularly the lack of high-quality datasets, the challenge of integrating disparate sensors carrying different types of physical information, and, even more importantly, the shortage of specialized, interdisciplinary human experts in the field.
Several reviews are devoted to AI-based image classification for agriculture in open fields: in the following, we provide a selection of remarkable works published in the last five years. Tripathi and Maktedar [
14] analyzed the role of computer vision (CV) models for fruits and vegetables, stressing the importance of effective data preprocessing, which is often underrated. They also paid special attention to the mathematics behind crucial steps like feature extraction, and the computation and assessment of descriptors through similarity measures. In [
15], IoT takes center stage, emphasizing sensor technology for agrometeorological data (temperature, moisture, etc.), soil parameters (pH, water content, etc.), Wireless Sensor Networks, and robotics. Early, autonomous recognition of diseases and pests is a key activity: consequently, several authors focus on this area. Hasan et al. [
16] investigated machine learning (ML) methods, including unsupervised models, concentrating on feature representation in both classical ‘shallow’ classifiers and modern DL classifiers. The issue of real data availability is discussed, along with solutions to manage practical problems like the scarcity of examples through data augmentation. Mishra et al. [
17] analyzed how this general problem is managed using autonomous platforms, mainly through spectral imaging, i.e., the convergence of traditional imaging and spectroscopy. They also provided an interesting explanation of why near-infrared (NIR) imaging is important for detecting plant diseases, which we highly recommend. Ngugi et al. [
18], on the other hand, limited their investigation to methods based on ‘standard’ RGB imagery, mainly for practical and affordability reasons. These are important considerations for methods suitable for a large farming audience. They compared results from both hand-crafted feature extraction and deep learning, concluding that the combination of the two leads to higher performance. Orchi et al. [
19] provided a valuable review on the taxonomy of main plant diseases and related symptoms, with a quantitative survey of 10 deep learning methods applied to detect them. We particularly appreciate their critical examination of unresolved challenges such as insufficient and/or unbalanced data, symptom similarity, image segmentation, and practical issues encountered when taking real images in the field, such as lighting conditions and camera use. They proposed several beneficial strategies to tackle these problems.
In recent years, there has been a trend towards specialized reviews focusing on specific crops, exploring AI models tailored to address their particular challenges. Omaye et al. [
20] explored plant disease detection, focusing on its application to four major species: apple, cassava, cotton, and potato plants. They addressed key points related to these crops, including the main diseases they are subject to, the trends in deep learning and machine learning models for their detection, and opportunities and recommendations for designing future strategies. Similarly, Jafar et al. [
21] studied the major diseases affecting four reference crops: tomato, chilli, potato, and cucumber; they stressed the possible benefits of integrating AI with IoT technology, particularly drones.
Weed detection deserves special consideration due to its significant impact on farm economics, yet recognizing weeds is often challenging due to their phenotypic similarity to cultivated plants (sometimes the result of Vavilovian mimicry). Hasan et al. [
22] provided a comprehensive review of the weeds landscape and detection methods, comparing traditional machine learning with deep learning models. The authors dedicated significant attention to the entire workflow, including in-field data acquisition using UAVs, robots, and other vehicles, as well as remote sensing and the utilization of public data repositories. They emphasized image preprocessing and preparation, highlighting the crucial step of annotation. AI models are also thoroughly reviewed, discussing their training and performance. Murad et al. [
23] presented another comprehensive systematic review on the topic. They clearly and effectively illustrated the widespread issue of weeds and their impact on cultivated plants. The review encompasses both classical machine learning and recent deep learning methods, offering valuable insights into existing datasets. Additionally, they showcased AI methods applied to weed detection, along with their Key Performance Indicators and, where available, the approximate computer time required for training. Intriguingly, they listed each weed along with its main target crops, the best AI model for detection, and the achieved accuracy. The review by Juwono et al. [
24] is noteworthy for its focus on the mathematical aspects of weed detection workflows using AI models, an often neglected area. The authors conducted a thorough analysis of background removal techniques, which are critical for accurate detection, employing various vegetation indices and thresholds. They also delved into the distinctions between biological, spectral, and texture features. Similar to [
23], they provided valuable information on the most effective AI models and their performance in detecting specific weeds. They also detailed publicly available datasets. Qu and Su [
25] adopted a similar approach, with the added value of two sections devoted, respectively, to (i) weed recognition applications, where they consider images from various sources, ranging from smartphones to drones and satellites and (ii) the integration of robots and IoT-enabled agricultural machines. Hu et al. [
26] focused on DL-based weed recognition in large-scale grain fields, appropriately illustrating the differences among image classification, object detection, semantic segmentation, and instance segmentation.
There is a final group of references that we believe are of special interest: those targeting the technological and algorithmic aspects of AI models for agriculture. Yang et al. [
27] addressed the recurrent problem of data scarcity using a few-shot learning (FSL) approach. In addition to the common data augmentation technique, they employed methods based on metric learning, external memory, and parameter optimization. Interestingly, they concluded that current agricultural few-shot learning is mostly theoretical and suggested the use of IoT to bring this technology to a truly operational setting. Ragu and Teo [
28] also dealt with object detection and classification using few-shot models, focusing on meta-learning algorithms. They summarized numerous research works on FSL for agriculture, along with the datasets used and the accuracy achieved. In [
29], a beneficial survey of works on semantic segmentation of agricultural images is presented. This technique allows for pixel-level descriptions of embedded objects. For instance, it is feasible to identify pests by extracting the texture, shape, and size of insects in the image. The authors provided examples of semantic segmentation based on thresholding, clustering, and deep learning, and finally, reviewed challenges and strategies to improve AI for semantic analysis of images in agriculture. Guerri et al. [
30] considered the analysis of hyperspectral images (HSIs) in agriculture, analyzing the issues that these data bring: redundancy of spectral bands, shortage of training samples, and the complex relationship between spatial positions and spectral bands. The results of several machine learning methods are illustrated, not limited to CNNs. HSI can also be used to shed light on soil properties like texture, moisture, nutrient content, carbon presence, and salinity. Regarding crops, HSI can help to obtain information about chlorophyll content, drought stress, and weed and disease detection [
31].
Contributions of This Study
While a few reviews on AIoT in agriculture exist, as previously illustrated, they primarily focused on integrating the traditional cloud-centric paradigm with the AI models available at the time of publication. In contrast, our work emphasizes edge-based AIoT for agriculture, a novel paradigm where most processing occurs directly in the field or nearby, close to the data source. Remote computers are reserved for specific tasks, such as long-term data storage, in-depth data analysis, and activities that do not require real-time processing. In general, we believe that the most effective ICT implementations in near-future agriculture will strongly rely on a well-balanced integration of edge and cloud computing paradigms, optimized to leverage the strengths of each.
In detail, our contributions can be summarized as follows:
A critical challenge in AIoT for agriculture is the scarcity of images and other data to train deep learning models, driven by factors such as the seasonality of crops, significant phenotypic variation across locations, and year-to-year variability within the same region. We provide a detailed review of three practical strategies to address this issue: utilizing open datasets, including the most recent ones, ‘classical’ data augmentation, based on simple geometric transformations of existing images, and generating synthetic images.
The landscape of deep learning models for computer vision is evolving rapidly. We focus on recent advancements that outperform existing methods and enable implementation on edge devices with limited hardware resources.
This progress is closely tied to next-generation computing devices, which surpass their predecessors in performance while maintaining moderate energy consumption. Unlike remote servers used in cloud computing, edge devices often face memory constraints. We review state-of-the-art methods tailored for edge deployment and the latest hardware innovations in this domain.
Recent neural network architectures, diverging from the traditional multi-layer perceptron approach, have emerged in the deep learning community. Although these models are not yet as mature as established methods, they hold significant promise and could soon surpass current technologies. We highlight two major trends in image classification and review the preliminary literature on these developments.
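The ‘classical’ data augmentation mentioned among the contributions above can be illustrated with a minimal sketch, assuming nothing beyond NumPy. The `augment` function and its particular transformation choices (flips and quarter-turn rotations) are our own illustrative simplification, not a method taken from the reviewed works.

```python
# Minimal sketch of 'classical' geometric augmentation: random flips
# and 90-degree rotations, the cheapest label-preserving transforms.
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random horizontal/vertical flip and a 90-degree rotation."""
    if rng.random() < 0.5:
        image = image[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]          # vertical flip
    k = int(rng.integers(0, 4))         # 0-3 quarter turns
    return np.rot90(image, k)

rng = np.random.default_rng(0)
img = np.arange(27).reshape(3, 3, 3)    # toy 3x3 RGB 'image'
augmented = [augment(img, rng) for _ in range(8)]
print(len(augmented), augmented[0].shape)  # → 8 (3, 3, 3)
```

In practice such transforms are applied on the fly during training (e.g., via a data-loader pipeline), multiplying the effective dataset size without storing new images.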
4. Public Datasets
Several surveys focus on datasets for precision agriculture. Notably, ref. [
41] reviewed public datasets for computer vision tasks in precision agriculture, including 15 datasets on weed control, 10 on fruit detection, and 9 on miscellaneous applications. They also provided recommendations for creating new image datasets, addressing aspects such as image acquisition, augmentation, annotation, and data sharing.
Table 1 lists the reviewed public datasets for computer vision tasks in precision agriculture, along with their size, the crops/weeds/fruits represented, the main tasks for which they could be used, and the countries where the photos were shot. The number of images shown indicates the number of raw photos and, where available, the number of images after augmentation or subdivision.
Some datasets have several crop species, usually with images of both healthy and unhealthy plants and sometimes with the corresponding weeds. One of the most used is the PlantVillage dataset [
42], which contains 54,309 images of 14 crop species captured in experimental research facilities connected to American Land Grant Universities, divided into 38 classes of healthy and unhealthy crops. The crops include apple, blueberry, cherry, corn, grape, orange, peach, bell pepper, potato, raspberry, soybean, squash, strawberry, and tomato, while the diseases include 17 fungal diseases, 4 bacterial diseases, 2 mold (oomycete) diseases, 2 viral diseases, and 1 disease caused by a mite. The PlantDoc dataset [
43] contains 2598 images of 13 plant species divided into 27 classes. It was created by downloading images related to the 38 PlantVillage classes using their scientific and common names from Google Images and Ecosia. The resulting 20,900 images were then manually selected according to APSNet guidelines. Unlike PlantVillage images, which have homogeneous backgrounds, PlantDoc images present natural environmental conditions. According to the authors, fine-tuning the models on PlantDoc instead of PlantVillage reduces the classification error by up to 31% in the classification or detection of images in real scenarios. The Plant Pathology Databases for Image-Based Detection and Recognition of Diseases (PDDB) [
44] include a dataset with 2326 original images of 171 diseases and other disorders affecting 21 plant species and one with 46,513 images, obtained by subdividing the original images. The Plant Seedlings Dataset [
45] contains 407 images with 960 unique plant seedlings belonging to 12 crop and weed species at several growth stages captured by using a consumer camera. Katra-Twelve [
46] is a dataset provided by an Indian university and contains 4503 images divided into 12 plant species with 22 healthy and diseased leaf types. The Cashew, Cassava, Maize, Tomato (CCMT) dataset [
47] is a dataset for crop diseases that includes 24,881 raw images from local farms in Ghana divided into 22 classes, then augmented to 102,976 through cropping and resizing. Chen et al. [
48] released a dataset captured infield with non-uniform illumination intensities and clutter field background conditions containing 400 images with 4 maize diseases and 500 images with 5 rice diseases.
Other datasets are specific to weeds. The leaf counting dataset [
49] contains 9372 images collected in fields across Denmark using several cameras, featuring 18 weed species or families at different stages of growth and, therefore, with a different number of leaves. The images are divided into nine classes according to the number of leaves in each plant and present several types of soil and light conditions. DeepWeeds [
50] contains 17,509 images of eight weed species collected by a robot from eight locations across northern Australia. The Open Plant Phenotype Database (OPPD) [
51] contains 7590 images with 64,292 plant seedlings of 47 weed species, cultivated under ideal, drought, and natural growth conditions. The authors provide bounding box annotations for 315,038 plant objects. Weed25 [
52] is a dataset of weeds containing 14,035 images related to 25 different weed species acquired from fields and lawns in China at different time points and with different weather conditions (sunny, cloudy, and rainy).
Most datasets are related to a single crop species, often with the corresponding weeds. Rice Leaf Disease Image Samples [
53] is a dataset of 5932 images that includes 4 kinds of rice leaf diseases found in the western region of the Indian state of Odisha: bacterial blight, blast, brown spot, and tungro. Li et al. [
54] made available a dataset of 7220 photos taken by mobile phone at a rice experimental base in China that includes three common rice leaf diseases: rice leaf blast, bacterial blight of rice, and flax leaf spot. CottonWeeds [
55] is a dataset of 7578 images of two cotton weed species cropped from 1737 photos captured with a consumer reflex camera under different weather conditions and at different day periods from an Indian cotton field. CottonWeedID15 [
56], CottonWeedDet3 [
57], and CottonWeedDet12 [
58] are three datasets of weeds that are common in cotton fields in southern USA. The images were acquired by either smartphones or hand-held digital cameras, under natural field light conditions, and at varied stages of weed growth. The size varies from the 848 images with 1532 bounding box annotations of CottonWeedDet3 to 5648 images with 9370 bounding box annotations of CottonWeedDet12. The Global Wheat Head Detection (GWHD) dataset [
59] contains 4700 high-resolution RGB images collected between 2016 and 2019 by nine institutions at ten different locations in various countries by using several cameras, depicting 190,000 labeled wheat heads. An updated version was released in 2021, incorporating 1722 additional images from five more countries, along with 81,553 new wheat head annotations [
60]. The authors also provided guidelines for the development of new wheat head detection datasets, with a discussion about image acquisition, metadata to be associated, and labeling methods. The Sugar Beets 2016 dataset [
61] contains 283 images divided into ten classes (sugar beet and nine types of weeds) and 12,340 images divided into three classes (crop, weed, and background). The images were acquired by a robot on a sugar beet farm over a period of three months under controlled lighting.
Table 1.
Public datasets for precision agriculture.
Reference | Dataset | Images | Plants/Diseases | Main Tasks | Location |
---|---|---|---|---|---|
[42] | PlantVillage | 54,309 | 14 crop species with 26 diseases | Crop disease classification | USA |
[43] | PlantDoc | 2598 | 13 plant species with 17 diseases | Crop disease classification | Worldwide |
[44] | PDDB | 2326/46,513 | 21 plant species with 171 diseases | Crop disease classification | Brazil |
[45] | Plant Seedlings | 407 | 12 crop and weed species | Plant Seedling Classification, weed classification | Denmark |
[46] | Katra-Twelve | 4503 | 12 plant species | Crop disease classification | India |
[47] | CCMT | 24,881/102,976 | Cashew, Cassava, Maize, Tomato | Crop disease classification | Ghana |
[48] | Plant Disease Detect | 900 | Four maize diseases and five rice diseases | Crop disease classification | China |
[49] | Leaf counting dataset | 9372 | 18 weed species or families | Weed classification, leaf counting | Denmark |
[50] | DeepWeeds | 17,509 | Eight weed species | Weed classification | Australia |
[51] | OPPD | 7590 | 47 weed species | Plant Seedling Classification, weed detection | Denmark |
[52] | Weed25 | 14,035 | 25 weed species | Weed detection | China |
[53] | Rice Leaf Disease Image Samples | 5932 | Four rice leaf diseases | Rice disease classification | India |
[54] | Rice disease pictures | 7220 | Three rice leaf diseases | Rice disease classification | China |
[55] | CottonWeeds | 7578 | Two cotton weed species | Cotton weed classification | India |
[56] | CottonWeedID15 | 5187 | 15 cotton weeds | Cotton weed detection | USA |
[57] | CottonWeedDet3 | 848 | Three cotton weeds | Cotton weed detection | USA |
[58] | CottonWeedDet12 | 5648 | 12 cotton weeds | Cotton weed detection | USA |
[59,60] | GWHD | 6422 | Wheat | Wheat head detection | Various countries |
[61] | Sugar Beets 2016 | 12,623 | Nine types of sugar beet weeds | Crop and weed segmentation | Germany |
[62] | Plant Pathology 2021—FGVC8 | About 20,000 | Apple leaf diseases | Apple disease classification | USA |
[63] | AppleLeaf9 | 14,582 | Eight apple leaf diseases | Apple disease classification | China, USA |
[64] | Potato Leaf Disease Dataset in Uncontrolled Environment | 3073 | Six potato diseases | Potato disease classification | Indonesia |
[65] | Aberystwyth Leaf Evaluation Dataset | 1676 | Arabidopsis in various stages of growth | Leaf evaluation | UK |
[66] | CD&S | 2112/4455 | Three corn diseases | Corn disease classification and detection, disease severity classification | USA |
[67,68] | Vegnet | 656 | Three cauliflower diseases | Cauliflower disease detection | Bangladesh |
[69] | Sun Flower Fruits and Leaves dataset | 467/1668 | Three sunflower diseases | Sunflower disease classification | Bangladesh |
[70] | Rice Seedling and Weed Dataset | 28/224 | Rice seedlings and weeds | Crop and weed segmentation | China |
[71] | GrassClover | 39,615 | Two types of clovers and three types of weeds | Clover, grass and weed segmentation | Denmark |
[72] | DeepFruits | 724 | Sweet pepper, rock melon, apple, avocado, mango, orange, and strawberry | Fruit detection | Australia |
[73] | Date fruit | 8079 | Date | Fruit detection, maturity analysis | Saudi Arabia |
Plant Pathology 2021—FGVC8 [
62], constructed by the Cornell Initiative for Digital Agriculture, contains about 20,000 images of several apple foliar diseases, in particular apple scab, cedar apple rust, Alternaria leaf spot and frogeye leaf spot, as well as healthy leaves. Photos were taken using a consumer reflex camera and smartphones under various illumination, angle, surface, and noise conditions. AppleLeaf9 [
63] contains 14,582 images of 8 apple leaf diseases, with 94% of them captured in the wild. The Potato Leaf Disease Dataset in Uncontrolled Environment [
64] contains 3073 images with seven classes, i.e., healthy leaves and leaves attacked by viruses, bacteria, fungi, pests, nematodes, and Phytophthora, captured by smartphone from potato farms located in Central Java, Indonesia. The Aberystwyth Leaf Evaluation Dataset [
65] comprises 1676 top-down, time-lapse visible spectrum images of Arabidopsis, acquired over 35 days with 15-minute intervals. It includes 916 annotated images, capturing 40 plants in various stages of growth. The Corn Disease and Severity dataset (CD&S) [
66] contains 4455 images related to three common foliar corn diseases, i.e., northern leaf blight, gray leaf spot, and northern leaf spot, of which 2112 are field images and 2343 are augmented images obtained by changing the background of half of the field images. These raw images are also annotated with bounding boxes. The other 515 images are categorized according to disease severity in a range from 1 (resistant) to 5 (susceptible). Vegnet [
67] includes 656 images of cauliflowers affected by three diseases, bacterial spot rot, black rot, and downy mildew, captured in Bangladesh using a digital camera. Uddin et al. [
68] released an annotated version with bounding boxes indicating the disease locations. The Sun Flower Fruits and Leaves dataset [
69] contains 467 sunflower images manually captured from a demonstration farm in Bangladesh with 3 types of diseases, then augmented to 1668 through rotation, scaling, shifting, noise addition, blurring, and brightness and contrast change. Ma et al. [
70] presented a dataset with 28 original images of rice seedlings with weeds captured in paddy fields in China, each divided into 8 tiles to obtain 224 smaller images of size 912 × 1024 pixels. The images are segmented into rice seedlings, weeds, and background.
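Tile splitting of this kind is straightforward to reproduce. The sketch below computes tile coordinates; the 2 × 4 grid and the 1824 × 4096 source size are assumptions inferred from the reported counts, not stated in the paper:

```python
def split_into_tiles(height, width, rows, cols):
    """Return (top, left, tile_h, tile_w) for each tile of a rows x cols grid."""
    tile_h, tile_w = height // rows, width // cols
    return [(r * tile_h, c * tile_w, tile_h, tile_w)
            for r in range(rows) for c in range(cols)]

# A hypothetical 1824 x 4096 source image split into a 2 x 4 grid yields
# 8 tiles of 912 x 1024 pixels, matching the reported tile size; 28 such
# images would give the 224 smaller images of the dataset.
tiles = split_into_tiles(1824, 4096, 2, 4)
```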
The GrassClover dataset [
71] contains images acquired with three digital cameras at two experimental sites and in the fields of three dairy farms in Denmark over an 18-month period. It contains dense populations of grass with two types of clovers and three types of weeds characterized by heavy occlusions. The training set contains 8000 synthetic images with pixel-wise class and instance labels, 31,600 unlabeled images, and 152 randomly selected biomass labels and corresponding training images, while the test set contains 15 manually labeled images and 238 biomass pairs. The synthetic images were generated by cropping out several plant species and plant parts from the original photos and adding them to soil background images after random rotation and scaling, together with an artificial shadow created with a Gaussian filter, until a preset leaf area index was reached.
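The synthetic-image generation loop described above can be sketched in simplified form. In this sketch cutouts are plain pixel-offset masks, the leaf area index is reduced to a coverage fraction, and rotation, scaling, and the Gaussian-filter shadow are omitted:

```python
import random

def composite_until_lai(canvas_h, canvas_w, cutouts, target_lai, rng=None):
    """Paste randomly chosen plant cutouts (lists of (dy, dx) foreground
    offsets) at random positions until the fraction of covered canvas
    pixels reaches the target leaf area index (here a coverage fraction)."""
    rng = rng or random.Random(0)
    covered, placements = set(), []
    total = canvas_h * canvas_w
    while len(covered) / total < target_lai:
        cut = rng.choice(cutouts)
        top, left = rng.randrange(canvas_h), rng.randrange(canvas_w)
        for dy, dx in cut:
            y, x = top + dy, left + dx
            if 0 <= y < canvas_h and 0 <= x < canvas_w:
                covered.add((y, x))  # pasted foreground pixel
        placements.append((top, left))
    return placements, len(covered) / total

# One hypothetical 4 x 4 cutout pasted until 30% of a 32 x 32 canvas is covered.
square = [(dy, dx) for dy in range(4) for dx in range(4)]
placements, coverage = composite_until_lai(32, 32, [square], 0.3)
```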
Finally, there are some datasets that are specific to fruit. DeepFruits [
72] contains between 54 and 170 images for each of 7 fruits: sweet pepper, rock melon, apple, avocado, mango, orange, and strawberry. The Date fruit dataset [
73] contains two subsets of images related to dates: the first one includes 8079 images captured in Saudi Arabia by three color cameras divided according to fruit variety, maturity and harvesting decision, while the second one contains the images, videos, and weight measurements of date bunches acquired during the harvesting period.
5. Computer Vision Algorithms
Computer vision algorithms for precision agriculture address several tasks, including disease, weed, and pest recognition, health monitoring, irrigation management, and fruit, plant, or leaf counting. Disease and weed recognition are two of the most analyzed tasks, and they can be further categorized into: (i) classification of images according to their main subject, either binary (healthy vs. unhealthy plants, or crops vs. weeds) or multi-class (identifying the depicted plant species/family or disease); (ii) detection of healthy and unhealthy plants, disease regions, or weeds in the images, usually using rectangular bounding boxes; and (iii) segmentation of the images, assigning each pixel to the class it belongs to (e.g., healthy/unhealthy plant or crop, weed, and background).
Table 2 and
Table 3 list, respectively, multi-crop and single-crop algorithms for computer vision tasks in precision agriculture proposed or analyzed by the reviewed publications, along with the datasets used, the augmentation techniques, and the best results obtained on these datasets.
For classification tasks, CNNs are widely used, sometimes in combination with vision transformers, a technology based on the attention mechanism that has obtained good results in natural language processing [
74]. For detection tasks, one of the most widely used methods is the YOLO one-stage algorithm [
75], a standard in computer vision, particularly among object detection algorithms for real-time applications: several authors have compared its versions or tried to improve them for tasks related to precision agriculture. There are few works related to segmentation, probably due to the difficulty of annotating individual pixels in the images, a fact reflected in the lack of public datasets annotated for this task. The distribution of the works on algorithms mirrors that of the works on datasets also in relation to crops: some publications propose methods for analyzing several crop species, while the majority address a single crop species, often rice or cotton. The following works belong to the first group. Zhao et al. [
76] proposed a method based on YOLOv5, adding two lightweight structures for extracting feature information to the CSP structure of the neck part, the Ghost module [
77] and the inverted residual structure, introduced in the MobileNetV2 architecture [
78], and proposed the CAM module, with an improved channel attention mechanism. They also proposed an improved bounding box prediction method and replaced the Generalized Intersection over Union (GIoU) loss function of the original YOLOv5s with the Distance-IoU (DIoU) loss, which considers both the overlap area and the distance between the box centroids, decreasing convergence time by minimizing the distance between the two target frames. Augmentation includes the mosaic data enhancement method, proposed by Bochkovskiy et al. [
79], in which four images are randomly cropped and merged into a single image: this method is used in the first 40% of training rounds, while in the remaining 60%, normal data augmentation such as flip, scaling, length and width distortion, and color gamut transformation is used. Their model achieved a mean Average Precision (mAP) of 95.92% on a dataset of 1319 images extracted from the PlantVillage dataset with five crops and eight disease types, to which transformations like Gaussian blurring, horizontal flip, random rotation, and random brightness change were applied to obtain 4079 images. Yu et al. [
80] proposed Inception Convolution and Vision Transformer (ICVT), a deep learning model that mixes the Inception architecture [
81] with the vision transformer [
82], obtaining an accuracy of 99.94% on the same PlantVillage dataset.
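The DIoU idea adopted by Zhao et al. adds a normalized centre-distance penalty to the IoU term; a minimal sketch for axis-aligned boxes (an illustration of the loss definition, not the authors' implementation) is:

```python
def diou_loss(box_a, box_b):
    """DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2):
    loss = 1 - IoU + d^2 / c^2, where d is the distance between the box
    centres and c the diagonal of the smallest enclosing box."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas for the IoU term
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared centre distance d^2
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal c^2 of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    return 1.0 - iou + (d2 / c2 if c2 > 0 else 0.0)
```

For identical boxes the loss is 0; for disjoint boxes it exceeds 1, the distance term pushing the prediction toward the target even when the IoU gradient vanishes.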
Table 2.
Multi-crop computer vision algorithms for precision agriculture.
Reference | Algorithms | Datasets | Augmentation | Best Results |
---|---|---|---|---|
[76] | Improved YOLOv5 with two lightweight structures for extracting feature information added to the neck part and a new channel attention mechanism | A dataset of 1319 images extracted from the PlantVillage dataset with five crops and eight disease types | Mosaic data enhancement, flip, scaling, length and width distortion, color gamut transformation | mAP = 95.92% |
[80] | ICVT, a mix between the Inception architecture and the vision transformer | PlantVillage | | Accuracy = 99.94% |
[83] | VGG-ICNN, a CNN model with four layers from VGG16 and three blocks from GoogleNet InceptionV7 | PlantVillage, a dataset derived from the PDDB dataset with 18 plant species, the Apple Dataset from the plant pathology challenge, and the two subsets of Plant Disease Detect related to maize and rice diseases | | Accuracy = 99.16% on PlantVillage, accuracy = 93.66% on PDDB |
[84] | Lighter MobileNetV3-small model with two layers quantized with lower-precision representations | PlantVillage | | Accuracy = 99.50% |
[85] | A downscaled Inception architecture, a downscaled residual architecture, and a downscaled dense residual architecture | PlantVillage | Cropping, padding, vertical and horizontal flip, translations, scaling, shearing, rotation, image sharpening, dropping of pixel values and color channels, addition of Gaussian noise and Gaussian blur | Accuracy = 96.75% with the dense residual architecture |
[86] | DFN-PSAN, a network with multi-level deep information feature fusion where feature extraction is carried out by an improved version of YOLOv5n with a PSAN classification layer and label smoothing | Katra-Twelve, BARI-Sunflower, containing 467 images of delicate leaves and infected sunflower leaves and flowers from Bangladesh, FGVC8, PlantVillage | A weather augmentation technique that simulates solar flares, the effect of rain, fog, and shadows from leaf shading | Accuracy = 99.89% on PlantVillage |
[52] | YOLOv3, YOLOv5, and Faster-RCNN | Weed25 | | mAP = 92.4% with YOLOv5 |
[87] | Small Inception model applied on square patches with a size of 32 × 32 pixels | PlantVillage version with leaves segmented from the background, 108 images from the PDDB dataset | | Accuracy = 94.04% on PlantVillage, Accuracy = 97.22% on the images from the PDDB dataset |
[88] | DIC-Transformer, a mix between Faster R-CNN and Swin Transformer with image caption generation | Dataset with 3971 images of 10 plant species affected by 18 diseases | | Accuracy = 85.4% |
Table 3.
Single-crop computer vision algorithms for precision agriculture.
Reference | Algorithms | Datasets | Augmentation | Best Results |
---|---|---|---|---|
[89] | DenseNet121, InceptionV3, MobileNetV2, ResNeXt101, ResNet152V, SEResNeXt101, an ensemble stack of DenseNet121, EfficientNetB7 and XceptionNet | Dataset with 900 images of nine rice diseases | Horizontal and vertical flip, distortion, shear transformation, rotation from −15° to 15°, rotations by multiples of 90°, skewing and intensity change | Accuracy = 97.62% with the ensemble of DenseNet121, EfficientNetB7 and XceptionNet |
[54] | Improved YOLOv5s with a reduced workload in the backbone network | Dataset with 7220 images of rice diseases | Mosaic data enhancement, cropping, scaling, flip, translation, rotation | mAP@0.5 = 98.65% |
[90] | MSDB-ResNet, a multi-scale dual-branch model with a GAN and a ConvNeXt residual block incorporated into a ResNet | Rice Leaf Disease Image Samples | Cropping, scaling, mirroring, brightness change, motion blur | Accuracy = 99.34% |
[91] | VGG-16 with an improved generalization in rice leaf detection | Rice Leaf Disease Image Samples | Cropping, tilting, rotation, blurring | Accuracy = 99.7% |
[55] | 11 deep learning models for image classification with cross-entropy and weighted cross-entropy losses, three YOLOv5 models | CottonWeeds | | Accuracy = 95.43% with MobileNet, mAP@0.5 = 87.5% with YOLOv5x |
[92] | 13 single-stage and two-stage object detectors based on deep learning | Reduced version of CottonWeedDet3 | Horizontal flip, shadow, rotation by 90°, brightness and contrast change, HSV and RGB shift, snow and rain, fancy PCA, blur, Gaussian noise | mAP@0.5 = 79.98% and mAP@[0.5:0.95] = 62.97% with RetinaNet R101-FPN |
[58] | 25 YOLO object detectors | CottonWeedDet12 | Horizontal and vertical flip, rotation, compression, fancy PCA, brightness and contrast change, RGB shift, Gaussian and multiplicative noise, blur | mAP@0.50 = 95.63% and mAP@[0.5:0.95] = 90% with YOLOv4 |
[93] | EADD-YOLO, a model based on YOLOv5 with shufflenet inverted residual blocks inserted in the backbone and depthwise convolutions inserted in the neck | Dataset with 26,377 images of apple leaf diseases | | mAP = 95.5% |
[94] | DBCoST, a Dual-branch model with a CNN branch derived from the ResNet-18 model and a Transformer one derived from the Swin Transformer Tiny | Subset of FGVC8 with five disease types, AppleLeaf9 + 756 images from FGVC8 | Horizontal and vertical flip, rotation, color jitter, normalization, Gaussian and salt-and-pepper noise | Accuracy = 97.32% on subset of FGVC8, accuracy = 98.06% on AppleLeaf9 + FGVC8 |
[95] | Enhanced YOLOX-Tiny with hierarchical mixed-scale units and convolutional block attention modules added to the neck part | Dataset with 340 images of tobacco crops with brown spot disease | | AP = 80.56% |
[96] | Lighter YOLOv5s with ghost convolution, an involution operator, an attention mechanism and a Content-Aware ReAssembly of Features | Dataset of 2246 images with seven strawberry diseases, PlantDoc | Mosaic data enhancement, HSV enhancement, brightness change, target occlusion | mAP@0.5 = 94.7% on their dataset, mAP@0.5 = 27.9% on PlantDoc |
[97] | Lesion Proposal CNN | Dataset of 3411 images with strawberry diseases | | Accuracy = 92.56% |
[68] | Cauli-Det, an improved YOLOv8 with three additional convolution blocks and Hard Swish | VegNet | | mAP@0.5 = 90.6%, mAP@[0.5:0.95] = 69.4% |
[98] | Ten deep learning models | Dataset of 656 images with four classes of cauliflower diseases | | Accuracy = 99.90% with EfficientNetB1, F1 = 99.62% with Xception |
[99] | Faster-RCNN improved by employing the ResNet-50 model with the use of spatial channel attention as the underlying network for computing deep keypoints | CD&S | | Accuracy = 97.89%, mAP = 94% |
[100] | Model based on a masked autoencoder with a Vision Transformer as the backbone structure | Dataset of 3256 images with two classes of potato diseases, CCMT | Cropping, horizontal flip, rotation | Accuracy = 99.35% on their dataset, accuracy = 95.61% on CCMT |
[101] | LeafSpotNet, a deep learning framework with a classification model based on MobileNetV3 | Dataset with 2000 images with jasmine plant diseases | Conditional GAN | Accuracy = 97% |
Thakur et al. [
83] introduced a lightweight CNN model for the identification of crop diseases, VGG-ICNN, with 6 million parameters, fewer than most top-performing deep learning models. This model has seven convolution layers: four initial layers from VGG16, mixed with two max pooling layers and pre-trained on ImageNet [
102], and three blocks from GoogleNet InceptionV7, randomly initialized, followed by a Global Average Pooling layer instead of a flattening layer to reduce the number of trainable parameters. Final classification is carried out by a fully connected layer with a SoftMax activation function. Their model is compared with four other crop disease detection and classification algorithms and five standard lightweight models on five different datasets: PlantVillage, one derived from the PDDB dataset with 18 plant species, the Apple Dataset from the plant pathology challenge, and the two subsets of Plant Disease Detect related to maize and rice diseases. The proposed model outperforms all the other methods on all the datasets except MobileNetV2, which does better on two of the five datasets: in particular, it reaches an accuracy of 99.16% on the PlantVillage dataset and of 93.66% on the subset of the PDDB one. Khan et al. [
84] proposed a model specific for edge computing devices. They quantized the "Linear" and "Conv2d" layers of the MobileNetV3-small model to lower-precision 8-bit representations using the PyTorch built-in quantization tool. The resulting model has 0.9 million parameters but, pre-trained on ImageNet data, maintains an accuracy of 99.50% on the PlantVillage dataset.
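PyTorch's quantization tooling handles the mapping internally, but the underlying idea of storing layer weights as 8-bit integers can be sketched as a simple affine (scale and zero-point) quantizer. This is an illustrative stand-in for the concept, not the library's actual code:

```python
def quantize_8bit(weights):
    """Affine 8-bit quantization: map floats to integers in [0, 255]
    using a scale and a zero point, as in standard uint8 post-training
    quantization schemes."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the 8-bit representation."""
    return [(qi - zero_point) * scale for qi in q]

# Round-trip a few illustrative weights; the reconstruction error stays
# below one quantization step (the scale).
w = [-0.51, -0.2, 0.0, 0.3, 0.49]
q, s, zp = quantize_8bit(w)
w_hat = dequantize(q, s, zp)
```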
Macdonald et al. [
85] introduced three lightweight deep learning architectures for classifying leaf diseases. According to the authors, full-scale state-of-the-art models designed for general purpose image classification tasks have a certain degree of redundancy when applied to objects like plants that exhibit similar shapes and sizes. In particular, they proposed a downscaled Inception architecture that includes Inception blocks from the GoogLeNet architecture [
81], a downscaled residual architecture that includes residual blocks from the ResNet architecture [
103], and a downscaled dense residual architecture that includes dense residual blocks from the DenseNet architecture. Augmentation includes vertical and horizontal flip, translations, scaling, shearing, cropping, padding, rotation, addition of Gaussian noise and Gaussian blur, image sharpening, and dropping of pixel values and color channels. The downscaled dense residual architecture achieves an accuracy of 96.75% on the PlantVillage dataset with an inference runtime of 31.7 ms on an NVIDIA RTX 3080 Laptop GPU, a decrease in accuracy of only 1.25% from the full-scale DenseNet-121 model, which has 31× more parameters. Dai et al. [
86] introduced DFN-PSAN, a network with multi-level deep information feature fusion. Feature extraction is carried out by an improved version of the YOLOv5n algorithm, where the convolutional neural network of the classification layer is replaced by a PSAN classification layer, which uses the PSA module [
104] for multi-scale feature fusion. Label smoothing [
105] was added to the cross-entropy loss function to reduce the risk of model overfitting. Their model obtained average accuracies between 93.24% and 98.37% in a k-fold cross-validation with the datasets Katra-Twelve, BARI-Sunflower, containing 467 images of delicate leaves and infected sunflower leaves and flowers from Bangladesh, and FGVC8. Furthermore, the proposed model achieves an accuracy of 99.89% on the PlantVillage dataset. They also used a weather augmentation technique to simulate solar flares and the effect of rain, fog, and shadows from leaf shading, which according to them causes an improvement in accuracy of between 0.69% and 2.99% on the three datasets. Their work includes an analysis of other attention mechanisms, among which ParNet [
106] reaches the highest accuracy of 94.78% on the FGVC8 dataset, and an interpretability analysis through the SHapley Additive exPlanation (SHAP) method [
107]. Wang et al. [
52] compared YOLOv3, YOLOv5, and Faster-RCNN on their Weed25 dataset, obtaining mAPs, respectively, of 91.8%, 92.4%, and 92.15%.
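The label smoothing that Dai et al. add to the cross-entropy loss replaces the one-hot target with a softened distribution; a minimal sketch of the standard formulation (the epsilon value here is illustrative):

```python
import math

def smooth_labels(num_classes, true_class, eps=0.1):
    """Replace a one-hot target with (1 - eps) extra mass on the true
    class and eps / num_classes spread uniformly over all classes."""
    base = eps / num_classes
    target = [base] * num_classes
    target[true_class] += 1.0 - eps
    return target

def cross_entropy(target, probs):
    """Cross-entropy between a (smoothed) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Smoothed target for class 2 of 4: wrong classes keep a small nonzero
# probability, which discourages over-confident predictions.
target = smooth_labels(4, 2, eps=0.1)
loss = cross_entropy(target, [0.05, 0.05, 0.85, 0.05])
```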
Bouacida et al. [
87] introduced a method for generalizing the recognition of plant diseases to all plant and disease types by identifying the disease itself instead of considering only the visual appearance of the diseased leaf. To do so, they split each leaf image into smaller patches that do not retain whole-leaf characteristics; in particular, they use square patches with a size of 32 × 32 pixels, discarding patches with a percentage of black pixels greater than that of the original image, which is estimated to be around 50%. At inference time, the prevalence rate
P of the disease is found by computing the percentage of unhealthy patches over all patches that make up the whole leaf:
P = N_u / (N_u + N_h) × 100,
where N_u is the number of unhealthy patches and N_h is the number of healthy ones. As CNN, they used the small Inception model, a version of GoogLeNet Inception designed for small input sizes, trained from scratch on the version of the PlantVillage dataset created by Mohanty et al. [
108], where the leaves are segmented from the background. Their system achieved an accuracy on the same dataset of 94.04% and an accuracy of 97.22% on 108 images randomly taken from the PDDB dataset.
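At inference time the patch-voting scheme above reduces to a simple ratio over per-patch predictions; a minimal sketch (per-patch classifier outputs are assumed to be booleans):

```python
def prevalence_rate(patch_predictions):
    """Percentage of patches classified as unhealthy over all patches
    making up the leaf; patch_predictions holds True for unhealthy."""
    unhealthy = sum(1 for p in patch_predictions if p)
    return 100.0 * unhealthy / len(patch_predictions)

# e.g. 3 unhealthy patches out of 12 gives a prevalence rate of 25%.
P = prevalence_rate([True, False, False, True, False, False,
                     False, True, False, False, False, False])
```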
Zeng et al. [
88] used image caption generation to generate textual descriptions of plant areas affected by diseases and also used it to improve disease classification. Their two-stage model, called DIC-Transformer, uses Faster R-CNN with Swin Transformer as backbone to detect the diseased area and generate its feature vector, then uses the Transformer to generate image captions and the image feature vector, weighted by text features to improve the disease prediction performance. Swin Transformer has been chosen among 16 different analyzed backbones implemented in two open-source frameworks, Detectron2 [
109] and MMDetection [
110]. According to the authors, thanks to the self-attention mechanism, Transformer-based caption generation handles long-distance dependencies better, offers parallel computing capabilities, extracts abstract features, and captures internal correlations in data or features. They also compiled a dataset containing 3971 images of 10 plant species affected by 18 diseases with descriptive information about their characteristics, ADCG-18. Images were selected in a two-step process: first, deep learning models identified images not related to agriculture; then, agricultural professionals filtered them manually. The authors compared DIC-Transformer to 11 state-of-the-art caption generation methods and 10 state-of-the-art classification models; their system obtained the best performance, with BLEU-1, CIDEr-D, ROUGE, and accuracy values of 75.6%, 450.52, 72.1%, and 85.4%, respectively.
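Among the caption metrics reported, BLEU-1 is essentially a clipped unigram precision; the sketch below omits the brevity penalty of full BLEU, and the example sentences are invented:

```python
from collections import Counter

def bleu1_precision(candidate, reference):
    """Clipped unigram precision used in BLEU-1: each candidate word is
    counted at most as often as it appears in the reference caption."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    return clipped / max(1, sum(cand.values()))

# 4 of the 5 candidate words appear in the reference -> precision 0.8.
p = bleu1_precision("brown spots on the leaf",
                    "brown spots cover the leaf surface")
```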
Other works concentrate on specific crops. Ahad et al. [
89] compared six CNN architectures (DenseNet121, InceptionV3, MobileNetV2, ResNeXt101, ResNet152V, and SEResNeXt101) using a dataset of 900 images with a white background, evenly divided into nine classes of rice diseases from Bangladesh. The augmentation included random rotation from −15° to 15°, rotations by random multiples of 90°, random distortion, shear transformation, horizontal and vertical flip, skewing, and intensity change, obtaining 10 augmented images for every original image. The DenseNet121 [
111] and InceptionV3 models achieved a maximum accuracy of 97%. They also proposed an ensemble stack of DenseNet121, EfficientNetB7, and XceptionNet based on a weighted voting scheme that reaches an accuracy of 97.62%. According to their findings, transfer learning can increase accuracy by up to 17%. Li et al. [
54] proposed an improved version of YOLOv5s that reduces the workload of the backbone network for the identification of rice diseases, also reducing the weight of the model by a factor of four and increasing prediction speed by a factor of three. In particular, they deleted the Focus layer to avoid multiple slice operations and replaced the C3 module in the backbone with a Shuffle block module, reducing the number of network parameters while capturing long-range spatial information. Their model obtains an mAP@0.5 of 98.65% and an mAP@[0.5:0.95] of 68.53% on their dataset of rice diseases, a decrease of 0.18% and 1.48% from YOLOv5s. The augmentation includes mosaic data enhancement, flip, random translation, random rotation, random scaling, and random cropping, obtaining 18,456 images from the original 7220 photos. They also experimented with another network based on YOLOv5s, incorporating squeeze-and-excitation modules [
112] and elements from the PP-Picodet network [
80], but it failed to produce satisfactory results. Hu et al. [
90] introduced MSDB-ResNet, a multi-scale dual-branch model that uses a GAN, incorporates the ConvNeXt residual block into the ResNet model to optimize the calculation ratio of the residual blocks, and adjusts the size of the convolution kernel of each branch to extract disease features of different sizes. The authors tested this model on a dataset of 20,000 images obtained from the Rice Leaf Disease Image Samples through data augmentation methods such as random brightness, motion blur, mirroring, cropping, and scaling, obtaining an accuracy of 99.34%. Ritharson et al. [
91] introduced a new architecture based on VGG-16 for improving generalization in the classification of diseases that infect rice leaves, substituting the three fully connected layers of the original architecture with five dense layers and two dropout layers with 50% and 60% activation. According to the authors, the new layers make the model more capable of recognizing intricate patterns and abstract features within the images. The VGG-16 model was chosen because it performed better than other pre-trained networks such as Xception, DenseNet121, InceptionResNetV2, InceptionV3, and ResNet50. Their model reaches an accuracy of 99.7% on the Rice Leaf Disease Image Samples dataset, which they subdivided according to the severity of the disease, i.e., according to the spread of infection over the surface of the leaf (mild or severe), and cleaned by removing duplicate, noisy, and blurred images. The augmentation includes tilting, rotation, cropping, and blurring.
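The weighted voting behind the ensemble of Ahad et al. can be sketched as a weighted average of per-model class probabilities; the weights and probability vectors below are illustrative, not taken from the paper:

```python
def weighted_vote(prob_lists, weights):
    """Combine per-model class-probability vectors by weighted averaging
    and return the index of the winning class."""
    num_classes = len(prob_lists[0])
    scores = [sum(w * probs[c] for probs, w in zip(prob_lists, weights))
              for c in range(num_classes)]
    return max(range(num_classes), key=scores.__getitem__)

# Three models (e.g. DenseNet121, EfficientNetB7, XceptionNet) voting over
# three classes; two of the three favour class 1, which wins the vote.
pred = weighted_vote(
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.5, 0.2]],
    [0.4, 0.35, 0.25],
)
```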
Saini and Nagesh [
55] compared 11 deep learning models for image classification on their CottonWeeds dataset, trained through transfer learning from ImageNet, among which MobileNet achieves the highest accuracy at 95.43%. They also tried a loss adapted from the Weighted Cross-Entropy introduced by Phan and Yamamoto [
113] to reduce the impact of class imbalance on training, achieving an improvement over the models trained with the standard Cross-Entropy on the minority weed class purple nutsedge. Three YOLOv5 models were then evaluated for object detection on the same dataset, and YOLOv5x achieved the best performance, with an mAP@0.5 of 87.5%. Rahman et al. [
92] compared 13 single-stage and two-stage object detectors based on deep learning for the detection of weeds in cotton on their dataset CottonWeedDet3 (cleaned by discarding annotations smaller than 200 × 200 pixels and out-of-focus images, and by excluding images with more than 10 bounding boxes) and studied the effect of data augmentation on detection accuracy. Training was carried out by fine-tuning the pre-trained weights obtained through the MS COCO Dataset [
114]. RetinaNet R101-FPN [
115] achieves the best performance, with an mAP@0.5 of 79.98% and an mAP@[0.5:0.95] of 62.97%. According to their experiments, RetinaNet and Faster RCNN [
116] models are better than YOLOv5 and EfficientDet [
117] in detecting smaller bounding boxes. On the other hand, the authors recommended the YOLOv5n and YOLOv5s models for deployment on resource-constrained mobile devices thanks to their reduced inference time and number of parameters, while maintaining satisfactory accuracies (mAP@0.5 of 76.58% and 77.47%, respectively). The augmentation, which includes horizontal flip, brightness and contrast change, random HSV and RGB shift, random snow and rain, fancy PCA, random blur, Gaussian noise, random shadow, and random rotation by 90°, increased the mAP@0.50 for the two models chosen for experiments by up to 1.6% when the dataset size was increased 8×. Dang et al. [
58] evaluated 25 YOLO object detectors belonging to seven versions from YOLOv3 to YOLOv7 on their CottonWeedDet12 dataset and found that the best one in terms of mAP@0.5 is YOLOv4 with 95.22% without augmentation, while in terms of mAP@[0.5:0.95] it is Scaled-YOLOv4 with 89.72% without augmentation. Training was carried out by fine-tuning the pre-trained weights obtained through the MS COCO Dataset, while performance was assessed with a Monte Carlo cross-validation, repeating model training and evaluation five times with different random seeds. Augmenting the original training set four times with horizontal and vertical flip, random rotation, Gaussian noise, compression, fancy PCA, change of brightness and contrast, RGB shift, multiplicative noise, and blurring, the mAP@0.50 of YOLOv4 increased to 95.63%, while its mAP@[0.5:0.95] increased to 90%. The authors also claimed that the most suitable models for real-time weed detection are YOLOv5n and YOLOv5s, which have much smaller numbers of parameters and inference times, with only slight decreases in mAP. In this case, the beneficial effect of data augmentation is not clear, perhaps because YOLOv5 already incorporates standard data augmentation approaches.
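A class-weighted cross-entropy of the kind used by Saini and Nagesh to counter class imbalance can be sketched as follows; the inverse-frequency weighting and the class counts are assumptions for illustration, not the exact Phan and Yamamoto formulation:

```python
import math

def class_weights(counts):
    """Inverse-frequency class weights, normalized so they average to 1."""
    total = sum(counts)
    raw = [total / c for c in counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def weighted_cross_entropy(probs, true_class, weights):
    """Cross-entropy scaled by the weight of the true class, so that
    errors on minority classes (e.g. purple nutsedge) cost more."""
    return -weights[true_class] * math.log(probs[true_class])

# Hypothetical class counts: the rare class 2 gets the largest weight,
# so an identical predicted probability yields a larger loss for it.
w = class_weights([900, 80, 20])
loss_minor = weighted_cross_entropy([0.2, 0.1, 0.7], 2, w)
loss_major = weighted_cross_entropy([0.7, 0.1, 0.2], 0, w)
```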
Zhu et al. [
93] proposed EADD-YOLO, a model based on YOLOv5 for the detection of apple leaf diseases by inserting shufflenet inverted residual blocks in the backbone and depthwise convolutions as efficient feature learning modules in the neck. Furthermore, a coordinate attention module was embedded in critical locations to improve the detection of diseases of different sizes, and the SIoU was used instead of CIoU as the bounding box regression loss to improve prediction box localization accuracy. Their model reaches an mAP of 95.5% on a dataset of 26,377 images of apple leaf diseases taken both indoors and outdoors. Si et al. [
94] proposed a dual-branch model called DBCoST for the classification of diseases in apple leaves. Their model combines CNNs, which are good at processing local features but whose limited receptive fields make them less suitable for capturing global information, and Transformers, which conversely are good at capturing global information and establishing long-range dependencies with their self-attention mechanism but are weaker at extracting local features. In particular, the CNN branch derives from the ResNet-18 model, while the Transformer one derives from the Swin Transformer Tiny, introduced by Liu et al. [
118], a hierarchical Transformer based on a shift window design. A feature fusion module composed of two parts, a Concatenation and Residual Block, and an improved channel attention module, integrates the features extracted by the two branches. Their model reached an accuracy of 97.32% on a subset containing five disease types from the FGVC8 dataset, using as augmentation horizontal and vertical flip, random rotation, color jitter, and normalization, and an accuracy of 98.06% on the AppleLeaf9 dataset with 756 images of mixed diseases from FGVC8, using in this case as augmentation also Gaussian noise and salt-and-pepper noise.
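Of the augmentations listed for DBCoST, salt-and-pepper noise is simple to sketch; here pixels are a flat list of grayscale values and the noise fraction is illustrative:

```python
import random

def salt_and_pepper(pixels, amount=0.05, rng=None):
    """Set a random fraction of pixels to pure black (pepper) or pure
    white (salt); pixels is a flat list of 8-bit grayscale values."""
    rng = rng or random.Random(42)
    noisy = list(pixels)
    n = max(1, int(amount * len(noisy)))
    for i in rng.sample(range(len(noisy)), n):  # distinct pixel indices
        noisy[i] = rng.choice((0, 255))
    return noisy

# A uniform mid-gray "image" of 400 pixels with 5% of them corrupted.
img = [128] * 400
noisy = salt_and_pepper(img, amount=0.05)
```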
Lin et al. [
95] introduced an enhanced YOLOX-Tiny network for detecting brown spot disease in images of tobacco crops, introducing into the neck network hierarchical mixed-scale units (HMUs) for information interaction and feature refinement between channels and convolutional block attention modules (CBAMs) to further enhance the ability to extract useful features. Their network achieves an AP of 80.56% in their dataset with 340 images. Chen et al. [
96] proposed a lighter YOLOv5s model for real-time strawberry disease detection. They enhanced the original model with a GhostConv for feature map extraction, an involution operator with SiLU activation for capturing larger-scale contextual information, an attention mechanism for emphasizing relevant image features, and a CARAFE operator for content-aware upsampling. The authors evaluated their model on a dataset of 2246 images with seven strawberry diseases, with augmentations that include the mosaic method, HSV enhancement, variations in illumination, and target occlusion, obtaining a reduction of 45% in the number of parameters, 77.5% in FLOPs, and 42.6% in model size with respect to the original YOLOv5s model, with an mAP@0.5 of 94.7%, which is 4.5% better than that of the original model. The authors also evaluated their model on the PlantDoc dataset, obtaining in this case an mAP@0.5 of 27.9%, an increase of 0.9% over that of the original model. Hu et al. [
97] introduced a Lesion Proposal CNN for the identification of strawberry diseases that first locates the main lesion object and then applies a lesion part proposal module to propose the discriminative lesion details. This system reached an accuracy of 92.56% on their dataset of 3411 images collected from Chinese fields and from the Internet.
Uddin et al. [
68] modified the YOLOv8 object detector to classify cauliflower diseases and to localize the affected areas in the image by adding three additional convolutional blocks with a kernel size of 1, inserted before the output convolutional layer: this improves the processing of the feature maps without significantly increasing the number of parameters. While the base YOLOv8 uses the Swish or Sigmoid-Weighted Linear Unit (SiLU) activation function, which incorporates a smooth sigmoid function, they used Hard Swish [
119], which introduces a clipped linear function and improves efficiency without a decrease in detection performance. The proposed method, called Cauli-Det, reaches an mAP@0.5 of 90.6% and an mAP@[0.5:0.95] of 69.4% on the annotated version of the VegNet cauliflower disease classification dataset. Kanna et al. [
98] compared 10 deep learning models on a dataset containing four classes of cauliflower diseases with 656 original images collected in Bangladesh. Pre-processing included conversion to grayscale; dilation and erosion to add or remove pixels at object boundaries; histogram equalization; and adaptive thresholding to extract the plants from the background. EfficientNetB1 obtained the best validation accuracy (99.90%), while Xception achieved the best F1 score (99.62%).
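The pre-processing chain described by Kanna et al. can be sketched with plain NumPy stand-ins (a toy 3×3 dilation and a mean-based adaptive threshold; the paper's exact kernel sizes and parameters are not reported here, so the values below are assumptions):

```python
import numpy as np

def to_grayscale(rgb):
    # standard luminance-weighted RGB-to-gray conversion
    return (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
            + 0.114 * rgb[..., 2]).astype(np.uint8)

def hist_equalize(gray):
    # spread the intensity histogram via the cumulative distribution
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return cdf.astype(np.uint8)[gray]

def dilate3(gray):
    # morphological dilation with a 3x3 structuring element
    p = np.pad(gray, 1, mode='edge')
    h, w = gray.shape
    return np.max([p[i:i+h, j:j+w] for i in range(3) for j in range(3)], axis=0)

def adaptive_threshold(gray, k=7, offset=5):
    # binarize against the local mean in a k x k neighbourhood
    p = np.pad(gray.astype(float), k // 2, mode='edge')
    h, w = gray.shape
    local_mean = np.mean([p[i:i+h, j:j+w] for i in range(k) for j in range(k)],
                         axis=0)
    return ((gray > local_mean - offset).astype(np.uint8)) * 255
```

In practice a library such as OpenCV would supply optimized versions of each step; the sketch only makes the order and intent of the operations explicit.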
Masood et al. [
99] enhanced Faster-RCNN by employing the ResNet-50 model with the use of spatial channel attention as the underlying network for computing deep keypoints, achieving an accuracy of 97.89% in classification and an mAP of 94% in detection of infections on the Corn Disease and Severity dataset. Wang et al. [
100] proposed a model based on a masked autoencoder [
120], a self-supervised learning algorithm with an encoder-decoder architecture that trains a model by masking parts of the image and trying to reconstruct those, with a Vision Transformer used as the backbone structure, for plant leaf disease recognition. It includes a convolutional block attention module [
121] for enhancing the image features before the image blocks are passed to the encoder and a Gated Recurrent Unit [
122] for capturing the sequential relationship between the diseased image blocks and enhancing the processing of temporal information of the features passed from the encoder. The model, pre-trained on the PlantVillage dataset, is tested on two datasets: a dataset of 3256 images related to potato diseases divided into three categories (late blight, early blight, and healthy), augmented through rotation, random cropping, and horizontal flipping, and the CCMT dataset. It reaches an accuracy of 99.35% on the first dataset and of 95.61% on the second. V et al. [
101] presented LeafSpotNet, a deep learning framework for detecting leaf spot disease in jasmine plants, widely cultivated in Southeast Asia. They used a classification model based on MobileNetV3 [
123], a conditional GAN for data augmentation, and the Particle Swarm Optimization method [
124], which enhances feature selection by eliminating irrelevant features, achieving a classification accuracy of 97% on their dataset of 2000 images. Transfer learning from the ImageNet-21k dataset [
125] is used for training. According to the authors, the system presents a good robustness in various conditions, including extreme camera angles, varying lighting conditions, and grain noise.
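Particle Swarm Optimization for feature selection, as used in LeafSpotNet, can be sketched as a toy binary PSO over feature masks: each particle holds a probability vector that is thresholded into a subset of features and scored by a fitness function. The particle count, inertia, and acceleration coefficients below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_feature_select(fitness, n_features, n_particles=10, iters=30):
    """Toy binary PSO: positions in [0, 1]^n are thresholded at 0.5 into
    boolean feature masks; `fitness` scores a mask (higher is better)."""
    pos = rng.random((n_particles, n_features))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p > 0.5) for p in pos])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # inertia + cognitive (personal best) + social (global best) terms
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        f = np.array([fitness(p > 0.5) for p in pos])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest > 0.5
```

In a real pipeline the fitness would be, e.g., validation accuracy of a classifier trained on the selected feature subset, penalized by subset size.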
6. Synthetic Datasets
Although the significance of dataset quality is widely acknowledged, comprehensive evaluations of synthetic data generation as a potential solution remain relatively scarce. Existing scholarship underscores synthetic data’s capacity to enhance privacy, address access constraints, and expand limited datasets, yet often provides only partial insights into its underlying methodologies and evaluative frameworks [
126,
127,
128]. To fill this void, our current investigation scrutinizes both the efficacy and limitations of synthetic data techniques, thereby promoting best practices and encouraging robust adoption. Recent advances in diffusion models substantiate their potential to produce high-fidelity synthetic datasets across diverse domains. In medical imaging, diffusion-based methods yield high-resolution 3D brain images and synthetic MRI/CT scans, reinforcing patient privacy and mitigating data scarcity [
129,
130], ultimately enhancing downstream training [
131], though at considerable computational cost [
132]. Meanwhile, Generative Adversarial Networks (GANs) remain pivotal, particularly in agricultural tasks where environmental variability and limited labeled data hinder model robustness [
72,
133,
134,
135,
136]. Techniques like SMOTE [
137] and ADASYN [
138] further address class imbalance but require careful parameter tuning. Although retrieval-based augmentations can sometimes outperform generative approaches under constrained resources [
139], diffusion frameworks such as DatasetDM continue to excel, generating both synthetic images and detailed annotations [
131]. Collectively, these developments underscore synthetic data’s critical role in mitigating data scarcity, reducing annotation costs, and fostering more resilient and generalizable models across healthcare, agriculture, and beyond.
Table 4 presents an overview of pivotal publications and their contributions to synthetic dataset generation across diverse domains, which will be explored in greater detail later in this section. It highlights the methodologies employed, the primary focus of each study, and their main contributions. The entries encompass diverse approaches, including diffusion models, GANs, and hybrid datasets, showcasing their applications in fields such as medical imaging, agriculture, and generative modeling. This summary emphasizes advancements in synthetic data generation, its effectiveness in addressing domain-specific challenges, and its potential to enhance machine learning and data-driven solutions.
Yang et al. [
140] explored the potential of using AI-generated images as data sources for enhancing visual intelligence. It delves into how generative AI, including Generative Adversarial Networks (GANs) and diffusion models (DMs), can produce synthetic images that closely resemble real-world photographs, offering unmatched abundance, scalability, and the ability to rapidly generate vast datasets. These synthetic images are useful for training machine learning models, simulating scenarios for computational modeling, and performing testing and validation. The paper discusses the technological foundations of this approach, including the utilization of neural rendering and the integration of 3D scene representations. It also addresses ethical, legal, and practical considerations, highlighting the transformative potential of synthetic data in advancing various computer vision tasks and applications such as image classification, segmentation, and object detection. The comprehensive survey underscores the significant impact of synthetic data on improving the efficiency, cost, and performance of AI models in visual intelligence. Burg et al. [
139] investigated the efficacy of using diffusion models versus image retrieval for data augmentation in computer vision tasks. The authors conducted an evaluation comparing various techniques for generating augmented images, focusing in particular on the performance of diffusion models against a simpler nearest-neighbor retrieval method applied to the pre-training dataset. The key finding was that retrieval-based methods not only improved classification accuracy more effectively but also required significantly fewer computational resources than the sophisticated diffusion models. This suggests that for data augmentation purposes, the simplicity and efficiency of retrieval-based methods make them a more practical choice over complex generative approaches, especially in scenarios with limited computational resources. This was evident across different datasets, including a 10% subset of ImageNet [
102] and the Caltech256 dataset [
143]. Wu et al. [
131] generated data using diffusion models, addressing the challenges of data scarcity and annotation costs in deep learning. The authors proposed DatasetDM, a model that leverages pre-trained diffusion models to generate diverse synthetic images with high-quality perception annotations, such as segmentation masks and depth maps. This is achieved through a unified perception decoder (P-Decoder), which decodes the latent code from the diffusion model into perception annotations. The methodology consists of two stages: training the P-Decoder with a minimal set of manually labeled images (less than 1% of the original dataset), and using the trained P-Decoder for infinite data generation guided by text prompts. This paradigm shift from text-to-image to text-to-data generation allows for the creation of large-scale annotated datasets efficiently. Experiments demonstrated that models trained on synthetic data generated by DatasetDM achieve state-of-the-art results in various tasks, including semantic and instance segmentation, depth estimation, and pose estimation. For instance, DatasetDM improves the mIoU on the VOC 2012 dataset [
144] by 13.3% and the AP on the MS COCO 2017 dataset by 12.1%. Moreover, the synthetic data show enhanced robustness in domain generalization and can be flexibly applied to new tasks, such as image editing. The use of diffusion models in generating synthetic datasets for agriculture presents a significant advancement in overcoming data scarcity and enhancing the performance of machine learning models. These methods provide scalable solutions for generating annotated datasets that are crucial for training robust computer vision models in agricultural applications. Sapkota et al. [
135] investigated the use of OpenAI’s DALL·E model for generating synthetic image datasets for agriculture. The study examined both text-to-image and image-to-image generation methods to create realistic agricultural images. It evaluated the generated images against ground truth images using metrics such as Peak Signal-to-Noise Ratio (PSNR) and Feature Similarity Index (FSIM), finding that image-to-image generation yielded better clarity but lower structural similarity. The research highlights the potential of AI-generated imagery to streamline data collection and enhance machine vision applications in agriculture, ultimately improving crop monitoring and yield estimation. Wachter et al. [
141] emphasized the necessity of high-quality, diverse data for training AI systems. They identified the ‘data problem’, highlighting issues like data sparsity and class imbalance prevalent in agricultural data. The authors proposed hybrid datasets, combining real and synthetic data, as a solution to bridge the ‘reality gap’ of synthetic data. A unified taxonomy for data types—real, synthetic, augmented, and hybrid—is presented to clarify terminological inconsistencies in the literature. Real data are defined as information collected from the physical world, while synthetic data are generated through algorithms or manual processes. Augmented data, considered a subset of synthetic data, involve transformations of real data. Hybrid datasets contain both real and synthetic samples, improving model performance, especially in scenarios with limited real data availability. Voetman et al. [
136] challenged the notion that deep object detection models always require extensive real-world training data. They introduced ‘Genfusion’, a framework that leverages pre-trained stable diffusion models to generate synthetic datasets. The key idea is to fine-tune these models on a small set of real-world images using a technique called DreamBooth, enabling the generation of images that closely resemble specific real-world scenarios. The authors demonstrated the effectiveness of Genfusion in the context of apple detection. They fine-tuned the model on a subset of images from the MinneApple dataset and used the generated synthetic data to train YOLO object detectors (YOLOv5 and YOLOv8). The performance of these detectors was then compared to a baseline model trained on real-world data. The results showed that object detectors trained on synthetic data performed comparably to the baseline model, with the average precision (AP) deviation ranging from 0.09 to 0.12. While the baseline models achieved higher AP scores, the results highlight the potential of synthetic data generation as a viable alternative to collecting large amounts of training data. Sehwag et al. [
142] addressed the challenge of sample deficiency in low-density regions of data manifolds in image datasets. Applying diffusion models to generate novel high-fidelity images from these low-density regions can be particularly useful in agricultural datasets where certain conditions or scenarios are underrepresented. Their modified sampling process guides image generation towards low-density regions while preserving fidelity, ensuring the production of unique, high-quality samples without overfitting or memorizing training data. Lu et al. [
133] provided an extensive review of the application of Generative Adversarial Networks (GANs) in agricultural image analysis. They focused on the challenges posed by biological variability and unstructured environments in obtaining large-scale, annotated datasets for training high-performance models. The review details the evolution of various GAN architectures such as DCGAN, CycleGAN, and StyleGAN, and their roles in image augmentation for tasks like plant health detection, weed recognition, and postharvest quality assessment. Olaniyi et al. [
134] investigated the applications of Generative Adversarial Networks as a deep learning approach to data augmentation of agricultural images. They reported significant performance improvements when using GAN-augmented datasets for various tasks, including disease recognition, weed management, fruit detection, and zootechnics. Ref. [
145] used Deep Convolutional Generative Adversarial Networks (DCGAN) to create a synthetic dataset for cotton leaves affected by various diseases. The DCGAN algorithm generates images that mimic the characteristics of the original dataset, thereby providing a larger and balanced set of training samples. This method enhances the performance of machine learning models in detecting diseases in cotton leaves. The study demonstrates that synthetic data generated using DCGAN improves model accuracy and efficiency, validating its potential in agricultural data augmentation.
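The DCGAN generator referenced above upsamples a low-resolution latent tensor through a stack of strided transposed convolutions. A single such layer can be sketched in plain NumPy (a naive toy implementation for clarity, omitting the batch normalization and ReLU that a real DCGAN layer would apply afterwards):

```python
import numpy as np

def transposed_conv2d(x, w, stride=2):
    """One transposed-convolution layer, the upsampling building block of a
    DCGAN generator. Each input pixel 'stamps' a scaled k x k kernel onto an
    enlarged output grid. x: (C_in, H, W); w: (C_in, C_out, k, k)."""
    c_in, H, W = x.shape
    _, c_out, k, _ = w.shape
    out = np.zeros((c_out, (H - 1) * stride + k, (W - 1) * stride + k))
    for i in range(H):
        for j in range(W):
            for c in range(c_in):
                out[:, i * stride:i * stride + k,
                       j * stride:j * stride + k] += x[c, i, j] * w[c]
    return out
```

With the 4×4 kernels and stride 2 typical of DCGAN, a 4×4 map grows to 10×10 ((H−1)·s+k); chaining several such layers expands a small latent tensor to full image resolution.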
7. Edge Computing and Real Time: Optimal Algorithms and Advancements in Hardware
In the context of digital agriculture, recent contributions converge on the utilization of edge computing devices (often Raspberry Pi and NVIDIA Jetson platforms) to achieve low-latency data processing and enable real-time decision making in diverse agricultural tasks.
Table 5 lists key contributions to edge computing for digital agriculture.
By integrating advanced machine learning frameworks (including CNNs, RNNs, and Transformers) on resource-constrained edge devices such as Raspberry Pi and NVIDIA Jetson Nano, these works collectively advance the state of precision agriculture. Low-latency solutions and accurate, near-real-time analytics are attained through architectural layering, sensor fusion, and optimized models, paving the way for sustainable and scalable agricultural practices.
In order to achieve low latency when communicating information from the production environment, part of the data processing can be executed in edge computing devices that are close to the sensors. According to Restrepo-Arias et al. [
146], the Raspberry Pi single-board computer is the most used device in edge computing for precision agriculture. Abioye et al. [
147] presented an IoT-based monitoring framework for measuring soil moisture content and irrigation volume and for computing the reference evapotranspiration. The collected data were transmitted to a Raspberry Pi 3 controller for onward online storage and displayed on the IoT dashboard. Adami et al. [
148] proposed a system for detecting ungulates and for protecting fields from their intrusion through the creation of virtual fences based on ultrasound emission. They evaluated the object detectors YOLOv3 and Tiny-YOLOv3 on two edge computing devices: the Raspberry Pi Model 3 B+ (with or without the Intel Movidius Neural Compute Stick) and the NVIDIA Jetson Nano. As connectivity solutions, they recommend LoRa and LoRaWAN. When the PIR sensor of the Animal Repelling device detects a movement, a message is sent to the edge computing device through an XBee radio interface. The edge computing device then executes the object detector, and if an animal is detected, a message is sent back to the Animal Repelling device, which generates the ultrasound appropriate to the detected animal. The authors collected 1000 images of wild boars and deer in the Tuscany region of Italy under both cloudy and sunny weather conditions, augmented to 10,000 through jitter, image rotation, flip, cropping, multi-scale transformation, hue, saturation, Gaussian noise, and intensity change. YOLOv3 reaches an mAP of 82.5%, while Tiny-YOLOv3 reaches 62.4%. On the NVIDIA Jetson Nano in the 20 W operational mode, YOLOv3 runs at an average of 3 FPS and Tiny-YOLOv3 at 15 FPS; with the Raspberry Pi, the frame rate reaches at most 4 FPS. The authors also noted that the out-of-the-box Jetson keeps an acceptable CPU temperature, while the Raspberry Pi requires a PWM fan.
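Frame-rate figures like those above can be reproduced with a small timing harness wrapped around any inference callable (a generic sketch, not the authors' benchmark code; warm-up iterations are included because the first inferences on accelerators are typically slower):

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Average frames per second of `infer` over a list of frames.
    A few warm-up calls are discarded so one-time initialization costs
    (model loading, accelerator spin-up) do not skew the average."""
    for f in frames[:warmup]:
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

The same harness applies whether `infer` wraps a YOLO model on a Jetson or a lightweight classifier on a Raspberry Pi, which makes the reported FPS numbers directly comparable.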
Prabu et al. [
149] proposed an IoT-based crop field protection system that detects both crop diseases and animal interference. Motion, temperature, ultrasonic, and acoustic sensors mounted on poles and drones sent collected data to a Raspberry Pi 4B module through wired and wireless mediums. Recurrent Convolutional Neural Networks (RCNNs) were used to detect abnormalities in leaf and field images; the observations were then sent to Recurrent Generative Adversarial Networks (RGANs) for detailed analysis and to find the definitive anomalies. The authors tested their system on a dataset containing 1200 leaf images of tomato, potato, spinach, wheat, and corn plants and on a dataset containing 250 images of animal and bird intrusions, obtaining classification accuracies from 98.7% to 99.8%. Gomez et al. [
150] introduced FARMIT, an architecture based on IoT and machine learning/deep learning for continuous assessment of agricultural crop quality. It is divided into three layers: the physical layer, which gathers information from sensors about crops and executes the corrective actions through the actuators according to the commands received from the other layers, the edge layer, whose purpose is to obtain low latency in communication with the sensors and to control and manage tasks on physical devices, and the cloud layer, which analyzes collected data through machine learning/deep learning models and gives the results to desktop, mobile, or web applications through REST services. Corrective actions can be executed by users of applications or automatically when an anomaly is measured by the sensors. The authors deployed FARMIT in a tomato plantation in the south of Spain, using sensors for temperature, wind, rain, electrical conductivity, humidity, radiation, and carbon dioxide and images from RGB cameras converted to the Lab color space. Data analysis was carried out by a Random Forest regressor with 100 estimators. Devi et al. [
151] proposed a combination of IoT sensors and devices, image processing, and machine learning/deep learning for disease detection, weed detection, and process control for the cultivation of beans in India. Their system uses the Normalized Difference Vegetation Index (NDVI), which quantifies the amount of green vegetation in an area of land, to assess the health of bean leaves; K-Means Clustering (KMC), Fuzzy C-Means clustering (FCM), and Region Growing methods to extract and analyze the diseased regions of the bean leaves; and Local Binary Patterns (LBP), the Gray-Level Co-occurrence Matrix (GLCM), and their combination, the Local Binary Gray-Level Co-occurrence Matrix (LBGLCM), to capture physiological attributes of the bean leaves. Temperature, humidity, and soil moisture sensors are directly connected to an Arduino UNO board, while the ThingSpeak platform is used for visualization of the captured data. They tried two different deep learning frameworks: the first based on an EfficientNetB7 with Bidirectional Long Short-Term Memory (BiLSTM), the second on a VGG16 with an attention layer integrated at each stage of the network to enhance feature awareness, both pre-trained on ImageNet. The authors compared the proposed approach with the performance of three humans with up to 5 years of experience: the two models reached accuracies of 95% and 96%, respectively, in the classification between diseased and healthy leaves and in weed detection, while human accuracy did not exceed 80%.
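The NDVI used by Devi et al. is the normalized difference between near-infrared and red reflectance: healthy vegetation reflects strongly in NIR and absorbs red light, pushing the index toward +1, while soil and stressed plants score near zero. A minimal implementation:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index, computed per pixel:
    NDVI = (NIR - Red) / (NIR + Red), bounded in [-1, 1].
    `eps` avoids division by zero on dark pixels."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)
```

Applied to co-registered NIR and red bands, the result is a per-pixel health map that can be thresholded to flag stressed regions of the canopy.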
Rashavand et al. [
152] proposed a novel approach to Automatic Modulation Recognition (AMR) using Transformer networks, a deep learning architecture originally designed for natural language processing. The authors highlighted the potential of Transformers in AMR due to their ability to process sequential data in parallel and capture dependencies across different parts of the input sequence. They introduced four distinct tokenization strategies and evaluated their performance on two datasets: RadioML2016.10b and CSPB.ML.2018+. The results demonstrate that their proposed Transformer-based model, TransIQ, outperformed existing deep learning techniques in AMR accuracy, particularly in low signal-to-noise ratio conditions. This research aligns with the broader trend of applying Transformers to various domains beyond natural language processing, as seen in works like [
74,
154], showcasing the versatility and effectiveness of this architecture. Singh et al. [
153] discussed the development and experimental validation of a smart agricultural drone integrated with IoT technologies and machine learning techniques. Their drone employed TensorFlow Lite with the EfficientDetLite1 model to identify crops, achieving an inference time of 91 ms. The system features two spray modes for optimal pesticide application, operating autonomously using real-time data. The drone, equipped with an X500 development kit, has a payload capacity of 1.5 kg, a flight time of 25 min, and a speed of 7.5 m/s at a height of 2.5 m. The research aims to enhance sustainable farming by improving pesticide use efficiency and crop health monitoring through precise and autonomous agricultural practices.
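The paper does not detail the decision rule behind its two spray modes; one plausible sketch is to switch between spot and broadcast spraying based on how much of the frame the detected targets cover. The threshold, box format, and function name below are hypothetical assumptions for illustration, not the authors' controller:

```python
def choose_spray_mode(detections, broadcast_threshold=0.5):
    """Toy spray-mode selector (hypothetical, not from the paper).
    detections: list of (x, y, w, h, score) boxes with normalized sizes.
    If the detected boxes cover a large fraction of the frame, broadcast-spray
    the whole area; otherwise spot-spray each box individually.
    (Overlap between boxes is ignored in this coverage estimate.)"""
    coverage = sum(w * h for (_x, _y, w, h, _score) in detections)
    if coverage >= broadcast_threshold:
        return "broadcast", None
    return "spot", [(x, y, w, h) for (x, y, w, h, _score) in detections]
```

A rule of this shape keeps the per-frame decision cheap enough to run alongside the 91 ms detector inference reported above.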
7.1. Emerging Edge Devices
In
Table 6, we offer a concise survey of emerging high-performance edge devices—beyond the commonly employed Raspberry Pi and NVIDIA Jetson Nano—that can substantially enhance computational throughput for precision agriculture. These platforms leverage powerful GPUs, specialized AI accelerators, or advanced CPUs to accommodate increasingly demanding deep learning pipelines (e.g., complex CNNs, transformers, and other computationally intensive architectures). Our investigation considers the following selection criteria, which we deem pivotal for successful edge deployment in agricultural contexts:
Computational Performance: Assessment of CPU/GPU capabilities and the presence of hardware accelerators or specialized AI modules.
Memory Capacity: Evaluation of on-board RAM and storage, essential for real-time image/video processing and handling large model footprints.
Power Consumption: Analysis of operational power requirements, thermal management, and energy efficiency in variable environmental conditions.
Form Factor and Portability: Consideration of device size, weight, and suitability for integration with agricultural machinery or field stations.
Cost and Scalability: Balancing initial investment with scalability and ease of upgrades as sensor networks grow.
Ecosystem and Developer Support: Examination of software libraries, community resources, and compatibility with popular ML frameworks (e.g., TensorFlow Lite, PyTorch).
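The criteria above can be combined into a simple weighted ranking to compare candidate platforms. The scores and weights in the example are purely illustrative placeholders, not measured figures for any real device:

```python
def rank_devices(devices, weights):
    """Rank edge devices by a weighted sum of per-criterion scores in [0, 1].
    `devices` maps a device name to {criterion: score}; every criterion is
    oriented so that higher is better (invert raw cost and power draw
    before scoring). Returns device names, best first."""
    totals = {name: sum(weights[c] * scores[c] for c in weights)
              for name, scores in devices.items()}
    return sorted(totals, key=totals.get, reverse=True)
```

Shifting weight between, say, computational performance and power consumption makes the trade-off between a high-end Jetson-class board and a low-power accelerator explicit rather than anecdotal.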
Table 6 highlights devices that exhibit higher computational capabilities than the Raspberry Pi or the NVIDIA Jetson TK1/TX1/TX2, allowing more complex and larger-scale inference tasks at the edge. By adopting a holistic set of assessment metrics (computational performance, memory capacity, power consumption, form factor, cost, and ecosystem support), we aim to capture the practical trade-offs between theoretical efficiency and real-world feasibility in demanding agricultural scenarios. For instance, although the NVIDIA Jetson AGX Orin dominates in raw GPU power, its higher price point and increased power draw might prove challenging for large-scale field deployments that lack stable power infrastructure. Conversely, compact solutions like the Google Coral Dev Board or Intel Movidius NCS2 offer remarkable inference speed at a fraction of the energy cost, albeit with tighter memory constraints and certain limitations on model complexity.
As modern agriculture continues to incorporate more sophisticated deep learning models—e.g., transformers for specialized tasks such as automatic modulation recognition [
152], object detection in aerial imagery [
153], or real-time disease identification [
149]—selecting an edge platform that effectively balances performance and resource usage becomes paramount. Next-generation platforms not only expand the computational envelope but also open avenues for real-time data fusion, multi-modal sensor integration, and advanced interpretability (e.g., through on-device saliency mapping). Consequently, they enable continuous or near-continuous monitoring of crops, weather, and livestock, thereby enhancing the timeliness and specificity of agronomic interventions. Future research should further quantify the lifecycle cost of these edge devices, as well as their environmental footprint, to guide sustainable technology adoption in precision agriculture.
7.2. The Role of Wireless Communication
In this review, we primarily focus on a technological framework centered on edge computing applications in modern agriculture. However, wireless communication and remote computing remain essential for tasks that cannot be performed in real time, such as complex analyses requiring human interaction, long-term data storage, and, more generally, operations that do not benefit from on-the-move execution. Examples include the generation of prescription maps, the acquisition of 3D models from point clouds, the representation of digital twins, and the creation of field datasets for training machine learning models.
Two comprehensive reviews on wireless communication in agriculture can be found in [
5,
6]. These works analyze and evaluate technologies such as Bluetooth, GPRS/3G/4G, Long Range Radio (LoRa), SigFox, WiFi, ZigBee, RFID, and NB-IoT in the context of digital agriculture. Their findings are particularly insightful, as they systematically compare key aspects such as power consumption, communication range, cost, applications, and limitations. We recommend them as practical references for selecting the most suitable protocol and designing a wireless framework tailored to specific agricultural applications. Beyond these references, recent reviews on advancements in the mentioned technologies include [
155], which examined the role of LoRa in smart farming and its integration with IoT, and [
156], which evaluated the performance of 5G in agricultural robotics, comparing it with 4G and WiFi6—an emerging wireless communication standard—particularly in the context of real-time applications.
A specific application of wireless technology in agriculture involves passive (battery-free) sensors that are externally powered by ground rovers or drones, enabling wireless data transmission without the need for an embedded battery or, even more limiting, cabled connections. This approach provides a significant advantage, as powering sensors and other fixed, crop-based devices remains a critical challenge, in addition to the risk of soil contamination from battery chemicals. While small photovoltaic panels can be used as an energy source, they are not an optimal solution due to economic and practical constraints. In [
157], notable examples of passive humidity and temperature sensors are presented, along with an introduction to the concept of energy harvesting. Although this is traditionally associated with UAVs, it is also applicable to ground rovers, where the reduced distance to the sensors provides a clear advantage. An extreme implementation of passive sensors is presented in [
158], where nature-inspired, millimeter-scale devices mimicking dandelion seeds are designed to enhance scalability and flexibility for various sensing and computing applications. These devices, capable of traveling 50–100 m in a gentle to moderate breeze, are powered by lightweight solar cells and energy harvesting, and can efficiently cover large crop areas.
As UGVs and agricultural machines primarily operate in open fields where continuous connectivity is not guaranteed, it is essential to consider solutions based on opportunistic communication. This network paradigm enables data transmission only under favorable conditions, making it well-suited for scenarios where traditional communication infrastructure is unreliable, intermittent, or energy-constrained. In such cases, data are temporarily stored on local devices (e.g., a UGV) and transmitted once a coverage area is reached. Notably, drones can also function as mobile relays, flying over a network of isolated sensors to collect data and subsequently transmit it to a receiving station. Another compelling application is opportunistic inter-peer communication, where, for example, multiple UGVs or UGV-UAV pairs exchange data to coordinate and optimize strategies for handling complex or large-scale tasks [
159,
160].
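The opportunistic, store-and-forward communication described above reduces to a simple pattern: buffer readings locally and transmit only when a link is available, as when a UGV re-enters a coverage area or a relay drone passes overhead. A minimal sketch (the capacity and oldest-first drop policy are illustrative choices):

```python
from collections import deque

class OpportunisticUplink:
    """Store-and-forward buffer for intermittently connected field devices.
    Readings queue locally; `flush` transmits everything only when the link
    is up. With a bounded buffer, the oldest samples are dropped first."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)
        self.sent = []  # stand-in for the actual radio transmission

    def record(self, sample):
        self.buffer.append(sample)

    def flush(self, link_up):
        """Return the number of samples transmitted (0 if the link is down)."""
        if not link_up:
            return 0
        n = len(self.buffer)
        while self.buffer:
            self.sent.append(self.buffer.popleft())
        return n
```

The same logic applies symmetrically to a relay drone, which plays the receiving role while airborne over isolated sensors and the transmitting role once back at the base station.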
We conclude this section with a discussion on Global Navigation Satellite System (GNSS), a fundamental technology for precision farming. While not an absolute novelty, GNSS has recently become accessible through low-cost solutions that were unfeasible just a few years ago, when only high-end, next-generation tractors or expensive aftermarket products provided such capabilities. This affordability is primarily due to recent advancements in hardware, such as U-blox chips and compact antennas with enhanced performance. Related technologies, including Precise Point Positioning (PPP) and high-accuracy Real-Time Kinematic (RTK), have direct applications in the generation of prescription maps, variable-rate treatments, precise crop georeferencing, and the autonomous navigation of UGVs, UAVs, and modern agricultural machinery. In [
161], a recent state-of-the-art review on low-cost GNSS receivers, the authors conclude that in open-sky conditions—common in agriculture—the performance of these devices is comparable to that of traditional, high-cost systems.
10. Discussion and Conclusions
Recent advances in ICT, including hardware devices and optimized software libraries for computer vision, alongside the availability of innovative deep learning models for image classification, now enable modern agricultural solutions that were previously unattainable. However, several challenges remain to be addressed before these concepts can be translated into practical tools for everyday farming.
In
Section 2, we reviewed a substantial number of studies addressing various aspects of AI, IoT, and AIoT in agriculture. Some of these studies highlight open issues and challenges, offering roadmaps for future advancements. Notably, Adli et al. [
8] identified several key challenges in technology adoption, including the complexity of interconnected devices, data privacy and security concerns, trust in AIoT technologies, the resilience of complex sensor systems, and the difficulty of developing robust, effective, and cost-efficient wireless networks capable of covering entire crop areas. Interestingly, some of these challenges can be alleviated by adopting edge-based AIoT solutions. These systems reduce reliance on cloud-based analysis, facilitate real-time decision making, eliminate the need for complex wireless communication networks in open fields, enhance data security, and lower the costs of remote analysis services. However, deferred data processing on remote platforms remains essential for tasks not requiring immediate results, such as generating high-precision spatial models for crop digital twins, and for long-term data storage. This research topic [
9] provides valuable insights and critical analyses of the open challenges in AIoT for agriculture. This field is still in its early stages and faces several issues, including data acquisition, the optimization of AI algorithms, and the limited performance of hardware required to operate under harsh environmental conditions. The use of ground rovers, which could mitigate significant labor shortages in manual-intensive practices, also requires the resolution of non-trivial technological challenges.
Several concurrent developments in the current AI landscape support the emergence of a new edge-based AIoT paradigm for digital agriculture. First, concerning the training phase, today's AI models for image classification (see
Section 5) have reached a high level of maturity, with many modern techniques achieving excellent accuracy without the need for excessively powerful computational resources. Moreover, transfer learning from pre-trained models has demonstrated high effectiveness, significantly reducing the need to train large-scale classifiers from scratch, an approach that would otherwise require expensive high-performance computing platforms and considerable processing time. This favorable context facilitates precise model calibration through iterative and trial-and-error methodologies, enabling the creation of highly optimized deep learning classifiers.
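As a toy illustration of this transfer-learning setup, the sketch below freezes a feature extractor and trains only a small classification head. It is a minimal sketch in pure NumPy: the fixed random projection stands in for a pre-trained CNN backbone, and the data, dimensions, and learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained feature extractor: a fixed random
# projection (in practice, a CNN backbone with its weights frozen).
W_frozen = rng.normal(size=(16, 64))

def extract_features(x):
    return np.tanh(x @ W_frozen)  # frozen weights: never updated

# Toy binary classification data (hypothetical stand-in for crop images).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Only the small classification head is trained (the transfer-learning step).
w_head = np.zeros(64)
b_head = 0.0
lr = 0.5

F = extract_features(X)  # features computed once; the backbone is not updated
losses = []
for _ in range(300):
    p = sigmoid(F @ w_head + b_head)
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))
    grad = p - y  # gradient of the cross-entropy loss w.r.t. the logits
    w_head -= lr * (F.T @ grad) / len(y)
    b_head -= lr * grad.mean()

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only the head's few dozen parameters are optimized, which is why calibration by trial and error remains cheap even without high-performance computing platforms.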
Second, the hardware advancements reviewed in
Section 7 are delivering a new generation of devices designed to balance performance and energy efficiency. These include edge GPUs and other accelerators for deep learning inference. Combined with faster and larger memory chips, this marks a departure from the traditional reliance on low-cost, low-power, and low-performance hardware such as the Raspberry Pi. The latest generation of systems-on-chip, while remaining affordable, offers enhanced AI capabilities and more consistent processing throughput. Examples of practical applications made possible by these innovations will be presented shortly.
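One widely used technique for fitting deep learning inference onto such resource-constrained edge devices is post-training weight quantization. The sketch below is a simplification, not any specific toolchain's API: it applies symmetric linear quantization of float32 weights to int8, cutting memory four-fold at a bounded accuracy cost.

```python
import numpy as np

rng = np.random.default_rng(1)

# Float32 weights of a hypothetical trained layer.
w_fp32 = rng.normal(scale=0.2, size=(256, 128)).astype(np.float32)

# Symmetric linear quantization to int8: w is approximated by scale * q.
scale = np.abs(w_fp32).max() / 127.0
q = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)

# Dequantize to estimate the worst-case approximation error.
w_deq = q.astype(np.float32) * scale
max_err = float(np.abs(w_fp32 - w_deq).max())

print(f"fp32: {w_fp32.nbytes} B, int8: {q.nbytes} B, max error: {max_err:.5f}")
```

The rounding error is bounded by half the quantization step, which is why int8 inference on edge accelerators typically preserves classifier accuracy well.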
This brings attention to a critical issue: training datasets are often the weakest link in the process. Obtaining high-quality plant images in large quantities is challenging due to the seasonality of crops and the variability of phenotypes across space and time. Publicly available high-quality datasets (see
Section 4) are limited and often focus on plant species and cultivars that differ from those of interest to end-users, restricting their utility to the setup and validation of newly developed AI workflows. In this context, the capability to generate realistic, automatically annotated synthetic images could represent a significant breakthrough and is likely to become a key research focus in the near future. The advanced AI models for image-to-image and text-to-image generation discussed in
Section 6 can utilize either powerful data servers or high-end consumer GPUs to produce large-scale or small-to-medium-scale synthetic datasets, respectively. Images in these collections feature realistic dimensions, and an entire dataset can be generated within typical computation times ranging from several hours to a few days. This capability provides a valuable resource for enhancing the quality of training datasets and, consequently, improving classifier performance.
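A key reason synthetic images can be "automatically annotated" is that the generator knows where each object was placed, so labels come for free. The minimal sketch below uses a hypothetical toy compositor (not a generative model) purely to illustrate that principle: each sample is an image plus a bounding-box annotation derived from the generation parameters themselves.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_sample(size=128, patch=24):
    """Composite a bright 'plant' patch onto a dark 'soil' background and
    return the image together with its automatically derived label."""
    img = rng.uniform(0.0, 0.2, size=(size, size))   # synthetic soil background
    x = int(rng.integers(0, size - patch))
    y = int(rng.integers(0, size - patch))
    img[y:y + patch, x:x + patch] += 0.7             # synthetic plant patch
    bbox = (x, y, patch, patch)                      # annotation costs nothing
    return img, bbox

# A small, fully labeled synthetic dataset, generated in milliseconds.
images, labels = zip(*(make_sample() for _ in range(100)))
print(len(images), labels[0])
```

Real pipelines replace the toy compositor with image-to-image or text-to-image models, but the economics are the same: annotation effort no longer scales with dataset size.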
Within this framework, novel AIoT applications can be designed and implemented, with edge computing and real-time inference as key enablers. One example is autonomous systems for continuous agricultural monitoring, where early detection of issues such as pest infestations and disease outbreaks enables cost-effective and environmentally friendly protection strategies. These challenges cannot be addressed reliably by human experts, given the labor shortages in modern agriculture, or by existing remote sensing methods, such as satellites, which face technical limitations. Instead, they can be tackled by low-cost unmanned ground vehicles (UGVs) capable of autonomous operation (potentially in conjunction with drones) in vineyards, orchards, olive groves, and other low-density crops. Another practical example is the use of smart booms attached to tractors for spraying chemical active ingredients to control weeds or pests. Edge-based AIoT enables robust systems capable of localizing crop emergencies on the move, allowing targeted interventions such as precision spraying. These systems can adjust their operational effort based on the presence of the target, enabling precise and efficient treatments.
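The on-the-move control logic of such a smart boom can be sketched as a simple mapping from per-section detection confidences to spray commands. The function name, threshold, and confidence values below are hypothetical, not taken from any real controller:

```python
def nozzle_commands(confidences, threshold=0.6):
    """Map per-section weed-detection confidences (one value per boom
    section, as produced by an edge classifier each frame) to on/off
    spray commands. Hypothetical control logic for illustration."""
    return [conf >= threshold for conf in confidences]

# Example frame: only boom sections 1 and 3 exceed the detection threshold,
# so only their nozzles are activated while the tractor keeps moving.
cmds = nozzle_commands([0.1, 0.8, 0.3, 0.95])
print(cmds)  # [False, True, False, True]
```

In a deployed system this decision must complete within the frame budget imposed by the tractor's speed, which is precisely why on-board (edge) inference, rather than a round trip to a remote server, is the enabling factor.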
While not all desired applications in modern agriculture are currently feasible with existing software and hardware technologies, it is important to note the rapid pace of innovation in these fields. On the one hand, it is essential to continuously monitor advancements from ICT manufacturers to identify new products that could significantly enhance the capabilities of edge-based AIoT systems, such as those exemplified earlier. On the other hand, progress in algorithmic development can greatly facilitate the transition to expert systems, for instance, by moving from quasi-real-time to true real-time performance through improved inference speeds. Two recent methods, Kolmogorov-Arnold networks (KANs) and XNets, have emerged within the past few months; they are reviewed in
Section 8. Although still in their early stages, these approaches have generated high expectations. In the context of this review, they hold significant promise for enabling more efficient models that can be deployed on edge devices for real-time agricultural practices. Monitoring the development of these technologies is critical, as they may lead to substantial advancements in the near future.
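To convey the core idea behind KANs, the sketch below trains a single learnable univariate edge function, the building block that KANs place on network edges instead of fixed weights and activations. As an illustrative assumption, a piecewise-linear function on a fixed grid stands in for the B-spline parameterization used in actual KANs; the grid size, learning rate, and target function are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Core KAN idea: each edge carries a *learnable* univariate function.
# Here, one edge function parameterized by its values at fixed knots
# (a crude piecewise-linear stand-in for a B-spline basis).
grid = np.linspace(-1.0, 1.0, 11)
coefs = np.zeros(11)  # learnable function values at the knots

def edge_fn(x, c):
    return np.interp(x, grid, c)

# Fit the edge function to a nonlinear target by gradient descent.
X = rng.uniform(-1.0, 1.0, size=400)
target = np.sin(np.pi * X)
lr = 0.5

for _ in range(200):
    err = edge_fn(X, coefs) - target
    for i in range(len(grid)):
        # The gradient w.r.t. knot i is weighted by its "hat" basis function.
        basis = edge_fn(X, np.eye(len(grid))[i])
        coefs[i] -= lr * np.mean(err * basis)

mse = float(np.mean((edge_fn(X, coefs) - target) ** 2))
print(f"final MSE: {mse:.4f}")
```

A full KAN sums many such learned univariate functions across layers; the appeal for edge deployment is that compact, adaptive basis functions may reach a given accuracy with fewer parameters than a conventional MLP, though this remains to be validated at scale.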