Review

Agricultural Image Processing: Challenges, Advances, and Future Trends

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 9206; https://doi.org/10.3390/app15169206
Submission received: 3 July 2025 / Revised: 5 August 2025 / Accepted: 19 August 2025 / Published: 21 August 2025
(This article belongs to the Special Issue Pattern Recognition Applications of Neural Networks and Deep Learning)

Abstract

Agricultural image processing technology plays a critical role in enabling precise disease detection, accurate yield prediction, and various smart agriculture applications. However, its practical implementation faces key challenges, including environmental interference, data scarcity and imbalanced datasets, and the difficulty of deploying models on resource-constrained edge devices. This paper presents a systematic review of recent advances in addressing these challenges, with a focus on three core aspects: environmental robustness, data efficiency, and model deployment. The study identifies that attention mechanisms, Transformers, multi-scale feature fusion, and domain adaptation can enhance model robustness under complex conditions. Self-supervised learning, transfer learning, GAN-based data augmentation, SMOTE improvements, and Focal Loss optimization effectively alleviate data limitations. Furthermore, model compression techniques such as pruning, quantization, and knowledge distillation facilitate efficient deployment. Future research should emphasize multi-modal fusion, causal reasoning, edge–cloud collaboration, and dedicated hardware acceleration. Integrating agricultural expertise with AI is essential for promoting large-scale adoption and achieving intelligent, sustainable agricultural systems.

1. Introduction

As the cornerstone of food security and sustainable social development, agriculture is essential to human life [1]. According to data from the Food and Agriculture Organization of the United Nations, the agricultural sector employs approximately one billion people, accounting for 28% of the global workforce. With their significant advantages in net arable land area, the United States, China, and India dominate global agricultural production and strongly influence the state of global food security.
Meanwhile, with the continuous growth of the global population and the urgent need for sustainable agricultural development, traditional agriculture is facing unprecedented challenges, including increasing crop yields, improving resource utilization efficiency, and reducing environmental impacts [2]. Furthermore, the rapid advancement of artificial intelligence in recent years has introduced novel opportunities across various sectors, facilitating innovation and progress in precision agriculture [3]. In light of this, the digital and intelligent transformation of agriculture has emerged as a critical pathway for advancing the development of modern agriculture [4]. Image processing, as a pivotal approach to enhancing agricultural capabilities, is increasingly recognized as a core component of smart agriculture systems. For example, research on tea tree variety identification utilizing deep learning techniques has demonstrated efficient classification performance through canopy image analysis [5].
In recent years, the application of image processing in the agricultural domain has become increasingly prevalent, encompassing areas such as disease detection [6], fruit recognition [7], yield prediction [8], crop management [9], water resource management [10], soil management [11] and weed detection [12], as shown in Figure 1. However, to effectively implement these application scenarios and harness their full potential, it is essential to gain a comprehensive understanding of the core requirements of each specific context. To further illustrate the typical pipeline of agricultural image analysis, Figure 2 presents a general workflow covering data acquisition, preprocessing, model training, inference, and deployment. In practice, agricultural image processing is often hindered by factors such as variations in lighting conditions, occlusion, background clutter, and differences across scenes, which impose significant demands on the robustness and generalization capabilities of models [13]. Borowiec et al. [14] emphasized that existing datasets are limited in scale and that the scarcity of labeled data has emerged as a critical bottleneck, reflecting the pressing demand among researchers for improved data quality. For applications like disease detection, single-scale monitoring fails to comprehensively capture both the microscopic physiological effects of diseases on crops and the progressive macro-level damage process. Liao et al. [15] addressed this limitation by employing hyperspectral technology to implement integrated monitoring that combines microscopic localization, mesoscopic quantification, and macroscopic classification, thereby effectively addressing the multi-scale requirements for assessing rice planthopper-induced damage levels. Most agricultural settings rely on low-power devices, yet prevailing deployment systems are often too expensive and technically demanding to implement directly in the field. For instance, through miniaturization at the hardware level, efficiency optimization at the algorithmic level, and energy-saving strategies at the system level, a portable electronic nose was developed for the detection of chicken adulteration. While this device achieves a miniaturized, low-power design, its performance at extremely low adulteration levels remains suboptimal, indicating that further refinement is necessary to achieve an optimal balance among miniaturization, energy efficiency, and sensitivity in ultra-low-concentration detection [16].
Against the backdrop of the aforementioned challenges, researchers have carried out extensive studies in the field of agricultural image processing, proposing a range of solutions including lightweight network architecture design [17], generative adversarial networks [18], self-supervised learning [19], multimodal feature fusion [20], feature extraction techniques [21], and instance segmentation [22]. At present, there is an urgent need for a comprehensive and systematic review that consolidates the core challenges, key technical approaches, and future research directions in this field, aiming to provide both theoretical foundations and practical references for academic researchers and engineering practitioners.
While several review articles have previously addressed various aspects of agricultural image processing, most either focus narrowly on specific tasks such as disease detection or fruit recognition or prioritize traditional machine learning approaches while inadequately covering recent advancements in deep learning and deployment challenges. Furthermore, few reviews have conducted a systematic analysis of environmental, data-related, and deployment-related challenges within a unified framework. In contrast, this paper presents a structured and comprehensive review of the field by organizing the discussion around three core technical dimensions—environmental complexity, data scarcity and imbalance, and model lightweighting for edge deployment. Each dimension is supported by an in-depth examination of recent solutions. In addition, we highlight the practical trade-offs among accuracy, scalability, and computational cost, which are often overlooked in previous studies. By critically comparing representative approaches and highlighting emerging trends such as transformer-based segmentation, data-efficient learning, and domain adaptation, this review not only synthesizes recent technical progress but also serves as a strategic guide for future research and real-world deployment in smart agriculture.

2. Materials and Methods

The literature analysis in this study comprises two sequential stages: (a) relevant literature collection and (b) comprehensive review and in-depth analysis of the selected materials. In the initial stage, keyword-based searches were conducted across scientific databases (IEEE Xplore, ScienceDirect, Web of Science, and Google Scholar) to identify conference papers and journal articles. The search terms employed were: [“agriculture” OR “farming”] AND [“image processing”]. To gain a deeper understanding of the thematic distribution and emerging research hotspots within the selected corpus, a keyword co-occurrence analysis was performed using the VOSviewer software 1.6.20. As illustrated in Figure 3, according to our bibliometric analysis of 2156 articles published between 2010 and 2025, the generated network visualization identifies several distinct thematic clusters. Deep learning and computer vision (blue cluster), which constitute the core technical approaches in this domain, encompass key terms such as deep learning, convolutional neural networks (CNNs), and vision. The remote sensing technology and data augmentation cluster (green cluster) reflects the integration of remote sensing with agricultural imaging, incorporating terms such as remote sensing, precision agriculture, and biomass. The agricultural production and influencing factor analysis cluster (red cluster) highlights research focused on environmental influences in agricultural production, including keywords such as production, hyperspectral imaging, and influencing factor. The model optimization and deployment cluster (yellow cluster) addresses challenges related to accuracy improvement and model generalization in agricultural image processing, with representative terms including accuracy, species, and change. Lastly, the environmental interference and background processing cluster (purple cluster) pertains to technical challenges encountered in real-world applications, involving keywords such as robot, occlusion, object, and localization. Overall, this analytical outcome provides a clear depiction of current research hotspots and emerging trends, offering valuable insights for researchers aiming to understand and advance the field of agricultural image processing. Building on these findings, a representative sample of 134 publications was systematically selected for in-depth review and critical evaluation, ensuring broad coverage of core research themes and methodological approaches.
The second stage involves a detailed analysis of each document collected in the initial phase, guided by the following research questions:
  • Which specific challenges in agricultural image processing are addressed in the study?
  • What technical approaches are employed to tackle these challenges?
  • What are the advantages and limitations of the proposed technological solutions?
  • Does the study include comparative evaluations against alternative technologies?
  • What are the current state-of-the-art solutions for overcoming these challenges?

3. Technical Challenges in Agricultural Image Processing: Environmental Challenges

3.1. Problems Posed by Environmental Conditions

According to our bibliometric analysis of 2156 articles published between 2010 and 2025 (Figure 3), approximately 43% of the reviewed studies explicitly focused on environmental challenges in agricultural image processing. Keywords such as background interference, generalization, and multi-scale adaptation appeared with high frequency, indicating a widespread and sustained research interest in addressing the complexities of natural environments where agricultural imagery is typically acquired. In open-field environments, factors such as uneven topography, fluctuating lighting conditions, and variable weather patterns frequently introduce background disturbances—including weeds, shadows, reflective water surfaces, and occluding branches. These disturbances obscure salient crop features and significantly compromise the accuracy of key tasks such as object detection, image segmentation, and disease classification. For instance, in applications like pest identification or fruit detection, soil textures or partially occluded plant parts are often misclassified as lesions or target objects, thereby reducing model reliability. Furthermore, our analysis reveals that over 28% of the reviewed publications emphasized domain adaptation and cross-scenario robustness, underscoring the critical importance of addressing generalization challenges. Agricultural imagery exhibits considerable temporal and spatial heterogeneity due to variations in geographic regions, crop growth stages, and sensor technologies. Models trained on specific datasets often fail to generalize to new environmental conditions, particularly when exposed to unfamiliar lighting, resolution levels, or morphological characteristics. In recent years, multi-scale adaptation has emerged as a central research focus, as evidenced by more than 500 publications within our dataset. The diversity of image acquisition sources—from satellite remote sensing to UAV-based imaging and microscopic close-ups—introduces significant morphological variation in target objects. Single-scale models often struggle to simultaneously capture both large-scale field structures and fine-grained lesion details, leading to missed detections or imprecise boundary segmentation.
In summary, bibliometric trends clearly demonstrate that background interference, limited generalization, and multi-scale variability are not only persistent technical challenges but also prominent research hotspots. These findings highlight the pressing need for the development of more robust, adaptive, and scalable algorithmic solutions in the field of agricultural image processing.

3.1.1. The Challenge of Background Interference

Background interference is a primary challenge in agricultural image processing, significantly complicating the accurate identification and analysis of target crops, pests, and diseases. The complexity of field environments introduces numerous interfering factors, including variations in soil texture and color, overgrown weeds that closely resemble crop seedlings in shape and hue, overlapping branches and leaves, drastic fluctuations in lighting conditions (e.g., strong illumination, shadows, backlighting), reflective water stains, as well as non-target objects such as agricultural machinery, stones, and crop residues. These background elements often exhibit high similarity to target objects in terms of color, texture, shape, and spectral characteristics, or may occlude them entirely. As a result, algorithms struggle to effectively distinguish foreground from background, which markedly undermines the accuracy and robustness of tasks such as target detection, disease diagnosis, growth stage evaluation, and yield prediction. This persistent issue necessitates the development of more sophisticated segmentation models capable of mitigating this problem across diverse agricultural scenarios.
From the perspective of crop growth characteristics, crops tend to occlude one another as they grow. For instance, in an orchard setting, fruits develop in dense formations, with branches and leaves intertwining and fruits and foliage mutually obstructing each other’s visibility, which makes it challenging to fully discern the outlines of certain fruits. Similarly, in the later growth stages of field crops such as rice and wheat, plants become closely spaced and densely packed, with overlapping leaves that obscure individual plant morphology and signs of pests or diseases. This natural occlusion and overlap results in blurred object boundaries and missing key visual features in the images, significantly increasing the difficulty of tasks such as image segmentation and object detection.
Environmental factors further aggravate the challenges of background interference in agricultural image processing. Wind and rain can cause crops to lodge and tangle, transforming an originally organized plant layout into a disordered arrangement, thereby increasing the degree of overlap between targets and the background, as well as among targets themselves. In certain greenhouse environments, structural components such as plastic films and support brackets create complex background clutter that obscures the visibility of crops and compromises image clarity. Additionally, fluctuating lighting conditions generate shadows on crop surfaces, which, when combined with other forms of obstruction, lead to a more complicated distribution of pixel intensity values in the images. This complexity significantly hinders the accurate differentiation of target regions from background areas.
At the data collection and processing level, occluding and overlapping backgrounds substantially increase the difficulty of data annotation. Annotators struggle to precisely define the true boundaries of occluded targets, and annotation errors rise accordingly, degrading the quality of training data. Meanwhile, because the features of occluded targets are incomplete, existing deep learning models often fail to learn comprehensive and discriminative target features during training, resulting in missed detections, false detections, and reduced generalization capability and detection accuracy.

3.1.2. The Challenge of Generalization Adaptation

The challenge of generalization adaptation in agricultural image processing mainly stems from the high variability and complexity of agricultural scenes. Variations in lighting conditions, extreme weather, morphological differences at different crop growth stages, regional planting diversity, and inconsistencies in image scale, perspective, and resolution resulting from different acquisition devices have led to models performing well on training datasets but having difficulty maintaining accuracy in real, variable application scenarios. This cross-domain distribution shift significantly hampers the model’s ability to adapt to new environments, thereby reducing its generalization ability and limiting the effectiveness and reliability of intelligent agricultural systems in large-scale deployment.
The agricultural production environment is complex and dynamic, with significant variations in climate and soil conditions across different geographical regions. For example, in northern arid regions and southern humid zones, differences in light duration, temperature and humidity during crop growth lead to obvious differences in the visual characteristics of crops such as shape, color and texture. Even for the same crop, when grown in different environments, differences may arise in leaf size, color saturation, and the manifestation of pests and diseases. In addition, seasonal changes and time-of-day fluctuations in lighting can cause variations in brightness and shadow distribution across target objects, further increasing the diversity of image features and making it difficult for models trained in one particular environment to adapt to image features in other environments.
Data acquisition presents numerous difficulties, making it hard for training data to cover all possible situations. On the one hand, data for special scenarios or rare pests and diseases are difficult to obtain, resulting in a lack of sufficient training samples and poor model performance in these situations. On the other hand, class imbalance is a prominent problem: common diseases are represented by large numbers of samples while rare diseases have very few, which can cause the model to over-learn common category features and recognize minority categories poorly. Moreover, inconsistencies in image resolution, color reproduction, and noise level due to varying acquisition devices further challenge model generalization.

3.1.3. The Challenge of Multi-Scale Scene Adaptation

In agricultural remote sensing monitoring, multi-scale scene adaptation faces severe challenges. Due to differences in resolution, temporal phase, and coordinate system, the spatiotemporal alignment accuracy of satellite, unmanned aerial vehicle, and ground sensor data is often insufficient, which directly affects the accuracy of farmland boundary identification. For example, if the fusion of Sentinel-2 and high-resolution satellite data fails to solve the problems of spatial downscaling and spectral consistency calibration, crops can be misclassified, which in turn reduces the accuracy of map updates and agricultural machinery navigation. Meanwhile, the dynamic changes in agricultural scenarios impose higher requirements on model robustness. Variations in weather and lighting conditions, such as the transition between sunny and cloudy days, can cause significant fluctuations in the performance of weed detection models, with traditional methods performing 15–20% worse on cloudy days. Morphological changes during the crop growth cycle, such as leaf occlusion and plant deformation, further complicate target detection, leading to a significant increase in the false detection rate of models like YOLO (You Only Look Once) in densely occluded scenarios. In addition, there is a tension between the computing power limitations of edge devices and the real-time requirements of agriculture. Lightweight models often lack sufficient inference speed on agricultural machinery edge devices, making it difficult to meet the sub-100 ms response requirements of real-time operations, especially in multi-scale feature extraction tasks. At the same time, small targets such as disease spots and early weeds have a high missed detection rate in low-resolution remote sensing images and rely on computationally costly multi-level magnification processing, further increasing resource consumption.

3.2. Cutting-Edge Solutions

In response to challenges such as severe background interference and complex, variable environmental conditions in agricultural imaging, researchers have proposed segmentation methods based on attention mechanisms, Transformers, and multi-scale fusion. Combined with lightweight design and data augmentation, these methods have effectively enhanced the robustness, generalization ability, and deployment efficiency of models, thereby providing essential support for the real-world implementation of agricultural image processing systems.

3.2.1. Model Segmentation Techniques

Model segmentation techniques in agricultural image processing are mainly used to precisely extract target regions, such as crops, pests and diseases, and weeds, from complex natural backgrounds. Traditional segmentation methods, including threshold-based, edge-detection, clustering, and region-based approaches, offer simplicity and interpretability but suffer from limited accuracy and robustness in variable field conditions. In recent years, deep learning methods have significantly improved segmentation performance, especially models based on convolutional neural networks (CNNs) such as U-Net, DeepLabV3+ and SegNet, which are widely used in tasks such as farmland plot division, crop type recognition, and diseased leaf detection. These models can learn complex image features from large amounts of labeled data and achieve high-precision, pixel-level segmentation. Meanwhile, to improve performance under small-sample conditions, many researchers have introduced techniques such as transfer learning, data augmentation, and attention mechanisms, further promoting the development of smart agriculture-oriented image segmentation.
(1)
Traditional model segmentation techniques
As shown in Table 1, methods such as threshold segmentation, edge segmentation, and cluster segmentation remain widely used within traditional segmentation frameworks. Each has its own advantages, disadvantages, and applicable scope, depending on image complexity, target object characteristics, and environmental variability.
Threshold segmentation is one of the most fundamental and common methods in image segmentation, especially in the early days of agricultural image processing. It is typically applied to separate the foreground (such as crops, fruits, or spots) from the background (such as soil, sky, or leaves). The basic principle of the threshold segmentation algorithm is to exploit the gray-level features of the image: one or more gray-level thresholds are calculated, and each pixel is then classified by comparing its grayscale value against the thresholds. Li et al. [23] applied the Semantic Edge-Aware Multi-Task Neural Network (SEANet) for rice planting area identification. SEANet enhances the extraction of geographical features through the Geometric Boundary Constraint Module (GBCM), effectively overcoming the heavy reliance of traditional threshold segmentation on morphological characteristics. During the rice heading stage, the model optimizes the segmentation and classification tasks synchronously via a multi-task learning framework. By leveraging distinct morphological differences among rice plants, it enables precise plot boundary delineation in complex terrains. In root phenotypic analysis, Zhang et al. [24] developed the High-Throughput Root Phenotyping Platform (Root-HTP), integrating the deep learning-based D-LinkNet model and RootNav 2.0. It innovatively combines morphological skeletonization techniques with an adaptive threshold optimization strategy. By maintaining the continuity of the root system’s topological structure, this system successfully achieves the precise segmentation of fine roots with diameters less than 0.2 mm, effectively resolving the segmentation challenges in complex scenarios at the soil-root interface. In fruit recognition and picking, Sun et al. [25] proposed an active depth perception path-planning algorithm to tackle the problems of occlusion and lighting variations in automatic harvesting. The algorithm integrates RGB-D camera data with spatial coordinate mapping and incorporates the Convolutional Block Attention Module (CBAM) to dynamically focus on prominent fruit features, achieving robust detection performance in complex apple and tomato picking scenarios.
Peng et al. [26] used threshold segmentation as a core step to remove background noise and, by combining it with a hierarchical sparse autoencoder network, achieved precise prediction of chlorophyll content in pomegranate leaves, providing a new idea for the non-destructive detection of plant physiological indices. In crop disease identification, Chen et al. [27] proposed a fusion framework integrating an elite-strategy-based particle swarm optimization algorithm (GCLPSO) and the Otsu algorithm in response to the limitations of traditional Otsu thresholding on complex lesion images, which significantly improved the segmentation quality of corn leaf spot, gray leaf spot, and rust disease images while effectively reducing noise sensitivity and over-segmentation. These studies show that threshold segmentation, when organically combined with other algorithms, demonstrates strong technical potential and application value in areas such as agricultural phenotypic analysis, fruit identification, and disease diagnosis.
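To make the thresholding workflow concrete, the following minimal sketch applies Otsu's automatic threshold selection with OpenCV. The file path, blur kernel, and the assumption of a roughly bimodal gray-level histogram are illustrative rather than drawn from the studies above.

```python
import cv2

# Load a field image and convert to grayscale (path is illustrative)
image = cv2.imread("leaf.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# A light Gaussian blur suppresses soil-texture noise before thresholding
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's method picks the threshold that maximizes between-class variance,
# separating foreground (e.g., leaf or lesion) from background automatically
threshold, mask = cv2.threshold(
    blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
)
print(f"Otsu-selected gray-level threshold: {threshold}")
```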
The core principle of edge segmentation is to identify positions where pixel gray values change abruptly, either between the target object (such as leaves, fruits, or stems) and the background (such as soil or sky) or between different tissue regions (such as healthy and diseased leaf areas). The algorithm works by calculating the gradient magnitude or direction of change in gray values for each pixel based on its local neighborhood. In agricultural applications, the algorithm needs to adapt to complex and variable natural environments (such as uneven lighting, overlapping leaves, and soil backgrounds) and accurately delineate the contours of crops or the boundaries of lesion areas, providing a foundation for subsequent recognition, segmentation, and quantitative analysis. Common edge detection operators include Prewitt, Sobel, Canny, and Roberts. Xu and Zhang [28] proposed an improved Canny edge detection method for irregular shapes such as sunflower seeds, aiming to solve the problems of missed detection, false detection, and edge breakage in complex backgrounds; they introduced an adaptive threshold technique based on gradient statistical differences to automatically determine the high and low thresholds, addressing the sensitivity of manually set thresholds to gray-level changes. Ju et al. [29] proposed a pomegranate spot detection algorithm based on an adaptive-threshold Prewitt operator to address the low detection accuracy, weak noise resistance, and pseudo-edges of pomegranate spots in complex natural environments; it combines dual Gaussian weights of spatial distance and pixel similarity to preserve edge features while denoising, and assigns center weights based on the Gaussian noise distribution to enhance the edge response. Wang et al. [30] improved the traditional Haar-like feature model to solve the problem of rice disease image collection and segmentation in complex natural environments. Edge segmentation identifies the boundary between the target and the background by calculating the gray gradient of pixels, while the watershed algorithm, as an extended form of edge segmentation, essentially simulates the water accumulation process of terrain based on image gradient information to achieve regional segmentation. Both are centered on gradient analysis and jointly support the precise segmentation of agricultural images. For example, Liu et al. [31], based on edge detection results from R-G grayscale images, used the watershed algorithm to segment apple images into irregular blocks, achieving a 20.31% reduction in block count compared with the traditional gradient watershed algorithm; they then used an SVM (Support Vector Machine) model, through optimal hyperplane search and slack variable optimization, to classify fruit and non-fruit blocks. Zhu et al. [32] combined the H-channel threshold segmentation method, the Euclidean distance transform watershed algorithm (accurate but time-consuming), and edge segmentation to address the segmentation of adhered wheat grains. Using a BP neural network (achieving an average accuracy of 97%) and an SVM model for classification, they designed a single-grain diversion channel for the physical separation of adhered wheat to simplify the algorithm, and reduced noise and computational complexity through preprocessing optimization and feature extraction, improving the accuracy and efficiency of wheat quality detection. Li et al. [33] proposed an improved watershed segmentation method for identifying early rotten areas in oranges. This method enhances tissue contrast through morphological gradient processing, suppresses noise using gradient reconstruction and label correction techniques, and achieves accurate extraction of rotten areas through the watershed transformation. It innovatively employs a dual-model architecture of PLS-DA and BP-ANN, which are linear and nonlinear classification models, respectively. The MC-UVE-SPA algorithm was used to screen the 568.8 nm and 771.2 nm feature wavelengths, reducing the image processing workload, and the optimized segmentation algorithm shortens single-sample detection time. This method combines spectral feature screening with algorithm optimization to provide an efficient solution for rapid fruit quality detection.
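The gradient-based pipeline described above can be sketched with OpenCV's Canny detector and marker-based watershed. The image path, hysteresis thresholds, and seed-extraction heuristics below are illustrative assumptions, not the exact configurations of the cited studies.

```python
import cv2
import numpy as np

image = cv2.imread("apples.jpg")                      # illustrative path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Gradient-based edge map (hysteresis thresholds chosen by hand here)
edges = cv2.Canny(gray, 50, 150)
print("edge pixels:", int((edges > 0).sum()))

# Marker-based watershed for touching fruits: Otsu binarization, then the
# distance transform yields sure-foreground seeds inside each fruit
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)

kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(binary, kernel, iterations=3)    # generous background
unknown = cv2.subtract(sure_bg, sure_fg)              # band to be flooded

# Label each seed, reserve 1 for sure background and 0 for the unknown band,
# then let the watershed flood the gradients; boundary pixels are marked -1
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(image, markers)
image[markers == -1] = (0, 0, 255)                    # draw boundaries in red
```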
Clustering segmentation is essentially a process of aggregating and classifying independent patterns based on similarity criteria, with the core objective of ensuring that the similarity between patterns of the same class is significantly higher than that between classes. Compared with traditional image segmentation methods, clustering segmentation offers distinct advantages when dealing with images that have blurred edges and complex backgrounds. Leveraging its unsupervised learning characteristics and computational efficiency, it has become an important tool in modern agricultural image processing. Currently, the fuzzy C-means (FCM) algorithm and the K-means clustering algorithm are the most widely used in agricultural scenarios. In the study of grape cluster image segmentation, Liu et al. [34] proposed a fusion scheme of K-means and the local outlier factor (LOF) algorithm. This method effectively reduces noise interference by pre-filtering image outliers using LOF; on this basis, K-means clustering is then applied to achieve precise segmentation of grape clusters under different lighting and background conditions. Pham and Lee [35] developed a hybrid segment-and-merge algorithm for fruit defect detection: first, the original image was divided into base regions using the K-means algorithm, and then similar regions were iteratively merged using a minimum spanning tree (MST) algorithm, resulting in significant improvements in defect recognition accuracy and segmentation efficiency. In the field of automatic apple grading, Yu et al. [36] proposed a method based on multi-view image features and weighted K-means clustering. This method integrates four-view images—top, bottom, and both sides—and introduces average gray values as key features, enabling effective differentiation of defects, stems, and calyxes, with a grading accuracy exceeding 96%.
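As a hedged illustration of color-based clustering segmentation, the sketch below runs K-means over pixel colors with OpenCV; the choice of K = 3 and the input path are assumptions for demonstration, not parameters taken from the cited studies.

```python
import cv2
import numpy as np

image = cv2.imread("grapes.jpg")                      # illustrative path
pixels = image.reshape(-1, 3).astype(np.float32)      # one row per pixel (BGR)

# K-means groups pixels by color similarity; K = 3 may roughly separate
# fruit, foliage, and background in a simple scene
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5,
                                cv2.KMEANS_PP_CENTERS)

# Repaint every pixel with its cluster center to visualize the segmentation
segmented = centers[labels.flatten()].astype(np.uint8).reshape(image.shape)
cv2.imwrite("grapes_segmented.png", segmented)
```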
(2)
Deep learning-based model segmentation techniques
In recent years, with the development of deep learning in the field of image analysis, these techniques have been widely applied to the segmentation of crop disease images. Deep learning-based image segmentation algorithms, including semantic segmentation, instance segmentation, and panoptic segmentation, outperform traditional machine learning segmentation methods in both performance and efficiency. They also offer stronger robustness to interference and support more direct end-to-end training. Therefore, many researchers are focusing on deep learning-based model segmentation techniques to solve segmentation problems in complex scenarios.
The YOLO model has successfully extended its real-time object detection capabilities into instance segmentation by adding segmentation heads. Its greatest advantage lies in its astonishing speed and efficiency, making it an ideal choice for real-time segmentation applications. However, it is generally inferior to more complex segmentation models in terms of segmentation accuracy and edge detail, and mainly focuses on instance segmentation rather than semantic segmentation. Choosing YOLO or other models involves a trade-off between speed, accuracy, resource consumption, and application requirements. YOLO’s segmentation capability is a valuable tool for real-time systems that need to quickly obtain contour information of object instances. Tian et al. [37] proposed a segmentation method based on an improved YOLOv8 for cucumber fruit segmentation in greenhouse environments. The spatial adaptability of the model to irregular fruit shapes was enhanced by introducing deformable convolution (DCNv4); additionally, the RepNCSPELAN4 module was designed in series with the C2F module to optimize multi-scale feature extraction and fusion, effectively improving the accuracy of cucumber fruit boundary segmentation. Zhao et al. [38] proposed an instance segmentation method based on YOLOv8n-seg for low-cost lettuce height measurement. This method designs a lightweight instance segmentation model by enhancing the YOLOv8n-seg model architecture and reconstructing the channel dimension distribution, enabling precise segmentation of lettuce. Using YOLOv8n-seg as the base model and innovatively introducing FasterNet as the backbone network to replace the original CSP DarkNet significantly improves the efficiency of the model. Through model optimization and lightweight design, the improved model increased processing speed by 25.56% while maintaining a high average accuracy. Ma et al. [39] proposed an improved YOLOv8-seg model for maturity detection and instance segmentation of lotus pods. The model integrates the convolutional block attention module (CBAM) to extract salient features and adopts the WIoU regression loss function to improve the bounding box prediction accuracy. Experiments show that compared with models such as YOLOv8n-seg, YOLOv5, Mask R-CNN, and YOLACT, the improved YOLOv8-seg model performs better in terms of accuracy, speed, and model size, with an average accuracy ranging from 97.4% to 98.6%. Under complex lighting conditions, the model demonstrated strong robustness and accurately segmented overlapping and occluded lotus pods. This approach offers an effective visual perception solution for the lotus pond harvesting robot, enhancing harvesting efficiency.
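For readers who want to experiment with this family of models, the sketch below runs inference with the open-source ultralytics implementation of YOLOv8-seg. The generic pretrained checkpoint and image path are assumptions; reproducing the results of the cited studies would require their custom-trained weights.

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 nano segmentation checkpoint (downloads on first use)
model = YOLO("yolov8n-seg.pt")

# Run instance segmentation on a field image; conf filters weak detections
results = model.predict("cucumber_row.jpg", conf=0.4, imgsz=640)

for result in results:
    if result.masks is None:
        continue
    # result.boxes holds per-instance boxes/classes; result.masks the polygons
    for box, mask in zip(result.boxes, result.masks):
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf), mask.xy[0].shape)  # polygon vertices
```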
However, while the YOLOv8-seg model achieves excellent real-time performance and robustness, its instance-level segmentation approach inherently limits its ability to produce precise semantic boundaries. For tasks where boundary accuracy is critical—such as disease area quantification or subtle phenotypic trait analysis—semantic segmentation models with refined encoder–decoder architectures (e.g., DeepLabV3+, SegFormer) may offer superior accuracy, albeit at greater computational cost. Therefore, a careful trade-off must be made between speed, memory efficiency, and boundary-level precision when selecting models for specific agricultural applications.
Jiang et al. [40] proposed an improved YOLOv8n model for strawberry target detection by introducing the Squeeze-and-Excitation (SE) attention mechanism and the EIoU loss function, significantly improving detection accuracy and localization performance. The model performed well in the strawberry image recognition task, achieving a detection accuracy of 94.7%. In addition, the study improved the efficiency of strawberry distribution density estimation through model optimization, data preprocessing and augmentation, and density estimation method optimization. Tang et al. [41] proposed a region segmentation method based on nearest neighbor analysis for the precise segmentation and density evaluation of target regions in apple images. This method constructs a distance matrix and adjacency list, applies Prim’s algorithm to generate a minimum spanning tree for identifying connected components, combines elliptic fitting for regional contour extraction, and uses a weighted kernel density function to estimate regional density. In the model application, the improved YOLOv8n detection model also incorporates the SE attention mechanism and the EIoU loss function to optimize target detection accuracy and localization performance. By combining kernel density estimation optimization with nearest neighbor analysis, processing efficiency is improved while segmentation accuracy is maintained, providing technical support for planting density optimization and harvesting strategies. Zhu et al. [42] proposed a multi-scenario adaptive method for sugar apple instance segmentation, centered on the GCE-YOLOv9-seg architecture, an improved version of the YOLOv9-seg model. By integrating Gamma correction (GC) for image enhancement, the authors significantly improved the robustness of target region recognition under complex lighting conditions, such as interwoven strong light and shadow. The fusion of the efficient multi-scale attention mechanism (EMA) and the convolutional block attention module (CBAM) enhances the model’s ability to capture features across fruit scale variations (with a diameter span of 2–8 cm) and in densely overlapping scenes. Through the collaborative optimization of illumination enhancement and multi-scale attention, the problems of missed and false detection in complex environments are addressed, and high-precision, real-time multi-scene adaptation is achieved. Table 2 presents a comparison of evaluation metrics for YOLO-based models. While Table 2 summarizes key metrics such as precision, recall, mAP (mean Average Precision), inference time, and GFLOPs (Giga Floating Point Operations), it is important to note that the relevance of these metrics varies across tasks. For instance, mAP is critical in object detection tasks, particularly when evaluating models on small or occluded targets, as often seen in early-stage pest detection. In contrast, real-time applications such as UAV-based monitoring or robotic harvesting emphasize FPS (Frames Per Second) and inference time over absolute accuracy. Moreover, precision and recall trade-offs should be interpreted carefully in imbalanced datasets, where high recall may lead to increased false positives. Therefore, model evaluation in agricultural settings should consider both algorithmic performance and deployment feasibility.
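Since these comparisons hinge on a shared understanding of the metrics, the following self-contained sketch computes box IoU, precision, and recall from first principles (mAP averages such precision-recall behavior over classes and IoU thresholds). The numbers are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(tp, fp, fn):
    """Precision = TP / predictions, recall = TP / ground truths."""
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)

# A prediction typically counts as a true positive when IoU >= 0.5
print(box_iou((10, 10, 60, 60), (20, 20, 70, 70)))   # ~0.47 -> not matched
print(precision_recall(tp=85, fp=10, fn=15))          # (~0.895, ~0.85)
```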
The Transformer model plays a crucial role in the field of agricultural image segmentation. With its powerful self-attention mechanism and parallel processing capabilities, it can effectively capture long-distance feature relationships in images, thereby enhancing segmentation performance. Agricultural images often have complex backgrounds and diverse objects, and the Transformer can adaptively focus on different regions through its multi-head attention mechanism, thereby overcoming limitations of traditional convolutional neural networks (CNNs) on such problems. While Transformer-based models demonstrate significant advantages in capturing global contextual information and managing complex spatial configurations, they generally entail increased model complexity, prolonged training durations, and elevated inference latency. In agricultural applications that demand real-time processing or operate under power-constrained conditions—such as UAV-based weed detection or in-field disease monitoring—these models may not be practically viable without substantial model compression or dedicated hardware acceleration. Consequently, the applicability of Transformer architectures must be carefully assessed with respect to specific resource limitations and operational priorities. In terms of crop disease segmentation, Yang et al. [43] proposed an instance segmentation method based on TD-BlendMask (Transformer-DCNv2-BlendMask) for the problem of leaf disease segmentation of Panax notoginseng in complex field environments. Based on the BlendMask framework, Transformer encoders were introduced to capture global dependencies and distinguish highly similar diseases such as gray mold and blight, as well as small target lesions with variable morphology such as anthracnose. The integrated deformable convolutional network (DCNv2) dynamically adjusts the receptive field to enhance adaptability to irregular small targets. Dai et al. [44] proposed the AISOA-SSformer image segmentation method to address the problems of irregular lesions, multifaceted background interference, and edge blurring in rice leaf diseases. Based on the Transformer architecture, this method replaces the multi-layer perceptron (MLP) with a sparse global update perceptron (SGUP) and uses a sliding window mechanism to smooth parameter updates, enhancing the stability and segmentation accuracy of the model for irregular lesion patterns; in addition, a Significant Feature Attention Mechanism (SFAM) is introduced to enhance the model’s ability to distinguish between disease areas and complex backgrounds, enabling precise segmentation of rice leaf disease areas. Wu et al. [45] proposed an apple orchard recognition framework based on multi-temporal remote sensing images and deep learning, using SegNet as the backbone and integrating the convolutional block attention module (CBAM) and the Focal Loss function to construct the AOCF-SegNet model. This method integrates multi-temporal phenological features (spectral differences between flowering and fruiting periods) and spatial texture information from Sentinel-2 satellites, enhances key feature learning through the attention mechanism, and incorporates Focal Loss to address the class imbalance problem, enhancing the generalization ability of the model for complex scenes such as different planting densities and terrain undulation. The model maintains inference efficiency through parameter optimization, providing a multi-scenario adaptive deep learning solution for the precise identification of large orchards and agricultural remote sensing monitoring. Figure 4 compares the architectures of YOLO-based and Transformer-based detection models. YOLO adopts a streamlined, single-stage design that combines feature extraction and object localization into a unified pipeline, enabling fast inference. In contrast, Transformer-based models decouple the encoding and decoding processes, utilizing global self-attention mechanisms to capture long-range dependencies and contextual information. While YOLO offers superior real-time performance and is well-suited for edge deployment, Transformer architectures provide higher accuracy in complex agricultural scenes due to enhanced semantic understanding and multi-scale feature aggregation. This architectural comparison underscores the trade-off between speed and precision in selecting models for agricultural image processing.
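The global self-attention at the core of these architectures can be illustrated in a few lines of PyTorch, treating feature-map locations as a token sequence; the tensor sizes and head count below are arbitrary choices for demonstration.

```python
import torch
import torch.nn as nn

# Treat an 8x8 grid of feature vectors as a sequence of 64 "patch" tokens
batch, tokens, dim = 2, 64, 256
patch_features = torch.randn(batch, tokens, dim)

# Multi-head self-attention lets every patch attend to every other patch,
# which is how Transformer segmenters capture long-range field context
attention = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
context, weights = attention(patch_features, patch_features, patch_features)

print(context.shape)   # torch.Size([2, 64, 256])
print(weights.shape)   # torch.Size([2, 64, 64]) - per-token attention maps
```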
DeepLabV3 has significant application advantages in agricultural image segmentation, especially when dealing with agricultural images that have complex backgrounds, multiple scales, and high requirements for boundary clarity; its superior performance provides important support for precision agriculture. By combining dilated convolution and multi-scale processing, DeepLabV3 can effectively improve the accuracy of agricultural image segmentation and promote the development of agricultural automation and intelligence. In the study of fruit cluster segmentation, Peng et al. [46] proposed an overlapping segmentation method for grape clusters based on deep region growing (DRG) to solve the problem of precisely extracting the front-most grape cluster for grape-picking robots. The DeepLabV3+ model, optimized by transfer learning, was used for semantic segmentation of RGB images to obtain the grape region mask; the mask was then mapped to the depth map collected by a RealSense D435, the front-most cluster among overlapping clusters was extracted through an innovative depth region growing algorithm, and hole filling and contour extraction were performed. In a natural pergola environment, the method achieved an average recall rate of 89.2%, an accuracy of 87.5%, and a processing time of 98 ms per frame, and it adapted to ±15° camera tilt angle changes (accuracy maintained at 87.1%), meeting the real-time operation requirements of the picking robot. Wang et al. [47] proposed a semantic segmentation method based on the MV3_DeepLabV3+ network framework for the precise identification of wheat harvesting boundary lines. This method uses the lightweight MobileNetV3_Large as the backbone network to extract image features, introduces depthwise separable dilated convolution into the ASPP structure to expand the receptive field of the model while reducing the complexity of network parameters, and finally achieves multi-scale feature fusion through a decoder. Through lightweight processing, MV3_DeepLabV3+ achieved an IoU (Intersection over Union) of 95.20% for unharvested wheat regions in complex environments, a pixel accuracy (PA) of 98.04%, and a frame rate of 7.5 FPS in video segmentation. In terms of efficiency, the introduction of depthwise separable dilated convolution significantly reduces the computational load of the network, shortening the average processing time of a single image to 0.15 s and improving computational efficiency by 40% compared with traditional models, while maintaining robustness to ±10% light intensity variation, providing an efficient solution for the real-time visual navigation of wheat harvesters. Peng et al. [46] also proposed a grape image segmentation method based on the DeepLabV3+ model to address the low segmentation accuracy of traditional methods in complex lighting and multi-variety grape scenarios. By applying a transfer learning strategy to the DeepLabV3+ network (Figure 5), an IoU of 84.26% was achieved in the RGB color space, 6.73% and 8.65% higher than U-Net and FCN, respectively. Compared with traditional image processing algorithms, the deep learning method reduced the false detection rate by more than 40% in the multi-variety grape segmentation task, and the inference speed met real-time requirements (<200 ms/frame).
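A minimal inference sketch with the torchvision implementation of DeepLabV3 is shown below. The generic pretrained checkpoint and image path are assumptions; agricultural use of the kind described above would require fine-tuning on crop-specific masks.

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

# Pretrained DeepLabV3 with a ResNet-50 backbone (generic classes, not crops)
model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("vineyard.jpg").convert("RGB")    # illustrative path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"]          # (1, num_classes, H, W)
mask = logits.argmax(dim=1).squeeze(0)    # per-pixel class indices
```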

3.2.2. Domain-Adaptive Techniques

In agricultural image processing, Domain Adaptation (DA) techniques are widely used to address inconsistent data distributions between the training domain (source domain) and the test domain (target domain), and especially to enhance a model’s generalization across different environments, devices, crops, and seasons.
Currently, Cycle-Consistent Generative Adversarial Network (CycleGAN) methods based on generative adversarial networks (GANs) have shown significant advantages in unsupervised image-to-image translation tasks, as they do not rely on paired samples and can effectively overcome image distribution differences caused by illumination variations or insufficient sample data. These methods significantly enhance the generalization performance of agricultural image segmentation, recognition, and detection models in practical application scenarios. Xu et al. [48] addressed the problem of insufficient labeled data under weather changes by converting 817 sunny images into cloudy-scene images, expanding the training set to 1634 images and effectively enhancing the model’s adaptability to illumination changes. The results showed that without CycleGAN, the intersection-over-union (IoU) of the model in cloudy scenes was only 70.2%; after integrating CycleGAN, the IoU increased to 91.6%, confirming the effectiveness of the method in enhancing model robustness. Tian et al. [49] proposed a solution based on an improved CycleGAN to address the difficulty of dataset acquisition, insufficient sample size, and class imbalance in rice disease image recognition, with the innovative application of CycleGAN as its core. This solution overcomes the limitation that the traditional CycleGAN can only handle single-modal data by building a dual-modal generation framework for hyperspectral and RGB images. By embedding a joint spectral-spatial attention mechanism in the residual module of the generator, the model can automatically focus on the key feature wavelengths and subtle lesions in the asymptomatic stage of rice blast disease, solving the problem of insufficient extraction of subtle disease features by the original CycleGAN.
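The cycle-consistency constraint that lets CycleGAN learn from unpaired sunny and cloudy images can be sketched as follows; the toy generators, random tensors, and loss weight are placeholders for a real training setup rather than the configuration of the cited studies.

```python
import torch
import torch.nn as nn

# G: sunny -> cloudy generator, F: cloudy -> sunny generator (placeholder
# networks here; real CycleGANs use ResNet-style generators)
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
F = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())

l1 = nn.L1Loss()
sunny = torch.rand(4, 3, 256, 256)    # unpaired batches from each domain
cloudy = torch.rand(4, 3, 256, 256)

# Cycle consistency: translating to the other domain and back should
# reconstruct the original image, even without paired training samples
cycle_loss = l1(F(G(sunny)), sunny) + l1(G(F(cloudy)), cloudy)

# In full CycleGAN training this term (weighted by lambda, typically ~10)
# is added to the adversarial losses of the two domain discriminators
loss = 10.0 * cycle_loss
loss.backward()
```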

3.2.3. Attention Mechanism Techniques

The core idea of the attention mechanism is to enable neural networks to dynamically and selectively focus on the most relevant and informative parts of the input, assigning them higher weights while suppressing irrelevant or interfering information. In agricultural image processing, in the face of complex and changeable farmland environments and common data limitations, the attention mechanism significantly enhances model generalization by focusing on key areas and adaptively selecting the most discriminative feature channels. It effectively suppresses background noise interference, dynamically adapts to changes in target appearance, integrates multi-scale contextual information, and enhances the discriminative expression of features, enabling the model to maintain stable and reliable performance on previously unseen farmland scenes, crop varieties, or growth stages, providing critical support for the robustness of precision agriculture applications. Xu et al. [50] proposed a multi-temporal deep learning model called DeepCropMapping (DCM), which combines a Bidirectional Long Short-Term Memory network (BiLSTM) and the attention mechanism to perform dynamic mapping of corn and soybeans using Landsat Analysis Ready Data (ARD) time series. The model demonstrated exceptional spatial generalization ability at six sites in the U.S. Corn Belt, with an average Kappa coefficient of 85.8% and an average transfer Kappa coefficient of 82.0% at spatially independent test sites, significantly outperforming the Transformer, Random Forest (RF), and multi-layer perceptron (MLP) baselines. Chen et al. [51] proposed a domain-adaptive rice disease image recognition method based on a novel attention mechanism (CPAM), designed to address the low recognition accuracy caused by significant data distribution differences between the source and target domains under limited sample conditions. This method combines channel attention and spatial attention, optimizes the network’s focus on disease-relevant regions through feature weighting, and integrates it into a Deep Subdomain Adaptive Network (DSAN), using Local Maximum Mean Discrepancy (LMMD) to align the subdomain feature distributions of the source and target domains. When the target domain contains images with low resolution and intense background interference, the recognition accuracy of CPAM-DSAN reaches 95.25%, providing an effective solution for disease diagnosis in complex farmland environments.
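As a concrete example of channel attention, the following PyTorch sketch implements a standard Squeeze-and-Excitation block of the kind reused in the SE- and CBAM-style modules discussed above; the channel count and reduction ratio are illustrative.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al., 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # global context per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                        # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        # Reweight channels, suppressing background-dominated ones
        return x * weights

# Example: reweight a 64-channel feature map from a crop-image backbone
features = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(features).shape)    # torch.Size([2, 64, 32, 32])
```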

3.2.4. Feature Extraction Techniques

In the field of agricultural image processing, feature extraction technology is the key to transforming raw data into intelligent analysis. It identifies and quantifies key visual information from images of crops, soil, pests and diseases, and similar subjects; the core lies in extracting discriminative features from massive pixel data, covering both traditional visual features and deep learning methods. Traditional visual features include: color features (e.g., color histograms) that quantify color variations; texture features (e.g., gray-level co-occurrence matrices) that describe gray-level variation patterns; shape and morphological features (e.g., edge detection) that compute geometric properties; and spatial relationship features that describe the relative position and layout of objects. Deep learning methods, leveraging models such as convolutional neural networks, automatically learn hierarchical, task-related complex feature representations and offer significant advantages in complex agricultural scenarios. These extracted features are critical for agricultural tasks such as precise pest and disease identification, automated weed detection, and crop yield estimation; their quality and discriminative power directly affect the accuracy and intelligence level of agricultural decision-making.
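To illustrate a classical texture descriptor mentioned above, the sketch below computes gray-level co-occurrence matrix (GLCM) statistics with scikit-image; the random patch stands in for a real leaf or soil region cropped from an image.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Illustrative 8-bit grayscale patch (a real crop image would be loaded here)
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Gray-level co-occurrence matrix at 1-pixel offsets in four directions
glcm = graycomatrix(
    patch, distances=[1], angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
    levels=256, symmetric=True, normed=True
)

# Classical Haralick-style texture descriptors used as features
for prop in ("contrast", "homogeneity", "energy", "correlation"):
    print(prop, graycoprops(glcm, prop).mean())
```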
In the study of feature extraction techniques for image processing, various research teams have proposed innovative solutions for multi-scenario adaptation. Wang et al. [52] introduced a method combining unmanned aerial vehicle (UAV) photogrammetry with deep learning. This approach establishes a low-cost, high-precision framework for agricultural 3D point cloud data acquisition using RGB cameras and RTK positioning systems mounted on drones. They developed a Local-Global Feature Aggregation Network (LoGA-Net) to achieve multi-scenario adaptability. By integrating the Local Geometric Feature Enhancement module (LGFE) and the Global Context Feature Aggregation module (GCFA), along with multi-scale attention mechanisms and geometric constraints, LoGA-Net addresses the technical challenges of complex boundary recognition and occluded target segmentation in agricultural environments. Liu et al. [53] proposed a multi-task super-resolution network (MT-SRNet), which decouples environmental changes from individual animal variability through modular design. By combining a Rotated Bounding Box (RBB) Detector and an Auto-Visual Prompt Strategy to achieve key feature extraction under low-resolution input, significantly improving inference speed while maintaining high accuracy. The system processes approximately 170,000 pigs per second, and the actual deployment frame rate reached 248 FPS. Tao et al. [54] proposed a method for detecting peach blossom density by combining RGBD cameras with improved convolutional neural networks, using RGBD cameras to obtain RGB and depth information, removing complex backgrounds and distant flowers through depth segmentation. Based on CSRNet, an FC-Net model integrating multi-scale feature fusion module (MSFF), high-efficiency channel attention module (ECA), and dilated convolution to enhance small-target flower bud detection. High precision and real-time processing were achieved through preprocessing and network optimization. Peng et al. [55] proposed a grape variety identification method that integrates deep features with SVM, using AlexNet, GoogLeNet, and ResNet to extract high-dimensional features such as fc6, inception_5b/output, and fc1000. Canonical Correlation Analysis (CCA) was used to map features to a subspace with the maximum correlation coefficients to generate fusion vectors for SVM-based classification. This multi-model feature extraction and CCA fusion strategy is adopted to enhance the representation ability and solve the problem of low efficiency in traditional manual recognition. The multi-scale feature representation (MSR) method proposed by Saleh et al. [56] implements the strategy of multi-scale transformation of input images, feature extraction and fusion, and prediction head processing within the YOLOv5 framework, and enhances the robustness of the model against weed size, growth stage, and light and weather changes by capturing features of different scales. Combined with an adaptive pseudo-label assignment strategy, it can achieve performance close to the fully supervised baseline with only 1% of labeled data, enabling real-time detection in multiple scenarios. Zhao et al. [57] based on Transformer encoders and Model Morphing techniques, extracted spectral features using the self-attention mechanism and transformed them into probabilistic “knowledge features”, and generated date-continuous model sequences through non-adjacent pre-trained model interpolation, alleviating spatiotemporal heterogeneity. 
Crop-type binary segmentation can thus be achieved without ground samples, the mapping results are highly consistent with official statistics, and custom time-series knowledge model sets are supported. The knowledge-driven sample generation algorithm proposed by Miao et al. [58] combines a random forest classification model with multi-source time-series satellite data and crop phenological characteristics to generate high-quality training samples, integrating automatically generated samples, field measurement data, and multi-source satellite data for classification. In alfalfa spatial distribution mapping, the overall accuracy and F1 score both reached 0.97, and the mowing detection results were highly consistent with field observations; with parameter adjustment, the approach can be extended to monitoring other crops and geographical regions. These methods have improved the multi-scenario adaptability of feature extraction through different technical approaches, with breakthroughs in inference efficiency, data utilization, and spatiotemporal generalization, providing important technical support for image processing in cross-scenario applications such as agricultural monitoring and animal breeding.

3.3. Future Technological Development Trends in the Environmental Aspect

3.3.1. Future Trends of Segmentation Model Technology

With the development of artificial intelligence technology, segmentation methods in this field are evolving towards higher precision, lightweight designs, multimodal fusion, and real-time intelligent decision-making, driving agriculture to shift from traditional experience-based practice to data-driven management.
Firstly, the introduction of the Transformer architecture and the attention mechanism has brought significant performance improvements to agricultural image segmentation. While traditional convolutional neural networks (CNNs) perform well in handling local features, they have limitations when dealing with challenges such as large scale variations, complex background occlusions, and small-object recognition. The Transformer, with its powerful global modeling capabilities, has been introduced into semantic segmentation tasks, significantly enhancing contextual understanding of images. For instance, architectures like SegFormer break the traditional reliance on positional encoding and achieve efficient semantic segmentation without sacrificing accuracy, showing great potential for wider adoption. In complex agricultural scenarios (such as variations in orchard lighting, interlaced vegetation in the field, and blurred boundaries of disease patches), Transformers can effectively capture global semantic relationships between regions, thereby achieving more accurate segmentation. Xie et al. [59] demonstrated its excellent performance in multiple segmentation tasks, providing a theoretical basis and model reference for advancing agricultural image segmentation techniques.
Secondly, lightweight network design has become another major trend in the development of agricultural image segmentation models. Deployment environments in agricultural fields are often resource-constrained, such as mobile devices, unmanned aerial vehicles, and ground robots, which require models to offer good computational efficiency and fast inference while maintaining acceptable accuracy. In recent years, lightweight models such as MobileNet, ShuffleNet, and EfficientFormer have emerged, significantly reducing model parameters and computational load through methods such as depthwise separable convolution, channel shuffling, and pruning and quantization. Xu et al. [60] used EfficientFormer to build a lightweight Transformer that maintains segmentation accuracy with far lower model complexity and inference latency than traditional large models, making it suitable for deployment on edge devices for tasks like real-time crop monitoring, pest and disease detection, and yield estimation.
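To make the efficiency argument concrete, the following is a minimal sketch of the depthwise separable convolution underlying MobileNet-style backbones; the channel sizes are illustrative assumptions, not the configuration of any cited model.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Factorizes a standard 3x3 convolution into a per-channel (depthwise)
    convolution followed by a 1x1 (pointwise) channel-mixing convolution."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter count versus a standard 3x3 convolution with the same channels
std = nn.Conv2d(64, 128, 3, padding=1, bias=False)
sep = DepthwiseSeparableConv(64, 128)
print(sum(p.numel() for p in std.parameters()))  # 73728
print(sum(p.numel() for p in sep.parameters()))  # 9024, roughly 8x fewer
```

The roughly eightfold parameter reduction is what allows such blocks to fit the memory and latency budgets of drones and ground robots.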
Thirdly, multimodal data fusion is an important means to improve the generalization and scene adaptability of models. A single RGB image is often affected by factors such as illumination, occlusion, and complex backgrounds. In recent years, research has gradually focused on fusing visible light images with multi-source heterogeneous data such as multispectral maps, thermal imaging maps, near-infrared images, depth maps, and LiDAR point clouds. With multimodal input, the model can obtain more comprehensive spectral, structural and thermal properties of the target, thereby improving segmentation accuracy and robustness. Xu et al. [61] proposed an attentional fusion network that fuses RGB and infrared images for multispectral semantic segmentation and demonstrated a significant improvement in accuracy.
Fourth, weakly supervised and self-supervised learning are effective solutions to the problem of scarce labeled data in the agricultural field. In agricultural image segmentation, annotation work is time-consuming and labor-intensive and requires expertise, such as classifying disease types and distinguishing weed species, which increases the difficulty of data acquisition. Weakly supervised methods reduce the cost of manual annotation by introducing image-level labels, point annotations, or bounding boxes instead of pixel-level annotations. Self-supervised learning exploits unlabeled data through pre-training tasks such as image reconstruction, jigsaw-puzzle sorting, and contrastive learning, and then transfers the learned representations to segmentation tasks. These approaches prove highly effective in agricultural small-sample tasks, enabling the training of models with excellent performance from only a few labeled samples. For example, one study combined CLIP model pre-training with a small number of labeled samples to achieve precise segmentation of diseased areas in fruit trees, demonstrating the potential of pre-trained vision-language models in agriculture.
In addition, with the development of big data and cloud platforms, agricultural image segmentation is being deeply integrated with intelligent decision-making systems. The segmentation results of the model are no longer limited to image-level analysis [62], but can further support actual production processes including agricultural machinery path planning, pesticide application optimization, and disease early warning.

3.3.2. Future Trends of Domain-Adaptive Technology

Domain Adaptation (DA) technology for agricultural image processing is advancing towards smarter, more generalized, and lighter designs to address the "domain shift" problems caused by differing regions, times, lighting conditions, and imaging equipment. Because agricultural images often suffer from inconsistent data distributions and a lack of labels, traditional deep learning models frequently experience a sharp drop in performance in actual deployment. For this reason, academia and industry are promoting the deep integration of domain-adaptive technologies in agriculture, with initial results in tasks such as crop segmentation, disease identification, fruit detection, and weed localization. Future research will focus on several key trends, each supported by academic achievements that provide a solid foundation for technological development.
Image style transfer is a key approach that has been widely applied to domain adaptation in agricultural images. Agricultural images taken in different fields or with different equipment vary significantly in color, contrast, and illumination distribution, which seriously degrades model transfer. Through image style transfer, target-domain images can be "translated" into the style of the source domain, visually narrowing the differences in the distribution of low-level features. Unsupervised generative models like CycleGAN and StyleGAN excel in this direction, especially for agricultural imagery where paired images are difficult to obtain.
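As a minimal illustration of the principle behind CycleGAN-style translation, the sketch below computes only the cycle-consistency term; the two generators are stand-in assumptions, and the adversarial losses of a full CycleGAN are omitted.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(g_ab, g_ba, x_a, x_b, lam=10.0):
    # Translating A->B->A (and B->A->B) should recover the original image
    loss_a = F.l1_loss(g_ba(g_ab(x_a)), x_a)
    loss_b = F.l1_loss(g_ab(g_ba(x_b)), x_b)
    return lam * (loss_a + loss_b)

# Identity generators as stand-ins; real CycleGAN generators are CNNs trained
# jointly with two discriminators under adversarial losses (omitted here).
g_ab = g_ba = lambda x: x
x_a, x_b = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
print(cycle_consistency_loss(g_ab, g_ba, x_a, x_b))  # tensor(0.) for identity
```

The weighting lam = 10 follows the value commonly used in the CycleGAN literature; it is this constraint that makes unpaired field-to-field translation possible.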
Agricultural tasks often have intrinsic structural connections, such as crop segmentation and disease detection, or fruit localization and maturity assessment. Multi-task transfer learning can exploit these connections by sharing model parameters and training processes, using information from related tasks to assist the main task while performing domain transfer. Kim and Park [63] developed an innovative framework that jointly performs weed segmentation and crop classification, achieving collaborative task optimization across multiple crop scenarios; this not only improves the accuracy of domain transfer but also enhances the robustness and practicality of the overall model.
Transformer models have a natural advantage in cross-scenario modeling due to their strong global modeling capabilities and attention mechanisms. Compared with traditional CNNs, the Transformer is better suited for handling long-range dependent structures in crop images (such as vine plants, leaf distribution, etc.), and when combined with domain adaptive mechanisms (such as cross-domain attention, domain memory modules), it can effectively enhance the model’s transfer robustness and semantic expression ability.

3.3.3. Trends in Feature Extraction Techniques

With the continuous integration of deep learning, computer vision, and agricultural intelligence, future trends in feature extraction will exhibit characteristics such as multidimensionality, intelligence, and cross-modality.
Fusion feature extraction methods based on the Vision Transformer (ViT) and multi-scale modeling are becoming mainstream. Traditional convolutional neural networks (CNNs) are limited when processing crop images with complex backgrounds and significant scale variations because of their local perception characteristics. In contrast, the Transformer architecture, with its powerful global modeling capabilities, can effectively extract long-range dependent features and is particularly suitable for tasks such as fruit recognition, lesion detection, and crop segmentation. Future research will focus more on integrating the local feature extraction advantages of CNNs with the global attention mechanism of Transformers to build hybrid architectures that extract more comprehensive, multi-scale, and robust image representations. For example, the Swin Transformer [64] achieves efficient computation through a local window attention mechanism and has shown excellent performance in high-resolution agricultural image analysis.
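A minimal sketch of such a hybrid extractor is shown below, assuming a small CNN stem for local features and a Transformer encoder for global context; all dimensions are illustrative choices, not taken from the cited works.

```python
import torch
import torch.nn as nn

class HybridExtractor(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 8, depth: int = 4):
        super().__init__()
        # CNN stem: captures local texture (e.g., lesion edges) and downsamples 8x
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 3, stride=2, padding=1),
        )
        # Transformer encoder: models long-range dependencies between patches
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.stem(x)                       # (B, C, H/8, W/8)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, C) patch tokens
        return self.encoder(tokens)                # globally contextualized features

tokens = HybridExtractor()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 784, 256])
```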
Cross-modal feature extraction will become a key research direction, as agricultural image processing tasks often require the fusion of multi-source information such as visible-light images, infrared thermal imaging, hyperspectral images, and LiDAR point cloud data. Future feature extraction techniques will break through the limitations of single-modality analysis and move towards multi-modal perception fusion. By designing a unified deep feature encoder, it is possible to extract aligned, semantically consistent representations from multimodal information, thereby improving the accuracy of tasks such as pest and disease detection and crop yield estimation. Vision-Language Models (VLMs, such as CLIP) help combine image and agricultural text description information for joint analysis, enabling smarter agricultural situation recognition.
The application of domain adaptation and transfer learning in agricultural scenarios will increase. Agricultural image acquisition often produces significant differences between source and target domains due to variations in environment, time, and equipment, affecting the generalization ability of the model. As a result, feature extraction techniques will employ more domain-adaptive strategies, such as adversarial training, style transfer, and feature alignment, to enhance the model's ability to transfer across different regions and crop species. Khare et al. [65], for example, used YOLOv8 for leaf detection and segmentation and generated more realistic minority-category samples in the detected regions through Neural Style Transfer (NST), effectively improving the model's classification and detection performance in real environments.
Lightweight and edge-deployable feature extraction models will receive increasing attention. Given that agricultural applications are mostly deployed on edge devices such as drones, agricultural robots, and smart cameras, the operational efficiency and power consumption of models have become key requirements. Future feature extraction techniques will focus on compressing and accelerating model structures, adopting lightweight architectures like MobileNet, EfficientNet, and ShuffleNet and combining techniques like knowledge distillation, network pruning, and model quantization to achieve efficient deployment while preserving feature extraction capability.

4. Technical Challenges in Agricultural Image Processing: Data Challenges

4.1. Problems with Data

According to our bibliometric analysis, dataset-related challenges represent one of the most extensively examined topics within the domain of agricultural image processing. Specifically, keywords such as data annotation, class imbalance, and data scarcity were identified in more than 35% of the 2156 publications reviewed, indicating an increasing recognition of data-related limitations as a fundamental obstacle to achieving high model performance and real-world deployment.
The progress of agricultural image processing is critically reliant on the availability of high-quality datasets. Nevertheless, the creation of datasets tailored to agricultural scenarios presents distinctive and complex difficulties, particularly when compared to general computer vision applications. These challenges not only hinder the performance of deep learning models but have also emerged as a key limiting factor in the development of intelligent agricultural systems. This section provides a detailed examination of these data-related technical challenges from three critical perspectives: data annotation, class imbalance, and data scarcity.

4.1.1. Data Annotation Challenges

The annotation of agricultural image data is the foundation for building high-quality datasets, but it is much more complex than in general visual tasks. Agricultural image annotation requires deep domain expertise. Take crop disease identification as an example: different diseases have very similar visual features in their early stages. For instance, early blight and late blight of tomatoes both present as brown spots on the leaves during the early stage of the disease, and it is difficult to distinguish them accurately from image features alone. Annotators therefore need not only knowledge of image processing but also expertise in plant pathology, entomology, agronomy, and related fields, resulting in a scarcity of qualified annotators and high annotation costs.
Secondly, the granularity requirements for annotation in agricultural scenarios are extremely high. In crop phenotypic analysis, it is necessary to precisely label the fine texture of leaves, the maturity gradient of fruit, the extent of lesion spread, and similar attributes. In grape maturity detection, multiple sub-stages such as the green fruit, color transition, and maturity stages must be distinguished; the transitions between stages are continuous and the boundaries blurred, which introduces subjectivity and inconsistency into the labeling. At the same time, occlusion and overlap are common in densely planted environments, making it technically challenging to accurately delineate the boundaries of partially occluded fruits or leaves.
Another significant challenge is the low efficiency of annotation. Agricultural images often contain a large number of targets; for example, an orchard image may contain hundreds of fruits, and a farmland image may contain thousands of plants. The traditional one-by-one annotation method takes a great deal of time, while existing semi-automatic annotation tools perform poorly on the complex backgrounds and similar-looking targets of agricultural scenes. In addition, there is a conflict between the timeliness requirements of agricultural production and the time cost of annotation: by the time large-scale data annotation is completed, the growth stage of the crops has changed, reducing the timeliness value of the data. Quality control of annotation is also challenging. Because of the complexity of agricultural images and the varying expertise of annotators, consistency in annotation results is difficult to guarantee. Table 3 presents a comparative analysis of the major annotation methods.

4.1.2. Data Imbalance Challenge

Data imbalance is a prevalent issue in agricultural image processing and manifests on multiple levels. Category imbalance is the most direct manifestation. In crop pest and disease identification tasks, image samples of common diseases are readily available, while samples of rare or emerging diseases are extremely scarce. Take rice diseases as an example: rice blast, a common disease, may account for more than 60% of the data, while relatively rare diseases such as rice warp and seedling blight may account for less than 5%. This extreme class imbalance biases the model towards common categories, leaving it insufficiently able to identify rare but potentially damaging diseases.
The imbalance in temporal and spatial distribution adds to the complexity of dataset construction. Agricultural production has distinct seasonal and regional characteristics. Some pests and diseases occur only in specific seasons or climatic conditions, resulting in a very short window for collecting relevant data. For example, wheat scab mainly occurs during the flowering and heading periods under continuously rainy weather, creating significant data acquisition challenges. Regional differences also lead to data imbalance: different varieties, planting methods, and climatic conditions cause the same crop to exhibit different visual characteristics across regions, so data collected from a single region cannot represent the overall distribution.
The imbalance across growth stages is a problem specific to agricultural datasets. The morphology of crops varies greatly at different growth stages, but data collection often focuses on the easily observable mature stage, while data on key growth stages such as seedling and flowering are relatively scarce. This imbalance leads to poor performance when monitoring the entire growth period of a crop. At the same time, there is a serious imbalance between normal and abnormal samples: in actual production, healthy crops make up the vast majority and pests and diseases occur relatively rarely, posing a challenge for anomaly detection models.
The imbalance in scene complexity should not be overlooked either. Images with simple backgrounds and ideal lighting are easy to obtain, while those with complex backgrounds or captured in bad weather are difficult to collect. However, agricultural production environments are complex and changeable, and models need to maintain stable performance under all of these conditions. This distribution gap between training data and actual application scenarios seriously affects the generalization ability of the model.

4.1.3. Data Scarcity Challenge

Data scarcity remains a fundamental constraint in agricultural image processing technology. Compared with general-vision datasets containing millions of images, specialized agricultural datasets are generally much smaller. There are multiple reasons for this. First, agricultural data collection is subject to strict temporal and spatial constraints. Crops have long growth cycles, and some key phenotypic traits appear only briefly during specific growth periods; missing the collection window means waiting until the next growing season. For example, the heading period of corn tassels lasts only 7–10 days, and phenotypic data from this stage is crucial for yield prediction, but the collection window is extremely limited.
The scarcity of data for rare events is particularly prominent. While some devastating pests and diseases occur less frequently, they cause huge losses once they do. Early symptom images of quarantine pests such as citrus tristeza virus and banana wilt are extremely scarce, but these data are crucial for establishing early warning systems. The constant emergence of new varieties and pests and diseases also brings about the “cold start” problem, and the lack of historical data support makes it difficult to establish relevant identification models.
The high cost of data collection has further exacerbated the scarcity. Agricultural image data collection requires going deep into the fields and faces problems such as inconvenient transportation, high equipment requirements and high labor costs. High-quality agricultural images often require specialized equipment such as multispectral cameras, thermal imagers, drones, etc. The cost of the equipment and operational thresholds limit the scale of data collection. Furthermore, the decentralized nature of agricultural production requires data collection across different plots and among different farmers, and coordination costs are huge.

4.2. Data-Centric Frontier Solutions

In response to dataset challenges in agricultural image processing, researchers have proposed innovative solutions in recent years that not only alleviate problems such as data annotation, imbalance, and scarcity, but also promote the rapid development of agricultural intelligence.

4.2.1. Data Annotation Technology

Weakly supervised learning significantly reduces annotation costs by using incomplete or coarse-grained labels. Image-level label-based methods only require coarse-grained labels such as “diseased” or “disease-free” to automatically infer diseased areas through Class Activation Mapping (CAM) and its variant techniques. For example, in tomato disease detection, a weakly supervised model based on Grad-CAM achieved a 90% reduction in labeling costs while maintaining high detection accuracy [66]. This advantage was further demonstrated by the WSLSS framework developed by Zhou et al. [67], which combines Grad-CAM with limited pixel-level annotations in U-Net pre-training, enabling precise lesion segmentation with image-level annotations alone and even surpassing the generalization ability of fully supervised methods when dealing with disease types not seen in training.
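As a concrete illustration of the CAM-family localization these methods rely on, below is a minimal Grad-CAM sketch in PyTorch; the backbone (ResNet18), target layer (layer4), and input are stand-in assumptions, not the configurations of the cited works.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats, grads = {}, {}

def fwd_hook(m, i, o): feats["v"] = o
def bwd_hook(m, gi, go): grads["v"] = go[0]

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

img = torch.randn(1, 3, 224, 224)           # stand-in for a leaf image
logits = model(img)
logits[0, logits.argmax()].backward()        # gradient of the predicted class score

# Channel-wise weights = global-average-pooled gradients
w = grads["v"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * feats["v"]).sum(dim=1))    # weighted sum of feature maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
print(cam.shape)  # (1, 1, 224, 224) heatmap of class-discriminative regions
```

Trained only with image-level "diseased/healthy" labels, such a heatmap already localizes lesion regions, which is the basis of the annotation-cost reductions reported above.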
Semi-supervised learning makes full use of a small amount of labeled data and a large amount of unlabeled data, significantly reducing the demand for labeling while preserving performance. Pseudo-labeling is one of its core methods: an initial model is trained to generate pseudo-labels on unlabeled data, and high-confidence pseudo-labeled data is then added to the training set to iteratively improve model performance. Ciarfuglia et al. [68] effectively addressed the covariate shift between source and target domains in grape detection by combining weak bounding-box labeling with a structure-from-motion algorithm to generate pseudo-labels from video, increasing target-domain detection mAP@0.5 from 0.69 to 0.77. Classic methods like self-training and co-training have also been innovatively applied in agricultural image processing. Huang et al. [69] proposed a greedy pseudo-label strategy for weed segmentation, optimizing the label incorporation ratio through covariance analysis to prevent overfitting, and ultimately performed well in cross-domain scenarios. Consistency regularization is based on the assumption that different perturbations of the same sample should produce consistent predictions, and methods such as Mean Teacher and FixMatch have achieved near fully supervised performance on agricultural datasets. Liu et al. [70] applied semi-supervised learning to lawn weed detection and compared three methods: the Π-model, Mean Teacher, and FixMatch. Their results showed that just 50 labeled samples could achieve detection accuracy above 0.9530, an improvement of 3.9% over the fully supervised method.
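A minimal sketch of the pseudo-labeling loop described above is given below, assuming an already trained `model` and a DataLoader over unlabeled images; the 0.9 confidence threshold is an illustrative choice.

```python
import torch

def collect_pseudo_labels(model, unlabeled_loader, threshold=0.9, device="cpu"):
    model.eval()
    images, labels = [], []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = torch.softmax(model(x.to(device)), dim=1)
            conf, pred = probs.max(dim=1)
            keep = conf >= threshold          # keep only high-confidence predictions
            images.append(x[keep.cpu()])
            labels.append(pred[keep].cpu())
    return torch.cat(images), torch.cat(labels)

# The returned (image, label) pairs are merged into the labeled set and the
# model is retrained, iterating until performance plateaus.
```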
Self-supervised learning does not rely on human labeling at all, learning feature representations from the data itself through designed proxy tasks. Contrastive learning methods, which learn semantic features by maximizing the similarity between different augmented versions of the same image, have shown great potential in agricultural applications. The orchid growth inspection system proposed by Chen et al. [71] uses self-supervised learning to train a Siamese deep convolutional neural network, generating clusterable embeddings from randomly augmented image pairs and achieving 98.6% accuracy in orchid identification. In leaf disease identification, the self-supervised clustering method developed by Monowar et al. [72] uses the AutoEmbedder architecture and introduces mix-link and strong can-link strategies, training the embedding network through three data-link scenarios: can-link, cannot-link, and may-link. In addition, the Bootstrap Your Own Latent (BYOL) self-supervised learning method has performed well in agricultural pest classification tasks: Kar et al. [73] found that using segmented images as BYOL input could achieve a classification accuracy of 94%, demonstrating the effectiveness of self-supervised learning in reducing dependency on labels. The key advantage of these methods lies in the end-to-end training process, which eliminates the need for pre-trained weights and maintains stable performance even on imbalanced data.
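As one concrete instance of such contrastive objectives, the following is a minimal SimCLR-style NT-Xent loss sketch (not the exact objective of the cited systems); batch size, embedding dimension, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1 and z2 are embeddings of two augmentations of the same image batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)     # (2N, D) unit vectors
    sim = z @ z.t() / tau                            # temperature-scaled cosine sims
    sim.fill_diagonal_(float("-inf"))                # exclude self-similarity
    n = z1.size(0)
    # Positive of view i is view i+n (and vice versa); all others are negatives
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```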
Unsupervised learning provides valuable information by uncovering the intrinsic structure of the data. The unsupervised machine learning technique developed by Silva et al. [74] successfully identified hidden patterns in agricultural supply chain data through k-means clustering and Self-Organizing Maps (SOMs), demonstrating the effectiveness of unsupervised methods for agricultural data analysis. The improved Kohonen self-organizing map proposed by Belattar et al. [75], combined with the Gram-Schmidt algorithm, achieved a classification accuracy of 100% on correlated data, a significant improvement over the 87.5% accuracy of the conventional SOM-PCA method. In agricultural data visualization and cluster analysis, Badapanda et al. [76] effectively reduced dimensionality and visualized agricultural datasets using k-means clustering combined with principal component analysis (PCA), and successfully identified four distinct crop clusters through distribution plots (distplot) combined with Kernel Density Estimation (KDE). These unsupervised learning methods provide important data insights and pattern recognition capabilities for agricultural decision support systems while reducing the workload of manual annotation.
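A minimal scikit-learn sketch of the PCA-plus-k-means workflow described above is given below; the feature matrix is a random stand-in, and the choice of four clusters mirrors the reported result but is an assumption here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 20)                 # stand-in per-field agronomic features

X2 = PCA(n_components=2).fit_transform(X)   # reduce to 2D for visualization
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X2)
print(np.bincount(clusters))                # samples per discovered crop cluster
```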

4.2.2. Data Imbalance Processing Technology

The category distribution of agricultural image data exhibits a severe long-tail pattern, with an abundance of common crop disease samples and an extreme scarcity of rare disease samples. This imbalance stems from the natural law of disease occurrence and the realistic constraints of collection costs. Resampling techniques alleviate the imbalance by adjusting the class distribution of the training data; their advantage lies in improving the model's ability to learn minority classes without modifying the underlying learning algorithm. Basic resampling includes two strategies: random oversampling and random under-sampling. Conventional random oversampling is simple and effective but prone to overfitting because it directly replicates minority-class samples, while random under-sampling achieves balance by randomly deleting majority-class samples at the risk of discarding important information about the majority class. Weller et al. [77] compared the performance of these basic methods with SMOTE (Synthetic Minority Over-sampling Technique) in predicting pathogens in agricultural water quality and found clear limitations in the basic resampling methods. To overcome these limitations, SMOTE employs a different strategy: instead of simply replicating existing samples, it generates entirely new synthetic samples by interpolating between minority-class samples in the feature space, thereby avoiding the overfitting risk associated with direct replication. To address the high dimensionality and complex spatial structure of agricultural images, researchers have proposed multiple SMOTE variants. Borderline-SMOTE focuses sample generation on class-boundary regions, which is particularly important for identifying early disease symptoms, as early disease features often lie on the blurred boundary between health and disease. Feng et al. [78] successfully applied Borderline-SMOTE to evaluating technical efficiency in agricultural production, demonstrating its superiority in handling boundary samples. ADASYN adaptively generates different numbers of synthetic samples based on local sample density and performs well on field images with unevenly distributed disease. Sovia et al. [79] employed ADASYN for a multi-class cabbage pest classification task and achieved 97% accuracy on severely imbalanced data. Although SMOTE and its variants have been extensively investigated in structured or tabular data domains, their direct applicability to image-based tasks remains limited and typically requires first transforming images into meaningful feature representations. In agricultural image processing, this commonly entails extracting feature embeddings with pretrained neural networks (e.g., ResNet, VGG) and then applying SMOTE within the latent feature space. However, such methods may fail to preserve the spatial coherence of visual patterns, an essential factor in tasks such as disease spot detection and leaf segmentation. Consequently, SMOTE should be regarded as a supplementary technique rather than a primary solution for addressing class imbalance in image datasets.
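The following hedged sketch illustrates the embedding-space SMOTE workflow just described: features are first extracted with a pretrained CNN and SMOTE is then applied in that latent space; the backbone choice, class counts, and random inputs are illustrative assumptions.

```python
import numpy as np
import torch
from torchvision.models import resnet18
from imblearn.over_sampling import SMOTE

backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()           # expose 512-d embeddings
backbone.eval()

images = torch.randn(100, 3, 224, 224)      # stand-in imbalanced image set
labels = np.array([0] * 90 + [1] * 10)      # 90 common vs. 10 rare disease samples

with torch.no_grad():
    emb = backbone(images).numpy()          # (100, 512) latent features

emb_bal, labels_bal = SMOTE(random_state=0).fit_resample(emb, labels)
print(emb_bal.shape)                         # (180, 512): minority interpolated
# A classifier (e.g., an SVM) is then trained on the balanced embeddings; the
# synthetic points exist only in feature space, never as real images.
```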
In practical implementations, data augmentation frameworks such as ImageDataGenerator (Keras), torchvision.transforms (PyTorch), and Albumentations are widely adopted for generating diverse, label-preserving, and reproducible image transformations, and should therefore be prioritized in image-based agricultural applications.
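For illustration, a minimal Albumentations pipeline might look as follows; the specific transforms and probabilities are assumptions chosen to mimic field conditions, not a recommended recipe.

```python
import numpy as np
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),               # field rows have no fixed left/right
    A.RandomBrightnessContrast(p=0.5),     # simulates changing daylight
    A.Rotate(limit=15, p=0.5),             # small tilts, agronomically plausible
    A.GaussNoise(p=0.2),                   # sensor noise on low-cost cameras
])

leaf = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in image
augmented = transform(image=leaf)["image"]  # label-preserving variant
```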
The design of the loss function directly affects the optimization direction of the model. The standard cross-entropy loss assigns the same weight to all samples, causing the model to bias towards the majority class in order to reduce overall loss. Reweighted loss functions balance each class's contribution to the total loss by assigning higher weights to minority classes, but static weights adapt poorly to dynamic training variations. Focal Loss dynamically reduces the loss contribution of easily classified samples by introducing a modulating factor, allowing the model to focus on hard-to-classify samples. This mechanism is particularly suitable for agricultural scenarios where disease symptoms are similar but the categories differ, as verified by Liu et al. in agricultural detection tasks such as corn damage detection [80]. Building on this, Focal EIoU Loss combines the hard-sample focus of Focal Loss with the geometric optimization of EIoU; by introducing a width-height loss to accelerate convergence and reduce the impact of low-quality samples on gradients, it has proven effective in complex agricultural environments such as unharvested aquatic vegetables [81].
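A minimal multi-class Focal Loss sketch following the standard formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t) is shown below; gamma = 2 and alpha = 0.25 are common defaults assumed here, not values from the cited studies.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()                           # probability of the true class
    # (1 - pt)^gamma shrinks the loss of easy, confidently classified samples
    return (-alpha * (1 - pt) ** gamma * log_pt).mean()

logits = torch.randn(4, 10)                     # e.g., 10 disease classes
targets = torch.tensor([0, 3, 7, 1])
print(focal_loss(logits, targets))
```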
Meta-learning and few-shot learning address the class imbalance problem in agricultural image processing from the perspective of learning mechanisms, with the core aim of enabling models to adapt quickly to new tasks using minimal samples. Prototypical Networks, which learn a metric space that aggregates samples of the same class and separates samples of different classes [82], perform well in agricultural disease recognition and are particularly suited to capturing the typical visual patterns of disease symptoms [83]. Memory-augmented meta-learning methods store historical disease cases in external memory modules and, when encountering similar symptoms, quickly retrieve and compare them to emulate the expert diagnosis process [84]. Multimodal meta-learning frameworks further integrate spectral, thermal imaging, and environmental data, achieving significant performance improvements in crop physiological state monitoring [85]. These methods effectively alleviate the practical challenges posed by the frequent emergence of new diseases and varieties in agriculture and the scarcity of labeled samples.
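To ground the idea, below is a minimal Prototypical Network episode sketch: class prototypes are the mean embeddings of support samples, and queries are scored by negative squared Euclidean distance. The embedding function is assumed to exist upstream; shapes and shot counts are illustrative.

```python
import torch

def proto_classify(support, support_labels, queries, n_classes):
    # One prototype per class: the mean of that class's support embeddings
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_classes)])
    # Negative squared Euclidean distance serves as the classification logit
    return -torch.cdist(queries, protos) ** 2

support = torch.randn(10, 64)                    # 5 classes x 2 shots, embedded
labels = torch.arange(5).repeat_interleave(2)
queries = torch.randn(3, 64)
pred = proto_classify(support, labels, queries, 5).argmax(dim=1)
```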

4.2.3. Data Scarcity Response Techniques

Agricultural image data acquisition is constrained by factors such as seasonal growth cycles, geographical distribution limitations, and high annotation costs, making it difficult for deep learning models to obtain sufficient training data. Transfer learning can significantly reduce the target task's demand for labeled data by leveraging pre-trained knowledge representations from large-scale general-purpose visual datasets. The advantage of this approach is that the pre-trained model has already learned underlying visual features such as edges, textures, and shapes, which are equally applicable to agricultural images. Zhang et al. [5] achieved 97.81% accuracy in identifying tea varieties using an ImageNet-pretrained DenseNet201 model fine-tuned on only 1000 tea images, whereas training a model of the same performance from scratch would require at least ten times as much data. Yang et al. [86] achieved 98.6% accuracy in chicken starch detection using a pre-trained GoogLeNet model, demonstrating the effectiveness of pre-trained models in the agricultural field. However, there are significant domain differences between general visual datasets and agricultural datasets: objects in natural images typically have clear boundaries and stable morphology, while crops in agricultural images show high variability with growth stage, disease severity, and environmental conditions. To address this challenge, progressive transfer learning bridges the source and target domains by constructing intermediate domains. The PLAFTSL method proposed by Upreti et al. [87] for rice disease identification employs a staged layer freezing-and-thawing strategy, first freezing all layers of ResNet50 to train only the last layer and then gradually thawing more layers for fine-tuning; a progressive loss-aware mechanism transfers loss information between stages, ultimately achieving an accuracy of 98.32%. This progressive strategy avoids the catastrophic forgetting caused by direct fine-tuning while retaining useful common features. Cross-crop transfer learning further exploits knowledge-sharing mechanisms within agriculture, enabling more efficient transfer based on the morphological and pathological similarities among crops. Bosilj et al. [88] demonstrated in a crop-weed segmentation task that transferring models among sugar beet, carrot, and onion crops could reduce the training time for new crops by 80%, significantly lowering annotation requirements; even when simplified region-level annotations were used instead of precise pixel-level annotations, the performance loss was only 2%.
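A hedged sketch of such a staged freezing-and-thawing schedule is shown below; the stage boundaries, training durations, and class count are illustrative assumptions, not the published PLAFTSL configuration.

```python
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2")
model.fc = nn.Linear(model.fc.in_features, 10)   # e.g., 10 rice disease classes

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: freeze the whole backbone, train only the new classification head
set_trainable(model, False)
set_trainable(model.fc, True)
# ... train a few epochs ...

# Stage 2: additionally thaw the last residual block for gentle adaptation
set_trainable(model.layer4, True)
# ... continue training with a lower learning rate ...

# Stage 3: thaw everything for a final low-learning-rate fine-tune
set_trainable(model, True)
```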
Data augmentation techniques address data scarcity from another perspective: the core idea is to simulate real-world variations in the data distribution by applying reasonable transformations to existing samples, thereby expanding the training set without increasing acquisition cost. Compared with other fields, agricultural images exhibit unique variability, such as the large morphological differences of a crop across growth stages, the continuous change of disease symptoms as they develop, and the strong fluctuations of field lighting with time and weather. Traditional data augmentation methods can partially simulate these variations through geometric transformations (rotation, flipping, scaling) and color transformations (brightness, contrast adjustment) [38,89,90]. However, naive random transformations may produce images that violate agronomic laws; for example, excessive rotation may depict crops growing in physically impossible postures. To address this issue, researchers have developed augmentation strategies based on agronomic knowledge. Lu et al. [91] proposed a salient-feature information equalization enhancement method based on geometric position to address the imbalance between greenhouse tomato flowering and fruiting stages; the method designs enhancement strategies based on the actual distribution of tomato fruits on the plant to ensure that generated images conform to common agronomic knowledge. These methods generate diverse and representative training samples from limited raw data by simulating phenomena such as lighting variation, crop shading, and dew coverage in real farmland environments. Hybrid augmentation strategies enhance training data diversity by blending pairs of samples rather than transforming single images in isolation. Li et al. [92] combined the MixUp and CutMix methods with the ConvMixer architecture for corn leaf disease classification, creating intermediate samples through linear interpolation or region substitution and enhancing the model's understanding of disease development. With MixUp and CutMix, model accuracy increased to 99.25% and 99.32%, respectively.
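A minimal MixUp sketch is shown below; the Beta parameter alpha = 0.2 and the batch contents are illustrative assumptions, and CutMix follows the same pattern with a pasted rectangular region in place of global interpolation.

```python
import numpy as np
import torch

def mixup(x, y_onehot, alpha=0.2):
    lam = np.random.beta(alpha, alpha)               # mixing coefficient
    perm = torch.randperm(x.size(0))                  # random pairing of samples
    x_mixed = lam * x + (1 - lam) * x[perm]           # interpolate images
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]  # interpolate labels
    return x_mixed, y_mixed

x = torch.randn(8, 3, 224, 224)                       # batch of leaf images
y = torch.nn.functional.one_hot(torch.randint(0, 4, (8,)), 4).float()
x_mixed, y_mixed = mixup(x, y)
```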

4.3. Future Technological Developments in Terms of Data

Generative models address data scarcity by learning the underlying distribution of real data to synthesize entirely new training samples, thus breaking through the limitations of traditional methods that rely on transforming existing data. The key advantage of generative models lies in their ability to capture complex patterns and subtle features in agricultural images, including the irregular shape of disease spots, the fine texture of leaf veins, the gradient color of fruits, and other features that are difficult to reproduce through regular transformations. Variational Autoencoders (VAEs) generate agricultural images with continuous variations by learning a latent representation space of the data. Wu et al. [93] proposed an Adversarial-VAE image generation method for tomato leaf diseases, which solves the blurring of images generated by traditional VAEs by introducing an adversarial training mechanism into the VAE framework; the disease images generated on the PlantVillage dataset surpassed InfoGAN, WAE, and other methods in visual quality and diversity, with significantly improved FID scores. Conditional variants of Generative Adversarial Networks (GANs) have shown strong domain transformation capabilities in agricultural image generation, especially for the asynchrony of multimodal data collection. Şener et al. [94] applied CycleGAN to feature transformation between Sentinel-1 radar data and Sentinel-2 optical data, ensuring the reversibility of the transformation through cycle-consistency constraints; this successfully addressed temporal data loss in optical remote sensing due to cloud and fog obscuration, making all-weather crop monitoring possible. The StyleGAN series of models achieves precise control over the properties of generated images through decoupled latent space design. Huo et al. [95] used StyleGAN3 to generate high-quality synthetic images of tomatoes at different growth stages; by manipulating the latent vector, they could precisely control the maturity, size, color, and other attributes of the generated tomatoes, and the generated images achieved a PSNR of 28–39 dB. Combined with a Vision Transformer model, this yielded 98.39% accuracy in tomato growth stage recognition. Diffusion models, as the latest generation of generative models, achieve unprecedented generation quality and controllability through a progressive denoising process. Mullins et al. [96] augmented data on wild blueberries, fescue grass, and red leaf disease using the image-variant generation function of DALL-E 2, which can generate images of specific disease severities and environmental conditions from text descriptions; the combined dataset achieved 0.834 mAP50 in wild blueberry detection, significantly better than the 0.806 of the purely real dataset. Moreno et al. [97] leveraged Stable Diffusion for a weed detection task, controlling weed type, growth stage, and distribution density with carefully designed text prompts and generating a large amount of high-quality training data from only 30 real reference images; their mixed dataset achieved a mAP of 0.93, an improvement of 6–9% over the model trained only on real images.
With the rapid development of artificial intelligence technology and the advancement of agricultural modernization, there will be new opportunities for agricultural image processing in dataset construction. This section will look forward to future technological trends from three dimensions—data annotation, data imbalance, and data scarcity—providing forward-looking insights for research and applications in this field.

4.3.1. Trends in Data Annotation Technology

Large-scale pre-trained models will revolutionize agricultural image annotation. With the development of multimodal large models such as CLIP and GPT-4V, zero-shot and few-shot annotation is becoming a mainstream trend. The E-CLIP model proposed by Zhang et al. [98] achieved an F1 score of 0.752 in agricultural fruit detection, demonstrating the great potential of vision-language models for intelligent annotation driven by natural language instructions. Future annotation systems will understand descriptions phrased in professional terms: agricultural experts need only input simple instructions such as "label all ripe red apples", and the system will complete precise annotation automatically, substantially lowering the annotation threshold and improving efficiency.
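A hedged sketch of such instruction-driven pre-annotation with CLIP, via the Hugging Face transformers API, is given below; the checkpoint name and prompt phrasing are assumptions, and in practice the output would be treated as a candidate label for expert review.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a ripe red apple", "an unripe green apple", "a diseased apple leaf"]
image = Image.new("RGB", (224, 224))          # stand-in for an orchard photo

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)
label = prompts[probs.argmax().item()]        # candidate annotation for review
```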
Human–machine collaborative annotation represents an important development direction for addressing the challenges of agricultural image data annotation. Robinson et al. [99] demonstrated the effectiveness of human–machine collaborative systems in a land cover mapping study: through a real-time interactive feedback mechanism, human annotators could intelligently select annotation points based on model predictions and, within 15 min, achieve annotation results significantly better than traditional active learning methods. Future agricultural image annotation platforms will integrate technologies such as augmented and virtual reality to create immersive annotation environments, and will provide personalized annotation suggestions by learning expert preferences through intelligent reasoning systems, thereby effectively alleviating core issues such as expert scarcity and annotation inefficiency.
Knowledge-driven automatic annotation will become an important development direction. Deepa et al. [100] proposed an agricultural knowledge extraction method based on a Jaccard relative extractor and the Naive Bayes algorithm for automatically constructing agricultural ontologies. The method automatically identifies term similarities in agricultural documents and establishes relation mappings with an accuracy of 94.40%, demonstrating the great potential of automated knowledge extraction in agriculture. Future agricultural image annotation systems will deeply integrate knowledge graphs, crop growth models, and pathology databases to enable intelligent annotation based on causal reasoning. Unlike traditional annotation that relies on expert experience, knowledge-driven systems can both identify current disease symptoms and predict disease development trends from multi-dimensional agricultural knowledge such as soil characteristics, climate conditions, and pest and disease characteristics, providing temporal annotation information. Cross-modal knowledge fusion will enable systems to comprehensively utilize multi-source information such as images, meteorology, soil, and historical planting records, achieving more accurate and richer intelligent annotation through ontological reasoning, thereby effectively addressing the key problems of scarce domain knowledge and inconsistent annotation in agricultural images.

4.3.2. Trends in Data Imbalance Processing Technology

Adaptive balanced learning will become the mainstream paradigm in agricultural image processing for addressing the challenge of data imbalance. Future deep learning models will have the ability to dynamically perceive the distribution of data and automatically adjust their learning strategies during training to address the imbalance problem. Meta-learning-based adaptive algorithms show great potential. The task-adaptive meta-learning framework proposed by Liu et al. [101] achieves effective adaptation to heterogeneous tasks by constructing an easy-to-difficult task hierarchy, achieving remarkable results in agricultural spatiotemporal data analysis. These methods can quickly identify changes in the distribution of agricultural data and adjust the model structures, loss functions, and optimization strategies accordingly, effectively addressing the performance degradation problem of traditional methods when dealing with a large number of heterogeneous tasks. Neural Architecture Search (NAS) technology will be widely used to automatically design network structures suitable for imbalanced agricultural data, allocating differentiated computational resources and representation capabilities for different categories. The AgriNAS framework developed by Omole et al. [102] demonstrated the feasibility of this trend, which achieved a classification accuracy of 98% in soybean pest and disease detection through an adaptive convolutional architecture and spatiotemporal enhancement methods. In the future, these adaptive techniques will further integrate emerging paradigms such as reinforcement learning and federated learning to create more intelligent and personalized agricultural image processing solutions.
The multi-granularity balancing strategy will become an important development trend to solve the problem of data imbalance in agricultural image processing. Traditional data balancing methods mainly focus on sample distribution balancing at the category level, while future methods will expand to collaborative balancing at multi-granularity levels. The multi-granularity alignment method, as proposed by Guo et al. [103], provides a technical basis by learning features at three levels: pixel, instance, and category. Future developments will further deepen this idea, achieving balance not only at the data distribution level but also at the feature representation and semantic understanding levels. Hierarchical balanced learning strategies will make full use of the natural classification system of agricultural data (such as family-genus-species-variety-disease severity), and implement differentiated balanced strategies at different classification levels, ensuring the overall distribution balance while preserving subcategory relationships. In addition, the fine-grained importance weighting mechanism will combine the actual needs of agricultural production and dynamically adjust sample weights based on the potential impact of different diseases on crop yield and economic benefits, enabling the model to focus more on rare but critical disease types that may cause significant agricultural losses, thus achieving a shift from technology-oriented to application-oriented balancing strategies.
Causal reasoning will offer a fresh perspective and solution to the problem of data imbalance in agricultural image processing. While traditional data balancing methods focus only on adjusting the sample distribution at a statistical level, causal reasoning starts from the essence of the system, constructing causal graphs of agricultural systems to identify the root causes of data imbalance and designing more targeted intervention strategies. The causal reasoning framework proposed by Tsoumas et al. [104] identifies and quantifies the causal effects of various factors on agricultural output by establishing a causal structure model of agricultural systems, providing a theoretical basis for data balancing strategies. Counterfactual reasoning techniques will emerge as an important tool for data augmentation, effectively supplementing rare samples that are difficult to observe in reality by generating scenario data describing "what would happen under different conditions", especially for disease types that occur only under specific environmental conditions. Causal representation learning will guide the model to learn essential features unaffected by changes in data distribution, improving recognition of minority-class samples by identifying stable causal relationships and avoiding excessive reliance on biased features in the dataset. In addition, structural causal models will reshape data acquisition strategies by quantifying the value different data contribute to understanding causality, guiding researchers to prioritize collecting the samples most crucial for revealing the underlying mechanisms of agricultural systems, thereby optimizing data quality and balance under limited resources.

4.3.3. Trends in Data Scarcity Response Technology

With the rapid development of multimodal data fusion technology, agricultural image processing shows broad prospects for addressing data scarcity. Multimodal data fusion can significantly improve the accuracy and reliability of agricultural monitoring under limited data conditions by integrating complementary information from different types of sensors. Peng et al. [105] demonstrated this by combining the multispectral, visible-light, and thermal infrared sensors carried by UAVs, using multi-dimensional information such as band reflectance, canopy coverage, and temperature to predict the moisture content of grape leaves. Compared with single-modal data, multimodal fusion raised the R2 of the Random Forest algorithm above 0.69 while keeping the root mean square error within 2.5%. This fusion strategy effectively alleviates the scarcity of labeled data in agricultural image processing by fully exploiting the complementary characteristics of different modalities, laying the foundation for more robust and accurate agricultural monitoring systems. In the future, with advances in sensor technology and fusion algorithms, multimodal data fusion will become an important technical approach to the data scarcity challenge, achieving high-precision agricultural monitoring and intelligent decision-making under limited data conditions by maximizing the synergy of multiple sensor data sources.
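A hedged scikit-learn sketch of this kind of feature-level multi-sensor fusion is shown below; the features and target are random stand-ins for the band reflectance, canopy coverage, temperature, and leaf moisture variables described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

n = 300
X = np.column_stack([
    np.random.rand(n, 5),    # multispectral band reflectances
    np.random.rand(n, 1),    # canopy coverage from visible-light imagery
    np.random.rand(n, 1),    # canopy temperature from thermal infrared
])
y = np.random.rand(n)        # leaf moisture content (stand-in target)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # R^2 on held-out data
```

Simple concatenation of per-sensor features, as here, is the most common fusion baseline; attention-based fusion networks learn the cross-modal weighting instead.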
Generative AI will create nearly unlimited synthetic data. The next generation of generative models will produce high-quality agricultural images that are almost indistinguishable from real data. Physics-based rendering engines will simulate real lighting, materials, and growth processes to generate synthetic data with physical authenticity, and generative models will become controllable and interpretable, capable of precisely controlling attributes of the generated image such as disease type, severity, and environmental conditions. Three-dimensional reconstruction techniques such as Neural Radiance Fields (NeRF) will build three-dimensional crop models from a small number of images and generate training data from any perspective. For example, the SC-NeRF method proposed by Ku et al. [106], designed for high-throughput phenotypic analysis in agriculture, can reconstruct high-quality three-dimensional point cloud models from images of rotating plant objects captured with a fixed camera. This method provides a feasible technical path for constructing complete three-dimensional representations of crops from a small number of two-dimensional images. Based on this type of 3D reconstruction, researchers can generate training data from any perspective and under any lighting condition, greatly enriching the diversity and coverage of agricultural image datasets and thereby effectively alleviating the data scarcity challenge in agricultural image processing.
Few-shot and zero-shot learning techniques will become standard capabilities in agricultural image processing. Future agricultural AI systems will demonstrate strong generalization and be able to identify new types of pests and diseases from very few or even zero samples. Ragu et al.'s [107] research shows that few-shot learning has achieved remarkable results in agricultural applications, reaching over 90% accuracy in tasks such as plant disease identification and pest detection, with some studies even achieving 99.48% accuracy. Memory-based learning mechanisms will enable models to store and retrieve agricultural knowledge efficiently, supporting lifelong learning. Compositional learning will understand new phenomena by intelligently combining known visual features, such as identifying a new disease type by combining features like "brown spots", "circular shapes", and "leaf edges". Program synthesis technology will automatically generate specialized models for specific agricultural tasks, dynamically adjusting network structure to the characteristics of different crops and significantly reducing reliance on large-scale labeled datasets, making agricultural image processing systems more adaptable and practical.
Quantum machine learning technology may bring revolutionary breakthroughs to the field of agricultural image processing. The exponential computing power of quantum computing will make it possible to train complex models with extremely small datasets, fundamentally changing the reliance of traditional machine learning on large-scale data. The research by Wu et al. [108] shows that quantum machine learning has demonstrated exceptional performance in crop disease detection. Using advanced techniques such as quantum neural networks and quantum particle swarm optimization algorithms, it can process multiple data sources including images, sensor data, and spectral information simultaneously, significantly improving the accuracy and efficiency of early disease identification. The unique properties of quantum entanglement and superposition states will enable new possibilities for feature representation learning, allowing the system to extract richer and deeper information features from a single sample. Quantum-enhanced sampling techniques can explore the data space more efficiently, identify and select the most representative training samples, and overcome the challenges of data scarcity. Quantum-classical hybrid algorithms will fully combine the parallel processing advantages of quantum computing with the stability of classical computing, providing innovative solutions to the problem of scarce agricultural data, enabling real-time monitoring and predictive analysis, and ultimately promoting sustainable agricultural development and the achievement of global food security goals.

5. Challenges in Agricultural Image Processing Technology: Model Deployment Challenges

5.1. Problems in Model Deployment

According to our bibliometric analysis, the practical deployment of agricultural image processing models has emerged as a progressively significant research focus in recent years. Among the 2156 publications reviewed from 2010 to 2025, approximately 31% explicitly discussed challenges related to real-time inference, model compression, edge computing, and deployment under resource-constrained conditions. This trend underscores the growing recognition of the substantial gap between algorithmic development in controlled environments and practical model applicability in real-world agricultural scenarios. Within the dynamic, complex field of agricultural image processing, deep learning models face numerous significant challenges when transitioning from laboratory settings to large-scale production deployment. These challenges primarily stem from limited computational resources on edge devices, stringent real-time processing demands, and constraints related to model size and memory storage.
This section clarifies the fundamental issues associated with practical model implementation in agricultural applications and analyzes the current methodological approaches designed to overcome these limitations, including model pruning, quantization, knowledge distillation, and hardware-aware neural architecture search. Furthermore, it outlines prospective research directions with the potential to improve model efficiency, portability, and scalability in real-world deployment scenarios. A visual overview of the structure of this section is provided in Figure 6.

5.1.1. Edge Device Computing Power Is Limited

Agricultural production environments are frequently situated at considerable distances from centralized data centers, requiring that most image processing tasks be performed directly on edge devices deployed in the field. However, contemporary mainstream deep learning models typically feature intricate network architectures and substantial parameter counts, resulting in computational demands that vastly surpass the capabilities of edge hardware.
To illustrate, in crop pest and disease identification, deep learning models based on architectures such as ResNet-152 or VGG-19 [109] often comprise tens to hundreds of millions of parameters. In stark contrast, embedded platforms commonly utilized in agricultural applications—such as the Raspberry Pi and Jetson Nano—have severely constrained processing power and memory resources. This fundamental disparity in computational capacity renders it infeasible to deploy high-performance models, despite their success in server-side environments, directly on the front line of real-world agricultural operations.
In addition, agricultural edge devices face significant constraints related to power consumption. Field operation equipment is predominantly powered by batteries or solar energy, both of which offer only limited electrical capacity. Although high-performance GPUs offer substantial computing power, their energy requirements often reach several hundred watts, which far exceed the available power supply in typical agricultural settings. Consequently, the deployment of complex deep learning models under stringent power limitations has emerged as a critical technical challenge that demands urgent resolution.

5.1.2. The Contradiction Between Real-Time Requirements and Inference Delay

Numerous applications in the domain of agricultural image processing impose stringent requirements on real-time performance. For instance, in precision spraying systems, drones or ground-based spraying equipment must perform real-time identification of crops and weeds while operating at high speeds [110], followed by immediate adjustment of spraying strategies. Similarly, in fruit-picking robots, the vision system is required to rapidly detect ripe fruits and guide the robotic arm to execute precise picking actions [111]. Furthermore, on agricultural product grading and sorting production lines [112], it is necessary to process dozens to hundreds of items per second for quality assessment. These scenarios collectively demand that the response time of the image processing system be maintained at the millisecond level. Nevertheless, the inference latency exhibited by current deep learning models frequently falls short of meeting these rigorous real-time constraints. To illustrate, even relatively lightweight architectures like the YOLO series often require more than 100 ms for single-frame inference on edge devices—an unacceptable delay for applications involving high-frame-rate video streams. Inference delay is determined not only by model complexity but also by multiple factors, including input image resolution, batch size, and the availability of hardware acceleration; these interdependencies further complicate the optimization process.
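As a practical illustration, the following minimal sketch measures average per-frame inference latency against a real-time budget; the model, input resolution, and the 33 ms (~30 FPS) budget in the usage note are illustrative assumptions rather than settings from the cited studies.

```python
import time
import torch
import torch.nn as nn

def measure_latency_ms(model: nn.Module, input_shape=(1, 3, 640, 640),
                       warmup: int = 10, runs: int = 50) -> float:
    """Average per-frame CPU inference latency in milliseconds."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):               # warm-up passes stabilize timings
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0

# Hypothetical usage against a ~30 FPS budget:
# latency = measure_latency_ms(my_detector)
# print(f"{latency:.1f} ms/frame", "OK" if latency < 33.0 else "too slow")
```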

5.1.3. Model Parameter Size and Storage Limitations

The parameter size of deep learning models is a critical factor limiting their deployment in agricultural environments. The number of parameters in contemporary convolutional neural networks and Transformer-based architectures is continuously increasing, with the storage footprint of a single model often reaching hundreds of megabytes or even gigabytes. However, agricultural edge devices typically have severely constrained storage capacities. This limitation becomes particularly pronounced in scenarios requiring the concurrent deployment of multiple models to address diverse tasks, thereby intensifying storage-related pressures.
The scale of model parameters not only influences storage demands but also has direct implications for model loading time and runtime memory consumption. On resource-constrained embedded platforms, excessively large models can cause memory overflow or prolonged initialization periods, significantly compromising system responsiveness and operational reliability. Furthermore, significant challenges arise in the context of model updates and maintenance. Remote updating of large-scale models entails substantial network bandwidth usage, rendering such operations impractical in rural areas where network connectivity is limited or unreliable.

5.2. Frontier Solutions for Model Lightweighting and Deployment

In response to the challenges of limited computing resources, stringent real-time requirements, and the need for strong adaptability to complex environments when deploying models in agricultural scenarios, researchers have proposed a series of solutions in two major directions: innovation in lightweight network architectures, and model compression and optimization techniques. These technologies significantly enhance model deployment capabilities on resource-constrained devices by reducing model parameters, computational complexity (FLOPs), and inference time while maintaining high detection/segmentation accuracy, laying the technical foundation for large-scale applications in agricultural image processing. Table 4 summarizes these solutions.

5.2.1. Innovation in Lightweight Network Architecture

Lightweight network architectures provide a fundamental approach for agricultural object detection and segmentation tasks. By reconstructing the network backbone and optimizing feature extraction and fusion mechanisms, such architectural improvements effectively reduce model complexity at the source while preserving high detection accuracy.
1.
Lightweight Backbone Network Replacement
Traditional detection models—such as those in the YOLO series and DETR (Detection Transformer)—rely heavily on complex backbone networks like CSPDarknet53 and ResNet50, which are characterized by substantial parameter counts and computational demands. To address these limitations, researchers have pursued architectural-level lightweighting by substituting conventional backbones with more efficient alternatives.
For instance, T-YOLO [113] replaces the original CSPDarknet53 backbone of YOLOv4 with MobileNetV3, exploiting depthwise separable convolutions to reduce the model size to 18.30% of the original while increasing inference speed by 11.68 FPS. Similarly, LBDC-YOLO [114] adopts ShuffleNetV2 as the backbone and significantly reduces computational complexity through group convolution, depthwise separable convolution, and channel shuffle operations. In another study [115], an improved version of YOLOv4 incorporates EfficientNet-B0 as the backbone and integrates depthwise separable convolution to replace standard convolutions in PANet, achieving an 87.8% reduction in model size. Reference [119] substitutes the CSPDarknet backbone in YOLOv8n-seg with FasterNet and employs P-Conv (partial convolution) to lower per-layer computational load and enhance overall model efficiency. Lastly, L-U2NetP [116], designed for wheat lodging region segmentation, replaces the original U2NetP backbone with FasterNet and likewise utilizes P-Conv to reduce per-layer computational cost, resulting in a compact model with only 1.10 million parameters that meets real-time processing requirements.
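To make the recurring depthwise separable convolution concrete, below is a minimal PyTorch sketch of such a block; the Hardswish activation follows MobileNetV3, and all channel widths are illustrative rather than taken from the cited models. The parameter comparison in the closing comment shows why backbone substitution alone can shrink a detector severalfold.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A k x k depthwise convolution (groups = in_ch) followed by a 1 x 1
    pointwise convolution, replacing one standard k x k convolution."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Hardswish()  # activation choice follows MobileNetV3; illustrative

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter count for a 256 -> 256 channel, 3 x 3 layer:
#   standard convolution: 256 * 256 * 3 * 3 = 589,824 weights
#   separable version:    256 * 3 * 3 + 256 * 256 = 67,840 weights (~8.7x fewer)
```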
2.
Efficient Convolution and Lightweight Feature Extraction
To balance computational efficiency and effective feature representation, researchers have integrated efficient convolutional operations—such as depthwise separable convolution and deformable convolution—with lightweight attention mechanisms. These strategies aim to enhance feature responses in key regions while simultaneously optimizing multi-scale feature fusion architectures to improve detection performance across varying object sizes.
In the domain of efficient convolution and attention mechanism design, SN-CNN [117] introduces depthwise separable convolution into the C2f module of YOLOv8n and combines it with the parameter-free SimAM attention mechanism, which generates weights based on local self-similarity. This integration results in a 28.4% reduction in model parameters. D3-YOLOv10 [119] proposes deformable large kernel attention (D-LKA), which merges the large receptive field of large convolution kernels with the flexibility of deformable convolution to notably improve detection accuracy for irregularly shaped, shaded tomato regions. Additionally, reference [121] enhances the Zero-DCE model for nocturnal cattle herd detection by replacing iterative optimization with a Self-Attention Gate mechanism and an ACT module, thereby reducing computational complexity while boosting low-light image detection accuracy by 4.7%. Green Apple DETR [118] achieves efficient multi-scale feature integration by extending spatial pyramid pooling (DSPP) and introducing a multi-scale residual aggregation module (FRAM), effectively cutting computational costs.
Regarding feature fusion network optimization, LBDC-YOLO [114] replaces YOLOv8’s PAN-FPN with BiFPN, removing single-input edge nodes while adding same-scale skip connections and a weighted fusion strategy, leading to a 31.1% reduction in parameter count. ShufflenetV2-YOLOX [120] introduces the ASFF module after PANet, which adaptively fuses multi-scale features with weighted combinations to effectively detect clusters of apples of varying sizes, improving detection speed by 18%. Collectively, these approaches enhance the feature representation capabilities of lightweight models, enabling higher detection accuracy without compromising inference efficiency.
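Returning to the parameter-free SimAM mechanism adopted by SN-CNN, the following minimal sketch reflects the formulation used in public implementations; the regularization constant e_lambda is a common default and should be treated as a tunable assumption.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: each activation is reweighted by an
    energy term derived from its deviation from the channel mean, so the
    module adds no learnable parameters."""

    def __init__(self, e_lambda: float = 1e-4):  # regularizer; a common default
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        _, _, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n            # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5        # inverse energy
        return x * torch.sigmoid(e_inv)

# Drop-in usage after any convolutional block:
# feats = SimAM()(conv_block(images))
```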

5.2.2. Model Compression and Lightweighting Techniques

In addition to architectural innovations, model compression techniques further improve the deployment efficiency of deep learning models on resource-constrained devices by removing redundant parameters and lowering computational complexity. Widely adopted compression methods in agricultural applications include network pruning [126,127], quantization [128,129], and knowledge distillation [130].
1.
Network Pruning Techniques
Model pruning is a widely utilized model compression technique in deep learning, aimed at reducing the scale and computational burden of neural networks. It streamlines the model architecture by identifying and eliminating redundant weights, connections, neurons, or channels. The resulting pruned models can achieve substantial reductions in memory consumption and inference latency while preserving high predictive accuracy. Commonly employed pruning strategies include unstructured pruning—targeting individual weight removal—and structured pruning, which involves removing entire neurons or channels. These methods play a critical role in enabling the deployment of efficient AI models on devices with limited computational resources.
Although current agricultural research predominantly focuses on architectural innovation, the well-established application of pruning techniques in general computer vision offers valuable insights for this domain. For instance, in object detection scenarios, low-contribution channels can be selectively pruned by evaluating their importance within feature maps, while key feature pathways are retained to preserve detection performance.
The pruning-quantization fusion scheme proposed by Yuliansyah et al. [122] applies structured pruning to both convolutional and fully connected layers in the PlantVillage plant disease identification task. This approach, combined with dynamic range quantization, achieves a model compression rate exceeding 90%, with only minimal and acceptable accuracy degradation. Similarly, LBDC-YOLO [114] reduces network depth and width by pruning redundant convolutional layers through a Slim neck design, further enhancing model efficiency without compromising detection capability.
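As an illustration of the two pruning styles described above, this minimal sketch applies PyTorch’s built-in pruning utilities to a toy model; the layers chosen and the sparsity levels are placeholders, not settings from the cited works.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for a detector/classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)
conv, fc = model[0], model[3]

# Unstructured pruning: zero the 30% of conv weights with the smallest |w|.
prune.l1_unstructured(conv, name="weight", amount=0.3)

# Structured pruning: remove whole output channels (dim=0) by L2 norm --
# the variant that actually shrinks feature maps on edge hardware.
prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)

# Fold the masks into the stored weights to make pruning permanent.
prune.remove(conv, "weight")
prune.l1_unstructured(fc, name="weight", amount=0.5)
prune.remove(fc, "weight")
```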
2.
Quantization Techniques
Model quantization is a widely adopted technique in deep learning that converts high-precision floating-point parameters—such as model weights—and intermediate activation values from 32-bit floating-point (FP32) representations to lower-precision integer formats, such as 8-bit integers (INT8). The primary objective of quantization is to substantially reduce model size, minimize memory consumption, accelerate inference speed, and lower power requirements, all while maintaining minimal precision degradation. This enables efficient deployment on resource-constrained platforms such as smartphones, embedded systems, and edge computing devices.
Common quantization strategies include post-training quantization (PTQ), which applies quantization after the training phase. While this method is straightforward and computationally efficient, it may lead to a moderate decline in model accuracy. In contrast, quantization-aware training (QAT) simulates the effects of quantization during the training process, enabling the model to adapt and mitigate accuracy loss and thereby achieve superior performance.
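The snippet below sketches the simpler of the two strategies, post-training dynamic quantization, using PyTorch’s built-in API on a toy classifier head; QAT would instead insert fake-quantization modules during training. All model shapes are illustrative.

```python
import torch
import torch.nn as nn

# Toy float model standing in for a classifier head.
float_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
float_model.eval()

# Dynamic PTQ: Linear weights become INT8; activations are quantized
# on the fly at inference time, so no calibration data is needed.
int8_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 64, 64)
print(int8_model(x).shape)  # same interface, smaller weights, faster CPU math
```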
Quantization plays a crucial role in enabling real-time object detection in embedded systems, particularly for agricultural robotics. For example, ShufflenetV2-YOLOX [120] implemented FP32-to-FP16 quantization using TensorRT on an NVIDIA Jetson Nano platform, resulting in a 128.3% improvement in inference speed with only a 0.88% reduction in average precision (AP). Padeiro et al. [131] introduced a similarity-preserving quantization (SPQ) approach that enforces consistency between activation patterns of full-precision and quantized models during PTQ. This method achieved over a threefold increase in throughput, along with reduced memory usage and improved inference efficiency, without compromising detection accuracy on MobileNetV2 and ResNet-50.
Syaharani et al. [123] used QAT on MobileNetV3 for corn disease classification, compressing the model to just 4.3 MB while achieving faster inference and preserving high classification accuracy (94.86%), thereby significantly enhancing its suitability for deployment in resource-limited environments. Yuliansyah et al. [122] combined pruning with dynamic-range PTQ, achieving over 90% compression on the PlantVillage disease identification dataset, with only negligible accuracy degradation.
3.
Knowledge Distillation Technique
Knowledge distillation is a well-established model compression technique, the core principle of which is to train a compact and efficient “student model” to emulate and learn from a larger, high-capacity “teacher model.” The “knowledge” transferred typically refers to the soft labels—i.e., the probability distributions over classes—generated by the teacher model during inference, rather than just hard labels that represent definitive class predictions. By incorporating these soft labels as supervisory signals during the training of the student model, the latter not only learns the intrinsic characteristics of the input data but also captures the teacher model’s nuanced understanding of inter-class relationships.
This form of “soft knowledge” contains richer information compared to hard labels, thereby enabling the student model to achieve improved generalization performance and higher accuracy. A key advantage of knowledge distillation lies in its ability to substantially reduce model size and lower computational and storage overheads while preserving the performance of the teacher model to the greatest extent possible. This makes it particularly suitable for deployment in resource-constrained environments where efficient inference is critical.
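For concreteness, a minimal sketch of the canonical soft-label distillation objective follows; the temperature T and mixing weight alpha are illustrative hyperparameters, and the teacher is assumed to be frozen during student training.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    """Blend a KL term on temperature-softened distributions with ordinary
    cross-entropy on hard labels (Hinton-style distillation)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-term gradients on a comparable scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Typical training step (teacher frozen):
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)
```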
In agricultural inspection, knowledge distillation effectively balances model complexity with detection or classification accuracy. For example, D3-YOLOv10 [119] introduced a knowledge distillation framework based on semantic frequency cues, through which semantic features from the teacher model (D3-YOLOv10-S) were transferred to the student model (D3-YOLOv10-N), achieving a 54.0% reduction in parameter count and a 64.9% decrease in FLOPs while maintaining an mAP@0.5 of 91.8%.
Reference [132] enhanced the feature extraction capability of the ResNet-50 teacher model by integrating environmental metadata—including temperature, humidity, and soil conditions—and applied asymmetric knowledge distillation to transfer this knowledge to a MobileNet-based student model, enabling high-precision disease classification on mobile platforms. In [125], the PA-RDFKNet framework was proposed, wherein a multimodal (RGB-depth) fused teacher model guided a pure RGB-based student model, reducing plant age prediction error from two weeks to just 0.14 weeks. Reference [124] employed DenseNet201 as the teacher model and combined transfer learning with knowledge distillation to refine MobileNetV2, resulting in a recognition accuracy of 98.32% with a parameter count of only 3.5 million.
In summary, state-of-the-art solutions significantly improve the deployment capabilities of agricultural object detection and segmentation models in resource-limited settings through synergistic optimization involving architectural innovation and model compression techniques. These advances provide essential technical support for the intelligent transformation and digital evolution of modern agriculture.
However, it is crucial to acknowledge that many lightweight models are optimized using standard datasets under controlled experimental conditions, which may not reliably represent their performance in heterogeneous, real-world agricultural environments. Environmental factors such as illumination variation, occlusion, and crop diversity can amplify the inherent limitations of compressed models, leading to reduced accuracy and inconsistent predictions. Therefore, the evaluation of lightweight models should extend beyond metrics such as parameter count and inference time, incorporating assessments of their robustness and consistency across varying field conditions. Moreover, task-specific lightweighting—where essential submodules are preserved or selectively enhanced—may provide a more effective trade-off compared to uniform compression strategies.

5.3. Future Technological Trends in Model Deployment

The rapid progress of artificial intelligence, coupled with the deepening process of agricultural modernization, is driving the evolution of agricultural image processing model deployment toward greater flexibility, efficiency, and reliability. Future technological innovations are expected to fundamentally reshape the deployment paradigms of intelligent systems in agriculture, transitioning from isolated device operations to integrated architectures that combine end-cloud collaboration, scenario-specific hardware, and distributed data processing systems. This systemic transformation aims to establish a deployment ecosystem characterized by dynamic adaptability, computational optimization, and broad compatibility—laying a robust foundation for the comprehensive digital and intelligent transformation of agriculture.

5.3.1. Dynamic Computing Offloading and Task Allocation

Future agricultural imaging systems will increasingly use flexible end-cloud collaborative architectures that dynamically allocate computing tasks based on real-time factors such as task complexity, network conditions, and available resources. In this paradigm, lightweight feature extraction and preliminary filtering will be performed at the edge, while more complex inference and decision-making processes will be offloaded to cloud or edge servers. This hierarchical computing structure maximizes the strengths of different computing nodes, enabling high-performance processing while maintaining real-time responsiveness. The rollout of 5G and, in the future, 6G networks will further enhance communication reliability for end-cloud collaboration. Ultra-low-latency connectivity will make real-time cloud-based video stream processing feasible, while network slicing technologies can allocate dedicated bandwidth specifically for agricultural applications.
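To make the allocation logic tangible, here is a toy latency-based offloading rule under stated assumptions (measured compute rates, payload size, and network round-trip time); a production scheduler would additionally weigh energy budgets, queue depth, and link reliability.

```python
def choose_execution_site(task_flops, edge_flops_s, cloud_flops_s,
                          payload_bytes, uplink_bytes_s, rtt_s, deadline_s):
    """Pick where to run one inference task from estimated latencies."""
    edge_latency = task_flops / edge_flops_s
    cloud_latency = rtt_s + payload_bytes / uplink_bytes_s + task_flops / cloud_flops_s
    if edge_latency <= deadline_s and edge_latency <= cloud_latency:
        return "edge", edge_latency
    if cloud_latency <= deadline_s:
        return "cloud", cloud_latency
    return "degrade", min(edge_latency, cloud_latency)  # e.g., drop frames

# Example: a 2 GFLOP inference, 30 ms deadline, 4G-class uplink.
print(choose_execution_site(2e9, 1e11, 1e12, 2e5, 5e6, 0.05, 0.03))
# -> ('edge', 0.02): the edge meets the deadline, so no offload occurs.
```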

5.3.2. Dedicated Accelerator Design

In response to the growing demand for efficient and energy-conscious model inference in agriculture, future developments will see the emergence of specialized hardware acceleration solutions tailored to agricultural applications. These include custom AI chips, FPGA-based accelerators, and edge computing modules. Through close integration with algorithm design, these platforms will significantly improve computational efficiency while reducing power consumption. Among them, FPGA-based solutions offer distinct advantages due to their low power consumption and reconfigurability. Customized AI processors, such as neural processing units (NPUs), can achieve substantial gains in inference speed and energy efficiency through domain-specific optimizations for agricultural workloads. Edge computing modules, benefiting from modular and adaptable designs, can integrate multiple types of accelerators—such as GPUs and NPUs—and support co-optimization with software and algorithms to meet diverse requirements of agricultural applications.

5.3.3. Distributed Model Deployment and Inference

As agricultural production environments become increasingly complex and data volumes grow exponentially, distributed model deployment and inference will play a pivotal role in next-generation agricultural image processing systems. Agricultural image data will originate from diverse sources—including drone-mounted cameras, ground sensor networks, and greenhouse monitoring systems—requiring scalable and efficient data processing capabilities. By partitioning large models into smaller submodules and distributing them across heterogeneous computing nodes, distributed deployment enhances both processing efficiency and system scalability. Each node performs parallel computations and exchanges intermediate results, ultimately aggregating outputs to generate final decisions.
During the inference phase, distributed strategies reduce overall latency by leveraging the combined computational capacity of multiple nodes. With the integration of federated learning techniques, geographically dispersed agricultural data can contribute to collaborative model training and refinement without compromising data privacy, thus fostering the development of more generalized and robust models. Furthermore, incorporating blockchain technology can enhance data security and traceability, facilitating trustworthy data sharing and cross-institutional collaboration. Looking ahead, optimizing inter-node communication protocols and intelligent data scheduling strategies will serve as key breakthroughs for achieving efficient and scalable operation of agricultural image processing systems in large-scale, heterogeneous environments—ultimately advancing the level of agricultural intelligence.

6. Practical Recommendations for Agricultural Stakeholders

Based on the preceding analysis of the challenges, research progress, and future trends in agricultural image processing, and in light of real agricultural production scenarios, the following practical and actionable recommendations are proposed. The aim is to facilitate the effective implementation of this technology in practical settings, including field management and the deployment of intelligent agricultural equipment.

6.1. Recommendations for Image Acquisition and Model Application in Complex Environments

Standardize the Image Acquisition Process
Formulate crop-specific image acquisition criteria. For example, when capturing images of field crops, select the period from 10 a.m. to 2 p.m., during which the illumination is relatively stable. This helps avoid interference caused by backlighting and shadows. In orchards, adopt a combination of low-altitude aerial photography using drones and multi-angle ground photography. This approach reduces background interference resulting from foliage occlusion. For tall-stemmed crops such as maize and sorghum, utilize a telescopic rod-mounted imaging device to capture images of the middle and lower leaves, ensuring that disease and pest characteristics can be fully captured.
Select Appropriate Models for Different Scenarios
For fruit recognition in orchards, the YOLOv8-seg model integrated with an attention mechanism is recommended as a priority. This model can enhance the segmentation accuracy of overlapping fruits. For the monitoring of field diseases and pests, the DeepLabV3+ model is proposed. By fusing multi-scale features, this model can identify minute disease lesions. In greenhouse environments, the lightweight MobileNetV3-Large architecture is more suitable. It can not only ensure detection accuracy but also reduce device energy consumption and extend battery life, making it ideal for resource-constrained environments.
Dynamically Adjust to Accommodate Regional Variations
When promoting models across different regions, first use CycleGAN to transform local images into the style of the target region, for example adapting crop image styles between arid northern regions and humid southern regions. Subsequently, fine-tune the model using 5–10% of local samples so that it can quickly adapt to the new environment.
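As one possible realization of this fine-tuning step, the sketch below freezes a pretrained backbone and retrains only the classification head on the small local sample set; the MobileNetV3 backbone and the class count are hypothetical choices, not prescriptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Freeze a pretrained backbone and retrain only the classifier head on the
# small (5-10%) local sample set.
model = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                      # keep generic features fixed

num_local_classes = 12                           # hypothetical regional classes
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features,
                                 num_local_classes)  # new head trains from scratch

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
# ...then run a few epochs of standard supervised training on the
# style-adapted local images.
```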

6.2. Data Management and Application Practice Scheme

Establish a Hierarchical Data Acquisition System
Construct a collaborative team composed of “farmers, technicians, and experts”. Farmers collect basic images, such as those of healthy or diseased crops, via a simple app. Technicians are responsible for screening out valid samples, while experts accurately annotate challenging samples, such as those of rare diseases and pests. By leveraging semi-supervised learning tools such as Mean Teacher, approximately 10% labeled samples can be used to exploit a large pool of unlabeled samples, thereby reducing annotation costs.
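A minimal sketch of the Mean Teacher update is shown below; the EMA decay is a common default rather than a tuned value, and the surrounding training loop is summarized in comments.

```python
import copy
import torch

def ema_update(teacher, student, decay: float = 0.99):
    """Mean Teacher step: teacher weights track an exponential moving
    average of the student's, yielding stable pseudo-targets."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

# Setup sketch:
# student = build_model()                 # hypothetical constructor
# teacher = copy.deepcopy(student)
# for p in teacher.parameters():
#     p.requires_grad = False
#
# Per iteration: supervised loss on the ~10% labeled images, plus a
# consistency loss between student and teacher predictions on unlabeled
# images, followed by ema_update(teacher, student).
```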
Establish a Regional Shared Database
Establish an image database at the county or agricultural cooperative level, covering the main local crops, different growth stages, and various diseases and pests. Adopt a federated learning model to achieve data sharing and break the data isolation. For example, in major wheat-producing areas, images of wheat rust and powdery mildew can be shared to address the problem of insufficient samples in a single region.
Specifically Address the Issue of Data Imbalance
For common diseases and pests such as rice blast, cap the sample size through random undersampling. For rare types such as corn head smut, generate supplementary samples using generative adversarial networks (GANs); for instance, StyleGAN3 can be employed to generate images at different disease-onset stages. Meanwhile, incorporate Focal Loss during model training to enhance the recognition of minority classes.
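For concreteness, a minimal multi-class focal loss sketch follows; gamma = 2.0 follows the original formulation, and the per-class weights in the example are hypothetical values for a rare class.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0, alpha=None):
    """Multi-class focal loss: down-weights easy examples so gradients
    concentrate on hard (often minority-class) samples."""
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, targets, weight=alpha, reduction="none")
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()

# Example with 5 classes, weighting a hypothetical rare class 4x:
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 4.0])
print(focal_loss(logits, targets, alpha=weights))
```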

6.3. Deployment of Intelligent Equipment and Strategies for Technology Implementation

Hierarchical Configuration of Hardware Devices
For hardware device configuration, a hierarchical approach is adopted. Low-power devices such as handheld terminals can run ultra-lightweight models; for example, an optimized YOLOv8n model with its parameters kept within 1.5 M can meet real-time detection requirements. For unmanned aerial vehicle (UAV) inspection systems, the Jetson Nano edge-computing module can be selected; through INT8 quantization, inference speed can be boosted to over 50 frames per second. High-performance GPUs are recommended for large-scale agricultural machinery. These GPUs can run complex models such as Transformer fusion models, enabling parallel processing of multiple tasks, including crop identification and yield estimation.
Promotion of Edge–Cloud Collaborative Applications
Edge devices, including field sensors and picking robots, are tasked with real-time detection operations. These tasks may involve activities such as determining the maturity of crops and locating weeds. Conversely, the cloud is responsible for handling complex tasks, such as predicting long-term pest and disease trends and yield modeling. The 5G network is utilized to enable efficient data transmission, thereby striking a balance between real-time responsiveness and computational capabilities.
Phased Promotion and Application
The promotion and application of this technology should be carried out in phases. Initially, pilot projects should be launched in high-value-added crop sectors, such as protected fruit and vegetable cultivation and tea plantations. For instance, image recognition technology can be used to provide early warnings of grape downy mildew. Once the effectiveness of these pilots has been verified, the technology can be extended to field crops. Simultaneously, in conjunction with agricultural subsidy policies, efforts should be made to lower the threshold for small and medium-sized farmers to adopt intelligent equipment, thus facilitating the large-scale implementation of this technology.

7. Conclusions

This paper explores the multifaceted challenges and innovative solutions in the field of agricultural image recognition and processing. It offers a comprehensive review of three pivotal technical obstacles currently facing the discipline: environmental interference, data scarcity, and model deployment constraints. Furthermore, it synthesizes cutting-edge strategies and emerging trends aimed at overcoming these obstacles.
Research findings reveal that to combat environmental variability, advanced techniques such as multi-scale perception models, domain adaptation methods, and attention mechanisms significantly enhance the robustness and accuracy of image analysis in complex farmland environments. The persistent issue of limited labeled data can be effectively alleviated through the application of weakly supervised learning, generative adversarial networks (GANs), and sophisticated data augmentation strategies. Moreover, adaptive balanced learning frameworks and causal reasoning approaches have demonstrated notable potential in addressing class imbalance challenges. At the deployment stage, breakthroughs in lightweight network architectures, model compression algorithms, and edge–cloud collaborative computing are progressively dismantling the barriers imposed by the limited computational resources of agricultural edge devices.
Looking ahead, future research should prioritize three strategic directions: the construction of novel agricultural image datasets encompassing diverse crops, multiple growth stages, and intricate environmental conditions; the design of specialized lightweight neural network architectures infused with agronomic knowledge; and the development of end-to-end real-time monitoring systems capable of early disease detection and severity assessment. Only through interdisciplinary collaboration—seamlessly integrating agricultural expertise with cutting-edge artificial intelligence—can we transition agricultural image processing from controlled laboratory environments to widespread practical deployment. This will ultimately pave the way for intelligent, precise, and sustainable agricultural production.
By presenting a systematic overview of deep learning applications in agriculture, this study aims to catalyze innovation and inspire researchers to develop tailored solutions for computer vision tasks such as crop classification, pest and disease identification, and yield prediction, as well as in the broader domain of agricultural big data analytics. Ultimately, it aims to empower the advancement of smart agriculture, foster more sustainable farming practices, strengthen the global food supply chain, and serve as a cornerstone for the digital transformation of the agricultural sector.

Funding

This research was funded by Jiangsu University grant number JSU-JSJ-2025010.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, Y.; Cui, M.; Xu, Z. Spatial Characteristics of Transfer Plots and Conservation Tillage Technology Adoption: Evidence from a Survey of Four Provinces in China. Agriculture 2023, 13, 1601. [Google Scholar] [CrossRef]
  2. Calicioglu, O.; Flammini, A.; Bracco, S.; Bellù, L.; Sims, R. The Future Challenges of Food and Agriculture: An Integrated Analysis of Trends and Solutions. Sustainability 2019, 11, 222. [Google Scholar] [CrossRef]
  3. Hu, T.; Wang, W.; Gu, J.; Xia, Z.; Zhang, J.; Wang, B. Research on Apple Object Detection and Localization Method Based on Improved YOLOX and RGB-D Images. Agronomy 2023, 13, 1816. [Google Scholar] [CrossRef]
  4. Taha, M.F.; Mao, H.; Mousa, S.; Zhou, L.; Wang, Y.; Elmasry, G.; Al-Rejaie, S.; Elwakeel, A.E.; Wei, Y.; Qiu, Z. Deep Learning-Enabled Dynamic Model for Nutrient Status Detection of Aquaponically Grown Plants. Agronomy 2024, 14, 2290. [Google Scholar] [CrossRef]
  5. Zhang, Z.; Yang, M.; Pan, Q.; Jin, X.; Wang, G.; Zhao, Y.; Hu, Y. Identification of Tea Plant Cultivars Based on Canopy Images Using Deep Learning Methods. Sci. Hortic. 2025, 339, 113908. [Google Scholar] [CrossRef]
  6. Mahmood Ur Rehman, M.; Liu, J.; Nijabat, A.; Faheem, M.; Wang, W.; Zhao, S. Leveraging Convolutional Neural Networks for Disease Detection in Vegetables: A Comprehensive Review. Agronomy 2024, 14, 2231. [Google Scholar] [CrossRef]
  7. Ma, J.; Li, M.; Fan, W.; Liu, J. State-of-the-Art Techniques for Fruit Maturity Detection. Agronomy 2024, 14, 2783. [Google Scholar] [CrossRef]
  8. Xu, H.; Yin, H.; Liu, Y.; Wang, B.; Song, H.; Zheng, Z.; Zhang, X.; Jiang, L.; Wang, S. Regional Winter Wheat Yield Prediction and Variable Importance Analysis Based on Multisource Environmental Data. Agronomy 2024, 14, 1623. [Google Scholar] [CrossRef]
  9. Chen, X.; Ding, H.; Yuan, L.-M.; Cai, J.-R.; Chen, X.; Lin, Y. New Approach of Simultaneous, Multi-Perspective Imaging for Quantitative Assessment of the Compactness of Grape Bunches: Simultaneous Multi-Perspective Imaging of Bunches. Aust. J. Grape Wine Res. 2018, 24, 413–420. [Google Scholar] [CrossRef]
  10. Elbeltagi, A.; Srivastava, A.; Deng, J.; Li, Z.; Raza, A.; Khadke, L.; Yu, Z.; El-Rawy, M. Forecasting Vapor Pressure Deficit for Agricultural Water Management Using Machine Learning in Semi-Arid Environments. Agric. Water Manag. 2023, 283, 108302. [Google Scholar] [CrossRef]
  11. Chen, X.; Guo, Y.; Hu, J.; Xu, G.; Liu, W.; Ma, G.; Ding, Q.; He, R. Quantitative Evaluation of Post-Tillage Soil Structure Based on Close-Range Photogrammetry. Agriculture 2024, 14, 2124. [Google Scholar] [CrossRef]
  12. Chen, S.; Memon, M.S.; Shen, B.; Guo, J.; Du, Z.; Tang, Z.; Guo, X.; Memon, H. Identification of Weeds in Cotton Fields at Various Growth Stages Using Color Feature Techniques. Ital. J. Agron. 2024, 19, 100021. [Google Scholar] [CrossRef]
  13. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  14. Borowiec, M.L.; Dikow, R.B.; Frandsen, P.B.; McKeeken, A.; Valentini, G.; White, A.E. Deep learning as a tool for ecology and evolution. Methods Ecol. Evol. 2022, 13, 1640–1660. [Google Scholar] [CrossRef]
  15. Liao, J.; Tao, W.; Liang, Y.; He, X.; Wang, H.; Zeng, H.; Wang, Z.; Luo, X.; Sun, J.; Wang, P.; et al. Multi-scale monitoring for hazard level classification of brown planthopper damage in rice using hyperspectral technique. Int. J. Agric. Biol. Eng. 2024, 17, 202–211. [Google Scholar] [CrossRef]
  16. Zhou, M.; Dai, C.; Aheto, J.H.; Zhang, X. Design of a Portable Electronic Nose for Identification of Minced Chicken Meat Adulterated with Soybean Protein Isolate. J. Food Saf. 2024, 44, e13163. [Google Scholar] [CrossRef]
  17. Ma, R.; Zhang, Y.; Zhang, B.; Fang, L.; Huang, D.; Qi, L. Learning attention in the frequency domain for flexible real photograph denoising. IEEE Trans. Image Process. 2024, 33, 3707–3721. [Google Scholar] [CrossRef]
  18. Lu, Y.; Chen, D.; Olaniyi, E.; Huang, Y. Generative Adversarial Networks (GANs) for Image Augmentation in Agriculture: A Systematic Review. Comput. Electron. Agric. 2022, 200, 107208. [Google Scholar] [CrossRef]
  19. Güldenring, R. Self-Supervised Contrastive Learning on Agricultural Images. Comput. Electron. Agric. 2021, 191, 106510. [Google Scholar] [CrossRef]
  20. Wang, Y.; Zhang, X.; Ma, G.; Du, X.; Shaheen, N.; Mao, H. Recognition of weeds at asparagus fields using multi-feature fusion and backpropagation neural network. Int. J. Agric. Biol. Eng. 2021, 14, 190–198. [Google Scholar] [CrossRef]
  21. Zhang, F.; Chen, Z.; Ali, S.; Yang, N.; Fu, S.; Zhang, Y. Multi-class detection of cherry tomatoes using improved YOLOv4-tiny. Int. J. Agric. Biol. Eng. 2023, 16, 225–231. [Google Scholar] [CrossRef]
  22. Xie, H.; Zhang, Z.; Zhang, K.; Yang, L.; Zhang, D.; Yu, Y. Research on the Visual Location Method for Strawberry Picking Points under Complex Conditions Based on Composite Models. J. Sci. Food Agric. 2024, 104, 8566–8579. [Google Scholar] [CrossRef]
  23. Li, M.M.; Long, J.; Stein, A.; Wang, X.Q. Using a semantic edge-aware multi-task neural network to delineate agricultural parcels from remote sensing images. ISPRS J. Photogramm. Remote Sens. 2023, 200, 24–40. [Google Scholar] [CrossRef]
  24. Zhang, Z.; Qiu, X.; Guo, G.; Zhu, X.; Shi, J.; Zhang, N.; Ding, S.; Tang, N.; Qu, Y.; Sun, Z.; et al. An automated root phenotype platform enables nondestructive high-throughput root system architecture dissection in wheat. Plant Physiol. 2025, 198, kiaf154. [Google Scholar] [CrossRef] [PubMed]
  25. Sun, T.; Zhang, W.; Gao, X.; Zhang, W.; Li, N.; Miao, Z.H. Efficient occlusion avoidance based on active deep sensing for harvesting robots. Comput. Electron. Agric. 2024, 225, 109360. [Google Scholar] [CrossRef]
  26. Peng, Y.; Wang, Y. Prediction of the Chlorophyll Content in Pomegranate Leaves Based on Digital Image Processing Technology and Stacked Sparse Autoencoder. Int. J. Food Prop. 2019, 22, 1720–1732. [Google Scholar] [CrossRef]
  27. Chen, C.; Wang, X.; Heidari, A.A.; Yu, H.; Chen, H. Multi-Threshold Image Segmentation of Maize Diseases Based on Elite Comprehensive Particle Swarm Optimization and Otsu. Front. Plant Sci. 2021, 12, 789911. [Google Scholar] [CrossRef]
  28. Xu, C.; Zhang, Q.J. An Edge Detection for Sunflower Seeds Based on Improved Canny Algorithm. Food Mach. 2015, 31, 36–38. [Google Scholar]
  29. Ju, Z.; Xue, Y.; Zhang, W.; Zhai, C. Algorithm for Detecting Pomegranate Disease Spots Based on Prewitt Operator with Adaptive Threshold. Trans. Chin. Soc. Agric. Eng. 2020, 36, 135–142. [Google Scholar]
  30. Wang, Z.; Chu, G.K.; Zhang, H.J.; Liu, S.X.; Huang, X.C.; Gao, F.R.; Zhang, C.Q.; Wang, J.X. Identification of Diseased Empty Rice Panicles Based on Haar-like Feature of UAV Optical Image. Trans. CSAE 2018, 34, 73–82. [Google Scholar]
  31. Liu, X.; Jia, W.; Ruan, C.; Zhao, D.; Gu, Y.; Chen, W. The Recognition of Apple Fruits in Plastic Bags Based on Block Classification. Precis. Agric. 2018, 19, 735–749. [Google Scholar] [CrossRef]
  32. Zhu, J.; Sun, B.; Cai, J.; Xu, Y.; Lu, F.; Ma, H. Inspection and Classification of Wheat Quality Using Image Processing. Qual. Assur. Saf. Crops Foods 2023, 15, 43–54. [Google Scholar] [CrossRef]
  33. Li, J.; Luo, W.; Han, L.; Cai, Z.; Guo, Z. Two-Wavelength Image Detection of Early Decayed Oranges by Coupling Spectral Classification with Image Processing. J. Food Compos. Anal. 2022, 111, 104642. [Google Scholar] [CrossRef]
  34. Liu, Z.H.; Yu, M.; Ren, H.E. Image Segmentation of Grape Clusters Based on Improved K-means Clustering. Jiangsu Agric. Sci. 2018, 46, 239–244. [Google Scholar] [CrossRef]
  35. Pham, V.H.; Lee, B.R. An Image Segmentation Approach for Fruit Defect Detection Using K-Means Clustering and Graph-Based Algorithm. Vietnam J. Comput. Sci. 2015, 2, 25–33. [Google Scholar] [CrossRef]
  36. Yu, Y.; Velastin, S.A.; Yin, F. Automatic Grading of Apples Based on Multi-Features and Weighted K-Means Clustering Algorithm. Inf. Process. Agric. 2020, 7, 556–565. [Google Scholar] [CrossRef]
  37. Tian, I.A.; Xie, C.; Li, L.Y.; Lu, S.L.; Qian, T.T. Improved YOLO v8 Method for Cucumber Fruit Segmentation in Complex Greenhouse Environments. Trans. Chin. Soc. Agric. Mach. 2025, 56, 433–442. [Google Scholar]
  38. Zhao, Y.; Zhang, X.; Sun, J.; Yu, T.; Cai, Z.; Zhang, Z.; Mao, H. Low-Cost Lettuce Height Measurement Based on Depth Vision and Lightweight Instance Segmentation Model. Agriculture 2024, 14, 1596. [Google Scholar] [CrossRef]
  39. Ma, J.; Zhao, Y.; Fan, W.; Liu, J. An Improved YOLOv8 Model for Lotus Seedpod Instance Segmentation in the Lotus Pond Environment. Agronomy 2024, 14, 1325. [Google Scholar] [CrossRef]
  40. Jiang, L.; Wang, Y.; Wu, C.; Wu, H. Fruit Distribution Density Estimation in YOLO-Detected Strawberry Images: A Kernel Density and Nearest Neighbor Analysis Approach. Agriculture 2024, 14, 1848. [Google Scholar] [CrossRef]
  41. Tang, S.; Xia, Z.; Gu, J.; Wang, W.; Huang, Z.; Zhang, W. High-Precision Apple Recognition and Localization Method Based on RGB-D and Improved SOLOv2 Instance Segmentation. Front. Sustain. Food Syst. 2024, 8, 1403872. [Google Scholar] [CrossRef]
  42. Zhu, G.; Luo, Z.; Ye, M.; Xie, Z.; Luo, X.; Hu, H.; Wang, Y.; Ke, Z.; Jiang, J.; Wang, W. Instance Segmentation of Sugar Apple (Annona squamosa) in Natural Orchard Scenes Using an Improved YOLOv9-Seg Model. Agriculture 2025, 15, 1278. [Google Scholar] [CrossRef]
  43. Yang, Q.L.; Chen, C.; Lei, L.; Zhou, N.S.; Yang, L. TD-BlendMask-based Approach for Instance Segmentation of Panax notoginseng Leaf Diseases in Complex Environments. Trans. Chin. Soc. Agric. Mach. 2025, 56, 375–386. [Google Scholar]
  44. Dai, W.; Zhu, W.; Zhou, G.; Liu, G.; Xu, J.; Zhou, H.; Hu, Y.; Liu, Z.; Li, J.; Li, L. AISOA-SSformer: An Effective Image Segmentation Method for Rice Leaf Disease Based on the Transformer Architecture. Plant Phenomics 2024, 6, 0218. [Google Scholar] [CrossRef]
  45. Wu, C.; Liu, Y.; Yang, J.; Dai, A.; Zhou, H.; Tang, K.; Zhang, Y.; Wang, R.; Wei, B.; Wang, Y. Large-Scale Apple Orchard Identification from Multi-Temporal Sentinel-2 Imagery. Agronomy 2025, 15, 1487. [Google Scholar] [CrossRef]
  46. Peng, Y.; Zhao, S.; Liu, J. Segmentation of Overlapping Grape Clusters Based on the Depth Region Growing Method. Electronics 2021, 10, 2813. [Google Scholar] [CrossRef]
  47. Wang, Q.; Qin, W.; Liu, M.; Zhao, J.; Zhu, Q.; Yin, Y. Semantic Segmentation Model-Based Boundary Line Recognition Method for Wheat Harvesting. Agriculture 2024, 14, 1846. [Google Scholar] [CrossRef]
  48. Xu, B.; Werle, R.; Chudzik, G.; Zhang, Z. Enhancing Weed Detection Using UAV Imagery and Deep Learning with Weather-Driven Domain Adaptation. Comput. Electron. Agric. 2025, 237, 110673. [Google Scholar] [CrossRef]
  49. Tian, L.; Ustin, S.L.; Xue, B.; Zarco-Tejada, P.J.; Jin, Y.F.; Yao, X.; Zhu, Y.; Cao, W.X.; Cheng, T. Visualizing the pre-visual: Rice blast infection signals revealed. Remote Sens. Environ. 2025, 328, 114905. [Google Scholar] [CrossRef]
  50. Xu, J.; Zhu, Y.; Zhong, R.; Lin, Z.; Xu, J.; Jiang, H.; Huang, J.; Li, H.; Lin, T. DeepCropMapping: A Multi-Temporal Deep Learning Approach with Improved Spatial Generalizability for Dynamic Corn and Soybean Mapping. Remote Sens. Environ. 2020, 247, 111946. [Google Scholar] [CrossRef]
  51. Chen, L.; Zou, J.; Yuan, Y.; He, H. Improved Domain Adaptive Rice Disease Image Recognition Based on a Novel Attention Mechanism. Comput. Electron. Agric. 2023, 208, 107806. [Google Scholar] [CrossRef]
  52. Wang, H.; Shan, Y.; Chen, L.; Liu, M.; Wang, L.; Meng, Z. Multi-Scale Feature Learning for 3D Semantic Mapping of Agricultural Fields Using UAV Point Clouds. Int. J. Appl. Earth Obs. Geoinf. 2025, 141, 104626. [Google Scholar] [CrossRef]
  53. Liu, D.; Parmiggiani, A.; Norton, T. MT-SRNet: A Transferable Multi-Task Super-Resolution Network for Pig Keypoint Detection, Segmentation, and Posture Estimation. Comput. Electron. Agric. 2025, 237, 110533. [Google Scholar] [CrossRef]
  54. Tao, K.; Wang, A.; Shen, Y.; Lu, Z.; Peng, F.; Wei, X. Peach Flower Density Detection Based on an Improved CNN Incorporating Attention Mechanism and Multi-Scale Feature Fusion. Horticulturae 2022, 8, 904. [Google Scholar] [CrossRef]
  55. Peng, Y.; Zhao, S.; Liu, J. Fused Deep Features-Based Grape Varieties Identification Using Support Vector Machine. Agriculture 2021, 11, 869. [Google Scholar] [CrossRef]
  56. Saleh, A.; Olsen, A.; Wood, J.; Philippa, B.; Rahimi Azghadi, M. Semi-Supervised Weed Detection for Rapid Deployment and Enhanced Efficiency. Comput. Electron. Agric. 2025, 236, 110410. [Google Scholar] [CrossRef]
  57. Zhao, L.; Dong, T.; Du, X.; Dong, B.; Li, Q. Model Morphing Supported Large Scale Crop Type Mapping: A Case Study of Cotton Mapping in Xinjiang, China. Int. J. Appl. Earth Obs. Geoinf. 2025, 141, 104667. [Google Scholar] [CrossRef]
  58. Miao, C.; Fu, S.; Sun, W.; Feng, S.; Hu, Y.; Liu, J.; Feng, Q.; Li, Y.; Liang, T. Large-Scale Mapping of the Spatial Distribution and Cutting Intensity of Cultivated Alfalfa Based on a Sample Generation Algorithm and Random Forest. Comput. Electron. Agric. 2025, 237, 110613. [Google Scholar] [CrossRef]
  59. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS’21), Red Hook, NY, USA, 6 December 2021. [Google Scholar]
  60. Xu, Z.; Zhang, W.; Zhang, T.; Yang, Z.; Li, J. Efficient Transformer for Remote Sensing Image Segmentation. Remote Sens. 2021, 13, 3585. [Google Scholar] [CrossRef]
  61. Xu, J.; Lu, K.; Wang, H. Attention Fusion Network for Multi-Spectral Semantic Segmentation. Pattern Recognit. Lett. 2021, 146, 179–184. [Google Scholar] [CrossRef]
  62. Tang, J.; Miao, R.; Zhang, Z.; He, D.; Liu, L. Decision Support of Farmland Intelligent Image Processing Based on Multi-Inference Trees. Comput. Electron. Agric. 2015, 117, 49–56. [Google Scholar] [CrossRef]
  63. Kim, Y.H.; Park, K.R. MTS-CNN: Multi-Task Semantic Segmentation-Convolutional Neural Network for Detecting Crops and Weeds. Comput. Electron. Agric. 2022, 199, 107146. [Google Scholar] [CrossRef]
  64. Guo, Y.; Lan, Y.; Chen, X. CST: Convolutional Swin Transformer for Detecting the Degree and Types of Plant Diseases. Comput. Electron. Agric. 2022, 202, 107407. [Google Scholar] [CrossRef]
  65. Khare, O.; Mane, S.; Kulkarni, H.; Barve, N. LeafNST: An Improved Data Augmentation Method for Classification of Plant Disease Using Object-Based Neural Style Transfer. Discov. Artif. Intell. 2024, 4, 50. [Google Scholar] [CrossRef]
  66. Quach, L.-D.; Quoc, K.N.; Quynh, A.N.; Thai-Nghe, N.; Nguyen, T.G. Explainable Deep Learning Models with Gradient-Weighted Class Activation Mapping for Smart Agriculture. IEEE Access 2023, 11, 83752–83762. [Google Scholar] [CrossRef]
  67. Zhou, L.; Xiao, Q.; Taha, M.F.; Xu, C.; Zhang, C. Phenotypic Analysis of Diseased Plant Leaves Using Supervised and Weakly Supervised Deep Learning. Plant Phenomics 2023, 5, 0022. [Google Scholar] [CrossRef] [PubMed]
  68. Ciarfuglia, T.A.; Motoi, I.M.; Saraceni, L.; Fawakherji, M.; Sanfeliu, A.; Nardi, D. Weakly and Semi-Supervised Detection, Segmentation and Tracking of Table Grapes with Limited and Noisy Data. Comput. Electron. Agric. 2023, 205, 107624. [Google Scholar] [CrossRef]
  69. Huang, Y.; Bais, A. Unsupervised Domain Adaptation for Weed Segmentation Using Greedy Pseudo-Labelling. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; IEEE: New York, NY, USA, 2024; pp. 2484–2494. [Google Scholar] [CrossRef]
  70. Liu, T.; Zhai, D.; He, F.; Yu, J. Semi-supervised Learning Methods for Weed Detection in Turf. Pest Manag. Sci. 2024, 80, 2552–2562. [Google Scholar] [CrossRef]
  71. Chen, L.-B.; Huang, G.-Z.; Huang, X.-R.; Wang, W.-C. A Self-Supervised Learning-Based Intelligent Greenhouse Orchid Growth Inspection System for Precision Agriculture. IEEE Sens. J. 2022, 22, 24567–24577. [Google Scholar] [CrossRef]
  72. Monowar, M.M.; Hamid, M.A.; Kateb, F.A.; Ohi, A.Q.; Mridha, M.F. Self-Supervised Clustering for Leaf Disease Identification. Agriculture 2022, 12, 814. [Google Scholar] [CrossRef]
  73. Kar, S.; Nagasubramanian, K.; Elango, D.; Nair, A.; Mueller, D.S.; O’Neal, M.E.; Singh, A.K.; Sarkar, S.; Ganapathysubramanian, B. Self-supervised learning improves agricultural pest classification. In Proceedings of the AI for Agriculture and Food Systems, Vancouver, BC, Canada, 28 February 2021. [Google Scholar]
  74. Silva, R.F.; Mostaço, G.M.; Xavier, F.; Saraiva, A.M.; Cugnasca, C.E. Use of Unsupervised Machine Learning for Agricultural Supply Chain Data Labeling. In Information and Communication Technologies for Agriculture—Theme II: Data; Bochtis, D.D., Moshou, D.E., Vasileiadis, G., Balafoutis, A., Pardalos, P.M., Eds.; Springer Optimization and Its Applications; Springer International Publishing: Cham, Switzerland, 2022; Volume 183, pp. 267–288. [Google Scholar] [CrossRef]
  75. Sara, B.; Otman, A.; Khatir, E. New Learning Approach for Unsupervised Neural Networks Model with Application to Agriculture Field. IJACSA 2020, 11, 360–369. [Google Scholar] [CrossRef]
  76. Badapanda, K.; Mishra, D.P.; Salkuti, S.R. Agriculture Data Visualization and Analysis Using Data Mining Techniques: Application of Unsupervised Machine Learning. TELKOMNIKA 2022, 20, 98. [Google Scholar] [CrossRef]
  77. Weller, D.L.; Love, T.M.T.; Wiedmann, M. Comparison of Resampling Algorithms to Address Class Imbalance When Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water. Front. Environ. Sci. 2021, 9, 701288. [Google Scholar] [CrossRef]
  78. Feng, J.; Shi, Y.; Su, Y.; Mu, W.; Tian, D. Evaluation Method of Agricultural Production Technical Efficiency Based on Borderline-SMOTE and LightGBM. In Atlantis Highlights in Engineering, Proceedings of the 3rd International Conference on Management Science and Software Engineering (ICMSSE 2023), Qingdao, China, 9 October 2023; Rauf, A., Zakuan, N., Sohail, M.T., Azmi, R., Eds.; Atlantis Press International BV: Dordrecht, The Netherlands, 2024; Volume 20, pp. 586–592. [Google Scholar] [CrossRef]
  79. Sovia, N.A.; Wardhani, N.W.S. Ensemble CNN with ADASYN for multiclass classification on cabbage pests. BAREKENG J. Math. Its Appl. 2024, 18, 1237–1248. [Google Scholar] [CrossRef]
  80. Liu, Z.; Wang, S. Broken Corn Detection Based on an Adjusted YOLO With Focal Loss. IEEE Access 2019, 7, 68281–68289. [Google Scholar] [CrossRef]
  81. Guan, X.; Shi, L.; Yang, W.; Ge, H.; Wei, X.; Ding, Y. Multi-Feature Fusion Recognition and Localization Method for Unmanned Harvesting of Aquatic Vegetables. Agriculture 2024, 14, 971. [Google Scholar] [CrossRef]
  82. Yang, Y.; Xi, M.; Wen, J.; Xiao, S.; Yang, J. Image Classification of Rice Diseases Based on Metric Learning. In Proceedings of the 2024 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Toronto, ON, Canada, 19–21 June 2024; IEEE: New York, NY, USA, 2024; pp. 1–5. [Google Scholar] [CrossRef]
  83. Zhang, D.; Pan, F.; Diao, Q.; Feng, X.; Li, W.; Wang, J. Seeding Crop Detection Framework Using Prototypical Network Method in UAV Images. Agriculture 2021, 12, 26. [Google Scholar] [CrossRef]
  84. Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-Learning with Memory-Augmented Neural Networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016. [Google Scholar]
  85. Elsherbiny, O.; Gao, J.; Ma, M.; Guo, Y.; Tunio, M.H.; Mosha, A.H. Advancing Lettuce Physiological State Recognition in IoT Aeroponic Systems: A Meta-Learning-Driven Data Fusion Approach. Eur. J. Agron. 2024, 161, 127387. [Google Scholar] [CrossRef]
  86. Yang, F.; Sun, J.; Cheng, J.; Fu, L.; Wang, S.; Xu, M. Detection of Starch in Minced Chicken Meat Based on Hyperspectral Imaging Technique and Transfer Learning. J. Food Process Eng. 2023, 46, e14304. [Google Scholar] [CrossRef]
  87. Upreti, K.; Singh, P.; Jain, D.; Pandey, A.K.; Gupta, A.; Singh, H.R.; Srivastava, S.K.; Prasad, J.S. Progressive Loss-Aware Fine-Tuning Stepwise Learning with GAN Augmentation for Rice Plant Disease Detection. Multimed. Tools Appl. 2024, 83, 84565–84588. [Google Scholar] [CrossRef]
  88. Bosilj, P.; Aptoula, E.; Duckett, T.; Cielniak, G. Transfer Learning between Crop Types for Semantic Segmentation of Crops versus Weeds in Precision Agriculture. J. Field Robot. 2020, 37, 7–19. [Google Scholar] [CrossRef]
  89. Ding, Y.; Zeng, R.; Jiang, H.; Guan, X.; Jiang, Q.; Song, Z. Classification of Tea Quality Grades Based on Hyperspectral Imaging Spatial Information and Optimization Models. Food Meas. 2024, 18, 9098–9112. [Google Scholar] [CrossRef]
  90. Guo, J.; Zhang, K.; Adade, S.Y.S.; Lin, J.; Lin, H.; Chen, Q. Tea Grading, Blending, and Matching Based on Computer Vision and Deep Learning. J. Sci. Food Agric. 2025, 105, 3239–3251. [Google Scholar] [CrossRef]
  91. Lu, P.; Zheng, W.; Lv, X.; Xu, J.; Zhang, S.; Li, Y.; Zhangzhong, L. An Extended Method Based on the Geometric Position of Salient Image Features: Solving the Dataset Imbalance Problem in Greenhouse Tomato Growing Scenarios. Agriculture 2024, 14, 1893. [Google Scholar] [CrossRef]
  92. Li, L.-H.; Tanone, R. Improving Robustness Using MixUp and CutMix Augmentation for Corn Leaf Diseases Classification Based on ConvMixer Architecture. J. ICT Res. Appl. 2023, 17, 167–180. [Google Scholar] [CrossRef]
  93. Wu, Y.; Xu, L. Image Generation of Tomato Leaf Disease Identification Based on Adversarial-VAE. Agriculture 2021, 11, 981. [Google Scholar] [CrossRef]
  94. Sener, E.; Colak, E.; Erten, E.; Taskin, G. The Added Value of Cycle-GAN for Agriculture Studies. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 12–16 July 2021; IEEE: New York, NY, USA, 2021; pp. 7039–7042. [Google Scholar] [CrossRef]
  95. Huo, Y.; Liu, Y.; He, P.; Hu, L.; Gao, W.; Gu, L. Identifying Tomato Growth Stages in Protected Agriculture with StyleGAN3–Synthetic Images and Vision Transformer. Agriculture 2025, 15, 120. [Google Scholar] [CrossRef]
  96. Mullins, C.C.; Esau, T.J.; Zaman, Q.U.; Hennessy, P.J. Optimizing Data Collection Requirements for Machine Learning Models in Wild Blueberry Automation through the Application of DALL-E 2. Smart Agric. Technol. 2025, 10, 100764. [Google Scholar] [CrossRef]
  97. Moreno, H.; Gómez, A.; Altares-López, S.; Ribeiro, A.; Andújar, D. Analysis of Stable Diffusion-Derived Fake Weeds Performance for Training Convolutional Neural Networks. Comput. Electron. Agric. 2023, 214, 108324. [Google Scholar] [CrossRef]
  98. Zhang, Y.; Shao, Y.; Tang, C.; Liu, Z.; Li, Z.; Zhai, R.; Peng, H.; Song, P. E-CLIP: An Enhanced CLIP-Based Visual Language Model for Fruit Detection and Recognition. Agriculture 2025, 15, 1173. [Google Scholar] [CrossRef]
  99. Robinson, C.; Ortiz, A.; Malkin, K.; Elias, B.; Peng, A.; Morris, D.; Dilkina, B.; Jojic, N. Human-Machine Collaboration for Fast Land Cover Mapping. AAAI 2020, 34, 2509–2517. [Google Scholar] [CrossRef]
  100. Deepa, R.; Vigneshwari, S. An Effective Automated Ontology Construction Based on the Agriculture Domain. ETRI J. 2022, 44, 573–587. [Google Scholar] [CrossRef]
  101. Liu, Z.; Liu, L.; Xie, Y.; Jin, Z.; Jia, X. Task-Adaptive Meta-Learning Framework for Advancing Spatial Generalizability. AAAI 2023, 37, 14365–14373. [Google Scholar] [CrossRef]
  102. Omole, O.J.; Rosa, R.L.; Saadi, M.; Rodriguez, D.Z. AgriNAS: Neural Architecture Search with Adaptive Convolution and Spatial–Time Augmentation Method for Soybean Diseases. AI 2024, 5, 2945–2966. [Google Scholar] [CrossRef]
  103. Guo, G.; Lai, S.; Wu, Q.; Shou, Y.; Shi, W. Enhancing Domain Adaptation for Plant Diseases Detection through Masked Image Consistency in Multi-Granularity Alignment. Expert Syst. Appl. 2025, 276, 127101. [Google Scholar] [CrossRef]
  104. Tsoumas, I.; Sitokonstantinou, V.; Giannarakis, G.; Lampiri, E.; Athanassiou, C.; Camps-Valls, G.; Kontoes, C.; Athanasiadis, I.N. Leveraging Causality and Explainability in Digital Agriculture. Environ. Data Sci. 2025, 4, e23. [Google Scholar] [CrossRef]
  105. Peng, X.; Ma, Y.; Sun, J.; Chen, D.; Zhen, J.; Zhang, Z.; Hu, X.; Wang, Y. Grape Leaf Moisture Prediction from UAVs Using Multimodal Data Fusion and Machine Learning. Precis. Agric. 2024, 25, 1609–1635. [Google Scholar] [CrossRef]
  106. Ku, K.; Jubery, T.Z.; Rodriguez, E.; Balu, A.; Sarkar, S.; Krishnamurthy, A.; Ganapathysubramanian, B. SC-NeRF: NeRF-Based Point Cloud Reconstruction Using a Stationary Camera for Agricultural Applications. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025. [Google Scholar]
  107. Ragu, N.; Teo, J. Object Detection and Classification Using Few-Shot Learning in Smart Agriculture: A Scoping Mini Review. Front. Sustain. Food Syst. 2023, 6, 1039299. [Google Scholar] [CrossRef]
  108. Wu, Y.; Nagy, A.; Rajnai, Z.; Fregan, B.; Takács-György, K. Quantum Machine Learning in Crop Disease Monitoring: Opportunities and Challenges to Practical Implementation. In Proceedings of the 2025 IEEE 12th International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC), Mahe Island, Beau Vallon, Seychelles, 9–11 April 2025; IEEE: New York, NY, USA, 2025; pp. 59–64. [Google Scholar] [CrossRef]
  109. Yin, H.; Gu, Y.H.; Park, C.-J.; Park, J.-H.; Yoo, S.J. Transfer Learning-Based Search Model for Hot Pepper Diseases and Pests. Agriculture 2020, 10, 439. [Google Scholar] [CrossRef]
  110. Guo, Z.; Cai, D.; Bai, J.; Xu, T.; Yu, F. Intelligent Rice Field Weed Control in Precision Agriculture: From Weed Recognition to Variable Rate Spraying. Agronomy 2024, 14, 1702. [Google Scholar] [CrossRef]
  111. Xu, Z.; Huang, X.; Huang, Y.; Sun, H.; Wan, F. A Real-Time Zanthoxylum Target Detection Method for an Intelligent Picking Robot under a Complex Background, Based on an Improved YOLOv5s Architecture. Sensors 2022, 22, 682. [Google Scholar] [CrossRef]
  112. Lu, A.; Guo, R.; Ma, Q.; Ma, L.; Cao, Y.; Liu, J. Online Sorting of Drilled Lotus Seeds Using Deep Learning. Biosyst. Eng. 2022, 221, 118–137. [Google Scholar] [CrossRef]
  113. Zhang, Z.; Lu, Y.; Zhao, Y.; Pan, Q.; Jin, K.; Xu, G.; Hu, Y. TS-YOLO: An All-Day and Lightweight Tea Canopy Shoots Detection Model. Agronomy 2023, 13, 1411. [Google Scholar] [CrossRef]
  114. Zuo, Z.; Gao, S.; Peng, H.; Xue, Y.; Han, L.; Ma, G.; Mao, H. Lightweight Detection of Broccoli Heads in Complex Field Environments Based on LBDC-YOLO. Agronomy 2024, 14, 2359. [Google Scholar] [CrossRef]
  115. Ji, W.; Gao, X.; Xu, B.; Pan, Y.; Zhang, Z.; Zhao, D. Apple Target Recognition Method in Complex Environment Based on Improved YOLOv4. J. Food Process Eng. 2021, 44, e13866. [Google Scholar] [CrossRef]
  116. Feng, G.; Wang, C.; Wang, A.; Gao, Y.; Zhou, Y.; Huang, S.; Luo, B. Segmentation of Wheat Lodging Areas from UAV Imagery Using an Ultra-Lightweight Network. Agriculture 2024, 14, 244. [Google Scholar] [CrossRef]
  117. Zhang, T.; Zhou, J.; Liu, W.; Yue, R.; Shi, J.; Zhou, C.; Hu, J. SN-CNN: A Lightweight and Accurate Line Extraction Algorithm for Seedling Navigation in Ridge-Planted Vegetables. Agriculture 2024, 14, 1446. [Google Scholar] [CrossRef]
  118. Ji, W.; Zhai, K.; Xu, B.; Wu, J. Green Apple Detection Method Based on Multidimensional Feature Extraction Network Model and Transformer Module. J. Food Prot. 2025, 88, 100397. [Google Scholar] [CrossRef]
  119. Li, A.; Wang, C.; Ji, T.; Wang, Q.; Zhang, T. D3-YOLOv10: Improved YOLOv10-Based Lightweight Tomato Detection Algorithm Under Facility Scenario. Agriculture 2024, 14, 2268. [Google Scholar] [CrossRef]
  120. Ji, W.; Pan, Y.; Xu, B.; Wang, J. A Real-Time Apple Targets Detection Method for Picking Robot Based on ShufflenetV2-YOLOX. Agriculture 2022, 12, 856. [Google Scholar] [CrossRef]
  121. Yu, Z.; Guo, Y.; Zhang, L.; Ding, Y.; Zhang, G.; Zhang, D. Improved Lightweight Zero-Reference Deep Curve Estimation Low-Light Enhancement Algorithm for Night-Time Cow Detection. Agriculture 2024, 14, 1003. [Google Scholar] [CrossRef]
  122. Yuliansyah, H.; Hartanto, R.; Soesanti, I. Model Compression Using Post-Training Dynamic Range Quantization for Plant Disease Recognition. IET Conf. Proc. 2025, 2024, 281–287. [Google Scholar] [CrossRef]
  123. Syaharani, A.B.; Sulandari, W.; Pardede, H.F.; Heryana, A.; Suryawati, E.; Subekti, A. Small CNN for Plant Disease Classification Using Quantization-Aware Training: A Case Study on MobileNetV3. In Proceedings of the 2023 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET), Bandung, Indonesia, 15–16 November 2023; IEEE: New York, NY, USA, 2023; pp. 129–134. [Google Scholar] [CrossRef]
  124. Chelliah, B.J.; Sofi, S.J.; SreeKrishna, K.S.S.; Vishaal, S.; Senthilselvi, A.; Senthil Pandi, S. Mobile-Friendly Plant Disease Detection and Classification Using Metadata and Asymmetric Knowledge Distillation. In Proceedings of the 2024 2nd International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT), Faridabad, India, 28–29 November 2024; IEEE: New York, NY, USA, 2024; pp. 767–771. [Google Scholar] [CrossRef]
  125. Bansal, S.; Singh, M.; Barda, S.; Goel, N.; Saini, M. PA-RDFKNet: Unifying Plant Age Estimation through RGB-Depth Fusion and Knowledge Distillation. IEEE Trans. AgriFood Elect. 2024, 2, 226–235. [Google Scholar] [CrossRef]
  126. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks through Network Slimming. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 2755–2763. [Google Scholar] [CrossRef]
  127. He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks. arXiv 2018, arXiv:1808.06866. [Google Scholar] [CrossRef]
  128. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 2704–2713. [Google Scholar] [CrossRef]
  129. Nagel, M.; Baalen, M.V.; Blankevoort, T.; Welling, M. Data-Free Quantization Through Weight Equalization and Bias Correction. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 1325–1334. [Google Scholar] [CrossRef]
  130. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
  131. Padeiro, C.V.; Chen, T.-W.; Komamizu, T.; Ide, I. Lightweight Maize Disease Detection through Post-Training Quantization with Similarity Preservation. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; IEEE: New York, NY, USA, 2024; pp. 2111–2120. [Google Scholar] [CrossRef]
  132. Wang, S.; Miao, Z.; Cao, Y. A Lightweight Leaf Disease Recognition Model for MobileNetV2 Based on Transfer Learning and Knowledge Distillation. In Proceedings of the 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), Guangzhou, China, 10–12 May 2024; IEEE: New York, NY, USA, 2024; pp. 459–462. [Google Scholar] [CrossRef]
Figure 1. Applications of agricultural image processing.
Figure 2. General workflow of agricultural image analysis.
Figure 3. Keyword co-occurrence network of agricultural image processing literature.
Figure 4. YOLO vs. Transformer architecture.
Figure 5. Encoder and decoder modules of DeepLabv3++.
Figure 6. Technical architecture for edge deployment of agricultural models.
Table 1. Comparison of common conventional segmentation methods.

| Algorithm | Advantages | Disadvantages | Applicable Range | References |
|---|---|---|---|---|
| Threshold segmentation | Small computational load; compresses data to the greatest extent; relatively stable performance | Poor segmentation of complex images; high time consumption | Large gray-level difference between target and background; simple background | [23,24,25,26,27] |
| Edge segmentation | Clear advantage in detecting abrupt changes in gray level or structure | Unsuitable for complex images | Distinct edges; clear target structure | [28,29,30,31,32,33] |
| Clustering segmentation | Running time scales linearly with the number of pixels; low complexity | May produce "isolated" pixels that belong to no cluster center | Multi-region, color-complex, unlabeled data | [34,35,36] |
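To make the contrasts in Table 1 concrete, the following minimal Python/OpenCV sketch applies the three conventional families to a single crop image. The file name, Canny thresholds, and cluster count are illustrative assumptions, not settings drawn from the cited studies.

```python
import cv2
import numpy as np

# Illustrative input; any crop/leaf image path will do.
img = cv2.imread("leaf.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold segmentation: Otsu picks a global threshold, effective when
# target and background differ strongly in gray level.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Edge segmentation: Canny responds to abrupt gray-level changes, so it
# presupposes distinct edges and a clear target structure.
edges = cv2.Canny(gray, 100, 200)

# Clustering segmentation: k-means over pixel colors; runtime grows
# linearly with pixel count, but isolated pixels may be misassigned.
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)
clustered = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
```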
Table 2. Comparison of evaluation metrics for different YOLO-based models.

| Model | Precision | Recall | mAP0.5 | MIoU | Parameters (M) | FPS | Inference Time (s) | GFLOPs | F1 | References |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLO-RepNCSPELAN4-DCNv4 | 96.30% | 93.10% | 96.20% | – | 3.259 | 138.9 | – | 901 | – | [37] |
| YOLOv8n-seg-FasterNet | 99.80% | 98.50% | 99.50% | – | 1.369 | – | 0.0405 | 96 | – | [39] |
| YOLOv8n-seg-CBAM-WIoU | 96.50% | 94.30% | 98.00% | – | – | – | 0.0259 | – | – | [40] |
| YOLOv8n-Squeeze-and-Excitation-EIoU | 94.70% | 90.70% | 87.30% | – | – | – | 0.0627 | – | – | [41] |
| SOLOV2-Prim | – | 90.10% | 83.20% | – | 44.290 | 29.5 | – | 147 | 88.50% | [42] |
| GCE-YOLOv9-seg | 90.40% | 89.60% | 93.40% | – | 27.950 | – | – | 162 | 90.50% | [43] |
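The derived quantities in Table 2 follow directly from the reported ones. The short sketch below, using round numbers in the table's ranges rather than values endorsed by the cited papers, shows how F1 is the harmonic mean of precision and recall and how a single-image inference time converts to FPS.

```python
def f1_score(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def fps(inference_time_s: float) -> float:
    # Throughput implied by a single-image inference time.
    return 1.0 / inference_time_s

print(f"{f1_score(0.904, 0.896):.3f}")  # -> 0.900
print(f"{fps(0.0259):.1f}")             # -> 38.6
```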
Table 3. Comparison of advantages and disadvantages of different methodological techniques.

| Method Type | Representative Techniques | Advantages | Disadvantages | Applicable Scenarios |
|---|---|---|---|---|
| Traditional manual annotation | Pixel-by-pixel annotation; rectangular box annotation | High annotation accuracy; suitable for fine-grained tasks | Time-consuming, labor-intensive, costly, and expert-dependent | Small samples; high-value data (e.g., early disease detection) |
| Semi-automatic annotation | Threshold segmentation; edge detection | More efficient than manual annotation; lower annotation cost | Accuracy limited by image complexity; requires manual correction | Medium-scale data; scenes with a simple background |
| Weakly supervised learning | Grad-CAM; WSLSS framework | Requires only image-level labels, cutting annotation costs by up to 90% | Localization accuracy slightly below full supervision | Preliminary screening of large-scale data; identification of rare diseases |
| Semi-supervised learning | Pseudo-labels; consistency regularization | Combines a small amount of labeled data with a large amount of unlabeled data | Pseudo-labels may introduce noise | Scenarios with limited labeling resources where accuracy must still be guaranteed |
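As a concrete illustration of the pseudo-label row in Table 3, the sketch below shows one confidence-filtered labeling pass in PyTorch. The model, data loader, and 0.95 threshold are assumptions for exposition, not a procedure prescribed by the reviewed works.

```python
import torch
import torch.nn.functional as F

def generate_pseudo_labels(model, unlabeled_loader, threshold=0.95, device="cpu"):
    """Label unlabeled images with a model trained on a small labeled set,
    keeping only confident predictions to limit label noise."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for images in unlabeled_loader:
            images = images.to(device)
            probs = F.softmax(model(images), dim=1)
            conf, labels = probs.max(dim=1)
            keep = conf >= threshold  # drop low-confidence predictions
            for img, lbl in zip(images[keep], labels[keep]):
                pseudo.append((img.cpu(), int(lbl)))
    # The pseudo-labeled pairs are merged with the labeled set for the
    # next training round.
    return pseudo
```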
Table 4. Related technical solutions.

| Specific Methods | Cases |
|---|---|
| Replacement with lightweight backbone networks | [113,114,115,116] |
| Efficient convolution and lightweight feature extraction | [113,117,118,119,120,121] |
| Network pruning | [114,122] |
| Quantization technology | [120,122,123] |
| Knowledge distillation technology | [119,124,125] |
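Two of the compression routes in Table 4 can be sketched in a few lines of PyTorch: post-training dynamic range quantization (the route studied in [122]) and the softened-logit distillation loss of Hinton et al. [130]. The toy backbone, temperature, and weighting below are illustrative assumptions rather than settings from the cited works.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for an agricultural detection backbone.
model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224 * 3, 256),
                      nn.ReLU(), nn.Linear(256, 10))

# Post-training dynamic range quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Knowledge distillation: the student mimics the teacher's softened
    outputs while still fitting the hard labels; T and alpha are assumed."""
    soft = nn.functional.kl_div(
        nn.functional.log_softmax(student_logits / T, dim=1),
        nn.functional.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * (T * T)
    hard = nn.functional.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```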
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
