Article

Deep Learning-Based Seedling Row Detection and Localization Using High-Resolution UAV Imagery for Rice Transplanter Operation Quality Evaluation

1 College of Engineering, South China Agricultural University, Guangzhou 510642, China
2 College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(4), 607; https://doi.org/10.3390/rs17040607
Submission received: 14 December 2024 / Revised: 2 February 2025 / Accepted: 8 February 2025 / Published: 11 February 2025
(This article belongs to the Section AI Remote Sensing)

Abstract

Accurately and precisely obtaining field crop information is crucial for evaluating the effectiveness of rice transplanter operations. However, the working environment of rice transplanters in paddy fields is complex, and data obtained solely from GPS devices installed on agricultural machinery cannot directly reflect the specific information of seedlings, making it difficult to accurately evaluate the quality of rice transplanter operations. This study proposes a CAD-UNet model for detecting rice seedling rows based on low-altitude orthorectified remote sensing images, and uses evaluation indicators such as the straightness and parallelism of seedling rows to evaluate the operation quality of the rice transplanter. We introduced the convolutional block attention module (CBAM) and attention gate (AG) modules into the original UNet network, which can merge multiple feature maps or information flows together, helping the model better select key areas or features of seedling rows in the image, thereby improving its understanding of image content and task execution performance. In addition, in response to the dense distribution and diverse shapes of seedling rows, this study integrates deformable convolutional network version 2 (DCNv2) into the UNet network, replacing the original standard square convolution, making the sampling receptive field closer to the shape of the seedling rows and more suitable for capturing seedling row features of various shapes and scales, further improving the performance and generalization ability of the model. Different semantic segmentation models were trained, tested, and compared using low-altitude high-resolution UAV images. The experimental results indicate that CAD-UNet provides excellent results, with precision, recall, and F1-score reaching 91.14%, 87.96%, and 89.52%, respectively, all of which are superior to other models. The evaluation results of the rice transplanter’s operation effectiveness show that the minimum and maximum straightness of each seedling row are 4.62 and 13.66 cm, respectively, and the minimum and maximum parallelism between adjacent seedling rows are 5.16 and 23.34 cm, respectively. These indicators directly reflect the distribution of rice seedlings in the field, proving that the proposed method can quantitatively evaluate the field operation quality of the transplanter. The method proposed in this study can be applied to decision-making models for farmland crop management, helping to improve the efficiency and sustainability of agricultural operations.

1. Introduction

Accurately assessing the quality of agricultural machinery operations is of great significance for precision agriculture. In actual production, due to the complexity and unpredictability of the field operating environment, along with differences in the driving skills of agricultural machinery operators, the results of agricultural machinery operations often fail to meet the operating specifications. Accurately assessing the quality of agricultural machinery operations contributes to agricultural planning decisions and appropriate field management, thereby improving crop yield and quality and reducing production costs [1]. Especially for rice seedlings planted in rows, accurate and precise acquisition of field information and evaluation of transplanter operation effectiveness are important prerequisites for achieving refined farmland management [2,3].
The operational effectiveness of agricultural machinery can be reflected through its trajectory in the field, which is usually tracked and recorded using global navigation satellite system (GNSS) technology. When agricultural machinery performs autonomous transplanting and sowing operations, researchers obtain experimental observations according to the characteristics of the operation [4,5] and use statistical values such as the mean, standard deviation, and root mean square value to evaluate the operation quality of agricultural machinery [6,7,8]. Ma et al. [9] conducted a crawler tractor navigation test to evaluate the performance of a combined navigation system. They measured the lateral deviation between the tractor position and the preset line AB at different operating speeds to evaluate the tracking effect of the agricultural machinery navigation system in the case of rice seedling shortage. However, due to environmental factors, the lateral deviation of the GPS trajectory cannot accurately reflect the actual operation accuracy. Therefore, some scholars have tried to use machine vision to directly obtain the position of crops and use this information to evaluate the actual effect of agricultural machinery operations [10,11]. Compared with GPS technology, machine vision technology has the advantages of rich information, non-contact measurement, good real-time performance, and high cost-effectiveness [12,13,14]. However, this method has a limited observation range, and it is a challenging task to detect rows and create location-specific management practice maps using only field images [15,16]. In addition, satellite or manned airborne imaging has some limitations in row detection, including low spatial resolution, high operating costs, and long wait times for product delivery [17]. Compared with the above methods, drone imaging systems have shown great potential in crop row detection and field-scale precision agriculture applications [18,19].
Some researchers initially used high-resolution remote sensing images to identify and extract crop information, and applied digital image processing technology to explore the planting and growth conditions of crops [20,21,22]. They used methods such as the Hough transform and linear fitting to detect crop rows in the field [23,24,25,26,27,28]. However, these traditional machine vision-based methods (such as threshold segmentation, edge detection, etc.) exposed the problems of large workload and low recognition accuracy in early attempts, especially in complex scenarios such as fields [29,30]. With the continuous improvement of deep learning techniques in the field of target detection, the detection of crop rows using semantic segmentation networks in deep learning shows great potential. Convolutional neural networks (CNNs) can efficiently learn multidimensional features from a large number of images through shared weights and locally aware convolutional operations, which makes them an ideal solution for extracting spatial information and rich spectral features from remote sensing images [31,32,33,34,35]. Bah et al. [36] proposed a new method called CRowNet, which combines the use of CNNs and Hough transforms to detect crop rows in UAV images, and the results show that the method has good robustness. Zhang et al. [37] based their research on an improved lightweight neural network (LW-UNet) to establish a rice seedling row segmentation model and realized high-precision rice seedling segmentation using multispectral UAV images.
Since 2020, the vision transformer (ViT) network has attracted great attention in the field of computer vision. Unlike CNNs, ViT networks have a larger perceptual range and can obtain richer global information [38]. ViT-based semantic segmentation algorithms such as SETR [39], Segformer [40], and Swin transformer [41] demonstrated efficient image feature extraction and classification capabilities and were applied to agricultural crop detection tasks [42,43]. Zhang et al. [44] combined remote sensing monitoring methods with deep learning techniques and used six different deep learning models to detect winter wheat in Landsat images, and finally demonstrated the superiority of the SegFormer model through comparative experiments. However, when applying transformer networks to crop row detection in agricultural fields, the results are not always satisfactory, mainly because factors such as edge details, color, texture, and shape of crops can significantly affect the effectiveness of semantic segmentation. Transformer-based methods segment high-resolution farmland images into small chunks, which can lead to incomplete structure of crop edges [45]. In addition, since transformers themselves do not have a local-aware mechanism like CNNs, when an image is compressed into one-dimensional features and input into the encoder as a sequence, the spatial structure information and local detail information of the image may be lost, which affects the restoration of the details at the decoder stage, leading to a degradation of the segmentation performance [46].
With the increasing complexity of application scenarios and the surge in data volume, researchers are constantly exploring new methods to further improve the performance of models. They try to effectively extract valuable information by introducing attention mechanisms, adding modules such as CBAM [47,48], SENet [49], GSoP [50], and the attention gate (AG) [51] to CNNs. These attention modules can be used together to help the model focus more on the important parts of the input data, thereby improving the accuracy of tasks such as detection and classification. Zhang et al. [52] integrated ResNet-50 and CBAM attention mechanism modules into the U-Net network, enabling the network to adapt to the various growth stages of corn crops through accurate segmentation even under complex field conditions. Yan et al. [45] proposed a neural network that combines the CBAM mechanism, transformer, and dilated convolution to extract rice fields in three growth stages in southern China. The experimental results show that their proposed method (ETUnet) can accurately extract rice fields in the transplanting, tillering, and maturity stages. For the semantic segmentation of field ridges in paddy field environments, Wu et al. [53] proposed a network structure (AM-UNet) that combines the AG module and the atrous spatial pyramid pooling (ASPP) module based on the MultiResUNet model, and verified the feasibility and high accuracy of the method.
Although the above studies have achieved good performance, there are still challenges in identifying seedling rows in complex paddy field environments; in particular, when excessive standing water results in missing seedlings or when the shapes of seedling rows are diverse, the recognition accuracy is low. To solve these problems, this study introduced CBAM and AG modules on the basis of the original UNet network, which can merge multiple feature maps or information streams together, so that the model can more accurately select the key areas or features of seedling rows in the image, thereby improving the understanding of image content and task execution. In addition, in view of the dense distribution and diverse shapes of seedling rows, this study replaces the standard square convolution in the original network with DCNv2, so that the sampling receptive field is closer to the shape of the seedling row, which is more suitable for capturing the characteristics of seedling rows of various shapes and scales, further improving the performance and generalization ability of the model.
The main purpose of this study is to propose a method for evaluating the field operation quality of rice transplanters based on the deep learning of high-resolution remote sensing images. The core of this method is to use the CAD-UNet model to accurately extract the rows of seedlings from remote sensing images, and then calculate the straightness, parallelism, and standard deviation of the rows of seedlings through mathematical statistics, and finally evaluate the operation quality of the rice transplanter. The research flow is shown in Figure 1. The main contributions of this study can be summarized as follows:
(1)
Constructed a deep learning model (CAD-UNet) for rice seedling segmentation based on high-resolution UAV images.
(2)
Further mined the information contained in high-resolution UAV RGB images, generated high-resolution distribution maps of seedlings in the field, and visualized the operation effectiveness of the transplanter.
(3)
Established a quality evaluation model for rice transplanter operation.

2. Materials and Methods

2.1. Study Area

The study area of this experiment is located in the teaching and research base of South China Agricultural University, Ningxi Town, Zengcheng District, Guangzhou City (23°14′22.02″N, 113°37′56.62″E), which belongs to the South China double-cropping rice cultivation area. Due to natural conditions such as terrain and climate, rice seedlings are often planted in small paddy fields in southern China. The experiment was conducted during the local transplanting season, and the UAV collected field images of the transplanter after normal operation, as shown in Figure 2.

2.2. Remote Sensing Image Collection and Preprocessing

2.2.1. Remote Sensing Image Acquisition and Stitching

This study used a DJI Phantom 4 RTK (manufactured by DJI, Shenzhen, China), which has a mass of 1391 g and can fly continuously for 30 min, to capture visible-light remote sensing data. The relevant performance parameters and flight parameters during data collection are shown in Table 1. The DJI Phantom 4 RTK can achieve more precise positioning and trajectory control by integrating an RTK (real-time kinematic) module, connecting to the D-RTK 2 high-precision GNSS mobile station, and using NTRIP (networked transport of RTCM via internet protocol) services. It is equipped with a mechanical shutter that supports high-speed flight shooting, which means it can obtain clear images even during flight, effectively avoiding the impact of the rolling-shutter (jello) effect on image quality. During the experiment, we used A mode (aperture priority mode) while increasing the shutter speed of the UAV to ensure that the image data were sufficiently clear. In addition, the Phantom 4 RTK is equipped with a three-axis stabilized gimbal camera, which can stably capture high-resolution images and videos.
We used the Pix4Dcapture ground station to plan the flight path, and the UAV obtained orthorectified remote sensing images of the field along the planned path, covering the entire research area, as shown in Figure 3. The flight height for remote sensing image acquisition was set to 10 m, with forward and side overlaps of 80%. In the end, the UAV flew about 480 m, taking 15 min and 33 s, and obtained 210 images in *.TIF format.
After collection, all data were uploaded to the graphics workstation. Figure 3b describes the entire data processing workflow, in which the professional software Pix4Dmapper 4.8.2 was used to process the data collected by the UAV and generate orthophoto panoramas and digital terrain models of the farmland. In addition, in order to perform geometric correction on the obtained UAV images and further improve their spatial accuracy, we used RTK to arrange 11 ground control points in the experimental field. The coverage area of the orthophoto image (as shown in Figure 3c) was approximately 1706.927 m2, with a ground resolution of 0.27 cm/pixel. The straight-line operation area of the rice transplanter was taken as the testing area (as shown in Figure 3d), with an area of approximately 789.133 m2.

2.2.2. Construction of Training Sample Sets

Firstly, in order to preserve the geographic coordinate information of the images and meet the accuracy requirements for extracting rice seedling rows, sliding cropping was used to crop the original image into JPG images with a pixel size of 512 × 512, and 858 image samples with obvious features were selected from them. Then, using the professional annotation software Labelme 3.18, the remote sensing images were semantically annotated in the format of the Pascal VOC2007 dataset to create the seedling row dataset. It should be noted that, during the image annotation process, seedlings may be submerged by water, which may lead to model recognition errors and reduce the overall accuracy of seedling detection. Therefore, the authors analyzed the definition of the seedling row category and added images with standing water as training data, so that the model could better learn the relationship between seedling rows and standing water, thereby improving the accuracy of seedling row segmentation. The aim of this study was to extract seedling rows in paddy fields, so the labels were divided into two categories: “row” and “background”, where “row” represented seedling rows and “background” represented the paddy field background. Each image had a corresponding label file, and the dataset was divided into training and testing sets at a ratio of 8:2. Additionally, to enhance the robustness and generalization ability of the model, data augmentation techniques such as random cropping, random rotation and scaling, adding random noise, and adjusting brightness and color were applied to the training dataset, as sketched below. After manually filtering the data samples, images with low resolution and those not containing the target were removed, resulting in a total of 1200 images in the training set.
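To make the preprocessing concrete, the following is a minimal Python sketch (not the authors' exact pipeline) of sliding-window cropping into 512 × 512 tiles and of the kinds of augmentation listed above; the file paths, stride, and augmentation magnitudes are illustrative assumptions, and in practice the label mask must receive the same geometric transforms as the image.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image, ImageEnhance

TILE = 512  # tile size used for the training samples (Section 2.2.2)

def slide_crop(orthophoto_path, out_dir, tile=TILE, stride=TILE):
    """Crop a large orthophoto into fixed-size tiles with a sliding window."""
    img = Image.open(orthophoto_path)
    w, h = img.size
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for top in range(0, h - tile + 1, stride):
        for left in range(0, w - tile + 1, stride):
            crop = img.crop((left, top, left + tile, top + tile))
            crop.convert("RGB").save(out_dir / f"tile_{top}_{left}.jpg")

def augment(img):
    """Random rotation/flip, brightness jitter, and additive noise for a training image.
    The corresponding label mask must receive the same geometric transforms."""
    if random.random() < 0.5:
        img = img.rotate(random.choice([90, 180, 270]))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    arr = np.asarray(img).astype(np.float32)
    arr = arr + np.random.normal(0, 5, arr.shape)  # mild Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```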

2.3. Construction of Seedling Row Extraction Model

2.3.1. CAD-UNet

This study aimed to construct a pixel-level classification model for remote sensing images that could not only determine the category of each pixel in the remote sensing image but also determine the position of the pixels. The background information of the seedling rows in farmland images is complex, and the local shapes of the rows may be greatly different. If the existing deep learning model is directly used to extract the seedling rows, its accuracy is difficult to meet the requirements. To solve these problems, we made improvements to the original UNet. Specifically, we simultaneously introduced the CBAM and AG modules into the skip connections of the original UNet and replaced standard square convolution kernels with DCNv2. We named the improved model CAD-UNet, and the specific structure is shown in Figure 4.
In CAD-UNet, both the CBAM and AG modules are introduced into the skip connections, playing a crucial role in better understanding and fully utilizing image features, thereby significantly improving the segmentation performance of the model. Furthermore, unlike standard square convolutions, CAD-UNet adopts DCNv2, which allows for dynamic adjustment of the convolution kernel shape to better adapt to specific feature structures, enhancing the model’s perceptual ability and performance. These improvements optimize the model’s detection performance from different angles, making it more suitable for extracting seedling rows with complex features and shapes in farmland images.
(1)
Convolutional Block Attention Module (CBAM)
In order to help the model better focus on the key features of the seedling rows in the image, this study introduced the CBAM module [45,54] into the model. CBAM is considered an “attention enhancer” that can be easily plugged into existing network architectures and trained end-to-end. It combines the channel attention module (CAM) and the spatial attention module (SAM), which enable the model to pay more attention to important objects in the image. Therefore, this study attempted to introduce the CBAM module into the skip connections of UNet, which allows the model to allocate different weights on different channels and spatial positions, thereby obtaining richer feature maps and enhancing the performance of the model in seedling row detection. The specific principles are as follows:
(1) CAM is mainly used to extract channel features. It applies global maximum pooling (GMP) and global average pooling (GAP) operations over the height and width dimensions of the input F (H × W × C), and passes the results through a two-layer shared multi-layer perceptron (MLP). Next, element-wise summation is performed on the MLP output features, and the final channel attention feature $M_c$ is generated after a sigmoid activation. Finally, element-wise multiplication is performed between $M_c$ and the input feature map F to generate the input feature F required by SAM. The expression of CAM is given in Equation (1).
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0 F_{avg}^{c}) + W_1(W_0 F_{max}^{c})\big) \quad (1)$$
where $\sigma$ represents the sigmoid operation, and $W_0$ and $W_1$ are the weights of the MLP. $F_{avg}^{c}$ and $F_{max}^{c}$ represent the average-pooled feature and the max-pooled feature, respectively.
(2) SAM is mainly used to extract spatial features. Firstly, channel-wise GMP and GAP operations are performed on the input F to obtain two H × W × 1 feature maps, which are spliced together through the “Concat” operation. Next, after a 7 × 7 convolution operation, an H × W × 1 feature map is obtained, and the spatial attention feature $M_s$ is generated after a sigmoid activation. It is worth noting that this convolution operation can retain both spatial information and channel information, better capturing the location and spatial relationships of the target in the image. Finally, $M_s$ is multiplied with the input F of SAM to obtain the final feature map (H × W × C). The expression for SAM is given in Equation (2), and the entire CBAM module is shown in Figure 5.
$$M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7 \times 7}([F_{avg}^{s}; F_{max}^{s}])\big) \quad (2)$$
where $\sigma$ is the sigmoid operation and $f^{7 \times 7}$ represents a convolution operation with a kernel size of 7 × 7. $F_{avg}^{s}$ and $F_{max}^{s}$ represent the average-pooled feature and the max-pooled feature, respectively.
The CBAM module can be applied as plug and play in the UNet architecture, and model parameters can be optimized through end-to-end training, which helps the model better learn the correlation between channels and the correlation between feature space positions, thereby improving the performance of the model in seedling row detection tasks.
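A minimal PyTorch sketch of a CBAM block as described above (channel attention followed by spatial attention, Equations (1) and (2)) is given below; the channel reduction ratio of 16 is a common default from the CBAM literature and is not specified in this article.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: shared two-layer MLP applied to GAP and GMP descriptors (Equation (1))."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # GAP branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # GMP branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """SAM: 7x7 convolution over the concatenated channel-wise avg/max maps (Equation (2))."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied to a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```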
(2)
Attention Gate (AG)
The main function of the AG module [49,55] is to improve the feature fusion and information transmission capabilities of the UNet by increasing the resolution of attention weights, especially when dealing with multiple imaging scales. Therefore, this study also embedded the AG module into the skip connection of the UNet. AG consists of convolutional layers with a kernel size of 1 × 1, nonlinear ReLU layers, and sigmoid layers. The AG structure is shown in Figure 6.
The AG module in the UNet architecture is a simple subnetwork in the encoder–decoder path with two inputs: one is the feature map $X^l$ transmitted through the l-th layer skip connection, and the other is the feature map $G$ obtained by upsampling the coarse features output by the previous layer in the decoder, where $X^l$ and $G$ have the same size. The workflow of AG is divided into three parts: (1) $X^l$ and $G$ are processed in parallel, that is, $G$ passes through $W_g$ to obtain A, and $X^l$ passes through $W_x$ to obtain B; A and B are then summed element-wise to obtain C. (2) C sequentially passes through a nonlinear ReLU operation and a $\psi$ (1 × 1 convolution) operation to obtain the feature map $q_{att}^{l}(X^l, G)$, from which the attention coefficient $\alpha$ ($\alpha \in [0, 1]$) is obtained. (3) The input feature $X^l$ is multiplied by the attention coefficient $\alpha$ to obtain a weighted feature map $\hat{X}^l$, which highlights the salient areas in the image and helps achieve accurate segmentation. The relevant formulas are as follows:
$$q_{att}^{l}(X^l, G) = \psi^{T}\big(\sigma_1(W_x^{T} X^l + W_g^{T} G + b_g)\big) + b_\psi \quad (3)$$
$$\alpha^{l} = \sigma_2\big(q_{att}^{l}(X^l, G)\big) \quad (4)$$
$$\hat{X}^{l} = X^{l} \cdot \alpha^{l} \quad (5)$$
In the formulas, the linear attention coefficient $q_{att}^{l}$ is calculated through element-wise summation and 1 × 1 linear transformations and is jointly determined by the parameters $W_x$, $W_g$, $b_g$, and $b_\psi$, where $b_g, b_\psi \in \mathbb{R}$ are bias terms, $\psi^{T}$ is a 1 × 1 convolution operation, $\sigma_1$ is the ReLU activation function, and $\sigma_2$ is the sigmoid activation function.
Like the CBAM module, the AG module can also be easily integrated into the model. The simultaneous use of CBAM and AG modules in UNet’s skip connections helps the model utilize higher level features, focusing on seedling rows in the image while ignoring cluttered background information, thereby enhancing the robustness of the model in segmentation tasks.
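The following PyTorch sketch mirrors the AG workflow above (Equations (3)–(5)); the channel counts of the 1 × 1 convolutions are placeholders, since the article does not state them.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Attention gate on a UNet skip connection (Equations (3)-(5))."""
    def __init__(self, x_channels, g_channels, inter_channels):
        super().__init__()
        self.w_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1)  # W_x
        self.w_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)  # W_g (bias acts as b_g)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)           # psi (bias acts as b_psi)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x_l, g):
        # x_l: skip-connection features; g: upsampled decoder features of the same spatial size
        q_att = self.psi(self.relu(self.w_x(x_l) + self.w_g(g)))  # Equation (3)
        alpha = torch.sigmoid(q_att)                               # Equation (4)
        return x_l * alpha                                         # Equation (5)
```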
(3)
Deformable Convolutional Networks (DCNs)
At different image positions, targets usually exhibit different scales or have certain deformations, which requires adaptive adjustment of the receptive field to adapt to target changes. At the same time, the adaptive receptive field will bring greater possibilities for precise positioning. Traditional convolution is limited by a fixed geometric structure, and the receptive field of its activation unit is fixed. Therefore, the original network has certain limitations in modeling targets that undergo geometric deformation, and the network’s generalization ability is poor.
On the basis of standard square convolution, DCNv1 adds a two-dimensional position offset to each sampling point in the convolution kernel, which can adjust the receptive field according to the position offsets during sampling, move closer to the target area of interest, and achieve sampling within the target range [56]. Although DCNv1 can cover the entire target, this coverage is not precise enough and can introduce considerable background interference. DCNv2 [57] incorporates a modulation mechanism that enables the deformable convolution module to modulate the amplitude of the offset variables and features at each sampling point, enabling the network to adapt to different scales, shapes, directions, etc. Therefore, this study uses DCNv2 in the UNet network to not only change the spatial distribution of the sampling points but also control the relative influence of the offset variables. This enhancement improves the model’s ability to represent seedling rows with changing directions and different widths, making the model more suitable for the seedling row segmentation task in this article. Figure 7 is an illustration of the receptive fields in DCN and standard convolution.
Assuming that the input feature map is X and the sampling grid (receptive field) of the standard square convolution is H, each pixel q in the output feature map Y can be formalized as:
$$Y(q) = \sum_{k=1}^{K} w_k \cdot X(q + q_k) \quad (6)$$
where $q_k$ represents the k-th sampling position in $H$, and $w_k$ represents the weight of the sampled value at that position.
In deformable convolution, the sampling grid $H$ is augmented with two-dimensional position offsets $\{\Delta q_n \mid n = 1, \ldots, N\}$ to bring the sampling closer to the target area, where $N = |H|$. Formula (6) then becomes:
$$Y(q) = \sum_{k=1}^{K} w_k \cdot X(q + q_k + \Delta q_k) \cdot \Delta m_k \quad (7)$$
where $\Delta q_k$ and $\Delta m_k$ represent the position offset and the modulation scalar of the convolution kernel at the k-th sampling point, respectively. Both $\Delta q_k$ and $\Delta m_k$ are obtained through a separate convolutional layer with the same spatial resolution and dilation rate as the current convolutional layer. Figure 8 shows a comparison of the sampling operation modes of DCNv1, DCNv2, and standard square convolution.
During DCNv2 sampling, the offset sampling point index value is obtained by the sum of the pixel index value of the input feature map X and the offset vector V. After the offset sampling points are weighted, the output feature map of DCNv2 is finally obtained, where V is the offset in the x and y direction of each sampling point pixel in the input feature image, calculated through an independent convolution layer. Through this mechanism, DCNv2 can adapt to feature information at different locations, allowing the model to better perceive the geometric changes of the seedling rows, especially when dealing with irregular features.
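As an illustration, the block below sketches how a DCNv2-style layer can replace a standard 3 × 3 convolution using torchvision's DeformConv2d, with offsets and modulation scalars predicted by a parallel convolution as in Equation (7); this is an assumed implementation, not the authors' code, and passing the modulation mask requires torchvision ≥ 0.9.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ModulatedDeformBlock(nn.Module):
    """DCNv2-style convolution: offsets and modulation scalars are predicted by a
    parallel convolution with the same spatial resolution (Equation (7))."""
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.k = k
        # 2*k*k offset channels (x/y per sampling point) + k*k modulation channels
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, kernel_size=k, padding=padding)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=padding)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = torch.split(om, [2 * self.k * self.k, self.k * self.k], dim=1)
        mask = torch.sigmoid(mask)            # modulation scalars Δm_k in [0, 1]
        return self.deform(x, offset, mask)   # mask argument requires torchvision >= 0.9
```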

2.3.2. Model Training Parameters and Performance Evaluation

To ensure fairness and comparability in evaluating the segmentation models, all models in this study were trained and tested in the same environment. Specifically, the CPU and GPU used were an Intel® Core™ i9-11900K and a GeForce RTX 3060Ti 8G, respectively, and the software environment was Python 3.8 with CUDA 10.2. The resolution of the input images was set to 512 × 512 pixels, the initial learning rate of the model was $1 \times 10^{-4}$, the learning rate momentum was 0.9, the batch size was set to 4, the number of training iterations was 200, and the Adam optimizer was used to optimize the network.
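For illustration, a minimal PyTorch training-loop sketch with the reported hyperparameters is shown below; the model, dataset, and loss function are placeholders (the article does not state the loss, so a two-class cross-entropy is assumed here, and the 0.9 momentum is mapped to Adam's beta1).

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader

def train(model, train_set, epochs=200, batch_size=4, lr=1e-4):
    """Training loop with the hyperparameters reported above. The article does not
    state the loss function; a two-class cross-entropy is assumed here."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))  # beta1 = 0.9 ("momentum")
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    for _ in range(epochs):
        model.train()
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
    return model
```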
In order to visually represent the accuracy and reliability of the segmentation results, we used five quantitative evaluation indicators: precision (P), recall (R), intersection over union (IoU), F1-score (F1), and overall accuracy (OA) to evaluate the improved model. Among them, F1 is an evaluation index that combines the precision and recall of the model, representing their harmonic mean. OA refers to the percentage of correctly predicted pixels out of the total pixels, which reflects the overall segmentation result for the seedling rows. IoU represents the degree of overlap between the predicted region and the ground-truth region, that is, the ratio of their intersection to their union. These metrics are often used as important indicators in segmentation tasks to quantitatively evaluate the segmentation performance of the model for each category. The relevant formulas are as follows:
$$P = \frac{TP}{TP + FP} \quad (8)$$
$$R = \frac{TP}{TP + FN} \quad (9)$$
$$F1 = \frac{2 \times (P \cdot R)}{P + R} \quad (10)$$
$$OA = \frac{TP + TN}{TP + TN + FP + FN} \quad (11)$$
$$IoU = \frac{TP}{TP + FP + FN} \quad (12)$$
where TP (true positive) represents the number of samples correctly predicted as positive, TN (true negative) represents the number of samples correctly predicted as negative, FP (false positive) represents the number of samples incorrectly predicted as positive, and FN (false negative) represents the number of samples incorrectly predicted as negative.
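The following sketch computes these metrics for the "row" class from a pair of binary masks (Equations (8)–(12)); it is an illustrative implementation, not the authors' evaluation script.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """P, R, F1, OA, and IoU for the 'row' class (Equations (8)-(12)).
    pred and gt are binary (0/1) numpy masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return {
        "P": p,
        "R": r,
        "F1": 2 * p * r / (p + r),
        "OA": (tp + tn) / (tp + tn + fp + fn),
        "IoU": tp / (tp + fp + fn),
    }
```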

2.4. Rice Transplanter Operation Effectiveness Evaluation

Normally, rice transplanters work in rows back and forth in the field, and the operation trajectory is required to be as straight and parallel as possible. However, due to practical factors such as the complex field operation environment, the operation quality of the rice transplanter cannot fully meet the operation requirements. Therefore, according to the characteristics of the rice transplanting operation, this study selected straightness and parallelism as the operation quality evaluation indicators of the rice transplanter. These two indicators can effectively reflect the distribution of the operation trajectory of the transplanter, thereby evaluating its operation quality, which can help farmers and agricultural workers better understand the operating effectiveness of the transplanter and provide a reference for subsequent farmland management decisions.

2.4.1. Extract Target Seedling Rows

The model of the rice transplanter used in this study was the Yanmar VP6, which has 6 seedling claws, and the relative positions of the 6 seedling claws are fixed. The rice transplanter can complete the transplantation of 6 rows of seedlings in the test field at the same time, and the trajectories of these 6 rows of seedlings are the same. It is worth noting that the seedling claw is a component on the rice transplanter, and its operating trajectory in the field is consistent with that of the transplanter. Therefore, by selecting only one seedling row produced by one of the claws for each operation, the distribution information of the other five seedling rows can be obtained, and then the operating effectiveness of the transplanter can be analyzed. In order to facilitate the extraction of target seedling rows, we labeled the seedling claws. This study selected the fifth seedling claw for research, as shown in Figure 9a.
In the experiment, the rice transplanter performed straight-line operations in the field 8 times, covering the entire field and producing 48 rows of seedlings. Therefore, these rows of seedlings were divided into 8 groups, with each group selecting the rows generated by the 5th seedling claw as representatives of 1 straight-line operation of the transplanter. Ultimately, 8 target seedling rows were obtained, recorded as 1, 2, …, 8, as shown in Figure 9b.

2.4.2. Evaluation Indicators for Rice Transplanter Operation Quality

In order to more accurately evaluate the operation quality of the rice transplanter, we used the least squares method to fit each target seedling row to a straight line and compared the actual seedling row against this fitted line as a benchmark. By calculating the deviations, the degree of deviation of the seedling row can be quantified. Finally, the straightness and parallelism of the seedling rows were calculated using mathematical statistics to evaluate the operation quality of the rice transplanter.
(1).
Straightness
Ideally, the transplanter should operate in a straight line, and the trajectory should be as close to a straight line as possible. This study used the straightness of the seedling rows to evaluate the effectiveness of the rice transplanter in straight-line operations in the field. The least squares linear regression method is used to obtain the fitted line for each seedling row, and the distance from each trajectory point to the fitted line, i.e., the cross-track error ($XTE$), is calculated from the position information of the seedling row. Assuming there are $n_i$ trajectory coordinate points in the i-th row (i = 1, 2, …, 8) and the average value of $XTE$ in this row is $\overline{XTE_i}$, the straightness ($St_i$) of the i-th seedling row can be obtained by calculating the arithmetic mean of the absolute difference between $XTE_{ij}$ and $\overline{XTE_i}$. The relevant formulas are shown in (13) and (14).
$$XTE_{ij} = \frac{A_i X_{ij} - Y_{ij} + B_i}{\sqrt{A_i^2 + 1}} \quad (13)$$
$$St_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \left| XTE_{ij} - \overline{XTE_i} \right| \quad (14)$$
where $A_i$ and $B_i$ are the slope and intercept of the fitted line equation for the i-th seedling row, respectively; $XTE_{ij}$ is the cross-track error from the j-th point in the i-th row to the fitted line; and $X_{ij}$ and $Y_{ij}$ are the X and Y coordinate values of the j-th point in the i-th seedling row. If $XTE_{ij} = 0$, the point lies on the corresponding fitted line, and the straight-line operation has the best effectiveness. If $XTE_{ij} \neq 0$, the point deviates to one side of the corresponding line.
Straightness directly reflects the accuracy of the transplanter during straight-line operation. In theory, when $St_i = 0$, all points on the seedling row coincide with the fitted line, and the performance of the transplanter in straight-line operation is optimal; the smaller the $St_i$, the closer the points lie to the fitted line, the higher the accuracy of the transplanter in straight-line operation, and the better the operation effect; conversely, a larger $St_i$ indicates poorer performance of the transplanter in straight-line operation.
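A short numerical sketch of this calculation is given below, assuming the points of one seedling row are available as NumPy coordinate arrays in a projected coordinate system; it is an illustration rather than the authors' code.

```python
import numpy as np

def straightness(x, y):
    """Least-squares fit of one seedling row and its straightness St_i
    (Equations (13) and (14)). x and y are numpy arrays of point coordinates (m)."""
    a, b = np.polyfit(x, y, 1)                     # fitted line Y = A*X + B
    xte = (a * x - y + b) / np.sqrt(a**2 + 1)      # signed cross-track error of each point
    st = np.mean(np.abs(xte - xte.mean()))         # mean absolute deviation of the XTE
    return st, a, b
```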
(2).
Parallelism
Since the rice transplanter usually works back and forth, the working tracks should be kept as parallel as possible. Parallelism is used to measure the parallel effect between adjacent operation tracks of the rice transplanter. Therefore, this study selected the data of two adjacent rice seedling rows to analyze their parallelism. The 8 target rice seedling rows can be divided into 7 groups of adjacent rice seedling rows in sequence. In the process of evaluating the parallelism ($Pl_k$) of the adjacent seedling rows in the k-th group (k = 1, 2, …, 7), the first step is to replace the (k + 1)-th seedling row with its fitted line as the reference. Then, the distance from the points on the k-th seedling row to the fitted line of the (k + 1)-th row is calculated, representing the deviation between adjacent rows ($D_{k,j}$). Assuming there are $n_k$ trajectory coordinate points in the k-th row and the mean deviation between adjacent rows in this group is $\overline{D_k}$, $Pl_k$ can be obtained by calculating the arithmetic mean of the absolute difference between $D_{k,j}$ and $\overline{D_k}$.
$$D_{k,j} = \frac{\left| A_{k+1} X_{k,j} - Y_{k,j} + B_{k+1} \right|}{\sqrt{A_{k+1}^2 + 1}} \quad (15)$$
$$Pl_k = \frac{1}{n_k} \sum_{j=1}^{n_k} \left| D_{k,j} - \overline{D_k} \right| \quad (16)$$
where $A_{k+1}$ and $B_{k+1}$ are the slope and intercept of the fitted line equation for the (k + 1)-th seedling row, respectively; $D_{k,j}$ represents the deviation between adjacent rows in the k-th group, that is, the distance from the j-th point on the k-th row to the fitted line of the adjacent (k + 1)-th row; and $\overline{D_k}$ is the mean value of $D_{k,j}$.
This study used parallelism to evaluate how closely adjacent passes of the rice transplanter track each other. A smaller $Pl_k$ value indicates a smaller deviation between adjacent rows, which means that the transplanter can better maintain consistent spacing between rows, and also indicates high field utilization and good operating results. In particular, when $Pl_k = 0$, the transplanter has achieved the best effectiveness during field operations, with all adjacent seedling rows completely parallel. On the contrary, a larger value of $Pl_k$ indicates poor accuracy of the inter-row combination lines in agricultural machinery operations and low land use efficiency. This situation should be avoided as much as possible in actual production operations.
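Analogously, a minimal sketch of the parallelism calculation is shown below, assuming the fitted slope and intercept of the (k + 1)-th row have already been computed (for example with the straightness function above); again, this is illustrative rather than the authors' code.

```python
import numpy as np

def parallelism(x_k, y_k, a_next, b_next):
    """Parallelism Pl_k of the k-th target row relative to the fitted line
    Y = a_next*X + b_next of the (k+1)-th row (Equations (15) and (16))."""
    d = np.abs(a_next * x_k - y_k + b_next) / np.sqrt(a_next**2 + 1)  # D_{k,j}
    # d.mean() corresponds to the mean row spacing reported in Table 6
    return np.mean(np.abs(d - d.mean()))
```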

3. Results

3.1. Performance Comparison of Different Models

This section discusses the performance of the CAD-UNet model and other common semantic segmentation models on the same dataset, including SegNet [58], DeepLabV3+ [59], U-Net [60], and SegFormer [40]. Among them, SegNet adopts an encoder–decoder structure and uses max-pooling indices to retain spatial information, which is suitable for fine-grained segmentation tasks; U-Net can effectively retain the spatial information of the image through a symmetrical encoder–decoder structure and skip connections; DeepLabV3+ uses dilated convolution and atrous spatial pyramid pooling (ASPP) modules to enhance global context information, especially in detail segmentation and background separation; and SegFormer combines the transformer self-attention mechanism with convolutional feature extraction, which can fuse multi-scale features and improve the expression of local details and deep semantic information [44]. In addition, in order to reduce the errors caused by different training environments, all experiments were carried out using the same training strategy and period. Finally, all trained models were compared using the same validation dataset.
The recognition results of rice seedling rows obtained by the various methods are shown in Figure 10, where black is the background, red boxes mark missed seedling rows, and yellow boxes indicate misdetected seedling rows. In Figure 10, the first column is the original paddy field seedling image, the second column is the seedling row label, and the third to seventh columns are the recognition results of the SegNet, DeepLabV3+, U-Net, SegFormer, and CAD-UNet models. As a whole, the detection results of the network model designed in this study (CAD-UNet) are the most similar to the label, while the results of the other methods suffer from missed detection, misdetection, and edge blurring. As can be seen from the figure, the SegNet and DeepLabV3+ models extract the seedling rows poorly, and the seedling row boundaries are blurred in the recognition results of these two models; in particular, the SegNet results have no obvious contours, which leads to the loss of a large amount of high-resolution spatial information. In addition, missed detections of seedling rows by the UNet model are obvious, especially in the waterlogged area, where there are many holes and missing extractions in the detection results. Although the extraction results obtained using the SegFormer model are significantly better than those of the original UNet model, its extraction of the seedling rows is still not fine enough, and it still produces some misdetections. Compared with the other models, CAD-UNet is able to better distinguish between the ground and the seedling rows, especially in waterlogged plots, where seedlings can still be extracted clearly and with good continuity. Therefore, we conclude that CAD-UNet performs best in the seedling row recognition task in this study.
This study selected the evaluation indicators described in Section 2.3.2 to quantitatively analyze the detection performance of each model, and the specific results are shown in Table 2. Compared with SegNet, DeepLabV3+, UNet, and SegFormer, CAD-UNet achieves the highest accuracy in extracting seedling rows from farmland images, with an IoU value of 81.03% and an F1-score of 89.52%. CAD-UNet achieves IoU values that are 18.38%, 12.16%, 6.98%, and 5.29% higher than those of the other four models, respectively. Correspondingly, its F1-scores are higher by 12.49%, 7.96%, 4.43%, and 3.32%, respectively. The results show that the seedling row shapes extracted by the model established in this article are closer to the actual situation, and the quantitative results (Table 2) are consistent with the visual results (Figure 10).
Through the analysis of the above experimental results, it can be concluded that the CAD-UNet model achieves the best accuracy in detecting farmland seedling rows compared with the other models. CAD-UNet can more stably process image data in complex paddy field environments while improving segmentation accuracy and reducing missed and false detections. In addition, the ablation comparison shows that using the CBAM, AG, and DCNv2 modules in the original UNet helps improve the detection performance of the network, which proves the effectiveness of the improvements made in this study.

3.2. Ablation Experiment

The functions of the CBAM, AG, and DCNv2 modules were verified through ablation experiments and quantitative analysis. Ablation experiments are usually performed by gradually adding a module or combination of modules to fully understand the role of each module in the improved network. The ablation experiments of this study used the dataset produced in Section 2.2.2, and the training strategy and experimental environment were the same as those in Section 2.3.2. The relevant results are shown in Table 3.
The results in Table 3 demonstrate that gradually adding the CBAM, AG, and DCNv2 modules can significantly improve the model’s detection performance. In particular, when all three modules are added to the original model, the improved model (CAD-UNet) achieves the highest scores for all indicators. Among them, the overall segmentation accuracy (OA) of CAD-UNet is improved by 1.53 percentage points compared with the original UNet model, which fully proves the effectiveness of the improved method. In addition, the visualization results in Figure 11 also show that adding the CBAM, AG, and DCNv2 modules to UNet simultaneously can significantly improve the model’s perception ability, making the detected seedling rows clearer and more complete.

3.3. Analysis of the Operation Effectiveness of Transplanter

This study used the CAD-UNet model to identify and segment the seedling rows in the images and stitched all the pictures to obtain a binary-class orthorectified remote sensing image of the experimental area, as shown in Figure 12b, where black is the background and white is the seedling rows. The transplanter operated in a straight line back and forth within the test area eight times, allowing data from eight target seedling rows to be extracted (target seedling rows: the seedling rows produced by the fifth seedling claw). In addition, this study did not consider seedling height and only collected two-dimensional geographic location data of the field seedling rows. Using the principle of least squares, linear fitting was performed on the seedling rows to obtain the fitted benchmark line equation for each row. The relevant information is shown in Table 4. The position information of each point on the seedling row was accurately extracted, and MATLAB 2020b software was used to establish the distribution map of the rows in the test area (Figure 12e), which can intuitively reflect the operating effectiveness of the transplanter and provide data support for further quantitative evaluation of the quality of the agricultural machinery operation.
According to Table 4, the distance of straight-line operation for the transplanter ranged from 50.39 to 50.58 m. In order to accurately reflect the distribution of the seedling rows, a large number of points (ranging from 140,419 to 161,577) were selected on each target row, with a density range of 2781.83 to 3199.09 points/m. The slopes of the fitted straight lines for the eight seedling rows were concentrated between −1.527 and −1.467. Combined with the results shown in Figure 12d, the parallelism of adjacent seedling rows in each group can be preliminarily determined. In addition, the coefficients of determination ($R^2$) were all greater than 99.9%, indicating an excellent fitting effect for the seedling rows.

3.3.1. Straightness of Rice Transplanter Operation

After obtaining the fitted line equation for each seedling row, the cross-track error ($XTE$) of each row was calculated using Formula (13). Subsequently, the straightness of each seedling row was obtained through Formula (14), and a comprehensive analysis of the straight-line operation of the rice transplanter was conducted by combining indicators such as the root mean square value ($RMS$) and standard deviation ($SD$). The relevant results are shown in Table 5.
According to Table 5, the minimum and maximum values of straightness were 4.62 cm and 13.66 cm, respectively. Combining the $SD$ and $RMS$ results of each seedling row, it can be preliminarily judged that the performance in the first four rows was relatively poor, while the straightness of the fifth to eighth seedling rows was less than 6 cm, indicating a better operation effect. It is worth noting that the $RMS$ value of each seedling row was greater than its straightness, because the $RMS$ value is more sensitive to extreme values than straightness.
In addition, within the same seedling row, the $XTE$ of each point is different. If the $XTE$ is treated as a statistical variable, analyzing the data of each seedling row yields the frequency distribution curve of this variable. Figure 13 shows the cumulative frequency distribution curve of the $XTE$ of each seedling row.
According to the cumulative frequency distribution curves, the high-frequency interval of $XTE$ in the fifth to eighth seedling rows was relatively concentrated, and the values were distributed on both sides of 0, which shows that the straight-line operation accuracy of the transplanter was high in the fifth to eighth rows. On the contrary, $XTE_1$–$XTE_4$ were relatively scattered, indicating that the transplanter operation quality in the first four rows was poor.

3.3.2. Parallelism of Rice Transplanter Operation

In order to further quantify the parallel relationship between seedling rows, this study treated adjacent rows as a group of research objects. The distance ($D_{k,j}$) from the points on the k-th seedling row to the fitted line of the (k + 1)-th row can be obtained from Equation (15). After obtaining the parallelism of two adjacent seedling rows through Equation (16), a comprehensive analysis of the parallel operation of the rice transplanter was conducted by combining indicators such as the $RMS$, $SD$, and mean spacing of the data. The relevant results are shown in Table 6.
It can be seen from Table 6 that the minimum and maximum values of parallelism between adjacent seedling rows in each group were 5.16 cm and 23.34 cm, respectively. Based on the SD and RMS results, it can be preliminarily judged that the operation effectiveness of the first three groups was poor. However, the parallelism of the fourth to seventh groups was less than 7.05 cm, indicating that the transplanter’s operational effectiveness in the latter four groups was relatively good. In addition, according to the “Technical Regulations for Mechanized Rice Transplanting” (issued by the Shantou Supervision and Administration Bureau, Guangdong Province, China) [61], the standard spacing between adjacent seedling rows is 30 cm, and the mean spacing in this trial transplanter operation was 39.26 cm, indicating that the missed area was larger than the overlap area.
In order to better understand whether parallelism is representative of the inter-row combination line accuracy of the transplanter operation, we also treated the deviation ($D_{k,j}$) between adjacent rows as a statistical variable and analyzed the distribution trend of $D_{k,j}$ for each group, thereby obtaining the frequency distribution curve of the variable. Figure 14 shows the cumulative frequency distribution curve of the $D_{k,j}$ of each group.
As shown in Figure 14, the $D_{k,j}$ in the fourth to seventh groups was relatively concentrated, and the values were distributed on both sides of 0, indicating that the parallelism of the seedling rows in these groups was relatively high. In contrast, according to the results in Figure 14c, the $D_{k,j}$ in the third group was relatively scattered, indicating that the parallelism accuracy between the third and fourth seedling rows was the worst, which is consistent with the results in Table 6 and the visualization results in Figure 12d.

4. Discussion

The large-scale production of agricultural machinery is an inevitable trend in China’s agricultural development. The quality of agricultural machinery operation is directly related to agricultural production and farmers’ income, which in turn affects farmers’ enthusiasm for using agricultural machinery. Therefore, a scientific operation quality evaluation method is needed. Many researchers use GPS equipment to record agricultural machinery trajectories in real time and use statistical indicators such as mean, standard deviation, and root mean square value to evaluate the operation effect [62,63]. However, GPS trajectories alone cannot accurately reflect the actual operation situation, especially in complex paddy field environments. GPS trajectories can only provide the location information of agricultural machinery and cannot directly reflect details such as crop distribution and operation accuracy [9,10]. Computer vision technology can directly obtain crop positions, thereby more comprehensively and accurately evaluating the operation effect of agricultural machinery. This study proposes a method for evaluating the field operation quality of rice transplanters based on high-resolution remote sensing images and deep learning technology. The method mainly consists of two parts: (1) using deep learning technology to accurately extract seedlings from remote sensing images; (2) establishing a rice transplanter operation quality evaluation model based on the extracted seedling rows.
The detection of crop rows depends on the color and growth characteristics of seedlings in the image, while image quality is often affected by many factors (such as missing seedlings, weeds, light, etc.). We obtained paddy seedling images with a ground resolution of 0.27 cm/pixel, which have the advantage of high spatial resolution, reducing the influence of mixed pixels and improving the accuracy of the imagery. Some researchers use high-resolution remote sensing images to identify and extract crop rows, and use digital image processing techniques (such as the Hough transform [12,25], linear fitting [26,27,28], etc.) to extract information about the location of crops from the background image [64]. Bah et al. [25] regarded crop row detection as an important step in weed detection and applied the Hough transform to detect linear objects (crop rows) in vegetation binary images. However, this method may lead to false detections when selecting extreme points in the parameter space, such as detecting erroneous lines that are not parallel to the crop rows, thus affecting the detection accuracy. Chen et al. [28] used the least squares method and the Hough transform to extract crop rows from drone images, using the crop row detection accuracy (CRDA) as an evaluation index. The results showed that the CRDA value of the least squares fitting method was between 0.99 and 1.00, which was more accurate than the Hough transform. However, the least squares fitting method was easily disturbed by image noise and was only feasible when weeds were separated from crops. In short, these methods based on traditional machine vision exposed the problems of large workload and low recognition accuracy in early attempts, especially in complex and changeable scenes such as paddy fields [29,30]. This study directly used deep learning technology to accurately extract seedling rows from remote sensing images. Compared with traditional methods, the seedling row extraction model based on deep learning can effectively cope with the interference of various environmental factors, has stronger feature extraction and pattern recognition capabilities, and can achieve high-accuracy crop row detection against complex backgrounds. In addition, this study can also obtain the location distribution of seedlings during the planting stage, providing valuable information for later field management.
With the continuous development of remote sensing technology, the availability of high-resolution images has enabled deep learning models to detect at a finer spatial scale, further improving the accuracy of crop row extraction. Zhang et al. [37] used the DeepLabV3 network to extract rice seedling rows. By adopting a relatively small dilated convolution kernel (3 × 3), they effectively extracted spatial details, thereby improving the detection accuracy; however, the small receptive field limits the capture of global context information, which is the main reason for its poor recall performance. Zhang et al. [44] used the SegFormer model to detect winter wheat in Landsat images and found that the model has significant advantages in capturing global context information and handling resolution differences, but there is still room for improvement in local detail processing and computational efficiency, mainly because factors such as crop edge details, color, texture, and shape can significantly affect the effect of semantic segmentation [45,46]. In order to accurately extract seedling rows, this study proposed an efficient seedling row extraction model (CAD-UNet). First, we introduced the CBAM and AG modules simultaneously in the skip connections of the original U-Net model, so that the model can better understand and utilize the features in the image, thereby improving its segmentation performance. Secondly, we replaced the standard square convolution with DCNv2, so that the convolution kernel shape of CAD-UNet can be dynamically adjusted to better cope with the dense distribution and morphological changes of rice seedling rows in paddy fields. These improvements help the model better handle objects of different scales and shapes, especially for extracting rice seedling rows with complex features and shapes in farmland images. In addition, the CAD-UNet model shows strong generalization ability when dealing with different crop planting conditions, making this method not only suitable for rice but also extendable to the operation quality assessment of other row crops such as corn and wheat.
Although the method proposed in this study can effectively evaluate the field operation effect of rice transplanters, there is still room for further improvement, as follows:
(1)
Expand the scope and depth of evaluation. Future research can try to introduce agricultural machinery satellite navigation operation data to expand the evaluation scale, such as combining indicators such as land utilization rate to comprehensively analyze the quality of agricultural machinery operations, and further promote the improvement of the standardized management level of agricultural machinery field operations.
(2) Optimize model design and inference efficiency. Future seedling row extraction models will apply compression techniques such as model pruning and knowledge distillation, and prioritize lightweight segmentation network architectures to reduce model size and computational complexity, thereby improving inference efficiency.
(3) Explore self-supervised learning methods. The quantity and quality of labeled data determine the accuracy and reliability of the final recognition results, yet collecting and annotating paddy field seedling samples is very time-consuming. Therefore, future research can try self-supervised deep learning methods, which have the potential to perform image segmentation in large and complex scenes using a limited number of manually labeled samples while achieving accuracy comparable to fully supervised methods.

5. Conclusions

This study combines the improved UNet semantic segmentation model with an agricultural machinery operation effectiveness evaluation model to establish a rice transplanter operation quality evaluation model based on high-resolution remote sensing images. Introducing both the CBAM and AG modules in the skip connections of the original UNet helps the model better understand and utilize the features in the image, thereby improving segmentation performance. In addition, CAD-UNet uses DCNv2 instead of standard square convolution, allowing the shape of the convolution kernel to be adjusted dynamically to better fit specific feature structures, further improving the model’s perception ability and performance. These improvements help the model handle objects of different scales and shapes, and are especially suitable for extracting seedling rows with complex features and shapes from farmland images. The segmentation results show that, on the same test dataset, CAD-UNet provides the most balanced performance, with an F1-score of 89.52% and a precision of 91.14%, both higher than those of the other semantic segmentation models.
After extracting the position information of the seedling rows, statistical methods are used to obtain indicators such as the straightness, parallelism, and standard deviation of the rows, which are used to evaluate the operation quality of the rice transplanter. The evaluation results show that the minimum and maximum straightness of each seedling row are 4.62 and 13.66 cm, respectively, and the minimum and maximum parallelism between adjacent seedling rows are 5.16 and 23.34 cm, respectively. These indicators directly reflect the distribution of rice seedlings in the field, proving that the proposed method can quantitatively evaluate the field operation quality of the rice transplanter. Agricultural researchers, crop managers, and farmers can use this technology to obtain more accurate crop information and make appropriate crop management decisions, improving the efficiency and sustainability of agricultural operations.
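For readers who wish to reproduce this type of evaluation, the sketch below illustrates one plausible way to summarize row straightness and adjacent-row parallelism from the extracted row point coordinates using NumPy. The specific definitions used here (mean absolute perpendicular deviation from a least-squares line; variability of the spacing between adjacent fitted lines) and the centimeter units are illustrative assumptions and may differ from the formulas used in this study.

```python
# Illustrative sketch (not the authors' exact formulas): summarizing row
# straightness and adjacent-row parallelism from detected row point
# coordinates, assumed to be in centimeters in a field coordinate frame.
import numpy as np

def line_deviations(points: np.ndarray) -> np.ndarray:
    """Perpendicular deviations of a row's points from its least-squares line."""
    x, y = points[:, 0], points[:, 1]
    slope, intercept = np.polyfit(x, y, deg=1)
    return (y - (slope * x + intercept)) / np.sqrt(1.0 + slope ** 2)

def straightness(points: np.ndarray) -> float:
    """Mean absolute deviation of one row from its reference line (cm)."""
    return float(np.mean(np.abs(line_deviations(points))))

def parallelism(row_a: np.ndarray, row_b: np.ndarray, n: int = 200) -> float:
    """Variability of the spacing between two adjacent rows, sampled along x (cm).

    Spacing is measured along the y axis between the two fitted lines; a
    perpendicular-distance variant could be substituted.
    """
    x_lo = max(row_a[:, 0].min(), row_b[:, 0].min())
    x_hi = min(row_a[:, 0].max(), row_b[:, 0].max())
    xs = np.linspace(x_lo, x_hi, n)
    ya = np.polyval(np.polyfit(row_a[:, 0], row_a[:, 1], 1), xs)
    yb = np.polyval(np.polyfit(row_b[:, 0], row_b[:, 1], 1), xs)
    return float(np.std(np.abs(ya - yb)))
```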

Author Contributions

Conceptualization, Y.L. (Yangfan Luo); methodology, Y.L. (Yangfan Luo) and J.D.; software, Y.L. (Yangfan Luo) and Y.X.; validation, J.D. and W.Z.; formal analysis, Y.L. (Yangfan Luo); investigation, X.Y. and H.Z.; resources, Z.Z.; data curation, Y.L. (Yangfan Luo); writing—original draft preparation, Y.L. (Yangfan Luo); writing—review and editing, Y.L. (Yangfan Luo); visualization, Y.L. (Yangfan Luo) and S.S.; supervision, Z.Z. and Y.L. (Yuanhong Li); project administration, Y.L. (Yuanhong Li); funding acquisition, Y.L. (Yuanhong Li). All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to acknowledge the support of this study by the National Natural Science Foundation of China (Grant No. 32301708) and the National Key R&D Program of China “Creation and Application of Intelligent Operation Equipment for the Whole Process of Main Forage Feed Production” sub-project “Research and Application of Common Key Technologies for the Whole Process Intelligent Production of Main Forage Feed” (Grant No. 2022YFD2001901).

Data Availability Statement

The datasets presented in this study are available from the corresponding author on reasonable request.

Acknowledgments

The authors gratefully acknowledge the editors and anonymous reviewers for their constructive comments on our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cisternas, I.; Velásquez, I.; Caro, A.; Rodríguez, A. Systematic literature review of implementations of precision agriculture. Comput. Electron. Agric. 2020, 176, 105626. [Google Scholar] [CrossRef]
  2. Mathenge, M.; Sonneveld, B.G.J.S.; Broerse, J.E.W. Application of GIS in Agriculture in Promoting Evidence-Informed Decision Making for Improving Agriculture Sustainability: A Systematic Review. Sustainability 2022, 14, 9974. [Google Scholar] [CrossRef]
  3. Delavarpour, N.; Koparan, C.; Nowatzki, J.; Bajwa, S.; Sun, X. A Technical Study on UAV Characteristics for Precision Agriculture Applications and Associated Practical Challenges. Remote Sens. 2021, 13, 1204. [Google Scholar] [CrossRef]
  4. Pérez-Ruiz, M.; Slaughter, D.; Gliever, C.; Upadhyaya, S. Automatic GPS-based intra-row weed knife control system for transplanted row crops. Comput. Electron. Agric. 2012, 80, 41–49. [Google Scholar] [CrossRef]
  5. Yang, L.; Wang, X.; Li, Y.; Xie, Z.; Xu, Y.; Han, R.; Wu, C. Identifying Working Trajectories of the Wheat Harvester In-Field Based on K-Means Algorithm. Agriculture 2022, 12, 1837. [Google Scholar] [CrossRef]
  6. Tian, Y.; Mai, Z.; Zeng, Z.; Cai, Y.; Yang, J.; Zhao, B.; Zhu, X.; Qi, L. Design and experiment of an integrated navigation system for a paddy field scouting robot. Comput. Electron. Agric. 2023, 214, 108336. [Google Scholar] [CrossRef]
  7. Yao, Z.; Zhao, C.; Zhang, T. Agricultural machinery automatic navigation technology. iScience 2023, 27, 108714. [Google Scholar] [CrossRef] [PubMed]
  8. Nguyen, N.V.; Cho, W. Performance Evaluation of a Typical Low-Cost Multi-Frequency Multi-GNSS Device for Positioning and Navigation in Agriculture—Part 2: Dynamic Testing. AgriEngineering 2023, 5, 127–140. [Google Scholar] [CrossRef]
  9. Ma, Z.; Yin, C.; Du, X.; Zhao, L.; Lin, L.; Zhang, G.; Wu, C. Rice row tracking control of crawler tractor based on the satellite and visual integrated navigation. Comput. Electron. Agric. 2022, 197, 106935. [Google Scholar] [CrossRef]
  10. Perez-Ruiz, M.; Slaughter, D.C.; Gliever, C.; Upadhyaya, S.K. Tractor-based Real-time Kinematic-Global Positioning System (RTK-GPS) guidance system for geospatial mapping of row crop transplant. Biosyst. Eng. 2012, 111, 64–71. [Google Scholar] [CrossRef]
  11. Cao, M.; Tang, F.; Ji, P.; Ma, F. Improved real-time semantic segmentation network model for crop vision navigation line detection. Front. Plant Sci. 2022, 13, 898131. [Google Scholar] [CrossRef] [PubMed]
  12. García-Santillán, I.D.; Montalvo, M.; Guerrero, J.M.; Pajares, G. Automatic detection of curved and straight crop rows from images in maize fields. Biosyst. Eng. 2017, 156, 61–79. [Google Scholar] [CrossRef]
  13. García-Santillán, I.; Guerrero, J.M.; Montalvo, M.; Pajares, G. Curved and straight crop row detection by accumulation of green pixels from images in maize fields. Precis. Agric. 2018, 19, 18–41. [Google Scholar] [CrossRef]
  14. Wu, S.; Chen, Z.; Bangura, K.; Jiang, J.; Ma, X.; Li, J.; Peng, B.; Meng, X.; Qi, L. A navigation method for paddy field management based on seedlings coordinate information. Comput. Electron. Agric. 2023, 215, 108436. [Google Scholar] [CrossRef]
  15. Wang, Y.; Fu, Q.; Ma, Z.; Tian, X.; Ji, Z.; Yuan, W.; Kong, Q.; Gao, R.; Su, Z. YOLOv5-AC: A Method of Uncrewed Rice Transplanter Working Quality Detection. Agronomy 2023, 13, 2279. [Google Scholar] [CrossRef]
  16. Pang, Y.; Shi, Y.; Gao, S.; Jiang, F.; Veeranampalayam-Sivakumar, A.-N.; Thompson, L.; Luck, J.; Liu, C. Improved crop row detection with deep neural network for early-season maize stand count in UAV imagery. Comput. Electron. Agric. 2020, 178, 105766. [Google Scholar] [CrossRef]
  17. Li, D.; Li, B.; Feng, H.; Kang, S.; Wang, J.; Wei, Z. Low-altitude remote sensing-based global 3D path planning for precision navigation of agriculture vehicles-beyond crop row detection. ISPRS J. Photogramm. Remote Sens. 2024, 210, 25–38. [Google Scholar] [CrossRef]
  18. Rabab, S.; Badenhorst, P.; Chen, Y.-P.P.; Daetwyler, H.D. A template-free machine vision-based crop row detection algorithm. Precis. Agric. 2021, 22, 124–153. [Google Scholar] [CrossRef]
  19. Grayson, B.; Penna, N.T.; Mills, J.P.; Grant, D.S. GPS precise point positioning for UAV photogrammetry. Photogramm. Rec. 2018, 33, 427–447. [Google Scholar] [CrossRef]
  20. De Silva, R.; Cielniak, G.; Wang, G.; Gao, J. Deep learning-based crop row detection for infield navigation of agri-robots. J. Field Robot. 2024, 41, 2299–2321. [Google Scholar] [CrossRef]
  21. Teshome, F.T.; Bayabil, H.K.; Hoogenboom, G.; Schaffer, B.; Singh, A.; Ampatzidis, Y. Unmanned aerial vehicle (UAV) imaging and machine learning applications for plant phenotyping. Comput. Electron. Agric. 2023, 212, 108064. [Google Scholar] [CrossRef]
  22. Khan, M.N.; Rahi, A.; Rajendran, V.P.; Al Hasan, M.; Anwar, S. Real-time crop row detection using computer vision-application in agricultural robots. Front. Artif. Intell. 2024, 7, 1435686. [Google Scholar] [CrossRef]
  23. Zhang, S.; Liu, Y.; Xiong, K.; Tian, Y.; Du, Y.; Zhu, Z.; Du, M.; Zhai, Z. A review of vision-based crop row detection method: Focusing on field ground autonomous navigation operations. Comput. Electron. Agric. 2024, 222, 109086. [Google Scholar] [CrossRef]
  24. Ruan, Z.; Chang, P.; Cui, S.; Luo, J.; Gao, R.; Su, Z. A precise crop row detection algorithm in complex farmland for unmanned agricultural machines. Biosyst. Eng. 2023, 232, 1–12. [Google Scholar] [CrossRef]
  25. Bah, M.D.; Hafiane, A.; Canals, R. Deep learning with unsupervised data labeling for weed detection in line crops in UAV images. Remote Sens. 2018, 10, 1690. [Google Scholar] [CrossRef]
  26. Tenhunen, H.; Pahikkala, T.; Nevalainen, O.; Teuhola, J.; Mattila, H.; Tyystjärvi, E. Automatic detection of cereal rows by means of pattern recognition techniques. Comput. Electron. Agric. 2019, 162, 677–688. [Google Scholar] [CrossRef]
  27. Zhang, X.; Li, X.; Zhang, B.; Zhou, J.; Tian, G.; Xiong, Y.; Gu, B. Automated robust crop-row detection in maize fields based on position clustering algorithm and shortest path method. Comput. Electron. Agric. 2018, 154, 165–175. [Google Scholar] [CrossRef]
  28. Chen, P.; Ma, X.; Wang, F.; Li, J. A new method for crop row detection using unmanned aerial vehicle images. Remote Sens. 2021, 13, 3526. [Google Scholar] [CrossRef]
  29. Li, Y.; Zhao, Z.; Luo, Y.; Qiu, Z. Real-Time Pattern-Recognition of GPR Images with YOLO v3 Implemented by Tensorflow. Sensors 2020, 20, 6476. [Google Scholar] [CrossRef] [PubMed]
  30. Li, Y.; Zhao, Z.; Xu, W.; Liu, Z.; Wang, X. An effective FDTD model for GPR to detect the material of hard objects buried in tillage soil layer. Soil Tillage Res. 2019, 195, 104353. [Google Scholar] [CrossRef]
  31. Guo, Z.; Cai, D.; Zhou, Y.; Xu, T.; Yu, F. Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning. Plant Methods 2024, 20, 105. [Google Scholar] [CrossRef]
  32. Punithavathi, R.; Rani, A.D.C.; Sughashini, K.; Kurangi, C.; Nirmala, M.; Ahmed, H.F.T.; Balamurugan, S. Computer Vision and Deep Learning-enabled Weed Detection Model for Precision Agriculture. Comput. Syst. Sci. Eng. 2023, 44, 2759–2774. [Google Scholar] [CrossRef]
  33. Lin, S.; Jiang, Y.; Chen, X.; Biswas, A.; Li, S.; Yuan, Z.; Wang, H.; Qi, L. Automatic detection of plant rows for a transplanter in paddy field using faster r-cnn. IEEE Access 2020, 8, 147231–147240. [Google Scholar] [CrossRef]
  34. Li, Y.; Wang, X.; Zhao, Z.; Han, S.; Liu, Z. Lagoon water quality monitoring based on digital image analysis and machine learning estimators. Water Res. 2020, 172, 115471. [Google Scholar] [CrossRef] [PubMed]
  35. Osco, L.P.; de Arruda, M.d.S.; Gonçalves, D.N.; Dias, A.; Batistoti, J.; de Souza, M.; Gomes, F.D.G.; Ramos, A.P.M.; de Castro Jorge, L.A.; Liesenberg, V. A CNN approach to simultaneously count plants and detect plantation-rows from UAV imagery. ISPRS J. Photogramm. Remote Sens. 2021, 174, 1–17. [Google Scholar] [CrossRef]
  36. Bah, M.D.; Hafiane, A.; Canals, R. CRowNet: Deep network for crop row detection in UAV images. IEEE Access 2019, 8, 5189–5200. [Google Scholar] [CrossRef]
  37. Zhang, P.; Sun, X.; Zhang, D.; Yang, Y.; Wang, Z. Lightweight Deep Learning Models for High-Precision Rice Seedling Segmentation from UAV-Based Multispectral Images. Plant Phenomics 2023, 5, 0123. [Google Scholar] [CrossRef]
  38. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
  39. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
  40. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  42. dos Santos Ferreira, A.; Junior, J.M.; Pistori, H.; Melgani, F.; Gonçalves, W.N. Unsupervised domain adaptation using transformers for sugarcane rows and gaps detection. Comput. Electron. Agric. 2022, 203, 107480. [Google Scholar] [CrossRef]
  43. Wang, C.; Yang, S.; Zhu, P.; Zhang, L. Extraction of Winter Wheat Planting Plots with Complex Structures from Multispectral Remote Sensing Images Based on the Modified Segformer Model. Agronomy 2024, 14, 2433. [Google Scholar] [CrossRef]
  44. Zhang, Q.; Wang, G.; Wang, G.; Song, W.; Wei, X.; Hu, Y. Identifying Winter Wheat Using Landsat Data Based on Deep Learning Algorithms in the North China Plain. Remote Sens. 2023, 15, 5121. [Google Scholar] [CrossRef]
  45. Yan, C.; Li, Z.; Zhang, Z.; Sun, Y.; Wang, Y.; Xin, Q. High-resolution mapping of paddy rice fields from unmanned airborne vehicle images using enhanced-TransUnet. Comput. Electron. Agric. 2023, 210, 107867. [Google Scholar] [CrossRef]
  46. Wang, H.; Chen, X.; Zhang, T.; Xu, Z.; Li, J. CCTNet: Coupled CNN and transformer network for crop segmentation of remote sensing images. Remote Sens. 2022, 14, 1956. [Google Scholar] [CrossRef]
  47. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  48. Luo, Y.; Huang, Y.; Wang, Q.; Yuan, K.; Zhao, Z.; Li, Y. An improved YOLOv5 model: Application to leaky eggs detection. LWT 2023, 187, 115313. [Google Scholar] [CrossRef]
  49. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  50. Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global second-order pooling convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3024–3033. [Google Scholar]
  51. Li, R.; Li, M.; Li, J.; Zhou, Y. Connection sensitive attention U-NET for accurate retinal vessel segmentation. arXiv 2019, arXiv:1903.05558. [Google Scholar]
  52. Zhang, X.; Wang, Q.; Wang, X.; Li, H.; He, J.; Lu, C.; Yang, Y.; Jiang, S. Automated detection of Crop-Row lines and measurement of maize width for boom spraying. Comput. Electron. Agric. 2023, 215, 108406. [Google Scholar] [CrossRef]
  53. Wu, X.; Fang, P.; Liu, X.; Liu, M.; Huang, P.; Duan, X.; Huang, D.; Liu, Z. AM-UNet: Field Ridge Segmentation of Paddy Field Images Based on an Improved MultiResUNet Network. Agriculture 2024, 14, 637. [Google Scholar] [CrossRef]
  54. Kazaj, P.M.; Koosheshi, M.; Shahedi, A.; Sadr, A.V. U-net-based models for skin lesion segmentation: More attention and augmentation. arXiv 2022, arXiv:2210.16399. [Google Scholar]
  55. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  56. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  57. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
  58. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  59. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  60. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing And Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  61. DB 4405/T 295-2022; Technical Regulations for Mechanized Rice Transplanting. China National Standardizing Committee, Shantou Municipal Market Supervision and Administration Bureau: Shantou, China, 2022. Available online: https://std.samr.gov.cn/db/search/stdDBDetailed?id=DACBC26985C90ADEE05397BE0A0AE231 (accessed on 24 November 2024).
  62. He, Y.; Zhou, J.; Sun, J.; Jia, H.; Liang, Z.; Awuah, E. An adaptive control system for path tracking of crawler combine harvester based on paddy ground conditions identification. Comput. Electron. Agric. 2023, 210, 107948. [Google Scholar] [CrossRef]
  63. Chen, Z.; Yin, J.; Yang, J.; Zhou, M.; Wang, X.; Farhan, S.M. Development and Experiment of an Innovative Row-Controlled Device for Residual Film Collector to Drive Autonomously along the Ridge. Sensors 2023, 23, 8484. [Google Scholar] [CrossRef] [PubMed]
  64. He, R.; Luo, X.; Zhang, Z.; Zhang, W.; Jiang, C.; Yuan, B. Identification Method of Rice Seedlings Rows Based on Gaussian Heatmap. Agriculture 2022, 12, 1736. [Google Scholar] [CrossRef]
Figure 1. Research flowchart.
Figure 2. Test site location. The red area is the test area.
Figure 3. Using UAV to collect orthophoto remote sensing images of farmland.
Figure 4. Structure of CAD-UNet.
Figure 5. CBAM module.
Figure 6. Schematic diagram of AG module.
Figure 7. Adaptive receptive fields in DCN and fixed receptive fields in standard convolution.
Figure 8. Comparison of DCN v1, DCN v2, and standard square convolution sampling.
Figure 9. Evaluation method for rice transplanter operation effectiveness.
Figure 10. Visual comparison of detection results of different models. (a) Farmland image. (b) Ground truth. (c) SegNet detection results. (d) Deeplabv3+ detection results. (e) UNet detection results. (f) SegFormer detection results. (g) CAD-UNet detection results. (Black represents the background, and white represents detected seedling rows. False negative is marked with red boxes, and false positive is marked with yellow boxes.)
Figure 11. Visual comparison of ablation experiment results. (a) Farmland image. (b) Ground truth. (c) UNet detection results. (d) C-UNet detection results. (e) CA-UNet detection results. (f) CAD-UNet detection results. (Black represents the background, and white represents detected seedling rows. False negative is marked with red boxes, and false positive is marked with yellow boxes.)
Figure 12. Extraction results of target seedling rows. (a) Orthophoto remote sensing image. (b) Binary-class orthorectified remote sensing image. (c) Determination of target seedling rows. (d) Extraction of target seedling rows. (e) Distribution of target seedling rows.
Figure 13. Deviation statistics of seedling rows.
Figure 14. Deviation statistics of adjacent rows.
Table 1. Parameters of remote sensing data acquisition equipment (DJI Phantom 4 RTK).

Parameters                    Value
Resolution of CMOS sensor     20 million pixels
Flight altitude               10 m
Camera angle                  Orthophoto (90°)
Shooting overlap rate         Forward overlap: 80%; lateral overlap: 80%
Positioning accuracy (RMS)    Vertical: 1.5 cm + 1 ppm; horizontal: 1 cm + 1 ppm ¹
Climatic conditions           Sunny, wind speed < 4 m/s

¹ 1 ppm means that the error increases by 1 mm for every 1 km of aircraft movement.
Table 2. Comparison results of different models. (The best results are shown in bold.)

Models        P (%)    R (%)    F1 (%)   OA (%)   IoU (%)
SegNet        72.98    81.55    77.03    92.30    62.64
Deeplabv3+    78.73    84.59    81.56    93.70    68.86
UNet          86.73    83.50    85.08    94.82    74.04
SegFormer     87.94    84.51    86.19    95.21    75.74
CAD-UNet      91.14    87.96    89.52    96.35    81.03
Table 3. Ablation test results. (The best values are highlighted in bold.)

Models                      P (%)    R (%)    F1 (%)   OA (%)   IoU (%)
UNet                        86.73    83.50    85.08    94.82    74.04
UNet + CBAM                 88.65    86.54    87.58    95.66    77.91
UNet + CBAM + AG            89.50    87.22    88.34    95.93    79.12
UNet + CBAM + AG + DCNv2    91.14    87.96    89.52    96.35    81.03
Table 4. Data information of target seedling rows.

Row   Length (m)   Points Number   Density (/m)   Slope      R² (%)
1     50.58        150,035         2966.37        −1.4685    99.95
2     50.57        158,973         3143.41        −1.4684    99.95
3     50.57        143,820         2843.98        −1.4659    99.97
4     50.39        141,545         2809.01        −1.5174    99.99
5     50.48        140,419         2781.83        −1.5268    99.98
6     50.54        145,380         2876.26        −1.5260    99.99
7     50.49        153,617         3042.17        −1.5269    99.99
8     50.51        161,577         3199.09        −1.5280    99.99
Table 5. Evaluation results of precision for linear operation of rice transplanters.

Row (i)   St_i (cm)   RMS (cm)   SD (cm)   High-Frequency Interval (cm)
1         13.66       16.00      15.99     (−20, 30)
2         11.13       15.42      13.75     (−15, 25)
3         9.71        12.16      12.02     (−20, 20)
4         6.59        10.11      8.64      (0, 30)
5         5.33        6.78       6.77      (−5, 15)
6         5.68        9.87       7.09      (−5, 20)
7         5.15        6.44       6.31      (−10, 15)
8         4.62        4.88       5.58      (−10, 10)
Table 6. Evaluation results of parallel operation accuracy of rice transplanter.

Group (k)   Pl_k (cm)   RMS (cm)   SD (cm)   Mean Spacing (cm)   High-Frequency Interval (cm)
1           13.67       59.38      15.99     57.19               (−30, 30)
2           10.94       53.20      13.82     51.37               (−10, 30)
3           23.34       39.91      25.84     29.58               (−40, 40)
4           7.05        37.81      9.52      36.59               (−15, 15)
5           5.36        19.83      6.78      41.37               (−5, 15)
6           5.66        22.09      7.10      20.92               (−10, 10)
7           5.16        23.09      6.32      37.80               (−15, 10)