2.1. Structure of the Deficiency Detection and Replanting Positioning (DDRP) Machine
The DDRP machine system for detecting deficient seedlings is an automated machine with four independent wheels, including two front wheels driven by in-wheel stepper motors, allowing smooth forward and backward movement with responsive maneuverability. The main structure is a near-cubic frame (55 cm × 55 cm × 62 cm, L × W × H) constructed from aluminum extrusion (Figure 1). All sides except the bottom are covered with 5 mm thick black foam boards to block all visible light (Figure 1). A plug tray is placed under the DDRP system for deficiency detection and replanting positioning tests (Figure 2).
On the top side, there are two cameras for different purposes: a depth camera (RealSense Depth Camera D435i, Intel Co., Santa Clara, CA, USA) serving as the primary sensor for image recognition and distance measurement, and a surveillance camera (W200, HP Inc., Palo Alto, CA, USA) providing wide-area monitoring of the entire work area. Compared with earlier alternatives such as the Azure Kinect DK and high-cost laser rangefinders, the Intel® RealSense™ D435i offers a balanced solution in terms of size, weight, cost, and system compatibility, making it particularly suitable for small-scale agricultural systems. Functionally, the camera combines an RGB lens with two infrared rangefinders to obtain distance values, and some models add a six-axis accelerometer module that further enhances spatial awareness. Whereas distance estimation from lens proportion alone is error-prone, the infrared ranging module keeps the starting position and angle of every measurement consistent, reducing measurement error and making it easier to capture target distance information. This consistency improves the reliability of seedling detection and spatial mapping, and the modular nature of these components allows seamless integration with edge computing platforms across various deployment environments.
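As a point of reference, the sketch below shows one way the D435i's RGB and depth streams could be read on the Raspberry Pi through Intel's pyrealsense2 bindings; the stream resolutions, frame rate, and the probed pixel are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal sketch: reading RGB and depth frames from the RealSense D435i.
# Resolutions, frame rate, and the probed pixel are illustrative assumptions.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    color_frame = frames.get_color_frame()
    depth_frame = frames.get_depth_frame()

    # RGB image for the recognition pipeline (BGR order, as OpenCV expects).
    rgb_image = np.asanyarray(color_frame.get_data())

    # Depth (in meters) at the image center, retained for secondary verification.
    center_distance = depth_frame.get_distance(320, 240)
finally:
    pipeline.stop()
```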
Three LED strips (5050 white LED, 500 mm length, 5 V, China) are attached to the top and front sides to illuminate the plug trays and aid image recognition. Two stepper motors (24 V, YH57BYGH51-402A, Zhejiang Yuhui Electronics, Yueqing City, China) provide high torque for the front wheels, driving the machine's movement. Additionally, two stepper motors drive the linear x/y-axis actuator that positions the replanting point: one (3.7 V, 17HS1352-P4130, Shenzhen Rtelligent Technology Co., Ltd., Shenzhen, China) for X-axis movement and another (24 V, 17PM-KA39B, Shenzhen Rtelligent Technology Co., Ltd., Shenzhen, China) for Y-axis movement (Figure 2).
All electronic control devices and wiring are installed on the rear side, including a Raspberry Pi 4B (Raspberry Pi Ltd., Cambridge, UK), a Programmable Logic Controller (PLC) (DVP-285V, Delta Electronics, Inc., Taipei, Taiwan), a communication adaptor (CH340-chip USB to RS-485 converter adaptor, Shenzhen, China), and a stepper motor driver (TB6600, Sysmotor, Sys Tech. Co., Ltd., Dongguan, China) (Figure 2). The Raspberry Pi 4B is connected to the DVP-285V PLC through the USB to RS-485 converter adaptor. Additionally, a red laser target designator (RLTD1, Bulcomers KS Ltd., Sofia, Bulgaria) is used to point out deficient seedlings on the 128-cell plug trays (60 cm × 30 cm, L × W, DD128, Wen-Kang Plastic Inc., Ltd., Nantou, Taiwan).
2.2. Design of the Image Control System on Raspberry Pi for Deficiency Detection
2.2.1. Image Processing Suite
In image deep learning, three main algorithms are commonly used: the R-CNN series (R-CNN, Fast R-CNN, Faster R-CNN), the Single Shot Detector (SSD), and YOLO (You Only Look Once) [10]. Although R-CNN was the earliest of these and delivers highly accurate results, it has a significant drawback: its slow speed. Faster approaches such as YOLO were developed to address this issue. YOLO is a convolutional neural network that predicts multiple box positions and categories simultaneously, achieving end-to-end object detection and recognition, with speed as its key advantage. OpenCV (Open Source Computer Vision Library), by contrast, is a cross-platform computer vision library initially developed by Intel and available for free in both commercial and academic applications.
Considering cost factors, OpenCV is entirely open-source and free, eliminating licensing fees or subscription costs. It operates efficiently on standard CPUs without requiring specialized hardware, and training custom models with OpenCV demands significantly fewer computational resources compared to deep learning frameworks. In contrast, YOLO implementation incurs higher costs due to the necessity of expensive hardware (such as GPUs or TPUs) for training and inference. Additionally, training a customized YOLO model is computationally intensive because of its extensive parameters.
While Tiny-YOLO and MobileNet SSD are designed for resource-limited devices, their implementation requires model training, dataset refinement, and specialized hardware acceleration, adding complexity and increasing deployment costs. The primary objective of this study is to develop an affordable, real-time detection system that nurseries can adopt with minimal investment. By leveraging the Haar cascade combined with optimized image pre-processing, we achieved reliable detection while significantly reducing hardware and computational requirements. Therefore, OpenCV is selected for this study instead of YOLO.
2.2.2. Processing of RGB Images in HSV Color Space with Grayscale Conversion and Otsu Thresholding
The RGB image control and processing system relies on a depth camera and a Raspberry Pi 4B single-board computer to analyze image data efficiently. Two distinct image-processing strategies have been implemented in this study (Figure 3):
(1) Process #A (Grayscale and Otsu thresholding, GO): Captures an RGB image of the plug tray ⟶ Converts the image to grayscale ⟶ Applies Otsu thresholding to segment the image ⟶ Applies photomask processing to refine detection ⟶ Performs image recognition using OpenCV’s Haar cascade algorithm to identify targets ⟶ Extracts pixel coordinates of the detected object ⟶ Sends coordinate data via Modbus to the PLC control system for further processing.
(2) Process #B (HSV, Grayscale, and Otsu thresholding, HGO): Captures an RGB image of the plug tray ⟶ Applies HSV conversion, followed by grayscale processing ⟶ Uses Otsu thresholding on both the HSV-converted grayscale image and the directly processed grayscale image ⟶ Merges the outputs from both paths before proceeding with photomask processing ⟶ Utilizes OpenCV’s Haar cascade algorithm for image recognition ⟶ Extracts pixel coordinates of the identified object ⟶ Transmits these data via Modbus to the PLC control system for precise execution.
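To make the HGO pipeline concrete, the following OpenCV sketch outlines Process #B's pre-processing path under stated assumptions: the input file name and the use of the V channel as the grayscale source after HSV conversion are illustrative choices, while the capture-box mask (210, 10) to (530, 270) follows the coordinates given in Section 2.2.5.

```python
# Illustrative sketch of Process #B pre-processing (HGO); file name and
# HSV-to-grayscale choice are assumptions, not the study's exact settings.
import cv2
import numpy as np

frame = cv2.imread("plug_tray.jpg")  # RGB capture of the plug tray (assumed file)

# Path 1: direct grayscale conversion + Otsu thresholding.
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, bin_gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Path 2: HSV conversion first; here the V (value) channel serves as the
# grayscale source -- an assumption about how "HSV then grayscale" is realized.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
_, bin_hsv = cv2.threshold(hsv[:, :, 2], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Merge the two binarized paths, then apply a photomask that blacks out the
# perimeter outside the capture box (210, 10)-(530, 270).
merged = cv2.bitwise_and(bin_gray, bin_hsv)
mask = np.zeros_like(merged)
mask[10:270, 210:530] = 255
pre_processed = cv2.bitwise_and(merged, mask)
```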
The Raspberry Pi was used as the central processing unit for RGB image processing. The RGB image stream from the depth camera was fed into the system, with depth distance data retained temporarily for secondary verification if necessary. Initially, the RGB images were converted to grayscale and processed using the Otsu thresholding method [11] (Figure 4). This method automatically selects an optimal threshold based on pixel values, minimizing intra-class variance within the image. Proposed by Nobuyuki Otsu [11], it equivalently maximizes inter-class variance, since the total variance of the image remains constant [12]. The method’s ability to find a suitable threshold under varying lighting conditions makes it ideal for this study, as external environmental light sources significantly impact image processing and threshold selection.
To mitigate the influence of irrelevant image areas, photomasks were applied around the perimeter, reducing the impact of color variations from unrelated objects (such as the floor) on the binarization results (Figure 4). The relevant programs were implemented using Python’s OpenCV 4.5.5 package. However, even after direct binarization, there was still excessive noise, so morphological operations were applied to clean up the results. Performing two mathematical morphology operations, opening and closing, yielded satisfactory outputs: the opening operation involves erosion followed by dilation, while the closing operation involves dilation followed by erosion.
Dilation is a fundamental operation in mathematical morphology, initially developed for binary images but now extended to grayscale images and complete lattices. The dilation operation typically employs a structuring element to probe and expand the shapes present in the input image. Erosion, the other fundamental operation in morphological image processing, typically employs a structuring element to probe and reduce the shapes present in the input image.
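As an illustration, the two operations can be chained in OpenCV as sketched below; the 3 × 3 structuring element is an assumed parameter, not a value reported in this study.

```python
import cv2
import numpy as np

# Binarized image from the pre-processing stage (file name is assumed).
binary = cv2.imread("pre_processed.png", cv2.IMREAD_GRAYSCALE)

kernel = np.ones((3, 3), np.uint8)  # assumed 3 x 3 structuring element
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)    # erosion, then dilation
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # dilation, then erosion
```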
In the experiment, the addition of LED strips (5050 white LED strips) improved stability and allowed us to obtain more consistent parameters (Figure 2). To mitigate the impact of external environmental changes during movement, light-blocking foam boards were installed above and around the test bench. Initially, when simply turning off external light sources, the Otsu thresholding method still lost many fine details due to the wide color domain. To address this, the approach was modified by segmenting problematic color regions using photomasks and creating separate image sources for subsequent binarization. Finally, the binarization results from these segmented images were combined to retain more feature details.
To enhance the RGB images of the plug trays and seedlings, the images were first converted to HSV color space, then processed with grayscale conversion and Otsu thresholding, refined with erosion and opening operations, and merged using the cv2.bitwise_and command from the OpenCV 4.5.5 package. All processed images were further refined with photomasks for image recognition in machine learning, identifying the pixel coordinates of targets, such as the empty cells in the plug trays, on the Raspberry Pi. Because the Raspberry Pi and the PLC modules use different coordinate systems, all target pixel coordinates were converted to a 0–255 scale. These rescaled coordinates were then transmitted to the PLC module via the Modbus protocol. The PLC module used the received pixel coordinates to detect and pinpoint the empty cells in the plug trays and to control the machine’s movement (Figure 5).
2.2.3. Machinery Image Recognition
There are various methods for real-time recognition of missing plant cells in seedling plug trays. While deep learning has recently gained popularity, earlier machine learning approaches can also accomplish this task. The goal of this research is to identify missing plants. As discussed in the hardware introduction, the choice of algorithm impacts hardware selection. For example, using a neural-network-based deep learning method for real-time recognition requires a display chip with high computing capability and a graphics card with a high number of Compute Unified Device Architecture (CUDA) cores for real-time computing. Alternatively, if portability and compact size are essential, one must opt for machine learning algorithms that impose less hardware burden. In this study, we utilize OpenCV’s Haar cascade algorithm for the image recognition of empty cells in each plug tray.
2.2.4. Haar Cascade Algorithm
Introduced by Ali et al. [13], the Haar cascade classifier detects object features and remains effective for real-time recognition thanks to hardware advancements. This machine-learning algorithm, included in OpenCV’s sample programs, is designed for object detection in images and videos. Training the algorithm requires placing images of objects to be identified in a folder of positive samples, while targets that should not be identified go into a folder of negative samples, which are used as backgrounds; mixing the target with negative samples can cause training failure. In this experiment, the method was used to identify empty cells in seedling plug trays, with images binarized into white and black parts so that the empty cell’s shape characteristics are fixed. The training process included placing the target samples into fixed-size images, collecting pre-processed samples, and using random internet images as negative samples. Training feature files for the Haar cascade classifier uses a dedicated training mode in OpenCV. The classification algorithm, derived from the AdaBoost algorithm and the Probably Approximately Correct (PAC) model, screens rectangular features to construct weak classifiers. These classifiers, combined with suitable weight parameters, form a strong classifier; weak classifiers need only exceed 50% accuracy to be used. Finally, multiple strong classifiers are combined into a cascade classifier [13].
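A minimal detection sketch using OpenCV's CascadeClassifier is given below; the trained cascade file name and the detectMultiScale tuning values are assumptions for illustration.

```python
import cv2

# Cascade trained on binarized empty-cell samples (file name is assumed).
cascade = cv2.CascadeClassifier("empty_cell_cascade.xml")
image = cv2.imread("pre_processed.png", cv2.IMREAD_GRAYSCALE)

# Scan the image; each detection is returned as [x, y, w, h].
# scaleFactor and minNeighbors are illustrative tuning values.
for (x, y, w, h) in cascade.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5):
    print(f"candidate empty cell at ({x}, {y}), size {w} x {h}")
```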
During the recognition process (Figure 6, modified from Figure 1 of [14]), the image to be identified is scanned from the top left corner using the detection box defined in the program. Figure 6 shows a sequential decision-making process for evaluating image blocks using a cascade of classifiers, whose goal is to filter and process only the most relevant image data for advanced analysis. This type of architecture is commonly used in machine vision and image processing systems to: (1) reduce computational load by filtering out irrelevant data early; (2) improve accuracy by applying increasingly strict criteria; and (3) ensure only high-quality or relevant image blocks undergo intensive processing. The more pre-classifier stages a block passes, the higher the probability that it contains the target. The same block may be scanned multiple times, and the program marks duplicates as successful detections. The program parameters are designed based on the set number of passes [13].
Our study prioritizes low-cost automation for small-scale nursery farms, where the adoption of computationally expensive deep-learning approaches presents financial and hardware constraints. While Tiny-YOLO and MobileNet SSD are optimized for edge devices like the Raspberry Pi, they still require GPU acceleration or TPU support for optimal performance, infrastructure that many small-scale commercial nurseries lack. The Haar cascade classifier, though dated, remains computationally lightweight, enabling real-time deficiency detection without excessive processing power. This trade-off ensures affordability and accessibility, addressing the economic constraints of smaller farms that may not be able to afford deep-learning-grade hardware.
2.2.5. Target Coordinate Transmission Section
The original display result is presented as [x, y, w, h], where x and y are the starting pixel coordinates of the successfully detected image block, and w and h are its width and height (Figure 7). The representative value is obtained by extracting the original value and post-processing it. For transmission hardware, external wiring uses a USB to RS-485 converter with a CH340 chip, and the software employs the Serial suite in Python for execution.
The Serial suite transmits 8-bit values, restricted to 0–255; any value above this range or any negative integer forces the program to interrupt execution. The coefficients 0.8 and 0.75 were empirically derived from the physical dimensions of the plug tray and the resolution of the captured image (640 × 480 pixels) (Figure 7). These scaling factors ensure that the pixel coordinates are proportionally mapped to the real-world grid of the tray. The offsets 225 and 10 correspond to the origin shift required to align the image capture box with the actual plug tray layout, compensating for the cropped region used during image acquisition, from pixel range (210, 10) to (530, 270).
Because the image capture region spans (210, 10) to (530, 270) and the exterior is masked black, raw pixel coordinates can fall outside the 0–255 transmission range; without additional processing, transmission fails and the Raspberry Pi terminal program is interrupted. Therefore, target coordinate values are converted using the following equations before transmission to the PLC:

x′ = 0.8 × (x + w/2 − 225)

y′ = 0.75 × (y + h/2 − 10)

where x and y represent the top-left pixel coordinates of the detected image block, w and h denote the width and height of the image block, and x′ and y′ are the converted coordinates used by the PLC to control stepper motor positioning.
Before transmission, the floating-point part is rounded, and the center of the calibration box is used as the positioning coordinate. The values 225 and 10 remove unnecessary margins based on the starting coordinates of the image capture box; the difference of 15 from 210 accounts for the left and right margins of the cell tray, the center point of the first cell grid usually lying near coordinate 225. The vertical range is detected as fully as possible without special modification. The multipliers 0.8 and 0.75 limit values to 0–255, balancing accuracy and size. If out-of-range values occur, values less than 0 are corrected to 0 and values greater than 255 are corrected to 255; preliminary experiments show raw values between −1 and 256, so these corrections have minimal impact.
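The conversion, rounding, and clamping steps described above might be implemented as in the following sketch, which mirrors the equations given earlier; the serial port name and baud rate are assumptions, while the constants 0.8, 0.75, 225, and 10 are taken from the text.

```python
import serial  # pyserial


def to_plc_coords(x, y, w, h):
    """Convert a detection box [x, y, w, h] to 8-bit PLC coordinates."""
    # Center of the calibration box, shifted by the capture-box offsets
    # (225, 10), then scaled by 0.8 / 0.75 to fit the 0-255 range.
    xp = round(0.8 * (x + w / 2 - 225))
    yp = round(0.75 * (y + h / 2 - 10))
    # Clamp out-of-range values (preliminary tests saw -1 to 256).
    xp = min(max(xp, 0), 255)
    yp = min(max(yp, 0), 255)
    return xp, yp


# Port name and baud rate are assumptions for the USB-to-RS-485 adaptor.
with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:
    xp, yp = to_plc_coords(300, 120, 40, 40)
    port.write(bytes([xp, yp]))  # each value fits in one 8-bit byte
```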
To prevent coordinate accumulation, the coordinate set is re-initialized as an empty set before the detection program block, allowing direct transmission and a fresh empty set before each detection. Pre-processed images, real-time images, and real-time light information are displayed in a window (Figure 8). Each time the PLC performs a new action, a screenshot of the window is synchronized and automatically saved to the specified folder.
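A compact sketch of this reset-and-record logic is shown below; the folder path, timestamp format, and helper structure are illustrative assumptions.

```python
import os
import time
import cv2

targets = []  # cleared before every detection pass to prevent accumulation


def detect_and_record(image, cascade, save_dir="screenshots"):
    """One detection pass: reset the coordinate set, detect, save a snapshot."""
    targets.clear()  # fresh, empty set of coordinates for this pass
    for (x, y, w, h) in cascade.detectMultiScale(image):
        targets.append((x, y, w, h))
    # Save a synchronized snapshot of the display each time the PLC acts.
    os.makedirs(save_dir, exist_ok=True)
    cv2.imwrite(os.path.join(save_dir, time.strftime("%Y%m%d_%H%M%S") + ".png"), image)
    return targets
```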
2.2.6. Receiving Pixel Coordinates from Raspberry Pi on PLC
Coordinate reception utilizes the built-in Delta instruction API 80 RS (serial data transmission). Unlike other Modbus instructions, such as RTU (Remote Terminal Unit) and ASCII (American Standard Code for Information Interchange), this instruction does not require a CRC (Cyclic Redundancy Check) tail code, nor does it need station numbers or read/write codes in the header. This streamlines communication, making transmission more efficient and significantly enhancing data transfer capability. It also allows the data storage destination to be specified, facilitating data management and usage. During the experimental phase, direct monitoring verifies that the transmitted coordinates match the values on the Raspberry Pi side.
2.2.7. Conversion of Received Pixel Coordinates to Real-World Coordinates on the PLC
After the Raspberry Pi transmits the signal, the data received by the PLC require conversion: the received coordinates must be mapped to the actual target position of the compensating positioning slide. The coordinate conversion system maps the image pixel coordinate system to the world coordinate system. Instead of standard length units, it is based on the step count of the stepper motor controlled by the PLC (Figure 9), facilitating intuitive control and operation within the program.
The origin of the world coordinate system is at the lower right corner (Figure 10), which is the mechanism’s reference origin. Each time the program starts or a positioning action is completed, the red laser point indicating the position is reset to this origin. Programming control uses the PLC instruction API 203 SCLP (parameterized proportional direct operation instruction) (Delta Electronics, Inc., Taipei, Taiwan), which performs proportional calculations on input values to achieve the conversion.
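Functionally, SCLP performs a linear (proportional) mapping from the received 0–255 value to a stepper motor step count. The Python analog below sketches that calculation; the total step count across the slide travel is an assumed placeholder, since the actual value is not stated here.

```python
def sclp_scale(value, src_min=0, src_max=255, dst_min=0, dst_max=20000):
    """Linear mapping analogous to Delta's API 203 SCLP instruction.

    dst_max (the slide travel expressed in stepper motor steps) is an
    assumed placeholder, not a value reported in this study.
    """
    return dst_min + (value - src_min) * (dst_max - dst_min) / (src_max - src_min)


# A received mid-range coordinate (128) maps to roughly half the travel.
steps = round(sclp_scale(128))  # -> 10039 steps with the assumed range
```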
2.4. Experimental Setup
The main equipment used in this study is a custom-designed deficiency detection and replanting positioning (DDRP) system on a pilot-scale automated DDRP machine (Figure 1 and Figure 2). This machine is equipped with a moving function to simulate track travel with a fixed direction during actual operation. The steering wheels, controlled by stepper motors, maintain a fixed direction to reduce direction errors caused by speed discrepancies between the driving wheels.
The system module includes an image recognition and positioning system with a laser designator that marks the actual replanting position with a red laser beam. This allows the program to be corrected according to the laser pointing and confirms the accuracy and location of the missing seedling cells in the plug trays.
The plug trays have been used in the Taiwanese seedling industry for over 30 years and come in various specifications, from 72 to 406 cells per tray. The cell shapes also vary, including basic square and round shapes as well as special star and triangular shapes. In this study, a 128-round-cell (16 × 8 cells) plug tray measuring 60 cm × 30 cm was selected, with artificial seedlings used instead of live seedlings because the indoor experiments lacked adequate solar radiation (Figure 12 and Figure 13).
This approach was necessary due to laboratory constraints, particularly the lack of adequate solar radiation indoors. A fully controlled indoor environment provided uniform lighting conditions, minimizing external variables that could affect image processing accuracy. This ensured that the machine vision system’s core performance was evaluated based on algorithmic precision rather than environmental fluctuations.
Additionally, since seedling replenishment typically occurs between 6 and 10 days after germination, maintaining a uniform appearance was crucial for recognition accuracy. Real seedlings at different growth stages exhibit variations in shape and texture, potentially compromising the system’s ability to assess fundamental detection capabilities. Using standardized artificial seedlings allowed for consistent characteristics across trials, ensuring reliable evaluation of image processing techniques and deficiency detection algorithms. While artificial seedlings helped maintain controlled experimental conditions, we acknowledge that real-world applications require further validation with live seedlings under diverse lighting and environmental settings.
Seedling replenishment typically occurs 6–10 days after germination, with slight variations between summer and winter. Although a depth camera is used for image recognition, an additional network camera is mounted on top of the experimental stage to record the experimental process. This network camera captures the landing point of the laser marker and details of the entire experimental process for reanalysis (Figure 2). The experimental process is recorded using the network camera and a screen recording program. Data such as the current experiment time, PLC program data monitoring, and remote program monitoring from the Raspberry Pi are recorded for secondary analysis and confirmation. The total number of additional replanting actions, the duration of the experiment, and the number of successful seedling captures are also automatically captured in screenshots on the Raspberry Pi, saving coordinates and pre-processed images.
Based on practical experience in commercial seedling nurseries, a manual seeding process typically results in about 5 to 7 deficient seedlings per plug tray, with a maximum of 10. Therefore, in the systematic test experiment, 10 deficient seedlings were pre-set for each plug tray (i.e., 10 empty cells placed randomly on each plug tray). The positions of the 10 deficient seedlings were selected using a computer random function from cell numbers 1 to 128 (Figure 12). The artificial seedlings were removed before each experiment, the missing seedlings’ cell numbers were recorded, and then new deficient positions were randomly selected again.
While this study was conducted in a controlled indoor environment, real-world field validation is a crucial next step in assessing the system’s robustness. The indoor setup ensured consistent lighting and environmental conditions, allowing a reliable evaluation of image-processing techniques and maneuver control strategies without external interference. To maintain uniformity in seedling shape, size, and color, artificial seedlings were used as experimental targets. This approach minimized growth-related variability that could affect detection accuracy, ensuring consistent evaluation parameters across trials. However, the controlled laboratory setting does not fully account for challenges such as variations in light intensity, occlusion effects, and complex backgrounds. Despite these limitations, the system incorporates photomask processing and HSV color-space conversion, significantly improving image segmentation performance under non-uniform lighting conditions.
The Effects of Different Image Processing Methods
The two image pre-processing methods, GO (grayscale with Otsu thresholding) and HGO (HSV combined with grayscale and Otsu thresholding), were compared in the control process design. The variable in the experiment was the image processing method, and the PLC adopted the batch mode of the DDRP system with a fixed advance distance (Figure 10). Relevant experimental data were recorded, and all artificial seedling replanting was carried out manually immediately after deficiency detection and positioning by the laser designator.
In this phase of the experiment, the image pre-processing methods of Otsu thresholding and HSV color-space conversion were used, and photos of the results were retained for subsequent analysis. The machine operated in batch detection mode (BD mode), detecting the target after moving a fixed distance. When the results of machine learning and image recognition are discussed separately, they are presented using the confusion matrix commonly used in machine learning (Table 1) [15]. Two of the most commonly used metrics for classification are precision and recall, calculated using the following equations:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
For this study, each result was summed and manually identified and recorded once per experiment. Thirty trials were run using Otsu thresholding processing alone, and their average was used as the benchmark for comparing the total results. Ninety trials were processed with mixed HSV color-space conversion; these were individually calculated, and 30 of them were then randomly selected for comparison under the same benchmark. In practice, seedling replenishment is carried out 6–10 days after germination, with slight variations in timing between summer and winter. To enable reuse and ensure uniformity in seedling age and morphological characteristics, this study employed green-colored paper and green iron wire to fabricate simulated 10-day-old cabbage seedlings as experimental materials (Figure 13).
The experiment was carried out using the same image pre-processing method, but the HSV color-space conversion program was introduced for pre-processing. Variables in the experiments included the PLC’s movement and detection control programs. Experiments applied both (i) continuous detection mode without a fixed distance (Figure 10) and (ii) batch detection mode with a fixed forward distance (Figure 11), and relevant data were recorded. All artificial seedling replanting was performed manually immediately after deficiency detection and positioning by the laser designator.