1. Introduction
Precision agriculture (PA) [1,2] involves the observation, measurement, and response to inter- and intra-field variability in crops. Precision farming may play an important role in agricultural innovation [3]. PA offers a data-driven and technology-enabled approach to optimize farming practices, increase sustainability, and enhance productivity in the face of field variability. It empowers farmers to make precise and informed decisions, leading to improved crop yields.
Plant counting plays an important role in PA. It runs through almost every critical stage of agricultural production, from seed breeding, germination, cultivation, fertilization, and pollination to yield estimation and harvesting [4]. Several key reasons underline the significance of plant counting. Accurate plant counting helps estimate crop yields, which is crucial for agricultural planning, resource allocation, and market forecasting. It enables farmers to make informed decisions regarding harvesting, storage, and the marketing of their produce. The main applications are as follows. (1) Crop management: Plant counting provides essential information for effective crop management. Knowing before harvesting how many plants have emerged and how they are growing is key to optimizing labor and using resources efficiently [5]. By knowing the population density of plants, farmers can optimize irrigation, fertilization, and pest control practices tailored to specific crop requirements. (2) Plant spacing and thinning: Proper plant spacing is vital for optimal growth and resource utilization. Plant counting helps determine whether plants are evenly spaced, allowing for efficient light penetration, air circulation, and nutrient absorption. It also aids in identifying overcrowded areas, facilitating thinning or replanting actions to achieve the desired plant density. (3) Research and experimentation: Plant counting is crucial in scientific research and experimental studies. It helps researchers assess the effects of different treatments, interventions, or genetic modifications on plant growth, development, and productivity. Accurate plant counting enables reliable and meaningful data analysis, leading to valuable insights and advancements in plant science. (4) Disease and pest monitoring: Plant counting can assist in the early detection and monitoring of plant diseases and pests; indeed, it has a wide range of applications in agriculture, such as crop management, yield estimation, and disease and pest monitoring [6]. By regularly counting plants, farmers or researchers can identify and track the spread of diseases or infestations, and timely intervention measures can be implemented to prevent further damage and minimize crop losses. (5) Plant breeding and genetics: Plant counting is essential in breeding programs and genetic studies. It aids in evaluating plant traits, such as flowering time, fruit set, or seed production, and helps select superior individuals for further breeding. Accurate counting enables breeders to make informed decisions in developing new varieties with desired characteristics [7].
Traditional counting is conducted by humans, which is labor-intensive, time-consuming, and expensive. Current technologies, such as the Internet of Things (IoT) and artificial intelligence (AI), as well as their applications, must be integrated into the agricultural sector to ensure long-term agricultural productivity [8]. These technologies have the potential to improve global food security by reducing crop output gaps, decreasing food waste, and minimizing resource use inefficiencies [8]. Computer vision provides a real-time, non-destructive, and indirect way of estimating horticultural crop yields [9]. Therefore, we choose to combine deep learning with statistical research on plants. Along with the booming development of artificial intelligence and computer vision, it has become more feasible to monitor crops using imagery. For monitoring the stand of crops, the best perspective is the orthomosaic image, which can be obtained via aerial photography. Unmanned aerial vehicles (UAVs), commonly known as drones, are a feasible solution for capturing such images. In the field of plant protection, drones are attracting increasing attention due to their versatility and applicability in a variety of environmental and working conditions [10]. The benefit of using UAVs instead of satellite data is that the data captured via UAV have better spatial resolution, and the temporal resolution is more flexible [11]. When equipped with multispectral sensors, a drone can capture multispectral imagery, which includes richer information for observing and estimating crop growth [12,13]. Furthermore, drones have been used for agricultural crop research. For example, one study evaluated the response of coffee seedlings transplanted to areas subjected to deep liming in comparison to conventional (surface) liming, using vegetation indices (VIs) generated from multispectral images acquired by UAVs [14].
In existing works on plant counting [5,15], the most frequently used images are composed of the three visible light (VIS) bands: the blue band (400 to 500 nm), green band (500 to 600 nm), and red band (600 to 700 nm), which cover the range of wavelengths visible to the human eye. In addition to the visible light bands, multispectral bands can be captured by a UAV equipped with multispectral sensors. The essence of multispectral data lies in the reflection of light at various discrete wavelengths; each value describes the reflectance of a point at a particular wavelength. Therefore, multispectral data can be regarded as images, and deep learning techniques for image processing can also be applied to them [16].
Some spectral bands are significant for agriculture and provide valuable information for crop monitoring and analysis [12]. For example, the near-infrared (NIR) band (700 to 1300 nm) provides insights into plant health, photosynthetic activity, and water content. It is particularly useful for monitoring vegetation vigor, detecting stress conditions, and estimating biomass in crops [17]. The red-edge (RE) band (700 to 750 nm) captures subtle changes in chlorophyll absorption and can indicate growth stages, nutrient status, and stress conditions in crops. It is useful for differentiating between healthy and stressed vegetation [18]. Hence, for the specific purpose of plant counting, we propose to take advantage of more relevant bands in the multispectral range besides RGB.
In recent years, advanced technologies such as artificial intelligence, computer vision, machine learning, and deep learning have been widely introduced into the agriculture industry [19]. Many traditional tasks in agriculture, such as plant counting and plant disease identification, have achieved excellent performance with these methods [20,21]. Deep learning is a data-driven method that generally requires large-scale data to train the network. By using UAVs, data collection becomes easier and faster. Combined with computer vision, automated plant counting becomes promising with deep learning methods.
Tobacco is a highly valuable crop due to its economic significance in the global market. Tobacco cultivation and the tobacco industry contribute significantly to the economies of many countries. They provide income and employment opportunities for farmers, laborers, and workers involved in various stages of production, processing, and distribution. The global tobacco market is substantial, and tobacco products continue to be in demand worldwide. In addition, tobacco is an important commodity for export earnings and value-added products [22].
Tobacco is cultivated annually. The plant is first germinated in cold frames or hotbeds and then transplanted to the field. The time from transplanting to harvest depends on the variety, generally around 60 to 90 days, but within a range of 45 to 120 days [22]. During cultivation in the field, UAVs can be used for monitoring the number of stands after transplanting, as well as the number of plants during growth. The yield of tobacco largely depends on the number of viable tobacco plants because the leaves are what is harvested from the plant. Hence, tobacco plant counting is also relevant for yield estimation.
In this work, we propose to utilize images captured via UAV for automatic tobacco plant counting. The images include not only visible images but also multispectral images. Our contributions can be summarized as follows: (1) we propose to use multispectral images for plant counting; (2) we create a dedicated multispectral tobacco plant dataset; (3) because our input data have different numbers of channels, we modify the architecture of YOLOv8 and design a post-processing algorithm to count the stands of a large field.
3. Methods and Materials
The YOLO series has become a widely used family of one-stage algorithms for object detection [46]. In this study, we used YOLOv8 as the baseline network. The framework of YOLOv8 is shown in Figure 1. YOLOv8 is a deep learning-based OD algorithm that enables fast, accurate, and robust OD and instance segmentation on high-resolution images. It includes three components: the backbone, the neck, and the head.
3.1. Backbone Network
The backbone network is a convolutional neural network used for extracting image features. YOLOv8 uses a similar backbone to YOLOv5, with some changes to the CSPLayer, now called the C2f module [43], which enhances gradient flow and feature fusion, thereby improving the feature representation capabilities. The C2f structure consists of multiple C2f modules, with each C2f module comprising two convolutional layers and a residual connection. The first convolutional layer reduces the number of channels, while the second convolutional layer restores the channel count.
C2f uses a 3 × 3 convolutional kernel as the first convolutional layer, allowing it to adapt the channel count differently for models of different scales, without following a fixed set of scaling factors. This enables the selection of an appropriate channel count based on the model’s size, avoiding overfitting or underfitting. C2f employs a four-stage backbone network, with each stage consisting of 3-6-6-3 C2f modules, yielding feature maps of sizes 80 × 80, 40 × 40, 20 × 20, and 10 × 10, respectively. This allows for the extraction of multi-scale feature information to accommodate objects of various sizes.
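As a rough illustration of the structure described above, a simplified C2f-style block could be sketched in PyTorch as follows. This is our own simplification for exposition, not the exact Ultralytics implementation; layer names and the exact channel arithmetic are assumptions.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Two 3x3 convolutions with a residual connection (simplified)."""

    def __init__(self, c: int):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 3, padding=1)
        self.cv2 = nn.Conv2d(c, c, 3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.cv2(self.act(self.cv1(x)))


class C2fLike(nn.Module):
    """Simplified C2f-style block: reduce channels, run n bottlenecks, concatenate, restore."""

    def __init__(self, c_in: int, c_out: int, n: int = 3):
        super().__init__()
        c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * c, 3, padding=1)  # first conv reduces channels
        self.blocks = nn.ModuleList(Bottleneck(c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * c, c_out, 1)       # second conv restores the channel count

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for m in self.blocks:
            y.append(m(y[-1]))
        return self.cv2(torch.cat(y, dim=1))
```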
In our work, we modify the first layer of the backbone network to adapt to different input data. In addition to traditional RGB images, which have three input channels, our data have more diverse channel combinations. The visible RGB color images have three channels, and the red, green, and near-infrared (RGN) combination images also have three channels. The single-channel inputs include the narrow red band (NR), narrow green band (NG), near-infrared band (NIR), and red-edge band (RE). In addition, all of the bands are combined into a seven-band combination. Therefore, the structure of the first layer has to be modified to fit the input data, as shown in Figure 2.
In order to avoid overfitting, we adopt transfer learning and load pre-trained weights before training with our target domain data. Since the structure of YOLOv8 is modified, the pre-trained weights cannot entirely match the modified structure. We therefore load all weights of the pre-trained model except those of the first convolutional layer.
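As a rough sketch of this adaptation in plain PyTorch, the first convolution can be swapped for one with the desired number of input channels, and the remaining pre-trained weights loaded with `strict=False`. The layer path `model.0.conv` and the assumption that the checkpoint holds a plain state dict are illustrative, not the exact Ultralytics internals.

```python
import torch
import torch.nn as nn


def adapt_first_conv(model: nn.Module, in_channels: int, first_conv_name: str = "model.0.conv"):
    """Replace the first convolution so the backbone accepts `in_channels` bands.

    `first_conv_name` is illustrative; the real attribute path depends on the
    YOLOv8 implementation being used.
    """
    modules = dict(model.named_modules())
    old_conv = modules[first_conv_name]
    assert isinstance(old_conv, nn.Conv2d)

    new_conv = nn.Conv2d(
        in_channels,
        old_conv.out_channels,
        kernel_size=old_conv.kernel_size,
        stride=old_conv.stride,
        padding=old_conv.padding,
        bias=old_conv.bias is not None,
    )

    # Attach the new layer in place of the old one.
    parent_name, _, child_name = first_conv_name.rpartition(".")
    parent = modules[parent_name] if parent_name else model
    setattr(parent, child_name, new_conv)
    return model


def load_pretrained_except_first(model: nn.Module, ckpt_path: str, first_conv_name: str = "model.0.conv"):
    """Load pre-trained weights, skipping the (re-shaped) first convolution.

    Assumes `ckpt_path` stores a plain state dict; real YOLOv8 checkpoints may
    wrap the weights and need unpacking first.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    state = {k: v for k, v in state.items() if not k.startswith(first_conv_name)}
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing, unexpected
```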
3.2. Neck Network
The neck network is used to fuse feature maps of different scales. YOLOv8 uses a PANet structure, which achieves top-down and bottom-up feature fusion, increasing receptive fields and semantic information. The spatial pyramid pooling layer in the SPPF module performs pooling operations with kernels of different effective sizes to extract multi-scale features.
YOLOv8 uses two mechanisms in the neck network: one is a feature fusion mechanism and the other is a feature selection mechanism. YOLOv8 uses a new feature fusion mechanism, which can dynamically adjust the fusion weight according to feature maps of different scales to improve the fusion effect. This fusion mechanism uses an attention module that can calculate the similarity between feature maps at different scales and assign fusion weights based on the similarity. YOLOv8 also uses a new feature selection mechanism, which can select appropriate feature maps as the output according to the needs of different tasks. This selection mechanism uses a classifier that predicts the contribution of each feature map based on the task label and selects the optimal feature map based on the contribution.
3.3. Head Network
The head network is a network used to predict the target category and location. YOLOv8 uses a new decoupled head structure, which can separate classification and regression tasks, improving model flexibility and stability. The decoupled head structure consists of two branches: one is the classification branch, which is used to predict the category probability of each pixel, and the other is the regression branch, which is used to predict the bounding box coordinates of each pixel.
YOLOv8 also uses a new integral form representation, which models the bounding box coordinates as a probability distribution, improving regression accuracy and robustness. The integral form representation consists of two convolutional layers and a softmax layer, which output the offset probability of each pixel in four directions; the final bounding box coordinates are obtained via an integral operation. After obtaining the bounding box coordinates, YOLOv8 uses the CIoU [47] and DFL [48] loss functions for the bounding box loss and binary cross entropy for the classification loss. It also uses a positive and negative sample matching strategy, which determines positive and negative samples according to the IoU (intersection over union) between the bounding box and the ground-truth label. Specifically, if the IoU between the bounding box generated by a pixel and the ground-truth label is greater than or equal to 0.5, the pixel and bounding box are regarded as positive samples; if it is less than 0.4, they are regarded as negative samples; and if it is between 0.4 and 0.5, they are ignored.
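As an illustration of this matching rule, a minimal sketch is given below. It is our own simplified version using axis-aligned boxes in (x1, y1, x2, y2) format and NumPy arrays, not the exact assigner inside YOLOv8.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)


def assign_samples(pred_boxes, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each predicted box as positive (1), negative (0), or ignored (-1)."""
    labels = np.full(len(pred_boxes), -1, dtype=int)
    for i, pb in enumerate(pred_boxes):
        best_iou = box_iou(pb, gt_boxes).max() if len(gt_boxes) else 0.0
        if best_iou >= pos_thr:
            labels[i] = 1
        elif best_iou < neg_thr:
            labels[i] = 0
    return labels
```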
3.4. Hardware
The hardware configuration in this work is as follows: a server equipped with an Intel Xeon Platinum 8336C processor running at 2.30 GHz with 64 logical cores, and 128 GB of Samsung DDR4-2400 ECC RDIMM memory, consisting of 16 modules of 8 GB each.
The UAV used in this work is equipped with an RGB camera and a multispectral camera. The RGB camera captures RGB images, and the multispectral camera, with narrow-band filters, captures four multispectral images: NG (560 ± 16 nm), NR (650 ± 16 nm), RE (730 ± 16 nm), and NIR (860 ± 26 nm).
3.5. Data and Data Pre-Processing
3.5.1. Data
During the 2023 field cultivation season, we captured multispectral images five times at the location (25°01′32.0″ N, 103°59′21.4″ E). The flight altitude was set at 15 m. Starting in May, when the seedlings were transplanted to the field, until August, when the leaves were ready for harvesting, we visited the field approximately every 15 days to collect images of the tobacco plants at various growth stages. A comprehensive dataset is essential to ensure the model's robustness. Additionally, continuous monitoring of the tobacco plants' growth process is valuable for harvest estimation. Plants in different growth stages exhibit distinct appearances and sizes in the orthophoto imagery.
As shown in Figure 3, Figure 3a displays the RGB image captured on 2 June. It is evident that the tobacco plants were quite small immediately after transplantation into the field, with significant gaps between them, and the black plastic film was still present around the plants. As the growth process continued, the plants grew larger and the gaps between them gradually disappeared. By the arrival of August, the tobacco plants had entered the maturity phase, as indicated by the yellowing of lower leaves, signaling their readiness for harvest.
Figure 3e shows the plants after pruning and the first harvest of ripe leaves.
We chose a site for our experiments. A view of the site is shown in Figure 4. In addition to the tobacco field, the site contains many other types of land surfaces, such as lanes, gourds, stones, trees, weeds, and other plants. In the field, corn and soybean are also planted in alternation. Data with such a complex land cover are useful for assessing the robustness of our method.
Each shot captured resulted in five different spectral images: the RGB image, NG image, NR image, RE image, and NIR image, as shown in Figure 5. For this site, 2315 shots were captured by the UAV, with 463 shots for each spectrum. The shots of each spectrum are stitched together into a complete image of the site. After stitching, we keep only the intersection area of the five spectral images to make sure that no channel includes an empty area.
3.5.2. Data Pre-Processing
For the site, we obtained 503 shots. In order to utilize more bands besides RGB, we merged the RGB, NR, NG, RE, and NIR bands together, as shown in Figure 6a. Another combination of bands is G, R, and NIR, as shown in Figure 6c; it is a combination commonly used in agriculture to detect vegetation. Because the two cameras are mounted at different positions on the UAV, the RGB image and the multispectral images of the same shot are slightly misaligned. Therefore, the coordinates are first calibrated during the merging stage.
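A minimal sketch of this band merging is shown below, assuming the single-band images have already been roughly co-registered to the RGB frame; the simple `shift` translation stands in for the calibration step and is illustrative only.

```python
import numpy as np


def merge_bands(rgb, nr, ng, re, nir, shift=(0, 0)):
    """Stack co-registered bands into 7-channel (Band-7) and 3-channel (RGN) arrays.

    `rgb` is an (H, W, 3) array; the other bands are (H, W) single-band arrays.
    `shift` is an illustrative (dy, dx) translation applied to the multispectral
    bands to compensate for the offset between the two cameras; a real pipeline
    would estimate this alignment per shot rather than use a fixed wrap-around roll.
    """
    def align(band):
        return np.roll(band, shift, axis=(0, 1))

    nr, ng, re, nir = (align(b) for b in (nr, ng, re, nir))

    band7 = np.dstack([rgb, nr, ng, re, nir])  # RGB + NR + NG + RE + NIR = 7 channels
    rgn = np.dstack([nr, ng, nir])             # red, green, NIR; channel order is illustrative
    return band7, rgn
```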
The image after stitching is too large to feed into the deep learning network. Hence, before training, the stitched image is segmented into small patches (640 × 640). The plants in the images are labeled using Labelme.
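A simple tiling sketch follows, assuming the stitched image is an (H, W, C) array; patches that fall short of 640 pixels at the borders are dropped here for simplicity (padding them instead is an equally valid choice).

```python
def split_into_patches(image, patch=640):
    """Split an (H, W, C) stitched image into non-overlapping patch x patch tiles.

    Returns a list of ((y, x), tile) pairs, where (y, x) is the tile's top-left
    corner in the stitched image, useful for mapping labels back to the plot.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append(((y, x), image[y:y + patch, x:x + patch]))
    return patches
```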
4. Experiments and Results
In total, we conducted 103 experiments. We adopted the relevant settings of the base network (YOLOv8), including the settings of the basic parameters. Through a series of experiments and comparisons, we found that when the batch size of the original base network is set to 128 and the number of epochs is set to 200, the model performs best and fluctuates less. In view of this, we adjusted the batch size and number of epochs to better fit our multispectral dataset while keeping the other parameters of the base network unchanged. These experiments can be roughly summarized into three categories: detection performance, the accuracy of plant counting on slices, and the accuracy of plant counting on a large plot.
4.1. Metrics
In this work, we use three common metrics to evaluate the performance: precision, recall, and mAP [49]. A further critical metric is the accuracy of plant counting. Before defining these metrics, the labels of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) must be defined. In OD, a TP refers to a correct detection, where the predicted class matches and the IoU is greater than a threshold. An FP refers to a wrong detection, where the class does not match or the IoU of the detection is below the threshold. FN is the number of missed positive instances. TN does not apply in OD. In this work, because there is only one detection class, tobacco plant, only the positive class is meaningful. TP indicates that a tobacco plant is detected as a tobacco plant; FP indicates that something else is detected as a tobacco plant.
The precision is calculated by dividing the number of TPs by the total number of positive detections, as represented in Formula (1). It indicates the ability of a model to identify only the relevant objects and also gives insight into the model's ability to avoid false positive detections. A high precision indicates a low rate of incorrectly identifying the background or non-target objects as positive detections.
The second metric is recall. It is the ability of a model to find all the relevant cases. It is calculated by dividing the number of true positive detections by the sum of true positive detections and false negative detections, which is represented in Formula (
2). It indicates the proportion of correctly identified positive instances out of all the actual positive instances of a given dataset. A high recall value indicates that the model has a low rate of missing relevant objects and is effective at capturing most of the positive instances.
In Formula (2), the recall value is the ratio of the number of correctly detected tobacco plants to the number of all ground truths. However, it does not directly indicate the plant counting accuracy. Here, it is calculated as an average value over ten different IoU settings from 0.50 to 0.95 with a step of 0.05. It still indicates the counting accuracy to a certain degree.
mAP50 is another staple metric used for object detection. It is a variant of the mean average precision (mAP) metric calculated at a fixed intersection-over-union threshold of 0.5. The threshold of 0.5 is commonly used to determine whether a detection is considered a true positive or a false positive. For every category, the average precision (AP) is calculated as the area under the precision-recall curve. Then, the mAP is the mean value of the AP over all categories. For our specific task, the number of categories is one because we only have one category, tobacco plant.
In addition to the three metrics commonly used for OD, the plant counting accuracy is the most important metric we consider, as described in Formula (3). It is the ratio of the number of predicted stands to the actual number of stands.
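For reference, the prose definitions above correspond to the following simple computations. This is a sketch in our own notation; the paper's Formulas (1) to (3) are the authoritative forms.

```python
def precision(tp: int, fp: int) -> float:
    """Formula (1): fraction of positive detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Formula (2): fraction of ground-truth plants that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def counting_accuracy(n_predicted: int, n_actual: int) -> float:
    """Formula (3): ratio of the predicted stand count to the actual stand count."""
    return n_predicted / n_actual if n_actual else 0.0
```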
4.2. Experiment of Different Bands
This group of experiments aims to explore the performance of different spectrum combinations. In addition to the five kinds of images obtained from the UAV, the Band-7 and RGN combinations introduced in Section 3.5.2 are tested in our experiments.
Each experiment listed in Table 1 was conducted on the images captured in one flight. The tobacco seedlings used in the experiments were planted on 29 April 2023. Images for e1 to e7 were captured 30 days after the tobacco seedlings were planted (on 2 June 2023). Images for e8 to e14 were taken 57 days after planting (on 29 June 2023). Images for e15 to e21 were captured 72 days after planting (on 14 July 2023). Images for e22 to e28 were taken 90 days after planting (on 1 August 2023). Images for e29 to e35 were captured 112 days after planting (on 23 August 2023). Of the images segmented from the complete image, 80% are used for training and the remaining 20% are used for testing, as shown in Figure 4. The data column indicates the date on which the images were taken, and the experiments form five groups according to the date.
Firstly, we can see that the RGN setting achieves the best performance in every group. The images with more channels perform better than the single-channel images, because more channels contain more information. Comparing the four single channels (R, G, RE, and NIR) with the combinations, we find that the RGN combination performs the best. Although Band-7 includes seven channels that cover all of the above bands, it does not show the best results. The reason is that Band-7 contains redundant information, which causes the network to converge slowly or even fail to converge to the lowest point. The red, green, and NIR bands are the three more sensitive spectral bands. Comparing the different dates, we find that 0801 is the lowest-performing group. The reason is that at this stage the plants are at their largest in the life cycle: the gaps between plants are closed and the plants start to merge together, which makes tobacco plant instance detection difficult. The detection results are better in the 0823 group because the first round of mature leaves had been reaped, so the plants were no longer so close to each other.
Plant counting is another critical result of our task. As mentioned before, a correct detection is determined by both the confidence and the IoU. In the group of counting experiments from e36 to e70, the confidence threshold is set to 0.25 and the IoU threshold is set to 0.5. The data used in this group follow the same settings as in Table 1. The GT of plants is the total number of plants in the testing images. The number of plants detected is the sum of the plants detected over the testing set. The counting results are reported in Table 2. The results show that the RGN band combination still achieves the best plant counting results. Using the sliced images, the counting accuracy for all band combinations remains above 95%.
4.3. The Setting of Confidence
The confidence score is a probability associated with each detected object. It represents how confidently the detection belongs to the predicted category. As shown in Figure 7, the value on top of each detection bounding box is the confidence value of the object. Since we only have one category in the detection task, the confidence indicates the probability that the detection is a tobacco plant. We can see that, generally, the confidences of tobacco plants located in the middle of the image are over 50% because they keep their entire shape. Even plants at the border of the image can still be identified, just with a lower confidence. This indicates that YOLOv8 is powerful for OD. We conducted a group of experiments to show the relationship of the confidence with the precision and recall, as shown in Table 3 and Figure 8. e71, e72, and e73 all have a good balance of precision and recall. For the plant counting task, because the recall value is more critical, the confidence is set to 0.25 in the remaining experiments. Generally, when the confidence is smaller than 0.2, the tobacco plant is only a very small part at the border of the image and will not be counted.
4.4. Experiment of Mixed Data of Five Dates
In the previous experiments, we conducted each training and testing on the same date. A model with good robustness is expected to adapt better to different periods. To this end, we conducted a group of experiments with mixed data, from e80 to e86. We use 80% of all the data from the five dates for training and 20% for testing. The results are shown in Table 4. We can see that these metrics remain at a high level. Plant counting is also conducted with the mixed data setting. As shown in Table 5, e87 to e93 follow the same data settings as e80 to e86, respectively. The results show that training the network with the mixed data allows the model to adapt to detection in different periods.
4.5. Experiment Test of the Whole Land
In the previous experiments, we conducted the testing with images that had been cut into pieces, and the counting results were reported as a sum over all pieces. From the perspective of an application, users are more interested in knowing the number of plants in a given field. For this purpose, we designed another experiment that tests on a whole field. We chose a plot named plot-s9 from the testing part, as shown in Figure 9a. It can be sliced into nine slices of 640 × 640 pixels. The reason for using nine slices is that they cover the cases of left, right, top, and bottom neighbors.
In training, the left part is still split into small patches (640 × 640). In testing, it is not reasonable to feed the entire image into the neural network: whether the image is resized or kept at its initial size, the detection accuracy is degraded. When the slice has the same size as the training image, the detection accuracy is the highest. However, the parts of plants located at the edges of the slices are also detected as instances: one plant may be cut into two parts that are detected in neighboring slices, and if a plant is split across two slices, it is probably detected as two instances. For a whole piece of land, it is therefore no longer accurate to count the number of plants by simple summation, because the plants at the borders of the slices may be counted twice. Hence, we design an algorithm to count the plants of a large plot of land. As shown in Figure 10a, a tobacco plant sitting on the cutting line of two slices is separated into two parts, either half and half or a large part and a small part. YOLOv8 is powerful enough to detect the plant even if only a small part is visible. When the slices are put together, the plant is counted twice because it is detected in both slices and the two detected bounding boxes do not intersect. It is hard to tell whether one plant was detected twice or two plants were detected separately.
To address this problem, in testing, the entire test image is split into slices with an overlap area. There are two benefits of segmenting the image in this way. Firstly, if a plant happens to be located on the slicing line, the overlapped slicing makes each of its parts larger and guarantees a higher detection confidence. Secondly, when the detection results of all slices are put together, the detected bounding boxes of the two parts of one plant overlap, as shown in Figure 10b. In that case, only one plant is counted. We conducted two groups of experiments to evaluate this post-processing algorithm. Figure 9a shows plot-s9. The slices covered by a red or green color are 640 × 640. The width of the overlap area is set to 60 pixels. Then, the image is split into nine slices, which are fed into the network.
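A minimal sketch of this overlapped slicing and duplicate-aware counting is given below. The function names, the tile generator, and the merge threshold are our own illustrative choices, not the paper's exact implementation: tiles are generated with an overlap, per-slice boxes are mapped back to plot coordinates, and boxes from different slices that overlap are counted as one plant.

```python
def overlapped_tiles(h, w, patch=640, overlap=60):
    """Return (y, x) top-left corners of patch x patch tiles with the given overlap."""
    step = patch - overlap
    ys = range(0, max(h - patch, 0) + 1, step)
    xs = range(0, max(w - patch, 0) + 1, step)
    return [(y, x) for y in ys for x in xs]


def count_plants(per_tile_boxes, iou_merge=0.1):
    """Count plants over a whole plot from per-tile detections.

    `per_tile_boxes` maps a tile origin (y, x) to boxes in tile coordinates,
    each box as (x1, y1, x2, y2). Overlapping boxes from different tiles are
    treated as the same plant and counted once.
    """
    # Shift boxes to global plot coordinates.
    boxes = []
    for (ty, tx), tile_boxes in per_tile_boxes.items():
        for (x1, y1, x2, y2) in tile_boxes:
            boxes.append((x1 + tx, y1 + ty, x2 + tx, y2 + ty))

    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    # Greedy merge: any box overlapping an already-kept box is a duplicate.
    kept = []
    for box in sorted(boxes, key=lambda b: -(b[2] - b[0]) * (b[3] - b[1])):
        if all(iou(box, k) < iou_merge for k in kept):
            kept.append(box)
    return len(kept)
```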
As shown in Table 6, e94 to e98 use the model trained on the data of the same date as the testing data. Taking e95 as an example, the sum of the detected plants is 272, which is the same as GT-1. However, the true number of plants in plot-s9 is 233; plants located in the overlap strips are obviously counted repeatedly. The counting algorithm we designed counts only once if two detected bounding boxes overlap. As shown in Figure 9b, the plants marked with yellow circles are actually single plants but are detected in two slices at the same time. They are marked because the two or more detected bounding boxes overlap. There are 43 plants that are counted repeatedly. After removing the repeated counts, the final count is 229. After the post-processing, the counting accuracy is as high as 98.28%. We found that e94 achieves the best counting accuracy and e95 the second best. The accuracy then decreases over time. The reason is that the tobacco plants grew larger over time and overlapped, so the detection accuracy of plant instances is affected. However, even on the last date, in e98, the result remains at a high level of 95.65%.
e99 to e103 form another group of experiments using the same model trained with the data of all five dates, with plot-s9 as the testing data. The results are shown in Table 7 and Figure 11. In this group, we found that the plant counting accuracy is even higher than the results shown in Table 6. The reason is that the model is trained on a more comprehensive dataset: in the field, plants that are larger or smaller than average are always present. Using a model trained only on the data of a certain period reduces its robustness. As shown in Figure 11, the plants in the non-overlapping regions are detected accurately, and the plants located in the overlap strips are handled by the post-processing algorithm.
6. Conclusions
Plant counting is a crucial monitoring process in agriculture for determining how many stands exist in a field. Accurate counting forms the foundation for yield estimation. In this work, we use YOLOv8 to count tobacco plants. Different from existing works, we use multispectral images captured via UAV. By using transfer learning, excellent detection results were obtained even though only small-scale data were used in training. In order to make the model suitable for the whole life cycle of tobacco plants in the field, and thus closer to the application purpose, we conducted a large number of experiments and post-processed the output of YOLOv8. The results show that, by using multispectral data, the detection is more accurate than with traditional RGB images. The plant counting results are very close to the ground truth. This indicates that combining UAVs, multispectral data, and advanced DL technologies holds great promise for PA.
Our research aims to bridge the gap between cutting-edge technologies and practical applications in agriculture. By focusing on the life cycle of tobacco plants in the field, from germination to harvest, and tailoring our model accordingly, we strive to provide a comprehensive solution for farmers. The post-processing of YOLOv8 outputs further refines the accuracy of plant counting, bringing it in close alignment with ground truth.
The promising results obtained from our experiments underscore the significance of incorporating UAVs, multispectral data, and advanced deep learning technologies in precision agriculture. This integration holds tremendous potential not only for tobacco cultivation but also for a wide range of crops, paving the way for sustainable and efficient farming practices. As we continue to explore the synergies between technology and agriculture, the impact on yield estimation, resource optimization, and overall productivity is poised to be substantial.