1. Introduction
In modern agricultural production, cotton, as a vital raw material for the global textile industry, plays a crucial role in ensuring the supply of textiles [
1]. Unfortunately, the growth of cotton is frequently plagued by various pests and diseases, which not only threaten the yield and quality of the crop, but also potentially incur economic losses for farmers [
2]. Traditional methods of pest and disease identification, mostly reliant on farmers’ experience and manual inspection, are time-consuming and labor-intensive. Moreover, their accuracy heavily depends on the individual skills of farmers, making these methods unsuitable for large-scale modern agriculture [
3].
With the rapid development of information technology, particularly breakthroughs in deep learning and computer vision, new possibilities for intelligent detection of agricultural pests and diseases have emerged [
4,
5,
6]. Advanced techniques, such as deep learning, enable automatic analysis of field images of cotton, accurately identifying types and locations of pests and diseases [
7], thereby improving the efficiency and accuracy of pest and disease management [
8]. Particularly in large-scale farm management, this approach significantly reduces manual labor, enhancing the speed and precision of pest and disease identification [
9].
A model for detecting and classifying cotton plant diseases based on convolutional neural networks (CNNs) was developed and analyzed by Suriya et al., utilizing multiple convolutional and max pooling layers to extract features from images of cotton plants [
10]. However, the accuracy of their model was not notably high. In contrast, Zambare et al. proposed a deep learning model based on CNN, achieving a detection and classification accuracy of 99.38% for various diseases in cotton plant images [
11]. Yet, the performance of their model in complex field scenarios could not be guaranteed. Addressing this, Rai et al. proposed an improved Deep Convolutional Neural Network (DCNN) model for identifying and predicting different disease states in cotton plant images collected from real environments [
12]. These researchers utilized deep learning technologies, particularly models based on CNN, to enhance the accuracy of detection and classification of diseases in cotton plants. Their research not only improved the accuracy of cotton disease detection but also provided effective technological means for the early diagnosis and prevention of agricultural diseases.
Pankaj et al. introduced a model for predicting cotton diseases based on Internet of Things (IoT) hardware sensors and CNN algorithms, assisting farmers in identifying cotton diseases and recommending appropriate pesticides through a mobile application [
13]. Additionally, Shao et al. [
14] proposed a model for identifying cotton leaf diseases, enhanced by a bilinear coordinate attention module. This model, operating in natural environments, precisely locates and identifies diseased areas on cotton leaves. It integrates spatial coordinate information with features by embedding them into the feature map through the bilinear coordinate attention mechanism, reducing the loss of feature information and focusing more on diseased leaf areas while minimizing attention to redundant information, such as healthy areas. Although significant progress in cotton disease recognition was made by the teams of Pankaj and Shao, demonstrating the vast potential of utilizing IoT and deep learning technologies in practical applications, their models required substantial computational resources. Therefore, further improvements and optimizations of these technologies are crucial.
Existing pest and disease recognition technologies must overcome a series of challenges to achieve widespread application in the agricultural sector [
15]. On one hand, due to the diversity of cotton pests and diseases and the fact that some have minute appearance features or resemble natural surroundings, accurate identification becomes increasingly difficult. On the other hand, complex and variable field environments, such as lighting conditions and background noise, can affect the effectiveness of pest and disease recognition. Furthermore, most existing recognition models usually require substantial computational resources, making them unsuitable for applications on mobile devices or edge computing devices [
16,
17].
In response to these challenges, this study presents an innovative solution combining deep learning, knowledge graphs, and edge computing. Firstly, deep learning models, particularly those utilizing Transformer technology [
18], are effectively able to process complex image data, thereby enhancing the accuracy of pest and disease detection. Transformer technology, originally designed for natural language processing tasks, is centered around the self-attention mechanism. This mechanism enables the model to process all elements in the input sequence simultaneously and compute the relationships among them. In the context of image processing, this implies that the model not only focuses on local features, as traditional convolutional neural networks do, but also captures and analyzes global relationships within the image. Such a global perspective allows Transformers to more effectively recognize complex patterns in images, such as subtle changes indicative of pests and diseases. When analyzing images of cotton plant diseases, Transformer models are capable of simultaneously noting different parts of the leaves and understanding the interrelationships among these parts, which is particularly important for disease feature recognition. Additionally, Transformer models typically possess deeper network structures, further enhancing their feature extraction and recognition capabilities. Compared to traditional deep learning models, Transformers provide a higher level of abstraction, making them more effective in processing image data with complex structures and patterns, thus achieving higher accuracy and robustness in pest and disease detection applications.
Secondly, by constructing a knowledge graph specifically for cotton pests and diseases, the integration of domain experts’ knowledge with deep learning models further improves the recognition capabilities and accuracy. The rich information contained in the knowledge graph assists the model in making more precise judgments during the identification process.
Additionally, this study considers the practical application scenarios of the model, especially in resource-constrained mobile or edge computing environments. By optimizing the model structure and computational strategy, the model is enabled to run efficiently on these devices, achieving rapid pest and disease detection. This not only enhances the timeliness of pest and disease management but also provides significant technical support for the intelligent automation of agricultural production.
3. Materials and Methods
3.1. Dataset Collection
In the research presented in this paper, to ensure the accuracy and practicality of the intelligent model for cotton pest and disease identification, datasets from diversified sources were selected for experiments. These datasets were primarily acquired through manual collection and internet crawling techniques, as shown in
Table 1 and
Figure 1.
By this method, a large and varied collection of cotton pest and disease images was amassed. These images not only cover different types of pests and diseases, but also include field images of cotton in various environments. The rationale for this approach is the abundance of publicly available resources related to cotton pests and diseases on the internet, including professional agricultural websites, forums, and social media platforms. Utilizing web crawling techniques facilitated the efficient collection of these image data, providing an ample sample pool for model training.
3.2. Dataset Annotation
The annotation of the dataset constitutes a critical step in constructing an effective model for cotton pest identification. Initially, a team comprising agricultural experts and data scientists was organized to manually annotate the collected images. During the annotation process, not only were the specific locations of pest damage in the images marked but detailed descriptions of pest types were also provided, such as names and characteristics of the pests. To enhance the accuracy and efficiency of annotation, a semi-automated method was employed. Specifically, a simple pretrained model was first used for preliminary pest detection and annotation in the images. Subsequently, these automated annotations were reviewed and corrected by the expert team, as shown in
Figure 2.
Common image processing techniques were utilized to assist in the annotation process, such as edge detection and color segmentation for identifying pest areas in the images. The basic principle of edge detection can be represented by the following mathematical formula:

$$G(x, y) = \sqrt{G_x^2 + G_y^2}$$

where $G(x, y)$ represents the gradient intensity of the image at point $(x, y)$, and $G_x$ and $G_y$ are the gradients at that point in the $x$ and $y$ directions, respectively. Calculating the image’s gradient effectively detects the edges of pest areas, thus assisting the annotation process.
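As an illustration of this step, the following sketch computes the gradient magnitude with Sobel operators to highlight candidate pest regions for annotators. It assumes OpenCV and NumPy are available; the file name and threshold are hypothetical choices, not the settings used in the actual annotation pipeline.

```python
import cv2
import numpy as np

def gradient_edge_map(image_path: str, threshold: float = 60.0) -> np.ndarray:
    """Return a binary edge map highlighting candidate pest regions."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Gradients in the x and y directions (G_x, G_y)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # Gradient magnitude G(x, y) = sqrt(G_x^2 + G_y^2)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return (magnitude > threshold).astype(np.uint8) * 255

edges = gradient_edge_map("cotton_leaf.jpg")  # hypothetical file name
cv2.imwrite("cotton_leaf_edges.png", edges)
```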
3.3. Dataset Augmentation
In the field of deep learning and computer vision, particularly in image recognition tasks, the quality and diversity of a dataset are key factors determining the effectiveness of model training and final performance. The diversity of a dataset encompasses not only the variation and complexity of images, but also the presentation of samples under different conditions, all of which directly impact the model’s generalization capability and practicality. To enhance the accuracy and robustness of the model in the specific task of cotton pest identification, dataset augmentation is employed as a vital technical strategy. The primary aim of data augmentation is to create more training samples artificially by applying various transformations and processing methods to the original dataset, as shown in
Table 1 and
Table 2.
These methods include, but are not limited to, geometric transformations, color adjustments, random cropping, and noise addition, significantly increasing the dataset’s diversity and size. This increased diversity is crucial for improving the model’s generalization ability, especially in the application scenario of cotton pest identification, where the model needs to accurately identify pest images under different lighting conditions, angles, and backgrounds. In this paper, three advanced data augmentation techniques, namely Cutout, Cutmix, and Mixup, are employed. Each technique possesses unique advantages and applicable scenarios, enhancing the model’s depth of understanding and flexibility in image interpretation.
3.3.1. Cutout
The fundamental principle of the Cutout method [
42] is to randomly select an area in the image and set the pixel values of that area to zero or other specific values. This simple yet effective strategy enhances the model’s learning of key features and reduces reliance on non-essential parts. For instance, in the scenario of cotton pest identification, using Cutout prevents the model from overly depending on certain specific pest features, such as a particular shape or color, and instead teaches it to identify pests from a more comprehensive perspective. The operation of the Cutout method is relatively straightforward. First, the size and shape of the area to be covered need to be determined. In most cases, the area is rectangular and its dimensions are set based on the specific requirements of the experiment. Once the dimensions of the covered area are determined, the next step is to randomly select a point in the image as the center of this area, as shown in
Figure 3A. The randomness of this process is key to ensuring the diversity of data augmentation. The pixels within the selected area are then set to zero or other specified values, creating a “blank” area in the image. This process is repeatedly applied to different images in the dataset, thereby enhancing the entire dataset. Through these steps, the Cutout method effectively increases the diversity of the dataset and the generalization ability of the model. In the application of cotton pest identification, this method is particularly helpful in improving the model’s ability to recognize partially obscured or incomplete pest images, thus enhancing overall identification performance.
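A minimal NumPy sketch of this operation is shown below; the square mask size and fill value are illustrative choices rather than the exact settings used in the experiments.

```python
import numpy as np

def cutout(image: np.ndarray, mask_size: int = 50, fill_value: int = 0) -> np.ndarray:
    """Zero out a randomly positioned square region of an H x W x C image."""
    h, w = image.shape[:2]
    # Randomly select the center of the masked area
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, cy - mask_size // 2), min(h, cy + mask_size // 2)
    x1, x2 = max(0, cx - mask_size // 2), min(w, cx + mask_size // 2)
    augmented = image.copy()
    augmented[y1:y2, x1:x2] = fill_value  # create the "blank" area
    return augmented
```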
3.3.2. Cutmix
The fundamental principle of the Cutmix [
43] method involves cropping a region from one image and pasting it onto a corresponding location in another image. This method not only enhances the model’s ability to learn local features but also increases its robustness across category boundaries. In operation, the cropping area chosen by Cutmix is typically rectangular, with its size and location randomly determined. This region-level image blending approach, compared to traditional pixel-level blending, more effectively preserves the structural information of images while introducing additional background and contextual information. The mathematical representation of the Cutmix method can be simplified as follows:

$$\tilde{x} = M \odot x_A + (1 - M) \odot x_B, \qquad \tilde{y} = \lambda y_A + (1 - \lambda) y_B$$

Here, $x_A$ and $x_B$ are two distinct training images, and $M$ is a binary mask matrix of the same size as the images, indicating the region cropped from image $x_A$, while $(1 - M)$ indicates the corresponding area retained in image $x_B$. The symbol $\odot$ denotes element-wise multiplication, and the mixing ratio $\lambda$ is determined by the proportion of the cropped area. The operational steps of Cutmix are relatively straightforward. Firstly, two training images, $x_A$ and $x_B$, are randomly chosen, along with a random position as the center of the cropping area, as shown in Figure 3B. Subsequently, a rectangular area is cropped from image $x_A$ based on preset size parameters, and this area is pasted onto the corresponding position in image $x_B$. Finally, the newly created image $\tilde{x}$ serves as the mixed result, with its label also being proportionally mixed based on the size of the cropping area. In the application of cotton pest identification, the Cutmix method effectively enhances the model’s understanding of the complex background and the relationships between different types of pests. This data augmentation technique not only allows the model to learn more robust feature representations but also shows improved performance in dealing with diverse and complex pest images.
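The following NumPy sketch illustrates this blending; the image shapes and the way the crop size is derived from the requested mixing ratio are simplified assumptions, and the paper's exact crop-size parameters are not reproduced here.

```python
import numpy as np

def cutmix(img_a: np.ndarray, img_b: np.ndarray,
           label_a: np.ndarray, label_b: np.ndarray, lam: float = 0.5):
    """Paste a random rectangle of img_a onto img_b; mix labels by area ratio."""
    h, w = img_a.shape[:2]
    # Rectangle size derived from the requested mixing ratio lam
    cut_h, cut_w = int(h * np.sqrt(lam)), int(w * np.sqrt(lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = img_b.copy()
    mixed[y1:y2, x1:x2] = img_a[y1:y2, x1:x2]      # region taken from image A
    area = (y2 - y1) * (x2 - x1) / float(h * w)    # actual mixing ratio
    mixed_label = area * label_a + (1.0 - area) * label_b
    return mixed, mixed_label
```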
3.3.3. Mixup
The core idea of Mixup [
44] is to blend two images at the pixel level, along with a proportional blending of their corresponding labels. The advantage of this method lies in its ability to enable the model to learn smoother decision boundaries during the training process, thereby improving its adaptability and robustness to variations in input data. Compared to traditional data augmentation methods, Mixup not only offers a new perspective to increase data diversity but also helps mitigate the problem of overfitting, especially in cases of smaller dataset sizes. The Mixup method can be mathematically described as follows:

$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j$$

Here, $x_i$ and $x_j$ are two distinct images, and $y_i$ and $y_j$ are their respective labels. The coefficient $\lambda$ is randomly drawn from the interval [0, 1]. This blending approach results in the generated image $\tilde{x}$ and label $\tilde{y}$ containing information from both original samples, thus increasing the diversity and complexity of the training data. In the application of this paper, the operation of Mixup is relatively simple and direct. First, two images and their corresponding labels are randomly selected from the training dataset. Then, these images and labels are linearly mixed according to a predetermined $\lambda$ value, as shown in Figure 3C. The newly blended images and labels are then used as input data for training the model. This process is repeated throughout the entire training set to ensure the effectiveness of data augmentation. In the application of cotton pest identification, the Mixup method effectively increases the diversity of the training data, enabling the model to exhibit stronger robustness and generalization capabilities when faced with images of various types of cotton pests.
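A corresponding NumPy sketch is given below, sampling $\lambda$ uniformly from [0, 1] as described above; the labels are assumed to be one-hot encoded vectors.

```python
import numpy as np

def mixup(img_i: np.ndarray, img_j: np.ndarray,
          label_i: np.ndarray, label_j: np.ndarray):
    """Pixel-level blend of two images and their (one-hot) labels."""
    lam = np.random.uniform(0.0, 1.0)          # coefficient lambda in [0, 1]
    mixed_img = lam * img_i + (1.0 - lam) * img_j
    mixed_label = lam * label_i + (1.0 - lam) * label_j
    return mixed_img.astype(img_i.dtype), mixed_label
```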
3.4. Proposed Methods
3.4.1. Transformer in Object Detection with Knowledge Graph
The focus of this study is the application of a method combining Transformer architecture with knowledge graph technology to enhance the accuracy and efficiency of cotton pest and disease detection. The Transformer model, primarily known for its success in the field of natural language processing, hinges on the self-attention mechanism. This mechanism enables the model to address long-distance dependencies in sequences, a crucial aspect for detecting pests and diseases in cotton images. However, a challenge faced by Transformers in processing images is the effective integration of domain-specific knowledge. Therefore, knowledge graph technology is employed to structurally integrate domain expertise into the model, enhancing its recognition capabilities and accuracy.
Network Structure Design:
Input Layer: The input layer receives raw cotton pest and disease image data. Images are first standardized to a fixed size and normalized through a preprocessing module.
Feature Extraction Layer: Features of the images are extracted using deep convolutional networks. This layer outputs a series of feature maps containing spatial information of the images.
Transformer Layer: Feature maps are fed into the Transformer model. Multiple self-attention mechanisms are set up in this layer, each capable of capturing correlations at different positions in the feature maps.
Knowledge Graph Integration Layer: Information from the knowledge graph is combined with the output of the Transformer layer.
This fusion can be expressed as $F_{\text{out}} = T(F) \oplus \phi(K)$, where $F$ denotes the input feature map, $K$ symbolizes the knowledge graph information, $T(F)$ is the output of the Transformer layer, $\phi(\cdot)$ converts the graph information into operational feature vectors, and $\oplus$ denotes the fusion operation. Professional knowledge is thereby converted into operational feature vectors and fused with the output of the Transformer layer.
Practical Application: In practical applications, a knowledge graph specific to cotton is used. The Transformer model in this study comprises several layers of blocks, each consisting of multiple distinct operators. These layers are stacked in a specific order to form a deep network structure.
Composition of the Block: Each Transformer block mainly consists of the following core components: joint-attention mechanism (as described above), normalization layer, feed-forward network, and residual connection.
Joint-Attention Mechanism: This is the core component of the Transformer model. In this operator, each “head” processes a different aspect of the input data. Assuming the model has N heads and each head processes data in dimension D, the total dimension of the operator is $N \times D$. Each head’s weights are controlled by a separate set of parameters, enabling the model to capture various feature correlations.
N (Number of Heads): This parameter defines the number of heads in the self-attention mechanism of the model. In this study, N is chosen as 16, meaning each Transformer block contains 16 independent attention “heads”. This design allows the model to analyze the input data from multiple perspectives simultaneously.
D (Dimension per Head): D defines the dimension each head processes. To maintain computational efficiency, a smaller value for D is typically chosen in this study, with D set to 64. This setting ensures that each head focuses on a specific set of features while keeping the overall computational complexity within a reasonable range.
Normalization Layer: A normalization layer follows each self-attention mechanism and feed-forward network to stabilize the model’s learning process. It typically employs Layer Normalization to ensure data maintains a relatively stable distribution while flowing through the network.
Feed-Forward Network: Each Transformer block also includes a feed-forward network, usually composed of two fully connected layers with a ReLU activation function in between. This network primarily functions to further nonlinearly transform the output of the self-attention layer.
Residual Connection: Each core operator in the Transformer uses residual connections. This means the input is directly added to the output of the operator before passing to the next level. Residual connections help to prevent the vanishing gradient problem in training deep networks.
Connection Method: In the Transformer, data first passes through the joint-attention mechanism, then through the normalization layer, followed by the feed-forward network, and finally through another normalization layer. Each step is accompanied by residual connections to facilitate information flow and deeper training of the network.
Network Size and Channel Quantity: In this study, the size and channel quantity of each layer of the Transformer block are optimized based on experiments and dataset characteristics. For example, initial layers might have fewer channels (32, 64) to reduce computational load, while deeper layers increase the number of channels (256, 512) to capture more complex features.
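As a concrete reference for this block structure, the PyTorch sketch below follows the ordering described above (attention, normalization, feed-forward network, second normalization, with residual connections) and the N = 16, D = 64 setting. It is a simplified stand-in for the actual implementation: nn.MultiheadAttention substitutes for the paper's joint-attention operator, and the feed-forward expansion factor is an assumption.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One block: attention -> norm -> feed-forward -> norm, each with a residual."""
    def __init__(self, num_heads: int = 16, dim_per_head: int = 64):
        super().__init__()
        embed_dim = num_heads * dim_per_head   # total operator dimension N x D = 1024
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.ffn = nn.Sequential(              # two fully connected layers with ReLU
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)       # self-attention over feature-map tokens
        x = self.norm1(x + attn_out)           # residual connection + normalization
        x = self.norm2(x + self.ffn(x))        # residual connection + normalization
        return x

# Example: a batch of 2 sequences of 196 feature-map tokens with 16 x 64 channels
tokens = torch.randn(2, 196, 16 * 64)
out = TransformerBlock()(tokens)
```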
3.4.2. Construction of Knowledge Graph in Agriculture
In this study, the construction of a knowledge graph plays a crucial role in enhancing the understanding and reasoning capabilities of the cotton pest and disease detection model. A knowledge graph is a semantic network that represents entities and their interrelations in a graphical form, effectively organizing and managing a vast amount of professional knowledge to provide rich background information and prior knowledge for deep learning models. The process of constructing an agricultural knowledge graph involves four main steps: data collection, entity recognition, relationship extraction, and knowledge integration. Initially, in the data collection phase, extensive data related to cotton pests and diseases were gathered, including academic papers, professional books, online databases, and agricultural reports. These data encompass information such as types, characteristics, causes, impacts, and control measures of cotton pests and diseases, providing a wealth of raw materials for constructing the knowledge graph. Subsequently, in the entity recognition phase, natural language processing models [
45] were employed to analyze the collected texts, identifying key entities related to cotton pests and diseases, such as names, symptoms, pathogens, and control agents. These entities form the foundation of the knowledge graph, representing core concepts in the field of cotton pest and disease management. Further, in the relationship extraction phase, interrelations between entities were analyzed, for example, the “manifests as” relationship between a certain pest and a specific symptom, or the “can be treated with” relationship between a control agent and a pest. Through extraction and classification of these relationships, a multi-layered, interconnected knowledge network is constructed, as shown in
Figure 4.
Finally, in the knowledge integration phase, the extracted entities and relationships were consolidated to form a structured knowledge graph. Graph database technology (Neo4j) [
46] was utilized to store and manage the knowledge graph, ensuring its efficient and stable operation. Additionally, data cleansing and quality assessment were conducted to ensure the accuracy and reliability of the information in the knowledge graph. As of now, the constructed knowledge graph and model have been internally tested by some researchers at the China Agricultural University [
47].
In the cotton pest and disease detection model, the application of the knowledge graph significantly enriches the model’s understanding capabilities. By integrating the knowledge graph with deep learning models, the model can extract features from images and utilize the professional knowledge in the knowledge graph for reasoning and judgment, thus showing higher accuracy and robustness in complex real-world applications. For instance, when the model detects a specific symptom in an image, it can quickly identify potential pests and diseases by querying the knowledge graph, and even recommend appropriate control measures. This combination of deep learning and knowledge graphs offers a new solution for smart agriculture.
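For illustration, a query of this kind can be issued against the Neo4j graph with the official Python driver. The node labels, relationship types (MANIFESTS_AS, TREATED_WITH), connection details, and symptom name below are hypothetical placeholders rather than the actual schema of the constructed graph.

```python
from neo4j import GraphDatabase

# Connection details are placeholders
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def pests_and_treatments_for_symptom(symptom_name: str):
    """Look up pests/diseases that manifest as a symptom, plus their control agents."""
    query = (
        "MATCH (p:PestOrDisease)-[:MANIFESTS_AS]->(s:Symptom {name: $symptom}) "
        "OPTIONAL MATCH (p)-[:TREATED_WITH]->(a:ControlAgent) "
        "RETURN p.name AS pest, collect(a.name) AS agents"
    )
    with driver.session() as session:
        return [record.data() for record in session.run(query, symptom=symptom_name)]

print(pests_and_treatments_for_symptom("leaf yellowing"))  # hypothetical symptom label
```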
3.4.3. Joint-Attention Mechanism for Complex Background
In the research presented here, the integration of a joint-attention mechanism and a joint-head design specifically for tiny object detection stands as a key innovation in the cotton pest and disease detection model. The joint-attention mechanism enhances the model’s recognition capabilities in complex backgrounds by combining spatial and channel attentions, particularly excelling in detecting minute pest infestations. Specifically, the joint-attention mechanism initially employs spatial attention to identify significant locations within the image, achieved through a set of convolutional layers (Conv) followed by an activation function like sigmoid. The objective of spatial attention is to highlight key areas within the image, focusing the model’s processing on these segments. The computation formula for spatial attention is expressed as $A_s(F) = \sigma(\mathrm{Conv}(F))$, where $F$ is the input feature map and $\sigma$ is the sigmoid activation. The convolutional layers in this mechanism each have specific configurations to accommodate feature extraction and processing at varying scales. The detailed configurations of these layers are as follows:
The first convolutional layer aims at initial feature extraction from the image. Utilizing a small kernel size (3 × 3) with a stride of 1, it ensures the extraction of high-resolution features. The number of channels in this layer is set between 64 and 128, depending on the complexity of the input image.
The second convolutional layer deepens the extracted features after the initial extraction, using the same kernel size (3 × 3) but increasing the number of channels to 256. This helps extract more complex features and aids the model in recognizing finer details.
Downsampling convolutional layers are incorporated to accommodate images of various scales. These layers reduce the dimensions of feature maps by increasing the stride (for example, a 2 × 2 convolution with a stride of 2), thereby capturing a broader range of contextual information.
Upsampling convolutional layers, in contrast to downsampling ones, increase the dimensions of feature maps through transpose convolution techniques, crucial for restoring detailed information in the image.
To ensure effective processing of images of different sizes, the convolutional layers in the joint-head adopt an adaptable structure. By adjusting the number and configuration of these layers, the network can flexibly adapt to inputs of various resolutions, thus ensuring effective processing of images of all sizes. This design allows the network to maintain high-resolution features while capturing wider contextual information, enhancing accuracy and robustness in detecting minute targets.
Subsequently, channel attention works on analyzing and emphasizing important feature channels in the image. This step involves processing the feature maps with average pooling (AvgPool) and maximum pooling (MaxPool), followed by a multilayer perceptron (MLP), also employing an activation function like sigmoid. The computation formula for channel attention is $A_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$. Finally, these two types of attentions are integrated and combined with the original feature maps through element-wise multiplication and addition, generating the final output feature map $F' = F \otimes A_s(F) \otimes A_c(F) + F$. This step is the crux of the joint-attention mechanism, enabling the model to concurrently focus on both spatial and channel information of the image. Such a design allows the model to accurately locate and recognize minute pest and disease features while understanding the image, thereby improving detection accuracy and robustness in complex cotton field environments. Through the joint-attention mechanism and the joint-head design for tiny objects, the presented model effectively addresses the multiple challenges in cotton pest and disease detection.
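A simplified PyTorch sketch of this joint spatial-channel attention is given below. The kernel size, the MLP reduction ratio, and the exact way the two attention maps are fused with the residual are assumptions based on the description above, not the released implementation.

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    """Combine spatial attention A_s(F) and channel attention A_c(F), plus a residual."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Spatial attention: convolution followed by sigmoid
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Channel attention: shared MLP over pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        a_s = torch.sigmoid(self.spatial_conv(f))            # A_s(F): (B, 1, H, W)
        avg = self.mlp(f.mean(dim=(2, 3)))                   # MLP(AvgPool(F))
        mx = self.mlp(f.amax(dim=(2, 3)))                    # MLP(MaxPool(F))
        a_c = torch.sigmoid(avg + mx).view(b, c, 1, 1)       # A_c(F): (B, C, 1, 1)
        return f * a_s * a_c + f                             # multiply and add with F

attended = JointAttention(channels=256)(torch.randn(2, 256, 32, 32))
```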
3.4.4. Joint-Head for Tiny Objects
In the research conducted for this paper, a specialized design joint-head was developed for tiny objects, building upon the foundation of the joint-attention mechanism. This approach aims to enhance the detection of minute targets in cotton pest and disease detection by integrating feature information from various scales through a specific fusion strategy, thus enabling precise localization and identification of small-scale features.
Principle of Joint-Head: Differing from the traditional multi-head mechanisms, joint-head does not merely fuse information at the same scale but strategically combines feature maps of different scales. This design allows the model to capture fine details while simultaneously comprehending higher-level semantic information. The implementation of joint-head is achieved through the following steps:
Initially, the input feature map is processed across different scales, including both downsampling and upsampling operations, to obtain representations at various resolutions.
Subsequently, the joint-attention mechanism is applied to these feature maps at different scales, highlighting important information at each scale.
Finally, these attention-processed feature maps are fused, integrating information across scales.
The fusion strategy can be mathematically expressed as follows:

$$F_{\text{fused}} = \sum_{i=1}^{N} w_i \cdot F_i^{\text{att}}$$

Here, $F_{\text{fused}}$ represents the fused feature map, $N$ is the number of scales, $w_i$ is the weight of the $i$th scale, and $F_i^{\text{att}}$ is the feature map processed by the attention mechanism at scale $i$. The specific implementation is as shown in Algorithm 1.
Algorithm 1 Joint-Head for Tiny Object Detection
Require: Input feature map $F$, number of scales $N$, functions for DownSampling $DS(\cdot)$, UpSampling $US(\cdot)$, and Joint Attention $JA(\cdot)$
Ensure: Enhanced feature map $F_{\text{fused}}$ for tiny object detection
1: Initialize an empty list $L$
2: for $i = 1$ to $N$ do
3:  if $i <$ median scale then
4:   $F_i \leftarrow DS(F)$ {DownSampling for smaller scales}
5:  else if $i >$ median scale then
6:   $F_i \leftarrow US(F)$ {UpSampling for larger scales}
7:  else
8:   $F_i \leftarrow F$ {Original scale}
9:  end if
10:  $F_i^{\text{att}} \leftarrow JA(F_i)$ {Apply Joint Attention}
11:  Append $F_i^{\text{att}}$ to $L$
12: end for
13: $F_{\text{fused}} \leftarrow \sum_{i=1}^{N} w_i \cdot L[i]$ {Weighted sum of feature maps}
14: return $F_{\text{fused}}$
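A compact PyTorch sketch of this procedure is shown below. The scale factors, the learnable softmax-normalized fusion weights, and the reuse of the JointAttention module sketched in Section 3.4.3 are illustrative assumptions rather than the exact released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as tf

class JointHead(nn.Module):
    """Fuse joint-attention outputs computed at several scales of one feature map."""
    def __init__(self, channels: int, scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        # JointAttention is the module sketched in Section 3.4.3 above
        self.attn = nn.ModuleList([JointAttention(channels) for _ in scales])
        self.weights = nn.Parameter(torch.ones(len(scales)))        # w_i, learned

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        h, w = f.shape[2:]
        fused = 0.0
        norm_w = torch.softmax(self.weights, dim=0)
        for i, s in enumerate(self.scales):
            # Down- or up-sample relative to the original (median) scale
            f_i = f if s == 1.0 else tf.interpolate(
                f, scale_factor=s, mode="bilinear", align_corners=False)
            f_i = self.attn[i](f_i)                                  # apply joint attention
            f_i = tf.interpolate(f_i, size=(h, w), mode="bilinear",
                                 align_corners=False)                # back to original size
            fused = fused + norm_w[i] * f_i                          # weighted sum
        return fused
```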
Difference from Multi-Head Mechanism: Contrasting with the multi-head mechanism, joint-head places greater emphasis on integrating features from various scales. While multi-head mechanisms typically analyze information from multiple perspectives within the same scale, joint-head combines features from different scales to obtain a more comprehensive understanding. This approach is particularly effective for processing minute targets, as small-scale features often manifest differently across scales and require a synthesis of information at multiple levels for accurate identification.
Contribution to the Task: In the task of cotton pest and disease detection, crucial features like small pests or early-stage disease spots are often minute and challenging to detect accurately with traditional methods. The design of joint-head enables the model to effectively capture these minute features and analyze them in conjunction with a broader context. This not only improves the precision in detecting minute targets but also enhances the model’s adaptability to features of varying scales. Especially in complex cotton field environments, this capability is vital for enhancing the accuracy and robustness of detection. By integrating features across different scales, joint-head significantly improves the model’s performance in detecting minute features of pests and diseases, providing an effective solution for cotton pest and disease detection.
3.4.5. Combined Loss Function
In the research presented in this paper, a combined loss function was specifically proposed to optimize the training of the model and enhance its performance in complex tasks. This loss function amalgamates various types of losses, aiming to simultaneously address classification, localization, and other challenges in cotton pest and disease detection.
Principle of the Combined Loss Function: The core idea of the combined loss function is to consider the advantages of different loss types comprehensively, facilitating a holistic optimization of the model training. This function comprises the following components:
Classification Loss: This evaluates the model’s accuracy in identifying different categories of pests and diseases. Common classification loss functions, such as Cross-Entropy Loss, play a pivotal role in this aspect.
Localization Loss: This assesses the precision of the model in locating areas affected by pests and diseases, often involving the prediction of bounding boxes, utilizing losses like Intersection over Union (IoU) or Smooth L1 loss.
Regularization: This prevents model overfitting, ensuring good generalization ability of the model across diverse data.
The combined loss function can be mathematically expressed as follows:

$$L_{\text{total}} = \alpha L_{\text{cls}} + \beta L_{\text{loc}} + \gamma L_{\text{reg}}$$

In this study, the coefficients $\alpha$, $\beta$, and $\gamma$ are crucial for balancing the significance of classification loss $L_{\text{cls}}$, localization loss $L_{\text{loc}}$, and the regularization term $L_{\text{reg}}$ within the overall loss function. The determination of these coefficients significantly impacts the training and eventual performance of the model. Initially, the values for $\alpha$, $\beta$, and $\gamma$ are often set based on prior research and experimental insights at the commencement of training. Throughout different stages of training, the model’s performance is evaluated using a validation set. Depending on the model’s performance in classification, localization, and regularization, the values of $\alpha$, $\beta$, and $\gamma$ are adjusted accordingly. For instance, if the model underperforms in localization tasks, the value of $\beta$ may be increased to give more weight to the localization loss in the total loss. Additionally, an auxiliary network, a simple multilayer perceptron, is employed for automatic adjustment. This network receives various performance indicators of the model, such as components of loss and accuracy, as inputs. During the training process, it computes the current classification loss $L_{\text{cls}}$, localization loss $L_{\text{loc}}$, and regularization loss $L_{\text{reg}}$. The auxiliary network aims to minimize the overall loss of the main model, implying that it learns to adjust the coefficients to optimize the performance of the main model. This gradient-based automatic adjustment strategy allows for dynamic optimization of the loss function coefficients based on the model’s performance during training.
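The following PyTorch sketch illustrates this weighted combination. The choice of cross-entropy and smooth-L1 terms follows the description above, while the auxiliary coefficient network is reduced to a minimal MLP whose exact form and inputs are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedLoss(nn.Module):
    """L_total = alpha * L_cls + beta * L_loc + gamma * L_reg, with MLP-predicted coefficients."""
    def __init__(self):
        super().__init__()
        # Auxiliary network: maps the three raw loss values to positive coefficients
        self.coef_net = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 3))

    def forward(self, cls_logits, cls_targets, box_preds, box_targets, model_params):
        l_cls = F.cross_entropy(cls_logits, cls_targets)            # classification loss
        l_loc = F.smooth_l1_loss(box_preds, box_targets)            # localization loss
        l_reg = sum(p.pow(2).sum() for p in model_params) * 1e-4    # L2 regularization
        losses = torch.stack([l_cls, l_loc, l_reg]).detach()
        alpha, beta, gamma = F.softplus(self.coef_net(losses))      # positive coefficients
        return alpha * l_cls + beta * l_loc + gamma * l_reg
```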
3.5. Experiment Settings
This section elaborates on the experimental design of this paper, encompassing the division of the dataset, selection of baseline models, and configuration of optimizers and hyperparameters.
3.5.1. Hardware and Test Platform
The hardware configuration used in this study forms the foundation for efficient and precise detection of cotton pests and diseases. To ensure the validity and reliability of the experiments, advanced hardware platforms were utilized for training and testing the model. The details of the hardware configuration employed in this study are as follows: Primarily, the model training and testing were conducted on high-performance GPU servers. These servers were equipped with multiple NVIDIA GeForce RTX 3090 graphics cards, each possessing 24 GB of video memory, providing substantial computational power and rapid data processing capabilities. The RTX 3090, based on the Ampere architecture, supports efficient parallel computing and deep learning optimization techniques, making it highly suitable for large-scale deep learning model training and complex computational tasks. Additionally, the servers were fitted with powerful CPU processors and ample memory resources, ensuring the efficiency of the entire training process. Beyond GPU servers, specialized data collection devices were also employed. These included high-definition cameras and various sensors installed in cotton fields for real-time monitoring and collection of images related to cotton pests and diseases. These devices, characterized by their high resolution and adaptability to different environmental conditions, are capable of capturing clear images under various lighting and weather conditions, providing a high-quality data source for model training and testing.
Regarding the mobile platform, Apple’s iOS devices were chosen as the testing platform for the application. Specifically, iPhones equipped with high-performance A-series chips, such as the A14 Bionic, were used. These chips offer robust CPU and GPU performance and include a dedicated neural engine for efficiently executing deep learning models. Moreover, the high-resolution displays and stable operating systems of iOS devices provide users with a favorable interactive experience and a stable operating environment.
In summary, the hardware configuration for our experiments was designed to meet the high-performance demands of model training while also accommodating the environmental adaptability and user experience requirements of practical applications. Through these advanced hardware resources, the cotton pest and disease detection model was able to undergo efficient training and accurate inference, thus achieving favorable results in practical applications.
3.5.2. Dataset Settings
For the study, the 3129 collected images were divided into training and validation sets, ensuring the model’s generalizability and the accuracy of its evaluation. Specifically, the dataset was divided in an 80:20 ratio, with 80% of the data allocated to the training set, serving the purpose of model training and optimization, while the remaining 20% was utilized as the validation set for assessing and comparing the model’s performance. To guarantee uniformity in the distribution of data, stratified sampling was implemented during the dataset division, ensuring that both the training and validation sets were representative and diverse in terms of categories, image backgrounds, and lighting conditions.
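A minimal sketch of such a stratified split with scikit-learn is shown below; the file paths and category identifiers are placeholders, and stratification here is by pest/disease category only.

```python
from sklearn.model_selection import train_test_split

# Placeholders standing in for the 3129 annotated samples
image_paths = [f"images/img_{i}.jpg" for i in range(3129)]
labels = [i % 6 for i in range(3129)]   # placeholder category ids

train_paths, val_paths, train_labels, val_labels = train_test_split(
    image_paths,
    labels,
    test_size=0.20,          # 80% training / 20% validation
    stratify=labels,         # keep category proportions identical in both sets
    random_state=42,
)
```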
3.5.3. Baseline
To comprehensively evaluate the detection model proposed in this paper, several baseline models were selected for comparison: YOLOv7 [
48], known for its lightweight structure and efficiency in object detection, has shown excellent performance across multiple domains, particularly suited for real-time detection tasks. YOLOv8 [
49], as the latest iteration in the YOLO series, offers enhanced performance and accuracy, representing the cutting edge in the field of object detection. RetinaNet [
50], with its unique focal loss design, is particularly effective in addressing class imbalance issues, a common challenge in object detection tasks. EfficientDet [
51], renowned for its efficient network architecture and exceptional performance-to-resource ratio, is well-suited to resource-constrained environments. Detection Transformer (DETR) [
52], a Transformer-based object detection model, demonstrates good capabilities in handling complex scene object detection tasks. These models were chosen as baselines to assess the performance of the proposed model relative to current state-of-the-art technologies.
3.5.4. Optimizer and Hyperparameters
During the training process, the Adam optimizer was selected due to its proven convergence speed and stability when handling large datasets. Combining the benefits of momentum optimization and RMSprop, the Adam optimizer adaptively adjusts the learning rate for each parameter, which is particularly effective for deep learning models. The hyperparameters were set as follows: The initial learning rate was set at 0.001, with a learning rate decay strategy implemented, halving the learning rate when performance on the validation set ceased to improve. Batch size was set to 16 or 32, depending on the GPU memory capacity. Weight decay was set at 0.0001 to prevent overfitting. The number of training epochs was set to 100 but an early stopping strategy was adopted, halting training if no improvement in validation set performance was observed for 10 consecutive epochs.
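These settings correspond to a standard PyTorch configuration along the lines of the sketch below. ReduceLROnPlateau is used here as one way to halve the learning rate on a validation plateau; the model and the per-epoch training/evaluation steps are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # placeholder for the detection network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Halve the learning rate when the validation metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3)

best_map, stale_epochs = 0.0, 0
for epoch in range(100):
    val_map = 0.0   # placeholder: run one training epoch, then evaluate mAP on the validation set
    scheduler.step(val_map)
    if val_map > best_map:
        best_map, stale_epochs = val_map, 0
    else:
        stale_epochs += 1
    if stale_epochs >= 10:   # early stopping after 10 epochs without improvement
        break
```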
3.5.5. Model Evaluation Metrics
In the context of this study, accuracy, mean average precision (mAP), and frames per second (FPS) serve as key indicators for assessing the performance of the model. Detailed mathematical descriptions of these evaluation metrics and their significance are provided below.
Accuracy is one of the most intuitive metrics used to measure the proportion of correctly predicted samples by the model. Its formula is given as follows:

$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}$$

where $N_{\text{correct}}$ is the number of correctly predicted samples and $N_{\text{total}}$ is the total number of samples.
Accuracy reflects the model’s overall performance across the entire dataset, indicating its ability to correctly identify pests. While intuitive, it is not always sufficient, especially in datasets with class imbalance. In such scenarios, high accuracy may be achieved simply by predicting the majority classes correctly, which does not necessarily imply good predictive capability across all categories.
mAP is a more nuanced metric for evaluating the performance of classification models, particularly suitable for multi-category detection tasks. mAP first calculates the average precision (AP) for each category, followed by averaging the APs across all categories. AP represents the average precision of the model at different recall rates, calculated as follows:

$$AP = \int_{0}^{1} P(r)\, dr$$

where $P(r)$ is the precision at recall rate $r$. The formula for mAP is as follows:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

Here, $N$ is the total number of categories and $AP_i$ is the average precision for the $i$th category. mAP considers not only the accuracy of predictions but also their completeness. It is a metric that balances recall and precision, crucial for complex tasks requiring differentiation between various categories.
FPS is a critical metric for measuring the real-time performance of a model, especially in applications requiring real-time processing. FPS refers to the number of frames processed by the model per second, calculated as follows:

$$FPS = \frac{N_{\text{frames}}}{T_{\text{total}}}$$

where $N_{\text{frames}}$ is the number of frames processed and $T_{\text{total}}$ is the total processing time in seconds.
A high FPS indicates faster image processing by the model, which is vital for real-time pest detection systems. In agricultural applications, real-time monitoring and response are key to effective pest control. Thus, enhancing FPS not only improves user experience but also ensures the timeliness and effectiveness of pest management measures.
In summary, these evaluation metrics not only reflect the model’s accuracy and efficiency in pest detection tasks, but also provide crucial insights into the model’s performance in practical applications. By considering these metrics comprehensively, a thorough evaluation and optimization of the pest detection model’s performance can be achieved.
4. Results and Discussion
4.1. Results of Cotton Pest and Disease Detection
The experimental design in this chapter aims to evaluate and compare the performance of different deep learning models in the task of cotton pest and disease detection. By analyzing metrics such as accuracy, mean average precision (mAP), F1-score, and frames per second (FPS), insights into the strengths and limitations of each model and the underlying reasons are gained. The experimental results are presented in
Table 3.
Initially, RetinaNet exhibits relatively weaker performance in this experiment, with accuracy, mAP, and F1-score of 0.84, 0.83, and 0.84, respectively, and an FPS of 31.8. RetinaNet, an object detection model based on the feature pyramid and focal loss, is primarily designed to address class imbalance issues. However, its performance is limited in processing cotton pest and disease images due to challenges in handling fine features and complex backgrounds. YOLOv7 and YOLOv8 show better results, with accuracies of 0.91 and 0.92, mAPs of 0.91 and 0.91, F1-scores of 0.90 and 0.91, and FPS of 46.5 and 48.3, respectively. The YOLO series models are known for their speed and efficiency, suitable for scenarios requiring rapid response. They effectively capture image features through deep convolutional neural networks but may have limitations in processing extremely fine details. EfficientDet’s performance, with an accuracy of 0.89, mAP of 0.85, F1-score of 0.87, and an FPS of 22.9, indicates a slower processing speed. EfficientDet emphasizes a balance between efficiency and performance, but its capability may be constrained by the model’s design when processing complex cotton pest and disease images. The DETR model shows similar accuracy and mAP to YOLOv8, but with a lower FPS of 34.0. This suggests that, while DETR’s Transformer structure performs well in understanding global image information, it is slower than convolution-based models. The model proposed in this paper surpasses others in all metrics, with an accuracy of 0.94, mAP of 0.95, F1-score of 0.94, and FPS of 49.7. This superiority is attributed to innovations in structural design, feature extraction, and optimization strategies. The proposed model integrates Transformer technology and knowledge graphs, along with a unique joint-attention mechanism and a joint-head design for tiny objects, as shown in
Figure 5. These features enable the model to efficiently and accurately capture fine details and process complex backgrounds. Furthermore, the model’s optimization strategies ensure high processing speed, making it outstanding in real-time detection.
Overall, the performance of the model proposed in this paper in the task of cotton pest and disease detection validates its effectiveness both theoretically and practically. By considering image features and domain knowledge comprehensively, and through optimized network structure design, the proposed model not only improves in accuracy but also maintains a high level of processing speed. These results provide robust technical support for intelligent identification of cotton pests and diseases, significantly contributing to enhanced agricultural productivity and crop protection.
4.2. Visualization Analysis of Detection Results
4.2.1. Confusion Matrix Analysis
In the analysis of confusion matrices for cotton pest and disease detection, emphasis was placed on examining the performance of different models across specific pest and disease subclasses, particularly focusing on which subclasses were prone to confusion and the potential reasons behind such confusion.
Data from
Figure 6 reveal significant misclassification issues in certain subclasses by various models. For instance, the YOLO series models exhibited a degree of confusion between “Constrictor Aphid” and “Foliar Disease”. This confusion might stem from the visual similarities between these two classes, such as size, shape, or color, challenging the deep-convolutional-neural-network-based models in accurate differentiation. Similar phenomena were observed in the RetinaNet and EfficientDet models, indicating a common challenge in handling subtle features. In the case of the DETR model, despite a high overall accuracy, confusion persisted in specific categories, such as “Bacterial Blight” and “Red Spot”. This could be attributed to the challenge of distinguishing subtly different categories, despite the Transformer’s capability in capturing global contextual information. This suggests that global context might not always suffice in discerning categories with similar features. Conversely, the method proposed in this paper demonstrated higher accuracy in the confusion matrices, particularly in categories prone to confusion. For example, compared to other models, the proposed method exhibited less confusion between “Constrictor Aphid” and “Foliar Disease”. This improvement might be attributed to the application of the joint-attention mechanism and joint-head design for tiny objects, enhancing the model’s ability to distinguish subtle differences. Additionally, the use of a combined loss function might have contributed to optimizing the model’s performance in both classification and localization tasks, thereby reducing confusion.
Overall, the analysis of confusion matrices highlighted performance disparities and potential weaknesses of different models in specific subclasses. By comprehensively understanding the causes of these confusions, future model designs and improvements can be better guided, especially in handling subtle features and similar categories. This is crucial for enhancing the accuracy and robustness of cotton pest and disease detection.
4.2.2. Detection Results Visualization
In this section, an in-depth exploration was conducted on the performance of the cotton pest and disease detection model proposed in this paper, particularly in practical applications. Visualization analysis of the detection results clearly demonstrates the model’s accuracy and efficiency in identifying pest and disease features, especially its exceptional capability in processing images with complex edges and multiple detection points, as illustrated in
Figure 7.
Firstly, regarding the processing of complex edges, the model displayed significant advantages. In agricultural image processing, edge detection is often challenging due to the variable shapes of cotton leaves and their high color similarity with the background. The model accurately identified the edges of cotton leaves, even in situations where the leaf edges were blurred or highly blended with the background colors. This advantage stems from the model’s powerful feature extraction capability and effective handling of complex backgrounds, ensuring high accuracy in pest and disease detection. Secondly, the model also excelled in scenarios involving multiple detection points in an image. In practical applications, cotton pests and diseases often appear not in isolation but as clusters of multiple spots or infestation points. Under such circumstances, identifying and locating each spot or pest point becomes particularly complex. The model was not only capable of precisely locating each detection point but also distinguished between adjacent multiple spots or pest points, demonstrating its efficiency and accuracy in multi-target detection tasks. Moreover, the visualization results also revealed the model’s ability to recognize different types of pests and diseases. The model could identify not only common types but also effectively recognize some uncommon or difficult-to-detect pest and disease types. This indicates the model’s significant advantage in learning and generalization abilities, enabling it to adapt to varied practical application environments. Lastly, it is worth mentioning that the model also showed good adaptability and robustness in processing images under different lighting conditions and shooting angles. Whether in strong lighting or in shadow, the model accurately identified pests and diseases, further reflecting its capability of adapting to complex environmental conditions.
4.3. Ablation Study on Different Attention Architecture
In this section, a pivotal experiment was conducted to evaluate and compare the performance of different attention architectures in cotton pest and disease detection. The aim of this experiment was to explore the impact of various attention mechanisms on model performance, thereby gaining a better understanding of how these models operate and determining the most suitable attention architecture for this task. By comparing the accuracy, mAP, and FPS of different architectures, insights were gained into their strengths and limitations.
Table 4 and
Figure 8 present the performance of five different attention architectures. Initially, the pure self-attention architecture demonstrated good performance, achieving an accuracy of 0.93 and an mAP of 0.92, though its FPS was relatively low at 31.8. The key advantage of self-attention [
19] lies in its ability to capture long-distance dependencies, enabling the model to understand global information in images. However, due to higher computational costs, this architecture is somewhat limited in processing speed. The channel attention and spatial attention architectures [
53], focusing on the channel and spatial features of images, respectively, showed moderate performance. The channel attention architecture achieved an accuracy of 0.90 and an mAP of 0.87, while the spatial attention architecture had an accuracy of 0.88 and an mAP of 0.90. Compared to self-attention, both architectures improved in FPS, reaching 39.7 and 41.1, respectively, indicating their efficiency in processing speed. Yet, they might not be as comprehensive as self-attention in capturing global context information.
The architecture combining joint attention with joint-head proved to be the most effective in the experiment, attaining an accuracy of 0.96 and an mAP of 0.95, while maintaining a high FPS of 51.4. This efficient performance can be attributed to its simultaneous focus on spatial and channel information in images, and the integration of the joint-head design for better handling of tiny objects. This suggests that considering both spatial and channel information is crucial in complex tasks like cotton pest and disease detection. Lastly, the architecture that combined joint attention with multi-head attention [
19] also showed good performance, with an accuracy and mAP of 0.93 and 0.92, respectively, but a lower FPS of 32.6. This indicates that, while multi-head attention can provide diverse perspectives in image analysis, it may not be as fast as the joint-head design in processing.
In summary, the experimental results indicate that the application of joint-attention mechanisms combined with joint-heads is highly effective in cotton pest and disease detection tasks. Not only does it offer high accuracy and mAP, but it also maintains a high FPS, showcasing its superiority in both precision and processing speed. These findings provide valuable insights, guiding the selection and design of the most suitable attention architecture for specific tasks. Understanding the mathematical and technical characteristics of each architecture allows for a deeper analysis of their applicability and limitations in various scenarios, thus offering essential guidance for future model design and optimization.
4.4. Agricultural Large-Model Implementation with Sensors
With the advancement in agricultural technology and the development of the Internet of Things, sensors are increasingly being utilized in agriculture. They not only monitor crop growth environments in real-time but also collect valuable data to support agricultural decision making. Integrated with large-scale agricultural models, sensor technology can achieve more intelligent and efficient pest and disease monitoring and management. To incorporate data from various types of sensors, a multi-source data alignment module is implemented prior to the model in practical deployment, as shown in
Figure 9.
This module is pivotal in processing and fusing heterogeneous data from different sensors, including temperature, humidity, and light intensity data. The working principle of the multi-source data alignment module can be broken down into the following steps:
Data Preprocessing: Initially, all sensor data are subjected to preprocessing. Non-image data (such as temperature and humidity) are transformed into standardized numerical representations.
Feature Extraction: Subsequently, appropriate feature extraction methods are employed for each data type. In this study, fully connected layers are used to extract features from environmental data like temperature and humidity.
Feature Fusion: The features from different data sources are then fused. This step typically involves aligning and fusing features in a common feature space using techniques such as weighted summation and concatenation.
Data Alignment: Post feature fusion, data alignment operators ensure that features from different sources are effectively aligned in the same dimensional space. This alignment allows subsequent deep learning models to better utilize the data.
Assuming there are $n$ different data sources (in this study, $n = 3$, representing temperature, humidity, and light intensity data), each source $i$ yields a feature vector $f_i$ after feature extraction. The objective of the data alignment module is to transform these feature vectors into a unified feature space for effective fusion. Feature fusion can be represented as follows:

$$F = \sum_{i=1}^{n} w_i \cdot A(f_i; \theta_i)$$

Here, $F$ is the fused feature vector, $w_i$ is the weight of the $i$th data source, $A(\cdot)$ is the data alignment operator, and $\theta_i$ are the operator parameters. The $A(\cdot)$ operator maps feature vectors of different dimensions to a common dimensional space. This can be achieved through a fully connected layer or another appropriate neural network layer:

$$A(f_i; \theta_i) = \sigma(W_i f_i + b_i)$$

Here, $W_i$ and $b_i$ are the weights and bias of the fully connected layer, respectively, and $\sigma$ is an activation function, such as ReLU or sigmoid. In practice, the operator parameters must be obtained through training to ensure effective support for pest and disease detection tasks after data alignment. In this way, the multi-source data alignment module unifies data from different sensors, providing a comprehensive and accurate feature representation for the deep learning model, thereby enhancing the effectiveness of cotton pest and disease detection.
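A minimal PyTorch sketch of this alignment-and-fusion step follows; the per-source input dimensions and the common 128-dimensional space are illustrative values, not the deployed configuration.

```python
import torch
import torch.nn as nn

class MultiSourceAlignment(nn.Module):
    """Align temperature, humidity, and light-intensity features and fuse them."""
    def __init__(self, source_dims=(8, 8, 4), common_dim: int = 128):
        super().__init__()
        # One alignment operator A(.; theta_i) per data source: FC layer + ReLU
        self.align = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, common_dim), nn.ReLU()) for d in source_dims])
        self.weights = nn.Parameter(torch.ones(len(source_dims)))   # w_i, learned

    def forward(self, sources):
        """sources: list of tensors, one (batch, dim_i) tensor per sensor type."""
        w = torch.softmax(self.weights, dim=0)
        fused = sum(w[i] * self.align[i](f_i) for i, f_i in enumerate(sources))
        return fused   # (batch, common_dim) fused feature vector F

# Example with placeholder sensor features for a batch of 4 samples
temp, hum, light = torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 4)
fused = MultiSourceAlignment()([temp, hum, light])
```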
4.5. Application on Mobile Computing Platform
In modern agricultural production, real-time monitoring and rapid identification of cotton pests and diseases are crucial for ensuring crop health and improving yield. This research aims to develop an application for mobile computing platforms that performs instant detection and analysis of cotton pests and diseases. The application not only supports server-side processing but also has the capability of completing inference analysis locally on mobile devices, greatly enhancing its application value in environments without network access, as shown in
Figure 10.
Mobile Platform Framework (iOS Platform) and Development Process: The Apple iOS platform was chosen as the target platform for the mobile application due to its broad user base and mature development environment. The iOS app is written in Swift and integrates with the Core ML framework, which is Apple’s solution for deploying machine learning models on iOS devices. Core ML supports various model formats and can leverage the advantages of Apple hardware to improve the model’s operational efficiency on devices. The development process includes the following:
Model Training and Optimization: Deep learning models are trained and validated on the server side using a large collection of cotton pest and disease image data. After training, the models are optimized to accommodate the performance limitations of mobile devices.
Model Conversion and Testing: The trained models are converted into Core ML model format and tested on iOS devices to ensure their accuracy and efficiency (a conversion sketch is shown after this list).
Application Development: Using the Xcode development environment and Swift language, the iOS application is developed by integrating the optimized model into the application and designing an intuitive user interface.
Local Inference Implementation: An optimized Core ML model is embedded in the iOS app to enable local execution. A user-friendly operational interface is designed to allow users to easily upload images and receive inference results from the model.
Performance Testing and Optimization: Extensive performance tests are conducted to ensure the application performs well across different models of iOS devices. Model parameters are adjusted based on test results to balance inference speed and accuracy.
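For the conversion step above, the sketch below shows one typical route with coremltools. The traced input size, preprocessing scale, and file names are placeholders, and the loader function is hypothetical; the actual export settings of the deployed model may differ.

```python
import torch
import coremltools as ct

# Placeholder: a hypothetical helper returning the trained detection model
model = load_trained_cotton_detector()
model.eval()

# Trace the model with a dummy input matching the preprocessing size
example = torch.rand(1, 3, 640, 640)
traced = torch.jit.trace(model, example)

# Convert the traced graph to a Core ML model for on-device inference
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(name="image", shape=example.shape, scale=1.0 / 255.0)],
)
mlmodel.save("CottonPestDetector.mlpackage")
```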
Mobile–Server System Design: As illustrated in
Figure 11, the mobile–server system comprises several key components:
Data Collection Terminals: Multiple data collection terminals are deployed in the fields to collect images of cotton pests and diseases.
Communication Network: GSM/GPRS networks are used for data transmission to ensure the images collected are promptly uploaded to the server.
Server-Side Processing: The server stores and analyzes the collected data to produce pest and disease detection results.
Mobile Application: Users access pest and disease detection services through an application on their iOS devices. The app can communicate with the server when a network connection is available or work independently using local inference functionality when there is no network.
Local Storage and Inference: The application allows users to store images directly on the device and perform pest and disease recognition inference using the embedded Core ML model.
User Interaction and Feedback: The application provides an interactive interface for users to upload images and receive inference results. Users can also provide feedback to help developers further improve the application.
With the design and implementation of the above system, the application developed in this research realizes efficient detection of cotton pests and diseases on mobile computing platforms. Whether processing data through the server under good network connection conditions or completing inference locally on mobile devices when there is no network, the application provides users with accurate and fast pest and disease identification services, significantly enhancing the level of intelligence in modern agriculture.
5. Conclusions
In this study, a deep-learning-based intelligent recognition model is proposed for cotton pest and disease detection, achieving efficient and accurate identification of cotton pests and diseases. The significance of this task lies in not only enhancing the automation level of pest and disease identification, alleviating labor intensity for farmers, but also improving agricultural productivity, which is crucial for ensuring cotton yield and quality.
The research innovation of this paper mainly includes the following aspects: Firstly, the latest Transformer technology and knowledge graph are adopted and applied to the task of cotton pest and disease detection, which is relatively rare in previous studies. The introduction of Transformer technology allows the model to better capture long-range dependencies in images, enhancing the model’s understanding of global information; the use of the knowledge graph provides the model with a wealth of domain knowledge, enhancing the model’s accuracy in identifying pest and disease features. Secondly, the joint-attention mechanism and joint-head design for tiny objects proposed in this paper enable the model to have higher accuracy and robustness in dealing with complex backgrounds and tiny objects. With these innovative architectures, our model outperforms traditional models in all metrics, including accuracy, mAP, and FPS, all of which are validated in the experimental results. In terms of experimental results, the method proposed in this paper performs excellently in the cotton pest and disease detection task, specifically achieving an accuracy rate of 0.94, an mAP of 0.95, and an FPS of 49.7, all surpassing other comparative models. Especially when dealing with complex images where multiple tiny aphids cluster together, our model exhibits superior performance.
Nevertheless, our research still has some limitations. For instance, although the model shows good performance in the experiments, how to maintain this performance on a larger scale dataset and how to further reduce the consumption of computational resources require more in-depth study. In addition, the robustness of the model in dealing with images under extreme environmental conditions, such as extreme lighting and occlusions, needs to be improved. In future research, we plan to expand the scale of the dataset, adding more types of pest and disease images to improve the model’s generalization ability. Also, we will explore more efficient model optimization methods to reduce the inference time of the model on mobile devices, achieving faster pest and disease detection. Moreover, we will consider incorporating more types of sensor data, such as temperature and humidity information, to achieve more accurate and comprehensive pest and disease monitoring.