1. Introduction
As technology develops, crop yields have greatly increased [1]. However, due to differences in geography and climate, crop production in some regions is far from meeting demand. Since yields cannot be increased significantly, effectively reducing grain losses during storage is crucial. The key to scientific grain preservation is "temperature control" [2], and the key to maintaining an appropriate grain temperature is ventilation. Effective ventilation not only prolongs the storage time of grain but also maintains its quality.
Current ventilation technologies include temperature-reducing ventilation, moisture-reducing ventilation, anti-condensation ventilation, heat-dissipation ventilation, and conditioning ventilation [3]. Choosing the appropriate ventilation timing and completing ventilation control in the different modes is the key to temperature control. However, grain storage facilities currently detect only the "three temperatures and two humidities": the temperature of the grain pile, the temperature of the atmosphere inside the granary, the temperature of the external atmosphere, the humidity of the atmosphere inside the granary, and the humidity of the external atmosphere. Commonly used ventilation control still relies on human judgment to start and stop the ventilation equipment. Faced with the complex and ever-changing environment inside the granary, there is no scientific, accurate, and fast response strategy.
The so-called multi-modal grain status refers to the state of the grain pile under different ventilation conditions viewed from a global perspective; it is divided into the temperature-reducing, moisture-reducing, anti-condensation, heat-dissipation, and conditioning ventilation modes.
Deep learning has developed rapidly and has been applied in various fields, such as computer vision [4], speech recognition [5], natural language processing [6], medical diagnosis [7,8], precision agriculture [9], and stock market prediction [10]. As deep learning becomes more widespread, its applications in food security have also increased, but practical applications in grain storage remain relatively few.
Deep learning is a learning algorithm that uses multi-layer neural networks and has strong adaptability, robustness, and learning ability [11,12]. The concept of multimodality originates at the intersection of cognitive science and computer science. It refers to the acquisition of rich information through multiple sensory channels [13], such as vision, hearing, and touch, and the integration and joint processing of this information to obtain a more accurate and comprehensive understanding. This multimodal information processing approach plays an important role in human cognition and communication [14] and has also become an important research direction in fields such as computer vision, speech recognition, and natural language processing.
Based on the theory of computer multimodality, multi-modal grain storage refers to the classification of the internal and external environmental conditions of a grain pile under ventilation into different modes: temperature-reducing ventilation, moisture-reducing ventilation, anti-condensation ventilation, heat-dissipation ventilation, and conditioning ventilation. Multi-modal grain status control refers to a control method that switches among these modes as the grain pile environment changes. Therefore, it is of practical significance for ensuring grain storage safety to combine deep learning with the latest multi-modal grain status control theory and to research decision-making and control strategies for grain storage ventilation modes, with the aims of balancing grain temperature, preventing condensation, stopping grain heating, reducing grain moisture, creating a low-temperature environment, improving grain storage performance, and reducing manual operation [15]. Based on this theory, Figure 1 shows the multimodal division of the grain situation decision-making algorithm.
We conducted an extensive literature review on grain storage and found limited research in this field, so we also explored cutting-edge work in related areas. One article introduced a ventilation management model for grain storage based on Bayesian networks [16]. The model used factors such as temperature, humidity, and oxygen concentration as nodes and determined the probability relationships between them mathematically to optimize ventilation management. Another paper described a grain storage loss analysis model based on the decision tree algorithm [17]. By collecting relevant data during the grain storage process, including temperature, humidity, and ventilation information, the authors constructed a decision tree model. The model yielded important findings, identifying temperature, humidity, and ventilation as the key factors affecting grain storage losses; effective measures should be taken to control these factors in grain storage management to reduce the occurrence of such losses. These findings provide a theoretical foundation for our experiments.
The Transformer is a neural network architecture that relies entirely on attention mechanisms to process input sequences, greatly improving the efficiency and speed of natural language processing and other sequence data tasks [18]. The self-attention mechanism allows the model to attend to different positions in the sequence, enabling it to gather global information rather than relying on the fixed-size windows of traditional recurrent and convolutional neural networks. Applying this mechanism to ResNet (Residual Network) allows neural networks to be trained quickly and the network model to converge faster.
Building on these two theories, this paper combines residual networks with self-attention mechanisms, leveraging the advantages of residual networks to avoid gradient vanishing and exploding and to accelerate neural network training, while introducing self-attention mechanisms to let the model establish relationships among the data autonomously. Figure 2 is a flowchart of the decision algorithm based on the combination of the self-attention mechanism and ResNet (Residual Network).
2. Materials and Methods
In this section, we first introduce the data collection method and the principles of creating the dataset used in this study, including how grain condition data were collected under different ventilation scenarios and how the data were preprocessed and divided.
2.1. Data Collection
The granary data used for this experiment were obtained from a granary located in Yushu City, Jilin Province. The grain was stored in tall bungalow warehouse No. 25, shown in Figure 3. The warehouse had an inner length of 35.76 m, an inner width of 23.26 m, an outer length of 36.62 m, an outer width of 25.18 m, an eaves height (h) of 11.33 m, a top height (H) of 13.26 m, and a grain pile height of 8.0 m. Distributed fiber optic temperature measurement technology was used to measure the temperature of the grain pile [19]; its distribution is shown in Figure 4. Additionally, the temperature and humidity inside the warehouse and the atmospheric temperature and humidity were measured using Sensirion SHTW2 digital temperature and humidity sensors. Since temperature changes in the grain pile are slow, an hourly data collection strategy was employed, and the collected data were transferred to a MySQL database.
2.2. Dataset Description
The data required for this experiment includes the temperature of each point in the grain pile, the temperature and humidity inside the warehouse, the atmospheric temperature and humidity, and the average temperature inside the grain pile.
Table 1 describes the data used in this experiment.
2.3. Making a Dataset
2.3.1. Data Preprocessing
The form of the data obtained from Jilin Grain Depot No. 35 is shown in Figure 5, an Excel table of one year of monitoring data. The year's data were summarized and aggregated into a single table, which includes the temperature of each point in the grain pile, the temperature and humidity inside the warehouse, the atmospheric temperature and humidity, and the average temperature inside the grain pile. The results are summarized in Table 2 below.
The unprocessed data were compared and analyzed to observe the distribution of the average temperature for each month and to identify outliers. Figure 6 shows the distribution of the average temperature data for each month. Each outlier was then checked to determine whether it was caused by an external incident and, if so, removed.
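A small sketch of this screening step is shown below; the file name and column names are hypothetical placeholders, and the real table layout follows Table 2:

```python
import pandas as pd

# Hypothetical file and column names; the real layout follows Table 2.
df = pd.read_excel("granary_year.xlsx", parse_dates=["timestamp"])

# Monthly distribution of the average grain-pile temperature (cf. Figure 6).
monthly = df.groupby(df["timestamp"].dt.month)["avg_grain_temp"]
print(monthly.describe())

# Flag candidate outliers: points more than 3 standard deviations from
# their month's mean are reviewed and removed only if an external
# incident (e.g., a sensor fault) explains them.
z = (df["avg_grain_temp"] - monthly.transform("mean")) / monthly.transform("std")
print(df[z.abs() > 3])
```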
After outlier processing, since temperature and humidity are measured on different scales, the data must be normalized to eliminate the impact of this inconsistency on the experiment. To prevent the standardized data from clustering near zero and becoming indistinguishable, we chose z-score normalization, whose formula is as follows:

$$Z = \frac{X - \mu}{\sigma}$$

where $X$ is the sample value, $\mu$ is the mean of the sample data, and $\sigma$ is the standard deviation of the sample data. After normalization, the resulting $Z$ value indicates the degree of deviation between the original datum and the sample mean: $Z < 0$ indicates that the datum is smaller than the mean, and $Z > 0$ indicates that it is larger than the mean.
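A minimal sketch of this normalization (toy values, for illustration only):

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Z-score normalization: (x - mean) / standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Temperature (°C) and relative humidity (%) live on different scales;
# after normalization each column has zero mean and unit variance.
temps = np.array([12.1, 14.3, 15.0, 13.7])
print(zscore(temps))  # values < 0 are below the mean, > 0 above it
```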
2.3.2. Labeling Data
An excellent deep learning model requires accurately labeled data. However, since the humidity collected in the grain depot is relative humidity, labeling directly according to the grain depot ventilation regulations is not possible. Therefore, the grain ventilation CAE (Chen–Clayton Approximation Equation) is used to fit the grain equilibrium absolute humidity and the grain dew point temperature. During this process, the influence of different grains on the CAE equation must be considered. Table 3 details the parameters of the CAE equation for the different grain categories [20]. Finally, the fitted data are labeled according to the grain depot ventilation regulations. The formula is as follows:
where:
$P_s$: grain equilibrium absolute humidity, mmHg;
$W$: grain moisture content, % (wet basis);
$T$: grain temperature, °C;
$A_1$, $A_2$, $B_1$, $B_2$, $D$: the five parameters of the CAE equation (see Table 3).
where:
$RH$: atmospheric relative humidity, %;
$T_a$: atmospheric temperature, °C;
$T_d$: atmospheric dew point temperature, °C.
Table 3. Parameters of the CAE equation for the main grain types.
| Classification | Sorption Type | A1 | A2 | B1 | B2 | D |
|---|---|---|---|---|---|---|
| Wheat | Desorption | 4.212 | 4.796 | 7.493 | 4.028 | 202.031 |
| Wheat | Adsorption | 4.874 | 4.767 | 4.671 | 3.639 | 201.676 |
| Paddy | Desorption | 4.431 | 4.883 | 7.758 | 4.373 | 205.097 |
| Paddy | Adsorption | 4.606 | 4.561 | 4.918 | 3.613 | 202.632 |
| Corn | Desorption | 4.393 | 4.845 | 7.843 | 3.858 | 203.892 |
| Corn | Adsorption | 4.812 | 4.479 | 4.783 | 3.799 | 202.164 |
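As a rough illustration of the labeling step, the sketch below wires the Table 3 desorption parameters into a row-labeling function. The helper `cae_equilibrium_humidity`, the column names, and the threshold rules are hypothetical placeholders; the actual decision thresholds come from the grain depot ventilation regulations, which are not reproduced here.

```python
# Desorption-branch CAE parameters, taken from Table 3.
CAE_PARAMS = {
    "wheat": dict(A1=4.212, A2=4.796, B1=7.493, B2=4.028, D=202.031),
    "paddy": dict(A1=4.431, A2=4.883, B1=7.758, B2=4.373, D=205.097),
    "corn":  dict(A1=4.393, A2=4.845, B1=7.843, B2=3.858, D=203.892),
}

def cae_equilibrium_humidity(moisture, grain_temp, p):
    """Hypothetical stand-in for the fitted CAE relation, which maps grain
    moisture and temperature to equilibrium absolute humidity using the
    A1, A2, B1, B2, D coefficients."""
    raise NotImplementedError

def label_row(row, grain="corn"):
    """Assign a ventilation mode; the thresholds here are illustrative
    placeholders, not the actual ventilation regulations."""
    eq_h = cae_equilibrium_humidity(row["moisture"], row["grain_temp"],
                                    CAE_PARAMS[grain])
    if row["atm_temp"] + 8.0 <= row["avg_grain_temp"]:
        return "temperature-reducing"
    if row["atm_abs_humidity"] < eq_h:
        return "moisture-reducing"
    return "conditioning"

# df["label"] = df.apply(label_row, axis=1)
```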
2.4. Data Set Partitioning
To divide the processed and labeled data into training, testing, and validation sets, we use PyTorch's DataLoader. We set the batch size to 32 and the random seed to 42 so that the dataset is shuffled reproducibly and results are consistent across runs.
We then create three data loaders, one each for training, testing, and validation. These loaders allow the data to be loaded and iterated through efficiently during model training and evaluation.
The training data loader provides batches of data during the training process; it randomly samples 70% of the data.
The testing data loader contains 15% of the data and is used to assess how well the model generalizes to unseen examples.
The validation data loader also contains 15% of the data and is used to fine-tune the model's parameters and assess its performance on a separate dataset, which helps optimize the model's weights and prevents overfitting to the training data.
By using DataLoader with these settings, we obtain randomized and rigorous training, testing, and validation data for our model.
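A minimal sketch of this split, assuming PyTorch and synthetic stand-in tensors (1000 sequences of length 425 with five mode labels); the real dataset replaces the TensorDataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(42)  # reproducible shuffling and splitting

# Stand-in tensors: 425-step grain-condition sequences, five mode labels.
dataset = TensorDataset(torch.randn(1000, 425), torch.randint(0, 5, (1000,)))

n = len(dataset)
n_train = int(0.70 * n)
n_test = int(0.15 * n)
n_val = n - n_train - n_test  # remainder, also ~15%
train_set, test_set, val_set = random_split(dataset, [n_train, n_test, n_val])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)
val_loader = DataLoader(val_set, batch_size=32)
```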
2.5. Neural Network Model
2.5.1. CNN
CNN (Convolutional Neural Network) is a type of deep learning neural network widely used in fields such as computer vision and natural language processing. It consists of convolutional layers, pooling layers, and fully connected layers. The convolutional layers extract features from the data, while the pooling layers perform downsampling to reduce the number of parameters in the feature maps, thereby reducing computational complexity, preventing overfitting, and improving model robustness [21]. The fully connected layers work with the softmax function to normalize the output into a probability for each category, thereby performing the classification task.
The CNN model used in this experiment is based on VGGNet16, which consists of 13 convolutional layers and 3 fully connected layers. Each convolutional layer uses a 3 × 3 kernel and the ReLU activation function. The convolutional layers are arranged in five blocks, each followed by a 2 × 2 max pooling layer with a stride of 2 that reduces the dimensionality of the feature maps. The first two fully connected layers have 4096 neurons each, and the last fully connected layer has 1000 neurons; its output is passed through a softmax operation to convert it into a probability distribution.
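The sketch below reconstructs this VGG16 layout in PyTorch. The 512 × 7 × 7 flatten size assumes the standard 224 × 224 RGB input, which is our assumption rather than a detail given in the text:

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """One VGG-style block: 3x3 convolutions + ReLU, then 2x2 max pooling
    with stride 2 to halve the spatial resolution."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# VGG16: five blocks (2 + 2 + 3 + 3 + 3 = 13 conv layers), then three FC layers.
features = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2),
    vgg_block(128, 256, 3), vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)
classifier = nn.Sequential(
    nn.Flatten(), nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Linear(4096, 1000),
)
```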
2.5.2. ResNet
ResNet (Residual Network) is a deep learning neural network composed of multiple residual blocks. Each residual block consists of two main parts: the main path and the skip connection [22]. The main path is a series of convolutional layers, batch normalization layers, and activation functions used for feature extraction of the input signal. The skip connection adds the input signal directly to the output signal, preserving the input information and allowing it to bypass the convolutional layers in the main path and pass directly to subsequent layers. This structure makes the network easier to train, avoids problems such as gradient vanishing and explosion, and allows the network to be deeper, which improves its accuracy. In this article, ResNet (Residual Network) is used as the basic model; the network structure of its residual blocks is shown in Figure 7.
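A minimal PyTorch sketch of such a residual block, written in 1D to match the grain sequence data used later; the layer sizes are illustrative:

```python
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    """Main path: conv -> BN -> ReLU -> conv -> BN; the skip connection
    adds the input back before the final activation."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection preserves the input signal
```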
2.5.3. GRU
GRU (Gated Recurrent Unit) is a type of recurrent neural network whose unit consists of an update gate, a reset gate, and a hidden state vector [23,24]. The GRU model used in this study consists of two stacked GRU layers with a hidden state size of 16. At each time step, the hidden state is computed from the previous hidden state and the current input; because it carries information from previous time steps forward, the model can capture the temporal dependencies in the sequence data.
2.5.4. LSTM
LSTM (Long Short-Term Memory) is a type of recurrent neural network [25,26] designed to overcome the vanishing gradient problem of traditional RNNs and allow the processing of long-term dependencies. LSTMs use a series of gates, including an input gate, a forget gate, and an output gate, to selectively allow information to flow through the network and control the memory stored in the hidden state. This enables LSTMs to selectively remember or forget information from previous time steps as needed, making them well suited for tasks such as language modeling, speech recognition, and handwriting recognition. Like the GRU, the LSTM used in this article consists of two stacked LSTM layers with a hidden state size of 16; the recurrent layers introduce non-linear mappings that enhance the expressive capacity of the network, and the gate mechanisms give the LSTM the memory needed to handle long-term dependencies in sequential data.
2.5.5. Self-Attention
Self-attention is a mechanism used in deep learning to weigh the importance of different parts of a sequence when predicting or generating the next element. It allows the model to focus on different parts of the input sequence during prediction without using recursive or convolutional operations.
The core idea of self-attention is to calculate the influence of each element on the other elements by computing association weights. These weights can be calculated in various ways, but the most common approach is dot-product attention, which measures the degree of association between a query vector and a key vector by taking their dot product and using it as the attention weight.
The structure of self-attention is shown in Figure 8.
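A minimal sketch of dot-product self-attention, where the query, key, and value all come from the same sequence; the scaling by the square root of the dimension is the common stabilizing variant, and the shapes are illustrative:

```python
import math
import torch
import torch.nn.functional as F

def dot_product_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.size(-1)
    weights = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
    return weights @ v

x = torch.randn(1, 425, 64)           # (batch, sequence, embedding)
out = dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
```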
2.5.6. ResNet_Attention
This network is a 1D convolutional neural network based on the ResNet architecture. It is used for grain condition sequence classification, with one-dimensional grain condition sequence data as input. The network comprises a convolutional layer (Conv3×3), a max pooling layer, four residual blocks, and a self-attention layer. Inside each residual block, the input signal passes through two 1D convolutional layers (Conv3×3) with the same kernel size, followed by a self-attention layer for feature extraction and adaptive feature weighting. The self-attention layer adjusts the weights of the feature vectors by computing attention weights so that important features receive larger weights and unimportant features receive smaller ones; this helps the network capture the long-term dependencies and relative importance within the input signal, improving classification performance. Finally, global average pooling produces a fixed-size feature vector, which is fed to a fully connected layer that outputs the classification result for the grain condition data. The neural network structure used in this experiment is shown in Figure 9.
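Putting the pieces together, the sketch below approximates the described architecture in PyTorch, reusing `ResidualBlock1d` from the earlier sketch. The channel width, the use of `nn.MultiheadAttention` with four heads, and placing a shared attention step after each residual block are our assumptions, not specifications taken from the paper:

```python
import torch
import torch.nn as nn

class ResNetAttention(nn.Module):
    """Approximate sketch: Conv3 stem, max pooling, four residual blocks
    each followed by self-attention, global average pooling, FC head."""
    def __init__(self, n_classes=5, channels=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2, stride=2),
        )
        self.blocks = nn.ModuleList(
            [ResidualBlock1d(channels) for _ in range(4)])
        self.attn = nn.MultiheadAttention(channels, num_heads=4,
                                          batch_first=True)
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                  nn.Linear(channels, n_classes))

    def forward(self, x):                      # x: (batch, 1, length)
        x = self.stem(x)
        for block in self.blocks:
            x = block(x)
            seq = x.transpose(1, 2)            # (batch, length, channels)
            a, _ = self.attn(seq, seq, seq)    # adaptive feature weighting
            x = x + a.transpose(1, 2)
        return self.head(x)
```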
2.6. Evaluation Criteria
This article explores the use of residual neural networks with a self-attention mechanism for making ventilation decisions in granaries under multiple modalities. Five evaluation metrics, namely loss, accuracy, precision, F1 score, and recall, are used to compare the performance of the proposed model against other models.
2.6.1. Cross-Entropy Loss
The cross-entropy loss is a commonly used loss function in deep learning, especially in classification tasks, where we want the model to assign each input sample to the correct category. The cross-entropy loss measures the difference between the predicted and actual categories. Its formula is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\log\left(p_{ic}\right)$$

where:
$N$: the total number of samples;
$C$: the total number of categories;
$y_{ic}$: the true label of sample $i$, which is 1 if the sample belongs to category $c$, or 0 otherwise;
$p_{ic}$: the probability that the model predicts sample $i$ belongs to category $c$.
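For illustration, the same loss evaluated with PyTorch on a single toy sample; `F.cross_entropy` takes raw logits and applies log-softmax internally:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3]])  # one sample, five modes
target = torch.tensor([0])                            # true class index

# Computes L = -(1/N) sum_i sum_c y_ic * log(p_ic) over the batch.
loss = F.cross_entropy(logits, target)
print(loss.item())
```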
2.6.2. Accuracy
Accuracy is one of the most commonly used metrics for comparing model performance and evaluates the overall correctness of a model's classifications. However, accuracy is not a universal evaluation metric. In some cases it can be misleading, because it considers only the number of correctly classified samples and ignores how the model errs on misclassified samples. Its formula is as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where:
TP (True Positive): predictions that are positive and actually positive;
TN (True Negative): predictions that are negative and actually negative;
FP (False Positive): predictions that are positive but actually negative;
FN (False Negative): predictions that are negative but actually positive.
2.6.3. Precision
Precision is used to evaluate the proportion of true positive samples among all samples that the model predicts as positive, so it can be used to measure the prediction accuracy of the model. Its formula is as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
2.6.4. Recall
Recall, also known as sensitivity, is the proportion of true positive samples that are correctly identified by the classifier among all positive samples. It can be understood as the ability of the model to correctly identify positive samples and is also referred to as the model's "true positive rate" or "hit rate". Its formula is as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
2.6.5. F1 Score
The F1 score is a metric that considers both the precision and recall of a classification model and is commonly used to evaluate the performance of binary classification models. Its formula is as follows:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
2.6.6. Confusion Matrix
The confusion matrix is a tool for evaluating classification models. It is a matrix that shows the cross-tabulation of actual and predicted classes: rows represent the actual classes, while columns represent the predicted classes. From the confusion matrix, we can derive evaluation metrics such as accuracy, precision, recall, and F1 score.
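All of these metrics, including the confusion matrix, can be computed from the predictions in one pass; a small sketch with toy labels, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = np.array([0, 1, 2, 2, 1, 0, 3, 4, 2, 1])  # toy labels, five modes
y_pred = np.array([0, 1, 2, 1, 1, 0, 3, 4, 2, 0])

print(confusion_matrix(y_true, y_pred))   # rows: actual, columns: predicted
print(accuracy_score(y_true, y_pred))     # (TP + TN) / total
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(p, r, f1)                           # macro-averaged precision, recall, F1
```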
4. Discussion
In this study, we compared the proposed residual network model with self-attention mechanisms to several common sequence models, including LSTM, GRU, ResNet, and CNN. LSTM and GRU are widely used in sequence modeling tasks, while ResNet and CNN are popular deep learning architectures for image processing and classification. Compared to these models, our proposed model achieved higher accuracy in all ventilation categories, especially for cooling ventilation, where it reached 99%.
One of the advantages of the proposed model is its ability to capture the temporal and spatial dependencies in the ventilation data, which is important for accurately identifying different ventilation categories. Unlike the other models, our proposed model incorporates the self-attention mechanism, which allows it to focus on important features and enhance their representation. This mechanism also enables the model to learn more complex relationships between the inputs and outputs, which is crucial for achieving high accuracy in the ventilation classification task.
However, some limitations remain to be addressed in future work. The model has high computational complexity due to its deep network structure and the self-attention mechanisms inside the residual blocks, especially given the input sequence length of 425 and the cost of self-attention, which grows quadratically with sequence length. As a result, the current training and evaluation times are long: training takes 23 min 16 s on average, and evaluation takes approximately 1 min 16 s. Moreover, the dataset used in this study covers only a certain range of grain conditions and ventilation scenarios, which may not fully represent real-world situations.
Overall, our proposed model provides a promising approach to the development of more accurate and efficient ventilation control systems for grain storage by leveraging the principles of computer modeling and self-attention mechanisms. The model has demonstrated superior performance compared to several commonly used models, and future work can further improve its robustness and efficiency.
5. Conclusions
This paper discusses the current status of intelligent ventilation management in grain storage and its main challenges, which are due to a lack of clarity around the concept of intelligent ventilation and grain storage data. To address this, a multimodal concept for grain storage is proposed, which transforms the traditional ventilation problem into a pattern selection problem. This allows decision-makers to make informed decisions based on multiple factors rather than solely relying on ventilation regulations to determine the existence of a problem.
The study combines self-attention mechanisms with residual network models to solve decision-making problems in abnormal grain situations. The experimental results demonstrate that residual networks with self-attention mechanisms converge faster and have smaller losses, providing more accurate and efficient decision support for grain storage managers. Moreover, the use of multi-head attention mechanisms significantly improves feature extraction for sequence data, and adjusting these mechanisms for grain situation data in the future may further improve the accuracy of residual networks and shorten decision-making time.
Compared with traditional methods, this approach has significant advantages in dealing with decision-making problems in abnormal grain situations. By considering multiple factors and utilizing self-attention mechanisms, this method provides more accurate and efficient decision support for grain storage managers. In the future, this method can be extended to other fields, providing valuable insights and solutions to a wider range of decision-making problems.