1. Introduction
As the internet and big data continue to evolve, accessing online information effectively has become an increasingly important issue [1,2,3]. Agriculture is currently moving rapidly towards modernization and informatization [4], and the volume of agricultural news is increasing daily. However, agricultural news involves many industries that are difficult to distinguish, forcing readers to spend more time screening for the news they need [5] and dramatically hindering the dissemination of agricultural news [6,7,8]. Correct classification of agricultural news can therefore lead to the more accurate dissemination of advanced agricultural technologies [9,10,11], such as bioenergy robots for agriculture [12] and specialist drones [13], which can provide more solutions for the development of agriculture. Disseminating agricultural news can thus significantly contribute to the development of agriculture and is of great importance to modern agriculture [14]. However, few studies have focused on the classification of agricultural news, and accurately classifying it has become an urgent problem.
In recent years, with scientific and technological development, natural language processing (NLP) has advanced rapidly [15]. As an important branch of NLP, text classification has also developed quickly, and there have been many studies on news text classification. However, relevant research on agricultural news classification remains scarce.
Hu et al. [16] have focused on improving a patent keyword extraction algorithm using the distributed Skip-gram model, and proposed a new keyword extraction method to improve the effectiveness of text classification algorithms. Junjie Li et al. [17] have proposed a two-channel news headline classification model based on the enhanced representation through knowledge integration (ERNIE) pre-training model, using ERNIE and BiLSTM-AT to extract text information and deep pyramid convolutional neural networks (DPCNNs) to overcome the long-distance text dependency problem; their approach performed well in news text multi-classification applications. Taimoor Ahmed Javed et al. [18] have proposed a deep learning model for hierarchical text classification of Urdu news. Their model uses Word2Vec to convert words into vectors, followed by LSTM networks to learn text features and perform the final classification.
There is still relatively little research on the classification of Chinese agricultural news. Yang et al. [19] have proposed a model based on ERNIE, BiGRU, and DPCNN-upgrade (EGC), in which the text is first encoded by ERNIE, followed by feature extraction by the DPCNN and bidirectional gated recurrent unit (BiGRU). The extracted features are then fused, and the fused features are finally classified by Softmax. This model has achieved the best results on a Chinese agricultural news data set so far.
Almost all of these methods have tweaked individual models and improved only the average classification accuracy, without achieving the highest accuracy in every category; as such, text classification has not yet reached the desired level of effectiveness. As there are five general categories of agricultural news, even the most advanced models cannot achieve the highest classification accuracy in all categories. The fine-tuning process typically consists of two steps: first, the various parameters of the model are fine-tuned; second, the model that achieves the highest accuracy on the validation set is kept, while the other models are abandoned. These discarded models can nevertheless provide great utility in the field of classification. Research from Google has shown that, while not performing optimally overall, some models perform well on certain data sets, such that accuracy can be improved by combining such models [20]. They gave multiple examples where combining just two models significantly improved accuracy, with significant implications for the performance of models in downstream tasks. Google has further proposed the GreedySoup weighting strategy for improving model accuracy [21]. Instead of selecting the single fine-tuned model that achieves the highest accuracy on the validation set, this approach combines multiple independent models and then adjusts their weights; this strategy was also shown to be effective in improving accuracy.
In summary, the E3W model proposed in this paper is obtained by combining several models that perform well on different categories of the data and adjusting the weights of the models, which provides a feasible and effective modelling approach. In comparative experiments with 13 models, the four models that performed best on different categories of the data set were selected: ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN. GloVe essentially involves the dimensionality reduction of a matrix; however, certain attributes are inevitably lost in the process of dimensionality reduction. As agricultural text titles are inherently short, carry limited information, and cover many categories, dimensionality reduction can easily lose information, making correct classification difficult. For these reasons, we chose to combine Word2Vec with TextCNN.
The E3W model proposed in this paper first combines these four models, then uses the GreedySoup weighting strategy to weight the outputs of the four sub-models differently. The category with the highest calculated weight value is selected as the final output. Comparative experiments on a Chinese agricultural news data set demonstrate that E3W achieves state-of-the-art results, and the experimental validation analysis proves the effectiveness of the model for the classification of Chinese agricultural news.
The main contributions of this paper are as follows:
We propose the E3W model, which combines several sub-models and adopts the GreedySoup weighting strategy to adjust the model, achieving the best classification performance on a Chinese agricultural news data set to date;
By combining multiple sub-models, we solve the problem wherein traditional models cannot achieve the highest classification accuracy in all categories;
Applying the GreedySoup weighting strategy to combined models solves the problem of weight assignment when multiple models are combined, and provides a solution to similar problems in other fields.
The remainder of this paper is structured as follows: Section 2 presents information on the four sub-models used. Section 3 describes the structure and operational procedures of the proposed E3W model. Section 4 describes the experiments conducted, as well as discussing and analyzing the obtained results. Finally, Section 5 discusses the strengths, weaknesses, and outlook of the proposed approach.
2. Background
2.1. Combined Model Studies
In the multi-classification domain, it is difficult for improvements built on individual models to achieve the best results in all categories, which is an important reason why accuracy is hard to improve in this domain. Chinese agricultural news classification is inherently a five-category task; in order to improve classification accuracy more significantly, other methods should therefore be considered.
Traditional approaches to improving model accuracy often involve first training multiple models with different hyperparameters, then selecting the single model that performs best on the validation set while discarding the rest [22,23]. Alternatively, different techniques may be used at different stages of a single model in order to improve its accuracy [24,25]. While research on combining multiple models remains scarce, recent studies have shown that combined models (i.e., those that combine the outputs of multiple models) can outperform the best single models.
The improvement in effectiveness through model combination is clear: a Google study, which conducted systematic experiments using 82 models, showed that even low-accuracy models may be useful in combination. In one experiment, combining a high-accuracy model with a low-accuracy model improved accuracy by 7% [20], a significant improvement. Therefore, in order to improve the classification accuracy for agricultural news, we considered a combined-model approach for classification.
When using a combined model for classification, it is important to adjust the weight of each model across the structure in order to achieve better results, as incorrect weight adjustments may reduce the effectiveness of the model. We therefore used the GreedySoup strategy, proposed by Google in 2022 [21], to adjust the model weights. GreedySoup first sorts the models from highest to lowest according to their experimental accuracy on the data set, then combines some of them and adjusts the weight of each model's output; if a weight adjustment improves the accuracy of the combined model, that weight is kept. The GreedySoup weighting strategy ensures that the combined model classifies no less effectively than any of its sub-models. For this paper, we adopted this approach to adjust the model weights; that is, we weighted the output of each model and experimentally obtained the weight set leading to the best classification accuracy.
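The greedy selection at the heart of GreedySoup can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the models are represented only by labels, and the evaluator is a made-up table standing in for validation-set accuracy.

```python
def greedy_soup(models, evaluate):
    """Greedily grow a 'soup': start from the best single model and add
    the next-ranked model only if validation accuracy does not drop."""
    # Sort candidate models from highest to lowest individual accuracy.
    ranked = sorted(models, key=lambda m: evaluate([m]), reverse=True)
    soup = [ranked[0]]
    best_acc = evaluate(soup)
    for model in ranked[1:]:
        candidate = soup + [model]
        acc = evaluate(candidate)
        if acc >= best_acc:          # keep the model only if it helps
            soup, best_acc = candidate, acc
    return soup, best_acc

# Hypothetical validation accuracies for each combination; in practice
# `evaluate` would run the combined model on the validation set.
scores = {
    ("A",): 0.80, ("B",): 0.78, ("C",): 0.70,
    ("A", "B"): 0.83, ("A", "C"): 0.79, ("A", "B", "C"): 0.81,
}
evaluate = lambda combo: scores.get(tuple(sorted(combo)), 0.0)
soup, acc = greedy_soup(["A", "B", "C"], evaluate)
# Model C is rejected because adding it lowers the combined accuracy.
```

Note how the `acc >= best_acc` test is what guarantees the property stated above: the final soup is never less accurate than the best single model it started from.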
To improve the classification accuracy on agricultural texts, we apply this multi-model combination approach to the field of text classification and propose the E3W classification model, which combines four sub-models and uses the GreedySoup strategy to select the models that perform best in the sub-domains, as well as to weight the outputs of the sub-models, resulting in significant accuracy improvements.
2.2. Related Work
Natural language processing (NLP) is the study of theories and methods that enable effective communication between humans and computers through the use of natural language, which has a wide range of applications in scenarios such as opinion monitoring, opinion extraction, text classification, and question-answering. Text classification—one of the most fundamental tasks in NLP—has been addressed in many scenarios, such as conversational bots, emotion recognition, and other directions. Similarly, there has been a significant amount of research in the field of news classification.
Fesseha et al. [26] proposed a convolutional neural network (CNN)-based text classification method, which achieved better results than traditional machine learning methods. In the field of Chinese agricultural news classification, Huo Tingting [27] proposed an improved algorithm, CFT-fastText, based on fastText for solving the agricultural news text classification problem. Subsequently, Yang et al. proposed a model based on ERNIE, BiGRU, and DPCNN-upgrade (EGC), which achieved the best results so far in the field of Chinese agricultural news classification. At present, some of the most advanced and common tools used for text classification include BERT, ERNIE, TextCNN, DPCNN, Word2Vec, and BiGRU.
As E3W consists of a combination of four sub-models, this section will focus on the four sub-models: ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN.
2.2.1. ERNIE
Enhanced representation through knowledge integration (ERNIE) is a model proposed by Baidu in April 2019 [28] that further improves upon the BERT model to obtain state-of-the-art results in Chinese NLP tasks. BERT masks text when processing Chinese [29], but its masking operates on individual words and ignores textual connections, resulting in less-than-comprehensive extracted features; this problem does not occur when processing English [30]. ERNIE, on the other hand, masks whole phrases, which captures the relationships between words well.
As shown in Figure 1, BERT randomly masks 15% of the text, but the masking does not take contextual connections into account; as a result, a word may be separated, fragmenting the original meaning expressed by the phrase and making the masked phrase difficult to infer. ERNIE changes the way BERT masks: instead of masking individual words, it masks entities and phrases, which gives the model a stronger grammar-learning capability. The ways in which ERNIE and BERT mask words are depicted in Figure 1.
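The contrast between word-level and phrase-level masking can be sketched as follows. This is an illustrative toy, assuming a pre-tokenized sentence, a made-up phrase list, and a `[MASK]` placeholder; ERNIE's actual masking uses its own entity and phrase dictionaries.

```python
import random

def word_level_mask(tokens, ratio=0.15, rng=None):
    """BERT-style masking: each token is masked independently."""
    rng = rng or random.Random(0)
    return [("[MASK]" if rng.random() < ratio else t) for t in tokens]

def phrase_level_mask(tokens, phrases, rng=None):
    """ERNIE-style masking: when a known phrase is selected,
    all of its tokens are masked together as one unit."""
    rng = rng or random.Random(0)
    out, i = [], 0
    while i < len(tokens):
        matched = next((p for p in phrases
                        if tokens[i:i + len(p)] == list(p)), None)
        if matched and rng.random() < 0.5:       # mask the whole phrase
            out.extend(["[MASK]"] * len(matched))
            i += len(matched)
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = ["Harry", "Potter", "is", "a", "series", "of", "novels"]
masked = phrase_level_mask(tokens, phrases=[("Harry", "Potter")],
                           rng=random.Random(1))
# Both tokens of "Harry Potter" are masked together, so the model must
# recover the whole entity rather than one fragment of it.
```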
In addition to the major changes to the mask, ERNIE adds a number of Chinese data sets for training. BERT uses the Chinese Wikipedia as its training data set in the Chinese processing domain, while ERNIE uses the Chinese Wikipedia and adds Baidu's own data sets: Baidu Encyclopedia (solid, strong descriptions), Baidu News (professional, fluent corpus), and Baidu Posting (multi-round conversations). These three data sets have different emphases and provide a more comprehensive enhancement to the model. The core of ERNIE is the transformer encoder: data are input, encoded, and location information is added; the result is then computed using a multi-head attention mechanism before a normalization operation is performed to output the final encoded result. The structure of the transformer is shown in Figure 2.
2.2.2. ERNIE + DPCNN
The ERNIE model is a further optimization of the BERT model that modifies the BERT masking approach by taking phrases as the smallest masking unit and adding multiple Chinese training sets. In the ERNIE + DPCNN model, the text is first encoded by ERNIE, following which the encoded text is used as input to the DPCNN to fully extract the features of the text. The obtained features are finally used for text classification by Softmax.
The deep pyramid convolutional neural network (DPCNN) is a type of convolutional neural network proposed by Johnson and Zhang [31]. The core of the network comprises equal-length convolution and half-pooling layers: the equal-length convolution has input and output of size n, while each half-pooling layer halves the length of the input sequence, so that the lengths shrink as the layers deepen, eventually giving the core a pyramid shape.
After the text is input, it passes through a region embedding layer containing three different convolutional feature extractors, then two layers of equal-length convolution. Finally, the text repeatedly passes through a residual block with half pooling, which continuously refines the semantics of the text and makes the extracted features richer. The structure of the DPCNN is shown in Figure 3.
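The pyramid shape comes from the repeated halving of the sequence length. A small sketch of how the length shrinks under stride-2 half pooling (the ceiling division is an assumption; exact lengths depend on the pooling window and padding used):

```python
def half_pool_lengths(n, min_len=2):
    """Sequence lengths after each DPCNN half-pooling block:
    stride-2 pooling roughly halves the length each time."""
    lengths = [n]
    while lengths[-1] > min_len:
        lengths.append((lengths[-1] + 1) // 2)   # ceil(n / 2)
    return lengths

# A 128-token input shrinks pyramid-style (128 -> 64 -> 32 -> ... -> 2),
# so the number of blocks grows only logarithmically with text length.
pyramid = half_pool_lengths(128)
```

This is also why the network stays cheap on long inputs: total computation over all levels is bounded by roughly twice the cost of the first level.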
2.2.3. EGC
Based on a combination of the ERNIE, BiGRU, and DPCNN-upgrade models and proposed by Yang Sengqi et al. in 2022, EGC is a model for agricultural news classification that achieved the best classification accuracy on Chinese agricultural news.
EGC is divided into four parts, consisting of an input coding layer, a feature extraction layer, a feature fusion layer, and Softmax activation. At the input encoding layer, the input text is masked by ERNIE, embedded, and finally fed into the transformer for encoding. The encoded text is then fed into the feature extraction layer, which feeds the data into the DPCNN-upgrade and BiGRU modules for feature extraction. The extracted features are then fed into the feature fusion layer, where they are stitched together and fused into new features. The stitched features are then classified using Softmax, in order to obtain the final result. The EGC structure is shown in
Figure 4.
The GRU model contains two gate structures: the update gate and the reset gate [32]. The reset gate determines how new input information is combined with previously stored data, while the update gate indicates the amount of previous memory retained at the current time step. Compared to the LSTM model [33], the GRU model is faster to train and better able to represent text features. Let the input be Xt and the output of the GRU hidden layer at moment t be Ht; W is the weight matrix connecting the two layers, the subscripts r and z denote the reset and update gates, respectively, and σ denotes the activation function. The calculation formulas are as follows:
Rt = σ(XtWxr + Ht−1Whr)
Zt = σ(XtWxz + Ht−1Whz)
H̃t = tanh(XtWxh + (Rt ⊙ Ht−1)Whh)
Ht = Zt ⊙ Ht−1 + (1 − Zt) ⊙ H̃t
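One GRU step can be sketched in plain Python. This is a scalar toy version (hidden size 1, made-up weights, biases omitted) intended only to mirror the gate computations; real implementations operate on matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, w):
    """One GRU step for a scalar input and hidden state. `w` holds the
    weights Wxr, Whr, Wxz, Whz, Wxh, Whh (biases omitted for brevity)."""
    r = sigmoid(x_t * w["xr"] + h_prev * w["hr"])            # reset gate
    z = sigmoid(x_t * w["xz"] + h_prev * w["hz"])            # update gate
    h_tilde = math.tanh(x_t * w["xh"] + (r * h_prev) * w["hh"])
    return z * h_prev + (1.0 - z) * h_tilde                  # new hidden state

# Made-up weights and a toy input sequence.
w = {"xr": 0.5, "hr": 0.5, "xz": 0.5, "hz": 0.5, "xh": 1.0, "hh": 1.0}
h = 0.0
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h, w)
# h is a convex combination of the old state and a tanh candidate,
# so it always stays inside (-1, 1).
```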
The model uses BiGRU for feature extraction. As news texts are typically short, the relationships between contexts need to be extracted in order to fully capture the semantic information of the headlines. However, a GRU only captures the impact of the preceding text on the following text, and does not reflect the impact of the following text on the preceding text. The output of each step of BiGRU combines the forward and backward states at the current step, which better considers the relationships between contexts, such that more complete and rich feature information can be extracted [34]. The BiGRU structure is shown in Figure 5.
DPCNN-upgrade is an improvement to the deep pyramid convolutional neural network (DPCNN) proposed by Johnson and Zhang; in particular, it is adapted to the characteristics of short agricultural news. According to our calculations, the average length of the agricultural news data set in this paper was about 18.98 words, indicating that agricultural news is generally shorter than other news. In light of this, DPCNN-upgrade removes two of DPCNN's convolutional layers to retain more text features, thus achieving better results. The DPCNN-upgrade structure is shown in Figure 6.
2.2.4. Word2Vec + TextCNN
In the proposed model, we combine Word2Vec (a tool for training feature-rich word vectors) with TextCNN (a neural network specifically designed for text classification), in order to further improve the classification accuracy.
Word2Vec, a set of word-embedding tools introduced by Google in 2013 [35], has a wide range of applications [36]. It is a deep learning-based tool that obtains a vector representation of each word in a corpus, providing a more efficient means of representing semantic distances between words (e.g., through the cosine distance between word vectors). Word2Vec consists of two word-vector training models, CBOW and Skip-gram, both of which include an input layer, a projection layer, and an output layer. The CBOW model predicts the current word from its context, which is suitable for smaller data sets; therefore, in this paper, we use the CBOW model. In contrast, the Skip-gram model predicts the context from the current word. The structures of these two models are shown in Figure 7.
Word2Vec processes text vectors by taking into account the relationships between contexts, resulting in more feature-rich text vectors than previous embedding methods and making it easier to extract features and perform related processing. The word vectors generated by Word2Vec are lower-dimensional than those generated by plain embedding, and are faster and more general in operation, allowing them to be used in a variety of NLP tasks and to achieve better classification results [37,38].
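The semantic distance mentioned above is typically measured by cosine similarity between trained vectors. A minimal sketch follows; the 4-dimensional vectors and the three words are made-up illustrations, not real Word2Vec output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: values near 1.0
    indicate similar directions (related words), values near 0.0
    indicate near-orthogonal (unrelated) directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional vectors for three words.
vec = {
    "wheat":  [0.9, 0.8, 0.1, 0.0],
    "corn":   [0.8, 0.9, 0.2, 0.1],
    "tariff": [0.1, 0.0, 0.9, 0.8],
}
sim_related   = cosine_similarity(vec["wheat"], vec["corn"])
sim_unrelated = cosine_similarity(vec["wheat"], vec["tariff"])
# The two crop words point in similar directions, so sim_related is
# much higher than sim_unrelated.
```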
We also use TextCNN, which is based on a modified CNN architecture. CNNs have a wide range of applications in the field of deep learning, due to their great usefulness [39,40]. In a CNN, the data are first fed into the input layer to obtain the original matrix. The features of this matrix are then extracted by the convolutional layer, where the size of the output is given by Formula (5):
N2 = (N1 − F + 2P)/stride + 1,  (5)
where N2 is the size of the output, N1 is the size of the input data, F is the size of the convolution kernel, stride is the sliding step of the convolution kernel, and P is the amount of padding added to the input data, allowing it to be divisible when the stride is greater than 1. As the calculations between adjacent layers of the neural network can only be linear, an activation function is needed to introduce non-linearity, enabling the neural network to simulate more complex models. Two activation functions are mainly used in traditional neural networks: σ(x) and tanh(x). Both suffer from a small interval range and vanishing gradients. To solve these problems, the ReLU function, shown in Formula (6), is mainly used, as it is a linear operation with high efficiency:
ReLU(x) = max(0, x),  (6)
where it can be seen that the function takes the value 0 when the input is less than 0, while the actual value is taken when the input is greater than 0, which avoids the vanishing gradient problem. The pooling layer is used for down-sampling (i.e., sparse processing) of the feature data; the size of the pooled output is given by Formula (7):
N2 = (N1 − F)/stride + 1.  (7)
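Formulas (5)–(7) can be checked numerically with a short sketch. The example sizes (a 19-word headline, kernel width 3) are illustrative assumptions chosen to match the average headline length reported later.

```python
def conv_output_size(n_in, kernel, stride=1, padding=0):
    """Formula (5): output length of a convolution layer."""
    return (n_in - kernel + 2 * padding) // stride + 1

def relu(x):
    """Formula (6): ReLU passes positive inputs through and zeroes the rest."""
    return max(0.0, x)

def pool_output_size(n_in, window, stride):
    """Formula (7): output length of a pooling layer (no padding)."""
    return (n_in - window) // stride + 1

# A 19-word headline with kernel size 3, stride 1, and padding 1:
n_conv = conv_output_size(19, kernel=3, stride=1, padding=1)   # length preserved
n_pool = pool_output_size(n_conv, window=2, stride=2)          # roughly halved
```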
TextCNN is an algorithm for classifying text using a convolutional neural network, proposed by Yoon Kim in 2014 [41]. TextCNN introduces some modifications to the input layer of the CNN in order to improve text classification performance. The data are first encoded as a vector matrix, then convolved and max-pooled; finally, the data are classified by Softmax. When convolving the text, TextCNN automatically combines and filters N-gram features to obtain semantic information at different levels of abstraction, and is able to extract richer text features. Notably, this approach is more effective when processing text with fewer features. The TextCNN structure is shown in Figure 8.
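The convolution and max-over-time pooling at the heart of TextCNN can be sketched in plain Python. This toy version uses a single bigram filter over made-up 2-dimensional word vectors; a real TextCNN uses many filters of several widths, learned embeddings, and a final Softmax.

```python
def ngram_conv(matrix, filt):
    """Slide an n-gram filter over the sentence matrix; each window's
    feature is the dot product of the flattened window and the filter."""
    n = len(filt) // len(matrix[0])          # filter width in words
    feats = []
    for i in range(len(matrix) - n + 1):
        window = [x for row in matrix[i:i + n] for x in row]
        feats.append(sum(a * b for a, b in zip(window, filt)))
    return feats

def max_over_time(feats):
    """Max-over-time pooling keeps only the strongest n-gram activation,
    making the output length-independent."""
    return max(feats)

# A 4-word sentence with 2-dimensional word vectors and one bigram filter.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [1.0, 1.0, 1.0, 1.0]
feature = max_over_time(ngram_conv(sentence, bigram_filter))
```

Because pooling keeps only the maximum, sentences of different lengths all reduce to a fixed-size feature vector, which is what allows the Softmax layer to classify them uniformly.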
3. E3W Model
In selecting the models that make up E3W, we chose the following four models, taking into account the fact that Chinese agricultural news headlines are short, making it difficult to extract many features.
ERNIE: Compared with BERT, ERNIE's improved mask mechanism makes it more applicable to Chinese, obtaining state-of-the-art results in Chinese NLP tasks.
ERNIE + DPCNN: As news headlines are short, we applied the DPCNN to make the extracted features richer. In the DPCNN, the text repeatedly passes through residual half-pooling blocks; in this way, the semantics of the text are continuously refined and more features can be obtained.
EGC: In the field of Chinese agricultural news classification, the EGC model has presented the best classification effect and, at the same time, provides a more intuitive basis for comparison.
Word2Vec + TextCNN: Word2Vec is used to train feature-rich word vectors to compensate for the lack of features due to the short text, while TextCNN is an improved CNN model that is more advantageous for text classification.
For these reasons, we chose ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN as the sub-models of E3W. The GreedySoup weighting strategy was then used to appropriately weight the outputs of the four sub-models, such that they produce better results when combined. The agricultural news was divided into five categories, denoted 0, 1, 2, 3, and 4 for fisheries, forestry, planting, animal husbandry, and side-businesses, respectively. We experimentally obtained the classification accuracy of each model on the different categories of the data set, in order to determine the best classification area of each model, that is, the category in which the model has the highest classification accuracy; this serves as an important basis for the GreedySoup weighting of the E3W sub-models. The best classification areas for ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN were categories 1, 0, 2 and 3, and 4, respectively.
The E3W structure can be divided into four parts: input layer, model combination, GreedySoup Fine-tune, and final output. The overall E3W structure is shown in
Figure 9.
First, in the input layer, the text is fed into the pre-treatment module for pre-processing. The pre-processed text is then fed into the combination of models, consisting of all four models (i.e., ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN), each of which outputs a result to the next stage. In the GreedySoup Fine-tune stage, the outputs of the four sub-models are weighted using the GreedySoup weighting strategy, which is carried out to obtain a matrix containing information on the weights of each category. Finally, in the final output stage, the category with the highest weight is selected as the final output.
3.1. Input Layer
The text is first entered into the model, after which it flows into a pre-processing module. In the pre-processing module, punctuation marks and stop words are filtered out [42], as these are among the many useless symbols in text information that have no practical meaning. This is followed by word segmentation. Chinese word segmentation is not as simple as English word segmentation, as there are typically no obvious marks distinguishing words, and semantic and logical relationships must often be taken into account. The quality of word segmentation directly affects information analysis and the experimental results. The word segmentation tool we used was Jieba. The cleaned text was then fed into the model combination for calculation.
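The cleaning step can be sketched as follows. For illustration, the headline is already segmented into a token list (in our pipeline, Jieba performs the segmentation), and the stop-word list is a small made-up fragment rather than a full stop-word dictionary.

```python
# Illustrative fragment of a stop-word/punctuation list.
STOP_WORDS = {"的", "了", "等", "[", "]", "*", "！", "，"}

def preprocess(tokens):
    """Drop punctuation marks and stop words from a segmented headline."""
    return [t for t in tokens if t not in STOP_WORDS]

# A segmented headline (as Jieba might produce it) before cleaning.
tokens = ["[", "快讯", "]", "小麦", "的", "价格", "上涨", "了", "！"]
cleaned = preprocess(tokens)
# Only the content words survive: 快讯 / 小麦 / 价格 / 上涨
```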
3.2. Combination of Models
In the model combination stage, we combined the four sub-models to process the data. These four sub-models are those models with the highest classification accuracies on different categories of the data set and, by combining them, the classification accuracy could be improved. This idea comes from the approach proposed by Google and other research institutions in 2022 [
20]: as some models perform better on different categories of the same data set, by combining these models, a higher accuracy can be achieved. In this way, the performance of the models in downstream tasks can be improved.
In the model combination stage, the cleaned text passes through four processing routes: (1) in ERNIE, the text is first encoded by the transformer in ERNIE to obtain a vector representation, then classified by Softmax to obtain the output; (2) in ERNIE + DPCNN, the text is first encoded by ERNIE to obtain a vector representation, following which the DPCNN encodes the text vector again, refining the semantics of the text and making the extracted features richer; finally, the text vector is classified by Softmax to obtain the output; (3) in EGC, the text is first encoded by ERNIE, following which the encoded data are fed into DPCNN-upgrade and BiGRU for processing. BiGRU contains a bi-directional neural network structure that fully extracts the contextual relationships in the text, facilitating the extraction of deeper features. DPCNN-upgrade, an improvement of DPCNN, removes two convolutional layers to retain more features of the short news text. The two sets of features extracted by DPCNN-upgrade and BiGRU are then fused to form new features, which Softmax finally classifies to obtain the output; and (4) in Word2Vec + TextCNN, Word2Vec first encodes the text to obtain a vector representation, after which TextCNN, a neural network specifically applied to text classification, convolves and pools the encoded data to extract richer text features. Finally, the TextCNN-processed data are classified by Softmax to obtain the output. The model combination structure is shown in Figure 10.
Figure 10.
3.3. GreedySoup Fine-Tune
In the GreedySoup Fine-tune stage, the weights are adjusted using the GreedySoup weighting strategy, in which the output weights of each model are adjusted to achieve the best classification effect. GreedySoup is a method for adjusting model weights, forming part of the ModelSoup method proposed by Google in 2022 [21]. GreedySoup first sorts the models from highest to lowest according to their experimental accuracy on the data set, then combines some of them and adjusts the weight of each model's output: if a weight adjustment improves the accuracy of the combined model, the improved weight is kept. The GreedySoup weighting strategy ensures that the combined model's classification performance is no less effective than that of any of its sub-models.
The weight values X and Y in the GreedySoup Fine-tune stage are obtained through parametric experiments; subsequent experimental results showed that the best results were obtained when Y was greater than X. The final parameters used in this paper were (X: 1, Y: 2). In the GreedySoup Fine-tune stage, we weight the four outputs produced in the model combination stage. First, we determine whether the output category of a model belongs to the best classification area of that model; if it does, the output is assigned a weight of Y and, if not, a weight of X. We then adjust the weights of the outputs to improve the performance of the combined model.
By adjusting the weights, we obtain the output of each model, along with the weight for this output. For example, if the output of the ERNIE model is Class 0 and this output is not the best classification area of ERNIE, the weight of this output is X; if the output of the EGC model is Class 3 and this output is the best classification area of EGC, the weight of this output is Y. Next, we add up the weights for the same class; for example, if the output and weights of EGC and Word2Vec + TextCNN are Class 3: Y, Class 3: X, respectively, by adding them together, we obtain the weights of Class 3 as X + Y. By summing the weights, a matrix containing the different output categories and the corresponding weights is obtained. The GreedySoup Fine-tune process is depicted in
Figure 11.
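The GreedySoup Fine-tune and final-output stages together amount to the weighted vote sketched below, using the best classification areas and the weights (X: 1, Y: 2) reported above. The sub-model predictions here are illustrative inputs; in E3W they come from the four sub-models' Softmax outputs.

```python
X, Y = 1, 2   # weights from the parametric experiments

# Best classification area(s) of each sub-model (categories 0-4).
BEST_AREAS = {
    "ERNIE": {1},
    "ERNIE+DPCNN": {0},
    "EGC": {2, 3},
    "Word2Vec+TextCNN": {4},
}

def e3w_vote(predictions):
    """Weight each sub-model's predicted class (Y inside its best
    classification area, X otherwise), sum the weights per class,
    and return the class with the highest total weight."""
    scores = {}
    for model, cls in predictions.items():
        weight = Y if cls in BEST_AREAS[model] else X
        scores[cls] = scores.get(cls, 0) + weight
    return max(scores, key=scores.get)

# Example: EGC predicts class 3 inside its best area (weight Y), and the
# Word2Vec+TextCNN branch also predicts class 3 outside its area
# (weight X), so class 3 accumulates X + Y = 3 and wins the vote.
preds = {"ERNIE": 0, "ERNIE+DPCNN": 1, "EGC": 3, "Word2Vec+TextCNN": 3}
final = e3w_vote(preds)
```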
3.4. Final Output
In the GreedySoup Fine-tune structuring stage, we obtained a matrix containing different weights for different categories. In the final output stage, we select the category with the highest weight in this matrix as the E3W output result. For example, the weight of Class 3 is X + Y, which has the highest weight; thus, the final output is Class 3.
Finally, combined with the above description, Algorithm 1 provides a concrete implementation of the E3W classification method.
Algorithm 1. GreedySoup Fine-tune.
Input: the preprocessed text set Class
Output: ClassN // N indicates the text category
1: for each module ∈ Modules do // pass the data into each model
2:   P = Best_classification_area
3:   N = classification_result
4:   if P = N then
5:     ClassN = Y
6:   else
7:     ClassN = X
8:   end if
9:   Sum(ClassN) // accumulate the weight of ClassN
10: end for
11: Max(ClassN) // select the ClassN with the maximum weight
12: return ClassN // the output
4. Experiments and Analysis
As we found no publicly available data set for Chinese agricultural news, we built our own. We then compared E3W with its sub-models and several different combined models, analyzing the results from several perspectives.
4.1. Data Set
At present, a data set for Chinese agricultural news is lacking. The most famous data set for Chinese news classification is the THUCNews data set [43], which contains 740,000 news documents across 14 categories, such as sports, education, and science and technology; however, agricultural news is absent. In the absence of a data set in a given research direction, it is common to construct the required data set [44]; accordingly, we constructed the data set used in this paper.
The data set constructed in this paper consists of Chinese agricultural news headlines, collected using the Octopus software [45]. The data were obtained from agricultural news websites such as China Animal Husbandry Network, Ocean Information Network, Southwest Fishery Network, China Agriculture Network, and China Soybean Network. These are all very large agricultural websites in China, hosted by professional agricultural companies; they provide professional and objective news and comprehensive information, and have great influence in the Chinese agricultural community. The data set includes data up to 2022, ensuring its currency, which is crucial for data quality [46,47,48].
The E3W model takes text data as input, including exclamations, special characters, etc. Therefore, we pre-process the data set, including the deletion of stop words and word segmentation. The deletion of stop words involves removing meaningless symbols, such as ‘[’ and ‘*’, as well as semantic words that have no real meaning, such as ‘etc’, ‘so on’, and ‘the’. These stop words and symbols provide no information, and their removal can help us to reduce the size of the training data, in order to capture the meaning more appropriately. Furthermore, word segmentation is the process of dividing Chinese text into entities and phrases, such that more of the meaning of the text can be retained for classification.
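A minimal sketch of this preprocessing step follows. The stop-word list and symbol pattern below are illustrative, as the actual lists used in this paper are not reproduced here; word segmentation itself is assumed to be performed by an external Chinese segmentation tool.

```python
import re

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"etc", "so on", "the"}
# Meaningless symbols such as '[' and '*' mentioned in the text.
SYMBOL_PATTERN = re.compile(r"[\[\]*#@!?,.:;()]+")

def preprocess(tokens):
    """Remove symbols and stop words from an already-segmented headline."""
    cleaned = []
    for tok in tokens:
        tok = SYMBOL_PATTERN.sub("", tok)   # strip symbols like '[' and '*'
        if tok and tok not in STOP_WORDS:   # drop empty and stop-word tokens
            cleaned.append(tok)
    return cleaned

print(preprocess(["[", "corn", "prices", "rise", "*", "the", "etc"]))
# → ['corn', 'prices', 'rise']
```

Removing such tokens shrinks the training data while preserving the content-bearing words.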
The data set comprised 15,548 news headlines, collected from websites in different domains. Manual checks were performed after data collection to ensure that the data were correctly assigned to the appropriate categories. Chinese agriculture is generally divided into five categories; hence, some studies on Chinese agricultural news have also treated its classification as a five-category NLP task [
19]. The ratio of data in the training, test, and validation sets was 8:1:1, and the average sentence length in the data set was 18.98 Chinese words. The number of samples in each category and other details of the data set are provided in
Table 1.
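The 8:1:1 split can be sketched as follows; this is a minimal illustration, and the function name and shuffling seed are our own.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split samples into train/test/validation sets (8:1:1)."""
    samples = samples[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]          # remainder goes to validation
    return train, test, val

train, test, val = split_dataset(list(range(100)))
print(len(train), len(test), len(val))  # → 80 10 10
```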
For a more visual representation of the data set constructed for this paper, we provide word clouds for the five categories of the data set, as well as for the total data, in
Figure 12.
Table 2 shows a selection of the self-built Chinese agricultural news data set, presented in English. As can be seen from
Table 2, the average text length in the data set was short, such that insufficient text features could be extracted, making classification more difficult. It is difficult for traditional models to achieve the highest classification accuracy in every category; this was an important consideration in this paper, motivating the use of multiple models, whose combination improved the classification accuracy. Among the categories, the planting category had the shortest texts, while the text lengths of the other categories did not differ greatly. It is worth noting that the side business category had the most extensive texts, which partially resembled those of the other categories; this is an important reason why the classification accuracy for the side business category was lower than those for the other categories.
4.2. Experimental Parameter Settings
In addition to the parameter settings of the four sub-models and E3W, the relevant parameters of the other models used in the experiments were set as follows: the word embedding dimension was 100, the hidden layer dimension was 769, the learning rate was set to 1 × 10−5, the maximum sentence length was 19, the dropout was set to 0.5, and the learning rate decay was 0.9. Where CNN convolutional kernels were involved, the kernel sizes were set to (2,3,4), and the number of kernels was 128.
4.2.1. Parameter Settings for ERNIE
The word embedding dimension was 100, the ERNIE hidden layer dimension was 769, the learning rate was 1 × 10−5, the sentence length was 19, and the dropout was set to 0.1.
4.2.2. Parameter Settings for ERNIE + DPCNN
The number of convolution kernels was 250. The size of the convolution kernels was (2,3,4). The other parameters were consistent with ERNIE.
4.2.3. Parameter Settings for EGC
The number of BiGRU layers was 2, while the number of BiGRU hidden layers was 256. Other parameters were consistent with ERNIE and ERNIE + DPCNN.
4.2.4. Parameter Settings for Word2Vec + TextCNN
The dimension of the words was 100. The sentence length was 19. The dropout was 0.5. The number of convolution kernels was 128. The size of the convolution kernels was (2,3,4). The learning rate was 1 × 10−3. The decay rate of the learning rate was 0.9.
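For reference, the per-model settings listed above can be collected into configuration dictionaries. This is purely illustrative; the key names are our own.

```python
# Shared defaults taken from Section 4.2; key names are illustrative.
BASE = {
    "embedding_dim": 100,
    "max_seq_len": 19,
    "learning_rate": 1e-5,
}

CONFIGS = {
    "ERNIE": {**BASE, "dropout": 0.1},
    "ERNIE+DPCNN": {**BASE, "dropout": 0.1,
                    "num_kernels": 250, "kernel_sizes": (2, 3, 4)},
    "EGC": {**BASE, "dropout": 0.1,
            "bigru_layers": 2, "bigru_hidden": 256},
    "Word2Vec+TextCNN": {**BASE, "dropout": 0.5,
                         "learning_rate": 1e-3, "lr_decay": 0.9,
                         "num_kernels": 128, "kernel_sizes": (2, 3, 4)},
}

print(CONFIGS["EGC"]["bigru_hidden"])  # → 256
```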
4.2.5. Parameter Settings for E3W
The parameters of the E3W model were the same as those described above, with the weighting parameters set to 1 for X and 2 for Y.
We obtained the best values for the weighting parameters X and Y in the GreedySoup Fine-tune stage. As E3W consists of a combination of four sub-models, candidate values of 1–4 were tested in the experiments. When a model's output fell within that model's best classification area, it was assigned the weight Y; otherwise, it was assigned X.
For example, as the best classification area of Word2Vec + TextCNN is Class 4, when the output of Word2Vec + TextCNN is Class 4, the weight of this output is Y; otherwise, it is X.
The experiments showed that the best results were obtained when Y was greater than or equal to X. In theory, Y greater than X should yield the best effect: E3W contains four sub-models, and when two sub-models incorrectly choose a category, the weighting ensures that there is still a chance of obtaining the correct result if the other two sub-models choose the correct category. As X = 1 and Y = 2 also lies within the range where Y is greater than or equal to X, the final parameters used in this paper were X = 1 and Y = 2. The experimental results are detailed in
Table 3.
In addition, in GreedySoup Fine-tune, there may be a situation where multiple categories are weighted equally, making it impossible to select the best output. In this case, two solutions are provided in this paper:
- (1)
The smaller category label was selected as the result (e.g., if class 0 and class 4 had the same weight, class 0 was selected as the final result). This approach achieved good results in the experiments, mainly because E3W has a lower classification accuracy for class 4, while classes 0, 1, 2, and 3 have more text and more obvious features, and thus a higher probability of correct classification.
- (2)
When the final weights of two categories were the same, if one of the outputs came from the best classification area of the model, then the weights of the other three sub-models were ignored. This operation ensured that the accuracy of E3W was not lower than that of either sub-model.
The above two solutions significantly improved the classification accuracy on the data set, where the improvement with both approaches was almost identical.
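The two tie-breaking rules can be sketched as follows. The paper treats them as alternative solutions with almost identical improvements; this illustrative function applies rule (2) first and falls back to rule (1), and all names are our own.

```python
def break_tie(weights, predictions, best_areas):
    """Resolve ties in the GreedySoup weighted vote.

    weights:     {class_label: accumulated weight}
    predictions: {model_name: predicted class label}
    best_areas:  {model_name: class in which that model is most accurate}
    """
    top = max(weights.values())
    tied = sorted(c for c, w in weights.items() if w == top)
    if len(tied) == 1:
        return tied[0]
    # Rule (2): if a tied class was predicted by a model inside its best
    # classification area, trust that model and ignore the others.
    for model, pred in predictions.items():
        if pred in tied and pred == best_areas[model]:
            return pred
    # Rule (1): otherwise, fall back to the smaller class label.
    return tied[0]

# Hypothetical tie between class 0 and class 4 with no best-area match:
print(break_tie({0: 2, 4: 2}, {"m1": 0, "m2": 4}, {"m1": 1, "m2": 2}))  # → 0
```

Rule (2) guarantees that E3W's accuracy cannot drop below that of a sub-model inside its own best classification area, while rule (1) biases ties toward the classes with higher standalone accuracy.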
4.3. Model Evaluation Indicators
For this study, an indicator test of the text classification model was carried out. The model evaluation indicators of the classification algorithm are often measured using the confusion matrix, as shown in
Table 4. The data obtained from the confusion matrix are extended by calculation to obtain four secondary metrics—accuracy, precision, recall, and F1-score—which are the core metrics for evaluating classification models [
49].
The accuracy of a classification model represents the ratio of samples correctly predicted by the model for all samples. In general, the higher the accuracy, the better the classifier. The accuracy is calculated as follows:
The precision of a classification model is defined as the percentage of samples with true positive class among all samples predicted to be positive class, calculated as:
The recall of a classification model is defined as the percentage of samples with true positive classes that are correctly predicted, and the formula is as follows:
The F1-score is the harmonic mean of precision and recall (see Equation (8)); it combines the precision and recall results and lies closer to the smaller of the two, so it is large only when both precision and recall are high. A higher F1 value indicates a better model prediction effect.
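These four metrics can be computed directly from the confusion-matrix counts (TP, FP, TN, FN); a minimal sketch using the standard formulas, assuming nonzero denominators:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # true positives among predicted positives
    recall = tp / (tp + fn)      # true positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration:
acc, p, r, f1 = classification_metrics(tp=80, fp=20, tn=90, fn=10)
print(acc, p, r, f1)
```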
4.4. Experiments
Next, 13 models were selected for experiments, in order to test the classification effects of different models on the agricultural news data set. The four models that performed best in the different categories were finally selected and combined to generate E3W.
4.4.1. Experiments on the 13 Models
The models tested included the basic Naïve Bayes and
k-nearest neighbors models, as well as more advanced models such as BERT, RoBERTa, and MacBERT. When combining CNN, RNN, and DPCNN with BERT, we found that BERT combined with DPCNN gave the best results. Additionally, to compensate for BERT's weaker masking capability when dealing with Chinese, ERNIE was also used for the classification of agricultural news, combined with DPCNN and BiGRU, respectively. To better suit agricultural news and retain more text features, DPCNN was optimized by removing two convolutional layers, yielding DPCNN-upgrade. We extracted features using both BiGRU and DPCNN-upgrade, fused the two sets of features, and then performed classification, achieving better results. In addition, we combined Word2Vec with TextCNN: Word2Vec was first trained to obtain word vectors, which were then input into TextCNN, with the final results obtained through Softmax classification. The experimental results for these models are provided in
Table 5.
It can be concluded, from
Table 5, that ERNIE + DPCNN achieved the best results for the fishery category. The average text length of the fishery data was the longest of the five categories, and with ERNIE and DPCNN many valid features were preserved, such that good results were achieved. ERNIE alone achieved the best results for the forestry category, whose average text length was second only to that of the fishery category, such that effective features could be extracted using only ERNIE. EGC achieved the best results in the planting and animal husbandry categories, which had the shortest average text lengths; in EGC, the data are encoded by ERNIE, after which DPCNN-upgrade and BiGRU extract different features, and the two parts are fused to obtain richer features and better results. The models based on BERT and ERNIE did not perform well on the side business category, which contains many sub-industries that are not easy to distinguish; here, Word2Vec + TextCNN achieved the best results, as Word2Vec first trains adequate word vectors, which TextCNN then uses to extract features.
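The feature-fusion step in EGC (combining the BiGRU and DPCNN-upgrade outputs before classification) amounts to concatenating the two feature vectors. A simplified NumPy sketch follows; the feature dimensions are hypothetical (2 × 256 = 512 for a bidirectional GRU with hidden size 256, and 250 channels for the convolutional branch), and the real features would come from the trained networks.

```python
import numpy as np

def fuse_features(bigru_feat, dpcnn_feat):
    """Concatenate the two feature vectors extracted for one sentence."""
    return np.concatenate([bigru_feat, dpcnn_feat], axis=-1)

# Hypothetical per-sentence features from the two branches:
fused = fuse_features(np.zeros(512), np.zeros(250))
print(fused.shape)  # → (762,)
```

The fused vector is then passed to the classification layer, so that both sequential (BiGRU) and local convolutional (DPCNN-upgrade) features contribute to the decision.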
4.4.2. Experiments with E3W and Four Sub-Models
Considering
Table 5, we obtained four sub-models with excellent classification results: ERNIE, ERNIE + DPCNN, EGC, and Word2Vec + TextCNN. We combined these four high-performing models to obtain the proposed E3W model. Then, we evaluated the five models in terms of accuracy (ACC), precision (P), recall (R), and F1-score (F1) and conducted relevant ablation experiments, the details of which are provided below.
E3W is the model proposed in this paper. In
Figure 13, we provide the confusion matrix for E3W; it can be seen, from the figure, that the number of misclassifications for side business was the highest among all of the classification areas. In particular, the distinction between side business and planting categories was the most difficult, as some side business-related industries, such as the textile industry, often use materials from the plantation industry, making them prone to misclassification. There are also difficulties in distinguishing between side business and animal husbandry for similar reasons, such as the leather sector, which obtains its raw materials from animal husbandry. These are the key reasons underlying the low classification accuracy of side business. Based on the presented confusion matrix, we also obtained the ACC, P, R, and F1 values for E3W, which will be shown subsequently for comparison.
We next verified whether the overall performance of the model was improved by analyzing the average accuracy.
Table 6 shows the average accuracies of different models. The E3W model proposed in this paper achieved the highest average accuracy, being 1.02% higher than that of the most advanced EGC. It can be concluded that the classification effect of the E3W model is excellent, and it is more effective than using any model alone.
Figure 14 shows the precision of the different models. It can be seen that E3W achieved the highest precision values for fisheries and planting, with increases of 0.02% and 1.59%, respectively. E3W achieved an average precision of 93.09%, which was 1.62% better than the state-of-the-art (EGC).
Figure 15 shows the recall of the different models. Again, E3W had the highest recall rates in fisheries, animal husbandry, and side business categories, improving by 0.78%, 0.81%, and 1.44%, respectively. E3W reached 90.61% in the average recall rate, ahead of the state-of-the-art by 1.21%.
Figure 16 shows the F1-scores of the different models, which reflect the overall capability of a model. E3W achieved the highest F1-score values in the fisheries, planting and side business categories, improving by 0.47%, 0.08% and 2.62%, respectively, with the average F1-score of E3W improving upon the state-of-the-art by 1.02%.
To verify the effectiveness of the E3W model combination, we conducted ablation experiments. From
Table 7, it can be seen that E3W presented significant improvements in average accuracy, average precision, average recall, and average F1-score, by 0.15%, 0.14%, 0.15%, and 0.12%, respectively. The results of the ablation experiment fully illustrate the validity and correctness of the E3W model combination.
4.5. Experiment Discussion
It can be concluded from the experimental results that the classification effectiveness of the proposed E3W model was significantly improved in the various evaluation metrics, with a 1.02% improvement in average accuracy, a 1.62% improvement in average precision, a 1.21% improvement in average recall, and a 1.02% improvement in average F1-score. Subsequent ablation experiments also showed that the E3W model was made more effective through the combination used. These experiments demonstrate that E3W, which combines multiple models for agricultural news classification, is the most advanced approach.
Experiments comparing E3W with 13 classical and advanced models showed that E3W performed best for the classification of Chinese agricultural news. This is because E3W is a combination of the models with the highest accuracy in each classification domain, such that E3W can achieve the best results on each domain. We also compared the E3W model with its four sub-models in terms of four evaluation metrics: accuracy, precision, recall, and F1-score. E3W not only achieved the highest results in each evaluation metric, but also provided a significant improvement in accuracy in each evaluation metric.
The output of E3W takes into account the output of the four sub-models and, if the output category of a sub-model is the category in which the sub-model has the highest classification accuracy, we weigh this output to compensate for the sub-model’s lower classification accuracy in other categories, thus improving the accuracy.
Finally, in order to verify that the combination used in E3W is scientific and reasonable, ablation experiments were conducted to compare the results under different model combinations; however, none of these combinations performed as well as E3W. The main reason for this is that E3W considers all categories and uses a combination of models with the highest accuracy in each classification domain, such that any other combination will not be as comprehensive as E3W.
5. Conclusions
In this paper, we proposed a Chinese agricultural news classification model, E3W, based on a GreedySoup weighting strategy and a multi-model combination approach. E3W consists of a combination of four sub-models, where the outputs of the four sub-models are weighted to obtain the final classification results. The proposed model was tested on a self-built Chinese agricultural news data set. A total of 13 model comparison experiments indicated a significant improvement in the classification performance of our proposed model; in particular, E3W improved the average accuracy by 1.02%, the average precision by 1.62%, the average recall by 1.21%, and the average F1-score by 1.02%. Subsequent ablation experiments also validated that the E3W model was composed of an optimal combination of sub-models. The results presented above demonstrate the excellent classification capability of the E3W model, which further enhances the effectiveness of Chinese agricultural short text classification and provides a new strategy for subsequent text classification work, using a combination of models to improve the model performance in downstream tasks.
The method proposed in this paper still has some limitations. For example, several accurate candidate models must be assessed before model combination can be performed, which involves many model improvement and comparison experiments; this preparatory workload is expected to be high when the problem to be addressed requires more models. In addition, when the combined model contains a large number of sub-models, more comparative experiments are required to obtain the most effective weights.
Moreover, E3W still has some shortcomings, as it was less accurate in classifying side businesses than other categories. This is because the side business category comprises many elements, such as textiles, handicrafts, agro-processing, and other related industries. Further research is needed to more effectively classify side businesses, in order to accurately distinguish side businesses from other categories and improve the agricultural news classification accuracy.