Article

Research on the Identification of a Winter Olympic Multi-Intent Chinese Problem Based on Multi-Model Fusion

School of Business, Guilin University of Electronic Technology, Guilin 541004, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 11048; https://doi.org/10.3390/app131911048
Submission received: 16 September 2023 / Revised: 30 September 2023 / Accepted: 5 October 2023 / Published: 7 October 2023

Abstract

Traditional web technologies cannot effectively respond to Winter-Olympics-related user questions that contain multiple intentions. To address this problem, this paper explores BCNBLMATT, a multi-intent recognition model based on multi-model fusion. The model is designed for the characteristics of Chinese question texts about the Winter Olympics, namely complex semantics, strong contextual relevance, and a large number of informative features, and for the limitations of traditional word vector models, such as insufficient textual representation and the limited feature expression of single-head attention mechanisms. The BCNBLMATT model first obtains a comprehensive feature vector representation of the question text through BERT. Then, a multi-scale text convolutional neural network and a BiLSTM-Multi-heads attention model (a joint model combining a bidirectional long short-term memory network with a multi-head attention mechanism) capture local features at more scales and contextually critical information features at more levels. Finally, the two kinds of features are concatenated and fused to obtain a richer and more comprehensive representation of the question text, which improves the model's performance on the multi-intent recognition task. Comparative experiments on the Winter Olympics Chinese question dataset and the MixATIS question dataset show that the BCNBLMATT model significantly improves macro-averaged precision, macro-averaged recall, and macro-averaged F1 and exhibits better generalization. This study provides an effective solution to the multi-intent recognition task for Winter Olympics questions, overcomes the limitations of traditional models, and offers new ideas for improving the performance of multi-intent recognition.

1. Introduction

The successful conclusion of the 2022 Beijing Winter Olympics has triggered widespread interest in Winter-Olympics-related topics. The Winter Olympics is held only once every four years and has been held 24 times so far, accumulating rich information resources. Although widely used, general-purpose search engines suffer from heavy information interference and offer little quality assurance. Especially when dealing with queries containing multiple question intents, general-purpose search engines often fail to identify all of the user’s intents, making it difficult to answer the user’s questions effectively. Therefore, how to accurately identify the user’s complete intention has become a core problem that needs to be solved. Against this background, this paper explores a key problem, the “Winter Olympics multi-intent Chinese question”, and describes in detail the multi-model fusion approach that we adopt.
Winter Olympics multi-intent Chinese questions involve multiple aspects that users may care about at the same time, such as an athlete’s age, nationality, and career. Such multi-intent questions are often characterized by complex semantics, strong contextual associations, and rich information features. Therefore, the goal of our research is to extract the information features of user question texts more comprehensively to improve the performance of multi-intent recognition. To achieve this goal, we constructed BCNBLMATT, a multi-intent recognition model built on BERT and the multi-head attention mechanism; by exploring our problem domain in detail, we show how user inputs can be parsed more accurately and comprehensively under the multi-model fusion approach, providing new possibilities for a deep understanding of Winter-Olympics-related questions. The main contributions of this article are as follows:
  • To address the scarcity of corpora for Winter-Olympics-related questions, crawler technology is used to obtain Winter-Olympics-related information, extracting the basic information of the athletes being inquired about, the achievements they have received, their careers, and their competitions. A user question dataset for the Winter Olympics domain, containing single-, two-, and three-intent questions, is automatically generated through customized templates.
  • The Chinese pretrained language model Bert-base-chinese is used to obtain dynamic text semantic vector representations containing richer semantic information and to improve the text’s semantic representation ability.
  • To address the limited feature expression of a single-head attention mechanism, the multi-head attention mechanism is introduced so that the model can obtain more information about the question text from different perspectives, improving its feature expression ability.
  • The improved multi-intent recognition model BCNBLMATT, based on BERT and the multi-head attention mechanism, is proposed. The question text is encoded by Bert-base-chinese to obtain a dynamic text semantic vector representation; the local feature extraction of TextCNN and the context-dependent feature extraction of BiLSTM-Multi-heads attention are combined to obtain the local and contextual feature information of the question text. Fusing these two kinds of features addresses the problem of incomplete feature extraction, and the superiority of the model in multi-intent recognition is verified by comparing it with other models on the Winter Olympics Chinese question dataset and the MixATIS question dataset.

2. Related Research

2.1. Winter Olympics Field

Universal search engines on the internet contain a large amount of data resources, which leads to data redundancy and clutter and reduces the efficiency with which people access information. To address this problem, Luo Ling et al. [1] proposed three Winter Olympics knowledge Q&A system models based on a knowledge graph, TF-IDF, and BERT, and demonstrated through experiments that the overall performance of the BERT model was slightly better than that of the other two. The dataset used in their experiments consisted of factual information crawled with web crawler technology as the answers, with simple questions generated through templates; it contained essentially simple, single-intention question-and-answer pairs. The TF-IDF and BERT models found the answer most similar to the question. For the knowledge graph method, since the dataset contained only simple questions, it was sufficient to predict the head entity and predicate of the question and obtain the answer through a triple, without considering complex sentence structure or multiple intentions in the question text.

2.2. Multi-Intention Recognition Task

For multi-intention recognition tasks, Xu et al. [2] early on used feature-based log-linear models and perceptron training to exploit the shared intention information between different intention combinations. However, when faced with a large number of intention combinations, training data can become sparse. Kim et al. [3] proposed a multi-intention recognition method based on single-intention training data, which divides question texts into single-intention questions, two-intention questions with conjunctions, and multi-intention questions consisting of two clauses without conjunctions; the method uses maximum entropy and conditional random field models in a two-stage procedure, but it assumes that a user expresses at most two intentions. Later, with the rapid development of deep learning technology, researchers began to apply it to multi-intention recognition. Yang Chunni et al. [4] first used dependency syntax analysis to determine whether a text is multi-intention, calculated the distance between the words in the sentence and the keywords of the intention categories through term frequency-inverse document frequency (TF-IDF) and trained word vectors to determine the number of intentions, then used CNN (Convolutional Neural Network) models for intention classification, and finally obtained the user’s true intention by determining the positive or negative polarity of the intention. However, this approach relies heavily on the results of dependency parsing; given the complexity of Chinese, parsing errors can strongly affect the results. Liu Jiao et al. [5] extracted deeper text semantics by adding convolutional capsule layers to the capsule network [6] to improve multi-intention recognition performance, but capsule networks are still immature and only perform well on the MNIST dataset. Some scholars believe that intentions and semantic slots are interrelated and have therefore proposed models that jointly perform intent detection and semantic slot filling [7,8] to improve the accuracy of both tasks. However, this approach requires a lot of manual labeling, and the cost of manually implementing feature representation is high.

2.3. Multi-Label Text Classification Task Based on Deep Learning Technology

At present, there is relatively little research on multi-intention recognition, but the task is similar to multi-label text classification. For multi-label text classification, with the rapid development of deep learning in recent years, using deep learning methods has become a research hotspot. Deep learning methods automatically extract features through neural network structures such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), which reduces labor costs and enhances feature expression ability [9]. Convolutional Neural Networks (CNN) [10] were initially applied in the field of computer vision. Later, Kim [11] proposed the TextCNN model for text classification, obtained by modifying the input layer of a CNN. It has only one convolutional layer and one max pooling layer, giving it a simple network structure and a fast training speed, and it is widely used in multi-label text classification tasks [12,13]. Its strength is that it effectively extracts the local features of sentences, but its disadvantage is that it relies on fixed windows; when facing long texts, it is therefore poorly suited to capturing the long-distance dependencies of text sequences and is prone to losing key information. Recurrent Neural Networks (RNN) [14] can capture long-distance dependencies and have therefore been applied to multi-label text classification [15]. However, RNNs may suffer from vanishing or exploding gradients when processing text. To address this issue, the RNN variant BiLSTM [16] emerged, which can effectively capture contextual information, extract the global features of sentences, and improve the classification accuracy of multi-label texts [17].
In text classification tasks, in order to extract key information that is more useful for classification, Zhou et al. [18] proposed using the BiLSTM model with an attention mechanism that assigns higher weights to key information, thereby extracting key features globally and improving classification effectiveness. Some scholars have combined a CNN with BiLSTM: some first obtain the local features of the text through a CNN and then feed them into an LSTM for classification [19,20], while others first extract contextual information through BiLSTM-attention and then feed it into a CNN to extract features for classification [21,22]. Song Zhongshan et al. [23] concatenated the outputs of a CNN and a BiLSTM that incorporated the attention mechanism, arguing that the feature extraction capabilities of the BiLSTM and CNN models alone are limited and lead to low classification accuracy, and that concatenation improves classification performance by exploiting their complementary advantages. They also verified through comparative experiments that this concatenation approach outperforms serial LSTM-CNN combinations.
The above research provides reference ideas for the Winter Olympics multi-intention question recognition task: (1) Most current classification models extract features with word vector models such as Word2vec, which still have limitations in text representation: the vocabulary is fixed at the start of training, so out-of-vocabulary words harm text feature learning, and polysemy is ignored. The BERT [24] pretrained language model has strong text representation and semantic understanding abilities, better addresses these problems, and also performs well in text classification [25,26,27]. Because of the particularities of Chinese, this article uses Bert-base-chinese as the pretrained language model to maximize the quality of text representation. (2) Faced with the limited feature extraction of TextCNN and BiLSTM individually, combining their complementary advantages can extract richer and more complete semantic feature information. (3) Previous models have mostly used single-layer attention mechanisms with relatively limited feature expression; for the multi-intention recognition model, this article introduces a multi-head attention mechanism [28], enabling the model to obtain more sentence-level information from different representation spaces and improve its feature expression ability.

3. BCNBLMATT Model Building

3.1. Overall Architecture of the Model

The BCNBLMATT model framework proposed in this paper is shown in Figure 1 and is divided into three parts. The first part is the text representation layer, which uses Bert-base-chinese to encode the question text and obtain a semantic feature representation containing richer semantic information. The second part is the feature extraction part, comprising the TextCNN layer and the BiLSTM-Multi-heads attention layer: the TextCNN layer captures the local key features (F1) of the question text via convolutional operations, while the BiLSTM-Multi-heads attention layer extracts the contextual key features (F2) of the question text using the multi-head attention mechanism. The third part is the feature splicing, fusion, and intent classification layer, which splices and fuses the two kinds of semantic features obtained in the second part to obtain a richer feature representation and performs intent classification.

3.2. Text Representation Layer

Because this article focuses on Chinese datasets, the Bert-base-chinese pretrained model is adopted. This model, trained on a Chinese Wikipedia corpus, is used as the baseline encoder and employs a multi-layer bidirectional Transformer encoder as the feature extractor. As shown in Figure 2, the question text is first converted into a word-embedding representation En, and the Transformer encoder then converts the text into a text vector Tn rich in semantic features, which serves as the input for the downstream BiLSTM-Multi-heads attention model and TextCNN model.
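As a concrete illustration of this step, the following minimal sketch (assuming PyTorch and the HuggingFace transformers package, which hosts a bert-base-chinese checkpoint) encodes a question from Table 2 into the token-level vectors Tn; the variable names and the maximum sequence length are illustrative, not taken from the paper's code.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

question = "哪位运动员在平昌冬奥会的女子大回转的比赛中取得了第1名的成绩?"
enc = tokenizer(question, return_tensors="pt", padding="max_length",
                truncation=True, max_length=64)

with torch.no_grad():
    out = bert(**enc)

# Tn: one 768-dimensional contextual vector per token, which is fed to the
# TextCNN and BiLSTM-Multi-heads attention branches downstream.
tn = out.last_hidden_state        # shape: (1, 64, 768)
```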

3.3. TextCNN Layer

TextCNN is a convolutional neural network for text classification that uses multiple sliding windows of different sizes to perform convolution and pooling operations on text vectors, extracting the key information of sentences by capturing the local features of text sequences. As shown in Figure 3, it consists of four parts: an input layer, a convolutional layer, a pooling layer, and an output layer. The text vector Tn obtained from Bert-base-chinese is used as the word vector input; the corresponding feature vectors are obtained by convolution through sliding windows of different sizes; the largest feature is then selected from the feature vector produced by each sliding window via max pooling; and these features are spliced together to obtain the local feature vector F1 of the text.
The formula is as follows:
$$C = f\left(w \cdot T_{i:i+h-1} + b\right), \quad w \in \mathbb{R}^{h \times k}$$
where h is the size of the convolutional kernel, k is the word vector dimension of each word in the text sequence, w is the h × k weight matrix, T_{i:i+h−1} is a sliding window of size h × k formed by rows i to i + h − 1 of the input matrix (i.e., the concatenation of T_i, T_{i+1}, …, T_{i+h−1}), b is the bias parameter, and f is the nonlinear activation function; w and T_{i:i+h−1} are combined by an element-wise dot product to obtain the corresponding feature value. The pooling layer adopts the max pooling strategy to filter out one maximum feature value from each sliding window, as follows:
$$F_i = \max\{C_1, C_2, \ldots, C_{n-h+1}\}$$
Among them, n represents the number of words in the text, and finally, all the pooled feature values are concatenated to obtain the high-level feature vector F1 of the text.
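A minimal PyTorch sketch of this branch follows, using the kernel settings listed later in Table 4 (window sizes 3, 4, 5 with 100 kernels each); the class name TextCNNEncoder and the stand-in input are illustrative and not taken from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNEncoder(nn.Module):
    def __init__(self, emb_dim=768, num_kernels=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # One 1-D convolution per sliding-window size h, over k = emb_dim input channels.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_kernels, kernel_size=h) for h in kernel_sizes
        )

    def forward(self, tn):                       # tn: (batch, seq_len, emb_dim)
        x = tn.transpose(1, 2)                   # (batch, emb_dim, seq_len) for Conv1d
        pooled = []
        for conv in self.convs:
            c = F.relu(conv(x))                  # C = f(w · T_{i:i+h-1} + b), length n - h + 1
            pooled.append(c.max(dim=-1).values)  # max pooling over each window's feature map
        return torch.cat(pooled, dim=-1)         # F1: (batch, num_kernels * len(kernel_sizes))

tn = torch.randn(2, 64, 768)                     # stand-in for Tn from Bert-base-chinese
f1 = TextCNNEncoder()(tn)                        # local feature vector F1
```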

3.4. BiLSTM-Multi-Heads Attention Layer

The Bidirectional Long Short-Term Memory neural network model (BiLSTM) combines a forward LSTM and a backward LSTM. Although the LSTM model [29] can capture long-distance dependencies, it cannot encode information from back to front; the BiLSTM model was therefore proposed to process sequence data in both directions, capturing the bidirectional semantic dependencies of sentences and extracting contextual information from text sequences. In order to capture the more important features in sentences, this article adopts a multi-head attention mechanism for key information extraction. The multi-head attention mechanism divides the model into multiple heads and performs multiple independent attention calculations [28], focusing on different aspects of the information and learning more key feature information; it can attend to more positional information and improve the model’s expressive ability. As shown in Figure 4, this article inputs the text vector Tn obtained from Bert-base-chinese into BiLSTM for forward and backward processing of the input sequence. The two processed feature vectors are concatenated as the output sentence feature vector hn, which contains all the information of the sentence in both directions. Then, hn is fed into the multi-head attention mechanism for multiple groups of attention processing to obtain key feature information from different angles. Finally, the features obtained from each group are concatenated and a linear transformation is performed to obtain the final high-level feature vector F2 of the text.
The formula is as follows:
$$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(T_i), \quad \overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(T_i), \quad h_i = [\overrightarrow{h_i}, \overleftarrow{h_i}]$$
Among them, $T_i$ is the vector of the i-th word, $\overrightarrow{h_i}$ is the forward LSTM output for the i-th word, and $\overleftarrow{h_i}$ is the backward LSTM output for the i-th word; the two are concatenated as the BiLSTM output for the i-th word.
$$Q_i = Q W_i^{Q}, \quad W_i^{Q} \in \mathbb{R}^{l \times d_k}; \qquad K_i = K W_i^{K}, \quad W_i^{K} \in \mathbb{R}^{l \times d_k}; \qquad V_i = V W_i^{V}, \quad W_i^{V} \in \mathbb{R}^{l \times d_k}$$
$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$
$$F_2 = \mathrm{MultiHead} = \mathrm{concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_n)\, W^{o}, \quad W^{o} \in \mathbb{R}^{l \times l}$$
where i indexes the attention heads, with n heads in total (i = 1, …, n), l is the word vector dimension, and $d_k = l/n$; the scaling factor $\sqrt{d_k}$ shrinks the dot product to avoid excessively large inner product values. $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{o}$ are the weight matrices, and concat denotes the concatenation operation.
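The following minimal sketch illustrates this branch with PyTorch's built-in nn.LSTM and nn.MultiheadAttention, using 12 heads as in Table 4; the mean pooling that reduces the attended sequence to a single sentence vector F2 is an assumption, since the pooling step is not stated explicitly in the text.

```python
import torch
import torch.nn as nn

class BiLSTMMultiHeadAtt(nn.Module):
    def __init__(self, emb_dim=768, num_heads=12):
        super().__init__()
        # Forward and backward hidden states are concatenated, so hidden_size is
        # emb_dim // 2 to keep the BiLSTM output h_n at dimension emb_dim.
        self.bilstm = nn.LSTM(emb_dim, emb_dim // 2, batch_first=True, bidirectional=True)
        self.mha = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)

    def forward(self, tn):                 # tn: (batch, seq_len, emb_dim)
        h, _ = self.bilstm(tn)             # h_n = [forward; backward], (batch, seq_len, emb_dim)
        att, _ = self.mha(h, h, h)         # multi-head self-attention over the BiLSTM outputs
        return att.mean(dim=1)             # assumed pooling to a sentence-level vector F2

tn = torch.randn(2, 64, 768)               # stand-in for Tn from Bert-base-chinese
f2 = BiLSTMMultiHeadAtt()(tn)              # contextual feature vector F2
```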

3.5. Feature Fusion and Intention Classification

The features extracted from the TextCNN layer and the BiLSTM-Multi-heads attention layer are concatenated and fused to obtain the final feature vector F, which is passed into a fully connected network whose output dimension equals the number of intention categories. The prediction results of the fully connected network are then normalized; since each predicted intention is independently distributed, the sigmoid function is used. An intention is selected when its probability is at least 0.5. The formulas are as follows:
$$F = \mathrm{concat}(F_1, F_2)$$
$$P = \mathrm{sigmoid}(F \times W_f), \quad W_f \in \mathbb{R}^{2l \times s}$$
$$P_e = \begin{cases} 1, & P \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$
Among them, Wf is the weight matrix, sigmoid is the nonlinear activation function, s is the total number of intention categories, P represents the probability of each intention category corresponding to the problem text, and Pe represents the intention category to which the problem text belongs, that is, the intention category corresponding to the problem text with a probability greater than or equal to 0.5 is set to 1, and others are set to 0, indicating that they do not belong to the corresponding intention category.
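A minimal sketch of this step is shown below; the number of intent categories s and the feature dimensions are illustrative, and the linear layer plays the role of W_f in the formula above when both feature vectors have dimension l.

```python
import torch
import torch.nn as nn

s = 10                                   # total number of intention categories (illustrative)
f1 = torch.randn(2, 768)                 # stand-in local features from the TextCNN branch
f2 = torch.randn(2, 768)                 # stand-in contextual features from BiLSTM-Multi-heads attention

f = torch.cat([f1, f2], dim=-1)          # F = concat(F1, F2)
classifier = nn.Linear(f.size(-1), s)    # corresponds to W_f in R^(2l x s)

probs = torch.sigmoid(classifier(f))     # independent probability for each intention category
pred = (probs >= 0.5).long()             # Pe = 1 where the probability is at least 0.5, else 0
```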

4. Experiment and Result Analysis

In this paper, we first divided each dataset into training, validation, and test sets. During training, we used the AdamW optimizer, a learning rate scheduler, and the binary cross-entropy loss function (BCEWithLogitsLoss). Through iterative training on the training set, we gradually adjusted the model parameters to improve performance. At the end of each training epoch, we evaluated the model on the validation set, mainly based on the Macro_F1 metric, and retained the model that performed best on the validation set. We chose Macro_F1 as the model-selection metric because it balances precision and recall and thus provides a more comprehensive picture of performance. Finally, we used the test set for a comprehensive evaluation of the models with the three metrics Macro_P, Macro_R, and Macro_F1. To highlight the effectiveness and accuracy of the proposed model, four models were selected for comparison: BERT, BERT+TextCNN (Bert-textcnn for short), BERT+BiLSTM+Multi-head attention (Bert-blmatt), and BERT-TextCNN+BiLSTM-attention (Bert-cnn+blatt). To enhance comparability, we also added a second dataset for comparison.
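A minimal sketch of this training and model-selection loop is given below, assuming a recent PyTorch release and a model that returns one raw logit per intention category; the data loaders, the eval_macro_f1 callable (e.g., the metric sketch in Section 4.3), and the epoch count are assumptions rather than details taken from the paper.

```python
import copy
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR

def train(model, train_loader, dev_loader, eval_macro_f1, epochs=10, lr=1e-5):
    criterion = nn.BCEWithLogitsLoss()            # one independent binary target per intention category
    optimizer = AdamW(model.parameters(), lr=lr)  # learning rate follows Table 4
    scheduler = LinearLR(optimizer)               # stand-in for the learning rate scheduler

    best_f1, best_state = 0.0, None
    for _ in range(epochs):
        model.train()
        for input_ids, attention_mask, labels in train_loader:
            logits = model(input_ids, attention_mask)   # raw logits; the loss applies the sigmoid
            loss = criterion(logits, labels.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()

        macro_f1 = eval_macro_f1(model, dev_loader)     # validate at the end of each epoch
        if macro_f1 > best_f1:                          # retain the best-performing checkpoint
            best_f1, best_state = macro_f1, copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)
    return model, best_f1
```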

4.1. Dataset

Because question data about the Winter Olympics are sparse and the main questions focus on basic information about athletes, achievements, careers, and competitions, there are relatively few question patterns in this vertical domain. Therefore, this paper collected in advance information containing the names of athletes, competition results, and event names from previous Winter Olympics, set up specific templates, and used entity replacement, synonym replacement, sentence splicing, and other methods to automatically generate a user question dataset for the Winter Olympics domain in batches; the numbers of questions containing three intentions, two intentions, and a single intention were 2000, 1483, and 1494, respectively. To validate the generalization of the model, this paper also adopted the multi-intention dataset MixATIS provided by Qin et al. [30]. Since MixATIS is an English dataset, it was translated into Chinese using Baidu Translate; the numbers of questions containing three, two, and one intention were 3983, 9371, and 1366, respectively. Each dataset was divided into a training set, a validation set, and a test set, whose sizes are shown in Table 1. Examples from the Winter Olympics dataset are shown in Table 2 and examples from the MixATIS dataset in Table 3.
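To make the template-and-slot generation concrete, the following hedged sketch shows the general idea; the slot values, templates, and intent labels are invented examples and do not reproduce the paper's actual templates or crawled data.

```python
import random

# Hypothetical slot values crawled in advance (athlete names, events, results, ...).
athletes = ["韩雨桐", "谷爱凌"]

# Each template pairs a question pattern with the intent label(s) it expresses.
templates = [
    ("{athlete}的年龄是多少?", ["运动员年龄"]),                     # single intent
    ("{athlete}来自哪个国家? 他的教练是谁?", ["运动员国家", "教练"]),   # two intents
]

def generate(n):
    samples = []
    for _ in range(n):
        pattern, labels = random.choice(templates)
        question = pattern.format(athlete=random.choice(athletes))
        samples.append({"question": question, "labels": labels})
    return samples

print(generate(3))
```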

4.2. Experimental Parameters

The model and benchmark experiments were conducted on the Windows 10 operating system and implemented using the Pytorch deep learning framework. The parameter settings of this model are shown in Table 4.

4.3. Evaluation Indicators

Because this article addresses the identification of multiple intentions, the macro-averaged precision (Macro_P), macro-averaged recall (Macro_R), and macro-averaged F1 value (Macro_F1) were used as evaluation indicators. The confusion matrix is shown in Table 5.
The calculation method for the evaluation indicators is as follows:
In the following formula, i represents the i-th intention category and N represents the total number of intention categories.
$$\mathrm{Macro\_P} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i}$$
This value is the macro-averaged precision, which averages the per-category precision of the model’s predictions. The closer this value is to 1, the better the model’s precision.
$$\mathrm{Macro\_R} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}$$
This value is the macro-averaged recall, which averages the per-category coverage of the model’s predictions to obtain the overall coverage. The closer this value is to 1, the better the model’s recall.
$$\mathrm{Macro\_F1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \times P_i \times R_i}{P_i + R_i}$$
This value is the macro-averaged F1, the harmonic mean of precision and recall averaged over the categories, reflecting both the accuracy and the coverage of the model’s predictions for each type of label. The closer this value is to 1, the better the model’s performance.
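The three metrics above can be computed directly from multi-hot label and prediction matrices; a minimal NumPy sketch follows, with illustrative function and variable names.

```python
import numpy as np

def macro_metrics(y_true, y_pred, eps=1e-9):
    """y_true, y_pred: (num_samples, N) multi-hot matrices of gold and predicted intents."""
    tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)
    fp = ((y_pred == 1) & (y_true == 0)).sum(axis=0)
    fn = ((y_pred == 0) & (y_true == 1)).sum(axis=0)

    p = tp / (tp + fp + eps)                 # per-category precision
    r = tp / (tp + fn + eps)                 # per-category recall
    f1 = 2 * p * r / (p + r + eps)           # per-category F1

    return p.mean(), r.mean(), f1.mean()     # macro averages over the N categories

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])
print(macro_metrics(y_true, y_pred))
```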

4.4. The Influence of the Number of Attention Heads and the Size of Convolutional Kernels on the Model’s Intention Recognition Performance

To verify the effect of the number of attention heads and the size of the convolutional kernels on intention recognition, this article used a controlled-variable method for comparison. First, the convolutional kernel sizes were fixed at [3,4,5] and the number of attention heads was set to 4, 8, 12, and 16, respectively. The results are shown in the first row of Figure 5: the model performed best with 12 attention heads. Therefore, in subsequent experiments, the number of attention heads was fixed at 12, and kernel sizes of [2,3,4], [3,4,5], and [4,5,6] were compared. The results are shown in the second row of Figure 5: the model achieved its best intention recognition with kernel sizes of [3,4,5]. Because the values for the candidate settings were very close, and in order to reduce computational cost, the kernel sizes were set to [3,4,5].

4.5. Comparative Experiment

This article compares the proposed model with the following models:
(1) Bert: the BERT pretrained language model alone was used for feature extraction and intention classification.
(2) Bert-textcnn: BERT was used as the pretrained language model, with local feature extraction and intention classification performed by TextCNN.
(3) Bert-blmatt: to verify the proposed model’s effectiveness via partial decomposition, Bert-blmatt was used as a comparative experiment. BERT was used as the pretrained language model, global semantic features were extracted through BiLSTM, noise was reduced through multi-head attention, and the key features were used for intention recognition.
(4) Bert-cnn+blatt: to verify the effectiveness of multi-head attention, this comparison replaced multi-head attention with ordinary attention. BERT was used as the pretrained language model, and the local features of the text and the key features in the global semantic information were extracted through TextCNN and BiLSTM-attention; finally, the two were concatenated and fused for intention recognition.
(5) BCNBLMATT: using BERT as the pretrained language model, the question text was transformed into text semantic features, from which the local features of the text and the key features in the global semantic information were obtained through TextCNN and BiLSTM-Multi-heads attention; finally, the two were concatenated and fused for intention recognition.

4.6. Experimental Results and Analysis

This article tested the macro-averaged precision (Macro_P), macro-averaged recall (Macro_R), and macro-averaged F1 (Macro_F1) values of each model on the Winter Olympics Chinese question test set, as shown in Table 6.
By comparing the performances of the different models on the multi-intent recognition task for the Winter Olympics problem in Table 6, we were able to gain a deeper understanding of their performances, strengths, and limitations. First, we used Bert as the baseline model, which performed well on the Macro_P value (98.40%) and Macro_F1 value (95.51%), highlighting Bert’s superior ability to deal with multi-intent problems.
Further comparison experiments showed that the introduction of both TextCNN and BiLSTM-Multi-heads attention models alone could effectively improve the performance of the multi-intent recognition model, resulting in improvements of 0.96% and 1.14% in the Macro_F1 value, respectively. However, our in-depth analysis revealed that this enhancement effect may have been affected by the characteristics of the problem dataset in real tasks. Considering the characteristics of the problem dataset with longer text and more contextual associations, we paid special attention to the performances of the TextCNN and BiLSTM-Multi-heads attention models introduced separately.
The TextCNN model focuses more on the local key features in text features, which is suitable for dealing with short texts and situations where local features are more important. However, in dealing with long texts and multi-intent recognition tasks with more contextual relevance, TextCNN may be limited by its local attention property, resulting in a relatively weak performance in capturing overall contextual features. In contrast, the BiLSTM-Multi-heads attention model performs better in this case, as it is good at capturing the semantic features of long-range text and more suitable for processing tasks with a high contextual relevance.
In terms of model blending, by combining the two models, BiLSTM-Multi-heads attention and TextCNN, the BCNBLMATT model improved the Macro_P, Macro_R, and Macro_F1 values by 1.2%, 2.52%, and 2.61%, respectively, compared to the single model. This confirms the superiority of multi-model fusion, which makes full use of the advantages of each model to extract the semantic features of the problem text more comprehensively, thus improving the overall performance.
We further validated the effectiveness of the multi-head attention mechanism in the BCNBLMATT model. Compared to the regular attention mechanism, multi-head attention improved the Macro_F1 value by 1.9%. This provides empirical support for the choice of attention mechanism and, in particular, for the effectiveness of multi-head attention in dealing with multi-intention problems, re-emphasizing the importance of selecting the multi-head attention mechanism when solving complex tasks.
In the results of the comparative experiments on the MixATIS dataset, we first focused on the performance of Bert as a benchmark model. Even in the multi-intent recognition task in different domains, Bert still showed an excellent level of performance. Its performance in terms of the Macro_P, Macro_R, and Macro_F1 values reached 97.62%, 94.74%, and 95.78%, respectively. This indicates that Bert has a strong generalization ability and can handle natural language understanding tasks in different domains.
Subsequently, we introduced the BCNBLMATT model and observed its performance on the MixATIS dataset. The results showed that the BCNBLMATT model achieved 97.70%, 97.04%, and 97.20% in terms of the Macro_P, Macro_R, and Macro_F1 values. This further validates its superiority in solving multi-intention problems.
Taken together, the BCNBLMATT model can maintain an excellent performance on multi-intent recognition tasks in different domains, laying a solid foundation for its generalization in practical applications and making it a reliable solution with wide applicability.

5. Conclusions and Outlook

Synthesizing the results and findings of the study, we conducted an in-depth performance comparison and detailed analysis of the multi-intent recognition task for Winter Olympics questions. Through a comprehensive examination of the performances of different models, we found that each model has different advantages and disadvantages when dealing with long text and contextual relevance. In particular, for contexts with long text and strong contextual relevance, such as the Winter Olympics multi-intent questions, the TextCNN model focuses more on local key features, while the BiLSTM-Multi-heads attention model is better at capturing the semantic features of long-distance text. In terms of model fusion, the self-constructed BCNBLMATT model combined the advantages of the BiLSTM-Multi-heads attention and TextCNN models and achieved significant improvements over a single model, with gains of 1.2%, 2.52%, and 2.61% in the Macro_P, Macro_R, and Macro_F1 values, respectively. This highlights the superiority of the multi-model fusion strategy, which extracted the semantic features of the question text more comprehensively and successfully met the challenges of this complex task. In addition, we verified the effectiveness of the multi-head attention mechanism, which improved the Macro_F1 value by 1.9% compared to the conventional attention mechanism, highlighting the importance of choosing the multi-head attention mechanism when solving multi-intent problems.
Overall, by deeply exploring the performance of the BCNBLMATT model and cleverly integrating the advantages of different models, this study provides a more comprehensive understanding of solving the multi-intent recognition task for Winter Olympic problems. These findings have a guiding significance for academic research and provide strong insights for practical applications. The future research direction can consider integrating the BCNBLMATT model into the Winter Olympics Q&A model to solve more complex problems, provide more rapid, convenient, and accurate information query services for the Winter Olympics, and provide strong support for the development of China’s sports industry.

Author Contributions

Conceptualization, P.L. and Q.L.; methodology, Q.L.; software, Q.L.; validation, Q.L., P.L. and Z.C.; formal analysis, Q.L.; investigation, Q.L. and Z.C.; resources, Q.L. and Z.C.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L.; supervision, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2020YFF0305300, and the National Natural Science Foundation, grant number 61762029.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Luo, L.; Li, S.; He, Q.; Yang, C.; Chen, T. Winter Olympic Q & A system based on knowledge map, TF-IDF and BERT model. CAAI Trans. Intell. Syst. 2021, 16, 819–826. [Google Scholar]
  2. Xu, P.Y.; Sarikaya, R. Exploiting shared information for multiintent natural language sentence classification. In Proceedings of the 14th Annual Conference of the International Speech Communication Association, Lyon, France, 25–29 August 2013; pp. 3785–3789. [Google Scholar]
  3. Kim, B.; Ryu, S.; Gary, G.L. Two-stage multi-intent detection for spoken language understanding. Multimed. Tools Appl. 2017, 76, 11377–11390. [Google Scholar] [CrossRef]
  4. Yang, C.; Feng, C. Multi-intention recognition model with combination of syntactic feature and convolution neural network. J. Comput. Appl. 2018, 38, 1839–1845+1852. [Google Scholar]
  5. Liu, J.; Li, Y.L.; Lin, M. Research of short text multi-intent detection with capsule network. J. Front. Comput. Sci. Technol. 2020, 14, 1735–1743. [Google Scholar]
  6. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. arXiv 2017, arXiv:1710.09829. [Google Scholar]
  7. Weld, H.; Huang, X.; Long, S.; Poon, J.; Han, S.C. A survey of joint intent detection and slot-filling models in natural language understanding. ACM Comput. Surv. (CSUR) 2021, 55, 1–38. [Google Scholar] [CrossRef]
  8. Li, S.; Sun, Z.P. Bidirectional Interaction Model for Joint Multiple Intent Detection and Slot Filling. Comput. Eng. Appl. 2023. Available online: http://kns.cnki.net/kcms/detail/11.2127.TP.20230321.0934.004.html (accessed on 4 October 2023).
  9. Li, D.; Yang, Y.; Meng, X.; Zhang, X.; Song, C.; Zhao, Y. Review on Multi-Lable Classification. J. Front. Comput. Sci. Technol. 2023. Available online: http://kns.cnki.net/kcms/detail/11.5602.TP.20230627.1225.003.html (accessed on 4 October 2023).
  10. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  11. Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882. [Google Scholar]
  12. Baker, S.; Korhonen, A. Initializing neural networks for hierarchical multi-label text classification. In BioNLP 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 307–315. [Google Scholar]
  13. Zheng, C.; Wang, X.; Wang, T. Multi-label classification for medical text based on ALBERT-TextCNN model. J. Shandong Univ. (Nat. Sci.) 2022, 57, 21–29. [Google Scholar]
  14. Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015, arXiv:1506.00019. [Google Scholar]
  15. Yang, P.; Sun, X.; Li, W.; Ma, S.; Wu, W.; Wang, H. SGM: Sequence generation model for multi-label classification. arXiv 2018, arXiv:1806.04822. [Google Scholar]
  16. Liu, P.; Qiu, X.; Chen, X.; Wu, S.; Huang, X.J. Multi-timescale long short-term memory neural network for modelling sentences and documents. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015. [Google Scholar]
  17. Hu, J.; Kang, X.; Nishide, S.; Ren, F. Text multi-label sentiment analysis based on Bi-LSTM. In Proceedings of the 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems, Singapore, 25–27 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 16–20. [Google Scholar]
  18. Zhou, Y.; Xu, J.; Cao, J.; Xu, B.; Li, C. Hybrid attention networks for Chinese short text classification. Comput. Sist. 2017, 21, 759–769. [Google Scholar] [CrossRef]
  19. She, X.; Zhang, D. Text classification based on hybrid CNN-LSTM hybrid model. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design, Hangzhou, China, 8–9 December 2018; Volume 2, pp. 185–189. [Google Scholar]
  20. Chaofan, L.; Kai, M. Electronic Medical Record Text Classification Based on Attention Mechanism Combined with CNN-BiLSTM. Sci. Technol. Eng. 2022, 22, 2363–2370. [Google Scholar]
  21. Xu, J.; Cai, Y.; Wu, X.; Lei, X.; Huang, Q.; Leung, H.F.; Li, Q. Incorporating context-relevant concepts into convolutional neural networks for short text classification. Neurocomputing 2020, 386, 42–53. [Google Scholar] [CrossRef]
  22. Yang, X.R.; Zhao, S.W.; Zhang, R.X.; Yang, X.J.; Tang, Y.H. BiLSTM_CNN Classification Model Based on Self-Attention and Residual Network. Comput. Eng. Appl. 2022, 58, 172–180. [Google Scholar]
  23. Song, Z.S.; Niu, Y.; Zheng, L.; Tie, J.; Jiang, H. Multiscale double-layer convolution and global feature text classification model. Comput. Eng. Appl. 2023. Available online: http://kns.cnki.net/kcms/detail/11.2127.TP.20230214.1508.046.html (accessed on 4 October 2023).
  24. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  25. Duan, D.D.; Tang, J.S.; Wen, Y.; Yuan, K.H. Chinese short text classification algorithm based on BERT model. Comput. Eng. 2021, 47, 79–86. [Google Scholar]
  26. Liu, B.; Pu, Y. BERT-base approach for long document classification. J. Sichuan Univ. (Nat. Sci. Ed.) 2023, 60, 81–88. [Google Scholar]
  27. Lee, J.S.; Hsiang, J. Patent classification by fine-tuning BERT Language model. World Pat. Inf. 2020, 61, 101965. [Google Scholar] [CrossRef]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  30. Qin, L.; Xu, X.; Che, W.; Liu, T. AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling. Findings of the Association for Computational Linguistics: EMNLP 2020. arXiv 2020, arXiv:2004.10087. [Google Scholar]
Figure 1. Model architecture.
Figure 2. Word-embedding model.
Figure 3. TextCNN model.
Figure 4. BiLSTM-Multi-heads attention model.
Figure 5. The impact of different parameters on indicator values.
Table 1. Dataset.

          Winter Olympics Problem Dataset    MixATIS Dataset
train     3003                               13,056
dev       987                                896
test      987                                768
Table 2. Sample Winter Olympics dataset.

Question: 哪位运动员在平昌冬奥会的女子大回转的比赛中取得了第1名的成绩?
(Which athlete achieved first place in the women’s slalom at the Pingchang Winter Olympics?)
Label: 运动员姓名 (Athlete name)

Question: 韩雨桐在第24届界冬奥会中获得的名次是,他获得的成就有哪些?
(What was Han Yutong’s ranking and achievements at the 24th Winter Olympics?)
Label: 运动员名次/运动员所获成就 (Athlete rankings/athlete achievement)

Question: 埃琳娜·朗格迪尔的年龄? 他来自哪个国家? 他的教练是谁?
(What is the age of Elena Langedier? Which country does he come from? Who is his coach?)
Label: 运动员年龄/运动员国家/教练 (Athlete age/athlete country/athlete coach)
Table 3. Sample MixATIS dataset.

Question: 给我看看从西雅图到明尼阿波利斯的票价?
(Show me a fare from Seattle to Minneapolis?)
Label: 机票价格 (Airfares)

Question: 加拿大国际航空公司为哪些城市提供服务, 以及所有可用的餐点是什么?
(What cities does Air Canada International serve and what are all the available meals?)
Label: 城市/餐点 (City/Meal)

Question: 告诉我华盛顿特区附近的机场, 从波士顿机场到波士顿市中心的地面交通是什么,以及1765年大陆航空公司从波士顿到旧金山有多少站?
(Tell me about the airports near Washington DC, what is the ground transportation from Boston Airport to downtown Boston, and how many stops did Continental Airlines make from Boston to San Francisco in 1765?)
Label: 机场/地面服务/航班数量 (Airports/ground services/number of flights)
Table 4. Parameter settings.

Parameter               Value
kernel sizes            3, 4, 5
number of kernels       100
heads                   12
hidden size             768
word vector size        768
activation function     ReLU
learning rate           1 × 10−5
optimizer               Adam
dropout                 0.5
batch size              21
Table 5. Confusion matrix.

                        Prediction Category
Real Category           Positive    Negative
Positive                TP          FN
Negative                FP          TN
Table 6. Comparative experiment.

Dataset                      Model            P %      R %      F1 %
Winter Olympics Problem      Bert             98.40    94.44    95.51
                             Bert-textcnn     97.87    95.68    96.47
                             Bert-blmatt      97.74    96.03    96.65
                             Bert-cnn+blatt   98.18    95.30    96.22
                             BCNBLMATT        99.60    96.96    98.12
MixATIS                      Bert             97.62    94.74    95.78
                             Bert-textcnn     97.81    96.13    96.78
                             Bert-blmatt      97.22    96.90    96.89
                             Bert-cnn+blatt   97.61    95.82    96.51
                             BCNBLMATT        97.70    97.04    97.20