Article

A Character-Word Information Interaction Framework for Natural Language Understanding in Chinese Medical Dialogue Domain

School of Artificial Intelligence and Big Data, Hefei University, Hefei 230061, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8926; https://doi.org/10.3390/app14198926
Submission received: 19 August 2024 / Revised: 22 September 2024 / Accepted: 30 September 2024 / Published: 3 October 2024
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)

Abstract

Natural language understanding is a foundational task in medical dialogue systems. However, two key problems remain to be solved: (1) multiple meanings of a word lead to ambiguity of intent; (2) character errors make slot entity extraction difficult. To solve these problems, this paper proposes a character-word information interaction framework (CWIIF) for natural language understanding in the Chinese medical dialogue domain. The CWIIF framework contains an intent information adapter, which addresses the intent ambiguity caused by polysemous words in the intent detection task, and a slot label extractor, which addresses the difficulty of slot entity extraction caused by character errors in the slot filling task. The proposed framework is validated on two publicly available datasets, the Intelligent Medical Consultation System (IMCS-21) and Chinese Artificial Intelligence Speakers (CAIS). Experimental results on both datasets demonstrate that the proposed framework outperforms other baseline methods in handling Chinese medical dialogues. Notably, on the IMCS-21 dataset, precision improved by 2.42%, recall by 3.01%, and the F1 score by 2.4%.

1. Introduction

As medical needs increase, intelligent consultation systems [1] have become one of the most effective solutions to the shortage of healthcare resources. The medical dialogue system is a typical intelligent consultation system [2] (e.g., medical robots and intelligent online consultation), which automatically generates diagnosis and treatment recommendations based on the patient’s input, thereby improving the efficiency of medical services and significantly alleviating patients’ difficulty in accessing medical care. In medical dialogue systems, natural language understanding is a fundamental task that mainly consists of two subtasks, namely, intent detection and slot filling. Intent detection categorizes the purpose of the patient’s inquiry text, while slot filling assigns precise slot labels to each token in the text, accurately identifying the patient’s symptoms. Taking the sentence ‘清鼻涕也只是一点点’ (The clear runny nose is just a little bit) as an example, as illustrated in Figure 1, natural language understanding identifies the intent of the patient’s question as ‘Inform-Symptom’ and obtains the slot labels for the symptoms (i.e., B-Symptom, I-Symptom, I-Symptom, O, O, O, O, O, O). Among these, ‘B-Symptom’ and ‘I-Symptom’ are the slots required by the slot filling task and contain information about the patient’s symptoms. If the intent is classified incorrectly, the system fails to recognize the patient’s true purpose during the medical consultation; if the slot labels cannot be obtained completely, key information is omitted from the consultation, which can result in misdiagnosis or other erroneous outcomes.
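The intent/slot formulation above can be made concrete with a small sketch (not the authors' code): the example utterance paired with its BIO slot labels, and a hypothetical helper that recovers the symptom span from the tags.

```python
# The example utterance, character by character, with its BIO slot labels.
chars = list("清鼻涕也只是一点点")
tags = ["B-Symptom", "I-Symptom", "I-Symptom", "O", "O", "O", "O", "O", "O"]

def extract_slots(chars, tags):
    """Collect (slot_type, text) spans from a BIO-tagged character sequence."""
    spans, current = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = [tag[2:], ch]          # start a new span
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1] += ch                 # extend the open span
        else:
            if current:
                spans.append(current)
                current = None
    if current:
        spans.append(current)
    return [(t, s) for t, s in spans]

print(extract_slots(chars, tags))  # [('Symptom', '清鼻涕')]
```

A complete and correctly ordered tag sequence is what allows the symptom span ‘清鼻涕’ to be recovered; any missing or corrupted tag breaks the span, which is exactly the failure mode the slot label extractor targets.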
In recent years, medical dialogue systems have made significant progress in many tasks such as dialogue generation [3,4], information extraction [5], classification [6], and sequence annotation [7]. For example, Xu et al. [8] proposed a dual-stream augmented framework for medical dialogue generation. This framework extracts medical slot entities and dialogue intent actions (e.g., diagnosing and prescribing medication) and models their transformations using entity-centric graph streams and sequential act streams, respectively, to improve the accuracy of the next round of medical dialogue generation. Zeng et al. [9] proposed a context-sensitive deep matching model for medical dialogue information extraction. This model employs a multi-view channel that utilizes multiple mask templates to capture keyword information within dialogues. Additionally, it incorporates a two-way attention mechanism to assess crucial slot information across different contexts, thereby enhancing the extraction of contextual dialogue information. Guo et al. [10] developed a retrieval-based medical question answering (Q&A) system that utilizes a pre-trained language model for intent matching and enhances sorting through named entity recognition and knowledge graphs. This approach aims to mine relationships between slotted entities in questions and answers, thereby improving the accuracy of responses in medical Q&A. Ziletti et al. [11] proposed a novel method named XTARS, which integrates traditional BERT-based classification with modern zero- and few-shot learning techniques. This method addresses the automated identification and categorization of terms in medical corpora, specifically for various intent types in medical texts. Their approach significantly enhances the performance of intent classification in medical contexts. Building on this foundation, Fu et al. [12] developed a novel synonym generalization framework. 
This approach employs span-based prediction to identify biomedical concepts within input text. Additionally, they introduce two regularization terms aimed at resolving boundary issues in biomedical slot entity extraction. Ultimately, these innovations enhance the accuracy of slot entity extraction in medical texts.
It can be seen that there is relatively little research in the medical dialogue field on extracting intents and slots from dialogue texts [8,9,10,11,12]. However, Chinese medical dialogues suffer from serious issues such as polysemy. If the intent and slot information in the patient’s consultation text are not accurately identified, the correctness of the diagnostic texts generated by subsequent medical dialogue systems is affected, thereby reducing the effectiveness of medical consultations. Take ‘清鼻涕也只是一点点’ (The clear runny nose is just a little bit) as an example. The character ‘清’ can be used as a verb, meaning to clean, or as an adjective, meaning limpid. If the meaning of ‘清’ is not correctly understood, the system’s understanding of the intent of the consultation text will be heavily biased. In addition, in Chinese medical dialogues, character errors can make entity extraction difficult, potentially resulting in the omission of crucial label information. Take, for example, ‘鱼甘油能一起吃么?’ (Can fish glycerol be eaten together?). The slot label of this sentence should be ‘鱼甘(肝)油’. Because of the erroneous character ‘甘’ (which should be ‘肝’), the system may fail to extract the label ‘鱼甘油’, which may mislead the medical dialogue system into giving wrong diagnosis results. Therefore, accurately identifying the patient’s diagnostic intent and key entity information in the dialogue text forms the foundation for achieving intelligent medical dialogue.
To address these issues, this paper proposes a character-word information interaction framework for natural language understanding in the Chinese medical dialogue domain. For intent detection, the goal is to capture the most relevant features of discourse intents, mitigate the effects of multiple meanings of words on intent judgments, and improve the accuracy of detecting patient intents in Chinese medical dialogues. For slot filling tasks, integrating character and word information enhances the model’s ability to capture crucial label information in Chinese medical dialogues. Figure 2 illustrates the flowchart depicting the key contribution of this paper.
In summary, the contribution of the work in this paper is as follows:
  • This paper proposes a character-word information interaction framework for natural language understanding in the Chinese medical dialogue domain, which integrates the characteristics of character and word information to accurately achieve intent detection and slot filling in the Chinese medical dialogue domain.
  • For intent detection, this paper develops an intent information adapter designed to mitigate the impact of polysemous words on intent assessment. For the slot filling task, the paper proposes a slot label extractor to address the challenges arising from character errors, which complicate the extraction of slot entities.
  • In this paper, experiments were conducted on a Chinese medical dataset and a public dataset. The results demonstrate that, compared with other baseline models, our model achieves accurate intent detection and slot filling for medical dialogue texts.
The rest of the article is organized as follows. Section 2 provides an overview of some work in the area of Chinese medical dialogue and joint work on intent detection and slot filling tasks. Section 3 describes the character-word information interaction framework proposed in this paper, as well as the proposed intent information adapter and slot label extractor and how they are jointly trained. Section 4 describes the dataset, assessment metrics, experimental parameters, baseline, and analyses of the experimental component. In the final section, the research is summarized.

2. Related Work

2.1. Medical Dialogue System

The medical dialogue system automatically generates diagnosis and treatment recommendations based on the patient’s input regarding their condition. This not only alleviates the shortage of medical resources but also enhances the efficiency of patient consultations. In recent years, medical dialogue systems have made significant progress in tasks such as dialogue generation [3,4], information extraction [5], classification [6], and sequence annotation [7]. For example, Xu et al. [8] proposed a dual-stream augmented framework for medical dialogue generation, focusing on extracting medical entities and dialogue acts within contextual dialogues. Zeng et al. [9] proposed a context-sensitive deep matching model for extracting medical dialogue information, aimed at enhancing the comprehension of contextual dialogue information in medical settings. Guo et al. [10] developed a retrieval-based medical Q&A system aimed at enhancing the accuracy of answers provided in medical Q&A scenarios. This paper investigates the joint tasks of intent detection and slot filling in medical dialogue. These tasks essentially involve classification and sequence labeling. Therefore, the following will provide a detailed introduction to related work on classification and sequence labeling tasks.
For the classification task, Zhang et al. [13] proposed a clinical note segmentation scheme to tackle the issue of critical patient information loss in unstructured clinical records. Their model identifies contextual changes and assigns medically relevant headings to each section. Rawat et al. [14] evaluated the application of transfer learning techniques for detecting suicide attempts and suicidal beliefs in electronic health records. They fine-tuned publicly available models to assess the effectiveness of transfer learning with five different parameters, aiming to address the challenge of limited resources in clinical applications. Barros et al. [15] proposed a new neural network-based clinical coding architecture. It initially leverages the hierarchical structure of ontologies to form clusters grounded in semantic relationships. Subsequently, the Matcher module assigns probabilities to determine the likelihood of a document belonging to each cluster, addressing the challenge of multi-label classification in medical terminology.
On the task of sequence annotation, Yepes et al. [16] constructed two large datasets of biological pathogens known to cause diseases. They applied manually prescribed indexes to these datasets to identify focal entities and utilized a machine learning approach to distinguish these focal entities from background entities in medicine. Kwon et al. [17] first designed a novel natural language processing application aimed at identifying medical terms within electronic health record notes that might pose difficulties for patients to comprehend. Secondly, they introduced a new publicly available dataset and proposed a medical terminology extraction model to tackle the challenge of detecting unfamiliar terms within specific domains. Zhang et al. [18] explore knowledge-enriched self-supervision of biomedical entity linking. They leverage readily available domain knowledge to generate self-supervised mention examples from unlabeled text using a domain ontology during training. Additionally, they employ contrastive learning to train a context encoder that addresses the challenge of fuzzy entity boundaries, thereby enhancing the efficiency of entity processing.
Although there have been studies on classification and sequence annotation tasks, very few approaches are applicable to the joint study of intent detection and slot filling for medical texts. Therefore, this paper investigates the two tasks of intent detection and slot filling in the field of medical dialogues to improve the accuracy of medical dialogue in the diagnostic process.

2.2. Intent Detection and Slot Filling Joint Framework

In early research, the tasks of intent detection and slot filling were traditionally handled separately. Intent detection was typically regarded as a text classification task, while slot filling was commonly viewed as a sequence labeling problem. For example, Goo et al. [19] proposed a gate mechanism that integrates intent information to guide slot filling. Haihong E et al. [20] developed the SF-ID network, which enables the two tasks to interact in a mutually reinforcing way. Qin et al. [21] designed a joint model with a stack propagation mechanism that incorporates intent information to guide slot filling. Cai et al. [22] enhance semantic features by combining intent and slot information to dynamically guide text representation. Although the above studies take into account the relevance of the two tasks and show promising results, most of them were conducted on English text; Chinese text, however, is more complex and diverse than English.
On Chinese utterances, Zhang et al. [23] introduced a variant of the long short-term memory network, Lattice-LSTM, to encode all potential words into a character-based LSTM. Gui et al. [24] proposed a more concise approach. They applied the idea of Lattice-LSTM to the character representation layer to achieve the encoding of the matching vocabulary. Qiao et al. [25] designed a word-character attention model for Chinese text categorization, combining word-level and character-level attention mechanisms to determine text categories. Tao et al. [26] proposed to make full use of Chinese characters, words, and radicals to represent and classify Chinese texts more accurately. Recently, Liu et al. [27] proposed a character-based approach to handle Chinese and English intent detection and slot filling tasks. Teng et al. [28] proposed a multi-level word adaptation model for Chinese dialogue systems. Xie et al. [29] proposed a multi-source information fusion network based on reading comprehension. They utilized a machine reading comprehension framework to explore tasks such as Chinese slot filling and intent detection. The above studies focus on intent detection and slot filling for Chinese sentences, involving applications related to characters and words. However, there are few intent detection and slot filling tasks applied to medical dialogues. Therefore, this paper investigates the joint task of intent detection and slot filling in medical dialogues by incorporating character and word information from contextual conversations, based on the respective characteristics of character and word information.

3. Method

3.1. Model Description

In this section, this paper introduces a character-word information interaction framework (CWIIF) for natural language understanding in the Chinese medical dialogue domain. The general framework of the method is shown in Figure 3. It consists of two encoder-decoder channels, one taking characters as input and one taking words as input, referred to in this paper as the character information extraction channel and the word information extraction channel, respectively. First, in the character information extraction channel, the medical text is transformed into the character input form $C = \{c_1, c_2, c_3, \dots, c_n\}$ and encoded by the pre-trained model, which outputs the character information vector $H^{ct} = \{H^{ct}_{cls}, H^{ct}_1, H^{ct}_2, \dots, H^{ct}_n\}$ for predicting the intent label; likewise, in the word information extraction channel, the medical text is transformed into the word input form $W = \{w_1, w_2, w_3, \dots, w_n\}$ and encoded by the pre-trained model, which outputs the word information vector $H^{wt} = \{H^{wt}_{cls}, H^{wt}_1, H^{wt}_2, \dots, H^{wt}_n\}$ for predicting the intent label. Second, the decoder of the character information extraction channel outputs the character information hidden state $h_t^{c,s}$ for slot filling, and the decoder of the word information extraction channel outputs the word information hidden state $h_t^{w,s}$ for slot filling. Finally, the character and word information extracted by the model are merged by the intent information adapter to predict the intent label $O^I$ for the intent detection task, and by the slot label extractor to identify the slot label $O_t^S$ for the slot filling task.
This paper adopts RoBERTa as the encoder, a pre-trained model consisting of several stacked Transformer blocks that capture a rich representation of the contextual discourse. Given a medical text, it is transformed into the character form $C = \{c_1, c_2, c_3, \dots, c_n\}$ and the word form $W = \{w_1, w_2, w_3, \dots, w_n\}$ as input, where $n$ is the sequence length of the medical text. The context is encoded with RoBERTa to obtain vector representations of all its tokens. First, the virtual token [CLS] is placed at the beginning of the sentence. Second, the token, segment, and position embeddings are summed and fed into RoBERTa. Finally, the output of the last Transformer block in RoBERTa yields the character vector $H^{ct} = \{H^{ct}_{cls}, H^{ct}_1, H^{ct}_2, \dots, H^{ct}_n\}$ and the word vector $H^{wt} = \{H^{wt}_{cls}, H^{wt}_1, H^{wt}_2, \dots, H^{wt}_n\}$ of the medical text.
The character sequence $C$ and the word sequence $W$ are encoded by RoBERTa to improve the model’s generalization ability, as follows:
$H^{ct} = \mathrm{RoBERTa}(c_1, c_2, \dots, c_n)$
$H^{wt} = \mathrm{RoBERTa}(w_1, w_2, \dots, w_n)$
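The framework feeds each utterance to the encoder in both a character view and a word view. The sketch below (not the authors' code) builds the two views, using forward maximum matching over a toy lexicon as a stand-in for a real Chinese word segmenter; the lexicon contents are made up for illustration.

```python
# Hypothetical toy lexicon; a real system would use a medical-domain vocabulary.
LEXICON = {"清鼻涕", "鼻涕", "一点点", "也", "只是"}

def char_view(text):
    """Character input form C = {c_1, ..., c_n}."""
    return list(text)

def word_view(text, lexicon=LEXICON, max_len=4):
    """Word input form W via forward maximum matching over the lexicon."""
    words, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + l]
            if l == 1 or cand in lexicon:   # fall back to a single character
                words.append(cand)
                i += l
                break
    return words

text = "清鼻涕也只是一点点"
```

Each view would then be tokenized and passed through RoBERTa to obtain $H^{ct}$ and $H^{wt}$; only the input construction is sketched here.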
When performing the Chinese medical dialogue intent detection task, in the character information extraction channel, a representation of the entire medical text is obtained by applying MLP attention to the pre-trained vector representations, which improves the handling of complex data, as follows:
$S^c = \sum_i \alpha_i H_i^{ct}$
$\alpha_i = \frac{\exp(u^{\top} H_i^{ct})}{\sum_j \exp(u^{\top} H_j^{ct})}$
Similarly, the word information extraction channel operates in the same way, where $S^c$ is the weighted sum of all hidden states and $u \in \mathbb{R}^d$ is a learnable model parameter.
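The MLP attention pooling above can be sketched in a few lines of pure Python: scores $u^{\top}H_i$ are softmax-normalized into weights $\alpha_i$, and the sentence vector is the weighted sum of the hidden states. The hidden states and the parameter $u$ below are toy values, not trained weights.

```python
import math

def attention_pool(H, u):
    """Weighted-sum pooling: alphas = softmax(u . H_i), pooled = sum_i alpha_i H_i."""
    scores = [sum(ui * hi for ui, hi in zip(u, h)) for h in H]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(H[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, H)) for d in range(dim)]
    return pooled, alphas

# Toy 3-token, 2-dimensional example.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u = [1.0, 1.0]
S_c, alphas = attention_pool(H, u)
```

The token whose hidden state aligns best with $u$ receives the largest weight, which is how the pooling emphasizes intent-relevant tokens.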
Then, the probability $P$ over intent categories is obtained from the overall vector via the softmax function, and the argmax function selects the category with the highest predicted probability for Chinese medical dialogue intent detection. The formulas are as follows:
$O^I = \arg\max_{\tilde{y} \in S_{int}} P(\tilde{y} \mid S^c)$
$P(\tilde{y} = j \mid S^c) = \mathrm{softmax}(W^I S^c + b^I)$
where $S_{int}$ is the set of intent labels and $W^I$, $b^I$ are trainable parameters.
When executing the Chinese medical dialogue slot filling task, in the character information extraction channel, this paper uses an LSTM as the slot filling decoder, which performs sequence labeling and information extraction efficiently and uses the intent information to guide slot filling. At each decoding step $t$, the decoder hidden state $h_t^{c,s}$ is computed as follows:
$h_t^{c,s} = \mathrm{LSTM}\!\left(h_t^{ct} \oplus \mathrm{int}(y_{t-1}^{s}),\ h_{t-1}^{c,s}\right)$
Similarly, the word information extraction channel operates in the same way, where $h_t^{ct}$ is the corresponding context vector representation, $\mathrm{int}(\cdot)$ denotes the embedding matrix of the intents, and $y_{t-1}^{s}$ is the embedding of the slot label emitted at the previous decoding step.
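One decoding step of Equation (7) can be illustrated with a scalar, pure-Python stand-in: the decoder input combines the context vector with the embedding of the previous slot label, and the cell updates its hidden and cell states. All weights are toy values of 1.0 with zero biases; a real model would concatenate vectors and use a trained cell such as torch.nn.LSTMCell.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev):
    """Scalar LSTM cell with a shared toy weight of 1.0 and zero biases."""
    pre = x + h_prev                 # shared toy pre-activation
    i = sigmoid(pre)                 # input gate
    f = sigmoid(pre)                 # forget gate
    o = sigmoid(pre)                 # output gate
    g = math.tanh(pre)               # candidate cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

context = 0.6                        # stand-in for the context vector h_t^{ct}
prev_label_emb = 0.4                 # stand-in for int(y_{t-1}^s)
h, c = lstm_step(context + prev_label_emb, h_prev=0.0, c_prev=0.0)
```

The point of the construction is visible even at this scale: the previous step's slot-label embedding enters the cell's input, so earlier labeling decisions condition later ones.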
The probability $P$ is obtained by the softmax function, and the argmax function then selects the slot label with the highest probability for Chinese medical dialogue slot filling, as follows:
$y_t^{S} = \mathrm{slot}(O_t^{S})$
$O_t^{S} = \arg\max_{\tilde{y} \in S_{slot}} P(\tilde{y} \mid h_t^{c,s})$
$P(\tilde{y} = j \mid h_t^{c,s}) = \mathrm{softmax}(W^S h_t^{c,s} + b^S)$
where $W^S$ and $b^S$ are trainable parameters, $S_{slot}$ is the set of slot labels, $O_t^{S}$ is the slot label of the $t$-th character, and $\mathrm{slot}(\cdot)$ denotes the embedding matrix of the slots.

3.2. Intent Detection and Slot Filling

In this paper, in order to better carry out the two tasks of intent detection and slot filling in the Chinese medical dialogue domain, an intent information adapter (IIA) and a slot label extractor (SLE) are designed. First, the IIA helps the model reduce the influence of polysemous words on intent judgment: through a gate mechanism, it ignores unnecessary features in the sentence and focuses on the important information, which improves the accuracy of patient intent detection. Second, combining the acquired intent information, the SLE is designed to enhance the acquisition of important label information in Chinese medical dialogues: it fuses and interacts character and word information through multi-head attention, which improves the filling of correct slot labels.

3.2.1. Intent Information Adapter

In order to carry out the task of intent detection in Chinese medical dialogues, this paper designs an IIA that, through a gate mechanism, ignores unnecessary features in the sentence and focuses on the important information, thereby improving the accuracy of patient intent detection, as shown in Figure 4. First, the sentence representations of character information and word information are taken as inputs and spliced together; through the gating mechanism, a common gating value is computed. Then, the important information in both sentence representations is obtained separately through the gating value. Finally, fusion and interaction are carried out to obtain the correct intent.
Given two outputs S c and S w after the MLP attention Formula (3), S c and S w are the sentence representations of character information and word information on character and word sequences, respectively. Then, this paper employs an IIA to merge the character-word information with Equation (11). The gating mechanism, which ignores unnecessary features in a sentence and focuses on important information, helps the model to reduce the impact of word polysemy on intent judgment. The formula is as follows:
$\mathrm{IIA}(C, W) = G\!\left((1-\lambda) \cdot C + \lambda \cdot W\right)$
$\lambda = \mathrm{sigmoid}(C^{\top} N_f W)$
where $N_f \in \mathbb{R}^{d \times d}$ is a trainable parameter and $\lambda$ is an adaptively adjustable importance weight between character and word information.
Finally, in this paper, the intent label $O^I$ is predicted using the summary vector $V^I$, and the probability $P$ is obtained by the softmax function, as given in Equation (13):
$V^I = \mathrm{IIA}(S^c, S^w)$
$P(\tilde{y} = j \mid V^I) = \mathrm{softmax}(W^I V^I + b^I)$
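A scalar sketch of the IIA gate: $\lambda$ is a sigmoid of a bilinear score between the character and word representations, and the fused output mixes the two views. The mixing form $(1-\lambda)\cdot C + \lambda\cdot W$ and the single toy parameter standing in for $N_f$ are assumptions of this sketch, not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def iia_fuse(c, w, n_f=0.5):
    """Gate fusion: lambda = sigmoid(c * n_f * w), output = (1-lambda)*c + lambda*w."""
    lam = sigmoid(c * n_f * w)       # bilinear gate score, scalar stand-in
    return (1 - lam) * c + lam * w, lam

# Toy scalar 'representations' for the character and word views.
fused, lam = iia_fuse(1.0, 2.0)
```

Because $\lambda \in (0, 1)$, the fused value always lies between the two views; the gate decides how much each view contributes rather than discarding either outright.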

3.2.2. Slot Label Extractor

In order to carry out the task of slot filling in Chinese medical dialogues, this paper designs the SLE, which fuses and interacts character and word information through multi-head attention to extract the important labels in patient texts. First, an LSTM with the decoding function of Equation (7) is used to generate the hidden state $h_t^{c,s}$. Then, this paper uses the SLE to integrate the character and word information, as shown in Figure 5, exploiting character-word information interaction for better slot label extraction. Specifically, the hidden states of character information and word information are taken as inputs and spliced into the multi-head attention; the common multi-head attention weight is calculated; the attention weight then acts on the hidden states of both channels; and finally, fusion and interaction are carried out to extract the slot labels, which improves the filling of correct slot labels. The formulas are as follows:
$\mathrm{SLE}(C, W) = A\!\left((1-\lambda) \cdot C + \lambda \cdot W\right)$
$\lambda = \mathrm{sigmoid}(C^{\top} N_f W)$
Finally, slot filling is performed using the fused representation $V_t^{S}$, which contains both character and word information, in place of $h_t^{c,s}$. The probability $P$ is obtained by the softmax function, as follows:
$V_t^{S} = \mathrm{SLE}(h_t^{c,s}, h_t^{w,s})$
$P(\tilde{y} = j \mid V_t^{S}) = \mathrm{softmax}(W^S V_t^{S} + b^S)$
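A single-head, pure-Python stand-in for the attention mixing that the SLE performs over the character and word hidden states (the real module uses multi-head attention with learned projections; the vectors below are toy values):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(query, keys, values):
    """Scaled dot-product attention over a tiny set of keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

h_char = [1.0, 0.0]   # character-channel hidden state at step t
h_word = [0.0, 1.0]   # word-channel hidden state at step t
mixed = attend(h_char, [h_char, h_word], [h_char, h_word])
```

Querying with the character state weights both channels by similarity, so the mixed state leans toward the character view while still absorbing word-level evidence; the multi-head version repeats this with several learned projections in parallel.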

3.3. Joint Training

Based on the above model design, the loss function of the intent detection task is computed first, then the loss function of the slot filling task, and the two are summed for joint training. The final joint objective function is calculated as follows:
$\mathcal{L}_I = -\sum_{k=1}^{|S_{int}|} y^{k,I} \log P(k \mid V^I)$
$\mathcal{L}_S = -\sum_{i=1}^{N} \sum_{j=1}^{|S_{slot}|} y_i^{j,S} \log P(j \mid V_i^S)$
$\mathcal{L} = \mathcal{L}_I + \mathcal{L}_S$
where $V^I$ and $V_i^S$ are the fused representations used to predict the intent label and the slot label of the $i$-th token, respectively, $S_{int}$ and $S_{slot}$ are the sets of intent and slot labels, respectively, and $N$ is the sequence length.
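The joint objective above is a cross-entropy over the intent distribution plus a summed cross-entropy over the per-token slot distributions. The probabilities below are made-up model outputs for a two-token utterance.

```python
import math

def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])

intent_probs = [0.7, 0.2, 0.1]          # P(intent | V^I)
slot_probs = [[0.8, 0.1, 0.1],          # P(slot | V_t^S), one row per token
              [0.1, 0.85, 0.05]]
gold_intent = 0
gold_slots = [0, 1]

L_I = cross_entropy(intent_probs, gold_intent)
L_S = sum(cross_entropy(p, g) for p, g in zip(slot_probs, gold_slots))
L = L_I + L_S                           # joint training objective
```

Summing the two losses lets a single backward pass update the shared encoder for both tasks, which is the point of joint training.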

4. Experiments

4.1. Datasets

This paper conducted experiments on two Chinese SLU datasets, Chinese Artificial Intelligence Speakers (CAIS) [27] and Intelligent Medical Consultation System (IMCS-21) [30]. Details of the dataset are shown in Table 1 and Table 2. In order to test the validity of the model in the public dialogue domain, the CAIS dataset was chosen for this paper. In addition, in order to further evaluate the performance of the model in this paper in the domain of medical dialogue, the IMCS-21 dataset is selected in this paper.
The CAIS dataset consists of sentences collected by Chinese artificial intelligence speakers. It consists of 7995 training discourses, 994 validation discourses, and 1024 test discourses.
The IMCS-21 dataset is a corpus of medical dialogues processed by Chen et al. [30], who removed some incomplete and short dialogue samples from the raw data and annotated the filtered samples. The dataset was used for research and evaluation of medical dialogue comprehension tasks.
In this study, the IMCS-21 dataset was assigned according to the guidelines of Chen et al. [30], while the CAIS dataset used the same format and data partitioning as Teng et al. [28] and others.

4.2. Evaluation Metrics

On the IMCS-21 dataset, this paper uses the same evaluation metrics as Chen et al. [30], including Precision (P), Recall (R), F1 score (F1), and Accuracy (Acc). These metrics are used to evaluate the performance of the model on Chinese medical dialogues. On the CAIS dataset, this paper follows the evaluation method of Teng et al. [28]. Specifically, this paper evaluates the slot filling performance of Chinese SLU using the F1 score (F1), the intent detection performance using Accuracy (Acc), and the sentence-level semantic frame parsing performance using Overall Accuracy (OAcc), under which an output is regarded as a correct prediction if and only if the predicted intent and all predicted slots exactly match the ground truth.
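The sentence-level OAcc metric described above can be sketched directly: a prediction counts only if the intent and every slot label match the gold annotation exactly. The predictions below are made-up examples.

```python
def overall_accuracy(preds, golds):
    """Fraction of sentences whose intent AND full slot sequence are both correct."""
    correct = sum(
        1 for (pi, ps), (gi, gs) in zip(preds, golds)
        if pi == gi and ps == gs
    )
    return correct / len(golds)

golds = [("Inform-Symptom", ["B-Symptom", "I-Symptom", "O"]),
         ("Request-Drug",   ["O", "B-Drug"])]
preds = [("Inform-Symptom", ["B-Symptom", "I-Symptom", "O"]),
         ("Request-Drug",   ["O", "O"])]        # one wrong slot -> not counted

print(overall_accuracy(preds, golds))  # 0.5
```

Because a single wrong slot label invalidates the whole sentence, OAcc is strictly harder than per-task Acc or F1, which is why it is reported separately.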

4.3. Experimental Settings

In this work, the pre-trained model RoBERTa based on whole word masking (WWM) is utilized for embedding Chinese text. The experimental parameters are set as shown in Table 3. During training, the batch size (i.e., the number of texts in each batch) is set to 16, the dropout probability to 0.3, the learning rate to $2 \times 10^{-5}$, the word embedding dimension to 128, and the character embedding dimension to 64. The number of epochs is set to 50, and AdamW is used to optimize all trainable parameters.
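The hyperparameters stated above, collected into a config dict for reference (a convenience sketch, not the authors' code; the `pretrained_model` key name is illustrative):

```python
# Training configuration mirroring the values reported in the text / Table 3.
CONFIG = {
    "pretrained_model": "RoBERTa-wwm",  # whole-word-masking Chinese RoBERTa
    "batch_size": 16,
    "dropout": 0.3,
    "learning_rate": 2e-5,
    "word_embedding_dim": 128,
    "char_embedding_dim": 64,
    "epochs": 50,
    "optimizer": "AdamW",
}
```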

4.4. Baseline

In order to validate the effectiveness of the CWIIF in this paper on the IMCS-21 dataset, the baseline models in this paper include traditional deep learning models: TextCNN [31], TextRNN [32], TextRCNN [33], and DPCNN [34], general-purpose domain pre-trained models: BERT [35] and ERNIE [36], and biomedical pre-trained models: MC-BERT [37] and ERNIE-health [36].
On the CAIS dataset, the baseline models in this paper include (1) Slot-Gated: Goo et al. [19] proposed a slot gate that models the relationship between intent and slot attention vectors. (2) SF-ID network: Haihong E et al. [20] proposed a new SF-ID network that applies a bidirectional association mechanism to the intent detection and slot filling tasks. (3) CM-Net: Liu et al. [27] proposed a character-based approach for Chinese SLU using a character-level association model. (4) Stack-Propagation: Qin et al. [21] proposed a Stack-Propagation framework that solves the SLU task by incorporating word-level intent detection mechanisms. (5) Multi-level word adapter: Teng et al. [28] proposed a simple and effective multi-level word adapter (MLWA) model.

4.5. Results and Analysis

As shown in Table 4, P, R, F1, and Acc were evaluated on the IMCS-21 dataset. Three groups of baseline models are employed: traditional deep learning models, general-purpose domain pre-trained models, and biomedical pre-trained models, against which the proposed CWIIF is validated on the medical dataset. Accordingly, the CWIIF in this paper uses the general-purpose domain pre-trained model RoBERTa as its pre-trained model. As shown in Table 5, the performance of the proposed CWIIF is compared with previous baseline models on the CAIS dataset in terms of the F1 of slot filling, the Acc of intent detection, and the OAcc of the whole task. In addition, this paper selects the two baseline methods with the highest OAcc, i.e., Stack-Propagation and MLWA, and replaces their embedding methods with RoBERTa to form two additional baselines (Stack-Propagation + RoBERTa, MLWA + RoBERTa).
On the IMCS-21 dataset, the CWIIF consistently outperforms the other baseline methods on all four metrics: P, R, F1, and Acc. In particular, it shows significant advantages in P, R, and F1. Specifically, compared with the previous ERNIE-health baseline, it achieves a 2.42% improvement in P, a 3.01% improvement in R, and a 2.4% improvement in F1. On the CAIS dataset, the CWIIF also compares favorably with the other baseline methods in terms of Acc, F1, and OAcc. Compared with the previous state-of-the-art model MLWA + RoBERTa, the proposed model improves the slot filling F1 by 0.83%, the intent detection Acc by 0.37%, and the OAcc by 1.22% on the CAIS dataset. These results show that the CWIIF can effectively encode interactions between characters and words and confirm its advantages for intent detection and slot filling in Chinese medical dialogues.

4.6. Ablation Experiment

In order to evaluate the contribution of each module in this work, this paper designs ablation studies covering the IIA for intent detection and the SLE for slot filling. In this section, the proposed modules are removed one by one, and ablation experiments are performed on both the IMCS-21 and CAIS datasets.
Firstly, this paper removes the IIA for intent detection and keeps the other components of the model unchanged; this variant is named Ours-IIF, and the results are given in Table 6 and Table 7. The Acc on both the IMCS-21 and CAIS datasets drops noticeably, which shows that the proposed IIA can effectively identify the correct intent in Chinese intent detection. It can also be observed in Table 7 that Acc and OAcc decrease by 0.1% and 0.89%, respectively, which further validates the effectiveness of the proposed IIA.
Secondly, this paper removes the SLE for slot filling and keeps the other components of the model unchanged; this variant is named Ours-SLE, with results shown in Table 6 and Table 7. The F1 on the IMCS-21 and CAIS datasets decreases by 0.34% and 0.17%, respectively. This paper attributes this to the fact that fusing character and word information helps the model detect important labels, and the proposed SLE can effectively extract complete label information to improve slot filling.
Finally, this paper removes both the IIA and the SLE; this variant is named Ours-ALL, with results shown in Table 6 and Table 7. On the IMCS-21 dataset (Table 6), P, R, F1, and Acc decrease by 1.06%, 1.48%, 0.59%, and 0.76%, respectively. On the CAIS dataset (Table 7), F1, Acc, and OAcc decrease by 0.83%, 0.37%, and 1.22%, respectively. This further confirms the feasibility of the second contribution of this paper. In summary, the IIA and the SLE each bring their own advantages, and combining the two is of greater practical value.
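The percentage-point drops quoted for Ours-ALL are plain differences of the corresponding table rows; as a worked example for the CAIS results in Table 7 (values copied from the table):

```python
ours     = {"F1": 91.93, "Acc": 95.53, "OAcc": 89.56}  # full model (Table 7)
ours_all = {"F1": 91.10, "Acc": 95.16, "OAcc": 88.34}  # IIA and SLE both removed
# Per-metric drop in percentage points, rounded to two decimals
drops = {k: round(ours[k] - ours_all[k], 2) for k in ours}
print(drops)  # {'F1': 0.83, 'Acc': 0.37, 'OAcc': 1.22}
```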

4.7. Case Study

To verify whether the model reduces the effect of word polysemy on intent judgment and handles the case where character errors make slot entity extraction difficult, this paper selects an utterance containing both problems for a case study. The MLWA + RoBERTa baseline, which achieves the highest OAcc among the baselines, is chosen for comparison of the predicted intent and slot labels. The results are shown in Figure 6.
The model in this paper correctly predicts both the intent label and the slot labels, while MLWA + RoBERTa performs poorly. Taking the utterance ‘鱼甘油一直吃起的’ (fish glycerin is always eaten), which contains a character error in the drug name, as input, the correct intent label is ‘Inform-Basic-Information’. Because of word polysemy in doctor–patient dialogues, MLWA + RoBERTa outputs the incorrect intent ‘Inform-Precautions’, while the model in this paper correctly detects ‘Inform-Basic-Information’. For the slots, the labels should be ‘B-Drug’ and ‘I-Drug’. Because the character error makes entity extraction difficult, MLWA + RoBERTa labels the drug characters as ‘O’, while the model in this paper correctly extracts ‘B-Drug’, ‘I-Drug’, and ‘O’. In conclusion, this case study demonstrates the importance of integrating character-word information interaction into the model to improve the accuracy of Chinese medical intent detection and slot filling.
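As an aside, the slot labels discussed in this case study can be decoded back into entity strings following the standard BIO convention. The sketch below is hypothetical (the function name and label layout are ours, not the paper's code) and uses the case-study utterance:

```python
def decode_entities(chars, labels):
    """Join characters whose BIO labels form one entity span."""
    entities, buf, typ = [], [], None
    for ch, lab in zip(chars, labels):
        if lab.startswith("B-"):
            if buf:
                entities.append(("".join(buf), typ))
            buf, typ = [ch], lab[2:]
        elif lab.startswith("I-") and buf and lab[2:] == typ:
            buf.append(ch)  # entity continues
        else:
            if buf:
                entities.append(("".join(buf), typ))
            buf, typ = [], None
    if buf:
        entities.append(("".join(buf), typ))
    return entities

chars  = list("鱼甘油一直吃起的")
labels = ["B-Drug", "I-Drug", "I-Drug", "O", "O", "O", "O", "O"]
print(decode_entities(chars, labels))  # [('鱼甘油', 'Drug')]
```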

5. Conclusions

This paper proposes a character-word information interaction framework (CWIIF) for natural language understanding in the Chinese medical dialogue domain. The framework targets the intent detection and slot filling tasks in Chinese medical dialogues: it extracts information from character-level and word-level text, respectively, and designs an information-interaction module between the two in order to improve the accuracy of intent and slot recognition. An intent information adapter (IIA) is designed for the intent detection task, and a slot label extractor (SLE) for the slot filling task. Experiments on two Chinese datasets verify the effectiveness of the model with significant results, and further ablation experiments confirm the contributions of the intent information adapter and the slot label extractor. However, the model still handles specialized Chinese medical terminology imperfectly. In the future, character-word information can be applied to more medical dialogue tasks, thereby enhancing the efficiency of patient medical consultations.

Author Contributions

Conceptualization, P.C., Z.Y., X.L. and Y.L.; methodology, P.C., Z.Y., X.L. and Y.L.; software, Z.Y. and X.L.; validation, Z.Y. and X.L.; formal analysis, Z.Y.; resources, P.C.; data curation, X.L. and Y.L.; writing—original draft preparation, Z.Y.; writing—review and editing, P.C., X.L. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2023 Hefei University Talent Research Fund (grant number 23RC11) and the 2024 Major Scientific Research Project of Anhui Province (grant number 2024AH040209). The APC was funded by 23RC11.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Relevant data is not available due to privacy and ethical constraints.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lee, D.; Yoon, S.N. Application Of Artificial Intelligence-Based Technologies in the Healthcare Industry: Opportunities and Challenges. Int. J. Environ. Res. Public Health 2021, 18, 271. [Google Scholar] [CrossRef]
  2. Khan, A.; Asghar, M.Z.; Ahmad, H.; Kundi, F.M.; Ismail, S. A Rule-Based Sentiment Classification Framework for Health Reviews on Mobile Social Media. J. Med. Imaging Health Inform. 2017, 7, 1445–1453. [Google Scholar] [CrossRef]
  3. Shang, L.; Lu, Z.; Li, H. Neural Responding Machine For Short-Text Conversation. arXiv 2015, arXiv:1503.02364. [Google Scholar]
  4. Iyyer, M.; Boyd-Graber, J.L.; Claudino, L.M.B.; Socher, R.; Daumé, H.D., III. A Neural Network for Factoid Question Answering over Paragraphs. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Volume D14-1, pp. 633–644. [Google Scholar]
  5. Zhang, Y.; Jiang, Z.; Zhang, T.; Liu, S.; Cao, J.; Liu, K.; Liu, S.; Zhao, J. MIE: A Medical Information Extractor towards Medical Dialogues. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Volume 2020.acl-main, pp. 6460–6469. [Google Scholar]
  6. Zhang, Y.; Wallace, B.C. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification. arXiv 2017, arXiv:1510.03820. [Google Scholar]
  7. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  8. Xu, K.; Hou, W.; Cheng, Y.; Wang, J.; Li, W. Medical Dialogue Generation via Dual Flow Modeling. arXiv 2023, arXiv:2305.18109. [Google Scholar]
  9. Zeng, D.; Peng, R.; Jiang, C.; Li, Y.; Dai, J. CSDM: A context-sensitive deep matching model for medical dialogue information extraction. Inf. Sci. 2022, 607, 727–738. [Google Scholar] [CrossRef]
  10. Guo, Q.; Cao, S.; Yi, Z. A medical question answering system using large language models and knowledge graphs. Int. J. Intell. Syst. 2022, 37, 8548–8564. [Google Scholar] [CrossRef]
  11. Ziletti, A.; Akbik, A.; Berns, C.; Herold, T.; Legler, M.; Viell, M. Medical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning. arXiv 2022, arXiv:2206.02662. [Google Scholar]
  12. Fu, Z.; Su, Y.; Meng, Z.; Collier, N. Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization. arXiv 2023, arXiv:2305.13066. [Google Scholar]
  13. Zhang, F.; Laish, I.; Benjamini, A.; Feder, A. Section Classification in Clinical Notes with Multi-task Transformers. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI), Abu Dhabi, United Arab Emirates, 7 December 2022. [Google Scholar]
  14. Rawat, B.P.S.; Yu, H. Parameter Efficient Transfer Learning for Suicide Attempt and Ideation Detection. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis, Abu Dhabi, United Arab Emirates, 7 December 2022; pp. 108–115. [Google Scholar]
  15. Barros, J.; Rojas, M.; Dunstan, J.; Abeliuk, A. Divide and Conquer: An Extreme Multi-Label Classification Approach for Coding Diseases and Procedures in Spanish. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis, Abu Dhabi, United Arab Emirates, 7 December 2022; pp. 138–147. [Google Scholar]
  16. Jimeno-Yepes, A.; Verspoor, K. Distinguishing between focus and background entities in biomedical corpora using discourse structure and transformers. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis, Abu Dhabi, United Arab Emirates, 7 December 2022; pp. 35–40. [Google Scholar]
  17. Kwon, S.; Yao, Z.; Jordan, H.S.; Levy, D.A.; Corner, B.; Yu, H. MedJEx: A Medical Jargon Extraction Model with Wiki’s Hyperlink Span and Contextualized Masked Language Model Score. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Volume 2022, pp. 11733–11751. [Google Scholar]
  18. Zhang, S.; Cheng, H.; Vashishth, S.; Wong, C.; Xiao, J.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Knowledge-Rich Self-Supervision for Biomedical Entity Linking. arXiv 2022, arXiv:2112.07887. [Google Scholar]
  19. Goo, C.W.; Gao, G.; Hsu, Y.K.; Huo, C.L.; Chen, T.C.; Hsu, K.W.; Chen, Y.N. Slot-Gated Modeling for Joint Slot Filling and Intent Prediction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 753–757. [Google Scholar]
  20. E, H.; Niu, P.; Chen, Z.; Song, M. A Novel Bi-Directional Interrelated Model For Joint Intent Detection And Slot Filling. arXiv 2019, arXiv:1907.00390. [Google Scholar]
  21. Qin, L.; Che, W.; Li, Y.; Wen, H.; Liu, T. A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding. arXiv 2019, arXiv:1909.02188. [Google Scholar]
  22. Cai, S.; Ma, Q.; Hou, Y.; Zeng, G. Semantically Guided Enhanced Fusion for Intent Detection and Slot Filling. Appl. Sci. 2023, 13, 12202. [Google Scholar] [CrossRef]
  23. Zhang, Y.; Yang, J. Chinese NER Using Lattice LSTM. arXiv 2018, arXiv:1805.02023. [Google Scholar]
  24. Gui, T.; Zou, Y.; Zhang, Q.; Peng, M.; Fu, J.; Wei, Z.; Huang, X.J. A Lexicon-Based Graph Neural Network for Chinese NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Volume D19-1, pp. 1040–1050. [Google Scholar]
  25. Qiao, X.; Peng, C.; Liu, Z.; Hu, Y. Word-character attention model for Chinese text classification. Int. J. Mach. Learn. Cybern. 2019, 10, 3521–3537. [Google Scholar] [CrossRef]
  26. Tao, H.; Tong, S.; Zhao, H.; Xu, T.; Jin, B.; Liu, Q. A Radical-aware Attention-based Model for Chinese Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5125–5132. [Google Scholar]
  27. Liu, Y.; Meng, F.; Zhang, J.; Zhou, J.; Chen, Y.; Xu, J. CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding. arXiv 2019, arXiv:1909.06937. [Google Scholar]
  28. Teng, D.; Qin, L.; Che, W.; Zhao, S.; Liu, T. Injecting word information with multi-level word adapter for chinese spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 8188–8192. [Google Scholar]
  29. Xie, B.; Jia, X.; Song, X.; Zhang, H.; Chen, B.; Jiang, B.; Wang, Y.; Pan, Y. ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding. Inf. Fusion 2023, 96, 192–201. [Google Scholar] [CrossRef]
  30. Chen, W.; Li, Z.; Fang, H.; Yao, Q.; Zhong, C.; Hao, J.; Zhang, Q.; Huang, X.; Peng, J.; Wei, Z. A benchmark for automatic medical consultation system: Frameworks, tasks and datasets. Bioinformatics 2023, 39, btac817. [Google Scholar] [CrossRef]
  31. Chen, Y. Convolutional Neural Network for Sentence Classification; GitHub: San Francisco, CA, USA, 2015. [Google Scholar]
  32. Liu, P.; Qiu, X.; Huang, X. Recurrent Neural Network for Text Classification with Multi-Task Learning. arXiv 2016, arXiv:1605.05101. [Google Scholar]
  33. Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent Convolutional Neural Networks For Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2267–2273. [Google Scholar]
  34. Johnson, R.; Zhang, T. Deep Pyramid Convolutional Neural Networks for Text Categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017. [Google Scholar] [CrossRef]
  35. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  36. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar] [CrossRef]
  37. Zhang, N.; Chen, M.; Bi, Z.; Liang, X.; Li, L.; Shang, X.; Yin, K.; Tan, C.; Xu, J.; Huang, F.; et al. CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark. arXiv 2021, arXiv:2106.08087. [Google Scholar]
Figure 1. Example of medical intent detection and slot filling tasks. ‘清鼻涕也只是一点点’ means Clear nose is just a little bit, and ‘可以喝点感冒灵颗粒’ means You can take some cold medicine granules.
Figure 2. Challenge-contribution graph. ‘清鼻涕也只是一点点’ means clear nose is just a little bit, and ‘鱼甘油能一起吃么?’ means can fish glycerol be eaten together?
Figure 3. Character-word information interaction framework diagram.
Figure 4. Example diagram of an IIA.
Figure 5. Model diagram of SLE.
Figure 6. Example case study diagram. ‘鱼甘油能一直吃起的’ means fish glycerin is always eaten.
Table 1. CAIS dataset.
Utterances (dev / test / train): 994 / 1024 / 7995. Intent types: 11. Slot types: 75.
Table 2. IMCS-21 dataset.
Utterances (dev / test / train): 33,267 / 32,935 / 98,529. Intent types: 16. Slot types: 11.
Table 3. Experimentally relevant parameters.
Pre-trained model: RoBERTa (chinese-roberta-wwm-ext)
Batch size: 32
Learning rate: 2 × 10⁻⁵
Epochs: 50
Optimizer: AdamW
Dropout: 0.3
Word embedding dimension: 128
Character embedding dimension: 64
Table 4. Comparison of performance on IMCS-21 dataset.
Models                 P      R      F1     Acc
TextCNN (2015)         74.02  70.92  72.22  78.99
TextRNN (2016)         73.07  69.88  70.96  78.53
TextRCNN (2015)        73.82  72.53  72.89  79.40
DPCNN (2017)           74.30  69.45  71.28  78.75
BERT (2019)            75.35  77.16  76.14  81.62
ERNIE (2019)           76.18  77.33  76.67  82.19
MC-BEAT (2022)         75.03  77.09  75.94  81.54
ERNIE-Health (2019)    75.81  77.85  76.71  82.37
Ours                   78.23  80.86  79.11  82.39
Table 5. Comparison of performance on CAIS dataset.
Models                        F1     Acc    OAcc
Slot-Gated (2018)             82.21  93.87  80.43
SF-ID Network (2019)          86.34  94.66  84.09
CM-Net (2019)                 86.16  94.56  -
Stack-Propagation (2019)      87.65  94.57  84.68
MLWA (2021)                   88.57  94.66  85.47
Stack-Propagation + RoBERTa   89.33  95.26  86.85
MLWA + RoBERTa                91.10  95.16  88.34
Ours                          91.93  95.33  89.56
Table 6. Ablation experiments on IMCS-21 dataset.
Models     P      R      F1     Acc
Ours-IIF   77.85  79.92  79.08  82.18
Ours-SLE   77.82  80.72  78.77  82.38
Ours-ALL   77.17  79.38  78.52  81.63
Ours       78.23  80.86  79.11  82.39
Table 7. Ablation experiments on CAIS dataset.
Models     F1     Acc    OAcc
Ours-IIF   91.49  95.43  88.67
Ours-SLE   91.76  95.46  89.17
Ours-ALL   91.10  95.16  88.34
Ours       91.93  95.53  89.56
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cao, P.; Yang, Z.; Li, X.; Li, Y. A Character-Word Information Interaction Framework for Natural Language Understanding in Chinese Medical Dialogue Domain. Appl. Sci. 2024, 14, 8926. https://doi.org/10.3390/app14198926
