Article

RoGraphER: Enhanced Extraction of Chinese Medical Entity Relationships Using RoFormer Pre-Trained Model and Weighted Graph Convolution

1 Laboratory of Grain Information Processing and Control, Henan University of Technology, Zhengzhou 450001, China
2 Henan Key Laboratory of Grain Storage Information Intelligent Perception and Decision Making, Zhengzhou 450001, China
3 Henan Grain Big Data Analysis and Application Engineering Research Center, Henan University of Technology, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(15), 2892; https://doi.org/10.3390/electronics13152892
Submission received: 1 June 2024 / Revised: 5 July 2024 / Accepted: 19 July 2024 / Published: 23 July 2024

Abstract

Unstructured Chinese medical texts are rich sources of entity and relational information. The extraction of entity relationships from medical texts is pivotal for the construction of medical knowledge graphs and aiding healthcare professionals in making swift and informed decisions. However, the extraction of entity relationships from these texts presents a formidable challenge, notably due to the issue of overlapping entity relationships. This study introduces a novel extraction model that leverages RoFormer’s rotational position encoding (RoPE) technique for an efficient implementation of relative position encoding. This approach not only optimizes positional information utilization but also captures syntactic dependency information by constructing a weighted adjacency matrix. During the feature fusion phase, the model employs an entity attention mechanism for a deeper integration of features, effectively addressing the challenge of overlapping entity relationships. Experimental outcomes demonstrate that our model achieves an F1 score of 83.42 on datasets featuring overlapping entity relations, significantly outperforming other baseline models.

1. Introduction

The integration of Chinese medical texts into information technology frameworks is crucial for enhancing the decision-making capabilities of healthcare professionals. However, Chinese medical texts are replete with unstructured entities that have rich semantic relationships. The purpose of extracting entity relationships from these texts is to identify the connections between two entities, which is essential for constructing knowledge graphs [1] and advancing intelligent healthcare solutions.
Entity relationship extraction is a crucial aspect of information extraction, especially within the realm of medical texts. It aims to distill structured relationships from Chinese medical documents [2], organizing them into triples that comprise a head entity, a relationship, and a tail entity. As the field has evolved, three primary methodologies have emerged for extracting entity relationships from Chinese medical texts: rule-based, traditional machine learning-based, and deep learning-based approaches.
In rule-based methods, researchers rely on meticulously crafted rules derived from extensive medical dictionaries and knowledge bases [3]. Rule-based systems have inherent limitations, particularly in their inability to adapt to the diversity of medical texts, which significantly hinders the effective processing and analysis of large-scale medical corpora. Machine learning therefore gradually came into focus. Machine learning methods for entity relationship extraction use probabilistic statistical models to analyze and predict relationships within medical texts, treating the task as a classification problem [4]. However, heavy reliance on data preprocessing and a lack of generalization highlight the limitations of traditional machine learning. The advent of deep learning brought a significant turning point in the field: researchers now employ deep learning techniques to extract finer features from medical texts without the need for manual feature engineering.
Although many researchers successfully extract entity relationships from Chinese medical texts, the extraction performance remains suboptimal when dealing with datasets that contain nested and overlapping entities. While some studies introduce more complex models and graph networks to address these issues, several challenges still remain. One major issue is the ineffective utilization of relative positional information and syntactic structures among entities. Moreover, the nuanced and overlapping features in entity relationships are not yet fully understood, which continues to hinder progress in the Chinese medical field.
To address these issues, we propose an entity relationship extraction model based on RoFormer pre-training and weighted graph convolutional networks (RoGraphER). This model effectively leverages the dependencies and relative positional information between entities to enhance extraction performance. Additionally, RoGraphER employs an entity attention mechanism for deep feature fusion, designed to address the complexities of overlapping entity relationships. The innovations presented in this paper include:
(1) To tackle the issue of polysemy within medical text datasets, we introduce the RoFormer pre-training model, which incorporates rotary positional encoding. This approach efficiently utilizes positional information to mitigate ambiguities.
(2) We enhance the model’s comprehension of sentence feature information by updating a weighted adjacency matrix, built by calculating syntactic dependency weight coefficients between nodes.
(3) The integration of an entity attention mechanism significantly improves the model’s ability to discern the inherent structure of sentences, addressing the complexities associated with overlapping entity relationships.

2. Related Work

2.1. Deep Learning-Based Entity Relationship Extraction

Although convolutional neural networks (CNNs) [5] perform well in a series of natural language processing (NLP) tasks, they face some limitations when applied to entity relation extraction. CNNs use convolutional kernels for local feature extraction, which limits effectiveness in tasks that require recognizing long-distance dependencies. Because CNNs inherently lack the capability to capture extensive contextual information, they exhibit limitations in entity relation extraction tasks. In contrast, other neural network architectures, such as recurrent neural networks (RNNs) [6], long short-term memory networks (LSTMs) [7], and the revolutionary transformer [8], along with the pre-trained model BERT [9] and its derivatives, exhibit better performance in the field of complex entity relationship extraction. Among these, pre-trained models learn rich semantic information and contextual features through corpus training, and with the advancement of deep learning technology, their application in medical entity extraction is becoming increasingly widespread.
Lee et al. utilize the bidirectional encoder capabilities of BERT to delve into biomedical literature and clinical records, uncovering entities such as drugs and diseases, along with their interrelations [10]. This approach demonstrates BERT’s excellent performance in handling various specialized medical texts and clinical datasets. Building on this, Sun et al. enhance the BERT framework by incorporating biomedical expertise, employing Gaussian probability distributions to weight feature representations [11]. This enhancement greatly improves the model’s ability to accurately identify biomedical entities and their complex relationships. It achieves a remarkable success rate of 76.56% on the compound–protein interaction (CPI) dataset, underscoring its potential and value in biomedical entity relationship extraction.
The adoption of attention mechanisms signifies a transformative advancement in text processing, especially in the identification of entity relation extraction. This technique allows models to focus on text segments that are rich in entity information while ignoring irrelevant sections, thus enhancing the precision and efficiency of entity relationship extraction.
Yang Yanyun et al. propose the DA-BiLSTM-CRF model, which combines data augmentation and attention techniques to realize the entity relationship extraction task [12]. By leveraging unlabeled datasets for augmentation, the model achieves an F1 score of 82.43% in traditional Chinese medicine (TCM) text extraction. Addressing the challenges posed by lengthy texts and intricate entity interrelations in medical information processing, Yao et al. propose an integrated model combining ERNIE [13] pre-training, Bi-GRU neural networks, and attention mechanisms [14]. By leveraging ERNIE pre-training to enhance medical datasets, the model performs well in the field of medical information extraction.

2.2. Entity Relationship Extraction Based on Graph Neural Networks

In complex Chinese medical texts, traditional CNN and LSTM models cannot address the issue of overlapping entity relationships because they fail to effectively capture long-distance dependencies and complex contextual information. However, graph convolutional networks (GCNs) overcome these limitations by using graph representations. This approach allows the model to identify independent entities and clearly delineate their dynamic interrelationships, significantly improving the accuracy of medical text relation extraction tasks in scenarios with overlapping entities.
Hong, Yin et al. introduce a model that skillfully handles the complex network of entity connections using a relationship-aware attention mechanism [15]. By considering the behavior and properties of neighboring nodes and the edges that connect them, the model gains a better understanding of the network’s structure and function. This approach allows it to outperform traditional benchmarks and achieve outstanding results on various leading datasets. Addressing the complex issue of overlapping entity triples, Heng Hong Jun et al. introduce the FSSRel model [16]. The FSSRel model integrates semantic and syntactic insights to enhance the analysis and interpretation of structured data in graphs. The FSSRel model excels in environments with dense overlapping and multiple triples, demonstrating its effectiveness in entity relationship extraction tasks. Niu et al. introduce the GCN2-NAA model to address the complexities of sentences featuring overlapping relational triples in natural language processing [17]. This model integrates a two-stage graph convolutional network with a node-aware attention mechanism. This dual-layer structure is designed to effectively manage and decipher the complex interrelationships and attributes within texts. Zhao et al. address the challenging task of cross-sentence relation extraction in the biomedical field [18]. They employ a hybrid model that combines graph convolutional networks (GCNs) with a multi-head attention mechanism. This model effectively captures structural dependencies and long-distance relationships, thereby improving the accuracy and efficiency of extracting entity relationships from complex text structures.

3. Model

In this study, we introduce an innovative relationship extraction model—RoGraphER. This model combines the RoFormer pre-trained model with a weighted graph convolutional network (WGCN) to extract features from medical texts. By incorporating an entity attention mechanism, it effectively addresses the complex challenge of overlapping entity relationships in Chinese medical texts. The architecture of the RoGraphER model includes the following key layers:
To begin with, the RoFormer layer employs a rotational position encoding technique to enhance the processing of positional information, providing a robust foundation for understanding the structure of the text. Following this, the bidirectional long short-term memory (Bi-LSTM) network captures sequential dependencies and long-range relationships within the text. Next, the weighted graph convolutional network (WGCN) layer constructs a weighted adjacency matrix by calculating syntactic dependency weight coefficients between nodes. This innovative approach allows the layer to extract global semantic features and contextual information between nodes, effectively modeling complex interactions within the text, especially for overlapping entities. Finally, the representations of sentences and entities are refined through an entity attention mechanism and max pooling. This final step effectively addresses the issue of overlapping entity relationships. The RoGraphER model structure is shown in Figure 1.

3.1. RoFormer Pre-Training Models

Transformer-based pre-trained models, such as BERT and its derivatives, are making significant progress in the field of entity relationship extraction. However, when extracting entity relationships from Chinese medical texts, the performance of these models is hindered by insufficient consideration of positional features. To address this issue, RoFormer introduces the rotary position embedding (RoPE) technique [19], which is specifically engineered to enhance the model’s ability to handle long-distance dependencies. RoPE incorporates rotational positional encoding to better capture the relative positions of words, thereby improving the model’s understanding of the context and relationships between entities over long text spans. The RoFormer framework thus offers a new way to represent relative positional information in text sequences, combining rotational position encoding with traditional absolute position encoding within its attention mechanism.
In Chinese medical texts, $E_N = \{x_i\}_{i=1}^{N}$ represents the embedding vectors of the words, where $x_i$ denotes the embedding vector of the $i$-th word before positional information is incorporated. Through specific functions that adjust the query, key, and value vectors, RoFormer enriches these vectors with positional context:

$$q_m = f_q(x_m, m), \qquad k_n = f_k(x_n, n), \qquad v_n = f_v(x_n, n) \tag{1}$$
The attention weights and output vectors are then computed as:

$$a_{m,n} = \frac{\exp\!\left(q_m^{\top} k_n / \sqrt{d}\right)}{\sum_{j=1}^{N} \exp\!\left(q_m^{\top} k_j / \sqrt{d}\right)}, \qquad o_m = \sum_{n=1}^{N} a_{m,n} v_n \tag{2}$$
We assume there exists a function $g$ that defines the inner product $q_m^{\top} k_n$, where $q_m$ and $k_n$ include absolute positional information, thus representing their relative position. By using the functions $f_{(\cdot)}$, we introduce absolute positional information into $q$ and $k$; after the attention operation, the resulting vectors contain relative positional information, establishing how positional information is directly embedded into the model. The inner product $q_m^{\top} k_n$ is expressed in Equation (3):

$$q_m^{\top} k_n = \left\langle f_q(x_m, m),\, f_k(x_n, n) \right\rangle = g(x_m, x_n, m - n) \tag{3}$$
This approach seeks effective functions $f_q$ and $f_k$ that satisfy Equation (3). Under the assumption that positional information is not yet integrated, the initial conditions for the query and key vectors are $f_q(x_m, 0) = q$ and $f_k(x_n, 0) = k$. The complex-valued expression for $f_q(x_m, m)$ is given in Equation (4): $f_q$ is represented as a complex exponential, where $\|x_m\|$ is the magnitude of $x_m$ and $m\theta$ is the phase angle dependent on position $m$. This complex exponential form is a key characteristic of RoPE, encoding positional information through rotation in the complex plane.

$f_q$ can also be represented as a linear transformation using a rotation matrix parameterized by $m\theta$, as shown in Equation (5). This rotation matrix applies a rotational transformation to the input vector components $x_m^{(1)}$ and $x_m^{(2)}$; the matrix multiplication represents a spatial transformation of the input vector.

$$f_q(x_m, m) = R_{f_q}(x_m, m)\, e^{i \Theta_{f_q}(x_m, m)} = \|x_m\|\, e^{i m \theta} \tag{4}$$

$$f_q(x_m, m) = \begin{pmatrix} \cos m\theta & -\sin m\theta \\ \sin m\theta & \cos m\theta \end{pmatrix} \begin{pmatrix} x_m^{(1)} \\ x_m^{(2)} \end{pmatrix} \tag{5}$$
For high-dimensional vectors, the components are grouped into pairs and each pair is rotated independently. The rotation of the full $d$-dimensional vector can be expressed as Equation (6):

$$R_{\Theta,m}^{d}\, x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_{d-1} \\ x_d \end{pmatrix} \otimes \begin{pmatrix} \cos m\theta_1 \\ \cos m\theta_1 \\ \cos m\theta_2 \\ \cos m\theta_2 \\ \vdots \\ \cos m\theta_{d/2} \\ \cos m\theta_{d/2} \end{pmatrix} + \begin{pmatrix} -x_2 \\ x_1 \\ -x_4 \\ x_3 \\ \vdots \\ -x_d \\ x_{d-1} \end{pmatrix} \otimes \begin{pmatrix} \sin m\theta_1 \\ \sin m\theta_1 \\ \sin m\theta_2 \\ \sin m\theta_2 \\ \vdots \\ \sin m\theta_{d/2} \\ \sin m\theta_{d/2} \end{pmatrix} \tag{6}$$
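For concreteness, here is a minimal PyTorch sketch of the rotation in Equation (6); it is not the authors’ released code. The frequency schedule $\theta_i = 10000^{-2i/d}$ follows the original RoFormer paper, and the function name is our own:

```python
import torch

def rotary_position_embedding(x: torch.Tensor) -> torch.Tensor:
    """Apply the RoPE rotation of Equation (6) to a sequence of vectors.

    x: tensor of shape (seq_len, d) with d even; position m is the row index.
    """
    seq_len, d = x.shape
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # m = 0..seq_len-1
    theta = 10000.0 ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # theta_1..theta_{d/2}
    angles = positions * theta                                            # (seq_len, d/2)
    cos = torch.repeat_interleave(torch.cos(angles), 2, dim=1)            # cos(m*theta_i), duplicated per pair
    sin = torch.repeat_interleave(torch.sin(angles), 2, dim=1)
    # Build (-x2, x1, -x4, x3, ...) for the sine term of Equation (6)
    x_rot = torch.stack((-x[:, 1::2], x[:, 0::2]), dim=-1).reshape(seq_len, d)
    return x * cos + x_rot * sin
```

Applying the same rotation to the query and key vectors before the dot-product attention makes their inner product depend only on the relative offset m − n, which is exactly the property stated in Equation (3).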
The RoFormer model’s innovative methodology, which uses rotational matrices to encode relative positions, fundamentally transforms the understanding of contextual relationships and positional information in text processing. It excels particularly in analyzing Chinese medical texts, and this approach surpasses the limitations of traditional positional encoding methods.

3.2. Bi-LSTM

The Bi-LSTM model innovatively circumvents the gradient issues prevalent in traditional RNNs during the processing of extensive sequences by synergizing two LSTM networks. This architecture ensures that both preceding and succeeding contextual information is considered simultaneously at each temporal point, thereby enhancing the model’s understanding of the sequence’s overall context. The representation of context, derived from preceding text segments, is determined by the forward LSTM unit employing a series of equations [20]:
$$z_i = \sigma\!\left(W_z x_i + U_z h_{i-1} + b_z\right) \tag{7}$$
$$o_i = \sigma\!\left(W_o x_i + U_o h_{i-1} + b_o\right) \tag{8}$$
$$f_i = \sigma\!\left(W_f x_i + U_f h_{i-1} + b_f\right) \tag{9}$$
$$\tilde{c}_i = \tanh\!\left(W_c x_i + U_c h_{i-1} + b_c\right) \tag{10}$$
$$c_i = f_i \odot c_{i-1} + z_i \odot \tilde{c}_i \tag{11}$$
$$h_i = o_i \odot \tanh(c_i) \tag{12}$$
In these equations, $z_i$, $o_i$, and $f_i$ correspond to the input, output, and forget gates, respectively; $\tilde{c}_i$ introduces new memory content; $c_i$ encapsulates the contextual memory; and $h_i$ reflects the forward LSTM unit’s output. The weight matrices $W_{(\cdot)}$, $U_{(\cdot)}$ and bias terms $b_{(\cdot)}$ fine-tune the gates’ operations and thereby influence the model’s overall behavior. The final hidden state $h_i$ is derived by concatenating the forward representation $\overrightarrow{h_i}$ and the backward representation $\overleftarrow{h_i}$, offering a comprehensive view that encapsulates both the preceding and succeeding contexts within the sequence:

$$h_i = \left[\overrightarrow{h_i} \oplus \overleftarrow{h_i}\right] \tag{13}$$
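In frameworks such as PyTorch, the concatenation of Equation (13) is handled by the bidirectional flag of the LSTM module. A minimal sketch follows; the hidden dimension of 100 matches Table 2, while the input dimension of 768 is an illustrative assumption for the RoFormer output size:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM encoder over the pre-trained embeddings."""

    def __init__(self, input_dim: int = 768, hidden_dim: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. RoFormer token embeddings.
        # PyTorch concatenates the forward and backward hidden states,
        # yielding h_i = [h_i_forward ; h_i_backward] as in Equation (13).
        out, _ = self.lstm(x)  # (batch, seq_len, 2 * hidden_dim)
        return out
```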

3.3. WGCN

A single-layer graph convolutional network (GCN) only captures first-order neighborhood dependencies, which poses significant challenges in extracting complex entity relationships from Chinese medical texts. While stacking multiple GCN layers can capture deeper feature information, it often results in significant computational burdens and over-smoothing issues [21]. To address these challenges, the weighted graph convolutional network (WGCN) is proposed [22]. In our dataset of Chinese medical texts, the implementation of WGCN enhances the model’s ability to capture a wide range of features within a single layer. By assigning different weights to various edges, WGCN effectively balances the need for detailed neighborhood feature representation while maintaining computational efficiency. This approach ensures that the distinctiveness of features is preserved without overloading the system, allowing for more accurate extraction of complex entity relationships.
Utilizing LTP [23] tools to build sentence dependency trees converts complex structural details into adjacency matrices, creating a clear depiction. The direct links between these dependency tree nodes indicate stronger syntactic relationships, while indirect links suggest weaker associations. As the connections become more indirect or the distance between nodes increases, the strength of these dependency relationships weakens. The weighted graph convolutional network (WGCN) integrates the sentence’s dependency tree T and the sequence length N as inputs, subsequently generating a logical adjacency matrix (LAM) [24].
To construct the LAM, we begin by initializing an $N \times N$ zero matrix, where $N$ is the number of nodes and each element $A_{ij}$ holds the edge weight from node $i$ to node $j$. We then traverse the sentence’s dependency tree $T$, exploring all child nodes $j$ of each node $i$. A crucial step in the construction is calculating the distance between nodes $i$ and $j$. Within the weighted adjacency matrix, element values are derived by combining the similarity of the nodes with their distance, followed by a normalization step:

$$s_{ij} = \frac{x_i \cdot x_j}{\|x_i\| \|x_j\|}, \qquad d_{ij} = \frac{1}{e^{\,d-1}}, \qquad e_{ij} = \alpha \cdot s_{ij} + \beta \cdot d_{ij} \tag{14}$$

where $s_{ij}$ denotes the cosine similarity of the nodes, $d_{ij}$ is the distance decay term (the reciprocal of $e^{\,d-1}$, with $d$ the distance between nodes $i$ and $j$), and $e$ signifies Euler’s number. The connection strengths $e_{ij}$, which fuse cosine similarity and the distance factor into a unified score, are then transformed by a softmax into normalized attention weights so that the weights in each neighborhood sum to one. These weights update the original adjacency matrix as shown below:

$$\widetilde{LAM}_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})} \tag{15}$$
with $N_i$ denoting the neighborhood of node $i$. As depicted in Figure 2, the initially all-zero matrix elements become weighted after the attention calculation, highlighting dependency-related edges (green) and the weights of hypothetical edges (yellow).
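A NumPy sketch of this construction is given below. The mixing coefficients alpha and beta are assumed hyperparameters (the paper does not report their values), and the hop-distance matrix is assumed to be precomputed from the LTP dependency tree:

```python
import numpy as np

def build_lam(node_vecs: np.ndarray, dist: np.ndarray,
              alpha: float = 0.5, beta: float = 0.5) -> np.ndarray:
    """Build the logical adjacency matrix per Equations (14)-(15).

    node_vecs: (N, d) word vectors; dist: (N, N) hop distances in the
    dependency tree (np.inf for pairs outside the neighborhood).
    """
    norms = np.linalg.norm(node_vecs, axis=1, keepdims=True)
    s = (node_vecs @ node_vecs.T) / (norms * norms.T + 1e-9)  # cosine similarity s_ij
    d = 1.0 / np.exp(dist - 1.0)                              # distance decay d_ij = 1 / e^(d-1)
    e = alpha * s + beta * d                                  # connection strength e_ij
    e = np.where(np.isfinite(dist), e, -np.inf)               # keep only neighborhood edges
    e = e - e.max(axis=1, keepdims=True)                      # numerical stability
    exp_e = np.exp(e)
    return exp_e / exp_e.sum(axis=1, keepdims=True)           # row-wise softmax (Eq. 15)
```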
The feature vectors output after processing through the weighted graph convolutional layer are given by:

$$h_i^{(l)} = \sigma\!\left( \sum_{j=1}^{N} \frac{\widetilde{LAM}_{ij}\, h_j^{(l-1)} W^{(l)}}{d_i} + b^{(l)} \right) \tag{16}$$

where $h_i^{(0)}$ is the vector obtained from the Bi-LSTM encoding, $d_i$ represents the degree of node $i$ in the dependency graph, and $b^{(l)}$ is the bias vector.
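A single layer implementing Equation (16) might look as follows; treating σ as ReLU and using the dimensions from Table 2 are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class WGCNLayer(nn.Module):
    """Single weighted graph convolution layer (Equation (16))."""

    def __init__(self, in_dim: int = 200, out_dim: int = 200):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)
        self.bias = nn.Parameter(torch.zeros(out_dim))        # b^(l)

    def forward(self, h: torch.Tensor, lam: torch.Tensor,
                deg: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) Bi-LSTM outputs; lam: (N, N) weighted adjacency
        # matrix; deg: (N,) node degrees in the dependency graph.
        agg = lam @ self.weight(h)                             # sum_j LAM_ij h_j W
        return torch.relu(agg / deg.unsqueeze(1) + self.bias)  # degree-normalize, activate
```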

3.4. Feature Fusion Layer

For the Chinese medical text relation extraction model, the contextual information and entity information of the sentence sequence are the most critical features and play an important role in improving relation extraction performance [25]. The entity attention (EA) mechanism can handle complex medical terminology and patient information, making the extraction of overlapping entity relationships from Chinese medical texts more efficient. As the data sequence $X = \{x_1, x_2, \ldots, x_n\}$, with each element representing a segment of text, passes through the weighted graph convolutional network (WGCN) layer, it is transformed into a series of enriched feature vectors $H^{(l)} = \{h_1^{(l)}, h_2^{(l)}, \ldots, h_N^{(l)}\}$. The weight of each word relative to the entity pair in the sentence sequence is then obtained by Equation (17):

$$a_i = \frac{\exp\!\left(h_i^{(l)} \cdot w_{ent}\right)}{\sum_{j=1}^{N} \exp\!\left(h_j^{(l)} \cdot w_{ent}\right)} \tag{17}$$
where $w_{ent}$ is the entity information feature vector. The weighted sum then yields the feature vector representation of the sentence sequence, as in Equation (18):

$$h = \sum_{i=1}^{N} a_i \cdot h_i \tag{18}$$
This equation represents the aggregation of feature vectors, where $a_i$ is the attention weight assigned to each vector $h_i$.
To further refine the representation, we apply a max pooling operation to the feature vectors. This operation reduces dimensionality by selecting the maximum value from each feature map segment, ensuring that the most relevant features are retained. The pooled vectors are then concatenated to form the final, consolidated feature vector:

$$h_s = \mathrm{maxpool}(h_s) \tag{19}$$
$$h_0 = \mathrm{maxpool}(h_0) \tag{20}$$
$$h = \mathrm{maxpool}(h) \tag{21}$$
$$h_{out} = [h_s;\, h;\, h_0] \tag{22}$$
In this methodology, $h_s$ and $h_0$ are the feature vectors of the head and tail entities, respectively. The max pooling operation extracts the most important features by focusing on the highest values, retaining key information while reducing noise. By concatenating these pooled vectors, we create a comprehensive feature vector $h_{out}$ that integrates important characteristics from different parts of the model. Through max pooling and subsequent concatenation, the EA mechanism adeptly consolidates crucial entity characteristics into a single feature vector.
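The fusion step can be sketched as follows. The entity spans are assumed to be given as index slices, and the sentence-level pooling of Equation (21) is folded into the attention-weighted sum; both are simplifications of this illustration:

```python
import torch
import torch.nn.functional as F

def fuse_features(h: torch.Tensor, w_ent: torch.Tensor,
                  head_span: slice, tail_span: slice) -> torch.Tensor:
    """Entity-attention fusion with max pooling (Equations (17)-(22)).

    h: (N, d) WGCN outputs; w_ent: (d,) entity feature vector;
    head_span / tail_span: token index ranges of the two entities.
    """
    a = F.softmax(h @ w_ent, dim=0)          # per-word attention weights (Eq. 17)
    sent = (a.unsqueeze(1) * h).sum(dim=0)   # weighted sentence vector (Eq. 18)
    h_s = h[head_span].max(dim=0).values     # max-pooled head entity (Eq. 19)
    h_0 = h[tail_span].max(dim=0).values     # max-pooled tail entity (Eq. 20)
    return torch.cat([h_s, sent, h_0])       # h_out = [h_s ; h ; h_0] (Eq. 22)
```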

3.5. Linear Classification Layer

After passing through the feature fusion layer, the vector $h_{out}$ carries both contextual and dependency information. This fused vector is fed into a fully connected feed-forward layer to produce a probability distribution over relationships:

$$P(Y \mid X) = \mathrm{softmax}\!\left(w_f\, h_{out} + b_f\right) \tag{23}$$

In this formula, $Y$ identifies a predefined relationship category, whereas $X$ encapsulates the processed sentence sequence; $P(Y \mid X)$ gives the likelihood of each relationship type holding between the two entities. Using cross-entropy as the loss function, the model applies backpropagation to correct discrepancies between its relationship type predictions and the actual classifications, thus sharpening its entity relationship identification accuracy.
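A minimal sketch of this classification layer is shown below. The input dimension of 600 (three concatenated 200-dimensional vectors) is illustrative, while the 43 output classes follow the CHIP2020 relation set; note that nn.CrossEntropyLoss applies the softmax of Equation (23) internally:

```python
import torch
import torch.nn as nn

classifier = nn.Linear(in_features=600, out_features=43)  # w_f, b_f
loss_fn = nn.CrossEntropyLoss()  # cross-entropy over P(Y|X)

def training_step(h_out: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    logits = classifier(h_out)     # (batch, 43) unnormalized relation scores
    return loss_fn(logits, labels)
```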

4. Experiment Results and Analysis

4.1. Datasets

The dataset used in this research stems from the entity relation extraction task for Chinese medical texts introduced at the China Health Information Processing Conference (CHIP) in 2020. It contains more than 17,000 Chinese medical sentences, more than 50,000 triplets, and 43 types of prespecified relations. The CHIP2020 dataset covers diseases, symptoms, imaging tests, and other medical information, and is divided into a training set and a testing set. We divided the sentences into normal, single-entity overlap (SEO), and entity pair overlap (EPO) categories. The detailed statistics are shown in Table 1.
We evaluated the performance of the model using 5-fold cross-validation. The dataset was divided into five parts, each containing normal, EPO, and SEO sentence types. One part served as the test set while the remaining four served as the training set, and the overall performance was averaged over the five runs.
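This protocol can be sketched with scikit-learn, stratifying on the sentence type so that every fold contains normal, EPO, and SEO sentences; train_and_evaluate is a placeholder for the full RoGraphER training loop:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(samples: list, sentence_types: list, train_and_evaluate) -> float:
    """5-fold cross-validation stratified by sentence type (normal/EPO/SEO)."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    f1_scores = []
    for train_idx, test_idx in skf.split(samples, sentence_types):
        train = [samples[i] for i in train_idx]
        test = [samples[i] for i in test_idx]
        f1_scores.append(train_and_evaluate(train, test))  # returns fold F1
    return float(np.mean(f1_scores))  # average over the five folds
```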

4.2. Evaluation Indicators

In this study, we used three core evaluation metrics: precision, recall, and F1 score (F-Measure) to assess the model’s performance comprehensively. The calculation process was as follows:
$$P = \frac{T_p}{T_p + F_p}, \qquad R = \frac{T_p}{T_p + F_n}, \qquad F1 = \frac{2 \times P \times R}{P + R} \tag{24}$$

In this framework, $T_p$ denotes the number of correct predictions the model makes when identifying entity relationships, $F_p$ the instances where the model incorrectly predicts relationships, and $F_n$ the occasions when the model overlooks existing relationships. The F1 score is the harmonic mean of precision ($P$) and recall ($R$), and directly mirrors the comprehensive performance of the entity relationship extraction model.
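The following helper computes the three metrics from raw counts; as a sanity check, plugging the reported precision of 90.16 and recall of 77.62 into the F1 formula gives 2 × 90.16 × 77.62 / (90.16 + 77.62) ≈ 83.42, matching Table 7:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall, and F1 per Equation (24), guarding against zero division."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```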

4.3. Parameterization

The parameter configurations used during training of the proposed Chinese medical entity relationship extraction model are shown in Table 2.
Key parameters were selected to optimize efficiency and performance. During training, 64 samples were processed per batch for computational efficiency. A single-layer weighted graph convolutional network (WGCN) was used to analyze relationships between entities in depth. To avoid overfitting and improve generalization, a dropout ratio of 0.5 was applied, and training was set to 30 epochs so that the model learns adequately without overfitting. The hidden layer dimension was set to 100 in the LSTM module and 200 in the WGCN. This configuration keeps the model robust and efficient when processing complex data.
To optimize the model’s training process for increased efficiency, it was necessary to determine the optimal number of iterations. To save training time, the model employed an early stopping protocol. This mechanism stops the training when further reductions in the loss value become negligible. Experimental results show that the model’s loss value usually stabilizes after 27 epochs, at which point the model’s performance reaches its peak. This strategy significantly improves training efficiency. Additionally, it ensures peak accuracy without incurring extra computational costs.
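Such a criterion can be sketched as below; the patience and min_delta values are illustrative assumptions, since the paper reports only that the loss typically stabilizes around epoch 27:

```python
class EarlyStopping:
    """Stop training once loss improvements become negligible."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.stale = 0  # epochs without meaningful improvement

    def should_stop(self, loss: float) -> bool:
        if loss < self.best - self.min_delta:
            self.best, self.stale = loss, 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```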
When studying the extraction performance of the RoGraphER model on medical text datasets, the influence of the learning rate on the loss curve was carefully evaluated. Proper calibration of the learning rate is crucial for effective convergence, and selecting the optimal initial learning rate can significantly enhance training efficiency. Conversely, a starting learning rate that is too high can hinder the model’s convergence, while one that is too low unnecessarily prolongs the convergence period.
From Figure 3a, it is clear that the model’s loss values decrease differently under various learning rates. All models show a steady decline in loss values as the number of training epochs increases. However, the model with a learning rate of 4 × 10−5 achieves the slowest convergence, while the learning rate of 7 × 10−5 achieves the fastest convergence. After extending the training to 30 epochs, all models demonstrate significant reductions in loss levels, with the model using a learning rate of 6 × 10−5 stabilizing around the 27th epoch. This comparison indicates that the learning rate of 6 × 10−5 strikes the best balance between training efficiency and convergence speed. From Figure 3b, it is clear that when the learning rate is set to 6 × 10−5, the model’s performance metrics are well balanced and the F1 score is relatively high. In summary, the learning rate of 6 × 10−5 provides a balanced performance across different metrics, ensuring the best balance in training efficiency, convergence speed, precision, recall, and F1 score.

4.4. Ablation Experiment Results and Analysis

4.4.1. Superiority of the RoFormer Model

The RoFormer model enhances the transformer architecture by incorporating rotary positional embedding (RoPE), which significantly improves the accuracy of capturing semantic features. This integration is particularly effective for extracting entity relationships in Chinese medical texts. The RoFormer model adeptly identifies subtle linguistic nuances, making it crucial for accurately extracting relationships in complex medical contexts. Table 3 details the RoFormer model’s performance compared to two other pre-trained models.
As shown in Table 3, BERT uses character-level tokens, while WoBERT and RoFormer use word-level tokens. Character-level tokens have advantages in certain fine-grained text analysis tasks, but for Chinese medical texts, word-level tokens better capture semantic and contextual relationships. Additionally, RoFormer employs rotary positional embedding (RoPE), which not only captures relative positional relationships but also excels at handling long-range dependencies and complex contexts, allowing the model to maintain stable performance across different sequence lengths and complexities.
To evaluate the context-learning capabilities of the RoFormer model, comparative pre-training experiments were conducted, including the RoFormer, BERT, and WoBERT models. Each model employs unique position encoding techniques, with the RoFormer model utilizing rotational position encoding. Throughout these experiments, the masked language model (MLM) loss value was used as the primary metric to gauge training efficiency and effectiveness. The experimental results are presented in Figure 4.
From the analysis of Figure 4, it is evident that the RoFormer model, with its rotary positional embedding (RoPE), significantly improves the learning efficiency and effectiveness of contextual representation compared to BERT and WoBERT. Throughout the entire training process, the RoFormer model achieves the lowest and most stable MLM loss values, indicating its superior performance in context learning and robustness in capturing complex dependencies within the data. This performance underscores the effectiveness of the RoFormer model in extracting entity relationships in complex Chinese medical texts.
To validate RoFormer’s capability in handling polysemy, we conducted experiments using the CHIP2020 dataset, which includes numerous instances of polysemous words used in different medical contexts. For example, the term ‘fever’ can represent different symptoms or diseases depending on the context. We compared RoFormer with BERT and WoBERT to demonstrate its effectiveness in addressing polysemy. As shown in Table 4, RoFormer outperforms both BERT and WoBERT, demonstrating its superior capability in disambiguating polysemous words.
RoFormer demonstrates clear superiority in handling polysemy due to its rotary position encoding (RoPE) mechanism, as evidenced by Table 4: its F1 score is 2.15 points higher than WoBERT’s and 7.59 points higher than BERT’s.
BERT uses absolute position encoding, which inadequately captures word order and positional relationships. In contrast, RoPE in RoFormer encodes positions in a continuously rotational manner, helping the model dynamically capture and represent intricate positional relationships. This is essential for understanding polysemous words in different contexts.
WoBERT, with its relative position encoding, improves over BERT by better capturing contextual relationships. However, it still falls short of RoFormer’s RoPE, which enables the model to focus precisely on the parts of the context that influence the meanings of polysemous words. This mechanism not only enhances the model’s ability to understand subtle contextual nuances but also significantly improves its accuracy in polysemy disambiguation.
By integrating RoPE, RoFormer effectively captures the complexities of word relationships and meanings. This allows it to outperform both WoBERT and BERT in scenarios involving polysemy.
In summary, RoFormer’s rotary position embedding and improved attention mechanism enable it to more effectively capture and utilize contextual information. This allows RoFormer to accurately distinguish between different meanings of words based on their context, thus addressing the challenge of polysemy.

4.4.2. Impact of the Logic Matrix LAM on the Model

This study introduces the logical adjacency matrix (LAM) as an effective tool. LAM identifies k-order neighborhood information through a single-layer weighted graph convolutional network (WGCN), eliminating the need for additional layers or increased parameters. To evaluate the effectiveness of LAM in extracting entity relationships, a comparative study was conducted between a single-layer WGCN and multi-layer GCNs. The experimental results are presented in Figure 5.
From Figure 5, it is evident that the single-layer WGCN combined with the use of LAM shows significant advantages in extracting entity relationships compared to multi-layer GCNs. The single-layer WGCN not only learns quickly in the initial stages but also maintains lower training loss throughout the training process. This demonstrates that LAM effectively captures k-order neighborhood information in a single-layer structure, improving the model’s training efficiency and accuracy.
In order to comprehensively evaluate the performance of graph convolutional networks (GCNs) with different layers and the introduction of a weighting mechanism (weighted GCN), we conducted detailed experiments and compared the performance of each model in the entity relationship extraction task. Table 5 provides a detailed comparison of the performance of several models on the CHIP2020 dataset.
From the data analysis in Table 5, it is evident that the proposed weighted graph convolutional network (GCN) consistently improves the F1 score by 0.67 compared to the original GCN. Additionally, it outperforms the GCN models with multiple stacked layers, demonstrating superior performance across various configurations.
Overall, LAM is an efficient tool that effectively identifies and utilizes complex neighborhood information through a single-layer WGCN. It accomplishes this without increasing model complexity or parameter count, significantly enhancing the performance of entity relationship extraction.

4.4.3. Effect of Entity Attention Mechanisms on Experiments

To validate the effectiveness of the entity attention mechanism in addressing overlapping entity relationships, two distinct experiments were conducted on the CHIP2020 dataset. The outcomes are summarized in Table 6.
Incorporating an entity attention mechanism into the feature fusion layer enhances the model’s capability to weigh the significance of each entity’s influence within a sentence. By assigning weights based on the extent of influence, this method shows notable efficiency in datasets featuring overlapping entity relationships. To verify the ability of entity attention to alleviate the problem of overlapping entity triples, experiments were conducted on three types of sentences, and their performance was compared with models without entity attention. The specific experimental results are shown in Figure 6.
Figure 6 presents the comparison between the RoGraphER model and the baseline model across the three sentence types. The results clearly show that RoGraphER outperforms the baseline in all categories: normal, EPO, and SEO. Specifically, the F1 scores increased by 2.03, 2.21, and 1.99 points, respectively, compared to the baseline model without entity attention. This consistent improvement, particularly in the EPO category, highlights the effectiveness of entity attention on datasets with overlapping entity relationships.

4.5. Comparison of Experimental Results and Analysis

To ascertain the efficacy of the RoFormer-based pre-trained model combined with a weighted graph convolutional network, as proposed in this article, for the Chinese medical entity relationship dataset, comparative analyses were conducted against CopyRE [26], MGCN [27], GraphJERE [28], and SpanBioER [29].
Table 7 provides a comparative analysis of various models. It is evident from Table 7 that the RoGraphER model outperforms other entity relationship extraction models.
CopyRE: CopyRE introduces a copying mechanism into Seq2Seq learning. Although it innovates in sequence learning, it fails to account for positional information, leading to a suboptimal performance in handling complex dependencies and overlapping entity relationships. This makes it a benchmark model to demonstrate our model’s improvements in handling complex relationships.
MGCN: MGCN uses graph convolutional network technology, effectively capturing both local and global features of graph-structured data. MGCN was chosen to showcase the advantages of our model in utilizing graph convolutional network techniques.
GraphJERE: GraphJERE extracts detailed features from character sequences using convolutional networks, excelling in identifying local patterns and interactions within the text. It was selected to compare our model’s improvements in handling long-range dependencies and contextual information.
SpanBioER: SpanBioER excels in extracting fine-grained features at the character level using convolutional networks. It efficiently processes and identifies detailed and nuanced patterns. SpanBioER was chosen to demonstrate our model’s advantages in capturing full-context and long-distance relationships.
These models represent the latest and most effective methods in the biomedical entity relationship extraction task. Comparing our model with these models provides a comprehensive demonstration of the improvements and advantages of our model in this specific task.
The comparative analysis results in Table 7 show that the RoGraphER model outperforms the other entity relationship extraction models on all performance metrics. RoGraphER enhances its ability to capture and utilize contextual information by incorporating the RoPE mechanism to exploit word positional information, thereby more accurately resolving the ambiguity of polysemous words. By employing WGCN, the model deepens its understanding of the textual structure by aggregating data from adjacent nodes, capturing complex dependencies without increasing the model’s parameter burden and thereby maintaining computational efficiency. During the feature fusion phase, the integration of the EA mechanism facilitates a deep integration of features. This fusion strategy enhances the model’s semantic understanding and significantly improves its ability to manage complex structural data, which is crucial for effectively addressing overlapping entity relationships.
To provide a comprehensive evaluation of our model, we present the performance metrics from the 5-fold cross-validation experiment. Table 8 shows the precision (P), recall (R), F1 score (F1), and average accuracy (Ave-acc) for each fold.
The results of the 5-fold cross-validation experiment show that our model performs consistently across different subsets of the data. The cross-validation results demonstrate the robustness and reliability of our model across different data subsets. By partitioning the data into multiple folds, we ensured that the model’s performance was not dependent on any specific portion of the data. This method provides a more comprehensive evaluation and highlights the model’s ability to generalize well to unseen data.
To evaluate the performance stability of RoGraphER and the other four comparison models in the entity relation extraction task, we used an ANOVA test to determine the statistical significance of the results. We used the F1 score, which takes into account both precision and recall, for the evaluation. The experimental results are shown in Table 9.
The significance test results in Table 9 show that the p-values of the RoGraphER model are well below 0.05, indicating statistically significant differences between the groups. Compared with the other four models, RoGraphER’s lower p-values and consistently higher F1 scores across different batch sizes underscore its superior performance. The model performs consistently and effectively across experiments, demonstrating its stability and reliability in extracting overlapping entity relationships.
To evaluate the performance of the RoGraphER model on specific diseases, we created subsets of the CHIP2020 dataset focused on several rare and complex diseases (listed in Table 10). These subsets include annotated texts specifically mentioning the diseases, symptoms, treatments, and related genetic information. The experimental results are shown in Table 10.
From Table 10, it can be seen that the RoGraphER model does not perform well in extracting specific entity relations related to certain diseases. The primary reason is that the RoGraphER model was not specifically designed to integrate domain-specific medical knowledge bases or ontologies. The lack of specialized domain knowledge limits the model’s ability to handle complex and rare medical relationships. Additionally, during training, the RoGraphER model may not have been exposed to a sufficiently diverse range of medical texts, particularly those involving rare diseases and complex relationships. This limits the model’s generalization ability when faced with these special cases.
To address these issues, we plan to integrate professional medical knowledge bases and ontologies (such as oncology and genetics) into the model to enhance its ability to recognize entity relationships in specific diseases. Additionally, we will utilize transfer learning methods, first pre-training on general medical data and then fine-tuning on specific disease data, to improve the model’s performance in specialized fields.

5. Conclusions and Future Prospects

This paper presented RoGraphER, an entity relationship extraction model for Chinese medical texts that combines RoFormer pre-training, a weighted graph convolutional network, and an entity attention mechanism. The RoPE mechanism exploits word positional information to resolve the ambiguity of polysemous words; the WGCN aggregates information from adjacent nodes to capture complex dependencies without increasing the parameter burden; and the EA-based feature fusion improves the handling of overlapping entity relationships, yielding an F1 score of 83.42 on the CHIP2020 dataset.
While the model represents a significant advancement in managing the complexities of overlapping entity relationships, its performance degrades when extracting entity relationships for certain rare diseases in the CHIP2020 dataset. The primary reason is that RoGraphER is not specifically designed to integrate domain-specific medical knowledge bases or ontologies, which limits its ability to handle complex and rare medical relationships. Additionally, during training, the model may not have been exposed to a sufficiently diverse range of medical texts, particularly those involving rare diseases and complex relationships. This lack of specialized domain knowledge and diverse training data limits the model’s generalization ability in these special cases.
Furthermore, the model may underperform in environments that significantly deviate from the training data. When dealing with texts from entirely different fields or containing rare entity relationships, the model’s performance may decline. This decline is attributed to sparse datasets or rare entity relationships. The model does not sufficiently learn these during training, making it difficult to predict and recognize entity relationships in new environments. Significant deviations between the training data and the test data also contribute to a decline in performance. Additionally, inadequate representation of terms and relationships in the training dataset further impacts the model’s effectiveness.
To address these issues, we plan to integrate professional medical knowledge bases and ontologies (such as oncology and genetics) into the model. This will enhance the model’s ability to recognize entity relationships in specific diseases. Additionally, we will use transfer learning methods. First, we will pre-train on general medical data, and then fine-tune on specific disease data. By integrating specialized medical knowledge and transfer learning techniques, we aim to improve the model’s generalization ability in different medical contexts and its performance in recognizing complex and certain entity relationships.

Author Contributions

Conceptualization, Q.Z.; methodology, Q.Z.; software, Y.S.; validation, Q.Z., Y.S. and J.W. (Jinhui Wang); formal analysis, Q.Z.; investigation, Q.Z.; resources, C.W.; data curation, M.Z.; writing—original draft preparation, Q.Z. and Y.S.; writing—review and editing, P.L.; visualization, J.W. (Jingping Wang); supervision, P.L.; project administration, L.L.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (62073123), Science & Technology Research Project of Henan Province (232102210081), Henan University of Technology high-level talents Scientific Research start-up Fund Project (31401356), Natural Science Foundation of Henan (242300421708), Development and Promotion Project of Henan Province (Grant No. 242102211002), and the High-Level Talent Research Start-up Fund Project of Henan University of Technology (2023BS040).

Data Availability Statement

Datasets are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qiu, G.; Gao, H.; Wu, T. Advances in knowledge graph research. Intell. Eng. 2017, 3, 4–25.
  2. Zhang, T.; Lin, H.; Tadesse, M.M.; Ren, Y.; Duan, X.; Xu, B. Chinese medical relation extraction based on multi-hop self-attention mechanism. Int. J. Mach. Learn. Cybern. 2020, 12, 355–363.
  3. Chen, T.; Wu, M.; Li, H. A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning. Database J. Biol. Databases Curation 2019, 2019, baz116.
  4. E, H.; Zhang, W.; Xiao, S.; Chen, R.; Hu, Y.; Zhou, X.; Niu, P. Survey of entity relationship extraction based on deep learning. J. Softw. 2019, 30, 1793–1818.
  5. He, B.; Guan, Y.; Dai, R. Classifying medical relations in clinical text via convolutional neural networks. Artif. Intell. Med. 2019, 93, 43–49.
  6. Xu, Y.; Jia, R.; Mou, L.; Li, G.; Chen, Y.; Lu, Y.; Jin, Z. Improved relation classification by deep recurrent neural networks with data augmentation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 1461–1470.
  7. Luo, Y. Recurrent neural networks for classifying relations in clinical notes. J. Biomed. Inform. 2017, 72, 85–95.
  8. Zhu, Y.; Li, L.; Lu, H.; Zhou, A.; Qin, X. Extracting drug–drug interactions from texts with BioBERT and multiple entity-aware attentions. J. Biomed. Inform. 2020, 106, 103451.
  9. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
  10. Sun, C.; Yang, Z.; Su, L.; Wang, L.; Zhang, Y.; Lin, H.; Wang, J. Chemical–protein interaction extraction via Gaussian probability distribution and external biomedical knowledge. Bioinformatics 2020, 36, 4323–4330.
  11. Yang, Y.; Du, J.; Nie, B.; Luo, J.; He, J. Fusion of data augmentation and attention mechanism for joint extraction of entities and relations in Chinese medicine. Intell. Comput. Appl. 2023, 13, 186–191.
  12. Yao, J.; Wang, C. A medical entity relationship extraction model based on ERNIE-Bi-GRU-Attention. Inf. Technol. Informatiz. 2024, 002, 208–212.
  13. Hong, Y.; Liu, Y.; Yang, S.; Zhang, K.; Wen, A.; Hu, J. Improving graph convolutional networks based on relation-aware attention for end-to-end relation extraction. IEEE Access 2020, 8, 51315–51323.
  14. Heng, H.; Miao, J. Joint extraction of entity relations by fusing semantic and syntactic graph neural networks. Comput. Sci. 2023, 50, 295–302.
  15. Niu, W.; Chen, Q.; Zhang, W.; Ma, J.; Hu, Z. GCN2-NAA: Two-stage graph convolutional networks with node-aware attention for joint entity and relation extraction. In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, Shenzhen, China, 26 February–1 March 2021.
  16. Zhao, D.; Wang, J.; Lin, H.; Wang, X.; Yang, Z.; Zhang, Y. Biomedical cross-sentence relation extraction via multihead attention and graph convolutional networks. Appl. Soft Comput. 2021, 104, 107230.
  17. Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing 2024, 568, 127063.
  18. Geng, B.; Liang, C.; Wei, W.; Zhu, C. Deep learning-based knowledge extraction from unstructured medical texts. Comput. Eng. Des. 2024, 45, 177–186.
  19. Zhang, L.; Duan, Y.; Liu, J.; Lu, Y. Chinese geologic entity relationship extraction based on RoBERTa and weighted graph convolutional network. Comput. Sci. 2024, 1–11.
  20. Tang, M.; Li, T.; Wang, W.; Zhu, R.; Ma, Z.; Tang, Y. Software knowledge entity relation extraction with entity-aware and syntactic dependency structure information. Sci. Program. 2021, 2021, 7466114.
  21. Zhou, L.; Wang, T.; Qu, H.; Huang, L.; Liu, Y. A weighted GCN with logical adjacency matrix for relation extraction. In ECAI 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2314–2321.
  22. Zeng, X.; Zeng, D.; He, S.; Liu, K.; Zhao, J. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 506–514.
  23. Wang, Y.; Liu, Y.; Zhang, J. MGCN: Medical relation extraction based on GCN. Comput. Inform. 2023, 42, 411–435.
  24. Pang, Y.; Zhou, T.; Zhang, Z. A joint model for Chinese medical entity and relation extraction based on graph convolutional networks. In Proceedings of the 2021 3rd International Conference on Natural Language Processing (ICNLP), Beijing, China, 26–28 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 119–124.
  25. Fei, H.; Zhang, Y.; Ren, Y.; Ji, D. A span-graph neural model for overlapping entity relation extraction in biomedical texts. Bioinformatics 2021, 37, 1581–1589.
  26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  27. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced representation through knowledge integration. arXiv 2019, arXiv:1904.09223.
  28. Che, W.; Li, Z.; Liu, T. LTP: A Chinese language technology platform. In Proceedings of Coling 2010: Demonstrations, Beijing, China, 23–27 August 2010.
  29. Peng, H.; Gao, T.; Han, X.; Lin, Y.; Li, P.; Liu, Z.; Sun, M.; Zhou, J. Learning from context or names? An empirical study on neural relation extraction. arXiv 2020, arXiv:2010.01923.
Figure 1. Structure of the RoGraphER model.

Figure 2. Weighted adjacency matrix. (a) Initially, all matrix elements are zero. (b) The weighted adjacency matrix constructed by calculating the syntactic dependency weight coefficients between nodes; edges related to dependency relationships are shown in green, and edges representing assumed weights are shown in yellow.

Figure 3. Losses of the model at different learning rates. (a) Model loss across different learning rates; the model performs best at a learning rate of 6 × 10−5, stabilizing around the 27th epoch. (b) Precision (P), recall (R), and F1 score (F1) at four different learning rates; the model performs best at 6 × 10−5, with precision, recall, and F1 of 90.16, 77.62, and 83.42, respectively.

Figure 4. Comparison of different pre-trained models.

Figure 5. Impact of LAM on the model.

Figure 6. Performance with and without EA modeling.
Table 1. Statistics of the CHIP2020 dataset.

| | Sentences | Triplets | Relations | Normal | EPO | SEO |
|---|---|---|---|---|---|---|
| Train | 14,339 | 43,660 | 43 | 5724 | 6937 | 1678 |
| Test | 3585 | 10,626 | 43 | 1496 | 1655 | 434 |
Table 2. Parameter settings of the model.

| Parameter Name | Parameter Value | Parameter Name | Parameter Value |
|---|---|---|---|
| Batch size | 64 | WGCN hidden dim | 200 |
| Epoch | 30 | Dropout | 0.5 |
| LSTM hidden dim | 100 | Learning rate | 6 × 10−5 |
| WGCN layers | 1 | Optimizer | Adam |
Table 3. Comparison of the structure of different pre-trained models.

| Pre-Trained Model | BERT | WoBERT | RoFormer |
|---|---|---|---|
| Token unit | Char | Word | Word |
| Position code | Absolute position | Relative position | Rotary position |
Table 4. RoFormer outperforms both BERT and WoBERT.

| Model | P | R | F1 |
|---|---|---|---|
| BERT | 79.78 | 68.68 | 73.82 |
| WoBERT | 85.66 | 73.74 | 79.26 |
| RoFormer | 87.98 | 75.69 | 81.41 |
Table 5. Settings for GCN and weighted GCN models with different layers (√ = component used, × = not used).

| Model | RoFormer | WGCN | EA | F1 Value |
|---|---|---|---|---|
| RoFormer + 1GCN + EA | √ | × | √ | 81.85 |
| RoFormer + 2GCN + EA | √ | × | √ | 82.63 |
| RoFormer + 3GCN + EA | √ | × | √ | 82.75 |
| RoFormer + 4GCN + EA | √ | × | √ | 82.58 |
| RoFormer + WGCN + EA | √ | √ | √ | 83.42 |
Table 6. Effect of entity attention on experimental results (√ = component used, × = not used).

| Model | RoFormer | WGCN | EA | F1 Value |
|---|---|---|---|---|
| RoFormer + WGCN | √ | √ | × | 81.07 |
| RoFormer + WGCN + EA | √ | √ | √ | 83.42 |
Table 7. Comparison of experimental results for different models.

| Method | P | R | F1 |
|---|---|---|---|
| CopyRE | 72.88 | 56.71 | 63.78 |
| MGCN | 84.14 | 75.19 | 76.68 |
| SpanBioER | 85.16 | 73.32 | 78.79 |
| GraphJERE | 87.28 | 75.12 | 80.79 |
| RoGraphER | 90.16 | 77.62 | 83.42 |
Table 8. Performance metrics from 5-fold cross-validation.

| Fold | P | R | F1 | Ave-Acc |
|---|---|---|---|---|
| 1 | 90.12 | 77.34 | 83.78 | 82.45 |
| 2 | 89.78 | 77.89 | 83.31 | 82.12 |
| 3 | 90.45 | 77.56 | 83.12 | 82.67 |
| 4 | 90.78 | 77.12 | 83.56 | 83.01 |
| 5 | 89.99 | 77.01 | 83.67 | 82.34 |
| Average | 90.16 | 77.62 | 83.42 | 82.52 |
Table 9. Significance test data.

| Model | F1 (Batch = 32) | p-Value | F1 (Batch = 64) | p-Value | F1 (Batch = 128) | p-Value |
|---|---|---|---|---|---|---|
| CopyRE | 63.47 | 7.32 × 10−6 | 63.78 | 8.54 × 10−7 | 64.29 | 9.13 × 10−6 |
| MGCN | 76.35 | 2.12 × 10−6 | 76.68 | 5.19 × 10−7 | 76.15 | 4.04 × 10−7 |
| GraphJERE | 80.15 | 7.42 × 10−8 | 80.79 | 2.56 × 10−11 | 80.45 | 5.15 × 10−9 |
| SpanBioER | 78.35 | 5.32 × 10−9 | 78.79 | 2.35 × 10−8 | 77.83 | 5.27 × 10−8 |
| RoGraphER | 83.10 | 6.72 × 10−9 | 83.42 | 3.14 × 10−12 | 83.32 | 2.87 × 10−10 |
Table 10. Performance of certain disease entity relationship extraction.

| Disease | P | R | F1 |
|---|---|---|---|
| Fabry Disease | 83.15 | 72.98 | 77.69 |
| Hereditary Breast Cancer | 75.95 | 65.39 | 70.28 |
| Systemic Lupus Erythematosus | 75.34 | 64.86 | 69.71 |
| Huntington’s Disease | 79.14 | 69.76 | 74.23 |
| Alzheimer’s Disease | 82.79 | 72.66 | 77.35 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


