Med-Tree: A Medical Ontology Tree Combined with the Graph Attention Networks for Medication Recommendation

Yue, Weiqi; Zhang, Lijuan; Zhang, Lei; Huang, Jie; Wan, Jian; Xiong, Naixue

doi:10.3390/electronics11213558


by Weiqi Yue ¹, Lijuan Zhang ¹,*, Lei Zhang ¹,², Jie Huang ¹, Jian Wan ¹,* and Naixue Xiong ³

¹ School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
² School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
³ Department of Computer Science and Mathematics, Sul Ross State University, Alpine, TX 79830, USA
* Authors to whom correspondence should be addressed.

Electronics 2022, 11(21), 3558; https://doi.org/10.3390/electronics11213558

Submission received: 17 September 2022 / Revised: 28 October 2022 / Accepted: 28 October 2022 / Published: 31 October 2022

(This article belongs to the Section Artificial Intelligence)


Abstract:

Medication recommendation based on Electronic Health Records (EHRs) is a significant research direction in intelligent medicine; it aims to recommend personalized medication combinations for patients based on their historical and current physical conditions. However, because the structural and temporal characteristics of medical records are affected by many uncertain factors, EHR-based medication recommendation methods face many limitations. Specifically, most existing works either fail to adequately capture the structural correlations and temporal dependencies among medical entities or ignore existing knowledge of Drug–Drug Interactions (DDIs), which can lead to adverse outcomes. Both shortcomings degrade recommendation quality. We therefore propose a medical ontology tree model combined with Graph Attention Networks (GAT) for medication recommendation. First, the class hierarchy extracted from the medical ontology and the GAT model are used to learn representations of the ICD-9 diagnosis and procedure codes, which enriches the semantic representation of medical entities. Second, Gated Recurrent Units (GRU) are used to learn the temporal characteristics of medical entities. Finally, a memory bank, a dynamic memory and a DDI graph are used to refine the hidden-layer representations, which improves the accuracy of the model. Experimental results show that the proposed model outperforms previous methods on all evaluation metrics, and its recommendations have a lower DDI rate.

Keywords:

medication recommendation; electronic health records; drug–drug interactions; graph attention networks; memory bank

1. Introduction

Electronic Health Records (EHRs) record the entire process of a patient's diagnosis and treatment; they can greatly improve the efficiency and quality of care and provide a smoother patient experience [1,2,3,4]. As medical informatization deepens, research efforts have been devoted to tasks such as medication recommendation and medical question answering [5,6,7]. Figure 1 illustrates how a medication recommendation system works within the diagnosis and treatment process: it simplifies the workflow and assists doctors in formulating safe and effective prescriptions. The goal of medication recommendation is to recommend personalized medication combinations for patients based on their current medical records and historical health conditions. Early medication recommendation, however, relied mostly on the prior knowledge of experts with rich clinical experience. In recent years, continuous improvements in wireless sensor structures have made it possible to collect EHRs at scale [8,9]. Meanwhile, many deep learning models have been applied to medication recommendation, significantly improving recommendation accuracy and practical feasibility [10,11,12,13]. However, even though EHRs are cleaned and organized during preprocessing, many uncertainties remain, including missing information, inconsistencies between sources, and highly subjective or imprecise medical histories. Among them, obscure structural and temporal features and Drug–Drug Interactions (DDIs) [14,15] are two critical factors that make the subsequent medication recommendation task difficult:

(1): Obscure structural and temporal features: EHRs can be considered a continuous collection of medical entities consisting of diagnoses, procedures and medications, with obscure structural and temporal correlations among these entities. For example, a peptic ulcer may cause gastric perforation, chickenpox may cause pneumonia, and cerebral infarction often causes high blood pressure. Such hidden medical knowledge affects the accuracy of medication recommendation.
(2): DDIs: two or more drugs taken at the same time or in sequence may produce a compound effect, which can enhance or reduce the drugs’ efficacy and reduce or aggravate their side effects. For example, combining a cholinesterase reactivator with atropine sulfate produces complementary effects, which reduces the required dose of atropine and its adverse effects. The antimalarial drug artemisinin is prone to resistance when used alone, whereas combining it with sulfamethoxine and pyrimethamine can delay resistance.

To enhance the learning of obscure structural and temporal features and reduce the DDI rate of recommendation results, early studies used k-means clustering [16], association rule mining [17] and expert systems [18]. With the development of deep learning, Graph Neural Networks (GNN) have been introduced into medication recommendation. Studies [19,20,21] use Graph Convolutional Networks (GCN) to capture the structural and temporal features among medical events, which improves recommendation performance; however, they ignore the influence of DDIs on the recommendation results. Conversely, some models [22,23,24] account for DDIs during training, but their ability to model the structural and temporal properties of EHRs is limited. Therefore, to simultaneously learn these hidden structural and temporal features and reduce DDIs in the recommendation results, we propose a medical ontology tree model (Med-tree) combined with Graph Attention Networks (GAT) for medication recommendation. Specifically, Med-tree applies the class hierarchy extracted from the medical ontology and the GAT model to learn representations of diagnoses and procedures, and it builds a DDI graph to reduce the DDI rate. As a result, Med-tree significantly improves recommendation quality.

The contributions of our work can be summarized as follows:

(1): We treat EHRs as continuous records with structural and temporal properties and propose a medical ontology tree combined with GAT. The proposed model uses the medical hierarchy to model the structural features of medical entities and applies two GRU models with attention mechanisms to capture the temporal characteristics of EHRs.
(2): We model a medical knowledge base and construct a drug interaction graph, which is embedded into the memory component through a memory bank and a dynamic memory. A query generator then performs attention-based memory retrieval, which effectively reduces the DDI rate.
(3): The proposed model is evaluated on the MIMIC-III datasets and outperforms all baselines in terms of the Jaccard Similarity Score, Average F1 score and Precision–Recall AUC. In addition, its recommended medication combinations have a lower DDI rate than those of previous recommendation models.

The rest of the paper is organized as follows: Section 2 introduces related work, and Section 3 describes the framework of the proposed model. In Section 4, the predictive performance of the proposed model is compared with baselines on the MIMIC-III datasets, and several analyses are presented. Finally, Section 5 presents the conclusions and future research directions.

2. Related Work

2.1. Graph Attention Networks

Attention mechanisms have been successfully applied to many sequence-based tasks, such as Machine Translation (MT) and Natural Language Understanding (NLU) [25,26]. Unlike Graph Convolutional Networks (GCN), which treat all neighbor nodes equally, Graph Attention Networks (GAT) use an attention mechanism to assign a different weight to each neighbor node and thereby identify the relatively important ones. Incorporating attention into the GCN also makes the propagation step more intuitive, and GAT can be regarded as a member of the GCN family.

The GAT model has been applied in many fields. For example, Fang et al. [27] proposed L-GAT, a traffic network speed prediction model that captures both the spatial characteristics and the temporal dynamics of the traffic network. Based on GAT, Cai et al. [28] proposed DQ-GAT, an unsupervised model for scalable and proactive autonomous driving that provides a better trade-off between safety and efficiency in both seen and unseen scenarios. Qin et al. [29] proposed a Co-Interactive Graph Attention Network (Co-GAT) that connects dialog act recognition and sentiment classification to capture speakers’ intentions. In the medical field, studies [22,30] applied GAT to model the internal relationships between medical events and achieved significant progress.

Like the GCN model, the GAT model computes over the local neighborhood of each node rather than over the whole graph, which improves computational efficiency and reduces memory usage. Unlike the GCN model, however, the GAT model assigns different weights to different neighbor nodes according to their importance to the current node, which lets it handle structural information better; GAT is therefore widely used in social prediction, drug discovery and recommendation systems.

2.2. Medication Recommendation

Medication recommendation is one of the significant research directions in intelligent medicine: it can assist doctors in making safe and effective prescriptions and is of great significance for drug synergism and safety. Owing to the lack of available datasets and the difficulty of sharing information in early medical research [31,32], early medication recommendation methods were mainly based on expert prior rules and focused on the associations and causal relationships between diagnoses, procedures and drug combinations. Specifically, Chen et al. [33] described a physician-advisory system for Chronic Heart Failure (CHF) management that encoded the entire set of clinical practice guidelines using answer-set programming and provided patients with medical information as a human physician would. Slavescu et al. [34] presented a rule-based system that assists doctors in routine tasks by suggesting a diagnosis and recommending treatment based on a patient's medical history and current symptoms. Ajmi et al. [35] proposed an expert system that recommends medication combinations according to where the patient lives and the patient's symptoms. Although these rule-based methods can recommend drugs for patients, their recommendation accuracy is limited.

As the security of information collection and sharing has improved [36,37], medical data have accumulated widely, and deep learning methods have gradually become the mainstream in medication recommendation. These approaches learn the relationships between medical entities from the statistical regularities of EHRs and apply these relationships to medical consultation and medication recommendation. Specifically, Wang et al. [23] modeled temporal information from medical records to obtain patient representations and built a key-value memory network to recommend medications. Wang et al. [38] derived a target distribution associated with safe medication combinations from raw patient records, which shapes the distribution of patient representations and reduces the DDI rate of the recommendations. Shang et al. [22] pretrained the relationships between drugs in advance and constructed a knowledge graph. Moreover, studies [39,40] used the GCN model to learn the relationships between medical events and built a medical relationship tree. Although these methods are more accurate than earlier ones, they still have limitations, such as high complexity and a large number of training parameters.

For these reasons, we propose a medical ontology tree model (Med-tree) that simultaneously captures the complex correlations and temporal features in EHRs and effectively reduces the DDI rate on real medical datasets.

3. The Proposed Model

In this section, the proposed model is described in detail in three parts. First, the data structures and the medication recommendation task are defined. Next, the framework of the proposed model is described. Finally, the optimization of Med-tree and its training algorithm are presented.

3.1. Problem Formulation

Medication recommendation models based on EHRs must be trained on accurate datasets, and to improve recommendation accuracy, the EHRs need to be standardized and preprocessed. The following parts define the standardized EHRs, the medical ontology tree constructed during preprocessing, the EHR graph and the DDI graph, and the medication recommendation task. The notations used in the proposed model are listed in Table 1.

3.1.1. Definition of the Standardized EHRs

To improve the accuracy of medication recommendation, the EHRs are standardized into ICD-9 diagnosis codes, ICD-9 procedure codes and ATC medication codes, as shown in Figure 2. The standardized EHRs can then be represented as a sequence of multivariate observations:

X^{n} = [x_{1}^{1}, x_{2}^{1}, x_{3}^{1}, x_{1}^{2}, \dots, x_{t}^{n}],

where $n \in [1, N]$, $t \in [1, T]$, $N$ is the total number of patients, and $T$ is the maximum number of visits of one patient. To avoid confusion and ambiguity, the superscript $n$ is omitted below, and the training process of Med-tree is described for a single patient. The $t$th visit $x_{t} = \{c_{d}^{t}, c_{p}^{t}, c_{m}^{t}\}$ of a patient contains diagnosis codes $c_{d}^{t}$, procedure codes $c_{p}^{t}$ and medication codes $c_{m}^{t}$.
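As an illustration only, the standardized record of a single patient could be held as a list of visits, each storing the three code sets $c_{d}^{t}$, $c_{p}^{t}$ and $c_{m}^{t}$. The sketch below assumes this layout; the `Visit` class and the example codes are hypothetical and not taken from the paper's preprocessing.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Visit:
    """One admission x_t = {c_d^t, c_p^t, c_m^t} (hypothetical container)."""
    diagnoses: Set[str] = field(default_factory=set)    # ICD-9 diagnosis codes c_d^t
    procedures: Set[str] = field(default_factory=set)   # ICD-9 procedure codes c_p^t
    medications: Set[str] = field(default_factory=set)  # ATC medication codes c_m^t


# X = [x_1, x_2, ..., x_T] for a single patient (superscript n dropped).
patient_record: List[Visit] = [
    Visit({"4130"}, {"8856"}, {"C01D"}),
    Visit({"4131", "4019"}, set(), {"C01D", "C08C"}),
]

T = len(patient_record)                 # number of visits of this patient
latest = patient_record[-1]             # x_T
print(T, sorted(latest.diagnoses))      # 2 ['4019', '4131']
```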

3.1.2. Definition of the Medical Ontology Tree

The structure of the ICD-9 coding system is similar to the class hierarchy extracted from the medical ontology and can be represented as a directed acyclic graph. Each leaf node represents an ICD-9 diagnosis code or an ICD-9 procedure code, and each ancestor node represents a medical category with a specific classification meaning. The structure of the medical ontology tree is shown in Figure 3, and each node in Figure 3 is described in Table 2. For example, $c_{8}$ and $c_{9}$ represent angina decubitus and Prinzmetal angina, respectively; both belong to the sub-classification angina pectoris, represented by $c_{5}$, and to the general classification ischemic heart disease, represented by $c_{2}$. In the training of Med-tree, all diagnosis nodes and their ancestors are regarded as tree nodes, and the relationships between medical entities form the edges between each node and its neighbors. The medical ontology tree is constructed from these edges. Because diagnoses and procedures are different medical concepts, a diagnosis ontology tree and a procedure ontology tree are constructed separately.
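A minimal sketch of how the tree in Figure 3 might be stored and queried, assuming a child-to-parent dictionary. The node names follow the angina example in the text; the virtual `root` node and the helpers `children_map`, `ancestors` and `leaves` are hypothetical illustrations that produce the child sets $c(i)$ and ancestor sets $p(i)$ used by the Ontology Embedding module.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Child -> parent edges of a toy diagnosis ontology tree (after Figure 3).
# "root" is a hypothetical virtual root added for illustration.
PARENT: Dict[str, str] = {
    "c2": "root",  # ischemic heart disease (general classification)
    "c5": "c2",    # angina pectoris (sub-classification)
    "c8": "c5",    # angina decubitus (leaf: an ICD-9 code)
    "c9": "c5",    # Prinzmetal angina (leaf: an ICD-9 code)
}


def children_map(parent: Dict[str, str]) -> Dict[str, Set[str]]:
    """c(i): direct children of every non-leaf node."""
    kids: Dict[str, Set[str]] = defaultdict(set)
    for child, par in parent.items():
        kids[par].add(child)
    return dict(kids)


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """p(i): all ancestors of a node, nearest first."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def leaves(parent: Dict[str, str]) -> Set[str]:
    """Leaf nodes are the nodes that never appear as a parent."""
    return set(parent) - set(parent.values())


print(ancestors("c8", PARENT))              # ['c5', 'c2', 'root']
print(sorted(children_map(PARENT)["c5"]))   # ['c8', 'c9']
print(sorted(leaves(PARENT)))               # ['c8', 'c9']
```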

3.1.3. Definition of the EHR Graph and DDI Graph

The EHR graph and the DDI graph are represented as $G_{e} = \{ϑ, ξ_{e}\}$ and $G_{d} = \{ϑ, ξ_{d}\}$, respectively, where $ϑ$ is the set of medications, $ξ_{e}$ is the edge set derived from the EHR datasets, and $ξ_{d}$ is the edge set of known DDIs between pairs of drugs. The adjacency matrices $A_{e}, A_{d} \in \mathbb{R}^{|ϑ| \times |ϑ|}$ encode the edge sets $ξ_{e}$ and $ξ_{d}$. Specifically, $A_{e}[i, j] = 1$ indicates that drug $i$ and drug $j$ appear in the same prescription and act synergistically, whereas $A_{d}[i, j] = 1$ indicates that drug $i$ and drug $j$ have a known adverse interaction.
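The sketch below shows one plausible way to fill $A_{e}$ and $A_{d}$ as symmetric 0/1 matrices: every pair of drugs co-prescribed in a visit becomes an edge of $G_{e}$, and every known interacting pair becomes an edge of $G_{d}$. The medication vocabulary, the prescriptions and the DDI pair are toy assumptions, and the code only records co-occurrence; it does not check whether co-prescribed drugs actually act synergistically.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(pairs: Iterable[Tuple[str, str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Symmetric 0/1 matrix of size |V| x |V| with A[i][j] = 1 for each listed pair."""
    n = len(vocab)
    adj = [[0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = vocab[a], vocab[b]
        if i != j:
            adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical medication vocabulary (the shared node set of both graphs).
vocab = {"A": 0, "B": 1, "C": 2}

# Prescriptions observed in the EHRs: every co-prescribed pair becomes an edge of G_e.
prescriptions: List[Set[str]] = [{"A", "B"}, {"B", "C"}]
ehr_pairs = [p for med_set in prescriptions for p in combinations(sorted(med_set), 2)]
A_e = build_adjacency(ehr_pairs, vocab)

# Known interacting pairs (e.g. from an external DDI source) form the edges of G_d.
A_d = build_adjacency([("A", "C")], vocab)

print(A_e)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(A_d)  # [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
```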

3.1.4. Medication Recommendation Task

Given the diagnosis codes $c_{d}^{t}$ and procedure codes $c_{p}^{t}$ of the current $t$th visit and the historical visit representations $X_{1:t-1} = [x_{1}, x_{2}, \dots, x_{t-1}]$, where $x_{i} = \{c_{d}^{i}, c_{p}^{i}, c_{m}^{i}\}$ is the set of diagnosis codes, procedure codes and medication codes of the $i$th visit, the goal of the medication recommendation task is to obtain a medication combination $\hat{y}_{t} \in \{0, 1\}^{N_{m}}$ at the $t$th visit from the patient's current clinical events $c_{d}^{t}$, $c_{p}^{t}$ and historical visits $X_{1:t-1}$, where $N_{m}$ is the total number of candidate medications. Since the medication combination $\hat{y}_{t}$ generally contains more than one label, EHR-based medication recommendation is formulated as a multi-label classification task.
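To make the input/output split concrete, the sketch below assembles one training example for visit $t$: the history $X_{1:t-1}$, the current codes $c_{d}^{t}$ and $c_{p}^{t}$, and a ground-truth multi-hot target over the medication vocabulary. The dictionary layout and the helpers `multi_hot` and `make_example` are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Visit = Dict[str, Set[str]]  # keys "diag", "proc", "med" (hypothetical layout)


def multi_hot(codes: Set[str], vocab: Dict[str, int]) -> List[int]:
    """Encode a medication set as a vector in {0, 1}^{N_m}."""
    y = [0] * len(vocab)
    for code in codes:
        y[vocab[code]] = 1
    return y


def make_example(
    visits: List[Visit], t: int, med_vocab: Dict[str, int]
) -> Tuple[List[Visit], Tuple[Set[str], Set[str]], List[int]]:
    """Inputs and label for visit t (1-indexed), following Section 3.1.4."""
    history = visits[: t - 1]                                  # X_{1:t-1}, medications included
    current = (visits[t - 1]["diag"], visits[t - 1]["proc"])   # c_d^t and c_p^t only
    label = multi_hot(visits[t - 1]["med"], med_vocab)         # ground-truth target for visit t
    return history, current, label


med_vocab = {"A": 0, "B": 1, "C": 2}
visits = [
    {"diag": {"d1"}, "proc": {"p1"}, "med": {"A"}},
    {"diag": {"d2"}, "proc": set(), "med": {"A", "C"}},
]
history, current, label = make_example(visits, t=2, med_vocab=med_vocab)
print(len(history), label)  # 1 [1, 0, 1]
```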

3.2. Model Framework

The Med-tree model consists of three parts: Ontology Embedding, Temporal Dependency and Knowledge Memory. Its structure is shown in Figure 4, and its training process is explained in detail below.

3.2.1. Ontology Embedding

The EHR of a single admission is a set of medical entities consisting of diagnoses, procedures and medications. These entities are internally correlated, and such correlations, which vary in meaning and strength, are called structural characteristics; for example, certain diagnoses may require specific procedures and medications. However, the vector representation of the standardized EHRs is unordered and does not reflect these correlations. Therefore, the medical ontology tree is used to enrich the vector representations of medical entities. As shown in Figure 5, the arrows indicate the direction of message passing in the medical ontology tree, and the tree aggregates and updates messages among medical nodes through a dedicated GAT. Different operations are applied to the leaf nodes, which are specific medical entities, and to the ancestor nodes, which carry medical classification concepts.

For a non-leaf node that carries a medical classification concept, the correlation with each child node is learned adaptively, and the node's representation is updated by weighted summation as follows:

c_{i}^{'} = \Big\Vert_{k = 1}^{K} σ\Big(\sum_{j \in c(i) \cup \{i\}} α_{ij}^{k} c_{j}\Big),

(1)

where $K$ is the number of attention heads and $\Vert$ concatenates the outputs of the $K$ heads; $σ$ is the sigmoid function; $c(i)$ is the set of children of node $i$; $c_{j}$ is the vector representation of node $j$; and $α_{ij}^{k}$ is the weight coefficient of the $k$th attention head between node $i$ and node $j$. Following [41], $α_{ij}^{k}$ is calculated as:

α_{ij}^{k} = \frac{\exp(\mathrm{LeakyReLU}(a^{T} [c_{i} \Vert c_{j}]))}{\sum_{l \in c(i) \cup \{i\}} \exp(\mathrm{LeakyReLU}(a^{T} [c_{i} \Vert c_{l}]))},

(2)

where $a$ is the weight vector of a single-layer feedforward neural network and $\Vert$ denotes concatenation; the score $a^{T} [c_{i} \Vert c_{j}]$ passes through the LeakyReLU activation and is normalized with a softmax over $c(i) \cup \{i\}$ to give $α_{ij}^{k}$.

For a leaf node $e_{i}$, the information of its ancestor nodes is aggregated into the node itself, so that its embedding is enriched with the information of its medical classifications. Graph attention is again used to learn the correlation coefficient of each classification concept adaptively, and the representation of leaf node $e_{i}$ is updated as follows:

e_{i} = \Big\Vert_{k = 1}^{K} σ\Big(\sum_{j \in p(i) \cup \{i\}} α_{ij}^{k'} c_{j}^{'}\Big),

(3)

where $p(i)$ is the set of all ancestors of node $i$ and $c_{j}^{'}$ is the updated vector representation of node $j$. The coefficient $α_{ij}^{k'}$ is calculated as:

α_{ij}^{k'} = \frac{\exp(\mathrm{LeakyReLU}(a^{T} [c_{i}^{'} \Vert c_{j}^{'}]))}{\sum_{l \in p(i) \cup \{i\}} \exp(\mathrm{LeakyReLU}(a^{T} [c_{i}^{'} \Vert c_{l}^{'}]))}.

(4)

In this way, the medical ontology tree uses the GAT model to pass information from bottom to top and then aggregates the passed results into all leaf nodes from top to bottom, which enriches the vector representations of the medical entities. As a result, the diagnosis codes $c_{d}^{t}$ and procedure codes $c_{p}^{t}$ of the current $t$th visit are transformed into the richer representations $e_{d}^{t}$ and $e_{p}^{t}$.
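The following sketch runs the two passes of Eqs. (1)–(4) on the angina example with a single attention head ($K = 1$), plain Python lists as vectors, and no learnable projection, matching the equations as written. Several details are assumptions rather than statements from the paper: the LeakyReLU slope of 0.2 (the default in the original GAT formulation), leaf nodes reusing their input vectors as $c_{j}^{'}$ in Eq. (3), and the toy embeddings and attention vector `a`.

```python
import math
from typing import Dict, List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_pool(i: str, nbrs: List[str], emb: Dict[str, Vector], a: Vector) -> Vector:
    """One attention head: softmax(LeakyReLU(a^T [c_i || c_j])) weights, then sigmoid."""
    scores = [leaky_relu(sum(w * x for w, x in zip(a, emb[i] + emb[j]))) for j in nbrs]
    m = max(scores)                                   # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]                 # Eq. (2) / Eq. (4)
    dim = len(emb[i])
    pooled = [sum(alpha[n] * emb[j][d] for n, j in enumerate(nbrs)) for d in range(dim)]
    return [1.0 / (1.0 + math.exp(-v)) for v in pooled]


def ontology_embed(parent: Dict[str, str], emb: Dict[str, Vector],
                   a_up: Vector, a_down: Vector) -> Dict[str, Vector]:
    """Return e_i for every leaf of the tree described by child -> parent edges."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    # Eq. (1): each classification node attends over itself and its children c(i).
    updated = dict(emb)                               # leaves keep their input vectors
    for node, kids in children.items():
        updated[node] = gat_pool(node, [node] + kids, emb, a_up)

    # Eq. (3): each leaf attends over itself and all of its ancestors p(i).
    result: Dict[str, Vector] = {}
    for leaf in (n for n in parent if n not in children):
        ancestors, n = [], leaf
        while n in parent:
            n = parent[n]
            ancestors.append(n)
        result[leaf] = gat_pool(leaf, [leaf] + ancestors, updated, a_down)
    return result


parent = {"c5": "c2", "c8": "c5", "c9": "c5"}         # c8, c9 -> angina pectoris -> IHD
emb = {"c2": [0.2, 0.1], "c5": [0.1, 0.2], "c8": [1.0, 0.0], "c9": [0.0, 1.0]}
a = [0.5, -0.5, 1.0, 0.0]                              # hypothetical attention vector
leaf_vectors = ontology_embed(parent, emb, a_up=a, a_down=a)
print(sorted(leaf_vectors))                            # ['c8', 'c9']
```

In a full model the two passes would typically use separate learned attention vectors and several heads whose outputs are combined; sharing one vector here only keeps the example short.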

3.2.2. Temporal Dependency

A patient's multiple visits can be regarded as a time series, and the patient's current diagnosis may be influenced by previous health conditions. For example, for patients with chronic conditions such as diabetes or cerebral infarction, the current visit may be very similar to historical visits. Likewise, stomach perforation or erysipelas may appear in a later visit of a patient who has a peptic ulcer or chickenpox. It is therefore necessary to fully explore the temporal characteristics of EHRs.

The richer representations $e_{d}^{t}$ and $e_{p}^{t}$ are fed into GRU models to capture the temporal features of the EHRs. Taking the diagnosis representation $e_{d}^{t}$ as an example, two GRU models with attention mechanisms compute the influence of the historical diagnosis sequence on the current diagnosis representation, yielding the hidden representations $h_{d,1}^{t}$ and $h_{d,2}^{t}$:

h_{d,1}^{t} = \mathrm{GRU}_{β_{d,1}}(e_{d}^{1}, e_{d}^{2}, \dots, e_{d}^{t}),

(5)

h_{d,2}^{t} = \mathrm{GRU}_{β_{d,2}}(e_{d}^{1}, e_{d}^{2}, \dots, e_{d}^{t}),

(6)

where $β_{d,1}$ and $β_{d,2}$ are the weights of the two attention mechanisms, obtained with different activation functions:

β_{d,1} = \mathrm{Softmax}(h_{d,1}^{1}, h_{d,1}^{2}, \dots, h_{d,1}^{t}),

(7)

β_{d,2} = \tanh(h_{d,2}^{1}, h_{d,2}^{2}, \dots, h_{d,2}^{t}).

(8)

The attention weights $β_{d,1}$ and $β_{d,2}$ are applied to obtain a representation $q_{d}^{t}$ that carries historical diagnostic information:

q_{d}^{t} = \sum_{i = 1}^{t} β_{d,1}[i] \, β_{d,2}[i] \otimes e_{d}^{i},

(9)

where $\otimes$ denotes element-wise multiplication.

In the same way as the diagnosis representation $e_{d}^{t}$, the procedure representation $e_{p}^{t}$ is fed into GRU models to capture the influence of the patient's historical procedure information, yielding a procedure representation with historical dependency information. The representations $e_{d}^{t}$ and $e_{p}^{t}$ are thus transformed into $q_{d}^{t}$ and $q_{p}^{t}$.
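A minimal sketch of the pooling in Eqs. (7)–(9), assuming the hidden states of the two GRUs of Eqs. (5) and (6) are already available (the GRUs themselves are not reimplemented). Two details are assumptions in the style of RETAIN rather than statements from the paper: a hypothetical projection vector `w` turns each $h_{d,1}^{i}$ into one scalar so that the Softmax in Eq. (7) runs across visits, and the weighted sum runs over the embedding $e_{d}^{i}$ of each visit $i$.

```python
import math
from typing import List

Vector = List[float]


def visit_attention(h1: List[Vector], h2: List[Vector], e: List[Vector], w: Vector) -> Vector:
    """Pool the visit embeddings e^1..e^t into q^t following Eqs. (7)-(9).

    h1, h2: hidden states of the two GRUs for visits 1..t (assumed precomputed).
    w: hypothetical projection turning each h1[i] into one scalar score.
    """
    scores = [sum(wk * hk for wk, hk in zip(w, h)) for h in h1]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    beta1 = [x / total for x in exps]                  # Eq. (7): one weight per visit
    beta2 = [[math.tanh(v) for v in h] for h in h2]    # Eq. (8): one gate per visit and dim
    dim = len(e[0])
    return [
        sum(beta1[i] * beta2[i][d] * e[i][d] for i in range(len(e)))  # Eq. (9), element-wise
        for d in range(dim)
    ]


e = [[1.0, 0.0], [0.0, 1.0]]        # toy e_d^1, e_d^2
h1 = [[0.0, 0.0], [0.0, 0.0]]       # equal scores, so beta1 = [0.5, 0.5]
h2 = [[10.0, 10.0], [10.0, 10.0]]   # tanh(10) is almost exactly 1
q = visit_attention(h1, h2, e, w=[1.0, 1.0])
print([round(x, 3) for x in q])     # [0.5, 0.5]
```

The Softmax weights decide how much each past visit counts overall, while the tanh gates can amplify, damp or flip individual embedding dimensions per visit.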

3.2.3. Knowledge Memory

The Knowledge Memory module makes full use of medication knowledge: it embeds the EHR and DDI graphs as facts in the memory bank and inserts the patient's history into the dynamic memory, so that information is obtained from different views. Inspired by [22], the EHR and DDI records defined in Section 3.1.3 are embedded into the memory bank to further improve recommendation accuracy and reduce the DDI rate of the recommendation results. The Knowledge Memory module consists of the following five steps:

(1): Convert the inputs into a representation vector of the $t$th visit. The representations $q_{d}^{t}$ and $q_{p}^{t}$ produced by the Temporal Dependency module are combined into a query $q^{t}$ as follows:

$q^{t} = f ([q_{d}^{t}, q_{p}^{t}]),$

(10)

where $f(\cdot)$ is a transformation function applied to the concatenation of the diagnosis representation $q_{d}^{t}$ and the procedure representation $q_{p}^{t}$.
(2): Design a memory bank. Graph-enhanced memory representations are stored in the memory bank, built from the two adjacency matrices $A_{e}$ and $A_{d}$. Following the GCN procedure, each $A_{*}$ (where $*$ stands for $e$ or $d$) is preprocessed as follows:

${\tilde{A}}_{*} = {\tilde{D}}^{- \frac{1}{2}} (A_{*} + I) {\tilde{D}}^{- \frac{1}{2}},$

(11)

where $\tilde{D}$ is the diagonal degree matrix with ${\tilde{D}}_{ii} = \sum_{j} (A_{*} + I)_{ij}$, and $I$ is the identity matrix. A two-layer GCN is then applied to the EHR graph and to the DDI graph to capture the relationships of drug co-usage and of DDIs, respectively, and the memory bank $M_{b}$ combines the two graph embeddings as follows:

$M_{b} = Z_{1} - β Z_{2},$

(12)

$Z_{1} = {\tilde{A}}_{e} \tanh ({\tilde{A}}_{e} W_{e, 1}) W_{1},$

(13)

$Z_{2} = {\tilde{A}}_{d} \tanh ({\tilde{A}}_{d} W_{e, 2}) W_{2},$

(14)

where $β$ is a trainable coefficient; $W_{e, 1}$ and $W_{e, 2}$ are the medication embeddings for the EHR graph and the DDI graph; and $W_{1}$ and $W_{2}$ are the weight matrices of the hidden layer.
(3): Design a dynamic memory. The patient's historical information is inserted into the dynamic memory as key-value pairs to capture information from different perspectives. Based on $q^{t}$ and the medication representation $c_{m}^{t}$, the history cache of the $t$th visit is represented as key-value pairs:

$M_{d}^{t} = \{q^{t'} : c_{m}^{t'}\}_{t' = 1}^{t - 1},$

(15)

where $M_{d}^{t}$ is empty when $t = 1$, and $t' \in [1, t - 1]$ indexes the historical visits before the $t$th visit. For convenience, $M_{d,k}^{t} = [q^{1}, q^{2}, \dots, q^{t - 1}]$ denotes the key vectors and $M_{d,v}^{t} = [c_{m}^{1}, c_{m}^{2}, \dots, c_{m}^{t - 1}]$ the value vectors of the history cache of the $t$th visit.
(4): Output the memory representation. The outputs $o_{b}^{t}$ and $o_{d}^{t}$ are obtained from the query $q^{t}$, the memory bank $M_{b}$ and the dynamic memory $M_{d}^{t}$, using attention to retrieve the information most relevant to $q^{t}$:

$o_{b}^{t} = M_{b}^{T} \mathrm{Softmax} (M_{b} q^{t}),$

(16)

$o_{d}^{t} = M_{b}^{T} (M_{d,v}^{t})^{T} \mathrm{Softmax} (M_{d,k}^{t} q^{t}).$

(17)
(5): Obtain the multi-label recommended medication combination ${\hat{y}}_{t}$ by activating $q^{t}$, $o_{b}^{t}$ and $o_{d}^{t}$:

${\hat{y}}_{t} = σ ([q^{t}, o_{b}^{t}, o_{d}^{t}]),$

(18)

where $σ$ represents the activation function.
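The sketch below illustrates two pieces of the Knowledge Memory module in plain Python: the normalization of Eq. (11), with degrees taken from $A_{*} + I$ as in the standard GCN, and the attention reads of Eqs. (16) and (17). It uses a small toy memory bank instead of computing $M_{b}$ with the two-layer GCNs of Eqs. (12)–(14), stores past medication sets as multi-hot rows of $M_{d,v}^{t}$, transposes $M_{d,v}^{t}$ so that the dimensions match, and returns a zero vector for $o_{d}^{t}$ when the dynamic memory is empty at $t = 1$; all of these are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Eq. (11): D^-1/2 (A + I) D^-1/2, with D computed from A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def read_memory(q: Vector, m_b: Matrix, keys: Matrix, values: Matrix) -> Tuple[Vector, Vector]:
    """Eqs. (16)-(17) for one visit.

    m_b:    |V| x d memory bank, one row per medication.
    keys:   (t-1) x d past queries q^1 .. q^{t-1}      (M_{d,k}^t).
    values: (t-1) x |V| multi-hot past medication sets (M_{d,v}^t).
    """
    m_b_t = transpose(m_b)
    o_b = matvec(m_b_t, softmax(matvec(m_b, q)))               # attend over medications
    if not keys:                                               # t = 1: empty dynamic memory
        return o_b, [0.0] * len(q)
    visit_weights = softmax(matvec(keys, q))                   # relevance of each past visit
    med_weights = matvec(transpose(values), visit_weights)     # (M_{d,v}^t)^T Softmax(...)
    o_d = matvec(m_b_t, med_weights)
    return o_b, o_d


print(normalize_adjacency([[0, 1], [1, 0]]))                   # [[0.5, 0.5], [0.5, 0.5]]
m_b = [[1.0, 0.0], [0.0, 1.0]]                                 # toy bank: 2 drugs, d = 2
o_b, o_d = read_memory([0.0, 0.0], m_b, keys=[[0.0, 0.0]], values=[[1, 0]])
print(o_b, o_d)                                                # [0.5, 0.5] [1.0, 0.0]
```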

3.3. Optimization

The gap between the recommended and the actual medication combinations determines the quality of the recommendation model, and medication recommendation is treated as a multi-label classification problem. Thus, the multi-label margin loss $L_{multi}$ and the binary cross-entropy loss $L_{bce}$ are combined into the multi-label classification loss $L_{mlp}$ as follows:

L_{mlp} = α L_{bce} + (1 - α) L_{multi},

(19)

L_{bce} = - \sum_{t}^{T} \sum_{i} \left[ y_{i}^{t} \log σ(\hat{y}_{i}^{t}) + (1 - y_{i}^{t}) \log (1 - σ(\hat{y}_{i}^{t})) \right],

(20)

L_{multi} = \sum_{t}^{T} \sum_{i}^{|c_{m}|} \sum_{j}^{|\hat{Y}^{t}|} \frac{\max(0, 1 - (\hat{y}_{t}[\hat{Y}_{j}^{t}] - \hat{y}_{t}[i]))}{L}.

(21)

Here, $α$ is the mixture weight; $T$ is the maximum number of visits of a patient; $y_{i}^{t}$ is the ground-truth label of medication $i$ at the $t$th visit; $\hat{y}_{i}^{t}$ (equivalently $\hat{y}_{t}[i]$) is the predicted score of medication $i$ at the $t$th visit; $\hat{Y}^{t}$ is the set of target medication labels at the $t$th visit, so that $\hat{y}_{t}[\hat{Y}_{j}^{t}]$ is the predicted score of its $j$th label; and $L$ is the number of medication labels.
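A plain-Python sketch of Eqs. (19)–(21) for a single visit, assuming the model outputs raw scores (logits) that the sigmoid in Eq. (20) turns into probabilities. Two choices are assumptions borrowed from PyTorch's `MultiLabelMarginLoss` rather than statements from the paper: the inner sum of Eq. (21) skips medications that are themselves targets, and the normalizer $L$ is the number of medication labels. The mixture weight `alpha = 0.9` is a hypothetical default.

```python
import math
from typing import List, Set


def bce_loss(logits: List[float], target: List[int]) -> float:
    """Eq. (20) for one visit; log-sigmoid identities avoid log(0) for confident scores."""
    total = 0.0
    for z, y in zip(logits, target):
        log_p = -math.log1p(math.exp(-z))        # log sigmoid(z)
        log_not_p = -math.log1p(math.exp(z))     # log (1 - sigmoid(z))
        total -= y * log_p + (1 - y) * log_not_p
    return total


def margin_loss(logits: List[float], positives: Set[int]) -> float:
    """Eq. (21) for one visit: each target should outscore each non-target by a margin of 1."""
    total = 0.0
    for j in positives:
        for i in range(len(logits)):
            if i not in positives:
                total += max(0.0, 1.0 - (logits[j] - logits[i]))
    return total / len(logits)                   # L = number of medication labels


def mlp_loss(logits: List[float], target: List[int], alpha: float = 0.9) -> float:
    """Eq. (19): convex mixture of the two multi-label losses."""
    positives = {i for i, y in enumerate(target) if y == 1}
    return alpha * bce_loss(logits, target) + (1 - alpha) * margin_loss(logits, positives)


logits = [2.0, 0.0, 0.5]    # toy scores for three medications
target = [1, 0, 1]          # ground-truth multi-hot vector
print(round(margin_loss(logits, {0, 2}), 4))   # 0.1667
print(round(mlp_loss(logits, target), 4))
```

The binary cross-entropy term scores each medication independently, whereas the margin term only cares about the ranking between prescribed and non-prescribed medications, which is why the two are mixed.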

The DDI loss is designed to control the DDI rate of the recommendation results and is defined as follows:

L_{DDI} = \sum_{t}^{T} \sum_{i, j} \left(A_{d} \odot (\hat{y}_{t}^{T} \hat{y}_{t})\right)[i, j],

(22)

where $\hat{y}_{t}^{T} \hat{y}_{t}$, with $\hat{y}_{t}$ treated as a row vector, is an outer product that gives the pairwise probability of recommending each pair of medications together, and $\odot$ is the element-wise product.
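For one visit, Eq. (22) reduces to summing $A_{d}[i, j]\,\hat{y}_{t}[i]\,\hat{y}_{t}[j]$ over all drug pairs, i.e. the probability mass the model places on interacting pairs. The toy interaction matrix and probabilities below are illustrative; because $A_{d}$ is symmetric, each interacting pair is counted twice, exactly as in the double sum.

```python
from typing import List


def ddi_loss(probs: List[float], a_d: List[List[int]]) -> float:
    """Eq. (22) for one visit: sum of A_d * (y^T y) over all index pairs (i, j)."""
    n = len(probs)
    return sum(a_d[i][j] * probs[i] * probs[j] for i in range(n) for j in range(n))


A_d = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]   # drugs 0 and 2 interact (toy)
probs = [0.9, 0.5, 0.8]                    # sigmoid outputs for one visit
print(round(ddi_loss(probs, A_d), 4))      # 1.44
```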

To achieve a lower DDI rate in the recommendation results, $L_{mlp}$ and $L_{DDI}$ must be balanced. Inspired by [42], training switches between $L_{mlp}$ and $L_{DDI}$ with a certain probability as follows:

L = \begin{cases} L_{DDI} & \text{with probability } p = \exp\left(-\frac{s' - s}{temp}\right), \text{ if } s' > s, \\ L_{mlp} & \text{otherwise}. \end{cases}

(23)

On the one hand, when the DDI rate $s'$ of the current recommendations is larger than the expected DDI rate $s$, $L_{DDI}$ is used with probability $p$, which is high while the temperature $temp$ is large. On the other hand, a decay rate $ϵ$ is applied to $temp$ as training proceeds and the model becomes stable, so $p$ decreases and $L_{mlp}$ is increasingly used as the loss function.
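The switching rule of Eq. (23) resembles a simulated-annealing acceptance test. The sketch below is one possible reading: the function name, the target DDI rate, the starting temperature and the multiplicative decay `temp *= 0.85` are hypothetical values chosen for illustration, and in a real training loop $s'$ would be re-measured on the model's current recommendations at every step.

```python
import math
import random


def choose_loss(ddi_rate: float, target: float, temp: float, rng: random.Random) -> str:
    """Eq. (23): return "ddi" with probability exp(-(s' - s) / temp) when s' > s."""
    if ddi_rate > target:
        p = math.exp(-(ddi_rate - target) / temp)
        if rng.random() < p:
            return "ddi"
    return "mlp"


rng = random.Random(0)
temp = 0.5                        # hypothetical initial temperature
for step in range(5):
    s_prime = 0.10                # placeholder for the measured DDI rate s'
    loss_name = choose_loss(s_prime, target=0.06, temp=temp, rng=rng)
    print(step, round(temp, 4), loss_name)
    temp *= 0.85                  # hypothetical decay rate epsilon applied to temp
```

As `temp` shrinks, the exponent grows in magnitude, so `p` falls toward zero and the multi-label loss takes over, which matches the behavior described above.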

In summary, the training algorithm of the Med-tree model is detailed in Algorithm 1.

Algorithm 1: Training Algorithm.

4. Performance Analysis

This section consists of three parts. First, the experimental configuration, including the data source and the baselines, is explained. Second, the medication recommendation performance of Med-tree is verified by comparison with baseline models. Finally, a case study demonstrates the feasibility of the proposed model in practical applications.

4.1. Experimental Setup

4.1.1. Data Source

The experiments are carried out on the MIMIC-III datasets, which are described as follows:

MIMIC-III is an open-source medical database built from intensive care unit patient monitoring and managed by MIT. It contains more than 50,000 intensive care unit admissions at a large tertiary care hospital between 2001 and 2012, as well as 7870 newborns admitted between 2001 and 2008. The MIMIC-III data include vital signs, diagnoses, procedures, medications and more. To improve the standardization and usability of the EHRs, the MIMIC-III data are transformed into time-ordered lists of diagnoses, procedures and medications that are suitable for training.

In addition, the medical entities in the MIMIC-III datasets are converted according to medical coding standards: medications coded with NDC codes are mapped to ATC codes, and diagnoses and procedures are represented with ICD-9 codes. Statistics of the processed MIMIC-III datasets are listed in Table 3.
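The conversion described above can be sketched as follows, assuming the MIMIC-III tables have already been flattened into rows of (patient, admission, admission time, code type, code). The rows, the NDC code and the `NDC_TO_ATC` lookup are hypothetical placeholders; a real pipeline would rely on an external NDC-to-ATC mapping, and dropping unmapped drugs is an illustrative choice rather than the paper's documented rule.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[int, int, str, str, str]   # (subject_id, hadm_id, admit_time, kind, code)

# Hypothetical flattened rows; visit 100 happened before visit 101.
rows: List[Row] = [
    (1, 101, "2101-03-05", "diag", "4130"),
    (1, 101, "2101-03-05", "med_ndc", "00000000001"),
    (1, 100, "2100-01-10", "diag", "4019"),
    (1, 100, "2100-01-10", "proc", "8856"),
]

# Hypothetical NDC -> ATC lookup table.
NDC_TO_ATC: Dict[str, str] = {"00000000001": "C01D"}


def to_visit_sequences(rows: List[Row]) -> Dict[int, List[Dict[str, Set[str]]]]:
    """Group rows into per-patient visit lists sorted by admission time."""
    visits: Dict[Tuple[int, int], Dict[str, Set[str]]] = defaultdict(
        lambda: {"diag": set(), "proc": set(), "med": set()})
    admit: Dict[Tuple[int, int], str] = {}
    for subject, hadm, time, kind, code in rows:
        key = (subject, hadm)
        admit[key] = time
        if kind == "med_ndc":
            atc = NDC_TO_ATC.get(code)
            if atc is None:
                continue                    # drop medications without an ATC mapping
            visits[key]["med"].add(atc)
        else:
            visits[key][kind].add(code)     # ICD-9 diagnosis or procedure code
    per_patient: Dict[int, List[Dict[str, Set[str]]]] = defaultdict(list)
    for key in sorted(visits, key=lambda k: (k[0], admit[k])):
        per_patient[key[0]].append(visits[key])
    return dict(per_patient)


seqs = to_visit_sequences(rows)
print([sorted(v["diag"]) for v in seqs[1]])  # [['4019'], ['4130']]
```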

4.1.2. Baselines

Several baselines are considered for comparisons as follows:

LR (Logistic Regression) is a classical machine learning method; here it uses L2-regularized logistic regression with multi-hot vectors to represent the multi-label output.
Leap [43] predicts future events through an additional set of tensors containing temporal information and performs well on medical data, recommendation engines and many other scenarios.
RETAIN [21] provides medication recommendations using a two-level attention-based RNN that accounts for the influence of temporal factors.
GAMENet [22] builds a Drug–Drug Interaction graph to model the relationships between medications and employs a memory-augmented network to learn the temporal dependencies of the medication recommendation task.
PREMIER [44] adds two attention mechanisms for learning the medical history information, and it combines the GAT model to learn the DDIs, which enhances the unlockability of the recommendation.
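The LR baseline can be read as one independent binary classifier per medication, whose outputs together form the multi-hot recommendation. A minimal sketch in plain Python, assuming full-batch gradient descent on the mean binary cross-entropy with an L2 penalty on the weights; the learning rate, penalty strength, epoch count, and 0.5 threshold are illustrative choices, not the settings behind the reported results.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_multilabel_lr(X: Matrix, Y: List[List[int]], l2: float = 1e-2,
                        lr: float = 0.5, epochs: int = 500) -> Tuple[Matrix, List[float]]:
    """One independent L2-regularized logistic regression per medication label.

    X: multi-hot input vectors (e.g. diagnoses + procedures), one row per visit.
    Y: multi-hot medication targets, one row per visit.
    """
    n, d, m = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * m for _ in range(d)]
    b = [0.0] * m
    for _ in range(epochs):
        gW = [[0.0] * m for _ in range(d)]
        gb = [0.0] * m
        for x, y in zip(X, Y):
            for j in range(m):
                z = b[j] + sum(x[i] * W[i][j] for i in range(d))
                err = (sigmoid(z) - y[j]) / n  # d(mean BCE)/d(logit) for label j
                gb[j] += err
                for i in range(d):
                    gW[i][j] += err * x[i]
        for i in range(d):
            for j in range(m):
                W[i][j] -= lr * (gW[i][j] + l2 * W[i][j])  # L2 penalty on weights only
        for j in range(m):
            b[j] -= lr * gb[j]
    return W, b


def predict(x: List[float], W: Matrix, b: List[float], threshold: float = 0.5) -> List[int]:
    """Multi-hot recommendation: keep every label whose probability reaches the threshold."""
    return [int(sigmoid(b[j] + sum(x[i] * W[i][j] for i in range(len(x)))) >= threshold)
            for j in range(len(b))]


# Toy data: medication j is prescribed exactly when code j is present.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [0, 1], [1, 0], [0, 1]]
W, b = train_multilabel_lr(X, Y)
print([predict(x, W, b) for x in X])  # [[1, 0], [0, 1], [1, 0], [0, 1]]
```

A library solver would normally replace the hand-written loop; the sketch only makes explicit that each label's weight column is fitted and penalized independently of the others.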

4.1.3. Metrics

To measure the accuracy of the recommendations, the Jaccard Similarity Score (Jaccard), Average F1 (F1), and Precision–Recall AUC (PRAUC) are used as scoring functions. Jaccard is computed as follows:

\text{Jaccard} = \frac{1}{\sum_{k}^{N} \sum_{t}^{T_{k}} 1} \sum_{k}^{N} \sum_{t}^{T_{k}} \frac{| Y_{t}^{(k)} \cap \hat{Y}_{t}^{(k)} |}{| Y_{t}^{(k)} \cup \hat{Y}_{t}^{(k)} |},

(24)

where $N$ is the total number of patients in the test set, $T_{k}$ is the number of visits of the kth patient, and $Y_{t}^{(k)}$ and $\hat{Y}_{t}^{(k)}$ are the ground-truth and recommended medication sets of the kth patient's tth visit.
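Equation (24) averages a per-visit Jaccard score over every visit of every test patient. A direct transcription in Python, assuming each visit's ground-truth and recommended medications are given as sets of codes; how a visit with two empty sets is scored is not specified in the text, so the sketch assigns 0 by assumption.

```python
from typing import List, Set

Visits = List[List[Set[str]]]  # [patient k][visit t] -> set of medication codes


def jaccard(y_true: Visits, y_pred: Visits) -> float:
    """Visit-averaged Jaccard similarity, following Equation (24)."""
    total, n_visits = 0.0, 0
    for true_visits, pred_visits in zip(y_true, y_pred):
        for Y, Y_hat in zip(true_visits, pred_visits):
            union = Y | Y_hat
            # Assumed convention: a visit where both sets are empty scores 0.
            total += len(Y & Y_hat) / len(union) if union else 0.0
            n_visits += 1
    return total / n_visits


y_true = [[{"A", "B", "C"}, {"A"}], [{"B", "C"}]]
y_pred = [[{"A", "B", "D"}, {"A"}], [{"C"}]]
print(round(jaccard(y_true, y_pred), 4))  # 0.6667
```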

PRAUC is the area under the precision–recall (PR) curve, computed by trapezoidal integration; it is suitable for datasets in which the numbers of positive and negative samples are imbalanced.
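The trapezoidal computation can be sketched as follows for a single vector of predicted scores. This section does not say how PRAUC is aggregated across visits or labels, nor where the curve starts; the sketch assumes the common convention of starting at recall 0 with precision 1 and treats tied scores as a single threshold.

```python
from typing import List, Sequence, Tuple


def pr_auc_trapezoid(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the precision-recall curve via the trapezoidal rule.

    y_true: 0/1 labels (e.g. one per medication for a single visit).
    scores: predicted probabilities for the same items.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined when there are no positive labels")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points: List[Tuple[float, float]] = [(0.0, 1.0)]  # assumed (recall, precision) start
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Tied scores form one threshold, so emit a point only after the last tie.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((tp / n_pos, tp / (tp + fp)))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0  # trapezoid between consecutive recalls
    return area


print(round(pr_auc_trapezoid([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # 0.7917
```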

The Average F1 score is computed for each visit from the precision and recall of the recommended medication set and then averaged over all visits in the test set:

\text{Avg}(P_{t}^{(k)}) = \frac{| Y_{t}^{(k)} \cap \hat{Y}_{t}^{(k)} |}{| \hat{Y}_{t}^{(k)} |}, \quad \text{Avg}(R_{t}^{(k)}) = \frac{| Y_{t}^{(k)} \cap \hat{Y}_{t}^{(k)} |}{| Y_{t}^{(k)} |},

(25)

F1 = \frac{1}{\sum_{k}^{N} \sum_{t}^{T_{k}} 1} \sum_{k}^{N} \sum_{t}^{T_{k}} \frac{2 \times \text{Avg}(P_{t}^{(k)}) \times \text{Avg}(R_{t}^{(k)})}{\text{Avg}(P_{t}^{(k)}) + \text{Avg}(R_{t}^{(k)})},

(26)

where t denotes the tth visit and k the kth patient in the test set. Precision is the fraction of recommended medications that are correct, and recall is the fraction of ground-truth medications that are recommended.
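Equations (25) and (26) compute precision, recall, and F1 per visit and then average the F1 values. The sketch below mirrors that, using precision = hits / recommended and recall = hits / ground truth (F1 is symmetric in the two, so the average does not depend on which one is called precision); a visit where both are zero is assigned an F1 of 0 by assumption.

```python
from typing import List, Set

Visits = List[List[Set[str]]]  # [patient k][visit t] -> set of medication codes


def average_f1(y_true: Visits, y_pred: Visits) -> float:
    """Per-visit F1 of the recommended set, averaged over all test visits."""
    total, n_visits = 0.0, 0
    for true_visits, pred_visits in zip(y_true, y_pred):
        for Y, Y_hat in zip(true_visits, pred_visits):
            hits = len(Y & Y_hat)
            precision = hits / len(Y_hat) if Y_hat else 0.0  # correct / recommended
            recall = hits / len(Y) if Y else 0.0             # correct / ground truth
            denom = precision + recall
            total += 2 * precision * recall / denom if denom else 0.0
            n_visits += 1
    return total / n_visits


y_true = [[{"A", "B", "C"}, {"A"}], [{"B", "C"}]]
y_pred = [[{"A", "B", "D"}, {"A"}], [{"C"}]]
print(round(average_f1(y_true, y_pred), 4))  # 0.7778
```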

To measure medication safety, the DDI rate is defined as the proportion of recommended medication pairs that are known to interact:

\text{DDI rate} = \frac{\sum_{k}^{N} \sum_{t}^{T_{k}} \sum_{i, j} \left| \{ (c_{i}, c_{j}) \in \hat{Y}_{t}^{(k)} \mid (c_{i}, c_{j}) \in \mathcal{E}_{d} \} \right|}{\sum_{k}^{N} \sum_{t}^{T_{k}} \sum_{i, j} 1}.

(27)

Here, every medication pair $(c_{i}, c_{j})$ in a recommendation set $\hat{Y}_{t}^{(k)}$ enters the denominator, and the pair is counted in the numerator if it belongs to the edge set $\mathcal{E}_{d}$ of the DDI graph. Furthermore, the $\Delta$DDI rate % is defined as the relative change of the DDI rate with respect to the DDI rate of the EHRs in the test set:

\Delta \text{DDI rate}\,\% = \frac{\text{DDI rate} - \text{DDI rate (EHRs)}}{\text{DDI rate (EHRs)}} \times 100\%.

(28)

A negative value therefore means that the recommendations contain proportionally fewer interacting pairs than the prescriptions recorded in the EHRs.
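The DDI rate and its relative change can be computed as below. The sketch assumes that each unordered pair of distinct medications in a visit counts once and that the DDI graph is given as a set of undirected edges; both are assumptions about details the formulas leave open. The same `ddi_rate` function is applied to the recommended sets and to the ground-truth EHR sets, and `delta_ddi_percent` then compares the two; the edge set and visits in the example are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Visits = List[List[Set[str]]]  # [patient k][visit t] -> set of medication codes


def ddi_rate(visits: Visits, ddi_edges: Set[FrozenSet[str]]) -> float:
    """Share of unordered medication pairs within each visit that are DDI edges."""
    n_pairs = n_interacting = 0
    for patient in visits:
        for meds in patient:
            for a, b in combinations(sorted(meds), 2):
                n_pairs += 1
                if frozenset((a, b)) in ddi_edges:
                    n_interacting += 1
    return n_interacting / n_pairs if n_pairs else 0.0


def delta_ddi_percent(rate_model: float, rate_ehr: float) -> float:
    """Relative change of the model's DDI rate w.r.t. the EHR DDI rate, in percent."""
    return (rate_model - rate_ehr) / rate_ehr * 100.0


edges = {frozenset(("A", "B"))}            # hypothetical DDI graph with one edge
recommended = [[{"A", "B", "C"}]]          # pairs AB, AC, BC -> 1 of 3 interacts
ground_truth = [[{"A", "B"}, {"C", "D"}]]  # pairs AB, CD -> 1 of 2 interacts
r_model = ddi_rate(recommended, edges)
r_ehr = ddi_rate(ground_truth, edges)
print(round(r_model, 4), round(r_ehr, 4), round(delta_ddi_percent(r_model, r_ehr), 2))
# 0.3333 0.5 -33.33
```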

4.1.4. Evaluation Strategies

The datasets are randomly divided into training, validation, and test sets at a ratio of 2/3 : 1/6 : 1/6. The EHR graph $G_{e}$ and the DDI graph $G_{d}$ are constructed during pretraining, and Adam is used as the optimizer with an initial learning rate of 0.001. The best recommendation result on the validation set is obtained within 40 epochs. All methods are implemented in PyTorch 1.7.0 and trained on Ubuntu 18.04 with 12 GB of memory and an NVIDIA RTX 3090 GPU.
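The random 2/3 : 1/6 : 1/6 split can be sketched as below. Whether the split is made by patient or by visit is not stated here; the sketch assumes a patient-level split so that all visits of one patient stay in the same subset, and the function name `split_train_val_test` and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val_test(records: Sequence[T], seed: int = 0
                         ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into train/validation/test at roughly 2/3 : 1/6 : 1/6."""
    items = list(records)
    random.Random(seed).shuffle(items)  # local RNG, so the global random state is untouched
    n_train = len(items) * 2 // 3
    n_val = len(items) // 6
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_train_val_test(range(35886))  # 35,886 patients, as in Table 3
print(len(train), len(val), len(test))  # 23924 5981 5981
```

With the patient count of Table 3, the integer rounding happens to give subsets of 23,924, 5981, and 5981 patients. In PyTorch, the optimizer setting in the text would correspond to `torch.optim.Adam(model.parameters(), lr=0.001)`, where `model` stands for the network being trained.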

4.2. Experimental Results

The proposed model is evaluated in four parts. The first part compares the recommendation accuracy of the proposed model with that of the baselines. The second part compares the DDI rates of Med-tree and the baselines. The third part compares Med-tree with the baselines on medications with different occurrence frequencies. Finally, the fourth part compares Med-tree with the baselines on medical records of different lengths.

4.2.1. Prediction Performance

Table 4 compares the Jaccard Similarity Score, Average F1 score, Precision–Recall AUC, and DDI rate of Med-tree and the baselines on the MIMIC-III datasets. The proposed model achieves the best Jaccard, PR-AUC, and F1 scores among all methods: its Jaccard, F1, and PR-AUC are 1.02, 1.23, and 1.09 percentage points higher, respectively, than those of the latest method (PREMIER). In terms of the number of recommended medications, Med-tree recommends 14.98 drugs on average, which is closer to the true value of 14.68 than any baseline. These results show that the proposed model effectively improves the accuracy of medication recommendation while also taking drug safety into account. LR and Leap, which are traditional machine learning methods, are inferior to RETAIN and the other deep learning methods in Jaccard and F1. GAMENet and PREMIER, the most recent baselines, introduce DDI knowledge to reduce the DDI rate of the recommended combinations; however, their DDI rates (0.0780 and 0.0754) are still higher than that of the proposed model (0.0705). This verifies that combining the medical ontology tree with the learned medical feature representations improves the medication recommendation results.
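Because the scores in Table 4 lie between 0 and 1, the improvements over PREMIER quoted above are absolute differences, i.e., percentage points rather than relative gains. A quick check using only the Table 4 values, with the relative improvement shown alongside for contrast:

```python
# Jaccard, PR-AUC and F1 scores copied from Table 4.
PREMIER = {"Jaccard": 0.4641, "PR-AUC": 0.7055, "F1": 0.6186}
MED_TREE = {"Jaccard": 0.4743, "PR-AUC": 0.7164, "F1": 0.6309}

# Absolute gain in percentage points (the figures quoted in the text).
points = {m: round((MED_TREE[m] - PREMIER[m]) * 100, 2) for m in PREMIER}
# Relative gain in percent, for comparison.
relative = {m: round((MED_TREE[m] / PREMIER[m] - 1) * 100, 2) for m in PREMIER}

print(points)    # {'Jaccard': 1.02, 'PR-AUC': 1.09, 'F1': 1.23}
print(relative)  # {'Jaccard': 2.2, 'PR-AUC': 1.55, 'F1': 1.99}
```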

4.2.2. Evaluation of the DDIs

To verify the ability of the model to reduce the DDI rate of the recommended medication combinations, Med-tree and the baselines are compared under the Top 40, Top 60, Top 80, and Top 100 settings; Table 5 lists the results.

Leap is a rule-based approach in which the recommended medication combinations are selected from drugs previously prescribed by doctors, so its DDI rate is lower than that of the other baselines. However, Leap requires substantial manual effort and is therefore unsuitable for complex datasets. The RETAIN model does not incorporate DDI knowledge, which leads to a higher DDI rate in its recommendations. GAMENet and PREMIER learn DDI knowledge by constructing the EHR graph and the DDI graph, so their DDI rates remain relatively low under the Top 40 and Top 60 settings; however, their DDI rates inevitably increase as K grows. Med-tree outperforms all previous deep learning methods: although its $\Delta_{DDI}$ rises from −18.48% (Top 40) to −0.26% (Top 100), it stays below zero, i.e., the DDI rate of its recommendations remains below that of the EHRs. This indicates that Med-tree still accounts for drug safety in more complex settings, and that introducing the medical ontology tree enriches the representations of medical entities and improves the downstream medication recommendation task.
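Equation (28) can be inverted to recover the EHR reference rate implied by any row of Table 5. The check below uses only Top 40 values from Table 5 and is not part of the authors' evaluation; both rows imply essentially the same reference rate of about 0.0865, so a negative $\Delta_{DDI}$ simply means that the method's $R_{DDI}$ lies below this reference.

```python
# Top 40 rows of Table 5: (R_DDI, Delta_DDI in percent).
TOP40 = {"Leap": (0.0633, -26.81), "Med-tree": (0.0705, -18.48)}


def implied_ehr_rate(rate_model: float, delta_percent: float) -> float:
    """Invert Equation (28): DDI rate (EHRs) = DDI rate / (1 + Delta / 100)."""
    return rate_model / (1.0 + delta_percent / 100.0)


for name, (rate, delta) in TOP40.items():
    print(name, round(implied_ehr_rate(rate, delta), 4))
# Leap 0.0865
# Med-tree 0.0865
```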

4.2.3. Evaluation for Unbalanced Medications

Because of the particular nature of EHRs, drugs occur with very different frequencies, which makes low-frequency drugs difficult to recommend. The proposed model is designed to mitigate the influence of these frequency differences through the medical ontology tree and the memory bank. Figure 6a shows the number of medications in different frequency bands: 58 of the 145 medication types appear fewer than 100 times, while nearly 40 types appear more than 1000 times. Figure 6b shows the average F1 scores of the recommendations in each frequency range, indicating that the proposed model clearly improves the recommendation of less frequent medications compared with the other methods. In addition, several frequency bands have higher average F1 scores, possibly because they contain more medications dedicated to specific diseases, such as antihypertensive and antidiabetic drugs.
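The per-band analysis behind Figure 6b can be approximated as follows. The text does not define how the average F1 of a frequency range is computed; this sketch assumes that frequency is counted over the training visits, that each medication gets a binary F1 over all test visits, and that these scores are averaged within each band. The band edges and drug names in the example are arbitrary and are not those of Figure 6.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def drug_f1(y_true: List[Set[str]], y_pred: List[Set[str]], drug: str) -> float:
    """Binary F1 of one medication across all test visits."""
    tp = fp = fn = 0
    for Y, Y_hat in zip(y_true, y_pred):
        if drug in Y_hat and drug in Y:
            tp += 1
        elif drug in Y_hat:
            fp += 1
        elif drug in Y:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def f1_by_frequency_band(train_visits: List[Set[str]], y_true: List[Set[str]],
                         y_pred: List[Set[str]], edges: Sequence[float]
                         ) -> Dict[Tuple[float, float], float]:
    """Average per-drug F1 within [lo, hi) bands of training-set frequency."""
    freq = Counter(drug for visit in train_visits for drug in visit)
    scores: Dict[Tuple[float, float], List[float]] = {}
    for drug, count in freq.items():
        for lo, hi in zip(edges, edges[1:]):
            if lo <= count < hi:
                scores.setdefault((lo, hi), []).append(drug_f1(y_true, y_pred, drug))
                break
    return {band: sum(v) / len(v) for band, v in sorted(scores.items())}


train = [{"A", "B"}, {"A"}, {"A", "C"}]  # A appears 3 times, B and C once
y_true = [{"A", "B"}, {"C"}]
y_pred = [{"A"}, {"A", "C"}]
print(f1_by_frequency_band(train, y_true, y_pred, edges=[0, 2, 10]))
# {(0, 2): 0.5, (2, 10): 0.6666666666666666}
```

Medications that never occur in the training visits fall outside every band in this sketch.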

4.2.4. Evaluation for EHRs of Different Length

As shown in Table 3, the maximum number of visits per patient in the MIMIC-III datasets is 29. Since patients have different numbers of admissions, the influence of the sequence length must be considered. Figure 7 shows the results for EHRs of different temporal lengths on the MIMIC-III datasets. Leap is a rule-based model whose recommendations do not depend on the length of the EHRs, which results in low F1 scores. GAMENet uses a GRU to learn the temporal characteristics of a patient's medical records, which greatly improves the F1 scores compared with Leap; however, its F1 scores gradually decrease as the EHR sequence length increases. PREMIER improves on GAMENet by adding two attention mechanisms that promote the learning of temporal information from medical records, resulting in relatively high F1 scores. Med-tree performs best among all methods, and its F1 score remains high as the sequence length increases, which indicates that the proposed model better captures the temporal dependencies in a patient's medical records.
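A comparison like Figure 7 requires grouping the evaluation by record length. The sketch assumes that the length of a record is the patient's number of visits and that every visit of that patient contributes its visit-level F1 to the group; the paper does not state these details here, and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def visit_f1(Y: Set[str], Y_hat: Set[str]) -> float:
    """F1 between the ground-truth and recommended sets of one visit."""
    hits = len(Y & Y_hat)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(Y_hat), hits / len(Y)
    return 2 * precision * recall / (precision + recall)


def f1_by_record_length(y_true: List[List[Set[str]]],
                        y_pred: List[List[Set[str]]]) -> Dict[int, float]:
    """Average visit-level F1, grouped by the number of visits in the patient's record."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for true_visits, pred_visits in zip(y_true, y_pred):
        length = len(true_visits)
        for Y, Y_hat in zip(true_visits, pred_visits):
            groups[length].append(visit_f1(Y, Y_hat))
    return {length: sum(s) / len(s) for length, s in sorted(groups.items())}


y_true = [[{"A", "B"}, {"A"}], [{"A", "B"}]]
y_pred = [[{"A", "B"}, {"B"}], [{"A"}]]
print(f1_by_record_length(y_true, y_pred))  # {1: 0.6666666666666666, 2: 0.5}
```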

4.3. Case Study

To observe the recommendation ability of the proposed model concretely, Med-tree is compared with the baselines on case samples randomly selected from the MIMIC-III datasets. Table 6 shows a typical case: the recommendation results for a patient's last visit, in which 15 drugs were actually prescribed. Med-tree performs best: it correctly recommends 14 of the 15 drugs and misses only 1, although it also recommends several unseen drugs (see Table 6). The previous best method, PREMIER, recommends one fewer correct drug than Med-tree. None of the models recommends “Anxiolytics”, which may be due to the specific prescribing habits of certain doctors for this drug. In summary, this case is consistent with the quantitative results and illustrates that Med-tree produces more accurate recommendations than the other methods on the MIMIC-III datasets.
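The counts reported in Table 6 follow from three set operations on the actually prescribed and the recommended medications. A small sketch with a hypothetical last visit; the function name and the drug-class sets are illustrative only.

```python
from typing import Dict, Set


def case_summary(actual: Set[str], recommended: Set[str]) -> Dict[str, int]:
    """Correct / unseen / missed counts in the style of Table 6."""
    return {
        "correct": len(actual & recommended),  # recommended and actually prescribed
        "unseen": len(recommended - actual),   # recommended but not prescribed
        "missed": len(actual - recommended),   # prescribed but not recommended
    }


# Hypothetical last visit; the drug-class names are only illustrative.
actual = {"Anxiolytics", "Antigout", "Potassium", "Cardiac glycosides"}
recommended = {"Antigout", "Potassium", "Cardiac glycosides", "Diuretics"}
print(case_summary(actual, recommended))  # {'correct': 3, 'unseen': 1, 'missed': 1}
```

By construction, correct + missed equals the number of drugs actually prescribed, which is why the correct and missed counts in every row of Table 6 add up to 15.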

5. Conclusion and Future Work

In this work, we propose a novel medication recommendation method that enhances the learning of latent structural and temporal features and reduces DDIs in the recommendation results. For structural correlation, we construct the medical ontology tree and apply the GAT model to learn the target features, which effectively captures the internal correlations between medical events. For temporal characteristics, GRU models combined with attention mechanisms are applied, which improves the accuracy of recommendation. Finally, the memory bank and the dynamic memory mechanism are introduced to reduce the DDI rate of the recommendation results. Experimental results show that the proposed model effectively captures correlation and temporal features and outperforms existing methods. Furthermore, the case study suggests that the proposed model can make more accurate and reasonable medication recommendations for patients in practical applications.

EHRs alone remain insufficient for personalized and accurate drug recommendation. Owing to missing, inaccurate, or contradictory information, among other reasons, EHRs contain many uncertainties for medication recommendation. Moreover, this work only considers the influence of diagnoses and procedures, so there is still considerable room for improvement in mining features from EHRs. In the future, we will consider incorporating the original textual information of the EHRs and focus on how to effectively model the fine-grained temporal evolution in EHRs. Medical artificial intelligence research based on EHRs can also be extended to a wider range of applications: beyond medication recommendation, we plan to apply EHRs to medical knowledge question answering, disease prediction, and other tasks.

Author Contributions

W.Y., L.Z. (Lei Zhang), L.Z. (Lijuan Zhang), J.H. and J.W. contributed to the conception of the study; W.Y. performed the experiment; W.Y., L.Z. (Lijuan Zhang) and N.X. contributed significantly to analysis and manuscript preparation; W.Y., L.Z. (Lijuan Zhang) and J.W. performed the data analyses and wrote the manuscript; L.Z. (Lei Zhang) and N.X. helped perform the analysis with constructive discussions. All authors have read and agreed to the published version of the manuscript.

Funding

The research was partially funded by Zhejiang Province Key Research and Development Project (2020C03071, 2021C03145).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, M.; Wang, M.; Yu, F.; Yang, Y.; Walker, J.; Mostafa, J. A systematic review of automatic text summarization for biomedical literature and EHRs. J. Am. Med. Inform. Assoc. 2021, 28, 2287–2297.
2. Ramachandran, S.; Kiruthika, O.O.; Ramasamy, A.; Vanaja, R.; Mukherjee, S. A review on blockchain-based strategies for management of electronic health records (EHRs). In Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 10–12 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 341–346.
3. Tutty, M.A.; Carlasare, L.E.; Lloyd, S.; Sinsky, C.A. The complex case of EHRs: Examining the factors impacting the EHR user experience. J. Am. Med. Inform. Assoc. 2019, 26, 673–677.
4. Gao, Y.; Xiang, X.; Xiong, N.; Huang, B.; Lee, H.J.; Alrifai, R.; Jiang, X.; Fang, Z. Human action monitoring for healthcare based on deep learning. IEEE Access 2018, 6, 52277–52285.
5. Miller, D.D.; Brown, E.W. Artificial intelligence in medical practice: The question to the answer? Am. J. Med. 2018, 131, 129–133.
6. Xie, S.; Yu, Z.; Lv, Z. Multi-disease prediction based on deep learning: A survey. CMES Comput. Model. Eng. Sci. 2021, 127, 3.
7. Yadav, P.; Steinbach, M.; Kumar, V.; Simon, G. Mining electronic health records (EHRs): A survey. ACM Comput. Surv. 2018, 50, 1–40.
8. Lin, C.; He, Y.X.; Xiong, N. An energy-efficient dynamic power management in wireless sensor networks. In Proceedings of the 2006 Fifth International Symposium on Parallel and Distributed Computing, Timisoara, Romania, 6–9 July 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 148–154.
9. Xia, F.; Hao, R.; Li, J.; Xiong, N.; Yang, L.T.; Zhang, Y. Adaptive GTS allocation in IEEE 802.15.4 for real-time wireless sensor networks. J. Syst. Archit. 2013, 59, 1231–1242.
10. Elhoseny, M.; Shankar, K.; Uthayakumar, J. Intelligent diagnostic prediction and classification system for chronic kidney disease. Sci. Rep. 2019, 9, 1–14.
11. Erraguntla, M.; Zapletal, J.; Lawley, M. Framework for Infectious Disease Analysis: A comprehensive and integrative multi-modeling approach to disease prediction and management. Health Inform. J. 2019, 25, 1170–1187.
12. Wu, C.; Luo, C.; Xiong, N.; Zhang, W.; Kim, T.H. A greedy deep learning method for medical disease analysis. IEEE Access 2018, 6, 20021–20030.
13. Wu, C.; Ju, B.; Wu, Y.; Lin, X.; Xiong, N.; Xu, G.; Li, H.; Liang, X. UAV autonomous target search based on deep reinforcement learning in complex disaster scene. IEEE Access 2019, 7, 117227–117245.
14. Nyamabo, A.K.; Yu, H.; Shi, J.Y. SSI–DDI: Substructure–substructure interactions for drug–drug interaction prediction. Briefings Bioinform. 2021, 22, bbab133.
15. Ren, Z.H.; Yu, C.Q.; Li, L.P.; You, Z.H.; Guan, Y.J.; Wang, X.F.; Pan, J. BioDKG–DDI: Predicting drug–drug interactions based on drug knowledge graph fusing biochemical information. Briefings Funct. Genom. 2022, 21, 216–229.
16. John, A.; Vasudevan, V.; Ilyas, H.A. Medication recommendation system based on clinical documents. In Proceedings of the 2016 International Conference on Information Science (ICIS), Dublin, Ireland, 11–14 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 180–184.
17. Syed-Abdul, S.; Nguyen, A.; Huang, F.; Jian, W.S.; Iqbal, U.; Yang, V.; Hsu, M.H.; Li, Y.C. A smart medication recommendation model for the electronic prescription. Comput. Methods Programs Biomed. 2014, 117, 218–224.
18. Ghasemi, S.H.; Etminani, K.; Dehghan, H.; Eslami, S.; Hasibian, M.R.; Vakili-Arki, H.; Saberi, M.R.; Aghabagheri, M.; Namayandeh, S.M. Design and Evaluation of a Smart Medication Recommendation System for the Electronic Prescription. In Proceedings of the dHealth, Vienna, Austria, 28–29 May 2019; pp. 128–135.
19. Choi, E.; Bahadori, M.T.; Song, L.; Stewart, W.F.; Sun, J. GRAM: Graph-based attention model for healthcare representation learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 787–795.
20. Gao, C.; Sun, H.; Wang, T.; Tang, M.; Bohnen, N.I.; Müller, M.L.; Herman, T.; Giladi, N.; Kalinin, A.; Spino, C.; et al. Model-based and model-free machine learning techniques for diagnostic prediction and classification of clinical outcomes in Parkinson’s disease. Sci. Rep. 2018, 8, 1–21.
21. Choi, E.; Bahadori, M.T.; Sun, J.; Kulas, J.; Schuetz, A.; Stewart, W. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. Adv. Neural Inf. Process. Syst. 2016, 29, 3504–3512.
22. Shang, J.; Xiao, C.; Ma, T.; Li, H.; Sun, J. GAMENet: Graph augmented memory networks for recommending medication combination. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1126–1133.
23. Wang, Y.; Chen, W.; Pi, D.; Yue, L. Adversarially regularized medication recommendation model with multi-hop memory network. Knowl. Inf. Syst. 2021, 63, 125–142.
24. Yang, C.; Xiao, C.; Ma, F.; Glass, L.; Sun, J. SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Montreal, QC, Canada, 19–27 August 2021; pp. 3735–3741.
25. Farinhas, A.; Martins, A.F.; Aguiar, P.M. Multimodal continuous visual attention mechanisms. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1047–1056.
26. Liang, Y.; Li, H.; Guo, B.; Yu, Z.; Zheng, X.; Samtani, S.; Zeng, D.D. Fusion of heterogeneous attention mechanisms in multi-view convolutional neural network for text classification. Inf. Sci. 2021, 548, 295–312.
27. Fang, Y.; Jiang, J.; He, Y. Traffic Speed Prediction Based on LSTM-Graph Attention Network (L-GAT). In Proceedings of the 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Changsha, China, 26–28 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 788–793.
28. Cai, P.; Wang, H.; Sun, Y.; Liu, M. DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks. IEEE Trans. Intell. Transp. Syst. 2022, 8, 1–11.
29. Qin, L.; Li, Z.; Che, W.; Ni, M.; Liu, T. Co-GAT: A co-interactive graph attention network for joint dialog act recognition and sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 13709–13717.
30. Su, C.; Gao, S.; Li, S. GATE: Graph-attention augmented temporal neural network for medication recommendation. IEEE Access 2020, 8, 125447–125458.
31. Yao, Y.; Xiong, N.; Park, J.H.; Ma, L.; Liu, J. Privacy-preserving max/min query in two-tiered wireless sensor networks. Comput. Math. Appl. 2013, 65, 1318–1325.
32. Zhao, J.; Huang, J.; Xiong, N. An effective exponential-based trust and reputation evaluation system in wireless sensor networks. IEEE Access 2019, 7, 33859–33869.
33. Chen, Z.; Marple, K.; Salazar, E.; Gupta, G.; Tamil, L. A physician advisory system for chronic heart failure management based on knowledge patterns. Theory Pract. Log. Program. 2016, 16, 604–618.
34. Slăvescu, R.R.; Groşan, A.C.; Slăvescu, K.C. Towards Assisting Medical Decisions by Using Rule Based Protocols and Semantic Resources. In Proceedings of the International Conference on Advancements of Medicine and Health Care through Technology, Cluj-Napoca, Romania, 5–7 June 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 31–36.
35. Al-Ajmi, N.; Almulla, M.A. Rule-Based Expert System for Headache Diagnosis and Medication Recommendation. Int. J. Health Med. Eng. 2020, 14, 388–391.
36. Chen, Y.; Zhou, L.; Pei, S.; Yu, Z.; Chen, Y.; Liu, X.; Du, J.; Xiong, N. KNN-BLOCK DBSCAN: Fast clustering for large-scale data. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3939–3953.
37. Huang, S.; Zeng, Z.; Ota, K.; Dong, M.; Wang, T.; Xiong, N.N. An intelligent collaboration trust interconnections system for mobile information control in ubiquitous 5G networks. IEEE Trans. Netw. Sci. Eng. 2020, 8, 347–365.
38. Wang, Y.; Chen, W.; Pi, D.; Yue, L.; Wang, S.; Xu, M. Self-Supervised Adversarial Distribution Regularization for Medication Recommendation. In Proceedings of the International Joint Conferences on Artificial Intelligence Organization, Online, 19–26 August 2021.
39. Shang, J.; Ma, T.; Xiao, C.; Sun, J. Pre-training of graph augmented transformers for medication recommendation. arXiv 2019, arXiv:1906.00346.
40. Choi, E.; Xu, Z.; Li, Y.; Dusenberry, M.; Flores, G.; Xue, E.; Dai, A. Learning the graphical structure of electronic health records with graph convolutional transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 606–613.
41. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
42. Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680.
43. Esteban, C.; Tresp, V.; Yang, Y.; Baier, S.; Krompaß, D. Predicting the co-evolution of event and knowledge graphs. In Proceedings of the 19th International Conference on Information Fusion (FUSION), Heidelberg, Germany, 5–8 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 98–105.
44. Bhoi, S.; Li, L.M.; Hsu, W. PREMIER: Personalized recommendation for medical prescriptions from electronic records. arXiv 2020, arXiv:2008.13569.

Figure 1. The medication recommendation system is built from multiple data sources that can help physicians develop personalized drug combinations.

Figure 2. A standardized sample of the EHRs.

Figure 3. The structure of the medical ontology tree.

Figure 4. The framework of Med-tree. The standardized EHRs are divided into multi-hot representations $c_{d}^{t}$, $c_{p}^{t}$, and $c_{m}^{t}$. Then, $c_{d}^{t}$ and $c_{p}^{t}$ are input into the medical ontology tree to generate the embeddings $e_{d}^{t}$ and $e_{p}^{t}$ using Equations (1)–(4). Next, the GRU models with attention mechanisms generate $q_{d}^{t}$ and $q_{p}^{t}$, as described in Equations (5)–(9). After that, the outputs $o_{b}^{t}$ and $o_{d}^{t}$ are generated by integrating the key-value pairs stored in dynamic memory using Equations (10)–(17). Finally, the query $q^{t}$ and the memory outputs $o_{b}^{t}$ and $o_{d}^{t}$ are activated by Equation (18) to obtain the recommended drug combination.


Figure 5. A sample construction of the medical ontology tree.

Figure 6. (a) is the total number of medications in different frequency ranges from MIMIC-III datasets and (b) is the comparison of the average F1 score recommended by different methods (Leap [43], RETAIN [21], GAMENet [22], PREMIER [44], and Med-tree) in different frequency ranges on MIMIC-III datasets.

Figure 7. The comparisons of average F1 score recommended by different methods (Leap [43], RETAIN [21], GAMENet [22], PREMIER [44], and Med-tree) for EHRs with different temporal lengths on MIMIC-III datasets.

Table 1. Notations used in Med-tree.

Notation	Description
$X^{n}$	the representation of the standardized EHRs
$X_{1 : t - 1}$	the historical visit representation of tth visit
$x_{t}$	the representation of tth visit
$c_{d}^{t}, c_{p}^{t}, c_{m}^{t}$	the diagnosis codes, procedure codes and medication codes of tth visit
$G_{e}, G_{d}$	the EHR and DDI graph
$A_{e}, A_{d}$	the adjacency matrices of $G_{e}$ and $G_{d}$
$c_{i}, c_{i}^{'}, e_{i}$	the representation of nodes changed in the medical ontology tree
$e_{d}^{t}, e_{p}^{t}$	the more comprehensive representations for diagnoses and procedures through the Ontology Embedding
$q_{d}^{t}, q_{p}^{t}$	the representations of diagnoses and procedures through the Temporal Dependency
$q^{t}$	the query of the dynamic memory
$M_{b}$	the memory bank
$M_{d}^{t}$	the dynamic memory
$M_{d, k}^{t}, M_{d, v}^{t}$	the key vector and the value vector of the dynamic memory
$o_{b}^{t}, o_{d}^{t}$	the memory outputs through the Knowledge Memory
${\hat{y}}_{t}$	the multi-label medication recommendation of tth visit
$\hat{Y}$	the recommended medication set
$Y$	the ground truth of the medication set

Table 2. The description of each node in Figure 3.

Node	ICD-9 Code	Description
$c_{0}$	∖	Root
$c_{2}$	410∼414	Ischemic heart disease
$c_{5}$	413	Angina pectoris
$c_{8}$	413.0	Angina decubitus
$c_{9}$	413.1	Prinzmetal Angina

Table 3. Statistical results of MIMIC-III datasets.

Type			MIMIC-III
patients			35,886
- single-visit			28,936
- multi-visit			6950
clinical entities			3529
- diagnosis			1958
- procedure			1426
- medication			145
max # of visits			29
avg # of visits			2.36
avg # of diagnosis			10.51
avg # of procedure			3.84
avg # of medication			8.80

Table 4. Collaborative medication recommendation performance of different methods obtained on MIMIC-III datasets. The average number of medicines on the test set is 14.61.

Methods	Jaccard	PR-AUC	F1	$R_{DDI}$ (Top 40)	Avg # of Med
LR	0.4087	0.6739	0.5669	0.0782	11.37
Leap [43]	0.3911	0.5699	0.5486	0.0633	15.96
RETAIN [21]	0.4140	0.6612	0.5746	0.0863	18.46
GAMENet [22]	0.4479	0.6883	0.6053	0.0780	13.93
PREMIER [44]	0.4641	0.7055	0.6186	0.0754	13.96
Med-tree (w/o DDI)	0.4738	0.7157	0.6306	0.0767	15.03
Med-tree (w/o ontology)	0.4691	0.7098	0.6220	0.0726	14.34
Med-tree	0.4743	0.7164	0.6309	0.0705	14.98

Table 5. Comparison of the DDI rates of Med-tree and the baselines under the Top 40, Top 60, Top 80, and Top 100 settings.

Top K	DDI	Leap [43]	RETAIN [21]	GAMENet [22]	PREMIER [44]	Med-Tree
40	$R_{D D I}$	0.0633	0.0863	0.0780	0.0754	0.0705
	$Δ_{D D I}$	−26.81%	−0.21%	−9.81%	−12.82%	−18.48%
60	$R_{D D I}$	0.1923	0.2065	0.1985	0.1989	0.1869
	$Δ_{D D I}$	−3.15%	4.00%	−0.03%	0.17%	−5.87%
80	$R_{D D I}$	0.2931	0.2985	0.3000	0.3095	0.2873
	$Δ_{D D I}$	−0.08%	8.77%	2.28%	5.52%	−2.05%
100	$R_{D D I}$	0.3964	0.3979	0.4035	0.4201	0.3880
	$Δ_{D D I}$	1.90%	12.29%	3.73%	7.99%	−0.26%

Table 6. Results of the last recommended medications for a patient with a total of five visits on MIMIC-III datasets. Here, “unseen” indicates the drugs are not appearing in the actual recommendation results, while “missed” refers to the drugs that should be recommended in the actual situation but are not recommended.

Methods	Recommended Medication Combination (the Last Visit)
Leap [43]	8 correct + 2 unseen + 7 missed (Antigout, Anxiolytics, Cardiac glycosides, …)
RETAIN [21]	10 correct + 4 unseen + 5 missed (Anxiolytics, Cardiac glycosides, Potassium, …)
GAMENet [22]	12 correct + 2 unseen + 3 missed (Antigout, Anxiolytics, Dopaminergic agents, …)
PREMIER [44]	13 correct + 4 unseen + 2 missed (Anxiolytics, Cardiac glycosides, …)
Med-tree	14 correct + 4 unseen + 1 missed (Anxiolytics)

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

