*Article* **Biomedical Relation Extraction Using Dependency Graph and Decoder-Enhanced Transformer Model**

**Seonho Kim 1, Juntae Yoon 2,\* and Ohyoung Kwon 3,\***


**Abstract:** The identification of drug–drug and chemical–protein interactions is essential for understanding unpredictable changes in the pharmacological effects of drugs, elucidating disease mechanisms, and developing therapeutic drugs. In this study, we extract drug-related interactions from the DDI (Drug–Drug Interaction) Extraction-2013 Shared Task dataset and the BioCreative ChemProt (Chemical–Protein) dataset using various transfer transformer models. We propose BERTGAT, which uses a graph attention network (GAT) to take the local structure of sentences and node embedding features into account under the self-attention scheme, and investigate whether incorporating syntactic structure can help relation extraction. In addition, we suggest T5slim\_dec, which adapts the autoregressive generation task of T5 (text-to-text transfer transformer) to the relation classification problem by removing the self-attention layer in the decoder block. Furthermore, we evaluated the potential of GPT-3 (Generative Pre-trained Transformer) for biomedical relation extraction using GPT-3 variant models. As a result, T5slim\_dec, whose decoder is tailored to classification problems within the T5 architecture, demonstrated very promising performance on both tasks. It achieved an accuracy of 91.15% on the DDI dataset and an accuracy of 94.29% for the CPR (Chemical–Protein Relation) class group in the ChemProt dataset. However, BERTGAT did not show a significant performance improvement for relation extraction. Our results indicate that transformer-based approaches, which focus only on relationships between words, can implicitly learn to understand language well without additional knowledge such as structural information.

**Keywords:** DDI (drug–drug interaction); CPR (chemical–protein relation); transformer; self-attention; GAT (graph-attention network); relation extraction; ChemProt; T5 (text-to-text transfer transformer)

## **1. Introduction**

With the rapid progress in biomedical research, efficiently extracting useful information from the biomedical literature has become a very challenging issue. According to LitCOVID [1], over 1000 articles were published in just three months from December 2019, when COVID-19 was first reported, to March 2020. PubMed [2], a biomedical literature retrieval system, contains more than 35 million biomedical articles. Therefore, life science researchers cannot keep up with all journals relevant to their areas of interest or select useful information from the latest research. To manage biomedical knowledge, curated databases such as UniProt [3], DrugBank [4], CTD [5], and IUPHAR/BPS [6] are constantly being updated. However, updating or developing a database manually is time-consuming and labor-intensive, and the pace is often slow, which makes automatic knowledge extraction and mining from the biomedical literature highly desirable. Consequently, many pieces of valuable information involving complex relationships between entities still remain unstructured and hidden in raw text.

Recently, AI algorithms have been used to analyze complex forms of medical and life science data to augment human knowledge and to develop protocols for disease prevention and treatment. Moreover, deep learning techniques have been actively applied to various biomedical fields such as drug and personalized medicine development, clinical decision support systems, patient monitoring, and interaction extraction between biomedical entities. For example, protein–protein interactions between biomedical entities are crucial for understanding various human life phenomena and diseases. Many biochemistry studies go beyond the molecular level of individual genes and focus on the networks and signaling pathways that connect interacting groups or individuals. Similarly, interest in the integration and curation of relationships between biological and drug/chemical entities from text is increasing.

One valuable piece of information about drugs and chemical compounds is how they interact with certain biomedical entities, in particular genes and proteins. As mentioned in the study [1], metabolic relations are relevant to the construction/curation of metabolic pathways and to drug metabolism, including drug–drug interactions and adverse reactions. Inhibitor/activator associations are relevant to drug design and systems biology approaches. Antagonist and agonist interactions help in drug design, drug discovery, and understanding mechanisms of action. A drug–drug interaction (DDI) can be defined as a change in the effects of one drug caused by the presence of another drug. Since such information helps prevent dangers or side effects caused by drugs, it is also important to extract this knowledge from pharmaceutical papers.

Compared to other fields, the texts of biomedical publications are more easily accessible owing to the publicly available database MEDLINE [7] and the search system PubMed [2]. However, the complexity and ambiguity of biomedical text are much greater than those of general text. One characteristic of biomedical text is that multiple biomedical entities appear within a single sentence, and one entity may interact with multiple entities. In particular, it is very difficult to infer which pairs express actual relations because all entities in a single sentence share the same context, as shown in Figures 1 and 2. In this work, relation extraction is simplified as a classification task, in which the problem is to classify which interaction exists between given pre-recognized entities at the sentence level.


**Figure 1.** Examples of ChemProt interactions.

The main objectives of this study are as follows: (1) we apply transformer-based transfer learning models, which have achieved impressive performance and progress in recent years across a wide range of NLP tasks, to the detection of drug-related interactions in biomedical text, and aim to demonstrate which models are effective for biomedical relation extraction. Transformers generate abstract contextual representations of tokens very well by incorporating the inter-relations of all tokens in a sequence through the concept of self-attention. As baseline models, three dominant types of transformers are chosen to establish a performance benchmark for our proposed methods: an encoder-only model, Google's BERT (Bidirectional Encoder Representations from Transformers) [8]; a decoder-only model, OpenAI's GPT-3 (Generative Pre-trained Transformer) [9]; and an encoder–decoder model, Google's T5 (Text-To-Text Transfer Transformer) [10]. All experiments are conducted on the ChemProt corpus [11] and the DDI corpus [12], which are collections of text documents containing information about chemical/drug–protein/gene interactions and drug–drug interactions, respectively.
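To make the three baseline families concrete, the following minimal sketch, written against the Hugging Face `transformers` library, loads one representative of each type; the checkpoints shown (`bert-base-uncased`, `gpt2` as a locally available stand-in for GPT-3, and `t5-base`) and the label count are illustrative, not the exact models used in our experiments.

```python
from transformers import (
    AutoModelForCausalLM,                # decoder-only (GPT-style)
    AutoModelForSeq2SeqLM,               # encoder-decoder (T5-style)
    AutoModelForSequenceClassification,  # encoder-only (BERT-style)
)

# Encoder-only: predicts an interaction label from the pooled representation.
bert = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)

# Decoder-only: GPT-3 itself is API-only, so gpt2 serves as a stand-in here;
# labels would be produced as generated text.
gpt = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder: T5 casts classification as text-to-text generation.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
```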

(2) The second objective of this study is to investigate the effect of the syntactic structure of sentences on biomedical relation extraction by incorporating dependencies between words to enhance the self-attention mechanism. According to previous studies, syntactic clues such as the grammatical dependencies of a sentence help relation extraction. Some studies [13] have demonstrated that pruning the parse tree, by removing tokens outside the subtree rooted at the lowest common ancestor of the two entities or by keeping only the SDP (shortest dependency path) word sequence between the two entities, can improve relation extraction performance by eliminating irrelevant information from the sentence. However, such a simplified representation, which considers only the SDP word sequence, may fail to capture contextual information, such as the presence of negation, that could be crucial for relation extraction [14].
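As an illustration of the SDP idea, the following sketch extracts the shortest dependency path between two entity tokens using spaCy's parser and networkx; the sentence, model name, and entity choices are illustrative only.

```python
import networkx as nx
import spacy  # assumes the en_core_web_sm model is installed

nlp = spacy.load("en_core_web_sm")
doc = nlp("Mineral oil may reduce the absorption of fat-soluble vitamins.")

# Treat dependency arcs as undirected edges over token indices.
graph = nx.Graph([(tok.i, child.i) for tok in doc for child in tok.children])

e1 = next(t.i for t in doc if t.text == "oil")       # head of entity 1
e2 = next(t.i for t in doc if t.text == "vitamins")  # head of entity 2
sdp = nx.shortest_path(graph, source=e1, target=e2)
print([doc[i].text for i in sdp])  # SDP word sequence between the entities
```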

In this work, we propose BERTGAT, a newly developed structure-enhanced encoding model that combines the graph attention network (GAT) [15] with BERT. We investigate its effectiveness for relation extraction by taking into account not only word token information but also the grammatical relevance between words within the attention scheme. To incorporate syntactic information, each dependency tree is converted into a corresponding adjacency matrix. The GAT model uses an attention mechanism to calculate the importance of words within the input graph, which can allow more relevant information to be extracted.
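The following is a minimal single-head GAT layer sketch in PyTorch, assuming node features `H` (one row per token) and a 0/1 dependency adjacency matrix `A` with self-loops; it illustrates the attention scheme over the dependency graph, not the exact BERTGAT layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)   # shared linear map
        self.a = nn.Linear(2 * d_out, 1, bias=False)  # attention scorer

    def forward(self, H, A):
        Wh = self.W(H)                                 # (n, d_out)
        n = Wh.size(0)
        # Score every pair [Wh_i || Wh_j], then mask non-edges so each word
        # attends only to its dependency neighbors (and itself).
        pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                           Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))    # (n, n) raw scores
        e = e.masked_fill(A == 0, float("-inf"))
        alpha = torch.softmax(e, dim=-1)               # attention weights
        return F.elu(alpha @ Wh)                       # updated node features

# Toy usage: six tokens, one dependency arc between tokens 0 and 1.
H = torch.randn(6, 16)
A = torch.eye(6)
A[0, 1] = A[1, 0] = 1.0
out = GATLayer(16, 8)(H, A)  # (6, 8) structure-aware representations
```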

(3) Finally, we tailor T5, the encoder–decoder transformer that has demonstrated high performance in text generation tasks, to efficiently handle discriminative, non-autoregressive tasks such as our relation classification problem. Since the T5 transformer is designed for text-to-text tasks such as text generation and machine translation, its decoder generates output tokens autoregressively based on previous tokens. This can be inefficient for classification tasks, where only a single label is required; consequently, the decoder plays a limited role in classification. We suggest T5slim\_dec, which determines the interaction category after removing the self-attention block over the T5 decoder input.
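The following conceptual sketch illustrates this idea (it is not the exact T5slim\_dec implementation): a decoder block without the self-attention sub-layer, where the decoder input attends only to the encoder output via cross-attention and then passes through the feed-forward sub-layer. Layer norms and dropout are simplified for brevity.

```python
import torch
import torch.nn as nn

class SlimDecoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, dec_input, enc_output):
        # Cross-attention only: no masked self-attention over decoder tokens.
        attn_out, _ = self.cross_attn(dec_input, enc_output, enc_output)
        x = self.norm1(dec_input + attn_out)
        return self.norm2(x + self.ffn(x))

# Toy usage: a single decoder position reads a 20-token encoded sentence.
block = SlimDecoderBlock(d_model=64, n_heads=4, d_ff=256)
dec = torch.randn(2, 1, 64)   # (batch, one label position, d_model)
enc = torch.randn(2, 20, 64)  # encoder output
out = block(dec, enc)         # (2, 1, 64)
```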

The rest of the paper is organized as follows. In Section 2, related work in the field of biomedical relation extraction is presented. Section 3 briefly describes the datasets and provides the background on transformers needed for the rest of the paper. Section 4 introduces the baseline models and the proposed approaches in detail. Data statistics, results, and analysis are discussed in Section 5, along with comparisons with state-of-the-art approaches and limitations. Finally, conclusions and outlooks are reported in Section 6.

## **2. Related Works**

In the DDI (drug–drug interaction) extraction task [12], traditional deep learning systems such as convolutional neural networks (CNNs) [16] and recurrent neural networks (RNNs) [17] have shown better performance than feature-based approaches. More recently, transformer-based models, including BERT [8], RoBERTa [18], MASS [19], BART [20], MT-DNN [21], GPT-3 [9], and T5 [10], have demonstrated remarkable performance improvements across various NLP (Natural Language Processing) tasks by obtaining contextualized token representations through self-supervised learning on large-scale raw text, for example with masked language modeling. The transformer model originated from the "Attention Is All You Need" paper [22] by Google Brain and Google Research. They also applied transfer learning, in which weights pretrained on a large-scale text dataset with an objective such as masked language modeling, next-sentence prediction, or next-token prediction are applied to a downstream task by fine-tuning the pretrained model on that task. As a result, pretrained language models tend to perform better than models trained from scratch with no prior knowledge because they utilize previously learned representations.

Pretraining on large-scale raw text has also significantly improved performance in the biomedical domain. BERT, based on the encoder structure, and its variants such as SCIBERT [23], BioBERT [24], and PubMedBERT [25] have been successfully applied in the biomedical field. Since these methods consider only the context around entities in the text, some research has encoded additional knowledge beyond the input tokens, resulting in more informative input representations for downstream tasks [26,27].

Asada et al. [26] explored the impact of incorporating drug-related heterogeneous information on DDI extraction and achieved an F-score of 85.40, which they reported as state-of-the-art performance. They constructed HKG (heterogeneous knowledge graph) embedding vectors of drugs by performing a link prediction task, which predicts the entity *t* that forms a triple (*h*, *r*, *t*) for a given entity *h* and relation *r*, on the PharmaHKG dataset. The dataset contains graph information with six node (entity) types, i.e., drug, protein, pathway, category, ATC (Anatomical Therapeutic Chemical) code, and molecular structure, drawn from different databases/thesauruses, and eight edge (relation) types: category, ATC, pathway, interact, target, enzyme, carrier, and transporter. The input sentence was tokenized into sub-word tokens by the BERT tokenizer and extended by appending the KG vectors of the two drugs. Thus, the input sentence is represented as {[CLS], w1, ... wm1, ... , wm2, ... , [SEP], [KGm1], [KGm2]}, where wi corresponds to a subword, m1 to drug 1 and m2 to drug 2, and [KGm1] and [KGm2] represent the knowledge embeddings of the two drug entities.
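A toy sketch of this extended input, with a hypothetical embedding table and dimensions, simply appends the two pretrained KG vectors after the subword embeddings:

```python
import torch

# Hypothetical pretrained HKG embeddings, keyed by drug name.
kg_lookup = {"aspirin": torch.randn(768), "warfarin": torch.randn(768)}
token_embeds = torch.randn(12, 768)  # stand-in BERT subword embeddings

extended = torch.cat([token_embeds,
                      kg_lookup["aspirin"].unsqueeze(0),
                      kg_lookup["warfarin"].unsqueeze(0)], dim=0)
print(extended.shape)  # torch.Size([14, 768]): tokens + two KG vectors
```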

Similarly, Zhu et al. [28] utilized drug descriptions from Wikipedia and DrugBank to enhance the BERT model with semantic information about drug entities. They used three kinds of entity-aware attention to obtain a sentence representation incorporating entity information, mutual drug entity information, and drug description information. The mutual information vector of two drug entities was obtained by subtracting the BioBERT embeddings of the two drugs. For the description information, all drug description documents were fed into a Doc2Vec model to obtain a vector representation for each drug entity appearing in the 2013 DDI corpus. These entity vectors were fed into attention layers to retrieve sentence representation vectors that integrate the entities' multiple kinds of information. They reported a micro F1-score of 80.9 on the DDI corpus.
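The two entity signals can be sketched as follows; the placeholder vectors, the toy description corpus, and the gensim Doc2Vec settings are illustrative stand-ins, not the original setup:

```python
import torch
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Mutual drug-entity information: difference of the two drugs' (placeholder)
# BioBERT mention embeddings.
h_drug1, h_drug2 = torch.randn(768), torch.randn(768)
mutual = h_drug1 - h_drug2

# Doc2Vec vectors for drug description documents (toy two-document corpus).
descriptions = {
    "aspirin": "aspirin is a salicylate used to treat pain and fever",
    "warfarin": "warfarin is an anticoagulant that prevents blood clots",
}
docs = [TaggedDocument(words=text.split(), tags=[name])
        for name, text in descriptions.items()]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
desc_vec = d2v.dv["aspirin"]  # per-drug description vector
```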

LinkBERT [29] used hyperlinks to create better contexts for learning general-purpose LMs (language models). A hyperlink can offer new, multi-hop knowledge that is not available in a single article alone. LinkBERT creates inputs by placing linked documents in the same context window, joining segments of two different documents via special tokens to form an input instance: [CLS] XA [SEP] XB [SEP], where segment XA belongs to document A and segment XB belongs to document B. Pretraining used the Document Relation Prediction (DRP) objective, which classifies the relation of segment XB to XA as contiguous (XB is the direct continuation of XA), random, or linked. LinkBERT achieved a micro F1-score of 83.35 on the DDI classification task. SciFive [30] and T5-MTFT [31], pretrained on biomedical text using the T5 architecture, also showed good performance in relation extraction. In particular, SciFive was pretrained on PubMed abstracts and outperformed other encoder-only models.
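Forming such a paired-segment instance is straightforward with a standard tokenizer; in this sketch the segments and checkpoint are illustrative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
seg_a = "Mineral oil may reduce the absorption of fat-soluble vitamins."
seg_b = "Fat-soluble vitamins include vitamins A, D, E, and K."

# Encoding a sentence pair yields [CLS] X_A [SEP] X_B [SEP]; the DRP
# objective would then classify the pair as contiguous, random, or linked.
enc = tok(seg_a, seg_b, return_tensors="pt")
print(tok.decode(enc["input_ids"][0]))
```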

## **3. Preliminaries**

### *3.1. Data Sets and Target Relations*

The evaluation of the transformers is conducted on two datasets, namely ChemProt [11] and DDI [12], which are used for RE (relation extraction) between drug-related entities. This paper is not intended to validate different RE methods across various datasets; rather, it focuses on the extraction of drug-related interactions and performs a more in-depth evaluation.

In the ChemProt track corpus of BioCreative VI, interactions are annotated to explore the recognition of chemical–protein relations from abstracts, as shown in Table 1. The corpus contains directed relations from chemical/drug to gene/protein, indicating how the chemical/drug interacts with the gene/protein. Chemical–protein relations, referred to as 'CPR', are categorized into 10 semantically related classes that share underlying biological characteristics. For instance, interactions such as "activator", "indirect upregulator", and "upregulator", which result in an increase in the activity or expression of a target gene or protein, belong to the CPR:3 group, while interactions such as "downregulator", "indirect downregulator", and "inhibitor", which all decrease the activity or expression of a target gene or protein, belong to CPR:4. For this task, chemical and protein/gene entity mentions were manually annotated. In the track, only relations belonging to the following five classes were considered for evaluation purposes: CPR:3, CPR:4, CPR:5, CPR:6, and CPR:9.
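This grouping can be expressed as a small mapping; the sketch below lists only the members stated above and the five evaluated groups:

```python
# Five CPR groups used for evaluation; members shown only for CPR:3 and CPR:4.
EVALUATED_GROUPS = {"CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9"}
CPR_MEMBERS = {
    "CPR:3": {"activator", "indirect upregulator", "upregulator"},
    "CPR:4": {"downregulator", "indirect downregulator", "inhibitor"},
}

def cpr_group(relation):
    """Map a fine-grained relation label to its CPR group, if listed."""
    for group, members in CPR_MEMBERS.items():
        if relation in members:
            return group
    return None

print(cpr_group("inhibitor"))  # CPR:4
```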


**Table 1.** Interaction classes of ChemProt Corpus.

In the DDIExtraction 2013 shared task, five types of interactions are annotated, as shown in Table 2. False pairs, i.e., drug pairs that do not interact, were excluded from the evaluation to simplify it and to enable better comparability between systems in the shared task. Tables 3 and 4 display the number of instances of each class. Figures 1 and 2 illustrate examples of interactions in ChemProt and DDI, respectively. For example, the first sentence in Figure 2 states that 'mineral oil' and 'fat-soluble vitamins' have a DDI-mechanism relationship, while there is no interaction (false) between 'fat-soluble vitamins' and 'vitamin d preparations'. The interaction between 'mineral oil' and 'vitamin d preparations' is a DDI-mechanism. Since three candidate pairs appear in one sentence, when creating instances, separators such as \*\* (## and \*\* for ChemProt) are added before and after the target entities to indicate the desired interaction pair, as in the sketch below.
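A small helper sketch (a hypothetical function, not the paper's preprocessing code) shows this marking step, producing one instance per candidate entity pair:

```python
def mark_pair(tokens, e1_span, e2_span, marker="**"):
    """Wrap the two target entity spans (inclusive start/end) with markers."""
    out = []
    for i, tok in enumerate(tokens):
        if i in (e1_span[0], e2_span[0]):
            out.append(marker)
        out.append(tok)
        if i in (e1_span[1], e2_span[1]):
            out.append(marker)
    return " ".join(out)

sent = "mineral oil may reduce absorption of fat-soluble vitamins".split()
print(mark_pair(sent, (0, 1), (6, 7)))
# -> ** mineral oil ** may reduce absorption of ** fat-soluble vitamins **
```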

**Table 2.** Interaction classes of DDI 2013 Corpus.


**Table 3.** The instances of the ChemProt corpus.


**Table 4.** The instances of the DDI extraction 2013 corpus.

