COMCARE: A Collaborative Ensemble Framework for Context-Aware Medical Named Entity Recognition and Relation Extraction
Abstract
1. Introduction
- Complex medical terminology: Medical texts often contain specialized vocabulary, ambiguous abbreviations, synonyms, and polysemous terms. For example, the abbreviation ‘CA’ can indicate either ‘cancer’ or ‘cardiac arrest’, and similarly, ‘ASD’ can refer to ‘atrial septal defect’ or ‘autism spectrum disorder’. Synonyms such as ‘myocardial infarction’ and ‘heart attack’ further complicate the identification process. Moreover, polysemous terms such as ‘discharge’ can refer to either a patient’s release from a hospital or bodily fluid, leading to potential misinterpretations that affect downstream medical applications [6,7,8].
- Diverse entity types and overlapping entities: Medical texts contain a wide variety of entity types, including diseases, symptoms, drugs, procedures, and anatomical locations. These entities often overlap, making precise entity recognition more difficult [9]. For example, in the phrase ‘aspirin therapy for stroke prevention’, ‘aspirin’ is a drug entity, and ‘stroke’ is a medical condition. Additionally, overlapping entities such as ‘lung cancer screening,’ where ‘lung cancer’ is a disease entity and ‘cancer screening’ is a medical procedure, pose challenges for models to accurately identify and differentiate each component in context.
- Context-sensitive relationships: The relationships between medical terms are highly context-dependent and can extend across multiple sentences or paragraphs [10]. For instance, in a diagnostic context, ‘hypertension’ may be identified as the primary cause of ‘heart failure’, suggesting a cause-and-effect relationship. In contrast, in the treatment context, managing ‘hypertension’ might be part of a therapeutic strategy for patients already suffering from ‘heart failure’, indicating a treatment-related relationship. These complex and long-range relationships require models that can accurately capture the context across sentences or paragraphs to extract meaningful insights.
- Collaborative Decision Strategy: We present a collaborative decision strategy that fuses outputs from domain-specific models, such as PubMedBERT and PubMed-T5, leveraging their complementary strengths in handling diverse forms of medical terminology. This strategy directly addresses the challenges of complex medical terminology and overlapping entities in medical texts by combining the contextual understanding of PubMedBERT with the generative capabilities of PubMed-T5 through a token-level fusion mechanism.
- Context-Aware Relation Extraction: We integrated token-level information from the NER module with context embeddings using a semantic chunking approach, enabling the accurate capture of context-dependent relationships. This approach effectively addresses both entity-based relation detection and long-range dependencies by preserving detailed entity-specific insights while maintaining a broader contextual understanding. This integration significantly improves the ability of the model to identify and classify complex relationships across medical texts.
2. Related Work
2.1. Named Entity Recognition in Medical Texts
2.2. Relation Extraction in Medical Texts
2.3. Large Language Models in Medical Texts
3. Methods
3.1. Medical Named Entity Recognition Module
3.1.1. Domain-Specific Token Embedding Generation
3.1.2. Feature Learning Through BiLSTM Networks
3.1.3. Collaborative Decision Strategy
- Confidence-based Tag Selection: For a given token, each model’s emission matrix provides probability values for all possible entity tags (e.g., O, Tag1, Tag2, Tag3). From these tag probability distributions, we identify, for each matrix, the tag with the highest probability and its corresponding confidence score. For instance, given an emission matrix P1 with probabilities {O: 0.2, Tag1: 0.3, Tag2: 0.4, Tag3: 0.1}, the model would select Tag2 as its prediction with a confidence score of 0.4.
- Avoidance of non-entity ‘O’ Tag Bias: When one model predicts non-entity ‘O’ tags with the highest probability and the other model predicts an entity tag with a probability above a predefined threshold, the ensemble strategy selects the entity tag. This threshold was empirically set to 0.6 by evaluating a range of potential values (from 0.1 to 1.0) on the validation set. During our experiments, we observed that lower thresholds (0.1–0.5) led to increased false positives, while higher thresholds (0.7–1.0) resulted in missed entity detections. The threshold of 0.6 provided the most effective balance between precision and recall, reducing false negatives while maintaining a high level of accuracy for detected entities. This rule helps prevent potential entities from being overlooked when only one model predicts ‘O’ tags.
- Weighted Combination: If both models assign the highest probability to the same tag, the ensemble strategy averages the probabilities as EM_i = (P1_i + P2_i)/2, where i denotes the token index within the sequence. In cases where the models disagree and neither the confidence-based selection nor the ‘O’-tag avoidance rule applies, a weighted combination is used, where EM_i = α_i·P1_i + (1 − α_i)·P2_i. Here, α_i denotes a weight parameter computed using a softmax function to determine the relative importance of each model’s prediction for that token.
Algorithm 1. Collaborative Decision Strategy
Input: Sentence S
Output: Resulting ensemble matrix EM
1: E1 ← PubMedBERT(S)//token embeddings, model 1
2: E2 ← PubMed-T5(S)//token embeddings, model 2
3: P1 ← BiLSTM1(E1)//emission matrix, model 1
4: P2 ← BiLSTM2(E2)//emission matrix, model 2
5: Initialize resulting ensemble matrix EM
6: for each token i in sequence do:
7:   Compute attention weight α_i
8:   //Select tag and confidence score with highest probability from each model
9:   t1, c1 ← argmax(P1[i]), max(P1[i])
10:  t2, c2 ← argmax(P2[i]), max(P2[i])
11:  if t1 ≠ t2:
12:    if t1 = ‘O’ and c2 > θ:
13:      EM[i] ← P2[i]
14:    else if t2 = ‘O’ and c1 > θ:
15:      EM[i] ← P1[i]
16:    else:
17:      EM[i] ← α_i·P1[i] + (1 − α_i)·P2[i]//weighted combination based on attention
18:  else:
19:    EM[i] ← (P1[i] + P2[i])/2//simple average if predictions match
20: return EM
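To make the fusion rules concrete, the sketch below implements the decision logic of Algorithm 1 in Python with NumPy. The illustrative tag set, the θ = 0.6 threshold, and the three decision rules follow the text above; how the softmax-based attention weight α_i is computed is not fully specified in this section, so deriving it from the two models’ confidence scores is an assumption of this sketch.

```python
import numpy as np

TAGS = ["O", "Tag1", "Tag2", "Tag3"]  # illustrative tag set from the text
THETA = 0.6                            # entity-tag threshold from Section 3.1.3

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def collaborative_decision(P1, P2, theta=THETA):
    """Fuse two emission matrices (tokens x tags) into one ensemble matrix EM.

    P1: PubMedBERT branch emissions, P2: PubMed-T5 branch emissions.
    """
    n_tokens, n_tags = P1.shape
    EM = np.zeros((n_tokens, n_tags))
    for i in range(n_tokens):
        t1, c1 = int(P1[i].argmax()), float(P1[i].max())  # tag, confidence (model 1)
        t2, c2 = int(P2[i].argmax()), float(P2[i].max())  # tag, confidence (model 2)
        if t1 == t2:                           # simple average if predictions match
            EM[i] = (P1[i] + P2[i]) / 2
        elif TAGS[t1] == "O" and c2 > theta:   # avoid non-entity 'O' tag bias
            EM[i] = P2[i]
        elif TAGS[t2] == "O" and c1 > theta:
            EM[i] = P1[i]
        else:                                  # attention-weighted combination
            alpha = softmax(np.array([c1, c2]))  # assumption: weights from confidences
            EM[i] = alpha[0] * P1[i] + alpha[1] * P2[i]
    return EM

# Example: one token where model 1 says 'O' but model 2 is confident about Tag2
P1 = np.array([[0.7, 0.1, 0.1, 0.1]])
P2 = np.array([[0.2, 0.1, 0.65, 0.05]])
print(collaborative_decision(P1, P2))  # selects model 2's distribution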
3.1.4. Conditional Random Field (CRF) Layer
3.2. Relation Extraction Module
3.2.1. Semantic Chunking and Embedding Generation
Algorithm 2. Semantic Chunking
Input: Document D, maximum chunk size k, entity pairs EP
Output: Processed chunks C
1: Initialize C ← [], position ← 0
2: sentences ← split_into_sentences(D)
3: S ← compute_sentence_embedding(sentences)
4: while position < len(D) do:
5:   chunk_end ← min(position + k, len(D))
6:   //Semantic boundary detection
7:   boundary_sim ← compute_boundary_similarity(S[position:chunk_end])
8:   //Compute overlap size based on similarity
9:   if boundary_sim > high_threshold:
10:    δ ← 0.4·k//max_overlap
11:  else if boundary_sim < low_threshold:
12:    δ ← 0.1·k//min_overlap
13:  else:
14:    δ ← 0.2·k//α·k
15:  //Entity preservation adjustment
16:  if contains_partial_entity_pair(D[position:chunk_end], EP):
17:    entity_distance ← compute_entity_span_distance(chunk_end, EP)
18:    δ ← max(δ, entity_distance)
19:  chunk ← D[position:chunk_end]
20:  C.append(chunk)
21:  position ← chunk_end − δ
22: return C
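The following Python sketch mirrors Algorithm 2 over a token list. The toy encoder, the high/low similarity thresholds (0.8 and 0.3 here), and the five-token boundary window are placeholder assumptions; only the overlap fractions (0.4k, 0.1k, 0.2k) and the entity-preservation adjustment come from the algorithm itself. In practice the toy `embed` would be replaced by a real sentence encoder.

```python
import numpy as np

HIGH, LOW = 0.8, 0.3  # similarity thresholds (hypothetical values)

def embed(text):
    """Toy stand-in for a real sentence encoder (e.g., a BERT-based model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)

def semantic_chunking(tokens, k, entity_spans):
    """Split a token list into overlapping chunks (Algorithm 2 sketch).

    entity_spans: list of (start, end) token offsets for entity pairs.
    """
    chunks, position = [], 0
    while position < len(tokens):
        chunk_end = min(position + k, len(tokens))
        # Similarity across the tentative boundary drives the overlap size delta
        left = embed(" ".join(tokens[max(position, chunk_end - 5):chunk_end]))
        right = embed(" ".join(tokens[chunk_end:chunk_end + 5]))
        sim = float(left @ right)              # cosine similarity of unit vectors
        if sim > HIGH:
            delta = int(0.4 * k)               # max overlap: boundary cuts a topic
        elif sim < LOW:
            delta = int(0.1 * k)               # min overlap: natural topic break
        else:
            delta = int(0.2 * k)
        # Entity preservation: extend the overlap so no entity pair is split
        for start, end in entity_spans:
            if start < chunk_end < end:
                delta = max(delta, chunk_end - start)
        delta = min(delta, chunk_end - position - 1)  # guarantee forward progress
        chunks.append(tokens[position:chunk_end])
        if chunk_end == len(tokens):
            break
        position = chunk_end - delta
    return chunks
```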
3.2.2. Relation Prediction with BiLSTM Feature Learning
3.2.3. Majority Voting Mechanism
Algorithm 3. Majority Voting for Relation Prediction
Input: Model predictions P = {P_BERT, P_PubMedBERT, P_T5}; confidence thresholds τ = {τ_BERT = 0.5, τ_PubMedBERT = 0.6, τ_T5 = 0.6}; consensus threshold θ = 0.7; validation set V
Output: Final relation prediction rel*
1: //Weight learning
2: Initialize performance_scores ← array of zeros[num_models]
3: for each model in models do:
4:   performance_scores[model] ← compute_f1_score(model, V)
5: W ← normalize(performance_scores)//normalize to sum to 1
6: //Voting
7: Initialize vote_counts ← array of zeros[num_relation_types]
8: //Aggregate votes from each model
9: for each model_pred in P do:
10:  pred_type, conf ← get_max_prediction(model_pred)//conf ∈ [0, 1]
11:  if conf > τ[model_type]:
12:    vote_counts[pred_type] += W[model_type]·conf
13: //Evaluate consensus and determine final prediction
14: if max(vote_counts) < θ:
15:   //Low consensus case: prioritize domain-specific models
16:   rel* ← weighted_average({P_PubMedBERT, P_T5}, W)
17: else:
18:   //Strong consensus case
19:   rel* ← argmax(vote_counts)
20: return rel*
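A compact Python rendering of Algorithm 3 is given below. The per-model confidence thresholds and the consensus threshold θ = 0.7 are taken from the algorithm’s input specification; the example weights stand in for the validation-F1-derived weights W and are illustrative only.

```python
import numpy as np

MODELS = ["BERT", "PubMedBERT", "T5"]
TAU = {"BERT": 0.5, "PubMedBERT": 0.6, "T5": 0.6}  # per-model confidence thresholds
THETA = 0.7                                         # consensus threshold

def majority_vote(preds, weights, n_relations, theta=THETA):
    """Algorithm 3 sketch. preds: model name -> probability vector over relations.

    weights: model name -> validation-F1-derived weight, normalized to sum to 1.
    """
    votes = np.zeros(n_relations)
    for name in MODELS:
        p = preds[name]
        pred_type, conf = int(np.argmax(p)), float(np.max(p))
        if conf > TAU[name]:
            votes[pred_type] += weights[name] * conf  # confidence-weighted vote
    if votes.max() < theta:
        # Low consensus: fall back to the domain-specific models only
        fused = (weights["PubMedBERT"] * preds["PubMedBERT"]
                 + weights["T5"] * preds["T5"])
        return int(np.argmax(fused))
    return int(np.argmax(votes))  # strong consensus

# Example with three relation types; weights would come from validation F1 scores
weights = {"BERT": 0.30, "PubMedBERT": 0.36, "T5": 0.34}
preds = {"BERT":       np.array([0.55, 0.30, 0.15]),
         "PubMedBERT": np.array([0.20, 0.70, 0.10]),
         "T5":         np.array([0.25, 0.65, 0.10])}
print(majority_vote(preds, weights, n_relations=3))  # -> 1 (low consensus; domain fallback)
```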
4. Experiments
4.1. Dataset
4.1.1. BioRED Dataset
4.1.2. ADE Corpus
4.1.3. DIANN Corpus
4.1.4. RDD Corpus
4.2. Baseline Models
- CNN: The CNN implementation serves as the foundational baseline. CNNs were originally designed to process image data, but their ability to detect meaningful local patterns makes them valuable for NLP tasks as well. The CNN learns characteristic patterns within sentences and predicts labels based on these patterns. In our experiments, we implemented the CNN model using the PyTorch framework with the following parameters: kernel_size = 3, n_filters = 5, and epochs = 100. Early stopping was applied consistently across all experiments, and the Adam optimizer was used to train the model.
- BiLSTM: The BiLSTM network served as the sequential modeling baseline. BiLSTM is a traditional deep learning approach that is widely adopted for sequential data processing and is designed to overcome the limitations of conventional LSTM by considering sequence information from both directions. This bidirectional approach enables superior context understanding and is particularly effective for entity recognition in medical texts. Our BiLSTM implementation utilized the torch package with carefully tuned parameters: batch size of 8, learning rate of 3e-5, 100 training epochs, embedding size of 768, and hidden size of 32. The Adam optimizer was employed for model training.
- BiLSTM-CRF: The BiLSTM-CRF extends the BiLSTM architecture by incorporating a CRF layer. The addition of the CRF layer enabled the model to consider the dependencies between adjacent labels, thereby significantly improving the coherence of the predicted entity sequences. This enhancement is particularly important for capturing the structured nature of medical entity tags. The CRF layer was implemented using the pytorch-crf package, maintaining hyperparameters consistent with the base BiLSTM model while adding transition matrix optimization.
- BERT-CRF: The BERT-CRF model combines the pre-trained BERT model with a CRF layer for sequence optimization. We utilized the pre-trained BERT-base model (768 hidden dimensions and 12 attention heads) and fine-tuned it on our medical datasets. The CRF layer was added on top of BERT’s final hidden states, allowing the model to optimize tag sequences while leveraging BERT’s contextual representations. We employed a learning rate of 2 × 10−5 and trained for 20 epochs with early stopping. A skeleton of this baseline is sketched after this list.
- BERT-BiLSTM-CRF: The BERT-BiLSTM-CRF model represents a sophisticated architecture that feeds word embeddings obtained from BERT into the BiLSTM-CRF model. This architecture leverages BERT’s contextual understanding while retaining the strength of BiLSTM-CRF in modeling label dependencies.
- KECI (Knowledge-Enhanced Collective Inference): The KECI is a state-of-the-art model for joint biomedical entity and relation extraction that integrates external domain knowledge [42]. The model follows three main steps: constructing an initial span graph, building a background knowledge graph, and fusing these graphs using attention mechanisms. With SciBERT as its transformer encoder, KECI processes graphs using both bidirectional and relational Graph Convolutional Networks (GCNs) while incorporating external knowledge from UMLS and MetaMap. To enhance entity representations, KECI encodes UMLS-derived definitions and relational information using SciBERT and integrates them into the entity nodes in the knowledge graph. The model’s key hyperparameters include a span length limit of 20 tokens, a learning rate of 2 × 10−5, and a batch size of 32.
- LLMs: LLMs are characterized by their extensive parameters and training data, demonstrating exceptional capabilities in language understanding, text generation, and question answering. Following the emergence of OpenAI’s ChatGPT, interest in LLMs has surged, leading to the development of various models. In our experiments, we compared the performance of our proposed framework with GPT-3.5 and GPT-4o-mini in zero-shot settings. For these evaluations, we carefully crafted prompts to elicit entity recognition and relation extraction responses while maintaining consistent evaluation conditions across all experiments.
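As referenced in the BERT-CRF item above, the skeleton below shows how such a baseline can be assembled from the transformers and pytorch-crf packages. The model name, tag count, and wiring are illustrative assumptions rather than the authors’ exact code; the learning rate matches the value reported above.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pytorch-crf package, as used in the BiLSTM-CRF baseline

class BertCrfTagger(nn.Module):
    """BERT encoder with a CRF decoding layer over the tag sequence."""
    def __init__(self, num_tags, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)            # per-token tag scores
        mask = attention_mask.bool()
        if tags is not None:                       # training: negative log-likelihood
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        return self.crf.decode(scores, mask=mask)  # inference: best tag sequence

model = BertCrfTagger(num_tags=7)                  # e.g., BIO tags for 3 entity types
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # learning rate from Section 4.2
```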
4.3. Implementation Details
4.4. Experimental Results and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Detailed Comparative Results: Base Model and Proposed Model
BioRED | Seed | NER Precision (%) | NER Recall (%) | NER F1-Score (%) | RE Precision (%) | RE Recall (%) | RE F1-Score (%)
---|---|---|---|---|---|---|---
Base model (NER: BERT-BiLSTM-CRF, RE: BERT) | Seed 0 | 93.08 | 91.63 | 92.35 | 70.88 | 64.15 | 67.35 |
Seed 1 | 93.86 | 93.01 | 93.43 | 72.31 | 66.37 | 69.21 | |
Seed 2 | 95.19 | 91.08 | 93.09 | 70.21 | 65.66 | 67.86 | |
Seed 3 | 95.3 | 91.54 | 93.38 | 70.4 | 63.97 | 67.03 | |
Seed 4 | 93.91 | 91.34 | 92.61 | 72.02 | 65.03 | 68.34 | |
Seed 5 | 94.04 | 91.42 | 92.71 | 69.66 | 65.05 | 67.28 | |
Seed 6 | 94.17 | 90.36 | 92.23 | 70.17 | 64.96 | 67.46 | |
Seed 7 | 94.92 | 89.96 | 92.38 | 71.04 | 65.58 | 68.2 | |
Seed 8 | 93.18 | 91.69 | 92.43 | 71.23 | 65.32 | 68.15 | |
Seed 9 | 93.47 | 91.12 | 92.28 | 71.47 | 66.17 | 68.72 | |
Proposed model | Seed 0 | 95.34 | 92.56 | 93.93 | 73.7 | 63.51 | 68.23 |
Seed 1 | 94.54 | 92.47 | 93.49 | 73.39 | 64.11 | 68.43 | |
Seed 2 | 97.24 | 94.96 | 96.09 | 71.19 | 65.88 | 68.43 | |
Seed 3 | 95.04 | 89.78 | 92.33 | 73.56 | 65.9 | 69.52 | |
Seed 4 | 94.2 | 92.3 | 93.24 | 74.38 | 66.01 | 69.95 | |
Seed 5 | 95.21 | 91.21 | 93.17 | 72.98 | 64.91 | 68.71 | |
Seed 6 | 94.95 | 91.6 | 93.73 | 72.4 | 64.7 | 68.33 | |
Seed 7 | 94.21 | 92.38 | 93.29 | 74.49 | 63.2 | 68.38 | |
Seed 8 | 95.94 | 91.6 | 93.72 | 72.66 | 65.07 | 68.65 | |
Seed 9 | 94.11 | 91.81 | 92.95 | 72.48 | 65.6 | 68.87 | |
p-value < 0.05 | No | Yes | Yes | Yes | No | Yes |
RDD | Seed | NER Precision (%) | NER Recall (%) | NER F1-Score (%) | RE Precision (%) | RE Recall (%) | RE F1-Score (%)
---|---|---|---|---|---|---|---
Base model (NER: BERT-BiLSTM-CRF, RE: BERT) | Seed 0 | 94.19 | 64.18 | 76.34 | 85.0 | 87.33 | 86.15 |
Seed 1 | 93.98 | 64.75 | 76.67 | 83.59 | 85.41 | 84.49 | |
Seed 2 | 95.73 | 67.79 | 79.37 | 85.2 | 85.44 | 85.32 | |
Seed 3 | 94.96 | 64.29 | 76.67 | 85.65 | 84.56 | 85.1 | |
Seed 4 | 95.18 | 64.01 | 76.55 | 84.32 | 84.83 | 84.58 | |
Seed 5 | 96.27 | 64.52 | 77.26 | 84.44 | 82.56 | 83.49 | |
Seed 6 | 93.94 | 65.36 | 77.09 | 84.92 | 85.71 | 85.31 | |
Seed 7 | 95.04 | 65.23 | 77.36 | 86.18 | 84.68 | 85.42 | |
Seed 8 | 94.61 | 64.53 | 76.73 | 84.82 | 85.69 | 85.25 | |
Seed 9 | 94.43 | 67.23 | 78.54 | 85.81 | 83.89 | 84.84 | |
Proposed model | Seed 0 | 93.96 | 67.04 | 78.25 | 87.91 | 84.6 | 86.22 |
Seed 1 | 93.88 | 67.71 | 78.68 | 86.98 | 86.82 | 86.9 | |
Seed 2 | 95.42 | 66.23 | 78.19 | 88.2 | 86.13 | 87.15 | |
Seed 3 | 94.26 | 67.03 | 78.35 | 87.02 | 86.11 | 86.56 | |
Seed 4 | 94.4 | 66.63 | 78.12 | 87.19 | 85.84 | 86.51 | |
Seed 5 | 95.76 | 66.54 | 78.52 | 86.75 | 85.13 | 85.93 | |
Seed 6 | 93.6 | 65.52 | 77.08 | 89.26 | 86.62 | 87.92 | |
Seed 7 | 93.67 | 65.68 | 77.22 | 89.26 | 85.8 | 85.96 | |
Seed 8 | 95.28 | 67.18 | 78.8 | 86.79 | 84.8 | 85.78 | |
Seed 9 | 94.59 | 64.32 | 76.57 | 88.25 | 84.71 | 86.45 | |
p-value < 0.05 | No | Yes | No | Yes | No | Yes |
ADE | Seed | Precision (%) | Recall (%) | F1-Score (%)
---|---|---|---|---
Base model (BERT-BiLSTM-CRF) | Seed 0 | 93.08 | 91.63 | 92.35 |
Seed 1 | 93.86 | 93.01 | 93.43 | |
Seed 2 | 95.19 | 91.08 | 93.09 | |
Seed 3 | 95.3 | 91.54 | 93.38 | |
Seed 4 | 93.91 | 91.34 | 92.61 | |
Seed 5 | 94.04 | 91.42 | 92.71 | |
Seed 6 | 94.17 | 90.36 | 92.23 | |
Seed 7 | 94.92 | 89.96 | 92.38 | |
Seed 8 | 93.18 | 91.69 | 92.43 | |
Seed 9 | 93.47 | 91.12 | 92.28 | |
Proposed model | Seed 0 | 92.9 | 72.54 | 81.47 |
Seed 1 | 95.26 | 74.07 | 83.34 | |
Seed 2 | 96.67 | 74.26 | 84.0 | |
Seed 3 | 95.27 | 71.91 | 81.96 | |
Seed 4 | 95.71 | 74.49 | 83.78 | |
Seed 5 | 95.9 | 73.37 | 83.13 | |
Seed 6 | 94.28 | 73.18 | 82.4 | |
Seed 7 | 93.84 | 73.51 | 82.44 | |
Seed 8 | 93.01 | 75.54 | 82.37 | |
Seed 9 | 94.81 | 73.61 | 82.87 | |
p-value < 0.05 | Yes | Yes | Yes |
DIANN | Seed | Precision (%) | Recall (%) | F1-Score (%)
---|---|---|---|---
Base model (BERT-BiLSTM-CRF) | Seed 0 | 97.23 | 97.39 | 97.31 |
Seed 1 | 99.45 | 97.48 | 98.45 | |
Seed 2 | 98.62 | 97.38 | 97.99 | |
Seed 3 | 98.68 | 96.73 | 97.69 | |
Seed 4 | 99.5 | 96.07 | 97.75 | |
Seed 5 | 99.04 | 97.61 | 98.32 | |
Seed 6 | 98.88 | 95.88 | 97.35 | |
Seed 7 | 99.2 | 99.32 | 99.26 | |
Seed 8 | 98.54 | 95.44 | 96.97 | |
Seed 9 | 98.16 | 97.18 | 97.67 | |
Proposed model | Seed 0 | 99.36 | 97.79 | 98.57 |
Seed 1 | 99.86 | 98.3 | 99.07 | |
Seed 2 | 99.29 | 98.95 | 99.12 | |
Seed 3 | 99.61 | 99.78 | 99.69 | |
Seed 4 | 98.31 | 99.83 | 99.06 | |
Seed 5 | 99.76 | 98.87 | 99.31 | |
Seed 6 | 99.42 | 98.61 | 99.01 | |
Seed 7 | 98.9 | 97.87 | 98.38 | |
Seed 8 | 98.95 | 98.96 | 98.85 | |
Seed 9 | 99.07 | 98.66 | 98.86 | |
p-value < 0.05 | Yes | Yes | Yes |
References
- Yang, F.; Shu, H.; Zhang, X. Understanding “Internet Plus Healthcare” in China: Policy Text Analysis. J. Med. Internet Res. 2020, 23, e23779.
- Wiest, I.C.; Ferber, D.; Zhu, J.; van Treeck, M.; Meyer, S.K.; Juglan, R.; Carrero, Z.I.; Paech, D.; Kleesiek, J.; Ebert, M.P.; et al. Privacy-preserving large language models for structured medical information retrieval. NPJ Digit. Med. 2024, 7, 257.
- Elgaar, M.; Cheng, J.; Vakil, N.; Amiri, H.; Celi, L.A. MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries. In Findings of the Association for Computational Linguistics: ACL 2024; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 16442–16455.
- da Silva, D.P.; da Rosa Fröhlich, W.; de Mello, B.H.; Vieira, R.; Rigo, S.J. Exploring named entity recognition and relation extraction for ontology and medical records integration. Inform. Med. Unlocked 2023, 43, 101381.
- Navarro, D.F.; Ijaz, K.; Rezazadegan, D.; Rahimi-Ardabili, H.; Dras, M.; Coiera, E.W.; Berkovsky, S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int. J. Med. Inform. 2023, 177, 105122.
- Moscato, V.; Postiglione, M.; Sansone, C.; Sperlí, G. TaughtNet: Learning Multi-Task Biomedical Named Entity Recognition From Single-Task Teachers. IEEE J. Biomed. Health Inform. 2023, 27, 2512–2523.
- Grossman Liu, L.; Grossman, R.H.; Mitchell, E.G.; Weng, C.; Natarajan, K.; Hripcsak, G.; Vawdrey, D.K. A deep database of medical abbreviations and acronyms for natural language processing. Sci. Data 2021, 8, 149.
- Hu, Z.; Ma, X. A novel neural network model fusion approach for improving medical named entity recognition in online health expert question-answering services. Expert Syst. Appl. 2023, 223, 119880.
- Jonker, R.A.; Almeida, T.; Antunes, R.; Almeida, J.R.; Matos, S. Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes. Database 2024, 2024, baae068.
- Noriega-Atala, E.; Hein, P.D.; Thumsi, S.S.; Wong, Z.; Wang, X.; Hendryx, S.M.; Morrison, C.T. Extracting Inter-Sentence Relations for Associating Biological Context with Events in Biomedical Texts. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 1895–1906.
- Popovski, G.; Kochev, S.; Korousic-Seljak, B.; Eftimov, T. FoodIE: A Rule-based Named-entity Recognition Method for Food Information Extraction. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Prague, Czech Republic, 19–21 February 2019.
- Gorinski, P.J.; Wu, H.; Grover, C.; Tobin, R.; Talbot, C.; Whalley, H.C.; Sudlow, C.L.; Whiteley, W.; Alex, B. Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches. arXiv 2019, arXiv:1903.03985.
- Khan, W.; Daud, A.; Shahzad, K.; Amjad, T.; Banjar, A.T.; Fasihuddin, H.A. Named Entity Recognition Using Conditional Random Fields. Appl. Sci. 2022, 12, 6391.
- Suthaharan, S. Support Vector Machine. In Machine Learning Models and Algorithms for Big Data Classification; Integrated Series in Information Systems; Springer: Boston, MA, USA, 2016; Volume 36.
- Yang, J.; Zhang, T.; Tsai, C.; Lu, Y.; Yao, L. Evolution and emerging trends of named entity recognition: Bibliometric analysis from 2000 to 2023. Heliyon 2024, 10, e30053.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019.
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.R.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthc. (HEALTH) 2020, 3, 1–23.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240.
- OpenAI. ChatGPT (3.5-turbo, 4o-mini) [Large Language Model]. 2024. Available online: https://chat.openai.com/ (accessed on 7 December 2024).
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
- Raffel, C.; Shazeer, N.M.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2019, 21, 1–67.
- Arbabi, A.; Adams, D.R.; Fidler, S.; Brudno, M. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. JMIR Med. Inform. 2019, 7, e12596.
- Zhao, S.; Liu, T.; Zhao, S.; Wang, F. A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization. arXiv 2018, arXiv:1812.06081.
- Chaudhry, M.; Kazmi, A.; Jatav, S.; Verma, A.; Samal, V.; Paul, K.; Modi, A. Reducing Inference Time of Biomedical NER Tasks using Multi-Task Learning. In Proceedings of the 19th International Conference on Natural Language Processing, New Delhi, India, 2022; pp. 116–122.
- Li, J.; Wei, Q.; Ghiasvand, O.A.; Chen, M.; Lobanov, V.S.; Weng, C.; Xu, H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Med. Inform. Decis. Mak. 2022, 22, 235.
- Yi, F.; Liu, H.; Wang, Y.; Wu, S.; Sun, C.; Feng, P.; Zhang, J. Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features. Appl. Sci. 2023, 13, 8913.
- Liang, T.; Xia, C.; Zhao, Z.; Jiang, Y.; Yin, Y.; Yu, P. Transferring From Textual Entailment to Biomedical Named Entity Recognition. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 2577–2586.
- Zhang, Z.; Chen, A.L. Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning. BMC Bioinform. 2022, 23, 458.
- Dewi, I.N.; Dong, S.; Hu, J. Drug-drug interaction relation extraction with deep convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 13–16 November 2017; pp. 1795–1802.
- Fabregat, H.; Duque, A.; Martínez-Romo, J.; Araujo, L. Negation-based transfer learning for improving biomedical Named Entity Recognition and Relation Extraction. J. Biomed. Inform. 2023, 138, 104279.
- Luo, L.; Lai, P.; Wei, C.; Arighi, C.N.; Lu, Z. BioRED: A rich biomedical relation extraction dataset. Brief. Bioinform. 2022, 23, bbac282.
- Li, Z.; Wei, Q.; Huang, L.; Li, J.; Hu, Y.; Chuang, Y.; He, J.; Das, A.; Keloth, V.K.; Yang, Y.; et al. Ensemble pretrained language models to extract biomedical knowledge from literature. J. Am. Med. Inform. Assoc. 2024, 31, 1904–1911.
- Lu, Q.; Li, R.; Wen, A.; Wang, J.; Wang, L.; Liu, H. Large Language Models Struggle in Token-Level Clinical Named Entity Recognition. arXiv 2024, arXiv:2407.00731.
- Hu, Y.; Ameer, I.; Zuo, X.; Peng, X.; Zhou, Y.; Li, Z.; Li, Y.; Li, J.; Jiang, X.; Xu, H. Improving large language models for clinical named entity recognition via prompt engineering. J. Am. Med. Inform. Assoc. 2023, 31, 1812–1820.
- Zhou, H.; Li, M.; Xiao, Y.; Yang, H.; Zhang, R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J. Am. Med. Inform. Assoc. 2024, 31, 2010–2018.
- Kudo, T.; Richardson, J. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018.
- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv 2016, arXiv:1609.08144.
- Gurulingappa, H.; Rajput, A.M.; Roberts, A.; Fluck, J.; Hofmann-Apitius, M.; Toldo, L. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. J. Biomed. Inform. 2012, 45, 885–892.
- Fabregat, H.; Araujo, L.; Martínez-Romo, J. Deep neural models for extracting entities and relationships in the new RDD corpus relating disabilities and rare diseases. Comput. Methods Programs Biomed. 2018, 164, 121–129.
- Fabregat, H.; Martínez-Romo, J.; Araujo, L. Overview of the DIANN Task: Disability Annotation Task. In Proceedings of the IberEval@SEPLN, Seville, Spain, 18 September 2018.
- Lai, T.; Ji, H.; Zhai, C.; Tran, Q.H. Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Volume 1: Long Papers; Association for Computational Linguistics: Seattle, WA, USA, 2021; pp. 6248–6260.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
Entity Type | Entity Counts |
---|---|
CellLine | 117 |
ChemicalEntity | 2540 |
OrganismTaxon | 1420 |
SequenceVariant | 1011 |
GeneOrGeneproduct | 4764 |
DiseaseOrPhenotypicFeature | 3784 |
Total | 13,636 |
Relation Type | CC | CD | CG | CV | DG | DV | GG | GV | VV | Total
---|---|---|---|---|---|---|---|---|---|---
Association | 53 | 108 | 198 | 40 | 1002 | 349 | 514 | 1 | 9 | 2274 |
Positive Correlation | 41 | 343 | 175 | 5 | 49 | 292 | 253 | 1 | - | 1159 |
Negative Correlation | 100 | 293 | 178 | 5 | 54 | 12 | 78 | 1 | - | 721 |
Bind | 1 | - | 23 | - | - | - | 45 | - | - | 69 |
Cotreatment | 26 | - | 3 | - | - | - | - | - | - | 29 |
Comparison | 22 | - | - | - | - | - | - | - | - | 22 |
Drug Interaction | 3 | - | - | - | - | - | - | - | - | 3 |
Conversion | 3 | - | - | - | - | - | - | - | - | 3 |
Total | 249 | 744 | 577 | 50 | 1105 | 653 | 890 | 3 | 9 | 4280 |
Entity Type | Entity Counts |
---|---|
Drug | 5063 |
Adverse effect | 5776 |
Dosage | 231 |
Total | 11,070 |
Entity Type | Entity Counts |
---|---|
Rare disease | 578 |
Disability | 3678 |
Total | 4256 |
Relation Type | Relation Counts |
---|---|
Positive | 1251 |
Negative | 706 |
Total | 1957 |
Hyperparameter | Value |
---|---|
BiLSTM hidden size | 32 |
Embedding size | 768 |
Learning rate | 3 × 10−5
Max sequence length | 100 |
Dropout | 0.5 |
Batch size | 8 |
Epoch | 100 |
Models | NER Precision (%) | NER Recall (%) | NER F1-Score (%)
---|---|---|---
1. SciBERT | 90.17 | 88.85 | 89.54 |
2. BlueBERT | 92.66 | 90.83 | 91.83 |
3. BioBERT | 93.88 | 91.72 | 92.39 |
4. PubMedBERT | 94.31 | 91.04 | 92.51 |
Models | NER Precision (%) | NER Recall (%) | NER F1-Score (%) | RE Precision (%) | RE Recall (%) | RE F1-Score (%)
---|---|---|---|---|---|---
1. CNN | 69.74 | 54.26 | 60.75 | 3.89 | 18.04 | 6.40 |
2. BiLSTM | 83.86 | 64.54 | 72.46 | 13.28 | 0.98 | 0.72 |
3. BiLSTM-CRF | 83.33 | 66.91 | 73.95 | - | - | - |
4. BERT | 93.45 | 90.81 | 91.96 | 71.01 | 65.49 | 67.99 |
5. BERT-CRF | 94.27 | 90.32 | 92.07 | - | - | - |
6. BERT-BiLSTM | 93.99 | 90.38 | 92.03 | 69.75 | 60.59 | 64.14 |
7. BERT-BiLSTM-CRF | 94.31 | 91.04 | 92.51 | - | - | - |
8. KECI (SciBERT) | 93.01 | 88.53 | 90.71 | 70.54 | 62.31 | 66.17 |
9. GPT-3.5 | 16.36 | 3.17 | 5.30 | 8.29 | 18.40 | 9.87 |
10. GPT-4o-mini | 13.18 | 2.45 | 4.09 | 7.58 | 15.92 | 9.86 |
Our model | 95.11 | 92.45 | 93.76 | 72.58 | 65.27 | 68.73 |
Models | NER Precision (%) | NER Recall (%) | NER F1-Score (%) | RE Precision (%) | RE Recall (%) | RE F1-Score (%)
---|---|---|---|---|---|---
1. CNN | 30.01 | 3.49 | 6.19 | 72.28 | 72.45 | 70.16 |
2. BiLSTM | 83.49 | 19.80 | 31.59 | 66.17 | 67.35 | 62.91 |
3. BiLSTM-CRF | 26.34 | 16.50 | 16.12 | - | - | - |
4. BERT | 93.40 | 65.11 | 75.70 | 85.09 | 85.20 | 84.99 |
5. BERT-CRF | 94.18 | 65.69 | 76.72 | - | - | - |
6. BERT-BiLSTM | 95.31 | 64.46 | 76.91 | 85.31 | 85.20 | 84.83 |
7. BERT-BiLSTM-CRF | 95.15 | 65.30 | 77.45 | - | - | - |
8. KECI (SciBERT) | 93.02 | 63.24 | 75.51 | 84.19 | 82.10 | 83.13 |
9. GPT-3.5 | 25.32 | 1.10 | 2.05 | 68.14 | 68.98 | 65.23 |
10. GPT-4o-mini | 25.24 | 3.73 | 6.31 | 72.04 | 68.72 | 67.50 |
Our model | 94.42 | 66.24 | 77.86 | 87.56 | 86.03 | 86.79 |
Models | NER Precision (%) | NER Recall (%) | NER F1-Score (%)
---|---|---|---
1. CNN | 27.26 | 0.05 | 0.09 |
2. BiLSTM | 47.35 | 0.18 | 0.46 |
3. BiLSTM-CRF | 31.19 | 0.09 | 0.18 |
4. BERT | 91.74 | 64.21 | 74.74 |
5. BERT-CRF | 94.97 | 70.95 | 80.53 |
6. BERT-BiLSTM | 95.52 | 67.34 | 78.54 |
7. BERT-BiLSTM-CRF | 97.14 | 71.38 | 81.65 |
8. KECI (SciBERT) | 92.81 | 70.48 | 80.38 |
9. GPT-3.5 | 29.18 | 3.58 | 6.24 |
10. GPT-4o-mini | 27.48 | 4.62 | 7.65 |
Our model | 94.69 | 73.36 | 82.48 |
Models | NER Precision (%) | NER Recall (%) | NER F1-Score (%)
---|---|---|---
1. CNN | 23.32 | 5.23 | 8.54 |
2. BiLSTM | 49.07 | 42.83 | 45.51 |
3. BiLSTM-CRF | 48.32 | 16.18 | 23.97 |
4. BERT | 98.74 | 98.16 | 98.43 |
5. BERT-CRF | 98.90 | 98.22 | 98.55 |
6. BERT-BiLSTM | 99.66 | 98.22 | 98.94 |
7. BERT-BiLSTM-CRF | 98.88 | 96.94 | 97.89 |
8. KECI (SciBERT) | 98.37 | 96.80 | 97.57 |
9. GPT-3.5 | 0 | 0 | 0 |
10. GPT-4o-mini | 7.68 | 1.90 | 0.37 |
Our model | 99.94 | 98.78 | 99.36 |
Models | Precision (%) | Recall (%) | F1-Score (%)
---|---|---|---
NER | |||
1. NER Ensemble_weighted (1.0*BERT + 0.0*T5) | 94.31 | 91.04 | 92.51 |
2. NER Ensemble_weighted (0.7*BERT + 0.3*T5) | 94.62 | 91.55 | 93.00 |
3. NER Ensemble_weighted (0.5*BERT + 0.5*T5) | 94.37 | 91.51 | 92.80 |
4. NER Ensemble_weighted (0.3*BERT + 0.7*T5) | 92.65 | 88.85 | 90.28 |
5. NER Ensemble_weighted (0.0*BERT + 1.0*T5) | 61.90 | 49.83 | 53.43 |
6. CDS (only confidence score) | 94.51 | 91.96 | 93.22 |
7. CDS (only weighted combination with attention) | 94.73 | 92.24 | 93.47 |
8. Ours | 95.11 | 92.45 | 93.76 |
RE | |||
9. W/O General BERT | 69.80 | 65.10 | 67.16 |
Ours | 72.58 | 65.27 | 68.73 |