Biomedical Natural Language Processing and Text Mining

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Biomedical Information and Health".

Deadline for manuscript submissions: 31 July 2025 | Viewed by 10402

Special Issue Editors

Associate Professor, School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Interests: biomedical text mining; natural language processing; deep learning

Guest Editor

Special Issue Information

Dear Colleagues,

With the rapid development of biomedical research, a large and growing quantity of biomedical text is available. These texts, such as the biomedical literature, clinical notes, and medical guidelines, have become an important resource and a rich source of knowledge for biomedical research. However, the sheer size of this body of text and its rapid growth (e.g., more than 3000 articles are published in biomedical journals every day) make document search and information access a demanding task.

In recent years, biomedical natural language processing (NLP) and text mining, which aim to develop text mining, NLP, and machine learning techniques for diverse biomedical applications, have received considerable attention and made great progress. For example, LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub for tracking up-to-date scientific information about COVID-19, automatically collects COVID-19 articles and categorizes them by research topic. Despite this success, many challenges in the field remain to be solved.

This Special Issue aims to bring together NLP researchers and experts in the biomedical field to advance the current state of the art and to share insights and challenges. The goal is to develop computational methods and software tools for analyzing and better understanding unstructured biomedical text data, with a view to accelerating knowledge discovery and improving health.

Topics of interest include, but are not limited to, the following: 

  • Biomedical text classification;
  • Biomedical information retrieval;
  • Biomedical named entity recognition and normalization (linking);
  • Biomedical relation and event extraction;
  • Biomedical literature-based discovery;
  • Biomedical text summarization;
  • Biomedical question answering;
  • Pre-trained language models for biomedical NLP;
  • Biomedical machine translation;
  • BioNLP applications;
  • BioNLP resources and evaluation. 

Dr. Ling Luo
Prof. Dr. Diego Reforgiato Recupero
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • biomedical text classification
  • biomedical information retrieval
  • biomedical named entity recognition and normalization (linking)
  • biomedical relation and event extraction
  • biomedical literature-based discovery
  • biomedical text summarization
  • biomedical question answering
  • pre-trained language models for biomedical NLP
  • biomedical machine translation
  • BioNLP applications
  • BioNLP resources and evaluation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)


Research

15 pages, 288 KiB  
Article
LLMs in Action: Robust Metrics for Evaluating Automated Ontology Annotation Systems
by Ali Noori, Pratik Devkota, Somya D. Mohanty and Prashanti Manda
Information 2025, 16(3), 225; https://doi.org/10.3390/info16030225 - 14 Mar 2025
Viewed by 407
Abstract
Ontologies are critical for organizing and interpreting complex domain-specific knowledge, with applications in data integration, functional prediction, and knowledge discovery. As the manual curation of ontology annotations becomes increasingly infeasible due to the exponential growth of biomedical and genomic data, natural language processing (NLP)-based systems have emerged as scalable alternatives. Evaluating these systems requires robust semantic similarity metrics that account for hierarchical and partially correct relationships often present in ontology annotations. This study explores the integration of graph-based and language-based embeddings to enhance the performance of semantic similarity metrics. Combining embeddings generated via Node2Vec and large language models (LLMs) with traditional semantic similarity metrics, we demonstrate that hybrid approaches effectively capture both structural and semantic relationships within ontologies. Our results show that combined similarity metrics outperform individual metrics, achieving high accuracy in distinguishing child–parent pairs from random pairs. This work underscores the importance of robust semantic similarity metrics for evaluating and optimizing NLP-based ontology annotation systems. Future research should explore the real-time integration of these metrics and advanced neural architectures to further enhance scalability and accuracy, advancing ontology-driven analyses in biomedical research and beyond.
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)
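The abstract describes combining graph-based (Node2Vec) and language-based (LLM) embeddings with a traditional semantic similarity metric. The sketch below illustrates that idea under stated assumptions; it is not the authors' method. It assumes the embeddings are already computed, uses ancestor-set Jaccard overlap as a stand-in for a "traditional" structural metric, and averages the three scores with equal weights. The toy ontology, vectors, helper names, and weights are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ancestor_jaccard(a, b):
    """Jaccard overlap of two ancestor sets: a simple structure-based similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(t1, t2, graph_emb, llm_emb, ancestors, weights=(1/3, 1/3, 1/3)):
    """Weighted mean of graph-embedding, language-embedding and ancestor-overlap scores.

    The equal weights are an illustrative assumption, not values from the paper.
    """
    scores = (
        cosine(graph_emb[t1], graph_emb[t2]),  # e.g. Node2Vec vectors (hypothetical here)
        cosine(llm_emb[t1], llm_emb[t2]),      # e.g. LLM embeddings of term labels
        ancestor_jaccard(ancestors[t1], ancestors[t2]),
    )
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical toy ontology: A1 is a child of A; B lies on an unrelated branch.
ancestors = {"A": {"A", "root"}, "A1": {"A1", "A", "root"}, "B": {"B", "root"}}
graph_emb = {"A": [1.0, 0.0], "A1": [0.9, 0.1], "B": [0.0, 1.0]}
llm_emb = {"A": [1.0, 0.0, 0.0], "A1": [0.8, 0.2, 0.0], "B": [0.0, 0.0, 1.0]}

child_parent = combined_similarity("A1", "A", graph_emb, llm_emb, ancestors)
random_pair = combined_similarity("A1", "B", graph_emb, llm_emb, ancestors)
print(f"child-parent: {child_parent:.3f}, random: {random_pair:.3f}")
```

A good combined metric should score child–parent pairs above random pairs, which is the distinction the study uses for evaluation; in practice the weights would be tuned rather than fixed.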

15 pages, 617 KiB  
Article
On the Creation of a Corpus-Derived Medical Multi-Word Term List
by Cosmin Mihail Florescu and Ryosuke L. Ohniwa
Information 2025, 16(2), 118; https://doi.org/10.3390/info16020118 - 7 Feb 2025
Viewed by 475
Abstract
Although several studies have succeeded in creating medical word lists using corpus analysis methods, there is currently a shortage of comprehensive lists containing medical multi-word terms (MWTs). This study attempts to fill this gap by identifying medical MWTs using a large corpus of English language medical textbooks (28,384,681 running words). The term extraction function in Sketch Engine was used to extract high-frequency MWTs and to calculate keyness and dispersion data for each MWT. The validity of the resulting list and of specific subsets was tested using a different medical corpus and a general English corpus. The resulting list comprises 3307 MWTs with 63.83% (2111 MWTs) occurring at comparable frequencies in the different medical corpus and only 0.97% (32 MWTs) occurring at comparable frequencies in the general English corpus. The study also revealed clear differences in replicability between semantic subsets, with MWTs from the Anatomy and the Disorders semantic groups displaying high replicability, while MWTs from the Concepts and Ideas semantic group showed low to moderate replicability. The list may be used to develop evidence-based materials in English for Medical Purposes courses and to further explore how information is packaged in healthcare communication settings.
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)
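The abstract mentions calculating keyness for each MWT in Sketch Engine. Sketch Engine documents its keyword score as the "simple maths" ratio of normalized frequencies, (fpm_focus + n) / (fpm_ref + n), where fpm is frequency per million tokens and n is a smoothing constant (default 1). The minimal sketch below reproduces that formula; the abstract does not say which n the study used, and apart from the focus-corpus size taken from the abstract, the reference-corpus size and term counts are invented for illustration.

```python
def per_million(freq, corpus_size):
    """Normalise a raw frequency to occurrences per million tokens."""
    return freq / corpus_size * 1_000_000

def simple_maths_keyness(focus_freq, focus_size, ref_freq, ref_size, n=1.0):
    """'Simple maths' keyness: (fpm_focus + n) / (fpm_ref + n).

    n is a smoothing constant: it avoids division by zero for terms absent from
    the reference corpus and, when larger, favours higher-frequency terms.
    """
    return (per_million(focus_freq, focus_size) + n) / (per_million(ref_freq, ref_size) + n)

FOCUS_SIZE = 28_384_681  # running words in the medical textbook corpus (from the abstract)
REF_SIZE = 100_000_000   # hypothetical general-English reference corpus size

# Hypothetical counts for a single candidate multi-word term.
score = simple_maths_keyness(focus_freq=2_839, focus_size=FOCUS_SIZE,
                             ref_freq=500, ref_size=REF_SIZE)
print(f"keyness = {score:.1f}")
```

A score near 1 means the term is about as frequent in both corpora; scores well above 1 mark terms characteristic of the medical corpus.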

17 pages, 1985 KiB  
Article
A Spine-Specific Lexicon for the Sentiment Analysis of Interviews with Adult Spinal Deformity Patients Correlates with SF-36, SRS-22, and ODI Scores: A Pilot Study of 25 Patients
by Ross Gore, Michael M. Safaee, Christopher J. Lynch and Christopher P. Ames
Information 2025, 16(2), 90; https://doi.org/10.3390/info16020090 - 24 Jan 2025
Cited by 1 | Viewed by 628
Abstract
Classic health-related quality of life (HRQOL) metrics are cumbersome, time-intensive, and subject to biases based on the patient’s native language, educational level, and cultural values. Natural language processing (NLP) converts text into quantitative metrics. Sentiment analysis enables subject matter experts to construct domain-specific lexicons that assign a value of either negative (−1) or positive (1) to certain words. The growth of telehealth provides opportunities to apply sentiment analysis to transcripts of adult spinal deformity patients’ visits to derive a novel and less biased HRQOL metric. In this study, we demonstrate the feasibility of constructing a spine-specific lexicon for sentiment analysis to derive an HRQOL metric for adult spinal deformity patients from their preoperative telehealth visit transcripts. We asked each of twenty-five (25) adult patients seven open-ended questions about their spinal conditions, treatment, and quality of life during telehealth visits. We analyzed the Pearson correlation between our sentiment analysis HRQOL metric and established HRQOL metrics (the Scoliosis Research Society-22 questionnaire [SRS-22], 36-Item Short Form Health Survey [SF-36], and Oswestry Disability Index [ODI]). The results show statistically significant correlations (0.43–0.74) between our sentiment analysis metric and the conventional metrics. This provides evidence that applying NLP techniques to patient transcripts can yield an effective HRQOL metric.
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)
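The abstract outlines two steps: scoring transcripts with a lexicon that maps words to −1 or 1, and correlating the resulting metric with established questionnaire scores using Pearson's r. The sketch below shows both steps under assumptions. The mini-lexicon, transcripts, and survey scores are hypothetical (the study's spine-specific lexicon is not reproduced), and averaging over matched words is an assumed aggregation rule, since the abstract does not state how word values are combined.

```python
import math
import re

# Hypothetical mini-lexicon; the study's spine-specific word list is not reproduced here.
LEXICON = {"pain": -1, "numb": -1, "worse": -1, "walk": 1, "better": 1, "relief": 1}

def sentiment_score(transcript, lexicon=LEXICON):
    """Mean lexicon value over matched tokens (assumed aggregation); 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    values = [lexicon[t] for t in tokens if t in lexicon]
    return sum(values) / len(values) if values else 0.0

def pearson(xs, ys):
    """Pearson correlation of two equal-length samples (undefined if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical transcripts paired with hypothetical questionnaire scores.
transcripts = [
    "The pain is worse every morning and my legs feel numb",
    "Some pain, but I can walk a little better now",
    "Much better, real relief, I walk every day",
]
survey_scores = [2.1, 3.0, 4.2]
metric = [sentiment_score(t) for t in transcripts]
print(metric, round(pearson(metric, survey_scores), 2))
```

With real data, one would also report a p-value for each correlation (for example via a statistics library) to establish significance, as the study does.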

29 pages, 4460 KiB  
Article
Performance of 4 Pre-Trained Sentence Transformer Models in the Semantic Query of a Systematic Review Dataset on Peri-Implantitis
by Carlo Galli, Nikolaos Donos and Elena Calciolari
Information 2024, 15(2), 68; https://doi.org/10.3390/info15020068 - 23 Jan 2024
Cited by 11 | Viewed by 7109
Abstract
Systematic reviews are cumbersome yet essential to the epistemic process of medical science. Finding significant reports, however, is a daunting task because the sheer volume of published literature makes the manual screening of databases time-consuming. The use of Artificial Intelligence could make literature processing faster and more efficient. Sentence transformers are groundbreaking algorithms that can generate rich semantic representations of text documents and allow for semantic queries. In the present report, we compared four freely available sentence transformer pre-trained models (all-MiniLM-L6-v2, all-MiniLM-L12-v2, all-mpnet-base-v2, and all-distilroberta-v1) on a convenience sample of 6110 articles from a published systematic review. The authors of this review manually screened the dataset and identified 24 target articles that addressed the Focused Questions (FQ) of the review. We applied the four sentence transformers to the dataset and, using the FQ as a query, performed a semantic similarity search on the dataset. The models identified similarities between the FQ and the target articles to varying degrees, and, sorting the dataset by semantic similarity using the best-performing model (all-mpnet-base-v2), the target articles could be found in the top 700 papers out of the 6110 in the dataset. Our data indicate that the choice of an appropriate pre-trained model could remarkably reduce the number of articles to screen and the time to completion for systematic reviews.
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)
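The abstract describes embedding the focused question and every article, ranking articles by semantic similarity, and measuring how far down the ranking all 24 target articles appear (within the top 700 of 6110 for all-mpnet-base-v2). The sketch below shows that ranking and the "screening depth" measure using cosine similarity. The 2-D vectors and the helper names `rank_by_similarity` and `screening_depth` are hypothetical; with the sentence-transformers package, real vectors could be obtained with `SentenceTransformer("all-mpnet-base-v2").encode(texts)`, which this sketch avoids so that it runs without downloading a model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)

def screening_depth(ranking, targets):
    """How many top-ranked documents must be read before every target has been seen."""
    position = {doc_id: i + 1 for i, doc_id in enumerate(ranking)}
    return max(position[t] for t in targets)

# Hypothetical 2-D "embeddings"; real sentence-transformer vectors have hundreds of dimensions.
query = [1.0, 0.0]  # stands in for the embedded focused question
docs = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.7, 0.7], "d4": [0.2, 0.98]}
targets = {"d1", "d3"}  # articles the human reviewers judged relevant

ranking = rank_by_similarity(query, docs)
print(ranking, screening_depth(ranking, targets), "of", len(docs))
```

Comparing the screening depth across models is one simple way to express the abstract's finding: a smaller depth means fewer articles a reviewer must read to recover every relevant one.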
