Large Language Model and Large Vision Model for Life Sciences

A special issue of Life (ISSN 2075-1729). This special issue belongs to the section "Biochemistry, Biophysics and Computational Biology".

Deadline for manuscript submissions: closed (28 June 2024)

Special Issue Editor

Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
Interests: bioinformatics; genetics; genomics; machine learning; ceRNA network; predictive modeling

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) has profoundly changed the research paradigm of life sciences. Many Large Language Models (LLMs) inspired by ChatGPT have been built to accelerate biomedical studies. For example, GeneGPT can answer biomedical questions using NCBI Web APIs, and CancerGPT can predict drug pair synergy using large pre-trained language models.

Besides text data, biomedical research produces many images and videos, such as pathology images and light or electron microscopy images and videos of cells. It is estimated that over one million digital pathology images are collected per day across the USA. Many hospitals are sharing their large repositories. Large Vision Models (LVMs) can be used to analyze such large volumes of image and video data.

Because LLMs and LVMs are trained on large, diverse datasets, they are general-purpose models that can perform many tasks without extensive sample collection or feature engineering. For language problems, we could build one NLP (Natural Language Processing) model for translation and another to fix grammar errors. An LLM such as ChatGPT, which is trained on almost the entire Internet, can perform all of these tasks with a single model. Similarly, we can train a traditional CV (Computer Vision) model to map culture plate images to cell counts, but a cell type with a different shape requires training another model. With an LVM, we can instead train one model, prompt it with a series of growth images of cell A and a few growth images of cell B, and ask it to predict how many B cells will appear in the next image.

Although building an LVM is still challenging, several new approaches, such as visual sentences, have been proposed. Visual data can be represented as sequences of tokens, so an LVM can be trained by minimizing the cross-entropy loss for next-token prediction. Different vision tasks can then be solved with suitable visual prompts.
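
As a rough illustration of the training objective mentioned above, the following minimal sketch computes the mean cross-entropy loss for next-token prediction over a toy token sequence. The function name `next_token_cross_entropy`, the 4-token vocabulary, and the example logits are all hypothetical and chosen only for illustration; a real LVM would obtain its tokens from an image tokenizer and its logits from a trained network.

```python
import math


def next_token_cross_entropy(logits, tokens):
    """Mean cross-entropy for next-token prediction.

    logits[t] holds unnormalized scores over the vocabulary for the
    token at position t + 1, given tokens[0..t]. tokens is the full
    sequence of integer token ids.
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("need one logit vector per predicted token")
    total = 0.0
    for scores, target in zip(logits, tokens[1:]):
        # Numerically stable log-sum-exp: subtract the max score first.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-probability assigned to the true next token.
        total += log_norm - scores[target]
    return total / len(logits)


# Hypothetical 4-token visual vocabulary and a toy "visual sentence".
tokens = [0, 2, 1, 3]
# An untrained model that scores every token equally.
uniform_logits = [[0.0] * 4 for _ in range(3)]
loss = next_token_cross_entropy(uniform_logits, tokens)
print(round(loss, 4))  # log(4), about 1.3863
```

With uniform scores the loss equals the logarithm of the vocabulary size; training lowers it by pushing probability mass toward the token that actually comes next in the sequence.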

The fusion of LLMs and LVMs makes Large Multimodal Models (LMMs) possible and may eventually make Artificial General Intelligence (AGI) possible. Whether we see a sentence, an image, or a video, we will understand its underlying meaning.

Therefore, we invite multidisciplinary researchers to submit original articles, as well as review articles, to this Special Issue. Potential topics of interest include, but are not limited to, the following:

  1. Large language models for single-cell studies
  2. Large language models in protein design
  3. Large language models in high-throughput drug screening/discovery
  4. Large vision models for pathology images
  5. Large vision models for spatial omics images
  6. Large vision models for cancer cell growth and migration
  7. Large multimodal models for building digital twins of cancer progression
  8. Large vision models in high-throughput drug screening/discovery

Dr. Tao Huang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Life is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • large language model
  • large vision model
  • images
  • videos
  • single cell
  • protein design
  • spatial omics
  • drug discovery
  • digital twin

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)

Research

27 pages, 4348 KiB  
Article
Machine Learning in Identifying Marker Genes for Congenital Heart Diseases of Different Cardiac Cell Types
by Qinglan Ma, Yu-Hang Zhang, Wei Guo, Kaiyan Feng, Tao Huang and Yu-Dong Cai
Life 2024, 14(8), 1032; https://doi.org/10.3390/life14081032 - 19 Aug 2024
Abstract
Congenital heart disease (CHD) represents a spectrum of inborn heart defects influenced by genetic and environmental factors. This study advances the field by analyzing gene expression profiles in 21,034 cardiac fibroblasts, 73,296 cardiomyocytes, and 35,673 endothelial cells, utilizing single-cell level analysis and machine learning techniques. Six conditions were investigated for each cardiac cell type: dilated cardiomyopathy (DCM), donor hearts (used as healthy controls), hypertrophic cardiomyopathy (HCM), heart failure with hypoplastic left heart syndrome (HF_HLHS), neonatal hypoplastic left heart syndrome (Neo_HLHS), and Tetralogy of Fallot (TOF). Each cell sample was represented by 29,266 gene features. These features were first analyzed by six feature-ranking algorithms, resulting in several feature lists. Then, these lists were fed into incremental feature selection, containing two classification algorithms, to extract essential gene features and classification rules and build efficient classifiers. The identified essential genes can be potential CHD markers in different cardiac cell types. For instance, LASSO identified key genes specific to various heart cell types in CHD subtypes. FOXO3 was found to be up-regulated in cardiac fibroblasts for both dilated and hypertrophic cardiomyopathy. In cardiomyocytes, distinct genes such as TMTC1, ART3, ARHGAP24, SHROOM3, and XIST were linked to dilated cardiomyopathy, Neo-Hypoplastic Left Heart Syndrome, hypertrophic cardiomyopathy, HF-Hypoplastic Left Heart Syndrome, and Tetralogy of Fallot, respectively. Endothelial cell analysis further revealed COL25A1, NFIB, and KLF7 as significant genes for dilated cardiomyopathy, hypertrophic cardiomyopathy, and Tetralogy of Fallot. LightGBM, CatBoost, MCFS, RF, and XGBoost further delineated key genes for specific CHD subtypes, demonstrating the efficacy of machine learning in identifying CHD-specific genes.
Additionally, this study developed quantitative rules for representing the gene expression patterns related to CHDs. This research underscores the potential of machine learning in unraveling the molecular complexities of CHD and establishes a foundation for future mechanism-based studies.
(This article belongs to the Special Issue Large Language Model and Large Vision Model for Life Sciences)