Applications of Artificial Intelligence in Biomedical Data Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Biomedical Engineering".

Deadline for manuscript submissions: 31 August 2024 | Viewed by 16625

Special Issue Editors


Dr. Jeong Seop Sim
Guest Editor
Department of Computer Science and Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon 22212, Korea
Interests: algorithm; bioinformatics; data analysis

Dr. SooJun Park
Guest Editor
Electronics and Telecommunications Research Institute, Daejeon, Korea
Interests: image processing; health-IT; bioinformatics; data mining

Special Issue Information

Dear Colleagues,

The volume of biomedical data has exploded over the past decade, driven by the growth of genomic data as gene sequencing costs have fallen and by the digitization of medical records. This flood of biomedical information requires new thinking about how data can be used to enhance scientific understanding and improve bioresearch and healthcare services. Deep learning, which has advanced alongside broader AI technology, offers new solutions for biomedical data analysis and can be applied in clinical research. In this Special Issue, we want to address recent advances in the following AI-related topics:

  • Biomedical data analysis;
  • Biomedical engineering;
  • Bioinformatics;
  • Sequence analysis;
  • Time series data analysis.

Submissions are invited for both original research and review articles. We hope that this collection of papers will serve as an inspiration for those interested in the applications of Artificial Intelligence in biomedical informatics.

Dr. Jeong Seop Sim
Dr. SooJun Park
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • biomedical data analysis
  • sequence analysis
  • bioinformatics

Published Papers (12 papers)


Research

19 pages, 2760 KiB  
Article
Explainable Multimodal Graph Isomorphism Network for Interpreting Sex Differences in Adolescent Neurodevelopment
by Binish Patel, Anton Orlichenko, Adnan Patel, Gang Qu, Tony W. Wilson, Julia M. Stephen, Vince D. Calhoun and Yu-Ping Wang
Appl. Sci. 2024, 14(10), 4144; https://doi.org/10.3390/app14104144 - 14 May 2024
Viewed by 255
Abstract
Background: A fundamental grasp of the variability observed in healthy individuals holds paramount importance in the investigation of neuropsychiatric conditions characterized by sex-related phenotypic distinctions. Functional magnetic resonance imaging (fMRI) serves as a meaningful tool for discerning these differences. Among deep learning models, graph neural networks (GNNs) are particularly well-suited for analyzing brain networks derived from fMRI blood oxygen level-dependent (BOLD) signals, enabling the effective exploration of sex differences during adolescence. Method: In the present study, we introduce a multi-modal graph isomorphism network (MGIN) designed to elucidate sex-based disparities using fMRI task-related data. Our approach amalgamates brain networks obtained from multiple scans of the same individual, thereby enhancing predictive capabilities and feature identification. The MGIN model adeptly pinpoints crucial subnetworks both within and between multi-task fMRI datasets. Moreover, it offers interpretability through the utilization of GNNExplainer, which identifies pivotal sub-network graph structures contributing significantly to sex group classification. Results: Our findings indicate that the MGIN model outperforms competing models in terms of classification accuracy, underscoring the benefits of combining two fMRI paradigms. Additionally, our model discerns the most significant sex-related functional networks, encompassing the default mode network (DMN), visual (VIS) network, cognitive (CNG) network, frontal (FRNT) network, salience (SAL) network, subcortical (SUB) network, and sensorimotor (SM) network associated with hand and mouth movements. Remarkably, the MGIN model achieves superior sex classification accuracy when juxtaposed with other state-of-the-art algorithms, yielding a noteworthy 81.67% improvement in classification accuracy. 
Conclusion: Our model’s superiority emanates from its capacity to consolidate data from multiple scans of subjects within a proven interpretable framework. Beyond its classification prowess, our model guides our comprehension of neurodevelopment during adolescence by identifying critical subnetworks of functional connectivity.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)
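
For readers unfamiliar with graph isomorphism networks, the following is a minimal pure-Python sketch of one GIN layer, which updates each node as h_v' = MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u), followed by a sum readout. It is not the authors' MGIN: the single linear-plus-ReLU layer standing in for the MLP, the toy dimensions, and the hypothetical `multimodal_embedding` fusion (concatenating one readout per fMRI paradigm) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Graph = Tuple[Dict[int, Vector], Dict[int, List[int]]]  # (node features, adjacency lists)


def gin_layer(features: Dict[int, Vector], adjacency: Dict[int, List[int]],
              weight: Sequence[Vector], eps: float = 0.0) -> Dict[int, Vector]:
    """One GIN update: h_v' = ReLU(W @ ((1 + eps) * h_v + sum of neighbour h_u))."""
    updated: Dict[int, Vector] = {}
    for node, h in features.items():
        # Self term scaled by (1 + eps), plus an unweighted sum over neighbours.
        agg = [(1.0 + eps) * x for x in h]
        for neighbour in adjacency.get(node, []):
            agg = [a + b for a, b in zip(agg, features[neighbour])]
        # Assumption: one linear layer + ReLU stands in for the GIN's MLP.
        updated[node] = [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight]
    return updated


def sum_readout(features: Dict[int, Vector]) -> Vector:
    """Sum-pool node embeddings into a single graph-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]


def multimodal_embedding(graphs: Sequence[Graph], weight: Sequence[Vector],
                         eps: float = 0.0) -> Vector:
    """Hypothetical fusion: one GIN pass per fMRI paradigm, readouts concatenated."""
    fused: Vector = []
    for feats, adj in graphs:
        fused.extend(sum_readout(gin_layer(feats, adj, weight, eps)))
    return fused
```

With two small brain graphs (one per task paradigm) for the same subject, `multimodal_embedding` returns one concatenated vector that a downstream classifier could use; sum aggregation is what gives GIN its discriminative power relative to mean or max pooling.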

23 pages, 5723 KiB  
Article
Layer-Weighted Attention and Ascending Feature Selection: An Approach for Seriousness Level Prediction Using the FDA Adverse Event Reporting System
by Bader Aldughayfiq, Hisham Allahem, Ayman Mohamed Mostafa, Mohammed Alnusayri and Mohamed Ezz
Appl. Sci. 2024, 14(8), 3280; https://doi.org/10.3390/app14083280 - 13 Apr 2024
Viewed by 371
Abstract
In this study, we introduce a novel combination of layer-static-weighted attention and ascending feature selection techniques to predict the seriousness level of adverse drug events using the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). We used natural language processing (NLP) to analyze the terms in the active substance field, in addition to demographic and event information such as patient sex, healthcare provider qualification, and drug characterization. Our ascending feature selection method, which progressively incorporates additional features based on their importance, yielded continuous improvements in prediction performance. In parallel, we employed a layer-static-weighted attention technique, which balances the model’s focus between NLP and demographic features. This technique performed best at a balanced weight of 50%, yielding an average test accuracy of 74.56% and a CV ROC score of 0.83 when 4000 features were included, indicating a clear advantage in including a larger volume of meaningful features. By integrating these methodologies, we constructed a robust model capable of effectively predicting seriousness levels, offering significant potential for improving pharmacovigilance and drug safety monitoring. The results underscore the value of NLP and demographic data in predicting drug event seriousness and demonstrate the effectiveness of our combined techniques. We encourage further research to refine these methods and evaluate their application to other clinical datasets.
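
The two techniques named in this abstract can be sketched roughly as follows, assuming that "layer-static-weighted attention" means scaling the NLP and demographic branch outputs by a fixed weight w and 1 − w before concatenation, and that ascending feature selection evaluates growing prefixes of an importance-ranked feature list. Both readings are assumptions; `evaluate` is a hypothetical callback that would train and score a model (for example, by cross-validated ROC) on a given feature subset.

```python
from typing import Callable, List, Sequence, Tuple


def static_weighted_fusion(nlp_repr: Sequence[float], demo_repr: Sequence[float],
                           w: float = 0.5) -> List[float]:
    """Scale the NLP branch by w and the demographic branch by (1 - w), then concatenate.

    Assumption: one plausible reading of 'layer-static-weighted attention'."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * x for x in nlp_repr] + [(1.0 - w) * x for x in demo_repr]


def ascending_feature_selection(importances: Sequence[float],
                                evaluate: Callable[[List[int]], float],
                                step: int = 1) -> Tuple[List[int], float]:
    """Evaluate growing prefixes of the importance ranking and return the best one.

    `evaluate` is a hypothetical callback that trains/scores a model on the given
    feature indices. Prefix sizes are step, 2*step, ... up to len(importances)."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(step, len(ranked) + 1, step):
        subset = ranked[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With thousands of NLP features, a `step` of a few hundred keeps the number of model retrainings manageable while still tracing how performance grows as features are added.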

19 pages, 1919 KiB  
Article
Applying a Recurrent Neural Network-Based Deep Learning Model for Gene Expression Data Classification
by Sergii Babichev, Igor Liakh and Irina Kalinina
Appl. Sci. 2023, 13(21), 11823; https://doi.org/10.3390/app132111823 - 29 Oct 2023
Cited by 1 | Viewed by 1428
Abstract
The value of gene expression data processing for classification lies in its ability to discern intricate patterns and relationships within genetic information, enabling the precise categorization and understanding of gene expression profiles and their impact on biological processes and traits. In this study, we investigated various architectures and types of recurrent neural networks for gene expression data. The effectiveness of each model was evaluated using various classification quality criteria based on type 1 and type 2 errors. Moreover, we calculated the integrated F1-score index using the Harrington desirability method, which improved the objectivity of decision making when model effectiveness was evaluated. The final decision regarding model effectiveness was based on a comprehensive classification quality criterion, calculated as the weighted sum of classification accuracy, the integrated F1-score index, and loss function values. The simulation results favor a single-layer GRU recurrent network with 75 neurons in the recurrent layer. We also compared convolutional and recurrent neural networks on gene expression data classification. Although convolutional neural networks offer advantages in terms of loss function value and training time, a comparative analysis revealed that, in terms of classification accuracy on the test data subset, the GRU model is slightly better than the CNN and LSTM models. The GRU network achieved a classification accuracy of 97.2%, correctly identifying 954 of 981 objects, whereas the other models achieved 97.1%, correctly identifying 952 objects.
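
As a reference for the recurrent architecture favored here, this is a minimal pure-Python sketch of a single GRU layer (biases omitted) using the gate equations given in the PyTorch `nn.GRU` documentation. The hidden size of 75 comes from the abstract; treating each gene's expression value as one time step of a one-dimensional input, the random initialization, and the omission of the final classification layer are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRULayer:
    """Single GRU layer, biases omitted, using PyTorch-style gate equations:
    r = sigmoid(W_r x + U_r h), z = sigmoid(W_z x + U_z h),
    n = tanh(W_n x + r * (U_n h)), h_new = (1 - z) * n + z * h."""

    def __init__(self, input_size: int, hidden_size: int = 75, seed: int = 0):
        rng = random.Random(seed)
        bound = 1.0 / math.sqrt(hidden_size)  # uniform init range, as in PyTorch

        def init(cols: int) -> Matrix:
            return [[rng.uniform(-bound, bound) for _ in range(cols)]
                    for _ in range(hidden_size)]

        self.hidden_size = hidden_size
        self.W: Dict[str, Matrix] = {g: init(input_size) for g in ("r", "z", "n")}
        self.U: Dict[str, Matrix] = {g: init(hidden_size) for g in ("r", "z", "n")}

    def step(self, x: Sequence[float], h: Sequence[float]) -> List[float]:
        wx = {g: _matvec(self.W[g], x) for g in self.W}
        uh = {g: _matvec(self.U[g], h) for g in self.U}
        r = [_sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]  # reset gate
        z = [_sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]  # update gate
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]  # candidate
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, sequence: Sequence[Sequence[float]]) -> List[float]:
        """Run the layer over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h
```

A dense layer with softmax over the final 75-dimensional hidden state would then produce class probabilities; because each new state is a convex mix of a tanh candidate and the previous state, all hidden values stay in (−1, 1).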

13 pages, 669 KiB  
Article
Detection of Unknown Polymorphic Patterns Using Feature-Extracting Part of a Convolutional Autoencoder
by Przemysław Kucharski and Krzysztof Ślot
Appl. Sci. 2023, 13(19), 10842; https://doi.org/10.3390/app131910842 - 29 Sep 2023
Viewed by 608
Abstract
Background: The present paper proposes a novel approach for detecting the presence of unknown polymorphic patterns in random symbol sequences that also contain already known polymorphic patterns. Methods: We propose representing the rules that define the considered patterns as regular expressions and show how these expressions can be modeled using cascades of filters in neural convolutional layers. We adopted a convolutional autoencoder (CAE) as the pattern detection framework. To detect unknown patterns, we first incorporated knowledge of the known rules into the CAE’s convolutional feature extractor by fixing the weights of some of its filter cascades. Then, we executed the learning procedure, in which the weights of the remaining filters were driven by two objectives: the first was to ensure correct sequence reconstruction, and the second was to prevent these weights from learning the already known patterns. Results: The proposed methodology was tested on sample sequences derived from the human genome. The analysis of the experimental results provided statistically significant information on the presence or absence of polymorphic patterns that were not known in advance. Conclusions: The proposed method was able to detect the existence of unknown polymorphic patterns.
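
The idea of modeling a regular expression with convolutional filters can be illustrated with a single fixed filter. This sketch covers only a tiny regex subset (literals, `.`, and character classes such as `[CT]`) and one filter rather than the cascades used in the paper. It one-hot encodes a DNA sequence and builds a kernel with weight 1 for each allowed symbol, so a window matches exactly when the filter response equals the pattern length. Frozen filters of this kind play the role of the "known patterns" whose weights the authors fix; all helper names are hypothetical.

```python
from typing import List, Set

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    return [[1.0 if s == a else 0.0 for a in ALPHABET] for s in seq]


def parse_pattern(regex: str) -> List[Set[str]]:
    """Parse a tiny regex subset: literals, '.', and character classes like [CT]."""
    positions: List[Set[str]] = []
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "[":
            j = regex.index("]", i)
            positions.append(set(regex[i + 1:j]))
            i = j + 1
        elif ch == ".":
            positions.append(set(ALPHABET))
            i += 1
        else:
            positions.append({ch})
            i += 1
    return positions


def pattern_to_filter(positions: List[Set[str]]) -> List[List[float]]:
    """Weight 1 for allowed symbols, 0 otherwise; one kernel row per pattern position."""
    return [[1.0 if a in allowed else 0.0 for a in ALPHABET] for allowed in positions]


def conv1d_valid(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """'Valid' 1-D cross-correlation, as computed by deep-learning conv layers."""
    k = len(kernel)
    return [sum(xv * wv for t in range(k) for xv, wv in zip(x[start + t], kernel[t]))
            for start in range(len(x) - k + 1)]


def find_matches(seq: str, regex: str) -> List[int]:
    kernel = pattern_to_filter(parse_pattern(regex))
    responses = conv1d_valid(one_hot(seq), kernel)
    # A window matches exactly when every position contributes a 1.
    return [i for i, r in enumerate(responses) if r == len(kernel)]
```

For example, `find_matches("GACGATGTT", "A[CT]G")` reports the windows starting at positions 1 (`ACG`) and 4 (`ATG`).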

12 pages, 4620 KiB  
Article
Automated Clinical Impression Generation for Medical Signal Data Searches
by Woonghee Lee, Jaewoo Yang, Doyeong Park and Younghoon Kim
Appl. Sci. 2023, 13(15), 8931; https://doi.org/10.3390/app13158931 - 3 Aug 2023
Viewed by 933
Abstract
Medical retrieval systems have become increasingly important in clinical settings. However, commercial retrieval systems that rely heavily on term-based indexing struggle with continuous medical data, such as electroencephalography data, primarily because obtaining neurologist analyses is expensive. As data recording systems become more affordable, addressing these challenges becomes increasingly crucial. Traditional procedures for annotating, classifying, and interpreting medical data are costly, time-consuming, and demand specialized knowledge. While cross-modal retrieval systems have been proposed to address these challenges, most concentrate on images and text, sidelining time-series medical data such as electroencephalography data. Because interpreting electroencephalography signals, which record brain activity, requires a neurologist’s expertise, this step is often the most expensive component. A retrieval system capable of using text to identify relevant signals, without the need for expert analysis, is therefore desirable. Our research proposes a solution that facilitates the creation of indexing systems by generating reports from electroencephalography signals when reports are still pending a neurologist’s review. We introduce a method that combines a convolutional-neural-network-based encoder from DeepSleepNet, which extracts features from electroencephalography signals, with a transformer that learns the signal’s auto-correlation and the relationship between the signal and the corresponding report. Experimental evaluation using real-world data revealed that our approach surpasses baseline methods. These findings suggest potential advancements in medical data retrieval and a decreased reliance on expert knowledge for electroencephalography signal analysis.
As such, our research represents a significant stride towards making electroencephalography data more comprehensible and usable in clinical environments.
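
Once impressions can be generated automatically, EEG recordings become searchable with an ordinary term-based index. The sketch below assumes a hypothetical `generate_impression` callable standing in for the trained DeepSleepNet-encoder-plus-transformer model, and builds an inverted index queried with AND semantics. It illustrates the indexing idea only, not the authors' system.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, object],
                generate_impression: Callable[[object], str]) -> Dict[str, Set[str]]:
    """Map each term of a generated impression to the ids of the records containing it.

    `generate_impression` is a hypothetical stand-in for the signal-to-report model."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, signal in records.items():
        for term in tokenize(generate_impression(signal)):
            index[term].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return records whose generated impression contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Generated impressions could later be replaced by the neurologist's report once it arrives, simply by re-indexing that record.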

19 pages, 1618 KiB  
Article
A Hybrid Model of Cancer Diseases Diagnosis Based on Gene Expression Data with Joint Use of Data Mining Methods and Machine Learning Techniques
by Sergii Babichev, Lyudmyla Yasinska-Damri and Igor Liakh
Appl. Sci. 2023, 13(10), 6022; https://doi.org/10.3390/app13106022 - 14 May 2023
Cited by 6 | Viewed by 1490
Abstract
One of the current focuses of modern bioinformatics is the development of hybrid models that process gene expression data in order to create diagnostic systems for various diseases. In this study, we propose a solution to this problem that combines an inductive spectral clustering algorithm, a random forest classifier, a convolutional neural network, and the alternative voting method for making the final decision about a patient’s condition. In the first stage, we apply the spectral clustering algorithm to gene expression profiles using inductive methods of objective clustering, calculating internal, external, and balance clustering quality criteria. This yields clusters of mutually correlated and differentially expressed gene expression profiles. In the second stage, we apply the random forest classifier and the convolutional neural network to identify the examined objects, using the gene expression values in the allocated clusters as attributes. The presented research addresses both binary and multi-class classification tasks. The final decision about the patient’s condition is made using the alternative voting method, which combines the classification results obtained from the gene expression data in the various clusters. The simulation results showed that the proposed technique was highly effective, achieving high accuracy in object identification with both classifiers. However, the convolutional neural network processed data significantly more efficiently than the random forest algorithm, owing to its substantially shorter processing time.
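
The alternative voting method is usually understood as instant-runoff voting: if no class holds a majority of first preferences, the class with the fewest first preferences is eliminated and its ballots transfer to their next choice. A minimal sketch, assuming each cluster-specific classifier casts one ballot ranking the classes by predicted probability, and breaking elimination ties alphabetically (both assumptions):

```python
from collections import Counter
from typing import Dict, List, Sequence


def ballot_from_probs(probs: Dict[str, float]) -> List[str]:
    """Rank classes from most to least probable (one ballot per cluster classifier)."""
    return sorted(probs, key=probs.get, reverse=True)


def alternative_vote(ballots: Sequence[Sequence[str]]) -> str:
    """Instant-runoff: repeatedly drop the class with the fewest first preferences."""
    remaining = {c for ballot in ballots for c in ballot}
    while remaining:
        firsts: Counter = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:  # highest-ranked class still in the race
                    firsts[choice] += 1
                    break
        total = sum(firsts.values())
        leader, votes = max(sorted(firsts.items()), key=lambda kv: kv[1])
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Classes with no first preferences count as 0; ties broken alphabetically.
        loser = min(sorted(remaining), key=lambda c: firsts.get(c, 0))
        remaining.discard(loser)
    raise ValueError("no ballots were cast")
```

Unlike plain majority voting, this lets a class that is consistently ranked second by many cluster classifiers win once weaker alternatives are eliminated.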

11 pages, 5419 KiB  
Article
Web Interface of NER and RE with BERT for Biomedical Text Mining
by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn
Appl. Sci. 2023, 13(8), 5163; https://doi.org/10.3390/app13085163 - 21 Apr 2023
Viewed by 1928
Abstract
The BioBERT Named Entity Recognition (NER) model is a high-performance model designed to identify both known and unknown entities. It surpasses previous NER models used by text-mining tools, such as tmTool and ezTag, in discovering novel entities. In previous studies, the Biomedical Entity Recognition and Multi-Type Normalization Tool (BERN) employed this model to identify words that represent specific names, determine the type of each word, and offer an NER service on a web page. We aimed to offer a web service that additionally includes Relation Extraction (RE), the task of determining the relation between entity pairs within a sentence. First, as in BERN, we fine-tuned the BioBERT NER model in the biomedical domain to recognize new entities, covering two categories: diseases and genes/proteins. Additionally, we fine-tuned the BioBERT RE model to determine the presence or absence of a relation between the identified gene–disease entity pairs. The NER and RE results are displayed on a web page built with the Django web framework. NER results are presented in distinct colors, and RE results are visualized as interactive graphs using NetworkX and Cytoscape.
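
Between the NER and RE steps, token-level NER output has to be turned into entity spans and candidate gene–disease pairs for the RE model. The sketch below assumes BIO tags with hypothetical type names `GENE` and `DISEASE`; the actual label scheme used in this work is not stated here. Each returned pair could then be classified by the RE model and, if related, drawn as an edge in the NetworkX or Cytoscape graph.

```python
from itertools import product
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (entity text, entity type)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Merge B-/I- tagged tokens into entity spans (a stray I- starts a new span)."""
    entities: List[Entity] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O": outside any entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


def candidate_pairs(entities: List[Entity]) -> List[Tuple[str, str]]:
    """All (gene, disease) pairs in a sentence, to be scored by the RE model."""
    genes = [text for text, etype in entities if etype == "GENE"]
    diseases = [text for text, etype in entities if etype == "DISEASE"]
    return list(product(genes, diseases))
```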

10 pages, 557 KiB  
Article
Order-Preserving Multiple Pattern Matching in Parallel
by Somin Park, Jinhyeok Park, Youngho Kim and Jeong Seop Sim
Appl. Sci. 2023, 13(8), 5142; https://doi.org/10.3390/app13085142 - 20 Apr 2023
Viewed by 940
Abstract
The order-preserving multiple pattern matching problem is to find all substrings of a text T whose relative order is the same as that of some pattern in a given set of patterns. Various sequential algorithms have been studied for this problem. In this paper, we propose two parallel algorithms: one based on the Aho–Corasick automaton and the other on fingerprint tables. We also present experimental results comparing the execution times of the two parallel algorithms on various types of time-series data.
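
Two windows are order-isomorphic when every pairwise comparison agrees, which is equivalent to having identical dense-rank encodings. The sketch below uses that encoding as a simplified fingerprint, stores all patterns in one hash table keyed by (length, encoding), and splits the text's start positions across workers. It is a fingerprint-table approach in spirit only: the paper's actual fingerprint function and its Aho–Corasick variant are not reproduced, and Python threads illustrate the work split rather than delivering real CPU parallelism (a process pool would be needed for that).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, Tuple[int, ...]]


def order_key(window: Sequence[float]) -> Tuple[int, ...]:
    """Dense-rank encoding: two windows are order-isomorphic iff their keys are equal."""
    ranks = {v: r for r, v in enumerate(sorted(set(window)))}
    return tuple(ranks[v] for v in window)


def build_table(patterns: Sequence[Sequence[float]]) -> Dict[Key, List[int]]:
    table: Dict[Key, List[int]] = {}
    for pid, p in enumerate(patterns):
        table.setdefault((len(p), order_key(p)), []).append(pid)
    return table


def _scan(text, table, lengths, start, stop) -> List[Tuple[int, int]]:
    """Check every pattern length at start positions in [start, stop)."""
    hits = []
    for i in range(start, stop):
        for m in lengths:
            if i + m <= len(text):
                for pid in table.get((m, order_key(text[i:i + m])), []):
                    hits.append((i, pid))
    return hits


def parallel_oppm(text: Sequence[float], patterns: Sequence[Sequence[float]],
                  workers: int = 4) -> List[Tuple[int, int]]:
    """Return sorted (start position, pattern id) pairs for all order-preserving matches."""
    table = build_table(patterns)
    lengths = sorted({len(p) for p in patterns})
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_scan, text, table, lengths, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        return sorted(hit for f in futures for hit in f.result())
```

Because workers partition start positions rather than the text itself, windows that cross a chunk boundary are still checked by exactly one worker, so no overlap handling is needed.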

18 pages, 2664 KiB  
Article
RNA Sequences-Based Diagnosis of Parkinson’s Disease Using Various Feature Selection Methods and Machine Learning
by Jingeun Kim, Hye-Jin Park and Yourim Yoon
Appl. Sci. 2023, 13(4), 2698; https://doi.org/10.3390/app13042698 - 20 Feb 2023
Viewed by 2294
Abstract
Parkinson’s disease is a neurodegenerative disease associated with genetic and environmental factors. However, the genes causing this degeneration have not been determined, and no cure has been reported. Recently, studies have used machine learning to classify diseases from RNA-seq data, and accurate machine-learning-based diagnosis is becoming an important task. In this study, we focus on how various feature selection methods can improve the performance of machine learning for accurate diagnosis of Parkinson’s disease. In addition, we analyzed the performance metrics and computational costs of running the models with and without these feature selection methods. Experiments were conducted using RNA sequencing data; RNA sequencing is a technique that profiles the transcription of organisms using next-generation sequencing. Genetic algorithms (GA), information gain (IG), and the wolf search algorithm (WSA) were employed as feature selection methods. Four machine learning algorithms, namely extreme gradient boosting (XGBoost), deep neural network (DNN), support vector machine (SVM), and decision tree (DT), were used as classifiers. The models were evaluated using performance indicators such as accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve. For XGBoost and DNN, feature selection methods based on GA, IG, and WSA improved performance by 10.00% and 38.18%, respectively. For SVM and DT, feature selection methods based on IG and WSA improved performance by 0.91% and 7.27%, respectively. The results demonstrate that various feature selection methods improve the performance of machine learning when classifying Parkinson’s disease using RNA-seq data.
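
Of the three feature selection methods, information gain is the simplest to state: IG(Y; X) = H(Y) − Σ_v P(X = v)·H(Y | X = v). The sketch below assumes that RNA-seq expression values have already been discretized (for example, into low/high bins), which is an assumption rather than the authors' documented preprocessing; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discretized feature X."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_ig(matrix: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
                k: int) -> List[int]:
    """matrix[i][j] is the (assumed pre-discretized) expression of gene j in sample i."""
    n_features = len(matrix[0])
    scores = [information_gain([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

A gene that perfectly separates patients from controls scores 1 bit on a balanced two-class set, while a gene unrelated to the label scores 0.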

15 pages, 1714 KiB  
Article
A Wrapped Approach Using Unlabeled Data for Diabetic Retinopathy Diagnosis
by Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim
Appl. Sci. 2023, 13(3), 1901; https://doi.org/10.3390/app13031901 - 1 Feb 2023
Cited by 2 | Viewed by 1150
Abstract
Large-scale datasets with sufficient and equal quantities of data in each class are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of data and an imbalanced class distribution, which often arise in the medical domain, cause modern deep neural networks to suffer greatly from imbalanced learning and overfitting. A diagnostic model of diabetic retinopathy (DR) trained on such a dataset using supervised learning is severely biased toward the majority class. To make imbalanced learning more effective, this study proposes leveraging retinal fundus images without human annotations through self-supervised or semi-supervised learning. The proposed approach to DR detection adds an auxiliary procedure to the target task, which identifies DR using supervised learning. This added procedure uses unlabeled data to pre-train a model that first learns features from the data through self-supervised or semi-supervised learning; the pre-trained model and its learned parameters are then transferred to the target model. This wrapper algorithm for learning from unlabeled data can help the model gain more information from samples in the minority class, thereby improving imbalanced learning to some extent. Comprehensive experiments demonstrate that the model trained with the proposed method outperformed a supervised-only baseline trained on the same data, with an accuracy improvement of 4 to 5%. In a further comparison, the proposed method also performed much better than several state-of-the-art methods; on EyePACS, for example, it outperformed the customized CNN model by 9%. The experiments also show that models trained with a smaller but balanced dataset are not worse than those trained with a larger but imbalanced dataset.
Therefore, our study reveals that utilizing unlabeled data can help avoid the high cost of collecting and labeling large-scale medical datasets.
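
The abstract does not spell out the semi-supervised procedure. One common ingredient is confidence-thresholded pseudo-labeling, sketched below under that assumption: unlabeled fundus images whose top predicted class probability reaches a threshold receive that class as a pseudo-label, and an optional per-class cap keeps majority-class images from dominating. The 0.9 threshold and the cap are hypothetical parameters, not values from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def select_pseudo_labels(probabilities: Sequence[Sequence[float]],
                         threshold: float = 0.9,
                         max_per_class: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return sorted (sample index, pseudo-label) pairs for confident predictions.

    With max_per_class set, only the most confident samples of each class are kept,
    which limits how far the pseudo-labeled set can drift toward the majority class."""
    per_class: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for idx, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[label]
        if confidence >= threshold:
            per_class[label].append((confidence, idx))
    selected: List[Tuple[int, int]] = []
    for label, items in per_class.items():
        items.sort(reverse=True)  # most confident first
        if max_per_class is not None:
            items = items[:max_per_class]
        selected.extend((idx, label) for _, idx in items)
    return sorted(selected)
```

The selected images would be added to the labeled pool for the next training round; the self-supervised branch of the approach (pre-training on a pretext task before transfer) is not covered by this sketch.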

13 pages, 4331 KiB  
Article
Biomedical Text NER Tagging Tool with Web Interface for Generating BERT-Based Fine-Tuning Dataset
by Yeon-Ji Park, Min-a Lee, Geun-Je Yang, Soo Jun Park and Chae-Bong Sohn
Appl. Sci. 2022, 12(23), 12012; https://doi.org/10.3390/app122312012 - 24 Nov 2022
Cited by 2 | Viewed by 2253
Abstract
In this paper, a tagging tool is developed to streamline the process of locating tags for each term and manually selecting the target term. It directly extracts the terms to be tagged from sentences and displays them to the user. It also increases tagging efficiency by allowing users to apply candidate categories to untagged terms. It is based on annotations automatically generated using machine learning. This architecture is then fine-tuned using Bidirectional Encoder Representations from Transformers (BERT) to enable the tagging of terms that cannot be captured using Named-Entity Recognition (NER). The tagged text data extracted using the proposed tagging tool can be used as an additional training dataset. The tagging tool, which receives and saves new named-entity (NE) annotation input online, is added to the NER and relation extraction (RE) web interfaces using BERT. Annotation information downloaded by the user includes the category (e.g., diseases, genes/proteins) and the list of words associated with the named entity selected by the user. The results reveal that the RE and NER results are improved using the proposed web service by collecting more NE annotation data and fine-tuning the model using the generated datasets. Our application programming interfaces and demonstrations are publicly available via the website link provided in this paper.
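
The candidate-category feature can be approximated by matching every term the user has already tagged against the remaining text and proposing the same category for untagged occurrences. The sketch below does this with whole-word, case-insensitive regular expressions and exports annotations as a category-to-words JSON object; the function names and the export format are hypothetical, not the tool's actual API.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Sequence


def suggest_tags(sentences: Sequence[str], annotations: Dict[str, str]) -> List[dict]:
    """Propose the user's existing category for every occurrence of an already-tagged term.

    Note: the \\b word boundaries assume terms start and end with word characters."""
    suggestions = []
    for s_idx, sentence in enumerate(sentences):
        for term, category in annotations.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                suggestions.append({"sentence": s_idx, "start": m.start(), "end": m.end(),
                                    "text": m.group(0), "candidate_category": category})
    return suggestions


def export_annotations(annotations: Dict[str, str]) -> str:
    """Group tagged words by category, as a downloadable JSON document."""
    by_category: Dict[str, List[str]] = defaultdict(list)
    for term, category in annotations.items():
        by_category[category].append(term)
    return json.dumps({c: sorted(t) for c, t in sorted(by_category.items())}, indent=2)
```

Accepted suggestions would become new NE annotations, which is how the tool grows the fine-tuning dataset without the user having to search for each term manually.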

23 pages, 2229 KiB  
Article
Hybrid Inductive Model of Differentially and Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique and Convolutional Neural Network
by Sergii Babichev, Lyudmyla Yasinska-Damri, Igor Liakh and Jiří Škvor
Appl. Sci. 2022, 12(22), 11795; https://doi.org/10.3390/app122211795 - 20 Nov 2022
Cited by 4 | Viewed by 1142
Abstract
The development of hybrid models that process gene expression data to identify differentially expressed and mutually correlated genes is one of the current directions in modern bioinformatics. Solving this problem can, on the one hand, improve the effectiveness of existing systems for complex disease diagnosis based on gene expression data analysis and, on the other hand, increase the efficiency of gene regulatory network reconstruction procedures through more careful, disease-specific gene selection. In this research, we propose a stepwise procedure to form subsets of mutually correlated and differentially expressed gene expression profiles (GEPs). First, we select informative GEPs in terms of statistical and entropy criteria using the Harrington desirability function. Then, we perform cluster analysis using the SOTA and spectral clustering algorithms implemented within the framework of objective clustering inductive technology. This step yields a set of clusters containing co-expressed and differentially expressed GEPs. The model was validated using a one-dimensional two-layer convolutional neural network (CNN). The analysis of the simulation results showed the high efficiency of the proposed model: the clusters of GEPs formed based on the clustering quality criteria values allowed us to identify the investigated objects with high accuracy. Moreover, the simulation results also showed that the hybrid inductive model based on the spectral clustering algorithm is more effective than the one based on the SOTA clustering algorithm, in terms of both the complexity of the resulting optimal cluster structure and the classification accuracy of objects described by the allocated gene expression data.
The proposed hybrid inductive model increases objectivity when forming subsets of differentially and co-expressed gene expression profiles for their further use in disease diagnosis systems and gene regulatory network reconstruction.
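
Harrington's desirability function maps a scaled criterion value y to d = exp(−exp(−y)), and several criteria are combined through the geometric mean D = (d_1 d_2 ... d_k)^(1/k). The sketch below applies this to rank gene expression profiles by variance (a statistical criterion) and histogram-based Shannon entropy. Mapping the observed range of each criterion onto desirabilities from 0.2 to 0.8, the number of histogram bins, and treating higher values of both criteria as more informative are assumptions, not the authors' documented settings.

```python
import math
import statistics
from typing import List, Sequence

# Assumed scaling convention: worst observed value -> d = 0.2, best -> d = 0.8.
Y_WORST = -math.log(-math.log(0.2))
Y_BEST = -math.log(-math.log(0.8))


def harrington(y: float) -> float:
    """Harrington's one-sided desirability function d = exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def partial_desirabilities(values: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Linearly map each criterion value onto [Y_WORST, Y_BEST], then apply harrington()."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    result = []
    for v in values:
        t = (v - lo) / span if higher_is_better else (hi - v) / span
        result.append(harrington(Y_WORST + t * (Y_BEST - Y_WORST)))
    return result


def general_desirability(*criteria: Sequence[float]) -> List[float]:
    """Geometric mean of the per-criterion desirabilities for each profile."""
    k = len(criteria)
    return [math.prod(ds) ** (1.0 / k) for ds in zip(*criteria)]


def shannon_entropy(profile: Sequence[float], bins: int = 4) -> float:
    """Entropy (bits) of a histogram of the profile's values; bin count is an assumption."""
    lo, hi = min(profile), max(profile)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for x in profile:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(profile)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def rank_profiles(profiles: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Indices of the top_k profiles by combined variance/entropy desirability."""
    variances = [statistics.pvariance(p) for p in profiles]
    entropies = [shannon_entropy(p) for p in profiles]
    scores = general_desirability(partial_desirabilities(variances),
                                  partial_desirabilities(entropies))
    return sorted(range(len(profiles)), key=lambda i: scores[i], reverse=True)[:top_k]
```

The geometric mean penalizes a profile that scores poorly on any single criterion, which is why it is preferred here over an arithmetic average.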
