Feature Papers in Medical Statistics and Data Science Section

A special issue of BioMedInformatics (ISSN 2673-7426). This special issue belongs to the section "Medical Statistics and Data Science".

Deadline for manuscript submissions: closed (31 January 2024) | Viewed by 62338

A printed edition of this Special Issue is available.

Special Issue Editor


Dr. Pentti Nieminen
Guest Editor
Medical Informatics and Data Analysis Research Group, University of Oulu, P.O. Box 5000, FI-90014 Oulu, Finland
Interests: medical statistics; data informatics; statistics in medical journals; statistical computing; statistical modelling; data presentation; bibliometrics; information retrieval

Special Issue Information

Dear Colleagues,

This Special Issue aims to bring together information on established statistical methods and data analysis methodologies in the fields of biostatistics, epidemiology, health sciences, dentistry, clinical medicine and biomedicine. It will support researchers at all levels, as well as students, by publishing articles that cover a wide array of methods, explain how those methods should be applied, and provide examples of applications in practice.

Most papers published in biomedical and clinical journals contain elements of statistical methods, analysis and interpretation. Mathematical statisticians and data science researchers are introducing new data analysis methods, driven by a rapid expansion in computing capability. Current views on the causes, mechanisms, and treatment methods of diseases are advancing too rapidly for any physician or researcher to be familiar with all new findings. This has led to a growing reliance on the statistical reporting and data presentation in the published medical literature to learn about new discoveries.

In summary, this Special Issue is an opportunity for the scientific community to present research on the application and complexity of data analytical methods, to give insight into new challenges in medical data informatics, and to provide knowledge on the statistical intensity of medical articles. Both original research and review articles are welcome to be submitted to this Special Issue.

Dr. Pentti Nieminen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. BioMedInformatics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (15 papers)


Research


24 pages, 11145 KiB  
Article
Comparing ANOVA and PowerShap Feature Selection Methods via Shapley Additive Explanations of Models of Mental Workload Built with the Theta and Alpha EEG Band Ratios
by Bujar Raufi and Luca Longo
BioMedInformatics 2024, 4(1), 853-876; https://doi.org/10.3390/biomedinformatics4010048 - 19 Mar 2024
Cited by 2 | Viewed by 1489
Abstract
Background: Creating models to differentiate self-reported mental workload perceptions is challenging and requires machine learning to identify features from EEG signals. EEG band ratios quantify human activity, but limited research on mental workload assessment exists. This study evaluates the use of theta-to-alpha and alpha-to-theta EEG band ratio features to distinguish human self-reported perceptions of mental workload. Methods: In this study, EEG data from 48 participants were analyzed while engaged in resting and task-intensive activities. Multiple mental workload indices were developed using different EEG channel clusters and band ratios. ANOVA’s F-score and PowerSHAP were used to extract the statistical features. At the same time, models were built and tested using techniques such as Logistic Regression, Gradient Boosting, and Random Forest. These models were then explained using Shapley Additive Explanations. Results: Based on the results, using PowerSHAP to select features led to improved model performance, exhibiting an accuracy exceeding 90% across three mental workload indexes. In contrast, statistical techniques for model building indicated poorer results across all mental workload indexes. Moreover, using Shapley values to evaluate feature contributions to the model output, it was noted that features rated low in importance by both ANOVA F-score and PowerSHAP measures played the most substantial role in determining the model output. Conclusions: Using models with Shapley values can reduce data complexity and improve the training of better discriminative models for perceived human mental workload. However, the outcomes can sometimes be unclear due to variations in the significance of features during the selection process and their actual impact on the model output. Full article
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)

16 pages, 3180 KiB  
Article
The Effect of Data Missingness on Machine Learning Predictions of Uncontrolled Diabetes Using All of Us Data
by Zain Jabbar and Peter Washington
BioMedInformatics 2024, 4(1), 780-795; https://doi.org/10.3390/biomedinformatics4010043 - 6 Mar 2024
Viewed by 1215
Abstract
Electronic Health Records (EHR) provide a vast amount of patient data that are relevant to predicting clinical outcomes. The inherent presence of missing values poses challenges to building performant machine learning models. This paper aims to investigate the effect of various imputation methods on the National Institutes of Health’s All of Us dataset, a dataset containing a high degree of data missingness. We apply several imputation techniques such as mean substitution, constant filling, and multiple imputation on the same dataset for the task of diabetes prediction. We find that imputing values causes heteroskedastic performance for machine learning models with increased data missingness. That is, the more missing values a patient has for their tests, the higher variance there is on a diabetes model AUROC, F1, precision, recall, and accuracy scores. This highlights a critical challenge in using EHR data for predictive modeling. This work highlights the need for future research to develop methodologies to mitigate the effects of missing data and heteroskedasticity in EHR-based predictive models. Full article

30 pages, 29526 KiB  
Article
Whole Slide Image Understanding in Pathology: What Is the Salient Scale of Analysis?
by Eleanor Jenkinson and Ognjen Arandjelović
BioMedInformatics 2024, 4(1), 489-518; https://doi.org/10.3390/biomedinformatics4010028 - 14 Feb 2024
Cited by 2 | Viewed by 1804
Abstract
Background: In recent years, there has been increasing research in the applications of Artificial Intelligence in the medical industry. Digital pathology has seen great success in introducing the use of technology in the digitisation and analysis of pathology slides to ease the burden of work on pathologists. Digitised pathology slides, otherwise known as whole slide images, can be analysed by pathologists with the same methods used to analyse traditional glass slides. Methods: The digitisation of pathology slides has also led to the possibility of using these whole slide images to train machine learning models to detect tumours. Patch-based methods are common in the analysis of whole slide images as these images are too large to be processed using normal machine learning methods. However, there is little work exploring the effect that the size of the patches has on the analysis. A patch-based whole slide image analysis method was implemented and then used to evaluate and compare the accuracy of the analysis using patches of different sizes. In addition, two different patch sampling methods are used to test if the optimal patch size is the same for both methods, as well as a downsampling method where low-resolution whole slide images are used to train an analysis model. Results: It was discovered that the most successful method uses a patch size of 256 × 256 pixels with the informed sampling method, using the location of tumour regions to sample a balanced dataset. Conclusion: Future work on patch-based analysis of whole slide images in pathology should take into account our findings when designing new models. Full article

12 pages, 1733 KiB  
Article
Machine Learning Analysis of Genomic Factors Influencing Hyperbaric Oxygen Therapy in Parkinson’s Disease
by Eirini Banou, Aristidis G. Vrahatis, Marios G. Krokidis and Panagiotis Vlamos
BioMedInformatics 2024, 4(1), 127-138; https://doi.org/10.3390/biomedinformatics4010009 - 9 Jan 2024
Viewed by 1411
Abstract
(1) Background: Parkinson’s disease (PD) is a progressively worsening neurodegenerative disorder affecting movement, mental well-being, sleep, and pain. While no cure exists, treatments like hyperbaric oxygen therapy (HBOT) offer potential relief. However, the molecular biology perspective, especially when intertwined with machine learning dynamics, remains underexplored. (2) Methods: We employed machine learning techniques to analyze single-cell RNA-seq data from human PD cell samples. This approach aimed to identify pivotal genes associated with PD and understand their relationship with HBOT. (3) Results: Our analysis identified genes such as MAP2, CAP2, and WSB1, among others, as crucially linked with PD and indicated their significant correlation with HBOT. This suggests that certain genomic factors might influence the efficacy of HBOT in PD treatment. (4) Conclusions: HBOT presents promising therapeutic potential for Parkinson’s disease, with certain genomic factors playing a pivotal role in its efficacy. Our findings emphasize the need for further machine learning-driven research harnessing diverse omics data to better understand and treat PD. Full article

24 pages, 2602 KiB  
Article
Weighted Trajectory Analysis and Application to Clinical Outcome Assessment
by Utkarsh Chauhan, Kaiqiong Zhao, John Walker and John R. Mackey
BioMedInformatics 2023, 3(4), 829-852; https://doi.org/10.3390/biomedinformatics3040052 - 7 Oct 2023
Cited by 2 | Viewed by 2285
Abstract
The Kaplan–Meier (KM) estimator is widely used in medical research to estimate the survival function from lifetime data. KM estimation is a powerful tool to evaluate clinical trials due to simple computational requirements, its use of a logrank hypothesis test, and the ability to censor patients. However, KM estimation has several constraints and fails to generalize to ordinal variables of clinical interest, such as toxicity and ECOG performance. We devised weighted trajectory analysis (WTA) to combine the advantages of KM estimation with the ability to visualize and compare treatment groups for ordinal variables and fluctuating outcomes. To assess statistical significance, we developed a new hypothesis test analogous to the logrank test. We demonstrated the functionality of WTA through 1000-fold clinical trial simulations of unique stochastic models of chemotherapy toxicity and schizophrenia disease course. With increments in sample size and hazard ratio, we compared the performance of WTA to KM estimation and the generalized estimating equation (GEE). WTA generally required half the sample size to achieve comparable power to KM estimation; advantages over the GEE included its robust nonparametric approach and summary plot. We also applied WTA to real clinical data: the toxicity outcomes of melanoma patients receiving immunotherapy and the disease progression of patients with metastatic breast cancer receiving ramucirumab. The application of WTA demonstrated that using traditional methods such as KM estimation can lead to both type I and II errors by failing to model illness trajectory. This article outlines a novel method for clinical outcome assessment that extends the advantages of Kaplan–Meier estimates to ordinal outcome variables. Full article

21 pages, 4682 KiB  
Article
Evaluation of Transmembrane Protein Structural Models Using HPMScore
by Stéphane Téletchéa, Jérémy Esque, Aurélie Urbain, Catherine Etchebest and Alexandre G. de Brevern
BioMedInformatics 2023, 3(2), 306-326; https://doi.org/10.3390/biomedinformatics3020021 - 6 Apr 2023
Cited by 2 | Viewed by 3228
Abstract
Transmembrane proteins (TMPs) are a class of essential proteins for biological and therapeutic purposes. Despite an increasing number of structures, the gap with the number of available sequences remains impressive. The choice of a dedicated function to select the most probable/relevant model among hundreds is a specific problem of TMPs. Indeed, the majority of approaches are mostly focused on globular proteins. We developed an alternative methodology to evaluate the quality of TMP structural models. HPMScore took into account sequence and local structural information using the unsupervised learning approach called hybrid protein model. The methodology was extensively evaluated on very different TMP all-α proteins. Structural models with different qualities were generated, from good to bad quality. HPMScore performed better than DOPE in recognizing good comparative models over more degenerated models, with a Top 1 of 46.9% against DOPE 40.1%, both giving the same result in 13.0%. When the alignments used are higher than 35%, HPM is the best for 52%, against 36% for DOPE (12% for both). These encouraging results need further improvement particularly when the sequence identity falls below 35%. An area of enhancement would be to train on a larger training set. A dedicated web server has been implemented and provided to the scientific community. It can be used with structural models generated from comparative modeling to deep learning approaches. Full article

14 pages, 1448 KiB  
Article
Enhancing Explainable Machine Learning by Reconsidering Initially Unselected Items in Feature Selection for Classification
by Jörn Lötsch and Alfred Ultsch
BioMedInformatics 2022, 2(4), 701-714; https://doi.org/10.3390/biomedinformatics2040047 - 12 Dec 2022
Cited by 7 | Viewed by 1844
Abstract
Feature selection is a common step in data preprocessing that precedes machine learning to reduce data space and the computational cost of processing or obtaining the data. Filtering out uninformative variables is also important for knowledge discovery. By reducing the data space to only those components that are informative to the class structure, feature selection can simplify models so that they can be more easily interpreted by researchers in the field, reminiscent of explainable artificial intelligence. Knowledge discovery in complex data thus benefits from feature selection that aims to understand feature sets in the thematic context from which the data set originates. However, a single variable selected from a very small number of variables that are technically sufficient for AI training may make little immediate thematic sense, whereas the additional consideration of a variable discarded during feature selection could make scientific discovery very explicit. In this report, we propose an approach to explainable feature selection (XFS) based on a systematic reconsideration of unselected features. The difference between the respective classifications when training the algorithms with the selected features or with the unselected features provides a valid estimate of whether the relevant features in a data set have been selected and uninformative or trivial information was filtered out. It is shown that revisiting originally unselected variables in multivariate data sets allows for the detection of pathologies and errors in the feature selection that occasionally resulted in the failure to identify the most appropriate variables. Full article

18 pages, 5920 KiB  
Article
Analysis of Differentially Expressed Genes, MMP3 and TESC, and Their Potential Value in Molecular Pathways in Colon Adenocarcinoma: A Bioinformatics Approach
by Constantin Busuioc, Andreea Nutu, Cornelia Braicu, Oana Zanoaga, Monica Trif and Ioana Berindan-Neagoe
BioMedInformatics 2022, 2(3), 474-491; https://doi.org/10.3390/biomedinformatics2030030 - 3 Sep 2022
Cited by 2 | Viewed by 2264
Abstract
Despite the great progress in its early diagnosis and treatment, colon adenocarcinoma (COAD) still poses important issues to clinical management. Therefore, the identification of novel biomarkers or therapeutic targets for this disease is important. Using UALCAN, the top 25 upregulated and downregulated genes in COAD were identified. Then, a Kaplan–Meier plotter was employed for these genes for survival analysis, revealing the correlation with overall survival rate only for MMP3 (Matrix Metallopeptidase 3) and TESC (Tescalcin). Despite this, the mRNA expression levels were not correlated with the tumor stages or nodal metastatic status. MMP3 and TESC are relevant targets in COAD that should be additionally validated as biomarkers for early diagnosis and prevention. Ingenuity Pathway Analysis revealed the top relevant network linked to Post-Translational Modification, Protein Degradation, and Protein Synthesis, where MMP3 was at the core of the network. Another important network was related to cell cycle regulation, TESC being a component of this. We should also not underestimate the complex regulatory mechanisms mediated by the interplay of the multiple other regulatory molecules, emphasizing the interconnection with molecules related to invasion and migration involved in COAD, that might serve as the basis for the development of new biomarkers and therapeutic targets. Full article

13 pages, 1175 KiB  
Article
A Preliminary Evaluation of “GenDAI”, an AI-Assisted Laboratory Diagnostics Solution for Genomic Applications
by Thomas Krause, Elena Jolkver, Sebastian Bruchhaus, Paul Mc Kevitt, Michael Kramer and Matthias Hemmje
BioMedInformatics 2022, 2(2), 332-344; https://doi.org/10.3390/biomedinformatics2020021 - 10 Jun 2022
Cited by 2 | Viewed by 2599
Abstract
Genomic data enable the development of new biomarkers in diagnostic laboratories. Examples include data from gene expression analyses or metagenomics. Artificial intelligence can help to analyze these data. However, diagnostic laboratories face various technical and regulatory challenges to harness these data. Existing software for genomic data is usually designed for research and does not meet the requirements for use as a diagnostic tool. To address these challenges, we recently proposed a conceptual architecture called “GenDAI”. An initial evaluation of “GenDAI” was conducted in collaboration with a small laboratory in the form of a preliminary study. The results of this pre-study highlight the requirement for and feasibility of the approach. The pre-study also yields detailed technical and regulatory requirements, use cases from laboratory practice, and a prototype called “PlateFlow” for exploring user interface concepts. Full article

14 pages, 6919 KiB  
Article
Automated Detection of Ear Tragus and C7 Spinous Process in a Single RGB Image—A Novel Effective Approach
by Ivanna Kramer, Sabine Bauer and Anne Matejcek
BioMedInformatics 2022, 2(2), 318-331; https://doi.org/10.3390/biomedinformatics2020020 - 8 Jun 2022
Viewed by 2750
Abstract
Biophotogrammetric methods for postural analysis have shown effectiveness in clinical practice because they do not expose individuals to radiation. Furthermore, valid statements can be made about postural weaknesses. Usually, such measurements are collected via markers attached to the subject’s body, which can provide conclusions about the current posture. The craniovertebral angle (CVA) is one of the recognized measurements used for the analysis of human head–neck postures. This study presents a novel method to automate the detection of the landmarks that are required to determine the CVA in RGB images. Different image processing methods are applied together with the OpenPose neural network to find significant landmarks in a photograph. A prominent key body point is the spinous process of the cervical vertebra C7, which is often visible on the skin. Another visual landmark needed for the calculation of the CVA is the ear tragus. The methods proposed for the automated detection of the C7 spinous process and ear tragus are described and evaluated using a custom dataset. The results indicate the reliability of the proposed detection approach, particularly for head postures. Full article

21 pages, 2481 KiB  
Article
Meal and Physical Activity Detection from Free-Living Data for Discovering Disturbance Patterns of Glucose Levels in People with Diabetes
by Mohammad Reza Askari, Mudassir Rashid, Xiaoyu Sun, Mert Sevil, Andrew Shahidehpour, Keigo Kawaji and Ali Cinar
BioMedInformatics 2022, 2(2), 297-317; https://doi.org/10.3390/biomedinformatics2020019 - 1 Jun 2022
Cited by 7 | Viewed by 3471
Abstract
Objective: The interpretation of time series data collected in free-living has gained importance in chronic disease management. Some data are collected objectively from sensors and some are estimated and entered by the individual. In type 1 diabetes (T1D), blood glucose concentration (BGC) data measured by continuous glucose monitoring (CGM) systems and insulin doses administered can be used to detect the occurrences of meals and physical activities and generate the personal daily living patterns for use in automated insulin delivery (AID). Methods: Two challenges in time-series data collected in daily living are addressed: data quality improvement and the detection of unannounced disturbances of BGC. CGM data have missing values for varying periods of time and outliers. People may neglect reporting their meal and physical activity information. In this work, novel methods for preprocessing real-world data collected from people with T1D and the detection of meal and exercise events are presented. Four recurrent neural network (RNN) models are investigated to detect the occurrences of meals and physical activities disjointly or concurrently. Results: RNNs with long short-term memory (LSTM) with 1D convolution layers and bidirectional LSTM with 1D convolution layers have average accuracy scores of 92.32% and 92.29%, and outperform other RNN models. The F1 scores for each individual range from 96.06% to 91.41% for these two RNNs. Conclusions: RNNs with LSTM and 1D convolution layers and bidirectional LSTM with 1D convolution layers provide accurate personalized information about the daily routines of individuals. Significance: Capturing daily behavior patterns enables more accurate future BGC predictions in AID systems and improves BGC regulation. Full article

Review


26 pages, 697 KiB  
Review
Towards Automated Meta-Analysis of Clinical Trials: An Overview
by Stella C. Christopoulou
BioMedInformatics 2023, 3(1), 115-140; https://doi.org/10.3390/biomedinformatics3010009 - 1 Feb 2023
Cited by 2 | Viewed by 5394
Abstract
Background: Nowadays, much research deals with the application of the automated meta-analysis of clinical trials through appropriate machine learning tools to extract the results that can then be applied in daily clinical practice. Methods: The author performed a systematic search of the literature from 27 September 2022–22 November 2022 in PUBMED, in the first 6 pages of Google Scholar and in the online catalog, the Systematic Review Toolbox. Moreover, a second search of the literature was performed from 7 January 2023–20 January 2023 in the first 10 pages of Google Scholar and in the Semantic Google Scholar. Results: 38 approaches in 39 articles met the criteria and were included in this overview. These articles describe in detail machine learning approaches, methods, and tools that have been or can potentially be applied to the meta-analysis of clinical trials. Nevertheless, while the other tasks of a systematic review have significantly developed, the automation of meta-analyses is still far from being able to significantly support and facilitate the work of researchers, freeing them from manual, difficult and time-consuming work. Conclusions: The evaluation of automated meta-analysis results is presented in some studies. Their approaches show positive and promising results. Full article

19 pages, 1689 KiB  
Review
In Silico Protein Structure Analysis for SARS-CoV-2 Vaccines Using Deep Learning
by Yasunari Matsuzaka and Ryu Yashiro
BioMedInformatics 2023, 3(1), 54-72; https://doi.org/10.3390/biomedinformatics3010004 - 11 Jan 2023
Cited by 2 | Viewed by 3957
Abstract
Protein three-dimensional structural analysis using artificial intelligence is attracting attention in various fields, such as the estimation of vaccine structure and stability. In particular, when the spike protein is used in vaccines, the major issues in the construction of SARS-CoV-2 vaccines are their weak ability to attack the virus and the short duration of the immunity they elicit. Structural information about new viruses is essential for understanding their properties and creating effective vaccines. However, determining the structure of a protein through experiments is a lengthy and laborious process. New computational approaches have therefore accelerated the elucidation process and made predictions more accurate. Using advanced machine learning technology called deep neural networks, it has become possible to predict protein structures directly from protein and gene sequences. We summarize the advances in antiviral therapy with the SARS-CoV-2 vaccine and extracellular vesicles via computational analysis.
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)

25 pages, 930 KiB  
Review
Application of Standardized Regression Coefficient in Meta-Analysis
by Pentti Nieminen
BioMedInformatics 2022, 2(3), 434-458; https://doi.org/10.3390/biomedinformatics2030028 - 31 Aug 2022
Cited by 57 | Viewed by 24442
Abstract
The lack of consistent presentation of results in published studies on the association between a quantitative explanatory variable and a quantitative dependent variable has been a long-term issue in evaluating the reported findings. Studies are analyzed and reported in a variety of ways. The main purpose of this review is to illustrate the procedures for summarizing and synthesizing research results from multivariate models with a quantitative outcome variable. The review summarizes the application of the standardized regression coefficient as an effect size index in the context of meta-analysis and describes how it can be estimated and converted from data presented in original research articles. An example of synthesis is provided using research articles on the association between childhood body mass index and carotid intima-media thickness in adult life. Finally, the paper shares practical recommendations for meta-analysts wanting to use the standardized regression coefficient in pooling findings.
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)
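As a rough illustration of the kind of conversion this abstract refers to, the sketch below uses the textbook relation between an unstandardized slope b and a standardized coefficient, beta = b × SD(x) / SD(y), followed by a generic fixed-effect inverse-variance pooling step. It is a minimal sketch under stated assumptions and is not taken from the review itself: the function names and the input values are hypothetical, and the review may recommend different or additional conversion and pooling procedures.

```python
import math


def standardize_coefficient(b, sd_x, sd_y):
    """Convert an unstandardized slope b into a standardized coefficient.

    Uses the textbook relation beta = b * SD(x) / SD(y).
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of study estimates and its standard error.

    This is a generic fixed-effect pooling step, shown only as an assumption
    about how standardized coefficients might be combined.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical inputs, for illustration only (not data from any study).
beta = standardize_coefficient(b=0.5, sd_x=2.0, sd_y=4.0)
pooled_beta, pooled_se = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
print(beta, pooled_beta, pooled_se)
```

The inverse-variance step gives more precise studies more weight; a random-effects model would be the usual alternative when the studies are heterogeneous.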

Other

7 pages, 223 KiB  
Opinion
Big Data in Chronic Kidney Disease: Evolution or Revolution?
by Abbie Kitcher, UZhe Ding, Henry H. L. Wu and Rajkumar Chinnadurai
BioMedInformatics 2023, 3(1), 260-266; https://doi.org/10.3390/biomedinformatics3010017 - 14 Mar 2023
Cited by 1 | Viewed by 2617
Abstract
Digital information storage capacity and biomedical technology advancements in recent decades have stimulated the maturity and popularization of “big data” in medicine. The value of utilizing big data as a diagnostic and prognostic tool has continued to rise given its potential to provide accurate and insightful predictions of future health events and probable outcomes for individuals and populations, which may aid early identification of disease and timely treatment interventions. Whilst the implementation of big data methods for this purpose is better established in specialties such as oncology, cardiology, ophthalmology, and dermatology, big data use in nephrology and specifically chronic kidney disease (CKD) remains relatively novel at present. Nevertheless, increased efforts in the application of big data in CKD have been observed over recent years, with the aims of achieving a more personalized approach to treatment for individuals and improved CKD screening strategies for the general population. Considering recent developments, we provide a focused perspective on the current state of big data and its application in CKD and nephrology, with the hope that its ongoing evolution and revolution will gradually identify more solutions to improve strategies for CKD prevention and optimize the care of patients with CKD.
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)