MDPI - Publisher of Open Access Journals

13 pages, 1754 KiB

Open Access Article

A Study on Survival Analysis Methods Using Neural Network to Prevent Cancers

by Chul-Young Bae, Bo-Seon Kim, Sun-Ha Jee, Jong-Hoon Lee and Ngoc-Dung Nguyen

Cancers 2023, 15(19), 4757; https://doi.org/10.3390/cancers15194757 - 27 Sep 2023

Cited by 3 | Viewed by 2684

Abstract

Background: Cancer is one of the main global health threats. Early personalized prediction of cancer incidence is crucial for the population at risk. This study introduces a novel cancer prediction model based on modern recurrent survival deep learning algorithms. Methods: The study includes 160,407 participants from the blood-based cohort of the Korea Cancer Prevention Research-II Biobank, which has been ongoing since 2004. Data linkages were designed to ensure anonymity, and data were collected through nationwide medical examinations. Predictive performance on ten cancer sites, evaluated with the concordance index (c-index), was compared among nDeep and its multitask variant, Cox proportional hazards (PH) regression, DeepSurv, and DeepHit. Results: Our models consistently achieved a c-index above 0.8 for all ten cancers, with a peak of 0.8922 for lung cancer. They outperformed Cox PH regression and the other survival deep neural networks. Conclusion: To the best of our knowledge, this study presents the survival deep learning model with the highest predictive performance on a censored health dataset. In future work, we plan to investigate the causal relationships between explanatory variables and cancer in order to reduce cancer incidence and mortality.

(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0)
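The concordance index (c-index) used to compare nDeep, Cox PH, DeepSurv, and DeepHit is the fraction of comparable subject pairs whose predicted risks are ordered consistently with their observed event times: 0.5 corresponds to random ranking and 1.0 to perfect ranking. The sketch below assumes Harrell's formulation for right-censored data; the abstract does not say which c-index variant was used (DeepHit, for example, is often evaluated with a time-dependent version), and the function name `harrell_c_index` is hypothetical.

```python
def harrell_c_index(times, risks, events):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up time for each subject
    risks  -- predicted risk score (higher means an earlier expected event)
    events -- 1 if the event (e.g. a cancer diagnosis) was observed, 0 if censored
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event while j was still at risk
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher predicted risk failed first
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four subjects; the third subject (time 6) is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.7, 0.1]
print(harrell_c_index(times, risks, events))  # 4 of 5 comparable pairs concordant -> 0.8
```

Censored subjects still contribute as the later member of a pair, which is how the c-index uses censored records without assuming when their event would have occurred.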

14 pages, 6883 KiB

Open Access Article

Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique

by Gayeong Eom and Haewon Byeon

Mathematics 2023, 11(16), 3605; https://doi.org/10.3390/math11163605 - 21 Aug 2023

Cited by 7 | Viewed by 5134

Abstract

Classification problems caused by data imbalance occur in many fields and have long been studied in machine learning. Many real-world datasets suffer from class imbalance, which occurs when class sizes are not uniform; as a result, data belonging to the minority class are likely to be misclassified. Overcoming this issue is particularly important for medical data, because disease incidence rates inevitably produce class imbalance in medical datasets. This study adjusted the imbalance ratio (IR) of the National Biobank of Korea dataset “Epidemiologic data of Parkinson’s disease dementia patients” to 6.8 (raw data), 9, and 19, and compared four traditional oversampling methods with techniques based on the conditional generative adversarial network (CGAN) and the conditional tabular generative adversarial network (CTGAN). When the classes were balanced with CGAN and CTGAN, classification performance, measured by AUC and F1-score, was better than with the traditional oversampling techniques. This extends the application of GANs, which are widely used for unstructured data, to structured data. We also offer a better solution to the imbalanced-data problem and suggest future research directions.

(This article belongs to the Special Issue Class-Imbalance and Cost-Sensitive Learning)
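The imbalance ratio (IR) is the number of majority-class samples divided by the number of minority-class samples, so the raw IR of 6.8 means roughly 6.8 majority-class records per minority-class record. The sketch below computes IR and shows the interpolation step behind SMOTE, the traditional oversampling technique named in the title: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is a plain-Python illustration under stated assumptions (numeric features only, Euclidean distance; `imbalance_ratio` and `smote_oversample` are hypothetical names), not the study's pipeline, and it does not model the CGAN or CTGAN generators.

```python
import random


def imbalance_ratio(labels, minority=1):
    """IR = (# majority-class samples) / (# minority-class samples)."""
    n_min = sum(1 for y in labels if y == minority)
    return (len(labels) - n_min) / n_min


def smote_oversample(minority_points, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling for numeric feature vectors.

    Each synthetic point is x + lam * (nn - x), where x is a random minority
    sample, nn is one of its k nearest minority neighbours (Euclidean), and
    lam is drawn uniformly from [0, 1).
    """
    if len(minority_points) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_points))
        x = minority_points[i]
        others = [p for j, p in enumerate(minority_points) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# 68 majority and 10 minority labels reproduce the raw IR of 6.8;
# adding 68 - 10 = 58 synthetic minority samples would bring IR to 1.
labels = [0] * 68 + [1] * 10
print(imbalance_ratio(labels))  # 6.8

# Toy minority class of three 2-D points: synthetic samples fall inside their triangle.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote_oversample(minority, n_new=3, k=2))
```

Because SMOTE only interpolates between existing minority samples, it cannot produce records outside their convex hull; generative models such as CGAN and CTGAN instead learn a distribution to sample from, which is the contrast the study evaluates.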

7 pages, 207 KiB

Open Access Brief Report

Consensus Definition of Blood Samples from the Subcategorized Normal Controls in the Korea Biobank Network

by Ji Eun Han, Min Kyu Park, Ju Hyun Jin, Jung Ah Lee, Gyeongsin Park, Jong Sook Park, Han-Ik Bae, Seok Joong Yun, An Na Seo, Man-Hoon Han, Hyoungnam Lee, Jae-Pil Jeon, Ji-In Yu, Soon Sun Kim and Jae Youn Cheong

J. Clin. Med. 2023, 12(9), 3080; https://doi.org/10.3390/jcm12093080 - 24 Apr 2023

Viewed by 1797

Abstract

A control group is defined as a group of people used for comparison. Depending on the type of study, it can be a group of healthy people or a group not exposed to risk factors. It is therefore important to enable researchers to select appropriate control participants. Biobanks sponsored by the Korea Biobank Project are affiliated with the Korea Biobank Network (KBN), in which the National Biobank of Korea plays the central coordinating role. KBN organized several working groups to address new challenges and needs in biobanking. The “Normal Healthy Control Working Group” developed standardized criteria for three control groups: normal, normal-plus, and disease-specific controls. Based on this consensus definition of a normal control, we applied the normal-control criteria to retrospective data. The main reason for exclusion from the “Normal-plus” group was blood test results beyond 5% of the reference range, including hypercholesterolemia. Subclassifying normal-control samples by detailed criteria will help researchers select the optimal normal controls for their studies.

(This article belongs to the Section Clinical Guidelines)

13 pages, 3220 KiB

Open Access Article

Common Data Model and Database System Development for the Korea Biobank Network

by Soo-Jeong Ko, Wona Choi, Ki-Hoon Kim, Seo-Joon Lee, Haesook Min, Seol-Whan Oh and In Young Choi

Appl. Sci. 2021, 11(24), 11825; https://doi.org/10.3390/app112411825 - 13 Dec 2021

Cited by 5 | Viewed by 4296

Abstract

The importance of clinical information related to specimens is increasing as research on human biological specimens is conducted worldwide. To use these data, the scope of the data must be defined and a standardized system developed for the collected resources. This study aims to standardize clinical information and to enable clinical information management systems that improve the utilization of biological specimens. We developed the KBN common data model (CDM), consisting of 18 tables and 177 variables. Clinical information codes were mapped to standard terms. Data on the 27 diseases in the group were collected from 17 biobanks, and all disorders not belonging to the group were also standardized and loaded. We also developed a system that provides statistical visualization screens and data retrieval tools for data collection. The resulting unified management system, built on the KBN CDM, collects standardized data, manages clinical information, and shares it systematically. Through this system, all participating biobanks can be integrated into one system for unified management and research.

(This article belongs to the Special Issue New Trends in Medical Informatics II)

Search Results (4)