Search Results (979)

Search Parameters:
Keywords = imputation

36 pages, 2178 KB  
Article
Linking Spatialized Sustainable Income and Net Value Added in Ecosystem Accounting and the System of National Accounts 2025: Application to the Stone Pine Forests of Andalusia, Spain
by Pablo Campos, José L. Oviedo, Alejandro Álvarez and Bruno Mesa
Forests 2025, 16(9), 1370; https://doi.org/10.3390/f16091370 (registering DOI) - 25 Aug 2025
Abstract
The objective of this research is to overcome the shortcomings of the updated values added of the System of National Accounts 2025 (SNA 2025) in order to measure the spatialized total sustainable social income from forest ecosystems through an experimentally refined System of Environmental-Economic Accounting (rSEEA). Sustainable income, measured at observed, imputed, and simulated market transaction prices, is defined as the maximum potential consumption of products generated in the forest ecosystem without a real decline in the environmental asset and manufactured fixed capital at the closing of the current period, assuming idealized future conditions of stable real prices and dynamics of institutional and other autonomous processes. A key finding of this research is that sustainable income extends the SNA 2025 net value added by incorporating components the latter omits: environmental net operating surplus (or ecosystem service in the absence of environmental damage), ordinary changes in the environmental asset condition, and manufactured fixed capital adjusted according to a less ordinary entry of manufactured fixed capital plus the manufactured consumption of fixed capital. Sustainable income was measured spatially for 15 individual products, the area units being the map tiles for Andalusia, Spain, where Stone pine (Pinus pinea L.) canopy cover was predominant, covering an area of 243,559 hectares. In 2010, the SNA 2025 gross and net values added accounted for 24% and 27%, respectively, of the Stone pine forest sustainable income measured by the rSEEA. The ecosystem services omitted by the SNA 2025 made up 69% of the rSEEA sustainable income.
(This article belongs to the Section Forest Economics, Policy, and Social Science)
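The income decomposition described in this abstract can be illustrated with a small arithmetic sketch. The component names follow the abstract, but the per-hectare euro values below are hypothetical and chosen only so that the shares mirror the reported 27% and 69%.

```python
# Hypothetical per-hectare values (EUR/ha); illustrative only, not figures from the study.
sna_net_value_added = 27.0          # SNA 2025 net value added
env_net_operating_surplus = 69.0    # ecosystem services omitted by the SNA 2025
env_asset_ordinary_change = 2.0     # ordinary change in environmental asset condition
mfc_adjustment = 2.0                # adjusted manufactured fixed capital entry

# Sustainable income extends the SNA 2025 net value added with the omitted components.
sustainable_income = (sna_net_value_added + env_net_operating_surplus
                      + env_asset_ordinary_change + mfc_adjustment)

print(f"sustainable income             : {sustainable_income:.1f} EUR/ha")
print(f"SNA net value added share      : {sna_net_value_added / sustainable_income:.0%}")
print(f"omitted ecosystem service share: {env_net_operating_surplus / sustainable_income:.0%}")
```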

19 pages, 990 KB  
Article
Machine Learning for Mortality Risk Prediction in Myocardial Infarction: A Clinical-Economic Decision Support Framework
by Konstantinos P. Fourkiotis and Athanasios Tsadiras
Appl. Sci. 2025, 15(16), 9192; https://doi.org/10.3390/app15169192 - 21 Aug 2025
Abstract
Myocardial infarction (MI) remains a leading cause of in-hospital mortality. Early identification of high-risk patients is essential for improving clinical outcomes and optimizing hospital resource allocation. This study presents a machine learning framework for predicting mortality following MI using a publicly available dataset of 1700 patient records. After excluding records with over 20 missing values and features with more than 300 missing entries, the final dataset included 1547 patients and 113 variables, categorized as binary, categorical, integer, or continuous. Missing values were addressed using denoising autoencoders for continuous features and variational autoencoders for the remaining data. Feature selection was performed using Random Forest, PowerTransformer scaling was applied, and class imbalance was addressed with SMOTE. Twelve models were evaluated, including Focal-Loss Neural Networks, TabNet, XGBoost, LightGBM, CatBoost, Random Forest, SVM, Logistic Regression, and a voting ensemble. Performance was assessed using multiple metrics, with SVM achieving the highest F1 score (0.6905), ROC-AUC (0.8970), and MCC (0.6464), while Random Forest yielded perfect precision and specificity. To assess generalizability, a subpopulation external validation was conducted by training on male patients and testing on female patients. XGBoost and CatBoost reached the highest ROC-AUC (0.90), while the Focal-Loss Neural Network achieved the best MCC (0.53). Overall, the proposed framework outperformed previous studies in key metrics and maintained better performance under demographic shift, supporting its potential for clinical decision-making in post-MI care.
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)
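A minimal sketch of the class-imbalance and SVM evaluation steps named in this abstract, using scikit-learn and imbalanced-learn. The synthetic feature matrix, default SVC settings, and the 0.5 decision threshold are placeholders rather than the study's configuration, and the autoencoder-based imputation stage is not shown.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVC
from sklearn.metrics import f1_score, roc_auc_score, matthews_corrcoef

# X, y: imputed feature matrix and in-hospital mortality labels (placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(1547, 20))
y = (rng.random(1547) < 0.1).astype(int)   # imbalanced outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Scale, then oversample only the training split.
scaler = PowerTransformer().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr_s, y_tr)

clf = SVC(probability=True, random_state=0).fit(X_bal, y_bal)
proba = clf.predict_proba(X_te_s)[:, 1]
pred = (proba >= 0.5).astype(int)

print("F1:     ", f1_score(y_te, pred))
print("ROC-AUC:", roc_auc_score(y_te, proba))
print("MCC:    ", matthews_corrcoef(y_te, pred))
```

Fitting the scaler and SMOTE only on the training split keeps synthetic minority cases and scaling information out of the held-out evaluation.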

16 pages, 1109 KB  
Article
Development and Validation of a Machine Learning Model for Early Prediction of Acute Kidney Injury in Neurocritical Care: A Comparative Analysis of XGBoost, GBM, and Random Forest Algorithms
by Keun Soo Kim, Tae Jin Yoon, Joonghyun Ahn and Jeong-Am Ryu
Diagnostics 2025, 15(16), 2061; https://doi.org/10.3390/diagnostics15162061 - 17 Aug 2025
Abstract
Background: Acute Kidney Injury (AKI) is a pivotal concern in neurocritical care, impacting patient survival and quality of life. This study harnesses machine learning (ML) techniques to predict the occurrence of AKI in patients receiving hyperosmolar therapy, aiming to optimize patient outcomes in neurocritical settings. Methods: We conducted a retrospective cohort study of 4886 patients who underwent hyperosmolar therapy in the neurosurgical intensive care unit (ICU). Comparative predictive analyses were carried out using advanced ML algorithms—eXtreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Random Forest (RF)—against standard multivariate logistic regression. Predictive performance was assessed using an 8:2 training-testing data split, with model fine-tuning through cross-validation. Results: The RF with KNN imputation showed slightly better performance than other approaches in predicting AKI. When applied to an independent test set, it achieved a sensitivity of 79% (95% CI: 70–87%) and specificity of 85% (95% CI: 82–88%), with an overall accuracy of 84% (95% CI: 81–87%) and AUROC of 0.86 (95% CI: 0.82–0.91). The multivariate logistic regression analysis, while informative, showed less predictive strength compared to the ML models. Delta chloride levels and serum osmolality proved to be the most influential predictors, with additional significant variables including pH, age, bicarbonate, and the osmolar gap. Conclusions: The prominence of delta chloride and serum osmolality among the predictive variables underscores their potential as biomarkers for AKI risk in this patient population.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
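A minimal sketch of the KNN-imputation-plus-Random-Forest pipeline highlighted above, with scikit-learn, cross-validation on the training split, and an 8:2 hold-out. The synthetic data, number of neighbours, and forest size are illustrative stand-ins for the study's tuned configuration.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

# X, y: ICU features with missing values and AKI labels (placeholders, not the study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(4886, 15))
X[rng.random(X.shape) < 0.05] = np.nan        # sprinkle missingness
y = (rng.random(4886) < 0.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# KNN imputation feeding a Random Forest, evaluated by cross-validation on the training split.
model = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
print("CV AUROC:  ", cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc").mean())

model.fit(X_tr, y_tr)
print("Test AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```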

19 pages, 614 KB  
Article
Effects of Outdoor and Household Air Pollution on Hand Grip Strength in a Longitudinal Study of Rural Beijing Adults
by Wenlu Yuan, Xiaoying Li, Collin Brehmer, Talia Sternbach, Xiang Zhang, Ellison Carter, Yuanxun Zhang, Guofeng Shen, Shu Tao, Jill Baumgartner and Sam Harper
Int. J. Environ. Res. Public Health 2025, 22(8), 1283; https://doi.org/10.3390/ijerph22081283 - 16 Aug 2025
Abstract
Background: Outdoor and household PM2.5 are established risk factors for chronic disease and early mortality. In China, high levels of outdoor PM2.5 and solid fuel use for cooking and heating, especially in winter, pose large health risks to the country’s aging population. Hand grip strength is a validated biomarker of functional aging and strong predictor of disability and mortality in older adults. We investigated the effects of wintertime household and outdoor PM2.5 on maximum grip strength in a rural cohort in Beijing. Methods: We analyzed data from 877 adults (mean age: 62 y) residing in 50 rural villages over three winter seasons (2018–2019, 2019–2020, and 2021–2022). Outdoor PM2.5 was continuously measured in all villages, and household (indoor) PM2.5 was monitored for at least two months in a randomly selected ~30% subsample of homes. Missing data were handled using multiple imputation. We applied multivariable mixed effects regression models to estimate within- and between-individual effects of PM2.5 on grip strength, adjusting for demographic, behavioral, and health-related covariates. Results: Wintertime household and outdoor PM2.5 concentrations ranged from 3 to 431 μg/m3 (mean = 80 μg/m3) and 8 to 100 μg/m3 (mean = 49 μg/m3), respectively. The effect of a 10 μg/m3 within-individual increase in household and outdoor PM2.5 on maximum grip strength was 0.06 kg (95%CI: −0.01, 0.12 kg) and 1.51 kg (95%CI: 1.35, 1.68 kg), respectively. The household PM2.5 effect attenuated after adjusting for outdoor PM2.5, while outdoor PM2.5 effects remained robust across sensitivity analyses. We found little evidence of between-individual effects. Conclusions: We did not find strong evidence of an adverse effect of household PM2.5 on grip strength. The unexpected positive effects of outdoor PM2.5 on grip strength may reflect transient physiological changes following short-term exposure. However, these findings should not be interpreted as evidence of protective effects of air pollution on aging. Rather, they highlight the complexity of air pollution’s health impacts and the value of longitudinal data in capturing time-sensitive effects. Further research is needed to better understand these patterns and their implications in high-exposure settings.
(This article belongs to the Section Environmental Health)
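One common way to separate within- and between-individual PM2.5 effects, as estimated above, is a hybrid (person-mean-centred) mixed model; a sketch with statsmodels follows. The file name, column names, and covariate set are hypothetical, and the study's multiple imputation step is omitted.

```python
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per person-visit with columns grip_kg, pm25_out, age, sex, person_id, village
# (hypothetical names; the study's covariate set is richer and uses multiple imputation).
df = pd.read_csv("grip_panel.csv")

# Hybrid (within-between) decomposition: person mean and deviation from it.
df["pm_between"] = df.groupby("person_id")["pm25_out"].transform("mean")
df["pm_within"] = df["pm25_out"] - df["pm_between"]

# Random intercepts for villages; persons could be nested similarly.
model = smf.mixedlm("grip_kg ~ pm_within + pm_between + age + sex",
                    data=df, groups=df["village"])
result = model.fit()
print(result.summary())
# The pm_within coefficient is per 1 unit of PM2.5; multiply by 10 for a 10 ug/m3 contrast.
```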

8 pages, 529 KB  
Data Descriptor
An Extended Dataset of Educational Quality Across Countries (1970–2023)
by Hanol Lee and Jong-Wha Lee
Data 2025, 10(8), 130; https://doi.org/10.3390/data10080130 - 15 Aug 2025
Abstract
This study presents an extended dataset on educational quality covering 101 countries from 1970 to 2023. While existing international assessments, such as the Programme for International Student Assessment (PISA) and Trends in International Mathematics and Science Study (TIMSS), offer valuable snapshots of student performance, their limited coverage across countries and years constrains broader analyses. To address this limitation, we harmonized observed test scores across assessments and imputed missing values using both linear interpolation and machine learning (Least Absolute Shrinkage and Selection Operator (LASSO) regression). The dataset includes (i) harmonized test scores for 15-year-olds, (ii) annual educational quality indicators for the 15–19 age group, and (iii) educational quality indexes for the working-age population (15–64). These measures are provided in machine-readable formats and support empirical research on human capital, economic development, and global education inequalities across economies.
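A minimal sketch of the two gap-filling routes named above: a cross-validated LASSO fitted on observed country-years, followed by within-country linear interpolation of remaining gaps. The file name and predictor columns are hypothetical, not the variables used to build the dataset.

```python
import pandas as pd
from sklearn.linear_model import LassoCV

# panel: country-year rows with a partly missing harmonized test_score and predictor
# columns such as enrollment, spending, gdp_pc (hypothetical names).
panel = pd.read_csv("education_panel.csv")
predictors = ["enrollment", "spending", "gdp_pc"]

observed = panel.dropna(subset=["test_score"] + predictors)
missing = panel[panel["test_score"].isna()].dropna(subset=predictors)

# Fit a cross-validated LASSO on observed country-years, then impute the gaps.
lasso = LassoCV(cv=5, random_state=0).fit(observed[predictors], observed["test_score"])
panel.loc[missing.index, "test_score"] = lasso.predict(missing[predictors])

# Remaining short gaps within a country can be linearly interpolated over years.
panel = panel.sort_values(["country", "year"])
panel["test_score"] = panel.groupby("country")["test_score"].transform(
    lambda s: s.interpolate(method="linear"))
```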

27 pages, 9197 KB  
Data Descriptor
A Six-Year, Spatiotemporally Comprehensive Dataset and Data Retrieval Tool for Analyzing Chlorophyll-a, Turbidity, and Temperature in Utah Lake Using Sentinel and MODIS Imagery
by Kaylee B. Tanner, Anna C. Cardall and Gustavious P. Williams
Data 2025, 10(8), 128; https://doi.org/10.3390/data10080128 - 13 Aug 2025
Abstract
Data from earth observation satellites provide unique and valuable information about water quality conditions in freshwater lakes but require significant processing before they can be used, even with the use of tools like Google Earth Engine. We use imagery from Sentinel 2 and MODIS and in situ data from the State of Utah Ambient Water Quality Management System (AWQMS) database to develop models and to generate a highly accessible, easy-to-use CSV file of chlorophyll-a (which is an indicator of algal biomass), turbidity, and water temperature measurements on Utah Lake. From a collection of 937 Sentinel 2 images spanning the period from January 2019 to May 2025, we generated 262,081 estimates each of chlorophyll-a and turbidity, with an additional 1,140,777 data points interpolated from those estimates to provide a dataset with a consistent time step. From a collection of 2333 MODIS images spanning the same time period, we extracted 1,390,800 measurements each of daytime water surface temperature and nighttime water surface temperature and interpolated or imputed an additional 12,058 data points from those estimates. We interpolated the data using piecewise cubic Hermite interpolation polynomials to preserve the original distribution of the data and provide the most accurate estimates of measurements between observations. We demonstrate the processing steps required to extract usable, accurate estimates of these three water quality parameters from satellite imagery and format them for analysis. We include summary statistics and charts for the resulting dataset, which show the usefulness of this data for informing Utah Lake management issues. We include the Jupyter Notebook with the implemented processing steps and the formatted CSV file of data as supplemental materials. The Jupyter Notebook can be used to update the Utah Lake data or can be easily modified to generate similar data for other waterbodies. We provide this method, tool set, and data to make remotely sensed water quality data more accessible to researchers, water managers, and others interested in Utah Lake and to facilitate the use of satellite data for those interested in applying remote sensing techniques to other waterbodies.
(This article belongs to the Collection Modern Geophysical and Climate Data Analysis: Tools and Methods)
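A minimal sketch of gap-filling a satellite-derived time series with a piecewise cubic Hermite interpolant (PCHIP) via SciPy, which, as noted above, preserves the shape of the data and avoids the overshoot of ordinary cubic splines. The dates and chlorophyll-a values are toy numbers.

```python
import pandas as pd
from scipy.interpolate import PchipInterpolator

# ts: chlorophyll-a estimates indexed by observation date (toy values, not lake data).
ts = pd.Series(
    [12.3, 15.1, 9.8, 22.4],
    index=pd.to_datetime(["2019-01-03", "2019-01-08", "2019-01-18", "2019-01-28"]),
)

# Use seconds since the first observation as the abscissa for the interpolant.
t0 = ts.index[0]
x = (ts.index - t0).total_seconds()
interp = PchipInterpolator(x, ts.values)

# Evaluate on a regular daily grid to obtain a consistent time step.
daily = pd.date_range(ts.index.min(), ts.index.max(), freq="D")
filled = pd.Series(interp((daily - t0).total_seconds()), index=daily)
print(filled.head())
```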

16 pages, 1461 KB  
Article
Prognostic Factors and Clinical Outcomes of Spontaneous Intracerebral Hemorrhage: Analysis of 601 Consecutive Patients from a Single Center (2017–2023)
by Cosmin Cindea, Vicentiu Saceleanu, Victor Tudor, Patrick Canning, Ovidiu Petrascu, Tamas Kerekes, Alexandru Breazu, Iulian Roman-Filip, Corina Roman-Filip and Romeo Mihaila
NeuroSci 2025, 6(3), 77; https://doi.org/10.3390/neurosci6030077 - 12 Aug 2025
Abstract
Background: Spontaneous intracerebral hemorrhage (ICH) has the highest case fatality of all stroke types, yet recent epidemiological and outcome data from Central and Eastern Europe remain limited. Methods: We retrospectively analyzed prospectively collected data for 601 consecutive adults with primary ICH admitted to Sibiu County Clinical Emergency Hospital, Romania (2017–2023). Demographics, Glasgow Coma Scale (GCS), CT-derived hematoma volume (ABC/2), anatomical site, intraventricular extension (IVH), treatment, comorbidities, and in-hospital death were reported with exact counts and percentages; no imputation was performed. Results: Mean age was 68.4 ± 12.9 years, and 59.7% were male. Mean hematoma volume was 30.4 mL, and 23.0% exceeded 30 mL. IVH occurred in 40.1% and doubled mortality (50.6% vs. 16.7%). Overall case fatality was 29.6% and climbed to 74.5% for brain-stem bleeds. Men, although younger than women (66.0 vs. 71.9 years), died more often (35.4% vs. 21.1%; risk ratio 1.67, 95% CI 1.26–2.21). Systemic hazards amplified death risk: oral anticoagulation, 44.2%; chronic alcohol misuse, 51.4%; thrombocytopenia, 41.0%; chronic kidney disease, 42.3%. Conservative management (74.9%) yielded 27.8% mortality overall and ≤15% for small-to-mid lobar or capsulo-lenticular bleeds; lobar surgery matched this (13.4%) only in large clots. Thalamic evacuation was futile (82.3% mortality), and cerebellar decompression performed late still carried 54.5% mortality versus 16.6% with medical management. Multivariable analysis confirmed that low GCS, IVH, large hematoma volume, thrombocytopenia, and chronic alcohol use independently predicted in-hospital mortality. Limitations: This retrospective study lacked post-discharge functional outcome data (e.g., mRS at 90 days). Conclusions: This study presents the largest Romanian single-center ICH cohort, establishing national benchmarks and underscoring modifiable risk factors. Early ICH lethality aligns with Western data but is amplified by exposures such as alcohol misuse, anticoagulation, thrombocytopenia, and CKD. Priorities include preventive strategies, timely surgical access, wider adoption of minimally invasive techniques, and development of a prospective regional registry.
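Two quantities in this abstract can be reproduced with a few lines of arithmetic: the ABC/2 volume estimate and the male-versus-female risk ratio. The sex-specific counts below are back-calculated from the reported proportions, so they are approximate.

```python
import math

def abc_over_2(a_cm: float, b_cm: float, c_cm: float) -> float:
    """ABC/2 estimate of hematoma volume (mL) from the three largest
    perpendicular diameters on CT, measured in centimetres."""
    return a_cm * b_cm * c_cm / 2

print(abc_over_2(5.0, 4.0, 3.0))   # 30.0 mL, around the cohort mean reported above

# Crude risk ratio for in-hospital death, men vs. women, from the reported proportions.
# Counts are back-calculated from 601 patients and 59.7% male, so they are approximate.
deaths_m, n_m = round(0.354 * 359), 359
deaths_f, n_f = round(0.211 * 242), 242
rr = (deaths_m / n_m) / (deaths_f / n_f)
se_log_rr = math.sqrt(1 / deaths_m - 1 / n_m + 1 / deaths_f - 1 / n_f)
lo, hi = rr * math.exp(-1.96 * se_log_rr), rr * math.exp(1.96 * se_log_rr)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")   # close to the reported 1.67 (1.26-2.21)
```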

14 pages, 452 KB  
Article
An Integrated Intuitionistic Fuzzy-Clustering Approach for Missing Data Imputation
by Charlène Béatrice Bridge-Nduwimana, Aziza El Ouaazizi and Majid Benyakhlef
Computers 2025, 14(8), 325; https://doi.org/10.3390/computers14080325 - 12 Aug 2025
Abstract
Missing data imputation is a critical preprocessing task that directly impacts the quality and reliability of data-driven analyses, yet many existing methods treat numerical and categorical data separately and lack the integration of advanced techniques. To overcome these restrictions, we propose a novel imputation technique that synergistically combines regression imputation using HistGradientBoostingRegressor with fuzzy rule-based systems, enhanced by a tailored clustering process. This integrated approach effectively handles mixed data types and complex data structures, using regression models to predict missing numerical values, fuzzy logic to incorporate expert knowledge and interpretability, and clustering to capture latent data patterns. Categorical variables are managed by mode imputation and label encoding. We evaluated the method on twelve tabular datasets with artificially introduced missingness, employing a comprehensive set of metrics focused on originally missing entries. The results demonstrate that our iterative imputer performs competitively with other established imputation techniques, achieving comparable or better error rates and accuracy. By combining statistical learning with fuzzy and clustering frameworks, the method achieves 15% lower Root Mean Square Error (RMSE), 10% lower Mean Absolute Error (MAE), and 80% higher precision on UCI datasets, thus offering a promising advance in data preprocessing for practical applications.
(This article belongs to the Special Issue Emerging Trends in Machine Learning and Artificial Intelligence)
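A minimal sketch of the regression-imputation and categorical-handling steps described above, using scikit-learn's IterativeImputer with a HistGradientBoostingRegressor estimator, mode imputation, and label encoding. The fuzzy rule-based and clustering components of the method are not shown, and the toy DataFrame stands in for a real mixed-type dataset.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({            # toy mixed-type data with gaps, not a study dataset
    "age": [34, np.nan, 51, 45, np.nan, 60],
    "bmi": [22.1, 27.5, np.nan, 30.2, 24.8, np.nan],
    "smoker": ["no", "yes", np.nan, "no", "yes", "no"],
})

# Numerical gaps: iterative regression imputation with a gradient-boosted estimator.
num_cols = ["age", "bmi"]
imputer = IterativeImputer(estimator=HistGradientBoostingRegressor(random_state=0),
                           random_state=0, max_iter=10)
df[num_cols] = imputer.fit_transform(df[num_cols])

# Categorical gaps: mode imputation followed by label encoding.
cat_cols = ["smoker"]
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])
for col in cat_cols:
    df[col] = LabelEncoder().fit_transform(df[col])
print(df)
```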

20 pages, 5008 KB  
Article
Harnessing Large-Scale University Registrar Data for Predictive Insights: A Data-Driven Approach to Forecasting Undergraduate Student Success with Convolutional Autoencoders
by Mohammad Erfan Shoorangiz and Michal Brylinski
Mach. Learn. Knowl. Extr. 2025, 7(3), 80; https://doi.org/10.3390/make7030080 - 8 Aug 2025
Abstract
Predicting undergraduate student success is critical for informing timely interventions and improving outcomes in higher education. This study leverages over a decade of historical data from Louisiana State University (LSU) to forecast graduation outcomes using advanced machine learning techniques, with a focus on convolutional autoencoders (CAEs). We detail the data processing and transformation steps, including feature selection and imputation, to construct a robust dataset. The CAE effectively extracts meaningful latent features, validated through low-dimensional t-SNE visualizations that reveal clear clusters based on class labels, differentiating students likely to graduate from those at risk. A two-year gap strategy is introduced to ensure rigorous evaluation and simulate real-world conditions by predicting outcomes on unseen future data. Our results demonstrate the promise of CAE-derived embeddings for dimensionality reduction and computational efficiency, with competitive performance in downstream classification tasks. While models trained on embeddings showed slightly reduced performance compared to raw input data, with accuracies of 83% and 85%, respectively, their compactness and computational efficiency highlight their potential for large-scale analyses. The study emphasizes the importance of rigorous preprocessing, feature engineering, and evaluation protocols. By combining these approaches, we provide actionable insights and adaptive modeling strategies to support robust and generalizable predictive systems, enabling educators and administrators to enhance student success initiatives in dynamic educational environments.
(This article belongs to the Section Learning)
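A compact PyTorch sketch of a 1-D convolutional autoencoder of the kind described above, returning both a reconstruction and a low-dimensional embedding that could feed t-SNE or a downstream classifier. The layer sizes, latent dimension, and the assumption of an even feature count are illustrative, not the study's architecture.

```python
import torch
from torch import nn

class ConvAutoencoder(nn.Module):
    """1-D convolutional autoencoder over a student's feature vector.
    Sizes are illustrative; n_features is assumed even so pooling inverts cleanly."""
    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                                   # halves the feature axis
            nn.Conv1d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * (n_features // 2), latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * (n_features // 2)), nn.ReLU(),
            nn.Unflatten(1, (16, n_features // 2)),
            nn.Upsample(scale_factor=2),                       # restores the feature axis
            nn.Conv1d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):                                      # x: (batch, 1, n_features)
        z = self.encoder(x)
        return self.decoder(z), z                              # reconstruction and embedding

model = ConvAutoencoder(n_features=64)
x = torch.randn(32, 1, 64)                                     # a toy batch of 32 records
recon, embedding = model(x)
loss = nn.functional.mse_loss(recon, x)                        # reconstruction objective
print(embedding.shape)                                         # torch.Size([32, 16])
```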

6 pages, 1076 KB  
Proceeding Paper
Applying Transformer-Based Dynamic-Sequence Techniques to Transit Data Analysis
by Bumjun Choo and Dong-Kyu Kim
Eng. Proc. 2025, 102(1), 12; https://doi.org/10.3390/engproc2025102012 - 7 Aug 2025
Abstract
Transit systems play a vital role in urban mobility, yet predicting individual travel behavior within these systems remains a complex challenge. Traditional machine learning approaches struggle with transit trip data because each trip may consist of a variable number of transit legs, leading to missing data and inconsistencies when using fixed-length tabular representations. To address this issue, we propose a transformer-based dynamic-sequence approach that models transit trips as variable-length sequences, allowing for flexible representation while leveraging the power of attention mechanisms. Our methodology constructs trip sequences by encoding each transit leg as a token, incorporating travel time, mode of transport, and a 2D positional encoding based on grid-based spatial coordinates. By dynamically skipping missing legs instead of imputing artificial values, our approach maintains data integrity and prevents bias. The transformer model then processes these sequences using self-attention, effectively capturing relationships across different trip segments and spatial patterns. To evaluate the effectiveness of our approach, we train the model on a dataset of urban transit trips and predict first-mile and last-mile travel times. We assess performance using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Experimental results demonstrate that our dynamic-sequence method yields up to a 30.96% improvement in accuracy compared to non-dynamic methods while preserving the underlying structure of transit trips. This study contributes to intelligent transportation systems by presenting a robust, adaptable framework for modeling real-world transit data. Our findings highlight the advantages of self-attention-based architectures for handling irregular trip structures, offering a novel perspective on a data-driven understanding of individual travel behavior.
(This article belongs to the Proceedings of The 2025 Suwon ITS Asia Pacific Forum)
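A minimal PyTorch sketch of the core idea: a trip is a variable-length sequence of leg tokens, and padded (missing) legs are skipped through the encoder's key padding mask rather than imputed. Feature and model dimensions are illustrative, and the paper's 2D positional encoding is omitted.

```python
import torch
from torch import nn

class TripEncoder(nn.Module):
    """Encodes a variable-length sequence of transit-leg tokens; padded positions are
    excluded via the key padding mask rather than filled with artificial values."""
    def __init__(self, d_leg: int = 6, d_model: int = 32, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(d_leg, d_model)     # per-leg features -> token embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)          # e.g., first-/last-mile travel time

    def forward(self, legs, pad_mask):
        # legs: (batch, max_legs, d_leg); pad_mask: (batch, max_legs), True where padded.
        h = self.encoder(self.embed(legs), src_key_padding_mask=pad_mask)
        h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)         # zero out padded positions
        pooled = h.sum(1) / (~pad_mask).sum(1, keepdim=True)   # mean over real legs only
        return self.head(pooled).squeeze(-1)

# Two toy trips with 3 and 1 legs, padded to length 3.
legs = torch.randn(2, 3, 6)
pad_mask = torch.tensor([[False, False, False], [False, True, True]])
pred = TripEncoder()(legs, pad_mask)
print(pred.shape)                                  # torch.Size([2])
```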

23 pages, 3831 KB  
Article
Estimating Planetary Boundary Layer Height over Central Amazonia Using Random Forest
by Paulo Renato P. Silva, Rayonil G. Carneiro, Alison O. Moraes, Cleo Quaresma Dias-Junior and Gilberto Fisch
Atmosphere 2025, 16(8), 941; https://doi.org/10.3390/atmos16080941 - 5 Aug 2025
Abstract
This study investigates the use of a Random Forest (RF), an artificial intelligence (AI) model, to estimate the planetary boundary layer height (PBLH) over Central Amazonia from climatic elements data collected during the GoAmazon experiment, held in 2014 and 2015, as it is a key metric for air quality, weather forecasting, and climate modeling. The novelty of this study lies in estimating PBLH using only surface-based meteorological observations. This approach is validated against remote sensing measurements (e.g., LIDAR, ceilometer, and wind profilers), which are seldom available in the Amazon region. The dataset includes various meteorological features, though substantial missing data for the latent heat flux (LE) and net radiation (Rn) measurements posed challenges. We addressed these gaps through different data-cleaning strategies, such as feature exclusion, row removal, and imputation techniques, assessing their impact on model performance using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R2 metrics. The best-performing strategy achieved an RMSE of 375.9 m. In addition to the RF model, we benchmarked its performance against Linear Regression, Support Vector Regression, LightGBM, XGBoost, and a Deep Neural Network. While all models showed moderate correlation with observed PBLH, the RF model outperformed all others, with statistically significant differences confirmed by paired t-tests. SHAP (SHapley Additive exPlanations) values were used to enhance model interpretability, revealing hour of the day, air temperature, and relative humidity as the most influential predictors for PBLH, underscoring their critical role in atmospheric dynamics in Central Amazonia. Despite these optimizations, the model underestimates PBLH values by an average of 197 m, particularly in the spring and early summer austral seasons, when atmospheric conditions are more variable. These findings emphasize the importance of robust data preprocessing and highlight the potential of ML models for improving PBLH estimation in data-scarce tropical environments.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
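A minimal sketch comparing the data-cleaning strategies mentioned above (feature exclusion, row removal, and mean imputation) with a Random Forest regressor and a held-out RMSE. The file name, column names, and hyperparameters are hypothetical placeholders for the GoAmazon variables.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# df: surface meteorology with gaps in LE and Rn plus the target pblh_m
# (hypothetical column names, not the GoAmazon variable codes).
df = pd.read_csv("goamazon_surface.csv")
features = ["hour", "t_air", "rh", "le", "rn"]

def rmse_for(X: pd.DataFrame, y: pd.Series) -> float:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    return float(np.sqrt(mean_squared_error(y_te, rf.predict(X_te))))

# Strategy 1: exclude the gappy LE/Rn columns entirely.
d1 = df.dropna(subset=["pblh_m", "hour", "t_air", "rh"])
print("drop LE/Rn :", rmse_for(d1[["hour", "t_air", "rh"]], d1["pblh_m"]))

# Strategy 2: drop rows with any missing feature.
d2 = df.dropna(subset=features + ["pblh_m"])
print("drop rows  :", rmse_for(d2[features], d2["pblh_m"]))

# Strategy 3: mean-impute the gaps.
d3 = df.dropna(subset=["pblh_m"]).copy()
d3[features] = SimpleImputer(strategy="mean").fit_transform(d3[features])
print("mean impute:", rmse_for(d3[features], d3["pblh_m"]))
```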

14 pages, 614 KB  
Article
Development of Cut Scores for Feigning Spectrum Behavior on the Orebro Musculoskeletal Pain Screening Questionnaire and the Perceived Stress Scale: A Simulation Study
by John Edward McMahon, Ashley Craig and Ian Douglas Cameron
J. Clin. Med. 2025, 14(15), 5504; https://doi.org/10.3390/jcm14155504 - 5 Aug 2025
Abstract
Background/Objectives: Feigning spectrum behavior (FSB) is the exaggeration, fabrication, or false imputation of symptoms. It occurs in compensable injury at great cost to society through lost productivity and excessive costs. The aim of this study is to identify feigning by developing cut scores on the long and short forms (SF) of the Orebro Musculoskeletal Pain Screening Questionnaire (OMPSQ and OMPSQ-SF) and the Perceived Stress Scale (PSS and PSS-4). Methods: As part of pre-screening for a support program, 40 injured workers who had been certified unfit for work for more than 2 weeks were screened once with the OMPSQ and PSS by telephone by a mental health professional. A control sample comprising 40 non-injured community members was screened by a mental health professional on four occasions under different aliases, twice responding genuinely and twice simulating an injury. Results: Differences between the workplace-injured people and the community sample were assessed using ANCOVA with age and gender as covariates, and then receiver operating characteristic (ROC) curves were calculated. The OMPSQ and OMPSQ-SF discriminated (p < 0.001) between all conditions. All measures discriminated between the simulation condition and workplace-injured people (p < 0.001). Intraclass correlation demonstrated that the PSS, PSS-4, OMPSQ, and OMPSQ-SF were reliable (p < 0.001). Area Under the Curve (AUC) was 0.750 for the OMPSQ and 0.835 for the OMPSQ-SF for work-injured versus simulators. Conclusions: The measures discriminated between injured and non-injured people and non-injured people instructed to simulate injury. Non-injured simulators produced similar scores when they had multiple exposures to the test materials, showing the uniformity of feigning spectrum behavior on these measures. The OMPSQ-SF has adequate discriminant validity and sensitivity to feigning spectrum behavior, making it optimal for telephone screening in clinical practice.
(This article belongs to the Section Clinical Rehabilitation)
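One standard way to derive a cut score from a ROC analysis is to maximise Youden's J, sketched below with scikit-learn on synthetic score distributions; the study's exact cut-score criterion and the real OMPSQ-SF score distributions may differ.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# scores: OMPSQ-SF totals; group: 1 = simulated injury, 0 = genuine workplace injury.
# Values are synthetic placeholders, not the study data.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(55, 12, 40), rng.normal(70, 12, 80)])
group = np.concatenate([np.zeros(40, dtype=int), np.ones(80, dtype=int)])

fpr, tpr, thresholds = roc_curve(group, scores)
print("AUC:", roc_auc_score(group, scores))

# Youden's J picks the threshold maximising sensitivity + specificity - 1.
j = tpr - fpr
best = np.argmax(j)
cut = thresholds[best]
print(f"Cut score: {cut:.1f} (sens {tpr[best]:.2f}, spec {1 - fpr[best]:.2f})")
```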

14 pages, 1805 KB  
Data Descriptor
Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) Trial: Genetic Resource for Precision Nutrition
by Yuxi Liu, Hailie Fowler, Dong D. Wang, Lisa L. Barnes and Marilyn C. Cornelis
Nutrients 2025, 17(15), 2548; https://doi.org/10.3390/nu17152548 - 4 Aug 2025
Abstract
Background: The Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) was a 3-year, multicenter, randomized controlled trial to test the effects of the MIND diet on cognitive decline in 604 individuals at risk for Alzheimer’s dementia. Here, we describe the genotyping, imputation, and quality control (QC) procedures for the genetic data of trial participants. Methods: DNA was extracted from either whole blood or serum, and genotyping was performed using the Infinium Global Diversity Array. Established sample and SNP QC procedures were applied to the genotyping data, followed by imputation using the 1000 Genomes Phase 3 v5 reference panel. Results: Significant study-site, specimen type, and batch effects were observed. A total of 494 individuals of inferred European ancestry and 58 individuals of inferred African ancestry were included in the final imputed dataset. Evaluation of the imputed APOE genotype against gold-standard sequencing data showed high concordance (98.2%). We replicated several known genetic associations identified from previous genome-wide association studies, including SNPs previously linked to adiponectin (rs16861209, p = 1.5 × 10−5), alpha-linolenic acid (rs174547, p = 1.3 × 10−7), and alpha-tocopherol (rs964184, p = 0.003). Conclusions: This dataset represents the first genetic resource derived from a dietary intervention trial focused on cognitive outcomes. It enables investigation of genetic contributions to variability in cognitive response to the MIND diet and supports integrative analyses with other omics data types to elucidate the biological mechanisms underlying cognitive decline. These efforts may ultimately inform precision nutrition strategies to promote cognitive health.
(This article belongs to the Section Nutrigenetics and Nutrigenomics)
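A minimal sketch of the concordance check mentioned above, comparing imputed APOE genotypes against gold-standard sequencing for overlapping samples; the sample IDs and genotype calls are toy values.

```python
import pandas as pd

# Imputed vs. gold-standard APOE genotypes keyed by sample ID (toy values).
imputed = pd.Series({"S1": "e3/e3", "S2": "e3/e4", "S3": "e2/e3", "S4": "e3/e3"})
sequenced = pd.Series({"S1": "e3/e3", "S2": "e3/e4", "S3": "e3/e3", "S4": "e3/e3"})

# Align on shared samples, drop anyone missing either call, and count matches.
both = pd.concat([imputed.rename("imp"), sequenced.rename("seq")], axis=1).dropna()
concordance = (both["imp"] == both["seq"]).mean()
print(f"APOE concordance: {concordance:.1%}")   # the study reports 98.2%
```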

29 pages, 1132 KB  
Article
Generating Realistic Synthetic Patient Cohorts: Enforcing Statistical Distributions, Correlations, and Logical Constraints
by Ahmad Nader Fasseeh, Rasha Ashmawy, Rok Hren, Kareem ElFass, Attila Imre, Bertalan Németh, Dávid Nagy, Balázs Nagy and Zoltán Vokó
Algorithms 2025, 18(8), 475; https://doi.org/10.3390/a18080475 - 1 Aug 2025
Abstract
Large, high-quality patient datasets are essential for applications like economic modeling and patient simulation. However, real-world data is often inaccessible or incomplete. Synthetic patient data offers an alternative, but current methods often fail to preserve clinical plausibility, real-world correlations, and logical consistency. This study presents a patient cohort generator designed to produce realistic, statistically valid synthetic datasets. The generator uses predefined probability distributions and Cholesky decomposition to reflect real-world correlations. A dependency matrix handles variable relationships in the right order. Hard limits block unrealistic values, and binary variables are set using percentiles to match expected rates. Validation used two datasets, NHANES (2021–2023) and the Framingham Heart Study, evaluating cohort diversity (general, cardiac, low-dimensional), data sparsity (five correlation scenarios), and model performance (MSE, RMSE, R2, SSE, correlation plots). Results demonstrated strong alignment with real-world data in central tendency, dispersion, and correlation structures. Scenario A (empirical correlations) performed best (R2 = 86.8–99.6%, lowest SSE and MAE). Scenario B (physician-estimated correlations) also performed well, especially in a low-dimensional population (R2 = 80.7%). Scenario E (no correlation) performed worst. Overall, the proposed model provides a scalable, customizable solution for generating synthetic patient cohorts, supporting reliable simulations and research when real-world data is limited. While deep learning approaches have been proposed for this task, they require access to large-scale real datasets and offer limited control over statistical dependencies or clinical logic. Our approach addresses this gap.
(This article belongs to the Collection Feature Papers in Algorithms for Multidisciplinary Applications)
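A minimal NumPy/SciPy sketch of the correlation-enforcing core described above: draw correlated standard normals through a Cholesky factor, map them to marginal distributions via the normal CDF, clip to hard limits, and set a binary variable by percentile. The variables, correlation values, and distribution parameters are illustrative, not those of the validated generator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000

# Target correlation between age, systolic BP, and BMI (illustrative values).
corr = np.array([[1.0, 0.4, 0.2],
                 [0.4, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
L = np.linalg.cholesky(corr)

# Correlated standard normals, mapped to marginal distributions via the normal CDF.
z = rng.standard_normal((n, 3)) @ L.T
u = stats.norm.cdf(z)
age = stats.truncnorm.ppf(u[:, 0], a=(18 - 60) / 15, b=(95 - 60) / 15, loc=60, scale=15)
sbp = stats.norm.ppf(u[:, 1], loc=130, scale=18).clip(80, 220)     # hard limits
bmi = stats.lognorm.ppf(u[:, 2], s=0.2, scale=27)

# Binary variable set by percentile so the prevalence matches an expected rate (~15%).
diabetes = (u[:, 2] > 0.85).astype(int)

print(np.corrcoef(np.c_[age, sbp, bmi], rowvar=False).round(2))
```

Because the marginals are applied through monotone transforms, the output preserves the rank ordering of the latent normals, so the realised correlations approximate rather than exactly equal the Pearson targets.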

19 pages, 573 KB  
Article
Dietary Habits and Obesity in Middle-Aged and Elderly Europeans—The Survey of Health, Ageing, and Retirement in Europe (SHARE)
by Manuela Maltarić, Jasenka Gajdoš Kljusurić, Mirela Kolak, Šime Smolić, Branko Kolarić and Darija Vranešić Bender
Nutrients 2025, 17(15), 2525; https://doi.org/10.3390/nu17152525 - 31 Jul 2025
Abstract
Background/Objectives: Understanding the impact of dietary habits in terms of obesity, health outcomes, and functional decline is critical in Europe’s growing elderly population. This study analyzed trends in Mediterranean diet (MD) adherence, obesity prevalence, and grip strength among middle-aged and elderly Europeans using data from the Survey of Health, Ageing and Retirement in Europe (SHARE). Methods: Data from four SHARE waves (2015–2022) across 28 countries were analyzed. Dietary patterns were assessed through food frequency questionnaires classifying participants as MD-adherent or non-adherent, where adherence implies daily consumption of fruits and vegetables and occasional (3–6 times/week) intake of eggs, beans, legumes, meat, fish, or poultry (an unvalidated definition of the MD pattern). Handgrip strength, a biomarker of functional capacity, was categorized into low, medium, and high groups. Body mass index (BMI), self-perceived health (SPHUS), chronic disease prevalence, and CASP-12 scores (control, autonomy, self-realization, and pleasure evaluated on the 12-item version) were also evaluated. Statistical analyses included descriptive methods, logistic regressions, and multiple imputation to address missing data. Results: A significant majority (74–77%) consumed fruits and vegetables daily, which is consistent with MD principles; however, the high daily intake of dairy products (>50%) indicates limited adherence to the MD, which advocates for moderate consumption of dairy products. Logistic regression indicated that individuals with two or more chronic diseases were more likely to follow the MD (odds ratio [OR] = 1.21, confidence interval [CI] = 1.11–1.32), as were those who rated their SPHUS as very good/excellent ([OR] = 1.42, [CI] = 1.20–1.69). Medium and high maximal handgrip were also strongly and consistently associated with higher odds of MD adherence (medium: [OR] = 1.44, [CI] = 1.18–1.74; high: [OR] = 1.27, [CI] = 1.10–1.48). Conclusions: The findings suggest that middle-aged and older adults are more likely to adhere to the MD dietary pattern if they have more than two chronic diseases, are physically active, and have a medium or high handgrip. Although an unvalidated definition of the MD dietary pattern was used, the results highlight the importance of implementing targeted dietary strategies for middle-aged and elderly adults.
(This article belongs to the Special Issue Food Insecurity, Nutritional Status, and Human Health)
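A minimal statsmodels sketch of the logistic-regression step reported above, expressing coefficients as odds ratios with 95% confidence intervals. The file and column names are hypothetical stand-ins for the SHARE variables, and the multiple-imputation step is not shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per respondent with md_adherent (0/1), chronic2plus (0/1), a grip category,
# age, and sex (hypothetical column names standing in for the SHARE variables).
df = pd.read_csv("share_waves.csv")

model = smf.logit("md_adherent ~ chronic2plus + C(grip_cat) + age + C(sex)", data=df).fit()

# Exponentiated coefficients give odds ratios with 95% confidence intervals.
or_table = pd.DataFrame({
    "OR": np.exp(model.params),
    "CI_low": np.exp(model.conf_int()[0]),
    "CI_high": np.exp(model.conf_int()[1]),
})
print(or_table.round(2))
```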
