Journal Description
Stats
is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes and innovative applications of statistics in all scientific disciplines including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication takes 2.9 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 1.0 (2024)
5-Year Impact Factor: 1.1 (2024)
Latest Articles
A Mixture Model for Survival Data with Both Latent and Non-Latent Cure Fractions
Stats 2025, 8(3), 82; https://doi.org/10.3390/stats8030082 - 13 Sep 2025
Abstract
One of the most popular cure rate models in the literature is the Berkson and Gage mixture model. A characteristic of this model is that it considers the cure to be a latent event. However, there are situations in which the cure is well known, and this information must be considered in the analysis. In this context, this paper proposes a mixture model that accommodates both latent and non-latent cure fractions. More specifically, the proposal is to extend the Berkson and Gage mixture model to include the knowledge of the cure. A simulation study was conducted to investigate the asymptotic properties of maximum likelihood estimators. Finally, the proposed model is illustrated through an application to credit risk modeling.
Full article
(This article belongs to the Section Survival Analysis)
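The Berkson and Gage model named in the abstract is simple to state: the population survival function is a weighted mixture of a cured fraction and the survival of the uncured. A minimal sketch (the exponential survival for the uncured and all parameter values are illustrative assumptions, not the paper's):

```python
import math

def mixture_survival(t, pi, lam):
    """Berkson & Gage mixture: S(t) = pi + (1 - pi) * S_u(t), where pi is
    the cure fraction and S_u the survival of the uncured; here S_u is
    taken as exponential, S_u(t) = exp(-lam * t), purely for illustration."""
    return pi + (1.0 - pi) * math.exp(-lam * t)
```

As t grows, the survival curve plateaus at the cure fraction pi; the paper's extension adds the case where the cure of some subjects is observed rather than latent.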
Open Access Article
The Unit-Modified Weibull Distribution: Theory, Estimation, and Real-World Applications
by
Ammar M. Sarhan, Thamer Manshi and M. E. Sobh
Stats 2025, 8(3), 81; https://doi.org/10.3390/stats8030081 - 12 Sep 2025
Abstract
This paper introduces the Unit-Modified Weibull (UMW) distribution, a novel probability model defined on the unit interval (0, 1). We derive its key statistical properties and estimate its parameters using the maximum likelihood method. The performance of the estimators is assessed via a simulation study based on mean squared error, coverage probability, and average confidence interval length. To evaluate the practical utility of the model, we analyze three real-world data sets. Both parametric and nonparametric goodness-of-fit techniques are employed to compare the UMW distribution with several well-established competing models. In addition, nonparametric diagnostic tools such as total time on test transform plots and violin plots are used to explore the data's behavior and assess the adequacy of the proposed model. Results indicate that the UMW distribution offers a competitive and flexible alternative for modeling bounded data.
Full article
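The abstract does not spell out the UMW's construction, but "unit" distributions are commonly built by mapping a positive lifetime X to (0, 1) via Y = exp(-X). A sketch of that generic recipe with a plain Weibull for X (whether the UMW uses this exact transform is an assumption, as is every parameter value):

```python
import math, random

def unit_sample(n, shape, scale, seed=0):
    """Draw Weibull lifetimes X > 0 by inverse-CDF sampling,
    X = scale * (-ln U)^(1/shape), then map them to the unit interval
    via Y = exp(-X). Illustrative only; the UMW's modified-Weibull
    form is not given in the abstract."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = 1.0 - rng.random()                        # u in (0, 1]
        x = scale * (-math.log(u)) ** (1.0 / shape)   # Weibull draw
        ys.append(math.exp(-x))                       # map to (0, 1]
    return ys
```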

Open Access Review
Statistical Tools Application for Literature Review: A Case on Maintenance Management Decision-Making in the Steel Industry
by
Nuno Miguel de Matos Torre, Valerio Antonio Pamplona Salomon and Luis Ernesto Quezada
Stats 2025, 8(3), 80; https://doi.org/10.3390/stats8030080 - 12 Sep 2025
Abstract
Literature review plays a crucial role in research. This paper explores bibliometrics, which utilizes statistical tools to evaluate researchers' scientific contributions. Its intent is to map frequently cited articles and authors, identify top sources, track publication years, explore keywords and their co-occurrences, and show article distribution by thematic area and country. Additionally, it provides a thematic map of relevance and progress, with special attention to interdisciplinary work. Finally, it also applies research findings to maintenance management decision-making, where the findings reveal that the literature provides valuable insights into the impact of the Analytic Hierarchy Process (AHP) method. Despite advancements in maintenance management, gaps persist in comprehensively addressing core themes, evolutionary trends, and future research directions. This research aims to bridge this gap by providing a detailed examination of the application of bibliometric analysis employing statistical tools to measure researchers' scientific contributions, concerning AHP method applications in maintenance management within the steel industry. The study confirmed that tools like VOSviewer and the Bibliometrix package in R can extract relevant information regarding bibliometric laws, helping us understand research patterns. These findings support strategic decision-making and the evaluation of scientific policies for researchers and institutions.
Full article

Open Access Article
Bootstrap Methods for Correcting Bias in WLS Estimators of the First-Order Bifurcating Autoregressive Model
by
Tamer Elbayoumi, Mutiyat Usman, Sayed Mostafa, Mohammad Zayed and Ahmad Aboalkhair
Stats 2025, 8(3), 79; https://doi.org/10.3390/stats8030079 - 5 Sep 2025
Abstract
In this study, we examine the presence of bias in weighted least squares (WLS) estimation within the context of first-order bifurcating autoregressive (BAR(1)) models. These models are widely used in the analysis of binary tree-structured data, particularly in cell lineage research. Our findings suggest that WLS estimators may exhibit significant and problematic biases, especially in finite samples. The magnitude and direction of this bias are influenced by both the autoregressive parameter and the correlation structure of the model errors. To address this issue, we propose two bootstrap-based methods for bias correction of the WLS estimator. The paper further introduces shrinkage-based versions of both single and fast double bootstrap bias correction techniques, designed to mitigate the over-correction and under-correction issues that may arise with traditional bootstrap methods, particularly in larger samples. Comprehensive simulation studies were conducted to evaluate the performance of the proposed bias-corrected estimators. The results show that the proposed corrections substantially reduce bias, with the most notable improvements observed at extreme values of the autoregressive parameter. Moreover, the study provides practical guidance for practitioners on method selection under varying conditions.
Full article
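The single-bootstrap correction described above is the classic "2θ̂ minus the mean of the bootstrap replicates" adjustment. A simplified sketch on an ordinary AR(1) with unweighted least squares (the BAR(1) tree structure, the WLS weighting, and the shrinkage and fast-double variants are omitted):

```python
import random

def fit_ar1(x):
    # least-squares slope of x[t] on x[t-1] (no intercept, for brevity)
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def bootstrap_bias_corrected(x, B=200, seed=1):
    """Single-bootstrap bias correction: regenerate B series from the
    fitted model by resampling residuals, then recentre the estimator,
    giving 2*phi_hat - mean(phi_boot)."""
    rng = random.Random(seed)
    phi_hat = fit_ar1(x)
    resid = [x[t] - phi_hat * x[t - 1] for t in range(1, len(x))]
    boot = []
    for _ in range(B):
        xs = [x[0]]
        for _ in range(1, len(x)):
            xs.append(phi_hat * xs[-1] + rng.choice(resid))
        boot.append(fit_ar1(xs))
    return 2.0 * phi_hat - sum(boot) / B
```

The paper's shrinkage variants damp exactly this recentring step, to avoid over- or under-correction in larger samples.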

Open Access Article
On Synthetic Interval Data with Predetermined Subject Partitioning and Partial Control of the Variables’ Marginal Correlation Structure
by
Michail Papathomas
Stats 2025, 8(3), 78; https://doi.org/10.3390/stats8030078 - 27 Aug 2025
Abstract
A standard approach for assessing the performance of partition models is to create synthetic datasets with a prespecified clustering structure and assess how well the model reveals this structure. A common format involves subjects being assigned to different clusters, with observations simulated so that subjects within the same cluster have similar profiles, allowing for some variability. In this manuscript, we consider observations from interval variables. Interval data are commonly observed in cohort and Genome-Wide Association studies, and our focus is on Single-Nucleotide Polymorphisms. Theoretical and empirical results are utilized to explore the dependence structure between the variables in relation to the clustering structure for the subjects. A novel algorithm is proposed that allows control over the marginal stratified correlation structure of the variables, specifying exact correlation values within groups of variables. Practical examples are shown, and a synthetic dataset is compared to a real one, to demonstrate similarities and differences.
Full article
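A minimal sketch of the kind of synthetic SNP data described above: subjects in the same cluster share a minor-allele frequency, so genotypes are similar within clusters and differ across them. The cluster frequencies below are illustrative; the paper's algorithm additionally controls the variables' marginal stratified correlation structure, which this sketch does not attempt:

```python
import random

def synth_snp(n_subjects, cluster_of, p_by_cluster, rng):
    """Genotypes as minor-allele counts in {0, 1, 2}: subject i, assigned
    to cluster cluster_of[i], draws Binomial(2, p) with a cluster-specific
    allele frequency p (illustrative values, not the paper's algorithm)."""
    data = []
    for i in range(n_subjects):
        p = p_by_cluster[cluster_of[i]]
        data.append(int(rng.random() < p) + int(rng.random() < p))
    return data
```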

Open Access Article
A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model
by
Cong Nie, Xiaoming Liu, Serge Provost and Jiandong Ren
Stats 2025, 8(3), 77; https://doi.org/10.3390/stats8030077 - 27 Aug 2025
Abstract
The phase-type aging model (PTAM) belongs to a class of Coxian-type Markovian models that can provide a quantitative description of well-known aging characteristics that are part of a genetically determined, progressive, and irreversible process. Due to its unique parameter structure, estimation via the MLE method presents a considerable estimability issue, whereby profile likelihood functions are flat and analytically intractable. In this study, a Markov chain Monte Carlo (MCMC)-based Bayesian methodology is proposed and applied to the PTAM, with a view to improving parameter estimability. The proposed method provides two methodological extensions based on an existing MCMC inference method. First, we propose a two-level MCMC sampling scheme that makes the method applicable to situations where the posterior distributions do not assume simple forms after data augmentation. Secondly, an existing data augmentation technique for Bayesian inference on continuous phase-type distributions is further developed in order to incorporate left-truncated data. While numerical results indicate that the proposed methodology improves parameter estimability via sound prior distributions, this approach may also be utilized as a stand-alone statistical model-fitting technique.
Full article
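A Coxian-type Markovian model such as the PTAM moves through latent aging phases in sequence, with an exponential sojourn in each phase and a chance of absorption (death) on leaving it. A sketch of simulating one lifetime (the rates and exit probabilities are illustrative, not the PTAM's fitted structure, and the paper's MCMC machinery is not shown):

```python
import random

def coxian_lifetime(rates, exit_probs, rng):
    """One draw from a Coxian phase-type model: the sojourn in phase i is
    Exp(rates[i]); on leaving phase i the process is absorbed (death) with
    probability exit_probs[i], otherwise it moves to phase i + 1. The last
    exit probability should be 1 so absorption is certain."""
    t = 0.0
    for lam, p in zip(rates, exit_probs):
        t += rng.expovariate(lam)
        if rng.random() < p:
            break
    return t
```

Quite different (rates, exit_probs) combinations can yield very similar lifetime distributions, which is the flat-likelihood estimability problem the abstract's Bayesian approach targets.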

Open Access Article
Pattern Classification for Mixed Feature-Type Symbolic Data Using Supervised Hierarchical Conceptual Clustering
by
Manabu Ichino and Hiroyuki Yaguchi
Stats 2025, 8(3), 76; https://doi.org/10.3390/stats8030076 - 25 Aug 2025
Abstract
This paper describes a region-oriented method of pattern classification based on the Cartesian system model (CSM), a mathematical model that allows the manipulation of mixed feature-type symbolic data. We use supervised hierarchical conceptual clustering to generate class regions for each pattern class, based on evaluating the generality of the regions and their separability from the other classes in each clustering step. We can easily find the robustly informative features that describe each pattern class against the other pattern classes. Some examples show the effectiveness of the proposed method.
Full article

Open Access Communication
Who Comes First and Who Gets Cited? A 25-Year Multi-Model Analysis of First-Author Gender Effects in Web of Science Economics
by
Daniela-Emanuela Dănăcică
Stats 2025, 8(3), 75; https://doi.org/10.3390/stats8030075 - 24 Aug 2025
Abstract
The aim of this research is to provide a 25-year multi-model analysis of gender dynamics in economics articles that include at least one Romanian-affiliated author, published in Web of Science journals between 2000 and 2025 (2025 records current as of 15 May 2025). Drawing on 4030 papers, we map the bibliometric gender gap by examining first-author status, collaboration patterns, research topics and citation impact. The results show that the female-to-male first-author ratio for Romanian-affiliated publications is close to parity, in sharp contrast to the pronounced under-representation of women among foreign-affiliated first authors. Combining negative binomial, journal fixed-effects Poisson, and quantile regressions with a text-based topic analysis, we find no systematic or robust gender penalty in citations once structural and topical factors are controlled for. The initial gender gap largely reflects men's over-representation in higher-impact journals rather than an intrinsic bias against women's work. Team size consistently emerges as the strongest predictor of citations and, by extension, scientific visibility. Our findings offer valuable insights into gender dynamics in a semi-peripheral scientific system, highlighting the nuanced interplay between institutional context, research practices, legislation and academic recognition.
Full article

Open Access Article
A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies
by
Franklin Fernando Massa, Marco Scavino and Graciela Muniz-Terrera
Stats 2025, 8(3), 74; https://doi.org/10.3390/stats8030074 - 18 Aug 2025
Abstract
Change-point models are frequently considered when modeling phenomena where a regime shift occurs at an unknown time. In aging research, these models are commonly adopted to estimate the onset of cognitive decline. Yet these models present several limitations. Here, we present a Bayesian non-linear mixed-effects model based on a differential equation, designed for longitudinal studies, to overcome some limitations of classical change-point models used in aging research. We demonstrate the ability of the proposed model to avoid biases in estimates of the onset of cognitive impairment in a simulation study. Finally, the methodology presented in this work is illustrated by analyzing results from memory tests of older adults who participated in the English Longitudinal Study of Ageing.
Full article
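The classical change-point mean that the proposed model aims to improve on can be sketched as a broken-stick trajectory: flat until an unknown change point, then declining linearly. A minimal illustration (the paper's model replaces this kink with a smooth differential-equation formulation and adds random effects):

```python
def trajectory(t, baseline, slope_post, tau):
    """Broken-stick mean for a cognitive score: equal to `baseline` before
    the change point tau, then changing linearly with `slope_post` after it
    (an illustrative classical change-point mean, not the paper's model)."""
    return baseline + slope_post * max(t - tau, 0.0)
```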

Open Access Article
A Mixture Integer GARCH Model with Application to Modeling and Forecasting COVID-19 Counts
by
Wooi Chen Khoo, Seng Huat Ong, Victor Jian Ming Low and Hari M. Srivastava
Stats 2025, 8(3), 73; https://doi.org/10.3390/stats8030073 - 13 Aug 2025
Abstract
This article introduces a flexible time series regression model known as the Mixture of Integer-Valued Generalized Autoregressive Conditional Heteroscedasticity (MINGARCH). Mixture models provide versatile frameworks for capturing heterogeneity in count data, including features such as multiple peaks, seasonality, and intervention effects. The proposed model is applied to regional COVID-19 data from Malaysia. To account for geographical variability, five regions—Selangor, Kuala Lumpur, Penang, Johor, and Sarawak—were selected for analysis, covering a total of 86 weeks of data. Comparative analysis with existing time series regression models demonstrates that MINGARCH outperforms alternative approaches. Further investigation into forecasting reveals that MINGARCH yields superior performance in regions with high population density, and significant influencing factors have been identified. In low-density regions, confirmed cases peaked within three weeks, whereas high-density regions exhibited a monthly seasonal pattern. Forecasting metrics—including MAPE, MAE, and RMSE—are significantly lower for the MINGARCH model compared to other models. These results suggest that MINGARCH is well-suited for forecasting disease spread in urban and densely populated areas, offering valuable insights for policymaking.
Full article
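An INGARCH(1,1) recursion drives the conditional mean of the counts, lam_t = omega + alpha*y[t-1] + beta*lam[t-1], and a mixture draws the parameters from one of several components. A sketch under assumed parameter values and a deliberately simplified mixing scheme (the formal MINGARCH definition is in the paper):

```python
import math, random

def poisson_draw(lam, rng):
    # Knuth's method; adequate for the modest intensities used here
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_mingarch(n, comps, weights, seed=0):
    """Mixture of Poisson INGARCH(1,1) recursions: at each step a component
    (omega, alpha, beta) is picked with the given weight, the intensity is
    updated as lam_t = omega + alpha*y[t-1] + beta*lam[t-1], and a Poisson
    count is drawn. Parameters and mixing scheme are illustrative."""
    rng = random.Random(seed)
    y, lam = [1], [1.0]
    for _ in range(n - 1):
        omega, alpha, beta = rng.choices(comps, weights=weights)[0]
        lam_t = omega + alpha * y[-1] + beta * lam[-1]
        lam.append(lam_t)
        y.append(poisson_draw(lam_t, rng))
    return y
```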

Open Access Communication
On the Appropriateness of Fixed Correlation Assumptions in Repeated-Measures Meta-Analysis: A Monte Carlo Assessment
by
Vasileios Papadopoulos
Stats 2025, 8(3), 72; https://doi.org/10.3390/stats8030072 - 13 Aug 2025
Abstract
In repeated-measures meta-analyses, raw data are often unavailable, preventing the calculation of the correlation coefficient r between pre- and post-intervention values. As a workaround, many researchers adopt a heuristic approximation of r = 0.7. However, this value lacks rigorous mathematical justification and may introduce bias into variance estimates of pre/post-differences. We employed Monte Carlo simulations (n = 500,000 per scenario) in Fisher z-space to examine the distribution of the standard deviation of pre-/post-differences (σD) under varying assumptions of r and its uncertainty (σr). Scenarios included r = 0.5, 0.6, 0.707, 0.75, and 0.8, each tested across three levels of variance (σr = 0.05, 0.1, and 0.15). The approximation of r = 0.75 resulted in a balanced estimate of σD, corresponding to a “midway” variance attenuation due to paired data. This value more accurately offsets the deficit caused by assuming a correlation, compared to the traditional value of 0.7. While the r = 0.7 heuristic remains widely used, our results support the use of r = 0.75 as a more mathematically neutral and empirically defensible alternative in repeated-measures meta-analyses lacking raw data.
Full article
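The quantity simulated above follows from the variance of paired differences, sigma_D = sqrt(sigma_pre^2 + sigma_post^2 - 2*r*sigma_pre*sigma_post). A reduced-scale sketch of the Fisher z-space design (10,000 draws by default rather than the paper's 500,000; unit pre/post SDs are an assumed simplification):

```python
import math, random

def sd_of_difference(s_pre, s_post, r):
    # SD of paired differences: sqrt(s1^2 + s2^2 - 2*r*s1*s2)
    return math.sqrt(s_pre ** 2 + s_post ** 2 - 2.0 * r * s_pre * s_post)

def simulate_sigma_d(r_mean, sigma_r, n=10000, seed=7):
    """Draw r with uncertainty in Fisher z-space (normal around atanh(r_mean)
    with sd sigma_r), back-transform with tanh, and average sigma_D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(math.atanh(r_mean), sigma_r)
        total += sd_of_difference(1.0, 1.0, math.tanh(z))
    return total / n
```

Note the fixed points the abstract trades off: with equal SDs, r = 0.5 leaves sigma_D equal to the common SD, while r = 0.75 roughly halves the variance of the differences.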

Open Access Article
Individual Homogeneity Learning in Density Data Response Additive Models
by
Zixuan Han, Tao Li, Jinhong You and Narayanaswamy Balakrishnan
Stats 2025, 8(3), 71; https://doi.org/10.3390/stats8030071 - 9 Aug 2025
Abstract
In many complex applications, both data heterogeneity and homogeneity are present simultaneously. Overlooking either aspect can lead to misleading statistical inferences. Moreover, the increasing prevalence of complex, non-Euclidean data calls for more sophisticated modeling techniques. To address these challenges, we propose a density data response additive model, where the response variable is represented by a distributional density function. In this framework, individual effect curves are assumed to be homogeneous within groups but heterogeneous across groups, while covariates that explain variation share common additive bivariate functions. We begin by applying a transformation to map density functions into a linear space. To estimate the unknown subject-specific functions and the additive bivariate components, we adopt a B-spline series approximation method. Latent group structures are uncovered using a hierarchical agglomerative clustering algorithm, which allows our method to recover the true underlying groupings with high probability. To further improve estimation efficiency, we develop refined spline-backfitted local linear estimators for both the grouped structures and the additive bivariate functions in the post-grouping model. We also establish the asymptotic properties of the proposed estimators, including their convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The effectiveness of our method is demonstrated through extensive simulation studies and real-world data analysis, both of which show promising and robust performance.
Full article

Open Access Article
Unraveling Similarities and Differences Between Non-Negative Garrote and Adaptive Lasso: A Simulation Study in Low- and High-Dimensional Data
by
Edwin Kipruto and Willi Sauerbrei
Stats 2025, 8(3), 70; https://doi.org/10.3390/stats8030070 - 6 Aug 2025
Abstract
Penalized regression methods are widely used for variable selection. Non-negative garrote (NNG) was one of the earliest methods to combine variable selection with shrinkage of regression coefficients, followed by lasso. About a decade after the introduction of lasso, adaptive lasso (ALASSO) was proposed to address lasso's limitations. ALASSO has two tuning parameters (λ and γ), and its penalty resembles that of NNG when γ = 1, though NNG imposes additional constraints. Given ALASSO's greater flexibility, which may increase instability, this study investigates whether NNG provides any practical benefit or can be replaced by ALASSO. We conducted simulations in both low- and high-dimensional settings to compare selected variables, coefficient estimates, and prediction accuracy. Ordinary least squares and ridge estimates were used as initial estimates. NNG and ALASSO (γ = 1) showed similar performance in low-dimensional settings with low correlation, large samples, and moderate to high signal strength. However, under high correlation, small samples, and low signal strength, their selected variables and estimates differed, though prediction accuracy remained comparable. When γ ≠ 1, the differences between NNG and ALASSO became more pronounced, with ALASSO generally performing better. Assuming linear relationships between predictors and the outcome, the results suggest that NNG may offer no practical advantage over ALASSO. The parameter γ in ALASSO allows for adaptability to model complexity, making ALASSO a more flexible and practical alternative to NNG.
Full article
(This article belongs to the Section Statistical Methods)
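The adaptive-lasso penalty discussed above has a closed form under an orthonormal design: each OLS coefficient is soft-thresholded at λ/|β̂_j|^γ, so well-estimated coefficients shrink less, and γ = 1 gives the NNG-like special case. A sketch of that shortcut (general designs need coordinate descent, not shown):

```python
import math

def soft_threshold(z, t):
    # shrink z toward zero by t, setting it to zero if |z| <= t
    return math.copysign(max(abs(z) - t, 0.0), z)

def alasso_orthonormal(beta_ols, lam, gamma):
    """ALASSO solution under an orthonormal design: coefficient j is
    soft-thresholded at lam / |beta_ols_j|^gamma, so large OLS estimates
    receive smaller effective penalties."""
    return [soft_threshold(b, lam / (abs(b) ** gamma)) for b in beta_ols]
```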
Open Access Review
Archimedean Copulas: A Useful Approach in Biomedical Data—A Review with an Application in Pediatrics
by
Giulia Risca, Stefania Galimberti, Paola Rebora, Alessandro Cattoni, Maria Grazia Valsecchi and Giulia Capitoli
Stats 2025, 8(3), 69; https://doi.org/10.3390/stats8030069 - 1 Aug 2025
Abstract
Many applications in health research involve the analysis of multivariate distributions of random variables. In this paper, we review the basic theory of copulas to illustrate their advantages in deriving a joint distribution from given marginal distributions, with a specific focus on bivariate cases. Particular attention is given to the Archimedean family of copulas, which includes widely used functions such as Clayton and Gumbel–Hougaard, characterized by a single association parameter and a relatively simple structure. This work differs from previous reviews by providing a focused overview of applied studies in biomedical research that have employed Archimedean copulas, chosen for their flexibility in modeling a wide range of dependence structures. Their ease of use and ability to accommodate rotated forms make them suitable for various biomedical applications, including those involving survival data. We briefly present the most commonly used methods for the estimation and model selection of copula functions, with the purpose of introducing these tools within the broader framework. Several recent examples in the health literature, and an original example from a pediatric study, demonstrate the applicability of Archimedean copulas and suggest that this approach, although still not widely adopted, can be useful in many biomedical research settings.
Full article
(This article belongs to the Section Statistical Methods)
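Two properties that make Clayton, one of the Archimedean copulas reviewed above, convenient in practice: a closed-form CDF and a closed-form Kendall's tau. A minimal sketch:

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta),
    theta > 0; setting either argument to 1 recovers the other margin."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_kendall_tau(theta):
    # closed-form association for Clayton: tau = theta / (theta + 2)
    return theta / (theta + 2.0)
```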
Open Access Article
Automated Classification of Crime Narratives Using Machine Learning and Language Models in Official Statistics
by
Klaus Lehmann, Elio Villaseñor, Alejandro Pimentel, Javiera Preuss, Nicolás Berhó, Oswaldo Diaz and Ignacio Agloni
Stats 2025, 8(3), 68; https://doi.org/10.3390/stats8030068 - 30 Jul 2025
Abstract
This paper presents the implementation of a language model–based strategy for the automatic codification of crime narratives for the production of official statistics. To address the high workload and inconsistencies associated with manual coding, we developed and evaluated three models: an XGBoost classifier with bag-of-words and word-embedding features, an LSTM network using pretrained Spanish word embeddings as a language model, and a fine-tuned BERT language model (BETO). Deep learning models outperformed the traditional baseline, with BETO achieving the highest accuracy. The new ENUSC (Encuesta Nacional Urbana de Seguridad Ciudadana) workflow integrates the selected model into an API for automated classification, incorporating a certainty threshold to distinguish between cases suitable for automation and those requiring expert review. This hybrid strategy led to a 68.4% reduction in manual review workload while preserving high-quality standards. This study represents the first documented application of deep learning to the automated classification of victimization narratives in official statistics, demonstrating its feasibility and impact in a real-world production environment. Our results demonstrate that deep learning can significantly improve the efficiency and consistency of crime statistics coding, offering a scalable solution for other national statistical offices.
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
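The certainty-threshold routing described above can be sketched in a few lines: auto-codify when the top-class probability clears a threshold, otherwise flag the case for expert review. The 0.9 default and the label names are illustrative assumptions, not the ENUSC configuration:

```python
def route(prob_dist, threshold=0.9):
    """Certainty-threshold routing: return the top label plus a disposition,
    'auto' when its probability clears the threshold, 'review' otherwise."""
    label, p = max(prob_dist.items(), key=lambda kv: kv[1])
    return (label, "auto") if p >= threshold else (label, "review")
```

Tuning the threshold trades automation rate against the risk of auto-coding errors, which is how the reported 68.4% workload reduction is obtained while preserving quality.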
Open Access Article
Requiem for Olympic Ethics and Sports’ Independence
by
Fabio Zagonari
Stats 2025, 8(3), 67; https://doi.org/10.3390/stats8030067 - 28 Jul 2025
Abstract
This paper suggests a theoretical framework to summarise the empirical literature on the relationships between sports and both religious and secular ethics, and it suggests two interrelated theoretical models to empirically evaluate the extent to which religious and secular ethics, as well as sports policies, affect achievements in sports. I identified two national ethics (national pride/efficiency) and two social ethics (social cohesion/ethics), measuring achievements in terms of alternative indexes based on Olympic medals. I referred to three empirical models and applied three estimation methods (panel Poisson, Data Envelopment, and Stochastic Frontier Analyses). I introduced two sports policies (a quantitative policy aimed at social cohesion and a qualitative policy aimed at national pride), distinguishing sports in terms of four possibly different ethics, for the eight summer and eight winter Olympic Games from 1994 to 2024. I applied income level, health status, and income inequality to depict alternative social contexts. I used five main religions and three educational levels to depict alternative ethical contexts. I applied country dummies to depict alternative institutional contexts. Empirical results support the absence of Olympic ethics, the potential substitution of sport and secular ethics in providing social cohesion, and the dependence of sports on politics, while alternative social contexts have different impacts on alternative sport achievements.
Full article
(This article belongs to the Special Issue Ethicametrics)
Open Access Article
Proximal Causal Inference for Censored Data with an Application to Right Heart Catheterization Data
by
Yue Hu, Yuanshan Gao and Minhao Qi
Stats 2025, 8(3), 66; https://doi.org/10.3390/stats8030066 - 22 Jul 2025
Abstract
In observational causal inference studies, unmeasured confounding remains a critical threat to the validity of effect estimates. While proximal causal inference (PCI) has emerged as a powerful framework for mitigating such bias through proxy variables, existing PCI methods cannot directly handle censored data. This article develops a unified proximal causal inference framework that simultaneously addresses unmeasured confounding and right-censoring challenges, extending the proximal causal inference literature. Our key contributions are twofold: (i) We propose novel identification strategies and develop two distinct estimators for the censored-outcome bridge function and treatment confounding bridge function, resolving the fundamental challenge of unobserved outcomes; (ii) To improve robustness against model misspecification, we construct a robust proximal estimator and establish uniform consistency for all proposed estimators under mild regularity conditions. Through comprehensive simulations, we demonstrate the finite-sample performance of our methods, followed by an empirical application evaluating right heart catheterization effectiveness in critically ill ICU patients.
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
Open Access Article
Local Stochastic Correlation Models for Derivative Pricing
by
Marcos Escobar-Anel
Stats 2025, 8(3), 65; https://doi.org/10.3390/stats8030065 - 18 Jul 2025
Abstract
This paper presents a simple methodology for constructing local-correlation models suitable for the closed-form pricing of two-asset financial derivatives. The multivariate models are built to ensure two conditions. First, the marginals follow desirable processes; for example, we choose Geometric Brownian Motion (GBM), popular for stock prices. Second, the payoff of the derivative should follow a desired one-dimensional process. These conditions lead to a specific choice of dependence structure in the form of a local-correlation model. Two popular multi-asset options are considered: a spread option and a basket option.
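The paper's contribution is closed-form pricing under a local (state-dependent) correlation; as a point of reference, the standard constant-correlation setup that such models generalize can be sketched with a Monte Carlo spread-option pricer under two correlated GBM marginals. This is a minimal illustration with hypothetical parameters, not the paper's closed-form method:

```python
import math
import random

def spread_option_mc(s1, s2, sigma1, sigma2, rho, r, t, strike,
                     n_paths=20000, seed=1):
    """Monte Carlo price of a European spread option max(S1_T - S2_T - K, 0)
    under two GBMs with constant instantaneous correlation rho."""
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    total = 0.0
    for _ in range(n_paths):
        # Correlated standard normals via a Cholesky-style construction
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Risk-neutral GBM terminal values for both assets
        st1 = s1 * math.exp((r - 0.5 * sigma1 ** 2) * t + sigma1 * math.sqrt(t) * z1)
        st2 = s2 * math.exp((r - 0.5 * sigma2 ** 2) * t + sigma2 * math.sqrt(t) * z2)
        total += max(st1 - st2 - strike, 0.0)
    return disc * total / n_paths

# Hypothetical inputs: two at-the-money assets, strongly correlated
price = spread_option_mc(100.0, 100.0, 0.2, 0.2, 0.9, 0.0, 1.0, 0.0)
```

Note that with strike 0 this reduces to an exchange option, so the constant-correlation case has a Margrabe-style closed form to check against; lowering rho widens the spread distribution and raises the price, which the local-correlation models in the paper exploit state by state.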
Full article
(This article belongs to the Section Applied Stochastic Models)
Open Access Article
Machine Learning Ensemble Algorithms for Classification of Thyroid Nodules Through Proteomics: Extending the Method of Shapley Values from Binary to Multi-Class Tasks
by
Giulia Capitoli, Simone Magnaghi, Andrea D'Amicis, Camilla Vittoria Di Martino, Isabella Piga, Vincenzo L'Imperio, Marco Salvatore Nobile, Stefania Galimberti and Davide Paolo Bernasconi
Stats 2025, 8(3), 64; https://doi.org/10.3390/stats8030064 - 16 Jul 2025
Abstract
Improving medical diagnosis is of utmost importance in medical research, and it depends on optimizing accurate classification models able to assist clinical decisions. To minimize the errors that can arise from using a single classifier, the voting ensemble technique can be used, combining the results of different classifiers to improve the final classification performance. This paper compares existing voting ensemble techniques with a new game-theory-derived approach based on Shapley values. We extended this method, originally developed for binary tasks, to the multi-class setting in order to capture complementary information provided by different classifiers. In heterogeneous clinical scenarios such as thyroid nodule diagnosis, where distinct models may be better suited to identifying specific subtypes (e.g., benign, malignant, or inflammatory lesions), ensemble strategies capable of leveraging these strengths are particularly valuable. The motivating application is the classification of thyroid cancer nodules, whose cytopathological diagnosis is typically characterized by a high number of false positives that may result in unnecessary thyroidectomy. We apply and compare the performance of seven individual classifiers, along with four ensemble voting techniques (including Shapley values), in a real-world study classifying thyroid cancer nodules using proteomic features obtained through mass spectrometry. Our results indicate a slight improvement in classification accuracy for ensemble systems over single classifiers. Although the Shapley value-based voting method remains comparable to the other voting methods, we envision that this ensemble approach could improve the performance of single classifiers in further applications, especially when complementary algorithms are included in the ensemble.
The application of these techniques can lead to the development of new tools to assist clinicians in diagnosing thyroid cancer using proteomic features derived from mass spectrometry.
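The voting idea can be made concrete with a minimal sketch of Shapley-value-weighted multi-class voting, where the cooperative "game" values each coalition of classifiers by its majority-vote accuracy on a validation set. The classifier names, labels, and predictions below are hypothetical, and this is an illustration of the general technique rather than the authors' exact formulation:

```python
from collections import Counter
from itertools import combinations
from math import factorial

def majority(labels):
    """Majority label; ties broken alphabetically so the game is deterministic."""
    votes = Counter(labels)
    top = max(votes.values())
    return min(lab for lab, c in votes.items() if c == top)

def coalition_accuracy(preds, members, y_true):
    """Value of a coalition: accuracy of its majority vote on validation labels."""
    if not members:
        return 0.0
    members = sorted(members)
    hits = sum(majority([preds[m][i] for m in members]) == y
               for i, y in enumerate(y_true))
    return hits / len(y_true)

def shapley_weights(preds, y_true):
    """Shapley value of each classifier in the coalition-accuracy game."""
    players = sorted(preds)
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        val = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                val += w * (coalition_accuracy(preds, set(coal) | {p}, y_true)
                            - coalition_accuracy(preds, coal, y_true))
        phi[p] = val
    return phi

def shapley_vote(preds, phi, i):
    """Classify sample i, weighting each classifier's vote by its Shapley value."""
    scores = Counter()
    for m, w in phi.items():
        scores[preds[m][i]] += max(w, 0.0)
    return max(sorted(scores), key=scores.get)

# Toy 3-class validation set (hypothetical labels and predictions)
y_true = ["benign", "malignant", "inflammatory", "benign"]
preds = {
    "svm": ["benign", "malignant", "benign", "benign"],
    "rf":  ["benign", "benign", "inflammatory", "benign"],
    "knn": ["malignant", "malignant", "inflammatory", "malignant"],
}
phi = shapley_weights(preds, y_true)
```

By the efficiency property, the Shapley values sum to the grand coalition's accuracy, so classifiers that contribute complementary information, rather than merely high solo accuracy, receive larger voting weights.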
Full article
Open Access Communication
Beyond Expectations: Anomalies in Financial Statements and Their Application in Modelling
by
Roman Blazek and Lucia Duricova
Stats 2025, 8(3), 63; https://doi.org/10.3390/stats8030063 - 15 Jul 2025
Cited by 1
Abstract
The increasing complexity of financial reporting has enabled innovative accounting practices that often obscure a company’s actual performance. This study seeks to uncover manipulative behaviours by constructing an anomaly detection model that utilises unsupervised machine learning techniques. We examined a dataset of 149,566 Slovak firms from 2016 to 2023, covering 12 financial parameters. Utilising TwoStep and K-means clustering in IBM SPSS, we discerned patterns of normal financial activity and computed an abnormality index for each firm. Entities with the greatest deviation from their cluster centroids were flagged as suspicious. The model attained a silhouette score of 1.0, signifying outstanding clustering quality. We identified a total of 231 anomalous firms, predominantly concentrated in sectors C (32.47%), G (13.42%), and L (7.36%). Our research indicates that anomaly-based models can markedly enhance the precision of fraud detection, especially in scenarios with scarce labelled data. The model integrates complex data processing and delivers a comprehensive analysis of the regional and sectoral distribution of anomalies, thereby increasing its relevance in practical applications.
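The authors run TwoStep and K-means clustering in IBM SPSS on 12 financial parameters; the core abnormality-index idea, scoring each firm by its distance to the nearest cluster centroid, can be sketched in plain Python. The two-dimensional "financial ratio" data below are made up, and the deterministic first-k initialization is a simplification of standard K-means seeding:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Component-wise mean of a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with deterministic init (first k points)."""
    cents = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist(p, cents[j]))].append(p)
        cents = [centroid(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents

def abnormality_index(points, cents):
    """Distance of each observation to its nearest centroid: larger = more anomalous."""
    return [min(dist(p, c) for c in cents) for p in points]

# Hypothetical standardized ratios: two normal groups plus one outlier firm (last)
firms = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (0.0, 0.1),
         (5.1, 5.0), (5.0, 5.1), (20.0, 20.0)]
scores = abnormality_index(firms, kmeans(firms, k=2))
```

The outlier firm receives by far the largest index, mirroring the paper's rule of flagging the entities that deviate most from their cluster centroids.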
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
Highly Accessed Articles
Latest Books
E-Mail Alert
News
Topics
Topic in
JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cǎtǎlina Liliana Andrei
Deadline: 31 October 2026

Conferences
Special Issues
Special Issue in
Stats
Benford's Law(s) and Applications (Second Edition)
Guest Editors: Marcel Ausloos, Roy Cerqueti, Claudio Lupi
Deadline: 31 October 2025
Special Issue in
Stats
Nonparametric Inference: Methods and Applications
Guest Editor: Stefano Bonnini
Deadline: 28 November 2025
Special Issue in
Stats
Robust Statistics in Action II
Guest Editor: Marco Riani
Deadline: 31 December 2025