Stats, Volume 6, Issue 3 (September 2023) – 14 articles

Cover Story: Using two-factor fixed-effects ANOVAs, we show how to construct orthonormal F contrasts for main effects, while with equally replicated models, we also show how to construct orthonormal F contrasts for interaction effects. Our primary focus is when the levels of both factors are ordered. For these models, the interaction contrasts may be interpreted as generalised correlations. Thus, for example, ordinary correlation (linear–linear) and umbrella (linear–quadratic) effects may be detected. Our analysis is objective, whereas the standard plots only permit a subjective scrutiny of the data.
14 pages, 599 KiB  
Article
A Family of Finite Mixture Distributions for Modelling Dispersion in Count Data
by Seng Huat Ong, Shin Zhu Sim, Shuangzhe Liu and Hari M. Srivastava
Stats 2023, 6(3), 942-955; https://doi.org/10.3390/stats6030059 - 18 Sep 2023
Viewed by 1187
Abstract
This paper considers the construction of a family of discrete distributions with the flexibility to cater for under-, equi- and over-dispersion in count data using a finite mixture model based on standard distributions. We are motivated to introduce this family because its simple finite mixture structure adds flexibility and facilitates application and use in analysis. The family of distributions is exemplified using a mixture of negative binomial and shifted negative binomial distributions. Some basic and probabilistic properties are derived. We perform hypothesis testing for equi-dispersion and simulation studies of their power and consider parameter estimation via maximum likelihood and probability-generating-function-based methods. The utility of the distributions is illustrated via their application to real biological data sets exhibiting under-, equi- and over-dispersion. It is shown that the distribution fits better than the well-known generalized Poisson and COM–Poisson distributions for handling under-, equi- and over-dispersion in count data. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
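The dispersion behaviour described in the abstract above can be explored numerically with the standard moment formulas for a two-component mixture. The sketch below is only an illustration, not the authors' construction: the negative binomial parameterisation (failures before `r` successes with success probability `p`), the `shift` argument, and the function names are all assumptions introduced here.

```python
def nb_moments(r, p, shift=0.0):
    """Mean and variance of a negative binomial count (assumed parameterisation:
    number of failures before r successes, success probability p), optionally
    shifted by a constant. Shifting moves the mean but not the variance."""
    mean = r * (1.0 - p) / p + shift
    var = r * (1.0 - p) / p ** 2
    return mean, var


def mixture_moments(w, mean1, var1, mean2, var2):
    """Mean, variance, and dispersion index (variance / mean) of a mixture that
    draws from component 1 with probability w and component 2 otherwise."""
    mean = w * mean1 + (1.0 - w) * mean2
    # Second moment of the mixture is the weighted sum of component second moments.
    second = w * (var1 + mean1 ** 2) + (1.0 - w) * (var2 + mean2 ** 2)
    var = second - mean ** 2
    return mean, var, var / mean


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    m1, v1 = nb_moments(r=2, p=0.5)
    m2, v2 = nb_moments(r=2, p=0.5, shift=1.0)
    print(mixture_moments(0.3, m1, v1, m2, v2))
```

A dispersion index below, equal to, or above one corresponds to under-, equi-, or over-dispersion respectively; varying the weight and shift shows how the index moves.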
11 pages, 4091 KiB  
Article
A Detecting System for Abrupt Changes in Temporal Incidence Rate of COVID-19 and Other Pandemics
by Jiecheng Song, Guanchao Tong and Wei Zhu
Stats 2023, 6(3), 931-941; https://doi.org/10.3390/stats6030058 - 18 Sep 2023
Viewed by 1167
Abstract
COVID-19 spread dramatically across the world at the beginning of 2020. This paper presents a novel alert system that detects abrupt changes in the incidence rate of COVID-19 or other pandemics through the estimated time-varying reproduction number (Rt). We applied the system to detect abrupt changes in the COVID-19 pandemic incidence rates in thirteen world regions, eight in the US and five elsewhere in the world. Subsequently, we also evaluated the system with the 2009 H1N1 pandemic in Hong Kong. Our system performs well in detecting both abrupt increases and abrupt decreases. Users of the system can obtain accurate information on the changing trend of the pandemic to avoid being misled by low incidence numbers. The world may face other threatening pandemics in the future; therefore, it is crucial to have a reliable alert system to detect impending abrupt changes in the daily incidence rates. An added benefit of the system is its ability to detect the emergence of viral mutations, as different virus strains are likely to have different infection rates.
11 pages, 1818 KiB  
Article
Orthonormal F Contrasts for Factors with Ordered Levels in Two-Factor Fixed-Effects ANOVAs
by J. C. W. Rayner and G. C. Livingston, Jr.
Stats 2023, 6(3), 920-930; https://doi.org/10.3390/stats6030057 - 1 Sep 2023
Cited by 2 | Viewed by 932
Abstract
In multifactor fixed-effects ANOVAs, we show how to construct orthonormal F contrasts for main effects. Our primary focus is the case when the levels of the factor of interest are ordered. Likewise, in multifactor equally replicated fixed-effects ANOVAs, we show how to construct orthonormal F contrasts for interactions. The primary focus here is on interactions when both factors are ordered, although the approach also applies if just one factor is ordered. Interactions with both factors ordered may be interpreted in terms of generalised correlations. Full article
(This article belongs to the Section Data Science)
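For a factor with ordered, equally spaced levels, one familiar way to obtain orthonormal contrasts is to orthonormalise the polynomial scores 1, x, x², ... of the level indices. The sketch below uses Gram-Schmidt for this; it illustrates classical orthogonal polynomial contrasts and is not claimed to be the paper's construction. The function name and the choice of scores 1, 2, ..., t are assumptions made here.

```python
import math


def orthonormal_polynomial_contrasts(n_levels, max_degree):
    """Return contrasts of degree 1..max_degree (linear, quadratic, ...) for
    n_levels equally spaced ordered levels. Each contrast sums to zero, has
    unit length, and is orthogonal to the others. Requires max_degree < n_levels."""
    if not 0 < max_degree < n_levels:
        raise ValueError("need 0 < max_degree < n_levels")
    x = [float(i) for i in range(1, n_levels + 1)]  # assumed level scores
    # Start from the normalised constant vector so later vectors sum to zero.
    basis = [[1.0 / math.sqrt(n_levels)] * n_levels]
    for degree in range(1, max_degree + 1):
        v = [xi ** degree for xi in x]
        # Remove the components along every vector found so far.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant vector; the rest are contrasts


if __name__ == "__main__":
    for contrast in orthonormal_polynomial_contrasts(4, 3):
        print([round(c, 4) for c in contrast])
```

Because the constant vector is removed first, every returned vector is a genuine contrast, and the degree ordering gives the linear, quadratic, and higher-order trend interpretations.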
13 pages, 419 KiB  
Article
Investigating Self-Rationalizing Models for Commonsense Reasoning
by Fanny Rancourt, Paula Vondrlik, Diego Maupomé and Marie-Jean Meurs
Stats 2023, 6(3), 907-919; https://doi.org/10.3390/stats6030056 - 29 Aug 2023
Cited by 1 | Viewed by 1312
Abstract
The rise of explainable natural language processing spurred a bulk of work on datasets augmented with human explanations, as well as technical approaches to leverage them. Notably, generative large language models offer new possibilities, as they can output a prediction as well as an explanation in natural language. This work investigates the capabilities of fine-tuned text-to-text transfer Transformer (T5) models for commonsense reasoning and explanation generation. Our experiments suggest that while self-rationalizing models achieve interesting results, a significant gap remains: classifiers consistently outperformed self-rationalizing models, and a substantial fraction of model-generated explanations are not valid. Furthermore, training with expressive free-text explanations substantially altered the inner representation of the model, suggesting that they supplied additional information and may bridge the knowledge gap. Our code is publicly available, and the experiments were run on open-access datasets, hence allowing full reproducibility. Full article
(This article belongs to the Special Issue Machine Learning and Natural Language Processing (ML & NLP))
18 pages, 840 KiB  
Article
Statistical Modeling of Implicit Functional Relations
by Stan Lipovetsky
Stats 2023, 6(3), 889-906; https://doi.org/10.3390/stats6030055 - 25 Aug 2023
Cited by 1 | Viewed by 1086
Abstract
This study considers the statistical estimation of relations presented by implicit functions. Such structures define mutual interconnections of variables rather than outcome variable dependence by predictor variables considered in regular regression analysis. For a simple case of two variables, pairwise regression modeling produces two different lines of each variable dependence using another variable, but building an implicit relation yields one invertible model composed of two simple regressions. Modeling an implicit linear relation for multiple variables can be expressed as a generalized eigenproblem of the covariance matrix of the variables in the metric of the covariance matrix of their errors. For unknown errors, this work describes their estimation by the residual errors of each variable in its regression by the other predictors. Then, the generalized eigenproblem can be reduced to the diagonalization of a special matrix built from the variables’ covariance matrix and its inversion. Numerical examples demonstrate the eigenvector solution’s good properties for building a unique equation of the relations between all variables. The proposed approach can be useful in practical regression modeling with all variables containing unobserved errors, which is a common situation for the applied problems. Full article
21 pages, 887 KiB  
Article
Statistical Predictors of Project Management Maturity
by Helder Jose Celani de Souza, Valerio Antonio Pamplona Salomon and Carlos Eduardo Sanches da Silva
Stats 2023, 6(3), 868-888; https://doi.org/10.3390/stats6030054 - 15 Aug 2023
Viewed by 1493
Abstract
Global scenarios of organizations show investments wasted in projects with poor performances in more than 11 percent of cases, according to the Project Management Institute. This research aims to guide organizations in assertively investing in the right pertinent factors to improve project success rates and speed up project management maturity at a higher accuracy level using statistical predictions. Challenging existing drivers for project management maturity models and expanding their current practical view will be the result of a quantitative methodology based on a survey supported by data collection targeting the project management community in Brazil. The originality and value of this research are in contributing to the development of new project maturity models statistically supported by the increasing rate of maturity accuracy, which can be continually improved by confident data input into the model. The results show a high correlation between the performance measurement system and the project success rate associated with project management maturity. In addition, this research contemplates the relationship between organizational culture, business type, and project management office and project management maturity. Full article
29 pages, 431 KiB  
Article
Multi-Step-Ahead Prediction Intervals for Nonparametric Autoregressions via Bootstrap: Consistency, Debiasing, and Pertinence
by Dimitris N. Politis and Kejin Wu
Stats 2023, 6(3), 839-867; https://doi.org/10.3390/stats6030053 - 11 Aug 2023
Cited by 1 | Viewed by 1336
Abstract
To address the difficult problem of the multi-step-ahead prediction of nonparametric autoregressions, we consider a forward bootstrap approach. Employing a local constant estimator, we can analyze a general type of nonparametric time-series model and show that the proposed point predictions are consistent with the true optimal predictor. We construct a quantile prediction interval that is asymptotically valid. Moreover, using a debiasing technique, we can asymptotically approximate the distribution of multi-step-ahead nonparametric estimation by the bootstrap. As a result, we can build bootstrap prediction intervals that are pertinent, i.e., can capture the model estimation variability, thus improving the standard quantile prediction intervals. Simulation studies are presented to illustrate the performance of our point predictions and pertinent prediction intervals for finite samples. Full article
(This article belongs to the Section Time Series Analysis)
27 pages, 556 KiB  
Article
Analysis of Ordinal Populations from Judgment Post-Stratification
by Amirhossein Alvandi and Armin Hatefi
Stats 2023, 6(3), 812-838; https://doi.org/10.3390/stats6030052 - 9 Aug 2023
Viewed by 907
Abstract
In surveys requiring cost efficiency, such as medical research, measuring the variable of interest (e.g., disease status) is expensive and/or time-consuming; however, we often have access to easily obtainable characteristics about sampling units. These characteristics are not typically employed in the data collection process. Judgment post-stratification (JPS) sampling enables us to supplement the random samples from the population of interest with these characteristics as ranking information. This paper develops methods based on the JPS samples for estimating categorical ordinal populations. We develop various estimators from the JPS data even for situations where the JPS suffers from empty strata. We also propose the JPS estimators using multiple ranking resources. Through extensive numerical studies, we evaluate the performance of the methods in estimating the population. Finally, the developed estimation methods are applied to bone mineral data to estimate the bone disorder status of women aged 50 and older. Full article
(This article belongs to the Section Statistical Methods)
10 pages, 372 KiB  
Communication
On the Extreme Value H-Function
by Pushpa Narayan Rathie, Luan Carlos de Sena Monteiro Ozelim, Felipe Quintino and Tiago A. da Fonseca
Stats 2023, 6(3), 802-811; https://doi.org/10.3390/stats6030051 - 4 Aug 2023
Cited by 2 | Viewed by 974
Abstract
In the present paper, a new special function, the so-called extreme value H-function, is introduced. This new function, which is a generalization of the H-function with a particular set of parameters, appears while dealing with products and quotients of a wide class of extreme value random variables. Some properties, special cases and a series representation are provided. Some statistical applications are also briefly discussed. Full article
29 pages, 1122 KiB  
Article
The New Exponentiated Half Logistic-Harris-G Family of Distributions with Actuarial Measures and Applications
by Gayan Warahena-Liyanage, Broderick Oluyede, Thatayaone Moakofi and Whatmore Sengweni
Stats 2023, 6(3), 773-801; https://doi.org/10.3390/stats6030050 - 31 Jul 2023
Cited by 2 | Viewed by 1073
Abstract
In this study, we introduce a new generalized family of distributions called the Exponentiated Half Logistic-Harris-G (EHL-Harris-G) distribution, which extends the Harris-G distribution. The motivation for introducing this generalized family of distributions lies in its ability to overcome the limitations of previous families, enhance flexibility, improve tail behavior, provide better statistical properties and find applications in several fields. Several statistical properties, including hazard rate function, quantile function, moments, moments of residual life, distribution of the order statistics and Rényi entropy are discussed. Risk measures, such as value at risk, tail value at risk, tail variance and tail variance premium, are also derived and studied. To estimate the parameters of the EHL-Harris-G family of distributions, the following six different estimation approaches are used: maximum likelihood (MLE), least-squares (LS), weighted least-squares (WLS), maximum product spacing (MPS), Cramér–von Mises (CVM), and Anderson–Darling (AD). The Monte Carlo simulation results for EHL-Harris-Weibull (EHL-Harris-W) show that the MLE method allows us to obtain better estimates, followed by WLS and then AD. Finally, we show that the EHL-Harris-W distribution is superior to some other equi-parameter non-nested models in the literature, by fitting it to two real-life data sets from different disciplines. Full article
10 pages, 293 KiB  
Communication
Khinchin’s Fourth Axiom of Entropy Revisited
by Zhiyi Zhang, Hongwei Huang and Hao Xu
Stats 2023, 6(3), 763-772; https://doi.org/10.3390/stats6030049 - 27 Jul 2023
Viewed by 967
Abstract
The Boltzmann–Gibbs–Shannon (BGS) entropy is the only entropy form satisfying four conditions known as Khinchin’s axioms. The uniqueness theorem of the BGS entropy, plus the fact that Shannon’s mutual information completely characterizes independence between the two underlying random elements, puts the BGS entropy in a special place in many fields of study. In this article, the fourth axiom is replaced by a slightly weakened condition: an entropy whose associated mutual information is zero if and only if the two underlying random elements are independent. Under the weaker fourth axiom, other forms of entropy are sought by way of escort transformations. Two main results are reported in this article. First, there are many entropies other than the BGS entropy satisfying the weaker condition, yet retaining all the desirable utilities of the BGS entropy. Second, by way of escort transformations, the newly identified entropies are the only ones satisfying the weaker axioms. Full article
(This article belongs to the Section Data Science)
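The independence property mentioned in the abstract above, that Shannon's mutual information is zero if and only if the two random elements are independent, can be checked on small discrete examples. The sketch below computes ordinary Shannon mutual information from a joint probability table; it does not implement the escort transformations or the new entropies of the article, and the function name and example tables are assumptions made here.

```python
import math


def mutual_information(joint):
    """Shannon mutual information (in nats) of a discrete joint distribution
    given as a list of rows whose entries sum to one."""
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing (0 * log 0 is taken as 0)
                mi += p * math.log(p / (row_marginals[i] * col_marginals[j]))
    return mi


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]  # joint equals product of marginals
    dependent = [[0.5, 0.0], [0.0, 0.5]]        # each variable determines the other
    print(mutual_information(independent))
    print(mutual_information(dependent))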
16 pages, 1423 KiB  
Article
Exploring Heterogeneity with Category and Cluster Analyses for Mixed Data
by Veronica Distefano, Maria Mannone and Irene Poli
Stats 2023, 6(3), 747-762; https://doi.org/10.3390/stats6030048 - 5 Jul 2023
Cited by 2 | Viewed by 1580
Abstract
Precision medicine aims to overcome the traditional one-model-fits-the-whole-population approach that is unable to detect heterogeneous disease patterns and make accurate personalized predictions. Heterogeneity is particularly relevant for patients with complications of type 2 diabetes, including diabetic kidney disease (DKD). We focus on a DKD longitudinal dataset, aiming to find specific subgroups of patients whose characteristics are associated with a similar response to the therapeutic treatment. We develop an approach based on particular concepts of category theory and cluster analysis to explore individualized modeling and gain insight into disease evolution. This paper exploits the visualization tools provided by category theory and bridges category-based abstract work and real datasets. We build subgroups by deriving clusters of patients at different time points, considering a set of variables characterizing the state of patients. We analyze how specific variables affect disease progression and which drug combinations are more effective for each cluster of patients. The retrieved information can foster individualized strategies for DKD treatment.
7 pages, 243 KiB  
Communication
Some More Results on Characterization of the Exponential and Related Distributions
by Lev B. Klebanov
Stats 2023, 6(3), 740-746; https://doi.org/10.3390/stats6030047 - 29 Jun 2023
Viewed by 789
Abstract
Characterizations of the exponential distribution are given, based on the properties of independence of linear forms with random coefficients. Results based on the constancy of regression of one statistic in a linear form are obtained. Related characterizations based on the property of the identical distribution of statistics are also provided.
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
6 pages, 224 KiB  
Communication
Guess for Success? Application of a Mixture Model to Test-Wiseness on Multiple-Choice Exams
by Steven B. Caudill and Franklin G. Mixon, Jr.
Stats 2023, 6(3), 734-739; https://doi.org/10.3390/stats6030046 - 26 Jun 2023
Viewed by 1052
Abstract
The use of large lecture halls in business and economic education often dictates the use of multiple-choice exams to measure student learning. This study asserts that student performance on these types of exams can be viewed as the result of the process of elimination of incorrect answers, rather than the selection of the correct answer. More specifically, how students respond on a multiple-choice test can be broken down into the fractions of questions where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. Using an empirical model representing a mixture of binomials, in which the probability of a correct choice depends on the number of incorrect choices eliminated, and student performance data from a final exam in principles of microeconomics consisting of 100 multiple-choice questions, we find that the responses to all of the questions on the exam can be characterized by some form of guessing, with more than 26 percent of questions being completed using purely random guessing.
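The elimination idea in the abstract above can be made concrete with a small calculation. The sketch assumes four answer choices per question (three wrong answers, as the abstract's list of cases suggests) and that a student guesses uniformly among the choices left after elimination; the function names and mixture weights are hypothetical and are not the authors' estimates.

```python
from fractions import Fraction


def success_probabilities(n_choices=4):
    """Probability of a correct answer after eliminating k = 0, 1, ...,
    n_choices - 1 wrong answers and guessing uniformly among the rest."""
    return [Fraction(1, n_choices - k) for k in range(n_choices)]


def expected_fraction_correct(weights, n_choices=4):
    """Expected share of correct answers when weights[k] is the fraction of
    questions on which exactly k wrong answers are eliminated."""
    probs = success_probabilities(n_choices)
    if len(weights) != len(probs):
        raise ValueError("need one weight per elimination level")
    return sum(Fraction(w) * p for w, p in zip(weights, probs))


if __name__ == "__main__":
    # Hypothetical weights: equal shares of each elimination level.
    weights = [Fraction(1, 4)] * 4
    print(success_probabilities())             # 1/4, 1/3, 1/2, 1
    print(expected_fraction_correct(weights))
```

In a mixture-of-binomials model of this kind, the weights would be estimated from observed scores; here they are simply supplied by hand to show how the component success probabilities combine.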