Stats, Volume 4, Issue 1 (March 2021) – 16 articles

Cover Story: The Nadaraya–Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity. Its asymptotic bias was studied by Rosenblatt in 1969 and has been reported in several related works. However, its asymptotic nature gives no access to a hard bound. The increasing popularity of predictive tools for automated decision-making increases the need for hard guarantees. To alleviate this issue, a novel non-probabilistic upper bound of the bias is proposed, which relies on Lipschitz assumptions and mitigates some of the prerequisites of Rosenblatt’s analysis. The upper bound holds for a large class of kernels, designs, and regression functions, admits finite bandwidths, and is tight even with large second derivatives of the regression function, where Rosenblatt’s analysis typically fails.
12 pages, 3729 KiB  
Article
Normality Testing of High-Dimensional Data Based on Principle Component and Jarque–Bera Statistics
by Yanan Song and Xuejing Zhao
Stats 2021, 4(1), 216-227; https://doi.org/10.3390/stats4010016 - 17 Mar 2021
Cited by 5 | Viewed by 2916
Abstract
The testing of high-dimensional normality is an important issue that has been intensively studied in the literature. It depends on the variance–covariance matrix of the sample, and numerous methods have been proposed to reduce its complexity. Principal component analysis (PCA) has been widely used in high dimensions, since it can project high-dimensional data into a lower-dimensional orthogonal space. The normality of the reduced data can then be evaluated by Jarque–Bera (JB) statistics in each principal direction. We propose a combined test statistic, the summation of one-way JB statistics based on the independence of the principal directions, to test the multivariate normality of data in high dimensions. The performance of the proposed method is illustrated by the empirical power on simulated normal and non-normal data. Two real data examples show the validity of our proposed method.
(This article belongs to the Section Computational Statistics)
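The idea summarized in this abstract (project onto principal directions, compute a Jarque–Bera statistic in each direction, and sum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pca_jb_statistic`, the choice of `n_components`, and the chi-square reference distribution with 2 degrees of freedom per component are assumptions made here for illustration only.

```python
import numpy as np
from scipy import stats


def pca_jb_statistic(X, n_components):
    """Sum of per-direction Jarque-Bera statistics on PCA scores (illustrative)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    # Principal directions are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    # One-dimensional JB statistic in each retained principal direction.
    jb = [stats.jarque_bera(scores[:, j])[0] for j in range(n_components)]
    total = float(np.sum(jb))
    # Assumption: the per-direction statistics are treated as independent
    # chi-square(2) variables, so their sum is referred to chi-square(2k).
    p_value = float(stats.chi2.sf(total, df=2 * n_components))
    return total, p_value


rng = np.random.default_rng(0)
normal_data = rng.standard_normal((500, 20))
skewed_data = rng.exponential(size=(500, 20))
print("normal:", pca_jb_statistic(normal_data, 5))
print("skewed:", pca_jb_statistic(skewed_data, 5))
```

The chi-square approximation is asymptotic, so for small samples a simulated null distribution may be a safer reference.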

11 pages, 1011 KiB  
Article
A Viable Approach to Mitigating Irreproducibility
by David Trafimow, Tonghui Wang and Cong Wang
Stats 2021, 4(1), 205-215; https://doi.org/10.3390/stats4010015 - 8 Mar 2021
Cited by 2 | Viewed by 1717
Abstract
In a recent article, Trafimow suggested the usefulness of imagining an ideal universe where the only difference between original and replication experiments is the operation of randomness. This contrasts with replication in the real universe where systematicity, as well as randomness, creates differences between original and replication experiments. Although Trafimow showed (a) that the probability of replication in the ideal universe places an upper bound on the probability of replication in the real universe, and (b) how to calculate the probability of replication in the ideal universe, the conception is afflicted with an important practical problem. Too many participants are needed to render the approach palatable to most researchers. The present aim is to address this problem. Embracing skewness is an important part of the solution.

21 pages, 1045 KiB  
Article
An FDA-Based Approach for Clustering Elicited Expert Knowledge
by Carlos Barrera-Causil, Juan Carlos Correa, Andrew Zamecnik, Francisco Torres-Avilés and Fernando Marmolejo-Ramos
Stats 2021, 4(1), 184-204; https://doi.org/10.3390/stats4010014 - 4 Mar 2021
Cited by 1 | Viewed by 2858
Abstract
Expert knowledge elicitation (EKE) aims at obtaining individual representations of experts’ beliefs and rendering them in the form of probability distributions or functions. In many cases the elicited distributions differ, and the challenge in Bayesian inference is then to find ways to reconcile discrepant elicited prior distributions. This paper proposes the parallel analysis of clusters of prior distributions through a hierarchical method for clustering distributions that can be readily extended to functional data. The proposed method consists of (i) transforming the infinite-dimensional problem into a finite-dimensional one, (ii) using the Hellinger distance to compute the distances between curves and thus (iii) obtaining a hierarchical clustering structure. In a simulation study the proposed method was compared to k-means and agglomerative nesting algorithms, and the results showed that the proposed method outperformed those algorithms. Finally, the proposed method is illustrated through an EKE experiment and other functional data sets.
(This article belongs to the Special Issue Functional Data Analysis (FDA))
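Steps (i) to (iii) of this abstract can be sketched as follows. This is a rough illustration, not the authors' procedure: evaluating each density on a common grid as the finite-dimensional representation, the Beta priors used as example "expert" distributions, the average-linkage rule, and the helper name `hellinger_matrix` are all assumptions introduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import beta


def hellinger_matrix(densities, grid):
    """Pairwise Hellinger distances between densities on a uniform grid."""
    dx = grid[1] - grid[0]
    roots = np.sqrt(densities)
    n = len(densities)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # H^2 = 0.5 * integral (sqrt(f) - sqrt(g))^2 dx, via a Riemann sum.
            h2 = 0.5 * np.sum((roots[i] - roots[j]) ** 2) * dx
            D[i, j] = D[j, i] = np.sqrt(h2)
    return D


# (i) Finite-dimensional representation: densities evaluated on a grid.
grid = np.linspace(0.001, 0.999, 999)
# Hypothetical elicited priors: two groups of experts with opposite beliefs.
params = [(2, 8), (2.5, 9), (3, 10), (8, 2), (9, 2.5), (10, 3)]
densities = np.array([beta.pdf(grid, a, b) for a, b in params])

# (ii) Hellinger distances between the curves.
D = hellinger_matrix(densities, grid)

# (iii) Hierarchical clustering on the condensed distance matrix.
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Cutting the tree at two clusters is arbitrary here; in practice the dendrogram encoded in `Z` would be inspected before choosing a cut.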

22 pages, 509 KiB  
Article
Bayesian Bandwidths in Semiparametric Modelling for Nonnegative Orthant Data with Diagnostics
by Célestin C. Kokonendji and Sobom M. Somé
Stats 2021, 4(1), 162-183; https://doi.org/10.3390/stats4010013 - 4 Mar 2021
Cited by 11 | Viewed by 2451
Abstract
Multivariate nonnegative orthant data are real vectors bounded to the left by the null vector, and they can be continuous, discrete or mixed. We first review the recent relative variability indexes for multivariate nonnegative continuous and count distributions. As a prelude, the classification of two comparable distributions having the same mean vector is done through under-, equi- and over-variability with respect to the reference distribution. Multivariate associated kernel estimators are then reviewed with new proposals that can accommodate any nonnegative orthant dataset. We focus on bandwidth matrix selections by adaptive and local Bayesian methods for semicontinuous and counting supports, respectively. We finally introduce a flexible semiparametric approach for estimating all these distributions on nonnegative supports. The corresponding estimator is directed by a given parametric part and a nonparametric part, which is a weight function to be estimated through multivariate associated kernels. A diagnostic model is also discussed to make an appropriate choice between the parametric, semiparametric and nonparametric approaches. Retaining the purely nonparametric approach indicates that the parametric part used in the model is inadequate. Multivariate real data examples in a semicontinuous setup, such as reliability, are progressively considered to illustrate the proposed approach. Concluding remarks are made for extension to other multiple functions.
(This article belongs to the Special Issue Directions in Statistical Modelling)

16 pages, 512 KiB  
Article
Assessment of Climate Change in Italy by Variants of Ordered Correspondence Analysis
by Assuntina Cembalo, Rosaria Lombardo, Eric J. Beh, Gianpaolo Romano, Michele Ferrucci and Francesca M. Pisano
Stats 2021, 4(1), 146-161; https://doi.org/10.3390/stats4010012 - 1 Mar 2021
Viewed by 2153
Abstract
This paper explores climate changes in Italy over the last 30 years. The data come from the European observation gridded dataset and are concerned with the temperature throughout the country. We focus our attention on two Italian regions (Lombardy in northern Italy and Campania in southern Italy) and on two particular years roughly thirty years apart (1986 and 2015). Our primary aim is to assess the most important changes in temperature in Italy using some variants of correspondence analysis for ordered categorical variables. Such variants are based on a decomposition method using orthogonal polynomials instead of singular vectors and allow one to easily classify the meteorological station observations. A simulation study, based on bootstrap sampling, is undertaken to demonstrate the reliability of the results.
(This article belongs to the Special Issue Multivariate Statistics and Applications)

8 pages, 259 KiB  
Article
Cumulative Median Estimation for Sufficient Dimension Reduction
by Stephen Babos and Andreas Artemiou
Stats 2021, 4(1), 138-145; https://doi.org/10.3390/stats4010011 - 20 Feb 2021
Viewed by 1870
Abstract
In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when there are outliers present in the data and comparably when outliers are not present. This is demonstrated in simulated and real data experiments.
(This article belongs to the Special Issue Robust Statistics in Action)

16 pages, 1798 KiB  
Article
A Consistent Estimator of Nontrivial Stationary Solutions of Dynamic Neural Fields
by Eddy Kwessi
Stats 2021, 4(1), 122-137; https://doi.org/10.3390/stats4010010 - 13 Feb 2021
Cited by 2 | Viewed by 1795
Abstract
Dynamics of neural fields are tools used in neurosciences to understand the activities generated by large ensembles of neurons. They are also used in network analysis and neuroinformatics, in particular to model a continuum of neural networks. They are mathematical models that describe the average behavior of these congregations of neurons, which are often in large numbers, even in small cortexes of the brain. Therefore, changes in average activity (potential, connectivity, firing rate, etc.) are described using systems of partial differential equations. In their continuous or discrete forms, these systems have a rich array of properties, among which is the existence of nontrivial stationary solutions. In this paper, we propose an estimator for nontrivial solutions of dynamical neural fields with a single layer. The estimator is shown to be consistent and a computational algorithm is proposed to help carry out implementation. An illustration of this consistency is given based on different input functions, different kernels, and different pulse emission rate functions.

14 pages, 597 KiB  
Article
Predictor Analysis in Group Decision Making
by Stan Lipovetsky
Stats 2021, 4(1), 108-121; https://doi.org/10.3390/stats4010009 - 9 Feb 2021
Cited by 4 | Viewed by 2076
Abstract
Priority vectors in the Analytic Hierarchy Process (AHP) are commonly estimated as constant values calculated from the pairwise comparison ratios elicited from an expert. For multiple experts, panel data, or other data with varied characteristics of measurements, the priority vectors can be built as functions of the auxiliary predictors. For example, in multi-person decision making, the priorities can be obtained in regression modeling from the demographic and socio-economic properties. The priorities can then be predicted for individual respondents, profiled by each predictor, forecasted in time, studied by predictor importance, and assessed by the measures of significance, fit and quality well known in regression modeling. Numerical results show that the suggested approaches reveal useful features of priority behavior, which can noticeably extend the abilities and applications of the AHP for numerous multiple-criteria decision making problems. The considered methods are useful for segmenting the respondents and finding optimum managerial solutions specific to each segment. They can help decision makers to focus on the respondents’ individual features and to increase customer satisfaction, retention and loyalty to the promoted brands or products.
(This article belongs to the Section Regression Models)

20 pages, 466 KiB  
Article
Improving the Efficiency of Robust Estimators for the Generalized Linear Model
by Alfio Marazzi
Stats 2021, 4(1), 88-107; https://doi.org/10.3390/stats4010008 - 4 Feb 2021
Cited by 4 | Viewed by 2411
Abstract
The distance constrained maximum likelihood procedure (DCML) optimally combines a robust estimator with the maximum likelihood estimator with the purpose of improving its small sample efficiency while preserving a good robustness level. It has been published for the linear model and is now extended to the GLM. Monte Carlo experiments are used to explore the performance of this extension in the Poisson regression case. Several published robust candidates for the DCML are compared; the modified conditional maximum likelihood estimator starting with a very robust minimum density power divergence estimator is selected as the best candidate. It is shown empirically that the DCML remarkably improves its small sample efficiency without loss of robustness. An example using real hospital length of stay data fitted by the negative binomial regression model is discussed.
(This article belongs to the Special Issue Robust Statistics in Action)

2 pages, 158 KiB  
Editorial
Acknowledgment to Reviewers of Stats in 2020
by Stats Editorial Office
Stats 2021, 4(1), 86-87; https://doi.org/10.3390/stats4010007 - 29 Jan 2021
Cited by 1 | Viewed by 1802
Abstract
Peer review is the driving force of journal development, and reviewers are gatekeepers who ensure that Stats maintains its standards for the high quality of its published papers [...]
15 pages, 309 KiB  
Article
Fusing Nature with Computational Science for Optimal Signal Extraction
by Hossein Hassani, Mohammad Reza Yeganegi and Xu Huang
Stats 2021, 4(1), 71-85; https://doi.org/10.3390/stats4010006 - 19 Jan 2021
Cited by 2 | Viewed by 2182
Abstract
Fusing nature with computational science has proved to be of paramount importance, and researchers have shown growing enthusiasm for inventing and developing nature-inspired algorithms for solving complex problems across subjects. Inevitably, these advancements have rapidly promoted the development of data science, where nature-inspired algorithms are changing the traditional way of data processing. This paper proposes a hybrid approach, namely SSA-GA, which incorporates the optimization merits of the genetic algorithm (GA) for the advancement of Singular Spectrum Analysis (SSA). This approach further boosts the performance of SSA forecasting via better and more efficient grouping. Given the performance of SSA-GA on 100 real time series across various subjects, this newly proposed SSA-GA approach is shown to be computationally efficient and robust, with improved forecasting performance.

9 pages, 2387 KiB  
Article
A Statistical Approach to Analyzing Engineering Estimates and Bids
by Roshanak Farshidpour, Kiana Negoro and Fariborz M. Tehrani
Stats 2021, 4(1), 62-70; https://doi.org/10.3390/stats4010005 - 13 Jan 2021
Cited by 2 | Viewed by 4383
Abstract
This paper introduces a methodology to assess the accuracy of engineering estimates in relation to the final project cost. The objective of this assessment is to develop a comprehensive approach towards obtaining a more reliable estimate of the project cost. This approach relies on the review of a synthesis of literature, which provides a basis for determining key components in the estimation of the capital cost of a project. A systematic review of existing data for selected projects was conducted as well. The data employed cover sampled public transportation projects to maintain existing infrastructure within a selected geographical location and a specified time period. Enhanced analysis of existing data through statistical models was employed to indicate potential measures for preventing errors in the estimate due to uncertainties in the time, cost, and method of construction. The comparison of results with similar findings from past research shows the effectiveness of the presented methodologies and opportunities to enhance statistical analyses of bids and engineering estimates. The conclusions enable project managers to address uncertainties in the bidding process and enhance the financial sustainability of projects within specific programs.
(This article belongs to the Special Issue Applied Statistics in Engineering)

16 pages, 3649 KiB  
Article
A Quantitative Approach to Evaluate the Application of the Extended Situational Teaching Model in Engineering Education
by Fariborz M. Tehrani, Christopher McComb and Sherrianna Scott
Stats 2021, 4(1), 46-61; https://doi.org/10.3390/stats4010004 - 13 Jan 2021
Cited by 4 | Viewed by 3484
Abstract
The extended situational teaching model is a variation of situational teaching, which itself has roots in situational leadership. Application of situational leadership in education requires the teacher to lead students through various stages of the learning process. This paper presents the relationship between performance measures of extended situational teaching and common pedagogical tools in engineering classrooms. These relationships outline the response of students at different preparation levels to the application of various course components, including classroom activities and out-of-classroom assignments, with respect to task and relationship behaviors. The results of a quantitative survey are presented to support the existence of such a relationship and to demonstrate the effectiveness of the extended situational teaching model. The survey covered 476 engineering students enrolled in nine different courses over a four-year period within the civil engineering program. The statistical analysis of the survey responses proceeded in two stages. The first stage of the analysis evaluates whether the survey tool can resolve meaningful differences between the categories of the situational teaching model, and provides aggregate recommendations for each category. In the second stage of the analysis, the specific instantiation of these categories is broken down according to academic standing (grade point average) and academic level, offering support for an extended situational teaching model. The conclusions discuss the statistical characteristics of the results and correlations between selected pedagogical tools and performance measures.
(This article belongs to the Special Issue Applied Statistics in Engineering)

18 pages, 771 KiB  
Article
Kumaraswamy Generalized Power Lomax Distribution and Its Applications
by Vasili B.V. Nagarjuna, R. Vishnu Vardhan and Christophe Chesneau
Stats 2021, 4(1), 28-45; https://doi.org/10.3390/stats4010003 - 7 Jan 2021
Cited by 16 | Viewed by 2803
Abstract
In this paper, a new five-parameter distribution is proposed using the functionalities of the Kumaraswamy generalized family of distributions and the features of the power Lomax distribution. It is named the Kumaraswamy generalized power Lomax distribution. In a first approach, we derive its main probability and reliability functions, with a visualization of its modeling behavior by considering different parameter combinations. As a prime quality, the corresponding hazard rate function is very flexible; it possesses decreasing, increasing and inverted (upside-down) bathtub shapes. Decreasing-increasing-decreasing shapes are also observed. Some important characteristics of the Kumaraswamy generalized power Lomax distribution are derived, including moments, entropy measures and order statistics. The second approach is statistical. The maximum likelihood estimates of the parameters are described and a brief simulation study shows their effectiveness. Two real data sets are taken to show how the proposed distribution can be applied concretely; parameter estimates are obtained and fitting comparisons are performed with other well-established Lomax-based distributions. The Kumaraswamy generalized power Lomax distribution turns out to be the best by capturing fine details in the structure of the data considered.

10 pages, 816 KiB  
Article
General Formulas for the Central and Non-Central Moments of the Multinomial Distribution
by Frédéric Ouimet
Stats 2021, 4(1), 18-27; https://doi.org/10.3390/stats4010002 - 6 Jan 2021
Cited by 9 | Viewed by 3569
Abstract
We present the first general formulas for the central and non-central moments of the multinomial distribution, using a combinatorial argument and the factorial moments previously obtained in Mosimann (1962). We use the formulas to give explicit expressions for all the non-central moments up to order 8 and all the central moments up to order 4. These results expand significantly on those in Newcomer (2008) and Newcomer et al. (2008), where the non-central moments were calculated up to order 4.
(This article belongs to the Section Multivariate Analysis)
17 pages, 967 KiB  
Article
An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions
by Samuele Tosatto, Riad Akrour and Jan Peters
Stats 2021, 4(1), 1-17; https://doi.org/10.3390/stats4010001 - 30 Dec 2020
Cited by 3 | Viewed by 4341
Abstract
The Nadaraya-Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity. Its asymptotic bias was studied by Rosenblatt in 1969 and has been reported in several related works. However, given its asymptotic nature, this analysis gives no access to a hard bound. The increasing popularity of predictive tools for automated decision-making increases the need for hard (non-probabilistic) guarantees. To alleviate this issue, we propose an upper bound of the bias which holds for finite bandwidths, using Lipschitz assumptions and mitigating some of the prerequisites of Rosenblatt’s analysis. Our bound has potential applications in fields like surgical robots or self-driving cars, where some hard guarantees on the prediction error are needed.
(This article belongs to the Section Regression Models)
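For readers unfamiliar with the estimator discussed in this abstract, a minimal sketch of Nadaraya-Watson regression follows. The Gaussian kernel, the bandwidth value, the sine test function, and the name `nadaraya_watson` are assumptions chosen for illustration; the sketch only measures an empirical error on noiseless data and does not reproduce the paper's bound.

```python
import numpy as np


def nadaraya_watson(x_query, x_train, y_train, bandwidth):
    """Kernel-weighted average of y_train at each query point (Gaussian kernel)."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Scaled pairwise differences, shape (n_query, n_train).
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Estimate = sum_i K_i * y_i / sum_i K_i for each query point.
    return (weights @ y_train) / weights.sum(axis=1)


# Noiseless samples of a smooth function, so the error below is pure bias.
x_train = np.linspace(0.0, 2.0 * np.pi, 200)
y_train = np.sin(x_train)
x_query = np.linspace(0.5, 5.5, 50)

pred = nadaraya_watson(x_query, x_train, y_train, bandwidth=0.2)
print("max abs error:", np.max(np.abs(pred - np.sin(x_query))))
```

Because the estimate is a weighted average with nonnegative weights, it always lies between the smallest and largest training response; query points far outside the training range can make all Gaussian weights underflow, which this sketch does not guard against.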
