Computational Statistics and Data Analysis, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: 30 June 2025 | Viewed by 11332

Special Issue Editor


Prof. Dr. Gaorong Li
Guest Editor
School of Statistics, Beijing Normal University, Beijing 100875, China
Interests: high-dimensional statistics; nonparametric statistics and complex data analysis; model/variable selection; statistical learning; causal inference; longitudinal/panel data analysis; measurement error model; empirical likelihood

Special Issue Information

Dear Colleagues, 

With advances in science and technology, computational statistics and data analysis have become increasingly important in diverse areas of science, engineering, and the humanities, ranging from genomics and health sciences to economics, finance, and machine learning. Statistical methodology and computing are fundamental to modeling and analyzing real data in these fields. In this Special Issue, we are looking for high-quality research papers in computational statistics and data analysis. We invite investigators to contribute original research articles as well as review articles that will stimulate the development of statistical methodology and its applications in data analysis.

Prof. Dr. Gaorong Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bootstrapping
  • classification
  • data analytical strategies and methodologies applied in biostatistics
  • dimension reduction of high-dimensional data analysis
  • large-scale inference for Gaussian graphical models and covariance estimation
  • longitudinal/panel data analysis
  • massive networks
  • medical statistics
  • nonparametric and semiparametric models
  • optimal portfolio
  • robust statistics
  • statistical methodology and computing for data analysis
  • statistical methodology and computing for noisy data, such as measurement error data, missing data, etc.
  • sufficient dimension reduction methods in regression analysis
  • variable/model selection for high-dimensional data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

16 pages, 570 KiB  
Article
A New Random Coefficient Autoregressive Model Driven by an Unobservable State Variable
by Yuxin Pang and Dehui Wang
Mathematics 2024, 12(24), 3890; https://doi.org/10.3390/math12243890 - 10 Dec 2024
Viewed by 515
Abstract
A novel random coefficient autoregressive model is proposed, and a feature of the model is the non-stationarity of the state equation. The autoregressive coefficient is an unknown function with an unobservable state variable, which can be estimated by the local linear regression method. The iterative algorithm is constructed to estimate the parameters based on the ordinary least squares method. The ordinary least squares residuals are used to estimate the variances of the errors. The Kalman-smoothed estimation method is used to estimate the unobservable state variable because of its ability to deal with non-stationary stochastic processes. These methods allow deriving the analytical solutions. The performance of the estimation methods is evaluated through numerical simulation. The model is validated using actual time series data from the S&P/HKEX Large Cap Index. Full article
(This article belongs to the Special Issue Computational Statistics and Data Analysis, 2nd Edition)

20 pages, 370 KiB  
Article
High-Dimensional U-Statistics Type Hypothesis Testing via Jackknife Pseudo-Values with Multiplier Bootstrap
by Mingjuan Zhang and Libin Jin
Mathematics 2024, 12(23), 3837; https://doi.org/10.3390/math12233837 - 4 Dec 2024
Viewed by 471
Abstract
High-dimensional parameter testing is commonly used in bioinformatics to analyze complex relationships in gene expression and brain connectivity studies, involving parameters like means, covariances, and correlations. In this paper, we present a novel approach for testing U-statistics-type parameters by leveraging jackknife pseudo-values. Inspired by Tukey’s conjecture, we establish the asymptotic independence of these pseudo-values, allowing us to reformulate U-statistics-type parameter testing as a sample mean testing problem. This reformulation enables the use of established sample mean testing frameworks, simplifying the testing procedure. We apply a multiplier bootstrap method to obtain critical values and provide a rigorous theoretical analysis to validate the approach. Simulation studies demonstrate the robustness of our method across a variety of scenarios. Additionally, we apply our approach to investigate differences in the dependency structures of a subset of genes within the Wnt signaling pathway, which is associated with lung cancer. Full article
18 pages, 591 KiB  
Article
Estimation and Simultaneous Confidence Bands for Fixed-Effects Panel Data Partially Linear Models
by Suigen Yang, Xiujuan Yang and Xuefei Wang
Mathematics 2024, 12(23), 3774; https://doi.org/10.3390/math12233774 - 29 Nov 2024
Viewed by 391
Abstract
In this paper, we study the estimation and simultaneous confidence band (SCB) problems for fixed-effects panel data partially linear models. We remove the fixed effects and then obtain estimators for the parametric and nonparametric components, which do not depend on the fixed effects. We establish the asymptotic distribution of the maximum absolute deviation between the estimated nonparametric component and the true nonparametric component under some suitable conditions; hence, this result can be used to construct the simultaneous confidence band for the nonparametric component. Based on the asymptotic distribution, it becomes difficult to construct the simultaneous confidence band. The reason for this is that the asymptotic distribution involves estimators of the asymptotic bias and conditional variance, as well as the choice of bandwidth for estimating the second derivative of the nonparametric function. Clearly, this will result in a computational burden and accumulated errors. To overcome these problems, we propose a bootstrap method to construct the simultaneous confidence band. The Monte Carlo results indicate that the proposed bootstrap method exhibits better performance with limited samples. An empirical application is presented to evaluate the performance of the proposed method. Full article

19 pages, 354 KiB  
Article
Identifiability and Estimation for Potential-Outcome Means with Misclassified Outcomes
by Shaojie Wei, Chao Zhang, Zhi Geng and Shanshan Luo
Mathematics 2024, 12(18), 2801; https://doi.org/10.3390/math12182801 - 10 Sep 2024
Viewed by 975
Abstract
Potential outcomes play a fundamental and important role in many causal inference problems. If the potential-outcome means are identifiable, a series of causal effect measures, including the risk difference, the risk ratio, and the treatment benefit rate, among others, can also be identified. However, current identification and estimation methods for these means often implicitly assume that the collected data for analysis are measured precisely. In many fields such as medicine and economics, the collected variables may be subject to measurement errors, such as medical diagnostic results and individual wage data. Misclassification, as a non-classic measurement error, can lead to severely biased estimates in causal inference. In this paper, we leverage a combined sample to study the identifiability of potential-outcome means corresponding to different treatment levels under a plausible misclassification assumption for the outcome, allowing the misclassification probability to depend on not only the true outcome but also the covariates. Furthermore, we propose the multiply-robust and semiparametric efficient estimators for the means, consistent even under partial misspecification of the observed data law, based on the semiparametric theory framework. The simulation studies and real data analysis demonstrate the satisfactory performance of the proposed method. Full article
16 pages, 3039 KiB  
Article
Testing Multivariate Normality Based on Beta-Representative Points
by Yiwen Cao, Jiajuan Liang, Longhao Xu and Jiangrui Kang
Mathematics 2024, 12(11), 1711; https://doi.org/10.3390/math12111711 - 30 May 2024
Cited by 1 | Viewed by 587
Abstract
Testing multivariate normality in high-dimensional data analysis has been a long-lasting topic in the area of goodness of fit. Numerous methods for this purpose can be found in the literature. Reviews on different methods given by influential researchers show that new methods keep emerging in the literature from different perspectives. The theory of statistical representative points provides a new perspective to construct tests for multivariate normality. To avoid the difficulty and huge computational load in finding the statistical representative points from a high-dimensional probability distribution, we develop an approach to constructing a test for high-dimensional normal distribution based on the representative points of the simple univariate beta distribution. The representative-points-based approach is extended to the case that the sample size may be smaller than the dimension. A Monte Carlo study shows that the new test is able to control type I error rates fairly well for both large and small sample sizes when faced with a high dimension. The power of the new test against some non-normal distributions is generally or substantially improved for a set of selected alternative distributions. A real-data example is given for a simple application illustration. Full article

20 pages, 549 KiB  
Article
Estimation in Semi-Varying Coefficient Heteroscedastic Instrumental Variable Models with Missing Responses
by Weiwei Zhang, Jingxuan Luo and Shengyun Ma
Mathematics 2023, 11(23), 4853; https://doi.org/10.3390/math11234853 - 2 Dec 2023
Viewed by 1171
Abstract
This paper studies the estimation problem for semi-varying coefficient heteroscedastic instrumental variable models with missing responses. First, we propose the adjusted estimators for unknown parameters and smooth functional coefficients utilizing the ordinary profile least square method and instrumental variable adjustment technique with complete data. Second, we present an adjusted estimator of the stochastic error variance by employing the Nadaraya–Watson kernel estimation technique. Third, we apply the inverse probability-weighted method and instrumental variable adjustment technique to construct the adaptive-weighted adjusted estimators for unknown parameters and smooth functional coefficients. The asymptotic properties of our proposed estimators are established under some regularity conditions. Finally, numerous simulation studies and a real data analysis are conducted to examine the finite sample performance of the proposed estimators. Full article

13 pages, 318 KiB  
Article
Regression Analysis of Dependent Current Status Data with Left Truncation
by Mengyue Zhang, Shishun Zhao, Tao Hu, Da Xu and Jianguo Sun
Mathematics 2023, 11(16), 3539; https://doi.org/10.3390/math11163539 - 16 Aug 2023
Viewed by 1165
Abstract
Current status data are encountered in a wide range of applications, including tumorigenic experiments and demographic studies. In this case, each subject has one observation, and the only information obtained is whether the event of interest happened at the moment of observation. In addition to censoring, truncation is also very common in practice. This paper examines the regression analysis of current status data with informative censoring times, considering the presence of left truncation. In addition, we propose an inference approach based on sieve maximum likelihood estimation (SMLE). A copula-based approach is used to describe the relationship between the failure time of interest and the censoring time. The spline function is employed to approximate the unknown nonparametric function. We have established the asymptotic properties of the proposed estimator. Simulation studies suggest that the developed procedure works well in practice. We also applied the developed method to a real dataset derived from an AIDS cohort study. Full article

16 pages, 325 KiB  
Article
An Improved Dunnett’s Procedure for Comparing Multiple Treatments with a Control in the Presence of Missing Observations
by Wenqing Jiang, Jiangjie Zhou and Baosheng Liang
Mathematics 2023, 11(14), 3233; https://doi.org/10.3390/math11143233 - 22 Jul 2023
Viewed by 2403
Abstract
Dunnett’s procedure has been frequently used for multiple comparisons of group means of several treatments with a control, in drug development and other areas. However, in practice, researchers usually face missing observations when performing Dunnett’s procedure. This paper presents an improved Dunnett’s procedure that can construct unique ensemble confidence intervals for comparing group means of several treatments with a control, in the presence of missing observations, using a derived multivariate t distribution under the framework of Rubin’s rule. This procedure fills the current research gap that Rubin’s repeated-imputation inferences cannot adjust for multiplicity and, thereby, cannot give a unified confidence interval to control the family-wise error rate (FWER) when dealing with this problem. Simulation results show that the constructed pooled confidence intervals achieve nominal joint coverage and the interval estimations preserve comparable precision to Rubin’s repeated-imputation inference as the missing rate increases. The proposed procedure with the propensity-score imputation method is shown to produce more accurate interval estimations and control the FWER well. Full article

Review

22 pages, 428 KiB  
Review
Overview of High-Dimensional Measurement Error Regression Models
by Jingxuan Luo, Lili Yue and Gaorong Li
Mathematics 2023, 11(14), 3202; https://doi.org/10.3390/math11143202 - 21 Jul 2023
Cited by 1 | Viewed by 1909
Abstract
High-dimensional measurement error data are becoming more prevalent across various fields. Research on measurement error regression models has gained momentum due to the risk of drawing inaccurate conclusions if measurement errors are ignored. When the dimension p is larger than the sample size n, it is challenging to develop statistical inference methods for high-dimensional measurement error regression models due to the existence of bias, nonconvexity of the objective function, high computational cost and many other difficulties. Over the past few years, some works have overcome the aforementioned difficulties and proposed several novel statistical inference methods. This paper mainly reviews the current development on estimation, hypothesis testing and variable screening methods for high-dimensional measurement error regression models and shows the theoretical results of these methods with some directions worthy of exploring in future research. Full article