Statistics and Data Science

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: 26 May 2025 | Viewed by 3615

Special Issue Editors


Guest Editor
Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, USA
Interests: big data analytics; machine learning; computational statistics; quantitative finance; statistical process control; robust statistics; nonparametric and semiparametric techniques

Guest Editor
Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, USA
Interests: cluster analysis; robust statistical procedures

Special Issue Information

Dear Colleagues,

As large datasets become more ubiquitous, the demand for data-driven methodologies that provide valuable insights into complex phenomena and facilitate computer-guided decision making continues to grow. Fueled by theoretical and methodological advances and the increasing availability of computing resources, cutting-edge developments at the interface of statistics and data science have proven to be a major driver of innovation in science and technology. This Special Issue aims to promote the convergence of modern research agendas and practices in data science and statistics and to explore collaborative synergies in addressing theoretical and real-world problems.

As data science plays an increasingly important role in statistics, leading to a broader use of computationally intensive methods, a heavier reliance on resampling and bootstrap techniques, and the utilization of multimodal datasets, traditional statistical approaches have in turn catalyzed major developments in tree-based learning, Bayesian deep learning, robust uncertainty quantification, model reduction, feature selection, and many other areas of data science. This underscores the importance of interdisciplinary research that connects experts from both fields.

This Special Issue welcomes original manuscripts on a broad variety of topics in statistics and data science, including, but not limited to, the following:

  • theoretical and methodological developments in multivariate and high-dimensional statistics, robust and nonparametric statistics, statistical learning, computational statistics, Bayesian statistics, machine learning, big data analytics, and deep learning;
  • image analysis and computer vision, text mining and large language models, multimodal learning, and explainable AI;
  • applications of advanced data analytics to real-world problems;
  • reviews of the modern data science and statistics literature.

Dr. Michael Pokojovy
Dr. Andrews T. Anum
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data science
  • big data analytics
  • advanced data analytics
  • computational statistics
  • robust inference
  • Bayesian inference
  • statistical learning
  • machine learning
  • deep learning
  • explainable AI
  • trustworthy AI
  • text mining
  • image mining
  • multimodal learning
  • supervised learning
  • unsupervised learning
  • reinforcement learning
  • transfer learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (4 papers)


Research

28 pages, 3798 KiB  
Article
Smooth Sigmoid Surrogate (SSS): An Alternative to Greedy Search in Decision Trees
by Xiaogang Su, George Ekow Quaye, Yishu Wei, Joseph Kang, Lei Liu, Qiong Yang, Juanjuan Fan and Richard A. Levine
Mathematics 2024, 12(20), 3190; https://doi.org/10.3390/math12203190 - 11 Oct 2024
Viewed by 528
Abstract
Greedy search (GS) or exhaustive search plays a crucial role in decision trees and their various extensions. We introduce an alternative splitting method called smooth sigmoid surrogate (SSS) in which the indicator threshold function used in GS is approximated by a smooth sigmoid function. This approach allows for parametric smoothing or regularization of the erratic and discrete GS process, making it more effective in identifying the true cutoff point, particularly in the presence of weak signals, as well as less prone to the inherent end-cut preference problem. Additionally, SSS provides a convenient means of evaluating the best split by referencing a parametric nonlinear model. Moreover, in many variants of recursive partitioning, SSS can be reformulated as a one-dimensional smooth optimization problem, rendering it computationally more efficient than GS. Extensive simulation studies and real data examples are provided to evaluate and demonstrate its effectiveness.
(This article belongs to the Special Issue Statistics and Data Science)
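The core idea described in this abstract, replacing the hard indicator I(x > c) used in greedy search with a smooth sigmoid so that the split criterion becomes a smooth function of the cutoff c, can be illustrated with a minimal sketch. This is not the authors' implementation; the soft-weighted sum-of-squares criterion, the function names, and the grid search below are illustrative assumptions (a smooth one-dimensional optimizer could replace the scan):

```python
import math

def sigmoid(z):
    """Smooth approximation of the 0/1 step function."""
    if z < -500:          # avoid overflow in exp for extreme arguments
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def smooth_split_sse(x, y, c, scale=0.1):
    # Soft right-node membership replaces the hard indicator I(x > c);
    # as scale -> 0 the criterion recovers the greedy-search SSE.
    w = [sigmoid((xi - c) / scale) for xi in x]
    wr = sum(w)
    wl = len(w) - wr
    if wr < 1e-12 or wl < 1e-12:  # degenerate split: no reduction possible
        ybar = sum(y) / len(y)
        return sum((yi - ybar) ** 2 for yi in y)
    mr = sum(wi * yi for wi, yi in zip(w, y)) / wr        # soft right mean
    ml = sum((1 - wi) * yi for wi, yi in zip(w, y)) / wl  # soft left mean
    return sum(wi * (yi - mr) ** 2 + (1 - wi) * (yi - ml) ** 2
               for wi, yi in zip(w, y))

def best_cut(x, y, grid, scale=0.1):
    # One-dimensional search over candidate cutoffs for the smoothed criterion.
    return min(grid, key=lambda c: smooth_split_sse(x, y, c, scale))
```

Because the smoothed criterion is differentiable in c, gradient-based optimization is possible where greedy search must enumerate every candidate cutoff.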

20 pages, 3823 KiB  
Article
From Whence Commeth Data Misreporting? A Survey of Benford’s Law and Digit Analysis in the Time of the COVID-19 Pandemic
by Călin Vâlsan, Andreea-Ionela Puiu and Elena Druică
Mathematics 2024, 12(16), 2579; https://doi.org/10.3390/math12162579 - 21 Aug 2024
Viewed by 487
Abstract
We survey the literature on the use of Benford’s distribution digit analysis applied to COVID-19 case data reporting. We combine a bibliometric analysis of 32 articles with a survey of their content and findings. In spite of combined efforts from teams of researchers across multiple countries and universities, using large data samples from a multitude of sources, there is no emerging consensus on data misreporting. We believe we are nevertheless able to discern a faint pattern in the segregation of findings. The evidence suggests that studies using very large, aggregate samples and a methodology based on hypothesis testing are marginally more likely to identify significant deviations from Benford’s distribution and to attribute this deviation to data tampering. Our results are far from conclusive and should be taken with a very healthy dose of skepticism. Academics and policymakers alike should remain mindful that the misreporting controversy is still far from being settled.
(This article belongs to the Special Issue Statistics and Data Science)
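For readers unfamiliar with the surveyed technique: Benford's law predicts that the leading digit d of many naturally occurring datasets appears with probability log10(1 + 1/d), and digit analysis compares observed leading-digit frequencies against these expectations, often via a chi-square-type statistic. A minimal sketch of that comparison (illustrative only, not code from the surveyed studies):

```python
import math
from collections import Counter

def benford_expected():
    # Benford's law: P(leading digit = d) = log10(1 + 1/d), d = 1..9.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_freqs(values):
    # Leading nonzero digit of each value's decimal representation.
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def chi_square_stat(values):
    # Pearson chi-square distance between observed and Benford frequencies;
    # large values suggest deviation from Benford's distribution.
    expected = benford_expected()
    observed = first_digit_freqs(values)
    n = sum(1 for v in values if v != 0)
    return sum(n * (observed[d] - expected[d]) ** 2 / expected[d]
               for d in range(1, 10))
```

As the abstract cautions, a significant deviation is evidence against the Benford null, not proof of tampering; many legitimate data-generating processes are not Benford-distributed.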

13 pages, 630 KiB  
Article
A Copula Discretization of Time Series-Type Model for Examining Climate Data
by Dimuthu Fernando, Olivia Atutey and Norou Diawara
Mathematics 2024, 12(15), 2419; https://doi.org/10.3390/math12152419 - 3 Aug 2024
Viewed by 801
Abstract
The study presents a comparative analysis of climate data under two scenarios: a Gaussian copula marginal regression model for count time series data and a copula-based bivariate count time series model. These models, built after comprehensive simulations, offer adaptable autocorrelation structures considering the daily average temperature and humidity data observed at a regional airport in Mobile, AL.
(This article belongs to the Special Issue Statistics and Data Science)
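A common construction behind copula-based count models of this kind is to draw correlated Gaussian variables, push them through the normal CDF to obtain dependent uniforms, and then discretize by inverting a count distribution's CDF. The sketch below pairs a Gaussian copula with Poisson margins; it is an illustrative assumption, not the paper's model, which additionally involves marginal regression and time-series autocorrelation structures:

```python
import math
import random

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_quantile(u, lam):
    # Discretization step: smallest k with P(X <= k) >= u for X ~ Poisson(lam).
    u = min(u, 1.0 - 1e-12)  # guard against u rounding up to 1.0
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def gaussian_copula_counts(n, lam1, lam2, rho, seed=0):
    # Correlated normals -> dependent uniforms -> dependent Poisson counts.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((poisson_quantile(norm_cdf(z1), lam1),
                      poisson_quantile(norm_cdf(z2), lam2)))
    return pairs
```

The discretization preserves the Poisson margins exactly while the copula parameter rho controls the strength of dependence between the two count series.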

15 pages, 1224 KiB  
Article
Testing Informativeness of Covariate-Induced Group Sizes in Clustered Data
by Hasika K. Wickrama Senevirathne and Sandipan Dutta
Mathematics 2024, 12(11), 1623; https://doi.org/10.3390/math12111623 - 22 May 2024
Viewed by 1340
Abstract
Clustered data are a special type of correlated data where units within a cluster are correlated while units between different clusters are independent. The number of units in a cluster can be associated with that cluster’s outcome. This is called the informative cluster size (ICS), which is known to impact clustered data inference. However, when comparing the outcomes from multiple groups of units in clustered data, investigating ICS may not be enough. This is because the number of units belonging to a particular group in a cluster can be associated with the outcome from that group in that cluster, leading to an informative intra-cluster group size or IICGS. This phenomenon of IICGS can exist even in the absence of ICS. Ignoring the existence of IICGS can result in a biased inference for group-based outcome comparisons in clustered data. In this article, we mathematically formulate the concept of IICGS while distinguishing it from ICS and propose a nonparametric bootstrap-based statistical hypothesis-testing mechanism for testing any claim of IICGS in a clustered data setting. Through simulations and real data applications, we demonstrate that our proposed statistical testing method can accurately identify IICGS, with substantial power, in clustered data.
(This article belongs to the Special Issue Statistics and Data Science)
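The general flavor of such a resampling test, checking whether group sizes carry information about group outcomes by re-pairing them at random under the null, can be sketched as follows. This is a generic permutation-style illustration, not the authors' IICGS procedure, and the correlation statistic and function names are invented for the example:

```python
import math
import random

def pearson(x, y):
    # Sample Pearson correlation (assumes both inputs vary).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def informativeness_pvalue(group_sizes, group_means, n_resamples=2000, seed=0):
    # H0: group size carries no information about that group's outcome.
    # Randomly re-pairing sizes with outcomes destroys any size-outcome
    # association, giving a null reference distribution for the statistic.
    rng = random.Random(seed)
    obs = abs(pearson(group_sizes, group_means))
    sizes = list(group_sizes)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(sizes)
        if abs(pearson(sizes, group_means)) >= obs:
            extreme += 1
    return (1 + extreme) / (1 + n_resamples)  # add-one Monte Carlo correction
```

A small p-value indicates that the observed size-outcome association would be unlikely if sizes were uninformative, which is the kind of claim the IICGS test formalizes.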
