Stats, Volume 7, Issue 3 (September 2024) – 7 articles

12 pages, 380 KiB  
Case Report
Neurodevelopmental Impairments Prediction in Premature Infants Based on Clinical Data and Machine Learning Techniques
by Arantxa Ortega-Leon, Arnaud Gucciardi, Antonio Segado-Arenas, Isabel Benavente-Fernández, Daniel Urda and Ignacio J. Turias
Stats 2024, 7(3), 685-696; https://doi.org/10.3390/stats7030041 (registering DOI) - 12 Jul 2024
Viewed by 116
Abstract
Preterm infants are prone to NeuroDevelopmental Impairment (NDI). Some previous works have identified clinical variables that can be potential predictors of NDI. However, machine learning (ML)-based models still present low predictive capabilities when addressing this problem. This work attempts to evaluate the application of ML techniques to predict NDI using clinical data from a cohort of very preterm infants recruited at birth and assessed at 2 years of age. Six different classification models were assessed, using all features, clinician-selected features, and mutual information feature selection. The best results were obtained by ML models trained using mutual information-selected features and employing oversampling, for cognitive and motor impairment prediction, while for language impairment prediction the best setting was clinician-selected features. Although the performance indicators in this local cohort are consistent with similar previous works, they are still rather poor. This is a clear indication that, in order to obtain better performance rates, further analysis and methods should be considered, and other types of data should be taken into account together with the clinical variables. Full article

14 pages, 2280 KiB  
Case Report
Estimator Comparison for the Prediction of Election Results
by Miltiadis S. Chalikias, Georgios X. Papageorgiou and Dimitrios P. Zarogiannis
Stats 2024, 7(3), 671-684; https://doi.org/10.3390/stats7030040 (registering DOI) - 1 Jul 2024
Viewed by 220
Abstract
Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using the datasets of the popular vote in the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (SE, MSE) of three cluster sampling estimators: Ratio estimator, Horvitz–Thompson estimator and the linear regression estimator. While both the Ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The main objective of this paper is twofold. Firstly, to indicate which estimator is most suited for predicting the outcome of the popular vote in the United States of America. We do so by applying the single-stage cluster sampling technique to our data. In the first partition, we use the 50 states plus the District of Columbia as primary sampling units, whereas in the second one, we use 3112 counties instead. Secondly, based on the results of the aforementioned procedure, we estimate the number of clusters in a sample for a set standard error while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator is best in the majority of the examined cases. This type of comparison can also be used for the estimation of any other country’s elections if prior voting results are available. Full article
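The two classical estimators named in this abstract can be sketched as follows. This is a minimal illustration of the standard textbook forms under simple random sampling of clusters, not the paper's own code; the function names and the toy numbers are hypothetical, and the linear regression estimator the authors propose is not reproduced here.

```python
from typing import Sequence


def horvitz_thompson_total(cluster_totals: Sequence[float],
                           n_clusters_population: int) -> float:
    """Horvitz-Thompson estimate of a population total.

    Assumes n clusters were drawn by simple random sampling without
    replacement from N clusters, so every cluster has inclusion
    probability n / N and each sampled total gets weight N / n.
    """
    n = len(cluster_totals)
    return n_clusters_population / n * sum(cluster_totals)


def ratio_share(cluster_totals: Sequence[float],
                cluster_sizes: Sequence[float]) -> float:
    """Ratio estimate of a per-element share (e.g. a vote share).

    Divides the sampled total of the outcome by the sampled total of
    the cluster sizes, which accounts for unequal cluster sizes.
    """
    return sum(cluster_totals) / sum(cluster_sizes)


# Hypothetical toy data: votes for one candidate and ballots cast
# in 3 sampled clusters out of 6 clusters in the population.
votes = [10.0, 20.0, 30.0]
ballots = [100.0, 200.0, 300.0]
ht_votes = horvitz_thompson_total(votes, n_clusters_population=6)
share = ratio_share(votes, ballots)
```

The Horvitz-Thompson form needs only the number of clusters in the population, while the ratio form uses the sampled cluster sizes as auxiliary information.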
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)

24 pages, 713 KiB  
Article
Hierarchical Time Series Forecasting of Fire Spots in Brazil: A Comprehensive Approach
by Ana Caroline Pinheiro and Paulo Canas Rodrigues
Stats 2024, 7(3), 647-670; https://doi.org/10.3390/stats7030039 - 27 Jun 2024
Viewed by 349
Abstract
This study compares reconciliation techniques and base forecast methods to forecast a hierarchical time series of the number of fire spots in Brazil between 2011 and 2022. A three-level hierarchical time series was considered, comprising fire spots in Brazil, disaggregated by biome, and further disaggregated by the municipality. The autoregressive integrated moving average (ARIMA), the exponential smoothing (ETS), and the Prophet models were tested for baseline forecasts, and nine reconciliation approaches, including top-down, bottom-up, middle-out, and optimal combination methods, were considered to ensure coherence in the forecasts. Due to the need for transformation to ensure positive forecasts, two data transformations were considered: the logarithm of the number of fire spots plus one and the square root of the number of fire spots plus 0.5. To assess forecast accuracy, the data were split into training data for estimating model parameters and test data for evaluating forecast accuracy. The results show that the ARIMA model with the logarithmic transformation provides overall better forecast accuracy. The BU, MinT(s), and WLS(v) yielded the best results among the reconciliation techniques. Full article
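The two transformations mentioned in this abstract, and the bottom-up (BU) reconciliation step, can be sketched as below. This is a hedged illustration only: the function names, the clipping of back-transformed values at zero, and the toy municipality and biome labels are assumptions made for the example, and no bias adjustment is applied on back-transformation.

```python
import math
from typing import Dict, Tuple


def log_plus_one(x: float) -> float:
    """Logarithm of the count plus one."""
    return math.log(x + 1.0)


def inv_log_plus_one(z: float) -> float:
    """Naive back-transform; clipping at zero is an assumption of this sketch."""
    return max(math.exp(z) - 1.0, 0.0)


def sqrt_plus_half(x: float) -> float:
    """Square root of the count plus 0.5."""
    return math.sqrt(x + 0.5)


def inv_sqrt_plus_half(z: float) -> float:
    """Naive back-transform; clipping at zero is an assumption of this sketch."""
    return max(z * z - 0.5, 0.0)


def bottom_up(bottom: Dict[str, float],
              parent_of: Dict[str, str]) -> Tuple[Dict[str, float], float]:
    """Bottom-up reconciliation for a three-level hierarchy.

    Sums bottom-level forecasts into their parent series and into the
    top-level total, so the forecasts are coherent by construction.
    """
    middle: Dict[str, float] = {}
    for name, value in bottom.items():
        parent = parent_of[name]
        middle[parent] = middle.get(parent, 0.0) + value
    return middle, sum(middle.values())


# Hypothetical toy hierarchy: three municipalities in two biomes.
forecasts = {"mun_a": 1.0, "mun_b": 2.0, "mun_c": 3.0}
biome_of = {"mun_a": "biome_x", "mun_b": "biome_x", "mun_c": "biome_y"}
by_biome, country_total = bottom_up(forecasts, biome_of)
```

In practice the base models would be fitted on the transformed series and the forecasts back-transformed before aggregation; the other reconciliation approaches named in the abstract are not shown.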
(This article belongs to the Special Issue Modern Time Series Analysis II)

20 pages, 1027 KiB  
Article
Impact of Brexit on STOXX Europe 600 Constituents: A Complex Network Analysis
by Anna Maria D’Arcangelis, Arianna Pierdomenico and Giulia Rotundo
Stats 2024, 7(3), 627-646; https://doi.org/10.3390/stats7030038 - 27 Jun 2024
Viewed by 243
Abstract
Political events play a significant role in exerting their influence on financial markets globally. This paper aims to investigate the long term effect of Brexit on European stock markets using Complex Network methods as a starting point. The media has heavily emphasized the connection between this major political event and its economic and financial impact. To analyse this, we created two samples of companies based on the geographical allocation of their revenues to the UK. The first sample consists of companies that are either British or financially linked to the United Kingdom. The second sample serves as a control group and includes other European companies that are conveniently matched in terms of economic sector and firm size to those in the first sample. Each analysis is repeated over three non-overlapping periods: before the 2016 Referendum, between the Referendum and the 2019 General Elections, and after the 2019 General Elections. After an event study aimed at verifying the short-term response of idiosyncratic daily returns to the referendum result, we analysed the topological evolution of the networks through the MST (Minimum Spanning Trees) of the various samples. Finally, after the computation of the centrality measures pertaining to each network, our attention was directed towards the examination of the persistence of the levels of degree and eigenvector centralities over time. Our target was the investigation on whether the events that determined the evolution of the MST had also brought about structural modifications to the centrality of the most connected companies within the network. The findings demonstrate the unexpected impact of the referendum outcome, which is more noticeable on European equities compared to those of the UK, and the lack of influence from the elections that marked the beginning of the hard Brexit phase in 2019. 
The modifications in the MST indicate a restructuring of the network of British companies, particularly evident in the third period with a repositioning of the UK nodes. The dynamics of the MSTs around the referendum date are associated with the persistence in the relative rank of the centrality measures (relative to the median). Conversely, the arrival of hard Brexit does alter the relative ranking of the nodes according to degree centrality. The ranking according to eigenvector centrality remains persistent. However, such movements are not statistically significant. An analysis of this kind points out relevant insights for investors, as it equips them to have a comprehensive view of political events, while also assisting policymakers in their endeavour to uphold stability by closely monitoring the ever-changing influence and interconnectedness of global stock markets during similar political events. Full article
(This article belongs to the Section Financial Statistics)

14 pages, 1744 KiB  
Case Report
Investigating Risk Factors for Racial Disparity in E-Cigarette Use with PATH Study
by Amy Liu, Kennedy Dorsey, Almetra Granger, Ty-Runet Bryant, Tung-Sung Tseng, Michael Celestin, Jr. and Qingzhao Yu
Stats 2024, 7(3), 613-626; https://doi.org/10.3390/stats7030037 - 21 Jun 2024
Viewed by 367
Abstract
Background: Previous research has identified differences in e-cigarette use and socioeconomic factors between different racial groups. However, there is little research examining specific risk factors contributing to the racial differences. Objective: This study sought to identify racial disparities in e-cigarette use and to determine risk factors that help explain these differences. Methods: We used Wave 5 (2018–2019) of the Adult Population Assessment of Tobacco and Health (PATH) Study. First, we conducted descriptive statistics of e-smoking across our risk factor variables. Next, we used multiple logistic regression to check the risk effects by adjusting all covariates. Finally, we conducted a mediation analysis to determine whether identified factors showed evidence of influencing the association between race and e-cigarette use. All analyses were performed in R or SAS. The R package mma was used for the mediation analysis. Results: Between Hispanic and non-Hispanic White populations, our potential risk factors collectively explain 17.5% of the racial difference, former cigarette smoking explains 7.6%, receiving e-cigarette advertising 2.6%, and perception of e-cigarette harm explains 27.8% of the racial difference. Between non-Hispanic Black and non-Hispanic White populations, former cigarette smoking, receiving e-cigarette advertising, and perception of e-cigarette harm explain 5.2%, 1.8%, and 6.8% of the racial difference, respectively. E-cigarette use is most prevalent in the non-Hispanic White population compared to non-Hispanic Black and Hispanic populations, which may be explained by former cigarette smoking, exposure to e-cigarette advertising, and e-cigarette harm perception. Conclusions: These findings suggest that racial differences in e-cigarette use may be reduced by increasing knowledge of the dangers associated with e-cigarette use and reducing exposure to e-cigarette advertisements. 
This comprehensive analysis of risk factors can be used to significantly guide smoking cessation efforts and address potential health burden disparities arising from differences in e-cigarette usage. Full article

21 pages, 200 KiB  
Article
Estimation of Standard Error, Linking Error, and Total Error for Robust and Nonrobust Linking Methods in the Two-Parameter Logistic Model
by Alexander Robitzsch
Stats 2024, 7(3), 592-612; https://doi.org/10.3390/stats7030036 - 21 Jun 2024
Viewed by 339
Abstract
The two-parameter logistic (2PL) item response theory model is a statistical model for analyzing multivariate binary data. In this article, two groups are brought onto a common metric in the 2PL model using linking methods. The linking methods of mean–mean linking, mean–geometric–mean linking, and Haebara linking are investigated in nonrobust and robust specifications in the presence of differential item functioning (DIF). M-estimation theory is applied to derive linking errors for the studied linking methods. However, estimated linking errors are prone to sampling error in estimated item parameters, thus resulting in artificially increased linking error estimates in finite samples. For this reason, a bias-corrected linking error estimate is proposed. The usefulness of the modified linking error estimate is demonstrated in a simulation study. It is shown that a simultaneous assessment of the standard error and linking error in a total error must be conducted to obtain valid statistical inference. In the computation of the total error, using the bias-corrected linking error estimate instead of the usually employed linking error provides more accurate coverage rates. Full article
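For readers unfamiliar with the terms in this abstract, the 2PL response function and a nonrobust mean-mean linking step can be sketched as below. The sketch assumes the common convention in which abilities transform as theta_J = A * theta_I + B, so that discriminations satisfy a_J = a_I / A and difficulties satisfy b_J = A * b_I + B; the paper's own parameterization may differ. The function names and toy item parameters are hypothetical, and DIF, the robust variants, and the linking error estimates are not covered.

```python
import math
from typing import Sequence, Tuple


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response for ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_mean_linking(a_i: Sequence[float], b_i: Sequence[float],
                      a_j: Sequence[float], b_j: Sequence[float]
                      ) -> Tuple[float, float]:
    """Mean-mean linking constants (A, B) from scale I to scale J.

    A is the ratio of mean discriminations and B aligns the mean
    difficulties after rescaling by A.
    """
    mean = lambda xs: sum(xs) / len(xs)
    slope = mean(a_i) / mean(a_j)
    intercept = mean(b_j) - slope * mean(b_i)
    return slope, intercept


# Hypothetical item parameters for two common items, constructed so
# that scale J equals scale I transformed with A = 2 and B = 0.5.
a_i, b_i = [1.0, 2.0], [0.0, 1.0]
a_j, b_j = [0.5, 1.0], [0.5, 2.5]
A, B = mean_mean_linking(a_i, b_i, a_j, b_j)
```

With parameters that differ only by a linear change of scale, as in the toy data, the linking constants are recovered exactly and the response probabilities agree on both scales.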
(This article belongs to the Special Issue Robust Statistics in Action II)
16 pages, 476 KiB  
Article
A Comparison of Limited Information Estimation Methods for the Two-Parameter Normal-Ogive Model with Locally Dependent Items
by Alexander Robitzsch
Stats 2024, 7(3), 576-591; https://doi.org/10.3390/stats7030035 - 21 Jun 2024
Viewed by 260
Abstract
The two-parameter normal-ogive (2PNO) model is one of the most popular item response theory (IRT) models for analyzing dichotomous items. Consistent parameter estimation of the 2PNO model using marginal maximum likelihood estimation relies on the local independence assumption. However, the assumption of local independence might be violated in practice. Likelihood-based estimation of the local dependence structure is often computationally demanding. Moreover, many IRT models that model local dependence do not have a marginal interpretation of item parameters. In this article, limited information estimation methods are reviewed that allow the convenient and straightforward handling of local dependence in estimating the 2PNO model. In detail, pairwise likelihood, weighted least squares, and normal-ogive harmonic analysis robust method (NOHARM) estimation are compared with marginal maximum likelihood estimation that ignores local dependence. A simulation study revealed that item parameters can be consistently estimated with limited information methods. At the same time, marginal maximum likelihood estimation resulted in biased item parameter estimates in the presence of local dependence. From a practical perspective, there were only minor differences regarding the statistical quality of item parameter estimates of the different estimation methods. Differences between the estimation methods are also compared for two empirical datasets. Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
