1. Introduction
In the current dynamic research landscape, marked by an exponential increase in accumulated data and a rise in the complexity of research inquiries, efficient statistical analysis plays a pivotal role. Choosing the right test is crucial in statistical data analysis. The selection of an appropriate data analysis method depends on the research goal and the distribution of the data, which determines whether a parametric test can be applied [1].
Parametric tests are more precise than their nonparametric counterparts and provide results that better assess the true significance of observed differences [2]. Every researcher should strive to ensure that the statistical analysis is conducted rigorously and interpreted appropriately. Our concern should not only be to achieve statistically significant results but also to ensure that our interpretation reflects reality.
In the analysis of multiple groups, the analysis of variance (ANOVA) test is a common choice. This test, however, belongs to the parametric methods and thus requires certain assumptions. The two most critical are that the data compared with ANOVA must have (i) a normal distribution and (ii) equal variances [3]. Relying solely on the outcome of ANOVA per se is often insufficient. While this test can determine whether differences exist between groups, it does not identify between which groups these differences occur. To obtain more detailed insights, post hoc tests are routinely employed, which are essential in so-called multiple comparisons analyses [4]. Currently, there is a wide range of post hoc tests, each suitable for specific studies. More liberal tests are often not recommended because they correct the error level inadequately, thereby increasing the risk of committing statistical errors [5].
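As a concrete illustration of the two assumptions above, the sketch below (not taken from the original study; the group means and sizes are arbitrary) checks normality with the Shapiro–Wilk test and variance homogeneity with Levene's test before running one-way ANOVA with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three hypothetical groups; normal data with shifted means, for illustration only.
groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.3, 0.8)]

# Assumption (i): normality within each group (Shapiro-Wilk test).
for i, g in enumerate(groups, 1):
    stat, p = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p:.3f}")

# Assumption (ii): homogeneity of variances (Levene's test).
stat, p_levene = stats.levene(*groups)
print(f"Levene p = {p_levene:.3f}")

# If both assumptions hold, one-way ANOVA is appropriate.
f_stat, p_anova = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
```

If ANOVA rejects the global null hypothesis, a post hoc procedure would then be applied to locate the pairwise differences.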
Researchers are confronted with the challenge of appropriately managing the risk of statistical errors, a threat that is particularly acute in the realm of multiple comparisons: when numerous groups or conditions are analyzed concurrently, the risk of type I errors escalates unduly [6]. The issue of multiple comparisons has therefore become a focal point of intensive research, leading to the development of diverse methods for effectively controlling the risk of statistical errors. Traditional approaches, such as the Bonferroni correction or false discovery rate (FDR) procedures, have long served as pivotal tools in this domain [7]. However, with advancements in the field of statistics, there is a noticeable surge of interest in more flexible and innovative techniques.
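To make the two classical corrections concrete, the sketch below implements the Bonferroni adjustment and the Benjamini–Hochberg FDR step-up procedure on a set of illustrative p values (the values themselves are arbitrary, chosen only to show the mechanics):

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni: multiply each p value by the number of tests (capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of the sorted p values
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(k) * m / k
    # Enforce monotonicity from the largest p value downward.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

pvals = [0.001, 0.008, 0.039, 0.041, 0.27]
print("Bonferroni:", bonferroni(pvals))
print("BH (FDR):  ", benjamini_hochberg(pvals))
```

As expected, the Bonferroni adjustment is the more conservative of the two, while the FDR procedure retains more power across many tests.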
One advanced method for the statistical analysis of multiple comparisons that has gained recognition is bootstrapping. Proposed by Bradley Efron, bootstrapping is a resampling technique that involves repeated sampling with replacement [8]. It enables the estimation of the sampling distribution without assumptions about the population distribution. The method is flexible and can be tailored to the specifics of a dataset, making it valuable when traditional methods are inadequate.
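The core idea can be sketched in a few lines. The snippet below (a minimal illustration; the skewed exponential sample is an arbitrary stand-in for data that violate normality) resamples with replacement to build a percentile confidence interval for the mean without any distributional assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
# A skewed sample, for which normal-theory intervals may be unreliable.
sample = rng.exponential(scale=2.0, size=40)

B = 10_000                                   # number of bootstrap resamples
boot_means = np.empty(B)
for b in range(B):
    # Resample with replacement, same size as the original sample.
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[b] = resample.mean()

# Percentile 95% confidence interval for the mean, no normality assumption.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The same resampling scheme underlies bootstrap versions of test statistics such as the ANOVA F discussed later in the paper.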
The objective of this article was to explore the versatile role of the bootstrap method in multiple comparisons analyses. We chose the bootstrap method because it helps to better handle the issue of multiple comparisons. The aim was to explain how bootstrapping manages various data distributions and to compare its efficacy with conventional statistical methods such as ANOVA. In our study, we sought to demonstrate that bootstrap is an excellent tool for handling distributions that deviate significantly from the normal distribution required for the application of parametric methods. Additionally, we aimed to show how bootstrap effectively manages data characterized by small sample sizes. By juxtaposing the outcomes of bootstrapping with those of the traditional approach, we sought to underscore the added value that bootstrapping brings to statistical analyses.
Utilizing techniques such as ANOVA and post hoc tests, we analyzed differences between groups, taking into account factors such as data distribution and variance homogeneity. Through this comparative analysis, we sought to illustrate how the bootstrap method can provide insights that may not be readily apparent with traditional methods alone. Acknowledging that the bootstrap method may slightly influence the significance of results, we intended to highlight its potential to enhance the interpretative depth of statistical analyses. By demonstrating the subtle interplay between the bootstrap method and traditional methods, our aim is to promote a more integrated approach to statistical analysis, thereby supporting more robust research practices.
4. Discussion
Our article aimed to illustrate the application of the bootstrap method in the context of atypical distributions. In the case of multiple comparisons, bootstrap proved to be a valuable tool, helping to demonstrate the greater reliability of the outcomes of statistical analysis. The simulation results demonstrated that bootstrap is useful for confirming the credibility of the obtained results, irrespective of the type of distribution of a variable and its departure from normality.
Table 11 presents a summary of the performance of bootstrap and its effectiveness across various distributions, as well as its application in multiple group analyses.
Bootstrap is a method commonly employed to enhance and validate existing analyses. In the study by Jayalath et al., this method was utilized for tests examining the homogeneity of variances in two groups [12]. Their article discusses an approach to improving tests for variance homogeneity across samples of equal and unequal sizes. The authors suggest employing a bootstrap test based on the ratio of mean absolute deviations to enhance assessment accuracy. This proposed bootstrap test is particularly effective when the underlying distributions are symmetric or slightly skewed. The study by Zhang assessed the utility of bootstrap in multiple comparisons following one-way ANOVA [13]. They conducted a comprehensive study of one-way ANOVA under heteroscedastic variances and varying sample sizes, employing a bootstrap approach without data transformation. Simulations indicated convergence of the type I error rate of the multiple comparison procedures to the nominal level of significance. Hill et al. used ANOVA enriched with bootstrap because of the normality requirements on the analyzed data [14]. They concluded that the bootstrap method requires more time and computational resources than traditional ANOVA, yet it does not rely on assumptions about the data distribution.
In our study, we investigated how bootstrap-boosted analyses facilitate multiple comparisons in different scenarios with data of various distributions. We employed ANOVA and post hoc tests to examine differences between individual groups. Original data were used to simulate different variants of distributions either departing or not departing from normality. In addition, we simulated data with deliberately chosen unequal variances in the compared BMI groups. Thus, we prepared data with distributions meeting the assumptions for parametric analysis of variance, as well as data violating at least one of these assumptions (normal distribution, variance homogeneity). We paid special attention to ensuring that all simulated data systems within a comparison category (i.e., leptokurtic vs. platykurtic vs. mesokurtic and homogeneous vs. heterogeneous variances) were characterized by non-different values of the F test statistic, regardless of whether they met the assumptions of parametric ANOVA. Implementing this idea, we calculated the F statistic for distributions that meet the assumptions of parametric ANOVA (producing reliable outcomes) and for those that do not (unreliable outcomes that distort true relationships). Taking advantage of the fact that bootstrap does not require the data to meet the assumptions of parametric tests, we then compared the results obtained using classical analysis of variance with those of the analysis supported by the resampling procedure (bootstrap-boosted ANOVA). In this study, we evaluated the effectiveness of the bootstrap method in statistical data analysis, with a primary focus on cases where the observed data exhibited atypical distributions or did not meet the assumptions required by parametric methods.
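One way to implement such a bootstrap-boosted ANOVA is the residual resampling variant sketched below (a common scheme, not necessarily the exact procedure used in the study): each group is centered at its own mean to impose the null hypothesis of equal means, the centered data are resampled with replacement, and the bootstrap distribution of the F statistic is compared with the observed F. The group parameters here are illustrative only.

```python
import numpy as np
from scipy import stats

def bootstrap_anova_p(groups, n_boot=10_000, seed=0):
    """Bootstrap p value for one-way ANOVA: center each group at its mean
    (imposing the null hypothesis of equal means), resample with replacement,
    and count how often the bootstrap F exceeds the observed F."""
    rng = np.random.default_rng(seed)
    f_obs = stats.f_oneway(*groups).statistic
    centered = [g - g.mean() for g in groups]   # impose the null hypothesis
    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choice(c, size=c.size, replace=True) for c in centered]
        if stats.f_oneway(*resampled).statistic >= f_obs:
            exceed += 1
    return f_obs, exceed / n_boot

# Illustrative data: two equal groups and one clearly shifted group.
rng = np.random.default_rng(1)
groups = [rng.normal(m, 1.0, 25) for m in (0.0, 0.0, 1.5)]
f_obs, p_boot = bootstrap_anova_p(groups, n_boot=2000, seed=1)  # 2000 for brevity
print(f"F = {f_obs:.2f}, bootstrap p = {p_boot:.4f}")
```

Because the reference distribution is built from the data themselves, the procedure does not rely on normality or variance homogeneity.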
Our goal was to demonstrate that the bootstrap method can serve as a more flexible and adaptable approach to data analysis under diverse conditions compared to traditional parametric approaches. In selecting the data, we adhered to the principle of representativeness and endeavored to incorporate the diversity of observations to ensure the utmost reliability and universality of our results. We prioritized the objectivity of our findings, striving to minimize the impact of subjective interpretations and biases on the data analysis process.
In the first part, we compared two variables with differing homogeneity of variances. We observed that in the case of heterogeneous variances, bootstrap slightly inflates the post hoc probabilities (p values). For the variable with homogeneous variances, the outcomes of the bootstrap-boosted analyses were closer to those obtained with the classical approach, indicating that bootstrap is an effective tool for assessing the reliability of results in data with both homogeneous and heterogeneous variances.
In the second part, we examined the effectiveness of the bootstrap procedure in analyses of data with distributions characterized by different kurtosis and skewness. We found that in tests enriched with bootstrap, rejecting the null hypothesis is more challenging. This indicates that bootstrap provides more robust results and reduces the risk of type I errors. At the same time, the post hoc probabilities were not inflated to an extent that would lead to type II errors compared with classical methods.
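A bootstrap post hoc analysis of this kind can be sketched as pairwise comparisons of group means, each tested against a null distribution built from mean-centered resamples, with a Bonferroni adjustment at the end. This is a generic illustration under assumed data, not the study's exact procedure:

```python
import numpy as np
from itertools import combinations

def bootstrap_pairwise(groups, n_boot=10_000, seed=0):
    """Bootstrap post hoc: for each pair of groups, resample the mean-centered
    data (imposing equal means) and estimate a two-sided p value for the
    observed difference in means; Bonferroni-adjust over all pairs."""
    rng = np.random.default_rng(seed)
    pairs = list(combinations(range(len(groups)), 2))
    raw = []
    for i, j in pairs:
        a, b = groups[i], groups[j]
        d_obs = abs(a.mean() - b.mean())
        a0, b0 = a - a.mean(), b - b.mean()     # impose the null hypothesis
        count = 0
        for _ in range(n_boot):
            d = abs(rng.choice(a0, a.size, replace=True).mean()
                    - rng.choice(b0, b.size, replace=True).mean())
            if d >= d_obs:
                count += 1
        raw.append(count / n_boot)
    adjusted = [min(p * len(pairs), 1.0) for p in raw]
    return pairs, raw, adjusted

# Illustrative data: groups 0 and 1 are close; group 2 is clearly shifted.
rng = np.random.default_rng(3)
groups = [rng.normal(m, 1.0, 20) for m in (0.0, 0.1, 1.2)]
pairs, raw, adjusted = bootstrap_pairwise(groups, n_boot=2000, seed=3)
for (i, j), p in zip(pairs, adjusted):
    print(f"group {i} vs group {j}: adjusted p = {p:.4f}")
```

A stricter adjustment (e.g., Bonferroni in place of a more liberal correction) changes only the final multiplication step, which is consistent with the observation that the conservatism of the post hoc test does not interfere with the resampling itself.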
In other studies, bootstrap has also proven to be a valuable tool for distributions deviating significantly from the Gaussian distribution. In the simulation study by Perez-Melo et al., the bootstrap method was useful for calculating confidence intervals in distributions with substantial skewness [15]. Likewise, Chan et al. demonstrated that the bootstrap method is recommended for correlation tests in non-Gaussian distributions [16].
In the final section of our paper, we tested bootstrap with post hoc tests of varying conservatism. It turned out that the strictness of the test does not affect the performance of bootstrap. The results obtained with this method had slightly inflated p values in all cases compared to the traditional approach. However, the inflated p values did not lead to a loss of statistical significance.
Researchers willingly use bootstrap in increasingly diverse statistical analyses and in situations where classical methods may yield uncertain results. For instance, Xu et al. demonstrated that bootstrap can be useful in two-way ANOVA, even with small sample sizes [17]. Romano et al. utilized bootstrap in conjunction with the Bonferroni test for multiple testing [18]. The use of bootstrap in the context of multiple comparisons was also discussed by Westfall, who concluded that it is not a universal improvement over the classical approach [19].
There is no universal recommendation for the number of repetitions in bootstrap analyses. In our study, we selected 10,000 iterations to ensure the stability and precision of our results. Efron’s early work suggested that even a small number of iterations, such as 25 or 50, could suffice for estimating the standard error, while a larger number is needed for confidence intervals [20]. Efron based his recommendations on the unconditional coefficient of variation. Booth and Sarkar proposed a higher number of iterations, considering the conditional coefficient of variation, which accounts only for the variability arising from resampling [21]. Contemporary studies, such as those by Hesterberg, recommend using 10,000 iterations for more precise estimates [22]. The computational power currently available allows for significantly more iterations, further enhancing the accuracy and reliability of the results.
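The effect of the iteration count can be illustrated directly: the bootstrap estimate of the standard error of the mean fluctuates at small B and stabilizes as B grows (the sample below is arbitrary illustrative data, not from the study):

```python
import numpy as np

rng = np.random.default_rng(7)
sample = rng.normal(10.0, 3.0, size=50)     # illustrative sample

def boot_se(data, n_boot, rng):
    """Bootstrap estimate of the standard error of the mean."""
    means = [rng.choice(data, data.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.std(means, ddof=1)

# The standard-error estimate stabilizes as the number of resamples grows.
for B in (25, 200, 2000, 10_000):
    print(f"B = {B:>6}: bootstrap SE = {boot_se(sample, B, rng):.4f}")
```

At B = 10,000 the estimate is essentially indistinguishable from the analytic standard error s/√n, whereas at B = 25 the run-to-run Monte Carlo variability is still visible.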
Parametric methods are often the primary choice in statistical data analysis. In reality, however, studies rarely involve data in which the variables follow a Gaussian distribution. The application of non-parametric methods carries a higher risk of error, since they have lower statistical power [23]. One should also not underestimate the fact that, while parametric methods provide a huge variety of analysis models, only a few of the most basic ones have equivalents among non-parametric tests. The bootstrap has therefore proven to be a viable alternative to classical methods of analysis. In situations where classical methods fail or yield uncertain results, bootstrap can be a valuable tool for reinforcing the credibility of analyses and their outcomes. Analyses utilizing the bootstrap method demonstrated an inclination to elevate p values, indicating that employing this method may encourage a more prudent approach to null hypothesis rejection. Although bootstrap analysis tends to yield higher p values, these differences are not large enough to obscure genuine, substantial deviations. They represent subtle adjustments that still allow attention to be focused on real statistical differences where they exist.
5. Conclusions
Despite the abundance of available statistical tools, the problem of multiple comparisons is still present in data analysis. Parametric methods, known for their great power, are not applicable when data distributions deviate from normality. While parametric tests are popular, bootstrap appears to be a good alternative in data analysis. In our study, simulations were conducted in various scenarios involving data with extreme distributions and differing homogeneity of variance. In all cases, bootstrap effectively validated the accuracy of the results. Bootstrap-boosted analyses showed that the rejection of the null hypothesis became less hasty, which enhances the credibility of the results.
The results demonstrated that bootstrap is an especially useful tool for analyzing data with small sample sizes. The p values obtained from bootstrap-enhanced analyses help to prevent the premature rejection of the null hypothesis, thereby reducing the risk of type I errors. In the traditional approach, there is a higher chance of obtaining a false-positive result, especially when the group sample sizes are very small.
On the other hand, the bootstrap method does have its limitations. One potential drawback is the duration of the analyses, which can extend to several hours depending on the complexity of the analyses, the sample size, and the computer’s processing power. In our study, the most time-consuming aspect was the resampling of groups with different distributions, which took several hours per distribution. When employing this data analysis method, it is important to consider that less powerful computers may significantly increase the analysis time or even be unable to complete the analyses.
In conclusion, the analyses presented in our study demonstrate the effectiveness of bootstrap in verifying the robustness of research results. When analyzing data with distributions departing significantly from the Gaussian model, an alternative method such as bootstrap should be considered, so that results that would otherwise be uncertain and ambiguous under classical methods become more reliable.