Statistics Reform: Practitioner’s Perspective
Abstract
1. Introduction
2. Examples of Good Statistical Methods That Should Withstand Statistics Reform
- Method of least squares;
- Method of maximum likelihood;
- Central Limit Theorem;
- Akaike information criterion (AIC).
3. Why Should the Two-Sample t-Test Be Abandoned?
3.1. Logic Issue: The Two-Sample t-Test Is Philosophically Flawed and Misleading
3.2. Performance Issues: Uncertainty, Inconsistency, and Dependence on Sample Size
4. Alternative to the Two-Sample t-Test: Advanced Estimation Statistics
4.1. Observed Effect Size (ES) and Relative Effect Size (RES)
4.2. Standard Uncertainty (SU), Relative Standard Uncertainty (RSU), Signal-to-Noise Ratio (SNR), and Signal Content Index (SCI)
4.3. Exceedance Probability (EP) and Net Superiority Probability (NSP)
4.4. A Flowchart of the Advanced Estimation Statistics Framework for Comparing Two Groups
4.5. Example: Comparison of Old and New Flavorings for a Beverage
5. Why Should the t-Interval Method for Calculating Measurement Uncertainty Be Abandoned?
5.1. Rationale Issue: “Coverage” Is a Misleading Concept
5.2. Methodological Issue: The t-Interval or t-Based Uncertainty Is a Distorted Mirror of Physical Reality
5.3. Issues with the t-Distribution
6. Alternative to the t-Interval Method for Calculating Measurement Uncertainty: Unbiased Estimation Method
6.1. Unbiased Estimation Method
“What we can extract—at best—from a random sample is an unbiased point estimate (signal) of an unknown population effect size and an unbiased estimation of the uncertainty (noise), caused by random error, of that point estimation, i.e., the standard error, which is but another label for the standard deviation of the sampling distribution”.
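The quotation's point that the standard error is the standard deviation of the sampling distribution can be checked with a small simulation. The sketch below is illustrative only: the normal population, the values of `n` and `sigma`, and the helper name `simulate_standard_error` are assumptions and do not come from the article.

```python
import random
from math import sqrt
from statistics import stdev


def simulate_standard_error(n, sigma, replications, seed=0):
    """Empirical standard deviation of the mean of n normal draws.

    Hypothetical helper: draws `replications` samples of size n from
    N(0, sigma^2) and returns the spread of the resulting sample means.
    """
    rng = random.Random(seed)
    sample_means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        for _ in range(replications)
    ]
    return stdev(sample_means)


n, sigma = 10, 2.0  # assumed sample size and population standard deviation
empirical = simulate_standard_error(n, sigma, replications=20_000)
theoretical = sigma / sqrt(n)  # textbook standard error of the mean
print(f"empirical SD of sample means: {empirical:.3f}")
print(f"sigma / sqrt(n):              {theoretical:.3f}")
```

With enough replications the two printed numbers should agree closely, which is the sense in which the standard error describes the noise in a point estimate.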
6.2. Example: A Comparison of the WS-z and WS-t Approaches
7. Conclusions and Recommendations
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
New Flavoring (Group A1) | New Flavoring (Group A2) | Old Flavoring (Group B) |
---|---|---|
13 | 20 | 12 |
17 | 32 | 8 |
19 | 2 | 6 |
10 | 25 | 16 |
20 | 5 | 12 |
15 | 18 | 14 |
18 | 21 | 10 |
9 | 7 | 18 |
12 | 28 | 4 |
15 | 40 | 11 |
16 | | |
 | New Flavoring (Group A1) | New Flavoring (Group A2) | Old Flavoring (Group B) |
---|---|---|---|
Sample mean | 14.91 | 19.80 | 11.10 |
Sample standard deviation | 3.59 | 12.27 | 4.33 |
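As a quick check, the summary statistics can be recomputed from the raw scores in the data table. The sketch below uses only the Python standard library; the variable names and the helper `summarize` are illustrative, and the standard deviation is assumed to use the usual n - 1 divisor.

```python
from statistics import mean, stdev

# Raw scores transcribed from the data table (Group A1 has 11 values,
# Groups A2 and B have 10 each).
group_a1 = [13, 17, 19, 10, 20, 15, 18, 9, 12, 15, 16]
group_a2 = [20, 32, 2, 25, 5, 18, 21, 7, 28, 40]
group_b = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]


def summarize(values):
    """Return (sample mean, sample standard deviation with n - 1 divisor)."""
    return mean(values), stdev(values)


for name, values in [("A1", group_a1), ("A2", group_a2), ("B", group_b)]:
    m, s = summarize(values)
    print(f"Group {name}: n = {len(values)}, mean = {m:.2f}, sd = {s:.2f}")
```

Rounded to two decimals, the results agree with the tabulated means and standard deviations.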
Statistic | Comparison Between Group A1 and Group B | Comparison Between Group A2 and Group B |
---|---|---|
Observed effect size (ES): Equation (1) | ||
Relative effect size (RES): Equation (2) | 28.52% | 72.12% |
Standard uncertainty (SU): Equation (4) | 1.75 | 4.12 |
Relative standard uncertainty (RSU): Equation (5) | 45.84% | 47.31% |
Signal-to-noise ratio (SNR): Equation (6) | 4.76 | 4.47 |
Signal content index (SCI): Equation (7) | 0.83 | 0.82 |
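The tabulated SU, RSU, SNR, and SCI values can be reproduced from the raw data under some assumed formulas. The equations themselves are not shown in this section, so the definitions in the sketch below are inferences chosen because they match the table to two decimals: ES as the difference in sample means, SU as the square root of the summed variance-over-n terms, RSU = SU/ES, SNR = (ES/SU)^2, and SCI = SNR/(1 + SNR). They should not be read as the article's Equations (1) and (4) to (7). RES is left out because its definition cannot be confirmed from this section.

```python
from math import sqrt
from statistics import mean, variance

group_a1 = [13, 17, 19, 10, 20, 15, 18, 9, 12, 15, 16]
group_a2 = [20, 32, 2, 25, 5, 18, 21, 7, 28, 40]
group_b = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]


def compare(new, old):
    """Two-group statistics under ASSUMED definitions (see the lead-in)."""
    es = mean(new) - mean(old)                      # observed effect size
    su = sqrt(variance(new) / len(new)
              + variance(old) / len(old))           # standard uncertainty
    rsu = su / abs(es)                              # relative standard uncertainty
    snr = (es / su) ** 2                            # signal-to-noise ratio
    sci = snr / (1 + snr)                           # signal content index
    return {"ES": es, "SU": su, "RSU": rsu, "SNR": snr, "SCI": sci}


a1_vs_b = compare(group_a1, group_b)
a2_vs_b = compare(group_a2, group_b)
for label, result in [("A1 vs B", a1_vs_b), ("A2 vs B", a2_vs_b)]:
    print(label, {key: round(value, 4) for key, value in result.items()})
```

Under these assumptions the A1 versus B comparison gives SU of about 1.75 and SCI of about 0.83, and the A2 versus B comparison gives SU of about 4.12 and SCI of about 0.82, in line with the table.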
 | Comparison Between Groups A1 and B | Comparison Between Groups A2 and B |
---|---|---|
Estimated distribution of Y: Equation (9) | ||
: Equation (10) | ||
: Equation (12) | ||
Net superiority probability (NSP): Equation (13) |
 | Comparison Between Group A1 and Group B | Comparison Between Group A2 and Group B |
---|---|---|
: Equation (14) | ||
: Equation (15) | ||
Net superiority probability (NSP): Equation (16) |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, H. Statistics Reform: Practitioner’s Perspective. AppliedMath 2025, 5, 49. https://doi.org/10.3390/appliedmath5020049