Open Access Article
Empirical Inferences Under Bayesian Framework to Identify Cellwise Outliers
by Luca Sartore 1,2,*, Lu Chen 1,2 and Valbona Bejleri 2
1 National Institute of Statistical Sciences, P.O. Box 33762, Washington, DC 20033, USA
2 United States Department of Agriculture, National Agricultural Statistics Service, Washington, DC 20250, USA
* Author to whom correspondence should be addressed.
Stats 2024, 7(4), 1244-1258; https://doi.org/10.3390/stats7040073 (registering DOI)
Submission received: 14 August 2024 / Revised: 16 October 2024 / Accepted: 18 October 2024 / Published: 19 October 2024
Abstract
Outliers are typically identified using frequentist methods. Data values are classified as “outliers” or “not outliers” based on a test statistic that measures how far a value lies from the bulk of the data. The threshold beyond which a value is declared an outlier is typically chosen by the user. However, a subjective choice of threshold increases the uncertainty associated with the outlier status of each data value. A cellwise outlier detection algorithm named FuzzyHRT is used to automate the editing process in repeated surveys. This algorithm uses the Bienaymé–Chebyshev inequality and fuzzy logic to detect four types of outliers, resulting from format inconsistencies and from historical, tail, and relational anomalies. However, fuzzy logic is not suited to probabilistic reasoning about the identification of anomalous cells, whereas Bayesian methods are well suited to quantifying the uncertainty associated with the identification of outliers. Although the literature offers well-developed Bayesian methods for record-level outlier detection, Bayesian methods for identifying outliers within individual records (i.e., at the cell level) remain unexplored. This paper presents two approaches from the Bayesian perspective to study the uncertainty associated with identifying outliers. First, a Bayesian bootstrap approach is explored to study the uncertainty associated with the output scores of the FuzzyHRT algorithm. Second, empirical likelihoods in a Bayesian setting are considered for probabilistic reasoning about the identification of anomalous cells. NASS survey data on livestock and on yields of major crops (such as corn) are used to compare the performance of the two proposed approaches with recent cellwise outlier methods.
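The Bayesian bootstrap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the FuzzyHRT algorithm, whose scoring rules are not given here. As a stand-in, the sketch assumes a simple score derived from the Bienaymé–Chebyshev bound P(|X - mu| >= k*sigma) <= 1/k^2, and every function name and the example data are hypothetical. Each bootstrap draw reweights the observations with Dirichlet(1, ..., 1) weights, recomputes the weighted mean and variance, and rescores the cell of interest, so the spread of the resulting scores reflects the uncertainty about that cell's outlier status.

```python
import random


def dirichlet_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) vector by normalizing Exp(1) variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def chebyshev_score(x, mean, var):
    """Illustrative score: 1 - min(1, 1/k^2), with k = |x - mean| / sd.

    Values within one standard deviation score 0; the score approaches 1
    as x moves farther from the weighted mean. (Assumed stand-in score,
    not the FuzzyHRT score.)
    """
    if var <= 0.0:
        return 0.0
    k_squared = (x - mean) ** 2 / var
    if k_squared <= 1.0:
        return 0.0
    return 1.0 - 1.0 / k_squared


def bayesian_bootstrap_scores(data, target, n_draws=2000, seed=0):
    """Return one score for `target` per Bayesian bootstrap draw."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        w = dirichlet_weights(len(data), rng)
        mean = sum(wi * xi for wi, xi in zip(w, data))
        var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, data))
        scores.append(chebyshev_score(target, mean, var))
    return scores


def summarize(scores, threshold=0.5):
    """Return the mean score and the share of draws exceeding `threshold`."""
    mean_score = sum(scores) / len(scores)
    prob_above = sum(s > threshold for s in scores) / len(scores)
    return mean_score, prob_above


# Hypothetical data: seven typical values and one suspicious cell.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
suspicious = summarize(bayesian_bootstrap_scores(values, 25.0))
typical = summarize(bayesian_bootstrap_scores(values, 10.0))
```

Here `suspicious` and `typical` each hold a (mean score, share of draws above the threshold) pair. Reporting a distribution of scores rather than a single yes/no flag is the point of the exercise: the user-chosen threshold still exists, but its effect on each cell's status becomes visible as a proportion of draws.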
Cite (MDPI and ACS Style)
Sartore, L.; Chen, L.; Bejleri, V. Empirical Inferences Under Bayesian Framework to Identify Cellwise Outliers. Stats 2024, 7, 1244-1258. https://doi.org/10.3390/stats7040073