This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Empirical Inferences Under Bayesian Framework to Identify Cellwise Outliers

1 National Institute of Statistical Sciences, P.O. Box 33762, Washington, DC 20033, USA
2 United States Department of Agriculture, National Agricultural Statistics Service, Washington, DC 20250, USA
* Author to whom correspondence should be addressed.
Stats 2024, 7(4), 1244-1258; https://doi.org/10.3390/stats7040073 (registering DOI)
Submission received: 14 August 2024 / Revised: 16 October 2024 / Accepted: 18 October 2024 / Published: 19 October 2024
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)

Abstract

Outliers are typically identified using frequentist methods. Data values are classified as “outliers” or “not outliers” based on a test statistic that measures the magnitude of the difference between a value and the bulk of the data. The threshold for a data value to be an outlier is typically defined by the user. A subjective choice of threshold, however, increases the uncertainty associated with the outlier status of each data value. A cellwise outlier detection algorithm named FuzzyHRT is used to automate the editing process in repeated surveys. This algorithm uses Bienaymé–Chebyshev’s inequality and fuzzy logic to detect four types of outliers: format inconsistencies and historical, tail, and relational anomalies. Fuzzy logic, however, is not suited to probabilistic reasoning about the identification of anomalous cells, whereas Bayesian methods are well suited to quantifying the uncertainty associated with the identification of outliers. Although the literature offers well-developed Bayesian methods for record-level outlier detection, Bayesian methods for identifying outliers within individual records (i.e., at the cell level) remain unexplored. This paper presents two approaches from the Bayesian perspective to study the uncertainty associated with identifying outliers. A Bayesian bootstrap approach is explored to study the uncertainty associated with the output scores from the FuzzyHRT algorithm. Empirical likelihoods in a Bayesian setting are also considered for probabilistic reasoning about the identification of anomalous cells. NASS survey data on livestock and major crop yields (such as corn) are used to compare the performance of the two proposed approaches with that of recent cellwise outlier methods.
Keywords: anomaly identification; Bayesian bootstrap; empirical likelihood; fuzzy logic; predictive distribution; uncertainty
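
As a rough illustration of how a Bayesian bootstrap can attach uncertainty to an outlier score, the sketch below resamples Dirichlet(1, ..., 1) weights over a single column of values and recomputes a Bienaymé–Chebyshev tail bound for every cell under each draw. This is not the FuzzyHRT algorithm and not the authors' implementation: the function names, the 0.25 cutoff, and the yield figures are hypothetical and chosen only for demonstration.

```python
import numpy as np


def chebyshev_bound(x, mean, var):
    """Chebyshev upper bound on P(|X - mean| >= |x - mean|), capped at 1."""
    dev2 = (x - mean) ** 2
    # var / dev2 is undefined when the deviation is zero; use the trivial bound 1 there.
    with np.errstate(divide="ignore", invalid="ignore"):
        bound = np.where(dev2 > 0, var / dev2, 1.0)
    return np.minimum(bound, 1.0)


def bayesian_bootstrap_bounds(values, n_draws=2000, seed=0):
    """Return an (n_draws, n) array of Chebyshev bounds, one row per posterior draw."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Bayesian bootstrap: each draw is a Dirichlet(1, ..., 1) weight vector over the cells.
    weights = rng.dirichlet(np.ones(values.size), size=n_draws)
    means = weights @ values
    # Weighted variance; clip tiny negative values caused by floating-point error.
    variances = np.maximum(weights @ values**2 - means**2, 0.0)
    return chebyshev_bound(values[None, :], means[:, None], variances[:, None])


# Hypothetical yields (made-up numbers); the last cell is deliberately anomalous.
yields = np.array([171.0, 168.5, 174.2, 169.9, 172.3, 170.8, 173.1, 310.0])
bounds = bayesian_bootstrap_bounds(yields)

# Share of posterior draws in which a cell lies more than two weighted
# standard deviations from the weighted mean (bound below 1/2^2 = 0.25).
prob_flag = (bounds < 0.25).mean(axis=0)
print(np.round(prob_flag, 3))
```

Each entry of `prob_flag` can be read as an approximate posterior probability that the cell exceeds the chosen cutoff, so a fixed yes/no flag is replaced by a graded measure of how strongly the data support it.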

Share and Cite

MDPI and ACS Style

Sartore, L.; Chen, L.; Bejleri, V. Empirical Inferences Under Bayesian Framework to Identify Cellwise Outliers. Stats 2024, 7, 1244-1258. https://doi.org/10.3390/stats7040073
