Article

Information FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Models

by Ethan Pickering * and Themistoklis P. Sapsis *
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
* Authors to whom correspondence should be addressed.
Entropy 2024, 26(10), 835; https://doi.org/10.3390/e26100835
Submission received: 7 July 2024 / Revised: 27 September 2024 / Accepted: 27 September 2024 / Published: 30 September 2024
(This article belongs to the Special Issue An Information-Theoretical Perspective on Complex Dynamical Systems)

Abstract

Misleading or unnecessary data can have outsized impacts on the health and accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find these instabilities are a result of the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data: data are chosen based on their merits toward improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.
Keywords: double descent; Bayesian sequential selection; machine learning; deep neural networks; Gaussian process regression; sample-wise error convergence; misleading data
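The kind of sequential selection the abstract describes can be sketched in a few lines: fit a Gaussian process surrogate to the currently selected data, score each remaining candidate by how much adding it would improve a global error statistic of the model, and stop when no remaining point helps. The sketch below is a minimal toy illustration, not the authors' implementation; the target map `f`, the RBF kernel length scale, and the mean-squared-error-on-a-reference-grid criterion are all illustrative assumptions (the paper's criterion instead uses global statistics of the model itself, with no held-out reference).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical toy map with a sharp local feature, standing in for a
    # complex underlying map that can produce misleading samples.
    return np.sin(3 * x) + 0.3 * np.tanh(10 * (x - 0.5))

def gp_mean(Xtr, ytr, Xte, ell=0.2, jitter=1e-4):
    # Posterior mean of a zero-mean GP with an RBF kernel (plain NumPy).
    k = lambda A, B: np.exp(-0.5 * (A - B.T) ** 2 / ell ** 2)
    K = k(Xtr, Xtr) + jitter * np.eye(len(Xtr))
    return k(Xte, Xtr) @ np.linalg.solve(K, ytr)

X_pool = rng.uniform(0, 1, size=(40, 1))        # candidate data pool
y_pool = f(X_pool).ravel()
X_ref = np.linspace(0, 1, 200).reshape(-1, 1)   # reference grid (toy stand-in
y_ref = f(X_ref).ravel()                        # for a global model statistic)

def mse(sel):
    # Global error of the surrogate trained on the selected indices.
    return np.mean((gp_mean(X_pool[sel], y_pool[sel], X_ref) - y_ref) ** 2)

sel = [0, 1]                                    # two seed observations
err0 = mse(sel)
for _ in range(10):
    base = mse(sel)
    scores = {i: mse(sel + [i])                 # score each candidate by the
              for i in range(len(X_pool))       # model it would produce
              if i not in sel}
    best = min(scores, key=scores.get)
    if scores[best] >= base:                    # no remaining point improves
        break                                   # the model: stop selecting
    sel.append(best)
```

Because each candidate is judged by the model it would produce rather than compared against other data, points that would mislead the surrogate are simply never selected, which is the mechanism the abstract credits for avoiding sample-wise double descent.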

