Article

Filtering-Based Instance Selection Method for Overlapping Problem in Imbalanced Datasets

Graduate Program in Electrical Engineering and Computing, Mackenzie Presbyterian University, Rua da Consolação, 896, Prédio 30, Consolação, São Paulo 01302-907, Brazil
* Author to whom correspondence should be addressed.
J 2021, 4(3), 308-327; https://doi.org/10.3390/j4030024
Submission received: 27 April 2021 / Revised: 9 June 2021 / Accepted: 11 June 2021 / Published: 9 July 2021

Abstract

The overlapping problem occurs when a region of the data space is shared in similar proportions by different classes. It degrades a classifier’s performance because the classes are difficult to separate correctly. An imbalanced dataset, in which one class has more instances than another, is a further factor that affects a classifier’s performance. In general, these two problems are treated separately. Prototype Selection (PS) approaches, in turn, select appropriate instances from a dataset by filtering out redundant and noisy data, which can cause misclassification. In this paper, we introduce Filtering-based Instance Selection (FIS), which is based on the Self-Organizing Map (SOM) neural network and information entropy. A SOM is trained with a dataset, and the instances of the training set are then mapped to the nearest prototype (SOM neuron). An entropy analysis is conducted in each prototype region. Based on a threshold, we propose three decision methods: filtering the majority class (H-FIS (High Filter IS)), the minority class (L-FIS (Low Filter IS)), and both classes (B-FIS). Experiments using artificial and real datasets showed that the proposed methods, in combination with 1NN, improved the accuracy, F-Score, and G-mean values compared with the 1NN classifier without the filter methods. The FIS approach is also compatible with the approaches mentioned in the relevant literature.
Keywords: prototype selection; self-organizing maps; imbalanced datasets; overlapping problem
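
The pipeline outlined in the abstract (train a SOM, map each training instance to its nearest prototype, compute the class entropy in each prototype region, then filter according to a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the direction of the threshold test or the exact removal rule, so the sketch assumes that regions whose entropy exceeds the threshold are treated as overlapping, and that the "majority" class is the globally most frequent one. The function names (`train_som`, `region_entropy`, `fis_filter`), the grid size, the learning schedule, and the default threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny rectangular SOM; returns prototypes, shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly drawn training instances.
    W = X[rng.choice(len(X), rows * cols, replace=True)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # linearly shrinking neighbourhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood weights
            W += lr * h[:, None] * (X[i] - W)
            step += 1
    return W

def region_entropy(labels):
    """Shannon entropy (base 2) of the class labels in one prototype region."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def fis_filter(X, y, prototypes, threshold=0.5, mode="H"):
    """Return a boolean mask of instances to keep.

    ASSUMPTION (not stated in the abstract): a region is filtered when its
    entropy is above `threshold`; "majority" means the globally most
    frequent class. mode: "H" drops majority instances, "L" drops the
    remaining (minority) instances, "B" drops both.
    """
    # Map every instance to its nearest prototype (squared Euclidean distance).
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[counts.argmax()]
    keep = np.ones(len(y), dtype=bool)
    for k in np.unique(assign):
        idx = np.where(assign == k)[0]
        if region_entropy(y[idx]) > threshold:
            if mode == "H":
                drop = idx[y[idx] == majority]
            elif mode == "L":
                drop = idx[y[idx] != majority]
            elif mode == "B":
                drop = idx
            else:
                raise ValueError("mode must be 'H', 'L', or 'B'")
            keep[drop] = False
    return keep
```

Under these assumptions, a call such as `keep = fis_filter(X, y, train_som(X), mode="H")` yields a mask, and the reduced set `X[keep], y[keep]` could then serve as the reference set of a 1NN classifier, in the spirit of the evaluation the abstract describes.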

Share and Cite

MDPI and ACS Style

Rubbo, M.; Silva, L.A. Filtering-Based Instance Selection Method for Overlapping Problem in Imbalanced Datasets. J 2021, 4, 308-327. https://doi.org/10.3390/j4030024