This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

by Atena Jalali Mojahed 1, Mohammad Hossein Moattar 2,* and Hamidreza Ghaffari 1

1 Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows 25H9+CVW, Iran
2 Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad 9G58+59Q, Iran
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2024, 8(9), 109; https://doi.org/10.3390/bdcc8090109
Submission received: 23 July 2024 / Revised: 21 August 2024 / Accepted: 3 September 2024 / Published: 4 September 2024

Abstract

Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored to highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the performance of classification algorithms. Therefore, class density is used as the main basis for learning the new distance metric. The data of a single class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm is used to determine the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Next, the Bhattacharyya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To obtain a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves classifier performance on imbalanced classification problems. Moreover, even when the imbalance ratio is very high and minority class samples are very hard to identify correctly, the proposed method still provides acceptable performance.
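The abstract does not state the optimization objective explicitly. As a hedged illustration only, the sketch below computes the standard closed-form Bhattacharyya distance between two multivariate Gaussian components, D_B = (1/8)(mu1 - mu2)^T S^-1 (mu1 - mu2) + (1/2) ln(det S / sqrt(det S1 det S2)) with S = (S1 + S2)/2. There is no closed form for the distance between two Gaussian mixtures, so the sketch combines component pairs from two class mixtures with a simple weight-product sum. That aggregate (`mixture_separation`), the `(weight, mean, covariance)` tuple layout, and all function names are hypothetical assumptions; this is not the authors' objective, their MAP estimation step, or their iterative optimizer.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = 0.5 * (cov1 + cov2)  # averaged covariance S
    diff = mu1 - mu2
    # Mean-separation term: (1/8) * diff^T S^-1 diff, via a linear solve
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term computed with log-determinants for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def mixture_separation(mix_a, mix_b):
    """Hypothetical aggregate: weight-product sum of pairwise component distances.

    Each mixture is a list of (weight, mean, covariance) tuples. This is an
    illustrative assumption, not the paper's mixture-level distance.
    """
    total = 0.0
    for wa, mua, cova in mix_a:
        for wb, mub, covb in mix_b:
            total += wa * wb * bhattacharyya_gaussian(mua, cova, mub, covb)
    return total


if __name__ == "__main__":
    eye = np.eye(2)
    # Hypothetical 2-D example: a unimodal minority class, a bimodal majority class
    minority = [(1.0, [3.0, 0.0], eye)]
    majority = [(0.6, [0.0, 0.0], eye), (0.4, [0.0, 3.0], eye)]
    print(round(mixture_separation(minority, majority), 3))  # prints 1.575
```

In a metric learning loop, a quantity like this would be recomputed on the transformed data after each update; the paper's actual update rule and its treatment of internal components are not described in the abstract.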
Keywords: imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation

Share and Cite

MDPI and ACS Style

Jalali Mojahed, A.; Moattar, M.H.; Ghaffari, H. Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems. Big Data Cogn. Comput. 2024, 8, 109. https://doi.org/10.3390/bdcc8090109

AMA Style

Jalali Mojahed A, Moattar MH, Ghaffari H. Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems. Big Data and Cognitive Computing. 2024; 8(9):109. https://doi.org/10.3390/bdcc8090109

Chicago/Turabian Style

Jalali Mojahed, Atena, Mohammad Hossein Moattar, and Hamidreza Ghaffari. 2024. "Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems" Big Data and Cognitive Computing 8, no. 9: 109. https://doi.org/10.3390/bdcc8090109
