This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Clustering Validation Inference

by Pau Figuera 1,*, Alfredo Cuzzocrea 2 and Pablo García Bringas 1

1 Faculty of Engineering, University of Deusto, 48007 Bilbao, Spain
2 iDEA Lab, University of Calabria, 87036 Rende, Italy
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(15), 2349; https://doi.org/10.3390/math12152349 (registering DOI)
Submission received: 21 March 2024 / Revised: 18 July 2024 / Accepted: 19 July 2024 / Published: 27 July 2024
(This article belongs to the Section Probability and Statistics)

Abstract

Clustering validation is applied to evaluate the quality of classifications. This step is crucial for unsupervised machine learning. A plethora of methods exist for this purpose; however, a common drawback is that they do not allow statistical inference. In this study, we construct a density function for the number of clusters. For this purpose, we use smoothing techniques. Then, we apply non-negative matrix factorization using the Kullback–Leibler divergence. Under the hypothesis of a unique, linearly independent, uncorrelated observational variable, we construct a sequence by varying the dimension of the span space of the factorization, using only analytical techniques. The expectation of the limit of this sequence follows a gamma probability density function. Then, by identifying the dimension of the span space of the factorization with the number of clusters, we transform the estimation of a suitable factorization dimension into a probabilistic estimate of the number of clusters. This approach is an internal validation method that is suitable for numerical and categorical multivariate data and is independent of the clustering technique. Our main achievement is a predictive clustering validation model with graphical capabilities. It provides results in terms of credibility, making it possible to compare results, such as expert judgments, on a quantitative basis.
Keywords: non-negative matrix factorization; trace sequence limit; clustering validation; inferential clustering validation
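The factorization step described in the abstract rests on non-negative matrix factorization (NMF) under the Kullback–Leibler divergence, repeated while the factorization dimension k varies. The following is a minimal sketch, assuming the standard Lee–Seung multiplicative updates for the generalized KL divergence (the abstract does not state which solver the authors use). It fits one factorization per candidate k and reports the divergence for each. It does not reproduce the authors' trace sequence, smoothing, or gamma-density construction, and the function names `nmf_kl`, `kl_divergence`, and `kl_by_rank` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_divergence(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y) = sum x*log(x/y) - x + y."""
    total = 0.0
    for row_x, row_y in zip(X, Y):
        for x, y in zip(row_x, row_y):
            y = y + eps  # guard against log(0) and division by zero
            if x > 0:
                total += x * math.log(x / y)
            total += y - x
    return total


def nmf_kl(X, k, iters=200, seed=0, eps=1e-12):
    """Factorize X (n x m, non-negative) as W (n x k) times H (k x m).

    Assumption: Lee-Seung multiplicative updates for the KL divergence,
    used here for illustration only.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H_aj <- H_aj * sum_i W_ia X_ij / (WH)_ij / sum_i W_ia
        WH = matmul(W, H)
        for a in range(k):
            denom = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][a] * X[i][j] / (WH[i][j] + eps) for i in range(n))
                H[a][j] *= num / denom
        # W_ia <- W_ia * sum_j H_aj X_ij / (WH)_ij / sum_j H_aj
        WH = matmul(W, H)
        for a in range(k):
            denom = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(H[a][j] * X[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][a] *= num / denom
    return W, H


def kl_by_rank(X, ranks, iters=200, seed=0):
    """Fit one KL-NMF per candidate dimension k and return {k: divergence}."""
    results = {}
    for k in ranks:
        W, H = nmf_kl(X, k, iters=iters, seed=seed)
        results[k] = kl_divergence(X, matmul(W, H))
    return results


if __name__ == "__main__":
    # Toy data with two obvious blocks (two "clusters" of rows and columns).
    X = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 3, 3],
         [0, 0, 3, 3]]
    for k, d in kl_by_rank(X, ranks=[1, 2, 3]).items():
        print(f"k={k}: KL={d:.4f}")
```

On this toy block-diagonal matrix, the divergence should fall sharply from k = 1 to k = 2, which is the kind of dimension-dependent behavior a cluster-number criterion builds on. Turning such values into a probability over the number of clusters is the contribution of the paper and is not attempted in this sketch.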

Share and Cite

MDPI and ACS Style

Figuera, P.; Cuzzocrea, A.; Bringas, P.G. Clustering Validation Inference. Mathematics 2024, 12, 2349. https://doi.org/10.3390/math12152349

AMA Style

Figuera P, Cuzzocrea A, Bringas PG. Clustering Validation Inference. Mathematics. 2024; 12(15):2349. https://doi.org/10.3390/math12152349

Chicago/Turabian Style

Figuera, Pau, Alfredo Cuzzocrea, and Pablo García Bringas. 2024. "Clustering Validation Inference." Mathematics 12, no. 15: 2349. https://doi.org/10.3390/math12152349

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
