1. Introduction
Thanks to their ability to simulate human reasoning, fuzzy and probabilistic models allow the construction of automatic decision makers capable of handling very complex phenomena with limited knowledge. This is particularly true for clustering, an unsupervised learning task that consists of grouping a collection of elements into subgroups so that elements in the same group are more similar (according to a certain measure) than objects in other groups. Clustering plays a critical role in data mining and serves as a standard method for statistical data analysis. It is widely employed in machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. In this work, we introduce a new clustering method named Fuzzy-Probabilistic-Convolution-C-Means (FP-Conv-CM) that implements a new sub-measure and is a hybrid of two soft computing approaches: fuzzy logic and probabilistic methods.
The fact that the term “cluster” is not precisely defined is one reason behind the existence of numerous clustering methods. One thing they all have in common is the grouping of data objects. Nevertheless, many researchers have developed and employed various cluster models, and several methods may be provided for each of these cluster models. Typical cluster models include connectivity models [1], centroid models [2], distribution models, density-based clustering [3], subspace models [4,5], graph-based models [6], and neural clustering models [7].
Clustering methods can be roughly divided into hard and soft clustering methods. More specific distinctions can be made, including (a) hierarchical clustering [8], in which a child cluster's objects also belong to the parent cluster; the hierarchy can be built by recursive splitting or merging [9,10]; (b) strict partitioning clustering with or without outliers (objects may belong to no cluster, in which case they are referred to as outliers); partition-based clustering techniques divide the data points of a dataset into different partitions [11,12,13]; (c) overlapping clustering, where an object may belong to one or more clusters; overlapping partitioning algorithms include the commonly used Fuzzy K-means and its variants [14,15]; and (d) subspace clustering (a form of overlapping clustering in which clusters are not expected to overlap within an individually defined subspace) [16,17]. Fuzzy clustering is one of the most established methods that partition a dataset into overlapping clusters, i.e., each object belongs to several clusters with different degrees of membership [18]. The notion of a membership function embodies the concepts of fuzzy sets and fuzzy logic. Soft clustering has proven effective at discriminating datasets in n-dimensional spaces and is particularly useful for unlabeled data and datasets with outliers.
Dunn introduced the widely used Fuzzy C-means algorithm (FCM) [18], which Bezdek [19] later improved; this algorithm is a soft centroid-based model that allows one piece of data to belong to two or more clusters. Due to the poor performance of FCM in the presence of noise and intensity inhomogeneity, many variants of FCM have been developed; these variants are surveyed in [20]. FCM fails to discriminate between data points and noise or outliers, so centroids can get stuck in outliers instead of in the true centers. To overcome this drawback, ref. [21] proposed the possibilistic C-means (PCM) algorithm. PCM is a distribution-based model that assumes the observations of the learning set are realizations of a random variable whose density function is a mixture of normal distributions [22], and in which membership is a degree of belonging that expresses how compatible a datum is with the class typicality. This improves performance in the presence of noise and outliers. Despite its capabilities, PCM is sensitive to initialization and parameter choices, can generate coincident clusters, and also suffers from the degeneracy problem, i.e., when a group becomes empty, its standard deviation tends toward 0 [23]. To correct this, Pal et al. [24] combined FCM and PCM while considering the need for both relative and absolute typicalities in clustering. Another weakness of possibilistic C-means is that it neglects membership values even when typicality values are observed. Further variants of PCM have been developed by modifying the objective function to address some of PCM's shortcomings [25,26,27].
Furthermore, there have been many attempts to combine FCM with the possibilistic approach to improve the results, recognizing the need to apply membership and typicality values simultaneously [24,28,29]. Pal et al. [30] later presented an impressive model named Possibilistic Fuzzy C-means (PFCM), which simultaneously produces, for each cluster, membership and typicality values along with the usual prototypes or cluster centers. This algorithm resolved the overlap and coincident-cluster issues that plagued PCM, in addition to removing the noise-sensitivity issues of FCM. Unfortunately, none of these algorithms gave fully satisfactory results, which motivated a new approach [31] that enhanced the PFCM model to detect cluster centers precisely. Its authors introduced the Modified Possibilistic Fuzzy C-Means (MPFCM) with norm functions, including the covariance norm, making it superior and flexible at handling the most complicated problems that other variants find challenging, mainly noise and outlier points.
Several other clustering methods have been created to account for the possibility that a data point may belong to multiple sub-clusters at once, and some of these algorithms are based on neutrosophic logic. In [32], a Neutrosophic C-means algorithm (NCM) was presented that introduces a new objective function handling two different classes of rejection (ambiguity rejection and distance rejection) to correct the FCM method's weakness in identifying noise and outlier data points. NCM simultaneously calculates, for each data point, the degrees of belonging to the determinate and indeterminate clusters. Using a new iterative process, Guo and Sengur [33] updated the membership degrees of data points to the ambiguity and outlier classes; as a result, the membership functions are more immune to noise. The study in [34] suggested a new Kernel NCM clustering algorithm (KNCM) that extends the NCM approach to nonlinearly shaped data by combining a kernel function with neutrosophic logic. In addition, the authors derived a novel cost function that handles noise and outliers via robust parameter estimation, leading to new membership-function and prototype-update equations. Along the same lines, [35] proposed a new kernel-based fuzzy clustering method to deal with arbitrary cluster shapes. A novel use of the kernel method was proposed to implicitly extract more expressive features; it can also ease the fuzzy clustering of high-dimensional data by discovering good new representations of the original data. The kernel approach in [36] has emerged as an interesting and quite viable alternative for fuzzy clustering. In addition, [37] proposed multiple kernel fuzzy clustering (MKFC), which extends the fuzzy c-means algorithm to the multiple-kernel-learning setting. MKFC is more resistant to useless features and kernels because it employs multiple kernels and automatically adjusts the kernel weights. A multiple kernel fuzzy c-means (MKFCM) approach was also presented in [38], whose authors developed a novel algorithm that uses a linear combination of multiple kernels and automatically updates the linear coefficients of the combined kernel.
Fuzzy clustering plays an important role in the field of data mining. Since the membership function offers a powerful tool for identifying changing class structures, ref. [39] suggested a dynamic data-mining technique based on fuzzy c-means. This approach appears to afford a convenient method for detecting changing class structures. Unfortunately, it suffers from several weaknesses, and [40] proposed adjustments that led to the modified dynamic fuzzy c-means (MDFCM) algorithm, which allows more flexibility in membership-function estimation and avoids the use of extremely complicated equations [41]. In a recent paper, we developed a clustering model based on a recurrent neural network, an optimization model that treats all samples equally, and the Euler–Cauchy technique with a fixed time step [42]. However, outlier samples make it impossible to identify the true groups and cause a very long and incorrect convergence trajectory. In [43], we investigated a constrained optimization method that de-assigns memberships from centers to reduce the impact of outlier samples. To benefit from the capacity of dynamic systems to memorize prior groupings and of neural networks to understand the features of the data, we introduced, in our previous work [44], an original clustering method that combines fuzzy logic and a recurrent neural network, namely the Recurrent Neural Network Fuzzy C-means. Other recent versions of Fuzzy C-means have also been introduced. In [45], the authors proposed a gradient descent algorithm based on possibilistic fuzzy c-means for clustering noisy data. In [46], the authors optimized a fuzzy c-means clustering algorithm with a combination of Minkowski and Chebyshev distances using principal component analysis. Wang et al. introduced an improved clustering-validation index based on the Silhouette index and the Calinski–Harabasz index [47]. A hybrid fuzzy c-means clustering algorithm, essentially focused on big-data problems, was suggested in [48]. The Deep Possibilistic C-means Clustering Algorithm was recently introduced by Gu et al. and applied to medical datasets [49].
In this work, we propose a new sub-measure that implements both notions (the degree of membership and the frequency of the event E: “pattern p belongs to A with degree of membership $\mu_A(p)$”), while exploiting the ability of the convolution operator to combine functions on continuous intervals. This measure evaluates both the degree of membership and the frequency of E in the design of decision systems. Using concrete examples, we show the disadvantages of fuzzy-logic- and probability-based approaches taken separately, and then we show how a convolution probabilistic measure corrects these disadvantages. Based on this measure, we introduce a new clustering method named Fuzzy-Probabilistic-Convolution-C-Means (FP-Conv-CM). FCM, Probabilistic K-Means (PKM), and FP-Conv-CM were comparatively tested on several datasets for the clustering task and on several images for the compression task. They were compared on several performance measures: the Silhouette and Dunn indexes, mean squared error (MSE), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR). Compared with the results obtained using FCM and PKM, FP-Conv-CM was able to improve the Silhouette and Dunn indexes. In addition, FP-Conv-CM improved the MSE, PSNR, and SSIM values.
The rest of the paper is organized as follows: In Section 2, we present the methodology adopted in this work to achieve the stated goals. In Section 3, we discuss the disadvantages of fuzzy and probabilistic K-means using concrete data. In Section 4, the proposed fuzzy probability measure is described and used for clustering tasks. In Section 5, we present experimental results, and Section 6 concludes the paper.
2. Methodology Overview
Let $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$ represent the space of unlabeled observations. Clustering methods seek to reduce the information contained in $X$:
By summarizing it as a set $V = \{v_1, \dots, v_k\}$, where $v_j \in \mathbb{R}^d$; these vectors will be called referents (centers) throughout the rest of the article;
By defining an assignment function $\chi$, which maps $X$ into the set of indices $\{1, \dots, k\}$; this function makes it possible to realize a partition of $X$ into $k$ subsets $P_1, \dots, P_k$.
It should be noted that the clustering problem is NP-hard. Indeed, if $k$ (the number of groups) and $d$ (the dimension) are fixed and $N$ is the number of items to be clustered, then the problem can be solved exactly in $O(N^{kd+1})$ time [50].
Definitions
Data $= \{x_1, \dots, x_N\}$ is the dataset under study;
Centers $= \{v_1, \dots, v_k\}$ is the set of the centers to be determined;
$\chi$ is the allocation function of data to the groups represented by the centers $v_j$;
$P_j$, $j = 1, \dots, k$, are the groups determined based on $\chi$;
$\mu_j$ is the membership function of group $P_j$, and $m$ is a real number strictly greater than 1 (the fuzziness exponent);
$f_j$ is the probability density of $x$ being in group $P_j$, where $\Sigma_j$ is the covariance of the component $j$.
The quantity $f_j(x)$ measures the frequency with which $x$ is taken from $P_j$, whereas $\mu_j(x)$ measures the degree of belonging of $x$ to $P_j$. Our idea is to define a new Borel measure that quantifies the degree of belonging of $x$ to $P_j$ together with its frequency.
Methods
In this work, we introduce a new measure that evaluates the degree of belonging of $x$ to $P_j$ together with its frequency. This measure, called the fuzzy probability convolution measure, implements the following membership-density functions:
$$(\mu_j * f_j)(x) = \int \mu_j(t)\, f_j(x - t)\, dt, \quad j = 1, \dots, k.$$
Based on the proposed fuzzy probability convolution measure, we introduce a new clustering method that we named Fuzzy Probabilistic Convolution C-Means (FP-Conv-CM). This method estimates the vectors $v_j$, the common standard deviation $\sigma$, and the membership coefficients. Based on FP-Conv, the model implementing these parameters must approximate the true density that generated the data $X$. In this regard, our method involves maximizing the fuzzy probability of the observations, where the likelihood $L$ measures the capacity of the model to reproduce the data $X$, $W$ is the matrix of the membership coefficients (representing the degree of membership of each data point $x_i$ to each group $P_j$), and $\chi(x_i)$ is the group that won the sample $x_i$ according to FP-Conv.
Metrics
One of the biggest challenges facing researchers is how to evaluate a clustering. The homogeneity of the created classes/clusters and the separation between them are calculated in order to assess the efficiency of a clustering method. These qualities may be evaluated using a variety of indices:
Silhouette index: Let $N$ be the number of patterns. The Silhouette index [43] measures the clustering quality using the difference between the average distance within a cluster and the minimum distance between clusters; the silhouette coefficient is given by
$$S = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$
where $a(i)$ represents the average distance from sample $i$ to the other samples in its cluster, and $b(i)$ represents the minimum average distance from sample $i$ to the other clusters. The silhouette coefficient ranges from −1 to 1, where −1 denotes that the data point is not assigned to the relevant cluster, 0 denotes that the clusters overlap, and 1 denotes that the cluster is dense and well separated. This metric is one of the most popular measurements in clustering: it can distinguish between objects placed wisely within their cluster and those lying in the outlier zone between clusters.
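For concreteness, the silhouette coefficient can be computed with scikit-learn; the following minimal sketch (the synthetic blobs and K-means labels are placeholders for any clustering result) illustrates the computation:

```python
# Minimal sketch: Silhouette index of a clustering result (placeholder data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # placeholder dataset
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Mean of (b(i) - a(i)) / max(a(i), b(i)) over all N samples, in [-1, 1].
print("Silhouette:", silhouette_score(X, labels))
```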
Dunn index: The Dunn index [40,44] is defined as
$$DI = \frac{\min_{i \neq j} \delta(C_i, C_j)}{\max_{1 \le l \le k} \Delta(C_l)},$$
where, if $C_i$ and $C_j$ are different clusters, $\delta(C_i, C_j)$ is the minimal distance between samples in the two clusters and $\Delta(C_l)$ is the largest within-cluster distance. Note that large inter-cluster distances (better separation) and smaller cluster sizes (more compact clusters) lead to higher DI values; a higher DI value implies better clustering [40]. The index assumes that better clustering means compact clusters that are well separated from one another.
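Since the Dunn index is not part of the common libraries, a minimal NumPy/SciPy sketch of the definition above (Euclidean distances assumed) could look as follows:

```python
import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    """Minimal inter-cluster distance divided by maximal cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Delta(C_l): the largest within-cluster distance (cluster diameter).
    max_diam = max(cdist(c, c).max() for c in clusters)
    # delta(C_i, C_j): the smallest distance between samples of two clusters.
    min_sep = min(cdist(ci, cj).min()
                  for i, ci in enumerate(clusters)
                  for cj in clusters[i + 1:])
    return min_sep / max_diam
```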
The performance measures used to evaluate the different image compressions are:
(a) Mean Squared Error (MSE), which computes the error between the initial image $I$ and the compressed image $K$; it is in fact the distance between the two matrices that represent the images to be compared:
$$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big( I(i,j) - K(i,j) \big)^2;$$
(b) Peak Signal-to-Noise Ratio (PSNR), which implements the following equation:
$$PSNR = 10 \log_{10} \!\left( \frac{\mathrm{peakval}^2}{MSE} \right),$$
where peakval is taken from the range of the image datatype;
(c) Structural SIMilarity (SSIM) index, which is calculated on various windows of an image. The measure between two windows $x$ and $y$ of common size is
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$
where $\mu_x$ is the mean of $x$, $\sigma_x^2$ is the variance of $x$, and $\sigma_{xy}$ is the covariance of $x$ and $y$; $c_1$ and $c_2$ are two constants estimated from the image [51].
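A minimal sketch of these three measures, assuming 8-bit grayscale images (peakval = 255) and relying on scikit-image for SSIM:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mse(original, compressed):
    """Mean squared error between the two image matrices."""
    return np.mean((original.astype(float) - compressed.astype(float)) ** 2)

def psnr(original, compressed, peakval=255.0):
    """Peak signal-to-noise ratio in dB; peakval = 255 assumes 8-bit images."""
    return 10.0 * np.log10(peakval ** 2 / mse(original, compressed))

# Usage, assuming img and img_c are 8-bit grayscale arrays of the same shape:
# print(mse(img, img_c), psnr(img, img_c), ssim(img, img_c, data_range=255))
```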
Experimental validation
FCM, PKM, and FP-Conv-CM were tested on multiple datasets and compared on the basis of two performance measures, i.e., the Silhouette metric and Dunn's Index (see Section 5.2). FP-Conv-CM was able to improve the Silhouette value by 1000 and Dunn's Index by 0.024.
FCM, PKM, and FP-Conv-CM were used for multiple image compression tasks and were compared based on three performance measures: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) (see Section 5.3). FP-Conv-CM improved MSE by 3000, PSNR by 11, and SSIM by 0.32.
3. Drawbacks of Fuzzy and Probabilistic Approaches
3.1. K-Means
The K-means method, which is the most well-known vector quantization method, determines the set of reference vectors $V = \{v_1, \dots, v_k\}$ and the assignment function $\chi$ by minimizing the cost function
$$J(V, \chi) = \sum_{i=1}^{N} \big\| x_i - v_{\chi(x_i)} \big\|^2.$$
The time complexity of K-means is $O(NkdT)$ [52], where $T$ is the number of iterations needed until convergence. The K-means method suffers from several drawbacks due to its being a hard clustering method. To overcome these issues, many extensions exist, including FCM and PKM [43].
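For reference, a minimal NumPy sketch of this cost minimization via the classical Lloyd iterations (random initialization assumed; an illustration, not the exact implementation used in our experiments):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), k, replace=False)]   # initial referents (centers)
    for _ in range(iters):
        # Assignment step: chi(x_i) = index of the nearest center.
        chi = np.argmin(((X[:, None] - V[None]) ** 2).sum(-1), axis=1)
        # Update step: each center becomes the mean of the samples it won
        # (empty groups are not handled in this sketch).
        V = np.array([X[chi == j].mean(axis=0) for j in range(k)])
    return V, chi
```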
3.2. Probabilistic K-Means
To obtain the probabilistic version of k-means, it is assumed that the observations of the learning set $X$ are realizations of a random variable whose density function is a mixture of $k$ normal distributions [44,53]:
$$f(x) = \sum_{j=1}^{k} \alpha_j f_j(x),$$
where $\alpha_j \ge 0$, $\sum_{j=1}^{k} \alpha_j = 1$, and $f_j$ is the normal density function of component $j$.
In addition to this formalism, the shift to the probabilistic interpretation of the K-means algorithm requires us to introduce additional assumptions:
The prior probabilities are all equal to $1/k$;
The $k$ normal distributions have identical variance–covariance matrices equal to $\sigma^2 I$, where $I$ represents the unit matrix and $\sigma$ is the standard deviation, considered constant for all these normal distributions.
In that case, the density function has the following expression:
$$f(x) = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left( -\frac{\|x - v_j\|^2}{2\sigma^2} \right).$$
The probabilistic version of k-means involves estimating the vectors $v_j$ and the common standard deviation $\sigma$ so as to make the observed sample as likely as possible. This method, known as the maximum likelihood method, involves maximizing the probability of these observations. Maximizing the classifying likelihood amounts to minimizing
$$E(V, \chi, \sigma) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \big\| x_i - v_{\chi(x_i)} \big\|^2 + N d \ln \sigma + \mathrm{CONST},$$
where CONST is a constant that depends only on the data. Probabilistic k-means has a running time of $O(NkdT)$ [52], where $N$ is the number of $d$-dimensional vectors, $k$ is the number of clusters, and $T$ is the number of iterations required to reach convergence.
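In practice, this constrained mixture can be approximated with scikit-learn's GaussianMixture; the sketch below uses covariance_type='tied' (one shared full covariance) as a stand-in for the shared $\sigma^2 I$ of the text, which, together with the placeholder data, is an assumption of the sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data
# Assumption: 'tied' (one shared full covariance) approximates the common
# sigma^2 * I of the text; equal priors are not enforced here.
gmm = GaussianMixture(n_components=3, covariance_type='tied', random_state=0).fit(X)
labels = gmm.predict(X)             # hard assignment chi(x_i)
posteriors = gmm.predict_proba(X)   # soft responsibilities of each group
```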
3.3. Fuzzy C-Means (FCM)
The fuzzy mean-square clustering algorithm known as Fuzzy C-means (FCM) allows one data sample to belong to each cluster with a different degree of membership. This method is frequently used in pattern recognition [42]. The FCM method minimizes the following objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{k} u_{ij}^m \, \| x_i - v_j \|^2,$$
where the real value $m$ determines the fuzziness of the generated clusters ($m > 1$), $N$ is the dataset size, $k$ is the number of clusters, $u_{ij}$ is the degree of membership of sample $x_i$ to the cluster $j$, $v_j$ is the $d$-dimensional center of cluster $j$, and $\|\cdot\|$ represents any norm denoting the similarity between each measured datum and the center.
An iterative optimization of the objective function presented above is used to perform the fuzzy partitioning, with the membership $u_{ij}$ and the cluster centers $v_j$ updated by:
$$u_{ij} = \frac{1}{\sum_{l=1}^{k} \left( \frac{\|x_i - v_j\|}{\|x_i - v_l\|} \right)^{\frac{2}{m-1}}}, \qquad v_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}.$$
This iteration stops when $\max_{ij} \big| u_{ij}^{(t+1)} - u_{ij}^{(t)} \big| < \varepsilon$, where $\varepsilon$ is a termination criterion between 0 and 1 and $t$ indexes the iteration steps. This procedure converges to a local minimum or a saddle point of $J_m$.
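A compact NumPy sketch of these update equations (standard FCM with m = 2 and random initialization, given as an illustration):

```python
import numpy as np

def fcm(X, k, m=2.0, iters=100, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]  # center update v_j
        D = np.linalg.norm(X[:, None] - V[None], axis=2) + 1e-12
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        U_new = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < eps:         # termination criterion
            return V, U_new
        U = U_new
    return V, U
```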
3.4. Fuzzy Reasoning and Probabilistic Reasoning Are Complementary
Let $A$ be a set from $\mathbb{R}^d$. The probability $P(A)$ measures the chance that $x$ is taken from $A$, whereas $\mu_A(x)$ measures the degree of belonging of $x$ to $A$.
Probabilistic reasoning corrects the weakness in fuzzy reasoning:
Consider a dataset whose elements are distributed according to two normal distributions, $f_1$ and $f_2$, presented in Figure 1. These two distributions form two clusters (or classes), namely C1 and C2. In this subsection, we use probabilistic and fuzzy models to recognize the classes of some critical data. When we generated the data, we noticed that some samples have a very high frequency yet are not close enough to the centers of the two densities. As we will see later, depending on the case study, this type of data misleads both models.
Example 1. We consider an element $s$, of high frequency, from class (or cluster) C2.
(a) The fuzzy model based on the membership functions $\mu_1$ and $\mu_2$, presented in Figure 1, predicts $s$ as an element of C1. In fact, we have $\mu_1(s) > \mu_2(s)$, and thus the fuzzy model predicts that the sample $s$ is from class 1 with center $v_1$, which is false.
(b) The probabilistic model based on the densities $f_1$ and $f_2$, presented in Figure 1, predicts $s$ as an element of C2. In fact, we have $f_1(s) < f_2(s)$. Thus, the probabilistic model predicts that $s$ is from class 2 with center $v_2$, which is true.
Fuzzy reasoning corrects the weakness in probabilistic reasoning:
Consider a dataset whose elements are distributed according to two normal distributions, $f_1$ and $f_2$. These two distributions form two clusters (or classes), namely C1 and C2. In this subsection, we again use probabilistic and fuzzy models to recognize the classes of some critical data. When we generated the data, we noticed that some samples are close to the means of the two densities and do not have a very high frequency. As we will see later, this perturbs the decisions of the models.
Example 2. We consider an element $r$, close to the center 0, of the class (or cluster) C1.
(a) The fuzzy model based on the membership functions $\mu_1$ and $\mu_2$ given below predicts $r$ as an element of C1. In fact, we have $\mu_1(r) > \mu_2(r)$; thus, fuzzy logic predicts that the sample $r$ is from class 1 with center 0, which is true.
(b) The probabilistic model based on the densities $f_1$ and $f_2$ given below predicts $r$ as an element of C2. In fact, we have $f_1(r) < f_2(r)$. Thus, the probabilistic approach predicts that $r$ is from class 2 with center $v_2$, which is false.
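The two failure modes can be reproduced numerically. In the sketch below, the Gaussians and triangular membership functions are assumed, illustrative choices, not the parameters used to generate Figure 1:

```python
import numpy as np
from scipy.stats import norm

def tri(c, w):
    """Triangular membership centered at c with half-width w (assumed shape)."""
    return lambda x: np.clip(1 - abs(x - c) / w, 0, 1)

# Setting of Example 1 (illustrative values): s is frequent under the narrow class 2.
f1, f2 = norm(0, 2.0), norm(4, 0.5)
mu1, mu2 = tri(0, 6.0), tri(4, 0.4)
s = 3.5
print(mu1(s) > mu2(s), f1.pdf(s) < f2.pdf(s))  # True, True: fuzzy wrong, probability right

# Setting of Example 2 (illustrative values): r lies near the center of the flat class 1.
f1, f2 = norm(0, 3.0), norm(1, 0.5)
mu1, mu2 = tri(0, 2.0), tri(1, 2.0)
r = 0.3
print(mu1(r) > mu2(r), f1.pdf(r) < f2.pdf(r))  # True, True: fuzzy right, probability wrong
```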
Drawbacks of fuzzy and probabilistic approaches on real data: image compression case study:
To show the limits of the fuzzy and probabilistic approaches on real data, we used these methods to compress images and conducted a deep analysis of the obtained results. For this, we used the image of the great scientist Max Planck (see Figure 2a). The histogram of this image is given in Figure 2b, while Figure 2c highlights an ambiguous pixel.
The image obtained after decompression (following compression using 2-GMM) is shown in Figure 3a. The histograms of group 1 (center = 58.95, std = 15.98) and group 2 (center = 58.95, std = 15.98), obtained using 2-GMM, are given in Figure 3b,c, respectively.
The image obtained after decompression (following compression using 2-FCM) is shown in Figure 4a. The histograms of group 1 (center 1 = 63.75) and group 2 (center 2 = 186.53), obtained using 2-FCM, are given in Figure 4b,c, respectively.
In the first phase of this example, we focus on the pixels with a gray level of 125, which represent 29.05% of the image; for example, the pixel located at position [30, 264], which we call p. For pedagogical reasons, we have highlighted this pixel on the different histograms and the different images of Max Planck (the original and those obtained by decompression).
This kind of pixel causes ambiguities for 2-FCM, but 2-GMM assigns it correctly thanks to the amplification of small quantities (through the means and standard deviations) by the exponential function of each Gaussian component. Indeed, the degrees of membership of p in the two groups formed by 2-FCM are almost equal, so the fuzzy decision is ambiguous, whereas the probabilities of p belonging to the two groups formed using 2-GMM are clearly different; hence the clarity of the probabilistic decision with respect to the suitable group of the pixel p.
In the second phase of this example, we focus on the pixels with a gray level of 92, which represent 45.8% of the image; for example, the pixel located at position [198, 279], which we call q. This kind of pixel causes ambiguities for 2-GMM, but 2-FCM assigns it correctly thanks to the integration of all centers in the formulas of all membership functions. Indeed, the probabilities of q belonging to the two groups are almost equal, so the probabilistic decision is ambiguous, whereas the degrees of membership of q in the two groups formed by 2-FCM are clearly different; hence the clarity of the fuzzy decision with respect to the suitable group of pixel q.
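The ambiguity for the pixel p can be checked from the 2-FCM centers reported above: for a gray level of 125, the distances to the centers 63.75 and 186.53 are nearly equal, so the two membership degrees are both close to 0.5. The sketch below assumes the standard FCM membership formula with m = 2:

```python
# FCM membership degrees of a gray level g to the two reported centers (m = 2 assumed).
centers = [63.75, 186.53]            # 2-FCM centers given above
g = 125                              # gray level of the ambiguous pixel p
d = [abs(g - c) for c in centers]    # distances: 61.25 and 61.53
u = [1 / sum((d[j] / d[l]) ** 2 for l in range(2)) for j in range(2)]
print(u)   # approximately [0.502, 0.498]: an ambiguous fuzzy decision
```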
In the next section, in order to overcome these shortcomings, we will introduce a hybrid sub-measure that implements both fuzzy and probabilistic concepts.
4. Proposed Approach
4.1. Fuzzy Probability Convolution Measure
Our idea is to define a new hybrid measure that implements fuzzy and probabilistic reasoning at the same time and corrects the shortcomings of each classical reasoning taken separately. This new measure computes the frequency of drawing $x$ from $P_j$ and the degree of belonging of $x$ to $P_j$ at the same time; we call it the fuzzy probability convolution measure.
Definition 1 (Convolution-Fuzzy-Probability). The Fuzzy Probability Convolution (FP-Conv) measure is defined based on the convolution of the membership function $\mu$ with the density (probability) $f$, as given by:
$$(\mu * f)(x) = \int \mu(t)\, f(x - t)\, dt.$$
The fuzzy probability convolution corrects the incorrect decisions of the fuzzy logic.
Consider the membership functions $\mu_1$ and $\mu_2$, and the normal density functions $f_1$ and $f_2$, defined above.
Proposition 1. Based on the decision model implementing the FP-Conv densities $(\mu_1 * f_1)$ and $(\mu_2 * f_2)$, the sample $s$ is predicted as an element of class 2, which is true.
Proof. We have $(\mu_1 * f_1)(s) = 0.055585$ and $(\mu_2 * f_2)(s) > (\mu_1 * f_1)(s)$; thus, $s$ is considered an element of the class with center $v_2$, which is true.
Note: It should be noted that the two FP-Conv values are well separated, which means that the decision is clearer. □
The fuzzy probability convolution corrects the incorrect decisions of probabilistic reasoning.
Consider the membership functions $\mu_1$ and $\mu_2$, and the normal density functions $f_1$ and $f_2$.
Proposition 2. Based on the fuzzy density measures $(\mu_1 * f_1)$ and $(\mu_2 * f_2)$, the sample $r$ is predicted as an element of class 1, which is true.
Proof. We have $(\mu_1 * f_1)(r) > (\mu_2 * f_2)(r)$; thus, $r$ is predicted as an element of the class with center $v_1$, which is true.
Note: It should be noted that the two FP-Conv values are again well separated. □
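Numerically, the FP-Conv values can be approximated on a grid; a minimal sketch with an assumed triangular membership and Gaussian density (np.convolve, with the grid step as the integration weight):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
mu = np.clip(1 - np.abs(x) / 2.0, 0, 1)   # assumed triangular membership, center 0
f = norm(0, 1).pdf(x)                     # assumed Gaussian density, center 0

# (mu * f)(x) = integral of mu(t) f(x - t) dt, approximated by a Riemann sum.
fp_conv = np.convolve(mu, f, mode='same') * dx
```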
4.2. Fuzzy Probabilistic Convolution C-Means
The quantity $f_j(x)$ measures the frequency with which $x$ is taken from $P_j$, whereas $\mu_j(x)$ measures the degree of belonging of $x$ to $P_j$. Our idea is to define a new Borel measure that quantifies the degree of belonging of $x$ to $P_j$ together with its frequency; this new measure computes the frequency of drawing $x$ from $P_j$ and the degree of belonging of $x$ to $P_j$ at the same time.
Based on the concepts introduced in Section 2, the fuzzy probability convolution clustering method implements the following membership density functions:
$$(\mu_j * f_j)(x) = \int \mu_j(t)\, f_j(x - t)\, dt, \quad j = 1, \dots, k.$$
The following example shows, geometrically, how the new membership density function makes the decision increasingly clearer and easier.
Example 3. Let us consider two groups with centers $v_1$ and $v_2$ and standard deviations $\sigma_1$ and $\sigma_2$, together with the samples allocated to each group. Setting the parameters of the first group, we obtain its membership function $\mu_1$ and its density $f_1$ (see Figure 5); setting the parameters of the second group, we obtain $\mu_2$ and $f_2$ (see Figure 5). Therefore, we obtain the two FP-Conv membership density functions $(\mu_1 * f_1)$ and $(\mu_2 * f_2)$.
Figure 5 represents $\mu_1$, $f_1$, $\mu_2$, $f_2$, $(\mu_1 * f_1)$, and $(\mu_2 * f_2)$. The curves of the first four functions clearly underline the source of the ambiguity in the probabilistic and fuzzy models: the curves representing the two groups are very close to each other, and the data located on the edges have almost the same probabilities of belonging to the different groups. The introduced measure creates a very safe separation zone and makes a very large difference between the degrees of membership of the edge data to the different groups. This enables the FP-Conv-CM clustering method to make decisions very comfortably.
4.3. Fuzzy Probability Convolution for the Clustering Task
Following the same principle as the probabilistic models presented in Section 2, the proposed clustering method, named Fuzzy Probabilistic Convolution C-Means (FP-Conv-CM), involves estimating the vectors $v_j$, the common standard deviation $\sigma$, and the membership coefficients, trying to make the realization of the sample as likely as possible in the sense of the FP-Conv measure. In this sense, our method involves maximizing the fuzzy probability of the observations.
The parameters must be chosen such that the likelihood is maximal, which means that the log-likelihood is maximal. To maximize it, we use partial gradient descent: we present a sample to the fuzzy probability system and update the current parameters. At time $t$, we assume that the parameters obtained so far are known; thanks to FP-Conv-CM, we then update the membership coefficients via the gradient descent update stated in Theorem 1 below.
Theorem 1. The membership update decomposes into two terms, the gradients of the fuzzy part and of the probabilistic part, respectively, where the step of the algorithm follows the current direction at time t.
Proof. Starting from the FP-Conv objective, we calculate the gradients of the memberships and of the densities and combine them to obtain the stated update. □
To update the centers, we use the gradient descent update stated in Theorem 2 below, where the step of Algorithm 1 follows the current direction at time $t$.
Theorem 2. The corresponding gradient update holds for the centers.
Proof. In fact, starting from the same objective, we substitute the center gradient into the update equation. □
In the following, we give the version of the proposed system that implements the full gradient of the loss function:
Algorithm 1. Fuzzy-Probabilistic-Convolution-C-Means.
Requires: Data $= \{x_1, \dots, x_N\}$, $k$ (number of groups), the learning step, the termination criterion, m (membership parameter), b (mini-batch size), ITER (maximum number of iterations).
Ensure: the centers matrix and the membership matrix of the data to the groups.
Initialization: t = 0; the centers and memberships are randomly chosen.
For all t = 1, …, ITER Do
 For all j = 1, …, k Do
  For all d = 1, …, N Do
   Update the membership of sample d to group j and the center of group j using the gradient equations of Theorems 1 and 2.
  End For
 End For
End For
In this algorithm, the full convolution is approximated by discrete values using a mini-batch of size b: the b nearest samples of the current sample are used to estimate the continuous convolution.
In the experimentation section, we use the b nearest neighbors of each sample. It is possible to use a genetic algorithm to estimate $\Omega(V)$, the support of the membership and density functions. Given a support $\Omega$, the transfer theorem provides an expression for the mathematical expectation of a function $\varphi$ of the random variable $X$ with density $f$ on the support $\Omega$:
$$E[\varphi(X)] = \int_{\Omega} \varphi(x)\, f(x)\, dx.$$
This can be extended to discrete probabilities by summing against a discrete Dirac-like measure. We use Monte Carlo simulation: we draw a sample $(x_1, x_2, \dots, x_E)$ of the random variable $X$ on the support $\Omega$, and then we calculate an estimation of $E[\varphi(X)]$ based on this sample [54]. By the law of large numbers, the empirical mean is a very good estimator. Since the probability densities and membership functions are designed to cover, as much as possible, the different data in the set $X$, the supports of these functions are centered around these data. Thus, it is natural to choose the Monte Carlo samples from these data. In our case, the parameter b is derived from the number of clusters $k$, because each pair $(\mu_j, f_j)$ is supported by the data won by the group $j$.
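A minimal sketch of this Monte Carlo estimation (the density and the test function $\varphi$ are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=100_000)   # draws of X on its support (assumed N(0, 1))
phi = lambda x: x ** 2                        # assumed test function

# Transfer theorem: E[phi(X)] = integral of phi(x) f(x) dx over the support;
# by the law of large numbers, the empirical mean converges to it.
print(np.mean(phi(sample)))   # approx 1.0 = E[X^2] for N(0, 1)
```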
6. Conclusions and Future Perspectives
Fuzzy logic makes decisions on the basis of the degree of membership without giving any information about the frequency of events, whereas probability informs us about the frequency of events but gives no information about the degree of membership to a set or class. This paper proposed a convolution fuzzy probabilistic measure that evaluates the membership degree and the frequency at the same time. Using concrete examples, we showed that the new measure corrects the shortcomings of both the probability measure and the fuzzy-logic-based measure. Based on this measure, this paper introduced a new clustering method, named Fuzzy-Probabilistic-Convolution-C-Means (FP-Conv-CM). FCM, PKM, and FP-Conv-CM were tested on multiple datasets and compared on the basis of two performance measures: the Silhouette metric and Dunn's Index. FP-Conv-CM was able to improve the Silhouette value by 1000 and Dunn's Index by 0.024. In addition, FCM, PKM, and FP-Conv-CM were used for multiple image compression tasks and were compared based on three performance measures: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). FP-Conv-CM improved MSE by 3000, PSNR by 11, and SSIM by 0.32.
Given the performance of FP-Conv-CM in food grouping, in the future we will use it to personalize diets for people with diabetes [57,58]. Also, given the performance of FP-Conv-CM in grouping diabetics, we will use it for the automatic grouping of a population of diabetics in order to determine the different components of the control models proposed in [59]. Recently, we used GMM and FCM for localization in stochastic environments to improve the results obtained in [60], and we will use FP-Conv-CM to summarize the information from the LiDAR sensor, which will overcome the localization limitations caused by both the GMM and FCM methods.
Unfortunately, FP-Conv-CM inherits some limitations of classical fuzzy logic, which does not take into account the degree of non-membership to the different classes. Moreover, we encountered some difficulties in selecting an optimal support, and a heuristic method is needed to make a good choice. In the future, we will use evolutionary algorithms to choose the patterns that best cover the supports of the membership functions and the Gaussians implemented by FP-Conv-CM. In addition, we will introduce an Intuitionistic Fuzzy Convolution C-means version to take advantage of the ability of intuitionistic logic to quantify the degree of non-membership of patterns to the different classes.