Unsupervised Multiview Fuzzy C-Means Clustering Algorithm

Hussain, Ishtiaq; Sinaga, Kristina P.; Yang, Miin-Shen

doi:10.3390/electronics12214467

Open AccessArticle

Unsupervised Multiview Fuzzy C-Means Clustering Algorithm

by

Ishtiaq Hussain

,

Kristina P. Sinaga

and

Miin-Shen Yang

^*

Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taoyuan 32023, Taiwan

^*

Author to whom correspondence should be addressed.

Electronics 2023, 12(21), 4467; https://doi.org/10.3390/electronics12214467

Submission received: 4 October 2023 / Revised: 25 October 2023 / Accepted: 29 October 2023 / Published: 30 October 2023

(This article belongs to the Special Issue Feature Papers in Computer Science & Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

The rapid development in information technology makes it easier to collect vast numbers of data through the cloud, internet and other sources of information. Multiview clustering is a significant way for clustering multiview data that may come from multiple ways. The fuzzy c-means (FCM) algorithm for clustering (single-view) datasets was extended to process multiview datasets in the literature, called the multiview FCM (MV-FCM). However, most of the MV-FCM clustering algorithms and their extensions in the literature need prior information about the number of clusters and are also highly influenced by initializations. In this paper, we propose a novel MV-FCM clustering algorithm with an unsupervised learning framework, called the unsupervised MV-FCM (U-MV-FCM), such that it can search an optimal number of clusters during the iteration process of the algorithm without giving the number of clusters a priori. It is also free of initializations and parameter selection. We then use three synthetic and six benchmark datasets to make comparisons between the proposed U-MV-FCM and other existing algorithms and to highlight its practical implications. The experimental results show that our proposed U-MV-FCM algorithm is superior and more useful for clustering multiview datasets.

Keywords:

clustering; fuzzy c-means (FCM); multiview FCM (MV-FCM); unsupervised multiview FCM (U-MV-FCM); number of clusters

1. Introduction

Clustering is one of the fundamental techniques used to partition a dataset into clusters such that the data points in the same cluster are the most similar as each other and the data points in the different clusters are the most dissimilar [1,2]. It had been widely used in many areas [3]. Clustering may be categorized into two groups, named the parametric and non-parametric approaches. In 1993, model-based clustering was initially proposed by Banified and Raftrey [4] with its applications [5,6,7]. In the non-parametric methods, one of the renowned and pioneering partitional method is k-means [8,9]. Since Zadeh [10] first proposed fuzzy sets in 1965 where he introduced partial memberships, Ruspini [11] coined a fuzzy approach in the form of fuzzy c-partition for clustering by extending indicator membership function to allow fuzzy memberships in the interval [0, 1]. Based on fuzzy c-partitions, Dunn [12] introduced fuzzy c-means clustering as an advancement of k-means. Various extensions of FCM and its applications can be found in the literature, such as [13,14,15,16,17].

Nowadays, social media, social networks, and IoT rapidly grow, and so collected data are increasing each day. These produced data become more complex with a multiple-view way. That is, with these information and communication technologies via internet, massive numbers of data with multiview representation are generated each day. However, FCM and its extensions were used to handle (single-view) datasets. Usually, particular views capture specific aspects of information, and the evidence obtained from particular views is mutually supportive. Some researchers may extract information from different angles and then combined it. In general, the goal of multiview learning is to obtain more filtering and higher level of information. In 2001, Dhillon [18] was the first to propose co-clustering for 2-view data, and then Bickel and Scheffer [19] introduced multiview clustering for dealing multiview data. With advances in clustering for multiple-view data, many researchers have contributed to the literature, such as [20,21,22,23,24]. Recently, Huang et al. [25] used multiview for deep matrix decomposition. Chen et al. [26] used it for graph regularized least square regression. Tan et al. [27] constructed low-rank subspace multiview clustering through squeezing integrated information from cross view and each view. Yang and Sinaga [28] developed FCM clustering for multiview data using a collaborative way of feature weights. Yang and Hussain [29] proposed unsupervised multiview k-means clustering algorithm. Papakostas et al. [30] introduced an augmented reality spatial ability training application based on fuzzy weights. Papakostas et al. [31] proposed customizing spatial aptitude training in an augmented reality system using fuzzy logic. Lengyel, and Botta-Dukát [32] introduced evaluating clustering efficiency with generalized mean-based silhouette width: a flexible approach. Yang et al. [33] proposed active sensing in the categorization of visual patterns and Xu et al. [34] offered reviews on determining the number of clusters.

Multiview clustering is a technique used in data analysis and machine learning to group data points into clusters when multiple perspectives or “views” of data are available. In the context of the evolving data landscape, it holds significant relevance for several reasons, such as data complexity and diversity, improved accuracy, robustness and stability, discovery of hidden patterns, and adaptability of new data, etc. However, the aforementioned traditional multiview clustering schemes are always affected by initials and necessity to assign a prior cluster number. It is sensitive to the choice of initial condition, and it is highly dependent on the initial guess for cluster centers. These sensitivity and dependence will let algorithms be more complicated and have cost increasing. In this article, we first formulate an unsupervised schema for multiview FCM (MV-FCM) clustering. We then propose the unsupervised MV-FCM (MV-FCM) such that it can find an optimal number of clusters without assigning a prior number of clusters and can also be free of initializations and parameter selection.

The proposed U-MV-FCM algorithm presents several key innovations including the ability to seamlessly integrate information from these multiple views to improve clustering accuracy and in an unsupervised learning framework so that it does not rely on prior knowledge of the number of clusters. Another notable innovation in the U-MV-FCM algorithm is its ability to automatically determine the number of clusters. The initialization-free nature and the capacity of automatically finding the number of clusters in the U-MV-FCM algorithm make it a user-friendly and robust algorithm for clustering multiview data. These advantages reduce the burden on users in terms of parameter tuning and initializations and make it more accessible and effective in various real-world applications. Some of the notations used in the paper are shown in Table 1. The reminder of this paper is as follows. Section 2 demonstrates related works. In Section 3, we introduce an unsupervised-regularization structure for MV-FCM clustering. More parameter estimations are discussed, and then the unsupervised MV-FCM (U-MV-FCM) clustering scheme is proposed. Section 4 presents the experimental results and comparisons, with some existing algorithms. Finally, Section 5 offers some conclusions.

2. Related Works

Before delving into our proposed U-MV-FCM algorithm, it is essential to explore these related existing MV-FCM clustering algorithms. This review offers a crucial foundation for our study, as it helps us understand the state of the art in FCM and MV-FCM algorithms in the realm of single-view as well as multiview clustering, which would be also used in our experimental comparisons. By examining these prior approaches, we aim to identify their strengths, limitations, and common trends. We also review cluster validity indices for MV-FCM clustering that can be used to find the number of clusters. Let

x_{1}, \dots, x_{n}

be n data points in the Euclidean space

ℝ^{d}

and let

a_{1}, \dots, a_{c}

represents c cluster centers. Let

μ_{i k}

be the membership degree of the ith data point in the kth cluster center with

μ_{i k} \in [0, 1]

and

\sum_{k = 1}^{c} μ_{i k} = 1_{} \forall i

. The well-known FCM [12,13] with its objective function is given by:

J (U, A) = \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k}^{m} {‖x_{i} - a_{k}‖}^{2}

(1)

where

‖x_{i} - a_{k}‖

is the Euclidean distance between ith data and kth cluster, m is the fuzziness index, n denotes the data points, and c is the cluster number. The FCM algorithm is iterated through the updating equations

a_{k j} = \sum_{i = 1}^{n} μ_{i k}^{m} x_{i j} / \sum_{i = 1}^{n} μ_{i k}^{m}

and

μ_{i k} = {(\sum_{k' = 1}^{c} {(d_{i k} / d_{i k'})}^{\frac{2}{m - 1}})}^{- 1}

with

d_{i k} = ‖x_{i} - a_{k}‖

for minimizing the objective function

J (U, A)

of FCM.

Since FCM and its extensions need to give the prior number of clusters, and always depend on initializations and some parameters like fuzziness index m, Yang and Nataliani [35] gave a novel FCM algorithm, called robust-learning FCM (RLFCM), to the robust initializations and also free to the parameter selection with finding an optimal number of clusters. The objective function is as follows:

J_{R L F C M} (U, α, A^{h}) = [\sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} d_{i k}^{2} - η_{1} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln α_{k} + η_{2} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln μ_{i k} - η_{3} \sum_{i = 1}^{n} \sum_{k = 1}^{c} α_{k} \ln α_{k}]

(2)

where

η_{1,} η_{2,} η_{3} \geq 0

and

d_{i k}^{2} = {‖x_{i}^{h} - a_{k}^{h}‖}^{2} = \sqrt{\sum_{j =}^{d} {(x_{i}^{h} - a_{k}^{h})}^{2}}

. The updated equations of RLFCM are as follows: For cluster center

a_{k j}

,

a_{k j} = \sum_{i = 1}^{n} μ_{i k} x_{i j} / \sum_{i = 1}^{n} μ_{i k},

for membership

μ_{i k}

,

μ_{i k} = \exp (\frac{- d_{i k}^{2} + η_{1} \ln α_{k}}{η_{2}}) / \sum_{t = 1}^{c} \exp (\frac{- d_{i t}^{2} + η_{1} \ln α_{t}}{η_{2}}),

and for mixing proportion

α_{k}

,

α_{k}^{(n e w)} = \frac{1}{n} \sum_{i = 1}^{n} μ_{i k} + \frac{η_{3}}{η_{1}} α_{k}^{(o l d)} (\ln α_{k}^{(o l d)} - \sum_{t = 1}^{c} α_{t}^{(o l d)} \ln α_{t}^{(o l d)}),

where

η_{1} = e^{- \frac{t}{150}},

η_{2} = e^{- \frac{t}{400}} .

In this study, we borrow the idea of Yang and Nataliani [35] to propose the unsupervised multiview FCM clustering algorithm.

Although Bickel and Scheffer [19] proposed multiview clustering for the first time to handle multiview data, Cleuziou et al. [20] was a pioneer in developing an advance version of FCM, known as multiview FCM by augmenting more information among different views, and they also created Co-FKM. Let

X = \{x_{1}, \dots, x_{n}\}

be a multiview dataset with

x_{i} = {\{x_{i}^{h}\}}_{h = 1}^{s},

x_{i}^{h} \in ℝ^{d_{h}},

and

x_{i}^{h} = {\{x_{i j}^{h}\}}_{j = 1}^{d_{h}} .

The Co-FKM algorithm [20] used the collaborative idea of Pedrycz [36] with a sequencing strategy. To handle the multiview data, the Co-FKM combines two strategies. In the first strategy, the average disagreement term between any pair of the views, which are given as

Δ = \frac{1}{s - 1} (\sum_{h' = 1, h' \neq h}^{s} \sum_{i = 1}^{n} \sum_{k = 1}^{c} ({(u_{i k}^{h'})}^{m} - {(u_{i k}^{h})}^{m}) {(d_{i k}^{h})}^{2})

, is integrated into the following objective function:

J_{C o F K M} (U, A) = \sum_{h = 1}^{s} \sum_{i = 1}^{n} \sum_{k = 1}^{c} {(μ_{i k}^{h})}^{m} {(d_{i k}^{h})}^{2} + η Δ

(3)

where

d_{i k}^{h} = ‖x_{i}^{h} - a_{k}^{h}‖ = \sqrt{\sum_{j = 1}^{d_{h}} {(x_{i j}^{h} - a_{k j}^{h})}^{2}},

and

η

is used to control the weight of the disagreement term. Thus, by minimizing the summation of the FCM objective function for each view and the pair disagreement term

Δ

, the membership of each view is achieved. The second strategy is then applied, in which the final consensus fuzzy membership is generated by calculating the geometric mean of memberships of all views with

μ_{i k} = \sqrt[s]{\prod_{h = 1}^{s} μ_{i k}^{h}} .

Another extension to multiview FCM clustering with weighted views was proposed by Jiang et al. [21], called WV-Co-FCM, where they considered different weights per view, by considering the parameters to regulate the distribution of each view weights. The objective function of the WV-Co-FCM is given as

J_{W V C o F C M} (U, A, V) = \sum_{h = 1}^{s} v_{h} [\sum_{i = 1}^{n} \sum_{k = 1}^{c} {(μ_{i k}^{h})}^{m} {(d_{i k}^{h})}^{2} + Δ_{h}] + λ \sum_{h = 1}^{s} v_{h} \ln v_{h}

(4)

where

v_{h}

is the hth view weight with

\sum_{h = 1}^{s} v_{h} = 1

,

v_{h} \in [0, 1],

\sum_{k = 1}^{c} μ_{i k} = 1

μ_{i k} \in [0, 1] .

λ > 0

is used to regulate the view weights,

d_{i k}^{h} = \sqrt{\sum_{j = 1}^{d_{h}} {(x_{i j}^{h} - a_{k j}^{h})}^{2}}

and

Δ_{h} = \sum_{i = 1}^{n} α_{i k}^{h} \sum_{k = 1}^{c} μ_{i k}^{h} (1 - {(μ_{i k}^{h})}^{m - 1}) - \sum_{i = 1}^{n} β_{i k}^{h} \sum_{k = 1}^{c} μ_{i k}^{h} (1 - {(μ_{i k}^{h})}^{m - 1}),

where

α_{i k}^{h} = η {(d_{i k}^{h})}^{2}

, and

β_{i k}^{h}

has four cases including

β_{i k}^{h} = η \frac{1}{s - 1} \sum_{h' = 1, h' \neq h}^{s} {(d_{i k}^{h'})}^{2},

β_{i k}^{h} = \frac{η}{s} \sum_{h = 1}^{s} {(d_{i k}^{h})}^{2},

β_{i k}^{h} = η \min_{h' \neq h} \{{(d_{i k}^{h'})}^{2}\},

and

β_{i k}^{h} = η \sqrt[s - 1]{\prod_{h' \neq h}^{} {(d_{i k}^{h'})}^{2}}

, and

0 < η < 1

is a parameter used to control the penalty related to the disagreement. The term,

Δ_{h},

aims at reducing the disagreement between organizations on different views. The clustering process of the WV-Co-FCM is based on minimizing the objective function that highlights the fuzzy partition for a dataset using the collaborative clustering technique, and the weight of each view that can be achieved by adding the entropy regularization term. The WV-Co-FCM is used to tackle the multiview data and the weights are utilized in the last step. That is, the membership for each object and cluster center is updated independently with the influence of weights.

The multiview FCM clustering algorithm without the collaboration step that considered different weights for its view is called MinMax-FCM, and it was proposed by Wang and Chen [22]. The MinMax-FCM is constructed based on the single-view FCM, which measures the minimum distance between membership matrix

U^{*}

and the cluster centers in each view

A^{h}

. Here, the maximum value within views

V^{h}

impacts the minimum separation between

U^{*}

and

A^{h} .

In this sense, the unanimous clustering results in the MinMax-FCM are produced by the MinMax optimization, in which the dissimilarities of the diverse views are reduced. The MinMax-FCM [22] is formulated as:

\min_{U^{*}, {\{A^{h}\}}_{h = 1}^{s}} \max_{{\{v_{h}\}}_{h = 1}^{s}} \sum_{h = 1}^{s} {(v_{h})}^{β} Q^{h}

(5)

where

Q^{h} = \sum_{i = 1}^{n} \sum_{k = 1}^{c} {(u_{i k}^{*})}^{m} {‖x_{i}^{h} - a_{k}^{h}‖}^{2}

subject to

\sum_{k = 1}^{c} u_{i k}^{*} = 1,

1 \leq i \leq n,

u_{i k}^{*} \geq 0 \forall i, k,

\sum_{h = 1}^{s} v_{h} = 1

and

v_{h} \geq 0,

1 \leq h \leq s .

The expression

{(u_{i k}^{*})}^{m} {‖x_{i}^{h} - a_{k}^{h}‖}^{2}

is considered as the cost of the hth view, which is the FCM objective function.

v_{h}

is the weight of hth view. The parameter

β \in (0, 1)

controls the distribution of weights

v_{h}

for different views, and

m > 1

is the fuzzifier for fuzzy clustering, which controls the fuzziness of the membership.

In Co-FKM, there are two strategies. The first strategy is as follows: the average disagreement between any pairs of the views is integrated into the objective function. The second strategy is applied in which the final consensus fuzzy membership is generated by calculating the geometric mean of the membership of all views. Based on similar strategies as applied in the Co-FKM, the WV-Co-FCM was proposed to handle multiview data. Like the Co-FKM, the fuzzy membership for each object in each view is first calculated in the WV-Co-FCM. An additional step is needed to calculate the final consensus membership. There are mainly three differences between the aforementioned two approaches. First, instead of using standard FCM, the WV-Co-FCM is based on GIFP-FCM (Zhu et al. [37]) in which the entropy is added to enhance the fuzzy membership. Second, the weight for each view is considered in the WV-Co-FCM and the entropy regularization term of the weight is introduced into the objective function. Third, instead of using the geometric mean in Co-FKM, the final consensus membership is generated based on the weight of each view. In the MinMax-FCM, the consensus clustering results are generated based on the MinMax optimization, in which the maximum disagreements of the different weighted views are minimized. According to Wang and Chen [22], the MinMax-FCM generates harmonic consensus clustering results integrating the heterogeneous views of data on the consensus memberships.

In order to make comparisons with the state of the art in multiview clustering algorithms, the algorithms discussed above have been used. The key relationships among them are summarized as follows: the Co-FKM serves as the foundational algorithm, and the WV-Co-FCM is a variation with additional features, while the MinMax-FCM is a separate algorithm that focuses on minimizing disagreements to achieve consensus clustering. Each algorithm has its unique characteristics and strategies for handling multiview data. In the multiview clustering algorithms, they always depend on initializations to achieve good results. All three multiview algorithms of Co-FKM, WV-Co-FCM, and MinMax-FCM try to calculate the consensus membership in different ways. The three multiview algorithms required a number of clusters a priori with the parameter selection. In fact, we need to have an algorithm that can tackle these issues and automatically find the number of clusters. This is our goal in this research.

In general, cluster validity indices (CVIs) are used to assess the suitability of the partitions shaped by clustering algorithms and are commonly used to determine an appropriate clusters number for the datasets [12]. The most used fuzzy CVIs linked to FCM are the partition coefficient (PC) [38] and the partition entropy (PE) [39], which both only use fuzzy memberships obtained by the FCM algorithm. Since the geometry of data cannot be taken into consideration by these indices, the fuzzy hypervolume (FHV) was introduced by Gath and Geva [40]. The XB validity index introduced by Xie and Beni [41] is based on fuzzy memberships and the geometrical structure of data with its objective function, and Wu et al. [42] gave the robust-type CVIs. However, the FHV and XB validity indices are not fitted for multiview FCM clustering algorithms even though we modify them by considering the multiview FCM objective functions. In this sense, for the multiview FCM clustering algorithms, we use PC, PE, and MPC as the validity indices.

The PC [38] is calculated by taking the overall summation of squared fuzzy memberships

μ_{i k}^{2}

. The total data points n is embedded to the controlling weights. Thus, PC associated with fuzzy c-memberships is defined as

P C (c) = \frac{1}{n} \sum_{k = 1}^{c} \sum_{i = 1}^{n} μ_{i k}^{2}

(6)

It is known that

1 / c \leq P C (c) \leq 1

. An optimal number c is determined by solving

\max_{2 \leq c \leq n - 1} P C (c)

. To normalize

P C (c)

such that it is between 0 and 1, Roubens [43] modified

P C (c)

as

M P C (c) = 1 - \frac{c}{c - 1} (1 - P C (c))

(7)

On the other hand, PE [39] is estimated by taking the entropy of fuzzy c-memberships

μ_{i k}

. The PE associated with fuzzy c-memberships is expressed as

P E (c) = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \log_{2} μ_{i k}

(8)

Generally, we determine an optimal c by simplifying

\min_{2 \leq c \leq n - 1} P E (c)

to obtain an optimal cluster number.

3. The Proposed Unsupervised Multiview Fuzzy C-Means Clustering Algorithm

The existing clustering algorithms of k-means and FCM are the most popular among single-view data. It is a common fact that FCM is a fuzzy extension of the k-means clustering algorithm. However, most multiview clustering algorithms need prior information such as a number of clusters and the parameter selection. Therefore, there is a need for an algorithm that can handle all these issues and automatically find an optimal number of clusters in clustering the multiview data. For filling this important research gap, we introduce a novel U-MV-FCM algorithm that has the capacity to address the drawbacks of these existing algorithms. That has been demonstrated as follows.

Let

X = \{x_{1}, \dots, x_{n}\}

be a multiview dataset with

x_{i} = {\{x_{i}^{h}\}}_{h = 1}^{s}

,

x_{i}^{h} \in ℝ^{d_{h}},

and

x_{i}^{h} = {\{x_{i j}^{h}\}}_{j = 1}^{d_{h}},

i = 1, \dots, n

. Let

V = {[v_{h}]}_{1 \times s}

, where

v_{h}

is the hth view weight and

A^{h} = \{a_{1}^{h}, \dots, a_{c}^{h}\}

is the set of the c cluster centers in the hth view with

a_{k}^{h} = {a_{k j}^{h}}, j = 1, \dots, d_{h}

. Let

U = {[μ_{i k}]}_{n \times c}

be the fuzzy c-membership matrix with

μ_{i k} \in [0, 1]

and

\sum_{k = 1}^{c} μ_{i k} = 1 \forall i

where

μ_{i k}

is interpreted as the agreed membership of ith data point in the cluster centers k shared across different view h. As the prime objective of our proposed unsupervised-regularization structure is to determine an optimal number of clusters for multiview FCM, we use the concept of robust-learning FCM proposed by Yang and Nataliani [35]. We first design an adequate membership architecture across different views to the entire multiview dataset, and then add more penalty terms to construct the unsupervised-regularization structure. Thus, we construct the following unsupervised multiview FCM (U-MV-FCM) objective function:

J_{U - M V F C M} (U, A^{h}, V, α) = [\sum_{h = 1}^{s} v_{h}^{β} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h} - a_{k}^{h}‖}^{2} - η_{1} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln α_{k} + η_{2} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln μ_{i k} - η_{3} \sum_{i = 1}^{n} \sum_{k = 1}^{c} α_{k} \ln α_{k}]

(9)

subject to

\sum_{k = 1}^{c} μ_{i k} = 1, \forall i, μ_{i k} \in [0, 1]

,

\sum_{h = 1}^{s} v_{h} = 1, v_{h} \in [0, 1],

and

\sum_{k = 1}^{c} α_{k} = 1, α_{k} \in [0, 1]

, where

{‖x_{i}^{h} - a_{k}^{h}‖}^{2} = \sum_{j = 1}^{d_{h}} {(x_{i j}^{h} - a_{k j}^{h})}^{2},

η_{1}, η_{2}, η_{3} > 0,

and

β

is a view-weight exponent. We mention that

α_{k}

is used to represent the probability of the kth cluster in the c clusters that will be utilized to determine an optimal number of clusters. The first term of

J_{U - M V F C M} (U, A^{h}, V, α)

is the only simple objective function of the multiview FCM, and the other three penalty terms are considered as unsupervised-regularization by using the parameters

η_{1}, η_{2},

and

η_{3}

to adjust these penalties. We will give the estimates of

η_{1}, η_{2}, η_{3}

and

β

later.

In general, it is challenging to simplify the variables

μ_{i k}

,

v_{h}

,

a_{k}^{h}

, and

α_{k}

in Equation (9) directly, as Equation (9) is nonconvex. Yet we observe that the objective function is convex with respect to

μ_{i k}, v_{h}, α_{k}

and is concave with respect to

a_{k}^{h}

and therefore alternative optimization (AO) is utilized for the optimization problem by fixing other variables while solving the one.

Theorem 1.

The updating equations to find the necessary conditions for minimization of the U-MV-FCM objective function

J_{U - M V F C M} (U, A^{h}, V, α)

of Equation (9) are

μ_{i k} = \frac{e^{\frac{1}{η_{2}} (η_{1} \ln α_{k} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k}^{h}‖}^{2})}}{\sum_{k' = 1}^{c} e^{\frac{1}{η_{2}} (η_{1} \ln α_{k'} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k'}^{h}‖}^{2})}}

(10)

v_{h} = {({\sum_{h^{'} = 1}^{s} (\frac{\sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h} - a_{k}^{h}‖}^{2}}{\sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h'} - a_{k}^{h'}‖}^{2}})}^{\frac{1}{β - 1}})}^{- 1}

(11)

a_{k j}^{h} = \sum_{i = 1}^{n} μ_{i k} x_{i j}^{h} / \sum_{i = 1}^{n} μ_{i k}

(12)

α_{k}^{(t + 1)} = \frac{1}{n} \sum_{i = 1}^{n} μ_{i k} + \frac{η_{3}}{η_{1}} α_{k}^{^{(t)}} (\ln α_{k}^{^{(t)}} - \sum_{k' = 1}^{c} α_{k'}^{^{(t)}} \ln α_{k'}^{^{(t)}})

(13)

Proof.

The Lagrangian Multiplier technique is employed to resolve the optimization problem regarding the proposed U-MV-FCM w.r.t.

μ_{i k}, v_{h,} a_{k}^{(h)}

and

α_{k} .

The Lagrangian for

J_{U - M V F C M} (U, A^{h}, V, α)

is expressed as

\tilde{J} (V, A^{h}, U, α) = J_{U - M V F C M} - λ_{1} (\sum_{k = 1}^{c} μ_{i k} - 1) - λ_{2} (\sum_{k = 1}^{c} α_{k} - 1) - λ_{3} (\sum_{h = 1}^{s} v_{h} - 1) .

By taking the partial derivative of the Lagrangian

\tilde{J} (V, A^{h}, U, α)

w.r.t.

μ_{i k}

equal to zero, we obtain

\frac{\partial J}{\partial μ_{i k}} = \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k}^{h}‖}^{2} - η_{1} \ln α_{k} + η_{2} (\ln μ_{i k} + 1) - λ_{1} = 0 .

Then, we have

\ln μ_{i k} = \frac{1}{η_{2}} (η_{1} \ln α_{k} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k}^{h}‖}^{2} + λ_{1} - η_{2}) .

Thus, we have

μ_{i k} = e^{\frac{1}{η_{2}} (η_{1} \ln α_{k} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k}^{h}‖}^{2})} e^{\frac{λ_{1} - η_{2}}{η_{2}}} .

Since

\sum_{k = 1}^{c} μ_{i k} = 1,

we obtain

e^{\frac{λ_{1} - η_{2}}{η_{2}}} = 1 / \sum_{k = 1}^{c} e^{\frac{1}{η_{2}} (η_{1} \ln α_{k^{'}} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k^{'}}^{h}‖}^{2})} .

Hence, the updating equation for

μ_{i k}

can be obtained as

μ_{i k} = e^{\frac{1}{η_{2}} (η_{1} \ln α_{k} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k}^{h}‖}^{2})} / \sum_{k = 1}^{c} e^{\frac{1}{η_{2}} (η_{1} \ln α_{k^{'}} - \sum_{h = 1}^{s} v_{h}^{β} {‖x_{i}^{h} - a_{k^{'}}^{h}‖}^{2})}

and the updating Equation (10) for

μ_{i k}

is obtained. Similarly, the partial derivative of the Lagrangian

\tilde{J} (V, U, A^{h}, α)

w.r.t.

v_{h}

and if we make it equal to zero, we can obtain the equation

\frac{\partial \tilde{J}}{\partial v_{h}} = β v_{h}^{β - 1} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h} - a_{k}^{h}‖}^{2} - λ_{3} = 0 .

Thus, we have

v_{h} = {(λ_{3})}^{\frac{1}{β - 1}} {(β \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h} - a_{k}^{h}‖}^{2})}^{- \frac{1}{β - 1}} .

Since

\sum_{h^{'} = 1}^{s} v_{h} = 1, v_{h} \in [0, 1],

we obtain

{(λ_{3})}^{\frac{1}{β - 1}} = {({(β \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} {‖x_{i}^{h^{'}} - a_{k}^{h^{'}}‖}^{2})}^{- \frac{1}{β - 1}})}^{- 1} .

Thus, the updating Equation (11) for

v_{h}

is obtained. The derivative of the U-MV-FCM objective function

J_{U R - M V F C M}

w.r.t.

a_{k}^{h}

and equal to zero, we can obtain the equation

\frac{\partial J_{U - M V F C M}}{\partial a_{k j}^{h}} = - 2 v_{h}^{β} \sum_{i = 1}^{n} μ_{i k} (x_{i j}^{h} - a_{k j}^{h}) = 0,

and then

\sum_{i = 1}^{n} μ_{i k} x_{i j}^{h} = \sum_{i = 1}^{n} μ_{i k} a_{k j}^{h} .

Thus, the updating Equation (12) for

a_{k}^{h}

can be obtained as

a_{k j}^{h} = \sum_{i = 1}^{n} μ_{i k} x_{i j}^{h} / \sum_{i = 1}^{n} μ_{i k} .

We next take the partial derivative of Lagrangian

\tilde{J} (V, A^{h}, U, α)

w.r.t.

α_{k}

and make them equal to zero. We obtain the equation

\frac{\partial J}{\partial α_{k}} = - η_{1} \sum_{i = 1}^{n} μ_{i k} (\frac{1}{α_{k}}) - η_{3} n (\ln α_{k} + 1) - λ_{2} = 0,

and then

- η_{1} \sum_{i = 1}^{n} μ_{i k} - n η_{3} α_{k} \ln α_{k} - n η_{3} α_{k} - λ_{2} α_{k} = 0 .

Thus, we have

- η_{1} \sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} - n η_{3} \sum_{k = 1}^{c} α_{k} \ln α_{k} - n η_{3} \sum_{k = 1}^{c} α_{k} - λ_{2} \sum_{k = 1}^{c} α_{k} = 0 .

with

λ_{2} = - n η_{1} - n η_{3} \sum_{k = 1}^{c} α_{k} \ln α_{k} - n η_{3} .

We obtain

- η_{1} \sum_{i = 1}^{n} μ_{i k} - n η_{3} α_{k} \ln α_{k} - n η_{3} α_{k} - α_{k} (- n η_{1} - n η_{3} \sum_{k^{'} = 1}^{c} α_{k^{'}} \ln α_{k^{'}} - n η_{3}) = 0

and then we find the updating Equation (13) for

α_{k}

as

α_{k}^{(t + 1)} = \frac{1}{n} \sum_{i = 1}^{n} μ_{i k} + \frac{η_{3}}{η_{1}} α_{k}^{(t)} (\ln α_{k}^{(t)} - \sum_{k^{'} = 1}^{c} α_{k^{'}}^{(t)} \ln α_{k^{'}}^{(t)}),

where t denotes the iteration number in the algorithm. □

The data are commonly used as a potential source for reaching an educated decision. In a multiview data scenario, the high-level detail information is used to capture the correlation between the data features in the hth view. It is known that determining an optimal number of clusters for the multiview FCM clustering algorithm is quite difficult, and no work has been written yet in the literature. This process usually requires much more effort to understand the data, especially in real world application. To make proper assumptions, it is essential to have domain expertise in a specific application. In practice, to assign a specific value to the number c of clusters is needed on the base of proprietary data insights. However, different datasets may result in different c values. It becomes difficult in real applications, like biological applications with Prokaryotic data analysis. This variability becomes particularly prominent. Thus, the regularization of the proposed U-MV-FCM clustering algorithm is designed to accommodate varying preferences and conceptual frameworks (Figure 1). It is known that researchers had investigated the Prokaryotic dataset with its actual number c as 4. Based on these insights, the users will specify c = 4 in the implementation of most MV-FCM clustering algorithms for Prokaryotic data. However, without giving the number c of clusters, U-MV-FCM can potentially obtain the optimal number of clusters with c = 4.

In the U-MV-FCM objective function

J_{U - M V F C M} (U, A^{h}, V, α),

we have four parameters

η_{1}, η_{2}, η_{3}

and

β

. We now use two artificial datasets to analyze the behaviors of these parameters and then give their estimations.

Artificial Data 1.

A two-view numerical dataset with two clusters & two feature components is considered. Gaussian mixture model (GMM) is used to generate a two-component two-variate data point for each view where their probability proportions are

α_{1}^{(1)} = α_{1}^{(2)} = 0.7

and

α_{2}^{(1)} = α_{2}^{(2)} = 0.3 .

The means

μ_{k}^{(1)}

for the first view are (5, 6) and (12, 6). The means

μ_{k}^{(2)}

for the second view are (6, 12) & (6, 5). For the two views, covariance matrices are

\sum_{1}^{(1)} = \sum_{1}^{(2)} = \sum_{2}^{(1)} = \sum_{2}^{(2)} = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}) .

The

x_{1}^{(1)}

and

x_{2}^{(1)}

are the coordinates for the first view,

x_{1}^{(2)}

&

x_{2}^{(2)}

denotes the coordinates for the second view, as displayed in Figure 2a and Figure 2b, respectively.

Artificial Data 2.

A three-view numerical dataset with five clusters and two feature components is considered. The data points in each view are generated from a two-component two-variate Gaussian mixture model (GMM) with their mixing proportions

α_{1}^{(1)} = α_{1}^{(2)} = α_{1}^{(3)} = 0.1

,

α_{2}^{(1)} = α_{3}^{(1)} = α_{2}^{(2)} = α_{3}^{(2)} = α_{2}^{(3)} = α_{3}^{(3)} = 0.25

and

α_{4}^{(1)} = α_{5}^{(1)} = α_{4}^{(2)} = α_{5}^{(2)} = α_{4}^{(3)} = α_{5}^{(3)} = 0.2 .

The means

μ_{k}^{(1)}

for the first view are

(\begin{matrix} 4.580 & 3.344 \end{matrix}), (\begin{matrix} 3.8730 & 2.637 \end{matrix}), (\begin{matrix} 1 . 752 & 1 . 930 \end{matrix}), (\begin{matrix} 1 . 752 & 3 . 344 \end{matrix}), (\begin{matrix} 1 . 044 & 2 . 637 \end{matrix}) .

The means

μ_{k}^{(3)}

for the third view are

(\begin{matrix} 2 & 5 \end{matrix}), (\begin{matrix} 0 . 65 & 0 . 67 \end{matrix}),

(\begin{matrix} 3 . 10 & 2 . 48 \end{matrix}),

(\begin{matrix} 2 & 3 . 5 \end{matrix}), (\begin{matrix} 1 & 2 . 48 \end{matrix}) .

These

x_{1}^{(1)}

&

x_{2}^{(1)}

are the coordinates for the first view,

x_{1}^{(2)}

&

x_{2}^{(2)}

denotes the second view coordinates, and

x_{1}^{(3)}

&

x_{2}^{(3)}

denotes the third view coordinates, as presented in Figure 3a–c, respectively. As can be seen, the two views of the

x_{1}^{(1)}

&

x_{2}^{(1)}

and

x_{1}^{(2)}

&

x_{2}^{(2)}

are mirror images of each other. So it is clear that the data points

x_{1}^{(2)}

&

x_{2}^{(2)}

for the second view are generated on the basis of inverse data points of

x_{1}^{(1)}

&

x_{2}^{(1) .}

Parameter

β

: We know that the parameter

β

is a view-weight exponent. Our main idea is to find an optimal clusters number and also to estimate the importance of different views in multiview datasets. When we simulate with different values of

β

for the Artificial Datasets 1 and 2, we find that it makes a difference for the proposed U-MV-FCM. That is, if the values are too small or too large, then the overall accuracy rates will be decreasing. Intuitively, a small or a large

β

can represent an imbalanced importance for one view. Thus, the issue is: how to estimate

β

so that it can represent the importance of different views. This question can be answered by designing an appropriate balancing estimator for

β

. In this sense, a balance movement of

β

values can control the distribution of one view in the multiview scenario for simultaneously finding a good number of clusters. To test this hypothesis, we implement our balancing estimator of

β

by using the above Artificial Datasets 1 and 2. This idea can be addressed by considering the minimum weights of the probability proportion k into account. As at every iteration the values of mixing proportion k is changed, the necessary condition estimating

β

is to have a sufficient number that generated from

μ_{i k}, a_{k}^{(t)}

and

a_{k^{'}}^{(t)} .

In machine learning, it is a common knowledge that

π

can be used to measure the identifiability of the data. Therefore, the view-weight exponent

β

will algebraically consider the minimal solution of

α_{k}^{^{(t - 1)}}

and then construct it with

π

value. In this sense, the infinite series of

π

is suitable to adjust the dimensions of each view into well-structured shapes. The values of

β

based on the movement of the probability proportion kth shared across different views combined with

π

value can be presented in the form

β^{(t)} = 2 π \arg \min_{1 \leq k \leq c} [α_{k}^{^{(t - 1)}}]

(14)

where

\arg \min_{1 \leq k \leq c} []

stands for the argument for which the minimum of

α_{k}^{^{(t - 1)}}

is attained, and that is a real and positive number. To be noted

α_{k}^{^{(t + 1)}}

it must be derived based on the constant

n, η_{1}, η_{3},

μ_{i k,} α_{k}^{^{(t)}},

and

α_{k^{'}}^{^{(t)}} .

In this sense, the value of

β

not only depends on the input of the k and k’ current probability proportion

α_{k}^{^{(t)}}, α_{k^{'}}^{^{(t)}},

but also depends on the k and k’ value of the probability proportion at the previous time

α_{k}^{^{(t - 1)}}, α_{k^{'}}^{^{(t - 1)}} .

Using

π

and the minimal solution of

α

to estimate

β

enables the proposed U-MV-FCM algorithm to simultaneously approximate and reproduce the qualifying number of clusters such that the Euclidean distances

d_{i k} = ‖x_{i} - a_{k}‖

of hth components are minimized while

α_{k}^{^{(t)}}

is maximized. In this sense, the data points will be well-structured and easy to be classified.

Parameters

η_{1}, η_{2}

: One of the motivations for this simulation of balancing the parameters is to tune the level in determining the number of clusters in each iteration. A good combination value between

η_{1}

and

η_{2}

needs to be in general stable, scalable, and fit to any types of multiview data. Usually, the functions of

e^{- t},

e^{- t / 150},

e^{- t / 250},

e^{- t / 400},

e^{- t / 600}

and

e^{- t / 1000}

are used as the learning functions in which

y = e^{- t}

decreases faster, but

e^{- t / 1000}

decreases slower at each iteration t. According to the penalty term

\sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln α_{k}

, it has effects on

μ_{i k}

and

α_{k}

. Thus, the learning behavior for

η_{1}

should not decrease too slowly or too fast. Further, the penalty term

\sum_{i = 1}^{n} \sum_{k = 1}^{c} μ_{i k} \ln μ_{i k}

is the entropy on

μ_{i k}

, and so the learning function for

η_{2}

should decrease fast. Thus, the decreasing learning functions for

η_{1}

and

η_{2}

are chosen as follows:

η_{1} = e^{- \frac{t}{150}}

(15)

η_{2} = e^{- \frac{t}{400}}

(16)

The terms in (15) and (16) can accelerate the process in finding the number of clusters during iterations in the algorithm. To address this issue, taking the square root of the iteration t enables the U-MV-FCM algorithm to reduce the number of clusters in each iteration. To assist the smooth movement, we use the following

η_{1}

and

η_{2}

for the U-MV-FCM algorithm:

η_{1} = e^{- \frac{t}{150}} \frac{1}{\sqrt{t}}

(17)

η_{2} = e^{- \frac{t}{400}} \frac{1}{\sqrt{t}}

(18)

After updating the number of clusters c, the remaining mixing proportion

α_{k'}

and corresponding

μ_{i k'}

need to be re-normalized by

α_{k'} = \frac{α_{k'}}{\sum_{l = 1}^{c^{t + 1}} α_{l}}

(19)

μ_{i k'} = \frac{μ_{i k'}}{\sum_{l = 1}^{c^{t + 1}} μ_{i l}}

(20)

For the two parameters

η_{1}

and

η_{2}

in Figure 4, we consider four decreasing learning rates mentioned in Equations (15)–(18). From Figure 4a,b, we can see that, as the starting steps, the decreasing rates based on Equations (15) and (16) are slower than that based on Equations (17) and (18). However, the decreasing rates based on Equations (15) and (16) are faster than that based on Equations (17) and (18) when the algorithm becomes stable. In this sense, we suppose that the learning rates of Equations (17) and (18) should be better than the learning rates of Equations (15) and (16) for our proposed U-MV-FCM algorithm. For further observations, we represent the processes of the newly proposed U-MV-FCM in determining the optimal number of clusters c* by using the learning rates of Equations (17) and (18) as presented in Figure 4c,d, in which Figure 4c is for Artificial Dataset 1 and Figure 4d is for Artificial Data 2. It is clear that Artificial Data 1 initially uses n data points as the number of clusters c, i.e., c = n = 600. After the second iteration, the number of clusters reduces from 600 to 327 and then it stops at the iteration t = 9 with the optimal number of clusters c* = 2, as portrayed in Figure 4c. Given the result for Artificial Data 2 with initial c = n = 500, it successfully shows that the proposed U-MV-FCM can produce the optimal number c* = 5 of clusters at the iteration t = 12, as shown in Figure 4d. These results reflect the parameters

η_{1}

and

η_{2}

with Equations (17) and (18) can obtain the correct optimal number of clusters for Artificial Data 1 and 2. In general, the parameters

η_{1}

and

η_{2}

with Equations (17) and (18) as the learning rates are more efficient than those with Equations (15) and (16) for the proposed U-MV-FCM. In this sense, we use Equations (17) and (18) as the learning rates in our proposed U-MV-FCM algorithm.

Parameter

η_{3}

: Furthermore, the Parameter

η_{3}

is the balancing parameter to control the competition. Here, we directly use the restriction of

- e^{- 1} \leq α_{k} \ln α_{k} < 0 .

First, if

0 < α_{k} \leq 1_{} \forall k

and

E = \sum_{l = 1}^{c} α_{l} \ln α_{l} < 0,

then we have

α_{k} E = α_{k} \sum_{l = 1}^{c} α_{l} \ln α_{l} < 0 .

Second, using

- e^{- 1} \leq α_{k} \ln α_{k} < 0

and

α_{k} E = α_{k} \sum_{l = 1}^{c} α_{l} \ln α_{l} < 0,

we obtain

- e^{- 1} η_{3} < η_{3} α_{k} (\ln α_{k} - \sum_{l = 1}^{c} α_{l} \ln α_{l}) < η_{3} (- α_{k} E) .

Third, we apply

\sum_{k = 1}^{c} α_{k} = 1

and

α_{k} < 1 / 2,

then we can obtain

(\ln α_{k} - \sum_{l = 1}^{c} α_{l} \ln α_{l}) < 0 .

Fourth, because

α_{k} > 0

then

- e^{- 1} η_{3} > - \max \{α_{k} | α_{k} < 1 / 2, k = 1, 2, \dots, c\} .

So we obtain

η_{3} < \max \{α_{k} e | α_{k} < 1 / 2, k = 1, 2, \dots, c\} < e / 2 .

Note that

η_{3}

is a solution to the matric

α_{k}

, then justifying an appropriate input value of this

η_{3}

is essential. The estimation of

η_{3}

’s can easily be formulated, if the dispersion between

α_{k}^{^{(t + 1)}}

and

α_{k}^{^{(t)}}

is not too small, and to enhance the competition, it must be large. If the dispersion between

α_{k}^{^{(t + 1)}}

and

α_{k}^{^{(t)}}

is large, then to maintain stability

η_{3}

will be small. Thus, we introduce an updating equation for

η_{3}

as

η_{3} = \frac{\sum_{k = 1}^{c} \exp \{- r n |α_{k}^{t + 1} - α_{k}^{t}|\}}{c},

(21)

In addition, we consider the restriction

\max_{1 \leq k \leq c} α_{k}^{^{(t + 1)}} \leq 1 .

But,

\max_{1 \leq k \leq c} α_{k}^{^{(t + 1)}} \leq \max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}) + \frac{η_{3}}{η_{1}} \max_{1 \leq k \leq c} α_{k}^{^{(t)}} (\ln \max_{1 \leq k \leq c} α_{k}^{^{(t)}} - \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}}) \max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}) + η_{3} \max_{1 \leq k \leq c} α_{k}^{^{(t)}} (- \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}})],

[\max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}) + \frac{η_{3}}{η_{1}} \max_{1 \leq k \leq c} α_{k}^{^{(t)}} (\ln \max_{1 \leq k \leq c} α_{k}^{^{(t)}} - \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}}) < \max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}) + η_{3} \max_{1 \leq k \leq c} α_{k}^{^{(t)}} (- \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}})] .

Thus, if

\max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}) + η_{3} \max_{1 \leq k \leq c} α_{k}^{^{(t)}} (- \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}}) \leq 1,

then the restriction will be held. It employs that

η_{3} \leq \frac{(1 - \max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k}))}{(- \max_{1 \leq k \leq c} α_{k}^{^{(t)}} \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}})}

(22)

Thus, by combining Equations (21) and (22), we obtain

η_{3} = \min (\frac{\sum_{k = 1}^{c} \exp (- r n |α_{k}^{^{(t + 1)}} - α_{k}^{^{(t)}}|)}{c}, \frac{1 - \max_{1 \leq k \leq c} (\frac{1}{n} \sum_{i = 1}^{n} μ_{i k})}{(- \max_{1 \leq k \leq c} α_{k}^{^{(t)}} \sum_{l = 1}^{c} α_{l}^{^{(t)}} \ln α_{l}^{^{(t)}})})

(23)

We use Figure 5 to demonstrate the parameter sensitivity of

η_{1}, η_{2}, η_{3},

and

β

for Artificial Datasets 1 and 2.

The formulation of multiview learning is significant as the growing of data is massive. Each data point can be represented by different views and hence various restrictions including the parameter estimations are required. In the U-MV-FCM, we propose several parameter estimators to explore patterns in the multiview data. There are many possible combinations of the parameters to process the data, but not all combinations can perform well. Thus, the multiview data can be patterned by using hundreds of possible combinations of the parameters before choosing the one that reveals meaningful insights behind the data. Our proposed U-MV-FCM aims to find the optimal number of clusters with high-quality cluster centroids in an efficient manner. One of the important parameters to determine the success of the U-MV-FCM is

r .

In relation to

η_{3}

estimator in (23), we also consider

r

as a nonzero element. We then formulate

r

as

\underset{1 \leq h \leq s}{r = \min} \{\min \{1, 2 / t * d_{h}^{⌊t * d_{h} / 2 - 1⌋}\}\},

where

⌊a⌋

denotes the largest integer that is no more than

a .

Thus, the proposed U-MV-FCM clustering algorithm is presented in Algorithm 1:

Algorithm 1. The U-MV-FCM clustering algorithm

Input: Dataset

X = \{x_{1}, \dots, x_{n}\}

with

x_{i} = {\{x_{i}^{h}\}}_{h = 1}^{s}

and

x_{i}^{h} = {\{x_{i j}^{h}\}}_{j = 1}^{d_{h}}

.
Output:

a_{k j}^{h}, μ_{i k},

v_{h}

and

α_{k}

.
Initialization: Give initial

c^{(0)} = n

,

α_{k}^{(0)} = 1 / n

,

a_{k, h}^{(0)} = x_{i}

, initial

η_{3}^{(0)} = 1

, initialize view
weight

V^{(t)} = {[v_{h}]}_{1 \times s}

(user may define

v_{h} = 1 / s \forall h

), and set iteration counter t = 0,
and fit

ε > 0

.
Step 1:

Calculate η_{1}^{(t)}, η_{2}^{(t)}

by Equations (17) and (18).
Step 2:

Calculate β^{(t)}

by Equation (14).
Step 3: Compute the membership

U^{^{(t)}}

using

A^{h^{(t - 1)}}

,

V^{h^{(t - 1)}}

,

α^{^{(t - 1)}}

,

c^{(t - 1)}

,

η_{1}^{(t)}, η_{2}^{(t)}

, and

β^{(t)}

by Equation (10).
Step 4:

Update α^{^{(t)}}

with η_{1}^{(t)}, η_{3}^{(t)}

, U^{^{(t)}}

and α^{^{(t - 1)}}

by Equation (13).
Step 5:

Compute η_{3}^{(t)}

with α^{^{(t)}}

and α^{^{(t - 1)}}

by Equation (23).
Step 6: Update

c^{(t - 1)}

to

c^{(t)}

by discarding those clusters with

α^{^{(t)}} \leq 1 / n

and adjust

α^{^{(t)}}

and U^{^{(t)}}

by Equations (19) and (20).

IF t \geq 60

and c^{(t - 60)} - c^{(t)} = 0

, THEN let

η_{3}^{(t)} = 0

.
Step 7: Update the view weight

V^{h^{(t)}}

using

c^{(t)}

,

A^{h^{(t - 1)}}

,

β^{(t)}

, and

U^{^{(t)}}

by Equation (11).
Step 8:

Update A^{h^{(t)}}

with c^{(t)}

and U^{^{(t)}}

by Equation (12).
Step 9:

IF |‖A^{h^{(t)}}‖ - ‖A^{h^{(t - 1)}}‖| < ε

,

then stop;
ELSE let t = t +1 and return to Step 1.

We next demonstrate the parameters and cluster behaviors for the proposed U-MV-FCM clustering algorithm. Parameter selection is one of the important step in clustering data, as the high-quality structured data will be obtained if the choice of the combination parameters is provided. The relation and interaction between

η_{1}, η_{2}, η_{3}

, and

β

can be parameterized and lead the proposed U-MV-FCM to give the optimal number of clusters c. As displayed in Figure 6a–d, the proposed U-MV-FCM reduced the clusters number for both views that are the first and second views of Artificial 1 from originally c = 600 to c = 327 after t = 2 and continuously reduced to c* = 2 after t = 9 with AR = 1.00. For Artificial 2 as shown in Figure 7a–f, the proposed U-MV-FCM processed the data from the three views from originally c = 500 to 295 after the second iteration and converged after fourteen iterations with c* = 5 and AR = 0.9980.

4. Experimental Results and Comparisons

In this section, we show the comparison results of the proposed U-MV-FCM with the the Co-FKM, MinMax-FCM, and WV-Co-FCM algorithms with consideration of biological, image, text, and webpage news datasets. The six benchmark datasets used are Wikipedia Articles [44,45], Prokaryotic [46], WebKB [47,48], 3-Sources [49], Reuters [50] and Extended YaleB [51]. These three artificial datasets such as Artificial 1, Artificial 2, and Syn500 [52] are also involved in experiments to evaluate our proposed U-MV-FCM algorithm. Table 2 represents the brief descriptions of these six benchmark dataset numbers in terms of data types, view s, cluster c, data points n, and feature dimension

d_{h}

.

4.1. Performance Evaluation of MV-FCM Clustering Algorithms

In this subsection, the proposed U-MV-FCM and these existing algorithms such as the Co-FKM, MinMax-FCM, and WV-Co-FCM are validated by using the three artificial datasets and six benchmark multiview datasets. In order to evaluate the performance of the aforementioned MV-FCM clustering algorithms, we utilize the performance evaluation measures including AR (accuracy rate), RI (rand index) [53], FMI (Fowlkes and Mallows index) [54], NMI (normalized mutual information) [55], and JI (Jaccard index) [56]. In general, these performance evaluation measures are the most commonly used indices in the literature. The values of these external validation measures range from 0 to 1. The higher the rate, the better the performance. For a fair comparison, simulation for the Co-FKM, MinMax-FCM, and WV-Co-FCM would be designed over 51 different random initializations. For quantitative evaluation, we only report the average of 51 different random initializations and then summarize the average of fuzziness m ranging from 1.1–2.0. It should be noted that our proposed U-MV-FCM aims at finding the optimal number of clusters with a high-quality cluster centroids in an efficient manner. Furthermore, our proposed U-MV-FCM is designed to automatically assign the viewpoint data as initial cluster centers. While in the existing algorithms, users need to specify a certain number as initialization for clusters. Therefore, in these experimental setups of Co-FKM, WV-Co-FCM, and MinMax-FCM, we assign the accurate clusters number on each multiview data as their initial number of clusters.

Findings and Discussion: The results regarding the performance of the proposed U-MV-FCM with the existing MV-FCM clustering algorithms on the nine multiview datasets have been demonstrated. Table 3 shows the clusters number obtained by the proposed U-MV-FCM as well as the information regarding the actual number of clusters. The optimal number of clusters c* obtained by the proposed U-MV-FCM is displayed inside the parentheses. It can be observed that the proposed U-MV-FCM obtains an optimal number of clusters for Artificial Data 1 (c* = 2), for Artificial Data 2 (c* = 5), Syn500 (c* = 2), and the real-world datasets Prokaryotic (c* = 4), Wikipedia Articles (c* = 10), WebKB (c* = 3), Extended YaleB (c* = 10), 3-Source (c* = 3), Reuters (c* = 2), respectively.

We established the results of the U-MV-FCM in terms of AV-AR, AV-FMI, AV-RI, AV-NMI, and AV-JI in Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10. It should be noted that the best result is marked in boldface, and the second one is marked in underlining. It can be witnessed that the proposed U-MV-FCM performs superior in the case of the evaluation measure as well as the comparison to other state-of-the-art algorithms on Artificial Data 1, Artificial Data 2, Syn500, Prokaryotic, Wikipedia Articles, and WebKB datasets. In the case of 3-Sources and Reuters datasets, the proposed U-MV-FCM failed to recognize the optimal number of clusters c* as six. It is apparent that the MV-FCM unveils a higher AR, FMI, RI, and JI in most of the real MV datasets.

It can be clearly seen from Table 4 that the evaluation measures for the proposed U-MV-FCM performs better compared to the other algorithms except the WV-Co-FCM for Artificial Data 1.

Table 5 shows that our proposed U-MV-FCM performed the best compared to the other algorithms, and the WV-Co-FCM is ranked second on the base of the performance evaluation measures. It is important to note that, in our proposed U-MV-FCM, the optimal c is calculated automatically, while in the other algorithms the performance measures are calculated with a given number of clusters as initialization.

Table 6 shows that our proposed U-MV-FCM performed the best compared to the other algorithms. The WV-Co-FCM is ranked second on the base of the performance evaluation measures for AV-AR, AV-RI, AV-NMI, and the MinMax-FCM for AV-FMI and AV-JI, respectively. It is worth mentioning that our proposed U-MV-FCM calculates the optimal c in an automatic way, while in the other algorithms the performance evaluation measures are calculated with a given number of clusters.

From Table 7, it is evident that our proposed U-MV-FCM performed the best compared to the other algorithms. The WV-Co-FCM is ranked second in respect to the performance evaluation measures for AV-AR, AV-FMI, AV-NMI, AV-JI, and the MinMax-FCM for AV-RI, respectively. We mention that our proposed U-MV-FCM calculates the optimal c in an automatic way, but in the other algorithms, the performance evaluation measures are calculated with a given number of clusters.

It can be seen from Table 8 that our proposed U-MV-FCM performed the best compared to the other algorithms, and the Co-FKM is ranked second in the performance evaluation measures. It is worth saying that our proposed U-MV-FCM calculates the optimal c in an automatic way. But, in the other algorithms, the performance evaluation measures are calculated with a given number of clusters.

Table 9 shows that our proposed U-MV-FCM performed the best compared to the other algorithms, while the Co-FKM is ranked second in respect to the performance evaluation measures for AV-RI, AV-NMI, AV-JI, WV-Co-FCM for AV-AR and the MinMax-FCM for AV-FMI, respectively. It is worth mentioning that our proposed U-MV-FCM calculates the optimal c in an automatic way, while in the other algorithms the performance evaluation measures are calculated with a given number of clusters in initialization.

Co-FKM performs the best for the Extended YaleB dataset in terms of AV-AR, AV-RI, and AV-NMI, while for AV-FMI and AV-JI, the WV-Co-FCM and MinMax-FCM performs the best, and our proposed algorithm ranked second.

The running time (RTs) for all algorithms on the nine datasets are reported in Table 11. The RTs on Artificial Data 1, Artificial Data 2, Syn500, Prokaryotic, Wikipedia Articles, 3-Sources, WebKB, Reuters, and Extended YaleB for the MinMax-FCM are less compared to our proposed U-MV-FCM. However, it is noteworthy that the MinMax-FCM and the other compared algorithms depend on the fuzziness parameter m. Hence, if we calculate and report the total running time of the MinMax-FCM under 51 different random initializations for ten different m’s, then the introduced U-MV-FCM is faster and more efficient.

For visual comparison, the performance results for the Co-FKM, WV-Co-FCM, and MinMax-FCM algorithms in terms of minimum values, average values, and maximum values on Prokaryotic, Wikipedia Articles, WebKB, and Extended YaleB datasets are displayed in Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16.

Figure 8 shows the minimum, average, and maximum values for the Co-FKM on Prokaryotic data; the Min-AR, Min-RI, Min-FMI, and Min-JI in Figure 8a are varied over the entire m, while the Min-NMI is stable. The Avg-AR, Avg-RI, Avg-FMI, Avg-NMI, and Avg-JI in Figure 8b show fluctuating patterns. Figure 8c shows a higher variability.

Figure 9 shows the performance results for the Co-FKM on the Wikipedia Articles dataset. The Min-AR, Min-RI, Min-FMI, Min-NMI, and Min-JI in Figure 9a show a downward pattern. The Avg-AR, Avg-RI, Avg-FMI, Avg-NMI, and Avg-JI in Figure 9b show a downward trend when m = 1.1, but stable when m = 1.3, and a decreasing movement when m = 1.9. The Max-AR, Max-FMI, and Max-JI in Figure 9c is stable when m =1.1, but a descending trend at m = 1.8. The Max-RI and Max-NMI have a downward trend when m = 1.9.

Figure 10 portrays the performance results for the Co-FKM on WebKB data. The Min-AR, Min-RI, and Min-NMI in Figure 10a show an irregular pattern, initially decreases and reaches stability when m = 1.9. The Min-FMI and Min-JI show a downward trend when m = 1.1, but stable when m = 1.3. The Avg-AR, Avg-NMI, Avg-FMI, and Avg-JI in Figure 10b is stable when m = 1.2. The Avg-RI in Figure 10c reaches stability over the entire m.

Figure 11, Figure 12 and Figure 13 display the performance results for the MinMax-FCM on Prokaryotic, Wikipedia Articles, and WebKB data, respectively, while Figure 14, Figure 15 and Figure 16 show the performance for WV-Co-FCM on Prokaryotic, Wikipedia Articles and WebKB data, respectively.

4.2. The Effect of Parameters for MV-FCM Clustering Algorithms

It is commonly known that (single-view) FCM is basically affected by the fuzziness parameter m. The m values are generated to control the distribution of intra-clusters in the data. Since m values are varied, it leads to different structures. In this sense, m is vital to trial the data for producing a recommended result. At the same time, MV-FCM, as a high considerably clustering algorithm, considers more parameters to maintain good performance. Since the MV-FCM clustering algorithms are developed to handle multiview data, it is reasonable if the objective function became more complex. The MV-FCM involves additional variables such as view weights, feature-view weights, an agreement between memberships, etc. These additional variables are driven by some parameters to handle their distribution, and mostly are generated by users. Due to these complexities, it is highly recommended to measure these MV-FCM performances by employing different values of the parameters. The sequences of m with 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, and 2.0 are implemented in this experiment. The parameter setting of Co-FKM, MinMax-FCM, and WV-Co-FCM is varied and control different tasks. Since these parameters are correlated to each other, parameter tunings are required to obtain a precious pattern. In this sense, an experiment to evaluate the consistency of m with fixed combination parameters on three MV-FCM algorithms is considerable. Table 12 reports the parameter setting for these three MV-FCM clustering algorithms of Co-FKM, MinMax-FCM, and WV-Co-FCM. We mention that our proposed U-MV-FCM clustering algorithm is free of parameters.

4.3. Ensembling Membership-Based Indices to MV-FCM Clustering Algorithms

For further experiments, we consider ensembling MV-FCM clustering algorithms with some membership-based internal indices. These MV-FCM clustering algorithms of Co-FKM, MinMax-FCM, and WV-Co-FCM will be employed with these membership-based CVIs of Partition Coefficient (PC) [32], Partition Entropy (PE) [33], and Modified Partition Coefficient (MPC) [36]. For MV-FCM cases, the cluster centers are initially generated and updated separately for each view. In this sense, different clusters number will lead to different output. Since cluster-center-based CVIs were designed only for (single view) FCM clustering algorithm, it is not recommended to perform them for the MV-FCM clustering algorithms. However, membership-based CVIs are applicable to the MV-FCM clustering algorithms. The MV-FCM clustering algorithms usually generate their memberships in two ways. One is generated by treating the importance of each data-view membership equally. Another one is to generate the membership for each view (local-membership), and then to estimate a global solution by collaborating local-memberships based on their agreement/disagreement degree. In this sense, membership-based CVIs are well considered for this experiment. These membership-based CVIs are PC, PE, and MPC. In order to verify the effectiveness of Co-FKM, MinMax-FCM, and WV-Co-FCM algorithms with the CVIs of PC, PE, and MPC for finding an optimal number of clusters, we ensemble the MV-FCM clustering algorithms with PC, PE, and MPC, which are termed as Co-FKM+PC, Co-FKM+PE, Co-FKM+MPC, MinMax-FCM+PC, MinMax-FCM+PE, MinMax-FCM+MPC, WV-Co-FCM+PC, WV-Co-FCM+PE, and WV-Co-FCM+MPC, respectively.

Findings and Discussion: These Co-FKM+PC, Co-FKM+PE, Co-FKM+MPC, MinMax-FCM+PC, MinMax-FCM+PE, MinMax-FCM+MPC, WV-Co-FCM+PC, WV-Co-FCM+PE, and WV-Co-FCM+MPC are implemented on three artificial and five real multiview datasets, called Artificial 1, Artificial 2, Syn500, Prokaryotic, Wikipedia Articles, 3-Source, WebKB, and Reuters, respectively. Figure 17, Figure 18 and Figure 19 visualize the distribution of these Co-FKM+PC, Co-FKM+PE, and Co-FKM+MPC on the three artificial multiview datasets, with m ranging from 1.1 to 2.0. As it can be seen, the weights for Co-FKM+PC, Co-FKM+PE, and Co-FKM+MPC follow a similar distribution. In this sense, each m mostly produces the same optimal number of clusters. The same behavior is also presented for CVIs- ensembled MinMax-FCM and WV-Co-FCM. As shown in Figure 20, Figure 21, Figure 22 and Figure 23, in cases of Artificial 2 and Syn500 MV datasets, the pattern distributions of these MinMax-FCM+MPC, WV-Co-FCM+PC, WV-Co-FCM+PE, and WV-Co-FCM+MPC give the same optimal number of clusters. As noted, Figure 17, Figure 18, Figure 19, Figure 20, Figure 21, Figure 22 and Figure 23 are generated based on a random initialization.

We generate 51 different random initializations for Co-FKM+PC, Co-FKM+PE, Co-FKM+MPC, MinMax-FCM+PC, MinMax-FCM+PE, MinMax-FCM+MPC, WV-Co-FCM+PC, WV-Co-FCM+PE, and WV-Co-FCM+MPC with m = 2 on the eight multiview datasets. The results based on 51 different random initializations on the eight datasets are reported in Table 13. In terms of the three artificial and five real multiview datasets, Table 13 illustrates that WV-Co-FCM+PC, MinMax-FCM+PE, and MinMax-FCM+MPC are the best investigators, while the second investigators are Co-FKM+PC, Co-FKM+PE, and WV-Co-FCM+MPC. Furthermore, we report the running time (RTs) for these CVIs- ensembled Co-FKM, MinMax-FCM, and WV-Co-FCM on the eight multiview datasets in Table 14. It can be seen that MinMax-FCM+PC, MinMax-FCM+PE, and MinMax+MPC perform faster as compared to the others.

5. Conclusions

In this paper, we proposed a novel multiview fuzzy c-means (MV-FCM) clustering algorithm to find the optimal number of clusters with high-quality cluster centroids in an efficient manner, called U-MV-FCM. The proposed U-MV-FCM can automatically produce an optimal number of clusters and simultaneously improve the accuracy rate without a parameter setting. The view-points on the multiview data are assigned as the initial cluster centers, and the proposed U-MV-FCM can directly reduce the number of clusters and automatically produce an optimal number of clusters. To assess the performance of the proposed U-MV-FCM algorithm, a comparative analysis with the existing MV-FCM algorithms, such as the Co-FKM, MinMax-FCM, and WV-Co-FCM, was made. Furthermore, experimental results based on the CVIs-ensembled MV-FCM clustering algorithms are generated, namely Co-FKM+PC, Co-FKM+PE, Co-FKM+MPC, MinMax-FCM+PC, MinMax-FCM+PE, MinMax-FCM+MPC, WV-Co-FCM+PC, WV-Co-FCM+PE, and WV-Co-FCM+MPC. The effectiveness of experimental results is conducted by using three artificial and six benchmark multiview datasets, including Artificial Data 1, Artificial Data 2, Syn500, Prokaryotic, Wikipedia Articles, 3-Sources, WebKB, Reuters and Extended YaleB. For the Artificial 1, Artificial 2, Syn500, Prokaryotic, Wikipedia Articles, WebKb datasets, our proposed method outperforms these existing algorithms on the base of five validity indices, namely AV-AR, AV-FMI, AV-RI, AV-NMI and AV-JI, except Extended YaleB dataset in which our proposed method shows the second better performance.

Based on the experimental results, it is evident that the U-MV-FCM outperformed the existing algorithms on most datasets, as indicated by the higher accuracy rates and more accurate cluster identification with automatically finding an optimal number of clusters. Thus, it can also be concluded that the learning approach for multiview data in the U-MV-FCM is suitable to obtain more a accurate number of clusters and has better clustering accuracy than others, such as these CVIs-ensembled MV-FCM clustering algorithms. In addition, the U-MV-FCM is more stable from the aspect of the variance of multiple results as our parameter settings are more robust and so it improves the performance of fuzzy clustering algorithms in multiview scenarios. As in the U-MV-FCM, we use the idea of global membership; thus, to study further we suggest the scheme of collaboration in membership, and we can extend it to feature extraction for the U-MV-FCM. On the other hand, we mention that if the desired selected clusters are not applicable to MV data, then the proposed U-MV-FCM failed to evaluate the accurate number of clusters c*. This will be our further research to improve the proposed U-MV-FCM clustering algorithm. The work presented in this paper, introducing the U-MV-FCM clustering algorithm, holds significant promise and potential impact in the field of multiview clustering. The conclusions drawn from this study highlight several key aspects that underline the significance of this work. The U-MV-FCM algorithm outperforms existing multiview clustering algorithms in terms of accuracy and the identification of an optimal number of clusters across various datasets. On the other hand, our research emphasizes the stability of the U-MV-FCM in which the algorithm is less prone to variations in results, due to the robust parameter settings. The research has real-world implications, as the improved multiview clustering offered by the U-MV-FCM can benefit various domains, such as data analysis, information retrieval, bioinformatics, social network analysis, and more. It can lead to more accurate insights and decision-making in these application areas. In general, the U-MV-FCM may fail to evaluate the accurate number of clusters in case the selected clusters are not applicable to multiview data, and so future research may explore methods to address this limitation. This might involve the development of adaptive clustering techniques that can adjust the number of clusters. On the other hand, the U-MV-FCM does not consider feature weights. Thus, an advanced U-MV-FCM with feature weights and feature reduction will be our further research topic.

Author Contributions

Conceptualization, I.H. and M.-S.Y.; methodology, I.H., K.P.S. and M.-S.Y.; software, I.H. and K.P.S.; validation, I.H. and K.P.S.; formal analysis, I.H., K.P.S. and M.-S.Y.; investigation, I.H., K.P.S. and M.-S.Y.; data curation, I.H. and K.P.S.; writing—original draft preparation, I.H.; writing—review and editing, M.-S.Y.; visualization, I.H., K.P.S. and M.-S.Y.; supervision, M.-S.Y.; funding acquisition, M.-S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Ministry of Science and technology (MOST) of Taiwan under Grant MOST 111-2118-M-033-001.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1988. [Google Scholar]
Chang-Chien, S.J.; Hung, W.L.; Yang, M.S. On mean shift-based clustering for circular data. Soft Comput. 2012, 16, 1043–1060. [Google Scholar] [CrossRef]
Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 1, 803–821. [Google Scholar] [CrossRef]
Zhong, S.; Ghosh, J. A unified framework for model-based clustering. J. Mach. Learn. Res. 2003, 4, 1001–1037. [Google Scholar]
Yu, J.; Chaomurilige, C.; Yang, M.S. On convergence and parameter selection of the EM and DA-EM algorithms for Gaussian mixtures. Pattern Recognit. 2018, 77, 188–203. [Google Scholar] [CrossRef]
Chamroukhi, F.; Nguyen, H.D. Model-based clustering and classification of functional data. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1298. [Google Scholar] [CrossRef]
MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Davis, CA, USA, 21 June–18 July 1967; Volume 1, No. 14. pp. 281–297. [Google Scholar]
Inbarani, H.H.; Azar, A.T. Leukemia image segmentation using a hybrid histogram-based soft covering rough k-means clustering algorithm. Electronics 2020, 9, 188. [Google Scholar] [CrossRef]
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
Ruspini, E.H. A new approach to clustering. Inf. Control 1969, 15, 22–32. [Google Scholar] [CrossRef]
Dunn, J.C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. 1975, 3, 32–57. [Google Scholar] [CrossRef]
Chaomurilige, C.; Yu, J.; Yang, M.S. Analysis of parameter selection for Gustafson–Kessel fuzzy clustering using Jacobian matrix. IEEE Trans. Fuzzy Syst. 2015, 23, 2329–2342. [Google Scholar] [CrossRef]
Chaomurilige, C.; Yu, J.; Yang, M.S. Deterministic annealing Gustafson-Kessel fuzzy clustering algorithm. Inf. Sci. 2017, 417, 435–453. [Google Scholar] [CrossRef]
Cardone, B.; Di Martino, F. A novel fuzzy entropy-based method to improve the performance of the fuzzy C-means algorithm. Electronics 2020, 9, 554. [Google Scholar] [CrossRef]
Kumar, D.; Agrawal, R.K.; Kumar, P. Bias-corrected intuitionistic fuzzy c-means with spatial neighborhood information approach for human brain MRI image segmentation. IEEE Trans. Fuzzy Syst. 2020, 30, 687–700. [Google Scholar] [CrossRef]
Wang, E.; Lee, H.; Do, K.; Lee, M.; Chung, S. Recommendation of Music Based on DASS-21 (Depression, Anxiety, Stress Scales) Using Fuzzy Clustering. Electronics 2022, 12, 168. [Google Scholar] [CrossRef]
Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 269–274. [Google Scholar]
Bickel, S.; Scheffer, T. Multi-view clustering. In Proceedings of the 4th IEEE International Conference on Data Mining ICDM, Brighton, UK, 1–4 November 2004; Volume 4, pp. 19–26. [Google Scholar]
Cleuziou, G.; Exbrayat, M.; Martin, L.; Sublemontier, J.H. CoFKM: A centralized method for multiple-view clustering. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami Beach, FL, USA, 6–9 December 2009; pp. 752–757. [Google Scholar]
Jiang, Y.; Chung, F.L.; Wang, S.; Deng, Z.; Wang, J.; Qian, P. Collaborative fuzzy clustering from multiple weighted views. IEEE Trans. Cybern. 2014, 45, 688–701. [Google Scholar] [CrossRef]
Wang, Y.; Chen, L. Multi-view fuzzy clustering with minimax optimization for effective clustering of data from multiple sources. Expert Syst. Appl. 2017, 72, 457–466. [Google Scholar] [CrossRef]
Zeng, S.; Wang, X.; Cui, H.; Zheng, C.; Feng, D. A unified collaborative multikernel fuzzy clustering for multiview data. IEEE Trans. Fuzzy Syst. 2017, 26, 1671–1687. [Google Scholar] [CrossRef]
Benjamin, J.B.; Yang, M.S. Weighted multiview possibilistic c-means clustering with L2 regularization. IEEE Trans. Fuzzy Syst. 2022, 30, 1357–1370. [Google Scholar] [CrossRef]
Huang, S.; Kang, Z.; Xu, Z. Auto-weighted multi-view clustering via deep matrix decomposition. Pattern Recognit. 2020, 97, 107015. [Google Scholar] [CrossRef]
Chen, Y.; Wang, S.; Zheng, F.; Cen, Y. Graph-regularized least squares regression for multi-view subspace clustering. Knowl.-Based Syst. 2020, 194, 105482. [Google Scholar] [CrossRef]
Tan, J.; Shi, Y.; Yang, Z.; Wen, C.; Lin, L. Unsupervised multi-view clustering by squeezing hybrid knowledge from cross view and each view. IEEE Trans. Multimed. 2020, 23, 2943–2956. [Google Scholar] [CrossRef]
Yang, M.S.; Sinaga, K.P. Collaborative feature-weighted multi-view fuzzy c-means clustering. Pattern Recognit. 2021, 119, 108064. [Google Scholar] [CrossRef]
Yang, M.S.; Hussain, I. Unsupervised multi-view K-means clustering algorithm. IEEE Access 2023, 11, 13574–13593. [Google Scholar] [CrossRef]
Papakostas, C.; Troussas, C.; Krouska, A.; Sgouropoulou, C. Personalization of the learning path within an augmented reality spatial ability training application based on fuzzy weights. Sensors 2022, 22, 7059. [Google Scholar] [CrossRef] [PubMed]
Papakostas, C.; Troussas, C.; Krouska, A.; Sgouropoulou, C. PARSAT: Fuzzy logic for adaptive spatial ability training in an augmented reality system. Comput. Sci. Inf. Syst. 2023, 20, 1389–1417. [Google Scholar] [CrossRef]
Lengyel, A.; Botta-Dukát, Z. Silhouette width using generalized mean—A flexible method for assessing clustering efficiency. Ecol. Evol. 2019, 9, 13231–13243. [Google Scholar] [CrossRef] [PubMed]
Yang, S.C.H.; Lengyel, M.; Wolpert, D.M. Active sensing in the categorization of visual patterns. eLife 2016, 5, e12215. [Google Scholar] [CrossRef] [PubMed]
Xu, S.; Qiao, X.; Zhu, L.; Zhang, Y.; Xue, C.; Li, L. Reviews on determining the number of clusters. Appl. Math. Inf. Sci. 2016, 10, 1493–1512. [Google Scholar] [CrossRef]
Yang, M.S.; Nataliani, Y. Robust-learning fuzzy c-means clustering algorithm with unknown number of clusters. Pattern Recognit. 2017, 71, 45–59. [Google Scholar] [CrossRef]
Pedrycz, W. Collaborative fuzzy clustering. Pattern Recognit. Lett. 2002, 23, 1675–1686. [Google Scholar] [CrossRef]
Zhu, l.; Chung, F.L.; Wang, S. Generalized fuzzy c-means clustering algorithm with improved fuzzy partitions. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2009, 39, 578–591. [Google Scholar]
Bezdek, J.C. Numerical taxonomy with fuzzy sets. J. Math. Biol. 1974, 1, 57–71. [Google Scholar] [CrossRef]
Bezdek, J.C. Cluster validity with fuzzy sets. J. Cybern. 1973, 3, 58–73. [Google Scholar] [CrossRef]
Gath, I.; Geva, A.B. Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 773–780. [Google Scholar] [CrossRef]
Xie, X.L.; Beni, G.A. validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 841–847. [Google Scholar] [CrossRef]
Wu, K.L.; Yang, M.S.; Hsieh, J.N. Robust cluster validity indexes. Pattern Recognit. 2009, 42, 2541–2550. [Google Scholar] [CrossRef]
Roubens, M. Pattern classification problems and fuzzy sets. Fuzzy Sets Syst. 1978, 1, 239–253. [Google Scholar] [CrossRef]
Pereira, J.C.; Coviello, E.; Doyle, G.; Rasiwasia, N.; Lanckriet, G.R.; Levy, R.; Vasconcelos, N. On the role of correlation and abstraction in cross-modal multimedia retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 521–535. [Google Scholar] [CrossRef]
Wang, H.; Yang, Y.; Li, T. Multi-view clustering via concept factorization with local manifold regularization. In Proceedings of the 2016 IEEE 16th international conference on data mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 1245–1250. [Google Scholar]
Brbić, M.; Piškorec, M.; Vidulin, V.; Kriško, A.; Šmuc, T.; Supek, F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res. 2016, 44, gkw964. [Google Scholar] [CrossRef]
Lu, Q.; Getoor, L. Link-based Classification. In Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, 21–24 August 2003; pp. 496–503. [Google Scholar]
Wang, H.; Yang, Y.; Liu, B. GMC: Graph-based multi-view clustering. IEEE Trans. Knowl. Data Eng. 2019, 32, 1116–1129. [Google Scholar] [CrossRef]
Greene, D.; Cunningham, P. A matrix factorization approach for integrating multiple data views. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bled, Slovenia, 6–10 September 2009; Springer: Berlin/Heidelberg, Germany; pp. 423–438. [Google Scholar]
Lewis, D.D.; Yang, Y.; Russell-Rose, T.; Li, F. Rcv1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 2004, 5, 361–397. [Google Scholar]
Georghiades, A.S.; Belhumeur, P.N.; Kriegman, D.J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660. [Google Scholar] [CrossRef]
Rong, W.; Zhuo, E.; Peng, H.; Chen, J.; Wang, H.; Han, C.; Cai, H. Learning a consensus affinity matrix for multi-view clustering via subspaces merging on Grassmann manifold. Inf. Sci. 2021, 547, 68–87. [Google Scholar] [CrossRef]
Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850. [Google Scholar] [CrossRef]
Fowlkes, E.B.; Mallows, C.L. A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 1983, 78, 553–569. [Google Scholar] [CrossRef]
Cover, T.M. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
Jaccard, P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 241–272. [Google Scholar]

Figure 1. Data Processing of the proposed U-MV-FCM framework.

Figure 2. The two-view two-cluster dataset for (a) first view; (b) second view.

Figure 3. The three-view-five-cluster dataset for (a) first view; (b) second view; (c) third view.

Figure 4. Parameter sensitivity of

η_{1}, η_{2}

with Equations (15) and (18) for (a) Artificial Data 1; (b) Artificial Data 2; Further, the iterative c* when processing the U-MV-FCM in determining the optimal number of clusters c* with the learning rates of Equations (17) and (18) for (c) Artificial Data 1; (d) Artificial Data 2.

Figure 4. Parameter sensitivity of

η_{1}, η_{2}

with Equations (15) and (18) for (a) Artificial Data 1; (b) Artificial Data 2; Further, the iterative c* when processing the U-MV-FCM in determining the optimal number of clusters c* with the learning rates of Equations (17) and (18) for (c) Artificial Data 1; (d) Artificial Data 2.

Figure 5. Parameter sensitivity of

η_{1}, η_{2}, η_{3},

and

β

for (a) Artificial 1; (b) Artificial 2.

Figure 5. Parameter sensitivity of

η_{1}, η_{2}, η_{3},

and

β

for (a) Artificial 1; (b) Artificial 2.

Figure 6. (a) Processes of U-MV-FCM for first view after the second iteration; (b) Convergence results for first view; (c) Process of U-MV-FCM for second view after the second iteration; (d) Convergence results for second view.

Figure 7. Processes of the U-MV-FCM after second iteration for: (a) first view; (b) second view; (c) third view; Convergence results for: (d) first view; (e) second view; (f) third view.

Figure 8. The performance results of Co-FKM for Prokaryotic data (a) minimum values; (b) average values; (c) maximum values.

Figure 9. The performance results of Co-FKM for Wikipedia Articles data (a) minimum values; (b) average values; (c) maximum values.

Figure 10. The performance results of Co-FKM for WebKB data (a) minimum values; (b) average values; (c) maximum values.

Figure 11. The performance results of MinMax-FCM for Prokaryotic data (a) minimum values; (b) average values; (c) maximum values.

Figure 12. The performance results of MinMax-FCM for Wikipedia Articles data (a) minimum values; (b) average values; (c) maximum values.

Figure 13. The performance results of MinMax-FCM for WebKB data (a) minimum values; (b) average values; (c) maximum values.

Figure 14. The performance results of WV-Co-FCM for Prokaryotic data (a) minimum values; (b) average values; (c) maximum values.

Figure 15. The performance results of WV-Co-FCM for Wikipedia Articles data (a) minimum values; (b) average values; (c) maximum values.

Figure 16. The performance results of WV-Co-FCM for WebKB data (a) minimum values; (b) average values; (c) maximum values.

Figure 17. Cluster validity indices results on Artificial 1 data with m ranging from 1.1 to 2.0 and

c_{\min} = 2

to

c_{\max} = 10