Article

Cluster-Centered Visualization Techniques for Fuzzy Clustering Results to Judge Single Clusters

1 Institute for Information Engineering, Ostfalia University of Applied Sciences, Salzdahlumer Str. 46/48, 38302 Wolfenbüttel, Germany
2 Nordzucker AG, Küchenstraße 9, 38100 Braunschweig, Germany
3 Helmholtz Centre for Infection Research, Biostatistics, Inhoffenstr. 7, 38124 Braunschweig, Germany
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(3), 1102; https://doi.org/10.3390/app14031102
Submission received: 20 December 2023 / Revised: 23 January 2024 / Accepted: 25 January 2024 / Published: 28 January 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Fuzzy clustering, as a powerful method for pattern recognition and data analysis, often produces complex results that require careful examination of individual clusters. In this paper, advanced visualization techniques are presented that aim to facilitate the analysis of fuzzy clustering results by focusing on the evaluation and interpretation of individual clusters. The presented approach is based on the development of cluster-centric visualization techniques that consider the inherent uncertainty of fuzzy clustering results. The novelty is an assessment of individual clusters with the proposed visualizations. In general, three cluster-centered visualization techniques are presented. These approaches are intended not only to illustrate the overall structure of the fuzzy clustering results but also to enable detailed individual cluster analysis. The performance of the presented visualization techniques is demonstrated by their application to real data sets from different areas. The results show that the techniques provide an effective way to judge individual clusters in fuzzy clustering results for complex data structures.

1. Introduction

Cluster analysis is often defined as the task of partitioning a data set into subsets—called clusters—such that data objects within the same cluster are similar and data objects from different clusters are dissimilar. The notion of clustering as a partitioning task is, for instance, explicitly stated in [1,2] and implicitly in [3] where data objects are assumed to “fall into relatively distinct groups”. In real applications, data sets might not satisfy the idealized assumption of a partition in the form of almost perfect clusters. In many cases, cluster analysis is “misused” to identify some interesting clusters without the intention of partitioning the whole data set into more or less well-separated clusters.
Sometimes it is not necessary to identify all existing patterns or groups within a data set with a clustering algorithm. Instead, the focus is on certain specific clusters that are important for analyzing or understanding a particular problem. The analysis therefore concentrates on individual clusters in order to understand specific patterns or trends within the data. This is particularly relevant when different clusters have different characteristics. In addition, too many clusters can make interpretation more difficult; if there are only a few well-defined clusters, it is often easier to gain insights and interpret them. The examples in this paper comprise artificial data sets, the Iris data set, a medical data set (Hepatitis-C-Virus (HCV)), and a production data set from a sugar factory. A single cluster within the HCV data set could be of particular interest if, for instance, only the severely diseased patients were analyzed in more detail.
In order to judge whether single clusters are “meaningful”, special methods are required. Cluster validity indices are designed to evaluate a clustering result as a whole [4], but are usually not meant for judging single clusters. We also believe that abstract indices and measures are an important issue in cluster analysis, and they add a certain objective perspective to the judgment of clustering results. Nevertheless, they usually summarize a complex clustering result into a single number, leading to a very strong loss of information.
Visualization techniques are less objective but can provide much more information on clustering results than a single number provided by a cluster validity measure. Dimension reduction techniques like principal component analysis, multidimensional scaling, or more advanced techniques like t-SNE or UMAP [5,6] can help to identify clusters in multidimensional data visually or to check whether a clustering result might make sense. Like cluster validity measures, dimension reduction techniques focus on lower-dimensional representations of the data set as a whole and do not focus on specific parts or clusters within the data set. In addition, when cluster analysis techniques are used that are based on modified or individual distance measures for each cluster, like the Gustafson–Kessel algorithm [7] or Gaussian mixture models, dimension reduction techniques do not take the individual local scaling for each cluster into account and might lead to misleading representations.
In this paper, we present visualization techniques that focus on single clusters, i.e., the visualizations do not aim to provide a global representation of all clusters but show the data from the perspective of a selected cluster, making it possible to judge whether a specific cluster is meaningful or not.
The paper is divided into six sections. After the introduction, Section 2 presents the basic principles of clustering methods. In Section 3, cluster validity and visualization techniques are briefly reviewed. The cluster-centered visualization techniques are described in detail and compared to other approaches in Section 4. Then, the visualization techniques are validated by different examples in Section 5, before Section 6 briefly concludes.

2. Clustering Methods

Hard clustering is a common method in cluster analysis in which data points are clearly and exclusively assigned to a specific cluster. Each data point is assigned to exactly one cluster, and there are no overlaps or uncertainties regarding membership. One example is the k-means algorithm, which groups the data points into k predefined clusters [8]. This clear assignment makes hard clustering particularly easy to interpret and understand. While hard clustering is effective in many applications, it proves to be disadvantageous when the data has naturally fuzzy boundaries between the clusters. The same applies if it is necessary to consider uncertainties in the cluster assignment. If a data point belongs to a specific cluster, it cannot be included in another cluster [9]. In such cases, the fuzzy clustering approach proves to be more advantageous because data points can be assigned to different clusters via membership degrees.
Clustering algorithms are based on different "philosophies" concerning the assignment of data objects to clusters and how this assignment is computed. Hard clustering methods like DBSCAN, hierarchical clustering, or k-means assign each data object to a single cluster, including a possible noise cluster in the case of DBSCAN. Gaussian mixture models approximate the distribution of the data by (multivariate) normal distributions, yielding for each data object and each cluster a probability for the data object to be generated by the normal distribution that represents the cluster. Fuzzy clustering uses membership degrees between 0 and 1 to assign data objects to clusters, and for so-called probabilistic fuzzy clustering, the membership degrees of one data object to all clusters sum up to one. Given n data objects and a suitable distance function d, probabilistic fuzzy clustering tries to minimize the objective function
$$ f = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d_{ij} \qquad (1) $$
under the constraints
$$ \sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j = 1, \ldots, n \qquad (2) $$
The parameter c is the chosen number of clusters, and the parameter m > 1 is the so-called fuzzifier, which controls how "fuzzy" the clusters tend to be. The larger m is chosen, the more data objects are assigned to multiple clusters. The value $d_{ij}$ denotes the distance of data object j to cluster i. In the simplest case of fuzzy k-means clustering, this is the distance of data object j to the center of cluster i. It was shown in [10] that
$$ u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{\frac{1}{m-1}}} \qquad (3) $$
must hold to minimize Equation (1).
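To make this relationship concrete, the following minimal R sketch (our own illustration, not code from the paper) computes the membership degrees of Equation (3) from a matrix of positive distances:

```r
# A minimal sketch of Equation (3): membership degrees from a c x n matrix of
# positive distances d between cluster prototypes (rows) and data objects
# (columns); m is the fuzzifier.
memberships <- function(d, m = 2) {
  inv <- d^(-1 / (m - 1))            # d_ij^(-1/(m-1))
  sweep(inv, 2, colSums(inv), "/")   # each column then sums to 1
}
# Hard assignments are recovered by taking the maximal membership per object:
# hard <- apply(memberships(d), 2, which.max)
```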
Although the membership degrees sum up to 1 for each data object, one should be careful with an interpretation of such membership degrees as probabilities. Of course, for clustering algorithms that do not yield a unique assignment of data objects to clusters, the probabilities computed in mixture models or membership degrees in fuzzy clustering can be converted to unique assignments by simply choosing for each data object the cluster with the highest probability or membership degree.
In an ideal case, a data set can be partitioned into clusters that are perfectly separated in the data space; however, this assumption does not often apply to real data sets. Fuzzy clustering can take ambiguities in the assignment of data objects into account, resulting in slightly higher computational costs. Our visualization techniques allow the identification of single good clusters among sets of more or less overlapping clusters. Among many other possible applications, the identification of symptom clusters in medicine might be an important scenario when a disease has ambiguous manifestations, like, for example, in the case of Post-COVID-19 [11].
Apart from the different data-to-cluster assignment "philosophies", clustering algorithms also differ in the principal input that is used. Algorithms like hierarchical clustering or DBSCAN need a distance matrix, whereas Gaussian and other mixture models, as well as k-means clustering and its fuzzy version, work directly with data vectors in the multidimensional space. Of course, distance matrices can also be based on the data vectors, for instance, by computing the pairwise Euclidean distances. But they can also be based on other distance measures, e.g., one minus correlation. The membership degrees in fuzzy clustering are based on relative distances (for details, see Equation (3) in Section 2), leading to the advantage that one can recalculate distances from the membership degrees and use the distances for visualization techniques, no matter whether these distances are Euclidean distances, locally scaled Euclidean distances as in the Gustafson–Kessel algorithm [7], or abstract distances derived from correlations.
For our cluster visualization techniques, we exploit the interdependence between membership degrees and distances. But the most important aspect of our visualization techniques is that we do not try to visualize the clustering result as a whole; instead, we visualize it for single selected clusters.

3. Cluster Validity and Visualization Techniques

The validity of a clustering depends on many parameters and settings. These are, among others, the right number of clusters, the use of membership degrees, and the chosen cluster visualization (CV) techniques. Furthermore, it depends on the user's conception of how the results are to be interpreted appropriately. In the following, the complexity of the different parameters and settings is discussed.

3.1. Problem of Number of Clusters

Choosing the right number k of clusters, especially in the k-means algorithm, is very difficult. Because clustering is part of exploratory data analysis [12], there is no universally right answer as to which k is useful in a given case, so this problem arises in addition to actually computing the clustering. The right choice of k is often equivocal because it depends on the distribution of the points within the data set and on the user's idea of the desired output. A larger k reduces the clustering error; however, it is not expedient to assign each data point to its own cluster [13]: then the error is zero, and k equals the number of data points n. For the k-means algorithm, several approaches are available.
One method is an extension of k-means in which the optimal number of clusters is determined by running many iterations of the algorithm with a stopping rule based on the Bayesian information criterion (BIC); the optimal number of clusters is thus chosen based on information theory [14]. Another method is based on silhouettes, which compare the separation and tightness of the clusters and thereby evaluate whether objects lie well within their cluster or not [15]. A further heuristic is the elbow method, which shows the explained variation as a function of the number of clusters [16]. The "elbow" is the cutoff point where the optimal number of clusters is identified: one more cluster does not lead to much more accurate results, as the sketch below illustrates.
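As an illustration, a minimal R sketch of the elbow heuristic with base R's kmeans() function, assuming a numeric data matrix X (the scanned range of k is arbitrary):

```r
# Total within-cluster sum of squares for k = 1, ..., 8 clusters.
wss <- sapply(1:8, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)
plot(1:8, wss, type = "b",
     xlab = "number of clusters k",
     ylab = "total within-cluster sum of squares")
# The "elbow" is the k at which the curve flattens noticeably.
```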
In this paper, the different methods will not be applied or evaluated for specific data sets. However, the cluster-centered visualization (CCV) techniques presented in Section 4 illustrate the problem of the right choice of the cluster number.

3.2. Advantages of Membership Degrees

One form of clustering is fuzzy clustering, in which each data point can belong to more than one cluster. To express to what extent a point belongs to which cluster, each point receives its own membership degrees [17]. The advantage is that a data point close to a cluster center has a high membership degree, and another data point that is far away from the cluster center has a low membership degree. With different degrees of membership, it is possible to assign an object to multiple clusters. This enables a flexible representation of the membership relationships. In contrast to hard clustering methods, where each data point is assigned to exactly one cluster, membership degrees allow a more gradual membership. This allows a more realistic modeling of uncertainties in the data and leads to robust clustering results, especially when the boundaries between clusters are vague [18]. In addition, membership degrees allow a differentiated view of outliers that may not be clearly assigned to a specific group. Overall, the advantages of membership degrees help to strengthen the flexibility and adaptability of fuzzy cluster analysis, which is particularly useful in complex data sets and real-world applications.

3.3. Visualization Techniques Literature Review

Several publications on visualization techniques for clustering are available. In the following, an overview of the current state of the literature is given.
Automatic interpretation of evolving large data sets is of particular importance. In [19], two methods are presented that use cluster heat maps to visualize structures in static data sets. Subsequently, visual anomaly detection can be performed to eliminate anomalies from the data set. The methods are verified on real data sets; the algorithms can isolate anomalies and visualize changing cluster structures within the data set.
Fuzzy clustering methods such as fuzzy k-means can be used to assign individual objects to multiple clusters with different degrees of membership. However, these memberships are difficult to visualize and analyze. Therefore, they are converted to 0–1 values and visualized by parallel coordinates or different shades of color. In [20], a new approach is used to visualize fuzzy cluster data. The idea is based on a geometric visualization and groups objects with similar cluster memberships. This allows relationships between individual clusters to be viewed spatially.
In [21], membership degrees are the basis for visualizing fuzzy clustering results. Most fuzzy clustering techniques minimize an objective function based on these membership degrees, but this leads to bad results if the cluster shape or the number of clusters is not chosen properly. For the evaluation of clustering results, cluster validity measures are available, but they reduce the information in a large data set to a single value. A new method was developed to avoid this simplification to a single value and to visualize the clustering results: the visualization plots the maximum membership degrees as a function of the distance from the cluster centers. Therefore, it can convey more information to the user for identifying inconsistencies within the individual clusters.
In [22], the problem of visualizing clusters in very large rectangular dissimilarity data is addressed. First, four clustering problems are defined, which should be answered for the assessment of cluster tendency. Two very large rectangular data sets were artificially generated, and the clustering of the data was performed. The developed method is a scalable approach to the four different cluster assessment problems.
Another visualization technique for fuzzy clustering is radial visualization (RadViz) from [23] for visually analyzing multidimensional data sets. The results of fuzzy clustering are displayed using RadViz, where each dimensional anchor represents a cluster. The stability of a data point is based on its distance from the circumference. Other visualization techniques display only the maximum association for each data point, but RadViz considers all associations for each data point, which leads to more information about the relationships between the data points. Furthermore, the algorithm was applied to the Iris data set for illustration.
For fuzzy cluster analysis, multidimensional visualization techniques are beneficial, but no single existing visualization technique can support the wide range of analytical tasks based on fuzzy clustering. In [24], a new visualization called FuzzyRadar is developed for a better understanding of fuzzy clusters. The idea is based on the combination of radial coordinate visualization (RadViz) and parallel coordinate plotting (PCP): RadViz is specialized in data-oriented analytical tasks and PCP in cluster-oriented analytical tasks. First, both approaches are integrated into one new visualization. Subsequently, two methods are introduced to reduce visual clutter and to recognize the distribution of the membership degrees. In a case study, the usability of FuzzyRadar is demonstrated, and the results for seven analytical tasks are shown.
Topological techniques for evaluating data are used in the field of topological data analysis. The aim is to determine topological data structures. In this area, the mapper algorithm is considered a solid representative approach. The aim is to find concise and meaningful global topological data structures that are not accessible or identifiable with many other clustering methods. For this purpose, a new method called the Shape Fuzzy C-Means (SFCM) algorithm was developed in [25], which is based on the Fuzzy C-Means algorithm with special features of the mapper algorithm. The SFCM algorithm has the same clustering capability as the Fuzzy C-Means algorithm and additionally reveals relationships by visualizing the global shape of the data provided by the mapper.
In [26], multi-dimensional visualizations to understand fuzzy clusters are evaluated. It is well known that the use of visualization techniques for multidimensional data is beneficial for understanding fuzzy clusters. The capability of fuzzy cluster analysis is evaluated in an experiment in which four multidimensional visualization techniques are used and compared. The visualization techniques are a parallel coordinate diagram, a scatterplot matrix, principal component analysis, and RadViz. Subsequently, a guideline for the selection of suitable and efficient visualization techniques for the analysis of fuzzy clusters is presented.
For the analysis of biclustering results, Furby is presented as an interactive visualization technique in [27]. On the one hand, the technique provides an overview of a biclustering result and shows the actual data; on the other hand, in the case of fuzzy clustering results, it also enables the interactive definition of threshold values in order to convert the fuzzy clustering into hard clusters. Changes to the membership thresholds are immediately visible in the visualization. The new technique is demonstrated using a corresponding data set.
In [28], a cluster-oriented development of fuzzy models is proposed. The focus is on the efficient use of fuzzy clusters, in particular fuzzy C-means (FCM), for the formation of clusters for the creation of the fuzzy model. The development of the models is considered from the perspective of the creation and efficient use of clusters. In this study, fuzzy clustering is directly linked to fuzzy modeling. The augmented FCM method is mainly used for modeling purposes to achieve a balance between the structural contents in the input and output spaces. This optimizes the performance of the resulting fuzzy model. This is demonstrated by some experiments.
Visualizing the cluster structure of high-dimensional data is a challenging task, as it must be able to handle the high dimensionality of the input data. The visualization of fuzzy clusters is also challenging, as soft clustering algorithms result in more complex cluster structures. By introducing a membership network in [29], which is based on the fuzzy partition matrix and represents fuzzy clustering, the visual understanding should be improved. It is shown how the elements involved in this type of complex data clustering structure interact with each other without relying on a visualization of the input data itself. The results of the experiment demonstrate the usefulness of the proposed method using the iris flower data set and two other data sets.
Within the context of this paper, some ideas and approaches from the given literature are also used. In general, the newly developed CCV techniques are applied to clustering results based on fuzzy k-means but can in principle be applied to any fuzzy clustering algorithm because of the duality between distances and membership degrees. From [19,22,25], only the general ideas for clustering are used. Membership degrees, as used in [20,21,29], are also fundamental to our approach. Since the CCV techniques likewise use a circular display, refs. [23,24,26] are very important; especially the idea of RadViz is useful to provide maximum information from the visualization to the user. Biclustering is not directly covered, but the idea of a cluster orientation, used here for visualization rather than for fuzzy modeling, is taken up [27,28]. Furthermore, the Iris data set will also serve as an evaluation data set. In Section 4, the newly developed CCV techniques are explained.

4. Cluster-Centered Visualization Techniques

For the visualization of clustering results based on the fuzzy k-means clustering algorithm from Section 2, three new CCV techniques are developed. The first method is "Compute-Points-Circular" (CPC), where a cluster prototype is placed in the center of a circle and the other cluster prototypes on the edge of the circle. This method gives an indication of how well a cluster is separated from the other clusters. The second method is "Angle-2-Clusters" (A2C), where two cluster centers are placed at specific points and the triangle formed by each data point and the two cluster centers is considered. In the third method, "Angle-Mapping" (AM), the data points are considered from the viewpoint of one central cluster, and the angle and radius of each data point are used. The methods are explained in detail in the following sections, and a visual comparison with other techniques is also given.

4.1. Method: Compute-Points-Circular

By rewriting Equation (3) in the form
$$ u_{ij} = \frac{d_{ij}^{-\frac{1}{m-1}}}{\sum_{k=1}^{c} d_{kj}^{-\frac{1}{m-1}}} \qquad (4) $$
one can easily derive—as shown in [17]—that for the final fuzzy clustering result
$$ \left( \frac{u_{ij}}{u_{kj}} \right)^{m-1} = \frac{d_{kj}}{d_{ij}} \qquad (5) $$
must hold. The CPC method exploits this simple association between distances and membership degrees and places a selected cluster prototype in the center of a circle and the other cluster prototypes at equidistant positions on the edge of the circle. Each data point is placed in the visualization on the radius between the prototype in the center and the edge prototype for which the data point has the highest membership degree, such that its distances to the center and to this edge prototype yield the same quotient as in Equation (5).
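The following minimal R sketch illustrates the CPC placement just described (function and variable names are ours, not from the paper):

```r
# A minimal sketch of the CPC placement. U: n x c membership matrix,
# m: fuzzifier, sel: index of the cluster whose prototype is placed in the
# center of the unit circle.
cpc_coords <- function(U, m = 2, sel = 1) {
  others <- setdiff(seq_len(ncol(U)), sel)
  phi  <- 2 * pi * (seq_along(others) - 1) / length(others)  # equidistant edge positions
  edge <- cbind(cos(phi), sin(phi))
  t(apply(U, 1, function(u) {
    k <- others[which.max(u[others])]   # edge prototype with the highest membership
    q <- (u[k] / u[sel])^(m - 1)        # = d_sel / d_k according to Equation (5)
    (q / (1 + q)) * edge[match(k, others), ]  # point on the radius: t / (1 - t) = q
  }))
}
```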

4.2. Method: Angle-2-Clusters

The method A2C focuses on two cluster centers that are placed at (0; 0) and (1; 0). In the original data space, the triangle defined by the two cluster centers and a data point is considered. This triangle is then mapped proportionally onto the plane, where the connecting line between the cluster centers in the original data space is mapped to the line between (0; 0) and (1; 0). Simple geometric calculations for triangles (essentially the law of cosines) yield the position of a data object in the two-dimensional visualization: if $\delta_0$ and $\delta_1$ denote the distances of the data object to the two cluster centers, measured after scaling the connecting line between the centers to unit length, the data object must be positioned at the point

$$ x = \frac{1 + \delta_0^2 - \delta_1^2}{2}, \qquad y = \sqrt{\delta_0^2 - x^2} \qquad (6) $$

in the two-dimensional visualization; the membership degrees $u_0$ and $u_1$ of the data object to the two clusters are linked to these distances via Equation (5).
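A minimal R sketch of this proportional triangle mapping, assuming Euclidean distances in the original data space (names are ours):

```r
# p0, p1: the two cluster centers; z: a data point in the original space.
a2c_coords <- function(z, p0, p1) {
  D  <- sqrt(sum((p1 - p0)^2))     # distance between the two cluster centers
  d0 <- sqrt(sum((z - p0)^2)) / D  # scaled distance to the center mapped to (0, 0)
  d1 <- sqrt(sum((z - p1)^2)) / D  # scaled distance to the center mapped to (1, 0)
  x  <- (1 + d0^2 - d1^2) / 2      # law of cosines on the mapped triangle
  y  <- sqrt(max(d0^2 - x^2, 0))   # mapped to the upper half-plane
  c(x = x, y = y)
}
```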

4.3. Method: Angle-Mapping

The method AM considers the data points from the viewpoint of one central cluster whose prototype is placed at the origin of the two-dimensional visualization space. The distances of the data points to this cluster center are preserved in the visualization. Polar coordinates are used for the visualization on the plane. The radius for the polar coordinates of a data point is defined by the distance of the data point to the prototype of the central cluster. The visualization tries to preserve angles as much as possible. Since not all angles can be preserved, angles between data vectors that have similar distances are prioritized. For this purpose, the data points are placed in the visualization step by step. The zero angle is assigned to the first point. For each following point, the already placed point with the most similar radius is selected, and the angle between these two points is preserved. This leaves two possible angles for the new point: the positive or the negative original angle. To decide between them, the already placed point with the second most similar radius to the new point is selected, and the sign is chosen that minimizes the error of the angle between the new point and this second reference point.
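A minimal R sketch of this step-by-step AM placement (names and the processing order of the points are our assumptions, and no data point may coincide with the central prototype):

```r
# X: data matrix; center: prototype of the selected central cluster.
angle_mapping <- function(X, center) {
  V <- sweep(X, 2, center)                 # vectors seen from the central prototype
  r <- sqrt(rowSums(V^2))                  # radii = preserved distances
  ang <- function(i, j) {                  # original angle between points i and j
    cosv <- sum(V[i, ] * V[j, ]) / (r[i] * r[j])
    acos(min(max(cosv, -1), 1))
  }
  wrap <- function(a) abs(atan2(sin(a), cos(a)))  # absolute circular difference
  theta <- numeric(nrow(X))                # theta[1] = 0: first point at angle zero
  for (i in 2:nrow(X)) {
    near <- order(abs(r[1:(i - 1)] - r[i]))              # placed points by radius similarity
    cand <- theta[near[1]] + c(1, -1) * ang(i, near[1])  # the two possible angles
    if (i > 2) {                           # choose the sign via the second reference
      err <- abs(wrap(cand - theta[near[2]]) - ang(i, near[2]))
      theta[i] <- cand[which.min(err)]
    } else {
      theta[i] <- cand[1]
    }
  }
  cbind(x = r * cos(theta), y = r * sin(theta))
}
```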

4.4. Visual Comparison with Other Techniques

The newly developed CCV techniques can be compared from a visual perspective with other techniques. Table 1 gives an overview of this comparison.
Table 1 shows a comparison between the classical visualization techniques and our new CCV techniques with respect to the consideration of single and all possible clusters. The most important difference between the presented visualization techniques is the consideration of clusters. Our new CCV techniques consider single clusters because sometimes one specific cluster is more interesting than all possible clusters within the data set. In contrast, the classical techniques of scatterplots, heat maps, and dendrograms visualize all possible clusters and their dependencies; focusing on interesting single clusters is not possible. Figure 1 presents a visual comparison between a classical visualization technique (scatterplot) and one of the newly developed visualization techniques (CPC).
Figure 1 shows an example of a visual comparison between a scatterplot and the CPC technique. The visualization is based on an ambiguous data set and fuzzy clustering with three clusters. The generation of an ambiguous data set is described in Section 5.1. The scatterplot on the left shows which data point belongs to which cluster. However, it is not recognizable how close or far away the clusters are from each other. The new CCV techniques are particularly advantageous at exactly this point. Here, the CPC visualization technique is applied to the same data set. By focusing on one single cluster (the black one), it is clearly visible that the green cluster is very close to the black cluster and the red cluster is clearly separated. All three new CCV techniques focus on one specific cluster and visualize the dependencies among the other possible clusters. This is validated by different examples in the following section.

5. Validation by Example

This section is devoted to the validation of the new techniques for CCV using several examples. Altogether, five data sets are used.
  • An artificial data set with well-separated clusters is referred to as an artificial good data set in the following. The data set contains three clusters generated from multivariate normal distributions.
  • An artificial data set with three clusters from multivariate normal distributions where two clusters show a strong overlap. In addition, uniform noise was added to the data set. We refer to this data set as an ambiguous artificial data set.
  • The Iris data set.
  • A medical data set, and
  • A data set from an industrial production line.
With these data sets, the clustering results based on the fuzzy k-means clustering algorithm will be visualized, as will the problem of the correct choice for the number of clusters. For data generation and visualization, the statistical open-source program R is used [30]. The relevant functions and an application example of the CCV techniques for the artificial good data set are attached as Supplementary Materials.

5.1. Artificial Data Sets

For the validation of the newly developed CCV techniques, two artificial data sets were generated. The first one is a good data set based on normal distributions, where the clusters should be well separated. The second one is an ambiguous data set with one separate cluster and two overlapping clusters, also based on normal distributions; noisy data points drawn from a uniform distribution were added. Both data sets have 200 data points each. The artificial good data set was generated with methods that are also used in the data generator [31]. With this data generator, it is possible to generate multivariate numerical data with arbitrary marginal distributions and arbitrary correlations. The centers of the normal distributions for the good data set are µ1 = (0; 0), µ2 = (2; 15), and µ3 = (8; 5); the standard deviation is always 1. The x- and y-values were then generated based on the three normal distributions. After the data generation, the fuzzy k-means clustering algorithm [10] is applied using the fkm implementation in the statistics software R (Version 4.2.2) to determine the clustering results; a sketch of this step is given below. To visualize the clustering results, the CCV techniques are applied. The following figures show the results of the visualization for the artificial good data set with 2, 3, and 4 clusters. The CPC method always puts the first cluster center in the middle, all other centers on the edge, and all data points inside the circle. The A2C method always uses the first and second cluster centers for the angle-based visualization. The AM method also always uses the first cluster center for the angle-based visualization. The colors of the clusters for all following figures are black for the first, red for the second, green for the third, and blue for the fourth cluster.
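A minimal R sketch of the data generation and clustering step; fclust::FKM serves here as one available fuzzy k-means implementation (the authors' exact fkm routine may differ), and the random seed and group sizes are our assumptions:

```r
set.seed(1)
centers <- rbind(c(0, 0), c(2, 15), c(8, 5))
sizes   <- c(66, 67, 67)                         # 200 data points in total
X <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(sizes[i], centers[i, 1]),          # sd = 1, as in the text
        rnorm(sizes[i], centers[i, 2]))))
library(fclust)
res <- FKM(X, k = 3, m = 2)   # res$U: membership degrees, res$H: cluster prototypes
```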
Figure 2 shows the artificial good data set with two clusters. The clustered data are located in the top left. The centers of the three normal distributions are very well separated, and the fuzzy k-means algorithm identified two clusters. With the CPC method in the top right, both clusters are clearly separated in the visualization. The cluster center of the black cluster is in the middle of the circle, and the red cluster center is on the edge. The A2C method in the bottom left also delivers a clear separation of the two clusters. The AM method in the bottom right likewise leads to a clear separation of the clusters.
Figure 3 shows the artificial good data set with three clusters. The clustered data is located again at the top left. The centers of the three normal distributions are very well separated, and the fuzzy k-means algorithm identified three clusters. With the CPC method in the top right, the three clusters are clearly separated in the visualization. The black cluster center is in the middle of the circle, and the red and green cluster centers are on the edges. The A2C method in the bottom left also delivers a clear separation of the three clusters. The AM method in the bottom right leads to separation of the black cluster, but some data points of the red and green clusters are mixed.
Figure 4 shows the artificial good data set with four clusters. The clustered data is located again at the top left. The centers of the three normal distributions are more or less clearly separated, and the fuzzy k-means algorithm identified four clusters. But two clusters are very close to each other, and some data points could theoretically be in the other cluster. With the CPC method in the top right, the four clusters are separated in the visualization. The black cluster center is in the middle of the circle, and the red, green, and blue cluster centers are on the edge. The visualization shows clearly that the black, red, and green clusters are clearly separated. But some data points in the black cluster are attracted to the red cluster, and vice versa. Therefore, some data points overlap toward the respective cluster center. The A2C method in the bottom left does not indicate a clear separation of the four clusters either. It is also visible that the data points of the black and red clusters are mixed together. The AM method in the bottom right delivers the same mixed separation of the clusters and, additionally, a mix between the blue and green clusters.
For the artificial ambiguous data set, the center of the separated cluster is µ1 = (0; 0), and µ2 = (11; 3) and µ3 = (14; 4) are the centers of the two overlapping clusters; the standard deviation is again always 1. The interval of the uniform distribution ranges from −3 to 18 so that the noisy data points cover the three clusters. The x- and y-values were then generated based on these distributions; a sketch of this step is given below. To obtain the clustering results, the fuzzy k-means algorithm is again applied, and the CCV techniques are used for the visualization. The following figures show the results of the visualization for the artificial ambiguous data set with 2, 3, and 4 clusters. All three CCV techniques show the dependencies always in the direction of the first cluster.
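A minimal R sketch of the ambiguous data set; the split into 60 points per cluster plus 20 noise points is our assumption, as only the total of 200 points, the centers, and the noise interval are given in the text:

```r
set.seed(2)
mu <- rbind(c(0, 0), c(11, 3), c(14, 4))
pts <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(60, mu[i, 1]), rnorm(60, mu[i, 2]))))   # sd = 1
noise <- cbind(runif(20, -3, 18), runif(20, -3, 18))  # uniform noise
X_amb <- rbind(pts, noise)                            # 200 data points in total
```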
Figure 5 shows the artificial ambiguous data set with two clusters. The clustered data are located in the top left. The center of the separated red cluster is clearly visible, but the fuzzy k-means algorithm seems to be able to identify two clusters. With the CPC method in the top right, the two clusters are not clearly separated in the visualization. Although the black cluster center is in the middle and the red cluster center is at the edge of the circle, there is a smooth transition between the data points. Some data points of the black cluster are attracted to the red cluster center, so a separation is not clearly visible. The A2C method in the bottom left delivers a better visualization result, but there are also some data points very close to the black and red clusters, respectively, and the two generated overlapping clusters are identified as one cluster. The same applies to the AM method in the bottom right.
Figure 6 shows the artificial ambiguous data set with three clusters. The clustered data is located again at the top left. The center of the separated green cluster is theoretically well separated, and the fuzzy k-means algorithm identified three clusters. With the CPC method in the top right, the three clusters are partially separated in the visualization. The black cluster center is in the middle of the circle, and the red and green cluster centers are on the edge. Some data points in the red cluster are attracted to the black cluster. The A2C method in the bottom left does not deliver a clear separation of the clusters. Only the green cluster, in contrast to the black and red clusters, is generally separated. The black and red clusters are very close to each other, with no clear separation. The AM method in the bottom right leads to a clear separation of the green cluster. The data points of the red and black clusters are mixed.
Figure 7 shows the artificial ambiguous data set with four clusters. The clustered data are located again at the top left. The center of the separated black cluster and both centers of the overlapping clusters are visible. Due to the noisy data points, the fuzzy k-means algorithm seems to be able to identify four clusters. The majority of the noisy data points are in their own cluster. With the CPC method in the top right, the four clusters are separated in the visualization. The black cluster center is in the middle of the circle, and the red, green, and blue cluster centers are on the edge. The visualization shows clearly that the black, red, and blue clusters are clearly separated. But some data points in the black cluster are attracted to the blue cluster, and vice versa. Therefore, some data points overlap toward the respective cluster center. The A2C method in the bottom left does not deliver a very clear separation of the four clusters because of some mixed data points from the black and blue clusters. The green and red clusters are also overlapping. The AM method in the bottom right delivers the same mixed separation of the black and blue clusters but a significantly better separation of the green and red clusters.
The use of the three newly developed CCV techniques has indicated that the results based on the fuzzy k-means clustering algorithm deliver more information about the dependencies between the individual clusters when focusing on a single cluster. With two or three clusters, the separation is visible both for the artificial good data set and partially for the ambiguous data set. But with four clusters, the data points of some clusters overlap, so it is not possible to separate the data points correctly. With only two clusters, in contrast, one larger cluster is created into which all close data points are classified. However, it should be noted that the better separated the initial data set is, the more clearly it can be clustered. With the black cluster as the center, all three CCV techniques achieved a good separation of the three other clusters.

5.2. Iris Data Set

In addition to the two artificial data sets, the CCV techniques are also applied to a real data set, the well-known Iris data set [32]. This data set was created as part of an investigation of iris flowers in 1935 and contains measured data from 150 iris plants, divided into 50 plants each of Iris virginica, Iris setosa, and Iris versicolor. The length and width of the sepals and the petals of each individual plant are included. The categorical attribute species is not considered here. The fuzzy k-means clustering algorithm is again applied to determine the clustering results, as sketched below, and the CCV techniques are used for their visualization. The following figures show the results of the visualization for the Iris data set with 2, 3, and 4 clusters. The CPC method always puts the first cluster center in the middle, all other centers on the edge, and all data points inside the circle. The A2C method always uses the first and second cluster centers for the angle-based visualization. The AM method also always uses the first cluster center for the angle-based visualization. The colors of the clusters are red for the first, black for the second, green for the third, and blue for the fourth cluster.
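A minimal R sketch of the clustering step on the Iris data; the categorical species attribute is excluded, and fclust::FKM again serves as the fuzzy k-means implementation:

```r
library(fclust)
res_iris <- FKM(iris[, 1:4], k = 3, m = 2)
head(res_iris$U)   # membership degrees, the input for the CCV techniques
```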
Figure 8 shows the Iris data set with two clusters. With the CPC method (left), two clusters are separated, but some data points from the red cluster are attracted to the black cluster. Therefore, a clear separation is not visible. The A2C method (middle) leads to a better visualization because the two clusters are partially separated, but some data points of the red cluster are also attracted to the black cluster. The AM method (right) shows the same result.
Figure 9 shows the Iris data set with three clusters. With the CPC method (left), three clusters are clearly separated, and the black cluster is very well distinguished from the other two clusters. The A2C method (middle) also shows that the red cluster is clearly separated. But the black and green clusters are slightly mixed. With the AM method (right), the same results are indicated.
Figure 10 shows the Iris data set with four clusters. The CPC method (left) leads to a clear separation of the different clusters, but the green and red clusters are attracted to each other. With the A2C method (middle), the black and blue clusters are slightly mixed, and the other two clusters are mixed with no distinction. The same applies to the AM method (right).
The use of the three newly developed CCV techniques has indicated that the results based on the fuzzy k-means clustering algorithm deliver more information about the dependencies between the individual clusters when focusing on a single cluster. This was also demonstrated with the artificial data sets. In general, two clusters could be too few for the Iris data set because it contains three iris species. With three clusters, one cluster is clearly separated, but not the other two. With four clusters, a large part of the data points are mixed, and only one cluster differs from the others. With the red cluster as the center, all three CCV techniques achieved a good separation of the three other clusters.

5.3. Hepatitis-C-Virus Data Set

In addition to the already mentioned data sets, the CCV techniques are also applied to a second real data set. The HCV data set contains laboratory values of blood donors and Hepatitis C patients as well as demographic values like age [33]. The target attribute distinguishes normal blood donors from Hepatitis C patients, including the progression of the disease ('just' Hepatitis C, fibrosis, cirrhosis). The fuzzy k-means clustering algorithm is again applied to determine the clustering results, and the CCV techniques are used for their visualization. The following figures show the results of the visualization for the HCV data set with 2, 3, and 4 clusters. The CPC method always puts the first cluster center in the middle, all other centers on the edge, and all data points inside the circle. The A2C method always uses the first and second cluster centers for the angle-based visualization. The AM method also always uses the first cluster center for the angle-based visualization. The colors of the clusters are red for the first, black for the second, green for the third, and blue for the fourth cluster.
Figure 11 shows the HCV data set with two clusters. With the CPC method (left), two clusters are separated, but the data points seem to merge smoothly from the red cluster into the black cluster. Therefore, a clear separation is not visible. The A2C method (middle) leads to a better visualization, showing that the two clusters are at least partially separated. The AM method (right) shows the same result.
Figure 12 shows the HCV data set with three clusters. With the CPC method (left), three clusters are separated, but some data points from the red cluster are attracted to the green and black clusters. The A2C method (middle) shows a better, clearer separation. With the AM method (right), the same result is visible.
Figure 13 shows the HCV data set with four clusters. The CPC method (left) leads to a clear separation of the different clusters, but the red and blue clusters are attracted to each other. With the A2C method (middle), again, a better, clearer separation is visible. The same applies to the AM method (right).
The use of the three newly developed CCV techniques has indicated that the results based on the fuzzy k-means clustering algorithm deliver more information about the dependencies between the individual clusters when focusing on a single cluster. In general, two clusters could be too few for the HCV data set because there are four categories. With three clusters, the clusters are close together but more or less clearly separated. Four clusters lead to a more or less clear separation with no mixed data points. With the red cluster as the center, all three CCV techniques achieved a good separation of the three other clusters.

5.4. Sugar Production Data Set

The CCV techniques are also applied to a third real data set. The data was collected in a sugar factory of Nordzucker AG, where the sugar production process consists of several stages that are monitored automatically. The sugar production data set contains 43 parameters that influence a target parameter during sugar production. The parameters represent the sugar production over a period of four weeks. The collected data was completely anonymized and additionally manipulated with normally distributed noise. The fuzzy k-means clustering algorithm is also applied to determine the clustering results. For the visualization of the clustering results, CCV techniques are used. The following figures show the results of the visualization for the sugar production data set with 2, 3, and 4 clusters. The CPC method always puts the first cluster center in the middle, all other centers on the edge, and all data points inside the circle. The A2C method always uses the first and second cluster centers for the angle-based visualization. The AM method also always uses the first cluster center for the angle-based visualization. The colors of the clusters are red for the first, black for the second, green for the third, and blue for the fourth cluster.
Figure 14 shows the sugar production data set with two clusters. With the CPC method (left), two clusters are separated, but the points are placed in the middle between both clusters and are very close to each other. Therefore, a clear separation is not visible. The A2C method (middle) leads to a better visualization, showing that the two clusters are at least partially separated. The AM method (right) indicates that the points of the black cluster are distributed in a circle around the center, even if some points deviate from this. The points of the red cluster are distributed around the black cluster.
Figure 15 shows the sugar production data set with three clusters. With the CPC method (left), three clusters are separated, but the data points of the green cluster lie near the center between the green and red clusters. The data points of the black cluster lie between the red and the black cluster centers but are more attracted to the black cluster. The A2C method (middle) shows a better separation, where the points of the black and green clusters are close to each other; the red cluster is more clearly separated. The AM method (right) shows that the data points of the black cluster are also distributed in a circle around the center, with no deviations. The data points of the red and green clusters are distributed around the black data points.
Figure 16 shows the sugar production data set with four clusters. The CPC method (left) leads to a clear separation of the different clusters, with green data points in the center of the green and red clusters. With the A2C method (middle), a clear separation is also visible even if the red and blue clusters are slightly mixed. The AM method (right) shows that the points of the blue cluster are distributed in a circle around the center with slight deviations from some data points. The data points of the red, green, and black clusters are distributed around the blue cluster.
The use of the three newly developed CCV techniques has indicated that the results based on the fuzzy k-means clustering algorithm deliver more information about the dependencies between the individual clusters when focusing on a single cluster. Two clusters could be too few for the sugar production data set because the whole process varies and the influence of the parameters on the target variable cannot be divided into only two classes. With three clusters, the clusters are close together but more or less clearly separated. This is plausible because the data represent four weeks in which the sugar factory was running in a balanced mode, in contrast to the start-up of the factory, where all parameters first have to be set. With four clusters, there is also a more or less clear separation, but two clusters have mixed data points. With the red cluster as the center, all three CCV techniques achieved a good separation of the three other clusters.

6. Conclusions

The use of fuzzy clustering can lead to complex and overlapping cluster structures. For further interpretation, it is not always necessary to find all meaningful clusters within a data set; the focus is rather on individual clusters. The novelty of the paper is an assessment of individual clusters using the proposed visualizations in order to better judge single clusters and identify their characteristics. The developed visualization techniques focus on a single cluster, which is placed in the center, with the possible further clusters around it. The data can therefore be considered from the perspective of a selected cluster, making it possible to judge whether a specific cluster is meaningful or not. This is important because a global representation is not always helpful. The developed CCV techniques are applied to different data sets to demonstrate their ability to identify patterns and relationships in the data. For each data set, two to four clusters were visualized using the three CCV techniques: CPC, A2C, and AM. The data sets comprised an artificial good data set with a clear separation of the clusters and an ambiguous data set whose clusters are more difficult to separate due to mixed data points. The Iris data set showed a clear separation for the different numbers of clusters, as did the HCV data set, despite mixed data points. The sugar production data set showed a clear separation of the clusters up to a number of three, before a larger proportion of the data points were mixed with four clusters. The visualization techniques allowed a characterization of individual clusters with varying degrees of uncertainty in the assignments. This is particularly relevant when traditional hard clustering might be too restrictive.
The results show that the presented techniques can add value to data analysis and the interpretation of meaningful clusters. Focusing on cluster centers and membership degrees leads to the identification of patterns in the data sets that may be missed by traditional clustering visualizations. The results contribute to the understanding and possible applications of fuzzy clustering and its visualization in data analysis. The integration of more advanced machine learning methods and the adaptation to specific application areas can lead to further improvements of the visualization techniques in the future.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app14031102/s1, R-Code: Function of the CCV techniques.

Author Contributions

Conceptualization, K.V. and F.K.; methodology, F.K.; software, K.V. and F.K.; validation, K.V. and F.K.; formal analysis, K.V. and F.K.; investigation, K.V. and F.K.; resources, F.K.; data curation, K.V. and F.K.; writing—original draft preparation, K.V.; writing—review and editing, F.K.; visualization, K.V.; supervision, F.K.; project administration, K.V.; funding acquisition, F.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded partly by the German Federal Ministry of Education and Research, grant number 68518.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Kai Vahldiek is employed by the company Nordzucker AG. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Duda, R.O.; Stork, D.G.; Hart, P.E. Pattern Classification and Scene Analysis, 2nd ed.; Wiley: Chichester, UK; New York, NY, USA, 2000. [Google Scholar]
  2. Giordani, P. An Introduction to Clustering with R; Springer: Singapore, 2020. [Google Scholar]
  3. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2021.
  4. Arbelaitz, O.; Gurrutxaga, I.; Muguerza, J.; Pérez, J.M.; Perona, I. An extensive comparative study of cluster validity indices. Pattern Recognit. 2013, 46, 243–256.
  5. Hinton, G.; Roweis, S. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2002.
  6. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2018, arXiv:1802.03426.
  7. Gustafson, D.; Kessel, W. Fuzzy clustering with a fuzzy covariance matrix. In Proceedings of the 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, San Diego, CA, USA, 10–12 January 1979; pp. 761–766.
  8. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
  9. Bora, D.J.; Gupta, A.K. A comparative study between fuzzy clustering algorithm and hard clustering algorithm. IJCTT 2014, 10, 108–113.
  10. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Springer: New York, NY, USA, 1981.
  11. Larson, J.L.; Zhou, W.; Veliz, P.T.; Smith, S. Symptom Clusters in Adults with Post-COVID-19: A Cross-Sectional Survey. Clin. Nurs. Res. 2023, 32, 1071–1080.
  12. Dubes, R.; Jain, A.K. Clustering Methodologies in Exploratory Data Analysis. In Advances in Computers, Volume 19; Elsevier: Amsterdam, The Netherlands, 1980; pp. 113–228.
  13. Omatu, S.; Neves, J.; Rodríguez, J.M.C.; Santana, J.F.D.P.; González, S.R. Distributed Computing and Artificial Intelligence. In Proceedings of the 12th International Conference, Salamanca, Spain, 28–30 March 2012; Springer International Publishing: Cham, Switzerland, 2015.
  14. Ishioka, T. Extended K-Means with an Efficient Estimation of the Number of Clusters. In Intelligent Data Engineering and Automated Learning—IDEAL 2000: Data Mining, Financial Engineering, and Intelligent Agents; Leung, K.S., Chan, L.-W., Meng, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 17–22.
  15. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  16. Thorndike, R.L. Who belongs in the family? Psychometrika 1953, 18, 267–276.
  17. Klawonn, F.; Höppner, F. What Is Fuzzy about Fuzzy Clustering? Understanding and Improving the Concept of the Fuzzifier. In Advances in Intelligent Data Analysis V; Berthold, M.R., Lenz, H.J., Bradley, E., Kruse, R., Borgelt, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 254–264.
  18. Jiao, L.; Yang, H.; Liu, Z.; Pan, Q. Interpretable fuzzy clustering using unsupervised fuzzy decision trees. Inf. Sci. 2022, 611, 540–563.
  19. Kumar, D.; Bezdek, J.C.; Rajasegarar, S.; Palaniswami, M.; Leckie, C.; Chan, J.; Gubbi, J. Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data. ACM Trans. Knowl. Discov. Data 2016, 11, 1–40.
  20. Rueda, L.; Zhang, Y. Geometric visualization of clusters obtained from fuzzy clustering algorithms. Pattern Recognit. 2006, 39, 1415–1429.
  21. Klawonn, F.; Chekhtman, V.; Janz, E. Visual Inspection of Fuzzy Clustering Results. In Advances in Soft Computing; Benítez, J.M., Cordón, O., Hoffmann, F., Roy, R., Eds.; Springer: London, UK, 2003; pp. 65–76.
  22. Park, L.A.F.; Bezdek, J.C.; Leckie, C.A. Visualization of clusters in very large rectangular dissimilarity data. In Proceedings of the 2009 4th International Conference on Autonomous Robots and Agents, Wellington, New Zealand, 10–12 February 2009; pp. 251–256.
  23. Sharko, J.; Grinstein, G. Visualizing Fuzzy Clusters Using RadViz. In Proceedings of the 2009 13th International Conference Information Visualisation, Barcelona, Spain, 15–17 July 2009; pp. 307–316.
  24. Zhou, F.; Bai, B.; Wu, Y.; Chen, M.; Zhong, Z.; Zhu, R.; Chen, Y.; Zhao, Y. FuzzyRadar: Visualization for understanding fuzzy clusters. J. Vis. 2019, 22, 913–926.
  25. Bui, Q.T.; Vo, B.; Snasel, V.; Pedrycz, W.; Hong, T.P.; Nguyen, N.T.; Chen, M.Y. SFCM: A Fuzzy Clustering Algorithm of Extracting the Shape Information of Data. IEEE Trans. Fuzzy Syst. 2021, 29, 75–89.
  26. Zhao, Y.; Luo, F.; Chen, M.; Wang, Y.; Xia, J.; Zhou, F.; Wang, Y.; Chen, Y.; Chen, W. Evaluating Multi-Dimensional Visualizations for Understanding Fuzzy Clusters. IEEE Trans. Vis. Comput. Graph. 2018, 25, 12–21.
  27. Streit, M.; Gratzl, S.; Gillhofer, M.; Mayr, A.; Mitterecker, A.; Hochreiter, S. Furby: Fuzzy force-directed bicluster visualization. BMC Bioinform. 2014, 15 (Suppl. S6), S4.
  28. Pedrycz, W.; Izakian, H. Cluster-Centric Fuzzy Modeling. IEEE Trans. Fuzzy Syst. 2014, 22, 1585–1597.
  29. Ariza-Jiménez, L.; Villa, L.F.; Quintero, O.L. Memberships Networks for High-Dimensional Fuzzy Clustering Visualization. In Proceedings of the Applied Computer Sciences in Engineering: 6th Workshop on Engineering Applications, WEA 2019, Santa Marta, Colombia, 16–18 October 2019; Figueroa-García, J.C., Duarte-González, M., Jaramillo-Isaza, S., Orjuela-Cañon, A.D., Diaz-Gutierrez, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 263–273.
  30. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  31. Vahldiek, K.; Zhou, L.; Zhu, W.; Klawonn, F. Development of a data generator for multivariate numerical data with arbitrary correlations and distributions. Intell. Data Anal. 2021, 25, 789–807.
  32. Runkler, T.A. Data Analytics: Models and Algorithms for Intelligent Data Analysis; Vieweg+Teubner Verlag: Wiesbaden, Germany, 2012.
  33. Hoffmann, G.; Bietenbeck, A.; Lichtinghagen, R.; Klawonn, F. Using machine learning techniques to generate laboratory diagnostic pathways—A case study. J. Lab. Precis. Med. 2018, 3, 58.
Figure 1. Example of a visual comparison between a scatterplot and the newly developed CPC technique.
Figure 2. Artificial good data set with two clusters and the visualized results with the three CCV techniques.
Figure 3. Artificial good data set with three clusters and the visualized results with the three CCV techniques.
Figure 4. Artificial good data set with four clusters and the visualized results with the three CCV techniques.
Figure 5. Artificial ambiguous data set with two clusters and the visualized results with the three CCV techniques.
Figure 6. Artificial ambiguous data set with three clusters and the visualized results with the three CCV techniques.
Figure 7. Artificial ambiguous data set with four clusters and the visualized results with the three CCV techniques.
Figure 8. Iris data set with two clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 9. Iris data set with three clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 10. Iris data set with four clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 11. HCV data set with two clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 12. HCV data set with three clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 13. HCV data set with four clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 14. Sugar production data set with two clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 15. Sugar production data set with three clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Figure 16. Sugar production data set with four clusters and visualized results with the CPC (left), A2C (middle), and AM (right) methods.
Table 1. Comparison of visualization techniques based on the consideration of single or all clusters.

                            Consideration of Clusters
Visualization Technique     Single          All
CPC                         Yes             No
A2C                         Yes             No
AM                          Yes             No
Scatterplot                 No              Yes
Heat map                    No              Yes
Dendrogram                  No              Yes
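
For readers who want to reproduce the single-cluster perspective summarized in Table 1, the following minimal R sketch illustrates how the membership degrees of one chosen cluster can be extracted from a fuzzy c-means result and plotted against the distance to that cluster's center. This is not the authors' implementation of CPC, A2C, or AM; the use of the e1071 package, the Iris data, and the plotted quantities are assumptions made purely for illustration.

# Minimal sketch, assuming the e1071 package is installed; this only
# illustrates the single-cluster view on which cluster-centered
# visualizations operate, not the CPC/A2C/AM techniques themselves.
library(e1071)

x <- scale(iris[, 1:4])                 # Iris data, numeric attributes only
fcm <- cmeans(x, centers = 3, m = 2)    # fuzzy c-means with fuzzifier m = 2

k <- 1                                  # index of the single cluster to judge
u_k <- fcm$membership[, k]              # membership degree of each object in cluster k

# Euclidean distance of every object to the center of cluster k
center_k <- matrix(fcm$centers[k, ], nrow(x), ncol(x), byrow = TRUE)
d_k <- sqrt(rowSums((x - center_k)^2))

# A compact cluster-centered view: membership versus distance to the center
plot(d_k, u_k, xlab = "Distance to center of cluster k",
     ylab = "Membership degree in cluster k")

In such a plot, a compact and well-separated cluster shows high membership degrees concentrated at small distances, which mirrors the kind of single-cluster judgement that the techniques compared above are designed to support.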