Article

From Fuzzy Information to Community Detection: An Approach to Social Networks Analysis with Soft Information

1
Facultad de Estudios Estadísticos, Universidad Complutense de Madrid, 28040 Madrid, Spain
2
Instituto de Evaluación Sanitaria, Universidad Complutense de Madrid, 28040 Madrid, Spain
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4348; https://doi.org/10.3390/math10224348
Submission received: 29 September 2022 / Revised: 7 November 2022 / Accepted: 11 November 2022 / Published: 19 November 2022
(This article belongs to the Special Issue FSTA: Fuzzy Set Theory and Applications)

Abstract

On the basis of network analysis, and within the context of modeling imprecision or vague information with fuzzy sets, we propose an innovative way to analyze, aggregate and apply this uncertain knowledge into community detection of real-life problems. This work is set on the existence of one (or multiple) soft information sources, independent of the network considered, assuming this extra knowledge is modeled by a vector of fuzzy sets (or a family of vectors). This information may represent, for example, how much some people agree with a specific law, or their position against several politicians. We emphasize the importance of being able to manage the vagueness which usually appears in real life because of the common use of linguistic terms. Then, we propose a constructive method to build fuzzy measures from fuzzy sets. These measures are the basis of a new representation model which combines the information of a network with that of fuzzy sets, specifically when it comes to linguistic terms. We propose a specific application of that model in terms of finding communities in a network with additional soft information. To do so, we propose an efficient algorithm and measure its performance by means of a benchmarking process, obtaining high-quality results.

1. Introduction and Related Work

Social Network Analysis (SNA) is described as the study and understanding of the relationships between two or more items. As one of the hottest topics of SNA, the Community Detection Problem (CDP) has become a problem of great interest in modern statistics with applications in several fields [1,2,3].
Most of the algorithms and definitions of community detection problems assume that the only information available for identifying the clusters/communities in a network is the graph which describes its structure. This graph can be non-directed and binary (all the relations in the network are equal) in the classical and most studied community detection problem [2]; non-directed and valued (the network is modeled by a weighted non-directed graph); or directed, when the relations are not symmetrical [1,4,5]. There are other interesting approaches focused on the incorporation of additional information to the crisp graphs, specifically to find communities in a network [6,7]. Nevertheless, all of these approaches consider the community detection problem only from a topological point of view, focusing on the relations between nodes and not considering other types of information that could be relevant in order to find communities in a real problem.
We illustrate our idea with an example. Let us present a situation in which we have a set V of nodes which represents the members of a parliament, whose friendly relations are known to us by the crisp graph G = ( V , E ) . Let us assume that the reason why they are interacting is because they are voting on a specific law in parliament. This information (the voting problem) and also their political preference on the law (or their capacity in the voting problem) could be relevant information to identify the clusters in the network.
To deal with this type of problem, in [8,9,10,11,12], the authors introduce a new element to the community detection problem: a capacity measure that tries to model and reflect the reason why the nodes are interacting in the network in addition to the interests of the nodes to remain united. From this perspective, in [11], we present an efficient algorithm for a community detection problem that deals with networks and fuzzy measures in that sense. Furthermore, in [13], we present a constructive method to build a 1-additive fuzzy measure from a crisp valuation of the nodes in the network.
Nevertheless, due to the natural uncertainty in real problems, the information associated with the network nodes is usually not naturally crisp. Uncertainty is associated with the lack of knowledge about the occurrence of some event. Within the last decades, two important models have been proposed to represent different types of uncertainty: randomness and vagueness/imprecision. Whereas randomness emerges due to the lack of knowledge about the occurrence of some event [14], vagueness is a phenomenon that arises when trying to group together objects that share a certain property. A typical vague property is “to be a small number” or “to be a tall person”, or (taking the previous example of the voting system in a parliament) “to be against a specific law”. In this way, the fuzzy linguistic approach has been successfully applied to many problems [15]. Taking into account this type of information, an important goal of this work is to provide a methodology to face community detection problems in networks with additional soft information. With the aim of extending some of the definitions and algorithms presented in [13] for crisp information, in this paper, we work on the basis of the existence of a vector (or a family of them) whose elements are no longer crisp values but fuzzy sets that provide some type of soft information related to the individuals in the problem. In this context, another important objective of this work arises: we characterize a new representation tool which generalizes other existing models in the literature, regarding the nature of the information: the extended fuzzy graph based on a fuzzy vector (EFVFG). It is defined on the basis of a crisp network and a vector of fuzzy sets.
Another goal is to extend it to a more complex scenario in which there is not only one type of information but many; in this situation, we strongly recommend consideration of the multi-dimensional extended fuzzy graph fuzzy vector-based (MEFVFG), which is defined on the basis of a family of vectors of fuzzy sets.
Then, we suggest a specific application of the new representation model, which is useful to obtain realistic partitions in a network with additional soft information. We present a competitive algorithm which introduces fuzzy sets to the process of grouping individuals. It is a modification of the well-known Louvain algorithm for crisp networks [16] that allows us to deal with soft information in the network, and it is developed on the basis of the MEFVFG. To guarantee the quality of the proposed methodology, we dedicate an important part of this work to its evaluation. The computational results shown in this work, obtained through a benchmarking process developed on the basis of some trapezoidal fuzzy sets, allow us to assert the good performance of our algorithm.
This paper is organized as follows. In Section 2, we lay the foundations of the work, showing several concepts and definitions that are useful for the understanding and follow-up of the work. In Section 3, we characterize a new model representation based on soft information about the individuals of a network given by several fuzzy sets. After that, in Section 4, we propose a specific application of that new tool, related to the community detection problem with additional soft information, which is a very live issue in the field of SNA. In order to evaluate the performance of the proposed methodology, we show some computational results in Section 5. The paper ends in Section 6 with some conclusions and a final discussion.

2. Preliminaries

2.1. Fuzzy Sets

Fuzzy sets were introduced by Zadeh as an extension of the usual concept of set, and they have been applied in several fields [17,18,19].
 Definition 1 
(Fuzzy set [20]). Let $X$ denote a set. A fuzzy set in $X$, denoted by $\tilde{A}$, is a set uniquely characterized by its membership function $\eta_A : X \to [0,1]$ where, for every point $x \in X$, $\eta_A(x)$ defines $x$’s “grade of membership”.
In this work, we will focus on fuzzy sets over positive real numbers, so from now on, we will assume that $X = \mathbb{R}^+$.
Introduced by Zadeh [21] and applied to the resolution of many real problems, fuzzy linguistic variables were defined for situations in which the imprecision or vagueness of a quantitative variable is given in linguistic terms. For example, a linguistic variable $\tilde{L} = \{L_1, \dots, L_k\}$ can be characterized by $k$ membership functions: $\tilde{L}$ is the collection of its linguistic values, $U \subseteq \mathbb{R}^+$ is a universe of discourse, and the meaning of each linguistic value is characterized by $\eta_{L_i} : U \to [0,1]$, which associates each $u \in U$ with its compatibility. In the computational results section of this work, we consider a specific type of fuzzy sets that are commonly used to model the linguistic terms of a fuzzy linguistic variable: the trapezoidal fuzzy sets, whose shape is similar to Figure 1.
 Definition 2 
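(Trapezoidal fuzzy set [22]). The fuzzy set $A = (a, b, c, d)$ is said to be trapezoidal if its membership function $\eta_A$ is defined by:
$$\eta_A(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } b < x \le c \\ \dfrac{x - d}{c - d} & \text{if } c < x \le d \\ 0 & \text{if } d < x \end{cases}$$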
Figure 1. Graph G = ( V , E ) and fuzzy linguistic variable L ˜ .
In this work, some of the input values are defined as fuzzy sets. Nevertheless, to face the final goal, we need to “convert” that soft information into crisp values. This process of obtaining a single crisp output from a fuzzy set is known as “defuzzification”, and there are different methods to carry it out [23,24]. In general, a formal definition of a defuzzification operator is shown below.
 Definition 3 
(Defuzzification operator [24]). Given a universe $X$, the operator $D : \mathcal{F}(X) \to X$, which maps the fuzzy sets on $X$ into elements of $X$, is said to be a defuzzification operator.
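As an illustration of Definitions 2 and 3, the following minimal Python sketch evaluates a trapezoidal membership function and defuzzifies a trapezoidal fuzzy set. The text does not prescribe a particular operator $D$ at this point; the centroid (center of gravity) is assumed here only because it is a common choice, and the function names are hypothetical.

```python
def trapezoid_membership(x, a, b, c, d):
    """Grade of membership of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0                      # plateau; also covers a == b or c == d
    if x < b:
        return (x - a) / (b - a)        # rising side, a <= x < b
    return (x - d) / (c - d)            # falling side, c < x <= d


def centroid_defuzzify(a, b, c, d):
    """Centroid of the trapezoid (a, b, c, d); an assumed choice of D."""
    denom = 3.0 * (c + d - a - b)
    if denom == 0:                      # degenerate case a == b == c == d
        return float(a)
    return ((c * c + d * d + c * d) - (a * a + b * b + a * b)) / denom


# Hypothetical "Medium" term on the universe [0, 100]
medium = (30, 40, 60, 70)
grade_at_35 = trapezoid_membership(35, *medium)
crisp_medium = centroid_defuzzify(*medium)
```

Any other operator satisfying Definition 3 (for instance, the mean of maxima) could be substituted for `centroid_defuzzify` without changing the rest of the pipeline.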

2.2. Networks with Additional Information: An Algorithm That Deals with CDP and Fuzzy Measures

A graph is a pair $G = (V, E)$, in which $V = \{1, 2, \dots, n\}$ is a set of individuals called nodes or vertices, and $E \subseteq \{\{i, j\} \mid i, j \in V\}$ is a set of unordered pairs of nodes called edges or arcs. A graph is unequivocally defined by its adjacency matrix $A$, which is characterized as $A_{ij} = 1$ if $\{i, j\} \in E$, and $A_{ij} = 0$ otherwise, for all $i, j \in V$. A graph is said to be valued or weighted if there is a function $w : E \to \mathbb{R}$ which assigns a weight to each edge. In this type of graph, the adjacency matrix not only represents the existence of an edge between two nodes but also shows the weight assigned to each edge by the function $w$.
The community detection problem is an important problem in the SNA field, the goal of which is to find a “good” partition of the set of nodes. A partition is considered to be only as good as how internally homogeneous and externally heterogeneous the defined groups are in terms of the connections between individuals. The modularity measure defined in [2], usually denoted by Q, is a quality function of the partitions which somehow measures the strength of the division of a graph in a partition of communities. Q is usually considered as a function to be maximized.
 Definition 4 
(Modularity [25]). Let $G = (V, E)$ denote a graph with adjacency matrix $A$. Let $i, j \in V$ and $m = |E|$. The modularity function of the partition $P$ of $V$ is characterized by
$$Q(P) = \frac{1}{2m} \sum_{i, j \in V} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j)$$
in which $k_i$ is the degree of $i$ and $C_i$ is the group to which $i$ is assigned; $\delta(C_i, C_j) = 1$ if $C_i = C_j$, and $\delta(C_i, C_j) = 0$ otherwise.
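The modularity of Definition 4 can be computed directly from an adjacency matrix and a vector of community labels. The sketch below is a plain double-sum implementation intended for small examples; the function name and the toy graph (two triangles joined by one edge) are illustrative assumptions, not taken from the paper.

```python
def modularity(A, labels):
    """Modularity Q of the partition in which labels[i] is the community of node i."""
    n = len(A)
    degree = [sum(row) for row in A]    # k_i (weighted degree if A is weighted)
    two_m = sum(degree)                 # equals 2m for an undirected graph
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(C_i, C_j) = 1
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0] * 6)             # everything in one community
```

Placing each triangle in its own community gives a strictly positive value, whereas the trivial single-community partition gives zero, which matches the reading of $Q$ as a quantity to be maximized.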
Without detriment to the worthiness of classic approaches, some authors agree on the importance of including as much information as possible in the network analysis process, beyond the direct crisp connections between individuals defined by the edges. We find several approaches with a common idea: the more information is considered, the more realistic the results obtained, either in terms of partitions or any other notion [6,26,27]. Specifically, this work builds on the idea introduced in [11]. In that preliminary work, the authors proposed a methodology to find realistic communities in a graph in terms of a fuzzy measure, which defines some additional information about the synergies between the individuals. That method was based on the Louvain algorithm [16] with a main difference: the calculation of modularity considers not only the adjacency matrix but also an additional information matrix, specifically, one obtained from the mentioned affinity fuzzy measure. This methodology, named Duo Louvain, is summarized in Algorithm 1. The main difference with respect to the common Louvain method can be seen in line 15 of the pseudo-code: the variation of modularity obtained when moving the node $o_i$ to the community to which its neighbor $e_j$ belongs, $\Delta Q_{o_i}(e_j)$, is calculated on a matrix $M$, which may differ from the adjacency matrix. This methodology was adapted to different scenarios in later works [9,12,28] related to a variety of fuzzy measures. Let us emphasize that this methodology is not limited to fuzzy measures: it can be applied in any other scenario, provided that the additional information available can be aggregated, in some form, into a matrix.
As mentioned, a quick overview of that methodology is shown in Algorithm 1, where $\pi(V)$ denotes the set of all feasible permutations of the elements of $V$; $o = (o_1, \dots, o_n) \in \pi(V)$ is one of these orders; $H(o_i)$ denotes the set of neighbors of $o_i \in V$, that is, the nodes to which $o_i$ is directly connected; and $\Delta Q_{o_i}(e_j)$ denotes the variation of modularity obtained when moving $o_i$ to the community to which $e_j$ belongs.
Algorithm 1 Duo Louvain
 1: Input: $A$, $M$;
 2: Output: $P$;
 3: Preliminary
 4: $C_i \leftarrow \{i\}$, $\forall i \in V$ (each node $i$ is an isolated community);
 5: $P \leftarrow (1, 2, \dots, n)$ (initial partition);
 6: end Preliminary
 7: Phase 1
 8: Take $o = (o_1, \dots, o_i, \dots, o_n) \in \pi(V)$;
 9: $stop \leftarrow 0$;
10: while ($stop == 0$) do
11:    $stop \leftarrow 1$
12:    for ($i = 1$) to ($n$) do
13:       $\{e_1, \dots, e_h\} \leftarrow H(o_i)$ (find the neighbors of $o_i$ in $A$);
14:       for ($j = 1$) to ($h$) do
15:          Calculate $\Delta Q_{o_i}(e_j)$ in $M$;
16:       end for
17:       $j^* \leftarrow e_\ell$ such that $\Delta Q_{o_i}(j^*) = \max_{\ell \in \{1, \dots, h\}} \Delta Q_{o_i}(e_\ell)$;
18:       if ($\Delta Q_{o_i}(j^*) > 0$) then
19:          $C_{P_{o_i}} \leftarrow C_{P_{o_i}} \setminus \{o_i\}$;
20:          $C_{P_{j^*}} \leftarrow C_{P_{j^*}} \cup \{o_i\}$;
21:          $P_{o_i} \leftarrow P_{j^*}$;
22:          $stop \leftarrow 0$;
23:       end if
24:    end for
25: end while
26: end Phase 1
27: Phase 2
28: Calculate $A^*$ from $A$ (nodes of $A^*$ are the communities previously found in $A$);
29: Calculate $M^*$ from $M$ (nodes of $M^*$ are the communities previously found in $M$);
30: if ($A^* \neq A$) then
31:    $A \leftarrow A^*$;
32:    $M \leftarrow M^*$;
33:    Apply Phase 1 and Phase 2;
34: end if
35: end Phase 2
36: return $P$;
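The local-moving step (Phase 1) of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the modularity gain is obtained by recomputing the modularity on $M$ from scratch (simple but slow), the visiting order defaults to the natural order instead of a permutation drawn from $\pi(V)$, and the aggregation step (Phase 2) is omitted. The function names are hypothetical.

```python
def modularity(M, labels):
    """Modularity of the partition `labels`, computed on the matrix M."""
    n = len(M)
    k = [sum(row) for row in M]
    two_m = sum(k)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += M[i][j] - k[i] * k[j] / two_m
    return q / two_m


def duo_louvain_phase1(A, M, order=None, eps=1e-12):
    """Phase 1 only: neighbors are read from A, modularity gains from M."""
    n = len(A)
    labels = list(range(n))                    # each node starts isolated
    order = list(range(n)) if order is None else list(order)
    moved = True
    while moved:
        moved = False
        for i in order:
            base = modularity(M, labels)
            best_gain, best_label = 0.0, labels[i]
            for j in range(n):
                # candidate communities: those of the neighbors of i in A
                if j != i and A[i][j] != 0 and labels[j] != labels[i]:
                    trial = list(labels)
                    trial[i] = labels[j]
                    gain = modularity(M, trial) - base   # Delta Q in M
                    if gain > best_gain + eps:
                        best_gain, best_label = gain, labels[j]
            if best_label != labels[i]:        # move only if the gain is positive
                labels[i] = best_label
                moved = True
    return labels


# Toy graph: two triangles joined by the edge (2, 3); here M is simply A
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

labels = duo_louvain_phase1(A, A)
```

With $M = A$ the procedure reduces to the first phase of the standard Louvain method; passing a different matrix `M` changes which moves are considered beneficial while the candidate moves are still restricted to neighbors in `A`.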

3. Model Definition: Building Extended Fuzzy Graphs from Graphs with Fuzzy Nodes Information

In this section, we work on the definition of a new representation tool. Firstly, we do this in a uni-dimensional scenario, assuming there is an additional fuzzy information vector related to the individuals of a set $V$, denoted by $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_n)$. For each $i \in V$, the fuzzy set $\tilde{f}_i$ (characterized by its membership function $\eta_{f_i}$) represents the vague or imprecise information associated with node $i$ regarding some characteristic or evidence. This fuzzy modeling is especially (but not only) useful when the information associated with each node is gathered, for example, as a linguistic term. Specifically, in this case, we could work with linguistic terms $\tilde{f}_i \in \tilde{L}$. By analogy with [9], we first propose a characterization of a fuzzy Sugeno $\lambda$-measure from this fuzzy vector $\tilde{f}$. This measure is denoted by $\mu_{f,p}$.
 Definition 5 
(Fuzzy Sugeno $\lambda$-measure obtained from fuzzy sets). Given the set $V = \{1, 2, \dots, n\}$, let $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_n)$ denote a vector of fuzzy sets defined over a universe $U \subseteq \mathbb{R}^+$ (i.e., $\eta_{f_i} : U \to [0,1]$), and let $D : \mathcal{F}(\mathbb{R}^+) \to \mathbb{R}^+$ denote a defuzzification operator. Then, for any $p \in (0,1]$ and $i \in V$, a natural definition of $\mu_{f,p}$ is:
$$\mu_{f,p}(i) = p \, \frac{D(\tilde{f}_i)}{\sum_{k=1}^{n} D(\tilde{f}_k)}, \quad \forall i \in V$$
where $\mu_{f,p}(M \cup N) = \mu_{f,p}(M) + \mu_{f,p}(N) + \lambda \, \mu_{f,p}(M) \, \mu_{f,p}(N)$, $\forall M, N \subseteq V$ with $M \cap N = \emptyset$, and $\lambda + 1 = \prod_{i=1}^{n} \left(1 + \lambda \, \mu_{f,p}(i)\right)$, with $p \in (0,1]$.
Note that the interpretation of μ f , p depends on p. Specifically,
 Proposition 1. 
Given the parameter $p = 1$, the function $\mu_{f,p}$ is a 1-additive fuzzy Sugeno λ-measure.
 Proof. 
Because of the properties of the Sugeno $\lambda$-measures [29] in addition to the assumption of $p = 1$, we have $\lambda = 0$. Then, $\forall M \subseteq V$, $\mu_{f,p}(M) = \dfrac{\sum_{\ell \in M} D(\tilde{f}_\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k)}$, so $\mu_{f,p}$ meets the conditions of fuzzy measures [30], Sugeno $\lambda$-measures [29] and 1-additivity [31].
  • $\mu_{f,p}(\emptyset) = 0$. Trivial.
  • $\mu_{f,p}(V) = \dfrac{D(\tilde{f}_1)}{\sum_{k=1}^{n} D(\tilde{f}_k)} + \dots + \dfrac{D(\tilde{f}_n)}{\sum_{k=1}^{n} D(\tilde{f}_k)} = 1$.
  • Let $M \subseteq N \subseteq V$. Then, $\mu_{f,p}(N) = \dfrac{\sum_{\ell \in N} D(\tilde{f}_\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k)} = \dfrac{\sum_{\ell \in M} D(\tilde{f}_\ell) + \sum_{t \in N \setminus M} D(\tilde{f}_t)}{\sum_{k=1}^{n} D(\tilde{f}_k)} \ge \dfrac{\sum_{\ell \in M} D(\tilde{f}_\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k)} = \mu_{f,p}(M)$, so $\mu_{f,p}$ is a fuzzy measure.
  • Sugeno $\lambda$-measure. Trivial by definition.
  • 1-additivity: regarding [31], it is trivial if, $\forall i \in \{1, \dots, n\}$, we define $a_i = \mu_{f,p}(i)$.
  □
 Proposition 2. 
Given the parameter $p \in (0,1)$, $\mu_{f,p}$ is a fuzzy Sugeno λ-measure.
 Proof. 
The proof is similar to that of Proposition 1.    □
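To make Definition 5 and the two propositions concrete, the following Python sketch builds the singleton values $\mu_{f,p}(i)$ from already defuzzified values $D(\tilde{f}_i)$, solves $\lambda + 1 = \prod_i (1 + \lambda \mu_{f,p}(i))$ numerically, and evaluates the measure of a subset. The bisection solver and all function names are assumptions made for illustration; the paper does not specify how $\lambda$ is computed. The sketch assumes strictly positive defuzzified values and at least two nodes.

```python
import math


def singleton_values(defuzzified, p):
    """mu_{f,p}(i) = p * D(f_i) / sum_k D(f_k) for every node i."""
    total = sum(defuzzified)
    return [p * d / total for d in defuzzified]


def solve_lambda(g):
    """Root of lambda + 1 = prod(1 + lambda * g_i); returns 0 when sum(g) == 1."""
    s = sum(g)
    if abs(s - 1.0) < 1e-9:
        return 0.0                      # p = 1: the measure is additive
    if s > 1.0:
        raise ValueError("sum of singleton values > 1; not expected for p in (0, 1]")

    def h(lam):
        return math.prod(1.0 + lam * gi for gi in g) - (1.0 + lam)

    hi = 1.0
    while h(hi) <= 0.0:                 # bracket the positive root from above
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no positive root found")
    lo = hi
    for _ in range(100):                # bracket it from below
        if h(lo) < 0.0:
            break
        lo /= 2.0
    for _ in range(200):                # plain bisection
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def measure(subset, g, lam):
    """mu(M) for a subset M of node indices, via the lambda-rule."""
    if lam == 0.0:
        return sum(g[i] for i in subset)
    return (math.prod(1.0 + lam * g[i] for i in subset) - 1.0) / lam


g_half = singleton_values([1.0, 1.0], p=0.5)   # two nodes with equal values
lam_half = solve_lambda(g_half)
g_one = singleton_values([3.0, 1.0], p=1.0)
lam_one = solve_lambda(g_one)
```

For $p = 1$ the singleton values sum to one and $\lambda = 0$, which is the 1-additive case of Proposition 1; for $p < 1$ they sum to $p$ and a positive $\lambda$ restores $\mu_{f,p}(V) = 1$.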
So, we generalize the notion of extended fuzzy graph vector based, $\tilde{G} = (V, E, \mu_{x,p})$ [9], to a scenario where the additional information is not provided by a crisp vector $x$ but comes from a vector of fuzzy sets, $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_n)$.
 Definition 6 
(Extended fuzzy graph fuzzy vector based (EFVFG)). Let $G = (V, E)$ denote a graph with $n = |V|$ individuals and $m = |E|$ edges. Let $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_n)$ denote a vector of fuzzy sets in membership function form, each of them related to an individual of $V$. Let $D : \mathcal{F}(\mathbb{R}^+) \to \mathbb{R}^+$ denote a defuzzification operator and, given the parameter $p \in (0,1]$, let $\mu_{f,p}$ denote the fuzzy Sugeno λ-measure obtained from $\tilde{f}$. Then, the tuple $\hat{G} = (V, E, \mu_{f,p})$ is said to be an extended fuzzy graph based on the fuzzy vector $\tilde{f}$.
 Example 1. 
Let $\tilde{L} = \{VeryLow, Low, Medium, High, VeryHigh\}$ denote a fuzzy linguistic variable defined over the universe $U = [0, 100]$, which is characterized by the corresponding membership functions $\eta_{VL}, \eta_{L}, \eta_{M}, \eta_{H}, \eta_{VH} : [0, 100] \to [0, 1]$ associated with the different linguistic terms that represent how in agreement a person is with some law denoted by $LW_1$. Let $G = (V, E)$ define a cyclic graph with $V = \{1, 2, 3, 4, 5, 6, 7, 8\}$ and $E = \{(1,2), (2,3), (3,4), (4,5), (5,6), (6,7), (7,8), (8,1)\}$, and finally, let $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_8) = (VL, VL, L, VL, H, VH, H, VH)$ denote a vector of fuzzy sets that models the linguistic terms affinity of these eight nodes of $V$ to the law $LW_1$.
From the previous definition, it is possible to build (for any $p \in (0,1]$) the extended fuzzy graph associated with the fuzzy vector $\tilde{f}$ and the graph $G = (V, E)$, that is: $\hat{G} = (V, E, \mu_{f,p})$.
Assuming we can have more than one characteristic associated with each node in a network, we go beyond the uni-dimensional case by considering there is not only a vector of fuzzy sets $\tilde{f}$, but a family of them, $\tilde{f}^1, \dots, \tilde{f}^r$, each of them defining some extra knowledge about the individuals.
 Definition 7 
(Multi-dimensional extended fuzzy graph fuzzy vector based (MEFVFG)). Let $G = (V, E)$ denote a graph with $n$ nodes and $m$ edges, and let $\tilde{f}^1, \dots, \tilde{f}^r$ denote a family of $r$ independent vectors of $n$ fuzzy sets, each of them defining a type of information, so that, $\forall \ell = 1, \dots, r$; $i = 1, \dots, n$, the component $\tilde{f}_i^\ell$ is the fuzzy set related to the characteristic $\ell$ and the individual $i$, with membership function $\eta_{f_i^\ell} : U \to [0,1]$. Let $D : \mathcal{F}(\mathbb{R}^+) \to \mathbb{R}^+$ be a defuzzification operator (which we assume to be the same for all characteristics $\tilde{f}^\ell$), and let $p_\ell \in (0,1]$ be a parameter for each $\ell$.
Then, the tuple $\hat{G} = (V, E, \mu_{f^1,p_1}, \dots, \mu_{f^r,p_r})$ is said to be a multi-dimensional extended fuzzy graph (MEFVFG) based on the $r$ fuzzy vectors $(\tilde{f}^1, \dots, \tilde{f}^r)$.
 Example 2. 
Let $\tilde{L} = \{VeryLow, Low, Medium, High, VeryHigh\}$ denote a fuzzy linguistic variable defined over the universe $U = [0, 100]$ characterized by the corresponding membership functions $\eta_{VL}, \eta_{L}, \eta_{M}, \eta_{H}, \eta_{VH} : [0, 100] \to [0, 1]$ associated with the different linguistic terms that represent how in agreement a person is with a specific law, $LW_1$. Let $G = (V, E)$ denote a cyclic graph with $V = \{1, 2, 3, 4, 5, 6, 7, 8\}$ and $E = \{(1,2), (2,3), (3,4), (4,5), (5,6), (6,7), (7,8), (8,1)\}$, and finally let $\tilde{f}^1 = (\tilde{f}_1^1, \dots, \tilde{f}_8^1) = (VL, VL, L, VL, H, VH, H, VH)$ be a vector of fuzzy sets that models the linguistic terms affinity of these eight nodes to the congress proposal $LW_1$ and let $\tilde{f}^2 = (\tilde{f}_1^2, \dots, \tilde{f}_8^2) = (M, M, M, M, L, L, L, H)$ be a vector of eight fuzzy sets that models the linguistic terms affinity of the eight nodes to the congressional bill, which is denoted by $LW_2$ (see Figure 2).
From the previous definition, for any $p_1, p_2 \in (0,1]$, it is possible to build the MEFVFG associated with the two fuzzy vectors $(\tilde{f}^1, \tilde{f}^2)$ as
$$\hat{G} = (V, E, \mu_{f^1,p_1}, \mu_{f^2,p_2}).$$
Let us remark that this last case generalizes other existing tools, such as the fuzzy graphs defined in [32], which actually only define relations between connected individuals; the extended fuzzy graphs [11], in which the additional information is about the relations between the elements but not about the individuals themselves; or the extended fuzzy graphs vector based [9], whose additional information is about individuals, but is crisp.
Figure 2. Graph G = ( V , E ) and fuzzy linguistic variable L ˜ .

4. An Application: Social Network Analysis with Soft Information

As a specific application of the new proposed model, we take up firstly the idea introduced in [9] about community detection in graphs taking into account the information given by a crisp vector, $x$. In that preliminary work, we set up the philosophy of finding groups in a network when there is a vector of crisp values providing some additional information. Now, we generalize this idea, starting from extended fuzzy graphs that are built from $r$ fuzzy vectors $\tilde{f}^1, \dots, \tilde{f}^r$.
To find “good” communities in G ^ , we have to extend the Sugeno Louvain Algorithm described in [9] to a multi-dimensional stage with more than one vector of additional information, with the peculiarity that the components of the vectors considered are no longer crisp values but fuzzy sets: therefore, actually, what we have is an MEFVFG. We illustrate the problem of community detection based on an MEFVFG in Example 3.
 Example 3. 
We consider a chain with 12 nodes, represented by the crisp graph $G = (V, E)$ (see Figure 3), about which we have additional information, $(\tilde{f}^1, \tilde{f}^2, \tilde{f}^3, \tilde{f}^4)$, and the defuzzification operator $D$, so that $D(\tilde{f}^1) = (9, 9.5, 10, 1, 0.5, 1, 9.5, 8, 10, 1, 1, 2)$, $D(\tilde{f}^2) = (10, 9.5, 9, 1, 0.5, 1, 9, 9, 9.5, 1.5, 2, 0.5)$, $D(\tilde{f}^3) = (9.5, 8.5, 10, 1.5, 1, 1, 10, 9, 9.5, 0.9, 1, 1)$, $D(\tilde{f}^4) = (9, 9, 10, 1, 1, 1, 10, 9.5, 9, 0.5, 1, 1)$. These fuzzy sets represent the opinion of 12 people about four different films. We accept that there are more synergies between those people who have similar preferences. Partition $P = \{\{1, 2, 3, 4\}, \{5, 6, 7, 8\}, \{9, 10, 11, 12\}\}$ is obtained with any algorithm based on modularity optimization. Nevertheless, if the additional information is considered, the partition provided by the Multi-Dimensional fuzzy Sugeno–Louvain 1-additive is $P_f = \{\{1, 2, 3\}, \{4, 5, 6\}, \{7, 8, 9\}, \{10, 11, 12\}\}$.
Figure 3. Chain with 12 nodes. Partitions P and P f .
The proposed method, named Multi-dimensional Fuzzy Sugeno Louvain, is based on the Louvain algorithm [16]. The main point is to summarize all the knowledge of the MEFVFG into two matrices: $A$, which represents the direct connections between the nodes (edges), and $F$, which summarizes the additional information given by the family of vectors of fuzzy sets $\tilde{f}^1, \dots, \tilde{f}^r$. The weighted graph associated with a fuzzy Sugeno $\lambda$-measure $\mu_{f,p}$ [9] is essential, and it is considered in terms of a multi-dimensional scenario (MAWG): one weighted graph with adjacency matrix $F^\ell$ associated with each $\mu_{f^\ell,p_\ell}$. This methodology to find realistic partitions in an MEFVFG, explained below, is summarized in Algorithm 2, which includes its pseudo-code, and in Figure 4, which shows a flowchart of the process.
  • Step 1: definition of the MAWG. Given the fuzzy Sugeno $\lambda$-measures $\mu_{f^1,p_1}, \dots, \mu_{f^r,p_r}$ obtained from $\tilde{f}^1, \dots, \tilde{f}^r$ and $p_1, \dots, p_r$, and the defuzzification operator $D$, matrices $F^1, \dots, F^r$ are calculated as
    $$F^\ell_{ij} = \phi\left( Sh_i(\mu_{f^\ell,p_\ell}) - Sh_i^j(\mu_{f^\ell,p_\ell}), \; Sh_j(\mu_{f^\ell,p_\ell}) - Sh_j^i(\mu_{f^\ell,p_\ell}) \right)$$
    where $\phi : [-1,1]^2 \to [0,1]$ is a bi-variate aggregation operator [33], and $Sh_i(\mu_{f^\ell,p_\ell})$ and $Sh_i^j(\mu_{f^\ell,p_\ell})$ are the Shapley values of $i$ on $\mu_{f^\ell,p_\ell}$ in the presence of all the elements of $V$ or of $V \setminus \{j\}$, respectively [34].
  • Step 2: information aggregation. Matrices $F^1, \dots, F^r$ are aggregated to obtain the matrix $F$. The aggregation function $\Phi : \Pi^r \to \Pi$ is used, where $\Pi$ is the set of square matrices of order $n$. Particularly, we suggest the use of a matrix aggregator based on the classical aggregation operators applied element by element: $F = \Phi(F^1, \dots, F^r)$.
After this aggregation process, the method Duo Louvain has to be applied [12,13], considering the matrix $M = \theta(A, F)$, where $\theta : \Pi^2 \to \Pi$ is an aggregation function. That method can consider the information of two matrices when finding communities in a graph.
Algorithm 2 Multi-dimensional Fuzzy Sugeno–Louvain
Input: $A$, $\tilde{f}^1, \dots, \tilde{f}^r$, $p_1, \dots, p_r$; $A$ represents $G = (V, E)$; $\tilde{f}^\ell$ is a vector of fuzzy sets; $p_\ell \in (0,1]$, $\ell = 1, \dots, r$;
Output: $P$;
Preliminary
for ($\ell = 1$) to ($r$) do
   Calculate $\mu_{f^\ell,p_\ell}$ (fuzzy Sugeno $\lambda$-measure from $\tilde{f}^\ell$);
   $F^\ell_{ij} \leftarrow \phi\left( Sh_i(\mu_{f^\ell,p_\ell}) - Sh_i^j(\mu_{f^\ell,p_\ell}), \; Sh_j(\mu_{f^\ell,p_\ell}) - Sh_j^i(\mu_{f^\ell,p_\ell}) \right)$, $\forall i, j \in V$;
end for
$F \leftarrow \Phi(F^1, \dots, F^r)$;
$M \leftarrow \theta(A, F)$;
end Preliminary
$P \leftarrow$ Duo Louvain$(A, M)$;
return $P$;
Figure 4. Flowchart of the methodology Multi-dimensional Fuzzy Sugeno–Louvain.
 Remark 1. 
The concept of “what is a good group” depends on the operator $\Phi$ applied. In the case that $\Phi$ is a disjunctive operator, groups are composed of elements among which there are strong synergies regarding any evidence or characteristic (any fuzzy vector). The size of the groups that are somehow similar regarding the additional information will increase the more vectors are considered. In contrast, where $\Phi$ is a conjunctive operator, the groups are composed of elements among which there are strong synergies in all the evidence or characteristics. The size of the groups that are somehow similar regarding the additional information will increase the fewer vectors are considered. Particularly, we consider the most popular ordered weighted averaging (OWA) aggregation operators [35]: maximum, minimum and average.
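The element-by-element aggregation $F = \Phi(F^1, \dots, F^r)$ discussed in Remark 1 can be sketched as follows for the three operators mentioned (maximum, minimum and average). The helper name `aggregate` and the two small matrices are hypothetical and serve only to show the effect of each operator.

```python
from statistics import fmean


def aggregate(matrices, op):
    """Apply the operator `op` to the (i, j) entries of all matrices, entry by entry."""
    n = len(matrices[0])
    return [[op([F[i][j] for F in matrices]) for j in range(n)] for i in range(n)]


# Two hypothetical 2 x 2 information matrices F^1 and F^2
F1 = [[0.0, 0.2], [0.2, 0.0]]
F2 = [[0.0, 0.6], [0.6, 0.0]]

F_max = aggregate([F1, F2], max)     # disjunctive: strong synergy in ANY vector
F_min = aggregate([F1, F2], min)     # conjunctive: strong synergy in ALL vectors
F_avg = aggregate([F1, F2], fmean)   # compromise between the two
```

With the maximum, a pair of nodes is strongly linked as soon as one vector says so, whereas the minimum requires agreement across every vector, which is the distinction the remark draws between disjunctive and conjunctive operators.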
As in the uni-dimensional problem with crisp information addressed in [9], the exponential complexity concerning the calculation of the Shapley value may be avoided by considering an additive fuzzy measure. For this reason, in this paper, we suggest the specific characterization of $\mu_{f^\ell,p_\ell}$ when $p_\ell = 1$. On this basis, as $\mu_{f^\ell}^a$ is a 1-additive fuzzy measure [31], it holds:
$$Sh_i(\mu_{f^\ell}^a) = \frac{D(\tilde{f}_i^\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k^\ell)} \quad \text{and} \quad Sh_i^j(\mu_{f^\ell}^a) = \frac{D(\tilde{f}_i^\ell)}{\sum_{k=1, k \neq j}^{n} D(\tilde{f}_k^\ell)}$$
In this context, we propose a specific application of the algorithm Multi-dimensional Fuzzy Sugeno–Louvain. For every $\ell = 1, \dots, r$, the characterization of $\mu_{f^\ell}^a$ only depends on the calculation of $F^\ell$. Then, the complexity of the method 1-additive Multi-dimensional Fuzzy Sugeno–Louvain (Algorithm 3) is equal to that of the Louvain algorithm [16].
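As a small worked step, the difference of Shapley values that feeds $\phi$ in the 1-additive case can be written in closed form. The shorthand $D_i = D(\tilde{f}_i^\ell)$ and $S = \sum_{k=1}^{n} D_k$ is introduced here only for readability, and $S > D_j$ is assumed:

$$Sh_i(\mu_{f^\ell}^a) - Sh_i^j(\mu_{f^\ell}^a) = \frac{D_i}{S} - \frac{D_i}{S - D_j} = \frac{D_i (S - D_j) - D_i S}{S (S - D_j)} = -\frac{D_i D_j}{S (S - D_j)}$$

so that

$$\left| Sh_i(\mu_{f^\ell}^a) - Sh_i^j(\mu_{f^\ell}^a) \right| = \frac{D_i D_j}{S (S - D_j)}.$$

Under this reading, each entry of $F^\ell$ needs only the defuzzified values of the two nodes involved and their total, which is consistent with the statement that no exponential Shapley computation is required.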
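Algorithm 3 1-additive Multi-dimensional Sugeno–Louvain
Input: $A$, $\tilde{f}^1, \dots, \tilde{f}^r$; $A$ is a representation of $G = (V, E)$; $\tilde{f}^\ell$ is a vector of fuzzy sets, $\ell = 1, \dots, r$;
Output: $P$;
Preliminary
for ($\ell = 1$) to ($r$) do
   $F^\ell_{ij} \leftarrow \phi\left\{ \left| \dfrac{D(\tilde{f}_i^\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k^\ell)} - \dfrac{D(\tilde{f}_i^\ell)}{\sum_{k=1, k \neq j}^{n} D(\tilde{f}_k^\ell)} \right|, \; \left| \dfrac{D(\tilde{f}_j^\ell)}{\sum_{k=1}^{n} D(\tilde{f}_k^\ell)} - \dfrac{D(\tilde{f}_j^\ell)}{\sum_{k=1, k \neq i}^{n} D(\tilde{f}_k^\ell)} \right| \right\}$;
end for
$F \leftarrow \Phi(F^1, \dots, F^r)$;
$M \leftarrow \theta(A, F)$;
end Preliminary
$P \leftarrow$ Duo Louvain$(A, M)$;
return $P$;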
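The loop body of Algorithm 3 (the construction of one matrix $F^\ell$) can be sketched in Python as below. The input is a list of already defuzzified values $D(\tilde{f}_i^\ell)$; the choice $\phi = \max$, the zero diagonal and the function name are assumptions made for illustration, since the text leaves $\phi$ generic. The sketch assumes strictly positive values and at least two nodes, so that no denominator vanishes. The subsequent steps $F \leftarrow \Phi(\cdot)$ and $M \leftarrow \theta(A, F)$ are not covered here.

```python
def one_additive_information_matrix(d, phi=max):
    """F_ij = phi(|Sh_i - Sh_i^j|, |Sh_j - Sh_j^i|) in the 1-additive case.

    d[i] is the defuzzified value D(f_i) of node i (assumed > 0).
    """
    n = len(d)
    total = sum(d)
    F = [[0.0] * n for _ in range(n)]          # diagonal left at 0 (assumption)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sh_i = d[i] / total                # Shapley value of i with all of V
            sh_i_no_j = d[i] / (total - d[j])  # Shapley value of i in V without j
            sh_j = d[j] / total
            sh_j_no_i = d[j] / (total - d[i])
            F[i][j] = phi(abs(sh_i - sh_i_no_j), abs(sh_j - sh_j_no_i))
    return F


# Hypothetical defuzzified values for three nodes
F = one_additive_information_matrix([1.0, 1.0, 2.0])
```

Because each entry uses only two values and the total, building one matrix costs $O(n^2)$ operations, in contrast with the exponential cost of general Shapley values.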
 Example 4. 
We illustrate the idea of our methodology in a simple case in which there is a single vector of fuzzy sets. Let us consider the situation described in Example 1, in which we have a cycle of eight nodes and the information associated with each node is described in linguistic terms $\tilde{f} = (\tilde{f}_1, \dots, \tilde{f}_8) = (VL, VL, L, VL, H, VH, H, VH)$. The whole information is summarized in Figure 5. Now, let us assume that these linguistic fuzzy variables defined over $U = [0, 100]$ are modeled in terms of the following five trapezoidal fuzzy sets: $\tilde{VL} = (0, 0, 10, 25)$, $\tilde{L} = (5, 10, 20, 25)$, $\tilde{M} = (30, 40, 60, 70)$, $\tilde{H} = (60, 70, 80, 100)$, $\tilde{VH} = (75, 90, 100, 100)$. It is possible to see that if we apply the Fuzzy Sugeno–Louvain 1-additive to this extended fuzzy graph (in the uni-dimensional case, so that there is only one matrix $F$), the partition obtained for any $p$ is $P_f = \{\{1, 2, 3, 4\}, \{5, 6, 7, 8\}\}$.
Figure 5. Graph G = ( V , E ) and fuzzy linguistic variable L ˜ .

5. Computational Results

When a new method is proposed, an evaluation of its performance is required. This process can be addressed comparing the results obtained with the method under evaluation with respect to other proposals established in the literature to solve the same problem. Nevertheless, in our case, as the community detection problem with additional soft information has never been faced before, we cannot compare our method with other proposals of the literature. Then, we work on an evaluation process. For this, we consider several reference models [36] to which we apply our methodology, whose performance is quantified with the calculation of the Normalized Mutual Information (NMI) [37].
 Definition 8 
(Normalized Mutual Information (NMI) [37]). Let $X = \{x_i\}_{i \in V}$ and $Y = \{y_i\}_{i \in V}$ denote two partitions of the graph $G = (V, E)$. Let $P(x)$ denote the probability that a random node is assigned to the community $x$, and let $P(x, y)$ denote the joint probability that a random node is assigned to the community $x$ in the partition $X$ and to the community $y$ in the partition $Y$. The Shannon entropy of $X$ is calculated as $H(X) = -\sum_{x} P(x) \log(P(x))$, and the joint Shannon entropy of $X$ and $Y$ is calculated as $H(X, Y) = -\sum_{x} \sum_{y} P(x, y) \log(P(x, y))$. The Mutual Information ($MI$) between partitions $X$ and $Y$ is defined as:
$$MI(X, Y) = \sum_{x} \sum_{y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}$$
Then, $NMI$ is a normalization of Equation (5).
$$NMI(X, Y) = \frac{2 \, MI(X, Y)}{H(X) + H(Y)}$$
Although there can be some issues with the basic version of the measure [38], we consider this measure because, to the best of our knowledge, it is fair enough to compare how similar two partitions are; i.e., NMI allows us to quantify how much the partition provided by our method resembles the considered standard partition.
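The following Python sketch shows one way to compute the NMI of Definition 8 from two label vectors, estimating the probabilities by relative frequencies. The function names are illustrative, and returning 1 when both partitions consist of a single community (where the denominator vanishes) is an assumption of this sketch, not something stated in the definition.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence

Labels = Sequence[Hashable]


def entropy(labels: Labels) -> float:
    """Shannon entropy H(X) of a partition given as one label per node."""
    n = len(labels)
    return -sum((count / n) * log(count / n) for count in Counter(labels).values())


def mutual_information(x: Labels, y: Labels) -> float:
    """Mutual information MI(X, Y) between two partitions of the same node set."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    total = 0.0
    for (cx, cy), count in count_xy.items():
        p_xy = count / n
        total += p_xy * log(p_xy / ((count_x[cx] / n) * (count_y[cy] / n)))
    return total


def nmi(x: Labels, y: Labels) -> float:
    """Normalized mutual information 2 * MI(X, Y) / (H(X) + H(Y))."""
    if len(x) != len(y):
        raise ValueError("both partitions must label the same nodes")
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Assumption of this sketch: two trivial partitions are treated as identical.
        return 1.0
    return 2 * mutual_information(x, y) / (h_x + h_y)


if __name__ == "__main__":
    standard = [0, 0, 0, 0, 1, 1, 1, 1]
    detected = [1, 1, 1, 1, 0, 0, 0, 0]  # same partition with different label names
    print(nmi(standard, detected))
```

Because only the co-occurrence of labels matters, renaming the communities of one partition does not change the value.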
In all the benchmark models we present, there are two components: the adjacency matrix $A$ and the additional information matrix $F$, which is obtained by aggregating a family of vectors of soft information about the individuals. This second component, which captures the synergies, is therefore defined from multiple vectors, whose generation is based on trapezoidal fuzzy sets [39].

5.1. Experiment Design

Following the idea in [2], we now explain how we generate the benchmark models. Each one represents an MEFVFG with $n = 256$ nodes. The process has two main parts: the definition of the adjacency matrix and the generation of the additional information.
To handle multiple vectors from a benchmarking perspective, we propose the following: in every vector, the value of each component depends on certain trapezoidal fuzzy sets, namely low and high fuzzy sets. Low fuzzy sets are used to generate the components of each vector that imply scarce connections among the nodes, whereas high fuzzy sets are used to generate the components that imply many connections among the nodes. Therefore, in each vector, the components related to nodes in the same community are generated from high fuzzy sets, whereas the components related to nodes of different communities are generated from low fuzzy sets. Let us emphasize that, in the simulation process presented, what we randomly generate are the values $D(\tilde{f}^i)$, as high or low depending on the trapezoidal fuzzy sets $\tilde{f}^i$, where $D$ is a defuzzification operator.
We have $r$ vectors of fuzzy sets as the starting point in each benchmark, where $r$ is the number of communities embedded in the synergies matrix $F$. Each vector is associated with a community $C_i$, so nodes belonging to $C_i$ have a high value in the vector $\tilde{f}^i$, whereas nodes which are not in $C_i$ have a low value in $\tilde{f}^i$: $D(\tilde{f}^i_j)$ is high if $j \in C_i$, and $D(\tilde{f}^i_j)$ is low if $j \notin C_i$. The process for defining and simulating these trapezoidal fuzzy sets is detailed below. To analyze different scatterings of the low and high fuzzy sets, several combinations of the parameters $a$, $b$, $c$ and $d$ are considered (see Figure 6 and Figure 7). For example, to define a benchmark graph with four communities, we have to generate four $n$-vectors $D(\tilde{f}^1), D(\tilde{f}^2), D(\tilde{f}^3), D(\tilde{f}^4)$ with $n = 256$ nodes.
Each benchmark model represents an MEFVFG summarized into two matrices: one of direct connections (adjacency A) and another of additional information (synergies matrix F, obtained from the soft information vectors). Below, we explain the generation of them.
1. Adjacency matrix. The adjacency matrix $A$ is randomly generated according to Equation (7) for a set $V$ with 256 nodes. We consider different combinations of the values of the parameters $\alpha$ and $\beta$ regarding the input/output values ($z_{in}$ and $z_{out}$), as shown in Table 1 (similarly to the proposal in [2]). These parameters regulate the density of the connections matrix $A$, whose generation process is shown in Algorithm 4.
$$P(i, j) = \begin{cases} \alpha & \text{if } i, j \in C_k \\ \beta & \text{otherwise} \end{cases} \tag{7}$$
Table 1. Parameters used to generate the adjacency matrix $A$ of each model.
| | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 | Network 6 | Network 7 | Network 8 | Network 9 |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha$ | 0.45 | 0.4 | 0.35 | 0.325 | 0.3 | 0.275 | 0.25 | 0.225 | 0.2 |
| $\beta$ | 0.016 | 0.033 | 0.05 | 0.058 | 0.066 | 0.075 | 0.083 | 0.091 | 0.1 |
Algorithm 4 Generate Adjacency
Input: $|C_1|, \ldots, |C_r|, \alpha, \beta, n$;
Output: $A$;
$A_{i,j} \leftarrow 0, \ \forall i, j = 1, \ldots, n$;
for $i = 1$ to $n$ do
   for $j = 1$ to $n$ do
     for $\ell = 1$ to $r$ do
        $\epsilon \leftarrow rand(0, 1)$;
        if $|C_{\ell-1}| < i \leq |C_{\ell}|$ and $|C_{\ell-1}| < j \leq |C_{\ell}|$ then
          if $\epsilon < \alpha$ then
             $A(i, j) \leftarrow 1$;
          end if
        else
          if $\epsilon < \beta$ then
             $A(i, j) \leftarrow 1$;
          end if
        end if
     end for
   end for
end for
return $(A)$;
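The generation of the adjacency matrix can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it makes a single random draw per pair of nodes so that the edge probability matches Equation (7), it treats the inputs as plain community sizes (turning them into node labels internally), and it builds an undirected graph without self-loops. The names `community_labels` and `generate_adjacency` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def community_labels(sizes: Sequence[int]) -> List[int]:
    """Label nodes so that the first sizes[0] nodes form community 0, and so on."""
    return [k for k, size in enumerate(sizes) for _ in range(size)]


def generate_adjacency(
    sizes: Sequence[int],
    alpha: float,
    beta: float,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Random 0/1 adjacency matrix with edge probability alpha inside a
    community and beta between communities."""
    rng = rng or random.Random()
    labels = community_labels(sizes)
    n = len(labels)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: undirected graph, so only pairs i < j are drawn and mirrored.
        for j in range(i + 1, n):
            p_edge = alpha if labels[i] == labels[j] else beta
            if rng.random() < p_edge:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


if __name__ == "__main__":
    # Network 1 of Table 1 with two communities of 128 nodes each.
    a_matrix = generate_adjacency([128, 128], alpha=0.45, beta=0.016, rng=random.Random(0))
    mean_degree = sum(map(sum, a_matrix)) / len(a_matrix)
    print(f"mean degree: {mean_degree:.1f}")
```

Passing a seeded `random.Random` instance keeps the benchmark reproducible across runs.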
2. Low trapezoidal fuzzy sets generation. Fuzzy sets $\tilde{f}^i$ of this type, shown in Figure 6, are generated to represent, in each vector $D(\tilde{f}^i)$, the components related to the elements with a low value in the characteristic of the corresponding vector.
Figure 6. Low trapezoidal fuzzy set.
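After fixing the values $a$ and $b$, we can calculate the lines $r_1$ and $r_2$ to obtain a trapezoid below them with area 1. In particular, $r_1$ is defined as $y = h$, where $h$ is chosen so that the corresponding integral equals 1:
$$1 = a h + \frac{b - a}{2} h \implies h = \frac{2}{a + b}.$$
On the other hand, $r_2$ is the line $y = \alpha + \beta x$ through the points $\left(a, \frac{2}{a + b}\right)$ and $(b, 0)$, so
$$r_2: \begin{cases} \dfrac{2}{a + b} = \alpha + \beta a \\ 0 = \alpha + \beta b \end{cases}$$
Solving for $\alpha$ and $\beta$ gives $\alpha = \frac{-2b}{(a + b)(a - b)}$ and $\beta = \frac{2}{(a + b)(a - b)}$, so the distribution function of the low trapezoidal fuzzy set is:
$$F(x) = \begin{cases} \dfrac{2x}{a + b}, & x \in [0, a] \\ \dfrac{2a}{a + b} + \displaystyle\int_{a}^{x} \left( \frac{-2b}{(a + b)(a - b)} + \frac{2z}{(a + b)(a - b)} \right) dz, & x \in (a, b] \end{cases}$$
where
$$\frac{2a}{a + b} + \int_{a}^{x} \left( \frac{-2b}{(a + b)(a - b)} + \frac{2z}{(a + b)(a - b)} \right) dz = \frac{2a}{a + b} + \frac{(x - b)^2 - (a - b)^2}{(a + b)(a - b)}.$$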
Once the low fuzzy set is characterized, we apply the inverse method to simulate its values (the low values in the following). First, we calculate the inverse function of $F$, $F^{-1}(x)$; then, we simulate a value $p$ between 0 and 1. Finally, $F^{-1}(p)$ is the value assigned to an edge which connects nodes that are not in the same community.
  • If $p \leq \frac{2a}{a + b}$: $p = \frac{2x}{a + b} \implies x = \frac{(a + b) p}{2}$.
  • If $p > \frac{2a}{a + b}$: $p = \frac{2a}{a + b} + \frac{(x - b)^2 - (a - b)^2}{(a + b)(a - b)} \implies x = b - \sqrt{\left(p - \frac{2a}{a + b}\right)(a + b)(a - b) + (a - b)^2}$.
    We take the sign ‘−’ because $x - b < 0$.
Then, if $p \sim U(0, 1)$, the low values considered are obtained as:
$$x = \begin{cases} \dfrac{(a + b) p}{2} & \text{if } p \leq \dfrac{2a}{a + b} \\ b - \sqrt{\left(p - \dfrac{2a}{a + b}\right)(a + b)(a - b) + (a - b)^2} & \text{otherwise} \end{cases}$$
This process is summarized in Algorithm 5.
Algorithm 5 Low Fuzzy Set
Input: $a, b$;
Output: $x$;
$p \leftarrow rand(0, 1)$;
if $p \leq \frac{2a}{a + b}$ then
    $x \leftarrow \frac{(a + b) p}{2}$;
else
    $x \leftarrow b - \sqrt{\left(p - \frac{2a}{a + b}\right)(a + b)(a - b) + (a - b)^2}$;
end if
return $x$;
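A minimal Python sketch of the inverse method for low values is given below. It follows the piecewise expression for $F^{-1}(p)$ derived above; the function names, the optional argument `p` (useful for checking specific quantiles) and the clamping of tiny negative rounding errors under the square root are assumptions of this sketch.

```python
import random
from math import sqrt
from typing import Optional


def low_cdf(x: float, a: float, b: float) -> float:
    """Distribution function F(x) of the low trapezoidal fuzzy set on [0, b]."""
    if x <= a:
        return 2 * x / (a + b)
    return 2 * a / (a + b) + ((x - b) ** 2 - (a - b) ** 2) / ((a + b) * (a - b))


def sample_low(a: float, b: float, p: Optional[float] = None) -> float:
    """Simulate a low value as F^{-1}(p), with p ~ U(0, 1) unless p is given."""
    if not 0 <= a < b:
        raise ValueError("expected 0 <= a < b")
    if p is None:
        p = random.random()
    threshold = 2 * a / (a + b)
    if p <= threshold:
        # Flat part of the density, x in [0, a].
        return (a + b) * p / 2
    # Decreasing part of the density, x in (a, b]; the '-' root keeps x <= b.
    radicand = (p - threshold) * (a + b) * (a - b) + (a - b) ** 2
    return b - sqrt(max(radicand, 0.0))  # clamp rounding noise near p = 1


if __name__ == "__main__":
    # Case 4 of Table 2: a = 0.1, b = 0.2.
    print([round(sample_low(0.1, 0.2, p), 4) for p in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

Composing `low_cdf` with `sample_low` should return the original `p`, which gives a quick check that the inverse was derived correctly.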
3. High trapezoidal fuzzy sets generation. Fuzzy sets $\tilde{f}^i$ of this type, shown in Figure 7, are generated to represent, in each vector $D(\tilde{f}^i)$, the components related to the elements with a high value in the characteristic of the corresponding vector.
Figure 7. High trapezoidal fuzzy set.
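After fixing the values $c$ and $d$, we can calculate the lines $r_3$ and $r_4$ to obtain a trapezoid below them with area 1. In particular, $r_3$ is defined as $y = h$, where $h$ is chosen so that the corresponding integral equals 1:
$$1 = \frac{(d - c) h}{2} + (1 - d) \times h \implies h = \frac{2}{(1 - d) + (1 - c)}.$$
On the other hand, $r_4$ is the line $y = \alpha + \beta x$ through the points $\left(d, \frac{2}{(1 - d) + (1 - c)}\right)$ and $(c, 0)$, so:
$$r_4: \begin{cases} \dfrac{2}{(1 - d) + (1 - c)} = \alpha + \beta d \\ 0 = \alpha + \beta c \end{cases}$$
Solving for $\alpha$ and $\beta$ gives $\alpha = \frac{-2c}{((1 - d) + (1 - c))(d - c)}$ and $\beta = \frac{2}{((1 - d) + (1 - c))(d - c)}$, so the distribution function of the high trapezoidal fuzzy set is:
$$F(x) = \begin{cases} \displaystyle\int_{c}^{x} \frac{-2c + 2z}{((1 - d) + (1 - c))(d - c)} \, dz, & x \in [c, d] \\ \displaystyle\int_{c}^{d} \frac{-2c + 2z}{((1 - d) + (1 - c))(d - c)} \, dz + \int_{d}^{x} \frac{2}{(1 - d) + (1 - c)} \, dz, & x \in (d, 1] \end{cases}$$
where
  • $\displaystyle\int_{c}^{x} \frac{-2c + 2z}{((1 - d) + (1 - c))(d - c)} \, dz = \frac{(x - c)^2}{((1 - d) + (1 - c))(d - c)}$;
  • $\displaystyle\int_{c}^{d} \frac{-2c + 2z}{((1 - d) + (1 - c))(d - c)} \, dz + \int_{d}^{x} \frac{2}{(1 - d) + (1 - c)} \, dz = \frac{(x - d) + (x - c)}{(1 - d) + (1 - c)}$.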
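As with low fuzzy sets, we apply the inverse method to simulate the values of the high fuzzy sets (the high values in the following). Then, the value $F^{-1}(p)$ is:
  • If $p \leq \frac{d - c}{(1 - d) + (1 - c)}$: $p = \frac{(x - c)^2}{((1 - d) + (1 - c))(d - c)} \implies x = c + \sqrt{p (d - c)((1 - d) + (1 - c))}$.
    We take the sign ‘+’ because $x - c > 0$.
  • If $p > \frac{d - c}{(1 - d) + (1 - c)}$: $p = \frac{(x - d) + (x - c)}{(1 - d) + (1 - c)} \implies x = \frac{p((1 - d) + (1 - c)) + d + c}{2}$.
Then, if $p \sim U(0, 1)$, the high values considered are obtained as:
$$x = \begin{cases} c + \sqrt{p (d - c)((1 - c) + (1 - d))} & \text{if } p \leq \dfrac{d - c}{(1 - c) + (1 - d)} \\ \dfrac{p((1 - c) + (1 - d)) + c + d}{2} & \text{otherwise} \end{cases}$$
This process is summarized in Algorithm 6.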
Algorithm 6 High Fuzzy Set
Input: $c, d$;
Output: $x$;
$p \leftarrow rand(0, 1)$;
if $p \leq \frac{d - c}{(1 - c) + (1 - d)}$ then
    $x \leftarrow c + \sqrt{p (d - c)((1 - c) + (1 - d))}$;
else
    $x \leftarrow \frac{p((1 - c) + (1 - d)) + c + d}{2}$;
end if
return $x$;
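The high values can be simulated in the same way. The sketch below mirrors the piecewise inverse derived for the high trapezoidal fuzzy set; as before, the function names and the optional argument `p` are illustrative assumptions.

```python
import random
from math import sqrt
from typing import Optional


def high_cdf(x: float, c: float, d: float) -> float:
    """Distribution function F(x) of the high trapezoidal fuzzy set on [c, 1]."""
    total = (1 - c) + (1 - d)
    if x <= d:
        return (x - c) ** 2 / (total * (d - c))
    return ((x - d) + (x - c)) / total


def sample_high(c: float, d: float, p: Optional[float] = None) -> float:
    """Simulate a high value as F^{-1}(p), with p ~ U(0, 1) unless p is given."""
    if not 0 <= c < d <= 1:
        raise ValueError("expected 0 <= c < d <= 1")
    if p is None:
        p = random.random()
    total = (1 - c) + (1 - d)
    threshold = (d - c) / total
    if p <= threshold:
        # Increasing part of the density, x in [c, d]; the '+' root keeps x >= c.
        return c + sqrt(p * (d - c) * total)
    # Flat part of the density, x in (d, 1].
    return (p * total + c + d) / 2


if __name__ == "__main__":
    # Case 3 of Table 2: c = 0.7, d = 0.8.
    print([round(sample_high(0.7, 0.8, p), 4) for p in (0.0, 0.2, 0.5, 1.0)])
```

When $d = 1$ the flat part disappears and every draw falls in the first branch, so the cases of Table 2 with $d = 1$ need no special handling.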
4. Generate multiple vectors. In each benchmark model, we have $r$ vectors as the starting point, where $r$ is the number of communities embedded in the synergies matrix $F$. Each vector is associated with a community $C_i$, so that nodes belonging to $C_i$ have a high value in $\tilde{f}^i$, whereas nodes which are not in $C_i$ have a low value in $\tilde{f}^i$ (that is, $D(\tilde{f}^i_j)$ is high if $j \in C_i$ and low if $j \notin C_i$). Different combinations of the parameters $a$, $b$, $c$ and $d$ are considered to generate the low/high fuzzy sets (see Table 2). These combinations affect the scattering of the low and high fuzzy sets. The process is summarized in Algorithm 7.
Algorithm 7 Generate Multiple Vectors
Input: $|C_1|, \ldots, |C_r|, a, b, c, d$;
Output: $multipleVectors$;
$|C_0| \leftarrow 0$;
$multipleVectors \leftarrow 0$; (matrix $r \times n$, row $\ell$ represents the vector $D(\tilde{f}^{\ell})$)
for $\ell = 1$ to $r$ do
   for $i = 1$ to $n$ do
     if $|C_{\ell-1}| < i \leq |C_{\ell}|$ then
         $multipleVectors(\ell, i) \leftarrow HighFuzzySet(c, d)$;
     else
         $multipleVectors(\ell, i) \leftarrow LowFuzzySet(a, b)$;
     end if
   end for
end for
return $(multipleVectors)$;
Table 2. Parameters to generate the matrix $F$ of the benchmark model.
| | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 | Case 8 | Case 9 |
|---|---|---|---|---|---|---|---|---|---|
| $a$ | 0 | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.2 | 0.2 |
| $b$ | 0.1 | 0.1 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.3 |
| $c$ | 0.9 | 0.8 | 0.7 | 0.9 | 0.8 | 0.7 | 0.9 | 0.8 | 0.7 |
| $d$ | 1 | 0.9 | 0.8 | 1 | 0.9 | 0.8 | 1 | 0.9 | 0.8 |
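The generation of the $r$ vectors can be sketched in Python as follows. The sketch is self-contained, so it repeats compact versions of the low and high inverse formulas; it assumes that the inputs are plain community sizes and that nodes are ordered by community. All function names are hypothetical.

```python
import random
from math import sqrt
from typing import List, Optional, Sequence


def inverse_low(a: float, b: float, p: float) -> float:
    """F^{-1}(p) for the low trapezoidal fuzzy set with parameters a < b."""
    threshold = 2 * a / (a + b)
    if p <= threshold:
        return (a + b) * p / 2
    return b - sqrt(max((p - threshold) * (a + b) * (a - b) + (a - b) ** 2, 0.0))


def inverse_high(c: float, d: float, p: float) -> float:
    """F^{-1}(p) for the high trapezoidal fuzzy set with parameters c < d <= 1."""
    total = (1 - c) + (1 - d)
    if p <= (d - c) / total:
        return c + sqrt(p * (d - c) * total)
    return (p * total + c + d) / 2


def generate_multiple_vectors(
    sizes: Sequence[int],
    a: float,
    b: float,
    c: float,
    d: float,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return an r x n matrix whose row k is high on community k and low elsewhere."""
    rng = rng or random.Random()
    # Node i belongs to community labels[i]; nodes are ordered by community.
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    vectors = []
    for k in range(len(sizes)):
        row = [
            inverse_high(c, d, rng.random()) if label == k else inverse_low(a, b, rng.random())
            for label in labels
        ]
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    # Four communities of 64 nodes with the parameters of Case 5 in Table 2.
    vectors = generate_multiple_vectors([64] * 4, 0.1, 0.2, 0.8, 0.9, random.Random(0))
    print(len(vectors), len(vectors[0]), round(min(vectors[0][:64]), 3), round(max(vectors[0][64:]), 3))
```

Since high values lie in $[c, 1]$ and low values in $[0, b]$, the gap between $b$ and $c$ in Table 2 controls how clearly each vector separates its community from the rest.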
5. Synergies matrix. From the vectors generated with Algorithm 7 (Generate Multiple Vectors), we obtain $\mu^{a}_{f^1}, \ldots, \mu^{a}_{f^r}$. We consider the matrices $F^1, \ldots, F^r$ and the adjacency of the corresponding MAWG. The second component of each benchmark is an aggregation of these matrices, $F = \Phi(F^1, \ldots, F^r) = \max\{F^1, \ldots, F^r\}$. We summarize this process in Algorithm 8 for the particular case $p = 1$.
Algorithm 8 Matrix From Multiple Vectors
Input: $|C_1|, \ldots, |C_r|, a, b, c, d$;
Output: $F$;
$multipleVectors \leftarrow GenerateMultipleVectors(|C_1|, \ldots, |C_r|, a, b, c, d)$;
for $\ell = 1$ to $r$ do
   for $i = 1$ to $n$ do
      $Sh(\ell, i) \leftarrow \frac{multipleVectors_{\ell,i}}{\sum_{k=1}^{n} multipleVectors_{\ell,k}}$;
     for $j = 1$ to $n$ do
         $Sh_j(\ell, i) \leftarrow \frac{multipleVectors_{\ell,i}}{\sum_{k \in V, k \neq j} multipleVectors_{\ell,k}}$;
      end for
   end for
end for
for $\ell = 1$ to $r$ do
   for $i = 1$ to $n$ do
     for $j = 1$ to $n$ do
         $F^{\ell}_{i,j} \leftarrow \min\{|Sh(\ell, i) - Sh_j(\ell, i)|, |Sh(\ell, j) - Sh_i(\ell, j)|\}$;
     end for
   end for
end for
$F \leftarrow \max\{F^1, \ldots, F^r\}$;
return $(F)$;
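A compact Python sketch of the construction of $F$ from the simulated vectors is shown below. It rests on two assumptions that should be checked against the formal definitions: that $Sh_j(\ell, i)$ is the normalized weight of node $i$ once node $j$ is left out of the sum, and that the diagonal of each matrix is left at zero. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def synergy_matrix(vector: Sequence[float]) -> Matrix:
    """Matrix F^l for one vector: min of the two Shapley-type differences."""
    n = len(vector)
    total = sum(vector)
    shapley = [x / total for x in vector]  # Sh(l, i) in the 1-additive case
    f_matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no synergy of a node with itself
            # Assumption: Sh_j(l, i) renormalizes node i's weight without node j.
            sh_i_without_j = vector[i] / (total - vector[j])
            sh_j_without_i = vector[j] / (total - vector[i])
            f_matrix[i][j] = min(
                abs(shapley[i] - sh_i_without_j),
                abs(shapley[j] - sh_j_without_i),
            )
    return f_matrix


def matrix_from_multiple_vectors(vectors: Sequence[Sequence[float]]) -> Matrix:
    """Aggregate the per-vector matrices with the element-wise maximum."""
    matrices = [synergy_matrix(v) for v in vectors]
    n = len(matrices[0])
    return [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Two toy vectors: nodes 0-1 are high in the first, nodes 2-3 in the second.
    toy_vectors = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
    for row in matrix_from_multiple_vectors(toy_vectors):
        print([round(value, 4) for value in row])
```

In the toy example the entries linking nodes of the same community are much larger than those linking nodes of different communities, which is the block structure the benchmark is designed to embed in $F$.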

5.2. Results

We now evaluate the proposed methodology in the 1-additive case, which avoids the exponential complexity of computing general fuzzy measures. Nevertheless, there is no reason to think that the quality of the partitions obtained, and therefore the accuracy of the evaluated method, would worsen if non-additive measures were considered. We consider several structures which vary in size and number of groups. Each of them represents an MEFVFG, $\widehat{G} = (V, E, \mu_{f^1, p_1}, \ldots, \mu_{f^r, p_r})$, with two independent components. One of them, $G = (V, E)$, is related to the direct connections among the nodes, represented by edges. The other, $(\mu_{f^1, p_1}, \ldots, \mu_{f^r, p_r})$, is used to define a relations matrix $F$.
For each combination of $\alpha$ and $\beta$, and of $a$, $b$, $c$ and $d$, we analyze the linear combination $M = \theta(A, F) = \gamma A + (1 - \gamma) F$, considering $\gamma = 0$ (this is the only case in which, including the additional information, we can assert which partition should be obtained).
In Table 3, Table 4, Table 5 and Table 6, we show the average of the NMI obtained from 100 iterations of each combination of α and β , concerning matrix A, and the parameters a, b, c and d for the definition of the vectors which give rise to the synergies matrix F. To simplify the interpretation of the results, these tables display the values in different colors: the closer the value is to 1 (i.e., the better the result), the lighter the color.
  • Benchmark graph. Model 1. This is the simplest benchmark model, shown in Figure 8. The adjacency matrix has two communities with an expected size of 128 each, so the expected degree of each node is $\langle k \rangle = 128\alpha + 128\beta$. $F_1$ is obtained from the vectors $D(\tilde{f}^1), D(\tilde{f}^2), D(\tilde{f}^3), D(\tilde{f}^4)$, so the 256 nodes are organized into four groups $C_1^F, \ldots, C_4^F$ with expected size $|C_i^F| = 64$. Table 3 shows the results. Note that the tested algorithm always recovers the standard partition, even when the networks are sparse.
    Figure 8. Benchmark graph. Model 1.
    Table 3. NMI. Model 1.
    | NMI | $F_1$ Case 1 | $F_1$ Case 2 | $F_1$ Case 3 | $F_1$ Case 4 | $F_1$ Case 5 | $F_1$ Case 6 | $F_1$ Case 7 | $F_1$ Case 8 | $F_1$ Case 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | $A_1$ Network 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
    | $A_1$ Network 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
  • Benchmark graph. Model 2. Due to the resolution limit of modularity [40], the Louvain algorithm is very sensitive to changes in the size of the groups, particularly with small communities [41]. We therefore test the algorithm in this context, where the groups of the standard structure are smaller than in Model 1, as can be seen in Figure 9. $F_2$ has eight communities with 32 nodes each. We also reduce the communities in $A_2$: it has four communities with 64 nodes each, so the expected degree is $\langle k \rangle = 64\alpha + 192\beta$. The results obtained are shown in Table 4. Despite the size reduction, our method provides good results: the average NMI is at least 0.9968 in every configuration.
    Figure 9. Benchmark graph. Model 2.
    Table 4. NMI. Model 2.
    | NMI | $F_2$ Case 1 | $F_2$ Case 2 | $F_2$ Case 3 | $F_2$ Case 4 | $F_2$ Case 5 | $F_2$ Case 6 | $F_2$ Case 7 | $F_2$ Case 8 | $F_2$ Case 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | $A_2$ Network 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9987 | 0.9981 | 0.9994 |
    | $A_2$ Network 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9986 | 0.9980 | 0.9994 |
    | $A_2$ Network 3 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9993 | 0.9992 | 0.9994 |
    | $A_2$ Network 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9986 | 0.9980 | 0.9996 |
    | $A_2$ Network 5 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9984 | 0.9990 | 0.9991 |
    | $A_2$ Network 6 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9986 | 0.9984 | 0.9990 |
    | $A_2$ Network 7 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9990 | 0.9992 | 0.9992 |
    | $A_2$ Network 8 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9968 | 0.9992 | 0.9989 |
    | $A_2$ Network 9 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9993 | 0.9992 | 0.9996 |
  • Benchmark graph. Model 3. The previous results shed light on the high quality of the tested method in symmetric structures. However, the interest of any method goes beyond synthetic structures; the main objective is to reach proper results in real cases. We therefore work with asymmetric structures to simulate more realistic networks, as can be seen in Figure 10. $F_3$ has five communities whose sizes are $|C_1^F| = 43$, $|C_2^F| = 42$, $|C_3^F| = 43$, $|C_4^F| = 96$ and $|C_5^F| = 32$. On the other hand, $A_3 = A_1$. We show the results in Table 5.
    Figure 10. Benchmark graph. Model 3.
    Table 5. NMI. Model 3.
    | NMI | $F_3$ Case 1 | $F_3$ Case 2 | $F_3$ Case 3 | $F_3$ Case 4 | $F_3$ Case 5 | $F_3$ Case 6 | $F_3$ Case 7 | $F_3$ Case 8 | $F_3$ Case 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | $A_3$ Network 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9996 | 0.9994 | 0.9997 |
    | $A_3$ Network 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9997 | 1 | 1 |
    | $A_3$ Network 3 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9997 | 1 | 1 |
    | $A_3$ Network 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9990 | 1 | 1 |
    | $A_3$ Network 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9996 | 0.9992 |
    | $A_3$ Network 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9994 | 0.9994 |
    | $A_3$ Network 7 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9997 | 0.9996 | 0.9997 |
    | $A_3$ Network 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9994 |
    | $A_3$ Network 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9995 |
  • Benchmark graph. Model 4. This model combines the reduction of the community size with partition asymmetry, as can be seen in Figure 11. In this case, $A_4 = A_2$, and $F_4$ has eight communities whose expected sizes are $|C_1^F| = 24$, $|C_2^F| = 40$, $|C_3^F| = 64$, $|C_4^F| = 21$, $|C_5^F| = 22$, $|C_6^F| = 21$, $|C_7^F| = 32$ and $|C_8^F| = 32$. Despite the complexity of this structure, the results presented in Table 6 show the good performance of the tested algorithm, with an average NMI of at least 0.9937 in every configuration.
    Figure 11. Benchmark graph. Model 4.
    Table 6. NMI. Model 4.
    | NMI | $F_4$ Case 1 | $F_4$ Case 2 | $F_4$ Case 3 | $F_4$ Case 4 | $F_4$ Case 5 | $F_4$ Case 6 | $F_4$ Case 7 | $F_4$ Case 8 | $F_4$ Case 9 |
    |---|---|---|---|---|---|---|---|---|---|
    | $A_4$ Network 1 | 0.9975 | 0.9969 | 0.9974 | 0.9976 | 0.9970 | 0.9968 | 0.9937 | 1 | 0.9994 |
    | $A_4$ Network 2 | 0.9956 | 0.9960 | 0.9962 | 0.9977 | 0.9982 | 0.9981 | 0.9960 | 0.9991 | 1 |
    | $A_4$ Network 3 | 0.9951 | 0.9964 | 0.9956 | 0.9980 | 0.9972 | 0.9970 | 0.9974 | 0.9975 | 0.9997 |
    | $A_4$ Network 4 | 0.9949 | 0.9964 | 0.9963 | 0.9978 | 0.9963 | 0.9980 | 0.9943 | 0.9990 | 0.9973 |
    | $A_4$ Network 5 | 0.9952 | 0.9956 | 0.9975 | 0.9960 | 0.9978 | 0.9976 | 0.9957 | 0.9988 | 0.9979 |
    | $A_4$ Network 6 | 0.9954 | 0.9957 | 0.9972 | 0.9967 | 0.9954 | 0.9989 | 0.9973 | 0.9961 | 0.9990 |
    | $A_4$ Network 7 | 0.9944 | 0.9955 | 0.9978 | 0.9971 | 0.9969 | 0.9971 | 0.9980 | 0.9989 | 1 |
    | $A_4$ Network 8 | 0.9952 | 0.9958 | 0.9973 | 0.9974 | 0.9969 | 0.9980 | 0.9996 | 0.9984 | 1 |
    | $A_4$ Network 9 | 0.9952 | 0.9967 | 0.9974 | 0.9965 | 0.9969 | 0.9972 | 0.9972 | 0.9968 | 1 |

6. Discussion and Conclusions

Within the framework of network and social network analysis, this work focuses on the analysis of fuzzy information and makes several contributions. Our first objective is the definition of a new representation model built on two types of information sources. The first is a set of individuals whose direct connections are known and represented by a crisp graph or network. The second is some additional knowledge about these individuals, in terms of soft information represented by a family of vectors of fuzzy sets. On this basis, we define the multi-dimensional extended fuzzy graph based on fuzzy vectors (MEFVFG). This new model combines the crisp information of a graph with the soft information provided by the vectors about the individuals. We define it from the simplest case, in which there is only one vector of fuzzy sets, to the multidimensional scenario involving multiple vectors.
Another goal of this work is the proposal of a specific application of this new model related to community detection. This problem has been previously addressed in terms of fuzzy measures that provide additional information about the synergies among the individuals in some works [12,13,28]. In [11], we proposed a methodology, named Duo-Louvain, to find a “good” partition of the individuals of an extended fuzzy graph considering both the connections defined by the edges and also the additional information provided by the fuzzy measures. It was based on the well-known Louvain method [16], a greedy multi-phase algorithm based on local moving [42] and modularity optimization [25]. That proposal [11] is the inspiration of this paper. Now, we work on the community detection in networks by considering additional soft information about the individuals of the network. Specifically, we face the existence of several fuzzy sets related to the nodes of a network, so our proposed application of the MEFVFG in current work is to obtain realistic communities in it. That idea is quite useful and goes beyond any other previous proposal, as it can be applied in a wide range of scenarios, for example, when any linguistic variable(s) appear. As far as we know, this situation has never been faced before, so this work intrinsically leads to the definition of a new type of problem.
Another important objective is the evaluation of the developed methodology. As mentioned above, the problem presented in this article has not been addressed before in the literature, so there are no other methods with which we can compare our proposal. Then, to evaluate the performance of the new algorithm, we present some experimental results developed on the basis of benchmarking [36] and NMI calculation [37]. We develop some methods based on trapezoidal fuzzy sets with which we generate the elements of the gold models considered, each with a standard partition summarizing an MEFVFG, which should be detected by the evaluated algorithm. The high level of the results shown in Section 5 allows us to assert the good performance of the proposed method: the NMI calculated in almost all the scenarios is 1, which means that our algorithm perfectly detects the standard partition despite the complexity of the considered model.
As further research, we stress the importance of an in-depth analysis of the distance between fuzzy sets. Specifically, we are interested in analyzing how far apart two fuzzy sets are in order to compute new measures of additional information to be later considered in the presented methodology. Our idea is to work with the Hausdorff distance between two fuzzy sets [43], based on the classic metric with the same name [44], which is used in mathematics to quantify how far two subsets of a metric space are from each other. To pursue this theoretical line, it is essential to be familiar with the properties of fuzzy sets and also with various topological concepts related to the measurement of distances in different spaces.
Another important line of future work is not so theoretical but applied. Let us emphasize the importance of applying this methodology in real-life cases in order to obtain realistic groups of individuals which not only consider the direct connections between them but also some additional soft information. Real problems are too complex to be represented by a crisp graph alone. The need to include as many sources of information as possible is clear. For example, in the behavior of people, things are not “black or white”. To the question “do you agree with this law?”, the answer may be something like “well, more or less but not quite”. The more capable we are of representing these situations in a model, the more realistic the results obtained will be. We have to be prepared to understand, model, and analyze the fuzzy knowledge of real life, for example, by the consideration of linguistic terms. Undeniably, it is worth an in-depth analysis of the linguistic terms that accompany any real problem, for whose study the tools and methodology proposed here can be crucial. When facing this type of problem, it is vitally important to take into account its difficulty, both computationally and in terms of understanding. When fuzzy elements appear, it is essential to be well prepared to consider tools that mitigate the intrinsic difficulty.

Author Contributions

Conceptualization, I.G., D.G. and J.C.; methodology, I.G., D.G. and J.C.; software, I.G. and J.C.; validation, D.G., J.C. and R.E.; formal analysis, D.G.; investigation, I.G., D.G., J.C. and R.E.; resources, D.G., J.C. and R.E.; data curation, I.G.; writing—original draft preparation, I.G., D.G. and J.C.; writing—review and editing, I.G., D.G., J.C. and R.E.; visualization, I.G.; supervision, D.G., J.C. and R.E.; project administration, D.G.; funding acquisition, D.G., J.C. and R.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the Government of Spain, Grant Plan Nacional de I+D+i, PID2020-116884GB-I00, PGC2018096509-B-I00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
  2. Girvan, M.; Newman, M. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
  3. Gómez, D.; Rodríguez, J.T.; Yáñez, J.; Montero, J. A new modularity measure for Fuzzy Community detection problems based on overlap and grouping functions. Int. J. Approx. Reason. 2016, 74, 88–107.
  4. Speidel, L.; Takaguchi, T.; Masuda, N. Community detection in directed acyclic graphs. Eur. Phys. J. B 2016, 88, 203.
  5. Li, L.; He, X.; Yan, G. Improved Louvain Method for Directed Networks. In Proceedings of the 10th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Nanning, China, 19–22 October 2018; Volume 538, pp. 192–203.
  6. Fumanal-Idocin, J.; Cordón, O.; Minárová, M.; Alonso-Betanzos, A.; Bustince, H. Combinations of affinity functions for different community detection algorithms in social networks. In Proceedings of the Hawaii International Conference on System Sciences, Hawaii, USA, 4–7 January 2022.
  7. Fumanal-Idocin, J.; Alonso-Betanzos, A.; Cordón, O.; Bustince, H.; Minárová, M. Community detection and social network analysis based on the Italian wars of the 15th century. Future Gener. Comput. Syst. 2020, 113, 25–40.
  8. Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. A new community detection problem based on bipolar fuzzy measures. Stud. Comput. Intell. 2020, 955, 91–99.
  9. Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. Fuzzy Sugeno λ-Measures and Theirs Applications to Community Detection Problems. In Proceedings of the IEEE International Conference on Fuzzy Systems, Glasgow, UK, 19–24 July 2020; pp. 1–6.
  10. Barroso, M.; Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. Group Definition Based on Flow in Community Detection. In Proceedings of the Information Processing and Management of Uncertainty in Knowledge-Based Systems, Lisbon, Portugal, 15–19 June 2020; pp. 524–538.
  11. Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. A New Community Detection Algorithm Based on Fuzzy Measures. In Proceedings of the Advances in Intelligent Systems and Computing Series, Intelligent and Fuzzy Techniques in Big Data Analytics and Decision Making, INFUS 2019, Istanbul, Turkey, 23–25 July 2020; Volume 1029, pp. 133–140.
  12. Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. Multiple bipolar fuzzy measures: An application to community detection problems for networks with additional information. IJCIS 2020, 13, 1636–1649.
  13. Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. Fuzzy Measures: A solution to deal with community detection problems for networks with additional information. JIFS 2020, 39, 6217–6230.
  14. Novák, V.; Perfilieva, I.; Mockor, J. Mathematical Principles of Fuzzy Logic; Springer Science & Business Media: Berlin, Germany, 2012; Volume 517.
  15. Herrera, F.; Martínez, L. A 2-tuple fuzzy linguistic representation model for computing with words. IEEE Trans. Fuzzy Syst. 2000, 8, 746–752.
  16. Blondel, V.; Guillaume, J.; Lambiotte, R.; Lefevre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10, P10008.
  17. Deveci, M.; Pamucar, D.; Gokasar, I.; Pedrycz, W.; Wen, X. Autonomous Bus Operation Alternatives in Urban Areas Using Fuzzy Dombi-Bonferroni Operator Based Decision Making Model. IEEE Trans. Intell. Transp. Syst. 2022, 1–10.
  18. Chen, X.; Liu, X.; Wu, Q.; Deveci, M.; Martínez, L. Measuring technological innovation efficiency using interval type-2 fuzzy super-efficiency slack-based measure approach. Eng. Appl. Artif. Intell. 2022, 116, 105405.
  19. Ross, T. Fuzzy Logic with Engineering Applications, 3rd ed.; Wiley: New York, NY, USA, 2010.
  20. Zadeh, L.A. Fuzzy sets. Inf. Control. 1965, 8, 338–353.
  21. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning I. Inf. Sci. 1975, 8, 199–249.
  22. Wang, Y.J. Ranking triangle and trapezoidal fuzzy numbers based on the relative preference relation. Appl. Math. Model. 2015, 39, 586–599.
  23. Van Leekwijck, W.; Kerre, E. Defuzzification: Criteria and classification. Fuzzy Sets Syst. 1999, 108, 159–178.
  24. De Hierro, A.; Sánchez, M.; Puente-Fernández, D.; Montoya-Juárez, R.; Roldán, C. A fuzzy Delphi consensus methodology based on a fuzzy ranking. Mathematics 2021, 9, 2323.
  25. Newman, M.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
  26. Gómez, D.; González-Arangüena, E.; Manuel, C.; Owen, G.; del Pozo, M.; Tejada, J. Centrality and power in social networks: A game theoretic approach. Math. Soc. Sci. 2003, 46, 27–54.
  27. Gómez, D.; Castro, J.; Gutiérrez, I.; Espínola, R. A new edge betweenness measure using a game theoretical approach: An application to hierarchical community detection. Mathematics 2021, 9, 2666.
  28. Gutiérrez, I.; Guevara, J.A.; Gómez, D.; Castro, J.; Espínola, R. Community Detection Problem Based on Polarization Measures. An application to Twitter: The COVID-19 case in Spain. Mathematics 2021, 9, 443.
  29. Sugeno, M. Theory of Fuzzy Integrals and Its Applications. Ph.D. Thesis, Tokyo Institute of Technology, Tokyo, Japan, 1974.
  30. Sugeno, M. Fuzzy measures and fuzzy integrals: A survey. Fuzzy Autom. Decis. Process. 1977, 78, 89–102.
  31. Grabisch, M. k-order additive discrete fuzzy measures and their representation. Fuzzy Sets Syst. 1997, 92, 167–189.
  32. Rosenfeld, A. Fuzzy Graphs. Fuzzy Sets Their Appl. 1975, 77–95.
  33. Yager, R.R.; Kacprzyk, J. (Eds.) The Ordered Weighted Averaging Operators. Theory and Applications; Springer Science & Business Media: Boston, MA, USA, 1997.
  34. Shapley, L.S. A value for n-person games. Contrib. Theory Games 1953, 2, 307–317.
  35. Yager, R.R. On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Trans. Syst. Man Cybern. 1988, 18, 183–190.
  36. Bader, D.; Kappes, A.; Meyerhenke, H.; Sanders, P.; Schulz, C.; Wagner, D. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining; Springer: New York, NY, USA, 2018; pp. 1–11.
  37. Kvalseth, T.O. On Normalized Mutual information: Measure Derivations and Properties. Entropy 2017, 19, 631.
  38. Amelio, A.; Pizzuti, C. Is normalized mutual information a fair measure for comparing community detection methods? In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, France, 25–28 August 2015; pp. 1584–1585.
  39. Voskoglou, M. Application of Fuzzy Numbers to Assessment Processes. Int. J. Fuzzy Syst. Appl. 2018, 6, 59–73.
  40. Fortunato, S.; Barthélemy, M. Resolution Limit in Community Detection. Proc. Natl. Acad. Sci. USA 2007, 104, 36–41.
  41. Liu, J.; Abbass, H.; Tan, K. Evolutionary Computation and Complex Networks; Springer: Berlin/Heidelberg, Germany, 2018.
  42. Waltman, L.; Van Eck, N.J. A smart local moving algorithm for large-scale modularity-based community detection. Eur. Phys. J. B 2013, 86, 471.
  43. Abbasbandy, S.; Hajighasemi, S. A fuzzy distance between two fuzzy numbers. Commun. Comput. Inf. Sci. USA 2010, 81, 376–382.
  44. Rockafellar, R.; Wets, R. Variational Analysis; Grundlehren der Mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 1998; Volume 317.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gutiérrez, I.; Gómez, D.; Castro, J.; Espínola, R. From Fuzzy Information to Community Detection: An Approach to Social Networks Analysis with Soft Information. Mathematics 2022, 10, 4348. https://doi.org/10.3390/math10224348
