Article

EGNAS: Efficient Graph Neural Architecture Search Through Evolutionary Algorithm

¹ AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea
² GIST Institute for Artificial Intelligence, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea
³ Convergence of AI, Chonnam National University, Gwangju 61186, Republic of Korea
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(23), 3828; https://doi.org/10.3390/math12233828
Submission received: 14 November 2024 / Revised: 29 November 2024 / Accepted: 2 December 2024 / Published: 4 December 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract

The primary objective of our research is to enhance the efficiency and effectiveness of Neural Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs have emerged as powerful tools for learning from unstructured network data, compensating for several known limitations of Convolutional Neural Networks (CNNs). However, the automatic search for optimal GNN architectures has advanced little so far. To address this gap, we introduce Efficient Graph Neural Architecture Search (EGNAS), a method that leverages the advantages of evolutionary search strategies. EGNAS incorporates inherited parameter sharing, which allows offspring to inherit parameters from their parents, and uses half epochs to improve optimization stability. In addition, EGNAS employs a combined evolutionary search that explores both the model structure and the hyperparameters within a large search space, resulting in improved performance. Our experimental results demonstrate that EGNAS outperforms state-of-the-art methods in node classification on the Cora, Citeseer, and PubMed datasets while maintaining high computational efficiency. In particular, EGNAS is the fastest GNN architecture search method in terms of search time; compared with previously proposed evolutionary search strategies, it searches up to 40 times faster.

1. Introduction

Graph Neural Networks (GNNs) have emerged as a crucial methodology in areas such as social networks, transportation, and bioinformatics, where they tackle the intricacies of unstructured network data [1,2,3,4]. However, manually constructing GNN models demands extensive expertise and time. Neural Architecture Search (NAS) addresses this challenge by employing optimization algorithms to automate the identification of an optimal model [5,6,7]. Combining GNNs with NAS has also led to state-of-the-art performance, surpassing manually designed architectures [8,9]. While NAS offers the convenience of automation, its long search times can paradoxically reduce overall efficiency. This paper aims to overcome this limitation of NAS, focusing on efficiently identifying high-performing GNN models.
Integrating model structure and hyperparameter searches, which broadens the exploration scope, has been increasingly recognized as pivotal for enhancing GNN performance. This integration also improves the automation efficiency of NAS, because no independent hyperparameter search is needed after the optimal structure has been determined. Genetic-GNN exemplifies this comprehensive approach: it explores the integrated search space and yields results that rival state-of-the-art models [10]. However, because it runs separate evolutionary searches for structure and hyperparameters, its search process is lengthy. Consequently, a search methodology tailored to the expanded search space is required to ensure an efficient and effective exploration process.
Training models from scratch is a notable challenge in GNN model discovery because of the extensive time required. Parameter sharing addresses this by reusing previously trained parameters, eliminating the need to train each new model from the ground up [11,12]. AGNN successfully reduces the number of epochs needed to train new models through parameter sharing, effectively cutting the time cost [13]. However, this efficiency gain comes with a reduction in the accuracy of the final models. A primary cause of this limitation is weight coupling, a phenomenon that occurs when the same weights are frequently reused across different candidate models. This coupling can lead to the repeated selection of high-performing structures, potentially reducing the diversity of representations. Furthermore, it poses a critical challenge for parameter sharing approaches because it increases the risk of overfitting, particularly when excessive training epochs are applied. This over-reliance on shared weights can undermine the generalization capability of the model [14,15].
To this end, we introduce the Efficient Graph Neural Architecture Search (EGNAS) algorithm, featuring two novel methodologies: combined evolutionary search and inherited parameter sharing with half epochs. The combined evolutionary search integrates structure and hyperparameter searches, using crossover and mutation to discover a broader range of diverse architectures and to accelerate the search. Inherited parameter sharing, together with the half epochs approach, addresses weight coupling by reducing redundant parameter reuse. As a result, EGNAS achieves state-of-the-art accuracy in node classification tasks while significantly reducing computational cost compared with existing methods. Its efficiency and scalability also make it a practical solution for real-world applications in domains such as social networks, transportation, and bioinformatics.

2. Related Works

2.1. Evolutionary Neural Architecture Search

Neural Architecture Search (NAS) is an automated method for designing neural network architectures such as Convolutional Neural Networks (CNNs) [16]. Evolutionary Algorithms (EAs), Reinforcement Learning (RL), and differentiable architecture search are the most common search strategies in NAS [17,18,19].
The Evolutionary Algorithm is an optimization technique inspired by natural selection and genetic principles. It iteratively evolves a population of candidate solutions to find the best possible answer for a given problem [20]. It is beneficial in automating architecture exploration and hyperparameter search by evaluating multiple candidates simultaneously [21]. Its ability to explore extensive search spaces allows it to effectively discover optimal solutions for deep learning tasks, making it applicable in various domains, including image classification, speech recognition, emotion recognition, and others [22,23,24]. Moreover, the evolutionary method is well-suited for addressing multi-objective optimization problems to find trade-off solutions and solve complex optimization challenges [25,26].

2.2. Graph Neural Network Architecture Search

Graph Neural Networks (GNNs) are a specialized type of neural network designed to operate on graph-structured data. They are adept at capturing and utilizing the relationships within complex interconnected systems. In our research, we focus on developing an attention-based graph model, drawing upon the foundational principles of Graph Convolutional Network (GCN) and Graph Attention Network (GAT). GCN performs localized convolutional operations on graph nodes [27]. This approach iteratively aggregates information from neighboring nodes, effectively capturing the local graph structure. GAT incorporates attention mechanisms [28]. These mechanisms allow the model to adaptively assign varying levels of importance to different neighboring nodes during the aggregation process, enhancing the ability of the model to focus on more relevant information within the graph.
Several attempts have been made to automate graph model design with NAS techniques, eliminating the need for manual intervention. GraphNAS and AGNN employ Reinforcement Learning frameworks to explore task-specific GNN structures [8,13]; both require training a separate recurrent network as a controller. In contrast, Genetic-GNN uses an Evolutionary Algorithm as the optimization method for architecture search [10]. Compared with reinforcement-learning-based NAS methods, it achieves comparable performance at a lower computational cost. One drawback of this approach, however, is the long time required for the overall architecture search. DSS and SANE use differentiable architecture search and one-shot search to discover GNN models, enabling the use of pre-trained weights and reducing search time [9,29]. Their limitation, however, is that candidates are evaluated without additional training, which can lead to underestimating their performance.

3. Graph Neural Network Architecture Search

3.1. Problem Statement

The main goal of this paper is to combine Neural Architecture Search (NAS) with an Evolutionary Algorithm (EA) to identify an optimal Graph Neural Network (GNN) structure for semi-supervised node classification. The objective function is defined in Equation (1). The aim is to find a GNN model $g^*$ in the GNN search space $\mathcal{G}$ that maximizes the fitness value $f_D(g)$, where $D$ is the validation dataset; the fitness metric in the EGNAS framework is the validation accuracy of the GNN. To explore the expansive search space of GNNs, we adopt an evolutionary search methodology. By integrating NAS with our evolutionary search, we identify the optimal GNN structure that exhibits outstanding performance.
$$g^* = \arg\max_{g \in \mathcal{G}} f_D(g) \qquad (1)$$
The search space in the EGNAS framework comprises the structure space and the hyperparameter space. In the structure space, the GNN structure $S$ is represented as an $m$-dimensional array in $\mathbb{R}^{|S_1| \times |S_2| \times \cdots \times |S_m|}$, where each component $S_i$ is the set of choices for the $i$-th structure component, such as the GNN aggregation function. Similarly, in the hyperparameter space, the hyperparameters $P$ are represented as an $n$-dimensional array in $\mathbb{R}^{|P_1| \times |P_2| \times \cdots \times |P_n|}$, where each component $P_j$ is the set of candidate choices for the $j$-th hyperparameter, such as the dropout rate. Here, $m$ is the number of structure search space types and $n$ is the number of hyperparameter search space types. Section 3.2 explains the different search space types in detail.
The primary objective of EGNAS is to discover, across all generations, the optimal combination of GNN structure components and hyperparameters, denoted as $SP_{\text{best}}$. This combination yields the GNN $g^*$ that maximizes the fitness value $f$, and it can be written as $SP_{\text{best}} = \{S_1, \ldots, S_m, P_1, \ldots, P_n\}_{\text{best}}$.
Graph Neural Networks (GNNs) operate on a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. The input to a GNN consists of the graph structure and the node features $X_v$ for each node $v \in V$, and the output consists of the node embeddings $h_v$. The $k$-th layer of a GNN can be written as follows:
$$a_{u,v}^{(k)} = \mathrm{ATTENTION}^{(k)}\left(h_u^{(k-1)}, h_v^{(k-1)}\right)$$
$$h_v^{(k)} = \mathrm{ACT}^{(k)}\left(\mathrm{AGG}^{(k)}\left(\left\{a_{u,v}^{(k)} \cdot h_u^{(k-1)} : u \in N(v)\right\}\right)\right)$$
where $h_v^{(k)}$ is the embedding vector of node $v$ and $a_{u,v}^{(k)}$ is the attention weight between nodes $u$ and $v$ in the $k$-th layer. $N(v)$ denotes the set of neighboring nodes of $v$, and $\mathrm{ATTENTION}$, $\mathrm{ACT}$, and $\mathrm{AGG}$ denote the attention, activation, and aggregation functions, respectively.
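The two layer equations can be traced with a minimal pure-Python sketch, assuming embeddings and adjacency are stored as dictionaries of plain float lists. Following the equations literally, it omits the learnable weight matrices and the attention normalization that a practical GNN layer would include. The names `gnn_layer`, `const_attention`, and the aggregator helpers are hypothetical, not EGNAS code, and every node is assumed to have at least one neighbor (for example, a self-loop).

```python
from typing import Callable, Dict, List

Vector = List[float]

def const_attention(h_u: Vector, h_v: Vector) -> float:
    """Constant attention: every neighbor u of v gets weight a_{u,v} = 1."""
    return 1.0

def sum_agg(messages: List[Vector]) -> Vector:
    """Element-wise sum over the weighted neighbor messages."""
    return [sum(col) for col in zip(*messages)]

def mean_agg(messages: List[Vector]) -> Vector:
    """Element-wise mean over the weighted neighbor messages."""
    return [sum(col) / len(messages) for col in zip(*messages)]

def max_agg(messages: List[Vector]) -> Vector:
    """Element-wise max over the weighted neighbor messages."""
    return [max(col) for col in zip(*messages)]

def relu(x: float) -> float:
    return max(0.0, x)

def gnn_layer(
    h: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    attention: Callable[[Vector, Vector], float],
    aggregate: Callable[[List[Vector]], Vector],
    activation: Callable[[float], float],
) -> Dict[int, Vector]:
    """Compute h_v^(k) for every node v from the previous-layer embeddings h."""
    out: Dict[int, Vector] = {}
    for v, nbrs in neighbors.items():
        # a_{u,v}^(k) * h_u^(k-1) for each neighbor u in N(v)
        messages = [[attention(h[u], h[v]) * x for x in h[u]] for u in nbrs]
        # ACT(AGG(...)), with the activation applied element-wise
        out[v] = [activation(x) for x in aggregate(messages)]
    return out

# Toy 3-node graph with self-loops so that every N(v) is non-empty.
h0 = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [1.0, 1.0]}
adj = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
h1 = gnn_layer(h0, adj, const_attention, sum_agg, relu)
print(h1)  # {0: [1.0, 2.0], 1: [2.0, 3.0], 2: [1.0, 3.0]}
```

Swapping `sum_agg` for `mean_agg` or `max_agg`, or `relu` for another activation, corresponds to choosing a different value of the aggregation or activation component for that layer.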

3.2. Search Space

To construct GNN embedding vectors, we adopt a list of essential elements that delineate the range of exploration for potential model candidates. In Neural Architecture Search (NAS), this list is referred to as the search space. In this study, we use two kinds of search space: the structure space and the hyperparameter space.
Table 1 lists the components of the structure search space [8]. The Attention Function determines the type of attention mechanism used in the GNN; each option represents a different way of weighting information from neighboring nodes. The Aggregation Function determines how the information from neighboring nodes is combined in each graph convolution layer. The Activation Function applies a non-linear transformation to the aggregated information in each layer. The Attention Head component specifies the number of attention heads used in the GNN; multiple attention heads allow the model to capture different patterns and relationships in the graph. The Hidden Unit component determines the number of hidden units (output channels), which sets the dimensionality of the node embeddings and influences the expressive power and complexity of the model. When constructing a GNN with multiple layers, all five components must be determined for each layer.

Table 2 presents the search space for the hyperparameters of the GNN layers [10]. Three hyperparameters must be determined: the dropout rate of each layer, the learning rate, and the weight decay used during training. A separate dropout rate is searched for every GNN layer; this setting originates from GraphNAS, a key reference study for our research, and we adopt these values to enable a meaningful comparison with previous works. By leveraging the search spaces in Table 1 and Table 2, we explore various combinations to develop diverse GNN architectures with specific training configurations for optimal performance.
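A combined chromosome can be sketched in Python as a flat list of structure genes followed by hyperparameter genes. The layout (five structure genes per layer, then one dropout rate per layer, the learning rate, and the weight decay) follows the gene lengths given in Section 5.2 for a two-layer GNN. The candidate values are hypothetical placeholders rather than the actual entries of Table 1 and Table 2, apart from ELU, Tanh, mean, $5 \times 10^{-3}$, and $1 \times 10^{-2}$, which appear in the text.

```python
import math
import random
from typing import Dict, List, Sequence

# HYPOTHETICAL candidate sets; Tables 1 and 2 define the real ones.
ATTENTION = ["const", "gcn", "gat"]
AGGREGATOR = ["sum", "mean", "max"]
ACTIVATION = ["relu", "elu", "tanh"]
HEADS = [1, 2, 4, 8]
HIDDEN_UNITS = [8, 16, 32, 64]
DROPOUT = [0.0, 0.2, 0.4, 0.6]
LEARNING_RATE = [1e-2, 5e-3, 1e-3]
WEIGHT_DECAY = [0.0, 5e-4, 1e-4]

NUM_LAYERS = 2
LAYER_FIELDS = ["attention", "aggregator", "activation", "heads", "hidden_units"]
# s = 5 components x 2 layers = 10 structure genes
STRUCTURE_SPACE: List[Sequence] = [ATTENTION, AGGREGATOR, ACTIVATION, HEADS, HIDDEN_UNITS] * NUM_LAYERS
# h = one dropout rate per layer + learning rate + weight decay = 4 hyperparameter genes
HYPER_SPACE: List[Sequence] = [DROPOUT] * NUM_LAYERS + [LEARNING_RATE, WEIGHT_DECAY]

def random_chromosome(rng: random.Random) -> List:
    """Pick one value per gene uniformly at random (as when initializing SP_0)."""
    return [rng.choice(options) for options in STRUCTURE_SPACE + HYPER_SPACE]

def decode_structure(chromosome: List) -> List[Dict]:
    """Split the structure genes into one dict of components per GNN layer."""
    k = len(LAYER_FIELDS)
    return [dict(zip(LAYER_FIELDS, chromosome[i * k:(i + 1) * k])) for i in range(NUM_LAYERS)]

def search_space_size() -> int:
    """Number of distinct chromosomes: the product of all candidate-set sizes."""
    return math.prod(len(options) for options in STRUCTURE_SPACE + HYPER_SPACE)

print(len(STRUCTURE_SPACE), len(HYPER_SPACE), search_space_size())  # 10 4 26873856
```

With these toy candidate sets the space holds about $2.7 \times 10^{7}$ chromosomes; the real space reported in Section 5.2 is far larger, but it is counted in the same way, as a product of candidate-set sizes.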

4. EGNAS

4.1. Overall Algorithm

The Efficient Graph Neural Architecture Search (EGNAS) algorithm aims to discover the GNN with the highest fitness using an evolutionary approach. The overall procedure is described in Algorithm 1. It begins by initializing individuals through random selection of elements from both the structure space and the hyperparameter space. Each individual is then decoded into a GNN model based on its structure genes and trained according to its hyperparameter genes. The accuracy of each trained model is evaluated on the validation set, yielding the fitness values of the initial population.
Algorithm 1 EGNAS
Input: Population size $K$, number of generations $T$, number of parent candidates $N$, number of parent pairs $M$, training set $D_{\text{train}}$, validation set $D_{\text{valid}}$
Output: Best-fitness GNN
$SP_0 \leftarrow$ Initialize the combined population with size $K$
Train the individuals in $SP_0$ on $D_{\text{train}}$
Evaluate the fitness of the individuals in $SP_0$ on $D_{\text{valid}}$
for $t = 0$ to $T - 1$ do
    $A_t \leftarrow$ $N$ parent candidates selected from $SP_t$ by roulette wheel selection
    $O_t \leftarrow \emptyset$
    for $m = 1$ to $M$ do
        $P_1, P_2 \leftarrow$ Randomly selected parents from $A_t$
        $C_1, C_2 \leftarrow$ CombinedEvolutionarySearch$(P_1, P_2)$
        $O_t \leftarrow O_t \cup \{C_1, C_2\}$
    end for
    Train the individuals in $O_t$ on $D_{\text{train}}$
    Evaluate the fitness of the individuals in $O_t$ on $D_{\text{valid}}$
    $SP_{t+1} \leftarrow$ Select $K$ individuals from $SP_t \cup O_t$ by elitist replacement selection
end for
return Best-fitness individual from $\{SP_1, \ldots, SP_T\}$
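Algorithm 1 can be sketched as a generic Python driver in which decoding, training, and validation are collapsed into one hypothetical `evaluate` callable, while reproduction (Algorithm 2) and the replacement rule are passed in as functions. This is an illustration under these assumptions, not the authors' implementation: for brevity it does not enforce that parent pairs are unique within a generation, and fitness values must be positive for roulette wheel selection to work.

```python
import random
from typing import Callable, List, Optional, Tuple

Individual = Tuple[tuple, float]  # (chromosome, validation fitness)

def egnas_search(
    init: Callable[[random.Random], tuple],
    evaluate: Callable[[tuple], float],
    reproduce: Callable[[tuple, tuple, random.Random], Tuple[tuple, tuple]],
    replace: Callable[[List[Individual], List[Individual]], List[Individual]],
    K: int, T: int, N: int, M: int, seed: int = 0,
) -> Individual:
    """Generational loop of Algorithm 1; training is folded into `evaluate`."""
    rng = random.Random(seed)
    population = [(c, evaluate(c)) for c in (init(rng) for _ in range(K))]  # SP_0
    best: Optional[Individual] = None
    for _ in range(T):
        # A_t: N roulette-wheel spins, selection probability proportional to fitness
        weights = [f for _, f in population]
        candidates = rng.choices(population, weights=weights, k=N)
        offspring: List[Individual] = []  # O_t
        for _ in range(M):
            pa, pb = rng.sample(candidates, 2)
            offspring.extend((c, evaluate(c)) for c in reproduce(pa[0], pb[0], rng))
        population = replace(population, offspring)  # SP_{t+1}
        gen_best = max(population, key=lambda ind: ind[1])
        if best is None or gen_best[1] > best[1]:
            best = gen_best
    assert best is not None, "T must be at least 1"
    return best

# Toy usage: 6-bit chromosomes, fitness = 1 + number of 1-bits (always positive).
def toy_init(rng: random.Random) -> tuple:
    return tuple(rng.randint(0, 1) for _ in range(6))

def toy_evaluate(c: tuple) -> float:
    return 1.0 + sum(c)

def toy_reproduce(a: tuple, b: tuple, rng: random.Random) -> Tuple[tuple, tuple]:
    p = rng.randrange(len(a))  # simple one-point crossover
    return a[:p] + b[p:], b[:p] + a[p:]

def keep_fittest(pop: List[Individual], off: List[Individual]) -> List[Individual]:
    # Simplified truncation replacement, NOT the elitist rule described in the text.
    return sorted(pop + off, key=lambda ind: ind[1], reverse=True)[:len(pop)]

best = egnas_search(toy_init, toy_evaluate, toy_reproduce, keep_fittest, K=8, T=5, N=6, M=3)
```

Keeping the operators pluggable mirrors the structure of Algorithm 1, where the inner call to the combined evolutionary search and the final selection step are separate procedures.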
The algorithm operates over multiple generations. Each generation begins by selecting parent candidates from the current population with roulette wheel selection: each individual $g$ is selected with probability $f_D(g) / \sum_{g' \in SP_t} f_D(g')$, where $f_D(g)$ is the fitness of the model, and selection continues until a predetermined number $N$ of parent candidates has been chosen. Parent pairs are then drawn at random from the selected candidates, with each pair unique within the same generation. Crossover and mutation operations generate offspring that mix the genetic information of each selected parent pair. The offspring are trained on the training set, and their fitness is evaluated on the validation set.
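Roulette wheel selection and unique pair sampling can be sketched with Python's standard `random` module. The sketch assumes that the $N$ spins are drawn with replacement and that duplicate candidates are merged before pairing, so a parent is never paired with itself; the paper does not specify these details, and the function names and fitness values are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

def roulette_wheel(fitness: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Spin the wheel n times; index g is drawn with probability f(g) / sum(f)."""
    return rng.choices(range(len(fitness)), weights=fitness, k=n)

def unique_parent_pairs(candidates: Sequence[int], m: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Draw up to m distinct unordered pairs of distinct candidates."""
    distinct = sorted(set(candidates))
    all_pairs = list(combinations(distinct, 2))
    return rng.sample(all_pairs, min(m, len(all_pairs)))

rng = random.Random(42)
fitness = [0.81, 0.79, 0.0, 0.83, 0.80]  # hypothetical validation accuracies
cands = roulette_wheel(fitness, n=10, rng=rng)
pairs = unique_parent_pairs(cands, m=10, rng=rng)
print(cands, pairs)
```

An individual with zero fitness (index 2 here) can never be drawn, and individuals with similar accuracies receive nearly equal selection chances, which keeps selection pressure mild.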
The population for the next generation is formed by elitist replacement selection. For each new individual, if its structure is not found in the population, it replaces the worst individual of the current generation; if the structure already exists, the new individual is compared with the existing one, and only the better of the two is retained. This method preserves the best solutions of the current generation, thereby maintaining the quality of the population in each successive generation. The iterative process continues for a specified number of generations, allowing the algorithm to refine the population over time. Finally, the best-fitness individual across all populations is returned as the best-performing GNN model discovered by the algorithm.
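A minimal sketch of elitist replacement, following the description above literally: an offspring with an unseen structure always replaces the current worst individual, whereas an offspring whose structure already exists replaces that individual only if it is fitter. Structures are compared by their structure genes alone; the tuple layout, the example values, and the function name are assumptions for illustration.

```python
from typing import List, Tuple

# (structure genes, hyperparameter genes, validation fitness)
Individual = Tuple[tuple, tuple, float]

def elitist_replacement(population: List[Individual], offspring: List[Individual]) -> List[Individual]:
    """Apply the replacement rule for each offspring in turn; the population size is unchanged."""
    pop = list(population)  # do not mutate the caller's list
    for child in offspring:
        structure, _, fitness = child
        same = next((i for i, ind in enumerate(pop) if ind[0] == structure), None)
        if same is None:
            # Unseen structure: it takes the place of the current worst individual.
            worst = min(range(len(pop)), key=lambda i: pop[i][2])
            pop[worst] = child
        elif fitness > pop[same][2]:
            # Known structure: keep only the fitter of the two.
            pop[same] = child
    return pop

pop = [(("gat", "sum"), (5e-3,), 0.80), (("gcn", "mean"), (1e-2,), 0.70)]
off = [(("gcn", "mean"), (5e-3,), 0.75), (("const", "max"), (1e-2,), 0.85)]
new = elitist_replacement(pop, off)
print(new)  # [(('gat', 'sum'), (0.005,), 0.8), (('const', 'max'), (0.01,), 0.85)]
```

Because only the worst slot or an identical structure is ever overwritten, the best individual in the population can never be lost, which is the elitist property the text relies on.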

4.2. Combined Evolutionary Search

The extension to include hyperparameter search in NAS enhances the potential for superior performance by broadening the exploration scope, which also facilitates obtaining a fully optimized model without necessitating further tuning. This advancement consequently raises the automation level within NAS. Nonetheless, the extended scope increases the required search time, overshadowing these benefits [10]. In response, we propose the combined evolutionary search, a strategy designed to leverage the advantages of expanded search scope while concurrently reducing the time expenditure. This method, comprising crossover and mutation phases as outlined in Algorithm 2, aims to generate offspring that inherit genetic information from their parent configurations.
Algorithm 2 Combined Evolutionary Search
Input: Parents $P_1, P_2$, length of structure genes $s$, length of hyperparameter genes $h$, mutation probability $\mu$
Output: Offspring generated by crossover and mutation
Randomly generate a number $r_1$ in the range $[0, 1)$
if $r_1 \le 0.5$ then
    $C_1, C_2 \leftarrow$ Structure one-point crossover of $P_1, P_2$ at a point in the integer range $[0, s)$
else
    $C_1, C_2 \leftarrow$ Hyperparameter one-point crossover of $P_1, P_2$ at a point in the integer range $[0, h)$
end if
for $i \in \{1, 2\}$ do
    Randomly generate a number $r_2$ in the range $[0, 1)$
    if $r_2 \le \mu$ then
        $C_i \leftarrow$ Structure and hyperparameter point replacement mutation of $C_i$
    else
        $C_i \leftarrow C_i$
    end if
end for
return Offspring $C_1, C_2$
In the EGNAS algorithm, we employ a combined chromosome structure, incorporating both structure genes and hyperparameter genes. We divide the crossover and mutation processes into two distinct types: structure and hyperparameter. Crossover is applied selectively, targeting either structure or hyperparameter for each model candidate. This selection is determined randomly with equal probability, ensuring controlled genetic modification. The purpose of this restriction is to reduce the risk of deviating too far from effective solutions identified in previous generations. This is crucial because crossover can induce extensive changes within the individual. On the other hand, mutations are applied concurrently to both structure and hyperparameter aspects. Through this simultaneous application, we increase variability and diversity within the expanded search space. This approach fosters a comprehensive exploration and aids in global optimization.
To perform the crossover, a random number $r_1$ is generated in the range $[0, 1)$. If $r_1$ is less than or equal to 0.5, a structure one-point crossover is performed between parents $P_1$ and $P_2$, with the crossover point selected at random from the integer range $[0, s)$, where $s$ is the length of the structure genes. Otherwise, if $r_1$ is greater than 0.5, a hyperparameter one-point crossover is performed, with the crossover point selected at random from the integer range $[0, h)$, where $h$ is the length of the hyperparameter genes. For the mutation step, another random number $r_2$ is generated for each offspring. If $r_2$ is less than or equal to the mutation probability $\mu$, a random point replacement mutation is applied to both the structure genes and the hyperparameter genes; otherwise, the offspring remains unchanged. Figure 1 illustrates an example of mutation in EGNAS, in which the activation function of the first layer is mutated from ELU to Tanh and the learning rate is mutated from $5 \times 10^{-3}$ to $1 \times 10^{-2}$. We adopt a relatively high mutation probability $\mu$ to introduce greater variation during the evolutionary process. This mitigates weight coupling and encourages the exploration of a more diverse range of architectural designs.
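Algorithm 2 can be sketched in Python under the assumption that a chromosome is a flat list whose first `s` entries are structure genes and whose remaining `h` entries are hyperparameter genes, with per-gene candidate lists supplied by the caller. The sketch reads "point replacement mutation" as redrawing one randomly chosen structure gene and one randomly chosen hyperparameter gene from their candidate sets, so a redrawn value may coincide with the old one; this interpretation and the function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Genes = List[object]

def one_point_crossover(a: Genes, b: Genes, point: int) -> Tuple[Genes, Genes]:
    """Swap the tails of two gene lists from index `point` onward."""
    return a[:point] + b[point:], b[:point] + a[point:]

def combined_evolutionary_search(
    p1: Genes, p2: Genes, s: int,
    structure_space: Sequence[Sequence], hyper_space: Sequence[Sequence],
    mu: float, rng: random.Random,
) -> Tuple[Genes, Genes]:
    """Crossover on ONE gene type (chosen with equal probability), then mutation on BOTH types."""
    h = len(p1) - s
    s1, hp1, s2, hp2 = p1[:s], p1[s:], p2[:s], p2[s:]  # slices copy, parents stay intact
    if rng.random() <= 0.5:  # r_1 <= 0.5: structure one-point crossover
        s1, s2 = one_point_crossover(s1, s2, rng.randrange(s))
    else:                    # r_1 > 0.5: hyperparameter one-point crossover
        hp1, hp2 = one_point_crossover(hp1, hp2, rng.randrange(h))
    children = [s1 + hp1, s2 + hp2]
    for child in children:
        if rng.random() <= mu:  # r_2 <= mu: point replacement mutation
            i = rng.randrange(s)  # one structure gene ...
            child[i] = rng.choice(structure_space[i])
            j = rng.randrange(h)  # ... and one hyperparameter gene
            child[s + j] = rng.choice(hyper_space[j])
    return children[0], children[1]

rng = random.Random(0)
S = [["a", "b"], ["a", "b"]]  # toy structure candidates
H = [[1, 2], [1, 2]]          # toy hyperparameter candidates
p1, p2 = ["a", "a", 1, 1], ["b", "b", 2, 2]
c1, c2 = combined_evolutionary_search(p1, p2, 2, S, H, 0.0, rng)
```

Restricting each crossover to one gene type keeps offspring close to their parents, while the joint mutation still perturbs both parts of the chromosome, matching the rationale given above.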
Finally, the algorithm outputs the two offspring individuals $C_1$ and $C_2$, which have undergone the crossover and mutation operations. Through these operations, new genetic combinations are explored, promoting diversity within the population and enabling the search for better solutions in subsequent generations.

4.3. Inherited Parameter Sharing

The main issue causing weight coupling when using parameter sharing techniques is that all trained models can potentially share their parameters. To address this, EGNAS leverages the characteristics of evolutionary search and introduces inherited parameter sharing, in which offspring inherit parameters exclusively from their parents. By adopting inherited parameter sharing, EGNAS benefits from the accumulated knowledge of successful architectures while reducing repeated selection and overfitting.
Because offspring start from shared parameters, they need less additional training. Accordingly, we introduce the half epochs strategy, which halves the number of training epochs: when every layer of an offspring has received parameters from its parents, the offspring is trained for half the usual number of epochs. This not only mitigates overfitting but also reduces computational cost and improves search efficiency.
Parameter sharing should be conducted after confirming whether specific elements of the GNN are the same. These key components include layer number, input channels, output channels (hidden units), attention type, aggregator type, and the number of heads. When the layer of the offspring matches the key components of the parent layer, the offspring inherits the parameters of the parents. This targeted parameter sharing ensures relevant and compatible information exchange.
During the crossover process, which combines genetic information from parents, parameter sharing plays a crucial role. Figure 2 illustrates a scenario where parameters of a single parent are inherited when the crossover point lies within a layer. Conversely, Figure 3 demonstrates an example where the crossover point is between layers, allowing the offspring to inherit parameters from both parents. In this case, the half epochs condition is met, and the offspring undergoes training for only half of the epochs compared to the original training procedure. This enables the offspring to take advantage of the pre-trained knowledge from both parents.
In the mutation stage, random modifications are introduced to the genetic information of the offspring. Figure 4 shows an example in which a key component is altered during mutation, causing a specific shared parameter to be lost. When the aggregator of the second layer of $C_2$ is changed to mean, the key components of the second layer of $C_2$ and $P_1$ no longer match. As a result, the offspring cannot inherit weight 2, and the corresponding parameters are newly initialized before training.
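The matching rule and the half epochs decision can be sketched as follows, assuming each parent's trained layers are stored in a dictionary keyed by the six key components listed above (layer number, input channels, output channels, attention type, aggregator type, number of heads). The activation function and dropout are deliberately not part of the key. The weights are opaque placeholders, `LayerSpec` and `inherit_parameters` are hypothetical names, and the 200-epoch default comes from Section 5.2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)  # frozen => hashable, so it can key a dict
class LayerSpec:
    """Key components that must match for a layer to inherit parameters."""
    index: int
    in_channels: int
    out_channels: int
    attention: str
    aggregator: str
    heads: int

def inherit_parameters(
    child_layers: List[LayerSpec],
    parent_weights: List[Dict[LayerSpec, object]],
    full_epochs: int = 200,
) -> Tuple[List[Optional[object]], int]:
    """Return the inherited weights per layer (None = fresh init) and the epoch budget."""
    inherited: List[Optional[object]] = []
    for layer in child_layers:
        # Take the weights of the first parent whose layer has identical key components.
        match = next((pw[layer] for pw in parent_weights if layer in pw), None)
        inherited.append(match)
    # Half epochs: only when every layer received parameters from a parent.
    all_shared = all(w is not None for w in inherited)
    return inherited, (full_epochs // 2 if all_shared else full_epochs)

# A mutation like the one in Figure 4: layer 2's aggregator changes from sum to mean.
l1 = LayerSpec(0, 10, 16, "gat", "sum", 2)
l2 = LayerSpec(1, 32, 3, "gat", "sum", 1)
parent1 = {l1: "weight 1", l2: "weight 2"}
mutated = LayerSpec(1, 32, 3, "gat", "mean", 1)
print(inherit_parameters([l1, mutated], [parent1]))  # (['weight 1', None], 200)
```

Passing both parents' dictionaries covers the case in Figure 3, where a crossover point between layers lets each layer inherit from a different parent and the offspring qualifies for half epochs.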

5. Experiments

5.1. Datasets

For our experiments, we used three representative node classification benchmark datasets: Cora, Citeseer, and PubMed [30]. Each dataset is a citation network forming a single graph in which each node corresponds to a document and each citation relationship between papers is represented as an undirected edge. Each paper is described by bag-of-words features, and the task is to classify each paper into its class. Table 3 presents the detailed statistics for each dataset.
For data splitting, we follow the semi-supervised learning approach introduced by Yang et al. [31], which utilizes all node features and edge information and employs only a subset of node labels for training. We designate the last 1000 data points in each dataset as the test set, while the preceding 500 data points are employed as the validation set. As the training data, we use the initial 140, 120, and 60 data points for Cora, Citeseer, and PubMed, respectively.
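The index ranges implied by this split can be computed with a short sketch, assuming nodes are indexed in each dataset's stored order. The node counts (2708, 3327, and 19717) are the standard published sizes of these citation graphs, and `split_indices` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Standard node counts of the citation graphs; training sizes from the text.
DATASETS: Dict[str, Tuple[int, int]] = {
    "Cora": (2708, 140),
    "Citeseer": (3327, 120),
    "PubMed": (19717, 60),
}

def split_indices(
    num_nodes: int, num_train: int, num_valid: int = 500, num_test: int = 1000
) -> Tuple[range, range, range]:
    """First num_train nodes train, last num_test nodes test, the num_valid before them validate."""
    test = range(num_nodes - num_test, num_nodes)
    valid = range(num_nodes - num_test - num_valid, num_nodes - num_test)
    train = range(num_train)
    return train, valid, test

for name, (n, n_train) in DATASETS.items():
    train, valid, test = split_indices(n, n_train)
    print(name, (train.start, train.stop), (valid.start, valid.stop), (test.start, test.stop))
# first line printed: Cora (0, 140) (1208, 1708) (1708, 2708)
```

All node features and edges remain available during training; only the labels of the `train` range are used as supervision, which is what makes the setting semi-supervised.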

5.2. Experimental Settings

The two types of hyperparameters used for the experiments are as follows:
Hyperparameters of evolutionary search: The experimental settings for EGNAS include a population size of 20 individuals and a total of 50 generations of evolution. The number of parent candidates is set to 10, and 10 parent pairs are selected for reproduction. A mutation probability of 0.2 is applied during the evolutionary process.
Hyperparameters of GNNs: We adopt a GNN architecture with two layers. Accordingly, the length of the structure genes is set to 10 (five structure components for each of the two layers). Because a dropout rate is specified for each of the two layers, in addition to the learning rate and weight decay, the length of the hyperparameter genes is set to 4. The total number of architectures in the search space is therefore $5.4 \times 10^{13}$, the largest among the GNN architecture search methods compared. Individuals that do not qualify for half epochs are trained for 200 epochs, while individuals that qualify for half epochs are trained for 100 epochs. EGNAS-10 applies early stopping with a patience of 10, halting training if the validation loss does not decrease for 10 consecutive epochs. We adopt Glorot parameter initialization [32] and use the Adam optimizer to minimize the cross-entropy loss during training.
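The settings above can be collected into a small configuration object, with the EGNAS-10 early stopping rule written out explicitly. The field and function names are hypothetical, and the sketch assumes "does not decrease" means "does not fall below the lowest validation loss seen so far", which is a common reading of patience-based early stopping rather than a detail confirmed by the text.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class EGNASSettings:
    """Values reported in Section 5.2; the field names are hypothetical."""
    population_size: int = 20      # K
    generations: int = 50          # T
    parent_candidates: int = 10    # N
    parent_pairs: int = 10         # M
    mutation_prob: float = 0.2     # mu
    num_layers: int = 2
    structure_genes: int = 10      # s = 5 components x 2 layers
    hyper_genes: int = 4           # h = 2 dropout rates + learning rate + weight decay
    full_epochs: int = 200
    half_epochs: int = 100
    patience: int = 10             # EGNAS-10 early stopping

    def models_trained(self) -> int:
        """Candidate evaluations if every offspring is trained: SP_0 plus 2 per pair per generation."""
        return self.population_size + self.generations * self.parent_pairs * 2

def early_stopping_epoch(val_losses: Sequence[float], patience: int) -> int:
    """Epoch at which training halts: `patience` consecutive epochs without a new lowest loss."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)

print(EGNASSettings().models_trained())  # 1020
```

The count of $20 + 50 \times 10 \times 2 = 1020$ candidates is in line with the roughly 1000 models used for the handcrafted baselines in Section 5.3.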

5.3. Baseline Selection

To ensure a comprehensive comparison, we chose baselines that are both representative of the state of the art across the field and practically implementable. Our selection includes two categories of baselines: hand-crafted GNN architectures and Graph Neural Network Architecture Search (GNN NAS) methods. The hand-crafted architectures comprise GCN (Graph Convolutional Network) and GAT (Graph Attention Network), which are representative structures in GNN research. GCN employs spectral convolutional filters to gather information from all adjacent nodes in a graph [27]. GAT uses an attention mechanism to weight node-level features, thereby computing the relationships between adjacent nodes [28]. In addition, we include three GNN NAS approaches among our baselines. GraphNAS is a Reinforcement Learning-based method for optimizing GNN architectures, in which a recurrent network controller is trained to maximize the validation accuracy of the generated architectures [8]. SANE introduces differentiable NAS methods to improve search efficiency [29]. Genetic-GNN applies an evolutionary search to evolve model structures and hyperparameters over generations [10]. Notably, Genetic-GNN is one of the most recent studies and has recorded superior performance in the GNN NAS field. All algorithms are run with the parameter settings reported in their respective papers.
To ensure fair evaluations, the results of the handcrafted architectures are obtained by training 1000 models, mirroring the number of candidates evaluated in EGNAS. The five models with the highest validation accuracy are selected, and their average test accuracy is reported for the handcrafted approach. The GNN NAS baselines are run following their published evaluation protocols. GraphNAS selects the top five models by validation accuracy, retrains each of them 20 times from scratch, and averages their best outcomes. Genetic-GNN likewise uses its top five models but does not retrain them. SANE conducts five experiments, averaging the results of retraining the best validation model five times.
Contrasting with GraphNAS and SANE, EGNAS directly employs the top five validation models without retraining. The essence of NAS research lies in its capacity to automate processes, thereby reducing manual intervention and yielding immediately usable model parameters. EGNAS demonstrates this principle by incorporating hyperparameter search and presenting outcomes without additional retraining.

5.4. Results

Table 4 compares the test accuracies of various algorithms for semi-supervised node classification. It includes results from conventional handcrafted models (GCN, GAT), state-of-the-art GNN NAS methods, and our proposed EGNAS. EGNAS-10 denotes EGNAS with the early stopping mechanism described in Section 5.2. The best results are shown in bold. The experimental findings demonstrate the notable efficacy of both EGNAS and EGNAS-10, which consistently surpass the baseline accuracies on the Cora, Citeseer, and PubMed datasets. EGNAS-10 achieves state-of-the-art performance on Cora and PubMed, while EGNAS is the top-performing algorithm on Citeseer.
Table 5 presents the search time, in minutes, required by each algorithm for graph neural architecture search. For each dataset, all methods were timed on the same hardware: a GeForce RTX 2080 GPU with an Intel Xeon Gold 6138 CPU for Cora and Citeseer, and an NVIDIA TITAN V with an AMD Threadripper 3990 WX for PubMed.
EGNAS has shorter search times than both GraphNAS and Genetic-GNN on all three datasets. Relative to GraphNAS, EGNAS requires only about 40% of the search time on average. Notably, compared with Genetic-GNN, which achieves the highest accuracy after EGNAS on most datasets, EGNAS searches about 40 times faster on average. EGNAS also finds more accurate models than SANE with only a marginal difference in search time: SANE is on average 23.7 min faster than EGNAS, but it lags significantly behind EGNAS in accuracy, by 4.58%, 6.26%, and 3.11% on Cora, Citeseer, and PubMed, respectively. Furthermore, the EGNAS-10 variant maintains accuracy comparable to EGNAS while further reducing search time: compared with SANE, the fastest baseline, EGNAS-10 is on average 23.4 min faster.
Table 6 presents the computational costs associated with the results in Table 4, including the number of trainable parameters (Params), training time per epoch (Time), and inference latency per epoch (Latency). On Cora, EGNAS not only achieves the highest accuracy among the GNN NAS algorithms but also has the fewest trainable parameters, and its training and inference times are comparable to or shorter than those of its counterparts. On Citeseer, the EGNAS-10 variant attains higher accuracy while incurring the lowest computational cost among the GNN NAS alternatives. On PubMed, EGNAS incurs larger computational costs than the other GNN NAS methods. This may be related to the relatively weak performance of those methods on PubMed compared with the manually constructed models (GCN, GAT).
In conclusion, EGNAS stands out among GNN NAS methods. It achieves the highest accuracy with short search times while balancing model complexity and computational efficiency. This indicates that the EGNAS methodology is both efficient and effective.

5.5. Ablation Study

To assess the effectiveness of the EGNAS framework, we conducted ablation studies on the combined evolutionary search, inherited parameter sharing, and half epochs. Table 7 and Table 8 report three variants:
- optimizing the structure and hyperparameter spaces separately (‘separate’);
- disabling parameter sharing (‘no PS’);
- allowing sharing from all trained parameters (‘fully PS’).

The ‘separate’ Citeseer trial was run under settings similar to those of the main experiments. However, the larger number of model candidates that had to be stored caused an out-of-memory error, so that entry is marked OOM in Tables 7 and 8. For the hyperparameter search, the population size and the number of generations were set to 4 and 5, respectively, the same values used in Genetic-GNN [10]. Unlike EGNAS, ‘separate’ runs hyperparameter generations inside each structure generation. This produces many more candidates and makes the search approximately 14 times longer than that of EGNAS. Despite the longer search, accuracy also falls by an average of 0.93 percentage points.
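The ‘separate’ figures can be checked from Tables 7 and 8 in the same way. This sketch averages only over the datasets where the run finished. It assumes that the OOM Citeseer entry is excluded from both averages, which is what the reported numbers imply.

```python
# Test accuracy (%) from Table 7 and search time (min) from Table 8.
# None marks the Citeseer 'separate' run that ran out of memory (OOM).
EGNAS_ACC = {"Cora": 83.70, "Citeseer": 74.10, "PubMed": 79.40}
EGNAS_TIME = {"Cora": 33.3, "Citeseer": 87.0, "PubMed": 91.9}
SEPARATE_ACC = {"Cora": 83.08, "Citeseer": None, "PubMed": 78.16}
SEPARATE_TIME = {"Cora": 476.0, "Citeseer": None, "PubMed": 1299.9}


def completed(results):
    """Datasets on which the ablation run finished (OOM entries skipped)."""
    return [ds for ds, value in results.items() if value is not None]


runs = completed(SEPARATE_TIME)  # ["Cora", "PubMed"]
slowdown = sum(SEPARATE_TIME[d] / EGNAS_TIME[d] for d in runs) / len(runs)
acc_drop = sum(EGNAS_ACC[d] - SEPARATE_ACC[d] for d in runs) / len(runs)

print(f"'separate' vs EGNAS: {slowdown:.1f}x search time, "
      f"{acc_drop:.2f} points lower accuracy on {runs}")
```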
Without parameter sharing (‘no PS’), the search takes longer on Cora and PubMed, though not on Citeseer. Accuracy drops by an average of 1.27 percentage points. Conversely, expanding the pool of candidates for parameter sharing (‘fully PS’) shortens the search relative to EGNAS on all three datasets. However, its accuracy is below that of EGNAS on every dataset, and on Cora it is even below the no-sharing setting. A likely explanation is that repeatedly selecting the same parameters reduces the training needed, which speeds up the search but often leads to convergence to local optima.
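The parameter-sharing comparisons above can be reproduced from Tables 7 and 8. The helper names in this sketch are illustrative. Accuracy losses are averaged over all three datasets, which reproduces the 1.27-point figure.

```python
DATASETS = ["Cora", "Citeseer", "PubMed"]
# Test accuracy (%) from Table 7 and search time (min) from Table 8.
ACC = {
    "EGNAS":    [83.70, 74.10, 79.40],
    "No PS":    [82.70, 72.68, 78.00],
    "Fully PS": [81.92, 72.90, 78.50],
}
TIME = {
    "EGNAS":    [33.3, 87.0, 91.9],
    "No PS":    [50.2, 67.5, 149.0],
    "Fully PS": [28.0, 33.7, 90.8],
}


def mean_drop(variant, reference="EGNAS"):
    """Average accuracy loss (percentage points) of `variant` vs `reference`."""
    gaps = [r - v for r, v in zip(ACC[reference], ACC[variant])]
    return sum(gaps) / len(gaps)


def slower_than_egnas(variant):
    """Datasets on which `variant` needs more search time than EGNAS."""
    return [d for d, v, e in zip(DATASETS, TIME[variant], TIME["EGNAS"]) if v > e]


# Datasets where fully sharing parameters is even worse than not sharing at all.
below_no_ps = [d for d, f, n in zip(DATASETS, ACC["Fully PS"], ACC["No PS"]) if f < n]

for variant in ("No PS", "Fully PS"):
    print(f"{variant}: -{mean_drop(variant):.2f} points, "
          f"slower than EGNAS on {slower_than_egnas(variant)}")
print(f"Fully PS below No PS on: {below_no_ps}")
```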
Figure 5 shows the test accuracy of the best validation model across generations on Cora, with and without half epochs. Each experiment was repeated 10 times. Solid lines show the mean results, and shaded areas indicate the variance. With half epochs, the accuracy of EGNAS rises steadily as generations progress. Without half epochs, accuracy rises more gradually and stays below that of EGNAS. We attribute this to overfitting. Offspring start from previously trained parameters through parameter sharing, yet they are still trained for the same number of epochs again, which leads to excessive training. Omitting half epochs also lengthens the search time on Cora, from 33.3 min for EGNAS to 41.2 min.
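This section does not spell out the exact half-epochs schedule. The sketch below assumes one plausible reading: a candidate that inherits trained parameters is trained for half of the full epoch budget. `FULL_EPOCHS = 200` and `training_epochs` are hypothetical and are not taken from the EGNAS code. The script also computes the relative search-time overhead reported for Cora.

```python
FULL_EPOCHS = 200  # hypothetical full training budget; not a value from the paper


def training_epochs(inherited: bool, use_half_epochs: bool,
                    full_epochs: int = FULL_EPOCHS) -> int:
    """Epochs to train one candidate under a hypothetical half-epochs rule.

    Candidates trained from scratch always get the full budget. A candidate
    that starts from parameters inherited from its parent(s) gets only half
    the budget when half epochs are enabled, so already-trained weights are
    not trained for the full schedule again.
    """
    if inherited and use_half_epochs:
        return max(1, full_epochs // 2)
    return full_epochs


# Measured Cora search times quoted in the text (minutes).
with_half, without_half = 33.3, 41.2
overhead = without_half / with_half - 1  # ≈ 0.24

print(training_epochs(inherited=True, use_half_epochs=True))   # 100
print(training_epochs(inherited=True, use_half_epochs=False))  # 200
print(f"search without half epochs takes {overhead:.0%} longer on Cora")
```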

6. Conclusions

We proposed EGNAS, a method for the efficient and effective discovery of optimal Graph Neural Networks. Using combined evolutionary search, we rapidly explored a large search space that spans both model structures and hyperparameters. Inherited parameter sharing and the half epochs strategy significantly improved the reusability of trained parameters and reduced the risk of overfitting. As a result, EGNAS achieved higher node classification accuracy than state-of-the-art hand-crafted models and other GNN NAS methods at lower computational cost. Furthermore, the considerable reduction in search time suggests that EGNAS could be readily applied in practical industrial settings. We anticipate that EGNAS will be a valuable tool for rapidly identifying task-specific models across various GNN domains, such as social networks, transportation, and bioinformatics.

Author Contributions

Conceptualization, Y.J.; methodology, Y.J.; software, Y.J.; validation, M.-J.K. and C.W.A.; formal analysis, M.-J.K.; investigation, Y.J.; resources, C.W.A.; data curation, Y.J.; writing—original draft preparation, Y.J. and M.-J.K.; writing—review and editing, M.-J.K. and C.W.A.; visualization, Y.J.; supervision, M.-J.K.; project administration, C.W.A.; funding acquisition, M.-J.K. and C.W.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT, Ministry of Science and ICT) (RS-2024-00347902) and the Ministry of Education (RS-2023-00247900), and the Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00256629) grant funded by the Korea government (MSIT) and ITRC (Information Technology Research Center) support program (IITP-2024-RS-2024-00437718).

Data Availability Statement

The data presented in this study are openly available in [30,31]. The source code is available at the following link: https://github.com/JwaYounkyung/EGNAS (accessed on 13 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 4–24.
  2. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
  3. Huang, R.; Huang, C.; Liu, Y.; Dai, G.; Kong, W. LSGCN: Long short-term traffic prediction with graph convolutional networks. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 7–15 January 2021; pp. 2355–2361.
  4. Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2017; Volume 30.
  5. Zoph, B.; Le, Q. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
  6. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114.
  7. Xu, H.; Wang, S.; Cai, X.; Zhang, W.; Liang, X.; Li, Z. CurveLane-NAS: Unifying lane-sensitive architecture search and adaptive point blending. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV 16; pp. 689–704.
  8. Gao, Y.; Yang, H.; Zhang, P.; Zhou, C.; Hu, Y. Graph neural architecture search. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 7–15 January 2021; pp. 1403–1409.
  9. Li, Y.; Wen, Z.; Wang, Y.; Xu, C. One-shot graph neural architecture search with dynamic search space. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 8510–8517.
  10. Shi, M.; Tang, Y.; Zhu, X.; Huang, Y.; Wilson, D.; Zhuang, Y.; Liu, J. Genetic-GNN: Evolutionary architecture search for graph neural networks. Knowl.-Based Syst. 2022, 247, 108752.
  11. Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameter sharing. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 4095–4104.
  12. Zhang, H.; Jin, Y.; Cheng, R.; Hao, K. Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans. Evol. Comput. 2020, 25, 371–385.
  13. Zhou, K.; Huang, X.; Song, Q.; Chen, R.; Hu, X. Auto-GNN: Neural architecture search of graph neural networks. Front. Big Data 2022, 5, 1029307.
  14. Xie, L.; Chen, X.; Bi, K.; Wei, L.; Xu, Y.; Wang, L.; Chen, Z.; Xiao, A.; Chang, J.; Zhang, X.; et al. Weight-sharing neural architecture search: A battle to shrink the optimization gap. ACM Comput. Surv. (CSUR) 2021, 54, 183.
  15. Xia, X.; Xiao, X.; Wang, X.; Zheng, M. Progressive automatic design of search space for one-shot neural architecture search. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2455–2464.
  16. Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1997–2017.
  17. Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4780–4789.
  18. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710.
  19. Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
  20. Schaffer, J.D. Some Experiments in Machine Learning Using Vector Evaluated Genetic Algorithms; Technical Report; Vanderbilt University: Nashville, TN, USA, 1985.
  21. Young, S.R.; Rose, D.C.; Karnowski, T.P.; Lim, S.H.; Patton, R.M. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1–5.
  22. Yang, Z.; Wang, Y.; Chen, X.; Shi, B.; Xu, C.; Xu, C.; Tian, Q.; Xu, C. CARS: Continuous evolution for efficient neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1829–1838.
  23. Moriya, T.; Tanaka, T.; Shinozaki, T.; Watanabe, S.; Duh, K. Evolution-strategy-based automation of system development for high-performance speech recognition. IEEE/ACM Trans. Audio, Speech, Lang. Process. 2018, 27, 77–88.
  24. Wu, M.; Su, W.; Chen, L.; Liu, Z.; Cao, W.; Hirota, K. Weight-adapted convolution neural network for facial expression recognition in human-robot interaction. IEEE Trans. Syst. Man, Cybern. Syst. 2019, 51, 1473–1484.
  25. Lu, Z.; Whalen, I.; Boddeti, V.; Dhebar, Y.; Deb, K.; Goodman, E.; Banzhaf, W. NSGA-Net: Neural architecture search using multi-objective genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 419–427.
  26. Chen, Y.; Yang, T.; Zhang, X.; Meng, G.; Xiao, X.; Sun, J. DetNAS: Backbone search for object detection. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2019; Volume 32.
  27. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
  28. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
  29. Zhao, H.; Yao, Q.; Tu, W. Search to aggregate neighborhood for graph neural network. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 552–563.
  30. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93.
  31. Yang, Z.; Cohen, W.; Salakhudinov, R. Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 40–48.
  32. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
Figure 1. Structure and hyperparameter point replacement mutation.
Figure 2. Parameter sharing from a single parent.
Figure 3. Parameter sharing from two parents.
Figure 4. Parameter sharing influenced by mutation.
Figure 5. Test accuracy of best validation model over generations on Cora with and without the half epochs.
Table 1. The structure search space for GNN layers.
Structure Component | Search Space
Attention Function | const, gcn, gat, sym-gat, cos, linear, gene-linear
Aggregation Function | sum, mean-pooling, max-pooling, mlp
Activation Function | sigmoid, tanh, relu, linear, softplus, leaky_relu, relu6, elu
Attention Head | 1, 2, 4, 6, 8, 16
Hidden Unit | 4, 8, 16, 32, 64, 128, 256
Table 2. The hyperparameter search space for GNN layers.
Hyperparameter Component | Search Space
Dropout Rate | 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6
Learning Rate | 5 × 10⁻⁴, 1 × 10⁻³, 5 × 10⁻³, 1 × 10⁻²
Weight Decay | 5 × 10⁻⁴, 8 × 10⁻⁴, 1 × 10⁻³, 4 × 10⁻³
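To make Tables 1 and 2 concrete, the sketch below encodes both search spaces as Python dictionaries. It then applies a point replacement mutation of the kind named in Figure 1, in which one randomly chosen component is swapped for a different value. The flat genome for a single layer, with one gene per component, is an assumption for illustration and not the encoding used in the EGNAS code. `random_individual` and `point_replacement_mutation` are hypothetical helpers.

```python
import random

# Component choices from Table 1 (structure) and Table 2 (hyperparameters).
STRUCTURE_SPACE = {
    "attention": ["const", "gcn", "gat", "sym-gat", "cos", "linear", "gene-linear"],
    "aggregation": ["sum", "mean-pooling", "max-pooling", "mlp"],
    "activation": ["sigmoid", "tanh", "relu", "linear", "softplus",
                   "leaky_relu", "relu6", "elu"],
    "heads": [1, 2, 4, 6, 8, 16],
    "hidden_units": [4, 8, 16, 32, 64, 128, 256],
}
HYPER_SPACE = {
    "dropout": [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "learning_rate": [5e-4, 1e-3, 5e-3, 1e-2],
    "weight_decay": [5e-4, 8e-4, 1e-3, 4e-3],
}
# Hypothetical flat encoding: structure and hyperparameter genes in one genome,
# so a single evolutionary search explores both spaces together.
SPACE = {**STRUCTURE_SPACE, **HYPER_SPACE}


def count_choices(space):
    """Number of combinations when one option is picked per component."""
    n = 1
    for options in space.values():
        n *= len(options)
    return n


def random_individual(rng):
    return {gene: rng.choice(options) for gene, options in SPACE.items()}


def point_replacement_mutation(individual, rng):
    """Replace exactly one randomly chosen gene with a different value."""
    child = dict(individual)
    gene = rng.choice(list(SPACE))
    alternatives = [v for v in SPACE[gene] if v != child[gene]]
    child[gene] = rng.choice(alternatives)
    return child


rng = random.Random(0)
parent = random_individual(rng)
child = point_replacement_mutation(parent, rng)
print(count_choices(STRUCTURE_SPACE), count_choices(HYPER_SPACE))
print({g: (parent[g], child[g]) for g in SPACE if parent[g] != child[g]})
```

Under this single-layer encoding there are 9,408 structure combinations and 112 hyperparameter combinations, or 1,053,696 joint combinations. A model with more layers would enlarge the structure part further.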
Table 3. Descriptions of graph datasets.
Dataset | # Nodes | # Edges | # Features | # Classes
Cora | 2708 | 5429 | 1433 | 7
Citeseer | 3327 | 4732 | 3703 | 6
PubMed | 19,717 | 44,338 | 500 | 3
Table 4. Test accuracy (%) comparison in the node classification task.
Group | Algorithm | Cora | Citeseer | PubMed
Handcrafted | GCN | 81.18 ± 0.49 | 72.04 ± 0.61 | 78.82 ± 0.47
Handcrafted | GAT | 82.98 ± 0.44 | 71.40 ± 0.27 | 77.98 ± 0.24
GNN NAS | GraphNAS | 82.26 ± 0.96 | 69.46 ± 0.99 | 76.94 ± 0.39
GNN NAS | SANE | 79.12 ± 1.04 | 67.84 ± 0.99 | 76.29 ± 1.16
GNN NAS | Genetic-GNN | 83.00 ± 0.44 | 73.12 ± 0.34 | 78.20 ± 0.00
Ours | EGNAS | 83.70 ± 0.00 | 74.10 ± 0.00* | 79.40 ± 0.00
Ours | EGNAS-10 | 83.80 ± 0.00* | 73.90 ± 0.00 | 79.50 ± 0.00*
* indicates the best value in each column.
Table 5. Graph Neural Architecture Search time (min) in the node classification tasks.
Algorithm | Cora | Citeseer | PubMed
GraphNAS | 119.5 | 153.3 | 256.4
SANE | 39.1 | 61.7 | 40.2*
Genetic-GNN | 862.9 | 3937.8 | 4378.1
EGNAS | 33.3 | 87.0 | 91.9
EGNAS-10 | 9.1* | 17.2* | 44.4
* indicates the shortest search time in each column.
Table 6. Computational costs comparison in the node classification task.
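Algorithm | Cora Params (M) | Cora Time (ms) | Cora Latency (ms) | Citeseer Params (M) | Citeseer Time (ms) | Citeseer Latency (ms) | PubMed Params (M) | PubMed Time (ms) | PubMed Latency (ms)
GCN | 0.02 | 4.64 | 2.57 | 0.06 | 4.55 | 2.52 | 0.01 | 4.39 | 1.99
GAT | 0.09 | 8.19 | 3.15 | 0.24 | 8.70 | 3.22 | 0.03 | 9.79 | 3.15
GraphNAS | 1.36 | 9.95 | 3.82 | 7.53 | 22.01 | 9.28 | 0.10 | 7.32 | 2.99
SANE | 0.73 | 8.94 | 3.07 | 1.31 | 10.66 | 4.53 | 0.04 | 8.99 | 3.14
Genetic-GNN | 0.14 | 4.37 | 1.38 | 10.76 | 42.19 | 16.58 | 0.05 | 4.90 | 1.51
EGNAS | 0.10 | 5.32 | 1.45 | 5.71 | 21.74 | 8.49 | 0.20 | 11.45 | 3.83
EGNAS-10 | 0.38 | 5.27 | 1.52 | 0.48 | 7.78 | 1.93 | 0.52 | 21.12 | 8.13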
Table 7. Comparison of test accuracy (%) in the node classification task for the original algorithm and the ablation variants.
Algorithm | Cora | Citeseer | PubMed
EGNAS | 83.70 ± 0.00* | 74.10 ± 0.00* | 79.40 ± 0.00*
Separate | 83.08 ± 0.18 | OOM | 78.16 ± 0.14
No PS | 82.70 ± 0.00 | 72.68 ± 0.65 | 78.00 ± 0.00
Fully PS | 81.92 ± 0.35 | 72.90 ± 0.00 | 78.50 ± 0.00
* indicates the best value in each column. OOM: out of memory.
Table 8. Comparison of Graph Neural Architecture Search time (min) in the node classification task for the original algorithm and the ablation variants.
Algorithm | Cora | Citeseer | PubMed
EGNAS | 33.3 | 87.0 | 91.9
Separate | 476.0 | OOM | 1299.9
No PS | 50.2 | 67.5 | 149.0
Fully PS | 28.0 | 33.7 | 90.8
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jwa, Y.; Ahn, C.W.; Kim, M.-J. EGNAS: Efficient Graph Neural Architecture Search Through Evolutionary Algorithm. Mathematics 2024, 12, 3828. https://doi.org/10.3390/math12233828

AMA Style

Jwa Y, Ahn CW, Kim M-J. EGNAS: Efficient Graph Neural Architecture Search Through Evolutionary Algorithm. Mathematics. 2024; 12(23):3828. https://doi.org/10.3390/math12233828

Chicago/Turabian Style

Jwa, Younkyung, Chang Wook Ahn, and Man-Je Kim. 2024. "EGNAS: Efficient Graph Neural Architecture Search Through Evolutionary Algorithm" Mathematics 12, no. 23: 3828. https://doi.org/10.3390/math12233828

APA Style

Jwa, Y., Ahn, C. W., & Kim, M.-J. (2024). EGNAS: Efficient Graph Neural Architecture Search Through Evolutionary Algorithm. Mathematics, 12(23), 3828. https://doi.org/10.3390/math12233828
