1. Introduction
Neural Architecture Search (NAS) has gained popularity in recent years as a powerful tool for the automated design of deep learning architectures. In this paper, we compare the topologies generated by evolutionary NAS for two different tasks: grammaticality judgment for n-word ordering [1] and classification on the MNIST dataset. Despite the differences between the tasks, we find that NAS, using the same tuning method, consistently reduces the loss and produces topologies that human experts would not easily design. Specifically, we report the emergence of a distinctive fork-like structure that multiplies the links between layers. To the best of our knowledge, this fork-like structure has not previously been reported in the literature, and it may have implications for various deep learning tasks [1,2].
Previous research on applying NAS to linguistic data has focused on improving accuracy relative to existing language models [3,4]. In this paper, we acknowledge that the fork-like structure does not necessarily outperform the conventional techniques used for each task. We therefore limit the scope of our analysis to the novelty of the fork-like structure and its potential application in various deep learning tasks.
NAS is a powerful technique for automating the design of Deep Neural Networks (DNNs) for a given task [5,6]. It encompasses a broad set of techniques that automatically design the architecture of a DNN, including selecting the appropriate layers and their connections and tuning hyperparameters [6]. However, NAS faces several challenges in designing effective network architectures. First, an architecture suited to the specific task at hand must be selected. Second, it must be determined how to integrate different components into the network topology, such as activation functions, attention mechanisms, and skip connections. Finally, the hyperparameters that fine-tune the network's performance must be found [7]. Each of these factors must be optimized and studied separately for each new task, which can be time consuming and computationally expensive. Nonetheless, NAS has shown considerable potential for automating the design process and has achieved state-of-the-art performance in various domains.
This paper presents a modified version of NAS based on the Genetic Algorithm (GA) that uses chromosome non-disjunction, a deep neural network structuring methodology recently proposed in [8,9]. Our approach leverages Evolutionary Algorithms (EAs) to design the full network topology automatically. The key advantage of our system is the chromosome non-disjunction operation, which preserves the information from both parents without losing crucial data. This allows a previously obtained neural network architecture to be fine-tuned, which is not possible with traditional crossover operations, as these typically discard some parental information while producing new combinations.
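To make the contrast with crossover concrete, the following Python sketch compares a one-point crossover with a simplified chromosome non-disjunction. It is an illustration under our own assumptions, not the implementation described in [8,9]: the list-of-labels genome, the function names `one_point_crossover` and `non_disjunction`, and the rule that one daughter receives the other parent's homolog in addition to all of its own chromosomes are hypothetical simplifications.

```python
from typing import List, Tuple

# Hypothetical encoding for illustration only: a genome is a list of chromosomes,
# and each chromosome is a label standing for one layer or link gene.
Genome = List[str]


def one_point_crossover(a: Genome, b: Genome, cut: int) -> Genome:
    """Classic crossover: the child keeps a[:cut] and b[cut:], so a[cut:] and b[:cut] are lost."""
    return a[:cut] + b[cut:]


def non_disjunction(a: Genome, b: Genome, index: int) -> Tuple[Genome, Genome]:
    """Illustrative non-disjunction: the homologous pair at `index` fails to separate.

    One daughter keeps every chromosome of `a` and also receives b's homolog at `index`
    (its chromosome count grows by one); the other daughter lacks that chromosome entirely.
    """
    gained = a + [b[index]]
    lost = a[:index] + a[index + 1:]
    return gained, lost


if __name__ == "__main__":
    parent_a = ["A1", "A2", "A3"]
    parent_b = ["B1", "B2", "B3"]
    print(one_point_crossover(parent_a, parent_b, 1))  # ['A1', 'B2', 'B3']
    print(non_disjunction(parent_a, parent_b, 1))      # (['A1', 'A2', 'A3', 'B2'], ['A1', 'A3'])
```

In this toy setting the gaining daughter still contains the whole of `parent_a`, so it represents the previous architecture plus one extra element, which is the kind of incremental change that crossover does not guarantee.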
This paper compares the topologies generated by NAS for two deep learning tasks: the grammaticality task for n-word ordering [1] and the MNIST dataset. We found that a particular fork-like structure that emerged in both tasks provides an efficient solution to them. Our comparison also reveals that the model NAS generated for the grammaticality task is structurally similar to the one it generated for the MNIST dataset. The main contribution of this paper is an analysis of the fork-like structure found in these experiments. Although our analysis is still preliminary, we observed that the fork-like structure reduces the loss during training, which suggests that it has potential as a useful tool in deep learning tasks.
This paper consists of five sections.
Section 2 introduces NAS with the genetic algorithm.
Section 3 presents two experiments: the grammaticality task and the MNIST dataset.
Section 4 contains a discussion of our findings, including the fork-like structure that emerged in our experiments.
Section 5 concludes the paper by summarizing our contributions and suggesting future research directions.
2. Neural Architecture Search
2.1. Automated Machine Learning (AutoML)
AutoML is a modern approach in deep learning that automates the design of neural network architectures [10]. Neural Architecture Search (NAS) is a subfield of AutoML that focuses specifically on the automated design of network architectures. Architecture design is a critical stage in deep learning because it has a significant impact on performance. However, manual architecture design is both time consuming and prone to errors [11]. As deep learning tasks have become more challenging and complex, manual design has become increasingly difficult [7].
NAS techniques make it possible to design the topology of deep neural networks automatically. Deep neural networks have become increasingly complex, with numerous layers and nodes connected by links. Most neural network architectures are still designed by human experts, whereas NAS can generate them automatically.
2.2. Neural Architecture Search (NAS)
NAS consists of three critical stages: the search space, the search strategy, and the performance estimation strategy [11,12]. The search space defines the set of architectures that the search process can explore. Because this space can be massive, incorporating prior knowledge is essential to simplify it; however, this makes the stage highly dependent on human expertise. The search strategy defines the method used to explore the search space and is the most important process in NAS. Finally, the performance estimation strategy evaluates candidate networks to identify the architecture with the best performance, which is the main objective of NAS.
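The three stages can be seen side by side in a deliberately small sketch. The search space values, the random-search strategy, and the proxy scoring function below are all hypothetical stand-ins chosen for illustration; in particular, `estimate_performance` does not train anything, whereas a real performance estimation strategy would train and validate each candidate network.

```python
import random
from typing import Dict, List, Optional, Tuple

Arch = Dict[str, int]

# 1) Search space: a small, hand-defined set of choices (hypothetical values).
SEARCH_SPACE: Dict[str, List[int]] = {
    "num_layers": [1, 2, 3],
    "units": [16, 32, 64],
    "skip_links": [0, 1, 2],
}


def sample_architecture(space: Dict[str, List[int]], rng: random.Random) -> Arch:
    """2) Search strategy: plain random search, standing in for BO, RL, or an EA."""
    return {name: rng.choice(options) for name, options in space.items()}


def estimate_performance(arch: Arch) -> float:
    """3) Performance estimation: a made-up proxy score (maximum 0.2), not real training."""
    return (-abs(arch["num_layers"] - 2)
            - abs(arch["units"] - 32) / 32
            + 0.1 * arch["skip_links"])


def search(space: Dict[str, List[int]], budget: int, seed: int = 0) -> Tuple[Optional[Arch], float]:
    """Sample `budget` candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    best: Optional[Arch] = None
    best_score = float("-inf")
    for _ in range(budget):
        arch = sample_architecture(space, rng)
        score = estimate_performance(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score


if __name__ == "__main__":
    print(search(SEARCH_SPACE, budget=50))
```

Swapping `sample_architecture` for an evolutionary operator, as in the GA approach discussed below, changes only the search strategy; the search space and the estimation step keep the same roles.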
Various search strategies can be used to explore the search space of artificial neural networks, including Bayesian Optimization (BO), Reinforcement Learning (RL), and Evolutionary Algorithms (EAs). Although BO has been successful for Hyperparameter Optimization (HPO), it is not well suited to generating neural network topologies [13,14]. RL can also be used to design network topologies, with an agent rewarded for actions that lead to efficient architectures [15]. EAs, on the other hand, are commonly used to evolve neural networks, generating the topology of the architecture through an evolutionary process.
2.3. Evolutionary Architecture Search
Neuroevolution, also known as Evolutionary Architecture Search, is a technique that combines deep neural networks (DNNs) with an Evolutionary Strategy (ES) [16]. The goal of the ES is to optimize the topology of the neural network architecture.
In recent neuroevolutionary approaches, the Genetic Algorithm (GA) has been widely used to design DNN architectures [7,8,17,18]. The EA evolves a population of DNN architectures; in each generation, the automatically designed architectures are trained and tested. The offspring inherit information about layers and links from their parents and are then subject to mutation.
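The generational loop just described (evaluate, select, inherit, mutate) can be sketched in a few lines. This is a generic GA skeleton under our own assumptions, not the authors' PyTorch system: the link-list genome, the fixed `NUM_LAYERS = 4`, truncation selection, and especially the `fitness` function, which replaces "train and test" with a made-up score, are hypothetical.

```python
import random
from typing import List, Tuple

Link = Tuple[int, int]   # (source layer, target layer), with source < target
Genome = List[Link]

NUM_LAYERS = 4  # hypothetical fixed layer count


def fitness(g: Genome) -> float:
    """Hypothetical stand-in for training and testing: reward distinct links, penalise size."""
    return len(set(g)) - 0.2 * len(g)


def mutate(g: Genome, rng: random.Random) -> Genome:
    """Copy the inherited links, then either drop one or add a new forward link."""
    child = list(g)
    if child and rng.random() < 0.5:
        child.pop(rng.randrange(len(child)))            # destructive mutation
    else:
        src = rng.randrange(NUM_LAYERS - 1)
        dst = rng.randrange(src + 1, NUM_LAYERS)
        child.append((src, dst))                          # constructive mutation
    return child


def evolve(generations: int, pop_size: int, seed: int = 0):
    rng = random.Random(seed)
    # Every individual starts as a plain chain 0 -> 1 -> 2 -> 3.
    pop = [[(i, i + 1) for i in range(NUM_LAYERS - 1)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)              # evaluate every architecture
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]                   # truncation selection
        offspring = [mutate(rng.choice(parents), rng)    # inherit links, then mutate
                     for _ in range(pop_size - len(parents))]
        pop = parents + offspring                        # elitism: parents survive unchanged
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = evolve(generations=20, pop_size=10, seed=1)
    print(best, history[-1])
```

Because the parents are carried over unchanged, the best recorded fitness never decreases from one generation to the next in this sketch.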
One of the best-known GA-based neuroevolution techniques is Neuroevolution of Augmenting Topologies (NEAT) [19]. NEAT successfully solved the XOR problem by evolving a simple neural network architecture with a minimal number of layers and nodes. CoDeepNEAT, which builds on the NEAT methodology, can even design Convolutional Neural Network (CNN) architectures automatically through evolution [7]. Unlike other approaches, CoDeepNEAT treats each node as a layer of a DNN, which simplifies the network representation. However, NEAT is a constructive algorithm that must start from a minimal architecture, so it cannot begin from an intermediate architecture.
To address this limitation, researchers have attempted to introduce destructive operations, such as deleting nodes. However, this approach alone is insufficient to enable starting from a middle point in the architecture search process.
A potential solution to the problem of starting the architecture search from an intermediate state is a variable chromosome GA with a genetic operation called chromosome non-disjunction [8]. This operation changes the number of chromosomes in the population, which alters the amount of genetic information and consequently the architecture of the artificial neural network. This feature allows the search process to start from an intermediate state of architecture evolution. However, previous implementations of this technique based on nodes and links have had limitations in designing complex artificial neural network architectures.
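The link between chromosome count and architecture can be illustrated by decoding a chromosome list into link multiplicities. In this hypothetical sketch, one chromosome encodes one link, `apply_non_disjunction` either duplicates or drops a chromosome, and the search is seeded with an existing architecture rather than a minimal one; the encoding, the names, and the seed layers are our assumptions, not the encoding used in [8].

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: each chromosome describes one link (from_layer, to_layer).
Chromosome = Tuple[str, str]


def decode(chromosomes: List[Chromosome]) -> Dict[Chromosome, int]:
    """Translate a variable-length chromosome list into link multiplicities."""
    return dict(Counter(chromosomes))


def apply_non_disjunction(chromosomes: List[Chromosome], index: int, gain: bool) -> List[Chromosome]:
    """Gain: the chromosome at `index` is duplicated; loss: it is dropped.
    Either way the chromosome count, and hence the decoded architecture, changes."""
    if gain:
        return chromosomes + [chromosomes[index]]
    return chromosomes[:index] + chromosomes[index + 1:]


# Seed the search with an existing (intermediate) architecture instead of a minimal one.
seed_architecture: List[Chromosome] = [
    ("input", "conv1"), ("conv1", "conv2"), ("conv2", "flatten"), ("flatten", "output"),
]
grown = apply_non_disjunction(seed_architecture, 1, gain=True)
shrunk = apply_non_disjunction(seed_architecture, 2, gain=False)
print(decode(grown)[("conv1", "conv2")])       # 2
print(("conv2", "flatten") in decode(shrunk))  # False
```

Since both directions are available, the search in this sketch can refine an existing design by adding or removing links, rather than having to grow it from scratch as a purely constructive method would.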
3. Experiments
3.1. Experiment Environment
We used the PyTorch framework to develop our NAS model based on a genetic algorithm, building on our previous research. For more detailed information, please refer to our previous publications [1,2].
For most of our experiments, we used a workstation equipped with one NVIDIA GeForce RTX 2060 GPU, a 10th-generation Intel Core i7 CPU @ 3.0 GHz, and 64 GB of RAM. However, each experiment had slightly different settings depending on the specific task and its requirements.
3.2. Experiment 1—Application of NAS to n-Words Ordering
In this study, we aimed to apply NAS to Korean grammaticality judgement tasks to explore the possibility of automating the design of language models. Because the word order of a language is the product of complex syntactic operations [20,21], a successful neural architecture search on linguistic data would suggest that NAS is capable of designing language models effectively. Although some previous studies have proposed applying NAS to language modeling, we went further by adding a novel dataset that contains Korean-specific linguistic operations, which add considerable complexity to the language patterns. The results demonstrated that NAS can provide an architecture for Korean. Notably, the architecture suggested by NAS was a previously unreported structure that would be unlikely to be designed manually, highlighting the potential of NAS to automate the design of language models [8].
The dataset used in this experiment consists of four-word sentences in Korean built from seven syntactic categories: noun phrases, verb phrases, prepositional phrases, adjective phrases, adverbs, complementizer phrases, and auxiliary phrases. Filling the four word slots with these seven categories yields 7^4 = 2401 possible combinations. To determine the grammaticality of each combination, we consulted the Sejong corpus and two linguists; the resulting distribution of grammaticality is described below. These data are unique because we added two syntactic operations, scrambling and ellipsis, that were not previously visible in the Sejong corpus. An example of a sentence containing scrambling and ellipsis is presented in Table 1; its underlying sentence may contain more than four words because some words can be elided and the word order can be changed. Overall, this approach offers a novel way of testing language processing, using unique data that add complexity to the patterns tested.
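The count of 2401 follows directly from assigning one of seven categories to each of four slots, as this short Python check shows. The abbreviated category labels are our own shorthand for the seven categories listed above.

```python
from itertools import product

# Abbreviated labels for the seven syntactic categories (shorthand chosen for illustration).
CATEGORIES = ["NP", "VP", "PP", "AP", "Adv", "CP", "Aux"]
SLOTS = 4

# Every assignment of a category to each of the four word slots.
patterns = list(product(CATEGORIES, repeat=SLOTS))

print(len(patterns))  # 2401, i.e. 7 ** 4
print(patterns[0])    # ('NP', 'NP', 'NP', 'NP')
```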
The grammaticality of the dataset is presented in Figure 1, in which the 113 grammatical sentences are represented by circles.
Figure 1 is a five-dimensional plot that uses color coding. The X-axis represents the first word slot, while the Y- and Z-axes represent the second and third word slots, respectively. The color spectrum indicates the fourth word slot. The O/X markers represent grammaticality, which is the output of the artificial neural network. Although our experiment is limited to four-word sentences, the syntactic operations of Korean add complexity to the sentence patterns. For instance, the sentence in Table 1 is grammatical despite having more than four underlying words. The experiment took approximately 10 h to complete.
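The encoding of Figure 1 can be written as a small mapping from a four-slot pattern to plot coordinates. The counts 113 and 2401 come from the text; the category order and the integer coding of each axis are assumptions made for illustration, since the ordering used in the figure is not specified.

```python
from typing import Sequence, Tuple

# Hypothetical axis order; the ordering actually used in Figure 1 is not stated.
CATEGORIES = ["NP", "VP", "PP", "AP", "Adv", "CP", "Aux"]
INDEX = {name: i for i, name in enumerate(CATEGORIES)}

PlotPoint = Tuple[int, int, int, int, str]


def to_plot_point(pattern: Sequence[str], grammatical: bool) -> PlotPoint:
    """Slots 1-3 become X, Y, Z; slot 4 becomes a colour index; the marker encodes the label."""
    w1, w2, w3, w4 = pattern
    marker = "O" if grammatical else "X"
    return INDEX[w1], INDEX[w2], INDEX[w3], INDEX[w4], marker


# Share of grammatical patterns reported for Figure 1: 113 out of 2401.
grammatical_share = 113 / 2401

print(to_plot_point(("NP", "VP", "PP", "Aux"), True))  # (0, 1, 2, 6, 'O')
print(round(grammatical_share * 100, 1))                # 4.7
```

The computed share shows that the data are strongly imbalanced: fewer than 5% of the possible patterns are grammatical.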
The resulting topology is displayed in Figure 2, which reveals a fork-like structure similar to the one found in our previous experiment. This result is somewhat unexpected, as the syntactic phenomena underlying the Korean sentences in this dataset are much more complex than those used in the previous experiment. The success of NAS in designing an architecture for the Korean language model is significant, as it implies that NAS can automate language model design even for languages with intricate syntactic operations. The discovery of a previously unreported structure in the Korean language model topology highlights the potential of NAS to discover new architectures for various natural language processing tasks.
3.3. MNIST Task
The MNIST dataset is a representative computer vision dataset of handwritten digit images [22,23]. It consists of 28 × 28 pixel grayscale images in 10 classes representing the digits 0 to 9.
In this experiment, we used evolutionary NAS to generate an artificial neural network structure for the MNIST dataset. We evolved the network for 30 generations, with the number of layers fixed at three. The number of connections oscillated between three and five before converging to five. The loss values gradually decreased during the evolution, resulting in improved performance. The experiment took approximately one day to complete.
Figure 3 shows the evolution process of this experiment.
The final network structure is shown in Figure 4; it consists of an input layer, two convolutional layers, one flattening layer, and an output layer, with each layer composed of five connected nodes.
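To see how tensor shapes flow through an input, convolution, convolution, flatten, output stack on 28 × 28 MNIST images, the following sketch applies the standard convolution output-size formula. The kernel size of 3, stride of 1, zero padding, and channel counts of 8 and 16 are hypothetical values chosen for illustration; the evolved hyperparameters are not reported here, and the five connected nodes per layer are not modeled.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def mnist_layer_shapes(kernel: int = 3, channels: Tuple[int, int] = (8, 16)) -> List[Tuple[str, Shape]]:
    """Shapes through input -> conv -> conv -> flatten -> output (hypothetical hyperparameters)."""
    size = 28
    shapes: List[Tuple[str, Shape]] = [("input", (1, size, size))]
    for i, c in enumerate(channels, start=1):
        size = conv_output_size(size, kernel)
        shapes.append((f"conv{i}", (c, size, size)))
    shapes.append(("flatten", (channels[-1] * size * size,)))
    shapes.append(("output", (10,)))  # one unit per digit class
    return shapes


for name, shape in mnist_layer_shapes():
    print(name, shape)
# input (1, 28, 28)
# conv1 (8, 26, 26)
# conv2 (16, 24, 24)
# flatten (9216,)
# output (10,)
```

The same formula is what PyTorch's `nn.Conv2d` uses for the default dilation of 1, so the flatten size computed this way is the input width a following linear output layer would need.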
4. Discussion
The fork-like structures found in the resulting topologies of the two experiments are interesting because they suggest a common underlying principle or pattern that may be present in the optimization process of neural network architectures. Despite the differences in the input data and tasks, the networks evolved to have a similar topology.
Figure 5 shows the results of these experiments.
In the MNIST experiment, the fork-like structure proved useful for classifying digits. In the Korean grammaticality judgement experiment, the network with the fork-like structure was found to perform better at identifying grammatically correct sentences.
In a previous paper, we also applied the same methodology to the CIFAR-10 dataset [9]. CIFAR-10 is a dataset of 60,000 32 × 32 pixel color images divided into 10 classes and is used for training and testing image classification algorithms [24]. The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
Figure 6 shows the result structure of the CIFAR-10 dataset.
The emergence of similar fork-like structures across these experiments suggests that a generalizable principle of neural network architecture optimization may be at play. The fork-like structure may provide an efficient way to capture and process information across different inputs and tasks. Further research could explore the benefits and limitations of the fork-like structure and whether it applies to other types of data and tasks.
The fork-like structure adds connections while preserving the original neural network structure. It can therefore be integrated into pre-existing neural networks, and we plan to apply it to both image processing and natural language processing networks.
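One possible way to picture "adding connections while preserving the original structure" is a pair of layers joined by several parallel links. The sketch below assumes, purely for illustration, that parallel links are merged by summation and that a newly added link starts with zero weights, so the existing mapping is unchanged until training updates it. Both assumptions are ours (zero initialization is a common function-preserving trick), and the names `forked_forward` and `add_fork` are hypothetical, not taken from this paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def forked_forward(links: List[Matrix], x: Vector) -> Vector:
    """Forward pass between two layers joined by several parallel links, merged by summation."""
    outputs = [matvec(m, x) for m in links]
    return [sum(values) for values in zip(*outputs)]


def add_fork(links: List[Matrix], rows: int, cols: int) -> List[Matrix]:
    """Attach one extra parallel link, zero-initialised so the existing mapping is unchanged."""
    return links + [[[0.0] * cols for _ in range(rows)]]


original = [[[1.0, 2.0], [3.0, 4.0]]]   # one existing link (2x2 weights)
forked = add_fork(original, rows=2, cols=2)
x = [1.0, 1.0]
print(forked_forward(original, x))  # [3.0, 7.0]
print(forked_forward(forked, x))    # [3.0, 7.0]
```

Under these assumptions the forked network initially computes exactly what the original did, which is one way a fork could be grafted onto a pre-existing model without disturbing it.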
One limitation of this experiment is the small dataset size, which may not be representative of the broader Korean language or its syntactic operations. Additionally, the experiment is limited to only four-word level sentences, which may not capture the complexity of longer sentences. Furthermore, while the resulting architecture is shown to be effective in classifying the grammaticality of the dataset, it is not clear whether it can be generalized to other linguistic tasks or datasets. Finally, the use of a neural architecture search algorithm may make it difficult to interpret the resulting architecture and understand the specific linguistic operations that it is encoding.
5. Conclusions
In conclusion, we conducted two experiments on different datasets using Neural Architecture Search (NAS) to evolve artificial neural networks (ANNs) for specific tasks. Despite the differences between the datasets and tasks, we observed a similar fork-like structure in the resulting ANNs. This suggests that the fork structure may be a common pattern in ANNs optimized through NAS across different tasks and datasets. This finding adds to our understanding of the underlying principles of NAS and ANNs. However, these experiments have limitations, and further research is needed to validate and extend our findings. Nevertheless, our study highlights the potential of NAS for automated ANN design in various domains.
Author Contributions
Conceptualization, Y.Y.; Methodology, K.-M.P.; Software, K.-M.P.; Validation, Y.Y. and K.-M.P.; Formal Analysis, Y.Y. and M.P.; Investigation, K.-M.P.; Resources, Y.Y.; Data Curation, Y.Y.; Writing-Original Draft Preparation, Y.Y.; Writing, Review and Editing, M.P. and K.-M.P.; Visualization, K.-M.P.; Supervision, Y.Y. and M.P.; Project Administration, Y.Y.; Funding Acquisition, Y.Y. and K.-M.P. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by National University Development Project at Jeonbuk National University in 2021 and was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1G1A10070561230382068210102).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data sharing not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Park, K.; Shin, D.; Yoo, Y. Evolutionary Neural Architecture Search (NAS) Using Chromosome Non-Disjunction for Korean Grammaticality Tasks. Appl. Sci. 2020, 10, 3457.
- Yoo, Y.; Park, K. Developing Language-Specific Models Using a Neural Architecture Search. Appl. Sci. 2021, 11, 10324.
- Jiang, Y.; Hu, C.; Xiao, T.; Zhang, C.; Zhu, J. Improved Differentiable Architecture Search for Language Modeling and Named Entity Recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3585–3590.
- Wong, C.; Houlsby, N.; Lu, Y.; Gesmundo, A. Transfer Learning with Neural AutoML. Adv. Neural Inf. Process. Syst. 2018, 31, 1–10.
- Wilamowski, B.M. Neural Network Architectures and Learning Algorithms. IEEE Ind. Electron. Mag. 2009, 3, 56–63.
- Adam, G.; Lorraine, J. Understanding Neural Architecture Search Techniques. arXiv 2019, arXiv:1904.00438.
- Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Shahrzad, H.; Navruzyan, A.; Duffy, N. Evolving Deep Neural Networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing; Elsevier: Amsterdam, The Netherlands, 2019; pp. 293–312.
- Park, K.; Shin, D.; Chi, S. Variable Chromosome Genetic Algorithm for Structure Learning in Neural Networks to Imitate Human Brain. Appl. Sci. 2019, 9, 3176.
- Park, K.-M.; Shin, D.; Chi, S.-D. Modified Neural Architecture Search (NAS) Using the Chromosome Non-Disjunction. Appl. Sci. 2021, 11, 8628.
- He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. Knowl. Based Syst. 2021, 212, 106622.
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural Architecture Search: A Survey. arXiv 2018, arXiv:1808.05377.
- Weng, Y.; Zhou, T.; Li, Y.; Qiu, X. NAS-Unet: Neural Architecture Search for Medical Image Segmentation. IEEE Access 2019, 7, 44247–44257.
- Kandasamy, K.; Neiswanger, W.; Schneider, J.; Poczos, B.; Xing, E.P. Neural Architecture Search with Bayesian Optimisation and Optimal Transport. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 2016–2025.
- Ma, L.; Cui, J.; Yang, B. Deep Neural Architecture Search with Deep Graph Bayesian Optimization. In Proceedings of the 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Thessaloniki, Greece, 14–17 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 500–507.
- Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. arXiv 2016, arXiv:1611.01578.
- Zhang, H.; Yang, C.-H.H.; Zenil, H.; Kiani, N.A.; Shen, Y.; Tegner, J.N. Evolving Neural Networks through a Reverse Encoding Tree. arXiv 2020, arXiv:2002.00539.
- Elsken, T.; Metzen, J.H.; Hutter, F. Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution. arXiv 2018, arXiv:1804.09081.
- Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1379–1388.
- Stanley, K.O.; Miikkulainen, R. Evolving Neural Networks through Augmenting Topologies. Evol. Comput. 2002, 10, 99–127.
- Saito, M. Some Asymmetries in Japanese and Their Theoretical Implications. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1985.
- Kim, S. Sloppy/Strict Identity, Empty Objects, and NP Ellipsis. J. East Asian Linguist. 1999, 8, 255–284.
- Ciresan, D.C.; Meier, U.; Gambardella, L.M.; Schmidhuber, J. Convolutional Neural Network Committees for Handwritten Character Classification. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1135–1139.
- Teow, M.Y. Understanding Convolutional Neural Networks Using a Minimal Model for Handwritten Digit Recognition. In Proceedings of the 2017 IEEE 2nd International Conference on Automatic Control and Intelligent Systems (I2CACIS), Kota Kinabalu, Malaysia, 21 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 167–172.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).