1. Introduction
1.1. Problem Background
Convolutional neural networks (CNNs) have garnered significant attention in recent years within the field of artificial intelligence due to their exceptional performance in addressing diverse real-world tasks such as object recognition [1], image classification [2], defect identification [3], denoising [4], etc. Typically, CNNs consist of convolutional layers, pooling layers, and fully-connected layers. This network architecture exhibits remarkable versatility when dealing with a wide range of data types, including images, audio, and video. One fundamental challenge in these applications is the development of efficient signal preprocessing and feature extraction systems to generate appropriate data structures for specific tasks such as classification. CNNs, however, have alleviated the need for preprocessing in many domains by automatically extracting valuable features. In several cases, given a meticulously designed network architecture, the convolutional layers of CNNs can perform this automatic feature extraction as effectively as human experts.
Nevertheless, crafting meaningful CNN architectures remains a meticulous and labor-intensive process, often requiring the expertise of specialists [5]. This network design process involves identifying the most suitable CNN components, both in terms of architecture and hyperparameters. The performance of CNNs is primarily influenced by two key factors: architectures and trainable parameters (i.e., weights and biases) [6]. While gradient descent algorithms have been effective for optimizing trainable parameters, there are no explicit functions available for directly determining the optimal CNN architecture needed to achieve competitive results on specific datasets [6]. Notable CNN architectures, including AlexNet [7], ResNet [8], VGGNet [9], MobileNet [10], and GoogLeNet [11], have demonstrated remarkable accuracy enhancements in computer vision tasks due to their distinctive network architectures characterized by layer count, interlayer connections, and basic unit designs. However, the manual design of these network architectures remains challenging due to the large number of parameters [12]. Notably, these CNN architectures are hand-crafted and cannot autonomously learn optimal configurations, necessitating designers with extensive expert knowledge in CNN architecture design [13]. Furthermore, the optimal design of CNN architecture is problem-specific and determined by varying data distributions. These manually crafted architectures lack flexibility, often necessitating time-consuming trial-and-error approaches. Additionally, these manually designed networks may have limited adaptability to diverse datasets, potentially hindering network generalization [6].
To address these challenges and limitations of manually crafted network architectures, there is an increasing demand for automated CNN architecture design methods that reduce reliance on domain-specific human expertise. The goal of these automated methods is to efficiently search for the optimal CNN architectures that can surpass the performance of manually crafted counterparts. The development of automated CNN architecture design approaches, capable of accommodating diverse datasets, represents a critical step in enhancing CNN efficiency and effectiveness. Such automated strategies yield network architectures tailored to specific tasks and dataset requirements, ultimately enhancing CNN generalization capabilities.
1.2. Recent Progress in Network Architecture Design Techniques
Designing optimal CNN architectures for specific datasets has traditionally demanded substantial manual effort. Recent advancements in network architecture design have introduced three main approaches to automate this process: reinforcement learning-based [14,15,16], gradient-based [17,18], and metaheuristic search algorithm (MSA)-based [19,20] methods.
Reinforcement learning-based approaches, such as efficient architecture search (EAS) [15] and BlockQNN [16], have exhibited impressive performance in discovering competitive architectures. EAS employs network transformation techniques to evolve existing model architectures, while BlockQNN employs a reinforcement technique based on Q-Learning, with an epsilon-greedy strategy to balance exploitation and exploration. However, these approaches demand significant computational resources. For instance, EAS and BlockQNN require 5 and 32 graphics processing units (GPUs), respectively, for tasks such as solving CIFAR-10 and ImageNet datasets. Gradient-based methods, such as differentiable architecture search (DARTS) [17], offer higher efficiency compared to reinforcement learning-based strategies, but may yield inconsistent results. Nonetheless, they still require substantial human knowledge and computational resources during the network construction phase.
In contrast, MSA-based approaches present a promising solution by integrating nature-inspired search operators, facilitating the discovery of optimal network architectures without the need for specialized domain expertise. These methods, including particle swarm optimization (PSO), grey wolf optimization (GWO), teaching-learning-based optimization (TLBO), and differential evolution (DE), exhibit robust global search capabilities and find extensive application across various domains [21,22,23,24]. Due to their appealing features, MSA-based techniques have emerged as popular alternatives to conventional design methods, offering researchers a versatile tool to effectively address a wide array of deep learning challenges.
1.3. General Challenges of MSA-Based Network Design Techniques
MSA-based approaches hold great promise for robustly searching for optimal CNN architecture designs tailored to given datasets. However, despite their potential, several fundamental challenges persist. One major challenge is that optimal CNN architectures for different datasets are generally unknown in advance, and can encompass a wide range of architectures and parameters, including the number and type of layers, number of filters, kernel size, pool size, pool stride, and number of neurons. Addressing this challenge requires the adoption of an appropriate encoding strategy that can represent individual solutions as potential CNN architectures with varying network lengths, while maintaining manageable search complexity.
Additionally, integrating effective network constraints into MSA-based approaches during the optimization of CNN architectures is crucial. These constraints prevent the construction of invalid networks, while preserving flexibility in the discovery of novel network architectures. Another challenge associated with population-based MSAs is the significant computational time and resources required to evaluate the effectiveness of each candidate solution representing potential CNN architectures for a given dataset. Therefore, it is imperative to implement a fitness evaluation process with enhanced computational efficiency to make MSAs more practical for optimizing CNN architectures.
While numerous state-of-the-art MSAs inspired by various natural phenomena have emerged due to the no free lunch theorem, their applicability to complex optimization tasks remains relatively unexplored. Although established MSAs such as PSO, genetic algorithms (GAs), and DE have been used to address CNN architecture optimization, there is a critical need to advance the field by exploring the potential of other MSAs in tackling these intricate real-world challenges.
1.4. Drawbacks of Original TLBO in CNN Architecture Optimization
TLBO has recently emerged as a promising approach for optimizing the search for optimal CNN architecture designs based on given datasets [25]. However, this automatic network design method primarily relies on search operators from the original TLBO version, which has certain limitations. One notable concern is that in the original TLBO, all learners are guided by the same exemplars (population mean and teacher solution) during the teacher phase, overlooking potentially valuable information contributed by other population members. While the original TLBO shows promising convergence speed, it is highly susceptible to premature convergence, especially when both the teacher and population mean become trapped in suboptimal regions early in the optimization process.
Moreover, the learner phase of the original TLBO does not emulate a realistic classroom learning scenario, as it confines each learner’s interaction to a randomly selected peer for acquiring new knowledge. Therefore, considering alternative strategies such as self-learning and adaptive interaction with multiple peers during the learner phase could enhance TLBO’s learning capability. Lastly, TLBO uses a greedy selection scheme based solely on the fitness criterion to determine the survival of learners in the subsequent generation. While this scheme is straightforward to implement, it may neglect potentially superior learners that exhibit temporarily poor fitness values but could substantially improve the overall population quality in the long term. These limitations can hinder the effectiveness of the original TLBO in solving complex tasks, including CNN architecture optimization, due to the imbalance between exploration and exploitation strengths.
1.5. Research Significances and Contributions
This paper introduces a new variant, i.e., modified TLBO with refined knowledge sharing (MTLBORKS), designed to autonomously discover optimal CNN architectures tailored to specific datasets without human intervention. This process involves searching for the optimal combination of network hyperparameters across three vital CNN building blocks: the convolutional block, pooling block, and fully-connected block. Determining the best combinations of these hyperparameters, encompassing layer numbers and types, filter numbers, kernel sizes, pool size and stride, and neuron numbers, contributes to the optimal design of CNN architecture. MTLBORKS incorporates several enhancements throughout its teacher phase, learner phase, and selection scheme, collectively promoting a more balanced exploration and exploitation search.
The key highlights of this study include:
Introduction of MTLBORKS-CNN, an automatic network design method demonstrating outstanding accuracy in image classification tasks. This approach leverages the robust global search capabilities of MTLBORKS to autonomously identify the optimal CNN architectures tailored to specific datasets without human intervention.
The proposed MTLBORKS-CNN method incorporates a comprehensive solution encoding strategy that facilitates the search for optimal network hyperparameters. This encoding strategy enables the construction of robust and innovative CNN architectures with varying network lengths for diverse datasets, while avoiding the generation of infeasible models. To ensure practicality, a fitness evaluation process with reduced computational intensity is implemented.
In the modified teacher phase of MTLBORKS-CNN, the unique population mean and social exemplar are calculated for each learner by harnessing the valuable information from better-performing learners through a social learning concept. This individualized approach enhances the potential of learners for discovering novel CNN architectures, while preserving population diversity.
The modified learner phase of MTLBORKS-CNN integrates two innovative strategies: self-learning and adaptive peer learning, aiming to enhance the knowledge of learners effectively. The self-learning mechanism promotes exploration, empowering each learner to independently search for new CNN architectures through personal efforts. Conversely, the adaptive peer learning encourages exploitation by facilitating knowledge sharing among learners through adaptive interactions with multiple peers based on their fitness values during the search for optimal CNN architectures.
MTLBORKS-CNN integrates a dual-criterion selection scheme that comprehensively evaluates learners for their survival in the next generation. This scheme considers the fitness and diversity values of learners, reducing the risk of premature convergence. It ensures that learners with the relatively lower fitness but promising diversity values are not prematurely excluded, allowing them to contribute to the long-term enhancement of the overall population’s quality.
This study conducts thorough simulation studies on nine image datasets derived from MNIST variations to meticulously evaluate the efficacy and feasibility of MTLBORKS-CNN in automatically discovering optimal CNN architectures. The results highlight the considerable merit of MTLBORKS-CNN, as the produced optimal CNN architectures consistently outperform state-of-the-art methods across most of the datasets.
This paper is structured as follows: Section 2 provides an overview of prior research efforts. Section 3 explains the search mechanisms used by MTLBORKS-CNN to autonomously discover optimal CNN architectures. Section 4 presents detailed performance assessments conducted on nine distinct image datasets derived from MNIST variations. Lastly, Section 5 concludes the paper and outlines avenues for future research.
3. Proposed MTLBORKS-CNN
This study introduces MTLBORKS-CNN as an automatic technique for designing optimized CNN architectures for image classification tasks. The proposed approach eliminates the need for domain-specific expertise from humans. Figure 1 provides an overview of the MTLBORKS-CNN framework, and specific modifications within MTLBORKS-CNN will be detailed in subsequent subsections.
3.1. Overview of CNN Architecture
Figure 2 depicts a typical sequential CNN architecture, comprising a feature extraction module with two convolutional layers and two pooling layers, along with a trainable classification module consisting of three fully-connected layers. Each CNN layer is defined by specific network hyperparameters, vital for network construction and training, as detailed in the following subsections.
3.1.1. Convolutional Layer
CNNs utilize two types of convolution processes: SAME convolution and VALID convolution. SAME convolution produces feature maps with the same dimensions as the input data by using zero padding. In contrast, VALID convolution generates smaller feature maps without any padding. Convolutional blocks employ filters with predetermined height and width to generate feature maps from input data.
In the convolution process, a filter horizontally slides from left to right with a specified stride width, then vertically moves downward with a stride height, repeating the left-to-right process to form a complete feature map. Feature map elements are computed by summing the products of the filter elements and the corresponding input elements covered by the filter. The key network hyperparameters of the convolutional layer considered for CNN architecture optimization in this study include the number of convolutional layers, the number of filters in each l-th convolutional layer, and the kernel size of the filters in each l-th convolutional layer.
3.1.2. Pooling Layer
Pooling is critical in CNNs for achieving local translation invariance and results in down-sampled feature maps that exhibit increased robustness to variations in feature locations within input data. There are two common pooling types: average pooling and maximum pooling. Average pooling computes the mean values of elements within a kernel to create down-sampled feature maps. Meanwhile, maximum pooling identifies the largest values among the captured elements.
Pooling involves applying a kernel with a predefined type, height, and width to the input data. Down-sampled feature maps are generated by sliding the kernel from the top-left to the bottom-right, guided by predetermined stride height and width. The network hyperparameters of the pooling layer considered for CNN architecture optimization in this study include the selection probability of the pooling type connected to each l-th convolutional layer, the kernel size of the pooling layer connected to each l-th convolutional layer, and the stride size of the pooling layer connected to each l-th convolutional layer.
3.1.3. Fully-Connected Layer
The feature extraction module, comprising convolutional and pooling layers, extracts relevant features from raw data. Once extracted, these features can be fed into a classifier, often a fully-connected layer used for classification. To introduce the output feature maps to the fully-connected layer, the feature maps must be flattened and reshaped into a vector.
The CNN’s classification module can consist of one or multiple fully-connected layers, comprising neurons that process input to generate an output. These layers receive inputs from neurons in the preceding layers via connections with assigned weights. Generally, the fully-connected layers are used alongside backpropagation to learn network weights. CNN training aims to minimize errors between predicted and actual dataset outputs via the reduction of cross-entropy loss. The network hyperparameters of the fully-connected layers considered for CNN architecture optimization in this study include the number of fully-connected layers and the number of neurons in each q-th fully-connected layer.
3.2. Solution Encoding Scheme of MTLBORKS-CNN
Once the network hyperparameters crucial for optimizing the CNN architecture are identified, an efficient solution encoding scheme is designed for MTLBORKS-CNN, as depicted in Figure 1. This scheme allows each learner to encode vital network hyperparameters necessary for generating optimal CNN architectures. These encoded hyperparameters encompass network length, layer types, filter numbers, kernel sizes, pool sizes, pool strides, and neuron numbers. The solution encoding approach in MTLBORKS-CNN is meticulously designed to prevent the generation of invalid network architectures, while maintaining the flexibility required to discover effective CNN architectures for diverse image classification tasks.
Figure 3 illustrates the D-dimensional position vector of the n-th MTLBORKS-CNN learner. Each d-th dimension of this vector contains essential information for constructing a unique CNN architecture, including layer types and their corresponding hyperparameters. The encoded information is categorized into three main sections: convolutional, pooling, and fully-connected. Table 1 summarizes the attributes of the network hyperparameters to be optimized in each section, such as data type, encoded dimension index, index number (if applicable), lower limit, and upper limit. Notably, the pooling hyperparameters encoded in the pooling section indicate the types of pooling layers connected to each l-th convolutional layer, namely (a) no pooling layer, (b) maximum pooling, or (c) average pooling, depending on the range within which the encoded value falls. Given the predefined maximum numbers of convolutional and fully-connected layers, the total dimensional size of the position vector of each n-th learner is fixed.
Referring to the solution encoding scheme depicted in Figure 3 and detailed in Table 1, Algorithm 1 is devised to decode the network hyperparameters contained within each learner and transform them into a valid CNN architecture. It is worth noting that, although all position vectors share the same D-dimensional size, each n-th MTLBORKS-CNN learner can generate CNNs of varying network lengths based on the encoded number of convolutional layers, the pooling-type selection probabilities, and the encoded number of fully-connected layers, as indicated in Table 1. For instance, only the first pieces of information corresponding to the encoded number of convolutional layers (i.e., the filter numbers, kernel sizes, pooling-type probabilities, pool sizes, and pool strides of those layers) are utilized to construct the convolutional and pooling sections of the CNN; the information pertaining to the remaining convolutional and pooling sections is disregarded during network construction. The pool size and pool stride associated with the l-th convolutional layer are likewise omitted when the corresponding pooling-type probability falls within the range indicating that no pooling layer is linked to that convolutional layer. Similarly, only the first pieces of information corresponding to the encoded number of fully-connected layers (i.e., the neuron numbers of those layers) are used to construct the fully-connected section of the CNN, and any remaining fully-connected information is omitted.
Algorithm 1: Decoding Learner to CNN Architecture
Input: the position vector of the n-th learner
01: Initialize an empty CNN architecture;
02: Extract the number of convolutional layers and the number of fully-connected layers encoded in the corresponding dimensions of the learner;
03: Initialize the indices of the convolutional layer and the fully-connected layer, respectively;
04: while the convolutional layer index does not exceed the encoded number of convolutional layers do /*only the first pieces of information corresponding to the encoded number of convolutional layers are used to construct the convolutional and pooling sections*/
05: Extract the filter number and kernel size encoded in the corresponding dimensions of the learner;
06: Append the l-th convolutional layer with the extracted filter number and kernel size to the CNN architecture;
07: Extract the pooling-type selection probability, pool size, and pool stride encoded in the corresponding dimensions of the learner;
08: if the pooling-type selection probability indicates no pooling then
09: No pooling layer is appended to the l-th convolutional layer of the CNN architecture;
10: elseif the pooling-type selection probability indicates maximum pooling then
11: Append a maximum pooling layer with the extracted pool size and stride to the l-th convolutional layer of the CNN architecture;
12: else /*the pooling-type selection probability indicates average pooling*/
13: Append an average pooling layer with the extracted pool size and stride to the l-th convolutional layer of the CNN architecture;
14: end if
15: Increment the convolutional layer index;
16: end while
17: while the fully-connected layer index does not exceed the encoded number of fully-connected layers do /*only the first pieces of information corresponding to the encoded number of fully-connected layers are used to construct the fully-connected section*/
18: Extract the neuron number encoded in the corresponding dimension of the learner;
19: Append the q-th fully-connected layer with the extracted neuron number to the CNN architecture;
20: Increment the fully-connected layer index;
21: end while
Output: A valid CNN architecture corresponding to the learner
Table 2 provides an overview of the feasible search ranges of the network hyperparameters subject to CNN architecture optimization, from which the total dimension size of each learner is determined. As depicted in Figure 4, Algorithm 1 can construct different valid CNN architectures based on the unique combination of network hyperparameters encoded within each MTLBORKS-CNN learner, provided that these hyperparameters fall within their predefined boundary limits.
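For illustration, the following Python sketch mirrors this encoding and decoding idea. The layer-count limits, search ranges, and pooling-type thresholds used here are assumed values for demonstration only, not the settings defined in Table 1 or Table 2.

```python
import random

# Illustrative upper bounds on layer counts (assumed, not the paper's values).
MAX_CONV, MAX_FC = 3, 2

def random_learner():
    """Randomly generate a flat position vector that encodes one CNN candidate."""
    x = [random.randint(1, MAX_CONV)]                  # number of active conv layers
    for _ in range(MAX_CONV):                          # one block of genes per conv layer
        x += [random.randint(8, 64),                   # filter number
              random.choice([3, 5, 7]),                # kernel size
              random.random(),                         # pooling-type selection probability
              random.choice([2, 3]),                   # pool size
              random.choice([1, 2])]                   # pool stride
    x.append(random.randint(1, MAX_FC))                # number of active FC layers
    x += [random.choice([64, 128, 256]) for _ in range(MAX_FC)]  # neurons per FC layer
    return x

def decode(x):
    """Decode a position vector into an ordered layer list (cf. Algorithm 1)."""
    layers, idx, n_conv = [], 1, x[0]
    for l in range(MAX_CONV):
        filters, kernel, p_pool, pool, stride = x[idx:idx + 5]
        idx += 5
        if l >= n_conv:                                # genes beyond n_conv are ignored
            continue
        layers.append(("conv", filters, kernel))
        # Assumed thresholds: <1/3 no pooling, <2/3 max pooling, otherwise average pooling.
        if p_pool >= 2 / 3:
            layers.append(("avg_pool", pool, stride))
        elif p_pool >= 1 / 3:
            layers.append(("max_pool", pool, stride))
    n_fc = x[idx]
    for q in range(MAX_FC):
        if q < n_fc:                                   # genes beyond n_fc are ignored
            layers.append(("fc", x[idx + 1 + q]))
    return layers

if __name__ == "__main__":
    print(decode(random_learner()))
```

Because the vector always has the same length, unused dimensions are simply skipped during decoding, which is how learners of fixed dimensionality can still represent networks of different depths.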
3.3. Population Initialization of MTLBORKS-CNN
Algorithm 2 outlines the initialization process of the MTLBORKS-CNN population. During this stage, diverse candidate CNN architectures are created by randomly generating the position vectors of all N learners. The dimension size of each position vector is fixed, considering the predefined maximum numbers of convolutional and fully-connected layers.
For each potential CNN architecture represented by the n-th learner, the network hyperparameters of the convolutional, pooling, and fully-connected sections are randomly generated within their feasible search ranges. For example, the convolutional section’s hyperparameters, such as the number of convolutional layers, the filter numbers, and the kernel sizes, are initialized in their corresponding dimensions. The remaining network hyperparameters are initialized based on the solution encoding scheme and the attributes of the network hyperparameters described in Figure 3 and Table 1, respectively.
Algorithm 2: Population Initialization of MTLBORKS-CNN
Inputs: the population size N, the search ranges of all network hyperparameters, and the predefined maximum numbers of convolutional and fully-connected layers
01: Calculate the total dimension size D of each position vector;
02: Initialize the teacher solution;
03: for n = 1 to N do
04: Initialize the position vector of the n-th learner;
05: for d = 1 to D do /*Initialize each dimension of the learner*/
06: Identify the type of network hyperparameter encoded in the d-th dimension;
07: Randomly initialize this network hyperparameter based on its corresponding attributes described in Table 1;
08: end for
09: Evaluate the fitness of the n-th learner based on Algorithm 3;
10: if the n-th learner outperforms the current teacher solution then
11: Update the teacher solution with the n-th learner;
12: end if
13: end for
Output: The initial population and the teacher solution
Once the position vector is generated for each n-th learner, its fitness value, expressed as a classification error, is determined through the fitness evaluation process detailed in the next section. This initialization is performed for all N learners, forming an initial population. The learner with the lowest classification error becomes the teacher solution, represented by its position vector and fitness value.
3.4. Fitness Evaluation of MTLBORKS-CNN
Each MTLBORKS-CNN learner possesses a position vector representing a potential CNN architecture for solving a given problem. The learner’s fitness is evaluated based on the corresponding classification error of this CNN. Learners with lower error rates in classifying datasets are considered to have superior fitness, and vice versa. In this study, the automatic network architecture design problem is formulated as a minimization problem. The primary goal of MTLBORKS-CNN is to identify the optimal CNN model that solves the assigned tasks with the fewest classification errors. Algorithm 3 outlines the two major stages of fitness evaluation for each MTLBORKS-CNN learner, involving (i) constructing and training a potential CNN architecture with the training set, and (ii) evaluating the trained CNN architecture using the validation set.
Algorithm 3: Fitness Evaluation of MTLBORKS-CNN
Inputs: the position vector of the n-th learner, the training and validation datasets, the batch size, the number of training epochs, and the number of output classes
01: Compile a full-fledged CNN architecture based on the network information extracted from the learner using Algorithm 1, with a fully-connected layer containing as many output neurons as there are classes added as the last layer of the CNN;
02: Compute the numbers of training and validation batches using Equations (4) and (6), respectively;
03: Initialize the weights of the compiled CNN model using the He Normal initializer;
04: for each training epoch do /*First stage of fitness evaluation as explained in Section 3.4.1*/
05: for each training step do
06: Calculate the cross-entropy loss of the CNN model based on the current weights and the i-th batch of training data;
07: Update the weights of the CNN model based on Equation (5);
08: end for
09: end for
10: for each validation step do /*Second stage of fitness evaluation as explained in Section 3.4.2*/
11: Use the trained CNN model to classify the j-th batch of the validation dataset;
12: Calculate the classification error of the trained CNN model on the j-th batch of validation data;
13: end for
14: Calculate the mean classification error of the constructed CNN using Equation (7) to obtain the fitness value of the n-th learner;
Output: The fitness value of the n-th learner
3.4.1. Stage 1: Construction and Training of Potential CNN Architecture
When evaluating the fitness of each n-th learner, a CNN architecture is constructed using Algorithm 1 based on the network hyperparameters decoded from the corresponding position vector. These hyperparameters include the numbers of convolutional and fully-connected layers, the filter numbers, kernel sizes, pooling types, pool sizes, pool strides, and neuron numbers. The compiled CNN architecture is also augmented with a fully-connected layer having the same number of output neurons as the required classification classes.
All convolutional and fully-connected layer weights are initialized using the He Normal weight initializer [40], and these trainable parameters form the weight vector of the compiled CNN model. The training dataset, with a predefined size, is employed to train each potential CNN architecture represented by the MTLBORKS-CNN learners. Each CNN architecture undergoes training in a number of steps determined by the training set size and a predefined batch size, as given in Equation (4). The compiled CNN architecture is trained using the Adam optimizer [41] for a predetermined number of epochs, performed over the data batches extracted from the training dataset. At each i-th training step, the cross-entropy loss function of the CNN model is evaluated using the current weight parameters and the i-th batch of data. Given the learning rate and the gradient of the cross-entropy loss with respect to the current weights, the updated weight parameters of the CNN model are obtained through the gradient-based update defined in Equation (5).
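For illustration, a simplified first-order form consistent with this description is shown below in generic notation; the Adam optimizer additionally maintains first- and second-moment estimates of the gradient when forming the actual update:

$$\theta_{i+1} = \theta_i - \alpha \, \nabla_{\theta}\mathcal{L}\!\left(\theta_i, B_i\right),$$

where $\theta_i$ denotes the trainable weights at the $i$-th training step, $\alpha$ the learning rate, and $\mathcal{L}(\theta_i, B_i)$ the cross-entropy loss evaluated on the $i$-th data batch $B_i$.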
3.4.2. Stage 2: Evaluation of the Trained CNN Architecture
Next, the trained CNN model is evaluated using a validation dataset with a given number of samples, and the resulting classification error is assigned as the fitness value of the corresponding MTLBORKS-CNN learner. This evaluation is performed over multiple steps, whose number is determined by the validation set size and the batch size, as given in Equation (6). In each j-th evaluation step, a different batch of validation data is used to evaluate the trained CNN model, resulting in a distinct classification error. The mean classification error of the trained CNN model, computed across all batches of the validation dataset, determines the fitness value of the n-th learner, as given in Equation (7).
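Expressed in generic notation (an illustrative form consistent with this description), the fitness of the n-th learner is the mean of the per-batch classification errors:

$$F(X_n) = \frac{1}{J}\sum_{j=1}^{J} e_j,$$

where $J$ is the number of validation batches and $e_j$ is the classification error measured on the $j$-th validation batch.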
3.4.3. Design Consideration of Epoch Numbers during Fitness Evaluation Process
CNNs are often characterized by deep architectures, which makes full training on a dataset for minimal classification error computationally expensive and time-consuming. This is primarily due to the requirement for a large number of training epochs, typically exceeding 100, for full training of CNNs. However, when employing the population-based MTLBORKS-CNN to identify the optimal CNN architectures tailored to a specific dataset, each learner would have to undergo full training in every generation. This approach becomes impractical because it requires evaluating numerous candidate CNN architectures represented by all learners, resulting in a significant number of fitness evaluations. In the context of optimizing CNN architectures, each fitness evaluation involves the full training of a potential CNN architecture represented by an MTLBORKS-CNN learner. These excessive computational demands pose challenges to the feasibility of MTLBORKS-CNN in handling a large number of alternatives for achieving substantial improvements in the optimization process of CNN architectures.
To tackle this issue, a fitness approximation method is implemented. It involves training the potential CNN architecture represented by each MTLBORKS-CNN learner with a reduced number of training epochs during the fitness evaluation process. While this may result in less precise evaluations, it effectively mitigates the computational load. In the selection process for the next generation of the population, it is more vital to ensure a fair comparison among learners to establish their relative superiority than to achieve precise fitness evaluations for each individual.
Moreover, a candidate CNN architecture is more likely to demonstrate a promising final classification error if it displays superior performance in the initial training epochs. Upon completing the MTLBORKS-CNN search process, the optimal CNN architecture formulated based on the network hyperparameters derived from the teacher can undergo full training with a larger epoch size to obtain its final classification error. To address potential network overfitting, the dropout and batch normalization techniques are incorporated between different layers of candidate CNN architectures [13].
3.5. Modified Teacher Phase of MTLBORKS-CNN
In the teacher phase of original TLBO, all learners are guided by the same exemplars, specifically the teacher solution and population mean, as defined by Equation (1). However, this conventional approach disregards valuable directional information contributed by other nonfittest population members. While both teacher solution and population mean can expedite learners’ convergence towards promising solution regions in the early stages of optimization, their influence diminishes as population diversity declines over time. This limitation becomes especially evident when addressing real-world optimization problems characterized by complex fitness landscapes, such as CNN architecture optimization in this study. These complex optimization problems often feature numerous local optima, which can misguide TLBO towards suboptimal regions. This undesirable phenomenon impedes the effectiveness of TLBO in searching for satisfactory CNN architectures due to premature convergence. To address this inherent limitation of teacher phase in the original TLBO, it is vital to incorporate a robust diversity maintenance mechanism. In this study, the modified teacher phase in MTLBORKS-CNN integrates a social learning concept. This mechanism considers the valuable direction information provided by other nonfittest learners, enabling more diversified and tailored guidance to each learner during the teacher phase, ultimately resulting in more effective CNN architecture searching.
3.5.1. Construction of Unique Mean Positions
In the modified teacher phase of MTLBORKS-CNN, a social learning concept is employed to calculate unique mean positions for all learners. This process begins by arranging all learners in descending order based on their fitness values, denoted as
for
. It is assumed that any learner outperforming the
n-th learner falls within the population indices
. For each
n-th learner, a unique mean position
is defined to represent the mean CNN architecture information for
. Particularly, each
d-th dimension of unique mean position (i.e.,
) is calculated as follows:
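One plausible form of Equation (8), under the above indexing convention and with generic notation, is:

$$\bar{X}_{n,d} = \operatorname{round}\!\left(\frac{1}{N-n}\sum_{k=n+1}^{N} X_{k,d}\right), \quad n = 1, \ldots, N-1,$$

where the rounding is skipped for the dimensions that store pooling-type selection probabilities, as explained below.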
The rounding operator transforms the dimensional components of the unique mean position into integer values. This conversion excludes the network hyperparameters that signify the selection probabilities of the pooling layers connected to each l-th convolutional layer. The mean CNN architecture represented by the unique mean position varies across learners because its computation relies on a distinct group of learners with superior fitness. However, the best-performing learner, with n = N, lacks better-performing peers to mimic; therefore, it does not receive a unique mean position from Equation (8). Figure 5 illustrates the calculation of the unique mean position of the worst-performing learner using Equation (8), considering the directional information provided by the other superior learners. Algorithm 4 outlines the pseudocode used to compute the mean CNN architecture of every n-th learner.
Algorithm 4: Computation of Unique Mean Positions Based on Social Learning Concept
Inputs: the current population, the fitness values of all learners, and the dimension size D
01: Sort all learners in descending order based on their fitness values (i.e., from worst to best);
02: for n = 1 to N − 1 do
03: Initialize the unique mean position of the n-th learner;
04: for d = 1 to D do
05: Calculate the d-th dimension of the unique mean position using Equation (8);
06: if the d-th dimension does not store a pooling-type selection probability do
07: Round the d-th dimension of the unique mean position to an integer value;
08: end if
09: end for
10: end for
Output: The unique mean positions of all learners except the best-performing one
3.5.2. Construction of Unique Social Exemplar
To enhance knowledge exchange within a classroom, learners often benefit not only from their teachers, but also from peers who excel in various subjects. A similar approach is employed in the modified teacher phase of MTLBORKS-CNN, where each
n-th learner is assigned a unique social exemplar,
. This exemplar is designed to provide more effective guidance, drawing from valuable insights offered by other population members. Specifically, any randomly selected learner who outperforms the
n-th learner (with a population index
) contributes to the
d-th dimension of the unique social exemplar allocated to each
n-th learner, denoted as
. This contribution is achieved via the corresponding dimension of the
-th learner, as shown below:
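Under the same indexing convention, Equation (9) can be read as assigning, independently for every dimension, the value of a randomly chosen better-performing learner (generic notation, illustrative form):

$$X^{SE}_{n,d} = X_{k_d,\,d}, \qquad k_d \sim \mathcal{U}\{n+1, \ldots, N\}, \quad n = 1, \ldots, N-1,$$

where $k_d$ is the index of the learner randomly selected for the $d$-th dimension.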
Note that the unique social exemplar differs for each n-th learner, as it is computed from a distinct group of learners with superior fitness compared to the n-th learner. Similar to Equation (8), Equation (9) does not apply to the top-performing learner, as it surpasses all other learners. Figure 6 illustrates the process of constructing a unique social exemplar for the worst-performing learner by considering the position vectors of four other learners with superior fitness. Algorithm 5 presents the pseudocode detailing the creation of the unique social exemplar of every n-th learner except the best-performing one.
Algorithm 5: Computation of Unique Social Exemplars Based on Social Learning Concept
Inputs: the sorted population, the fitness values of all learners, and the dimension size D
01: for n = 1 to N − 1 do
02: Initialize the unique social exemplar of the n-th learner;
03: for d = 1 to D do
04: Randomly select a fitter learner with a population index larger than n;
05: Assign the d-th dimension of the unique social exemplar based on Equation (9);
06: end for
07: end for
Output: The unique social exemplars of all learners except the best-performing one
3.5.3. Construction of New CNN Architecture
In the modified teacher phase of MTLBORKS-CNN, a new CNN architecture is determined for each n-th offspring learner through a new position vector. This vector is derived from Equation (10) by incorporating both the unique mean position and the unique social exemplar computed for each learner other than the best-performing one, together with two coefficients randomly generated from the uniform distribution. As shown in Equation (10), the position vector of the n-th offspring learner, crucial for generating the new CNN architecture, is determined by the differences between the CNN architectures represented by the unique social exemplar and the unique mean position.
After obtaining the new position vector from Equation (10), boundary checking is performed to ensure that all network hyperparameters lie within their search boundaries, as specified in Table 1. A rounding operator is then applied to convert the network hyperparameters into integer values, excluding those that signify the selection probabilities of the pooling layers connected to each l-th convolutional layer. Algorithm 3 measures the fitness of the new position vector, and the teacher solution is replaced by the n-th offspring learner if the latter has superior fitness. All generated offspring solutions, along with the best-performing N-th learner, are stored in an offspring population set. These solutions are used in the subsequent modified learner phase alongside the original population. The search mechanisms of the modified teacher phase of MTLBORKS-CNN are described in Algorithm 6.
Algorithm 6: Modified Teacher Phase of MTLBORKS-CNN
Inputs: the current population, the fitness values of all learners, the teacher solution, the dimension size D, and the search boundaries of all network hyperparameters
01: Calculate the unique mean position of each learner (excluding the best-performing one) using Algorithm 4;
02: Calculate the unique social exemplar of each learner (excluding the best-performing one) using Algorithm 5;
03: Initialize the offspring population as an empty set;
04: for n = 1 to N do
05: if the n-th learner is not the best-performing learner then
06: Randomly generate the two uniform random coefficients;
07: Calculate the new position vector of the n-th offspring learner using Equation (10) and perform boundary checking;
08: for d = 1 to D do
09: if the d-th dimension does not store a pooling-type selection probability do
10: Round the d-th dimension of the new position vector to an integer value;
11: end if
12: end for
13: Perform fitness evaluation on the new position vector to obtain its fitness using Algorithm 3;
14: else /*i.e., the best-performing learner*/
15: Retain the best-performing learner as its own offspring;
16: end if
17: if the n-th offspring learner outperforms the teacher then
18: Update the teacher solution with the n-th offspring learner; /*Update the teacher*/
19: end if
20: Store the new learner into the offspring population;
21: end for
Output: The offspring population and the updated teacher solution
3.6. Modified Learner Phase of MTLBORKS-CNN
In the learner phase of original TLBO, each learner performs searching within the solution space by interacting with a randomly selected peer learner from the population. This interaction, as described in Equation (3), involves attracting all dimensional components of the learner toward the peer learner if the peer has superior fitness, or repelling the learner away from the peer if the peer has inferior fitness. However, as the number of iterations increases, the likelihood of triggering the exploration-focused repelling mechanism diminishes. This is primarily because most learners converge toward specific solution regions, resulting in reduced population diversity. When dealing with complex optimization problems, such as CNN architecture optimization in the current study, the decreasing exploration strength of the original TLBO in later optimization stages can hinder its ability to discover new CNN architectures, as it becomes highly prone to becoming trapped in local optima. Additionally, the learning mechanisms of the original TLBO, as defined in Equation (3), fail to accurately emulate real classroom learning dynamics, as they overlook individual efforts and adaptive interactions among peer learners for more effective knowledge improvement. This inaccurate emulation of real classroom learning dynamics limits the balance between exploration and exploitation searches in the original TLBO, thereby constraining its effectiveness in searching for promising CNN architectures during the learner phase. To address these limitations, the modified learner phase of MTLBORKS-CNN introduces self-learning and adaptive peer learning schemes to achieve a more accurate emulation of real classroom learning dynamics. The incorporation of both self-learning and adaptive peer learning into the modified learner phase aims to strike a delicate balance between exploration and exploitation searches in MTLBORKS-CNN, ultimately enhancing its efficiency in optimizing CNN architectures during the modified learner phase.
3.6.1. Self-Learning Scheme
The self-learning scheme introduced in the modified learner phase of MTLBORKS-CNN aims to simulate learners’ preference for improving their knowledge in specific subjects through individual efforts using a probabilistic mutation operator. This scheme provides learners with an additional momentum through random perturbations, helping them break away from local optima.
After completing the modified teacher phase, each n-th learner is assigned a probability of engaging in the self-learning scheme during the modified learner phase. To facilitate the n-th learner’s self-learning process, a dimension index is randomly generated and used to apply a random perturbation, defined in Equation (11), to the corresponding dimension of the offspring solution generated by the n-th learner. In Equation (11), a random number is drawn from a uniform distribution, and the perturbation is bounded by the upper and lower limits of the selected dimension of the network hyperparameter arrays, as defined in Table 1.
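Given this description, Equation (11) plausibly takes the form of a uniform random reinitialization of the selected dimension (generic notation, illustrative form):

$$X^{new}_{n,d_{rand}} = LB_{d_{rand}} + r\left(UB_{d_{rand}} - LB_{d_{rand}}\right), \qquad r \sim \mathcal{U}(0,1),$$

where $LB_{d_{rand}}$ and $UB_{d_{rand}}$ denote the lower and upper limits of the randomly selected dimension from Table 1.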
Subsequently, a rounding operator is applied to convert the network hyperparameters into integer values, excluding the values that pertain to the selection probabilities of the pooling layers connected to each l-th convolutional layer. The fitness value of the perturbed offspring solution is then evaluated using Algorithm 3. If the perturbed offspring solution constructs a CNN architecture with a lower classification error than the teacher solution, the latter is replaced. Algorithm 7 outlines the pseudocode for the self-learning scheme.
Algorithm 7: Self-Learning Scheme of MTLBORKS-CNN
Inputs: the n-th offspring solution, its fitness value, the teacher solution, the dimension size D, and the search boundaries of all network hyperparameters
01: Generate a random dimension index;
02: Retrieve the values associated with the randomly selected dimension;
03: for d = 1 to D do
04: if the d-th dimension matches the randomly selected dimension then
05: Update this dimension of the offspring solution using Equation (11);
06: if this dimension does not store a pooling-type selection probability do
07: Round this dimension of the offspring solution to an integer value;
08: end if
09: end if
10: end for
11: Perform fitness evaluation on the updated offspring solution using Algorithm 3;
12: if the updated offspring solution outperforms the teacher then
13: Update the teacher solution and its fitness value;
14: end if
Output: The updated offspring solution, its fitness value, and the teacher solution
3.6.2. Adaptive Peer Learning Scheme
In the modified learner phase of MTLBORKS-CNN, the offspring solutions of learners who do not opt for the self-learning scheme are updated using the proposed adaptive peer learning approach. To initiate this phase, the fitness values of all offspring learners stored in the offspring population are used to arrange them in descending order, from worst to best. The ranking of each n-th offspring learner is then calculated using Equation (12).
Based on Equation (12), the ranking value assigned to each offspring learner reflects its relative fitness within the offspring population. With these assigned ranking values, the probability of interaction with peers for each n-th offspring learner is calculated using Equation (13), such that offspring learners with worse fitness receive higher interaction probabilities.
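One illustrative instantiation consistent with this description (the exact Equations (12) and (13) may differ) assigns larger ranks, and hence larger interaction probabilities, to offspring learners with worse fitness:

$$R_n = N - n + 1, \qquad P_n = \frac{R_n}{N},$$

where the offspring learners are indexed from worst (n = 1) to best (n = N) after sorting, so the worst learner interacts with peers in every dimension while the best learner rarely does.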
For each d-th dimension of the n-th offspring learner, a random number is generated from a uniform distribution and compared with the learner’s interaction probability. If the random number is smaller than the interaction probability, three mutually distinct peer offspring learners are randomly selected from the offspring population, and the network hyperparameters stored in their respective d-th dimensions are used to update the corresponding dimension of the n-th offspring learner. Otherwise, the original value stored in that dimension is retained. A peer learning factor is also generated randomly from a uniform distribution for each n-th offspring learner. The adaptive peer learning scheme then updates each dimension of the n-th offspring learner according to Equation (14).
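Based on this description, Equation (14) resembles a differential-evolution-style recombination of the three selected peers, applied dimension-wise with probability $P_n$ (generic notation, illustrative form):

$$X^{new}_{n,d} = \begin{cases} X^{new}_{p_1,d} + \lambda_n\left(X^{new}_{p_2,d} - X^{new}_{p_3,d}\right), & \text{if } r_d < P_n,\\ X^{new}_{n,d}, & \text{otherwise}, \end{cases}$$

where $p_1 \neq p_2 \neq p_3 \neq n$ are the indices of the selected peers, $\lambda_n$ is the peer learning factor drawn from a uniform distribution, and $r_d \sim \mathcal{U}(0,1)$.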
After obtaining the updated offspring solution from Equation (14), boundary checking is performed to ensure that all network hyperparameters lie within their search boundaries, as specified in Table 1. A rounding operator is applied to all network hyperparameters encoded in the offspring solution, except for those that signify the pooling-type selection probabilities. Algorithm 3 is then used to evaluate the fitness value of the n-th offspring learner based on its updated position vector. If the CNN architecture represented by the updated offspring solution results in a lower classification error than that of the teacher solution, the teacher is replaced accordingly. Equations (12)–(14) illustrate that an n-th offspring learner with worse fitness is more inclined to learn from multiple peers and update most of its dimensional components, as indicated by its larger interaction probability, and vice versa.
This adaptive peer learning scheme effectively regulates the explorative and exploitative behaviors of MTLBORKS-CNN. It enables learners to adjust their CNN architectures through interactions with peers, or retain their original architectures based on their fitness levels. A detailed description of the adaptive peer learning scheme is presented in Algorithm 8. Algorithm 9 outlines the comprehensive search mechanisms for the modified learner phase of MTLBORKS-CNN, combining both self-learning and adaptive peer learning schemes detailed in Algorithms 7 and 8, respectively.
Algorithm 8: Adaptive Peer Learning of MTLBORKS-CNN
Inputs: the n-th offspring solution, its fitness value, the offspring population, the teacher solution, and the dimension size D
01: Calculate the ranking value and interaction probability of the n-th offspring learner using Equations (12) and (13), respectively;
02: for d = 1 to D do
03: Randomly select three mutually distinct peer offspring learners (all different from the n-th learner) from the offspring population;
04: Update the d-th dimension of the n-th offspring solution using Equation (14) and perform boundary checking;
05: if the d-th dimension does not store a pooling-type selection probability do
06: Round the d-th dimension of the offspring solution to an integer value;
07: end if
end for
08: Perform fitness evaluation on the updated offspring solution using Algorithm 3;
09: if the updated offspring solution outperforms the teacher then
10: Update the teacher solution and its fitness value;
11: end if
Output: The updated offspring solution, its fitness value, and the updated teacher solution
Algorithm 9: Modified Learner Phase of MTLBORKS-CNN
Inputs: the offspring population, the fitness values of all offspring learners, and the teacher solution
01: Sort the offspring learners stored in the offspring population in descending order (i.e., from worst to best);
02: for n = 1 to N do
03: if the n-th offspring learner engages in self-learning according to its assigned probability do
04: Perform the self-learning scheme to update the n-th offspring learner and the teacher solution using Algorithm 7;
05: else
06: Perform the adaptive peer learning scheme to update the n-th offspring learner and the teacher solution using Algorithm 8;
07: end if
08: end for
Output: The updated offspring population and teacher solution
3.7. Dual-Criterion Selection Scheme of MTLBORKS-CNN
The selection scheme plays a crucial role in forming the next generation population in MSAs, using predefined criteria during the optimization process. Traditional selection schemes, such as greedy selection and tournament selection, determine the survival of individual solutions solely based on their fitness values. For example, in the original TLBO, the greedy selection scheme compares the fitness values of current learners with those of new learners generated through teacher and learner phases. Despite their straightforward implementation, these fitness-based selection schemes have drawbacks when dealing with complex optimization problems, such as CNN architecture optimization in this study. In the context of CNN architecture optimization, the excessive reliance on fitness-based selection schemes may hinder the ability of original TLBO to discover novel CNN architectures that might initially show slightly lower performance, but have the potential for long-term usefulness, such as maintaining population diversity. To overcome this limitation, MTLBORKS-CNN introduces a dual-criterion selection approach. This approach constructs the next generation population by considering both the fitness and diversity of learners. It allows learners with slightly lower fitness but greater diversity to move forward to the next iteration, thereby preserving population diversity.
Upon completion of the modified learner phase in MTLBORKS-CNN, a merged population is formed by combining the current population with the updated offspring population, giving a population size of 2N, as represented in Equation (15). Each solution member stored in the merged population may originate from either the current learners or the offspring learners. These solution members are sorted in ascending order based on the classification error associated with the CNN architecture represented by each of them. The Euclidean distance between the n-th solution member of the merged population and the best solution member is then determined using a Euclidean distance operator, as given in Equation (16).
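Equation (16) is therefore a standard Euclidean norm (generic notation assumed):

$$ED_n = \left\lVert X^{merged}_n - X^{merged}_1 \right\rVert_2 = \sqrt{\sum_{d=1}^{D}\left(X^{merged}_{n,d} - X^{merged}_{1,d}\right)^2},$$

where $X^{merged}_1$ denotes the best solution member of the merged population after sorting.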
The dual-criterion selection scheme proposed in MTLBORKS-CNN utilizes the calculated fitness and Euclidean distance values of each of the 2N solutions within the merged population. To construct the next-generation population with a population size of N, the K solution members with the best fitness values are directly selected from the merged population in each iteration of MTLBORKS-CNN, where K is a randomly generated integer. For each candidate for the remaining N − K solution members of the next-generation population, a weighted fitness value is calculated, taking into account both its classification error and its diversity, as defined in Equation (17).
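A plausible form of this weighted fitness, assuming min–max normalization of both criteria (an illustrative form rather than a definitive one), is:

$$F^{W}_n = w\,\frac{F_n - F^{best}}{F^{worst} - F^{best}} + (1 - w)\,\frac{ED^{max} - ED_n}{ED^{max} - ED^{min}},$$

so that smaller values of $F^{W}_n$ favor solutions that combine a low classification error $F_n$ with a large distance $ED_n$ from the best solution member.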
In Equation (17), the weighting factor is randomly generated from a normal distribution and constrained within the range [0.8, 1.0] to ensure that diversity does not dominate the selection process for the remaining population members in the subsequent iteration. Equation (17) also involves the worst and best fitness values observed within the merged population, as well as the largest and smallest Euclidean distances measured from the best solution member.
After the weighted fitness values are calculated, a binary tournament strategy randomly selects two distinct solution members from the merged population. The solution member with the smaller weighted fitness value is then chosen as one of the remaining solution members of the next-generation population, as defined in Equation (18).
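In other words, Equation (18) performs a binary tournament on the weighted fitness values (generic notation assumed):

$$X^{t+1}_n = \begin{cases} X^{merged}_a, & \text{if } F^{W}_a \le F^{W}_b,\\ X^{merged}_b, & \text{otherwise}, \end{cases}$$

where $a \neq b$ are the indices of the two randomly selected solution members of the merged population.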
The remaining solution members for the next iteration are selected from the merged population by repeating the binary tournament strategy described in Equation (18). Algorithm 10 provides the pseudocode for the proposed dual-criterion selection scheme. In contrast to purely fitness-based selection schemes, such as greedy selection and tournament selection, this dual-criterion selection scheme not only retains the K elite solution members in the subsequent iteration, but also effectively maintains population diversity by considering both the fitness and diversity levels of solutions when choosing the remaining solution members of the next-generation population.
Algorithm 10: Dual-Criterion Selection Scheme of MTLBORKS-CNN
Inputs: the current population, the updated offspring population, and the population size N
01: Initialize the next-generation population as an empty set;
02: Construct the merged population using Equation (15);
03: Rearrange the solutions of the merged population in ascending order by referring to their fitness values;
04: for n = 1 to 2N do
05: Calculate the Euclidean distance of the n-th solution from the best solution using Equation (16);
06: end for
07: Randomly generate an integer K;
08: for n = 1 to K do /*Select the first K learners by only considering their fitness*/
09: Add the n-th solution member of the merged population to the next-generation population;
10: end for
11: for n = K + 1 to 2N do
12: Randomly generate the weighting factor from a normal distribution;
13: if the weighting factor is smaller than 0.8 then
14: Set the weighting factor to 0.8;
15: else if the weighting factor is larger than 1.0 then
16: Set the weighting factor to 1.0;
17: end if
18: Calculate the weighted fitness value of the n-th solution member of the merged population using Equation (17);
19: end for
20: for n = K + 1 to N do /*Select the remaining learners by considering their fitness and diversity*/
21: Randomly select two distinct solution members from the merged population;
22: Determine the winning solution member using Equation (18);
23: Add the winning solution member to the next-generation population;
24: end for
Output: The next-generation population
3.8. Complete Mechanisms of MTLBORKS-CNN
Algorithm 11 delineates the complete mechanisms of MTLBORKS-CNN, designed to search for an optimized CNN architecture tailored to a specific dataset. Throughout this algorithm, a counter variable t keeps track of the current iteration, and a predefined maximum iteration number acts as the termination criterion for MTLBORKS-CNN.
The procedure commences by loading the training and validation datasets from the designated directory. Following this, the MTLBORKS-CNN population is initialized using Algorithm 2. Subsequently, Algorithms 6 and 9, which describe the modified teacher phase and modified learner phase of MTLBORKS-CNN, respectively, are iteratively employed to generate an offspring population comprising diverse new CNN architectures. The next-generation population is then derived from the merged population using the dual-criterion selection scheme proposed in Algorithm 10. The optimization process for the CNN architecture, facilitated by the proposed MTLBORKS-CNN method, concludes when the termination condition is met.
As explained in the previous subsection, the fitness evaluation process delineated in Algorithm 3 employs a reduced epoch number for training each CNN architecture generated by every MTLBORKS-CNN learner. While this approach effectively reduces computational demands, the resulting evaluations are less precise. Therefore, after the termination of MTLBORKS-CNN, the CNN architecture constructed from the teacher solution undergoes a full training process. This full training process shares the same mechanisms as Algorithm 3, but with a larger number of epochs. Once the full training is completed, all relevant details of the fully trained CNN model, including its architecture, classification error, and number of trainable parameters, are returned.
Algorithm 11: Proposed MTLBORKS-CNN
Inputs: the training and validation datasets, the population size N, the maximum iteration number, the search ranges of all network hyperparameters, and the remaining control parameters
01: Load the training and validation datasets from the directory;
02: Initialize the population using Algorithm 2;
03: Initialize the iteration counter t;
04: while the termination condition is not met do
05: Generate the offspring population and update the teacher solution using the modified teacher phase (Algorithm 6);
06: Update the offspring population and the teacher solution using the modified learner phase (Algorithm 9);
07: Determine the next-generation population using the dual-criterion selection scheme (Algorithm 10);
08: Replace the current population with the next-generation population;
09: Increment the iteration counter t;
10: end while
11: Fully train the CNN architecture constructed from the teacher solution with a larger epoch number (Algorithm 3);
Output: An optimal CNN architecture constructed from the teacher solution with all related network information
3.9. Complexity Analysis of MTLBORKS
The time complexity of the proposed MTLBORKS, compared to the original TLBO, is evaluated using Big O analysis. Since both TLBO and MTLBORKS are used to solve the CNN architecture optimization problem in the current study, the time complexity of fitness evaluation is the same for both algorithms. The original TLBO incurs a time complexity of O(ND) in each of its population initialization, teacher phase, and learner phase, where N is the population size and D is the problem dimensionality. Thus, the time complexity of the original TLBO in each iteration is O(ND) in the worst-case scenario. Similar to the original TLBO, the time complexity of initializing the MTLBORKS population is O(ND). The computational time complexity of MTLBORKS is affected by three key modifications: the modified teacher phase, the modified learner phase, and the dual-criterion selection scheme.
In the modified teacher phase, learners are first rearranged in descending order based on their fitness values, incurring a time complexity of O(N log N) per iteration. Additionally, the unique mean positions, unique social exemplars, and new position vectors are calculated for all N learners using Equations (8), (9), and (10), respectively, and these D-dimensional computations dominate the per-iteration cost of the modified teacher phase, since their growth rate is higher than that of the sorting operation.
During the modified learner phase, learners are again rearranged in descending order based on their fitness values, incurring a time complexity of O(N log N) per iteration. Furthermore, the ranking values and interaction probabilities of all N learners are calculated in each iteration using Equations (12) and (13), respectively, with a time complexity of O(N). Updating each learner with the self-learning scheme or the adaptive peer learning scheme, as described in Equations (11) and (14), respectively, incurs a computational time complexity of O(D) per learner. Thus, the total time complexity of the modified learner phase of MTLBORKS remains O(ND) per iteration.
For the proposed dual-criterion selection scheme, a time complexity of O(N) per iteration is incurred when merging the current and offspring populations using Equation (15). A further O(N log N) per iteration is incurred to sort all 2N learners of the merged population based on their fitness values. Additional time complexities of O(ND) and O(N) are incurred in each iteration to calculate the Euclidean distances and weighted fitness values using Equations (16) and (17), respectively, and constructing the next-generation population through elitism and binary tournaments incurs a further O(N). Thus, the total time complexity of the dual-criterion selection scheme of MTLBORKS is O(ND + N log N) per iteration.
In conclusion, based on the time complexity analyses presented above, it is evident that the overall time complexity of each iteration of MTLBORKS, encompassing the modified teacher phase, modified learner phase, and dual-criterion selection scheme, remains polynomial in the population size N and the problem dimensionality D under the worst-case scenario.