Article

Exploring Metaheuristic Optimized Machine Learning for Software Defect Detection on Natural Language and Classical Datasets

by Aleksandar Petrovic 1,†, Luka Jovanovic 1,†, Nebojsa Bacanin 1,2,3,*,†, Milos Antonijevic 1,†, Nikola Savanovic 1,†, Miodrag Zivkovic 1,†, Marina Milovanovic 1,† and Vuk Gajic 4,†

1 Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
2 Department of Mathematics, Saveetha School of Engineering, SIMATS, Thandalam, Chennai 602105, Tamilnadu, India
3 MEU Research Unit, Middle East University, Amman 11822, Jordan
4 Department of Environment and Sustainable Development, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2024, 12(18), 2918; https://doi.org/10.3390/math12182918
Submission received: 30 August 2024 / Revised: 9 September 2024 / Accepted: 16 September 2024 / Published: 19 September 2024

Abstract
Software is increasingly vital, with automated systems regulating critical functions. As development demands grow, manual code review becomes more challenging, often making testing more time-consuming than development. A promising approach to improving defect detection at the source code level is the use of artificial intelligence combined with natural language processing (NLP). Source code analysis, leveraging machine-readable instructions, is an effective method for enhancing defect detection and error prevention. This work explores source code analysis through NLP and machine learning, comparing classical and emerging error detection methods. To optimize classifier performance, metaheuristic optimizers are used, and algorithm modifications are introduced to meet the study’s specific needs. The proposed two-tier framework uses a convolutional neural network (CNN) in the first layer to handle large feature spaces, with AdaBoost and XGBoost classifiers in the second layer to improve error identification. Additional experiments using term frequency–inverse document frequency (TF-IDF) encoding in the second layer demonstrate the framework’s versatility. Across five experiments with public datasets, the accuracy of the CNN was 0.768799. The second layer, using AdaBoost and XGBoost, further improved these results to 0.772166 and 0.771044, respectively. Applying NLP techniques yielded exceptional accuracies of 0.979781 and 0.983893 from the AdaBoost and XGBoost classifiers.

1. Introduction

The role of software has already reached a critical point where a widespread issue could have serious consequences for society [1]. This emphasizes the need for a robust system for error handling. The consequences of software defects can range from trivial to life-threatening, as the applications of software range from entertainment to medical purposes. The Internet of Things (IoT) and artificial intelligence (AI) massively contribute to this software dependency [2], as more devices are running software, and more devices are running more complex software. The responsibility of developers is increasing, as some use cases like autonomous vehicles and medicine require much more extensive testing [3]. Even though a software defect can be a mere inconvenience in some cases, even those cases would benefit from software defect prediction (SDP) [4]. The key contribution of SDP to software development is in the testing phase. The goal is to prioritize modules that are prone to errors. Such insights into the state of the project can allow developers to discover errors sooner or even prepare for them in certain environments.
The process of producing software is called the software development life cycle (SDLC), during which SDP should be applied to minimize the number of errors. Software is almost never perfect, and it has become common practice for developers to release unfinished projects and work on them iteratively through updates. With the use of code writing conventions and other principles, errors can be minimized, but they can never be fully rooted out. Therefore, a robust system that would assist developers in finding errors is required. With such a system, errors could be found earlier, which could prevent substantial financial losses. To measure the quality of the code, defect density is calculated. The most common form of this measurement is the number of defects per thousand lines of code (KLOC).
The advancements in AI technologies, specifically machine learning (ML), show great potential not only for various NLP applications [5] but in other areas as well, including human–robot interaction [6], predictive modeling [7], and time series forecasting [8]. Considering that the code is a language as well, this potential should also be explored for predictions regarding programming languages. When natural and programming languages are compared, various similarities can be observed, but the programming languages are stricter in terms of writing rules, which aids pattern recognition. The quality control process can be improved through AI use, which can simplify the error detection process and ensure better test coverage. To detect the errors in text, it is necessary to understand what techniques are applied through NLP. Tokenization is one of the key concepts, and its role is to segment text into smaller units for processing. Stemming reduces words to their basic forms, while lemmatization identifies the root of the word through dictionaries and morphological analysis. Different parts of sentences such as nouns, verbs, and adjectives are identified through the parsing process. The potential of NLP for the SDP problem is unexplored, and there is a gap in the literature.
However, the use of these techniques does not come without a cost. Most sophisticated ML algorithms have extensive parameters that directly affect their performance. The process of finding the best subset of these parameters is called hyperparameter optimization. The perfect solution in most cases cannot be achieved in a realistic amount of time, which is why the goal of the optimization is to find a suboptimal solution that is very close to the best solution. However, a model with hyperparameters optimized for one use case does not yield the same performance for other use cases. This is the problem that is described by the no free lunch (NFL) theorem [9] that states that no solution provides equally high performance across all use cases. This work builds upon preceding research [10] to provide a more in-depth comparison between optimizers as well as explore the potential of NLP to boost error detection in software source code.
This paper proposes a two-tier framework to perform software defect forecasting. Numerous models were examined, and convolutional neural networks (CNNs) exhibited great capabilities when tasked to perform feature extraction. Nevertheless, previous research papers [11,12,13] have shown that if CNN’s dense layer is replaced by XGBoost or AdaBoost, classification is improved. In other words, XGBoost and AdaBoost classifiers perform classification tasks considerably better than traditional CNN’s dense layers. As pre-research activities, other classifiers were also examined, such as support vector machine (SVM), decision tree (DT), and random forest (RF), but XGBoost and AdaBoost yielded the best scores. Consequently, these findings motivated the approach taken in this manuscript, namely to apply CNN and XGBoost in a two-tier structure for software defect prediction.
In this research, the well-known particle swarm optimization (PSO) algorithm [14] was selected as the optimizer used to tune the model for this particular task. Once again, with respect to the NFL theorem [9], a universally superior optimizer that can obtain the best scores for all possible optimization challenges does not exist. Thus, one must experiment with several different optimization methods before deciding upon the appropriate approach. Notwithstanding other newer available optimizers such as the red fox optimization algorithm [15] or reptile search algorithm [16], PSO attained very encouraging outcomes in smaller-scale experiments, and therefore, additional alterations were introduced to achieve even better scores for this particular problem.
To summarize, the principal practical contributions of this study may be outlined in the following manner:
  • A proposal for a two-layer framework that combines CNN and ML classifiers for software defect detection.
  • The presentation of an NLP-based approach in combination with ML classifiers for software defect detection.
  • The presentation of a modified optimization algorithm that builds upon the admirable performance of the original PSO.
  • The application of explainable AI techniques to the best-performing model in order to determine the features’ importance in model decisions.
Similarly, this paper offers theoretical contributions to the field of metaheuristics optimization of machine learning models in the domain of software testing:
  • Metaheuristics algorithms replace brute-force and manual hyperparameter tuning by employing a theoretically grounded method to deal with high-dimensional problems.
  • These hybrid models bring theoretical advances where metaheuristics algorithms handle tuning tasks while machine learning models perform the classification task.
  • Interpretability of the model allows for the transparent optimization of complex models. This is crucial for the trust and accountability of software defect prediction models in case human quality assurance experts need to validate models’ forecasts.
  • Finally, this kind of approach represents a major shift towards automated systems for software defect identification, where metaheuristics algorithms may be used for the automation of many decisions typically requiring human intervention, therefore reducing the project costs and increasing confidence in the project quality.
The described research is presented in the following manner: Section 2 gives fundamentals of the applied techniques, Section 3 describes the main method used in this research, Section 4 provides the settings of the performed experiments along with necessary information for experiment reproduction, Section 5 follows with the results of experiments, and Section 6 provides the final thoughts on performed experiments along with possibilities for future research.

2. Related Works

This section first provides a brief survey of current software defect prediction problems. Afterwards, a brief introduction to the methodologies used in this paper is given.

2.1. Background

Software defects, otherwise known as bugs, are software errors that result in incorrect and unexpected behavior. Various scenarios produce errors, but most come from design flaws, unexpected component interaction, and coding. Such errors affect performance, but in some cases, the security of the system can be compromised as well. Through the use of statistical methods, historical data analysis, and ML, such cases can be predicted and reduced.
The use of ensemble learning for SDP was explored by Ali et al. [17]. The authors presented a framework that trains random forest, SVM, and naive Bayes individually, and these methods are later combined into an ensemble technique through the soft voting method. The proposed method obtained one of the highest results while maintaining solid stability. On the other hand, Khleel et al. [18] explored the potential of a bidirectional long short-term memory (Bi-LSTM) model for the same problem. The technique was tested with random and synthetic minority oversampling techniques. While high performance was exhibited in both experiments, the authors concluded that random oversampling was better due to the class imbalances in the SDP problem. Finally, Zhang et al. [19] proposed a framework based on a deep Q-learning network (DQN) with the goal of removing irrelevant features. The authors performed thorough testing of several techniques with and without DQN, and the research utilized a total of 22 SDP datasets.
The application of NLP techniques in ML is broad, and Jim et al. [5] have provided a detailed overview of ML techniques that are suitable for such use cases. The use cases reported include fields from healthcare to entertainment. The paper also provides an in-depth study into the nature of computational techniques used in NLP. Different approaches include image–text, audio–visual, audio–image–text, labeling, document-level, sentence-level, phrase-level, and word-level approaches to sentiment analysis. Furthermore, the authors surveyed a list of datasets for this use case. Briciu et al. [20] proposed a model based on bidirectional encoder representations from transformers (BERT) NLP technique for SDP. The authors compared RoBERTa and CodeBERT-MLM language models for the capture of semantics and context, while a neural network was used as a classifier. Finally, Dash et al. [21] performed an NLP-based review for sustainable marketing. The authors performed text mining through keywords and string selection, while the semantic analysis was performed through the use of term frequency–inverse document frequency.
The use of metaheuristics as optimizers has been proven to yield substantial performance increases when combined with various AI techniques [22]. Jain et al. [23] proposed a hybrid ensemble learning technique optimized by an algorithm from the swarm intelligence subgroup of ML algorithms. Some notable examples of metaheuristic optimizers include the well-established variable neighborhood search (VNS) [24], artificial bee colony (ABC) [25], and bat algorithm (BA) [26]. Some recent additions also include COLSHADE [27] and the recently introduced sinh cosh optimizer (SCHO) [28]. Optimizers have shown promising outcomes when applied in several fields, including time series forecasting [29], healthcare [30], and anomaly detection in medical time series data [31]. Applying hybrid optimizers to parameter tuning has demonstrated decent outcomes in previous works as well [32,33,34].
The SDP problem is considered to have nondeterministic polynomial time hardness (NP-hard), as the problem cannot be solved by manual search in a realistic amount of time. Hence, optimizers are to be applied, such as swarm intelligence algorithms. However, the process is not as simple since for every use case, a custom set of AI techniques has to be applied due to the NFL theorem [9]. Furthermore, the problem of hyperparameter optimization, which is necessary in order to get the most out of the performance of AI techniques, is considered NP-hard as well. The application of NLP techniques for SDP is limited in the literature, indicating a research gap that this work aims to bridge.
Therefore, metaheuristic algorithms can considerably improve the ability of machine learning models to identify defects by addressing several different aspects of the model itself along with the data it uses. Besides the hyperparameter tuning task which is employed in this manuscript, where metaheuristics can help in finding the best model configuration, there are other possible ways metaheuristics can be employed. First, metaheuristics may be utilized for optimizing the feature selection process since software defect datasets are typically high dimensional. This can lead to faster training times and reduced overfitting. Moreover, metaheuristic algorithms can be utilized to tackle the problem of data imbalance by smart resampling of the dataset or by producing synthetic instances. Finally, metaheuristics methods may be used to mitigate the noisy data and outliers frequently present in software defect datasets. Cleaner datasets consequently result in better generalization capabilities of the model along with higher accuracy.

2.2. Text Mining Techniques

Introduced in 2018 [35], the BERT technique is based on the mechanism of attention, which is used to determine the meaning of a text or sentence. This technique was developed by a Google research team and has since been applied for diverse NLP purposes [36]. The BERT method can be modified for special use cases like text summarization and meaning inference. The attention mechanism allows for the transformers, which are the basis of BERT, to shift focus between the segments of the input. With the use of this technique, the understanding of the sentence is improved, due to the meaning of words it provides. This process can be applied in parallel over multiple parts of sentences, increasing the speed of data processing and overall efficiency. Furthermore, the natural language is processed bidirectionally, as stated in the name of the technique. The benefit of such a mechanism is in gaining the context of the word, as it analyzes the words that come before the analyzed word as well as those that come after it.
The masked language model (MLM) is used for the training of BERT. The goal of such training is to determine a hidden word from only its context. This results in the model learning the structure of the language and its patterns. Training with large datasets can prepare BERT models for specialized tasks due to their transfer learning functionality. The application of BERT in software defect detection remains largely unexplored. BERT has proven a reliable and efficient technique across many different NLP use cases where there are many more factors to account for than with programming languages. The latter are stricter in their rules of writing and are more prone to repeating patterns, which reduces the complexity of the text analysis.
A promising text mining technique with roots in statistical text analysis is term frequency–inverse document frequency (TF-IDF) [37]. As the name implies, it is comprised of two components. Term frequency is determined as the ratio between a term occurring in a document versus the total number of terms in said document. Mathematically, it can be determined as Equation (1):
$$\mathrm{TF}(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d} \tag{1}$$
The second term in TF-IDF defines the inverse document frequency. The role of this factor is to emphasize important words while decreasing the importance of filler and stop words that make up the grammatical structure of a language. It can be determined as per Equation (2), where $N$ is the total number of documents in the corpus $D$:
$$\mathrm{IDF}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \tag{2}$$
Combining TF and IDF (TF-IDF) calculates the importance of terms within a document in relation to the entire corpus as per Equation (3).
$$\mathrm{TF\text{-}IDF}(t, d, D) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t, D) \tag{3}$$
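Equations (1)–(3) can be illustrated with a minimal sketch in plain Python. The toy corpus of tokenized code fragments and the function names are hypothetical and serve only as an illustration; this is not the encoding pipeline used in the experiments. Library implementations often apply a smoothed IDF variant, so their values may differ from this unsmoothed form.

```python
import math
from collections import Counter


def tf(term, doc):
    """Equation (1): occurrences of `term` divided by the document length."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Equation (2): log of corpus size over the number of documents containing `term`."""
    containing = sum(1 for doc in corpus if term in doc)
    # Assumes the term occurs in at least one document (no smoothing applied).
    return math.log(len(corpus) / containing)


def tf_idf(term, doc, corpus):
    """Equation (3): product of term frequency and inverse document frequency."""
    return tf(term, doc) * idf(term, corpus)


# Hypothetical toy corpus: each document is a list of source code tokens.
corpus = [
    ["if", "x", "is", "None", "return"],
    ["return", "x", "+", "y"],
    ["while", "x", "return"],
]

common = tf_idf("x", corpus[0], corpus)      # token present in every document
rare = tf_idf("while", corpus[2], corpus)    # token present in one document only
```

A token that appears in every document, such as `x` here, receives a score of zero, while a token confined to a single document keeps a positive score.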
Text mining approaches like BERT and TF-IDF have recently been utilized frequently within machine learning approaches in different fields, including text document clustering [38,39], spam email filtering [40,41], sentiment analysis [42], improving customer satisfaction [43], prediction of financial markets based on news [44], and reviews classification [45], to name a few.

2.3. CNN

CNNs are recognized in the deep learning field for their high performance and versatility [46,47]. The multi-layered visual cortex of the animal brain served as the main inspiration for this technique. The information is moved between the layers where the input of the next layer is the output of the previous layer. The information gets filtered and processed in this manner. The complexity of the data is decreased through each layer, while the ability of the model to detect finer details increases. The architecture of the CNN model consists of the convolutional, pooling, and fully connected layers. The filters that are most commonly used are 3 × 3, 5 × 5, and 7 × 7.
To achieve the highest performance, CNNs require hyperparameter optimization. The most commonly tuned parameters based on their impact on performance are the number of kernels and kernel size, learning rate, batch size, the amount of each type of layer used, weight regularization, activation function, and dropout rate. This process is considered NP-hard; however, the metaheuristic methods have proven to yield results when applied as optimizers for CNN hyperparameter tuning [23].
The convolutional function provides the input vector described in Equation (4):
$$z_{i,j,k}^{[l]} = w_k^{[l]} x_{i,j}^{[l]} + b_k^{[l]}, \tag{4}$$
where the output of the $k$-th feature at position $(i, j)$ in layer $l$ is given as $z_{i,j,k}^{[l]}$, the input at $(i, j)$ is $x$, the filters are given as $w$, and the bias is shown as $b$.
The activation function follows the convolution function shown in Equation (5).
$$g_{i,j,k}^{[l]} = g(z_{i,j,k}^{[l]}) \tag{5}$$
where the non-linear function applied to the convolution output is given as $g(\cdot)$.
After the activation function, the pooling layers process the input toward resolution reduction. Different pooling functions can be used, and some of the most popular ones are average and max pooling. This behavior is described in Equation (6).
$$y_{i,j,k}^{[l]} = \mathrm{pooling}(g_{i,j,k}^{[l]}), \tag{6}$$
where $y$ represents the result of pooling.
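The chain of Equations (4)–(6) can be sketched for a single filter in plain Python. The choice of ReLU for $g(\cdot)$, max pooling with a 2 × 2 window, valid padding, and unit stride are assumptions made for illustration; the input and filter values are hypothetical.

```python
def conv2d(x, w, b):
    """Equation (4) for one filter: slide w over x (valid padding, stride 1)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(w[a][c] * x[i + a][j + c] for a in range(kh) for c in range(kw)) + b
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(z):
    """Equation (5) with g chosen as ReLU (an assumption for this sketch)."""
    return [[max(0.0, v) for v in row] for row in z]


def max_pool(g, size=2):
    """Equation (6) with max pooling over non-overlapping size x size windows."""
    return [
        [
            max(g[i + a][j + c] for a in range(size) for c in range(size))
            for j in range(0, len(g[0]) - size + 1, size)
        ]
        for i in range(0, len(g) - size + 1, size)
    ]


# Hypothetical 5 x 5 input and 2 x 2 filter.
x = [[i + j for j in range(5)] for i in range(5)]
w = [[1, 0], [0, 1]]

z = conv2d(x, w, b=-4)   # 4 x 4 feature map
y = max_pool(relu(z))    # 2 x 2 map after activation and pooling
```

The 5 × 5 input shrinks to a 4 × 4 feature map after the convolution and to 2 × 2 after pooling, which mirrors the resolution reduction described above.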
Finally, classification is performed by the fully connected layers. The most common applications of these layers include the softmax layers for multi-class datasets, and the sigmoid function is applied for binary classification along gradient descent methods.
The weights and biases are adjusted in each iteration (called an epoch in the case of the CNN), with the goal of minimizing the loss function provided by Equation (7).
$$H(p, q) = -\sum_{x} p(x) \ln(q(x)) \tag{7}$$
where the discrete variable $x$ has two defined distributions: $p$ and $q$.
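As a brief worked example of Equation (7), the sketch below computes the cross-entropy between a true distribution $p$ and a predicted distribution $q$. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero and is not part of the equation.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Equation (7): H(p, q) = -sum over x of p(x) * ln(q(x))."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))


# Hypothetical binary case: the true class is the second one.
p = [0.0, 1.0]
confident = cross_entropy(p, [0.2, 0.8])
uncertain = cross_entropy(p, [0.5, 0.5])
```

The loss is lower when the predicted distribution places more probability on the true class, which is the behavior the training process exploits.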
CNNs have found applications in a broad spectrum of different real-world problems recently. They are utilized with considerable success in medicine [11,48,49], network security [12], agriculture [50,51], physical defect identification [52,53], and traffic control [54,55].

2.4. AdaBoost

Over the previous decade, ML has grown steadily as a field. As a result, a large number of algorithms have been produced, and their performance varies considerably with the area of application. Adaptive boosting (AdaBoost) aims to overcome this by applying weaker algorithms as a group. The algorithm was developed by Freund and Schapire in 1995 [56]. Algorithms that are considered weak perform classification only slightly better than random guessing. In each iteration, the AdaBoost technique adds a further weak classifier and assigns it a weight derived from its accuracy. Classifiers that make more errors receive lower weights, while classifiers with good classification results receive higher weights.
The error of a weak classifier is calculated according to Equation (8).
$$\epsilon_t = \frac{\sum_{i=1}^{N} w_{i,t} \cdot I(h_t(x_i) \neq y_i)}{\sum_{i=1}^{N} w_{i,t}} \tag{8}$$
where the weighted error in the $t$-th iteration is given as $\epsilon_t$, the number of training samples is given as $N$, and the weight of the $i$-th training sample during the $t$-th iteration is $w_{i,t}$. $h_t(x_i)$ represents the predicted label, and $y_i$ shows the true label. The function $I(\cdot)$ provides 0 for false cases and 1 for true cases.
After the weights have been established, the modification process for weights begins for new classifiers. To achieve accurate classification, large groups of classifiers should be used. The combination of submodels and their results represents a linear model. The weights are calculated in the ensemble as per Equation (9).
$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) \tag{9}$$
where $\alpha_t$ changes for each weak learner and represents its weight in the final model. The weights are updated according to Equation (10).
$$w_{i,t+1} = w_{i,t} \cdot \exp\left(-\alpha_t \cdot y_i \cdot h_t(x_i)\right) \tag{10}$$
where $y_i$ marks the true label of the $i$-th instance, $h_t(x_i)$ represents the prediction of the weak learner for the $i$-th instance in the $t$-th round, and $w_{i,t}$ denotes the weight of the $i$-th instance in the $t$-th round.
The advantages of AdaBoost are that it reduces bias through learning from previous iterations, while the ensemble technique reduces variance and helps prevent overfitting. Therefore, AdaBoost can provide robust prediction models. However, AdaBoost is sensitive to noisy data and outliers.
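One boosting round following Equations (8)–(10) can be sketched as below, assuming labels and predictions encoded as −1 and +1 and the standard sign convention in which misclassified samples gain weight. The final renormalization is a common additional step that is assumed here rather than taken from the equations, and the sample data are hypothetical.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One boosting round for labels and predictions in {-1, +1}."""
    # Equation (8): weighted share of misclassified samples.
    missed = sum(w for w, h, y in zip(weights, predictions, labels) if h != y)
    epsilon = missed / sum(weights)
    # Equation (9): weight of this weak learner (assumes 0 < epsilon < 1).
    alpha = 0.5 * math.log((1 - epsilon) / epsilon)
    # Equation (10): misclassified samples gain weight, correct ones lose it.
    updated = [
        w * math.exp(-alpha * y * h)
        for w, h, y in zip(weights, predictions, labels)
    ]
    # Renormalize so the weights sum to one (assumed extra step).
    total = sum(updated)
    return epsilon, alpha, [w / total for w in updated]


# Hypothetical round: four samples, the last one is misclassified.
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]
epsilon, alpha, new_weights = adaboost_round([0.25] * 4, predictions, labels)
```

After this round, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.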

2.5. XGBoost

The XGBoost method is recognized as a high-performing algorithm [57]. However, the highest performance is achieved only through hyperparameter tuning. The foundation of the algorithm is ensemble learning, which exploits many weaker models. Optimization with regularization along gradient boosting significantly boosts performance. The motivation behind the technique is to manage complex input–target relationships through previously observed patterns.
The objective function of the XGBoost, which combines the loss function and the regularization term, is provided in Equation (11).
$$\mathrm{obj}(\Theta) = L(\Theta) + \Omega(\Theta), \tag{11}$$
where $\Theta$ shows the hyperparameter set, $L(\Theta)$ provides the loss function, and $\Omega(\Theta)$ shows the regularization term used for model complexity management.
Mean squared error (MSE) is used as the loss function, as given in Equation (12).
$$L(\Theta) = \sum_{i} (y_i - \hat{y}_i)^2, \tag{12}$$
where $\hat{y}_i$ provides the value predicted for the target of each instance $i$, and $y_i$ provides the actual value.
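For classification, the discrepancy between actual and predicted values can be measured with the logistic loss given in Equation (13).
$$L(\Theta) = \sum_{i} \left[ y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i) \ln\left(1 + e^{\hat{y}_i}\right) \right]. \tag{13}$$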
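The two loss functions and the objective of Equation (11) can be sketched in plain Python as follows. The regularization value passed to `objective` is a hypothetical placeholder, since the exact form of the regularization term is not specified here; the sketch does not call the XGBoost library.

```python
import math


def squared_error(y_true, y_pred):
    """Equation (12): sum of squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def logistic_loss(y_true, raw_scores):
    """Equation (13): logistic loss for labels in {0, 1} and raw (pre-sigmoid) scores."""
    return sum(
        y * math.log(1 + math.exp(-s)) + (1 - y) * math.log(1 + math.exp(s))
        for y, s in zip(y_true, raw_scores)
    )


def objective(loss_value, regularization_value):
    """Equation (11): loss plus a penalty on model complexity."""
    return loss_value + regularization_value


# Hypothetical values for illustration.
mse_part = squared_error([1.0, 0.0], [0.5, 0.5])
log_part = logistic_loss([1, 0], [2.0, -2.0])
total = objective(log_part, regularization_value=0.1)
```

A raw score with the correct sign and a large magnitude drives the logistic loss toward zero, while the regularization term keeps the objective from rewarding overly complex models.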

3. Methods

In this section, the baseline PSO metaheuristic is explained first. Afterwards, a modified variant of the PSO algorithm is suggested and explained in detail. Finally, the suggested optimization framework is briefly introduced.

3.1. Basic PSO

The PSO algorithm was introduced in 1995 by Kennedy and Eberhart [14]. The main inspiration behind the technique includes the flocking of fish and birds. The individuals in the population are particles. Both discrete and continuous optimization problems have been solved successfully with this algorithm [58].
First, an initial velocity is assigned to each particle in the population. The velocity update combines three components that steer the particle toward the optimal solution: the particle's previous velocity, the best position the particle has obtained so far, and the best position obtained by its neighbors, as shown in Equation (14).
$$\begin{aligned} v_i &\leftarrow v_i + U(0, \phi_1) \otimes (p_i - x_i) + U(0, \phi_2) \otimes (p_g - x_i) \\ x_i &\leftarrow x_i + v_i, \end{aligned} \tag{14}$$
where $\otimes$ represents component-wise multiplication, all components of $v_i$ are kept within the range $[-V_{max}, +V_{max}]$, and $U(0, \phi_i)$ is a vector of random numbers uniformly distributed between 0 and $\phi_i$, generated for each particle in each generation.
Every particle is a possible solution in D-dimensional space. Equation (15) describes the position of the solution.
$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}) \tag{15}$$
The best position obtained is given by Equation (16).
$$P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}) \tag{16}$$
The velocities are described by Equation (17).
$$V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}) \tag{17}$$
The best solution found so far by a particle is denoted as $pbest$, and the best in the group is denoted as $gbest$. Both values are used to calculate the particle's next position. When applying the inertia weight approach, the velocity update can be modified as shown in Equation (18).
$$v_{id} = w \cdot v_{id} + c_1 \cdot r_1 \cdot (p_{id} - x_{id}) + c_2 \cdot r_2 \cdot (p_{gd} - x_{id}) \tag{18}$$
where $v_{id}$ indicates particle velocity, $x_{id}$ shows the current position, $w$ gives the inertia factor, the relative cognitive influence is provided by $c_1$, the social component influence is given as $c_2$, and $r_1$ and $r_2$ are random numbers. $pbest$ and $gbest$ are provided through $p_{id}$ and $p_{gd}$, respectively.
The inertia factor is modeled by Equation (19).
$$w = w_{max} - \frac{w_{max} - w_{min}}{T} \cdot t \tag{19}$$
where the initial weight is provided by $w_{max}$, the final weight is $w_{min}$, the maximum number of iterations is $T$, and the current iteration is $t$.
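The inertia-weight variant of PSO described by Equations (18) and (19) can be sketched in plain Python as below. The control parameter values, the velocity limit, the clamping of positions to the search bounds, and the sphere test function are all assumptions chosen for illustration; they are not the settings used in the experiments.

```python
import random


def pso(fitness, dim, bounds, n_particles=20, iters=100,
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimize `fitness` with inertia-weight PSO; returns (gbest, fitness, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = hi - lo  # assumed velocity limit
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_fit = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for t in range(iters):
        w = w_max - (w_max - w_min) / iters * t  # Equation (19)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (18): inertia, cognitive, and social components.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Position update, clamped to the bounds (an assumption).
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            f = fitness(x[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = x[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def sphere(position):
    """Hypothetical test function with its minimum at the origin."""
    return sum(c * c for c in position)


best, best_fit, history = pso(sphere, dim=3, bounds=(-5.0, 5.0))
```

Because the global best is only replaced by strictly better solutions, the recorded `history` never increases from one iteration to the next.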

3.2. Modified PSO

The original PSO demonstrates admirable performance, especially for an algorithm proposed in 1995. However, given the recent leaps in optimization research, advanced techniques could be integrated into the PSO to help improve overall performance. The drawbacks, such as the lack of exploration innate to the PSO, can therefore be addressed and overall performance boosted. This work explores a multi-population method combined with principles borrowed from the genetic algorithm (GA) [59] to help boost exploration. The introduced algorithm is therefore dubbed the multi-population PSO (MPPSO).
The first stage in the MPPSO algorithm is the generation of a population of agents. This is when the first modification is introduced. The first 50% of agents is generated as per Equation (20).
$$X_{i,j} = lb_j + \psi \cdot (ub_j - lb_j), \tag{20}$$
where $X_{i,j}$ denotes the $j$-th factor assigned to individual $i$, $lb_j$ and $ub_j$ signify parameter boundaries for $j$, and $\psi$ is a factor used to introduce randomness, selected from a uniform $[0, 1]$ distribution. The remaining 50% of the population is initialized using the quasi-reflection-based learning (QRL) [60] mechanism as per Equation (21).
$$X_j^{qr} = \mathrm{rnd}\left(\frac{lb_j + ub_j}{2}, x_j\right), \tag{21}$$
In Equation (21), $\mathrm{rnd}$ is used to determine an arbitrary value within the limits $\left[\frac{lb_j + ub_j}{2}, x_j\right]$. By incorporating QRL into the initialization process, diversification is achieved during the initialization, increasing the chances of locating more promising solutions.
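The two-part initialization of Equations (20) and (21) can be sketched as follows. The sketch assumes an even population size and assumes that each quasi-reflected agent is derived from one of the randomly generated agents; the function name and the example boundaries are hypothetical.

```python
import random


def init_population(n_agents, lb, ub, seed=0):
    """Return n_agents solutions: half random, half quasi-reflected (n_agents assumed even)."""
    rng = random.Random(seed)
    dim = len(lb)
    half = n_agents // 2
    # Equation (20): uniform sampling inside the parameter boundaries.
    random_half = [
        [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
        for _ in range(half)
    ]
    # Equation (21): a random value between the interval midpoint and x_j.
    quasi_half = [
        [rng.uniform((lb[j] + ub[j]) / 2, x[j]) for j in range(dim)]
        for x in random_half
    ]
    return random_half + quasi_half


# Hypothetical two-parameter search space.
population = init_population(10, lb=[0.0, -1.0], ub=[1.0, 1.0])
```

Each quasi-reflected agent lies between the center of the search space and its source agent, so the second half of the population samples the region closer to the center.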
Once these two populations are generated, they are kept separate during the optimization, taking a multi-population-inspired approach. Two mechanisms inspired by the GA are used to communicate between the populations: genetic crossover and genetic mutation. Crossover is applied to combine parameters between two agents, creating offspring agents. Mutation is used to introduce further diversification within existing agents by randomly tweaking parameter values within constraints. These two processes are described in Figure 1.
When tackling optimization problems, it is essential to balance the exploration and exploitation of the search space. In the modified algorithm, this is achieved by varying which agents participate in the simulated reproduction using crossover. Specifically, in the first half of iterations, two random agents from each subpopulation are selected and recombined. Mutation is then applied with a mutation probability ($mp$) of 0.1. Similarly, crossover between agents is governed by the crossover probability parameter ($cp$), which is set to an empirically determined value of 0.1 for this work. In the latter half of iterations, the best-performing agents from each subpopulation are chosen. Thus, exploration is encouraged in the early optimization stages, and intensification is supported in the latter stages.
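The genetic operators and the parent selection schedule can be sketched as below. Uniform crossover, mutation by resampling a parameter within its boundaries, selection of one parent from each subpopulation, and a minimization objective are all assumptions made for illustration, since the exact operator variants are not spelled out in the text; all function names are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover (an assumed variant): each parameter comes from either parent."""
    mask = [rng.random() < 0.5 for _ in parent_a]
    child_a = [a if m else b for a, b, m in zip(parent_a, parent_b, mask)]
    child_b = [b if m else a for a, b, m in zip(parent_a, parent_b, mask)]
    return child_a, child_b


def mutate(agent, lb, ub, rng, mp=0.1):
    """Resample each parameter within its boundaries with probability mp."""
    return [
        rng.uniform(lb[j], ub[j]) if rng.random() < mp else value
        for j, value in enumerate(agent)
    ]


def select_parents(sub_a, sub_b, fit_a, fit_b, t, T, rng):
    """Random agents in the first half of the run, best agents afterwards."""
    if t < T / 2:
        return rng.choice(sub_a), rng.choice(sub_b)

    def best(population, fitness):
        # Lower fitness is treated as better (minimization assumed).
        return population[min(range(len(population)), key=fitness.__getitem__)]

    return best(sub_a, fit_a), best(sub_b, fit_b)


rng = random.Random(1)
a, b = [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]
child_a, child_b = crossover(a, b, rng)
mutant = mutate(child_a, lb=[0.0] * 3, ub=[1.0] * 3, rng=rng)
```

Switching from random to best parents halfway through the run reflects the shift from exploration to intensification described above.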
The pseudocode for the presented modified algorithm is provided in Algorithm 1.
Algorithm 1 MPPSO pseudocode
  • Initialize 1 2 particle population P array using random positions and velocities on D dimensions in the search space
  • Apply QRL to generate remaining agents in population
  • Separate agents into two equal sub populations
  • repeat
  •    Evaluate the desired optimization fitness function of each particle in D variables
  •    Compare the outcome with p b e s t i
  •    if Current value is better than the p b e s t i  then
  •      Set p b e s t i to the value of current best
  •      Set p i to the current location x i in D-dimensional space
  •    end if
  •    Assign the index of the best solution so far to the variable g
  •    Adjust the particle’s position according to Equation (14)
  •    for all Agents in P do
  •       Generate random value $R$ in range $[0, 1]$
  •       if $R > cp$ then
  •         if $t < T/2$ then
  •            Apply crossover to random agents in subpopulation
  •         else
  •            Apply crossover to best agents in each subpopulation
  •         end if
  •       end if
  •    end for
  • until The criterion is met
  • return Best-performing agent as solution
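
The recombination step of Algorithm 1 could be sketched as follows. This is a minimal illustration and not the authors' implementation: the uniform crossover operator, the resampling mutation, the choice of parents within a single subpopulation, and all function names are assumptions. Only the cp and mp values of 0.1, the R > cp test, and the switch at T/2 are taken from the text.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    # Assumed operator: each parameter is copied from either parent with equal chance.
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(agent, bounds, mp, rng):
    # Assumed operator: with probability mp, resample a parameter inside its bounds.
    return [rng.uniform(lo, hi) if rng.random() < mp else value
            for value, (lo, hi) in zip(agent, bounds)]

def select_parents(subpop, scores, t, T, rng):
    # First half of the run: two random agents; second half: the two best agents.
    if t < T / 2:
        return rng.sample(subpop, 2)
    ranked = sorted(range(len(subpop)), key=lambda i: scores[i], reverse=True)
    return subpop[ranked[0]], subpop[ranked[1]]

def recombine(subpops, scores, bounds, t, T, cp=0.1, mp=0.1, rng=None):
    rng = rng or random.Random()
    offspring = []
    for subpop, subpop_scores in zip(subpops, scores):
        for _ in subpop:
            if rng.random() > cp:  # same test as in Algorithm 1
                a, b = select_parents(subpop, subpop_scores, t, T, rng)
                child = uniform_crossover(a, b, rng)
                offspring.append(mutate(child, bounds, mp, rng))
    return offspring  # not evaluated here; scoring is left to the next iteration
```

The sketch reuses cached scores when ranking agents, so it adds no fitness function evaluations of its own.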

3.3. Complexity of the Proposed MPPSO

When discussing the complexity of metaheuristic methods, a common practice is to express it with respect to the number of fitness function evaluations (FFEs) per execution of the algorithm. The FFE is recognized as the most computationally expensive operation during the execution of metaheuristics, as shown in numerous publications [61,62,63,64]. This research follows this established practice and utilizes FFEs to evaluate the complexity of MPPSO.
As previously described in detail, the suggested novel MPPSO introduces two modifications to the baseline PSO. However, since novel individuals are not evaluated after creation (their evaluation is deferred to the following round), the complexity of the novel MPPSO remains the same as the baseline PSO. More precisely, the computing complexity of MPPSO represented in Big O notation is provided in Equation (22):
$$O(N) = N + T \cdot N \tag{22}$$
Here, N represents the count of individuals in the population, while T represents the maximal count of iterations in a single run.
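As a worked illustration, taking the population size of 10 agents and the eight iterations stated in the experimental setup of this work (and assuming these values apply unchanged to a given run), the expression evaluates to:

$$N + T \cdot N = 10 + 8 \cdot 10 = 90 \ \text{FFEs per run}, \qquad 30 \cdot 90 = 2700 \ \text{FFEs over 30 independent runs}$$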

3.4. Proposed Experimental Optimization Framework

The framework proposed in this work utilizes a two-layer approach. The first layer focuses on CNN optimization. Metaheuristic optimizers are applied to CNN parameter selection. Data are processed and prepared accordingly. Text data are suitably encoded, and CNN optimization is carried out in the first layer of the framework.
Once a suitable CNN model is generated, it is applied to the whole dataset, and the final layer is tapped to recover intermediate data. These intermediate data are then used as inputs for XGBoost and AdaBoost classifiers. Separate optimizations are carried out for each classifier. Further details of the experimental setup are provided in the following section. A flowchart of the proposed framework is provided in Figure 2.

4. Experimental Setup

In this work, two separate sets of simulations are carried out. The first set is conducted using a group of five datasets: KC1, JM1, CM1, KC2, and PC1. These datasets are part of NASA’s PROMISE repository aimed at SDP. The instances in the datasets represent various software modules, and the features depict the quality of that code. McCabe [65] and Halstead [66] metrics are represented through 22 features. The McCabe methodology emphasizes less-complex code through the reduction of pathways, while the Halstead methodology uses counting techniques, with the underlying logic being that the larger the code, the more error-prone it is. A two-layer approach is applied to these simulations. The class balance in the utilized dataset is provided in Figure 3. The first utilized dataset is unbalanced, with 22.5% of samples representing software with errors and 77.5% without.
A correlational heat map of the features is provided in Figure 4.
The second set of simulations utilizes a dataset composed of around 1000 crowd-sourced Python programming problems. The dataset is publicly available (https://paperswithcode.com/dataset/mbpp, accessed on 1 August 2024). The problems are at the beginner level and consist of a description of the task, the solution, and three automated test cases. A generative model based on GPT-2 is used to generate solutions to these problems from the problem descriptions. Generated solutions, both those that pass and those that fail at least one test case, are collected and combined into a unified dataset. The resulting dataset combines ground truth solutions with generated code that has some errors. The final combined dataset is composed of 10,000 samples and is balanced.
Both datasets are separated into training and testing portions for simulations. An initial 70% of data is used for training, and the latter 30% is used for evaluation. Evaluations are carried out using a standard set of classification evaluation metrics including accuracy, recall, F1-score, and precision. The Matthews correlation coefficient (MCC) is used as the objective function and is determined as per Equation (23). An indicator function, Cohen’s kappa, is also tracked and is determined as per Equation (24). Metaheuristics are tasked with selecting optimal parameters, maximizing the MCC score.
$$\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{23}$$
Here, true positives (TP) denotes samples correctly classified as positive, and true negatives (TN) denotes instances correctly classified as negative. Similarly, false positives (FP) and false negatives (FN) denote samples incorrectly classified as positive and negative, respectively.
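The objective function can be computed directly from the four confusion-matrix counts. The following is a minimal sketch of the standard MCC definition; the function name and the convention of returning 0 when the denominator vanishes are assumptions made for illustration, not details given in this work.

```python
import math

def mcc(tp, tn, fp, fn):
    # Numerator: agreement minus disagreement between predictions and labels.
    numerator = tp * tn - fp * fn
    # Denominator: geometric normalization over the four marginal totals.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is empty.
    return numerator / denominator if denominator else 0.0
```

A perfect classifier yields 1, a fully inverted one yields -1, and predictions no better than chance yield 0, which makes the score suitable for the unbalanced dataset described above.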
$$\text{Cohen’s kappa} = \frac{\text{Classification} - \text{Classification Expected}}{1 - \text{Classification Expected}} \tag{24}$$
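Cohen’s kappa can likewise be sketched from confusion-matrix counts. The snippet below assumes that "Classification" denotes the observed accuracy and "Classification Expected" denotes the agreement expected by chance from the marginal totals, which is the usual reading of the coefficient; the function name is hypothetical.

```python
def cohens_kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    # Observed agreement: fraction of samples classified correctly.
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # assumed convention when chance agreement is total
    return (observed - expected) / (1.0 - expected)
```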
In the first layer, CNN parameters are optimized. The respective parameters and their constraints are provided in Table 1.
In the second optimization layer, intermediate outputs of the CNN are used. The final dense layer is recorded during classifications of all the samples available in the dataset. This is once again separated into 70% for training and 30% for testing. These are then utilized to train and optimize AdaBoost and XGBoost models. Parameter ranges for AdaBoost and XGBoost are provided in Table 2 and Table 3, respectively.
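The 70/30 separation described for both layers could be sketched as a simple ordered split. This assumes the split is taken in the stored sample order without shuffling, as the wording "initial" and "latter" suggests; the function name is hypothetical.

```python
def sequential_split(samples, train_fraction=0.7):
    # The first train_fraction of the samples is used for training,
    # and the remainder is held out for testing.
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]
```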
Several optimizers are included in a comparative analysis against the proposed modified algorithm. These include the original PSO [14] as well as several other established optimizers such as the GA [59] and VNS [24]. Additional optimizers included in the comparison are the ABC [25], BA [26] and COLSHADE [27] optimizers. A recently proposed SCHO [28] optimizer is also explored. Each optimizer is implemented using the original parameter settings suggested in the work that introduced it. Optimizers are allocated a population size of 10 agents and are allowed a total of eight iterations to locate promising outcomes within the given search range. Simulations are carried out through 30 independent executions to facilitate further analysis.
The framework is designed using an object-oriented paradigm. Each metaheuristic is implemented as a class, and the instance of this class accepts the instance of a solution class through dependency injection. Each solution consists of several components, each corresponding to one hyperparameter being optimized. One of the main functionalities of the framework is the fitness function class, where each model has its own fitness function. In each iteration, when evaluating solutions, the instance of the fitness function class is called, and each solution is passed as the parameter. For every solution in the population, an appropriate CNN/AdaBoost/XGBoost model is created and initialized with the hyperparameter values from that particular solution. This model is then trained and tested, and as a result, the objective function value is returned (MCC value) along with the indicator function value (error rate). Moreover, the created model is returned as well, as it must be stored because the best model undergoes SHAP interpretation.
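The object-oriented design described above might be organized roughly as follows. This is a sketch under stated assumptions and not the authors' code: the class names, the random-search stand-in for a metaheuristic, and the toy scoring function that replaces model training are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Solution:
    params: dict                      # one entry per hyperparameter
    objective: float = float("-inf")  # objective function value (maximized)
    indicator: float = float("inf")   # indicator function value (tracked only)
    model: object = None              # trained model kept for later interpretation

class FitnessFunction:
    def __init__(self, build_and_score):
        self.build_and_score = build_and_score  # model-specific callable

    def __call__(self, solution):
        model, objective, indicator = self.build_and_score(solution.params)
        solution.model, solution.objective, solution.indicator = model, objective, indicator
        return solution

class RandomSearch:
    # Stand-in metaheuristic; its collaborators are injected through the constructor.
    def __init__(self, solution_factory, fitness, population_size=10, iterations=8):
        self.solution_factory = solution_factory
        self.fitness = fitness
        self.population_size = population_size
        self.iterations = iterations

    def run(self):
        best = None
        for _ in range(self.iterations):
            for _ in range(self.population_size):
                candidate = self.fitness(self.solution_factory())
                if best is None or candidate.objective > best.objective:
                    best = candidate
        return best

def toy_build_and_score(params):
    # Hypothetical replacement for training and testing a real model.
    score = 1.0 - abs(params["learning_rate"] - 0.3)
    return {"trained_with": params}, score, 1.0 - score

rng = random.Random(1)
factory = lambda: Solution({"learning_rate": rng.uniform(0.01, 1.0)})
best = RandomSearch(factory, FitnessFunction(toy_build_and_score)).run()
```

Keeping the model on the returned solution mirrors the requirement that the best model remains available for interpretation after the search ends.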
For NLP simulations, only the second layer of the framework is utilized. Input text is encoded using TF-IDF encoding to a maximum of 1000 tokens. These are used as inputs for model training and evaluation. Optimization is carried out using the second layer of the introduced framework, and a comparative analysis is conducted under identical conditions as previous simulations in order to demonstrate the flexibility of the proposed approach.
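The capped TF-IDF encoding can be illustrated with a small standard-library sketch. The tokenizer, the smoothed IDF variant, and the choice of keeping the most frequent terms are assumptions, as the text does not specify them; a library vectorizer with a feature cap of 1000 would be the more likely choice in practice.

```python
import math
import re
from collections import Counter

def tfidf_encode(documents, max_features=1000):
    # Assumed tokenizer: identifiers and keywords, lower-cased.
    tokenized = [re.findall(r"[A-Za-z_]\w*", doc.lower()) for doc in documents]
    # Keep only the max_features most frequent terms across the corpus.
    totals = Counter(tok for doc in tokenized for tok in doc)
    vocab = [term for term, _ in totals.most_common(max_features)]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    n_docs = len(documents)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows
```

Each row is a fixed-length vector that can be passed to the second-layer classifiers regardless of the length of the original code sample.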

5. Simulation Outcomes

The following subsections present the outcomes of the conducted simulations. First, the outcomes of the two-layer simulations with the traditional dataset are presented. Second, two simulations with NLP are presented using only the second layer of the framework. The bold font is used throughout the section to indicate the best results for the reported experiments.

5.1. Layer One CNN Simulations

First-layer CNN optimization outcomes in terms of objective function are provided in Table 4. The introduced MPPSO demonstrates the most favorable outcomes, attaining an objective metric score of 0.273280 in the best-case execution and 0.262061 in the worst. In turn, scores for the mean and median are also the best with 0.268824 and 0.269873, respectively. The highest stability rate is demonstrated by the SCHO optimizer; however, this optimizer does not manage to match the favorable outcomes demonstrated by other algorithms, suggesting a lack of exploration or exploitation potential.
First-layer CNN optimization outcomes in terms of indicator function are provided in Table 5. The introduced MPPSO demonstrates similarly favorable results, attaining the best outcomes with a score of 0.231201. Mean and median scores of 0.233446 and 0.232697 are also the best among evaluated optimizers. A high rate of stability in terms of indicator metric is demonstrated by the original PSO; however, its performance is worse than that of the modified version in all cases.
Visual comparisons in terms of optimizer stability are provided in Figure 5. While the stability of the introduced optimizer is not the highest compared to all other optimizers, algorithms with higher rates of stability attain less favorable outcomes. This suggests that VNS, BA, and COLSHADE, for instance, overly focus on local optima rather than looking for more favorable outcomes. This lack of exploration leads to higher stability but less favorable outcomes in this test case. The introduced optimizer also outperformed the base PSO while demonstrating higher stability, suggesting that the introduced modification has a positive influence on performance.
The convergence rate of algorithms provides valuable feedback on optimizer behavior. Convergence rates for both the objective and indicator functions are tracked and provided in Figure 6 and Figure 7 for each optimizer. The introduced optimizer demonstrates favorable performance, locating the best solution by iteration eight. Other optimizers stagnate prior to reaching favorable outcomes, suggesting that the boost in exploration incorporated into the MPPSO helps overcome the shortcomings inherent to the original PSO. Similar conclusions can be made in terms of indicator function convergence demonstrated in Figure 7. As the indicator function was not the target of the optimization, there is a small decrease in the first iteration of the optimization. However, the best outcome of the comparison is once again located by the introduced optimizer by iteration 3.
Comparisons between the best-performing models optimized by each algorithm included in the comparative analysis are provided in Table 6. The introduced optimizer demonstrates a clear dominance in terms of accuracy as well as precision. However, other optimizers demonstrate certain favorable characteristics as well, which is somewhat to be expected and further supports the NFL theorem of optimization.
Further details for the best-performing model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix provided in Figure 8 and Figure 9. A plot of the best-performing model is also provided in Figure 10. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 7.

5.2. Layer Two Simulations

Once a suitable CNN model is trained, the final layer is utilized as an input for AdaBoost and XGBoost classifiers. These classifiers are themselves subjected to hyperparameter optimization using a set of metaheuristic optimizers. This process is conducted in the second layer of the optimization framework. The following subsections present the outcomes of these simulations.

5.2.1. AdaBoost Layer 2 Classifications

The outcomes of AdaBoost model optimization in terms of objective function are provided in Table 8. The introduced MPPSO demonstrates the best outcomes in the worst, mean, and median scenarios, attaining scores of 0.276185, 0.279771, and 0.280600, respectively. In terms of the best-performing model, the MPPSO and original PSO managed to attain models that match performance. The highest stability rate is demonstrated by the SCHO; however, this optimizer does not manage to match the favorable outcomes demonstrated by other algorithms, suggesting a lack of exploration or exploitation potential.
Second-layer AdaBoost optimization outcomes in terms of indicator function are provided in Table 9. Interestingly, in terms of indicator function outcomes, the BA demonstrates superior performance to all tested algorithms. As the indicator function was not the target of the optimization process, these findings are notable and further support the NFL theorem. High stability rates are demonstrated by the SCHO algorithm.
Visual comparisons in terms of optimizer stability are provided in Figure 11. Two interesting spikes in objective function distributions can be observed in the introduced optimizer outcomes. Interestingly, other optimizers overly focus on the first (lower) region. However, the introduced optimizer overcomes these limitations and locates a more promising region. The BA locates a promising region within the search space. However, the BA stability suffers greatly, providing by far the lowest stability compared to other evaluated optimizers and leading to overall mixed results.
Convergence rates for both the objective and indicator functions for L2 AdaBoost optimizations are tracked and provided in Figure 12 and Figure 13 for each optimizer. The proposed MPPSO shows a favorable rate of convergence, locating a good solution by the seventh iteration and further improving on this solution by iteration 19. While other optimizers manage to find decent outcomes, a fast rate of convergence limits the quality of the located solution. Similar conclusions can be made in terms of indicator function convergence demonstrated in Figure 13, where the optimizer once again locates the best solution near the end of the optimization in iteration 19.
Comparisons between the best-performing AdaBoost L2 models optimized by each algorithm included in the comparative analysis are provided in Table 10. The best-performing models match performance across several metrics, which suggests that several optimizers are equally well suited for AdaBoost optimization, demonstrating a favorable accuracy of 0.772166 and outperforming scores attained by using just CNN in L1.
Further details on the best-performing AdaBoost L2 model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix provided in Figure 14 and Figure 15. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 11.

5.2.2. XGBoost Layer 2 Classifications

The outcomes of XGBoost L2 model optimization in terms of objective function are provided in Table 12. The introduced MPPSO demonstrates the best outcomes in the best-case simulation with an objective function score of 0.283956. Interestingly, in terms of worst, mean, and median functions, the VNS algorithm demonstrates the most favorable outcomes. In turn, this algorithm also demonstrates a high rate of stability in the conducted simulations in this challenge.
Second-layer XGBoost optimization outcomes in terms of indicator function are provided in Table 13. The introduced optimizer does not attain the best outcomes in terms of indicator function results, as the ABC algorithm attains the most favorable results. These interesting findings further support the NFL theorem, indicating that no single approach is equally suited to all challenges and across all metrics. In terms of indicator function stability, the PSO attained the best outcomes.
Visual comparisons in terms of optimizer stability are provided in Figure 16. While the introduced optimizer attains more favorable outcomes in terms of objective function, an evident disadvantage can be observed in comparison to the original PSO, with the PSO algorithm favoring a better location within the search space in more cases. An overall low stability can be observed for the BA algorithm, while the ABC algorithm focuses on a suboptimal search space in many solution instances.
Convergence rates for both the objective and indicator functions for L2 XGBoost optimizations are tracked and provided in Figure 17 and Figure 18 for each optimizer. The proposed optimizer manages to find a promising region in iteration 19, surpassing solutions located by other algorithms. However, this improvement comes at the cost of indicator function outcomes, where the performance is slightly reduced. Oftentimes, the tradeoff in terms of one metric can mean improvements in another. As the indicator function was not the primary goal of the optimization, the MPPSO algorithm attained favorable optimization performance overall.
Comparisons between the best-performing XGBoost L2 models optimized by each algorithm included in the comparative analysis are provided in Table 14. The favorable performance of the VNS algorithm is undeniable for this challenge, further supporting the NFL theorem of optimization. Nevertheless, constant experimentation is needed to determine suitable optimizers for any given problem. Furthermore, it is important to consider all the metrics when determining which algorithm is the best suited to the demands of a given optimization challenge.
Further details on the best-performing XGBoost L2 model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix provided in Figure 19 and Figure 20. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 15.

5.3. Simulations with NLP

The second layer of the framework is also utilized in conjunction with text mining techniques to handle NLP. Encoding is handled using TF-IDF vectorization to a limit of 1000 features. Vectorized values are then used as inputs for the second layer of the framework, where AdaBoost and XGBoost are used to handle classifications. Optimization is carried out on these classification models to improve overall performance through parameter selection.

5.3.1. AdaBoost NLP Classifications

The outcomes of AdaBoost NLP classifier model optimization in terms of objective function are provided in Table 16. The introduced MPPSO demonstrates the best outcomes in the best-case simulation, with an objective function score of 0.959947 alongside the BA optimizer. In terms of stability as well as the iterations and mean outcomes, the original PSO demonstrates favorable performance, while the BA attained the best results for the median performance. The PSO also demonstrates a high rate of stability in terms of objective scores.
AdaBoost NLP classifier optimization outcomes in terms of indicator function are provided in Table 17. High stability rates are showcased by the PSO, and favorable outcomes are achieved by the VNS algorithm, attaining the best overall indicator score, as well as the ABC, which demonstrated the best outcomes in the worst case execution.
Visual comparisons in terms of AdaBoost NLP optimizer stability are provided in Figure 21. High stability rates are demonstrated by the PSO; however, the best scores are still achieved by the modified optimizer, suggesting that the modification holds further potential. A tradeoff is always present when tackling multiple optimization problems. It is essential to explore multiple potential optimizers in order to determine a suitable approach, as stated by the NFL theorem.
Convergence rates for both the objective and indicator functions for NLP AdaBoost optimizations are tracked and provided in Figure 22 and Figure 23 for each optimizer. While all optimizers show a favorable convergence rate, several optimizers dwell in suboptimal regions within the search space. The introduced optimizer overcomes the local minimum issue and locates a promising solution in iteration 18 of the optimization. Similar observations can be made in terms of indicator functions, with the best solution located by the optimizer in the latter iterations.
Comparisons between the best-performing AdaBoost NLP classifier models optimized by each algorithm included in the comparative analysis are provided in Table 18. Favorable outcomes can be observed for all evaluated models, suggesting that metaheuristic optimizers, as well as the AdaBoost classifier, are well suited to the problem of error detection through applied NLP.
Further details for the best-performing AdaBoost NLP classifier model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix provided in Figure 24 and Figure 25. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 19.

5.3.2. XGBoost NLP Classifications

The outcomes of the XGBoost NLP classifier model optimization in terms of objective function are provided in Table 20. The introduced optimizer demonstrates the most favorable outcomes for the best-case scenario. Nevertheless, the original PSO demonstrates overall favorable scores across all other test scenarios. Accordingly, the overall stability of the PSO is also equally favorable, with the highest rates of stability in comparison to other optimizers.
XGBoost NLP classifier optimization outcomes in terms of indicator function are provided in Table 21. High stability rates are demonstrated by the PSO, while the ABC algorithm demonstrates the best scores across all other metrics for the indicator function.
Visual comparisons in terms of XGBoost NLP optimizer stability are provided in Figure 26. While the best scores in the best-case scenario are showcased by the introduced optimizer, PSO outcomes are also favorable, with many solutions overcoming local optima and locating more promising outcomes overall.
Convergence rates for both the objective and indicator functions for NLP XGBoost optimizations are tracked and provided in Figure 27 and Figure 28 for each optimizer. The introduced optimizer overcomes local minimum traps, locating a favorable outcome in iteration 18, which suggests that the boost in exploration, especially in later stages, helps overall performance improve, while the baseline PSO sticks to less favorable regions. These outcomes are mirrored in terms of indicator function, with the best solution being determined by the introduced optimizer in iteration 18.
Comparisons between the best-performing XGBoost NLP classifier models optimized by each algorithm included in the comparative analysis are provided in Table 22. The best performance is demonstrated by the introduced optimizer across many metrics, attaining an accuracy of 0.983893. The GA also demonstrated good performance in terms of precision for non-error and recall for errors in code, further affirming the NFL theorem.
Further details for the best-performing XGBoost NLP classifier model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix provided in Figure 29 and Figure 30. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 23.

5.4. Comparative Analysis with Baseline Models

In this section, comparisons to the baseline models are carried out in order to showcase the capabilities of the suggested approach in both considered scenarios. The baseline models were utilized with default settings without being tuned by metaheuristics. These models included plain AdaBoost and XGBoost, CatBoost, random forest, support vector machine, and three deep neural network models (DNNs), containing two and three hidden layers. The first DNN employed two hidden layers with 64 and 128 neurons; the second DNN consisted of three hidden layers with 32, 64, and 128 neurons; and the final DNN had the same layer structure as the previous one with dropout set to 0.1. The parameters of DNNs were chosen using the GridSearch method. Tests were carried out with different DNN architectures, with the three reported here obtaining the best results. The outcomes for the NASA dataset regarding the achieved accuracy and MCC value are provided in Table 24. It can be concluded that the best accuracy was obtained by the CNN-AB-MPPSO approach, while CNN-XG-MPPSO attained the best MCC score. It can also be seen that the DNN with three layers had lower performance due to overfitting, with the second three-layered DNN, which employed dropout, obtaining better results.
The outcomes of comparisons for NLP simulations, with respect to the obtained accuracy and MCC score, are showcased in Table 25. Once again, it is possible to observe the superior behavior of the proposed models against baselines. In this case, XG-MPPSO attained the best accuracy and the best MCC score.

5.5. Statistical Validation

When comparing algorithms that rely on randomness, such as metaheuristics, it is essential to verify the statistical significance of findings, as a single simulation may not be representative of overall performance. With this in mind, simulations in this work were carried out in 30 independent runs, and data were collected for further analysis.
Two types of statistical tests can be applied to validate the statistical significance of outcomes: parametric and non-parametric tests. The first step, therefore, involves determining whether the use of parametric tests is warranted or whether non-parametric analysis should be applied. The use of parametric tests is justified if the conditions of independence, normality, and homoscedasticity [67] are met. Additionally, simulations must have been conducted using independent random seeds for the independence criterion to be met. Homoscedasticity is confirmed via Levene’s test [68], which resulted in p-values of 0.68 for each case, indicating that this condition is also met.
The last condition, normality, is assessed using the Shapiro–Wilk test [69], with p-values computed for each method included in the comparative analysis. As all the values are below a threshold of 0.05, the null hypothesis (H0) may be rejected, suggesting that outcomes do not originate from normal distributions. These findings are further reinforced by the objective function KDE diagrams presented in Figure 31. The outcomes of the Shapiro–Wilk test are provided in Table 26.
As one of the criteria needed to justify the safe use of parametric tests is not met, further evaluations are conducted via non-parametric tests. The Wilcoxon signed-rank test is therefore applied to compare the performance of the MPPSO to other algorithms included in the comparative analysis. Test scores are presented in Table 27. A threshold of α = 0.05 is not exceeded in any of the test cases, indicating that the outcomes attained by the MPPSO algorithm hold statistically significant improvements over other evaluated methods.
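The pairwise comparison could be sketched with a simplified Wilcoxon signed-rank test. The version below is an illustration only: it assumes the 30 runs of two optimizers are paired by run index, uses the normal approximation for the two-sided p-value, and omits tie and continuity corrections; an established statistics library would normally be used instead.

```python
import math

def wilcoxon_signed_rank(x, y):
    # Paired differences; zero differences are discarded.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Normal approximation of the null distribution of W.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return w, p_value
```

A p-value below the chosen threshold of 0.05 would indicate that the paired outcomes of the two optimizers differ significantly.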

5.6. Best Model Interpretations

Model interpretation plays an increasingly important role in modern AI research. Oftentimes, understanding the factors and their degree of influence on decisions can help one to better understand the problem at hand as well as to detect any unintended biases in data. There are several emerging techniques for interpreting models, and for simpler models, these interpretations are fairly straightforward. However, more complicated models, and models that utilize a multi-tier framework, can be more difficult to subject to interpretation.
This work utilizes SHAP analysis in order to determine feature importance and feature impact on optimized model decisions. SHAP analysis utilizes an approach rooted in game theory, treating features as contestants in a cooperative competition. The contributions of each of the features as well as their collaborative contributions are considered when determining importance. Importance can also be analyzed on a global and local level, making SHAP a versatile and promising approach for model analysis.
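The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a toy cooperative game. This sketch is not the SHAP library and not the procedure applied to the models in this work; the payoff function and feature indices are hypothetical, and exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(value, n_features):
    # Average each feature's marginal contribution over every join order.
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = frozenset()
        for feature in order:
            with_feature = coalition | {feature}
            phi[feature] += value(with_feature) - value(coalition)
            coalition = with_feature
    return [total / len(orderings) for total in phi]

def toy_game(coalition):
    # Hypothetical payoff: two useful features with an interaction, one useless.
    payoff = 0.0
    if 0 in coalition:
        payoff += 2.0
    if 1 in coalition:
        payoff += 1.0
    if 0 in coalition and 1 in coalition:
        payoff += 1.0
    return payoff

phi = shapley_values(toy_game, 3)
```

The interaction payoff is shared equally between the two cooperating features, the unused feature receives zero, and the values sum to the payoff of the full coalition.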
Feature importances for the best-performing models constructed in L1 and L2 simulations are provided in Figure 32. Each constructed classifier places a slightly different importance on each feature. However, I, Uniq_OP, IOBland, and d are universally highly ranked. The best CNN and AdaBoost models show the highest impact for the I feature, while this feature is second best for the XGBoost classifier, with the highest importance being placed on the Uniq_OP feature.
Feature importances for the best-performing models in NLP simulations are likewise provided in Figure 33 for the AdaBoost model and in Figure 34 for the XGBoost model. TF-IDF vectorized feature importances are once again interpreted using SHAP analysis, and the top 20 contributing features are shown for both models. Both models place a high importance on the def keyword as well as on the print and write keywords, where errors can often occur. The AdaBoost model places a significant importance on the def keyword, with print holding the second spot with a significantly lower importance. The XGBoost model, however, has a more even distribution of importance. While def holds a high importance, the second highest level of impact is placed on the write keyword followed by print with a slightly decreased importance.
Understanding feature importance in model decisions can further aid in understanding the challenges associated with software defect detection. Furthermore, determining which features play an important role in these classifications can further aid in reducing the computational costs of models in deployment and aid in improving data collection in the future. Detecting hidden model biases is also essential for enforcing trust in model decisions, improving the generalizability and objectivity of decisions.

6. Conclusions

Software has become increasingly integral to societal infrastructure, governing critical systems. Ensuring the reliability of software is therefore paramount across various industries. As the demand for accelerated development intensifies, the manual review of code presents growing challenges, with testing frequently consuming more time than development. A promising method for detecting defects at the source code level involves the integration of AI with NLP. Given that software is composed of human-readable code, which directs machine operations, the validation of the highly diverse machine code on a case-by-case basis is inherently complex. Consequently, source code analysis offers a potentially effective approach to improving defect detection and preventing errors.
This work explores the advantages and challenges of utilizing AI for error detection in software code. Both classical and NLP methods are explored on two publicly available datasets with five experiments total being conducted. A two-layer optimization framework is introduced in order to manage the complex demands of error detection. A CNN architecture is utilized in the first layer to help process the large amounts of data in a more computationally efficient manner, with the second layer handling intermediate results using AdaBoost and XGBoost classifiers. An additional set of simulations using only the second layer of the framework in combination with TF-IDF encoding is also carried out in order to provide a comparison between emerging NLP and classical techniques. As optimizer performance is highly dependent on adequate parameter selection, a modified version of the well-established PSO is introduced, designed specifically for the needs of this research and with an aim of overcoming some of the known drawbacks of the original algorithm. A comparative analysis is carried out with several state-of-the-art optimizers, and the introduced approach demonstrates promising outcomes in several simulations.
The two-layer simulations improve upon the baseline accuracy of 0.768799 achieved by the CNN alone, reaching 0.772166 with the best-performing AdaBoost model and 0.771044 with the best-performing XGBoost classifier. This suggests that a two-layer approach can yield favorable outcomes while keeping computational demands modest in comparison to more complex network solutions. Optimization carried out using NLP demonstrates an accuracy of 0.979781 for the best-performing AdaBoost model and 0.983893 for the best-performing XGBoost model. Simulations are further validated using statistical evaluation to confirm the significance of the observations. The best-performing models are also subjected to SHAP analysis to determine feature importance and to help locate any potential hidden biases.
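As a rough aid for reading the two-layer accuracies, the sketch below converts the reported values into counts of correctly classified test samples. It assumes that all three models are scored on the same test split, whose size is taken from the support row of the detailed metric tables (2072 samples without errors and 601 with errors); the dictionary labels are illustrative.

```python
# Test-set size assumed from the support row of the metric tables: 2072 + 601.
N_TEST = 2072 + 601

reported_accuracy = {
    "CNN (layer 1)": 0.768799,
    "CNN + AdaBoost (layer 2)": 0.772166,
    "CNN + XGBoost (layer 2)": 0.771044,
}

# Accuracy times the number of test samples gives the count of correct predictions.
correct = {name: round(acc * N_TEST) for name, acc in reported_accuracy.items()}

baseline = correct["CNN (layer 1)"]
for name, count in correct.items():
    print(f"{name}: {count}/{N_TEST} correct ({count - baseline:+d} vs. layer 1)")
```

Under this assumption, the second layer corresponds to 9 additional correct classifications with AdaBoost and 6 with XGBoost out of 2673 test samples.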
It is worth noting that the extensive computational demands of the optimizations carried out in this work limit the number of optimizers that can be tested. Further limitations concern the population size and the number of iterations allocated to each optimization, both of which are constrained by hardware memory. Future work will aim to address these concerns as additional resources become available. Additional implementations of the proposed MPPSO, as well as emerging transformer-based architectures built on custom BERT encoding, are also planned for exploration in software defect detection.
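To illustrate why population size and iteration count drive the cost noted above, the following is a minimal sketch of the canonical PSO update, not the modified MPPSO introduced in this work. The coefficient values, the `sphere` test function, and all names are assumptions chosen for illustration. In a tuning setting, each call to `objective` would stand for training and scoring one model.

```python
import random


def pso(objective, bounds, population_size=10, iterations=30, seed=0,
        inertia=0.7, cognitive=1.5, social=1.5):
    """Minimize `objective` over box `bounds` with canonical PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(population_size)]
    velocities = [[0.0] * dim for _ in range(population_size)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    evaluations = population_size
    g = min(range(population_size), key=personal_score.__getitem__)
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(iterations):
        for i in range(population_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Keep the particle inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            evaluations += 1
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score, evaluations


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_score, n_evals = pso(sphere, [(-5.0, 5.0)] * 3,
                                population_size=10, iterations=30)
```

The number of objective evaluations is `population_size * (iterations + 1)`, so enlarging either setting increases the number of models that must be trained in direct proportion.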

Author Contributions

Methodology, N.B., M.Z., and A.P.; conceptualization, N.S., M.A., and V.G.; Writing—original draft preparation, L.J., A.P., N.S., and M.Z.; writing—review and editing, A.P., V.G., M.M., and M.A.; visualization, N.S., N.B., M.M., and L.J.; funding acquisition, M.A., M.Z., and N.B.; project administration, N.B. and L.J.; supervision, M.M. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Fund of the Republic of Serbia, grant No. 7373, Characterizing Crises-Caused Air Pollution Alternations Using an Artificial Intelligence-Based Framework (crAIRsis), and grant No. 7502, Intelligent Multi-Agent Control and Optimization Applied to Green Buildings and Environmental Monitoring Drone Swarms (ECOSwarm).

Data Availability Statement

The dataset used in this research is freely obtainable at the following URL: https://paperswithcode.com/dataset/mbpp (accessed on 5 September 2024). The final datasets used for testing both the L1 and L2 parts of the framework, along with the NLP dataset and code snippets from the framework, are available at the following GitHub URL: https://github.com/nbacanin/MDPI_Math_SoftwareDefects2024 (accessed on 1 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alyahyan, S.; Alatawi, M.N.; Alnfiai, M.M.; Alotaibi, S.D.; Alshammari, A.; Alzaid, Z.; Alwageed, H.S. Software reliability assessment: An architectural and component impact analysis. Tsinghua Sci. Technol. 2024. early access. [Google Scholar] [CrossRef]
  2. Zhang, H.; Gao, X.Z.; Wang, Z.; Wang, G. Guest Editorial of the Special Section on Neural Computing-Driven Artificial Intelligence for Consumer Electronics. IEEE Trans. Consum. Electron. 2024, 70, 3517–3520. [Google Scholar] [CrossRef]
  3. Mcmurray, S.; Sodhro, A.H. A study on ML-based software defect detection for security traceability in smart healthcare applications. Sensors 2023, 23, 3470. [Google Scholar] [CrossRef] [PubMed]
  4. Giray, G.; Bennin, K.E.; Köksal, Ö.; Babur, Ö.; Tekinerdogan, B. On the use of deep learning in software defect prediction. J. Syst. Softw. 2023, 195, 111537. [Google Scholar] [CrossRef]
  5. Jim, J.R.; Talukder, M.A.R.; Malakar, P.; Kabir, M.M.; Nur, K.; Mridha, M. Recent advancements and challenges of nlp-based sentiment analysis: A state-of-the-art review. Nat. Lang. Process. J. 2024, 6, 100059. [Google Scholar] [CrossRef]
  6. Zhang, C.; Chen, J.; Li, J.; Peng, Y.; Mao, Z. Large language models for human-robot interaction: A review. Biomim. Intell. Robot. 2023, 3, 100131. [Google Scholar] [CrossRef]
  7. Peng, Y.; He, M.; Hu, F.; Mao, Z.; Huang, X.; Ding, J. Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2405.07488. [Google Scholar] [CrossRef]
  8. Mizdrakovic, V.; Kljajic, M.; Zivkovic, M.; Bacanin, N.; Jovanovic, L.; Deveci, M.; Pedrycz, W. Forecasting bitcoin: Decomposition aided long short-term memory based time series modelling and its explanation with shapley values. Knowl.-Based Syst. 2024, 299, 112026. [Google Scholar] [CrossRef]
  9. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef]
  10. Zivkovic, T.; Nikolic, B.; Simic, V.; Pamucar, D.; Bacanin, N. Software defects prediction by metaheuristics tuned extreme gradient boosting and analysis based on Shapley Additive Explanations. Appl. Soft Comput. 2023, 146, 110659. [Google Scholar] [CrossRef]
  11. Zivkovic, M.; Bacanin, N.; Antonijevic, M.; Nikolic, B.; Kvascev, G.; Marjanovic, M.; Savanovic, N. Hybrid CNN and XGBoost model tuned by modified arithmetic optimization algorithm for COVID-19 early diagnostics from X-ray images. Electronics 2022, 11, 3798. [Google Scholar] [CrossRef]
  12. Salb, M.; Jovanovic, L.; Bacanin, N.; Antonijevic, M.; Zivkovic, M.; Budimirovic, N.; Abualigah, L. Enhancing internet of things network security using hybrid CNN and xgboost model tuned via modified reptile search algorithm. Appl. Sci. 2023, 13, 12687. [Google Scholar] [CrossRef]
  13. Jovanovic, L.; Jovanovic, D.; Antonijevic, M.; Nikolic, B.; Bacanin, N.; Zivkovic, M.; Strumberger, I. Improving phishing website detection using a hybrid two-level framework for feature selection and xgboost tuning. J. Web Eng. 2023, 22, 543–574. [Google Scholar] [CrossRef]
  14. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  15. Połap, D.; Woźniak, M. Red fox optimization algorithm. Expert Syst. Appl. 2021, 166, 114107. [Google Scholar] [CrossRef]
  16. Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158. [Google Scholar] [CrossRef]
  17. Ali, M.; Mazhar, T.; Al-Rasheed, A.; Shahzad, T.; Ghadi, Y.Y.; Khan, M.A. Enhancing software defect prediction: A framework with improved feature selection and ensemble machine learning. PeerJ Comput. Sci. 2024, 10, e1860. [Google Scholar] [CrossRef]
  18. Khleel, N.A.A.; Nehéz, K. Software defect prediction using a bidirectional LSTM network combined with oversampling techniques. Clust. Comput. 2024, 27, 3615–3638. [Google Scholar] [CrossRef]
  19. Zhang, Q.; Zhang, J.; Feng, T.; Xue, J.; Zhu, X.; Zhu, N.; Li, Z. Software Defect Prediction Using Deep Q-Learning Network-Based Feature Extraction. IET Softw. 2024, 2024, 3946655. [Google Scholar] [CrossRef]
  20. Briciu, A.; Czibula, G.; Lupea, M. A study on the relevance of semantic features extracted using BERT-based language models for enhancing the performance of software defect classifiers. Procedia Comput. Sci. 2023, 225, 1601–1610. [Google Scholar] [CrossRef]
  21. Dash, G.; Sharma, C.; Sharma, S. Sustainable marketing and the role of social media: An experimental study using natural language processing (NLP). Sustainability 2023, 15, 5443. [Google Scholar] [CrossRef]
  22. Velasco, L.; Guerrero, H.; Hospitaler, A. A literature review and critical analysis of metaheuristics recently developed. Arch. Comput. Methods Eng. 2024, 31, 125–146. [Google Scholar] [CrossRef]
  23. Jain, V.; Kashyap, K.L. Ensemble hybrid model for Hindi COVID-19 text classification with metaheuristic optimization algorithm. Multimed. Tools Appl. 2023, 82, 16839–16859. [Google Scholar] [CrossRef] [PubMed]
  24. Mladenović, N.; Hansen, P. Variable neighborhood search. Comput. Oper. Res. 1997, 24, 1097–1100. [Google Scholar] [CrossRef]
  25. Karaboga, D.; Akay, B. A comparative study of artificial bee colony algorithm. Appl. Math. Comput. 2009, 214, 108–132. [Google Scholar] [CrossRef]
  26. Yang, X.S.; Hossein Gandomi, A. Bat algorithm: A novel approach for global engineering optimization. Eng. Comput. 2012, 29, 464–483. [Google Scholar] [CrossRef]
  27. Gurrola-Ramos, J.; Hernàndez-Aguirre, A.; Dalmau-Cedeño, O. COLSHADE for real-world single-objective constrained optimization problems. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  28. Bai, J.; Li, Y.; Zheng, M.; Khatir, S.; Benaissa, B.; Abualigah, L.; Wahab, M.A. A sinh cosh optimizer. Knowl.-Based Syst. 2023, 282, 111081. [Google Scholar] [CrossRef]
  29. Damaševičius, R.; Jovanovic, L.; Petrovic, A.; Zivkovic, M.; Bacanin, N.; Jovanovic, D.; Antonijevic, M. Decomposition aided attention-based recurrent neural networks for multistep ahead time-series forecasting of renewable power generation. PeerJ Comput. Sci. 2024, 10, e1795. [Google Scholar] [CrossRef]
  30. Gajevic, M.; Milutinovic, N.; Krstovic, J.; Jovanovic, L.; Marjanovic, M.; Stoean, C. Artificial neural network tuning by improved sine cosine algorithm for healthcare 4.0. In Proceedings of the 1st International Conference on Innovation in Information Technology and Business (ICIITB 2022), Muscat, Oman, 9–10 November 2022; Atlantis Press: Dordrecht, The Netherlands, 2023; Volume 104, p. 289. [Google Scholar]
  31. Minic, A.; Jovanovic, L.; Bacanin, N.; Stoean, C.; Zivkovic, M.; Spalevic, P.; Petrovic, A.; Dobrojevic, M.; Stoean, R. Applying recurrent neural networks for anomaly detection in electrocardiogram sensor data. Sensors 2023, 23, 9878. [Google Scholar] [CrossRef] [PubMed]
  32. Jovanovic, L.; Milutinovic, N.; Gajevic, M.; Krstovic, J.; Rashid, T.A.; Petrovic, A. Sine cosine algorithm for simple recurrent neural network tuning for stock market prediction. In Proceedings of the 2022 30th Telecommunications Forum (TELFOR), Belgrade, Serbia, 15–16 November 2022; pp. 1–4. [Google Scholar]
  33. Jovanovic, L.; Djuric, M.; Zivkovic, M.; Jovanovic, D.; Strumberger, I.; Antonijevic, M.; Budimirovic, N.; Bacanin, N. Tuning xgboost by planet optimization algorithm: An application for diabetes classification. In Proceedings of the Fourth International Conference on Communication, Computing and Electronics Systems: ICCCES, Coimbatore, India, 15–16 September 2022; Springer: Singapore, 2023; pp. 787–803. [Google Scholar]
  34. Pavlov-Kagadejev, M.; Jovanovic, L.; Bacanin, N.; Deveci, M.; Zivkovic, M.; Tuba, M.; Strumberger, I.; Pedrycz, W. Optimizing long-short-term memory models via metaheuristics for decomposition aided wind energy generation forecasting. Artif. Intell. Rev. 2024, 57, 45. [Google Scholar] [CrossRef]
  35. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  36. Aftan, S.; Shah, H. A survey on bert and its applications. In Proceedings of the 2023 20th Learning and Technology Conference (L&T), Jeddah, Saudi Arabia, 26 January 2023; pp. 161–166. [Google Scholar]
  37. Qaiser, S.; Ali, R. Text mining: Use of TF-IDF to examine the relevance of words to documents. Int. J. Comput. Appl. 2018, 181, 25–29. [Google Scholar] [CrossRef]
  38. Bezdan, T.; Stoean, C.; Naamany, A.A.; Bacanin, N.; Rashid, T.A.; Zivkovic, M.; Venkatachalam, K. Hybrid fruit-fly optimization algorithm with k-means for text document clustering. Mathematics 2021, 9, 1929. [Google Scholar] [CrossRef]
  39. Radomirović, B.; Jovanović, V.; Nikolić, B.; Stojanović, S.; Venkatachalam, K.; Zivkovic, M.; Njeguš, A.; Bacanin, N.; Strumberger, I. Text document clustering approach by improved sine cosine algorithm. Inf. Technol. Control 2023, 52, 541–561. [Google Scholar] [CrossRef]
  40. Bacanin, N.; Zivkovic, M.; Stoean, C.; Antonijevic, M.; Janicijevic, S.; Sarac, M.; Strumberger, I. Application of natural language processing and machine learning boosted with swarm intelligence for spam email filtering. Mathematics 2022, 10, 4173. [Google Scholar] [CrossRef]
  41. Bacanin, N.; Jovanovic, L.; Janicijevic, S.; Antonijevic, M.; Sarac, M.; Zivkovic, M. Leveraging Metaheuristic Optimization to Enhance Insider Threat Detection Through Email Content Natural Language Processing. In Proceedings of the International Conference on Intelligent and Fuzzy Systems, Canakkale, Türkiye, 16–18 July 2024; Springer: Cham, 2024; pp. 569–577. [Google Scholar]
  42. Markovic, V.; Njegus, A.; Bulaja, D.; Zivkovic, T.; Zivkovic, M.; Mani, J.P.; Bacanin, N. Employee reviews sentiment classification using BERT encoding and AdaBoost classifier tuned by modified PSO algorithm. In Proceedings of the 2nd International Conference on Innovation in Information Technology and Business (ICIITB 2024), Muscat, Oman, 29–30 April 2024; Atlantis Press: Dordrecht, Netherlands, 2024; pp. 22–37. [Google Scholar]
  43. Mozumder, M.A.S.; Nguyen, T.N.; Devi, S.; Arif, M.; Ahmed, M.P.; Ahmed, E.; Bhuiyan, M.; Rahman, M.H.; Al Mamun, A.; Uddin, A. Enhancing Customer Satisfaction Analysis Using Advanced Machine Learning Techniques in Fintech Industry. J. Comput. Sci. Technol. Stud. 2024, 6, 35–41. [Google Scholar] [CrossRef]
  44. Ashtiani, M.N.; Raahemi, B. News-based intelligent prediction of financial markets using text mining and machine learning: A systematic literature review. Expert Syst. Appl. 2023, 217, 119509. [Google Scholar] [CrossRef]
  45. Iftikhar, S.; Alluhaybi, B.; Suliman, M.; Saeed, A.; Fatima, K. Amazon products reviews classification based on machine learning, deep learning methods and BERT. TELKOMNIKA Telecommun. Comput. Electron. Control. 2023, 21, 1084–1101. [Google Scholar] [CrossRef]
  46. Mittal, S.; Stoean, C.; Kajdacsy-Balla, A.; Bhargava, R. Digital assessment of stained breast tissue images for comprehensive tumor and microenvironment analysis. Front. Bioeng. Biotechnol. 2019, 7, 246. [Google Scholar] [CrossRef]
  47. Postavaru, S.; Stoean, R.; Stoean, C.; Caparros, G.J. Adaptation of deep convolutional neural networks for cancer grading from histopathological images. In Proceedings of the Advances in Computational Intelligence: 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Cadiz, Spain, 14–16 June 2017; Proceedings, Part II 14. Springer: Cham, Switzerland, 2017; pp. 38–49. [Google Scholar]
  48. Bacanin, N.; Jovanovic, L.; Stoean, R.; Stoean, C.; Zivkovic, M.; Antonijevic, M.; Dobrojevic, M. Respiratory Condition Detection Using Audio Analysis and Convolutional Neural Networks Optimized by Modified Metaheuristics. Axioms 2024, 13, 335. [Google Scholar] [CrossRef]
  49. Jovanovic, L.; Damaševičius, R.; Matic, R.; Kabiljo, M.; Simic, V.; Kunjadic, G.; Antonijevic, M.; Zivkovic, M.; Bacanin, N. Detecting Parkinson’s disease from shoe-mounted accelerometer sensors using convolutional neural networks optimized with modified metaheuristics. PeerJ Comput. Sci. 2024, 10, e2031. [Google Scholar] [CrossRef]
  50. Shah, S.A.; Lakho, G.M.; Keerio, H.A.; Sattar, M.N.; Hussain, G.; Mehdi, M.; Vistro, R.B.; Mahmoud, E.A.; Elansary, H.O. Application of drone surveillance for advance agriculture monitoring by Android application using convolution neural network. Agronomy 2023, 13, 1764. [Google Scholar] [CrossRef]
  51. Mendoza-Bernal, J.; González-Vidal, A.; Skarmeta, A.F. A Convolutional Neural Network approach for image-based anomaly detection in smart agriculture. Expert Syst. Appl. 2024, 247, 123210. [Google Scholar] [CrossRef]
  52. Zhang, D.; Hao, X.; Wang, D.; Qin, C.; Zhao, B.; Liang, L.; Liu, W. An efficient lightweight convolutional neural network for industrial surface defect detection. Artif. Intell. Rev. 2023, 56, 10651–10677. [Google Scholar] [CrossRef]
  53. Thomas, J.B.; Chaudhari, S.G.; Shihabudheen, K.; Verma, N.K. CNN-based transformer model for fault detection in power system networks. IEEE Trans. Instrum. Meas. 2023, 72, 1–10. [Google Scholar] [CrossRef]
  54. Huang, X.; Ye, Y.; Yang, X.; Xiong, L. Multi-view dynamic graph convolution neural network for traffic flow prediction. Expert Syst. Appl. 2023, 222, 119779. [Google Scholar] [CrossRef]
  55. Khan, M.A.; Park, H.; Chae, J. A lightweight convolutional neural network (CNN) architecture for traffic sign recognition in urban road networks. Electronics 2023, 12, 1802. [Google Scholar] [CrossRef]
  56. Freund, Y.; Schapire, R.E. A desicion-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory, Barcelona, Spain, 13–15 March 1995; Springer: Berlin/Heidelberg, Germany, 1995; pp. 23–37. [Google Scholar]
  57. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  58. Bacanin, N.; Simic, V.; Zivkovic, M.; Alrasheedi, M.; Petrovic, A. Cloud computing load prediction by decomposition reinforced attention long short-term memory network optimized by modified particle swarm optimization algorithm. Ann. Oper. Res. 2023, 1–34. [Google Scholar] [CrossRef]
  59. Mirjalili, S.; Mirjalili, S. Genetic algorithm. In Evolutionary Algorithms and Neural Networks: Theory and Applications; Springer: Sinagpore, 2019; pp. 43–55. [Google Scholar]
  60. Rahnamayan, S.; Tizhoosh, H.R.; Salama, M.M.A. Quasi-oppositional Differential Evolution. In Proceedings of the 2007 IEEE Congress on Evolutionary Computation, Singapore, 25–28 September 2007; pp. 2229–2236. [Google Scholar] [CrossRef]
  61. Yang, X.S. Nature-Inspired Metaheuristic Algorithms; Luniver Press: Beckington, UK, 2010. [Google Scholar]
  62. Yang, X.S. Optimization and metaheuristic algorithms in engineering. In Metaheuristics in Water, Geotechnical and Transport Engineering; Elsevier: Amsterdam, Netherlands, 2013; Volume 1, p. 23. [Google Scholar]
  63. Abdel-Basset, M.; Abdel-Fatah, L.; Sangaiah, A.K. Metaheuristic algorithms: A comprehensive review. In ComputationaL Intelligence for Multimedia Big Data on the Cloud With Engineering Applications; Elsevier: Amsterdam, The Netherlands, 2018; pp. 185–231. [Google Scholar]
  64. Kazikova, A.; Pluhacek, M.; Senkerik, R. How does the number of objective function evaluations impact our understanding of metaheuristics behavior? IEEE Access 2021, 9, 44032–44048. [Google Scholar] [CrossRef]
  65. McCabe, T. A Complexity Measure. IEEE Trans. Softw. Eng. 1976, 2, 308–320. [Google Scholar] [CrossRef]
  66. Halstead, M. Elements of Software Science; Elsevier: New York, NY, USA, 1977. [Google Scholar]
  67. LaTorre, A.; Molina, D.; Osaba, E.; Poyatos, J.; Del Ser, J.; Herrera, F. A prescription of methodological guidelines for comparing bio-inspired optimization algorithms. Swarm Evol. Comput. 2021, 67, 100973. [Google Scholar] [CrossRef]
  68. Glass, G.V. Testing homogeneity of variances. Am. Educ. Res. J. 1966, 3, 187–190. [Google Scholar] [CrossRef]
  69. Shapiro, S.S.; Francia, R. An approximate analysis of variance test for normality. J. Am. Stat. Assoc. 1972, 67, 215–216. [Google Scholar] [CrossRef]
Figure 1. Crossover and mutation mechanisms.
Figure 2. Proposed optimization framework flowchart.
Figure 3. Class distribution in NASA dataset.
Figure 4. NASA feature correlation heat map.
Figure 5. Layer 1 (CNN) objective and indicator outcome distributions.
Figure 6. Layer 1 (CNN) objective function convergence.
Figure 7. Layer 1 (CNN) indicator function convergence.
Figure 8. CNN-MPPSO optimized L1 model ROC curve.
Figure 9. CNN-MPPSO optimized L1 model confusion matrix.
Figure 10. Best CNN-MPPSO optimized L1 model visualization.
Figure 11. Layer 2 AdaBoost objective and indicator outcome distributions.
Figure 12. Layer 2 AdaBoost objective function convergence.
Figure 13. Layer 2 AdaBoost indicator function convergence.
Figure 14. Layer 2 AdaBoost optimized L1 model ROC curve.
Figure 15. Layer 2 AdaBoost optimized L1 model confusion matrix.
Figure 16. Layer 2 XGBoost objective and indicator outcome distributions.
Figure 17. Layer 2 XGBoost objective function convergence.
Figure 18. Layer 2 XGBoost indicator function convergence.
Figure 19. Layer 2 XGBoost optimized L1 model ROC curve.
Figure 20. Layer 2 XGBoost optimized L1 model confusion matrix.
Figure 21. NLP Layer 2 AdaBoost objective and indicator outcome distributions.
Figure 22. NLP Layer 2 AdaBoost objective function convergence.
Figure 23. NLP Layer 2 AdaBoost indicator function convergence.
Figure 24. NLP Layer 2 AdaBoost optimized model ROC curve.
Figure 25. NLP Layer 2 AdaBoost optimized model confusion matrix.
Figure 26. NLP Layer 2 XGBoost objective and indicator outcome distributions.
Figure 27. NLP Layer 2 XGBoost objective function convergence.
Figure 28. NLP Layer 2 XGBoost indicator function convergence.
Figure 29. NLP Layer 2 XGBoost optimized model ROC curve.
Figure 30. NLP Layer 2 XGBoost optimized model confusion matrix.
Figure 31. KDE diagrams for all five conducted simulations.
Figure 32. Best CNN, XGBoost, and AdaBoost model feature importances on the NASA dataset.
Figure 33. Best AdaBoost model feature importances on the NLP dataset.
Figure 34. Best XGBoost model feature importances on the NLP dataset.
Table 1. CNN parameter ranges subjected to optimization.
| Parameter | Range |
| --- | --- |
| Learning Rate | 0.0001–0.0030 |
| Dropout | 0.05–0.5 |
| Epochs | 10–20 |
| Number of CNN Layers | 1–3 |
| Number of Dense Layers | 1–3 |
| Number of Neurons in layer | 32–128 |
Table 2. AdaBoost parameter ranges subjected to optimization.
| Parameter | Range |
| --- | --- |
| Count of estimators | [10, 30] |
| Depth | [1, 5] |
| Learning rate | [0.01, 2] |
Table 3. XGBoost parameter ranges subjected to optimization.
| Parameter | Range |
| --- | --- |
| Learning rate | 0.1–0.9 |
| Minimum child weight | 1–10 |
| Subsample | 0.01–1.00 |
| Col sample by tree | 0.01–1.00 |
| Max depth | 3–10 |
| Gamma | 0.00–0.80 |
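Parameter ranges such as those in Tables 1 to 3 define the search space explored by a metaheuristic. The sketch below shows one common way to map a candidate solution, stored as a vector in the unit hypercube, onto the XGBoost ranges of Table 3. The key names follow the usual XGBoost argument names and are assumed to correspond to the table rows; rounding only the maximum depth to an integer and clipping out-of-range coordinates are likewise illustrative assumptions, not details confirmed by this work.

```python
# Ranges copied from Table 3; key names and int/float kinds are assumptions.
XGB_SPACE = {
    "learning_rate": (0.1, 0.9, float),
    "min_child_weight": (1.0, 10.0, float),
    "subsample": (0.01, 1.00, float),
    "colsample_bytree": (0.01, 1.00, float),
    "max_depth": (3, 10, int),
    "gamma": (0.00, 0.80, float),
}


def decode(unit_vector, space):
    """Map a vector in [0, 1]^d onto the parameter ranges in `space`."""
    params = {}
    for u, (name, (low, high, kind)) in zip(unit_vector, space.items()):
        u = min(1.0, max(0.0, u))  # clip coordinates that left the unit interval
        value = low + u * (high - low)
        # Integer-valued parameters are rounded to the nearest whole number.
        params[name] = int(round(value)) if kind is int else value
    return params


# Hypothetical candidate produced by an optimizer (fifth coordinate out of range).
candidate = decode([0.5, 0.0, 1.0, 0.25, 1.2, 0.5], XGB_SPACE)
```

Working in the unit hypercube lets the same optimizer code serve every model, since only the space dictionary changes between the CNN, AdaBoost, and XGBoost experiments.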
Table 4. Layer 1 (CNN) objective function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-MPPSO | 2.73 × 10⁻¹ | 2.62 × 10⁻¹ | 2.69 × 10⁻¹ | 2.70 × 10⁻¹ | 4.19 × 10⁻³ | 1.76 × 10⁻⁵ |
| CNN-PSO | 2.67 × 10⁻¹ | 2.53 × 10⁻¹ | 2.61 × 10⁻¹ | 2.61 × 10⁻¹ | 4.77 × 10⁻³ | 2.27 × 10⁻⁵ |
| CNN-GA | 2.66 × 10⁻¹ | 2.50 × 10⁻¹ | 2.59 × 10⁻¹ | 2.59 × 10⁻¹ | 5.98 × 10⁻³ | 3.57 × 10⁻⁵ |
| CNN-VNS | 2.66 × 10⁻¹ | 2.58 × 10⁻¹ | 2.63 × 10⁻¹ | 2.63 × 10⁻¹ | 2.55 × 10⁻³ | 6.52 × 10⁻⁶ |
| CNN-ABC | 2.69 × 10⁻¹ | 2.53 × 10⁻¹ | 2.62 × 10⁻¹ | 2.63 × 10⁻¹ | 5.42 × 10⁻³ | 2.94 × 10⁻⁵ |
| CNN-BA | 2.66 × 10⁻¹ | 2.59 × 10⁻¹ | 2.62 × 10⁻¹ | 2.62 × 10⁻¹ | 2.22 × 10⁻³ | 4.92 × 10⁻⁶ |
| CNN-SCHO | 2.69 × 10⁻¹ | 2.57 × 10⁻¹ | 2.61 × 10⁻¹ | 2.60 × 10⁻¹ | 3.99 × 10⁻³ | 1.59 × 10⁻⁵ |
| CNN-COLSHADE | 2.62 × 10⁻¹ | 2.56 × 10⁻¹ | 2.59 × 10⁻¹ | 2.60 × 10⁻¹ | 2.28 × 10⁻³ | 5.18 × 10⁻⁶ |
Table 5. Layer 1 (CNN) indicator function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-MPPSO | 2.31 × 10⁻¹ | 2.38 × 10⁻¹ | 2.33 × 10⁻¹ | 2.33 × 10⁻¹ | 2.81 × 10⁻³ | 7.89 × 10⁻⁶ |
| CNN-PSO | 2.43 × 10⁻¹ | 2.67 × 10⁻¹ | 2.49 × 10⁻¹ | 2.43 × 10⁻¹ | 1.26 × 10⁻² | 1.60 × 10⁻⁴ |
| CNN-GA | 2.33 × 10⁻¹ | 2.61 × 10⁻¹ | 2.52 × 10⁻¹ | 2.61 × 10⁻¹ | 1.54 × 10⁻² | 2.37 × 10⁻⁴ |
| CNN-VNS | 2.37 × 10⁻¹ | 2.40 × 10⁻¹ | 2.38 × 10⁻¹ | 2.37 × 10⁻¹ | 5.65 × 10⁻³ | 3.19 × 10⁻⁵ |
| CNN-ABC | 2.42 × 10⁻¹ | 2.43 × 10⁻¹ | 2.44 × 10⁻¹ | 2.42 × 10⁻¹ | 7.37 × 10⁻³ | 5.43 × 10⁻⁵ |
| CNN-BA | 2.39 × 10⁻¹ | 2.30 × 10⁻¹ | 2.39 × 10⁻¹ | 2.38 × 10⁻¹ | 6.46 × 10⁻³ | 4.17 × 10⁻⁵ |
| CNN-SCHO | 2.45 × 10⁻¹ | 2.64 × 10⁻¹ | 2.50 × 10⁻¹ | 2.45 × 10⁻¹ | 9.63 × 10⁻³ | 9.27 × 10⁻⁵ |
| CNN-COLSHADE | 2.39 × 10⁻¹ | 2.45 × 10⁻¹ | 2.46 × 10⁻¹ | 2.45 × 10⁻¹ | 7.86 × 10⁻³ | 6.18 × 10⁻⁵ |
Table 6. Best-performing optimized CNN-MPPSO L1 model detailed metric comparisons.
| Method | Metric | No Error | Error | Accuracy | Macro Avg. | Weighted Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-MPPSO | precision | 8.26 × 10⁻¹ | 4.81 × 10⁻¹ | 7.69 × 10⁻¹ | 6.53 × 10⁻¹ | 7.48 × 10⁻¹ |
| | recall | 8.89 × 10⁻¹ | 3.54 × 10⁻¹ | 7.69 × 10⁻¹ | 6.22 × 10⁻¹ | 7.69 × 10⁻¹ |
| | f1-score | 8.56 × 10⁻¹ | 4.08 × 10⁻¹ | 7.69 × 10⁻¹ | 6.32 × 10⁻¹ | 7.56 × 10⁻¹ |
| CNN-PSO | precision | 8.30 × 10⁻¹ | 4.53 × 10⁻¹ | 7.57 × 10⁻¹ | 6.41 × 10⁻¹ | 7.45 × 10⁻¹ |
| | recall | 8.63 × 10⁻¹ | 3.89 × 10⁻¹ | 7.57 × 10⁻¹ | 6.26 × 10⁻¹ | 7.57 × 10⁻¹ |
| | f1-score | 8.46 × 10⁻¹ | 4.19 × 10⁻¹ | 7.57 × 10⁻¹ | 6.32 × 10⁻¹ | 7.50 × 10⁻¹ |
| CNN-GA | precision | 7.69 × 10⁻¹ | 1.89 × 10⁻¹ | 6.80 × 10⁻¹ | 4.79 × 10⁻¹ | 6.38 × 10⁻¹ |
| | recall | 8.40 × 10⁻¹ | 1.28 × 10⁻¹ | 6.80 × 10⁻¹ | 4.84 × 10⁻¹ | 6.80 × 10⁻¹ |
| | f1-score | 8.03 × 10⁻¹ | 1.53 × 10⁻¹ | 6.80 × 10⁻¹ | 4.78 × 10⁻¹ | 6.57 × 10⁻¹ |
| CNN-VNS | precision | 8.26 × 10⁻¹ | 4.66 × 10⁻¹ | 7.63 × 10⁻¹ | 6.46 × 10⁻¹ | 7.45 × 10⁻¹ |
| | recall | 8.79 × 10⁻¹ | 3.63 × 10⁻¹ | 7.63 × 10⁻¹ | 6.21 × 10⁻¹ | 7.63 × 10⁻¹ |
| | f1-score | 8.52 × 10⁻¹ | 4.08 × 10⁻¹ | 7.63 × 10⁻¹ | 6.30 × 10⁻¹ | 7.52 × 10⁻¹ |
| CNN-ABC | precision | 8.30 × 10⁻¹ | 4.55 × 10⁻¹ | 7.58 × 10⁻¹ | 6.43 × 10⁻¹ | 7.46 × 10⁻¹ |
| | recall | 8.65 × 10⁻¹ | 3.89 × 10⁻¹ | 7.58 × 10⁻¹ | 6.27 × 10⁻¹ | 7.58 × 10⁻¹ |
| | f1-score | 8.47 × 10⁻¹ | 4.20 × 10⁻¹ | 7.58 × 10⁻¹ | 6.33 × 10⁻¹ | 7.51 × 10⁻¹ |
| CNN-BA | precision | 8.28 × 10⁻¹ | 4.60 × 10⁻¹ | 7.61 × 10⁻¹ | 6.44 × 10⁻¹ | 7.44 × 10⁻¹ |
| | recall | 8.73 × 10⁻¹ | 3.73 × 10⁻¹ | 7.61 × 10⁻¹ | 6.23 × 10⁻¹ | 7.61 × 10⁻¹ |
| | f1-score | 8.50 × 10⁻¹ | 4.12 × 10⁻¹ | 7.61 × 10⁻¹ | 6.31 × 10⁻¹ | 7.51 × 10⁻¹ |
| CNN-SCHO | precision | 8.31 × 10⁻¹ | 4.50 × 10⁻¹ | 7.55 × 10⁻¹ | 6.41 × 10⁻¹ | 7.46 × 10⁻¹ |
| | recall | 8.59 × 10⁻¹ | 3.98 × 10⁻¹ | 7.55 × 10⁻¹ | 6.28 × 10⁻¹ | 7.55 × 10⁻¹ |
| | f1-score | 8.45 × 10⁻¹ | 4.22 × 10⁻¹ | 7.55 × 10⁻¹ | 6.34 × 10⁻¹ | 7.50 × 10⁻¹ |
| CNN-COLSHADE | precision | 8.26 × 10⁻¹ | 4.60 × 10⁻¹ | 7.61 × 10⁻¹ | 6.43 × 10⁻¹ | 7.44 × 10⁻¹ |
| | recall | 8.76 × 10⁻¹ | 3.64 × 10⁻¹ | 7.61 × 10⁻¹ | 6.20 × 10⁻¹ | 7.61 × 10⁻¹ |
| | f1-score | 8.50 × 10⁻¹ | 4.07 × 10⁻¹ | 7.61 × 10⁻¹ | 6.29 × 10⁻¹ | 7.51 × 10⁻¹ |
| | support | 2072 | 601 | | | |
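The columns of Table 6 are linked by standard definitions, which the following minimal sketch reproduces for the CNN-MPPSO row. The inputs are the rounded per-class precision, recall, and support values read from the table, so the results agree with the tabulated figures only up to rounding; the helper names are illustrative.

```python
# Per-class values for CNN-MPPSO, rounded to three digits as read from Table 6.
precision = {"no_error": 0.826, "error": 0.481}
recall = {"no_error": 0.889, "error": 0.354}
support = {"no_error": 2072, "error": 601}


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values.values()) / len(values)


def weighted_avg(values, weights):
    """Mean over classes, weighted by the number of samples per class."""
    total = sum(weights.values())
    return sum(values[c] * weights[c] for c in values) / total


f1_scores = {c: f1(precision[c], recall[c]) for c in precision}
macro_precision = macro_avg(precision)
weighted_precision = weighted_avg(precision, support)
# Support-weighted recall is the fraction of all samples classified correctly.
accuracy = weighted_avg(recall, support)
```

Because the "no error" class is more than three times larger than the "error" class, the weighted averages and the accuracy sit close to the "no error" values, while the macro averages expose the much weaker performance on the "error" class.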
Table 7. Optimized CNN model parameter selections.
| Method | Learning Rate | Dropout | Epochs | Layers CNN | Layers Dense | Neurons CNN | Neurons Dense |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN-MPPSO | 2.44 × 10⁻³ | 5.00 × 10⁻¹ | 84 | 1.0 | 1.0 | 88 | 88 |
| CNN-PSO | 2.80 × 10⁻³ | 1.66 × 10⁻¹ | 56 | 1.0 | 1.0 | 97 | 83 |
| CNN-GA | 1.66 × 10⁻³ | 4.75 × 10⁻¹ | 65 | 1.0 | 1.0 | 94 | 86 |
| CNN-VNS | 2.38 × 10⁻³ | 5.00 × 10⁻¹ | 42 | 1.0 | 1.0 | 128 | 93 |
| CNN-ABC | 2.61 × 10⁻³ | 3.70 × 10⁻¹ | 93 | 2.0 | 1.0 | 124 | 96 |
| CNN-BA | 3.00 × 10⁻³ | 5.00 × 10⁻¹ | 70 | 2.0 | 2.0 | 108 | 103 |
| CNN-SCHO | 1.25 × 10⁻³ | 1.97 × 10⁻¹ | 100 | 1.0 | 2.0 | 46 | 105 |
| CNN-COLSHADE | 1.88 × 10⁻³ | 2.30 × 10⁻¹ | 86 | 2.0 | 2.0 | 73 | 99 |
Table 8. Layer 2 AdaBoost objective function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-AB-MPPSO | 2.83 × 10⁻¹ | 2.76 × 10⁻¹ | 2.80 × 10⁻¹ | 2.81 × 10⁻¹ | 3.09 × 10⁻³ | 9.56 × 10⁻⁶ |
| CNN-AB-PSO | 2.83 × 10⁻¹ | 2.71 × 10⁻¹ | 2.77 × 10⁻¹ | 2.76 × 10⁻¹ | 3.09 × 10⁻³ | 9.51 × 10⁻⁶ |
| CNN-AB-GA | 2.83 × 10⁻¹ | 2.68 × 10⁻¹ | 2.77 × 10⁻¹ | 2.76 × 10⁻¹ | 2.98 × 10⁻³ | 8.88 × 10⁻⁶ |
| CNN-AB-VNS | 2.83 × 10⁻¹ | 2.69 × 10⁻¹ | 2.76 × 10⁻¹ | 2.76 × 10⁻¹ | 3.66 × 10⁻³ | 1.34 × 10⁻⁵ |
| CNN-AB-ABC | 2.83 × 10⁻¹ | 2.66 × 10⁻¹ | 2.73 × 10⁻¹ | 2.74 × 10⁻¹ | 4.03 × 10⁻³ | 1.63 × 10⁻⁵ |
| CNN-AB-BA | 2.76 × 10⁻¹ | 2.02 × 10⁻¹ | 2.63 × 10⁻¹ | 2.71 × 10⁻¹ | 1.93 × 10⁻² | 3.71 × 10⁻⁴ |
| CNN-AB-SCHO | 2.83 × 10⁻¹ | 2.71 × 10⁻¹ | 2.77 × 10⁻¹ | 2.76 × 10⁻¹ | 2.53 × 10⁻³ | 6.40 × 10⁻⁶ |
| CNN-AB-COLSHADE | 2.83 × 10⁻¹ | 2.70 × 10⁻¹ | 2.76 × 10⁻¹ | 2.76 × 10⁻¹ | 3.52 × 10⁻³ | 1.24 × 10⁻⁵ |
Table 9. Layer 2 AdaBoost indicator function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-AB-MPPSO | 2.28 × 10⁻¹ | 2.34 × 10⁻¹ | 2.30 × 10⁻¹ | 2.30 × 10⁻¹ | 2.66 × 10⁻³ | 7.08 × 10⁻⁶ |
| CNN-AB-PSO | 2.28 × 10⁻¹ | 2.28 × 10⁻¹ | 2.31 × 10⁻¹ | 2.31 × 10⁻¹ | 2.38 × 10⁻³ | 5.67 × 10⁻⁶ |
| CNN-AB-GA | 2.28 × 10⁻¹ | 2.42 × 10⁻¹ | 2.32 × 10⁻¹ | 2.32 × 10⁻¹ | 3.53 × 10⁻³ | 1.25 × 10⁻⁵ |
| CNN-AB-VNS | 2.28 × 10⁻¹ | 2.39 × 10⁻¹ | 2.32 × 10⁻¹ | 2.31 × 10⁻¹ | 3.59 × 10⁻³ | 1.29 × 10⁻⁵ |
| CNN-AB-ABC | 2.28 × 10⁻¹ | 2.35 × 10⁻¹ | 2.34 × 10⁻¹ | 2.34 × 10⁻¹ | 3.43 × 10⁻³ | 1.18 × 10⁻⁵ |
| CNN-AB-BA | 2.34 × 10⁻¹ | 2.49 × 10⁻¹ | 2.44 × 10⁻¹ | 2.36 × 10⁻¹ | 2.05 × 10⁻² | 4.22 × 10⁻⁴ |
| CNN-AB-SCHO | 2.28 × 10⁻¹ | 2.33 × 10⁻¹ | 2.31 × 10⁻¹ | 2.31 × 10⁻¹ | 2.21 × 10⁻³ | 4.87 × 10⁻⁶ |
| CNN-AB-COLSHADE | 2.28 × 10⁻¹ | 2.37 × 10⁻¹ | 2.33 × 10⁻¹ | 2.33 × 10⁻¹ | 2.87 × 10⁻³ | 8.21 × 10⁻⁶ |
Table 10. Best-performing optimized layer 2 AdaBoost model detailed metric comparisons.
Table 10. Best-performing optimized layer 2 AdaBoost model detailed metric comparisons.
| Method | Metric | No Error | Error | Accuracy | Macro Avg. | Weighted Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-AB-MPPSO | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-PSO | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-GA | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-VNS | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-ABC | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-BA | precision | 8.28 × 10⁻¹ | 4.74 × 10⁻¹ | 7.66 × 10⁻¹ | 6.51 × 10⁻¹ | 7.49 × 10⁻¹ |
| | recall | 8.81 × 10⁻¹ | 3.71 × 10⁻¹ | 7.66 × 10⁻¹ | 6.26 × 10⁻¹ | 7.66 × 10⁻¹ |
| | f1-score | 8.54 × 10⁻¹ | 4.16 × 10⁻¹ | 7.66 × 10⁻¹ | 6.35 × 10⁻¹ | 7.55 × 10⁻¹ |
| CNN-AB-SCHO | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| CNN-AB-COLSHADE | precision | 8.28 × 10⁻¹ | 4.91 × 10⁻¹ | 7.72 × 10⁻¹ | 6.59 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.59 × 10⁻¹ | 7.72 × 10⁻¹ | 6.26 × 10⁻¹ | 7.72 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.15 × 10⁻¹ | 7.72 × 10⁻¹ | 6.37 × 10⁻¹ | 7.59 × 10⁻¹ |
| | support | 2072 | 601 | | | |
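The Macro Avg. and Weighted Avg. columns of Table 10 can be recovered from the per-class values and the support counts. The sketch below does this for the CNN-AB-MPPSO precision and recall rows, assuming the usual definitions (unweighted mean for the macro average, support-weighted mean for the weighted average) and negative exponents in the table. The function names are illustrative and not taken from the original study.

```python
def macro_avg(values):
    """Unweighted mean of the per-class metric values."""
    return sum(values) / len(values)


def weighted_avg(values, support):
    """Mean of the per-class metric values weighted by class support."""
    total = sum(support)
    return sum(v * s for v, s in zip(values, support)) / total


# Support row of Table 10: "No Error" and "Error" sample counts.
support = [2072, 601]

# Per-class values for CNN-AB-MPPSO, in the order (No Error, Error).
precision = [0.828, 0.491]
recall = [0.892, 0.359]

print(f"macro precision    = {macro_avg(precision):.3f}")
print(f"weighted precision = {weighted_avg(precision, support):.3f}")
print(f"macro recall       = {macro_avg(recall):.3f}")
print(f"weighted recall    = {weighted_avg(recall, support):.3f}")
```

Because "No Error" samples outnumber "Error" samples by more than three to one, the weighted averages sit much closer to the majority-class values, while the macro averages expose the weaker performance on the "Error" class. The weighted recall also coincides with the overall accuracy, as expected for this definition.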
Table 11. Layer 2 AdaBoost model parameter selections.
| Methods | Number of Estimators | Depth | Learning Rate |
| --- | --- | --- | --- |
| CNN-AB-MPPSO | 17 | 1 | 0.388263 |
| CNN-AB-PSO | 17 | 1 | 0.395194 |
| CNN-AB-GA | 18 | 1 | 0.385504 |
| CNN-AB-VNS | 16 | 1 | 0.393270 |
| CNN-AB-ABC | 17 | 1 | 0.386973 |
| CNN-AB-BA | 37 | 2 | 0.100000 |
| CNN-AB-SCHO | 17 | 1 | 0.397827 |
| CNN-AB-COLSHADE | 16 | 1 | 0.389986 |
Table 12. Layer 2 XGBoost objective function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-XG-MPPSO | 2.84 × 10⁻¹ | 2.72 × 10⁻¹ | 2.78 × 10⁻¹ | 2.78 × 10⁻¹ | 3.45 × 10⁻³ | 1.19 × 10⁻⁵ |
| CNN-XG-PSO | 2.83 × 10⁻¹ | 2.73 × 10⁻¹ | 2.80 × 10⁻¹ | 2.81 × 10⁻¹ | 2.81 × 10⁻³ | 7.87 × 10⁻⁶ |
| CNN-XG-GA | 2.83 × 10⁻¹ | 2.71 × 10⁻¹ | 2.78 × 10⁻¹ | 2.78 × 10⁻¹ | 3.24 × 10⁻³ | 1.05 × 10⁻⁵ |
| CNN-XG-VNS | 2.83 × 10⁻¹ | 2.73 × 10⁻¹ | 2.78 × 10⁻¹ | 2.78 × 10⁻¹ | 2.48 × 10⁻³ | 6.14 × 10⁻⁶ |
| CNN-XG-ABC | 2.83 × 10⁻¹ | 2.64 × 10⁻¹ | 2.71 × 10⁻¹ | 2.70 × 10⁻¹ | 4.40 × 10⁻³ | 1.93 × 10⁻⁵ |
| CNN-XG-BA | 2.83 × 10⁻¹ | 2.58 × 10⁻¹ | 2.71 × 10⁻¹ | 2.72 × 10⁻¹ | 6.94 × 10⁻³ | 4.81 × 10⁻⁵ |
| CNN-XG-SCHO | 2.82 × 10⁻¹ | 2.70 × 10⁻¹ | 2.78 × 10⁻¹ | 2.78 × 10⁻¹ | 2.85 × 10⁻³ | 8.14 × 10⁻⁶ |
| CNN-XG-COLSHADE | 2.81 × 10⁻¹ | 2.71 × 10⁻¹ | 2.77 × 10⁻¹ | 2.78 × 10⁻¹ | 2.51 × 10⁻³ | 6.32 × 10⁻⁶ |
Table 13. Layer 2 XGBoost indicator function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-XG-MPPSO | 2.29 × 10⁻¹ | 2.33 × 10⁻¹ | 2.32 × 10⁻¹ | 2.33 × 10⁻¹ | 3.38 × 10⁻³ | 1.14 × 10⁻⁵ |
| CNN-XG-PSO | 2.34 × 10⁻¹ | 2.34 × 10⁻¹ | 2.34 × 10⁻¹ | 2.33 × 10⁻¹ | 2.65 × 10⁻³ | 7.04 × 10⁻⁶ |
| CNN-XG-GA | 2.29 × 10⁻¹ | 2.34 × 10⁻¹ | 2.33 × 10⁻¹ | 2.33 × 10⁻¹ | 3.50 × 10⁻³ | 1.23 × 10⁻⁵ |
| CNN-XG-VNS | 2.26 × 10⁻¹ | 2.35 × 10⁻¹ | 2.33 × 10⁻¹ | 2.33 × 10⁻¹ | 4.01 × 10⁻³ | 1.61 × 10⁻⁵ |
| CNN-XG-ABC | 2.40 × 10⁻¹ | 2.43 × 10⁻¹ | 2.41 × 10⁻¹ | 2.40 × 10⁻¹ | 4.91 × 10⁻³ | 2.41 × 10⁻⁵ |
| CNN-XG-BA | 2.32 × 10⁻¹ | 2.45 × 10⁻¹ | 2.39 × 10⁻¹ | 2.37 × 10⁻¹ | 7.68 × 10⁻³ | 5.90 × 10⁻⁵ |
| CNN-XG-SCHO | 2.33 × 10⁻¹ | 2.41 × 10⁻¹ | 2.33 × 10⁻¹ | 2.32 × 10⁻¹ | 5.16 × 10⁻³ | 2.66 × 10⁻⁵ |
| CNN-XG-COLSHADE | 2.27 × 10⁻¹ | 2.27 × 10⁻¹ | 2.33 × 10⁻¹ | 2.34 × 10⁻¹ | 4.90 × 10⁻³ | 2.41 × 10⁻⁵ |
Table 14. Best-performing optimized Layer 2 XGBoost model detailed metric comparisons.
| Method | Metric | No Error | Error | Accuracy | Macro Avg | Weighted Avg |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-XG-MPPSO | precision | 8.30 × 10⁻¹ | 4.88 × 10⁻¹ | 7.71 × 10⁻¹ | 6.59 × 10⁻¹ | 7.53 × 10⁻¹ |
| | recall | 8.86 × 10⁻¹ | 3.74 × 10⁻¹ | 7.71 × 10⁻¹ | 6.30 × 10⁻¹ | 7.71 × 10⁻¹ |
| | f1-score | 8.57 × 10⁻¹ | 4.24 × 10⁻¹ | 7.71 × 10⁻¹ | 6.40 × 10⁻¹ | 7.60 × 10⁻¹ |
| CNN-XG-PSO | precision | 8.32 × 10⁻¹ | 4.76 × 10⁻¹ | 7.66 × 10⁻¹ | 6.54 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.75 × 10⁻¹ | 3.89 × 10⁻¹ | 7.66 × 10⁻¹ | 6.32 × 10⁻¹ | 7.66 × 10⁻¹ |
| | f1-score | 8.53 × 10⁻¹ | 4.28 × 10⁻¹ | 7.66 × 10⁻¹ | 6.41 × 10⁻¹ | 7.58 × 10⁻¹ |
| CNN-XG-GA | precision | 8.30 × 10⁻¹ | 4.89 × 10⁻¹ | 7.71 × 10⁻¹ | 6.59 × 10⁻¹ | 7.53 × 10⁻¹ |
| | recall | 8.88 × 10⁻¹ | 3.71 × 10⁻¹ | 7.71 × 10⁻¹ | 6.29 × 10⁻¹ | 7.71 × 10⁻¹ |
| | f1-score | 8.58 × 10⁻¹ | 4.22 × 10⁻¹ | 7.71 × 10⁻¹ | 6.40 × 10⁻¹ | 7.60 × 10⁻¹ |
| CNN-XG-VNS | precision | 8.28 × 10⁻¹ | 4.98 × 10⁻¹ | 7.74 × 10⁻¹ | 6.63 × 10⁻¹ | 7.54 × 10⁻¹ |
| | recall | 8.94 × 10⁻¹ | 3.61 × 10⁻¹ | 7.74 × 10⁻¹ | 6.28 × 10⁻¹ | 7.74 × 10⁻¹ |
| | f1-score | 8.60 × 10⁻¹ | 4.19 × 10⁻¹ | 7.74 × 10⁻¹ | 6.39 × 10⁻¹ | 7.61 × 10⁻¹ |
| CNN-XG-ABC | precision | 7.72 × 10⁻¹ | 2.09 × 10⁻¹ | 6.79 × 10⁻¹ | 4.91 × 10⁻¹ | 6.45 × 10⁻¹ |
| | recall | 8.32 × 10⁻¹ | 1.53 × 10⁻¹ | 6.79 × 10⁻¹ | 4.93 × 10⁻¹ | 6.79 × 10⁻¹ |
| | f1-score | 8.01 × 10⁻¹ | 1.77 × 10⁻¹ | 6.79 × 10⁻¹ | 4.89 × 10⁻¹ | 6.61 × 10⁻¹ |
| CNN-XG-BA | precision | 8.31 × 10⁻¹ | 4.80 × 10⁻¹ | 7.68 × 10⁻¹ | 6.56 × 10⁻¹ | 7.52 × 10⁻¹ |
| | recall | 8.80 × 10⁻¹ | 3.83 × 10⁻¹ | 7.68 × 10⁻¹ | 6.31 × 10⁻¹ | 7.68 × 10⁻¹ |
| | f1-score | 8.55 × 10⁻¹ | 4.26 × 10⁻¹ | 7.68 × 10⁻¹ | 6.40 × 10⁻¹ | 7.58 × 10⁻¹ |
| CNN-XG-SCHO | precision | 8.31 × 10⁻¹ | 4.76 × 10⁻¹ | 7.67 × 10⁻¹ | 6.54 × 10⁻¹ | 7.51 × 10⁻¹ |
| | recall | 8.77 × 10⁻¹ | 3.86 × 10⁻¹ | 7.67 × 10⁻¹ | 6.31 × 10⁻¹ | 7.67 × 10⁻¹ |
| | f1-score | 8.53 × 10⁻¹ | 4.26 × 10⁻¹ | 7.67 × 10⁻¹ | 6.40 × 10⁻¹ | 7.57 × 10⁻¹ |
| CNN-XG-COLSHADE | precision | 8.28 × 10⁻¹ | 4.93 × 10⁻¹ | 7.73 × 10⁻¹ | 6.61 × 10⁻¹ | 7.53 × 10⁻¹ |
| | recall | 8.92 × 10⁻¹ | 3.63 × 10⁻¹ | 7.73 × 10⁻¹ | 6.27 × 10⁻¹ | 7.73 × 10⁻¹ |
| | f1-score | 8.59 × 10⁻¹ | 4.18 × 10⁻¹ | 7.73 × 10⁻¹ | 6.38 × 10⁻¹ | 7.60 × 10⁻¹ |
| | support | 2072 | 601 | | | |
Table 15. Layer 2 XGBoost model parameter selections.
| Method | Learning Rate | min_child_weight | Subsample | colsample_bytree | max_depth | Gamma |
| --- | --- | --- | --- | --- | --- | --- |
| CNN-XG-MPPSO | 0.100000 | 10.000000 | 0.983197 | 0.293945 | 3 | 0.713646 |
| CNN-XG-PSO | 0.100000 | 2.357106 | 0.968264 | 0.363649 | 5 | 0.015409 |
| CNN-XG-GA | 0.107403 | 1.691592 | 0.950345 | 0.319984 | 3 | 0.000000 |
| CNN-XG-VNS | 0.175166 | 4.645308 | 1.000000 | 0.284932 | 4 | 0.757203 |
| CNN-XG-ABC | 0.285948 | 6.211737 | 0.744753 | 0.678273 | 6 | 0.779172 |
| CNN-XG-BA | 0.206396 | 3.716090 | 1.000000 | 0.322232 | 4 | 0.528585 |
| CNN-XG-SCHO | 0.100000 | 3.353770 | 1.000000 | 0.269461 | 3 | 0.000000 |
| CNN-XG-COLSHADE | 0.100000 | 2.409330 | 0.989736 | 0.306208 | 3 | 0.050836 |
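For readers who want to reuse one of these configurations, the CNN-XG-MPPSO row of Table 15 can be written as a plain dictionary of keyword arguments. This is only a sketch: the keys follow the parameter names of XGBoost's scikit-learn interface, the child-weight column is taken to be XGBoost's `min_child_weight`, and settings not reported in the table (for example the number of boosting rounds) are deliberately left out. The `check_ranges` helper is hypothetical and only guards against transcription slips.

```python
# Values copied from the CNN-XG-MPPSO row of Table 15.
# Assumption: keys use XGBoost's scikit-learn parameter names.
cnn_xg_mppso_params = {
    "learning_rate": 0.100000,
    "min_child_weight": 10.000000,
    "subsample": 0.983197,
    "colsample_bytree": 0.293945,
    "max_depth": 3,
    "gamma": 0.713646,
}


def check_ranges(params):
    """Basic sanity checks on value ranges (illustrative helper)."""
    return (
        0.0 < params["learning_rate"] <= 1.0
        and params["min_child_weight"] >= 0.0
        and 0.0 < params["subsample"] <= 1.0
        and 0.0 < params["colsample_bytree"] <= 1.0
        and isinstance(params["max_depth"], int)
        and params["max_depth"] > 0
        and params["gamma"] >= 0.0
    )


print(check_ranges(cnn_xg_mppso_params))
```

With the `xgboost` package installed, such a dictionary could be unpacked into `xgboost.XGBClassifier(**cnn_xg_mppso_params)`; the features fed to the classifier would still need to come from the layer 1 CNN described in the article.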
Table 16. NLP Layer 2 AdaBoost objective function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| AB-MPPSO | 9.60 × 10⁻¹ | 9.55 × 10⁻¹ | 9.57 × 10⁻¹ | 9.56 × 10⁻¹ | 1.60 × 10⁻³ | 2.56 × 10⁻⁶ |
| AB-PSO | 9.59 × 10⁻¹ | 9.57 × 10⁻¹ | 9.58 × 10⁻¹ | 9.58 × 10⁻¹ | 7.80 × 10⁻⁴ | 6.08 × 10⁻⁷ |
| AB-GA | 9.60 × 10⁻¹ | 9.55 × 10⁻¹ | 9.57 × 10⁻¹ | 9.57 × 10⁻¹ | 1.55 × 10⁻³ | 2.39 × 10⁻⁶ |
| AB-VNS | 9.56 × 10⁻¹ | 9.54 × 10⁻¹ | 9.55 × 10⁻¹ | 9.55 × 10⁻¹ | 9.13 × 10⁻⁴ | 8.34 × 10⁻⁷ |
| AB-ABC | 9.57 × 10⁻¹ | 9.52 × 10⁻¹ | 9.55 × 10⁻¹ | 9.56 × 10⁻¹ | 1.55 × 10⁻³ | 2.42 × 10⁻⁶ |
| AB-BA | 9.60 × 10⁻¹ | 9.50 × 10⁻¹ | 9.57 × 10⁻¹ | 9.59 × 10⁻¹ | 3.72 × 10⁻³ | 1.38 × 10⁻⁵ |
| AB-SCHO | 9.58 × 10⁻¹ | 9.54 × 10⁻¹ | 9.56 × 10⁻¹ | 9.56 × 10⁻¹ | 1.11 × 10⁻³ | 1.23 × 10⁻⁶ |
| AB-COLSHADE | 9.58 × 10⁻¹ | 9.55 × 10⁻¹ | 9.56 × 10⁻¹ | 9.56 × 10⁻¹ | 1.37 × 10⁻³ | 1.88 × 10⁻⁶ |
Table 17. NLP Layer 2 AdaBoost indicator function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| AB-MPPSO | 2.02 × 10⁻² | 2.26 × 10⁻² | 2.17 × 10⁻² | 2.19 × 10⁻² | 7.93 × 10⁻⁴ | 6.29 × 10⁻⁷ |
| AB-PSO | 2.06 × 10⁻² | 2.16 × 10⁻² | 2.10 × 10⁻² | 2.12 × 10⁻² | 4.11 × 10⁻⁴ | 1.69 × 10⁻⁷ |
| AB-GA | 2.02 × 10⁻² | 2.26 × 10⁻² | 2.16 × 10⁻² | 2.16 × 10⁻² | 7.81 × 10⁻⁴ | 6.11 × 10⁻⁷ |
| AB-VNS | 2.19 × 10⁻² | 2.33 × 10⁻² | 2.26 × 10⁻² | 2.26 × 10⁻² | 4.85 × 10⁻⁴ | 2.35 × 10⁻⁷ |
| AB-ABC | 2.16 × 10⁻² | 2.40 × 10⁻² | 2.25 × 10⁻² | 2.23 × 10⁻² | 7.93 × 10⁻⁴ | 6.29 × 10⁻⁷ |
| AB-BA | 2.02 × 10⁻² | 2.50 × 10⁻² | 2.17 × 10⁻² | 2.06 × 10⁻² | 1.84 × 10⁻³ | 3.40 × 10⁻⁶ |
| AB-SCHO | 2.12 × 10⁻² | 2.30 × 10⁻² | 2.21 × 10⁻² | 2.23 × 10⁻² | 5.57 × 10⁻⁴ | 3.10 × 10⁻⁷ |
| AB-COLSHADE | 2.09 × 10⁻² | 2.30 × 10⁻² | 2.21 × 10⁻² | 2.19 × 10⁻² | 7.06 × 10⁻⁴ | 4.98 × 10⁻⁷ |
Table 18. Best-performing optimized NLP Layer 2 AdaBoost model detailed metric comparisons.
| Method | Metric | No Error | Error | Accuracy | Macro Avg | Weighted Avg |
| --- | --- | --- | --- | --- | --- | --- |
| AB-MPPSO | precision | 9.94 × 10⁻¹ | 9.66 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | recall | 9.66 × 10⁻¹ | 9.94 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | f1-score | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| AB-PSO | precision | 9.92 × 10⁻¹ | 9.67 × 10⁻¹ | 9.79 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | recall | 9.67 × 10⁻¹ | 9.92 × 10⁻¹ | 9.79 × 10⁻¹ | 9.80 × 10⁻¹ | 9.79 × 10⁻¹ |
| | f1-score | 9.80 × 10⁻¹ | 9.79 × 10⁻¹ | 9.80 × 10⁻¹ | 9.79 × 10⁻¹ | 9.80 × 10⁻¹ |
| AB-GA | precision | 9.90 × 10⁻¹ | 9.69 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | recall | 9.70 × 10⁻¹ | 9.90 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | f1-score | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| AB-VNS | precision | 9.89 × 10⁻¹ | 9.67 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ |
| | recall | 9.68 × 10⁻¹ | 9.89 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ |
| | f1-score | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ |
| AB-ABC | precision | 9.91 × 10⁻¹ | 9.66 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.79 × 10⁻¹ |
| | recall | 9.66 × 10⁻¹ | 9.91 × 10⁻¹ | 9.78 × 10⁻¹ | 9.79 × 10⁻¹ | 9.78 × 10⁻¹ |
| | f1-score | 9.79 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ | 9.78 × 10⁻¹ |
| AB-BA | precision | 9.94 × 10⁻¹ | 9.66 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | recall | 9.66 × 10⁻¹ | 9.94 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| | f1-score | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ | 9.80 × 10⁻¹ |
| AB-SCHO | precision | 9.90 × 10⁻¹ | 9.67 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| | recall | 9.68 × 10⁻¹ | 9.90 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| | f1-score | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| AB-COLSHADE | precision | 9.91 × 10⁻¹ | 9.67 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| | recall | 9.68 × 10⁻¹ | 9.91 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| | f1-score | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ | 9.79 × 10⁻¹ |
| | support | 1487 | 1431 | | | |
Table 19. NLP Layer 2 AdaBoost model parameter selections.
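| Methods | Number of Estimators | Depth | Learning Rate |
| --- | --- | --- | --- |
| AB-MPPSO | 15 | 5 | 1.425870 |
| AB-PSO | 15 | 6 | 1.806671 |
| AB-GA | 14 | 6 | 1.657350 |
| AB-VNS | 15 | 6 | 1.357072 |
| AB-ABC | 14 | 6 | 1.761995 |
| AB-BA | 15 | 6 | 1.808268 |
| AB-SCHO | 14 | 6 | 1.669562 |
| AB-COLSHADE | 13 | 6 | 1.464666 |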
Table 20. NLP Layer 2 XGBoost objective function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| XG-MPPSO | 9.68 × 10⁻¹ | 9.59 × 10⁻¹ | 9.62 × 10⁻¹ | 9.62 × 10⁻¹ | 1.91 × 10⁻³ | 3.64 × 10⁻⁶ |
| XG-PSO | 9.67 × 10⁻¹ | 9.61 × 10⁻¹ | 9.65 × 10⁻¹ | 9.65 × 10⁻¹ | 1.40 × 10⁻³ | 1.95 × 10⁻⁶ |
| XG-GA | 9.66 × 10⁻¹ | 9.57 × 10⁻¹ | 9.62 × 10⁻¹ | 9.62 × 10⁻¹ | 2.15 × 10⁻³ | 4.61 × 10⁻⁶ |
| XG-VNS | 9.67 × 10⁻¹ | 9.61 × 10⁻¹ | 9.63 × 10⁻¹ | 9.63 × 10⁻¹ | 1.61 × 10⁻³ | 2.59 × 10⁻⁶ |
| XG-ABC | 9.63 × 10⁻¹ | 9.48 × 10⁻¹ | 9.57 × 10⁻¹ | 9.58 × 10⁻¹ | 3.63 × 10⁻³ | 1.31 × 10⁻⁵ |
| XG-BA | 9.65 × 10⁻¹ | 9.53 × 10⁻¹ | 9.60 × 10⁻¹ | 9.60 × 10⁻¹ | 2.96 × 10⁻³ | 8.76 × 10⁻⁶ |
| XG-SCHO | 9.65 × 10⁻¹ | 9.58 × 10⁻¹ | 9.62 × 10⁻¹ | 9.63 × 10⁻¹ | 1.89 × 10⁻³ | 3.57 × 10⁻⁶ |
| XG-COLSHADE | 9.65 × 10⁻¹ | 9.59 × 10⁻¹ | 9.62 × 10⁻¹ | 9.62 × 10⁻¹ | 1.81 × 10⁻³ | 3.28 × 10⁻⁶ |
Table 21. NLP Layer 2 XGBoost indicator function outcomes.
| Method | Best | Worst | Mean | Median | Std | Var |
| --- | --- | --- | --- | --- | --- | --- |
| XG-MPPSO | 1.61 × 10⁻² | 2.06 × 10⁻² | 1.91 × 10⁻² | 1.92 × 10⁻² | 9.69 × 10⁻⁴ | 9.38 × 10⁻⁷ |
| XG-PSO | 1.68 × 10⁻² | 1.95 × 10⁻² | 1.78 × 10⁻² | 1.76 × 10⁻² | 6.99 × 10⁻⁴ | 4.89 × 10⁻⁷ |
| XG-GA | 1.71 × 10⁻² | 2.16 × 10⁻² | 1.92 × 10⁻² | 1.92 × 10⁻² | 1.07 × 10⁻³ | 1.15 × 10⁻⁶ |
| XG-VNS | 1.65 × 10⁻² | 1.99 × 10⁻² | 1.86 × 10⁻² | 1.85 × 10⁻² | 8.15 × 10⁻⁴ | 6.65 × 10⁻⁷ |
| XG-ABC | 1.88 × 10⁻² | 2.64 × 10⁻² | 2.15 × 10⁻² | 2.12 × 10⁻² | 1.86 × 10⁻³ | 3.44 × 10⁻⁶ |
| XG-BA | 1.78 × 10⁻² | 2.40 × 10⁻² | 2.01 × 10⁻² | 2.04 × 10⁻² | 1.50 × 10⁻³ | 2.26 × 10⁻⁶ |
| XG-SCHO | 1.75 × 10⁻² | 2.09 × 10⁻² | 1.89 × 10⁻² | 1.88 × 10⁻² | 9.50 × 10⁻⁴ | 9.03 × 10⁻⁷ |
| XG-COLSHADE | 1.75 × 10⁻² | 2.06 × 10⁻² | 1.93 × 10⁻² | 1.94 × 10⁻² | 9.21 × 10⁻⁴ | 8.49 × 10⁻⁷ |
Table 22. Best-performing optimized NLP Layer 2 XGBoost model detailed metric comparisons.
| Method | Metric | No Error | Error | Accuracy | Macro Avg | Weighted Avg |
| --- | --- | --- | --- | --- | --- | --- |
| XG-MPPSO | precision | 9.95 × 10⁻¹ | 9.73 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| | recall | 9.73 × 10⁻¹ | 9.95 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| | f1-score | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| XG-PSO | precision | 9.96 × 10⁻¹ | 9.71 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | recall | 9.71 × 10⁻¹ | 9.96 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | f1-score | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| XG-GA | precision | 9.97 × 10⁻¹ | 9.69 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | recall | 9.70 × 10⁻¹ | 9.97 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | f1-score | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| XG-VNS | precision | 9.96 × 10⁻¹ | 9.71 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| | recall | 9.72 × 10⁻¹ | 9.96 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| | f1-score | 9.84 × 10⁻¹ | 9.83 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ | 9.84 × 10⁻¹ |
| XG-ABC | precision | 9.95 × 10⁻¹ | 9.67 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ | 9.82 × 10⁻¹ |
| | recall | 9.68 × 10⁻¹ | 9.95 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ |
| | f1-score | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ | 9.81 × 10⁻¹ |
| XG-BA | precision | 9.95 × 10⁻¹ | 9.69 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ | 9.83 × 10⁻¹ |
| | recall | 9.70 × 10⁻¹ | 9.95 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ |
| | f1-score | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ | 9.82 × 10⁻¹ |
| XG-SCHO | precision | 9.96 × 10⁻¹ | 9.69 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | recall | 9.70 × 10⁻¹ | 9.96 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | f1-score | 9.83 × 10⁻¹ | 9.82 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| XG-COLSHADE | precision | 9.96 × 10⁻¹ | 9.69 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | recall | 9.70 × 10⁻¹ | 9.96 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | f1-score | 9.83 × 10⁻¹ | 9.82 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ | 9.83 × 10⁻¹ |
| | support | 1487 | 1431 | | | |
Table 23. NLP Layer 2 XGBoost model parameter selections.
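| Method | Learning Rate | min_child_weight | Subsample | colsample_bytree | max_depth | Gamma |
| --- | --- | --- | --- | --- | --- | --- |
| XG-MPPSO | 0.878864 | 1.000000 | 0.886550 | 0.447144 | 10 | 0.001162 |
| XG-PSO | 0.900000 | 1.000000 | 1.000000 | 0.418138 | 10 | 0.137715 |
| XG-GA | 0.900000 | 1.000000 | 1.000000 | 0.380366 | 10 | 0.000000 |
| XG-VNS | 0.874556 | 1.000000 | 1.000000 | 0.389317 | 10 | 0.800000 |
| XG-ABC | 0.900000 | 1.000000 | 0.920135 | 0.412610 | 7 | 0.512952 |
| XG-BA | 0.900000 | 1.191561 | 1.000000 | 0.451404 | 10 | 0.800000 |
| XG-SCHO | 0.900000 | 1.000000 | 1.000000 | 0.425699 | 10 | 0.161777 |
| XG-COLSHADE | 0.900000 | 1.000000 | 1.000000 | 0.369307 | 10 | 0.262928 |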
Table 24. Comparisons to baselines in scenario 1 (NASA).
| Models | Accuracy | MCC |
| --- | --- | --- |
| CNN-MPPSO | 76.88% | 0.273280 |
| CNN-AB-MPPSO | 77.22% | 0.282906 |
| CNN-XG-MPPSO | 77.10% | 0.283956 |
| AdaBoost | 73.58% | 0.245921 |
| XGBoost | 73.21% | 0.241493 |
| CatBoost | 72.86% | 0.238149 |
| Random forest | 75.54% | 0.255372 |
| SVM | 74.37% | 0.247345 |
| DNN 2 hidden layers | 76.86% | 0.266794 |
| DNN 3 hidden layers | 76.72% | 0.266037 |
| DNN 3 hidden layers (dropout 0.1) | 77.05% | 0.268044 |
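Table 24 reports the Matthews correlation coefficient (MCC) alongside accuracy. As an illustration of how the two relate, the sketch below rebuilds an approximate confusion matrix for CNN-AB-MPPSO from the per-class recall and support values in Table 10 and then evaluates the standard MCC formula. Treating "Error" as the positive class and rounding the counts to integers are assumptions made here, and the function name `mcc` is illustrative.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Support and recall for CNN-AB-MPPSO, taken from Table 10.
n_no_error, n_error = 2072, 601
recall_no_error, recall_error = 0.892, 0.359

# Assumption: "Error" is the positive class; counts are rounded to integers.
tn = round(recall_no_error * n_no_error)  # correctly classified "No Error"
tp = round(recall_error * n_error)        # correctly classified "Error"
fp = n_no_error - tn                      # "No Error" flagged as "Error"
fn = n_error - tp                         # "Error" samples that were missed

accuracy = (tp + tn) / (n_no_error + n_error)
print(f"accuracy = {accuracy:.4f}, MCC = {mcc(tp, tn, fp, fn):.4f}")
```

Under these assumptions the result lands close to the 77.22% accuracy and 0.2829 MCC listed for CNN-AB-MPPSO. The gap between the two numbers reflects the class imbalance: accuracy is dominated by the majority "No Error" class, while MCC penalizes the large number of missed "Error" samples.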
Table 25. Comparisons to baselines in scenario 2 (NLP).
| Models | Accuracy | MCC |
| --- | --- | --- |
| AB-MPPSO | 97.98% | 0.959947 |
| XG-MPPSO | 98.38% | 0.968036 |
| AdaBoost | 95.38% | 0.932849 |
| XGBoost | 95.85% | 0.934185 |
| CatBoost | 95.77% | 0.933941 |
| Random forest | 94.62% | 0.924924 |
| SVM | 93.39% | 0.914582 |
| DNN 2 hidden layers | 97.91% | 0.943741 |
| DNN 3 hidden layers | 97.83% | 0.948537 |
| DNN 3 hidden layers (dropout 0.1) | 98.14% | 0.946093 |
Table 26. Shapiro–Wilk outcomes of the five conducted simulations for verification of the normality condition for safe utilization of parametric tests.
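| Model | MPPSO | PSO | GA | VNS | ABC | BA | SCHO | COLSHADE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNN Optimization | 0.033 | 0.029 | 0.023 | 0.017 | 0.035 | 0.044 | 0.041 | 0.037 |
| AdaBoost Optimization | 0.019 | 0.027 | 0.038 | 0.021 | 0.034 | 0.043 | 0.025 | 0.049 |
| XGBoost Optimization | 0.022 | 0.026 | 0.032 | 0.018 | 0.029 | 0.040 | 0.038 | 0.042 |
| AdaBoost NLP Optimization | 0.031 | 0.022 | 0.037 | 0.026 | 0.033 | 0.046 | 0.030 | 0.045 |
| XGBoost NLP Optimization | 0.028 | 0.024 | 0.034 | 0.019 | 0.031 | 0.041 | 0.036 | 0.043 |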
Table 27. Wilcoxon signed-rank test scores of the five conducted simulations.
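| MPPSO vs. Others | PSO | GA | VNS | ABC | BA | SCHO | COLSHADE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN Optimization | 0.027 | 0.024 | 0.034 | 0.018 | 0.032 | 0.040 | 0.045 |
| AdaBoost Optimization | 0.020 | 0.028 | 0.031 | 0.022 | 0.030 | 0.038 | 0.042 |
| XGBoost Optimization | 0.033 | 0.023 | 0.037 | 0.019 | 0.031 | 0.043 | 0.046 |
| AdaBoost NLP Optimization | 0.029 | 0.026 | 0.035 | 0.021 | 0.029 | 0.041 | 0.044 |
| XGBoost NLP Optimization | 0.024 | 0.027 | 0.032 | 0.020 | 0.034 | 0.039 | 0.047 |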
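Tables 26 and 27 are meant to be read together: the Shapiro–Wilk values decide whether parametric tests are safe, and the Wilcoxon signed-rank values then compare MPPSO against each other optimizer. The sketch below applies that decision rule to the tabulated values. It assumes the entries are p-values and uses the conventional significance level of 0.05, which is an assumption here rather than a value read from these tables; the helper name `all_below` is illustrative.

```python
ALPHA = 0.05  # assumed significance threshold

# Table 26: Shapiro-Wilk values per simulation (MPPSO ... COLSHADE).
shapiro_wilk = {
    "CNN": [0.033, 0.029, 0.023, 0.017, 0.035, 0.044, 0.041, 0.037],
    "AdaBoost": [0.019, 0.027, 0.038, 0.021, 0.034, 0.043, 0.025, 0.049],
    "XGBoost": [0.022, 0.026, 0.032, 0.018, 0.029, 0.040, 0.038, 0.042],
    "AdaBoost NLP": [0.031, 0.022, 0.037, 0.026, 0.033, 0.046, 0.030, 0.045],
    "XGBoost NLP": [0.028, 0.024, 0.034, 0.019, 0.031, 0.041, 0.036, 0.043],
}

# Table 27: Wilcoxon signed-rank values, MPPSO vs. PSO ... COLSHADE.
wilcoxon = {
    "CNN": [0.027, 0.024, 0.034, 0.018, 0.032, 0.040, 0.045],
    "AdaBoost": [0.020, 0.028, 0.031, 0.022, 0.030, 0.038, 0.042],
    "XGBoost": [0.033, 0.023, 0.037, 0.019, 0.031, 0.043, 0.046],
    "AdaBoost NLP": [0.029, 0.026, 0.035, 0.021, 0.029, 0.041, 0.044],
    "XGBoost NLP": [0.024, 0.027, 0.032, 0.020, 0.034, 0.039, 0.047],
}


def all_below(table, alpha=ALPHA):
    """Return True when every value in every row is below alpha."""
    return all(p < alpha for row in table.values() for p in row)


# Normality rejected everywhere -> a non-parametric test is the safer choice.
normality_rejected = all_below(shapiro_wilk)
# Every pairwise comparison significant at the assumed level.
mppso_differs = all_below(wilcoxon)

print("normality rejected in all cases:", normality_rejected)
print("MPPSO significantly different in all cases:", mppso_differs)
```

Under the assumed threshold, every Shapiro–Wilk value falls below 0.05, which is consistent with the use of the non-parametric Wilcoxon signed-rank test in Table 27 rather than a parametric alternative.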
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Petrovic, A.; Jovanovic, L.; Bacanin, N.; Antonijevic, M.; Savanovic, N.; Zivkovic, M.; Milovanovic, M.; Gajic, V. Exploring Metaheuristic Optimized Machine Learning for Software Defect Detection on Natural Language and Classical Datasets. Mathematics 2024, 12, 2918. https://doi.org/10.3390/math12182918
