Machine Learning and Knowledge Extraction
  • Article
  • Open Access

16 May 2024

Assessment of Software Vulnerability Contributing Factors by Model-Agnostic Explainable AI

Department of Electrical and Computer Engineering, Concordia University, Montréal, QC H4B 1R6, Canada
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advances in Explainable Artificial Intelligence (XAI): 2nd Edition

Abstract

Software vulnerability detection aims to proactively reduce the risk to software security and reliability. Despite advancements in deep-learning-based detection, a semantic gap still remains between learned features and human-understandable vulnerability semantics. In this paper, we present an XAI-based framework to assess program code in a graph context as feature representations and their effect on code vulnerability classification into multiple Common Weakness Enumeration (CWE) types. Our XAI framework is deep-learning-model-agnostic and programming-language-neutral. We rank the feature importance of 40 syntactic constructs for each of the 20 most prevalent CWE types in three datasets in Java and C++. By means of four information retrieval metrics, we measure the similarity of human-understandable CWE types using each CWE type’s feature contribution ranking learned from XAI methods. We observe that subtle semantic differences between CWE types manifest as variations in the contribution rankings of neighboring features. Our study shows that the XAI explanation results have approximately 78% Top-1 to 89% Top-5 similarity hit rates and a mean average precision of 0.70 compared with the baseline of CWE similarity identified by open community experts. Our framework allows code vulnerability patterns to be learned and contributing factors to be assessed at the same stage.

1. Introduction

Software vulnerability refers to weaknesses within an information system, its internal controls, its system security procedures, or its implementation that could be exploited by a threat source [1]. These vulnerabilities often arise from design errors, poor coding practices, or inadequate security testing. In large-scale software systems, detecting vulnerabilities presents challenges in terms of the accuracy and transparency of both research [2,3,4,5] and industrial [6,7] practices. Applying vulnerability analyses and detection at the early stage of the software process, prior to deployment, is a proactive attack mitigation solution [8]. The analysis involves learning existing patterns of vulnerability types and analyzing the underlying factors in the code structure that may contribute to the weakness [9]. Vulnerability detection is the process that identifies, classifies, remediates, and mitigates code vulnerabilities.
Research in software vulnerability detection has progressed from static code analysis techniques to machine learning approaches. Static code analysis tools, such as security scanners, employ pattern matching [10,11] based on well-defined rules to identify bugs or flaws in the software [12,13,14]. However, these tools suffer from high false-positive rates [15].
Machine-learning-based approaches utilize source code, software complexity metrics, and version control system data to predict vulnerabilities [5,16,17]. These approaches enable automatic feature extraction and the learning of complex patterns, reducing the need for expert-driven feature engineering [3,18,19,20]. Data-driven software vulnerability detection has been reported to improve the detection accuracy in practice [21,22].
A comprehensive study [23] identified common limitations of six deep learning models in producing realistic code vulnerability detection. The main limitation is that inadequate models lose learning performance when transferred to real-world settings. Further reasons include the learning of irrelevant features, data duplication, and data imbalance. All these aspects limit model-specific approaches’ ability to provide transferable patterns beyond the training datasets. Questions remain regarding the scope of the interpretability and explainability of AI, what kinds of features these models are learning, and whether the models can be effectively and reliably transferred to other datasets [23].
One limitation is that practitioners cannot understand the features learned by a deep learning model without mapping them to the semantic meanings of vulnerable artifacts [8]. This opacity leads to questions such as the following: (1) How transferable are the signatures of vulnerable artifacts learned from one set of software projects to others [24]? (2) Which factors are most involved in the representation learning? (3) What variance do the factors from (2) cause in the classification results among different learning methods [25]? A key to bridging the gap between learned feature representations and human-understandable vulnerability semantics is to assess the importance of code features to the semantics of the vulnerability classification. Such an assessment necessitates model-agnostic techniques that emphasize only the code features as inputs and the resulting vulnerability classification as outputs.
EXplainable Artificial Intelligence (XAI) is an emerging research field that aims to make AI models trustworthy and transparent [26]. XAI encompasses diverse techniques, methods, and models to explain how learning models reach their predictions. For instance, model-agnostic attribute-based XAI methods focus on identifying the attributes that contribute the most to a model’s prediction. A manifesto of XAI was thoroughly defined based on a set of XAI survey papers and visions shared by scholars [27]. Applications covered in the XAI manifesto [27] include healthcare, medicine, bio-informatics, finance, environmental science, agriculture, and education. Additionally, the software development and software system domains have large amounts of code, documentation, and diverse scenarios that require AI learning and explanation. In the context of code vulnerability learning, SHapley Additive exPlanations (SHAP) [28], LIME [29], Lemna [30], and Mean-Centroid PredDiff [31] have been applied to measure the feature contribution values of program code feature representations.
The current application of XAI techniques in software vulnerability analysis faces the issue that the covered attributes cannot be extrapolated beyond the domain of the input data. Thus, the explanation is (1) limited to a few attributes and (2) disconnected from the semantically defined relations among CWE types. Such semantic relations are embedded in the definitions of CWE types accumulated over years of practice in the community. We consider explanations with a link to CWE semantics to be human-understandable explanations. This type of research challenge has been identified and defined as one of the nine aspects of the XAI manifesto [27].
Several studies have attempted to explain the importance of Abstract Syntax Tree (AST) path content [32,33] or individual code tokens [34] using XAI methods such as SHAP [28]. However, only a limited set of syntactic constructs, such as name, parameters, and statements, is investigated, rather than the whole set of syntactic constructs. Moreover, there is a lack of studies that relate learned feature representations to the semantic similarity collectively described by security experts for a variety of software vulnerability types [33]. In summary, the purpose of this study is a systematic method to correlate these meta syntactic constructs with common characteristics across multiple vulnerability types.
Work from MIT CSAIL [35] demonstrated, with a novel experiment design, that language models trained on a program corpus in textual input–output form are well-suited for characterizing meaning. We are informed by the evidence in [23] of the limitations of token sequences at the program level in revealing the semantic meanings of feature contributions. We therefore consider code tokens at the syntactic construct level to assess the feature contributions to vulnerability classification through XAI probing techniques.
In this paper, our method first studies the inputs as code features by defining a taxonomy of the state-of-the-art works into four types, namely, text-based, graph-based, code binary, and mixed representation [25], as discussed in Section 3. For each type of code feature representation, we further categorize works into eight groups of program artifacts and twenty-one extracted code feature types. From the twenty-one extracted code feature types in the taxonomy shown in Figure 1, we mainly select the Abstract Syntax Tree. The abstract syntax tree encodes the structured metadata of a programming language and applies to all programs written in that language. Graph learning models can then be used to encode and decode the hierarchical structure of the abstract syntax tree.
We attempt to derive model-agnostic explanations for multi-class classification, in contrast to the binary classification in [23]. We focus on the graph context of code tokens as features, which embeds code token connections obtained by traversing the abstract syntax tree. Code property graphs such as the control flow graph, data flow graph, and program dependency graph are analyzed at the program level, which faces the challenge of maintaining balanced data samples for every vulnerability type. Sufficient and balanced samples of feature types are suitable for XAI methods such as feature masking.
Our method then assesses the output of vulnerability classification based on the Common Weakness Enumeration (CWE) [36]. CWE is a community-developed list of software and hardware security weaknesses that are commonly used as labels for supervised learning in vulnerability detection. The study in [37] identified that various vulnerability types exhibit semantic similarities. Such vulnerability similarities are likewise represented by the CWE hierarchical structure [36] or CWE clusters [38].
We define and apply information retrieval metrics to measure the similarity between classified CWEs. Through XAI methods and techniques, we further assess the variance in CWE classification similarity under input feature changes. Thus, the machine-learned feature representations are quantitatively measured for their contributions to classifying community-defined and human-understandable code vulnerability types. Our contribution is three-fold:
  • We define the taxonomy of code representation into eight types of high-level categories, and a further twenty-one fine-grained code representations. This taxonomy distinguishes the fine-grained code representation set at the program source code level and the program meta-data level. This taxonomy clearly positions our XAI-based approach in the map of related works.
  • We design a model-agnostic XAI framework that derives rankings of the feature contribution levels of a list of forty syntactic constructs in Abstract Syntax Trees (AST) across twenty CWE types for both Java and C++ datasets. This framework is applicable to different choices of classifiers and XAI methods.
  • We develop a novel feature masking technique for the graph context that varies the neighbourhood of code tokens and syntactic constructs. We define and apply information retrieval techniques to convert the change in the code token neighbourhood into the CWE type similarity.
Overall, we demonstrate that the similarity between CWE types derived from XAI explanations links subtle semantics that are understood by security experts to the learned code feature representations. Through experiments, we compare XAI-derived CWE similarities with sibling CWE types defined by security experts. Thus, our approach is able to retrospectively identify the misclassification of similar CWE types due to the variance in feature contributions. We open-sourced our code and made our dataset available on GitHub (https://github.com/DataCentricClassificationofSmartCity/XAI-based-Software-Vulnerbility-Dection, accessed on 1 May 2024).
Figure 1. The taxonomy of factors under code feature representation techniques [21,23,31,32,33,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59].
This paper is organized starting with an overview of model-agnostic XAI methods and the motivation for adopting XAI in Section 2. Section 3 reviews the existing literature on the code feature representation and vulnerability detection domain using XAI applications. We present our main research methodology and XAI-based framework in Section 4. Then, we propose the research questions, conduct the experiments, and demonstrate our results in Section 5. We present a retrospective of the motivation case in Section 5.4. Finally, we discuss the potential threats to the validity in Section 6 and draw conclusions in Section 7.

4. An XAI-Based Framework for Feature Contribution and Vulnerability Assessment

We propose a framework that retrieves feature contribution values utilizing XAI techniques and analyzes the XAI explanation summaries. We quantitatively assess the feature contributions to the multi-class classification of code vulnerabilities into CWE types to identify the contributing factors. As discussed in Section 2, CWE types are defined and categorized by experts from many real-world samples. The similarities of CWE types have subtle effects on the learning tasks of vulnerability classification. Hence, our workflow utilizes XAI methods to probe into the high-dimensional code features and relate feature variations to the classification results.
The main components of the workflow are shown in Figure 3. Compared to existing code vulnerability classification solutions, our workflow has three additional components: feature variation, the XAI method, and an analysis of XAI outputs. Our workflow applies post hoc and model-agnostic XAI methods that compute feature contribution values under variations such as feature mutation, feature masking, and feature removal. The outputs from the XAI methods are further analyzed to identify high-ranking code features.
Figure 3. The assessment of feature contributions using XAI explanations. The main components include feature representation, feature variations, the XAI method, the pre-trained model, and an analysis of XAI results.

4.1. The Graph Context Extraction of Program Code

We extract program paths from the input program source code, derived from abstract syntax trees, which preserve the semantic properties of the program code. For example, Figure 2 illustrates the difference between two sibling CWE types that derives from the semantic meanings of arguments: one type uses a relative path, the other an absolute path, and both are traced back to the syntax of the argument construction.
To capture links between semantic meanings and the syntax constructs, we considered extracting a path that has leaf nodes as code tokens and non-leaf nodes as the syntax constructs derived from Abstract Syntax Trees (AST). Figure 4 shows the complete syntactic constructs of the code example with the CWE23 Relative Path Traversal Weakness.
Figure 4. The complete paths of the syntactic construct and code tokens for Listing 1. The leaf nodes are code tokens, while the non-leaf nodes are syntactic constructs that define the syntax properties of code.
Syntactic constructs are the program syntax’s building blocks, including forty constructs such as loops, conditionals, declarations, and expressions. Table 1 lists a summary of syntactic constructs and the higher-level categorized meta syntactic constructs defined in the work [83]. The meta syntactic constructs preserve the semantic roles within a program. For instance, the Declarations, Definitions, Initializations meta construct category consists of syntactic constructs related to defining and initializing variables, functions, and objects.
Table 1. Syntactic constructs in Abstract Syntax Tree.
Further, traversing from one code token through the syntax paths to another shows the connection between code tokens and preserves the functional meanings. The example syntactic construct tree in Figure 4 represents the code listed in Listing 1, which contains the vulnerability type CWE23, Relative Path Traversal Weakness. The syntactic path String↑name↑type↑decl↓name↓root extends from the source code token String to the target code token root, where ↑ and ↓ denote the traversal directions. In this example, decl changes the traversal direction from upward to downward. We call a node that converts the traversal direction an inflection node. Through an inflection node, two code tokens in a pair are linked by the shortest path, which traverses the nearest inflection node.
Listing 1. Code snippet from the Juliet dataset [62]. The code is of vulnerability type CWE23—Relative Path Traversal Weakness.
Make 06 00050 i001
Therefore, the source code of programs is extracted into paths, each connecting a pair of code tokens through the shortest path. The set of all paths leading from source leaf nodes to a target leaf node constitutes the graph context of the target node, and all the source leaf nodes become the neighbor nodes of the target code token. In the graph context, the syntactic constructs along the paths lie on the edges that connect the source code tokens to the target code token. For example, Figure 5 illustrates the graph context extracted from the complete syntactic paths that cover the CWE23 vulnerability sample code in Listing 1.
Figure 5. An overview of embedding learning. The distributed representations of the target code token data are learned from the relevant context tokens (blue nodes), which are fed into a one-layer Graph Convolutional Network (GCN). $h_{w_i}$ and $h_{w_t}$ are the hidden representations of a context token and the target token, and $b$ is the added bias [39].
Two configuration parameters, path length and window, are involved in the graph context generation. The path length is the length of the shortest AST syntactic path connecting two leaf nodes (code tokens). The window is the maximum distance between the target code token and a neighboring code token within a code function, considering both upward and downward directions, as illustrated in Figure 6. When learning a target code token, only neighboring tokens located within this window, either upward or downward of it, are selected as source code token nodes in the graph context. Together, path length and window shape the scope of the graph context, as illustrated in Figure 5.
Figure 6. An example of how the window size restricts the selection of neighboring nodes as the source code node for the target code node data, considering both upwards and downwards directions.
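To make this construction concrete, the following sketch (our illustration, not the paper’s released implementation; the Node structure and parameter names are ours) extracts the shortest leaf-to-leaf AST path through the nearest inflection node and applies the path-length and window constraints:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                        # syntactic construct, or a code token at a leaf
    parent: "Node | None" = None
    children: list = field(default_factory=list)

def add_child(parent, label):
    child = Node(label, parent)
    parent.children.append(child)
    return child

def path_to_root(leaf):
    path, node = [], leaf
    while node is not None:
        path.append(node)
        node = node.parent
    return path

def shortest_path(src, tgt):
    """Leaf-to-leaf path that turns at the nearest common ancestor (inflection node)."""
    src_up, tgt_up = path_to_root(src), path_to_root(tgt)
    ancestors = {id(n): i for i, n in enumerate(src_up)}
    for j, n in enumerate(tgt_up):
        if id(n) in ancestors:        # nearest inflection node found
            i = ancestors[id(n)]
            return src_up[:i + 1] + list(reversed(tgt_up[:j]))
    return []

def graph_context(leaves, target_idx, max_path_len=8, window=10):
    """Neighbors of a target token, filtered by the window and path-length limits."""
    target, context = leaves[target_idx], []
    for i, src in enumerate(leaves):
        if i == target_idx or abs(i - target_idx) > window:
            continue                   # outside the upward/downward window
        path = shortest_path(src, target)
        if 0 < len(path) - 1 <= max_path_len:
            # keep the source token and the constructs along the connecting path
            context.append((src.label, [n.label for n in path[1:-1]]))
    return context
```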

4.2. Embedding by Graph Convolutional Networks

The graph context of each target token is used to learn the embedding of the target token. The source tokens within the graph context form the input vector to a learning model, and the output is the target token data. Figure 5 shows that the graph context vector is input to a one-layer Graph Convolutional Network (GCN). We adopted the GCN model developed in [39,84], which demonstrated its use in six software repository analysis tasks, including code classification. The output embeddings for each code token are 128-dimensional vectors containing information about the code token and syntactic constructs.
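As a minimal sketch of this step (assuming PyTorch; this is illustrative and not the GraphCodeVec implementation from [39]), a one-layer GCN that aggregates context-token vectors into a 128-dimensional target-token embedding could look as follows:

```python
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Aggregate context-token vectors into a target-token embedding."""
    def __init__(self, vocab_size, embed_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.linear = nn.Linear(embed_dim, embed_dim)  # weight W and bias b of Figure 5

    def forward(self, adj, token_ids):
        # adj: (n, n) normalized adjacency of the graph context
        # token_ids: (n,) vocabulary indices of context and target tokens
        h = self.embed(token_ids)                 # initial token vectors h_{w_i}
        return torch.relu(self.linear(adj @ h))   # one graph convolution step

# Usage sketch: the output row at the target token's index approximates h_{w_t}.
# gcn = OneLayerGCN(vocab_size=50_000)
# h = gcn(adj, token_ids)   # shape (n, 128)
```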

4.3. Feature Masking

XAI methods such as SHAP and Mean-Centroid PredDiff assess feature contribution values by changing features and measuring the learning models’ outputs. By masking a syntactic construct, the graph context of a targeted node is mutated, which results in the masking of neighboring tokens of the target token. As an example, Figure 7 shows an inflection node decl (an abbreviation for declaration) in the path String↑name↑type↑decl↓name↓data. When declaration is masked, the syntactic construct decl is not embedded in any graph context feature representation; correspondingly, String is excluded from the graph context of data. Through the masking of syntactic constructs, we can mutate the embedding of each target node and further assess the feature contribution values of each syntactic construct.
Figure 7. Feature masking for graph context mutation. After masking syntactic constructs such as decl, the target node’s embedding is mutated as the graph context changes.
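On the extracted graph context, masking a syntactic construct amounts to dropping every path whose edge labels contain that construct. A minimal helper (ours, operating on the (token, path) pairs produced by the graph-context sketch in Section 4.1) illustrates the idea:

```python
def mask_construct(context, masked):
    """Drop neighbors whose connecting path uses the masked syntactic construct.

    context: list of (source_token, [constructs on the path]) pairs,
             e.g. the output of graph_context() above.
    masked:  a syntactic construct such as 'decl'.
    """
    return [(token, path) for token, path in context if masked not in path]

# Example: masking 'decl' removes 'String' from the graph context of 'data',
# since their only connecting path traverses the inflection node 'decl'.
```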

4.4. Integrating XAI Methods in Multi-Classification

The graph context is processed through a Graph Convolutional Network (GCN) [39] model to generate code token embeddings as 128-dimensional vectors. These embeddings are then classified into CWE types using various classifiers, such as the Text Convolutional Neural Network (TextCNN) [85], Random Forest [86], and Transformer [79].
An XAI method $\Phi$ operates on the trained model and estimates feature contributions by masking or mutating the feature representation. In Section 2, we introduced two XAI methods, SHAP and Mean-Centroid PredDiff, which are applicable as $\Phi$. The XAI outputs are vectors for each CWE type, ranked by the contribution value of each syntactic construct. An example of the XAI outputs is illustrated in Listing 2. The syntactic constructs are ranked in descending order according to the feature contribution value. Hence, we obtain a ranked sequence of syntactic constructs for each CWE type classified in the dataset.
Listing 2. CWE23 vector with syntactic constructs and their feature contribution values.
Make 06 00050 i002
Specifically, each XAI method produces one CWE vector, as demonstrated in Algorithm 1. We aggregated the contribution values indexed by the syntactic constructs from the different XAI methods, computed their average, and derived the final CWE vector. This result helps us quantify the contributions at the level of syntactic constructs, in addition to code tokens, in the task of classifying vulnerable code into CWE types.
Algorithm 1 Compute the CWE vector of each syntactic construct’s contribution value
Input:
  • The input dataset $X$;
  • The full AST construct feature set $P = \{1, \ldots, j, \ldots, p\}$;
  • The subset $S \subseteq P \setminus \{j\}$ obtained by masking feature $j$;
  • The contribution value of feature $j$, $\phi_j = \Phi(P, S, j, \hat{f}(X))$;
  • The model prediction under feature-$j$ masking, $\hat{f}(X)$;
  • The CWE label set $K = \{cwe_1, \ldots, cwe_k, \ldots, cwe_N\}$.
 1: /* Partition dataset by ground-truth CWE label */
 2: for all $x_i \in X$ do
 3:     if $x_i$ owns label $cwe_k$ then
 4:         Add $x_i$ to $X_{cwe_k}$
 5:     end if
 6: end for
 7: /* Compute CWE vector of feature contribution values */
 8: for all $X_{cwe_k}$ do
 9:     for all $j \in P$ do
10:         $\phi_j^{cwe_k} = \Phi(P, S, j, \hat{f}(X_{cwe_k}))$
11:     end for
12:     $\overline{\phi_j^{cwe_k}} = \frac{1}{\|P\|} \phi_j^{cwe_k}$
13:     /* Create CWE vector */
14:     for all $j \in P$ do
15:         $V_{cwe_k} \leftarrow \langle j, \overline{\phi_j^{cwe_k}} \rangle$
16:     end for
17: end for
18: /* Sort elements in descending order by feature contribution value */
19: for all $cwe_k \in K$ do
20:     $V_{cwe_k} \leftarrow \{\mathrm{sort}(V_{cwe_k})\}$
21: end for
Output: The CWE vector for each CWE label $cwe_k \in K$: $V_K$.
We analyzed the complexity of our algorithms. Given the number of dataset samples $N$, the number of features $P$, and the number of CWE labels $K$, the complexity of Algorithm 1 is $\Theta(P \times K \times \Theta(\Phi))$, in which, for SHAP, $\Theta(\Phi) = \Theta(N \times (2^P + P^3))$, and for Mean-Centroid PredDiff, $\Theta(\Phi) = \Theta(N \times P^2)$.
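For reference, a compact Python rendering of Algorithm 1 could be structured as below; `xai_methods` stands for the $\Phi$ instantiations (e.g., SHAP and Mean-Centroid PredDiff), and all names and signatures are illustrative rather than our released implementation:

```python
from collections import defaultdict

def cwe_vectors(X, y, constructs, xai_methods, model_predict):
    """Rank syntactic constructs per CWE type (sketch of Algorithm 1).

    X: samples (graph-context feature representations)
    y: ground-truth CWE label of each sample
    constructs: the feature set P of syntactic constructs
    xai_methods: callables Phi(construct, samples, model_predict) -> contribution
    model_predict: prediction function of the trained classifier
    """
    # Partition the dataset by ground-truth CWE label.
    partitions = defaultdict(list)
    for x_i, cwe_k in zip(X, y):
        partitions[cwe_k].append(x_i)

    # For each CWE type, average each construct's contribution over the XAI
    # methods, then sort constructs in descending order of contribution.
    V = {}
    for cwe_k, X_k in partitions.items():
        mean_phi = {
            j: sum(phi(j, X_k, model_predict) for phi in xai_methods) / len(xai_methods)
            for j in constructs
        }
        V[cwe_k] = sorted(mean_phi.items(), key=lambda kv: kv[1], reverse=True)
    return V
```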

4.5. CWE Similarity Assessment

The assessment of CWE similarity consists of two steps. The first step is to derive the CWE similarity pairs from the XAI explanation summary. The second step is to validate the CWE similarity from the baseline (ground truth) from the knowledge base of the CWE community.
CWE Similarity Score. We represent the similarity score between CWEs as $\rho$, derived from the normalized ranking distance [87] between two sorted CWE vectors. The value of $\rho$ ranges from zero, indicating identical CWE pairs, to one, indicating complete dissimilarity; a lower $\rho$ value indicates a higher similarity between a pair of CWEs. For a given CWE, we sort the $\rho$ values and list the other CWEs in descending order of similarity (i.e., ascending $\rho$).
The CWE similarity assessment follows the simplified steps below, with details outlined in Algorithm 2:
  • Based on the sorted CWE vectors, we compute the similarity score between CWE types. Therefore, any pair of CWE types has a similarity score.
  • Given a CWE type, we rank all the pairs that involve this CWE type by their similarity scores.
  • Given a CWE type, the CWE type with the highest similarity score becomes the most similar to the given CWE type.
Algorithm 2 Compute CWE similarity vector for CWE types
Input:
  • The sorted vectors of feature importance for each CWE label, $V_{cwe_k}$, output from Algorithm 1;
  • The CWE label set $K = \{cwe_1, \ldots, cwe_k, \ldots, cwe_N\}$.
 1: Initialize an empty array $d$ for storing ranking distances.
 2: for all distinct pairs of CWE labels $\langle cwe_i, cwe_j \rangle \in K \times K$ do
 3:     /* Calculate Kendall tau ranking distance between CWE vectors */
 4:     $d_{ij} \leftarrow \mathrm{distance}(V_{cwe_i}, V_{cwe_j})$
 5:     Store $d_{ij}$ in vector $d$.
 6: end for
 7: $d_{max} = \max(d)$
 8: /* Compute normalized CWE similarity distance */
 9: for all CWE labels $cwe_j$ do
10:     $\rho(cwe_i, cwe_j) = d_{ij} / d_{max}$
11:     $W_{cwe_i} \leftarrow \langle cwe_j, \rho(cwe_i, cwe_j) \rangle$
12: end for
13: /* Sort elements in ascending order of $\rho_{ij}$, i.e., in descending order of similarity */
14: for all $cwe_k \in K$ do
15:     $W_{cwe_k} \leftarrow \{\mathrm{sort}(W_{cwe_k})\}$
16: end for
Output: The CWE similarity vector for each CWE label $cwe_k \in K$: $W_K$.
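A small sketch of the distance computation in Algorithm 2, counting discordant construct pairs directly as the Kendall tau ranking distance (our illustration; names are hypothetical):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of construct pairs ordered differently by the two rankings."""
    pos_a = {c: i for i, c in enumerate(rank_a)}
    pos_b = {c: i for i, c in enumerate(rank_b)}
    return sum(
        1
        for c1, c2 in combinations(rank_a, 2)
        if (pos_a[c1] - pos_a[c2]) * (pos_b[c1] - pos_b[c2]) < 0
    )

def cwe_similarity(V):
    """Normalized pairwise distances rho between CWE types (sketch of Algorithm 2)."""
    ranks = {cwe: [construct for construct, _ in vec] for cwe, vec in V.items()}
    d = {
        tuple(sorted(pair)): kendall_tau_distance(ranks[pair[0]], ranks[pair[1]])
        for pair in combinations(ranks, 2)
    }
    d_max = max(d.values())
    W = {}
    for cwe in ranks:
        pairs = [(other, d[tuple(sorted((cwe, other)))] / d_max)
                 for other in ranks if other != cwe]
        W[cwe] = sorted(pairs, key=lambda kv: kv[1])  # ascending rho: most similar first
    return W
```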
These pair-wise CWE similarity results are derived from the learning process combined with XAI methods. Meanwhile, we developed the baseline similarity pairs. As an example in Figure 2, any two sibling CWE types in [60] that share the same parent CWE type form a CWE similarity pair. The complexity of this step is $\Theta(K^2)$, where $K$ is the number of CWE types.
CWE Similarity Validation. Further, we compared the pair-wise CWE similarity derived by the XAI methods with the baseline CWE similarity in terms of four metrics, namely, Top-N Similarity Hit, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Average Normalized Similarity Score $\bar{S}$. Top-N Similarity Hit and the Average Normalized Similarity Score focus on the occurrence of a baseline CWE type in our explanation; MRR and MAP focus on the ranking of its occurrence. Better score values indicate better explanation quality and, thus, more accurate contribution values for the syntactic constructs.
The complexity of Algorithm 2 depends on the number of CWE pair combinations. Table 2 shows the baseline of CWE similarity defined by the open community. The CWE types are classified in a tree structure, and sibling leaves share a commonality with their parent CWE type. For example, CWE22, CWE23, and CWE36 all fall under the path traversal weakness; they therefore form three CWE similarity pairs.
Table 2. CWE categorized by baseline similarities [60].
To validate the CWE similarity derived from the XAI explanation, we apply four metrics for comparison with the baseline: Top-N Similarity Hit, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Average Normalized Similarity Score $\bar{S}$. $B_{cwe_i}$ is the set of all CWE types that are siblings of $cwe_i$ in the baseline, and $W_{cwe_i}$ is the ranked set of CWE types derived from Algorithm 2. Taking CWE23 as an example, with $B_{cwe_{23}} = \{cwe_{22}, cwe_{36}\}$ and $W_{cwe_{23}} = \{cwe_{22}, cwe_{79}, \ldots, cwe_{36}\}$, each metric can be illustrated as follows; a code sketch after the list reproduces the CWE23 numbers.
  • Top-N Similarity Hit is defined as a boolean value. For example, the Top-1 Similarity Hit of CWE23 equals one, since its top-ranked CWE type, CWE22, is a baseline sibling.
    $H_N^{cwe_k} = \begin{cases} 0 & \text{if } B_{cwe_k} \cap W_N^{cwe_k} = \emptyset \\ 1 & \text{otherwise} \end{cases}$
  • Mean Reciprocal Rank (MRR) measures the mean reciprocal rank of the baseline siblings of a given CWE type $cwe_i$:
    $MRR_{cwe_i} = \frac{1}{\|B_{cwe_i}\|} \sum_{cwe_j \in B_{cwe_i}} \frac{1}{rank_{cwe_j}}$
    where $rank_{cwe_j}$ is the position index of $cwe_j$ in $W_{cwe_i}$. In the example of $W_{cwe_{23}}$, CWE22 is ranked first and CWE36 is ranked $k = 14$, so $MRR_{cwe_{23}} = \frac{1}{2} \times (1 + \frac{1}{k}) = \frac{1}{2} \times (1 + \frac{1}{14}) = 0.5357$.
  • Mean Average Precision (MAP) is a metric used to measure the XAI explanation accuracy of CWE type similarity by averaging the precision of each CWE type’s similarity rank. Let $W_\kappa^{cwe_i}$ represent the top-$\kappa$ subset of $W_{cwe_i}$, where $N$ is a cut-off rank. For a given CWE type $cwe_i$, the Average Precision (AP) is calculated as the mean precision value at each rank:
    $AP_{cwe_i} = \frac{1}{\|B_{cwe_i}\|} \sum_{\kappa=1}^{N} \frac{\|B_{cwe_i} \cap W_\kappa^{cwe_i}\|}{\|W_\kappa^{cwe_i}\|} \cdot rel(\kappa)$
    where $rel(\kappa)$ is an indicator function that equals one if the item at rank $\kappa$ is a ground-truth sibling CWE type of $cwe_i$, that is, $W_{cwe_i}[\kappa] \in B_{cwe_i}$, and zero otherwise. In the example of $W_{cwe_{23}}$, CWE22 is ranked first and CWE36 is ranked $k = 14$, so $AP_{cwe_{23}} = \frac{1}{2} \times (1 + \frac{2}{k}) = \frac{1}{2} \times (1 + \frac{2}{14}) = 0.5714$. Finally, given an XAI explanation method $\Phi$, MAP is calculated as the mean average precision over all $Q$ CWE types:
    $MAP_\Phi = \frac{1}{Q} \sum_{q=1}^{Q} AP_{cwe_q}$
  • Average Normalized Similarity Score ($\bar{S}$) measures the average normalized similarity score over all CWE types in the baseline:
    $\bar{S} = \frac{\sum_{cwe_k \in B_{cwe_i}} \sum_{cwe_j \in W_{cwe_i}} \left(1 - \rho(cwe_k, cwe_j)\right)}{\|B_{cwe_i}\| \cdot \|W_{cwe_i}\|}$
    where $\rho(cwe_k, cwe_j)$ represents the similarity between a CWE type $cwe_k$ in the baseline and a CWE type $cwe_j$ derived using an XAI method, calculated as in Algorithm 2.
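The following sketch (ours; function names are illustrative) computes the occurrence-based metrics for a single CWE type and reproduces the CWE23 worked example above:

```python
def top_n_hit(baseline, derived, n):
    """1 if any baseline sibling appears in the top-n of the derived ranking."""
    return int(any(cwe in derived[:n] for cwe in baseline))

def mrr(baseline, derived):
    """Mean reciprocal rank of the baseline siblings in the derived ranking."""
    return sum(1 / (derived.index(cwe) + 1) for cwe in baseline) / len(baseline)

def average_precision(baseline, derived):
    """Mean precision at each rank where a baseline sibling occurs."""
    hits, score = 0, 0.0
    for k, cwe in enumerate(derived, start=1):
        if cwe in baseline:
            hits += 1
            score += hits / k
    return score / len(baseline)

# Worked example for CWE23: baseline siblings {CWE22, CWE36}; the XAI-derived
# ranking places CWE22 first and CWE36 fourteenth (fillers are placeholders).
derived = ["CWE22"] + [f"CWE{i}" for i in range(100, 112)] + ["CWE36"]
baseline = {"CWE22", "CWE36"}
print(top_n_hit(baseline, derived, 1))                 # 1
print(round(mrr(baseline, derived), 4))                # 0.5357 = (1 + 1/14) / 2
print(round(average_precision(baseline, derived), 4))  # 0.5714 = (1 + 2/14) / 2
```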

5. The Evaluation and Results

The evaluation aims to assess the importance of the contributions of syntactic constructs. These syntactic constructs are inflection nodes in the Abstract Syntax Tree (AST) that connect code token nodes in paths that convey semantic meanings [88]. We summarize the feature importance of syntactic constructs using XAI methods. We also validate the CWE similarity pairs from the XAI explanations against the baseline from the community knowledge base. We present research experiments that answer two specific research questions, as follows.
RQ1. What are the top-ranking syntactic constructs that contribute most to the multi-classification of software vulnerability? This question relies on the XAI methods to determine the importance of code tokens traversing syntactic construct paths that contribute to the deep learning model’s prediction for various vulnerability types.
RQ2. How does the CWE similarity summarized by XAI methods align with the expert-defined similarity? This question applies the measurement of CWE similarity pairs to the baseline CWE similarity pairs to validate whether the explanations of syntactic constructs correspond to the expert-established ground truth. Thus, the explanation maps the syntactic constructs to human-understandable CWE types of semantic meanings of vulnerable artifacts.

5.1. Datasets

Our experiment examined three benchmark software vulnerability datasets at the method or function level, including the Juliet Test Suite (Java) [62], OWASP Benchmark (Java) [89], and Draper (C/C++) [6]. These datasets can be sorted into three categories [23] based on the method of collection and annotation of the code samples. They represent synthetic, semi-synthetic, and real data, respectively.
Synthetic data refer to instances where both the vulnerability code example and its annotations are artificially constructed. The Juliet Test Suite, a product of the National Security Agency’s Center for Assured Software, falls under this category. Comprising 217 vulnerable methods (42%) and 297 non-vulnerable methods (58%), this dataset provides a balanced distribution of method-level examples, all synthesized from recognized vulnerable patterns.
Semi-synthetic data involve either the code or its annotation being artificially derived. The OWASP Benchmark dataset, also Java-based, is an example of semi-synthetic data. We captured 1415 vulnerable methods (52%) and 1325 non-vulnerable methods (48%) from this dataset.
Real data, on the other hand, involve code and corresponding vulnerability annotations sourced from real-world repositories. The Draper dataset fits into this category. The functions in this dataset are collected from open-source repositories and annotated using static analyzers. While the original dataset presented an imbalanced distribution, we reprocessed it into a balanced dataset to analyze vulnerable code types and their constructs and characteristics, while preserving all comments and code. Consequently, this dataset includes 43,506 (50.1%) vulnerable functions. In Table 3, we summarize the vulnerability types and their respective distributions in each dataset. Vulnerability types with less than 1% distribution are grouped into a CWE-Other type.
Table 3. CWE distribution by dataset.
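As an illustration of this preprocessing (a minimal sketch assuming a pandas DataFrame with hypothetical cwe and vulnerable columns; not our released pipeline), rare types can be merged into CWE-Other and the majority class downsampled:

```python
import pandas as pd

def preprocess(df, label_col="cwe", vuln_col="vulnerable", seed=42):
    # Group CWE types below 1% of the samples into a CWE-Other type.
    freq = df[label_col].value_counts(normalize=True)
    rare = freq[freq < 0.01].index
    df.loc[df[label_col].isin(rare), label_col] = "CWE-Other"

    # Balance vulnerable vs. non-vulnerable samples by downsampling the majority
    # class (assumes a boolean 'vulnerable' column).
    vuln, safe = df[df[vuln_col]], df[~df[vuln_col]]
    n = min(len(vuln), len(safe))
    return pd.concat([vuln.sample(n, random_state=seed),
                      safe.sample(n, random_state=seed)])
```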

5.2. Assessing Contribution of Syntactic Constructs (RQ1)

The three-step approach for assessing syntactic construct importance with settings includes the following:
Step 1: Converting the Graph Context. We utilized the srcML tool [83] to transform the method-level program into an AST structure. In this process, we removed code comments and retained mathematical and logical operators. The output from srcML is XML-based content encompassing both the code tokens (the leaf nodes in the AST) and the AST paths. This content was subsequently converted into a graph context, as introduced in Section 4.1. We retained the maximum path length of eight and the window size of ten as default values from the graph neural network model [39].
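For this step, srcML can be invoked from the command line and its XML output walked to recover code tokens (element text) and syntactic constructs (element tags). A hedged sketch, assuming srcML’s standard behavior of printing srcML XML for a single input file:

```python
import subprocess
import xml.etree.ElementTree as ET

def tokens_and_constructs(source_file):
    """Run srcML on a source file; return (code token, enclosing constructs) pairs."""
    xml_text = subprocess.run(["srcml", source_file], capture_output=True,
                              text=True, check=True).stdout
    root = ET.fromstring(xml_text)

    pairs = []
    def walk(elem, constructs):
        tag = elem.tag.split("}")[-1]              # strip the srcML XML namespace
        if elem.text and elem.text.strip():
            pairs.append((elem.text.strip(), constructs + [tag]))
        for child in elem:
            walk(child, constructs + [tag])
            if child.tail and child.tail.strip():  # tokens between child elements
                pairs.append((child.tail.strip(), constructs + [tag]))
    walk(root, [])
    return pairs
```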
Step 2: Learning Embedding for Code Tokens. The graph context of each target token was used to learn the embedding of the target token. The graph convolutional neural network model was connected with a classifier layer for downstream classification tasks. The graph convolutional neural network model has one layer with a batch size of 64 and a dropout rate of 0. The embedding vector dimension is 128, which represents each code token for the classification models.
Step 3: Feature Masking and Feature Importance Ranking. We maintained the full graph context to retrieve the embedding sets for the entire program’s code tokens. After masking each syntactic construct, we obtained altered neighbour tokens in the graph context, which serve as the feature-masking input to the XAI methods SHAP and Mean-Centroid PredDiff. We compiled the results by averaging the contribution values across the XAI methods.
Results. Table 4 (for Step 2) presents the performance of three classifiers augmented with GCN-based embeddings on the Juliet, OWASP, and Draper datasets. The TextCNN classifier outperforms Random Forest and Transformer on all three datasets. We therefore chose TextCNN as the classifier with GCN embeddings for the subsequent XAI tasks.
Table 4. Performance of classifiers augmented with GCN embeddings.
Figure 8 (for Step 3) shows, for each CWE type, the importance ranking of the meta syntactic constructs categorized in Table 1. We observe that, despite the varying importance orders of the syntactic constructs for each CWE type, certain constructs, such as statement_subelements, parameters, name, and statement, are consistently ranked highly across multiple CWEs, suggesting their general impact on code vulnerabilities. For instance, CWE78, CWE79, and CWE89 share similar top-ranked constructs, such as statement_subelements, name, decl_def_init, and operators. On the other hand, syntactic constructs such as specifier and classes have a lower importance across CWEs.
Answer summary to RQ1: Syntactic constructs statement_subelements, statement, name, and parameters consistently rank highly across sixteen CWE types, approximately 80% of all CWE types, indicating their contribution to code vulnerability classification.
Figure 8. Feature importance of meta syntactic constructs per CWE type, represented in descending order clockwise. Importance is quantified as the normalized feature contribution value from the XAI method, shown in the leaf nodes after the construct’s name. CWEs that describe similar vulnerability issues [60] are also categorized in the dendrogram.

5.3. CWE Similarity Explained by XAI Methods (RQ2)

Driven by the observations of syntactic similarities among certain CWE types, we further quantified CWE similarity based on the feature importance rank. We then compared these results with an expert-defined CWE similarity baseline.
Step 1: We computed the CWE similarity based on the importance rank of syntactic constructs. Figure 8 shows the importance values of nine metadata syntactic constructs grouped by CWE type. We further expanded the assessment of forty syntactic constructs to obtain the CWE similarity distance of any two CWE pairs using the full list of syntactic construct importance ranking, following Algorithm 2.
Step 2: We validated our XAI-based CWE similarity against the expert-defined baseline [60]. The similar sibling set for each CWE (required in Algorithm 2) is listed in Table 2. To assess our results, we employed four metrics to measure each CWE type, including Top-N Similarity Hit, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Average Normalized Similarity Score (ANSS) (Section 4.5). We obtained an aggregated score by averaging across all CWE types listed in Table 5.
Table 5. CWE similarity evaluation results.
Results. Figure 9 (for Step 1) presents the CWE similarity $\rho$ for the three datasets. For instance, CWE23 and CWE22 show a strong similarity with a low distance value, indicating that they share similar syntactic constructs. On the other hand, the pair consisting of CWE23 and CWE328 has a high distance value in the matrix, indicating low similarity.
Figure 9. CWE similarity score $\rho(cwe_i, cwe_j)$ for each CWE pair, derived from the syntactic construct feature importance using the XAI-based approach.
The results presented in Table 5 (for Step 2) evaluate the similarity of CWE types explained by the XAI methods against the baseline. The average Top-1 hit is approximately 78%, which means our XAI approach identifies a correct sibling for 78% of CWE types. Considering the Top-5 hit, the accuracy in identifying siblings improves to 89%. The Top-N metrics focus on the existence of a sibling CWE in the XAI explanation.
The Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) further consider the ranking of a sibling CWE type derived from the XAI explanation. MAP is approximately 70%, considering both the number of existing sibling CWEs and their rankings. Revisiting the example of CWE23, whose baseline siblings are CWE22 and CWE36, the XAI methods place CWE22 in the first position and CWE36 in the 14th position; this low ranking of CWE36 explains CWE23’s lower MRR and MAP values.
Answer summary to RQ2: The XAI approach identifies the similarity between CWE types through the changes incurred in the deep learning model’s classification due to feature masking. We applied metrics to evaluate the alignment of the XAI-derived similarity with the expert-established baseline by measuring both the occurrence and the occurrence rankings of similar CWE types. The alignment connects the deep learning feature representations with the human-understandable CWE types. In addition to the deep learning vulnerability classification, our XAI approach traces back along the syntactic construct paths to locate the code that could lead to the misclassification of a similar CWE type.

5.4. Reflection on the Motivating Case

Studying the case presented in Section 2.1, we observe that the graph-based learning model faces misclassification among similar CWE types, as illustrated in Figure 10. In the example presented in Figure 2, a key difference in the processing of paths in CWE23 and CWE36 becomes evident: CWE23 employs a relative path embodied in the expression File file = new File(root + data), while CWE36 uses an absolute path, represented as File file = new File(data). Our XAI approach probes the feature importance in terms of syntactic constructs to explain the potential cause of the misclassification. Both the CWE23 and CWE36 cases exhibit similar overall feature importance ranking sequences, with name and if ranked as the top two syntactic constructs. However, in the case of CWE23, constructs such as argument_list, argument, and operator are ranked higher than they are in the case of CWE36. This subtle difference in feature ranking aligns with the unique characteristics of CWE23, which incorporates an additional argument root and a + operator into the file construction, thereby creating the relative path.
Figure 10. The CWE23 code snippet contains two additional AST paths (marked in red), with argument and operator, which turn the absolute path into a relative path, compared with CWE36.

5.5. Summary of Findings and Existing Research

We analyzed our findings via a comparative summary of previous studies. We focused on the insights drawn from the current research, areas of alignment, and novel discoveries listed in Table 6.
Table 6. Summary of our findings compared with existing work.
According to the manifesto of XAI [27], our work is within the scope of the application of attribute-based XAI methods. We demonstrate that our work relates to two aspects of the manifesto, as follows: (1) evaluating XAI methods and the explanation—we integrated the state-of-the-art XAI methods within a pipeline where the semantic relations among the CWE types defined by domain experts are encoded as the XAI output ranking and explanation evaluation; (2) supporting the human-centeredness of explanations—we were able to identify and rank the contributions of syntactic constructs across languages. Such information helps to explain, in a human-understandable way, the syntactic-level differences that relate to the misclassification of similar CWE types.

6. Threats to Validity

Threats to the validity of our work include the following factors: (1) internal validity threats stemming from the limited model evaluation and the use of specific datasets for XAI and (2) external validity threats related to the transferability of the identified importance and similarity of syntactic constructs to emerging CWE types.
Dataset. We employed three datasets, namely, Juliet, OWASP, and Draper. Both Juliet and OWASP consist of synthetic samples with artificially constructed annotations, which may limit their generalizability to real-world data. Draper consists of samples derived from real-world source code. However, Draper does not share overlapping CWE types with either Juliet or OWASP. Because the datasets share no common CWE types, we cannot cross-validate the top-ranking sequences of syntactic constructs for common CWE types across multiple datasets. As a result, we cannot further validate the consistency of the XAI explanations across datasets. In our previous work [31,61], we defined explanation consistency metrics to measure explanations across multiple datasets.
Models. Our XAI-based framework, depicted in Figure 3, is model-agnostic, including the embedding and classifier models. We applied one graph-embedding model adopted from GraphCodeVec [39], three deep learning models to be used as classifiers, and two XAI methods. The variance incurred by different models can be further evaluated by introducing more models to assess the explanation stability [31,61] across multiple models on the same datasets.
Transferability to Broader CWE Sets. Our study involves 20 CWE types, each above the 1% distribution threshold, from three datasets. These 20 CWE types include six of the top twenty-five most dangerous software weaknesses listed by the CWE community [90]. For the full list of CWE types, the issues are imbalanced data samples and the lack of labelled real-world datasets akin to Draper. In this paper, we preprocessed the Draper dataset to ensure it was balanced, since a poor classification performance on an imbalanced dataset renders the explanation results meaningless. Both the explanation stability and the consistency can be further validated with larger, balanced datasets.

7. Conclusions

In this work, we established an explanation that links program code in a graph context, as the learned features, to the semantics of vulnerability types collectively defined by open community experts. Our study begins by defining a feature type taxonomy of code representations and subsequently progresses to analyze syntactic constructs within abstract-syntax-tree-based graph code representations. We developed an XAI-based framework to explain the relationship between 20 code vulnerability types and 40 syntactic constructs from three Java and C++ datasets. We observed that the variation in the syntactic construct importance rankings relates to the intrinsic similarities amongst certain CWEs that share common vulnerability characteristics. We thus derived the CWE similarity based on the XAI explanation summary and validated it against the expert-defined baseline, applying four types of information retrieval metrics to evaluate our XAI-based results. Our study links the comprehension of code semantics with the syntactic feature representations learned by deep learning models for vulnerability classification.

Author Contributions

Conceptualization, Y.L. and D.L.; methodology, D.L. and Y.L.; software, D.L. and J.H.; validation, D.L. and Y.L.; formal analysis, D.L. and Y.L.; investigation, D.L. and Y.L.; resources, Y.L.; data curation, D.L. and J.H.; writing—original draft preparation, D.L.; writing—review and editing, D.L. and Y.L. and J.H.; visualization, D.L.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2020-06797.

Data Availability Statement

The datasets used in this study are sourced from the open-source Juliet Test Suite for Java [62], the OWASP Benchmark for Java [89], and the Draper C/C++ suite [6]. The processed dataset example is available on GitHub: https://github.com/DataCentricClassificationofSmartCity/XAI-based-Software-Vulnerbility-Dection/tree/main/dataset, accessed on 1 May 2024.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. National Institute of Standards and Technology (NIST). Vulnerability Definition; Computer Security Resource Center: Gaithersburg, MD, USA, 2012.
  2. Dam, H.K.; Tran, T.; Pham, T.; Ng, S.W.; Grundy, J.; Ghose, A. Automatic feature learning for predicting vulnerable software components. IEEE Trans. Softw. Eng. 2019, 47, 67–85. [Google Scholar] [CrossRef]
  3. Zou, D.; Wang, S.; Xu, S.; Li, Z.; Jin, H. μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2224–2236. [Google Scholar] [CrossRef]
  4. Ghaffarian, S.M.; Shahriari, H.R. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Comput. Surv. (CSUR) 2017, 50, 56. [Google Scholar] [CrossRef]
  5. Shin, Y.; Meneely, A.; Williams, L.; Osborne, J.A. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE Trans. Softw. Eng. 2010, 37, 772–787. [Google Scholar] [CrossRef]
  6. Russell, R.; Kim, L.; Hamilton, L.; Lazovich, T.; Harer, J.; Ozdemir, O.; Ellingwood, P.; McConley, M. Automated vulnerability detection in source code using deep representation learning. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 757–762. [Google Scholar]
  7. Zimmermann, T.; Nagappan, N.; Williams, L. Searching for a needle in a haystack: Predicting security vulnerabilities for windows vista. In Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, Paris, France, 6–10 April 2010; pp. 421–428. [Google Scholar]
  8. Lin, G.; Wen, S.; Han, Q.L.; Zhang, J.; Xiang, Y. Software vulnerability detection using deep neural networks: A survey. Proc. IEEE 2020, 108, 1825–1848. [Google Scholar] [CrossRef]
  9. Morrison, P.; Herzig, K.; Murphy, B.; Williams, L. Challenges with applying vulnerability prediction models. In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, Urbana, IL, USA, 21–22 April 2015; pp. 1–9. [Google Scholar]
  10. Wheeler, D.A. Flawfinder. 2021. Available online: https://github.com/david-a-wheeler/flawfinder (accessed on 1 May 2024).
  11. Checkmarx. Checkmarx Software Security Platform. 2021. Available online: https://www.checkmarx.com (accessed on 1 May 2024).
  12. Kals, S.; Kirda, E.; Krügel, C.; Jovanovic, N. SecuBat: A Web Vulnerability Scanner. In Proceedings of the 15th International Conference on World Wide Web, Edinburgh, UK, 23–26 May 2006; pp. 247–256. [Google Scholar]
  13. PortSwigger. Burp Suite Web Vulnerability Scanner. 2021. Available online: https://portswigger.net/burp (accessed on 1 May 2024).
  14. Acunetix. Acunetix Web Vulnerability Scanner. 2021. Available online: https://www.acunetix.com/vulnerability-scanner (accessed on 1 May 2024).
  15. Nadeem, M.; Williams, B.J.; Allen, E.B. High false positive detection of security vulnerabilities: A case study. In Proceedings of the 50th Annual Southeast Regional Conference, Tuscaloosa, AL, USA, 29–31 March 2012; pp. 359–360. [Google Scholar]
  16. Shin, Y.; Williams, L. An empirical model to predict security vulnerabilities using code complexity metrics. In Proceedings of the 2nd ACM-IEEE IEEE International Symposium on Empirical Software Engineering and Measurement, Kaiserslautern, Germany, 9–10 October 2008; pp. 315–317. [Google Scholar]
  17. Shin, Y.; Williams, L. Can traditional fault prediction models be used for vulnerability prediction? Empir. Softw. Eng. 2013, 18, 25–59. [Google Scholar] [CrossRef]
  18. Sestili, C.D.; Snavely, W.S.; VanHoudnos, N.M. Towards security defect prediction with AI. arXiv 2018, arXiv:1808.09897. [Google Scholar]
  19. Lin, G.; Tang, M.; Wang, Y.; Luo, W.; Luo, X.; Liao, X. Cross-project transfer representation learning for vulnerable function discovery. IEEE Trans. Ind. Informat. 2018, 14, 3289–3297. [Google Scholar] [CrossRef]
  20. Jiang, J.; Wen, S.; Yu, S.; Xiang, Y.; Zhou, W. Identifying propagation sources in networks: State-of-the-art and comparative studies. IEEE Commun. Surveys Tuts. 2017, 19, 465–481. [Google Scholar] [CrossRef]
  21. Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 10197–10207. [Google Scholar]
  22. Wang, H.; Ye, G.; Tang, Z.; Tan, S.H.; Huang, S.; Fang, D.; Feng, Y.; Bian, L.; Wang, Z. Combining graph-based learning with automated data collection for code vulnerability detection. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1943–1958. [Google Scholar] [CrossRef]
  23. Chakraborty, S.; Krishna, R.; Ding, Y.; Ray, B. Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Trans. Softw. Eng. 2022, 48, 3280–3296. [Google Scholar] [CrossRef]
  24. Lin, G.; Zhang, J.; Luo, W.; Pan, L.; De Vel, O.; Montague, P.; Xiang, Y. Software vulnerability discovery via learning multi-domain knowledge bases. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2469–2485. [Google Scholar] [CrossRef]
  25. Zeng, P.; Lin, G.; Pan, L.; Tai, Y.; Zhang, J. Software vulnerability analysis and discovery using deep learning techniques: A survey. IEEE Access 2020, 8, 197158–197172. [Google Scholar] [CrossRef]
  26. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120. [Google Scholar] [CrossRef]
  27. Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301. [Google Scholar] [CrossRef]
  28. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777. [Google Scholar]
  29. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  30. Guo, W.; Mu, D.; Xu, J.; Su, P.; Wang, G.; Xing, X. Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 364–379. [Google Scholar]
  31. Li, D.; Liu, Y.; Huang, J.; Wang, Z. A Trustworthy View on Explainable Artificial Intelligence Method Evaluation. Computer 2023, 56, 50–60. [Google Scholar] [CrossRef]
  32. Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. code2vec: Learning distributed representations of code. Proc. ACM Program. Lang. 2019, 3, 1–29. [Google Scholar] [CrossRef]
  33. Hariharan, M.; Tanwar, A.; Sundaresan, K.; Ganesan, P.; Ravi, S.; Karthik, R. Proximal Instance Aggregator networks for explainable security vulnerability detection. Future Gener. Comput. Syst. 2022, 134, 303–318. [Google Scholar]
  34. Sotgiu, A.; Pintor, M.; Biggio, B. Explainability-based Debugging of Machine Learning for Vulnerability Discovery. In Proceedings of the 17th International Conference on Availability, Reliability and Security, Vienna, Austria, 23–26 August 2022; pp. 1–8. [Google Scholar]
  35. Jin, C.; Rinard, M. Evidence of Meaning in Language Models Trained on Programs. arXiv 2023, arXiv:2305.11169. [Google Scholar]
  36. Christey, S.; Kenderdine, J.; Mazella, J.; Miles, B. Common Weakness Enumeration; Mitre Corporation: McLean, VA, USA, 2013. [Google Scholar]
  37. Hariyanti, E.; Djunaidy, A.; Siahaan, D. Information security vulnerability prediction based on business process model using machine learning approach. Comput. Secur. 2021, 110, 102422. [Google Scholar] [CrossRef]
  38. Pan, S.; Bao, L.; Xia, X.; Lo, D.; Li, S. Fine-grained Commit-level Vulnerability Type Prediction by CWE Tree Structure. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; pp. 957–969. [Google Scholar]
  39. Ding, Z.; Li, H.; Shang, W.; Chen, T.H. Towards Learning Generalizable Code Embeddings using Task-agnostic Graph Convolutional Networks. ACM Trans. Softw. Eng. Methodol. 2022, 32, 1–43. [Google Scholar] [CrossRef]
  40. Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Shujie, L.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
41. Allen, F.E. Control flow analysis. ACM SIGPLAN Not. 1970, 5, 1–30.
42. Ferrante, J.; Ottenstein, K.J.; Warren, J.D. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. (TOPLAS) 1987, 9, 319–349.
43. Nguyen, V.A.; Nguyen, D.Q.; Nguyen, V.; Le, T.; Tran, Q.H.; Phung, D. ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection. In Proceedings of the 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Pittsburgh, PA, USA, 22–24 May 2022; pp. 178–182.
44. Yan, H.; Luo, S.; Pan, L.; Zhang, Y. HAN-BSVD: A hierarchical attention network for binary software vulnerability detection. Comput. Secur. 2021, 108, 102286.
45. Wang, Y.; Jia, P.; Peng, X.; Huang, C.; Liu, J. BinVulDet: Detecting vulnerability in binary program via decompiled pseudo code and BiLSTM-attention. Comput. Secur. 2023, 125, 103023.
46. Li, L.; Ding, S.H.; Tian, Y.; Fung, B.C.; Charland, P.; Ou, W.; Song, L.; Chen, C. VulANalyzeR: Explainable binary vulnerability detection with multi-task learning and attentional graph convolution. ACM Trans. Priv. Secur. 2023, 26, 1–25.
47. Tian, J.; Xing, W.; Li, Z. BVDetector: A program slice-based binary code vulnerability intelligent detection system. Inf. Softw. Technol. 2020, 123, 106289.
48. Sharma, R.; Chen, F.; Fard, F.; Lo, D. An exploratory study on code attention in BERT. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, PA, USA, 16–17 May 2022; pp. 437–448.
49. Zheng, W.; Gao, J.; Wu, X.; Xun, Y.; Liu, G.; Chen, X. An Empirical Study of High-Impact Factors for Machine Learning-Based Vulnerability Detection. In Proceedings of the 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF), London, ON, Canada, 18 February 2020; pp. 26–34.
50. Yuan, X.; Lin, G.; Tai, Y.; Zhang, J. Deep neural embedding for software vulnerability discovery: Comparison and optimization. Secur. Commun. Netw. 2022, 2022, 1–12.
51. Alenezi, M.; Zagane, M.; Javed, Y. Efficient deep features learning for vulnerability detection using character n-gram embedding. Jordanian J. Comput. Inf. Technol. (JJCIT) 2021, 7, 25–38.
52. Jie, G.; Xiao-Hui, K.; Qiang, L. Survey on software vulnerability analysis method based on machine learning. In Proceedings of the 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), Changsha, China, 13–16 June 2016; pp. 642–647.
53. Vashishth, S.; Upadhyay, S.; Tomar, G.S.; Faruqui, M. Attention Interpretability Across NLP Tasks. arXiv 2019, arXiv:1909.11218.
54. Hanif, H.; Maffeis, S. VulBERTa: Simplified source code pre-training for vulnerability detection. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
55. Zhou, Z.; Bo, L.; Wu, X.; Sun, X.; Zhang, T.; Li, B.; Zhang, J.; Cao, S. SPVF: Security property assisted vulnerability fixing via attention-based models. Empir. Softw. Eng. 2022, 27, 171.
56. Kim, J.; Hubczenko, D.; Montague, P. Towards attention based vulnerability discovery using source code representation. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2019: Text and Time Series: 28th International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Proceedings, Part IV; Springer: Berlin/Heidelberg, Germany, 2019; pp. 731–746.
57. Mao, Y.; Li, Y.; Sun, J.; Chen, Y. Explainable software vulnerability detection based on attention-based bidirectional recurrent neural networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 4651–4656.
58. Duan, X.; Wu, J.; Ji, S.; Rui, Z.; Luo, T.; Yang, M.; Wu, Y. VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 4665–4671.
59. Mani, S.; Sankaran, A.; Aralikatte, R. DeepTriage: Exploring the effectiveness of deep learning for bug triaging. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, Kolkata, India, 3–5 January 2019; pp. 171–179.
60. MITRE Corporation. CWE-1000: Research Concepts; Technical Report; MITRE: McLean, VA, USA, 2022. Available online: https://cwe.mitre.org/data/definitions/1000.html (accessed on 1 May 2024).
61. Huang, J.; Wang, Z.; Li, D.; Liu, Y. The Analysis and Development of an XAI Process on Feature Contribution Explanation. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 5039–5048.
62. Juliet Test Suite for C/C++ and Java; Technical Report; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2019.
63. Tamilselvam, K. PredDiff: A novel feature importance measure for machine learning models. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1459–1463.
64. Zintgraf, L.M.; Cohen, T.S.; Adel, T.; Welling, M. Visualizing deep neural network decisions: Prediction difference analysis. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
65. Covert, I.C.; Lundberg, S.; Lee, S.I. Explaining by removing: A unified framework for model explanation. J. Mach. Learn. Res. 2021, 22, 9477–9566.
66. Blücher, S.; Vielhaben, J.; Strodthoff, N. PredDiff: Explanations and interactions from conditional expectations. Artif. Intell. 2022, 312, 103774.
67. Reynolds, D.A. Gaussian mixture models. Encycl. Biom. 2009, 741, 659–663.
68. Boudjema, E.H.; Verlan, S.; Mokdad, L.; Faure, C. VYPER: Vulnerability detection in binary code. Secur. Priv. 2020, 3, e100.
69. Heelan, S.; Gianni, A. Augmenting vulnerability analysis of binary code. In Proceedings of the 28th Annual Computer Security Applications Conference, Orlando, FL, USA, 3–7 December 2012; pp. 199–208.
70. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146.
71. Svyatkovskiy, A.; Zaytsev, V.; Sundaresan, N. Semantic Source Code Models using Identifier Embeddings. In Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), Hangzhou, China, 24–27 February 2019; pp. 554–565.
72. Loyola, P.; Matzger, B.; Schiele, G. Import2vec: Learning embeddings for software libraries. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 1106–1108.
73. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26, 3111–3119.
74. Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D.; et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 1536–1547.
75. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5753–5763.
76. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150.
77. Zaheer, M.; Guruganesh, G.; Dubey, K.A.; Ainslie, J.; Alberti, C.; Ontanon, S.; Pham, P.; Ravula, A.; Wang, Q.; Yang, L.; et al. Big Bird: Transformers for longer sequences. Adv. Neural Inf. Process. Syst. 2020, 33, 17283–17297.
78. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
79. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
80. Jain, S.; Wallace, B.C. Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 3543–3556.
81. Parr, T. The Definitive ANTLR 4 Reference; Pragmatic Bookshelf: Raleigh, NC, USA, 2013; p. 22.
82. Li, Z.; Zou, D.; Xu, S.; Chen, Z.; Zhu, Y.; Jin, H. VulDeeLocator: A deep learning-based fine-grained vulnerability detector. IEEE Trans. Dependable Secur. Comput. 2021, 19, 2821–2837.
83. Collard, M.L.; Decker, M.J.; Maletic, J.I. srcML: An infrastructure for the exploration, analysis, and manipulation of source code: A tool demonstration. In Proceedings of the 2013 IEEE International Conference on Software Maintenance, Eindhoven, The Netherlands, 22–28 September 2013; pp. 516–519.
84. Vashishth, S.; Bhandari, M.; Yadav, P.; Rai, P.; Bhattacharyya, C.; Talukdar, P. Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 3308–3318.
85. Chen, Y. Convolutional Neural Network for Sentence Classification. Master's Thesis, University of Waterloo, Waterloo, ON, Canada, 2015.
86. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
87. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
88. Aho, A.V.; Lam, M.S.; Sethi, R.; Ullman, J.D. Compilers: Principles, Techniques, and Tools; Pearson Education: London, UK, 2006.
89. Williams, J.; Wichers, D. The OWASP Benchmark Project. In Proceedings of the Open Web Application Security Project (OWASP) Conference, Washington, DC, USA, 23 October 2019.
90. MITRE Corporation. CWE Top 25 List 2023. Available online: https://cwe.mitre.org/top25/archive/2023/2023_top25_list.html (accessed on 1 May 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
