1. Introduction
Smart contracts constitute the cornerstone of all decentralized applications [1]. As such, any vulnerability in a smart contract could be detrimental to the security of the application. The best-known incident related to a smart contract vulnerability is the DAO hack that took place in 2016 [2]. Here, DAO is short for decentralized autonomous organization. In the DAO hack, a hacker stole about 3.54 million ETH (worth about USD 150 million at the time) by exploiting the re-entrancy vulnerability in the smart contract that powers the DAO. Because of the severe impact of the incident, the Ethereum Foundation was forced to perform a highly controversial hard fork to effectively return the stolen funds to the original investors [3]. One positive outcome of the incident was a sharp increase in awareness of smart contract security issues. Various tools have been developed to detect vulnerabilities in smart contracts. In recent years, machine learning and deep learning have also been used to identify vulnerabilities in smart contracts because, if trained properly, they could be more robust in detecting vulnerabilities.
In many ways, smart contract source code resembles natural language. Hence, schemes developed for natural language processing have been used to transform smart contract source code (typically written in Solidity) into a form of input that is conducive to analysis, either by rule-based vulnerability detection tools or by machine-learning models. A large number of schemes have been proposed, some of which transform the bytecode of the smart contract, rather than its source code, into feature vectors. Each scheme focuses on capturing some specific characteristics of smart contracts. Hence, it is worth studying the impact of different types of input on the detection of smart contract vulnerabilities. Although some studies have incorporated more than one type of input, we have yet to see a systematic study that examines the impact of different types of input in the context of smart contract vulnerability detection.
The goal of this study is to examine the impact of four types of input, namely, Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF), on the detection of six common types of smart contract vulnerability, namely, re-entrancy, integer overflow, integer underflow, timestamp dependency, delegate call, and call stack depth attack vulnerability (CDAV). We use a deep-learning model, the convolutional neural network (CNN), as the classifier. We report the vulnerability detection performance in terms of binary classification (i.e., a particular type of vulnerability vs. the normal case) and multiclass classification (i.e., all types of vulnerability and the normal case). The choice of the four types of input and six types of vulnerability is driven by two considerations: (1) these types of input and types of vulnerability are the most heavily reported in the smart contract vulnerability detection literature; (2) the six types of vulnerability have publicly available datasets, and Python code for converting a smart contract into three of the four types of input is available on GitHub (we developed the Python code for TF-IDF ourselves).
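For readers less familiar with the two count-based input types, the following minimal Python sketch shows how Solidity fragments could be turned into BoW and TF-IDF vectors. It is not the authors' TF-IDF code: the regular-expression tokenizer, the function names, and the particular TF-IDF variant (length-normalized term frequency multiplied by the smoothed inverse document frequency ln((1 + N)/(1 + df)) + 1) are assumptions made for illustration. Word2Vec and FastText, by contrast, are learned dense embeddings and are not sketched here.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: identifiers/keywords, integer literals, and single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(fragment: str) -> list[str]:
    """Split a Solidity fragment into lexical tokens (simplified)."""
    return TOKEN_RE.findall(fragment)


def build_vocab(docs: list[list[str]]) -> list[str]:
    """Sorted vocabulary over all tokenized fragments."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc: list[str], vocab: list[str]) -> list[int]:
    """BoW: raw count of each vocabulary term in one fragment."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """TF-IDF: term frequency (count / fragment length) times smoothed IDF."""
    n_docs = len(docs)
    # Document frequency: in how many fragments each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1.0 for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty fragments
        vectors.append([counts[term] / length * idf[term] for term in vocab])
    return vectors


fragments = [
    "msg.sender.call.value(amount)();",
    "require(block.timestamp > deadline);",
]
docs = [tokenize(f) for f in fragments]
vocab = build_vocab(docs)
bow_vectors = [bag_of_words(doc, vocab) for doc in docs]
tfidf_vectors = tf_idf(docs, vocab)
```

Tokens shared by both fragments, such as `(`, `)`, `.`, and `;`, receive a lower IDF weight than tokens that occur in only one fragment, such as `call` or `timestamp`. This down-weighting of ubiquitous tokens is the main difference from plain BoW counts.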
To our knowledge, this is the first study that systematically examines the impact of input types on smart contract vulnerability detection, which constitutes the main research contribution of this paper. Our original hypothesis was that different input types (or at least some of them) would complement each other, in that one input type could exhibit superior detection performance for some types of vulnerability while another input type would show excellent detection performance for other types. If this were true, we could develop an ensemble model that selects the best input type for each type of vulnerability. Unfortunately, our experimental results show that this is not the case. Instead of the input types complementing each other, TF-IDF clearly outperforms all other input types. Nevertheless, we believe the findings still carry research merit.
Furthermore, we note that it is not our goal to propose a methodology that outperforms existing approaches. The current study is limited to examining the impact of input types on the detection performance of smart contract vulnerabilities. As such, we use CNN as the classifier in our experiments because it offers reasonably good performance and does not require a huge amount of training data. For the same reason, we intentionally do not use any attention mechanisms to improve the classification performance, nor do we use more advanced models such as Bidirectional Encoder Representations from Transformers (BERT) [4].
The remainder of the paper is organized as follows. Section 2 provides the necessary background information for the current study. Section 3 discusses related work. Section 4 describes the methodology of the current study, including the dataset used, input preparation, and the classifier. Section 5 presents the experimental results and analysis. Section 6 reflects on our findings and points out the limitations of the current study. Section 7 concludes this paper.
3. Related Work
A large body of work has been published on machine-learning-based detection of smart contract vulnerabilities. In this paper, we focus on studies that have employed deep-learning models for detection. Given sufficient training data, deep-learning models typically attain better performance than traditional machine-learning models, as we have demonstrated previously [15]. We further limit the related work to studies that have adopted the same taxonomy of smart contract vulnerabilities [5] (Table 1).
In [16], the re-entrancy and timestamp dependency vulnerabilities were detected (separately) using eight deep-learning models. The primary innovation was an additional step performed prior to word embedding, referred to as the vulnerability candidate slice (VCS). The VCS was inspired by the common practice of extracting regions of interest from an image for recognition. The resulting method was named DeeSCVHunter. The paper stated that three different word embeddings were employed, namely Word2Vec, FastText, and GloVe, with FastText used as the default embedding method. However, the paper did not report the detection performance for each of the embedding methods. Presumably, the best performance of the three embedding methods was reported.
In [17], Word2Vec and FastText were used for word embedding. A CNN was then used to perform further feature extraction on the output of the Word2Vec embedding, and a bidirectional gated recurrent unit (BiGRU) was used to perform further feature extraction on the output of the FastText embedding. The features extracted by the CNN and the BiGRU were combined by concatenation, and a fusion neural network layer followed by a softmax layer performed classification on the fused features. The dataset contains six different types of vulnerability: integer overflow, integer underflow, re-entrancy, timestamp dependency, CDAV, and infinite loop. It appears that binary classification was performed for each type of vulnerability.
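The concatenation-based fusion described for [17] can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the CBGRU implementation: the branch feature vectors and all weights below are made-up placeholders (in a real model they would come from the trained CNN and BiGRU branches), and the fusion layer is assumed to be a dense layer with ReLU activation, since the text only calls it a fusion neural network layer.

```python
import math


def dense(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Affine layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: list[float]) -> list[float]:
    return [max(0.0, v) for v in x]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(cnn_feats: list[float], bigru_feats: list[float],
                      fusion_w: list[list[float]], fusion_b: list[float],
                      out_w: list[list[float]], out_b: list[float]) -> list[float]:
    """Concatenate the two branch outputs, apply the (assumed) ReLU fusion layer,
    then a softmax output layer giving one probability per class."""
    fused = cnn_feats + bigru_feats  # list concatenation = feature concatenation
    hidden = relu(dense(fused, fusion_w, fusion_b))
    return softmax(dense(hidden, out_w, out_b))


# Placeholder values: 2 CNN features + 1 BiGRU feature -> 2 hidden units -> 2 classes.
probs = fuse_and_classify(
    cnn_feats=[1.0, 0.0],
    bigru_feats=[0.5],
    fusion_w=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
    fusion_b=[0.0, 0.0],
    out_w=[[1.0, 0.0], [0.0, 1.0]],
    out_b=[0.0, 0.0],
)
```

The point of the sketch is only that the fused vector has the combined length of both branch outputs, so the fusion layer sees CNN-derived and BiGRU-derived features side by side.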
In [18], a single type of vulnerability, i.e., re-entrancy, was detected using a transformer-based neural architecture called GraphCodeBERT [25], with an improved version of the data flow graph (referred to as a crucial data flow graph) used to encode the smart contracts for classification. The paper reported the classification performance on two datasets.
In [19], a new model that converts the smart contract into a vector format (referred to as a sequential model) was proposed. The study used a deep-learning model called bidirectional long short-term memory with an attention mechanism (BLSTM-ATT) to detect the re-entrancy vulnerability in smart contracts.
In [20], the smart contracts were converted into a graph format (combining control flow, data flow, and fallback information). Two deep-learning models were proposed to detect vulnerabilities based on the normalized graph input: a degree-free graph convolutional neural network (DR-GCN) and a novel temporal message propagation network (TMP). The performance of the proposed input and the two deep-learning models was reported for the detection of each of three types of vulnerability, namely, re-entrancy, timestamp dependency, and infinite loop. TMP was shown to perform better than DR-GCN.
In [21], the same graph format that combines the control flow, data flow, and fallback information of the smart contracts was used as the input. Unlike [20], this study experimented with four levels of input (three local patterns and one global graph-based input similar to that of [20]) to assess their impact on the detection performance. An attentive multi-encoder network was used to detect each of three types of vulnerability: re-entrancy, timestamp dependency, and infinite loop. The different levels of input were used to illustrate the interpretability of the detection process.
In [23], Word2Vec was used for word embedding, and a deep-learning model with a hybrid attention mechanism was used to detect vulnerabilities. The detection performance was presented for each of five types of vulnerability, namely, re-entrancy, timestamp dependency, arithmetic vulnerability, unchecked return value, and Tx.origin vulnerability.
In [22], a control flow graph was used to represent smart contract fragments as the input to a deep-learning model referred to as a dual attention graph convolutional network (DA-GCN). Two types of smart contract vulnerability, namely, re-entrancy and timestamp dependency, were detected separately.
In [24], Word2Vec was used to encode smart contract fragments as the input to a sophisticated deep-learning model referred to as the Serial–Parallel Convolutional Bidirectional Gated Recurrent Network Model incorporating Ensemble Classifiers (SPCBIG-EC). Two types of smart contract vulnerability, re-entrancy and timestamp dependency, were detected separately.
As can be seen, most of the related studies have used a single input type. Although three types of input were mentioned in [16], FastText was used as the default input type, and the study did not report the impact of the input type on vulnerability detection performance. In [17], Word2Vec and FastText were fused together as the input. We are not aware of any study that has systematically compared the impact of different input types on detection performance.
Again, we note that it is not our goal to propose a method that outperforms other approaches. Nevertheless, for completeness, we compare the vulnerability detection performance reported in related studies with that of ours in Table 6 in Section 5.3. The purpose of the comparison is to summarize what has been studied and the reported results rather than to draw any definitive conclusion about which approach is superior, because these studies often used different datasets, in addition to different input types and different classifiers.
5. Vulnerability Detection Results and Analysis
We first present the results for binary classification scenarios. Then, we present the results for the multiclass classification scenario. Finally, we present a comparison with related work.
5.2. Multiclass Classification
Although the detection of a single type of vulnerability has some research value, its practical impact is very limited because, in general, one does not know in advance which type of vulnerability a given smart contract may contain. Furthermore, it is also important to report which types of vulnerability a smart contract contains. Scientifically, it is interesting to determine how well a machine-learning model with a certain type of input can discern different types of vulnerability. Hence, it is informative to conduct multiclass classification, although this is rarely done in machine-learning-based vulnerability detection for smart contracts.
To facilitate the multiclass classification study, we first construct the training dataset by combining the individual datasets created for the binary classification studies. The six types of vulnerability and the no-vulnerability fragments together constitute seven classes. We relabeled the data as follows: no vulnerability as class 0, timestamp dependency vulnerability as class 1, re-entrancy vulnerability as class 2, integer underflow vulnerability as class 3, delegate vulnerability as class 4, CDAV as class 5, and integer overflow vulnerability as class 6.
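The relabeling step can be sketched as follows. The sketch assumes (this is not stated in the text) that each binary dataset stores (fragment, label) pairs, with label 1 for a vulnerable fragment and 0 for a normal one; the dataset names used as dictionary keys are hypothetical, while the class ids follow the mapping above.

```python
# Class ids as listed in the text; the dictionary keys are hypothetical dataset names.
CLASS_IDS = {
    "timestamp_dependency": 1,
    "reentrancy": 2,
    "integer_underflow": 3,
    "delegate": 4,
    "cdav": 5,
    "integer_overflow": 6,
}
NO_VULNERABILITY = 0


def merge_binary_datasets(
    datasets: dict[str, list[tuple[str, int]]],
) -> list[tuple[str, int]]:
    """Merge per-vulnerability binary datasets of (fragment, is_vulnerable) pairs
    into one multiclass dataset of (fragment, class_id) pairs."""
    merged = []
    for name, samples in datasets.items():
        vuln_class = CLASS_IDS[name]
        for fragment, is_vulnerable in samples:
            # Vulnerable fragments take their dataset's class; normal ones become class 0.
            merged.append((fragment, vuln_class if is_vulnerable else NO_VULNERABILITY))
    return merged


multiclass = merge_binary_datasets({
    "reentrancy": [("frag_a", 1), ("frag_b", 0)],
    "delegate": [("frag_c", 1)],
})
```

Because every binary dataset contributes its normal fragments to the same class 0, the merged no-vulnerability class can end up much larger than any single vulnerability class, which is one source of the imbalance discussed next.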
Because the size of the dataset differs considerably across the types of vulnerability, we face a class imbalance problem. To avoid imbalance between the classes during training, the Synthetic Minority Over-sampling Technique (SMOTE) [32] and undersampling are used. SMOTE augments the minority classes by synthesizing new samples, each interpolated between a minority-class sample and one of its nearest minority-class neighbors, whereas undersampling reduces the number of samples in the majority classes.
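To make the resampling concrete, here is a simplified from-scratch sketch in Python. It follows the SMOTE idea of [32] (interpolating between a minority sample and a randomly chosen one of its k nearest minority-class neighbors), but it is not necessarily the implementation used in this study; the function names, the default k = 5, the Euclidean distance, and the choice of random undersampling for the majority classes are assumptions for illustration.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def random_undersample(majority: list[list[float]], n_keep: int,
                       seed: int = 0) -> list[list[float]]:
    """Keep a random subset of n_keep majority-class samples."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2, seed=42)
majority = [[float(i), float(i)] for i in range(10)]
kept = random_undersample(majority, n_keep=3, seed=42)
```

Because each synthetic point lies on a segment between two existing minority samples, SMOTE adds variety within the region the minority class already occupies instead of duplicating samples verbatim.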
The overall vulnerability detection accuracy for the four types of input is summarized in Figure 11. The overall accuracy is calculated as the fraction of correctly classified samples among all the samples in the test set. As we can see in Figure 11, although TF-IDF still performs well (95.09%), BoW actually has the highest accuracy at 95.84%. Word2Vec comes third at 85.07%, while FastText comes last at 84.68%. This ranking differs significantly from the results of binary classification. To understand the details of multiclass classification, it is necessary to inspect the confusion matrix [33] for each type of input, which is shown in Figure 12 (BoW), Figure 13 (TF-IDF), Figure 14 (Word2Vec), and Figure 15 (FastText).
Each row of a confusion matrix represents the instances of an actual class, and each column represents the instances of a predicted class. We define the misclassification rate of a class as the number of misclassified instances in its row (i.e., the sum of the off-diagonal entries) divided by the total number of instances in that row.
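The two metrics used in this subsection can be computed from a confusion matrix as in the following minimal Python sketch (the function names are ours, for illustration). The row-wise misclassification rate defined above equals one minus the recall of that class, so a high rate in the no-vulnerability row (class 0) means that many normal fragments are flagged as vulnerable, i.e., false positives.

```python
def confusion_matrix(y_true: list[int], y_pred: list[int],
                     n_classes: int) -> list[list[int]]:
    """cm[i][j] counts samples whose actual class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm


def overall_accuracy(cm: list[list[int]]) -> float:
    """Correctly classified samples (the diagonal) divided by all test samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def misclassification_rates(cm: list[list[int]]) -> list[float]:
    """Per actual class: off-diagonal row sum divided by the row total."""
    rates = []
    for i, row in enumerate(cm):
        total = sum(row)
        rates.append((total - row[i]) / total if total else 0.0)
    return rates


# Toy example with 3 classes (0 = no vulnerability).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)                           # [[2, 1, 0], [0, 2, 0], [1, 0, 0]]
print(overall_accuracy(cm))         # 0.6666666666666666
print(misclassification_rates(cm))  # [0.3333333333333333, 0.0, 1.0]
```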
With BoW as the input, the classifier has the most difficulty detecting timestamp dependency and detects delegate and integer overflow perfectly. The result is consistent with binary classification using BoW for timestamp dependency and delegate; however, it is somewhat surprising that BoW can detect integer overflow perfectly in multiclass classification, while it achieves an F1 score of only 87.50% in binary classification. The low F1 score is primarily due to low precision (79.55%, versus a recall of 97.22%; Table 4), which means that BoW suffers from false positives in binary classification for the integer overflow vulnerability.
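The effect of low precision on the F1 score can be checked directly from the Table 4 values for integer overflow with BoW. The F1 score is the harmonic mean of precision and recall, which is pulled toward the smaller of the two; the short Python snippet below (an illustrative helper, not part of the study's code) reproduces the reported 87.50%.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; works with percentages or fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# BoW, integer overflow, binary classification (Table 4).
print(round(f1_score(79.55, 97.22), 2))  # 87.5
# Had precision matched recall, F1 would equal that common value.
print(round(f1_score(97.22, 97.22), 2))  # 97.22
```

Because recall is already high, the gap between the reported F1 score and a near-perfect one comes almost entirely from the 79.55% precision.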
With TF-IDF as the input, the classifier has the most difficulty detecting CDAV, integer underflow, and timestamp dependency, and it performs best in detecting integer overflow. This result is not fully consistent with that of binary classification with TF-IDF either, where only the detection performance for timestamp dependency is noticeably lower than that for the remaining five types of vulnerability.
With Word2Vec as the input, the classifier performs poorly for all types except delegate and integer overflow. Although the outstanding detection performance for delegate and integer overflow is consistent with that of binary classification with Word2Vec, the detection performance for the other types of vulnerability is noticeably lower than in binary classification. Furthermore, integer underflow has the (tied) worst detection performance in multiclass classification. The result also reveals a significant false-positive problem (i.e., a misclassification rate of 0.235 for the no-vulnerability class).
The result for FastText is very similar to that for Word2Vec, but in general slightly worse. Most notably, the model with FastText struggles to detect integer underflow reliably with a misclassification rate of 0.296.
In summary, considering the results of both binary and multiclass classification, TF-IDF is the only input type that consistently achieves excellent detection performance, demonstrating that this input type can capture the essential characteristics of the six types of vulnerability. Although BoW has the worst overall performance in binary classification, it actually achieves the best performance in multiclass classification. Furthermore, although Word2Vec and FastText exhibit fairly good performance in binary classification, their overall accuracy in multiclass classification is only moderate. Closer examination reveals that Word2Vec and FastText have very poor detection performance for timestamp dependency, integer underflow, and CDAV, and they also suffer from an unacceptable rate of false positives.
Author Contributions
Conceptualization, I.M.A., W.Z., S.Y. and X.L.; methodology, I.M.A., W.Z., S.Y. and X.L.; investigation, I.M.A., W.Z., S.Y. and X.L.; writing—original draft preparation, I.M.A. and W.Z.; writing—review and editing, I.M.A., W.Z., S.Y. and X.L.; visualization, I.M.A. and W.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the Beijing Natural Science Foundation under Grants L211020 and M21032.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
The authors wish to express their deep gratitude to the anonymous reviewers for their invaluable comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Zhao, W. From Traditional Fault Tolerance to Blockchain; John Wiley & Sons: Hoboken, NJ, USA, 2021.
- Dhillon, V.; Metcalf, D.; Hooper, M. The DAO hacked. In Blockchain Enabled Applications: Understand the Blockchain Ecosystem and How to Make it Work for You; Springer: Berlin/Heidelberg, Germany, 2017; pp. 67–78.
- Mehar, M.I.; Shier, C.L.; Giambattista, A.; Gong, E.; Fletcher, G.; Sanayhie, R.; Kim, H.M.; Laskowski, M. Understanding a revolutionary and flawed grand experiment in blockchain: The DAO attack. J. Cases Inf. Technol. (JCIT) 2019, 21, 19–32.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Kushwaha, S.S.; Joshi, S.; Singh, D.; Kaur, M.; Lee, H.N. Systematic review of security vulnerabilities in Ethereum blockchain smart contract. IEEE Access 2022, 10, 6605–6621.
- Mik, E. Smart contracts: Terminology, technical limitations and real world complexity. Law Innov. Technol. 2017, 9, 269–300.
- Liu, C.; Liu, H.; Cao, Z.; Chen, Z.; Chen, B.; Roscoe, B. ReGuard: Finding reentrancy bugs in smart contracts. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, Gothenburg, Sweden, 27 May–3 June 2018; pp. 65–68.
- Wöhrer, M.; Zdun, U. Design patterns for smart contracts in the Ethereum ecosystem. In Proceedings of the 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Halifax, NS, Canada, 30 July–3 August 2018; pp. 1513–1520.
- Atzei, N.; Bartoletti, M.; Cimoli, T. A survey of attacks on Ethereum smart contracts (SoK). In Proceedings of the Principles of Security and Trust: 6th International Conference, POST 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, 22–29 April 2017; pp. 164–186.
- Gupta, B.C.; Shukla, S.K. A study of inequality in the Ethereum smart contract ecosystem. In Proceedings of the 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), Granada, Spain, 22–25 October 2019; pp. 441–449.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Mnih, A.; Kavukcuoglu, K. Learning word embeddings efficiently with noise-contrastive estimation. Adv. Neural Inf. Process. Syst. 2013, 26.
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 21–26 June 2014; pp. 1188–1196.
- Ramos, J. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Citeseer; 2003; Volume 242, pp. 29–48.
- Zhao, W. Towards frame-level person identification using Kinect skeleton data with deep learning. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Virtual, 5–7 December 2021; pp. 1–8.
- Yu, X.; Zhao, H.; Hou, B.; Ying, Z.; Wu, B. DeeSCVHunter: A deep learning-based framework for smart contract vulnerability detection. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
- Zhang, L.; Chen, W.; Wang, W.; Jin, Z.; Zhao, C.; Cai, Z.; Chen, H. CBGRU: A detection method of smart contract vulnerability based on a hybrid model. Sensors 2022, 22, 3577.
- Wu, H.; Zhang, Z.; Wang, S.; Lei, Y.; Lin, B.; Qin, Y.; Zhang, H.; Mao, X. Peculiar: Smart contract vulnerability detection based on crucial data flow graph and pre-training techniques. In Proceedings of the 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, 25–28 October 2021; pp. 378–389.
- Qian, P.; Liu, Z.; He, Q.; Zimmermann, R.; Wang, X. Towards automated reentrancy detection for smart contracts based on sequential models. IEEE Access 2020, 8, 19685–19695.
- Zhuang, Y.; Liu, Z.; Qian, P.; Liu, Q.; Wang, X.; He, Q. Smart contract vulnerability detection using graph neural networks. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, online, 7–15 January 2021; pp. 3283–3290.
- Liu, Z.; Qian, P.; Wang, X.; Zhu, L.; He, Q.; Ji, S. Smart contract vulnerability detection: From pure neural network to interpretable graph feature and expert pattern fusion. arXiv 2021, arXiv:2106.09282.
- Fan, Y.; Shang, S.; Ding, X. Smart contract vulnerability detection based on dual attention graph convolutional network. In Proceedings of the Collaborative Computing: Networking, Applications and Worksharing: 17th EAI International Conference, CollaborateCom 2021, Virtual Event, 16–18 October 2021; pp. 335–351.
- Wu, H.; Dong, H.; He, Y.; Duan, Q. Smart contract vulnerability detection based on hybrid attention mechanism model. Appl. Sci. 2023, 13, 770.
- Zhang, L.; Li, Y.; Jin, T.; Wang, W.; Jin, Z.; Zhao, C.; Cai, Z.; Chen, H. SPCBIG-EC: A robust serial hybrid model for smart contract vulnerability detection. Sensors 2022, 22, 4621.
- Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training code representations with data flow. arXiv 2020, arXiv:2009.08366.
- Hwang, S.J.; Choi, S.H.; Shin, J.; Choi, Y.H. CodeNet: Code-targeted convolutional neural network architecture for smart contract vulnerability detection. IEEE Access 2022, 10, 32595–32607.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467.
- Qiao, S.; Han, N.; Huang, J.; Yue, K.; Mao, R.; Shu, H.; He, Q.; Wu, X. A dynamic convolutional neural network based shared-bike demand forecasting model. ACM Trans. Intell. Syst. Technol. (TIST) 2021, 12, 1–24.
- Durieux, T.; Ferreira, J.F.; Abreu, R.; Cruz, P. Empirical review of automated analysis tools on 47,587 Ethereum smart contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 530–541.
- Durieux, T.; Madeiral, F.; Martinez, M.; Abreu, R. Empirical review of Java program repair tools: A large-scale experiment on 2,141 bugs and 23,551 repair attempts. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia, 26–30 August 2019; pp. 302–313.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Raschka, S.; Mirjalili, V. Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow 2; Packt Publishing Ltd.: Birmingham, UK, 2019.
Figure 1. Overview of the methodology for detecting smart contract vulnerabilities.
Figure 3. Vulnerability detection performance with Word2Vec embedding.
Figure 4. Vulnerability detection performance with FastText embedding.
Figure 5. Vulnerability detection performance with BoW.
Figure 6. Vulnerability detection performance with TF-IDF.
Figure 7. Detection accuracy for six types of smart contract vulnerability with the four types of input.
Figure 8. Detection precision for six types of smart contract vulnerability with the four types of input.
Figure 9. Detection recall for six types of smart contract vulnerability with the four types of input.
Figure 10. Detection F1 score for six types of smart contract vulnerability with the four types of input.
Figure 11. Overall accuracy with Word2Vec, FastText, BoW, and TF-IDF for multiclass classification.
Figure 12. Confusion matrix with BoW as the input type.
Figure 13. Confusion matrix with TF-IDF as the input type.
Figure 14. Confusion matrix with Word2Vec as the input type.
Figure 15. Confusion matrix with FastText as the input type.
Table 1. Summary of related work.

| Study | Input Type(s) | Types of Vulnerability Detected |
|---|---|---|
| DeeSCVHunter [16] | FastText (Word2Vec + GloVe) | Re-entrancy and timestamp dependency |
| CBGRU [17] | Word2Vec + FastText | Re-entrancy, timestamp dependency, integer overflow/underflow, CDAV, and infinite loop |
| Peculiar [18] | Graph | Re-entrancy |
| BLSTM-ATT [19] | Sequential | Re-entrancy |
| TMP [20] | Graph | Re-entrancy, timestamp dependency, and infinite loop |
| AME [21] | Graph | Re-entrancy, timestamp dependency, and infinite loop |
| DA-GCN [22] | Graph | Re-entrancy and timestamp dependency |
| HAM [23] | Word2Vec | Re-entrancy, timestamp dependency, arithmetic vulnerability, unchecked return value, and Tx.origin |
| SPCBIG-EC [24] | Word2Vec | Re-entrancy, timestamp dependency, and infinite loop |
Table 2. Vulnerability detection performance with Word2Vec embedding.

| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|
| Delegate | 96.77 | 100.00 | 93.94 | 96.88 |
| Integer Overflow | 94.44 | 100.00 | 90.00 | 94.74 |
| CDAV | 86.96 | 85.51 | 88.06 | 86.76 |
| Integer Underflow | 84.42 | 86.43 | 83.09 | 84.73 |
| Re-entrancy | 84.23 | 79.34 | 88.07 | 83.48 |
| Timestamp Dependency | 78.20 | 84.78 | 74.92 | 79.55 |
Table 3. Vulnerability detection performance with FastText embedding.

| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|
| Delegate | 100.00 | 100.00 | 100.00 | 100.00 |
| Integer Overflow | 91.67 | 100.00 | 85.71 | 92.31 |
| CDAV | 86.78 | 84.42 | 88.59 | 86.46 |
| Integer Underflow | 85.43 | 87.94 | 83.73 | 85.78 |
| Re-entrancy | 85.06 | 83.33 | 86.21 | 84.75 |
| Timestamp Dependency | 77.16 | 84.78 | 73.57 | 78.78 |
Table 4. Vulnerability detection performance with BoW.

| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|
| Delegate | 99.19 | 100.00 | 98.41 | 99.20 |
| Integer Overflow | 86.11 | 97.22 | 79.55 | 87.50 |
| Re-entrancy | 78.84 | 75.21 | 81.25 | 78.11 |
| CDAV | 76.81 | 82.97 | 73.87 | 78.16 |
| Timestamp Dependency | 76.47 | 83.04 | 73.39 | 77.92 |
| Integer Underflow | 74.37 | 73.12 | 75.00 | 74.05 |
Table 5. Vulnerability detection performance with TF-IDF.

| Dataset | Accuracy (%) | Recall (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|
| Delegate | 99.19 | 100.00 | 98.41 | 99.20 |
| Re-entrancy | 95.85 | 96.69 | 95.12 | 95.90 |
| Integer Underflow | 95.23 | 95.98 | 94.55 | 95.26 |
| CDAV | 95.11 | 96.01 | 94.31 | 95.15 |
| Integer Overflow | 94.44 | 100.00 | 90.00 | 94.74 |
| Timestamp Dependency | 85.29 | 85.81 | 84.93 | 85.37 |
Table 6. F1 scores (%) for the detection of various vulnerability types in our study and related studies. R: re-entrancy; T: timestamp dependency; D: delegate call; C: CDAV; IO: integer overflow; IU: integer underflow; W: Word2Vec; G: GloVe.

| Study | Input | R | T | D | C | IO | IU |
|---|---|---|---|---|---|---|---|
| DeeSCVHunter [16] | FastText (+W+G) | 86.87 | 79.93 | | | | |
| CBGRU [17] | Word2Vec + FastText | 90.92 | 93.29 | | 90.21 | 86.43 | 85.28 |
| Peculiar [18] | Graph | 92.40 | | | | | |
| AME [21] | Graph | 87.94 | 84.10 | | | | |
| BLSTM-ATT [19] | Sequential | 89.81 | | | | | |
| TMP [20] | Graph | 78.11 | 79.19 | | | | |
| DA-GCN [22] | Graph | 85.43 | 84.83 | | | | |
| SPCBIG-EC [24] | Word2Vec | 96.74 | 91.62 | | | | |
| HAM [23] | Word2Vec | 94.04 | 87.85 | | | | |
| This Study | TF-IDF | 95.90 | 85.37 | 99.20 | 95.15 | 94.74 | 95.26 |
| | Word2Vec | 83.48 | 79.55 | 96.88 | 86.76 | 94.74 | 84.73 |
| | FastText | 84.75 | 78.78 | 100.00 | 86.46 | 92.31 | 85.78 |
| | BoW | 78.11 | 77.92 | 99.20 | 78.16 | 87.50 | 74.05 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).