Detection of Reflected XSS Vulnerabilities Based on Paths-Attention Method
Abstract
1. Introduction
2. Related Work
- Detecting reflected XSS vulnerabilities from the perspective of code semantics. We apply the paths-attention model, bringing advanced methods from natural language processing to vulnerability detection, and tune it through experiments to obtain a better detection model.
- Constructing a reliable dataset. Most existing datasets for reflected XSS vulnerability detection suffer from an imbalance between positive and negative samples. To address this problem, this article takes the files numbered WEC80XSS from the National Institute of Standards and Technology's (NIST) Software Assurance Reference Dataset (SARD) as negative samples and code from real projects as positive samples [15].
- Implementing the vulnerability detection method. Simulation experiments on a real-world vulnerability dataset show that the proposed method has advantages in both efficiency and accuracy.
3. Vulnerability Detection Method Based on Paths-Attention
3.1. Theoretical Analysis of Paths-Attention Vulnerability Detection Method
3.1.1. Analysis of Semantic Extraction Method for Paths-Attention
3.1.2. Semantic Extraction and Vulnerability Detection Compatibility Analysis of Paths-Attention
- By using a function-based approach, the system reduces the cost of learning vulnerability features.
- Introducing an attention mechanism lets the model extract semantic information more effectively and locate vulnerability features among the semantic features more efficiently.
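The function-based approach treats each function, rather than a whole file, as one sample. A minimal sketch of that splitting step follows. It uses Python's standard `ast` module as a stand-in parser (an assumption: the article does not say which parser or source language its preprocessing uses), and `split_into_functions` and the sample code are hypothetical.

```python
import ast
from typing import Dict


def split_into_functions(source: str) -> Dict[str, str]:
    """Map each function name in `source` to that function's source text."""
    tree = ast.parse(source)
    units: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment (Python 3.8+) slices the exact text of the node.
            units[node.name] = ast.get_source_segment(source, node)
    return units


# Hypothetical input: one handler that echoes a request parameter, one that does not.
SAMPLE = '''
def render_greeting(request):
    name = request.args.get("name")
    return "<p>Hello " + name + "</p>"

def render_footer():
    return "<footer>ok</footer>"
'''

units = split_into_functions(SAMPLE)
print(sorted(units))
```

Each value in `units` can then be labelled and fed to the model independently, so the model never has to learn features that span unrelated functions.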
1. Difficulty in distinguishing code snippets with similar semantics: The paths-attention method starts from syntax paths and decomposes each code snippet into a group of syntax paths. Snippets with similar semantics may have similar syntactic structures, yet their deeper syntax paths are not necessarily equally important. The semantic analysis process for two similar pieces of code is shown in Figure 4.
2. Feature coverage of the dataset: The model's samples are drawn from the U.S. National Institute of Standards and Technology (NIST). After processing, a relatively rich sample set is available for training, allowing the model to learn the semantic features of vulnerable code snippets as fully as possible. This gives the model both high accuracy and fast detection.
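To make the first difficulty concrete, the sketch below decomposes two snippets into (start token, syntax path, end token) triples in the style of code2vec. Python's `ast` module stands in for the article's parser, and `leaf_paths`, `path_contexts`, the `max_length` limit of 8 and both snippets are hypothetical. Both snippets read a request parameter and echo it into HTML (the reflected-XSS pattern) and differ only in identifier names, so they yield identical sets of syntax paths; structure alone cannot separate them, which is why weighting individual paths matters.

```python
import ast
from itertools import combinations
from typing import List, Set, Tuple


def leaf_paths(tree: ast.AST) -> List[Tuple[str, List[ast.AST]]]:
    """Return (token, root-to-leaf node list) for every terminal node."""
    results: List[Tuple[str, List[ast.AST]]] = []

    def visit(node: ast.AST, stack: List[ast.AST]) -> None:
        stack = stack + [node]
        if isinstance(node, ast.Name):
            results.append((node.id, stack))
        elif isinstance(node, ast.Constant):
            results.append((repr(node.value), stack))
        elif isinstance(node, ast.arg):
            results.append((node.arg, stack))
        for child in ast.iter_child_nodes(node):
            visit(child, stack)

    visit(tree, [])
    return results


def path_contexts(source: str, max_length: int = 8) -> Set[Tuple[str, str, str]]:
    """Decompose `source` into (start token, syntax path, end token) triples."""
    leaves = leaf_paths(ast.parse(source))
    contexts = set()
    for (tok_a, path_a), (tok_b, path_b) in combinations(leaves, 2):
        # Count shared ancestors; the last shared node is the lowest common ancestor.
        common = 0
        while (common < min(len(path_a), len(path_b))
               and path_a[common] is path_b[common]):
            common += 1
        up = [type(n).__name__ for n in reversed(path_a[common - 1:])]  # leaf_a .. LCA
        down = [type(n).__name__ for n in path_b[common:]]              # below LCA .. leaf_b
        if len(up) + len(down) <= max_length:
            path = "↑".join(up) + "↓" + "↓".join(down)
            contexts.add((tok_a, path, tok_b))
    return contexts


SNIPPET_A = "def greet(req):\n    name = req.args.get('name')\n    return '<p>' + name + '</p>'\n"
SNIPPET_B = "def hello(request):\n    user = request.args.get('user')\n    return '<p>' + user + '</p>'\n"

ctx_a, ctx_b = path_contexts(SNIPPET_A), path_contexts(SNIPPET_B)
paths_a = {p for _, p, _ in ctx_a}
paths_b = {p for _, p, _ in ctx_b}
print(paths_a == paths_b)  # same syntax structure
print(ctx_a == ctx_b)      # different tokens
```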
3.2. The Composition of the Paths-Attention Vulnerability Detection Method
3.2.1. Source Code Function Preprocessing
3.2.2. Code Mapping
3.2.3. Attention Mechanism
Algorithm 1: Neural Network Attention Mechanism
Input: code snippet D
Output: aggregated vector v
Process:
Step 1: Convert the code snippet D into an abstract syntax tree (AST);
Step 2: Traverse the AST to parse its syntax paths, storing the value of each node in a matrix called “vocab value” and each path in “vocab path”;
Step 3: Embed the discrete vector groups into a continuous space and combine them into a single vector, obtaining the aggregated vector v.
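Step 3 does not spell out how the embedded vectors are combined. One common formulation, used by code2vec (Alon et al., listed in the references), maps each context to c_i = tanh(W [x_start; p; x_end]), scores it against a learned global attention vector a, turns the scores into weights α_i with a softmax, and returns v = Σ α_i c_i. The sketch below implements that formulation in plain Python on the assumption that Algorithm 1 follows it; the weights are random and untrained, `vocab_value` and `vocab_path` play the role of the “vocab value” and “vocab path” tables from Step 2, and all other names (`build_vocab`, `PathAttention`, `dim=4`, the contexts) are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Context = Tuple[str, str, str]  # (start token, syntax path, end token)


def build_vocab(items: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct item an integer index (Step 2's vocab tables)."""
    vocab: Dict[str, int] = {}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


class PathAttention:
    """Randomly initialised, untrained code2vec-style aggregator (illustration only)."""

    def __init__(self, vocab_value: Dict[str, int], vocab_path: Dict[str, int],
                 dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_vec(n: int) -> List[float]:
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.vocab_value, self.vocab_path = vocab_value, vocab_path
        self.value_emb = [rand_vec(dim) for _ in vocab_value]  # one row per token
        self.path_emb = [rand_vec(dim) for _ in vocab_path]    # one row per path
        self.W = [rand_vec(3 * dim) for _ in range(dim)]       # dim x 3*dim projection
        self.a = rand_vec(dim)                                 # global attention vector

    def combine(self, ctx: Context) -> List[float]:
        """c = tanh(W [x_start; p; x_end]): one continuous vector per context."""
        start, path, end = ctx
        x = (self.value_emb[self.vocab_value[start]]
             + self.path_emb[self.vocab_path[path]]
             + self.value_emb[self.vocab_value[end]])
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def forward(self, contexts: Sequence[Context]) -> Tuple[List[float], List[float]]:
        """Return the aggregated vector v and the attention weight of each context."""
        combined = [self.combine(c) for c in contexts]
        scores = [sum(ai * ci for ai, ci in zip(self.a, c)) for c in combined]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        v = [sum(alpha * c[k] for alpha, c in zip(alphas, combined))
             for k in range(len(self.a))]
        return v, alphas


# Hypothetical path contexts, in the textual form a path extractor might emit.
contexts = [
    ("req", "Name↑Attribute↑Attribute↑Call↓Constant", "'name'"),
    ("'<p>'", "Constant↑BinOp↓Name", "name"),
    ("name", "Name↑BinOp↑BinOp↓Constant", "'</p>'"),
]
vocab_value = build_vocab(tok for start, _, end in contexts for tok in (start, end))
vocab_path = build_vocab(path for _, path, _ in contexts)

v, alphas = PathAttention(vocab_value, vocab_path).forward(contexts)
print(len(v), round(sum(alphas), 6))
```

In training, the embeddings, W and a would be learned jointly with a classifier on top of v; the attention weights `alphas` are what let deeper but more informative paths count for more than shallow, shared ones.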
4. Experimental Design and Analysis
4.1. Dataset Construction
4.2. Data Preprocessing
4.3. Model Parameters
4.4. Analysis of Semantic Feature Extraction Results
4.5. Analysis of XSS Vulnerability Detection Results
4.6. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
[1] Vieira, M.; Antunes, N. Defending against web application vulnerabilities. Computer 2012, 45, 66–72.
[2] Huang, J.; Guan, X.; Li, S. Software defect prediction model based on attention mechanism. In Proceedings of the 2021 International Conference on Computer Engineering and Application (ICCEA), Kunming, China, 25–27 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 338–345.
[3] Ni, P.; Chen, W. Detection of Reflected Cross-Site Scripting Vulnerabilities Based on Fuzzy Testing. J. Comput. Appl. 2021, 41, 2594.
[4] Wang, X.; Chen, J.; Gu, Y. Generalized graph signal sampling and reconstruction. In Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Orlando, FL, USA, 14–16 December 2015; pp. 567–571.
[5] Kirda, E.; Jovanovic, N.; Kruegel, C. Client-side cross-site scripting protection. Comput. Secur. 2009, 28, 592–604.
[6] Qin, Y. Key Technology Research and Implementation of Stored Cross-Site Scripting Vulnerabilities. Ph.D. Thesis, Beijing University of Technology, Beijing, China, 2019.
[7] Zhang, W. Research and Improvement of White Box Fuzz Testing Technology. Ph.D. Thesis, Nanjing University of Posts and Telecommunications, Nanjing, China, 2019.
[8] Simos, D.E.; Garn, B.; Zivanovic, J.; Leithner, M. Practical combinatorial testing for XSS detection using locally optimized attack models. In Proceedings of the 2019 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Xi’an, China, 22–23 April 2019; pp. 122–130.
[9] Liu, M.; Zhang, B.; Chen, W.; Zhang, X. A survey of exploitation and detection methods of XSS vulnerabilities. IEEE Access 2019, 7, 182004–182016.
[10] Allamanis, M.; Tarlow, D.; Gordon, A.; Wei, Y. Bimodal modelling of source code and natural language. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2123–2132.
[11] Gamez-Diaz, A.; Fernandez, P.; Ruiz-Cortes, A. An analysis of RESTful APIs offerings in the industry. In Proceedings of the Service-Oriented Computing: 15th International Conference, ICSOC 2017, Malaga, Spain, 13–16 November 2017; Springer: Berlin/Heidelberg, Germany; pp. 589–604.
[12] Gu, M.; Wang, D.; Zhao, W.; Fu, L. A Penetration Testing Method for XSS Vulnerabilities Based on Attack Vector Generation. Softw. Guide 2016, 15, 173–177.
[13] Sivanesan, A.P.; Mathur, A.; Javaid, A.Y. A Google Chromium browser extension for detecting XSS attack in HTML5 based websites. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 302–304.
[14] Liu, Z.; Fang, Y.; Huang, C.; Han, J. GraphXSS: An efficient XSS payload detection approach based on graph convolutional network. Comput. Secur. 2022, 114, 102597.
[15] Software Assurance Reference Dataset from the National Institute of Standards and Technology (NIST). 2018. Available online: https://www.nist.gov/itl/ssd/software-quality-group/samate/software-assurance-reference-dataset-sard (accessed on 1 May 2021).
[16] Yang, L.; Wu, Y.; Wang, J.; Liu, Y. A Review of Research on Recurrent Neural Networks. Comput. Appl. 2018, 38, 1–6+26.
[17] Iyer, S.; Konstas, I.; Cheung, A.; Zettlemoyer, L. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1, pp. 2073–2083.
[18] Bielik, P.; Raychev, V.; Vechev, M. PHOG: Probabilistic model for code. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2933–2942.
[19] Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. code2vec: Learning distributed representations of code. Proc. ACM Program. Lang. 2019, 3, 1–29.
[20] Tsukamoto, S.; Sakai, S.; Irie, H. Detection of Reflected XSS Using Dynamic Information Flow Tracking. Res. Rep. Syst. Archit. ARC 2019, 2019, 1–6.
[21] Mishne, A.; Shoham, S.; Yahav, E. Typestate-based semantic code search over partial programs. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, Tucson, AZ, USA, 19–26 October 2012.
[22] Sun, H.; Cui, L.; Li, L.; Ding, Z.; Hao, Z.; Cui, J.; Liu, P. VDSimilar: Vulnerability detection based on code similarity of vulnerabilities and patches. Comput. Secur. 2021, 110, 102417.
[23] Li, Q.; Wang, R.; Jia, X. Cross-site scripting detection method based on classifier and improved n-gram model in OSN. Comput. Appl. 2014, 34, 1661–1665.
[24] Dam, H.K.; Pham, T.; Ng, S.W.; Tran, T.; Grundy, J.; Ghose, A.; Kim, T.; Kim, C.-J. Lessons learned from using a deep tree-based model for software defect prediction in practice. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), Montreal, QC, Canada, 25–31 May 2019; IEEE: Piscataway, NJ, USA, 2019.
[25] Cui, L.; Hao, Z.; Jiao, Y.; Fei, H.; Yun, X. Vuldetector: Detecting vulnerabilities using weighted feature graph comparison. IEEE Trans. Inf. Forensics Secur. 2020, 16, 2004–2017.
Source | Positive Sample | Negative Sample | Sample Ratio (Negative/Positive) |
---|---|---|---|
Promise database | 132,943 | 70 | 0.05265% |
GitHub open-source projects | 478,000 | 2326 | 0.4866% |
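The Sample Ratio column equals the negative count divided by the positive count. Recomputing it from the table's counts makes the imbalance concrete: both sources have fewer than one negative sample per 200 positive samples. The dictionary and variable names below are illustrative only.

```python
# Positive and negative counts copied from the dataset table.
datasets = {
    "Promise database": (132_943, 70),
    "GitHub open-source projects": (478_000, 2_326),
}

ratios = {}
for source, (positive, negative) in datasets.items():
    # Sample ratio = negative / positive, shown in percent to 4 significant digits.
    ratios[source] = f"{100 * negative / positive:.4g}%"
    print(f"{source}: {ratios[source]}")
```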
Model | Precision | Recall | F1 | Prediction Rate |
---|---|---|---|---|
CNN + Attention | 47.3 | 29.4 | 33.9 | 0.1 |
LSTM + Attention [22] | 27.5 | 21.5 | 24.1 | 5 |
Paths + CRFs | - | - | - | 10 |
Paths-Attention (this model) | 63.3 | 56.2 | 59.5 | 1000 |
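F1 is the harmonic mean of precision and recall, so it can be recomputed from the first two metric columns. A short check against the Paths-Attention and LSTM + Attention rows (the function name `f1_score` is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(63.3, 56.2), 1))  # Paths-Attention row
print(round(f1_score(27.5, 21.5), 1))  # LSTM + Attention row
```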
Model | ACC | TPR | FPR | F1 |
---|---|---|---|---|
N-gram | 0.6154 | 0.1085 | 0.0427 | 0.1844 |
Tree-LSTM | 0.8302 | 0.5896 | 0.0690 | 0.6993 |
Siamese Network | 0.8117 | 0.6528 | 0.0613 | 0.7237 |
Paths-Attention (this model) | 0.9025 | 0.7451 | 0.0836 | 0.8162 |
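The columns of this table follow the standard confusion-matrix definitions: accuracy (ACC), true positive rate (TPR, i.e. recall), false positive rate (FPR) and F1. The sketch computes them from hypothetical counts, not from the article's data, and treats "vulnerable" as the class being detected (an assumption). Because F1 also depends on precision, which changes with the number of non-vulnerable samples, the F1 column cannot be derived from TPR and FPR alone.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # safe samples flagged as vulnerable
    tn: int  # safe samples passed as safe
    fn: int  # vulnerable samples missed


def detection_metrics(c: Confusion) -> Dict[str, float]:
    acc = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    tpr = c.tp / (c.tp + c.fn)        # recall: share of vulnerable samples found
    fpr = c.fp / (c.fp + c.tn)        # false alarms among safe samples
    precision = c.tp / (c.tp + c.fp)  # share of alarms that are real
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "F1": f1}


m = detection_metrics(Confusion(tp=75, fp=8, tn=92, fn=25))  # hypothetical counts
print({k: round(x, 4) for k, x in m.items()})
```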
Model | Training Time/s | Test Time/s | Number of Vulnerabilities | Rate of Speed/s |
---|---|---|---|---|
N-gram | 1762.7 | 254 | 117 | 6.51 |
Tree-LSTM | 4141.7 | 287 | 161 | 7.09 |
Siamese Network | 2923.3 | 331 | 155 | 5.95 |
Paths-Attention (this model) | 2987.0 | 343 | 176 | 57.51 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tan, X.; Xu, Y.; Wu, T.; Li, B. Detection of Reflected XSS Vulnerabilities Based on Paths-Attention Method. Appl. Sci. 2023, 13, 7895. https://doi.org/10.3390/app13137895
Chicago/Turabian style: Tan, Xiaobo, Yingjie Xu, Tong Wu, and Bohan Li. 2023. "Detection of Reflected XSS Vulnerabilities Based on Paths-Attention Method" Applied Sciences 13, no. 13: 7895. https://doi.org/10.3390/app13137895