*Article* **CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm**

**Mingrui Ma 1, Lansheng Han 1,\* and Yekui Qian 2**


**Abstract:** As one of the most effective methods of vulnerability mining, fuzzy testing is scalable and can detect complex paths. Sample generation is the key step of fuzzy testing, and the quality of the samples directly determines the vulnerability mining ability of a fuzzy tester. Known sample generation methods focus either on code coverage or on seed mutation under a critical execution path, so it is difficult for them to take both into account. Therefore, based on the idea of ensemble learning in artificial intelligence, we propose a fuzzy testing sample generation framework named CVDF DYNAMIC, which is based on a genetic algorithm and a Bi-LSTM neural network. The main purpose of CVDF DYNAMIC is to generate fuzzy testing samples with both code coverage and path depth detection ability. CVDF DYNAMIC generates two test case sets, one through the Bi-LSTM neural network and one through the genetic algorithm. We then integrate the two sample sets through ensemble learning to obtain a sample set that has both code coverage and vulnerability mining ability for a critical execution path of the program. To improve the efficiency of fuzzy testing, we use a heuristic genetic algorithm to simplify the integrated sample set. We also propose an evaluation index, path depth detection ability (pdda), which measures the vulnerability mining ability of the generated test case set under the critical execution path of the program. Finally, we compare CVDF DYNAMIC with existing fuzzy testing tools and research results and propose ideas for its future improvement.

**Keywords:** genetic algorithm; Bi-LSTM neural network; fuzzy testing sample generation; deep learning

### **1. Introduction and Background**

Vulnerabilities in programs have always been a serious threat to software security and may cause denial of service, information leakage, and other failures. Some typical cases of vulnerability exploitation, such as the WannaCry ransomware, have had a disastrous impact on the economy and on network security. Therefore, mining vulnerabilities scientifically and efficiently has long been a hot topic.

**Citation:** Ma, M.; Han, L.; Qian, Y. CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm. *Sensors* **2022**, *22*, 1265. https://doi.org/10.3390/s22031265

Academic Editors: Athanasios V. Vasilakos and Vassilis S. Kodogiannis

Received: 22 December 2021; Accepted: 4 February 2022; Published: 7 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

At present, vulnerability mining techniques can be divided into static vulnerability mining and dynamic testing (fuzzy testing) [1]. Static mining neither constructs test cases nor runs the program. Instead, it extracts the characteristics or key operations of each vulnerability type and audits the code statically to flag likely vulnerabilities. The audited code can be in a high-level language, compiler-generated assembly, or a binary file. Static vulnerability mining is fast and efficient, and it detects vulnerabilities with obvious characteristics accurately. However, it often yields high false positive and false negative rates for vulnerabilities with unclear features or diverse types and forms (such as null pointer dereferences in C/C++). Dynamic fuzzy testing can solve this problem by constructing reasonable test cases. However, dynamic fuzzy testing is less efficient than static vulnerability mining because it must construct samples and run the program to determine whether vulnerabilities exist. Therefore, constructing test cases with high path depth detection ability (pdda) and high code coverage is the key to fuzzy testing. In practice, static vulnerability mining and fuzzy testing are often combined to achieve better detection performance. Existing mainstream fuzzy testing can be divided into three categories: black box, white box, and grey box testing.


In black box testing, the internal structure of the program is unknown and test cases are constructed blindly, so testing efficiency is very low. White box testing uses program analysis methods [4] (such as path traversal and symbolic execution) to analyze the program source code and then constructs corresponding test cases. White box testing can cover deeper test paths, but it costs considerable time and system resources and scales poorly. Grey box testing [5] strikes a good balance between test efficiency and test case coverage by introducing lightweight program analysis. It is more effective than black box testing and more scalable than white box testing. At present, grey box fuzzers are mainly guided by code coverage; a typical example is AFL [6].
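To make the coverage-guided idea concrete, the following is a minimal Python sketch of a grey box loop. It is an illustration under stated assumptions, not AFL's actual implementation: `toy_target`, `flip_byte`, and `grey_box_fuzz` are hypothetical names, and the toy target reports the branch identifiers it reaches directly, standing in for lightweight instrumentation.

```python
import random
from typing import Callable, FrozenSet, List, Set, Tuple


def toy_target(data: bytes) -> FrozenSet[str]:
    """Hypothetical instrumented program: returns the branch ids it reached."""
    hit = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add("magic0")
        if len(data) > 1 and data[1] == ord("Z"):
            hit.add("magic1")
    return frozenset(hit)


def flip_byte(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def grey_box_fuzz(
    seeds: List[bytes],
    run: Callable[[bytes], FrozenSet[str]],
    rng: random.Random,
    iterations: int,
) -> Tuple[List[bytes], Set[str]]:
    """Keep a mutated input only if it reaches a branch not seen before."""
    queue = list(seeds)
    seen: Set[str] = set()
    for seed in queue:
        seen |= run(seed)
    for _ in range(iterations):
        candidate = flip_byte(rng.choice(queue), rng)
        coverage = run(candidate)
        if not coverage <= seen:  # at least one new branch was reached
            seen |= coverage
            queue.append(candidate)
    return queue, seen


queue, seen = grey_box_fuzz([b"AA"], toy_target, random.Random(1), 200)
```

A black box fuzzer would correspond to dropping the coverage check and never growing the queue, whereas a white box approach would replace the random mutation with input derived from program analysis.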

However, current grey box fuzzers are designed to cover as many code execution paths as possible. When regulating seed energy, they usually distribute energy evenly instead of assigning different energies to different test paths. In reality, most source code vulnerabilities are concentrated on a small number of critical test paths. Existing grey box fuzzers therefore often spend a lot of time on paths in which vulnerabilities are hard to detect, which reduces the efficiency of fuzzy testing.
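The difference between even and path-dependent energy allocation can be sketched as follows. This is only an illustration: the path names, the integer criticality weights, and the functions `uniform_energy` and `weighted_energy` are assumptions made for the example and do not come from any particular fuzzer.

```python
from typing import Dict, List


def uniform_energy(paths: List[str], budget: int) -> Dict[str, int]:
    """Split the mutation budget evenly across all paths (non-empty list assumed)."""
    share = budget // len(paths)
    return {path: share for path in paths}


def weighted_energy(weights: Dict[str, int], budget: int) -> Dict[str, int]:
    """Split the budget in proportion to a hypothetical per-path criticality weight."""
    total = sum(weights.values())
    if total <= 0:
        return uniform_energy(list(weights), budget)
    return {path: budget * w // total for path, w in weights.items()}


# Hypothetical paths and weights: one critical parser path, two shallow paths.
weights = {"parse_header": 7, "log_message": 1, "print_usage": 2}

even = uniform_energy(list(weights), 1000)
# {'parse_header': 333, 'log_message': 333, 'print_usage': 333}

focused = weighted_energy(weights, 1000)
# {'parse_header': 700, 'log_message': 100, 'print_usage': 200}
```

With even allocation, the critical path receives the same number of mutations as the shallow ones; with weighted allocation, most of the budget goes to the path where vulnerabilities are assumed to concentrate.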

Because any single method in grey box fuzzy testing has its own limitations, more and more researchers have begun to combine several methods to achieve better fuzzy testing results, for example [7].

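Based on existing research work [8], this paper proposes a new framework for fuzzy testing sample generation called CVDF DYNAMIC. It consists of three parts: sample generation based on a genetic algorithm, sample generation based on a Bi-LSTM neural network, and integration and simplification of the two sample sets.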


The genetic algorithm improves the quality of test cases and expands code coverage by simulating the natural process of gene recombination and evolution. The Bi-LSTM sequence model assigns different energy to different test paths, so that seeds on the critical path iterate and mutate many times, which enhances path depth detection ability. The key contribution of CVDF DYNAMIC is that it integrates the two sample generation methods and simplifies the resulting sample set with a heuristic genetic algorithm, so that the test case set achieves a good balance among code coverage, path depth detection ability, and sample set size. This paper also compares the proposed method with other fuzzy testing sample generation methods and outlines directions for improving it.
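As a rough illustration of how gene recombination and mutation apply to test cases, the sketch below evolves a population of byte strings. It is a generic genetic algorithm under stated assumptions, not the operators or fitness function used by CVDF DYNAMIC, which this section does not specify. The names `crossover`, `mutate`, `evolve`, and `placeholder_fitness` are hypothetical, and the fitness function is a stand-in for a real measure such as code coverage.

```python
import random
from typing import Callable, List


def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover: a prefix of `a` followed by the rest of `b`."""
    cut = rng.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def mutate(sample: bytes, rng: random.Random, rate: float = 0.05) -> bytes:
    """Replace each byte with a random value with probability `rate`."""
    out = bytearray(sample)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)
    return bytes(out)


def evolve(
    population: List[bytes],
    fitness: Callable[[bytes], float],
    rng: random.Random,
    generations: int = 10,
    keep: int = 4,
) -> List[bytes]:
    """Keep the `keep` fittest samples (keep >= 2) and refill with their offspring."""
    size = len(population)
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        children: List[bytes] = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return population


def placeholder_fitness(sample: bytes) -> float:
    """Assumed stand-in for coverage: the number of distinct byte values."""
    return float(len(set(sample)))


initial = [bytes([i] * 8) for i in range(8)]
evolved = evolve(initial, placeholder_fitness, random.Random(0), generations=5)
```

Because the fittest samples are carried over unchanged in each generation, the best fitness in the population never decreases; in a real fuzzer, the fitness call would run the target program and measure the coverage that each sample achieves.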
