HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis

Du, Chunlai; Guo, Yanhui; Feng, Yifan; Zheng, Shijie

doi:10.3390/electronics13101909

Open AccessArticle

HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis

¹

School of Information Science and Technology, North China University of Technology, Beijing 100144, China

²

Department of Computer Science, University of Illinois Springfield, Springfield, IL 62703, USA

^*

Author to whom correspondence should be addressed.

Electronics 2024, 13(10), 1909; https://doi.org/10.3390/electronics13101909

Submission received: 13 April 2024 / Revised: 8 May 2024 / Accepted: 10 May 2024 / Published: 13 May 2024

(This article belongs to the Special Issue Machine Learning for Cybersecurity: Threat Detection and Mitigation)

Download

Browse Figures

Versions Notes

Abstract

:

Software vulnerabilities present a significant cybersecurity threat, particularly as software code grows in size and complexity. Traditional vulnerability-mining techniques face challenges in keeping pace with this complexity. Fuzzing, a key automated vulnerability-mining approach, typically focuses on code branch coverage, overlooking syntactic and semantic elements of the code. In this paper, we introduce HotCFuzz, a novel vulnerability-mining model centered on the coverage of hot code blocks. Leveraging vulnerability syntactic features to identify these hot code blocks, we devise a seed selection algorithm based on their coverage and integrate it into the established fuzzing test framework AFL. Experimental results demonstrate that HotCFuzz surpasses AFL, AFLGo, Beacon, and FairFuzz in terms of efficiency and time savings.

Keywords:

vulnerability mining; hotspot code; fuzzing; AFL

1. Introduction

In recent years, a surge in security incidents, including hacker attacks and user information leaks, has underscored the pivotal role of software vulnerabilities in such events. As reported by the CVE Detail website, the steady increase in vulnerabilities demonstrates a significant upward trajectory. These vulnerabilities stem from weaknesses in software algorithm models and flaws in code implementation. As software code expands and logic grows more intricate, the task of uncovering hidden vulnerabilities becomes increasingly daunting. Coverage-guided fuzzing has emerged as a highly effective method for automated vulnerability mining. This approach entails the automated or semi-automated generation of unexpected, random, abnormal, or invalid data, which are then inputted into the target software. By monitoring the software’s response to these inputs, potential vulnerabilities are unearthed if the program crashes or behaves abnormally. Moreover, the integration of deep-learning techniques into vulnerability-mining research has led to advancements in source code vulnerability mining, as exemplified by projects like SySeVR [1]. These initiatives extract code representations capturing both syntax and semantic information, enabling vulnerability mining at the source code level.

Path-coverage-based fuzzing test techniques, exemplified by AFL [2], emphasize the exploration of new paths and the minimization of test inputs for seed selection. However, these techniques lack consideration for semantic code analysis. Conversely, AI-based vulnerability-mining techniques centered on the program source code leverage feature learning from code blocks harboring vulnerabilities. In practical vulnerability-mining scenarios, these techniques can identify the location of vulnerability blocks but often struggle to precisely pinpoint the exact vulnerability point. Moreover, they encounter difficulties in uncovering unknown vulnerabilities.

Therefore, we explore a vulnerability-mining method based on hotspot code coverage. During the static analysis phase, deep-learning techniques are employed to identify and fine-tune suspicious hotspot codes. Subsequently, the research pivots to dynamic fuzzing, where vulnerability mining is guided by hotspot code coverage. Our contributions can be summarized as follows:

(1): The introduction of a test input seed-filtering algorithm designed for hotspot code coverage, facilitating the priority-oriented testing of hotspot code blocks.
(2): The proposal of HotCFuzz, a vulnerability-mining model guided by hotspot code coverage. Leveraging the calibration of hotspot codes identified during the static analysis phase, we prioritize vulnerability mining within hotspot code regions. This approach ensures the effective mining not only for hotspot codes but also for non-hotspot codes.
(3): Experimental results demonstrate the superior performance of our proposed HotCFuzz model compared to AFL, AFLGo, Beacon, and FairFuzz.

The rest of the paper is organized as follows: Section 2 introduces related work. In Section 3, the proposed MemConFuzz model is described. In Section 4, the experimental process and the results are discussed. Finally, we conclude the paper in Section 5.

2. Related Work

In terms of fuzzing tests, AFLgo [3] identifies the source code during compilation and computes the distance to the target basic block based on edges in the Control Flow Graph (CFG) of the Program Under Test (PUT). During runtime, it aggregates the distance values for each basic block and computes the average to assess seeds. The prioritization of seeds is based on distances, favoring those closer to the target. Hawkeye [4] measures the similarity between the execution traces of seeds and the target at the function level, incorporating the basic block trace distance with the coverage function similarity for seed prioritization and power scheduling. LOLLY [5] employs a user-specified sequence of program statements as the target, evaluating seeds based on their ability to cover the target sequence (i.e., sequence coverage). Berry [6] enhances LOLLY by considering the execution context of the target sequence, upgrading target sequences with “essential nodes,” and using the similarity between the target execution trace and the enhanced target sequence to differentiate seed priority. DrillerGo [7] utilizes semantic information from logs and CVE descriptions to guide fuzzing, whereas SAVIOR [8] leverages Sanitizer information to prioritize seeds with collaborative executions and verifies all vulnerable program locations on the execution program path. UAFuzz [9] employs a sequence-aware target similarity metric to gauge the similarity between seed executions and error tracing after the target is freed. RDFuzz [10] combines the basic block execution frequency and distance to the target path for seed prioritization, aiming to identify frequently executed code regions close to the target path to effectively trigger and discover potential vulnerabilities. Beacon [11] steers the execution path and employs lightweight static analysis to compute abstract preconditions to the target, thereby discarding numerous infeasible execution paths at runtime to enhance execution efficiency. FairFuzz [12] automatically identifies branches covered by a small number of inputs and biases the mutation towards these branches, leading to a higher coverage. AFLChurn [13] assigns numerical weights to basic blocks based on recent changes or the change frequency to identify and prioritize code regions more likely to introduce new vulnerabilities. WindRanger [14] considers deviations from basic blocks, meaning blocks deviating from the target location starting from the execution trace when calculating the “distance” to the target path. This approach aims to identify paths potentially concealing vulnerabilities by analyzing changes in execution paths.

In the realm of AI-based techniques, Hin et al. [15] introduced LineVD, a deep-learning framework that approaches statement-level vulnerability detection as a node classification task. It captures control and data dependency relationships between statements within functions using graph neural networks, while encoding original source code tokens using a transformer-based model. Li et al. [16] pioneered the VulDeePecker vulnerability detection method, which, first, segments the program forward and backward based on the Program Dependency Graph (PDG) of the source code. Subsequently, it employs a bidirectional long short-term memory network to ascertain whether program slices contain vulnerabilities.

To overcome the limitations of the grammar features in VulDeePecker’s program-slicing dependencies, Li et al. [1] proposed the SySeVR vulnerability detection method. This method broadens the grammar features of the slicing dependencies into four categories, extracting vulnerability code blocks containing both semantic and syntactic information from program slices. Zhou et al. [17] introduced Devign, a methodology that extracts valuable features from learned rich node representations through a novel convolutional module, utilizing them for graph-level classification. Additionally, Wu et al. [18] developed VulCNN, a model inspired by deep-learning-based image classification techniques. VulCNN transforms the source code of functions into an image representation, parsing and tokenizing the source code while preserving crucial syntactic and semantic information. This approach enables the model to effectively learn the internal representation of the source code.

3. Methodology

3.1. Motivation

The dynamic vulnerability-mining method based on fuzzing tests hinges on two pivotal factors: (1) the selection of input seeds for generating distorted test inputs, and (2) the distortion strategies employed on these input seeds. It can be argued that the quality of the chosen input seeds significantly influences the efficacy of vulnerability mining through fuzz testing. To elucidate the concepts presented in this paper, the following definitions are provided:

Definition 1.

Dangerous Instruction Code: Instructions within a program’s codebase that operate on operands influenced by untrustworthy input data from the external environment. These instructions have the potential to induce program exceptions or alter the normal execution flow, posing a risk to the program’s integrity and security.

Definition 2.

Dangerous Function Code: Functions within a software system designed to process input data originating from untrusted external sources. These functions, however, fail to implement rigorous boundary checks on the received input data, thereby exposing the system to potential security vulnerabilities.

Definition 3.

Hotspot Code Region: A set of code segments within a program characterized by their propensity to trigger program crashes or other undesirable behaviors. Hotspot code regions encompass both dangerous functions, which handle untrusted input data without stringent boundary checks, and dangerous instruction codes, which operate on operands susceptible to manipulation by untrusted external sources.

If the target program under test contains hotspot code regions, the likelihood of triggering program crashes during the testing process is significantly higher compared to other code regions. Building upon this logical assumption, we prioritize hotspot code region coverage as a fundamental factor for orienting fuzzing tests. Figure 1 illustrates the core concept of vulnerability mining presented in this paper.

In the static analysis phase, hotspot code regions are discerned from the program’s source code employing a deep-learning framework. Subsequently, during the dynamic analysis phase, the coverage of each test input concerning the identified hotspot codes during the execution of the target program is logged. An evaluation method for test inputs is then formulated to allocate greater emphasis on input seed selection within the fuzzing test framework. This ensures that test inputs exhibiting high coverage of hotspot code regions are prioritized for selection as input seeds. Finally, a hotspot-code-coverage-guided fuzzing framework is constructed around the chosen seed inputs.

In summary, our proposed model meticulously analyzes the semantics of code behavior by examining dependencies in static analysis. Furthermore, leveraging dynamic fuzzing, it promptly retains context information during program crashes along with the corresponding test inputs responsible for triggering the crashes. This capability allows for precise pinpointing of the crash location, thereby facilitating subsequent analysis of the underlying cause of the vulnerability.

3.2. HotCFuzz Framework

According to the concept of prioritizing testing of hotspot code as described in Section 3.1, we have developed a vulnerability-mining framework named HotCFuzz, which operates under the guidance of hotspot code coverage. The components of HotCFuzz are depicted in Figure 2 below:

HotCFuzz comprises two distinct phases: (1) Static Analysis Phase: This phase involves scrutinizing the source code to pinpoint and annotate hotspot code. The phase yields location information pertaining to the identified hotspot code. Utilizing LLVM, instrumentation code is inserted at the outset of all identified hotspot codes within the source code of the target program. This instrumentation code records the execution information of the instrumented hotspot code during runtime. (2) Dynamic Testing Phase: This phase encompasses two primary facets. Firstly, it monitors the execution coverage of the hotspot code for each test input processed by the program. Secondly, it evaluates the quality of the test inputs based on the coverage of the hotspot code. Leveraging this evaluation, superior test inputs are selected as seed inputs to generate a new round of test input sets.

3.2.1. Detect and Locate Hotspot Codes

The program comprises a sequence of functions interconnected by sequential control flow relationships, with each function capable of being converted into an Abstract Syntax Tree (AST) representation. To achieve this, we employ Joern, which operates by generating a Code Property Graph (CPG) initially. This CPG amalgamates the AST, Control Flow Graph (CFG), and Program Dependence Graph (PDG) of the program’s code. Subsequently, Joern produces the corresponding AST, CFG, and PDG from the CPG, tailored to meet the task requirements. For example, let us consider the source code depicted in Figure 3, which encompasses two functions, namely, Value and func, with a calling relationship established between them.

The corresponding ASTs for the functions Value and func generated by Joern from CPG are shown in Figure 4 and Figure 5:

In order to identify hotspots within programs, we first detect potentially hazardous elements within the AST of the target program. These hazardous elements encompass API functions, array variables, pointer variables, etc. Due to variations in variable naming conventions across different programs, direct comparisons are often infeasible. Therefore, our AST-based hazardous element detection focuses solely on hazardous API functions. Subsequently, we consider semantic information of API functions to find association relations hidden in instructions. For example, hazardous instruction sequence of heap operation on the same memory block can be found through assignment propagation dependencies among heap pointer variables in the DDG. To realize the detection of hazardous API functions within AST elements, we necessitate the construction of vulnerability syntax characteristics tailored for the search, comparison, and identification of AST elements. We employed an outstanding static auditing tool, Checkmarx, to identify snippets of vulnerable code within known vulnerability programs. This strategic utilization of Checkmarx significantly enhances our ability to scrutinize the source code and pinpoint hotspot areas effectively. By leveraging Checkmarx’s advanced capabilities, we ensure a thorough examination of critical code segments, thereby augmenting the precision of our hotspot code annotation process. The incorporation of Checkmarx’s analysis results into our methodology adds an additional layer of assurance regarding the effectiveness of our hotspot code identification and annotation process.

Subsequently, we extract hazardous API functions from the identified vulnerable code snippets to represent the vulnerability syntax characteristics. In our research, we selected 5891 snippets which encompass C language vulnerabilities in the SARD dataset [19]. Utilizing the represented vulnerability syntax characteristics, we performed a comprehensive traversal search on the AST encompassing all functions within the target program. Since the code elements for vulnerability may reside as either leaf nodes or intermediate nodes within the AST, upon encountering a node that meets a vulnerability syntax characteristic, we meticulously record the location of corresponding code area. Subsequently, through this analysis, we discern the locations of hotspot codes within the target program.

In certain scenarios, specific sequences of function abnormalities are necessary in order to pose a vulnerability threat. For instance, in the case of a heap vulnerability such as double free, the occurrence of two free operations on memory allocated by malloc is required. Hence, it is imperative to apply further filtering to the identified hotspot functions in the target program to accommodate such cases.

The PDG delineates the data dependencies and control dependencies among instructions within a function. Figure 6 and Figure 7, respectively, illustrate the PDGs of the functions Value and func from Figure 2.

Since the Program Dependence Graphs (PDGs) generated by Joern are function-oriented, it becomes necessary to merge the PDGs of related functions based on their calling relationships. This is accomplished by establishing an equivalence mapping between the data within the calling function and the parameters of the called function, taking into account the relationship of parameter passing. As a result, we establish semantic correlations between functions or instructions. For instance, when dealing with operations on the same heap memory, despite potential differences in the names of pointer variables between the calling function and the called function, these disparities can be reconciled through equivalence mapping. Ultimately, this process enables us to discern the hotspot code within the target program (Algorithm 1).

Algorithm 1: Detect and Locate Hotspot Functions in Target Programs
Input: Vulnerability syntax characteristic
Output: Hotspot code information
1:	generate the AST of the target program function
2:	traversal search of function AST based on vulnerability syntax characteristics
3:	for AST not traversed, do
4:	if AST grammar element matching the characteristics, then
5:	grammatical elements incorporated into a collection of hotspot codes
6:	endif
7:	endfor
8:	using Joern to generate the PDG of the target program function
9:	for function call not processed, do
10:	constructing mapping equivalence tables for variable names
11:	endfor
12:	traverse the function PDG vertically and comb through the sequence of anomalies between functions based on data dependencies.
13:	update collection set of hotspot code regions

3.2.2. Input Seed Filtering based on Hotspot Code Coverage

Based on the identification of the hotspot code within the target program, we undertook two tasks. Firstly, we defined a bitmap matrix named hotspot_shm, which shares the same dimensions as the AFL bitmap. This matrix serves to track and record coverage information of executed hotspot code functions, specifically capturing the frequency of triggering each hotspot function, denoted as hotspot_funcs. Secondly, as illustrated in Figure 2, during the static phase, we utilized LLVM tools to inject instrumentation code ahead of all hotspot code segments. The objective was to capture the execution of hotspot code. Throughout the fuzzing process, when the target program encounters a hotspot code, the corresponding position in the hotspot code bitmap matrix is set to 1.

To obtain input seeds prioritizing the coverage of hotspot code, we employ hotspot_funcs as a metric. Each input is evaluated based on the number of hotspot code functions it activates, with inputs triggering more hotspot code functions assigned a higher coefficient. Consequently, such inputs receive a higher score and are accorded higher priority as seeds. We adhere to the following two principles:

Principle 1.

The coefficient increases proportionally with the recorded coverage of hotspot functions during execution. Consequently, the seed’s score and energy are augmented.

Principle 2.

AFL’s original scoring strategy is also factored into our approach. We ensure that the final score of the seed does not become excessively large, as this could potentially lead to the seed being trapped in local code blocks during execution.

Therefore, the input seed preference evaluation formula proposed in this paper is as follows:

Priority score ({seed}_{i}) = \{\begin{array}{l} P_{afl} ({seed}_{i}) * (a - \frac{e^{- hotspot_funcs}}{b}), h o t s p o t_f u n c s \\ P_{afl} ({seed}_{i}), o t h e r w i s e \end{array},

(1)

where a is the deviation factor, which is a consideration for implementing principle 2 and takes the value of 1.3 in the experimental test. b is adjustment to ensure

(a - \frac{e^{- hotspot_funcs}}{b}) > 1

whenever hotspot codes are executed. When hotspot codes are found but the input seeds concerning hotspot code have been decreasing, b should be increased. In our experiment, b is set to 6 based on experiment statistics. The hotspot_funcs denotes the total number of hotspot code functions triggered by a test seed during the fuzzing test loop.

s e e d_{i}

refers to the current test input in the fuzzing process.

P_{a f l} (s e e d_{i})

represents AFL’s original seed selection strategy. Equation (1) not only considers the prioritization of input seeds that explore hotspot code regions but also ensures that other excellent input seeds are not discarded.

3.2.3. HotCFuzz Model

The HotCFuzz model is constructed based on the AFL framework. As depicted in Algorithm 2, during the static analysis phase, hotspot code information is first obtained from the target program (line 1). Subsequently, instrumentation codes are inserted at the heads of hotspot codes to record the execution of these hotspot codes during the fuzzing of the target program. These recorded data serve to evaluate the quality of test inputs, thereby aiding in the selection of superior input seeds (line 2). During the fuzzing phase, HotCFuzz initially provides a seed set S, from which inputs s are chosen and mutated for fuzzing in a continuous loop until a timeout is reached or fuzzing is aborted. Within this loop, AFL typically selects seeds using ChooseNext from a circular queue in the order they were added in the original framework. However, we modify ChooseNext to prioritize seeds based on their importance (line 5). AssignEnergy determines the number of inputs generated from s (line 6). The generated inputs, denoted as s’, are produced by randomly mutating s (line 8). If the generated input s’ triggers a crash, it is added to the CrashSet. Otherwise, if s’ covers a new branch (line 12–14) or multiple hotspot codes (line 15–17), it is appended to the seed set S.

Algorithm 2: HotCFuzz
Input: Test Input Set S
Output: CrashSet
1:	Static analysis phase to obtain hotspot code information
2:	Insert codes for recording hotspot code executed by LLVM
3:	CrashSet ← Ø
4:	repeat
5:	select seed s from Set S
6:	determine the number p of abnormal inputs according to s by power schedule
7:	for i from 1 to p, do
8:	mutate s to create new test input s’
9:	if s’ triggers crash, then
10:	add s’ to CrashSet
11:	else
12:	if new paths are found, then
13:	add s’ to S
14:	endif
15:	if hotspot code areas are found, then
16:	evaluating s’ and add s’ to S according to score
17:	endif
18:	endif
19:	endfor
20:	until timeout reached or abort-signal

4. Experiments

4.1. Experiment Setup

In this experiment, our objective is to assess the performance of the HotCFuzz model and compare it with other fuzzing test techniques. To achieve this, we have selected a publicly available test input set comprising programs such as cxxfilt v2.30, readelf v2.25, objdump v2.25, and openjpeg v2.3.0. It is noteworthy that some of these programs have known vulnerabilities, duly identified by CVE numbers, thereby ensuring the authenticity and reliability of the experiment.

For comparison with HotCFuzz, we have selected AFL, AFLGo, Beacon, and FairFuzz as the fuzzing test tools. These tools represent advanced techniques in the field of fuzzing and are extensively utilized for vulnerability-mining purposes.

4.2. Evaluation Indicators

To assess the performance of the HotCFuzz model, we have identified the following three critical evaluation metrics:

(1): Vulnerability Discovery Count: This metric involves comparing the number of vulnerabilities detected by different fuzzing tools within the same duration of execution. It serves to validate the efficacy of vulnerability mining across different testers.
(2): Same Vulnerability Trigger Time: This metric entails comparing the time required by different fuzzing tools to reproduce known vulnerabilities. It validates the effectiveness of various testers in triggering known vulnerabilities.
(3): Seed Execution Path Gain Count: This metric involves comparing the number of gained execution paths for seeds among different fuzzing tools within the same duration of execution. This metric visualizes code coverage, revealing testing depth and breadth, and also reflects the efficiency of the fuzzing tester in generating test cases.

4.3. Experimental Results

We evaluated the performance of the HotCFuzz model using the above evaluation metrics and compared it with AFL, AFLGo, Beacon, and FairFuzz. The experimental results are shown in Table 1, Table 2 and Table 3.

4.4. Discussion

To validate the efficiency of HotCFuzz in vulnerability mining, we assessed its performance by comparing the number of vulnerabilities detected within the same timeframe. As illustrated in Table 1, across various test programs, HotCFuzz consistently outperforms AFL, AFLGo, Beacon, and FairFuzz in terms of vulnerability discovery. Specifically, in the cxxfilt test program, HotCFuzz discovers a greater number of vulnerabilities compared to its counterparts. Similarly, in the readelf and objdump test programs, HotCFuzz excels, surpassing AFL, AFLGo, and FairFuzz in vulnerability mining. Even in the openjpeg test program, HotCFuzz outshines other fuzzing testers by identifying more vulnerabilities. These findings underscore the efficiency, superiority, and effectiveness of HotCFuzz in vulnerability mining. The experimental result in Table 1 highlights the superiority of our approach over established methods such as AFL, AFLGo, Beacon, and FairFuzzy.

HotCFuzz aims to expedite the triggering of exploits located at the target location. The experimental findings, outlined in Table 2, demonstrate HotCFuzz’s superior performance in this aspect across most scenarios. With the exception of a longer vulnerability reproduction time observed in CVE-2017-9039, HotCFuzz consistently outperforms AFL, AFLGo, Beacon, and FairFuzz in terms of vulnerability triggering speed. For instance, in the vulnerability reproduction of CVE-2016-4492, HotCFuzz showcases greater efficiency compared to the latest Beacon tool, owing to its well-crafted seed selection strategy. Conversely, in the vulnerability reproduction of CVE-2017-9039, HotCFuzz experiences a longer reproduction time due to the specific or intricate triggering conditions associated with the vulnerability. In cases where the target vulnerability relies on a particular execution environment state or necessitates a specific input sequence for triggering, HotCFuzz may encounter challenges in the absence of adequate dynamic execution information feedback. The outcomes of these experiments presented in Table 2 demonstrate the shortest time consumption achieved by our model and underscore its effectiveness in vulnerability detection.

Table 3 presents the results of experiments on the number of execution paths covered, showcasing HotCFuzz’s exceptional performance compared to other fuzzing testers. However, it is worth noting that, in the case of testing readelf, HotCFuzz’s results are slightly inferior to those of FairFuzz. This outstanding experimental performance can be attributed to HotCFuzz’s innovative approach, which integrates both static analysis and dynamic execution information to guide the seed selection and mutation process. Through meticulous analysis of the program’s data flow and control flow, HotCFuzz efficiently identifies potential vulnerability trigger paths, thereby optimizing test cases to explore a greater number of execution paths. Consequently, HotCFuzz outshines similar tools in seed path coverage experiments, underscoring its effectiveness and superiority in vulnerability mining.

5. Conclusions

Vulnerability mining has long been a focal point in software security research, with fuzzing remaining a prominent technology in this domain. While fuzzing has seen significant success, it often overlooks the syntax and semantics of the target program. To address this gap, we aimed to identify suspicious hotspot codes within the target program by leveraging syntax and semantic features extracted from vulnerability code fragments. Based on this concept, we introduced HotCFuzz, a vulnerability fuzzing model centered on hotspot code coverage. In the HotCFuzz model, we introduced an algorithm to detect and pinpoint hotspot codes during the static analysis stage. Instrumentation codes are inserted at the beginning of hotspot codes to record their execution, facilitating the selection of superior input seeds. Experimental results demonstrate that HotCFuzz surpasses AFL, AFLGo, Beacon, and FairFuzz in terms of performance and effectiveness.

Author Contributions

Conceptualization, C.D. and Y.G.; methodology, C.D. and Y.G.; software, Y.F.; validation, S.Z.; investigation, Y.F.; writing—original draft preparation, Y.F. and S.Z.; writing—review and editing, Y.G.; visualization, Y.F. and S.Z.; project administration, C.D.; funding acquisition, C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China grant number 62172006.

Data Availability Statement

The test results data presented in this study are available upon request. The dataset can be found in public web sites.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, Z.; Zou, D.; Xu, S.; Jin, H.; Zhu, Y.; Chen, Z. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Trans. Dependable Secur. Comput. 2022, 19, 2244–2258. [Google Scholar] [CrossRef]
American Fuzzing Lop. Available online: https://lcamtuf.coredump.cx/afl/ (accessed on 8 April 2024).
Böhme, M.; Pham, V.T.; Nguyen, M.D.; Roychoudhury, A. Directed Greybox Fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 2329–2344. [Google Scholar]
Chen, H.; Xue, Y.; Li, Y.; Chen, B.; Xie, X.; Wu, X.; Liu, Y. Hawkeye: Towards a Desired Directed Grey-Box Fuzzer. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 2095–2108. [Google Scholar]
Liang, H.; Zhang, Y.; Yu, Y.; Xie, Z.; Jiang, L. Sequence Coverage Directed Greybox Fuzzing. In Proceedings of the 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), Montreal, QC, Canada, 25–26 May 2019; IEEE Computer Society. pp. 249–259. [Google Scholar]
Liang, H.; Jiang, L.; Ai, L.; Wei, J. Sequence Directed Hybrid Fuzzing. In Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), London, ON, Canada, 18–21 February 2020; pp. 127–137. [Google Scholar]
Kim, J.; Yun, J. Poster: Directed Hybrid Fuzzing on Binary Code. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2637–2639. [Google Scholar]
Chen, Y.; Li, P.; Xu, J.; Guo, S.; Zhou, R.; Zhang, Y.; Wei, T.; Lu, L. Savior: Towards Bug-Driven Hybrid Testing. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1580–1596. [Google Scholar]
Nguyen, M.D.; Bardin, S.; Bonichon, R.; Groz, R.; Lemerre, M. Binary-Level Directed Fuzzing for {use-after-Free} Vulnerabilities. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), San Sebastian, Spain, 14–16 October 2020; pp. 47–62. [Google Scholar]
Ye, J.; Li, R.; Zhang, B. RDFuzz: Accelerating Directed Fuzzing with Intertwined Schedule and Optimized Mutation. Math. Probl. Eng. 2020, 2020, 7698916. [Google Scholar] [CrossRef]
Huang, H.; Guo, Y.; Shi, Q.; Yao, P.; Wu, R.; Zhang, C. Beacon: Directed Grey-Box Fuzzing with Provable Path Pruning. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 23–25 May 2022; pp. 36–50. [Google Scholar]
Lemieux, C.; Sen, K. FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France, 3–7 September 2018; pp. 475–485. [Google Scholar]
Zhu, X.; Böhme, M. Regression Greybox Fuzzing. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, 15–19 November 2021; pp. 2169–2182. [Google Scholar]
Du, Z.; Li, Y.; Liu, Y.; Mao, B. WindRanger: A Directed Greybox Fuzzer Driven by Deviation Basic Blocks. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 21–29 May 2022; pp. 2440–2451. [Google Scholar]
Hin, D.; Kan, A.; Chen, H.; Babar, M.A. LineVD: Statement-Level Vulnerability Detection Using Graph Neural Networks. In Proceedings of the 19th International Conference on Mining Software Repositories, Pittsburgh, PA, USA, 23–24 May 2022; pp. 596–607. [Google Scholar]
Li, Z.; Zou, D.; Xu, S.; Ou, X.; Jin, H.; Wang, S.; Deng, Z.; Zhong, Y. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. In Proceedings of the 2018 Network and Distributed System Security Symposium, San Diego, CA, USA, 18–21 February 2018. [Google Scholar]
Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
Wu, Y.; Zou, D.; Dou, S.; Yang, W.; Xu, D.; Jin, H. VulCNN: An Image-Inspired Scalable Vulnerability Detection System. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 21–29 May 2022; pp. 2365–2376. [Google Scholar]
Nist Sofware Assurance Reference Dataset. Available online: https://samate.nist.gov/SARD/ (accessed on 8 April 2024).

Figure 1. The idea behind this article.

Figure 2. HotCFuzz model.

Figure 3. Sample code.

Figure 4. AST for the function Value.

Figure 5. AST for the function func.

Figure 6. PDG for the function Value.

Figure 7. PDG for the function func.

Table 1. Number of vulnerability mining for the same testing time (24 h) for different fuzzing testers.

Program	AFL	AFLGo	Beacon	FairFuzz	HotCFuzz
cxxfilt	373	6	401	39	493
readelf	86	65	54	181	91
objdump	3	2	4	10	18
openjpeg	8	10	0	7	21

Table 2. Trigger time for real-world exploits (min).

Program	Vulnerability	AFL	AFLGo	Beacon	FairFuzz	HotCFuzz
cxxfilt	CVE-2016-4492	17	9	6	8	5
readelf	CVE-2017-9039	6	12	6	6	9
openjpeg	CVE-2017-12982	768	960	Timeout	930	673

Table 3. Number of seed execution paths.

Program	AFL	AFLGo	Beacon	FairFuzz	HotCFuzz
cxxfilt	2783	3433	4106	3826	7948
readelf	3049	3988	3125	5895	4013
objdump	2876	3934	3877	4671	4832
openjpeg	3946	3771	1285	3016	4011

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Du, C.; Guo, Y.; Feng, Y.; Zheng, S. HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis. Electronics 2024, 13, 1909. https://doi.org/10.3390/electronics13101909

AMA Style

Du C, Guo Y, Feng Y, Zheng S. HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis. Electronics. 2024; 13(10):1909. https://doi.org/10.3390/electronics13101909

Chicago/Turabian Style

Du, Chunlai, Yanhui Guo, Yifan Feng, and Shijie Zheng. 2024. "HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis" Electronics 13, no. 10: 1909. https://doi.org/10.3390/electronics13101909

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

HotCFuzz: Enhancing Vulnerability Detection through Fuzzing and Hotspot Code Coverage Analysis

Abstract

1. Introduction

2. Related Work

3. Methodology

3.1. Motivation

3.2. HotCFuzz Framework

3.2.1. Detect and Locate Hotspot Codes

3.2.2. Input Seed Filtering based on Hotspot Code Coverage

3.2.3. HotCFuzz Model

4. Experiments

4.1. Experiment Setup

4.2. Evaluation Indicators

4.3. Experimental Results

4.4. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI