Article

BSP: Branch Splitting for Unsolvable Path Hybrid Fuzzing

1 National Key Laboratory of Science and Technology on Information System Security, Beijing 100101, China
2 College of Cyber Science, Nankai University, Tianjin 300350, China
3 Tianjin Navigation Instruments Research Institute, Tianjin 300130, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(24), 4935; https://doi.org/10.3390/electronics13244935
Submission received: 13 November 2024 / Revised: 9 December 2024 / Accepted: 12 December 2024 / Published: 13 December 2024
(This article belongs to the Special Issue Network Security and Cryptography Applications)

Abstract

Hybrid fuzzing leverages the results of the concolic executor to direct the exploration of fuzzing, which has been proven to improve coverage during tests significantly. However, some constraints, such as those related to environments or depending on the host’s status, cannot be solved. Despite many performance optimizations on hybrid fuzzing, we observe that repeated constraint solving on unsolvable branches causes significant computational redundancy. This paper focuses on eliminating the unsolvable branches in concolic execution. We propose Branch Splitting for Unsolvable Path Hybrid Fuzzing (BSP), which splits unsolvable branches to achieve higher fuzzing coverage. BSP modifies the target program during concolic execution so that the fuzzer can easily cover initially unsolvable branches. Specifically, it changes the condition of unsolvable branches to constant True (or False), which generates multiple variants of the original program. Then, the fuzzer tests these variants instead. This allows BSP to explore more branches with high performance. The experimental results on real-world programs demonstrate that BSP can explore 46.68% more branches than QSYM.

1. Introduction

Fuzzing can effectively explore paths in a program by generating a mutated corpus, which has become a popular technology in vulnerability discovery. However, the generation of fuzzing input is random, so the fuzzer cannot explore branches with complex constraints quickly. To overcome this problem, researchers have suggested hybrid fuzzing [1,2,3,4,5], which integrates the advantages of both fuzzing and concolic execution. Fuzzing and concolic execution serve distinct purposes: fuzzing rapidly tests code by applying random changes, while concolic execution methodically addresses intricate branch conditions through constraint solving. In a hybrid approach, the concolic executor analyzes each test case from the fuzzer, monitors how they run, resolves branch constraints, and produces new test cases aimed at expanding code coverage. These newly created cases are then fed back to the fuzzer, prompting a fresh round of mutations. This cyclical method enhances the hybrid fuzzer’s ability to explore a broader range of code more efficiently.
The high cost of constraint solving is a major challenge for hybrid fuzzing. Even optimized hybrid fuzzers have performance problems due to the significant computational overhead [6]. Cutting-edge hybrid fuzzers prioritize solving only the most interesting path constraints to boost efficiency. When the fuzzer becomes stuck (unable to enhance code coverage), Driller [1] utilizes a concolic executor to aid in probing complex branches. QSYM [2] employs dynamic binary translation (DBT) for instruction-level concolic execution, focusing solely on the instructions required to create symbolic constraints. This approach dramatically reduces the amount of symbolic simulation needed, thereby boosting the efficiency of concolic execution. DigFuzz [3] employs Monte Carlo approaches, while MEUZZ [7] harnesses machine learning, both aimed at improving the seed scheduling strategies for their concolic executors.
Random mutation generates test cases far faster than constraint solving does, resulting in a mismatch between the fuzzer’s pace and the efficiency of the concolic executor. As an illustration, while the concolic executor is busy processing a specific test case (labeled “A”), the fuzzer may concurrently create several additional test cases. In this case, the branches covered by A may already have been covered by test cases generated by the fuzzer. This situation incurs substantial computational costs, which can significantly impede the overall speed of the testing procedure. Additionally, we observe that the concolic executor repeatedly attempts to solve constraints that it cannot solve, which makes the problem even worse.
In this paper, we implement a prototype system called BSP, which releases the concolic executor from unsolvable branches to improve constraint-solving performance and help the fuzzer to explore more branches. During concolic execution, we filter out branches that are not solved. In addition, we manually analyze branches that contain constraints unrelated to the input but related to the execution environment. These branches could not be solved theoretically by either concolic execution or fuzzing. Then, BSP splits the program into multiple variants. In each variant, the condition of an unsolvable branch is assigned a constant value (True or False), which helps the fuzzer easily overcome the branch. This allows BSP to focus on resolving solvable branches, enabling it to cover a larger amount of code quickly.
We evaluated BSP on six popular real-world programs. The results show that BSP, with its optimization methods, outperforms the advanced hybrid fuzzer QSYM [2] in branch coverage across the six programs, covering 46.68% more branches than QSYM.
In summary, we analyzed the mechanisms of mainstream concolic executors, and our analysis revealed their shortcomings in solving branch conditions. We proposed and implemented a splitting mechanism for unsolvable branches to help the concolic executor overcome these branches and lead the fuzzer to a deeper program state. The experimental results show that BSP achieves higher performance than our reference tool, QSYM [2], regarding the growth rate of branch coverage and efficiency.
The work is organized as follows: Section 2 discusses the extant literature. Section 3 describes our motivation, focusing on why we propose our method. Section 4 discusses the design of BSP, including how to perform branch splitting and the definition of “unsolvable branch”. Section 5 presents our experimental results. Section 6 describes limitations and future work, and Section 7 concludes.

2. Related Work

We selected related work published between 2007 and 2024, mainly literature on software security.

2.1. Coverage-Based Fuzzing

It stands to reason that the more code a fuzzer can cover during testing, the higher the probability of discovering vulnerabilities within the program. Increased code coverage generally leads to testing a more comprehensive range of execution paths, including those that might contain critical bugs or security flaws. Consequently, coverage-based fuzzers are designed to maximize code coverage and target specific regions of code that are likely to yield valuable insights. These fuzzers employ a variety of strategies to achieve this. For instance, some fuzzers focus on generating well-structured and semantically correct test cases, ensuring that the inputs they produce can reach a broader range of code paths, thereby maximizing coverage [8,9,10,11,12]. This approach is efficient in scenarios where complex or structured inputs are needed to thoroughly exercise the program’s logic. Other fuzzers concentrate on providing more precise and detailed coverage data, which in turn allows for a deeper and broader exploration of the program’s state space, increasing the likelihood of discovering previously hidden vulnerabilities [13,14,15,16]. By employing these techniques, fuzzers improve code coverage and enhance their effectiveness in detecting vulnerabilities, making them an indispensable tool in modern software security testing.
Coverage tracking is most commonly achieved through program instrumentation. One prominent example of a coverage-based fuzzer is American Fuzzy Lop (AFL) [17], which gathers edge coverage data by instrumenting either the source code at compile time or the binary, the latter typically through its QEMU mode. This enables AFL to observe which program paths are executed, and it mutates inputs to discover new coverage, thus expanding its exploration. In addition to AFL, other widely used instrumentation methods include tools like Pin [2,18], Intel Processor Trace (Intel PT) [19,20], DynamoRIO [21,22], E9Patch [23], and RetroWrite [24]. These techniques are frequently employed in practice to enhance fuzzing effectiveness. In our work, BSP leverages Pin [2,18] to obtain detailed coverage of conditional branch instructions, enabling a more thorough exploration of critical code paths.
Fuzzers utilize coverage metrics to select more effective seeds for their testing processes. Concentrating on seeds that are most likely to improve code coverage enhances the chances of uncovering software vulnerabilities. For example, fuzzers like Zeror [25] and UnTracer [26] focus exclusively on test cases that lead to increased coverage. In contrast, TortoiseFuzz [27] emphasizes cases that interact with edges linked to critical memory operations, which are vital for comprehensive security evaluations. Additionally, certain fuzzers assign priority to seeds based on the significance of the execution paths they explore, facilitating a more efficient search for potential vulnerabilities [13,28,29,30,31].

2.2. Concolic Execution

Concolic execution is a powerful technique that combines concrete execution with symbolic execution to analyze program paths. In this approach, a program is executed with specific inputs while simultaneously tracking symbolic variables representing a range of possible values. This dual execution allows for the generation of constraints based on the program’s behavior during execution.
Several widely used tools support concolic execution, including Angr [32], KLEE [33], S2E [34], and UC-KLEE [35]. Each of these tools offers unique features and optimizations that enhance the capabilities of concolic execution, facilitating improved path exploration and aiding in detecting vulnerabilities within software systems.
The effectiveness of concolic execution has led to its incorporation into various tools designed to tackle different issues in software analysis. For example, BORG [36] utilizes concolic execution to highlight potential vulnerabilities within programs, specifically to expose buffer overread bugs. On the other hand, SYMFUZZ [37] applies concolic execution to uncover the relationships between individual bits of input data, generating an optimized mutation ratio to enhance the parameters used in fuzz testing. These applications illustrate how concolic execution contributes to more effective and precise software testing methodologies.
While concolic execution can theoretically explore every possible execution path within a program, practical implementations face a significant obstacle known as path explosion. This phenomenon complicates the analysis, making managing the sheer volume of potential paths challenging. Although it is impossible to eliminate this issue, various innovative approaches have been developed to reduce its adverse effects. For instance, the Woodpecker tool [38] enhances symbolic execution efficiency by systematically removing redundant paths. Similarly, Matryoshka [39] homes in on conditional statements pertinent to the target branch, streamlining the analysis process. Another approach is offered by MergePoint [40], which alleviates performance burdens by alternating between dynamic and static symbolic execution strategies. Furthermore, Eclipser [41] selectively focuses on a limited set of comparison instructions to formulate approximate path constraints, contributing to a more efficient execution analysis.

2.3. Hybrid Fuzzing

Hybrid fuzzing is a sophisticated approach that integrates the strengths of both traditional fuzzing and concolic execution, resulting in more effective software testing. In this framework, fuzzing explores the majority of execution paths, while concolic execution tackles the more complex and challenging paths. A notable example is Driller [1], which activates concolic execution to navigate to new execution regions whenever the fuzzer encounters a dead end. In another instance, Dowser [42] enhances fuzzing capabilities by employing taint analysis and program analysis techniques. It identifies critical code sections that access arrays within loops, which are commonly linked to buffer overflow vulnerabilities. This tool strategically symbolizes only those input bytes that influence the indices of these arrays, thereby improving the precision of the fuzzing process.
While hybrid fuzzing offers numerous advantages, the inherently slow nature of concolic execution can lead to inefficiencies in the testing process. To address this challenge, QSYM [2] implements Dynamic Binary Translation (DBT), effectively merging symbolic emulation with native execution. This method facilitates instruction-level symbolic emulation, resulting in a significant acceleration of concolic execution. Additionally, QSYM identifies an issue related to over-constrained scenarios in concolic execution; to mitigate this, it generates a broader range of potentially valuable inputs by partially resolving path constraints.
For hybrid fuzzing to be effective, it is vital to ensure a smooth interaction between the fuzzer and the concolic executor. DigFuzz [3] critiques the shortcomings of Driller’s “demand launch” strategy, which can limit its efficiency. DigFuzz employs the Monte Carlo-based Probabilistic Path Prioritization Model (MCP3) to overcome this, which assesses the likelihood of traversing specific execution paths. This model prioritizes paths that are deemed more complex, allowing concolic execution to focus on the most challenging aspects of the program.
Moreover, the inherent randomness of the fuzzer’s mutation strategy can inadvertently alter valid inputs generated by the concolic executor. To counter this issue, Pangolin [43] adopts a technique called “polyhedral path abstraction”. This approach confines the fuzzer’s mutations to a specific range, ensuring that newly produced test cases adhere to the established path constraints. Similarly, MEUZZ [7] argues that fixed and simplistic heuristics, like opting for the smallest seed, may not be universally applicable. Instead, it harnesses machine learning to enable adaptive seed scheduling, facilitating a more tailored approach that can be adapted to different programming contexts.
Recently, there has been much research focusing on enhancing the efficiency of hybrid fuzzing or using hybrid fuzzing to detect vulnerabilities. Zhao et al. [44] introduced a tightly coupled hybrid fuzzing technique that extracted input specifications from concolic execution to guide fuzzing mutations, enhancing the performance of hybrid fuzzing in terms of code coverage and vulnerability discovery. Jiang et al. [5] conducted an extensive study on state-of-the-art hybrid fuzzers and proposed CoFuzz, a framework that improved hybrid fuzzing effectiveness by upgrading coordination modes, leading to significant increases in edge coverage and unique crash exposure. RSFuzzer [45] is a hybrid gray-box fuzzing framework designed to detect deep vulnerabilities in the SMI handlers of UEFI firmware, outperforming existing techniques in code coverage and vulnerability detection. BSFuzz [46] is a hybrid fuzzing approach that uses a branch state map to efficiently synchronize test cases and reduce redundant constraint solving, enhancing the performance of concolic execution. SHFuzz [47] introduces a selective hybrid fuzzing approach that focuses on critical branch selection and priority score calculation to improve code coverage and bug detection capabilities. HyperGo [48] is a directed hybrid fuzzer that employs path probability and an optimized symbolic execution scheme to efficiently reach target sites and expose vulnerabilities. PBFuzz [49] enhances hybrid fuzzing efficacy by synchronizing seeds and employing a branch-oriented scheduling strategy to prioritize seeds with high exploratory potential for concolic execution.
BSFuzz [46] proposes a method to approximately mitigate unsolvable branches during concolic execution, but it can handle only a limited number of them, leaving a large number of unsolvable branches unhandled. The other methods focus on enhancing the efficiency of hybrid fuzzing as a whole rather than that of concolic execution. Our approach focuses on eliminating unsolvable branches in concolic execution and is orthogonal to the previously discussed methods, so it can be easily integrated into other hybrid fuzzers. Our evaluation (Section 5) shows that this strategy enhances the efficiency of the hybrid fuzzing process.

2.4. Program Modification

Most existing fuzzing methods improve testing efficiency by changing various strategies of the fuzzer, such as the instrumentation strategy [50,51], seed selection strategy [7,52], seed mutation strategy [28,53,54], etc. However, it is possible to change not only the fuzzing algorithm but also the target program itself to cover more code.
TaintScope [55] detects checksum-based integrity checks in the target program and modifies the corresponding instructions in the binary program so that the conditional instruction is always taken or not, thus bypassing the checksum. Afterward, to fix the incorrect checksum fields in the test cases, TaintScope [55] solves the constraints on the path to generate the test cases that can truly pass the checksum check. Like TaintScope, T-Fuzz [56] modifies the target binary program to trigger deeper vulnerabilities. But instead of focusing only on checksum-based integrity checks, T-Fuzz [56] removes all the complex checks that the fuzzer cannot pass, allowing inputs to explore deeper into the program.
IJON [57] proposed an interactive fuzzing approach that can analyze the state of code coverage and the location of complex constraints. Guidance comments are manually added to the source code to address those hard-to-cover branches. BSP also modifies the source code of the target program but uses concolic execution to filter out those problematic branches.
Alternatively, program modification can be applied to directed fuzzing, which focuses on reaching suspicious code or reproducing vulnerabilities as quickly as possible. ParmeSan [58] uses sanitizers to instrument the program and collect interesting targets for efficient fuzzing. BEACON [59] prunes infeasible paths (such as those with unsolvable branches) and irrelevant paths (such as those that obviously cannot reach a target position) by instrumenting assert() statements into the source code. Directed fuzzing thus selectively reaches specific points of interest identified by prior knowledge.

3. Motivation

Hybrid fuzzing merges fuzzing that relies on test cases with concolic execution techniques. In practice, there exists a speed gap between the test case generation of the fuzzer and the concolic executor. Sometimes, the concolic executor is stuck in unsolvable branches, which causes significant computational overhead and slows down the overall testing process.

3.1. Slow Concolic Execution

Fuzzing produces a vast array of test cases. For instance, both Driller [1] and QSYM [2] thoroughly investigate every input from the fuzzer. However, the concolic execution process requires symbolic simulation and constraint solving, both highly demanding in computation, leading to considerably slower execution. This means the concolic executor can only analyze a minimal subset of the test cases in the fuzzer’s queue.
We evaluated QSYM [2] on a series of real-world benchmark programs. Table 1 shows that after 24 h, it processed an average of only 13.55% of the test cases available in the fuzzer’s queue. On average, 86.45% of the test cases handled by the fuzzer were never handled by QSYM [2]. This indicates a significant gap between the efficiency of the concolic executor and that of the fuzzer. Notably, for the five programs other than pngfix, the test cases processed by QSYM [2] accounted for less than 10% of the total generated by the fuzzer.
There is a notable delay between the concolic executor and the fuzzer. The concolic executor’s ability to handle test cases falls short of the fuzzer’s rate of test case generation. As a result, when the concolic executor eventually creates valuable inputs for a specific path, the fuzzer may have already produced inputs that cover that path. In such instances, the inputs generated by the concolic executor become pointless. Therefore, it is important to reduce redundant solving, that is, attempts to solve branches that could not be solved in earlier attempts (and so are unlikely to be solved now) or that cannot be solved even in theory. This way, the concolic executor can focus on solving new branches and those that “can be solved”.

3.2. Unsolvable Constraints

Our study revealed that the concolic executor often wastes time and resources trying to solve constraints for branches that are fundamentally unsolvable. Current hybrid fuzzers do not track whether constraint solving succeeds or fails. For instance, when QSYM [2] encounters a branch, it initially attempts to resolve the complete path constraint; if that fails, it then tries to tackle the last constraint. Meanwhile, Driller [1] only checks if the current branch can reach code areas that have not been covered yet, completely overlooking whether that branch has already been tested multiple times with unsuccessful results.
We divide the status of constraint solving for a branch into three distinct categories: solvable, partially solvable, and unsolvable. A branch is considered solvable when all associated path constraints are satisfiable. If not all constraints can be satisfied but the last constraint in the path can be resolved, it is classified as partially solvable. Conversely, the branch is marked as unsolvable if the last constraint cannot be solved within the current context. The last constraint typically involves a single decision point, simplifying it compared to the entire path’s constraints, making it easier to address. If this final constraint proves unsolvable, the complete path constraints are also unsolvable.
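The three categories can be illustrated with a minimal sketch. It assumes a path is represented as a list of constraints whose last element belongs to the branch itself, and it swaps in a toy brute-force check over a single input byte in place of a real SMT solver such as Z3. The names `BranchStatus`, `classify_branch`, and `brute_force_sat` are hypothetical and are not part of BSP or QSYM.

```python
from enum import Enum


class BranchStatus(Enum):
    SOLVABLE = "solvable"
    PARTIALLY_SOLVABLE = "partially solvable"
    UNSOLVABLE = "unsolvable"


def brute_force_sat(constraints):
    """Toy stand-in for an SMT solver: each constraint is a predicate over
    one input byte, and the set is satisfiable if some byte passes all."""
    return any(all(c(x) for c in constraints) for x in range(256))


def classify_branch(path_constraints, is_satisfiable=brute_force_sat):
    """Classify a branch from its path constraints (last one is the branch)."""
    # Solvable: the complete path constraint is satisfiable.
    if is_satisfiable(path_constraints):
        return BranchStatus.SOLVABLE
    # Partially solvable: only the last constraint, taken alone, is satisfiable.
    if is_satisfiable(path_constraints[-1:]):
        return BranchStatus.PARTIALLY_SOLVABLE
    # Unsolvable: even the last constraint alone cannot be satisfied.
    return BranchStatus.UNSOLVABLE
```

In this sketch the fallback from the full path to the last constraint mirrors the order of attempts described for QSYM in the preceding paragraph; a real executor would also have to treat solver timeouts, which the toy check ignores.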
Many unsolvable branches are repeatedly evaluated during the constraint-solving process. Hu et al. [46] recorded how QSYM’s [2] concolic executor handled pdftotext and found that successful solutions were found in 27% of all branches, but among the branches that remained unsolvable, only 7% were resolved after being attempted multiple times in varying contexts. These branches made up 3% of the total branches. The solver spent 44% of its total solving time on ultimately unsolvable branches, meaning it dedicated nearly half of its time repeatedly attempting to resolve these branches without producing any test cases that could improve coverage. As a result, our goal was to minimize the time allocated to unsolvable branches and instead focus that time on branches that were more likely to yield solutions.

3.3. Unsolvable and Uncovered Branches

There are many branches in the program that are challenging for both fuzzing and concolic execution to address. Typically, the code under these branches cannot be covered, and the vulnerabilities hidden under these branches cannot be triggered. These locations often have constraints that cannot be met in the current execution environment, and neither the fuzzer nor the concolic executor can generate correct test cases. After testing, we found many such cases in large programs. These unsolvable and uncovered branches are generally tied to the execution environment; the obstacle is not the constraint solver’s solving capability. For example, finding an input that makes the return value of the malloc function equal to NULL is usually impossible because the malloc function only returns NULL when there is insufficient memory.
To further improve the program’s coverage and discover deeper vulnerabilities, we can assume that these conditions can be met. We just need to locate and remove the judgment conditions related to the execution environment in these statements so that the code protected by these conditional statements can be exposed to the concolic executor and the fuzzer.

4. Design

This section highlights the fundamental components of BSP. An overview of the structure of BSP is shown in Figure 1.
First, BSP identifies unsolvable constraints and unsolvable and uncovered branches in the original program (described in Section 3.2 and Section 3.3). Then, BSP generates program variants by splitting the detected unsolvable branches (described in Section 4.1). Next, the fuzzer takes the generated variants as fuzzing targets. When a crash is encountered, we run the original program with the crashing input to validate the crash (described in Section 4.2).

4.1. Branch Splitting

Traditional fuzzing methods focus on improving the input mutation strategy rather than modifying the target program. Through analysis, we discover many unsolvable branches in the program. Some branches need more resources to solve than we can afford. Others are related to the execution environment, making it difficult for an optimized hybrid fuzzer to generate test cases that satisfy the conditions. To represent the current status of each unsolvable branch, we use a pair of 16-bit integers for the first type of these branches. When a branch is considered likely unsolvable, it is assigned a mark of 65,534 in the state graph. If a test case has covered that branch, the last bit of the label is set to 1, yielding a total value of 65,535. During subsequent executions, when the executor reaches a marked branch and is about to solve the current path constraints, it first checks whether that branch has already been flagged. If the branch is already labeled as unsolvable, BSP does not attempt to solve it again; instead, the branch becomes the point at which the program is split into variants. This enables BSP to conserve time and resources by avoiding branches that are unlikely to produce valuable outcomes.
Manually modifying the program can help hybrid fuzzing break through complex branches (i.e., unsolvable and uncovered). During fuzzing, binary addresses of unsolvable branches, jump targets, and corresponding indexes in the state map can be collected. These branches are then checked for coverage according to their indexes. If a branch is unsolvable and uncovered in the program, its corresponding value in the state map is 65,534.
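The marking scheme described above can be sketched as follows. This is a minimal illustration that assumes the state map is a flat array of 16-bit values indexed by branch; the function names and the map layout are hypothetical, and only the two mark values (65,534 and 65,535) come from the text.

```python
from array import array

UNSOLVABLE = 0xFFFE          # 65,534: branch flagged as likely unsolvable
UNSOLVABLE_COVERED = 0xFFFF  # 65,535: flagged, and some test case covered it


def new_state_map(size):
    """Create a state map of unsigned 16-bit entries (assumed layout)."""
    return array("H", [0] * size)


def mark_unsolvable(state_map, idx):
    """Flag a branch as likely unsolvable without clearing a covered bit."""
    if state_map[idx] != UNSOLVABLE_COVERED:
        state_map[idx] = UNSOLVABLE


def mark_covered(state_map, idx):
    """Set the last bit of a flagged branch once a test case covers it."""
    if state_map[idx] == UNSOLVABLE:
        state_map[idx] |= 1


def should_skip_solving(state_map, idx):
    """A flagged branch is not handed to the constraint solver again."""
    return state_map[idx] in (UNSOLVABLE, UNSOLVABLE_COVERED)


def split_candidates(state_map):
    """Indexes of branches that are both unsolvable and still uncovered."""
    return [i for i, value in enumerate(state_map) if value == UNSOLVABLE]
```

Only entries still holding 65,534 are returned by `split_candidates`, matching the statement that an unsolvable and uncovered branch keeps that value in the state map.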
For the second kind of unsolvable branches, we manually analyze branches related to the system environment. If the constraints on such a branch are intractable because they depend on the execution environment, we treat them as unsolvable.
The unsolvable branches can be manually modified to generate a new target program. A new fuzzing campaign is then started on the modified program to help the hybrid fuzzer pass these constraints. The binary addresses of the collected branches are first converted into source code locations. Then, manual changes are made at the source code level, for example, to make the result of the branch condition always equal to True or False. In the modified program, the code protected by this condition is fully exposed to hybrid fuzzing. Therefore, on the new target program, the constraint no longer applies when a test case executes at that location, and the test case can easily pass through that location and further explore the code under that branch.
We developed branch-splitting automation scripts for if-else structures, as shown in Algorithm 1. The program’s source code structure is analyzed using pycparser [60] to generate an abstract syntax tree to obtain information about the program’s structure. The nodes of the syntax tree contain the if node and its children: cond, iftrue, iffalse. The cond node is the evaluation condition of if, the iftrue node is the section of code to be executed if the condition is true, and the iffalse node is the section of code to be executed if the condition is false. We add “&&0” or “||1” after the evaluation condition of the if statement. If the code in the iftrue node cannot be covered, we add “||1” after the evaluation condition to satisfy the branch constraint; if the code in the iffalse node cannot be covered, we add “&&0”. The original if-else structure is replaced by the modified if-else structure, completing the program modification.
Algorithm 1 Branch-splitting automation.
Input: instructions (sequence of program instructions)
Input: slicing_criterion (criterion based on crash point)
Output: slice (relevant instruction slice)
1. Analyze abstract syntax tree to specify cond, iftrue, iffalse.
2. Get set of conditions that is unsolvable.
for each condition in conditions do
    Add “||1” after the original condition of the if statement to generate a variant of source code.
    Add “&&0” after the original condition of the if statement to generate another variant of source code.
end for
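The loop in Algorithm 1 can be sketched in a few lines. This is a simplified illustration, not the authors' script: it swaps the pycparser abstract syntax tree for plain text matching on `if (<condition>)`, and the C fragment, `report_oom`, and the function names are hypothetical. The extra parentheses around the original condition are a precaution added here, because appending `&&0` directly to a condition such as `a || b` would bind only to `b`.

```python
# Hypothetical C fragment used as input; not taken from any benchmark program.
SOURCE = """\
char *buf = malloc(len);
if (buf == NULL) {
    report_oom();
    return -1;
}
"""


def split_branch(source, condition):
    """Return two variants of the source: one with the given if condition
    forced to true ("|| 1") and one with it forced to false ("&& 0")."""
    target = "if (" + condition + ")"
    if source.count(target) != 1:
        raise ValueError("condition must match exactly one if statement")
    forced_true = source.replace(target, "if ((" + condition + ") || 1)")
    forced_false = source.replace(target, "if ((" + condition + ") && 0)")
    return forced_true, forced_false


def generate_variants(source, unsolvable_conditions):
    """Produce two program variants for every unsolvable condition."""
    variants = []
    for condition in unsolvable_conditions:
        variants.extend(split_branch(source, condition))
    return variants
```

Each variant in this sketch changes a single condition relative to the original source. An AST-based rewrite would be more robust against formatting differences and repeated conditions than the text matching used here.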
For instance, in the code presented in Listing 1, the true branch of the if statement activates when the malloc function yields a null pointer, which indicates that the memory allocation was unsuccessful. Since this branch condition is related to the memory environment and is a nonlinear constraint, generating a test case that satisfies the condition by constraint solving is difficult. Therefore, we manually add “||1” to the branch condition of the if statement. This ensures that the actual branch condition is satisfied, and the code at lines 3–4 is executed.
Listing 1. An example of modifying an if statement in a file.

4.2. Filtering Out False Positives

Removing branch constraints from a program can introduce new errors in the recompiled version. To address this issue, we used two processes to fuzz the original and modified programs simultaneously, which allowed us to compare the results. We then filtered out false positives by re-executing each crashing input on the original program, ensuring that any reported bug was a genuine vulnerability and not one introduced by our modifications.
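The re-execution step can be sketched as below. The sketch assumes the target takes the input file path as its only argument and that a crash shows up as termination by a signal, which Python's subprocess module reports on POSIX systems as a negative return code. The function names and the 10-second timeout are hypothetical choices, not details from BSP.

```python
import subprocess


def run_target(binary, input_path, timeout=10):
    """Run the binary on one input; return its exit code, or None on timeout."""
    try:
        proc = subprocess.run(
            [binary, input_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


def crashed(returncode):
    """On POSIX, a negative return code means the process died from a signal."""
    return returncode is not None and returncode < 0


def confirm_crashes(crash_inputs, original_binary, run=run_target):
    """Keep only the crash inputs that also crash the unmodified program."""
    return [p for p in crash_inputs if crashed(run(original_binary, p))]
```

Inputs that crash a variant but exit normally (or time out) on the original program are dropped as false positives in this sketch. The `run` parameter lets the runner be replaced, for example when the target reads from standard input instead of a file.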

5. Implementation and Evaluation

We created a prototype system named BSP, leveraging AFL [17], QSYM [2], and pycparser [60]. AFL [17] is among the most popular fuzzers and is used for program fuzzing. QSYM [2] is used for concolic execution to gather and address constraints found along the execution paths.
Pycparser [60] can generate an abstract syntax tree (AST) for C code. We used pycparser [60] to generate AST for the target program and later transform the source code. To establish our approach’s validity, we conducted thorough experiments across multiple target programs. These investigations sought to answer the following key question:
  • RQ: Can the branch-splitting technique help the hybrid fuzzer cover more code? (Section 5)
Experimental setup. All experiments were run on an Ubuntu 16.04 LTS system equipped with four Intel(R) Xeon(R) Gold 5117 CPUs (each with fourteen 2.00 GHz cores) and 128 GB of RAM. We used up to four cores for each experiment.
Baseline. Hybrid fuzzers [45,47,48,49] rely on the Z3 solver [61] for concolic execution. In practice, the solving capability of Z3 is limited, so some constraints cannot be solved; existing concolic executors nevertheless attempt them repeatedly, which degrades the efficiency of hybrid fuzzing. QSYM [2] uses Z3 as its backend to solve symbolic constraints and AFL [17] as its fuzzer, and BSP is built upon QSYM [2]. We therefore evaluated the concolic execution performance of our method against QSYM [2]. In the QSYM [2] experiments, each program used two CPU cores for fuzzing and two more for concolic execution. In the BSP experiments, each program used three CPU cores in total: two dedicated to fuzzing and one used exclusively for concolic execution, which we refer to as BSP-ce.
Benchmarks. To test the effectiveness of BSP, we selected the most recent versions of several real-world programs. These programs are often featured in the literature on fuzz testing, including tcpdump, xpdf, libpng, and binutils. Table 2 provides an overview of these applications’ configuration settings and versions. Our initial seed files were obtained from AFL [17] and UNIFUZZ [62].

Branch Splitting

To illustrate the performance gains of our unsolvable-branch filtering strategy, we compared the branch-solving capabilities of QSYM [2] and BSP on the benchmark programs. Table 3 summarizes the branch-solving performance for each program over 24 h. Branches were classified into three categories: solvable, partially solvable, and unsolvable. We focused on the total number of branches resolved by the concolic executor during this timeframe and on the counts for solvable and unsolvable branches. Furthermore, we assessed the growth rate of BSP relative to QSYM [2]. BSP showed a marked increase in the total number of branches solved, 46.68% on average (Table 3). The number of branches effectively resolved by BSP also rose substantially, with an average growth rate of 22.87%.
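The per-program growth figures in Table 3 are consistent with a plain relative increase over the QSYM baseline, (BSP - QSYM) / QSYM. The short sketch below recomputes the "Total Branches" growth column from the counts in Table 3; the dictionary and function names are illustrative, and the formula is inferred from the table rather than stated in the text.

```python
from __future__ import annotations

# (QSYM, BSP) total branch counts copied from Table 3.
TOTAL_BRANCHES = {
    "pdftotext": (945, 1338),
    "pdftops": (786, 1283),
    "pdfimages": (809, 1119),
    "tcpdump": (4254, 6570),
    "readelf": (2302, 3115),
    "pngfix": (1438, 2027),
}


def growth_percent(baseline: int, improved: int) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    for program, (qsym, bsp) in TOTAL_BRANCHES.items():
        print(f"{program:<10} {growth_percent(qsym, bsp):+.2f}%")
```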
To demonstrate the effectiveness of program modification, we also tested the six target programs in Table 2 using BSP for 24 h. BSP successfully filtered out the branches that were both unsolvable by the concolic executor and uncovered by the fuzzer (typically branches related to the system environment). Table 4 shows the number of such branches for each program: the second column gives the number of branches that were neither solvable by the concolic executor nor covered by the fuzzer. Our results showed many unsolvable and uncovered branches, indicating that the programs contain branches that are difficult for both the concolic executor and the fuzzer to handle. We located the source code corresponding to these branches and modified the original program using the method described in Section 4.1. Our analysis found that most of the branches left unprocessed by the hybrid fuzzer were related to memory or data size. After splitting these branches, we fuzzed the new target programs again and successfully covered the code under these hard-to-resolve branches.
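The selection rule described above (branches that the concolic executor could not solve and the fuzzer did not cover) amounts to a set difference. The following sketch assumes branches are identified by hypothetical `file:line` strings; the identifiers and the function name `split_candidates` are illustrative and do not come from BSP.

```python
from __future__ import annotations


def split_candidates(unsolvable: set[str], covered: set[str]) -> set[str]:
    """Branches unsolvable by the concolic executor and not covered by the fuzzer."""
    return unsolvable - covered


if __name__ == "__main__":
    # Hypothetical branch identifiers, for illustration only.
    unsolvable = {"parse.c:120", "alloc.c:42", "io.c:310"}
    covered = {"parse.c:120", "main.c:15"}
    for branch in sorted(split_candidates(unsolvable, covered)):
        print(branch)
```

Branches that the fuzzer already reaches are excluded because splitting them would add program variants without adding coverage.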

6. Discussion

In our experiments, BSP covered an average of 46.68% more code than QSYM, which shows that BSP can significantly enhance code coverage by splitting the program under test at unsolvable branches. It is important to note that our method focuses on handling unsolvable branches during concolic execution and can be integrated with other hybrid fuzzers.
To evaluate the validity of our approach, we focused on a set of benchmark programs widely used in the fuzzing community. Despite this, our method did not perform optimally on specific programs. For example, BSP covered 35.32% more code than QSYM on readelf, while it covered 63.23% more code than QSYM on pdftops. The underlying reason for this shortfall is that while our approach significantly accelerated concolic execution, it did not enhance the actual constraint-solving capabilities.
BSP also has two limitations. First, our changes to the intractable constraints are made at the source code level, so BSP cannot modify these constraints when the source code of the program under test is unavailable. Second, although we designed automated scripts for if-else branch splitting, they cannot handle all conditional jumps. For example, BSP does not handle branch conditions in for and while statements.
In the future, research can be conducted on modifying the instructions in the binaries to adapt to more cases. Meanwhile, we can improve the implementation of BSP to support other kinds of conditions in the source code.

7. Conclusions

In this paper, we proposed and implemented BSP, a method to address a shortcoming of existing concolic executors. Existing tools repeatedly attempt to solve unsolvable branch conditions, which incurs a performance penalty. To address this issue, for each branch that cannot be solved by concolic execution, BSP splits the program into two programs at that branch, which eliminates the unsolvable condition. Our experiments showed that this allowed BSP to cover an average of 46.68% more code than existing hybrid fuzzing techniques. Our research shows that handling unsolvable branches during concolic execution by splitting helps to explore more paths, and that BSP can significantly mitigate the unsolvable-branch challenge in hybrid fuzzing. Our work is orthogonal to traditional hybrid fuzzing methods and can be integrated into existing hybrid fuzzers to improve their efficiency. In the future, research can be conducted on modifying the instructions in binaries to adapt to more cases. Meanwhile, we can improve the implementation of BSP to support other kinds of conditions in source code.

Author Contributions

Conceptualization, C.Q. and L.P.; methodology, C.Q.; software, L.P., X.K. and J.Q.; validation, Q.Z. and J.Z.; formal analysis, C.Q.; investigation, L.P.; resources, X.K.; data curation, J.Q.; writing—original draft preparation, C.Q., L.P.; writing—review and editing, X.K. and J.Q.; visualization, Y.Z.; supervision, C.Q.; project administration, C.Q.; funding acquisition, C.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article material. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Stephens, N.; Grosen, J.; Salls, C.; Dutcher, A.; Wang, R.; Corbetta, J.; Shoshitaishvili, Y.; Kruegel, C.; Vigna, G. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In Proceedings of the 2016 Network and Distributed System Security Symposium, San Diego, CA, USA, 21–24 February 2016.
  2. Yun, I.; Lee, S.; Xu, M.; Jang, Y.; Kim, T. QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 745–761.
  3. Zhao, L.; Duan, Y.; Yin, H.; Xuan, J. Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing. In Proceedings of the 2019 Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2019.
  4. Majumdar, R.; Sen, K. Hybrid Concolic Testing. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07), Minneapolis, MN, USA, 20–26 May 2007; pp. 416–426.
  5. Jiang, L.; Yuan, H.; Wu, M.; Zhang, L.; Zhang, Y. Evaluating and improving hybrid fuzzing. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; IEEE: New York, NY, USA, 2023; pp. 410–422.
  6. Wang, Z.; Ming, J.; Jia, C.; Gao, D. Linear Obfuscation to Combat Symbolic Execution. In Proceedings of the 16th European Symposium on Research in Computer Security (ESORICS 2011), Leuven, Belgium, 12–14 September 2011; pp. 210–226.
  7. Chen, Y.; Ahmadi, M.; Farkhani, R.M.; Wang, B.; Lu, L. MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2020, San Sebastian, Spain, 14–15 October 2020; Egele, M., Bilge, L., Eds.; USENIX Association: Berkeley, CA, USA, 2020; pp. 77–92.
  8. Kargén, U.; Shahmehri, N. Turning Programs against Each Other: High Coverage Fuzz-Testing Using Binary-Code Mutation and Dynamic Slicing. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, Bergamo, Italy, 30 August–4 September 2015; pp. 782–792.
  9. Shi, J.; Wang, Z.; Feng, Z.; Lan, Y.; Qin, S.; You, W.; Zou, W.; Payer, M.; Zhang, C. AIFORE: Smart Fuzzing Based on Automatic Input Format Reverse Engineering. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 4967–4984.
  10. Fioraldi, A.; D’Elia, D.C.; Coppa, E. WEIZZ: Automatic grey-box fuzzing for structured binary formats. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA, 17–21 July 2020; pp. 1–13.
  11. Cao, S.; He, B.; Sun, X.; Ouyang, Y.; Zhang, C.; Wu, X.; Su, T.; Bo, L.; Li, B.; Ma, C.; et al. Oddfuzz: Discovering java deserialization vulnerabilities via structure-aware directed greybox fuzzing. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–25 May 2023; IEEE: New York, NY, USA, 2023; pp. 2726–2743.
  12. Deng, P.; Yang, Z.; Zhang, L.; Yang, G.; Hong, W.; Zhang, Y.; Yang, M. NestFuzz: Enhancing Fuzzing with Comprehensive Understanding of Input Processing Logic. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023; pp. 1272–1286.
  13. Gan, S.; Zhang, C.; Qin, X.; Tu, X.; Li, K.; Pei, Z.; Chen, Z. CollAFL: Path Sensitive Fuzzing. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–23 May 2018; pp. 679–696.
  14. Chen, P.; Chen, H. Angora: Efficient Fuzzing by Principled Search. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–23 May 2018; pp. 711–725.
  15. Nagy, S.; Nguyen-Tuong, A.; Hiser, J.D.; Davidson, J.W.; Hicks, M. Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, 15–19 November 2021; pp. 351–365.
  16. Liu, Y.; Meng, W. DSFuzz: Detecting Deep State Bugs with Dependent State Exploration. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023; pp. 1242–1256.
  17. Zalewski, M. American Fuzzy Lop. 2015. Available online: https://lcamtuf.coredump.cx/afl/ (accessed on 15 August 2024).
  18. Luk, C.K.; Cohn, R.; Muth, R.; Patil, H.; Klauser, A.; Lowney, G.; Wallace, S.; Reddi, V.J.; Hazelwood, K. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. SIGPLAN Not. 2005, 40, 190–200.
  19. Google. Honggfuzz. 2010. Available online: https://honggfuzz.dev/ (accessed on 15 August 2024).
  20. Schumilo, S.; Aschermann, C.; Abbasi, A.; Wörner, S.; Holz, T. Nyx: Greybox Hypervisor Fuzzing Using Fast Snapshots and Affine Types. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2597–2614.
  21. Google. DynamoRIO. 2014. Available online: https://github.com/DynamoRIO/dynamorio (accessed on 15 August 2024).
  22. Fratric, I. WinAFL. 2016. Available online: https://github.com/googleprojectzero/winafl (accessed on 15 August 2024).
  23. Duck, G.J.; Gao, X.; Roychoudhury, A. Binary Rewriting without Control Flow Recovery. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, London, UK, 15–20 June 2020; pp. 151–163.
  24. Dinesh, S.; Burow, N.; Xu, D.; Payer, M. RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1497–1511.
  25. Zhou, C.; Wang, M.; Liang, J.; Liu, Z.; Jiang, Y. Zeror: Speed Up Fuzzing with Coverage-sensitive Tracing and Scheduling. In Proceedings of the 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, 21 September 2020; pp. 858–870.
  26. Nagy, S.; Hicks, M. Full-Speed Fuzzing: Reducing Fuzzing Overhead through Coverage-Guided Tracing. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 23 May 2019; pp. 787–802.
  27. Wang, Y.; Jia, X.; Liu, Y.; Zeng, K.; Bao, T.; Wu, D.; Su, P. Not All Coverage Measurements Are Equal: Fuzzing by Coverage Accounting for Input Prioritization. In Proceedings of the 2020 Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020.
  28. Lemieux, C.; Sen, K. FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France, 3–7 September 2018; pp. 475–485.
  29. Yan, S.; Wu, C.; Li, H.; Shao, W.; Jia, C. PathAFL: Path-Coverage Assisted Fuzzing. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, Taipei, Taiwan, 5–9 October 2020.
  30. Böhme, M.; Pham, V.T.; Roychoudhury, A. Coverage-Based Greybox Fuzzing as Markov Chain. IEEE Trans. Softw. Eng. 2019, 45, 489–506.
  31. Yue, T.; Wang, P.; Tang, Y.; Wang, E.; Yu, B.; Lu, K.; Zhou, X. EcoFuzz: Adaptive Energy-Saving Greybox Fuzzing as a Variant of the Adversarial Multi-Armed Bandit. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Anaheim, CA, USA, 9–11 August 2020; USENIX Association: Berkeley, CA, USA, 2020; pp. 2307–2324.
  32. Shoshitaishvili, Y.; Wang, R.; Salls, C.; Stephens, N.; Polino, M.; Dutcher, A.; Grosen, J.; Feng, S.; Hauser, C.; Kruegel, C.; et al. SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 138–157.
  33. Cadar, C.; Dunbar, D.; Engler, D. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, San Diego, CA, USA, 8–10 December 2008; OSDI’08. pp. 209–224.
  34. Chipounov, V.; Kuznetsov, V.; Candea, G. S2E: A Platform for in-Vivo Multi-Path Analysis of Software Systems. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA, 5–11 May 2011; ASPLOS XVI. pp. 265–278.
  35. Ramos, D.A.; Engler, D. Under-Constrained Symbolic Execution: Correctness Checking for Real Code. In Proceedings of the 24th USENIX Conference on Security Symposium, Washington, DC, USA, 12–14 August 2015; SEC’15. pp. 49–64.
  36. Neugschwandtner, M.; Milani Comparetti, P.; Haller, I.; Bos, H. The BORG: Nanoprobing Binaries for Buffer Overreads. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, San Antonio, TX, USA, 2–4 March 2015; pp. 87–97.
  37. Cha, S.K.; Woo, M.; Brumley, D. Program-Adaptive Mutational Fuzzing. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 17–21 May 2015; pp. 725–741.
  38. Cui, H.; Hu, G.; Wu, J.; Yang, J. Verifying Systems Rules Using Rule-Directed Symbolic Execution. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’13), Houston, TX, USA, 16–20 March 2013; p. 329.
  39. Chen, P.; Liu, J.; Chen, H. Matryoshka: Fuzzing Deeply Nested Branches. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 11–15 November 2019; CCS ’19. pp. 499–513.
  40. Avgerinos, T.; Rebert, A.; Cha, S.K.; Brumley, D. Enhancing Symbolic Execution with Veritesting. Commun. ACM 2016, 59, 93–100.
  41. Choi, J.; Jang, J.; Han, C.; Cha, S.K. Grey-Box Concolic Testing on Binary Code. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada, 25 May 2019; pp. 736–747.
  42. Haller, I.; Slowinska, A.; Neugschwandtner, M.; Bos, H. Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations. In Proceedings of the 22nd USENIX Conference on Security, Washington, DC, USA, 14–16 August 2013; SEC’13. pp. 49–64.
  43. Huang, H.; Yao, P.; Wu, R.; Shi, Q.; Zhang, C. Pangolin: Incremental Hybrid Fuzzing with Polyhedral Path Abstraction. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1613–1627.
  44. Zhao, Y.; Gao, L.; Wei, Q.; Zhao, L. Towards Tightly-coupled Hybrid Fuzzing via Excavating Input Specifications. IEEE Trans. Dependable Secur. Comput. 2024, 21, 4801–4814.
  45. Yin, J.; Li, M.; Li, Y.; Yu, Y.; Lin, B.; Zou, Y.; Liu, Y.; Huo, W.; Xue, J. RSFuzzer: Discovering Deep SMI Handler Vulnerabilities in UEFI Firmware with Hybrid Fuzzing. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–25 May 2023; IEEE: New York, NY, USA, 2023; pp. 2155–2169.
  46. Hu, Q.; Chen, W.; Wang, Z.; Lu, S.; Nie, Y.; Li, X.; Kuang, X. BSFuzz: Branch-State Guided Hybrid Fuzzing. Electronics 2023, 12, 4033.
  47. Mi, X.; Wang, B.; Tang, Y.; Wang, P.; Yu, B. SHFuzz: Selective hybrid fuzzing with branch scheduling based on binary instrumentation. Appl. Sci. 2020, 10, 5449.
  48. Lin, P.; Wang, P.; Zhou, X.; Xie, W.; Lu, K.; Zhang, G. HyperGo: Probability-based directed hybrid fuzzing. Comput. Secur. 2024, 142, 103851.
  49. Fang, W.; Yang, X.; Guo, W.; Sun, H.; Li, Q.; Lin, Z. PBFuzz: Potential-aware Branch-oriented Hybrid Fuzzing. In Proceedings of the 2024 4th International Symposium on Computer Technology and Information Science (ISCTIS), Xi’an, China, 12–14 July 2024; IEEE: New York, NY, USA, 2024; pp. 608–614.
  50. Nguyen, M.D.; Bardin, S.; Bonichon, R.; Groz, R.; Lemerre, M. Binary-Level Directed Fuzzing for Use-after-Free Vulnerabilities. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2020, San Sebastian, Spain, 14–15 October 2020; Egele, M., Bilge, L., Eds.; USENIX Association: Berkeley, CA, USA, 2020; pp. 47–62.
  51. Gao, X.; Duck, G.J.; Roychoudhury, A. Scalable Fuzzing of Program Binaries with E9AFL. In Proceedings of the 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, 15–19 November 2021; pp. 1247–1251.
  52. She, D.; Shah, A.; Jana, S. Effective Seed Scheduling for Fuzzing with Graph Centrality Analysis. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 2194–2211.
  53. Lyu, C.; Ji, S.; Zhang, C.; Li, Y.; Lee, W.H.; Song, Y.; Beyah, R. MOPT: Optimized Mutation Scheduling for Fuzzers. In Proceedings of the 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, 14–16 August 2019; Heninger, N., Traynor, P., Eds.; USENIX Association: Berkeley, CA, USA, 2019; pp. 1949–1966.
  54. Pham, V.T.; Böhme, M.; Santosa, A.E.; Căciulescu, A.R.; Roychoudhury, A. Smart Greybox Fuzzing. IEEE Trans. Softw. Eng. 2021, 47, 1980–1997.
  55. Wang, T.; Wei, T.; Gu, G.; Zou, W. TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 497–512.
  56. Peng, H.; Shoshitaishvili, Y.; Payer, M. T-Fuzz: Fuzzing by Program Transformation. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018; pp. 697–710.
  57. Aschermann, C.; Schumilo, S.; Abbasi, A.; Holz, T. Ijon: Exploring Deep State Spaces via Fuzzing. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1597–1612.
  58. Österlund, S.; Razavi, K.; Bos, H.; Giuffrida, C. ParmeSan: Sanitizer-guided Greybox Fuzzing. In Proceedings of the 29th USENIX Security Symposium, USENIX Security 2020, Boston, MA, USA, 12–14 August 2020; Capkun, S., Roesner, F., Eds.; USENIX Association: Berkeley, CA, USA, 2020; pp. 2289–2306.
  59. Huang, H.; Guo, Y.; Shi, Q.; Yao, P.; Wu, R.; Zhang, C. BEACON: Directed Grey-Box Fuzzing with Provable Path Pruning. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 36–50.
  60. Bendersky, E. Pycparser. 2010. Available online: https://github.com/eliben/pycparser (accessed on 15 August 2024).
  61. Microsoft. Z3Prover. Available online: https://github.com/Z3Prover/z3 (accessed on 15 August 2024).
  62. Li, Y.; Ji, S.; Chen, Y.; Liang, S.; Lee, W.H.; Chen, Y.; Lyu, C.; Wu, C.; Beyah, R.; Cheng, P.; et al. UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2777–2794.
Figure 1. Overview of BSP.
Table 1. Comparison between the number of test cases executed by QSYM [2] and generated by AFL [17] in 24 h.
| Program | AFL [17] | QSYM [2] | Ratio |
| --- | --- | --- | --- |
| readelf | 13,132 | 1177 | 8.96% |
| tcpdump | 21,751 | 1165 | 5.36% |
| pdfimages | 15,521 | 880 | 5.67% |
| pdftops | 16,875 | 865 | 5.13% |
| pdftotext | 18,089 | 888 | 4.91% |
| pngfix | 3194 | 1912 | 59.86% |
Table 2. Configuration information and version of the benchmark programs.
| Program | Version | Input Format |
| --- | --- | --- |
| readelf -a @@ | binutils-2.40 | elf |
| tcpdump -e -vv -nr @@ | tcpdump-4.99 | pcap |
| pdftotext @@ /dev/null | xpdf-4.04 | pdf |
| pdfimages @@ /dev/null | xpdf-4.04 | pdf |
| pdftops @@ /dev/null | xpdf-4.04 | pdf |
| pngfix @@ | libpng-1.6.40 | png |
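Table 3. Comparison of Fuzzers on Different Programs.
| Program | Fuzzer | Total Branches (Num) | Total Branches (Growth) | Unsolvable Branches (Num) | Unsolvable Branches (Growth) |
| --- | --- | --- | --- | --- | --- |
| pdftotext | QSYM | 945 | - | 480 | - |
| pdftotext | BSP | 1338 | +41.59% | 712 | +48.33% |
| pdftops | QSYM | 786 | - | 348 | - |
| pdftops | BSP | 1283 | +63.23% | 683 | +96.26% |
| pdfimages | QSYM | 809 | - | 380 | - |
| pdfimages | BSP | 1119 | +38.32% | 584 | +53.68% |
| tcpdump | QSYM | 4254 | - | 1016 | - |
| tcpdump | BSP | 6570 | +54.44% | 1632 | +60.63% |
| readelf | QSYM | 2302 | - | 299 | - |
| readelf | BSP | 3115 | +35.32% | 545 | +82.27% |
| pngfix | QSYM | 1438 | - | 547 | - |
| pngfix | BSP | 2027 | +40.96% | 595 | +8.78% |
| Average | QSYM | 1756 | - | 512 | - |
| Average | BSP | 2575 | +46.68% | 792 | +54.76% |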
Table 4. Unsolvable and uncovered branches per program.
| Program | Unsolvable and Uncovered |
| --- | --- |
| pdftotext | 148 |
| pdftops | 90 |
| pdfimages | 145 |
| tcpdump | 395 |
| readelf | 218 |
| pngfix | 237 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qian, C.; Pang, L.; Kuang, X.; Qin, J.; Zang, Y.; Zhao, Q.; Zhang, J. BSP: Branch Splitting for Unsolvable Path Hybrid Fuzzing. Electronics 2024, 13, 4935. https://doi.org/10.3390/electronics13244935
