TPH-Fuzz: A Two-Phase Hybrid Fuzzing Framework for Smart Contract Vulnerability Detection

Shi, Fanglei; Yang, Jinsheng; Guo, Zhaohui

doi:10.3390/electronics14071465


by Fanglei Shi, Jinsheng Yang and Zhaohui Guo *

School of Microelectronics, Tianjin University, Tianjin 300072, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(7), 1465; https://doi.org/10.3390/electronics14071465

Submission received: 6 March 2025 / Revised: 2 April 2025 / Accepted: 3 April 2025 / Published: 5 April 2025

(This article belongs to the Section Computer Science & Engineering)


Abstract:

Blockchain technology is revolutionizing various industries through decentralized architecture and secure transaction mechanisms, yet its core application, smart contracts, faces increasingly sophisticated security threats. Recognizing the critical need for enhanced protection in this emerging domain, this paper introduces TPH-Fuzz, a two-phase hybrid fuzzing framework designed to overcome current limitations in vulnerability detection. TPH-Fuzz combines global exploration with local vulnerability targeting. It uses dynamic symbolic execution for semantics-aware path analysis and data-dependency-based state modeling to generate effective transaction sequences. Together, these methods significantly improve both path exploration and vulnerability detection precision. Experiments on a coverage dataset of 9309 contracts show 85% branch coverage on complex contracts, outperforming conventional methods; tests on a vulnerability dataset of 1086 labeled contracts show a detection precision of 89.24% across eight vulnerability categories. These results underscore the framework’s potential to transform security auditing practices in the blockchain industry, paving the way for more reliable smart contract development and deployment.

Keywords:

smart contracts; symbolic execution; hybrid fuzzing; transaction sequences; vulnerability detection

1. Introduction

The concept of smart contracts was first proposed by Nick Szabo in 1996 [1] as a means of automatically executing contractual provisions, with the aim of reducing trust costs and minimizing the need for human intervention. The emergence of Bitcoin [2] in 2008 introduced blockchain technology, providing a transparent, decentralized, and immutable environment. However, Bitcoin’s scripting language is not Turing-complete: it cannot perform general-purpose computation and supports only a limited set of predefined operations, which restricts its support for complex business logic. This limitation was overcome with the advent of Ethereum in 2015 [3], whose Ethereum Virtual Machine (EVM) allows developers to write smart contracts in Turing-complete languages such as Solidity [4]. This catalyzed the expansion of blockchain technology from cryptocurrency transactions to a wide range of applications, including decentralized finance (DeFi), supply chain management, and beyond [5,6,7].

As of December 2024, more than 70 million smart contracts have been deployed on the Ethereum network, collectively managing assets valued at several hundred billion dollars [8,9]. This widespread adoption highlights the transformative role of smart contracts in enabling complex, feature-rich applications on the blockchain. However, this increased functionality also brings new security challenges. Even smart contracts designed with security in mind remain susceptible to vulnerabilities, particularly those with complex logic or those deployed in unpredictable environments. As illustrated in Table 1, a series of high-profile security incidents has occurred in recent years. These incidents have not only caused significant economic losses but also shaken trust in the security of blockchain systems. For instance, the 2016 DAO incident exploited a critical reentrancy vulnerability, causing losses of more than 60 million dollars and ultimately leading to a hard fork of the Ethereum network [10]. This incident catalyzed extensive research in smart contract security, and numerous researchers have since proposed methods for detecting smart contract vulnerabilities to enhance the reliability of blockchain applications.

Existing smart contract vulnerability detection (SCVD) techniques fall into four categories: formal verification, symbolic execution, machine learning, and fuzz testing. Formal verification methods [18,19,20,21,22] use rigorous mathematical models to determine whether contract behaviors adhere to predefined security specifications, but their high modeling complexity prevents them from handling SCVD efficiently [23]. Symbolic execution-based SCVD treats inputs and states as symbolic values and systematically explores execution paths to detect potential vulnerabilities [24,25,26,27,28]. However, the number of paths grows exponentially with code complexity, which reduces its efficiency on complex contracts [29]. Machine learning techniques [30,31,32] apply pattern recognition and anomaly detection to identify latent security vulnerabilities using models trained on large datasets. Yet these methods depend heavily on the quality and quantity of annotated data, and they are often less interpretable than traditional approaches [33].

Fuzz testing (fuzzing) is a dynamic analysis technique that automatically generates random or semi-random inputs to trigger unexpected behaviors. In SCVD, diverse inputs are generated from the contract interface, execution is monitored for anomalies, and outputs are analyzed to reveal latent issues. ContractFuzzer [34] executes random test cases and monitors contract behavior against preset security rules. However, it tests functions in isolation and overlooks the importance of transaction sequences. sFuzz [35] significantly improves code coverage by combining AFL-style coverage guidance with a multi-objective search strategy, yet it overlooks the sparsity of vulnerable code. ConFuzzius [36] recognizes the importance of transaction sequences, but relying solely on the crossover operation of genetic algorithms is insufficient for constructing effective transaction sequences. Although fuzzing is a promising solution for SCVD, it still suffers from three major drawbacks:

Insufficient guidance: An overreliance on code coverage metrics neglects the sparse distribution of vulnerable code, leading to inefficient allocation of testing resources.
Ineffective path exploration: Random inputs are inadequate for triggering deeply nested logical paths that require specific conditions.
State space blind spots: The state of a smart contract depends on the history of function calls, and traditional fuzzing does not model these state dependencies, leaving parts of the state space unexplored.

To address these challenges, and inspired by the design philosophy of hybrid fuzzing [37] and by data dependency analysis methods [36,38], this paper introduces TPH-Fuzz, a two-phase hybrid fuzzing framework. The contributions of this work are as follows:

We propose a “global coverage–local targeting” two-phase framework. In the first phase, a breadth-first, coverage-guided approach quickly identifies potential vulnerability regions. In the second phase, a vulnerability-directed mutation strategy generates high-value test cases, mitigating the uneven allocation of testing resources (addressing Drawback 1).
By integrating dynamic symbolic execution, the framework uses constraint solving to generate inputs that satisfy complex path conditions, overcoming the inherent randomness of traditional fuzzing (addressing Drawback 2).
Using static data dependencies extracted from the abstract syntax tree (AST) and dynamic data dependencies observed at runtime, our approach generates function call sequences that reach specified contract states, achieving effective coverage of the state space (addressing Drawback 3).
We evaluate TPH-Fuzz on two datasets: 9309 contracts for coverage analysis and 1086 labeled contracts for vulnerability detection. Branch coverage on complex contracts reaches as high as 85%, and detection precision reaches 89.24%, demonstrating significant improvements in both exploration depth and detection accuracy.

The remainder of this paper is organized as follows: Section 2 reviews the background of smart contracts and the fundamentals of fuzzing techniques; Section 3 details the design and implementation of the TPH-Fuzz architecture; Section 4 validates the framework’s effectiveness through comparative experiments; and Section 5 concludes the paper and outlines potential directions for future research.

2. Background

This section introduces the background of smart contracts and the fundamentals of fuzzing techniques. Ethereum’s basic architecture and the EVM model are first introduced in Section 2.1. Then, smart contract vulnerabilities are discussed in Section 2.2. Finally, the fundamental principles of fuzzing are presented in Section 2.3.

2.1. Ethereum and Smart Contract

Ethereum’s core architecture uses distributed ledger technology to ensure data consistency and transparency across its decentralized network. At the heart of this system is the consensus mechanism: Ethereum originally used Proof of Work (PoW) and transitioned to Proof of Stake (PoS) in September 2022. In PoW, miners solve computationally intensive puzzles to propose new blocks, so any attempt to alter the blockchain requires a substantial amount of computing power. PoS, by contrast, selects validators based on the amount of cryptocurrency they stake as collateral. It significantly reduces energy consumption compared to PoW and aligns validators’ economic interests with network security, since malicious behavior can lead to the forfeiture of staked assets [3].

A key innovation of Ethereum is the smart contract: a self-executing program stored on the blockchain. Smart contracts automatically execute predefined logic when specific conditions are met, eliminating the need for third-party intermediaries and enabling secure interactions in trustless environments [39]. In Ethereum, accounts are the fundamental units that participate in transactions and interactions. Ethereum accounts fall into two categories: Externally Owned Accounts (EOAs) and Contract Accounts (CAs). EOAs are ordinary accounts controlled by private keys and are used to send transactions. CAs are controlled by their contract code; they are created by an EOA or by another CA, and their code executes only when called by an EOA or another CA. The bytecode of a smart contract is stored as part of the Ethereum world state and cannot be modified after deployment. Consequently, fixing a discovered vulnerability requires redeploying the contract, which is costly, and any losses already incurred cannot be recovered.

Participants interact with the network through transactions. Each transaction is initiated and signed by an EOA (the sender) and represents an operation such as an asset transfer or a smart contract invocation. Transactions not only transfer Ether between users but also trigger the execution of code stored in CAs. The EVM is the component that executes smart contracts; it can be thought of as a distributed computer that runs code on the blockchain, with all nodes in the network executing the same operations in the same order to maintain a consistent global state.

For example, consider a user, Alice, who wants to transfer 10 tokens to Bob using an ERC-20 token contract. When Alice issues the transaction to the Ethereum network, it triggers the smart contract’s transfer function. The EVM processes this transaction in the following way:

Transaction Processing: The EVM loads the bytecode of the ERC-20 contract, which contains the logic for transferring tokens.
Balance Check: The transfer function first checks whether Alice’s account has a sufficient balance of tokens. This is done by accessing the state stored in the EVM’s state tree, where Alice’s balance is recorded.
State Update: Once the balance check passes, the EVM updates the global state by deducting 10 tokens from Alice’s balance and adding them to Bob’s balance.
Event Emission: The EVM also emits a Transfer event, which is logged and can be monitored by dApps or other users.
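The last three steps above can be mirrored in a small Python model. This is a minimal sketch, not EVM code: the `Erc20State` class, its dictionary-based balance table, the `events` list, and the `Revert` exception are illustrative assumptions standing in for contract storage, the event log, and the EVM's revert behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Erc20State:
    # Stands in for the contract's balance mapping in persistent storage.
    balances: Dict[str, int] = field(default_factory=dict)
    # Stands in for the EVM event log: (event name, from, to, amount).
    events: List[Tuple[str, str, str, int]] = field(default_factory=list)


class Revert(Exception):
    """Raised when a check fails; the EVM would discard all state changes of the call."""


def transfer(state: Erc20State, sender: str, to: str, amount: int) -> bool:
    # Balance check: read the sender's balance from (simulated) storage.
    if state.balances.get(sender, 0) < amount:
        raise Revert("insufficient balance")
    # State update: debit the sender, then credit the recipient.
    state.balances[sender] -= amount
    state.balances[to] = state.balances.get(to, 0) + amount
    # Event emission: record a Transfer entry that off-chain observers could read.
    state.events.append(("Transfer", sender, to, amount))
    return True
```

For example, starting from `Erc20State(balances={"alice": 25})`, calling `transfer(state, "alice", "bob", 10)` leaves Alice with 15 tokens and Bob with 10, and appends one `Transfer` event; because the check precedes every write, a failing transfer leaves the state untouched.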

The Ethereum blockchain can be conceptualized as a “transaction-based state machine”, starting from an initial genesis state and evolving through each executed transaction to reach the current state (shown in Figure 1). The global state of the EVM is represented by a shared state tree, a structure that leverages a modified Merkle Patricia Tree (MPT) for data storage. The EVM operates similarly to a mathematical function; given an initial state $S$ and a transaction $T$, it deterministically generates an output state $S'$, as shown in Equation (1). This deterministic nature ensures that transaction results achieve consensus across all validating nodes, enabling consistent network-wide state synchronization despite the nodes’ diverse geographic locations and hardware environments.

$$F(S, T) = S' \qquad (1)$$
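Equation (1) can be read as a pure function folded over the transaction history. The sketch below assumes a toy world state (a mapping from account name to balance) and value-transfer-only transactions; real Ethereum state also includes nonces, code, and storage, and the `apply_tx` and `Tx` names are illustrative.

```python
from functools import reduce
from typing import Dict, NamedTuple


class Tx(NamedTuple):
    sender: str
    to: str
    value: int


def apply_tx(state: Dict[str, int], tx: Tx) -> Dict[str, int]:
    """F(S, T) = S': builds and returns a new state; the input state is never mutated."""
    if state.get(tx.sender, 0) < tx.value:
        return state  # invalid transfer: state left unchanged (a simplification)
    new = dict(state)
    new[tx.sender] = new.get(tx.sender, 0) - tx.value
    new[tx.to] = new.get(tx.to, 0) + tx.value
    return new


genesis = {"alice": 100}
history = [Tx("alice", "bob", 30), Tx("bob", "carol", 10)]

# Any node that folds the same history over the same genesis state reaches the same result.
current = reduce(apply_tx, history, genesis)
```

Here `current` is `{"alice": 70, "bob": 20, "carol": 10}`, while `genesis` is still `{"alice": 100}`; determinism is what lets independent nodes agree on `current` without coordinating.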

To bound the resources consumed by smart contracts, the EVM implements a gas mechanism. Each execution of a smart contract incurs a gas cost, which prevents malicious actors from exhausting network resources with infinite loops and also compensates the miners (validators, under PoS) who process transactions and maintain the network. Figure 2 illustrates the EVM execution model and the interaction among its critical components. Smart contracts are deployed as EVM bytecode, stored in an immutable virtual ROM. The program counter sequentially reads the bytecode instructions, which are then parsed and executed by the operational module. During execution, instructions manipulate the stack (e.g., via push or pop operations) and access memory and persistent storage: memory holds temporary data during execution, whereas storage holds long-term contract data. Each instruction consumes a specific amount of gas, and the gas limit supplied when the transaction is initiated sets the maximum allowable consumption. The remaining gas is updated as each instruction executes; if it is fully depleted, the transaction fails with an out-of-gas error and its state changes are reverted, while the gas already consumed is not refunded.
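The interplay of the program counter, stack, storage, and gas counter can be illustrated with a toy interpreter. This is a sketch under stated assumptions: it supports only a handful of opcodes, represents instructions as tuples rather than raw bytecode, uses the base gas costs PUSH = 3, ADD = 3, MUL = 5, and POP = 2 for illustration, and charges a flat hypothetical 20,000 gas per storage write (real SSTORE pricing depends on the slot's prior value).

```python
from typing import Dict, List, Tuple


class OutOfGas(Exception):
    pass


# Illustrative gas schedule; the SSTORE entry is a flat, hypothetical price.
GAS = {"PUSH": 3, "ADD": 3, "MUL": 5, "POP": 2, "SSTORE": 20_000}
UINT256 = 2 ** 256


def run(program: List[Tuple], gas_limit: int, storage: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Executes `program` and returns (new storage, gas used).

    Writes go to a scratch copy, so on OutOfGas the caller's `storage` is untouched,
    mimicking a revert. (In the real EVM an out-of-gas failure also consumes all gas.)
    """
    stack: List[int] = []
    pending = dict(storage)
    gas_left = gas_limit
    for pc, (op, *arg) in enumerate(program):   # pc plays the role of the program counter
        cost = GAS[op]
        if cost > gas_left:
            raise OutOfGas(f"at pc={pc}: need {cost}, have {gas_left}")
        gas_left -= cost
        if op == "PUSH":
            stack.append(arg[0] % UINT256)
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % UINT256)
        elif op == "MUL":
            stack.append((stack.pop() * stack.pop()) % UINT256)
        elif op == "POP":
            stack.pop()
        elif op == "SSTORE":
            key, value = stack.pop(), stack.pop()   # EVM order: key on top, value below it
            pending[key] = value
    return pending, gas_limit - gas_left
```

Running `[("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 0), ("SSTORE",)]` with a 30,000 gas limit stores 5 in slot 0 and uses 20,012 gas; with a 1,000 gas limit the write at pc = 4 cannot be paid for, `OutOfGas` is raised, and the original storage is unchanged.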

The EVM is vital for decentralized applications, yet it faces challenges such as limited scalability and high transaction costs. As network activity increases, its sequential execution model and consensus protocols constrain transaction throughput, and the gas mechanism can lead to prohibitive fees during congestion, which penalizes both small transactions and computationally complex smart contracts.

2.2. Vulnerability Landscape in Smart Contracts

Smart contract security vulnerabilities primarily stem from three interconnected domains: Solidity programming language semantics, EVM architecture, and blockchain infrastructure characteristics [40]. At the Solidity layer, developers’ incomplete understanding of language features and Ethereum mechanisms frequently leads to logic errors. Improper coding practices, such as misusing assert statements or neglecting integer overflows/underflows, often introduce vulnerabilities. At the EVM layer, the gas mechanism charges a fee for every operation, which becomes problematic when complex control flow leads to excessive gas consumption and transactions fail with out-of-gas errors. Cross-contract interactions further compound risks, as adversaries may exploit unchecked return values of external calls to manipulate dependent contracts. At the blockchain layer, immutable deployment amplifies the consequences: unresolved vulnerabilities may permanently lock funds or disable contracts. Oracle manipulation represents another attack surface, where corrupted data feeds subvert contract execution.
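The overflow/underflow problem comes from Solidity's fixed-width integers: before Solidity 0.8.0, uint256 arithmetic silently wrapped modulo 2^256, whereas from 0.8.0 onward overflow reverts by default unless the code is placed in an `unchecked` block. The Python sketch below emulates both behaviors; the function names and the `Panic` exception are illustrative.

```python
UINT256_MAX = 2 ** 256 - 1


class Panic(Exception):
    """Stands in for the revert that Solidity >= 0.8.0 raises on arithmetic overflow."""


def add_wrapping(a: int, b: int) -> int:
    # Pre-0.8.0 semantics (or inside `unchecked`): the result is taken modulo 2**256.
    return (a + b) & UINT256_MAX


def sub_wrapping(a: int, b: int) -> int:
    # Underflow under wrapping semantics: 0 - 1 becomes the largest uint256 value.
    return (a - b) & UINT256_MAX


def add_checked(a: int, b: int) -> int:
    # Solidity >= 0.8.0 default semantics: overflow aborts instead of wrapping.
    result = a + b
    if result > UINT256_MAX:
        raise Panic("arithmetic overflow")
    return result
```

Under wrapping semantics, a token balance of 0 minus 1 becomes 2^256 - 1, which is why an unchecked subtraction in a withdrawal path can hand an attacker an enormous balance.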

While technical vulnerabilities pose significant risks, the smart contract ecosystem also benefits from a robust support network. Active developer communities and open-source initiatives have played a pivotal role in mitigating these risks. For instance, projects like Mythril [28] and Slither [41] are continuously improved through collaborative development and peer review, enhancing the precision of vulnerability detection. Similarly, OpenZeppelin [42] offers a suite of audited smart contract libraries along with comprehensive security guidelines, which many developers rely on to fortify their contracts. Additionally, platforms such as Ethereum Stack Exchange [43] and Reddit’s r/ethereum [44] facilitate the rapid exchange of security insights and best practices among developers, thereby accelerating both the identification and remediation of potential vulnerabilities.

This work targets the vulnerabilities summarized in Table 2, which are among the most common and most destructive in smart contract security. SWC (Smart Contract Weakness Classification) and CWE (Common Weakness Enumeration) are standardized classification systems for describing such weaknesses: SWC is specific to smart contracts, whereas CWE covers software weaknesses in general. To convey their potential impact, the vulnerabilities are grouped into risk levels. High-risk vulnerabilities can directly lead to significant security issues, such as the loss of assets, data manipulation, or total disruption of contract functionality. Medium-to-high-risk vulnerabilities can be exploited by attackers under certain conditions and then cause severe impacts. Medium-risk vulnerabilities may affect specific functions of the contract but typically do not lead to severe losses; they are more likely to affect the contract’s stability and reliability.

2.3. Automated Vulnerability Detection Through Fuzzing

Fuzzing technology has evolved significantly since its inception in the 1980s [45], transitioning from a rudimentary fault injection method to a sophisticated automated security auditing paradigm. Over the decades, its evolution has been driven by advances in software complexity and the increasing need for robust security measures. As a dynamic testing technique, fuzzing operates on the principle of exposing software vulnerabilities through the systematic injection of malformed input, and its efficacy has been demonstrated in various domains ranging from network protocols to embedded systems [46]. Modern fuzzing frameworks now incorporate feedback mechanisms and instrumentation to further enhance their capability to uncover hard-to-detect bugs.

The fuzzing workflow generally comprises three core stages: test case generation, program execution, and anomaly detection (shown in Figure 3). In the test case generation stage, diverse test cases are created either by generating random data or by mutating existing seed inputs into unexpected or malformed inputs. These inputs are then fed into the target program, whose execution is closely monitored to capture any deviations from normal behavior. Finally, any anomalies, such as crashes, unexpected exceptions, or memory leaks, are detected, logged, and analyzed to identify potential vulnerabilities.
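The three stages can be condensed into a single loop. This minimal sketch assumes a hypothetical `target` function that signals anomalous behavior by raising an exception; the byte-level mutation operators (bit flip, byte replacement, byte insertion) are common generic choices rather than those of any specific tool.

```python
import random
from typing import Callable, List, Tuple


def mutate(data: bytes, rng: random.Random) -> bytes:
    buf = bytearray(data or b"\x00")
    op = rng.randrange(3)
    i = rng.randrange(len(buf))
    if op == 0:
        buf[i] ^= 1 << rng.randrange(8)      # flip one bit
    elif op == 1:
        buf[i] = rng.randrange(256)          # replace one byte
    else:
        buf.insert(i, rng.randrange(256))    # insert one byte
    return bytes(buf)


def fuzz(target: Callable[[bytes], None], seeds: List[bytes],
         iterations: int, seed: int = 0) -> List[Tuple[bytes, str]]:
    rng = random.Random(seed)
    findings: List[Tuple[bytes, str]] = []
    for _ in range(iterations):
        # Stage 1: test case generation (mostly seed mutation, sometimes fresh random bytes).
        if seeds and rng.random() < 0.9:
            case = mutate(rng.choice(seeds), rng)
        else:
            case = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 9)))
        # Stage 2: program execution.
        try:
            target(case)
        # Stage 3: anomaly detection and logging.
        except Exception as exc:
            findings.append((case, type(exc).__name__))
    return findings
```

With a toy parser that raises `ValueError` on inputs longer than two bytes starting with `0x42`, seeding the loop with `b"\x42\x00"` quickly yields findings, since a single byte insertion after the first position is enough to trigger the fault.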

Fuzzing techniques can be categorized into black-box, white-box, and gray-box approaches [47]. Black-box fuzzing treats the target software as an opaque system, generating inputs without relying on any internal knowledge; this approach is straightforward to deploy but may offer lower code coverage. White-box fuzzing leverages detailed insight from the source code or internal execution paths, often employing techniques like symbolic execution or instrumentation to guide test case generation for deeper analysis. Gray-box fuzzing strikes a balance between these methods by utilizing limited internal information—such as runtime feedback or code coverage metrics—to enhance efficiency while still maintaining an external testing perspective. This approach has become the most widely adopted in practice due to its blend of simplicity and effectiveness.

Despite its advantages, fuzzing faces several challenges. One of the main issues is achieving comprehensive code coverage. Random input generation may not trigger all critical execution paths, which means some hidden vulnerabilities could go unnoticed [48]. Additionally, fuzzing can be quite resource-intensive. This is especially true for complex systems with intricate logic or stateful behaviors, which require significant computational power and time. To address these challenges, fuzzing is increasingly being combined with other techniques. For example, static analysis and symbolic execution are used to improve vulnerability detection and enhance software resilience [49].

3. Proposed System

The general architecture of TPH-Fuzz is first introduced in Section 3.1. Then, the input preprocessing module is described in Section 3.2, the two-phase hybrid fuzzing engine in Section 3.3, and the vulnerability detection module in Section 3.4.

3.1. Overview

The general architecture of TPH-Fuzz is depicted in Figure 4; the pipeline takes raw smart contract source code as input. First, the input preprocessing module transforms the raw code into structured representations suitable for vulnerability detection by performing compilation, control flow graph (CFG) construction, and data dependency analysis. The structured output is then fed into the two-phase hybrid fuzzing engine, which integrates coverage-guided and vulnerability-guided strategies and uses symbolic execution and sequence optimization to generate efficient test cases. The coverage-guided strategy explores untested code paths to ensure comprehensive analysis, while the vulnerability-guided strategy targets known vulnerability patterns for in-depth testing; all execution takes place within a sandboxed EVM environment. Finally, the vulnerability detection module examines the execution traces, generates tracking information, and propagates data taint to identify vulnerabilities through pattern matching. Using taint analysis to trace the flow of untrusted data and simulation to validate potential exploits, it confirms the authenticity of detected vulnerabilities, records their type, location, and severity, and produces detailed reports.

The details of each module will be discussed in the following sections.

3.2. Input Preprocessing

Fuzzing tools must execute smart contract transactions dynamically to test the contract’s behavior thoroughly, so the smart contract source code must first be preprocessed. Preprocessing extracts several critical components: the EVM bytecode for contract deployment, the interface information for contract invocation, and the CFG and data dependency information used to generate auxiliary test cases. The following paragraphs describe each of these processes in detail.

(A) Compilation: The first step in preprocessing is compiling the smart contract source code using the solc compiler. This process generates a JSON artifact that includes several key elements: the ABI (Application Binary Interface), the AST (Abstract Syntax Tree), and the EVM bytecode. These components play distinct roles in enabling effective contract execution and vulnerability analysis. The specific explanations and roles of these compilation results are summarized in Table 3.
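One way to obtain all three artifacts in a single compiler run is solc's standard-JSON interface. The sketch below assumes a `solc` binary on the PATH; the paper does not state how the compiler is invoked, so the helper names and the exact output selection are illustrative choices.

```python
import json
import subprocess


def build_standard_json(source_name: str, source_code: str) -> dict:
    """Builds a solc standard-JSON request for the ABI, AST, and bytecode."""
    return {
        "language": "Solidity",
        "sources": {source_name: {"content": source_code}},
        "settings": {
            "outputSelection": {
                "*": {
                    "": ["ast"],                         # file-level AST
                    "*": ["abi",                         # contract interface
                          "evm.bytecode.object",         # creation bytecode (deployment)
                          "evm.deployedBytecode.object"] # runtime bytecode (CFG input)
                }
            }
        },
    }


def compile_with_solc(source_name: str, source_code: str) -> dict:
    """Runs `solc --standard-json` and returns the parsed JSON artifact."""
    request = json.dumps(build_standard_json(source_name, source_code))
    proc = subprocess.run(["solc", "--standard-json"], input=request,
                          capture_output=True, text=True, check=True)
    output = json.loads(proc.stdout)
    # Compilation problems are reported inside the JSON, not through the exit code.
    errors = [e for e in output.get("errors", []) if e.get("severity") == "error"]
    if errors:
        raise RuntimeError(errors[0].get("formattedMessage", "compilation failed"))
    return output
```

In the returned artifact, the ABI sits under `output["contracts"][source_name][contract_name]["abi"]` and the AST under `output["sources"][source_name]["ast"]`.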

(B) CFG Construction: The next critical preprocessing step is the construction of the CFG, which is formally defined as a directed graph $G = (V, E)$, comprising basic blocks as vertices $V$ and control transitions as edges $E$. Each basic block encapsulates a continuous sequence of non-branching EVM instructions, demarcated by start/end addresses. As detailed in Algorithm 1, CFG construction is implemented through bytecode stream analysis: JUMPDEST opcodes and JUMP/JUMPI instructions partition basic blocks, with static jump targets derived from preceding PUSH operands. Conditional jumps (JUMPI) generate dual edges for both branches. Note that some dynamic jump edges can only be determined during subsequent symbolic execution.

Algorithm 1 CFG construction

Input: EVM runtime bytecode R
Output: Control flow graph CFG(V, E)

Initialize V ← ∅, E ← ∅
basic_block ← None, previous_op ← None, pc ← 0
while pc < len(R) do
    op ← R[pc]
    if basic_block = None then
        basic_block ← BasicBlock(start = pc)
    end if
    if op = JUMPDEST ∧ basic_block.instructions ≠ ∅ then
        basic_block.set_end(pc − 1)
        V[basic_block.start] ← basic_block
        basic_block ← BasicBlock(start = pc)
    end if
    if op ∈ {JUMP, JUMPI} then
        basic_block.add_instruction(pc, op)
        basic_block.set_end(pc)
        V[basic_block.start] ← basic_block
        targets ← get_jump_target(R, pc, previous_op)
        E[pc] ← targets   {JUMP: [target]; JUMPI: [target, pc + 1]}
        basic_block ← None   {the next instruction starts a new block}
    else if op ∈ PUSH_OPCODES then
        (size, value) ← decode_push(R, pc)
        basic_block.add_instruction(pc, op, value)
        pc ← pc + size   {skip operand bytes}
    else
        basic_block.add_instruction(pc, op)
    end if
    previous_op ← op
    pc ← pc + 1
end while
if basic_block ≠ None then
    basic_block.set_end(len(R) − 1)
    V[basic_block.start] ← basic_block
end if
return CFG(V, E)
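A runnable Python counterpart of the block-splitting rules described for Algorithm 1 is sketched below: blocks are split at JUMPDEST and after JUMP/JUMPI, static targets come from an immediately preceding PUSH, and JUMPI adds a fall-through edge. Opcode values follow the EVM specification; treating STOP, RETURN, or REVERT as ordinary instructions (rather than block terminators) mirrors the algorithm's scope and is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F                 # PUSH1..PUSH32 carry 1..32 immediate bytes

Instr = Tuple[int, int, Optional[int]]     # (pc, opcode, immediate value or None)


@dataclass
class BasicBlock:
    start: int
    end: Optional[int] = None
    instructions: List[Instr] = field(default_factory=list)


def build_cfg(code: bytes) -> Tuple[Dict[int, BasicBlock], Dict[int, List[int]]]:
    """Returns (V, E): blocks keyed by start pc, successor lists keyed by jump pc."""
    blocks: Dict[int, BasicBlock] = {}
    edges: Dict[int, List[int]] = {}
    block: Optional[BasicBlock] = None
    prev: Optional[Instr] = None           # previous instruction, used for static targets
    pc = 0
    while pc < len(code):
        op = code[pc]
        if block is None:
            block = BasicBlock(start=pc)
        if op == JUMPDEST and block.instructions:   # a jump target starts a new block
            block.end = pc - 1
            blocks[block.start] = block
            block = BasicBlock(start=pc)
        if op in (JUMP, JUMPI):
            ins = (pc, op, None)
            block.instructions.append(ins)
            block.end = pc
            blocks[block.start] = block
            targets: List[int] = []
            if prev is not None and PUSH1 <= prev[1] <= PUSH32:
                targets.append(prev[2])    # static target pushed right before the jump
            if op == JUMPI:
                targets.append(pc + 1)     # fall-through branch of the conditional jump
            edges[pc] = targets            # stays empty for an unresolved dynamic JUMP
            block = None
        elif PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1
            ins = (pc, op, int.from_bytes(code[pc + 1:pc + 1 + size], "big"))
            block.instructions.append(ins)
            pc += size                     # skip the operand bytes
        else:
            ins = (pc, op, None)
            block.instructions.append(ins)
        prev = ins
        pc += 1
    if block is not None:                  # close the trailing block
        block.end = len(code) - 1
        blocks[block.start] = block
    return blocks, edges
```

For the bytecode `6001600657005b00` (PUSH1 1, PUSH1 6, JUMPI, STOP, JUMPDEST, STOP), the sketch produces blocks starting at 0, 5, and 6 and a single edge entry `{4: [6, 5]}`: the taken branch to the JUMPDEST at 6 and the fall-through to 5.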

(C) Data Dependency Analysis: In addition to CFG construction, another crucial analysis involves data dependency, which plays a key role in detecting deep-N vulnerabilities [50] such as cross-function reentrancy attacks. These vulnerabilities are often triggered by a specific sequence of function calls that modify global state variables. To address this, the proposed method constructs a vulnerability trigger model through data dependency analysis, examining the read and write operations of functions on global state variables. Algorithm 2 implements function-level data flow analysis: it extracts the set of global variables from the AST and traverses each function’s statement structure to identify state variable writes in assignment operations (recorded in fn_wd) and variable references within expressions (recorded in fn_rd). For cross-function calls, the target is resolved through ABI function signature matching, and the read/write dependencies of the called function are inherited through a union operation on the sets.

Algorithm 2 Data dependency analysis

Input: Compiled contract result jsonContent
Output: fn_rd, fn_wd (function → variable set mappings)

abi ← jsonContent.contracts.abi
ast ← jsonContent.contracts.ast
gvl ← extract_global_variables(ast)
Initialize fn_rd, fn_wd ← {}, {}
for each fn_node ∈ ast.get_all_functions() do
    fn ← get_qualified_name(fn_node)
    fn_rd[fn], fn_wd[fn] ← ∅, ∅
    for each stmt ∈ fn_node.body do
        if stmt is Assignment then
            target ← stmt.left_expression
            if target ∈ gvl then
                fn_wd[fn].add(target)
            end if
            analyze_expression(stmt.right_expression, fn, fn_rd)   {record state variables read on the right-hand side}
        else if stmt is VariableReference then
            if stmt.var ∈ gvl then
                fn_rd[fn].add(stmt.var)
            end if
        else if stmt is FunctionCall then
            callee ← resolve_function_call(stmt)
            if callee ∈ fn_rd then   {callee has already been analyzed}
                fn_rd[fn] ← fn_rd[fn] ∪ fn_rd[callee]
                fn_wd[fn] ← fn_wd[fn] ∪ fn_wd[callee]
            end if
        end if
    end for
end for
return fn_rd, fn_wd
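The read/write bookkeeping of Algorithm 2, and the writer-before-reader relation it enables, can be sketched in Python. The flattened statement encoding below is a hypothetical simplification of the solc AST (whose nodes are nested JSON objects), and `dependency_edges` is an illustrative helper rather than part of the paper's algorithm.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Hypothetical, flattened statement forms:
#   ("assign", target, [names read on the right-hand side])
#   ("read", name)
#   ("call", callee)
Stmt = Tuple


def analyze_dependencies(functions: Dict[str, List[Stmt]], state_vars: Set[str]):
    fn_rd: Dict[str, Set[str]] = {}
    fn_wd: Dict[str, Set[str]] = {}
    for fn, body in functions.items():            # dict insertion order = analysis order
        fn_rd[fn], fn_wd[fn] = set(), set()
        for stmt in body:
            if stmt[0] == "assign":
                _, target, rhs = stmt
                if target in state_vars:
                    fn_wd[fn].add(target)
                fn_rd[fn].update(v for v in rhs if v in state_vars)
            elif stmt[0] == "read" and stmt[1] in state_vars:
                fn_rd[fn].add(stmt[1])
            elif stmt[0] == "call" and stmt[1] in fn_rd:   # only callees analyzed earlier
                fn_rd[fn] |= fn_rd[stmt[1]]
                fn_wd[fn] |= fn_wd[stmt[1]]
    return fn_rd, fn_wd


def dependency_edges(fn_rd: Dict[str, Set[str]], fn_wd: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """(writer, reader) pairs: calling `writer` first can change what `reader` observes."""
    return {(w, r) for w, r in permutations(fn_rd, 2) if fn_wd[w] & fn_rd[r]}
```

For a toy contract in which `setOwner` writes `owner` and `kill` reads it, the pair `("setOwner", "kill")` appears in the edge set but `("kill", "setOwner")` does not, which tells a sequence generator that `setOwner` is a useful prefix for reaching `kill`'s owner-dependent branches.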

3.3. Two-Phase Hybrid Fuzzing Engine

The Two-Phase Hybrid Fuzzing Engine is the core module of our framework, designed to efficiently detect vulnerabilities in smart contracts by combining coverage-guided and vulnerability-guided fuzzing strategies. This engine operates in two distinct yet complementary phases, ensuring both breadth (exploring all possible paths) and depth (focusing on high-risk areas) in vulnerability detection.

Phase ①: Coverage-Guided Fuzzing: In this phase, the engine generates a wide variety of test cases to explore untested execution paths. For instance, when a smart contract includes a function with multiple conditional branches, this phase attempts to traverse each branch by varying input parameters through a combination of symbolic analysis and random mutation. The symbolic analysis method negates branch conditions and solves the constraints of uncovered paths, while random mutation introduces new inputs when constraints cannot be satisfied symbolically. This phase also populates a special individual pool that records the test cases reaching potential vulnerability areas.

Phase ②: Vulnerability-Guided Fuzzing: Building on the outcomes of the first phase, this phase focuses on targeted fuzzing using the special individual pool. Here, the transaction sequences of the stored test cases are fixed, while the fuzzer exclusively optimizes the parameters of these transactions. For example, if during the coverage-guided phase certain inputs cause unexpected state changes in a contract function handling fund transfers, the vulnerability-guided phase will prioritize these cases, refining the transaction parameters to intensify testing in that specific area.
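The division of labor between the two phases can be illustrated with a toy engine. In this sketch, which is an illustrative assumption rather than TPH-Fuzz's implementation, each transaction is a (function, integer argument) pair; Phase 1 mutates both the sequence and the arguments and keeps inputs that add branch coverage, while Phase 2 draws only from the special individual pool and mutates arguments with the sequence held fixed. The `toy_execute` contract model is hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple

Tx = Tuple[str, int]            # (function name, one integer argument): a simplification
TestCase = List[Tx]
Executor = Callable[[TestCase], Tuple[Set[str], bool, bool]]  # -> (branches, sensitive, bug)


def mutate_full(tc: TestCase, functions: List[str], rng: random.Random) -> TestCase:
    """Phase 1 mutation: may change the call sequence as well as the arguments."""
    tc = list(tc)
    r = rng.random()
    if r < 0.4 or not tc:                        # insert a new transaction
        tc.insert(rng.randrange(len(tc) + 1), (rng.choice(functions), rng.randrange(256)))
    elif r < 0.7:                                # change which function is called
        i = rng.randrange(len(tc))
        tc[i] = (rng.choice(functions), tc[i][1])
    else:                                        # change one argument
        i = rng.randrange(len(tc))
        tc[i] = (tc[i][0], rng.randrange(256))
    return tc


def mutate_args_only(tc: TestCase, rng: random.Random) -> TestCase:
    """Phase 2 mutation: the transaction sequence stays fixed; one argument changes."""
    i = rng.randrange(len(tc))
    return [(f, rng.randrange(256)) if j == i else (f, a) for j, (f, a) in enumerate(tc)]


def two_phase_fuzz(execute: Executor, functions: List[str],
                   budget1: int, budget2: int, seed: int = 0):
    rng = random.Random(seed)
    corpus: List[TestCase] = [[]]
    covered: Set[str] = set()
    special_pool: List[TestCase] = []
    bugs: List[TestCase] = []
    for _ in range(budget1):                                 # Phase 1: coverage-guided
        tc = mutate_full(rng.choice(corpus), functions, rng)
        branches, sensitive, bug = execute(tc)
        if not branches <= covered:                          # new branches: keep as a seed
            covered |= branches
            corpus.append(tc)
        if sensitive:                                        # reached a potential vulnerability area
            special_pool.append(tc)
        if bug:
            bugs.append(tc)
    for _ in range(budget2 if special_pool else 0):          # Phase 2: vulnerability-guided
        tc = mutate_args_only(rng.choice(special_pool), rng)
        if execute(tc)[2]:
            bugs.append(tc)
    return covered, special_pool, bugs


def toy_execute(tc: TestCase) -> Tuple[Set[str], bool, bool]:
    """Hypothetical contract: withdraw is sensitive after unlock; large amounts are a bug."""
    unlocked, branches, sensitive, bug = False, set(), False, False
    for f, a in tc:
        if f == "unlock":
            unlocked = True
            branches.add("unlock")
        elif f == "withdraw" and unlocked:
            branches.add("withdraw:unlocked")
            sensitive = True
            bug = bug or a > 200
        elif f == "withdraw":
            branches.add("withdraw:locked")
        else:
            branches.add("deposit")
    return branches, sensitive, bug
```

Calling `two_phase_fuzz(toy_execute, ["deposit", "withdraw", "unlock"], 300, 300)` spends the first budget discovering that `unlock` must precede `withdraw`, then spends the second budget only on argument values for those already-ordered sequences.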

In both phases, the Symbolic Analyzer and Sequence Optimizer play crucial roles in analyzing the results of test case execution. They provide valuable insights, which are then fed back into the fuzzing engine to guide subsequent testing.

(A) Symbolic Analyzer: One key component of the fuzzing engine is the Symbolic Analyzer, outlined in Algorithm 3. Traditional symbolic execution attempts to explore every possible execution path, which leads to exponential path explosion as contract complexity increases. Our approach instead uses dynamic symbolic execution and analyzes only the paths actually executed during fuzzing, which substantially mitigates path explosion while concentrating the analysis on the most relevant execution branches. The analyzer extracts path constraints (e.g., transaction parameters satisfying require(msg.value > 0)) and branch conditions from the execution traces. For each conditional branch, it constructs a constraint system that combines the historical path conditions with the negation of the target branch condition, and a constraint solver generates new input values that satisfy this system. The solved values are injected into the original test case structure, preserving fields such as the function selector and address formats, to create mutated seeds that are fed back to the fuzzer. This process enables targeted exploration of complex branching logic that random mutation often cannot reach efficiently.

Algorithm 3 Symbolic analyzer

Input: Execution trace execution_trace
Output: List of new mutation seeds new_seeds

1: Initialize new_seeds ← []
2: for each branch in extract_branches(execution_trace) do
3: path_constraints ← branch.get_path_constraints()
4: negated_condition ← negate_condition(branch.condition)
5: new_constraints ← path_constraints + [negated_condition]
6: solved_input ← solve_constraints(new_constraints)
7: if solved_input is valid then
8: mutated_seed ← mutate_seed(branch.input_context, solved_input)
9: new_seeds.append(mutated_seed)
10: end if
11: end for
12: return new_seeds
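The loop of Algorithm 3 can be made concrete with a small, self-contained Python sketch. To avoid a solver dependency, it makes two simplifications. First, a brute-force search over a fixed pool of candidate values stands in for the constraint solver; a real implementation would use an SMT solver such as Z3. Second, constraints are plain Python predicates rather than symbolic expressions. `Branch`, `CANDIDATES`, and the example selector are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional

Env = Dict[str, int]
Predicate = Callable[[Env], bool]

UINT256_MAX = 2**256 - 1
# Toy candidate pool standing in for an SMT solver (assumption).
CANDIDATES = [0, 1, 2, 10, 100, 1000, 2**255, UINT256_MAX]


@dataclass
class Branch:
    variables: List[str]               # inputs the constraints depend on
    path_constraints: List[Predicate]  # conditions already satisfied on the path prefix
    condition: Predicate               # condition observed at this branch in the trace
    input_context: Dict[str, object]   # concrete seed fields (selector, args, ...)


def solve_constraints(constraints: List[Predicate], variables: List[str]) -> Optional[Env]:
    """Return the first candidate assignment satisfying all constraints, else None."""
    for values in product(CANDIDATES, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return env
    return None


def mutate_seed(context: Dict[str, object], solved: Env) -> Dict[str, object]:
    """Inject solved values while keeping structural fields such as the selector."""
    seed = dict(context)
    args = dict(seed.get("args", {}))
    args.update(solved)
    seed["args"] = args
    return seed


def symbolic_analyzer(branches: List[Branch]) -> List[Dict[str, object]]:
    new_seeds = []
    for branch in branches:
        negated = lambda env, cond=branch.condition: not cond(env)  # flip the branch
        solved = solve_constraints(branch.path_constraints + [negated], branch.variables)
        if solved is not None:
            new_seeds.append(mutate_seed(branch.input_context, solved))
    return new_seeds


branch = Branch(
    variables=["value", "amount"],
    path_constraints=[lambda e: e["value"] > 0],   # require(msg.value > 0)
    condition=lambda e: e["amount"] < 100,         # the trace took the amount < 100 side
    input_context={"selector": "0xdeadbeef", "args": {"value": 5, "amount": 3}},
)
seeds = symbolic_analyzer([branch])
```

In this example, the path prefix requires `value > 0` and the observed trace took the `amount < 100` side. The sketch therefore proposes `amount = 100` to flip the branch and leaves the selector unchanged.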

(B) Sequence Optimizer: Equally important is the Sequence Optimizer, which helps explore the deep state space of the contract. Algorithm 4 outlines its process. It generates input sequences from two sources: data dependency relationships obtained during preprocessing (for simple variable types) and dynamic data dependency relationships observed at runtime (for complex types such as mappings or arrays). When the fuzzer explores regions that may contain vulnerabilities, the Sequence Optimizer performs a reverse analysis of the CFG to derive complete call sequences. These sequences then guide the generation of test cases that target specific vulnerabilities.

(C) EVM Sandbox: The fuzzing engine also incorporates the EVM Sandbox, which ensures secure and controlled testing. The sandbox is an isolated environment built on Ethereum’s official Python implementation (Py-EVM version 0.9.0). It reuses Py-EVM’s execution semantics to simulate the EVM faithfully. At the same time, it omits real-world blockchain operations such as block mining and transaction encoding/decoding, so that it can focus solely on the precise execution of transactions. This isolation ensures safety and focused testing, but it also means that the actual state or behavior of external contracts cannot be obtained directly. To address this, the test framework sets predefined return values or statuses for external calls within the test cases. This avoids failures caused by the absence of real external contracts during testing. External calls are simulated by hook functions added to the test framework, which replicate the expected results of these calls. During testing, the sandbox records detailed execution traces, including opcode-level execution flow, gas consumption, and storage state changes.
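The hook mechanism for external calls can be pictured as a lookup table keyed by call target and function selector. The Python sketch below is hypothetical and independent of Py-EVM’s actual interfaces. `ExternalCallHooks` and its default result (a successful call with empty return data) are assumptions. Only two details come from the Ethereum ABI: the 4-byte selector convention and the `balanceOf(address)` selector `0x70a08231`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CallResult = Tuple[bool, bytes]  # (success flag, return data)


@dataclass
class ExternalCallHooks:
    """Hypothetical hook table: predefined results for calls leaving the sandbox."""
    default: CallResult = (True, b"")  # assumed fallback for unregistered calls
    table: Dict[Tuple[str, bytes], CallResult] = field(default_factory=dict)
    log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def register(self, target: str, selector: bytes, success: bool, data: bytes = b"") -> None:
        self.table[(target.lower(), selector)] = (success, data)

    def on_call(self, target: str, calldata: bytes) -> CallResult:
        selector = calldata[:4]  # 4-byte function selector
        result = self.table.get((target.lower(), selector), self.default)
        self.log.append((target, selector.hex(), result[0]))  # record for the trace
        return result


hooks = ExternalCallHooks()
TOKEN = "0x00000000000000000000000000000000000000aa"  # placeholder address
BALANCE_OF = bytes.fromhex("70a08231")                 # balanceOf(address)
hooks.register(TOKEN, BALANCE_OF, True, (1000).to_bytes(32, "big"))
ok, data = hooks.on_call(TOKEN, BALANCE_OF + bytes(32))
```

Registering a failing result (`success=False`) for a target lets a test case exercise the contract’s error-handling path without deploying the callee.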

Algorithm 4 Sequence optimizer

Input: Control flow graph CFG; data dependencies D_static and D_dynamic
Output: Optimized test sequence set T

1: Initialize T ← ∅
2: Critical node set V_crit ← IdentifyVulnerabilityPatterns(CFG)
3: for v ∈ V_crit do
4: P ← ReverseTraverseCFG(CFG, v)
5: for path ∈ P do
6: dep_chain ← BuildDependencyChain(path, D_static, D_dynamic)
7: if ValidateFeasibility(dep_chain) then
8: seq ← GenerateInputSequence(dep_chain)
9: T ← T ∪ {seq}
10: end if
11: end for
12: end for
13: return T
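A runnable approximation of Algorithm 4 is sketched below in Python. It rests on three simplifying assumptions:

- The CFG is modeled at function granularity with a synthetic `entry` node.
- Each function's dependencies are given as (reads, writes) sets of state variables.
- The first writer of each variable is chosen deterministically.

All names (`merge_deps`, `reverse_traverse_cfg`, `build_dependency_chain`, the example functions) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

CFG = Dict[str, List[str]]                   # node -> successor nodes
Deps = Dict[str, Tuple[Set[str], Set[str]]]  # function -> (reads, writes)


def merge_deps(static: Deps, dynamic: Deps) -> Deps:
    """Union of preprocessing-time and runtime-observed dependencies."""
    merged: Deps = {}
    for source in (static, dynamic):
        for fn, (reads, writes) in source.items():
            r, w = merged.get(fn, (set(), set()))
            merged[fn] = (r | reads, w | writes)
    return merged


def reverse_traverse_cfg(cfg: CFG, target: str, entry: str = "entry") -> List[List[str]]:
    """Enumerate acyclic entry->target paths by walking predecessors backwards."""
    preds: Dict[str, List[str]] = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    paths: List[List[str]] = []

    def walk(node: str, suffix: List[str]) -> None:
        path = [node] + suffix
        if node == entry:
            paths.append(path)
            return
        for p in preds.get(node, []):
            if p not in path:  # skip cycles
                walk(p, path)

    walk(target, [])
    return paths


def build_dependency_chain(path: List[str], deps: Deps) -> Tuple[List[str], Set[str]]:
    """Prepend writer functions for every state variable read along the path."""
    chain: List[str] = []
    unresolved: Set[str] = set()

    def require(fn: str, seen: Set[str]) -> None:
        reads, _ = deps.get(fn, (set(), set()))
        for var in sorted(reads):
            writers = sorted(f for f, (_, w) in deps.items() if var in w and f != fn)
            if not writers:
                unresolved.add(var)  # no other function can set this variable
                continue
            writer = writers[0]      # deterministic choice for the sketch
            if writer not in seen:
                require(writer, seen | {writer})
                if writer not in chain:
                    chain.append(writer)

    for fn in path[1:]:  # skip the synthetic entry node
        require(fn, {fn})
        if fn not in chain:
            chain.append(fn)
    return chain, unresolved


def sequence_optimizer(cfg: CFG, d_static: Deps, d_dynamic: Deps,
                       is_critical: Callable[[str], bool]) -> Set[Tuple[str, ...]]:
    deps = merge_deps(d_static, d_dynamic)
    nodes = set(cfg) | {d for dsts in cfg.values() for d in dsts}
    result: Set[Tuple[str, ...]] = set()
    for v in sorted(n for n in nodes if is_critical(n)):
        for path in reverse_traverse_cfg(cfg, v):
            chain, unresolved = build_dependency_chain(path, deps)
            if not unresolved:            # ValidateFeasibility
                result.add(tuple(chain))  # GenerateInputSequence
    return result


cfg = {"entry": ["deposit", "setPaused", "withdraw"]}
d_static = {"setPaused": (set(), {"paused"}), "withdraw": ({"paused"}, set())}
d_dynamic = {"deposit": (set(), {"balances"}), "withdraw": ({"balances"}, {"balances"})}
T = sequence_optimizer(cfg, d_static, d_dynamic, lambda n: n == "withdraw")
```

In the example, `withdraw` reads two state variables:

- `balances`, a mapping, whose dependency is assumed to come from runtime observation.
- `paused`, a simple variable, whose dependency comes from preprocessing.

The sketch therefore places `deposit` and `setPaused` before the critical call.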

3.4. Vulnerability Detector

The vulnerability detection module operates through a multistage analytical process that begins with the execution traces generated during the fuzzing campaign. These traces, captured from the EVM during test case execution, comprehensively record the contract’s runtime behavior, including function call sequences, stack operations, memory modifications, and storage changes. By analyzing these temporal records of bytecode-level operations, the system reconstructs the complete execution context necessary for vulnerability verification, as raw transaction logs alone cannot reveal subtle data flow anomalies.

To precisely model contract behavior, the module implements a full EVM simulator. It interprets each opcode dynamically while maintaining virtualized representations of stack frames, memory buffers, and persistent storage slots. The simulation goes beyond literal execution by augmenting concrete values with symbolic metadata; for instance, it tracks not only the numerical result of an ADD operation but also its semantic relationship to prior computations. During this emulation, the system injects taint markers into security-sensitive inputs drawn from predefined sources, as systematically categorized in Table 4. These sources include transaction inputs (CALLER, CALLVALUE), block state variables (TIMESTAMP, NUMBER), and external call returns. The taint labels then propagate through subsequent operations following EVM-specific rules:

- Arithmetic instructions (MUL, SUB) inherit the taint status of their operands.
- Control flow operations (JUMPI) carry taint conditions to their destination addresses.
- Storage operations (SSTORE) preserve taint states across transactions.

This taint tracking mechanism distinguishes benign data flows from attacker-controllable paths, which reduces false positives in vulnerability detection.
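The propagation rules above can be illustrated with a toy taint-tracking interpreter in Python. This is a sketch under assumptions, not the paper's simulator:

- The trace format (opcode plus, for sources and PUSH, its concrete value) is hypothetical.
- `run_taint` is a hypothetical name.
- Only a handful of opcodes are modeled.
- The source list is a subset of Table 4.

```python
from typing import Dict, FrozenSet, List, Tuple

MASK = 2**256 - 1
SOURCES = {"CALLER", "CALLVALUE", "TIMESTAMP", "NUMBER"}  # taint sources (subset, assumption)
Word = Tuple[int, FrozenSet[str]]                          # (concrete value, taint labels)


def arith(op: str, a: int, b: int) -> int:
    """EVM arithmetic modulo 2**256; a is the top of the stack."""
    if op == "ADD":
        return (a + b) & MASK
    if op == "SUB":
        return (a - b) & MASK
    return (a * b) & MASK  # MUL


def run_taint(trace: List[tuple]) -> Tuple[List[Word], Dict[int, Word], List[tuple]]:
    stack: List[Word] = []
    storage: Dict[int, Word] = {}
    events: List[tuple] = []
    for op, *arg in trace:
        if op in SOURCES:
            stack.append((arg[0], frozenset({op})))      # inject a taint marker
        elif op == "PUSH":
            stack.append((arg[0], frozenset()))          # constants are clean
        elif op in ("ADD", "SUB", "MUL"):
            (a, ta), (b, tb) = stack.pop(), stack.pop()
            stack.append((arith(op, a, b), ta | tb))     # inherit operand taint
        elif op == "SSTORE":
            (key, _), value = stack.pop(), stack.pop()   # key on top, then value
            storage[key] = value                         # taint persists with the slot
        elif op == "SLOAD":
            key, _ = stack.pop()
            stack.append(storage.get(key, (0, frozenset())))
        elif op == "JUMPI":
            (dest, _), (_, cond_taint) = stack.pop(), stack.pop()
            if cond_taint:
                events.append(("tainted-branch", dest, sorted(cond_taint)))
        else:
            raise ValueError(f"opcode {op} not modeled in this sketch")
    return stack, storage, events


trace = [
    ("CALLVALUE", 5), ("PUSH", 3), ("MUL",),  # msg.value * 3 -> tainted by CALLVALUE
    ("PUSH", 0), ("SSTORE",),                 # slot 0 <- tainted product
    ("PUSH", 0), ("SLOAD",),                  # a later transaction reads slot 0
    ("PUSH", 64), ("JUMPI",),                 # conditional jump on the tainted value
]
stack, storage, events = run_taint(trace)
```

The example shows why SSTORE must preserve taint. The branch condition in the later transaction is derived from `msg.value` only through storage.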

Vulnerability identification runs in step with the execution simulation, matching the observed behavior against vulnerability patterns. Our framework divides vulnerabilities into two groups according to the detection strategy:

Vulnerabilities Requiring Specific Transaction Sequences: For vulnerabilities such as RE, AF, UD, and US, our system generates specialized test cases designed to reach the targeted code segments. For example, to detect RE, the system generates a specific function sequence (e.g., deposit–withdraw–fallback). During execution simulation, the monitor tracks external call patterns (such as CALL and DELEGATECALL) that occur between contract state updates. It flags instances where tainted parameters control call targets or value transfers. Similarly, for UD, the system constructs transaction sequences that reach delegatecall statements. It then verifies whether the call parameters have been influenced by tainted inputs, as determined by our vulnerability pattern analysis.
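The reentrancy monitoring described above resembles a check-effects-interactions test with three steps in order: a storage slot is read, an external call with tainted parameters follows, and only afterwards is the same slot written. A minimal Python sketch over a simplified event trace follows. The event format and the `find_reentrancy` name are hypothetical. A real detector would also need to observe the reentrant call frame itself.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, object]]  # (opcode, details); hypothetical trace format


def find_reentrancy(trace: List[Event]) -> List[Dict[str, object]]:
    """Flag external calls with tainted target or value that sit between a read of a
    storage slot and a later write to the same slot (state updated after the call)."""
    read_before: set = set()
    pending: List[Tuple[int, frozenset]] = []  # (call index, slots read before the call)
    findings: List[Dict[str, object]] = []
    for i, (op, info) in enumerate(trace):
        if op == "SLOAD":
            read_before.add(info["slot"])
        elif op in ("CALL", "DELEGATECALL"):
            if info.get("tainted_target") or info.get("tainted_value"):
                pending.append((i, frozenset(read_before)))
        elif op == "SSTORE":
            for call_idx, slots in pending:
                if info["slot"] in slots:
                    findings.append({"call": call_idx, "slot": info["slot"]})
    return findings


# withdraw() that pays out before zeroing the balance (vulnerable ordering)
vulnerable = [("SLOAD", {"slot": 0}), ("CALL", {"tainted_value": True}), ("SSTORE", {"slot": 0})]
# the same function with the balance update moved before the call
safe = [("SLOAD", {"slot": 0}), ("SSTORE", {"slot": 0}), ("CALL", {"tainted_value": True})]
```

Running `find_reentrancy` on the two traces separates the orderings. Only the vulnerable trace updates state after the external call.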

Direct Execution Trajectory Analysis: Other vulnerabilities, such as IO, UC, BD, and EF, are identified directly by analyzing the execution trajectory. For instance, when detecting IO, the system compares arithmetic operation results against the EVM’s 256-bit modular arithmetic boundaries. At the same time, it verifies whether tainted inputs have influenced the computation. An alert is triggered only when the suspect value flows into security-critical contexts, such as fund transfer amounts or array indices. For BD, the system examines whether block information variables have contaminated key instructions such as STATICCALL, SELFDESTRUCT, or DELEGATECALL.
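The integer overflow rule can be sketched as follows:

1. Compute the mathematically exact result.
2. Compare it with the value the EVM actually stores modulo 2^256.
3. Raise an alert only when the operands are tainted and the result reaches a sensitive sink.

In this Python sketch, the sink labels in `SENSITIVE_SINKS` and the function names are hypothetical.

```python
MOD = 2**256
# Hypothetical labels for security-critical sinks (assumption, not the paper's list).
SENSITIVE_SINKS = {"call_value", "array_index", "balance_store"}


def exact_result(op: str, a: int, b: int) -> int:
    """Unbounded integer result of the arithmetic opcode."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "MUL":
        return a * b
    raise ValueError(f"unsupported opcode {op}")


def io_alert(op: str, a: int, b: int, tainted: bool, sink: str) -> bool:
    exact = exact_result(op, a, b)
    wrapped = exact % MOD                # what the EVM actually produces
    crossed_boundary = exact != wrapped  # result left the range [0, 2**256)
    return crossed_boundary and tainted and sink in SENSITIVE_SINKS
```

Requiring both taint and a sensitive sink is what suppresses alerts for harmless wraparound, for example in hash computations whose results never reach a transfer amount.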

Each detected vulnerability event is documented with its triggering transaction sequence, affected state variables, a visualization of the taint propagation path, and a severity assessment that includes the specific source code location. This comprehensive approach ensures that both tailored test case generation and direct execution analysis work in tandem to accurately detect and categorize vulnerabilities in practice.

Although the vulnerability detection module uses taint analysis and symbolic execution effectively, it faces a trade-off between false positives and false negatives. False positives may occur when a detected vulnerability is in fact an intentional design choice by the contract developer, such as a deliberate reentrancy pattern that poses no real security risk. Such cases cannot be fully eliminated, but we take steps to reduce them. False negatives arise from three factors: incomplete vulnerability definitions, the inability to simulate external interactions, and insufficient branch coverage. Each of these can lead to missed vulnerabilities. Our tool design addresses these challenges in three ways: detailed vulnerability pattern analysis, simulation of external interactions, and a two-phase fuzzing strategy that improves branch coverage.

4. Experimental Evaluation

To evaluate the effectiveness of TPH-Fuzz, we applied it to open-source smart contract datasets. This section analyzes the experimental findings. Section 4.1 introduces the experimental design, Section 4.2 describes the evaluation methods and metrics, and Section 4.3 presents the experimental results and analysis.

4.1. Experiment Design

Dataset: Two rigorously curated datasets, sourced from a publicly available collection of Ethereum smart contracts compiled by Liu et al. [38], were employed. These datasets were selected to meet several important criteria: (1) Diversity—the datasets encompass a wide range of contract types, including financial, gaming, and utility contracts, to ensure that the evaluation reflects the variety of real-world Ethereum smart contracts; (2) Authenticity—the datasets include contracts that are deployed on the Ethereum blockchain, ensuring the accuracy of the Solidity source code and its corresponding on-chain bytecode; (3) Vulnerability Label Accuracy—the vulnerability labels were rigorously reviewed and verified by experts, ensuring that the type, location, and severity of the vulnerabilities are correctly marked.

The first dataset (D1), designed for branch coverage evaluation, originates from 5600 randomly selected Solidity source files. These files typically contain multiple contract components:

- primary contracts implementing core DApp logic;
- auxiliary contracts;
- interfaces, which were excluded because they have no bytecode implementation;
- libraries, which were filtered out because they consist primarily of stateless pure functions.

Using SHA-256 bytecode hashing, 9788 duplicate instances were removed, resulting in a final dataset of 9309 unique contracts. These contracts are stratified by computational complexity, measured as instruction count, into 2437 large-scale contracts (≥3600 instructions) and 6872 small contracts (<3600 instructions). As detailed in Table 5, each contract is annotated with code metrics, including source lines, instruction counts, and function counts.
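The deduplication and size stratification of D1 can be sketched in Python under the following assumptions:

- Contracts are given as raw bytecode.
- The first contract with a given SHA-256 digest is kept.
- Every byte that is not PUSH immediate data counts as one instruction. Trailing metadata is therefore counted too, which is a simplification.

The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def count_instructions(bytecode: bytes) -> int:
    """Count EVM instructions, skipping the immediate data of PUSH1..PUSH32."""
    i = n = 0
    while i < len(bytecode):
        op = bytecode[i]
        i += 1
        if 0x60 <= op <= 0x7F:  # PUSH1 (0x60) .. PUSH32 (0x7F)
            i += op - 0x5F      # skip 1..32 immediate bytes
        n += 1
    return n


def dedup_and_stratify(contracts: Dict[str, bytes],
                       threshold: int = 3600) -> Tuple[List[str], List[str]]:
    """Drop bytecode duplicates by SHA-256 digest, then split by instruction count."""
    seen = set()
    large: List[str] = []
    small: List[str] = []
    for name, code in contracts.items():
        digest = hashlib.sha256(code).hexdigest()
        if digest in seen:  # identical bytecode already kept
            continue
        seen.add(digest)
        (large if count_instructions(code) >= threshold else small).append(name)
    return large, small
```

For example, `6001600201` (PUSH1 0x01, PUSH1 0x02, ADD) counts as three instructions, not five bytes.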

The second dataset (D2) complements D1 and is used to evaluate vulnerability detection performance. It contains 1086 contracts with manually verified vulnerability labels, spanning eight vulnerability categories; their distribution is shown in Table 6. Each label was verified through expert code audits and matching against historical attack patterns.

Baseline: We selected five popular and representative open-source smart contract analysis tools as baselines for a comprehensive evaluation of our approach. These tools were chosen for their wide use in the research and industry communities. Together they cover diverse vulnerability detection capabilities.

Table 7 summarizes the baseline tools:

- sFuzz [35], a fuzzing tool, employs an adaptive genetic algorithm to optimize test case generation. It adjusts mutation strategies dynamically by monitoring branch coverage.
- ILF [51] learns a neural fuzzing policy by imitating transaction sequences produced by a symbolic execution expert. It turns this knowledge into a probabilistic model for intelligent state space traversal.
- ConFuzzius [36] combines evolutionary fuzzing and symbolic execution. It uses dynamic data flow tracing to identify critical variable dependencies and resolves complex arithmetic constraints with solvers.
- Mythril [28], a symbolic execution-based analyzer, systematically explores execution paths in a customized EVM environment. It verifies vulnerability trigger conditions using SMT solvers and traces root causes through backward analysis.
- Securify [19], a formal verification tool, constructs global dependency graphs to model data flow relationships between state variables. It detects vulnerabilities through formal rule-based compliance checks.

Together, these tools form a multi-paradigm baseline spanning dynamic and static analysis, white-box and black-box testing, and learning-driven and logical reasoning methodologies. This ensures a rigorous basis for comparison.

All experiments were conducted on an Ubuntu 18.04 system equipped with an AMD Ryzen 7 6800HS processor (8 cores, 16 threads at 3.20 GHz) and 16 GB RAM. The development environment utilized Python 3.8.10 with the Py-EVM library for EVM execution. Smart contracts were compiled using the Solidity compiler (optimization enabled with 200 runs) and analyzed within a controlled execution environment. To mitigate random variations, all experiments were repeated five times, with results reported as mean values after excluding initialization outliers. Computational reproducibility was ensured through Docker containers with fixed dependency versions.

4.2. Evaluation Methods and Metrics

Our evaluation framework adopts fundamental statistical measures to quantify vulnerability detection performance. Detection results are categorized into four atomic components:

True Positives (TP): Vulnerable contracts correctly identified by the detection system.
True Negatives (TN): Non-vulnerable contracts accurately classified as secure.
False Positives (FP): Secure contracts erroneously flagged as vulnerable.
False Negatives (FN): Vulnerable contracts undetected by the system.

These elementary measures form the basis for computing three essential performance metrics:

Precision measures detection reliability by calculating the proportion of correct vulnerability alerts. It answers the following question: Of all contracts flagged as vulnerable, how many are actually vulnerable?

$\text{Precision}\ (P) = \frac{TP}{TP + FP}$ (2)

Recall evaluates detection completeness through the fraction of actual vulnerabilities identified. It answers the following question: Of all the actual vulnerabilities, how many did the system successfully detect?

$\text{Recall}\ (R) = \frac{TP}{TP + FN}$ (3)

F1-Score provides a balanced assessment using the harmonic mean of precision and recall. It is particularly suitable for vulnerability detection scenarios in which both false alarms and missed vulnerabilities carry significant consequences.

$F_{1}\text{-Score}\ (F_{1}) = \frac{2 \cdot P \cdot R}{P + R}$ (4)

All metrics are computed through systematic comparison against manually verified ground truth labels, ensuring objective performance evaluation across different detection methodologies.
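Equations (2)–(4) translate directly into code. In this Python sketch, returning 0.0 when a denominator is zero is an assumption; the paper does not state this convention.

```python
def precision(tp: int, fp: int) -> float:
    """Eq. (2): share of flagged contracts that are actually vulnerable."""
    return tp / (tp + fp) if tp + fp else 0.0  # 0.0 on empty denominator (assumption)


def recall(tp: int, fn: int) -> float:
    """Eq. (3): share of actually vulnerable contracts that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Eq. (4): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0
```

With figures reported in Section 4.3, `precision(846, 102)` gives about 0.8924 (89.24%). `f1_score(0.6203, 0.3656)` gives about 0.460, consistent with the 46.00% F1-score of the WSA configuration.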

Branch coverage is another key metric for evaluating the effectiveness of fuzz testing. It measures the proportion of branches in a smart contract’s CFG that are exercised during testing. It therefore indicates how thoroughly the test cases exercise the contract’s logic. High branch coverage does not guarantee that every execution path has been tested. It is nonetheless particularly important for vulnerability detection, because certain vulnerabilities are triggered only under specific conditions represented by particular branches in the contract’s control flow.
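One common way to operationalize branch coverage at the bytecode level counts each JUMPI instruction as two edges: jump taken and fall-through. It then reports the fraction of those edges observed in execution traces. This definition is an assumption, since the paper does not spell out its exact definition. A minimal Python sketch, with a hypothetical function name:

```python
from typing import Iterable, Set, Tuple


def branch_coverage(jumpi_pcs: Set[int], outcomes: Iterable[Tuple[int, bool]]) -> float:
    """Fraction of JUMPI outcome edges (taken / fall-through) seen in the traces."""
    total = 2 * len(jumpi_pcs)
    if total == 0:
        return 1.0  # no conditional branches: treated as fully covered (assumption)
    covered = {(pc, bool(taken)) for pc, taken in outcomes if pc in jumpi_pcs}
    return len(covered) / total


# JUMPIs at program counters 10 and 42; the 42 fall-through edge was never observed
cov = branch_coverage({10, 42}, [(10, True), (10, False), (42, True), (10, True)])
```

Repeated outcomes, such as the second `(10, True)`, do not inflate the score, because coverage counts distinct edges.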

4.3. Results and Analysis

4.3.1. Vulnerability Detection

The experimental results show significant performance disparities among the evaluated tools, with TPH-Fuzz demonstrating superior detection capabilities across all metrics. As shown in Table 8, TPH-Fuzz detects 846 true positives (TP) with 102 false positives (FP), achieving an F1-score of 83.19%, a precision of 89.24%, and a recall of 83.19%. These results significantly outperform the baseline methods. In contrast, Mythril, the second-best performer, achieves an F1-score of only 52.62% and a recall of 42.57%, constrained by path explosion in symbolic execution. TPH-Fuzz’s hybrid approach balances precision and coverage. In Ether Freezing (EF) detection, it produces no false positives (129 TP/0 FP). In Unprotected Selfdestruct (US) analysis, it detects 94 TP, a 1.8× improvement over Mythril’s 51 TP.

Notably, TPH-Fuzz addresses key limitations of existing approaches. For Block Dependency (BD) vulnerabilities, it achieves 185 TP with only 10 FP, surpassing sFuzz (52 TP/34 FP) through enhanced control flow tracking. In Integer Overflow (IO) detection, TPH-Fuzz reduces FP to 9 (97 TP). Through runtime value validation, it avoids ConFuzzius’s severe over-approximation (35 TP/236 FP). Mythril detects 85 Reentrancy (RE) vulnerabilities, but its 39 FP point to deficiencies in state transition analysis; TPH-Fuzz’s context-aware mechanism achieves 82 TP/33 FP. ILF’s learning-based strategy fails catastrophically on Unchecked Calls (UC), yielding 142 FP (45 TP), while TPH-Fuzz maintains 84.0% precision (147 TP/28 FP).

The performance metrics in Figure 5 further support these findings. TPH-Fuzz’s 83.19% recall reflects comprehensive coverage of the labeled vulnerabilities. Securify, by contrast, shows critical gaps (39.62% recall) caused by the limitations of static analysis. ConFuzzius’s hybrid approach reaches a recall of 39.41% but generates 412 FP, which lowers its F1-score to 45.08%. sFuzz and ILF exhibit systemic failures, with ILF’s F1-score of 20.50% reflecting its poor generalization. These results empirically confirm that TPH-Fuzz’s integration of dynamic symbolic execution and two-phase fuzzing overcomes the precision–recall trade-off inherent in single-paradigm tools. This enables robust detection across various vulnerability types.

4.3.2. Branch Coverage

Figure 6 illustrates the branch coverage performance of various fuzzing tools on both small- and large-scale smart contracts. TPH-Fuzz reaches stable convergence faster than all baseline methods and consistently maintains the highest coverage rates. For small contracts, TPH-Fuzz attains near-optimal branch coverage of 90%, significantly outperforming competing tools. This advantage persists in large-scale contract analysis, where TPH-Fuzz sustains 85% coverage despite heightened complexity, a testament to its robust adaptability. In contrast, the alternative approaches exhibit significant limitations. ConFuzzius peaks at 70% coverage on small contracts and 65% on large contracts, while sFuzz and ILF fail to exceed 50% coverage in either category.

Technical limitations of the baseline tools explain this performance gap:

- sFuzz uses branch distance metrics to evolve test cases iteratively toward target branches, but its reliance on population-based optimization requires a large number of test cases to achieve meaningful coverage.
- ConFuzzius uses constraint solving for test case mutation but neglects the critical impact of transaction sequence ordering on contract state transitions.
- ILF’s learning-based methodology is constrained by training data biases and generalizes poorly.

In contrast, TPH-Fuzz integrates static analysis and constraint solving within a hybrid framework. Its initial population is generated through data dependency analysis, which yields diverse transaction sequences. This foundation is then continuously refined through adaptive optimization, enabling efficient navigation of complex branching conditions.

4.3.3. Component Evaluation

The ablation study (Table 9) reveals critical dependencies between TPH-Fuzz’s components and its overall efficacy. When the symbolic analyzer is removed (WSA), branch coverage plunges by 40.3% (86.41% → 51.62%), directly undermining detection reliability. Precision drops to 62.03% and recall collapses to 36.56%, a 44.7% reduction in F1-score (83.19% → 46.00%). This stark decline highlights the analyzer’s irreplaceable role in resolving complex path constraints; without it, random exploration rarely generates test cases that pass intricate logical checks.
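The percentages reported in this ablation are relative changes with respect to the full system, not percentage-point differences. For the WSA configuration they can be reproduced as:

$$\Delta_{\text{cov}} = \frac{86.41 - 51.62}{86.41} \approx 40.3\%, \qquad \Delta_{F_1} = \frac{83.19 - 46.00}{83.19} \approx 44.7\%$$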

Disabling the sequence optimizer (WSO) degrades branch coverage by 20.9% (86.41% → 68.34%), with precision and recall falling to 66.48% and 43.09%, respectively. The optimizer’s absence drastically limits the system’s ability to construct meaningful transaction sequences, particularly for vulnerabilities requiring multi-step state transitions. While partial coverage persists, untargeted sequence generation significantly inflates false positives (236 FP vs. full system’s 102 FP), demonstrating the necessity of guided state exploration.

Removing the two-phase policy (WTPF) yields more nuanced effects. Despite modest coverage loss (−5.7%), detection precision declines sharply (78.16% vs. 89.24%), with a 16.5% F1-score reduction (83.19% → 69.49%). This exposes the policy’s pivotal role in balancing mutation strategies: without adaptive resource allocation, energy is wasted on low-yield code regions, increasing spurious alerts (182 FP vs. 102 FP) and delaying critical vulnerability discovery.

The ablation study demonstrates that the integrated synergy of TPH-Fuzz’s components is essential for achieving comprehensive detection capabilities. Disabling any individual module—the symbolic analyzer, sequence optimizer, or two-phase policy—results in significant performance degradation across precision, recall, and coverage metrics. The symbolic analyzer is critical for resolving complex logical constraints, the sequence optimizer enables targeted exploration of state-dependent paths, and the two-phase policy ensures efficient resource allocation. Statistical analysis confirms that their combined operation accounts for the overwhelming majority of performance improvements over baseline methods. This validates that robust smart contract analysis fundamentally requires a harmonized integration of complementary techniques, as standalone approaches fail to address the multifaceted nature of real-world vulnerabilities.

5. Conclusions

This study establishes TPH-Fuzz as a framework that addresses critical limitations in smart contract vulnerability detection. Its two-phase hybrid fuzzing architecture combines static analysis and symbolic execution to tackle three core challenges:

- imbalanced resource allocation, via hierarchical testing;
- inefficient random path exploration, through symbolic execution;
- state-space complexity, using data dependency modeling.

Experimental validation across 9309 contracts (coverage analysis) and 1086 labeled cases (vulnerability detection) confirms its efficacy, with 85% branch coverage on complex contracts and 89.24% detection precision. These findings support our research hypothesis that synergistic static–dynamic analysis surpasses single-method approaches in both exploration depth and detection accuracy.

However, several limitations warrant discussion. First, the diverse datasets used in our experiments were drawn from publicly available smart contracts. As a result, they might not fully represent the range of real-world vulnerabilities and could be biased toward common contract types. Additionally, while TPH-Fuzz excels at identifying established vulnerabilities, it may overlook unknown or emerging ones, thereby limiting its overall applicability. Our evaluation was also confined to a comparison with five representative smart contract analysis tools. Although these tools cover a broad spectrum of detection capabilities, the selection may have excluded innovative or niche solutions with unique strengths. Future research should broaden this comparison to encompass a wider array of tools, thereby providing a more comprehensive overview of the smart contract analysis landscape.

Future work will extend the applicability of TPH-Fuzz to other blockchain platforms, enhancing its generalizability and robustness. Given that the security challenges of smart contracts are not exclusive to Ethereum, we plan to explore its application on other major blockchain environments, such as Binance Smart Chain and Solana. Another promising direction is real-time detection in dynamic environments, where TPH-Fuzz can be embedded into live transaction streams to continuously monitor and identify emerging vulnerabilities, ensuring proactive security throughout the smart contract lifecycle.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, and investigation, F.S.; writing—original draft preparation, F.S. and Z.G.; writing—review and editing, Z.G. and J.Y.; project administration, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

EVM	Ethereum Virtual Machine
DeFi	Decentralized Finance
SCVD	Smart Contract Vulnerability Detection
PoW	Proof of Work
PoS	Proof of Stake
EOAs	Externally Owned Accounts
CAs	Contract Accounts
MPT	Merkle Patricia Tree
AST	Abstract Syntax Tree
ABI	Application Binary Interface
CFG	Control Flow Graph
BD	Block Dependency
RE	Reentrancy
IO	Integer Overflow
AF	Assert Failure
EF	Ether Freezing
UC	Unchecked Low-Level Call
US	Unprotected Selfdestruct
UD	Unsafe Delegatecall
TP	True Positive
FP	False Positive
TN	True Negative
FN	False Negative
WSA	Without Symbolic Analyzer
WSO	Without Sequence Optimizer
WTPF	Without Two-Phase Fuzzing

References

Szabo, N. Smart Contracts: Building Blocks for Digital Markets. EXTROPY J. Transhumanist Thought 1996, 16, 18–23.
Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 14 February 2025).
Wood, G. Ethereum: A Secure Decentralized Generalized Transaction Ledger. Ethereum Proj. Yellow Pap. 2014, 1–32. Available online: https://ethereum.github.io/yellowpaper/paper.pdf (accessed on 14 February 2025).
Solidity Documentation. Available online: https://soliditylang.org (accessed on 14 February 2025).
Cagigas, D.; Clifton, J.; Diaz-Fuentes, D.; Fernandez-Gutierrez, M. Blockchain for Public Services: A Systematic Literature Review. IEEE Access 2021, 9, 13904–13921.
Javaid, M.; Haleem, A.; Singh, R.; Suman, R.; Khan, S. A Review of Blockchain Technology Applications for Financial Services. BenchCouncil Trans. Benchmarks, Stand. Eval. 2022, 2, 100073.
Doguchaeva, S.; Zubkova, S.; Katrashova, Y. Blockchain in Public Supply Chain Management: Advantages and Risks. Transp. Res. Procedia 2022, 63, 2172–2178.
Etherscan. Available online: https://etherscan.io (accessed on 14 February 2025).
DeFiLlama. Available online: https://defillama.com (accessed on 14 February 2025).
Mehar, M.I.; Shier, C.L. Understanding a Revolutionary and Flawed Grand Experiment in Blockchain: The DAO Attack. J. Cases Inf. Technol. 2019, 21, 19–32.
The Parity Wallet Hack Explained. Available online: https://blog.openzeppelin.com/on-the-parity-wallet-multisig-hack-405a8c12e8f7 (accessed on 14 February 2025).
Awesome Buggy ERC20 Tokens. Available online: https://github.com/sec-bit/awesome-buggy-erc20-tokens (accessed on 14 February 2025).
The Poly Network Hack Explained. Available online: https://research.kudelskisecurity.com/2021/08/12/the-poly-network-hack-explained/ (accessed on 14 February 2025).
Wormhole. Available online: https://learnblockchain.cn/article/3650 (accessed on 14 February 2025).
Euler Finance. Available online: https://learnblockchain.cn/article/5551 (accessed on 14 February 2025).
Curve Finance. Available online: https://learnblockchain.cn/article/6265 (accessed on 14 February 2025).
Penpie. Available online: https://www.cnblogs.com/ACaiGarden/p/18399387 (accessed on 14 February 2025).
Kalra, S.; Goel, S.; Dhawan, M.; Sharma, S. ZEUS: Analyzing Safety of Smart Contracts. In Proceedings of the Network and Distributed System Security Symposium (NDSS) 2018, San Diego, CA, USA, 18–21 February 2018; Internet Society: San Diego, CA, USA, 2018; pp. 1–16. Available online: https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_09-1_Kalra_paper.pdf (accessed on 14 February 2025).
Tsankov, P.; Dan, A.; Drachsler-Cohen, D.; Gervais, A.; Bünzli, F.; Vechev, M. Securify: Practical Security Analysis of Smart Contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18), Toronto, ON, Canada, 15–19 October 2018; ACM: Toronto, ON, Canada, 2018; pp. 67–82.
Permenev, A.; Dimitrov, D.; Tsankov, P.; Drachsler-Cohen, D.; Vechev, M. VerX: Safety Verification of Smart Contracts. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy, San Francisco, CA, USA, 18–21 May 2020; IEEE: London, UK, 2020; pp. 1–16.
Brent, L.; Jurisevic, A.; Kong, M.; Liu, E.; Gauthier, F.; Gramoli, V.; Holz, R.; Scholz, B. Vandal: A Scalable Security Analysis Framework for Smart Contracts. arXiv 2018, arXiv:1809.03981.
Hildenbrandt, E.; Saxena, M.; Rodrigues, N.; Zhu, X.; Daian, P.; Guth, D.; Moore, B.; Zhang, Y.; Park, D.; Stefanescu, A.; et al. KEVM: A Complete Formal Semantics of the Ethereum Virtual Machine. In Proceedings of the 31st IEEE Computer Security Foundations Symposium (CSF 2018), Oxford, UK, 9–12 July 2018; IEEE: Oxford, UK, 2018; pp. 204–217.
Murray, Y.; Anisi, D.A. Survey of Formal Verification Methods for Smart Contracts on Blockchain. In Proceedings of the 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Canary Islands, Spain, 24–26 June 2019; IEEE: Canary Islands, Spain, 2019; pp. 1–6.
Luu, L.; Chu, D.-H.; Olickel, H.; Saxena, P.; Hobor, A. Making Smart Contracts Smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16), Vienna, Austria, 24–28 October 2016; ACM: Vienna, Austria, 2016; pp. 254–269.
Torres, C.F.; Schütte, J.; State, R. Osiris: Hunting for Integer Bugs in Ethereum Smart Contracts. In Proceedings of the 34th Annual Computer Security Applications Conference, San Juan, PR, USA, 3–7 December 2018; pp. 664–676.
Mossberg, M.; Manzano, F.; Hennenfent, E.; Groce, A.; Grieco, G.; Feist, J.; Brunson, T.; Dinaburg, A. Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 1186–1189.
Chen, J.; Xia, X.; Lo, D.; Grundy, J.; Luo, X.; Chen, T. DefectChecker: Automated Smart Contract Defect Detection by Analyzing EVM Bytecode. IEEE Trans. Softw. Eng. 2020, 48, 2189–2207.
ConsenSys. Mythril: A Security Analysis Tool for Ethereum Smart Contracts. Available online: https://github.com/ConsenSys/mythril (accessed on 14 February 2025).
Baldoni, R.; Coppa, E.; D’Elia, D.C.; Demetrescu, C.; Finocchi, I. A Survey of Symbolic Execution Techniques. ACM Comput. Surv. 2018, 51, 50.
Gao, Z.; Jayasundara, V.; Jiang, L.; Xia, X.; Lo, D.; Grundy, J. SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding. In Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), Cleveland, OH, USA, 29 September–4 October 2019; pp. 394–397.
Liu, Z.; Qian, P.; Wang, X.; Zhuang, Y.; Qiu, L.; Wang, X. Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection. IEEE Trans. Knowl. Data Eng. 2023, 35, 1296–1310.
Ashizawa, N.; Yanai, N.; Cruz, J.P.; Okamura, S. Eth2Vec: Learning Contract-Wide Code Representations for Vulnerability Detection on Ethereum Smart Contracts. In Proceedings of the 3rd ACM International Symposium on Blockchain and Secure Critical Infrastructure, Hong Kong, China, 7 June 2021; pp. 47–59.
Budach, L.; Feuerpfeil, M.; Ihde, N.; Nathansen, A.; Noack, N.S.; Patzlaff, H.; Harmouch, H.; Naumann, F. The Effects of Data Quality on ML-Model Performance. arXiv 2022, arXiv:2207.14529. Available online: https://api.semanticscholar.org/CorpusID:251196935 (accessed on 14 February 2025).
Jiang, B.; Liu, Y.; Chan, W.K. ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France, 3–7 September 2018; pp. 259–269. [Google Scholar] [CrossRef]
Nguyen, T.D.; Pham, L.H.; Sun, J.; Lin, Y.; Minh, Q.T. sFuzz: An Efficient Adaptive Fuzzer for Solidity Smart Contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 778–788. [Google Scholar] [CrossRef]
Torres, C.F.; Iannillo, A.K.; Gervais, A.; State, R. ConFuzzius: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy, Vienna, Austria, 6–10 September 2021; pp. 103–119. [Google Scholar] [CrossRef]
Zhang, T.; Jiang, Y.; Guo, R.; Zheng, X.; Lu, H. A Survey of Hybrid Fuzzing based on Symbolic Execution. In Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies, Guangzhou, China, 4–6 December 2020; pp. 192–196. [Google Scholar] [CrossRef]
Liu, Z.; Qian, P.; Yang, J.; Liu, L.; Xu, X.; He, Q.; Zhang, X. Rethinking Smart Contract Fuzzing: Fuzzing with Invocation Ordering and Important Branch Revisiting. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1237–1251. [Google Scholar] [CrossRef]
Zheng, Z.; Xie, S.; Dai, H.; Chen, X.; Wang, H.; Li, Q. Blockchain Challenges and Opportunities: A Survey. Int. J. Web Grid Serv. 2018, 14, 352–375. [Google Scholar] [CrossRef]
Atzei, N.; Bartoletti, M.; Cimoli, T. A Survey of Attacks on Ethereum Smart Contracts (SoK). In Proceedings of the 6th International Conference on Principles of Security and Trust, Uppsala, Sweden, 22–29 April 2017; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10204, pp. 164–186.
Crytic. Slither: A Static Analysis Framework for Smart Contracts. Available online: https://github.com/crytic/slither (accessed on 14 February 2025).
OpenZeppelin. OpenZeppelin: Secure Smart Contract Development Framework. Available online: https://openzeppelin.com (accessed on 14 February 2025).
Ethereum Stack Exchange. Ethereum Stack Exchange: Q&A for Ethereum Developers. Available online: https://ethereum.stackexchange.com (accessed on 14 February 2025).
Reddit. r/ethereum: Community for Ethereum Discussions. Available online: https://www.reddit.com/r/ethereum/ (accessed on 14 February 2025).
Miller, B.P.; Fredriksen, L.; So, B. An Empirical Study of the Reliability of UNIX Utilities. Commun. ACM 1990, 33, 32–44.
Manès, V.J.M.; Han, H.; Han, C.; Cha, S.K.; Egele, M.; Schwartz, E.J.; Woo, M. The Art, Science, and Engineering of Fuzzing: A Survey. IEEE Trans. Softw. Eng. 2021, 47, 2312–2331.
Zhu, X.; Wen, S.; Camtepe, S.; Xiang, Y. Fuzzing: A Survey for Roadmap. ACM Comput. Surv. 2022, 54, 230.
Boehme, M.; Cadar, C.; Roychoudhury, A. Fuzzing: Challenges and Reflections. IEEE Softw. 2021, 38, 79–86.
Mallissery, S.; Wu, Y. Demystify the Fuzzing Methods: A Comprehensive Survey. ACM Comput. Surv. 2023, 56, 71.
Zhang, W.; Banescu, S.; Passos, L.; Stewart, S.; Ganesh, V. MPro: Combining Static and Symbolic Analysis for Scalable Testing of Smart Contracts. In Proceedings of the 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), Berlin, Germany, 28–31 October 2019; IEEE: New York, NY, USA, 2019; pp. 456–462.
He, J.; Balunović, M.; Ambroladze, N.; Tsankov, P.; Vechev, M. Learning to Fuzz from Symbolic Execution with Application to Smart Contracts. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 531–548.

Figure 1. The state machine of Ethereum.

Figure 2. The EVM execution model.

Figure 3. Workflow of fuzzing.

Figure 4. Workflow of the TPH-Fuzz. The steps are labeled as follows: (①) Compilation; (②) Static Analysis; (③) Function Call Sequence; (④) New Seeds; (⑤) Test Cases; (⑥) Execution Trace.

Figure 5. Comparison of F1-score, accuracy, and recall across different tools.

Figure 6. Branch coverage of different tools on small-scale (left) and large-scale (right) contracts.

Table 1. Smart contract attack events.

Time	Target of Attack	Amount of Damage
June 2016	The DAO [10]	60,000,000 USD
July 2017	Parity Wallet [11]	30,000,000 USD
April 2018	BeautyChain [12]	100,000,000 USD
August 2021	Poly Network [13]	610,000,000 USD
February 2022	Wormhole [14]	326,000,000 USD
March 2023	Euler Finance [15]	196,000,000 USD
July 2023	Curve Finance [16]	70,000,000 USD
September 2024	Penpie [17]	27,340,000 USD

Table 2. Common and destructive vulnerabilities in smart contract security.

Vulnerability Type	Classification	Danger Level	Description
Reentrancy (RE)	SWC-107/CWE-841	High	Smart contracts can use a fallback function to receive Ether; this function takes no arguments and returns no values. When a call matches no other function in the contract (for example, a plain Ether transfer to a contract without a `receive` function), the fallback function runs automatically. If the recipient’s fallback function calls back into the sending contract, it can create a reentrancy vulnerability, allowing an attacker to repeatedly trigger the sender’s functions before its state is fully updated.
Integer Overflow (IO)	SWC-101/CWE-682	High	Integer overflow occurs when an arithmetic operation exceeds the range of a data type, causing the value to wrap around (in Solidity versions before 0.8.0, or inside an `unchecked` block). For example, adding 1 to a `uint8` value of 255 yields 0. Underflow occurs when a subtraction produces a value below 0, which wraps around to the maximum value of the type. A notable exploitation of this vulnerability was the 2018 BEC token incident, where attackers created an enormous number of tokens [12].
Assert Failure (AF)	SWC-110/CWE-670	Medium–High	The assert failure vulnerability occurs when a critical condition is not met during contract execution, causing an `assert` check to fail and triggering an error that reverts the transaction. Assertions are used to ensure internal state consistency, but improper conditions or design flaws can be exploited by attackers, leading to contract failures or irrecoverable states.
Block Dependency (BD)	SWC-116/CWE-829	Medium	This vulnerability arises when contract logic depends on unstable blockchain parameters (e.g., `TIMESTAMP`, `NUMBER`, `DIFFICULTY`) that miners can manipulate, allowing attackers to exploit predictable values to alter contract behavior.
Ether Freezing (EF)	SWC-135/CWE-1164	Medium	Design flaws or permission issues in a smart contract can lock Ether, making it untransferable. This can happen because of a missing withdrawal function, a failed critical call, or a dependency on a faulty external contract, potentially trapping funds permanently.
Unchecked Low-Level Call (UC)	SWC-104/CWE-252	Medium	This vulnerability occurs when a Solidity contract uses low-level functions such as `call`, `send`, `delegatecall`, or `callcode` without checking their return values. These calls interact directly with other contracts but do not revert automatically when the callee fails; instead, they return `false`, so developers must check for success explicitly.
Unprotected Selfdestruct (US)	SWC-106/CWE-284	High	This vulnerability occurs when a contract’s `SELFDESTRUCT` is unprotected, allowing unauthorized users to trigger it. Since `SELFDESTRUCT` permanently removes a contract and transfers its Ether, an attacker could terminate the contract, disrupt dependent functionalities, and cause fund loss.
Unsafe Delegatecall (UD)	SWC-112/CWE-829	High	`DELEGATECALL` is a special call that allows Contract A to execute code from Contract B within A’s own context, so that B’s code operates on Contract A’s storage, balance, and permissions. Misusing this feature can introduce security risks, as a malicious or untrusted contract could manipulate Contract A’s data, potentially leading to financial loss or contract corruption.
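To make the wrap-around behavior described for Integer Overflow (IO) concrete, the following Python sketch shows how a fuzzer's bug oracle might compare the exact result of an unsigned `ADD`, `SUB`, or `MUL` with the range of the operand type. It is a minimal illustration under our own assumptions, not TPH-Fuzz's detector: the helper names `wrap` and `wraps_around` are hypothetical, and the default width of 256 bits reflects the EVM word size.

```python
EVM_WORD_BITS = 256  # the EVM operates on 256-bit unsigned words


def wrap(value: int, bits: int = EVM_WORD_BITS) -> int:
    """Reduce an exact result modulo 2**bits, as unchecked unsigned arithmetic does."""
    return value % (1 << bits)


def wraps_around(op: str, a: int, b: int, bits: int = EVM_WORD_BITS) -> bool:
    """Hypothetical oracle: True if op(a, b) leaves the unsigned range [0, 2**bits)."""
    limit = 1 << bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must already be valid unsigned values")
    if op == "ADD":
        exact = a + b
    elif op == "SUB":
        exact = a - b
    elif op == "MUL":
        exact = a * b
    else:
        raise ValueError(f"unsupported opcode: {op}")
    # Overflow (exact >= limit) or underflow (exact < 0) means the stored result wrapped.
    return not 0 <= exact < limit


if __name__ == "__main__":
    print(wrap(255 + 1, bits=8))                     # prints 0 (the uint8 example above)
    print(wraps_around("ADD", 255, 1, bits=8))       # prints True (overflow)
    print(wraps_around("SUB", 0, 1))                 # prints True (underflow)
    print(wraps_around("ADD", 2**255, 2**255 - 1))   # prints False (largest sum that fits)
```

At the bytecode level all arithmetic is performed on 256-bit words, so flagging an overflow of a narrower type such as `uint8` additionally requires knowing the declared type width; the sketch simply takes it as the `bits` parameter.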

Table 3. Explanation of the compilation results and their role.

Name	Explanation	Role
EVM Bytecode	The low-level instruction code executed by the EVM, including both contract deployment code and runtime code.	When a vulnerability is discovered, the location in the source code can be identified through EVM bytecode.
ABI	Defines each function’s signature and parameter encoding rules in the contract.	Provides the basis for generating initial test cases, ensuring accurate simulation of intended interactions.
AST	Converts code into a hierarchical tree structure, where each node represents a specific language construct.	Supplies detailed syntax-level context for control flow analysis and understanding internal function structures.
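Because the ABI is the basis for the initial test cases, the following Python sketch shows one way static arguments could be ABI-encoded and randomly generated into calldata. It is illustrative only: the helper names are hypothetical, only the static types `uint256`, `address`, and `bool` are handled, and the 4-byte selector `0xa9059cbb` of the ERC-20 function `transfer(address,uint256)` is hard-coded because Python's `hashlib.sha3_256` implements NIST SHA-3 rather than the Keccak-256 variant Ethereum uses to derive selectors.

```python
import random

# Selector of ERC-20 transfer(address,uint256): first 4 bytes of its Keccak-256 hash.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")
TRANSFER_TYPES = ["address", "uint256"]


def encode_static_arg(abi_type: str, value) -> bytes:
    """ABI-encode one static argument as a 32-byte, left-padded big-endian word."""
    if abi_type == "uint256":
        bound = 1 << 256
    elif abi_type == "address":
        bound = 1 << 160  # 20-byte address, padded with 12 leading zero bytes
    elif abi_type == "bool":
        value, bound = int(bool(value)), 2
    else:
        raise NotImplementedError(f"type not covered by this sketch: {abi_type}")
    if not 0 <= value < bound:
        raise ValueError(f"{value} out of range for {abi_type}")
    return value.to_bytes(32, "big")


def encode_call(selector: bytes, types: list, values: list) -> bytes:
    """Calldata = 4-byte selector followed by one 32-byte word per static argument."""
    if len(types) != len(values):
        raise ValueError("types and values must have the same length")
    return selector + b"".join(encode_static_arg(t, v) for t, v in zip(types, values))


def random_value(abi_type: str, rng: random.Random):
    """Pick a random argument, favoring boundary values (a common fuzzing heuristic)."""
    if abi_type == "uint256":
        return rng.choice([0, 1, (1 << 256) - 1, rng.getrandbits(256)])
    if abi_type == "address":
        return rng.getrandbits(160)
    if abi_type == "bool":
        return rng.random() < 0.5
    raise NotImplementedError(f"type not covered by this sketch: {abi_type}")


def random_test_case(selector: bytes, types: list, rng: random.Random) -> bytes:
    """Build one random transaction input for the given function."""
    return encode_call(selector, types, [random_value(t, rng) for t in types])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    calldata = random_test_case(TRANSFER_SELECTOR, TRANSFER_TYPES, rng)
    print(len(calldata), calldata[:4].hex())  # prints: 68 a9059cbb
```

Dynamic types such as `bytes`, `string`, and arrays use offset-based encoding and would need additional handling beyond this sketch.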

Table 4. Taint sources.

Taint Source Instructions	Category
`CALLDATALOAD`, `CALLDATACOPY`, `CALLDATASIZE`	Input Data
`CALL`, `DELEGATECALL`, `STATICCALL`	Contract Call
`CALLER`, `CALLVALUE`, `GAS`	Transaction Context
`EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH`	Contract Exploration
`TIMESTAMP`, `NUMBER`, `DIFFICULTY`, `COINBASE`, `GASLIMIT`, `BLOCKHASH`, `BALANCE`	Blockchain State
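Taint sources only become useful once their taint is propagated through later instructions. The following Python sketch shows a deliberately simplified, stack-only form of that propagation: each stack slot carries the set of taint-source categories it depends on, binary operations merge the sets, and a tainted `JUMPI` condition or `SSTORE` operand is reported. The opcode subset, the `track_taint` helper, and the reporting rule are our assumptions rather than the paper's implementation; memory, storage reads, and the memory-writing sources `CALLDATACOPY` and `EXTCODECOPY` are not modeled, and `PUSH` stands for any untainted constant with its immediate omitted.

```python
# Hypothetical stack-only taint tracker over a simplified opcode list.
# Each entry: opcode -> (number of stack inputs, category of the pushed value).
TAINT_SOURCES = {
    "CALLDATALOAD": (1, "Input Data"), "CALLDATASIZE": (0, "Input Data"),
    "CALL": (7, "Contract Call"), "DELEGATECALL": (6, "Contract Call"),
    "STATICCALL": (6, "Contract Call"),
    "CALLER": (0, "Transaction Context"), "CALLVALUE": (0, "Transaction Context"),
    "GAS": (0, "Transaction Context"),
    "EXTCODESIZE": (1, "Contract Exploration"), "EXTCODEHASH": (1, "Contract Exploration"),
    "TIMESTAMP": (0, "Blockchain State"), "NUMBER": (0, "Blockchain State"),
    "DIFFICULTY": (0, "Blockchain State"), "COINBASE": (0, "Blockchain State"),
    "GASLIMIT": (0, "Blockchain State"), "BLOCKHASH": (1, "Blockchain State"),
    "BALANCE": (1, "Blockchain State"),
}
BINARY_OPS = {"ADD", "SUB", "MUL", "DIV", "LT", "GT", "EQ", "AND", "OR", "XOR"}


def _pop(stack: list, n: int) -> list:
    """Pop n slots; the returned list is ordered from deepest to topmost."""
    if len(stack) < n:
        raise IndexError("stack underflow")
    taken = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    return taken


def track_taint(program: list) -> list:
    """Return (pc, opcode, sorted categories) for tainted JUMPI conditions and SSTOREs."""
    stack: list = []
    findings = []
    for pc, op in enumerate(program):
        if op == "PUSH":  # constant: carries no taint
            stack.append(frozenset())
        elif op in TAINT_SOURCES:
            n_inputs, category = TAINT_SOURCES[op]
            inputs = _pop(stack, n_inputs)
            stack.append(frozenset({category}).union(*inputs))  # also inherits input taint
        elif op in BINARY_OPS:
            a, b = _pop(stack, 2)
            stack.append(a | b)
        elif op == "ISZERO":
            (a,) = _pop(stack, 1)
            stack.append(a)
        elif op == "JUMPI":
            cond, _dest = _pop(stack, 2)  # destination is on top, condition below it
            if cond:
                findings.append((pc, op, sorted(cond)))
        elif op == "SSTORE":
            value, key = _pop(stack, 2)  # key is on top, value below it
            if key | value:
                findings.append((pc, op, sorted(key | value)))
        else:
            raise NotImplementedError(f"opcode not modeled in this sketch: {op}")
    return findings


if __name__ == "__main__":
    # Branching on block.timestamp: the pattern behind Block Dependency (BD).
    print(track_taint(["TIMESTAMP", "PUSH", "GT", "PUSH", "JUMPI"]))
    # prints: [(4, 'JUMPI', ['Blockchain State'])]
```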

Table 5. Statistics of contract features in the D1 dataset.

Features	Dataset	Min	Max	Mean
Lines of Code	Small	1	1226	172
Lines of Code	Large	45	2097	383
Functions	Small	1	59	16
Functions	Large	1	191	41
Instructions	Small	4	3599	1458
Instructions	Large	3601	15536	5425

Table 6. Distribution of each vulnerability in the D2 dataset.

Vulnerability Type	BD	EF	RE	IO	AF	UC	US	UD
Counts	228	144	147	117	68	183	122	77

Table 7. Baseline tools: detection methods and vulnerability types.

Baseline	Type	BD	EF	RE	IO	AF	UC	US	UD
sFuzz	Fuzzer	✓ ¹	✓	✓	✓		✓		✓
ILF	Fuzzer	✓	✓				✓		✓
ConFuzzius	Fuzzer	✓	✓	✓	✓	✓	✓	✓	✓
Mythril	Symbolic Execution	✓		✓	✓	✓	✓	✓	✓
Securify	Formal Verification	✓	✓	✓			✓	✓	✓
TPH-Fuzz	Fuzzer	✓	✓	✓	✓	✓	✓	✓	✓

¹ ✓ indicates that the tool supports the detection of this type of vulnerability.

Table 8. True positives/false positives (TP/FP) of the vulnerability detection results for each tool; "-" marks vulnerability types the tool does not support (see Table 7).

Methods	BD	EF	RE	IO	AF	UC	US	UD	Total
sFuzz	52/34	2/10	65/14	20/3	-	24/52	-	8/0	171/113
ILF	27/4	14/8	-	-	-	45/142	-	4/2	90/156
ConFuzzius	120/58	116/12	12/5	35/236	44/19	27/6	37/13	66/13	428/385
Mythril	69/94	-	85/39	57/11	29/8	76/18	51/6	34/5	401/181
Securify	101/46	73/10	30/67	-	-	108/35	19/31	26/16	357/205
TPH-Fuzz	185/10	129/0	82/33	97/9	53/3	147/28	94/17	59/12	846/102

Table 9. Ablation study on component contributions to detection performance.

Methods	TP/FP	Branch Coverage	Precision	Recall	F1-Score
TPH-Fuzz-WSA	397/243	51.62%	62.03%	36.56%	46.00%
TPH-Fuzz-WSO	468/236	68.34%	66.48%	43.09%	52.29%
TPH-Fuzz-WTPF	662/182	80.75%	78.16%	60.96%	69.49%
TPH-Fuzz	846/102	86.41%	89.24%	77.90%	83.19%
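The precision, recall, and F1 values in Tables 8 and 9 follow from the TP/FP counts. The following Python sketch makes the arithmetic explicit under one assumption that is inferred rather than stated in the text: recall is computed against all 1086 labeled vulnerabilities in D2 (the sum of the counts in Table 6), so false negatives equal 1086 - TP. With that assumption, the sketch reproduces the reported precision, recall, and F1 of the TPH-Fuzz-WSA, TPH-Fuzz-WSO, and full TPH-Fuzz rows. The helper name `detection_metrics` is hypothetical.

```python
# Total labeled vulnerabilities in D2: the sum of the per-type counts in Table 6.
D2_COUNTS = {"BD": 228, "EF": 144, "RE": 147, "IO": 117, "AF": 68, "UC": 183, "US": 122, "UD": 77}
D2_TOTAL = sum(D2_COUNTS.values())  # 1086


def detection_metrics(tp: int, fp: int, total: int = D2_TOTAL) -> tuple:
    """Return (precision, recall, F1) in percent, rounded to two decimals.

    Assumes every labeled vulnerability that is not reported is a false negative.
    """
    fn = total - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # equals tp / total
    f1 = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR / (P + R)
    return tuple(round(100 * m, 2) for m in (precision, recall, f1))


if __name__ == "__main__":
    rows = [("TPH-Fuzz-WSA", 397, 243), ("TPH-Fuzz-WSO", 468, 236), ("TPH-Fuzz", 846, 102)]
    for name, tp, fp in rows:
        print(name, detection_metrics(tp, fp))
```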

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shi, F.; Yang, J.; Guo, Z. TPH-Fuzz: A Two-Phase Hybrid Fuzzing Framework for Smart Contract Vulnerability Detection. Electronics 2025, 14, 1465. https://doi.org/10.3390/electronics14071465

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
