Article

Incremental Formula-Based Fix Localization

1 Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea
2 College of Computing, Sungkyunkwan University, Suwon 16419, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(1), 303; https://doi.org/10.3390/app11010303
Submission received: 13 November 2020 / Revised: 21 December 2020 / Accepted: 23 December 2020 / Published: 30 December 2020

Abstract:
Automatically fixing bugs in software programs can significantly reduce costs and improve productivity in software development. Toward this goal, a critical and challenging problem is automatic fix localization, which identifies program locations where a bug fix can be synthesized. In this paper, we present AgxFaults, a technique that automatically identifies minimal subsets of program statements at which a suitable modification can remove the error. AgxFaults works by dynamically encoding the semantics of the program parts that are relevant to an observed error into an unsatisfiable logical formula and then manipulating this formula in an incremental, on-demand manner. We perform various experiments on faulty versions of the traffic collision avoidance system (TCAS) program in the Siemens Suite, programs in Bekkouche’s benchmark, and several real bugs in the Defects4J benchmark. The experimental results show that AgxFaults outperforms single-path-formula approaches in the effectiveness of both fix localization and fault localization. AgxFaults is more efficient and scalable than program-formula-based approaches while providing similar effectiveness: its solving time is 28% faster, and its running time is 45% faster, than the program-formula-based approach, with similar fault localization results.

1. Introduction

Debugging is an essential and yet the most expensive task in software development [1,2]. It includes a labor-intensive process of locating and fixing faulty code in a buggy program. This process consumes about 50% of the total software development costs, of which the majority is spent on fault localization [1]. Automatic techniques that reduce manual effort in debugging can significantly impact software costs and productivity [3].
Many automatic techniques have been proposed for supporting developers in various debugging activities (e.g., References [4,5,6,7]). Most fault localization approaches (e.g., spectrum-based or mutation-based methods [8,9]) focus on computing a statistical measure of suspiciousness to rank program statements by their likelihood of being faulty. However, to be useful, these methods require a test suite containing many passing and failing executions with high code coverage [10,11]. Such a high-quality test suite is often not available in practice. Moreover, these methods provide only a ranked list of suspicious statements without any explanation; thus, developers still need considerable inspection effort to examine these statements in order to localize and fix faults [3].
Formula-based fault localization (FFL) is a particularly promising approach, as it not only logically identifies possible fault locations but also provides additional information that helps to explain and fix the faults [5]. Consider a buggy program and a failing test case whose execution trace, called the error trace, demonstrates an error. FFL techniques work by constructing an unsatisfiable logical formula, called the error trace formula, that is a symbolic representation of the error trace, and by using an automatic solver to find the causes of this formula’s unsatisfiability. Based on the solution obtained from the solver, possible faulty statements in the program can be logically identified. Existing FFL techniques differ in how they construct the error trace formula and how they use the automatic solver to manipulate the formula for locating the fault.
In References [12,13,14,15], a static analysis technique is used to construct a logical formula, called the program-formula, that is semantically equivalent to the input program (with respect to a certain unwinding bound). Specifically, every satisfying assignment of the program-formula corresponds to a feasible execution of the program, and, vice versa, every program execution (within the bound) corresponds to a satisfying assignment of the formula. This program-formula is then conjoined with clauses encoding the input values and the assertions of a failing test case to form an unsatisfiable formula. The extended formula is fed into a pMaxSAT solver, which finds an assignment to the formula’s variables that maximizes the number of satisfied clauses. The set of unsatisfied clauses, called a minimal correction subset (MCS) of the formula, indicates a corresponding minimal set of program statements that can be modified to correct the considered error execution. In addition, the obtained variable assignment corresponds to a feasible correct execution of the angelically fixed program, which replaces all statements in the MCS with suitable angelic values. As a result, these approaches can provide developers with a potential minimal fix location, together with a successful angelic execution as an explanation.
A key limitation of these methods is that they are extremely computationally expensive and have scalability issues even with small programs. This is because they encode all possible execution paths of a program into one formula, which easily leads to a very large and complex formula that recent solvers cannot handle, or can handle only with difficulty. Jin [16] proposed an on-demand formula computation (OFC) technique to construct a smaller formula that encodes only the program parts relevant to a given test case. Their experimental evaluation showed that OFC formulas are much simpler than program-formulas but still sufficient to produce the same result as the program-formulas. This method, however, requires computing all MCSs of multiple intermediate formulas before obtaining the final formula. Enumerating all MCSs of many formulas may outweigh the benefit of generating a simpler formula.
The approaches in References [17,18,19] work with a formula encoding the semantics of a single execution path; we refer to this formula as the single-path trace formula. A single-path trace formula is semantically equivalent to a straight-line program containing the program statements whose execution produced the error. Although single-path trace formulas are simple and thus easy to solve, they do not contain information about the control dependences among statements in the original program, so an MCS obtained from them may not correctly correspond to an angelic fix in the original program. As a result, these methods may fail to identify some angelic fix locations, and they may also report invalid angelic fix candidates.
In this paper, we present AgxFaults, an incremental formula-based fault localization method that overcomes the above limitations. Our method is based on two main components. First, instead of relying on a static formula that encodes the entire program (like the program-formula) or only a single execution path (like the single-path formula), AgxFaults is based on an error formula that is constructed and extended dynamically in an on-demand manner. This formula, called the angelic error trace formula, encodes only the program parts that are relevant to a specific failing test case and over-abstracts all unrelated program parts as angelic non-deterministic executions. This encoding yields compact error trace formulas that are easier to solve but still sufficient to identify both data- and control-related faults. Second, because the angelic formula grows dynamically, instead of making multiple separate calls to a MaxSMT solver to solve multiple formulas (as in the OFC approach), we adapt the incremental core-guided MaxSMT algorithm [20] to manipulate the formula and compute the MCSs incrementally.
We implemented our method in a tool named AgxFaults by extending Java PathFinder (a NASA model checking tool) to localize faults in Java programs. The input of AgxFaults is a buggy program and failing test cases (given either as a JUnit test case or as a pair of input and expected post-condition in a configuration file). AgxFaults outputs a set of minimal angelic fix candidates (MFCs); each MFC is a pair consisting of a minimal fix location set and an angelic execution path showing that modifying these statements makes the given failing test case pass.
We evaluated our method on various programs of different kinds and sizes. These programs include several sample programs provided by Bekkouche [19], 41 faulty versions of a commercial traffic collision avoidance system (TCAS), and several large and complex real-world programs in the Defects4J benchmark. The experimental results show that AgxFaults succeeded in reporting the actual fault location for all bugs in both the Bekkouche and TCAS benchmarks. AgxFaults outperformed single-path-formula approaches in both the success rate and the accuracy of fault localization, and it provided results similar to the program-formula approach with better efficiency and scalability. Specifically, AgxFaults has a 28% faster formula solving time and a 45% faster running time than the program-formula approach when applied to the TCAS programs. Furthermore, as program complexity increased (e.g., with a larger loop unwinding bound), the formula solving time of the program-formula-based method grew exponentially, while that of AgxFaults grew much more slowly.
In summary, we have made the following contributions in this paper:
  • We propose a technique to dynamically encode the semantics of the partial program related to a specific test input into a formula in an incremental, on-demand manner.
  • We present an iterative algorithm for enumerating minimal angelic fix candidates by manipulating and solving the constructed error formula incrementally.
  • We implement our proposed method in a tool, AgxFaults, which is publicly available as open-source software.
  • We perform experiments on various public benchmarks and open-source projects to show the effectiveness of our proposed approach.
The rest of this paper is organized as follows. We first provide a basic background in Section 2. We then describe the detail of the proposed method in Section 3. We describe our experimental setup in Section 4 and discuss the experimental results in Section 5. We review related work in Section 6. Finally, we give our conclusions in Section 7.

2. Background

In this section, we describe the fault localization problem and provide a running example. Then, we explain the basic background of maximal satisfiability-based fault localization.

2.1. Fault Localization Problem

Fault localization is the problem of identifying the program statements that are responsible for an observed failure in a software program. Without knowing the correct program in advance, it is impractical to automatically pinpoint faults with absolute accuracy. Indeed, any program statement, or subset of program statements, that, if suitably modified, can remove the failure is considered possibly faulty [12,14,16]. Since checking the existence of an actual syntactic fix for a program is extremely computationally expensive [7,21], we instead check for the existence of an angelic fix, which removes the failure angelically by replacing program expressions with suitable non-deterministic values (i.e., angelic values).
In the context of this paper, we consider fault localization to be the problem of finding angelic fix candidates for an observed failure in a software program. An angelic fix candidate (AFC) consists of (1) a fix location set (i.e., a set of suspicious statements) and (2) angelic values (i.e., a set of values that, if substituted at these statements, would make the given failing execution succeed). Essentially, the angelic values represent an angelic execution that diverges from the original program execution by replacing the values of specific variables at the fix locations with their corresponding angelic values. Because a large number of angelic fix candidates may exist, providing the developer with a set containing only minimal angelic fix candidates (i.e., candidates whose fix location set is minimal) is preferable. Thus, given a faulty program and a failing test case whose execution demonstrates a failure, we produce a set of minimal angelic fix candidates (MFCs) that make the given test execution succeed.
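As a sketch of the data involved, an angelic fix candidate can be represented as a pair of a fix location set and a map of angelic values. The class and field names below are our own illustration (populated with the values of mfc_3 and the larger candidate discussed later in this section), not AgxFaults code:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class AngelicFixCandidate:
    """A fix location set plus the angelic values that repair the failing run."""
    locations: List[int]            # statement (line) numbers forming the fix set
    angelic_values: Dict[str, int]  # variable name -> angelic value at the fix point

    def is_smaller_than(self, other: "AngelicFixCandidate") -> bool:
        # A candidate is minimal when no other candidate uses a proper
        # subset of its fix locations.
        return set(self.locations) < set(other.locations)

# The single-location candidate makes the two-location one non-minimal.
mfc = AngelicFixCandidate(locations=[3], angelic_values={"a": 5})
afc = AngelicFixCandidate(locations=[3, 1], angelic_values={"a": 5, "b": 3})
assert mfc.is_smaller_than(afc)
```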
Figure 1 shows a method foo and a unit test method testFoo, which checks whether foo returns a certain value when called on a particular input. The unit test method testFoo calls foo with the input x = 3, y = 5 and asserts that the output is equal to 2.
However, because of a fault at line 3, where the assignment (a = 2x) is accidentally written as (a = x), the method foo returns 8, thus violating the assertion, and the test fails. After running the test case testFoo, we know there is a fault in the program foo. Without knowledge of the correct program, however, a program statement can be considered possibly faulty if there exists a suitable replacement for it that makes the observed error disappear.
We use x = (L, V) to denote an angelic fix candidate x, where L = {l_1, …, l_n} is its fix location set and V = {v_1 = val_1, …, v_m = val_m} is the corresponding set of angelic values. In reality, the sizes of L and V may differ, e.g., a statement may have multiple instances in an execution. To simplify the presentation, however, we assume m = n; thus, the angelic value V[i] corresponds to the statement at fix location L[i]. Each item v_i = val_i in the angelic value set V maps an angelic value val_i to the variable v_i at fix location i.
Given the program and a failing test case in Figure 1, our approach produces five minimal angelic fix candidates: mfc_1 = ({8}, {return = 2}); mfc_2 = ({6}, {guard = True}); mfc_3 = ({3}, {a = 5}); mfc_4 = ({1}, {b = 2}); and mfc_5 = ({2, 4}, {guard = False, a = 5}).
Consider MFC mfc_3 = ({3}, {a_1 = 5}). This MFC states that the failure can be removed by modifying the assignment (a = x) at line 3 so that it assigns the angelic value 5 to the variable a. Indeed, assigning 5 to the variable a at line 3 changes the value of the condition expression in the if-statement at line 6 (i.e., a >= y) from false to true; thus, the execution flow of the error trace is flipped into the true-branch. As a result, the statement (b = x y) at line 7 is executed, the final value of variable b at the return statement is 2, and the test assertion is satisfied.
An angelic fix candidate is said to be a feasible fix candidate if substituting the values of the variables at the fix locations with their corresponding angelic values actually results in a successful program execution, i.e., the corresponding angelic execution succeeds. Otherwise, it is said to be an invalid (or infeasible) angelic fix candidate. All five MFCs above are feasible fix candidates because their corresponding angelic executions are feasible.
An angelic fix candidate is a correct fault location if all statements in its fix location set are actually faulty. For example, mfc_3 = ({3}, {a_1 = 5}) is a correct fault location because the only statement in its fix location set, the statement at line 3, is actually faulty. Consider, in contrast, the MFC mfc = ({3, 1}, {a_1 = 5, b_1 = 3}). This MFC is a feasible fix candidate because replacing the value of variable “b” at line 1 with 3 and the value of variable “a” with 5 actually makes the test execution succeed. It is not a correct fault location, however, because its fix location set contains the statement at line 1, which is not faulty.
An angelic fix candidate is said to be a correct fix candidate if it is both a feasible fix candidate and a correct fault location. For example, mfc_3 = ({3}, {a_1 = 5}) is a correct fix candidate, as it is both feasible and a correct fault location. Consider, in contrast, the MFC mfc = ({3}, {a_1 = 0}). This MFC is a correct fault location, but it is not a feasible fix candidate because replacing the value of variable “a” at line 3 with 0 does not make the test execution succeed. Thus, it is not a correct fix candidate.

2.2. Formula Satisfiability and Solvers

Formula satisfiability (SAT or SMT) is the problem of determining whether there exists an assignment to the variables of a given logic formula such that the formula evaluates to true. If such an assignment (called a model) exists, the formula is satisfiable (SAT); otherwise, it is unsatisfiable (UNSAT).
Maximum satisfiability (MaxSAT or MaxSMT) is an optimization version of the SAT problem whose goal is to find a model for a given formula that maximizes the number of clauses satisfied together. Such a maximal subset of clauses is called a maximal satisfiable subset (MSS). The complement of an MSS is called a minimal correction subset (MCS): a minimal subset of clauses whose removal makes the remaining formula satisfiable again. Partial MaxSAT (pMaxSAT) is an extension of MaxSAT in which clauses are marked as either “soft” or “hard”. The goal in pMaxSAT is to find a model that satisfies all “hard” clauses and maximizes the number of satisfied “soft” clauses.
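The MCS definition above can be made concrete with a tiny propositional example. The following sketch (illustrative only; the clauses, variable names, and brute-force search are our own, not part of any solver) enumerates the MCSs of a small pMaxSAT instance by testing ever-larger subsets of soft clauses to drop:

```python
from itertools import combinations, product

# Tiny pMaxSAT instance over booleans x, y (hypothetical example).
hard = [lambda x, y: x or y]                                   # must hold
soft = [lambda x, y: not x, lambda x, y: not y, lambda x, y: x]  # may be dropped

def satisfiable(clauses):
    # Brute-force SAT check over the two boolean variables.
    return any(all(c(x, y) for c in clauses)
               for x, y in product([False, True], repeat=2))

def minimal_correction_subsets(hard, soft):
    """Smallest-first search for soft subsets whose removal makes the
    rest (plus all hard clauses) satisfiable, minimal w.r.t. inclusion."""
    mcss = []
    for k in range(len(soft) + 1):
        for drop in combinations(range(len(soft)), k):
            if any(set(m) <= set(drop) for m in mcss):
                continue  # not minimal: contains an already-found MCS
            keep = [c for i, c in enumerate(soft) if i not in drop]
            if satisfiable(hard + keep):
                mcss.append(drop)
    return mcss

# Dropping soft clause 0, or soft clauses 1 and 2 together, restores SAT.
print(minimal_correction_subsets(hard, soft))  # -> [(0,), (1, 2)]
```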
Although the SAT problem is known to be NP-complete, recent solvers can handle large SAT formulas encoding practical industrial problems. SAT solvers are programs that accept a logic formula in conjunctive normal form (CNF) and decide whether it is satisfiable. If the formula is satisfiable, the solver returns “SAT” and provides a model; otherwise, it returns “UNSAT” and may produce an unsatisfiable core, a subset of clauses that cannot be satisfied together, as an explanation for the unsatisfiability. Some recent solvers support incremental SAT solving, which facilitates solving a series of closely related formulas. Incremental SAT solvers retain the information learned while checking the satisfiability of one formula and use it to avoid repeating redundant work in subsequent satisfiability checks.
Generally, MaxSAT algorithms perform a succession of SAT solver calls; after each call, they add cardinality constraints to move toward an optimal solution. State-of-the-art MaxSAT solvers are based on the core-guided MaxSAT algorithm, which leverages the unsatisfiable core produced by the SAT solver. Specifically, after each call to the SAT solver, they relax the clauses in the unsatisfiable core by associating a relaxation variable with each such clause. To reach optimal solutions, they add cardinality constraints that bound the number of relaxed clauses.
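The relaxation step can be sketched in miniature. This is a simplified illustration under our own assumptions (a single boolean variable, a hand-picked unsatisfiable core, and brute-force search standing in for a real SAT solver), not the full core-guided algorithm of Reference [20]:

```python
from itertools import product

def relax_core(core_clauses):
    """Replace each soft clause c in an unsat core by (r or c) for a fresh
    relaxation variable r, and bound how many r may be true (at most one)."""
    def relaxed(i, clause):
        return lambda assign, relax: relax[i] or clause(assign)
    new_clauses = [relaxed(i, c) for i, c in enumerate(core_clauses)]
    at_most_one = lambda assign, relax: sum(relax) <= 1
    return new_clauses, at_most_one

# Unsatisfiable core over one boolean x: {x, not x}.
core = [lambda a: a["x"], lambda a: not a["x"]]
clauses, card = relax_core(core)

def solve(clauses, card, n_relax):
    # Brute-force stand-in for a SAT call on the relaxed formula.
    for x in (False, True):
        for relax in product((False, True), repeat=n_relax):
            a = {"x": x}
            if card(a, relax) and all(c(a, relax) for c in clauses):
                return a, relax
    return None

model = solve(clauses, card, 2)
assert model is not None  # relaxation makes the core satisfiable
```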

2.3. MaxSAT-Based Fault Localization

MaxSAT-based fault localization approaches [12,14,15,19,22] reduce fault localization to the maximal satisfiability problem: finding a variable assignment for a logic formula such that the number of satisfied clauses is maximized. Given a buggy program P and a failing test case T(inp, as) that exposes a bug in the program, these approaches proceed as follows.
First, they use a bounded model checking or symbolic execution tool to construct a logical formula, called the error trace formula, that semantically represents the error execution of the buggy program on the failing test case. Essentially, this error trace formula is the conjunction Φ ≡ φ_inp ∧ φ_tf ∧ φ_as, where φ_inp encodes the test input, φ_tf is the trace formula encoding the semantics of the program execution trace induced by the given input, and φ_as encodes the test assertion that the program must satisfy (or the expected output that the program must produce for the given test input). Since the program fails the test, this error trace formula is logically unsatisfiable. Because the test input and the test assertion are correct by definition, the clauses encoding them are not responsible for the unsatisfiability; the causes must therefore lie among the clauses of the trace formula φ_tf. This mirrors exactly the situation in which faulty statements in the program are responsible for the test failure.
Second, they treat the constructed error trace formula as an instance of the partial MaxSAT problem, in which the clauses encoding the test input φ_inp and the test assertions φ_as are marked “hard”, and the clauses of the trace formula φ_tf are marked “soft”. They then feed the formula into a pMaxSAT solver to obtain a set of MCSs of the formula. Intuitively, the clauses in an MCS indicate a corresponding minimal fix location set, and the maximum satisfiability model of the MCS provides angelic values for these fix locations. As a result, they can produce a set of minimal fix candidates that make the given test execution succeed.
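As a minimal illustration of this reduction (using a hypothetical two-statement trace of our own, not the program of Figure 1), the hard clauses encode the test input and assertion, the soft clauses encode the trace, and each MCS maps back to a candidate fix location set:

```python
from itertools import combinations, product

# Hypothetical one-path trace: statements "x = a" and "y = x + 1",
# with failing test input a = 1 and assertion y == 3.
hard = [lambda a, x, y: a == 1, lambda a, x, y: y == 3]      # test input + assertion
soft = [lambda a, x, y: x == a, lambda a, x, y: y == x + 1]  # trace formula clauses
stmt_of = {0: "line 1: x = a", 1: "line 2: y = x + 1"}

def sat(clauses, domain=range(0, 5)):
    # Brute-force satisfiability over small integer values of a, x, y.
    return any(all(c(a, x, y) for c in clauses)
               for a, x, y in product(domain, repeat=3))

def mcss(hard, soft):
    found = []
    for k in range(len(soft) + 1):
        for drop in combinations(range(len(soft)), k):
            if any(set(m) <= set(drop) for m in found):
                continue  # skip non-minimal supersets
            if sat(hard + [c for i, c in enumerate(soft) if i not in drop]):
                found.append(drop)
    return found

# Each MCS maps back to a minimal set of candidate fix locations:
# dropping either trace clause restores satisfiability.
locations = [[stmt_of[i] for i in m] for m in mcss(hard, soft)]
print(locations)  # -> [['line 1: x = a'], ['line 2: y = x + 1']]
```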
Consider our example in Figure 1. Program-formula-based approaches, such as BugAssist [12] and SNIPER [14,15], first inline all function calls and unwind all loops in the program foo up to a given bound to obtain a loop-free and call-free program. They then transform the flattened program into a semantically equivalent program in static single assignment (SSA) form [23], in which each variable is assigned at most once. Figure 2 shows the SSA form of the method foo with a loop unwinding bound of two. Each statement in the SSA program is then represented as a logic clause, and these clauses are conjoined to form a program-formula, the formula TF shown in Figure 3. This program-formula is semantically equivalent to the original program with respect to the unwinding bound: every satisfying assignment of the program-formula corresponds to a feasible execution of the program, and, vice versa, every program execution (within the bound) corresponds to a satisfying assignment of the program-formula. The program-formula is then conjoined with clauses encoding the test input values and the test assertions of the failing test case to form the error trace formula IN ∧ TF ∧ AS, shown in Figure 3. Applying a pMaxSAT solver to this error trace formula yields five minimal correction subsets in total: mcs_1 = {c_9}, mcs_2 = {c_6}, mcs_3 = {c_3}, mcs_4 = {c_1}, and mcs_5 = {c_2, c_4}. Each MCS indicates a corresponding minimal fix location set, and the maximum satisfiability model of the MCS provides angelic values for these fix locations. As a result, the following five minimal fix candidates are identified and reported to the developer: mfc_1 = ({8}, {foo = 2}); mfc_2 = ({6}, {g_2 = True}); mfc_3 = ({3}, {a_1 = 5}); mfc_4 = ({1}, {b_1 = 2}); and mfc_5 = ({2, 4}, {g_1 = False, a_2 = 5}).

3. Proposed Method

In this section, we provide details of our proposed fault localization method, AgxFaults.

3.1. Overview

AgxFaults takes as input a buggy program and a failing test case that demonstrates a program failure. It outputs a set of pairs, each consisting of a minimal set of suspicious statements and an angelic execution that explains how the failure can be removed by replacing these suspicious statements with angelic values. The fault localization process of AgxFaults is iterative, incremental, and on-demand. Figure 4 provides a high-level view of AgxFaults and its main components. Below, we first briefly describe the main components and then explain the overall fault localization process of AgxFaults.
  • Angelic DCFG (Dynamic Control Flow Graph): the dynamic control flow graph of an angelic program [24]. This angelic program acts as an abstraction of the input buggy program in which only the program parts relevant to the error are represented precisely, while irrelevant parts are represented abstractly as angelic non-determinisms (i.e., executions that can produce non-deterministic values such that the program execution succeeds).
  • Error Trace Formula: essentially the formula IN ∧ TF_agx ∧ AS, where IN represents the test input, AS represents the assertions of the given failing test case, and TF_agx is semantically equivalent to the current version of the angelic program. The error trace formula encodes the fault localization problem of the current angelic program with the given failing test case. Each MCS of this angelic formula corresponds to an angelic execution in the angelic program.
  • Angelic Execution: a correct execution of the angelic program, obtained by diverting the original error trace so that the output of specific statements is dynamically replaced with proper values, i.e., angelic values, that make the test execution succeed.
  • On-demand Program Explorer and Encoder: responsible for refining the angelic program and the error trace formula in an on-demand manner.
  • Incremental Formula Solver: responsible for computing the minimal correction subsets (MCSs) of the angelic formula incrementally.
  • MCS Analyzer: analyzes each obtained MCS of the angelic formula to determine possible faults in the program. In addition, it determines which abstract parts of the angelic program need further refinement to provide a more precise result.
The core idea of AgxFaults is to work with an angelic program incrementally instead of a program-formula encoding the full semantics of the original buggy program, which can lead to very complex and expensive computation. In the beginning, only the statement instances executed in the original failing trace are represented precisely in the angelic program. The angelic program is then expanded dynamically, in an on-demand manner, after each iteration to provide more precise results.

3.2. Overall Fault Localization Algorithm

The overall fault localization process of AgxFaults is described in Algorithm 1. In the algorithm, P_agx represents the dynamic control flow graph of the angelic program, and solver is an instance of an incremental partial MaxSAT solver. Additional hard and soft constraints can be added to the solver via the methods addHard() and addSoft(), respectively. The solver’s Check() method returns True if it finds an MCS for the current formula; otherwise, it returns False.
In the beginning, the algorithm initializes P_agx as a pure angelic program, which does not contain any concrete statements (line 1). The formula solver, solver, is initialized with an empty set of soft constraints and a hard constraint set containing the clauses encoding the test input and its assertion (lines 1 to 3).
The main process of the algorithm is the loop from line 4 to line 15. It is an iterative process comprising the following steps. At the beginning of each iteration, the solver is asked to find an MCS for the current formula (line 4). If there is no further MCS, the loop terminates. Otherwise, the iteration starts and the solution of the formula is stored in model (line 5). Based on model, the corresponding possible faulty statements mcs and an angelic execution Π_agx in the angelic program are determined (line 6). If the angelic execution Π_agx contains unspecified executions (i.e., the condition at line 7 is true), the angelic refinement process refines the angelic program and the angelic formula at these angelic-branches (line 8). Otherwise, Π_agx is a feasible angelic execution; thus, the possible fault location mcs, together with the angelic execution path Π_agx, is reported to the developer (line 12). In addition, a blocking constraint is added to the solver (line 13). This blocking constraint guarantees that the current MCS will not be encountered again in subsequent iterations; specifically, the blocking constraint of the MCS is the disjunction ⋁_{c ∈ mcs} c, stating that the program statements in the mcs are not all simultaneously faulty.
Algorithm 1 Overall Fault Localization Algorithm
Input: prog: buggy program
Input: (inp, as): failing test case
Input: bound: maximum fix location set size
Output: {(stmt, angelic correct execution)}
1:  P_agx ← ∅; IN, AS ← inp, as
2:  Φ_h, Φ_s ← (IN ∧ AS), ∅
3:  solver ← new IncrMaxSMTSolver(Φ_h, Φ_s, bound)
4:  while solver.Check() ∧ ¬timeout() do
5:      φ_mcs, model ← solver.getModel()
6:      stmt, Π_agx ← AnalyzeMCS(model, P_agx)
7:      if Π_agx contains angelic-branches then
8:          P_agx, Φ_h, Φ_s ← AngelicRefinement(stmt, Π_agx, P_agx)
9:          solver.addHard(Φ_h)
10:         solver.addSoft(Φ_s)
11:     else
12:         writeOutput(stmt, Π_agx)
13:         solver.addHard(BlockingConstraint(φ_mcs))
14:     end if
15: end while
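The enumeration loop of Algorithm 1 (Check, report, add a blocking constraint, repeat) can be sketched in miniature as follows. This is our own simplified illustration: a brute-force search stands in for the incremental pMaxSAT solver, the angelic-refinement branch is omitted, and the example clauses are hypothetical:

```python
from itertools import combinations, product

def enumerate_mcss(hard, soft, domain=range(0, 5)):
    """Report one MCS per iteration, then 'block' it so later iterations
    must find a different one (mirroring lines 4-15 of Algorithm 1)."""
    def sat(clauses):
        return any(all(c(*v) for c in clauses)
                   for v in product(domain, repeat=3))

    def next_mcs(blocked):
        # Smallest-first search for a correction subset not yet blocked.
        for k in range(len(soft) + 1):
            for drop in combinations(range(len(soft)), k):
                if any(set(b) <= set(drop) for b in blocked):
                    continue  # blocking constraint: this MCS was already reported
                if sat(hard + [c for i, c in enumerate(soft) if i not in drop]):
                    return drop
        return None

    reported = []
    while (m := next_mcs(reported)) is not None:  # solver.Check() loop
        reported.append(m)                        # writeOutput + addHard(blocking)
    return reported

# Hypothetical trace "x = a; y = x + 1" with input a = 1 and assertion y == 3.
hard = [lambda a, x, y: a == 1, lambda a, x, y: y == 3]
soft = [lambda a, x, y: x == a, lambda a, x, y: y == x + 1]
print(enumerate_mcss(hard, soft))  # -> [(0,), (1,)]
```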

3.3. On-Demand Program Explorer and Incremental Formula Encoder

The On-demand Program Explorer and Encoder (OPEE) is responsible for dynamically constructing and refining the angelic program, as well as the angelic formula, in an incremental, on-demand manner. Algorithm 2 shows its details. It takes as input a program P, a concrete input inp, a set of angelic modifications mcs, and an angelic execution path Π that contains angelic-branches. Each angelic modification m ∈ mcs specifies a statement instance st and a value val, called the angelic value; mcs[st] denotes the angelic value of statement instance st. The abstract execution path Π contains a sequence of branch decisions, and Π[st] denotes the decision at branch instance st.
The procedure AngelicRefinement() in Algorithm 2 describes how the on-demand program explorer works. The OPEE component runs the program with the given test input and dynamically substitutes the value produced by each statement st ∈ mcs with its corresponding angelic value mcs[st] to explore the program execution path specified by the abstract path Π. During program execution, it treats each statement instance st differently depending on whether st corresponds to a concrete or a non-deterministic statement instance in the angelic program P_agx. If st corresponds to a concrete statement instance of P_agx, the condition at line 21 is true. If st is an angelic location, the execution engine perturbs the program memory M so that the output of st is replaced with its corresponding angelic value (line 23). The execution engine also checks whether the current execution has diverged from the expected abstract path by comparing the actual branch decision with the angelic branch decision (line 25); if a divergence occurs, the execution stops. If st corresponds to a non-deterministic statement in P_agx (i.e., the condition at line 21 is false), the angelic program is refined by making this statement instance concrete (line 31), and the angelic formula is simultaneously updated to encode this statement instance (line 32).
Let s t be the statement instance that is being encoded into the angelic formula. Depending on the type of s t , it is encoded in the angelic formula differently. The procedure UpdateFormula( ) in Algorithm 2 shows the details. Specifically, if  s t is an assignment statement [ v = e x p r ] , we represent s t as an equivalence relation between the variable v on the left-hand side and the expression e x p r on the right-hand side (line 47). For a conditional statement, [ i f c o n d ] , we add an extra variable g u a r d to represent the branch predicate, and we represent the conditional statement as an equivalence relation between the g u a r d variable and the conditional predicate expression (line 45). If s t is a phi statement instance, [ v = Φ ( g u a r d c s , e x p r ) ] , it is represented as an implication constraint ( g u a r d c s → ( v = e x p r ) ) . This implication constraint essentially states that, if the execution trace reaches the statement s t by going through the branch g u a r d c s of the conditional statement, then the value of variable v is equal to e x p r ; otherwise, the value of v is unconstrained. Encoding phi statements as implication constraints allows the constraints on the joined variable to be tightened by additional constraints when other branches of the conditional statement are executed.
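To make the phi encoding concrete, the following is a minimal sketch (in Python, purely for illustration; the actual tool encodes these constraints for an SMT solver, and the tiny value domain here is our own simplification). It shows that the implication g u a r d c s → ( v = e x p r ) forces v only when the branch is taken:

```python
# Constraint for the phi statement v = Φ(guard, expr), with expr fixed to 1:
# encoded as the implication (guard -> v == 1).
def phi_constraint(guard, v):
    return (not guard) or (v == 1)

# Enumerate a tiny domain to see which values of v the constraint allows.
def allowed_values(guard):
    return sorted({v for v in (0, 1, 2) if phi_constraint(guard, v)})

print(allowed_values(True))   # taken branch: v is forced to expr, i.e. [1]
print(allowed_values(False))  # untaken branch: v is unconstrained, [0, 1, 2]
```

When the guard is false, every value of v satisfies the implication, which is exactly why constraints from other branches can later tighten the joined variable.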
Algorithm 2 On-demand Program Explorer and Formula Encoder
16: procedure AngelicRefinement( m c s , Π a g x , P a g x )
17:  Φ h a r d , Φ s o f t ← ∅ , ∅
18:  M ← i n i t i a l i z e ( P , i n p ) ; s t ← s t e n t r y ;
19: while s t ≠ n u l l do
20:   ( s t s s a , π s s a ) = c o n v e r t T o S S A ( s t , π s s a )
21:  if s t s s a ∈ P a g x then
22:   if s t s s a ∈ m c s then
23:     M ← p e r t u r b ( s t s s a , m c s [ s t s s a ] )
24:   end if
25:   if ( s t s s a is [ i f c o n d ] ) then
26:    if M [ g u a r d s t ] ≠ Π a g x [ g u a r d s t ] then
27:     break   ▹ execution is diverted from the angelic path Π a g x
28:    end if
29:   end if
30:  else ▹ s t s s a ∉ P a g x
31:    P a g x ← P a g x ∪ { s t s s a }         ▹ refine the angelic program
32:   updateFormula( s t s s a , Φ h a r d , Φ s o f t )    ▹ refine the angelic formula
33:  end if
34:   ( s t , M ) ← e x e c u t e S t a t e m e n t ( P , M )
35: end while
36: return ( P a g x , Φ h a r d , Φ s o f t )
37: end procedure
 
38: procedure updateFormula( s t s s a , Φ h a r d , Φ s o f t )
39: if ( s t s s a is [ v = Φ ( g u a r d c s , e x p r ) ] ) then
40:   φ v a l ← ( g u a r d c s → ( v = e x p r ) )
41:   Φ h a r d ← Φ h a r d ∪ { φ v a l }
42: else
43:  if s t s s a is [ i f c o n d ] then
44:    c o n d ← conditional expression of s t s s a
45:    φ v a l ← ( g u a r d s t = c o n d )
46:  else
47:    φ v a l ← ( v = e x p r )
48:  end if
49:   a b ← fault predicate of original s t
50:   Φ h a r d ← Φ h a r d ∪ { a b ∨ φ v a l }
51:   Φ s o f t ← Φ s o f t ∪ { ¬ a b }
52: end if
53: end procedure
If a statement s t is faulty, a fix would replace it with a different statement. In that case, the constraints that constrain the values of variables in s t are invalid and should be relaxed. To reason about the faultiness of program statements, we associate with each statement s t in the original program a Boolean variable a b s t as a fault predicate. Specifically, the variable a b s t indicates that the statement s t is faulty (or correct) if it is evaluated to t r u e (or f a l s e , respectively). We encode each conditional and each assignment statement instance s t into the angelic formula by adding a clause ( a b s t ∨ φ v a l ) to the hard constraint set and adding a clause ¬ a b s t to the soft constraint set (lines 49–51), where φ v a l is the constraint representation of the statement instance. Since phi statements are artificial statements introduced to explicitly represent the dependence of variables on branch decisions in the execution trace, the faultiness of a phi statement instance is accounted for by faults in its preceding assignment or conditional statement instances. Thus, we add the constraint representing a phi statement instance into the angelic formula as a hard constraint (line 41).
To summarize, the constructed angelic formula is an instance of the partial maximum satisfiability problem: the hard constraints represent the semantics of the test case's input and assertions, together with the constructed angelic program, while the soft constraints contain the fault predicates of the buggy-program statements that have been represented in the angelic program. Each MCS of this angelic error trace formula indicates a minimal set of program statements whose corresponding fault predicates a b are evaluated to t r u e .
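The partial MaxSAT view can be illustrated with a toy example. The sketch below is pure Python brute force over a tiny integer domain; the two-statement program, the variable names, and the domain bound are our own invention for illustration, not AgxFaults' actual encoding, which targets an SMT solver. Statement s1 is y = x + 1 and s2 is z = y * 2, run on input x = 2 with the failing assertion z == 10; each statement constraint holds unless its fault predicate is relaxed:

```python
from itertools import combinations, product

X = 2  # concrete test input

def consistent(relaxed, y, z):
    # Hard constraints: each statement holds unless its fault predicate
    # (here just the statement's index) is in the relaxed set.
    if 1 not in relaxed and y != X + 1:   # s1: y = x + 1
        return False
    if 2 not in relaxed and z != y * 2:   # s2: z = y * 2
        return False
    return z == 10                        # failing test's expected output

def find_mcses(stmts=(1, 2), domain=range(0, 13)):
    mcses = []
    for k in range(len(stmts) + 1):       # smallest subsets first
        for subset in combinations(stmts, k):
            if any(m <= set(subset) for m in mcses):
                continue                  # a subset already suffices: not minimal
            if any(consistent(set(subset), y, z)
                   for y, z in product(domain, repeat=2)):
                mcses.append(set(subset))
    return mcses

print(find_mcses())  # → [{1}, {2}]
```

With all fault predicates false the formula is unsatisfiable (y = 3 forces z = 6 ≠ 10); relaxing either statement alone restores satisfiability, so both singletons are MCSes, i.e., minimal fix candidates.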

3.4. Incremental Formula Solver

After each iteration of the Angelic Refinement Loop, a new set of constraints is added to the angelic formula. Instead of treating the angelic formula after each iteration as an independent MaxSAT problem and invoking a MaxSAT solver to find MCSes from scratch, we consider all angelic formulas generated so far as a sequence of similar MaxSAT problems, i.e., an instance of a sequential maximum satisfiability problem [20]. The Formula Solver uses a sequential MaxSAT solver to compute the MCSes of the angelic formula incrementally.

3.5. MCS Analyzer and Report Writer

Given an MCS, m c s , and a satisfiable model of the angelic error trace formula, the MCS Analyzer is responsible for reconstructing the angelic execution trace Π a g x in the angelic program that corresponds to the MCS in the angelic formula. The angelic execution trace Π a g x is reconstructed by traversing the dynamic control flow graph of P a g x from the entry point as follows. The entry point of the program is the first element of Π a g x . All assignment statements on the path are added to Π a g x . When the traversal reaches a conditional branch node c s , the guard expression of c s is evaluated using the m o d e l to determine the selected branch, which we call b. If the selected branch b is not included in the angelic program P a g x (i.e., b is a non-deterministic branch), then the traversal moves to the corresponding phi statement instance of c s . Otherwise, it moves to the first statement in the selected branch and continues until it reaches the terminal node.
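A minimal sketch of this traversal follows. The CFG dictionary, node names, and model format are hypothetical simplifications; the real analyzer operates on JPF's dynamic control flow graph. Node s3 is left outside the angelic program, so when the model selects it, the traversal jumps to the conditional's phi statement instead:

```python
# Dynamic CFG of a tiny angelic program: each node maps to its successors.
# Conditional nodes carry a dict keyed by the guard's value in the model.
CFG = {
    "entry": ["s1"],
    "s1": ["c1"],                                # assignment
    "c1": {"True": "s2", "False": "s3"},         # conditional branch
    "s2": ["phi_c1"],
    "s3": ["phi_c1"],
    "phi_c1": ["exit"],
}
ANGELIC = {"entry", "s1", "c1", "s2", "phi_c1", "exit"}  # s3 is abstracted

def reconstruct_path(model):
    path, node = [], "entry"
    while node != "exit":
        path.append(node)
        succ = CFG[node]
        if isinstance(succ, dict):               # conditional node
            branch = succ[str(model[node])]      # evaluate guard in the model
            # A branch outside the angelic program is non-deterministic:
            # skip directly to the conditional's phi statement.
            node = branch if branch in ANGELIC else "phi_" + node
        else:
            node = succ[0]
    path.append("exit")
    return path

print(reconstruct_path({"c1": True}))   # deterministic branch: includes s2
print(reconstruct_path({"c1": False}))  # non-deterministic: jumps to phi_c1
```

The first call yields the path through s2, while the second skips the abstracted s3 and lands on the phi statement, mirroring the traversal rule described above.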

3.6. Illustrative Example

We illustrate how our proposed method works using the running example in Figure 1. Figure 5 shows the progress of AgxFaults in the first four iterations. In the figure, the green nodes represent program parts that are assumed to be correct; thus, they are encoded into the formula as hard constraints.
After the initialization steps, the angelic program P a g x contains only an entry point and a normal exit point (i.e., successful termination). The solver, s o l v e r , is initialized with an empty soft-constraint set and a hard-constraint set that encodes the entry point and the normal exit point, which represents the successful terminator. In the first iteration of the loop, the algorithm does nothing but invoke the AngelicRefinement method (line 8) with an empty angelic fix location set and an empty angelic path as input. Thus, the AngelicRefinement method simply runs the given test case and encodes the executed statements into the formula. Figure 5b shows the state of the algorithm after the first iteration. Specifically, the angelic program precisely represents only the statements executed in the original error trace, while all other parts are abstracted. Figure 6 shows the set of constraints encoded in the first iteration (we eliminate the fault predicate variables a b from the clauses for simplicity).
In the second iteration, the solver first solves the current angelic formula and returns an MCS ϕ m c s = { f o o = b 3 } , which corresponds to the return statement at line 8. Then, the MCSAnalyzer component analyzes the MCS and the corresponding model to identify the fix location set and the corresponding angelic path (line 6 in Algorithm 1). The constructed angelic path Π a g x is the path (s,1,2,6,8,9,OK) in the CFG of the angelic program. This path is feasible because all edges in the path are deterministic. Thus, the algorithm goes to line 12 to output the newly found minimal fix candidate and then adds a constraint to block this solution from occurring in later iterations. In other words, from this point on, the solver considers the statement at line 8 to be correct.
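The effect of blocking constraints can be sketched as follows, using a made-up two-statement encoding in pure Python (our own simplification: in AgxFaults, blocking amounts to adding a hard ¬ a b clause to the solver, so a blocked statement is treated as correct thereafter). Each iteration finds the smallest remaining relaxation set, then blocks its statements:

```python
from itertools import combinations, product

X = 2  # concrete test input

def feasible(relaxed, domain=range(0, 13)):
    # Does some assignment to (y, z) satisfy the non-relaxed statement
    # constraints (y = x + 1, z = y * 2) and the expected output z == 10?
    for y, z in product(domain, repeat=2):
        if 1 not in relaxed and y != X + 1:
            continue
        if 2 not in relaxed and z != y * 2:
            continue
        if z == 10:
            return True
    return False

def enumerate_with_blocking(stmts=(1, 2)):
    blocked, found = set(), []
    while True:
        candidates = [s for s in stmts if s not in blocked]
        mcs = next((set(c) for k in range(1, len(candidates) + 1)
                    for c in combinations(candidates, k)
                    if feasible(set(c))), None)
        if mcs is None:
            return found      # UNSAT: all minimal fix candidates enumerated
        found.append(mcs)
        blocked |= mcs        # block the solution: statements now "correct"

print(enumerate_with_blocking())  # → [{1}, {2}]
```

After the first MCS {1} is found and blocked, the second iteration searches only among the remaining statements, which is why the search space shrinks with each reported fix candidate.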
In the third iteration, the solver returns an MCS m c s = { b 1 = x 0 + y 0 } , which corresponds to the assignment [ b = x + y ] at line 1. The constructed angelic path for this MCS is the path (s,1,2,6,8,9,OK), which is a feasible path in the angelic program. Thus, as in the second iteration, the algorithm outputs the newly found MFC, adds a blocking constraint for line 1, and goes to the next iteration.
Figure 5c shows the current state of the angelic program and the angelic formula in the solver. Because of the blocking constraints added in the previous iterations, the statements at lines 1 and 8 are no longer considered as possible faults. Thus, the solver only needs to search a smaller space for the next MCSes.
In the fourth iteration, the solver finds an MCS m c s = { g 2 = ( a 1 > = y 0 ) } with the angelic value { g 2 = T r u e } . This MCS indicates that the branch condition at line 6 is the fix location. The constructed angelic path for this MCS is the path (S,1,2,3,6,?,8,9,OK). Because the angelic path contains a non-deterministic branch, the AngelicRefinement method is invoked with input m c s = { l i n e 6 } and its corresponding angelic value T r u e to explore the true-branch of the if-statement at line 6. After the refinement, the assignment [ b = x y ] is present in the angelic program, and the additional constraints in Figure 7 are added to the solver.
After the refinement, the process continues until a timeout occurs or all minimal fix candidates have been found (i.e., solver.Check() returns UNSAT).

4. Evaluation Setup

To evaluate our proposed method, we performed experiments on three different sets of benchmarks and compared against existing formula-based fault localization techniques, including techniques based on program formulas (e.g., BugAssist [12], Sniper [14]), techniques based on single-path control-flow-insensitive formulas (e.g., References [17,19]), and techniques based on single-path control-flow-sensitive formulas (e.g., Reference [18]). All experiments were performed on a computer with a 4.0 GHz Intel Core i7 CPU and 8 GB RAM. In this section, we describe the setup of our evaluation.

4.1. Implementation

We have implemented our approach in a prototype tool, named AgxFaults, that automatically localizes faults in Java programs. Unfortunately, all the formula-based fault localization techniques that provide tools or source code online target C programs. Thus, for comparison, we also implemented three existing formula-based fault localization techniques in AgxFaults: the program-formula (PF) approach used in References [12,14], the flow-insensitive trace formula (FI) approach used in Reference [17], and the flow-sensitive trace formula (FS) approach used in Reference [18].
We implemented the AgxFaults tool as an extension of NASA's model checker Java PathFinder (JPF). The inputs of AgxFaults are a buggy program and a failing test case (given as a JUnit test method or a pair of input and expected post-condition in a configuration file). The output is a list of minimal angelic fix candidates (MFCs), where each MFC contains (1) a set of suspicious statements, together with (2) the angelic values that these statements should have produced to make the test execution, which originally fails, succeed. AgxFaults includes a customizable formula builder and a constraint solver for solving formulas. We implemented the formula builder using the extension mechanisms of JPF, adapting the implementation of jDart [25], a concolic execution engine for Java programs. Specifically, we use the bytecode factories and listener extension mechanisms of JPF to (1) create fresh symbolic variables on-the-fly when executing assignment and branch condition instructions, (2) dynamically perturb the concrete program state and force the program execution to follow a specific path, and (3) manipulate and propagate symbolic values along different program execution paths and collect constraints for building error trace formulas. As inherited from jDart, AgxFaults also uses the constraints library jConstraints [26] as an abstraction layer for constraint solvers. Since jConstraints does not support solving pMaxSMT/MaxSMT and sequential MaxSMT problems, we implemented Fu & Malik's core-guided MaxSAT algorithm [27] and the incremental core-guided MaxSMT solving algorithm [20] in the jConstraints library. We use the Z3 (https://github.com/Z3Prover/z3) SMT solver as our back-end constraint solver.
We implemented the program-formula approach (PF) following the full flow-sensitive formula encoding approach in Sniper [14] because it was experimentally shown to be more effective than BugAssist. To construct a program formula, we force the on-demand program explorer and formula builder to explore all paths in the program up to a certain bound and encode all of them into the formula, instead of operating on demand. We implemented the single-path flow-insensitive formula approach by following the error trace formula encoding described in Reference [17]. The single-path flow-sensitive formula approach is implemented by following the error trace formula encoding described in Reference [18]. The source code of AgxFaults, which contains our implementations of all the above techniques, as well as our benchmark programs, is available online for open access at http://bit.ly/agxfaults.

4.2. Research Questions and Evaluation Metrics

We applied each of these four implemented techniques to several buggy programs to empirically investigate the following research questions:
RQ1: How effective is AgxFaults in finding angelic fix candidates, compared to program-formula and single-path formula approaches?
A technique is said to be more effective in finding angelic fix candidates if it can find more feasible fix candidates. A technique is said to be more precise in finding angelic fix candidates if it always produces feasible MFCs, i.e., the ratio of the number of feasible MFCs it finds to the total number of MFCs it reports is high. Thus, to answer RQ1, we identify two metrics:
(1) the average number of feasible MFCs that the technique found per run (i.e., the number of feasible MFCs found), and
(2) the ratio of the number of feasible MFCs to the total number of MFCs that the technique reported per run (i.e., precision).
RQ2: How effective is AgxFaults in localizing fault location, compared to program-formula and single-path formula approaches?
To evaluate the fault localization effectiveness of a technique, we identify three metrics:
(1) the number of runs in which the technique reported the correct fault locations, out of the total number of its runs (i.e., the successful fault localization rate),
(2) the number of runs in which the technique reported a correct fix candidate, out of the total number of its runs (i.e., the successful fix localization rate), and
(3) the percentage of program code lines that the developer needs to examine before identifying the first faulty statement (i.e., the EXAM score).
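Under the definitions above, the three metrics could be computed from run records as in this illustrative sketch. The run data and the record format (a list of MFCs per run, each an MFC's reported locations plus a feasibility flag, against a known faulty line) are invented for the example:

```python
# Ground-truth faulty line and three made-up run records.
FAULTY = {6}
runs = [
    [({6}, True), ({1, 8}, True)],   # run 1: correct, feasible fix at line 6
    [({8}, True)],                   # run 2: feasible but wrong location
    [({6}, False)],                  # run 3: right location, infeasible
]

def fault_loc_rate(runs):
    # A run succeeds if some MFC's locations are all actually faulty.
    ok = sum(any(locs <= FAULTY for locs, _ in run) for run in runs)
    return ok / len(runs)

def fix_loc_rate(runs):
    # A run succeeds if some MFC is both at the fault and feasible.
    ok = sum(any(locs <= FAULTY and feas for locs, feas in run) for run in runs)
    return ok / len(runs)

def exam_score(reported_lines, total_lines, faulty=FAULTY):
    # Percentage of lines examined, in reported order, to reach the fault.
    examined = 0
    for line in reported_lines:
        examined += 1
        if line in faulty:
            return 100.0 * examined / total_lines
    return 100.0                     # fault never reported

print(fault_loc_rate(runs))          # runs 1 and 3 report {6}: 2/3
print(fix_loc_rate(runs))            # only run 1 is correct and feasible: 1/3
print(exam_score([1, 8, 6, 2], 50))  # 3 of 50 lines examined: 6.0
```

Note that fault localization only requires reporting the right location, while fix localization additionally requires the MFC to be a feasible fix candidate, which is why run 3 counts for the former but not the latter.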
RQ3: How efficient and scalable is AgxFaults, compared to program-formula and single-path formula approaches?
We use the CPU time as a metric to measure the efficiency and scalability of a technique. Specifically, for each technique, we measure:
(1) the CPU time spent by the solver to solve formulas (i.e., Formula solving time), and
(2) the total running time of the technique for each run (i.e., Running time).
RQ4: Can AgxFaults be applied to real bugs in large software projects?

4.3. Benchmarks

For our evaluation, we use several buggy programs selected from three different benchmarks. The first benchmark consists of a set of example programs provided by Bekkouche [28]. The size of these programs ranges from 17 to 130 lines of code. Each program contains one to three faults that were specifically injected to evaluate fault localization techniques. The second benchmark contains 41 faulty versions of a real traffic collision avoidance system (TCAS) from Siemens [29], which is widely used in software testing and fault localization research. The third benchmark consists of real-world buggy programs from Defects4J [30].
We selected the programs in the first and second benchmarks for two main reasons. First, these programs are small, so they allow us to find all minimal angelic fix candidates, which is generally impossible for larger and more complex programs. By finding all minimal angelic fix candidates, we can precisely compare the efficiency of the techniques in terms of complexity reduction and their effectiveness in terms of success rates. Second, the TCAS programs are commonly used to evaluate state-of-the-art formula-based fault localization techniques, including BugAssist, Sniper, and LocFaults, so we can directly compare our results with those techniques. Since the programs in the first and second benchmarks are small and contain only artificial faults, the third benchmark adds non-trivial open-source programs with real bugs.
We obtained Java versions of the TCAS programs from the SIR website. Each faulty version of the TCAS program has a size of 180 lines and contains one to three artificially injected faults. These programs also come with a total of 1576 test inputs and a fault-free version. For each buggy version, we manually compared it with the fault-free version and considered the set of differing statements to be the actual faulty statements. To obtain failing test cases for each buggy version, we ran all test cases on the fault-free version to obtain the expected outputs. Then, for each buggy version, we ran all the test cases and matched the results against the expected outputs to identify the failing test cases.
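This oracle-based procedure for identifying failing test cases can be sketched as follows. The fault-free and buggy functions here are toy stand-ins for the TCAS versions, invented purely to show the two steps:

```python
def fault_free(x, y):
    return x + y                        # reference (fault-free) version

def buggy(x, y):
    return x - y if y > 1 else x + y    # injected fault triggers when y > 1

# Step 1: run every test input on the fault-free version to build the oracle.
tests = [(1, 0), (2, 1), (3, 2), (4, 5)]
expected = {t: fault_free(*t) for t in tests}

# Step 2: run the buggy version and keep the inputs whose output mismatches.
failing = [t for t in tests if buggy(*t) != expected[t]]
print(failing)                          # inputs that expose the injected fault
```

Only the inputs that actually reach and expose the injected fault end up in the failing set; the other test cases pass on both versions and are filtered out.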
The third benchmark we considered is several faulty programs in the Defects4J, a benchmark containing real bugs from large and complex open-source projects. We randomly select several buggy versions of open-source projects in the Defects4J. These programs include JFreeChart, Commons-Codec, Commons-Compress, Common-CSV, Common-Lang, Common-Math, and Mockito. In this experiment, we use the failing test cases that are already provided in the Defects4J as the input for the AgxFaults tool.

Study Protocol

We ran each of the four techniques implemented in our tool on each buggy program multiple times, each time inputting a different failing test case and outputting a list of minimal angelic fix candidates (MFCs). We examined all MFCs produced for each run, in the generated order, and determined the validity and accuracy of the results. Specifically, a run was considered a successful fault localization if its output contained a correct fault location MFC, i.e., an MFC in which all suspicious statements are actually faulty statements. A run was considered a successful fix localization if its output contained a correct fix location MFC, i.e., an MFC that is both a feasible fix candidate and performs the fix at the correct fault location.
To evaluate the performance of the techniques, we recorded the average CPU time of each technique for processing each failing test case and the amount of that time consumed by the solvers, the number of MFCs each technique generated and how many of them are feasible fix candidates, the number of code lines included in the report, whether the generated report contained the actual fault, and how many lines of code a developer would have to examine to identify the actual fault location.
We did not set a timeout when running the techniques on the programs in the TCAS and Bekkouche benchmarks; thus, each tool finished after it had reported all the MFCs that it could find. When the techniques were run on the programs in the Defects4J benchmark, we set a timeout of five minutes and a loop unwinding bound of 100.

5. Results of the Experiments

5.1. Result of RQ1: Effectiveness in Finding Angelic Fix Candidates

To evaluate the effectiveness of our method in finding angelic fix candidates, we report and compare (1) the number of MFCs that each technique found per run and (2) how many of these are feasible angelic fix candidates. A technique is said to be more effective in finding angelic fix candidates if it can find more feasible fix candidates.
Table 1 shows the experimental results for 41 buggy versions of the TCAS program in terms of effectiveness in finding MFCs. The first section of the table shows the version name (Ver), the number of faults (Faults), and the number of failing test cases (Ftc) for each buggy version. The second and third sections of the table show, on average, the total number of MFCs that each technique found in a fault localization run for each buggy version, as well as the number of these found MFCs that are feasible (i.e., MFCs with valid explanation), respectively.
For example, version v1 has one injected fault, and there are 131 failing test cases. Thus, each fault localization method was run on version v1 131 times. On average, both AgxFaults and PF found 44 minimal angelic fix candidates per run, while FS and FI found 52 and 3, respectively. After examining the MFCs generated by all methods, we found that all MFCs generated by AgxFaults, PF, and FI are feasible, while only 35 out of 52 MFCs generated by FS are feasible. Thus, the numbers of feasible MFCs generated by AgxFaults, PF, FS, and FI are 44, 44, 35, and 3, respectively.
In total, each method was run 2156 times over the 41 program versions. On average, the numbers of MFCs generated per run by AgxFaults, PF, FS, and FI are 87, 87, 45, and 3, respectively. Of these, the numbers of feasible MFCs for AgxFaults, PF, FS, and FI are 87, 87, 23, and 3, respectively. As the results show, AgxFaults produced the same results as the PF approach, which is based on a more complex error trace formula encoding the entire program semantics, on all buggy versions. All MFCs generated by the AgxFaults, PF, and FI approaches are feasible (i.e., provide valid explanations), while only 23 (51%) out of 45 MFCs generated by FS are feasible. AgxFaults outperforms the single-path formula approaches (both the flow-sensitive and flow-insensitive techniques) by finding more feasible MFCs.

5.2. Result of RQ2: Effectiveness in Fault Localization

To evaluate fault localization effectiveness, we report and compare (1) the number of successful fault localization runs, (2) the number of successful fix localization runs, and (3) the average EXAM score of each technique for each buggy program. To recall, a run of a technique is a successful fault localization if it found an MFC that fixes only faulty statements (i.e., all fix locations in the MFC are actually faulty statements). A fault localization run is a successful fix localization if it outputs an MFC that is both a feasible fix candidate and at a correct fault location. The EXAM score is the percentage of program code lines that the developer needs to examine before identifying the fault; it is computed as the ratio of the number of lines of code that the developer examines before reaching an actual faulty line to the total number of code lines in the program.
Table 2 shows the experimental results for the TCAS programs in terms of fault localization effectiveness. The first section of the table shows the version name (Ver) and the number of fault localization runs (Ftc) of each technique for each buggy version.
The columns in the first section, “#Succ. fault localization”, of Table 2 show the number of runs in which each technique succeeded in reporting the correct fault location. We obtained the results of Bug-Assist (BA) and LocFaults (LF) from References [19,22]. Out of 2156 runs of each technique, the AgxFaults, PF, and FS techniques successfully output the correct fault locations in 2156 (100%) runs, while Bug-Assist (BA), LocFaults (LF), and FI did so in 2087 (96%), 1345 (62%), and 121 (5.6%) runs, respectively. Notably, FI reported the correct fault location only for version v36, in which the faulty statement is data-dependent on the program output.
The columns in the second section, “#Succ. fix localization”, of Table 2 show the number of runs in which each technique output an MFC that is both a correct fault location and a feasible angelic fix candidate. As the results show, all 2156 out of 2156 runs of AgxFaults and PF are successful fix localization runs, while the FS and FI techniques succeed in 2027 and 1345 runs, respectively.
The columns in the last section of the table show the EXAM score of each technique. On average, the EXAM score for Agx was 6.8%, for PF was 6.9%, for FS was 11.5%, for FI was 17.7%, and for execution slicing was 17.6%.

5.3. Result of RQ3: Efficiency and Scalability

To answer RQ3, we compare AgxFaults with the PF, FS, and FI techniques based on their computational expense. We use CPU time as the metric to measure computational complexity. Specifically, we measure and report the CPU time spent by the solver to solve formulas and the total running time of each technique per run.
Figure 8 and Figure 9 show the formula solving time (the CPU time spent by the solver to solve formulas) and the total running time of the PF, Agx, FS, and FI techniques for the 41 buggy versions of the TCAS program. Both the formula solving time and the total running time of Agx were significantly smaller than those of PF for most versions. On average, the Agx approach was 28% faster than PF at formula solving, but three times slower than FS and 51 times slower than FI. At total running time, the Agx approach was 45% faster than PF, but 9.4 times slower than FS and 9.4 times slower than FI.
Because the computational complexity of the AgxFaults and PF approaches is proportional to the loop unwinding bound that limits the maximum number of iterations for each loop in the target program, the computational complexity increases as the loop unwinding bound increases. Thus, we used the set of programs containing loops in Bekkouche's benchmark to further evaluate the scalability of the fault localization approaches with respect to the loop unwinding bound. These programs include various variants of the SquareRoot, Sum, and BSearch programs. SquareRoot finds the integer part of the square root of an integer. Sum computes the sum of all natural numbers from one to a given input value. BSearch implements the binary search algorithm over a sorted array of integers in increasing order. We ran each fault localization technique multiple times for each program and each failing test case, varying the maximum number of loop unwindings for the program execution trace from 10 to 100.
The fault localization results returned by PF and AgxFaults are identical for each program. Specifically, both methods return 6 MFCs for SquareRoot, 8 MFCs for Sum, and 8 MFCs for BSearch. Figure 10 shows the average formula solving time of the Agx and PF approaches when applied to programs with increasing loop unwinding bounds. As shown in this graph, for a small loop unwinding bound, the formula solving times of PF and Agx are similar. However, as the loop unwinding bound increased, the time it took the PF approach to solve the problem grew exponentially, while that of the Agx approach grew at a significantly slower rate.

5.4. Result of RQ4: Real Software Bugs

This experiment evaluates the capability of AgxFaults on real bugs in large and complex projects. Table 3 gives details of the projects and the characteristics of the bugs used in this study. Columns “Name” and “LOC” in the table show the project name and the number of lines of Java code in each project. Columns “Bug ID” and “Description” show the unique id that identifies the bug and a description of each bug. The columns in the “Patch size” section of the table show the complexity of the patch written by the developer to fix each bug. Specifically, columns “Add”, “Del.”, and “Edit” show the number of lines that the developer added, deleted, and edited to fix the bug.
Table 4 shows the results of our method for each bug in the benchmark. Column “#MFC” shows the number of angelic fix candidates that AgxFaults found for a failing test case. All these generated MFCs are feasible. Column “#Susp. Lines” shows the number of distinct lines reported in the list of MFCs. Column “Found actual fault?” indicates whether the reported lines contain the actual faulty statements (“yes”/“no”). Column “Exam lines” shows the number of lines of code that the developer needs to examine to identify the first fault location. Columns “Solver time” and “Run time” show the time spent in the SMT solver and the total running time of AgxFaults for each buggy program.
Let us consider bug Chart 5, for example. Figure 11 shows how the developer fixed the bug: the developer (1) changed the condition expression of the if statement at line 548 and (2) added additional code at line 544. AgxFaults found 6 MFCs for this bug, shown in Figure 12, and all of these MFCs are feasible (replacing the value of the suspicious expressions with the corresponding angelic value actually results in a successful execution). A total of 7 lines of code are reported across all MFCs. The actual faulty line (i.e., the if statement at line 548) is reported in 4 MFCs: mfc3, mfc4, mfc5, and mfc6. Each of these MFCs contains the buggy line together with one additional statement. This result indicates that modifying the buggy if statement alone is not enough to make the failing test case pass; the developer should modify both the if statement at line 548 and one additional statement, as reported in mfc3, mfc4, mfc5, and mfc6. For example, mfc3 shows that modifying the if statement at line 548, together with the assignment “return = 0” at line 203 of the file XYDataItem.java, can make the failing test case succeed. The number of lines that the developer has to examine before identifying the faulty line is 3, as the developer needs to examine two lines in mfc1 and mfc2 before checking mfc3. The total running time of AgxFaults is about 3 s, of which the SMT solver accounts for 0.47 s.
Out of the 22 bugs in this study, there are 9 bugs for which AgxFaults reported the actual faulty line as the first candidate. A developer needs to examine fewer than 5 lines to identify the actual faulty line in all cases except bug Codec18. The total running time for each run is a few seconds, which is acceptable.

Comparison with Existing Techniques on Real Bugs

Because the program-formula approach crashed or timed out without generating any MFCs when it was run on the bugs in the real-bug benchmark, we can only compare the results of AgxFaults with those of the single-path formula approaches FS and FI.
Table 5 shows the comparison of the results generated by AgxFaults with those generated by the FS and FI approaches on the real-bug benchmark. Column “Trace size” shows the number of lines of code executed in the error trace of the bug. The “#MFC” columns show the number of minimal angelic fix candidates that each of AgxFaults, FS, and FI produced for a failing test case. The “#Susp. Lines” columns show the total number of lines that each technique reported as suspicious. The “#Exam lines” columns show the number of lines of code that the developer needs to examine to identify the first fault. An empty value in “#Exam lines” means that the developer would not find any faulty statement in the list of suspicious statements produced by the tool, i.e., the tool did not report any actual faulty statement in the buggy program.
As Table 5 shows, AgxFaults outperforms FS and FI in terms of fault localization effectiveness: of the 22 bugs in this study, AgxFaults successfully reported the actual faulty line for 19 bugs, while FS succeeded for 9 bugs and FI for 5 bugs.
We compared the efficiency of AgxFaults on the real-bug benchmark with the FS and FI techniques only. Table 6 shows the comparison. As shown in the table, the formula solving time of AgxFaults is about 1.7 times longer than that of FS and about 95 times longer than that of FI, and the total running time of AgxFaults is about 1.7 times longer than that of FS and about 5 times longer than that of FI. Formula solving accounts for about 43% of the total running time of AgxFaults, compared with 41% for FS and about 3% for FI.

5.5. Threats to Validity

The most important internal threat to validity in our evaluation is that we ourselves implemented the existing techniques that we compare against; since we target Java programs, all the techniques that we compare against unfortunately target C or C++ programs. Another internal threat is the possibility of errors in our implementation of the core-guided incremental sequential partial MaxSMT algorithms. To reduce these threats, we have made all the source code of our implementation available online in an open-source repository.
The main external threat to validity is that we performed our evaluation on two simple programs and a set of bugs from real open-source projects. These do not necessarily represent all types of programs and bugs; thus, our results may not generalize. Another external threat is that the SMT solver Z3 and the JConstraints library, which we used in our implementation, may contain bugs.

6. Related Work

For decades, many automated techniques have been proposed for fault localization. We refer the interested reader to the survey by W. Wong et al. [6] for a systematic literature review. In this section, we give a brief overview of the most popular fault localization families, such as spectrum-based, slicing-based, and mutation-based techniques, and focus especially on formula-based fault localization, which is closely related to our work.

6.1. Spectrum-Based Fault Localization

Most existing automatic fault localization techniques are spectrum-based fault localization (SFL) techniques [4,6]. SFL techniques profile the buggy program with a given test suite and count the number of passing and failing tests that cover each statement. Based on this coverage information, they compute for each statement a suspiciousness score that measures the likelihood of it being faulty, and output to developers a list of statements ranked by this score. SFL techniques require only lightweight computation; thus, they can be applied to very large programs. However, they usually return a long list of program entities with no context information. Moreover, in order to rank the actual faulty statement at the top of the suspicious list, they require a comprehensive test suite that contains sufficiently many passing and failing executions. These limitations reduce the usefulness of their fault localization results. Our approach requires only a single failing test case, and it returns small sets of suspicious statements at which a suitable modification can make the test pass.
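The coverage-based scoring that SFL techniques apply can be sketched as follows. This is an illustrative toy, not part of AgxFaults; the Ochiai formula is one common SFL metric, and the statement names and coverage counts are hypothetical:

```python
# Illustrative sketch (not part of AgxFaults): how an SFL technique turns
# per-statement coverage counts into a suspiciousness ranking.
import math

def ochiai(ef, nf, ep):
    """Ochiai suspiciousness. ef/nf = failing tests that do/do not cover the
    statement; ep = passing tests that cover it."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# coverage[stmt] = (ef, nf, ep) collected from a hypothetical test suite
coverage = {"s1": (2, 0, 1), "s2": (1, 1, 3), "s3": (0, 2, 4)}
ranked = sorted(coverage, key=lambda s: ochiai(*coverage[s]), reverse=True)
print(ranked)  # ['s1', 's2', 's3']: most suspicious first
```

Note that the ranking quality depends entirely on how well the test suite separates faulty from correct statements, which is the limitation discussed above.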

6.2. Program Slicing Based

Slicing-based techniques [31] use program dependence information to reduce the suspicious scope to the subset of statements that might affect the wrong values of variables at the failure site. Since all statements, together with the corresponding dependencies, are taken into account, slicing-based techniques often return an imprecise list of suspicious statements. B. Hofer and F. Wotawa [32] combine dynamic slicing with constraint solving to produce a more precise list of suspicious statements than dynamic slicing alone.
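The core of a dynamic slice is a backward closure over the dependencies observed in one execution. The sketch below is illustrative only; the statement names and dependency map are hypothetical:

```python
# Illustrative sketch of dynamic slicing: keep only the executed statements
# that the failure site transitively depends on.
def dynamic_slice(deps, criterion):
    """deps maps a statement to the statements it directly depends on
    (data or control dependence observed in the failing run)."""
    slice_, work = set(), [criterion]
    while work:
        s = work.pop()
        if s not in slice_:
            slice_.add(s)
            work.extend(deps.get(s, []))
    return slice_

# Hypothetical trace: s1 defines a; s2 computes b from a; s3 is unrelated;
# s4 outputs the wrong value of b (the slicing criterion).
deps = {"s4": ["s2"], "s2": ["s1"], "s3": []}
print(sorted(dynamic_slice(deps, "s4")))  # ['s1', 's2', 's4']; s3 is excluded
```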

6.3. Mutation-Based Fault Localization

Mutation-based fault localization (MBFL) [8] is a recent direction that utilizes mutation analysis for fault localization. These techniques first use a set of syntactic change operations (i.e., mutation operators) to mutate the program code and generate several variant programs, called mutants. They then run these mutants with the test cases and measure how the test execution results change when a code element is mutated. Based on this information, MBFL techniques statistically infer program elements that are highly relevant to the fault. A limitation of MBFL techniques is the huge mutation execution cost [9], because they need to generate a large number of mutants and run them against many test cases.
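The intuition behind MBFL scoring can be sketched in a few lines. This is an illustrative toy, not any specific MBFL tool; the scoring rule and the test outcomes are hypothetical:

```python
# Illustrative sketch (not a specific MBFL tool): a statement whose mutation
# flips failing tests to passing is likely fault-relevant.
def mbfl_score(outcomes):
    """outcomes: one (failed_before_mutation, failed_after_mutation) pair
    per test, for a mutant of one statement."""
    flipped = sum(1 for before, after in outcomes if before and not after)
    failing = sum(1 for before, _ in outcomes if before)
    return flipped / failing if failing else 0.0

# Mutating statement A makes both failing tests pass; mutating B changes nothing.
score_a = mbfl_score([(True, False), (True, False), (False, False)])
score_b = mbfl_score([(True, True), (True, True), (False, False)])
print(score_a, score_b)  # 1.0 0.0
```

The execution cost criticized in the text comes from the fact that each mutant requires rerunning (part of) the test suite.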

6.4. Formula-Based Fault Localization

Bug-Assist [12,22], SNIPER [14,33], and F. Wotawa [13] construct a formula that semantically represents all possible executions of a buggy program (unwound to a given bound) and conjoin this formula with clauses encoding the input and expected output of a failing test case to form an unsatisfiable error trace formula. Bug-Assist [12,22] and SNIPER [14,33] treat this error trace formula as an instance of the partial MaxSAT problem, in which the clauses encoding the test input and expected output are marked as hard clauses and the clauses encoding program statements are marked as soft clauses. They use a pMaxSAT solver to obtain MCSs and report the program statements that correspond to clauses in an MCS as possible faults. To reduce the formula solving time, S. Lamraoui et al. [15] determine correct basic blocks (CBs), which are basic blocks that do not participate in any failing execution, and set all clauses related to statements in these CBs as hard clauses. Instead of using a MaxSAT solver to obtain MCSs directly, F. Wotawa [13] derives the MCSs by computing irreducible infeasible subsets (minimal hitting sets) of the error trace formula. Our approach is similar to these MaxSAT-based approaches in finding minimal sets of program locations where an angelic fix may exist. However, it differs in several aspects. First, while these approaches are based on a static error trace formula, which may be overly complex or insufficient for reasoning, our approach is based on a formula that is constructed dynamically on demand to balance efficiency and complexity. Second, instead of using a MaxSAT solver, we adapt a core-guided MaxSAT algorithm to manipulate and solve the formula incrementally.
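The hard/soft clause split and the notion of an MCS can be made concrete with a toy formula. This sketch is illustrative only (it brute-forces satisfiability instead of calling a real pMaxSAT solver, and the clauses are hypothetical stand-ins for a real error trace formula):

```python
# Illustrative toy of the partial MaxSAT view of an error trace formula.
# Hard clauses encode the failing test's input/expected output; soft clauses
# encode statements. An MCS is a smallest set of soft clauses whose removal
# makes the whole formula satisfiable.
from itertools import combinations, product

def satisfiable(clauses, variables):
    """Clause = list of (var, value) literals; a clause holds if any literal
    matches the assignment. Brute-forces all boolean assignments."""
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[v] == val for v, val in clause) for clause in clauses):
            return True
    return False

def minimal_correction_sets(hard, soft, variables):
    """Return all minimum-size sets of soft-clause indices whose removal
    restores satisfiability."""
    for k in range(len(soft) + 1):
        mcss = [set(drop) for drop in combinations(range(len(soft)), k)
                if satisfiable(hard + [c for i, c in enumerate(soft)
                                       if i not in drop], variables)]
        if mcss:
            return mcss
    return []

# Hypothetical trace: the test requires x == True and y == False (hard);
# statement 0 sets y to True (contradicting the test); statement 1 sets x.
hard = [[("x", True)], [("y", False)]]
soft = [[("y", True)], [("x", True)]]
print(minimal_correction_sets(hard, soft, ["x", "y"]))  # [{0}]: statement 0 is suspect
```

Real tools obtain the same result with a pMaxSAT solver rather than enumeration, which is what makes the approach scale beyond toy formulas.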
E. Ermis [17], U. Christ [18], C. Oh [34], and M. Bekkouche [19] work on error trace formulas that represent the sequence of program statements whose execution produced an error, which we refer to as single-path formulas. E. Ermis [17] and U. Christ [18] leverage Craig interpolation to find error invariants at every point in the error trace, where an error invariant for a position in a trace is a condition under which the error will still occur if the program is continued from that position. Based on error invariants, they can semantically remove all irrelevant statements from the error trace, resulting in a shorter error trace in which bugs are easier to localize. Like our approach, both approaches can output a reduced error trace that contains only error-relevant statements (in our approach, the reduced error trace can be reconstructed by sorting all statements in an MCS in execution order); in addition, our approach also suggests how to fix these bugs. Moreover, while our approach uses incremental SMT solving, which is commonly supported by recent SMT solvers, these error invariant approaches require an interpolation solver, which is less widely available.
Bekkouche et al. [19] encode the assignment statements on a given error path into an error trace formula. They use a MaxSAT solver to compute the MCS of this formula and report the corresponding statements as possible faults. An attempt is made to divert at most k conditional branch decisions on the error path to find alternative corrected paths. For each corrected path found, the diverted conditional branches, as well as the MCS of the trace formula constructed on the path up to the first diverted condition, are reported as possibly faulty. Similar to this approach, we also divert the counterexample path to find corrected executions. However, there are several differences between our approach and that of Bekkouche et al. [19]. First, our approach finds corrected paths by diverting not only branch decisions but also assignment statements. Second, instead of checking all possible diverted paths by exhaustively diverting bounded subsets of conditions on the counterexample path, we encode the possible effects of diverting operations into the trace formula to leverage the search capability of the solver. As a result, by analyzing the MCSs obtained from the solver, a much smaller number of diversion attempts is needed to derive corrected paths than with Bekkouche's approach.
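The exhaustive branch-diversion search that our approach avoids can be sketched as follows. This is an illustrative toy, not LocFaults itself; the buggy program, its branch instrumentation, and the test are hypothetical:

```python
# Illustrative toy of path diversion: flip up to k of the branch decisions
# recorded on a failing path and re-execute, looking for a "corrected" path
# that produces the expected output.
from itertools import combinations

def run(x, flips=frozenset()):
    """Toy buggy abs(): the branch condition is inverted. `flips` inverts the
    decision of the listed branch indices, mimicking a diverted path."""
    def branch(i, cond):
        return (not cond) if i in flips else cond
    if branch(0, x < 0):      # buggy condition: should be x >= 0
        return x
    return -x

def divert(test_input, expected, n_branches, k):
    """Exhaustively try flipping up to k branch decisions until the test passes."""
    for size in range(1, k + 1):
        for flips in combinations(range(n_branches), size):
            if run(test_input, frozenset(flips)) == expected:
                return set(flips)   # branches whose diversion corrects the run
    return None

print(divert(3, 3, n_branches=1, k=1))  # {0}: diverting branch 0 makes the test pass
```

With many branches, the number of flip combinations grows combinatorially, which is why encoding diversion effects into the formula, as our approach does, reduces the number of attempts.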
W. Jin and A. Orso [16] proposed two techniques, called on-demand formula computation (OFC) and clause weighting (CW), to mitigate the computational expense and improve the accuracy of formula-based fault localization. Specifically, OFC (1) encodes only the statement instances in the original failing trace into the error trace formula and (2) computes all MCSs of the constructed formula. (3) If there is a conditional statement s such that (i) s is found in an MCS and (ii) a branch b of s is not yet encoded in the formula, then OFC expands the formula by encoding all statement instances in branch b and goes back to step (2); otherwise, the obtained MCSs are reported as the final output. OFC and our proposed ATF encoding are similar in that both encode only part of the program into the formula in an incremental manner. There are two main differences between OFC and our method. First, the formula in the OFC approach is expanded to encode all branches of conditional statements that occur in an MCS, while the ATF formula is expanded to refine abstracted conditional branches that are included in an angelic execution of the angelic program. Second, our approach does not require computing all MCSs of the intermediate formula, as OFC does; instead, it computes one MCS at a time and stops computing MCSs when the formula needs more refinement.
In the Angelic Debugging approach [21], program expressions in a suspicious scope (provided in advance) are replaced with a nondeterministic expression (i.e., an angelic choice that can return an arbitrary value). Then, symbolic execution is used to find a successful execution of the transformed program with the input fixed to a given failing test input. If such a successful execution exists, the transformed suspicious expression is considered a fix candidate, and the concrete values of the nondeterministic expression are reported as angelic values, or a suggestion for a fix. One limitation of this method is that it handles each expression separately; thus, symbolic execution must be run many times, once per expression, and the SMT solver must be called many times to solve different formulas. Another limitation is that it does not return minimal results: indeed, it can output a successful angelic execution by replacing all statements in the suspicious scope with angelic values.
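The angelic-replacement idea can be made concrete with a tiny example. This sketch is illustrative only: a real tool finds the angelic value via symbolic execution and an SMT solver, whereas this toy simply enumerates candidate values; the program and test are hypothetical:

```python
# Illustrative sketch of the angelic-debugging idea: replace one suspicious
# expression with a free "angelic" value and search for a value that makes
# the failing test pass.
def program(x, angelic=None):
    # Suspicious expression: should compute x * 2, but the code computes x + 2.
    v = angelic if angelic is not None else x + 2
    return v + 1

failing_input, expected = 5, 11          # program(5) == 8, so the test fails
angelic_value = next((a for a in range(-100, 100)
                      if program(failing_input, a) == expected), None)
print(angelic_value)  # 10: forcing the expression's value to 10 passes the test
```

Because the existence of an angelic value (here, 10 = 5 * 2) shows the expression is a plausible fix site, the expression is reported as a fix candidate and the value as a repair hint.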

6.5. Automatic Program Repair

Automatic program repair (APR) [7,35,36,37] is currently a hot research topic in software engineering. These techniques try to provide the developer with actual patches that make the buggy program pass a given test suite that it originally fails. Automated program repair techniques usually start with fault localization or fix localization [37,38] to identify a subset of code elements at which a patch can be applied. The effectiveness of the fix localization step is critically important to the effectiveness, as well as the reliability, of automatic program repair [39,40]. The fix localization components in semantics-based APR approaches (such as Angelix [37] and Nopol [36]) share the same objective as our approach: finding angelic execution paths that make the failing test case pass. Our method differs from these techniques in several respects. The angelic fix localization in Nopol finds angelic values only for conditional expressions, assumes a single modification, and does not use a solver. Our approach is similar to the angelic forest extractor component in Angelix, as both find angelic values for assignments as well as conditional expressions; however, our method produces angelic execution paths by modifying a minimal set of locations, while Angelix does not constrain the number of fix locations.

7. Conclusions

In this paper, we presented AgxFaults, a formula-based fault localization method that aims to automatically find minimal sets of program locations where a bug fix might exist. We implemented AgxFaults as an extension of Java PathFinder for automatically localizing faults in Java programs. We used AgxFaults to localize faults in benchmark programs of various sizes and compared its performance to existing formula-based fault localization approaches. The experimental results demonstrated that our proposed method outperformed single-path formula approaches in terms of effectiveness, and that AgxFaults was comparable to the program-formula approach in effectiveness while being better in efficiency and scalability. We also demonstrated the capability of AgxFaults when applied to bugs from large real-world software projects.

Author Contributions

Conceptualization, Q.-N.P.; methodology, Q.-N.P.; software, Q.-N.P.; validation, Q.-N.P. and E.L.; formal analysis, Q.-N.P. and E.L.; investigation, Q.-N.P. and E.L.; writing—original draft preparation, Q.-N.P.; writing—review and editing, Q.-N.P. and E.L.; supervision, E.L.; project administration, E.L.; funding acquisition, E.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Next-Generation Information Computing Development Program (2017M3C4A7068179), and the Basic Science Research Program (2019R1A2C2006411) through the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The implementation and benchmarks data are publicly available at https://bit.ly/agxfaults.

Acknowledgments

I would like to extend my thanks to my advisor Eunseok Lee and my lab mates for their guidance and support throughout the process of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Britton, T.; Jeng, L.; Carver, G.; Cheak, P.; Katzenellenbogen, T. Reversible Debugging Software: Quantify the Time and Cost Saved Using Reversible Debuggers; University of Cambridge: Cambridge, UK, 2013; Available online: https://core.ac.uk/display/23390105 (accessed on 29 December 2020).
  2. Hailpern, B.; Santhanam, P. Software debugging, testing, and verification. IBM Syst. J. 2002, 41, 4–12.
  3. Parnin, C.; Orso, A. Are automated debugging techniques actually helping programmers? In Proceedings of the 2011 International Symposium on Software Testing and Analysis, Toronto, ON, Canada, 17–21 July 2011; p. 199.
  4. Abreu, R.; Zoeteweij, P.; Gemund, A.V. Spectrum-Based Multiple Fault Localization. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, Auckland, New Zealand, 16–20 November 2009.
  5. Roychoudhury, A.; Chandra, S. Formula-based software debugging. Commun. ACM 2016, 59, 68–77.
  6. Wong, W.E.; Gao, R.; Li, Y.; Abreu, R.; Wotawa, F. A Survey on Software Fault Localization. IEEE Trans. Softw. Eng. 2016, 42, 707–740.
  7. Gazzola, L.; Micucci, D.; Mariani, L. Automatic Software Repair: A Survey. IEEE Trans. Softw. Eng. 2017. [Early access]
  8. Papadakis, M.; Le Traon, Y. Metallaxis-FL: Mutation-based fault localization. Softw. Test. Verif. Reliab. 2015, 25, 605–628.
  9. Li, Z.; Wang, H.; Liu, Y. HMER: A Hybrid Mutation Execution Reduction approach for Mutation-based Fault Localization. J. Syst. Softw. 2020, 168, 110661.
  10. Pearson, S.; Campos, J.; Just, R.; Fraser, G.; Abreu, R.; Ernst, M.D.; Pang, D.; Keller, B. Evaluating & improving fault localization techniques. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, Argentina, 20–28 May 2017.
  11. Gopinath, D.; Zaeem, R.N.; Khurshid, S. Improving the effectiveness of spectra-based fault localization using specifications. In Proceedings of the 2012 27th IEEE/ACM International Conference on Automated Software Engineering, Essen, Germany, 3–7 September 2012; p. 40.
  12. Jose, M.; Majumdar, R. Cause Clue Clauses: Error Localization using Maximum Satisfiability. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, San Jose, CA, USA, 4–8 June 2011; pp. 437–446.
  13. Wotawa, F.; Nica, M.; Moraru, I. Automated debugging based on a constraint model of the program and a test case. J. Log. Algebr. Program. 2012, 81, 390–407.
  14. Lamraoui, S.M.; Nakajima, S. A Formula-Based Approach for Automatic Fault Localization of Imperative Programs. In Proceedings of the 16th International Conference on Formal Engineering Methods, Luxembourg, 3–5 November 2014; pp. 251–266.
  15. Lamraoui, S.M.; Nakajima, S.; Hosobe, H. Hardened Flow-Sensitive Trace Formula for Fault Localization. In Proceedings of the International Conference on Engineering of Complex Computer Systems (ICECCS), Gold Coast, Australia, 9–12 December 2015; pp. 50–59.
  16. Jin, W.; Orso, A. Improving efficiency and accuracy of formula-based debugging. In Proceedings of the Haifa Verification Conference, Haifa, Israel, 14–17 November 2016; pp. 99–116.
  17. Ermis, E.; Schäf, M.; Wies, T. Error invariants. In Proceedings of the International Symposium on Formal Methods, Paris, France, 27–31 August 2012; pp. 187–201.
  18. Christ, U.; Ermis, E.; Schäf, M.; Wies, T. Flow-Sensitive Fault Localization. Verif. Model Checking Abstr. Interpret. 2013, 7737, 189–208.
  19. Bekkouche, M.; Collavizza, H.; Rueher, M. LocFaults: A new flow-driven and constraint-based error localization approach. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, Salamanca, Spain, 13–17 April 2015; pp. 1773–1780.
  20. Si, X.; Zhang, X.; Manquinho, V.; Janota, M.; Ignatiev, A.; Naik, M. On Incremental Core-Guided MaxSAT Solving. In Principles and Practice of Constraint Programming; Rueher, M., Ed.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9892, pp. 473–482.
  21. Chandra, S.; Torlak, E.; Barman, S.; Bodik, R. Angelic debugging. In Proceedings of the 2011 33rd International Conference on Software Engineering (ICSE), Honolulu, HI, USA, 21–28 May 2011; pp. 121–130.
  22. Jose, M.; Majumdar, R. Bug-assist: Assisting fault localization in ANSI-C Programs. In Proceedings of the International Conference on Computer Aided Verification, Edinburgh, UK, 15–19 July 2011; pp. 504–509.
  23. Cytron, R.; Ferrante, J.; Rosen, B.K.; Wegman, M.N.; Zadeck, F.K. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 1991, 13, 451–490.
  24. Barman, S.; Bodik, R.; Chandra, S.; Galenson, J.; Kimelman, D.; Rodarmor, C.; Tung, N. Programming with angelic nondeterminism. ACM SIGPLAN Not. 2010, 45, 339.
  25. Luckow, K.; Dimjašević, M.; Giannakopoulou, D.; Howar, F.; Isberner, M.; Kahsai, T.; Rakamarić, Z.; Raman, V. JDart: A Dynamic Symbolic Analysis Framework. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2016), Eindhoven, The Netherlands, 4–7 April 2016.
  26. Howar, F.; Jabbour, F.; Mues, M. JConstraints: A Library for Working with Logic Expressions in Java. In Models, Mindsets, Meta: The What, the How, and the Why Not? Springer: Cham, Switzerland, 2019.
  27. Fu, Z.; Malik, S. On Solving the Partial MAX-SAT Problem. In Proceedings of the International Conference on Theory and Applications of Satisfiability Testing 2006, Seattle, WA, USA, 12–15 August 2006; pp. 252–265.
  28. Bekkouche, M. Java Benchmark. Available online: http://www.capv.toile-libre.org/Benchs_Mohammed.html (accessed on 29 December 2020).
  29. Hutchins, M.; Foster, H.; Goradia, T.; Ostrand, T. Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In Proceedings of the 16th International Conference on Software Engineering, Sorrento, Italy, 16–21 May 1994.
  30. Just, R.; Jalali, D.; Ernst, M.D. Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies for Java Programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, San Jose, CA, USA, 21–26 July 2014.
  31. Weiser, M. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, San Diego, CA, USA, 9–12 March 1981.
  32. Hofer, B.; Wotawa, F. Combining slicing and constraint solving for better debugging: The CONBAS approach. Adv. Softw. Eng. 2012, 2012, 628571.
  33. Lamraoui, S.M.; Nakajima, S. A Formula-Based Approach for Automatic Fault Localization of Multi-fault Programs. J. Inf. Process. 2016, 24, 251–266.
  34. Oh, C.; Schäf, M.; Schwartz-Narbonne, D.; Wies, T. Concolic Fault Abstraction. In Proceedings of the 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation, Victoria, BC, USA, 28–29 September 2014; pp. 135–144.
  35. Yuan, Y.; Banzhaf, W. Toward Better Evolutionary Program Repair: An Integrated Approach. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2020, 29, 1–53.
  36. Xuan, J.; Martinez, M.; DeMarco, F.; Clément, M.; Marcote, S.L.; Durieux, T.; Berre, D.L.; Monperrus, M. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs. IEEE Trans. Softw. Eng. 2016, 41, 34–55.
  37. Mechtaev, S.; Yi, J.; Roychoudhury, A. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In Proceedings of the International Conference on Software Engineering, Austin, TX, USA, 14–22 May 2016.
  38. Jeffrey, D.; Gupta, N.; Gupta, R. Fault localization using value replacement. In Proceedings of the 2008 International Symposium on Software Testing and Analysis, Seattle, WA, USA, 20–24 July 2008; p. 167.
  39. Liu, K.; Koyuncu, A.; Bissyande, T.F.; Kim, D.; Klein, J.; Le Traon, Y. You cannot fix what you cannot find! An investigation of fault localization bias in benchmarking automated program repair systems. In Proceedings of the 2019 IEEE 12th International Conference on Software Testing, Verification and Validation, Xi’an, China, 22–27 April 2019; pp. 102–113.
  40. Liu, K.; Li, L.; Koyuncu, A.; Kim, D.; Liu, Z.; Klein, J.; Bissyandé, T.F. A critical review on the evaluation of automated program repair systems. J. Syst. Softw. 2021, 171, 110817.
Figure 1. Buggy program and a failing test.
Figure 2. Static single assignment (SSA) representation of method foo.
Figure 3. Error trace formula.
Figure 4. Overview of AgxFaults.
Figure 5. Angelic program encoded during the process of AgxFaults.
Figure 6. Constraints added in the first iteration.
Figure 7. Constraints added in the fourth iteration.
Figure 8. Formula solving time of different methods on the TCAS programs.
Figure 9. Total runtime of different methods on the TCAS programs.
Figure 10. Formula solving time with increasing loop unwinding bounds.
Figure 11. Chart 5 diff.
Figure 12. Minimal angelic fix candidates (MFCs) returned by AgxFaults for the bug Chart 5.
Table 1. Comparison of the effectiveness in finding minimal angelic fix candidate of AgxFaults (Agx), Program Formula (PF), Single Path Control-Flow Sensitive (FS), and Control-Flow Insensitive (FI) approaches on the traffic collision avoidance system (TCAS) programs.
Program InfoNum. MFC FoundNum. Feasible MFC
ProgFaultsFtcAgxPFFSFIAgxPFFSFI
v1113144445234444353
v216948484434848223
v312461615336161333
v412242424834242343
v511061614736161313
v611265655036565313
v713666665036666323
v81144445234444353
v91739394833939143
v1021481815038181353
v11314815015037315015053
v1217059594935959303
v131457575135757333
v14150771037773
v1521064644636464303
v1617063634936363333
v1713564645036464323
v1812985855038585233
v1911964645136464333
v201739394833939143
v2111642425134242353
v22111121121493121121113
v2314236364933636143
v241742424934242333
v251379795137979133
v2611155554935555313
v2711061614736161313
v2817648484334848213
v2911856564335656213
v3015844444234444203
v311279106106423106106213
v321336107107433107107303
v3417950504535050263
v3527648484334848213
v36112122322352322322333
v3719952525035252253
v391379795137979133
v402121111111473111111153
v4112244444934444363
SumAvg.215687874538787233
Table 2. Comparison of the effectiveness in identifying fault locations of AgxFaults (Agx), Program Formula (PF), Single Path Control-Flow Sensitive (FS), Control-Flow Insensitive (FI), Bug-Assist (BA), and LocFaults (LF) approaches on the TCAS programs.
#Succ. Fault Localization#Succ. Fix LocalizationEXAM Score
VerFtcAgxPFFSFIBALFAgxPFFSFIAgxPFFSFIExe
v1131131131131013113113113113106.18.46.419.318.2
v2696969690696969694207.47.910.218.717.6
v3242424240142324242405.16.19.819.518.3
v422222222022422222208.689.719.418.2
v510101010010910101005.86.58.819.518.3
v6121212120121112129012.413.613.719.218.1
v73636363603636363636013.112.415.619.118
v81111011111013.717.617.620.219.1
v9777707777704.97.211.819.718.5
v101414141401412141411013.712.712.719.218.1
v11148148148148014814814814814802.22.16.818.917.7
v12707070700484570707005.56.67.619.418.3
v134444044444013.28.213.619.518.5
v14505050500505050505002.913.711.19.9
v15101010100101010101006.35.4719.418.2
v16707070700707070707009.89.816.918.717.6
v17353535350353535353509.88.213.719.118
v182929292902928292929066.713.819.718.5
v191919191901918191919013.111.81719.518.4
v20777707777704.97.211.819.718.5
v21161616160161616161606.48.214.419.618.5
v221111111101111111111043.413.120.219
v234242424204142424242054.712.719.818.7
v247777077777011.98.36.619.718.5
v25333303233304.63.36.62018.9
v2611111111011711111107.54.38.219.618.5
v2710101010010910101005.86.58.819.518.3
v28767676760587476765507.97.412.718.617.5
v29181818180181718181407.37.58.318.517.3
v30585858580585858584306.96.99.318.217.1
v312792792792790279 27927927906.26.411.818.217.1
v323363363363360336 33633633609.19.513.418.917.7
v34797979790797979797907.67.67.819.418.2
v35767676760587476765107.27.21218.617.5
v361211211211211211211201211211211211.51.414.81.518.6
v37999999990992199996809.29.817.819.117.9
v39333303233304.63.36.62018.9
v4012112112112101217212112112105.75.513.418.617.5
v41222222220221622222205.75.49.819.418.2
2156215621562156121208713452156215620271216.86.911.517.717.6
Table 3. Bugs used for studies.
ProjectBugDev. Patch Size
NameLOCBugIDDescriptionAddDelEdit
Chart89.3KChart5mising branch401
Chart7wrong assignments002
Chart18missing guard1132
Chart22algorithm error3012
Codec17.5KCodec16wrong field initialization111
Codec18wrong return expression111
Compress28.3KComp24algorithm error340
Comp27wrong branch guard030
Comp6algorithm error302
Csv3.8KCsv2mising try-catch700
Csv8algorithm error670
Lang53.2KLang14missing branch301
Lang21wrong return expression111
Lang22missing branch712
Lang30missing branch3805
Lang31missing branch800
Lang40design error701
Lang58wrong if condition010
Math60.6KMath94wrong if condition001
Math97design error1402
Mockito10.5KMock11design error801
Mock21design error1604
Table 4. Fault localization results of AgxFaults for real bugs.
Bug ID#MFC#Susp. LinesFound Actual Fault?#Exam LinesSolver Time (ms)Run Time (ms)
Chart567yes34703167
Chart744yes415(s)19(s)
Chart1833yes34585749
Chart2211no22203509
Codec1663yes5582237
Codec184812yes1229387870
Comp243114yes3937026(s)
Comp273117yes229(s)63(s)
Comp644yes2461586
Csv222yes151419
Csv844yes12022369
Lang1411yes1691772
Lang2181yes128976534
Lang2211no325(s)29(s)
Lang30118yes51662397
Lang313913yes18515757
Lang40134yes44456444
Lang5844yes11575907
Math9422yes157(s)59(s)
Math9722yes17222512
Mock1111yes141743
Mock2122no351725
Table 5. Comparing the results of AgxFaults (Agx), Single-path Control-Flow-Sensitive (FS), and Single-path Control-Flow-Insensitive (FI) approaches on real bugs.
Trace Size#MFC#Susp. Lines#Exam Lines
Bug IDAgxFSFIAgxFSFIAgxFSFI
Chart592861477146391
Chart784341404140412-
Chart188063033033--
Chart2275411141114-7-
Codec16546923525--
Codec182074866126612--
Comp245836316614553--
Comp276183312217222--
Comp6904604602--
Csv2452222221--
Csv85754334331--
Lang1461131121111
Lang21510388011011-
Lang221621110100---
Lang301611110088052-
Lang31345390013001--
Lang40774013004004--
Lang5847641394108172
Math947912002001--
Math97155244244144
Mock1135111111111
Mock21105222222---
Table 6. Comparing execution time of AgxFaults (Agx), Single-path Control-Flow-Sensitive (FS), and Single-path Control-Flow-Insensitive (FI) approaches on real bugs.
Solver TimeRunning Time
Bug IDAgxFSFIAgxFSFI
Chart5470571108316739531968
Chart715,170546318,96233763227
Chart184580139574925972434
Chart22220400119350930702192
Codec165819376223730852485
Codec182938139152787023412009
Comp24937021217926,29539503094
Comp2729,469121935563,23667465035
Comp646673158617451368
Csv25108141924781390
Csv820210281236938412597
Lang14691813177217021668
Lang21289747554653481282495
Lang2225,96253,232429,50757,6913452
Lang301661412239719741640
Lang31851573575724881616
Lang40445193644424381802
Lang58157257136590729982298
Math9456,59524,535559,01427,8602569
Math97722180116251220911661
Mock114107174318071698
Mock2154022172521151766
Sum146,27986,7031538259,711148,47450,464
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
