#### *3.2. Feedback Collector*

The feedback collector continuously updates seed information to assist seed scheduling. As the instrumented program runs, a series of runtime information is updated for the seeds. Algorithm 3 shows how the feedback collector updates this information. It takes the seed queue *Q*, the pointer variables *danger\_bits* and *edge\_r*, and the instrumented program *P* as inputs, and outputs the seed queue *Q* with new information. The new information includes the number of times the seed has been selected, the path frequency, the path risk, and the mutation information. Specifically, MooFuzz selects a seed *s* by using the seed scheduler (see Section 3.3) and updates the number of times it has been selected (Lines 1–3). Then, it uses a mutation strategy to generate a new test case *s′* and executes the target program with *s′* (Lines 4–5). Next, the two pointer variables *danger\_bits* and *edge\_r* are used to update the edge risk (Line 6). Here, *danger\_bits* is obtained with the pointer variable *danger\_trace*, and *edge\_r* records the risk of each edge. At the beginning, each edge corresponding to a dangerous function has the maximum value, while all other edges have a value of zero. Next, if the mutated test case produces new coverage, MooFuzz calculates the path risk value (Lines 7–8). MooFuzz then traverses each seed in the seed pool and determines whether its path is the same as the current path. If so, the frequency information of that seed is updated (Lines 9–11). Finally, if the path of *s′* is identical to the path of *s*, the mutation information is updated (Lines 12–13).

We describe each of these updates in turn.

The path risk mainly reflects the ability of a seed to reach dangerous locations, which affects how many bugs are discovered and how quickly. Before discussing the path risk, we first define the edge risk update and then the path risk update.

**The edge risk update.** Given an edge *e<sub>i</sub>* and the corresponding hit count *danger\_bits*[*e<sub>i</sub>*], the edge risk *edge\_r*[*e<sub>i</sub>*] is updated as follows.

$$\mathit{edge\_r}[e_i] = \begin{cases} \mathit{edge\_r}[e_i] - \mathit{danger\_bits}[e_i], & e_i \in \mathit{danger\_edge} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where *danger\_edge* is the set of edges corresponding to dangerous functions.
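The edge risk update in Equation (5) can be illustrated with a minimal Python sketch. The dictionary representation of *edge\_r* and *danger\_bits*, the helper names, and the constant `MAX_RISK = 255` are assumptions made for illustration; the text only states that dangerous edges start at "a maximum value".

```python
from typing import Dict, Set

MAX_RISK = 255  # assumed initial value; the paper only says "a maximum value"


def init_edge_risk(all_edges: Set[int], danger_edge: Set[int]) -> Dict[int, int]:
    """Dangerous edges start at the maximum risk, all other edges at zero."""
    return {e: (MAX_RISK if e in danger_edge else 0) for e in all_edges}


def update_edge_risk(edge_r: Dict[int, int],
                     danger_bits: Dict[int, int],
                     danger_edge: Set[int]) -> None:
    """Apply Equation (5) in place after one execution of the target."""
    for e in edge_r:
        if e in danger_edge:
            # Subtract the hit count observed for this dangerous edge.
            edge_r[e] = edge_r[e] - danger_bits.get(e, 0)
        else:
            edge_r[e] = 0


edge_r = init_edge_risk({1, 2, 3}, danger_edge={1})
update_edge_risk(edge_r, danger_bits={1: 3}, danger_edge={1})
```

The sketch does not clamp the risk at zero, because Equation (5) does not specify a lower bound.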

**Algorithm 3:** Information update

**Input:** a seed queue *Q*, a pointer variable *danger\_bits*, a pointer variable *edge\_r*, the instrumented program *P*

**Output:** a seed queue *Q* with new information

**1** *S* = *SeedSchedule*(*Q*)
**2** **for** *s* in *S* **do**
**3**     *s*.*select\_num*++
**4**     *s′* ← *Mutation*(*s*)
**5**     (*trace\_bits*, *danger\_bits*) ← *Run\_target*(*s′*)
**6**     *update\_edge\_risk*(*danger\_bits*, *edge\_r*)
**7**     **if** *is\_NewCoverage*(*P*, *s′*) **then**
**8**         *calculate\_path\_risk*(*edge\_r*)
**9**     **for** *s<sub>i</sub>* in seed pool **do**
**10**         **if** the path of *s<sub>i</sub>* is the same as that of *s′* **then**
**11**             *update\_fre\_info*(*s<sub>i</sub>*)
**12**     **if** the path of *s′* is the same as that of *s* **then**
**13**         *update\_mta\_info*(*s*)

**The path risk update.** Given a seed *s* and the risk values of all edges covered by *s*, the path risk of *s*, denoted *s*.*risk*, is calculated as follows.

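$$s.\mathit{risk} = \sum_{i=1}^{N} \frac{\mathit{edge\_r}[e_i]}{N} \tag{6}$$

where *e<sub>i</sub>* ranges over the edges covered by *s* and *N* is the number of those edges.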
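As a small worked sketch of Equation (6), the path risk is the mean edge risk over the edges a seed covers. The function name, the list and dictionary representations, and the zero result for an empty path are assumptions for illustration.

```python
from typing import Dict, Iterable


def path_risk(covered_edges: Iterable[int], edge_r: Dict[int, int]) -> float:
    """Average the risk of the edges covered by a seed (Equation (6))."""
    edges = list(covered_edges)
    if not edges:
        return 0.0  # assumption: a seed that covers no edge has zero risk
    return sum(edge_r.get(e, 0) for e in edges) / len(edges)


# One dangerous edge with risk 252 among three covered edges.
risk = path_risk([1, 2, 3], {1: 252, 2: 0, 3: 0})
```

A seed whose path passes through more high-risk edges relative to its length therefore receives a larger *s*.*risk*.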

The path frequency indicates the ability of a seed to discover new paths; the larger the value, the more frequently the path has been exercised. As fuzzing proceeds, some paths in the program become high-frequency paths while others remain low-frequency paths. Generally, after the program has been running for a while, seeds that cover low-frequency paths have a higher probability of discovering new paths than seeds that cover high-frequency paths.

**The path frequency update.** Given a seed *s* with path *p<sub>s</sub>*, if there is a seed *s′* in the seed pool whose path *p<sub>s′</sub>* is the same as *p<sub>s</sub>*, we add one to the path frequency of seed *s*, that is,

$$s.\mathit{fre} = s.\mathit{fre} + 1, \quad \text{if } p_{s'} = p_s \tag{7}$$

The mutation information indicates the mutation ability of a seed; the smaller the value, the better. For each seed that has not been fuzzed, the value is set to 0, indicating that the seed has the best mutation validity. Among the seeds being fuzzed, the mutation ability is continuously evaluated, and seeds with high mutation ability obtain priority.

**The mutation information update.** Given a seed *s* and its mutation strategy *M*, if the path of seed *s* is the same as that of the seed *s′* generated by mutating *s*, the mutation information of seed *s*, denoted *s*.*mta*, is calculated as follows.

$$s.\mathit{mta} = s.\mathit{mta} + 1, \quad \text{if } s' = M(s) \text{ and } p_{s'} = p_s \tag{8}$$
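The two counter updates can be sketched as follows, following Lines 9–13 of Algorithm 3: every pool seed whose path equals the path of the new test case has its frequency incremented, and the parent seed's mutation information is incremented when the mutation did not change the path. The `Seed` class and the use of a tuple of edge identifiers as the path are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Seed:
    path: Tuple[int, ...]  # hypothetical path identifier: covered edge ids
    fre: int = 0           # path frequency
    mta: int = 0           # mutation information (0 for unfuzzed seeds)


def update_fre_info(pool: List[Seed], new_path: Tuple[int, ...]) -> None:
    """Increment the frequency of each pool seed that shares the new path."""
    for seed in pool:
        if seed.path == new_path:
            seed.fre += 1


def update_mta_info(parent: Seed, child_path: Tuple[int, ...]) -> None:
    """Increment mta when the mutated test case stays on the parent's path."""
    if parent.path == child_path:
        parent.mta += 1


pool = [Seed(path=(1, 2)), Seed(path=(1, 3))]
update_fre_info(pool, (1, 2))     # the test case exercised path (1, 2)
update_mta_info(pool[0], (1, 2))  # mutation of pool[0] found nothing new
```

A growing *mta* value thus marks a seed whose mutations keep reproducing its own path, which is why smaller values are preferred.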

#### *3.3. Seed Scheduler*

The seed scheduler is mainly used for seed selection. To prioritize seeds effectively, we propose a many-objective optimization seed schedule scheme.

Before seed scheduling, MooFuzz classifies the seed pool into one of three states according to seed attributes.

**Exploration State**. In this state, the seed pool contains seeds that are favored and have not yet been fuzzed. Exploration State means that the seed pool is in an excellent condition and maintains the diversity of seeds.

**Search State**. In this state, the favored seeds have been fuzzed, but there are still unfuzzed seeds. Search State means that there is a risk of the seed pool becoming completely fuzzed, so it is necessary to concentrate on finding more paths.

**Assessment State**. In this state, all the seeds have been fuzzed. It is very difficult to find a priority seed, but the fuzzed seeds have produced a lot of information that can serve as a reference. In addition, MooFuzz monitors the state while in Assessment State. Once the state changes, the seed set of the current state is discarded and seed scheduling is performed in the new state.
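The three states above can be derived from two per-seed flags. The following sketch assumes hypothetical `fuzzed` and `favored` attributes on each seed; the enum and function names are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PoolState(Enum):
    EXPLORATION = "exploration"
    SEARCH = "search"
    ASSESSMENT = "assessment"


@dataclass
class Seed:
    fuzzed: bool
    favored: bool


def pool_state(pool: List[Seed]) -> PoolState:
    """Classify the seed pool according to the three state definitions."""
    if any(s.favored and not s.fuzzed for s in pool):
        return PoolState.EXPLORATION  # unfuzzed favored seeds remain
    if any(not s.fuzzed for s in pool):
        return PoolState.SEARCH       # only unfavored seeds are unfuzzed
    return PoolState.ASSESSMENT       # every seed has been fuzzed


state = pool_state([Seed(fuzzed=True, favored=True),
                    Seed(fuzzed=False, favored=False)])
```

Re-evaluating this function after each new seed is added is one simple way to realize the state monitoring mentioned for Assessment State. Note that an empty pool falls through to Assessment State in this sketch.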

For these three states, MooFuzz uses different selection criteria based on bug detection, path discovery, and seed evaluation. MooFuzz constructs different objective functions based on different states.

As discussed above, MooFuzz obtains the path risk value of a seed before the seed is added to the seed pool. According to previous research [8], seeds with deeper execution paths may be more capable of detecting bugs. Therefore, MooFuzz uses path risk *r* and path depth *d* as objectives for seed selection. To reduce the energy consumption of seeds and speed up the discovery of bugs, MooFuzz also takes the length *l* of the seed data and the execution time *t* of the seed as objectives. In Exploration State, MooFuzz uses the following objective functions to select among the seeds that are favored and have not been fuzzed.

$$\text{Min } F(s) = [-r, -d, l, t]^T, \quad s \in S \tag{9}$$

Search State indicates that all the favored seeds in the current seed pool have been fuzzed and that unfuzzed seeds remain. At this point, seed selection focuses mainly on path discovery. The frequency information of the seeds increases as fuzzing time goes on. In this state, seeds that exercise low-frequency paths have greater potential to discover new paths. MooFuzz therefore regards path frequency *e* and path depth *d* as criteria for seed selection. Meanwhile, MooFuzz uses *l* and *t* described above to balance energy consumption. In Search State, MooFuzz uses the following objective functions to select among the seeds that have not been fuzzed.

$$\text{Min } F(s) = [e, -d, l, t]^T, \quad s \in S \tag{10}$$

Assessment State means that all seeds in the current seed pool have been fuzzed. MooFuzz obtains the information of each seed, including the path frequency *e*, the number of times the seed has been selected *n*, the path depth *d*, and the mutation information *m*, and uses them as criteria in the objective functions. Note that this state does not use the length and execution time of the seed to balance energy consumption, because it is very difficult to generate new seeds in this state. In addition, once new seeds are generated, Assessment State is terminated and MooFuzz enters another state. In Assessment State, MooFuzz uses the following objective functions to select seeds from the seed pool.

$$\text{Min } F(s) = [e, n, -d, m]^T, \quad s \in S \tag{11}$$
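Equations (9)–(11) can be read as a function that maps a seed and a state to a vector to be minimized, with negated entries for the quantities that should be maximized. The sketch below uses a hypothetical `SeedInfo` record whose field names mirror the symbols in the text; the string state labels are also assumptions.

```python
from typing import NamedTuple, Tuple


class SeedInfo(NamedTuple):
    r: float  # path risk
    d: int    # path depth
    l: int    # length of the seed data
    t: float  # execution time
    e: int    # path frequency
    n: int    # number of times the seed has been selected
    m: float  # mutation information


def objectives(s: SeedInfo, state: str) -> Tuple[float, ...]:
    """Return the minimization vector F(s) for the given seed pool state."""
    if state == "exploration":
        return (-s.r, -s.d, s.l, s.t)   # Equation (9)
    if state == "search":
        return (s.e, -s.d, s.l, s.t)    # Equation (10)
    if state == "assessment":
        return (s.e, s.n, -s.d, s.m)    # Equation (11)
    raise ValueError(f"unknown state: {state}")


seed = SeedInfo(r=84.0, d=5, l=120, t=0.5, e=3, n=2, m=1.0)
f_explore = objectives(seed, "exploration")
```

Negating *r* and *d* lets a single minimization procedure prefer riskier and deeper paths without a separate maximization step.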

After establishing objective functions for the different seed pool states, MooFuzz models seed scheduling as a minimization problem and selects the optimal seed set. Algorithm 4 completes the seed schedule by using non-dominated sorting [19]. The seed set *S* that satisfies the state conditions is taken as the input, and a set *CF* is used to store the optimal seed set. Initially, *CF* is empty, and the first seed *s*<sub>1</sub> in *S* is added to *CF*. Each seed *s<sub>i</sub>* from *S* is then compared for dominance with the seeds *s<sub>j</sub>* in *CF* (Lines 1–9). If *s<sub>j</sub>* dominates *s<sub>i</sub>* (each objective value of *s<sub>j</sub>* is less than that of *s<sub>i</sub>*), the next seed comparison is performed. If *s<sub>i</sub>* dominates *s<sub>j</sub>*, *s<sub>j</sub>* is removed from *CF*. After these comparisons, if there is no dominance relationship between *s<sub>i</sub>* and any of the seeds in *CF*, *s<sub>i</sub>* is added to *CF* (Lines 10–11). When the loop completes, the optimal seed set is stored in *CF*, and MooFuzz extracts each seed in it for fuzzing (Lines 12–13).
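The dominance comparisons described above can be sketched in a few lines. This is an illustration rather than a transcription of Algorithm 4: the function names are hypothetical, seeds are represented only by their objective vectors, and dominance follows the definition given in the text (every objective value strictly smaller), which is stricter than the usual Pareto definition.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """a dominates b if every objective value of a is less than that of b."""
    return all(x < y for x, y in zip(a, b))


def non_dominated(vectors: List[Vector]) -> List[int]:
    """Return the indices of the vectors that form the set CF."""
    cf: List[int] = []
    for i, v in enumerate(vectors):
        if any(dominates(vectors[j], v) for j in cf):
            continue  # v is dominated by a member of CF, so skip it
        # Remove members of CF that v dominates, then add v.
        cf = [j for j in cf if not dominates(v, vectors[j])]
        cf.append(i)
    return cf


front = non_dominated([(3, 3), (1, 4), (2, 2), (4, 1), (5, 5)])
```

In this example, `(3, 3)` is removed once `(2, 2)` arrives, and `(5, 5)` is never admitted, so the remaining seeds are mutually non-dominated.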


#### *3.4. Power Scheduler*

The purpose of the power schedule is to assign a reasonable amount of energy to each seed involved in mutation. A high-quality seed has more chances to be mutated and should be assigned more energy in the fuzzing process.

Existing coverage-based fuzzers (such as AFL [7]) usually calculate the energy for the selected seeds as follows [18],

$$\mathit{energy}(i) = \mathit{allowable\_energy}(q_i) \tag{12}$$

where *i* is the seed and *q<sub>i</sub>* is the quality of the seed, which depends on the execution time, branch edge coverage, creation time, and so on.

Algorithm 5 is the seed power schedule algorithm. MooFuzz considers different seed pool states to set up different energy distribution methods. Meanwhile, it also uses an energy monitoring mechanism, which has the ability to monitor the execution of target programs and reduce unnecessary energy consumption.

After many experiments, we find that the amount of energy in the deterministic stage, which is a relatively fine-grained mutation, is mainly related to the length of the seed, and that as the number of candidate seeds in the seed pool increases, this stage adversely affects path discovery. Thus, in Algorithm 5 we open the deterministic stage only to seeds that cause crashes after mutation (Lines 1–2). In the indeterministic stage, MooFuzz checks the state of the current seed. If it belongs to Search State, MooFuzz uses the frequency information to set the energy. If it belongs to Assessment State, both the frequency and the mutation information are considered to set the energy (Lines 3–6).

After energy allocation, we set up a monitoring mechanism to monitor the mutation of seeds (Lines 7–14). When a seed has consumed 75% of its allocated energy, MooFuzz checks the mutation of the current seed and records the ratio of the average energy the current seed consumes to cover a new path to the average energy all seeds consume to cover a new path. If the ratio is lower than *threshold*1, MooFuzz withdraws the remaining energy. If the ratio is higher than *threshold*2, the mutation information is updated. Here, *threshold*1 is equal to 0.9 and *threshold*2 is equal to 1.3.

**Algorithm 5**

**Input:** a seed *s*, the number of all seeds in the seed pool *total\_seed*, the total energy consumed in the fuzzing process *total\_energy*, the number of new seeds generated by the current seed mutation *cur\_seed*

**Output:** the energy of seed *s*, *s*.*energy*

**1** **if** seed *s* causes crashes after mutation **then**
**2**     **goto** deterministic stage
/\* indeterministic stage \*/
**3** **if** state is Search State **then**
**4**     *s*.*energy* = (1 + 1 / *s*.*fre*) ∗ *energy*(*s*)
**5** **if** state is Assessment State **then**
**6**     *s*.*energy* = (1 + (1 / *s*.*mta* + 1 / *s*.*fre*)) ∗ *energy*(*s*)
**7** **for** *cur\_energy* = 0 to *s*.*energy* **do**
**8**     **if** the energy consumption of seed *s* reaches 75% **then**
**9**         *total\_average* = *total\_energy* / *total\_seed*
**10**         *cur\_average* = *cur\_energy* / *cur\_seed*
**11**         **if** *cur\_average* / *total\_average* < *threshold*1 **then**
**12**             **break**
**13**         **if** *cur\_average* / *total\_average* > *threshold*2 **then**
**14**             *s*.*mta* = *s*.*mta* ∗ 0.9
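The energy assignment (Lines 3–6) and the monitoring check (Lines 9–14) can be sketched as two small functions. Several details here are assumptions, not taken from the paper: counters below 1 are clamped to 1 to avoid division by zero, states other than Search and Assessment keep the base energy, the monitor rejects non-positive counts, and the crash-triggered deterministic stage is not modeled. `base_energy` stands for *energy*(*s*) from Equation (12).

```python
from typing import Tuple

THRESHOLD1 = 0.9  # below this ratio the remaining energy is withdrawn
THRESHOLD2 = 1.3  # above this ratio the mutation information is scaled


def _inv(x: float) -> float:
    # Assumption: clamp counters below 1 to 1 to avoid dividing by zero.
    return 1.0 / max(x, 1.0)


def assign_energy(base_energy: float, state: str, fre: float, mta: float) -> float:
    """Energy of a seed per Lines 3-6 of Algorithm 5."""
    if state == "search":
        return (1.0 + _inv(fre)) * base_energy
    if state == "assessment":
        return (1.0 + (_inv(mta) + _inv(fre))) * base_energy
    return base_energy  # assumption: other states keep the base energy


def monitor(mta: float, cur_energy: float, cur_seed: int,
            total_energy: float, total_seed: int) -> Tuple[bool, float]:
    """Check run at 75% consumption; returns (keep_fuzzing, new_mta)."""
    if cur_seed <= 0 or total_seed <= 0 or total_energy <= 0:
        raise ValueError("counts and total energy must be positive")
    total_average = total_energy / total_seed
    cur_average = cur_energy / cur_seed
    ratio = cur_average / total_average
    if ratio < THRESHOLD1:
        return False, mta        # withdraw the remaining energy (break)
    if ratio > THRESHOLD2:
        return True, mta * 0.9   # update the mutation information
    return True, mta


energy = assign_energy(100.0, "assessment", fre=4, mta=2)
keep_going, new_mta = monitor(2.0, 75.0, 1, 1000.0, 10)
```

Because both correction terms are reciprocals, seeds on rarely exercised paths and seeds with small mutation information values receive the largest boost over the base energy.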
