(c) Fitness calculation

By tracking the program under test, we can see that the information of an execution path can be expressed as a sequence of edges. Therefore, to find new execution paths and improve the path coverage of CVDF DYNAMIC, we need to calculate the fitness. We define the sequence set of edges as *V* = (*V*1, *V*2, ..., *Vn*), where each *Vk* (1 ≤ *k* ≤ *n*) is equivalent to *Ee*. For any edge in *Ee*, let us assume that the final test data are *Xi*. We can then obtain a set of pairs that relates each edge to its test data, as shown in Equation (2):

$$Q\_i = \{ (e\_{i,1}, X\_{i,1}), (e\_{i,2}, X\_{i,2}) \dots (e\_{i,n}, X\_{i,n}) \} \tag{2}$$

In essence, *Qi* is a weighted digraph whose edge weights are the test data. We define the fitness (adaptation) *f* of an individual through two functions, as shown in Equations (3) and (4).

Here, *f*1 is the number of new edges, and *f*2 is the number of edges in *Qi* associated with them:

$$f\_1(X\_i) = \operatorname{card}(V\_i - E\_l) \tag{3}$$

$$f\_2(X\_i) = \sum\_{q \in V\_i} G(\mathcal{W}\_{q}, X\_i) \tag{4}$$

$$G(X\_1, X\_2) = \begin{cases} 1, & X\_1 = X\_2 \\ 0, & X\_1 \neq X\_2 \end{cases} \tag{5}$$

First, the fitness *f*1 of each individual is calculated; then, after the set is updated, the fitness *f*2 of each individual is calculated. The two sets used to calculate the fitness are updated after each round of testing. When two individuals are compared, *f*1 is compared first; if the two values of *f*1 are equal, *f*2 is compared.
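The two-level comparison can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state explicitly: the set subtracted in Equation (3) is taken to be the set of edges already covered in earlier rounds, and the weight of edge *q* in Equation (4) is taken to be the test data recorded for that edge. The names `f1`, `f2`, `fitness`, `covered_edges`, and `edge_weights` are hypothetical.

```python
def g(x1, x2):
    """Equation (5): 1 if the two test data are equal, otherwise 0."""
    return 1 if x1 == x2 else 0


def f1(path_edges, covered_edges):
    """Equation (3): number of edges on the path that are not yet covered.

    Assumption: `covered_edges` plays the role of the set subtracted in Eq. (3).
    """
    return len(set(path_edges) - set(covered_edges))


def f2(path_edges, edge_weights, test_data):
    """Equation (4): number of path edges whose recorded weight equals test_data.

    Assumption: `edge_weights` maps each edge to the test data stored as its weight.
    """
    return sum(g(edge_weights.get(q), test_data) for q in set(path_edges))


def fitness(path_edges, covered_edges, edge_weights, test_data):
    """Return (f1, f2); Python compares tuples element by element,
    so f1 decides first and f2 only breaks ties."""
    return (f1(path_edges, covered_edges),
            f2(path_edges, edge_weights, test_data))


covered = {"e1", "e2"}
weights = {"e1": b"A", "e2": b"B", "e3": b"A"}

fit_a = fitness(["e1", "e3"], covered, weights, b"A")
fit_b = fitness(["e2", "e4"], covered, weights, b"B")
best = max([("a", fit_a), ("b", fit_b)], key=lambda item: item[1])
```

In this toy run both individuals discover one new edge, so the tie on *f*1 is resolved by *f*2.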

(d) Individual selection, crossover and variation

Our individual selection method uses elite selection to produce new individuals. Elite selection is a genetic-algorithm strategy that carries the individuals with high fitness into the next generation. The crossover method is the 2-opt transformation: several random numbers are generated as crossover points, and the chromosome fragments delimited by these points are then exchanged. Rather than using purely random mutation, this paper proposes a control mutation method to improve the effect of mutation. Algorithm 2 gives a motivating example for control mutation:

### **Algorithm 2. Control Mutation**

    Func ControlPROC(X, Y)
        A = 1, B = 1
        IF Y >= B THEN
            FORK1: A = A × X, B = B + 1
        ELSE
            IF X >= A THEN
                FORK2: A = A + X, B = B − 1
            ELSE
                FORK3: A = A − X, B = B / 2
        RETURN A
    End Func

The input data format of the program is (*X*, *Y*). Assume that the template data are (*X* = 1, *Y* = 1) and that the mutation operator replaces one field with 0. Mutation can then generate two test data, (*X* = 1, *Y* = 0) and (*X* = 0, *Y* = 1), which cover FORK2 and FORK1, respectively. This form of testing cannot achieve 100% branch coverage because FORK3 is never covered. With control mutation, when the mutated test data (*X* = 1, *Y* = 0) make the program enter the new branch FORK2, the field mutated in that step is marked as immutable, and further mutation is carried out on the basis of these test data. In this example, control mutation marks *Y* = 0 as an immutable field and mutates the remaining field, *X*, to 0, producing the test data (*X* = 0, *Y* = 0), which cover FORK3.
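The example can be replayed with a short Python sketch. The function `control_proc` transcribes Algorithm 2 and additionally returns the label of the branch taken; `zero_mutations` is a hypothetical helper that models the "replace a field with 0" operator and skips fields marked as immutable. Both names are illustrative and not part of the original tool.

```python
def control_proc(x, y):
    """Transcription of Algorithm 2; returns (A, label of the branch taken)."""
    a, b = 1, 1
    if y >= b:
        fork = "FORK1"
        a, b = a * x, b + 1
    elif x >= a:
        fork = "FORK2"
        a, b = a + x, b - 1
    else:
        fork = "FORK3"
        a, b = a - x, b / 2
    return a, fork


def zero_mutations(template, locked=frozenset()):
    """Replace one field at a time with 0, skipping locked or already-zero fields."""
    mutants = []
    for i, value in enumerate(template):
        if i in locked or value == 0:
            continue
        mutant = list(template)
        mutant[i] = 0
        mutants.append(tuple(mutant))
    return mutants


# Plain mutation of the template (X=1, Y=1) never reaches FORK3.
plain = zero_mutations((1, 1))
plain_forks = {control_proc(x, y)[1] for x, y in [(1, 1)] + plain}

# Control mutation: (X=1, Y=0) entered FORK2, so field Y (index 1) is locked
# and only X is mutated further.
controlled = zero_mutations((1, 0), locked={1})
controlled_forks = {control_proc(x, y)[1] for x, y in controlled}
```

Locking the field that opened FORK2 keeps the execution on the `ELSE` side of the first condition, which is what allows the next mutation to reach FORK3.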

A control mutation strategy consists of the test data that made the program enter a new branch, together with the associated control information. The control mutation process is as follows. First, a control mutation strategy is taken from the policy database, and its test data are used as the mutation template. Second, each byte of the template is checked against the stored control information: if the byte is marked as control information, it is skipped; otherwise, the byte is modified using the random mutation strategy, the resulting test data are generated and executed in fuzzy testing, and the check moves on to the next byte. Finally, once all bytes have been checked, one mutation pass is complete, and the above process is repeated.
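A byte-level sketch of one mutation pass is given below, under two assumptions that the text leaves open: control information is modeled as a set of byte offsets, and each mutation is applied to a fresh copy of the template rather than accumulated. The function name `control_mutate` is hypothetical, and executing the target program is left out.

```python
import random


def control_mutate(template, control_positions, rng):
    """Yield one mutated copy of `template` per byte that is not control information.

    Assumptions: `control_positions` is a set of byte offsets marked as control
    information; every mutant starts from an unmodified copy of the template.
    """
    for pos in range(len(template)):
        if pos in control_positions:
            continue  # control byte: leave it untouched and check the next byte
        data = bytearray(template)
        # XOR with a non-zero value guarantees that the byte actually changes.
        data[pos] ^= rng.randrange(1, 256)
        yield bytes(data)


template = b"\x01\x00\x7f\x10"
mutants = list(control_mutate(template, {1}, random.Random(0)))
```

Each element of `mutants` would then be handed to the fuzzer for execution; here offset 1 stays fixed in every mutant.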

After the above operations, one iteration of the genetic algorithm is complete. The newly generated chromosome data serve as the test data for the next round of mutation, so that mutation proceeds iteratively.
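The selection and crossover steps of one iteration can be sketched as follows. This is an illustrative reading, not the authors' implementation: the crossover is interpreted as exchanging the fragments between pairs of cut points (a multi-point crossover), and the cut points are passed in explicitly so that the example is reproducible. The names `elite_select` and `crossover` are hypothetical.

```python
def elite_select(population, fitness, k):
    """Keep the k individuals with the highest fitness."""
    return sorted(population, key=fitness, reverse=True)[:k]


def crossover(parent_a, parent_b, cut_points):
    """Exchange the fragments between consecutive pairs of cut points.

    Assumption: cut points are consumed in sorted pairs (start, end);
    an unpaired trailing point is ignored.
    """
    child_a, child_b = list(parent_a), list(parent_b)
    points = sorted(cut_points)
    for start, end in zip(points[0::2], points[1::2]):
        child_a[start:end], child_b[start:end] = (
            child_b[start:end],
            child_a[start:end],
        )
    return child_a, child_b


elite = elite_select([3, 1, 2], fitness=lambda x: x, k=2)
child_a, child_b = crossover([0] * 6, [1] * 6, cut_points=[1, 4])
```

In a real run the cut points would be drawn at random, for example with `random.sample(range(len(parent_a)), 2)`.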

### 3.3.3. Integrating New Test Data with Integration Idea

First, the genetic algorithm described above constructs test cases with high path coverage from the original test case seeds. Then, for the test cases located on different execution paths, the bi-LSTM neural network is used to construct test cases with stronger path depth detection ability. Finally, we integrate the test case sets constructed by the two methods to obtain the final test case set. Because the integrated test case set may be too large and thus reduce the efficiency of fuzzy testing, this paper uses a heuristic genetic algorithm to reduce it, so that the efficiency of fuzzy testing improves without loss of test performance.

### 3.3.4. Using Heuristic Genetic Algorithm to Reduce Sample Set

To reduce the sample set while losing as little fuzzy testing performance as possible, the heuristic genetic algorithm in this paper gives priority to the samples with stronger code coverage and path depth detection ability. The remaining test samples are then selected in order of decreasing test performance until the performance index essentially matches that of the original fuzzy testing sample set (see the experiment in Section 4.4 for specific results). Here, our heuristic algorithm is a selection-mutation algorithm for chromosomes.
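The screening principle can be illustrated with a simple greedy sketch. It is a simplification under stated assumptions: "test performance" is approximated by the number of basic blocks a sample covers (path depth is not modeled), and the stopping criterion is a hypothetical `target_ratio` of the coverage achieved by the original sample set. The function name `reduce_samples` is also hypothetical.

```python
def reduce_samples(coverage, target_ratio=1.0):
    """Greedily keep samples until the target share of block coverage is reached.

    coverage: dict mapping a sample name to the set of basic blocks it covers.
    Assumption: a sample's performance is approximated by its coverage size.
    """
    all_blocks = set().union(*coverage.values())
    needed = target_ratio * len(all_blocks)
    kept, covered = [], set()
    # Visit samples in order of decreasing coverage size.
    for name in sorted(coverage, key=lambda n: len(coverage[n]), reverse=True):
        if len(covered) >= needed:
            break
        if coverage[name] - covered:  # keep only samples that add something new
            kept.append(name)
            covered |= coverage[name]
    return kept, covered


samples = {"s1": {1, 2, 3}, "s2": {2, 3}, "s3": {4}, "s4": {1}}
kept, covered = reduce_samples(samples)
```

In this toy input two of the four samples already reproduce the coverage of the full set, so the other two are dropped.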

#### (a) Using a compression matrix to represent chromosomes

At present, the common chromosome representation is a 0–1 matrix [39], in which every element of each row vector is 0 or 1. As mentioned earlier, we treat the basic block addresses as a collection of elements, and each basic block is equivalent to a gene in the genetic algorithm. Therefore, a 1 in the 0–1 matrix indicates that a basic block exists in the sample, while a 0 indicates that it does not. In this way, all samples in the sample set together constitute a 0–1 matrix, and the set of genes in each column is equivalent to a chromosome. Given the complexity of program execution paths, the 0–1 matrix is sparse, and storing it directly would waste a significant amount of space. Therefore, this paper compresses the 0–1 matrix. Our storage format is a sequence of triples <*Val*, *Xcor*, *Ycor*>, where *Val* is a stored element with value 1, and *Xcor* and *Ycor* are its X and Y coordinates in the original matrix, respectively. Since the value of *Val* is always 1, this item can be omitted in practice.
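A minimal Python sketch of this storage scheme follows. Because *Val* is always 1, only coordinate pairs are stored. Which coordinate denotes the row and which the column is not specified in the text; the sketch assumes (row, column) order, and the names `compress` and `decompress` are hypothetical.

```python
def compress(matrix):
    """Return the (row, column) positions of all 1-entries in a 0-1 matrix."""
    return [
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value == 1
    ]


def decompress(coords, n_rows, n_cols):
    """Rebuild the dense 0-1 matrix from the stored positions."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for r, c in coords:
        matrix[r][c] = 1
    return matrix


dense = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
]
coords = compress(dense)
restored = decompress(coords, n_rows=3, n_cols=3)
```

The storage cost is proportional to the number of 1-entries rather than to the full matrix size, which is the point of the compression for sparse coverage data.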

#### (b) Using heuristic genetic algorithm to improve chromosome

Each chromosome has its own independent gene sequence, but many genes are repeated or overlap across chromosomes. As mentioned above, we therefore need to solve the SCP when carrying out set coverage and reduce set redundancy as much as possible. Accordingly, the heuristic function of the heuristic genetic algorithm lies mainly in eliminating the redundancy caused by gene duplication and in selecting better chromosomes through genetic iteration.

The specific algorithm is described as follows:

We deduce the chromosomes from the position information in the compression matrix. The more "1" values a column contains, the higher its performance priority, so such columns are selected first; each selected column is marked, and the process continues with the remaining columns. Subsequently, we perform gene exchange on the chromosomes. Assume that the parent generation contains two different chromosomes *Fa*1 and *Fa*2. After chromosome exchange, we obtain the child chromosomes *Ch*1 and *Ch*2, which are assumed to cover set *S*1. We use sets *T*1 and *T*2 to store the row numbers not covered by the genes, and sets *Cot*1 and *Cot*2 to store the genes contained in *Ch*1 and *Ch*2. First, we calculate the performance priority of each gene in the parents *Fa*1 and *Fa*2, that is, we count the number of "1" values in each column. Then, we select the genes with the highest performance priority in *Fa*1 and *Fa*2, copy them to *Ch*1, count the genes contained in *Ch*1, and delete the genes contained in *Ch*1 from *Cot*1. Then, we calculate the difference set *Cot*1 − *Ch*1 and store its row numbers in set *T*1. Next, we continue to rank the remaining genes of *Fa*1 and *Fa*2 using the same performance priority selection method and put them into *Ch*1 again. The remaining genes are put into *Ch*2.
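The priority computation at the start of this procedure can be sketched in Python. The sketch covers only the counting and the bookkeeping of uncovered rows, not the full gene exchange, and it assumes that the compressed matrix is available as (row, column) pairs of 1-entries. The names `rank_columns` and `uncovered_rows` are hypothetical.

```python
from collections import Counter


def rank_columns(coords):
    """Order columns by decreasing number of 1-entries (performance priority).

    Ties are broken by column index so that the result is deterministic.
    """
    counts = Counter(col for _row, col in coords)
    return sorted(counts, key=lambda col: (-counts[col], col))


def uncovered_rows(coords, selected_cols, n_rows):
    """Rows that none of the selected columns covers (the role of set T)."""
    covered = {row for row, col in coords if col in selected_cols}
    return set(range(n_rows)) - covered


# 1-entries of a 4-row, 3-column matrix stored as (row, column) pairs.
coords = [(0, 0), (1, 0), (2, 0), (1, 1), (3, 1), (3, 2)]
ranking = rank_columns(coords)
after_first = uncovered_rows(coords, {ranking[0]}, n_rows=4)
after_second = uncovered_rows(coords, set(ranking[:2]), n_rows=4)
```

Selecting columns in ranking order and tracking the uncovered rows after each step mirrors the greedy flavor of the description; the crossover between *Fa*1 and *Fa*2 would be layered on top of this.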

During gene selection and gene exchange, some genes may have the same performance priority. In that case, we need to screen them further to obtain the optimal gene. Suppose that two genes, *Gene*1 and *Gene*2, have the same performance priority in *Fa*1, and that set *Ch*1 contains one gene, *Gene*3. We then compare *Gene*1 ∩ *Gene*3 with *Gene*2 ∩ *Gene*3 and select the gene that yields the larger result. Because the genetic algorithm also includes a mutation step, this calculation should be carried out both before and after mutation to ensure that the optimal result is always selected.

As described above, the heuristic genetic algorithm proposed in this paper applies the compression matrix to the original population and selects the optimal chromosome through gene selection and gene exchange. It therefore does not change the workflow of an ordinary genetic algorithm in essence; rather, by optimizing the search conditions, it reduces the sample set and further improves the efficiency of fuzzy testing.

The specific process of the ordinary genetic algorithm has been described above. The heuristic genetic algorithm differs from the ordinary genetic algorithm in the following aspects:
