### 4.4.1. Boundary Delineation

For boundary delineation, we compare the linear regression framework of this paper with the bit-flip rate algorithm used by READ, ReCAN, and LibreCAN. Table 10 shows how the two methods perform when delineating CAN messages that carry discrete states and continuous vehicle behavior. The framework in this paper delineates the vehicle behavior within the corresponding range with 100% correctness, whereas the bit-flip rate method delineates the boundaries correctly in only 53.3% of cases. In particular, the bit-flip rate method performs relatively well on CAN messages describing continuous behavior, but it makes boundary delineation errors on fields corresponding to discrete vehicle behavior.

**Table 10.** Boundary Delineation Comparison.


Figure 19 explains why the two methods perform differently, using 0x082 (steering) and 0x228 (gear) as examples. As shown in Figure 19a, the proposed approach may not place the boundary of a continuous value exactly, but the delineated field stays within the correct range. In contrast, the bit-flip rate approach is easily affected by bits that share the same change pattern or that change completely, which pushes the delineated boundary outside the normal range. Figure 19b compares the delineation results of the two methods for discrete values. The bit-flip rate approach fails to delineate the boundary accurately because individual flipping bits are grouped into the same field as the adjacent invariant bits. Therefore, the framework proposed in this study gives better results for discrete values.

**Figure 19.** Boundary division results of bit-flip rate and proposed method: (**a**) Continuous value division result (0x082 for steering); (**b**) Discrete value division result (0x228 for gear).
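To make the discrete-value failure mode concrete, the following minimal sketch computes per-bit flip rates and applies a simplified boundary rule (start a new field wherever the flip rate drops from one bit to the next). This rule is an assumption made for illustration, written in the spirit of bit-flip rate methods; it is not the exact READ, ReCAN, or LibreCAN implementation, and the function names and the synthetic gear payload are hypothetical.

```python
def bit_flip_rates(payloads, n_bits):
    """Fraction of consecutive message pairs in which each bit changes.

    Index 0 is the most significant bit of the n_bits-wide payload.
    """
    flips = [0] * n_bits
    for prev, curr in zip(payloads, payloads[1:]):
        changed = prev ^ curr
        for i in range(n_bits):
            if (changed >> (n_bits - 1 - i)) & 1:
                flips[i] += 1
    pairs = max(len(payloads) - 1, 1)
    return [count / pairs for count in flips]


def split_on_rate_drop(rates):
    """Simplified rule (assumption): a new field starts where the rate drops."""
    boundaries = [0]
    for i in range(1, len(rates)):
        if rates[i] < rates[i - 1]:
            boundaries.append(i)
    return boundaries


# Synthetic example, not taken from the paper's dataset:
# a 2-bit gear value stored in bits 4 and 5 of an 8-bit payload,
# surrounded by bits that never change.
gears = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
gear_payloads = [g << 2 for g in gears]

gear_rates = bit_flip_rates(gear_payloads, 8)
gear_boundaries = split_on_rate_drop(gear_rates)
print(gear_rates)
print(gear_boundaries)  # [0, 6]
```

Under this simplified rule the constant bits 0 to 3 end up in the same field as the gear bits 4 and 5, because a rise from a zero flip rate never triggers a boundary. This mirrors the grouping of flipping bits with adjacent invariant bits described above, although the real algorithms apply further heuristics that are not modeled here.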

### 4.4.2. Related Message Filtering

This section compares the framework in this paper with existing schemes for related message filtering. Existing schemes mainly use correlation coefficients (e.g., LibreCAN, Bram's method) to filter related messages. Figure 20 compares our proposed framework with the Pearson correlation coefficient for related message filtering. Regardless of the number of messages, the multiple linear regression method proposed in this study filters messages related to vehicle behavior with 100% accuracy. With the correlation coefficient, the accuracy of candidate message filtering increases as the number of messages rises but still does not exceed 95%. The Pearson correlation coefficient between two vectors is easily influenced by outliers in either vector, which lowers the coefficient so that it fails to filter out candidate messages effectively [46]. In this paper, multiple linear regression models each bit of the data field as an independent variable, which weakens the effect of outliers and filters out the relevant messages effectively. This result shows that the framework proposed in this study is more accurate than existing message filtering methods.

**Figure 20.** Comparison between correlation coefficient and multiple linear regression.
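The two scoring approaches compared above can be sketched as follows. The sketch computes a Pearson coefficient and, separately, fits a multiple linear regression with one coefficient per payload bit. Using the coefficient of determination R² as the relevance score, the small ridge term, the function names, and the synthetic data are all assumptions made for illustration; this section does not restate the exact filtering criterion of the proposed framework, and the sketch does not reproduce the reported accuracies.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def bit_regression_r2(bit_rows, target, ridge=1e-9):
    """R^2 of target ~ intercept + one coefficient per payload bit."""
    design = [[1.0] + [float(bit) for bit in row] for row in bit_rows]
    k = len(design[0])
    # Normal equations. The tiny ridge term is an assumption of this sketch;
    # it keeps the system solvable when a bit is constant or duplicated.
    gram = [[sum(r[i] * r[j] for r in design) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(design, target)) for i in range(k)]
    beta = solve(gram, rhs)
    pred = [sum(c * v for c, v in zip(beta, r)) for r in design]
    mean_t = sum(target) / len(target)
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, pred))
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


# Synthetic data (not from the paper): a 4-bit field drives the target signal.
steps = range(64)
target = [2.5 * (t % 16) + 1.0 for t in steps]
related = [[((t % 16) >> s) & 1 for s in (3, 2, 1, 0)] for t in steps]
unrelated = [[((t // 16) >> s) & 1 for s in (1, 0)] for t in steps]

print(bit_regression_r2(related, target))    # close to 1
print(bit_regression_r2(unrelated, target))  # close to 0
```

For the Pearson side, a single outlier in an otherwise perfectly linear pair is enough to pull the coefficient well below 1: compare `pearson(x, x)` with `pearson(x, x[:-1] + [-100])` for `x = list(range(20))`. The sketch only illustrates the two scores; it does not by itself show that regression is more robust to outliers.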

In addition, as shown in Table 11, the accuracy of the linear regression method is not affected by the number of messages and remains at 100%, whereas the correlation coefficient needs more messages to reach a higher accuracy. This indicates that the linear regression method needs fewer messages to locate those related to vehicle behavior, which reduces data acquisition and computation time and speeds up the reverse engineering work.


**Table 11.** The influence of different message counts on accuracy.
