In this section, we evaluate the performance of CacheHawkeye. The experiments in this phase were run on an MSI laptop with an Intel(R) Core(TM) i7-6700HQ quad-core processor with hyper-threading. The processor frequency is 2.60 GHz, and the last-level cache is 6 MB. The operating system is Ubuntu 18.04 LTS, and the perf tool version is 5.4.114.
  4.1. Detect Flush+Reload and Flush+Flush Attacks
In this subsection, we evaluate the performance of CacheHawkeye on Flush+Reload and Flush+Flush attacks. The Flush+Reload attack program monitored the target addresses 10,000 times, and the Flush+Flush attack program monitored them 5600 times. As a control experiment, we also evaluated the behavior of CacheHawkeye on legit AES and RSA encryption and decryption programs, because these programs also use shared libraries and may access sensitive functions (memory addresses) stored in the cache attack library. The performance of CacheHawkeye under various system loads is analyzed in Section 4.3; in this subsection, we ignore the interference of system load.
The results of CacheHawkeye detecting a Flush+Reload attack on RSA are listed in Table 2. For brevity, we list only the user-mode data. The results of mem-loads are listed in columns 1 and 2, and the results of mem-stores in columns 3 and 4. The data symbol columns of both mem-loads and mem-stores contain monitored functions (such as mpihelp_mul_karatsuba_case and mpihelp_divrem). Some data symbols appear more than once in the same column because the other fields of the corresponding rows differ. These functions are accessed frequently and account for 10% to 12.99% of the overhead. Because these function names are stored in the cache attack library, CacheHawkeye determines that this program is malicious.
Table 3 shows the results of CacheHawkeye detecting a Flush+Flush attack on AES. We list only the rows of interest. In the mem-loads results, the data symbol in the first column does not contain any monitored addresses from the cache attack library. In the mem-stores results, the data symbols in the third column are contained in the cache attack library. As a result, CacheHawkeye classifies this program as malicious.
These results can be explained as follows. In Flush+Reload, Flush is a memory store procedure in which the cache line is evicted to memory, and Reload is a memory load procedure in which the memory block is brought back into the cache. As a result, both mem-stores and mem-loads contain the names of the sensitive functions. Unlike Flush+Reload, Flush+Flush consists only of the Flush step, so the sensitive memory addresses are found only in mem-stores.
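The explanation above can be sketched as a small decision rule. The snippet below is a minimal illustration, not CacheHawkeye's actual implementation. It assumes that the data symbols of the mem-loads and mem-stores samples have already been extracted into Python sets. The function name `classify_sample`, the library contents, and the returned labels are hypothetical, and the "other-attack" case (hits in loads only) is an assumption that the text does not discuss.

```python
# Hypothetical cache attack library: names of monitored (sensitive) functions.
# The two entries are the RSA functions mentioned in the text.
ATTACK_LIBRARY = {"mpihelp_mul_karatsuba_case", "mpihelp_divrem"}


def classify_sample(load_data_symbols, store_data_symbols, library=ATTACK_LIBRARY):
    """Label a program from the data symbols of its mem-loads and mem-stores.

    Assumed rule, following the explanation in the text:
      - sensitive symbols in both loads and stores -> Flush+Reload-like
      - sensitive symbols only in stores           -> Flush+Flush-like
      - sensitive symbols only in loads            -> flagged, type unknown
      - no sensitive symbols                       -> benign
    """
    in_loads = bool(set(load_data_symbols) & library)
    in_stores = bool(set(store_data_symbols) & library)
    if in_loads and in_stores:
        return "flush+reload"
    if in_stores:
        return "flush+flush"
    if in_loads:
        return "other-attack"
    return "benign"
```

Under these assumptions, a program whose sensitive data symbols show up only in mem-stores would be labeled Flush+Flush-like, matching the pattern described for Table 3.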
We ran tests to evaluate the performance of CacheHawkeye on legit cryptographic programs that use shared libraries and monitored functions or memory addresses. We tested four programs: AES encryption, AES decryption, RSA encryption, and RSA decryption.
Table 4 lists all results for a legit AES encryption program. The data symbol column does not contain any sensitive functions or memory addresses, so CacheHawkeye classifies this program as legit. The results for the legit AES decryption program are similar to Table 4 and also do not contain any sensitive functions or memory addresses. They are omitted for brevity.
Note that we split the detection results for the legit RSA decryption program into two tables: Table 5 shows the memory load results, and Table 6 shows the memory store results. There are no sensitive function names in Table 5. This does not mean that the legit RSA decryption program never accesses these addresses; rather, the accesses are not frequent enough to be caught by CacheHawkeye. Some sensitive function addresses (such as mpih_sqr_n_basecase and mpihelp_divrem) appear in the symbol column of Table 5, which indicates that the legit RSA program has accessed these target addresses (functions). However, no sensitive function addresses appear in the data symbol column. This suggests that the symbol column reflects the legit program's memory accesses, whereas the data symbol column reflects the malicious memory accesses of the cache side channel attack. The perf tool places the sensitive function names accessed by the attack program in the data symbol column, and the sensitive function names that the legit program also accesses in the symbol column. We conjecture that this is because the memory accesses of the attack program (through the clflush and movl instructions) differ somewhat from those of the legit program. CacheHawkeye inspects only the data symbol column, so it does not misclassify the legit cryptographic program.
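To make the distinction between the two columns concrete, the following sketch checks only the data symbol column and ignores the symbol column. The row layout (dictionaries with `symbol` and `data_symbol` keys), the function name `is_malicious`, and the sample rows are assumptions for illustration. They do not reproduce the actual perf output format or the contents of Tables 5 and 6.

```python
# Hypothetical cache attack library, using the function names from the text.
SENSITIVE = {"mpih_sqr_n_basecase", "mpihelp_divrem"}


def is_malicious(rows, library=SENSITIVE):
    """Flag a program only if a sensitive name appears as a data symbol.

    The symbol column is deliberately ignored: a legit program may execute
    a sensitive function without being an attack.
    """
    return any(row["data_symbol"] in library for row in rows)


# Illustrative rows only. "some_buffer" is a made-up placeholder name.
legit_rows = [{"symbol": "mpihelp_divrem", "data_symbol": "some_buffer"}]
attack_rows = [{"symbol": "main", "data_symbol": "mpihelp_divrem"}]

print(is_malicious(legit_rows))   # sensitive name only in the symbol column
print(is_malicious(attack_rows))  # sensitive name in the data symbol column
```

With this rule, a sensitive function that appears only in the symbol column does not trigger a detection, which is the behavior the paragraph above attributes to CacheHawkeye.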
For the legit RSA encryption program, the data symbol column likewise does not contain any sensitive functions or memory addresses. These results are omitted for brevity. CacheHawkeye can therefore distinguish between benign cryptographic programs and side channel attack programs.
  4.2. Sampling Frequency Configuration
In this subsection, we evaluate the performance of CacheHawkeye at different sampling frequencies and determine an appropriate sampling frequency. We tested CacheHawkeye on 4 representative attack programs and 4 legit cryptographic programs that may access sensitive addresses. To make the configured frequency as widely applicable as possible, we chose attack programs that are extremely difficult to detect. For CacheHawkeye, the shorter the execution time of an attack program, the fewer sensitive memory addresses it accesses and the harder it is to detect. We therefore chose 4 programs with very short execution times to configure the sampling frequency. As shown in Table 7, the execution time of these programs is only 7–12 ms. Real-world attacks must run much longer than this because the attacker cannot synchronize with the victim. Therefore, a frequency configured for these attack programs is more than sufficient for detecting real-world attacks.
We monitored 4 representative attack programs and 4 legit cryptographic programs with sampling frequencies of 2999, 5999, 8999, 11,999 and 14,999. Each attack program is executed 1000 times at each frequency, so 4000 attack samples are generated per frequency. CacheHawkeye also tests the legit AES encryption/decryption programs and RSA encryption/decryption programs at each frequency, and each legit program is likewise executed 1000 times at each frequency. We use accuracy to evaluate the performance of CacheHawkeye at different frequencies and then determine the appropriate sampling frequency configuration. Accuracy is the percentage of all samples that are classified correctly. The formula for accuracy is as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

In Equation (1), True Positive (TP) means that a malicious program is correctly recognized, True Negative (TN) means that a benign program is correctly recognized, False Positive (FP) means that a benign program is recognized as malicious, and False Negative (FN) means that a malicious program is recognized as benign.
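As a small worked example, accuracy in the sense defined here (correctly classified samples, TP + TN, divided by all samples) can be computed directly from the four counts. The function name `accuracy` and the counts in the usage line are hypothetical and are not results from our experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


# Hypothetical counts, for illustration only: 170 of 200 samples are correct.
print(accuracy(tp=90, tn=80, fp=20, fn=10))  # 0.85
```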
The accuracy for malicious and legit programs at different sampling frequencies is shown in Figure 4. The accuracy of CacheHawkeye is only 82.7% when the sampling frequency is 2999. Its accuracy improves as the sampling frequency rises and reaches 100% when the sampling frequency reaches 14,999.
We hypothesized that a higher sampling frequency would lengthen the sampling time, so we measured the sampling time at various frequencies. We define sampling time as the time it takes to collect memory events and store them in a file. Figure 5 shows the average sampling time of the four malicious programs at various frequencies. The sampling time does not change significantly as the frequency increases, so only accuracy needs to be considered when configuring the frequency. As a result, we set the sampling frequency of CacheHawkeye to 14,999.
  4.3. Performance under Different System Loads
In this subsection, we evaluate CacheHawkeye under different system loads. We used unixbench and sysbench to generate system load. We used the default configuration of unixbench; the configuration settings of sysbench are listed in Table 8. During the execution of sysbench, we randomly picked one of the five routines. The system loads are divided into three categories: no-load, average-load, and full-load. No-load means that there is no system load while CacheHawkeye is running. Average-load consists of two workloads: one runs sysbench and the other runs unixbench. Full-load consists of four workloads: two run sysbench and the other two run unixbench.
We tested CacheHawkeye on 4 representative attack programs and 4 legit cryptographic programs that may access sensitive addresses under different system loads. Each program is executed 1000 times, so 4000 benign samples and 4000 malicious samples are generated under each system load. The experimental results are listed in Table 9. We found that CacheHawkeye is 100% accurate under no-load and full-load, and 99.99% accurate under average-load. Because CacheHawkeye has not been pre-trained under particular system loads, it can be expected to perform equally well under unknown system loads. We therefore infer that the performance of CacheHawkeye is largely unaffected by system load. Because the memory capacity is substantially larger than the capacity of the microarchitecture components (such as the branch instruction buffer and cache), memory events are only slightly affected by system loads.
Table 10 summarizes some limitations of the above work. CacheRadar and Alam et al.'s methods cannot detect Flush+Flush attacks. Moreover, these two methods do not take system loads into account. We believe that they are extremely sensitive to system loads because hardware events such as cache hits and misses are highly susceptible to interference from system loads. NIGHTs-WATCH performs well under known system loads and can detect Flush+Flush attacks. However, system loads still cause an accuracy loss of 4.97% [13], and its pre-trained model may perform poorly under unknown system loads. All of the approaches listed above use microarchitecture events as feature vectors for detection, whereas our approach detects cache side channel attacks using memory events. Compared with these methods, our method adapts very well to system loads and achieves close to 100% accuracy.