Article

WCET Analysis Based on Micro-Architecture Modeling for Embedded System Security

School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7277; https://doi.org/10.3390/app14167277
Submission received: 12 June 2024 / Revised: 20 July 2024 / Accepted: 15 August 2024 / Published: 19 August 2024

Abstract

To ensure the timely execution of hard real-time applications, scheduling analysis techniques must consider safe upper bounds on the possible execution durations of tasks or runnables, which are referred to as Worst-Case Execution Times (WCET). Bounding WCET requires not only program path analysis but also modeling the impact of micro-architectural features present in modern processors. In this paper, we model the ARMv8 ISA and micro-architecture, including the instruction cache, branch predictor, instruction prefetching strategies, and out-of-order pipeline. We also consider the complex interactions between these features (e.g., cache misses caused by branch predictions and branch misses caused by instruction pipelines) and estimate the WCET of the program using the Implicit Path Enumeration Technique (IPET) static WCET analysis method. We compare the estimated WCET of benchmarks with the observed WCET on two ARMv8 boards. The ratio of estimated to observed WCET values for all benchmarks is greater than 1, demonstrating the security of the analysis.

1. Introduction

Real-time systems are characterized by the existence of timing constraints, in which tasks must be completed within a limited time frame. In hard real-time systems, missing a deadline can directly impact the system’s security. For instance, if an airbag system in a car does not deploy within the required time frame, it could result in severe injury or death during an accident. Ensuring real-time performance is therefore essential for maintaining security in these systems. The Worst-Case Execution Time (WCET) of a computational task is the maximum length of time the task could take to execute on a specific hardware platform. Its offline estimation helps in choosing appropriate scheduling algorithms, ensuring the system’s performance, and guaranteeing the system’s energy efficiency [1]. Since the wide application of real-time systems makes real-time analysis of programs particularly important, WCET estimation has been studied extensively [2], and various approaches have been developed to derive such bounds [3]. In embedded systems, ARM processors are considered mainstream because of their high performance, low power consumption, reasonable price, and complete maintenance system. They are particularly prominent in the fields of real-time control, connected automated vehicles, and mobile phones. Thus, the analysis of WCET for ARM processors is crucial, and methods to estimate the WCET of programs on newer ARM platforms such as ARMv8 are needed.
Estimating the WCET of a program is difficult because the execution of a program depends on specific hardware [4]. In terms of micro-architecture, modern processors are often equipped with pipelines, caches, branch prediction, and other features. These units not only perform their own functions but also interact with each other. For example, modern processors often load multiple cache lines at once when transferring data from memory to cache. When dealing with branch instructions, this loading scheme can lead to additional cache invalidation [5]. When a processor’s branch instruction is mispredicted, instructions along the taken branch are fetched and executed, while instructions along the wrong path are undone, incurring a branch misprediction penalty. The branch misprediction penalty can be substantially larger than the pipeline length [6]. These interactions between functional units greatly complicate the analysis of program execution, and there is little research on them [7,8].
In this paper, we model the ARMv8 ISA and the features of the processor’s micro-architecture. These features include cache, dynamic branch predictor, prefetching strategy, and out-of-order pipeline. We also consider the interactions between these features, for example, the cache miss caused by branch prediction and the branch miss caused by the instruction pipeline. We estimate the WCET of a program using a static analysis method. The static estimation is performed by establishing and solving an Integer Linear Programming (ILP) problem using IPET, where the control flow information of the program and the processor’s micro-architecture are converted into a series of linear constraints and a target WCET equation. The estimated WCET of a program can be obtained by solving its corresponding ILP problem. Using this process, we present a WCET analysis tool based on the open-source project Chronos [9]. The tool is tested on WCET benchmarks. By comparing the estimated WCET of benchmarks with the observed results, it can be concluded that, in most cases, the analysis produces reliable WCET estimates.
Our contributions can be summarized as follows:
  • We use the IPET static WCET analysis method to estimate the WCET of programs on the ARMv8 platform.
  • In the analysis, we model features of the micro-architecture of the platform, along with the interactions between them.
  • We evaluate the performance of our WCET analysis tool by comparing the estimated WCET of benchmarks with the observed results. The results show that our tool can provide reliable WCET estimations.
The remainder of this paper is organized as follows. Section 2 explores common knowledge of WCET analysis, reviews prior research in the field, and elucidates its features. Section 3 introduces our WCET estimation method. Section 4 presents a detailed implementation of our analysis tool. Section 5 shows our evaluation method and experimental results. Lastly, Section 6 concludes the paper.

2. Related Work

There are three types of WCET analysis methods: static analysis, dynamic analysis, and their hybrid [10]. Dynamic analysis obtains the estimated WCET by executing the given task on the given hardware or a simulator, for some set of inputs, and measuring the execution time of the task or its parts. Static analysis is the process of analyzing program code using offline methods without executing the code. Because static analysis relies on conservative mathematical models, its estimate is higher than the actual value and is considered the safest. This approach is usually used for hard real-time systems that have stringent execution time requirements [11]. Static analysis comprises three main sub-tasks: control flow analysis, processor behavior analysis, and WCET calculation. The tool in this paper is designed using static analysis.
The first task of IPET static analysis is to perform control flow analysis and construct the control flow graph (CFG). A CFG is a directed graph with one entry and one exit. Each node, or basic block, in a CFG represents a maximal sequence of consecutive instructions in which no instruction other than the last is a branch. Each edge between nodes represents a branch in the control flow. Basic blocks are identified by scanning the symbols in the binary. Figure 1 shows an example program and its corresponding CFG.
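The partition into basic blocks can be illustrated with the classic "leader" rule: the first instruction, every branch target, and every instruction following a branch each start a new block. The sketch below is only an illustration under that assumption; the instruction tuples, addresses, and the function name `split_basic_blocks` are hypothetical and are not taken from the tool described in this paper.

```python
def split_basic_blocks(instrs):
    """Partition a non-empty instruction list into basic blocks.

    instrs: list of (address, mnemonic, branch_target) tuples, where
    branch_target is an address for branch instructions and None otherwise.
    """
    addrs = [addr for addr, _, _ in instrs]
    leaders = {addrs[0]}                      # the first instruction is a leader
    for idx, (_, _, target) in enumerate(instrs):
        if target is not None:                # a branch instruction
            leaders.add(target)               # its target starts a block
            if idx + 1 < len(instrs):
                leaders.add(addrs[idx + 1])   # so does the fall-through successor

    blocks, current = [], []
    for ins in instrs:
        if ins[0] in leaders and current:     # close the block before each leader
            blocks.append(current)
            current = []
        current.append(ins)
    blocks.append(current)
    return blocks


# Hypothetical loop: addresses and mnemonics are made up for illustration.
program = [
    (0x00, "mov", None),
    (0x04, "cmp", None),
    (0x08, "b.ge", 0x14),
    (0x0C, "add", None),
    (0x10, "b", 0x04),
    (0x14, "ret", None),
]
basic_blocks = split_basic_blocks(program)
```

Each resulting block ends with at most one branch, which matches the basic block definition given above.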
Currently, there are three types of static WCET analysis methods: the Tree-based Technique, Implicit Path Enumeration Technique [12,13], and Path-Based Technique [14]. The Tree-based Technique translates the structure of a program into a syntax tree, where reduction is performed from the bottom of the tree upwards. When reduction reaches the root of the syntax tree, the WCET of the program is obtained. The Implicit Path Enumeration Technique converts the longest execution time path search problem into the problem of finding the maximum number of executions for each basic block. This can be modeled and solved using the ILP method. The Path-Based Technique is similar to the Tree-based Technique, but it uses a Scope Graph to represent the structure of the program hierarchically. Table 1 provides a comparison of the characteristics of the methods. We adopt IPET in this paper because this method achieves good performance in various aspects when the input program is not too complex.
The purpose of control flow analysis is to find the program path with the longest execution time. Since IPET is based on the construction and solving of an ILP, the WCET target equation is given by Equation (1), ignoring hardware configuration. From basic block 0 to basic block $N$ there is one possible control flow of the program, where the corresponding execution time and execution count of each block are $cost_i$ and $cnt_i$, respectively.
$$WCET = \max \sum_{i=0}^{N} (cost_i \times cnt_i) \tag{1}$$
Then, the purpose of control flow analysis shifts to obtaining constraints from the program structure for the target equation. There are three kinds of constraints that should be established: basic block constraints, loop iteration upper limits, and infeasible path constraints [13]. Basic block constraints are relations regarding the execution count of basic blocks. Loop iteration upper limits include the maximum and minimum execution count of multi-exit loops, dependency count of inner loops in nested loops, and other constraints. Common methods for loop bound analysis are abstract interpretation [15] and symbolic execution [16]. Infeasible path constraints are used to exclude control flow in the program that will never be reached, thereby improving the efficiency of the analysis. The detection of infeasible paths is necessary to analyse hard, critical real-time systems reliably [17]. Ref. [18] proposed an abstract interpretation technique to analyze infeasible paths in programs, but the technique requires manual intervention in test cases and cannot be fully automated yet.
After obtaining the preliminary WCET target equation and a series of constraints from the control flow of the program, the execution time of each basic block along the longest path is calculated by modelling the micro-architecture of the processor. This is a difficult task because most modern processors use multi-level cache, dynamic branch predictor, pipeline, etc., to improve CPU throughput and the complex behavior and interaction of these units make the execution time very unpredictable [5,19].
To analyse the influence of an instruction cache on program execution time, ref. [20] proposes a method based on IPET, and the basic idea is to estimate whether the cache access is hit or not by analyzing potential cache conflicts within the program and using Cache Conflict Graph (CCG) to model the cache line.
Dynamic branch prediction used by modern processors is based on the History State (HS) branch prediction algorithm. The idea is to predict a branch based on the execution history. The implementation uses the Branch History Register (BHR) to save the address of the Branch History Table (BHT). When a branch instruction is executed, the branch prediction for this instruction is looked up and updated in BHT. Ref. [21] considers the impact of branch prediction on WCET analysis by adding HS nodes in CFG.
Superscalar out-of-order CPUs can achieve higher performance than in-order CPUs, but it is difficult to guarantee the WCET of the software. Ref. [22] modeled basic blocks and performed an analysis of the five-stage pipeline, considering cache effects by using an execution graph, and avoided enumeration of all possible instruction execution times in the analysis by using intervals to represent the execution time of the instructions. Their analysis of pipelines can also be expressed in the form of linear constraints, meaning it can be easily integrated into IPET. Their methods have been adopted by the static WCET analysis tool Chronos.

3. Design

The analysis method in this paper is divided into three steps: compile, analyze, and solve. The overall process is shown in Figure 2. In the compile step, C source files are compiled and disassembled. We focus on analyzing programs on the ARM architecture, so the output of this step is an object file in ELF format and its disassembled code on the ARM platform. In the analyze step, we perform control flow analysis and micro-architecture modeling to collect the target WCET equation and linear constraints, then combine them into the target ILP problem. In the solve step, the ILP problem obtained in the previous steps is solved to get the estimated WCET value.

3.1. Control Flow Analysis

Each node, or basic block, in the CFG is a maximal sequence of consecutive instructions. Let the directed edge $E_{B \to B'}$ represent the count of executions from a basic block $B$ to $B'$, and let $v_B$ denote the execution count of a basic block. For any node in the CFG, the count of control flows into the node is equal to the count of flows out of the node, which is equal to the node’s execution count. Thus, the basic block constraint Equation (2) is obtained.
The other two kinds of constraints mentioned in Section 2, namely the loop iteration upper limit and infeasible path constraints, are provided by the user in our analysis. Figure 3 shows a simple C program and its corresponding CFG, along with the structural constraints and loop iteration bounds. These form an ILP problem together with Equation (1). By solving this preliminary problem, we can estimate the execution count of each basic block without considering micro-architecture details.
$$\sum_{i} d_{i,j} = cnt_j = \sum_{k} d_{j,k} \tag{2}$$
Within these constraints, variable $d_n$ denotes the execution count of the path, $cnt_n$ denotes the execution count of each basic block, and $L_n$ denotes the execution count of a loop in the CFG. The identification of loops in the CFG will be described in Section 4.
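To make the preliminary ILP concrete, the following sketch maximizes the sum of cost times count for a small hypothetical CFG: an entry block B0, a loop header B1, two alternative loop bodies B2 and B3, and an exit block B4. The costs, the loop bound, and the infeasible-path limit are all made-up values, and exhaustive enumeration stands in for a real ILP solver only because the search space is tiny.

```python
from itertools import product

# Hypothetical per-block costs in cycles (not measured values).
COST = {"B0": 5, "B1": 2, "B2": 8, "B3": 3, "B4": 1}
LOOP_BOUND = 10   # user-supplied loop iteration upper limit
MAX_B2 = 4        # user-supplied infeasible-path constraint on body B2


def solve_ipet():
    """Return (wcet, counts) maximizing sum(cost_i * cnt_i) under the constraints."""
    best_wcet, best_cnt = -1, None
    # d12 and d13 are the edge counts B1->B2 and B1->B3; both bodies return to B1.
    for d12, d13 in product(range(LOOP_BOUND + 1), repeat=2):
        if d12 + d13 > LOOP_BOUND or d12 > MAX_B2:
            continue                          # violates a user constraint
        cnt = {
            "B0": 1,                          # entry executes once
            "B1": 1 + d12 + d13,              # inflow: entry edge plus back edges
            "B2": d12,
            "B3": d13,
            "B4": 1,                          # exit executes once
        }
        # Flow conservation at B1: inflow equals outflow (two bodies plus exit edge).
        assert cnt["B1"] == d12 + d13 + cnt["B4"]
        wcet = sum(COST[b] * cnt[b] for b in COST)
        if wcet > best_wcet:
            best_wcet, best_cnt = wcet, cnt
    return best_wcet, best_cnt


wcet, counts = solve_ipet()
```

With these numbers the maximum is reached when the expensive body B2 runs as often as the infeasible-path constraint allows and the cheaper body B3 fills the remaining iterations.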

3.2. Instruction Cache Analysis

In this paper, we use a Cache Conflict Graph (CCG) to model direct-mapping cache schemes and FIFO replacement policies. In the direct-mapping scheme, each block of the main memory is mapped to only one specific cache location. We divide the instructions in a basic block into several cache basic blocks based on the size of the physical cache line configured by the user; each cache basic block is represented by a node in the CCG. Each split block is then mapped to a cache line based on its starting address. If multiple cache basic blocks are mapped to the same cache line, conflicts may occur during execution, represented by edges between nodes in the CCG. Assuming there are four cache sets, each with one cache line, Figure 4 shows a program, its cache mapping table, and the CCG of cache set zero. $S$ and $E$ represent the start and end nodes. $B_{i,j}$ denotes the $j$th cache basic block of basic block $i$, and edge $cec_{m,n,k,l}$ denotes the count of times cache basic block $B_{k,l}$ is evicted by $B_{m,n}$.
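The splitting and mapping step can be sketched as follows, assuming a direct-mapped cache with four sets as in the example above. The line size, the block addresses, and the names `CacheBlock`, `split_into_cache_blocks`, and `conflicts` are hypothetical and chosen only for illustration; the sketch finds candidate CCG edges (pairs of cache basic blocks that map to the same set but hold different memory lines) and does not build the full graph.

```python
from dataclasses import dataclass
from itertools import combinations

LINE_SIZE = 16   # bytes per cache line (hypothetical configuration)
NUM_SETS = 4     # four sets with one line each


@dataclass(frozen=True)
class CacheBlock:
    bb: int          # index of the enclosing basic block
    idx: int         # position inside the basic block, starting at 1
    start: int       # starting instruction address
    cache_set: int   # cache set selected by the starting address


def split_into_cache_blocks(bb, start, size):
    """Split basic block `bb` covering [start, start + size) at line boundaries."""
    blocks, addr, end, idx = [], start, start + size, 1
    while addr < end:
        line_end = (addr // LINE_SIZE + 1) * LINE_SIZE
        blocks.append(CacheBlock(bb, idx, addr, (addr // LINE_SIZE) % NUM_SETS))
        addr, idx = min(line_end, end), idx + 1
    return blocks


def conflicts(blocks):
    """Pairs that share a cache set but belong to different memory lines."""
    return [
        (a, b)
        for a, b in combinations(blocks, 2)
        if a.cache_set == b.cache_set
        and a.start // LINE_SIZE != b.start // LINE_SIZE
    ]


# Hypothetical layout of three basic blocks: (index, start address, size in bytes).
blocks = (
    split_into_cache_blocks(1, 0x00, 24)
    + split_into_cache_blocks(2, 0x18, 8)
    + split_into_cache_blocks(3, 0x40, 20)
)
conflict_pairs = conflicts(blocks)
```

Blocks that share the same memory line, such as the tail of block 1 and block 2 here, are not reported as conflicts because fetching one also brings in the other.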
The target equation, considering the instruction cache model, is revised to Equation (3). $n_i$ is the number of cache blocks in basic block $i$, $cost_{i,j}^{hit}$ ($cost_{i,j}^{miss}$) denotes the execution time of $B_{i,j}$ when the cache hits (misses), and $cnt_{i,j}^{hit}$ ($cnt_{i,j}^{miss}$) denotes the execution count of $B_{i,j}$ when the cache hits (misses). The estimated WCET is the maximum sum of the execution time of the cache basic blocks in all possible control flows in the CFG.
$$WCET = \max \sum_{i=1}^{N} \sum_{j=1}^{n_i} \left( cost_{i,j}^{hit} \times cnt_{i,j}^{hit} + cost_{i,j}^{miss} \times cnt_{i,j}^{miss} \right) \tag{3}$$
In the CFG and CCG of a program, the execution count of a basic block is equal to the execution count of each cache basic block within it. The sum of the control flow into a basic block $B_i$, $if_i$, is equal to the control flow out of the basic block, $of_i$, and the branches connected to the start and end nodes are executed only once. Based on these observations, the following constraints can be established for the target equation:
$$cnt_{i,j}^{hit} + cnt_{i,j}^{miss} = cnt_{i,k}^{hit} + cnt_{i,k}^{miss}, \quad j \neq k \tag{4}$$
$$cnt_i = \sum if_i = \sum of_i \tag{5}$$
$$cnt_i = \sum_{u,v \in if_i} cec(u,v,i,j) = \sum_{u,v \in of_i} cec(i,j,u,v) \tag{6}$$
$$\sum_{i \in if_i} cec(i,E) = \sum_{i \in of_i} cec(S,i) \tag{7}$$

3.3. Branch Prediction Analysis

This paper adopts the GAg prediction model for micro-architecture analysis. In this model, when predicting a branch instruction, the branch predictor first looks up the Global History Register (GHR) to obtain the current global history of the branches. This history is then used as an index to access an entry in the Pattern History Table (PHT), which contains the results of the branch direction. Prediction errors can lead to pipeline flushes and delays in instruction prefetching. To analyze the impact of branch prediction on program execution time, let $bmc_i$ denote the count of prediction errors and $penalty$ represent the delay penalty. The target equation is then revised to:
$$WCET = \sum_{i=0}^{N} (cost_i \times cnt_i + penalty \times bmc_i) \tag{8}$$
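The GAg scheme can be illustrated with a small simulation in which a global history register indexes a single pattern history table. The sketch assumes that each table entry is a 2-bit saturating counter initialised to "weakly not taken", which is a common textbook choice and is not specified in this paper; the class and function names are hypothetical.

```python
class GAgPredictor:
    """Global history register indexing one global pattern history table."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                            # global history, newest outcome in bit 0
        # Assumption: 2-bit saturating counters, initially 1 (weakly not taken).
        self.pht = [1] * (1 << history_bits)

    def predict(self):
        return self.pht[self.ghr] >= 2          # True means "predict taken"

    def update(self, taken):
        counter = self.pht[self.ghr]
        self.pht[self.ghr] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def count_mispredictions(outcomes, history_bits=2):
    """Replay a sequence of branch outcomes and count the prediction errors."""
    predictor, misses = GAgPredictor(history_bits), 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken eight times in a row (hypothetical trace).
warmup_misses = count_mispredictions([True] * 8)
```

The misprediction count obtained from such a trace plays the role of $bmc_i$ in the revised target equation, where it is multiplied by the delay penalty. In the static analysis the count is bounded by constraints rather than simulated.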
To obtain the constraints bounding the count of mispredictions under the GAg model, information about the execution history of each branch instruction must be recorded. Therefore, we introduce the Historical State (HS) into the CFG. Since there is only one branch instruction per basic block, each basic block will have an HS attribute that records the possible branch history when the control flow reaches that block. Assuming that the BHR is a two-bit register, the CFG, after adding the HS attribute to each basic block, is shown in Figure 5. Let 0 represent the branch that is not taken and 1 represent the branch that is taken. Basic block $B_0$ is the starting block, so its HS is always 00. Basic block $B_1$ has two incoming edges and the two possible paths to this block are $B_0 \to B_1$ and $B_2 \to B_3 \to B_1$. The branch histories of these two paths are not taken ($B_0$), not taken ($B_0 \to B_1$), and taken ($B_2 \to B_3$), not taken ($B_3 \to B_1$); therefore, the HS is $\{00, 11\}$. In an execution, the current HS of each basic block is used as its global history to access and modify the branch prediction result in the BHT, as proposed by the GAg prediction model.
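One way to compute the set of possible history states per block is a worklist propagation over the CFG edges, shifting the outcome of each traversed edge into a fixed-width history. The sketch below assumes a shift-left encoding with the newest outcome in the lowest bit and treats edges labelled `None` as unconditional (no history update); both conventions and the example CFG are hypothetical, so the sets it produces are not meant to reproduce Figure 5.

```python
from collections import deque


def history_states(edges, entry, bits=2):
    """Return {block: set of history values that can reach the block}.

    edges: {block: [(successor, outcome)]} with outcome 1 (taken),
    0 (not taken), or None for an edge that records no branch.
    """
    mask = (1 << bits) - 1
    hs = {entry: {0}}                     # the entry block starts with history 00
    work = deque([(entry, 0)])
    while work:
        block, h = work.popleft()
        for succ, outcome in edges.get(block, []):
            nh = h if outcome is None else ((h << 1) | outcome) & mask
            if nh not in hs.setdefault(succ, set()):
                hs[succ].add(nh)          # new state discovered, propagate further
                work.append((succ, nh))
    return hs


# Hypothetical CFG: an if-else inside a loop whose latch B3 branches back to B0.
edges = {
    "B0": [("B1", 0), ("B2", 1)],
    "B1": [("B3", None)],
    "B2": [("B3", None)],
    "B3": [("B0", 1), ("B4", 0)],
}
hs = history_states(edges, "B0")
```

Because the history register has a fixed width, the number of states per block is bounded and the propagation always terminates.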
After adding HS information, the condition still holds that for each basic block in the CFG, the sum of the control flows into the block equals the sum of the control flows out of the block. Let $d_{i,j}$ denote the execution count from $B_i$ to $B_j$, let $hs \in B_i$ range over all HS of basic block $B_i$, and let $bmc_i^{hs}$ and $cnt_i^{hs}$ represent the corresponding variables when the control flow arrives with historical state $hs$. Then, the constraints for the target equation can be obtained:
$$d_{i,j} = \sum_{hs \in B_i} d_{i,j}^{hs} \tag{9}$$
$$bmc_i = \sum_{hs \in B_i} bmc_i^{hs} \tag{10}$$
$$cnt_i = \sum_{hs \in B_i} cnt_i^{hs}, \quad cnt_i^{hs} \geq bmc_i^{hs} \tag{11}$$
If there is an edge from basic block $B_j$ to basic block $B_i$, and the HS of this control flow at $B_j$ is $h_{pre}$, the HS at $B_i$ in this control flow is obtained after recording the branch of the edge $B_j \to B_i$. The sum of the incoming control flow from all possible basic blocks under all possible HSs is the execution count of a basic block. The resulting constraints are given in Equation (12).
$$cnt_i^{h} = \sum_{j} d_{j,i}^{h_{pre}} = \sum_{j} \sum_{h \in \{h \mid h_{pre} \to h\}} d_{j,i}^{h} \tag{12}$$
Let $spec_{(i,j)}^{(h,n)}$ denote the count of the executions when the HS $h$ remains unchanged on the longest path from $B_i$ to $B_j$. This means that along the path, branches are either always taken or always not taken, and $n$ represents the branch result of $B_i$. If a basic block takes (1) the branch to the next block and the branch prediction under HS $h$ is incorrect, the count of incorrect predictions cannot exceed the total execution count when the basic block takes the branch. The same holds when the branch is not taken (0). So, the following constraints can be obtained:
$$bmc_i^{(h,1)} \leq \sum_{j} spec_{(i,j)}^{(h,1)} \tag{13}$$
$$bmc_i^{(h,0)} \leq \sum_{j} spec_{(i,j)}^{(h,0)} \tag{14}$$

3.4. Prefetching Strategy

In the previous two subsections, the analyses of the instruction cache and the dynamic branch predictor were performed independently. When analyzing the cache using the CCG, the impact of branch prediction on program execution time was ignored. Similarly, when analyzing the impact of branch prediction using HS, the influence of cache hits or misses due to the branch predictor was not considered. In reality, instruction cache prefetching occurs based on the predicted branch direction, which affects whether instructions hit the cache and, consequently, influences the program’s execution time. In this paper, we assume that the processor can only allow one branch instruction to be in the prediction stage at any given time. This ensures that all previous instructions have finished executing before the branch instruction is processed.
When branch prediction is incorrect, an invalid cache may cause a cache miss, resulting in delays. For this scenario, we introduce virtual nodes for the CCG. If the cache line blocks on the predicted path conflict with the other cache line blocks, a virtual node $B_{i,j}^{x}$ is introduced for cache basic block $B_{i,j}$ to describe the mutual influence between branch prediction and instruction cache, where $x$ denotes the actual execution outcome of the branch instruction. If a cache line basic block along the predicted path does not conflict with other blocks, there is no need to add a virtual node. After adding virtual nodes, the CCG in Figure 4 is modified to the CCG shown in Figure 6. For branch instruction 2, when the actual execution is 0 but the prediction is 1, we do not need to add additional nodes because, in this scenario, instruction prefetching will proceed along the erroneous path and no instructions within basic block $B_2$ conflict with other instructions. Similarly, for branch instruction 3, when the execution is 1 but the prediction is 0, we also do not need to add nodes. $B_{3.1}^{(2,1)}$ denotes the impact on cache line 3.1 when the actual execution of branch instruction 2 is 1, but the prediction is 0. The edge $B_{1.2} \to B_{3.1}^{(2,1)}$ indicates that, because branch instruction 2 is predicted as 0, 3.1 is already in the cache. However, when instruction 1.2 is executed, the cache miss for 1.2 occurs because of 3.1. For edge $B_{3.1}^{(2,1)} \to B_{3.1}$, when the prediction for branch instruction 2 is 0 but the execution is 1, the incorrect prediction results in instruction 3.1 being cached. This scenario actually improves the cache hit rate.
After adding virtual nodes, the target equation is modified to Equation (15). $delay\_b$ represents the total time consumed by the branch prediction. $delay\_c$ denotes the total time consumed by the cache misses. The equation illustrates that when a branch prediction error occurs, instruction prefetching will proceed along the erroneous path. Consequently, when the program executes along the correct path, at least one cache miss will occur. Here, $l$ denotes the number of instructions fetched in a single prefetch operation.
$$WCET = \sum_{i=1}^{N} \left[ cost_i \times cnt_i + delay\_b + delay\_c \right] + \sum_{b \in Branch(P)} \left[ cmp \times l \right] \tag{15}$$
$$delay\_b = bmp \times bm_i \tag{16}$$
$$delay\_c = \sum_{j=1}^{n_i} \left( cmp \times cm_{(i,j)} \right) \tag{17}$$
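As a worked example, the target equation with its two delay terms can be evaluated for fixed counts. The sketch reads $bmp$ as the branch misprediction penalty, $cmp$ as the cache miss penalty, $bm_i$ as the misprediction count of block $i$, and $cm_{(i,j)}$ as the miss count of the $j$th cache basic block of block $i$; this reading is an assumption, since the text does not define these symbols explicitly, and all numbers are made up.

```python
def wcet_with_prefetch(blocks, branches, bmp, cmp_, l):
    """Evaluate the target equation with branch and cache delays for fixed counts.

    blocks: list of dicts with keys "cost", "cnt", "bm" (misprediction count)
            and "cm" (list of miss counts, one per cache basic block).
    branches: the branch instructions of the program (only their number matters).
    """
    total = 0
    for b in blocks:
        delay_b = bmp * b["bm"]                        # branch delay of this block
        delay_c = sum(cmp_ * cm for cm in b["cm"])     # cache delay of this block
        total += b["cost"] * b["cnt"] + delay_b + delay_c
    # One wrong-path prefetch term per branch instruction: cmp * l.
    total += sum(cmp_ * l for _ in branches)
    return total


# Hypothetical program with two basic blocks and one branch instruction.
example_blocks = [
    {"cost": 10, "cnt": 1, "bm": 0, "cm": [1]},
    {"cost": 6, "cnt": 5, "bm": 2, "cm": [1, 0]},
]
example_wcet = wcet_with_prefetch(example_blocks, ["b1"], bmp=3, cmp_=10, l=2)
```

Here the first block contributes 10 + 0 + 10 cycles, the second 30 + 6 + 10 cycles, and the single branch adds 10 × 2 cycles for wrong-path prefetching.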

3.5. Pipeline Analysis

The purpose of pipeline analysis is to comprehensively consider the effects of both the cache model and branch prediction, and to calculate the execution time for a basic block. Ref. [23] analyzes the impact of out-of-order execution pipelines on program execution time, noting that the execution time cannot be simply represented by the longest path of instruction execution; dependencies between instructions must also be considered. This paper achieves this by analyzing the commonly used five-stage pipeline architecture in ARMv8 processors: Instruction Fetch (IF), Decode (ID), Instruction Execute (EX), Memory Access (MEM), and Write Back (WB). We use execution graphs to model and analyze the pipelines of out-of-order executions. Execution graphs illustrate the dependencies between instructions using directed edges as follows:
  • The dependency between different stages of the same instruction. The completion of the previous stage is required before proceeding to the next stage.
  • The dependency between different instructions in the same pipeline stages. Earlier instructions in the program have a higher priority.
  • The data dependency between instructions.
  • Queuing for idle Instruction Fetch Buffers (I-buffers) and Reorder Buffers (ROB).
Assuming the size of the I-buffer is 2 and the size of the ROB is 4, a program and its corresponding execution graph are shown in Figure 7. The edge $WB \to EX$ represents the data dependency between instructions, $ID \to IF$ and $MEM \to ID$ represent the dependencies between instructions due to the sizes of the I-buffer and ROB, respectively, and the dashed line between two $EX$ nodes indicates instructions competing for the same functional unit.
Since the analysis of the cache is performed using the CCG, the cache is ignored when establishing execution graphs; we assume that all instructions hit the cache. However, errors in branch prediction can cause the instruction pipeline to prefetch instructions along an incorrect path. To address this, we add instructions with the same size as the Reorder Buffer (ROB) along the erroneous path after the mispredicted branch instruction in the execution graph.
For a node in the execution phase, its earliest finish time and latest finish time can be obtained using Algorithm 1. $earliest[t_i^{start}]$ denotes the earliest start time of node $i$, $earliest[t_i^{finish}]$ denotes the earliest finish time, $latest[t_i^{start}]$ denotes the latest start time, and $latest[t_i^{finish}]$ denotes the latest finish time. The latest finish time of the last node is the estimated WCET. The functions in the algorithm are:
  • $LatestTimes(G)$ calculates the latest ready, start, and finish times for the nodes. The latest start time depends on its latest ready time, which depends on the latest finish time of its predecessor or competitors. If a competitor has a lower priority than the instruction in question, the competitor will be excluded, and those whose execution times do not overlap will also be excluded. If a competitor has a higher priority than the instruction, it is assumed that all nodes preceding this competitor will delay the node. Once a node’s latest start time is obtained, the latest ready times of its successor nodes are updated.
  • $EarliestTimes(G)$ calculates the earliest ready, start, and finish times of nodes. Unlike $LatestTimes(G)$, the calculation of these times only considers competitors that conflict with the node’s preparation time and hardware resources.
Algorithm 1: Basic Block WCET Analysis Algorithm
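The core of the earliest and latest finish time computation can be sketched as a longest-path pass over the execution graph in topological order, with each node carrying a latency interval. This is a deliberately simplified illustration, not Algorithm 1: it follows only dependency edges and ignores contention between competitors for functional units. The node names, latencies, and the function `finish_times` are hypothetical.

```python
from graphlib import TopologicalSorter

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def finish_times(latency, deps):
    """Return (earliest, latest) finish times for every node.

    latency: {node: (min_cycles, max_cycles)}
    deps:    {node: set of predecessor nodes}
    """
    earliest, latest = {}, {}
    graph = {node: deps.get(node, set()) for node in latency}
    for node in TopologicalSorter(graph).static_order():
        preds = graph[node]
        ready_lo = max((earliest[p] for p in preds), default=0)
        ready_hi = max((latest[p] for p in preds), default=0)
        earliest[node] = ready_lo + latency[node][0]
        latest[node] = ready_hi + latency[node][1]
    return earliest, latest


# Two hypothetical instructions; every stage takes one cycle except EX of
# instruction 1, which is assumed to take between one and four cycles.
latency = {(s, i): (1, 1) for i in (1, 2) for s in STAGES}
latency[("EX", 1)] = (1, 4)

deps = {}
for i in (1, 2):
    for prev, cur in zip(STAGES, STAGES[1:]):
        deps.setdefault((cur, i), set()).add((prev, i))   # stage order
for s in STAGES:
    deps.setdefault((s, 2), set()).add((s, 1))            # program order per stage
deps[("EX", 2)].add(("WB", 1))                            # data dependency WB -> EX

earliest, latest = finish_times(latency, deps)
```

The latest finish time of the last node, here the write-back of instruction 2, corresponds to the block's estimated worst-case time in this simplified setting.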

4. Implementation

This section provides a description of the implementation process of the tool, including the establishment of the CFG and CCG, the extension of the CFG, and WCET calculation.
Each basic block in the CFG must indicate whether it contains a branch instruction. For blocks with branch instructions, there are two outgoing edges representing the two possible execution directions of the branch instruction. In addition to edges, other information, such as the basic block index, must also be stored. The data type of a basic block is shown in Table 2.
To analyze the impact of the cache on program execution time, basic blocks are divided into multiple cache basic blocks. If a basic block is smaller than the size of a cache line configured by the user, it is treated as a single node. Otherwise, the basic block is divided into multiple cache basic blocks. The data type for a cache basic block is shown in Table 3. Each cache basic block must record information such as the starting instruction address, the index of the cache set it is mapped to, and the index of the basic block it belongs to. This data type can also be used to describe nodes in the CCG. Consequently, the edges in the CCG only need to record the indices of the start and end nodes, and the count of conflicts.
For branch prediction analysis, the introduction of HS requires the extension of the CFG. A new node is used to store the branch history information for nodes in the CFG. The data type is shown in Table 4. The main steps in analyzing branch prediction are collecting the set of instructions on the erroneous path, constructing the HS of nodes, and analyzing the HSs of edges to feasible nodes. Function collect_mp_insts is used to collect instructions along the predicted path, function build_bfg collects control flow information between adjacent branch instructions under a specific HS, and function build_btg uses BFS to collect the information of all nodes reachable from a specific CFG node.
The pipeline model analysis is implemented using execution graphs, with the goal of determining the execution time of a basic block. When computing the execution time of a basic block, it is insufficient to consider the block in isolation; the block’s context within the CFG must also be considered. For example, if the instruction fetch buffer size is 2 and the instruction reorder buffer size is 4, then before the execution of a basic block, there will be 5 instructions waiting in the pipeline and one instruction being executed. Instructions in the cache may have dependencies on nearby instructions. Instructions that have dependencies on the previous basic block are referred to as the prologue, while those with dependencies on the subsequent basic block are called the epilogue. By using the functions collect_prologs and collect_epilogs to traverse paths in the CFG, we can establish the context between cache basic blocks. This allows us to account for dependencies within the contextual environment when analyzing the execution graph.
In the implementation of the pipeline analysis, the function est_units calculates the execution time of each basic block in the CFG. The function ctx_unit_time analyzes paths within the CFG, focusing on branch prediction and the contextual environment of the basic blocks. The function create_egraph establishes the execution graph for each basic block, while the function est_egraph applies Algorithm 1 to derive the final WCET target equation.
IPET solves problems using ILP. The overall construction process for the ILP problem is shown in Figure 8. There are two branches related to branch prediction: the function after the first branch prediction extracts constraints related to branch prediction, while the function after the second branch extracts constraints related to the erroneous path prefetching strategy. The roles of the other functions are as follows:
  • cost_func generates linear target equations based on the micro-architecture configured by the user. For scenarios involving branch prediction execution, cost_term(BP_CPRED) or cost_term(BP_MPRED) is invoked to construct target equations for correct branch predictions or branch mispredictions, respectively. mpcost_func is invoked to establish the target Equation (15), which considers branch prediction and the instruction cache, along with their interactions.
  • tcfg_cons generates constraints derived from the control flow information.
  • bfg_cons generates constraints derived from the branch prediction.
  • cache_cons and mp_cache_cons generate constraints for cache hits and misses, including the constraints that apply when executing along erroneous paths.
  • user_cons reads user-input constraint files and parses the linear constraints for the ILP problem.
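How correct-prediction and misprediction terms might enter the objective can be sketched in C as follows. This is an assumption-laden illustration rather than the tool's cost_func: the type bp_block_t and the function objective_value are hypothetical, and a misprediction is modeled as a flat penalty added to the block time (for example, the 15-cycle -fetch:mplat value in Table 5), whereas the tool derives its costs from execution graphs.

```c
/* Hypothetical per-block record for the sketch. */
typedef struct {
    long base_cycles;  /* block time when its branch is predicted correctly */
    long n_cpred;      /* executions with a correct prediction */
    long n_mpred;      /* executions with a misprediction */
} bp_block_t;

/* Evaluate the objective for one assignment of execution counts:
 * each correctly predicted execution costs base_cycles, and each
 * mispredicted execution costs base_cycles plus the flat penalty. */
long objective_value(const bp_block_t *blk, int n, long mplat)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += blk[i].n_cpred * blk[i].base_cycles;
        total += blk[i].n_mpred * (blk[i].base_cycles + mplat);
    }
    return total;
}
```

In the ILP itself the counts are variables chosen by the solver to maximize this sum subject to the constraints listed above; the sketch only evaluates the sum for fixed counts.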
Figure 8. Process of implementing the ILP problem.
The ILP designed in this paper is based on the ILOG/CPLEX format [24]. There are two ILP solvers capable of solving problems in this format: CPLEX, a commercial product, and lp_solver, a free and open-source alternative. This paper uses lp_solver 5.5.0.4 as the third-party ILP solver for calculating the WCET.
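To make the file format concrete, the following C sketch writes a two-variable WCET problem in CPLEX LP syntax. It is only an illustration: the function write_lp, the constraint labels entry and loop, and the two-block program are hypothetical, while the variable names follow the c0.0 and c0.1 style used for the user constraints in Table 6.

```c
#include <stdio.h>

/* Write a tiny WCET ILP in CPLEX LP format into buf.
 * cost0 and cost1 are the cycle costs of two blocks, and bound limits
 * how often block c0.1 may run per execution of block c0.0.
 * Returns the number of characters written, or -1 if buf is too small. */
int write_lp(char *buf, size_t cap, long cost0, long cost1, long bound)
{
    int n = snprintf(buf, cap,
        "Maximize\n"
        " obj: %ld c0.0 + %ld c0.1\n"
        "Subject To\n"
        " entry: c0.0 = 1\n"
        " loop: c0.1 - %ld c0.0 <= 0\n"
        "General\n"
        " c0.0 c0.1\n"
        "End\n",
        cost0, cost1, bound);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For cost0 = 5, cost1 = 12, and bound = 1024, the optimum of the generated problem is c0.0 = 1 and c0.1 = 1024, giving an objective of 5 + 12 × 1024 = 12293 cycles.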

5. Results and Discussion

Figure 9 shows the estimated WCET in cycles for the set of Mälardalen benchmarks [25] supported by our WCET analysis tool. The processor configuration used in the experiments is listed in Table 5. Some of the user-provided constraints used for the benchmarks are shown in Table 6.
Among the Mälardalen WCET benchmarks, the insertsort program has the largest estimated WCET, in contrast to the binary search program bs and the finite impulse response filter program fir. This is because we changed the length of the target reversed array in insertsort to 1024, whereas bs only searches an array of 15 elements.
Two boards equipped with ARMv8-A CPUs, the Raspberry Pi 4 Model B from Sony, Pencoed, Wales, and the Firefly ROC-RK3568-PC-SE from T-CHIP, Zhongshan, Guangdong Province, China, are also used to obtain the observed WCETs of the benchmarks in a physical environment. Execution time in processor cycles is obtained by reading the generic timer register of the ARMv8-A CPU. The difference between the values before and after execution is the measured execution time of the program. Five measured execution times are averaged to obtain the final observed WCET. We configure the analysis tool to make the micro-architecture similar to the target processor. The two sets of estimated and observed WCETs of the benchmarks are shown in Figure 10 and Figure 11.
Given a C program and a processor configuration, the static analysis method guarantees that the estimated WCET is not less than the program’s actual execution time for any input. As shown in Figures 10 and 11, the ratio of the estimated to observed WCET values for all benchmarks is greater than 1, which supports the reliability of the analysis tool. The analysis result is considered precise if the estimated value is close to the observed value.
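The comparison between estimated and observed values can be sketched in C as follows. The function names mean_cycles, max_cycles, and safety_ratio are hypothetical and the timer readout itself is omitted. The mean mirrors the averaging of five measurements described above; the maximum is included only as a stricter alternative and is an assumption of this sketch, not part of the experiments.

```c
#include <stddef.h>

/* Mean of n measured execution times (timer deltas); 0.0 if n == 0. */
double mean_cycles(const unsigned long long *s, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)s[i];
    return sum / (double)n;
}

/* Largest of n measured execution times; 0 if n == 0. */
unsigned long long max_cycles(const unsigned long long *s, size_t n)
{
    unsigned long long m = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] > m)
            m = s[i];
    return m;
}

/* Ratio of estimated to observed WCET; values above 1 mean the estimate
 * bounds the observation, and values close to 1 mean it is precise. */
double safety_ratio(unsigned long long estimated, double observed)
{
    return (double)estimated / observed;
}
```

For measurements of 100, 110, 90, 105, and 95 cycles, the mean is 100 and the maximum is 110; an estimate of 250 cycles then gives a ratio of 2.5 against the mean.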
However, some estimated WCET values for benchmarks are up to 2–3 times larger than the observed WCET. There may be two reasons for these differences. First, our micro-architecture modeling may not be detailed enough. For instance, features like data caches and Translation Lookaside Buffers (TLBs) supported by modern processors are not included in our model, leading to a lack of constraints. Second, the differences may be attributed to the pessimistic nature of the static analysis method.

6. Conclusions

This paper designs a WCET analysis tool that includes using a CCG to address cache conflicts, employing a control flow analysis method based on historical states to analyze branch predictors, and examining instruction cache prefetching strategies based on erroneous branch prediction paths by adding virtual nodes to the CCG. When calculating the execution time of a basic block, this paper integrates instruction pipelines into the analysis and uses execution graphs to determine the execution time of individual basic blocks.
After implementation, we evaluate the performance of the analysis tool by comparing the estimated WCET of benchmarks with the observed values for two boards. The ratio of estimated to observed WCET values for all benchmarks is greater than 1, demonstrating the tool’s reliability. Some estimated values differ greatly from the observed values. This discrepancy is due to our incomplete modeling of modern processor features.
Future works will mainly include the following:
  • Model and analyze caches shared among multiple cores to improve analysis accuracy.
  • Optimize the micro-architecture model to make it more suitable for modern processors.
  • Adopt algorithms with lower time complexity so that the tool can analyze more complex programs.
  • Use estimated WCET bounds to perform schedulability analyses for programs [26].

Author Contributions

Conceptualization, M.L. and Y.Z.; methodology, M.L. and K.X.; software, M.L.; validation, M.L., K.X. and Y.Z.; formal analysis, M.L.; writing—original draft preparation, D.H.; writing—review and editing, D.H.; supervision, K.X.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is based on the research results of CMIOT-UESTC Joint Laboratory of Operating System.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bouziane, R.; Rohou, E.; Gamatié, A. Energy-Efficient Memory Mappings based on Partial WCET Analysis and Multi-Retention Time STT-RAM. In Proceedings of the 26th International Conference on Real-Time Networks and Systems, Chasseneuil-du-Poitou, France, 10–12 October 2018; pp. 148–158.
  2. Lee, J.; Shin, S.Y.; Nejati, S.; Briand, L.; Parache, Y.I. Estimating Probabilistic Safe WCET Ranges of Real-Time Systems at Design Stages. ACM Trans. Softw. Eng. Methodol. 2023, 32, 1–33.
  3. Lugo, T.; Lozano, S.; Fernández, J.; Carretero, J. A Survey of Techniques for Reducing Interference in Real-Time Applications on Multicore Platforms. IEEE Access 2022, 10, 21853–21882.
  4. Pedro-Zapater, A.; Segarra, J.; Tejero, R.G.; Viñals, V.; Rodríguez, C. Reducing the WCET and analysis time of systems with simple lockable instruction caches. PLoS ONE 2020, 15, e0229980.
  5. Segarra, J.; Cortadella, J.; Gran Tejero, R.; Viñals-Yufera, V. Automatic Safe Data Reuse Detection for the WCET Analysis of Systems With Data Caches. IEEE Access 2020, 8, 192379–192392.
  6. Eyerman, S.; Smith, J.E.; Eeckhout, L. Characterizing the branch misprediction penalty. In Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, USA, 19–21 March 2006.
  7. Zhang, Q.; Huangfu, Y.; Zhang, W. Statistical regression models for WCET estimation. Qual. Technol. Quant. Manag. 2019, 16, 318–329.
  8. Chattopadhyay, S.; Roychoudhury, A. Unified Cache Modeling for WCET Analysis and Layout Optimizations. In Proceedings of the 2009 30th IEEE Real-Time Systems Symposium, Washington, DC, USA, 1–4 December 2009; pp. 47–56.
  9. Li, X.; Liang, Y.; Mitra, T.; Roychoudhury, A. Chronos: A timing analyzer for embedded software. Sci. Comput. Program 2007, 69, 56–67.
  10. Reghenzani, F.; Massari, G.; Fornaciari, W.; Galimberti, A. Probabilistic-WCET Reliability: On the experimental validation of EVT hypotheses. In Proceedings of the International Conference on Omni-Layer Intelligent Systems, Crete, Greece, 5–7 May 2019; pp. 229–234.
  11. Puschner, P.; Burns, A. A review of worst-case execution-time analysis. Real Time Syst. 2000, 18, 115–128.
  12. Li, Y.T.S.; Malik, S.; Wolfe, A. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of the 16th IEEE Real-Time Systems Symposium, Pisa, Italy, 5–7 December 1995; pp. 298–307.
  13. Li, Y.T.S.; Malik, S. Performance analysis of embedded software using implicit path enumeration. IEEE T. Comput. Aid D 1997, 16, 1477–1487.
  14. Stappert, F.; Ermedahl, A.; Engblom, J. Efficient longest executable path search for programs with complex flows and pipeline effects. In Proceedings of the 2001 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, Atlanta, GA, USA, 16–17 November 2001; pp. 132–140.
  15. Healy, C.; Sjodin, M.; Rustagi, V.; Whalley, D. Bounding loop iterations for timing analysis. In Proceedings of the Fourth IEEE Real-Time Technology and Applications Symposium, Denver, CO, USA, 3–5 June 1998; pp. 12–21.
  16. Gómez, G.; Liu, Y.A. Automatic time-bound analysis for a higher-order language. SIGPLAN Not. 2002, 37, 75–86.
  17. Ruiz, J.; Cassé, H.; Michiel, M.d. Working Around Loops for Infeasible Path Detection in Binary Programs. In Proceedings of the 2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation, Shanghai, China, 17–18 September 2017; pp. 1–10.
  18. Ferdinand, C.; Heckmann, R.; Langenbach, M.; Martin, F.; Schmidt, M.; Theiling, H.; Thesing, S.; Wilhelm, R. Reliable and Precise WCET Determination for a Real-Life Processor. In Proceedings of the Embedded Software, First International Workshop, EMSOFT 2001, Tahoe City, CA, USA, 8–10 October 2001.
  19. Lundqvist, T.; Stenstrom, P. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of the 20th IEEE Real-Time Systems Symposium, Phoenix, AZ, USA, 1–3 December 1999; pp. 12–21.
  20. Li, Y.-T.S.; Malik, S.; Wolfe, A. Cache modeling for real-time software: Beyond direct mapped instruction caches. In Proceedings of the 17th IEEE Real-Time Systems Symposium, Washington, DC, USA, 4–6 December 1996; p. 254.
  21. Mitra, T.; Roychoudhury, A.; Xianfeng, L. Timing analysis of embedded software for speculative processors. In Proceedings of the 15th International Symposium on System Synthesis, Kyoto, Japan, 2–4 October 2002; pp. 126–131.
  22. Xianfeng, L.; Roychoudhury, A.; Mitra, T. Modeling out-of-order processors for software timing analysis. In Proceedings of the 25th IEEE International Real-Time Systems Symposium, Lisbon, Portugal, 5–8 December 2004; pp. 92–103.
  23. Bai, Z.; Cassé, H.; Carle, T.; Rochange, C. Computing Execution Times With Execution Decision Diagrams in the Presence of Out-of-Order Resources. IEEE T. Comput. Aid D 2023, 42, 3665–3678.
  24. IBM ILOG CPLEX Optimization Studio. Available online: https://www.ibm.com/products/ilog-cplex-optimization-studio (accessed on 21 April 2024).
  25. Gustafsson, J.; Betts, A.; Ermedahl, A.; Lisper, B. The Mälardalen WCET Benchmarks: Past, Present and Future; Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik: Wadern, Germany, 2010; Volume 15, pp. 136–146.
  26. Maiza, C.; Rihani, H.; Rivas, J.M.; Goossens, J.; Altmeyer, S.; Davis, R.I. A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems. ACM Comput. Surv. 2019, 52, 1–38.
Figure 1. A C program and its CFG.
Figure 2. Process of WCET analysis.
Figure 3. CFG and constraints of a program.
Figure 4. (a) CFG of a program. (b) Cache Table of a program. (c) CCG of a program.
Figure 5. CFG with HS information.
Figure 6. CCG with virtual nodes.
Figure 7. ARM assembly and its execution graph.
Figure 9. Estimated WCET of benchmarks.
Figure 10. WCET experimental data from the Raspberry Pi 4 Model B with Cortex-A72 processor.
Figure 11. WCET experimental data from the Firefly ROC-RK3568-PC-SE board with Cortex-A55 processor.
Table 1. Comparison of the three analysis methods.
| Analysis Method | Tree-Based | Implicit Path Enumeration | Path-Based |
| --- | --- | --- | --- |
| Efficiency | Highest | High | Low |
| Accuracy | Average | Good | Best |
| Ability to Describe Flow Information | Poor | Good | Best |
| Affected by Compiler Optimization | Yes | No | No |
Table 2. Data type of basic block.
| Data Type | Name | Description |
| --- | --- | --- |
| int | id | Index of the basic block. |
| proc_t * | proc | Function the node belongs to. |
| addr_t | sa | Starting address of the node. |
| int | size | Size of the basic block. |
| int | num_inst | Number of instructions in the basic block. |
| de_inst_t * | code | Pointer to the first instruction in the basic block. |
| bb_type_t | type | Whether the block includes a branch instruction. |
| cfg_edge_t * | out_n, out_t | Outgoing edges of the basic block. |
| int | num_in | Number of incoming edges. |
| cfg_edge_t ** | in | Set of incoming edges. |
Table 3. Data type of cache basic block.
| Data Type | Name | Description |
| --- | --- | --- |
| inst_t | start_inst | Starting instruction of the basic block. |
| size_t | lb_size | Size of the cache basic block. |
| unsigned | lb_set | Cache line the basic block is mapped to. |
| cfg_node_t | cfg_bb | Basic block the cache basic block belongs to. |
| unsigned | id | Index of the cache basic block in CFG. |
| ccg_edge_* | edges_in | Set of edges in. |
| ccg_edge_* | edges_out | Set of edges out. |
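The relation between a basic block and its cache basic blocks can be sketched in C as follows. This is an illustration under stated assumptions, not the tool's code: the type line_block_t and the functions num_sets and split_block are hypothetical, and the set index is computed with the conventional modulo mapping (line address modulo number of sets).

```c
#include <stdint.h>

/* Hypothetical record for one piece of a basic block within a cache line. */
typedef struct {
    uint64_t start;  /* address of the first byte of this piece */
    unsigned size;   /* bytes of the basic block inside this cache line */
    unsigned set;    /* cache set the line maps to */
} line_block_t;

/* Number of sets of a set-associative cache. */
unsigned num_sets(unsigned cache_bytes, unsigned line_bytes, unsigned ways)
{
    return cache_bytes / (line_bytes * ways);
}

/* Split the basic block [sa, sa + size) at cache-line boundaries.
 * Writes at most max_out pieces to out and returns how many were written. */
int split_block(uint64_t sa, unsigned size, unsigned line_bytes,
                unsigned sets, line_block_t *out, int max_out)
{
    int n = 0;
    uint64_t end = sa + size;
    while (sa < end && n < max_out) {
        uint64_t line_end = (sa / line_bytes + 1) * line_bytes;
        uint64_t stop = line_end < end ? line_end : end;
        out[n].start = sa;
        out[n].size = (unsigned)(stop - sa);
        out[n].set = (unsigned)((sa / line_bytes) % sets);
        n++;
        sa = stop;
    }
    return n;
}
```

For a 48 KB, 3-way cache with 64-byte lines, num_sets returns 256. A 24-byte block starting at address 0x1038 is split into an 8-byte piece in set 64 and a 16-byte piece in set 65.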
Table 4. Data type of HS node in CFG.
| Data Type | Name | Description |
| --- | --- | --- |
| tcfg_node_t * | bbi | Basic node pointed to. |
| short | bhr | Value of BHR. |
| short | pi | Index of prediction table, calculated by BHR. |
| tcfg_node_t * | out | Set of edges out. |
| tcfg_node_t * | in | Set of edges in. |
| int | flags | Flags indicating whether the path is feasible. |
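How the bhr and pi fields of an HS node could evolve along a path can be sketched in C as follows. The functions bhr_update and pred_index are hypothetical, and XOR-ing the history with the branch address is an assumption of this sketch (a gshare-style index); the article only states that the index is calculated from the BHR. The shift by 2 reflects the fixed 4-byte size of A64 instructions.

```c
/* Shift the branch outcome into a history register of `width` bits
 * (width is assumed to be smaller than the bit width of unsigned). */
unsigned bhr_update(unsigned bhr, int taken, unsigned width)
{
    unsigned mask = (1u << width) - 1u;
    return ((bhr << 1) | (taken ? 1u : 0u)) & mask;
}

/* Assumed index function: history XOR word-aligned branch address,
 * reduced modulo the prediction table size. */
unsigned pred_index(unsigned bhr, unsigned long branch_addr,
                    unsigned table_size)
{
    return (unsigned)((bhr ^ (branch_addr >> 2)) % table_size);
}
```

With a 4-bit BHR, a history of 1011 followed by a taken branch becomes 0111. Each HS node then corresponds to one (basic block, BHR value) pair reachable along some path.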
Table 5. Processor configuration.
| Module | Parameter | Description |
| --- | --- | --- |
| Cache | -cache:il1 il1:768:64:3:l | L1 instruction cache that is a 48 KB 3-way set-associative cache with a 64-byte cache line; LRU replacement policy. |
| | -cache:il2 il2:16384:64:16:l | L2 instruction cache that is a 1 MB 16-way set-associative cache with a 64-byte cache line; LRU replacement policy. |
| | -cache:dl1 none | Data caching is not considered. |
| Branch Prediction | -bpred:2lev 1 1024 4 1 | Size of first level entry is 1; size of second level entry is 1024; width of BHR is 4; branch history is supported. |
| | -fetch:mplat 15 | Branch misprediction latency is 15 cycles. |
| Instruction Pipeline | -fetch:ifqsize 4 | Instruction prefetch queue size is 4. |
| | -decode:width 4 | Instruction decode width is 4 insts/cycle. |
| | -issue:width 4 | Instruction issue width is 4 insts/cycle. |
| | -commit:width 4 | Instruction commit width is 4 insts/cycle. |
| | -issue:inorder true | Run pipeline with in-order issue. |
| | -issue:wrongpath true | Issue instructions down wrong execution paths. |
| Others | -ruu:size 128 | Size of register update unit is 128. |
| | -lsq:size 64 | Size of load/store queue is 64. |
| | -mem:width 8 | Size of memory block is 32 bytes. |
Table 6. User constraints for benchmarks.
Insertsort | cnt | Matmult | bs
     c0.1 − 1024 c0.0 ≤ 0
c0.3 − 1024 c0.2 ≤ 0
c0.5 − 512 c0.4 ≤ 0
   c0.2 − 128 c0.1 ≤ 0
c0.1 − 128 c0.0 ≤ 0
c0.6 − 128 c0.5 ≤ 0
c0.6 − 128 c0.5 ≤ 0
c0.5 − 128 c0.4 ≤ 0
c0.1 − 24 c0.0 ≤ 0
c0.2 − 24 c0.1 ≤ 0
c0.6 − 24 c0.5 ≤ 0
c0.5 − 24 c0.4 ≤ 0
c0.11 − 24 c0.10 ≤ 0
c0.10 − 24 c0.9 ≤ 0
c0.9 − 24 c0.8 ≤ 0
     c0.1 − 1024 c0.0 ≤ 0
c0.3 − 1024 c0.2 ≤ 0
c0.4 − 512 c0.3 ≤ 0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, M.; Xiao, K.; Zhou, Y.; Huang, D. WCET Analysis Based on Micro-Architecture Modeling for Embedded System Security. Appl. Sci. 2024, 14, 7277. https://doi.org/10.3390/app14167277

AMA Style

Li M, Xiao K, Zhou Y, Huang D. WCET Analysis Based on Micro-Architecture Modeling for Embedded System Security. Applied Sciences. 2024; 14(16):7277. https://doi.org/10.3390/app14167277

Chicago/Turabian Style

Li, Meng, Kun Xiao, Yong Zhou, and Dajun Huang. 2024. "WCET Analysis Based on Micro-Architecture Modeling for Embedded System Security" Applied Sciences 14, no. 16: 7277. https://doi.org/10.3390/app14167277
