Article

Model Predictive Control with Variational Autoencoders for Signal Temporal Logic Specifications

Department of Information and Telecommunication Engineering, Incheon National University, Incheon 22012, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2024, 24(14), 4567; https://doi.org/10.3390/s24144567
Submission received: 27 May 2024 / Revised: 7 July 2024 / Accepted: 8 July 2024 / Published: 14 July 2024

Abstract

This paper presents a control strategy synthesis method for dynamical systems with differential constraints, emphasizing the prioritization of specific rules. Special attention is given to scenarios where not all rules can be simultaneously satisfied to complete a given task, necessitating decisions on the extent to which each rule is satisfied, including which rules must be upheld or disregarded. We propose a learning-based Model Predictive Control (MPC) method designed to address these challenges. Our approach integrates a learning method with a traditional control scheme, enabling the controller to emulate human expert behavior. Rules are represented as Signal Temporal Logic (STL) formulas. A robustness margin, quantifying the degree of rule satisfaction, is learned from expert demonstrations using a Conditional Variational Autoencoder (CVAE). This learned margin is then applied in the MPC process to guide the prioritization or exclusion of rules. In a track driving simulation, our method demonstrates the ability to generate behavior resembling that of human experts and effectively manage rule-based dilemmas.

1. Introduction

Robotics is increasingly permeating diverse sectors, spanning both civilian and industrial applications, and is becoming integral to everyday life. Service robots are now prevalent in public spaces, interacting with individuals and delivering services. Within the field of robotics, autonomous driving emerges as a particularly dynamic area, garnering extensive research attention.
In robotics, adherence to rules varies from basic collision avoidance in navigation scenarios to compliance with complex traffic regulations in autonomous driving. These rules, established primarily for safety, must generally be upheld by robots while executing their tasks. However, it is essential to acknowledge that not all rules carry equal importance. Depending on the context, some rules may need to be prioritized over others or even disregarded. For example, in autonomous driving, scenarios may necessitate breaching certain rules—such as lane changes in dense traffic, decisions at yellow traffic lights, or crossing double yellow lines to avoid obstacles. These situations compel robots to make intricate decisions regarding rule compliance, presenting significant challenges in determining appropriate control inputs.
Model Predictive Control (MPC) stands out as a robust approach for autonomous control, recognized for its capabilities in online trajectory optimization [1]. The core principle of MPC involves identifying optimal control inputs to minimize a predefined cost function, considering both inputs and anticipated future outputs. This method integrates an objective function characterizing the desired robot behavior and constraints mitigating undesirable actions. The efficacy of MPC is well-documented across diverse applications, such as the full-body control of humanoid robots [2,3,4].
Designing effective MPC controllers remains a significant challenge. Experienced operators can adeptly manage robots, yet encoding such expertise into MPC parameters is complex. For instance, expert drivers in autonomous driving must make continuous, complex decisions, such as whether to decelerate or change lanes in response to slow-moving vehicles. However, finding the appropriate MPC parameters to handle such varied scenarios is complex and computationally intensive.
Recently, imitation learning has emerged as a promising solution for robotic learning challenges [5,6]. This approach derives near-optimal control strategies directly from human expert demonstrations, eliminating the need for manual policy or cost function design. Imitation learning excels in capturing complex policy functions that balance multiple considerations [7], learning the importance of various factors from expert behaviors to enable robots to replicate human actions. However, despite its advantages, imitation learning does not inherently ensure performance reliability. In scenarios where safety rules like collision avoidance are crucial, imitation learning may not consistently yield control actions that comply with essential safety norms, underscoring the paramount importance of rule adherence for robot safety and human protection.
In this paper, we address a control synthesis problem within a framework of prioritized rules, building on our previous research [8]. We assumed inherent rule priorities and aimed to design a controller that accounts for these priorities to manage dilemmas effectively. Our methodology is grounded in the MPC framework, which, unlike purely deep learning-based approaches, integrates each rule as a constraint, thereby enhancing performance reliability.
We represent these rules using Signal Temporal Logic (STL) [9,10], a formalism allowing the precise specification of desired system behaviors, commonly applied in robotic task specifications [11,12,13,14,15]. STL is particularly suited for describing properties of real-valued signals in dense time scenarios, making it ideal for real-world robotic applications.
Instead of explicitly determining rule priorities, we adopted a learning approach to identify minimal acceptable levels of rule satisfaction, informed by expert demonstrations. This approach diverges from our earlier work [8] by employing a Conditional Variational Autoencoder (CVAE) [16]. This technique helps discern essential rules and decide on adherence levels, facilitating selective compliance rather than strict obedience to all rules. The use of a CVAE is justified by its efficiency in handling uncertainties within data, providing a more effective solution compared to Gaussian process regression methods used in previous work [8].
Our hybrid approach combines deep learning with traditional MPC, guiding robots to emulate expert human behaviors in complex decision-making scenarios.

2. Related Work

Extensive research has explored trajectory optimization and Model Predictive Control (MPC) within the framework of temporal logic specifications, particularly Linear Temporal Logic (LTL). Mixed-integer linear programming (MILP) has been employed to generate trajectories for continuous systems subject to finite-horizon LTL specifications [17,18]. Wolff et al. [19] extended this approach by encoding general LTL formulas into MILP constraints, accommodating infinite runs with periodic structures. Additionally, Cho [20] investigated optimal path planning under syntactically co-safe LTL specifications, utilizing a sampling-based tree and a two-layered structure.
Recent advancements have integrated Signal Temporal Logic (STL) within MPC frameworks. Raman et al. [21] structured MPC to facilitate control synthesis from STL specifications using MILP, allowing for the calculation of open-loop control signals that adhere to both finite and infinite horizon STL properties while maximizing robust satisfaction. Sadigh et al. [22] introduced a novel STL variant incorporating probabilistic predicates to address uncertainties in predictive models, thereby enhancing safety assessments under uncertainty. Mao et al. [23] proposed a solution to handle complex temporal requirements formalized in STL specifications within the Successive Convexification algorithmic framework. This approach retains the expressiveness of encoding mission requirements with STL semantics while avoiding combinatorial optimization techniques such as MILP.
The integration of MPC with machine learning techniques has been pursued to address system identification challenges within MPC contexts [24,25,26]. Lenz et al. [24] applied deep learning within MPC to derive task-specific controls for complex activities such as robotic food cutting. Carron et al. [25] presented a model-based control approach that utilizes data gathered during operation to improve the model of a robotic arm and thereby enhance the tracking performance. Their scheme is based on inverse dynamic feedback linearization and a data-driven error model, integrated into an MPC formulation. Lin et al. [26] compared deep reinforcement learning (DRL) and MPC for Adaptive Cruise Control (ACC) design in car-following scenarios.
Efforts have also been made to address the types of dilemmas introduced in our work. Tumova et al. [27] and Castro et al. [28] examined scenarios where not all LTL rules can be satisfied in path planning, seeking paths that minimally violate these rules. However, their approaches require predetermined weights among rules, contrasting with our method that learns directly from expert demonstrations. Urban driving dilemmas were specifically addressed by Lee et al. [29], who applied inverse reinforcement learning to capture expert driving strategies.
Imitation learning is emerging as a promising approach to robotic learning problems and has been widely applied to autonomous driving. Policies for autonomous vehicles have been learned from image or video datasets through Convolutional Neural Networks (CNNs) [30,31]. Schmerling et al. [32] utilized a Conditional Variational Autoencoder (CVAE) framework to reason about interactions between vehicles in traffic-weaving scenarios, producing multimodal outputs. Additionally, some studies have applied learning approaches to MPC, where certain parameters of MPC are learned from data [8,33]. Reinforcement learning has also been considered for autonomous driving, using CNNs to encode visual information [34].

3. Preliminaries

3.1. System Model

We consider a continuous-time dynamical system described by the following differential equation:
$$\dot{x}_t = f(x_t, u_t), \qquad (1)$$
where $x_t \in \mathcal{X} \subseteq \mathbb{R}^{n_x}$ represents the state vector, $u_t \in \mathcal{U} \subseteq \mathbb{R}^{n_u}$ denotes the control input, and $f$ is a smooth (continuously differentiable) function with respect to its arguments. By employing a predefined time step $dt$, the continuous system in Equation (1) can be discretized as follows:
$$x_{n+1} = f(x_n, u_n), \qquad (2)$$
where $n$ represents the discrete time step, defined as $n = t/dt$, and $x_0$ denotes the initial state. For a fixed horizon $H$, let $\mathbf{x}(x_n, \mathbf{u}_{H,n})$ denote a trajectory generated from the state $x_n$ with the control inputs $\mathbf{u}_{H,n} = \{u_n, \ldots, u_{n+H-1}\}$.
A signal is defined as a sequence of states and control inputs:
$$\xi(x_n, \mathbf{u}_{H,n}) = \big( (x_n, u_n), \ldots, (x_{n+H-1}, u_{n+H-1}) \big). \qquad (3)$$
In addition to the definition provided in Equation (3), we use the notation $\xi(n)$ to represent a signal starting from the discrete time step $n$, with a slight abuse of notation.
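To make this notation concrete, the following minimal Python sketch (the helper names are ours, not from the paper) rolls out a discrete-time model as in Equation (2) and collects the resulting signal of Equation (3):

```python
import numpy as np

def rollout(step, x0, controls):
    """Roll out the discrete-time model x_{n+1} = f(x_n, u_n) and collect
    the signal xi = ((x_n, u_n), ..., (x_{n+H-1}, u_{n+H-1})) of Equation (3).
    `step` is the one-step map."""
    signal = []
    x = np.asarray(x0, dtype=float)
    for u in controls:                       # controls = [u_n, ..., u_{n+H-1}]
        u = np.asarray(u, dtype=float)
        signal.append((x.copy(), u))
        x = step(x, u)                       # advance one discrete step
    return signal

# Example: a single integrator discretized with dt = 0.1, driven by H = 5 inputs.
dt = 0.1
xi = rollout(lambda x, u: x + u * dt, x0=[0.0], controls=[[1.0]] * 5)
```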

3.2. Signal Temporal Logic

Signal Temporal Logic (STL) is a formalism used to specify properties of real-valued, dense-time signals, and is extensively applied in the analysis of continuous and hybrid systems [9,10]. A predicate within an STL formula is defined as an inequality of the form $\mu(\xi(t)) > 0$, where $\mu$ is a function of the signal $\xi$ at time $t$. The truth value of the predicate $\mu$ is determined by the condition $\mu(\xi(t)) > 0$.
An STL formula is composed of Boolean and temporal operations on these predicates. The syntax of STL formulae $\varphi$ is defined recursively as follows:
$$\varphi ::= \mu \mid \neg\mu \mid \varphi \wedge \psi \mid G_{[a,b]}\psi \mid \varphi\, U_{[a,b]}\, \psi,$$
where $\varphi$ and $\psi$ are STL formulas, $G$ denotes the globally operator, and $U$ represents the until operator.
The validity of an STL formula $\varphi$ with respect to a signal $\xi$ at time $t$ is defined inductively as follows:
$$(\xi, t) \models \mu \iff \mu(\xi(t)) > 0$$
$$(\xi, t) \models \neg\mu \iff \neg\big((\xi, t) \models \mu\big)$$
$$(\xi, t) \models \varphi \wedge \psi \iff (\xi, t) \models \varphi \,\wedge\, (\xi, t) \models \psi$$
$$(\xi, t) \models \varphi \vee \psi \iff (\xi, t) \models \varphi \,\vee\, (\xi, t) \models \psi$$
$$(\xi, t) \models G_{[a,b]}\varphi \iff \forall t' \in [t+a,\, t+b],\; (\xi, t') \models \varphi$$
$$(\xi, t) \models \varphi\, U_{[a,b]}\, \psi \iff \exists t' \in [t+a,\, t+b] \text{ s.t. } (\xi, t') \models \psi \,\wedge\, \forall t'' \in [t, t'],\; (\xi, t'') \models \varphi.$$
The notation $(\xi, t) \models \varphi$ indicates that the signal $\xi$ satisfies the STL formula $\varphi$ at time $t$. For example, $(\xi, t) \models G_{[a,b]}\varphi$ implies that $\varphi$ holds for the signal $\xi$ throughout the interval from $t+a$ to $t+b$. In discrete-time systems, STL formulas are evaluated over discrete time intervals.
One significant advantage of Signal Temporal Logic (STL) is its associated metric, known as the robustness degree, which quantifies how well a given signal $\xi$ satisfies an STL formula $\varphi$. The robustness degree is defined as a real-valued function of the signal $\xi$ and time $t$, calculated recursively using the following quantitative semantics:
$$\rho^{\mu}(\xi, t) = \mu(\xi(t)),$$
$$\rho^{\neg\mu}(\xi, t) = -\mu(\xi(t)),$$
$$\rho^{\varphi \wedge \psi}(\xi, t) = \min\big(\rho^{\varphi}(\xi, t), \rho^{\psi}(\xi, t)\big),$$
$$\rho^{\varphi \vee \psi}(\xi, t) = \max\big(\rho^{\varphi}(\xi, t), \rho^{\psi}(\xi, t)\big),$$
$$\rho^{G_{[a,b]}\varphi}(\xi, t) = \min_{t' \in [t+a,\, t+b]} \rho^{\varphi}(\xi, t'),$$
$$\rho^{\varphi\, U_{[a,b]}\, \psi}(\xi, t) = \max_{t' \in [t+a,\, t+b]} \Big( \min\big(\rho^{\psi}(\xi, t'),\; \min_{t'' \in [t, t']} \rho^{\varphi}(\xi, t'')\big) \Big).$$
Following our previous study [8], we introduce the notation $(\xi, t) \models (\varphi, r)$ to indicate that the signal $\xi$ satisfies the STL formula $\varphi$ at time $t$ with a robustness slackness $r$, defined as
$$(\xi, t) \models (\varphi, r) \iff \rho^{\varphi}(\xi, t) > r. \qquad (18)$$
Equation (18) asserts that the signal $\xi$ satisfies $\varphi$ with at least the minimum robustness degree $r$. The robustness slackness $r$ serves as a margin for the satisfaction of the STL formula $\varphi$. As $r$ increases, the constraints on the signal $\xi$ to satisfy $\varphi$ at time $t$ become more stringent, while smaller values of $r$ imply more relaxed constraints. Notably, when $r < 0$, it allows for the violation of $\varphi$.
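For discrete-time signals, these quantitative semantics translate directly into code. The sketch below is a minimal illustration with our own helper names (the paper does not prescribe an implementation); `xi` is a signal indexed by discrete time steps, and `mu` maps a signal sample to a real value:

```python
def rho_pred(mu, xi, n):                  # rho^mu(xi, n) = mu(xi(n))
    return mu(xi[n])

def rho_neg(mu, xi, n):                   # rho^{not mu}(xi, n) = -mu(xi(n))
    return -mu(xi[n])

def rho_and(rho1, rho2, xi, n):           # conjunction: minimum of the two robustness values
    return min(rho1(xi, n), rho2(xi, n))

def rho_or(rho1, rho2, xi, n):            # disjunction: maximum of the two robustness values
    return max(rho1(xi, n), rho2(xi, n))

def rho_globally(rho, xi, n, a, b):       # G_[a,b]: minimum over the window
    return min(rho(xi, k) for k in range(n + a, n + b + 1))

def rho_until(rho1, rho2, xi, n, a, b):   # U_[a,b]: best split point within the window
    return max(min(rho2(xi, k),
                   min(rho1(xi, j) for j in range(n, k + 1)))
               for k in range(n + a, n + b + 1))

def satisfies_with_slackness(rho, xi, n, r):  # (xi, n) |= (phi, r)  <=>  rho^phi(xi, n) > r
    return rho(xi, n) > r
```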

4. Problem Formulation

This study aimed to solve a control synthesis problem using Signal Temporal Logic (STL) formulas [8]. Let $\boldsymbol{\varphi} = [\varphi_1, \ldots, \varphi_N]$ represent a set of STL formulas, with their conjunction denoted as $\bar{\varphi} = \varphi_1 \wedge \cdots \wedge \varphi_N$. We define a cost function $J$ over the state and control spaces, where $J(\mathbf{x}, \mathbf{u})$ measures the cost associated with a trajectory $\mathbf{x}$ and control sequence $\mathbf{u}$. The control synthesis problem under STL for Model Predictive Control (MPC) is formulated as follows.
Problem 1.
Given a system model as described in (2) and an initial state $x_0$, with a planning horizon of length $H$, determine the control input sequence $\mathbf{u}_{H,t}$ at each time step $t$ that minimizes the cost function $J(\mathbf{x}(x_t, \mathbf{u}_{H,t}), \mathbf{u}_{H,t})$ while ensuring that the conjunction of STL formulas $\bar{\varphi}$ is satisfied:
$$\underset{\mathbf{u}_{H,t}}{\text{minimize}}\; J\big(\mathbf{x}(x_t, \mathbf{u}_{H,t}), \mathbf{u}_{H,t}\big) \quad \text{subject to} \quad \big(\xi(x_t, \mathbf{u}_{H,t}), t\big) \models \bar{\varphi}.$$
While this strict formulation ensures compliance with the STL formulas, our primary objective is to develop a control sequence that incorporates flexibility in rule compliance. To this end, we introduce robustness slackness values, denoted by $\mathbf{r} = [r_1, \ldots, r_N]$, which quantify the degree to which each STL formula is satisfied. By incorporating these robustness values, the MPC problem can be reformulated as follows [8].
Problem 2.
Given the system model specified in (2), an initial state $x_0$, and a horizon length $H$, compute the control input sequence $\mathbf{u}_{H,t}$ at each time step $t$ by solving the following optimization problem:
$$\underset{\mathbf{u}_{H,t}}{\text{minimize}}\; J\big(\mathbf{x}(x_t, \mathbf{u}_{H,t}), \mathbf{u}_{H,t}\big) \quad \text{subject to} \quad \big(\xi(x_t, \mathbf{u}_{H,t}), t\big) \models (\varphi_1, r_1),\; \ldots,\; \big(\xi(x_t, \mathbf{u}_{H,t}), t\big) \models (\varphi_N, r_N).$$
This enhanced formulation allows for a more flexible management of STL constraints, effectively addressing scenarios where it is not feasible to fully satisfy all STL formulas. The robustness slackness values are derived from expert demonstrations, based on the assumption that these experts have accurately assessed the priority and required compliance level of each rule. This learning is achieved through a deep learning approach.

5. Proposed Method

The proposed framework, illustrated in Figure 1, synergizes learning techniques with STL constraints to refine MPC, enabling it to more accurately mimic human expert behavior. By leveraging expert demonstrations, we learn robustness slackness values, which define the margins of rule compliance. A Conditional Variational Autoencoder (CVAE) [16] is utilized to estimate these robustness slackness values in novel scenarios.
By incorporating the robustness slackness values obtained through the learning process, the MPC method, designed under STL constraints, generates control sequences that respect the specified rules with a certain level of flexibility. To manage the nonlinear differential constraints characteristic of dynamical systems, we employ linearized models. Although this approach may introduce some approximation errors, it remains effective for practical applications.
Figure 1 presents an overview of the proposed learning-based MPC framework. Expert demonstrations are used to learn the lower bounds of robustness, referred to as robustness slackness, through a deep learning approach. These learned values inform the MPC method, which then calculates control sequences that take into account the STL rules.

5.1. Feature Description

We introduce a feature function, denoted as $\phi$, which transforms a signal into a feature vector, mapping from the combined state and control spaces into the feature space: $\phi: \mathbb{R}^{n_x + n_u} \to \mathbb{R}^{n_f}$.
As illustrated in Figure 2, the control of the ego vehicle, $V_{ego}$, is influenced by six nearby vehicles located in adjacent lanes. These vehicles are collectively referred to as $V_{near} = \{V_{lf}, V_{lr}, V_{cf}, V_{cr}, V_{rf}, V_{rr}\}$, where the subscripts denote the position relative to the ego vehicle: left-front (lf), left-rear (lr), center-front (cf), center-rear (cr), right-front (rf), and right-rear (rr).
The feature vector $\phi$ includes the following components (a feature-extraction sketch follows the list):
  • Distances to each of the nearby vehicles ($d_{lf}, d_{lr}, d_{cf}, d_{cr}, d_{rf}, d_{rr}$);
  • Lateral deviation from the lane center ($d_{dev}$);
  • Heading angle relative to the lane direction ($\theta_{dev}$).
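A minimal sketch of such a feature map is shown below; since the paper only lists the feature components, the function signature and data layout are our assumptions:

```python
import numpy as np

def feature_vector(ego, nearby, lane_center_y, lane_heading):
    """Build the feature vector phi described above. `ego` is the ego state
    (x, y, theta, v); `nearby` maps the keys 'lf', 'lr', 'cf', 'cr', 'rf', 'rr'
    to the (x, y) positions of the surrounding vehicles. The container layout
    is illustrative, not the paper's interface."""
    ex, ey, etheta, _ = ego
    dists = [np.hypot(nearby[k][0] - ex, nearby[k][1] - ey)
             for k in ('lf', 'lr', 'cf', 'cr', 'rf', 'rr')]
    d_dev = ey - lane_center_y               # lateral deviation from the lane center
    theta_dev = etheta - lane_heading        # heading angle relative to the lane direction
    return np.array(dists + [d_dev, theta_dev])   # n_f = 8 features
```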

5.2. Learning Robustness Slackness from Demonstration

We consider a set of $M$ demonstrated signals, denoted by $\Xi = \{\xi^i\}_{i=1}^{M}$, where each sample $\xi_n^i = (x_n^i, u_n^i)$ comprises the state $x_n^i$ and control input $u_n^i$ at time step $n$. The robustness degree $r_n^{i,j}$ is defined as the minimum value observed from the current time step $n$ to the future time step $n+H-1$ for the demonstration $\xi^i$:
$$r_n^{i,j} = \min_{m \in [n,\, n+H-1]} \rho^{\varphi_j}(\xi^i, m), \qquad (21)$$
where $H$ denotes the control horizon length. The robustness degree $r_n^{i,j}$ serves as the robustness slackness for the signal over the horizon length $H$, starting from $\xi_n^i$, indicating the minimum permissible lower bound of robustness within this timeframe.
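In code, Equation (21) amounts to a sliding-window minimum over precomputed robustness traces; a sketch (with an array layout of our own choosing) is given below:

```python
import numpy as np

def slackness_labels(robustness, H):
    """Compute r_n^{i,j} = min over m in [n, n+H-1] of rho^{phi_j}(xi^i, m),
    i.e., Equation (21), for one demonstration. `robustness` is an
    (N_rules, T) array with robustness[j, m] = rho^{phi_j}(xi^i, m).
    Trailing steps without a full horizon ahead are dropped."""
    robustness = np.asarray(robustness, dtype=float)
    n_rules, T = robustness.shape
    labels = np.empty((n_rules, T - H + 1))
    for n in range(T - H + 1):
        labels[:, n] = robustness[:, n:n + H].min(axis=1)
    return labels
```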
Figure 3 illustrates a demonstrated trajectory in a track driving scenario, depicting both the robustness degree and its lower bound for the time horizon $H$. The rule considered involves maintaining the first (lowest) lane, defined by the STL formula $\varphi_{lane} = (y \le y_{upper}) \wedge (y \ge y_{lower})$, where $y$ represents the vertical position of the vehicle, and $y_{upper}$ and $y_{lower}$ are the upper and lower lane boundaries, respectively. An obstacle (or another vehicle), depicted as a striped black box, necessitates a lane change to proceed. The figure illustrates the difference between the robustness degree values and their corresponding lower bounds.
From the demonstrated signals $\Xi$, let $D_j$ represent the outputs from Equation (21) corresponding to the STL formula $\varphi_j$. We define $\Phi$ as the set of feature vectors derived from the demonstrated signals $\Xi$ (see Figure 2). For a new input feature $\phi$, the CVAE network, depicted in Figure 4, predicts the lower bound of the robustness degree for the horizon $H$, representing the learned robustness slackness $\mathbf{r} = [r_1, \ldots, r_N]$.
Our CVAE model comprises the following three parameterized functions:
  • The recognition model $q_\nu(Z \mid \phi)$ approximates the distribution of the latent variable $Z$ based on the input features. This is modeled as a Gaussian distribution, $\mathcal{N}(\mu_\nu(\phi), \Sigma_\nu(\phi))$, where $\mu_\nu$ and $\Sigma_\nu$ represent the mean and covariance determined by the network.
  • The prior model $p_\theta(Z \mid \phi)$ assumes a standard Gaussian distribution, $\mathcal{N}(0, I)$, simplifying the structure of the latent space.
  • The generation model $p_\theta(\mathbf{r} \mid Z, \phi)$ calculates the likelihood of the robustness slackness based on the latent variable $Z$ and the input feature $\phi$.
Both the recognition model $q_\nu(Z \mid \phi)$ and the generation model $p_\theta(\mathbf{r} \mid Z, \phi)$ are implemented as multi-layer perceptrons.
The training of our CVAE is guided by the Evidence Lower Bound (ELBO) loss function, initially formulated as
$$\mathbb{E}_{q_\nu(Z \mid \phi)}\big[\log p_\theta(\mathbf{r} \mid Z, \phi)\big] - D_{KL}\big(q_\nu(Z \mid \phi) \,\|\, p_\theta(Z \mid \phi)\big).$$
To better accommodate the specific requirements of our application, we adapt the ELBO and define the loss function as follows:
$$-\sum_{i=1}^{N} \log p_\theta(r_i \mid Z, \phi) + \lambda \cdot D_{KL}\big(\mathcal{N}(\mu_\nu(\phi), \Sigma_\nu(\phi)) \,\|\, \mathcal{N}(0, I)\big),$$
where $r_i$ represents an element of the robustness slackness $\mathbf{r}$, and $\lambda$ is a scaling factor used to balance the two terms. The Kullback–Leibler divergence $D_{KL}$ measures the divergence between two probability distributions. We set $\lambda = 1$ and optimize the parameters $\nu$ and $\theta$ by minimizing this loss function.
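The following PyTorch sketch shows one way to instantiate this model and loss. The layer widths, latent dimension, and the squared-error reconstruction term (a Gaussian likelihood up to constants) are our assumptions, as the paper specifies multi-layer perceptrons but not exact architectures:

```python
import torch
import torch.nn as nn

class RobustnessCVAE(nn.Module):
    """CVAE sketch for predicting robustness slackness r from features phi."""
    def __init__(self, n_f, n_rules, n_z=8, hidden=64):
        super().__init__()
        # Recognition model q_nu(Z | phi): Gaussian with learned mean/log-variance
        self.encoder = nn.Sequential(nn.Linear(n_f, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_z)
        self.logvar = nn.Linear(hidden, n_z)
        # Generation model p_theta(r | Z, phi)
        self.decoder = nn.Sequential(
            nn.Linear(n_z + n_f, hidden), nn.ReLU(), nn.Linear(hidden, n_rules))

    def forward(self, phi):
        h = self.encoder(phi)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        r_hat = self.decoder(torch.cat([z, phi], dim=-1))
        return r_hat, mu, logvar

def cvae_loss(r_hat, r, mu, logvar, lam=1.0):
    # Reconstruction term plus lambda-weighted KL(q_nu(Z|phi) || N(0, I)),
    # mirroring the adapted ELBO loss defined above.
    recon = ((r_hat - r) ** 2).sum(dim=-1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()
    return recon + lam * kl
```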

5.3. Model Predictive Control Synthesis

Previous work, such as that by Raman et al. [21], has shown that MPC optimization with STL constraints can be formulated as a mixed-integer linear program (MILP). This method introduces two encoding strategies: one that focuses on satisfying STL formulas and another, termed ‘robustness-based encoding’, that considers the robustness degree of the STL formulas. In our problem formulation, we manage each STL formula according to its defined robustness slackness using the robustness-based encoding method.
Let $C_{\varphi_j, r_j}$ denote the encoded constraints for the STL formula $\varphi_j$ with robustness slackness $r_j$. The satisfaction of the conjunction $z_\varphi = \bigwedge_{j=1}^{N} z_{\varphi_j}$ is encoded with the following linear constraints:
$$z_\varphi \le z_{\varphi_j}, \quad j = 1, \ldots, N,$$
$$z_\varphi \ge 1 - N + \sum_{j=1}^{N} z_{\varphi_j},$$
where $z_\varphi, z_{\varphi_j} \in [0, 1]$ are Boolean variables, with $z_\varphi$ representing the satisfaction of all STL constraints and $z_{\varphi_j}$ representing the satisfaction of an individual STL formula $\varphi_j$. Note that $z_{\varphi_j} = 1$ only if $\rho^{\varphi_j} - r_j > 0$; otherwise, $z_{\varphi_j} = 0$.
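The sketch below illustrates how these constraints can be expressed with Gurobi's Python interface; the big-M encoding of a linear predicate and the constant `big_m` are standard modeling devices and our assumptions, not code taken from the paper:

```python
import gurobipy as gp
from gurobipy import GRB

def encode_conjunction(model, z_list, name="z_phi"):
    """Encode z_phi = AND_j z_{phi_j} with the linear constraints above:
    z_phi <= z_{phi_j} for all j, and z_phi >= 1 - N + sum_j z_{phi_j}."""
    z_phi = model.addVar(vtype=GRB.BINARY, name=name)
    for z_j in z_list:
        model.addConstr(z_phi <= z_j)
    model.addConstr(z_phi >= 1 - len(z_list) + gp.quicksum(z_list))
    return z_phi

def encode_predicate(model, rho_expr, r_j, big_m=1e4, name="z_pred"):
    """Big-M sketch tying a binary z to rho^{phi_j} - r_j > 0. `rho_expr` is a
    Gurobi linear expression for the robustness of a linear predicate;
    `big_m` is an assumed bound on |rho_expr - r_j|."""
    z = model.addVar(vtype=GRB.BINARY, name=name)
    model.addConstr(rho_expr - r_j <= big_m * z)
    model.addConstr(rho_expr - r_j >= -big_m * (1 - z) + 1e-6)  # small epsilon for strictness
    return z
```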
The proposed algorithm is outlined in Algorithm 1. We extended our previous work [8] by incorporating a deep learning network. Inputs to the algorithm include a set of STL formulas $\varphi_1, \ldots, \varphi_N$, the time interval of interest $\tau = [t_0, t_1]$, the discretization time step $dt$, a control horizon $H$, an initial signal state $\xi_{init}$, and the demonstrated signals $\Xi$.
Initially, feature vectors and robustness slackness values (the lowest robustness degree over the horizon $H$) are pre-computed from the demonstrations (line 1). The closed-loop algorithm, which determines the optimal strategy at each time step, runs over the time interval $\tau = [t_0, t_1]$. The nonlinear dynamics are linearized about the current signal state (line 4). The robustness slackness of the STL formula $\varphi_j$ for the input feature $\phi(\xi_{cur})$ is predicted using the trained CVAE network (line 6) and denoted as $r_j$. Based on the updated robustness slackness $r_j$, each STL formula $\varphi_j$ is converted into mixed-integer programming constraints $C_{\varphi_j, r_j}$ using the robustness-based encoding method (line 7), where $C_{\varphi_j, r_j}$ consists of binary variables and linear predicates. Considering all STL constraints, the dynamic constraints, and the past trajectory, the optimal control sequence is computed over the time horizon $H$ using a user-defined cost function (line 11). This procedure is repeated over the entire time interval $\tau$.
Algorithm 1 Variational Autoencoder-based Controller Synthesis under STL Constraints
1:  Φ, D_j ← Initialize(Ξ)
2:  ξ_cur ← ξ_init,  ξ_past ← ∅
3:  for t = t_0 : dt : t_1 do
4:      f_lin ← Linearize(f, ξ_cur)
5:      for j = 1 : 1 : N do
6:          r_j ← CVAE(ϕ(ξ_cur))
7:          C_{φ_j, r_j} ← EncodeSTLConstraints(φ_j, r_j)
8:      end for
9:      C_STL ← C_{φ_1, r_1} ∧ ⋯ ∧ C_{φ_N, r_N}
10:     C ← C_STL ∧ f_lin ∧ [ξ(t_0, …, t − dt) = ξ_past]
11:     u_{H,t} ← Optimize(J(ξ_H), C)
12:     x_next ← f(x_cur, u_{H,t}(t))
13:     ξ_past ← [ξ_past; ξ_cur]
14:     ξ_cur ← (x_next, u_{H,t}(t))
15: end for

6. Experimental Results

The proposed algorithm was implemented in a Python (version 3.10) environment, utilizing PyTorch (version 2.2.1) [35] for the deep learning components and Gurobi [36] as the optimization engine for MPC. Simulation experiments were conducted on a system equipped with an AMD R7-7700 processor and an RTX 4080 Super GPU. The Gurobi tool enabled solving the proposed MPC problem in approximately 0.11 s.
We conducted realistic simulations using the Next Generation Simulation (NGSIM) dataset [37] and the highD dataset [38], assuming that the drivers in these datasets possessed a certain level of expertise, making them suitable for “expert driver” demonstrations in our proposed approach. In the proposed method, obstacles were set as nearby vehicles. For generating training data, we utilized a combination of 70% data from the highD dataset and 30% from the NGSIM dataset. Data points from the NGSIM dataset that involved vehicles deviating from the track or causing collisions were excluded or modified. Additionally, data with normal speeds but no lane changes were partially removed to ensure a diverse set of training scenarios.
The CVAE network was trained with the following hyperparameters: a batch size of 64, a learning rate of 0.001, and 100 epochs. The future time horizon H was set to 16.

6.1. System Description

We modeled the dynamics of the vehicles on the track using a unicycle model. The state of the system at time $t$ is described by $\mathbf{x}_t = [x_t, y_t, \theta_t, v_t]^T$, where $x_t$ and $y_t$ represent the vehicle's position, $\theta_t$ denotes the heading angle, and $v_t$ indicates the linear velocity. The control inputs are $\mathbf{u}_t = [w_t, a_t]^T$, with $w_t$ as the angular velocity and $a_t$ as the acceleration. The vehicle dynamics are expressed as follows:
$$\dot{x}_t = v_t \cos(\theta_t), \quad \dot{y}_t = v_t \sin(\theta_t), \quad \dot{\theta}_t = \kappa_1 w_t, \quad \dot{v}_t = \kappa_2 a_t,$$
where $\kappa_1$ and $\kappa_2$ are constants. To facilitate the optimization process, we linearize the dynamics around a reference point $\hat{\mathbf{x}} = [\hat{x}, \hat{y}, \hat{\theta}, \hat{v}]^T$. The resulting linear system is derived as a first-order Taylor approximation of the nonlinear dynamics, given by
$$\mathbf{x}_{n+1} = A_n \mathbf{x}_n + B_n \mathbf{u}_n + C_n,$$
where the matrices $A_n$, $B_n$, and $C_n$ are defined as
$$A_n = \begin{bmatrix} 1 & 0 & -\hat{v}\sin(\hat{\theta})\,dt & \cos(\hat{\theta})\,dt \\ 0 & 1 & \hat{v}\cos(\hat{\theta})\,dt & \sin(\hat{\theta})\,dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad B_n = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \kappa_1\,dt & 0 \\ 0 & \kappa_2\,dt \end{bmatrix}, \quad C_n = \begin{bmatrix} \hat{v}\sin(\hat{\theta})\,\hat{\theta}\,dt \\ -\hat{v}\cos(\hat{\theta})\,\hat{\theta}\,dt \\ 0 \\ 0 \end{bmatrix}.$$
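For reference, a direct NumPy construction of these matrices might look as follows (a sketch; the zero reference input is our assumption):

```python
import numpy as np

def linearize_unicycle(x_ref, dt, kappa1, kappa2):
    """Build A_n, B_n, C_n for the linearized unicycle model around the
    reference state x_ref = [x_hat, y_hat, theta_hat, v_hat], so that
    x_{n+1} ~= A_n x_n + B_n u_n + C_n."""
    _, _, th, v = x_ref
    A = np.array([[1.0, 0.0, -v * np.sin(th) * dt, np.cos(th) * dt],
                  [0.0, 1.0,  v * np.cos(th) * dt, np.sin(th) * dt],
                  [0.0, 0.0,  1.0,                 0.0],
                  [0.0, 0.0,  0.0,                 1.0]])
    B = np.array([[0.0,          0.0],
                  [0.0,          0.0],
                  [kappa1 * dt,  0.0],
                  [0.0,          kappa2 * dt]])
    C = np.array([ v * np.sin(th) * th * dt,
                  -v * np.cos(th) * th * dt,
                   0.0,
                   0.0])
    return A, B, C
```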

6.2. Rule Description

We formulated five distinct rules as STL formulas. The definitions of the rules $\boldsymbol{\varphi} = [\varphi_1, \ldots, \varphi_5]$ are as follows:
  • Lane keeping (right): $\varphi_1 = (y_t \ge y_{l,\min})$;
  • Lane keeping (left): $\varphi_2 = (y_t \le y_{l,\max})$;
  • Collision avoidance (front vehicle):
    $\varphi_3 = (x_t \le x_{c,\min}) \vee (x_t \ge x_{c,\max}) \vee (y_t \le y_{c,\min}) \vee (y_t \ge y_{c,\max})$;
  • Speed limit: $\varphi_4 = (v_t \le v_{th})$;
  • Slow down before the front vehicle:
    $\varphi_5 = (v_t \le v_u)\; U_{[t_a, t_b]}\; (x_t \ge x_{c,\min})$.
In these formulations, $t_a$ and $t_b$ are set to 6 and 12, respectively.
Figure 5 illustrates the driving environment used to describe these STL rules. Note that in this figure, the ego vehicle is depicted in blue, the preceding vehicle in orange, and other vehicles in gray. The positions $x_t$ and $y_t$ and the velocity $v_t$ correspond to the ego vehicle. The boundaries of the preceding vehicle in the x–y coordinates are denoted by $x_{c,\min}$, $x_{c,\max}$, $y_{c,\min}$, and $y_{c,\max}$. Similarly, $x_{o,\min}$, $x_{o,\max}$, $y_{o,\min}$, and $y_{o,\max}$ represent the boundaries of vehicles other than the preceding one. The lane boundaries are denoted by $y_{l,\min}$ and $y_{l,\max}$, while the track boundaries are represented by $y_{t,\min}$ and $y_{t,\max}$.
Here, $v_{th}$ represents the speed limit threshold for rule $\varphi_4$. The final rule, $\varphi_5$, mandates that the ego vehicle decelerate when approaching a preceding vehicle in the same lane. The parameters $v_u$, $t_a$, and $t_b$ are specific to rule $\varphi_5$.
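The corresponding robustness functions follow directly from these definitions. The sketch below uses the inequality directions as stated above; the coordinate conventions (e.g., larger $y$ toward the left lane) are our assumptions for illustration:

```python
def rho_phi1(y_t, y_l_min):                  # lane keeping (right): y_t >= y_{l,min}
    return y_t - y_l_min

def rho_phi2(y_t, y_l_max):                  # lane keeping (left): y_t <= y_{l,max}
    return y_l_max - y_t

def rho_phi3(x_t, y_t, x_c_min, x_c_max, y_c_min, y_c_max):
    # collision avoidance: stay outside the preceding vehicle's bounding box
    return max(x_c_min - x_t, x_t - x_c_max, y_c_min - y_t, y_t - y_c_max)

def rho_phi4(v_t, v_th):                     # speed limit: v_t <= v_th
    return v_th - v_t

def rho_phi5(signal, n, v_u, x_c_min, t_a=6, t_b=12):
    """(v_t <= v_u) U_[t_a, t_b] (x_t >= x_{c,min}); signal[m] = (x, y, theta, v)."""
    return max(min(signal[k][0] - x_c_min,
                   min(v_u - signal[m][3] for m in range(n, k + 1)))
               for k in range(n + t_a, n + t_b + 1))
```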

6.3. Simulation Results

Figure 6 presents the predicted robustness slackness r generated by the proposed CVAE network, alongside the control sequence produced by the MPC based on these predicted values. In the left subfigures indicating robustness slackness, negative degrees of satisfaction are marked with a red box.
In Figure 6a, the predicted robustness slackness suggests that rules φ 2 and φ 5 may be violated. It can be observed that the control sequence generated by the MPC results in the vehicle moving to the left lane (violating φ 2 ) and accelerating in the presence of a preceding vehicle (violating φ 5 ).
Figure 7 demonstrates the application of the proposed method in the NGSIM road environment. The figure illustrates four different scenes, showing the predicted robustness slackness and the corresponding vehicle movements for each situation. For the lane-keeping rules φ 1 and φ 2 , if the robustness slackness value is less than or equal to a specified threshold (indicated by ‘threshold’ in the figure), it is evident that the ego vehicle attempts to change lanes. Conversely, if the robustness slackness value for φ 1 and φ 2 is greater than the threshold value, the proposed method may not initiate a lane change, depending on the specific situation (as illustrated in scene 4). Overall, the proposed method demonstrates the ability to drive efficiently—allowing the violation of some rules in certain situations—while maintaining safety in complex traffic conditions.
Collision experiments using the proposed approach were conducted across five test scenarios: two from the NGSIM dataset and three from the highD dataset. We compared the proposed method against five baselines: LBMPC_STL [8], LSTM, TFN [39], DQN [40], and DQN(1/2), a DQN variant trained with half the episodes (see below). The LSTM method employs a naive LSTM encoder–decoder framework for imitation learning, while TFN utilizes a Transformer network for imitation learning. In the DQN method, the Q-network is modeled as a four-layer multi-layer perceptron with 12 discrete actions and receives input features. The DQN model was trained until convergence was achieved (1,000,000 episodes).
A total of 100 experiments were conducted on various tracks and starting positions. Figure 8 and Figure 9 illustrate single trials from the collision experiments, demonstrating the performance of each algorithm under identical track conditions, time, and starting positions. The distance traveled by each method is indicated below each subfigure. Methods incorporating MPC (Proposed and LBMPC_STL) successfully arrived at the target area (the end of the track), whereas other methods failed due to collisions. Compared to the proposed method, LBMPC_STL exhibited deficiencies, such as the ego vehicle ending up on the lane border and being too close to the preceding vehicle.
Table 1 presents the number of successful trials for each method. The two methods with the highest number of successes for each scenario are highlighted in bold. The results clearly demonstrate that the proposed approach outperforms other methods in most test scenarios. DQN(1/2) refers to cases where half of the episodes (500,000 episodes) are used in the reinforcement learning stage. The average time steps for successful cases are shown in parentheses.
In the results presented in Table 1, reinforcement learning techniques (specifically DQN) exhibit a longer average time step compared to other methods due to the emphasis on stability in the design of the reward function. Additionally, there was no significant difference in average time steps (for successful cases) between the MPC techniques, including the proposed method, and the supervised learning techniques. Notably, the proposed method demonstrated a slightly shorter average time step compared to the other MPC technique, LBMPC_STL.
While the “average time step” cannot be an absolute criterion for evaluating the superiority of an algorithm’s performance, when combined with the “collision rate”, it indicates that the proposed method enables more stable and efficient autonomous driving compared to other methods.
Two key observations can be made from these results:
  • Model Predictive Control (MPC) demonstrates superior safety performance compared to reinforcement learning (DQN) and imitation learning approaches (LSTM, TFN).
  • The deep learning approach employed in the proposed method yields a better performance than the Gaussian process regression approach used in LBMPC_STL.

7. Conclusions

In this paper, we present a Model Predictive Control (MPC) method designed to manage dynamic systems while adhering to a set of Signal Temporal Logic (STL) rules. Unlike traditional approaches that enforce strict compliance with all rules, our method efficiently balances rule adherence by selectively disobeying certain rules to resolve dilemma situations where not all rules can be simultaneously satisfied.
The proposed method introduces the concept of robustness slackness, which represents the lower bound of the robustness degree, learned from expert demonstrations or data. By employing a Conditional Variational Autoencoder (CVAE) network, the controller adapts its behavior to prioritize different rules based on the context, emulating the decision-making processes of human experts.
Our contribution lies in the innovative approach of learning the satisfaction measure of rules using a deep-learning network, enabling robots to internalize and replicate the value systems of humans. This approach allows for more flexible and context-aware control, which is crucial for operating in complex and dynamic environments.

Author Contributions

Conceptualization, E.I., M.C. and K.C.; methodology, E.I. and M.C.; validation, K.C.; data curation, E.I. and K.C.; writing—original draft preparation, E.I. and M.C.; writing—review and editing, K.C.; visualization, E.I. and M.C.; supervision, K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (IITP-2024-RS-2023-00259678).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Camacho, E.F.; Alba, C.B. Model Predictive Control; Springer Science & Business Media: Berlin, Germany, 2013. [Google Scholar]
  2. Dantec, E.; Naveau, M.; Fernbach, P.; Villa, N.; Saurel, G.; Stasse, O.; Taix, M.; Mansard, N. Whole-body model predictive control for biped locomotion on a torque-controlled humanoid robot. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots, Ginowan, Japan, 28–30 November 2022; pp. 638–644. [Google Scholar]
  3. Kong, N.J.; Li, C.; Council, G.; Johnson, A.M. Hybrid iLQR model predictive control for contact implicit stabilization on legged robots. IEEE Trans. Robot. 2023, 39, 4712–4727. [Google Scholar] [CrossRef]
  4. Le Cleac’h, S.; Howell, T.A.; Yang, S.; Lee, C.Y.; Zhang, J.; Bishop, A.; Schwager, M.; Manchester, Z. Fast contact-implicit model predictive control. IEEE Trans. Robot. 2024, 40, 1617–1629. [Google Scholar] [CrossRef]
  5. Jang, E.; Irpan, A.; Khansari, M.; Kappler, D.; Ebert, F.; Lynch, C.; Levine, S.; Finn, C. Bc-z: Zero-shot task generalization with robotic imitation learning. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 991–1002. [Google Scholar]
  6. Zare, M.; Kebria, P.M.; Khosravi, A.; Nahavandi, S. A survey of imitation learning: Algorithms, recent developments, and challenges. arXiv 2023, arXiv:2309.02473. [Google Scholar]
  7. Abbeel, P.; Ng, A.Y. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004. [Google Scholar]
  8. Cho, K.; Oh, S. Learning-based model predictive control under signal temporal logic specifications. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia, 21–25 May 2018; pp. 7322–7329. [Google Scholar]
  9. Maler, O.; Nickovic, D. Monitoring temporal properties of continuous signals. In Proceedings of the FORMATS/FTRTFT, Grenoble, France, 22–24 September 2004; Volume 3253, pp. 152–166. [Google Scholar]
  10. Donzé, A.; Maler, O. Robust satisfaction of temporal logic over real-valued signals. In Proceedings of the FORMATS, Klosterneuburg, Austria, 8–10 September 2010; Volume 6246, pp. 92–106. [Google Scholar]
  11. Fainekos, G.E.; Kress-Gazit, H.; Pappas, G.J. Temporal logic motion planning for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005. [Google Scholar]
  12. Karaman, S.; Frazzoli, E. Complex mission optimization for multiple-UAVs using linear temporal logic. In Proceedings of the IEEE American Control Conference, Seattle, WA, USA, 11–13 June 2008. [Google Scholar]
  13. Wongpiromsarn, T.; Topcu, U.; Murray, R.M. Receding horizon temporal logic planning for dynamical systems. In Proceedings of the IEEE Conference on Decision and Control, Shanghai, China, 15–18 December 2009. [Google Scholar]
  14. Pant, Y.V.; Abbas, H.; Mangharam, R. Distributed Trajectory Planning for Multi-rotor UAVs with Signal Temporal Logic Objectives. In Proceedings of the IEEE Conference on Control Technology and Applications, Trieste, Italy, 23–25 August 2022; pp. 476–483. [Google Scholar]
  15. Meng, Y.; Fan, C. Signal temporal logic neural predictive control. IEEE Robot. Autom. Lett. 2023, 8, 7719–7726. [Google Scholar] [CrossRef]
  16. Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  17. Karaman, S.; Sanfelice, R.G.; Frazzoli, E. Optimal control of mixed logical dynamical systems with linear temporal logic specifications. In Proceedings of the IEEE Conference on Decision and Control, Cancun, Mexico, 9–11 December 2008. [Google Scholar]
  18. Kwon, Y.; Agha, G. LTLC: Linear temporal logic for control. In Proceedings of the Hybrid Systems: Computation and Control, St. Louis, MO, USA, 22–24 April 2008; pp. 316–329. [Google Scholar]
  19. Wolff, E.M.; Topcu, U.; Murray, R.M. Optimization-based control of nonlinear systems with linear temporal logic specifications. In Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China, 31 May–7 June 2014; pp. 5319–5325. [Google Scholar]
  20. Cho, K. Learning-based path planning under co-safe temporal logic specifications. IEEE Access 2023, 11, 25865–25878. [Google Scholar] [CrossRef]
  21. Raman, V.; Donzé, A.; Maasoumy, M.; Murray, R.M.; Sangiovanni-Vincentelli, A.; Seshia, S.A. Model predictive control with signal temporal logic specifications. In Proceedings of the IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 81–87. [Google Scholar]
  22. Sadigh, D.; Kapoor, A. Safe control under uncertainty. arXiv 2015, arXiv:1510.07313. [Google Scholar]
  23. Mao, Y.; Acikmese, B.; Garoche, P.L.; Chapoutot, A. Successive convexification for optimal control with signal temporal logic specifications. In Proceedings of the ACM International Conference on Hybrid Systems: Computation and Control, Milan, Italy, 4–6 May 2022; pp. 1–7. [Google Scholar]
  24. Lenz, I.; Knepper, R.A.; Saxena, A. DeepMPC: Learning Deep Latent Features for Model Predictive Control. In Proceedings of the Robotics: Science and Systems, Rome, Italy, 13–17 July 2015. [Google Scholar]
  25. Carron, A.; Arcari, E.; Wermelinger, M.; Hewing, L.; Hutter, M.; Zeilinger, M.N. Data-driven model predictive control for trajectory tracking with a robotic arm. IEEE Robot. Autom. Lett. 2019, 4, 3758–3765. [Google Scholar] [CrossRef]
  26. Lin, Y.; McPhee, J.; Azad, N.L. Comparison of deep reinforcement learning and model predictive control for adaptive cruise control. IEEE Trans. Intell. Veh. 2020, 6, 221–231. [Google Scholar] [CrossRef]
  27. Kong, Z.; Jones, A.; Medina Ayala, A.; Aydin Gol, E.; Belta, C. Temporal logic inference for classification and prediction from data. In Proceedings of the International Conference on Hybrid Systems: Computation and Control, Berlin, Germany, 15–17 April 2014; pp. 273–282. [Google Scholar]
  28. Castro, L.I.R.; Chaudhari, P.; Tumova, J.; Karaman, S.; Frazzoli, E.; Rus, D. Incremental sampling-based algorithm for minimum-violation motion planning. In Proceedings of the IEEE Conference on Decision and Control, Firenze, Italy, 10–13 December 2013; pp. 3217–3224. [Google Scholar]
  29. Lee, S.; Seo, S. A learning-based framework for handling dilemmas in urban automated driving. In Proceedings of the IEEE Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 1436–1442. [Google Scholar]
  30. Xu, H.; Gao, Y.; Yu, F.; Darrell, T. End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2174–2182. [Google Scholar]
  31. Codevilla, F.; Miiller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-end driving via conditional imitation learning. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia, 21–25 May 2018; pp. 1–9. [Google Scholar]
  32. Schmerling, E.; Leung, K.; Vollprecht, W.; Pavone, M. Multimodal probabilistic model-based planning for human-robot interaction. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia, 21–25 May 2018; pp. 1–9. [Google Scholar]
  33. Drews, P.; Williams, G.; Goldfain, B.; Theodorou, E.A.; Rehg, J.M. Aggressive deep driving: Model predictive control with a cnn cost model. arXiv 2017, arXiv:1707.05303. [Google Scholar]
  34. Jaritz, M.; de Charette, R.; Toromanoff, M.; Perot, E.; Nashashibi, F. End-to-end race driving with deep reinforcement learning. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia, 21–25 May 2018; pp. 2070–2075. [Google Scholar]
  35. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the NIPS-W, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  36. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2024. Available online: http://www.gurobi.com (accessed on 3 March 2024).
  37. Alexiadis, V.; Colyar, J.; Halkias, J.; Hranac, R.; McHale, G. The next generation simulation program. Inst. Transp. Eng. 2004, 74, 22. [Google Scholar]
  38. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018. [Google Scholar]
  39. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  40. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
Figure 1. Overview of the proposed learning-based MPC framework. Expert demonstrations are utilized to learn the lower bounds of robustness, referred to as robustness slackness, through a deep learning approach. The learned values inform the MPC method, which then computes control sequences considering the STL rules.
Figure 2. Description of the ego vehicle and nearby vehicles in a track driving scenario. The ego vehicle ( V e g o ) is shown in blue. The diagram includes up to six nearby vehicles positioned in front and behind, across the left, center, and right lanes relative to the ego vehicle.
Figure 3. Demonstration in a track driving environment showing (a) the robustness degree and (b) its predicted lower bound. Trajectories with values less than zero are shaded in red.
Figure 4. The CVAE network used to predict the robustness slackness.
Figure 5. Driving environment illustrating the defined STL rules φ .
Figure 6. Snapshots of the proposed method applied to the NGSIM dataset.
Figure 7. Illustration of the proposed method’s performance in NGSIM road environments. The figure depicts four different scenes, showing the predicted robustness slackness and the corresponding vehicle movements for each situation.
Figure 8. Snapshots of the collision experiments in the highD environment (Proposed, LBMPC_STL, LSTM).
Figure 9. Snapshots of the collision experiments in the highD environment (TFN, DQN).
Table 1. Number of successful trials in collision experiments (average time steps for successful cases in parentheses; the two best methods per test set are in bold).

| Test Set | Proposed | LBMPC_STL | LSTM | TFN | DQN | DQN(1/2) |
|---|---|---|---|---|---|---|
| testset 1 (NGSIM-US101) | **88 (180.1)** | **84 (189.6)** | 75 (186.5) | 80 (183.4) | 82 (199.5) | 77 (205.5) |
| testset 2 (NGSIM-I80) | **86 (175.4)** | **81 (180.8)** | 71 (176.6) | 76 (174.2) | 79 (187.1) | 74 (193.2) |
| testset 3 (highD) | **89 (124.3)** | **86 (129.1)** | 75 (125.5) | 80 (119.9) | 83 (147.3) | 74 (156.8) |
| testset 4 (highD) | **91 (131.6)** | **90 (131.6)** | 76 (130.8) | 86 (135.2) | 88 (148.8) | 74 (159.2) |
| testset 5 (highD) | **95 (136.1)** | **93 (139.8)** | 82 (138.7) | 90 (134.3) | 91 (159.3) | 81 (165.7) |
