1. Introduction
The dynamics of discrete event systems (DESs) are driven by sequences of asynchronous events. The main control theory developed for DESs is supervisory control theory, in which a supervisor disables events that lead to undesirable event sequences. Since the supervisor cannot control and observe all of the events, the desired behavior (control objective) may be unachievable. The necessary and sufficient conditions for the existence of a supervisor are characterized by controllability [1] and observability [2]. Since then, supervisory control has been extended in several directions, such as decentralized supervisory control [3], robust supervisory control [4], asynchronous supervisory control [5], and quantitative supervisory control [6].
Nowadays, in many industrial applications, the supervisor is connected to the plant via communication networks. Such a network structure provides efficient ways of controlling DESs. However, the communication delays in the observation channel and the control channel pose significant challenges to the supervisory control of DESs [7,8,9,10,11,12,13,14]. Thus, networked DESs have drawn much attention in the past few years [15,16,17,18,19,20]. Most of the current works on networked supervisory control focus on verifying whether a given control objective can be achieved under observation delays and control delays [20,21,22,23,24,25,26,27,28,29,30], which is known as the supervisor existence problem. When the desired language cannot be exactly achieved, one would compute a safe control policy online or offline, which is known as the supervisor synthesis problem [31,32,33,34,35,36]. In this paper, we focus on solving the maximally permissive supervisor synthesis problem under observation delays and control delays. In particular, based on the infinite observed sequence of events, an online algorithm is presented to calculate a maximal supervisor under observation delays and control delays. The calculated online supervisor is optimal because (i) the system is prevented from leaving the desired language even if communication delays exist in both the observation channel and the control channel, and (ii) given that (i) is satisfied, the language generated by the closed-loop system is maximized.
In supervisor synthesis, state estimation is a crucial step in determining a valid control action after each new observation. The state estimation problem can be briefly stated as follows: estimate all of the states that the closed-loop system may be in under communication delays, given that all future control decisions are unknown. To synthesize an optimal supervisor under communication delays, the authors of [22,24,37,38] compute the state estimate based on the open-loop system without using the information of the controls imposed on the system. As stated in [39], the state estimate calculated in [22,24,37,38] contains some states that the imposed controls have already made unreachable. Therefore, the solutions computed in [22,24,37,38] are suboptimal for the unrestricted domain of observed event sequences. In [29], the state estimates are computed under the assumption that the control delays and the observation delays are constant, so that approach cannot deal with nondeterministic observation delays and control delays. Recently, the authors of [26] calculated the state estimates of networked DESs by taking the history of control decisions into consideration. Nevertheless, the work of [26] can be further improved in two directions. First, the framework of networked supervisory control adopted in [26] is conservative in the sense that the specified language of the closed-loop system is an over-approximation of the actual language of the closed-loop system. That is, it may include some sequences that never occur in practice (see Example 1 for more details), and the state estimate computed by [26] may contain some states that the closed-loop system never reaches. Thus, the synthesized supervisor could be restrictive in the sense that it may disable events unnecessarily. Second, the work of [26] considers only control delays. When only control delays exist, the observation of a string by the supervisor is deterministic, and the control command made after a string can be uniquely determined. In practice, however, delays often exist in both the observation channel and the control channel. In this case, the observation of a string by the supervisor is nondeterministic and varies with the observation delays. The supervisor may make different control decisions based on different observations, which complicates the supervisor synthesis problem.
In this paper, a new modeling framework for the supervisory control of DESs under control delays and observation delays is first presented. Specifically, in the proposed framework, we model the observation channel by a sequence of pairs, each consisting of an occurred event and its observation delay (called the observation channel configuration). We also model the control channel by a sequence of pairs, each consisting of an issued control command and its control delay (called the control channel configuration). We then build an automaton to model the interaction between the plant and the supervisor over the observation channel and the control channel. In the automaton, two special types of events are introduced, representing the reception of observable events and the execution of control commands, respectively. Each state of the automaton dynamically tracks the plant state, the current control command, the observation channel configuration, the control channel configuration, and the supervisor state. Based on the constructed automaton, the exact language of the closed-loop system can be specified. Under this framework, we then discuss how to estimate (and predict) all the states of the current (and future) closed-loop system. Without any structural assumption on the solution space, an online algorithm is finally presented to calculate a maximal networked control policy based on the infinite observed sequence of events. We further compare the proposed supervisor with the supervisor proposed in [26]. The previous framework may contain some physically impossible strings. This may harm supervisor synthesis because a good supervisor may be mistakenly rejected as a bad one: it is possible that all of the illegal strings that the closed-loop system may generate under the previous framework are physically impossible, in which case the controlled system can never reach an illegal state. Since the proposed framework excludes all physically impossible strings, the state estimation is more precise than in the previous approach. Thus, the proposed supervisor is more permissive than the previous one.
To show the application of the proposed modeling and control approach, we consider a vehicle management problem at a signalized intersection. When a self-driving vehicle arrives at the intersection, it needs to communicate with the intersection to determine the traffic light color. If the traffic light is yellow or red, the vehicle must stop and wait until the light switches to green. Otherwise, if the traffic light is green, the vehicle can pass through the intersection. We show that the proposed approach can be used to achieve the control objective when control delays and observation delays exist.
Finally, we briefly discuss how to extend the proposed approaches to deal with non-FIFO observations and controls. Specifically, we consider a system where the actuators and the sensors are distributed at different sites. For each actuator, the supervisor sends control commands to it over an individual control channel, and for each sensor, the detected information is sent to the supervisor over an individual observation channel. Different channels may have different upper bounds of delays. Techniques are developed to model the dynamics of the closed-loop system.
The proposed supervisor synthesis approach differs from the existing works in the following respects.
In contrast to [26], we consider both control delays and observation delays in this paper. That is, the observation of a string by the supervisor is nondeterministic and varies with the observation delays. For different observations, the supervisor may make different control decisions. An event after a string may be allowed to occur under some of these control decisions but not under the others. Thus, we must consider all possibilities. In addition, the closed-loop behaviors specified in the proposed framework exclude those strings that never occur in reality and are therefore more accurate. As a result, the supervisor can estimate the states of the closed-loop system more accurately and make more reasonable control decisions at any instant.
Compared with [22,24,37,38], the supervisor makes control decisions based on the closed-loop system. In other words, the synthesized supervisor takes the controls imposed on the system into account when making control decisions. Thus, the control command made by the proposed supervisor is optimal with respect to the unrestricted domain of observed event sequences.
Different from [29], the proposed model assumes that the observation delays and control delays are nondeterministic, which is often the case in practice. In this paper, the observation delays and control delays are measured by the number of events occurring in the plant. More specifically, the observation delays and control delays are upper-bounded by
and
events, respectively. That is, all of the events delayed at the observation channel can be communicated to the supervisor (in the same order that they are generated) before no more than
event occurrences. All control commands delayed at the control channel can be executed by the actuator (in the same order that they are issued) before no more than
events (since they are issued).
The rest of this paper is organized as follows. Section 2 presents some preliminary concepts and the assumptions required in this paper. Section 3 introduces a new modeling framework for supervisory control with observation delays and control delays. An online procedure for estimating the states of the closed-loop system is presented in Section 4. Section 5 synthesizes a maximal and safe networked supervisor on the fly. Section 6 discusses an application to vehicle control at a signalized intersection. Section 7 extends the proposed approaches to deal with non-FIFO observations and controls. Section 8 concludes this paper.
2. Preliminaries
We model a DES using a deterministic finite-state automaton $G = (Q, \Sigma, \delta, q_0)$, where $Q$ is the finite set of states; $\Sigma$ is the finite set of events; $\delta: Q \times \Sigma \to Q$ is the transition function; $q_0 \in Q$ is the initial state. $\Sigma^*$ is the Kleene closure of $\Sigma$, i.e., the set of all sequences over events in $\Sigma$. $\delta$ is extended to $Q \times \Sigma^*$ in the usual way [40]. The language generated by $G$ is denoted by $\mathcal{L}(G)$. $\varepsilon$ is the empty sequence. “!” means “is defined”.
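As a minimal illustration of these definitions, the following Python sketch encodes a deterministic automaton with a partial transition function, its extension to sequences, and membership in the generated language. The class name `DFA`, the method names, and the toy plant are assumptions made for illustration only; they are not taken from the paper or its figures.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple

State = Hashable
Event = str


class DFA:
    """Deterministic automaton with a partial transition function."""

    def __init__(self, delta: Dict[Tuple[State, Event], State], q0: State):
        self.delta = delta  # (state, event) -> next state; a missing key means "not defined"
        self.q0 = q0

    def step(self, q: State, sigma: Event) -> Optional[State]:
        """One-step transition; None encodes 'not defined'."""
        return self.delta.get((q, sigma))

    def run(self, s: Sequence[Event], q: Optional[State] = None) -> Optional[State]:
        """Extended transition function: apply the events of s in order."""
        q = self.q0 if q is None else q
        for sigma in s:
            q = self.step(q, sigma)
            if q is None:
                return None
        return q

    def generates(self, s: Sequence[Event]) -> bool:
        """True iff s belongs to the generated language, i.e. delta(q0, s) is defined."""
        return self.run(s) is not None


# Hypothetical toy plant (not one of the paper's figures): 0 -a-> 1 -b-> 2 -a-> 0
toy = DFA({(0, "a"): 1, (1, "b"): 2, (2, "a"): 0}, q0=0)
```

Because the empty sequence leaves the state unchanged, `toy.generates([])` holds, which matches the fact that a generated language always contains the empty sequence.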
Given a , we write for , and . is the length of s. denotes the set of all prefixes of s. denotes the prefix of s, such that . Let . Let be the suffix of s after its prefix t, i.e., . If t is not a prefix of s, then is not defined. The prefix closure of a language is denoted by . L is prefix-closed if . In this paper, only prefix-closed languages are considered. is the set of natural numbers. Let be the set of natural numbers no larger than N. Given and , we say is a sub-automaton of , denoted by , if can be obtained from by deleting some states in and all transitions connect to these states.
In many applications, the original system $G$ may not satisfy the desired specification. To make the system fulfill the specification, supervisory control finds a supervisor that dynamically disables events leading to undesirable sequences. In general, not all of the events are controllable and observable. We partition $\Sigma$ into the set of controllable events $\Sigma_c$ and the set of uncontrollable events $\Sigma_{uc}$. We also partition $\Sigma$ into the set of observable events $\Sigma_o$ and the set of unobservable events $\Sigma_{uo}$. The natural projection $P: \Sigma^* \to \Sigma_o^*$ is recursively defined as: $P(\varepsilon) = \varepsilon$ and, for all $s \in \Sigma^*$, $\sigma \in \Sigma$, $P(s\sigma) = P(s)\sigma$ if $\sigma \in \Sigma_o$, and $P(s\sigma) = P(s)$ if $\sigma \in \Sigma_{uo}$.
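The natural projection simply erases unobservable events while preserving the order of the observable ones. The sketch below shows an iterative version and a version that follows the recursive definition literally; the function names and the event labels are illustrative assumptions.

```python
from typing import List, Sequence, Set


def project(s: Sequence[str], observable: Set[str]) -> List[str]:
    """Natural projection: keep observable events, erase all others."""
    return [sigma for sigma in s if sigma in observable]


def project_recursive(s: Sequence[str], observable: Set[str]) -> List[str]:
    """Same map, written to mirror the recursive definition on the last event."""
    if not s:
        return []  # the projection of the empty sequence is the empty sequence
    head = project_recursive(s[:-1], observable)
    return head + [s[-1]] if s[-1] in observable else head
```

For instance, with observable events `{"a", "b"}` and an unobservable event `"u"`, the sequence `a u b u` projects to `a b` under both versions.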
We denote, in this paper, the supervisor by a pair , where is a deterministic automaton with , and is a function that specifies the set of events to be enabled. Specifically, for any , we denote by the set of events to be enabled after observing t. With a slight abuse of notation, we write . More details on the definition of S are provided in Example 1. Let be the set of all the admissible control commands. Since we cannot disable an uncontrollable event, for all . The control objective in this paper is given by a specification language . We assume that K can be represented by a sub-automaton of G. The automaton representation of language with can always be changed to satisfy .
As shown in Figure 1, in networked DESs, communications from the plant to the supervisor (observations) and from the supervisor to the plant (controls) are carried out over an observation channel and a control channel, respectively, both subject to random delays. We assume that first-in-first-out (FIFO) holds in both observation and control, i.e., the observations of events are sent to the supervisor in the same order that they are generated, and the control commands are executed in the same order that they are issued. As in [21,22,23,24], the delays are measured by the number of event occurrences (observable or not). We assume that (1) the observation delays are upper-bounded by
event occurrences, i.e., when an observable event occurs, it can be communicated to the supervisor before no more than
additional event occurrences; (2) the control delays are upper-bounded by
event occurrences, i.e., after a control command is issued, it can be executed before no more than
event occurrences. We assume that the initial control command has been deployed in the actuator of the plant beforehand. When the plant is initialized and starts to work, the initial control command can be executed without any delays.
Given system
G and a supervisor
S defined over
, we consider all possible strings, which may be generated by the closed-loop system (also called the controlled system) when the observation delays and control delays are upper-bounded by
and
, respectively. Before that, let us first recall how the previous works specify the language of the closed-loop system under observation delays and control delays. As shown in [
23], an upper bound on possible strings, denoted by
, which may be generated by the controlled system under observation delays and control delays is defined as follows:
;
for any
and
with
,
if
is enabled by one of the control commands issued in the past
steps, i.e.,
In [
22,
24], the language
is also referred to as the large language. However, as discussed in [
23,
26],
is not the exact language that may be generated by the closed-loop system. It is essentially an over-approximation of the actual language that may be generated by the closed-loop system and may contain some sequences that never occur in reality. To make this paper self-contained, we use the following simple example to illustrate this.
Example 1. Consider the system G depicted in Figure 2a with , , and . Let , i.e., the upper bounds of control delays and observation delays are both 1. The supervisor is depicted in Figure 2b. The function χ is specified by the set of events associated with each state in Figure 2b. Specifically, . When α is observed, automaton A moves to state from state , and . When is observed, automaton A moves to state from state , and . For the other , . We first show that . At first, we have . Since , , and , by definition, . Moreover, since , , and , by definition, . Then, since , , and , by definition, . We next show that never occurs in practice.
Since , , and , one can check that α can occur after only if is taking effect when α occurs after . However, since and , must have been executed at the time β occurs after α. In other words, must have been replaced by after the occurrence of . Therefore, never occurs (under S) in reality.
To obtain the exact language of the closed-loop system, we need a new modeling framework for networked DESs, which will be discussed in the following section.
3. Modeling Framework for Networked Supervisory Control
In this section, we consider a new modeling framework for the networked supervisory control of DESs. In the new framework, we model the observation channel by a sequence of observable events waiting to be communicated, together with their observation delays. We also model the control channel by a sequence of control commands waiting to be executed, together with their control delays. We then build an automaton to describe how the supervisor and the plant interact with each other over the observation channel and the control channel. It is shown that the language of the closed-loop system subject to communication delays can be simply “decoded” from the sequences of the constructed automaton.
3.1. Modeling of the Communication Channels
Let us first consider the observation channel.
Definition 1. The observation channel configuration is defined as a sequence of pairs:where are the observable events (in the same order that they are generated) that have occurred but are currently delayed at the observation channel, and , is the number of event occurrences since occurred. If the observation channel is empty, . We denote by the set of all the possible observation channel configurations, where is the maximum length of . The observation channel configuration is evolving when a new event occurs or a new observable event is communicated. To update , we introduce the following operators.
When a new event
occurs, to update the observation channel configuration, we define the operator
as: for all
and all
,
where if
,
, and if
,
.
When a new observable event
delayed at the observation channel is communicated, to update the observation channel configuration, we define the operator
as: for all
and all
,
When an event
occurs in the plant, all the natural numbers in
should be incremented by 1, since they count the observation delays. Furthermore, if
, by FIFO, we still need to add
to the end of
for recording the new observable event occurrence. That is what the operator
does in Equation (
1). On the other hand, when a new observable event is communicated to the supervisor, by FIFO, it must be the first event queued at the observation channel. Therefore, we define
to remove the first event
from
. Next, let us consider the control channel.
Definition 2. The control channel configuration is defined as a sequence of pairs:where is a sequence of control commands (in the same order that they are issued) that have been issued but are currently delayed at the control channel, and , is the number of event occurrences since has been issued. If the control channel is empty, . We denote by the set of all possible control channel configurations, where is the maximum length of . To update , we introduce the following operators.
When a new event
occurs in the plant, to update the control channel configuration, we define the operator
as: for all
,
where if
,
, and if
,
.
When a new control command
is issued by the supervisor, to update the control channel configuration, we define the operator
as: for all
and all
,
When a new control command
delayed at the control channel is executed, to update the control channel configuration, we define the operator
as: for all
and all
,
When a new event occurs, for recording the control delays, adds 1 to all of the natural numbers in . When a new control command is issued (following a new observation), adds the newly issued control command to the end of . Moreover, when a new control command is executed, removes the first control command from .
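The channel updates described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: a configuration is encoded as a tuple of (item, delay) pairs, a newly queued item is assumed to start with delay 0, control commands are encoded as frozensets of enabled events, and all function names are hypothetical rather than the paper's operator names.

```python
from typing import FrozenSet, Set, Tuple

ObsConfig = Tuple[Tuple[str, int], ...]              # ((event, delay), ...), oldest first
CtrlConfig = Tuple[Tuple[FrozenSet[str], int], ...]  # ((command, delay), ...), oldest first


def obs_after_event(theta_o: ObsConfig, sigma: str, observable: Set[str]) -> ObsConfig:
    """A plant event occurs: age every queued observation, queue sigma if observable."""
    aged = tuple((e, n + 1) for e, n in theta_o)
    if sigma in observable:
        aged += ((sigma, 0),)  # assumption: a fresh observation has delay 0
    return aged


def obs_after_delivery(theta_o: ObsConfig, sigma: str) -> ObsConfig:
    """An observation is communicated: by FIFO it must be the head of the queue."""
    if not theta_o or theta_o[0][0] != sigma:
        raise ValueError("only the first queued event can be communicated")
    return theta_o[1:]


def ctrl_after_event(theta_c: CtrlConfig) -> CtrlConfig:
    """A plant event occurs: age every queued control command."""
    return tuple((cmd, n + 1) for cmd, n in theta_c)


def ctrl_after_issue(theta_c: CtrlConfig, cmd: FrozenSet[str]) -> CtrlConfig:
    """The supervisor issues a command: append it to the end of the queue."""
    return theta_c + ((cmd, 0),)


def ctrl_after_execution(theta_c: CtrlConfig, cmd: FrozenSet[str]) -> CtrlConfig:
    """A command is executed: by FIFO it must be the head of the queue."""
    if not theta_c or theta_c[0][0] != cmd:
        raise ValueError("only the first queued command can be executed")
    return theta_c[1:]


def max_delay(config) -> int:
    """Largest delay counter in a configuration (0 for an empty channel)."""
    return max((n for _, n in config), default=0)
```

Using immutable tuples keeps each configuration hashable, which is convenient when configurations later become components of automaton states.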
3.2. Language of the Closed-Loop System
We next show how to specify the language that may be generated by the controlled system subject to observation delays and control delays. Specifying an accurate language requires the information of the control command that takes effect in the interval between two successive observable event communications. In the standard supervisory control framework without communication delays, the control command taking effect is exactly the one that was most-recently issued, which can be uniquely determined by the sequence that has been observed so far. However, in the presence of communication delays, the control commands taking effect (between two successive observable event communications) are non-deterministic.
To obtain the exact language of the closed-loop system, we construct an automaton that dynamically tracks the state of the plant, the current control command, the observation channel configuration, the control channel configuration, and the state of the supervisor. This model essentially captures the interaction process between the supervisor and the plant over the observation channel and the control channel. Before we formally construct the automaton, let us introduce two special types of events.
To keep track of what has been successfully communicated so far, we define the bijection , such that is a set disjointed from . For all , we use to denote that the occurrence of was communicated. Define as, for all , . We extend f to a set of sequences, as and, for all , . We also extend to a set of sequences, as and, for all , .
To model which control action is taken, we define bijection , such that is disjointed from . For all , we use to denote that the control command was executed. Define as, for all , . We extend g to a set of sequences, as and, for all , . We also extend to a set of sequences, as and, for all , .
We show in
Figure 3 how the plant interacts with the supervisor over the observation channel and the control channel. When a new observable event
occurs in the plant, it is immediately added to the end of the observation channel. Since the observation delays are upper-bounded by
event occurrences, the maximum observation delays after the occurrence of
should be no larger than
, i.e.,
. Similarly, since the control delays are upper-bounded by
event occurrences, the maximum control delays after the occurrence of
should be no larger than
, i.e.,
. By FIFO, the first event delayed at the observation channel, i.e.,
can be communicated to the supervisor. If
is communicated to the supervisor, we need to remove
from the head of the observation channel. Meanwhile, following the observation of
, a new control command
is made and is added to the end of the control channel. Moreover, by FIFO, the control commands are executed in the same order that they are issued. Thus,
cannot be executed until
is executed. If
is executed, we need to remove
from the head of
.
Notations: Given a , if , let be the maximum delay occurring in the observation channel, and if , let . Similarly, given a , if , let be the maximum delay occurring in the control channel, and if , let .
Given a supervisor with , we formally construct , where is the state space; is the initial state, where is the initial control command (since the initial control command can be immediately executed when the plant starts to work, we assume that the initial control command takes effect at first); is the event set; the transition function is defined as:
For all
and all
,
where
;
For all
and all
,
where
;
For all
and all
,
where
.
Equation (
6): for any
, an event
can occur at
q only if (i)
is active at
q, i.e.,
; (ii)
is allowed to occur by the control command in use, i.e.,
; (iii) after the occurrence of
, the control delays and the observation delays are no larger than
and
, respectively, i.e.,
. When
occurs at
q, to track the plant state, we set
. Meanwhile, to update the observation channel configuration and the control channel configuration, by Equations (
1) and (
3), we set
and
. Since no new control command is executed, we keep
unchanged.
Equation (
7): for any
, if a new observable event
is communicated (denoted by
), by FIFO,
must be the first event queued at the observation channel, i.e.,
. When
is communicated, by Equation (
2), we set
. Meanwhile, upon the communication of
, the supervisor moves to state
and sends a new control command
to the actuator of the plant. Correspondingly, we set
and
.
Equation (
8): for any
, if a new control command
is executed (denoted by
), by FIFO, it must be the first control command queued at the control channel, i.e.,
. When
is executed, by FIFO, the control command that takes effect becomes
, and the control commands delayed at the control channel become
. Correspondingly, we have
and
.
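The three transition rules explained for Equations (6) to (8) can be sketched as a successor function on automaton states. The code below is an illustrative reading of the prose, not the paper's formal construction: states are tuples (plant state, command in effect, observation channel, control channel, supervisor state), the labels `"occ"`, `"obs"`, and `"exe"` are hypothetical tags for an event occurrence, a communicated observation, and an executed command, the supervisor transition map `xi` is assumed to be total on observable events, and the toy plant and supervisor are not taken from the paper's figures.

```python
from typing import List, Tuple


def successors(state, delta, xi, chi, observable, n_obs, n_ctrl) -> List[Tuple[tuple, tuple]]:
    """Return (label, next_state) pairs reachable in one step."""
    q, gamma, th_o, th_c, x = state
    out = []
    # Rule 1: a plant event occurs if it is active, enabled by the command
    # in effect, and does not push any delay counter beyond its bound.
    for (p, sigma), p_next in delta.items():
        if p != q or sigma not in gamma:
            continue
        new_o = tuple((e, n + 1) for e, n in th_o)
        if sigma in observable:
            new_o += ((sigma, 0),)
        new_c = tuple((c, n + 1) for c, n in th_c)
        if all(n <= n_obs for _, n in new_o) and all(n <= n_ctrl for _, n in new_c):
            out.append((("occ", sigma), (p_next, gamma, new_o, new_c, x)))
    # Rule 2: the head of the observation channel is communicated; the
    # supervisor moves and immediately issues a new command.
    if th_o:
        sigma = th_o[0][0]
        x_next = xi[(x, sigma)]
        out.append((("obs", sigma), (q, gamma, th_o[1:], th_c + ((chi[x_next], 0),), x_next)))
    # Rule 3: the head of the control channel is executed and takes effect.
    if th_c:
        cmd = th_c[0][0]
        out.append((("exe", cmd), (q, cmd, th_o, th_c[1:], x)))
    return out


# Hypothetical toy data: plant 0 -a-> 1 -b-> 2, both events observable.
A, B = frozenset({"a"}), frozenset({"b"})
delta = {(0, "a"): 1, (1, "b"): 2}
xi = {("x0", "a"): "x1", ("x0", "b"): "x0", ("x1", "a"): "x1", ("x1", "b"): "x1"}
chi = {"x0": A, "x1": B}
s0 = (0, A, (), (), "x0")  # the initial command chi["x0"] is in effect from the start
```

In this toy run, event `b` cannot occur at plant state 1 until the observation of `a` has been communicated and the resulting command has been executed, which is the kind of ordering constraint the automaton is meant to capture.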
Remark 1. By the construction, satisfies all of the assumptions made in this paper. Specifically, by Equation (6), we know the maximum delay occurring in the observation channel is no larger than , and the maximum delay occurring in the control channel is no larger than . By Equations (7) and (8), the delayed observable events are communicated to the supervisor in the same order that they are generated, and the delayed control commands are delivered to the plant in the same order that they are issued, i.e., both the observation channel and the control channel satisfy the FIFO property. Moreover, by Equations (7) and (8), the observation delays and control delays are non-deterministic. That is, an observable event can be communicated in any one of the following steps from the occurrence, and a control command can be executed in any one of the following steps from when it is issued.
Remark 2. In some control applications, there may exist communication losses between the plant and the supervisor. For example, some observable transitions may be lost when they are communicated to the supervisor. Let us denote the set of transitions of G by . We also denote the set of observable transitions of G by . We partition into and , where is the set of transitions whose corresponding event occurrences are either observed without losses or observed with losses. To model possible observation losses, we can first refine the structure of G by adding parallel ε-transitions to the transitions that may be lost in and obtain , where . Using techniques developed in this section, we can construct . Note that when constructing , the supervisor does not need to make any control decisions following the communication of ε ( occurs). Although the occurrence of ε cannot be sensed by the supervisor, all of the natural numbers in and should be increased by 1 when ε occurs (as an event occurred but was lost).
In this paper, we focus on dealing with the nondeterministic observation delays and control delays existing between the supervisor and the plant. The formal approaches for implementing supervisory control under communication delays and losses are beyond the scope of this paper, yet a fruitful area for future exploration.
We use the following example to further illustrate how to construct .
Example 2. Consider the system G depicted in Figure 2a with , , and . Let . The supervisor is depicted in Figure 2b. We now construct using G and S. The initial state of is . By Figure 2a, we have . Moreover, since , , and , by Equation (6), we have , where , , , , and . Since , by Equation (1), . By Equation (3), . Therefore, . Next, consider state . Since , by Equation (2), . By Equation (7), where , , , , and . By Figure 2b, and . By Equation (4), . Therefore, . In this way, we can define all of the transitions. Finally, the complete is constructed as shown in Figure 4. For all
, let
and
be the sequences obtained by removing all the event occurrences in
and
, respectively, without changing the order of the remaining event occurrences in
. For example, consider
in
Figure 4. By the definitions of
and
, we have
and
. We extend
and
to a set of sequences in the usual way. Intuitively, for all
,
tracks the sequence that occurred in the plant, and
tracks what the supervisor observed after the occurrence of
. The following proposition formally proves this.
Proposition 1. Given an arbitrary , we write . Then, (i) and (ii) .
By Proposition 1, the dynamics of the closed-loop system can be simply obtained by removing all of the events in from sequences generated by , which yields the following definition.
Definition 3. Given a system G and a supervisor S, we construct as described above. All possible strings that may be generated by the closed-loop system when the observation delays and control delays are upper-bounded by and , respectively, are defined as .
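The two erasing maps used around Proposition 1 and Definition 3 can be illustrated on a tagged sequence. In this sketch, each element of a sequence of the constructed automaton is a pair (tag, payload), where the tags `"occ"`, `"obs"`, and `"exe"` are hypothetical labels for a plant event occurrence, a communicated observation, and an executed control command; the function names are likewise assumptions.

```python
from typing import List, Sequence, Tuple

Tagged = Tuple[str, object]


def plant_string(mu: Sequence[Tagged]) -> List[object]:
    """Erase communications and executions: what actually occurred in the plant."""
    return [payload for tag, payload in mu if tag == "occ"]


def observed_string(mu: Sequence[Tagged]) -> List[object]:
    """Erase occurrences and executions: what the supervisor has observed so far."""
    return [payload for tag, payload in mu if tag == "obs"]


# Hypothetical sequence: a occurs, b occurs, then a is communicated and a command is executed.
mu = [("occ", "a"), ("occ", "b"), ("obs", "a"), ("exe", frozenset({"b"}))]
```

Here the plant string is `a b` while the observed string is only `a`, because the observation of `b` is still delayed in the observation channel.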
Remark 3. By Definition 3 and Figure 4, is not included in . As we already discussed in Example 1, never occurs in practice. This example justifies the advantage of the proposed modeling framework.
Given two supervisors and , we say that is smaller than , denoted by , if for all , , and we say that is strictly smaller than if , and there exists , such that . The following proposition states that the more events a supervisor S enables, the larger the language the closed-loop system generates.
Proposition 2. Given two with , , we construct as described above. Then, if , we have .
By Proposition 2, to synthesize a supervisor such that the closed-loop language is maximal and safe, we only need to synthesize a maximal supervisor such that the closed-loop behavior is safe. We next formally formulate the optimal supervisor synthesis problem.
3.3. Problem Formulation
Before we formally present the problem to be solved, let us first introduce the following notation. Given a supervisor S, for a prefix-closed language , means that S is restricted to a smaller domain L, defined as , if , and undefined, otherwise. Assuming that the supervisor observes , our goal is to compute a maximal and safe supervisor on the fly, for each , .
Problem 1. Assuming that the system G executes an arbitrarily long sequence and the current observation for s is , we find a supervisor S, such that:
S is safe, i.e., ;
is maximal, i.e., there is no other that satisfies (1) with .
Remark 4. Since we focus on online supervisor synthesis, we only need to ensure that the control decisions made up to the current instant are optimal. That is why S is only required to be optimal on (instead of the whole ).
Remark 5. The solutions to Problem 1 need not be unique. In fact, there may exist several incomparable maximal solutions. In this paper, we focus on how to synthesize a “greedily maximal” supervisor online, rather than on calculating all possible solutions.
4. State Estimation under Communication Delays
To make a control decision right after each new observation, the supervisor needs to estimate the states of the closed-loop system (subject to observation delays and control delays) on the fly. We focus on the problem of online networked state estimation in this section. Let us first introduce the definition of the networked state estimate (NSE) as follows.
Definition 4. Given a DES G and a supervisor S, we construct as described in Section 3.2. For any , define as the NSE of t under S, which is the set of all the possible states that system G may be in after observing t (subject to observation delays and control delays) under S.
If S is given beforehand, we can calculate by constructing an observer of with the set of observable events . However, we focus on online networked supervisory control in this paper. That is, we need to calculate the state estimate right after each new communication (all future controls are unknown). To this end, besides the plant state , we also need to estimate the current control command , the observation channel configuration , and the control channel configuration , because all of them can affect the behaviors of the closed-loop system. Therefore, we denote each “state” of the closed-loop system by a four-tuple . We call such a state an augmented state. Let be the set of all the augmented states. We next show that we can estimate all possible states that the plant may be in by estimating all possible augmented states that the controlled system may reach.
To precisely estimate the augmented states, we need the following two operators.
Let be a set of augmented states calculated immediately after a new observation or at initialization (since the plant does not work until it is initialized, we let before the initial control command is executed). The delayed unobservable reach of Z under an admissible control command , denoted by , is defined as follows.
If , no control commands have been executed. That is, is the initial control command. By assumption, can be executed without any delays. Hence, by Equation (10), we have . Otherwise, if , is not the initial control command. By FIFO, it will not be executed until all of the control commands that are now delayed at the control channel are executed. Thus, for all , Equation (11) adds to the end of , i.e., . Meanwhile, Equations (12) and (13) consider the cases of “an event (observable or not) occurs” and “a control command is executed”, respectively. When there exist observation delays and control delays, only “an observable event is communicated” is observable. Therefore, consists of all the augmented states that may be reached from Z in an “unobservable” way.
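The closure behind the delayed unobservable reach can be sketched as a worklist fixed point. The sketch below is only an illustration under stated assumptions: the `AugState` encoding, the function names, and the two one-step rules (a pending command is executed; an enabled event occurs while no queued item exceeds its delay bound) are hypothetical stand-ins for Equations (12) and (13), and the initialization steps of Equations (10) and (11) are omitted.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterator, NamedTuple, Set, Tuple

Command = FrozenSet[str]


class AugState(NamedTuple):
    q: int                                  # plant state
    gamma: Command                          # control command currently in force
    obs: Tuple[Tuple[str, int], ...]        # FIFO of (observable event, delay so far)
    ctrl: Tuple[Tuple[Command, int], ...]   # FIFO of (pending command, delay so far)


def one_step(z: AugState, delta: Dict[Tuple[int, str], int],
             observable: Set[str], n_obs: int, n_ctrl: int) -> Iterator[AugState]:
    """Yield states reachable from z by one unobservable move (assumed rules)."""
    # Assumed rule A: the oldest pending control command is executed.
    if z.ctrl:
        yield AugState(z.q, z.ctrl[0][0], z.obs, z.ctrl[1:])
    # Assumed rule B: an enabled event occurs, unless a queued item has
    # already used up its delay bound.
    if any(n >= n_obs for _, n in z.obs) or any(n >= n_ctrl for _, n in z.ctrl):
        return
    for event in sorted(z.gamma):
        target = delta.get((z.q, event))
        if target is None:
            continue  # event not active at plant state q
        obs = tuple((e, n + 1) for e, n in z.obs)
        if event in observable:
            obs += ((event, 0),)  # queue the new observation at the tail
        ctrl = tuple((g, n + 1) for g, n in z.ctrl)
        yield AugState(target, z.gamma, obs, ctrl)


def unobservable_reach(seed, delta, observable, n_obs, n_ctrl) -> Set[AugState]:
    """Close `seed` under one_step with a breadth-first worklist."""
    reached = set(seed)
    frontier = deque(reached)
    while frontier:
        z = frontier.popleft()
        for z_next in one_step(z, delta, observable, n_obs, n_ctrl):
            if z_next not in reached:
                reached.add(z_next)
                frontier.append(z_next)
    return reached


# Toy plant (hypothetical): 0 --a--> 1 --u--> 2, where only "a" is observable.
delta = {(0, "a"): 1, (1, "u"): 2}
start = AugState(0, frozenset({"a", "u"}), (), ())
reach = unobservable_reach({start}, delta, {"a"}, n_obs=1, n_ctrl=1)
plant_states = {z.q for z in reach}
```

In this toy run, the observation of "a" may lag one event behind, so the plant may already be in State 2 before anything is communicated; with `n_obs=0` the same call stops at State 1.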
Let be the current set of augmented states. The delayed observable reach of Z under an observable event , denoted by , is defined as:
Intuitively, includes all of the augmented states that can be reached from Z upon a new communication of . By FIFO, an observable event can be communicated only if it is the first event queued at the observation channel. Hence, we consider all the , such that there exists with . When is communicated, we remove from . As we can see, we set . We assume that is updated right after a new observation of but before the next control command is issued. Therefore, we keep unchanged.
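The FIFO rule described above can be illustrated with a small sketch. It assumes a hypothetical encoding of an augmented state as a named tuple whose observation channel is a tuple of (event, delay) pairs, and it assumes that delivering an observation changes nothing except removing the head of that channel.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class AugState(NamedTuple):
    q: int                                        # plant state
    gamma: FrozenSet[str]                         # control command in force
    obs: Tuple[Tuple[str, int], ...]              # FIFO observation channel
    ctrl: Tuple[Tuple[FrozenSet[str], int], ...]  # FIFO control channel


def observable_reach(states: Iterable[AugState], sigma: str) -> Set[AugState]:
    """Augmented states consistent with sigma being delivered to the supervisor."""
    result = set()
    for z in states:
        # FIFO: sigma can be delivered only if it is at the head of the channel.
        if z.obs and z.obs[0][0] == sigma:
            # Remove the head; plant state, command and control channel stay as is.
            result.add(z._replace(obs=z.obs[1:]))
    return result


gamma = frozenset({"a", "b"})
estimate = {
    AugState(1, gamma, (("a", 0),), ()),            # "a" is at the head
    AugState(2, gamma, (("b", 0), ("a", 1)), ()),   # "a" is queued behind "b"
    AugState(0, gamma, (), ()),                     # nothing queued
}
after_a = observable_reach(estimate, "a")
```

Only the first state survives the communication of "a": in the second state "a" is not at the head of the queue, and in the third nothing is waiting to be communicated.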
We next present how to estimate augmented states online.
Definition 5. Let G be a DES and S be a supervisor. We construct as described in Section 3.2. For a , let be the augmented state estimate for t. is calculated by alternately applying and as follows.
Remark 6. indeed estimates the augmented states online. As shown in Figure 5, the online procedure for estimating augmented states can be briefly summarized as repeatedly executing two steps: (i) an observable event occurrence (after ) is communicated to the supervisor, and the set of augmented states is updated to ; (ii) following the observation of σ, a new control command is issued by the supervisor S, and the corresponding augmented state estimate is updated to .
We next show that indeed estimates the plant state, the current control command, the observation channel configuration, and the control channel configuration. Let us first define
The following lemma will be used later.
Lemma 1. For any if and is the augmented state calculated by applying Equation (12) or (13) on z, then .
Proof. Without loss of generality, we write and . Since , by the definition of , there exists a , such that and with , , , and . Since is the augmented state obtained by applying one of the operations in Equations (12) and (13) on z, one of the following two cases must be true.
Case 1: . By Equation (12), , , , and . Since , by Equation (6), . Thus, .
Case 2: . By Equation (13), . Since , by Equation (8), . Thus, we have that . □
Theorem 1. Given a DES G and a supervisor S, we construct as described in Section 3.2. For any , we have
Proof. We first prove by contradiction. Suppose there exists , such that . Without loss of generality (w.l.o.g.), we assume that t is the shortest sequence in , satisfying . We now show . By Definition 5, for any , there exists a sequence of augmented states , such that , , and is the augmented state calculated by applying Equation (12) or (13) on , . Since , . By repeatedly applying Lemma 1, . Therefore, .
Since , we write for some . Since , such that . Since , by Definition 5, there exists a sequence of augmented states with for some , , and is the augmented state calculated by applying Equation (12) or (13) on , . Next, we prove .
Since , , such that , , and . Since and , we have . By Equation (7),
Since , we have . By Proposition 1, we have . By definition, we have . Thus,
By the definition of ,
Since , . Hence, . By repeatedly applying Lemma 1, , which contradicts .
We next prove . To prove , we only need to prove that for all , if , then . The proof is by induction on the finite length of sequences in .
Since and , the base case is true. The induction hypothesis is that for all with , we write . Then, .
We next prove the same is also true for with . We write . Then, . Since , , , or . We consider each of them separately as follows.
Case 1: . Since , by Equation (6), we have
(1) , , , and ;
(2) , , , and .
Since and Condition 1 in Case 1, by Equation (12),
Since , we have . By Condition 2 in Case 1, .
Case 2: . For brevity, we write . Then, . Since , by Equation (7),
(1) ;
(2) , , , , and .
Since , by Proposition 1, . Thus, . By the induction hypothesis, . Since , by Equation (14), . Moreover, since , by Equation (12),
By Definition 5, we have . Thus, by Condition 2 in Case 2,
Case 3: . Since , . Since , by Equation (8), we have
(1) ;
(2) , , , and .
Moreover, since and , by Equation (13), we have that
By Condition 2 in Case 3,
□
Let be a given augmented state. We denote by the first component (the plant state) of z (“FC” means “first component”). We extend to a set of augmented states as follows: . The following corollary discusses the relationship between and .
Corollary 1. Let G be a DES and S be a supervisor. We construct as described in Section 3.2. For any , .
Proof. The proof directly follows from Theorem 1 and Definition 4. □
By Corollary 1, we can estimate the plant states by taking the first component of the estimated augmented states. We use the following example to further illustrate our online state estimation procedure.
Example 3. Consider again the system G depicted in Figure 2a and the supervisor S depicted in Figure 2b. Let , , and . We now compute , and , . Initially, by Equation (10), . Since , , and , by Equation (12), . Then, since , , , and , also by Equation (12), . Therefore, By Corollary 1, . Since , by Equation (14), By Definition 5, . Since is not the initial control command, by Equation (11), Then, by Equations (12) and (13), we have By Corollary 1, .
5. Online Network Supervisory Control
In this section, we calculate a maximal and safe control on the fly based on the state estimation techniques developed in Section 4.
5.1. State Prediction
To determine whether the control decision made at the current moment is safe, we need to predict all states that the system cannot be prevented from reaching under observation delays and control delays. To this end, for a appearing in an augmented state estimate, we construct an automaton to check what states the plant may reach from q if we disable all controllable events in the future. The basic idea for the construction of is similar to that of . That is, starting from z, dynamically tracks the plant state, the current control command, the observation channel configuration, and the control channel configuration, given that all future controls are .
Formally, we construct , where is the state space; is the initial state; is the event set; the transition function is defined as:
For all and all ,
where ;
For all and all ,
where ;
For all and all ,
where .
Since we assume all the controllable events are disabled in the future when a new observable event is communicated, we adopt the control command . As shown in Equation (16), we set after the communication of .
The following proposition states that for any and any , we cannot disable the occurrence of from q even if we disable all of the controllable events in the future.
Proposition 3. Let G be a DES and S a supervisor. For any , we write . Let and be the automaton constructed as described above. Then, if , .
Given a , all of the plant states that cannot be prevented from being reached from q via some can be obtained by taking the first component of . That is,
We can prove Equation (18) by induction on the length of sequences in , which is similar to the proof of Proposition 1 and is omitted here for brevity.
5.2. Online Algorithm
Suppose that the current observation of the system is . When a new event is observed, the supervisor S issues a new control command , and the augmented state estimate is updated to . As discussed in Section 5.1, for any , collects all of the plant states that may be reached from q no matter what control commands we adopt in the future. Therefore, we define the set of “bad” augmented states as:
For safety, all the augmented states in should never be reached. To make the problem non-trivial, we assume that the controlled system is safe if we choose to disable all of the controllable events after each new observation.
With the above preparations, we are now ready to introduce our online algorithm. We first pre-compute offline. The networked supervisory control for G is implemented on the fly in Algorithm 1 as follows: when the supervisor receives a new observable event occurrence , Line 9 is executed with the new communication of . The set of events to be enabled following the communication of is then calculated by the for-loop on Line 3, where all the controllable events are checked one by one to see whether they can be enabled without allowing the system to reach some “bad” augmented state. The above process is repeated when another observable event occurrence is communicated.
Algorithm 1: Online maximal networked control
[Algorithm 1 listing: provided as an image in the original (Mathematics 11 00003 i001).]
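The greedy for-loop described in the text can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithm 1: the function `greedy_command` and the oracle `is_safe` are hypothetical, where `is_safe` stands in for updating the augmented state estimate under a candidate command and checking that no “bad” augmented state is reached.

```python
from typing import Callable, FrozenSet, Iterable


def greedy_command(
    controllable: Iterable[str],
    uncontrollable: FrozenSet[str],
    is_safe: Callable[[FrozenSet[str]], bool],
) -> FrozenSet[str]:
    """Grow a control command greedily, keeping each event only if it stays safe."""
    gamma = frozenset(uncontrollable)  # uncontrollable events are always enabled
    for event in controllable:         # events are tried one by one, in the given order
        candidate = gamma | {event}
        if is_safe(candidate):
            gamma = candidate          # keep the event
        # otherwise the event stays disabled for this step
    return gamma


# Toy safety oracle (hypothetical): enabling "a" and "b" together is unsafe.
def toy_is_safe(command: FrozenSet[str]) -> bool:
    return not {"a", "b"} <= command


first = greedy_command(["a", "b", "c"], frozenset({"u"}), toy_is_safe)
second = greedy_command(["b", "a", "c"], frozenset({"u"}), toy_is_safe)
```

With the toy oracle, the two visiting orders return the incomparable commands {u, a, c} and {u, b, c}. Both are maximal, since no further controllable event can be added without violating the oracle.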
Definition 6. For any , the online network supervisor is defined as: for all , , is the set of events that are enabled right after the communication of , and for all with , .
Note that can be represented as an automaton with a finite state space.
Remark 7. The maximal networked supervisors are not unique. Given a different order on the controllable events, Algorithm 1 may return different results. Moreover, the order of controllable events can be changed dynamically after each new communication, if desired. In any case, all of the possible supervisors returned by Algorithm 1 are safe and maximal.
Remark 8. Algorithm 1 tries to enable a maximal allowable set of controllable events at any instant while ensuring that the closed-loop system stays within the desired specification language. However, as discussed in Remark 7, there may exist several incomparable maximal control decisions after each new observation. In many applications, enabling a controllable event could involve financial and human costs. In such situations, it is preferable to select a maximal allowable set of controllable events with the minimum enablement cost at each instant. A simple approach is to consider all maximal control commands and select one with the minimum enablement cost. Another approach is to list all of the controllable events in ascending order according to their enablement costs: , where is a controllable event that is the least costly to enable, and is the event that is the most costly to enable. By the for-loop on Line 3, a controllable event with a smaller enablement cost is then considered first. The first approach is optimal but needs more computational resources than the second approach. The second approach may be suboptimal but is more efficient than the first approach.
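The trade-off between the two approaches in Remark 8 can be illustrated with a small sketch. Everything in it is hypothetical: the event names, the costs, the pairwise conflicts used as a safety oracle, and the function names. Only the controllable part of the command is modelled.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet


def greedy_by_cost(costs: Dict[str, int],
                   is_safe: Callable[[FrozenSet[str]], bool]) -> FrozenSet[str]:
    """Second approach: try events in ascending order of enablement cost."""
    gamma: FrozenSet[str] = frozenset()
    for event in sorted(costs, key=lambda e: (costs[e], e)):
        if is_safe(gamma | {event}):
            gamma = gamma | {event}
    return gamma


def cheapest_maximal(costs: Dict[str, int],
                     is_safe: Callable[[FrozenSet[str]], bool]) -> FrozenSet[str]:
    """First approach: enumerate all maximal safe sets and pick the cheapest."""
    events = sorted(costs)
    safe_sets = [frozenset(c)
                 for r in range(len(events) + 1)
                 for c in combinations(events, r)
                 if is_safe(frozenset(c))]
    # A safe set is maximal if no other safe set strictly contains it.
    maximal = [s for s in safe_sets if not any(s < t for t in safe_sets)]
    return min(maximal, key=lambda s: (sum(costs[e] for e in s), sorted(s)))


# Toy data (hypothetical): "a" and "d" each conflict with both "b" and "c".
costs = {"a": 1, "b": 2, "c": 3, "d": 10}
conflicts = [frozenset(p) for p in (("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"))]


def toy_is_safe(command: FrozenSet[str]) -> bool:
    return not any(pair <= command for pair in conflicts)


greedy = greedy_by_cost(costs, toy_is_safe)
optimal = cheapest_maximal(costs, toy_is_safe)
```

Here the greedy pass commits to the cheapest event "a" and ends with {a, d} at cost 11, whereas the exhaustive search returns the maximal set {b, c} at cost 5. The exhaustive search inspects every subset of controllable events, which is the extra computational effort mentioned in the remark.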
Remark 9. For a given , we know the number of states in is upper-bounded by . Since the number of verifiers to be constructed is , the computational complexity for calculating is of the order of . By definition, . Since , , and , the complexity for calculating is polynomial with respect to (w.r.t.) and exponential w.r.t. .
After each new communication, for each , we need to test whether or not σ can be enabled. In each test, is updated by Line 5 and we need to search the state space once. Therefore, the per-step computational complexity of Algorithm 1 is of the order of , which is also polynomial w.r.t. and exponential w.r.t. .
Next, we show that the control commands made at each step in Algorithm 1 guarantee that the controlled system is safe.
Theorem 2. Suppose the current observation of the system is . Let be the augmented state estimate for under , . Then,
The following corollary states that is the solution to Problem 1.
Corollary 2. Suppose the current observation of the system is . The online supervisor derived by Algorithm 1 satisfies conditions 1 and 2 of Problem 1.
Proof. The proof directly follows from Problem 1 and Theorem 2. □
5.3. Comparison with the Existing Work
In this section, we compare the proposed algorithm with the algorithm proposed in [26]. Similar to [26], we assume that there are only control delays with an upper bound of , and there are no observation delays, i.e., in this section. To make this paper self-contained, we first review the state estimation techniques proposed in [26].
A channel configuration is defined in [26] as a set of pairs in the form of where is an admissible control action that is delayed at the control channel, and is a nonnegative integer indicating that the control action is still effective for the next steps. We denote by the union of all the control actions in , i.e., . We also denote by the set of all channel configurations. To update a after a new event occurrence, we define the “next” operator as follows: for any , decreases the timing index of each element of by one unit and only keeps the elements of with nonnegative timing indices. Thus, collects all the control actions issued in the past steps (including the current step).
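The “next” operator and the union of control actions reviewed above can be sketched as follows, assuming a hypothetical encoding of a channel configuration as a frozen set of (control action, timing index) pairs. The names `next_config` and `enabled_union` are illustrative only.

```python
from typing import FrozenSet, Tuple

Action = FrozenSet[str]
ChannelConfig = FrozenSet[Tuple[Action, int]]


def next_config(theta: ChannelConfig) -> ChannelConfig:
    """Decrease every timing index by one and drop entries that become negative."""
    return frozenset((action, n - 1) for action, n in theta if n - 1 >= 0)


def enabled_union(theta: ChannelConfig) -> Action:
    """Union of all control actions that are still present in the configuration."""
    events = set()
    for action, _ in theta:
        events |= action
    return frozenset(events)


theta = frozenset({(frozenset({"a"}), 1), (frozenset({"b"}), 0)})
theta_next = next_config(theta)
```

After one event occurrence, the action {b} with timing index 0 expires, so only {a} remains effective in `theta_next`.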
We define an extended state as a pair of a plant state and a channel configuration . Let be the set of all extended states. Let be a set of extended states and be a control action. Then, the networked unobservable reach of Z under , denoted by , is defined recursively as follows:
For any , we have
For any and any unobservable event , if and , then
Equation (20) is used to add the latest control action into the channel configuration. Equation (21) computes all the extended states that can be reached from any via an unobservable event occurrence. In Equation (21), an event can occur at an extended state if it is active at state q, i.e., , and it is allowed to occur by one of the control actions issued in the past steps, i.e., .
Let be a set of extended states and be an observable event. The networked observable reach ( ) of Z upon the occurrence of , denoted by , is defined as:
Equation (22) collects all of the extended states that can be immediately reached from elements of Z via .
Let S be a given networked supervisor. The set of extended states that the controlled system may be in after a communicated , denoted by , can be calculated as follows:
Then, it is shown by Corollary 1 of [26] that the set of plant states that the controlled system may be in after observing t can be simply obtained by taking the first components of .
Let be a channel configuration and be a non-negative integer. We denote by the union of all control decisions that can take effect in the next m steps, i.e.,
The uncontrollable language for can be defined as:
Given an extended state , the uncontrollable state prediction of , denoted by , is defined as
The online supervisor synthesis approach proposed in [26] mainly consists of the following two steps.
Step 1: When an observable event sequence is communicated, calculate the extended state estimate ;
Step 2: Find a maximal control decision , such that .
Next, we use two examples to show that the proposed supervisor can be more permissive than that proposed in [26].
Example 4. Consider the uncontrolled system G and the desired system H depicted in Figure 6a and Figure 6b, respectively. Let and . Initially, we start from and choose a maximal control decision , such that . One can check that is such a maximal control action. Then, we can compute the extended state estimate . If α is observed, we have . Then, we again find a maximal control decision , such that .
By definition, . One can check that , because otherwise, by Equation (24), . Then, by Equation (25), we have . Therefore, we have . Since and , β will never occur at State 1. Therefore, all possible behaviors that may occur under the synthesized supervisor include . In the previous framework, it was assumed that all control actions issued in the past steps may take effect. Thus, in Example 4, since , may take effect after . Therefore, we must disable after to prevent the system from reaching State 5. Since , we know that will never occur after . However, as shown in the following example, we can actually enable after observing , and the controlled system can never reach State 5.
Example 5. Continue with Example 4. We now show how to apply Algorithm 1 to compute an optimal supervisor.
Algorithm 1 starts from and iterates the for-loop on Line 3 for computing a maximal , such that . One can check that is such an optimal control decision. The augmented state estimate is updated to
After that, the supervisor observes α and estimates
Then, iterating the for-loop on Line 3 leads to . The augmented state estimate is updated to
Then, the supervisor observes β and estimates
Iterating the for-loop on Line 3, we have . Note that we cannot enable γ after observing . Therefore, under the synthesized supervisor, we may reach States 0, 1, 2, and 3. That is, all the possible behaviors that may be generated by the closed-loop system include .
By Examples 4 and 5, the language of the closed-loop system under the supervisor synthesized by Algorithm 1 is larger than the language of the closed-loop system under the supervisor synthesized by the approach of [26]. Since the proposed framework excludes all physically impossible strings, the state estimate calculated is more precise than that calculated by the previous approach. Thus, the proposed supervisor can be more permissive than the previous one.
6. Application in Traffic Control
We consider a signalized intersection as shown in Figure 7. When a self-driving vehicle x arrives at the intersection, it needs to communicate with the intersection to observe the signal and make a control decision accordingly. The observation and control are realized through a network. Due to network characteristics, observation delays and control delays are unavoidable. We assume in this example that the observation delays are upper-bounded by 1 and the control delays are upper-bounded by 1, i.e., and . We define seven events as shown in Table 1.
Event a denotes that Vehicle x arrives at the intersection. Event p denotes that Vehicle x passes through the intersection. Event y denotes that the traffic signal is switched to yellow. The green time in one signal cycle is seconds, and we divide into and equally: denotes the first seconds and denotes the remaining seconds. Similarly, the red time in one signal cycle is seconds, and we divide into and equally: denotes the first seconds and denotes the remaining seconds.
Events a and p are controllable since Vehicle x can choose to approach or pass through the intersection. Events , , , , and y are observable but are not controllable since Vehicle x can observe but cannot change the color of the traffic light. The system model for Vehicle x is displayed in Figure 8a.
Let us interpret the construction of G in Figure 8a as follows. When Vehicle x arrives at the intersection (a occurs), the system enters State 1. If the signal in the forward direction is switched to red for no more than seconds after green, i.e., (respectively, red for more than seconds but no more than seconds after green, i.e., , green for no more than seconds after red, i.e., , green for more than seconds but no more than seconds after red, i.e., , and yellow, i.e., y), the system makes a state transition to State 5 (respectively, States 2, 6, 3, and 4). If the system is uncontrolled, Vehicle x can pass through the intersection at any time. Hence, p can occur in States 2, 3, 4, 5, and 6. Let us suppose that the traffic light is when Vehicle x arrives at the intersection. Thus, the system is in State 5. If Vehicle x chooses to pass through the intersection, then the system moves to State 9. Otherwise, if Vehicle x stops at the intersection, then upon the occurrence of , the traffic light enters the second stage of the red cycle, and the system makes a state transition from State 5 to State 2. Then, if Vehicle x chooses to pass through the intersection, the system moves from State 2 to State 9. Otherwise, by the switching rule, the traffic light is further switched to green ( occurs), the system makes a state transition from State 2 to State 6, and so on.
By traffic laws, passing through the intersection (enabling p) is not permitted if the traffic light is red or yellow when the vehicle approaches the intersection. Therefore, we should disable the occurrence of p at States 2, 4, and 5. On the other hand, we can enable the occurrence of p at States 3 and 6. In particular, when the system is in State 3, the traffic light may be switched to yellow. Upon the occurrence of y, the system moves to State 7. By the traffic law, enabling p is legal if the traffic light is switched from green to yellow while Vehicle x is passing through the intersection. Thus, we can enable p at State 7. The desired system H is depicted in Figure 8b.
We now apply Algorithm 1 to calculate an optimal control command after each new observation. We denote by the set of augmented states returned by Line 9 after the observation of the ith event. We also denote by the set of events returned by Line 7 after the observation of the ith event.
Initially, by Lines 2 and 3, we have and . Let . By the for-loop on Line 3, we first try to add a into and set . By the definition of , one can verify that since only the occurrence of p can lead the controlled system to the “illegal” State 9, and p can never occur if we choose to disable p now and in the future. Thus, by Line 6, we have . By the for-loop on Line 3, we next try to add p into and set . It can be checked that and since may take effect after , and p is prevented from occurring after . Thus, we have and . Let . By definition,
Next, if is communicated, by Line 9,
Then, let us go to Line 2, and we have . By the for-loop on Line 3, we first try to add a into and set . One can verify that . We next try to add p into and set .
Since , when is issued, augmented states can occur in the following order:
where is the control command made after the communication of . As we can see, the control action may take effect at the time p occurs at , which violates the traffic law. Hence, we can only add a into . We have .
The above process is repeated until the vehicle passes through the intersection. The synthesized supervisor is depicted in Figure 9. For brevity, we only list all of the controllable events to be enabled at each state of the supervisor (all of the uncontrollable events are omitted). From Figure 9, to pass through the intersection safely, Vehicle x must stop if the traffic light is , , y, or when the vehicle arrives at the intersection. Vehicle x can choose to pass through the intersection only when is communicated, i.e., Vehicle x observes the occurrence of .
7. Extension of the Proposed Framework
In this section, we briefly discuss how to model a system with non-FIFO observations and controls.
In many control applications, such as cyber–physical systems, the sensors are often distributed at different sites, and the detected information is communicated to the supervisor over different observation channels. Different observation channels may have different upper bounds of observation delays. The nondeterministic observation delays may change the order of events communicated to the supervisor. In other words, the supervisor may receive observable event occurrences in an order different from the order in which they occurred. On the other hand, the enablement and disablement of each controllable event are achieved by an individual actuator, and the actuators are distributed at different sites. The supervisor sends the control decisions for disabling or enabling events to the corresponding actuators upon each new event observation. Different control channels may have different upper bounds of control delays.
As shown in Figure 10, there are sensors, each of which is associated with an observable event. For brevity, we write , where . For each , the occurrence can be detected by sensor i and communicated to the supervisor over observation channel i. We assume that observation delays occurring in observation channel i are upper-bounded by event occurrences. That is, when an event occurs, it will be communicated to the supervisor within no more than additional event occurrences. On the other hand, there are actuators, each of which is associated with a controllable event. We write , where . For each , the enablement and disablement are achieved by actuator i, and the control decision for enabling or disabling is sent to actuator i over control channel i. We assume that control delays occurring in control channel i are upper-bounded by event occurrences. That is, the control decision made for an event is executed within no more than additional event occurrences.
For each , the supervisor sends 0 or 1 to actuator i over control channel i, where “0” means “disablement” and “1” means “enablement”. Thus, we denote by the set of all the possible control decisions that the supervisor may make. Correspondingly, we denote, in this section, the supervisor by a pair , where is a deterministic automaton with , and is a function that specifies control decisions for disabling or enabling . Specifically, for any , we denote by control decisions for disabling or enabling . With a slight abuse of notation, we write . For any , we have . For any and any , we say that e is allowed to be enabled by , denoted by , if .
Definition 7. The observation channel i configuration is defined as a sequence of pairs:
where is a sequence of observable events that have been detected by sensor i but are currently delayed at observation channel i, and , is the number of event occurrences since occurred and was detected. If observation channel i is empty, . We denote by the set of all possible observation channel i configurations, where is the maximum length of . Given a , let be the maximum observation delays occurring in the observation channel i. The overall state of the observation channels is defined as a vector , where is the observation channel i configuration. Let be the set of all the states of observation channel configurations.
When a new event occurs, to update the state of the observation channel configurations, we define the operator as: for all and all , such that
where if , , and if , .
When a new observable event is communicated to the supervisor, to update the state of the observation channel configurations, we define the operator as: for all and all , such that
where is the first component of . That is, for all , we have .
When an event occurs in the plant, all natural numbers in should be increased by 1 since they count the observation delays. Furthermore, if , by FIFO, we need to add to the end of to record the new observable event occurrence. On the other hand, when a new observable event is communicated to the supervisor, removes from the head of if .
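The two observation-channel updates described above can be sketched with one FIFO queue per sensor. The encoding is an assumption made for illustration: channels are stored in a dictionary keyed by the observable event of the corresponding sensor, a new entry starts with delay 0, and the names `obs_in` and `obs_out` are hypothetical.

```python
from typing import Dict, Tuple

Queue = Tuple[Tuple[str, int], ...]   # FIFO of (event, delay so far)
Channels = Dict[str, Queue]           # one queue per observable event / sensor


def obs_in(channels: Channels, event: str) -> Channels:
    """An event occurs in the plant: age every queued entry, then enqueue if observable."""
    updated = {name: tuple((e, n + 1) for e, n in queue)
               for name, queue in channels.items()}
    if event in updated:                      # observable: its sensor has a channel
        updated[event] += ((event, 0),)       # append at the tail of that channel
    return updated


def obs_out(channels: Channels, event: str) -> Channels:
    """An observation is delivered: remove the head of the corresponding channel."""
    queue = channels[event]
    if not queue:
        raise ValueError(f"no pending observation of {event!r}")
    updated = dict(channels)
    updated[event] = queue[1:]
    return updated


channels: Channels = {"a": (), "b": ()}
channels = obs_in(channels, "a")      # "a" occurs
channels = obs_in(channels, "b")      # "b" occurs; the pending "a" ages by one
channels = obs_out(channels, "b")     # "b" is delivered before "a"
```

Because each sensor has its own queue, "b" can be delivered while "a" is still waiting, which is the reordering that the single-channel FIFO model rules out.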
Definition 8. The control channel i configuration is defined as a sequence of pairs:
where is a sequence of control decisions that have been made for enabling or disabling but are currently delayed at control channel i, and , is the number of event occurrences since has been issued. If control channel i is empty, . We denote by the set of all the possible control channel i configurations, where is the maximum length of . Given a , let be the maximum control delays occurring in the control channel i. The overall state of the control channel is defined as a vector , where is the control channel i configuration. Let be the set of all the states of control channel configurations. To update , we introduce the following operators.
When a new event occurs in the plant, to update the state of the control channel configurations, we define the operator as: for all , such that
where if , , and if , .
When a new control command is issued by the supervisor, to update the state of the control channel configurations, we define the operator as: for all and all , such that
for all .
When a new control command is executed, to update the states of the control channel configurations, we define the operator as: for all and all , such that
When a new event occurs, for recording the control delays, adds 1 to all of the natural numbers in . When a new control command is issued (following a new observation), adds the newly issued control command to the end of control channel i. When a new control command is executed by actuator i, removes the first control command from .
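The three control-channel updates summarized above can be sketched in the same style. The encoding is again an assumption: one FIFO queue per controllable event, each entry being a pair (decision bit, delay so far) with 1 for enablement and 0 for disablement, and hypothetical function names `ctrl_tick`, `ctrl_issue`, and `ctrl_execute`.

```python
from typing import Dict, Tuple

Queue = Tuple[Tuple[int, int], ...]   # FIFO of (decision bit, delay so far)
Channels = Dict[str, Queue]           # one queue per controllable event / actuator


def ctrl_tick(channels: Channels) -> Channels:
    """An event occurs in the plant: every pending decision ages by one."""
    return {e: tuple((bit, n + 1) for bit, n in queue)
            for e, queue in channels.items()}


def ctrl_issue(channels: Channels, decision: Dict[str, int]) -> Channels:
    """The supervisor issues a command: append one bit to the tail of each channel."""
    return {e: queue + ((decision[e], 0),) for e, queue in channels.items()}


def ctrl_execute(channels: Channels, event: str) -> Tuple[int, Channels]:
    """Actuator `event` executes its oldest pending decision (head of its channel)."""
    queue = channels[event]
    if not queue:
        raise ValueError(f"no pending decision for {event!r}")
    updated = dict(channels)
    updated[event] = queue[1:]
    return queue[0][0], updated


channels: Channels = {"a": (), "p": ()}
channels = ctrl_issue(channels, {"a": 1, "p": 0})   # enable a, disable p
channels = ctrl_tick(channels)                      # one plant event occurs
channels = ctrl_issue(channels, {"a": 1, "p": 1})   # a later command enables p
bit, channels = ctrl_execute(channels, "p")         # actuator p applies its oldest bit
```

In this run the actuator for p first applies the older decision 0 (disablement), while the newer decision 1 remains queued; the channel for a is untouched.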
To keep track of what has been successfully communicated to the supervisor so far, define bijection , such that is a set disjoint from . For all , we use to denote that the occurrence of has been communicated to the supervisor.
To model which control action has been executed by one of the actuators, we define bijection , such that is disjoint from . For all , we use to denote that the control command has been executed by the corresponding actuator.
Given a supervisor with , we formally construct
where is the state space; is the initial state; is the event set; the transition function is defined as:
For all and all ,
where ;
For all and all ,
where ;
For all and all ,
where .
For all , let be the sequences obtained by removing all the event occurrences in without changing the order of the remaining event occurrences in . We extend to a set of sequences in the usual way. The dynamics of the closed-loop system can be simply obtained from as follows.
Definition 9. Given system G and supervisor D, we construct as described above. All possible strings that may be generated by the closed-loop system with the observation delays and the control delays , are defined as: .
By Definition 9, we can specify the dynamics of the closed-loop system when the sensors and the actuators are distributed at different sites. Furthermore, we can extend the proposed approaches to make the state estimation and synthesize the supervisor for the “distributed” system. Since this paper focuses on the case where there is one control channel and one observation channel, such an extension to the “distributed” system is beyond the scope of this paper.
8. Conclusions
In this paper, we considered the optimal supervisory control of DESs under communication delays. It is assumed that (i) delays do not change the order of the observations and controls; and (ii) both the observation delays and control delays have upper bounds. A modeling framework for supervisory control under communication delays was developed and evaluated. With this framework, an online algorithm for the state estimation of the supervised system was proposed. The proposed algorithm can be used to solve the supervisor synthesis problem in networked DESs. Compared with the supervisor proposed in the existing work, (i) the synthesized supervisor can be more permissive as the proposed framework and state estimation approaches are more precise; (ii) the proposed framework considers nondeterministic observation delays and control delays, which often arise in practice. An application was provided to show how to implement the proposed algorithm. Finally, we extended the proposed framework to specify the dynamics of the closed-loop system when the sensors and actuators of the system are distributed, where delays may change the order of the observations and the controls.
One direction for future research is to broaden the application scope of the proposed approach by accommodating communication losses in the system model. Future research can also examine how to estimate the states and synthesize supervisors when the sensors and actuators of the system are distributed.