Efficient Consistency Check Based on Perceived Initial Deviation

Zhang, Liwen; Wang, Fanglue; Song, Zhihuan; Huang, Kaifeng; Hu, Yanli; Zhuo, Guiying

doi:10.3390/electronics13234669


by Liwen Zhang ¹,², Fanglue Wang ³,*, Zhihuan Song ², Kaifeng Huang ¹, Yanli Hu ¹ and Guiying Zhuo ¹

¹ School of Mechanical and Electrical Engineering, Huainan Normal University, Huainan 232038, China

² State Key Laboratory of Industrial Control Technology, School of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China

³ School of Biological Engineering, Huainan Normal University, Huainan 232038, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(23), 4669; https://doi.org/10.3390/electronics13234669

Submission received: 6 September 2024 / Revised: 19 November 2024 / Accepted: 21 November 2024 / Published: 26 November 2024

(This article belongs to the Section Computer Science & Engineering)


Abstract

The behavior recorded by an information system often differs from the behavior of the initial model because the business process changes constantly during actual operation. To replay the event log well on the process model, the set of optimal alignments must first be determined. Existing methods compare all the corresponding activities in every alignment, which causes a great deal of unnecessary work. We therefore propose perceiving non-optimal alignments beforehand, according to the relationship between the location of the initial deviation and the number of deviations. The perceptible regions of the process model are divided based on the behavioral characteristics of its various substructures. If the initial deviation lies in the perceptible region and its location is smaller than the previously recorded value, the comparison is terminated and the alignment is judged to be non-optimal. Otherwise, the alignment, which may be optimal, is compared completely. The OPS plug-in was executed on data sets from various networks and BPIC2020, and the results showed that search efficiency could be improved while guaranteeing optimality.

Keywords: non-optimal alignment; location of initial deviation; the number of deviations; perceptible region; recorded value

1. Introduction

1.1. Research Background

In recent years, process mining has become an important technique for extracting valuable information from event logs [1]. Process mining mainly comprises three steps: process discovery, conformance checking and model repair [2,3,4,5]. Process discovery extracts useful information from event logs to construct the corresponding process model [6,7]. Because the event log varies with the actual situation, the currently recorded behavior usually differs from the initial model [8]. It is therefore necessary to repair the model according to these differences [9,10]. Trace alignment detects deviations by correlating each trace with all firing sequences, while behavior alignment identifies all inconsistent behavior between the event log and the process model based on the division of the event structure [11,12]. A robust analysis technique can detect two kinds of deviations between the event log and the initial model, namely insert and skip deviations [13]. An insert deviation is an activity that is present only in the event log, whereas a skip deviation is an activity that is present only in the initial model and affects the replay of the event log on the process model [14]. The inconsistent behavior detected by behavior checking is mainly of two types: (i) unfitted behavior in the event log that cannot be replayed on the process model and (ii) additional behavior that is captured only in the initial model and does not affect the normal replay of the event log. The additional behavior in the model should be retained as far as possible so that the repaired model remains similar to the initial model [15,16]. Trace alignment detects deviating elements, and its ultimate goal is to measure how well the event log can be replayed on the process model. Thus, there is no need to consider the influence of additional behavior in the model [17].

The set of optimal alignments attains the minimum deviation cost between the event log and the process model. When all deviations have the same unit cost, the deviation cost equals the number of deviations [18]. A brute-force search for the optimal alignment compares each trace in the event log with all firing sequences of the process model, which causes a great deal of unnecessary computation [19]. A huge amount of work is therefore needed to guarantee optimal conformance checking on data sets that are large or have a complex network structure. In this study, we propose that the non-optimality of some alignments can be perceived beforehand from the location of the initial deviation. The method focuses on the control-flow perspective. While preserving the optimal cost of trace alignment, it improves the efficiency of conformance checking mainly in the following ways: (i) it determines whether the initial deviation of an alignment occurs within the perceptible region; (ii) when the initial deviation occurs in the perceptible region, the smaller its location, the larger the number of deviations; and (iii) the location of the effective initial deviation is continuously updated as each trace in the event log is aligned with the process model. An alignment is therefore terminated when the location of its initial deviation is smaller than the previously recorded one, so that some non-optimal alignments are identified without a complete comparison. It is worth noting that the perception processes of the traces in the event log are independent of each other.

1.2. Related Work

Alignment associates every trace in the event log with all firing sequences in the process model, and their deviations are detected accordingly [20]. This technique mainly involves the following three analyses: conformance check, cost accounting and optimal alignment search.

The conformance between the event log and the initial model is measured using four metrics: fitness, precision, simplicity and generalization, of which fitness is the most important [21]. Conformance checks can be divided into two types according to the object detected: (i) trace alignment, which detects the deviating elements between the event log and the initial model [22,23]; (ii) behavior mining, which discovers the behavior that does not match between the event log and the initial model [24]. They can also be divided into two types according to the workflow perspective: (i) conformance checks based on the control flow [25,26]; (ii) conformance checks from multiple perspectives, including role, data flow and stochastic language, among others [27,28,29]. Finally, they can be divided into two types according to the integrity of the check: (i) checks that only consider the conformance of the business process in its current state [30]; (ii) checks that track conformance in real time as the business process changes [31,32].

Cost accounting refers to the total cost of asynchronous movement in an alignment. The unit cost of synchronous movement is set to 0, while the unit cost of asynchronous movement is set to 1 in the unweighted generic business process (the unit cost is 0 when an invisible transition is included in an asynchronous movement) [33]. The process model is repaired by processing the deviation information according to the minimum cost, and the event log can be replayed in the repaired process model.

The optimal alignment search is a key technique in conformance checking, as it allows the optimal fitness of the control flow to be analyzed. Its performance is mainly judged by two criteria: (i) the minimal deviation cost and (ii) the search workload. In the simplest approach, the event log is compared directly with the process model, and the fitness is analyzed by calculating the total cost of deviations. Such a conformance check is limited to simple models or single operations, and it cannot guarantee optimal fitness [34,35]. Alternatively, each trace in the event log is aligned with all firing sequences of the initial model using the $A^{*}$ algorithm, and the set of optimal alignments is selected through cost evaluation [36]. The location and severity of deviations can be pinpointed by adjusting the differences between the event log and the process model [37]. For this purpose, the process model and the event log are transformed into two automata, which are then compared through their synchronous product. The synchronous product is computed using the $A^{*}$ heuristic and admissible heuristic functions. It captures all differences, allowing the minimum cost to be obtained [38]. Although the $A^{*}$ algorithm and the synchronous product guarantee the optimality of the alignment set, they involve a huge amount of work on complicated data sets. With an iterative decomposition approach, the fitness can be obtained accurately: the best fitness can be estimated within 10 min, and the time spent on calculating the accurate value can be saved when the fitness interval is narrow enough [39]. Based on the structural and behavioral characteristics of the process model, the overall search space for the optimal alignment can be reduced through the use of effective heuristics and trace replay [40]. In this study, the perceptible regions of the process model are first determined based on the behavioral characteristics of its various substructures. Then, some non-optimal alignments can be terminated early when the initial deviation is located in the perceptible region. In this way, the search efficiency of the optimal alignment is improved.

2. Basic Definition

Definition 1 (Event log). The tuple $L = (℘, o, e, l, ϑ, ⊳)$ is recorded as the event log. $℘$ is a trace in the event log, and $o$ is the unified case number of traces that occur repeatedly. Since there may be different traces in the event log, all the case numbers belong to the case set $O$, i.e., $\forall o \in O$, $\sum_{o=1}^{|O|} ℘_{o} \in L$. $e$ is an event element in the log, denoted as $\forall e \in L$. $l$ is a function that assigns each event element to its corresponding label, denoted as $l(e) = ϑ$. $ϑ^{*}$ is the label set of all events, $\forall l(e) \in ϑ^{*}$ and $\forall ϑ \in ϑ^{*}$. $⊳$ is the flow relationship between adjacent events, denoted as $⊳ \subseteq \sum_{i=1}^{I} e_{i-1} \times e_{i}$.

Definition 2 (Labeled process model). The tuple $M = (p, t, λ, σ, τ, δ, j, p_{ini}, p_{final}, ≺)$ is recorded as the labeled process model. $p$ and $t$ represent the place and transition, respectively. $λ$ is the function that assigns each transition to the corresponding label, that is, $λ(t) = σ \lor λ(t) = τ$. $τ$ is a label for an invisible transition that has no real meaning. $σ^{*}$ is a finite set of labels in the process model, $\forall λ(t) \in σ^{*} \cup τ$ and $\forall σ \in σ^{*}$. $δ$ is a complete sequence from the initial to the terminated node, and $j$ is the case number of each sequence $δ$, denoted as $\sum_{j=1}^{|J|} δ_{j} \in M$. $p_{ini} \in P$ and $p_{final} \in P$ are the initial and the terminated places, respectively. The process model $M$ is a workflow net if and only if $|p_{ini}| = |p_{final}| = 1$. $≺$ is the flow relationship between adjacent transitions, denoted as $≺ \subseteq \sum_{k=1}^{K} t_{k-1} \times t_{k}$.

Definition 3 (The unit cost setting of alignment). The corresponding activity between a trace $℘$ and a firing sequence $δ$ is compared by alignment, denoted as $\mathit{Align}(℘_{i}, δ_{j}) = (\mathit{Move}_{LM}, \widetilde{\mathit{Move}}_{L}, \widetilde{\mathit{Move}}_{M}, \mathit{Esert}(l(e)), \mathit{Skip}(λ(t)))$, $\forall \mathit{Align}(℘_{i}, δ_{j}) \in \mathit{Align}^{*}$ ($\mathit{Align}^{*}$ represents the set of all alignments between the event log and process model). The set of synchronous movements is denoted as $\mathit{Move}_{LM}$, $\forall \mathit{move}_{LM} \in \mathit{Move}_{LM}$. The set of asynchronous movements belonging to the event log is denoted as $\widetilde{\mathit{Move}}_{L}$, $\forall \widetilde{\mathit{move}}_{L} \in \widetilde{\mathit{Move}}_{L}$. The set of asynchronous movements belonging to the process model is denoted as $\widetilde{\mathit{Move}}_{M}$, $\forall \widetilde{\mathit{move}}_{M} \in \widetilde{\mathit{Move}}_{M}$. The unit cost of asynchronous movement is “1” in the unweighted business process, denoted as $\mathit{cost}(\widetilde{\mathit{move}}_{L} / \widetilde{\mathit{move}}_{M}) = 1$. The unit cost of synchronous movement is “0”, denoted as $\mathit{cost}(\mathit{move}_{LM}) = 0$. The unit cost is 0 when an invisible transition is included in an asynchronous movement. The insert deviation $\mathit{Esert}(l(e))$ and skip deviation $\mathit{Skip}(λ(t))$ are produced by $\widetilde{\mathit{Move}}_{L}$ and $\widetilde{\mathit{Move}}_{M}$, respectively ($\forall e \in L$, $\forall t \in M$). All movements in an alignment are recorded in Table 1, and the costs of different movements are accounted for.
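The unit cost setting of Definition 3 can be illustrated with a minimal Python sketch. The move representation (a pair of log label and model label), the placeholder `">>"` for a missing move and the spelling `"tau"` for an invisible transition are assumptions made for this illustration only; the example alignment is hypothetical and is not taken from Table 1.

```python
from typing import List, Tuple

NO_MOVE = ">>"  # assumed placeholder: no move on this side of the alignment
TAU = "tau"     # assumed spelling of the invisible-transition label

Move = Tuple[str, str]  # (log label, model label)


def move_cost(move: Move) -> int:
    """Unit cost of one moving unit under the unweighted setting."""
    log_label, model_label = move
    if log_label == model_label:
        return 0  # synchronous movement
    if log_label == NO_MOVE and model_label == TAU:
        return 0  # asynchronous movement on an invisible transition
    return 1      # insert (log only) or skip (model only) deviation


def alignment_cost(moves: List[Move]) -> int:
    """Total cost of an alignment: the sum of its unit costs."""
    return sum(move_cost(m) for m in moves)


# Hypothetical alignment: one insert deviation (C) and one skip deviation (D).
example = [
    ("A", "A"),
    ("B", "B"),
    ("C", NO_MOVE),
    (NO_MOVE, TAU),
    (NO_MOVE, "D"),
    ("E", "E"),
]
print(alignment_cost(example))  # 2
```

With all unit costs equal to 1, the returned value is simply the number of deviations, which is the quantity compared in the rest of the paper.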

Definition 4 (Search for the optimal alignment). The alignment set is obtained by comparing each trace of the event log with all firing sequences in the process model, i.e., $\mathit{Align}^{*} = \sum_{o=1}^{|O|} \sum_{j=1}^{|J|} \mathit{Align}(℘_{o}, δ_{j}) = \sum_{o=1}^{|O|} \sum_{j=1}^{|J|} \mathit{compare}(℘_{o}, δ_{j})$, $\forall ℘_{o} \in L$, $\forall δ_{j} \in M$. The optimal alignment $\mathit{Align}_{op}^{*}$ is the minimum cost alignment set associated with all traces, that is, $\mathit{cost}(\mathit{Align}(℘_{o}, δ_{op})) < \mathit{cost}(\mathit{Align}(℘_{o}, δ_{j}))$, $\mathit{Align}_{op}^{*} = \sum_{o=1}^{|O|} \sum_{op=1}^{|OP|} \mathit{Align}_{op}(℘_{o}, δ_{op})$, $\forall δ_{j} \in M \setminus δ_{op}$, $\forall δ_{op} \in M \setminus δ_{j}$. In other words, all alignments between the event log and the process model must be obtained before the optimal alignment can be judged accurately. As shown in Figure 1, the search workload for the optimal alignment between event log $L$ and process model $M$ is $\mathit{value}(\sum_{o=1}^{|O|} \sum_{j=1}^{|J|} \mathit{Align}(℘_{o}, δ_{j})) = \mathit{value}(\sum_{j=1}^{4} \mathit{compare}(℘_{1}, δ_{j})) = 40$ ($\mathit{value}$ represents the number of comparisons in the optimal alignment evaluation). The optimal alignment between $L$ and $M$ is $\mathit{Align}_{op} = (℘_{1}, δ_{1})$.
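As a rough illustration of the exhaustive search in Definition 4, the sketch below aligns one trace with every firing sequence and keeps the cheapest result. It assumes that the firing sequences are already enumerated with invisible transitions removed, and it computes the unit cost with a standard dynamic program over insert and skip deviations; this is a stand-in for illustration, not the OPS plug-in. The trace and sequences are hypothetical and do not reproduce Figure 1.

```python
from typing import List, Sequence, Tuple


def unit_cost(trace: Sequence[str], firing: Sequence[str]) -> int:
    """Minimum number of insert/skip deviations between a trace and a sequence."""
    n, m = len(trace), len(firing)
    # dp[i][j] = cost of aligning trace[:i] with firing[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # only insert deviations remain
    for j in range(1, m + 1):
        dp[0][j] = j  # only skip deviations remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(dp[i - 1][j], dp[i][j - 1]) + 1
            if trace[i - 1] == firing[j - 1]:
                best = min(best, dp[i - 1][j - 1])  # synchronous move, cost 0
            dp[i][j] = best
    return dp[n][m]


def brute_force_optimal(
    trace: Sequence[str], firing_sequences: List[Sequence[str]]
) -> Tuple[int, List[int]]:
    """Compare the trace with all sequences; return the minimum cost and its indices."""
    costs = [unit_cost(trace, seq) for seq in firing_sequences]
    best = min(costs)
    return best, [i for i, c in enumerate(costs) if c == best]


# Hypothetical data: one trace and three firing sequences.
print(brute_force_optimal("ABCE", ["ABCDE", "ACBDE", "ABDE"]))  # (1, [0])
```

Every sequence is compared in full here, which is exactly the workload that the perception of the initial deviation is meant to reduce.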

Definition 5 (Location of the initial deviation). The initial deviation is the first deviation in an alignment, and its location refers to the order of occurrence in an alignment. Figure 2 depicts the discovery process of the initial deviation. In Figure 2, a trace $℘_{0}$ of $L$ and a sequence $δ_{2}$ of $M$ are extracted, and the first difference $\mathit{Skip}(λ(t)) = e$ produced by the alignment of $℘_{0}$ and $δ_{2}$ is marked with a dashed blue line. It is worth noting that the invisible transition $τ$ in the model is ignored in order to better observe the deviation.

The initial deviation is detected in an alignment, and the location of the initial deviation is determined according to the unit order of alignment. The location of the initial deviation is denoted as $v(\mathit{loc}_{id}(\mathit{Align}(℘_{o}, δ_{j})))$ ($v$ is the evaluation function, that is, the location of a moving unit in the alignment is converted into numerical form; $\mathit{loc}_{id}$ represents the occurrence location of the initial deviation in the alignment). $\mathit{Align}(℘_{1}, δ_{1})$ and $\mathit{Align}(℘_{1}, δ_{2})$ in Table 2 are used as examples. The asynchronous movement that produces the initial deviation $\mathit{Esert}(l(e_{3})) = C$ is the third moving unit in $\mathit{Align}(℘_{1}, δ_{1})$. $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{1}))) = 3$ is obtained by the conversion of the evaluation function $v$, that is, the location of the initial deviation in $\mathit{Align}(℘_{1}, δ_{1})$ is 3. The asynchronous movement in $\mathit{Align}(℘_{1}, δ_{2})$ that produces the initial deviation $\mathit{Skip}(λ(t_{2})) = C$ is the second unit, which is transformed by the evaluation function $v$ to obtain $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{2}))) = 2$, that is, the initial deviation in $\mathit{Align}(℘_{1}, δ_{2})$ occurs at 2. Therefore, the location of the initial deviation in $\mathit{Align}(℘_{1}, δ_{1})$ is greater than that in $\mathit{Align}(℘_{1}, δ_{2})$, denoted as $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{1}))) > v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{2})))$.
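The evaluation of the initial deviation location can be sketched as a scan over the moving units of an alignment. The move encoding, the `">>"` placeholder and the `"tau"` label are assumptions for illustration. The sketch also assumes that an invisible transition does not occupy a unit position, which is one reading of the remark that $τ$ is ignored. The two alignments below are hypothetical stand-ins, since Table 2 is not reproduced here.

```python
from typing import List, Optional, Tuple

NO_MOVE = ">>"  # assumed placeholder: no move on this side of the alignment
TAU = "tau"     # assumed spelling of the invisible-transition label


def initial_deviation_location(moves: List[Tuple[str, str]]) -> Optional[int]:
    """Return the 1-based unit position of the first deviation, or None if there is none."""
    position = 0
    for log_label, model_label in moves:
        if log_label == NO_MOVE and model_label == TAU:
            continue  # invisible transition: ignored (assumption, see lead-in)
        position += 1
        if log_label != model_label:
            return position  # first asynchronous movement
    return None


# Hypothetical alignments: an insert deviation at unit 3, a skip deviation at unit 2.
first = [("A", "A"), (NO_MOVE, TAU), ("B", "B"), ("C", NO_MOVE), ("D", "D")]
second = [("A", "A"), (NO_MOVE, "C"), ("B", "B")]
print(initial_deviation_location(first))   # 3
print(initial_deviation_location(second))  # 2
```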

Definition 6 (Path partition of various substructures). Any path between the initial and the terminated nodes in a business process is called a complete path. A complete path may include two types of sub-paths: a mandatory sub-path, which must occur, and a selective sub-path, which may occur, denoted as $p_{man}$ and $p_{sel}$. The unique behavior on each complete path comes from $p_{sel}$, while the common behavior belongs to $p_{man}$. The substructure corresponding to $p_{man}$ is the mandatory substructure, that is, $\{N_{cau} \cup N_{conc}\} = N_{man}$. $N_{cau}$ and $N_{conc}$ satisfy causal and concurrent behavior, respectively. $N_{cau}$ and $N_{conc}$ both belong to mandatory substructures, that is, their activities occur on each complete path. $p_{sel}$ corresponds to the selective substructure, i.e., $N_{conf} = N_{sel}$ ($N_{conf}$ represents a substructure satisfying the conflict behavior relationship). According to the above rules, the various substructures are divided and their corresponding sub-paths are given in Figure 3.

The level division of the selective substructure depends on the depth of selection. The selective substructure of the first level is regarded as the root of the tree structure. Then, the next level of the selective substructure is contained in the upper substructure, denoted as $N_{sel(leaf1)} \subset N_{sel(root)}$, $N_{sel(n)} \subset N_{sel(n-1)}$ ($N_{sel(leaf1)}$ represents the first leaf of the root structure, and $n$ indicates the level of division). According to the order of occurrence, the selective substructures of the same level can be denoted as $N_{sel(n)}^{m}$ ($m$ represents the occurrence order). The mandatory substructure can be divided into prepositive and postpositive substructures, which are denoted as $N_{{}^{\cdot}man(z)}$ and $N_{man^{\cdot}(d)}$. $N_{{}^{\cdot}man(z)}$ and $N_{man^{\cdot}(d)}$ are mandatory substructures that occur before and after the selective structure, respectively ($z$ and $d$ represent the occurrence orders of $N_{{}^{\cdot}man(z)}$ and $N_{man^{\cdot}(d)}$, respectively). It is worth noting that the type of every sub-path in a substructure corresponds to the type of that substructure.

Definition 7 (Perceptible Region). The perceptible region is a set of special substructures in the process model, denoted as $φ_{pM}^{*} \subseteq \{N_{sel(n)}^{m} \cup N_{{}^{\cdot}man(z)} \cup N_{man^{\cdot}(d)}\}$. A subalignment between a subsequence $δ_{j}^{\cdot}$ in $φ_{pM}^{*}$ and any trace $℘_{o}$ in the event log is denoted as $\mathit{Align}_{sub}(℘_{o}, δ_{j}^{\cdot})$. The subalignment set between $φ_{pM}^{*}$ and $℘_{o}$ is $\sum_{j=1}^{|J|} \mathit{Align}_{sub}(℘_{o}, δ_{j}^{\cdot}) = \mathit{Align}_{sub}^{*}(℘_{o}, δ_{j}^{\cdot})$. The smaller the initial deviation location, the larger the number of deviations in $\mathit{Align}_{sub}^{*}(℘_{o}, δ_{j}^{\cdot})$. A substructure $N_{conc}$ in $φ_{pM}^{*}$ is described in Figure 4. $δ_{1}^{\cdot} = (B C D E)$ and $δ_{2}^{\cdot} = (B D C E)$ are included in $N_{conc}$. From Table 3, $\mathit{Align}_{sub}(℘_{o}, δ_{1}^{\cdot})$ is produced by $℘_{o} = (A B C D F G)$ and $δ_{1}^{\cdot} = (B C D E)$. $\mathit{Align}_{sub}(℘_{o}, δ_{2}^{\cdot})$ is produced by $℘_{o} = (A B C D F G)$ and $δ_{2}^{\cdot} = (B D C E)$. According to Definition 5, the locations of the initial deviation in $\mathit{Align}_{sub}(℘_{o}, δ_{1}^{\cdot})$ and $\mathit{Align}_{sub}(℘_{o}, δ_{2}^{\cdot})$ satisfy $v(\mathit{loc}_{id}(\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{1}^{\cdot}))) = 4 > v(\mathit{loc}_{id}(\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{2}^{\cdot}))) = 2$. The costs of $\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{1}^{\cdot})$ and $\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{2}^{\cdot})$ satisfy $\mathit{cost}(\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{1}^{\cdot})) = 1 < \mathit{cost}(\mathit{Align}_{sub}(℘_{1}^{\cdot}, δ_{2}^{\cdot})) = 3$.
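The numbers in this example can be checked with a short sketch. It assumes that the part of the trace falling inside $N_{conc}$ is $(B C D)$; the paper states only the full trace and the two subsequences, so this restriction is an assumption. The location of the initial deviation is taken as the first position at which the two sequences disagree, and the cost is the minimum number of insert and skip deviations under unit cost.

```python
from functools import lru_cache
from typing import Optional, Sequence


def initial_deviation_location(trace: Sequence[str], seq: Sequence[str]) -> Optional[int]:
    """1-based position of the first disagreement, or None if the sequences are equal."""
    for pos, (a, b) in enumerate(zip(trace, seq), start=1):
        if a != b:
            return pos
    if len(trace) == len(seq):
        return None
    return min(len(trace), len(seq)) + 1  # one sequence ended first


def deviation_count(trace: Sequence[str], seq: Sequence[str]) -> int:
    """Minimum number of insert/skip deviations (unit cost 1 each)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        if i == len(trace):
            return len(seq) - j      # remaining skip deviations
        if j == len(seq):
            return len(trace) - i    # remaining insert deviations
        if trace[i] == seq[j]:
            return go(i + 1, j + 1)  # synchronous move, cost 0
        return 1 + min(go(i + 1, j), go(i, j + 1))

    return go(0, 0)


sub_trace = "BCD"  # assumed restriction of (A B C D F G) to the substructure
for name, sub_seq in (("delta_1", "BCDE"), ("delta_2", "BDCE")):
    print(name, initial_deviation_location(sub_trace, sub_seq), deviation_count(sub_trace, sub_seq))
# delta_1 4 1
# delta_2 2 3
```

Under this assumption the sketch reproduces the stated values: the subalignment whose initial deviation occurs earlier (location 2) also has the larger number of deviations (3).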

3. Recognition of the Perceptibility of Business Process

The search for the optimal alignment aligns each trace with all firing sequences and selects the alignments that produce the minimum cost. To improve the efficiency of this search, the non-optimality of some alignments can be determined beforehand according to a specified condition in the perceptible region. The process model is divided into different substructures, and the perceptible region is determined by the behavioral characteristics of these substructures.

3.1. Establishment of Perceptibility Condition

The costs of all alignments corresponding to each trace are obtained, and the alignment with the minimum cost is evaluated (the number of deviations is equivalent to the cost of deviations if and only if the cost of each deviation is “1”). The optimal alignment between the event log and process model consists of all optimal trace alignments. Some non-optimal alignments can be predicted by the location of the initial deviation.

The location of the initial deviation in any alignment can be denoted as $v(\mathit{loc}_{id}(\mathit{Align}(℘_{o}, δ_{j})))$ according to Definition 5. If the location of the initial deviation in an alignment is not the largest among the alignments of a trace with the process model, this alignment must be non-optimal. This is denoted as the perceptibility condition $C_{p} : v(\mathit{loc}_{id}(\mathit{Align}(℘_{o}, δ_{j}))) < v(\mathit{loc}_{id}(\mathit{Align}(℘_{o}, δ'_{j}))) \to \mathit{cost}(\mathit{Align}(℘_{o}, δ_{j})) > \mathit{cost}(\mathit{Align}(℘_{o}, δ'_{j})) \to \mathit{Align}(℘_{o}, δ_{j}) \notin \mathit{Align}_{op}^{*}$ ($\mathit{cost}$ is the cost function). The comparisons of some non-optimal alignments are terminated at the location of the initial deviation when the perceptibility condition $C_{p}$ is satisfied. Therefore, unnecessary comparisons are saved. In order to take advantage of the condition $C_{p}$, it is necessary to determine the perceptible region in the process model. For example, a trace in the event log $℘_{1} = (A B D E K)$ is aligned with all firing sequences of the process model shown in Figure 5. The location of the initial deviation in $\mathit{Align}(℘_{1} = (A B D E K), δ_{1} = (A B G E F K))$ is $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{1}))) = 3$, and the number of deviations is also 3. The location of the initial deviation in $\mathit{Align}(℘_{1} = (A B D E K), δ_{2} = (A B D H I J K))$ is $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{2}))) = 4$, and the number of deviations is also 4. Here $v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{1}))) < v(\mathit{loc}_{id}(\mathit{Align}(℘_{1}, δ_{2})))$ but $\mathit{cost}(\mathit{Align}(℘_{1}, δ_{1})) < \mathit{cost}(\mathit{Align}(℘_{1}, δ_{2}))$, so $\mathit{Align}(℘_{1}, δ_{1})$ may be optimal. This situation violates $C_{p}$.
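The counterexample can be replayed with a small sketch that applies the early termination without checking for a perceptible region. The helper names, the choice to measure the location as one past the common prefix, and the search order (here $δ_{2}$ before $δ_{1}$) are assumptions for illustration; only the two firing sequences quoted in the text are used.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple


def first_mismatch(trace: Sequence[str], seq: Sequence[str]) -> int:
    """Length of the common prefix plus one (one past the end if the sequences are equal)."""
    k = 0
    while k < min(len(trace), len(seq)) and trace[k] == seq[k]:
        k += 1
    return k + 1


def deviation_count(trace: Sequence[str], seq: Sequence[str]) -> int:
    """Minimum number of insert/skip deviations (unit cost 1 each)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        if i == len(trace):
            return len(seq) - j
        if j == len(seq):
            return len(trace) - i
        if trace[i] == seq[j]:
            return go(i + 1, j + 1)
        return 1 + min(go(i + 1, j), go(i, j + 1))

    return go(0, 0)


def pruned_search(trace: Sequence[str], sequences: List[Sequence[str]]) -> Tuple[int, int]:
    """Apply C_p blindly: stop comparing when the location is below the recorded value."""
    recorded = 0  # largest initial-deviation location seen so far
    best = None   # (cost, index of the firing sequence)
    for idx, seq in enumerate(sequences):
        loc = first_mismatch(trace, seq)
        if loc < recorded:
            continue  # judged non-optimal without a full comparison
        recorded = loc
        cost = deviation_count(trace, seq)
        if best is None or cost < best[0]:
            best = (cost, idx)
    return best


trace = "ABDEK"
sequences = ["ABDHIJK", "ABGEFK"]  # delta_2 first, then delta_1 (assumed order)
exhaustive = min((deviation_count(trace, s), i) for i, s in enumerate(sequences))
print(exhaustive)                        # (3, 1): delta_1 is optimal
print(pruned_search(trace, sequences))   # (4, 0): delta_1 was wrongly discarded
```

In this order the pruning discards the cheaper alignment, which is why the condition has to be restricted to a perceptible region before it is used to terminate comparisons.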

3.2. Determination of Perceptible Region

According to the example in Section 3.1, the perceptibility condition is not always satisfied. It is valid if and only if the initial deviation occurs in a specific range. Therefore, it is necessary to determine the perceptible region based on the behavioral characteristics of the various substructures in the process model. The perceptible region is determined in four steps: (i) according to Definition 6, the process model is divided into the various types of mandatory and selective substructures; (ii) the initial node $a_{i}$ and the terminated node $a_{f}$ of the process model are determined and included in the perceptible region; (iii) according to Definition 6, a selective substructure with a fixed number of enabled activities is denoted as $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = f(v)$ ($ea^{*}$ is the set of enabled activities that can occur; $f(v)$ is a fixed value). The condition $C_{p}$ is satisfied by an alignment when the initial deviation occurs in $N_{{}^{\cdot}man(z)}$, $N_{man^{\cdot}(d)}$ or $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = f(v)$; (iv) a selective substructure with an uncertain number of enabled activities, denoted as $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = u(v)$ ($u(v)$ is an uncertain value), does not necessarily satisfy $C_{p}$. $C_{p}$ is satisfied when the number of enabled activities of $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = u(v)$ does not exceed 2 in any case, denoted as $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = u(v) \leq 2$. Therefore, the whole perceptible region is denoted as $φ_{pM}^{*} = \{a_{i} \cup N_{{}^{\cdot}man(z)} \cup (N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = f(v)) \cup (N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = u(v) \leq 2) \cup N_{man^{\cdot}(d)} \cup a_{f}\}$. It is denoted as $M \in φ_{pM}^{*}$ when the process model contains only substructures of the perceptible region. This is the reason why Figure 5 does not satisfy $C_{p}$, namely, $N_{sel(1)}^{2}(\mathit{num}(ea^{*})) = u(v) > 2 \to N_{sel(1)}^{2} \in φ_{\neg pM}^{*}$ ($φ_{\neg pM}^{*}$ is the whole non-perceptible region, which is the set of all substructures that do not belong to the perceptible region $φ_{pM}^{*}$). Figure 6a–c describes the three structures belonging to the perceptible region in the process model, respectively, i.e., $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = f(v)$, $N_{sel(n)}^{m}(\mathit{num}(ea^{*})) = u(v) \leq 2$ and $N_{man}$ ($N_{{}^{\cdot}man(z)}$, $N_{man^{\cdot}(d)}$).

The determination of the perceptible region is described in Algorithm 1. First, the initial and the terminated activities are assigned to the perceptible region (lines 1–3). Second, the two kinds of mandatory substructures are located and assigned to the perceptible region (lines 4–17). Third, each selective substructure whose number of enabled activities is a fixed value is located and assigned to the perceptible region (lines 22–24). A selective substructure whose number of enabled activities is uncertain is assigned to the perceptible region when the number does not exceed 2 (lines 25–27), and it does not belong to the perceptible region when the number exceeds 2 (lines 28–30). Finally, the output of Algorithm 1 is the perceptible range of the process model, denoted as $φ_{pM}^{*}$.

Algorithm 1 Determine perceptible range

Input: process model $M$; initial activity $a_{i}$; terminated activity $a_{f}$; process model substructure $N$; pre-function $\mathit{pre}()$; post-function $\mathit{post}()$; numerical function $\mathit{num}()$; function of fixed value $fv()$; function of uncertain value $uv()$; set of enabled activities $ea^{*}$.

Output: the perceptible region $φ_{pM}^{*}$.

1: If $a_{i}, a_{f} \in M$ and $a_{i} = \mathit{pre}(a^{*} \setminus a_{i})$ and $a_{f} = \mathit{post}(a^{*} \setminus a_{f})$ then
2:     $a_{i}, a_{f} \in φ_{pM}^{*}$
3: end
4: $z = 1$
5: For each $z \in Z$ do
6:     If $N_{{}^{\cdot}man(z)} \in M$ and $N_{{}^{\cdot}man(z)} = \mathit{pre}(N_{sel(n)}^{m})$ then
7:         $N_{{}^{\cdot}man(z)} \in φ_{pM}^{*}$
8:     end
9:     $z = z + 1$
10: end
11: $d = 1$
12: For each $d \in D$ do
13:     If $N_{man^{\cdot}(d)} \in M$ and $N_{man^{\cdot}(d)} = \mathit{post}(N_{sel(n)}^{m})$ then
14:         $N_{man^{\cdot}(d)} \in φ_{pM}^{*}$
15:     end
16:     $d = d + 1$
17: end
18: $n = 1$
19: For $n \in \mathit{num}(n)$ do
20:     $m = 1$
21:     For $m \in \mathit{num}(m)$ do
22:         If $N_{sel(n)}^{m} \in M$ and $\mathit{num}(ea^{*}) = f(v)$ then
23:             $N_{sel(n)}^{m} \in φ_{pM}^{*}$
24:         end
25:         If $N_{sel(n)}^{m} \in M$ and $\mathit{num}(ea^{*}) = u(v)$ then
26:             If $\mathit{num}(ea^{*}) \leq 2$ then
27:                 $N_{sel(n)}^{m} \in φ_{pM}^{*}$
28:             else
29:                 $N_{sel(n)}^{m} \notin φ_{pM}^{*}$
30:             end
31:         end
32:         $m = m + 1$
33:     end
34:     $n = n + 1$
35: end
36: return $φ_{pM}^{*}$
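The classification performed by Algorithm 1 can be sketched in Python as follows. The `Substructure` record, its `kind` strings and the `enabled_counts` field are hypothetical names introduced for this illustration. The sketch reads "the number of enabled activities" as one count per branch of a selective substructure, so that a fixed value means all branches agree; this reading is an assumption, and the model below is invented rather than taken from Figures 5 or 6.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Substructure:
    name: str
    kind: str  # "initial", "final", "mandatory" or "selective"
    enabled_counts: Tuple[int, ...] = ()  # per-branch counts (selective only)


def is_perceptible(sub: Substructure) -> bool:
    """Decide whether one substructure belongs to the perceptible region."""
    if sub.kind in ("initial", "final", "mandatory"):
        return True  # a_i, a_f and pre/post mandatory substructures
    if not sub.enabled_counts:
        raise ValueError(f"selective substructure {sub.name} needs branch counts")
    if len(set(sub.enabled_counts)) == 1:
        return True  # fixed number of enabled activities, f(v)
    return max(sub.enabled_counts) <= 2  # uncertain number, u(v) <= 2


def perceptible_region(model: List[Substructure]) -> List[str]:
    """Names of all substructures that fall in the perceptible region."""
    return [sub.name for sub in model if is_perceptible(sub)]


# Hypothetical model with one non-perceptible selective substructure.
model = [
    Substructure("a_i", "initial"),
    Substructure("N_pre_man", "mandatory"),
    Substructure("N_sel_fixed", "selective", (2, 2)),
    Substructure("N_sel_small", "selective", (1, 2)),
    Substructure("N_sel_large", "selective", (3, 4)),
    Substructure("N_post_man", "mandatory"),
    Substructure("a_f", "final"),
]
print(perceptible_region(model))
# ['a_i', 'N_pre_man', 'N_sel_fixed', 'N_sel_small', 'N_post_man', 'a_f']
```

The sketch replaces the nested loops over levels and occurrence orders with a single pass over a flat list, which is enough to show the three decision rules; nested substructures need the additional rules of Algorithm 2.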

The substructure of a real-life business process can contain various nested types [41]. Four common nested substructures are described in Figure 7. The perceptible nested substructure needs to be analyzed according to the specific situation.

The selective substructure $N_{sel(2)}^{1}$ in Figure 7a is contained in the selective substructure $N_{sel(1)}^{1}$, and the number of enabled activities of $N_{sel(1)}^{1}$ is 1. Thus, $N_{sel(1)}^{1} \in φ_{pM}^{*}$. The selective substructure $N_{sel(1)}^{1}$ is contained in the mandatory substructure $N_{man}$, shown in Figure 7b. The mandatory substructure $N_{{}^{\cdot}man}$ is regarded as a special selective substructure $N_{man(sel(1))}^{1}$. $N_{man(sel(1))}^{1}(\mathit{num}(ea^{*})) \neq f(v)$ and $\max(N_{man(sel(1))}^{1}(\mathit{num}(ea^{*}))) = 4 > 2$, so $N_{man(sel(1))}^{1} \in φ_{\neg pM}^{*}$ ($\max$ is the maximum function). As shown in Figure 7c, two mandatory substructures $N_{man}$ are contained in the selective substructure $N_{sel(1)}^{1}$. The selective substructure $N_{sel(2)}^{1}$ is contained in one of the mandatory substructures $N_{man}$. $N_{sel(1)}^{1}(\mathit{num}(ea^{*})) \neq f(v)$ and $\max(N_{sel(1)}^{1}(\mathit{num}(ea^{*}))) = 4 > 2$, so $N_{sel(1)}^{1} \in φ_{\neg pM}^{*}$. Two selective substructures $N_{sel(1)}^{1/2}$ and $N_{sel(1)}^{2/1}$ are contained in the mandatory substructure $N_{man}$, shown in Figure 7d, denoted as $N_{man(sel(1))}^{1/2}$ and $N_{man(sel(1))}^{2/1}$. $m = 1/2$ and $m = 2/1$, so the occurrence order of $N_{sel(1)}^{1/2}$ and $N_{sel(1)}^{2/1}$ is uncertain according to Definition 6. $N_{sel(2)}^{1/2}$ and $N_{sel(2)}^{2/1}$ are contained in $N_{man(sel(1))}^{1/2}$ and $N_{man(sel(1))}^{2/1}$, respectively. The structural property of $N_{man}$ is regarded as the selective substructure $N_{man(sel)}$. $N_{man(sel)}(\mathit{num}(ea^{*})) \neq f(v)$ and $\max(N_{man(sel)}(\mathit{num}(ea^{*}))) = 5 > 2$ in $N_{man(sel)}$, so $N_{man(sel)} \in φ_{\neg pM}^{*}$. To sum up, a perceptible nested substructure needs to obey the following two rules: (i) when the outermost structure is a selective substructure, the perceptibility condition of the selective substructure is applied directly; (ii) a mandatory substructure is judged to be perceptible if it does not contain a selective substructure. Otherwise, the mandatory substructure is judged as if it were a selective substructure.

The determination of perceptible nested substructure is described in Algorithm 2. The perceptible region is determined according to the normal rule of selective substructure when a mandatory substructure is contained in a selective substructure (lines 2–14). The mandatory substructure is regarded as a selective substructure if a selective substructure is contained in this mandatory substructure (lines 15–17).

Algorithm 2 Determine the perceptibility of nested substructures

Input: process model M, mandatory concurrency substructure N_{man}, selective substructure N_{sel}, the set of enabled activities ea^{*}, fixed-value function fv(), maximum-value function \max(), and function j_{N}() that judges whether substructure N is a perceptible region.

Output: judgment result yes or no.
1: \forall N_{man} \in M, \forall N_{sel} \in M
2: If N_{man} \subseteq N_{sel} and N_{sel(man)} \in M then
3:     If ea^{*} \in N_{sel(man)} then
4:         If fv(ea^{*}) = yes then
5:             j_{N}(N_{sel(man)}) = yes
6:         else
7:             If \max(ea^{*}) \leq 2 then
8:                 j_{N}(N_{sel(man)}) = yes
9:             else
10:                 j_{N}(N_{sel(man)}) = no
11:             end
12:         end
13:     end
14: end
15: If N_{sel} \subseteq N_{man} and N_{man(sel)} \in M then
16:     N_{man(sel)} ⊳ N_{sel}
17:     j_{N}(N_{man(sel)}) = j_{N}(N_{sel})
18: end
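The two rules for nested substructures can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Substructure` class, its field names, and the representation of num(ea^{*}) as a list of possible enabled-activity counts are hypothetical. Only the decision rules (fixed value, or maximum not exceeding 2, and judging a mandatory substructure that contains a selective one as selective) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Substructure:
    """Hypothetical container for one substructure of the process model."""
    kind: str                  # "sel" (selective) or "man" (mandatory)
    enabled_counts: List[int]  # assumed: possible values of num(ea*), e.g. one per branch
    children: List["Substructure"] = field(default_factory=list)


def contains_selective(s: Substructure) -> bool:
    """True if a selective substructure is nested anywhere inside s."""
    return any(c.kind == "sel" or contains_selective(c) for c in s.children)


def selective_rule(counts: List[int]) -> bool:
    """Perceptible if num(ea*) is a fixed value, or if its maximum is at most 2."""
    return len(set(counts)) == 1 or max(counts) <= 2


def is_perceptible(s: Substructure) -> bool:
    if s.kind == "sel":
        return selective_rule(s.enabled_counts)  # rule (i)
    if not contains_selective(s):
        return True                              # rule (ii): no nested selective part
    return selective_rule(s.enabled_counts)      # rule (ii): judged as selective


inner_man = Substructure("man", [2])
outer_sel = Substructure("sel", [1, 4], [inner_man])  # illustrative counts, max = 4 > 2
print(is_perceptible(outer_sel))  # False
print(is_perceptible(inner_man))  # True
```

The counts `[1, 4]` are made up so that the maximum matches the value 4 quoted for the selective substructure in the example; the paper states only the maximum, not the per-branch values.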

3.3. Reverse Search for Perceived Range

There are many types of substructures in process models, which are divided according to the occurrence possibility of each path in the substructure. Therefore, the type of each substructure between the initial and end nodes is determined for accurately detecting the perceptible and non-perceptible regions in the process model. The perceptible region in the process model is searched back-to-front according to Definition 6, which is called a reverse search. Figure 8 describes the process of a reverse search. Since

N_{s e l (n)}^{m} \subset N_{s e l (n - 1)}^{m}

,

N_{s e l (n)}^{m}

and

N_{s e l (n - 1)}^{m}

are regarded as a whole.

N_{s e l (n - 1)}^{m} (n u m (e a^{*})) = 2

indicates that the number of enabled activities in

N_{s e l (n - 1)}^{m}

is a constant of 2, that is,

N_{s e l (n - 1)}^{m} \in φ_{p M}^{*}

.

\max (N_{s e l (n - 1)}^{m - 1} (n u m (e a^{*}))) = 3

indicates that one branch of

N_{s e l (n - 1)}^{m - 1}

contains only a hidden transition, while the number of enabled activities on the other branch is 3, i.e.,

N_{s e l (n - 1)}^{m - 1} \in φ_{\neg p M}^{*}

.

\max (N_{s e l (n - 1)}^{m - 2} (n u m (e a^{*}))) = 2

indicates that the maximum number of enabled activities of

N_{s e l (n - 1)}^{m - 2}

is 2, that is,

N_{s e l (n - 1)}^{m - 2} \in φ_{p M}^{*}

.

N_{s e l (n - 1)}^{m - 3} (n u m (e a^{*})) = 1

indicates that the number of enabled activities in

N_{s e l (n - 1)}^{m - 3}

is a constant of 1, that is,

N_{s e l (n - 1)}^{m - 3} \in φ_{p M}^{*}

.

N_{{}^{\cdot}m a n (z)} (n u m (e a^{*})) = 4

is the number of enabled activities of

N_{{}^{\cdot}m a n (z)}

, which is a constant, that is,

N_{{}^{\cdot}m a n (z)} \in φ_{p M}^{*}

. Thus, the perceptible range in Figure 8 is

{φ_{p M}^{*}}^{'} = M \setminus (N_{s e l (n - 1)}^{m - 2} \cup N_{s e l (n - 1)}^{m - 1})

. The perceptibility condition of

C_{p}

can be used to determine the optimal alignment when the initial deviation occurs in the perceptible region.

The reverse search for the perceptible region is described in Algorithm 3. First, the substructures belonging to the process model are determined (line 1). If

N_{s e l (n)}^{m}

is not contained in

N_{m a n^{\cdot} (d)}

, then

M_{φ_{p M}^{*}} \leftarrow N_{m a n^{\cdot} (d)}

(lines 3–6). Otherwise,

N_{m a n^{\cdot} (d)}

is regarded as

N_{s e l (n)}^{m}

, and the algorithm jumps to line 10 (lines 7–9). If the number of enabled activities of

N_{s e l (n)}^{m}

, examined from back to front, is a fixed value, then

M_{φ_{p M}^{*}} \leftarrow N_{s e l (n)}^{m}

(lines 13–17). If the number of enabled activities of

N_{s e l (n)}^{m}

is not a fixed value but the maximum number of enabled activities does not exceed 2, then

M_{φ_{p M}^{*}} \leftarrow N_{s e l (n)}^{m}

(lines 18–21). The level

n

is reduced by 1 if the next substructure satisfies

N_{s e l (n - 1)}^{m} \supset N_{s e l (n)}^{m}

(lines 22–24). The occurrence order

m

is reduced by 1 if the substructure

N_{s e l (n)}^{m - 1}

is the preceding substructure of

N_{s e l (n)}^{m}

(lines 25–27).

M_{φ_{p M}^{*}} \leftarrow N_{m a n^{\cdot} (z)}

if

N_{s e l (n)}^{m}

is not contained in

N_{m a n^{\cdot} (z)}

(lines 29–32). Otherwise,

N_{{}^{\cdot}m a n (z)}

is regarded as

N_{s e l (n)}^{m}

, and the algorithm returns to line 10 (lines 33–35). Finally, the output is the perceptible region of the process model

M_{φ_{P M}^{*}}

(line 37).

Algorithm 3 Non-perceivable areas are eliminated by reverse search

Input: process model M, mandatory substructure N_{man}, selective substructure N_{sel}, perceivable range of process model M_{φ_{PM}^{*}}, initial activity a_{i}, final activity a_{f}, perceptible condition C_{p}, fixed value fv, and function j() that judges whether substructure N is a perceivable region.

Output: perceivable range of process model M_{φ_{PM}^{*}}.
1: \forall N_{man} \in M, \forall N_{sel} \in M
2: d = |d|
3: For each d \geq 1 do
4:     If N_{sel(n)}^{m} ⊄ N_{man^{\cdot}(d)} then
5:         j(N_{man^{\cdot}(d)}) = yes
6:         M_{φ_{PM}^{*}} \leftarrow N_{man^{\cdot}(d)} \land C_{p}(N_{man^{\cdot}(d)}) = true
7:     else
8:         N_{man^{\cdot}(d)} ⊳ N_{sel(n)}^{m} and jump to line 10
9:     end
10:     d = d - 1
11: end
12: n = num(n) \land m = num(m)
13: For each n \geq 1 \land m \geq 1 do
14:     If N_{sel(n)}^{m} \in M \land num(ea^{*}) = fv \land ea^{*} \in N_{sel(n)}^{m} then
15:         j(N_{sel(n)}^{m}) = yes
16:         M_{φ_{PM}^{*}} \leftarrow N_{sel(n)}^{m} \land C_{p}(N_{sel(n)}^{m}) = true
17:     end
18:     If N_{sel(n)}^{m} \in M \land num(ea^{*}) \neq fv \land 0 \leq num(ea^{*}) \leq 2 \land ea^{*} \in N_{sel(n)}^{m} then
19:         j(N_{sel(n)}^{m}) = yes
20:         M_{φ_{PM}^{*}} \leftarrow N_{sel(n)}^{m} \land C_{p}(N_{sel(n)}^{m}) = true
21:     end
22:     If \exists N_{sel(n-1)}^{m} \supset N_{sel(n)}^{m} then
23:         n = n - 1
24:     end
25:     If \exists N_{sel(n)}^{m-1} = pre(N_{sel(n)}^{m}) then
26:         m = m - 1
27:     end
28: end
29: For each z \geq 1 do
30:     If N_{sel(n)}^{m} ⊄ N_{man^{\cdot}(z)} then
31:         j(N_{man^{\cdot}(z)}) = yes
32:         M_{φ_{PM}^{*}} \leftarrow N_{man^{\cdot}(z)} \land C_{p}(N_{man^{\cdot}(z)}) = true
33:     else
34:         N_{man^{\cdot}(z)} ⊳ N_{sel(n)}^{m} and return to line 10
35:     end
36: end
37: Return M_{φ_{PM}^{*}}
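The back-to-front traversal can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure of Algorithm 3 itself: it assumes a flat list of substructures in model order, omits the bookkeeping of nesting level n and occurrence order m, and treats every mandatory substructure as free of nested selective parts. The names `N1` to `N4`, `N_man_z` and all counts are hypothetical.

```python
from typing import List, Tuple

# (name, kind, possible numbers of enabled activities); all values are illustrative
Entry = Tuple[str, str, List[int]]


def reverse_search(substructures: List[Entry]) -> List[str]:
    """Visit substructures from the end node towards the start node and
    collect the names of those judged perceptible."""
    perceptible = []
    for name, kind, counts in reversed(substructures):
        fixed = len(set(counts)) == 1      # num(ea*) is a fixed value
        small = max(counts) <= 2           # at most two enabled activities
        if kind == "man" or fixed or small:
            perceptible.append(name)
    return perceptible


model = [
    ("N_man_z", "man", [4]),
    ("N1", "sel", [1]),
    ("N2", "sel", [1, 2]),
    ("N3", "sel", [0, 3]),   # one silent branch, three enabled activities on the other
    ("N4", "sel", [2]),
]
print(reverse_search(model))  # ['N4', 'N2', 'N1', 'N_man_z']
```

Only `N3` is excluded, because its number of enabled activities is neither fixed nor bounded by 2.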

4. Perceptible Search for Optimal Alignment

To find the optimal alignment, each trace is aligned with all firing sequences, and the alignment with the least deviation cost is selected. For a complex business process, this brute-force search requires a huge amount of computation. We therefore propose perceiving the non-optimality of an alignment from the location of its initial deviation, which avoids some unnecessary comparisons.

4.1. Perceptible Search Algorithm

The optimal alignments of the event log can be regarded as a sum of the optimal alignment of each trace since the event log contains different traces. The search for optimal alignment is denoted as

S e a r c h (A l i g n_{o p}^{*} (L, M)) = S e a r c h (\sum_{o = 1}^{| O |} \sum_{o p = 1}^{| O P |} A l i g n (℘_{o}, δ_{o p}))

. The default location of the initial deviation of each trace is recorded as 0, denoted as

v (l o c_{i d} (A l i g n (℘_{o}, M))) = 0

. According to the perceptibility condition

C_{p}

, an alignment cannot be judged to be non-optimal when

v (l o c_{i d} (A l i g n (℘_{o}, δ_{j}))) \geq v (l o c_{i d} (A l i g n (℘_{o}, M)))

. Therefore, the final cost of

A l i g n (℘_{o}, δ_{j})

needs to be obtained and evaluated together with the other alignments of

℘_{o}

, namely,

(A l i g n^{*} (℘_{o}, M) \leftarrow A l i g n (℘_{o}, δ_{j})) \land h_{o p} (A l i g n^{*} (℘_{o}, M))

(

h_{o p}

is the evaluation function of the minimum cost). In this case,

v (l o c_{i d} (A l i g n (℘_{o}, δ_{j})))

is assigned to the record value

v (l o c_{i d} (A l i g n (℘_{o}, M)))

, and acts as the perceptible standard for the next alignment

A l i g n (℘_{o}, δ_{j + 1})

. Conversely, an alignment cannot be the optimal alignment of

℘_{o}

when

v (l o c_{i d} (A l i g n (℘_{o}, δ_{j}))) < v (l o c_{i d} (A l i g n (℘_{o}, M)))

. Therefore, a complete comparison of

A l i g n (℘_{o}, δ_{j})

is not needed, and

v (l o c_{i d} (A l i g n (℘_{o}, M)))

remains unchanged. Note that the perceptibility condition applies only when the initial deviation occurs in the perceptible region. The overall process of the perceptible algorithm is described in Figure S1.

The above search for optimal alignment can be denoted as

A l i g n_{o p}^{*} (L, M) \leftarrow P S (\sum_{o = 1}^{| O |} \sum_{j = 1}^{| J |} A l i g n (℘_{o}, δ_{j}))

. The search can jump directly to the next trace

℘_{o + 1}

when

℘_{o}

is consistent with

δ_{j}

, and this alignment is regarded as the optimal alignment of

℘_{o}

. The concrete procedures and the proof of perceptible search (

P S

) are described in the Supplementary Materials.

The process model

M

with the perceptible location is described in Figure 9, and the non-perceptible region is not contained in

M

, namely,

I d e (φ_{\neg p M}^{*}) = \emptyset

. All the firing sequences of

M

are recorded in Table 4.

The event log is set to L = {(a,b,c,d,e,f,h,i,l),(a,b,c,d,e,f,j,k,l)}.

L

is aligned with

M

, and the search process of the optimal alignment is depicted in Figure 10. Figure 10a,b shows the search processes for traces

℘_{1}

and

℘_{2}

, respectively. From Figure 10a, the location of the initial deviation in the first and the third alignment is greater than or equal to the previously recorded value. These two alignments need to be evaluated, that is,

v (l o c_{i d} (A l i g n^{*'} (℘_{1}, δ_{1, 3}))) \geq v (l o c_{i d} (A l i g n^{*'} (℘_{1}, M))) \to W c o m p a r e (A l i g n^{*'} (℘_{1}, δ_{1, 3}))

. The location of the initial deviation in other alignments is smaller than the previously recorded value. These alignments are directly judged to be non-optimal, denoted as

v (l o c_{i d} (A l i g n^{* ''} (℘_{1}, δ^{*} \setminus δ_{1, 3}))) < v (l o c_{i d} (A l i g n^{* ''} (℘_{1}, M))) \to P c o m p a r e (A l i g n^{* ''} (℘_{1}, δ^{*} \setminus δ_{1, 3})) \to A l i g n^{* ''} (℘_{1}, δ^{*} \setminus δ_{1, 3}) \notin A l i g n_{o p}^{*}

(

P c o m p a r e

is the function of partial comparison). For example,

v (l o c_{i d} (A l i g n (℘_{1}, δ_{2}))) = 6 < 7

and

A l i g n (℘_{1}, δ_{2}) = P c o m p a r e (A l i g n (℘_{1}, δ_{2}))

. The comparison of

A l i g n (℘_{1}, δ_{2})

is terminated when the initial deviation occurs. From Figure 10b, the first alignment has no deviation, so it is directly judged as the optimal alignment of

℘_{2}

.

Algorithm 4 describes the perceptible search for optimal alignment. The set of optimal alignments is initialized to

\emptyset

(line 1). The recorded location of the initial deviation is set to 0 (line 4). If the initial deviation lies in the perceptible region and its location is greater than or equal to the previously recorded value, the alignment is completely compared and the location is assigned to the recorded value (lines 7–9). If the location of the initial deviation is smaller than the previously recorded value, the recorded value is maintained and the comparison of the alignment is terminated (lines 10–13). If the initial deviation occurs in a non-perceptible region, the recorded value is maintained and the alignment is completely compared (lines 15–17). If no deviation is found between the trace and the firing sequence, the alignment is regarded as the optimal alignment of the trace (lines 18–21). All alignments that need complete comparison are assigned to the alignment set

A l i g n^{*}

(line 22). The optimal alignment set

A l i g n_{o p}^{*}

is evaluated by

h_{o p}

(line 25). Finally, the optimal alignment set between the event log and the process model is output (line 28).

Algorithm 4 Search the optimal alignment based on the location of initial deviation

Input: event log L, process model M, trace ℘_{o}, firing sequence δ_{j}, the value of the initial deviation location v(loc_{id}(℘_{o}, M)), the perceptible range φ_{pM}^{*}, the non-perceptible range φ_{\neg pM}^{*}, the evaluation function of the optimal alignment h_{op}, the whole comparison function of an alignment Wcompare, the partial comparison function of an alignment Pcompare, and the set of optimal alignments Align_{op}^{*}.

Output: the set of optimal alignments Align_{op}^{*}.
1: Align_{op}^{*} = \emptyset
2: o = 1
3: For each o \in O do
4:     v(loc_{id}(℘_{o}, M)) = 0
5:     j = 1
6:     For each j \in J do
7:         If cost(Align(℘_{o}, δ_{j})) > 0 \land loc_{id}(Align(℘_{o}, δ_{j})) \subseteq N_{man/sel} \land N_{man/sel} \in φ_{pM}^{*} then
8:             If v(loc_{id}(Align(℘_{o}, δ_{j}))) \geq v(loc_{id}(Align(℘_{o}, M))) = 0 then
9:                 Wcompare(Align(℘_{o}, δ_{j})) \land v(loc_{id}(Align(℘_{o}, M))) \leftarrow v(loc_{id}(Align(℘_{o}, δ_{j})))
10:             else
11:                 |Pcompare(Align(℘_{o}, δ_{j}))| = v(loc_{id}(Align(℘_{o}, δ_{j})))
12:                 Pcompare(Align(℘_{o}, δ_{j})) \land v(loc_{id}(Align(℘_{o}, M))) \leftarrow v(loc_{id}(Align(℘_{o}, M))) \land jump to line 24
13:             end
14:         end
15:         If cost(Align(℘_{o}, δ_{j})) > 0 \land loc_{id}(Align(℘_{o}, δ_{j})) \subseteq N_{man/sel} \land N_{man/sel} \in φ_{\neg pM}^{*} then
16:             Wcompare(Align(℘_{o}, δ_{j})) \land v(loc_{id}(Align(℘_{o}, M))) \leftarrow v(loc_{id}(Align(℘_{o}, M)))
17:         end
18:         If cost(Align(℘_{o}, δ_{j})) = 0 then
19:             Align_{op}^{*} \leftarrow Align(℘_{o}, δ_{j})
20:             jump to line 26
21:         end
22:         Align^{*} \leftarrow Wcompare(Align(℘_{o}, δ_{j}))
23:         j = j + 1
24:     end
25:     Align_{op}^{*} \leftarrow h_{op}(Align^{*})
26:     o = o + 1
27: end
28: return Align_{op}^{*}
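The control flow of the perceptible search can be sketched in Python as follows. This is a hedged sketch under several assumptions, not the OPS-Align plug-in: the whole model is assumed to be perceptible (so the non-perceptible branch is omitted), the location of the initial deviation is taken to be the first position at which the trace and the firing sequence differ, and the alignment cost is approximated by the number of log moves plus model moves computed from a longest common subsequence. The function names and the three firing sequences are hypothetical; Table 4 is not reproduced here.

```python
from typing import List, Sequence, Tuple


def initial_deviation_location(trace: Sequence[str], firing: Sequence[str]) -> int:
    """1-based position of the first mismatch; 0 if the sequences are identical."""
    for i, (x, y) in enumerate(zip(trace, firing), start=1):
        if x != y:
            return i
    if len(trace) == len(firing):
        return 0
    return min(len(trace), len(firing)) + 1


def alignment_cost(trace: Sequence[str], firing: Sequence[str]) -> int:
    """Assumed unit-cost stand-in: log moves + model moves, via LCS length."""
    prev = [0] * (len(firing) + 1)
    for x in trace:
        cur = [0]
        for j, y in enumerate(firing, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(trace) + len(firing) - 2 * prev[-1]


def perceptible_search(trace: Sequence[str],
                       firings: List[Sequence[str]]) -> Tuple[Sequence[str], int, list]:
    """Return (best firing sequence, its cost, firing sequences that were pruned).
    Assumes at least one firing sequence is supplied."""
    recorded = 0      # recorded location of the initial deviation
    candidates = []   # alignments that were completely compared
    pruned = []       # alignments terminated at their initial deviation
    for firing in firings:
        loc = initial_deviation_location(trace, firing)
        if loc == 0:                      # perfect fit: move on to the next trace
            return firing, 0, pruned
        if loc >= recorded:               # cannot be ruled out: compare completely
            candidates.append((alignment_cost(trace, firing), firing))
            recorded = loc
        else:                             # perceived as non-optimal
            pruned.append(firing)
    cost, best = min(candidates, key=lambda c: c[0])
    return best, cost, pruned


firings = ["abcdefjkl", "abcdegjkl", "abcdefhkl"]   # hypothetical firing sequences
print(perceptible_search("abcdefhil", firings))     # ('abcdefhkl', 2, ['abcdegjkl'])
print(perceptible_search("abcdefjkl", firings))     # ('abcdefjkl', 0, [])
```

For the first trace, the initial deviations occur at positions 7, 6 and 8, so the second alignment is terminated early while the first and third are compared completely. The second trace fits the first firing sequence exactly, and the search stops immediately.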

4.2. Example of Industrial Application

The behavior recorded in the initial process is often different from that described in the current information system due to the larger flow and complicated properties of coal slime water. The degradation performance of anionic polyacrylamide (HPAM) may be affected by different operational conditions. Therefore, the biological coal-washing process should be regarded as a business process that executes a specific goal. The deviation elements are detected in the actual biological coal-washing process, providing accurate technical support for the subsequent optimization. Figure 11 and Figure 12 describe the current process and initial model in the biological coal-washing using Petri net language, respectively. Table 5 lists annotations of the letter labels in Figure 11 and Figure 12.

As shown in Figure 11, let trace

℘_{o} = (A B C D E G K N P Q S T U V P Q R W)

be recorded in

L

, and then,

℘_{o}

is compared with

M

in Figure 12. The two firing sequences

δ_{j} = (A B C D E P Q U T W)

and

δ_{j^{'}} = (A B C D F P Q T U W)

are extracted to construct the two alignments

A l i g n (℘_{o}, δ_{j})

and

A l i g n (℘_{o}, δ_{j^{'}})

, respectively. According to the judgment of the perceptible region in Section 3.2,

M \in φ_{p M}^{*}

is known.

A l i g n (℘_{o}, δ_{j^{'}})

can be determined to be non-optimal when the first deviation

E s e r t (l (e_{5})) = E

occurs due to

v (l o c_{i d} (A l i g n (℘_{o}, δ_{j}))) = 6 > v (l o c_{i d} (A l i g n (℘_{o}, δ_{j^{'}}))) = 5

. According to Theorem S1 in the Supplementary Materials, there exists

δ_{j^{″}} = (A B C D E P Q T U W)

and it makes

A l i g n (℘_{o}, δ_{j^{″}})

superior to

A l i g n (℘_{o}, δ_{j})

and

A l i g n (℘_{o}, δ_{j^{'}})

. Therefore, the efficiency of deviation detection in actual operation is effectively improved.
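The locations quoted in this example can be checked with a few lines of Python. The sketch assumes that the location of the initial deviation is the first position at which the trace and the firing sequence differ, and that the three firing sequences are examined in the order shown; both are assumptions made for illustration, and `first_deviation` is a hypothetical helper.

```python
def first_deviation(trace: str, firing: str) -> int:
    """1-based index of the first position where the sequences differ (0 if none)."""
    for i, (x, y) in enumerate(zip(trace, firing), start=1):
        if x != y:
            return i
    return 0 if len(trace) == len(firing) else min(len(trace), len(firing)) + 1


trace = "ABCDEGKNPQSTUVPQRW"
delta_j, delta_j1, delta_j2 = "ABCDEPQUTW", "ABCDFPQTUW", "ABCDEPQTUW"

recorded = 0  # recorded location of the initial deviation
for name, firing in [("delta_j", delta_j), ("delta_j'", delta_j1), ("delta_j''", delta_j2)]:
    loc = first_deviation(trace, firing)
    decision = "complete comparison" if loc >= recorded else "terminated as non-optimal"
    recorded = max(recorded, loc)  # unchanged when the alignment is terminated
    print(name, loc, decision)
# delta_j 6 complete comparison
# delta_j' 5 terminated as non-optimal
# delta_j'' 6 complete comparison
```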

5. Evaluation

The experiments were performed on a 64-bit Windows 10 computer with an Intel(R) Core(TM) i5 processor at 2.11 GHz, 8 GB of memory and Python 3.7. The

A^{*}

algorithm needed to make the complete comparison of all alignments between the event log and process model, which was denoted as

A^{*} - A l i g n

.

O I A^{*} - A l i g n

directly jumped to the alignment of the next trace if and only if an alignment that fit perfectly was found by it. The deviation perception proposed in this paper was denoted as

I D P - A l i g n

, and its scalability and effectiveness were verified by the performance comparison of the above three methods.

5.1. The Operation Interface of OPS-Align Plug-In

A^{*} - A l i g n

and

O I A^{*} - A l i g n

could be realized through the source code in the ProM framework, but

I D P - A l i g n

could not. To ensure the fairness of the experimental results, we wrote an OPS-Align plug-in in Python to evaluate the search workload of the optimal alignment.

Figure 13 shows the OPS-Align plug-in interface. Input and output information was located on the left of the interface, and parts of the program code were located on the right. As shown in Figure 13,

v a l u e (O)

was the total number of trace cases,

v a l u e (J)

was the total number of firing sequences,

P R

was the perceptible range,

S - M e t h o d

was the current method,

E L - E x

was the event log used in the experiment,

M - E x

was the process model used in the experiment and

S - T i m e

was the search time.

5.2. Scalability

In order to verify the improvement in various business processes,

A^{*} - A l i g n

and

I D P - A l i g n

were evaluated on data sets with different network structures.

5.2.1. Data Sets of Artificial and Real-Life Business Process

Four artificial and two real-life data sets were used in the experiment. Each data set consisted of ten different event logs and a corresponding initial model. The event logs were generated from the initial model by introducing redundant activities, lost activities, and dislocations between activities or cyclic substructures.

Artificial data set.

Four artificial data sets were generated from two different artificial business processes. These two business processes were considered initial models

M_{a 1}

and

M_{a 2}

, respectively. The non-perceptible range was not set in

M_{a 1}

and

M_{a 2}

. Each artificial data set contained the selective substructure or the cyclic substructure.

Real-life data set.

As shown in Figure 14, the auto-claim process mined by the ProM framework was regarded as the initial model

M_{r}

in the two real-life data sets. The activities of M, O, R and T were removed so that the non-perceptible region was not contained in

M_{r}

. The selective substructure and the cyclic substructure were contained in each real-life data set, so these data sets had a universal network structure.

5.2.2. Experimental Results on Different Network Structures

Artificial data sets with selective substructures.

Figure 15 describes the performance of A^* − Align and IDP − Align in two data sets containing a selective substructure. The X and Y axes represent the number of aligned traces and the search time of the optimal alignment, respectively. Each trace had to be compared with all firing sequences, so the number of aligned traces refers to the total number of trace comparisons involved in the alignment process, denoted as

| T r a c e_{a l i g n} | = \sum_{o = 1}^{| O |} ℘_{o} \times n \times n u m (J)

(

n

represents the number of repetitions of

℘_{o}

, and

n u m (J)

is the total number of firing sequences). Each event log had 200 aligned traces, and ten data sets were analyzed, as shown in Figure 15a,b. Each data set was constructed of ten event logs and a corresponding initial model, that is,

D_{a 1}^{*} (L_{a (1, \dots, 10)}, M_{a 1})

and

D_{a 2}^{*} ({L^{″}}_{a (1, \dots, 10)}, M_{a 1})

. The average search time over the ten data sets was taken to ensure the reliability of the experimental result. The business processes in Figure 15a,b contained 11 and 23 activities, respectively. The trace lengths in the two data sets ranged from 6 to 8 and from 14 to 20, respectively. The lengths of the firing sequences in the two data sets were 7 and 16, respectively. Compared with A^* − Align, the search time using IDP − Align was reduced by 207 ms (Figure 15a) and by 1302 ms (Figure 15b). Figure 15 shows that the larger the data set, the more activities could be automatically eliminated by perception, and thus the greater the difference in search time.
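The count of aligned traces defined in this paragraph can be written as a one-line computation. The sketch below reads the formula as: for every distinct trace, multiply its number of repetitions by the number of firing sequences, then sum over all traces. The function name and the example numbers are hypothetical; they are chosen only so that the total equals the 200 aligned traces mentioned for each event log and are not taken from the paper's logs.

```python
from typing import Iterable


def aligned_trace_count(repetitions: Iterable[int], num_firing_sequences: int) -> int:
    """Sum over distinct traces of (repetitions of the trace) x num(J)."""
    return sum(n * num_firing_sequences for n in repetitions)


# Four distinct traces repeated 10, 5, 3 and 2 times, aligned with 10 firing sequences.
print(aligned_trace_count([10, 5, 3, 2], 10))  # 200
```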

Artificial data sets with different cyclic substructures.

Each event log had 300 aligned traces, and ten data sets were analyzed, as shown in Figure 16a,b. Each data set was constructed of ten event logs and a corresponding initial model, that is,

D_{a 3}^{*} ({L^{″}}_{a (1, \dots, 10)}, M_{a 2})

and

D_{a 4}^{*} ({L^{‴}}_{a (1, \dots, 10)}, M_{a 2})

. The average search time over the ten data sets was taken to ensure the reliability of the experimental result. The business processes in Figure 16a,b contained 10 and 11 activities, respectively. The maximum length of the trace and firing sequence could not be determined due to the cyclic substructure in each data set. The cyclic substructures in Figure 16a,b were the self-loop and the loop of three activities, respectively. Compared with A^* − Align, the search time using IDP − Align was reduced by 2700 ms (Figure 16a) and by 7908 ms (Figure 16b). The size of the alignment comparison was enlarged by the cyclic substructure, so more elements were automatically eliminated by perception (Figure 16b). Thus, the improvement in search time over A^* − Align depended on the size of the data set.

Real-life data sets with universal network structures.

The real-life data sets containing both selective substructures and cyclic substructures were evaluated. The cyclic substructures contained in Figure 17a,b were the self-loop and normal loop, respectively. Each event log had 1020 and 780 aligned traces, and ten data sets were analyzed as shown in Figure 17a,b, respectively. Each data set was constructed of ten event logs and a corresponding initial model

M_{r}

, that is,

D_{r 1}^{*} (L_{r (1, \dots, 10)}, M_{r})

and

D_{r 2}^{*} ({L^{'}}_{r (1, \dots, 10)}, M_{r})

. The average search time of ten data sets was taken to ensure the authenticity of the experimental result. The search time using IDP − Align was reduced by 37,752 ms (in Figure 17a), while it was reduced by 24,600 ms (in Figure 17b) compared to A^* − Align. This was due to the size of the data set and cyclic substructure.

To sum up, IDP − Align significantly shortened the search time on large data sets across various network structures. Figure 18 shows the experimental results of eight different data sets. The parameters were the search time, the alignment proportion of partial comparisons and the number of aligned traces. The alignment proportion of partial comparisons was the ratio of automatically eliminated non-optimal alignments. Within the same data set, the larger the alignment proportion of partial comparisons, the more pronounced the reduction in search time relative to A^* − Align. As shown in Figure 18, the search time for data sets with various network structures could be effectively reduced using IDP − Align compared with A^* − Align.

5.2.3. Experimental Results on Biological Coal-Washing Data Set

The biological coal-washing process was simulated as event logs, that is,

L_{B i o}

without loop structure and

{L_{B i o}}^{'}

with loop structure, respectively. The initial model is constructed by a simple process,

M_{B i o}

(

M_{B i o} \in φ_{p M}^{*}

) and

{M_{B i o}}^{'}

(

{M_{B i o}}^{'} \in φ_{p M}^{*}

), consisting of basic steps, respectively. The points on each line represent the search time and the proportion of time difference in data sets of different sizes (see Figure 19). As can be seen from Figure 19, both the search time and the proportion of time difference increase with the size of the data set.

From Figure 19a,

D_{B i o 1}

is composed of

L_{B i o}

and

M_{B i o}

, while

D_{B i o 2}

is composed of

{L_{B i o}}^{'}

and

M_{B i o}

.

A^{*} - A l i g n

and

I D P - A l i g n

are executed on these two data sets. The substructures in

D_{B i o 1}

were causal, concurrent and selective, that is,

N_{c a u}

,

N_{c o n c}

and

N_{s e l (n)}^{m}

. However,

D_{B i o 2}

has more cyclic substructures than

D_{B i o 1}

. As shown in Figure 19a, the search time of

I D P - A l i g n

in

D_{B i o 1}

and

D_{B i o 2}

was reduced by 47,648 ms and 65,768 ms compared with that of

A^{*} - A l i g n

, respectively. From Figure 19b,

{D_{B i o 1}}^{'}

is composed of

L_{B i o}

and

{M_{B i o}}^{'}

, while

{D_{B i o 2}}^{'}

is composed of

{L_{B i o}}^{'}

and

{M_{B i o}}^{'}

.

A^{*} - A l i g n

and

I D P - A l i g n

are executed on these two data sets. As shown in Figure 19b, the search time of

I D P - A l i g n

in

{D_{B i o 1}}^{'}

and

{D_{B i o 2}}^{'}

was reduced by 55,226 ms and 40,232 ms compared with that of

A^{*} - A l i g n

, respectively. The substructures in

{D_{B i o 1}}^{'}

were causal, concurrent and selective, that is,

N_{c a u}

,

N_{c o n c}

and

N_{s e l (n)}^{m}

. However,

{D_{B i o 2}}^{'}

has more cyclic substructures than

{D_{B i o 1}}^{'}

.

5.3. Effectiveness

A^{*} - A l i g n

,

O I A^{*} - A l i g n

and

I D P - A l i g n

were performed on two BPIC2020 data sets so as to verify the improvement of search efficiency on the basis of guaranteeing optimality. At the same time, the experiment took into account the different noises.

5.3.1. BPIC2020 Data Sets

The data sets used came from BPIC2020. Table 6 lists the specific information of the two event logs in BPIC2020. Their cases and event types represent the number of different traces and event elements, respectively. Domestic Declarations in BPIC2020 were used as the event log of the experiment. Figure 20a,b shows two process models

M_{B 1}

and

M_{B 2}

with inconsistent behavior from Domestic Declarations, respectively. The non-perceivable region was not identified in

M_{B 1}

and

M_{B 2}

.

5.3.2. Experimental Result

The data set

D_{B 1}

was constructed by Domestic Declarations and the given process model, as shown in Figure 20a. Then, 20% and 40% noise were added to the initial event log to produce two new data sets, denoted as D_B1+20% and D_B1+40%.

A^{*} - A l i g n

,

O I A^{*} - A l i g n

and

I D P - A l i g n

were used to search the optimal alignment set (in Figure 21). Cost and search time were evaluated, and the proportion of fitted alignments in

D_{B 1}

, D_B1+20% and D_B1+40% was extremely low. Thus, the search time was shortened by only 0.34%, 0.28% and 0.25% using

O I A^{*} - A l i g n

compared to

A^{*} - A l i g n

. A certain proportion of the alignments in

D_{B 1}

, D_B1+20% and D_B1+40% were partially compared. Thus, the search time was reduced using

I D P - A l i g n

by 40%, 34% and 29% compared with

A^{*} - A l i g n

. For the same data set, the greater the noise, the smaller the reduction in search time, because fewer elements were automatically eliminated when the initial deviation of some noisy alignments occurred at the end. The difference in search time between

O I A^{*} - A l i g n

and

A^{*} - A l i g n

in

D_{B 1}

, D_B1+20% and D_B1+40% is depicted in Figure 21 (partially enlarged).

The data set

D_{B 2}

was constructed by Domestic Declarations and the given process model, as shown in Figure 20b. Then, 20% and 40% noise were added to the initial event log to produce two new data sets, denoted as

D_{B 2 + 20 %}

and

D_{B 2 + 40 %}

. The search time was shortened by 2.82%, 2.34% and 2% using

O I A^{*} - A l i g n

compared to

A^{*} - A l i g n

. It could be seen that the ratio of fitted alignments in Figure 22 was larger than that in Figure 21. The search time was reduced using

I D P - A l i g n

by 20%, 18% and 14% compared with

A^{*} - A l i g n

. The result showed that the improvement in Figure 22 was lower than that in Figure 21 due to the size of the data sets.
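The percentage reductions quoted in this subsection are presumably relative to the search time of A^* − Align; the formula is not stated explicitly, so the reading below is an assumption. The function name and the timings in the example are made up for illustration and do not come from Table 7.

```python
def relative_reduction(t_baseline_ms: float, t_method_ms: float) -> float:
    """Percentage by which a method shortens the baseline search time (assumed definition)."""
    return 100.0 * (t_baseline_ms - t_method_ms) / t_baseline_ms


# Hypothetical timings: a baseline of 1000 ms reduced to 600 ms is a 40% reduction.
print(round(relative_reduction(1000.0, 600.0), 1))  # 40.0
```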

Table 7 records the time and cost obtained using

A^{*} - A l i g n

,

O I A^{*} - A l i g n

and

I D P - A l i g n

. From Table 7, the search time of

I D P - A l i g n

was the least in the same data set compared with others.

Table 8 lists all the numerical time differences in different data sets. From Table 8, we can find that the difference between

A^{*} - A l i g n

and

O I A^{*} - A l i g n

was much smaller than the others. The time differences also lead to two observations: (1) the time differences were roughly proportional to the size of the data set; (2) adding noise had little effect on the time savings of

I D P - A l i g n

.

5.4. Performance Comparison

The efficiency improvement of

I D P - A l i g n

was verified by the above experiments compared with the existing

A^{*} - A l i g n

and

O I A^{*} - A l i g n

. Moreover, the search time was effectively shortened on the premise of ensuring the minimum cost. However, the overall performance of conformance checking should also be evaluated in terms of cost, time, perspective, form and application range. These features are therefore compared in Table 9.

6. Conclusions

This paper introduces an efficient search for the optimal alignment, aimed at unweighted business processes with uniform unit cost. Compared with the traditional brute-force search, the search for the optimal alignment is more efficient when dealing with a complex business process. The perceptible search mainly includes the following three steps: (i) According to the behavioral characteristics of different substructures in the process model, the perceptible region is determined by reverse search. The location of the initial deviation is inversely proportional to the number of deviations when the initial deviation occurs in the perceptible region. (ii) The recorded location of the initial deviation is first set to 0. The recorded location is updated when the location of the current initial deviation is greater than the previously recorded value. At the same time, the recorded location is used as the optimality metric for the same trace. (iii) When the location of the current initial deviation is smaller than the previously recorded value, the recorded location is unchanged, the comparison of the alignment is terminated and the alignment is judged to be non-optimal. In summary, the search workload of the optimal alignment can be effectively reduced because some alignments are only partially compared based on the perception of the initial deviation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics13234669/s1.

Author Contributions

L.Z.: Writing—original draft, Validation, Methodology, Investigation, Formal analysis, Data curation, Project administration. F.W.: Conceptualization, Methodology, Validation, Investigation, Data curation, Formal analysis, Writing—original draft, Writing—review and editing. Z.S.: Investigation, Supervision, Project administration, Methodology. K.H.: Investigation, Validation. Y.H.: Validation, Data curation, Formal analysis. G.Z.: Data curation, Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (no. ICT2024B58), the Natural Science Research Project of Anhui Higher Education Institution (no. 2024AH051732 and 2024AH051735) and the High-Level Talent Fund Project of Huainan Normal University (no. 621222-BSKYQDJ).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Li, C.; Ge, J.; Huang, L.; Hu, H.; Wu, B.; Yang, H.; Hu, H.; Luo, B. Process mining with token carried data. Inf. Sci. 2016, 328, 558–576.
2. Caldeira, J.; Abreu, F.B.e. Software development process mining: Discovery, conformance checking and enhancement. In Proceedings of the 2016 10th International Conference on the Quality of Information and Communications Technology (QUATIC), Lisbon, Portugal, 6–9 September 2016; pp. 254–259.
3. Van der Aalst, W.; Weijters, T.; Maruster, L. Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 2004, 16, 1128–1142.
4. Weidlich, M.; Mendling, J. Perceived consistency between process models. Inf. Syst. 2012, 37, 80–98.
5. Alizadeh, M.; Lu, X.; Fahland, D.; Zannone, N.; van der Aalst, W.M. Linking data and process perspectives for conformance analysis. Comput. Secur. 2018, 73, 172–193.
6. van Zelst, S.J.; van Dongen, B.F.; van der Aalst, W.M. Event stream-based process discovery using abstract representations. Knowl. Inf. Syst. 2018, 54, 407–435.
7. Pourmasoumi, A.; Kahani, M.; Bagheri, E. Mining variable fragments from process event logs. Inf. Syst. Front. 2017, 19, 1423–1443.
8. Buijs, J.C.; La Rosa, M.; Reijers, H.A.; van Dongen, B.F.; van der Aalst, W.M. Improving business process models using observed behavior. In Proceedings of the 2nd International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA), Campione d’Italia, Italy, 18–20 June 2012; pp. 44–59.
9. Qi, H.; Du, Y.; Qi, L.; Wang, L. An approach to repair Petri net-based process models with choice structures. Enterp. Inf. Syst. 2018, 12, 1149–1179.
10. Zhang, L.; Fang, X.; Shao, C.; Wang, L. Real-time repair of business processes based on alternative operations in case of uncertainty. IEEE Access 2021, 9, 23672–23690.
11. Kleiner, N. Delta analysis with workflow logs: Aligning business process prescriptions and their reality. Requir. Eng. 2005, 10, 212–222.
12. Garcia-Banuelos, L.; van Beest, N.R.; Dumas, M.; La Rosa, M.; Mertens, W. Complete and interpretable conformance checking of business processes. IEEE Trans. Softw. Eng. 2017, 44, 262–290.
13. van der Aalst, W.; Adriansyah, A.; van Dongen, B. Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 182–192.
14. Fang, X.; Cao, R.; Liu, X.; Wang, L. A method of mining hidden transition of business process based on region. IEEE Access 2018, 6, 25543–25550.
15. Armas-Cervantes, A.; Baldan, P.; Dumas, M.; Garcia-Bañuelos, L. Diagnosing behavioral differences between business process models: An approach based on event structures. Inf. Syst. 2016, 56, 304–325.
16. Fahland, D.; Van Der Aalst, W.M. Model repair-aligning process models to reality. Inf. Syst. 2015, 47, 220–243.
17. Bauer, M.; Van der Aa, H.; Weidlich, M. Estimating process conformance by trace sampling and result approximation. In Proceedings of the Business Process Management: 17th International Conference, BPM 2019, Proceedings 17, Vienna, Austria, 1–6 September 2019; Springer International Publishing: Cham, Switzerland, 2019.
18. De Leoni, M.; Maggi, F.M.; van der Aalst, W.M. Aligning event logs and declarative process models for conformance checking. In Proceedings of the Business Process Management: 10th International Conference, BPM 2012, Proceedings 10, Tallinn, Estonia, 3–6 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 82–97.
19. Adriansyah, A.; van Dongen, B.F.; van der Aalst, W.M. Conformance checking using cost-based fitness analysis. In Proceedings of the 2011 IEEE 15th International Enterprise Distributed Object Computing Conference, Helsinki, Finland, 29 August–2 September 2011; pp. 55–64.
20. Leemans, S.J.; Fahland, D.; Van Der Aalst, W.M. Discovering block-structured process models from event logs-a constructive approach. In Proceedings of the Application and Theory of Petri Nets and Concurrency: 34th International Conference, PETRI NETS 2013, Proceedings 34, Milan, Italy, 24–28 June 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 311–329.
21. Buijs, J.C.; Van Dongen, B.F.; van Der Aalst, W.M. On the role of fitness, precision, generalization and simplicity in process discovery. In Proceedings of the On the Move to Meaningful Internet Systems: OTM 2012: Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2012, Proceedings, Part I, Rome, Italy, 10–14 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 305–322.
22. de Leoni, M.; Maggi, F.M.; van der Aalst, W.M. An alignment-based framework to check the conformance of declarative process models and to preprocess event-log data. Inf. Syst. 2015, 47, 258–277.
23. Jagadeesh Chandra Bose, R.P.; van der Aalst, W. Trace alignment in process mining: Opportunities for process diagnostics. In Proceedings of the International Conference on Business Process Management, Hoboken, NJ, USA, 13–16 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 227–242.
24. Weidlich, M.; Polyvyanyy, A.; Desai, N.; Mendling, J. Process compliance measurement based on behavioural profiles. In Proceedings of the Advanced Information Systems Engineering: 22nd International Conference, CAiSE 2010, Proceedings 22, Hammamet, Tunisia, 7–9 June 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 499–514.
25. Bogdanov, E.; Cohen, I.; Gal, A. Conformance checking over stochastically known logs. In Proceedings of the International Conference on Business Process Management, Münster, Germany, 11–16 September 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 105–119.
26. Pegoraro, M.; Uysal, M.S.; van der Aalst, W.M. Conformance checking over uncertain event data. Inf. Syst. 2021, 102, 101810.
27. Calheno, R.; Carvalho, P.; Lima, S.R.; Henriques, P.R.; Merino, M.R. Improving conformance checking in process modelling: A multiperspective algorithm. J. Supercomput. 2023, 79, 18256–18292.
28. Felli, P.; Gianola, A.; Montali, M.; Rivkin, A.; Winkler, S. Multi-perspective conformance checking of uncertain process traces: An SMT-based approach. Eng. Appl. Artif. Intell. 2023, 126, 106895.
29. Leemans, S.J.; van der Aalst, W.M.; Brockhoff, T.; Polyvyanyy, A. Stochastic process mining: Earth movers’ stochastic conformance. Inf. Syst. 2021, 102, 101724.
30. Wang, L.; Du, Y.; Qi, M.; Qi, H.; He, Z. Petri net-based deviation detection between a process model with loop semantics and event logs. Concurr. Comput. Pract. Exp. 2018, 30, e4419.
31. van Zelst, S.J.; Bolt, A.; Hassani, M.; van Dongen, B.F.; van der Aalst, W.M.P. Online conformance checking: Relating event streams to process models using prefix-alignments. Int. J. Data Sci. Anal. 2019, 8, 269–284.
32. Lee, W.L.J.; Burattin, A.; Munoz-Gama, J.; Sepúlveda, M. Orientation and conformance: A HMM-based approach to online conformance checking. Inf. Syst. 2021, 102, 101674.
33. Adriansyah, A.; Van Dongen, B.F.; van der Aalst, W.M. Cost-based conformance checking using the A* Algorithm. BPM Cent. Rep. BPM-11-11 BPMcenter.Org 2011, 1111, 1–14.
34. Adriansyah, A.; van Dongen, B.F.; van der Aalst, W.M. Towards robust conformance checking. In Proceedings of the Business Process Management Workshops: BPM 2010 International Workshops and Education Track, Revised Selected Papers 8, Hoboken, NJ, USA, 13–15 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 122–133.
35. De Weerdt, J.; De Backer, M.; Vanthienen, J.; Baesens, B. A robust F-measure for evaluating discovered process models. In Proceedings of the 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 11–15 April 2011; pp. 148–155.
36. Bloemen, V.; van Zelst, S.J.; van der Aalst, W.M.; van Dongen, B.F.; van de Pol, J. Maximizing synchronization for aligning observed and modelled behaviour. In Proceedings of the Business Process Management: 16th International Conference, BPM 2018, Proceedings 16, Sydney, NSW, Australia, 9–14 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 233–249.
37. Reißner, D.; Armas-Cervantes, A.; Conforti, R.; Dumas, M.; Fahland, D.; La Rosa, M. Scalable alignment of process models and event logs: An approach based on automata and s-components. Inf. Syst. 2020, 94, 101561.
38. Lee, W.L.; Verbeek, H.M.; Munoz-Gama, J.; van der Aalst, W.M.; Sepúlveda, M. Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining. Inf. Sci. 2018, 466, 55–91.
39. Song, W.; Xia, X.; Jacobsen, H.-A.; Zhang, P.; Hu, H. Efficient alignment between event logs and process models. IEEE Trans. Serv. Comput. 2016, 10, 136–149.
40. Dumas, M.; García-Bañuelos, L. Process mining reloaded: Event structures as a unified representation of process models and event logs. In Proceedings of the Application and Theory of Petri Nets and Concurrency: 36th International Conference, PETRI NETS 2015, Proceedings 36, Brussels, Belgium, 21–26 June 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 33–48.
41. Zhong, C.; Zhang, H.; Huang, H.; Chen, Z.; Li, C.; Liu, X.; Li, S. DOMICO: Checking conformance between domain models and implementations. Softw. Pract. Exp. 2024, 54, 595–616.
42. Zhang, L.; Fang, X. Business process fitness analysis based on alignment processing and deviation detection. Comput. Integr. Manuf. Syst. 2020, 26, 1573–1581.

Figure 1. Search workload for the optimal alignment. Note: event log is $L = (A C B D E R S T)$, and process model is $M = (A C B D E R T, A B C D E R T, A B C D E F I R K S T, A C B D E F I R K S T)$.

Figure 2. Discovery of initial deviation.

Figure 3. Division of paths in various substructures.

Figure 4. Substructure $N_{conc}$ of process model $M$.

Figure 5. Example of process model. Note: $N_{sel(1)}^{1}$ is represented by a green circle, and $N_{sel(1)}^{2}$ is marked by a red circle.

Figure 6. Various substructures belonging to the perceptible region in the process model. Selective substructure with the same number of enable activities (a), selective structure with the number of enable activities ≤ 2 (b), and mandatory substructure (c).

Figure 7. The perceptible range of nested substructures. Nested substructure $N_{sel(2)}^{1} \subset N_{sel(1)}^{1}$ (a); nested substructure $N_{sel(1)}^{1} \subset N_{man(sel(1))}^{1}$ (b); nested substructure $N_{man} \supset N_{sel(1)}^{1}$ (c); nested substructure $N_{sel(1)}^{(1/2)/(2/1)} \subset N_{man}$ (d).

Figure 8. The reverse search for perceptible range. Note: numbers are marked at the locations of all activities in the process model, and the activities of the selective substructure that are in the same location can be recorded as the same number.

Figure 9. The process model $M$ without non-perceivable region.

Figure 10. Search for the set of optimal alignments. Note: $v(loc_{id}(℘_{1}, δ_{1}))$ is a simplification for $v(loc_{id}(Align(℘_{1}, δ_{1})))$; the pink block represents the alignments of complete comparison, the yellow block shows the alignments of partial comparison and the green block represents the fitting alignment. The search process of optimal alignment between $℘_{1}$ and $M$ (a), and the search process of optimal alignment between $℘_{2}$ and $M$ (b).

Figure 11. Current business process of biological coal-washing process. Note: labels for the actual steps in the transition are replaced by letters.

Figure 12. Initial model of biological coal-washing process.

Figure 13. OPS-Align plug-in interface.

Figure 14. Real-life business process mined by the ProM framework.

Figure 15. The running results on the data sets with selective substructure. Run result of data set with 11 activities (a), and run result of data set with 23 activities (b).

Figure 16. The running results of data sets with cyclic substructures. Run result of data set with 10 activities (a), and run result of data set with 11 activities (b).

Figure 17. Running results on generic data sets. Run result of data set with 1020 aligned traces (a), and run result of data set with 780 aligned traces (b).

Figure 18. Merging results of different data sets.

Figure 19. Evaluations of biological coal-washing data set. The results of $D_{Bio1}$ and $D_{Bio2}$ (a), and the results of $D_{Bio1}'$ and $D_{Bio2}'$ (b). Note: time indicates the search time for optimal alignment; the number of traces indicates the number of traces in the event log; proportion of variance represents the proportion of the current time difference in the total time difference.

Figure 20. The initial process models $M_{B1}$ and $M_{B2}$. Process model $M_{B1}$ (a), and process model $M_{B2}$ (b).

Figure 21. Running result of $D_{B1}$, $D_{B1}$+20% and $D_{B1}$+40%.

Figure 22. Running result of $D_{B2}$, $D_{B2}$+20% and $D_{B2}$+40%.

Table 1. The cost setting of unit movement.

Cost	Type of Movements	Expression of Alignment
0	$move_{LM}$	$Align(l(e_{1}), λ(t_{1})) = Align(A, A)$
1	$\tilde{move}_{L}$	$Align(l(e_{2}), λ(t_{2})) = Align(B, \to)$
1	$\tilde{move}_{M}$	$Align(l(e_{3}), λ(t_{3})) = Align(\to, C)$
0	$\tilde{move}_{L}$ / $\tilde{move}_{M}$	$Align(l(e_{3}), λ(t_{3})) = Align(τ, \to) / Align(\to, τ)$

Table 2. Initial deviation locations of ξ₁ and ξ₂. Note: yellow is the initial deviation, $ξ$ is the identifier of $Align(℘_{o}, δ_{j})$.

ξ₁	A	B	C	D	E	→	G
ξ₁	A	B	→	→	E	F	G
ξ₂	A	→	B	C	D	→	F
ξ₂	A	C	B	→	→	E	F
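As a rough check of Table 2 against the unit costs of Table 1, the following Python sketch computes, for each alignment, the total cost and the column of the initial deviation. The helper names are hypothetical, and the initial deviation is assumed to be the first column that contains a cost-1 move, since the colour highlighting of the original table is not available here. Because log moves and model moves cost the same, the result does not depend on which row holds the trace.

```python
from typing import List, Optional, Tuple

SKIP = "→"  # no-move symbol as printed in the tables
TAU = "τ"   # silent transition, cost 0 according to Table 1


def move_cost(top: str, bottom: str) -> int:
    """Cost of one alignment column under the unit costs of Table 1."""
    if top == SKIP and bottom == SKIP:
        raise ValueError("a column cannot skip on both sides")
    if SKIP not in (top, bottom):
        if top != bottom:
            raise ValueError("synchronous move with different labels")
        return 0                          # synchronous move
    label = bottom if top == SKIP else top
    return 0 if label == TAU else 1       # silent moves are free


def summarize(top: List[str], bottom: List[str]) -> Tuple[int, Optional[int]]:
    """Return (total cost, 1-based column of the first cost-1 move)."""
    costs = [move_cost(t, b) for t, b in zip(top, bottom)]
    first = next((i + 1 for i, c in enumerate(costs) if c), None)
    return sum(costs), first


if __name__ == "__main__":
    xi1 = summarize(list("ABCDE→G"), list("AB→→EFG"))
    xi2 = summarize(list("A→BCD→F"), list("ACB→→EF"))
    print(xi1)  # (3, 3)
    print(xi2)  # (4, 2)
```

Under these assumptions, ξ₁ deviates first at column 3 and costs 3, while ξ₂ deviates first at column 2 and costs 4, which matches the relationship between a later initial deviation and fewer deviations.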

Table 3. The subalignments $Align_{sub}(℘_{o}, δ_{1}^{\cdot})$ and $Align_{sub}(℘_{o}, δ_{2}^{\cdot})$. Note: the yellow is the initial deviation, and the gray has no practical meaning.

$Align_{sub}(℘_{o}, δ_{1}^{\cdot})$	B	C	D	→
$Align_{sub}(℘_{o}, δ_{1}^{\cdot})$	B	C	D	E
$Align_{sub}(℘_{o}, δ_{2}^{\cdot})$	B	C	D	→	→
$Align_{sub}(℘_{o}, δ_{2}^{\cdot})$	B	→	D	C	E

Table 4. A set of firing sequences of $M$.

Serial No.	Occurrence Sequence
δ₁	(a,b,c,d,e,f,j,k,l)
δ₂	(a,b,c,d,e,g,l)
δ₃	(a,b,c,d,e,f,l)
δ₄	(a,b,c,d,e,g,j,k,l)
δ₅	(a,b,d,c,e,f,l)
δ₆	(a,b,d,c,e,g,l)
δ₇	(a,b,d,c,e,f,j,k,l)
δ₈	(a,b,d,c,e,g,j,k,l)

Table 5. Professional notes for letter labels.

Letter	Professional Note	Remark
A	Treatment of coal slime water
B	Pre-sedimentation
C	Enter the reaction pool
D	Add HPAM
E	Flocculation
F	Biodegradation
G	Monitoring concentration
H	Fungus
I	Bacteria
J	Bio-enzyme
K	$< 10$ mg/L	This concentration range meets environmental emission standards.
L	$10$–$100$ mg/L	This concentration range has no significant effect on coal flotation.
M	$> 100$ mg/L	The opposite of the above two cases ($< 10$ mg/L and $10$–$100$ mg/L).
N	Emission
O	Reuse
P	Real environment survey	The real-life environment contains many possibilities. This section has many selective behaviors.
Q	Optimization of reaction conditions based on machine learning	This section contains a number of selective operations based on the actual situation.
R	Yes
S	No
T	Computer molecular simulation
U	Research on the mechanism of enzymatic transformation
V	Enzymatic engineering design
W	Terminate

Table 6. The information of BPIC2020 event log.

Event Log	Cases	Lines	Event Types	Events
Domestic Declarations	582	10,000	25	122,762
Request For Payment	719	10,000	29	134,521

Table 7. The experiment results of BPIC2020.

Method	A*-Align		IA*-Align		IDP-Align
Parameter	Time (ms)	Cost	Time (ms)	Cost	Time (ms)	Cost
D1	7632 ms	181,798	7606 ms	181,798	4582 ms	181,798
D1 (+20%)	9238 ms	221,042	9212 ms	221,042	6105 ms	221,042
D1 (+40%)	10,661 ms	255,518	10,635 ms	255,518	7611 ms	255,518
D2	127,340 ms	179,125	123,753 ms	179,125	101,475 ms	179,125
D2 (+20%)	153,426 ms	214,887	149,839 ms	214,887	126,275 ms	214,887
D2 (+40%)	179,863 ms	252,393	176,276 ms	252,393	153,998 ms	252,393

Table 8. The numerical differences of experiment results.

Method	A*-Align minus IA*-Align	A*-Align minus IDP-Align	IA*-Align minus IDP-Align
Parameter	Time Difference Value (ms)	Time Difference Value (ms)	Time Difference Value (ms)
D₁	26 ms	3050 ms	3024 ms
D₁ (+20%)	26 ms	3133 ms	3107 ms
D₁ (+40%)	26 ms	3050 ms	3024 ms
D₂	3587 ms	25,865 ms	22,278 ms
D₂ (+20%)	3587 ms	27,151 ms	23,564 ms
D₂ (+40%)	3587 ms	25,865 ms	22,278 ms

Table 9. Check performances of various methods.

References	Cost	Time	Check Perspective	Check Form	Application Region
[13,19,33]	Minimum	$\to$	Control flow	static state	${L \cup M} = D$
[42]	Minimum	$↓$	Control flow	static state	${L \cup M} = D$
[28]	Minimum	$\to$	Multi-perspective	static state	${L \cup M} = D$
[29]	Minimum		Stochastic perspective	static state	${L \cup M} = D$
[31,32]	Minimum	$\to$	Control flow	dynamic state	${L \cup M} = D$
[39]	Minimum	$↓$ > [42]	Control flow	static state	${L \cup M} = D$
This work	Minimum	$↓$ > [42]	Control flow	static state	${L \cup (M \setminus φ_{pM}^{*})} = D$

Note: $\to$ and $↓$ represent the invariability and decline in search time, respectively. ${L \cup M} = D$ represents an arbitrary data set consisting of the event log and the initial model. ${L \cup (M \setminus φ_{pM}^{*})} = D$ indicates that the initial model in the data set does not contain a non-perceivable region.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
