Article

Process Discovery Techniques Recommendation Framework

by Mohammed Abdulhakim Al-Absi 1 and Hind R’bigui 2,*
1 Department of Computer Engineering, Graduate School, Dongseo University, 47 Jurye-ro, Sasang-gu, Busan 47011, Republic of Korea
2 Digital Enterprise Department, Nsoft Co., Ltd., No. 407, 14, Tekeunosaneop-ro 55beon-gil, Nam-gu, Ulsan 44776, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2023, 12(14), 3108; https://doi.org/10.3390/electronics12143108
Submission received: 2 June 2023 / Revised: 26 June 2023 / Accepted: 13 July 2023 / Published: 17 July 2023

Abstract

In a competitive environment, organizations need to continuously understand, analyze, and improve the behavior of their processes to maintain their position in the market. Process mining is a set of techniques that gives organizations an X-ray view of their processes by extracting process-related knowledge from the information recorded in today’s process-aware information systems, such as Enterprise Resource Planning, Business Process Management, and Supply Chain Management systems. One of the major categories of process mining techniques is process discovery. The latter automatically constructs process models solely from the information stored in the system, representing the real behavior of the discovered process. Many process discovery algorithms have been proposed, leaving users and businesses faced with many techniques and unable to choose the appropriate mining algorithm for their business processes. Moreover, existing evaluation and recommendation frameworks have several important drawbacks. This paper proposes a new framework for recommending the most suitable process discovery technique for a given process, taking into consideration the limitations of existing frameworks.

1. Introduction

Process mining is a recent set of techniques that provide a strong bridge between business intelligence (BI) and business process management (BPM) by combining process models and event data, forming a novel form of process-driven analytics. Moreover, process mining enables and strengthens various business process improvement (BPI) approaches such as TQM, Six-sigma, CPI, and others, where processes are diagnosed to explore possible improvements [1].
Modern information systems, including workflow management systems (WFM), enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, supply chain management (SCM) systems, and business-to-business (B2B) systems, store business-related events in event logs [2]. These event logs typically contain information about executed activities within the enterprise (i.e., process instances), the time of activity execution, the people, machines, or systems involved, and other relevant data. Process mining extracts knowledge from these event logs to automatically build a representation of the current execution of business processes within an enterprise. The aim is to identify incorrect executions, bottlenecks, and other problems that hinder the organization from achieving its strategic goals and vision [3].
Process mining utilizes this event data to extract process-related information, and three major classes of process mining techniques can be employed for different purposes, as illustrated in Figure 1: process discovery, conformance checking, and process enhancement or performance analysis [4]. Process discovery, considered the most important process mining technique, takes an event log as input and automatically constructs a process model. Conformance checking compares an existing process model with the event log of the same process to investigate whether the organization’s actual operations conform to the defined process model. Process enhancement aims to improve or extend the existing process model based on information obtained from the discovered process model or from the event log. In this paper, our focus is on the control-flow perspective of process discovery, which entails constructing process models.

1.1. Event Log

Table 1 provides an example of an event log for a purchasing process, which is utilized for process mining. Each row in the table represents an individual event, while each column represents an attribute associated with that event. Events are linked to specific cases, and in Table 1, events are grouped by case and arranged chronologically. The first recorded event corresponds to case Q521-QZR, indicating the execution of the Purchase Request activity performed by the employee named Hind on 15 May 2022. Additional attributes, such as entered data, costs, etc., can also be utilized for process mining. It is essential for each event to have a unique identifier and be associated with a case. Furthermore, events should be sorted, typically using timestamps to establish a chronological order. The timestamps displayed in Table 1 represent the start time of the corresponding activity’s execution. Timestamps indicating the completion or pause of an activity’s execution can also be recorded and utilized for process mining.
This event log contains information about three cases (i.e., workflow instances). The log demonstrates that for cases 1 and 2, the tasks Request, Approve, Verify, Finalize, Order, and Receive and Verify have been executed. Each case begins with the execution of the Purchase Request activity and concludes with the execution of Receive and Verify.
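The grouping and ordering described above can be sketched in a few lines of Python. This is only an illustration: the case identifiers, resources, and timestamps below are assumptions and do not reproduce the rows of Table 1, and `build_traces` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative events only: identifiers, resources and timestamps are
# assumptions, not the actual rows of Table 1. They are deliberately unsorted.
EVENTS = [
    {"event_id": 5, "case_id": "C1", "activity": "Order", "resource": "R1", "timestamp": "2022-05-16 11:00"},
    {"event_id": 1, "case_id": "C1", "activity": "Purchase Request", "resource": "R1", "timestamp": "2022-05-15 09:00"},
    {"event_id": 4, "case_id": "C2", "activity": "Approve", "resource": "R3", "timestamp": "2022-05-16 08:45"},
    {"event_id": 2, "case_id": "C2", "activity": "Purchase Request", "resource": "R2", "timestamp": "2022-05-15 09:30"},
    {"event_id": 3, "case_id": "C1", "activity": "Approve", "resource": "R3", "timestamp": "2022-05-15 10:15"},
]


def build_traces(events):
    """Group events by case and order each case chronologically."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)

    traces = {}
    for case_id, case_events in by_case.items():
        # Sort by the recorded start time of each activity.
        case_events.sort(key=lambda e: datetime.strptime(e["timestamp"], "%Y-%m-%d %H:%M"))
        traces[case_id] = [e["activity"] for e in case_events]
    return traces


traces = build_traces(EVENTS)
for case_id in sorted(traces):
    print(case_id, traces[case_id])
```

Each resulting trace is the sequence of activity names of one case, which is the input form assumed by the ordering relations used later in the paper.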
The event data presented in Table 1 contains typical information found in an event log. In process mining, the first class of analysis, known as process discovery, can utilize the case ID and event ID from Table 1 to create an accurate process model that represents the purchasing process described in this example. This model, constructed from the information provided in Table 1, is depicted in Figure 2. It is represented as a Petri net, a process modeling language [5].
Conformance checking techniques involve comparing the process model derived from the event data in Table 1 with an existing or predefined model. This comparison helps identify inconsistencies, deviations, and other issues that may arise. Additionally, attributes like timestamps associated with the events can be used in process enhancement to investigate the performance of the process.

1.2. Process Discovery Techniques Overview

Several process discovery algorithms have been developed and applied successfully. One of the major differences between existing discovery algorithms is the ability to discover the complex constructs of a process model. These complex constructs are short loops, invisible tasks, duplicate tasks, and non-free choice constructs. One of the major process discovery techniques is the α algorithm [6]. This algorithm produces a workflow net based on the causal relationships observed between tasks, but it cannot discover short loops, invisible tasks, non-free choice constructs, or duplicate tasks. Therefore, several extensions have been presented to tackle these limitations. The Alpha+ algorithm was first introduced to extend the Alpha algorithm to short loop mining [7], Alpha++ was developed to extend it to non-free choice constructs [8], Alpha# was developed to mine invisible tasks [9,10], and Alpha$ was developed to handle invisible tasks involved in non-free choice constructs [11]. However, none of these extensions can detect all types of constructs that can exist in an event log. The Heuristics Miner [12] derives ordering relations based on their frequencies and returns a net. It is robust for invisible tasks but cannot handle duplicate tasks or non-free choice constructs. The Inductive Miner returns a process tree and guarantees sound models, which other algorithms cannot guarantee [13]. It is capable of mining invisible tasks but cannot discover duplicate tasks or non-free choice constructs. The Region-based algorithm can mine non-free choice constructs but cannot handle invisible tasks or duplicate tasks [14]. The Genetic Mining algorithm is the only existing algorithm that can deal with most of the common constructs. Nevertheless, it requires many parameters and does not produce a correct output within a limited time.

2. Literature Review

Several papers have developed frameworks to evaluate or recommend process discovery techniques. The authors in [15] have written an important and well-founded article on building a benchmark framework for process mining algorithms. They introduced two methodologies to evaluate process discovery techniques. The first strategy is based on evaluation using metrics, and the second one is a machine learning strategy. In [16], the authors extended this work by empirically evaluating process discovery techniques using artificial and real-life datasets, as well as different similarity measures. Although this framework allows users and businesses to select the most suitable process mining algorithm for a given event log, empirical evaluation can be time-consuming. It is only after performing experiments on all existing mining algorithms that the best algorithm can be determined. Notably, remarkable work was presented in [17], where the issue of spending a long time on experiments to select the appropriate mining algorithm was considered. Their proposed framework is based on a learning step and a recommendation step. In the first phase, a regression model is built based on features extracted from high-quality selected reference models and the similarities between the reference models and the mined models. Using this regression model, along with features extracted from other models, similarities between reference models and mined models are predicted without the need for experiments. While the results obtained by applying this framework are accurate and attractive, there is a weakness in this concept. The presented methodology heavily relies on reference models, which are usually not available in the practical world. In [18], the authors developed a framework and a tool for recommending control flow algorithms. Based on features extracted from event logs and prediction models built from experiments, top-K control-flow miners are recommended. 
Although this work takes event logs as input instead of reference models, addressing the limitation mentioned in [17], it still requires experiments to build prediction models before the top-K mining algorithms can be recommended, which makes it time-consuming once again. The proposed systems would have been more useful if the authors had based their framework on event logs and had reduced the number of experiments required to decide which mining algorithm suits a given event log. The authors of [19] proposed an approach to recommend a process discovery algorithm based solely on the classification of event logs, but they only conceptualized the idea. In [20], a classification-based framework was proposed to evaluate the quality of process discovery algorithms. The starting point of this framework is the artificial generation of random samples of process models from a specified population of processes. For each model, a training log with fitting traces and a test log with both fitting and non-fitting traces are generated. The quality of process discovery algorithms is assessed based on their capability to correctly classify a trace representing real process behavior as fitting and a trace representing behavior unrelated to the process as non-fitting. However, similar to other methodologies, it is an empirical framework. Users need to conduct experiments on all existing discovery techniques before they can decide on the best algorithm based on quality performance.

3. Process Discovery Recommendation Framework

Existing evaluation and recommendation frameworks allow users to choose the best algorithm for a given process by empirically comparing the performance of existing discovery algorithms. However, recommending a process discovery technique for a given process log based on empirical assessment consumes considerable time and resources. From a business perspective, it is not practical to run experiments on all algorithms each time in order to decide on the most suitable one. In addition, some recommendation frameworks are based on reference models, which are often not available in practice.
A recommendation framework for process discovery algorithms is strongly needed because of the abundance of such algorithms and because each algorithm has different characteristics, abilities, and specific limitations. One of the major differences between existing discovery algorithms is the ability to discover the standard and complex constructs of a process model, where the complex constructs are short loops, invisible tasks, non-free choice constructs, non-free choice involved in invisible tasks, and duplicate tasks. There is currently no algorithm that can handle all of these structures within a limited time. For instance, the Inductive Miner algorithm [13] is robust for invisible tasks, but it cannot handle duplicate tasks or non-free choice constructs. Another example is the Region-based algorithm [14], which is capable of mining some non-free choice constructs but cannot handle invisible or duplicate tasks. The other algorithms have similar problems. Each algorithm has an advantage in mining specific structures but is restricted in mining other constructs.
In our previous work [21], we proposed a conceptual idea for a new framework that recommends or selects a suitable mining algorithm for a given event log. In this paper, we provide detailed explanations of the idea and its evaluation. The framework involves extracting process model constructs from the event log without actually discovering a model. Essentially, it investigates the log for short loops, invisible tasks, non-free choice constructs, non-free choice involved in invisible tasks, and duplicate tasks. By utilizing a knowledge database containing information about the capability of each algorithm in discovering these structures, one can determine the candidate techniques suitable for the given event log. The knowledge database can be constructed using existing comparison and evaluation frameworks that assess the ability of process discovery algorithms in mining the aforementioned constructs.
In the proposed framework, no reference model or empirical evaluation is required to recommend a process discovery technique. If the constructs extraction stage results in numerous candidate algorithms, other metrics from the knowledge database, such as mining time and soundness, gathered from research papers, will be used to reduce the candidate algorithms. For example, if the structures extracted from the event logs reveal the presence of duplicate tasks, algorithms like the Alpha algorithm and Inductive Miner will be eliminated due to their inability to discover duplicate tasks. Both the Alpha* algorithm and Genetic Miner can handle duplicate tasks and are considered as candidates. However, based on the mining time metric in the knowledge database, the Genetic algorithm takes significantly longer compared to the Alpha* algorithm to discover a model from an event log with duplicate tasks. Therefore, the recommended discovery algorithm for mining such event logs, without any empirical evaluation, is the Alpha* algorithm.
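The filtering and tie-breaking logic described above can be sketched as a simple lookup. The sketch below is only an illustration: the entries of `KNOWLEDGE_BASE`, the construct labels, and the `time_rank` values are hypothetical placeholders chosen to mirror the duplicate-task example, not values taken from the paper or from any benchmark.

```python
# Hypothetical knowledge base. Capabilities and relative mining-time ranks
# (lower is faster) are illustrative placeholders only.
KNOWLEDGE_BASE = {
    "Alpha": {"constructs": set(), "time_rank": 1},
    "Alpha*": {"constructs": {"duplicate_tasks"}, "time_rank": 2},
    "Inductive Miner": {"constructs": {"invisible_tasks"}, "time_rank": 2},
    "Region-based": {"constructs": {"non_free_choice"}, "time_rank": 3},
    "Genetic Miner": {
        "constructs": {"short_loops", "invisible_tasks", "non_free_choice", "duplicate_tasks"},
        "time_rank": 5,
    },
}


def recommend(detected_constructs, knowledge_base=KNOWLEDGE_BASE):
    """Return candidate algorithms, fastest first.

    An algorithm is a candidate only if it can discover every complex
    construct detected in the event log.
    """
    candidates = [
        name
        for name, info in knowledge_base.items()
        if detected_constructs <= info["constructs"]
    ]
    # Secondary metric from the knowledge base: relative mining time.
    candidates.sort(key=lambda name: knowledge_base[name]["time_rank"])
    return candidates


print(recommend({"duplicate_tasks"}))
```

With these placeholder entries, a log containing duplicate tasks leaves two candidates, and the faster one is ranked first, which matches the reasoning of the example in the text. Other metrics, such as soundness, could be added as further sort keys.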
The overall framework is illustrated in Figure 3.

3.1. Constructs

The most relevant constructs typically discovered by process discovery algorithms are sequence, exclusive choice, inclusive choice, parallelism, loops, invisible tasks, non-free choice, and duplicate tasks.
  • Sequence: Certain process activities need to be executed sequentially.
  • Exclusive choice: Certain parts of the process are mutually exclusive. In several notations, this is known as XOR split/join.
  • Parallelism: Certain branches are “parallel”, indicating that the activities of a first part of the model are executed simultaneously with the activities of a second part of the model. In several notations, this is known as AND split/join.
  • Inclusive choice: When given points of the process are reached, a choice needs to be made on which of the following part(s) of the process are performed. Inclusive choice differs from exclusive choice because multiple parts can be executed in parallel, and it differs from parallelism because not every part that follows the reached point needs to be executed. In several notations, this is known as OR split/join.
  • Loop: The execution of certain parts of the process can be sequentially repeated multiple times.
  • Invisible tasks: Certain transitions are incorporated into the model for a process-routing purpose. For instance, combined with exclusive choices, invisible tasks allow the execution of some parts of the process to be skipped.
  • Duplicate tasks: Certain activities share the same name but are placed in two or more nodes of the process model.
  • Non-free choice: The choice of one or multiple branches is influenced by a choice that occurred before.
We classify these constructs into standard constructs, which can be discovered by all existing process discovery algorithms, and complex constructs, which cannot be mined by all algorithms. Our focus in this work is on the complex constructs, as the ability of each algorithm differs in terms of mining these constructs. We are not concerned with the standard constructs.

3.2. Complex Construct Detection

In this section, we present the equations used to detect short loops, invisible tasks, non-free choice, and non-free choice involved in invisible tasks in a given event log. The equations for detecting these complex constructs are based on the extensions of the α miner. Compared to other algorithms, the extensions of the α miner, namely α++, α$, and α#, address each construct separately. These extensions handle short loops, all types of invisible tasks, and non-free choice constructs.

3.2.1. Basic Ordering Relations

In the α and α+ algorithms, which serve as the foundation for the later α extensions, six fundamental relations were defined: $>_L$, $\triangle_L$, $\diamond_L$, $\rightarrow_L$, $\parallel_L$, and $\#_L$. We briefly explain each of these relations:
  • Relation $>_L$: This relation signifies that two activities can be executed successively, one after the other. It indicates a direct succession between the activities.
  • Relation $\triangle_L$: This relation represents a loop of length two, such as “aba”. It is also used to differentiate between length-two loops and parallel routing.
  • Relation $\diamond_L$: This relation indicates a two-way loop of length two. It means that two activities have the $\triangle_L$ relation in both directions, for example, “aba” and “bab”.
  • Relation $\rightarrow_L$: This relation denotes a direct causal relationship between two activities, indicating that one activity is a direct cause or prerequisite for the other.
  • Relation $\parallel_L$: This relation signifies that activities can be executed concurrently, meaning they can happen simultaneously or in parallel. For instance, both “ab” and “ba” can be observed.
  • Relation $\#_L$: This relation implies that two activities never directly follow each other. There are always other activities or conditions between them.
These defined relations in the α algorithm are used to describe different types of relationships and structures within a process model. They play a crucial role in understanding and analyzing the behavior and dependencies of activities in a given process.
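Definition 1. 
(Basic ordering relations). Let $A$ be a set of activities, $L$ be an event log over $A$, and $a$ and $b$ be two activities in $A$. The relations defined in the α and α+ algorithms are as follows:
$$a >_L b \iff \exists \sigma = t_1 t_2 \ldots t_n \in L,\ i \in \{1, \ldots, n-1\}:\ t_i = a \wedge t_{i+1} = b,$$
$$a \,\triangle_L\, b \iff \exists \sigma = t_1 t_2 \ldots t_n \in L,\ i \in \{1, \ldots, n-2\}:\ t_i = t_{i+2} = a \wedge t_{i+1} = b,$$
$$a \,\diamond_L\, b \iff a \,\triangle_L\, b \wedge b \,\triangle_L\, a,$$
$$a \rightarrow_L b \iff a >_L b \wedge (b \not>_L a \vee a \,\diamond_L\, b),$$
$$a \parallel_L b \iff a >_L b \wedge b >_L a \wedge \neg(b \,\diamond_L\, a), \text{ and}$$
$$a \,\#_L\, b \iff a \not>_L b \wedge b \not>_L a.$$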
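As an illustration of Definition 1, the following sketch derives the six relations from a log given as a list of traces, where each trace is a sequence of activity labels. The function name `basic_relations` and the two sample logs are assumptions made for this example, and the sketch follows the standard α+ reading of the relations.

```python
def basic_relations(log):
    """Compute the basic ordering relations for a list of traces."""
    activities = {a for trace in log for a in trace}

    # a >_L b: b directly follows a in some trace.
    direct = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    # a triangle_L b: the pattern "a b a" occurs in some trace.
    triangle = {
        (t[i], t[i + 1])
        for t in log
        for i in range(len(t) - 2)
        if t[i] == t[i + 2]
    }
    # a diamond_L b: triangle holds in both directions.
    diamond = {(a, b) for (a, b) in triangle if (b, a) in triangle}
    # a ->_L b: direct succession that is one-way or part of a length-two loop.
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct or (a, b) in diamond}
    # a ||_L b: succession in both directions that is not a length-two loop.
    parallel = {(a, b) for (a, b) in direct if (b, a) in direct and (a, b) not in diamond}
    # a #_L b: no direct succession in either direction.
    unrelated = {
        (a, b)
        for a in activities
        for b in activities
        if (a, b) not in direct and (b, a) not in direct
    }
    return {
        "direct": direct,
        "triangle": triangle,
        "diamond": diamond,
        "causal": causal,
        "parallel": parallel,
        "unrelated": unrelated,
    }


# Sample log with b and c in parallel, and a second log with a length-two loop.
rel = basic_relations(["abcd", "acbd", "aed"])
loop_rel = basic_relations(["ababc"])
print(sorted(rel["parallel"]))
print(sorted(loop_rel["causal"]))
```

In the first log, b and c follow each other in both orders and are classified as parallel. In the second log, the same two-way succession between a and b is recognized as a length-two loop instead, which is the distinction the $\triangle_L$ and $\diamond_L$ relations were introduced for.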

3.2.2. Short Loops Detection

The relation that detects loops of length two is defined in Definition 1 as $\triangle_L$. A loop of length one is detected using the direct succession relation as follows:
$$a >_L a \iff \exists \sigma = t_1 t_2 \ldots t_n \in L,\ i \in \{1, \ldots, n-1\}:\ t_i = a \wedge t_{i+1} = a$$
Several constructs and sound models cannot be discovered correctly in the presence of a loop of length one. Therefore, most algorithms handle length-one loops in the pre- and post-processing steps of process mining. Similarly, our framework starts by checking for loops of length one in the event log using the direct succession relation $>_L$. Once such a loop is detected, the framework records its existence, removes it from the event log, and then investigates the remaining constructs in the new event log defined in Definition 2.
Definition 2. 
(Length-one-loop-free event log). Let $A$ be a set of activities, $L$ be an event log over $A$, $a$ be an activity from $A$, and $L1L$ be the set of activities involved in loops of length one. The activity set $A'$ of the new event log, free of length-one loops, is defined as follows:
$$A_{log} = \{a \in A \mid \exists \sigma \in L\ [a \in \sigma]\},$$
$$L1L = \{a \in A_{log} \mid \exists \sigma = t_1 t_2 \ldots t_n \in L;\ i \in \{2, 3, \ldots, n\}:\ (a = t_i \wedge a = t_{i+1}) \vee (a = t_{i-1} \wedge a = t_i)\},$$
$$A' = A_{log} \setminus L1L.$$
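A minimal sketch of this preprocessing step is given below, assuming a log represented as a list of traces. The helper names `length_one_loops` and `remove_length_one_loops` are hypothetical, and removing every occurrence of a looping activity from the traces is one possible reading of Definition 2.

```python
def length_one_loops(log):
    """Activities that directly follow themselves in at least one trace."""
    return {t[i] for t in log for i in range(len(t) - 1) if t[i] == t[i + 1]}


def remove_length_one_loops(log):
    """Record the length-one loops and return the log restricted to A'."""
    l1l = length_one_loops(log)
    # Keep only activities outside L1L (A' = A_log minus L1L).
    cleaned = [tuple(a for a in trace if a not in l1l) for trace in log]
    return l1l, cleaned


l1l, cleaned_log = remove_length_one_loops([("a", "b", "b", "c"), ("a", "c")])
print(l1l, cleaned_log)
```

The recorded set can then be passed on to the recommendation step, while the remaining construct checks run on the cleaned log.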

3.2.3. Invisible Tasks Detection

The α# algorithm introduced the mendacious dependency $\rightsquigarrow_L$ to reflect invisible tasks of types SKIP, REDO, and SWITCH. The relation $\rightsquigarrow_L$ can be used to detect the existence of invisible tasks in the event log and to discover them in a process model. It is given in Definition 3.
Definition 3. 
(Mendacious dependency associated with invisible tasks). Let $A$ be a set of activities, $L$ be an event log over $A$, and $a$, $b$ be two activities from $A$. The mendacious dependency associated with invisible tasks is defined as follows:
$$a \rightsquigarrow_L b \iff a \rightarrow_L b \wedge \exists x, y \in A:\ a \rightarrow_L x \wedge y \rightarrow_L b \wedge y \not>_L x \wedge \neg(x \parallel_L b) \wedge a \nparallel_L y$$
This relation is capable of detecting invisible tasks of types Short-Skip, Long-Skip, Short-Redo, and Long-Redo. The basic idea is illustrated in Figure 4, where task $t$ represents one such invisible task. The type of $t$ is determined as follows:
  • Short-Skip: Task $t$ is of Short-Skip type if tasks $x$ and $y$ are equal. In a sequential workflow, this type of invisible task can be detected by the mendacious dependency $\rightsquigarrow_L$.
  • Long-Skip: Task $t$ is of Long-Skip type if $x$ reaches $y$. Again, in a sequential workflow, this type of invisible task can be detected by the mendacious dependency $\rightsquigarrow_L$.
  • Short-Redo: Task $t$ is of Short-Redo type if tasks $a$ and $b$ are equivalent. In a sequential workflow, this type of invisible task can be detected by the mendacious dependency $\rightsquigarrow_L$.
  • Long-Redo: Task $t$ is of Long-Redo type if $b$ reaches $a$. Once again, in a sequential workflow, this type of invisible task can be detected by the mendacious dependency $\rightsquigarrow_L$.
  • Switch: If none of the above conditions are met, task $t$ is considered to be of Switch type.
However, the mendacious dependency $\rightsquigarrow_L$ is incapable of detecting these types of invisible tasks when they are executed in parallel with other tasks. To address this limitation, the α$ algorithm introduced new definitions to detect invisible tasks of types Short-Skip, Long-Skip, Short-Redo, and Long-Redo even when they are executed concurrently with other tasks. The algorithm improved the detection of these invisible tasks by considering their interactions and dependencies in parallel workflows.
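To make the sequential case concrete, the sketch below checks the mendacious dependency of Definition 3 on a small log. It assumes the usual α# reading of the definition (a causal pair a, b for which tasks x and y exist with a causally before x, y causally before b, y never directly followed by x, x not parallel to b, and a not parallel to y). The helper names and the sample logs are assumptions for illustration.

```python
from itertools import product


def ordering_relations(log):
    """Direct succession, causal and parallel relations of a list of traces."""
    direct = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    triangle = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 2) if t[i] == t[i + 2]}
    diamond = {(a, b) for (a, b) in triangle if (b, a) in triangle}
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct or (a, b) in diamond}
    parallel = {(a, b) for (a, b) in direct if (b, a) in direct and (a, b) not in diamond}
    return direct, causal, parallel


def mendacious_dependencies(log):
    """Pairs (a, b) whose causal dependency hints at an invisible task."""
    direct, causal, parallel = ordering_relations(log)
    activities = sorted({a for trace in log for a in trace})
    found = set()
    for a, b in causal:
        for x, y in product(activities, repeat=2):
            if (
                (a, x) in causal
                and (y, b) in causal
                and (y, x) not in direct
                and (x, b) not in parallel
                and (a, y) not in parallel
            ):
                found.add((a, b))
                break
    return found


# In the first log, b is sometimes skipped, so a is directly followed by c.
skip_pairs = mendacious_dependencies(["abc", "ac"])
plain_pairs = mendacious_dependencies(["abc"])
print(skip_pairs, plain_pairs)
```

In the first log the witnesses are x = y = b, which corresponds to the Short-Skip case described above. A purely sequential log yields no mendacious dependency.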
The α$ algorithm introduced the Between-set to improve the mendacious dependency. The Between-set is defined in Definition 4 and refers to the tasks occurring between two tasks. If the two tasks are the endpoints of a concurrent construct, the Between-set is the set of tasks in the parallel branches. For instance, in Figure 5, $Between(L, a, b) = \{x, y, c\}$.
Definition 4. 
(Between-Set). Let $A$ be a set of activities, $L$ be an event log over $A$, $a$ and $b$ be two activities from $A$, and $\sigma$ be a trace of length $m$ that belongs to $L$. The Between-Set of $a$ and $b$, i.e., $Between(L, a, b)$, is defined as follows:
$$Between(\sigma, a, b) = \{\sigma_k \mid 1 \le i < j \le m \wedge \sigma_i = a \wedge \sigma_j = b \wedge i < k < j \wedge \nexists\, i < l < j\ (\sigma_l = a \vee \sigma_l = b)\},$$
$$\overline{Between}(\sigma, a, b) = \{\sigma_k \mid 1 \le k \le m\} \setminus Between(\sigma, a, b), \text{ and}$$
$$Between(L, a, b) = \bigcup_{\sigma \in L} Between(\sigma, a, b) \setminus \bigcup_{\sigma \in L} \overline{Between}(\sigma, a, b).$$
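The Between-set can be sketched as follows. The code assumes one particular reading of Definition 4: the trace-level set collects the tasks strictly between an occurrence of a and the next occurrence of b with no other a or b in between, the second set is its complement within the trace, and the log-level set is the union of the former minus the union of the latter. The function names and the sample logs are illustrative and are not the log behind Figure 5.

```python
def between_trace(trace, a, b):
    """Tasks strictly between a and the next b, with no a or b in between."""
    result = set()
    for i, task in enumerate(trace):
        if task != a:
            continue
        for j in range(i + 1, len(trace)):
            if trace[j] == b:
                result.update(trace[i + 1:j])
                break
            if trace[j] == a:
                # Another a occurs first, so this start position is not valid.
                break
    return result


def between_log(log, a, b):
    """Tasks that occur between a and b and never outside such a segment."""
    inside = set()
    outside = set()
    for trace in log:
        segment = between_trace(trace, a, b)
        inside |= segment
        outside |= set(trace) - segment
    return inside - outside


parallel_branches = between_log(["axycb", "acxyb", "axcyb"], "a", "b")
also_outside = between_log(["axb", "xab"], "a", "b")
print(parallel_branches, also_outside)
```

In the first sample log, x, y, and c always occur between a and b, so all three belong to the Between-set. In the second, x also occurs before a, so it is excluded.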
The mendacious dependency of Definition 3 is improved in Definition 5. The improved mendacious dependency is capable of detecting these types of invisible tasks when they are involved in a concurrent construct. Its basic idea is illustrated in Figure 5, where task $t$ represents an invisible task in a parallel branch. As in Figure 4, if the two tasks $x$ and $y$ are equal, $t$ is of Short-Skip type. If $x$ reaches $y$, $t$ is of Long-Skip type. If the two tasks $a$ and $b$ are equivalent, $t$ is of Short-Redo type. If $b$ reaches $a$, $t$ is of Long-Redo type. Otherwise, $t$ is of Switch type.
Definition 5. 
(Mendacious dependencies associated with invisible tasks in parallel). Let A be a set of activities, L be an event l o g   o v e r   A , a and b be two activities from A , σ be a trace that belongs to L , and the mendacious dependency a L b associated with invisible tasks involved in a parallel construct is defined as follows:
a L b   x , y A B e t w e e n   L , x , y B e t w e e n   L , a , b       m B e t w e e n L ,   a , b B e t w e e n L , x , y x , y   n B e t w e e n   L , x , y : m     L   n     σ L   B e t w e e n   σ ,   a , b     B e t w e e n L ,   a , b x , y ,
a L b a L b b L a ( a L b ) , a n d
a L b   x , y A : a L x     y L b x L b   y L a a L b y L x .
The mendacious dependency of Definition 3 can detect invisible tasks of types Skip, Redo, and Switch involved in a sequential construct using one relation. It can be seen as a combined relation that detects invisible tasks of types Short-Skip, Long-Skip, Short-Redo, and Long-Redo involved in a sequential construct. Similarly, the improved dependency of Definition 5 can detect invisible tasks of types Skip, Redo, and Switch involved in a parallel construct using one relation, and it can be seen as a combined relation that detects invisible tasks of types Short-Skip, Long-Skip, Short-Redo, and Long-Redo involved in a parallel construct. However, some process discovery algorithms can detect certain types of invisible tasks but not others. For instance, the ETM algorithm [22] is capable of discovering invisible tasks of types Short-Skip and Long-Skip involved in a sequential construct, whereas it is incapable of discovering those of types Short-Redo and Long-Redo in sequence, Short-Skip and Long-Skip in parallel, and Switch. Similarly, the Heuristic Miner [12] has been shown to detect invisible tasks. However, it cannot detect those of type Short-Redo in a sequential construct or Short-Skip in a parallel construct. Hence, even though the Heuristic Miner can discover invisible tasks, it will not be recommended if an invisible task of type Short-Skip in a parallel construct is detected in the log. The two dependencies do not detect the types of invisible tasks separately. Therefore, we need to define a relation for each type of invisible task.
In our framework, we define a relation for each type of invisible task by splitting the two dependencies of Definition 3 and Definition 5, based on the following observations. If the two tasks $x$ and $y$ are equal, $t$ is of the Short-Skip type. If $x$ reaches $y$, $t$ is of the Long-Skip type. If the two tasks $a$ and $b$ are equivalent, $t$ is of the Short-Redo type. If $b$ reaches $a$, $t$ is of the Long-Redo type. Otherwise, $t$ is of the Switch type. The relations detecting invisible tasks of types Short-Skip, Short-Redo, Long-Skip, and Long-Redo in a sequential construct are defined in Definition 7, Definition 8, Definition 9, and Definition 10, respectively. The relations detecting those involved in a parallel construct are defined, in the same order, in Definition 11, Definition 12, Definition 13, and Definition 14.
Before that, we define two relations that will be used in the new definitions for detecting each type of invisible task separately. These two relations are $\gg_L$ and $\succ_L$. They were introduced in the α++ algorithm. Relation $\gg_L$ represents the case where one task can only be indirectly followed by another task, while relation $\succ_L$ refers to the situation where one task can be followed by another task either directly or indirectly. $a \gg_L b$ means that task $b$ is reachable from task $a$ indirectly, and $a \succ_L b$ indicates that task $b$ is reachable from task $a$ directly or indirectly. Relations $\gg_L$ and $\succ_L$ are defined in Definition 6.
Definition 6. 
(Reachable dependencies). Let $A$ be a set of tasks, $L$ be an event log over $A$, and $a$, $b$ be two tasks from $A$. The dependency $a \gg_L b$ reflects the indirect reachable dependency between the two tasks $a$ and $b$, and $a \succ_L b$ reflects the indirect or direct reachable dependency between them. These two relations are defined as follows:
$$a \gg_L b \iff \exists \sigma = t_1 t_2 \ldots t_n \in L,\ i, j \in \{1, \ldots, n\}:\ i < j \wedge t_i = a \wedge t_j = b \wedge \forall k \in \{i+1, \ldots, j-1\}:\ t_k \ne a \wedge t_k \ne b$$
$$a \succ_L b \iff a \rightarrow_L b \vee (a \gg_L b)$$
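The two reachable dependencies can be sketched as below. The trace condition of Definition 6 is implemented literally. Because the prose describes the indirect relation as holding only when a task is not directly followed by the other, the sketch additionally removes pairs that are in direct succession; this extra step is an assumption of the example, as are the function name and the sample log.

```python
def reachable_relations(log):
    """Return the indirect and the direct-or-indirect reachable dependencies."""
    direct = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    triangle = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 2) if t[i] == t[i + 2]}
    diamond = {(a, b) for (a, b) in triangle if (b, a) in triangle}
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct or (a, b) in diamond}

    # Trace condition: b occurs after a, with neither a nor b in between.
    eventually = set()
    for trace in log:
        for i in range(len(trace)):
            for j in range(i + 1, len(trace)):
                a, b = trace[i], trace[j]
                if all(s != a and s != b for s in trace[i + 1:j]):
                    eventually.add((a, b))

    # Assumption: "only indirectly" excludes pairs in direct succession.
    indirect = eventually - direct
    # Direct (causal) or indirect reachability.
    reach = causal | indirect
    return indirect, reach


indirect, reach = reachable_relations(["abc"])
print(sorted(indirect), sorted(reach))
```

For the single trace "abc", only the pair (a, c) is indirectly reachable, while the combined relation also contains the two causal pairs.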
Definition 7. 
(Short-Skip in sequence). Let $A$ be a set of tasks, $L$ be an event log over $A$, and $a$, $b$ be two tasks from $A$. The dependency $a \rightsquigarrow_L^{sss} b$ reflects the mendacious dependency associated with invisible tasks of type Short-Skip in sequence and is defined as follows:
$$a \rightsquigarrow_L^{sss} b \iff (a \rightarrow_L b) \wedge \exists x \in A:\ (a \rightarrow_L x) \wedge (x \rightarrow_L b) \wedge (x \nparallel_L b) \wedge (a \nparallel_L x) \wedge (x \not>_L x).$$
Definition 8. 
(Short-Redo in sequence). Let A be a set of tasks, L   be an event log over A , and x , y two tasks from A . The dependency x L s r s y represents the mendacious dependency associated with invisible tasks of type Short Redo in Sequence and is defined as follows:
y L s r s x y L x a T :   a L x y L a x L a a L y a > L a .
Definition 9. 
(Long-Skip in sequence). Let A be a set of tasks, L be an event log over A ,   a n d   a ,   b two activities from A . The dependency a L l s s b reflects the mendacious dependency associated with invisible tasks of type long Skip in Sequence and is defined as follows:
a L l s s b a L b       x , y A :   a L x     y L b     y L x     ( x L b ) a L y x L y .
Definition 10. 
(long redo in sequence). Let A be a set of tasks, L be an event log over A ,   a n d   x ,   y two activities from A . The dependency x L s r s y which represents the mendacious dependency associated with invisible tasks of type long Redo in Sequence is defined as follows:
y L l r s x   a , b A :     a L x     y L b x L b   y L a     ( b L a )   ( a > L b ) .
Definition 11. 
(Short-Skip in parallel). Let A be a set of tasks, L be an event log over A , and a , b two activities from A . The dependency x L s s p y which represents the mendacious dependency associated with invisible tasks of type Short-Skip in Parallel is defined as follows:
a L s h b x B e t w e e n L ,   a , b   m ( B e t w e e n L ,   a , b x ) : m     L x     σ L B e t w e e n σ ,   a , b B e t w e e n L ,   a , b x ,
a L s h b a L s h b b L s h a ( a L b ) ,
a L s s p b   x , y A : a L x     y L b x L b   y L a ( a L s h b )
.
Definition 12. 
(Short-Redo in parallel). Let A be a set of tasks, L be an event log over A ,   a n d   x ,   y two activities from A . The dependency x L s r p y which represents the mendacious dependency associated with invisible tasks of type Short-Redo in Parallel is defined as follows:
y L s r p x   a A : a L x     y L a a L x   y L a y L s h x   ( a > L a )
.
Definition 13. 
(long skip in parallel). Let A be a set of tasks, L be an event log over A ,   a n d   a ,   b two activities from A . The dependency x L l s p y which represents the mendacious dependency associated with invisible tasks of type Long-Skip in Parallel is defined as follows:
a L l g b   x , y A B e t w e e n   L , x , y B e t w e e n   L , a , b       m B e t w e e n L ,   a , b B e t w e e n L , x , y x , y   n B e t w e e n   L , x , y : m     L   n     σ L   B e t w e e n   σ ,   a , b     B e t w e e n L ,   a , b x , y ,
a L l g b a L l g b b L l g a ( a L b ) ,
a L l s p b   x , y A : a L x     y L b x L b   y L a a L l g b y L x ( x L y )
.
Definition 14. 
(long redo in parallel). Let A be a set of tasks, L be an event log over A ,   a n d   a ,   b   two activities from   A . The dependency x L l r p y which represents the mendacious dependency associated with invisible tasks of type Long-Redo in Parallel is defined as follows:
y L l r p x   a , b A : a L x     y L b x L b   y L a y L l g x b L a ( b L a )
.

3.2.4. Non-Free Choice Constructs

Definition 16 gives the transformed implicit ordering relations that detect non-free choice constructs using only the information in the event log. These relations are derived from the three implicit dependencies introduced in the α++ algorithm, which were originally defined on the places of a workflow net. The α++ algorithm proposed the three implicit dependencies to detect five types of non-free choice constructs, as depicted in Figure 6. Since the framework uses only the event log for non-free choice detection, we converted the conditions on places in the original algorithm into conditions on transitions. Further details on the implicit dependencies defined in the α++ algorithm can be found in [23].
Before that, we introduce two relations defined in the α++ algorithm, XOR-split and XOR-join, which are used in the definition of the implicit dependencies. We also introduce a new relation, AND-split, which is used in these definitions as well.
  • XOR-split: This relation represents an exclusive OR (XOR) splitting of a process flow. It indicates that at a specific point in the process there are multiple possible outgoing transitions, but only one of them can be taken.
  • XOR-join: This relation corresponds to an exclusive OR (XOR) joining of a process flow. It indicates that there are multiple incoming transitions to a specific point in the process, but only one of them is enabled and must be followed.
  • AND-split: This relation represents an AND-split in the process flow. It indicates that at a particular point in the process, multiple outgoing transitions can be taken simultaneously.
These three relations are given in Definition 15. They are used in the subsequent definitions of implicit dependencies for detecting non-free choice constructs within the framework.
Definition 15. 
(XOR-Split, XOR-Join and AND-split). Let A be a set of tasks, L be an event log over A, and a, b two activities from A.
a L b   a L b   c A : c   L a c   L   b ,
a L b   a L b   c A : a L c b   L   c ,
a L     x ,   y A : a   L   x a   L   y   ( x   L y )
.
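The relations of Definition 15 can be illustrated with a short Python sketch. It assumes the usual α-family reading: two tasks are in XOR-split (XOR-join) relation when they never follow each other directly and share a causal predecessor (successor), and a task is an AND-split when it causally precedes two tasks that are parallel to each other. The function names, the restriction to distinct tasks, and the use of the basic α ordering relations (without the short-loop refinements of α+) are assumptions of this sketch, not part of the framework.

```python
from itertools import product


def log_relations(log):
    """Derive the basic alpha-style ordering relations from a list of traces."""
    tasks = {task for trace in log for task in trace}
    # (a, b) in follows: b directly follows a in at least one trace
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    causal = {(a, b) for (a, b) in follows if (b, a) not in follows}
    parallel = {(a, b) for (a, b) in follows if (b, a) in follows}
    unrelated = {(a, b) for a, b in product(tasks, repeat=2)
                 if (a, b) not in follows and (b, a) not in follows}
    return tasks, causal, parallel, unrelated


def xor_split(log):
    """Pairs of distinct, unrelated tasks that share a causal predecessor."""
    tasks, causal, _, unrelated = log_relations(log)
    return {(a, b) for (a, b) in unrelated if a != b
            and any((c, a) in causal and (c, b) in causal for c in tasks)}


def xor_join(log):
    """Pairs of distinct, unrelated tasks that share a causal successor."""
    tasks, causal, _, unrelated = log_relations(log)
    return {(a, b) for (a, b) in unrelated if a != b
            and any((a, c) in causal and (b, c) in causal for c in tasks)}


def and_split(log):
    """Tasks that causally precede two distinct tasks that are parallel."""
    tasks, causal, parallel, _ = log_relations(log)
    return {a for a in tasks
            if any(x != y and (a, x) in causal and (a, y) in causal
                   and (x, y) in parallel
                   for x, y in product(tasks, repeat=2))}


choice_log = [("A", "B", "D"), ("A", "C", "D")]
parallel_log = [("A", "B", "C", "D"), ("A", "C", "B", "D")]
print(xor_split(choice_log), xor_join(choice_log), and_split(parallel_log))
```

In the choice log, B and C are alternatives after A that merge before D; in the parallel log, A starts B and C concurrently, so only the AND-split predicate fires.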
Definition 16. 
(Implicit ordering relations). Let A be a set of tasks, L be an event log over A, and a, b two activities from A. The implicit dependencies L1, L2, and L3, which detect non-free choice constructs, are defined as follows:
a L 1 b   a L b     c A : a > L c   c L b     t A : t > L a   t > L a     t   L a ,
a L 21 b   a L b   a L     b A : b L b t A : a > L t   t L b   o r   t   L b t A : a > L t t L b     t L b ,
a L 22 b   a L b   b L     a A : a L a t A : t > L b a L t   o r   a   L t t A : t > L b   a L t   a L t ,
a L 2 b     a L 21 b   a L 22 b   ,   a n d
a L 3 b     a L a b L b a L b a ¬ L b .
Relation L1 detects the implicit dependencies illustrated in Figure 6b,g. Relation L2 detects the implicit dependencies shown in Figure 6c–f. Relation L3 detects implicit dependencies similar to Figure 6a. Each relation is evaluated directly on the event log, from the ordering patterns among the recorded activities.
Together, these three relations allow the framework to discover from the event log the implicit dependencies that correspond to the patterns represented in Figure 6.

3.2.5. Invisible Tasks Involved in a Non-Free Choice Construct Detection

There are cases where invisible tasks are involved in a non-free choice construct. An example is illustrated in Figure 7. Tasks t1 and t2 are invisible tasks that skip the execution of tasks B and E, respectively. Together with E, p2, p3, and p4, t2 forms a non-free choice construct: if the invisible task t1 is executed, t2 will be executed later and E will not be performed; if B is executed, E will be executed later and t2 will not. To detect the involvement of invisible tasks in a non-free choice construct, the reachable dependency between the two invisible tasks t1 and t2 needs to be detected in this example. For this, the α$ algorithm introduced the notion of conditional reachable dependency (CRD), which requires artificially adding a starting task and an ending task to each trace. The conditional reachable dependency is defined in Definition 17, which uses the indirect-succession relation (a is indirectly followed by b). Definition 17 introduces three types of CRDs: pre-CRD (with condition pre = x), post-CRD (with condition post = y), and both-CRD (with conditions pre = x and post = y). A both-CRD from a to b with pre = x and post = y indicates that there is a trace in which a is indirectly followed by b, x is executed directly before a, and y is performed directly after b.
Definition 17. 
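(Conditional reachable dependency). Let A be a set of activities and L be an event log over A. Let $\bot$ and $\top$ be a starting activity and an ending activity artificially added to each trace in the event log, such that for a trace $\sigma$ of length $n$, $\sigma_0 = \bot$ and $\sigma_{n+1} = \top$. Let a, b be two activities from A, and x, y two activities from $A \cup \{\bot, \top\}$. Conditional reachable dependencies are defined as follows:
$$a \gg_{L,\,pre=x} b \iff a \gg_L b \wedge \left(\exists \sigma \in L \wedge 1 \le i \le |\sigma| \wedge \sigma_i = a \wedge \sigma_{i-1} = x \wedge a \gg_\sigma b\right),$$
$$a \gg_{L,\,post=y} b \iff a \gg_L b \wedge \left(\exists \sigma \in L \wedge 1 \le j \le |\sigma| \wedge \sigma_j = b \wedge \sigma_{j+1} = y \wedge a \gg_\sigma b\right),$$
$$a \gg_{L,\,pre=x,\,post=y} b \iff a \gg_L b \wedge \left(\exists \sigma \in L \wedge 1 \le i, j \le |\sigma| \wedge \sigma_i = a \wedge \sigma_j = b \wedge \sigma_{j+1} = y \wedge \sigma_{i-1} = x \wedge a \gg_\sigma b\right).$$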
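As an illustration of Definition 17, the following Python sketch checks a conditional reachable dependency on a list of traces. It makes several assumptions that are not stated in the source: the artificial start and end activities are represented by the hypothetical strings `"<start>"` and `"<end>"`, indirect succession within a trace allows direct adjacency and forbids another occurrence of either task in between, and the `pre` and `post` conditions are checked on the same pair of occurrences that witnesses the indirect succession.

```python
START, END = "<start>", "<end>"  # hypothetical markers for the artificial tasks


def indirect_pairs(trace):
    """Yield index pairs (i, j), i < j, where trace[j] is reached from trace[i]
    with no further occurrence of either task strictly in between."""
    for i, a in enumerate(trace):
        for j in range(i + 1, len(trace)):
            between = trace[i + 1:j]
            if a not in between and trace[j] not in between:
                yield i, j


def conditional_reachable(log, a, b, pre=None, post=None):
    """True if some trace has a indirectly followed by b, with `pre` directly
    before that a (if given) and `post` directly after that b (if given)."""
    for trace in log:
        padded = (START, *trace, END)
        for i, j in indirect_pairs(padded):
            if i == 0 or j == len(padded) - 1:
                continue  # a and b must be real activities, not the markers
            if padded[i] != a or padded[j] != b:
                continue
            if pre is not None and padded[i - 1] != pre:
                continue
            if post is not None and padded[j + 1] != post:
                continue
            return True
    return False


# Event log given in the text for the WF-net of Figure 8
log = [
    ("A", "B", "C", "D", "E", "F"),
    ("A", "C", "D", "F"),
    ("A", "B", "C", "D", "F"),
    ("A", "C", "D", "E", "F"),
]
print(conditional_reachable(log, "C", "D", pre="A", post="F"))
print(conditional_reachable(log, "C", "D", pre="A", post="E"))
```

Both calls succeed on this log: after B is skipped (C directly preceded by A), D can be followed either by E or directly by F, which is the free-choice behavior discussed for Figure 8.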
The α$ algorithm defines the reachable dependency related to invisible tasks on top of the conditional reachable dependency. For two invisible tasks x and y, $x \gg_L y$ holds if there exist four tasks $a_1$, $a_2$, $b_1$, and $b_2$ such that $a_1 \rightarrow_L x$, $x \rightarrow_L b_1$, $a_2 \rightarrow_L y$, $y \rightarrow_L b_2$, and $b_1 \gg_{L,\,pre=a_1,\,post=b_2} a_2$. For an invisible task x and an activity m, $x \gg_L m$ holds if there exist two tasks a and b such that $a \rightarrow_L x$, $x \rightarrow_L b$, and $b \gg_{L,\,pre=a} m$. The case $m \gg_L x$ is similar to $x \gg_L m$. The reachable dependencies related to invisible tasks involved in a non-free choice construct are defined in Definition 18.
Definition 18. 
(Reachable dependencies related to invisible tasks with non-free choice). Let A be a set of activities, L be an event log over A , m be an activity from A ,   a n d   x , y be two invisible tasks. The reachable dependencies related to invisible tasks in a non-free choice are defined as follows:
x L m     a =     a L     b L   a L x     x L b     b L ,   p r e = a m
m L x       a L     ( b L     b = )   a L x     x L b     m L ,   p r e = b a
x L y     a 1 =     a 1 L     b 1 L     a 2 L     ( b 2 L     b 2 = )     a 1 L x     x L b 1 a 2 L y   y L b 2   b 1 L ,   p r e = a 1 , p o s t = b 2 a 2
.
However, the reachable dependency defined in the α$ algorithm to detect the involvement of invisible tasks in a non-free choice construct does not differentiate between a non-free choice and a free choice. For instance, the reachable dependency between t1 and t2 holds for the event log of the WF-net shown in Figure 7, where there is a reachable dependency between t1 and t2, and it also holds for the event log of the WF-net shown in Figure 8, where there is none. In the log of Figure 8, the relation between t1 and t2 holds, indicating an XOR-split between t1 and t2, but the same relation also holds between t1 and another task E. For a reachable dependency between t1 and t2 to be properly detected, the relation between t1 and E must not hold. Based on this observation, we improved the reachable dependency related to invisible tasks in Definition 19. The improved definition adds this constraint, so that a dependency between t1 and t2 is reported only when t1 is not related in the same way to another task such as E. Figure 8 depicts a sound WF-net with invisible tasks t1 and t2 that are not involved in a non-free choice construct. The event log observed for this WF-net is L = {<A, B, C, D, E, F>, <A, C, D, F>, <A, B, C, D, F>, <A, C, D, E, F>}.
Definition 19. 
(Improved reachable dependency related to invisible tasks). Let A be a set of activities, L be an event log over A , m be an activity from   A ,   x , y be two invisible tasks, and the improved reachable dependencies related to invisible tasks in a non-free choice are defined as follows:
x L m     a =     a L     b L   a L x     x L b     b L ,   p r e = a m   n σ   b L ,   p r e = a n   ,
m L x       a L     ( b L     b = )   a L x     x L b     m L ,   p r e = b a   n σ   m L ,   p r e = n a   ,
x L y     a 1 =     a 1 L     b 1 L     a 2 L     ( b 2 L     b 2 = )     a 1 L x     x L b 1 a 2 L y   y L b 2   b 1 L ,   p r e = a 1 , p o s t = b 2 a 2     n L   b 1 L ,   p r e = a 1 , p o s t = b n a 2
.

3.3. Knowledge Data Base Construction

All existing process discovery algorithms correctly discover the standard constructs (sequence, choice, and parallel). The algorithms differ in how well they mine complex constructs. The knowledge database records the ability of each algorithm to mine the complex constructs. The process discovery techniques included in the knowledge database are the α algorithm, the α+ algorithm, the α++ algorithm, the α# algorithm, Inductive Miner (IM), Heuristics Miner (HM), ILP [24], ETM [22], Region Miner (RM), Transition System (TS) [25], DWS [26], and Genetic Miner (GM). Other algorithms exist, but they are not implemented in ProM, and only algorithms implemented in ProM are included in the knowledge database. Any newly developed algorithm that is implemented in ProM can be added. For each algorithm, the knowledge database records whether it can discover the following complex constructs: short loops of length one (L1p for short), short loops of length two (L2p), invisible tasks of type Short-Skip in sequence (IvT-SS-seq), invisible tasks of type Short-Skip in parallel (IvT-SS-par), invisible tasks of type Short-Redo in sequence (IvT-SR-seq), invisible tasks of type Short-Redo in parallel (IvT-SR-par), invisible tasks of type Long-Skip in sequence (IvT-LS-seq), invisible tasks of type Long-Skip in parallel (IvT-LS-par), invisible tasks of type Long-Redo in sequence (IvT-LR-seq), invisible tasks of type Long-Redo in parallel (IvT-LR-par), non-free choice constructs (NFC), and invisible tasks involved in a non-free choice construct (IvT-NFC).

3.3.1. Algorithms Classification Based on Their Ability in Mining Complex Constructs

There are plenty of empirical studies on the capability of process discovery techniques in mining complex constructs. However, most studies focus on a small number of algorithms or on the most frequently used ones. Moreover, they do not evaluate the ability of algorithms to discover all types of complex constructs. Therefore, we generated event logs containing all of the aforementioned types of complex constructs. We imported the event logs into ProM and ran the plugin of each of the aforementioned algorithms.
The α algorithm and the α+ algorithm were not evaluated, because it is already known that the α algorithm cannot discover complex constructs and that the α+ algorithm can discover only the two types of short loops (L1p and L2p). We ran experiments on the remaining algorithms. The results show that the α++ algorithm can correctly discover the two types of short loops and the three types of non-free choice constructs, while it cannot discover any type of invisible task or IvT-NFC. The α# algorithm can mine L1p, L2p, IvT-SS-seq, IvT-SR-seq, IvT-LS-seq, and IvT-LR-seq, whereas it cannot discover IvT-SS-par, IvT-SR-par, IvT-LS-par, IvT-LR-par, IvT-SW, NFC, and IvT-NFC. The Inductive Miner can correctly discover the two types of short loops and seven types of invisible tasks. However, it cannot detect invisible tasks of type Switch and of type Short-Redo in parallel, non-free choice constructs, or invisible tasks involved in a non-free choice construct. The Heuristics Miner can mine invisible tasks of types IvT-LR-seq, IvT-SS-seq, IvT-LS-seq, IvT-LS-par, and IvT-SW, as well as behavior similar to short loops of type L2p, but it cannot discover short loops of type L1p, invisible tasks of types IvT-SR-seq, IvT-SR-par, and IvT-SS-par, non-free choice constructs, or invisible tasks involved in a non-free choice construct. The Integer Linear Programming miner (ILP) can mine the two types of short loops, whereas it cannot discover any type of invisible task, non-free choice constructs, or invisible tasks involved in a non-free choice construct.
The Evolutionary Tree Miner (ETM) can discover short loops of type L1p and invisible tasks of types IvT-SS-seq and IvT-LS-seq, while it cannot mine short loops of type L2p, invisible tasks of types IvT-SR-seq, IvT-SR-par, IvT-LR-seq, IvT-LR-par, IvT-SS-par, IvT-LS-par, and IvT-SW, non-free choice constructs, or invisible tasks involved in a non-free choice construct. Region Miner (RM) can derive short loops of type L2p and behavior similar to invisible tasks of type IvT-LR-seq, while it cannot discover short loops of type L1p, the remaining invisible tasks, non-free choice constructs, or IvT-NFC. The Transition System (TS) miner can derive short loops of type L2p, behavior similar to L1p, invisible tasks of type IvT-SS-seq, behavior similar to IvT-SR-seq, and behavior similar to IvT-SS-par. However, this technique cannot discover IvT-LR-seq, IvT-LR-par, IvT-LS-seq, IvT-LS-par, non-free choice constructs, or invisible tasks involved in a non-free choice construct. The Disjunctive Workflow Schema (DWS) algorithm can mine short loops of type L2p and invisible tasks of types IvT-SS-seq, IvT-SR-seq, IvT-LS-seq, IvT-LR-seq, and IvT-SW. Nevertheless, it cannot derive short loops of type L1p, any invisible task involved in a parallel construct, non-free choice constructs, or invisible tasks involved in a non-free choice construct. Finally, Genetic Miner (GM) can discover all constructs except invisible tasks of type IvT-SS-par, non-free choice constructs, and IvT-NFC. The abilities of these algorithms in mining complex constructs are summarized in Table 2.
None of the discovery algorithms implemented in ProM can discover invisible tasks of type Short-Redo in parallel or invisible tasks involved in non-free choice constructs. Only the α$ algorithm can discover these two types of invisible tasks, but it is not implemented in ProM. Therefore, we exclude the detection of invisible tasks of types IvT-SR-par and IvT-NFC.
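The lookup against the knowledge database can be sketched as a set-inclusion test: an algorithm is a candidate when its capability set covers every construct detected in the log. The Python sketch below is only an illustration. It transcribes three entries from the results reported above (α++, α#, and ILP); the dictionary layout, the key names, and the function name `candidates` are assumptions, and the complete capability data is the one summarized in Table 2.

```python
# Partial knowledge base, transcribed from the results reported in the text.
# The structure and key names are hypothetical; see Table 2 for the full data.
CAPABILITIES = {
    "alpha++": {"L1p", "L2p", "NFC"},
    "alpha#": {"L1p", "L2p", "IvT-SS-seq", "IvT-SR-seq",
               "IvT-LS-seq", "IvT-LR-seq"},
    "ILP": {"L1p", "L2p"},
}


def candidates(detected, capabilities=CAPABILITIES):
    """Return the algorithms able to mine every construct in `detected`."""
    return sorted(name for name, able in capabilities.items()
                  if set(detected) <= able)


print(candidates({"NFC"}))
print(candidates({"L1p", "IvT-SS-seq"}))
print(candidates({"IvT-SR-par"}))
```

An empty result, as for IvT-SR-par here, corresponds to the situation described above in which no algorithm in the knowledge database handles the detected construct.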

3.3.2. Mining Time Based Classification of Discovery Algorithms

Process model discovery time differs from algorithm to algorithm. Most existing algorithms take from milliseconds to a few minutes to discover a process model, depending on the size of the event log. Some algorithms, however, take much longer. Genetic Miner takes a long time even with a small event log, and with a big event log it may run indefinitely. Therefore, Genetic Miner is the last candidate process discovery algorithm to recommend. The Evolutionary Tree Miner and the Integer Linear Programming algorithm also take a long time to discover a model. Genetic Miner takes more time than ETM, ETM takes more time than ILP, and all three are slow compared with the remaining algorithms.

3.3.3. Algorithms Classification Based on Their Ability in Discovering Sound Models

Some process discovery algorithms are not guaranteed to discover sound models. The α algorithm series and the Heuristics Miner might produce process models that are not sound, containing problems such as deadlocks and livelocks. The Inductive Miner and the ILP miner guarantee the soundness of discovered models [20]. A process model that is not sound cannot replay traces until the end.
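When several algorithms remain as candidates, the time and soundness classifications can be used to order them. The Python sketch below encodes only what is stated in the two subsections above: GM is slower than ETM, which is slower than ILP, which is slower than the rest, and IM and ILP guarantee soundness. Giving soundness priority over mining time, the numeric ranks, and the function name `rank_candidates` are assumptions of this sketch, since the text does not fix the order in which the two criteria are applied.

```python
# Relative slowness: higher means slower; unlisted algorithms get rank 0.
TIME_RANK = {"GM": 3, "ETM": 2, "ILP": 1}
# Algorithms stated to guarantee sound models.
SOUND_GUARANTEED = {"IM", "ILP"}


def rank_candidates(names):
    """Order candidates: soundness guarantee first (assumption), then speed."""
    return sorted(
        names,
        key=lambda n: (n not in SOUND_GUARANTEED, TIME_RANK.get(n, 0), n),
    )


print(rank_candidates(["GM", "IM", "ILP", "HM"]))
```

With this ordering, Genetic Miner ends up last, which matches its role as the last candidate to recommend.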

4. Recommendation Framework Implementation and Evaluation

This section discusses the experimental evaluation of the framework proposed for recommending business process discovery algorithms. First, we discuss the implementation in ProM, followed by an explanation of the method used for evaluation. Then in Section 4.3, we describe an evaluation based on 40 artificial examples. In Section 4.4, we discuss an evaluation based on 5 more realistic event logs.

4.1. Implementation in ProM

The framework consists of two parts. The first part detects complex constructs from a given event log. The second part recommends, based on the knowledge database, the algorithm capable of handling the constructs detected in the first part. The first part (i.e., the major part), which detects the complex characteristics (i.e., invisible tasks, non-free choice, loops), has been implemented in ProM. ProM is an extensible framework that provides a comprehensive set of plugins for the discovery, conformance checking and analysis of process models from event logs and can be downloaded from http://www.processmining.org (accessed on 3 January 2023). ProM takes an event log in the standard XES format as input and uses process mining plugins for different purposes. The knowledge in the database on the ability of algorithms to mine complex constructs can be updated as research enhances the ability of existing algorithms to handle complex constructs. Figure 9 shows a screenshot of the plugin for detecting the status of complex constructs from a given event log.
The plugin works as follows. Once the event log in question is imported into ProM, the plugin “Event Log Characteristics Detection for Recommending Process Discovery Algorithms” is selected and run. The output is a table that indicates, with “YES” or “NO”, whether each complex construct exists in the event log: short loops of length one (L1p for short), short loops of length two (L2p), invisible tasks of type Short-Skip in sequence (IvT-SS-seq), invisible tasks of type Short-Skip in parallel (IvT-SS-par), invisible tasks of type Short-Redo in sequence (IvT-SR-seq), invisible tasks of type Short-Redo in parallel (IvT-SR-par), invisible tasks of type Long-Skip in sequence (IvT-LS-seq), invisible tasks of type Long-Skip in parallel (IvT-LS-par), invisible tasks of type Long-Redo in sequence (IvT-LR-seq), invisible tasks of type Long-Redo in parallel (IvT-LR-par), non-free choice constructs (NFC), and invisible tasks involved in a non-free choice construct (IvT-NFC). Then, based on the table and the knowledge database, candidate algorithms are recommended. A screenshot of an example result is shown in Figure 10.

4.2. Evaluation Framework

The framework has been evaluated using 40 artificial event logs and 5 real-life event logs. The 40 artificial event logs contain random combinations of the 11 characteristics L1p, L2p, IvT-SS-seq, IvT-SR-seq, IvT-LS-seq, IvT-LR-seq, IvT-SS-par, IvT-SR-par, IvT-LS-par, IvT-LR-par, and NFC. The characteristics present in each log were detected, and candidate algorithms were recommended based on the detected characteristics and on the knowledge database, which records the ability of the α+ algorithm, the α++ algorithm, the α# algorithm, Inductive Miner (IM), Heuristics Miner (HM), Integer Linear Programming algorithm (ILP), Evolutionary Tree Miner (ETM), Region Miner (RM), Transition System (TS), and DWS to mine the complex constructs. If more than one algorithm is recommended, the candidates are narrowed further using the time classification and the soundness classification. The final recommended algorithm is then used to discover a process model from the event log in question. For artificial event logs, the process model discovered with the final recommended algorithm is compared with the reference model of the event log. In practice, reference models are usually not available. Therefore, for real-life data, the event log is replayed on the model discovered with the recommended algorithm to investigate whether the log conforms to the discovered process model. The evaluation method is illustrated in Figure 11. The comparison of the discovered model with the original model and the event log is conducted using conformance checking metrics.
We used three types of metrics: behavioral similarity ($B_P$ and $B_R$), to evaluate how similarly the discovered model and the original model behave in terms of precision and recall; structural similarity ($S_P$ and $S_R$), to assess how structurally similar the discovered model and the original model are in terms of precision and recall; and ETC precision, to identify whether the process model is precise with respect to the observed behavior (i.e., the event log). Behavioral and structural similarity metrics are used for the evaluation with artificial data. The ETC precision metric is used for the evaluation with real-life data, where the reference model is unavailable.
To identify the similarity between two process models, an original model and a discovered model, both the behavioral and the structural similarity between them must be considered [2]. The behavioral similarity metrics measure the similarity in behavior between two models in terms of precision and recall [3]. These metrics use the event log to quantify how similar the behavior of the discovered model is to that of its original model. This is done by replaying each trace against the two models and calculating how many activities are enabled in each model at the occurrence of every event in the trace. The more enabled activities the two models have in common, the higher the similarity between them.
Definition 20. 
(Behavioral precision and recall) [15]. Some parameters are defined as follows:
σ: a trace in an event log.
L(σ): the number of occurrences of σ in an event log.
$N_o$ and $N_d$: the respective Petri nets for the original and the discovered models.
$C_o$ and $C_d$: the respective causality relations for $N_o$ and $N_d$.
The behavioral precision and recall are defined as:
$$B_P(L, C_o, C_d) = \frac{\sum_{\sigma \in L} \left( \frac{L(\sigma)}{|\sigma|} \times \sum_{i=0}^{|\sigma|-1} \frac{|Enabled(C_o, \sigma, i) \cap Enabled(C_d, \sigma, i)|}{|Enabled(C_d, \sigma, i)|} \right)}{\sum_{\sigma \in L} L(\sigma)}$$
$$B_R(L, C_o, C_d) = \frac{\sum_{\sigma \in L} \left( \frac{L(\sigma)}{|\sigma|} \times \sum_{i=0}^{|\sigma|-1} \frac{|Enabled(C_o, \sigma, i) \cap Enabled(C_d, \sigma, i)|}{|Enabled(C_o, \sigma, i)|} \right)}{\sum_{\sigma \in L} L(\sigma)}$$
where $Enabled(C_o, \sigma, i)$ is the set of enabled activities when parsing the next event (or task) after position $i$ in trace $\sigma$ [15]. The values of both behavioral precision and recall lie in the range [0, 1]. A value close to 1 indicates a very high degree of similarity between the two models. The behavioral precision reflects how much of the behavior of the discovered model is also in the original model. The behavioral recall reflects how much of the behavior of the original model also occurs in the discovered model.
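The two formulas of Definition 20 can be sketched in Python as follows. Computing the enabled sets requires replaying a trace on each model, which is outside the scope of this sketch; the parameters `enabled_o` and `enabled_d` are therefore hypothetical callables that take a trace and a position and return the set of enabled activities for the original and the discovered model. Skipping empty traces and treating an empty enabled set as contributing zero are assumptions made to avoid division by zero.

```python
from collections import Counter


def behavioral_scores(log, enabled_o, enabled_d):
    """Return (behavioral precision, behavioral recall) for a list of traces.

    enabled_o(trace, i) and enabled_d(trace, i) are assumed to return the set
    of activities enabled at position i in the original / discovered model.
    """
    counts = Counter(tuple(trace) for trace in log)  # L(sigma) per distinct trace
    total = sum(counts.values())
    precision = recall = 0.0
    for trace, freq in counts.items():
        if not trace:
            continue  # assumption: empty traces contribute nothing
        p = r = 0.0
        for i in range(len(trace)):
            e_o, e_d = enabled_o(trace, i), enabled_d(trace, i)
            common = len(e_o & e_d)
            p += common / len(e_d) if e_d else 0.0
            r += common / len(e_o) if e_o else 0.0
        precision += freq / len(trace) * p
        recall += freq / len(trace) * r
    return precision / total, recall / total


# Toy stand-ins for model replay: the original model always enables A and B,
# the discovered model only enables A.
b_p, b_r = behavioral_scores(
    [("A", "B"), ("A", "B")],
    enabled_o=lambda trace, i: {"A", "B"},
    enabled_d=lambda trace, i: {"A"},
)
print(b_p, b_r)
```

In the toy example the discovered model allows nothing that the original forbids, so precision is 1, but it enables only half of what the original enables, so recall is 0.5.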
For structural similarity, the structural precision and recall metrics [15] are used. The structural recall reflects the number of correct causality relations present in the discovered model as a fraction of the total number of causality relations in the original model. The structural precision reflects the fraction of correct causality relations present in the discovered model.
Definition 21. 
(Structural precision and recall) [15]. Let $N_o = (P_o, T_o, F_o)$ and $N_d = (P_d, T_d, F_d)$ be the respective Petri nets for the original and discovered models. Let $C_o$ and $C_d$ be the respective causality relations for $N_o$ and $N_d$. The structural precision and structural recall are defined as:
$$S_P(N_o, N_d) = \frac{|C_o \cap C_d|}{|C_d|}$$
$$S_R(N_o, N_d) = \frac{|C_o \cap C_d|}{|C_o|}$$
Both structural precision and recall are in the range [0, 1]. A value close to 1 means that the two models are structurally very similar.
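Definition 21 reduces to set operations once the causality relations are available. The Python sketch below assumes that each causality relation is represented as a set of (source, target) pairs; this representation, the function name, and the convention of returning 0 for an empty relation are assumptions of the sketch.

```python
def structural_scores(c_o, c_d):
    """Return (structural precision, structural recall) for two causality
    relations given as sets of (source, target) pairs."""
    common = len(c_o & c_d)
    precision = common / len(c_d) if c_d else 0.0
    recall = common / len(c_o) if c_o else 0.0
    return precision, recall


original = {("A", "B"), ("B", "C")}
discovered = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
s_p, s_r = structural_scores(original, discovered)
print(s_p, s_r)
```

Here the discovered model contains every causality of the original (recall 1.0), but half of its causalities are extra (precision 0.5).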

4.3. Evaluation Using Artificial Event Logs

We evaluated our approach using 40 artificial examples and compared the results of the proposed framework with empirical evaluation. The corresponding complete logs were manually generated. In this set of 40 models, the maximum number of activities in a process model is less than 15, and the number of cases in each event log is less than 30. The 40 reference models randomly include the following characteristics: L1p, L2p, IvT-SS-seq, IvT-SR-seq, IvT-LS-seq, IvT-LR-seq, IvT-SS-par, IvT-SR-par, IvT-LS-par, IvT-LR-par, and NFC. For this analysis, we used visual inspection to compare the mined model with the reference model, as well as the previously mentioned conformance metrics: $B_R$ (behavioral recall), $B_P$ (behavioral precision), $S_R$ (structural recall), and $S_P$ (structural precision). By visually comparing the reference models and the models discovered by the recommended algorithm, we found that 95% of the process models (38 out of 40 models) were similar to the reference models.
In addition to visual inspection, we computed conformance metrics to test the behavioral and structural similarities between the original models and the models discovered by the recommended algorithms. The results are illustrated in Figure 12. As can be seen, the results obtained from the conformance metrics are consistent with the results from visual inspection. The 38 process models are similar to the original models, with values equal to or close to 1. Only two discovered models are dissimilar to the reference models. These models were discovered from the event logs L29 and L38.
For L29, the structural similarity metrics deviate slightly from a value of 1, but the behavioral similarity metrics are close to 1. Upon examining the original model, it was found to contain a loop of length one, while in the discovered model, there is an invisible task of type Short-Redo in sequence. Although these structures are different, they exhibit similar behavior, which justifies the deviation in the structural similarity metrics.
For L38, the original model contains two invisible tasks involved in a non-free choice construct. However, in the discovered model, the two invisible tasks are correctly discovered, but the implicit dependency between them, reflecting the non-free choice construct, is not discovered. This is expected since none of the existing algorithms implemented in ProM are capable of discovering invisible tasks involved in a non-free choice construct.

4.4. Evaluation Based on Real-Life Event Logs

The evaluation using artificial event logs demonstrates the ability of the proposed framework to recommend a process discovery algorithm suitable for a given event log by detecting the complex constructs within it. To showcase the applicability of our framework, we conducted evaluations using five real-life event logs obtained from the Process Mining Manifesto database and the Shipbuilding Processing Plan Management System of a heavy manufacturing company. First, we identified the constructs existing in the event logs and recommended suitable algorithms using the constructed knowledge database. The results are presented in Table 3. Subsequently, we applied different process discovery algorithms to each event log to discover process models. To evaluate the discovered models, we used the “Check Conformance using ETConformance” plugin in ProM, which provides information on the precision of the mined model with respect to the corresponding event log. The precision metric, referred to as ETC precision, is displayed in Table 4. Additionally, we compared the precision results of each discovered model, by algorithm and event log, with those of the algorithm recommended in Table 3. The process models were discovered using the Inductive Miner (IM), Heuristics Miner (HM), Alpha++, Alpha#, Integer Linear Programming Miner (ILP), Evolutionary Tree Miner (ETM), DWS algorithm, Genetic Miner, and Region Miner. Table 4 only shows the IM, HM, ILP, DWS, Alpha#, and Alpha++ algorithms, as the others were excluded for exceeding the running time limit of five minutes.
As can be seen in Table 3, Alpha++ was recommended as a suitable process discovery algorithm for event log L1 because a non-free choice construct was detected in L1 and because the numbers of events, cases, and tasks are not large. In Table 4, the precision value of the model discovered by the Alpha++ algorithm is the highest (equal to 1). The precision values of the models mined by IM, HM, ILP, Alpha#, and DWS are lower. This is expected, since those algorithms cannot handle non-free choice constructs. For log L2, no complex construct was detected. Thus, any one of the five algorithms can be recommended. However, since the Alpha algorithms and the Heuristics Miner do not guarantee sound models with big event logs, we recommended IM, ILP, and DWS. The precision values of IM and DWS are similar, and the precision value of ILP is slightly higher. For log L3, an invisible task of type Short-Skip in sequence was detected. Therefore, the IM algorithm was recommended. In Table 4, the precision value of IM is the highest. The results obtained for logs L4 and L5 are similar to the previous ones. Based on these results, we conclude that the recommendation framework works well on the evaluated logs.

5. Conclusions

In today’s competitive business environment, organizations must continuously understand, analyze, and improve their processes to stay ahead in the market. Process mining offers a set of techniques that provide organizations with an in-depth understanding of their processes. It extracts process-related knowledge from the data recorded in modern process-aware information systems such as Enterprise Resource Planning (ERP) systems, Business Process Management (BPM) systems, and Supply Chain Management (SCM) systems. Among the categories of process mining techniques, process discovery plays a significant role. It enables the automatic construction of process models from the information stored in the system, reflecting the real behavior of the processes. Numerous process discovery algorithms have been proposed, which makes it difficult for users and businesses to select the most suitable mining algorithm for their specific business processes. Additionally, existing evaluation and recommendation frameworks for process discovery techniques have limitations. In this study, we have introduced a framework that recommends the appropriate process discovery algorithm for a given event log. Our methodology detects the complex control flow patterns present in the event log without the need to discover any process model. Instead, we use a knowledge database that records how well existing algorithms mine these complex constructs. This information takes into account factors such as computation time and the ability of each algorithm to discover sound models accurately.
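The knowledge database lookup can be illustrated with a small example. The sketch below is an assumption about how such a lookup could be organized, not the authors' implementation: it encodes three rows of Table 2 and returns the algorithms marked "Yes" for every detected construct. The row labels `L1p`, `IvT-SS-seq`, and `NFC` are plain-text renderings of the table's row names, "Sb" is treated as not fully supported, and the names `ALGORITHMS`, `SUPPORT`, and `candidates` are hypothetical. Computation time and soundness, which the framework also weighs, are deliberately left out.

```python
# Column order of Table 2 (Alpha++ and Alpha# stand for the first two columns).
ALGORITHMS = ["Alpha++", "Alpha#", "IM", "HM", "ILP", "ETM", "RM", "TS", "DWS", "GM"]

# Three rows transcribed from Table 2; values follow the column order above.
SUPPORT = {
    "L1p": "Yes Yes Sb No Yes Yes No Sb No Yes".split(),
    "IvT-SS-seq": "No Yes Yes Yes No Yes No Yes Yes Yes".split(),
    "NFC": "Yes No No No No No No No No Yes".split(),
}


def candidates(detected_constructs):
    """Return the algorithms marked 'Yes' for every detected construct.

    Assumption: anything other than 'Yes' (including 'Sb') disqualifies
    an algorithm. An empty set of constructs leaves all algorithms eligible.
    """
    return [
        alg
        for i, alg in enumerate(ALGORITHMS)
        if all(SUPPORT[c][i] == "Yes" for c in detected_constructs)
    ]


if __name__ == "__main__":
    print(candidates({"NFC"}))
    print(candidates({"IvT-SS-seq"}))
    print(candidates(set()))
```

When no complex construct is detected, every algorithm remains a candidate, so secondary criteria such as log size and soundness guarantees decide the final recommendation, as in the case of log L2.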
The detection of complex control flow constructs is based on the relations introduced in the Alpha algorithms (α++, α#, α$). In our work, we have enhanced and adapted these relations to detect complex constructs directly from event logs, without the necessity of discovering a process model. We have implemented the proposed framework in ProM and provided a detailed description of its implementation. Furthermore, we have discussed the methodology and metrics employed for the evaluation. The framework has been evaluated using both artificial and real-life data, and the results indicate that the recommended algorithms are appropriate for the logs considered. However, the evaluation was conducted with a limited number of real-life event logs. Future work will therefore incorporate a larger and more diverse set of real-life data to improve the robustness and effectiveness of the framework. In summary, the proposed framework recommends process discovery algorithms based on the detection of complex control flow constructs directly from event logs, eliminating the need for process model discovery, and its implementation in ProM and the recommendations obtained in the evaluation indicate its potential utility.
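For readers unfamiliar with the relations underlying the Alpha family, the following minimal sketch computes the basic log-based ordering relations of the original Alpha algorithm (directly follows, causality, parallelism, and no relation) for the example log given in the caption of Figure 7. It does not reproduce the enhanced relations proposed in this paper; the function name `ordering_relations` and the relation symbols used as strings are assumptions made for illustration.

```python
from itertools import product


def ordering_relations(log):
    """Compute the basic Alpha ordering relation for every pair of tasks.

    a > b  : b directly follows a in at least one trace
    "->"   : a > b and not b > a (causality)
    "<-"   : b > a and not a > b (reverse causality)
    "||"   : a > b and b > a (parallelism)
    "#"    : neither a > b nor b > a (no direct relation)
    """
    tasks = sorted({task for trace in log for task in trace})
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    relations = {}
    for a, b in product(tasks, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and ba:
            relations[(a, b)] = "||"
        elif ab:
            relations[(a, b)] = "->"
        elif ba:
            relations[(a, b)] = "<-"
        else:
            relations[(a, b)] = "#"
    return relations


# Example log from the caption of Figure 7.
LOG = [list("ABCDEF"), list("ACDF")]

if __name__ == "__main__":
    rel = ordering_relations(LOG)
    for pair in [("A", "B"), ("A", "C"), ("D", "F"), ("B", "E")]:
        print(pair, rel[pair])
```

In this log, E occurs only in the trace that also contains B, yet the basic relations report no direct relation between B and E. Dependencies of this kind cannot be seen from direct succession alone, which is why additional relations are needed to detect non-free choice constructs.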

Author Contributions

H.R. contributed to the main idea and the methodology of the research. H.R. and M.A.A.-A. designed the experiment, performed the simulations, and wrote the original manuscript. H.R. and M.A.A.-A. contributed significantly to improving the technical and grammatical contents of the manuscript. H.R. and M.A.A.-A. reviewed the manuscript and provided valuable suggestions to further refine it. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. R’bigui, H.; Cho, C. The state-of-the-art of business process mining challenges. Int. J. Bus. Process Integr. Manag. 2017, 8, 285–303.
  2. Van der Aalst, W.M.P.; Reijers, H.A.; Weijters, A.J.M.M.; van Dongen, B.F.; de Medeiros, A.K.A.; Song, M.; Verbeek, H.M.W. Business process mining: An industrial application. Inf. Syst. 2007, 32, 713–732.
  3. Taylor, P.; Leida, M.; Majeed, B. Case Study in Process Mining in a Multinational Enterprise. In Data-Driven Process Discovery and Analysis. SIMPDA 2011; Aberer, K., Damiani, E., Dillon, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 116, pp. 134–153.
  4. Van der Aalst, W.M.P. Book on Process Mining: Discovery, Conformance and Enhancement of Business Processes, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2011.
  5. Dennis, J.B. Petri Nets. In Encyclopedia of Parallel Computing; Padua, D., Ed.; Springer: Boston, MA, USA, 2011.
  6. Van der Aalst, W.M.P.; Weijters, A.J.M.M.; Maruster, L. Workflow Mining: Discovering Process Models from Event Logs. IEEE Trans. Knowl. Data Eng. 2004, 16, 1128–1142.
  7. De Medeiros, A.K.A.; van Dongen, B.F.; van der Aalst, W.M.P.; Weijters, A.J.M.M. Process Mining: Extending the Alpha-Algorithm to Mine Short Loops; BETA Working Paper Series, WP 113; Eindhoven University of Technology: Eindhoven, The Netherlands, 2004.
  8. Wen, L.; van der Aalst, W.M.P.; Wang, J.; Sun, J. Mining process models with non-free choice constructs. Data Min. Knowl. Discov. 2007, 15, 145–180.
  9. Wen, L.; Wang, J.; Sun, J. Mining Invisible Tasks from Event Logs. In Advances in Data and Web Management; Lecture Notes in Computer Science 4505; Springer: Berlin/Heidelberg, Germany, 2007; pp. 358–365.
  10. Wen, L.; Wang, J.; van der Aalst, W.M.P.; Huang, B.; Sun, J. Mining process models with prime invisible tasks. Data Knowl. Eng. 2010, 69, 999–1021.
  11. Guo, Q.; Wen, L.; Wang, J.; Yan, Z.; Yu, P.S. Mining invisible tasks in non-free-choice constructs. In Business Process Management—BPM 2016; Motahari-Nezhad, H., Recker, J., Weidlich, M., Eds.; Lecture Notes in Computer Science 9253; Springer: Cham, Switzerland, 2015; pp. 109–125.
  12. Weijters, A.J.M.M.; van der Aalst, W.M.P.; de Medeiros, A.K.A. Process Mining with the Heuristics Miner-Algorithm; BETA Working Paper Series, WP 166; Eindhoven University of Technology: Eindhoven, The Netherlands, 2006.
  13. Leemans, S.J.J.; Fahland, D.; van der Aalst, W.M.P. Discovering block-structured process models from event logs—A constructive approach. In Application and Theory of Petri Nets and Concurrency; Lecture Notes in Computer Science 7927; Springer: Berlin/Heidelberg, Germany, 2013; pp. 311–329.
  14. Bergenthum, R.; Desel, J.; Lorenz, R.; Mauser, S. Process mining based on regions of languages. In Business Process Management—BPM 2007; Lecture Notes in Computer Science 4714; Springer: Berlin/Heidelberg, Germany, 2007; pp. 375–383.
  15. Rozinat, A.; Alves de Medeiros, A.K.; Günther, C.W.; Weijters, A.J.M.M.; Van der Aalst, W.M.P. Toward an Evaluation Framework for Process Mining Algorithms; BPM Center Report BPM-07-06; Technische Universiteit Eindhoven: Eindhoven, The Netherlands, 2007; Available online: http://bpmcenter.org/ (accessed on 2 February 2023).
  16. Wang, J.; Tan, S.; Wen, L. An Empirical Evaluation of Process Mining Algorithms based on Structural and Behavioral Similarities. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, Trento, Italy, 26–30 March 2012; pp. 211–213.
  17. Wang, J.; Wong, R.K.; Ding, J.; Guo, Q.; Wen, L. Efficient Selection of Process Mining Algorithms. IEEE Trans. Serv. Comput. 2013, 6, 484–496.
  18. Ribeiro, J.; Carmona, J. RS4PD: A Tool for Recommending Control-Flow Algorithms; BPM (Demos): Eindhoven, The Netherlands, 2014; p. 66.
  19. Pérez-Alfonso, D.; Fundora-Ramírez, O.; Lazo-Cortés, M.S.; Roche-Escobar, R. Recommendation of Process Discovery Algorithms Through Event Log Classification. In Pattern Recognition, MCPR 2015; Carrasco-Ochoa, J., Martínez-Trinidad, J., Sossa-Azuela, J., Olvera López, J., Famili, F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9116.
  20. Jouck, T.; Bolt, A.; Depaire, B.; de Leoni, M.; van der Aalst, W.M. An Integrated Framework for Process Discovery Algorithm Evaluation. arXiv 2018, arXiv:1806.07222.
  21. R’bigui, H.; Al-Absi, M.A.; Cho, C. Process Discovery Algorithms Recommendation Approach. In International Conference on Smart Computing and Cyber Security: Strategic Foresight, Security Challenges and Innovation; Lecture Notes in Networks and Systems; Springer Nature Singapore: Singapore, 2021; pp. 55–60.
  22. Buijs, J.C.A.M.; van Dongen, B.F.; van der Aalst, W.M.P. On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. In On the Move to Meaningful Internet Systems: OTM 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 305–322.
  23. Wen, L.; Wang, J.; Sun, J. Detecting Implicit Dependencies Between Tasks from Event Logs. In Frontiers of WWW Research and Development—APWeb 2006; Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y., Eds.; Lecture Notes in Computer Science 3841; Springer: Berlin/Heidelberg, Germany, 2006; pp. 591–603.
  24. Van der Werf, J.M.E.M.; van Dongen, B.F.; Hurkens, C.A.J.; Serebrenik, A. Process Discovery using Integer Linear Programming. Fundam. Inform. 2009, 94, 387–412.
  25. Kalenkova, A.A.; Lomazova, I.A.; van der Aalst, W.M.P. Process Model Discovery: A Method Based on Transition System Decomposition. In Application and Theory of Petri Nets and Concurrency; Ciardo, G., Kindler, E., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; Volume 8489.
  26. Greco, G.; Guzzo, A.; Pontieri, L.; Saccà, D. Discovering Expressive Process Models by Clustering Log Traces. IEEE Trans. Knowl. Data Eng. 2006, 18, 1010–1027.
Figure 1. Overview of process mining framework.
Figure 2. A process model corresponding to the event log of Table 1.
Figure 3. Process discovery algorithms recommendation framework [21].
Figure 4. Illustration of the basic idea of L .
Figure 5. Illustration of the basic idea of improved mendacious dependency.
Figure 6. Sound sub-WF-nets with different location of implicit dependencies [23].
Figure 7. Example of a sound WF-net with invisible tasks t1 and t2 involved in a non-free choice construct (L = {<A, B, C, D, E, F>, <A, C, D, F>}).
Figure 8. Example of a sound WF-net with invisible tasks t1 and t2 not involved in a free choice construct (L = {<A, B, C, D, E, F>, <A, C, D, F>, <A, B, C, D, F>, <A, C, D, E, F>}).
Figure 9. A screenshot of the implementation of the plugin detecting the status of complex constructs from event log.
Figure 10. A screenshot of the result obtained from an example event log.
Figure 11. The evaluation method, which involves using conformance checking metrics to compare the discovered model with the original model and the event log.
Figure 12. Comparison of mining results by recommended algorithms with reference models on complete logs generated from artificial examples.
Table 1. An example of an event log used for process mining.
| Case Id | Event Id | Activity | Timestamp | Resource | …. |
|---|---|---|---|---|---|
| Q521-QZR | N0060 | Purchase Request | 5/15/2022 16:35 | Hind | …. |
| | N0070 | Purchase Approval | 5/15/2022 16:40 | Safae | …. |
| | N0080 | Approval Verification | 5/16/2022 16:40 | Soukaina | …. |
| | N0090 | Purchase Finalization | 5/16/2022 17:42 | Younes | …. |
| | N0100 | Purchase Order | 5/16/2022 17:30 | Moui | …. |
| | N0110 | Purchase Reception and verification | 6/13/2022 10:55 | Mohammed | …. |
| Q543-289 | N0060 | Purchase Request | 5/15/2022 10:38 | Hind | …. |
| | N0070 | Purchase Approval | 5/15/2022 11:41 | Safae | …. |
| | N0080 | Approval Verification | 5/15/2022 12:42 | Soukaina | …. |
| | N0060 | Purchase Request | 5/15/2022 13:42 | Hind | …. |
| | N0080 | Approval Verification | 5/15/2022 15:42 | Soukaina | …. |
| | N0090 | Purchase Finalization | 5/15/2022 16:43 | Younes | …. |
| | N0100 | Purchase Order | 5/15/2022 16:45 | Moui | …. |
| | N0110 | Purchase Reception and verification | 6/15/2022 11:45 | Mohammed | …. |
Table 2. Comparison of the ability of current algorithms in mining standard and complex constructs.
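| Construct | α++ | α# | IM | HM | ILP | ETM | RM | TS | DWS | GM |
|---|---|---|---|---|---|---|---|---|---|---|
| L1p | Yes | Yes | Sb | No | Yes | Yes | No | Sb | No | Yes |
| L2p | Yes | Yes | Yes | Sb | Yes | No | Yes | Yes | Yes | Yes |
| IvT-SR-seq | Sb | Yes | Yes | No | No | No | No | Sb | Yes | No |
| IvT-LR-seq | No | Yes | Yes | Yes | No | No | Sb | No | Yes | Yes |
| IvT-SR-par | No | No | No | No | No | No | No | No | No | No |
| IvT-LR-par | No | No | Yes | No | No | No | No | No | No | Yes |
| IvT-SS-seq | No | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes |
| IvT-SS-par | No | No | Yes | No | No | No | No | Sb | No | No |
| IvT-LS-seq | No | Yes | Yes | Yes | No | Yes | No | No | Yes | Yes |
| IvT-LS-par | No | No | Yes | Yes | No | No | No | No | No | Yes |
| IvT-SW | No | Yes | No | Yes | No | No | No | No | Yes | Yes |
| NFC | Yes | No | No | No | No | No | No | No | No | Yes |
| IvT in NFC | No | No | No | No | No | No | No | No | No | No |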
Table 3. Recommended algorithm for each event log.
Event LogsL1L2L3L4L5
cases97110663167631,509
events361377019,658129,4281,202,267
tasks821183866
Recommended algorithmAlpha++IM/ILPIMIM/DWSILP
Table 4. The ETC precision values of discovered models.
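| Event log | HM | Alpha++ | IM | Alpha# | ILP | DWS |
|---|---|---|---|---|---|---|
| L1 | 0.7973 | 1.0 | 0.8 | 0.8 | 0.8 | 0.69 |
| L2 | 0.3811 | 0.6665 | 0.6665 | 0.6665 | 0.7 | 0.6665 |
| L3 | - | - | 0.945 | 0.4423 | 0.361 | 0.3 |
| L4 | 0.4 | 0.03 | 0.8709 | 0.02 | - | 0.89 |
| L5 | 0.681 | - | 0.89 | - | 0.91 | 0.45 |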
Note: (-) the value could not be computed.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
