Peer-Review Record

Predictive Business Process Monitoring Approach Based on Hierarchical Transformer

Electronics 2023, 12(6), 1273; https://doi.org/10.3390/electronics12061273
by Weijian Ni, Gang Zhao, Tong Liu *, Qingtian Zeng and Xingzong Xu
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 13 February 2023 / Revised: 27 February 2023 / Accepted: 2 March 2023 / Published: 7 March 2023

Round 1

Reviewer 1 Report

Thank you very much for the opportunity to review this paper, which I found very interesting and very well written. This is a lovely contribution to the study of IT technology, which must keep exploring its competitive advantage. I suggest only some minor points that could be addressed to improve the paper:

1. Introduction

 

This is very well done, but a short paragraph introducing the main characteristics of the industries where the research was conducted would be helpful. The author(s) do not refer to them in the course of the paper at all. This information would fit very nicely in the introduction so that any reader can more easily understand your arguments.

2. Also in the introduction (and again, do not take for granted that readers know machine learning methods in detail), the examples of products used by companies could be more precise. Also, consider adding the rates at which Markov Models, Support Vector Machines, and Cluster Analysis are applied to the task of business process prediction (lines 53-55).

3. In lines 98-99: "The existing data-driven business process prediction studies also do not consider this effect." Please describe what the effect is if it is not addressed. Does it slow down the program, produce inaccurate predictions, or something else?

4. Is HiP-Transformer your own invention, or did you develop it from an existing model? If it is developed from other people's work, please describe the previous studies and the gap you would like to fill.

5. Line 138, you mention: "Compared with all competitive baselines, the next activity prediction accuracy improves by 6.32% on average, and the average absolute error of remaining time prediction reduces by 21% on average."

Please be specific and again do not assume everyone knows what those "competitive baselines" are. Also cite whom/what you referred to in determining the baselines.

6. Some abbreviations suddenly appear without any prior explanation, e.g., RNN, LSTM. Please check the whole article.

2. Preliminaries and Methods

In Subsections 3.1.1 to 3.1.4, the author presents a long list of algorithms. Are these your own formulas, or are they cited from others' work? The author's main task is to show the weaknesses of this model if it is taken from other people's work, not simply to repost it.

Figure 1. HiP-Transformer Model Framework. The same concern as previously mentioned: is this your original work or adopted from others?

Results and Discussion

 

No particular remarks. However, I would really appreciate it if the authors could give an example of companies, say YouTube or another big corporation, that could use this model to improve their predictions. I recommend this as it has potential for commercialization.
Author Response

Please see the attachment.

Response to Reviewer 1 Comments

 

        We are extremely grateful for your careful review and constructive suggestions on our work. We have carefully considered your feedback, provided detailed point-by-point explanations under your comments, and made the corresponding corrections in the manuscript (page and line numbers in the explanations refer to the revised version). All authors have approved both this response letter and the revised manuscript. We hope that our explanations and the revised manuscript will satisfy you, and we thank you again for your valuable comments and suggestions.

 

Point 1: 1. Introduction

This is very well done, but a short paragraph introducing the main characteristics of industries where the research was conducted. The author(s) does not refer to them in the course of the paper at all. This information would fit very nicely in the introduction so that any reader can more easily understand your arguments. 

Response 1: Thank you very much for your approval of our introduction section and for the suggestion that adding the characteristics of the field can help the reader better understand the paper; this is indeed essential and very logical information for an introduction. We summarise three characteristics of the process mining domain: the large volume of log data, the complexity of business process relationships, and the fact that business processes change with technology improvements, resource scheduling, etc. We have added the relevant features of the process mining domain and the corresponding problems (lines 52 to 62 on page 2), so that readers can understand more clearly, through these features, the problem the article sets out to solve.

 

Point 2: Also in the introduction (and again, do not take for granted that readers know machine learning methods in detail), the examples of products used by companies could be more precise. Also, consider adding the rates at which Markov Models, Support Vector Machines, and Cluster Analysis are applied to the task of business process prediction (lines 53-55).

Response 2: We are very sorry for the inconvenience in reading this; we had not considered that readers might be unclear about the application of machine learning in process mining. Adding details of these works describes the development of process mining more accurately, so we have adopted your comment and added examples of three machine learning model applications (lines 71 to 76 on page 2).

        In addition, the above machine learning models can only solve specific problems and are not very general, and previous work [1] has demonstrated that the prediction accuracy of machine learning models is lower than that of deep learning models. We have therefore not added further comparison experiments between our model and machine learning models, which we hope you can understand.

Point 3: In lines 98-99: "The existing data-driven business process prediction studies also do not consider this effect." Please describe what the effect is if it is not addressed. Does it slow down the program, produce inaccurate predictions, or something else?

Response 3: We apologise for the brief description in this sentence. The point the original text intended to make is that concept drift changes the distribution of business process data, whereas current machine learning and deep learning models assume the same data distribution before and after, without considering the impact of concept drift on the final result. We have therefore added that business process models no longer match processes in which concept drift occurs, and that outdated models lead to inaccurate predictions of future results (lines 124 to 126 on page 3).

 

Point 4: Is HiP-Transformer your own invention, or did you develop it from an existing model? If it is developed from other people's work, please describe the previous studies and the gap you would like to fill.

Response 4: Thank you very much for your question. The HiP-Transformer model is our original model; no author has previously proposed a model fully similar to ours. The core component of our model is the Transformer, which we have extended.

        A three-layer architecture is proposed to model information at different granularities. We use a self-attention mechanism to represent the relationship between activities and attributes; we propose a drift detection algorithm to segment long business processes in which concept drift occurs, and use cross-attention to capture the relevance of events to segments; and we propose self-learning position encoding to capture the position information of segments. The next activity and the remaining time are finally predicted from information that fuses the three granularities.

        We have clarified the origin of the framework on page 7 of the revised version, lines 323 to 325.
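The three-layer idea described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the array sizes, the fixed fusion weights, and the folding of the segment position encoding into the inputs are assumptions made for the example, not the authors' actual implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
d = 16                                  # shared hidden size (illustrative)
events   = rng.normal(size=(10, d))     # event-level representations
segments = rng.normal(size=(3, d))      # drift-based segment representations

# Event layer: self-attention among events (activities and attributes).
event_repr = scaled_dot_product_attention(events, events, events)

# Segment layer: cross-attention from events to segments; a learned
# position encoding for segments is assumed folded into `segments` here.
cross_repr = scaled_dot_product_attention(events, segments, segments)

# Fusion: combine the granularities with weights; these fixed values
# stand in for weights that would be learned during training.
w = np.array([0.5, 0.3])
fused = w[0] * event_repr + w[1] * cross_repr
print(fused.shape)   # (10, 16)
```

In the real model the attention projections, fusion weights, and segment position encodings are all trained parameters rather than the fixed stand-ins used here.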

 

Point 5: Line 138, you mention: "Compared with all competitive baselines, the next activity prediction accuracy improves by 6.32% on average, and the average absolute error of remaining time prediction reduces by 21% on average."

         Please be specific and again do not assume everyone knows what those "competitive baselines" are. Also cite whom/what you referred to in determining the baselines.

Response 5: Thank you very much for pointing out the lack of clarity in the article; as you say, readers will wonder what exactly the baselines are and how reliable they are. We have therefore added the specific comparison baselines to the manuscript (lines 166 to 168 on page 4) to aid the reader's understanding of the paper's experiments; in addition, to demonstrate the reliability of the comparison baselines, we cite the deep learning model benchmark (lines 152 to 153 on page 4).

 

Point 6: Some abbreviations suddenly appear without any prior explanation, e.g., RNN, LSTM. Please check the whole article.

Response 6: Thank you for your comments on the rigour of the paper. After careful examination of the full text, we have confirmed that the abbreviations RNN, LSTM and GRU are explained at their first appearance on page 2, lines 86 to 90. To increase readability, we have further added a brief description of RNNs and of the fact that LSTMs and GRUs build on RNNs to solve their problems of vanishing and exploding gradients (lines 87 to 91 on page 2).

 

Point 7: In Subsections 3.1.1 to 3.1.4, the author presents a long list of algorithms. Are these your own formulas, or are they cited from others' work? The author's main task is to show the weaknesses of this model if it is taken from other people's work, not simply to repost it.

Figure 1. HiP-Transformer Model Framework. The same concern as previously mentioned: is this your original work or adopted from others?

Response 7: Thank you for your interest. The notation and formulas in Subsections 3.1.1 to 3.1.4 are the basic definitions of the business process prediction domain and describe what is already established in the field. The descriptions in the manuscript are formal representations of them using our own notation system, based on our own understanding and the needs of the proposed method.

        In addition, the algorithm you mention is proposed by us to solve the problem of concept drift in long business process sequences. The business process is segmented by Algorithm 1; concept drift exists between the business process subsequences of adjacent segments, and the correlation between subsequences is subsequently captured by cross-attention.

        The model shown in Figure 1 is, as explained in Response 4, not fully consistent with any model previously proposed by other authors, and the origin of the model has been clarified in the manuscript.

 

Point 8: Results and Discussion

No particular remarks. However, I would really appreciate it if the authors could give an example of companies, say YouTube or another big corporation, that could use this model to improve their predictions. I recommend this as it has potential for commercialization.

Response 8: Thank you very much for the valuable advice you have given us; we appreciate it deeply. Practical applications by companies or organisations add to the commercial value of the article, and we felt it was very important to include this content. We have therefore added an application of the model to consumer credit companies in lines 761 to 765 on page 20: making full use of consumer information and segmenting consumer records allows a more accurate prediction of consumers' remaining repayment time, which can help companies deal with potential liquidity problems.

Reference:

  1. Rama-Maneiro, E.; Vidal, J.; Lama, M. Deep Learning for Predictive Business Process Monitoring: Review and Benchmark. IEEE Transactions on Services Computing 2021 (Early Access).

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear Authors, 

I would like to make some recommendations and also generate some remarks in relation to the content of the proposed article:

- the abstract should be reduced in the number of words. I believe that the approach regarding a hierarchical transformer-based business process prediction model should not be explained in such detail; explaining in the abstract the key issues of each layer of the hierarchical transformer is not necessary. Rather, a general overview of the three layers would be enough to interest readers. It would also be welcome to synthesize how useful the new approaches proposed by the authors are;

- In the Introduction section I recommend inserting two types of content: 1) research questions related to the paper goal(s) (in order to understand the authors' concerns in relation to the field referred to in the article); 2) a paper workflow which will help the reader understand the way in which the research was designed;

- Please rename section 2. Related Work to 2. Literature Review;

- you have a subsection 3.3. Transformer. Here you cite an article numbered as [12] (Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Advances in Neural Information Processing Systems. 2017, 5998–6008). It is not clear whether the equations inserted here are taken from that paper ([12]), are adapted from it, or are entirely designed by the authors;

- at the same time, you have on page 7 Figure 1, which can be considered the key point of the article (Figure 1. HiP-Transformer Model Framework). Here it is necessary to clarify the origin of this framework (is it entirely designed by the authors, is it adapted from another source, etc.);

- then, you propose on page 10 Algorithm 1. Here there are two kinds of questions: How could you test and validate it? Are there other algorithms (Algorithm 2, Algorithm 3, etc.) already designed, or are they part of your research plans?

- In Section 4 Results and Discussion you used experimental data obtained from a Dutch portal. Do you have a particular reason to use the 4TU Center for Research Data? Also, I am interested to know, from Table 1, for what particular period these data are available (in terms of years, months, etc.). I ask you to declare the units of measure for duration (days, weeks, etc.).

- another issue to be addressed is the Conclusions section. Here I think you need to better underline all the gains generated by designing the HiP-Transformer model framework, especially through Subsections 4.3 and 4.4. Also, it is mandatory to declare the limitations of this research (since you have already thought about the future work).

- last but not least, please proceed with a spelling and grammar round (for instance, you have in line 275 "Euqation (10)").

Good luck!

Author Response

Please see the attachment.

Response to Reviewer 2 Comments

 

        We are extremely grateful for your careful review and constructive suggestions on our work. We have carefully considered your feedback, provided detailed point-by-point explanations under your comments, and made the corresponding corrections in the manuscript (page and line numbers in the explanations refer to the revised version). All authors have approved both this response letter and the revised manuscript. We hope that our explanations and the revised manuscript will satisfy you, and we thank you again for your valuable comments and suggestions.

 

Point 1: the abstract should be reduced in the number of words. I believe that the approach regarding a hierarchical transformer-based business process prediction model should not be explained in such detail; explaining in the abstract the key issues of each layer of the hierarchical transformer is not necessary. Rather, a general overview of the three layers would be enough to interest readers. It would also be welcome to synthesize how useful the new approaches proposed by the authors are.

Response 1: Thank you very much for your practical advice; we have revised the abstract to give a general statement of the proposed approach and the solutions used. We use different encoding schemes to encode attributes and capture the relationship between activities and attributes; we propose a concept drift detection algorithm to segment the business process and use cross-attention to capture the correlation between subsequences as well as between activities and subsequences; and we propose self-learning position encoding to capture the relative position information of subsequences.

        We have removed the previous detailed description of the model's three-layer architecture (lines 21 to 28 on page 1).

        Finally, we have added that information of different granularities is fused with different weights (lines 20 to 21 on page 1), ensuring the overall integrity of the work and stimulating the reader's interest.

 

Point 2: In the Introduction section I recommend inserting two types of content:

1) research questions related to the paper goal(s) (in order to understand the authors' concerns in relation to the field referred to in the article);

        2) a paper workflow which will help the reader understand the way in which the research was designed;

Response 2: Thank you very much for your valuable suggestions on our Introduction section. Adding relevant research questions from the field really does increase the readability of the paper. We have added three research questions based on the characteristics of the process mining domain and the problems addressed in the article: how can features be mined from massive amounts of log data through automated techniques; how can the complex relationships between business processes be captured from multiple perspectives; and how can the phenomenon of concept drift in business processes be dealt with (lines 52 to 62 on page 2)?

        The problems that exist in the domain are consistent with those this paper is intended to address. We use the proposed deep learning model with a three-layer architecture that fuses business process information from three different perspectives and adds subsequence relevance modelling in the presence of concept drift, thus enabling the logical flow of the article.

        We have added a research workflow to page 4 of the revised version, lines 172 to 176, to make the logic of the text clearer to the reader.

 

Point 3: Please rename section 2. Related Work to 2. Literature Review;

Response 3: Thank you very much for your valuable advice! We have renamed the second section of the article to better match its content (line 177 on page 4).

 

Point 4: you have a subsection 3.3. Transformer. Here you cite an article numbered as [12] (Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Advances in Neural Information Processing Systems. 2017, 5998–6008). It is not clear whether the equations inserted here are taken from that paper ([12]), are adapted from it, or are entirely designed by the authors.

Response 4: We are very sorry for any inconvenience in reading; the formulas in Subsection 3.3 are taken from reference [12]. As the core component of our proposed model is the Transformer, the self-attention mechanism of the Transformer and its related formulas are the basis of the model proposed in the paper. Reviewing this material enables the reader to better understand the foundations of the model and facilitates our subsequent presentation of the proposed approach.
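For reference, the equations recalled in Subsection 3.3 are the scaled dot-product attention and its multi-head extension as given in [12]:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad
\mathrm{head}_i = \mathrm{Attention}\!\left(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V}\right),
\qquad
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
```

Restating them in the manuscript together with the citation makes clear that they are reproduced from [12], not original contributions.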

 

Point 5: at the same time, you have on page 7 Figure 1, which can be considered the key point of the article (Figure 1. HiP-Transformer Model Framework). Here it is necessary to clarify the origin of this framework (is it entirely designed by the authors, is it adapted from another source, etc.);

Response 5: Thank you very much for your question. The HiP-Transformer model is our original model; no author has previously proposed a model fully similar to ours. The core component of our model is the Transformer, which we have extended.

        A three-layer architecture is proposed to model information at different granularities. We use a self-attention mechanism to represent the relationship between activities and attributes; we propose a drift detection algorithm to segment long business processes in which concept drift occurs, and use cross-attention to capture the relevance of events to segments; and we propose self-learning position encoding to capture the position information of segments. The next activity and the remaining time are finally predicted from information that fuses the three granularities.

        We have clarified the origin of the framework on page 7 of the revised version, lines 323 to 325.

 

Point 6: then, you propose on page 10 Algorithm 1. Here there are two kinds of questions: How could you test and validate it? Are there other algorithms (Algorithm 2, Algorithm 3, etc.) already designed, or are they part of your research plans?

Response 6: Thank you very much for your interest in Algorithm 1. Algorithm 1 segments the traces of long business process sequences; the concept drift information between the subsequences is subsequently captured by cross-attention, which enriches the business process representation and improves the accuracy of the final model prediction. We conducted ablation experiments in Subsection 4.4.1 on whether to use Algorithm 1, keeping the rest of the model structure unchanged, and verified the effectiveness of Algorithm 1 by comparing the accuracy of the final model predictions.

        Since the main problem addressed in this paper is concept drift, only Algorithm 1 has been designed. If other features of the business process are considered, segmentation could be done according to data flow (for example, a loan business can be divided into information collection, review, disbursement, repayment, etc.) or according to time (a recruitment business can be divided into planning, job posting, CV screening, interviewing, hiring, etc.). These schemes can be considered for comparative experiments in subsequent studies to explore their effect on prediction outcomes.
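One way to make drift-based trace segmentation concrete: split a trace wherever the activity distribution in a sliding window changes sharply. The sketch below is a hypothetical stand-in (the window size, the total-variation distance, and the threshold are illustrative choices), not the paper's Algorithm 1:

```python
from collections import Counter

def segment_by_drift(trace, window=5, threshold=0.5):
    """Split an activity sequence where the sliding-window activity
    distribution shifts sharply (total variation distance > threshold).
    Illustrative sketch only; the paper's Algorithm 1 may differ."""
    boundaries = [0]
    for i in range(window, len(trace) - window + 1):
        left = Counter(trace[i - window:i])    # distribution before i
        right = Counter(trace[i:i + window])   # distribution after i
        keys = set(left) | set(right)
        tv = 0.5 * sum(abs(left[k] - right[k]) / window for k in keys)
        if tv > threshold and i - boundaries[-1] >= window:
            boundaries.append(i)               # drift point: start new segment
    boundaries.append(len(trace))
    return [trace[a:b] for a, b in zip(boundaries, boundaries[1:])]

# A toy trace that drifts from activities {A, B} to {C, D}.
trace = list("AABAB" * 2 + "CDCDC" * 2)
segments = segment_by_drift(trace)
print(len(segments))  # 2 segments, split near the drift point
```

Cross-attention between the resulting subsequences would then model the relationships across the drift boundary.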

 

Point 7: In Section 4 Results and Discussion you used experimental data obtained from a Dutch portal. Do you have a particular reason to use the 4TU Center for Research Data? Also, I am interested to know, from Table 1, for what particular period these data are available (in terms of years, months, etc.). I ask you to declare the units of measure for duration (days, weeks, etc.).

Response 7: Thank you very much for your interest in our article. We selected seven datasets from the 4TU Center for Research Data because they contain complex process relationships and additional attribute information, as well as a large time span. These characteristics coincide with the focus of this paper, and the datasets are widely accepted in the field of process prediction, so we chose them for our comparison experiments.

        We have added the month and year information contained in each dataset to Table 1, documented from line 593 on page 14 to line 601 on page 15.

        In addition, we apologise for the lack of clarity on the basic unit of duration, which is described in terms of "days" (line 614 on page 15).

 

Point 8: another issue to be addressed is the Conclusions section. Here I think you need to better underline all the gains generated by designing the HiP-Transformer model framework, especially through Subsections 4.3 and 4.4. Also, it is mandatory to declare the limitations of this research (since you have already thought about the future work).

Response 8: Thank you very much for your valuable advice; it is important to highlight more clearly in the conclusion the benefits brought about by the design of the HiP-Transformer. Compared to the deep learning baselines, the proposed model predicts the next activity with 6.32% higher accuracy and a 21% lower remaining-time prediction loss. In addition, comparative experiments revealed that the segmentation strategy based on concept drift improves prediction accuracy, and that the longer the business process prefix, the higher the model's prediction accuracy. We have added these results to lines 742 to 752 on page 19.

        The limitations of the model overlap somewhat with the future work, so we have removed the limitations of the model (lines 758 to 761 on page 20) and added the practical applications of the model (lines 761 to 765 on page 20).

 

Point 9: last but not least, please proceed with a spelling and grammar round (for instance, you have in line 275 "Euqation (10)").

Response 9: Thank you very much for your careful review, and we apologize for the typing errors. We have carefully checked the entire manuscript and corrected the spelling and grammatical errors that occurred, including on lines 48, 79, 193, 312, 398, etc.
Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I agree and accept the comments made by the authors.
