Article
Peer-Review Record

A Clustering and PL/SQL-Based Method for Assessing MLP-Kmeans Modeling

Computers 2024, 13(6), 149; https://doi.org/10.3390/computers13060149
by Victor Hugo Silva-Blancas 1, Hugo Jiménez-Hernández 1,*, Ana Marcela Herrera-Navarro 1, José M. Álvarez-Alvarado 2, Diana Margarita Córdova-Esparza 1 and Juvenal Rodríguez-Reséndiz 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 21 April 2024 / Revised: 1 June 2024 / Accepted: 7 June 2024 / Published: 9 June 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The paper proposes using machine learning (ML)-based methods to improve database response time compared to traditional PL/SQL tools. However, is the interpretability issue of machine learning considered?

2. The paper needs to compare with relevant methods proposed in recent years and demonstrate the results.

Author Response

Reviewer 1:

1. The paper proposes using machine learning (ML)-based methods to improve database response time compared to traditional PL/SQL tools. However, is the interpretability issue of machine learning considered?

Author response: Thank you for your valuable comment. Yes, we do consider it in section 3.3.7, line 254: "...The connection..."

Author action: We improve the sentence to "...the group query hypothesis provided by Kmeans and the interpretability of the MLP model drives to the amount and categorization of the clusters that will be set for analysis in the k-means process, the proposal helps as a hash table estimated to improve the computation and resources to make a query data base minimizing the response time..."  

 

2. The paper needs to compare with relevant methods proposed in recent years and demonstrate the results.

Author response: Thank you for your valuable comment. Our study proposes a novel method, which is compared with similar techniques to highlight the main contributions of the work. We updated Table 7 by adding and comparing other research works from 2020 to 2022.

Author action: We added two more publications, dated 2023 and 2024, to Table 7 in order to highlight the contribution of the work, as follows:

| Author | Model | Data | Epochs | K |
|---|---|---|---|---|
| [45] | Unsupervised Kmeans | 400 | 11 | k |
| [46] | Kmeans clustering jointly | N/A | N/A | 1 ≤ k ≤ K |
| [47] | Kmeans FE | 50 | N/A | N/A |
| [48] | Parallel Kmeans | Random Pool | N/A | k=2 |
| [49] | Kmeans spherical | REST | N/A | k=6 |
| Proposed work | MLP/KMeans | 306849 | 4 | k=10 |

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The presentation of the study needs improvement. Too much information presented with inadequate logic connecting the points together. It is better if the authors can focus on the main ideas.

A few other suggestions:

  1. Section 3 contains some basic concepts, equations, and pseudo codes. Could be more concise.

  2. The part about java seems off from the main focus of the article.

 

Comments on the Quality of English Language

There are major mistakes that affect the fluency of reading

Author Response

Reviewer 2:  

The presentation of the study needs improvement. Too much information presented with inadequate logic connecting the points together. It is better if the authors can focus on the main ideas. 

Author response: Thank you for your valuable comment. We have reviewed the way the information was presented and made some changes to avoid digressions.

Author action: We reduced the list of SQL instructions, removed some unnecessary Oracle citations, and deleted other items, such as the Java reference, graph types, the SPARSQL citation, grasshopper optimization, and parallel execution models.

 

  1. Section 3 contains some basic concepts, equations, and pseudo codes. Could be more concise.

Author response: Thank you for your observation. For better clarity, we had written the SQL query description in algebraic form. In the case of the MLP, we considered it necessary to describe its mathematical representation.

Author action: We updated the pseudocode. We replaced the SQL equations with an itemized list of the queries to be executed. We kept the MLP equations (sections 3.3.1 to 3.3.3) because they represent our contribution. In the case of the pseudocode, we had already made some corrections requested by one of the other reviewers.

 

  2. The part about java seems off from the focus of the article.

Author response: Thank you, we agree with your observation. Because many researchers use the Python language (and its relatives) for data analysis, we wanted to point out that this work was done in Java.

Author action: We removed all references to the Java language.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The Abstract is super important; please double-check your Abstract section to ensure there are no errors.

1. L2, "it is necessary an optimize", optimize is a verb, you may consider "it is necessary to optimize".

2. L3, " Database query system is held by a standard SQL". This sentence is unclear, please revise it.

3. L4, " computer resources achievement" -> " computer resource achievement". "achievement, encryption and security" -> "achievement, encryption, and security". 

4. L3-6, "Database query system is held by a standard SQL language which shows the same functional design since the PL/SQL surge, this is due to the recent research on computer resources achievement, encryption and security, instead of improving the data milking based on AI tools, like Generative AI, machine learning (ML), and artificial neural network (ANN)." This sentence is poorly written. You have two verbs (two sentences) without a conjunction, which is incorrect.

5. L5-6, "based on AI tools, like Generative AI, machine learning (ML), and artificial neural network (ANN)". When you use "like...", it is expected that entities listed after "like" are AI tools. However, I noticed that the "ML" and "ANN" are not AI tools. Thus, it lacks a parallel structure.

6. L4, "this is due to the recent research" -> "this noun(please summarize it) is due to the recent research". "This is due to" is unclear. Please add a noun after "this". 

7. L7, "with PL/SQL traditional tools" -> "with traditional PL/SQL  tools".

8. L 6-15, "The objective of this work is to present a projected methodology from MLP integrated with Kmeans, which is compared with PL/SQL traditional tools and intends to improve database response time and outline future advantages for ML and Kmeans in data processing, proposing a new corollary with the form of hk → H = SSE (C ) : k > 0 ∃ X executed on application software whose queries were made on data collections with more than 306 thousand records, producing a comparative table between PL/SQL and MLP-Kmeans from tree hypothesis: line query, group query and total query, getting the following results: line query reduced to -9 ms, group query from 88 to 2460 ms and total query from 13 to 279 ms, which concludes that data training reduces search engine efficiency although already training data remain available to subsequent AI production-processes." This sentence is too long and grammatically incorrect. Please revise it.

9. L13, " line query reduced to -9 ms" -> " line query reduced to 9 ms".

10. L13, " line query reduced to -9 ms, group query from 88 to 2460 ms" ->  "line query reduced to -9 ms, group query reduced from 88 to 2460 ms"

11. L15, "AI production-processes" -> "AI production processes"

12. In Algorithm 1, the function is named "LOOP", which is meaningless. I recommend that you use an informative function name instead of "LOOP".  Also, in Algorithm 2, you have two functions but with the same function name "LOOP". Please correct it. 

Comments on the Quality of English Language

Please see my previous comments to the author.

Author Response

Reviewer 3:  

The Abstract is super important; please double-check your Abstract section to ensure there are no errors.

Author response: Thank you very much, we apologize for the lack of clarity. We have taken the following actions:

  1. L2, "it is necessary an optimize", optimize is a verb, you may consider "it is necessary to optimize".

Author action: ... optimizing search engines to process time and resource consumption efficiently is necessary...

  2. L3, " Database query system is held by a standard SQL". This sentence is unclear, please revise it.

Author action: ... database query system, upheld by the standard SQL language...

  3. L4, " computer resources achievement" -> " computer resource achievement". "achievement, encryption and security" -> "achievement, encryption, and security".

Author action: ... focusing on computer resource management, encryption, and security...

  4. L3-6, "Database query system is held by a standard SQL language which shows the same functional design since the PL/SQL surge, this is due to the recent research on computer resources achievement, encryption and security, instead of improving the data milking based on AI tools, like Generative AI, machine learning (ML), and artificial neural network (ANN)." This sentence is poorly written. You have two verbs (two sentences) without a conjunction, which is incorrect.

Author action: ... The database query system, upheld by the standard SQL language, has maintained the same functional design since the advent of PL/SQL. This is due to recent research focusing on computer resource management, encryption, and security rather than improving data mining based on AI tools, machine learning (ML), and artificial neural networks (ANN)...

  5. L5-6, "based on AI tools, like Generative AI, machine learning (ML), and artificial neural network (ANN)". When you use "like...", it is expected that entities listed after "like" are AI tools. However, I noticed that the "ML" and "ANN" are not AI tools. Thus, it lacks a parallel structure.

Author action: ... AI tools, machine learning (ML), and artificial neural networks (ANN)...

  6. L4, "this is due to the recent research" -> "this noun(please summarize it) is due to the recent research". "This is due to" is unclear. Please add a noun after "this".

Author action: ... This situation is caused because recent research has been focusing...

  7. L7, "with PL/SQL traditional tools" -> "with traditional PL/SQL tools".

Author action: ... traditional PL/SQL tools...

  8. L 6-15, "The objective of this work is to present a projected methodology from MLP integrated with Kmeans, which is compared with PL/SQL traditional tools and intends to improve database response time and outline future advantages for ML and Kmeans in data processing, proposing a new corollary with the form of hk → H = SSE (C ) : k > 0 ∃ X executed on application software whose queries were made on data collections with more than 306 thousand records, producing a comparative table between PL/SQL and MLP-Kmeans from tree hypothesis: line query, group query and total query, getting the following results: line query reduced to -9 ms, group query from 88 to 2460 ms and total query from 13 to 279 ms, which concludes that data training reduces search engine efficiency although already training data remain available to subsequent AI production-processes." This sentence is too long and grammatically incorrect. Please revise it.

Author action: .... This work presents a projected methodology integrating multilayer perceptron (MLP) with Kmeans. This methodology is compared with traditional PL/SQL tools and aims to improve database response time while outlining future advantages for ML and Kmeans in data processing. We propose a new corollary: hk → H = SSE(C) : k > 0 ∃ X executed on application software querying data collections with more than 306 thousand records. The study produces a comparative table between PL/SQL and MLP-Kmeans based on three hypotheses: line query, group query, and total query. The results show that line query reduced to -9 ms, group query from 88 to 2460 ms, and total query from 13 to 279 ms. This concludes that data training reduces search engine efficiency. However, data training remains available for subsequent AI production processes...

  9. L13, " line query reduced to -9 ms" -> " line query reduced to 9 ms".

Author action: ...line query reduced to 9 ms...

  10. L13, " line query reduced to -9 ms, group query from 88 to 2460 ms" -> "line query reduced to -9 ms, group query reduced from 88 to 2460 ms"

Author action: ...line query reduced to 9 ms, group query increased from 88 to 2460 ms...

  11. L15, "AI production-processes" -> "AI production processes"

Author action: ... AI production processes...


  12. In Algorithm 1, the function is named "LOOP", which is meaningless. I recommend that you use an informative function name instead of "LOOP". Also, in Algorithm 2, you have two functions but with the same function name "LOOP". Please correct it.

Author response: 

  • Algorithm 1: We tried to specify that each centroid must be calculated for each cluster.
  • Algorithm 2: We intend to present a nested dataset inside of an epoch's cycle.

Author action:  

  • Algorithm 1: We have eliminated the LOOP function and only pointed out the centroid calculation.
  • Algorithm 2: We replaced the second LOOP function with a FOR loop and clarified that OUTPUT must belong to a K hypothesis.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The intention of the article is clearer after the revision.

Author Response

Thank you very much for the time you dedicated to revising this document. We appreciate the effort and honesty, which have helped us to improve our research.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper is not ready for publication even after the authors' revision.

I can easily identify at least 10 errors in this paper.

1. In line 9, the author states that " hk → H = SSE(C) : k > 0 ∃ X". However, this equation does not make sense as hk is mapping to a function. Also, there should be a comma between "k>0" and "∃ X".  The author may consider: "hk→H=SSE(C), where k>0 and ∃X". Alternatively, "∃X, such that ...".

2. The authors stated: "The results show that line query increased to 9 ms, group query increased from 88 to 2460 ms, and total query from 13 to 279 ms. This concludes that data training reduces search engine efficiency. " From my understanding, the smaller the query time, the better the method is. If the proposed method increases the query time, what's the benefit it brings? Without stating the benefits of the proposed method, the audience may question the necessity of the proposed method.

3. In line 197, the author stated that " d = X × X → R+", which is not commonly used. The notation "𝑑 : 𝑋 × 𝑋 → 𝑅+"  is a correct and commonly used way to describe such a function in mathematical contexts. Also, the authors started the paragraph with " Where a set of elements X and a distance function d are as follows", which is not common to start a sentence with "where".

4. In line 203, the author stated that "The output is a domain partition of X where C = (C1 ..C k )". However, the author misused the comma. It should be "C = (C1, C2, ..., Ck )".

5. In Algorithm 1, it states that "for k ← 1 to N d, Centroid ← Euclidean ( x, y, z)". However, the index k is never used in the Centroid computation. Moreover, it remains unclear what the author is iterating on.

6. Line 234, the author stated that " Group query Ck = i : ∑ Ci ∃ m when i is an element of k". The notation is unclear. 

7. The figure 3 is not clear. The flow chart should include the key steps. Thus, I recommend the author remove or replace the modules such as "for each... euclidean distance" and "for each... centroid". There is a typo of "hypotesis", which should be "hypothesis".

8. In Figure 3, in the MLP architecture, the author used the notation of "HV,E,σ = {hV,E,σ,w : Rd}". However, σ is misused since the author uses σ for relational algebra (ROWDATA = σ line,idproduct,amount (rowdata.csv)). If a notation has two interpretations, there are ambiguities.

9. In the Conclusion section, the author stated: "When comparing a modern MLP technique with a traditional PL/SQL method, implementing MLP and Kmeans may be less efficient in the short term. However, it establishes training that could be utilized in the long term to resolve classification problems in intelligent environments." The whole paper shows the proposed method is worse than the SQL method. So why would we accept a paper with poor performance and no benefits? Moreover, the statement of "could be utilized in the long term to resolve classification problems" is not supported by the evidence.

 

Comments on the Quality of English Language

Please see my comments for the author.

Author Response

Reviewer 3: 
1. In line 9, the author states that " hk → H = SSE(C) : k > 0 ∃ X". However, this equation does not make sense as hk is mapping to a function. Also, there should be a comma between "k>0" and "∃ X".  The author may consider: "hk→H=SSE(C), where k>0 and ∃X". Alternatively, "∃X, such that ...".

Author response: Thank you for your valuable comment. We didn't realize this equation was not clear enough.

Author action: The expression in line 9 has been changed according to your suggestion, as has its corresponding Equation 3 (Equation 4 before the corrections).
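
For readers following the notation discussion, one conventional way to write the k-means objective that the corollary refers to is sketched below. This is the standard sum-of-squared-errors definition and is not necessarily the exact form of the manuscript's Equation 3; the centroid symbol \mu_j and the use of the Euclidean norm are assumptions made for illustration.

```latex
\mathrm{SSE}(C) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x,
\qquad k > 0 .
```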

2. The authors stated: "The results show that line query increased to 9 ms, group query increased from 88 to 2460 ms, and total query from 13 to 279 ms. This concludes that data training reduces search engine efficiency. " From my understanding, the smaller the query time, the better the method is. If the proposed method increases the query time, what's the benefit it brings? Without stating the benefits of the proposed method, the audience may question the necessity of the proposed method.

Author response: Thank you for your observation. Our intention was not only to test one methodology against the other but also to show that the complexity of using a neural network brings more precision than the simple use of PL/SQL instructions, which will be more important in the future for solving specific problems.

Author action: We changed the wording of the abstract as follows: "Testing one methodology against the other not only has show the incremental fatigue and time consuming that training brings to database query but that the complexity of the use of a neural network is capable of produce more precision results than the simple use of PL/SQL instructions and this will be more important in the future for domain specific problems."
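
To make the size of the timing difference discussed in this comment concrete, the short sketch below computes the ratio between the two figures quoted for the group and total queries. It assumes that the first figure in each pair is the PL/SQL timing and the second is the MLP-Kmeans timing, which is how the reviewer reads the abstract; the variable and function names are illustrative only.

```python
# Timings in milliseconds as quoted from the abstract: (first figure, second figure).
# Assumption: first figure = PL/SQL, second figure = MLP-Kmeans.
timings_ms = {
    "group query": (88, 2460),
    "total query": (13, 279),
}


def slowdown_factor(before_ms, after_ms):
    """Return how many times longer the second timing is than the first."""
    return after_ms / before_ms


factors = {
    name: slowdown_factor(before, after)
    for name, (before, after) in timings_ms.items()
}

for name, factor in factors.items():
    print(f"{name}: about {factor:.1f} times longer")
```

Under that assumption, the group query takes roughly 28 times longer and the total query roughly 21 times longer.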

 

3. In line 197, the author stated that " d = X × X → R+", which is not commonly used. The notation "d : X × X → R+"  is a correct and commonly used way to describe such a function in mathematical contexts. Also, the authors started the paragraph with " Where a set of elements X and a distance function d are as follows", which is not common to start a sentence with "where".

Author response: Thank you for your observation. You are absolutely right, writing = instead of : was a typo. The paragraph also started incorrectly.

Author action: We fixed the notation and corrected the paragraph to: "The input is defined as a set of elements X and a distance function d as follows: d : X × X → R+, where R+ is..."
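
The corrected signature can be typeset as shown below. The Euclidean instance on the second line is an assumption, added only because Algorithm 1 refers to a Euclidean computation; the manuscript's own definition of R+ is elided in the quote above and is not reproduced here.

```latex
d : X \times X \to \mathbb{R}^{+},
\qquad
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}}
\quad \text{for } x, y \in X \subseteq \mathbb{R}^{n}.
```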

4. In line 203, the author stated that "The output is a domain partition of X where C = (C1 ..C k )". However, the author misused the comma. It should be "C = (C1, C2, ..., Ck )".

Author response: Thank you for your observation. As software programmers, we sometimes unconsciously use programming notation, in this case two dots instead of three.

Author action: We corrected the format as you indicated.

5. In Algorithm 1, it states that "for k ← 1 to N d, Centroid ← Euclidean ( x, y, z)". However, the index k is never used in the Centroid computation. Moreover, it remains unclear what the author is iterating on.

Author response: Thank you for your observation. You are right, we didn't clarify that k is the centroid index over which the iterations are made.

Author action: We added the index k to the Centroid.
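
As an illustration of what indexing the centroid by k can look like, here is a minimal sketch of one k-means update step over three-dimensional points. It is not the manuscript's Algorithm 1: the function names, the sample points, and the handling of empty clusters are all assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def update_centroids(points, labels, n_clusters):
    """Compute centroid k as the mean of the points currently labelled k."""
    centroids = []
    for k in range(n_clusters):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            centroids.append(None)  # empty cluster: no centroid (assumed policy)
            continue
        dim = len(members[0])
        centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index k of its nearest non-empty centroid."""
    valid = [k for k, c in enumerate(centroids) if c is not None]
    return [min(valid, key=lambda k: euclidean(p, centroids[k])) for p in points]


# Hypothetical (x, y, z) points with an initial labelling into two clusters.
points = [(0, 0, 0), (2, 0, 0), (10, 10, 10), (12, 10, 10)]
labels = [0, 0, 1, 1]

centroids = update_centroids(points, labels, n_clusters=2)
new_labels = assign(points, centroids)
```

Here the loop variable k is used both to select the members of each cluster and to index the resulting centroid, which is the dependency the reviewer found missing.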

6. Line 234, the author stated that " Group query Ck = i : ∑ Ci ∃ m when i is an element of k". The notation is unclear. 

Author response: Thank you for your observation. We have simplified the content of the whole of subsection 3.3.5.

Author action: We changed the item list as follows: "1. Line query when each element is a cluster or the size of the cluster is equal to the size of elements. 2. Group query when each k cluster is the sum of all i clusters with the same requested parameter. 3. Total query when there is only one cluster."
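
The three query hypotheses in the revised list can be read as three levels of clustering granularity. The sketch below illustrates that reading on a handful of hypothetical rows; the field names are borrowed from the relational algebra expression quoted in comment 8, while the data values and function names are assumptions and do not come from the manuscript.

```python
from collections import defaultdict

# Hypothetical rows; field names follow the expression quoted in comment 8.
rows = [
    {"line": 1, "idproduct": "A", "amount": 10.0},
    {"line": 2, "idproduct": "B", "amount": 5.0},
    {"line": 3, "idproduct": "A", "amount": 2.5},
    {"line": 4, "idproduct": "C", "amount": 7.0},
]


def line_query(rows):
    """Each element is its own cluster, so there are as many clusters as rows."""
    return [[row] for row in rows]


def group_query(rows, key):
    """One cluster per distinct value of the requested parameter."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[key]].append(row)
    return dict(clusters)


def total_query(rows):
    """A single cluster that holds every row."""
    return [list(rows)]


def cluster_amount(cluster):
    """Sum the amount field over the rows of one cluster."""
    return sum(row["amount"] for row in cluster)


line_clusters = line_query(rows)
group_clusters = group_query(rows, "idproduct")
total_clusters = total_query(rows)
```

With these rows, the line query yields four clusters, the group query yields one cluster per product, and the total query yields a single cluster whose amount is the sum over all rows.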

7. The figure 3 is not clear. The flow chart should include the key steps. Thus, I recommend the author remove or replace the modules such as "for each... euclidean distance" and "for each... centroid". There is a typo of "hypotesis", which should be "hypothesis".

Author response: Thank you for your observation. We have reconsidered the figure design and improved it according to your instructions.

Author action: We merged the Euclidean and centroid modules into a single module, corrected the missing h in the hypothesis column, and replaced the original figure.

8. In Figure 3, in the MLP architecture, the author used the notation of "HV,E,σ = {hV,E,σ,w : Rd}". However, σ is misused since the author uses σ for relational algebra (ROWDATA = σ line,idproduct,amount (rowdata.csv)). If a notation has two interpretations, there are ambiguities.

Author response: Thank you for your observation. You are entirely right. Our intention was to point out the use of relational algebra, but that notation uses the sigma symbol for data selection, whereas the MLP notation uses it to represent the sigmoid function, which produces an ambiguity.

Author action: We consider the relational algebra reference unnecessary, so it was removed from lines 176 to 178.

9. In the Conclusion section, the author stated: "When comparing a modern MLP technique with a traditional PL/SQL method, implementing MLP and Kmeans may be less efficient in the short term. However, it establishes training that could be utilized in the long term to resolve classification problems in intelligent environments." The whole paper shows the proposed method is worse than the SQL method. So why would we accept a paper with poor performance and no benefits? Moreover, the statement of "could be utilized in the long term to resolve classification problems" is not supported by the evidence.

Author response: Thank you for your observation. As with your observation number 2, our intention was not only to test one methodology against the other but also to show that the complexity of using a neural network brings more precision than the simple use of PL/SQL instructions, which will be more important for domain-specific problems.

Author action: We changed the wording of the conclusion, in line with the abstract, as follows: "When comparing different methodologies, it becomes evident that training can lead to increased fatigue and consume a lot of time when dealing with database queries. Additionally, the complexity of using a neural network can produce more accurate results compared to using simple PL/SQL instructions, in the testing scenario. This will be particularly significant in the future for specific problem-solving scenarios, such as intelligent environments characterized by prescriptive control achieved through self-programming processes, resulting in holistic functionality, for which this work should be extended...."

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

I have no comments on this round of review.
