Article
Peer-Review Record

ESA-GCN: An Enhanced Graph-Based Node Classification Method for Class Imbalance Using ENN-SMOTE Sampling and an Attention Mechanism

Appl. Sci. 2024, 14(1), 111; https://doi.org/10.3390/app14010111
by Liying Zhang 1,2,3,* and Haihang Sun 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Reviewer 5:
Reviewer 6: Anonymous
Submission received: 18 November 2023 / Revised: 17 December 2023 / Accepted: 19 December 2023 / Published: 22 December 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The article deals with a significant and modern tool, graph neural networks. The application of these tools also looks very promising.

I have the following suggestions for improving the content:

1. In my opinion, the abbreviation ESA-GCN should be removed from the title and the abstract, because it is the authors' new term and is properly described only within the article.

2. In formula (1) the authors use the letter F twice for different concepts; it may be better to change the notation.

3. In the description of formula (1), it would be better to add the dimensions of the matrices F and A[:, v].

4. In formula (1) the programming function Concat() is used, which is not correct from a mathematical point of view. Two possibilities are proposed:

a) change Concat() to an equivalent mathematical procedure, or

b) add a detailed description of it.

5. It is completely unclear what "s.t." means in formula (2). I understand what the authors want to do, but the notation must be changed or explained.

6. In formula (3) the comma appears as a prime on the number 2. A space should be added between the number 2 and the comma.

7. In formula (4) the notation v' is not a good choice. It would be better to use notation consistent with the authors' proposed style (see ne(v)): n(v) or v with a bar on top.

8. In the description of formula (5), the authors write "ai,j represents the attention of node j on node i". Here "attention" is a poor word choice, because "attention" is a defined mathematical term. It would be better to use the word "relation" or "influence".

9. In formula (6) the function LeakyReLU should be changed to "sigma", because in formulas (1) and (7) the authors use the notation "sigma" for an activation function.

10. In formula (6) the authors use the operation "||". It should be described.

11. In formulas (2) and (7) the authors use the functions argmin() and softmax() from programming. It would be better to replace them with mathematical notation.

12. It is unclear what [2] in the exponent means in formulas (8)-(11). It should be described or eliminated.
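As background for comments 8-11, the conventional graph-attention computation the formulas appear to follow can be sketched in plain Python; the weight vector `a`, the toy features, and the function names below are illustrative only, not taken from the paper:

```python
import math

def leaky_relu(x, alpha=0.2):
    # LeakyReLU (comment 9): identity for x > 0, small slope alpha otherwise
    return x if x > 0 else alpha * x

def softmax(scores):
    # the softmax normalization of comment 11: exp(e_i) / sum_j exp(e_j)
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(h_i, neighbor_feats, a):
    # e_ij = LeakyReLU(a . [h_i || h_j]); "||" (comment 10) is concatenation
    scores = []
    for h_j in neighbor_feats:
        concat = h_i + h_j                 # list "+" plays the role of "||"
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    return softmax(scores)                 # weights over neighbors sum to 1

# toy example: a node with 2-dim features and two neighbors
h_i = [1.0, 0.5]
neighbors = [[0.2, 0.3], [0.9, 0.1]]
a = [0.1, -0.2, 0.4, 0.3]
print(attention_weights(h_i, neighbors, a))
```

The normalized weights are positive and sum to one over the neighborhood, which is why replacing the word "attention" (comment 8) still requires keeping the softmax structure intact.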

After corrections, the article can be published.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes an ESA-GCN model that utilizes a series of sampling and edge-generation techniques in the expressive embedding space obtained by a graph neural network (GNN). The manuscript requires significant improvement and modification, so I recommend a major revision. My comments are listed below.

1. Please elaborate on the meaning of ESA-GCN and ENN-SMOTE in the abstract. It is distracting for readers if you start with a short form.

2. Please mention in the abstract which public datasets are used for this experiment.

3. Please update the spacing of most of the citations. There should be a space between the word and the citation number: "analysis[1]", "performance[2]", etc.

4. One good work on class-imbalanced datasets can be reviewed and cited: https://doi.org/10.3390/s22218268.

5. Figure quality needs to be improved, to at least 300 dpi resolution.

6. Table quality needs to be improved; please check a few more reference papers for table organization.

7. Add a citation for this: "Two context-based self-supervised tasks are designed to consider both local and global structural information in the graph structure."

8. What is meant by "From the perspective of domestic and foreign research achievements,"? Please make it clear or rephrase it.

9. The presentation of the algorithm on page 9 is very poor. Please check papers in the relevant journal for how to format the algorithm.

10. Items (bullet points) in the future work sometimes start with a space and sometimes do not. Please review this carefully.

11. The whole manuscript contains several typos, unnecessary dashes, spacing issues, etc. In several places, short forms are used without first being spelled out. These need to be carefully checked.

12. Some citations are needed: "Most existing GNN models are based on the assumption that node samples from different classes are balanced".

13. "Additionally, Zhao et al.[2]perform oversampling", "ImGAGN[5]synthesizes mini-nodes by": check the spacing properly.

14. If you have already defined graph neural networks as GNN, you do not need to spell out "graph neural networks" again later; just write GNN. Please check other abbreviations accordingly.

15. Please add a brief discussion of all three datasets.

Comments on the Quality of English Language

The quality of English needs to be improved. Fix the several typos before you resubmit. Check punctuation, grammar, and spacing carefully.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes the ESA-GCN model to address class imbalance in graph neural networks (GNNs). The model uses the ENN-SMOTE comprehensive sampling method to balance the dataset, reduces error rates by removing low-quality and noisy data, and introduces an attention mechanism during edge generation. Experiments on three public datasets show that ESA-GCN significantly improves classification accuracy while reducing model parameters and computational complexity.
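As a rough, feature-space-only illustration of the ENN-then-SMOTE pipeline summarized above (the paper applies it to node embeddings inside a GNN; the data, function names, and neighbor rules here are invented for the sketch and may differ in detail from the authors' method):

```python
import math
import random

def enn_undersample(X, y, minority, k=3):
    """ENN: drop a majority sample when the majority vote of its k nearest
    neighbours disagrees with its own label, i.e. remove noisy/borderline
    majority points (minority samples are always kept)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)
            continue
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(xi, X[j]))[:k]
        agree = sum(1 for j in nbrs if y[j] == yi)
        if 2 * agree >= k:                 # neighbourhood agrees -> keep
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_oversample(X, y, minority, n_new, k=2, seed=0):
    """SMOTE: create synthetic minority samples by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    min_pts = [x for x, lbl in zip(X, y) if lbl == minority]
    for _ in range(n_new):
        x = rng.choice(min_pts)
        nbrs = sorted((p for p in min_pts if p is not x),
                      key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(nbrs)
        t = rng.random()                   # interpolation factor in [0, 1)
        X = X + [[a + t * (b - a) for a, b in zip(x, nb)]]
        y = y + [minority]
    return X, y

# toy data: 5 majority points (label 0, one of them noisy) and 3 minority (label 1)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5.1, 5.5], [5, 5], [5, 6], [5.2, 5.4]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
X, y = enn_undersample(X, y, minority=1)   # removes the noisy point [5.1, 5.5]
X, y = smote_oversample(X, y, minority=1, n_new=1)
print(y.count(0), y.count(1))              # balanced: 4 and 4
```

Running ENN first, as in the summary, means SMOTE interpolates only among cleaned minority neighborhoods, which is the stated rationale for reducing the error rate.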

The paper is well-written and organized.

One remark: the authors should improve the figures, because they are of poor quality.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The authors propose a new method to overcome the imbalanced-data problem in graph neural network applications. The proposed method combines a resampling method with different weights for noisy data, and it was illustrated on three different publicly available datasets. The robustness of the method was also checked.

The authors demonstrated good stability and robustness in handling imbalanced node classification tasks.

The method is clearly described using graphical presentation and some formulas.

For efficiency assessment, two metrics were used: AUC-ROC and F1-macro.

The literature review is limited.

The examples and proof are based on three different publicly available datasets. There is no explanation of or rationale behind this choice, and choosing different datasets might lead to different conclusions. This is the weakest part of this paper.

The sentences before and after the table with the Algorithm are broken.

Minor: the text needs formatting.

Comments on the Quality of English Language

The text needs formatting; however, the English is not very bad.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

The paper proposes an interesting approach to tackling class imbalance in graph node classification. Combining sampling techniques with graph neural networks is a promising direction.

Here are some suggestions for improvement:

1) The descriptions of some existing methods in the Related Work section lack detail. More specifics on how they work and their limitations would better highlight the novelty of the proposed approach.

2) More results analyzing the quality of the generated edges and the embeddings of synthetic nodes would provide useful insight into how the model works and the effectiveness of the techniques used.

3) Error analysis investigating the margin of error, the limitations, and the cases where the method underperforms could identify promising directions for further work.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 6 Report

Comments and Suggestions for Authors

A method is proposed to cope with inaccurate edge generation during graph data oversampling, insufficient representation of minority classes, and the presence of noisy samples in GNNs.

The abstract could be reshaped to improve clarity. Point (ii) in the abstract is an algorithmic advantage, but points (i) and (iii) are algorithmic features of the proposed approach. I think the abstract would read better if the features were presented first and the results afterwards, instead of mixed.

The literature review is split between Sections 1 and 2. I think it would be better for it all to be in the same section.

Subsection names are not capitalised consistently across the paper (e.g. 2.2).

The same acronyms are defined multiple times (e.g. GNN).

Figure 1 needs to be of better quality.

Section 3.2 has the subsection titles "1.ENN undersampling process:" and "2.SMOTE oversampling process", which may need to be sub-subsections.

I was unable to find a definition of the function "LeakyReLU" used in Equation 6.
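For reference, LeakyReLU is conventionally defined as f(x) = x for x > 0 and f(x) = alpha * x otherwise, where alpha is a small fixed slope (commonly 0.01, or 0.2 in graph-attention work); a one-line sketch of the standard definition, should the authors wish to add one:

```python
def leaky_relu(x, alpha=0.2):
    # identity for positive inputs, small slope alpha for negative ones,
    # so the function keeps a nonzero gradient everywhere
    return x if x > 0 else alpha * x

print(leaky_relu(3.0), leaky_relu(-2.0))   # 3.0 -0.4
```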

Comments on the Quality of English Language

Punctuation is wrong in many cases.

Leave a space after ending a sentence with a period.

References are unstructured.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 7 Report

Comments and Suggestions for Authors

The problem of class imbalance is increasingly severe because traditional GNNs tend to prioritize majority-class nodes when dealing with imbalanced class distributions and fail to adequately capture the features of minority-class nodes. This study proposes the ESA-GCN model, which employs the ENN-SMOTE sampling method to balance the dataset and reduce the classifier's error rate. An attention mechanism introduced during the edge-generation phase between new and original nodes can improve classification accuracy while reducing model parameters and computational complexity. The authors present experiments on three public datasets in which the designed framework demonstrates better performance on class-imbalanced node classification tasks.

Comments:

1) In the opinion of this reviewer, the authors should edit their text, because in two parts, 1. Introduction and 2. Related Work (Subsection 2.1), the same papers ([2, 3, 6, 7]) are discussed with repetition.

2) The authors announced the ENN-SMOTE sampling method as one of their principal contributions, but on page 4 they mention: "Gustavo et al. [17] proposed two hybrid sampling methods: SMOTE+Tomek and SMOTE+ENN. These methods first apply SMOTE oversampling to the minority class and then use Tomek Links and ENN, respectively, to clean the samples and remove minority class samples that impede the majority class". Please clarify the novelty of the proposed ENN-SMOTE framework relative to the referenced one.

3) This reviewer suggests expanding Section 4, Experimental Results and Discussion, where detailed information about the three datasets used should be presented.

4) The authors never present the experimental setup used in their experiments (Section 4).

5) This reviewer proposes that, for a better understanding of the performance of the designed framework, it would be important to present the behavior of a metric (F1 or AUC-ROC) or the loss as a function of the number of epochs during the training and validation stages.

6) This reviewer thinks that, for their results to be better understood by potential readers, the authors should present much more information about the parameters chosen in the experimental setup. They only wrote: "For all methods, the learning rate is initialized as 0.001, and the weight decay is set to 5×10−4. λ is set as 1×10−6. If not specified, the imbalance ratio is set to 0.5, and the undersampling neighbor number K is set to 3". Please provide justification for the chosen parameters.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for considering some of the comments. However, the manuscript still has many errors, as it has not been carefully reviewed.

1. On the first page, "Shi et al. [3]proposed": there should be a space after the citation. Please check the citation style properly before you submit. Also, be careful about spacing; it looks like you have multiple spaces after some citations. Check all the citations.

2. On the 2nd page, 'Li [7] proposed ...' is written like this; this is not the way to cite other people's work. It should be 'Li et al. [7] proposed'. Review all similar citations.

3. Check Table 2; it is split between pages 11 and 12, so put it on a single page.

4. I don't like the way you presented the algorithm. You can check this paper and its algorithm presentation: 'https://arxiv.org/pdf/2006.02158.pdf'. At least make it better for the final publication.

Please review some of the relevant papers of this journal so that you can get an idea of how to present a good manuscript. Good luck.

Comments on the Quality of English Language

Please, review the paper carefully before submitting it. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

No more comments.

Comments on the Quality of English Language

Sufficient.

Author Response

Thank you very much for your arduous work and valuable suggestions for our research. We appreciate your recognition of our paper.

Reviewer 7 Report

Comments and Suggestions for Authors

In the opinion of this reviewer, the authors have fully addressed only comment 4 (describing the three datasets). They also partially addressed comment 1, where this reviewer asked them to revise and edit two parts of the text, but the authors did not provide any textual revision in the revised version of the manuscript.

Regarding comment 2, in which it was proposed that the novelty of the proposed method be justified, given that the authors wrote "The ENN-SMOTE sampling method is designed to balance the dataset by decreasing the majority class nodes and increasing the minority class nodes" (page 2), this reviewer did not find an adequate answer.

The authors, or the corresponding author who wrote the responses, did not understand comment 4. The experimental setup includes not only the chosen parameters but also information about the hardware (type of PC, GPU, etc.) and the libraries used when implementing the method in the experiments.

The newly revised version of the manuscript, as well as the authors' responses to comments 5 and 6, did not clarify the doubts in these aspects.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
