Article
Peer-Review Record

A New Improved Learning Algorithm for Convolutional Neural Networks

Processes 2020, 8(3), 295; https://doi.org/10.3390/pr8030295
by Jie Yang, Junhong Zhao, Lu Lu *, Tingting Pan and Sidra Jubair
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 24 January 2020 / Revised: 14 February 2020 / Accepted: 26 February 2020 / Published: 4 March 2020
(This article belongs to the Special Issue Neural Computation and Applications for Sustainable Energy Systems)

Round 1

Reviewer 1 Report

This paper proposes an improved version of CNN with good performance compared to a classical CNN. Some benchmarks are described to show the results.

 

Good and interesting paper.

 

--------> Mention the accuracy of the non-improved CNN in the abstract.

 

--------> Just some English typos:

line 7, abstract -> use impersonal form: "Then we present ..." ---> "This paper presents ..."

 

line 42, 145, etc. " we " --- use impersonal " this paper", "this research", etc.

 

Figure 1.- Copyright information? Add reference [17] and some phrase "this image is taken from ..."

 

Line 66,72, etc... . Equation -> equation

 

line 80. Otherwise --> otherwise

 

Line 110, 111, etc... Figure --> figure

Author Response

Comment 1: Mention the accuracy of the non-improved CNN in the abstract.

Revision 1: We are very grateful for the suggestion, which makes the experimental results clearer and the abstract of the paper more comprehensive. According to the comment, we add the accuracy of the non-improved CNN in Lines 10 to 13 of the abstract.

 

Comment 2: Just some English typos:

line 7. abstract -> use impersonal form: "Then we present ..." ---> "This paper presents ...".

line 42, 145, etc. " we " --- use impersonal " this paper", "this research", etc.

Line 66,72, etc... . Equation -> equation.

line 80. Otherwise --> otherwise.

Line 110, 111, etc... Figure --> figure.

Revision 2: We are very sorry that we have made such mistakes in our manuscript. Firstly, we change the personal form to the impersonal in the revised paper, such as Lines 5, 7, 65, 71, 117, 167 and 265. Secondly, we change the form of the word “Otherwise” in Line 102 of Page 3. Finally, since the template requires “all figures and equations should be cited in the main text as Figure 1, Equation (1)”, we do not change the Figure and Equation forms.

 

Comment 3: Figure 1.- Copyright information? add reference [17] and some phrase "this image is taken from ..."

Revision 3: Thanks for the reviewer’s comments. Figure 1 is drawn by us with the drawing tool “Visio”, so there is no copyright issue. We also add reference [21] in Line 82 of Page 2.

Reviewer 2 Report

From my point of view, the article is well written and would be of interest to the readers of Processes. In spite of this, before its publication I would recommend the following changes:

Line 95: it speaks about PCNN; please explain its meaning.

Line 116: please review the word "strenthenly". Is it correct?

Figure 3: I recommend changing the vertical axis to percentage and fixing the maximum at 100%.

Formula 10: please, explain the meaning of NMSE.

Formula 11: please, explain the meaning of NCE.

Figure 5: move the legend box to avoid hiding part of the deer vs. horse line.

Line 146: the CIFAR-10 and MNIST datasets are cited before being explained.

Finally, I would propose that the authors perform a complete revision of the text in order to put it all in the third person.

Author Response

Comment 1: Line 95: it speaks about PCNN; please explain its meaning.

Revision 1: Thanks for the reviewer’s valuable comments, we add the interpretation of PCNN in Line 158 of Page 6.

 

Comment 2: Line 116: please review the word "strenthenly". Is it correct?

Revision 2: We are very sorry that we have made such a mistake in our manuscript. We change the word in Line 137 of Page 5.

 

Comment 3: Figure 3: I recommend changing the vertical axis of Figure 3 to percentage and fixing the maximum at 100%.

Revision 3: According to the reviewer’s comment, we redraw Figure 3 in Page 5.

 

Comment 4: Formula 10: please, explain the meaning of NMSE.

Revision 4: Thanks for the reviewer’s valuable comments; we add the interpretation of NMSE in Lines 160 to 161 of Page 6.

 

Comment 5: Formula 11: please, explain the meaning of NCE.

Revision 5: Thanks for the reviewer’s meaningful opinions; we add the explanation of NCE in Lines 160 to 161 of Page 6.

 

Comment 6: Figure 5: move the legend box to avoid hiding part of the deer vs. horse line.

Revision 6: According to the reviewer’s comment, we redraw Figure 5 in Page 8.

 

Comment 7: Line 146: the CIFAR-10 and MNIST datasets are cited before being explained.

Revision 7: Thanks for the reviewer’s careful review. We add references related to the MNIST and CIFAR10 datasets in Line 170 of Page 7.

 

Comment 8: Finally, I would propose that the authors perform a complete revision of the text in order to put it all in the third person.

Revision 8: Thanks for the reviewer’s comments, and we redescribe the paper in the third person.

Reviewer 3 Report

The paper presents a novel approach in training the weights in the neural network. The proposed approach uses weighted loss calculation based on the borderline classification of instances. The authors start the paper with a brief history of the research of neural networks. The basic neural network architecture and the proposed method are presented next. This is followed by the section of the experiment. The structure of the paper is appropriate, and the English language used is good.

My concerns about the paper are the following.

-Authors spend most of the introduction presenting a brief history of neural networks. This is not relevant, especially since some references have no place in the paper (i.e. [12]). The introduction lacks the reasoning on why the proposed method is to be used and what the shortcomings of the existing backpropagation method are that this approach tries to address. Also, there are no references to similar approaches. One could argue that the basic boosting method is similar. The authors should rewrite the introduction in a manner that addresses these shortcomings.

-The presentation of the proposed approach is too brief and it does not provide enough information to allow the reimplementation of this method. When is the new proposed loss calculation performed? After every epoch? Be clear on this.

-The authors present an alternative to backpropagation. But from their description, the backpropagation algorithm is still used; just the loss function and the backward sweeps are impacted by their change. If not, the authors should be clearer on what else is changed and what is used if not backpropagation.

-The experiment has major issues. The authors do not provide the details on the methodology of the experiment. They state that the datasets were used in five runs. Were the train/test splits different in every run? If not, why not; did the authors just hope for a better initial random weight setting? If yes, what kind of split was used? It should be done with random stratified cross-validation to prevent overfitting and data leakage.

-The charts of train and test accuracies suggest that there could be serious overfitting: the differences between train and test accuracies are sometimes (i.e. in CIFAR-10) too large.

-The charts suggest that only one of the five runs is reported. Is this true, and if yes, why? Did the authors handpick the results that support their claim? I.e. the worst-performing runs for competitive methods and the best performing runs for their approach? The charts suggest so, as the competitive method is worse even in the starting epoch, not only in the last ones.

-To test if the results are due to chance or are the differences statistically significant, authors should use statistical analysis to compare the proposed method to competitive methods. See Demšar, 2006 for the help on this.

Demšar, J., 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(Jan), pp. 1-30.

Author Response

Comment 1: Authors spend most of the introduction presenting a brief history of neural networks. This is not relevant, especially since some references have no place in the paper (i.e. [12]). The introduction lacks the reasoning on why the proposed method is to be used and what the shortcomings of the existing backpropagation method are that this approach tries to address. Also, there are no references to similar approaches. One could argue that the basic boosting method is similar. The authors should rewrite the introduction in a manner that addresses these shortcomings.

Revision 1: Thanks for the reviewer’s meaningful opinions, which make the paper more complete. According to the reviewer’s comments, we delete some references that are less relevant to this paper and add two new paragraphs to complete the introduction. Firstly, we delete the irrelevant references and introduce the advantages and disadvantages of the BP algorithm in Paragraph 3 of Page 1. Secondly, we introduce some research on how to solve the existing shortcomings of the BP algorithm in Paragraph 1 of Page 2. Thirdly, we explain why the new learning algorithm is proposed in this paper and introduce some literature with similar methods in Paragraph 2 of Page 2.

 

Comment 2: The presentation of the proposed approach is too brief and it does not provide enough information to allow the reimplementation of this method. When is the new proposed loss calculation performed? After every epoch? Be clear on this.

Revision 2: According to the reviewer’s comment, we add Algorithm 1 in Page 6, which summarizes the specific steps to implement the new learning algorithm. It can be seen from Algorithm 1 that the new proposed loss function is executed after each batch.
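To make the per-batch idea concrete, the following Python sketch shows one plausible way a loss can up-weight misclassified ("danger") samples within each mini-batch. The function name, the `lam` parameter, and the exact weighting scheme are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def weighted_batch_loss(probs, labels, lam=0.5):
    """Hypothetical per-batch loss that up-weights misclassified
    ("danger") samples; a sketch only, not the paper's exact form."""
    n = len(labels)
    p_true = probs[np.arange(n), labels]                     # probability of the true class
    danger = (probs.argmax(axis=1) != labels).astype(float)  # 1 for misclassified samples
    weights = 1.0 + lam * danger                             # emphasize danger samples
    ce = -np.log(np.clip(p_true, 1e-12, 1.0))                # per-sample cross-entropy
    return float(np.mean(weights * ce))
```

With `lam = 0` this reduces to the plain mean cross-entropy; larger `lam` increases the gradient contribution of the misclassified samples, mirroring the idea of strengthening the update for danger samples after each batch.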

 

Comment 3: The authors present an alternative to backpropagation. But from their description, the backpropagation algorithm is still used; just the loss function and the backward sweeps are impacted by their change. If not, the authors should be clearer on what else is changed and what is used if not backpropagation.

Revision 3: It’s our fault that the new proposed algorithm was not described in detail. We add Algorithm 1 in Page 6 to describe the detailed steps of the new learning algorithm. It can be seen from Algorithm 1 that gradient descent is still used to adjust the parameters, but the updating of parameters in the backward sweeps is more affected by the danger samples.

 

Comment 4: The experiment has major issues. The authors do not provide the details on the methodology of the experiment. They state that the datasets were used in five runs. Were the train/test splits different in every run? If not, why not; did the authors just hope for a better initial random weight setting? If yes, what kind of split was used? It should be done with random stratified cross-validation to prevent overfitting and data leakage.

Revision 4: According to the reviewer’s comment, we add some details of the experiment in Lines 178 to 183 of Page 7, where the explanation of the division of the dataset is shown in Lines 180, 194 and 249.

 

Comment 5: The charts of train and test accuracies suggest that there could be serious overfitting: the differences between train and test accuracies are sometimes (i.e. in CIFAR-10) too large.

Revision 5: Thanks for the reviewer’s careful review. In our experiments, although we adopted the Dropout method to prevent overfitting, it can be seen from Figure 7 in Page 9 that some overfitting remains. However, it can still be seen from this figure that the performance of PCNN is improved compared with the original CNN.

 

Comment 6: The charts suggest that only one of the five runs is reported. Is this true, and if yes, why? Did the authors handpick the results that support their claim? I.e. the worst-performing runs for competitive methods and the best performing runs for their approach? The charts suggest so, as the competitive method is worse even in the starting epoch, not only in the last ones.

Revision 6: We are sorry that we didn’t clearly describe the reason for running 5 trials. We redescribe this in Lines 178 to 180 of Page 7. We take the mean of the results of 5 runs as the final result, instead of handpicking one of the five runs to support our conclusion.

 

Comment 7: To test if the results are due to chance or are the differences statistically significant, authors should use statistical analysis to compare the proposed method to competitive methods. See Demšar, 2006 for the help on this.

Revision 7: Thanks for the reviewer’s valuable comments, which make our conclusion more credible. According to the reviewer’s comment, we conduct the Wilcoxon signed-rank test for each experiment, as shown in Lines 183, 208, 225, 238 and 256. We also summarize the results of the Wilcoxon signed-rank test for each dataset in Table 2 of Page 9.
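For readers unfamiliar with the test, a Wilcoxon signed-rank comparison of paired per-run accuracies can be performed with SciPy as below. The accuracy values are made up for illustration; they are not the paper's results.

```python
from scipy.stats import wilcoxon

# Illustrative paired per-run test accuracies (NOT the paper's numbers)
pcnn = [0.912, 0.908, 0.915, 0.910, 0.913]
cnn = [0.896, 0.902, 0.900, 0.893, 0.901]

# Paired, two-sided by default; small p suggests a systematic difference
stat, p = wilcoxon(pcnn, cnn)
print(stat, p)
```

Note that with only 5 paired runs the smallest attainable two-sided exact p-value is 2/32 = 0.0625, so more runs (or more datasets, as in Demšar, 2006) give the test more power.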

Reviewer 4 Report

This is an interesting paper that is within the interest of the journal.

The paper is well written with only a few minor grammar errors.

 

A few comments are:

1. in the literature review, you could also add techniques that fuse two
methods, i.e. CNN and sparse coding,

O. Kechagias-Stamatis, N. Aouf, Fusing Deep Learning and Sparse Coding for SAR ATR, IEEE Trans. Aerosp. Electron. Syst. 55 (2019) 785–797. doi:10.1109/TAES.2018.2864809.

2. Eq. (7),  define mathematically the y_i and t_i terms

3. line 153 - 154, what is the σ of the performance for the various λ values? Are only 5 trials adequate?

4. what is the performance for λ=1? So essentially you don't have an MSE.

5. minor grammar errors

Author Response

Comment 1: In the literature review, you could also add techniques that fuse two 
methods, i.e. CNN and sparse coding.

Revision 1: Thanks for the reviewer’s comment. We add the suggested reference in line 29 of Page 1.

 

Comment 2: Eq. (7), define mathematically the y_i and t_i terms.

Revision 2: We are very grateful for the suggestion. To ensure the unity of notation, we define the mathematical expression for y_i in Line 98 of Page 3 and explain the meaning of t_i in Line 115 of Page 4.

 

Comment 3: Line 153 - 154, what is the σ of the performance for the various λ values? Are only 5 trials adequate?

Revision 3: It is our fault that we didn’t accurately describe the classification performance of the network with the various λ values. We redescribe this in Lines 185 to 190 of Page 7. We also explain why we run it 5 times in Lines 178 to 180 of Page 7.

 

Comment 4: What is the performance for λ=1? So essentially the paper doesn’t have an MSE.

Revision 4: Thanks for the reviewer’s questions. We discuss the performance of the network when λ=1 in Line 163 and add the range of λ in Line 165 of Page 6. For two-class classification problems, the new loss function is equivalent to MSE if λ=0, which is explained in Line 161.

 

Comment 5: Minor grammar errors.

Revision 5: We are very sorry that we have made such mistakes in our manuscript. We correct grammar errors in the revised paper.

Round 2

Reviewer 3 Report

The authors have significantly improved the paper in accordance with the reviewer's suggestions and concerns. Yet, there are still some important drawbacks of the paper, which have to be addressed.

 

The newly added text contains disproportionally more grammar mistakes than the old text (i.e. mini-bath in Algorithm 1).

The updated introduction still does not review competitive methods. The authors just changed the history review of the CNNs. Why mention leaky ReLU and other variants if this is not in the context of your research? The main contribution of the proposed method is the disproportional importance of training instances during the weight optimization (training, back-propagation) process. None of the references in the introduction deal with this. There are numerous boosted neural network methods which deal with different importance of training instances during the training process. The authors should include those in the brief review and also (importantly) in the experimental section. Why compare the proposed method only to standard CNNs (with dropout, which is arguably already standard), and not to competitive and similar methods? Similar methods here are the ones where weighted training instances are used during the training phase (i.e. boosted NNs).

The authors state that the means of all 5 folds are presented in the line charts through the iterations (Figs. 5-9). The mean of all five folds is taken for each iteration and this is shown in the chart? If yes, this is unusual. Include the median, SDs, min, max and average rank in Table 3.

Author Response

We would like to express our sincere appreciation to the reviewer for the careful review to improve the quality of the paper. We believe that the comments are highly constructive and very meaningful for restructuring the manuscript. We have carefully studied the comments and tried our best to revise our manuscript accordingly. Detailed changes are as follows.

 

Comment 1: The newly added text contains disproportionally more grammar mistakes than the old text (i.e. mini-bath in Algorithm 1).

Revision 1: We are very sorry that we have made such a mistake in our manuscript. We correct grammar errors in Algorithm 1 of Page 7.

 

Comment 2: The updated introduction still does not review competitive methods. The authors just changed the history review of the CNNs. Why mention leaky ReLU and other variants if this is not in the context of your research?

Revision 2: Thanks for the reviewer’s careful review. The reason we mention the leaky ReLU variants in this paper is that these studies modify the ReLU function to solve the problem that the weights of negative neurons cannot be updated in back propagation. The similarity between these studies and this paper is that both improve BP for CNNs by changing the weight update. The difference is that these studies give negative neurons an opportunity to update, whereas we strengthen the extent of the update through the danger samples.

 

Comment 3: The main contribution of the proposed method is the disproportional importance of training instances during the weight optimization (training, back-propagation) process. None of the references in the introduction deal with this. There are numerous boosted neural network methods which deal with different importance of training instances during the training process. The authors should include those in the brief review and also (importantly) in the experimental section.

Revision 3: Thanks for the reviewer’s comments. In this paper, References [18] and [19] have been cited to introduce the importance of paying more attention to the misclassified samples during the training process. According to the reviewer’s comment, we add References [20] and [21] in Paragraph 2 of Page 2, which present similar methods.

 

Comment 4: Why compare the proposed method only to standard CNNs (with dropout, which is arguably already standard), and not to competitive and similar methods? Similar methods here are the ones where weighted training instances are used during the training phase (i.e. boosted NNs).

Revision 4: Thanks for the reviewer’s questions. In our experiments, the parameter settings of PCNN and CNN are the same except for the loss function, so the experimental results can better demonstrate the effectiveness of the proposed learning algorithm. Meanwhile, we construct the three sub-datasets of CIFAR-10 for the two-class problem, and no previous paper has reported experiments on these datasets.

 

Comment 5: The authors state that the means of all 5 folds are presented in the line charts through the iterations (Figs. 5-9). The mean of all five folds is taken for each iteration and this is shown in the chart? If yes, this is unusual. Include the median, SDs, min, max and average rank in Table 3.

Revision 5: Thanks for the reviewer’s comment. We cannot be sure of the meaning of the word “fold”. If it means splitting the datasets into 5 folds, we note that we do not adopt 5-fold cross validation in our experiments at all. Each experiment uses all training samples and is repeated 5 times with different random initializations, and the mean value is taken as the final result, as mentioned in Lines 186 to 191 of Page 7. The mean of all five runs is indeed taken for each iteration and shown in Figures 5 to 9. Secondly, the median, SDs, min, and max are reflected in Figures 6(b), 7(b), 8(b) and 9(b), which aggregate multiple measurements of the training and test error values by plotting the mean and the 95% confidence interval around the mean. Finally, the other algorithms in Table 3 only report the final test accuracy in their experiments, so we cannot obtain other information about their results.
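The aggregation described above (a mean over 5 runs at each iteration, with a 95% confidence band around it) can be sketched in a few lines of NumPy. The curves below are synthetic, purely to illustrate the computation, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
epochs = np.arange(20)
# Synthetic test-accuracy curves for 5 independent runs (rows) over 20 epochs
runs = 0.7 + 0.2 * (1 - np.exp(-epochs / 5.0)) + rng.normal(0.0, 0.01, size=(5, 20))

mean = runs.mean(axis=0)                     # the curve plotted at each iteration
sem = runs.std(axis=0, ddof=1) / np.sqrt(5)  # standard error of the mean
ci95 = 1.96 * sem                            # half-width of a ~95% confidence band
lower, upper = mean - ci95, mean + ci95      # band edges to shade around the mean
```

Plotting `mean` with the `lower`/`upper` band shaded reproduces the kind of aggregate view the authors describe for Figures 6(b) through 9(b).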
