Article
Peer-Review Record

A Time–Frequency Image Quality Evaluation Method Based on Improved LIME

by Yihao Bai, Weidong Cheng, Weigang Wen * and Yang Liu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Appl. Sci. 2024, 14(7), 2917; https://doi.org/10.3390/app14072917
Submission received: 27 February 2024 / Revised: 22 March 2024 / Accepted: 27 March 2024 / Published: 29 March 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The presented subject is very interesting and could be very beneficial in many different applications of deep learning methodology. It could also motivate general approaches to preprocessing training sets in deep learning.

The items to be improved:

- The technical preparation of the presented content needs to be reconsidered. Many sentences, especially in Section 2, must be reconstructed; capitalization and punctuation must be corrected.

- Subsection 2.1 is missing; there are two subsections labeled 2.2.

- Within subsubsection 2.3.4, more detail should be provided, particularly on accuracy.

- The text in lines 330-339 should be removed.

Comments on the Quality of English Language

The quality of the English language is not appropriate. It should be improved.

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

 

In response to the comments you raised, we have made the following modifications:

  1. We describe the technologies involved in more detail; Chapter 2 of the manuscript was substantially revised.
  2. A more detailed description is also given in the experimental section, including but not limited to accuracy.
  3. The manuscript was carefully proofread for grammar and formatting.

 

Thanks again for your comments. If you still have any questions about the manuscript, please feel free to point them out.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors proposed a time-frequency image quality evaluation method based on an improved local interpretable model-agnostic explanations (LIME). The idea is interesting, but the following comments should be addressed before publication:

1, The authors said that they proposed an image quality evaluation method, but to my knowledge, they do not apply images as the input of the deep learning model in this manuscript; they just use the time-frequency matrices. Please explain.

2, Based on Figure 2, I cannot identify the improved LIME in this figure. Could the authors explain more?

3, Could the authors explain more about the neural network part in Section 2.3.2?

4, The description of Table 1 is not clear. From this table, if I understand correctly, the training set of each model only contains 2000 samples, right? But the table tells the reader that the training set contains 2000 * 5 samples for each model. Please improve Table 1.

5, Please improve the quality of Figure 7.

6, I am wondering if the authors could give the results based on the normal LIME method to demonstrate the performance of the improved LIME.

7, Could the authors explain more about the consistency in the experimental results part?

8, Could the authors explain more about the usage of the quality score in Table 3?

Comments on the Quality of English Language

Moderate editing of English language required.

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

 

In response to the comments you raised, we have made the following modifications:

  1. In the experiment, the time-frequency matrix is usually transformed into a gray-scale image. This has been added at the end of Section 2.3.1 (a minimal sketch of such a conversion follows this list).
  2. There was an error in the description. The focus of the manuscript is to propose a technical route, which is described in Section 2.3. The improvements to LIME are only one part of this, described in Section 2.3.3. We have revised and supplemented these parts.
  3. A more specific description of the network has been added in Section 2.3.2.
  4. This was a misrepresentation. Five one-to-one corresponding training and test sets are obtained by different time-frequency transformation methods. The size of each training set is 128*128*2000, and the size of each test set is 128*128*500. The corresponding descriptions and tables have been modified.
  5. The image was modified according to the 300 dpi requirement.
  6. LIME is designed to serve the overall technical route. The original LIME even struggles to divide time-frequency images into effective superpixels, so no comparison is made in this paper. We have also followed improvements to LIME by other researchers, and comparative analysis will be one of our future studies.
  7. The description of the quality scores and consistency is supplemented in Section 2.3.4.
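
For reference, the following is a minimal sketch of the kind of gray-scale conversion described in point 1, assuming a simple min-max scaling; the function name and the scaling choice are illustrative assumptions, not the manuscript's exact procedure from Section 2.3.1.

```python
import numpy as np

def tf_matrix_to_grayscale(tf_matrix: np.ndarray) -> np.ndarray:
    """Min-max scale a time-frequency matrix to an 8-bit gray-scale image.

    The scaling equalizes amplitude ranges across different time-frequency
    transforms, which is the normalization role described in the responses.
    """
    lo, hi = tf_matrix.min(), tf_matrix.max()
    scaled = (tf_matrix - lo) / (hi - lo)   # normalize to [0, 1]
    return (scaled * 255).astype(np.uint8)  # quantize to 0-255 gray levels

# A 128x128 matrix with an arbitrarily small amplitude range still maps
# onto the full 0-255 range of a gray-scale image.
image = tf_matrix_to_grayscale(np.random.rand(128, 128) * 1e-3)
```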

 

Thanks again for your comments; they have helped refine the manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

In general, the submitted manuscript is interesting. The introduction section gives enough background information on the research topic. On the other hand, the main contribution of the manuscript is not clear. Unfortunately, these sentences: "Aiming at this problem, this paper proposed an improved LIME method to evaluate the time-frequency image quality of vibration signals of rotating machinery from two aspects: the accuracy of diagnosis results and the consistency of interpretation results with prior knowledge." are too general. Section 2.2 is rather unclear. To tell the truth, I was not able to interpret this section. I think Fig. 1 is not too meaningful. Sections 2.2 and 2.3 are OK, but the authors should illustrate the applied time-frequency transforms with real signals and time-frequency images. It was unclear to me why the authors applied feature selection algorithms if the 2D time-frequency images were fed into the CNN. The applied evaluation metrics in Table 3 are somewhat unclear. A separate explanation of the evaluation metrics could be helpful for readers.

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

 

In response to the comments you raised, we have made the following modifications:

  1. Section 2.1 has been revised to include a description of the technical background.
  2. Feature selection serves the interpretable model g rather than the original model f; this is supplemented in Section 2.3.4 (a sketch of this distinction follows this list).
  3. A description of the evaluation indicators has been added in Section 2.3.4.
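
To illustrate point 2, here is a minimal sketch of the f-versus-g distinction in the spirit of standard LIME; the names and the Lasso-based selection are illustrative assumptions, not the manuscript's improved method from Section 2.3.3.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_surrogate(f_predict, masks, perturbed_images, top_k=5):
    """Fit the interpretable model g; feature selection acts on g's inputs.

    f_predict        : the complex model f, mapping an image to a score
    masks            : (n_samples, n_superpixels) 0/1 interpretable features
    perturbed_images : the perturbed inputs corresponding to the masks
    """
    y = np.array([f_predict(img) for img in perturbed_images])  # f sees images
    g = Lasso(alpha=0.01).fit(masks, y)                         # g sees masks
    top_features = np.argsort(-np.abs(g.coef_))[:top_k]         # selection for g
    return g, top_features
```

The distinction is visible in the two calls: f consumes the perturbed images, while g, and hence feature selection, operates only on the interpretable mask features.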

 

Thanks again for your comments; they have helped refine the manuscript.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The article has been improved. Some technical corrections have yet to be made. The crossed-out text should be removed and typos should be corrected (for example, the paragraph in line 100 starts with a lowercase letter).

Comments on the Quality of English Language

I think the quality of English is acceptable after minor editing.

Author Response

Dear reviewer:

Thank you for your comments on the manuscript.

We made further revisions to the manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors
  1. Please enhance your proficiency in English writing.

  2. Upon seeing a dataset with dimensions 128*128*2000, do you comprehend its structure? The input data has three dimensions; can you identify them, as well as the total number of training samples?

  3. Since you mentioned that Figure 6 does not represent your original input data, but rather a grayscale transformation of it, could you kindly provide a visual representation or example of the actual input data?

  4. Frankly, I am unclear about the significance of the quality score. You've trained a model and used a metric to assess the quality of the inputs; however, it seems that a higher score does not necessarily equate to higher accuracy. Could you clarify the meaning behind this quality score? Furthermore, would it be possible to evaluate the quality of the images prior to training, where a higher score would indeed indicate a potential for higher accuracy?

Comments on the Quality of English Language

Improve the quality of the English language.

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

 

  1. We made further revisions to the manuscript.
  2. The dimensions 128*128*2000 describe the whole data set, not a single sample: the data set contains 2000 samples of size 128*128, so its overall size is 128*128*2000. This is stated in the first paragraph of Section 3.2.
  3. Figure 6, showing the original vibration signal, was added to Section 3.3. Figure 7 (previously Figure 6) is the actual input data. In fact, whether or not gray-scale processing is performed only affects the amplitude range of the image, which is explained in the last paragraph of Section 2.3.1.
  4. In scenarios where machine learning is applied, accuracy is not appropriate as the only factor for evaluating quality, because sometimes high accuracy is based on the wrong reasons. Examples of this can be found in the book 'Artificial Intelligence as a Positive and Negative Factor in Global Risk'. Therefore, we need to try to understand the logic of machine learning decisions and consider whether that logic is reasonable; hence the concept of 'the consistency of interpretation results with prior knowledge' in this manuscript. The quality score is a combination of accuracy and consistency (a sketch of such a combination follows this list). This evaluation cannot be performed without training.
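
As an aside on point 4, here is a minimal sketch of combining accuracy and consistency into a single score, assuming a simple weighted average; the manuscript's actual definition is the one given in Section 2.3.4, and the numbers below are purely illustrative.

```python
def quality_score(accuracy: float, consistency: float, w: float = 0.5) -> float:
    """Combine accuracy and consistency (both in [0, 1]) into a single score."""
    return w * accuracy + (1.0 - w) * consistency

# A model that is highly accurate for the wrong reasons can score lower
# than a slightly less accurate but more consistent one.
print(quality_score(0.95, 0.40))  # 0.675
print(quality_score(0.88, 0.70))  # 0.79
```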

 

Thanks again for your comments. If you still have any questions about the manuscript, please feel free to point them out.

Reviewer 3 Report

Comments and Suggestions for Authors

Unfortunately, the authors did not address my comments. The contributions are still not declared. I still do not understand why the authors used feature selection if the time-frequency images are fed into a CNN. The figure captions are still very short and not informative. The authors did not illustrate the manuscript with real signals and their corresponding time-frequency images.

Author Response

Dear reviewer:

Thank you for your comments on the manuscript. We made further revisions to the manuscript.

  1. In scenarios where machine learning is applied, accuracy is not appropriate as the only factor for evaluating quality, because sometimes high accuracy is based on the wrong reasons. Examples of this can be found in the book 'Artificial Intelligence as a Positive and Negative Factor in Global Risk'. Therefore, we need to try to understand the logic of machine learning decisions and consider whether that logic is reasonable; hence the concept of 'the consistency of interpretation results with prior knowledge' in this manuscript. The quality score is a combination of accuracy and consistency.
  2. There are two models in this manuscript: the complex model f that needs to be explained and the model g that is used to make the explanation. For the CNN-based model f, time-frequency images are its inputs. For model g, understandable features are its inputs; therefore, feature selection is required. This is explained in the second part of Section 2.3.3.
  3. Descriptions of the images have been added, and an image of the measured vibration signal has been supplemented, as shown in Figure 6.

Thanks again for your comments. If you still have any questions about the manuscript, please feel free to point them out.

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Thanks for your explanation. I still have some comments that should be addressed before publication.

1, I don't agree with your explanation of the reason for the gray-scale figures. If you use the time-frequency matrix as the input, the values in the matrix may be close to zero, but if you transform the time-frequency matrix into gray-scale figures, the values in those figures range from 0 to 255. So, based on your explanation, could you run an experiment that shows your results based on the raw time-frequency matrix? It is necessary to prove your explanation.

2, It is true that accuracy is not appropriate as the only factor to evaluate quality; I agree with this. However, the quality score that you proposed is not appropriate to support your explanation. If you don't apply the STFT to transform the raw vibration signals into time-frequency images, the highest score would be 0.7153. Based on your explanation, the model trained with the Wigner-Ville Transform dataset is more trustworthy than the model trained with the Wavelet Transform, i.e., the model with 88% accuracy is better than the model with 95% accuracy. So, in my opinion, the quality score that you proposed is not appropriate to support your explanation.

3, Please demonstrate the results of the training datasets and the raw time-frequency matrix.

Comments on the Quality of English Language

Extensive editing of English language required

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

Our answers to your latest questions and comments are as follows.

  1. The amplitude ranges of the time-frequency matrices obtained by different time-frequency transforms can be very different. Figure 3 and related descriptions have been added to the manuscript to show this difference. The difference does not matter when a single kind of time-frequency transform result is taken as the model input; however, it can have an adverse effect when different time-frequency transforms are compared against one another. Gray-scale processing of the time-frequency matrix is a normalization step rather than one that removes or adds information.
  2. Thank you very much for agreeing that accuracy is not appropriate as the only factor to evaluate quality. In the experiments covered in this manuscript, the Wavelet Transform results in higher accuracy, but its slightly lower consistency results in a lower quality score than the Wigner-Ville Transform. This precisely illustrates the point made in lines 56-57 of the manuscript: 'Sometimes machine learning's predictions are highly accurate but for the wrong reasons.' Here you can refer to an example given in reference 17, which is appended below.

 

‘Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks.  The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks. Using standard techniques for supervised learning, the researchers trained the neural network to a weighting that correctly loaded the training set - output "yes" for the 50 photos of camouflaged tanks, and output "no" for the 50 photos of forest. This did not ensure, or even imply, that new examples would be classified correctly.  The neural network might have "learned" 100 special cases that would not generalize to any new problem.  Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees.  They had used only 50 of each for the training set.  The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly.  Success confirmed!  The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos.

It turned out that in the researchers' data set, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest.’

 

In the experiments in this manuscript, there is no obvious positive correlation between quality score and accuracy. We hope the above explanation clarifies our point.

In addition, we have revised other parts of the manuscript again.

 

Thank you again for your comments; they have made this manuscript more complete.

Reviewer 3 Report

Comments and Suggestions for Authors

Unfortunately, the authors have cooperated with the reviewer very reluctantly. The figure captions are still very short and not informative. For instance, in "Fig. 1: The basic principle of LIME.", the meaning of the colors, lines, and symbols is not explained. Figure captions should be informative and self-contained.

Author Response

Dear reviewer:

 

Thank you for your comments on the manuscript.

I am very sorry that the previous revision did not meet your requirements.

A more detailed description of the images has been added in this revision.

 

Thanks again for your comments.
