Article
Peer-Review Record

Mandarin Electro-Laryngeal Speech Enhancement Using Cycle-Consistent Generative Adversarial Networks

Appl. Sci. 2023, 13(1), 537; https://doi.org/10.3390/app13010537
by Zhaopeng Qian 1,*, Kejing Xiao 2 and Chongchong Yu 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 November 2022 / Revised: 18 December 2022 / Accepted: 26 December 2022 / Published: 30 December 2022
(This article belongs to the Special Issue AI-Based Biomedical Signal Processing)

Round 1

Reviewer 1 Report

Section 1 must be improved.

-       Authors should emphasize contribution and novelty, the introduction needs to clarify the motivation, challenges, contribution, objectives, and significance/implication. 

-       You must properly introduce your work, specify well what were the goals you set yourself and how you approached the problem.

-       You should already at this stage summarize how you intend to improve the intelligibility of Mandarin Electro-Laryngeal Speech using Deep Learning

-       At the end of the section, add an outline of the rest of the paper, in this way the reader will be introduced to the content of the following sections.

 

Section 2 must be improved.

-       Try to describe in more detail how you will use the data you have to train the network.

-       Add a more effective description of the Deep Learning Architecture used, do not use only acronyms otherwise a reader who is not an expert on the subject will not be able to follow the flow of the information

-       Describe in more detail how EL-FT and EL-VT work together to train the network.

-       Figure 1 must be improved: The text appears blurry, making it difficult for the reader to follow the flow of information.

-       You must properly introduce the equation, list in detail the variables contained in it with a concise description of the meaning. To make them more readable show them in a bulleted list. In this way the reader will be able to understand the contribution of each variable.

-       Describe in more detail the setup of the experiment

-       Add some photos and a summary table of the characteristics of the device you used for your experiment

-       Describe in more detail how you used the THCHS -30 database in your experiment

 

Section 3 must be improved.

-       In this section you present your results. You should start the section by summarizing how you tested your technology, what data you used, and how.

-       Figure 4 must be improved: The text is too small and appears blurry, making it difficult for the reader to follow the flow of information. Furthermore, make Figure and Caption fit on the same page.

-       Try to improve the formatting of the section so that the figures and text fit the text. I've seen that you leave a lot of blank space, so the space in the magazine won't be properly utilized

-       A description of the hardware and software used for data processing is completely missing; please add one. Describe in detail the hardware used: extract this data from the datasheet of the hardware manufacturer. To make reading the specifications of the hardware more immediate, you can insert them in a table, listing the instruments used and the specific characteristics for each.

-       Also, you should describe in detail the software platform you used. Also describe the machine learning-based libraries you used.

 

Section 5 must be improved.

-       Paragraphs are missing where the possible practical applications of the results of this study are reported. What these results can serve the people, it is necessary to insert possible uses of this study that justify their publication.

-       They also lack the possible future goals of this work. Do the authors plan to continue their research on this topic?

 

 

99) “Mel-Spectrogram” Introduce adequately the topic

119) Leave a blank line before the Figure 2. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

122) Leave a blank line after Figure 2 Caption. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

254) Leave a blank line before and after Table. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

Author Response

Response to Reviewer 1 Comments

 

Point 1:  Authors should emphasize contribution and novelty, the introduction needs to clarify the motivation, challenges, contribution, objectives, and significance/implication. 

Response 1:  Thanks for your suggestions. We have supplemented the relevant content accordingly. The contributions emphasized in the Introduction can be found in the first and fourth sentences of the fifth paragraph of the Introduction Section (highlighted in line 93, and from line 97 to line 99): “To our knowledge, we are the first to use CycleGAN-based Voice Conversion (VC) to enhance the Mandarin EL speech.”; “This architecture is used to capture contextual semantic information to enhance the tone variation of continuous Mandarin EL speech.” The novelty emphasized in the Introduction can be found in the third sentence of the fifth paragraph of the Introduction Section (highlighted in line 94~line 97): “The 2D-Con-1D-Tran-2D-Con neural networks are designed as the generator in CycleGAN to process the input, and the 2D-Conformer is designed as the discriminator in CycleGAN to discriminate whether the predictions approximate the outputs.”

The generator and discriminator designed in this paper are improvements on CycleGAN-VC3. CycleGAN-VC3, proposed by Kaneko, T, et al. (2020), is the State-of-The-Art (SOTA) fully parallel-data-free voice conversion method. It removes the dependence on parallel speech materials and converts speech with high quality.

The motivation can be found in the first sentence of the second paragraph of the Introduction Section (highlighted in line 46) and the second sentence of the third paragraph (highlighted in line 68~line 69). The challenges can be found in the eighth and ninth sentences of the second paragraph (line 58~line 62), and in the second sentence (line 68~line 69) and the sixth and seventh sentences (line 75~line 78) of the third paragraph. The contributions can be found in the fifth paragraph (highlighted in line 94~line 101). The objectives can be found in the first sentence of the second paragraph (highlighted in line 46~line 47) and in the sixth, seventh and last sentences of the third paragraph (line 75~line 78 and line 86~line 87). Among them, the first sentence of the second paragraph (highlighted in line 46~line 47) shows the general objective of our research; the sixth and seventh sentences (line 75~line 78) show the specific objective; the last sentence (line 86~line 87) shows what we want to achieve with our proposed method.

The significance of our research can be found in the fifth sentence of the fifth paragraph (line 99~line 101) of the Introduction Section. The main significance is as follows. We used CycleGAN-based VC to address the problem that the source and target speech are not parallel. To further improve the ability of parallel-data-free VC to capture contextual semantic information, we designed the 2D-Con-1D-Tran-2D-Con-based generator and the 2D-Conformer-based discriminator of CycleGAN.

 

Point 2:     You must properly introduce your work, specify well what were the goals you set yourself and how you approached the problem.

Response 2: Thanks for your suggestions. The specific goals we set can be found in the sixth and seventh sentences (line 75~line 78) of the third paragraph in the Introduction Section. We aim to introduce how we enhance Mandarin EL speech, which has complicated tone variation rules. The goal of our research is quite complex. When a speaker uses a hand-held Electro-Larynx to pronounce Mandarin speech, tone errors cannot be avoided completely, and these tone errors make the voice conversion task more challenging. The Introduction Section of this paper is written step by step. Firstly, we introduced the generation of EL speech and its drawbacks, such as radiation noise and mechanical F0 variation (the first paragraph of the Introduction Section). Secondly, we introduced some research on how to cancel the radiation noise of EL speech (the second paragraph of the Introduction Section). Thirdly, we introduced relevant research on how to improve the intelligibility and naturalness of EL speech by voice conversion, and reviewed the advantages and limitations of the VC methods in dealing with Mandarin EL speech (the second and third paragraphs). Finally, we introduced what we have done to improve the intelligibility and naturalness of Mandarin EL speech with our proposed method. The novel neural networks proposed in this paper, which are applied in the CycleGAN framework, are very useful for improving the tone accuracy of Mandarin EL speech (the third and fourth paragraphs of the Introduction Section).

Point 3:    You should already at this stage summarize how you intend to improve the intelligibility of Mandarin Electro-Laryngeal Speech using Deep Learning

Response 3:  Thanks for your suggestions. In the third paragraph of the Introduction Section (line 76~line 78), we introduced that the 2D-Con-1D-Tran-2D-Con was proposed as the generator and the 2D-Conformer was proposed as the discriminator. They can capture contextual semantic information and significantly improve the tone accuracy of Mandarin EL speech. We summarized the contributions of our work in the fifth paragraph. We designed a novel parallel-data-free VC method based on deep learning to further improve the intelligibility and naturalness of Mandarin EL speech, and we designed a variety of experiments to evaluate the performance of our proposed method.

Point 4:    At the end of the section, add an outline of the rest of the paper, in this way the reader will be introduced to the content of the following sections.

Response 4:  Thanks for your suggestion. We have supplemented the outline of the rest of the paper. The details can be found in the last paragraph highlighted in the Introduction Section (line 105~line 108). 

Point 5:   Try to describe in more detail how you will use the data you have to train the network.

Response 5:  Thanks for your suggestions. Based on 1250 Chinese text sentences, we recorded 1250 utterances of continuous Mandarin EL speech with fixed tone (EL-FT), 1250 utterances of continuous Mandarin EL speech with variable tone (EL-VT) and 1250 utterances of normal speech. 1000 utterances of EL-FT and 1000 utterances of normal speech are used as the source and target speech to train one CycleGAN-based VC model (Group1); 1000 utterances of EL-VT and 1000 utterances of normal speech are used as the source and target speech to train another CycleGAN-based VC model (Group2). 250 utterances of EL-FT are used to test the Group1 CycleGAN; 250 utterances of EL-VT are used to test the Group2 CycleGAN. Please note that the 1000 training utterances are recorded from the text materials in THCHS-30, while the 250 testing utterances are recorded from the Daily Chinese materials. In addition, we re-recorded 100 utterances of EL-FT and EL-VT from the special content materials, because we want to test how semantic information influences the tone accuracy of the CycleGAN-based VC. The above explanations have been added to Section 2.3.3 (highlighted in line 261 ~ line 269).
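The split described in Response 5 can be summarized in a few lines of Python. This is only an illustrative sketch: the names `Split` and `build_groups` are hypothetical and do not come from the authors' code; only the counts (1250 utterances per speech type, 1000 for training, 250 for testing) and the group labels are taken from the response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One CycleGAN-based VC model and the data it sees (hypothetical structure)."""
    source: str   # source speech type: "EL-FT" or "EL-VT"
    target: str   # target speech type: normal speech
    n_train: int  # training utterances (prompts from THCHS-30 text)
    n_test: int   # testing utterances (prompts from Daily Chinese text)


def build_groups(n_total: int = 1250, n_train: int = 1000) -> dict:
    """Return the two independent train/test configurations from Response 5."""
    n_test = n_total - n_train
    return {
        "Group1": Split("EL-FT", "normal", n_train, n_test),
        "Group2": Split("EL-VT", "normal", n_train, n_test),
    }


groups = build_groups()
for name, split in groups.items():
    print(name, split)
```

The two groups never share source data, which matches the authors' later statement that EL-FT and EL-VT train two separate models.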

Point 6:   Add a more effective description of the Deep Learning Architecture used, do not use only acronyms otherwise a reader who is not an expert on the subject will not be able to follow the flow of the information

Response 6:  Thanks for your suggestions. We have checked and revised the whole manuscript. All acronyms are given their full names when they first appear in the manuscript. In addition, we have added more descriptions of our methods to make them as easy to understand as possible. The details are shown as the highlighted section under Figure 2, the highlighted section in Section 2.1.1 and 2.1.2.

Point 7:   Describe in more detail how EL-FT and EL-VT work together to train the network.

Response 7:  Thanks for your comments. In this paper, the EL-FT and EL-VT are not working together to train the CycleGAN. Actually, EL-FT is used to train a CycleGAN-based VC model, and EL-VT is used to train another CycleGAN-based model. They are two different models, and are tested separately.

Point 8:    Figure 1 must be improved: The text appears blurry, making it difficult for the reader to follow the flow of information.

Response 8:  Thanks for your suggestion. Figure 1 has been improved. The details can be found in Figure 1 in the revised manuscript.

Point 9:   You must properly introduce the equation, list in detail the variables contained in it with a concise description of the meaning. To make them more readable show them in a bulleted list. In this way the reader will be able to understand the contribution of each variable.

Response 9:  Thanks for your suggestions. We have supplemented the descriptions of the symbols for the losses (equations (3)~(6)) at line 196, line 197, line 206 and line 208, respectively. Moreover, we added the input for equation (7). Equations (3)~(6) are used to calculate the losses of CycleGAN and follow reference [29]. Equations (7)~(9) show the main calculation process of WaveNet and follow reference [28]. WaveNet is used to predict the audio waveform from the input acoustic features. We describe the symbols in running text rather than in a bulleted list, because all of the symbols in this paper are widely used, and the formulas in several published papers of the “Applied Sciences” journal, whose format we followed, do not use bulleted lists either. See, for example, literature (1)-(3).

(1) Wu J, Hua Y, Yang S, et al. Speech enhancement using generative adversarial network by distilling knowledge from statistical method[J]. Applied Sciences, 2019, 9(16): 3396.

(2) Li L, Kürzinger L, Watzel T, et al. Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions[J]. Applied Sciences, 2021, 11(16): 7564.

(3) Kim H Y, Yoon J W, Cheon S J, et al. A multi-resolution approach to gan-based speech enhancement[J]. Applied Sciences, 2021, 11(2): 721.
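For orientation only: Response 9 states that equations (3)~(6) compute the CycleGAN losses but does not reproduce them. The block below gives the objective commonly used in CycleGAN-based voice conversion (adversarial, cycle-consistency and identity-mapping terms). It is an assumption that the manuscript follows this general form; its own equations, numbering and symbols may differ. Here x and y denote source (EL speech) and target (normal speech) acoustic features, G denotes a generator, D a discriminator, and the weights λ_cyc and λ_id are assumed hyperparameters.

```latex
\begin{align}
\mathcal{L}_{adv}(G_{X \to Y}, D_Y) &=
  \mathbb{E}_{y \sim p(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p(x)}\big[\log\big(1 - D_Y(G_{X \to Y}(x))\big)\big] \\
\mathcal{L}_{cyc} &=
  \mathbb{E}_{x \sim p(x)}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p(y)}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big] \\
\mathcal{L}_{id} &=
  \mathbb{E}_{y \sim p(y)}\big[\lVert G_{X \to Y}(y) - y \rVert_1\big]
  + \mathbb{E}_{x \sim p(x)}\big[\lVert G_{Y \to X}(x) - x \rVert_1\big] \\
\mathcal{L}_{full} &=
  \mathcal{L}_{adv}(G_{X \to Y}, D_Y) + \mathcal{L}_{adv}(G_{Y \to X}, D_X)
  + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{id}\,\mathcal{L}_{id}
\end{align}
```

The cycle-consistency term is what allows training without parallel source and target utterances, which is the property the authors rely on for Mandarin EL speech.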

Point 10:  Describe in more detail the setup of the experiment

Response 10:  Thanks for your suggestion. We have supplemented Table 1 and the description about the setup conditions of the experiments. The details can be found in Table 1 and the first paragraph under Table 1 in Section 2.3.1 (highlighted in line 244~line 252).

Point 11:  Add some photos and a summary table of the characteristics of the device you used for your experiment

Response 11:  Thanks for your comments. This paper does not show photos or details of the devices, because we must comply with confidentiality provisions. Nevertheless, we added more descriptions of the main devices used (including the GPU resources and the Roland sound collection device). The details can be found in the first paragraph highlighted under Table 1 (line 249~line 252). The details of data collection can be found in Section 2.3.3 (line 261~line 269).

Point 12:  Describe in more detail how you used the THCHS -30 database in your experiment

Response 12:  Thanks for your suggestions. We used only the Chinese text materials of THCHS-30 as prompts for recording the speech. The THCHS-30 database is commonly used in Automatic Speech Recognition (ASR) applications, and its Chinese text materials cover all mono-phonemes, 71.5% of bi-phonemes and 14.3% of tri-phonemes, so the VC model can be trained adequately with these materials. In addition, the audio files are recorded at a sample rate of 44100 Hz and downsampled to 16000 Hz during processing, to make it easier for the system to extract acoustic features. The relevant details have been added to Section 2.3.3 (line 261~line 269 and line 274~line 278).
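The 44100 Hz to 16000 Hz conversion mentioned in Response 12 is a rational-ratio resampling. The sketch below only works out the integer up/down factors and the resulting length; the function names are hypothetical, and the authors do not say which resampling tool they used.

```python
from fractions import Fraction


def resample_factors(src_hz: int, dst_hz: int) -> tuple:
    """Reduce dst/src to lowest terms and return (up, down) integer factors."""
    ratio = Fraction(dst_hz, src_hz)
    return ratio.numerator, ratio.denominator


def resampled_length(n_samples: int, src_hz: int, dst_hz: int) -> int:
    """Number of output samples after rational resampling (rounded up)."""
    up, down = resample_factors(src_hz, dst_hz)
    return -(-n_samples * up // down)  # ceiling division on integers


up, down = resample_factors(44100, 16000)
print(up, down)
print(resampled_length(44100, 44100, 16000))
```

These are the `up` and `down` values one would pass to a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`, should that library be used.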

Point 13:  In this section you present your results. You should start the section by summarizing how you tested your technology, what data you used, and how.

Response 13:  Thanks for your suggestions. We used objective and subjective evaluations to test the performance of our proposed method. The objective evaluations include F0 pattern evaluation and spectrum augmentation evaluation. The subjective evaluations include Tone Accuracy, the Influence of Semantic Information on tone accuracy, WER, and MOS of intelligibility and naturalness. The details can be found in Section 3.1 and Section 3.2. Besides, we have revised the start of Section 3 according to your suggestions. The details can be found in the highlighted first paragraph of Section 3 (line 297~line 304). The preparation of the testing data is described in Section 2.3.3 (line 264).
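As an illustration of the WER measure listed in Response 13, the sketch below computes word error rate as token-level edit distance divided by reference length. It assumes that a listener's transcription and the reference text are already tokenized (for Mandarin, per character or per syllable); the function name and the example tokens are hypothetical, and the authors' exact scoring procedure is not given in this record.

```python
def word_error_rate(reference: list, hypothesis: list) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[m] / n


# Hypothetical pinyin tokens; the listener missed one syllable.
ref = ["wo", "men", "qu", "xue", "xiao"]
hyp = ["wo", "men", "xue", "xiao"]
print(word_error_rate(ref, hyp))
```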

Point 14: Figure 4 must be improved: The text is too small and appears blurry, making it difficult for the reader to follow the flow of information. Furthermore, make Figure and Caption fit on the same page.

Response 14:  Thanks for your suggestions. The original Figure 4 has been improved, which has been split into three figures (Figure 4(1), Figure 4(2) and Figure 4(3)). The three new split figures are shown in three pages (page 13, page 14 and page 15). The details can be found in line 372~line 380.

Point 15: Try to improve the formatting of the section so that the figures and text fit the text. I've seen that you leave a lot of blank space, so the space in the magazine won't be properly utilized

Response 15:  Thanks for your suggestions. The format of the whole manuscript has been improved. We moved some text from its original position to a new position, which makes the layout of the manuscript more compact. (Following the Editor's requirements, we used the “Track Changes” function of MS Word, so you can track the original and new positions of the moved text.)

Point 16: A description of the hardware and software used for data processing is completely missing; please add one. Describe in detail the hardware used: extract this data from the datasheet of the hardware manufacturer. To make reading the specifications of the hardware more immediate, you can insert them in a table, listing the instruments used and the specific characteristics for each.

Response 16:  Thanks for your suggestions. Several hardware devices are used in our research, such as a professional sound card to collect the speech, a GPU video card to train the model, a GPU server to process the audio data and train the model, and an Electro-Larynx to produce the speech. In addition, several software platforms are used, such as PyCharm to write the Python programs and Adobe Audition to collect the audio files. These devices and software platforms can be bought online, and their parameters have no influence on data collection and processing. Therefore, we do not use a table to list the details. However, we describe them in general terms in the revised manuscript. The details can be found in the second and third sentences of the first paragraph under Table 1 of Section 2.3.1 (line 250~line 251) and the ninth and tenth sentences of the first paragraph of Section 2.3.3 (line 274~line 278).

Point 17: Also, you should describe in detail the software platform you used. Also describe the machine learning-based libraries you used.

Response 17:  Thanks for your suggestions. The software platform for data collection is Adobe Audition CS6; the details can be found in Section 2.3.3. Our proposed method is an improvement of CycleGAN-VC3 and was implemented in Python using PyCharm. The machine learning library is PyTorch. The details can be found in Section 2.3.1.

Point 18: Paragraphs are missing where the possible practical applications of the results of this study are reported. What these results can serve the people, it is necessary to insert possible uses of this study that justify their publication.

Response 18:  Thanks for your suggestions. We have supplemented the possible application of this study in the last two sentences of the first paragraph of Section 4 (Discussion) (highlighted line 503~line 505).

Point 19: They also lack the possible future goals of this work. Do the authors plan to continue their research on this topic?

Response 19:  Thanks for your comments. The future works are shown in the highlighted part of the last paragraph of Section 4 (line 517~line 520). One future work is that we want to reduce the cost of GPU resources; another future work is that we want to reduce the cost of time.

Point 20:  99) “Mel-Spectrogram” Introduce adequately the topic

Response 20:  Thanks for your suggestion. We have removed the keyword “Mel-Spectrogram Transformation”.

Point 21:  119) Leave a blank line before the Figure 2. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

Response 21:  Thanks for your suggestions. We have left a blank line before all of the figures according to your suggestions.

Point 22:  122) Leave a blank line after Figure 2 Caption. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

Response 22:  Thanks for your suggestions. We have left a blank line after all captions of figures according to your suggestions.

Point 23:  254) Leave a blank line before and after Table. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

Response 23:  Thanks for your suggestions. We have left a blank line before and after all of the tables according to your suggestions.

 

Author Response File: Author Response.docx

Reviewer 2 Report

The Title: Mandarin Electro-Laryngeal Speech Enhancement using Cycle-Consistent Generative Adversarial Networks

In this paper, the authors propose the use of Cycle-Consistent Generative Adversarial Networks to enhance continuous Mandarin Electro-Laryngeal speech. The work is good, but there are major comments that should be addressed, which are listed below. 

1- In the Abstract, it is better to put the abbreviation “EL” beside the first appear of its term “Electro-Laryngeal”. 

2- In the Abstract, the authors mentioned the general problem in this field. It is preferable to mention the specific problem statement and how it was solved by the proposed work very briefly.

3- The contribution of the proposed work is not clear since it is mentioned that the main contributions are “The 2D-Con-1D-Tran-2D-Con neural networks are applied, and the 2D-Conformer is applied”. Can the authors please identify what is new that has been added to the body of knowledge?

4-There are some typos relating to the table number. In the text, some of them are written in Latin numbers. For example, “The WER in Table VI is”, “The results are shown in Table V”, etc. Please correct this issue. 

5- It is recommended to add more recent papers (2021 and 2022) that are related to the current work.

6- Basically, all the figures are not clear. Please, enhance the quality of the images and use another colormap for more clarity.  Moreover, in general, the captions of the figures are long and need to be more concise. The details that are written in the captions can be moved to the text of the manuscript.  Besides, the caption of Table 4 is written in another page and this is unacceptable.

7- The authors mentioned the importance of speech intelligibility in Electro-Laryngeal speech that is the core of this work. There are many measurements that can assess intelligibility (how accurately people understand speech) such as CSII, CSTI, and others. Can the authors evaluate the performance of the proposed work using such measures and compare with other existing works.

8- The writing of the entire manuscript should be improved and more organization are required. 

9- Section 2.3.4 is at the bottom of the page. Please check and correct such issues. Moreover, there is a white space in Page 10 should be handled. 

10- All equation and their terms should be written in mathematical form such p(X) in equation 5.

11- The approaches used in Table 1, need to be cited with their reliable source. Besides, it is better to use other terms instead of “Ours-FT” and “Ours-VT”. Same notes should be repeated for other Tables. 

12- Some speech enhancement algorithms need to be included such as: a) doi:  10.1088/1757-899X/1090/1/012102, b) doi: 10.1109/TASLP.2018.2842159 , and c) doi: 10.1109/ACCESS.2019.2929864 

 

Author Response

Response to Reviewer 2 Comments

 

Point 1:  In the Abstract, it is better to put the abbreviation “EL” beside the first appear of its term “Electro-Laryngeal”. 

Response 1:  Thanks for your suggestions. We have put the abbreviation “EL” beside where “Electro-Laryngeal” first appeared in the Abstract.

Point 2:  In the Abstract, the authors mentioned the general problem in this field. It is preferable to mention the specific problem statement and how it was solved by the proposed work very briefly.

Response 2:  Thanks for your suggestions. Our specific problem is “If the EL speech to be enhanced is with complicated tone variation rules in Mandarin, the enhancement will be less effective. This is because the source speech (Mandarin EL speech) and the target speech (normal speech) are not strictly parallel.” The third and fourth sentences (highlighted line 10~line 13) of Abstract show the details. The fifth sentence to the eighth sentence describes how the proposed method solves the above problems (highlighted line 13~line 19).

Point 3: The contribution of the proposed work is not clear since it is mentioned that the main contributions are “The 2D-Con-1D-Tran-2D-Con neural networks are applied, and the 2D-Conformer is applied”. Can the authors please identify what is new that has been added to the body of knowledge?

Response 3:  Thanks for your comments. The 2D-Con-1D-Tran-2D-Con neural network is designed as the Generator of CycleGAN in this paper to further improve the intelligibility and naturalness of Mandarin EL speech. This is one of our contributions. The 2D-Conformer is designed as the Discriminator of CycleGAN in this paper. This is another of our contributions. Both points are new additions to the body of knowledge. The Generator and Discriminator designed for CycleGAN-based VC perform better on Mandarin EL speech, because the Conformer and Transformer applied in our proposed framework can effectively take advantage of the semantic information of speech.

Point 4: There are some typos relating to the table number. In the text, some of them are written in Latin numbers. For example, “The WER in Table VI is”, “The results are shown in Table V”, etc. Please correct this issue. 

Response 4:  Thanks for your suggestions. All Latin numbers in the manuscript have been corrected.

Point 5: It is recommended to add more recent papers (2021 and 2022) that are related to the current work.

Response 5:  Thanks for your suggestions. We have added some recent articles, such as the references [18] and [19] in the revised manuscript. Reference [18] will be published in 2023 (accepted in 2022). Reference [19] was published in 2021. In addition, the references [20] and [22] are also the recently published works and were included in our manuscript.

Point 6: Basically, all the figures are not clear. Please, enhance the quality of the images and use another colormap for more clarity.  Moreover, in general, the captions of the figures are long and need to be more concise. The details that are written in the captions can be moved to the text of the manuscript.  Besides, the caption of Table 4 is written in another page and this is unacceptable.

Response 6:  Thanks for your suggestions. We have revised Figures 2, 5 and 6 by using different colors to mark different modules. However, Figures 1, 3 and 4 cannot be colored, because the speech analyses were obtained with the Praat software, which does not provide a function for adding color to images. In addition, the captions of all figures have been revised to be more concise, and the caption of Table 4 has been adjusted.

Point 7: The authors mentioned the importance of speech intelligibility in Electro-Laryngeal speech that is the core of this work. There are many measurements that can assess intelligibility (how accurately people understand speech) such as CSII, CSTI, and others. Can the authors evaluate the performance of the proposed work using such measures and compare with other existing works.

Response 7:  Thanks for your comments. The measurements you suggest, such as CSII and CSTI, are not commonly used to evaluate EL speech. EL speech is different from normal speech: it is a kind of re-produced speech. In this paper, we focus on how to further improve the intelligibility and naturalness of Mandarin EL speech. Tone errors are the core problem. To overcome this problem, we propose to use parallel-data-free VC to enhance Mandarin EL speech. The parallel-data-free VC can effectively improve the intelligibility and naturalness of Mandarin EL speech.

The objective and subjective measurements are the most commonly used measurements to evaluate the enhancement of Mandarin EL speech. Therefore, we adopted these evaluation methods. The objective evaluation includes U/V analysis, F0 CC and log F0 RMSE for the F0 pattern, and MCD, CodeAP RMSE and spectrogram analysis for spectrum augmentation. The subjective evaluation includes tone accuracy, WER, and MOS of intelligibility and naturalness. The measurements used in this paper were also widely used in the studies [16], [17], [18], [19], [20], [22], [23], [24] and [25].
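Three of the objective measures named above have widely used textbook definitions, sketched below in plain Python. The sketch assumes that the reference and converted feature sequences are already time-aligned and of equal length (alignment such as DTW is omitted), that unvoiced frames are marked with F0 = 0, and that the 0th mel-cepstral coefficient is excluded from MCD. The function names are hypothetical, and the authors' exact settings may differ.

```python
import math


def _voiced_pairs(f0_ref, f0_conv):
    """Keep only frames that are voiced (F0 > 0) in both contours."""
    return [(r, c) for r, c in zip(f0_ref, f0_conv) if r > 0 and c > 0]


def f0_correlation(f0_ref, f0_conv):
    """Pearson correlation coefficient (F0 CC) over jointly voiced frames."""
    pairs = _voiced_pairs(f0_ref, f0_conv)
    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    return cov / math.sqrt(var_r * var_c)


def log_f0_rmse(f0_ref, f0_conv):
    """Root-mean-square error of log F0 over jointly voiced frames."""
    pairs = _voiced_pairs(f0_ref, f0_conv)
    sq = sum((math.log(r) - math.log(c)) ** 2 for r, c in pairs)
    return math.sqrt(sq / len(pairs))


def mel_cepstral_distortion(mc_ref, mc_conv):
    """Mean MCD in dB over frames, skipping coefficient 0 (energy)."""
    k = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(mc_ref, mc_conv):
        sq = sum((a - b) ** 2 for a, b in zip(ref[1:], conv[1:]))
        total += k * math.sqrt(2.0 * sq)
    return total / len(mc_ref)


# Toy contours: the converted F0 is exactly twice the reference on voiced frames.
ref_f0 = [100.0, 0.0, 120.0, 140.0]
conv_f0 = [200.0, 50.0, 240.0, 280.0]
print(f0_correlation(ref_f0, conv_f0), log_f0_rmse(ref_f0, conv_f0))
```

In the toy example the correlation is perfect while the log F0 RMSE is not zero, which shows why the two measures are usually reported together: one captures contour shape, the other absolute pitch deviation.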

In addition, our proposed method is compared with CLDNN-based VC [20] and CycleGAN-VC3 [24] (baseline methods). The CLDNN-based VC [20] has been verified to be effective in enhancing the Mandarin EL speech. Besides, the CycleGAN-VC3 [24] is the SOTA parallel-data-free VC. We compared the results of proposed method and the baseline methods. The results show that our proposed method can effectively improve the intelligibility and naturalness of the Mandarin EL speech.

Point 8: The writing of the entire manuscript should be improved and more organization are required. 

Response 8:  Thanks for your suggestions. We have checked the whole manuscript carefully and improved its organization. For example, we added conclusions to the Abstract (line 26~line 28). An outline of the rest of the paper has been added in the last paragraph of the Introduction Section (line 105~line 108). More details of the methods have been added in Section 2.1.1 (line 149~line 171) and Section 2.1.2 (line 177~line 180). More descriptions of the setup conditions for the experiments have been added in Section 2.3.1 (line 244~line 252). The above modifications have been highlighted.

Point 9: Section 2.3.4 is at the bottom of the page. Please check and correct such issues. Moreover, there is white space on Page 10 that should be handled. 

Response 9:  Thanks for your suggestion. We have checked and corrected such issues.

Point 10: All equations and their terms should be written in mathematical form, such as p(X) in equation 5.

Response 10:  Thanks for your suggestions. We have checked all equations and corrected them to mathematical form.

Point 11: The approaches used in Table 1, need to be cited with their reliable source. Besides, it is better to use other terms instead of “Ours-FT” and “Ours-VT”. Same notes should be repeated for other Tables. 

Response 11:  Thanks for your suggestions. Because we added a new Table 1 in response to another reviewer’s comments, the original Table 1 is now Table 2. The U/V analysis results in Table 2 are calculated from the testing data. The source of the U/V analysis has been added in the second sentence of the first paragraph under Table 2 (highlighted line 315~line 316). The U/V analysis method follows [20] and is used to evaluate the accuracy of unvoiced and voiced segment prediction for the enhanced speech. Following your suggestion, Ours-FT and Ours-VT have been renamed 2C1T2C-FT and 2C1T2C-VT in the results.
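The U/V analysis described above can be pictured as a frame-level confusion count. The sketch below is an assumption-laden illustration, not the procedure of [20]: the function name is hypothetical, a frame is treated as voiced when its F0 is greater than zero, and the reference and enhanced contours are assumed to be time-aligned.

```python
def uv_analysis(f0_ref, f0_pred):
    """Count voiced/unvoiced agreement per frame and return (counts, accuracy).

    Assumption: a frame is voiced (V) if F0 > 0 and unvoiced (U) otherwise.
    Keys read as "reference -> prediction".
    """
    counts = {"V->V": 0, "V->U": 0, "U->V": 0, "U->U": 0}
    for ref, pred in zip(f0_ref, f0_pred):
        key = ("V" if ref > 0 else "U") + "->" + ("V" if pred > 0 else "U")
        counts[key] += 1
    total = sum(counts.values())
    accuracy = (counts["V->V"] + counts["U->U"]) / total
    return counts, accuracy
```

Reporting the two error types (V->U and U->V) separately, rather than accuracy alone, shows whether an enhancement method tends to drop voicing or to add it where the reference is unvoiced.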

Point 12: Some speech enhancement algorithms need to be included such as: a) doi:  10.1088/1757-899X/1090/1/012102, b) doi: 10.1109/TASLP.2018.2842159 , and c) doi: 10.1109/ACCESS.2019.2929864 

Response 12:  Thanks for your comments. The research objective of our manuscript is very different from that of the papers you provide. Mandarin EL speech enhancement is a different problem from traditional speech enhancement, because Mandarin EL speech is a kind of reproduced speech. The studies a) and c) aim to cancel background noise to improve speech quality, and the review b) introduces research on speech separation. However, our Mandarin EL speech was recorded in a silent room with no background noise. Although the intelligibility of EL speech is influenced by its radiation noise, that problem has been solved well: some speech signal enhancement methods (including VC) can effectively cancel the radiation noise, so it is not our focus here. In this paper, we focus on how to improve the tone accuracy of Mandarin EL speech. Currently, the limitations of VC in enhancing Mandarin EL speech are caused by its complicated tone variation rules. The Mandarin EL speech (source speech) and the normal speech (target speech) are not strictly parallel, which makes VC very challenging. Therefore, we included the typical VC methods as the speech enhancement algorithms.

Author Response File: Author Response.docx

Reviewer 3 Report

1. Provide a stronger explanation on the enhancement compared to previous models or approaches

2. Describe more background or reasoning on why cycle consistent GAN is required for the proposed method, compared to GAN or other improvements from GAN

3. Explain the use of wavenet, why mfcc couldn't be used for features.

4. Describe more about how to handle noisy data and more on intonation, since intonation provide a clearer meaning of the speech

5. Elaborate on the dataset collected in this research with the details of the source and setting of the collection

 

Author Response

Response to Reviewer 3 Comments

 

Point 1:  Provide a stronger explanation on the enhancement compared to previous models or approaches

Response 1:  Thanks for your suggestions. Previous parallel-dependent methods and models are weak in dealing with Mandarin EL speech, because its complicated tone variation rules lead to a large number of tone errors. As a result of these tone errors, the source speech and the target speech are not strictly parallel, so the enhancement of Mandarin EL speech achievable with parallel-dependent Voice Conversion (VC) is limited. To address this problem, we propose using parallel-data-free VC. CycleGAN-VC3 is the current SOTA parallel-data-free VC; however, it still has limitations in enhancing Mandarin EL speech, because it cannot capture the long-range dependencies of speech. Considering this point, we designed the 2D-Con-1D-Tran-2D-Con as the generator and the 2D-Conformer as the discriminator of CycleGAN. The Transformer and Conformer are used in the novel CycleGAN because they handle the contextual content of sequential data well. The proposed method is used to further improve the intelligibility and naturalness of Mandarin EL speech, and in particular its tone accuracy. This part is shown in the highlighted sentences of the Introduction Section (line 75 ~ line 78).

Point 2:  Describe more background or reasoning on why cycle consistent GAN is required for the proposed method, compared to GAN or other improvements from GAN

Response 2:  Thanks for your comments. The problem we want to solve in this paper is that the source speech and the target speech are not strictly parallel. To the best of our knowledge, only CycleGAN-based and StarGAN-based VC systems are fully parallel-data-free models. However, StarGAN-based VC is commonly used for multi-speaker voice conversion tasks and cannot be used to solve the problems in this paper. Therefore, we used CycleGAN as the framework to enhance Mandarin EL speech.

Point 3:  Explain the use of wavenet, why mfcc couldn't be used for features.

Response 3:  Thanks for your comments. WaveNet is a kind of neural vocoder, which synthesizes a waveform signal directly from acoustic features. In this paper, WaveNet is used to re-synthesize the enhanced Mel-Spectrogram parameters into enhanced speech. The details can be found in the fourth paragraph of the Introduction Section (line 88~line 92). MFCC is usually used for Automatic Speech Recognition (ASR). If MFCC is used for the VC task, especially for the enhancement of EL speech, the enhancement effect will be poor, because MFCC loses many of the speaker’s acoustic details. Therefore, MFCC cannot be used in a VC task that requires high-quality re-synthesized speech.

Point 4:  Describe more about how to handle noisy data and more on intonation, since intonation provide a clearer meaning of the speech

Response 4:  Thanks for your comments. The data collection was completed in a silent room without background noise. Radiation noise affects the intelligibility of EL speech; however, VC can effectively cancel it. This conclusion can be drawn by comparing the spectrogram analysis results of enhanced and unenhanced Mandarin EL speech in Figure 4. We do not study radiation noise, because it is not the focus of this paper. Please note that intonation is different from the tone of a tonal language. Mandarin speech has four tones (Tone1, Tone2, Tone3 and Tone4), and its tone variation rules are very complicated. For example, when the former syllable is a numeral and the latter syllable is a measure word (“yi2 ge4 ping2 guo3”, not “yi1 ge4 ping2 guo3”, meaning “one apple”), Tone1 should be changed to Tone2. VC can improve the tone accuracy of Mandarin EL speech, as shown in [20]. The improvement of tone accuracy is an important focus of our work. Therefore, we use a whole section to describe how we improved the tone accuracy. The details can be found in Section 3.2.1.
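The “yi1” to “yi2” example above can be sketched in code, together with a simple tone accuracy score. This is a deliberately simplified stand-in: the rule below keys on the following syllable carrying Tone4 (as “ge4” does) rather than on part of speech, the function names are hypothetical, and syllables are assumed to be written as pinyin with a trailing tone digit. It does not reproduce how tone accuracy was scored by listeners in the manuscript.

```python
def apply_yi_sandhi(syllables):
    """Change "yi1" to "yi2" when the next syllable carries Tone4.

    Simplified illustration of one Mandarin tone variation rule only.
    """
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i] == "yi1" and out[i + 1].endswith("4"):
            out[i] = "yi2"
    return out

def tone_accuracy(reference, produced):
    """Fraction of syllables whose tone digit matches the reference.

    Assumption: both sequences have the same length and each syllable
    ends with its tone digit.
    """
    correct = sum(r[-1] == p[-1] for r, p in zip(reference, produced))
    return correct / len(reference)
```

For instance, `apply_yi_sandhi(["yi1", "ge4", "ping2", "guo3"])` gives the form quoted in the response, and scoring the unmodified sequence against it yields a tone accuracy of 0.75 because one of four tones differs.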

Point 5:  Elaborate on the dataset collected in this research with the details of the source and setting of the collection

Response 5:  Thanks for your suggestion. The data collection of our research was completed in a silent room. The data were recorded by a speaker using a hand-held Electro-Larynx. The Chinese text materials are from the THCHS-30 database. During recording, we used a Roland professional sound card to collect the speech data, and the recording software was Adobe Audition. The sample rate of the recorded audio files is 44100 Hz; we reduced it to 16000 Hz for acoustic feature extraction, training and testing. The details can be found in Section 2.3.3 (highlighted line 261~line 269 and line 274~line 278).
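The sample-rate reduction from 44100 Hz to 16000 Hz is a rational-ratio conversion. The sketch below only derives the integer up/down factors and the expected output length; the function names are hypothetical, and the response does not say which resampling tool the authors used. In practice such factors would be handed to a polyphase resampler with anti-aliasing filtering.

```python
import math

def resample_factors(src_rate, dst_rate):
    """Return the smallest integer (up, down) pair with dst_rate/src_rate == up/down."""
    g = math.gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g

def resampled_length(n_samples, src_rate, dst_rate):
    """Number of output samples, rounded up, computed with integer arithmetic."""
    up, down = resample_factors(src_rate, dst_rate)
    return -(-n_samples * up // down)  # ceiling division
```

For the rates quoted above, the factors are 160 (up) and 441 (down), so one second of recorded audio (44100 samples) maps to 16000 samples.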

Author Response File: Author Response.docx

Reviewer 4 Report

The paper addresses Mandarin electro-laryngeal speech enhancement using cycle-consistent generative adversarial networks.

 1- The abstract should be modified. There is not enough information about methodology, proposed work, conclusion in this part. I suggest you structure your abstract as presented in https://www.principiae.be/pdfs/UGent-X-003-slideshow.pdf

 2- The introduction was started with little information about Electro-larynx. The introduction section is short, and it should be longer, but with focus on the details. The introduction should be extended to new published papers for recent years. In the introduction should be expressed the better state-of-art of new methods. The new references will also be examined in this part. I would like to see the articles for last and this year in this section.

 3- I cannot see the details of the proposed method in this article. The method was explained briefly, without enough explanation. For example, Section 2 does not contain new information. Most of it is well known and can be covered by references.

 4- The quality of all figures is very low. All of them should be modified deeply.

 5- Figure 2 has some details that have not been explained well in the text of the article. I recommend explaining each section of this figure well.

 6- Figures 3 and 4 are ambiguous and leave the reader confused about the content. I recommend that the authors present these figures better, with more explanation in the body text.

 7- Simulation conditions are not well discussed. The proposed approach was illustrated only on some specific simulations, which is not enough to draw a complete and accurate conclusion about the proposed approach.

 8- This method should be compared with more well-known methods to determine the superiority of the proposed method. The evaluations are not enough.

 9- Please, do not forget that the clarity and the good structure of an article are important factors in the review decision. Please read the paper carefully (again) and correct it in English.

Author Response

Response to Reviewer 4 Comments

 

Point 1: The abstract should be modified. There is not enough information about methodology, proposed work, conclusion in this part. I suggest you structure your abstract as presented in https://www.principiae.be/pdfs/UGent-X-003-slideshow.pdf

Response 1:  Thanks for your suggestions. We have revised the Abstract according to your suggested link. The background and purpose parts can be found from the first to the fourth sentences of Abstract. The methodology and our proposed work can be found from the fifth to the ninth sentences. The results can be found in the tenth and eleventh sentences. We supplemented the conclusions in the Abstract of revised manuscript, and the details can be found in the highlighted twelfth and thirteenth sentences.

Point 2: The introduction was started with little information about Electro-larynx. The introduction section is short, and it should be longer, but with focus on the details. The introduction should be extended to new published papers for recent years. In the introduction should be expressed the better state-of-art of new methods. The new references will also be examined in this part. I would like to see the articles for last and this year in this section.

Response 2:  Thanks for your suggestions. Although there are many studies of the Electro-Larynx, they are only the background of this paper. There is a great difference between the Electro-Larynx and the enhancement of Electro-Laryngeal (EL) speech. We focus on how to enhance EL speech, and the core problem is how to improve its intelligibility and naturalness. Our Introduction (Section 1) is written step by step according to the following structure: The first paragraph introduces the background of our research (What is the Electro-Larynx used for? And what is EL speech?) and the problem of EL speech. The second paragraph introduces the typical studies on improving the intelligibility and naturalness of EL speech (and Mandarin EL speech). The third paragraph explains why parallel-dependent voice conversion (VC) is weak in dealing with Mandarin EL speech. The fourth paragraph explains why we use the neural vocoder (WaveNet) to synthesize the enhanced speech. The fifth paragraph summarizes our contributions. Following your suggestions, we supplemented more descriptions of the novelty of our research, for example in the first and the fifth sentences of the fifth paragraph. In addition, we have added an outline of the rest of the manuscript. The details can be found in the highlighted sixth paragraph of Section 1.

We have supplemented some of the latest references from this year and last year, such as [18] and [19]. To the best of our knowledge, the method in [20] is the SOTA method for enhancing Mandarin EL speech. The methods in [21] and [22] are SOTA in the field of ASR for Mandarin EL speech. Besides, CycleGAN-VC3 is the SOTA parallel-data-free VC. Considering this, we designed the 2D-Con-1D-Tran-2D-Con neural network as the generator of CycleGAN and the 2D-Conformer neural network as the discriminator of CycleGAN. The newly proposed CycleGAN-based VC framework is applied to enhance Mandarin EL speech. We compared the results of the proposed method with those of CycleGAN-VC3. The details can be found in the highlighted parts of the third paragraph of the Introduction Section (line 81~line 87).

Point 3: I cannot see the details of the proposed method in this article. The method was explained briefly, without enough explanation. For example, Section 2 does not contain new information. Most of it is well known and can be covered by references.

Response 3:  Thanks for your suggestions. Following your suggestions, we supplemented more details of the proposed methods. The details can be found in the highlighted parts of Section 2.1.1 and Section 2.1.2, respectively. In addition, the hyper-parameter settings are supplemented in Table 1 of Section 2.3.1.

Point 4: The quality of all figures is very low. All of them should be modified deeply.

Response 4: Thanks for your suggestions. We have revised all of the figures in the revised manuscript.

Point 5: Figure 2 has some details that have not been explained well in the text of the article. I recommend explaining each section of this figure well.

Response 5:  Thanks for your suggestions. We have supplemented more details for Figure 2. The details of every part in Figure 2 can be found in the highlighted paragraph under Figure 2 and the highlighted paragraphs of Section 2.1.1 and Section 2.1.2.

Point 6: Figures 3 and 4 are ambiguous and leave the reader confused about the content. I recommend that the authors present these figures better, with more explanation in the body text.

Response 6: Thanks for your suggestions. We have revised Figure 3 and Figure 4. More explanations also have been supplemented in the revised manuscript. The detailed descriptions of Figure 3 can be found in the highlighted parts from line 188 to line 194. The detailed descriptions of Figure 4 can be found in the highlighted parts from line 345 to line 351 and from line 367 to line 370.

Point 7: Simulation conditions are not well discussed. The proposed approach was illustrated only on some specific simulations, which is not enough to draw a complete and accurate conclusion about the proposed approach.

 

Response 7:  Thanks for your comments. The experiment in our research is not a simulation. The results are analyzed based on the enhancement of real Mandarin EL speech. The details of the experimental setup conditions can be found in Section 2.3.1 (we added Table 1 to show more details of our experimental setup conditions). Furthermore, we designed objective and subjective evaluations to analyze the performance of our proposed method. These evaluations are designed with reference to [16], [17], [18], [19] and [20], all of which used objective and subjective evaluations to analyze the enhancement of EL speech. To allow convenient comparison with the baseline methods, we use U/V analysis, F0 CC and log F0 RMSE as the objective indicators for the F0 pattern, and MCD and CodeAP RMSE as the objective indicators for spectrum augmentation. In addition, we use tone accuracy, WER, and MOS of intelligibility and naturalness as the subjective evaluation indicators. Based on the objective and subjective evaluations, we comprehensively analyzed the enhancement effect of our proposed method on Mandarin EL speech in comparison with the baseline methods. The results can be used to draw a complete and accurate conclusion about our proposed approach.
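Of the subjective indicators listed above, WER has a standard edit-distance definition that can be sketched briefly. The function name is hypothetical, and the tokenization is an assumption: the response does not say whether errors were counted per word or per character, and for Mandarin the same computation is often applied to characters.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference tokens.

    Both arguments are token sequences; the minimum edit distance is found
    with the usual Levenshtein dynamic programme, kept to two rows.
    """
    ref, hyp = list(reference), list(hypothesis)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

In a listening test, `reference` would be the prompted text and `hypothesis` the listener's transcription of the enhanced speech.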

Point 8: This method should be compared with more well-known methods to determine the superiority of the proposed method. The evaluations are not enough.

Response 8: Thanks for your comments. Our proposed method is compared with CLDNN-based VC and CycleGAN-VC3 (the baseline methods), both of which are well known. In particular, the CLDNN-based VC has been verified to be effective in enhancing Mandarin EL speech. However, it is a typical parallel-dependent VC, which is weak in dealing with Mandarin EL speech. The core problem in our research is that the source speech and the target speech are not strictly parallel, because Mandarin EL speech has very complicated tone variation rules, and this leads to a poor enhancement effect. A novel method is required to address this core problem. CycleGAN-VC3 is the SOTA parallel-data-free VC method and is a natural candidate for addressing it, so we proposed using CycleGAN-based VC to enhance Mandarin EL speech. However, CycleGAN-VC3 cannot capture the long-range dependencies of speech and is therefore still weak in dealing with Mandarin EL speech. Therefore, we designed the 2D-Con-1D-Tran-2D-Con as the generator and the 2D-Conformer as the discriminator of CycleGAN. This novel CycleGAN-based VC is used to enhance Mandarin EL speech.

According to the results of the objective and subjective evaluations, including “U/V analysis, F0 CC, log F0 RMSE, MCD, CodeAP RMSE, spectrogram analysis, tone accuracy, WER, MOS of intelligibility and naturalness”, we can conclude that our proposed method is superior to the baseline methods. The intelligibility and naturalness of Mandarin EL speech have both been improved significantly by our proposed method. Please note that the evaluation indicators shown in this paper are the most commonly used ones; see [16], [17], [18], [19], [20], [23], [24] and [25].

Point 9: Please, do not forget that the clarity and the good structure of an article are important factors in the review decision. Please read the paper carefully (again) and correct it in English.

Response 9:  Thanks for your suggestions. We have checked our manuscript carefully and presented our research in a widely accepted way. The structure of our manuscript includes seven Sections (Abstract, Introduction, Methods, Results, Discussion, Conclusion and References). In particular, the experimental setup conditions are presented in Methods, and the results are presented in a separate Section.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors addressed the reviewer's comments with attention and modified the paper with the suggestions provided. The new version of the paper has improved both in the presentation and in the contents.

Minor revision

Try to enrich the captions of the figures, the reader should be able to read the figure without the need to retrieve the information in the paper. Try to summarize the essential parts of the Figure and what you want to explain with it.

- 46)remove blank line after the text. I will not repeat this advice again, it also applies to the other occurrences.

158) Four equation only one number (1) check.  I will not repeat this advice again, it also applies to the other occurrences.

- 180) Left a blank line

-245) check the table 1 format.  I will not repeat this advice again, it also applies to the other occurrences.

-373) Figure 4(1) ? Rename Figure only Figure 4

- 376) Rename Figure only Figure 5

- 379) Rename Figure only Figure 6

-532) Add authors contributors

- 532)Add where data are available

Author Response

Response to Reviewer 1 Comments

 

Point 1:  Try to enrich the captions of the figures, the reader should be able to read the figure without the need to retrieve the information in the paper. Try to summarize the essential parts of the Figure and what you want to explain with it.

Response 1:  Thanks for your suggestions. Following your suggestion, we have modified the captions of all the figures to make them more informative. In addition, a summarized description of each figure is given in the paragraphs under it.

Point 2:  46)remove blank line after the text. I will not repeat this advice again, it also applies to the other occurrences.

Response 2:  Thanks for your suggestions. Following the format of the Applied Sciences journal template, we removed all blank lines in our manuscript and re-adjusted the spacing around the figures and tables.

Point 3:  158) Four equation only one number (1) check.  I will not repeat this advice again, it also applies to the other occurrences.

Response 3:  Thanks for your suggestions. We have split the original equation (1) into the equations (1)-(4), and the original equation (2) into the equations (5)-(7) in the revised manuscript.

Point 4:  180) Left a blank line

Response 4:  Thanks for your suggestions. Following the format of the Applied Sciences journal template, we removed all blank lines in our manuscript and re-adjusted the spacing around all equations, figures and tables.

Point 5:  245) check the table 1 format.  I will not repeat this advice again, it also applies to the other occurrences.

Response 5:  Thanks for your suggestions. We have modified the format of all tables following the template of the Applied Sciences journal.

Point 6:  373) Figure 4(1) ? Rename Figure only Figure 4

Response 6:  Thanks for your suggestion. Figure 4 (1) has been renamed to Figure 4.

Point 7:  376) Rename Figure only Figure 5

Response 7:  Thanks for your suggestion. Figure 4 (2) has been renamed to Figure 5.

Point 8:  379) Rename Figure only Figure 6

Response 8:  Thanks for your suggestion. Figure 4 (3) has been renamed to Figure 6.

Point 9:  532) Add authors contributors

Response 9:  Thanks for your suggestion. The “Author Contributions” have been added after Section 5 (highlighted in line 549~552).

Point 10:  532)Add where data are available

Response 10:  Thanks for your suggestion. The “Data Availability Statement” has been added after the “Funding” (highlighted in line 557).

Author Response File: Author Response.docx

Reviewer 2 Report

In the revised manuscript, the authors have performed a good job; however, the following comments need to be addressed:

1 – In the authors’ names, I believe the number referred to the affiliation should be in superscript.

2 – In the introduction section, the authors need to show the difference between Mandarin EL speech enhancement and the traditional speech enhancement. Also, include references for both types as discussed in the response letter.

3 – Figure 3 should be redesigned/replotted to make it more clear to the readers.

4 – after equation 5, replace “ * ” by “” .

5 – For Figures 4(1), 4(2), and 4(3), the pitch plots are unclear and need to be replotted.

6 – The computation time for the presented work needs to be discussed and compared with existing works.

7 – In the results (Figures and Tables), for example Table 2, include reference number for each Approach.

8 – The manuscript still has grammatical errors and needs to be checked and corrected carefully.

 

Author Response

Response to Reviewer 2 Comments

 

Point 1:  In the authors’ names, I believe the number referred to the affiliation should be in superscript.

Response 1:  Thanks for your suggestion. We have modified all the numbers referred to the affiliation to superscript.

 

 

Point 2:  In the introduction section, the authors need to show the difference between Mandarin EL speech enhancement and the traditional speech enhancement. Also, include references for both types as discussed in the response letter.

Response 2:  Thanks for your suggestions. We have supplemented the difference between Mandarin EL speech enhancement and the traditional speech enhancement. The details can be found in the second paragraph of Introduction (highlighted line 54~line 61). Also, we have added the references discussed in the previous response letter, shown as the references [16-18] in the revised manuscript (in highlighted line 56).

Point 3:  Figure 3 should be redesigned/replotted to make it more clear to the readers.

Response 3:  Thanks for your suggestion. We have supplemented more details of the training procedures of CycleGAN, including the forward-inverse mapping and the inverse-forward mapping, in Figure 3 of the revised manuscript. The top part of Figure 3 contains the acoustic features (plotted using Praat) of the source speech and the target speech, and shows the data flow mapping from the source to the target and from the target to the source. The bottom left part of Figure 3 shows how the corresponding discriminator judges the output generated by the forward generator of CycleGAN. The bottom right part of Figure 3 shows how the corresponding discriminator judges the output generated by the inverse generator of CycleGAN. The description has also been added in the first paragraph under Figure 3 (highlighted in line 198~line 204).
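The forward-inverse and inverse-forward mappings mentioned above correspond to the cycle-consistency term of a CycleGAN. The toy sketch below illustrates only that term under stated assumptions: the generators are plain Python callables standing in for neural networks, features are flat lists of numbers, the distance is a mean absolute (L1) error, and the adversarial and any other loss terms of the manuscript are omitted. All names are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x_src, y_tgt, g_forward, g_inverse):
    """Sum of the forward-inverse and inverse-forward reconstruction errors.

    g_forward maps source-domain features to the target domain;
    g_inverse maps target-domain features back to the source domain.
    """
    forward_inverse = l1_distance(g_inverse(g_forward(x_src)), x_src)
    inverse_forward = l1_distance(g_forward(g_inverse(y_tgt)), y_tgt)
    return forward_inverse + inverse_forward
```

When the two generators are exact inverses of each other the loss is zero, which is why this term lets the model be trained on source and target utterances that are not strictly parallel.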

Point 4:  after equation 5, replace “ * ” by “” .

Response 4:  Thanks for your suggestion. We have removed the symbol “*” after the original equation (5) (now equation (10), because we have split the original equation (1) into equations (1)-(4) and the original equation (2) into equations (5)-(7) following the suggestions of Reviewer 1).

Point 5:  For Figures 4(1), 4(2), and 4(3), the pitch plots are unclear and need to be replotted.

Response 5:  Thanks for your suggestion. We have replotted the “pitch plots” of Figure 4(1), Figure 4(2) and Figure 4(3) (Figure 4(1)-(3) have been renamed as Figure 4, Figure 5 and Figure 6 following the suggestions of Reviewer 1).

Point 6:  The computation time for the presented work needs to be discussed and compared with existing works.

Response 6:  Thanks for your suggestion. We have supplemented the discussion of computation time for the presented work and the existing work. The details can be found in the highlighted parts in the third paragraph of the Discussion Section (line 535~line 536).

Point 7:  In the results (Figures and Tables), for example Table 2, include reference number for each Approach.

Response 7:  Thanks for your suggestion. The first column of each table shows not the approach name but the type of speech, that is, speech enhanced by the different baseline methods, including CLDNN-based VC and CycleGAN-VC3 (EL-FT and EL-VT are the original Mandarin EL speech). The results shown in the tables and figures are not cited directly from the references; all of them are re-calculated from the testing data of our research.

To make the tables clearer, we introduced all types of enhanced speech in detail in Section 2.3.5. The reference numbers of the Mandarin EL speech and the baseline methods are given in the description of each type of speech (such as references [8],[23],[27]), shown in the highlighted part of Section 2.3.5 (line 306~line 322).

After revising the writing according to your reminder, the information in the tables is easier for readers to understand. We appreciate this suggestion.

We have added the reference number of each approach in Figure 7 and Figure 8.

Point 8:  The manuscript still has grammatical errors and needs to be checked and corrected carefully.

Response 8:  Thanks for your suggestion. We have carefully checked our manuscript and corrected the grammatical errors of our manuscript.

 

Author Response File: Author Response.docx

Reviewer 4 Report

The authors corrected the article based on most of the comments and observations. It can be accepted.

Author Response

Response to Reviewer 4 Comments

 

Point 1: The authors corrected the article based on most of the comments and observations. It can be accepted.

Response 1:  Thanks very much for your agreement.

Author Response File: Author Response.docx
