Open Access Article

Peer-Review Record

A Supervised Learning Method for Improving the Generalization of Speaker Verification Systems by Learning Metrics from a Mean Teacher

Appl. Sci. 2022, 12(1), 76; https://doi.org/10.3390/app12010076

by Ju-Ho Kim¹, Hye-Jin Shim¹, Jee-Weon Jung² and Ha-Jin Yu¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 26 November 2021 / Revised: 15 December 2021 / Accepted: 20 December 2021 / Published: 22 December 2021

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Please check the attached PDF.

Comments for author File: Comments.pdf

Author Response

"Please see the attachment."

Author Response File: Author Response.docx

Reviewer 2 Report

The results look encouraging and motivating. However, some content still needs to be revised to meet the requirements for publication. My concerns are listed as follows:

1. The abstract should be improved. It should highlight your own contribution more clearly.

2. In the introduction, the authors should clearly indicate the contributions and innovations of this paper.

3. In my view, the literature survey in the introduction is quite weak, unfocused, and insufficient. The literature review needs a flow that leads to the objectives. What is the essential problem addressed by this work? The authors should explain the drawbacks of the approaches in the related works rather than simply stating what those works have done. In addition, some other methods should be analyzed in detail, such as clustering algorithms (10.3390/app112311202; 10.3390/app112311294).

4. Check grammatical errors. Although the paper reads quite well, it is not free from grammatical errors, such as “By learning the reliable intermediate representations derived from the mean teacher network as well as conventionally used one-hot speaker labels,”...

5. The font in Figure 1 is too large. Please use a more suitable font size.

6. Please define all abbreviations the first time they appear in the abstract, the main text, and the first figure or table caption.

Author Response

"Please see the attachment."

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have done outstanding work in improving the manuscript. However, some minor errors remain:

Concept Errors
-Line 139 "the converter transforms the intermediate representation to the speaker embedding". According to your previous description, the encoder consists of RawNet2, i.e., the network described in Table 1. Thus, it returns a speaker embedding of dimension 512. Consequently, you are converting a speaker embedding into a "new" speaker embedding.
-Line 156 "

English:
-Line 2 "these task"
-Line 5 "novel supervised learning method based speaker verification system" (hyphens are missing).
-Line 139 "The enconder extracts the representations from the inut utterances, and we used ... as encoder" (Shorter sentences are simpler)

Author Response

"Please see the attachment."

Author Response File: Author Response.docx

Reviewer 2 Report

This paper can be accepted now.

Author Response

Thank you for your positive review of the revised manuscript.
