Article
Peer-Review Record

Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents

Appl. Sci. 2020, 10(5), 1727; https://doi.org/10.3390/app10051727
by Woon-Haeng Heo 1, Hyemi Kim 2 and Oh-Wook Kwon 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 December 2019 / Revised: 20 February 2020 / Accepted: 24 February 2020 / Published: 3 March 2020
(This article belongs to the Special Issue Sound and Music Computing -- Music and Interaction)

Round 1

Reviewer 1 Report

In this paper, the authors propose a Time-Frequency DenseNet with novel submodules for sound source separation.

The performance of the proposed method is extensively compared with that of other state-of-the-art neural networks used for sound source separation, such as U-Net and DenseNet.

Among the proposed techniques, the idea of the multi-band block is excellent, and the performance improvement is demonstrated objectively through experiments.

I believe that this paper may be published in its current form.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper presents a source separation architecture using a dilated time-frequency DenseNet for background music identification in broadcast content.

The paper is well organized and readable. I have some suggestions, which are described below, to be considered in order to improve the paper.

There are quite a few comparisons in the paper. However, one reference is missing: a paper in which the authors of this manuscript also participated ("Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel"). Please compare the proposed algorithm with the method in that reference; the additional experiments should confirm that the technique presented in this paper is indeed better.

I recommend that the paper be accepted after major revision.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

A source separation method based on a dilated Time-Frequency DenseNet is proposed and used for music identification. The main contribution is the source separation architecture, an improved version of MDenseNet. The paper is well written, and the topic is cutting-edge and interesting; nevertheless, from my point of view, some enhancements are needed to meet the requirements for publication. Please consider the following:

1. Have you tried blind source separation (BSS) methods [1-4], such as ICA for determined BSS and SCA for the underdetermined case? I believe they are also suitable for source separation. Please address this concern in the revised paper.

2. How is the number of sources determined in real applications? For instance, if there were many background speakers, would the proposed method still work?

3. Why are the datasets for source separation and for identification different?

4. How are the speech signal and the music signal mixed? Linearly, or in another way?

5. I suggest shortening the abstract to highlight the key findings and the way in which the manuscript improves the state of the art, or at least briefly mentioning the advantages of the proposed techniques.

6. Please double-check the paper for typos, e.g., Lines 45-46 ("background music is mostly mixed XXX"), Line 111 (remove "this"), and Line 151 ("the number of").
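For context on the reviewer's first question, the determined linear-mixture BSS setting that FastICA [1] addresses can be sketched in a few lines. Everything below (the toy sources, the mixing matrix, the tanh nonlinearity, the deflation scheme) is illustrative and not taken from the paper under review; it only shows the kind of separation the reviewer is asking about.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.linspace(0, 8, n)

# Two independent, non-Gaussian toy sources (stand-ins for speech and music)
s1 = 2 * (t % 1) - 1                # sawtooth wave
s2 = np.sign(np.sin(5 * t))         # square wave
S = np.vstack([s1, s2])             # sources, shape (2, n)

# Determined case: as many observations as sources, linear instantaneous mixing
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])          # illustrative mixing matrix
X = A @ S                           # observed mixtures, shape (2, n)

# Center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# FastICA fixed-point iteration with deflation, g(u) = tanh(u)
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wx = w @ Z
        g, g_prime = np.tanh(wx), 1 - np.tanh(wx) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        # Deflation: project out components already found
        w_new -= W[:i].T @ (W[:i] @ w_new)
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1) < 1e-9
        w = w_new
        if converged:
            break
    W[i] = w

# Recovered sources, up to permutation, scale, and sign
S_hat = W @ Z
```

Each row of `S_hat` should be strongly correlated (up to sign) with one of the original sources. This only works because the mixture is linear and instantaneous and the sources are non-Gaussian; which is exactly why the reviewer's question about how the broadcast signals are mixed matters.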


Reference:

[1] Oja, Erkki, and Zhijian Yuan. "The FastICA algorithm revisited: Convergence analysis." IEEE Transactions on Neural Networks 17.6 (2006): 1370-1381.

[2] Georgiev, Pando, Fabian Theis, and Andrzej Cichocki. "Sparse component analysis and blind source separation of underdetermined mixtures." IEEE Transactions on Neural Networks 16.4 (2005): 992-996.

[3] Zou, Liang, et al. "Underdetermined joint blind source separation of multiple datasets." IEEE Access 5 (2017): 7474-7487.

[4] De Lathauwer, Lieven, and Joséphine Castaing. "Blind identification of underdetermined mixtures by simultaneous matrix diagonalization." IEEE Transactions on Signal Processing 56.3 (2008): 1096-1105.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

This paper presents a source separation architecture using a dilated time-frequency DenseNet for background music identification in broadcast content.

The paper is well organized and readable. I have one suggestion, which is described below, to be considered in order to improve the paper.

The authors explained in the cover letter why they did not do additional experiments. However, I suggest that the authors mention their previous paper (Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel) and briefly outline the difference between both types of research.

I recommend that the paper should be accepted.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript has been significantly improved. It can be accepted for publication now.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
