Article
Peer-Review Record

Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition

Appl. Sci. 2024, 14(13), 5761; https://doi.org/10.3390/app14135761
by Love Jhoye Moreno Raboy * and Attaphongse Taparugssanagorn
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 12 April 2024 / Revised: 26 June 2024 / Accepted: 28 June 2024 / Published: 1 July 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

- The main weakness of the work is that it does not justify the claim in Section 1.5 that it sets a new standard in the field. I agree that it contributes to the field, but it does not establish a new standard. To support that claim, the literature review needs to examine related work in the same field in greater depth. The authors should clarify what a standard means in this context and which standards currently exist, and give solid reasons for regarding the work as a contribution toward a new one.

- Table 1 is not clear; it should be explained better.

- It is important to make clear whether the dataset is the authors' own or taken from an existing source. The contributions mention the dataset, but the body of the paper suggests that it was taken from Spotify.

- The general model integrates all parts of the song structure (verses and choruses) through concatenation. The authors should clarify why the experiments analyze the song as a whole rather than by its structural parts.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The study delves into music emotion recognition within artificial intelligence frameworks. While the methodology is well-articulated, the primary objective of the research remains unclear to this reviewer.

The authors mentioned considering spectrum bandwidth in their computations. Could you elaborate on how this frequency spectrum relates to emotion recognition, if at all?

Regarding references, please include one for Russell's Emotion Plane at L209.

At L213, could you provide scientific rationale for excluding gym and yoga songs?

Furthermore, it would be helpful to clarify whether all songs used in the study were in English at L225.

I suggest a minor revision.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript entitled ‘Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition’ presents work on emotion recognition from rhythm and lyrics using a structured framework. The authors have used Spotify playlists to construct a stacked ensemble model to improve emotion prediction.

Specific comments for improvement:

1. The narrative may give readers the impression that the authors did not write the manuscript themselves. The description does not specifically address the dataset and instead reads like a review article. The manuscript needs to describe the methodology precisely.

2. The abstract does not reflect what is described in the manuscript.

3. The authors have provided a detailed introduction describing the referenced articles, which may be helpful to readers.

4. The methodology sections need more clarity, as the description sounds hypothetical rather than like an experimental or mathematical derivation. The description of Table 1 needs to be clear so that readers can understand how these numbers were derived and which song the authors are referring to. Otherwise, it reads as a generalized statement.

5. The maximum likelihood estimator and the function described for the prediction vectors and output classification are commonly adopted in many existing algorithms. How are the methods described in this paper unique?

6. The results section needs to be improved. The comparative bar charts (Figs. 4 to 8) do not provide an accurate scale; adding a scale on the y-axis would help.

Comments on the Quality of English Language

The description needs to correspond to the data being presented and should not be superfluous.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

- This study uses a stacked ensemble model for music emotion recognition.

- The detailed approach, which attempts to classify each song into verse1, chorus, and verse2 segments through structural analysis, is impressive.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The work has improved considerably in its introduction and in the inclusion of details that allow a better understanding of its purpose and development methodology.

However:

1. The concatenation process should be clarified in more detail, perhaps explaining with a brief example how to concatenate the audio of a song with a lyric. What type of information does this vector have? What is its structure? Explain better why it makes sense to do this concatenation. 

2. The data flow between the concatenated vector and the meta-learner is not clear. This process should be described in enough detail to be understood.

3. It would be necessary to compare the classification results of this proposed standard versus a system that only classifies by text or audio. 

4. I believe the methodology should include a more practical explanation, through a brief example, that clarifies the concatenation process and justifies the strengths of this process from a musical perspective.
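As an illustration of the kind of brief example requested in points 1, 2, and 4, the following is a minimal sketch only, not the authors' implementation. It assumes that each song segment yields a numeric audio feature vector and a numeric lyric feature vector, and that each segment's base model outputs class probabilities over four emotion classes. All function names, feature names, and values are hypothetical.

```python
import numpy as np

def concat_features(audio_vec, lyric_vec):
    """Join one segment's audio and lyric features into a single vector."""
    return np.concatenate([np.asarray(audio_vec, dtype=float),
                           np.asarray(lyric_vec, dtype=float)])

def meta_input(segment_probs):
    """Join per-segment class probabilities into one meta-learner input row."""
    return np.concatenate([np.asarray(p, dtype=float) for p in segment_probs])

# Hypothetical toy values for one segment (assumed features, not from the paper).
audio = [0.61, 0.12, 0.33]       # e.g. scaled tempo, spectral bandwidth, energy
lyrics = [0.0, 0.8, 0.1, 0.4]    # e.g. a 4-dimensional text embedding
fused = concat_features(audio, lyrics)
print(fused.shape)               # 3 audio values followed by 4 lyric values

# Hypothetical base-model outputs: one probability vector per segment,
# over four assumed emotion classes.
probs = {
    "verse1": [0.10, 0.60, 0.20, 0.10],
    "chorus": [0.05, 0.70, 0.15, 0.10],
    "verse2": [0.20, 0.50, 0.20, 0.10],
}
row = meta_input(probs.values())
print(row.shape)                 # 3 segments x 4 classes in one row
```

In this sketch the fused vector keeps the audio values first and the lyric values second, so each position has a fixed meaning, and the meta-learner would receive one row per song built from the three segment-level predictions.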

The topic of the work is extremely interesting, but I consider that it requires rewriting to allow a better understanding of the contribution, including a more extensive review of the state of the art that supports the importance of classifying with concatenated sound and lyrics. For this last part, a theoretical background that supports the contribution from a musical point of view is also important.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
