Article
Peer-Review Record

Multi-Modal Emotion Recognition for Online Education Using Emoji Prompts

Appl. Sci. 2024, 14(12), 5146; https://doi.org/10.3390/app14125146
by Xingguo Qin 1, Ya Zhou 1 and Jun Li 1,2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 10 May 2024 / Revised: 10 June 2024 / Accepted: 11 June 2024 / Published: 13 June 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The proposal, "Multi-modal emotion recognition fusing emoji prompt for online learning", introduces a method for analyzing the emotional content of online learning reviews using a combination of text and emoji data. The study leverages a pre-training emotion prompt learning method to enhance sentiment polarity detection, aiming to provide a more accurate reflection of students' emotional expressions. 

 

 

The proposal seems exciting, but some basic points should be clarified and improved:

 

 

Substance:

 

1. Emoji Data Integration:

- The current method for integrating emoji data is described as simplistic. It primarily relies on the straightforward matching of emoji expressions to emotional polarity, which can lead to significant errors. This approach should fully leverage the nuanced emotional content that emojis can convey.

- Develop a more sophisticated model to better understand and interpret the context in which emojis are used. This could involve using deep learning techniques explicitly designed for emoji interpretation, such as emoji embeddings that capture the semantic meaning of emojis in various contexts.

- Emojis can have different meanings depending on the context of the surrounding text. The current model does not account for this contextual variability. For example: "We selected two groups of emoji data with strong emotional polarities, as shown in Figure 3." What do the authors mean by "strong emotional polarities"? In the negative polarity, the last emoticon does not represent something obvious. The same goes for the second one in the same line; it can be used sarcastically. How do they deal with these scenarios?

- Different emojis can express varying intensities of emotions, which the current method may not adequately capture.

 

2. Model Robustness:

- The study relies on text and emoji data from a few specific online learning platforms. This may limit the generalizability of the findings.

- The simplistic emoji matching can introduce errors that propagate through the model, affecting the final sentiment analysis.

- In Section 4.1, please better contextualize the compared methods: their differences, main characteristics, and the reasons for comparison. In the current format, the essential information is hard to recover.

- Use error-correction techniques and post-processing steps to mitigate the impact of initial matching errors. For example, a secondary validation step could be added where misclassified sentiments are reviewed and corrected using a more sophisticated sub-model or heuristic rules.
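For illustration, such a secondary validation step could be as simple as the following heuristic sketch (the rule set, threshold, and function name are hypothetical, not taken from the manuscript):

```python
# Hypothetical post-processing heuristic illustrating the suggested
# secondary validation step; rules and threshold are illustrative only.
NEGATION_CUES = ("not", "never", "hardly", "no longer")

def post_validate(text: str, label: str, confidence: float) -> str:
    """Re-check low-confidence positive predictions against simple
    negation cues and flip them to negative when a cue is present."""
    if label == "positive" and confidence < 0.6:
        if any(cue in text.lower() for cue in NEGATION_CUES):
            return "negative"
    return label
```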

 

3. Minor observations:

- Please improve the image quality of Fig. 2. Generally, the images are of poor quality. I suggest reassembling or improving the resolution of the images. 

- The authors declare a repository: https://github.com/mrlijun2017 - However, they do not have information regarding the present research. Please clarify.

Comments on the Quality of English Language

The article presents minor grammatical errors and awkward phrasing that should be corrected to improve the overall clarity and professionalism of the manuscript.

Author Response

Dear Reviewer,

 

Thank you very much for your comments and suggestions on our manuscript. The following are our responses to the questions. All changes are marked in red in the revised manuscript.

 

1. Emoji Data Integration:

Comment:

- The current method for integrating emoji data is described as simplistic. It primarily relies on the straightforward matching of emoji expressions to emotional polarity, which can lead to significant errors. This approach should fully leverage the nuanced emotional content that emojis can convey.

Response:

In actual experiments, each expression corresponds to an emotional label. During training, the embedding vector of each expression corresponds to the embedding vector of its label. This maintains consistency across modal spaces and integrates expression information into textual information. Although this method is overly simplistic and direct, it can enhance the accuracy of emotional classification to a certain extent.
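For illustration, a minimal sketch of this emoji-label alignment, assuming a PyTorch setup (the emoji-to-label mapping and all names are hypothetical, not the authors' code):

```python
# Sketch: align each emoji's embedding with the embedding of its emotion
# label so both modalities share one space; the mapping is assumed.
import torch
import torch.nn.functional as F

EMOJI_TO_LABEL = {"😊": "happy", "😢": "sad", "😡": "angry"}

def alignment_loss(emoji_emb: torch.Tensor, label_emb: torch.Tensor) -> torch.Tensor:
    """Pull emoji embeddings toward their emotion-label embeddings."""
    return (1.0 - F.cosine_similarity(emoji_emb, label_emb, dim=-1)).mean()

def inject_emoji_labels(text: str) -> str:
    """Fuse emoji information into the text by substituting textual labels."""
    for emoji, label in EMOJI_TO_LABEL.items():
        text = text.replace(emoji, f" [{label}] ")
    return text
```

During training, a loss like `alignment_loss` would be added to the classification objective so that emoji vectors stay consistent with their label vectors.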

 

Comment:

- Develop a more sophisticated model to better understand and interpret the context in which emojis are used. This could involve using deep learning techniques explicitly designed for emoji interpretation, such as emoji embeddings that capture the semantic meaning of emojis in various contexts.

Response:

In our experiments, we integrated the textual labels of emoji symbols to enhance feature expression. This method is indeed straightforward, but it achieves a certain level of effectiveness.

 

Comment:

- Emojis can have different meanings depending on the context of the surrounding text. The current model does not account for this contextual variability. For example: "We selected two groups of emoji data with strong emotional polarities, as shown in Figure 3." What do the authors mean by "strong emotional polarities"? In the negative polarity, the last emoticon does not represent something obvious. The same goes for the second one in the same line; it can be used sarcastically. How do they deal with these scenarios?

Response:

In practice, each emoji contains a label, and for ambiguous emojis, there are multiple candidate labels. When the model is uncertain about the true emotion of an emoji, it utilizes textual information to preliminarily determine the sentiment polarity and then selects the final emotional label based on the sentiment polarity.
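A minimal sketch of this disambiguation logic (the candidate tables and names are assumptions for illustration, not the authors' implementation):

```python
# Sketch: ambiguous emojis carry several candidate labels; the text's
# sentiment polarity selects the final one. All mappings are assumed.
SINGLE_LABEL = {"😊": "happy", "😢": "sad"}
CANDIDATE_LABELS = {"😂": {"positive": "joyful", "negative": "mocking"}}

def resolve_emoji_label(emoji: str, text_polarity: str) -> str:
    """Return the final emotion label, using the surrounding text's
    sentiment polarity to resolve ambiguous emojis."""
    if emoji in SINGLE_LABEL:          # unambiguous: one fixed label
        return SINGLE_LABEL[emoji]
    candidates = CANDIDATE_LABELS.get(emoji, {})
    return candidates.get(text_polarity, "neutral")
```

For example, `resolve_emoji_label("😂", "negative")` would yield "mocking", while the same emoji in a positive comment would yield "joyful".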

 

Comment:

- Different emojis can express varying intensities of emotions, which the current method may not adequately capture.

Response:

We achieve this by using the textual labels of the emojis.

 

2. Model Robustness:

 

Comment:

- The study relies on text and emoji data from a few specific online learning platforms. This may limit the generalizability of the findings.

Response:

Indeed, this is a valid concern. In the future, we will integrate more platform-specific emoji features to improve the generalizability of the model.

 

Comment:

- The simplistic emoji matching can introduce errors that propagate through the model, affecting the final sentiment analysis.

Response:

We will further optimize the model to keep the emotional polarity of emoji symbols within a certain threshold range.

 

Comment:

- In Section 4.1, please better contextualize the compared methods: their differences, main characteristics, and the reasons for comparison. In the current format, the essential information is hard to recover.

Response:

We have revised Section 4.1.

 

Comment:

- Use error-correction techniques and post-processing steps to mitigate the impact of initial matching errors. For example, a secondary validation step could be added where misclassified sentiments are reviewed and corrected using a more sophisticated sub-model or heuristic rules.

Response:

We have added a related experiment in Section 4 and report the results in Table 4.

 

3. Minor observations:

 

Comment:

- Please improve the image quality of Fig. 2. Generally, the images are of poor quality. I suggest reassembling or improving the resolution of the images.

Response:

We have modified Fig. 2.

 

Comment:

- The authors declare a repository: https://github.com/mrlijun2017 - However, they do not have information regarding the present research. Please clarify.

Response:

Our related research is currently undergoing further validation and will soon be published in the repository. Any researcher can request the relevant data from the corresponding author.

 

Comment:

The article presents minor grammatical errors and awkward phrasing that should be corrected to improve the overall clarity and professionalism of the manuscript.

Response:

We have revised the language of the manuscript.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents an emotion classification method for online learning, which has become an emerging education mode in recent years. The proposed method uses both text comments and emoji symbols for emotion prediction and achieves the best performance compared with several existing methods. Overall, it is good, although there are several issues to be addressed.

Issues:

1. All figures should be of high resolution.

2. The writing must be improved by a native English speaker. There are several confusing expressions and mistakes. For example, "online learning" should be "online education" to distinguish it from online learning in machine learning. The sentence in Line 167 is not complete.

3. Emotion classification has broad applications; for example, in movie dubbing, both captions and facial expressions are used to predict the emotions of generated speech. This should be discussed in the Introduction, with comparisons to V2C: Visual Voice Cloning; Learning to Dub Movies via Hierarchical Prosody Models; and StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing.

Comments on the Quality of English Language

As mentioned above, there are many non-professional and misleading expressions. The authors should find a professional native English speaker to proofread the whole paper.

Author Response

Dear Reviewer,

 

Thank you very much for your comments and suggestions on our manuscript. The following are our responses to the questions. All changes are marked in red in the revised manuscript.

 

Comment:

1. All figures should be of high resolution.

Response:

We have modified the figures.

 

Comment:

2. The writing must be improved by a native English speaker. There are several confusing expressions and mistakes. For example, "online learning" should be "online education" to distinguish it from online learning in machine learning. The sentence in Line 167 is not complete.

Response:

We have corrected these mistakes.

 

Comment:

3. Emotion classification has broad applications; for example, in movie dubbing, both captions and facial expressions are used to predict the emotions of generated speech. This should be discussed in the Introduction, with comparisons to V2C: Visual Voice Cloning; Learning to Dub Movies via Hierarchical Prosody Models; and StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing.

Response:

We have added the discussion in the Introduction.

 

Comment:

As mentioned above, there are many non-professional and misleading expressions. The authors should find a professional native English speaker to proofread the whole paper.

Response:

We have revised the language of the manuscript.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In the context of work on online learning, this research is relevant. The literature review, as well as the methodology used, are well explained. However, the proposed method (lines 132-228) requires a more detailed explanation in relation to the literature review presented, particularly the module illustrated in Section 3.2 ("Multi-modal Emotion Analyses Method"). This will help the reader to better understand the research process and its implications. Also, the limits of the quantitative analyses should be highlighted to enhance the scientific quality of the results put forward.

Author Response

Dear Reviewer,

 

Thank you very much for your comments and suggestions on our manuscript. The following are our responses to the questions. All changes are marked in red in the revised manuscript.

 

Comment:

In the context of work on online learning, this research is relevant. The literature review, as well as the methodology used, are well explained. However, the proposed method (lines 132-228) requires a more detailed explanation in relation to the literature review presented, particularly the module illustrated in Section 3.2 ("Multi-modal Emotion Analyses Method"). This will help the reader to better understand the research process and its implications. Also, the limits of the quantitative analyses should be highlighted to enhance the scientific quality of the results put forward.

Response:

We have modified these sections.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Although the authors have addressed some observations, they have not yet managed to respond to others:

I recommend explicitly adding clarifications of the following points to the text.

1.- About “straightforward matching of emoji expressions.” State this characteristic in your work, indicating its advantages and disadvantages.

 

2.- Add a section on future work stating the weaknesses and how they will be handled, especially about the educational platforms used and the polarity of emotions.

Comments on the Quality of English Language

N/A

Author Response

Dear Reviewer,

Thank you very much for your comments and suggestions on our manuscript. The following are our responses to the questions. All changes are marked in red in the "Conclusions and future work" section of the revised manuscript.

I recommend explicitly adding clarifications of the following points to the text.

Comment:

1.- About “straightforward matching of emoji expressions.” State this characteristic in your work, indicating its advantages and disadvantages.

Response:

We have added them in the section “Conclusions”.

 

Comment:

2.- Add a section on future work stating the weaknesses and how they will be handled, especially about the educational platforms used and the polarity of emotions.

Response:

We have added them in the section “Conclusions and future work”.

 

The contents modified and added are as follows:

Our method integrates multi-modal features, using emoji features to add a dimension of emotion polarity and thereby improve the accuracy of emotion analysis. However, our model relies solely on the label characteristics of the emojis and lacks their deeper emotional features: the direct matching of emoji expressions has low computational complexity but does not deeply fuse emotional features, so the model still needs to be enhanced in terms of migration ability and robustness. In future work, according to the characteristics of the education platform we use, we will further encourage students on the platform to participate, collect more comprehensive comment data, and use large language models for data augmentation, so as to improve the model's emotion analysis accuracy and cross-platform migration ability. We will also further optimize the emoji emotion module, particularly the modeling of prompted emoji data, extracting more refined emotional features and optimizing the fusion algorithm. At the same time, we will expand the dimensions of the multi-modal data, for example by incorporating multi-frame audio and video data, to enhance the model's robustness and applicability.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

My concerns have been addressed, and I recommend publishing it as it is.

Author Response

Dear Reviewer,

 

Thank you very much for your comments and suggestions of our manuscript.

 

Comment:

My concerns have been addressed, and I recommend publishing it as it is.

Response:

Thank you again for your work.

Author Response File: Author Response.pdf
