Article
Peer-Review Record

Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction

Appl. Sci. 2022, 12(17), 8599; https://doi.org/10.3390/app12178599
by Jing Li 1, Xin Guo 2, Fumei Yue 3,*, Fanfu Xue 3 and Jiande Sun 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 June 2022 / Revised: 18 August 2022 / Accepted: 20 August 2022 / Published: 27 August 2022
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)

Round 1

Reviewer 1 Report

Dear Authors,

My recommendations for corrections to the paper are as follows:

1. In one part of the text, line numbering stops after line 283. This is not critical, but it should be fixed.

2. Lines 60 and 61: Please define abbreviations at their first occurrence in the text (LBP, HOG, RGB) by writing out the full name. Some of them are defined later in the text (e.g., LBP), but the full name should be given where the abbreviation first appears.

3. Line 126: Write out the full name of the abbreviation (SUN).

4. Lines 301 and 302: You wrote "As can be seen from Figure 2, our data are not normally distributed…". Where can this be seen? It is not visible in Figure 2. Please explain in more detail. Perhaps you meant Figure 1.

5.       Please explain in the text what is shown in Figure 1.

6. Figure 2 caption: Shorten the caption of Figure 2 if possible, and give a longer explanation in the text.

7. Line 328: You wrote "The experimental process is the same as shown in Figure 4…". Figure 4 does not show a process. Please explain this further.

8. Figure 3 caption: Shorten the caption of Figure 3 if possible, and give a longer explanation in the text.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this work, the authors propose a new framework, the Adaptive Multi-modal Ensemble Network (AMEN), to predict video memorability scores. In my view, the paper is well organized and the proposed method is valuable for this research field. After reviewing the paper, I have the following questions and suggestions.

  1. Some figures need to be enhanced in terms of quality and resolution.
  2. Please review all significant related work, including good recent studies in this area that are closest to your paper.
  3. What are the advantages and disadvantages of this study compared to the existing studies in this area? This needs to be addressed explicitly and in a separate subsection.
  4. There are many grammatical mistakes and typos.
  5. Provide pseudocode in a standard format for the proposed algorithm.
  6. The comparison section is relatively weak. The proposed method should be compared with at least 3 other novel methods.
  7. The experimental results indicate that the method performs well, but a stronger theoretical analysis and justification of the algorithm would be more convincing. The objective of the research should be stated clearly in terms of the problems addressed and the expected results, showing how the proposed technique advances the state of the art by overcoming the limitations of existing work. The results obtained must also be interpreted.
  8. The running time of the proposed algorithm should be analyzed experimentally and compared with that of other algorithms.

Some final cosmetic comments:
* The results of your comparative study should be discussed in depth, with more insightful comments on the behaviour of your algorithm in various case studies. Discussing results should not mean reading out the tables and figures once again.
* Avoid lumping references together, as in [x, y]. Instead, summarize the main contribution of each referenced paper in a separate sentence. In scientific and research papers, it is not necessary to give several references that say exactly the same thing. That would be strange anyway, since it raises the question of what the innovative scientific contribution of each referenced paper is. For each claim, cite only one reference.
* Avoid using first person.
* Avoid using abbreviations and acronyms in title, abstract, headings and highlights.
* Please avoid having heading after heading with nothing in between, either merge your headings or provide a small paragraph in between.
* The first time you use an acronym in the text, please write the full name and the acronym in parenthesis. Do not use acronyms in the title, abstract, chapter headings and highlights.
* The results should be elaborated further to show how they could be used in real applications.

* Are all the images used in this work copyright-free? If not, have the authors obtained proper copyright permission to reuse them? Please clarify; this is simply to ensure that all the figures can be published in this work.

* Also, the list of references should be checked carefully to ensure consistency across all references and their compliance with the journal's referencing policy.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

In this study, the authors propose an Adaptive Multi-modal Ensemble Network for video memorability prediction. The model is applied to the public VideoMem dataset and achieves promising performance. However, the following major points should be addressed:

1. External validation data are needed to assess the model's performance on different datasets.

2. The authors should perform cross-validation during training.

3. Uncertainties of models should be reported.

4. When comparing the predictive performance of different methods or models, the authors should conduct statistical tests to determine whether the differences are significant.

5. More discussions should be added.

6. Why did the authors use different numbers of epochs for the short-term and long-term video prediction models?

7. Deep learning is common and has been used in previous studies, e.g., PMID: 34915158 and PMID: 34812044. The authors are therefore encouraged to cite more such works in this description to attract a broader readership.

8. Source code should be provided so that the study can be replicated.

9. Evidence should be provided for the statement "we determine using Random Forest (RF) algorithm and the full connection layer (MLP) method for comparison".

10. Quality of figures should be improved.

11. English language should be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The paper has been well revised, and compared with the previous version, the revised version has the qualities necessary for acceptance. In my opinion, the article is acceptable in its current form.

Reviewer 3 Report

My previous comments have been addressed.
