Review
Peer-Review Record

Survey on Videos Data Augmentation for Deep Learning Models

Future Internet 2022, 14(3), 93; https://doi.org/10.3390/fi14030093
by Nino Cauli * and Diego Reforgiato Recupero *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 February 2022 / Revised: 11 March 2022 / Accepted: 13 March 2022 / Published: 16 March 2022
(This article belongs to the Special Issue Big Data Analytics, Privacy and Visualization)

Round 1

Reviewer 1 Report

Dear authors,

       you have presented a survey on data augmentation methods for video tasks like video classification, video action recognition, and video object detection. Different application fields and novel methods have been summarized. Overall, the paper is well written. The reviewer has some suggestions and concerns:

  1. Please discuss some existing surveys on data augmentation methods and clarify the differences and relations between your work and those surveys. They may target other domains of data augmentation, but it would still be valuable to discuss them.
  2. For the surveyed data augmentation methods, it would be helpful to discuss and summarize their performance. Which kinds of data augmentation methods are most useful and important for improving results?
  3. The efficiency of data augmentation methods also matters: they are sometimes applied online during training, so they should be fast and should not add excessive training overhead (e.g., memory). Could you include a discussion of the efficiency of the surveyed data augmentation methods?
  4. What about distortion-based data augmentation methods? E.g., [*] "Universal semantic segmentation for fisheye urban driving images." 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2020.
  5. What about video semantic segmentation and other video segmentation applications? Please discuss them.
  6. Many image data augmentation methods can be also used for video data augmentation. Please have a discussion on this.
  7. The future directions section can be better structured.
  8. Recent mixing-based data augmentation methods such as CutMix, MixUp, and SuperMix are quite popular. Would they be relevant for video data augmentation? Please discuss.
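For illustration, the mixing strategies mentioned in point 8 extend naturally from images to videos by applying the same mixing coefficient to every frame of a clip. The following is a minimal sketch, not taken from the surveyed paper: a hypothetical `video_mixup` helper assuming clips stored as NumPy arrays of shape (T, H, W, C) and one-hot label vectors.

```python
import numpy as np

def video_mixup(clip_a, clip_b, label_a, label_b, alpha=0.2, rng=None):
    """Frame-wise MixUp for two video clips of shape (T, H, W, C).

    A single mixing coefficient lam ~ Beta(alpha, alpha) is drawn per
    pair and applied to every frame, so the mixed clip stays temporally
    coherent; the one-hot labels are mixed with the same coefficient.
    (Hypothetical helper for illustration, not from the surveyed paper.)
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    mixed_clip = lam * np.asarray(clip_a) + (1.0 - lam) * np.asarray(clip_b)
    mixed_label = lam * np.asarray(label_a) + (1.0 - lam) * np.asarray(label_b)
    return mixed_clip, mixed_label, lam
```

CutMix would differ only in replacing the convex combination with a rectangular (or spatio-temporal) crop pasted from one clip into the other, with the label weights set by the crop's volume fraction.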

For these reasons, a revision is recommended.

Sincerely,

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper the authors present a survey on data augmentation for videos in deep learning models. The existing literature focuses mainly on individual images, and the authors try to fill that gap.

The English is good (at least from the point of view of a non-native speaker), but there are some typos, so I would recommend checking the manuscript thoroughly.

Some of the typos:
Line 14: "leave" -> "live"
Line 47: "a less" -> "less"
Line 186: "several time" -> "several times"
Line 411: "draw back" -> "drawback"

I am not sure about the appropriateness of "deep learning models" in the title. While it may be true that the applications that currently need video data augmentation are mainly deep learning models, the presented survey would still be relevant to any other application that needs augmented videos.

In line 79: "A different approach is to generate the images for the augmented dataset from scratch." Even if the images are not generated from pre-existing images, some prior knowledge is needed to create them. You could discuss this a bit, i.e., whether a simplified, hand-crafted, or physics-based model is used to generate the images from scratch.

The methodology of using Scopus seems reasonable, but some widely cited papers on arXiv are not indexed in Scopus. For example, the paper

Samek, Wojciech, Thomas Wiegand, and Klaus-Robert Müller. "Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models." arXiv preprint arXiv:1708.08296 (2017).

has been cited 956 times according to Google Scholar and does not appear on Scopus. In this vein, a quick search for "video" "data augmentation" on Google Scholar returns the paper

Yun, Sangdoo, Seong Joon Oh, Byeongho Heo, Dongyoon Han, and Jinhyung Kim. "Videomix: Rethinking data augmentation for video classification." arXiv preprint arXiv:2012.03457 (2020).

with 8 citations as the first result, so it may deserve a place in your survey. I would recommend at least searching among the top results in Google Scholar, given that you have a total of 33 papers and any addition would have a noticeable impact.

The last columns in Tables 1 and 2 are very sparse; they could be condensed into a single column called "Kind of approach" or similar, with the name of the approach in place of the "X". If the table then looks too empty, you could add the number of citations of each paper as of the current day, which would give some indication of the relevance of the older papers. Alternatively, expand the kind of approach: for example, state which kind of DL model, which of the geometric, color, or temporal spaces, and which GAN flavour is used.

In lines 401-402: "Some authors are starting to integrate, to generators, models used for sequence analysis like RNN or 3D convolutions, and we believe this will be the future direction." Please elaborate on this. Why do you think it will be the future direction?

In line 421: "We analysed 33 papers published in the last 6 years pointing out the most common methodologies in use and future directions." I am not sure whether this is six or seven years, since the oldest paper is from 2016 and the newest from 2022. In any case, it may be better to make the period explicit: 2016-2021, or 2016 to the first months of 2022.

The main point I miss in the paper is data about how well the different approaches perform. While I find the paper really interesting and full of information, a researcher who reads it cannot know which approach to use unless some reference shows that approach X was applied and results Y were obtained. There are sentences stating things like "thermal image closer to the real data", "This method is able to create realistic video sequence", or "generating a new realistic synthetic sequence", but no data to support them. If the aim of the paper is to guide deep learning practitioners in choosing one approach over another, then some data on the performance of deep learning models trained with these augmented data should be shown.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Dear authors,

       thank you very much for your responses and revisions. Most of the concerns have been addressed. We suggest that this paper be accepted.

 

For the final version, we have a suggestion.

Many paragraphs are very long. It would be nice to separate the long paragraphs into shorter ones to better structure the paper, which can improve readability.

Sincerely,
