
Open Access Review

Peer-Review Record

Quick Overview of Face Swap Deep Fakes

Appl. Sci. 2023, 13(11), 6711; https://doi.org/10.3390/app13116711

by Tomasz Walczyna and Zbigniew Piotrowski *

Reviewer 1: Han Wang

Reviewer 2: Kim Boström

Reviewer 3: Anonymous

Submission received: 9 May 2023 / Revised: 24 May 2023 / Accepted: 29 May 2023 / Published: 31 May 2023

(This article belongs to the Special Issue Application of Machine Vision and Deep Learning Technology)

Round 1

Reviewer 1 Report

Comments for author File: Comments.pdf

Please find the attached PDF.

Author Response

Dear Reviewer,

Thank you for your insightful question regarding our article. We appreciate your attention to detail and are grateful for the opportunity to address your concerns.

In the first paragraph of the introduction, the authors state that source-based algorithms offer no control over the environment they use. If so, why do the authors focus on reviewing the source-based approach?

Thank you for bringing up the inconsistency in the first paragraph of the introduction section. We appreciate your attention to detail.

Upon reviewing the manuscript, we acknowledge the error in our initial statement. The correct intention was to focus on target-based algorithms rather than source-based algorithms. We apologize for the confusion caused by the inaccurate information presented.

In the revised version of the manuscript, we have made the necessary corrections to reflect our intended focus accurately. As explained, the target-based approach involves extracting the person's "identity" from the source image and integrating it into the target image while preserving the characteristics present in the target image. On the other hand, the source-based approach involves editing the source image based on the attributes extracted from the target image.

The manuscript contains multiple abbreviations that are not defined by their full names at their first appearance (such as SOTA and GANs). Please carefully define all abbreviations.

Thank you for raising an important point regarding the use of abbreviations in our manuscript. We appreciate your feedback and apologize for any confusion caused by the lack of proper definitions for these abbreviations.

In the revised version of the manuscript, we have taken great care to address this issue. We have ensured that all abbreviations used throughout the text are explicitly defined with their full names upon their first appearance.

Except for the generator section, the reviews of all other sections are relatively superficial. Can the authors dive deeper into these sections?

Thank you for your feedback regarding the depth of the reviews in our manuscript. We appreciate your suggestion to provide a more comprehensive analysis of the sections other than the generator. We have carefully considered your input and would like to address your concern.

Accordingly, we have sought to provide more detailed discussion in the sections concerning identity extraction. We recognize the importance of examining this aspect in depth, as it is a crucial component of our research.

However, we would like to explain our rationale for the relatively concise treatment of the attribute extraction section. Our decision was based on the fact that, in many deepfake generation methods, attribute extraction and the generator are inherently connected and interdependent. Forcibly separating them could create confusion and hinder a comprehensive understanding of the framework.

We apologize if this caused any confusion or gave the impression of superficiality. We aimed to strike a balance between providing detailed insights and maintaining clarity in presenting the interconnected components of the framework.

Please let us know if you have any specific suggestions or areas within the non-generator sections that require further elaboration. We value your feedback and are committed to enhancing the manuscript's comprehensiveness and coherence.

The authors use many different abbreviations when introducing different algorithms, for example, but not limited to, SPADE and AAD. Can the authors give more background on the technologies referred to by these abbreviations?

Thank you for your feedback on the use of abbreviations in our manuscript. We appreciate your suggestion to provide more background introductions about the technologies represented by these abbreviations. While our primary focus in the article was not to provide extensive descriptions of these auxiliary methods, we understand the importance of giving readers sufficient context.

In response to your feedback, we have revised the manuscript to include brief descriptions or explanatory sentences about the most significant and relevant technologies associated with the mentioned abbreviations. Our intention is to give readers a better understanding of the key concepts and methodologies behind these abbreviations, even though our coverage remains concise.

While we strive to balance brevity and clarity, we recognize that providing some background information will contribute to the overall comprehension of the topic.

The authors present a comparison of different face-swapping algorithms in Fig. 2. What is the conclusion of this comparison? What is the most efficient method?

Thank you for your question regarding the comparison of different face-swapping algorithms in Figure 2. We appreciate your interest in understanding the conclusion of this comparison and identifying the most efficient method.

Our manuscript presents a comprehensive evaluation and comparison of various face-swapping algorithms. This comparison aims to provide an overview of their strengths and weaknesses rather than to determine a single "most efficient" method.

It is important to note that the efficiency of a face-swapping method depends on the specific requirements and priorities of the application or use case. Therefore, determining the "most efficient" method would require considering various factors such as computational speed, accuracy, attribute preservation, and identity representation in relation to the user's specific needs.

Please let us know if you have any further suggestions or specific areas you would like us to focus on in particular sections. We value your input and are committed to delivering a high-quality, informative manuscript.

Thank you for your valuable feedback and for giving us the opportunity to enhance our work.

Sincerely,

MSc. Tomasz Walczyna

Reviewer 2 Report

The authors present a review on the functionality of algorithms most used in face-swapping. They lay out a general scheme to describe all these algorithms schematically, and they thoroughly discuss and compare the algorithms. The paper is well-written and structured, the references to the literature are sufficiently recent and well informed. Only a few minor issues should be addressed to improve the manuscript:

Introduction: “Target-based entails extracting the person's "identity" from the image containing the target (source) and inserting it in the target image with the characteristics contained there. Source-based, however, involves editing the image containing our target (source) based on the attributes extracted from the target image. The downside of the second solution is that we have no control over the environment we use. Therefore, the work will focus on source-based algorithms to make the use as universal as possible.” This paragraph is very confusing, especially the use of the words “target” and “source”. In my understanding, the paragraph may be improved as “The target-based approach entails extracting the person's "identity" from the source image and inserting it in the target image with the characteristics contained there. Source-based, in contrast, involves editing the source image based on the attributes extracted from the target image. The downside of the second solution is that we have no control over the environment we use. Therefore, the work will focus on target-based algorithms to make the use as universal as possible.” Note that I made some fundamental adjustments here that may well be wrong. In the original formulation there were some obvious contradictions, e.g. the authors say that the “second” approach has some downsides, which is the source-based approach, and “therefore” they use the source-based approach, which does not make sense. I assume they mean the target-based approach, also from my understanding of these approaches. However, even if I'm wrong here, the authors should re-formulate their sentences to remove any confusion.
P2: “Source identity extraction - involves obtaining information about the target's identity”. Similar to my above point, there is an apparent contradiction. The authors define “source” identity extraction by obtaining information about the “target's” identity. Does it not have to be the “source's” identity that is obtained?
P2: “Attributes extraction - involves extraction from the image containing the person whose identity will be edited of features and attributes such as pose, emotion, and background.” Is it the source or the target attribute extraction? From what I understand it should be the source attribute extraction, so maybe the sentence can be improved as “Source attribute extraction - involves extraction from the source image containing the person whose identity will be edited of features and attributes such as pose, emotion, and background to blend into the target image.”
P2: “Generator-responsible for generating the result containing the sub-identity to be edited”. What does “generator-responsible” and “sub-identity” mean? I assume the authors aim to put a third category of image processing here, analog to the two above. The sentence could possibly be improved as “Target generation - generates the target image containing the source person's identity”
P2: “The presented division is general. Some developers combine or divide the presented functionalities additionally. However, for the sake of unification, this division is proposed.” This formulation is somewhat awkward. I suggest the following: “Although some developers combine or divide the presented functionalities differently, our formulation is universally applicable and straightforward, so we will stick to it throughout this study.” Or something similar.
P2: “One of the first approaches to face swapping is FaceSwap [2]. This software first appeared in 2017 and is still being developed.” I remember installing a face swap app as early as 2014. The authors may want to check for earlier occurrences and adapt their references accordingly.
P4: “DeepFaceLab is still under development, and the developers, in terms of preprocessing, leave only the best algorithms to the user. Instead, they introduce several mods, determining what part of the face to manipulate [...]” This formulation is confusing. What does it mean that the developers “leave only the best algorithms” to the user? Also, why is it “instead” that they introduce several mods? I cannot make sense of these sentences and suggest reformulating them.
P7: Please add equation numbering. Also, remove the superfluous comma right at the beginning of the paragraph following the equation. And please use full sentences to describe the meaning of the symbols used, e.g. “where Y?,? denotes the output image with swapped identity, X? denotes the input image containing identity information [...]”
P9: Remove the colon after “process using”.
P13: “[...] the algorithm must execute quickly enough for the user to hear the outcome in real time.” Certainly, the user “sees” the outcome, not “hears”.

With these minor issues adequately addressed, I recommend the paper for publication.

The language is readable, but contains some quirks and would certainly benefit from being proofread by a native speaker. I have suggested a few reformulations where the sentences are confusing.

Author Response

Dear Reviewer,

Thank you for your thorough and valuable feedback on our manuscript. We appreciate your attention to detail and the specific points you raised. We have carefully reviewed your suggestions and made the necessary revisions to address the concerns and improve the manuscript's clarity. Here are the responses to each of your points:

Introduction: We apologize for the confusion caused by the initial paragraph. We have revised it per your suggestion to remove ambiguity and contradictions.
P2: We have corrected the definition of "source identity extraction" to accurately refer to obtaining information about the source's identity rather than the target's identity.
P2: After considering this issue, we made the identity extractor independent of source/target naming to avoid confusion regarding the extractors.
P2: The sentence referring to the generator has been rephrased to clarify that it is responsible for generating the target image containing the source person's identity.
P2: We have modified the formulation to emphasize that our division of functionalities is universally applicable and straightforward throughout the study.
P2: In our manuscript, when referring to FaceSwap, we specifically focus on the software with that name, which first appeared in 2017 and continues to be developed. We acknowledge that there were earlier face swap apps available. However, it is essential to note that the scope of our discussion is centered on specific techniques that utilize deep neural networks, which fall under the category described as DeepFake.
P4: The statements regarding DeepFaceLab have been rephrased to provide more precise and coherent explanations.
P7: Equation numbering has been added, the extra comma has been removed, and symbol descriptions have been provided using full sentences.
P9: The colon after "process using" has been removed.
P13: We have corrected the sentence to reflect that the user "sees" the outcome rather than "hears" it.

We have carefully addressed each of the points raised, ensuring the necessary revisions have been made to improve the manuscript's clarity and coherence.

We sincerely appreciate your thoughtful review and valuable suggestions. We are grateful for your recommendation of the paper for publication. If you have any further questions or concerns, please do not hesitate to let us know.

Thank you once again for your thorough evaluation.

Sincerely,

MSc. Tomasz Walczyna

Reviewer 3 Report

This paper briefly introduces the fundamentals of some of the latest FaceSwap DeepFake algorithms. The topic is interesting, and the results look encouraging and motivating. The reviewer can recommend its acceptance with minor revision. Here are some comments:

1. To strengthen the introduction, some recent references should be added to improve the review section and its connection with the literature, for example, "Federated multi-source domain adversarial adaptation framework for machinery fault diagnosis with data privacy" and "A novel conditional weighting transfer Wasserstein auto-encoder for rolling bearing fault diagnosis with multi-source domains."

2. The meaning of some symbols in the equations is not explained, which makes the equations difficult to read.

3. The authors should substantially refine their summary of the existing literature, as the current summaries are too redundant; the quality of the review needs further improvement.

4. The references must be updated to meet publication requirements. The coverage of existing methods is not comprehensive enough, and some of the latest algorithms, such as transfer learning and federated learning algorithms, are not included.

Author Response

Dear Reviewer,

Thank you for your positive feedback on our paper and valuable suggestions for improvement. We have carefully reviewed your comments and considered them for the revision of our manuscript.

To enhance the introduction and improve the connection with the literature, we will include the latest references you recommended, such as "Federated multi-source domain adversarial adaptation framework for machinery fault diagnosis with data privacy" and "A novel conditional weighting transfer Wasserstein auto-encoder for rolling bearing fault diagnosis with multi-source domains." These references will enrich the review section and strengthen the alignment with current research.

We apologize for the lack of introduction to the significance of some symbols in the equations, which made them difficult to understand. In the revised version, we will ensure that each symbol is introduced correctly and its meaning and significance are explained, improving readability and comprehension.

Thank you for your valuable feedback, which will enhance our paper. If you have any further suggestions or concerns, please do not hesitate to let us know.

Sincerely,

MSc. Tomasz Walczyna
