Peer-Review Record

AI-Generated Face Image Identification with Different Color Space Channel Combinations

Sensors 2022, 22(21), 8228; https://doi.org/10.3390/s22218228
by Songwen Mo 1,2, Pei Lu 1,2 and Xiaoyong Liu 1,2,*
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 3 September 2022 / Revised: 22 October 2022 / Accepted: 25 October 2022 / Published: 27 October 2022
(This article belongs to the Special Issue Artificial Intelligence in Computer Vision: Methods and Applications)

Round 1

Reviewer 1 Report

1. We suggest the authors recheck the grammar and syntax of many sentences and correct them through more careful and thorough proofreading.

2. The abstract may be extended further. It should be a microcosm of the full article. The authors may try to include the principal aim of the paper, the challenge to be solved, the conclusion, performance standards, and competing approaches.

3. All keywords (abbreviations) must be mentioned in the abstract.

Author Response

Dear Reviewer:

The responses to your issues are as follows:

 

Point 1: We suggest the authors recheck the grammar and syntax of many sentences and correct them through more careful and thorough proofreading.

 

Response 1: We corrected many grammar and sentence errors, such as:

  • In the first sentence of the second paragraph of the introduction, “It is true that many technology companies and academics are responding to the "generation" problem by developing "countermeasure" technologies,” we changed “countermeasure” to “countermeasures.”
  • In Section 2.2 (“Attention mechanism selection”), in the second sentence of the first paragraph, “The embodiment of attentional mechanisms in neural learning networks as a framework that is not itself a specific network model,” we changed “as” to “is.” The revised sentence is “The embodiment of attentional mechanisms in neural learning networks is a framework that is not itself a specific network model.”
  • In Section 3.5.2 (“Image compression”), the second paragraph, “JPEG compression is often used for compression during network transmission, and in order to test the robustness of the model, a test of image compression compression was performed, by quality factors ranging from 60 to 100 with an interval of 10,” was changed to “JPEG compression is often used for compression during network transmission. To test the model’s robustness, a test of image compression was performed with quality factors ranging from 60 to 100 at an interval of 10.” (A code sketch of this compression test is given after this list.)
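This compression test is easy to reproduce. Below is a minimal sketch, assuming Pillow is available; the file names are placeholders, not taken from the paper. It writes JPEG copies of an image at quality factors from 60 to 100 in steps of 10:

    from PIL import Image

    def save_jpeg_versions(src_path: str, out_stem: str) -> None:
        # Save JPEG-compressed copies at quality factors 60, 70, 80, 90, 100.
        img = Image.open(src_path).convert("RGB")
        for q in range(60, 101, 10):
            # Pillow's 'quality' argument corresponds to the JPEG quality factor.
            img.save(f"{out_stem}_q{q}.jpg", "JPEG", quality=q)

    save_jpeg_versions("face.png", "face")  # placeholder input and output names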

 

Point 2: The abstract may be extended further. It should be a microcosm of the full article. The authors may try to include the principal aim of the paper, the challenge to be solved, the conclusion, performance standards, and competing approaches.

 

Response 2: The original abstract, “With the rapid development of the Internet and information technology, network data is exploding, due to the misuse of technology and inadequate supervision, deep network generated face images flood the network and the effect of realistic, launched a serious challenge to the human eye and automatic identification system, resulting in many legal, ethical and social issues. For the needs of network information security, the deep network generated face image identification based on different color spaces is proposed. First, by analyzing the difference of different color space components in the deep learning network model for face sensitivity, a combination of color space components that can effectively improve the discrimination rate of the deep learning network model is given. Second, to further improve the discriminative performance of the model, a channel attention mechanism is added at the shallow level of the model to further focus on the features contributing to the model. The experimental results show that the scheme can effectively solve the discrimination problem of deep network generated face images with a discrimination accuracy of 99.10%.” was changed to:

         “With the rapid development of the Internet and information technology (in particular, generative adversarial networks and deep learning), network data is exploding. Due to the misuse of technology and inadequate supervision, deep network-generated face images flood the network; such a forged image is called a deepfake. These realistic faked images pose a serious challenge to the human eye and to automatic identification systems, resulting in many legal, ethical and social issues. For the needs of network information security, deep network-generated face image identification based on different color spaces is proposed. Because the deepfake images are extremely realistic, it is difficult for neural networks to achieve high accuracy with ordinary methods, so we use an image processing method here. First, by analyzing the difference of different color space components in the deep learning network model for face sensitivity, a combination of color space components that can effectively improve the discrimination rate of the deep learning network model is given. Second, to further improve the discriminative performance of the model, a channel attention mechanism is added at the shallow level of the model to further focus on the features contributing to the model. The experimental results show that this scheme achieves better accuracy than the two compared methods both in the same face generation model and across different face generation models, and its accuracy reaches up to 99.10% in the same face generation model. Meanwhile, the accuracy of this model only decreases to 98.71% when coping with a JPEG compression factor of 100, which shows that this model is robust.”

 

Point 3: All keywords (abbreviations) must be mentioned in the abstract.

 

Response 3: All keywords (abbreviations) have been included in the new abstract quoted in Response 2 above.

 

Sincerely yours,

Songwen Mo

 

Author Response File: Author Response.pdf

Reviewer 2 Report

1. The title could be further simplified to “AI Generated Face Image Identification with different color space channel combinations”? Also, the use of “Discrimination” may cause ambiguity for readers.

2. In the introduction section, the differences compared with the existing literature and the contributions of this paper should be clearly stated.

3. In the second section, a clear overview figure should be added to illustrate the structure of the proposal, rather than a list of layers in the network.

4. In Section 3.2, the embedding of the attention mechanism is not clear to me: why is it needed, and why does it work in the whole model?

5. The figures in the paper, including but not limited to Figure 1, Figure 4, and Figure 5, can be further improved.

6. The training set is randomly selected from the dataset; how many samples are selected? Is the test set also selected randomly? Is this reasonable for network validation?

7. A much more detailed performance comparison between the proposal and existing approaches should be added in the final evaluation section.

8. Extensive editing of the English language and style is required to further improve the readability of this paper.

Author Response

Dear Reviewer:

The responses to your issues are as follows:

 

Point 1: The title could be further simplified to “AI Generated Face Image Identification with different color space channel combinations”? Also, the use of “Discrimination” may cause ambiguity for readers.

 

Response 1: Your suggestion is very helpful, and we have changed the title to “AI-Generated Face Image Identification with Different Color Space Channel Combinations.”

 

Point 2: In the introduction section, the differences compared with the existing literature and the contributions of this paper should be clearly stated.

 

Response 2: At the end of the third paragraph of the introduction, the differences from the cited articles and the contributions of this article are added: “In summary, the discrimination methods for generating images from generative adversarial networks can be divided into two types: one is to feature the visual artifact defects (e.g., visual artifacts in the eyes, teeth, and facial contours of the generated face images) that exist in the generated images themselves to finally achieve classification discrimination, and the other is to design a specific deep neural network model to achieve discrimination of the generated face images.” Change to “In summary, the discrimination methods for generating images from generative adversarial networks can be divided into two types: one is to feature the visual artifact defects (e.g., visual artifacts in the eyes, teeth, and facial contours of the generated face images) that exist in the generated images themselves to finally achieve classification discrimination, and the other is to design a specific deep neural network model to achieve discrimination of the generated face images. Among the related papers mentioned above, papers 2, 3, and 4 belong to the first category, using the information of the generated image itself for feature extraction, which is then input to a classifier for classification. Papers 5, 6, and 7 belong to the category of designing specific deep neural network models to implement classification of the generated face images. This paper belongs to the second category, where a new image processing method is utilized and the result is then input to a specific neural network for classification.”

 

Point 3: In the second section, a clear overview figure should be added to illustrate the structure of the proposal, rather than a list of layers in the network.

 

Response 3:

Table 1. The simplified structure of Xception: three columns (entry flow, middle flow, exit flow), each listing that part's Conv, Relu, SeparaConv, MaxPooling, and Global Average Pooling layers.

 Change to:

 

Figure 1. The Xception deep learning network model is divided into three parts: entry flow, middle flow, exit flow.

 

 

Point 4: In Section 3.2, the embedding of the attention mechanism is not clear to me: why is it needed, and why does it work in the whole model?

 

Response 4: We need the attention mechanism to improve the classification accuracy of the model; it allows the model to focus on more of the regions that are not correctly rendered when the AI generates face images. Because the attention mechanism further affects the receptive field of the model, experiments had to be conducted at different locations to find the best location to embed it.
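To make the embedding concrete, below is a minimal sketch of a squeeze-and-excitation style channel attention block in PyTorch. The paper cites a channel attention mechanism [18]; this module is an illustrative stand-in under that assumption, not the paper's exact implementation:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Squeeze-and-excitation style channel attention block.
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average over H x W
            self.fc = nn.Sequential(             # excitation: per-channel weights in (0, 1)
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w  # reweight each feature channel

Placed after a shallow convolutional block, such a module rescales each feature channel before the deeper layers see it, which is why its position changes what the model effectively attends to.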

 

Point 5: The figures in the paper, including but not limited to Figure 1, Figure 4, and Figure 5, can be further improved.

 

Response 5: We have improved the images that were not clearly visible; the operations include enlargement and re-embedding of the image text. Before/after examples are given in the attached author response file.

 

Point 6: The training set is randomly selected from the dataset; how many samples are selected? Is the test set also selected randomly? Is this reasonable for network validation?

      

Response 6: The training set is randomly selected from C128 and GS128, with 10,000 images each. The test set is drawn from the same databases, but the test set and training set are mutually exclusive. For the validation of the different networks, we use the same experimental data as above.
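A minimal sketch of such a mutually exclusive random split is shown below; the 2,000-image test size is a placeholder, since the response only fixes the training size at 10,000 per class:

    import random

    def train_test_split(paths, n_train=10000, n_test=2000, seed=0):
        # Shuffle once, then slice: the two subsets cannot overlap.
        rng = random.Random(seed)
        shuffled = list(paths)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        return shuffled[:n_train], shuffled[n_train:n_train + n_test]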

 

Point 7: A much more detailed performance comparison between the proposal and existing approaches should be added in the final evaluation section.

 

Response 7: We added a new performance comparison after Figure 6: the confusion matrix.
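A confusion matrix for this binary task can be produced in a few lines; the sketch below uses scikit-learn with placeholder labels (0 = real face, 1 = generated face), not the paper's actual predictions:

    from sklearn.metrics import confusion_matrix, classification_report

    y_true = [0, 0, 1, 1, 1, 0]  # placeholder ground-truth labels
    y_pred = [0, 1, 1, 1, 0, 0]  # placeholder model predictions
    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=["real", "generated"]))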

Point 8: Extensive editing of the English language and style is required to further improve the readability of this paper.

 

Response 8: We corrected many grammar and sentence errors, such as:

  • In the first sentence of the second paragraph of the introduction, “It is true that many technology companies and academics are responding to the "generation" problem by developing "countermeasure" technologies,” we changed “countermeasure” to “countermeasures.”
  • In Section 2.2 (“Attention mechanism selection”), in the second sentence of the first paragraph, “The embodiment of attentional mechanisms in neural learning networks as a framework that is not itself a specific network model,” we changed “as” to “is.” The revised sentence is “The embodiment of attentional mechanisms in neural learning networks is a framework that is not itself a specific network model.”
  • In Section 3.5.2 (“Image compression”), the second paragraph, “JPEG compression is often used for compression during network transmission, and in order to test the robustness of the model, a test of image compression compression was performed, by quality factors ranging from 60 to 100 with an interval of 10,” was changed to “JPEG compression is often used for compression during network transmission. To test the model’s robustness, a test of image compression was performed with quality factors ranging from 60 to 100 at an interval of 10.”

 

 

Sincerely yours,

Songwen Mo

 

 

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents an interesting subject, but it is not at all clear what the benefit of the proposed method is or where it would be used. Its scope is not clear at all.

Many improvements are needed:

-it is not clear what the novelty of the proposed method is

-a section with related work must be added (including existing performance results)

-the discussion of results must be improved: add more examples, showing the advantages and drawbacks; explain Section 4.4 more clearly: how is it used?

-check different formulations, e.g.: check the title of Section 3; line 137: the meaning of the sentence is not clear; line 211: “find the top 5 combinations with accuracy” -> what accuracy?

-other error: line 125: python -> Python

-Figure 3 is not explained in the text

The paper needs to be rewritten for better structure and clearer understanding: describe related work, the proposed method with its improvements, the evaluation, and the discussion (e.g., Section 3.1 is not the best place to add information about the hardware environment). Since there are many improvements to be made, I recommend rejecting the paper.

Author Response

Dear Reviewer:

The responses to your issues are as follows:

 

Point 1: it is not clear what the novelty of the proposed method is

 

Response 1: Our main innovation is in image processing: we recombine channels from different color spaces. The information in the recombined face image improves the model's ability to classify real faces and deep network-generated faces.

 

Point 2: a section with related work must be added (including existing performance results)

 

Response 2: The structure of the paper has been revised. At the beginning of the second paragraph of the introduction, we introduce the related works, and at the end we summarize the papers involved and the contributions of this paper:

“In summary, the discrimination methods for generating images from generative adversarial networks can be divided into two types: one is to feature the visual artifact defects (e.g., visual artifacts in the eyes, teeth, and facial contours of the generated face images) that exist in the generated images themselves to finally achieve classification discrimination, and the other is to design a specific deep neural network model to achieve discrimination of the generated face images.” Change to: “In summary, the discrimination methods for generating images from generative adversarial networks can be divided into two types: one is to feature the visual artifact defects (e.g., visual artifacts in the eyes, teeth, and facial contours of the generated face images) that exist in the generated images themselves to finally achieve classification discrimination, and the other is to design a specific deep neural network model to achieve discrimination of the generated face images. Among the related papers mentioned above, papers 7, 8, 9 and 16 belong to the first category, using the information of the generated image itself for feature extraction, which is then input to a classifier for classification. Papers 10, 11, 12, 13 and 15 belong to the category of designing specific deep neural network models to implement classification of the generated face images. This paper belongs to the second category, where a new image processing method is utilized and the result is then input to a specific neural network for classification.”

We have also expanded the Related Work section:

         In Hsu's [13] paper, a new two-stream network structure is proposed using the simplified DenseNet [14] network structure. This network allows for Pairwise Learning. Specifically, Pairwise Learning is used to solve the problem that deep learning network models cannot effectively identify deep network-generated images that are not included in the training process. In addition, different depth features are extracted in the proposed network structure. In Zhuang [15], as in [13], Pairwise Learning is used to solve the problem that deep learning network models cannot effectively identify deep network-generated images that are not included in the training process, and on this basis a triplet loss is used to learn the relationship between deep network-generated and real images. They also propose a new coupled network to extract features of different depths of the target image. In Carvalho [16], deep network-generated faces are detected by finding the difference between real and deep network-generated faces in the eyes. Specifically, when the eye's specular highlights are removed from both real and deep network-generated faces, the deep network-generated face presents more artifacts. The bottleneck features of the processed human eyes are extracted using VGG19 [17]. Finally, the eyes are fed into an SVM classifier to classify the deep network-generated and real faces.

 

Point 3: the discussion of results must be improved: add more examples, showing the advantages and drawbacks; explain Section 4.4 more clearly: how is it used?

Response 3: We added a new performance comparison after Figure 6: the confusion matrix.

When i samples are input, the parameters are optimized based on the value of the minimized cross-entropy function. For example, when the probability of a real-face sample being judged by the network to be a real face is 0.4234, a minimized cross-entropy value such as 0.7213 is obtained, which means that the network will adjust this weight parameter in the correct direction based on this value.
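For a single sample, the cross-entropy loss reduces to the negative log of the probability assigned to the true class, as the sketch below shows; note that -ln(0.4234) is about 0.86, so the 0.7213 value quoted above presumably includes batch averaging or other terms:

    import math

    def cross_entropy(p_true_class: float) -> float:
        # Per-sample cross-entropy: -ln(probability assigned to the true class).
        return -math.log(p_true_class)

    print(cross_entropy(0.4234))  # ~0.8595 for a real face judged real with p = 0.4234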

 

Point 4: check different formulations, e.g.: check the title of Section 3; line 137: the meaning of the sentence is not clear; line 211: “find the top 5 combinations with accuracy” -> what accuracy?

 

Response 4: We modified some formulations, such as:

Section 3, line 137: “In order to fairly compare the discrimination rates of face images generated by different GAN models, the resolutions of face images generated by the three GAN models were set to 128×128 and 64×64, respectively, and were noted as GD128, GS128, GP128, GD64, GS64, GP64 (DCGAN, StyleGAN, and ProGAN; the letters after G represent the corresponding models).” Change to “In order to fairly compare the discrimination rates of face images generated by different GAN models, the resolutions of face images generated by the three GAN models were set to 128×128 and 64×64, respectively, and were noted as GD128 from DCGAN, GS128 from StyleGAN, GP128 from ProGAN, GD64 from DCGAN, GS64 from StyleGAN, and GP64 from ProGAN.”

 

         “After converting RGB to HSV and YCbCr, the channel combinations are selected by R, G, B, H, S, V, Y, Cb, Cr combinations to find the top 5 combinations with accuracy.” Change to “After converting RGB to HSV and YCbCr, the channel combinations are selected from R, G, B, H, S, V, Y, Cb, and Cr. The recombined three-channel images are input to the Xception model to select the top 5 combinations by accuracy in separating real faces from deep network-generated faces.”
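A minimal sketch of this channel recombination using OpenCV is given below; the combination ("Y", "Cb", "H") is illustrative only, since the paper selects the top-5 combinations empirically:

    import cv2
    import numpy as np

    def recombine(bgr: np.ndarray, combo=("Y", "Cb", "H")) -> np.ndarray:
        # Build a 3-channel image from channels drawn across RGB, HSV, and YCbCr.
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        ycc = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)  # OpenCV orders this as Y, Cr, Cb
        channels = {
            "B": bgr[:, :, 0], "G": bgr[:, :, 1], "R": bgr[:, :, 2],
            "H": hsv[:, :, 0], "S": hsv[:, :, 1], "V": hsv[:, :, 2],
            "Y": ycc[:, :, 0], "Cr": ycc[:, :, 1], "Cb": ycc[:, :, 2],
        }
        return np.stack([channels[c] for c in combo], axis=-1)

The recombined array can then be fed to the Xception model like any ordinary three-channel image.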

 

Point 5: other error: line 125: python -> Python

 

Response 5: We corrected some spellings, such as:

Channal -> Channel

Dateset -> Dataset

python -> Python

 

Point 6: Figure 3 is not explained in the text

 

Response 6: We changed the text above Figure 3:

 “The generated face, whose part is shown in Figure 3.” Change to “The generated face is partly shown in Figure 4. The left face image in Figure 4 is a 128×128 resolution face image produced by StyleGAN, and the right is a 64×64 face image generated by ProGAN.”

 

Sincerely yours,

Songwen Mo

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I do not have any more questions in the present form.

Author Response

Point 1: Extensive editing of English language and style required

 

Response 1: We have modified some sentences to make them easier to read, such as:

 

‘Secondly, considering the wide application of the attention mechanism in image processing, natural language processing, and speech recognition in recent years, the channel attention mechanism [18] is introduced to enable the deep neural network model to effectively extract distinguishable features of real and generated face images.’

 

Change to:

 

‘Secondly, considering the wide application of the attention mechanism in image processing, natural language processing, and speech recognition in recent years, we introduce the channel attention mechanism into the model [18]. When the attention mechanism module is placed at the appropriate location, it enables the model to effectively extract the distinguishable features of real and generated face images.’

 

‘The experimental results show that the scheme can effectively solve the identification problem of the deep network-generated face images with a discrimination accuracy of 99.10%.’

 

Change to:

 

‘The results of this experiment show that this proposed scheme can effectively solve the recognition problem of face images generated by deep networks. The classification accuracy reaches 99.10% on the relevant dataset, and the model possesses good robustness.’

 

‘the test dataset was not degraded in accuracy when it was compressed by the JPEG algorithm.’

 

Change to:

 

‘when the test dataset was compressed by the JPEG algorithm, the classification accuracy of the model did not show a substantial degradation.’

 

‘Humans have cognitive bottlenecks because the brain has a limited processing speed and does not interact with everything in its field of vision, but rather sorts out the objects to interact with by way of attention, thus responding within the limited processing speed.’

 

Change to

 

‘Humans have a visual field interaction bottleneck because the brain has a limited processing speed. The human eye does not interact with everything in the visual field, but selects the objects it wants to interact with by way of attention. So humans can respond quickly to the object of attention within a limited processing speed.’

Reviewer 3 Report

Some of my comments were addressed, but there are still some that need to be extended:

-it is not clear whether the article deals with face generation or face identification

-the Related Work section must also contain results

-the Discussion section must be further elaborated, showing the advantages and drawbacks

-lines 104-107: papers 7-16 must be referred to as references in the text

Author Response

Dear Reviewer:

The responses to your issues are as follows:

 

Point 1: it is not clear whether the article deals with face generation or face identification

 

Response 1: Our work involves face generation but does not involve face recognition much. We use DCGAN, ProGAN, and StyleGAN to generate human faces.

 

Point 2: the Related Work section must also contain results

Response 2: We added the results to the related work: It is true that many technology companies and academics are responding to the "generation" problem by developing "countermeasures" technologies. McCloskey and Albright [7] discriminate the generated images based on the presence of underexposure or overexposure in real-face images, and an auc value of 0.92 was obtained in the classification of ProGAN and Celeba. Matern et al. [8] used the visual artifacts that appear in the eyes, teeth, and facial contours of the generated face images for recognition, and auc values of 0.852 and 0.843 were obtained for ProGAN, Glow, and Celeba classification, respectively. Zhang et al. [9] theoretically analyzed the existence of spectral replication of artifacts in the frequency domain and trained the classifier on spectral rather than pixel inputs to achieve effective detection of generated artifacts; the corresponding experimental results on CycleGAN obtained an average accuracy of 97.2%. Mo et al. [10] used filters with residuals to design deep learning network models and achieved 96.3% detection accuracy on the corresponding dataset. Dang et al. [11] captured the features of the generated face images by setting a specific CGFace layer, with an accuracy of 90.1% on the corresponding dataset. In addition to feeding images directly into deep learning models, some work has attempted to improve detection performance by incorporating domain-specific knowledge. Nataraj [12] trained the network to detect the generated artifacts by extracting the co-occurrence matrix from the pixel-domain images in RGB space and obtained an accuracy of 93.78% after the images underwent JPEG compression. In Hsu's [13] paper, a new two-stream network structure is proposed using the simplified DenseNet [14] network structure. This network allows for Pairwise Learning. Specifically, Pairwise Learning is used to solve the problem that deep learning network models cannot effectively identify deep network-generated images that are not included in the training process. In addition, different depth features are extracted in the proposed network structure, and the experimental results were obtained on DCGAN and Celeba with a classification accuracy of 97.2% and a recall of 91.6%. In Zhuang [15], as in [13], Pairwise Learning is used to solve the problem that deep learning network models cannot effectively identify deep network-generated images that are not included in the training process, and on this basis a triplet loss is used to learn the relationship between deep network-generated and real images. They also propose a new coupled network to extract features of different depths of the target image, and the experimental results were obtained on DCGAN and Celeba with a classification accuracy of 98.6% and a recall of 98.6%. In Carvalho [16], deep network-generated faces are detected by finding the difference between real and deep network-generated faces in the eyes. Specifically, when the eye's specular highlights are removed from both real and deep network-generated faces, the deep network-generated face presents more artifacts. The bottleneck features of the processed human eyes are extracted using VGG19 [17]. Finally, the eyes are fed into an SVM classifier to classify the deep network-generated and real faces, and the experimental results obtained an auc value of 0.88 on the corresponding HD face data.

Point 3: Discussion Section must be more elaborated - showing what are the advantages and drawbacks

 

Response 3: We have expanded the discussion; the advantages and drawbacks are further described.

 

In this proposed method, we use a new image preprocessing approach to discriminate the faces generated by deep learning networks, in that the pixel synthesis from the deep network-generated face image itself is not the same as the real face synthesis approach. In the future, our work should also focus on this aspect: finding the differences between the faces generated by deep learning networks and real faces.

However, the limitation of this work is that we find that the accuracy decreases rapidly when testing between different generated face models. In the future, we will focus on commonalities between different generative models, with the initial intention to work on domain migration.

 

Change to :

 

         In this proposed method, we use a new image preprocessing approach to discriminate the faces generated by deep learning networks, because the pixel synthesis of a deep network-generated face image is not the same as that of a real face. Because generative adversarial networks do not represent many details in the generated images as correctly as real images, the generated images contain latent factors that can be explored. It is these latent factors, which differ from those of real faces, that we can use to assist our model in classification. In this paper, this latent factor is that the images generated by the generative adversarial network are expressed differently from real images in different color spaces. In the future, our work should also focus on this aspect: finding the differences between faces generated by deep learning networks and real faces.

The proposed image processing method can also be used in other deepfake fields, such as the statistical analysis of the spectrograms of channel-recombined images to achieve classification. Could this solve the problem of an insufficient number of samples?

Even though our proposed method achieves high classification accuracy on the corresponding DCGAN, ProGAN, and StyleGAN datasets, and the model has good robustness against JPEG compression attacks, the limitation of this work is that the accuracy decreases rapidly when testing across different generated face models. In the future, we will focus on commonalities between different generative models, with the initial intention to work on domain migration.

 

Point 4: lines 104-107: papers 7-16 must be referred to as references in the text

 

Response 4: Among the related papers mentioned above: papers 7, 8, 9 and 16 belong to the first category, using the information of the generated image itself for feature extraction and then later input to the classifier for classification. Papers 10, 11, 12, 13 and 15 belong to the category of designing specific deep neural network models that are used to implement classification of the generated face images.

 

Change to:

 

Among the related papers mentioned above: papers [7], [8], [9] and [16] belong to the first category, using the information of the generated image itself for feature extraction and then later input to the classifier for classification. Papers [10], [11], [12], [13] and [15] belong to the category of designing specific deep neural network models that are used to implement classification of the generated face images.

 

 

Sincerely yours,

Songwen Mo

 

 

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

Some of my comments were addressed and improved. The Discussion section must be further elaborated: some drawbacks are presented, but they are not clearly explained (discussed).

Other aspects that must be considered:

-line 58 and following: use AUC instead of auc

-line 312 - use correct references for 5 and 6

Author Response

Dear Reviewer:

The responses to your issues are as follows:

 

Point 1: Some of my comments were addressed and improved. The Discussion section must be further elaborated: some drawbacks are presented, but they are not clearly explained (discussed).

 

Response 1: Even though our proposed method achieves high classification accuracy on the corresponding DCGAN, ProGAN, and StyleGAN datasets, and the model has good robustness against JPEG compression attacks, the limitation of this work is that the accuracy decreases rapidly when testing across different generated face models. In the future, we will focus on commonalities between different generative models, with the initial intention to work on domain migration.

 

Change to:

 

Even though our proposed method achieves high classification accuracy on the corresponding DCGAN, ProGAN, and StyleGAN datasets, and the model still has good robustness against JPEG compression attacks, we find that the classification accuracy of the model decreases when the training dataset of generated faces is GS128 and the test dataset comes from a different model. Its classification accuracy decreases from 99.10% to an average of 95.55% on some datasets (the 99.10% case excluded), and it decreases more rapidly for the [5] (Mo) and [6] (Dang) methods, corresponding to average decreases to 89.93% and 86.62%. In the future, we will focus on commonalities between different generative models, with the initial intention to work on domain migration.

 

Point 2: line 58 and following: use AUC instead of auc

Response 2: McCloskey and Albright [7] discriminate the generated images based on the presence of underexposure or overexposure in real-face images, and an auc value of 0.92 was obtained in the classification of ProGAN and Celeba.

Change to:

McCloskey and Albright [7] discriminate the generated images based on the presence of underexposure or overexposure in real-face images, and an AUC value of 0.92 was obtained in the classification of ProGAN and Celeba.

Matern et al. [8] used the visual artifacts that appear in the eyes, teeth, and facial contours of the generated face images for recognition, and auc values of 0.852 and 0.843 were obtained for ProGAN, Glow, and Celeba classification, respectively.

Change to:

Matern et al. [8] used the visual artifacts that appear in the eyes, teeth, and facial contours of the generated face images for recognition, and AUC values of 0.852 and 0.843 were obtained for ProGAN, Glow, and Celeba classification, respectively.

Finally, the eyes are fed into an SVM classifier to classify the deep network-generated and real faces, and the experimental results obtained an auc value of 0.88 on the corresponding HD face data.

Change to:

Finally, the eyes are fed into an SVM classifier to classify the deep network-generated and real faces, and the experimental results obtained an AUC value of 0.88 on the corresponding HD face data.

 

Point 3: line 312 - use correct references for 5 and 6

 

Response 3: In this section, we compare the proposed approach with the network model structures in papers 5 (Mo) and 6 (Dang).

 

Change to:

 

In this section, we compare the proposed approach with the network model structures in papers [5] (Mo) and [6] (Dang).

Sincerely yours,

Songwen Mo

 

 

Author Response File: Author Response.pdf
