Many technology companies and academics are responding to the “generation” problem by developing “countermeasure” technologies. McCloskey and Albright [7] discriminated generated images based on the presence of underexposure or overexposure in real face images, obtaining an AUC of 0.92 when classifying ProGAN images against CelebA. Matern et al. [8] used the visual artifacts that appear in the eyes, teeth, and facial contours of generated face images for recognition, obtaining AUC values of 0.852 and 0.843 when classifying ProGAN and Glow images against CelebA. Zhang et al. [9] theoretically analyzed the spectral replication artifacts that generators leave in the frequency domain and trained a classifier on spectral rather than pixel inputs to detect generated images, obtaining an average accuracy of 97.2% on CycleGAN.
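The core of this spectral approach can be sketched in a few lines. The code below is our illustration rather than the implementation of [9]: it computes the centered log-magnitude Fourier spectrum that would replace the raw pixels as classifier input.

```python
# Illustrative sketch of the spectral-input idea from [9]: feed the classifier
# the log-magnitude Fourier spectrum, where the up-sampling operations of
# generators tend to leave periodic artifacts. Function name and details are ours.
import numpy as np

def log_spectrum(gray_image: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum of a grayscale image, scaled to [0, 1]."""
    f = np.fft.fftshift(np.fft.fft2(gray_image.astype(np.float64)))
    magnitude = np.log1p(np.abs(f))  # compress the dynamic range
    return (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min() + 1e-8)
```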
Mo et al. [10] used filters with residuals to design a deep learning network model, achieving 96.3% detection accuracy on the corresponding dataset. Dang et al. [11] captured the features of generated face images with a dedicated CGFace layer, reaching an accuracy of 90.1% on the corresponding dataset. In addition to feeding images directly into deep learning models, some work has attempted to improve detection performance by incorporating domain-specific knowledge. Nataraj [12] trained a network to detect generated artifacts on co-occurrence matrices extracted from the images in the RGB pixel domain, and the method obtained an accuracy of 93.78% even after the images were JPEG compressed.
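A minimal sketch of that co-occurrence representation is given below; the choice of horizontally adjacent pixel pairs and the function names are our assumptions for illustration, not details taken from [12].

```python
# Illustrative sketch of the co-occurrence idea from [12]: build a 256x256
# co-occurrence matrix for each RGB channel from adjacent pixel pairs and
# stack them into a 3-channel tensor that a CNN can classify.
import numpy as np

def cooccurrence(channel: np.ndarray) -> np.ndarray:
    """256x256 co-occurrence matrix of horizontally adjacent uint8 pixel values."""
    left = channel[:, :-1].ravel().astype(np.intp)
    right = channel[:, 1:].ravel().astype(np.intp)
    mat = np.zeros((256, 256), dtype=np.float64)
    np.add.at(mat, (left, right), 1)   # accumulate counts of (left, right) pairs
    return mat / mat.sum()             # normalize to a joint histogram

def rgb_cooccurrence_tensor(image_uint8: np.ndarray) -> np.ndarray:
    """Stack one co-occurrence matrix per color channel: shape (3, 256, 256)."""
    return np.stack([cooccurrence(image_uint8[..., c]) for c in range(3)])
```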
In a paper by Hsu [13], a new two-stream network structure was proposed based on a simplified DenseNet [14]. This network allows for pairwise learning, which is used to address the problem that deep learning models cannot effectively identify deep-network-generated images that were not included in the training process. In addition, features at different depths are extracted in the proposed structure, and experiments on DCGAN and CelebA achieved a classification accuracy of 97.2% and a recall of 91.6%. In a study by Zhuang [15], as in [13], pairwise learning was used to address the same generalization problem, and a triplet loss was built on top of it to learn the relationship between deep-network-generated images and real images. A new coupled network was also proposed to extract features of the target image at different depths, and experiments on DCGAN and CelebA achieved a classification accuracy of 98.6% and a recall of 98.6%.
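The pairwise and triplet learning used in [13,15] can be illustrated with a standard triplet margin loss; the toy encoder and tensor shapes below are placeholders rather than the coupled network of [15].

```python
# Illustrative sketch of triplet learning for real vs. generated faces: pull the
# embeddings of two same-class images together and push a counterexample away
# by a margin. The encoder here is a minimal placeholder.
import torch
import torch.nn as nn

embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))

triplet_loss = nn.TripletMarginLoss(margin=1.0)

# anchor/positive share a class (e.g., real faces); negative is a generated face.
anchor, positive, negative = (torch.randn(8, 3, 128, 128) for _ in range(3))
loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
loss.backward()
```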
In a study by Carvalho [16], deep-network-generated faces were detected by exploiting differences between the eyes of real and generated faces. Specifically, when the specular highlights are removed from the eyes of both real and generated faces, the generated face exhibits more artifacts. Bottleneck features of the processed eye regions were extracted with VGG19 [17] and fed into an SVM classifier to separate deep-network-generated from real faces, and the experiments obtained an AUC of 0.88 on the corresponding HD face data.
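The feature-extraction-plus-SVM pipeline of [16] roughly corresponds to the sketch below; the exact feature layer, the preprocessing, and the highlight-removal step are assumptions or omitted here.

```python
# Illustrative sketch of [16]: extract deep features of (highlight-removed) eye
# crops with a pretrained VGG19 and classify them with an SVM. The chosen
# feature layer and the random data are assumptions for illustration only.
import numpy as np
import torch
from torchvision import models
from sklearn.svm import SVC

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

def bottleneck_features(eye_batch: torch.Tensor) -> np.ndarray:
    """eye_batch: (N, 3, 224, 224) normalized eye crops -> (N, 4096) features."""
    with torch.no_grad():
        x = vgg.features(eye_batch)
        x = vgg.avgpool(x).flatten(1)
        x = vgg.classifier[:4](x)  # stop before the final classification scores
    return x.numpy()

# Hypothetical training data: eye crops with labels 0 (real) or 1 (generated).
X_train = torch.randn(16, 3, 224, 224)
y_train = np.random.randint(0, 2, size=16)
svm = SVC(kernel="rbf").fit(bottleneck_features(X_train), y_train)
```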
In summary, methods for discriminating images produced by generative adversarial networks can be divided into two types: the first extracts features from visual artifact defects present in the generated images themselves (e.g., artifacts in the eyes, teeth, and facial contours of generated faces) and uses them for classification, while the second designs a specific deep neural network model to discriminate generated face images. Among the papers discussed above, [7,8,9,16] belong to the first category, using information from the generated image itself for feature extraction and as input to a classifier, whereas [10,11,12,13,15] belong to the second category of designing specific deep neural network models to classify generated face images. The model used in this paper belongs to the second category: a new image processing method is applied to the input, which is then fed to a specific neural network for classification.
In this paper, we propose to use different color space channel recombinations, on top of an existing neural network model, to effectively discriminate generated face images. First, by analyzing how sensitive the deep learning model is to faces represented by different color space components, we give a combination of color space components that effectively improves the model's discrimination rate.
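To make this concrete, the sketch below recombines components from two color spaces into a single three-channel input; the particular combination shown (H, Cr, Cb) is an illustration only and is not necessarily the combination adopted in this paper.

```python
# Illustrative sketch of color space channel recombination: decompose a BGR face
# image into HSV and YCrCb and reassemble a new 3-channel input from selected
# components before feeding it to the network.
import cv2
import numpy as np

def recombine_channels(bgr_image: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Stack hue with the two chroma components (OpenCV order: Y, Cr, Cb).
    return np.dstack([hsv[..., 0], ycrcb[..., 1], ycrcb[..., 2]])
```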
Second, considering the wide application of attention mechanisms in image processing, natural language processing, and speech recognition in recent years, we introduce a channel attention mechanism into the model [18]. With the attention module placed at an appropriate location, the model can effectively extract the features that distinguish real from generated face images. The experimental results show that the proposed scheme can effectively solve the recognition problem for face images generated by deep networks: the classification accuracy reached 99.10% on the relevant dataset, and the model possesses good robustness.
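For reference, a channel attention module in the common squeeze-and-excitation style can be sketched as follows; it is shown only to illustrate the mechanism and is not necessarily the exact module or placement used in our model.

```python
# Illustrative squeeze-and-excitation style channel attention: global average
# pooling summarizes each channel, a small MLP produces per-channel weights,
# and the feature map is rescaled channel by channel.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial average
        self.fc = nn.Sequential(              # excitation: per-channel weights
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                           # reweight the feature channels
```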