Article

Face Aging with Feature-Guide Conditional Generative Adversarial Network

1 School of Information Science and Technology, North China University of Technology, Beijing 100144, China
2 Baidu Inc., Beijing 100085, China
3 University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 2095; https://doi.org/10.3390/electronics12092095
Submission received: 28 February 2023 / Revised: 9 April 2023 / Accepted: 12 April 2023 / Published: 4 May 2023
(This article belongs to the Section Computer Science & Engineering)

Abstract

Face aging is of great importance for the information forensics and security fields, as well as for entertainment-related applications. Although significant progress has been made in this field, the authenticity, age specificity, and identity preservation of generated face images still need further discussion. To better address these issues, a Feature-Guide Conditional Generative Adversarial Network (FG-CGAN) is proposed in this paper, which contains an extra feature guide module and an age classifier module. To preserve the identity of the input facial image during generation, the feature guide module introduces perceptual loss to minimize the identity difference between the input and output face images of the generator, and L2 loss to constrain the size of the generated feature maps. To make the generated image fall into the target age group, the age classifier module constructs an age-estimation loss, in which L-Softmax loss is incorporated to make the boundaries between samples of different categories more distinct. Extensive experiments are conducted on the widely used face aging datasets CACD and Morph. The results show that the target aging face images generated by FG-CGAN achieve promising verification confidence for identity preservation: the verification confidence levels for age groups 20–30, 30–40, and 40–50 are 95.79%, 95.42%, and 90.77%, respectively, which verifies the effectiveness of our proposed method.

1. Introduction

Face aging, also known as the age image generation or age regression problem [1,2,3,4], can be defined as the process of rendering a face image so that it shows natural aging or rejuvenation of the human face while remaining visually convincing. Face aging has a broad range of applications in different fields, such as cross-age facial recognition, searching for lost children, and audio–visual entertainment.
An ideal face aging algorithm should possess the following key characteristics: authenticity, identity preservation, and accuracy in generating images within the target age group. Previous research on facial aging has primarily focused on two categories of methods: physical model-based [4,5,6,7] and prototype-based [2,8,9]. Physical model-based methods rely on adding or removing age-related features, such as wrinkles, gray hair, or beards, according to image generation rules. Prototype-based methods first compute the average face of each age group and then use the differences between age groups as an aging pattern to synthesize aging faces, which does not preserve identity well. However, both kinds of methods often lack a deep understanding of facial semantics, so the generated images may not look authentic.
Along with the rapid development of deep neural networks, generative adversarial networks (GANs) have drawn much attention from researchers in the face aging field [10,11,12,13,14,15,16] and have been shown to generate images with better quality, identity consistency, and aging accuracy than traditional methods. Many studies [17,18] have used unpaired face aging data to train models. However, these methods mainly focus on face aging itself and ignore other key conditional information of the input face (e.g., facial attributes). As a result, incorrect facial attributes may appear in the generated results, and the identity information of the generated face may not be preserved well. To suppress such undesired changes in semantic information during face aging, many recent face aging studies [19,20] have attempted to supervise the output by enforcing identity consistency, which preserves identity information to some extent. However, significant unnatural variation in facial attributes is still observed, which indicates that enforcing identity consistency alone is not sufficient to achieve satisfactory face aging performance.
To combat this, Variational Autoencoders (VAEs) [21] have been combined with GANs to generate new images, and age estimation has been used to help generate aging face images [15]. The subsequent Pyramid Face Aging-GAN [22] incorporates a pyramid weight-sharing scheme to ensure that faces change slightly between adjacent age groups and dramatically between distant age groups. Most existing GAN-based methods [18,23] use pixel-level loss to train the model to preserve identity consistency and background information. However, because pixel-level loss minimizes the Euclidean distance between the synthesized image and the input image, the aging accuracy of the generated results is not very high, which indicates that preserving identity information well does not guarantee a reasonable aging result.
To more effectively preserve the identity information of faces in face aging tasks, a new face aging framework called Feature-Guide Conditional Generative Adversarial Network (FG-CGAN) is proposed in this article. Compared with existing methods in the literature, feature-guide methods are introduced to ensure that the generative model preserves the identity information of face images well. Specifically, constraints are imposed on the network features produced during image generation and on the identity features of the original and generated images. At the same time, to ensure that the generated image falls into the target age group, an age classifier module is attached to the discriminator network. Finally, extensive comparison experiments are conducted. To summarize, the main contributions are as follows:
  • To address identity preservation as well as aging accuracy, a network structure based on a feature guide, named FG-CGAN, is proposed. Extra sub-modules, including a feature guide module in the generator and an age classifier module combined with the discriminator, are attached to tackle identity preservation and aging accuracy.
  • To minimize the distance between the identities of the input and output face images in the feature guide module, perceptual loss is introduced; in addition, L2 loss is introduced to constrain the size of the feature maps generated in the generator module.
  • In the age classifier module, to improve classification accuracy, an age-estimation loss is constructed, in which L-Softmax loss is incorporated to learn intra-class compactness and inter-class separability between features.

2. Related Work

2.1. Face Aging Methods

In this part, representative and inspiring works on face aging are reviewed. Existing face aging research can be divided into three phases: physical model-based methods, prototype-based methods, and deep generative model-based methods.
Physical model-based methods: as seen in early face aging applications [4,5,6,7], intuitively adding or smoothing “age factors” in an image is a simple way to simulate the appearance of a face at a target age. The advantage of these methods is that they are easy to apply, because they only require adding artificial elements to existing face images. However, these methods do not guarantee the visual authenticity of the generated faces, and the preservation of identity information is not considered.
Prototype-based methods: prototype-based methods take the average face of each age group as the prototype and map the differences between age groups onto the input face image. Refs. [2,8] exploit the differences between the average faces of different age groups to transfer age patterns. However, these methods usually ignore the differences between individuals, and some important age characteristics may be lost due to averaging.
Deep generative model-based methods: with the rapid development of deep learning, deep generative models are widely used to synthesize aging face images. Refs. [24,25] use deep generative models with temporal architectures to synthesize face images. However, the most critical problem of these methods is that multiple facial images of the same person at different ages are needed during training, so their potential in practical applications is limited. The appearance of GANs reshaped research on face generation. Ref. [26] proposes the conditional generative adversarial network, which brings a supervised learning scheme to GANs. Restricting the feature information of the latent vectors in the down-sampling stage with L2 regularization allows the network to minimize the difference between the original and generated images during training, which helps preserve identity information; however, it may prevent the generated image from reaching the target age group.
To generate enhanced aging face images, Sharma et al. [27] use a fusion-based generative adversarial network. Zhu et al. [10] propose a spatial attention mechanism-based GAN, which limits image modification to areas closely related to age changes and thus helps maintain high visual fidelity when synthesizing images under unknown circumstances. Ref. [11] achieves good results in pixel-based image migration tasks. However, attention/semantic-based work cannot describe local areas of the input face well, and a deeper entanglement between age characteristics and identity information still needs to be addressed.
Hence, improved generative adversarial network methods have been explored. DualGAN [12], DiscoGAN [13], and CycleGAN [14] utilize cycle consistency between the input image and the generated image, which keeps the identity of the generated face the same as that of the input face. However, such generative networks can settle into a stable multimodal generation state in which the inputs are effectively ignored. Wang et al. [15] propose adding identity preservation measures to the generator, i.e., using an L2 constraint on the hidden vector to decrease the loss of identity information during down-sampling. However, such an approach may reduce the diversity of the generated images; it also forces the retention of unnecessary identity information, which may produce images that do not belong to the expected age groups.
Most previous works focus on progression and neglect the discussion of a wide range of age transformations. To address this issue, Makhmudkhujaev et al. [28] propose Re-Aging GAN, which learns personalized age features through high-level interaction between a given identity and the target age. These features include identity and target age information, which provides an important indication of how the input face should look at a specific age. How to take advantage of relevant facial conditions is also a research focus. Shen et al. [29] propose the InterFaceGAN framework to learn facial semantic information in the latent space, which can manipulate the corresponding facial attributes without retraining the model; controlling the attribute operations more accurately can change the facial pose and repair artifacts accidentally generated by GANs. Liu et al. [30] propose embedding the facial attribute vector into the generator and discriminator so that each synthesized aging face image abides by its corresponding input attributes. Yang et al. [16] use coupling to model the constraints of subject-specific intrinsic features and of age-specific facial changes over time, respectively. To render realistic facial details, the age-specific features conveyed by the synthetic faces are estimated at multiple scales by pyramid adversarial discriminators. A3GAN [31] embeds facial attribute vectors into the generator and discriminator, which helps synthesized faces stay faithful to the attributes of the corresponding input; it also utilizes attention mechanisms to limit modifications to age-relevant regions and thereby preserve image detail.
In conclusion, how to take advantage of useful conditional information and preserve the identity of the input face while guaranteeing aging accuracy and visual quality remains the main challenge of face aging.

2.2. Age Prediction/Estimation Methods

Age prediction involves recognizing and predicting age information from facial images, analyzing and processing various features such as wrinkles, eye bags, and facial contours that are important in the aging process of the face. Accurate age prediction provides fundamental data to help facial aging algorithms simulate and predict changes that occur in the face over time. This task is critical for applications related to face recognition, surveillance, and human-computer interaction.
In recent years, deep learning technology has greatly improved the performance of facial age prediction. Deep learning methods can automatically learn feature representations and model parameters without the need for manual design and extraction of features, resulting in improved prediction performance. For instance, Levi et al. [32] propose a method for age prediction using convolutional neural networks (CNN), achieving excellent prediction results. Rothe et al. [33] propose a method for age prediction based on VGG-16 architecture, which can predict real and apparent ages from a single image without the need for facial landmarks.
Li et al. [34] propose a label refinery network (LRN) and a slack regression refinement method that can progressively learn specific age-label distributions for different facial images without assumptions on fixed distribution formulations. To reduce the overlap of face features between adjacent ages and improve age prediction accuracy, Xia et al. [35] propose a face age estimation method that considers various factors affecting biometric information in a face image. The proposed multi-stage feature constraints learning method refines features through three stages to reduce overlap and increase the discrimination between age ranges, resulting in improved accuracy and fast estimation.
Our proposed framework builds upon the foundation of GANs: the generator utilizes an encoder–decoder architecture to generate images, while the discriminator follows existing methods. While our encoder and decoder network structure is similar to existing works, we introduce several modifications that enhance performance (see details below). One unique aspect of our framework is the integration of two additional subnetworks: the feature guide module and the age classifier module. These modules provide information about how a face should look at a given age and guide the image generation process. The method proposed in this article not only preserves the identity information of the face but also produces more visible and intuitive age changes. The generated images look more realistic and reliable, and facial features are well preserved, which is crucial in applications such as age progression and regression.

3. Methodology

As illustrated in Figure 1, the proposed framework contains four components: the generator module, the feature guide module, the age classifier module, and the discriminator module. To ensure fair comparisons, the age categories are divided into five groups: 10–20, 20–30, 30–40, 40–50, and over 50 years old. One-hot labels are used to indicate the age groups: the dimension corresponding to the target age group is set to 1 and all other dimensions are set to 0.
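For concreteness, the sketch below shows one way to realize this conditioning in PyTorch (an assumption; the paper does not name its framework): the one-hot label is tiled into spatial channels and concatenated with the image tensor. The channel layout and function name are illustrative only.

```python
import torch
import torch.nn.functional as F

NUM_AGE_GROUPS = 5  # 10-20, 20-30, 30-40, 40-50, 50+

def add_age_condition(images: torch.Tensor, age_group: torch.Tensor) -> torch.Tensor:
    """Append tiled one-hot age channels to an image batch.

    images:    (B, 3, H, W) float tensor
    age_group: (B,) long tensor with values in [0, NUM_AGE_GROUPS)
    returns:   (B, 3 + NUM_AGE_GROUPS, H, W)
    """
    b, _, h, w = images.shape
    one_hot = F.one_hot(age_group, NUM_AGE_GROUPS).float()        # 1 at the target age group
    cond = one_hot.view(b, NUM_AGE_GROUPS, 1, 1).expand(-1, -1, h, w)
    return torch.cat([images, cond], dim=1)

# Example: condition two 128x128 images on the 20-30 group (index 1).
batch = add_age_condition(torch.randn(2, 3, 128, 128), torch.tensor([1, 1]))
print(batch.shape)  # torch.Size([2, 8, 128, 128])
```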
The purpose of FG-CGAN is to generate an aging face that conforms to the target age group from the original face image. The generator module is composed of an encoder and a decoder. During face generation, the feature guide module extracts the corresponding feature maps of the encoder and decoder and makes them conform to an L2 constraint. At the same time, L2 is also used to constrain the identity features extracted by a pre-trained network. The discriminator is used to distinguish whether the generated face is real or fake. The age classifier module determines whether the age of the generated face is within the target age group. The details of this framework are explained below.

3.1. Base Network

The fundamental architecture of the network comprises a generator and a discriminator that utilize GAN principles. The generator has been modified from the Variational Autoencoder (VAE) [21] architecture to enhance its performance.
Generator module: given an input face image $x \in \mathbb{R}^{h \times w \times N}$ and a target age group vector $C_g \in \mathbb{R}^{N}$, where $h$ and $w$ represent the height and width of a feature map and $N$ represents the number of age groups. To generate a synthetic face image $x_t$ within the target age group $C_t$, a generator $G$ is built, following the design of the VAE. The synthetic face image is given by Equation (1):

$$x_t = G(x, C_t), \quad (1)$$

$G$ consists of both an encoder and a decoder.
The encoder module aims to encode the high-dimensional input $x$ into a low-dimensional latent vector, thereby forcing the neural network to learn the most informative features. The encoder module in this paper is constructed as a fully convolutional neural network: to preserve the semantic information of the image, the fully connected layer is replaced with a convolution layer in our proposed network. The number of input channels of the input layer is adjusted to be consistent with the image dimensions after adding the one-hot coding. This encoder structure gives the generator representation-learning ability in the hidden space, so that the data can be manipulated at the semantic level through interpolation or conditional embedding in the hidden variable space [21].
The decoder module aims to restore the latent vector of the hidden layer to the initial dimensions, making the output approximate the input $x$. The fractional-step (transposed) convolution structure is used to construct the decoder network.
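As a rough illustration of such a symmetric fully convolutional encoder with a fractional-step (transposed) convolution decoder, the sketch below uses assumed channel counts and depths rather than the paper's exact architecture; the per-layer features are returned so that the feature guide module of Section 3.2 can compare them.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Minimal symmetric encoder-decoder sketch (channel sizes are assumed)."""

    def __init__(self, in_ch=8):  # 3 image channels + 5 one-hot age channels
        super().__init__()
        self.encoder = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(inplace=True)),
        ])
        self.decoder = nn.ModuleList([
            nn.Sequential(nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh()),
        ])

    def forward(self, x):
        enc_feats, dec_feats = [], []
        for layer in self.encoder:          # down-sample to the latent representation
            x = layer(x)
            enc_feats.append(x)
        for layer in self.decoder:          # fractional-step (transposed) convolutions
            x = layer(x)
            dec_feats.append(x)
        return x, enc_feats, dec_feats      # x is the synthesized face in [-1, 1]
```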
Discriminator module: this module is used to determine the authenticity of the synthetic image $x_t$ and is functionally consistent with the discriminator in the original generative adversarial network, that is, $\max_{D} D(x_t, y)$, where $D$ is a discriminator that judges the facticity of the synthetic face and $y$ is a real target image belonging to the target age group $C_t$.
Adversarial loss: the original GANs cannot generate images with specific attributes. The core of CGANs [26] is to integrate attribute information into generator G and discriminator D, where the attribute information can be any label information. The objective function of CGANs can be expressed as Equation (2):
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_x(x)}\left[\log D(x \mid C_t)\right] + \mathbb{E}_{y \sim p_y(y)}\left[\log\left(1 - D(G(y \mid C_t))\right)\right], \quad (2)$$

where $p_x(x)$ and $p_y(y)$ denote the probability distributions of $x$ and $y$, respectively.
However, CGANs share the same drawback as the original GANs: they employ cross entropy as the loss function, so generated samples that already lie far from the decision boundary incur only a small loss. The resulting instability in the training process leads to low-quality output from the generator. In contrast, LSGAN [36] minimizes the distance between the generated and real faces, making it difficult for the discriminator to distinguish between them. FG-CGAN adopts the conditional LSGAN function as its adversarial loss, given in Equations (3) and (4):
$$L_D = \frac{1}{2}\mathbb{E}_{x \sim p_x(x)}\left[\left(D(x \mid C_t) - 1\right)^2\right] + \frac{1}{2}\mathbb{E}_{y \sim p_y(y)}\left[D\left(G(y \mid C_t)\right)^2\right], \quad (3)$$

$$L_G = \frac{1}{2}\mathbb{E}_{y \sim p_y(y)}\left[\left(D\left(G(y \mid C_t)\right) - 1\right)^2\right]. \quad (4)$$
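A minimal sketch of these least-squares losses follows, with `d_real` the discriminator scores on real target-age images and `d_fake` its scores on synthesized faces; the conditioning on $C_t$ is assumed to happen inside the discriminator.

```python
import torch

def lsgan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator loss: push real scores toward 1 and fake scores toward 0."""
    return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def lsgan_g_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Generator loss: push scores of synthesized faces toward 1."""
    return 0.5 * ((d_fake - 1.0) ** 2).mean()
```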

3.2. Feature Guide Module

Preserving the identity information of the input face is a critical requirement in the face aging process. However, utilizing only the adversarial loss to make the generated and target sample distributions similar may not adequately preserve identity information. To address this issue, we introduce an identity information preservation function that supervises the image generation process using features extracted from the network. This function is implemented as the feature guide module.
This module obtains the feature maps of the encoder and decoder separately and compares the corresponding feature maps. This requires the encoder and decoder to be symmetrical in the network design. The advantage is that the corresponding extracted feature maps have the same size and can be fed into the same pre-trained network, ensuring that the final output is a one-dimensional vector for comparison under the L2 norm. Thus, to match the corresponding feature maps, $L_{layer}$ is defined in Equation (5):
$$L_{layer} = \sum_{i=1}^{k} \left\| f_{encoder}^{i} - f_{decoder}^{i} \right\|_2, \quad (5)$$
Here $k$ represents the total number of network layers, and $f_{encoder}^{i}$ and $f_{decoder}^{i}$ represent the feature maps of the encoder and the decoder at the $i$-th layer, respectively.
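Assuming the generator returns its per-layer features as in the sketch of Section 3.1, $L_{layer}$ can be approximated as below; the pairing reflects the symmetry assumption that each decoder layer restores the shape produced by its mirror-image encoder layer.

```python
import torch

def layer_guide_loss(enc_feats, dec_feats):
    """L_layer: L2 distance between symmetric encoder/decoder feature maps.

    enc_feats[-1] is the bottleneck and dec_feats[-1] the output image, so
    only the interior layers have same-shaped counterparts to compare.
    """
    pairs = zip(reversed(enc_feats[:-1]), dec_feats[:-1])
    return sum(torch.norm(e - d, p=2) for e, d in pairs)
```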
For identity consistency, the perceptual loss is introduced to minimize the distance between the identities of the input and output face images of the generator, as shown in Equation (6):
$$L_{id} = \mathbb{E}_{x \sim p_x(x)}\left[\left\| h_{id}(x_t) - h_{id}(x) \right\|_2\right], \quad (6)$$
where $h_{id}(\cdot)$ represents the features extracted from a specific layer of the pre-trained model with $x$ as input. This difference metric over the paired feature vectors preserves the identity information between the original and generated images. The pre-trained network uses ResNet-34 as its basic structure to classify the age of the generated images. L2 rather than L1 is used as the metric because L1 is pixel-based and strongly supervises each pixel, pushing the generator toward conservatively reproducing the original image; as a consequence, the generated images may lack diversity. The overall loss function for identity preservation is given in Equation (7):
$$L_{identity} = L_{id} + L_{layer}. \quad (7)$$
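A sketch of the identity term follows, assuming a frozen torchvision ResNet-34 trunk stands in for the pre-trained extractor $h_{id}$ (the actual pre-trained weights used in the paper are not reproduced here).

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen ResNet-34 trunk (everything up to the final fc layer) as h_id.
_backbone = models.resnet34(weights=None)
h_id = nn.Sequential(*list(_backbone.children())[:-1]).eval()
for p in h_id.parameters():
    p.requires_grad_(False)

def identity_loss(x: torch.Tensor, x_t: torch.Tensor) -> torch.Tensor:
    """L_id: L2 distance between pre-trained features of input and output faces."""
    fx = h_id(x).flatten(1)      # (B, 512) feature vectors
    fxt = h_id(x_t).flatten(1)
    return torch.norm(fx - fxt, p=2, dim=1).mean()
```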

3.3. Age Classifier Module

The age classifier module is used to classify the age group $C_t$ of the generated face image $x_t$. This module is primarily structured as ResNet-34.
In addition to meeting the requirements of visual perception, the synthetic face image must also satisfy the target age group condition. To achieve this, the generator relies on the age classifier module to regulate the age distribution of the synthetic image through an estimated loss, enabling it to generate a synthetic image $x_t$ that conforms to the target age condition by comparison with the target image $y$. The loss function of the age classifier module is given in Equation (8):
$$L_{age} = -\frac{1}{M} \sum_{s=1}^{M} \sum_{j=1}^{N} \mathrm{sign} \cdot \log P, \quad (8)$$
where $M$ represents the number of samples and $\mathrm{sign}$ is an indicator that takes the value 1 if the age group of the sample equals the true group $C_t$ and 0 otherwise; $P$ denotes the predicted probability of the corresponding age group.
The L-Softmax loss is introduced to learn the intra-class compactness and inter-class separability between features. This loss function enhances the distinguishability of sample boundaries for different categories by adjusting the inter-class angle boundary constraints. By multiplying the preset constant with the angle between the sample and the ground truth class, an angular margin is created. The strength of the margin around the ground truth category is determined by the preset constant, allowing the L-Softmax loss to be customized according to the task requirements.
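The sketch below illustrates the large-margin idea in simplified form: for the ground-truth class, the angle between the feature and the class weight is multiplied by a preset margin before the logit is rebuilt. The original L-Softmax formulation uses a piecewise $\cos(m\theta)$ expansion for numerical stability, which this illustrative version omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSoftmaxHead(nn.Module):
    """Simplified large-margin softmax classification head (illustrative)."""

    def __init__(self, in_features: int, num_classes: int, margin: float = 2.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.margin = margin

    def forward(self, feats: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        w = F.normalize(self.weight, dim=1)
        f = F.normalize(feats, dim=1)
        cos = f @ w.t()                                     # cos(theta) per class
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        cos_margin = torch.cos(self.margin * theta)         # widened angular gap
        one_hot = F.one_hot(target, cos.size(1)).bool()
        norms = feats.norm(dim=1, keepdim=True)
        logits = torch.where(one_hot, cos_margin, cos) * norms
        return F.cross_entropy(logits, target)

# Example: 512-d features classified into the five age groups.
loss = LSoftmaxHead(512, 5)(torch.randn(4, 512), torch.tensor([0, 1, 2, 4]))
```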

3.4. Overall Objective Function

The final integrated objective function can be obtained by combining the aforementioned equations:
$$L_{final}^{G} = \lambda_G L_G + \lambda_{identity} L_{identity} + \lambda_{age} L_{age}, \quad (9)$$

$$L_{final}^{D} = L_D, \quad (10)$$
where $\lambda_G$, $\lambda_{identity}$, and $\lambda_{age}$ are hyper-parameters used to balance the weights of the objective function terms.
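In code, combining Equations (9) and (10) is direct; the weight values below are placeholders, since the text defers the actual hyperparameter settings to the experiments (Section 4.2).

```python
# Placeholder balance weights; the paper follows Fang et al. [40] for these.
LAMBDA_G, LAMBDA_IDENTITY, LAMBDA_AGE = 1.0, 1.0, 1.0

def generator_total_loss(l_g, l_identity, l_age):
    """Equation (9): weighted sum of adversarial, identity, and age losses."""
    return LAMBDA_G * l_g + LAMBDA_IDENTITY * l_identity + LAMBDA_AGE * l_age

def discriminator_total_loss(l_d):
    """Equation (10): the discriminator objective is the adversarial loss alone."""
    return l_d
```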

4. Experiments and Evaluation

4.1. Dataset

To ensure fair comparisons, the CACD dataset [37] is used to evaluate face generation with identity preservation. This dataset comprises over 160,000 face images, with variations in pose, illumination, and expression, collected from 2000 celebrities aged between 16 and 62. All the images are age-annotated, although not very accurately. We first use target detection to calibrate the face position and then apply several data augmentation steps to the input images, including adjusting saturation and brightness, horizontally flipping the image, randomly rotating it, and normalizing it. The final dataset comprises approximately 146,794 images with a resolution of 400 × 400 pixels, split into training and validation parts with 90% and 10% of the images, respectively. The face images are divided into five age groups: 10–20, 20–30, 30–40, 40–50, and over 50 years old, with 8656, 36,662, 38,736, 35,768, and 26,972 samples, respectively. To further validate the effectiveness of our method, we utilize the Morph dataset [38] for testing. Table 1 shows a comparison of the two datasets.
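A plausible torchvision rendering of this augmentation pipeline is sketched below; the jitter and rotation magnitudes are assumptions, as the text does not state exact values.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, saturation=0.2),  # adjust brightness/saturation
    transforms.RandomHorizontalFlip(),                       # horizontal flip
    transforms.RandomRotation(degrees=10),                   # random rotation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # map to [-1, 1]
])
```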

4.2. Experimental Details

In this section, the FG-CGAN method is compared with IPCGANs [15], acGANs [39], and CAAE [19], which can generate realistic face images of specific age groups under identity constraints.
The input images to the generation network have a size of 128 × 128 × 3. The training parameters of the network follow the strategy of Wang et al. [25]. The age recognition network described in Section 3.3 is used as a feature extractor to extract the feature maps between symmetric network layers. During the training phase, the batch size is set to 32 and the learning rate to 0.001. Adam is chosen as the optimizer, given that generative adversarial networks are themselves difficult to train and optimize. In total, the whole network is trained for 50,000 iterations. The hyperparameter settings are consistent with Fang et al.'s work [40].
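In code, the stated training configuration would look roughly as follows; the generator reuses the sketch from Section 3.1, and the discriminator here is a placeholder, since the paper does not detail its layers.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the text.
BATCH_SIZE = 32
LEARNING_RATE = 1e-3
ITERATIONS = 50_000

generator = Generator()                 # encoder-decoder sketch from Section 3.1
discriminator = nn.Sequential(          # placeholder patch-style critic
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, 2, 1),
)

# Adam is chosen because GANs are notoriously hard to train and optimize.
opt_g = torch.optim.Adam(generator.parameters(), lr=LEARNING_RATE)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=LEARNING_RATE)
```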

4.3. Influence of the Age Classifier Module's Classification Accuracy

To supervise the training process of the generative adversarial network, we utilize age classifier modules with varying classification accuracy to identify the optimal classifier network for the current generative adversarial network. During the training of the age classifier module, we preserve the models with training accuracies of 62%, 72%, and 82%, respectively. These three models are then incorporated into the model training process. The experimental results for the different accuracies are presented in Figure 2.
Figure 2a depicts the final training output when using the age classifier module with an accuracy of 82%. The generated image in Figure 2a displays a trend whereby the generative adversarial network is less likely to alter the face image, due to the larger penalty imposed by the age classifier module. A higher classification accuracy can reduce diversity across age groups, causing the generated images to remain similar or unchanged from the original images. Conversely, using a model with low classification accuracy, as seen in Figure 2b, results in obvious artifacts and unrealistic features in the generated images. A lower classification accuracy leads to less punishment during training, causing the generative adversarial network to generate images more haphazardly. Based on these results, a model with a classification accuracy of 72% is selected as the age classifier module. The weights of the age classifier module are set according to the experimental findings in IPCGANs.

4.4. Intuitive Visual Display

Visual effect: Figure 3 shows the outputs for images randomly selected from the Morph dataset in the 11–20 and 21–30 age groups as inputs. It can be seen directly that the algorithm proposed in this paper performs well in preserving face identity information while producing diverse age changes, including changes in hair color, an increase in facial wrinkles, and an enlargement of the jaw with age. These changes are in line with people's intuitive understanding of how a face ages.
Intuitive comparison: building on previous work, we compare the effects of different face-generation methods. We randomly extract four face images from the CACD dataset. Different models are then used to generate images for the corresponding age groups to visually compare the realism and information preservation of the images. Since IPCGANs do not provide an official implementation, we re-implement the IPCGANs network and use the publicly available acGANs model. We evaluate the effects of the different algorithms and explain the problems and advantages of each in generating images. Figure 4a–d shows the comparison results:
Figure 4a–d provides a comprehensive comparison of the different face-generation methods. The first column shows the selected original face image, and the last three columns show the generated results for the age groups 20–30, 30–40, and 40–50. In Figure 4a, the age change of the faces generated by the acGANs algorithm is not very evident, and a certain level of smoothing is observed. In contrast, the age change of the images generated by IPCGANs and the method proposed in this paper is more apparent, and the identity information of the face is well preserved.
In Figure 4c, acGANs still exhibit a noticeable smoothing effect, and the images suffer from blurred backgrounds. While IPCGANs have a considerable impact on age change and identity preservation, noise appears, resulting in unclear images. The approach proposed in this paper preserves the identity characteristics of the face relatively well and generates images that meet the target age range. Moreover, the images are clearer than those generated by IPCGANs.
In Figure 4b, acGANs result in a certain degree of face identity information loss, and the significant smoothing effect even makes it difficult to recognize the gender of the person in the image. Both IPCGAN and the method proposed in this paper perform well.
In Figure 4d, acGANs still suffer from face identity information loss, and the age change is not very evident. IPCGANs differ from our method in the direction of image aging: IPCGANs prefer to add beards and other elements to show the age difference, while our algorithm highlights age-related changes such as the wrinkles and hair color changes that occur as a face ages.
After the above comparison analysis, it is evident that the method proposed in this paper produces more realistic and reasonable age changes than IPCGANs, highlighting features that change with age, such as wrinkles, more consistently. Both approaches perform well in preserving the identity information of the face. In contrast, the images generated by acGANs exhibit severe skin-smoothing effects, causing the loss of identity information; it is difficult to tell whether the images before and after generation correspond to the same person, or even to estimate the gender of the original face. In summary, the approach proposed in this paper and IPCGANs generate images that appear more reliable and realistic compared with acGANs.
Overall, among the three models, the images generated by our method preserve the identity information of the face intact while making the age changes more obvious and intuitive.
Effectiveness of our algorithm: the generation results presented above primarily focus on gradually generating aging images from young images, which does not fully demonstrate the algorithm's effectiveness. To address this, we also verify generation from middle age toward both younger and older ages. Specifically, we randomly select the faces of individuals aged 30–45 years and input them into our network to generate images of both young and old age. The experiment is illustrated in Figure 5a–d, and we evaluate the generation effect of each image.
Figure 5a depicts the original faces of individuals aged 41 and 42 as input, with the generated facial appearance displaying a gradual change in wrinkles and graying of hair.
Similarly, Figure 5b shows the increased wrinkles, graying of hair, and enlargement of the jaw, while preserving the identity information features of the face. The generated images meet the expectations of the experiment. Figure 5c also demonstrates a similar effect to the two groups in Figure 5a,b.
However, in Figure 5d, a small number of whiskers appear on the female face in the first row of images. This can be attributed to the principle of generative adversarial networks, which only fit the distribution of data and may struggle to ensure the strict distinction between the male and female sexes throughout the training process.

4.5. Quality Comparison

Face aging aims to convert the face of the input image to the target age while preserving personal identity. Therefore, the face aging model can be evaluated from two perspectives: (1) how well the identity of the original image is preserved; (2) what is the quality of the age classification of the generated aging images.
Identity preservation: we randomly select 500 samples and calculate the face verification confidence by comparing the input images with the images generated for each age group. This approach comprehensively evaluates the performance of each generative model in terms of preserving identity and achieving accurate age classification. CAAE, acGAN, and IPCGAN are measured for comparison.
To evaluate identity preservation in our face aging experiments, we conduct face verification experiments using FaceNet. For each input image of a young face, not only are the original input and the generated faces compared, but the generated faces are also compared with each other. In addition, we verify the face verification rate for each age group using the different methods on the dataset.
Table 2 reports the verification confidences between synthetic aging images from different age groups of the same face, where high verification confidence indicates consistent preservation of identity information. Notably, as the face ages, verification confidence decreases, indicating that face aging changes appearance. Table 3 reports the face verification rates, where the threshold is set to 1.0 in FaceNet. Although IPCGAN has an identity retention module, our proposed feature guide module improves identity preservation further.
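A FaceNet-style verification check can be sketched as follows, where the two embeddings are assumed to come from a FaceNet model and the 1.0 squared-distance threshold follows the setting reported above.

```python
import torch

def same_identity(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  threshold: float = 1.0) -> bool:
    """Declare a match if the squared L2 distance between L2-normalized
    face embeddings falls below the verification threshold."""
    dist = torch.sum((emb_a - emb_b) ** 2).item()
    return dist < threshold
```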
Accurate age classification: at this stage, volunteers are asked to estimate the age of the images. One hundred images from the Morph and CACD datasets are randomly selected as input to the generative models. Then, the different generative models are used to generate images of the target age groups, so that each model yields the original 100 images and 300 generated images. From the 100 original images we randomly select 20 images, and from the corresponding 300 generated images we randomly select another 20, distributing them to volunteers for judgment in two directions.
For face verification, we generate three images for each input image based on the age label. These images are then grouped into three pairs: (input image, age label 0 image), (age label 1 image, age label 2 image), and (age label 3 image, an image randomly selected from the other generated face images). The first two pairs are used to verify whether the generated images belong to the same person, while the third pair is used to verify whether the generated images resemble other faces. Volunteers complete the face verification task, and we compare the results of the different methods. The accuracy is calculated using the following formula:
$$acc = \frac{k_p + k_n}{N_p + N_n},$$
$N_p$ and $N_n$ represent the total number of sample pairs in the first two groups and in the third group, respectively. $k_p$ and $k_n$ represent the number of pairs judged to be the same person in the first two groups and the number of pairs judged to be different people in the third group.
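As a worked example of the acc formula with hypothetical counts:

```python
def verification_acc(k_p: int, k_n: int, n_p: int, n_n: int) -> float:
    """acc = (k_p + k_n) / (N_p + N_n): correct volunteer judgments over all pairs."""
    return (k_p + k_n) / (n_p + n_n)

# Hypothetical counts: 180 of 200 genuine pairs judged "same person",
# 90 of 100 impostor pairs judged "different person".
print(verification_acc(180, 90, 200, 100))  # 0.9
```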
For age classification, volunteers estimate the age of each image they receive, that is, they vote for the age group to which the image belongs. After collecting the votes for the different models, we record the percentage of face images whose target age is consistent with the users' estimates.
On the other hand, to ensure objectivity in the comparison, we use the face-comparing service provided by Face++ to evaluate how well the generated images match the face information of the original images. Similarly, following previous work, the VGG-face score is used to measure image quality. Table 4 shows the results of these experiments.

5. Conclusions

Face-based human identification is still a complex problem when it involves images of a person from different age groups. Cross-age applications, such as tracking missing children after many years and surveillance, are challenging mainly due to the lack of labeled cross-age face datasets. Hence, generating visually realistic faces with GAN-based methods is a promising approach. However, identity preservation and aging accuracy are essential characteristics for most cross-age applications. In this paper, we tackle these issues with the proposed Feature-Guide Conditional Generative Adversarial Network (FG-CGAN), which is composed of four sub-modules: the generator module, the feature guide module, the age classifier module, and the discriminator module. During face generation, perceptual loss combined with an L2 constraint is introduced in the feature guide module to minimize the distance between the identities of the input and output face images. In the age classifier module, to improve classification accuracy, an age-estimation loss is constructed, in which L-Softmax loss is incorporated to learn intra-class compactness and inter-class separability between features. Sufficient experiments are conducted on the widely used face aging datasets CACD and Morph, and the encouraging results verify the effectiveness of our proposed method.
In subsequent work, we will address some limitations of this method. For example, it does not consider facial differences between races. In addition, although L-Softmax loss is introduced to improve classification accuracy, further validation is needed to evaluate the applicability of the method to other datasets and scenarios. Therefore, the universality and applicability of this method need further evaluation, and we hope future research can address these issues more effectively.

Author Contributions

Conceptualization, X.L. and G.Y.; methodology, C.L., Y.L., Z.W. and X.L.; software, Y.L. and Z.W.; investigation, G.Y.; writing—original draft preparation, C.L., Y.L. and Z.W.; writing—review and editing, C.L., Y.L. and X.L.; supervision, X.L. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the Research Project of the Beijing Young Topnotch Talents Cultivation Program (Grant No. CIT&TCD201904009), partially by the National Natural Science Foundation of China (Grant Nos. 62172006 and 61977001), and by the Great Wall Scholar Program (CIT&TCD20190305).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tazoe, Y.; Gohara, H.; Maejima, A.; Morishima, S. Facial aging simulator considering geometry and patch-tiled texture. In ACM SIGGRAPH 2012 Posters; Association for Computing Machinery: New York, NY, USA, 2012; p. 1. [Google Scholar]
  2. Kemelmacher-Shlizerman, I.; Suwajanakorn, S.; Seitz, S.M. Illumination-aware age progression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3334–3341. [Google Scholar]
  3. Lanitis, A.; Taylor, C.J.; Cootes, T.F. Toward automatic simulation of aging effects on face images. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 442–455. [Google Scholar] [CrossRef]
  4. Suo, J.; Zhu, S.C.; Shan, S.; Chen, X. A compositional and dynamic model for face aging. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 385–401. [Google Scholar]
  5. Suo, J.; Chen, X.; Shan, S.; Gao, W.; Dai, Q. A concatenational graph evolution aging model. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2083–2096. [Google Scholar] [PubMed]
  6. Ramanathan, N.; Chellappa, R. Modeling shape and textural variations in aging faces. In Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, The Netherlands, 17–19 September 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–8. [Google Scholar]
  7. Ramanathan, N.; Chellappa, R. Modeling age progression in young faces. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 1, pp. 387–394. [Google Scholar]
  8. Liu, C.; Yuen, J.; Torralba, A. SIFT Flow: Dense Correspondence across Scenes and Its Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 978–999. [Google Scholar] [CrossRef] [PubMed]
  9. Rowland, D.; Perrett, D. Manipulating facial appearance through shape and color. IEEE Comput. Graph. Appl. 1995, 15, 70–76. [Google Scholar] [CrossRef]
  10. Zhu, H.; Huang, Z.; Shan, H.; Zhang, J. Look globally, age locally: Face aging with an attention mechanism. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1963–1967. [Google Scholar]
  11. Peng, F.; Yin, L.P.; Zhang, L.B.; Long, M. CGR-GAN: CG facial image regeneration for Antiforensics based on generative adversarial network. IEEE Trans. Multimed. 2019, 22, 2511–2525. [Google Scholar] [CrossRef]
  12. Yi, Z.; Zhang, H.; Tan, P.; Gong, M. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857. [Google Scholar]
  13. Kim, T.; Cha, M.; Kim, H.; Lee, J.K.; Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the International Conference on Machine Learning. PMLR, Sydney, Australia, 6–11 August 2017; pp. 1857–1865. [Google Scholar]
  14. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  15. Wang, Z.; Tang, X.; Luo, W.; Gao, S. Face aging with identity-preserved conditional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7939–7947. [Google Scholar]
  16. Yang, H.; Huang, D.; Wang, Y.; Jain, A.K. Learning Continuous Face Age Progression: A Pyramid of GANs. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 499–515. [Google Scholar] [CrossRef] [PubMed]
  17. Li, P.; Hu, Y.; Li, Q.; He, R.; Sun, Z. Global and local consistent age generative adversarial networks. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1073–1078. [Google Scholar]
  18. Yang, H.; Huang, D.; Wang, Y.; Jain, A.K. Learning face age progression: A pyramid architecture of gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 31–39. [Google Scholar]
  19. Zhang, Z.; Song, Y.; Qi, H. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5810–5818. [Google Scholar]
  20. Antipov, G.; Baccouche, M.; Dugelay, J.L. Face aging with conditional generative adversarial networks. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2089–2093. [Google Scholar]
  21. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
  22. Pantraki, E.; Kotropoulos, C. Face Aging Using Global and Pyramid Generative Adversarial Networks. Mach. Vision Appl. 2021, 32, 82. [Google Scholar] [CrossRef]
  23. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  24. Duong, C.N.; Luu, K.; Quach, K.G.; Bui, T.D. Longitudinal Face Modeling via Temporal Deep Restricted Boltzmann Machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  25. Wang, W.; Cui, Z.; Yan, Y.; Feng, J.; Yan, S.; Shu, X.; Sebe, N. Recurrent face aging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2378–2386. [Google Scholar]
  26. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  27. Sharma, N.; Sharma, R.; Jindal, N. Prediction of face age progression with generative adversarial networks. Multimed. Tools Appl. 2021, 80, 33911–33935. [Google Scholar] [CrossRef] [PubMed]
  28. Makhmudkhujaev, F.; Hong, S.; Kyu Park, I. Re-Aging GAN: Toward Personalized Face Age Transformation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3888–3897. [Google Scholar]
  29. Shen, Y.; Yang, C.; Tang, X.; Zhou, B. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2004–2018. [Google Scholar] [CrossRef] [PubMed]
  30. Liu, Y.; Li, Q.; Sun, Z. Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11869–11878. [Google Scholar]
  31. Liu, Y.; Li, Q.; Sun, Z.; Tan, T. A3GAN: An Attribute-Aware Attentive Generative Adversarial Network for Face Aging. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2776–2790. [Google Scholar] [CrossRef]
  32. Levi, G.; Hassncer, T. Age and gender classification using convolutional neural networks. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 34–42. [Google Scholar]
  33. Rothe, R.; Timofte, R.; Van Gool, L. Deep Expectation of Real and Apparent Age from a Single Image without Facial Landmarks. Int. J. Comput. Vision 2018, 126, 144–157. [Google Scholar] [CrossRef]
  34. Li, P.; Hu, Y.; Wu, X.; He, R.; Sun, Z. Deep Label Refinement for Age Estimation. Pattern Recogn. 2020, 100, 107178. [Google Scholar] [CrossRef]
  35. Xia, M.; Zhang, X.; Liu, W.; Weng, L.; Xu, Y. Multi-Stage Feature Constraints Learning for Age Estimation. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2417–2428. [Google Scholar] [CrossRef]
  36. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802. [Google Scholar]
  37. Chen, B.C.; Chen, C.S.; Hsu, W.H. Cross-age reference coding for age-invariant face recognition and retrieval. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 768–783. [Google Scholar]
  38. Ricanek, K.; Tesafaye, T. Morph: A longitudinal image database of normal adult age-progression. In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 10–12 April 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 341–345. [Google Scholar]
  39. Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the International Conference on Machine Learning. PMLR, Sydney, Australia, 6–11 August 2017; pp. 2642–2651. [Google Scholar]
  40. Fang, H.; Deng, W.; Zhong, Y.; Hu, J. Triple-GAN: Progressive Face Aging with Triple Translation Loss. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 3500–3509. [Google Scholar]
Figure 1. Feature-Guide Conditional Generative Adversarial Network (FG-CGAN). FG-CGAN mainly adds a feature guide module and an age classification module to the basic GAN structure, enabling the original face image to generate new images that preserve identity information and have good visual effects.
Figure 2. The results for different age classification accuracies: (a) the effect of a classification accuracy of 82%; (b) the effect of a classification accuracy of 62%.
Figure 3. The generated aged faces by FG-CGAN. The images demonstrate the direct visual effect of FG-CGAN on eight randomly selected input images from the CACD and Morph datasets. As shown, all of these images exhibit a high-quality visual effect of facial aging, showcasing the effectiveness of the model.
Figure 4. Performance comparison with prior works on CACD dataset: (ad). Using the same input face images with acGANs, IPCGANs, and FG-CGANs, the synthetic aging images for the age groups of 20–30, 30–40, and 40–50 are displayed.
Figure 5. Display of face aging results on middle-aged faces by FG-CGAN: (ad). Inputting randomly selected face images of individuals aged 30–45 into the network, the resulting images of the same face at a younger and older age are shown. This comparison illustrates the ability of the model to transform a face’s appearance across a range of ages.
Table 1. Specific data of CACD-ours and Morph.
Database  | Images  | Subjects | Dataset Distribution
CACD-ours | 146,794 | 2000     | 10–20 (8656), 20–30 (36,662), 30–40 (38,736), 40–50 (35,768), 50+ (26,972)
Morph     | 55,134  | 13,000   | <20 (7469), 20–30 (163,225), 30–40 (15,357), 40–50 (12,050), 50+ (3993)
Table 2. Face verification confidence quantitative results.
Method  | Compared Image | 20–30 | 30–40 | 40–50
CAAE    | original       | 69.84 | 67.21 | 64.85
        | 20–30          | –     | 66.91 | 64.18
        | 30–40          | –     | –     | 65.02
acGAN   | original       | 94.60 | 93.26 | 91.20
        | 20–30          | –     | 90.02 | 90.72
        | 30–40          | –     | –     | 90.57
IPCGAN  | original       | 95.57 | 94.65 | 90.69
        | 20–30          | –     | 94.68 | 90.80
        | 30–40          | –     | –     | 90.36
Ours    | original       | 95.79 | 95.42 | 90.77
        | 20–30          | –     | 94.11 | 91.92
        | 30–40          | –     | –     | 90.67
Table 3. Face verification rate quantitative results.
Method  | 20–30 | 30–40 | 40–50
CAAE    | 76.42 | 73.17 | 72.87
acGAN   | 94.29 | 92.21 | 90.09
IPCGAN  | 100   | 100   | 100
Ours    | 100   | 100   | 100
Table 4. Generate image quality assessment results.
Metric             | acGANs      | IPCGANs     | Ours
Face verification  | 85.83       | 91.60       | 95.52
Age classification | 32.70       | 31.74       | 31.87
Image quality      | 39.67       | 71.74       | 75.44
VGG-face score     | 21.60–25.24 | 34.48–38.18 | 36.12–39.12

