Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm

Deng, Nan; Xu, Zhengguang; Li, Xiuyun; Gao, Chenxuan; Wang, Xue

doi:10.3390/app14135739

Open AccessArticle

Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm

by

Nan Deng

^1,*,

Zhengguang Xu

²,

Xiuyun Li

¹,

Chenxuan Gao

² and

Xue Wang

²

¹

School of Mathematics and Computer Science, Hebei Minzu Normal University, Chengde 067000, China

²

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(13), 5739; https://doi.org/10.3390/app14135739

Submission received: 16 May 2024 / Revised: 19 June 2024 / Accepted: 19 June 2024 / Published: 1 July 2024

(This article belongs to the Special Issue Recent Applications of Artificial Intelligence for Bioinformatics)

Download

Browse Figures

Versions Notes

Abstract

:

To enhance the performance and reliability of the face recognition algorithm that is based on deep learning technology, this study utilizes a density-based noise-applied spatial clustering algorithm to cluster a large-scale face image dataset, resulting in a self-constructed dataset. A deep separable center differential convolutional network algorithm is utilized for face recognition. The impact of convolutional parameters on the algorithm’s performance is verified through experiments with ablated convolutional parameters. The study found that the density-based noise-applied spatial clustering algorithm resulted in time savings of 43.66% and 51.22% compared to the K-means clustering algorithm and the hierarchical clustering algorithm, respectively, when analyzing 8000 images. Additionally, the depth-separable center difference convolutional network algorithm had a lower average classification error rate compared to the other two algorithms, with reductions of 2.49% and 17.01%, respectively. The depth-separable center difference convolutional network technique is an advanced method for identifying the faces of people of different races, according to the experimental investigation. It can provide efficient and accurate services for the face recognition needs of various races.

Keywords:

deep learning; face recognition; data preprocessing; DBSCAN; DS-CDCN

1. Introduction

Face recognition (FR) technology has grown in importance and is now a commonly used biometric identification tool across many industries thanks to the ongoing advancements in science and technology [1]. The goal of FR technology is to analyze and process face images (FIs), extract facial feature information, and compare it with faces in the database to confirm or verify identity [2]. However, in practical applications, FR techniques face a number of challenges, such as lighting changes, posture changes, occlusion, and expression changes [3]. To address these challenges, researchers have proposed various methods for FI. Significant advancements in the field of deep learning (DL) have been made in recent years due to the rapid growth of DL technology. DL is a learning method that simulates the neural network of the human brain, enabling it to automatically learn feature representations from large amounts of data through a multilevel neural network model [4]. Compared to traditional FR methods, DL algorithms have better robustness and accuracy. However, due to the specificity of the FR domain, traditional DL models cannot be directly applied to FR tasks [5]. Therefore, this study proposes an improved algorithm called deep separable center difference convolutional networks (DS-CDCN), which enhances the accuracy and robustness of FR by introducing techniques such as variable convolution and depth-separable convolution for modeling local features in FIs. The primary contribution of this thesis is the combination of the DL technique with FR to propose a new FR method. The effectiveness of this method has been experimentally proven, which is crucial for improving the accuracy and robustness of FR. The study is divided into four parts: a summary of related studies, the design of the FR method, the validation of the FR method, and the conclusion of the study.

2. Related Works

DL neural network is a machine learning model that utilizes multiple layers of neurons for learning and inference, enabling adaptive responses to complex tasks by learning features and patterns from large amounts of data. To address the problems of neural network cell vanishing and exploding gradients, Bodyanskiy’s research group used generalized neofuzzy neurons to optimize the learning process and combined them with a backpropagation algorithm for network training. It was shown that this method improves the optimization rate of the model [6]. Tang et al. proposed an anomaly detection model with a multi-scale autoencoder and a deep feature extractor for DL industrial anomaly detection. Application data showed that the model could obtain a high area under the curve [7]. Shen’s group addressed the adversarial training problem using an integration-based approach for autopilot control systems and proposed a black-box attack scheme based on boosting. The results indicated that the effectiveness of the method could reach 90% [8]. Xin and his group members proposed a time series framework for predicting nonlinear systems, including generative performance-based models and long and short-term memory models, etc., and combined it with the K-means method for preprocessing and hypothesis validation. Experiments showed that the average standard error of the method was 0.23% [9]. Burak et al. proposed a deep convolutional neural network (CNN) model. The model utilized algorithms such as stochastic gradient descent, Nesteroff-accelerated gradient, and adaptive gradient to compute the initial weights of the network and backpropagation learning was accelerated by model parameter updating. The results demonstrated that the deep CNN model has an accuracy of 99.05% [10].

FR is a biometric method based on DL technology to achieve accurate identification and protection of personal identity by recognizing the features on the human face. In their discriminative deep multitask learning facial expression recognition system, Zheng et al. took into account the samples’ local spatial distribution information as well as their class label information. The results showed that the method outperforms the traditional DL method [11]. Proposed a useful image pre-selection framework for facial emotion classification based on reinforcement learning [12], which contains two modules: an image selector and a coarse emotion classifier. The experimental results on the FER2013 dataset indicated that the effectiveness of this method was improved on the basis of traditional emotion classification methods. Bhatt and his group members used various heuristic algorithms to optimize the feature extraction process for face expression detection, and the optimized features were fed into the residual network for representation and classification. The results showed that the quantum-inspired firefly algorithm performed better [13]. In order to extract deep features from the residual neural network and to enrich the feature information, Zhang et al. suggested an improved residual neural network-based face expression recognition algorithm. According to the experimental results, this method’s recognition accuracy was 96.37% and 93.38% in two datasets, respectively [14]. Cao et al. used a dual augmented capsule neural network for facial expression recognition, which utilizes a capsule neural network with multiple convolutional layers to enhance the feature representation. The results indicated that the method outperformed previous state-of-the-art methods [15].

In summary, numerous scholars have conducted research and design for DL and FR, but the robustness and generalizability of these models are not adequately assessed. Additionally, some studies have used small datasets, which may limit the reliability and generalizability of the results. Therefore, this study proposes an FR method based on the DS-CDCN algorithm, which is expected to enhance the model’s performance and the effectiveness of FR.

3. Design of Face Recognition Method Based on DS-CDCN Algorithm

In this chapter, the face data preprocessing (FDP) method and FR algorithm are proposed for the FR method, respectively. In FDP, it investigates the use of density-based spatial clustering for the application of noise (DBSCAN) algorithm to preprocess noisy datasets, provide high-quality datasets to train the model and propose an improved DS-CDCN algorithm.

3.1. Face Data Preprocessing Method

Large-scale face datasets are typically required for DL training in the FR domain. Preprocessing the face datasets is a crucial step in ensuring the quality of the dataset, which can enhance the model’s resilience and classification accuracy [16]. The FDP process is shown in Figure 1.

FDP includes data cleaning, data enhancement, data labeling, dataset segmentation, and data enhancement and labeling. Data cleaning removes noise, errors, duplicates, or missing data from face datasets, and commonly used data cleaning algorithms include singular value detection, connectivity detection, and missing value filling [17]. Data enhancement can expand the dataset and improve the robustness and generalization of the model through operations such as rotation, scaling, translation, flipping and morphing. For some FR tasks, manual labeling is required to obtain more accurate classification results. In addition, the labeling operations can include labeling information such as key points, angles, and proportions of each face. To facilitate cross-validation and model performance evaluation, the dataset must be split into training, validation, and test sets prior to model training. Data improvement and tagging are crucial processes since DL models need a lot of face datasets for training [18]. In FI data processing, the image’s gray level is shown in Equation (1).

\{\begin{cases} q = \sum_{g = 0}^{255} q_{g} \\ p_{g} = \frac{q_{g}}{q} \end{cases}

(1)

In Equation (1), the pixel point is

g

, the pixel gray level probability is

p_{g}

, the sum of the pixels is

q

, and the gray level pixels is

q_{g}

. After the binarized segmentation of the image, the image is divided into two parts. The probabilities of the two classes of pixel points are shown in Equation (2).

\{\begin{cases} P_{0} = \sum_{g = 0}^{t} p_{g} \\ P_{1} = \sum_{g = t + 1}^{255} p_{g} = 1 - P_{0} \end{cases}

(2)

In Equation (2), the probability of classifying to a pixel greater than the threshold is

P_{0}

, the probability of classifying to a pixel less than the threshold is

P_{1}

, and the pixel threshold is

t

. The average gray value is calculated as shown in Equation (3).

m = \sum_{g = 0}^{255} g p_{g}

(3)

In Equation (3), the overall average gray value of the image is

m

. Based on the interclass variance and maximizing the interclass variance, the threshold value is obtained as shown in Equation (4).

t = \underset{t \in (0, 255)}{\arg \max} (H^{2} (t)) = {I |\forall t : H^{2} (t) \leq H^{2} (I)}

(4)

In Equation (4), the inter-class variance is

H^{2}

, and the gray level of the image is

I

. Based on the above points, the study proposes an FDP method based on the DBSCAN algorithm, which aims to obtain high-quality datasets from noisy and erroneous face datasets. The algorithm first determines the neighborhood parameters, including the distance threshold of the neighbor threshold and the density threshold of the core object. Then, the parameter initialization setting is performed to determine the sample set, cluster division and number of clusters for which the data need to be clustered. Next, the core objects are randomly selected from the sample set, initialized with category order numbers and noted as sets. The domain subsample sets of the core objects are found through the distance constraints, and these sample sets are added to the cluster sample set. At the same time, consider the intersection of domain samples and remove the intersection samples. Repeat this process until the set of unvisited samples is found. Finally, the core objects are merged into the set and the cluster sample set is updated to a new set containing the core objects [19]. The face dataset acquisition process is shown in Figure 2.

In the face dataset acquisition process, firstly, the web crawler and video frame extraction are used to obtain their own face dataset, and then the dataset is clustered by the DBSCAN algorithm to get the categorized high-quality dataset. Then, the dense face key points are obtained using the positional regression mapping network. In addition, the background of the face is removed by the key point information and finally filled with average pixels of the face [20].

3.2. Design of Face Recognition Algorithm Based on DS-CDCN

Center difference convolutional neural network (CDCNN) is a CNN algorithm for face detection. The features extracted by CDCNN at low, medium, and high levels are subjected to spatial attention refinement and fusion by a multi-scale feature modulation (MAFM) module [21]. The features at different levels are refined by spatial attention related to the kernel size of the receptive domain and then connected together. The refined features are shown in Equation (5).

{F_{i}}^{'} = F_{i} ⊙ (σ (C_{i} ([A (F_{i}), M (F_{i})]))), i \in {l o w, m i d, h i g h}

(5)

In Equation (5), the refined features are

F^{'}

, and the features at different levels are

F

. The multiplication of two matrices with the same dimensions is denoted by

⊙

. The average pooling and the maximum pooling are

A

and

M

, respectively. The Sigmoid function is

σ

, and the convolutional layer is

C

. In this study, an improved DS-CDCN algorithm is proposed on the basis of CDCNN. The algorithm uses depth separable center difference convolution to extract multilevel center difference features and then fuses the multilevel features using multiscale channel attention fusion (MCAF). The structure of the DS-CDCN algorithm is shown in Figure 3.

The DS-CDCN neural network consists of four DS-CDC layers and three CDC_Blocks. The DS-CDC convolutional layer consists of a 3 × 3 depth-by-depth center difference convolutional layer plus a 1 × 1 point-by-point convolution. The CDC_Block consists of two DS-CDC layers. The two DS-CDC layers in the CDC_Block are used to expand the channels and reduce the channels, respectively. The first DS-CDC layer first expands the channels to double the output of the previous layer and keeps the feature map size unchanged. The second CDC_DW layer then reduces the number of channels in the output feature map of the first layer to half the previous number, while maintaining the same feature map size. The specific function of CDC_Block is to extract center difference features at different levels to achieve a detailed analysis of input images. Three CDC_Blocks are used for low-, medium- and high-level center difference feature extraction. Each CDC_Block is followed by a maximum pooling layer for parameter reduction. The three-level features are fed into the MCAF feature fusion module, and the fused features are passed through two DS-CDC convolutional layers to reduce the number of channels to obtain the predicted depth map of the predicted pass. The pixel values of the depth map obtained after the depth loss guided network training are mapped to the range [0, 1]. Subsequently, the predicted depth map obtained is compared with the input image after resetting the labeled size, and the probability that the current face is alive is calculated [22]. The output features obtained by ordinary convolution are shown in Equation (6).

y (p_{0}) = \sum_{p_{n} \in R} ω (p_{n}) \cdot x (p_{0} + p_{n})

(6)

In Equation (6), the output feature is

y

, and the input feature is

x

. The position of the current point of the sampling window is

p_{0}

. The coordinates of the point in the local sensing field

R

are

p_{n}

, and the weight is

ω

. The sampling and aggregation process of CDCNN is different from the ordinary convolution. CDCNN is more likely to aggregate the collected values’ center gradient information during the sampling step. In order to enhance the network model’s performance, CDCNN combines the multilevel center difference features through multi-scale channel attention fusion during the aggregation step. Equation (7) displays the CDCNN equation derived from standard convolution.

y^{'} (p_{0}) = \sum_{p_{n} \in R} ω (p_{n}) \cdot (x (p_{0} + p_{n}) - x (p_{0}))

(7)

In Equation (7), the CDCNN output is

y^{'}

. CDCNN can effectively extract the gradient-level detail information in the face live detection task. However, the method mainly focuses on intensity-level semantic information and ignores the importance of gradient-level detail information. The study suggests an algorithm to address this issue by combining gradient and detail information with ordinary and center-differential convolution. This allows the two algorithms to share weights and learn both gradient and intensity-level semantic information. In this way, the algorithm can generalize better and improve performance without additional parameters. The equation for combining ordinary and center difference convolution is shown in Equation (8).

\begin{array}{l} y^{″} (p_{0}) = & θ \sum_{p_{n} \in R} ω (p_{n}) \cdot (x (p_{0} + p_{n}) - x (p_{0})) \\ + (1 - θ) \sum_{p_{n} \in R} ω (p_{n}) \cdot x (p_{0} + p_{n}) \end{array}

(8)

In Equation (8), the shared weight, i.e., hyperparameter, is

θ

, which takes the value in the range of [0, 1]. Its larger value indicates the higher importance of the center differential gradient information. The structure of DS-CDC is shown in Figure 4.

DS-CDC consists of depth-by-depth center difference convolution CDC and point-by-point convolution. The structure initiates with a channel-by-channel convolution of each channel of the input feature map, which yields the intermediate feature map. Subsequently, a point-by-point convolution of the depth-by-depth center difference convolution results is conducted, with the objective of fusing the center difference features and convolving them point-by-point. For point-by-point convolution, each intermediate feature map corresponds to a set of 1 × 1 convolutions, and finally, M output feature maps are obtained using M sets of 1 × 1 convolutions. This depth-separable structure can reduce the number of network parameters. After each depth layer is completed, the feature images need to be processed by normalization and activation operations. Equation (9) displays the conventional convolution computation.

C o m p u t_{1} = D_{K} \times D_{K} \times M \times N \times D_{F} \times D_{F}

(9)

In Equation (9), the conventional convolution is computed as

C o m p u t_{1}

, and the input feature map size is

D_{F} \times D_{F} \times N

. The output feature map size is

D_{F} \times D_{F} \times M

, and the convolution kernel size is

D_{K} \times D_{K}

. The computation of the depth separable convolution is shown in Equation (10).

In Equation (9), the input feature map size is

D_{F} \times D_{F} \times N

, and the output feature map size is

D_{K} \times D_{K}

. The convolution kernel size is

D_{K} \times D_{K}

, which computes the standard convolution as

C o m p u t_{1}

. Equation (10) illustrates how the depth separable convolution was computed.

C o m p u t_{2} = D_{K} \times D_{K} \times N \times D_{F} \times D_{F} + N \times M \times D_{F} \times D_{F}

(10)

In Equation (10), the computation of depth-separable convolution is

C o m p u t_{2}

. MCAF is a method that can dynamically fuse feature information between different levels. The low-, middle-, and high-level features that are retrieved are subjected to 3 × 3, 5 × 5, and 7 × 7 convolution operations in a center difference convolutional network, respectively. Following that, a feature vector F is created by concatenating these features. We will feed this feature vector F into the channel attention module. The channel attention module is capable of refining and fusing the central difference features at low and medium levels through the channel attention mechanism. This allows for the dynamic fusion of various feature information levels, enhancing the network’s generalization and performance. The structure of MCAF is shown in Figure 5.

In the compression process, each feature map obtained by convolution is globally pooled, and the values of the feature channels are averaged into a real number to obtain a global feature on a channel. The compressed features are shown in Equation (11).

z_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} f_{c} (i, j)

(11)

In Equation (11), the feature coordinates of the original image

(i, j)

are

f_{c}

. Its dimension is

H \times W \times C

, and the compressed feature is

z_{c}

. The normalized weights of the excitation are shown in Equation (12).

ω_{c} = σ (W_{2} δ (W_{1 z_{c}}))

(12)

In Equation (12), the normalized weight on the interval [0, 1] is

ω_{c}

, the ReLU activation function is

δ

, and the weights of the two fully connected layers are

W_{1}

and

W_{2}

. The characteristics of each channel obtained after compression and excitation are shown in Equation (13).

F S E = ω_{n} \cdot f_{c}

(13)

In Equation (13), each channel is characterized as

F S E

and the number of channels is

n

. The mean square error of pixel supervision is shown in Equation (14).

L_{M S E} = \frac{1}{H^{'} \times W^{'}} \sum_{i \in H, j \in W} {(- B_{g t (i, j)} + B_{p r e (i, j)})}^{2}

(14)

In Equation (14), the mean square error loss is

L_{M S E}

and the binary mask height and width are

H^{'}

and

W^{'}

, respectively. The predicted grayscale mask is

B_{p r e}

, and the image-processed binary mask is

B_{g t}

. The process of contrast depth loss calculation is shown in Equation (15).

L_{C D L} = \frac{\sum_{i \in H, j \in W, n \in N} {(K_{n}^{C D L} ⊙ B_{p r e (i, j)} - K_{n}^{C D L} ⊙ B_{p r e (i, j)})}^{2}}{H^{'} \times W^{'} \times N}

(15)

In Equation (15), the contrast depth loss is

L_{C D L}

, the

n

th contrast convolutional kernel is

K_{n}^{C D L}

, and the convolutional kernels is

N

. Combining the mean-square error loss and the contrast depth loss yields the overall loss value.

4. Application Analysis of Face Recognition Method Based on DS-CDCN Algorithm

In this chapter, in terms of data preprocessing, the FI dataset is clustered using the DBSCAN algorithm to obtain a self-built dataset. By ablating convolutional parameters, the influence of convolutional parameters on the algorithm’s performance is confirmed, hence verifying the effect of the FR approach based on the DS-CDCN algorithm. Finally, the performance of the DS-CDCN algorithm and other algorithms in recognizing faces of different races is compared.

4.1. Face Data Preprocessing Validation

Larger FI datasets usually consist of thousands to millions of FIs, which cover a wide range of different face features, expressions, poses, and lighting conditions. Such large-scale datasets are important for training FR models and help the models learn a wider and more diverse range of face features. The commonly used large-scale FI datasets are shown in Table 1.

The experiment employs a public dataset for data enhancement processing, whereby the FI obtained from crawling and video is clustered by the DBSCAN algorithm to create a self-built dataset comprising a total of 775,590 images. The experiment will compare the hierarchical clustering algorithm with the K-means clustering algorithm (KMCA) to confirm the clustering effect of the DBSCAN algorithm. Figure 6 compares the various techniques for the image clustering effect.

In Figure 6a, the three algorithms converge gradually with the increase in FI, and the DBSCAN algorithm has the fastest convergence speed. After convergence, the loss value of the DBSCAN algorithm is less than 0.1, and the loss value of the KMCA and the hierarchical clustering algorithm is greater than 0.2. When compared to KMCA and the hierarchical clustering technique, the DBSCAN approach in Figure 6b has the lowest running time, about 41 s, and saves the most time, 43.66% and 51.22%, respectively, when there are 8000 images. In conclusion, the DBSCAN method works well in terms of computational efficiency and time savings and has a superior clustering impact. The comparison of image denoising effect is shown in Figure 7.

In Figure 7, adaptive median filtering has a better effect on image denoising, with an average signal-to-noise ratio of 33.36 dB, and an average image structural similarity of 0.92. This indicates that adaptive median filtering can effectively eliminate the noise in FI and retain the structural information of the image. Specifically, adaptive median filtering is a filtering method that combines mean filtering and median filtering by adaptively adjusting the filtering parameters according to the pixel density around the pixel to achieve a better denoising effect. By retaining the image’s structural information and effectively removing noise from the image, this approach can raise the image’s quality. Therefore, the image filtering in the subsequent experiments uses the adaptive median filtering method approach. The data information of the image after image data enhancement is shown in Figure 8.

In Figure 8, the contrast and brightness of the image are improved by about 24.5% and 19.8%, respectively, after daytime FI enhancement. The image’s contrast and brightness are enhanced by roughly 36.9% and 26.9%, respectively, following nighttime FI enhancement. The outcomes demonstrate that the image enhancement method significantly improves the quality of the images. An image’s visual impact can be enhanced, making it easier to see and distinguish by increasing its contrast and brightness. In practical applications, this is of great significance in the fields of FR, image processing and analysis.

4.2. Validation of the Effect of Face Recognition Method Based on DS-CDCN Algorithm

To confirm the efficacy and dependability of the FR approach founded on the DS-CDCN algorithm, the research develops a whole experimental setup, comprising hardware and software. The hardware adopts an Intel Core i7-9700K processor, NVIDIA GeForce RTX 3090 graphics card, and a 1 TB NVMe solid-state drive for storing data and parameters. The experiment uses Ubuntu 18.04.5 LTS operating system, TensorFlow 2.5 DL framework, Python 3.8 compiler, and commonly used libraries such as NumPy and Pandas. The research is equipped with a Logitech C920 camera (San Jose, CA, USA) for capturing FIs. Before transferring the image to the computer, it is necessary to preprocess the image to eliminate noise and lighting effects. The preprocessed image should be transferred to the computer via a wireless network and stored in the database. One-third of the data from the homemade image dataset was used for the ablation experiments on the convolution parameters. To compare the impact of convolution parameters on the convolution results, ten values are chosen at random for the tests. The algorithm’s performance is assessed using the real presentation classification error rate (CER) and average CER. To avoid the randomness of experimental results, tests 1–5 are repeated under the same experimental conditions. Figure 9 illustrates how the convolution settings affect the algorithm’s performance.

In Figure 9, as the shared weight, i.e., the hyperparameter is

θ

. The algorithm’s true presentation CER and average CER show more significant changes. When the hyperparameter is less than 0.2, the true presentation CER and average CER gradually increase. When the hyperparameter is greater than 0.2 and less than 0.5, the true presentation CER and average CER gradually decrease. When the hyperparameter is greater than 0.5, the real presentation CER and average CER gradually increase. Combining the above analysis and experimental data, the value of the shared weight, i.e., the hyperparameter

θ

, is set to 0.5 to guarantee the best performance of the algorithm. The recognition effect of the DS-CDCN algorithm on the faces of different races is shown in Figure 10.

In Figure 10, the true rendering error of the DS-CDCN algorithm for African, Asian and European FI recognition is less than 3%. This shows that the algorithm has high accuracy and robustness in recognizing faces of different races. The DS-CDCN algorithm is able to achieve fast and accurate FR with a low false recognition rate in FIs of different races. This indicates that the DS-CDCN algorithm is an advanced algorithm for FR of different races, and can provide efficient and accurate services for FR requirements of various races. To further validate the performance of the DS-CDCN algorithm, the study uses CDCNN, spatial detection network (SDNet), DenseNet and FaceNet as a comparison. The comparison of the recognition effect of different algorithms is shown in Table 2.

In Table 2, the misidentification rates 1–3 denote the attack presentation CER, the true presentation CER and the average CER, respectively. Compared with CDCNN, SDNet, DenseNet and FaceNet, the DS-CDCN algorithm reduces the average CER by 2.49%, 17.01%, 10.9% and 8.2%, respectively, and is able to more accurately recognize and differentiate samples of different categories. In FI recognition in Africa and Asia, the average error of the DS-CDCN algorithm is 3.3% and 3.9%, respectively. In European FI recognition, the average error of the DS-CDCN algorithm is 7.5%. There are certain differences in FI recognition errors among different regions. In the European region, the average error of all algorithms is generally high, which may be related to the diversity and complexity of European FIs. The DS-CDCN algorithm has demonstrated robust generalization ability in both Africa and Asia, suggesting its high adaptability in addressing FI recognition tasks across diverse geographical regions.

5. Conclusions

Traditional facial recognition methods based on feature extraction have many problems, such as being vulnerable to attackers, requiring a large amount of training data and computational resources, etc. Therefore, this study proposes a DL-based FR method that uses DBSCAN to preprocess noisy datasets. This can provide high-quality datasets to train the model. Moreover, based on CDCNN, improvements were made to propose the DS-CDCN algorithm. The experimental data indicated that the loss value of the DBSCAN algorithm was less than 0.1, and the loss value of KMCA and hierarchical clustering algorithm were both greater than 0.2. In image denoising, adaptive median filtering had a better effect, with an average signal-to-noise ratio of 33.36 dB and an average image structural similarity of 0.92. The daytime FI enhancement improved the contrast and brightness of the image by approximately 24.5% and 19.8%, respectively. After the nighttime FI enhancement, the images’ contrast and brightness improved by approximately 36.9% and 26.9%, respectively. To ensure the best performance of the algorithm, the shared weight hyperparameter was set to 0.5. The DS-CDCN algorithm achieved a lower average CER compared to CDCNN and SDNet by 2.49% and 17.01%, respectively. It was able to accurately recognize and differentiate between different classes of samples. The results indicate that the method exhibits higher accuracy and robustness, and can fulfill the FR requirements for real-time and security. The limitation of this research is that the DBSCAN algorithm performs well on small and medium-sized datasets but requires more time and computational resources when dealing with large-scale datasets. Therefore, future research can explore and study DL-based face clustering algorithms more to improve the efficiency and accuracy of large-scale face clustering.

Author Contributions

Conceptualization, N.D.; Methodology, N.D.; Software, Z.X. and X.L.; Formal analysis, N.D. and C.G.; Investigation, N.D.; Resources, X.L.; Data curation, Z.X., C.G. and X.W.; Writing—original draft, N.D.; Visualization, N.D. and X.W.; Project administration, Z.X.; Funding acquisition, N.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the R&D investment intensity growth reward fund from the Science and Technology Bureau of Chengde City, Hebei Province, with project number 232304B. This work is supported by the school-level fund of Hebei Minzu Normal University, with fund number DR2023003.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Rao, T.; Li, J.; Wang, X.; Sun, Y.; Chen, H. Facial Expression Recognition with Multiscale Graph Convolutional Networks. IEEE Multimed. 2021, 28, 11–19. [Google Scholar] [CrossRef]
Atmaja, B.T.; Akagi, M. Two-Stage Dimensional Emotion Recognition by Fusing Predictions of Acoustic and Text Networks Using SVM. Speech Commun. 2021, 126, 9–21. [Google Scholar] [CrossRef]
Rodriguez, A.M.; Geradts, Z.; Worring, M. Likelihood Ratios for Deep Neural Networks in Face Comparison. J. Forensic Sci. 2020, 65, 1169–1183. [Google Scholar] [CrossRef] [PubMed]
Najmabadi, M.; Moallem, P. Local Symmetric Directional Pattern: A Novel Descriptor for Extracting Compact and Distinctive Features in Face Recognition. Optik 2022, 251, 168331–168353. [Google Scholar] [CrossRef]
Ma, Y.; Zhang, H.; Cong, H.; Wang, Y.; Xin, S.; Xu, P.; Wang, F. Metal-organic-framework-derived porous core/shell CoP polyhedrons intertwined with 2D MXene as anode for Na-ion storage. J. Alloys Compd. 2023, 968, 171985. [Google Scholar] [CrossRef]
Bodyanskiy, Y.; Antonenko, T. Deep Neural Network Based on Generalized Neo-Fuzzy Neurons and Its Learning Based on Backpropagation. Artif. Intell. 2021, 26, 32–41. [Google Scholar]
Tang, T.; Hsu, H.; Li, K. Industrial Anomaly Detection with Multiscale Autoencoder and Deep Feature Extractor-Based Neural Network. IET Image Process. 2023, 17, 1752–1761. [Google Scholar] [CrossRef]
Shen, J.; Robertson, N. BBAS: Towards Large Scale Effective Ensemble Adversarial Attacks against Deep Neural Network Learning. Inf. Sci. 2020, 569, 469–478. [Google Scholar] [CrossRef]
Xin, S.; Chang, Y.; Zhou, R.; Cong, H.; Zheng, L.; Wang, Y.; Wang, F. Ultraviolet-driven metal oxide semiconductor synapses with improved long-term potentiation. J. Mater. Chem. C 2023, 11, 722–729. [Google Scholar] [CrossRef]
Burak, K.C.; Baykan, M.K.; Uuz, H. A New Deep Convolutional Neural Network Model for Classifying Breast Cancer Histopathological Images and the Hyperparameter Optimisation of the Proposed Model. J. Supercomput. 2021, 77, 973–989. [Google Scholar] [CrossRef]
Zheng, H.; Wang, R.; Ji, W.; Zong, M.; Lv, H. Discriminative Deep Multi-Task Learning for Facial Expression Recognition. Inf. Sci. 2020, 533, 60–71. [Google Scholar] [CrossRef]
Xu, P.; Liu, H.; Zhang, H.; Lan, D.; Shin, I. Optimizing performance of recycled aggregate materials using BP neural network analysis: A study on permeability and water storage. Desalination Water Treat. 2024, 317, 100056. [Google Scholar] [CrossRef]
Bhatt, A.; Alam, T.; Rane, K.P.; Nandal, R.; Malik, M.; Neware, R.; Goel, S. Quantum-Inspired Meta-Heuristic Algorithms with Deep Learning for Facial Expression Recognition Under Varying Yaw Angles. Int. J. Mod. Phys. C 2021, 33, 2250045–2250068. [Google Scholar] [CrossRef]
Zhang, W.; Zhang, X.; Tang, Y. Facial Expression Recognition Based on Improved Residual Network. IET Image Process. 2023, 17, 2005–2014. [Google Scholar] [CrossRef]
Cao, S.; Yao, Y.; An, G. E2-Capsule Neural Networks for Facial Expression Recognition Using AU-Aware Attention. IET Image Process. 2020, 14, 2417–2424. [Google Scholar] [CrossRef]
Paier, W.; Hilsmann, A.; Eisert, P. Interactive Facial Animation with Deep Neural Networks. IET Comput. Vis. 2020, 14, 359–369. [Google Scholar] [CrossRef]
Zheng, G.; Xu, Y. Efficient Face Detection and Tracking in Video Sequences Based on Deep Learning. Inf. Sci. 2021, 568, 265–285. [Google Scholar] [CrossRef]
Cheng, C.; Li, C.; Han, Y.; Zhu, Y. A Semi-Supervised Deep Learning Image Caption Model Based on Pseudo Label and N-Gram. Int. J. Approx. Reason. 2020, 131, 93–107. [Google Scholar] [CrossRef]
Lakhmiri, D.; Le Digabel, S.; Tribes, C. HyperNOMAD: Hyperparameter Optimization of Deep Neural Networks Using Mesh Adaptive Direct Search. ACM Trans. Math. Softw. 2021, 47, 27–53. [Google Scholar] [CrossRef]
Liu, F.; Chen, D.; Wang, F.; Li, Z.; Xu, F. Deep Learning Based Single Sample Face Recognition: A Survey. Artif. Intell. Rev. 2023, 56, 2723–2748. [Google Scholar] [CrossRef]
Purohit, J.; Dave, R. Leveraging Deep Learning Techniques to Obtain Efficacious Segmentation Results. Arch. Adv. Eng. Sci. 2023, 1, 11–26. [Google Scholar] [CrossRef]
Choudhuri, S.; Adeniye, S.; Sen, A. Distribution Alignment Using Complement Entropy Objective and Adaptive Consensus-Based Label Refinement For Partial Domain Adaptation. Artif. Intell. Appl. 2023, 1, 43–51. [Google Scholar] [CrossRef]

Figure 1. Face data preprocessing process.

Figure 2. Face dataset acquisition process.

Figure 3. DS-CDCN algorithm structures.

Figure 4. DS-CDC structure.

Figure 5. Structure of MCAF.

Figure 6. DBSCAN algorithm clustering effect.

Figure 7. Comparison of image denoising effects.

Figure 8. The data information of the enhanced image data.

Figure 9. The impact of convolution parameters on algorithm performance.

Figure 10. The recognition effect of the DS-CDCN algorithm on faces of different races.

Table 1. Commonly used large-scale face image datasets.

Serial Number	Dataset Name	Number of Images	Number of Characters	State
1	WDref	99,800	3000	Public
2	CASIA WebFace	494,500	10,580	Public
3	CACD	163,450	2000	Public
4	VGG-Face	2,600,000	2600	Public
5	MS-CELEB-1M	10,000,000	100,000	Public
6	SFC	4,400,000	4000	Unpublished
7	CelebFaces+	2,600,000	2600	Unpublished

Table 2. Commonly used large-scale face image datasets.

Face Classification	Algorithm	Error Rate 1 (%)	Error Rate 2 (%)	Error Rate 3 (%)
Africa	DS-CDCN	5.28	1.33	3.31
	CDCNN	13.8	1.56	7.68
	SDNet	35.6	15.8	25.71
	DenseNet	22.7	8.2	9.1
	FaceNet	16.3	5.3	6.7
Asia	DS-CDCN	6.39	1.43	3.91
	CDCNN	8.96	1.69	5.33
	SDNet	15.67	12.36	14.02
	DenseNet	24.5	9.3	9.9
	FaceNet	17.3	6.8	7.3
Europe	DS-CDCN	12.31	2.69	7.51
	CDCNN	14.73	3.64	9.19
	SDNet	45.62	6.45	26.04
	DenseNet	30.3	13.6	14.7
	FaceNet	20.3	18.7	19.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Deng, N.; Xu, Z.; Li, X.; Gao, C.; Wang, X. Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm. Appl. Sci. 2024, 14, 5739. https://doi.org/10.3390/app14135739

AMA Style

Deng N, Xu Z, Li X, Gao C, Wang X. Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm. Applied Sciences. 2024; 14(13):5739. https://doi.org/10.3390/app14135739

Chicago/Turabian Style

Deng, Nan, Zhengguang Xu, Xiuyun Li, Chenxuan Gao, and Xue Wang. 2024. "Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm" Applied Sciences 14, no. 13: 5739. https://doi.org/10.3390/app14135739

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Deep Learning and Face Recognition: Face Recognition Approach Based on the DS-CDCN Algorithm

Abstract

1. Introduction

2. Related Works

3. Design of Face Recognition Method Based on DS-CDCN Algorithm

3.1. Face Data Preprocessing Method

3.2. Design of Face Recognition Algorithm Based on DS-CDCN

4. Application Analysis of Face Recognition Method Based on DS-CDCN Algorithm

4.1. Face Data Preprocessing Validation

4.2. Validation of the Effect of Face Recognition Method Based on DS-CDCN Algorithm

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI