1. Introduction
Wearable devices have been widely adopted in recent years. Their portability, real-time operation, and intelligence make them indispensable in people’s daily lives. With the popularity of these devices, however, the user information collected by their various built-in sensors has raised concerns about data security. The information collected by wearable devices covers many aspects of a user’s life, including sensitive data such as daily behaviors, location trajectories, environmental information, and physiological health. Once this information is leaked or illegally obtained, it may be used for various illegal activities, seriously threatening the user’s privacy and personal safety.
In order to better protect users’ data security, researchers have turned their attention to biometric authentication methods, among them identity authentication based on photoplethysmography (PPG) signals. The PPG signal is obtained through non-invasive optical measurement of physiology: as the heart contracts and relaxes periodically, blood flow in the body increases and decreases. When light emitted by a light source penetrates the skin tissue, hemoglobin in the blood absorbs part of it, so the intensity of the reflected or transmitted light weakens and strengthens accordingly. A light receiver converts this change in light intensity into an electrical signal, the PPG signal [1]. Existing research shows that by analyzing the original PPG signal and key landmarks extracted from it and its derivatives, such as the systolic peak and diastolic peak, a series of benchmark features can be obtained that serve as unique biometric information for a user [2,3,4]. Moreover, the PPG signal is collected by the sensor in a wearable device without additional user operations, which makes continuous user authentication more natural. Because the PPG signal continuously monitors blood circulation, it enables continuous identity verification and improves the accuracy and reliability of authentication. PPG signals satisfy the essential characteristics of a biometric: universality, persistence, uniqueness, and ease of collection. Therefore, using PPG signals as a biometric for identity authentication is feasible and brings new possibilities to biometric authentication.
Consequently, this study proposes a deep learning-based identity authentication method for PPG signals. The primary contributions are as follows: firstly, the filtered one-dimensional temporal PPG data is partitioned into individual cycles and transformed into two-dimensional images using the Markov Transition Field (MTF) technique [5]. Unlike most existing identity authentication methods, which directly employ one-dimensional PPG signals as inputs, this paper converts the one-dimensional temporal data into two-dimensional image data before feeding it into a convolutional neural network (CNN). Two-dimensional image data is richer than one-dimensional time series data because of its multi-dimensional spatial information; in color images, for example, the three independent red, green, and blue channels each carry different image information. Therefore, two-dimensional image data shows clear advantages in feature extraction with convolutional neural networks. Secondly, a lightweight CNN model is devised, integrating the concepts of depthwise separable convolution [6] and residual structures [7]. This model maintains high recognition accuracy while reducing memory consumption and improving runtime efficiency, making it well-suited for resource-constrained devices such as wearables, which have small form factors, low power budgets, and limited computational resources. Identity authentication is approached as a classification problem. Experimental results show that the lightweight CNN model proposed in this study achieves an accuracy of 98.62% on the training set and 96.17% on the testing set, both surpassing several traditional deep learning methods, demonstrating the high feasibility of this research.
The use of PPG signals for biometric identification has garnered significant attention in the current literature. Researchers have actively explored methods that distinguish between individuals by extracting unique and stable features from these signals. Existing methods fall into two primary categories.
In the first category, waveform characteristics of the signal are analyzed and features are extracted from both the time domain and the frequency domain. Kavsaoğlu et al. [8] extracted 40 time-domain features from the PPG signal and its first and second derivatives, proposed a feature-ranking algorithm, and verified its effectiveness with a k-NN classifier. Jaafar et al. [9] utilized the second derivative of PPG signals, known as APG signals, obtained from ten individual users; the morphology of the APG signals was analyzed to extract features, which were then classified using a Bayesian network. Meanwhile, Salanke et al. [10] segmented PPG signals based on P-wave intervals and used principal component analysis to extract features successfully utilized for signal classification.
In the second category, deep neural network models are employed for automatic feature learning and classification of PPG signals. Deep learning methods can learn useful features directly from raw data without requiring manual feature design and selection. Li et al. [11] designed and trained a multi-scale feature fusion deep learning (MFFD) model, based mainly on a convolutional neural network architecture, to extract PPG features and learn to accurately distinguish individuals by their unique PPG patterns. Wei et al. [12] first proposed a PPG augmentation technique to generate multi-scale PPG signals and then a deep end-to-end model to extract and classify the multi-scale features. Seok et al. [13] proposed a one-dimensional Siamese (twin) neural network biometric model based on PPG, which reduced noise while retaining individual characteristics through a multi-period averaging method, achieving efficient and secure identification and authentication. Abbani et al. [14] used a bidirectional long short-term memory deep learning algorithm and successfully designed an identity authentication model based on PPG signals. Dwaipayan et al. [15] designed a novel deep learning model, CorNET, which combines two convolutional layers and two long short-term memory layers for identity authentication tasks. Jordi et al. [16] proposed an end-to-end recognition architecture, built mainly from a convolutional neural network, that operates on the raw PPG signal. In addition, some methods convert PPG signals into two-dimensional image data and apply deep learning for classification. Cherry et al. [17] transformed one-dimensional PPG signals into two-dimensional spectrograms and employed convolutional neural networks for automatic feature extraction and classification. Using the scalogram technique, Mostafa et al. [18] converted one-dimensional PPG signals into two-dimensional images and developed a CVT-ConvMixer classifier with attention mechanisms to achieve individual identity recognition.
The main problem with the first category of methods above is that features extracted by analyzing the waveform characteristics of the signal, or its time-domain and frequency-domain properties, are often not comprehensive enough, and manual feature engineering is error-prone. In the second category, the deep learning models used typically have complex network structures and many parameters, resulting in high computational cost and heavy resource consumption. The method in this article converts PPG signals into two-dimensional images and uses a neural network model to automatically extract features and learn to classify them. Compared with the first category, the proposed method requires no manual feature processing; instead, it realizes end-to-end learning from raw input to final output, which significantly simplifies the feature extraction and classification pipeline and improves the efficiency and practicality of the model. Deep learning methods can learn multi-level, multi-scale feature representations through multi-layer neural network structures, enabling effective feature extraction and recognition on new, unseen data and providing strong generalization. Compared with the second category, the deep learning model proposed in this article significantly reduces model size by optimizing the network structure and reducing the number of parameters, making it more efficient for storage and transmission, especially in resource-constrained environments such as wearable devices, without sacrificing accuracy.
2. Methods
This section comprehensively elucidates the identity authentication method that uses PPG signals as a carrier. The method comprises three essential phases. The first involves signal preprocessing, to eliminate noise interference and enhance signal quality. Subsequently, the preprocessed one-dimensional signal is converted into a two-dimensional image to facilitate subsequent feature extraction and recognition. Lastly, the transformed two-dimensional signal is classified by a purpose-built lightweight convolutional neural network, achieving accurate identity authentication. The workflow is depicted in Figure 1.
2.1. PPG Signal Preprocessing
Various types of noise often accompany PPG signals, acting as interference during the collection process [19]. These noises mainly include baseline drift, which arises from respiratory fluctuations and the instability of amplification circuits; power line interference, originating from AC power sources; electromyographic noise, resulting from limb tremors and muscle contractions; and motion artifacts, caused by changes in the optical measurement due to bodily movements.
To enhance the quality of PPG signals, we used a 3rd-order Butterworth bandpass filter, which balances frequency selectivity and phase response during filtering, reducing distortion and signal delay. The filter design takes the characteristics of the PPG signal into account: the upper cutoff frequency is set to 8 Hz, which effectively filters out high-frequency interference caused by electromyographic noise and power line interference, while the lower cutoff frequency is set to 0.5 Hz, which effectively filters out low-frequency interference caused by baseline drift. The filtered PPG signals exhibited a significant quality improvement, forming the basis for subsequent identity authentication tasks. Furthermore, the filtered PPG signals underwent amplitude normalization to address amplitude variations, unifying the dynamic range of the signals to 1 and further enhancing the accuracy and stability of the identity authentication process.
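As a concrete illustration, the preprocessing step can be sketched with SciPy as follows. The sampling rate fs and the use of zero-phase filtering with filtfilt are assumptions made for the example; the paper specifies only the filter order and the 0.5–8 Hz passband.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_ppg(signal, fs=125.0, low=0.5, high=8.0, order=3):
    """Band-pass filter (3rd-order Butterworth, 0.5-8 Hz) and
    amplitude-normalize one raw PPG segment."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, signal)  # zero-phase filtering (assumption)
    # Normalize the amplitude so the dynamic range is exactly 1
    return (filtered - filtered.min()) / (filtered.max() - filtered.min())

# Synthetic example: a 1.2 Hz pulse plus baseline drift and mains noise
fs = 125.0
t = np.arange(0, 10, 1 / fs)
raw = (np.sin(2 * np.pi * 1.2 * t)          # pulse component
       + 0.5 * np.sin(2 * np.pi * 0.1 * t)  # baseline drift
       + 0.1 * np.sin(2 * np.pi * 50 * t))  # power line interference
clean = preprocess_ppg(raw, fs)
```

After normalization, the drift and mains components are strongly attenuated and the segment spans exactly the unit range.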
2.2. Two-Dimensional Signal Transformation Methods
Although various deep learning algorithms, such as 1D-CNNs and LSTMs, have been developed to handle one-dimensional time series data like PPG signals, two-dimensional images carry rich spatial information and structural characteristics. Deep learning methods applied to images can effectively capture edges, corners, and other informative details, ultimately improving learning efficiency. Hence, converting one-dimensional PPG signals into two-dimensional images is worthwhile.
This paper adopts the Markov Transition Field (MTF) method to transform the one-dimensional PPG signals into two-dimensional images. MTF is an image encoding technique that uses a Markov transition matrix to encode time series data. It treats the temporal evolution of the series as a Markov process, in which the future state depends solely on the present state, independent of past states. The Markov transition matrix built on this idea is then extended into a Markov Transition Field, which encodes the time series as an image. The main steps are as follows:
1. Divide the time series data equally into Q different quantile bins and label these bins from 1 to Q in sequence.
2. Replace each data point in the time series with the number of the bin it falls into.
3. Treat the resulting bin sequence as a 1st-order Markov chain, calculate the transition frequencies between quantile bins along the time axis, and construct a Q × Q transition matrix W accordingly, as shown in Formula (1). The element ω_ij represents the transition frequency from quantile bin i to quantile bin j, so W provides quantitative information about the bin-transition patterns of the time series:

W = [ω_ij]_{Q×Q},  ω_ij = P(x_t ∈ q_j | x_{t−1} ∈ q_i),  Σ_j ω_ij = 1.  (1)

4. Let x_1, …, x_N be the elements of the time series, and let q_i and q_j be the quantile bins containing the values at time steps i and j, respectively; M_ij is then the transition probability from q_i to q_j. By considering every pairwise arrangement of time positions, the Markov transition matrix W is extended to the Markov Transition Field M, as shown in Formula (2):

M = [M_ij]_{N×N},  M_ij = ω_{q_i, q_j}.  (2)
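The steps above can be sketched in NumPy as follows. The bin count Q = 8 and the quantile binning strategy are illustrative assumptions, since the paper does not state the Q it uses.

```python
import numpy as np

def markov_transition_field(x, Q=8):
    """Encode a 1-D series as a Markov Transition Field image.
    Q (number of quantile bins) is an assumption for illustration."""
    N = len(x)
    # Step 1-2: assign each sample to one of Q quantile bins (labels 0..Q-1)
    edges = np.quantile(x, np.linspace(0, 1, Q + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Step 3: count transitions between consecutive bins -> Q x Q matrix W
    W = np.zeros((Q, Q))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Step 4: spread W over all time-position pairs: M[i, j] = W[bin_i, bin_j]
    return W[np.ix_(bins, bins)]

x = np.sin(np.linspace(0, 2 * np.pi, 100))
M = markov_transition_field(x, Q=8)  # 100 x 100 image with entries in [0, 1]
```

Because each entry of M is a row-normalized transition frequency, the resulting image values always lie between 0 and 1 and can be mapped directly to pixel intensities.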
This paper uses a peak detection method to split the long PPG recording into independent single-cycle signals, which then serve as input to the neural network model. During segmentation, all abnormal single-cycle signals are eliminated to ensure the accuracy and reliability of subsequent analysis. These single-cycle signals are then converted into two-dimensional images of size 28 × 28 using the Markov Transition Field method, as shown in Figure 2.
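A minimal sketch of the peak-based segmentation, using SciPy’s find_peaks. The sampling rate, heart-rate bounds, and prominence threshold are assumptions used here to reject abnormal cycles; the paper does not specify its peak-detection parameters.

```python
import numpy as np
from scipy.signal import find_peaks

def split_cycles(ppg, fs=125.0, min_hr=40, max_hr=180):
    """Split a filtered PPG trace into single cardiac cycles by detecting
    systolic peaks. fs and the heart-rate bounds are assumptions."""
    min_dist = int(fs * 60 / max_hr)  # shortest plausible beat interval
    peaks, _ = find_peaks(ppg, distance=min_dist, prominence=0.1)
    cycles = [ppg[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
    # Discard abnormal cycles, e.g. implausibly short or long ones
    max_len = fs * 60 / min_hr
    return [c for c in cycles if min_dist <= len(c) <= max_len]

fs = 125.0
t = np.arange(0, 10, 1 / fs)
ppg = np.sin(2 * np.pi * 1.2 * t)  # surrogate pulse at ~72 bpm
cycles = split_cycles(ppg, fs)
```

Each retained cycle can then be resampled to 28 samples so that its Markov Transition Field image is 28 × 28.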
2.3. Lightweight Convolutional Neural Network (LW-CNN)
Since this paper studies identity authentication on wearable devices, whose resources are limited, we propose a lightweight convolutional neural network, LW-CNN, that incorporates depthwise separable convolutions and residual connections. Compared with traditional convolutional neural networks (CNNs), LW-CNN significantly reduces model complexity by reducing the number of network layers, using smaller convolution kernels, and adopting a more straightforward network structure. Because of its simpler structure and fewer parameters, both forward and backward passes require less computation, improving efficiency. Moreover, the introduction of the residual structure makes the network more stable, which improves performance and enhances robustness.
Depthwise separable convolution can be regarded as a special convolution operation. In a traditional convolution, each kernel operates simultaneously on all channels of the input feature map to generate one channel of the output feature map; if the input feature map has N channels, each kernel requires N convolution sub-kernels. Depthwise separable convolution divides this process into two stages. The first stage is the depthwise convolution layer, in which each input channel has its own 3 × 3 kernel for an independent convolution: with N input channels there are N kernels, and each kernel convolves only one input channel, so the depthwise layer extracts per-channel features without increasing the number of parameters. The second stage is the pointwise convolution layer, which applies a 1 × 1 kernel to the output of the depthwise convolution. This 1 × 1 convolution is a cross-channel linear transformation; its function is to fuse and combine the features of all channels to generate the final output feature map. In this way, depthwise separable convolution significantly reduces the model’s parameters while maintaining the spatial filtering capability of convolution. Specifically, for a traditional convolutional layer with N input channels and M output channels, the number of parameters is N × M × K × K (where K is the kernel size). The corresponding depthwise separable layer needs only N × K × K (depthwise layer) + M × N (pointwise layer) parameters. Since M and N are usually large, this reduction is very significant.
Figure 3 is a schematic diagram of depthwise separable convolution.
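The parameter formulas in the paragraph above can be checked with a few lines of arithmetic; the 64-to-128-channel, 3 × 3 example is illustrative and not taken from the paper.

```python
def conv_params(n_in, n_out, k):
    """Parameters of a standard k x k convolution layer (biases ignored)."""
    return n_in * n_out * k * k

def dsconv_params(n_in, n_out, k):
    """Depthwise (n_in kernels of k x k) plus pointwise (1 x 1) parameters."""
    return n_in * k * k + n_in * n_out

# Illustrative example: 64 input channels, 128 output channels, 3 x 3 kernels
std = conv_params(64, 128, 3)    # N*M*K*K
sep = dsconv_params(64, 128, 3)  # N*K*K + M*N
ratio = std / sep                # roughly 8.4x fewer parameters here
```

The savings grow with the channel counts, which is why the technique suits the deeper, wider layers of a network most.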
The residual structure was proposed to solve the problem of exploding or vanishing gradients in neural network training. Its core idea is to build the network from residual blocks, each of which creates a shortcut connection between its input and output, allowing information to pass more smoothly through the network. Specifically, each residual block contains multiple convolutional layers used to extract and transform features, while the shortcut connection lets the input signal skip one or more layers and be added directly to the output of subsequent layers. This structure ensures that gradients can flow back to earlier layers more effectively during backpropagation, avoiding the vanishing gradient problem. Mathematically, the shortcut connection can be expressed as y = F(x) + x, where y is the output of the current block, F(x) is the feature map obtained through operations such as convolution, and x is the output of the preceding layer or layers, that is, the input to the shortcut connection. The addition not only preserves the information of the original input but also lets the network focus on learning the residual between input and output, that is, the difference between them. This residual learning helps the network extract features more efficiently and makes the training of deep networks more stable. Because the gradient of y contains a derivative term with respect to the input x, gradients propagate effectively even in deep networks, alleviating the vanishing gradient problem. In addition, since the convolutional layers in a residual block are usually accompanied by batch normalization and ReLU activation functions, the stability and convergence speed of the network improve further.
Figure 4 shows a schematic diagram of a three-layer residual structure.
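The identity y = F(x) + x can be illustrated with a toy fully connected residual block in NumPy. Real residual blocks, including those in the LW-CNN, use convolutions with batch normalization; this dense version is only a sketch of the shortcut mechanism.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Toy dense residual block: y = ReLU(F(x) + x), where
    F(x) = ReLU(x @ w1) @ w2 stands in for the convolutional branch."""
    f = relu(x @ w1) @ w2  # the residual branch F(x)
    return relu(f + x)     # shortcut connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, w1, w2)

# If the branch weights are all zero, F(x) = 0 and the block reduces
# to the identity mapping (up to the final ReLU)
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

The zero-weight case shows why residual blocks are easy to optimize: the block only has to learn the deviation from the identity, not the full mapping.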
The lightweight convolutional neural network model, LW-CNN, employed in this study primarily comprises an initial convolutional layer, three residual blocks, a global average pooling layer, a Dropout layer, and a fully connected layer. In the initial convolutional layer, a 3 × 3 kernel performs convolution on the input image with a stride of 2; padding is applied at the image edges to keep the spatial dimensions consistent, transforming the three-channel input image into a 32-channel feature map. The three residual blocks expand the number of channels in the feature map to 64, then to 128, and then maintain it at 128, respectively. Each residual block contains two convolutional layers: the first employs a 3 × 3 kernel and the second a 1 × 1 kernel. The stride of the 3 × 3 convolutional layer is 1 in the first and third residual blocks and 2 in the second. Shortcut connections between the two branches of each residual block sum their outputs, effectively addressing the vanishing gradient problem and ensuring a smooth flow of information. A batch normalization layer and a ReLU activation function follow the initial convolutional layer and each convolutional layer in the residual blocks; the former stabilizes the training process and expedites model convergence, while the latter enhances the model’s representational capacity and improves classification accuracy. A global average pooling layer then reduces the spatial dimensions of the feature map to 1 × 1, yielding a feature vector. To further enhance the model’s generalization capability, a Dropout layer is introduced: during training it randomly discards the outputs of some neurons, preventing the model from over-relying on specific neurons and thereby improving robustness.
Finally, the feature vector is fed into a fully connected layer for classification to generate the final prediction. Experimental results show that setting the dropout rate to 0.5 achieves the best prediction performance. The schematic diagram of this lightweight convolutional neural network is depicted in Figure 5.
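As a sanity check on the architecture described above, the feature-map shapes can be traced layer by layer: with “same” padding, a stride-s convolution maps a spatial size h to ceil(h/s). The class count below is a hypothetical placeholder for the number of enrolled users.

```python
import math

def lwcnn_shapes(h=28, w=28, n_classes=10):
    """Trace feature-map shapes through the LW-CNN described above.
    n_classes is an assumed placeholder (number of enrolled users)."""
    def conv(h, w, stride):
        return math.ceil(h / stride), math.ceil(w / stride)
    shapes = {}
    h, w = conv(h, w, 2)                  # initial 3x3 conv, stride 2, 3 -> 32 ch
    shapes["initial conv"] = (h, w, 32)
    for name, ch, s in [("block1", 64, 1), ("block2", 128, 2), ("block3", 128, 1)]:
        h, w = conv(h, w, s)              # 3x3 conv (stride s) then 1x1 conv
        shapes[name] = (h, w, ch)
    shapes["global avg pool"] = (1, 1, 128)
    shapes["fc"] = (n_classes,)
    return shapes

shapes = lwcnn_shapes()
```

Starting from the 28 × 28 MTF image, the two stride-2 layers reduce the spatial size to 7 × 7 before global average pooling collapses it to a 128-dimensional feature vector.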