Age Prediction in Healthy Subjects Using RR Intervals and Heart Rate Variability: A Pilot Study Based on Deep Learning

Lee, Kyung Hyun; Byun, Sangwon

doi:10.3390/app13052932


by Kyung Hyun Lee and Sangwon Byun *

Department of Electronics Engineering, Incheon National University, Incheon 22012, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(5), 2932; https://doi.org/10.3390/app13052932

Submission received: 27 December 2022 / Revised: 15 February 2023 / Accepted: 23 February 2023 / Published: 24 February 2023

(This article belongs to the Section Biomedical Engineering)


Abstract

Autonomic cardiac regulation is affected by advancing age and can be observed by variations in R-peak to R-peak intervals (RRIs). Heart rate variability (HRV) has been investigated as a physiological marker for predicting age using machine learning. However, deep learning-based age prediction has rarely been performed using RRI data. In this study, age prediction was demonstrated in a healthy population based on RRIs using deep learning. The RRI data were extracted from 1093 healthy subjects and applied to a modified ResNet model to classify four age groups. The HRV features were evaluated using this RRI dataset to establish an HRV-based prediction model as a benchmark. In addition, an age prediction model was developed that combines RRI and HRV data. The adaptive synthetic algorithm was used because of class imbalance and a hybrid loss function that combined classification loss and mean squared error functions was implemented. Comparisons suggest that the RRI model can perform similarly to the HRV and combined models, demonstrating the potential of the RRI-based deep learning model for automated age prediction. However, these models showed limited efficacy in predicting all age groups, indicating the need for significant improvement before they can be considered reliable age prediction methods.

Keywords: biological age; age prediction; machine learning; deep learning; autonomic nervous system; RR intervals; heart rate variability; ResNet; class imbalance; ordinal classification

1. Introduction

The aging of the human population, together with the rise in age-related disorders such as cardiovascular diseases, places pressure on healthcare systems and society [1,2,3]. The concept of biological aging arose from the observation that people do not age at the same rate [4]. Biological aging, also known as functional or physiological aging, differs from chronological aging because it reflects a functional decline in tissues and organs, whereas chronological age relates only to the passage of time [4]. Molecular biomarkers, such as DNA methylation and telomere length, have been investigated to evaluate biological age based on the findings that they are affected by aging [4,5,6]. For example, the difference between chronological age and methylation-estimated age was associated with mortality in the elderly population [7]. These results suggest that the estimated age, that is, biological age, is a better indicator of age-related physiological decline and disease than chronological age [4].

Biomarkers that indicate accelerated aging relative to chronological aging can be used to improve early intervention for high-risk individuals. Although molecular biomarkers have shown promising results in predicting age-related diseases by comparing chronological and estimated ages, they require invasive procedures to collect blood or tissue samples, followed by elaborate biochemical assays, which makes them impractical for routine use by the general public [8]. To overcome this limitation, researchers have investigated non-invasive methods, such as age estimation based on highly accessible physiological signals [8,9].

The autonomic nervous system (ANS) regulates and maintains homeostasis in the cardiovascular system through the interaction between sympathetic and parasympathetic neural activity, particularly at the sinoatrial node [10,11,12]. The autonomic regulation of the cardiac system is affected by advancing age because of structural changes, such as the loss of sinoatrial pacemaker cells, and functional changes, which lead to gradual deterioration in the balanced interaction between the two branches of the ANS [13,14]. This age-related change in autonomic cardiovascular control can be observed via the variation in cardiac beat-to-beat time intervals, known as heart rate variability (HRV) [12], which is quantified by evaluating the features of R-peak to R-peak intervals (RRIs) [10]. Heart rate variability features have been widely used to evaluate cardiac autonomic responsiveness to normal activities, external stimuli, and diseases [15,16,17]. Heart rate variability is considered a surrogate marker for cardiac autonomic regulation and the balance between sympathetic and vagal activities, although the underlying mechanism for the observed alterations in HRV is not fully understood [12].

Heart rate variability has been investigated as a non-invasive physiological biomarker to predict age because trends in HRV features associated with advancing age reflect age-induced variation in autonomic cardiac control in healthy subjects [18,19,20]. In particular, time, frequency, and nonlinear HRV features related to vagal modulation decrease with age in healthy subjects, indicating the prognostic significance of HRV indices in predicting age [18,19,20]. For example, Colosimo et al. developed an equation that predicts the cardiac age of healthy subjects using frequency-domain HRV features [21].

Owing to the recent advances in machine learning, studies have applied various algorithms to HRV features to predict age or classify age groups [22,23,24,25,26]. For instance, Corino et al. used neural networks for the regression analysis of HRV features to predict the age of 113 healthy subjects, resulting in a correlation coefficient of 0.872 between the predicted and true ages [22]. Poddar et al. used probabilistic neural network (PNN) algorithms to classify three age groups of 60 healthy men and achieved an accuracy of 70% [23]. Botsva et al. used neural networks to predict the mean group age from nine age groups of 22,433 subjects with 85% accuracy [24]. These results demonstrate the potential of using HRV features for machine learning-based automated age prediction.

Notably, these studies on the automated prediction of age used precalculated HRV features as input data for machine learning models. This is a classical machine learning approach based on handcrafted features, which requires the extraction of predefined features from raw data, that is, HRV features from RRI data. Conventional feature engineering methods require the prior knowledge of human experts and additional data processing to select the best feature set. By contrast, end-to-end learning methods, such as deep neural networks (DNNs), present an alternative to the classical approach by learning and extracting representations from raw signals without requiring explicitly defined features [27,28]. Deep neural network models, such as convolutional neural networks (CNNs) and long short-term memory (LSTM), have recently experienced a remarkable breakthrough, outperforming conventional methods in many applications, including image classification and natural language processing [27].

The use of RRI time-series data as physiological markers has been explored using a deep learning approach based on the observation that RRI data reflect autonomic cardiovascular regulation and the resulting cardiac rhythm. R-peak to R-peak interval data have been directly applied to DNNs without performing feature engineering to implement end-to-end learning in the detection of abnormal cardiac activity caused by medical conditions, such as arrhythmia [29,30,31,32,33,34] and sleep apnea [35,36,37]. For example, Faust et al. used RRI segment data with an LSTM model to distinguish between normal sinus rhythm (NSR) and atrial fibrillation (AFIB) [29]. Wang et al. proposed a modified residual neural network (ResNet) to detect apnea segments in RRI data [35]. However, to the best of our knowledge, deep learning-based age prediction has rarely been performed using RRI data. A previous study demonstrated age prediction using RRIs with a deep learning architecture, comprising a CNN and LSTM [38]. In that study, the highest accuracy in classifying seven age groups of healthy subjects was 32.43%, indicating that automated age prediction based on RRI data using deep learning is challenging.

Therefore, as a pilot study, we aimed to investigate the feasibility of using RRI data in conjunction with a deep learning approach as a biomarker for age prediction. Our main hypothesis was that an RRI-based deep learning model would offer similar or superior performance in age prediction compared with a conventional HRV feature-based model. We compared the proposed RRI model with the HRV model because previous studies have demonstrated that HRV features could be used to predict age based on a machine learning approach [22,23,24,25,26].

We focused on aging in a healthy population to build a baseline model representing normal aging. Because the baseline model is trained to learn healthy aging, we expect that a patient with age-related diseases would show an older estimated age than their chronological age. The difference between the baseline model’s estimated age and chronological age would indicate the risk of premature aging and disease development. Therefore, it is important to develop an age estimation model based on only healthy subjects before we apply this method to a broader population that includes unhealthy subjects.

To test our hypothesis, we used the RRI data collected from healthy subjects with a modified ResNet model to predict their age groups. We extracted the HRV features from the same RRI dataset to establish an HRV-based prediction model as a benchmark. The RRI and HRV models were compared to examine whether the prediction was improved by implementing a deep learning approach. In addition, we developed a model that combines RRI and HRV data for age prediction to test whether combining the two data types would improve performance. We used a publicly available dataset to maximize the number of samples for deep learning models, with the restriction that the minimum duration of RRI data instances be 5 min (see Materials and Methods for more details). Because of the class imbalance in the dataset used in this study, we oversampled the minority classes using the adaptive synthetic (ADASYN) algorithm. We implemented a hybrid loss function that combines classification loss and mean squared error (MSE) functions to improve the prediction of ordinal age classes. The proposed RRI model using ADASYN and the hybrid loss function performed comparably to the HRV model in classifying the four age groups.

The main contribution of this study is the development of a deep learning model based on RRI data to predict age in healthy subjects, which provides a new non-invasive approach for characterizing aging in terms of ANS. We expect that this study will provide a baseline DNN model for identifying normal aging that can be used to detect abnormal aging characteristics in RRI data in future studies.

2. Related Works

Estimating age based on facial images is widely recognized as the most prevalent non-invasive technique for age prediction [39]. Previous studies have demonstrated strong performance in age prediction using conventional machine learning or deep learning algorithms [39]. One of the major benefits of this method is its accessibility because obtaining facial images does not require expertise. However, cosmetics and facial expressions can affect estimation based on facial images, impairing the accuracy of such predictions [39]. By contrast, using physiological signals regulated by the ANS for age prediction presents an advantage because these signals are not affected by such external factors and are resistant to deception. However, acquiring physiological signals requires training and specific equipment, making it less accessible than facial imaging.

Non-invasive methods to evaluate biological age using physiological signals have been demonstrated in previous studies. For example, deep learning-based age prediction was performed using an electrocardiogram (ECG), which is one of the physiological signals that can be easily obtained [8,9]. In these works, 10 s-long 12-lead ECG signals were used as input data, and a CNN model was implemented to estimate age as a continuous variable. Attia et al. achieved a 0.7 r-squared between estimated and chronological ages and found that patients with an estimated age having a deviation of more than seven years from chronological age exhibited a higher incidence of comorbidities [9]. These studies trained deep learning models to learn raw ECG waveforms with a very short length (~10 s) instead of RRIs. Although 10 s-long ECG signals were long enough to reflect cardiac aging, it is not clear whether the same signals can be used to represent autonomic physiological responses due to their short length [40]. Because our study was aimed at characterizing aging in terms of ANS using RRIs, the rest of this section focuses on reviewing previous works based on RRIs.

Recently, an increasing number of studies have implemented deep learning methods without relying on feature engineering for the analysis of RRI time-series data. One of the major contributions of these studies was the automated detection of altered heartbeat rhythms caused by different types of arrhythmias [29,30,31,32,33,34]. The methods introduced were based on the observation that the distribution of RRIs during arrhythmias differs from that during NSRs [41]. For this purpose, the RRI sequence was extracted from ECG recordings using beat annotations manually labeled by expert clinicians, or by applying an R-peak detection algorithm to raw ECG signals when manual annotations were not available. In some cases, RRI data were split into shorter segments, particularly when ECG measurements were based on a long-term monitoring system such as Holter. Subsequently, the RRI data were input to various DNN models to classify two to six different types of heartbeat rhythms, including the NSR as a healthy control. For example, Faust et al. used 100-beat RRI sequences with an LSTM model to distinguish between AFIB and NSR [29]. Andersen et al. employed 30-beat RRIs with a deep learning model combining a CNN and LSTM to identify AFIB [30]. Ivanovic et al. proposed a DNN model consisting of a CNN and LSTM using 30 s RRIs to classify AFIB, atrial flutter, and NSR [31]. Lai et al. detected AFIB in 10 s RRIs extracted from patch-based ECG devices using a CNN model [32]. Taye et al. used 360 s RRI data with a CNN model to detect ventricular tachyarrhythmia; their model outperformed classical machine learning methods with HRV features [33]. Faust and Acharya implemented a ResNet model with 10 s RRI to classify six different beat types [34]. These studies achieved excellent performance in classifying beat types, with approximately 85–99% accuracy.

In addition, RRIs have been used to detect sleep apnea based on deep learning. Apnea events are accompanied by cyclic variations in RRIs, which consist of bradycardia during apnea, followed by tachycardia upon its cessation [42]. This pattern has been utilized in previous studies to develop new methods to distinguish apnea from normal events in the RRI data. For this purpose, RRIs obtained from ECG signals were divided into 1 min-long segments. Each segment was then labeled as an apnea or normal event by experts and used as input data. Wang et al. used three 1 min RRIs with ResNet to classify apnea and normal events [35]. In a separate study, Wang et al. used five 1 min RRIs as input to a modified LeNet-5 CNN model [36]. Shen et al. implemented a multiscale dilation attention CNN model using 1 min RRI segments [37]. These studies on apnea detection also achieved high accuracies of approximately 88–94%.

However, most previous studies on the automated prediction of age based on alterations in cardiac autonomic regulation implemented feature-dependent classical machine learning methods using HRV features, as described in the Introduction section. To the best of our knowledge, only one previous study has demonstrated deep learning-based age prediction using RRI data [38]. That study used a deep learning architecture comprising a CNN and LSTM to classify seven age groups of 181 healthy subjects. Two prediction methods, classification and regression, were used separately for age prediction because classification does not consider the order of the age classes. In the regression analysis, the closest age decade was selected as the predicted class. The highest accuracies achieved with the classification and regression methods were 10.81% and 32.43%, respectively. The overall performance was not adequate, even after considering the total number of classes, suggesting that RRI-based age prediction using deep learning is difficult.

In this study, we developed a novel end-to-end learning method for classifying age groups in healthy subjects using RRIs. We combined the classification loss and MSE functions to enable the proposed model to learn the ordinal relationships of the age classes. A modified ResNet was implemented with 5 min-long RRI sequences as input data. In addition, we established an HRV feature-based prediction model using the same RRI data for performance comparisons. Finally, we demonstrated age prediction using both RRI and HRV data.

3. Materials and Methods

3.1. Dataset

We applied three criteria for selecting a database for RRI-based age prediction: (1) a recording length of at least 5 min, (2) a controlled measurement condition, and (3) a large number of samples. First, it is recommended to record at least 5 min-long RRI data to evaluate the characteristics of spectral components in studies on HRV [10]. A 5 min recording length is widely considered the gold standard for investigating short-term autonomic physiological responses [10]. Because the objective of this study was to build a deep learning model that detects age-related changes in autonomic regulation embedded in RRI sequences, we selected a database containing measurements that were at least 5 min in length. Second, we preferred measurements in a controlled laboratory environment during supine rest because subjects are less affected by external factors and their own activity, which facilitates the acquisition of stable RRIs. We did not use long-term ambulatory ECG recordings because of the lack of accurate information of the subject’s status. Finally, we selected a large database of samples because a deep learning model typically requires a large number of samples for network training.

In this study, we used the database “Autonomic Aging: a dataset to quantify changes of cardiovascular autonomic function during healthy aging”, which was acquired from PhysioNet [14,43]. It contains ECG recordings from 1121 healthy participants obtained at Jena University Hospital, Jena, Germany [14]. Individuals with any of the following were excluded: any medical condition, the use of illegal drugs or medication potentially influencing cardiovascular function, and any pathological findings from physical examination, resting ECG, or routine laboratory parameters [14]. The ECG signals were measured using the lead II configuration at a 1000 Hz sampling rate while participants were in a supine resting position [14]. The recordings were 8–35 min long [14].

Individual age data from the participants were anonymized and generalized to the age groups. The database originally categorized ages into 15 five-year age groups, except for the youngest (18–19 years) and oldest (85–92 years) groups [14]. However, owing to a lack of samples in the elderly population and data imbalance, we reduced the number of total groups to four and combined subjects aged 50 and over into one group. Table 1 lists the number of participants in each age group used in this study. The dataset was highly imbalanced, with 64% of the total subjects in Group 1 (18–29 years). From a total of 1121 subjects, 1093 subjects were included in the final dataset after excluding 25 subjects because of lack of age information and 3 subjects because of low signal quality.

Despite the class imbalance and lack of individual age data, the dataset used in this study is currently the best available option. The dataset contained more than 1000 ECG recordings of at least 5 min in length obtained from subjects in the supine resting position, satisfying criteria 1 and 2. Although PhysioNet has various ECG datasets with a higher number of samples, most of their recordings are extremely short or were measured under ambulatory conditions and therefore do not meet the criteria [43]. Issues arising from data imbalance and the lack of age data are further discussed in the Discussion section.

Although one of the criteria for selecting a database was a large sample size, the final number of samples used in this study, 3606, which is explained in the following section, is small in the context of deep learning. For example, public datasets for image classification and natural language processing, such as ImageNet and 20 Newsgroups, have at least 10,000 and up to 10 million samples. Because it was difficult to find a publicly available database that satisfied criteria 1 and 2, we did not assume a specific minimum sample size but tried to select a database with as many samples as possible. Despite the relatively small sample size compared to image or text databases, the dataset used in this study was the largest database that we could find for RRI-based age prediction.

3.2. Data Preprocessing

Figure 1 shows an overview of data preprocessing and feature extraction. We used the NeuroKit2 package to extract R-peaks from the raw ECG recordings and generate RRI sequences [44]. The NeuroKit2 package uses in-house algorithms for noise reduction and R-peak detection [45]. Electrocardiogram signals are often contaminated with different types of noise, including baseline wandering, power line interference, and instrumentation noise [46]. Therefore, noise reduction is required before R-peak detection to prevent false detections. Noise in the raw ECG signals was reduced using a high-pass filter (Butterworth, 5th order, 0.5 Hz), followed by a powerline notch filter [45]. The QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal [45]. R-peaks were detected as local maxima in the QRS complexes, which indicate ventricular depolarization and are the most prominent wave components in ECG signals measured using the lead II configuration [45]. The RRI sequences were generated by calculating the time differences between consecutive R-peaks. Subsequently, the RRIs were interpolated to 4 Hz.
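As a minimal sketch of the last two steps, the following Python code converts R-peak sample indices into RRIs and resamples them on a uniform 4 Hz grid. The function names are hypothetical, the R-peak indices are assumed to be already available from a detector, and linear interpolation is used only for simplicity; it is not necessarily the interpolation used in this study.

```python
def rr_intervals_ms(r_peak_samples, sampling_rate_hz=1000):
    """Return (beat_times_s, rri_ms); each RRI is time-stamped at its closing R-peak."""
    rri_ms = [(b - a) * 1000.0 / sampling_rate_hz
              for a, b in zip(r_peak_samples, r_peak_samples[1:])]
    beat_times_s = [idx / sampling_rate_hz for idx in r_peak_samples[1:]]
    return beat_times_s, rri_ms


def resample_linear(beat_times_s, rri_ms, rate_hz=4.0):
    """Resample an unevenly spaced RRI series on a uniform grid.

    Assumption: linear interpolation between neighboring beats.
    """
    if len(rri_ms) < 2:
        raise ValueError("need at least two RR intervals")
    step = 1.0 / rate_hz
    span = beat_times_s[-1] - beat_times_s[0]
    n_points = int(span * rate_hz + 1e-9) + 1
    out, j = [], 0
    for i in range(n_points):
        t = beat_times_s[0] + i * step
        # Advance to the pair of beats that brackets the grid time t.
        while j < len(beat_times_s) - 2 and beat_times_s[j + 1] < t:
            j += 1
        t0, t1 = beat_times_s[j], beat_times_s[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(rri_ms[j] + w * (rri_ms[j + 1] - rri_ms[j]))
    return out
```

With a 1000 Hz recording, R-peaks at samples 0, 1000, 2000, and 3000 give three RRIs of 1000 ms, and the resampled series contains nine points covering the 2 s between the first and last time-stamped beats.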

We applied a non-overlapping sliding window to the entire length of the RRI data obtained from each subject to generate 5 min RRI sequences. A total of 3606 5 min RRI samples were generated from 1093 subjects. The median number of samples extracted from each subject was 3 (1–7 samples from each subject). Table 1 lists the number of RRI samples acquired from each age group. Figure 2 shows a sample RRI sequence extracted from a subject in age group 1.
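The windowing step can be sketched as follows. This is an illustration under two assumptions that are not stated explicitly in the text: the windows are cut from the 4 Hz interpolated series, so that one 5 min window holds 1200 points, and a trailing partial window is discarded. The function name is hypothetical.

```python
RATE_HZ = 4                    # interpolation rate given in the text
WINDOW_LEN = 5 * 60 * RATE_HZ  # 1200 points per 5 min window (assumption)


def split_non_overlapping(samples, window_len=WINDOW_LEN):
    """Cut a sequence into consecutive full windows; drop any trailing remainder."""
    n_windows = len(samples) // window_len
    return [samples[k * window_len:(k + 1) * window_len]
            for k in range(n_windows)]
```

Under these assumptions, an 8 min recording yields one window and a 35 min recording yields seven, which agrees with the reported range of 1–7 samples per subject.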

3.3. Feature Extraction

We extracted 20 HRV features from each of the 5 min RRI segments as input data for the feature-based prediction model (Figure 1, Table 2) [17]. We followed the international guidelines for standard HRV analysis in feature extraction [10,47], which was performed using the NeuroKit2 package [44]. Voss et al. demonstrated that linear and nonlinear HRV features evaluated from 5 min recordings showed significant dependence on age in healthy subjects, suggesting that this signal length is long enough to reflect age-induced variation in HRV features [18].

Six time-domain features were calculated directly from the RRI series: the mean of the RRIs, standard deviation of the RRIs (SDNN), root mean square of successive RRI differences (RMSSD), percentage of successive RRIs differing by more than 50 ms (pNN50), integral of the histogram of the RRI divided by its height (TRI), and baseline width of the RRI histogram (TINN). Seven features were calculated using frequency-domain analysis. The RRI data were interpolated at 4 Hz using quadratic spline interpolation. Subsequently, Welch’s method was used to estimate the power spectral density (PSD) of equidistantly sampled RRI data. The following features were evaluated from the PSD: logarithm-transformed powers of very-low-frequency (VLF, 0–0.04 Hz), low-frequency (LF, 0.04–0.15 Hz), high-frequency (HF, 0.15–0.4 Hz), and total bands. Additionally, the relative powers of the LF and HF bands in normalized units and the LF/HF power ratio were evaluated. Five nonlinear features were extracted to assess the nonlinear dynamics in the RRI series: approximate entropy (ApEn) [48], sample entropy (SampEn) [49], short- and long-range scaling exponents of detrended fluctuation analysis (DFA) [50], and correlation dimension (CorDim) [51]. Finally, two features were evaluated using a Poincaré plot. SD1 and SD2 represent the standard deviations of the Poincaré plot perpendicular to and along the identity line, respectively.
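To make some of these definitions concrete, the sketch below computes four of the time-domain features and the two Poincaré descriptors directly from an RRI series in milliseconds. It is an illustration using textbook formulas, not the NeuroKit2 implementation used in this study; the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math
import statistics


def basic_hrv_features(rri_ms):
    """Compute mean RRI, SDNN, RMSSD, pNN50, SD1, and SD2 from RRIs in ms."""
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    sdnn = statistics.stdev(rri_ms)   # sample SD of the RRIs (assumption: n - 1)
    sdsd = statistics.stdev(diffs)    # sample SD of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    # Poincaré descriptors: SD1 is perpendicular to the identity line, SD2 is along it.
    sd1 = sdsd / math.sqrt(2)
    # Guard against a small negative value caused by the finite-sample approximation.
    sd2 = math.sqrt(max(2 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))
    return {
        "mean_rri": statistics.fmean(rri_ms),
        "sdnn": sdnn,
        "rmssd": rmssd,
        "pnn50": pnn50,
        "sd1": sd1,
        "sd2": sd2,
    }
```

For a series that alternates between 800 and 860 ms, every successive difference is ±60 ms, so RMSSD is 60 ms and pNN50 is 100%, while SD2 collapses to zero because all Poincaré points lie on a line perpendicular to the identity line.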

3.4. Class Imbalance Handling

As listed in Table 1, the database is dominated by Group 1. Owing to this imbalance, models tend to learn mainly from the majority class, resulting in a bias toward Group 1. Previous studies have demonstrated that eliminating data imbalance in training data with oversampling improves the prediction performance of DNNs [52,53]. We used data-level ADASYN oversampling to augment minority classes and address the imbalanced learning problem [54]. The adaptive synthetic algorithm is a widely used sample synthesis method that can be implemented for tabular and time-series data, such as ECG and RRI [55,56]. In this study, we used ADASYN to synthesize the HRV and RRI data of minority classes in the training datasets. The adaptive synthetic algorithm sets the total number of synthetic samples from the size gap between the majority and minority classes and generates more of them for minority samples whose nearest neighbors are dominated by the majority class [54]. Synthetic samples were generated using the following rule:

x_new = x_i + λ(x_ki − x_i),

where x_new is a newly generated sample, x_i is a sample from a minority class, x_ki is one of the samples from the k-nearest neighbors, λ is a random number between zero and one, and (x_ki − x_i) is the difference vector in n-dimensional spaces [54]. Additional details on ADASYN implementation can be found in [54].
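The synthesis rule can be sketched in a few lines of Python. The first function applies the interpolation formula above to two feature vectors; the second illustrates how the original ADASYN description [54] distributes a synthesis budget across minority samples according to how many majority-class samples lie among their k nearest neighbors. Both function names and the example numbers are hypothetical, and the nearest-neighbor search itself is omitted.

```python
import random


def synthesize(x_i, x_ki, rng):
    """Return x_new = x_i + lam * (x_ki - x_i) with lam drawn uniformly from [0, 1)."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_ki)]


def allocate_counts(majority_neighbor_counts, total_to_generate):
    """Split a synthesis budget across minority samples.

    Each minority sample receives a share proportional to the number of
    majority-class samples among its k nearest neighbors (k cancels out
    when the shares are normalized).
    """
    total = sum(majority_neighbor_counts)
    if total == 0:
        return [0] * len(majority_neighbor_counts)
    return [round(total_to_generate * c / total)
            for c in majority_neighbor_counts]
```

Every synthetic sample lies on the line segment between a minority sample and one of its neighbors, so the new data stay inside the region already occupied by the minority class. A packaged implementation is available as `ADASYN` in the imbalanced-learn library; whether it was used in this study is not stated in the text.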

3.5. Loss Function

We employed two different loss functions, classification and hybrid loss functions, to investigate whether performance would be improved by learning ordinal relationships among the age classes. The main purpose of the loss function is to optimize the model by minimizing classification loss. To improve the performance in ordinal classification, we added a regression loss term to the loss function, which provided ordinal information for optimization.

To implement the classification loss function, we combined cross-entropy and cosine losses. Cross-entropy is a widely used method for optimizing various types of models for classification problems. However, a previous study suggested that cross-entropy might not be an optimal choice for training models with small datasets [57]. To address this issue, Barz et al. proposed a cosine loss as a loss function, which maximized the cosine similarity between the true class and the model’s output, and demonstrated that the cosine loss outperformed cross-entropy by a considerable margin when a model was trained on small datasets [57]. Furthermore, for small datasets, combining cosine loss and cross-entropy outperforms employing either cosine loss or cross-entropy alone [57]. Therefore, the classification loss function used in this study can be defined as follows:

L_class = CrossEntropy(y_class, ŷ_class) + CosineLoss(y_class, ŷ_class),

where CosineLoss = 1 − (cosine similarity between y_class and ŷ_class). Additional details on the implementation of cosine loss can be found in [57].

To implement the hybrid loss function, we added the MSE loss term to the loss function, which considers the ordinal relationship among age classes. Models were constrained to learn the median age of each age group before being recategorized. The MSE loss penalized a prediction more heavily the farther it was from its true class, whereas without the MSE term all misclassifications were penalized equally. The hybrid loss function can be defined as follows:

L_hybrid = L_class + MSE(y_reg, ŷ_reg).
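The two loss definitions can be illustrated for a single sample with plain Python. This is a sketch under several assumptions: the cosine loss is applied directly to the predicted probability vector, the three terms are summed with equal weight as in the equations above, and the regression target is an age in years. The function names and the example ages are hypothetical; see [57] for the original formulation of the cosine loss.

```python
import math


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))


def cosine_loss(one_hot, probs):
    """One minus the cosine similarity between the label and the prediction."""
    dot = sum(t * p for t, p in zip(one_hot, probs))
    norm = (math.sqrt(sum(t * t for t in one_hot))
            * math.sqrt(sum(p * p for p in probs)))
    return 1.0 - dot / norm


def hybrid_loss(one_hot, probs, age_true, age_pred):
    """L_hybrid = cross-entropy + cosine loss + squared error on the age target."""
    l_class = cross_entropy(one_hot, probs) + cosine_loss(one_hot, probs)
    return l_class + (age_true - age_pred) ** 2
```

For two predictions with identical class probabilities, the one whose regression output is farther from the target age receives the larger loss, which is how the MSE term injects ordinal information. In practice, the relative scale of the regression term would likely need attention, because a squared error in years can dominate the classification terms.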

3.6. Modified 1D-ResNet34

We modified the ResNet34 model to a 1D version to extract features from the raw RRIs [58]. The selection of the architecture plays an important role in the development of deep learning models. ResNet was chosen as the base architecture for our proposed model owing to its successful performance. A previous review article reported that ResNet outperformed other deep learning models in classifying time-series data [59]. In addition, ResNet was employed with RRI data to classify arrhythmias [34].

Figure 3a depicts the overall architecture of the model. Figure 3b depicts the conv block, which contains one 1D convolutional layer, a batch normalization layer, and a max pooling layer. Figure 3c depicts the residual block, which contains three 1D convolutional layers and two batch normalization layers. Two of the 1D convolutional layers were used for the stacked layers, and the third (kernel size = 1) was used for the shortcut connection. ReLU activation was used for each layer, and a batch normalization layer was placed before the vectors were fed into the activation function. Table 3 presents details of the model architecture. In every convolutional layer except those in the shortcut connections, the input vectors were zero-padded with a padding size of seven. In the linear block, dropout layers (50%) were added after each activation function, and SoftMax was used as the activation function of the output layer.

3.7. Modeling and Statistical Analysis

As described in the Introduction section, the main hypothesis of this study was that an RRI-based deep learning model would offer similar or superior performance in age prediction compared with a conventional HRV feature-based model. To test our hypothesis, we developed three types of models for age group prediction: an HRV feature-based model (Figure 4a), an RRI-based model (Figure 3), and a combined model that used both HRV and RRI data (Figure 4b). The HRV model used only HRV features as input data and consisted of three fully connected layers (number of neurons: 200, 100, and 50), each accompanied by a batch normalization layer and a dropout layer. The RRI model used only RRIs as input data with the modified 1D-ResNet34 described in Section 3.6. The combined model used both HRV and RRI as input data, but the two types of data were fed into the model separately. The RRIs were processed by the same modified ResNet34 up to the average pool layer, which extracted the representation vectors from the RRIs. These representation vectors were then concatenated with the HRV features and fed into three fully connected layers (number of neurons: 1000).

We used stratified 5-fold subject-wise cross-validation for the model evaluation. Age group was used as a stratification option. Four folds were used for training and the remaining fold was used for evaluation. Five models were developed and evaluated for each fold. We gathered all evaluation results and bootstrapped them to estimate the model’s mean performance with confidence intervals.
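A stratified subject-wise split can be sketched as follows: folds are assigned per subject, within each age group, so that all 5 min samples from one subject fall in the same fold and each fold receives a similar share of every age group. This is an illustrative round-robin scheme with hypothetical function names, not necessarily the procedure used in this study.

```python
import random
from collections import defaultdict


def stratified_subject_folds(subject_groups, n_folds=5, seed=0):
    """Map each subject ID to a fold index, stratified by age group.

    subject_groups: dict of subject ID -> age group label.
    """
    by_group = defaultdict(list)
    for subject, group in subject_groups.items():
        by_group[group].append(subject)
    rng = random.Random(seed)
    fold_of = {}
    for group in sorted(by_group):
        subjects = sorted(by_group[group])
        rng.shuffle(subjects)
        # Deal the shuffled subjects of this age group out to the folds in turn.
        for pos, subject in enumerate(subjects):
            fold_of[subject] = pos % n_folds
    return fold_of


def split_samples(sample_subjects, fold_of, test_fold):
    """Return (train_indices, test_indices) for samples labeled by subject ID."""
    train = [i for i, s in enumerate(sample_subjects) if fold_of[s] != test_fold]
    test = [i for i, s in enumerate(sample_subjects) if fold_of[s] == test_fold]
    return train, test
```

Splitting by subject rather than by sample matters here because each subject contributes up to seven segments; a sample-wise split could place segments from the same person in both the training and evaluation sets.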

One-way analysis of variance (ANOVA) was used to investigate the effect of age on mean HRV features. In addition, we used an independent t-test for two samples to compare the evaluation metrics between the RRI and HRV models, the combined and RRI models, and the combined and HRV models.
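For the pairwise model comparisons, the test statistic of an independent two-sample t-test can be written out directly. The sketch assumes the classic pooled-variance (Student) form, because this section does not state whether equal variances were assumed; the function name is hypothetical, and the p-value lookup is omitted because the Python standard library has no t-distribution CDF.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic and its degrees of freedom.

    a, b: sequences of metric values (for example, BAC values) obtained
    for two models. Equal variances are assumed (pooled estimate).
    """
    na, nb = len(a), len(b)
    dof = na + nb - 2
    # variance() is the unbiased sample variance (divides by n - 1).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, dof
```

A positive statistic means the first model has the higher mean, which matches the sign convention of the difference tables, where the second model's value is subtracted from the first.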

3.8. Evaluation Metric

We evaluated the proposed models using five different metrics: accuracy, balanced accuracy (BAC), quadratic weighted kappa (QWK), precision, and F1-score (Table 4). Multiple metrics were needed because the models performed ordinal classification on an imbalanced dataset, which requires a more rigorous evaluation than nominal classification on a balanced dataset. An imbalance in a dataset affects a model’s decision owing to a bias toward a majority class [60,61]. Accuracy becomes an unreliable measure in the presence of class imbalance because it can be improved by simply assigning all samples to the majority class. In multiclass classification, the BAC is defined identically to the macro-recall, which is the average of the recall obtained for each class. The BAC is suited to evaluating imbalanced datasets in multiclass classification because it treats all classes as equal [62]. For a balanced dataset, the BAC is approximately the same as the accuracy.
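The difference between accuracy and BAC can be illustrated with a small confusion matrix. The counts below are hypothetical and only mimic a four-class problem dominated by one class; rows are true classes and columns are predicted classes, and every class is assumed to have at least one sample.

```python
def accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def balanced_accuracy(cm):
    """Macro-recall: unweighted mean of the per-class recalls."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(recalls) / len(recalls)


# Hypothetical classifier that assigns every sample to class 0.
majority_only = [
    [63, 0, 0, 0],
    [12, 0, 0, 0],
    [12, 0, 0, 0],
    [13, 0, 0, 0],
]
acc = accuracy(majority_only)           # 63 of 100 samples are correct
bac = balanced_accuracy(majority_only)  # recalls are 1, 0, 0, 0
```

In this toy case the accuracy equals the share of the majority class, whereas the BAC drops to the chance level of one in four classes.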

In addition, a metric must consider the deviation between a prediction and the ground truth because age classes have an ordinal relationship. Nominal evaluation metrics penalize all misclassifications equally, assuming that classes have no ordinal relationship [63]. We used the ordinal metric QWK to complement accuracy and BAC in evaluating the ordinal classification. Quadratic weighted kappa is a modified version of Cohen’s kappa, which was developed to measure the degree of disagreement [64,65]. Quadratic weighted kappa is a metric for ordinal problems because it considers the ordering of classes by assigning a weight to each prediction based on the closeness between the predicted and actual classes [64,66]. Therefore, a failed prediction that is close to the ground truth is considered better than a misclassification that is far from the ground truth [66,67]. In QWK, the weight is proportional to the square of the distance between the predicted and actual classes. Previous studies have suggested that QWK is suitable for multiclass ordinal classification [68] and highly imbalanced datasets [69].
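A minimal sketch of QWK computed from a confusion matrix is given below. It uses the common normalization of the weights by (K − 1)², which is an assumption here (the weights actually used are defined in Table 4); the normalization cancels in the ratio, so it does not change the result. The example matrices are hypothetical.

```python
def quadratic_weighted_kappa(cm):
    """Cohen's kappa with quadratic disagreement weights.

    cm[i][j] counts samples of true class i predicted as class j.
    Assumes at least two classes are present in the marginals; otherwise
    the expected disagreement is zero and the ratio is undefined.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed += weight * cm[i][j]
            # Expected count if predictions were independent of the truth.
            expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - observed / expected


# Two hypothetical classifiers with the same accuracy (38 of 40 correct).
near_miss = [[8, 2, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
far_miss = [[8, 0, 0, 2], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
kappa_near = quadratic_weighted_kappa(near_miss)
kappa_far = quadratic_weighted_kappa(far_miss)
```

Both matrices contain two errors, but the errors into the adjacent class are penalized less than the errors three classes away, so `kappa_near` exceeds `kappa_far`.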

In addition, we evaluated precision and F1-score using macro-averaging. The F1-score is defined as the harmonic mean of recall and precision. Because the macro F1-score is calculated at the class level and each class has the same weight, the minority class is as important as the majority class in the F1-score evaluation. Therefore, F1-score is more suitable than accuracy when dealing with imbalanced data. We used BAC, QWK, and F1 as the primary measures of performance. Accuracy was reported as one of the standard metrics, although it can be misleading in an imbalanced dataset.
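Macro-averaged precision and F1-score can be sketched as follows. Two conventions are assumed because this section does not spell them out: the macro F1 is taken as the mean of the per-class F1-scores, and a class that is never predicted receives a precision and F1 of zero. The function name and the counts are hypothetical.

```python
def macro_precision_f1(cm):
    """Macro-averaged precision and F1 from a confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    k = len(cm)
    precisions, f1_scores = [], []
    for c in range(k):
        true_positive = cm[c][c]
        predicted = sum(cm[i][c] for i in range(k))  # column total
        actual = sum(cm[c])                          # row total
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        f1_scores.append(f1)
    return sum(precisions) / k, sum(f1_scores) / k


# Hypothetical classifier that assigns every sample to class 0.
majority_only = [
    [63, 0, 0, 0],
    [12, 0, 0, 0],
    [12, 0, 0, 0],
    [13, 0, 0, 0],
]
macro_precision, macro_f1 = macro_precision_f1(majority_only)
```

Because the three ignored classes each contribute zero, both macro scores stay far below the 63% accuracy that the same predictions would obtain.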

We investigated how the prediction was affected by different RRI samples in the same person by evaluating the inter-sample reliability of age prediction using Cohen’s kappa. Two distinct 5 min RRI samples from the same subject were randomly selected without replacement and predicted age groups from these two samples were acquired. We repeated this process for all test subjects with two or more 5 min samples. Based on these results, we calculated Cohen’s kappa, which measured the agreement between two predictions obtained from the same subject. We employed the quadratic weights described in Table 4 for the calculations to account for the ordinal nature of the predictions. For the inter-sample reliability, we evaluated the kappa values based on the models using hybrid loss with ADASYN.
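The pairing step of this procedure can be sketched as follows. The data layout (a dict from subject id to the list of predicted age groups, one per 5 min sample) and the function names are assumptions made for illustration; the resulting table is the input from which a quadratic-weighted kappa would then be computed.

```python
import random


def paired_predictions(predictions_by_subject, seed=0):
    """Draw two distinct samples per subject and pair their predictions."""
    rng = random.Random(seed)
    pairs = []
    for subject in sorted(predictions_by_subject):
        predictions = predictions_by_subject[subject]
        if len(predictions) < 2:
            continue  # subjects with a single 5 min sample are skipped
        # Sampling indices without replacement guarantees distinct samples.
        first, second = rng.sample(range(len(predictions)), 2)
        pairs.append((predictions[first], predictions[second]))
    return pairs


def agreement_table(pairs, n_classes):
    """Cross-tabulate the first prediction (rows) against the second."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for first, second in pairs:
        table[first][second] += 1
    return table
```

Entries on the diagonal of the table are subjects whose two samples received the same age group, and entries far from the diagonal are the disagreements that the quadratic weights penalize most.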

4. Results

4.1. Effect of Age on HRV Features

Table 5 lists the mean and standard deviation of HRV features calculated for each age group. One-way ANOVA was performed to compare the effects of age on HRV features. Except for SDNN, RMSSD, and SD1, a statistically significant difference exists in each HRV feature between at least two age groups. These results suggest that most HRV features used in this study were affected by aging and support the potential of HRV features in automated age prediction, which is consistent with previous studies that have demonstrated an association between aging and HRV features [18,19,20].

4.2. Classifier Assessment

We demonstrated an automated age prediction based on RRI data using a deep learning approach. In addition, the HRV model, which is considered a conventional feature-based method for comparison with the proposed RRI model, was developed for age prediction. The HRV features used in this study clearly showed age dependence, supporting the potential of the HRV model for age prediction (Table 5). We demonstrated a model that combined RRI and HRV data, and its performance was compared to the RRI and HRV models to test whether the prediction was improved by combining the two data types.

We compared the performance of models implemented with and without ADASYN oversampling, which addresses the class imbalance. We also compared two different loss functions, classification and hybrid loss functions, to investigate whether learning the ordinal relationship among the age classes would improve the performance. Therefore, four different combinations of methods were implemented: classification loss without ADASYN, classification loss with ADASYN, hybrid loss without ADASYN, and hybrid loss with ADASYN. Five different performance metrics, BAC, QWK, accuracy, precision, and F1, were used to consider the issues of class imbalance and ordinal classification.

Table 6 and Figure 5 show the prediction performance based on data type and method. The bold entries in Table 6 indicate the best performance for each evaluation metric. The RRI model using hybrid loss with ADASYN achieved the highest BAC (44.8%) and F1-score (0.446). The same highest BAC was also achieved by the HRV model using classification loss with ADASYN. The highest QWK (0.59) was obtained by the combined model using hybrid loss with ADASYN. The highest accuracy (70.1%) and precision (0.496) were realized by the combined model using the hybrid loss without ADASYN. We noted that the RRI model using hybrid loss with ADASYN outperformed both the HRV and combined models using the same method in terms of BAC, precision, and F1-score. We further analyze the prediction results using confusion matrices in the following section to investigate the effect of class imbalance on the evaluation metrics.

We considered the method using classification loss without ADASYN as the base method and examined whether applying hybrid loss or ADASYN would change the performance compared with the base method. Except for the accuracy, implementing hybrid loss or ADASYN improved the performance metrics in most cases. For example, the BAC, QWK, and F1-score were increased using hybrid loss or ADASYN for all three models, except when the HRV model used hybrid loss without ADASYN. Similarly, the precision was improved by using hybrid loss or ADASYN for all three models, except when the HRV and RRI models used hybrid loss without ADASYN.

By contrast, the effect of hybrid loss and ADASYN on the accuracy showed the opposite trend. Using hybrid loss with ADASYN resulted in a 2–9% decrease in accuracy compared to the base method. Implementing classification loss with ADASYN also led to a decrease in accuracy for the HRV and RRI models. The use of hybrid loss without ADASYN improved the accuracy for all data types.

It should be noted that in the RRI model, using hybrid loss with ADASYN resulted in the lowest accuracy but the highest BAC, QWK, precision, and F1-score. These results suggest that implementing hybrid loss with ADASYN improved the RRI model in learning imbalanced datasets and the ordinal nature of classes, while decreasing accuracy.

Table 7 shows the prediction performance for each class based on data type and method. It indicates that the prediction of minority classes was not successful. In particular, Groups 2 and 3 showed substantially lower F1-scores than Groups 1 and 4, ranging from 0 to 0.253 and from 0 to 0.227, respectively. The RRI model using hybrid loss with ADASYN achieved the highest F1-score in Groups 2 and 3.

4.3. Confusion Matrices

The confusion matrices for the proposed models are shown in Figure 6, Figure 7 and Figure 8, which illustrate the effect of class imbalance on the prediction. All classifiers were skewed toward choosing Group 1 as a prediction, regardless of the ground truth. In addition, recall differed markedly among the classes. In every confusion matrix, Group 1 showed the highest recall among the age groups (73–97%) and Group 4 showed the second highest (44–65%). By contrast, Groups 2 and 3 showed substantially lower recall than Groups 1 and 4, ranging from 0 to 31% and from 0 to 24%, respectively.

The confusion matrices indicate that using hybrid loss with ADASYN increased the recall of Groups 2 or 3, although it decreased the recall of Group 1 in all data types (Figure 6, Figure 7 and Figure 8). However, applying either hybrid loss or ADASYN led to mixed outcomes, affecting the recall of Groups 2 and 3. These results were consistent with the trends observed in the performance metrics (Section 4.2), which showed that using hybrid loss with ADASYN increased the BAC, QWK, or F1-score but decreased the accuracy.

The confusion matrices demonstrated that predicting Groups 2 and 3 was not successful and that accuracy should be interpreted with caution if there is an imbalance in multiclass datasets. In all models, high accuracy was achieved by simply choosing Group 1 as a prediction. Using hybrid loss with ADASYN resulted in a decrease in accuracy compared with the base method, although the prediction of Groups 2 and 3 was improved. This is because the recall of Group 1 was decreased using hybrid loss with ADASYN, which considerably affected the accuracy. The effect of a decrease in Group 1 recall on the overall accuracy was amplified because of the large sample size of Group 1. These results suggest that BAC, QWK, and F1 should be considered as more important metrics than accuracy to prevent the misinterpretation of models’ performance.
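The amplification can be made explicit by writing accuracy as a support-weighted sum of the per-class recalls, where n_k is the number of evaluation samples in Group k and N is their total. As a rough illustration, assume that the evaluation folds preserve the 63% share of Group 1 reported for the whole dataset, and take the 73% Group 1 recall reported for the RRI model using hybrid loss with ADASYN:

$$
\text{Accuracy} = \sum_{k=1}^{4} \frac{n_k}{N}\,\text{Recall}_k,
\qquad
\frac{n_1}{N}\,\text{Recall}_1 \approx 0.63 \times 0.73 \approx 0.46
$$

Under that assumption, Group 1 alone contributes roughly 46 of the approximately 60 percentage points of accuracy reported for this model. The BAC instead weights each recall by 1/4: the unweighted mean of the four recalls reported for this model, (73 + 31 + 24 + 50)/4 = 44.5%, is close to its reported BAC of 44.8%.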

4.4. Comparison between RRI and HRV Models

We compared the performances of the HRV and RRI models to test whether the prediction was improved by implementing a deep learning approach. Table 8 lists the difference in the performance metrics between the RRI and HRV models that used the same method. The differences were calculated by subtracting the HRV model value from that of the RRI model. We performed a two-sample t-test to statistically compare the metrics between the two models.

The RRI model outperformed the HRV model in terms of BAC, precision, and F1-score when using hybrid loss with ADASYN. However, there was no significant difference in QWK between the two models when using the same method. We observed a significantly higher BAC in the RRI model than in the HRV model when the hybrid loss was implemented with or without ADASYN, and opposite trends in BAC were observed when the classification loss was implemented. We observed a significantly higher QWK in the RRI model than in the HRV model when hybrid loss was used, and the opposite trend in QWK was observed when the classification loss was used. The accuracy differed significantly in three methods, with the HRV model outperforming the RRI model when using hybrid loss with ADASYN. The precision differed significantly in two methods, with the RRI model outperforming the HRV model when using hybrid loss with ADASYN. The F1-score differed significantly in three methods, with the RRI model outperforming the HRV model when using hybrid loss with ADASYN.

Specifically, the RRI model outperformed the HRV model in terms of BAC and F1-score when using hybrid loss with ADASYN, suggesting better performance in addressing class imbalance. Considering all four methods, these results suggest that the RRI and HRV models show comparable performance in predicting age.

4.5. Comparison between Combined, HRV, and RRI Models

We compared the performance of the combined model with the HRV and RRI models to test whether the prediction was improved by combining the two data types. Table 9 and Table 10 list the differences in performance metrics between the combined, HRV, and RRI models that used the same method. The differences were calculated by subtracting the HRV or RRI model’s value from the value of the combined model. We performed a two-sample t-test to compare the metrics between the two models statistically.

When using hybrid loss with ADASYN, there was no significant difference in BAC, QWK, or F1-score between the combined and HRV models (Table 9). The BAC differed significantly in three methods, with the HRV model outperforming the combined model in two cases (Table 9). The QWK differed significantly in three methods, with the combined model outperforming the HRV model in two cases (Table 9). The accuracy differed significantly in three methods, with the combined model outperforming the HRV model in all three cases (Table 9). The precision differed significantly in four methods, with the HRV model outperforming the combined model in three cases (Table 9). The F1-score differed significantly in two methods, with the combined model outperforming the HRV model when using hybrid loss without ADASYN.

When using hybrid loss with ADASYN, the RRI model showed significantly higher BAC, precision, and F1-score and significantly lower QWK and accuracy than the combined model (Table 10). The BAC differed significantly in two methods, with the RRI model outperforming the combined model when using hybrid loss with ADASYN (Table 10). The QWK differed significantly in three methods, with the combined model outperforming the RRI model in two cases (Table 10). The accuracy differed significantly in four methods, with the combined model outperforming the RRI model in three cases (Table 10). The precision differed significantly in three methods, with the combined model outperforming the RRI model in two cases (Table 10). The F1-score differed significantly in four methods, with the combined model outperforming the RRI model in three cases.

Specifically, the RRI model outperformed the combined model in terms of BAC and F1-score when using hybrid loss with ADASYN, suggesting better performance in addressing class imbalance. Considering all four methods, these results suggest that the combined model would show similar performance in age prediction compared to the HRV and RRI models.

4.6. Inter-Sample Reliability in Age Prediction

We investigated how the prediction was affected by different RRI samples of the same subject by evaluating the inter-sample reliability in age prediction using Cohen’s kappa. Table 11 shows Cohen’s kappa representing the agreement between predictions of two distinct samples from the same subject. Kappa can range from −1 to 1, which is suggested to be interpreted as follows: values <0 indicate no agreement, 0.01–0.20 indicate slight agreement, 0.21–0.40 indicate fair agreement, 0.41–0.60 indicate moderate agreement, 0.61–0.80 indicate substantial agreement, and 0.81–1.00 indicate almost perfect agreement [71]. All three models show substantial but not perfect agreement between two predictions from the same person.
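The interpretation bands quoted above can be expressed as a small lookup. The function name is hypothetical, and two boundary choices are assumptions: a kappa of exactly zero is treated as no agreement, and values that fall between two quoted bands (for example, 0.205) are assigned to the higher band.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the verbal agreement band quoted in the text."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa <= 0.0:
        return "no agreement"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    return "almost perfect"
```

For example, `interpret_kappa(0.7)` returns `"substantial"`, the band reported for all three models.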

Therefore, we further analyzed the effects of inter-sample disagreement on the prediction performance using the RRI model, which showed the lowest kappa. We evaluated the performance metrics using different sample windows (Table 12). These samples were different from those used for Table 6. The performance metrics were affected by changing the sample window. Nonetheless, the RRI model achieved the highest BAC, QWK, and F1 and the lowest accuracy when using hybrid loss with ADASYN, which was consistent with the results in Table 6, and the highest metric values achieved by the RRI model remained similar after using different samples. These results suggest that performance metrics would be affected by using different sample windows due to inter-sample disagreement, but the overall trend of the performance would remain similar.

5. Discussion

5.1. Performance of RRI-Based Deep Learning Model

The RRI model proposed in this study exhibited 45% BAC, 0.57 QWK, 60% accuracy, 0.46 precision, and 0.45 F1-score in classifying the four age groups using hybrid loss with ADASYN. In addition, the BAC and F1 achieved by the RRI model were the highest among all the data types and methods. Comparisons among the three models suggest that the RRI model can perform similarly to the HRV and combined models, demonstrating the potential of the RRI-based deep learning model for automated age prediction. However, our models showed limited efficacy in accurately predicting all age groups, indicating the need for significant improvement before they can be considered reliable age prediction methods.

The RRI model could not reliably classify the minority classes. Although the bias toward Group 1 was considerably reduced using hybrid loss with ADASYN, the recall obtained from the RRI model was 73%, 31%, 24%, and 50% for Groups 1, 2, 3, and 4, respectively (Figure 7). One possible cause of the inaccurate prediction of minority classes is data imbalance. Group 1 contained 63% of the total samples, approximately five to six times more than each of the minority groups (Table 1). We used ADASYN oversampling to balance the training data for each class, although the effect of ADASYN alone on reducing bias was limited. The prediction of minority classes was further improved when ADASYN was combined with hybrid loss, particularly in the RRI model. Previous studies have suggested that oversampling improves the prediction of minority classes in DNNs [52,53], although a sampling method may not be sufficient to solve the imbalance problem in all cases [72].

Furthermore, we developed models for predicting an age class instead of a numeric value of age, because the subjects’ age data were anonymized and generalized to age groups in the dataset (see Materials and Methods section for more details). The loss of individual age data increased the difficulty of classifying subjects who were near the group boundaries. For example, 29-year-old and 30-year-old subjects have only a one-year age difference, but they are labeled as Groups 1 and 2, separated by a decade gap. Near-boundary samples can show a higher probability of being misclassified into adjacent age groups. In the RRI model using hybrid loss with ADASYN, 54% of the actual Group 2 samples were misclassified as Group 1, and 23% of Group 3 samples were misclassified as Group 2. We used the MSE to develop the hybrid loss function to improve the learning of the ordinal attributes of the age groups; however, the hybrid loss cannot address issues from near-boundary samples unless actual age values are provided.

5.2. Comparison with Similar Previous Studies

Table 13 summarizes previous studies on machine learning-based age prediction using HRV features and RRIs. It is difficult to directly compare our results with others because of differences in the database and methodology. In particular, the datasets used in previous studies were well-balanced, whereas the database used in this study was highly imbalanced, resulting in a bias toward the majority age class. Accuracy was listed as a performance metric for comparison because previous studies have mainly used accuracy to report the performance of their models in predicting age. Confusion matrices are necessary to evaluate other metrics, although they were not reported in these previous studies, except for Poddar et al. [23]. Therefore, we listed the accuracy in Table 13, although it may not be the best metric to evaluate ordinal classification, as described in the Materials and Methods section. In Table 13, we list only the results using hybrid loss with ADASYN.

In the case of RRI-based prediction, our model achieved higher accuracy than the method proposed by Pfundstein [38]. However, Pfundstein classified more age groups using a smaller number of samples compared to our study. In addition, we combined classification and regression methods in the prediction model, whereas Pfundstein implemented the two methods separately, as described in the Related Works section. The measurement conditions were also different, which may have affected the information embedded in the RRI data.

In the case of HRV-based prediction, our model showed a lower accuracy than that of the model proposed by Poddar et al., wherein 70% accuracy was achieved in classifying three age groups [23]. Makowiec and Wdowczyk achieved 93.6% accuracy in classifying four age groups using HRV features [25]. However, their performance evaluation was not based on an unknown test dataset, suggesting that high accuracy might be a result of overfitting the entire dataset. Our model achieved a higher accuracy than the method demonstrated by Al-Mter; however, Al-Mter classified more age groups using a smaller number of samples [26].

Corino et al. and Botsva et al. demonstrated methods for predicting individual age and mean group age, respectively [22,24]. Because these previous studies predicted a numeric value instead of a class, they adopted evaluation methods for numerical prediction, such as the correlation coefficient between predicted and true ages used by Corino et al. [22]. Botsva et al. did not describe in detail the way of calculating the accuracy reported in their study [24], and we assumed that their accuracy represented the closeness of values between the predicted and true ages. Both previous studies reported an overestimation of age in younger subjects and an underestimation in older subjects, although the overall prediction performance was successful. Previous studies on age prediction suggested that predicting numeric age would lead to better performance than classifying age groups, although our observation was based on two regression studies by Corino et al. and Botsva et al. [22,24].

5.3. Comparison between the Models

In this study, we hypothesized that an RRI-based deep learning model would offer similar or superior performance in age prediction compared to a conventional HRV feature-based model. We showed that the RRI model outperformed the HRV and combined models in BAC, precision, and F1-score when using hybrid loss with ADASYN, while achieving the highest BAC and F1. Although the prediction of minority classes was not successful, we expected that the performance of minority groups would be improved if the RRI model was trained with balanced data. Our assumption is based on the promising results of the aforementioned previous studies. Poddar et al. achieved 70% accuracy in classifying three age groups by using a balanced HRV dataset obtained from the supine rest position [23]. We also calculated BAC, QWK, and F1-score for the model proposed by Poddar et al. using the confusion matrices provided in their article. Balanced accuracy was 70%, QWK was 0.55, and F1-score was 0.70, indicating that the prediction was equally successful for all age groups. In addition, we expect that the prediction of numeric ages would result in better performance than the classification of age groups because misclassification caused by a lack of age information would be prevented. Therefore, we believe that the RRI-based deep learning model has the potential to be used as an age prediction tool, assuming that its training is conducted with a well-balanced dataset without the loss of numeric age data. However, more studies are required in the future to test the effect of adding more minority samples and using numeric age data on the performance of the prediction models.

5.4. Limitations

This study has two major limitations, namely class imbalance and loss of numeric age in the dataset. Owing to class imbalance, the prediction was highly skewed to the majority group even after using ADASYN oversampling during training. The sampling method was not sufficient to solve the imbalance issue of the dataset used in this study, suggesting that new data for minority classes must be added to improve performance. Furthermore, the absence of numeric age data added to the difficulty in prediction because we could not use strong ordinal information provided in the age values. Instead of optimizing models to predict an individual’s age, we employed the hybrid loss function, which constrains models to learn the median age of each age group using the MSE term. The anonymization of age data in the current dataset caused a loss of crucial ordinal information, which could be exploited to develop a prediction model.
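One way such a hybrid loss could look is sketched below for a single sample. This is an assumed formulation for illustration, not the loss defined in the Materials and Methods section: the median ages, the weighting factor `mse_weight`, the scaling factor `age_scale`, and the use of a probability-weighted expected age are all hypothetical choices.

```python
import math

# Hypothetical median age of each of the four age groups (in years).
GROUP_MEDIAN_AGE = [24.0, 34.5, 44.5, 60.0]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_loss(logits, true_group, mse_weight=1.0, age_scale=100.0):
    """Cross-entropy plus a squared error on the expected median age."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_group])
    # Expected age under the predicted class distribution.
    expected_age = sum(p * a for p, a in zip(probs, GROUP_MEDIAN_AGE))
    target_age = GROUP_MEDIAN_AGE[true_group]
    squared_error = ((expected_age - target_age) / age_scale) ** 2
    return cross_entropy + mse_weight * squared_error


# Same probability on the true class (Group 1, index 0), but the remaining
# probability mass sits on the adjacent group versus the most distant group.
loss_near = hybrid_loss([2.0, 1.0, -5.0, -5.0], true_group=0)
loss_far = hybrid_loss([2.0, -5.0, -5.0, 1.0], true_group=0)
```

The cross-entropy term is identical in the two cases, so the difference between `loss_far` and `loss_near` comes entirely from the squared-error term, which is how an MSE component can encode the ordering of the age groups even without individual ages.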

Although the dataset used in this study has critical limitations, it is currently the best option, as described in the Materials and Methods section. An ECG database with at least 5 min recording length and acquired in a resting condition is rarely publicly provided. For example, most ECG datasets provided by PhysioNet are short (<1 min) or measured under ambulatory conditions [43]. Because our goal was to obtain stable RRIs reflecting autonomic function, we chose the current dataset and developed a deep learning model based on it. In future studies, we will perform additional tests on well-balanced datasets provided with individual age data to further improve and evaluate the performance of the proposed deep learning method for age prediction. We can modify our models to learn numeric age values although we developed deep learning models to classify age groups in this study. For example, we can perform regression instead of classification using the MSE as a regression loss function based on the same ResNet34.

A limitation of the deep learning model is that we tested the RRI model based only on ResNet34 and did not compare the performance with other deep learning architectures. Our main goal was to test the feasibility of RRI-based prediction by comparison with the conventional HRV feature-based model. We used the HRV model as a benchmark because HRV-based models have shown successful results on age estimation, as we described in the Introduction. In future work, we will conduct a comparison study with different types of architecture to evaluate their performances.

The inter-sample reliability results suggest that two distinct windows randomly selected from the same subject can result in disagreeing predictions. Because the recording length of the current dataset (>5 min) was much longer than those of other short ECG databases (<30 s), the measurement protocol and laboratory environment may considerably affect the long-term reliability of physiological signals. In future studies, we will explore a new measurement method to minimize disagreement in inter-sample predictions from the same subject.

6. Conclusions

We demonstrated deep learning-based age prediction using RRI data in a healthy population. The results indicated that the RRI-based model can perform similarly to the HRV feature-based model in terms of age prediction. These findings suggest that age-related ANS alterations observed in RRI signals can be extracted and used to predict biological age using a deep learning algorithm. However, our models demonstrated inadequate effectiveness in estimating all age groups, indicating that substantial improvement is required before these models can be considered dependable age prediction methods. Although this study was limited to healthy aging, our approach can be extended to detect abnormal changes in unhealthy patients in future studies.

Author Contributions

Conceptualization, K.H.L. and S.B.; Methodology, K.H.L. and S.B.; Formal Analysis, K.H.L. and S.B.; Writing—Original Draft Preparation, K.H.L. and S.B.; Writing—Review and Editing, K.H.L. and S.B.; Visualization, K.H.L.; Supervision, S.B.; Project Administration, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Incheon National University (International Cooperative) Research Grant in 2018.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the National IT Industry Promotion Agency (NIPA) for the high-performance computing support program for 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

RRI, R-peak to R-peak intervals; HRV, heart rate variability; ADASYN, adaptive synthetic; ANS, autonomic nervous system; PNN, probabilistic neural network; DNN, deep neural network; CNN, convolutional neural network; LSTM, long short-term memory; NSR, normal sinus rhythm; AFIB, atrial fibrillation; ResNet, residual neural network; MSE, mean squared error; ECG, electrocardiogram; PSD, power spectral density; ANOVA, analysis of variance; BAC, balanced accuracy; QWK, quadratic weighted kappa.

References

1. Partridge, L.; Deelen, J.; Slagboom, P.E. Facing up to the global challenges of ageing. Nature 2018, 561, 45–56.
2. McIntyre, R.L.; Rahman, M.; Vanapalli, S.A.; Houtkooper, R.H.; Janssens, G.E. Biological Age Prediction From Wearable Device Movement Data Identifies Nutritional and Pharmacological Interventions for Healthy Aging. Front. Aging 2021, 2, 26.
3. Rodgers, J.L.; Jones, J.; Bolleddu, S.I.; Vanthenapalli, S.; Rodgers, L.E.; Shah, K.; Karia, K.; Panguluri, S.K. Cardiovascular risks associated with gender and aging. J. Cardiovasc. Dev. Dis. 2019, 6, 19.
4. Hamczyk, M.R.; Nevado, R.M.; Barettino, A.; Fuster, V.; Andrés, V. Biological Versus Chronological Aging: JACC Focus Seminar. J. Am. Coll. Cardiol. 2020, 75, 919–930.
5. Hannum, G.; Guinney, J.; Zhao, L.; Zhang, L.; Hughes, G.; Sadda, S.V.; Klotzle, B.; Bibikova, M.; Fan, J.B.; Gao, Y.; et al. Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates. Mol. Cell 2013, 49, 359–367.
6. Blackburn, E.H.; Epel, E.S.; Lin, J. Human telomere biology: A contributory and interactive factor in aging, disease risks, and protection. Science 2015, 350, 1193–1198.
7. Marioni, R.E.; Shah, S.; McRae, A.F.; Chen, B.H.; Colicino, E.; Harris, S.E.; Gibson, J.; Henders, A.K.; Redmond, P.; Cox, S.R.; et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 2015, 16, 25.
8. Benavente, E.D.; Jimenez-Lopez, F.; Attia, Z.I.; Malyutina, S.; Kudryavtsev, A.; Ryabikov, A.; Friedman, P.A.; Kapa, S.; Voevoda, M.; Perel, P.; et al. Studying accelerated cardiovascular ageing in Russian adults through a novel deep-learning ECG biomarker. Wellcome Open Res. 2021, 6, 12.
9. Attia, Z.I.; Friedman, P.A.; Noseworthy, P.A.; Lopez-Jimenez, F.; Ladewig, D.J.; Satam, G.; Pellikka, P.A.; Munger, T.M.; Asirvatham, S.J.; Scott, C.G.; et al. Age and Sex Estimation Using Artificial Intelligence from Standard 12-Lead ECGs. Circ. Arrhythmia Electrophysiol. 2019, 12, e007284.
10. Task Force of The European Society of Cardiology and The North American Society of Pacing and Electrophysiology; Malik, M.; Bigger, T.; Camm, A.J.; Kleiger, R.E.; Malliani, A.; Moss, A.J.; Schwartz, P.J. Heart rate variability, Standards of measurement, physiological interpretation, and clinical use. Eur. Heart J. 1996, 17, 354–381.
11. Garavaglia, L.; Gulich, D.; Defeo, M.M.; Mailland, J.T.; Irurzun, I.M. The effect of age on the heart rate variability of healthy subjects. PLoS ONE 2021, 16, e0255894.
12. Makowiec, D.; Wejer, D.; Kaczkowska, A.; Zarczyńska-Buchowiecka, M.; Struzik, Z.R. Chronographic imprint of age-induced alterations in heart rate dynamical organization. Front. Physiol. 2015, 6, 201.
13. Pikkujämsä, S.M.; Mäkikallio, T.H.; Sourander, L.B.; Räihä, I.J.; Puukka, P.; Skyttä, J.; Peng, C.-K.; Goldberger, A.L.; Huikuri, H.V. Cardiac Interbeat Interval Dynamics From Childhood to Senescence. Circulation 1999, 100, 393–399.
14. Schumann, A.; Bär, K.J. Autonomic aging—A dataset to quantify changes of cardiovascular autonomic function during healthy aging. Sci. Data 2022, 9, 95.
15. Malik, M.; Hnatkova, K.; Huikuri, H.V.; Lombardi, F.; Schmidt, G.; Zabel, M. CrossTalk proposal: Heart rate variability is a valid measure of cardiac autonomic responsiveness. J. Physiol. 2019, 597, 2595–2598.
16. Ishaque, S.; Khan, N.; Krishnan, S. Trends in Heart-Rate Variability Signal Analysis. Front. Digit. Health 2021, 3, 639444.
17. Byun, S.; Kim, A.Y.; Jang, E.H.; Kim, S.; Choi, K.W.; Yu, H.Y.; Jeon, H.J. Detection of major depressive disorder from linear and nonlinear heart rate variability features during mental task protocol. Comput. Biol. Med. 2019, 112, 103381.
18. Voss, A.; Schroeder, R.; Heitmann, A.; Peters, A.; Perz, S. Short-term heart rate variability-influence of gender and age in healthy subjects. PLoS ONE 2015, 10, e0118308.
19. Choi, J.; Cha, W.; Park, M.G. Declining Trends of Heart Rate Variability According to Aging in Healthy Asian Adults. Front. Aging Neurosci. 2020, 12, 610626.
20. Beckers, F.; Verheyden, B.; Aubert, A.E. Aging and nonlinear heart rate control in a healthy population. Am. J. Physiol.-Heart Circ. Physiol. 2006, 290, H2560–H2570.
Colosimo, A. Estimating a cardiac age by means of heart rate variability. Am. J. Physiol.-Heart Circ. Physiol. 1997, 273, H1841–H1847. [Google Scholar] [CrossRef] [PubMed]
Corino, V.D.A.; Matteucci, M.; Cravello, L.; Ferrari, E.; Ferrari, A.A.; Mainardi, L.T. Long-term heart rate variability as a predictor of patient age. Comput. Methods Programs Biomed. 2006, 82, 248–257. [Google Scholar] [CrossRef] [PubMed]
Poddar, M.G.; Kumar, V.; Sharma, Y.P. Heart rate variability: Analysis and classification of healthy subjects for different age groups. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 1908–1913. [Google Scholar]
Botsva, N.; Naishtetik, I.; Khimion, L.; Chernetchenko, D. Predictors of aging based on the analysis of heart rate variability. PACE-Pacing Clin. Electrophysiol. 2017, 40, 1269–1278. [Google Scholar] [CrossRef] [PubMed]
Makowiec, D.; Wdowczyk, J. Patterns of heart rate dynamics in healthy aging population: Insights from machine learning methods. Entropy 2019, 21, 1206. [Google Scholar] [CrossRef] [Green Version]
Al-Mter, Y. Automatic Prediction of Human Age Based on Heart Rate Variability Analysis Using Feature-Based Methods; Linköping University: Linköping, Sweden, 2020.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Ribeiro, A.H.; Ribeiro, M.H.; Paixão, G.M.M.; Oliveira, D.M.; Gomes, P.R.; Canazart, J.A.; Ferreira, M.P.S.; Andersson, C.R.; Macfarlane, P.W.; Wagner, M.; et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat. Commun. 2020, 11, 1760.
Faust, O.; Shenfield, A.; Kareem, M.; San, T.R.; Fujita, H.; Acharya, U.R. Automated detection of atrial fibrillation using long short-term memory network with RR interval signals. Comput. Biol. Med. 2018, 102, 327–335.
Andersen, R.S.; Peimankar, A.; Puthusserypady, S. A deep learning approach for real-time detection of atrial fibrillation. Expert Syst. Appl. 2019, 115, 465–473.
Ivanovic, M.D.; Atanasoski, V.; Shvilkin, A.; Hadzievski, L.; Maluckov, A. Deep Learning Approach for Highly Specific Atrial Fibrillation and Flutter Detection based on RR Intervals. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 1780–1783.
Lai, D.; Bu, Y.; Su, Y.; Zhang, X.; Ma, C.S. Non-Standardized Patch-Based ECG Lead Together with Deep Learning Based Algorithm for Automatic Screening of Atrial Fibrillation. IEEE J. Biomed. Health Inform. 2020, 24, 1569–1578.
Taye, G.T.; Hwang, H.J.; Lim, K.M. Application of a convolutional neural network for predicting the occurrence of ventricular tachyarrhythmia using heart rate variability features. Sci. Rep. 2020, 10, 6769.
Faust, O.; Acharya, U.R. Automated classification of five arrhythmias and normal sinus rhythm based on RR interval signals. Expert Syst. Appl. 2021, 181, 115031.
Wang, L.; Lin, Y.; Wang, J. A RR interval based automated apnea detection approach using residual network. Comput. Methods Programs Biomed. 2019, 176, 93–104.
Wang, T.; Lu, C.; Shen, G.; Hong, F. Sleep apnea detection from a single-lead ECG signal with automatic feature-extraction through a modified LeNet-5 convolutional neural network. PeerJ 2019, 7, e7731.
Shen, Q.; Qin, H.; Wei, K.; Liu, G. Multiscale Deep Neural Network for Obstructive Sleep Apnea Detection Using RR Interval from Single-Lead ECG Signal. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
Pfundstein, M. Human Age Prediction Based on Real and Simulated RR Intervals Using Temporal Convolutional Neural Networks and Gaussian Processes; Linköping University: Linköping, Sweden, 2020.
Angulu, R.; Tapamo, J.R.; Adewumi, A.O. Age estimation via face images: A survey. EURASIP J. Image Video Process. 2018, 2018, 42.
Munoz, M.L.; Van Roon, A.; Riese, H.; Thio, C.; Oostenbroek, E.; Westrik, I.; De Geus, E.J.C.; Gansevoort, R.; Lefrandt, J.; Nolte, I.M.; et al. Validity of (Ultra-)Short recordings for heart rate variability measurements. PLoS ONE 2015, 10, e0138921.
Hennig, T.; Maass, P.; Hayano, J.; Heinrichs, S. Exponential distribution of long heart beat intervals during atrial fibrillation and their relevance for white noise behaviour in power spectrum. J. Biol. Phys. 2006, 32, 383–392.
Khandoker, A.H.; Palaniswami, M.; Karmakar, C.K. Support vector machines for automated recognition of obstructive sleep apnea syndrome from ECG recordings. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 37–48.
Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, E215–E220.
Makowski, D.; Pham, T.; Lau, Z.J.; Brammer, J.C.; Lespinasse, F.; Pham, H.; Schölzel, C.; Chen, S.H.A. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav. Res. Methods 2021, 53, 1689–1696.
Neurophysiological Data Analysis with NeuroKit2. Available online: https://neuropsychology.github.io/NeuroKit/functions/ecg.html#ecg-peaks (accessed on 30 October 2022).
Satija, U.; Ramkumar, B.; Sabarimalai Manikandan, M. A Review of Signal Processing Techniques for Electrocardiogram Signal Quality Assessment. IEEE Rev. Biomed. Eng. 2018, 11, 36–52.
Shaffer, F.; Ginsberg, J.P. An Overview of Heart Rate Variability Metrics and Norms. Front. Public Health 2017, 5, 258.
Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
Peng, C.K.; Havlin, S.; Stanley, H.E.; Goldberger, A.L. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos 1995, 5, 82–87.
Grassberger, P.; Procaccia, I. Characterization of Strange Attractors. Phys. Rev. Lett. 1983, 50, 346–349.
Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27.
He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
Virgeniya, S.C.; Ramaraj, E. A Novel Deep Learning based Gated Recurrent Unit with Extreme Learning Machine for Electrocardiogram (ECG) Signal Recognition. Biomed. Signal Process. Control 2021, 68, 102779.
Qin, H.; Liu, G. A dual-model deep learning method for sleep apnea detection based on representation learning and temporal dependence. Neurocomputing 2022, 473, 24–36.
Barz, B.; Denzler, J. Deep learning on small datasets without pre-training using cosine loss. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 1360–1369.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
Kumar, P.; Bhatnagar, R.; Gaur, K.; Bhatnagar, A. Classification of Imbalanced Data: Review of Methods and Applications. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1099, 012077.
Yadav, S.; Bhole, G.P. Handling Imbalanced Dataset Classification in Machine Learning. In Proceedings of the 2020 IEEE Pune Section International Conference (PuneCon), Pune, India, 16–18 December 2020; pp. 38–43.
Gösgens, M.; Zhiyanov, A.; Tikhonov, A.; Prokhorenkova, L. Good Classification Measures and How to Find Them. Adv. Neural Inf. Process. Syst. 2021, 21, 17136–17147.
Rosati, R.; Romeo, L.; Vargas, V.M.; Gutiérrez, P.A.; Hervás-Martínez, C.; Frontoni, E. A novel deep ordinal classification approach for aesthetic quality control classification. Neural Comput. Appl. 2022, 34, 11625–11639.
Abraham, B.; Nair, M.S. Automated grading of prostate cancer using convolutional neural network and ordinal class classifier. Informatics Med. Unlocked 2019, 17, 100256.
Cruz-Ramírez, M.; Hervás-Martínez, C.; Sánchez-Monedero, J.; Gutiérrez, P.A. Metrics to guide a multi-objective evolutionary algorithm for ordinal classification. Neurocomputing 2014, 135, 21–31.
De la Torre, J.; Puig, D.; Valls, A. Weighted kappa loss function for multi-class classification of ordinal data in deep learning. Pattern Recognit. Lett. 2018, 105, 144–154.
Mitani, A.A.; Freer, P.E.; Nelson, K.P. Summary measures of agreement and association between many raters’ ordinal classifications. Ann. Epidemiol. 2017, 27, 677–685.e4.
Sim, J.; Wright, C.C. The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Phys. Ther. 2005, 85, 257–268.
Fatourechi, M.; Ward, R.K.; Mason, S.G.; Huggins, J.; Schlögl, A.; Birch, G.E. Comparison of evaluation metrics in classification applications with imbalanced datasets. In Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11–13 December 2008; pp. 777–782.
Fleiss, J.L.; Cohen, J.; Everitt, B.S. Large sample standard errors of kappa and weighted kappa. Psychol. Bull. 1969, 72, 323–327.
Landis, J.R.; Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977, 33, 159–174.
Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A. Experimental perspectives on learning from imbalanced data. ACM Int. Conf. Proc. Ser. 2007, 227, 935–942.

Figure 1. Overview of data preprocessing and feature extraction.

Figure 2. Sample 5 min RRI sequence from a subject from age group 1.

Figure 3. Modified 1D-ResNet34. (a) Overall architecture; (b) Conv block; (c) Residual block.

Figure 4. Overall architecture of (a) HRV model; (b) combined model.

Figure 5. Mean and 95% CI (error bars) of the prediction performance based on the data type and method. L_Class: classification loss; L_Hybrid: hybrid loss; ADA: ADASYN.

Figure 6. Confusion matrices of HRV models based on the methods in the test dataset.

Figure 7. Confusion matrices of RRI models based on the methods in the test dataset.

Figure 8. Confusion matrices of combined models based on the methods in the test dataset.

Table 1. Number of subjects and 5 min RRI samples from each age group.

Age Group	Number of Subjects	Number of 5 min RRIs
Group 1 (18–29)	696 (64%)	2280 (63%)
Group 2 (30–39)	147 (13%)	503 (14%)
Group 3 (40–49)	100 (9%)	365 (10%)
Group 4 (≥50)	150 (14%)	458 (13%)
Total	1093 (100%)	3606 (100%)

Table 2. Description of HRV features.

Feature	Description
Time-domain
RRI (ms)	Mean RR intervals
SDNN (ms)	Standard deviation of RR intervals
RMSSD (ms)	Root mean square of successive RR interval differences
pNN50 (%)	Percentage of successive RR intervals differing more than 50 ms
TRI	Integral of the histogram of the RR interval divided by its height
TINN (ms)	Baseline width of the RR interval histogram
Frequency-domain
logVLF (s²)	Logarithm-transformed power in the VLF band (0–0.04 Hz)
logLF (s²)	Logarithm-transformed power in the LF band (0.04–0.15 Hz)
LFnu (nu)	Relative power of the LF band
logHF (s²)	Logarithm-transformed power in the HF band (0.15–0.4 Hz)
HFnu (nu)	Relative power of the HF band
LF/HF	Ratio between LF and HF band powers
logTot (s²)	Logarithm-transformed total power
Nonlinear domain and Poincaré plot
ApEn	Approximate entropy
SampEn	Sample entropy
α1	Short-term scaling exponent of DFA
α2	Long-term scaling exponent of DFA
CorDim	Correlation dimension
SD1 (ms)	Standard deviation of the Poincaré plot perpendicular to the line of identity
SD2 (ms)	Standard deviation of the Poincaré plot along the line of identity
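
As a rough illustration of how several of the time-domain and Poincaré features in Table 2 follow from an RRI sequence, the sketch below computes the mean RRI, SDNN, RMSSD, pNN50, and SD1 using only the Python standard library. The function name `time_domain_hrv` and the dictionary keys are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption; the study itself may have used a dedicated toolbox with different conventions.

```python
import math
import statistics


def time_domain_hrv(rri_ms):
    """Compute a few Table 2 features from RR intervals given in milliseconds.

    Hypothetical helper for illustration; conventions (e.g. n - 1 in the
    standard deviation) are assumptions, not taken from the paper.
    """
    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]

    mean_rri = statistics.fmean(rri_ms)
    sdnn = statistics.stdev(rri_ms)  # standard deviation of RR intervals
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    # SD1: spread of the Poincare plot perpendicular to the line of identity,
    # which equals the standard deviation of successive differences / sqrt(2).
    sd1 = math.sqrt(statistics.variance(diffs) / 2.0)

    return {
        "mean_rri": mean_rri,
        "SDNN": sdnn,
        "RMSSD": rmssd,
        "pNN50": pnn50,
        "SD1": sd1,
    }


features = time_domain_hrv([800, 850, 790, 900, 820])
```

The frequency-domain and nonlinear features (band powers, entropies, DFA exponents, correlation dimension) need spectral estimation or embedding steps and are not covered by this sketch.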

Table 3. Architecture of modified 1D-ResNet34.

Block	Output Shape	Number of Layers	1D-ResNet34
Conv	(64, 1200)	1	1 × 15, 64, stride 1
Conv	(64, 598)	1	1 × 5 max pool, stride 2
Residual 1	(64, 598)	4	$[\begin{matrix} 1 \times 15, 64 \\ 1 \times 15, 64 \end{matrix}]$ , stride 1
Residual 2	(128, 299)	1	$[\begin{matrix} 1 \times 15, 128 \\ 1 \times 15, 128 \end{matrix}]$ , stride 2
Residual 2	(128, 299)	3	$[\begin{matrix} 1 \times 15, 128 \\ 1 \times 15, 128 \end{matrix}]$ , stride 1
Residual 3	(192, 150)	1	$[\begin{matrix} 1 \times 15, 192 \\ 1 \times 15, 192 \end{matrix}]$ , stride 2
Residual 3	(192, 150)	5	$[\begin{matrix} 1 \times 15, 192 \\ 1 \times 15, 192 \end{matrix}]$ , stride 1
Residual 4	(256, 75)	1	$[\begin{matrix} 1 \times 15, 256 \\ 1 \times 15, 256 \end{matrix}]$ , stride 2
Residual 4	(256, 75)	2	$[\begin{matrix} 1 \times 15, 256 \\ 1 \times 15, 256 \end{matrix}]$ , stride 1
Linear	(1000)	1	Average pool
Linear	(4)	3	1000-d fc, SoftMax
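
The sequence lengths in the Output Shape column of Table 3 can be checked with the usual output-length formula for 1D convolution and pooling layers. The sketch below is only a consistency check: the padding values (7 for the kernel-15 convolutions, 0 for the max pool) are assumptions chosen because they reproduce the lengths in the table, and the helper names are hypothetical.

```python
def out_len(length, kernel, stride, padding):
    """Output length of a 1D conv or pooling layer (floor convention)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_lengths(input_len=1200):
    """Follow the sequence length through the stages listed in Table 3."""
    lengths = {}
    # First conv: kernel 15, stride 1; padding 7 is an assumption ("same" padding).
    length = out_len(input_len, kernel=15, stride=1, padding=7)
    lengths["Conv"] = length
    # Max pool: kernel 5, stride 2; zero padding is an assumption.
    length = out_len(length, kernel=5, stride=2, padding=0)
    lengths["Max pool"] = length
    # Residual 1 uses stride 1 throughout, so the length is unchanged.
    lengths["Residual 1"] = length
    # Residual 2 to 4 each open with one stride-2 block (kernel 15, assumed padding 7).
    for name in ("Residual 2", "Residual 3", "Residual 4"):
        length = out_len(length, kernel=15, stride=2, padding=7)
        lengths[name] = length
    return lengths


stage_lengths = trace_lengths()
```

Under these assumptions the traced lengths are 1200, 598, 598, 299, 150, and 75, matching the second element of each output shape in the table.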

Table 4. Performance metric definitions.

Metric	Formula
Balanced accuracy (BAC)	$\frac{\sum_{i=1}^{n} \mathrm{Recall}_i}{n}$, where $\mathrm{Recall}_i = TP_i/(TP_i + FN_i)$ and $n$ is the number of classes. $TP_i$ and $FN_i$ denote the true positives and false negatives of the $i$th class, respectively.
Quadratic weighted kappa (QWK)	$\frac{p_{ow} - p_{cw}}{1 - p_{cw}}$, with $p_{ow} = \sum_{j} \sum_{k} w_{jk} n_{jk}$ and $p_{cw} = \sum_{j} \sum_{k} w_{jk} n_{j\cdot} n_{\cdot k}$, where $p_{ow}$ is the weighted proportion of observed agreement, $p_{cw}$ is the weighted proportion of chance agreement, $n_{jk}$ is the proportion of samples categorized in the $j$th and $k$th class, $n_{j\cdot}$ is the proportion of samples categorized in the $j$th row, and $n_{\cdot k}$ is the proportion of samples categorized in the $k$th column, where $j$ and $k$ are the class levels [65,67,70]. Quadratic weights are defined as $w_{jk} = 1 - (j - k)^2/(N - 1)^2$, where $N$ is the number of classes [65,67,70].
Accuracy	$\frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
Precision	$\frac{\sum_{i=1}^{n} \mathrm{Precision}_i}{n}$, where $\mathrm{Precision}_i = TP_i/(TP_i + FP_i)$ and $n$ is the number of classes. $TP_i$ and $FP_i$ denote the true positives and false positives of the $i$th class, respectively.
F1-score	$2\,\frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
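
The BAC and QWK definitions in Table 4 can be written directly against a confusion matrix. The following is a minimal sketch rather than the authors' implementation: it assumes `cm[j][k]` counts samples of true class j predicted as class k, that every class has at least one true sample, and the function names are hypothetical.

```python
def balanced_accuracy(cm):
    """Mean per-class recall; cm[j][k] counts true class j predicted as class k."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(recalls) / len(cm)


def quadratic_weighted_kappa(cm):
    """QWK with weights w_jk = 1 - (j - k)^2 / (N - 1)^2, as defined in Table 4."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    # Marginal proportions: n_j. (rows) and n_.k (columns).
    row_p = [sum(row) / total for row in cm]
    col_p = [sum(cm[j][k] for j in range(n_classes)) / total for k in range(n_classes)]

    p_ow = 0.0  # weighted observed agreement
    p_cw = 0.0  # weighted chance agreement
    for j in range(n_classes):
        for k in range(n_classes):
            w = 1.0 - (j - k) ** 2 / (n_classes - 1) ** 2
            p_ow += w * cm[j][k] / total
            p_cw += w * row_p[j] * col_p[k]
    return (p_ow - p_cw) / (1.0 - p_cw)


# Toy two-class example (not data from the study).
toy_cm = [[3, 1], [1, 3]]
toy_bac = balanced_accuracy(toy_cm)
toy_qwk = quadratic_weighted_kappa(toy_cm)
```

Because the quadratic weights penalize a prediction more the further it lies from the true class, QWK rewards a model that confuses adjacent age groups over one that confuses distant ones, which plain accuracy does not distinguish.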

Table 5. Mean and standard deviation of HRV features calculated from each age group. One-way ANOVA was performed to compare the effect of age on HRV features (* p < 0.05).

Feature	Group 1 (18–29, N = 696)	Group 2 (30–39, N = 147)	Group 3 (40–49, N = 100)	Group 4 (≥50, N = 150)	p-Value
RRI	893.98 ± 129.38	909.82 ± 171.27	865.30 ± 128.90	914.53 ± 137.73	0.024 *
SDNN	65.30 ± 34.65	68.56 ± 92.77	55.35 ± 55.12	55.17 ± 86.66	0.068
RMSSD	60.91 ± 46.33	65.86 ± 128.61	48.97 ± 78.41	54.48 ± 122.14	0.311
pNN50	30.16 ± 21.87	23.65 ± 22.94	14.49 ± 19.23	11.29 ± 20.85	<0.001 *
TRI	13.64 ± 4.60	12.46 ± 4.98	10.39 ± 4.63	9.24 ± 5.43	<0.001 *
TINN	222.38 ± 88.63	220.03 ± 103.30	174.27 ± 87.14	175.65 ± 136.27	<0.001 *
logVLF	−5.18 ± 0.53	−5.11 ± 0.50	−5.15 ± 0.51	−5.00 ± 0.61	0.001 *
logLF	−4.19 ± 0.54	−4.13 ± 0.57	−4.48 ± 0.68	−4.56 ± 0.86	<0.001 *
LFnu	0.50 ± 0.17	0.58 ± 0.20	0.61 ± 0.18	0.57 ± 0.19	<0.001 *
logHF	−4.21 ± 0.76	−4.50 ± 0.84	−5.00 ± 1.02	−4.88 ± 1.07	<0.001 *
HFnu	0.50 ± 0.17	0.42 ± 0.20	0.39 ± 0.18	0.43 ± 0.19	<0.001 *
LF/HF	1.50 ± 1.67	2.32 ± 2.30	2.57 ± 2.19	2.30 ± 2.48	<0.001 *
logTot	−3.22 ± 0.44	−3.26 ± 0.41	−3.59 ± 0.59	−3.51 ± 0.61	<0.001 *
ApEn	1.10 ± 0.08	1.07 ± 0.10	1.06 ± 0.12	1.05 ± 0.13	<0.001 *
SampEn	1.58 ± 0.25	1.48 ± 0.28	1.38 ± 0.31	1.39 ± 0.32	<0.001 *
α1	0.99 ± 0.25	1.07 ± 0.30	1.14 ± 0.28	1.08 ± 0.31	<0.001 *
α2	0.72 ± 0.17	0.73 ± 0.18	0.85 ± 0.23	0.91 ± 0.24	<0.001 *
CorDim	1.43 ± 0.17	1.36 ± 0.18	1.32 ± 0.22	1.26 ± 0.23	<0.001 *
SD1	43.14 ± 32.82	46.65 ± 91.15	34.68 ± 55.53	38.59 ± 86.54	0.311
SD2	80.50 ± 38.32	82.62 ± 95.24	68.03 ± 56.62	65.22 ± 88.07	0.008 *
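
For readers unfamiliar with the one-way ANOVA used in Table 5, the sketch below computes the F statistic for one feature across groups in plain Python. It is illustrative only: the function name is hypothetical, the toy numbers are not from the study, and converting F to a p-value requires the F distribution (for example via `scipy.stats.f_oneway`, which returns both the statistic and the p-value).

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; `groups` is a list of lists of values."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [fmean(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    # Ratio of mean squares with (k - 1) and (n - k) degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data for three groups (not values from Table 5).
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```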

Table 6. Mean and 95% CI of the prediction performance based on the data type and method. The bold entries indicate the best performance in terms of each evaluation metric.

Data	Method	BAC (%)	QWK	Accuracy (%)	Precision	F1
HRV	Classification loss	39.0 ± 0.459	0.514 ± 0.011	67.1 ± 0.477	0.404 ± 0.008	0.378 ± 0.005
	Classification loss + ADASYN	44.8 ± 0.542	0.526 ± 0.011	63.1 ± 0.568	0.436 ± 0.007	0.435 ± 0.006
	Hybrid loss	37.8 ± 0.399	0.475 ± 0.013	68.8 ± 0.486	0.323 ± 0.004	0.344 ± 0.004
	Hybrid loss + ADASYN	43.5 ± 0.554	0.583 ± 0.010	61.3 ± 0.569	0.408 ± 0.006	0.416 ± 0.006
RRI	Classification loss	37.9 ± 0.390	0.495 ± 0.011	68.6 ± 0.493	0.407 ± 0.022	0.346 ± 0.004
	Classification loss + ADASYN	41.8 ± 0.528	0.550 ± 0.009	65.2 ± 0.496	0.445 ± 0.010	0.415 ± 0.006
	Hybrid loss	39.2 ± 0.317	0.519 ± 0.009	69.0 ± 0.467	0.314 ± 0.003	0.348 ± 0.003
	Hybrid loss + ADASYN	44.8 ± 0.589	0.568 ± 0.010	59.8 ± 0.568	0.456 ± 0.006	0.446 ± 0.006
Combined (RRI + HRV)	Classification loss	37.7 ± 0.405	0.474 ± 0.010	66.5 ± 0.454	0.417 ± 0.008	0.373 ± 0.005
	Classification loss + ADASYN	41.4 ± 0.505	0.549 ± 0.009	67.1 ± 0.495	0.459 ± 0.007	0.424 ± 0.006
	Hybrid loss	44.0 ± 0.489	0.589 ± 0.010	70.1 ± 0.511	0.496 ± 0.014	0.425 ± 0.006
	Hybrid loss + ADASYN	42.9 ± 0.552	0.590 ± 0.010	64.7 ± 0.553	0.422 ± 0.008	0.417 ± 0.006

Table 7. Mean prediction performance for each age group based on the data type and method.

		HRV			RRI			Combined
Method	Group	Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1
Classification loss	1	0.928	0.734	0.819	0.961	0.719	0.822	0.931	0.718	0.810
	2	0.021	0.132	0.036	0.013	0.332	0.024	0.035	0.169	0.057
	3	0.112	0.218	0.145	0.006	0.057	0.011	0.101	0.226	0.136
	4	0.496	0.532	0.510	0.538	0.522	0.528	0.443	0.555	0.489
Classification loss + ADASYN	1	0.789	0.787	0.787	0.861	0.766	0.810	0.905	0.752	0.821
	2	0.209	0.262	0.230	0.143	0.192	0.163	0.124	0.239	0.161
	3	0.166	0.232	0.190	0.113	0.321	0.164	0.150	0.237	0.181
	4	0.627	0.464	0.531	0.553	0.501	0.523	0.478	0.607	0.532
Hybrid loss	1	0.965	0.702	0.812	0.953	0.719	0.820	0.933	0.745	0.828
	2	0.000	0.000	0.000	0.000	0.000	0.000	0.043	0.333	0.074
	3	0.000	0.000	0.000	0.000	0.000	0.000	0.120	0.301	0.168
	4	0.546	0.591	0.565	0.614	0.537	0.571	0.664	0.606	0.631
Hybrid loss + ADASYN	1	0.760	0.815	0.786	0.730	0.798	0.762	0.833	0.788	0.809
	2	0.226	0.213	0.217	0.318	0.213	0.253	0.187	0.219	0.200
	3	0.097	0.116	0.104	0.239	0.221	0.227	0.077	0.177	0.105
	4	0.657	0.488	0.558	0.504	0.591	0.541	0.621	0.504	0.553

Table 8. Mean and 95% CI of difference in performance metrics between the RRI and HRV models. The differences were calculated by subtracting the value of an HRV model from that of an RRI model. A two-sample t-test was performed to compare the metrics from the two models (* p < 0.05).

Metric	Method	Difference (RRI−HRV)	p-Value
BAC (%)	Classification loss	−1.023 ± 0.546	0.001 *
	Classification loss + ADASYN	−3.006 ± 0.696	<0.001 *
	Hybrid loss	1.394 ± 0.371	<0.001 *
	Hybrid loss + ADASYN	1.266 ± 0.761	0.003 *
QWK	Classification loss	−0.019 ± 0.012	0.018 *
	Classification loss + ADASYN	0.024 ± 0.010	0.001 *
	Hybrid loss	0.044 ± 0.013	<0.001 *
	Hybrid loss + ADASYN	−0.002 ± 0.010	0.053
Accuracy (%)	Classification loss	1.573 ± 0.377	<0.001 *
	Classification loss + ADASYN	2.083 ± 0.522	<0.001 *
	Hybrid loss	0.210 ± 0.303	0.544
	Hybrid loss + ADASYN	−1.547 ± 0.607	<0.001 *
Precision	Classification loss	0.003 ± 0.022	0.794
	Classification loss + ADASYN	0.009 ± 0.012	0.155
	Hybrid loss	−0.009 ± 0.004	0.001 *
	Hybrid loss + ADASYN	0.048 ± 0.008	<0.001 *
F1	Classification loss	−0.031 ± 0.006	<0.001 *
	Classification loss + ADASYN	−0.020 ± 0.008	<0.001 *
	Hybrid loss	0.003 ± 0.003	0.181
	Hybrid loss + ADASYN	0.030 ± 0.007	<0.001 *

Table 9. Mean and 95% CI of difference in performance metrics between the combined and HRV models. The differences were calculated by subtracting the value of an HRV model from that of a combined model. A two-sample t-test was performed to compare the metrics from the two models (* p < 0.05).

Metric	Method	Difference (Combined−HRV)	p-Value
BAC (%)	Classification loss	−1.222 ± 0.425	<0.001 *
	Classification loss + ADASYN	−3.337 ± 0.692	<0.001 *
	Hybrid loss	6.211 ± 0.493	<0.001 *
	Hybrid loss + ADASYN	−0.547 ± 0.728	0.174
QWK	Classification loss	−0.040 ± 0.008	<0.001 *
	Classification loss + ADASYN	0.022 ± 0.010	0.002 *
	Hybrid loss	0.114 ± 0.012	<0.001 *
	Hybrid loss + ADASYN	0.007 ± 0.009	0.331
Accuracy (%)	Classification loss	−0.507 ± 0.297	0.135
	Classification loss + ADASYN	3.953 ± 0.510	<0.001 *
	Hybrid loss	1.300 ± 0.345	<0.001 *
	Hybrid loss + ADASYN	3.417 ± 0.551	<0.001 *
Precision	Classification loss	−0.001 ± 0.007	0.036 *
	Classification loss + ADASYN	−0.005 ± 0.005	<0.001 *
	Hybrid loss	0.103 ± 0.008	<0.001 *
	Hybrid loss + ADASYN	−0.009 ± 0.005	0.005 *
F1	Classification loss	−0.005 ± 0.005	0.191
	Classification loss + ADASYN	−0.011 ± 0.008	0.011 *
	Hybrid loss	0.081 ± 0.006	<0.001 *
	Hybrid loss + ADASYN	0.001 ± 0.007	0.841

Table 10. Mean and 95% CI of difference in performance metrics between the combined and RRI models. The differences were calculated by subtracting the value of an RRI model from that of a combined model. A two-sample t-test was performed to compare the metrics from the two models (* p < 0.05).

Metric	Method	Difference (Combined−RRI)	p-Value
BAC (%)	Classification loss	−0.199 ± 0.536	0.492
	Classification loss + ADASYN	−0.331 ± 0.681	0.377
	Hybrid loss	4.818 ± 0.446	<0.001 *
	Hybrid loss + ADASYN	−1.813 ± 0.660	<0.001 *
QWK	Classification loss	−0.021 ± 0.011	0.005 *
	Classification loss + ADASYN	−0.001 ± 0.008	0.858
	Hybrid loss	0.070 ± 0.008	<0.001 *
	Hybrid loss + ADASYN	0.021 ± 0.009	0.003 *
Accuracy (%)	Classification loss	−2.080 ± 0.353	<0.001 *
	Classification loss + ADASYN	1.870 ± 0.441	<0.001 *
	Hybrid loss	1.090 ± 0.292	0.002 *
	Hybrid loss + ADASYN	4.963 ± 0.528	<0.001 *
Precision	Classification loss	0.009 ± 0.023	0.432
	Classification loss + ADASYN	0.014 ± 0.012	0.028 *
	Hybrid loss	0.182 ± 0.015	<0.001 *
	Hybrid loss + ADASYN	−0.034 ± 0.009	<0.001 *
F1	Classification loss	0.027 ± 0.006	<0.001 *
	Classification loss + ADASYN	0.009 ± 0.008	0.048 *
	Hybrid loss	0.078 ± 0.006	<0.001 *
	Hybrid loss + ADASYN	−0.029 ± 0.007	<0.001 *

Table 11. Cohen’s kappa representing agreement between predictions of two distinct samples from the same subject.

Model	HRV	RRI	Combined
Cohen’s kappa	0.770	0.678	0.737
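
The agreement statistic in Table 11 can be illustrated with a short, unweighted Cohen's kappa computed from two lists of predicted age groups, one per sample from the same subject. This is a sketch under assumptions: the function name and the toy labels are hypothetical, and the paper does not state which implementation was used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each sequence.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Toy predictions for two samples per subject (not data from the study).
first_sample_pred = [1, 1, 2, 2]
second_sample_pred = [1, 1, 2, 1]
kappa = cohens_kappa(first_sample_pred, second_sample_pred)
```

A kappa of 1 means the two samples always receive the same predicted group, while 0 means agreement no better than chance given how often each group is predicted.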

Table 12. Mean and 95% CI of the prediction performance of the RRI model using different sample windows.

Method (RRI Model)	BAC (%)	QWK	Accuracy (%)	Precision	F1
Classification loss	37.5 ± 0.343	0.492 ± 0.010	68.6 ± 0.482	0.381 ± 0.019	0.343 ± 0.004
Classification loss + ADASYN	42.7 ± 0.555	0.555 ± 0.010	65.0 ± 0.496	0.451 ± 0.008	0.426 ± 0.006
Hybrid loss	38.2 ± 0.364	0.482 ± 0.011	67.8 ± 0.498	0.303 ± 0.004	0.337 ± 0.003
Hybrid loss + ADASYN	43.4 ± 0.569	0.594 ± 0.008	60.4 ± 0.561	0.434 ± 0.005	0.429 ± 0.005

Table 13. Summary of machine learning-based age prediction using HRV and RRI.

Reference	Input Data	Subjects	Measurement Condition	Predicted Target	Algorithm	Validation	Best Performance
Corino et al., 2006 [22]	HRV	131 healthy	24 h	Individual ages	RLR, FFNN, RBFNN	3-fold CV	Correlation coefficient: 0.872 (FFNN)
Poddar et al., 2015 [23]	HRV	60 healthy males	Supine rest	Three age groups	SVM, KNN, PNN	Holdout	ACC: 70% (PNN)
Botsva et al., 2017 [24]	HRV	22,433 ^a	N/A (130 s ECG)	Mean group ages (nine groups)	ANN	Holdout	85% ^b
Makowiec and Wdowczyk, 2019 [25]	HRV	181 healthy	Nocturnal sleep	Four age groups	SVM	N/A	ACC: 93.6% ^c
Al-Mter, 2020 [26]	HRV	181 healthy	Nocturnal sleep	Seven age groups	SVM, RF, XGB	Holdout	ACC: 28.77% (RF)
Pfundstein, 2020 [38]	RRI	181 healthy	Nocturnal sleep	Seven age groups	CNN + LSTM	Holdout	ACC: 32.43%
This study	HRV, RRI, combined	1121 healthy	Supine rest	Four age groups	ResNet	5-fold CV	ACC ^d: 61.3% (HRV), 59.8% (RRI), 64.7% (combined)

Abbreviations: N/A, not applicable; CV, cross-validation; ACC, accuracy; RLR, robust linear regression; FFNN, feedforward neural network; RBFNN, radial basis function neural network; SVM, support vector machine; KNN, k-nearest neighbors; PNN, probabilistic neural network; ANN, artificial neural network; RF, random forest; XGB, extreme gradient boosting. ^a Participants’ status was not specified; ^b Evaluation method was not specified; ^c Evaluation was not based on unseen test data; ^d Results using hybrid loss with ADASYN.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
