1. Introduction
The ability to perform activities of daily living (ADLs) is a fundamental indicator of an individual’s functional independence and overall well-being. In elderly individuals, changes in ADL performance can signal the onset of cognitive decline or neurodegenerative diseases, making continuous monitoring an essential aspect of healthcare and assisted living. The early detection of irregularities in movement patterns can provide critical information for caregivers and healthcare professionals, enabling timely intervention to prevent falls, injuries, or complications related to mobility impairment [1].
Various sensor-based approaches have been explored in the literature for ADL detection, ranging from wearable sensors to environmentally embedded sensors. Wearable sensors, such as accelerometers, gyroscopes, and inertial measurement units (IMUs), are commonly used for activity tracking due to their ability to provide high-frequency motion data. However, these devices require users to wear them continuously, which can be uncomfortable and lead to compliance issues, particularly among the elderly. In contrast, non-wearable sensors, including cameras, depth sensors, infrared sensors, and radar systems, allow for unobtrusive activity monitoring without requiring physical attachment to the body. Among these, vision-based systems, such as monocular cameras, stereo cameras, and RGB-D sensors, have demonstrated high accuracy in human activity recognition (HAR). However, these methods raise privacy concerns as they capture detailed images that may contain personally identifiable information [2].
While several studies have attempted to anonymize visual data or develop techniques such as federated learning to ensure that data processing remains within a private network [3], the most effective privacy-preserving approach is to use sensors that inherently protect user anonymity. Radar sensors fulfill this criterion by design. Unlike cameras, radars do not capture explicit visual or biometric details but rather rely on the emission and reflection of radio waves, allowing for precise motion detection without revealing the physical characteristics of the monitored individual. Radar-based HAR is gaining interest due to its non-invasive nature, resilience to environmental conditions, and capability to function in low-visibility settings, such as dark rooms, smoke-filled environments, or occluded spaces [4]. Furthermore, radars are capable of detecting subtle physiological signals, including breathing and heart rate, making them well-suited for applications in healthcare, elderly monitoring, and smart home environments [5].
In this study, we employ a BGT60TR13C Xensiv 60 GHz radar sensor [6] to recognize ADLs performed in a bathroom environment, a highly privacy-sensitive setting where traditional vision-based methods are impractical. The 60 GHz radar operates in the millimeter-wave band, providing higher spatial resolution compared to lower-frequency radars such as those operating at 5.8 GHz or 24 GHz. This improved resolution enhances the system’s ability to distinguish fine-grained movements, which is particularly beneficial for detecting activities such as brushing teeth, face washing, and dressing/undressing. The radar’s low power consumption and miniaturized form factor make it ideal for Internet of Things (IoT)-based continuous monitoring, ensuring efficient and discreet ADL recognition in smart home environments.
This study introduces several key innovations that advance radar-based HAR in bathroom environments. Firstly, it employs a state-of-the-art 60 GHz frequency modulated continuous wave (FMCW) radar sensor equipped with three receiving antennas, enabling the parallel processing of range–Doppler maps to preserve angular information. The sensor’s compact dimensions (19 mm × 12.7 mm) facilitate seamless integration within the bathroom environment, such as placement near a mirror above the sink, ensuring non-intrusive deployment.
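As a rough illustration of this per-antenna range–Doppler processing (not the exact pipeline used in this work), the following NumPy sketch computes one range–Doppler map per receive channel via fast-time and slow-time FFTs; the frame dimensions, windowing, and clutter-removal step are assumptions made for the example.

```python
import numpy as np

def range_doppler_map(frame):
    """Compute a range-Doppler map for one FMCW frame of a single RX antenna.

    frame: 2D array (num_chirps, samples_per_chirp). Dimensions, windowing,
    and clutter removal are illustrative choices, not the sensor's actual
    processing configuration.
    """
    # Subtract the per-chirp mean to attenuate static (zero-Doppler) clutter.
    frame = frame - frame.mean(axis=1, keepdims=True)
    # Range FFT along fast time (samples within each chirp).
    rng = np.fft.fft(frame * np.hanning(frame.shape[1]), axis=1)
    # Doppler FFT along slow time (across chirps), shifted so zero Doppler is centred.
    rd = np.fft.fftshift(np.fft.fft(rng * np.hanning(frame.shape[0])[:, None], axis=0), axes=0)
    # Log-magnitude map, the usual input format for image-based networks.
    return 20 * np.log10(np.abs(rd) + 1e-6)

# Keep one map per receive antenna so that inter-antenna (angular) information
# is preserved instead of being averaged away.
frames = np.random.randn(3, 64, 128)                          # placeholder: (antennas, chirps, samples)
rd_maps = np.stack([range_doppler_map(f) for f in frames])    # shape (3, 64, 128)
```

Processing the three receive channels in parallel in this way is what allows the subsequent feature extraction to retain angular cues.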
Secondly, unlike previous studies that primarily focused on basic ADLs, such as walking, sitting, standing, and lying on the floor (as a proxy for falls or medical distress), this work expands the scope to include more specific bathroom-related activities. These include fall recovery (standing up after a fall), brushing teeth, washing the face, and combing hair, thereby improving the system’s applicability in real-world monitoring scenarios.
Thirdly, to enhance feature learning from radar data, this study systematically investigates 16 pre-trained networks (PTNs), evaluating their ability to extract discriminative representations for activity classification. In addition, real-time feasibility is explicitly assessed by measuring the processing time, providing insights into practical deployment. The most lightweight PTNs among those examined are further validated on a low-power edge computing platform, demonstrating suitability for resource-constrained environments.
Finally, the proposed system is evaluated on a diverse dataset, incorporating both male and female participants with varying physical characteristics and a wide age range, ensuring generalizability across different user profiles. Each participant contributed approximately 83 min of recorded data, significantly exceeding the durations typically found in related studies and providing a robust foundation for model training and evaluation.
Radar-Based ADL Recognition and Related Work
HAR has been extensively explored across various domains, including healthcare, elderly care, smart homes, and security applications. In healthcare settings, radar technology is increasingly valuable for monitoring vital signs and recognizing daily activities, benefiting from its non-intrusive nature and privacy-preserving characteristics [7]. Similarly, unobtrusive sensing systems such as radar are particularly advantageous in elderly care applications, enabling continuous monitoring without affecting user comfort or privacy [8]. Among critical applications in elderly care, fall detection is particularly significant; radar-based systems leveraging Doppler analysis effectively identify the rapid downward movement characteristic of falls, demonstrating high detection accuracy and reliability [9]. Despite the broad spectrum of radar applications described, the remainder of this section specifically concentrates on research aimed at HAR as it constitutes the core topic of the present work.
Several studies have leveraged handcrafted features due to their interpretability and computational simplicity. Li et al. [10] employed 20 handcrafted Doppler features, which were subsequently reduced through a feature selection process. Although their bidirectional long short-term memory (BiLSTM) classifier demonstrated effectiveness, the handcrafted approach necessitates significant domain expertise, making feature extraction cumbersome and context-specific. Similarly, Li et al. [4] extracted 68 handcrafted features from micro-Doppler spectrograms, achieving computational efficiency via hierarchical classification and adaptive thresholding. However, like the earlier work, this approach faced limitations due to the lack of angle information, susceptibility to environmental interference at 5.8 GHz, and a complex hierarchical implementation.
Recent efforts have transitioned towards learned feature approaches, which typically yield better generalization and require less manual effort. Cao et al. [11] advanced the state of the art by combining time–range and time–Doppler maps using a multi-feature attention fusion module and training with VGGNet (VGG13), leveraging learned features to reduce misclassification. Yet, despite superior accuracy, this approach did not address the radar-target angle, which could enhance spatial context in activity recognition.
Other studies have utilized radars equipped with multiple outputs (i.e., receiving antennas), which in principle allow for the capture of angular information. However, in several cases, this angular information was not explicitly incorporated into the processing pipeline. Chen et al. [12] utilized a 77 GHz radar with multiple receiving antennas, extracting learned Doppler–time maps via dilated convolutions and multi-head self-attention mechanisms. However, despite having multiple receiving channels capable of capturing angular information, the study averaged signals from all antennas into a single representation, effectively discarding spatial diversity. This prevented the model from leveraging angle-of-arrival features that could enhance classification performance.
Similarly, Kurtoglu et al. [13] employed learned features from range–Doppler maps and micro-Doppler spectrograms using CNN-BiLSTM architectures. Despite high accuracy through multi-representation fusion, their methodology neither separately processed data from multiple receiving antennas nor preserved angular information explicitly, potentially limiting spatial differentiation capabilities. Additionally, their small dataset posed significant constraints on generalizability.
Vandersmissen et al. [14] advanced feature extraction by deploying deep learning models on range–Doppler and micro-Doppler data combined with visual inputs. While achieving high accuracy, particularly with 3D CNN architectures, the need for simultaneous video and radar data collection poses privacy and practical implementation challenges, in addition to the absence of explicit angular information.
To overcome dataset size limitations and enhance generalization, Saeed et al. [15] utilized extensive publicly available datasets, employing ResNet-based learned features with robust performance in controlled scenarios. However, performance significantly dropped in cross-environmental validations, highlighting environmental sensitivity. Since the specific ResNet variant used in their study is not explicitly stated, the computational feasibility for real-time embedded applications remains uncertain, as deeper ResNet architectures tend to be computationally intensive, whereas shallower variants such as ResNet18 offer more efficient alternatives. Additionally, the publicly available dataset used in their study is based on a radar with a single receiving antenna, meaning that angular information is not captured or incorporated into the feature extraction process.
Huan et al. [16] proposed a hybrid CNN-LSTM attention network that efficiently decouples Doppler and temporal features, reducing network complexity while maintaining high classification accuracy. The attention-based feature refinement further enhances activity recognition by focusing on the most relevant motion patterns. However, while the radar utilized multiple receiving channels, angular information was averaged rather than explicitly preserved, potentially limiting spatial differentiation. Additionally, the dataset, despite having a reasonable number of participants, is constrained by a short total data collection duration of only 1000 s, which may impact model generalization.
Specifically targeting restroom scenarios, Saho et al. [17] combined CNN-learned features with handcrafted acceleration and jerk metrics, achieving excellent fall detection accuracy using Doppler signatures to capture motion comprehensively. However, the radar system employed only a single receiving antenna, preventing the capture of angular information. Additionally, the study focused exclusively on predefined restroom activities, omitting other relevant actions such as washing, brushing teeth, combing hair, and attempting to stand up again after a fall. Furthermore, all participants in the study were male, which limits the generalizability of the model to a more diverse population as gender-related differences in movement patterns may influence classification performance.
Finally, Visser et al. [18] demonstrated effective integration of Doppler, azimuth, and elevation data with convolutional and LSTM architectures to extract spatio-temporal learned features, which provided a cost-effective solution while maintaining competitive accuracy. However, their system exhibited confusion between sitting, standing, and lying on the floor, which was used in the study to simulate a fall. The misclassification may stem from feature extraction based on only four convolutional layers, which may not sufficiently differentiate between these critical postures. Additionally, the study considers only five activities—walking, sitting on the toilet, standing up from the toilet, washing hands, and lying on the floor (fall)—while neglecting other essential behaviors such as recovering from a fall, brushing teeth, combing hair, and washing the face. Furthermore, the study does not address real-time system performance or its feasibility for use on low-power edge computing platforms, leaving its suitability for deployment in resource-constrained environments an open question.
Recent studies have demonstrated significant advancements in leveraging deep feature learning (DFL) through pre-trained neural networks for radar-based HAR. Pre-trained CNNs, including DenseNet [19], ResNet [20,21], Inception [21], EfficientNet [22], VGG-16 [20], VGG-19 [23], GoogleNet [24], and MobileNetV2 [19,20], effectively extract discriminative features from radar data representations such as micro-Doppler spectrograms and range–Doppler images, significantly enhancing recognition accuracy and computational efficiency.
The application of these architectures substantially reduces the necessity for extensive radar-specific datasets by leveraging transfer learning. Furthermore, hybrid architectures combining 3D and 2D CNN models, such as EfficientNet in the progressively orthogonally mapped EfficientNet architecture [22], outperform traditional approaches by capturing complex spatio-temporal radar features. Additionally, densely connected networks such as Dop-DenseNet [19] demonstrate superior performance by effectively preserving detailed micro-Doppler information essential for precise gesture and activity recognition. Collectively, these works emphasize the suitability of pre-trained CNN architectures for radar-based HAR, particularly beneficial in scenarios with limited data and computational constraints. However, it should be noted that the cited studies employing DFL through PTNs primarily investigated HAR in general contexts and did not specifically address ADLs within bathroom environments.
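To make the transfer-learning idea concrete, the sketch below uses a torchvision DenseNet backbone with its ImageNet classifier removed as a frozen feature extractor for a single-channel radar map; the preprocessing (channel replication, resizing) and the choice of backbone are illustrative assumptions, not the configuration of any cited work or of this study.

```python
import torch
import torchvision.models as models

# Pre-trained backbone with its ImageNet classifier removed, used purely as a
# frozen feature extractor for radar maps (range-Doppler or micro-Doppler).
backbone = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
backbone.classifier = torch.nn.Identity()        # keep the 1920-dim feature vector
backbone.eval()

def extract_features(rd_map: torch.Tensor) -> torch.Tensor:
    """rd_map: (H, W) single-channel radar map, already scaled to [0, 1]."""
    x = rd_map.unsqueeze(0).repeat(3, 1, 1)      # replicate to 3 channels
    x = torch.nn.functional.interpolate(
        x.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False
    )                                            # resize to the expected input size
    with torch.no_grad():
        return backbone(x).squeeze(0)            # one fixed-length descriptor per map

features = extract_features(torch.rand(64, 64))
print(features.shape)                            # torch.Size([1920])
```

Sequences of such per-window descriptors can then be passed to a temporal classifier, which is the general strategy adopted in the remainder of this work.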
Notably, publicly available datasets have facilitated advancements in radar-based HAR research. Shah et al. [25] presented a dataset collected using a C-band FMCW radar operating at 5.8 GHz with 400 MHz bandwidth, involving 48 participants performing six common human activities, including walking, sitting, standing, bending, drinking water, and falling. Similarly, Fioranelli et al. [26] introduced another dataset, employing the same type of radar (C-band FMCW at 5.8 GHz, 400 MHz bandwidth), collected from over 50 participants across nine different environments, covering similar activities. However, both datasets were acquired using radar systems equipped with only one receiving antenna, therefore lacking angular information. Furthermore, activities specifically related to bathroom environments were not included in these datasets, making them unsuitable for the context of the present study.
3. Results
The classification performance achieved using each of the 16 PTNs in conjunction with the BiLSTM-based network is presented in Table 4 and Table 5. Table 4 reports the average classification accuracy both with and without (w/o) data augmentation, while Table 5 provides accuracy values for each activity. The introduction of data augmentation yielded improvements ranging from approximately 4% to 11%, with greater benefits observed for networks featuring a larger number of parameters. The highest overall classification accuracy was obtained using DenseNet201, which achieved 97.02% accuracy. The activities LYD (lying down) and GTU (getting up) exhibited slightly lower accuracy compared to other actions, indicating a greater challenge in distinguishing between these specific activities.
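The exact augmentation scheme is specified in the Methods section and is not restated here; purely as an illustration of the kind of label-preserving transformations commonly applied to radar map sequences, a hypothetical sketch could look as follows (the shift range, noise level, and gain range are invented for the example).

```python
import numpy as np

def augment_window(window, rng=None):
    """window: array of shape (time, antennas, range, doppler).

    Illustrative, label-preserving augmentations; all parameters are hypothetical.
    """
    rng = rng or np.random.default_rng()
    out = window.copy()
    # Small circular shift along the Doppler axis (slight velocity offset).
    out = np.roll(out, shift=int(rng.integers(-2, 3)), axis=-1)
    # Low-level additive Gaussian noise relative to the window's spread.
    out = out + rng.normal(0.0, 0.01 * out.std(), size=out.shape)
    # Random global gain variation.
    return out * rng.uniform(0.9, 1.1)

augmented = augment_window(np.random.randn(16, 3, 64, 128))
```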
Among the tested networks, ResNet50 ranked second in terms of overall classification performance. Although its average accuracy was lower than that of DenseNet201, it provided the highest accuracy for the LYD activity. Both DenseNet201 and ResNet50 demonstrated optimal performance when trained with a sliding window length of 16 samples, suggesting that this window size effectively captures the necessary temporal dynamics for activity recognition. To further analyze the classification outcomes of the DenseNet201-based architecture, a confusion matrix is provided in Table 6. This matrix highlights the classification reliability for each activity, with misclassifications primarily occurring between LYD and GTU, as expected from the results in Table 5. The overall accuracy for other activities remained consistently high, confirming the effectiveness of feature extraction and sequential modeling for radar-based activity recognition.
The optimized hyperparameters used in the architectures based on DenseNet201 and ResNet50, obtained through Bayesian optimization, are detailed in Table 7. The parameters include the number of hidden units (NumHiddenUnits), dropout rate (DropRate), mini-batch size (MiniBatchSize), initial learning rate (InitialLearnRate), gradient threshold (GradientThreshold), dropout probability of the attention layer (DropoutProbability), and number of attention heads (NumHeads). The optimization process identified different configurations for each network, reflecting their distinct architectures and feature extraction properties.
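As an illustrative mapping from these hyperparameters to a concrete model, a minimal PyTorch sketch of a BiLSTM head with multi-head self-attention over per-window PTN features is shown below; the pooling strategy, the number of heads, and the layer arrangement are assumptions made for the example, and the exact architecture is the one defined in the Methods section.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionHead(nn.Module):
    """Sequence classifier over per-window PTN feature vectors (illustrative)."""

    def __init__(self, feat_dim=1920, num_hidden_units=1026, num_heads=2,
                 drop_rate=0.457, attn_dropout=0.1, num_classes=10):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, num_hidden_units,
                              batch_first=True, bidirectional=True)
        d_model = 2 * num_hidden_units            # forward + backward hidden states
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=attn_dropout, batch_first=True)
        self.dropout = nn.Dropout(drop_rate)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x):                         # x: (batch, time, feat_dim)
        h, _ = self.bilstm(x)                     # (batch, time, 2 * hidden)
        a, _ = self.attn(h, h, h)                 # self-attention over time steps
        pooled = self.dropout(a.mean(dim=1))      # temporal average pooling
        return self.classifier(pooled)            # class logits

model = BiLSTMAttentionHead()
logits = model(torch.randn(4, 16, 1920))          # 16-sample sliding window
print(logits.shape)                               # torch.Size([4, 10])
```

Bayesian optimization then searches over such values (together with MiniBatchSize, InitialLearnRate, and GradientThreshold used during training) separately for each backbone.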
Overall, the results indicate that DenseNet201 and ResNet50 offer the most effective feature representations for Doppler radar-based activity recognition. While other networks such as InceptionV3 and ResNet101 also achieved competitive results, the combination of feature extraction capability and temporal modeling was most successful in DenseNet201-based classification. The observed variations in performance across activities suggest that certain motions, particularly LYD and GTU, require further refinement in feature representation and sequence modeling to improve classification robustness.
The computational efficiency of the classification models was assessed in terms of total objective function evaluation time and average inference time per test window. The DenseNet201-based model required a total optimization time of 86,349.38 s, with an average inference time of 0.83 milliseconds per window. In comparison, the ResNet50-based model demonstrated a shorter total optimization time of 62,760 s but exhibited a higher average inference time of 1.58 milliseconds per window.
These results indicate that while ResNet50 required less overall optimization time, its per-window inference time was nearly double that of DenseNet201. This suggests that DenseNet201, despite its increased computational demand during training and optimization, provides a more efficient real-time inference capability, which is a critical factor in deploying radar-based activity recognition systems in real-world applications. The difference in inference times can be attributed to variations in network architecture, feature extraction complexity, and internal processing mechanisms, which influence computational efficiency during deployment.
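For reproducibility, per-window inference latency of this kind is usually estimated by averaging many repeated forward passes after a warm-up phase; the sketch below illustrates one common way to do this (the measurement protocol is an assumption, not the authors' exact procedure).

```python
import time
import torch

def mean_inference_time_ms(model, example, n_warmup=20, n_runs=500):
    """Average per-window inference time in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):            # warm-up: caches, allocator, GPU clocks
            model(example)
        if example.is_cuda:
            torch.cuda.synchronize()          # ensure queued GPU work has finished
        start = time.perf_counter()
        for _ in range(n_runs):
            model(example)
        if example.is_cuda:
            torch.cuda.synchronize()
        return 1000.0 * (time.perf_counter() - start) / n_runs
```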
Furthermore, to assess the feasibility of deployment on edge computing nodes, the classification models based on the lighter PTNs, including EfficientNetB0, GoogleNet, MobileNetV2, ResNet18, ShuffleNet, and SqueezeNet, were evaluated on an NVIDIA Jetson Nano equipped with an NVIDIA Maxwell GPU (128 CUDA cores) and 4 GB of RAM. The accuracy obtained was identical to that presented in Table 4, confirming the effectiveness of these models even in resource-constrained environments. Additionally, the average inference times per window, reported in Table 8, demonstrate that the selected PTNs achieve inference times compatible with real-time execution, making them well-suited for practical edge-based deployment.
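The toolchain used to run the models on the Jetson Nano is not detailed in this section; one common route for such devices is to export the trained network to ONNX and execute it with an accelerated runtime such as onnxruntime or TensorRT, sketched below under that assumption (the placeholder model and file name are purely illustrative).

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the trained classifier (the real network
# would be the PTN + BiLSTM pipeline described above).
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 1920, 10)).eval()

dummy_window = torch.randn(1, 16, 1920)   # one sliding window of PTN features
torch.onnx.export(
    model, dummy_window, "adl_classifier.onnx",
    input_names=["window"], output_names=["logits"],
    opset_version=13,
)
# On a Jetson-class device, the .onnx file can then be run with onnxruntime or
# converted to a TensorRT engine (e.g., with the `trtexec` utility).
```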
To provide a comprehensive overview of the radar-based activity recognition process, the range–Doppler maps corresponding to all ten activities considered in this study are presented in Appendix A. These visualizations offer additional insight into the Doppler signatures and range characteristics associated with different actions, complementing the quantitative analysis provided in this section.
4. Discussion
This study focuses on the recognition of ADLs in privacy-sensitive environments such as bathrooms. The necessity to ensure privacy precludes the use of sensors capable of capturing detailed intensity images (e.g., monocular/stereo cameras) or even depth maps (e.g., time-of-flight or RGB-D cameras). Instead, a radar-based approach is employed, leveraging the BGT60TR13C Xensiv 60 GHz radar for activity detection. While an RGB-D camera was used for ground-truth annotation during data collection, all computational methodologies were tested exclusively on radar data to ensure adherence to privacy-preserving principles.
The 60 GHz radar operates within the millimeter-wave band, which provides significantly higher spatial resolution compared to lower-frequency alternatives such as 24 GHz or 5 GHz radars. This high frequency enables precise detection of fine-grained human movements, facilitating the recognition of activities such as standing up, sitting down, walking, and brushing teeth by distinguishing even small variations in motion. The ability to accurately differentiate between subtle motion transitions is particularly critical in elderly monitoring and smart home applications, where identifying independent or assisted movement patterns can provide valuable insights into an individual’s physical condition.
The BGT60TR13C Xensiv radar is optimized for advanced signal processing, reducing interference and noise to improve activity identification, even in multi-user scenarios. This capability is particularly relevant for applications in assisted living environments, where multiple individuals may be monitored simultaneously. Additionally, its compact size and low power consumption make it an ideal candidate for IoT-based solutions, enabling long-term continuous monitoring with minimal energy consumption. Unlike optical-based technologies, 60 GHz radars operate independently of lighting conditions, making them effective in diverse ambient lighting scenarios, including low-light or completely dark environments.
The choice of frequency and radar type plays a crucial role in determining the system’s applicability. FMCW radars, such as the BGT60TR13C, are particularly well-suited for short-range monitoring, offering high range resolution and Doppler-based motion analysis. In contrast, pulsed radars, which typically operate at lower frequencies, provide improved obstacle penetration but at the cost of reduced resolution and bulkier hardware. For ADL recognition, where distinguishing fine movements is essential, the high-resolution capability of FMCW radar makes it the preferred choice.
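To make the resolution argument explicit, the standard FMCW relations can be recalled (generic textbook formulas, not sensor-specific values):
\[
\Delta R = \frac{c}{2B}, \qquad \Delta v = \frac{\lambda}{2 T_{\mathrm{obs}}} = \frac{c}{2 f_{c} T_{\mathrm{obs}}},
\]
where B is the swept bandwidth, c the speed of light, f_c the carrier frequency, λ the wavelength, and T_obs the duration of the coherently processed chirp sequence. Operating at 60 GHz allows sweeps of several gigahertz of bandwidth, bringing ΔR down to the centimeter scale, while the short wavelength improves velocity resolution for a given observation time.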
The results in Table 5 demonstrate that feature extraction using different PTNs produces varying classification performance across ADLs. Generally, higher accuracy is observed for activities such as STAND, REST, FACE, TEETH, WALK, and HAIR. These activities typically involve distinct motion patterns, making them easier to recognize using radar Doppler spectrograms.
Among the evaluated PTNs, DenseNet201 and ResNet50 achieved the highest classification accuracy. DenseNet201 exhibited the best overall performance, with an accuracy of 97.02%, followed by ResNet50 (94.57%). The high accuracy of these networks can be attributed to their deep feature extraction capabilities, which enhance the model’s ability to differentiate between similar actions. However, DenseNet201 requires higher computational resources during training, whereas ResNet50 demonstrates a more balanced trade-off between training complexity and inference efficiency. Notably, DenseNet201 exhibited lower inference time (0.83 ms per window) compared to ResNet50 (1.58 ms per window), making it a more suitable candidate for real-time applications in low-power embedded systems commonly used in IoT-based ADL monitoring. Furthermore, to evaluate the feasibility of edge deployment, the selected lightweight PTNs were tested on an NVIDIA Jetson Nano. The measured average inference times per window ranged from 35 ms to 62 ms, demonstrating computational efficiency suitable for real-time execution in resource-constrained environments.
The classification accuracy per activity shows that the lowest-performing categories are LYD and GTU, as evident in Table 5 and the confusion matrix in Table 6. The confusion matrix reveals that these activities are often misclassified as each other, suggesting that the model tends to interpret LYD and GTU as phases of the same activity rather than distinct actions. However, misclassifications with other activities remain minimal, indicating that feature extraction and sequential modeling are effective for most ADLs.
In terms of hyperparameter optimization, Bayesian optimization was employed to fine-tune the DenseNet201- and ResNet50-based architectures. As detailed in Table 7, the optimal configurations differed between the two networks. DenseNet201 was configured with 1026 hidden units, a dropout rate of 0.457, and an initial learning rate of 2.7599 × 10⁻⁵, whereas ResNet50 required 4802 hidden units and a lower dropout rate (0.208). These differences suggest that ResNet50 requires a larger number of neurons to compensate for its shallower architecture compared to DenseNet201’s densely connected layers.
To contextualize the obtained results, Table 9 provides a comparison with related works on radar-based HAR. In addition, for greater clarity and relevance, Table 10 presents a focused comparison limited to the specific set of activities examined in this study. It is important to clarify that the performance figures being compared were obtained on different datasets; a direct comparison on a common dataset, whether publicly available or the one collected in this study, was not feasible for several reasons.
The available open-source datasets typically employ radars equipped with a single receiving antenna, rendering them incompatible with our proposed framework, which explicitly leverages angular processing to effectively recognize more complex activities. Conversely, most existing methods considered in our comparative analysis do not incorporate angular information, except for the work by Visser et al. [18], and thus cannot be effectively evaluated using our angular-dependent data.
To the best of our knowledge, Visser et al. [18] is the only previous study employing three separate range–Doppler maps derived from three antennas, making it compatible with our methodology and suitable for testing with our dataset. However, their method relies on training an ad hoc convolutional neural network specifically designed for feature extraction, in contrast to our transfer learning-based approach utilizing pre-trained networks. As future research, we plan to investigate the employment of custom-designed neural architectures and systematically compare them with pre-trained networks, thereby providing comprehensive insights into the relative advantages and disadvantages of each approach.
While several previous studies have investigated ADL recognition using radar, only the works by Saho et al. [17] and Visser et al. [18] have addressed a subset of activities related to the bathroom environment. Nonetheless, neither of these studies has considered the full range of bathroom-specific activities included in the present work. According to the comparative results shown in Table 10, this study achieves competitive or superior accuracy across activities common to these previous works.
In particular, our method outperforms both prior approaches in the recognition of critical activities such as LYD, which may indicate emergency scenarios. Moreover, our approach also achieves higher accuracy in DRESS. Similarly, our approach maintains comparably high accuracy for SIT and STAND activities, closely matching or slightly exceeding prior results. Additionally, activities such as TEETH, FACE, HAIR, REST, and GTU (recovering from a fall) were uniquely evaluated in our study, highlighting the broader coverage and enhanced practical relevance of the proposed approach.
Studies such as Li et al. [10], Cao et al. [11], and Chen et al. [12] report high classification accuracy for activities like walking (95–98%), sitting down (78–92%), and standing up (74–100%). The performance achieved in this study for these actions aligns well with previous findings, with WALK reaching 97.87% and STAND achieving 100% accuracy.
Certain studies, including Vandersmissen et al. [14] and Kurtoglu et al. [13], evaluated more diverse and dynamic activities, such as drumming, ironing, and folding laundry, which were not included in this dataset. Nonetheless, their reported accuracy for standing up (100%) and sitting down (100%) suggests that these actions are well-suited for radar-based recognition, particularly when using high-resolution frequency bands like 77 GHz.
A notable difference arises when comparing our work with Saeed et al. [15], who reported 100% accuracy for walking and standing activities. However, their dataset consisted of only 1026 micro-Doppler signatures in total, with approximately 170 samples per activity, making their findings less generalizable. Additionally, the dataset description lacks sufficient detail, limiting the ability to assess its diversity and real-world applicability.
Overall, this study’s results align with state-of-the-art benchmarks, demonstrating that radar-based ADL recognition is feasible and effective using deep learning models. The high classification accuracy observed in most activities suggests that Doppler spectrogram-based analysis is a promising alternative to other monitoring systems (e.g., camera-based), especially in strictly privacy-sensitive settings like bathrooms.
Limitations and Considerations in Multi-Person Environments
Radar sensors typically face significant challenges in distinguishing or tracking an individual’s movements when multiple people are present within the sensing range. For this reason, radar technology is naturally applied in scenarios where there is a reasonable certainty that only one person is within range, such as a bathroom, or in monitoring situations where an older adult spends most of their time in a specific location (e.g., a chair or a bed). In a previous study by the authors [5], the presence of additional individuals was investigated in the context of vital sign detection. Those results showed that, if other subjects remain more than 0.5 m away from the monitored person and do not obstruct the line of sight between the sensor and the target, the accuracy loss is approximately 2.61% for heart rate and 4.88% for respiration.
In this work, we specifically focus on the case of a single person in the bathroom environment because the monitoring of bathroom-related ADLs is primarily intended for assessing an older adult’s independence. The presence of additional individuals would imply that the older adult is not autonomous, thus defeating the purpose of evaluating self-sufficiency. Nevertheless, based on our previous experiments, it is reasonable to assume that if other individuals are located more than one meter away and do not stand between the radar sensor and the monitored subject, the detection performance does not deteriorate significantly. Furthermore, given that the radar range is limited to a maximum of 2.5 m and the radar frequency employed has minimal penetration through walls, the presence of other moving subjects in different areas of the home is unlikely to interfere with the monitoring in the designated bathroom environment.
5. Conclusions
This study demonstrates the feasibility of radar-based HAR in privacy-sensitive environments, specifically focusing on ADLs performed in a bathroom setting. By leveraging a BGT60TR13C Xensiv 60 GHz radar, this work evaluates the ability of deep learning models to classify subtle human movements in a privacy-preserving way (i.e., without capturing biometric details).
A dataset was collected from seven volunteers performing ten ADLs, including face washing, brushing teeth, dressing/undressing, sitting, and standing up; several of these activities have not been systematically studied in previous radar-based HAR research. The classification framework incorporated pre-trained feature extractors and a BiLSTM network with an attention mechanism, allowing for effective modeling of sequential radar data. Among the 16 PTNs tested, DenseNet201 achieved the highest classification accuracy (97.02%), followed by ResNet50 (94.57%). While most activities were recognized with high reliability, LYD (lying down) and GTU (getting up) exhibited a slightly lower classification accuracy, likely due to their similar motion patterns and overlapping Doppler characteristics.
Future research should focus on enhancing the differentiation between LYD and GTU by incorporating contextual information and temporal dependency modeling, which could improve recognition accuracy for actions with overlapping motion characteristics. Additionally, expanding the study to include a larger pool of volunteers will be useful for increasing dataset diversity and improving generalizability. The real-world deployment of radar-based ADL recognition systems will also be crucial for evaluating their robustness under varying spatial constraints, occlusions, and user-specific variations. Further advancements will involve the inclusion of additional ADLs, such as body washing and entering/exiting the bathtub, to establish a more comprehensive and realistic radar-based activity recognition framework suitable for healthcare and smart home applications. Moreover, future research will investigate feature learning using custom-designed neural networks and systematically compare their performance with the pre-trained networks presented in this study, thereby providing insights into the relative advantages and limitations of each approach.