1. Introduction
Human Activity Recognition (HAR) plays a pivotal role in a wide range of applications, including healthcare, assisted living, sports performance analysis, and rehabilitation monitoring. Fundamentally, HAR entails classifying physical activities, such as walking, running, sitting, or more complex movements, based on sensor data collected from the human body. The growing availability and miniaturization of wearable sensors, including accelerometers, gyroscopes, magnetometers, and physiological monitors, have significantly improved the feasibility and granularity of HAR systems. These technological advancements have made it possible to perform continuous, real-time monitoring in naturalistic environments, thereby enabling personalized and proactive interventions [1].
Despite significant advancements in sensing technologies, achieving accurate and robust HAR under real-world conditions remains a considerable challenge. Traditional HAR systems often employ a centralized architecture, where raw or pre-processed data from multiple wearable devices is transmitted to a central processor, typically a smartphone, cloud service, or edge hub, for classification. However, this centralized design introduces several technical limitations, particularly in dynamic, resource-constrained environments. These limitations include increased communication overhead, elevated energy consumption, data synchronization complexities, vulnerability to node disconnections, and concerns related to user data privacy [2]. Furthermore, centralized inference requires the central node to possess substantial processing power to handle multi-sensor data streams, which may be impractical for low-power, real-time applications such as continuous patient monitoring or remote fitness tracking.
  1.1. Wearable Systems and Their Challenges
The adoption of wearable technologies in healthcare and activity monitoring has grown rapidly, particularly with the emergence of smartwatches, fitness trackers, and wearable medical devices. However, these systems are often limited by their hardware capabilities, especially in terms of computation, storage, and power [3]. This makes it impractical to run complex machine learning models locally on each device, pushing most processing tasks to the central node. Furthermore, wearable systems often operate under variable conditions: sensors may lose contact, network availability may fluctuate, and the physical placement of the device may introduce variability in signal quality. These challenges make robustness and fault tolerance essential design goals in modern HAR systems.
From a system architecture perspective, current wearable HAR solutions primarily focus on data capture and transmission, with minimal onboard processing [4,5]. The captured data is typically streamed to a mobile phone or cloud service, where inference is conducted using centralized neural network models. These models, while powerful, depend heavily on uninterrupted data flow and rarely degrade gracefully when that flow is interrupted. A disconnection or failure in any part of the sensor network can significantly degrade performance or halt monitoring altogether [6].
  1.2. Neural Networks in HAR and Edge Constraints
Recent advances in deep learning have dramatically improved HAR accuracy. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based models have been applied successfully to multi-modal sensor data [4,5]. These models are capable of learning complex temporal and spatial features directly from raw signals, reducing the need for handcrafted features. However, their deployment remains challenging in wearable contexts due to computational and memory limitations.
Most deep learning-based HAR frameworks are still implemented in centralized architectures where data from all sensors is required simultaneously [1]. Edge deployment, which runs models directly on wearable devices, offers potential benefits in terms of latency reduction and data privacy, but is constrained by the limited processing capabilities of individual devices. Moreover, scaling such systems to accommodate multiple sensors or users exacerbates these limitations. Edge–cloud hybrid solutions have been proposed, but they still rely heavily on constant connectivity and central control, which may not be sustainable in all use cases.
  1.3. Problem Statement: The Limits of Centralized and Distributed HAR
As shown in Figure 1a, a centralized Machine Learning (ML) model processes all measurements and extracted features together to produce a series of outputs, which may be classifications or regressions. In a distributed sequential ML approach (Figure 1b), the model is partitioned by layers: the first core, which contains the first layer, receives all measurements and features, and each subsequent core contains one further layer, so that the cores operate as a pipeline to reduce computation time. However, because a single core still receives all measurements, it requires a direct connection to every sensor in the network. Finally, if each sensor contains only a portion of the original network and is unaware of the other signals, the model loses observability because some measurements are never accounted for, which results in inaccurate inference (Figure 1c).
The fundamental problem with centralized HAR models lies in their structural dependency on full-sensor data integration at a single location. This dependency introduces significant bottlenecks in terms of latency, scalability, and fault tolerance. In scenarios where multiple sensors are deployed across the body, maintaining synchronization and uninterrupted data streams becomes increasingly difficult. Centralized systems are also more vulnerable to performance degradation due to packet loss, sensor dropout, or delayed data transmission.
In addition to these technical limitations, centralized systems pose privacy risks. Raw sensor data often contains sensitive information about users’ behaviors and routines, and transmitting this data across networks increases the risk of security breaches. Similarly, energy consumption becomes a critical concern when high volumes of data must be transmitted wirelessly, particularly in battery-constrained devices such as smartwatches or wearable patches [7].
  1.4. Proposal: A Distributed Framework for HAR
To address these challenges, we propose a distributed neural network architecture for human activity classification using wearable sensors. In this architecture, each wearable device processes its local data stream using a lightweight neural model trained to produce high-level feature representations or preliminary predictions. These outputs, rather than the raw data, are transmitted to a central classifier, which integrates information from all devices to perform final activity recognition. This distributed framework offers several compelling advantages:
- Reduced Communication Overhead: only concise representations are transmitted, minimizing bandwidth use and energy consumption. 
- Improved Fault Tolerance: the system remains functional even if one or more devices disconnect or fail temporarily. 
- Enhanced Scalability: new sensors can be added with minimal impact on the central node’s computational load. 
- Greater Privacy: since raw data is not transmitted, the system reduces the exposure of sensitive user information. 
This architectural shift allows us to reimagine HAR systems not as monolithic, centrally controlled pipelines but as collaborative, adaptive, and modular networks of intelligent agents. Each sensor node contributes to the overall classification task without requiring complete access to all data, thereby enabling deployment in real-time, low-power, and privacy-sensitive scenarios.
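To make the communication argument concrete, the following sketch counts the number of values each approach would transmit per second. It is a back-of-the-envelope estimate under stated assumptions: the per-node channel counts follow the sensor configuration described in Section 2.3 (with quaternions taken as four components), the window stride of 25 samples and the six activity classes follow Sections 2.4 and 4.4, and packet headers, timestamps, and numeric precision are ignored.

```python
# Back-of-the-envelope payload comparison (values per second, not bytes).
# Assumptions: quaternion = 4 values, acceleration = 3, angular velocity = 3.
SAMPLING_HZ = 50   # samples per second per channel (Section 2.3)
STRIDE = 25        # samples between consecutive windows (Section 2.4)
N_CLASSES = 6      # activities in Table 1

# Channels streamed by each node in the configuration of Section 2.3.
CHANNELS_PER_SENSOR = {
    "chest": 4 + 3 + 3,
    "right_hand": 3 + 3,
    "left_knee": 3 + 3,
    "left_hand": 4,
    "right_knee": 4,
}


def raw_values_per_second(channels: dict, fs: int) -> int:
    """Values/s if every node streams its raw samples to a central node."""
    return sum(channels.values()) * fs


def prob_values_per_second(n_sensors: int, n_classes: int, fs: int, stride: int) -> float:
    """Values/s if every node sends one class-probability vector per window."""
    windows_per_second = fs / stride
    return n_sensors * n_classes * windows_per_second


raw = raw_values_per_second(CHANNELS_PER_SENSOR, SAMPLING_HZ)
dist = prob_values_per_second(len(CHANNELS_PER_SENSOR), N_CLASSES, SAMPLING_HZ, STRIDE)
print(f"raw: {raw} values/s, distributed: {dist:.0f} values/s, ratio: {raw / dist:.0f}x")
# prints: raw: 1500 values/s, distributed: 60 values/s, ratio: 25x
```

Under these assumptions, late fusion transmits 25 times fewer values than raw streaming; the actual byte-level savings would depend on how raw samples and probability vectors are encoded over BLE.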
  1.5. Related Work
Recent research in HAR has increasingly shifted from traditional centralized processing toward edge-based and distributed approaches that better accommodate real-world deployment constraints. Centralized models such as those employing deep convolutional and recurrent networks have demonstrated strong performance across various benchmark datasets [8,9]. These models often aggregate raw multi-sensor data to a cloud or local server for classification, achieving high accuracy but suffering from high communication costs, latency, and vulnerability to sensor failures or dropouts [10]. To address these issues, edge intelligence techniques have been developed, allowing devices to perform localized inference. For example, Haque et al. introduced LightHART, a transformer-based HAR model optimized for execution on mobile processors without sacrificing accuracy [11]. Similarly, Agarwal and Alam proposed an ultra-lightweight deep learning model tailored to wearable hardware constraints [7].
Federated learning (FL) has emerged as a promising solution to reduce data transmission and enhance privacy. Frameworks such as FedHAR [12], FedHealth [13], and FedOpenHAR [14] enable multiple wearable devices to collaboratively train models while retaining data locally. These approaches mitigate privacy risks but still depend on stable connectivity and frequent model synchronization. Spiking neural networks (SNNs), like those explored in [15], offer biologically inspired, event-driven computation with low energy consumption, making them suitable for on-body deployment, though training remains challenging. Meanwhile, hybrid distributed architectures such as adARC [16] and Sannara et al.’s distributed sensor fusion scheme [17] combine local lightweight inference with central aggregation of intermediate representations or decisions, offering robustness to disconnections and improved scalability.
In addition to these frameworks, recent studies have explored enhancements to traditional FL through communication-efficient and heterogeneous model approaches. Sozinov et al. [18] evaluated FL for HAR using both softmax regression and deep neural networks, showing that although FL achieves slightly lower accuracy than centralized models, it significantly reduces privacy risks and data transmission costs. Gad et al. [19] proposed FedAKD, an FL strategy based on augmented knowledge distillation, which supports heterogeneous client models and improves communication efficiency by exchanging soft labels rather than gradients. Their results show superior performance under non-IID conditions, addressing one of the key limitations of traditional FL algorithms.
Beyond FL, split learning (SL) and federated split learning (FSL) have been introduced to further reduce the computational load on wearable devices while preserving privacy. Ndeko et al. [20] presented an FSL framework with differential privacy that partitions the model between the client and the server. Their method achieved improved accuracy, reduced latency, and better privacy preservation compared to conventional FL, particularly in edge computing scenarios.
Building on these advancements in FL and SL, recent studies have emphasized the importance of energy awareness and heterogeneous device capabilities in real-world HAR deployments. Nguyen et al. proposed an energy-aware FL framework that dynamically selects participating clients based on energy status and data quality, achieving a better trade-off between accuracy, fairness, and device longevity [21]. Similarly, Thakur et al. introduced FedMeta, a meta-learning-based approach that enhances personalization and adaptation to the non-IID data distributions commonly encountered in wearable HAR [22]. Saylam and Incel [23] provided a comprehensive review of FL techniques for edge devices, highlighting the challenges posed by device heterogeneity, communication bottlenecks, and the need for lightweight aggregation methods tailored to wearable systems.
In parallel, research has also focused on distributed training and inference paradigms that relax the server-centric assumptions of traditional FL. Khan et al. developed a multi-frequency federated learning (MFFL) algorithm for HAR using head-worn sensors, enabling asynchronous updates from devices operating at different sampling and computation rates while maintaining both accuracy and energy efficiency [24]. Furthermore, decentralized training approaches, such as ring-based and peer-to-peer topologies, have been explored to eliminate single points of failure, allowing HAR systems to maintain progress even under intermittent connectivity. These emerging strategies emphasize adaptability, privacy preservation, and scalability, making them particularly relevant for next-generation distributed HAR frameworks.
Complementary to learning strategies, energy efficiency remains a central concern for wearable HAR systems. Ding et al. [25] proposed dynamic inference schemes and sensor sampling strategies to reduce power consumption without sacrificing performance. Similarly, Rezaie et al. [26] introduced an adaptive activity-aware algorithm that adjusts sensing and computation rates based on user context, achieving substantial energy savings. These techniques focus on prolonging battery life, though they often assume single-device processing without inter-sensor coordination.
Despite these advances, many existing solutions either compromise accuracy for efficiency or require architectural assumptions (e.g., synchronized updates or homogeneous sensors) that are difficult to maintain in real-world deployments. In contrast, the framework proposed in this work adopts a modular, distributed architecture in which each sensor node runs an independent neural model to generate compact, informative outputs for fusion. This reduces communication overhead, enhances fault tolerance, and supports asynchronous operation, making it well-suited for scalable, privacy-conscious, and resource-constrained HAR deployments.
  1.6. Contributions and Structure of the Paper
In this work, we:
- Propose a novel distributed neural network framework for multi-sensor HAR using wearable devices, as shown in Figure 2. 
- Implement and optimize lightweight local models for individual sensor nodes and a central fusion model for final classification. 
- Validate our approach on a multi-sensor HAR dataset that we collected and made publicly available, and compare its performance against a centralized model. 
- Demonstrate that our distributed model achieves comparable or even better accuracy while significantly reducing communication overhead and improving robustness to sensor disconnections. 
The remainder of the paper is organized as follows: Section 2 presents the dataset, experimental setup, and implementation details. Section 3 outlines the proposed distributed framework, detailing the design of both the local and central neural network components. Section 4 reports the results, including accuracy comparisons, robustness analysis, and energy consumption metrics. Finally, Section 5 discusses implications, limitations, and directions for future research.
  2. Materials and Methods
  2.1. Participants
A total of 67 participants (40 males and 27 females), aged between 20 and 60 years, were recruited for this study from the local university population and the surrounding community. All individuals voluntarily agreed to participate and provided informed consent before data collection. Participants self-reported being in good general health and physical condition at the time of the study.
The primary inclusion criterion was the absence of any known cardiovascular conditions or medical history that could compromise the participant’s ability to perform the physical activities safely as outlined in the experimental protocol. Individuals with mobility impairments, recent injuries, or conditions affecting balance or coordination were excluded to ensure consistency and safety during data acquisition.
This study was approved by the University of Puerto Rico’s Institutional Review Board (IRB) in compliance with the ethical standards established by the Collaborative Institutional Training Initiative (CITI Program) for research involving human subjects. All procedures were reviewed and authorized before data collection. Before participating, all individuals received a detailed explanation of the study’s objectives and methods and provided written informed consent.
  2.2. Dataset
We collected a dataset using wearable sensors strategically placed on the human body. The dataset comprises multiple sessions of physical activity performed by 67 participants, with each session consisting of several repetitions of predefined activities. These activities were recorded using three types of signals: quaternion, linear acceleration, and angular velocity. These signals provide a rich representation of body movement and orientation, which is essential for robust classification.
Each session was recorded at a fixed sampling rate of 50 Hz, so that one sample corresponds to one timestamp of recorded data. On average, each session lasted approximately 37 min, yielding roughly 111,000 samples per sensor stream per session (37 min × 60 s/min × 50 Hz) before transition segments were removed. Table 1 summarizes the activities selected for classification, their corresponding labels, and the average duration per subject for each activity; its “Duration per session” column indicates the length of each activity per participant. This provides a detailed overview of how the dataset was constructed for training and evaluating the classification models.
Many studies in the literature rely primarily on acceleration and gyroscope signals for activity recognition, as these modalities capture translational and rotational motion effectively [27]. However, in this work, we also incorporated quaternions because they represent the sensor’s orientation in 3D space without suffering from the singularities (gimbal lock) and ambiguities associated with Euler-angle representations [28]. By incorporating quaternions, the system achieves a more stable and comprehensive representation of body posture and joint movement, which is particularly valuable when classifying activities that involve complex or continuous changes in orientation and speed.
To capture the full-body dynamics during the selected activities, wearable sensors were strategically placed on key locations of the body. In this study, five sensors were used and positioned as follows: on the chest to provide a reference for torso orientation and posture; on the right and left wrists to monitor upper limb movements; and on the left and right knees to capture lower limb dynamics. This sensor configuration enables the model to learn both global and local motion patterns across different body segments, improving its ability to distinguish between similar activities. A visual representation of the sensor placement is shown in Figure 3. The collected dataset is publicly available [29].
Anatomical considerations guided the placement strategy and aligned with established practices in the human activity recognition literature. Sensors located on the chest and limbs offer a suitable trade-off between capturing relevant motion features and ensuring wearability. While this configuration proved effective for the activities included in this study, it is essential to note that no experiments were conducted to assess alternative sensor placements. Classification performance may vary with different sensor configurations, particularly when motion sensitivity or signal discriminability is reduced due to location-specific factors.
  2.3. Experimental Design
To evaluate the proposed distributed framework for human activity recognition, a controlled experiment was conducted using five MetaMotionRL wearable sensors (version r0.5) from Mbientlab Inc. (San Jose, CA, USA). Each sensor was configured to stream specific inertial data types at a frequency of 50 Hz, utilizing Bluetooth Low Energy (BLE) for wireless communication with a MetaBase app running on an iOS device.
The chest sensor captured quaternions, 3-axis linear acceleration, and 3-axis angular velocity. The sensors placed on the right hand and left knee streamed only acceleration and gyroscope data, while the sensors on the left hand and right knee transmitted quaternion data exclusively. All data streams were timestamped at the source and subsequently synchronized offline using a standard reference clock. Synchronization was achieved through linear extrapolation based on timestamps to ensure temporal alignment across all sensor inputs.
Each of the 67 participants underwent a structured experimental session with a total duration of approximately 37 min. The session consisted of the six predefined activities listed in Table 1: folding clothes, sweeping, walking, moving boxes, riding a bike, and sitting. Each of the five non-sitting activities was performed continuously for 5 min (25 min in total), while the remaining time (approximately 12 min) was allocated to sitting.
All sessions were conducted under supervised conditions in a controlled indoor environment, following a strict and consistent timeline across participants to ensure standardization and reproducibility of the dataset. Each session began immediately after sensor initialization and synchronization, and data acquisition was conducted in real time over approximately 37 min per participant; model training and classification, however, were performed offline after data collection. The sensor signals remained stable throughout each session, with no noticeable drift or degradation in data quality. This ensured consistent labeling and reliable input for offline model development and evaluation. Future work will investigate longer-term deployments to assess the impact of continuous use, including possible effects of sensor drift or synchronization loss on classification performance.
Transitions between activities were excluded from the final dataset to avoid introducing ambiguity in the activity labels. The sitting activity occurred at the beginning of the session and after every two subsequent activities, resulting in a total of four sitting periods. The short intervals during which the participant transitioned from one activity to another were automatically identified by software using the predefined time intervals for each task and systematically discarded, ensuring that each data segment corresponded strictly to a single, well-defined activity. While this design decision promotes label consistency and simplifies the learning process, it does not account for the transitional or overlapping motions that often occur in real-world deployments. Therefore, in future work, we aim to extend the framework toward real-time implementation, where detecting and classifying transitions will be critical for enabling continuous and adaptive activity recognition in dynamic environments.
  2.4. Data Processing
To ensure temporal consistency, all recordings were synchronized post hoc using the timestamp information provided by the sensors. A linear extrapolation method was applied to align the signals from all devices to a uniform sampling grid. Each recording session consisted of multiple predefined activities performed in a controlled environment. Activity labels were assigned based on the sequence and timing of each task, which were recorded during acquisition.
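A minimal sketch of timestamp-based alignment is shown below, assuming NumPy and one channel at a time. It linearly interpolates each channel onto a shared uniform 50 Hz grid and, where the grid extends beyond a sensor's first or last timestamp, extends the boundary segment linearly (`np.interp` alone would clamp to the edge values there). The function names `resample_linear` and `uniform_grid` and the example timestamps are illustrative and do not reproduce the authors' code.

```python
import numpy as np


def resample_linear(t, x, grid):
    """Resample one channel x(t) onto `grid`.

    Inside [t[0], t[-1]] the signal is linearly interpolated; outside it,
    the first/last segment is extended linearly (linear extrapolation).
    `t` must be strictly increasing and contain at least two samples.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)

    y = np.interp(grid, t, x)  # np.interp clamps to x[0] / x[-1] outside the range

    # Replace the clamped edge values with a linear extension of the end segments.
    left = grid < t[0]
    right = grid > t[-1]
    slope_left = (x[1] - x[0]) / (t[1] - t[0])
    slope_right = (x[-1] - x[-2]) / (t[-1] - t[-2])
    y[left] = x[0] + slope_left * (grid[left] - t[0])
    y[right] = x[-1] + slope_right * (grid[right] - t[-1])
    return y


def uniform_grid(t_start, n_samples, fs=50.0):
    """Shared uniform grid (seconds) starting at a reference time."""
    return t_start + np.arange(n_samples) / fs


# Example: irregular device timestamps (s) for a signal that is linear in time.
t_dev = np.array([0.013, 0.031, 0.052, 0.070])
x_dev = 3.0 * t_dev + 1.0
grid = uniform_grid(0.0, 5)                 # 0.00, 0.02, 0.04, 0.06, 0.08 s
x_grid = resample_linear(t_dev, x_dev, grid)
```

Because linear interpolation and extrapolation reproduce a linear signal exactly, the example doubles as a sanity check; in practice, extrapolation should be confined to the few grid points at the start and end of each recording.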
To ensure the validity of the labeling process, all data acquisition sessions were supervised in real time by a member of the research team, who followed a predefined script indicating the sequence and duration of each activity. The timing of each task was logged concurrently with the sensor data to provide accurate alignment between activity execution and label assignment. After acquisition, the recorded sensor signals were visually inspected using custom plotting tools to verify that signal transitions (e.g., changes in acceleration or orientation) were consistent with the expected activity changes. Sessions in which inconsistencies or ambiguities in timing or execution were detected were either repeated or excluded from the dataset. Although inter-observer agreement was not assessed, this manual supervision and verification process helped ensure the reliability of the assigned labels and mitigate the effects of possible temporal drift or misalignment.
During preprocessing, any data entries with invalid or missing labels were excluded. In addition, samples containing missing values in any of the sensor features were removed to ensure the integrity of the dataset. Feature values were then normalized using z-score normalization (zero mean and unit variance), computed based on the training set statistics. This normalization was applied using the utilities from the scikit-learn library and saved for consistent scaling of future data.
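The cleaning and normalization steps can be sketched with pandas and scikit-learn as follows. The column names, the set of valid labels, and the file name `scaler.joblib` are hypothetical; the points the sketch illustrates are that the `StandardScaler` statistics are fitted on the training data only and that the fitted scaler is persisted (here with `joblib`) so that future data are scaled identically.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean(df, feature_cols, valid_labels, label_col="label"):
    """Drop rows with a missing/invalid label or any missing feature value."""
    df = df.dropna(subset=[label_col, *feature_cols])
    return df[df[label_col].isin(valid_labels)]


def fit_and_save_scaler(train_features, path):
    """Fit z-score statistics on training data only and persist them."""
    scaler = StandardScaler().fit(train_features)  # per-feature mean and std
    joblib.dump(scaler, path)                      # reused later for new data
    return scaler


# Hypothetical training frame with one missing feature value and one missing label.
features = ["acc_x", "acc_y", "acc_z"]
train_df = pd.DataFrame({
    "acc_x": [0.1, 0.4, np.nan, 0.3, 0.9],
    "acc_y": [1.0, 0.8, 0.7, 0.6, 0.2],
    "acc_z": [0.0, 0.1, 0.2, 0.3, 0.4],
    "label": ["walking", "walking", "sitting", None, "sweeping"],
})
clean_df = clean(train_df, features, valid_labels={"walking", "sitting", "sweeping"})

X_train = clean_df[features].to_numpy()
path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
scaler = fit_and_save_scaler(X_train, path)
Z_train = scaler.transform(X_train)  # zero mean, unit variance per column

# Later (e.g., at deployment), reload the same statistics for new data.
reloaded = joblib.load(path)
```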
The resulting normalized time series were segmented into overlapping windows of 50 samples (equivalent to 1 s) using a sliding window approach with a stride of 25 samples (50% overlap). Each window was assigned the most representative label from the segment to generate the final dataset used for model training and evaluation.
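The windowing step can be sketched in NumPy as below. We assume that "most representative label" means the most frequent label within each window (a majority vote, with ties resolved toward the smallest label value); the function name `sliding_windows` and the synthetic signal are illustrative.

```python
import numpy as np


def sliding_windows(X, y, width=50, stride=25):
    """Split a (T, C) signal into overlapping (width, C) windows.

    Each window gets the most frequent label among its samples (ties go to
    the smallest label value, because np.unique returns sorted values).
    Assumes len(X) >= width.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    windows, labels = [], []
    for s in range(0, len(X) - width + 1, stride):
        windows.append(X[s:s + width])
        values, counts = np.unique(y[s:s + width], return_counts=True)
        labels.append(values[np.argmax(counts)])
    return np.stack(windows), np.array(labels)


# Example: 4 s of 30-channel data at 50 Hz, class 0 for 120 samples, then class 1.
X = np.random.default_rng(0).normal(size=(200, 30))
y = np.array([0] * 120 + [1] * 80)
W, labels = sliding_windows(X, y)
print(W.shape, labels.tolist())   # (7, 50, 30) [0, 0, 0, 0, 1, 1, 1]
```

With 200 samples, a width of 50, and a stride of 25, the signal yields (200 - 50) / 25 + 1 = 7 windows; the window starting at sample 100 contains 20 samples of class 0 and 30 of class 1 and is therefore labeled 1.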
  4. Results
This section presents a comprehensive evaluation of both the centralized and distributed models using a test set composed of unseen data. The results include accuracy, confusion matrices, and class-wise performance metrics, enabling a comparative analysis of the two approaches.
  4.1. Centralized Model Performance
To optimize the classification performance and assess the robustness of the centralized model, multiple architectural configurations were tested. Each configuration varied the number of convolutional and LSTM units, while maintaining the overall CNN–LSTM structure. The evaluation was conducted using a test set composed of data that was not used during training or validation, ensuring an unbiased assessment of generalization capability.
Figure 10 presents the accuracy obtained for each configuration. The models are grouped into three categories: C2L1, C3L2, and C4L3, where C denotes the number of convolutional layers and L the number of LSTM layers. Each configuration also varies in the number of convolutional filters (F) and the number of hidden units in the LSTM layers (N). Among all tested variants, the configuration with 256 convolutional filters and a two-layer LSTM with 256 hidden units (F = 256, N = 256, C = 3, L = 2) achieved the highest accuracy (93.68%), suggesting a benefit of deeper representations when processing complex multi-sensor data.
 These results establish a robust and well-performing centralized baseline, which serves as a strong reference point for evaluating the distributed framework. The effectiveness of this configuration not only validates the design choices of the centralized approach but also provides a meaningful benchmark for subsequent comparisons.
  4.2. Distributed Model Performance
Building upon the centralized baseline, the distributed framework was evaluated under the same test conditions to enable a direct comparison. In this approach, each wearable sensor is associated with an independent local model responsible for processing its own data stream. These local models were trained separately using their respective sensor modalities and produce intermediate predictions in the form of softmax probability vectors.
To integrate the outputs from the local models, a lightweight central model based on an MLP was employed. This central node aggregates the class scores output by each sensor to generate the final activity classification.
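The data flow of the late-fusion stage can be illustrated with a small NumPy forward pass. The hidden-layer size of 64, the ReLU activation, and the random (untrained) weights are assumptions made purely for illustration; the sketch only shows how five six-class softmax vectors are concatenated into a 30-dimensional input and mapped to a final class distribution.

```python
import numpy as np

N_SENSORS, N_CLASSES = 5, 6


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class FusionMLP:
    """Hypothetical one-hidden-layer MLP over concatenated sensor probabilities.

    Weights are random (untrained); the sketch only illustrates shapes and data flow.
    """

    def __init__(self, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        d_in = N_SENSORS * N_CLASSES                         # 5 x 6 = 30 inputs
        self.W1 = rng.normal(0.0, 0.1, size=(d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(hidden, N_CLASSES))
        self.b2 = np.zeros(N_CLASSES)

    def __call__(self, sensor_probs):
        # sensor_probs: (batch, N_SENSORS, N_CLASSES) softmax outputs of the local models
        x = sensor_probs.reshape(len(sensor_probs), -1)      # concatenate -> (batch, 30)
        h = np.maximum(0.0, x @ self.W1 + self.b1)           # ReLU hidden layer
        return softmax(h @ self.W2 + self.b2)                # final class probabilities


# Example: a batch of 4 windows, each with one probability vector per sensor.
rng = np.random.default_rng(1)
local_probs = softmax(rng.normal(size=(4, N_SENSORS, N_CLASSES)))
fused = FusionMLP()(local_probs)
predicted_class = fused.argmax(axis=1)
```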
Figure 11 presents the accuracy obtained for each configuration of the distributed framework. The models are grouped under the category C2L1, representing a lightweight and straightforward architecture tailored for embedded deployment. Among all tested variants, the configuration with 16 convolutional filters and a single-layer LSTM with 32 hidden units (F = 16, N = 32, C = 2, L = 1) achieved the highest accuracy (95.99%).
These results indicate that the distributed framework, despite relying on simpler and independently trained local models, can achieve competitive, and in this case better, classification performance. The lightweight configuration with minimal computational complexity proved sufficient for capturing discriminative features when combined across sensors. This balance between efficiency and accuracy highlights the potential of distributed processing in wearable systems. A comparative analysis between the centralized and distributed approaches is presented in the following subsection.
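To give a sense of scale for the term "lightweight", the following sketch estimates trainable parameter counts for the best centralized and distributed configurations. It assumes a kernel size of 3, no pooling or normalization layers, a single dense softmax output over six classes, and the Keras convention of one bias vector per LSTM gate; the per-node channel counts follow Section 2.3. These are illustrative estimates, not the exact sizes of the trained models.

```python
def conv1d_params(c_in, filters, kernel):
    return kernel * c_in * filters + filters            # weights + one bias per filter


def lstm_params(d_in, units):
    return 4 * (units * (d_in + units) + units)         # 4 gates, Keras-style single bias


def dense_params(d_in, d_out):
    return d_in * d_out + d_out


def cnn_lstm_params(c_in, n_conv, filters, n_lstm, units, n_classes=6, kernel=3):
    """Parameter count of a Conv1D stack -> LSTM stack -> softmax head (assumed layout)."""
    total, width = 0, c_in
    for _ in range(n_conv):
        total += conv1d_params(width, filters, kernel)
        width = filters
    for _ in range(n_lstm):
        total += lstm_params(width, units)
        width = units
    return total + dense_params(width, n_classes)


# Centralized C3L2 model over all 30 input channels (F = 256, N = 256).
central = cnn_lstm_params(c_in=30, n_conv=3, filters=256, n_lstm=2, units=256)

# Distributed C2L1 local models (F = 16, N = 32), one per node.
node_channels = {"chest": 10, "right_hand": 6, "left_knee": 6, "left_hand": 4, "right_knee": 4}
local = {name: cnn_lstm_params(c, n_conv=2, filters=16, n_lstm=1, units=32)
         for name, c in node_channels.items()}

print(central, sum(local.values()))   # 1469190 37790
```

Under these assumptions, the five local models together hold about 38,000 parameters, roughly 39 times fewer than the centralized C3L2 model with about 1.47 million, and each node carries only its own share of roughly 7,500 parameters.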
  4.3. Comparison
To enable a direct evaluation of both approaches, the centralized and distributed models were tested under identical conditions using the same dataset. Figure 12 provides a detailed comparison of the class-wise performance of both approaches, including confusion matrices and per-class precision, recall, and F1-scores. It can be observed that the distributed model achieved higher precision than the centralized approach, indicating a reduced rate of false positives in its predictions. In terms of overall accuracy and recall, both models performed comparably, with slight variations depending on the activity class. These results highlight the effectiveness of the distributed strategy, particularly considering its modularity and scalability, and set the stage for a more in-depth discussion of the trade-offs between the two approaches.
The comparative evaluation between the centralized and distributed models reveals clear and consistent performance differences in favor of the distributed approach. As shown in Figure 12, which presents the normalized confusion matrix, each value from the centralized model is accompanied by its distributed counterpart in parentheses. This format enables a direct, class-by-class comparison. Notably, the distributed model demonstrates improved classification accuracy in several key activities. For instance, the class Sitting shows an increase in true positive rate from 0.90 to 0.96, while Walking improves from 0.96 to 0.98. In more complex activities such as Sweeping and Moving Boxes, the distributed model also shows notable gains, increasing from 0.88 to 0.91 and from 0.91 to 0.94, respectively, while simultaneously reducing off-diagonal misclassifications. These improvements suggest that sensor-level processing enables better preservation of temporal patterns and motion nuances that may be lost during early fusion in centralized architectures.
Figure 12 further reinforces these findings by displaying per-class precision, recall, and F1-score for both models. The distributed model, represented by darker bars, consistently achieves higher recall across all classes. This trend is particularly evident in Sweeping, Walking, and Moving Boxes, where the recall gains are substantial. Since recall is critical in activity recognition systems to avoid missing actual events, these results highlight the distributed model’s strength in capturing true positives. In terms of F1-score, which balances both precision and recall, the distributed model outperforms the centralized one in nearly every class, confirming its superior classification reliability.
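For reference, per-class precision, recall, and F1-score can be derived from a confusion matrix of raw counts as sketched below (rows are true classes, columns are predicted classes; the toy matrix is hypothetical). If a confusion matrix is normalized by row (true class), as is common, its diagonal entries equal per-class recall, whereas precision has to be computed from the column sums of the unnormalized counts.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision, recall, and F1 from a confusion matrix of raw counts.

    Rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # TP + FP per class
    actual = cm.sum(axis=1)      # TP + FN per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Hypothetical two-class example.
cm = [[8, 2],
      [1, 9]]
precision, recall, f1 = per_class_metrics(cm)
# precision = [8/9, 9/11], recall = [0.8, 0.9], f1 = [16/19, 6/7]
```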
 While the centralized model performs slightly better in precision for a few activities, such as Sitting and Folding Clothes, the overall advantage shifts toward the distributed model due to its balanced improvement in both recall and F1-score. This trade-off is acceptable, especially in real-time systems where false negatives (missed detections) are more detrimental than occasional false positives.
In summary, the distributed model not only matches but often surpasses the centralized model in key evaluation metrics. Its stronger performance in complex, dynamic activities suggests that localized processing with late fusion enhances robustness and generalization. These results support the design of distributed architectures in wearable activity recognition systems, where sensor-specific information and reduced latency are crucial for reliable real-time performance.
  4.4. Simulation of BLE Packet Loss and Sensor Disconnection
To evaluate the resilience of the proposed distributed framework under BLE communication failures, we conducted a series of experiments simulating sensor disconnections and packet loss during inference. These tests were designed to reflect real-world challenges in wearable systems, including dropped connections, sensor malfunctions, and partial data loss.
The distributed system includes five sensor sources: Chest, Left Knee, Right Hand, Left Hand, and Right Knee. The local model associated with each sensor was trained independently and kept fixed during evaluation. A central MLP fusion model receives a 30-dimensional input vector formed by concatenating the class-probability distributions of all five sensors. To simulate sensor failures or transmission loss, we manipulated these input vectors at inference time without retraining the central model, reflecting deployment conditions where real-time adaptation is not feasible.
We evaluated the following fault scenarios:
- Baseline (no loss): All sensors are active and transmitting valid data. This serves as the reference for maximum expected performance. 
- BLE packet loss (10%): Randomly zeroes out 10% of the predicted outputs for each sensor to simulate moderate transmission dropouts. The central model continues to receive data from all sensors. 
- BLE packet loss (30%): A more severe version, where 30% of each sensor’s outputs are zeroed out, simulating high BLE packet loss or interference. 
- Single-sensor disconnection: Each sensor is individually removed by replacing its full output vector with zeros. This represents a total communication failure or power loss in one node. 
- Two-sensor disconnection: All 10 possible combinations of two sensors being disconnected simultaneously were tested. This simulates more critical failures or simultaneous signal dropout in two channels, allowing for the analysis of sensor pair importance in global inference. 
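The fault scenarios above operate purely on the 30-dimensional fusion input, so they can be sketched as a few list manipulations. The sketch below is an illustration under stated assumptions: the sensor ordering inside the vector, the function names, and the reading of "X% of the predicted outputs" as "X% of inference windows, chosen independently per sensor" are ours, not taken from the authors' code.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence

# 5 sensors x 6 classes = 30 fusion inputs (from the text); the ordering is an assumption.
SENSORS = ["Chest", "Left Knee", "Right Hand", "Left Hand", "Right Knee"]
N_CLASSES = 6


def fuse(probs: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-sensor class-probability vectors into one fusion input."""
    vec: List[float] = []
    for s in SENSORS:
        vec.extend(probs[s])
    return vec


def disconnect(vec: Sequence[float], dropped: Sequence[str]) -> List[float]:
    """Replace the whole block of every dropped sensor with zeros (node failure)."""
    out = list(vec)
    for s in dropped:
        start = SENSORS.index(s) * N_CLASSES
        out[start:start + N_CLASSES] = [0.0] * N_CLASSES
    return out


def packet_loss(windows: List[List[float]], rate: float, seed: int = 0) -> List[List[float]]:
    """For each sensor independently, zero its block in a random `rate` fraction of windows.

    Interpreting "X% of the predicted outputs" as X% of windows is an assumption.
    """
    rng = random.Random(seed)
    out = [list(w) for w in windows]  # copy so the clean test set is left untouched
    n_lost = round(rate * len(windows))
    for s in SENSORS:
        for i in rng.sample(range(len(out)), n_lost):
            out[i] = disconnect(out[i], [s])
    return out


def disconnection_scenarios() -> List[List[str]]:
    """All single-sensor (5) and two-sensor (10) disconnection sets."""
    return [list(c) for k in (1, 2) for c in combinations(SENSORS, k)]


uniform = {s: [1.0 / N_CLASSES] * N_CLASSES for s in SENSORS}
x = fuse(uniform)
print(len(x), len(disconnection_scenarios()))
```

Zero-filling keeps the input dimensionality fixed, which is what allows the already-trained central model to be evaluated on degraded inputs without retraining.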
Each scenario was evaluated using identical test subjects and consistent inference parameters to ensure reproducibility and comparability across conditions. 
Table 3 summarizes the impact of BLE packet loss and specific disconnection scenarios on classification performance, measured as accuracy and F1-score averaged across 10 held-out test subjects.
While all disconnection scenarios negatively impacted performance, results show that the distributed model remained robust under moderate packet loss (10%) and some single-sensor disconnections. However, removing the Chest sensor or combinations involving the Chest and Left Hand caused significant degradation, suggesting these sensors contribute highly informative features.
To further analyze this, we conducted a complete evaluation of all one-sensor and two-sensor disconnection combinations, shown in Table 4. The values represent average metrics across the 10 test subjects. Results show a clear trend where accuracy and F1-score vary depending on which specific sensors are dropped, confirming their relative importance to the central model.
As can be observed, the combined disconnection of the Chest and Left Hand sensors yields the lowest overall performance, underscoring the joint contribution of these two sensors. In contrast, disconnections involving only peripheral joints (e.g., Right Knee or Left Knee) have a minimal effect on the central model’s prediction, demonstrating fault tolerance in those cases.
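One way to turn a table of per-scenario accuracies into a sensor-importance ranking is to measure each scenario's drop from the no-loss baseline and average that drop over every scenario in which a given sensor is missing. The sketch below assumes this mean-drop definition of importance (our choice, not a metric defined in the study), and the accuracies in it are placeholders rather than the values in Table 4.

```python
from typing import Dict, List, Tuple

Scenario = Tuple[str, ...]


def rank_scenarios(baseline: float, acc: Dict[Scenario, float]) -> List[Tuple[Scenario, float]]:
    """Sort disconnection scenarios by accuracy drop relative to the no-loss baseline."""
    drops = [(sc, baseline - a) for sc, a in acc.items()]
    return sorted(drops, key=lambda item: item[1], reverse=True)


def sensor_importance(baseline: float, acc: Dict[Scenario, float]) -> Dict[str, float]:
    """Mean accuracy drop over every scenario in which a given sensor is disconnected."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for dropped, a in acc.items():
        for s in dropped:
            totals[s] = totals.get(s, 0.0) + (baseline - a)
            counts[s] = counts.get(s, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


# Placeholder accuracies for illustration only; the measured values are those in Table 4.
baseline = 0.90
acc = {("A",): 0.80, ("B",): 0.88, ("A", "B"): 0.70}
print(rank_scenarios(baseline, acc)[0][0])
print(sensor_importance(baseline, acc))
```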
These experiments were designed to be fully reproducible. Overall, the results highlight the advantage of the distributed architecture in maintaining reliable inference even in the presence of communication failures.
  4.5. Latency and Energy Estimation
Beyond accuracy, latency and energy consumption are critical factors when evaluating the feasibility of activity recognition systems in real-time wearable scenarios. In this subsection, we report and compare the inference latency, energy consumption, power usage, and estimated battery life of both the centralized and distributed approaches.
All measurements were performed on an NVIDIA Jetson AGX Xavier (NVIDIA Corporation, Santa Clara, CA, USA) using TensorRT for model optimization and deployment. The Jetson platform was selected as a high-performance embedded system capable of profiling inference workloads, allowing relative comparison between centralized and distributed architectures. However, these results serve as a proxy and do not directly represent the on-device performance of ultra-low-power sensor nodes such as the MetaMotionRL (MbientLab Inc., San Jose, CA, USA).
Inference latency was recorded as the average time required to process a single input window and generate a prediction. For the centralized model, latency includes the entire processing pipeline of the fused multi-sensor input. In contrast, the distributed model performs independent local inferences at each node, followed by a lightweight aggregation step at the central unit. This modular and parallel structure led to significantly lower latency in the distributed setup.
Energy consumption per inference was estimated using TensorRT profiling tools, which account for both computational operations (FLOPs) and memory access patterns. The centralized model, due to its more complex architecture and multi-sensor integration, exhibited substantially higher energy consumption. The distributed framework, composed of lightweight CNN–LSTM local models and a compact MLP fusion module, consumed considerably less energy per inference, a key advantage for deployment on low-power embedded platforms.
To further quantify performance, we computed the average power consumption $P$ (W) of each model using the following expression:
$$P = E_{\text{inf}} \cdot f_{\text{inf}}$$
where $E_{\text{inf}}$ is the energy per inference (J) and $f_{\text{inf}}$ is the inference rate (Hz). We assume $f_{\text{inf}} = 50$ predictions per second, consistent with the 50 Hz sampling rate commonly used in human activity recognition; this assumption enables extrapolation of average power usage. Note that the energy consumption of BLE communication was not included in this estimation, as accurate BLE transmission profiling requires direct measurement at the hardware level, which was beyond the scope of this study.
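To estimate battery life, we used the specifications of the MetaMotionRL sensor (MbientLab Inc., San Jose, CA, USA), which includes a 190 mAh battery operating at 3.7 V, yielding a total energy capacity of:
$$E_{\text{batt}} = 0.190\ \text{Ah} \times 3.7\ \text{V} = 0.703\ \text{Wh} = 0.703 \times 3600\ \text{J} = 2530.8\ \text{J}$$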
Using this energy capacity $E_{\text{batt}}$ (J) and the estimated power consumption $P$ (W) of each model, the expected battery life in hours was computed as:
$$T_{\text{batt}} = \frac{E_{\text{batt}}}{3600 \cdot P}$$
For the distributed model, the total energy per inference was computed as the sum of the energy usage of all five local models (processing quaternion, accelerometer, and gyroscope data) plus that of the central model:
$$E_{\text{dist}} = \sum_{i=1}^{5} E_{\text{local},i} + E_{\text{central}}$$
Total latency was calculated assuming parallel local inference, using:
$$T_{\text{dist}} = \max_{i=1,\dots,5} T_{\text{local},i} + T_{\text{central}}$$
Table 5 summarizes these efficiency metrics. The distributed model achieved a total inference time of 0.643 ms and an energy usage of 3.91 mJ, compared to 3.184 ms and 9.219 mJ for the centralized model, that is, roughly 5 times lower latency and 2.4 times lower energy per inference. These improvements are primarily attributed to the distributed execution of lightweight local models and the minimal overhead of the central fusion module.
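The aggregation rule for the distributed pipeline (energies summed over all nodes, local latencies overlapping in time) can be written as a small helper. This is a sketch under the assumption that "parallel local inference" means the slowest local model alone determines the local stage's latency; the function name and the per-node numbers are hypothetical illustrations, not the measured values behind Table 5.

```python
from typing import Sequence, Tuple


def distributed_totals(local_ms: Sequence[float], local_mj: Sequence[float],
                       central_ms: float, central_mj: float) -> Tuple[float, float]:
    """Aggregate per-node costs for the distributed pipeline.

    Latency: local models are assumed to run in parallel, so only the slowest one
    adds to the central fusion time. Energy: every local model consumes energy,
    so local energies are summed and the central model's energy is added.
    """
    latency_ms = max(local_ms) + central_ms
    energy_mj = sum(local_mj) + central_mj
    return latency_ms, energy_mj


# Hypothetical per-node figures for illustration only (not measured values).
lat, energy = distributed_totals(
    local_ms=[0.50, 0.52, 0.48, 0.55, 0.51], local_mj=[0.70] * 5,
    central_ms=0.09, central_mj=0.40,
)
print(f"{lat:.2f} ms, {energy:.2f} mJ")
```

The asymmetry is the point of the design: adding sensors increases total energy linearly, while latency grows only if a new node is slower than the existing ones.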
 This efficiency translates directly into extended operational autonomy. While the centralized model supports approximately 1.52 h of continuous operation per full battery charge, the distributed framework enables an estimated runtime of roughly 3.6 h per sensor. These values represent theoretical estimates based on Jetson-level profiling and battery specifications and should be interpreted as relative indicators rather than absolute performance guarantees.
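The power and battery-life estimates follow from a few lines of arithmetic. The sketch below uses only values stated in this section (190 mAh at 3.7 V, an assumed 50 inferences per second, and 9.219 mJ versus 3.91 mJ per inference) and, like the study, ignores BLE radio energy. It reproduces the reported figures (about 0.461 W and 1.52 h for the centralized model, about 0.195 W and 3.6 h for the distributed one) up to rounding in the last digit.

```python
# Constants from the text: MetaMotionRL battery (190 mAh, 3.7 V) and an assumed
# rate of 50 inferences per second. BLE radio energy is not modelled, as in the study.
BATTERY_MAH = 190.0
BATTERY_V = 3.7
RATE_HZ = 50.0


def battery_energy_j(mah: float = BATTERY_MAH, volts: float = BATTERY_V) -> float:
    """Battery energy in joules: mAh -> Ah, times volts gives Wh, times 3600 gives J."""
    return mah / 1000.0 * volts * 3600.0


def power_w(energy_per_inference_mj: float, rate_hz: float = RATE_HZ) -> float:
    """Average power P = E_inf * f_inf, with E_inf converted from mJ to J."""
    return energy_per_inference_mj / 1000.0 * rate_hz


def battery_life_h(energy_per_inference_mj: float, rate_hz: float = RATE_HZ) -> float:
    """Runtime in hours: battery energy divided by average power, seconds -> hours."""
    return battery_energy_j() / power_w(energy_per_inference_mj, rate_hz) / 3600.0


for name, e_mj in [("centralized", 9.219), ("distributed", 3.91)]:
    print(f"{name}: {power_w(e_mj):.3f} W, {battery_life_h(e_mj):.2f} h")
```

Because battery life scales as the inverse of energy per inference, the ratio of runtimes (about 2.4) equals the ratio of per-inference energies, independent of the assumed inference rate.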
These results highlight the practical advantages of the distributed architecture in scenarios where energy efficiency, low-latency inference, and prolonged operation are critical. By distributing the computational load across local nodes and reducing the complexity at each sensor, the system enables extended usage while maintaining competitive classification performance.
The experimental evaluation shows that the distributed approach outperformed the centralized model on nearly all evaluated metrics, including accuracy, recall, and F1-score, with the centralized model retaining a slight precision advantage for only a few activities. While the centralized model benefits from fused multi-sensor input, the distributed framework demonstrated superior performance, particularly in activities involving subtle or overlapping motion patterns. This suggests that local models, when adequately coordinated through an efficient fusion strategy, can effectively capture discriminative features while also offering additional advantages in modularity and scalability.
Beyond predictive performance, the efficiency gains achieved by the distributed model are substantial. As shown in Table 5, total inference latency decreased from 3.18 ms in the centralized model to just 0.64 ms in the distributed setup. Similarly, energy per inference was reduced from 9.22 mJ to 3.91 mJ when combining local and central components. These reductions led to significantly lower power consumption (0.195 W vs. 0.461 W), which in turn resulted in a more than twofold increase in estimated battery life, from approximately 1.5 h for the centralized model to nearly 3.6 h in the distributed case.
These findings indicate that the distributed architecture is better suited for real-time, embedded deployments where energy autonomy and responsiveness are critical constraints. The balance between performance and efficiency offered by this framework lays a solid foundation for future work on scalable, robust, and context-aware activity recognition systems in real-world wearable applications.
Finally, we acknowledge that the current energy and latency estimates are derived from a high-performance embedded proxy (Jetson AGX Xavier) and do not include the energy overhead of BLE communication. Future work will focus on empirical energy profiling directly on ultra-low-power sensor nodes, incorporating BLE transmission cost and microcontroller-level inference latency for a full-system evaluation.
  4.6. Implementation Challenges on Embedded Platforms
While this study focused on evaluating the proposed framework through offline simulations, transitioning the system to real-time embedded platforms introduces several practical challenges. First, BLE communication latency and packet loss must be carefully considered, as real-world transmission delays and asynchronous sensor updates can lead to temporary misalignments between sensor inputs. In our framework, such effects were simulated using zero-vector substitution during the inference process. Still, real-time handling may require buffering strategies, synchronization protocols, or lightweight extrapolation methods to maintain consistency across sensor nodes.
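As one example of the buffering strategies mentioned above, a hub could hold each sensor's most recent prediction for a short grace period before falling back to the zero-vector substitution used offline. The sketch below is a hypothetical design, not part of the evaluated framework: the class name, the 0.2 s staleness threshold, and the hold-last-value policy are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

N_CLASSES = 6  # classes per sensor, as implied by the 30-dimensional fusion input


class SensorBuffer:
    """Hold the latest prediction from each sensor for a short time before zeroing it.

    Within `max_age_s` a late or missing packet is bridged by reusing the last valid
    probability vector; after that the sensor is treated as disconnected and its block
    is zero-filled, matching the zero-vector substitution used in the offline tests.
    """

    def __init__(self, sensors: List[str], max_age_s: float = 0.2):
        self.order = list(sensors)
        self.max_age_s = max_age_s
        self.latest: Dict[str, Optional[Tuple[float, List[float]]]] = {s: None for s in sensors}

    def update(self, sensor: str, probs: List[float], t: float) -> None:
        """Record a newly received probability vector and its arrival time (seconds)."""
        self.latest[sensor] = (t, list(probs))

    def snapshot(self, now: float) -> List[float]:
        """Build the fusion input at time `now`."""
        vec: List[float] = []
        for s in self.order:
            entry = self.latest[s]
            if entry is not None and now - entry[0] <= self.max_age_s:
                vec.extend(entry[1])           # fresh enough: reuse last valid prediction
            else:
                vec.extend([0.0] * N_CLASSES)  # stale or never received: zero-fill
        return vec


buf = SensorBuffer(["Chest", "Left Hand"], max_age_s=0.2)
buf.update("Chest", [0.9, 0.02, 0.02, 0.02, 0.02, 0.02], t=0.0)
print(buf.snapshot(now=0.1))
```

The staleness threshold trades responsiveness against stability: a longer hold masks brief dropouts but risks fusing outdated evidence after an activity change.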
Second, the computational capabilities of embedded processors, such as microcontroller units (MCUs) or edge AI devices (e.g., Jetson Nano), can limit the complexity of local neural networks. While our CNN–LSTM models are designed to be lightweight, optimizing them further via model quantization, pruning, or hardware-specific inference engines (e.g., TensorRT, ONNX Runtime) would be necessary to meet real-time constraints.
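To illustrate what quantization does to a layer's weights, the sketch below implements generic symmetric per-tensor post-training quantization to 8-bit integers. It is a textbook technique shown for intuition only; it is not the procedure used in this study, and toolchains such as TensorRT or ONNX Runtime provide their own calibrated quantization workflows. The function names and example weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor post-training quantization to the range [-127, 127].

    A single scale maps the largest absolute weight to 127; every weight is then
    rounded to the nearest integer step.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0, 0.031]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q, max(abs(a - b) for a, b in zip(weights, approx)))
```

Each weight then occupies one byte instead of four, and the reconstruction error per weight is bounded by half a quantization step (scale / 2).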
Additionally, power consumption trade-offs must be managed carefully. Frequent wireless communication and local processing increase energy usage, which is critical in wearable applications. Efficient scheduling of inference and transmission tasks, possibly driven by event-based triggering or adaptive sampling, will be essential for long-term operation.
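A simple form of event-based triggering is for each node to transmit its local prediction only when it carries new information, plus a periodic heartbeat so the hub can tell a quiet sensor from a disconnected one. The policy below (send on class change, on a confidence shift above `min_delta`, or every `heartbeat` windows) and its default values are illustrative assumptions, not part of the proposed framework.

```python
from typing import List, Optional


class EventTrigger:
    """Decide whether a node should transmit its current local prediction."""

    def __init__(self, min_delta: float = 0.15, heartbeat: int = 50):
        self.min_delta = min_delta  # minimum change in top-class probability to re-send
        self.heartbeat = heartbeat  # force a send every N windows (about 1 s at 50 Hz)
        self.last_sent: Optional[List[float]] = None
        self.since_last = 0

    def should_send(self, probs: List[float]) -> bool:
        self.since_last += 1
        if self.last_sent is None:
            send = True  # always send the first prediction
        else:
            prev = self.last_sent
            class_changed = probs.index(max(probs)) != prev.index(max(prev))
            conf_shift = abs(max(probs) - max(prev)) >= self.min_delta
            send = class_changed or conf_shift or self.since_last >= self.heartbeat
        if send:
            self.last_sent = list(probs)
            self.since_last = 0
        return send


trigger = EventTrigger(min_delta=0.15, heartbeat=3)
stream = [[0.8, 0.2], [0.8, 0.2], [0.8, 0.2], [0.8, 0.2], [0.3, 0.7]]
print([trigger.should_send(p) for p in stream])
```

During long steady activities such as Sitting, most windows would be suppressed, which is where the radio savings come from; the heartbeat bounds how long the hub can go without hearing from a healthy node.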
Finally, synchronization among sensors becomes a non-trivial issue in multi-node systems. Without centralized clocking, drift can occur between nodes, resulting in temporal alignment issues. Implementing timestamp correction or leveraging protocols such as BLE time synchronization would be required.
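Timestamp correction is commonly done by fitting a linear clock model (offset plus skew) between each node's local clock and the hub's clock. The sketch below fits that model by ordinary least squares from pairs of (node time, hub time) observations; how such pairs are collected, for instance through periodic timestamp exchanges over BLE, is an assumption and is not specified in this work.

```python
from typing import List, Tuple


def fit_clock_model(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of hub_time = skew * node_time + offset from sync observations."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    skew = sxy / sxx                 # relative clock rate (1.0 means no drift)
    offset = mean_y - skew * mean_x  # hub time corresponding to node time zero
    return skew, offset


def to_hub_time(node_t: float, skew: float, offset: float) -> float:
    """Map a node-local timestamp onto the hub's timeline."""
    return skew * node_t + offset


# Illustrative observations: a node clock running 100 ppm fast with a 0.25 s offset.
obs = [(t, 1.0001 * t + 0.25) for t in (0.0, 10.0, 20.0, 30.0)]
skew, offset = fit_clock_model(obs)
print(skew, offset, to_hub_time(100.0, skew, offset))
```

Estimating skew, not just offset, matters for long sessions: even a small rate difference accumulates into misalignment of whole inference windows over tens of minutes.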
Although real-time deployment was outside the scope of this work, future research will address these aspects through implementation on embedded prototypes with actual BLE communication links.
  5. Discussion
The results presented in the previous section highlight the strengths and trade-offs between the centralized and distributed approaches for activity recognition using wearable sensors. This discussion builds on those findings by examining their practical implications, acknowledging limitations, and outlining directions for future research.
We acknowledge that no external motion capture system was used to validate the absolute accuracy of the Inertial Measurement Unit (IMU) orientation estimates. Instead, we relied on the onboard sensor fusion algorithms provided by the MetaMotionRL sensors, which integrate accelerometer, gyroscope, and magnetometer readings to produce stable orientation outputs over short recording windows. Since the activities were performed under controlled conditions and were of limited duration (~5 min per activity), accumulated drift is expected to be negligible for the purpose of activity classification. Furthermore, the model’s generalization capability was evaluated across a diverse group of 67 participants, yielding consistent performance results. While validation with an external motion capture system would provide higher-fidelity ground truth for biomechanical analysis, it was not required for the goals of this study, which focus on robust and efficient activity recognition in practical wearable settings.
The centralized model, which processes fused data from all sensors simultaneously, achieved strong classification results. This aligns with previous findings showing that centralized architectures benefit from access to multi-sensor input, enabling the capture of inter-sensor dependencies. However, the distributed model achieved higher recall and F1-score across nearly all activity classes, while the centralized model retained a slight precision advantage only for a few activities, such as Sitting and Folding Clothes. Fewer missed detections are particularly advantageous in contexts where an unrecognized activity could lead to undesired actions or safety risks.
Beyond classification performance, the distributed framework offers significant advantages in computational efficiency. By assigning local models to individual sensors and transmitting only their softmax outputs, the architecture reduces per-node processing requirements and enables parallel inference with lower latency. When deployed on the NVIDIA Jetson AGX Xavier using TensorRT, the distributed model reduced inference time by roughly a factor of five (3.184 ms to 0.643 ms) and energy per inference by a factor of about 2.4 (9.219 mJ to 3.91 mJ) compared to the centralized configuration. These gains translate into extended operational autonomy, with an estimated runtime increase from approximately 1.52 h in the centralized case to approximately 3.6 h per sensor in the distributed configuration. This improvement makes the system more suitable for continuous applications such as physical therapy monitoring, occupational safety, and long-term activity tracking.
Although battery life was estimated and compared between configurations, we did not conduct experiments to assess how sensor performance or data quality may degrade as battery levels drop. Recording sessions in this study lasted approximately 37 min, well within the sensors’ full runtime, and no perceptible degradation in signal quality, stability, or transmission was observed during that period. However, we acknowledge that low battery conditions in extended deployments could potentially impact sampling rates, sensor accuracy, or communication reliability. Future work will investigate these aspects in the context of long-term, real-time operation.
It is essential to note that the distributed inference framework was evaluated using offline simulations, where data were collected and synchronized in advance under controlled conditions. This decision allowed us to focus on validating the inference architecture itself, namely its ability to classify activities based on partial, independently processed sensor inputs, without introducing variability from wireless communication or real-time processing constraints.
Nonetheless, we recognize that real-time implementation aspects such as BLE transmission delays, packet loss, and power consumption are critical for practical deployment and were not measured on physical hardware in this study: packet loss and disconnections were only simulated (Section 4.4), and energy was estimated on a proxy platform (Section 4.5). To address this, we have outlined a concrete plan for real-time implementation. Specifically, we intend to integrate BLE communication between each wearable sensor (MCU) and a central embedded hub (e.g., NVIDIA Jetson Nano or Xavier), using concurrent Bluetooth interfaces to emulate realistic operating conditions. During this phase, we will measure key transmission parameters, including:
- Total system delay, from sensing to central decision output (end-to-end latency); 
- Packet loss rate, both random and due to temporary disconnections; 
- Transmission power and energy consumption, measured per sensor over extended sessions. 
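Two of the planned measurements can be computed directly from logged packets. The sketch below assumes, hypothetically, that each BLE packet carries a wrapping sequence counter (8-bit here) and that sensing and decision timestamps are expressed on a common, synchronized clock; neither the packet format nor the function names come from the study. Packet loss is inferred from gaps in the counter, and end-to-end latency is summarized by its mean and nearest-rank 95th percentile.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def packet_loss_rate(seq: Sequence[int], modulus: int = 256) -> float:
    """Estimate the fraction of lost packets from a wrapping per-packet counter.

    Assumes packets arrive in order, without duplicates, and that no gap spans
    a full counter wrap-around.
    """
    if len(seq) < 2:
        return 0.0
    span = sum((cur - prev) % modulus for prev, cur in zip(seq, seq[1:]))
    expected = span + 1  # packets sent between the first and last received ones
    return 1.0 - len(seq) / expected


def latency_summary(sensed_s: Sequence[float], decided_s: Sequence[float]) -> Dict[str, float]:
    """End-to-end latency (sensing timestamp to central decision) in milliseconds."""
    lat = sorted((d - s) * 1000.0 for s, d in zip(sensed_s, decided_s))
    idx = max(0, math.ceil(0.95 * len(lat)) - 1)  # nearest-rank 95th percentile
    return {"mean_ms": mean(lat), "p95_ms": lat[idx]}


print(packet_loss_rate([254, 255, 1, 2]))
print(latency_summary([0.0, 1.0, 2.0, 3.0], [0.010, 1.020, 2.030, 3.040]))
```

Reporting a high percentile alongside the mean is useful here because BLE connection intervals tend to produce occasional long delays that an average hides.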
Additionally, we aim to characterize the temporal behavior of the system as battery levels decrease, to analyze possible degradation in sampling rate, latency, or BLE stability over time. This implementation phase will also integrate error-handling mechanisms, predictive smoothing, and dropout-tolerant inference to mitigate the effects of intermittent data streams.
This phase will involve a new round of data collection with an expanded participant cohort under dynamic and unconstrained conditions. Together, these efforts will bridge the gap between offline validation and real-time deployment, enabling a more comprehensive assessment of the system’s viability in embedded and mobile contexts.
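As one possible form of predictive smoothing, the central model's output probabilities could be passed through an exponential moving average before the final decision, so that a single corrupted window (for example, one with a zero-filled sensor block) is less likely to flip the predicted activity. The choice of an exponential moving average and the value of `alpha` are our assumptions for illustration; the study does not specify a smoothing method.

```python
from typing import Iterable, List


def ema_smooth(stream: Iterable[List[float]], alpha: float = 0.3) -> List[List[float]]:
    """Exponential moving average over successive output probability vectors.

    Smaller `alpha` gives more weight to history and therefore more stability,
    at the cost of reacting more slowly to genuine activity transitions.
    """
    smoothed: List[List[float]] = []
    state: List[float] = []
    for p in stream:
        state = list(p) if not state else [alpha * x + (1 - alpha) * s for x, s in zip(p, state)]
        smoothed.append(state)
    return smoothed


# One outlier window (third) inside an otherwise stable activity.
raw = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
decisions = [max(range(len(p)), key=p.__getitem__) for p in ema_smooth(raw, alpha=0.3)]
print(decisions)
```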
The distributed design additionally provides flexibility and robustness. Each local model can operate independently, allowing for partial inference even when one or more sensors become disconnected. This capability improves fault tolerance and supports modular scalability, critical features in real-world environments with unpredictable conditions. These aspects, along with reduced communication overhead, further reinforce the suitability of distributed learning systems for wearable applications with resource constraints.
Despite these advantages, the distributed framework presents certain limitations. Local models may lack the global context available in centralized architectures, which can impact classification performance in activities that involve inter-limb coordination. Furthermore, managing asynchronous data streams and synchronizing predictions across multiple nodes introduces architectural complexity, particularly in lossy or bandwidth-limited communication environments.
Another important consideration is the absence of explicit spatial calibration between sensors. In this study, each local model leveraged only its motion data, which was sufficient for the selected activities. However, tasks that require precise relative positioning, such as gait symmetry analysis, sign language interpretation, or specific rehabilitation protocols, may benefit from sensor alignment in a standard spatial frame. As the focus here was to demonstrate a minimally coordinated distributed system, alternative spatial configurations and calibration strategies were not explored.
Lastly, it is essential to distinguish the proposed HAR framework from full-body biomechanical systems such as Xsens. These systems aim to reconstruct body pose using anatomically anchored sensor placements and high-precision inertial data, typically requiring extensive spatial calibration and personalized models. In contrast, our system focuses on activity classification using independently captured local motion features, without reconstructing global posture. This design supports computational efficiency, modular deployment, and ease of use in practical applications where full pose estimation is unnecessary or impractical.
In summary, while the centralized model offers strong overall accuracy and more straightforward integration, the distributed approach provides higher recall and F1-score, greater energy efficiency, and robustness to sensor failures. These qualities make it a strong candidate for real-time embedded activity recognition. 
Table 6 summarizes the key trade-offs discussed in this section and supports informed system design decisions based on specific deployment constraints.
Future work will focus on exploring architectures that adaptively combine centralized and distributed inference based on contextual factors and energy constraints. Promising research directions include developing methods to dynamically weight sensor contributions, incorporating attention mechanisms at the fusion stage, and enabling online learning directly at the edge. While the current evaluation was conducted in a controlled environment with a predefined activity protocol to ensure synchronization and label consistency, one of the primary motivations behind the proposed distributed framework is its potential scalability to real-world, dynamic scenarios. Accordingly, future efforts will include validating the model under unscripted, noisy conditions and free-form activity sessions to assess its robustness and generalizability. Moreover, extending evaluation to multi-user scenarios, outdoor environments, and intermittent connectivity will be crucial for advancing the deployment readiness of these systems. Finally, implementing and validating the proposed framework in real-world, real-time conditions beyond offline simulation remains an essential step toward practical application.