**2. Background**

#### *2.1. Object of Human Activities Recognition (HAR)*

Human activities refer to human behaviors concerning the body or the environment. Human activity recognition aims to capture the actions or status of agents from a series of observations. Successful recognition can provide personalized support in a wide range of human-centric applications [16,17]. Since HAR tasks cover a wide range of activities, it is necessary to organize the related topics in a clear and compact way. Most research works divide the task into a few levels according to activity complexity [4,7], from gestures to actions, followed by human–object and human–human interactions. Group activities [18,19] are the most complicated ones, involving multiple people and essentially composed of series of gestures, actions, and interactions. In this work, we sort human activity recognition into three problems (Figure 2) according to the attributes of the targeted task: body position-related, body action-related, and body status-related problems, corresponding to the questions of "where", "what", and "how", respectively. The "where" problem addresses position-related recognition, such as indoor positioning [20], tracking [21], proximity [22], etc. The "what" problem deals with action-related recognition, the most widely researched area of HAR. Examples are fall detection [23], gait analysis [24], ADL (activities of daily living) [25], etc. The last one is the "how" problem, covering body status-related research such as emotion sensing [26], respiration/heart rate sensing [27], healthcare [28], etc. This task-oriented categorization aims to provide a basic concept of the objectives of human activity recognition. As can be seen, HAR is a multifaceted topic covering almost all human-related activities, and it requires interdisciplinary knowledge to understand behaviors and provide assistance properly.

**Figure 2.** Categorization of human activities.

#### *2.2. General Process of Human Activity Recognition*

Human activity recognition interprets comprehensive body behaviors with the aim of providing assistance that respects ethical standards. A complete recognition task is generally composed of three steps (Figure 3): sensing, data processing, and decision making. Sensing techniques play a fundamental role in this procedure, perceiving as much contextual knowledge as possible so that reliable recognition becomes possible. A successful HAR task depends firstly on the quality of the data perceived by the applied sensors and secondly on how the acquired data are processed. With developments in physics, electronics, and other fundamental subjects, novel sensors and devices are emerging that supply more informative signal patterns for human activity recognition [29–31]. The ToF camera, for example, has enabled cameras to move from simply capturing streamed images to adding depth information to them, thus enabling a wide range of recognition tasks such as hand gestures [32] and facial expressions [33]. Recently, significant advances in the detection accuracy, range, and power consumption of ToF sensors have continued to drive novel applications in both industrial automation [34] and consumer electronics [35]. Diverse sensing techniques have been utilized for specific HAR scenarios and have delivered excellent recognition performance, which motivated us to write this survey focusing on those state-of-the-art sensing techniques with in-depth exploration and extensive analysis. After the data have been acquired by the sensing approach, the second step is to process them. Depending on the data quality and the deployed algorithms, a pre-processing step such as normalization or calibration might be needed. For low-volume data such as single-dimensional ECG data or RSSI-based positioning data, a rule-based [36] or multilateration [37] algorithm can supply good results. For high-volume data such as images and speech, the algorithms applied to the pre-processed data have been dominated by trained deep neural network models [38] because of their superior recognition performance compared with traditional approaches [39], such as feature descriptors in object detection. Currently, a large number of HAR tasks are conducted on image streams captured by different kinds of cameras. These works focus on the spatio-temporal relations of the individuals in the scene. Traditionally, researchers handcrafted features [40] to deduce the target activity, but such approaches have two drawbacks: they rely heavily on individual experience to select highly relevant features, and the handcrafted features may be inefficient and generalize poorly in dynamic environments. The last decade's exploration of machine learning has profoundly influenced the processing pipeline in HAR applications. Network models based on convolution [41] or attention mechanisms [42] for feature extraction have come to dominate data processing and have delivered state-of-the-art recognition performance.
The corresponding general framework comprises data acquisition with the applied sensing technique, feature extraction with distinct network models, and target decision making based on the inference result of the network model [43,44]. After the activity patterns have been extracted from the data in the processing step, a decision on the activity can be made as the final step. This survey, however, does not cover the recognition algorithms adopted in the data processing step or the final inference step based on network models. Its aim is to provide a detailed explanation of the physical principles underlying the sensing techniques applied in HAR tasks and to discuss the differences between them so that researchers can choose the right one for their applications. As the first component in the pipeline of an HAR application, sensing techniques transform human physical activities into numerical information that can be further processed. The following section presents the HAR-targeted sensing approaches and the principles behind them in detail.

**Figure 3.** The general process of an HAR task.
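As a deliberately simple illustration of the three-step process described above, the following Python sketch chains sensing, data processing, and decision making. It is not taken from any cited work: the synthetic accelerometer signal, the 50 Hz sampling rate, the 2 s window, and `THRESHOLD` are hypothetical values, and the threshold rule merely stands in for the trained network models discussed in the text.

```python
import numpy as np

FS = 50            # assumed sampling rate (Hz)
WINDOW = 2 * FS    # 2 s analysis window
STEP = FS          # 1 s hop, i.e. 50 % overlap
GRAVITY = 9.81     # m/s^2, removed as a simple normalization step
THRESHOLD = 0.5    # hypothetical std threshold (m/s^2) between "still" and "active"


def sense(duration_s=10, seed=0):
    """Step 1 stand-in: synthetic accelerometer magnitude.

    The subject stands still for the first half and performs a 2 Hz
    walking-like oscillation in the second half.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(duration_s * FS) / FS
    signal = GRAVITY + 0.05 * rng.standard_normal(t.size)
    moving = t >= duration_s / 2
    signal[moving] += 2.0 * np.sin(2 * np.pi * 2.0 * t[moving])
    return signal


def process(signal):
    """Step 2: remove the gravity offset, segment into sliding windows,
    and compute one feature (standard deviation) per window."""
    centered = signal - GRAVITY
    starts = range(0, len(centered) - WINDOW + 1, STEP)
    return np.array([centered[s:s + WINDOW].std() for s in starts])


def decide(features):
    """Step 3: a rule-based decision on each window's feature."""
    return ["active" if f > THRESHOLD else "still" for f in features]


labels = decide(process(sense()))
print(labels)  # four "still" windows followed by five "active" ones
```

Replacing `decide` with a trained classifier and the single standard-deviation feature with learned features would turn this toy pipeline into the network-based framework described in the text.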

#### **3. Sensing Techniques**

As Figure 1 depicts, we categorized the sensing techniques into five classes according to their sensing principles: mechanical kinematic sensing, field sensing, wave sensing, physiological sensing, and hybrid or other sensing. Compared with other categorization approaches, such as categorization by deployment (wearable, object, environmental, etc.), the principle-based categorization gives a better understanding of each sensing technique's physical background. In the following subsections, we enumerate the leading sensing modalities in each class together with their sensing principles and related state-of-the-art research works. After the enumeration, we also evaluate and compare the sensing modalities using a set of performance metrics.


#### *3.1. Mechanical Kinematic Sensing*

Mechanical sensing refers to the mobility and deformation that occur when a force is applied to or exerted by the target. Mechanical sensors perceive this mobility and deformation and transform the mechanical variation into electric signals. Mechanical sensors, such as kinematic sensors, have been widely used to monitor body activity.

In physics and mathematics, kinematics is the field of study that describes the geometry of motion. Kinematic sensing in HAR is based on motion properties of the human body such as velocity, acceleration, and rotation. Since recognizing body motion is the core objective of HAR, more so than other objectives such as positioning or status monitoring, kinematic sensors have become the dominant sensing approach in scientific research and industrial applications. The most widely deployed sensors are inertial ones such as accelerometers and gyroscopes. Another reason for the massive use of inertial sensors is their power efficiency and small size, which enable pervasive embedding of the sensing unit into personal devices such as smartphones and wearables such as fitness bands.

Nearly all current commercial wearable devices embed inertial sensors, which deliver motion signals of a specific body part with little concern about power consumption or comfort. Both academic and industrial researchers have developed numerous inertial-sensor-based wearable applications. For example, Hristijan et al. [45] explored a weighted ensemble learning algorithm with data from head-mounted inertial sensors to recognize eight everyday activities. Tobias et al. [46] proposed respiration rate monitoring using an inertial sensor in an in-ear headphone. Wrist-, hand-, and finger-worn inertial sensors are primarily used for gesture recognition as a human–machine interface [47–49]; related wearables include smart gloves, smartwatches, smart rings, and wristbands. Another popular motion-recognition-enabled wearable modality is the smart garment. Kang et al. [50] designed clothing integrating IMUs and conductive yarn to prevent spinal disease through continuous posture monitoring. Zhang [51] evaluated an innovative full-body wearable garment system based on IMUs for motion analysis during different exercises. Wang et al. [52] evaluated stroke patients' acceptance of an IMU-embedded smart garment supporting upper-extremity rehabilitation and received positive responses in a clinical setting. Besides wearable electronic devices and smart garments, inertial sensors can also be integrated into shoes and insoles for foot- and leg-related motion research, such as gait analysis [53], indoor pedestrian navigation [54], workout recognition [55], and injury prevention [56].

Besides their advantages in wearability (low power consumption, small size, low cost, pervasiveness), inertial sensors also offer high data quality in terms of sensitivity and accuracy. A high-resolution accelerometer can sense minor vibrations of the body. Cesareo et al. [57] assessed breathing parameters using an IMU-based system; with their proposed algorithm, they reconstructed respiration-induced movement and accurately determined the respiratory rate automatically. Huang et al. [58] demonstrated a novel method for 3D pose reconstruction with six IMUs, which outperformed camera-based methods in situations such as heavy occlusion and fast motion.
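To illustrate how a respiration rate can be read from accelerometer data, the sketch below picks the dominant spectral peak in an assumed breathing band of 0.1–0.7 Hz. This is a generic spectral approach on synthetic data, not the algorithm of Cesareo et al. [57]; the sampling rate, band limits, and signal amplitudes are assumptions.

```python
import numpy as np


def respiratory_rate_bpm(acc, fs, band=(0.1, 0.7)):
    """Estimate breathing rate (breaths/min) from one accelerometer axis.

    Assumes respiration dominates the given frequency band
    (0.1-0.7 Hz, i.e. 6-42 breaths/min).
    """
    acc = np.asarray(acc, dtype=float)
    acc = acc - acc.mean()                        # remove gravity / DC offset
    windowed = acc * np.hanning(acc.size)         # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(acc.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_hz = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_hz


# Synthetic 60 s chest recording: 0.25 Hz breathing (15 breaths/min) plus noise.
fs = 50
t = np.arange(60 * fs) / fs
rng = np.random.default_rng(1)
acc = 9.81 + 0.02 * np.sin(2 * np.pi * 0.25 * t) + 0.005 * rng.standard_normal(t.size)
print(round(respiratory_rate_bpm(acc, fs), 1))  # 15.0
```

The 60 s recording gives a frequency resolution of 1/60 Hz, i.e. one breath per minute; shorter windows trade resolution for responsiveness.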

Given these advantages, inertial sensors currently play the most critical role in HAR tasks, including the specific case of commercial wearables targeting motion-related applications [59]. However, inertial sensors need to be mounted on the target body part to sense its motion pattern, which may conflict with users' habits when long-term continuous monitoring is required and may cause burden and discomfort. For highly accurate motion reconstruction, inertial sensors also face the challenge of accumulated errors, which need to be addressed by frequent recalibration.
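The accumulated-error problem follows directly from double integration: a constant accelerometer bias b produces a velocity error that grows linearly and a position error that grows as 0.5·b·t². The sketch below simulates a stationary sensor with an assumed bias of 0.01 m/s² sampled at an assumed 100 Hz.

```python
import numpy as np

fs = 100                          # assumed IMU sampling rate (Hz)
bias = 0.01                       # assumed residual accelerometer bias (m/s^2)
n = 60 * fs                       # one minute of data
true_acc = np.zeros(n)            # the sensor is actually at rest
measured_acc = true_acc + bias    # but every sample carries the bias

# Naive dead reckoning: integrate acceleration twice (cumulative rectangle rule).
velocity = np.cumsum(measured_acc) / fs
position = np.cumsum(velocity) / fs

for seconds in (1, 10, 60):
    print(f"after {seconds:2d} s: position error {position[seconds * fs - 1]:.3f} m")
print(f"analytic 0.5*b*t^2 at 60 s: {0.5 * bias * 60**2:.3f} m")
```

Six times the observation time gives roughly 36 times the position error, which is why long-running inertial motion reconstruction depends on periodic recalibration or correction from other references.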

#### *3.2. Wave Sensing*

Wave sensing is a non-contact sensing technique based on the propagation properties of waves. Three kinds of wave sensing approaches are mainly used for HAR tasks. The first is RF signals such as WiFi, Bluetooth (BT), and mmWave, i.e., wireless electromagnetic signals with radio frequencies ranging from 3 kHz to 300 GHz. A propagating electromagnetic wave consists of electric and magnetic fields oscillating orthogonally to each other. The second is the acoustic signal, a mechanical wave that includes vibration, sound, ultrasound, and infrasound. The third is the optical signal, an electromagnetic signal of much higher frequency, on the order of hundreds of THz for visible light. In HAR, these wave sensing approaches have been explored widely and deeply. For example, image-based activity recognition analyzes target actions in video frames and can achieve high recognition accuracy. Since video is captured by a camera that collects light rays and focuses them via a lens onto a grid of tiny light-sensitive photosites, it is essentially optical sensing. RF and acoustic signals, as ambient sensing modalities, offer advantages in both privacy protection and reducing the extra burden on users.

Two kinds of sensing methods exist in wave-based human-centric sensing: active and passive sensing. Figure 4 shows the essential difference between them. Active sensing requires an external energy source that emits waves toward the measured object; a receiver then captures the waves after they have been reflected, transmitted, or partially absorbed by the object. Features extracted from the received signal are then used to describe the object. Passive sensing, on the other hand, needs no active wave source and perceives the object's variables by receiving wave signals that originate from the object itself.

**Figure 4.** Wave-based human-centric sensing in two methods: active and passive.

#### (A) RF Signal

RF-based HAR is a non-intrusive approach that avoids the burden and discomfort caused by wearable activity monitoring sensors. The basic principle of an RF-based HAR system is that the propagation path of the RF wave is affected by the presence of the human body. The resulting variations in the received signal can then be used as features to deduce different activities.

A series of RF signals, such as WiFi, UWB, and mmWave, have been explored for HAR tasks. Among them, WiFi is the most popular due to its pervasiveness in indoor environments. The key intuition of WiFi-based HAR is that motions of the human body introduce different multipath distortions in WiFi signals and thus generate different patterns in the time series of channel state information (CSI). Li et al. [60] proposed a system named Wi-Motion that jointly leverages the amplitude and phase information extracted from the CSI sequence and achieves a mean accuracy of 96.6% in line-of-sight and 92% in non-line-of-sight environments for five predefined typical human activities (bend, half squat, step, stretch leg, and jump). Liu et al. [61] designed a WiFi-based sleep monitoring system that extracts fine-grained sleep information, such as a person's respiration, sleeping postures, and rollovers, by continuously collecting fine-grained wireless CSI. Besides activity recognition, WiFi signals can be leveraged for indoor localization. An example is the work of Wang et al. [62], who proposed a dual-task residual convolutional neural network with one-dimensional convolutional layers for joint activity recognition and indoor localization. Bluetooth is another RF technology used for HAR tasks. However, compared with WiFi, the Bluetooth signal is relatively weak [63], so its accuracy and range are limited. On the other hand, it has advantages in cost and ease of use. Therefore, Bluetooth is mainly used for indoor localization by densely deploying many small, power-saving, cost-efficient tags [64].
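CSI-based systems that use both amplitude and phase need some preprocessing first: the raw CSI phase reported by commodity WiFi hardware is corrupted by per-packet timing and carrier offsets, so a common step removes a linear trend across subcarriers. The sketch below shows that generic step on synthetic data; it is an illustration under stated assumptions, not the Wi-Motion pipeline, and the subcarrier indices from -28 to 28 and the offset ranges are hypothetical.

```python
import numpy as np


def sanitize_csi(csi, k):
    """Split complex CSI into amplitude and a sanitized phase.

    csi: complex array of shape (n_packets, n_subcarriers)
    k:   subcarrier indices, shape (n_subcarriers,)
    Each packet's raw phase contains a random linear term in k (timing
    offset) plus a constant (carrier phase offset); both are removed here.
    """
    amplitude = np.abs(csi)
    phase = np.unwrap(np.angle(csi), axis=1)          # unwrap across subcarriers
    k = np.asarray(k, dtype=float)
    slope = (phase[:, -1] - phase[:, 0]) / (k[-1] - k[0])
    detrended = phase - slope[:, None] * k            # remove the linear term
    sanitized = detrended - detrended.mean(axis=1, keepdims=True)  # remove the constant
    return amplitude, sanitized


# Synthetic check: the same channel in every packet, corrupted by random
# per-packet slopes and offsets, should yield identical sanitized phases.
rng = np.random.default_rng(0)
k = np.arange(-28, 29)
true_phase = 0.3 * np.sin(k / 5.0)
slopes = rng.uniform(-0.2, 0.2, size=(100, 1))
offsets = rng.uniform(-np.pi, np.pi, size=(100, 1))
csi = 2.0 * np.exp(1j * (true_phase + slopes * k + offsets))
amp, ph = sanitize_csi(csi, k)
print(np.abs(ph - ph[0]).max())   # close to 0: packet-level errors removed
```

After this step, the remaining packet-to-packet variation in amplitude and phase is what motion-induced multipath changes would show up in.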

Besides WiFi and BT, mmWave technology, which operates in the frequency range between 30 GHz and 300 GHz, has recently attracted strong interest from researchers. Since a higher frequency allows a smaller antenna, mmWave radars are compact in form factor, and many antennas can be packaged into a small space to form highly directional beams. Moreover, mmWave signals offer a larger bandwidth than WiFi signals and therefore a higher range resolution. Recent advances in small, low-cost, single-chip consumer radar systems operating at mmWave frequencies have opened up many new applications, such as automotive radar and health monitoring. HAR has also been explored with mmWave-based approaches, with outstanding results from fine-grained classifiers. Zhang et al. [65] predicted target behavior using the micro-Doppler effect (induced by the micromotion dynamics of a target or its structure) measured by mmWave radar. Using a neural-network-based classifier, they achieved 95.19% accuracy in recognizing the bulk motion of the body and the micromotions of the arms and legs. Zhao et al. [66] proposed a system named mBeats, in which a robot equipped with a mmWave radar provides periodic heart rate measurements under different user poses. A fall-detection system based on mmWave radar was also presented by Sun et al. [67], supported by a recurrent neural network with long short-term memory units. Li et al. [68] designed another interesting mmWave radar-enabled system called ThuMouse, which tracks the position of a finger by regression with a deep neural network. MmWave-related exploration is still at an early stage and is likely to grow rapidly in the coming years, driven by its distinct characteristics compared with WiFi and BT and by large-scale chip-level commercialization.
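The frequency and bandwidth arguments above can be made concrete with three textbook relations: wavelength λ = c/f (antenna elements scale with λ; a half-wave element is λ/2 long), range resolution ΔR = c/(2B) for a sweep bandwidth B, and the two-way radar Doppler shift f_D = 2v/λ that underlies micro-Doppler analysis. The carrier frequencies, the 4 GHz chirp bandwidth, and the 5 mm/s chest speed below are illustrative assumptions; the WiFi line only shows what the same c/(2B) relation gives for a 20 MHz channel.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s)


def wavelength_m(freq_hz):
    """Free-space wavelength for a carrier frequency."""
    return C / freq_hz


def range_resolution_m(bandwidth_hz):
    """Two targets closer than c / (2B) in range cannot be separated."""
    return C / (2 * bandwidth_hz)


def doppler_shift_hz(radial_speed_mps, freq_hz):
    """Two-way Doppler shift seen by a monostatic radar: 2 v / lambda."""
    return 2 * radial_speed_mps / wavelength_m(freq_hz)


carriers = {"WiFi 2.4 GHz": 2.4e9, "WiFi 5 GHz": 5e9, "mmWave 60 GHz": 60e9, "mmWave 77 GHz": 77e9}
for name, f in carriers.items():
    lam_mm = wavelength_m(f) * 1e3
    print(f"{name}: wavelength {lam_mm:.1f} mm, half-wave element {lam_mm / 2:.1f} mm")

bandwidths = {"20 MHz WiFi channel": 20e6, "4 GHz mmWave chirp": 4e9}
for name, b in bandwidths.items():
    print(f"{name}: range resolution {range_resolution_m(b) * 100:.1f} cm")

# A chest wall moving at an assumed 5 mm/s, observed by a 77 GHz radar:
print(f"Doppler shift: {doppler_shift_hz(0.005, 77e9):.2f} Hz")
```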

Another highly promising and widely used RF technology is ultra-wideband (UWB), a decades-old wireless technology used for short-range, high-bandwidth communication with a high data rate. It is now also a standard for high-accuracy location services. According to FiRa, a consortium founded by leading companies to develop UWB standards, the reborn UWB will mainly focus on three use cases: hands-free access control, location-based services, and peer-to-peer communication, complementing current dominant wireless solutions. Recently, UWB support has started to appear in high-end smartphones, and UWB is expected to spur a new wave of related applications. Figure 5 shows the wide spectrum of UWB compared with other technologies, which allows UWB to operate at a very low power spectral density and to maintain stable connectivity with other devices in a crowded radio environment. Thanks to its large bandwidth, which yields very short pulses and fine time resolution, UWB can provide positioning accuracy of around 10 cm [69], far better than WiFi- or BT-based positioning with meter-level accuracy [70]. Another key feature is that UWB is resistant to the multipath effect, a common issue for most RF-based sensing technologies. The multipath effect refers to a radio signal arriving at the receiver over more than one path because of reflection or refraction by objects near the main signal path. The large bandwidth of UWB provides frequency diversity that can make the time-modulated ultra-wideband (TM-UWB) signal resistant to multipath effects [71]. Researchers have explored many HAR-related applications with UWB, such as activity recognition in smart homes [72], gesture recognition [73], sleep postural transition recognition [74], and healthcare monitoring [75]. With the spread of low-cost UWB chips in wearable devices, more short-range applications based on UWB, such as swarm intelligence and social distancing, are likely to emerge. However, despite these advantages, wide deployment of UWB will still take some time given its higher cost. Moreover, in terms of data rate, UWB is not a good option for exchanging large amounts of data between devices compared with other narrowband radio systems.

**Figure 5.** The wide UWB power spectrum results in a low power consumption compared to other technologies. (Source: FiRa Consortium).
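The link between timing precision and centimeter-level UWB accuracy becomes clear from time-of-flight ranging. A minimal sketch of single-sided two-way ranging (SS-TWR), one common UWB ranging scheme, assuming ideal, drift-free clocks and a hypothetical 200 µs reply delay: the initiator measures the round-trip time, the responder reports its reply delay, and half of the difference is the one-way flight time.

```python
C = 299_792_458.0  # speed of light (m/s)


def ss_twr_distance(t_round_s, t_reply_s):
    """Single-sided two-way ranging under ideal, drift-free clocks.

    t_round_s: round-trip time measured by the initiator
    t_reply_s: reply delay reported by the responder
    """
    time_of_flight = (t_round_s - t_reply_s) / 2.0
    return C * time_of_flight


# Hypothetical exchange: tag 3 m from the anchor, 200 us responder delay.
true_d = 3.0
t_reply = 200e-6
t_round = t_reply + 2.0 * true_d / C
print(f"estimated distance: {ss_twr_distance(t_round, t_reply):.3f} m")  # 3.000 m

# Effect of a 1 ns error in the measured round-trip time.
err = ss_twr_distance(t_round + 1e-9, t_reply) - true_d
print(f"error per ns of round-trip timing error: {err * 100:.1f} cm")  # 15.0 cm
```

Because light travels about 30 cm per nanosecond, each nanosecond of round-trip timing error shifts the estimate by about 15 cm; sub-nanosecond timing, which needs very short pulses and hence a wide bandwidth, is what makes accuracy of about 10 cm attainable. Practical systems also have to compensate clock drift between the two devices, for example with double-sided two-way ranging.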

#### (B) Acoustic Signal

An acoustic signal is a mechanical wave resulting from an oscillation of pressure that travels through a solid, liquid, or gas. A familiar example is human speech, produced by the vibration of the speaker's vocal folds; the vibration travels through the air and reaches the listener's outer ear and eardrum. Two kinds of sound lie outside the audible frequency range (20 Hz–20 kHz): infrasound and ultrasound. An example of infrasound is the atmospheric infrasound generated by an earthquake, when the earth's surface near the epicenter and surrounding regions oscillates at low frequency. Ultrasound is an acoustic signal with a frequency above the upper audible limit of human hearing. A widely used application of ultrasound is medical imaging, where ultrasound waves travel through the body and create sonograms of organs, tissues, etc.

As an ambient sensing modality, ultrasound can supply millimeter-level indoor positioning accuracy based on time of flight [76,77]. Such a positioning system relies on several wireless ultrasonic beacons with fixed and known coordinates in the indoor environment, which receive or emit ultrasonic signals that are then used to deduce the position. A wireless module (WiFi, Bluetooth, or others) is used for data exchange and time synchronization. Finger motion recognition is another ultrasound application, which leverages the detected morphological changes of deep muscles and tendons. Yang et al. [78] obtained an accuracy of 95.4% for real-time finger motion recognition. Mokhtari et al. [79] proposed a resident identification system as an innovative home platform, using ultrasound arrays to detect the height of the moving resident and other sensors, such as pyroelectric infrared sensors, to detect the moving direction. Wang et al. proposed a novel contactless respiration monitoring approach using ultrasound signals with off-the-shelf audio devices. Unlike other works based on chest displacement, where false detection may often occur, they monitor respiration by directly sensing the exhaled airflow. The principle is that the exhaled airflow can be regarded as air turbulence that scatters the sound wave and causes a Doppler effect. The experimental results showed an accuracy of 0.3 breaths/min (2%), and the authors concluded that ambient noise and variations in respiration rate, respiration style, sensing distance, and transmitted signal frequency have little effect on the system's respiration monitoring accuracy.
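Ultrasonic positioning of the kind described above combines time-of-flight ranging with multilateration against beacons at known coordinates. The sketch below assumes one-way times of flight (made possible by the RF time synchronization mentioned in the text), a standard linear approximation of the speed of sound in dry air, and a hypothetical room with four beacons; it is an illustrative least-squares solver, not the method of [76,77].

```python
import numpy as np


def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temperature temp_c (deg C)."""
    return 331.3 + 0.606 * temp_c


def multilaterate(beacons, distances):
    """Linearized least-squares 3-D position from >= 4 beacon ranges.

    Subtracting the first range equation ||x - b_0||^2 = d_0^2 from the
    others gives 2 (b_i - b_0) . x = d_0^2 - d_i^2 + ||b_i||^2 - ||b_0||^2.
    """
    b = np.asarray(beacons, dtype=float)
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (b[1:] - b[0])
    y = d[0] ** 2 - d[1:] ** 2 + np.sum(b[1:] ** 2, axis=1) - np.sum(b[0] ** 2)
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x


# Hypothetical room: beacons at known coordinates (m), not all at the same height.
beacons = np.array([[0.0, 0.0, 2.5], [5.0, 0.0, 2.5], [0.0, 5.0, 2.5], [5.0, 5.0, 1.0]])
target = np.array([1.5, 2.0, 1.2])

# One-way times of flight at an actual air temperature of 22 deg C.
c = speed_of_sound(22.0)
tof = np.linalg.norm(beacons - target, axis=1) / c
estimate = multilaterate(beacons, tof * c)
print(np.round(estimate, 3))

# If the system wrongly assumes 20 deg C, every range shrinks by about 0.35 %.
biased = multilaterate(beacons, tof * speed_of_sound(20.0))
print(f"position error: {np.linalg.norm(biased - target) * 1000:.1f} mm")
```

The beacons are deliberately placed at different heights: if all of them lay on one ceiling plane, this linearized system could not resolve the vertical coordinate. The second estimate shows why temperature compensation matters for millimeter-level accuracy.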

Previous works on sound captured by smartphone microphones have mainly focused on the following application cases: environment assessment [80,81], proximity sensing [82,83], and indoor positioning [84,85]. The sound sources are either dedicated, fine-tuned tags or the surroundings. In the work of Benjamin et al. [82], an algorithm using inaudible sound patterns was explored to accurately detect whether two mobile phones are within a few meters of each other. The method can be implemented as a standard smartphone application with real-time inference, enabling smartphone-based collaborative activity detection, also in combination with other embedded sensors.
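Proximity detection with inaudible sound typically starts by checking whether a known near-ultrasonic tone is present in the microphone signal. A minimal sketch, assuming a 19 kHz beacon tone, a 48 kHz microphone, 100 ms frames, and a hypothetical energy-share threshold, uses the Goertzel algorithm to evaluate a single DFT bin cheaply; it is a generic building block, not the algorithm of Benjamin et al. [82].

```python
import math

import numpy as np


def goertzel_power(samples, fs, target_hz):
    """Squared magnitude of the single DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / fs)          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_detected(samples, fs, target_hz=19_000.0, threshold=0.1):
    """True if a large share of the frame's energy sits in the target bin.

    By Parseval's theorem, bin power / (N * frame energy) is the fraction of
    spectral energy in that bin: a pure real tone puts about 0.5 in each of its
    two mirror bins, while white noise spreads about 1/N per bin.
    """
    x = np.asarray(samples, dtype=float)
    energy = float(np.sum(x ** 2))
    if energy == 0.0:
        return False
    fraction = goertzel_power(x, fs, target_hz) / (len(x) * energy)
    return bool(fraction > threshold)


fs = 48_000                      # assumed microphone sampling rate (Hz)
n = 4_800                        # 100 ms analysis frame
t = np.arange(n) / fs
rng = np.random.default_rng(2)
noise = 0.001 * rng.standard_normal(n)
with_tone = noise + 0.01 * np.sin(2 * np.pi * 19_000 * t)
print(tone_detected(with_tone, fs), tone_detected(noise, fs))  # True False
```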

Overall, acoustic signals provide an alternative and competitive approach for highly accurate human or robot positioning and distance-related activity recognition. The method is non-intrusive, reducing users' extra burden and protecting their privacy. However, it still suffers from computational load and is limited by complex environmental acoustic sources. For example, the accuracy and robustness of ultrasound-based indoor positioning decrease significantly when a collision-like sound occurs or when a significant barrier exists between tags.

#### (C) Optic Signal

Optical signals for HAR tasks mainly refer to deep-learning-enabled processing of images captured by the photosensitive elements of cameras. Most related works focus on spatio-temporal relations among the objects in the scene: they track multiple agents, evaluate their appearance, aggregate independent and joint features, segment their movements, extract their actions, and then infer their activities. Image-based systems can cover almost every HAR task and achieve very high recognition accuracy because the data capture a complete view of the scene. The covered tasks include positioning, navigation, body-part monitoring, full-body monitoring, individual activity recognition, group activity recognition, etc. Sathyamoorthy et al. [86] designed a system named COVID-robot for social distancing monitoring in crowded scenarios. With an RGB-D camera and a 2-D LiDAR, the mobile robot can avoid collisions in a crowd and estimate the distances between all individuals detected in the camera view while navigating autonomously. Lee et al. [87] presented an innovative wearable navigation system based on an RGB-D camera to help the visually impaired. A glass-mounted RGB-D camera collects environmental information, which serves as input to their navigation algorithm based on real-time 6-DOF feature-based visual odometry. Kim et al. [88] proposed a hand gesture control system based on tactile feedback to the user's hand. Amit et al. [89] proposed an approach that analyzes a user's body posture during a workout and compares it with a professional's reference workout, giving the user visual feedback while exercising. The system aims to help people complete exercises independently and to prevent incorrectly performed motions that may eventually cause severe long-term injuries. Meng et al. [90] addressed the problem of recognizing person–person interactions using depth cameras that provide multi-view data.
They first divided each person–person interaction into body-part interactions and then used the pairwise features of these body-part interactions to analyze the person–person interaction. The method was demonstrated on three public datasets. As can be seen, image-based HAR tasks depend heavily on neural-network-based algorithms, and most of the researchers' effort in this field goes into exploring advanced algorithms to reach the state of the art.
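Distance estimation between detected people, as in social distancing monitoring with an RGB-D camera, rests on back-projecting each detection into 3-D with the pinhole camera model, X = (u − c_x)·Z/f_x and Y = (v − c_y)·Z/f_y. The sketch below assumes hypothetical intrinsics, a 2 m distance threshold, and person detections already given as a pixel position plus depth; it omits the detector and robot navigation and is not the COVID-robot implementation [86].

```python
import itertools
import math

FX, FY = 525.0, 525.0   # hypothetical focal lengths (pixels)
CX, CY = 319.5, 239.5   # hypothetical principal point for a 640x480 image


def backproject(u, v, depth_m):
    """Pixel (u, v) with metric depth -> 3-D point (x, y, z) in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def close_pairs(detections, min_dist_m=2.0):
    """Index pairs of detected people whose 3-D distance is below min_dist_m."""
    points = [backproject(u, v, z) for (u, v, z) in detections]
    return [
        (i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
        if math.dist(p, q) < min_dist_m
    ]


# Hypothetical detections: (u, v) pixel of each person's torso center and its depth in meters.
people = [(200, 240, 3.0), (300, 240, 3.2), (600, 250, 6.0)]
print(close_pairs(people))  # [(0, 1)]
```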

Undoubtedly, camera-based HAR systems have succeeded in different scenarios, including indoor monitoring and outdoor surveillance. However, the approach might not be well accepted due to severe privacy concerns. This is one reason why sensor-based HAR is still prevalent in research communities and has led to many recent research contributions. Another significant disadvantage of image-based solutions lies in the computational load. Since image-based HAR needs strong hardware support (GPU, CPU, memory, bus) to run trained deep neural networks with millions of parameters (weights and activations), the cost of hardware resources, power, and maintenance is considerable. Additionally, as an optical sensing solution, its performance is strongly influenced by environmental conditions such as lighting, temperature, and air quality.

#### *3.3. Physiological Sensing*

The term "physiological sensing" refers to both natural physiological signals and the kinematic signals generated by the organism. Physiological variables have been widely used in diagnosis, drug discovery, healthcare monitoring, etc. In human activity recognition, the human body, a complex biochemical system, offers a rich set of electrophysiological and kinematic variables that can be measured on the body to indicate the status and actions of the subject. Figure 6 summarizes the biological variables used in HAR tasks.

**Figure 6.** Physiological sensing modalities for HAR.

#### (A) Electrophysiological Signals

Electrophysiology studies the electrical properties of biological cells and tissues, such as neurons, at the molecular and cellular level. The behavior of neurons is essentially based on electrical and chemical signals inside the body, and a series of high-level expressions and actions can be interpreted by monitoring those signals. EMG (electromyography), ECG (electrocardiogram), EEG (electroencephalogram), and EOG (electrooculography) are commonly monitored electrophysiological signals in clinical scenarios. Research works in the last decade have shown a significant contribution of electrophysiological signals to interpreting human behavior. For example, electromyography is a diagnostic procedure that monitors the electrical signals of muscles and motor neurons. Pancholi et al. [91] developed a low-cost EMG sensing system to recognize arm activities such as hand open/close or wrist extension/flexion. Srikanth et al. [92] focused on recognizing complex construction activities with wearable EMG and IMU sensors using a neural network. Similar work has been explored for hand gesture recognition [93,94], human–computer interaction [95,96], etc. ECG records the heart's electrical activity during the heartbeat. The standard clinical ECG records twelve leads using ten electrodes, and ECG signals are commonly used to check different heart conditions. ECG is also frequently explored for HAR, commonly combined with inertial sensors [97,98]. Since brain cells communicate through fast electrical impulses, researchers developed EEG equipment to record the brain's electrical activity using small metal electrodes attached to the scalp [99]. EEG has also been explored in HAR for tasks such as eyes open/close detection [100] and emotion recognition [101]. EOG is a technique that records the standing potential difference between the cornea and the retina. Typical applications of EOG signals are ophthalmological diagnosis and eye movement recording.
However, researchers have already explored the potential of EOG signals in HAR [102]. Lu et al. [103] proposed a dual model for EOG-based human activity recognition with an average recognition accuracy of 88.15% over three types of activities (i.e., reading, writing, and resting). Besides the commonly used electrophysiological signals listed above, many other signals describing electrical body-related variables could be explored for HAR tasks. Compared with other sensing approaches, electrophysiological signals require more effort to interpret activities because of the complexity of body anatomy, and they mostly play an auxiliary role. However, they offer advantages such as ubiquity and on-body measurement, indicating their potential for wearable implementations.
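EMG-based recognition pipelines commonly start from simple time-domain features computed over short windows. A minimal sketch of four widely used ones (mean absolute value, root mean square, waveform length, and zero crossings); the zero-crossing threshold, the 200-sample window, and the 1 kHz sampling rate are assumptions, and the features are not claimed to be those used in [91–96].

```python
import numpy as np


def emg_features(window, zc_threshold=0.01):
    """Common time-domain EMG features for one analysis window."""
    x = np.asarray(window, dtype=float)
    mav = float(np.mean(np.abs(x)))                # mean absolute value
    rms = float(np.sqrt(np.mean(x ** 2)))          # root mean square
    wl = float(np.sum(np.abs(np.diff(x))))         # waveform length
    # Zero crossings: sign changes whose step exceeds a noise threshold.
    crossings = (x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) >= zc_threshold)
    zc = int(np.sum(crossings))
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": zc}


print(emg_features([1.0, -1.0, 1.0, -1.0]))  # {'MAV': 1.0, 'RMS': 1.0, 'WL': 6.0, 'ZC': 3}

# Synthetic 200 ms windows at an assumed 1 kHz: rest vs. muscle contraction.
rng = np.random.default_rng(3)
rest = 0.005 * rng.standard_normal(200)
contraction = 0.2 * rng.standard_normal(200)
print(emg_features(rest)["RMS"], emg_features(contraction)["RMS"])
```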

#### (B) Other Physiological Signals

Beyond electrophysiological signals, other physiological variables also reflect human activity. An example is Paolo Palatini's study [104] exploring the relation between sports and blood pressure. One of its conclusions is that both systolic and diastolic blood pressure increase significantly during weight lifting, which strongly supports the current belief that people with hypertension should avoid isometric sports. Besides blood pressure, monitoring kinematic signals such as respiration and heart rate plays a critical role in sleep studies, sports training, patient monitoring, etc. Lu et al. [105] designed a wearable sensor system fusing heart rate, respiration, and motion sensors to improve energy expenditure estimation; their study shows that the fusion design provides more stable estimates than existing systems. Brouwer et al. [106] improved real-life emotion estimation based on heart rate. Li et al. [107] proposed a sleep/wake classification model using heart rate and respiration signals for long-term sleep studies and reached 88% classification accuracy. Many research works have used these two sensing modalities in wearable configurations to monitor medical and health status [108,109]. Phonation, the production of sound through vocal-fold vibration, has also been explored to help people with disabilities or health conditions express themselves or be understood better. Lee et al. [110] developed a lip-reading algorithm using optical flow and properties of articulatory phonation for hearing-impaired people, giving them continuous feedback on their pronunciation and phonation through lip-reading training with the aim of more effective communication with people without hearing disabilities. Gomez et al. [111] proposed an approach for monitoring Parkinson's disease that leverages the biomechanical instability of phonation, allowing frequent remote evaluation. Monitoring muscle (facial or other) and joint movement is a more direct way of recognizing human activity.
Such movement can be captured by a range of sensors, such as fabric stretch sensors, capacitive sensors, and laser Doppler vibrometry. Applications based on muscle/joint movement monitoring include hand gesture recognition [112], physical stress [113], gait cycle estimation [114], and chronic pain level recognition [115]. Like electrophysiological sensing, kinematic biological sensing is an on-body approach: the sensors are placed on or near the body, enabling continuous observation and remote feedback, especially for healthcare, diagnosis, and rehabilitation applications.

#### *3.4. Field Sensing*

In physics, a field describes a region in which every point is subject to a force. For example, electric charges create an electric field: when another charged particle is placed in that field, it experiences an electric force that either repels or attracts it. A magnet generates a magnetic field around it, and a paper clip within range of that field is pulled toward the magnet. Two like magnetic poles also repel each other when they are close enough to be within each other's field. Any object with mass on Earth falls to the ground because it is acted on by Earth's gravitational field.

Field strength refers to the magnitude of a vector-valued field. Electric field strength is expressed in volts per meter (V/m). Magnetic field strength (H) is expressed in amperes per meter (A/m) in SI units or in oersteds (Oe) in CGS units; when the field is instead characterized by its magnetic flux density (B), the units are tesla (T) or gauss (G). Gravitational field strength is measured in meters per second squared (m/s²) or, equivalently, newtons per kilogram (N/kg). In every case the field itself is a vector quantity, and these units describe its magnitude.
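The unit relations above can be checked numerically. The following Python sketch (our own illustration, not taken from the cited works) computes the uniform field between ideal parallel plates (E = V/d), converts oersted to A/m, converts a magnetic field strength H into flux density in air (B = μ₀H) and on to gauss, and evaluates Earth's gravitational field strength from g = GM/r². The plate voltage and gap are arbitrary illustrative inputs, and the gravity estimate deliberately ignores the centrifugal contribution of Earth's rotation.

```python
import math

# SI constants (rounded). MU_0 uses the classical value 4*pi*1e-7 T*m/A,
# which agrees with the current measured value to about 1 part in 1e9.
MU_0 = 4e-7 * math.pi          # vacuum permeability, T*m/A
G = 6.674e-11                  # gravitational constant, N*m^2/kg^2
EARTH_MASS_KG = 5.972e24
EARTH_RADIUS_M = 6.371e6       # mean radius


def electric_field_parallel_plates(voltage_v: float, gap_m: float) -> float:
    """Uniform field between ideal parallel plates, E = V / d, in V/m."""
    return voltage_v / gap_m


def oersted_to_a_per_m(h_oe: float) -> float:
    """1 Oe = 1000 / (4*pi) A/m (about 79.58 A/m)."""
    return h_oe * 1000.0 / (4.0 * math.pi)


def flux_density_in_air_t(h_a_per_m: float) -> float:
    """B = mu_0 * H, treating air as vacuum; result in tesla."""
    return MU_0 * h_a_per_m


def tesla_to_gauss(b_t: float) -> float:
    """1 T = 10^4 G."""
    return b_t * 1e4


def gravitational_field_strength(mass_kg: float, radius_m: float) -> float:
    """g = G*M / r^2 in N/kg, which is the same unit as m/s^2."""
    return G * mass_kg / radius_m ** 2


if __name__ == "__main__":
    print(f"E = {electric_field_parallel_plates(5.0, 1e-3):.0f} V/m")  # 5 V across a 1 mm gap
    b = flux_density_in_air_t(oersted_to_a_per_m(1.0))
    print(f"1 Oe in air -> {b:.1e} T = {tesla_to_gauss(b):.3f} G")
    print(f"g = {gravitational_field_strength(EARTH_MASS_KG, EARTH_RADIUS_M):.2f} N/kg")
```

The gravity estimate comes out at about 9.82 N/kg, slightly above the commonly quoted 9.81 m/s², partly because the rotation term is omitted. The oersted check also shows why 1 Oe and 1 G are often used interchangeably in air.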

Another way to gauge field strength is to look at the field lines: the more closely spaced the lines are in a region, the stronger the field there.

Figures 7–9 show the electric field of a parallel plate capacitor, the magnetic field generated by a Helmholtz coil, and the gravitational field of Earth, respectively. Field-based sensing relies either on measuring the field strength directly (such as the magnetic field strength) or on measuring quantities that vary with it indirectly (such as the potential change of a capacitor or the pressure an object exerts because of gravity).

**Figure 7.** Electric field (parallel plate capacitor).

**Figure 8.** Magnetic field (Helmholtz coils).

**Figure 9.** Gravitational field of Earth.

#### (A) Electric Field

The electric field is ubiquitous in our environment, since any potential difference creates one. Both powered objects (such as appliances and in-wall power cables) and non-powered conductive items (such as metal frames near power cables in a building, or the human body) form electric fields with nearby objects at a different potential, especially the ground. A potential difference is essentially a difference in charge distribution. A typical example is the mild shock people sometimes feel when touching an appliance, even when it is powered off. This happens because residual charge may remain in the capacitors of the electronic circuits, which take some time to discharge. If the appliance is not properly grounded, touching it causes a mild shock as the charge is transferred to the neutral body.

Electric field-based HAR applications fall mainly into two kinds, active and passive, depending on what emits the field. In active sensing, the field is emitted from the environment and the human acts as an intruder, so the resulting field variation serves as the signal source. In passive sensing, the signal is the variation of the field between the body itself and the ground, since the human body is a good conductor and can store charge. The passive electric field describes a biological signal of the body, the human body capacitance, which is introduced in the later subsection on hybrid sensing techniques in HAR. Here we focus first on active electric field-based HAR applications.

A representative work is from Zhang et al. [116], who introduced room-scale interactive and context-aware applications with a system named Wall++, a low-cost sensing technique that turns ordinary walls into smart infrastructure. First, using active mutual capacitance sensing, which measures the capacitance between two electrodes (that is, the electric field strength between them), the system can track users' touches and gestures and estimate their body pose when they are close to the wall. When a body part is near a transmitter–receiver pair, it interferes with the projected electric field and reduces the received current, which can be measured for inference. Conversely, if the user's body touches an electrode, the capacitance and the received current increase dramatically. Second, the system can also operate in a passive airborne electromagnetic sensing mode to detect and track active appliances, as well as users wearing an electromagnetic emitter. Another typical work is from Cheng et al. [117], who used conductive textile-based electrodes, which are easy to integrate into garments, to measure changes in the electric field strength (as capacitance) inside the human body. Since these changes are related to motions and shape changes of muscle, skin, and other tissue, the authors abstracted high-level knowledge from them and inferred a broad range of activities and physiological parameters. For example, they embedded the prototype into a collar and quantitatively evaluated the recognition accuracy for actions such as chewing, swallowing, speaking, sighing (taking a deep breath), and different head motions and positions. Other works based on active electric field sensing include touch detection [118], body tracking based on a smart floor [119], recording of respiration, heart rate, and stereotyped motor behavior [120], and hand gesture recognition [121].
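The near/touch behavior described for mutual capacitance sensing (a nearby body part lowers the received current, while direct contact raises it sharply) can be sketched as a simple threshold rule on a single transmitter–receiver pair. This Python sketch is a hypothetical illustration, not the Wall++ pipeline, which infers touch, gesture, and pose from many electrode pairs; the class name `MutualCapChannel`, the relative thresholds, and the slow baseline-tracking scheme are all assumptions.

```python
from enum import Enum


class PairState(Enum):
    IDLE = "idle"
    NEAR = "near"    # body shunts part of the projected field -> received current drops
    TOUCH = "touch"  # direct contact couples strongly -> received current rises sharply


class MutualCapChannel:
    """Hypothetical per-pair detector; thresholds are relative to a tracked baseline."""

    def __init__(self, baseline: float, near_drop: float = 0.05,
                 touch_rise: float = 0.5, alpha: float = 0.01) -> None:
        if baseline <= 0:
            raise ValueError("baseline must be positive")
        self.baseline = baseline      # received-current amplitude with nobody around
        self.near_drop = near_drop    # assumed: a 5 % drop counts as "near"
        self.touch_rise = touch_rise  # assumed: a 50 % rise counts as "touch"
        self.alpha = alpha            # smoothing factor for slow baseline drift

    def update(self, received: float) -> PairState:
        ratio = received / self.baseline
        if ratio >= 1.0 + self.touch_rise:
            return PairState.TOUCH
        if ratio <= 1.0 - self.near_drop:
            return PairState.NEAR
        # Adapt the baseline only while idle, so a person standing still
        # is not gradually absorbed into the "empty room" reference.
        self.baseline += self.alpha * (received - self.baseline)
        return PairState.IDLE


if __name__ == "__main__":
    channel = MutualCapChannel(baseline=100.0)
    for sample in (100.0, 99.0, 90.0, 180.0, 101.0):
        print(sample, channel.update(sample).value)
```

Freezing the baseline during NEAR/TOUCH is one simple design choice for handling temperature or humidity drift; a real multi-electrode system would more likely feed the per-pair ratios into a learned classifier.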

Active electric field sensing is non-intrusive and low-cost, consumes little power, and has excellent potential for pervasive, privacy-respecting environmental sensing. However, its hardware is more complex than that of passive electric field sensing. Furthermore, it is susceptible to electromagnetic interference, so its reliable operation depends on suitable environmental conditions.

#### (B) Magnetic Field

Magnetic field sensing is an active approach to distance-based motion sensing. There are two main kinds of magnetic field-based motion-sensing systems, depending on whether the magnetic field is generated by direct current (DC) or alternating current (AC).

In DC magnetic field motion sensing systems, electromagnets or permanent magnets are often used to generate the magnetic field, and a magnetic sensor (magnetometer) senses the field strength. Since magnetometers are widely embedded in wearable devices, DC motion sensing has been extensively explored for finger/hand tracking as a novel input method. Chen et al. [122] designed a system named uTrack, which turns the thumb and fingers into a 3D input system using magnetic field sensing. A permanent magnet was affixed to the back of the thumb, and a pair of magnetometers was worn on the back of the fingers. Moving the thumb across the fingers produced a continuous data stream that was used for 3D pointing. The system achieves a tracking accuracy of 4.84 mm in 3D space. Similar works [123,124] used a permanent magnet as the field generator for motion tracking.

In contrast, AC magnetic field sensing systems mostly consist of oscillation-based magnetic field transmitters and receivers. The transmitter typically uses a coil to generate an alternating magnetic field, and the receiver also integrates a coil to sense the strength of that field at different distances from the transmitter coil. The underlying principle is Faraday's law of induction: the oscillating magnetic flux through the receiver coil induces an oscillating voltage at the same frequency, which is then used for distance or pose estimation. Oscillating magnetic fields have been explored in a variety of HAR tasks, such as indoor localization [125], finger tracking [126], human–computer interaction [127], and wearable social distance monitoring [128–130]. They can also be used for underwater positioning, enabling the tracking or navigation of unmanned underwater vehicles or divers [131].
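As a rough illustration of how the induced voltage encodes distance, the Python sketch below models the transmitter coil as a small magnetic dipole and the receiver as a small coil on the same axis. Under these assumptions (coaxial alignment, distance much larger than the coil size yet well inside the near field, no nearby metal), the on-axis flux density is B = μ₀m/(2πr³), Faraday's law gives an EMF amplitude of N·A·ω·B, and the voltage therefore falls off as 1/r³, so one calibration reading is enough to invert it. All coil parameters are hypothetical, and practical systems often use multi-axis coils to cope with arbitrary orientation.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, T*m/A


def coil_moment(turns: int, current_a: float, radius_m: float) -> float:
    """Peak magnetic dipole moment of a circular transmitter coil, m = N*I*A (A*m^2)."""
    return turns * current_a * math.pi * radius_m ** 2


def on_axis_flux_density(moment: float, distance_m: float) -> float:
    """Near-field on-axis dipole field, B = mu_0*m / (2*pi*r^3), in tesla."""
    return MU_0 * moment / (2 * math.pi * distance_m ** 3)


def induced_emf_amplitude(b_peak_t: float, turns: int, area_m2: float, freq_hz: float) -> float:
    """Faraday's law for B(t) = B*cos(wt) through a small coil: |EMF| = N*A*w*B."""
    return turns * area_m2 * 2 * math.pi * freq_hz * b_peak_t


def distance_from_emf(v_measured: float, v_ref: float, r_ref_m: float) -> float:
    """Invert V ~ 1/r^3 using one calibration point (v_ref measured at r_ref_m)."""
    return r_ref_m * (v_ref / v_measured) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical hardware: 100-turn, 2 cm radius TX coil driven at 0.1 A;
    # 200-turn RX coil with 1 cm^2 area; 100 kHz excitation.
    m = coil_moment(100, 0.1, 0.02)
    readings = {r: induced_emf_amplitude(on_axis_flux_density(m, r), 200, 1e-4, 100e3)
                for r in (1.0, 2.0)}
    print(f"EMF at 1 m: {readings[1.0] * 1e6:.1f} uV, at 2 m: {readings[2.0] * 1e6:.2f} uV")
    print(f"estimated distance: {distance_from_emf(readings[2.0], readings[1.0], 1.0):.3f} m")
```

Doubling the distance divides the induced voltage by eight, which is why the achievable range grows only slowly with transmitter power and coil size.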

The advantage of DC magnetic field motion sensing is that the magnet used to generate the field is easy to obtain, and the sensing unit is a chip-level component that is already pervasive thanks to the wide use of smart wearable devices. Moreover, the tracking accuracy can reach the millimeter level. The main disadvantage is the short sensing range: since the field attenuates quickly, detection is limited to several centimeters. For AC magnetic field sensing, performance in range and accuracy depends mainly on coil design, and with a larger transmitter coil the detection range can reach up to ten meters. Ordinary furniture made of wood and textiles does not distort the generated field. However, metallic objects do cause magnetic field distortions. Fortunately, researchers have addressed this issue with a secondary calibration step (using either a look-up table or neural network-based calibration) and achieved strong results [132].
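The short range of DC systems follows from the dipole field decaying with the cube of distance. The Python sketch below estimates the on-axis distance at which a small permanent magnet's field drops to the level of Earth's background field (roughly 25 to 65 µT depending on location), used here only as a crude proxy for the point where the magnet becomes hard to separate from the background. The magnet size and remanence (about 1.4 T, in the range of strong NdFeB grades) are illustrative assumptions, and the uniform-magnetization dipole approximation m = B_r·V/μ₀ ignores effects close to the magnet's surface.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, T*m/A


def dipole_moment(remanence_t: float, volume_m3: float) -> float:
    """Approximate moment of a uniformly magnetized magnet, m = Br*V/mu_0 (A*m^2)."""
    return remanence_t * volume_m3 / MU_0


def on_axis_field(moment: float, distance_m: float) -> float:
    """On-axis dipole flux density, B = mu_0*m / (2*pi*r^3), in tesla."""
    return MU_0 * moment / (2 * math.pi * distance_m ** 3)


def range_to_floor(moment: float, b_floor_t: float) -> float:
    """Distance at which the on-axis field falls to b_floor_t (solves B(r) = b_floor)."""
    return (MU_0 * moment / (2 * math.pi * b_floor_t)) ** (1.0 / 3.0)


if __name__ == "__main__":
    m = dipole_moment(1.4, 0.005 ** 3)   # hypothetical 5 mm NdFeB cube
    earth_field = 50e-6                  # assumed mid-latitude value, T
    for r_cm in (1, 2, 4, 8):
        print(f"{r_cm} cm: {on_axis_field(m, r_cm / 100) * 1e6:8.1f} uT")
    print(f"field reaches ~{earth_field * 1e6:.0f} uT at "
          f"{range_to_floor(m, earth_field) * 100:.1f} cm")
```

With these numbers the crossover lands at roughly 8 cm, consistent with the several-centimeter range noted above; doubling the magnet volume extends it only by a factor of about 1.26.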

#### (C) Gravitational Field

A gravitational field explains gravitational phenomena in which a massive body exerts a force on another massive body. Earth's gravity, denoted by g, describes the net acceleration imparted to physical objects by the combined effect of gravitation (caused by the mass distribution within Earth) and the centrifugal force (caused by Earth's rotation). On Earth, gravity gives weight to physical objects: the weight is the mass multiplied by the gravitational acceleration. Gravitational field-based HAR tasks mainly use pressure sensors to measure the load exerted by the body's weight. Various pressure sensors have been used for HAR, such as commercially available force-sensitive resistors (FSRs) and resistive textiles. By analyzing the pressure patterns produced by body motion, a wide range of HAR applications have been explored, such as gait analysis [133], workout recognition and user identification [134], indoor localization [135], smart furniture [136], and rehabilitation [137]. A textile-resistive pressure sensor is composed of a matrix of resistive units; by sensing the pressure on each unit of the matrix, the user's motion patterns can be derived. With a small number of resistive units, such as a few FSRs integrated into an insole, a one-dimensional data stream is used for action recognition. With a large number of resistive units, as on a mat-like surface, the data stream is usually converted into pressure images (two-dimensional arrays), which can be processed by neural network-based algorithms from computer vision for more accurate activity recognition.
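The conversion from a matrix of resistive units to a pressure image, together with two simple quantities often read from such images, can be sketched in a few lines of Python. The sketch assumes each unit's reading has already been calibrated to force in newtons and that the stream is ordered row by row; the function names and the sample frame are hypothetical. Dividing the total static force by g estimates the supported mass (weight = mass × g), and the force-weighted center of pressure gives a coarse position of the user on the mat.

```python
from typing import List, Sequence, Tuple

STANDARD_GRAVITY = 9.80665  # m/s^2


def to_pressure_image(readings: Sequence[float], rows: int, cols: int) -> List[List[float]]:
    """Reshape a row-major stream of per-unit forces into a rows x cols image."""
    if len(readings) != rows * cols:
        raise ValueError(f"expected {rows * cols} readings, got {len(readings)}")
    return [list(readings[r * cols:(r + 1) * cols]) for r in range(rows)]


def total_force(image: List[List[float]]) -> float:
    return sum(sum(row) for row in image)


def estimated_mass_kg(image: List[List[float]], g: float = STANDARD_GRAVITY) -> float:
    """Weight = mass * g, so mass = total force / g (valid for a static load only)."""
    return total_force(image) / g


def center_of_pressure(image: List[List[float]]) -> Tuple[float, float]:
    """Force-weighted mean (row, col) position, in units of sensor cells."""
    total = total_force(image)
    if total <= 0:
        raise ValueError("no contact detected")
    row_c = sum(r * v for r, row in enumerate(image) for v in row) / total
    col_c = sum(c * v for row in image for c, v in enumerate(row)) / total
    return row_c, col_c


if __name__ == "__main__":
    frame = [0, 0, 0,
             0, 300, 100,
             0, 100, 0]   # one made-up 3x3 frame of calibrated forces (N)
    img = to_pressure_image(frame, 3, 3)
    print("mass ~", round(estimated_mass_kg(img), 1), "kg")
    print("center of pressure (row, col):", center_of_pressure(img))
```

A sequence of such images (or of center-of-pressure tracks) is the kind of input that the computer vision style classifiers mentioned above would consume.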

One advantage of pressure-based sensors is that the sensing component can be customized to any shape and size, making it suitable for a wide variety of surfaces. The sensing precision can also be adjusted through the density of the sensing units. The sensing units are commonly built from cheap, commercially available layered films, so the overall system is affordable to build. However, costs grow when the system is deployed over a large area (such as floors for localization and tracking), because sensing occurs only at points of contact; this is a drawback compared with modalities that do not require contact, such as RF-based sensing. In summary, gravitational field-based HAR is a straightforward, non-intrusive method for monitoring and analyzing motion. It can be deployed extensively for intelligent ambient sensing but is limited by the contact constraint and the cost of large-area deployment.

#### *3.5. Hybrid / Others*

#### (A) Human Body Capacitance

Human body capacitance (HBC) is essentially a biological variable describing the capacitance between the human body and its environment, mainly the ground. It is also a passive electric field-based sensing approach, since the capacitance model comprises two conductive plates that store charge (corresponding to the body and the environment in the human electric field model) and a dielectric medium (corresponding to the air between the body and the ground). Figure 10 depicts the human body capacitance in a living room, where multiple electric fields exist: for example, the field between an appliance and the ground, the field between the metal frames of a window or door and the ground, and the human body capacitance between the body and the environment. HBC is a ubiquitous biological parameter that could be exploited for a wide range of human-centric, motion-related applications, owing to its sensitivity both to the body's motion and to variations in the environment.

**Figure 10.** Human body capacitance: the static electric field between the body and the environment.

Unlike other biological features such as ECG and EMG, HBC interacts with the surroundings, especially the ground. Insulated by clothing and footwear, the body and its surroundings form a natural capacitor, and HBC quantifies how much charge the body stores per unit of potential difference. A series of studies [138–141] report body capacitance values of 100–400 pF. The value varies with skin state [142,143], garments [144], body posture [145], and other factors. Based on this concept, researchers have explored applications such as communication [146], cooperation perception [147], and motion monitoring [117,148,149], and the topic continues to attract research attention. Since HBC is sensed passively, the sensing units are mostly designed with a small form factor and low power consumption [149,150]. Wilmsdorff et al. [151] explored this passive capacitive sensing technique in a wide range of indoor and outdoor applications. In [152], the authors presented an HBC-based capacitive sensor for recognizing and counting full-body gym exercises; by sensing the local potential variation of the body, different kinds of body actions could be classified. Besides motion sensing, HBC can also be used for proximity and joint activity recognition [147] by exploiting the variation in body capacitance caused by the proximity and motion of an intruder.
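One way to see why body motion appears as a potential variation is to treat the body and its surroundings as a single capacitor obeying Q = C·V. If the charge on the body stays roughly constant over a short movement (an idealization, since charge slowly leaks through shoes and floor), any change in capacitance caused by posture or proximity changes the body potential. The Python sketch below uses a capacitance inside the 100–400 pF range cited above; the initial potential and the size of the capacitance change are hypothetical values chosen only for illustration.

```python
def body_charge(capacitance_f: float, potential_v: float) -> float:
    """Q = C * V for the body-to-environment capacitor."""
    return capacitance_f * potential_v


def body_potential(charge_c: float, capacitance_f: float) -> float:
    """V = Q / C."""
    return charge_c / capacitance_f


def potential_shift(charge_c: float, c_before_f: float, c_after_f: float) -> float:
    """Change in body potential when capacitance changes at (assumed) constant charge."""
    return body_potential(charge_c, c_after_f) - body_potential(charge_c, c_before_f)


if __name__ == "__main__":
    c_rest = 150e-12                # within the 100-400 pF range reported in the literature
    q = body_charge(c_rest, 1.0)    # hypothetical 1 V body potential at rest
    c_moved = 140e-12               # hypothetical drop, e.g. less contact area with the floor
    dv = potential_shift(q, c_rest, c_moved)
    print(f"charge: {q * 1e12:.0f} pC, potential shift: {dv * 1e3:+.1f} mV")
```

Under this idealization a 10 pF drop raises the potential by about 7%, roughly 71 mV from a 1 V starting point. The same arithmetic also illustrates the limitation discussed next: a capacitance change caused by a nearby person or object produces a shift of the same kind as one caused by the wearer's own motion.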

As a passive motion-sensing approach, systems based on human body capacitance offer low cost, low power consumption, portability, and full-body sensing ability. However, the same sensitivity to body motion and environmental variation that gives this variable its potential also limits its development: any action, whether from the body or from the environment, induces a measurable signal, making it difficult to identify the signal's source.
