Research into biometrics largely stems from the well-known vulnerabilities of knowledge-based authentication (KBA) systems and token-based authentication (TBA) systems. Two recent works identified the main challenges around knowledge-based systems, namely usability and security [20,21]. TBA systems are more common than many expect, with radio frequency identification (RFID) technology increasingly being used in domains such as healthcare [22,23] and smart cities [24]. Fatima et al. [25] demonstrate TBA vulnerabilities such as smart card theft, microchip theft, and impersonation.
Biometrics has become an intensively researched topic, with the aim of overcoming the challenges and limitations of previous authentication techniques. Biometrics can broadly be divided into physiological and, more recently, behavioural categories. Palma et al. [26] provide a detailed overview of biometric-based human recognition systems. The physiological biometrics covered include fingerprint, face, hand, iris, ear acoustics, vascular patterns, electrocardiogram (ECG), and deoxyribonucleic acid (DNA). However, Joshi et al. [27] performed a security analysis of fingerprint systems alone and identified 16 possible attack points. Devices such as smartphones increasingly rely on biometric authentication, including facial authentication (FA) techniques. Like all other forms of biometrics, FA is vulnerable to presentation and spoofing attacks, as demonstrated by Zheng et al. [28]. Hand geometry is likewise susceptible to spoofing: Bhilare et al. [29] demonstrated a spoof acceptance rate of 84.56%, which, from a security perspective, renders the system unfit for use. The other physiological techniques also suffer from human characteristics that deteriorate with age, sensing challenges under different illumination, and substantial changes to the biometric feature, such as a cut to the skin.
To overcome these challenges, research has shifted towards behavioural biometrics, owing to the numerous benefits they offer over physiological data from both a security and a usability perspective. Behavioural biometrics are non-intrusive [1,30]; they are also difficult to replicate, which largely eliminates the presentation attacks discussed earlier, and they are generally more secure in the sense that humans do not have to remember passwords or carry token cards. Various behavioural biometrics have been researched, including keystroke dynamics [31,32,33,34], mouse dynamics [35,36], gait [37], and many more.
Recent research on behavioural biometrics has investigated ways of utilising the infrastructure already present in people's daily lives, motivated mainly by keeping costs down while leaving scope for the research to develop. Furthermore, by leveraging existing Wi-Fi devices in buildings, no wearable sensor or potentially privacy-invasive camera system is needed; sensing is conducted in a device-free manner, which benefits user convenience. Current research has therefore shifted to Wi-Fi sensing, which exploits radio frequency (RF) signals propagating in the surrounding environment [38]. Channel state information (CSI) describes the wireless communication channel between a transmitter and a receiver. The recorded CSI can be affected by various factors, including static or moving obstacles (e.g., furniture, walls, or humans) and the distance between devices, among others.
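As a minimal illustration of what CSI carries, it can be viewed as one complex-valued channel estimate per OFDM subcarrier, from which amplitude and phase are derived. The values below are made up for the sketch, not taken from any real capture:

```python
import cmath

# Hypothetical CSI snapshot: one complex channel estimate per OFDM
# subcarrier (illustrative values only).
csi = [0.8 + 0.3j, 0.5 - 0.6j, -0.2 + 0.9j, 0.7 + 0.1j]

# Amplitude reflects attenuation along the propagation paths; phase
# reflects the shift introduced between transmitter and receiver.
amplitude = [abs(h) for h in csi]
phase = [cmath.phase(h) for h in csi]

# A moving person or a shifted obstacle perturbs the multipath, which
# shows up as changes in these per-subcarrier values over time.
```

Sensing systems typically track how these amplitude and phase sequences evolve across successive packets rather than inspecting a single snapshot.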
2.1. Human Activity Recognition (HAR) Based on Sensors
Recently, HAR from various sensors has received great attention due to its beneficial use in applications across surveillance systems, health care systems, rehabilitation, and smart homes. Currently, HAR data are captured by internal or external sensing. Internal sensing usually refers to a device attached to the human body, ranging from a smartwatch to smart clothing, whose sensors capture data that can be leveraged for applications in many ways. These wearable devices integrate many sensors, including accelerometers, gyroscopes, and orientation and magnitude sensors, each with different functionality capturing a different set of data. Wearable devices have come under the spotlight in recent years because the captured data can be used to classify a person's activity in real time and provide assistance and guidance [39]. HAR plays a key role in ambient assisted living (AAL), medical diagnosis, and especially healthcare, among many other areas.
Muaaz and Mayrhofer investigated the security strength of a smartphone-based gait recognition system against zero-effort and live minimal-effort impersonation attacks using the smartphone's accelerometer [40]. The system produced no false positives, and expert impersonators found it difficult to mimic the gait due to the regularity between their steps. However, the study matched impersonators with similar physical characteristics to their targets, and the system would find it difficult to distinguish attackers outside of the experiment. Sun et al. also investigated the accelerometer in wearable Internet of Things (WIoT) devices [41], again for gait recognition. The authors proposed a speed-adaptive gait cycle segmentation method and a matching threshold generation method to mitigate the problem of varying walking speeds. Another study [42] also examined gait recognition for elderly users by alleviating the problem of intra-subject gait fluctuation; the authors proposed a gait template synthesis model and an arbitration-based score-level fusion method to improve overall accuracy, achieving an average recognition rate of 96.7%. However, both systems would deteriorate outside the single age group studied, as younger generations exhibit intra-subject gait fluctuations that lead to greater variability in the results due to their unstable walking [41,42]. Furthermore, both systems incur high computation and memory costs, an ongoing challenge for WIoT devices, whose limited memory demands highly efficient methods that are often not robust enough to achieve high accuracy. Recent advances in wearables have led to devices such as necklaces and knee bandages. Chen et al. explored a neck-mounted wearable with embedded infrared (IR) sensors to track facial expressions [43]. Although this seems practical, the sensor may be obstructed by hair or a beard and is dependent on walking parameters such as speed. Very recently, Lie et al. investigated various wearable sensors in a medical knee bandage, aiming to provide postoperative rehabilitation and protection [44]. They incorporated five sensors: electromyography sensors (EMG), accelerometers (ACC), electrogoniometers, gyroscopes (GYRO), and microphones (MIC).
Although wearable devices show promise for a range of applications, they come with many limitations and challenges. Some limitations specific to each paper are listed above; the problems below, however, apply to all. Firstly, users must wear WIoT devices at all times for them to function, which is inconvenient and cumbersome, and may cause skin irritation and other medical problems for some users [45,46]. Although some articles consider the security implications of WIoT, a persistent problem is that intruders who are not registered in the system, or who do not wear these devices, cannot be recognised by wearable device-based systems [47]. One of the biggest challenges in wearable research is that small, low-power, small-form-factor devices, such as those dedicated to wearable platforms, have strict computational, memory, and energy constraints [48]. A recent study by Tran et al. investigated chronic patients' perceptions of WIoT use in healthcare [49]. Although 55% believed that the devices could improve their follow-up and the reactivity of their care, most responses highlighted fears that the devices could replace human intelligence, pose serious risks of hacking, or lead to misuse of private patient data by caregivers. Indeed, 22% of the study's participants would refuse to use WIoT devices for these reasons.
External sensing refers to devices such as cameras in fixed locations. Vision-based HAR research can be divided by data type into RGB and RGB-D approaches. Zerrouki et al. recently investigated HAR based on variation in body shape: the body was segmented into five partitions, and in each frame, area ratios were calculated and fed into the proposed adaptive boosting algorithm [50]. However, two factors severely impact the performance of this system. In dark or dusky conditions, the camera cannot detect the human body due to low illumination levels, and automatic changes in the background make human action recognition challenging, generating errors and false classifications. Oyedotun et al. applied deep learning to hand gesture recognition on Thomas Moeslund's gesture recognition database [51]. The authors demonstrated that convolutional neural networks (CNNs) and stacked denoising autoencoders (SDAEs) are capable of learning the complex hand gesture classification task with low error rates, achieving recognition rates of 91.33% and 92.83%, respectively. Abraham et al. noted that RGB and depth camera videos are affected by background clutter and illumination changes and cover only a limited field of view, an issue affecting both of the above papers [52]. The authors overcame this with a multimodal feature-level fusion approach combining an RGB camera, a depth sensor, and a wearable, achieving an accuracy of 97.6% on the publicly available UTD-MHAD dataset. However, the system does not support multiview HAR, and the person whose action is being recognised must remain in sight of the operating camera.
The ubiquity of Wi-Fi technology has made it indispensable in our daily lives, and researchers are increasingly exploring its applications in wireless sensing due to its widespread availability and cost-effectiveness. By harnessing commercial Wi-Fi devices, researchers can create innovative sensing solutions without expensive and complex wearable devices or cameras; even affordable wearables often carry low-cost sensors that yield subpar performance. Wi-Fi infrastructure thus presents a viable and economical alternative for advanced sensing technologies. In contrast to vision-based techniques, Wi-Fi-based identification systems rely on signals capable of penetrating obstacles, making them particularly effective in complex and cluttered environments, and unlike methods employing inertial measurement units (IMUs) or wearable sensors, Wi-Fi sensing operates in a non-intrusive and device-free manner. Furthermore, in the context of the expanding Internet of Things (IoT) landscape, the widespread presence of Wi-Fi devices facilitates a ubiquitous and imperceptible security system through CSI-based biometric sensing. Wi-Fi sensing also offers enhanced privacy compared to wearable- or camera-based systems, owing to its non-intrusive nature and reduced risk of capturing sensitive visual information. Unlike cameras, which can potentially record detailed images or videos of individuals, Wi-Fi sensing operates on radio waves, which capture no visual data; and unlike wearable devices, which often require physical contact with the body and raise concerns about personal space and consent, device-free Wi-Fi sensing eliminates the need to wear or carry any tracking equipment, preserving privacy and reducing the risk of unauthorised data collection. In addition, Wi-Fi signals can be effectively anonymised, making the identification and tracking of specific individuals challenging and further safeguarding user privacy. These factors collectively make Wi-Fi sensing a more privacy-conscious choice than wearable- or camera-based systems.
2.2. Human Activity Recognition (HAR) Based on CSI
CSI-based HAR has received considerable attention in recent years because of its advantages over sensor-based HAR. The main benefits are that it is non-intrusive and does not require users to wear any sensors on their bodies; it is insensitive to illumination, making it effective at all times of day; it provides greater privacy protection, since no camera operates in the room; and it is cost-effective, as it can often be implemented on existing Wi-Fi infrastructure without additional hardware. Wi-Fi signals can be described in two ways: received signal strength (RSS) and channel state information (CSI). RSS is often used in indoor positioning and provides an estimate of the power of the received signal. However, RSS is not stable and cannot capture the dynamic changes in the signal while an activity is being performed [53]. In contrast to RSS, which provides only a general indication of signal strength, CSI offers a more detailed and dynamic representation of the signal, making it the better choice for HAR.
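The RSS/CSI contrast can be made concrete with a toy example (the amplitudes below are invented for illustration): two CSI snapshots with different per-subcarrier profiles can still produce an identical aggregate power figure, which is roughly what RSS reports, so the activity-induced detail is lost at the RSS level:

```python
import math

# Hypothetical per-subcarrier CSI amplitudes at two moments in time.
# The aggregate power is identical, but the subcarrier profiles differ,
# which is exactly the detail an RSS-like scalar cannot express.
csi_t0 = [1.0, 2.0, 2.0, 1.0]
csi_t1 = [2.0, 1.0, 1.0, 2.0]

def total_power_db(amplitudes):
    """Aggregate received power in dB: a coarse, RSS-like summary."""
    return 10 * math.log10(sum(a * a for a in amplitudes))

# RSS-like view: both snapshots collapse to the same number.
# CSI view: the per-subcarrier sequences remain distinguishable.
```

A HAR pipeline working from CSI would therefore compare the full per-subcarrier sequences over time, not the collapsed power value.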
Two problems remain for CSI-based HAR: dynamic environments, and recognising new activities performed by new users. Both ultimately require new sampling in different environments, which is time-consuming and inconvenient for users. Wang et al. [6] propose a multimodal CSI-based HAR system (MCBAR) that addresses the issue of rarely seen activities in unseen environments using a generative adversarial network (GAN) and semi-supervised learning techniques. This method yields a more robust system and requires only a small amount of data from participants. In another paper, the authors aim to improve recognition performance with an augmented few-shot learning-based human activity recognition (AFSL-HAR) framework. The framework achieves high recognition rates by including a feature Wasserstein generative adversarial network (FWGAN) module that synthesises diverse samples, helping the recognition model learn sharper classification boundaries [54].
Schäfer et al. investigate classifiers such as LSTM networks and SVMs to distinguish eight common activities: EMPTY, LYING, SIT, SIT-DOWN, STAND, STAND-UP, WALK, and FALL [55]. Three experiments were conducted on different platforms, all achieving high accuracy. Another related paper investigated attention-based bidirectional long short-term memory (ABLSTM) for passive HAR [56]. A BLSTM enables the model to learn features in two directions from raw sequential CSI data, whereas an LSTM processes in one direction only; Chen et al. [56] argued that future CSI is also of great importance for HAR. The attention layer assigns greater weights to the more important features and time steps, leading to greater performance. Shalaby et al. [57] examined four deep learning methods: a convolutional neural network (CNN) with a gated recurrent unit (GRU); a CNN with a GRU and attention; a CNN with a GRU and a second CNN; and a CNN with long short-term memory (LSTM) and a second CNN. They report that dividing the model into two main steps, feature extraction and classification, enabled high performance. Their models achieved accuracies of 99.31%, 99.16%, 98.88%, and 98.95%, compared with 75% and 95% for the LSTM and ABLSTM models, respectively.
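The attention-weighting idea behind such models can be sketched in a few lines. This is a generic softmax attention pooling over time steps, not the actual ABLSTM implementation, and the feature vectors and scores below are invented for illustration:

```python
import math

def attention_pool(features, scores):
    """Softmax the per-time-step scores and return (weights, weighted
    sum of feature vectors): the pooling step an attention layer uses
    to emphasise the most informative time steps."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return weights, pooled

# Toy sequence: three time steps, two features each (illustrative only).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 0.1, 2.0]  # the model deems the last step most informative
weights, pooled = attention_pool(feats, scores)
```

In a trained network, the scores themselves are produced by a small learned layer over the recurrent hidden states rather than being fixed as here.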
Other papers have investigated transforming CSI data into images and feeding them into CNNs, while others have investigated deep neural networks (DNNs). One work collected CSI for seven different activities from a Raspberry Pi 4 device and converted the data into RGB images for a 2D CNN classifier [53]. Unlike models such as LSTM, a CNN can analyse the data in parallel rather than sequentially, so training time and computational complexity are lower, leading to greater model efficiency. The model was successful, with an accuracy of around 95%. In other work, the authors used a DNN incorporating four techniques for HAR [58]: the proposed model (HARNN) implements a two-level decision tree, a linear regression method, a noise removal mechanism, and an RNN to recognise human activity. Finally, the vulnerability of DNNs to adversarial attacks that add small perturbations to the CSI has also been investigated [56]. Under such attacks, accuracy dropped to around 0.3, which presents a serious security risk.
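The CSI-to-image preprocessing used by such CNN pipelines typically amounts to min-max normalising an amplitude matrix into the 0-255 pixel range. The matrix below is invented for illustration, and real systems work on far larger time-by-subcarrier grids (often stacking several such channels to form an RGB image):

```python
# Hypothetical CSI amplitude matrix: rows are time samples, columns are
# subcarriers (illustrative values only).
csi_amp = [
    [0.2, 0.8, 0.5],
    [0.1, 0.9, 0.4],
    [0.3, 0.7, 0.6],
]

flat = [v for row in csi_amp for v in row]
lo, hi = min(flat), max(flat)

# Min-max normalise each amplitude into an 8-bit pixel value, so a 2D
# CNN can treat the recording like a picture.
image = [[round(255 * (v - lo) / (hi - lo)) for v in row]
         for row in csi_amp]
```

The convolutional filters then pick up local time-frequency patterns in this "picture" just as they would edges and textures in a photograph.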
Overall, HAR using Wi-Fi CSI is a complicated task affected by numerous environmental parameters, such as multipath reflections of Wi-Fi signals in the environment where the activities are performed, and the temperature and humidity of the air, which influence the amplitude and phase shift of the received signal. A significant amount of research centres on deep learning techniques, for which overfitting is a major problem: a model can achieve very high training accuracy without learning any genuinely complex pattern, so that classification of unseen data is often incorrect. Moreover, the papers mentioned above often fail to compare their techniques against others, and data collection is frequently the underlying issue. Problems persist with the number of users involved: recent publications involve 3 to 10 users, which does not allow robust findings [6,54,55,56,57,58,59,60]. Papers also cite the issue of dynamic environments, yet typically test only two or three different environments, which does not inspire sufficient confidence in the models being evaluated.