Article

AtomGID: An Atomic Gesture Identifier for Qualitative Spatial Reasoning

Laboratoire d’Intelligence Ambiante pour la Reconnaissance d’Activités, Université du Québec à Chicoutimi, Chicoutimi, QC G7H 2B1, Canada
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5301; https://doi.org/10.3390/app14125301
Submission received: 29 May 2024 / Revised: 12 June 2024 / Accepted: 17 June 2024 / Published: 19 June 2024

Featured Application

Non-deep-learning approach to recognize gestures from an imprecise tracking system such as passive RFID.

Abstract

In this paper, we present a novel non-deep-learning-based approach for real-time object tracking and activity recognition within smart homes, aiming to minimize human intervention and dataset requirements. Our method utilizes discreet, easily concealable sensors and passive RFID technology to track objects in real-time, enabling precise activity recognition without the need for extensive datasets typically associated with deep learning techniques. Central to our approach is AtomGID, an algorithm tailored to extract highly generalizable spatial features from RFID data. Notably, AtomGID’s adaptability extends beyond RFID to other imprecise tracking technologies like Bluetooth beacons and radars. We validate AtomGID through simulation and real-world RFID data collection within a functioning smart home environment. To enhance recognition accuracy, we employ a clustering adaptation of the flocking algorithm, leveraging previously published Activities of Daily Living (ADLs) data. Our classifier achieves a robust classification rate ranging from 85% to 93%, underscoring the efficacy of our approach in accurately identifying activities. By prioritizing non-deep-learning techniques and harnessing the strengths of passive RFID technology, our method offers a pragmatic and scalable solution for activity recognition in smart homes, significantly reducing dataset dependencies and human intervention requirements.

1. Introduction

Western societies are increasingly challenged by the phenomenon of global population aging [1]. This demographic shift poses a significant threat to the sustainability of healthcare services [2], highlighting the urgent need for innovative solutions. One such solution is the concept of aging in place [3], which advocates for enabling individuals to reside in their own homes with a degree of independence for longer periods, rather than relocating to specialized care facilities. An effective strategy for implementing aging in place involves the development of smart homes tailored to the specific profiles and needs of their residents [4]. For instance, people with mild cognitive deficit, early Alzheimer’s disease, or intellectual impairment could benefit from such technology. Smart homes are residential spaces equipped with a range of sensing technologies [5], which can include vision-based sensors, ultrasonic sensors, or simpler ubiquitous sensors like infrared motion detectors, light sensors, electromagnetic contacts, and smart power analyzers. This integration of technology aims to support aging individuals in successfully and securely completing their Activities of Daily Living (ADLs) [6]. Furthermore, the incorporation of intelligent agents into smart homes, alongside sensor networks, transforms these environments into comprehensive cognitive orthoses. Collaborating with occupational therapists, nurses, physicians, and other professionals, these cognitive orthoses work synergistically to enhance the quality of life of aging individuals [7]. By providing real-time monitoring, assistance, and intervention when needed, smart homes equipped with intelligent agents offer a promising avenue for promoting independence and well-being among the elderly population.
Many researchers have studied the challenges related to this ambitious goal from several computing and engineering perspectives. One of these challenges is the ability of an artificial intelligence to comprehend the context and the ongoing ADLs in real time [8]. This topic, known as activity recognition, is a favored subject of many research teams around the world [9]. Several approaches have been proposed, but they can broadly be divided into two main families: knowledge-driven [10] or data-driven [11]. The first family of approaches [10,12,13,14,15] was originally explored with the development of expert systems and models based on mathematical formalisms (first-order logic, lattice theory, etc.). These models are generally considered to be simpler to implement and to evolve. Nevertheless, to be functional, they require a library containing the models for each ADL. While these approaches are less popular in research nowadays, they are often still chosen for real-world deployment, where machine learning models are challenging to train due to high variability in the conditions and activities and to the difficulty of obtaining ground truth [16]. However, these knowledge-driven approaches have long suffered from generalization problems and from complex updates and maintenance.
To address the weaknesses of these approaches, most researchers are now focusing on using machine learning for ADL recognition [17]. The idea is to obtain adaptable systems that are independent of a human-constructed library [18]. Most of these methods are supervised, which means they need a labeled training dataset to work properly [19]. In contrast, unsupervised methods do not require a labeled dataset for the learning phase [20]. Such models are scarce in the literature and can only recognize coarse-grained ADLs [21]. A good tradeoff is offered by recent semi-supervised learning models [22]. Nevertheless, whatever approach researchers rely upon, it is limited by the quality of the data [16]. Very often, smart homes still rely on very simple sensors due to the cost and the complexity of processing the data [23]. With recent deep learning architectures, researchers are now including more advanced sensors such as passive RFID [24], ultrawideband (UWB) radars [25], and cameras [26]. While deep learning can readily extract features and transform complex data to learn meaningful patterns, for some specific tasks in smart homes it does not cover all scenarios and needs. For instance, data may embed some spatial information that is not readily available, such as object movements and their spatial relationships. While deep learning could likely learn such relations given enough data, it is often simpler to use a non-learning algorithm. Moreover, such an algorithm can expose the details of its reasoning and be more interpretable than the classic black box of a deep learning approach [27].
Considering that qualitative spatial reasoning has, to the best of our knowledge, been infrequently explored by the research community focusing on Activities of Daily Living (ADLs) recognition, apart from the incorporation of limited location-related features [28], this paper introduces a straightforward yet highly efficient and effective algorithm for extracting spatial information. This algorithm capitalizes on the fuzzy tracking of everyday objects using passive Radio-Frequency Identification (RFID) technology. While our implementation builds upon our previous passive RFID tracking system [29,30,31], in principle, the algorithm is compatible with any imprecise tracking system. Termed the Atomic Gestures Identifier (AtomGID), this novel algorithm represents a departure from learning-based approaches. It can be configured using intuitive rules of thumb and does not necessitate a dataset for operation. While we acknowledge that it may not rival state-of-the-art deep learning methods in terms of performance when provided with extensive data, we posit that AtomGID serves as a complementary tool rather than a direct competitor. We propose that AtomGID addresses a crucial gap in the field by facilitating the incorporation of simple and explainable qualitative spatial reasoning (QSR) [32] into activity recognition frameworks for smart homes. By leveraging AtomGID, researchers and practitioners can harness the power of spatial information to enhance the interpretability and robustness of ADL recognition systems, thereby advancing the capabilities of smart home technologies.
AtomGID is first implemented and tested with a home-made simulator that simply adds random Gaussian noise to the data. Then, we implemented it in a realistic smart home infrastructure using binary sensors (electromagnetic contacts, pressure mats, infrared sensors, light detection, and flowmeters) and passive RFID. Passive tags are used to localize and track, in real time, the objects used for ADLs in the smart home (e.g., a mug, a plate) [29]. A model of qualitative spatial reasoning is then exploited to extract movement features through AtomGID. We show that AtomGID can readily decompose positions into simple atomic spatial information. Then, a use case is proposed to show how AtomGID could be used in a complete pipeline. First, the extracted knowledge is processed by our clustering algorithm based on flocking [33]. Nevertheless, since flocking is based on vector similarity, any other clustering algorithm could have been used for this purpose. Second, spatial knowledge is compared by creating a neighborhood graph of the atomic gestures. The resulting method is fully unsupervised. The various experiments conducted in our smart home demonstrate the feasibility of using the method for coarse- or fine-grained ADLs. Finally, the method does not require any fine-tuning to work properly, which is a strong advantage for generalization to new smart home infrastructures.

2. Related Work

The scientific literature related to activity recognition in the context of smart homes for seniors is quite extensive [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,34,35,36,37,38,39,40,41,42,43,44,45,46,47]. Generally speaking, the problem of ADL recognition in smart homes relates to a fundamental question in AI [34]: How to identify the actual ongoing activities, as well as their progress (steps), so that this identification can be used in a decision-making assistive process? Usually, a recognition system receives a series of observations (which are low-level inputs from sensors), interprets these inputs (extracting and understanding the information), and subsequently employs an algorithm to align the observed input sequence with one of the activity models (signatures) stored in a knowledge base [12]. The literature can be divided into three main categories of technologies, which are typically used for activity recognition: vision-based approaches, recognition using wearable sensors, and recognition exploiting ambient sensors.
Camera-based approaches constitute one of the most prolific fields of research since the rise of deep learning models [39]. Recently, there have been notable advancements in employing deep learning models for the automatic extraction of features from input data, ranging from low-level to high-level features. This approach has shown significant improvements, particularly in the classification of large datasets, especially those based on vision. In light of this progress, the work of [40] introduces a Convolutional Neural Network (CNN) architecture as a deep learning model to recognize human actions in a smart home video dataset. They also compared the performance of their deep learning approach with previous works that utilized traditional machine learning methods on the same dataset. Their results indicate that the proposed deep learning model achieved an accuracy rate of 82.41% in classifying human activity. The limitation of this work is the low granularity of their recognition process. More recently, Garg et al. [41] proposed a vision-based human activity recognition model also using deep learning algorithms. Their proposal relies on a hybrid model composed of Long Short-Term Memory (LSTM) combined with Convolutional Neural Networks (CNNs). LSTM is a specialized form of recurrent neural network (RNN), uniquely tailored to handle long-term data dependencies. Concurrently, CNNs have established themselves as high-performing tools for image classification within the realm of deep learning algorithms. Recognizing the limitations of LSTM in classifying static images, they proposed a hybrid CNN-LSTM model. This model leverages CNNs for initial feature extraction, followed by feeding these features to LSTM in a sequential manner via a time-distributed layer. They applied this hybrid model to classify six high-level activities (boxing, jogging, etc.) from two datasets, achieving accuracies of 96.24% and 93.39% on the KTH and Weizmann datasets, respectively. Additionally, they conducted two separate implementations of the CNN and LSTM models on these datasets, employing identical parameters to those used in the hybrid model, to assess their individual impacts on accuracy and loss. While the model shows good performance, its limitation lies in its exclusive focus on high-level activities. Also, the chosen activities were quite different from one another, making them easy to distinguish. In a smart home, for instance in the case of cooking, activities are often similar (e.g., making tea versus making coffee). Closer to our proposal, some works tried to address the specific issue of recognizing low-level actions (fine-grained recognition) in smart homes and hand gestures. A good example is the work of Nguyen-Dinh et al. [42], which proposes recognizing the hand parts in a depth hand silhouette using a camera-based system. Their team created a database of synthetic hand depth silhouettes and their corresponding labeled hand-part maps and then trained a Random Forest (RF) classifier with the database. They obtained interesting results, but the approach is limited in the precision of the information (only hand parts are recognized and not the action carried out by the hand). The main drawback of all these camera-based approaches is that, in an assistive context for smart homes, users (most often elders) are often very reluctant to accept cameras in their homes for privacy reasons.
Another type of approach concerns models based on wearable sensors, such as wristbands. In their recent work, Jabla et al. [10] introduced an innovative knowledge-driven activity recognition framework implemented on smartphones, which can be considered wearable devices. Their framework aims to enhance recognition accuracy and improve people’s quality of life within dynamic environments in real time. Specifically, they advocate for a knowledge-driven approach that incorporates ontology-based context evolution and dynamic decision-making, enabling the accurate recognition of new and unfamiliar activities. They validated the effectiveness of the framework using a publicly available activity recognition dataset, demonstrating its superiority over data-driven baseline approaches in terms of accuracy. The experimental findings highlight that their framework not only enhances accuracy but also facilitates the effective learning of activities when encountering unknown scenarios in real time. In another study, Yuan et al. [18] tried to harness recent advancements in self-supervised learning techniques by using the UK Biobank accelerometer dataset (a repository of 700,000 person days of unlabeled data), aiming to construct models with significantly enhanced generalizability and accuracy. They argue that the resultant models consistently surpass robust baseline models across eight benchmark datasets, exhibiting a relative improvement in F1 scores ranging from 2.5% to 130.9% (with a median of 24.4%). Notably, their findings demonstrate robust generalizability across various external datasets, participant cohorts, living environments, and sensor devices, a significant departure from previous findings. Finally, our team recently exploited photoplethysmography sensors [43] to recognize hand gestures using standard machine learning techniques with a standard Random Forest algorithm. The machine learning approach leverages a dataset collected from a wristband equipped with an accelerometer and photoplethysmography sensors. To construct this dataset, we defined a series of atomic cooking gestures performed by participants. The collected data have been labeled and will be made available to the scientific community. We achieved promising results, with an accuracy of 94%. The limitation of all these wearable approaches is that the user has to wear devices. Therefore, the person may wear the device incorrectly, and this is considered somewhat intrusive [7]. Also, none of these previous contributions included spatial aspects in their models.
Finally, the last type of approach relies on ambient sensors distributed throughout the user’s environment [5]. These approaches use a large variety of sensors such as RFID, infrared sensors, pressure sensors, UWB radars, etc. Some works, such as the one of Fan et al. [8], introduce an expanded framework designed to integrate various sensors with different communication protocols. Their focus lies in conceptualizing activity recognition as a cloud-based service. Consequently, this framework facilitates the direct implementation of various applications, including home automation, environmental and health management, and advanced risk prediction. They demonstrated experimentally that their approach, using many sensors, can achieve a theoretical accuracy of 93%. More specifically, in the field of recognition with RFID antennas, Li et al. [24] introduced a novel tag trajectory index method aimed at resolving the issue of localized tag trajectories between RFID readers. They also presented a tag identification model based on RFID readers equipped with multiple antennas, offering a more cost-effective and efficient deployment solution. Additionally, they proposed a multi-antenna optimization control scheme capable of automatically adjusting the number of operational antennas to meet real-world demands, thereby enhancing recognition efficiency. Furthermore, they introduced an indoor tag trajectory reasoning data structure called OPTR-tree, facilitating the fast recognition of multiple tag tracks in indoor environments with room-level accuracy. Both the theoretical analysis and experimental findings validated that the proposed multi-antenna model enhances the reliability of recognizing multiple tags within building premises. As for our team [31], we presented a method that uses the signal strength indication of RFID antennas with statistical features to perform relative positioning in a smart home. The goal of the proposed method was to enable the tracking of most objects inside a smart home in real time, allowing for activity recognition based on this tracking. This work also introduced a new dataset of 4,100,000 RFID readings collected in a real full-scale smart home setting. The dataset was made available to the community. The method showed an accuracy of 95.5%, which is similar to previous works but requires a fifth of the time to compute. In the end, it is clear that ambient distributed sensor approaches are the most suited to the context of Ambient Assisted Living (AAL) and that they can give good results. However, only a few works in this context specifically addressed fine-grained gesture recognition, and very few tackled the issue of integrating spatial reasoning to enhance the assistive potential of the model [9,11].
In summary, most of these existing approaches encounter a significant limitation by operating at a very low granularity, restricting their capability to differentiate between broad categories of activities (such as morning routines, washing, meal preparation, etc.). This limitation arises from the predominant focus of prior research on monitoring user routines and discerning patterns within them, rather than prioritizing real-time assistance. Notably, this trend is exemplified in the well-known works of Diane Cook and her team from Washington State University [35,36,37,38], who have dedicated over a decade to exploring the ADL recognition problem from various perspectives. Their efforts have yielded diverse approaches for identifying high-level ADLs from low-level ambient sensor data, employing techniques such as Decision Tree (DT) classifiers, Naïve Bayes Classifiers (NBCs), Random Forests (RFs), and Support Vector Machines (SVMs) [35]. Additionally, they have delved into multi-resident ADL recognition [38] and recently introduced a method for finely discerning activity start and end times [36]. While this advancement provides valuable temporal information, the issue of recognizing very specific low-level actions (steps) remains unaddressed, as does the consideration of spatial relationships between objects and users. In contrast, our approach distinguishes itself by prioritizing the recognition of low-level actions (steps) within the context of Activities of Daily Living, leveraging spatial relationships to enhance understanding.

3. The Smart Home Setup

Our laboratory, the Laboratoire d’Intelligence Ambiante pour la Reconnaissance d’Activité (LIARA), has been at the forefront of research and development in the domain of smart homes aimed at facilitating aging in place for almost two decades [12,21,23,25,29]. Over the years, our research has focused on creating intelligent environments that seamlessly integrate pervasive sensing technologies with user-friendly interfaces to enhance the quality of life of older adults. The smart home prototype utilized in this project represents the culmination of years of research and development efforts. It incorporates a diverse range of pervasive sensing technologies, carefully selected to provide comprehensive monitoring and assistance capabilities. These technologies include electromagnetic contacts, temperature sensors, passive infrared motion detectors, flow switches, tactile mats, light sensors, ultrasonic sensors, passive RFID, and UWB radars. Figure 1 illustrates the layout of the testing bed, showcasing the deployment of these sensors throughout the living space.
While the smart home prototype utilized in this project leverages a multitude of sensing technologies, it is important to note that our laboratory possesses a broader spectrum of advanced tools and resources. For instance, technologies such as the power analyzer, which is typically installed on the main electrical panel, and UWB sensors, offer additional capabilities for monitoring and analysis [25]. Furthermore, our lab is equipped with interactive devices designed to facilitate seamless communication and interaction with users. These include screens, speakers, tablets, and various robotic platforms [23]. Our ongoing commitment to innovation and research ensures that our smart home solutions remain at the forefront of technological advancement. By continually exploring new sensor technologies, refining user interfaces, and enhancing interaction modalities, we strive to create intelligent environments that not only support aging in place but also promote independence, safety, and well-being for older adults.

3.1. Data-Driven Orientation

As highlighted in the Introduction, knowledge-driven approaches encounter significant challenges due to the necessity of constructing complex libraries of Activities of Daily Living (ADLs) by human experts, rendering them intricate to implement in larger-scale projects [16]. Moreover, these models often lack the flexibility to adapt to the diverse profiles and behaviors of individual residents [10]. Recognizing these limitations, a substantial portion of the research community, including our team, has shifted towards data-driven approaches [21]. However, in our specific context, the utilization of existing machine learning algorithms poses challenges. Straightforward application is not feasible, and even the adoption of unsupervised methods, which hold considerable promise for our objectives, presents significant hurdles. Furthermore, the deployment of smart home environments generates vast quantities of data. For instance, within the LIARA smart home, binary sensors alone produce approximately one million raw data entries per day, and when RFID tracking is incorporated, this figure increases by up to 5 million, depending on the number of objects monitored.
While numerous methodologies exist for handling large datasets, the unique characteristics of smart home data present distinct challenges. The majority of collected data often lack meaningful events, as most of the time, the home environment remains inactive. Even during periods of activity, only a fraction of the data capture relevant information pertaining to ADLs. Thus, the challenge lies in identifying and extracting the pertinent knowledge embedded within this vast and largely inconsequential dataset. Consequently, the raw data corresponding to ADLs are similar to the raw data corresponding to idle time (or no activity ongoing). This property makes it difficult to automatically extract interesting frequent patterns, since such patterns are, in fact, not frequent at all. Researchers often partially address this issue by transforming the data into an event-based representation, storing only the significant changes in states [8]. The main advantage of the method is that the dataset is easy to understand and process, and the risk of losing information is limited for simple sensors (e.g., binary sensors). However, the story unfolds differently for analog sensors or other complex technologies such as ultrasound, laser range scanner, and RFID. Defining events is harder since the value of these sensors varies with every reading. In that case, the threshold must be set, and it increases the risk of losing information [16]. The activity recognition method presented in this paper uses only RFID and the binary sensors of the smart home (electromagnetic contacts, passive infrared, tactile mat, light detection, and flow sensors). RFID is therefore the only technology causing real challenges in the data collection phase. Figure 2 shows a sample of the binary sensors’ dataset.
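To make this transformation concrete, the following minimal sketch (using an invented reading format, not our actual database schema) converts a raw stream of periodic binary sensor readings into the event-based representation discussed above, keeping only state transitions.

```python
# Minimal sketch (hypothetical data format): compress a raw stream of periodic
# binary sensor readings into an event-based representation that only keeps
# state transitions, as discussed above.
from typing import Iterable, List, Tuple

Reading = Tuple[str, float, bool]      # (sensor_id, timestamp, state)
Event = Tuple[str, float, bool, bool]  # (sensor_id, timestamp, old_state, new_state)

def to_events(readings: Iterable[Reading]) -> List[Event]:
    last_state = {}
    events = []
    for sensor_id, timestamp, state in readings:
        previous = last_state.get(sensor_id)
        if previous is None:
            last_state[sensor_id] = state          # first reading: remember it, no event
        elif state != previous:
            events.append((sensor_id, timestamp, previous, state))
            last_state[sensor_id] = state          # store only the transition
    return events

# Example: a passive infrared sensor polled every 100 ms produces a single event
# when it switches from False to True, regardless of how many raw readings occur.
raw = [("PIR_kitchen", t / 10, t >= 12) for t in range(20)]
print(to_events(raw))   # [('PIR_kitchen', 1.2, False, True)]
```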

3.2. Machine Learning with Qualitative Spatial Reasoning

Our laboratory has been integrating knowledge from research on qualitative spatial reasoning (QSR) into smart environment research for some time now [44]. This paper presents our latest advances regarding the development of a fully unsupervised activity recognition algorithm exploiting QSR. Specifically, it presents a key component of the system that enables fine-grained activity recognition. Overall, this integrates into a five-step process. First, the raw data are collected from sensors in the smart home. The binary sensor readings are stored as events directly in a data warehouse. Second, the data collected from all passive RFID tags, in the form of Received Signal Strength Indication (RSSI), are transformed using a localization algorithm based on a machine learning technique [29]. At this stage, it is important to clarify that RFID is used to track objects in the smart home, not persons. Thus, the second step transforms raw RSSI values emanating from all the objects of the smart home into Cartesian position values. Third, the positions of the objects are transformed into qualitative movements composed of a direction and a traveled distance value. This step combines the use of Clementini et al.’s [32] framework and a new gesture recognition algorithm developed in our lab, which is the main contribution of this paper: AtomGID. Fourth, the binary dataset and the spatial knowledge are used in a version of flocking specifically adapted for clustering [45]. The algorithm exploits the relationships of the spatial information to build clusters representing the ADLs. Fifth, these clusters are exploited for activity recognition in the smart home. Figure 3 shows the overall method.

4. Spatial Knowledge

Unlike ambient sensors, RFID positions amount to a large data warehouse if collected blindly for learning. The positions are collected for every object of the smart home every 100 ms and, without any sort of post-filtering, it is very likely that all the positions will change at every localization iteration. Hence, the dataset of positions is intrinsically noisy. Extracting meaningful information can therefore be challenging. In addition, the positions are bound to the smart home configuration, and, consequently, any classification performed directly on them is bound to fail at generalizing to new smart homes.
Fortunately, a whole field of research has been working on extracting significant information from spatial data over the last four decades. Qualitative spatial reasoning (QSR) embeds a wide range of formal models to extract qualitative spatial features [33]. While it is very rarely used for smart homes, our past research has shown that such high-level knowledge can significantly improve the process. There are several properties that can be exploited with spatial data. Our algorithm focuses on the movement of objects since it seems to be the best discriminating indicator for ADL recognition. In that regard, a gesture recognition system was designed.

4.1. Gesture Primers

Gesture recognition is widely studied in human–computer interaction. It is usually performed in order to interact with software, both for enjoyment (games, dynamic presentations, etc.) and for faster or adapted commands. In general, a gesture is defined as an expressive and meaningful body motion (hand, face, arms, etc.) that conveys a message or, in technical terms, embeds important information of a spatio-temporal nature. The process of gesture recognition is usually divided into four main steps [42]: dataset segmentation, filtering of the segmented data, limitation of the directions, and matching the direction sequence against a knowledge base. In the context of our research, the meaning behind the gesture is actually irrelevant. Instead, a gesture can be regarded as a feature for machine learning. Gestures are, therefore, decomposed into a set of atomic gestures, which represent the simplest form of qualitatively significant movements performed through the use of objects. As opposed to the literature on gesture recognition, which focuses on the fourth step (matching the sequence), our challenge lies mostly in the first three steps.

4.2. Segmentation of the Data

Segmenting the dataset into atomic gestures is the first step of the process. The literature on gesture recognition often assumes this step is either trivial or given (through the use of a device such as a mouse, for example). Due to our noisy and continuous data stream, it is, however, actually the biggest challenge to overcome with the new system. Segmentation becomes even harder since, in most of the spatio-temporal series, the objects are actually idle. Indeed, there are several objects in the smart home, but the user moves only a few at a time. Nevertheless, the objects always appear to be moving from a purely tracking standpoint; that is, they are never at the exact same position twice in a row. The localization produces an average error of 14.12 cm in a constrained environment (i.e., small space, a few objects, four antennas). This discrepancy does not follow a normal distribution. Hence, a dataset made of only idle positions could still embed some patterns.
Finally, the segmentation phase is also conditional on two subjective variables. The first one is the actual performance of the user. As opposed to the literature, we are in a keyhole context; therefore, the resident is not intentionally performing the gesture. The second one is the granularity of the atomic gestures. Variations in granularity can lead to drastically different results. Figure 4 illustrates three interpretations, depending on the granularity, of the same example dataset.

4.3. Qualitative Spatial Reasoning

After segmentation usually comes the step of filtering the data. In our case, the filtering is performed by the tracking algorithm and is less interesting at this stage (see one of our previous papers [29]). The next challenge lies in the spatial representation of the data. It is important to limit the number of possible basic directions, since a qualitative representation is desirable. The QSR framework by Clementini et al. [32] provides a scalable and simple way to represent distance and orientation relationships between entities. Scalability is important in order to obtain a model that does not depend on the localization method and technology used. Moreover, the model enables various granularity levels, which suits our situation well. While the framework could easily evolve in the future, the number of basic directions was fixed to eight in this project. These eight basic qualitative directions are D = {E, NE, N, NW, W, SW, S, SE}, which stand, respectively, for East, Northeast, North, Northwest, West, Southwest, South, and Southeast.
To adequately exploit the framework of Clementini et al. [32], a qualitative unit of distance has to be defined. This part was trickier than defining the qualitative orientation (our basic directions) since it bounds the granularity of the recognizable gestures. The distance is, accordingly, defined as a function of the average localization error ε = 15 cm. The qualitative distances are therefore expressed as a number of ε units. Figure 5 shows a sample atomic gesture described in the QSR framework. The atomic gesture is the pair (E, ε_3). The benefit of this unit is that it would gracefully allow the gestures to evolve if, for example, the localization method were to become more accurate in the future.
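To make the representation concrete, the minimal sketch below (our own illustration, not the implementation used in the smart home) encodes a raw displacement as the pair used above: one of the eight basic directions and a distance expressed as a number of ε units, with ε = 15 cm.

```python
import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]  # 8 basic directions, 45 degrees apart
EPSILON = 0.15  # average localization error, in metres (15 cm)

def qualitative_gesture(dx: float, dy: float, epsilon: float = EPSILON):
    """Encode a displacement (dx, dy) as (direction, number of epsilon units)."""
    distance = math.hypot(dx, dy)
    if distance <= epsilon:
        return ("Idle", 0)                       # below the error radius: no movement
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    index = int((angle + 22.5) // 45) % 8        # snap the angle to the closest of 8 sectors
    return (DIRECTIONS[index], round(distance / epsilon))

print(qualitative_gesture(0.45, 0.02))  # ('E', 3) -> the (E, epsilon_3) example above
```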

4.4. Atomic Gesture Identifier

In order to perform this task, a recursive algorithm, named the Atomic Gesture IDentifier (AtomGID), was created by our team. The algorithm takes as input a dataset of positions and extracts the atomic gestures. A description of this algorithm can be seen below (Algorithm 1). AtomGID takes two inputs: the average error ε and a list of positions L_p = {(x_n, y_n), ..., (x_m, y_m)}. In that list, the index n marks the beginning of the list, and m is the final index in terms of iterations. The algorithm can take an input list of varying size and deals with the data stream. As for the output, the algorithm creates a list of atomic gestures L_α. Each gesture α inside that list is composed of a pair of values: a direction (D_x) and a distance (ε_y). Those values are described in qualitative terms according to the model defined previously. Supplementary information is needed in order for the algorithm to work. This element, φ, is a correlation coefficient and is used to evaluate the certainty of the recognition. Therefore, an atomic gesture α is a structure <D_x, ε_y, φ>. As a side note, in the actual implementation of the spatial data mining algorithm, AtomGID is only called on the sequence of positions from an active object (as opposed to all objects). An ad hoc algorithm was implemented to classify, in real time, whether an object is active or idle from the changes in all the objects. Only the varying positions of the active object are streamed to AtomGID.
Algorithm 1. Atomic Gestures Identifier.
Input:  List of positions L_p = {(x_n, y_n), ..., (x_m, y_m)}; average error ε
Output:  List of atomic gestures L_α;
       a gesture α is a structure <D_x, ε_y, φ>
Compute the radius of the smallest enclosing circle of L_p → δ
If δ ≤ ε or |L_p| < 20, return L_α = (<(Idle, ε_0), 0>)
Call AtomGID(L_p([x_n, y_n], [x_{m/2-1}, y_{m/2-1}]), ε) → R_l
Call AtomGID(L_p([x_{m/2}, y_{m/2}], [x_m, y_m]), ε) → R_r

Compute CorrelationCoefficient(L_p) → φ
Compute BestLinearRegression(L_p) → σ
Find QualitativeDirection(σ, vector p_n p_m) → dir
Call Combine(Last(R_l), First(R_r), <dir, φ>) → L_c

Return L_α = R_l(1, Last(R_l) − 1) + L_c + R_r(2, Last(R_r))

4.4.1. Smallest Enclosing Circle

The first step of AtomGID is to compute the smallest enclosing circle of the input list of positions. There are several methods to compute this circle, but the simplest is executed in O(n⁴), which is not desirable in our context. The geometric method is implemented in AtomGID. The method is performed in four steps. Step 1: Draw an enclosing circle of center c. Step 2: Reduce the size of the circle by finding the point a farthest from the center of the circle. Draw a circle with the same center that passes through point a. Step 3: If the circle does not pass through two or more points, make the circle smaller by moving the center towards point a until the circle contacts another point b from the set. If the circle does not contain a point-free arc interval greater than half the circle’s circumference, the algorithm stops. Step 4: Take d and e, the points at the ends of the longest point-free interval, and reduce the circle until one of the following conditions is met:
I.
The diameter is the distance between d and e;
II.
The circle touches another point f from the set.
a.
If no such point-free arc interval exists, then end;
b.
Else go back to step 4 and repeat the process.
The geometric approach has an O(n²) time complexity. Step 1 is constant, and finding point a in step 2 is conducted by passing over all the points once: O(n). Step 3 can also be performed in O(n). To find point f, it is necessary to test the n − 2 remaining points. However, each time, we must verify that the remaining n − 3 points are still in the enclosing circle. That last step is accomplished in O(n²).
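For reference, the expected linear-time alternative mentioned in Section 4.4.6 can be implemented compactly. The sketch below uses the classic randomized incremental (Welzl-style) construction rather than the geometric shrinking method described above; it is provided only to illustrate how the radius δ used by AtomGID’s stop condition can be obtained.

```python
import random
from math import hypot

def _circle_two(a, b):
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    return (cx, cy, hypot(a[0] - cx, a[1] - cy))

def _circle_three(a, b, c):
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:  # (nearly) collinear points: fall back to the widest pair
        return max((_circle_two(p, q) for p, q in ((a, b), (a, c), (b, c))),
                   key=lambda circ: circ[2])
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy, hypot(ax - ux, ay - uy))

def _inside(circ, p, eps=1e-9):
    return hypot(p[0] - circ[0], p[1] - circ[1]) <= circ[2] + eps

def smallest_enclosing_circle(points):
    """Expected O(n) after shuffling; returns (center_x, center_y, radius)."""
    pts = list(points)
    if not pts:
        raise ValueError("empty point list")
    random.shuffle(pts)
    circ = (pts[0][0], pts[0][1], 0.0)
    for i, p in enumerate(pts):
        if _inside(circ, p):
            continue
        circ = (p[0], p[1], 0.0)
        for j in range(i):
            if _inside(circ, pts[j]):
                continue
            circ = _circle_two(p, pts[j])
            for k in range(j):
                if not _inside(circ, pts[k]):
                    circ = _circle_three(p, pts[j], pts[k])
    return circ

# Example: delta = smallest_enclosing_circle(positions)[2]; the object is treated
# as idle when delta <= epsilon (see the stop condition in Section 4.4.2).
```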

4.4.2. Recursivity

The recursion of AtomGID has stop conditions at the beginning of the algorithm. The recursion will stop if the input list of positions is shorter than twenty elements. Performing gesture recognition on too small a number of points would result in lower accuracy, since it would increase the chance of noise being mistaken for genuine gestures. The second stop condition uses the smallest enclosing circle obtained previously. If the radius of this circle is smaller than or equal to the average localization error ε, then the list of positions does not represent significant movement and, therefore, it is not relevant for our data mining. When either of these two conditions is met, the recursion stops. AtomGID then creates a new list of gestures and adds a single idle gesture. Otherwise, AtomGID is called recursively twice, each time with half of the list of positions as its argument.

4.4.3. Correlation Coefficient

The Pearson product–moment correlation coefficient is computed to evaluate how representative the inferred direction is of the data. It is used in combination with the linear regressions explained in Section 4.4.4. In our algorithm, the correlation coefficient is crucial for the segmentation of the atomic gestures. It is necessary for comparing the found gestures when trying to combine the results of the recursive calls. It is also used to map the current dataset to the appropriate qualitative direction. The coefficient, denoted by φ, is evaluated using Equation (1).
\varphi = \frac{n\sum_{i=0}^{n} x_i y_i - \sum_{i=0}^{n} x_i \sum_{i=0}^{n} y_i}{\sqrt{\left(n\sum_{i=0}^{n} x_i^2 - \left(\sum_{i=0}^{n} x_i\right)^2\right)\left(n\sum_{i=0}^{n} y_i^2 - \left(\sum_{i=0}^{n} y_i\right)^2\right)}} (1)
The value of φ lies between −1 and +1 depending on the degree of correlation. A value far from zero expresses a better correlation between the direction and the data. However, a threshold still needs to be set to make a decision. In our experiments, the threshold was set empirically by generating small sets of data points with representative noise from a stationary object. Idle objects usually returned a value of φ < 0.5.
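As an illustration, Equation (1) translates directly into a few lines of code. The example data below are invented, and the point set for the moving object is assumed to have already been rotated so that its direction lies near π/4 (as carried out in Section 4.4.4), which is what makes the coefficient informative.

```python
import math
import random

def correlation_coefficient(positions):
    """Pearson product-moment correlation (Equation (1)) over a list of (x, y) positions."""
    n = len(positions)
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return (n * sxy - sx * sy) / denominator if denominator > 1e-12 else 0.0

# Invented example: an idle object (pure noise) versus a diagonal movement.
random.seed(1)
idle = [(random.gauss(0, 0.05), random.gauss(0, 0.05)) for _ in range(30)]
moving = [(0.02 * i + random.gauss(0, 0.02), 0.02 * i + random.gauss(0, 0.02)) for i in range(30)]
print(round(correlation_coefficient(idle), 2))    # typically close to 0 -> treated as idle
print(round(correlation_coefficient(moving), 2))  # close to +1 -> a genuine movement
```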

4.4.4. Qualitative Direction

From a set of positions, the qualitative direction is found by performing linear regressions. Each regression results in a linear function of the form y = ax + b. The unknown constants a and b are found from L_p by exploiting Equations (2) and (3). One regression is computed per pair of opposite qualitative directions. For four directions, two equations are found; for eight directions, four equations are found, and so on. For each candidate pair of directions, the point set is rotated so that the candidate pair is aligned with π/4. The rotation is easily carried out using the rotation matrix (4) and is performed in O(n). As a side note, all the linear regressions can be performed simultaneously with all the rotations to keep the overall complexity in O(n). This optimization is particularly useful if one expects to deal with a large input list of points. Once the equations are computed, obtaining the qualitative direction is straightforward. First, the candidate pairs are evaluated using their correlation coefficients. The one with the highest correlation is selected. Once the regression step has been completed, the equation still allows two possible opposite directions. Only one of the two directions can explain the movement. It is determined by finding the vector p_n p_m from the list L_p = {p_n(x_n, y_n), ..., p_m(x_m, y_m)}. The vector is given by Equation (5), where ι denotes the number of positions averaged at each end of the list. The vector is not precise enough to compute the atomic gesture itself, but it is good enough to discriminate between the potential outcomes. The signs of the vector components are simply compared with the quadrant of the Cartesian plane in which the direction is situated.
a = \frac{|L_p| \sum_{i=0}^{|L_p|} x_i y_i - \left(\sum_{i=0}^{|L_p|} x_i\right)\left(\sum_{i=0}^{|L_p|} y_i\right)}{|L_p| \sum_{i=0}^{|L_p|} x_i^2 - \left(\sum_{i=0}^{|L_p|} x_i\right)^2} (2)
b = \frac{\sum_{i=0}^{|L_p|} y_i}{|L_p|} - a\,\frac{\sum_{i=0}^{|L_p|} x_i}{|L_p|} (3)
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\left(\frac{\pi}{4} D_\theta\right) & -\sin\left(\frac{\pi}{4} D_\theta\right) \\ \sin\left(\frac{\pi}{4} D_\theta\right) & \cos\left(\frac{\pi}{4} D_\theta\right) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} (4)
\vec{p_n p_m} = \left(\frac{1}{\iota}\sum_{i=m-\iota}^{m} x_i - \frac{1}{\iota}\sum_{i=1}^{\iota} x_i,\ \frac{1}{\iota}\sum_{i=m-\iota}^{m} y_i - \frac{1}{\iota}\sum_{i=1}^{\iota} y_i\right) (5)
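A simplified illustration of how Equations (2)–(5) work together for the eight-direction case is sketched below. It is written for this description only (it is not the implementation used in AtomGID) and assumes positions expressed in metres.

```python
import math

# Opposite direction pairs, ordered by the angle of their axis (0, 45, 90, 135 degrees).
DIRECTION_PAIRS = [("E", "W"), ("NE", "SW"), ("N", "S"), ("NW", "SE")]
CENTERS = {"E": 0, "NE": 45, "N": 90, "NW": 135, "W": 180, "SW": 225, "S": 270, "SE": 315}

def rotate(points, angle):
    """Equation (4): rotate every point by 'angle' radians around the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def regression_and_phi(points):
    """Equations (1)-(3): slope a, intercept b of the best-fit line, and Pearson phi."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    denom_a = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom_a if abs(denom_a) > 1e-12 else 0.0
    b = sy / n - a * sx / n
    denom_phi = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    phi = (n * sxy - sx * sy) / denom_phi if denom_phi > 1e-12 else 0.0
    return a, b, phi

def qualitative_direction(points, tail=5):
    """Pick the best axis by correlation, then disambiguate the sense with Equation (5)."""
    best_pair, best_phi = DIRECTION_PAIRS[0], -1.0
    for k, pair in enumerate(DIRECTION_PAIRS):
        # Rotate so that the candidate axis (at k * 45 degrees) lies on the pi/4 diagonal;
        # data aligned with that axis then yield a strongly positive correlation.
        rotated = rotate(points, math.pi / 4 - k * math.pi / 4)
        _, _, phi = regression_and_phi(rotated)
        if phi > best_phi:
            best_pair, best_phi = pair, phi
    # Equation (5): mean of the last 'tail' points minus mean of the first 'tail' points.
    t = max(1, min(tail, len(points) // 2))
    vx = sum(p[0] for p in points[-t:]) / t - sum(p[0] for p in points[:t]) / t
    vy = sum(p[1] for p in points[-t:]) / t - sum(p[1] for p in points[:t]) / t
    angle = math.degrees(math.atan2(vy, vx)) % 360
    # Keep the member of the pair whose nominal angle is closest to the displacement vector.
    direction = min(best_pair,
                    key=lambda d: min((angle - CENTERS[d]) % 360, (CENTERS[d] - angle) % 360))
    return direction, best_phi

# Example: a noisy but mostly northward trajectory should yield ('N', phi close to 1).
track = [(0.01 * ((-1) ** i), 0.03 * i) for i in range(30)]
print(qualitative_direction(track))
```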

4.4.5. Combining the Recursive Calls

There is only one step that remains to be clarified in AtomGID: how are the results of the two recursive calls combined? A function named Combine, taking three parameters as input, is used. The parameters are atomic gestures from the results of both recursive calls and the gesture of the current call. Since the results from both the left and right children are lists of atomic gestures, only the last and first elements, respectively, are considered. The Combine function has three potential outputs. It will either result in a simple idle gesture, in the current gesture returned as a list, or in the list composed of the two children L_c = {Last(R_l), First(R_r)}. The decision is made by comparing the qualitative directions and the coefficient φ. While the implementation is a bit longer, these few inference rules represent the idea:
A.
If children are identical (including idle) → current
B.
If children are different but not idle
a.
If the average of their φ > current φ → two children
b.
Else → current
C.
If one child is idle
c.
If non-idle child’s φ > current φ → two children
d.
Else → current
Finally, AtomGID returns a list of gestures composed of the list returned by the left child (minus its last element) at the beginning, the result of the Combine function in the middle, and the list returned by the right child (minus its first element) at the end.
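To make the decision logic concrete, the inference rules above can be written as a small function. The gesture representation below (a (direction, distance, φ) tuple) follows the structure defined in Section 4.4, but the code itself is an illustrative sketch rather than the exact implementation.

```python
from typing import List, Tuple

Gesture = Tuple[str, int, float]  # (direction D_x, distance in epsilon units, phi)

def combine(last_left: Gesture, first_right: Gesture, current: Gesture) -> List[Gesture]:
    """Rules A-C: decide whether to keep the parent's gesture or the two children."""
    d_left, _, phi_left = last_left
    d_right, _, phi_right = first_right
    _, _, phi_current = current

    # Rule A: children with identical directions (including two idles) -> keep the current gesture.
    if d_left == d_right:
        return [current]

    # Rule B: two different, non-idle children.
    if d_left != "Idle" and d_right != "Idle":
        if (phi_left + phi_right) / 2 > phi_current:
            return [last_left, first_right]
        return [current]

    # Rule C: exactly one child is idle -> trust the non-idle child if it is better correlated.
    active = last_left if d_left != "Idle" else first_right
    if active[2] > phi_current:
        return [last_left, first_right]
    return [current]
```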

4.4.6. Discussion

There are several advantages to the gesture recognition algorithm we developed. Section 6 presents the experiments that were performed to validate its proper behavior and, as the reader will see, it is very reliable despite the difficult context in which it is used. The performance of the algorithm is also very important in order to use it in real time on data streams. It can be evaluated using the master theorem (see Equation (6)), knowing that the number of recursive calls a and the division factor b are both equal to two. The complexity of the work performed in each call comes from four steps. The regression is performed in O(n). The calculation of the vector is constant, and the correlation coefficient also requires one scan of the list of positions: O(n). The smallest enclosing circle is the most expensive step: O(n²); therefore, the function is f(n) = O(n²). The theorem says that whenever f(n) = O(n^k) and a < b^k (here, a = 2 < b^k = 2² = 4), the complexity is equal to O(f(n)). The global complexity is therefore O(n²). Note that a better version of the smallest enclosing circle, achieved in O(n), exists.
T(n) = 2\,T\!\left(\frac{n}{2}\right) + O(n^2) (6)
The main reason why AtomGID is used in this research project is not its performance, however. RFID, in our context, is particularly difficult to fully exploit since it generates a large amount of noisy data every day. To illustrate our point, suppose that 20 objects are tracked for a day every 100 ms. The number of positions generated is 864,000 × 20 = 17.28 million. If only one object is moving at a time, only the atomic gestures of that specific object need to be kept. If the gestures take at least 1 s, the maximum number of gestures we need to keep is 86,400, which represents a 99.5% reduction. Moreover, the algorithm is less dependent on the absolute disposition of the smart home, and the qualitative framework is fully extensible. Finally, compared to a deep-learning-based approach to gesture recognition, AtomGID is actually traceable and explainable. As a logical model, it is fully transparent and easy to reproduce.

5. Clustering with Flocking

Now that the foundation of our method has been set, only the last part is left to discuss. One may want to keep in mind that, at this point, a data warehouse containing information from all the sensors of the smart home and a dataset of atomic gestures are used. In accordance with the literature, the raw data from the binary sensors are processed into an event-based format. Event-based datasets are closely tied to time. They are lighter (requiring a fraction of the space) since they remove all duplicate information and instead only store transitional information (e.g., a passive infrared sensor went from false to true). It is assumed that our context is purely unsupervised: no labeled training set is available, and the number of ADLs to extract is unknown. For the validation of our model, the reader will notice that this is not actually the case, but the assumption should hold outside of the lab context. Consequently, it seems appropriate to design an unsupervised method that requires very little parametrization specific to the problem. In our past work, we have used flocking for that purpose, and we have shown that it works well in contexts where data are limited. In this section, we briefly summarize how to use it, but the savvy reader should keep in mind that (1) more details are available in our previous paper [45], and (2) at this point, a different unsupervised learning method could be used, including more advanced deep learning approaches [40].

5.1. Flocking Primers

Flocking [33] was created to simulate the behaviors of animals moving as a group. Using the three simple rules shown in Figure 6, autonomous agents interact with each other, resulting in an emergent and natural movement. The Alignment force (F_A) keeps the heading direction vector of an agent in line with the direction of its neighbors. It can be found simply by averaging the heading vectors of all the agents in sight. The second force, Separation (F_S), prevents a group of agents from converging to a single position. The force acts like a reverse magnet that repels the agent from its neighbors. To calculate it, the agent’s position vector (P_a) and all the neighbors’ position vectors (P_n) are required. The difference P_a − P_n is divided by its squared magnitude, and the sum over all neighbors is computed. Finally, the Cohesion force (F_C) is applied to the agent to make it seek the center of mass of its neighbors. The goal is to obtain a group of agents staying and moving together. To find that force, the average of the P_n is calculated. P_a is the agent’s position, M_s is the agent’s maximum speed (a predefined constant), V_a is the agent’s velocity, and P_n is a neighbor’s position vector.
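The three forces can be written compactly. The sketch below is a generic two-dimensional implementation of the rules summarized above (the vector handling and return conventions are our own choices for illustration and do not come from [33] or [45]); the resulting steering vectors would then be scaled by constants such as M_s before being applied to V_a.

```python
def alignment_force(agent_velocity, neighbor_velocities):
    """F_A: steer towards the average heading of the neighbors."""
    if not neighbor_velocities:
        return (0.0, 0.0)
    avg_x = sum(v[0] for v in neighbor_velocities) / len(neighbor_velocities)
    avg_y = sum(v[1] for v in neighbor_velocities) / len(neighbor_velocities)
    return (avg_x - agent_velocity[0], avg_y - agent_velocity[1])

def separation_force(agent_position, neighbor_positions):
    """F_S: repulsion from each neighbor, weighted by the inverse squared distance."""
    fx = fy = 0.0
    for nx, ny in neighbor_positions:
        dx, dy = agent_position[0] - nx, agent_position[1] - ny
        d2 = dx * dx + dy * dy
        if d2 > 1e-9:
            fx += dx / d2
            fy += dy / d2
    return (fx, fy)

def cohesion_force(agent_position, neighbor_positions):
    """F_C: seek the center of mass of the neighbors."""
    if not neighbor_positions:
        return (0.0, 0.0)
    cx = sum(p[0] for p in neighbor_positions) / len(neighbor_positions)
    cy = sum(p[1] for p in neighbor_positions) / len(neighbor_positions)
    return (cx - agent_position[0], cy - agent_position[1])
```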

5.2. Similarity and Dissimilarity

The three basic rules of flocking allow the agents to move naturally as a group. In order to use flocking as a clustering algorithm, it is necessary to modify it so that similar agents form a group and dissimilar agents avoid each other. Hence, two new forces are added: similarity and dissimilarity. These forces, described in [45], help the agents form clusters in an unsupervised manner without specifying the number of clusters we are looking for. The similarity of agents in clustering with flocking is primarily based on a distance measurement. Since the spatial information is qualitative, this function cannot work as is. An evaluation function becomes necessary to accomplish such a task. This evaluation function should be independent of the number of qualitative classes used (in our case, eight). An automatically generated graph can fill that role. Figure 7 represents the graph that was used in this project. The arcs between the nodes carry weights representing the similarity relation between directions. Those values are found by dividing the highest similarity value (100% similar) by half the number of different directions. The advantage of this method is that it creates a clear separation between the representation and the clustering algorithm. Consequently, the direction framework could be changed (to add more directions, for example), and the only necessary step would be to regenerate the neighborhood graph.
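Generating the neighborhood graph of Figure 7 can be automated for any number of directions. The sketch below follows the rule just described (the maximum similarity divided by half the number of directions gives the decrement per 45° step); it is our own illustration of that rule, not the exact code used in the project.

```python
# Build a similarity table between qualitative directions: adjacent directions are
# similar, opposite directions are maximally dissimilar. The decrement per 45-degree
# step is the maximum similarity (1.0) divided by half the number of directions.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def direction_similarity_graph(directions):
    step = 1.0 / (len(directions) / 2)            # e.g., 1.0 / 4 = 0.25 for 8 directions
    n = len(directions)
    graph = {}
    for i, a in enumerate(directions):
        for j, b in enumerate(directions):
            hops = min((i - j) % n, (j - i) % n)  # angular distance in 45-degree steps
            graph[(a, b)] = 1.0 - step * hops
    return graph

graph = direction_similarity_graph(DIRECTIONS)
print(graph[("E", "E")])   # 1.0  -> identical directions
print(graph[("E", "NE")])  # 0.75 -> one 45-degree step apart
print(graph[("E", "W")])   # 0.0  -> opposite directions
```

Regenerating the graph for a finer direction framework then only requires calling the function with the larger list of directions.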
To be more precise, let us quickly break down the process of adapting a clustering algorithm to exploit it with flocking behavior.
Step 1. We model the basic principles of flocking behavior with the three main rules: cohesion, alignment, and separation. As mentioned, these rules govern how agents (representing data points) move within a group to maintain cohesion while avoiding collisions with neighboring agents.
Step 2. We need to modify the flocking algorithm to suit the requirements of clustering. Instead of merely mimicking the natural movement of agents, the algorithm should encourage the formation of clusters where similar agents are grouped together while dissimilar agents are kept apart.
Step 3. We need to introduce the similarity and dissimilarity forces. Add two additional forces to the flocking algorithm: similarity and dissimilarity. The similarity force attracts agents that are similar to each other, encouraging them to cluster together. This force is based on a measure of similarity, such as distance or similarity scores between data points. Conversely, the dissimilarity force repels agents that are dissimilar, preventing them from clustering together. This force helps maintain separation between distinct clusters.
Step 4. We need to define the similarity and dissimilarity metrics, which means how similarity and dissimilarity between agents will be measured. This involves defining distance metrics or similarity/dissimilarity scores based on the characteristics of the data being clustered. In the context of qualitative spatial information, we use evaluation functions that assess similarity based on qualitative attributes, such as directional relationships.
Step 5. We need to construct a neighborhood graph. We created a neighborhood graph that represents the relationships between agents based on their similarity and dissimilarity. Each agent is represented as a node in the graph, and the edges between nodes are weighted to indicate the strength of the relationship between agents. The weights of the edges can be calculated based on the similarity/dissimilarity metrics defined earlier. For example, higher similarity scores result in stronger connections between nodes, while dissimilar nodes have weaker connections.
Step 6. We need to integrate all that with the flocking algorithm. Integrating the neighborhood graph into the flocking algorithm will influence the movement of agents. We then modify the cohesion, alignment, and separation rules to consider the forces of similarity and dissimilarity derived from the neighborhood graph. Agents should be attracted to similar neighbors while avoiding dissimilar ones, thereby forming clusters based on the underlying data similarity.
By following these steps, we adapted the clustering algorithm to exploit flocking behavior, allowing for the unsupervised formation of clusters based on the similarity of the data points. This approach provides a flexible and scalable method for clustering data in complex environments, taking advantage of the principles of collective behavior observed in natural systems.

6. Experiments and Results

The validation of AtomGID is then achieved by implementing each of the contributing modules in the smart home infrastructure described in Section 3. Since collecting a large amount of naturalistic RFID data for that purpose is challenging, we also considered simulation results. Moreover, our validation leverages several years of development and would not be possible without our previously developed and validated approaches. First, we leverage an RFID localization system improved incrementally over several publications by our team [29,30,31]. Then, we also reuse the clustering method published in [45] to evaluate it in the context of activity recognition in a smart home. This section describes the various experiments and results. In order to manage the noise coming from different objects (for instance, metal objects, large quantities of water, etc.) in passive RFID technology readings before data preprocessing, we deployed several strategies to improve the quality and reliability of the collected data. First, we optimized the antennas’ placement to ensure that RFID readers are strategically positioned to maximize signal strength and minimize interference. We performed several prior experiments with different reader configurations and orientations to identify the optimal placement for reliable data capture. Secondly, we used only high-quality antennas and calibrated them properly to optimize signal reception and minimize signal degradation. We adjusted antenna parameters such as gain, polarization, and beam width to enhance performance in our specific environment. Third, we selected the appropriate RFID frequencies (e.g., low-frequency, high-frequency, ultra-high-frequency) based on the specific application and environmental conditions. Different frequencies may exhibit varying levels of susceptibility to noise and interference, so we chose the most suitable frequency band for reliable data collection. Finally, we ensured that RFID tags were positioned correctly and oriented optimally for consistent and reliable reading. We experimented with tag placement and orientation to minimize shadowing, reflections, and other factors that may affect signal reception.

6.1. Standalone AtomGID Experiments

To evaluate the performance of AtomGID, the team first developed an experimental protocol using the trilateration algorithm and the smart home infrastructure at our lab. The experiments were conducted in the kitchen, where four RFID antennas enable a localization precision of 14.12 cm [31]. The gestures were made with four basic directions. The length of each atomic direction varied from 30 to 60 cm. The idle gesture was also added to the dictionary. Figure 8 shows the 13 gestures that were selected for the first experimental phase. This phase consisted of a scripted scenario in the smart home using the real RFID system. Participants were recruited to perform each gesture ten times. Lines were traced on the kitchen counter, and the participants were asked to follow them. Our lab is not equipped with technology enabling us to gather a ground truth dataset; therefore, we assumed that the guidelines were approximately followed (from visual observations). This resulted in a recognition accuracy of 77%. It may seem low, but the majority of the errors are due to segmentation and the insertion of an idle time between directions. Moreover, RFID tracking is more difficult than RFID localization, which might result in a lower accuracy in the trilateration itself.

6.1.1. Simulation with Eight Directions

In addition to the naturalistic experiments that were conducted at our lab, a simulator was developed to validate the algorithm on a larger number of gestures. The simulator uses a uniform distribution to generate noise in the gestures. It does not copy the behavior of the RSSI observed in a smart environment, which is quite challenging to approximate. In the smart home, the observed noise is not random, and therefore patterns can be distinguished depending on the localization zone. These patterns cannot be modeled since they depend on the exact configuration of the objects and persons in the smart environment. The results of any RFID simulation are always optimistic, but they give a good overview of the potential application within realistic contexts.
Several simulations were conducted to validate the algorithm. In this section, we present the most interesting one, which uses a continuous stream of eight randomly generated directions. The reader may first take a look at the confusion matrix of the overall simulations presented in Figure 9. Overall, 56,741 gestures were generated over approximately three days of computing. The overall accuracy averaged 93.23%. As the matrix shows, the only gesture lowering the accuracy is the idle gesture, with an average of 63.73%. Without it, the performance climbs to 99.75%. These results suggest that the transition between different gestures is very well detected. Obviously, as shown by the previous experiments in the smart home, the simulation is too optimistic compared to real RFID localization, but it gives a good overview of the capabilities of AtomGID.
One final experiment we performed with gesture recognition was to compare different correlation thresholds and different sizes for the idle gesture. With the simulator, the results were unambiguous: a lower correlation threshold drastically improves the recognition of the idle gesture. At a threshold of 0.25, the idle accuracy is 96% (below that value, it decreases again). It then drops to 88.4%, 75.34%, 64.6%, 52.38%, and 38.9% at thresholds of 0.3, 0.35, 0.4, 0.45, and 0.5, respectively. The accuracy of the other gestures is inversely correlated but more stable, varying from 96.4% at 0.25 to 99.75% at 0.45. The thresholds do not behave in the same way in the smart home, where noise is much harder to distinguish from gestures. This shows that the threshold is closely tied to how well an idle object can be distinguished, while having little effect on the overall recognition. Note that it is assumed that most objects are idle in the smart home and that only one object at a time can be active. Since an ad hoc algorithm performs this filtering, AtomGID is set to a higher threshold.
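The toy classifier below illustrates the role of this threshold. It uses the mean cosine similarity between per-sample displacements and ideal direction templates as a stand-in for AtomGID's correlation measure: when no direction scores above the threshold, the segment is labeled idle, so lowering the threshold favors motion over idleness. The exact AtomGID computation differs; this sketch, including its parameter values, only reproduces the trade-off discussed above.

```python
import numpy as np

DIRECTIONS = {
    "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
    "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
}

def classify_segment(points, threshold=0.5, eps=1e-9):
    """Label a window of positions as an atomic direction or 'idle'.

    The per-sample displacements are compared (cosine similarity) with
    each ideal direction; if no direction scores above the threshold,
    the object is considered idle.
    """
    disp = np.diff(np.asarray(points, dtype=float), axis=0)
    norms = np.linalg.norm(disp, axis=1) + eps
    best_label, best_score = "idle", -1.0
    for label, vec in DIRECTIONS.items():
        d = np.asarray(vec, dtype=float)
        d /= np.linalg.norm(d)
        score = float(np.mean(disp @ d / norms))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else "idle"
```

With this toy rule, a window of positions produced by pure noise yields scores close to zero and is labeled idle, whereas a window moving steadily along one direction scores close to one, which mirrors the behavior of the threshold sweep reported above.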

6.1.2. Comparison with the Literature on Passive RFID Gesture

To evaluate the performance of AtomGID, it seemed important to review the literature on gesture recognition, despite the fundamental difference in the purpose of our approach. Most gesture recognition research concerns the analysis of video sequences or accelerometer data [46]. Indeed, in those contexts, gesture recognition is often considered to be a solved problem [47]. Despite this, very few researchers (other than our lab) have applied gesture recognition to RFID. With a noisy technology like RFID and in the context of continuous gesture recognition, the problem is actually very difficult. Classical gesture recognition usually does not operate on continuous data streams and therefore ignores the segmentation step (or assumes that it is handled by the device).
Asadzadeh et al. [48] are among the pioneers in investigating gesture recognition using RFID. They employed a partitioning localization technique in conjunction with reference tags, utilizing three antennas positioned on a desk to monitor an 80 cm × 80 cm area subdivided into 64 equally sized square cells (10 cm × 10 cm). Their experiments focused on four basic directions, with an average gesture duration of 4.5 s at a velocity of 20 cm/s. While their localization algorithm demonstrated higher accuracy than our method, with an error of approximately 10 cm, it was rendered impractical for our context due to the absence of segmentation and its reliance on reference tags. It is worth noting that although Asadzadeh et al. achieved a recognition rate of 93%, they overlooked the segmentation step, conducted experiments with gestures twice the length of ours, and tested on a table without any interference.
Another notable approach is that of Zou et al. [49], who leveraged RFID systems with phase difference information to recognize hand gestures performed in front of a tag. While their system performed admirably, it is not applicable to our scenario where objects move freely in the environment. Similarly, Ding et al. [50] utilized a comparable approach to recognize gestures on a grid comprising multiple passive RFID tags. Their RFIPad system achieved impressive accuracy in recognizing various motions and English letters. However, their segmentation method relied on the assumption that gestures were paused between strokes, which does not align with the continuous nature of our context. Despite their achievements, distinguishing confused letters relied on comparing the phase values of feature points, presenting challenges in our scenario.
In their recent work, Zhang et al. [51] proposed a low-cost, non-invasive, and scalable gesture recognition technology and successfully implemented RF-alphabet, a gesture recognition system for the complex, fine-grained, domain-independent 26 English letters. By performing feature analysis on similar signals, RF-alphabet achieved an average accuracy of 89.7% across different domains, including new users and new environments. However, the experimental process used in this work is not transparent and is difficult to fully assess. For instance, the work mentions neither the number of participants nor how the RFID tags are worn on the user's hand.
Finally, in previous work [52] from our lab, we proposed a simple algorithmic approach for hand gesture recognition designed to be used as the core component of a fine-grained activity recognition model. It was based on a simple wristband equipped with accelerometers and a gyroscope. A set of 13 atomic gestures for cooking activities was defined, enabling the characterization of high-level cooking tasks as a set of simple gestures. The experiment was performed with 21 users, yielding an average recognition rate of 83% per gesture, which is inferior to the recognition performance of the approach proposed in this paper.

6.2. AtomGID in an Activity Recognition Context

The final set of experiments was designed to test AtomGID in an activity recognition context with the overall system. As described earlier, for this purpose, we leveraged a clustering method that our team developed using flocking [45]. This enables us to detect activities performed in the smart home in a fully unsupervised manner without relying on a large dataset. For this experiment, a spatial dataset of five ADLs, each realized ten times, was collected in our smart home infrastructure. Keep in mind that the gestures are only extracted for a single active object in the kitchen area. This is the zone where we installed a higher number of RFID antennas to achieve the precise localization necessary for that task. The reader should also keep in mind that even with an unlimited number of RFID antennas, trilateration would have to be restricted to specific areas to avoid being overwhelmed by collisions. Indeed, each RFID system polls its antennas in a round-robin fashion, and the brand we own can only support four antennas per system.
The ADLs selected are fine-grained activities performed exclusively in the kitchen: preparing a bowl of cereal, preparing an instant coffee, making a burger, preparing pasta, and washing hair (in the kitchen sink). This dataset is strongly imbalanced since the degree of complexity and the number of sensors activated vary considerably between activities. The localization algorithm is the module generating the most data: there were about 20 tagged objects in the kitchen at the time of data gathering, all generating positions. We drastically reduced the complexity by assuming that only one object at a time is moving (or active) and feeding its list of positions to AtomGID. The dataset is split randomly, with 2/3 used for training, and the procedure is repeated ten times to average the results. On most executions, flocking rapidly converges above 80% accuracy. The average final accuracy of the trained models is 86.73%. One interesting observation our team made is that flocking converges in roughly the same amount of time in all the experiments, which seems to indicate that the fixed computational cost dominates the part that scales linearly with the size of the dataset.
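For reference, the sketch below shows one synchronous update of the three classic flocking rules illustrated in Figure 6 (alignment, separation, and cohesion). The clustering method of [45] additionally ties the neighborhood and rule weights to the similarity between the data instances carried by the boids and extracts clusters from the resulting groups; those parts, as well as the radius and weight values used here, are omitted or assumed for illustration only.

```python
import numpy as np

def flocking_step(pos, vel, radius=5.0, w_align=0.05, w_coh=0.01,
                  w_sep=0.1, max_speed=1.0):
    """One synchronous update of Reynolds' three rules.

    pos, vel: arrays of shape (n_boids, 2). In the clustering variant,
    each boid carries a data instance and the rules are modulated by
    instance similarity, which is not shown here.
    """
    n = len(pos)
    new_vel = vel.copy()
    for i in range(n):
        diff = pos - pos[i]
        dist = np.linalg.norm(diff, axis=1)
        mask = (dist > 0) & (dist < radius)
        if not mask.any():
            continue
        # Cohesion: steer toward the center of the neighbors.
        cohesion = diff[mask].mean(axis=0)
        # Alignment: steer toward the neighbors' average heading.
        alignment = vel[mask].mean(axis=0) - vel[i]
        # Separation: steer away from neighbors that are too close.
        too_close = mask & (dist < radius / 2)
        if too_close.any():
            separation = (-diff[too_close] / dist[too_close, None] ** 2).sum(axis=0)
        else:
            separation = np.zeros(2)
        new_vel[i] += w_coh * cohesion + w_align * alignment + w_sep * separation
        speed = np.linalg.norm(new_vel[i])
        if speed > max_speed:
            new_vel[i] *= max_speed / speed
    return pos + new_vel, new_vel
```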

7. Conclusions

In conclusion, this paper introduces AtomGID, a non-deep-learning gesture recognition algorithm specifically tailored for imprecise tracking systems. Developed through several years of research at our lab, AtomGID builds upon our prior work in passive RFID tracking and utilizes a flocking-based clustering method. Integrating AtomGID into our existing system demonstrates significant promise in accurately recognizing fine-grained activities within a smart home environment. While the field of activity recognition has made notable strides, addressing fine-grained activities effectively remains a challenge, often constrained by technological limitations or contextual factors. Nonetheless, advancing this area is imperative for improving smart home systems, as it holds the potential to enable more targeted support services and to improve the explainability of decision-making processes.
Looking ahead, several avenues of development warrant exploration. Currently, our approach serves as a bridge in contexts where deep learning may not be suitable, albeit potentially falling short of state-of-the-art (SOTA) methods. This prompts the question of whether to continue refining non-deep-learning solutions or adapt deep learning frameworks for our specific context. For instance, by amassing a comprehensive dataset comprising passive RFID data and partially annotated gestures and activities from diverse home environments, deep learning algorithms could theoretically supplant all three components of our system: passive RFID tracking, AtomGID, and clustering.
At present, it appears that there is still a niche for simpler, explainable non-deep-learning approaches. However, for future endeavors, our team plans to conduct a comparative study by replicating gesture recognition using an SOTA algorithm to assess its performance against AtomGID. This comparative analysis will provide invaluable insights into the strengths and limitations of each approach, guiding future research efforts towards the most effective solutions for activity recognition in smart home environments. Moreover, it would be interesting to explore techniques for enhancing the interpretability and explainability of gesture recognition algorithms, particularly in the context of decision-making within smart home systems. This could involve developing visualizations or interactive interfaces to help users understand how gestures are interpreted and utilized. Finally, with the availability of other sensors in most smart home environments, it is clear that an avenue of investigation consists of the integration of additional sensor modalities, such as ultrasound or photoplethysmography sensors, alongside passive RFID technology. The fusion of data from multiple sensors could improve the robustness and accuracy of gesture recognition systems, particularly in complex or dynamic environments.

Author Contributions

Conceptualization, methodology, software, validation, visualization, K.B.; writing—original draft preparation, writing—review and editing, resources, K.B. and B.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada, through the discovery grant program.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. United Nations. World Population Ageing 2023: Challenges and Opportunities of Population Ageing in the Least Developed Countries; United Nations-Department of Economic and Social Affairs: New York, NY, USA, 2023; pp. 1–74. [Google Scholar]
  2. Komp-Leukkunen, K.; Sarasma, J. Social Sustainability in Aging Populations: A Systematic Literature Review. Gerontologist 2023, 64, 1–12. [Google Scholar] [CrossRef]
  3. Lewis, C.; Buffel, T. Aging in place and the places of aging: A longitudinal study. J. Aging Stud. 2020, 54, 100870. [Google Scholar] [CrossRef]
  4. Knoefel, F.; Wallace, B.; Thomas, N.; Sveistrup, H.; Goubran, R.; Laurin, C.L. Evolution of the Smart Home and AgeTech. In Supportive Smart Homes: Their Role in Aging in Place; Springer International Publishing: Cham, Switzerland, 2023; pp. 15–21. [Google Scholar]
  5. Knoefel, F.; Wallace, B.; Thomas, N.; Sveistrup, H.; Goubran, R.; Laurin, C.L. Sensor Technologies: Collecting the Data in the Home. In Supportive Smart Homes: Their Role in Aging in Place; Springer International Publishing: Cham, Switzerland, 2023; pp. 35–52. [Google Scholar]
  6. Maitre, J.; Bouchard, K.; Bertuglia, C.; Gaboury, S. Recognizing activities of daily living from UWB radars and deep learning. Expert Syst. Appl. 2021, 164, 113994. [Google Scholar] [CrossRef]
  7. Szabó, P.; Ara, J.; Halmosi, B.; Sik-Lanyi, C.; Guzsvinecz, T. Technologies designed to assist individuals with cognitive impairments. Sustainability 2023, 15, 13490. [Google Scholar] [CrossRef]
  8. Fan, X.; Xie, Q.; Li, X.; Huang, H.; Wang, J.; Chen, S.; Xie, C.; Chen, J. Activity recognition as a service for smart home: Ambient assisted living application via sensing home. In Proceedings of the 2017 IEEE International Conference on AI Mobile Services (AIMS), Honolulu, HI, USA, 15–30 June 2017; pp. 54–61. [Google Scholar]
  9. Nagpal, D.; Gupta, S. Human Activity Recognition and Prediction: Overview and Research Gaps. In Proceedings of the 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Lonavla, India, 7–9 April 2023; pp. 1–5. [Google Scholar]
  10. Jabla, R.; Khemaja, M.; Buendia, F.; Faiz, S. A knowledge-driven activity recognition framework for learning unknown activities. Procedia Comput. Sci. 2022, 207, 1871–1880. [Google Scholar] [CrossRef]
  11. Ye, J.; Zhong, J. A Review on Data-Driven Methods for Human Activity Recognition in Smart Homes. In Cases on Virtual Reality Modeling in Healthcare; IGI Global: Hershey, PA, USA, 2022; pp. 21–40. [Google Scholar]
  12. Bouchard, B.; Giroux, S.; Bouzouane, A. A Keyhole Plan Recognition Model for Alzheimer’s Patients: First Results. Appl. Artif. Intell. 2007, 21, 623–658. [Google Scholar] [CrossRef]
  13. Hoey, J.; Poupart, P.; Craig, T.; Boutilier, C.; Mihailidis, A. Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Comput. Vis. Image Underst. 2010, 114, 503–519. [Google Scholar] [CrossRef]
  14. Lundström, J.; Synnott, J.; Järpe, E.; Nugent, C.D. Smart home simulation using avatar control and probabilistic sampling. In Proceedings of the 2015 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), St. Louis, MO, USA, 23–27 March 2015; pp. 336–341. [Google Scholar]
  15. Jia, H.; Chen, S. Integrated data and knowledge driven methodology for human activity recognition. Inf. Sci. 2020, 536, 409–430. [Google Scholar] [CrossRef]
  16. Lockhart, J.W.; Weiss, G.M. Limitations with activity recognition methodology & data sets. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13–17 September 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 747–756. [Google Scholar]
  17. Nguyen, B.; Coelho, Y.; Bastos, T.; Krishnan, S. Trends in human activity recognition with focus on machine learning and power requirements. Mach. Learn. Appl. 2021, 5, 100072. [Google Scholar] [CrossRef]
  18. Yuan, H.; Chan, S.; Creagh, A.P.; Tong, C.; Acquah, A.; Clifton, D.A.; Doherty, A. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ Digit. Med. 2024, 7, 1–10. [Google Scholar] [CrossRef]
  19. Qian, H.; Pan, S.J.; Miao, C. Weakly-supervised sensor-based activity segmentation and recognition via learning from distributions. Artif. Intell. 2021, 292, 103429. [Google Scholar] [CrossRef]
  20. Riboni, D.; Murru, F. Unsupervised recognition of multi-resident activities in smart-homes. IEEE Access 2020, 8, 201985–201994. [Google Scholar] [CrossRef]
  21. Chen, H.; Gouin-Vallerand, C.; Bouchard, K.; Gaboury, S.; Couture, M.; Bier, N.; Giroux, S. Enhancing Human Activity Recognition in Smart Homes with Self-Supervised Learning and Self-Attention. Sensors 2024, 24, 884. [Google Scholar] [CrossRef]
  22. Patricia AC, P.; Rosberg, P.C.; Butt-Aziz, S.; Alberto PM, M.; Roberto-Cesar, M.O.; Miguel, U.T.; Naz, S. Semi-supervised ensemble learning for human activity recognition in casas Kyoto dataset. Heliyon 2024, 10, e29398. [Google Scholar] [CrossRef]
  23. Bouchard, K.; Bouchard, B.; Bouzouane, A. Smart homes: Practical guidelines. In Opportunistic Networking; CRC Press: Boca Raton, FL, USA, 2017; pp. 205–238. [Google Scholar]
  24. Li, S.; Lu, J.; Chen, S. A room-level tag trajectory recognition system based on multi-antenna RFID reader. Comput. Commun. 2020, 149, 350–355. [Google Scholar] [CrossRef]
  25. Lafontaine, V.; Bouchard, K.; Maitre, J.; Gaboury, S. Denoising UWB Radar Data for Human Activity Recognition Using Convolutional Autoencoders. IEEE Access 2023, 11, 81298–81309. [Google Scholar] [CrossRef]
  26. Zolfaghari, S.; Massa, S.M.; Riboni, D. Activity Recognition in Smart Homes via Feature-Rich Visual Extraction of Locomotion Traces. Electronics 2023, 12, 1969. [Google Scholar] [CrossRef]
  27. Arrotta, L. Multi-inhabitant and explainable Activity Recognition in Smart Homes. In Proceedings of the 2021 22nd IEEE International Conference on Mobile Data Management (MDM), Toronto, ON, Canada, 15–18 June 2021; IEEE: New York, NY, USA, 2021; pp. 264–266. [Google Scholar]
  28. Fahad, L.G.; Tahir, S.F. Activity recognition in a smart home using local feature weighting and variants of nearest-neighbors classifiers. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 2355–2364. [Google Scholar] [CrossRef]
  29. Fortin-Simard, D.; Bilodeau, J.-S.; Bouchard, K.; Gaboury, S.; Bouchard, B.; Bouzouane, A. Exploiting passive RFID technology for activity recognition in smart homes. IEEE Intell. Syst. 2015, 30, 7–15. [Google Scholar] [CrossRef]
  30. Bergeron, F.; Bouchard, K.; Gaboury, S.; Giroux, S. Tracking objects within a smart home. Expert Syst. Appl. 2018, 113, 428–442. [Google Scholar] [CrossRef]
  31. Bergeron, F.; Bouchard, K.; Gaboury, S.; Giroux, S. RFID Indoor Localization Using Statistical Features. Cybern. Syst. 2021, 52, 625–641. [Google Scholar] [CrossRef]
  32. Clementini, E.; Di Felice, P.; Hernández, D. Qualitative representation of positional information. Artif. Intell. 1997, 95, 317–356. [Google Scholar] [CrossRef]
  33. Olfati-Saber, R. Flocking for Multi-Agent Dynamic Systems: Algorithms and Theory. IEEE Trans. Autom. Control 2006, 51, 401–420. [Google Scholar] [CrossRef]
  34. Das, D.; Nishimura, Y.; Vivek, R.P.; Takeda, N.; Fish, S.T.; Plötz, T.; Chernova, S. Explainable activity recognition for smart home systems. ACM Trans. Interact. Intell. Syst. 2023, 13, 1–39. [Google Scholar] [CrossRef]
  35. Dawadi, P.; Cook, D.J.; Schmitter-Edgecombe, M. Analyzing Activity Behavior and Movement in a Naturalistic Environment Using Smart Home Techniques. IEEE J. Biomed Health Inf. 2015, 19, 1882–1892. [Google Scholar]
  36. Wang, T.; Cook, D.J. sMRT: Multi-resident tracking in smart homes with sensor vectorization. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2809–2821. [Google Scholar] [CrossRef]
  37. Wang, T.; Cook, D.J. Multi-person activity recognition in continuously monitored smart homes. IEEE Trans. Emerg. Top. Comput. 2021, 10, 1130–1141. [Google Scholar] [CrossRef]
  38. Wang, T.; Cook, D.J.; Fischer, T.R. The Indoor Predictability of Human Mobility: Estimating Mobility With Smart Home Sensors. IEEE Trans. Emerg. Top. Comput. 2022, 11, 182–193. [Google Scholar] [CrossRef] [PubMed]
  39. Sharma, V.; Gupta, M.; Pandey, A.K.; Mishra, D.; Kumar, A. A review of deep learning-based human activity recognition on benchmark video datasets. Appl. Artif. Intell. 2022, 36, 2093705. [Google Scholar] [CrossRef]
  40. Mehr, H.D.; Polat, H. Human activity recognition in smart home with deep learning approach. In Proceedings of the 2019 7th International Istanbul Smart Grids and Cities Congress and Fair (ICSG), Istanbul, Turkey, 25–26 April 2019; IEEE: New York, NY, USA, 2019; pp. 149–153. [Google Scholar]
  41. Garg, A.; Nigam, S.; Singh, R. Vision based human activity recognition using hybrid deep learning. In Proceedings of the 2022 International Conference on Connected Systems & Intelligence (CSI), Trivandrum, India, 31 August–2 September 2022; IEEE: New York, NY, USA, 2022; pp. 1–6. [Google Scholar]
  42. Nguyen-Dinh, L.V.; Calatroni, A.; Tröster, G. Robust online gesture recognition with crowdsourced annotations. J. Mach. Learn. Res. 2014, 15, 3187–3220. [Google Scholar]
  43. Doukaga, H.-N.; Rakotoarson, N.H.; Aubin-Morneau, G.; Fortin, P.; Maitre, J.; Bouchard, B. Fine-Grained Human Activity Recognition in Smart Homes Through Photoplethysmography-Based Hand Gesture Detection. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, Crete, Greece, 26–28 June 2024; pp. 1–6. [Google Scholar]
  44. Bouchard, K.; Bouchard, B.; Bouzouane, A. Spatial recognition of activities for cognitive assistance: Realistic scenarios using clinical data from Alzheimer’s patients. J. Ambient. Intell. Humaniz. Comput. 2014, 5, 759–774. [Google Scholar] [CrossRef]
  45. Bouchard, K.; Lapalu, J.; Bouchard, B.; Bouzouane, A. Clustering of human activities from emerging movements: A flocking based unsupervised mining approach. J. Ambient. Intell. Humaniz. Comput. 2019, 10, 3505–3517. [Google Scholar] [CrossRef]
  46. Chu, Y.C.; Jhang, Y.J.; Tai, T.M.; Hwang, W.J. Recognition of hand gesture sequences by accelerometers and gyroscopes. Appl. Sci. 2020, 10, 6507. [Google Scholar] [CrossRef]
  47. Nogales, R.E.; Benalcázar, M.E. Hand gesture recognition using automatic feature extraction and deep learning algorithms with memory. Big Data Cogn. Comput. 2023, 7, 102. [Google Scholar] [CrossRef]
  48. Asadzadeh, P.; Kulik, L.; Tanin, E. Gesture recognition using RFID technology. Pers. Ubiquitous Comput. 2012, 16, 225–234. [Google Scholar] [CrossRef]
  49. Zou, Y.; Xiao, J.; Han, J.; Wu, K.; Li, Y.; Ni, L.M. GRfid: A Device-Free RFID-Based Gesture Recognition System. IEEE Trans. Mob. Comput. 2017, 16, 381–393. [Google Scholar] [CrossRef]
  50. Ding, H.; Guo, L.; Zhao, C.; Wang, F.; Wang, G.; Jiang, Z.; Xi, W.; Zhao, J. RFnet: Automatic gesture recognition and human identification using time series RFID signals. Mob. Netw. Appl. 2020, 25, 2240–2253. [Google Scholar] [CrossRef]
  51. Zhang, Y.; Yang, Y.; Li, Z.; Yang, Z.; Liu, X.; Yuan, B. RF-Alphabet: Cross Domain Alphabet Recognition System Based on RFID Differential Threshold Similarity Calculation Model. Sensors 2023, 23, 920. [Google Scholar] [CrossRef]
  52. Bouchard, B.; Maitre, J.; Gaboury, S.; Roberge, A. Hand Gestures Identification for Fine-Grained Human Activity Recognition in Smart Homes. In Proceedings of the 13th International Conference on Ambient Systems, Networks and Technologies (ANT), Porto, Portugal, 22–25 March 2022; Elsevier: Amsterdam, The Netherlands, 2022; pp. 1–8. [Google Scholar]
Figure 1. The smart home prototype used to collect the datasets. Unused sensors are hidden. Several electromagnetic contacts were omitted for clarity.
Figure 2. A sample dataset for the binary sensors. From left to right, the columns represent the following: timestamp, sensor, logical zone, position x, position y, and the state.
Figure 3. A summary of the activity recognition process with AtomGID and the QSR framework.
Figure 4. Same dataset, three granularity values, and three different resulting sequences of atomic gestures.
Figure 5. A sample atomic gesture (in red) in the QSR framework.
Figure 6. Basic rules of Flocking. (a) Alignment; (b) separation; (c) cohesion. In green is the actual distance between the boid and flockmates, and in red is the steering direction imposed by the rule.
Figure 7. The neighborhood graph of the qualitative directions with the similarity expressed as a percentage.
Figure 8. Example gestures used for the experiments. Eight are composed of two directions, and four are composed of only one. The last one in the picture is the idle gesture.
Figure 9. Confusion matrix of generated gestures.
