1. Introduction
Pattern recognition and outlier detection have become integral components of various fields, ranging from finance [1,2,3] to healthcare [4,5,6]. These concepts aim to extract meaningful information from data and identify patterns that may go unnoticed. The foundation for successful analysis lies in having a well-organized dataset. The ability to interpret and explain models is valuable for understanding decision-making processes and building trust and confidence in their outputs. Interpretability and explainability [7] enable stakeholders, including domain experts, policymakers, and end-users, to comprehend and validate the results and make informed decisions based on them. With this, pattern recognition and outlier detection are powerful tools for extracting knowledge from data. A well-organized dataset is the backbone of these techniques, providing the necessary information to train accurate models. Whether utilizing supervised or unsupervised learning approaches [8], these methods can reveal hidden patterns and detect outliers, leading to valuable insights and improved decision-making.
Interpretability and explainability are crucial aspects in the field of ML. Whereas ML models have shown remarkable performance in various domains, their complex nature often makes understanding the reasons behind their decisions challenging. Interpretability refers to comprehending and explaining how an ML model arrives at its predictions or classifications. On the other hand, explainability focuses on providing clear and understandable explanations for those predictions. In the context of this work—on pattern recognition and outlier detection—interpretability and explainability play significant roles. When dealing with well-organized datasets, interpretability allows us to understand the underlying relationships and patterns captured by models. By examining the features and patterns the model identifies, valuable insights are obtained into the problem domain. In the case of recognizing patterns in a person’s daily life activities, interpretability can help to understand which features contribute the most to specific actions or behaviors, such as the impact of room occupancy probabilities and timestamps on particular routines.
Explainability becomes particularly important when dealing with unsupervised learning techniques. In unsupervised learning, where no explicit labels or guidance exist, explainability allows understanding the hidden structures and patterns the model discovers autonomously. By explaining the identified patterns and anomalies, we can better understand unusual behavior or outliers in a person’s daily routines. This understanding can provide valuable feedback or intervention when anomalies are detected.
In the context of this paper, each user provides an independent dataset where all the data of a user relates only to that user. To acquire individual data, a smartwatch is used, and a service worker is responsible for periodically fetching all the new data for each user and saving it in a .csv file within a single folder stored in the system. This service worker was created with the help of the Crontab Guru tool [9]. This tool helps to schedule and automate the execution of tasks, such as running a Python script on a Unix-like operating system. It simplifies the process of creating cron expressions, which define when and how frequently a task should run.
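For illustration, a minimal sketch of such a periodic fetch job is shown below; the script name, API endpoint, and output folder are hypothetical and only indicate the general shape of the service worker.

```python
# fetch_user_data.py -- minimal sketch of the periodic fetch job
# (hypothetical script name, endpoint, and folder; the actual service worker may differ).
import csv
import pathlib
import requests  # assumes the API is reachable over HTTP

API_URL = "http://localhost:8000/api/users"   # hypothetical endpoint
OUT_DIR = pathlib.Path("data")                # single folder holding one .csv per user

def fetch_and_store() -> None:
    OUT_DIR.mkdir(exist_ok=True)
    for user in requests.get(API_URL, timeout=10).json():
        records = requests.get(f"{API_URL}/{user['id']}/records", timeout=10).json()
        out_file = OUT_DIR / f"{user['id']}.csv"
        with out_file.open("a", newline="") as f:
            writer = csv.writer(f)
            for r in records:
                writer.writerow([r["date"], r["uuid"], r["rssi"]])

if __name__ == "__main__":
    fetch_and_store()

# Example crontab entry (of the kind built with Crontab Guru) to run the script hourly:
# 0 * * * * /usr/bin/python3 /opt/monitoring/fetch_user_data.py
```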
In conclusion, this paper presents a system that can organize the data to recognize the patterns in older adults’ daily life activity and, consequently, detect potential outliers in their behavior. The following sections will explain how this study is organized.
2. Related Work
The detection of anomalies in behavior is a critical aspect of activities of daily living (ADLs) in older adults. It involves identifying unusual patterns in behavior, which could indicate a potential problem, such as a fall, a medical emergency, or a change in health condition. In recent years, several studies have focused on developing algorithms and methods to detect anomalies in behavior using various sensors and data analysis techniques.
In [10], an algorithm to detect falls in older adults using wearable sensors is presented. The algorithm uses a combination of ML techniques and signal processing to detect abnormal movements and postures that are characteristic of falls. The algorithm was evaluated using real-world data collected from older adults, and the results showed high accuracy in detecting falls.
In [11], an algorithm is developed to detect anomalies in eating behavior using a smartwatch. The algorithm uses data from the accelerometer and gyroscope sensors in the smartwatch to identify patterns of hand-to-mouth movements, which are indicative of eating behavior. The algorithm was evaluated using data collected from a group of volunteers, and the results showed a high accuracy in detecting eating behavior.
In addition to detecting falls and eating behavior, several studies have also focused on detecting anomalies in other ADLs, such as sleeping, walking, and medication adherence. For instance, ref. [12] proposed an algorithm to detect sleep disturbances in older adults using data from a wearable device. The algorithm uses ML techniques and signal processing to analyze data from the device’s accelerometer and gyroscope sensors to detect abnormal sleep patterns. Similarly, ref. [13] proposed an algorithm to detect walking anomalies in two different groups of adults: one with cognitive impairment and the other composed of healthy adults. They use a virtual classifier algorithm that uses data from GPS sensors to identify walking speed changes, which indicate walking abnormalities. The results show a high level of accuracy in detecting walking anomalies.
Using both CNN and LSTM architectures, refs. [14,15] can predict a person’s behavior while representing actions with neural embeddings.
Finally, ref. [16] proposed an algorithm to detect medication adherence in older adults using data from a smartwatch. The algorithm uses data from the smartwatch’s accelerometer and gyroscope sensors to detect hand-to-mouth movements that indicate taking medication. The algorithm was evaluated using data collected from a group of older adults, and the results showed a high level of accuracy in detecting medication adherence.
In conclusion, detecting anomalies in behavior is an essential aspect of monitoring the ADLs of older adults. It involves using sensors and data analysis techniques to identify unusual patterns in behavior that could indicate a potential problem. Several studies have focused on developing algorithms and methods to detect anomalies in behavior using various sensors and data analysis techniques, with promising results. These studies can potentially improve the quality of life of older adults and provide valuable insights into their health and well-being.
Table 1 compares the approach taken in this work and those from the literature.
The contribution of the present work lies in its ability to provide a scalable solution for monitoring the ADLs of older adults. While several works in the past have attempted to address the same issue, what sets our solution apart is its ability to cater to diverse users. One of the critical challenges previous works have faced is the lack of scalability of their solutions. This has made it challenging to apply those solutions in different settings, such as in rural areas or for individuals with special needs. Our method is scalable due to its modular construction and its ability to process large datasets iteratively. The pipeline is divided into manageable steps—retrieving beacon information, assigning compartments, and calculating metrics—which allows each part to scale independently. The inherent parallelism in the loops over beacons, compartments, and data records enables efficient execution on multi-core processors, GPUs, or distributed systems (not implemented in this paper but referred to in the future work section). Redirecting and consolidating data minimize redundancy and memory usage, ensuring the system can handle increasing amounts of input data. Another significant aspect of our approach is its emphasis on satisfying the needs of older adults and their caregivers. While previous works have focused on monitoring the ADLs of older adults, they have often overlooked the needs of their caregivers.
Efficient pattern recognition and behavior anomaly detection methods are also critical features of our solution. Previous works have often struggled with detecting anomalous behavior, which has resulted in a high number of false positives or negatives. This paper achieves effectiveness through logical validation, metric calculations, and careful data handling. Comparing strings against a threshold to group related data minimizes noise and improves accuracy, which reduces the occurrence of false positives and false negatives. The use of multiple statistical metrics—such as probabilities, averages, standard deviations, and percentiles—provides a holistic view of patterns, enhancing the reliability of the outputs. Early data validation prevents unnecessary computations, which ensures the results are meaningful and based on sufficient data. Focusing on specific compartment–time combinations allows for pattern recognition. This ensures that caregivers can act promptly in case of abnormal behavior, which can significantly improve the well-being of an older adult.
In summary, the present work contributes to monitoring the ADLs of older adults. Its scalability, ability to satisfy users’ and caregivers’ needs, and efficient pattern recognition and behavior anomaly detection methods make it a reliable and effective system.
3. Materials and Methods
3.1. System Architecture
The primary aim of this study is to develop a highly autonomous and scalable system designed to identify the daily life patterns of older adults. Additionally, the system is intended to detect deviations in their behavior throughout the day, thereby enabling the transmission of alerts to their family members or caregivers. The construction of a system capable of fulfilling these objectives is essential. The architectural framework of the project is depicted in
Figure 1 and is divided into two stages.
The first stage corresponds to the configuration of devices to gather data on older adults’ daily activities. This involves beacons emitting periodic Bluetooth Low-Energy (BLE) signals, which are then assigned to several rooms within the household. Concurrently, two applications for data collection and configuration are developed. Firstly, an Android mobile application facilitates the setup of the physical system and maps beacon representations within the application interface. Subsequently, the user is associated with the beacons; a smartwatch widget application compatible with Garmin OS is developed to enable this. Both systems, the mobile application and the smartwatch widget, encompass distinct functionalities that allow the integration of components for data storage. These two applications are outside the scope of this paper. The data collection process encompasses acquiring beacon locations through the mobile application and identifying users through the smartwatch widget application, facilitating the capture of vital biometric data to monitor the user’s physiological signs. This phase culminates in the generation of data stored within a MongoDB database.
Subsequently, these accumulated data undergo processing to derive insights into usage patterns. The mobile application configures the environment by associating all household beacons with the individual, ensuring scalability by accommodating unlimited environments. Conversely, the smartwatch widget application captures periodic beacon signals, stores them, and records the user’s heart rate. The stored data are then transmitted to an API, bridging the applications with the database. The database assumes responsibility for storing all pertinent data gathered by the smartwatch throughout the user’s daily activities, which subsequently serves as the basis for pattern recognition and outlier detection.
The second stage is the focus of this paper. This phase encompasses data processing, analysis, and decision-making. The project is designed to operate autonomously, devoid of any requirement for human intervention in decision-making processes. So, as more data are obtained, the assembled algorithm becomes more accurate and better prepared to make decisions without creating false positives concerning the detection of outliers.
3.2. Observed Context
The observed context focuses on the user (observed subject) and the dwelling. The user in the experimental scenario had a university-level education and was already retired during the analysis period, which took place between the ages of 68 and 70. At the social level, the user lived alone and did not receive regular visits from family members or others. The lifestyle was moderately active, limited to essential activities such as grocery shopping.
The house is in a rural area, surrounded by a green area, with other houses nearby. In this house, the monitoring scenario was configured and equipped with a set of BLE beacons from Estimote [17]. A smartwatch was used to monitor the presence of the inhabitant in the different compartments. Each room to be monitored was equipped with one or more BLE beacons.
3.3. Data Gathering
This work combines BLE beacons with pattern detection algorithms to analyze individual patterns and detect anomalies in daily activities. Beacons are used as IoT devices, emitting periodic BLE signals at 2.4 GHz, containing Universal Unique Identifiers (UUIDs). Garmin smartwatches are also employed, with functionalities such as external communication, BLE device scanning, and heart rate measurement in Beats Per Minute (BPM). The smartwatch sends collected data to an API and gathers beacon signals via BLE scanning. BPM data are included as additional information to help identify moments of concern, though they are not used as a standalone indicator. This allows for an individual setup of each user through the smartwatch. To ensure the system is both scalable and user-friendly, it is designed to require minimal human intervention, limited to installing and configuring devices in users’ homes. A mobile application facilitates the connection between monitoring daily activities and setting up the devices (beacons and smartwatches).
The solution determines the user’s proximity to a beacon using the Received Signal Strength Indicator (RSSI) and transmission power (Tx power) values. RSSI measures the signal strength received by the smartwatch, with values closer to 0 dBm indicating a strong signal and values around −100 dBm indicating a weak one, while Tx power represents the beacon’s transmission power, typically between −40 dBm and +4 dBm. Higher Tx power values increase signal range but consume more battery power, reducing durability. These factors define beacon placement to ensure reliable and efficient operation.
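The solution itself only compares RSSI values and selects the strongest beacon; nevertheless, a rough proximity estimate can be derived from RSSI and Tx power with the common log-distance path-loss model, sketched below with illustrative constants (not values taken from this work).

```python
def estimate_distance(rssi: int, tx_power: int = -59, path_loss_exponent: float = 2.0) -> float:
    """Rough distance estimate (in metres) from an RSSI reading and the beacon's Tx power.

    Uses the common log-distance path-loss model; the calibration constants here are
    illustrative and not taken from the paper.
    """
    return 10 ** ((tx_power - rssi) / (10 * path_loss_exponent))

# Example: a reading of -70 dBm from a beacon calibrated at -59 dBm at 1 m
print(round(estimate_distance(rssi=-70), 2))  # roughly 3.5 m
```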
To minimize interference, beacons are installed in the center of ceilings, avoiding corners where signal strength might weaken due to walls or obstacles. They are spaced approximately every 5 m, providing reliable coverage for areas of about 25 m² each, as presented in Figure 2. Although beacons can transmit signals over longer distances, this configuration prioritized accuracy and consistency.
In this process, BLE beacons function as devices that periodically transmit data. The smartwatch application operates in the background, capturing and storing the received data in the format depicted in
Figure 3. Additionally, the application records the BPM obtained from the smartwatch, as shown in
Figure 4. Subsequently, the data are temporarily stored within the smartwatch’s storage until it establishes a Bluetooth connection with a mobile phone. Once connected, the data are transmitted via Wi-Fi to communicate with an Application Programming Interface (API). The application manages these data, removing them upon successful communication with the API.
Figure 3 shows that data are recorded every five minutes, including the date of the reading, the RSSI value, and the UUID of the associated beacon. This information is organized into a list and prepared for transmission to the API.
Figure 4 shows how the reading of BPM data is stored locally, where the date and heart rate values are stored in each data point.
In conclusion, this comprehensive process of collecting, storing, and transmitting data allows for the acquisition of valuable insights. By leveraging the resources of BLE beacons and a smartwatch, this solution paves the way for future analyses related to collected data regarding monitoring activities of daily living in older adults.
3.4. Data Integration
A seamless data flow occurs when the smartwatch establishes a Bluetooth connection with its associated mobile phone to perform updates. All the newly collected data from the smartwatch are transmitted to the designated API and are stored in a structured manner within a database (MongoDB). The format in which these data are stored aligns with the representations shown in
Figure 5 and
Figure 6.
Figure 6 visually represents the specific structure employed to store nearby-beacon information collected during the day in the database. It highlights the arrangement of different fields, such as date, time, and RSSI; these metadata enrich the understanding of the collected information. Since each beacon data record always contains a date and time as well as a UUID, the uniqueness of the records is guaranteed, mainly because the time value contains hours, minutes, and seconds, and the UUID is a single value.
3.5. Data Pre-Processing
It was necessary to organize the collected data before patterns could be identified. Data pre-processing consisted of segmenting the dataset to group the data by compartment, removing days with missing records, and temporally grouping the records.
The collected data are ordered chronologically. To carry out the individual analysis of the movement patterns in each compartment, it was necessary to organize the data, separating the records referring to each data point. The division of data allowed the study of compartment-by-compartment patterns to start without the complexity of the remaining records.
3.6. Incomplete Data Removal
To address incomplete periods, we calculated a minimum threshold of records necessary to represent a full day. By comparing the total daily records—given that data are recorded every 5 min, resulting in up to 288 measurements per day—we established that only days with a total occurrence exceeding half of this value would be considered for further analysis. This criterion guaranteed the availability of numerous recordings to represent an entire day adequately. Consequently, periods with fewer recordings were excluded from the analysis, enhancing accuracy and mitigating the impact of data irregularities.
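A minimal sketch of this filtering step, assuming the records are held in a pandas DataFrame with a parsed datetime column (an illustrative schema, not necessarily the exact implementation used here):

```python
import pandas as pd

MAX_RECORDS_PER_DAY = 24 * 12                    # one reading every 5 minutes -> 288
MIN_RECORDS_PER_DAY = MAX_RECORDS_PER_DAY // 2   # days at or below this are discarded

def drop_incomplete_days(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only days with more than half of the possible 288 readings."""
    counts = df.groupby(df["datetime"].dt.date).size()
    full_days = counts[counts > MIN_RECORDS_PER_DAY].index
    return df[df["datetime"].dt.date.isin(full_days)]
```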
3.7. Grouped Data
Initially, we selected one-hour intervals for data segmentation. The pattern becomes clearer when data are precisely grouped into hourly periods. Although we explored other periods, shorter intervals exhibited significant noise, while longer intervals yielded overly abstract data with limited utility for the intended study. Consequently, we segmented the data by each hour of the day to facilitate pattern assessment within a reasonable time frame. We could observe a complete cycle of participants’ daily activities by evaluating data across twenty-four hours.
We adopted a method involving grouping data into equal periods based on the intended configuration to achieve the desired granularity. Initially, we grouped data by hour for each day and then grouped these data by hour across all days. This approach allowed for a comprehensive view of the system, enabling recognition of daily activity patterns. Specifically, at any given moment, the closest beacon device with the highest RSSI value indicates a 100% probability of the individual being in that compartment. Subsequently, the data were grouped to capture the activity pattern for each hour across all days.
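The grouping described above might be sketched as follows, again assuming an illustrative DataFrame schema with datetime, uuid, and rssi columns: for each five-minute slot, the beacon with the highest RSSI is kept, and the kept readings are then counted per day, hour, and beacon.

```python
import pandas as pd

def strongest_beacon_per_slot(df: pd.DataFrame) -> pd.DataFrame:
    """For every 5-minute slot, keep only the reading with the highest RSSI."""
    slot = df["datetime"].dt.floor("5min")
    idx = df.assign(slot=slot).groupby("slot")["rssi"].idxmax()
    return df.loc[idx]

def hourly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count, per day and hour, how many 5-minute slots each beacon 'won'."""
    best = strongest_beacon_per_slot(df).copy()
    best["date"] = best["datetime"].dt.date
    best["hour"] = best["datetime"].dt.hour
    return (best.groupby(["date", "hour", "uuid"])
                .size()
                .rename("slots_present")
                .reset_index())
```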
3.8. Data Formatting
We established a system where a service worker periodically retrieves new data for each individual. This setup ensures the existence of a local version of the data, which is utilized for pattern recognition, alongside its presence in the database. Consequently, data security is enhanced as the data are stored in two locations, providing redundancy in case of system failure.
The data collected from BLE beacons are organized as time series data. Each dataset record denotes the time of day the individual spends within a compartment. It is crucial to accurately represent the timestamp feature as a numerical variable to specify the time interval between other records. Therefore, to conduct behavioral analysis based on individuals’ positions throughout the day, we decomposed the timestamp into multiple features, including date–time, the UUID of the beacon corresponding to a specific compartment, and RSSI. Each record represents the reading captured at that moment, indicating which beacon the smartwatch and the mobile phone detected every five minutes.
3.9. Pattern Recognition
The identification of patterns was carried out considering the distribution of all recorded data, the result of the calculation of the 25th and 75th percentiles, and the measure of the occupancy probability.
The patterns consist of observing characteristics that, as a general rule, tend to repeat themselves. The data were loaded into a temporal data structure (time series) and subsequently analyzed based on their graphical representation. Two files are created and used during the pattern recognition process: the train file and the pattern file. These files are essential for the remainder of the work, and their use is explained in the following sections. The train file holds all the data related to the respective person’s daily life activity since the first day and has the format shown in Figure 7 (containing approximately 26,000 records). The pattern file holds the data organized by hours and compartments, together with the calculation of the occupancy probability, standard deviation, and 25th and 75th percentiles for each, as presented in Figure 8 (containing approximately 11,000 records).
Algorithm 1 orchestrates the calculation and pattern recognition processes and is tasked with conducting the necessary computations to analyze the patterns of each individual. To accomplish this, the data must be retrieved from the train file (“train.csv”) and stored in a structure. After that, the algorithm accesses the database to look for all the beacons (six beacons were used, as presented in Figure 2), that is, all the compartments (five compartments) associated with the respective person, and stores them in a structure. Next, it is crucial to understand whether there is more than one beacon inside the same compartment. If this situation arises, the algorithm combines the data associated with those beacons, treating them as unified entities. For example, if the living room is relatively large, approximately 35–40 m², then there are at least two beacons in this room. When configuring these two beacons, the location field will read “living room 1” (b1) and “living room 2” (b2). To detect this, we have applied the Ratcliff/Obershelp pattern-matching algorithm [18]. It returns the equivalence percentage between two strings (e = sim(s1, s2), where s1 and s2 represent the UUID strings associated with b1 and b2, and e represents the equivalence value from 0 to 1, where 1 is a complete match between the two given strings). In this context, the precision of the comparison is determined by the size of the strings. If the equivalence is greater than 85%, the beacons’ data must be concatenated, since they correspond to the same compartment. The minimum threshold for equivalence was set at 85%, determined through comparisons among sets of compartments. It was the most accurate value across various comparisons, as shown in Table 2.
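Python’s difflib.SequenceMatcher is based on the Ratcliff/Obershelp (gestalt) matching approach, so the 85% check can be sketched as follows; the beacon strings below are illustrative.

```python
from difflib import SequenceMatcher  # based on Ratcliff/Obershelp gestalt matching

SIMILARITY_THRESHOLD = 0.85  # beacons above this are treated as the same compartment

def same_compartment(string_a: str, string_b: str) -> bool:
    """Return True when two beacon strings are at least 85% similar."""
    return SequenceMatcher(None, string_a, string_b).ratio() >= SIMILARITY_THRESHOLD

print(same_compartment("living room 1", "living room 2"))  # True  (~0.92)
print(same_compartment("living room 1", "kitchen"))        # False
```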
Algorithm 1: Pattern Recognition
The algorithm stores the correspondence of beacons in a separate structure upon recognizing that they are located in the same compartment. It then selects the “master” beacon for that compartment based on the alphabetical order of each beacon’s UUID. The beacon with the smaller UUID is designated as the master beacon. Consequently, all other beacons within the same compartment are configured to point to the “master” beacon. For instance, if the living room has three associated beacons (b1, b2, and b3), b1 is chosen as the master beacon. This means that b1 points to itself, while b2 and b3 point to b1. Thus, when data from b2 or b3 are detected, the algorithm interprets them as originating from b1, enabling a more accurate calculation of the user’s location.
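A minimal sketch of the master-beacon mapping, assuming an illustrative dictionary that groups beacon UUIDs by compartment:

```python
def build_master_map(compartment_groups: dict[str, list[str]]) -> dict[str, str]:
    """Map every beacon UUID to the 'master' beacon of its compartment.

    The master is the beacon whose UUID comes first alphabetically; all other
    beacons in the same compartment point to it (illustrative data structure).
    """
    master_of: dict[str, str] = {}
    for uuids in compartment_groups.values():
        master = min(uuids)          # smallest UUID in alphabetical order
        for uuid in uuids:
            master_of[uuid] = master
    return master_of

groups = {"living room": ["b3-...", "b1-...", "b2-..."], "kitchen": ["a7-..."]}
print(build_master_map(groups))
# {'b3-...': 'b1-...', 'b1-...': 'b1-...', 'b2-...': 'b1-...', 'a7-...': 'a7-...'}
```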
The system then examines the structure containing chronologically ordered data corresponding to the older person’s daily activities. It searches for the earliest and latest dates within this structure. Subsequently, it verifies if the difference in days between the latest and earliest dates exceeds seven days. If not, the algorithm cannot recognize a pattern, as the minimum required duration for pattern recognition is seven days. In such cases, the algorithm initiates a reorganization of all data. Consequently, the structure is reconstructed, where each date (including date and hour) encompasses all information related to BLE beacons and their RSSI values recorded by the smartwatch at a specific time. This approach facilitates the assessment of the volume of data obtained daily, allowing for improved organization based on date discrepancies. Furthermore, all available beacon data are included for each date to ensure comprehensive data organization.
After organizing the data, it is necessary to verify which days do not meet the minimum criteria set by the algorithm, as detailed in
Section 3.6. Then, the system initiates a new re-verification process to confirm that at least seven consecutive days of relatively stable data are still available for pattern recognition. This precaution is essential because the elimination of certain days may lead to situations where there are insufficient days to meet this requirement.
Therefore, the algorithm ranges from the first date in the structure to the last and, for every five minutes of every day, checks whether there is beacon information that can be read. If so, it increments the probability field of the beacon with the highest RSSI value; i.e., the algorithm looks at all the beacon information read by the smartwatch and keeps the reading with the highest RSSI value. This means that the respective person was closer to that beacon than to the others at that moment, so the probability value of that moment (corresponding to that hour and those minutes) is increased. If no data are received from any beacon at a given moment, the smartwatch could not detect any associated BLE beacons. Possible reasons include sensor disconnection or the person being outside the home. To address this, a new location named “None | Outside” is introduced to account for these occurrences and distinguish them from other events.
Therefore, these data must be divided by the number of days over which the pattern is studied. For example, if we have twelve days of data and, at 1 am, the average room value is 11.4, this indicates that the average day, compared against twelve days, is very close to 100%. With this, the algorithm divides this value by the total number of days so that it obtains a probability between 0 and 1 that the older adult is inside a room at 1 am. This division is made for all data so that the data are consistent and can be worked on. Thus, everything is prepared to determine the probability of a person being in a compartment (c) at a time (t), given by P(c, t). Briefly, the algorithm traverses each hour in five-minute steps and calculates the probability through the average and the number of days, together with the standard deviation and the 25th and 75th percentiles, so that there are data to analyze.
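A sketch of the per-hour statistics for one compartment, assuming an illustrative input that lists, for each monitored day, how many of the twelve five-minute slots of that hour were spent there (the example numbers reproduce the 11.4 case above):

```python
import numpy as np

def hourly_statistics(slots_by_day: list[int]) -> dict[str, float]:
    """Statistics for one (compartment, hour) pair.

    slots_by_day holds, for each monitored day, how many of the twelve
    5-minute slots of that hour were spent in the compartment
    (illustrative input format, not necessarily the paper's exact data structure).
    """
    counts = np.asarray(slots_by_day, dtype=float)
    fractions = counts / 12.0                      # per-day fraction of the hour
    n_days = len(counts)
    return {
        "probability": fractions.sum() / n_days,   # accumulated value / number of days
        "std": float(fractions.std(ddof=0)),
        "p25": float(np.percentile(fractions, 25)),
        "p75": float(np.percentile(fractions, 75)),
    }

# Twelve days of data; at 1 am the accumulated bedroom value is ~11.4,
# so the occupancy probability is roughly 11.4 / 12, i.e. about 0.95.
print(hourly_statistics([12, 12, 11, 12, 10, 12, 12, 11, 12, 12, 11, 10]))
```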
Finally, the algorithm writes what is in the structure for the pattern file, organized in ascending hours, and the probability of each compartment in descending order, visible in
Figure 8. A complete list of the functions defined in Algorithm 1 is presented in
Appendix A.
3.10. Outlier Detection
The Interquartile Range (IQR) rule [19] and the Z-score method [20] emerged as the most suitable techniques for this study. These methods offer robust statistical insights into data deviations from the norm. It is important to note that each technique may require specific adaptations depending on the context in which it is applied. Furthermore, it is worth mentioning that both methods are recommended for data conforming to a normal distribution.
For the outlier detection process, several works have been carried out [21,22,23]. A formula incorporating the Interquartile Range (IQR) with a constant value of 2.2 is suggested to be used as the optimal value. Consequently, the upper and lower limits of the IQR are given by Equations (1) and (2):

UL = Q3 + 2.2 × (Q3 − Q1)   (1)

LL = Q1 − 2.2 × (Q3 − Q1)   (2)

where UL and LL represent the upper and lower limits of the IQR, and Q1, Q2, and Q3 represent the first, second, and third quartiles, respectively.
Based on these two formulas, if a specific value is lower than the result of Equation (2) or higher than the result of Equation (1), then that value is an outlier.
The Z-score method is widely used in many other statistics-related operations [24,25]. By default, any Z-score of 3 or more is considered an outlier since it falls outside the range of approximately 99.7% of the data. The Z-score is given by Equation (3):

z = (x − μ) / σ   (3)

where x is the observed value, μ is the mean, and σ is the standard deviation of the reference data.
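A sketch of both checks applied to a new hourly value against the historical values of a compartment, using the 2.2 constant and the standard Z-score as reconstructed above (the sample numbers are illustrative):

```python
import numpy as np

def iqr_limits(values: np.ndarray, k: float = 2.2) -> tuple[float, float]:
    """Upper and lower IQR limits with the 2.2 constant (Equations (1) and (2))."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q3 + k * iqr, q1 - k * iqr

def z_score(x: float, values: np.ndarray) -> float:
    """Z-score of a new observation against the historical values (Equation (3))."""
    return (x - values.mean()) / values.std(ddof=0)

history = np.array([0.90, 0.92, 0.95, 0.88, 0.93, 0.91, 0.94])  # illustrative hourly probabilities
upper, lower = iqr_limits(history)
new_value = 0.40
print(new_value > upper or new_value < lower)  # True -> flagged by the IQR rule
print(abs(z_score(new_value, history)))        # a large |z| supports the flag
```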
The algorithm accesses the database to look for all the beacons—that is, all the compartments associated with the respective person—and stores them in a structure. This way, it does precisely the same step explained in
Section 3.9, associating all the beacons to just one in the same compartment.
If new data exist, the algorithm proceeds to another verification phase: determining whether the difference between the last date of these new data and the last date in the file name is greater than one hour. The algorithm can only detect outliers in the behavior if there is at least one hour of new data, since the data are organized by hours.
The algorithm processes the new day’s data by organizing them based on the hours and minutes associated with each compartment. For each time interval (every hour and every five minutes), it computes the probability using the mean and the total number of days as described. Additionally, it calculates the standard deviation and the 25th and 75th percentiles, ensuring the availability of detailed statistics for further data analysis. This process mirrors the approach explained in
Section 3.9, providing a consistent framework for analyzing patterns and distributions across time and compartments. After this organization, a base to detect possible outliers is created. For that, the algorithm will look for all the data saved in the pattern file to compare the data of each hour with the new data, keeping them in a new structure.
To detect possible outliers in the new data, a 95% confidence level is considered. Thus, from the number of days of the pattern, the number of degrees of freedom used to calculate the t-score is obtained. The structure containing the indicator data related to pattern recognition is needed at the beginning of each verification, as it is traversed hour by hour in chronological order. Therefore, the algorithm fetches all the information related to each compartment, namely the probability of being in the respective compartment at a specific moment and the 25th and 75th percentiles, and, depending on the time of day, it fetches the statistics calculated from the data received in this process. Each compartment’s probability obtained from the new data is then compared to check whether it falls below the limit of Equation (2) or above the limit of Equation (1), taking the default values. This is the first step taken during verification.
With the set of new data, the algorithm indicates possible outliers at a specific time. Still, these could be false positives, which is why the detection of outliers is divided into three levels: “low”, “medium”, and “high”. The detected outliers are stored in a new structure, divided into the levels mentioned above. Therefore, the level is characterized by the value resulting from the Z-score method in Equation (3). As mentioned in the previous section, any value above three calculated with this method is considered an outlier. When the algorithm was still in the testing phase, it was found that this value was unsuitable for the problem in question. Therefore, in addition to this calculated result, a constant related to the confidence level we wanted to impose on this detection was associated with it. The result generated by this method is multiplied by a constant, where this constant is the previously calculated t-score. Applying this constant to the result increased the accuracy of the calculation by about 65%, making this approach much better than setting a static value. Examples of this are presented in
Table 3.
Furthermore, this solution takes into account how many days of data exist for pattern recognition, since the t-score depends on the number of degrees of freedom and therefore on the number of days of the pattern. In this way, the level assigned to an outlier depends on the result of Equation (3) multiplied by this constant.
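A sketch of how the t-score could scale the Z-score before assigning a level; the degrees of freedom follow from the number of pattern days, while the level thresholds below are purely illustrative, since the paper does not list its exact cut-off values.

```python
from scipy import stats

def outlier_level(z: float, n_days: int, confidence: float = 0.95) -> str:
    """Classify an outlier by scaling the Z-score with the t-score.

    The t-score is taken for the chosen confidence level with n_days - 1 degrees
    of freedom; the level thresholds below are illustrative, not the paper's values.
    """
    t_score = stats.t.ppf(1 - (1 - confidence) / 2, df=n_days - 1)  # two-sided 95%
    scaled = abs(z) * t_score
    if scaled < 5:
        return "low"
    if scaled < 10:
        return "medium"
    return "high"

print(outlier_level(z=2.1, n_days=12))  # 2.1 * ~2.20 ~ 4.6 -> "low"
```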
Still, in detecting the outlier level, there is an auxiliary variable: the person’s BPM over a given hour. Garmin does not provide specific accuracy specifications for heart rate measurement. However, it has been reported that the accuracy of each watch depends on multiple factors, most notably the specific smartwatch used for testing; the typical error is ±7 BPM [26]. According to [27], the normal BPM range in older people is between 60 and 100 BPM. This interval is used so that, if the average of the BPM values within the respective hour falls outside it (taking into account the error associated with the measurement), one outlier level is added to the previously detected one. With this, the algorithm organizes the outliers so that it is possible to understand when each outlier appeared and which compartments benefited from and which were harmed by the new data. When a pattern is recognized in this study, as a rule, one compartment always stands out at each hour; that is, at the respective time, the probability of the person being in that compartment is greater than in the others. So, if, for example, there is a time of day when the person is not in that compartment, the algorithm identifies the compartment affected in the calculation of this outlier as the division that is usually ahead of the rest; the beneficiary is the one that, at that time, has more data than all the others. Furthermore, there may be times without a well-defined pattern: if a compartment that is already among the most likely shows, on a specific day and time, a probability much higher than usual for that time on other days, the algorithm detects this and identifies that a compartment had its probability increased above the standard, even though it is already one of those with the highest chance.
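A minimal sketch of the BPM auxiliary check, assuming the measurement error widens the accepted interval (one possible reading of “taking into account the error”):

```python
GARMIN_ERROR = 7          # typical measurement error reported for the watch, in BPM
NORMAL_RANGE = (60, 100)  # normal range for older adults, per the cited source

def bpm_raises_level(hourly_bpm: list[float]) -> bool:
    """True when the hourly BPM average falls outside the normal range,
    after widening the range by the measurement error (an assumed design choice)."""
    avg = sum(hourly_bpm) / len(hourly_bpm)
    low = NORMAL_RANGE[0] - GARMIN_ERROR
    high = NORMAL_RANGE[1] + GARMIN_ERROR
    return not (low <= avg <= high)

print(bpm_raises_level([112, 118, 121, 115]))  # avg ~116.5 > 107 -> escalate the level
```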
This detection is only possible when the data are not constant at pattern-detection time, that is, when there is a non-zero standard deviation in the calculation of the probability, because if the standard deviation of a probability at a given time is zero, then Equation (3) cannot be used, as it would imply division by zero. The algorithm receives a probability value, converts it into the corresponding number of five-minute intervals within each hour, and assesses whether the deviation between them exceeds specific threshold values. This determination enables the algorithm to discern the outlier level of the new data. The probability of being in each division at a particular time is calculated, yielding a value ranging from 0 to 1. This value is then translated into the corresponding number of five-minute intervals within an hour, allowing threshold values to be established for each calculated probability. For instance, if a compartment A is assigned a probability of 0.6 in a given hour, this value translates to approximately 35–40 min of presence within that hour. Suppose a maximum deviation of 15 min is set for detecting outliers in new data. In that case, any subsequent data indicating a presence of 28 min or less within an hour suggest behavioral changes. Depending on the interval, the algorithm classifies the levels of the outliers being detected. Furthermore, as described earlier, this process uses an auxiliary system that averages the BPM readings for each hour. This information is leveraged to identify who may benefit from or be adversely affected by the observed behavioral changes.
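A worked sketch of the probability-to-minutes conversion and the 15-minute deviation check described above (function names and the exact comparison are illustrative):

```python
def minutes_from_probability(probability: float) -> float:
    """Translate an hourly occupancy probability (0..1) into minutes of presence."""
    return probability * 60

def behaviour_changed(pattern_probability: float, new_probability: float,
                      max_deviation_min: float = 15) -> bool:
    """True when the new presence deviates from the pattern by more than the
    allowed number of minutes (threshold value is illustrative)."""
    expected = minutes_from_probability(pattern_probability)
    observed = minutes_from_probability(new_probability)
    return abs(expected - observed) > max_deviation_min

# Pattern says ~0.6 (about 36 min) in the compartment; the new day shows ~0.3 (18 min):
print(behaviour_changed(0.6, 0.3))  # deviation of ~18 min > 15 -> flagged
```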
After this detection and characterization of each outlier, the algorithm checks, within the structure that stores the levels of outliers and their respective information, whether an outlier detected at a given time had already been seen in the previous hour. To carry out this analysis, the algorithm examines the information on which compartment benefited and which was harmed in each detected outlier and, in turn, increases the level if justified. With this, the algorithm gains a temporal perspective on each detected outlier. After calculating and analyzing these data, if outliers are detected, an email with detailed information on how each outlier was identified is sent to the family member or caregiver associated with the user.
4. Results
The research objectives described in this paper depend on the behavioral normality of the person. We analyze the lower and upper limits, given by the 25th and 75th percentiles. Finally, we evaluate the occupation probability of each compartment, indicating the likelihood of the person being present at a specific time.
4.1. Accumulated Data
Observing the accumulated data makes it feasible to visualize the distribution without excluding any records. These records are grouped solely by sixty-minute periods. All recorded data are depicted in graphs corresponding to the respective compartments. The charts in the following figures will display recorded values ranging from 0 to 12. Here, 12 corresponds to the number of five-minute intervals within one hour. Consequently, every hour is divided into 12 intervals of five minutes each.
In the kitchen (
Figure 9), periods of heightened presence, denoted by a notably increased frequency of data occurrences, occur between 10 am and 1 pm, as well as between 7 pm and 9 pm. This pattern suggests a pronounced presence in the kitchen during meal times. However, between 2 pm and 4 pm, a reduction in presence is observed, attributed to the recognized tendency of individuals to relocate to the living room or embark on outdoor activities such as walking.
The bathroom is traditionally a compartment with short visits, as shown in
Figure 10. Nevertheless, it is possible to identify some recurring periods of use, namely the early morning (probably related to morning hygiene) between 8 am and 9 am, and 10 pm and 11 pm before moving to the bedroom.
The third compartment analyzed was the bedroom. This space is usually used to rest and sleep. From
Figure 11, it is possible to identify the occupation in the evening periods. Between 10 pm and 9 am, there is a strong presence, as well as at 7 pm.
The living room compartment is visible in
Figure 12, where presence is detected in the periods between 2 pm and 4 pm and between 9 pm and 11 pm. The inhabitant confirmed the routine of moving from the kitchen to the living room after dinner and staying there until moving to the bedroom to sleep. The living room is an open space situated midway between the other compartments, which justifies the existence of some movement during the day.
The last area analyzed is related to situations where no data are available or the person was away from home. From
Figure 13, it can be seen that between 3 pm and 6 pm, it is frequent for the person to leave the house. The person himself confirmed this information.
4.2. Time Distribution of Attendances
The analysis of attendance totals does not allow for assessing their distribution (by hour) during the period. In turn, outliers in the training dataset can compromise the identification of patterns by analyzing the maximum and minimum number of presences. A percentile analysis can eliminate noise in the data caused by outliers generated by non-routine events.
After assessing the data distribution by viewing all the daily records, the 25th and 75th percentiles were analyzed. In the kitchen compartment (
Figure 14), we can identify periods of movement, namely between 9 am and 10 am, 12 pm and 1 pm, and 8 pm and 9 pm. As expected, in the bathroom, no movements are identified during most of the day, as shown in
Figure 15. The 75th percentile reveals a more significant presence in the early morning between 8 am and 9 am, in the early afternoon after lunch around 2 pm, and at the end of the day between 10 pm and 11 pm.
As for the bedroom compartment,
Figure 16 shows that the 25th and 75th percentiles are very constant and almost always equivalent throughout the day. Thus, analyzing the percentiles, it can be concluded that they reveal a more significant presence between 12 am and 9 am; also, from 11 pm onward, this presence begins to increase. This is because the person usually sleeps during this period.
In the living room compartment, the presence of users between 2 pm and 4 pm is once again validated through
Figure 17. During this period, the number of records is constant. Furthermore, between 10 am and 11 am, and 10 pm and 11 pm, there is also a significant presence in this compartment.
In the none or outside area, by viewing the graph in
Figure 18, it can be seen that between 4 pm and 6 pm, there is a greater possibility of the person being away from home. In addition, other time intervals may have more significant potential, but this may have to do with sensor failures when collecting the data sent by the beacons in each compartment or the person being out of the house.
4.3. Occupation Probability
Calculating the probability of an inhabitant’s presence in a specific compartment was an additional mechanism to study movement patterns and occupancy within the living environment. This approach provided valuable insights into the likelihood of finding an individual in a particular compartment during a given hour. To assess each compartment’s occupancy probability hour by hour, the average number of occurrences for each compartment is calculated and divided by the number of days. The calculation of compartment occupancy probabilities, as outlined in Section 3.9, contributed to the broader analysis of daily living patterns and behavior. It complemented the detection of outliers and patterns identified through other mechanisms, providing a comprehensive view of the inhabitant’s activities and movements within the living environment.
After calculating the probability, hour by hour, the similarity between the occurrence chart (
Figure 9) and the probability chart (
Figure 19) was again verified. According to the probability graph in the kitchen compartment, it is more likely that the person is in this compartment due to breakfast, lunch, and dinner. It can be seen that the intervals are between 10 am and 11 am, 12 pm and 1 pm, and 8 pm and 9 pm, respectively.
The bathroom compartment does not have very high values for the probability of presence, essentially justified by the purpose of this compartment and the type of short-term activities that take place there, as verified through
Figure 20. Even so, the highest values observed between 9 am and 10 am and 10 pm and 11 pm agree with the patterns previously identified in
Figure 10 and
Figure 15.
In the bedroom compartment, the probability graph of
Figure 21 shows a very well-defined probability of being in the bedroom between 12 am and 9 am. In addition, from 11 pm onward, the likelihood of being in the bedroom increases. This is because the person is sleeping during this time interval.
In the living room compartment, confirmation of presence around mid-afternoon is visible when the probability value is calculated, reaching high values between 3 pm and 4 pm, as shown in
Figure 22. This result is excellent compared to the values obtained with the same analysis in the none or outside area (
Figure 23).
As for the none or outside area, the probability graph in
Figure 23 shows a reasonable probability of the person being away from home between 4 pm and 6 pm. At other times, the relevance of this area may be related to the sensor occasionally failing to read the data from the beacons of each compartment.
Observing the probability of space occupation across all compartments, as depicted in
Figure 24, allows for easy discernment of movement patterns. It is evident that the individual under study typically occupies the bedroom for sleeping between 11 pm and 8–9 am. During this period, the person will likely visit the bathroom at night. Subsequently, the person arises from bed, exits the bedroom, and heads to the bathroom for morning activities between 9 am and 10 am. Following this, between 10 am and 11 am, it is expected to find the individual in the kitchen for breakfast. Between 11 am and 12 pm, the person is typically between the living room and the kitchen. A high probability of being in the kitchen is observed between 12 pm and 1 pm, coinciding with the customary lunch hour. Post-lunch, individuals typically remain in the living room, with occasional bathroom visits, between 1 pm and 4 pm. Subsequently, between 4 pm and 7 pm, it is customary for the person to be outside the house for a walk. The individual is expected to be in the kitchen for dinner between 7 pm and 9 pm. Following dinner, the person may spend time between the living room and the bathroom until approximately 11 pm.
4.4. Outlier Detection
During the experimentation phase, the developed model was submitted to an analysis to identify outliers in the newly received data. These outliers were categorized into three levels, reflecting the degree of deviation from standard patterns. The objective was to continually improve the model’s performance, minimize false positives, and enhance its accuracy in responding to the evolving data.
Figure 25 provides a graphical representation of the outcomes to evaluate the model’s overall performance. This graph depicts the number of outliers identified at each level during the experimentation phase and the conclusion of the work. This iterative process aimed to fine-tune the model’s capabilities, enhancing its ability to detect and classify outliers accurately.
Therefore, in
Figure 26, there is an example of an email generated when the algorithm detects an outlier upon receiving new data collected by the person’s smartwatch. From Figure 26, it can be seen that most of the outliers created were of the “low” level. This is because, during those hours, data that deviated from the standard were detected, but the deviations were not critical or were even negligible. Also, notice that the email contains two types of messages.
The first message corresponds to a compartment with a significant occupancy probability at a given time whose probability has been reduced on that day in favor of another compartment. That is, the person’s pattern is usually to be in a particular compartment, but on that day the person behaved differently, spending more time in another compartment than usual. For example, this first type of message can be seen in the phrase “At 03:00 AM, the usual location has been changed from the ‘Bedroom’ to the ‘Bathroom’”. This means that the presence in the “Bedroom” has become a pattern at that hour of the day and that, on that specific day, the “Bathroom” is also trending.
The second message corresponds to a compartment whose pattern is defined but not markedly so. In this case, when a very high probability is observed for that same division, a message can be generated indicating that the likelihood of that compartment has increased even further. This second type of message generally appears at the “low” level of outliers, as it is rare for there to be an hour without at least one reasonably well-defined compartment, which at least makes it possible to analyze whether the probability of frequenting that space is high. For example, in the generated email, we can see a message of this type: “At 12:00 PM, the usual location has been increased by ‘Kitchen’ to the ‘Living Room’”.
In this way, the email manages to be quite descriptive of what it detected, helping whoever is reading what is happening that day.
5. Conclusions
Pattern recognition was successfully achieved in all compartments through a series of analyses. The identification of movement patterns was confirmed through the different analyses: accumulated data, percentiles, probability, and the outlier detection mechanism. Even though the dataset may contain irregularities in the records, the model demonstrated the ability to adapt to the identified patterns. The model’s ability to adapt to and recognize the identified patterns highlights its robustness and reliability.
The accumulated data were crucial in establishing movement patterns within the compartments. By analyzing the cumulative data over time, we could discern consistent trends and understand how activities unfolded within each area. Percentiles provided further insight into the distribution of values within the dataset. This allowed for a more nuanced understanding of the variations in movement patterns and helped identify activities that deviated from the norm. The probability analysis significantly determined the likelihood of specific events or actions occurring within the compartments. By assigning probabilities to various scenarios, we could quantify the occurrence of particular behaviors and validate the identified patterns. Furthermore, the outlier detection mechanism was another validation tool for the identified movement patterns. The mechanism ensured that any irregularities or unexpected behaviors were accounted for and adequately addressed by detecting and flagging anomalies or outliers. Identifying outliers and studying patterns are distinct processes, but they complement each other in ensuring the correct interpretation of the information. We can validate the identified patterns and comprehensively understand the data by analyzing both aspects.
Limitations of the method occur when the number of beacons increases and in the data processing step. The complexity of Algorithm 1 is primarily determined by the pairwise comparison of beacons, which scales quadratically as O(b²), and the data processing step, which scales as O(n · c · r), where b is the number of beacons, n is the number of data points, c is the number of compartments, and r is the number of RSSI values per compartment. Future work may involve optimizing these two components. To reduce the pairwise beacon comparison complexity, beacons can be grouped into clusters based on logical proximity, and only beacons within the same cluster need to be compared. Another possibility is to use hashing techniques such as Locality-Sensitive Hashing [28] to approximate string similarity instead of exact comparisons. To reduce the complexity of the data processing step, parallel processing can be used to run independent operations on different compartments and time intervals.
In conclusion, the combination of pattern recognition and outlier detection mechanisms enabled the correct interpretation of the information within the dataset. Various analyses identified and validated movement patterns, including accumulated data, percentiles, probability assessments, and outlier detection. This comprehensive approach ensured the accuracy and reliability of the results and provided valuable insights into the daily activities and behaviors of the individuals under study.