1. Introduction
Studies indicate that 17% of the UK population is aged over 65, that one million people will have dementia by 2025 and that this number will increase to two million by 2050 [
1,
2,
3]. These numbers highlight a critical situation that needs to be managed. Cognitive impairment is a condition affecting the memory and thinking abilities of elderly people [
4]. This condition makes elderly people dependent on their caregivers. However, studies show that ageing in place can help to mitigate the effects of cognitive decline. Providing a living environment in a smart home can assist elderly people suffering from dementia to lead an independent life. Moreover, tracking the daily activities of elderly people in such a smart home would be helpful to detect the early indicators of dementia.
The indicators of cognitive impairment can be observed in daily activities, such as cooking and eating [
5,
6]. Monitoring the trends over time and tracking the changes in activity patterns, such as getting up repeatedly during the night and failure to complete tasks, would be useful to understand the markers of cognitive decline. For example, an elderly person suffering from Alzheimer’s may have abnormalities in their sleeping patterns, such as waking up or going to the toilet in the middle of the night. Moreover, there can be an abnormality in their eating habits (forgetting to have dinner, for example), or they may suffer from the consequences of dehydration because of forgetting to drink water. They may also get confused and make mistakes while performing activities such as running the dishwasher; they may confuse names in a phone book; they might forget to turn off heaters and kitchen appliances [
7].
In-home automatic assessment of cognitive decline has been the subject of many studies [
8,
9,
10,
11,
12]. Currently, questionnaires or in-person examinations are being used by experts to evaluate the cognitive status of elderly people. The work reported in [
11] compared the paper-based Montreal Cognitive Assessment with its electronic version (eMoCA) using 401 participants split into two groups. A demographic questionnaire was built into the eMoCA. The study shows that the eMoCA has the potential to screen for early changes in cognitive function and to reach rural or remote communities. In [
12], the authors assumed that leisure-time physical activity (LTPA) is protective against decline in cognitive performance. They assessed cognition using in-person questionnaires administered to the participants. The results show an independent association between a low level of LTPA and a greater decline in cognitive performance. However, examination methods of this kind poorly represent the cognitive status of an elderly person, since they only depend on pre-defined questions asked within a short time. Our study relies on the idea that indicators of cognitive impairment can be observed in daily activities. Thus, monitoring the activities of an elderly person in a smart assisted environment would be helpful to assess their cognitive status. Such a system could be used as a decision-support tool for caregivers and medical doctors, helping them take action towards improving the person's quality of life.
Daily activities are often composed of several sub-activities [
13]. For example, the activity “preparing coffee” consists of the following sub-activities: boiling water; taking a cup; mixing coffee and water. These sub-activities are important in the detection of abnormal behaviour related to dementia. For example, an elderly person suffering from cognitive decline may get confused during the performance of an activity, and this may result in the repetition or skipping of some sub-activities. The repetition frequencies of sub-activities and their correlations can be clues regarding the abnormal behaviour arising from cognitive impairment. Unfortunately, existing studies treat each activity as an atomic unit and fail to model activities based on their sub-activities, and thus fail to capture the relationships among sub-activities, which might be important in the context of dementia. This study addresses this shortcoming by constructing activity instances hierarchically from their sub-activities. Activity recognition resembles scene parsing or phrase detection, which are hierarchical learning problems. Inspired by solutions to these problems [
14,
15,
16], we explored recursive auto-encoders to model daily activities from their low-level sub-activity structures hierarchically and then detect abnormal behaviour arising from cognitive decline.
Unfortunately, there exists no publicly available dataset on abnormal behaviour of people with dementia. Producing such a dataset would require time and an adequate experimental environment. When there is no real-world dataset available, data simulation can be a solution [
17,
18,
19,
20]. Given the scarcity of such data, simulating the daily-life abnormal behaviours of elderly people suffering from dementia would be helpful for developing automatic assessment methods. Thus, in this paper, a method is proposed to artificially produce abnormal activities reflecting the typical behaviour of elderly people with dementia.
The proposed application would be used as a cognitive status assessment method in the natural flow of the daily life of elderly people suffering from cognitive decline. The proposed method would serve as a warning and decision-support system rather than a decision-making system. As described in
Figure 1, the model learns the normal activities and detects possible candidates for abnormal behaviour, which are sensor representations deviating from the normal ones. The detected abnormal activities are then presented to the caregiver or the medical doctors to support their decision making. Final decisions would be made by the clinicians.
The contributions of this paper are three-fold.
Simulation of abnormal behaviour: A method is presented to simulate abnormal behaviour stemming from cognitive decline. More specifically, activity-related (repetition of activities and sleep disorder anomalies) and sub-activity-related (confusion) abnormal behaviour instances are generated from real-world data.
A new sensor representation: Raw sensor measurements coming from sequential data are represented using raw sensor triggering information rather than a bag-of-words style approach. This representation encodes granular-level information such as the frequency of each sensor activation and their relative activation order.
Modelling activities hierarchically: Recursive auto-encoders and their variants are used to model sensor-based daily activities based on their sub-activities and to detect abnormal behaviour related to cognitive impairment.
The rest of the paper is organised as follows.
Section 2 reviews the related work.
Section 3 describes the dataset used and explains dementia-driven data generation; it also presents the sensor representations along with the variants of the auto-encoder models for abnormal behaviour detection.
Section 4 presents the experimental settings and results.
Section 5 provides a discussion of the results. Finally,
Section 6 concludes the paper.
2. Related Work
Automatic assessment of cognitive impairment has been tackled using many machine learning approaches, such as support vector machines (SVMs), naïve Bayes (NB) methods [
21], restricted Boltzmann machines (RBMs) [
22], Markov logic networks [
9,
23], hidden Markov models (HMMs) [
19,
24], random forest methods [
20], hidden conditional random fields [
25], recurrent neural networks (RNNs), convolutional neural networks (CNNs) [
26,
27] and some hierarchical models [
28,
29].
Cognitive Assessment Studies: In [
30,
31], participants were asked to complete a pre-defined set of tasks, and their cognitive status was evaluated based on their performance. The performance score was calculated based on the duration of the activity and the sensor activations. In [
32], the authors focused on kettle and fridge usage and sleep patterns. The cognitive status of a person was assessed based on the kettle and fridge usage times, durations and frequencies; and the duration of sleep. In [
33], the authors designed games to assess the cognitive status of an elderly person. Unfortunately, these assessment methods were not performed in the natural flow of daily activities. Rule-based systems require trained experts and manually designed and integrated rules for each individual person, since daily life habits change from person to person. For example, while one person has the habit of going to the toilet frequently during the night, this routine might be abnormal for another person. On the other hand, our method does not involve expert input, because it learns the habits of people automatically from training data. In this study, we aim to detect abnormal behaviour in a real-life scenario and in the natural flow of daily living without providing any instructions, rules or tasks.
Deep Learning Studies: In [
34], features were extracted and selected from sequential data using RBMs. In [
35], convolutional neural networks (CNNs) and long short-term memory (LSTM) recurrent neural networks (RNNs) were used to recognise activities from wearable sensors. In [
36], the authors explored convolutional and recurrent approaches on movement data captured with wearable sensors. In [
37], the authors utilised CNNs to classify activities using smartphone sensors. In [
38], features from raw physiological signals were extracted using CNNs, and a multivariate Gaussian distribution was exploited to identify risks. Unlike our work, these studies exploited wearable sensor data, which would not be applicable to our task, since elderly people may find wearing sensors intrusive. In [
26,
27], the authors exploited RNNs and CNNs to detect abnormal behaviour stemming from cognitive decline; however, these studies failed to capture the intrinsic structure of activities and cannot detect anomalies occurring at the sub-activity level.
Data Generation Studies: In [
19], the authors modified a real-world dataset to synthesise health-related abnormal behaviours. Daily living activities such as sleeping and waking up were chosen, and abnormal behaviours such as frequent toilet visits, no exercise and sleeping without dinner were synthesised. In [
20], more data were synthesised using HMMs based on real data collected. To increase the realism of data simulation, the authors modelled the sensor events by a combination of Markov chains and the Poisson distribution. However, in both [
19,
20], it was not mentioned in detail how the data synthesis was done. In [
17], the authors modified a real-life dataset, converting the rooms into activities. The authors focused on walking and eating in conjunction with the sleeping activity, and samples of these activities were manually inserted.
Hierarchical Modelling Studies: In [
25], the authors exploited hidden conditional random fields (HCRFs) to detect abnormal behaviour by considering sub-activities and their relations. First, activities were recognised by using HCRFs; then a threshold-based method was used to detect abnormal behaviours. However, they did not build activities from their sub-activities; they looked for anomalies in sub-activities manually. For example, for an anomaly occurring in the sub-activity “forget to turn off the tap”, they checked the HCRF confidence value calculated for this sub-activity specifically. In [
9], the authors detected anomalies by exploiting a Markov logic network, which uses rule-based reasoning and probabilistic reasoning. Unfortunately, these rules would need to be changed based on the home environment, sensors and habits of the elderly. In [
23], these rules were learned automatically by using a formal rule induction method. In our study, abnormal behaviour is defined in the context of sequences, considering their relationships with the preceding and following activities, similarly to [
9,
25]. In [
28], recursive auto-encoders (RAEs) were used to cope with the scarcity of data. The authors applied transfer learning when there was limited data available. They learnt “normal” behaviour in a source household and then transferred the parameters of a RAE to another house (target) to detect the abnormal behaviour of dementia sufferers. In [
29], graph convolutional networks (GCNs) were exploited to build daily activities from their granular-level structures in order to detect abnormal behaviour arising from cognitive impairment.
Impressive results have been obtained with recursive models in hierarchical learning problems, such as parsing, sentence-level sentiment analysis, paraphrase detection and scene parsing [
14,
15,
39]. In [
40,
41], auto-encoders were exploited for anomaly detection in time-series sequences. In [
15], the authors used recursive auto-encoders for predicting sentiment distributions. Instead of using a bag-of-words model, hierarchical compositional semantics was exploited to understand the sentiment. Inspired by [
14,
15], we aim to hierarchically merge sensor readings coming from time-series sensor activation data. This model will be helpful to understand the intrinsic sub-structures of activities and to extract sub-activities.
Data Simulation Studies: Many studies used data simulations to cope with the scarcity of data [
17,
19,
20]. In [
28], transfer learning via recursive auto-encoders (RAE) was used to detect abnormal behaviour of elderly people when there was limited data available. First, normal behaviour was learned in a source household, and then the parameters of a RAE were transferred to another house (target) to detect abnormal behaviour. In [
19], the authors modified a real-world dataset to synthesise health-related abnormal behaviour for their experiments. In [
20], more data were synthesised using hidden Markov models (HMMs) based on a small set of real data collected. In [
17], the authors modified a real-life dataset of an older adult, basically converting the rooms into activities.
Sensor Representations: The studies in the literature exploit binary, changing and lasting features [
42]. However, these features are extracted from time-slice chunks within a given time window and neglect the interactions between sensors and their triggering order and frequency. Similarly to our work, in [
43], the authors tried to capture the relationships between sensor activations. They learn an adjacency matrix reflecting the sensor topology in the house.
3. Materials and Methods
In this section, firstly, the dataset used is presented along with the simulation of the abnormal behaviour. Secondly, two different sensor representations, namely, bag-of-sensors and raw-sensor-measurement, are described. Thirdly, recursive auto-encoder models and their variants, namely, traditional and greedy RAEs are presented. Lastly, an abnormal behaviour detection method is summarised.
3.1. Dataset Description
The proposed RAE-based method was evaluated on the Aruba testbed provided by the CASAS smart home project [
44]. In our study, we used three door and 31 motion sensors and excluded the temperature sensors, since they do not add any additional information. The data were collected over 224 days and recorded as sensor readings with time-stamps. In total, there are 11 daily activities in this dataset, namely, “meal preparation”, “relaxing”, “eating”, “work”, “sleeping”, “washing dishes”, “bed to toilet”, “entering home”, “leaving home”, “housekeeping” and “respirating”. Unfortunately, the Aruba dataset does not include any abnormal behaviour reflecting the cognitive status of elderly people with dementia. Therefore, we need to generate some artificial abnormal behaviour.
3.2. Simulation of Dementia-Related Abnormal Behaviour
In this study, we generate two types of abnormal behaviour observed in daily activities of elderly people with dementia: (i) activity and (ii) sub-activity-related abnormal behaviour. In activity-related anomalies, an activity itself is totally normal, but there is an anomaly related to its frequency or its occurrence time (before/after certain activities). On the other hand, a sub-activity-related anomaly occurs in the intrinsic structure of the activity (frequency of sensor activations, their order and correlation). In the first one, activities as a whole are repeated or forgotten; in the second one, some steps (sub-activities) of activities are forgotten or repeated.
3.2.1. Activity-Related Abnormal Behaviour
An elderly person with cognitive decline has a tendency either to forget or repeat a certain activity [
45,
46]. This kind of abnormal behaviour is simulated by inserting certain activities within the activity sequence of the day. This simulation generates abnormal activities at an abnormal time of the day, such as cooking or going to the toilet in the middle of the night, reflecting degeneration of the sleep–wake cycle, which is a symptom of cognitive decline [
45,
47]. We injected instances of the following activities into the normal activity sequences to generate abnormal activities related to frequency: meal preparation, eating, work, washing dishes, leaving home and entering home. We injected relaxing, eating, bed to toilet and respirating into the normal activity sequences of the sleeping activity to mimic abnormal behaviour stemming from sleeping disorders (see Algorithm 1). In total, we manually generated 77 abnormal activity instances.
Algorithm 1: Simulation of abnormal activities.
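Since the body of Algorithm 1 is not reproduced above, the following Python sketch only illustrates the injection idea described in the text; the data layout (a day as a list of labelled activity segments), the function names and the random insertion strategy are assumptions rather than the authors' exact procedure.

```python
import random

# Activities injected into the day (frequency anomalies) and into the sleeping
# period (sleep-disorder anomalies), as listed in Section 3.2.1.
ABNORMAL_DAYTIME = ["meal preparation", "eating", "work", "washing dishes",
                    "leaving home", "entering home"]
ABNORMAL_DURING_SLEEP = ["relaxing", "eating", "bed to toilet", "respirating"]

def inject_repeated_activity(day, activity_pool, rng=random):
    """Copy a randomly chosen instance of one of the given activities and
    insert it at a random position in the day (frequency-related anomaly)."""
    candidates = [seg for seg in day if seg[0] in activity_pool]
    if not candidates:
        return day
    extra = rng.choice(candidates)
    pos = rng.randrange(len(day) + 1)
    return day[:pos] + [extra] + day[pos:]

def inject_sleep_disorder(day, rng=random):
    """Insert an activity such as 'bed to toilet' right after a sleeping
    segment to mimic a degenerated sleep-wake cycle."""
    sleep_idx = [i for i, seg in enumerate(day) if seg[0] == "sleeping"]
    if not sleep_idx:
        return day
    extra = (rng.choice(ABNORMAL_DURING_SLEEP), [])   # sensor events omitted here
    pos = rng.choice(sleep_idx) + 1
    return day[:pos] + [extra] + day[pos:]

# Toy day: each segment is (activity label, list of sensor events).
day = [("sleeping", []), ("meal preparation", []), ("eating", []), ("work", [])]
abnormal_day = inject_sleep_disorder(inject_repeated_activity(day, ABNORMAL_DAYTIME))
```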
3.2.2. Sub-Activity-Related Abnormal Behaviour
Elderly people with cognitive impairment may get confused while performing daily activities. As a result, they might tend to perform some sub-activities more than once, or change the order of sub-activities within an activity. For example, during the washing clothes activity, an elderly person may be confused about how to use the washing machine and may press its buttons several times. The sensors on the machine would then be triggered more often than they should be.
The generation of this kind of anomaly is done by repeating some sensor activations in a given activity instance (see Algorithm 2). For this purpose, given random instances of the working, eating, meal preparation and bed to toilet activities, we randomly repeat the sensors involved in these activities. For the working activity, for instance, the repeated sensor can be thought of as emulating the computer, so repeating its triggering emulates confusion while using a computer. For example, assume that $S = (s_1, s_2, \ldots, s_n)$ is a randomly chosen sequence of the working activity, where each $s_i$ represents a sensor activation. The chosen sensor is inserted into $S$ at random locations with a random frequency, which results in abnormal activations of that sensor in the modified sequence (see
Figure 2). In total, 69 abnormal activities are generated in this category.
Algorithm 2: Simulation of abnormal sub-activities.
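As with Algorithm 1, the pseudo-code of Algorithm 2 is not shown here, so the sketch below is only a rough illustration of the sensor-repetition idea; the sensor ids, the number of repetitions and the function name are illustrative assumptions.

```python
import random

def repeat_sensor(sequence, target_sensor, max_extra=3, rng=random):
    """Insert extra activations of `target_sensor` at random locations of a
    sensor-activation sequence, emulating confusion during an activity.
    `sequence` is an ordered list of sensor ids."""
    modified = list(sequence)
    for _ in range(rng.randint(1, max_extra)):   # random repetition frequency
        pos = rng.randrange(len(modified) + 1)   # random insertion location
        modified.insert(pos, target_sensor)
    return modified

# Toy 'working' instance with placeholder sensor ids (not the actual Aruba names);
# repeating the sensor near the computer emulates confusion while using it.
working = ["M1", "M2", "M3", "M2"]
abnormal_working = repeat_sensor(working, target_sensor="M2")
```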
3.3. Feature Engineering
In this study, raw sensor readings are mapped onto two representations, namely, bag-of-sensors (BOS) and raw-sensor-measurement (RSM) representations.
3.3.1. Bag-Of-Sensors (BOS)
This representation is the same as the raw feature representation described in [
42]. However, we name it BOS, since it resembles the bag-of-words representation used in the document recognition literature. This representation ignores the context of sensor events in a given duration. Firstly, time-slice chunks are segmented from the raw sensor data using a sliding window approach [
42]. A time-slice chunk can be thought of as a bag that collects the sensors which are triggered in a given time. A vector of length
N, where
N is the total number of sensors in the dataset, is initialised to zeros, and the sensors triggered at a given time are set to 1. This feature ignores the frequency and the order of activations.
For example, sensor readings from the Aruba dataset within a one-minute window are shown in
Figure 3. There are 34 sensors in the Aruba test-bed. Thus, the BOS representation for this chunk will be 0011101000000000000000000000000000, where only the positions of the triggered sensors are set to 1. Although one of these sensors is triggered two times and another is triggered only once, they have the same effect on the representation. Moreover, the order in which the sensors are activated is also lost in this representation.
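As a concrete illustration of the BOS feature, the sketch below builds the binary vector described above; the sensor names are placeholders and the 34-sensor list is only assumed to mimic the Aruba setup.

```python
def bag_of_sensors(triggered_sensors, all_sensors):
    """Binary vector of length len(all_sensors): positions of sensors that fired
    at least once in the time slice are set to 1; frequency and order are lost."""
    index = {name: i for i, name in enumerate(all_sensors)}
    vector = [0] * len(all_sensors)
    for sensor in triggered_sensors:
        vector[index[sensor]] = 1
    return vector

# Placeholder ids for the 31 motion and 3 door sensors (34 in total).
ALL_SENSORS = [f"M{i:03d}" for i in range(1, 32)] + ["D001", "D002", "D003"]
one_minute = ["M005", "M004", "M005", "M003", "M007"]   # toy one-minute slice
bos = bag_of_sensors(one_minute, ALL_SENSORS)           # 34-dimensional 0/1 vector
```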
3.3.2. Raw-Sensor-Measurement (RSM)
In this version, the frequency of and the correlation between the sensor activations are preserved. For example, given the one-minute data in
Figure 3, the RSM representation is the ordered sequence of triggered sensors, which is then mapped onto a one-hot encoded representation for each sensor activation. The extracted representation has a variable size (number of sensor activations in the given time window × total number of sensors in the dataset), whereas BOS has a fixed size (1 × total number of sensors in the dataset; 34 in this case). The BOS feature ignores the relative order and the frequency of sensor activations, whereas this information is captured by the RSM representation. However, the order of sensor activations, their correlation with other sensors and their frequency are granular, important details for detecting anomalies related to dementia.
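For comparison with the BOS sketch above, a minimal RSM sketch is given below; it keeps one one-hot vector per activation, so the order and frequency of activations are preserved (again with placeholder sensor names).

```python
def raw_sensor_measurement(triggered_sensors, all_sensors):
    """Ordered sequence of one-hot vectors, one per sensor activation
    (shape: number of activations x number of sensors)."""
    index = {name: i for i, name in enumerate(all_sensors)}
    rsm = []
    for sensor in triggered_sensors:
        one_hot = [0] * len(all_sensors)
        one_hot[index[sensor]] = 1
        rsm.append(one_hot)
    return rsm

ALL_SENSORS = [f"M{i:03d}" for i in range(1, 32)] + ["D001", "D002", "D003"]
one_minute = ["M005", "M004", "M005", "M003", "M007"]
rsm = raw_sensor_measurement(one_minute, ALL_SENSORS)   # 5 x 34 instead of 1 x 34
```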
3.4. Auto-Encoder Models for Abnormal Behaviour Detection
An auto-encoder network is an architecture that takes an input and is trained to reproduce that input in its prediction layer. Auto-encoders are unsupervised in the sense that they do not need explicit labels during training; they work in a self-supervised fashion, learning the model parameters from the training data itself. An auto-encoder consists of an encoder compressing the input, a decoder reconstructing the input and a loss function calculating the error between the real input and the reconstructed input.
In a recursive auto-encoder (RAE), which originated from [
48], given two children, an encoding function first constructs the parent. Then the children are reconstructed by a decoding function to calculate the loss. The same encoder and decoder are used at all levels of the tree recursively. We will focus on two types of RAEs: traditional RAEs and greedy RAEs.
3.4.1. Traditional Linear Recursive Auto-Encoders
In a traditional RAE, a parent is constructed by merging a child with its neighbour. In
Figure 4 (figure retrieved from [
15]), a list of inputs $(x_1, x_2, \ldots, x_n)$ is given. First, the children $c_1 = x_1$ and $c_2 = x_2$ are merged to calculate the parent $p$ so that $p = f(W_e[c_1; c_2] + b_e)$, where the weight matrix $W_e$ is multiplied with the concatenated children and a bias term $b_e$ is added before applying an element-wise activation function $f$ such as $\tanh$. Next, the parent vector $p$ is merged with the next child $x_3$. The same procedure is applied recursively at all levels of the tree. The model then reconstructs the children in a reconstruction layer: $[c_1'; c_2'] = W_d\, p + b_d$. In the end, the reconstruction errors are minimised in a training phase to learn the model parameters. The reconstruction error is calculated as $E_{rec} = \lVert [c_1; c_2] - [c_1'; c_2'] \rVert^2$. The process repeats until the full tree is constructed and a reconstruction error is obtained at each non-terminal node. The encoding and decoding weight matrices are learned by applying the back-propagation algorithm.
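The following numpy sketch shows the encode/reconstruct step of a traditional RAE with randomly initialised (untrained) weights; in the actual method these matrices are learned by back-propagation, and the dimensions, initialisation and chunk size used here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 34                                   # feature dimension of each child

# Randomly initialised parameters (learned by back-propagation in practice).
W_e, b_e = rng.normal(scale=0.1, size=(n, 2 * n)), np.zeros(n)       # encoder
W_d, b_d = rng.normal(scale=0.1, size=(2 * n, n)), np.zeros(2 * n)   # decoder

def encode(c1, c2):
    """Parent p = tanh(W_e [c1; c2] + b_e)."""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def reconstruction_error(c1, c2):
    """Squared error between the children and their reconstructions."""
    p = encode(c1, c2)
    rec = W_d @ p + b_d                  # [c1'; c2'] = W_d p + b_d
    return p, np.sum((np.concatenate([c1, c2]) - rec) ** 2)

def traditional_rae_tree(rows):
    """Merge rows left to right: p1 = enc(x1, x2), p2 = enc(p1, x3), ...
    Returns the root and the per-merge reconstruction errors."""
    parent, errors = rows[0], []
    for child in rows[1:]:
        parent, err = reconstruction_error(parent, child)
        errors.append(err)
    return parent, errors

chunk = rng.integers(0, 2, size=(25, n)).astype(float)   # 25 one-minute BOS rows
root, errors = traditional_rae_tree(list(chunk))
```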
3.4.2. Greedy Recursive Auto-Encoder
In a greedy RAE, the two children that give the least reconstruction error are merged at each tree level. This greedy approach is described as follows. Assume that a sequence of instances $(x_1, x_2, \ldots, x_n)$ is given (see
Figure 5). First, the parent of the children $[x_1, x_2]$ is encoded and then the children are reconstructed. The reconstruction error $E_{1,2}$ is calculated and kept in memory. Then, the merging is shifted to the right by one child, where the parent of the children $[x_2, x_3]$ is encoded and the reconstruction error $E_{2,3}$ is calculated. This shifting is done until the last child is used. The minimum error among the errors $\{E_{1,2}, E_{2,3}, \ldots, E_{n-1,n}\}$ is chosen and the corresponding children are merged at that level. Let us assume $E_{2,3}$ is the minimum, which is the result of merging the children $[x_2, x_3]$. The first merging for the first level of the tree is therefore done with $[x_2, x_3]$, and these children are represented by their parent $p_1$. The merging for the second level is then done with the sequence $(x_1, p_1, x_4, \ldots, x_n)$, and it continues in the same greedy manner until only one parent (the root) remains in the last layer.
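A corresponding sketch of the greedy merging procedure is given below, reusing the same untrained encoder/decoder form as above; which parent errors are kept and the toy input are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 34
W_e, b_e = rng.normal(scale=0.1, size=(n, 2 * n)), np.zeros(n)
W_d, b_d = rng.normal(scale=0.1, size=(2 * n, n)), np.zeros(2 * n)

def merge(c1, c2):
    """Encode the parent of (c1, c2) and return it with its reconstruction error."""
    children = np.concatenate([c1, c2])
    p = np.tanh(W_e @ children + b_e)
    rec = W_d @ p + b_d
    return p, np.sum((children - rec) ** 2)

def greedy_rae_tree(sequence):
    """At each level, merge the adjacent pair with the smallest reconstruction
    error until a single parent remains; return the root and all parent errors."""
    nodes, errors = list(sequence), []
    while len(nodes) > 1:
        candidates = [merge(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, err = candidates[best]
        errors.append(err)
        nodes[best:best + 2] = [parent]      # replace the merged pair by the parent
    return nodes[0], errors

# One-hot encoded sensor activations of a one-minute RSM slice (toy example).
sequence = [np.eye(n)[i] for i in [4, 3, 4, 2, 6]]
root, parent_errors = greedy_rae_tree(sequence)
```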
3.5. Abnormal Behaviour Detection
First, the dataset is divided into training and testing sets, and the training set is used to learn the parameters of the encoding function ($W_e$ and $b_e$) and the decoding function ($W_d$ and $b_d$) of a RAE model. Then RAE trees are constructed for the test instances in order to build their parents. The main motivation is that, given a training set of normal activities, the RAE learns a feature representation that encodes and models normal behaviour. Abnormal behaviour is defined as behaviour deviating from the expected behaviour. When a new test instance is introduced, the model will reconstruct normal instances with a small error, while abnormal instances will be poorly reconstructed. Thus, the reconstruction error is exploited to decide whether an instance represents normal or abnormal behaviour. Two different methods are used to construct RAE trees, as follows.
3.5.1. BOS Feature Merging Method
A sliding window of one minute is applied to the raw data (in both the training and testing datasets) and the sensor readings in each window are mapped onto the BOS representation (
Section 3.3). Then a window size of $w$ is used to extract chunks from these BOS representations. Thus, these chunks have a size of $w \times n$, where $n$ is the number of features (=34). Then each row of a chunk is merged with its next row using the traditional RAE until only one parent is constructed in the end.
In Equation (1), the error between the original children $c_1$ and $c_2$ and their reconstructed versions $c_1'$ and $c_2'$ is calculated using the mean squared error (MSE), where $N$ is the total number of features of each child. Then the error of each parent is used to decide whether there is an abnormality in the children or not. Here, a constructed RAE tree for an input spans the time-slices of a 25-minute chunk, and the relationship between the one-minute slices is taken into account during the mergings in the RAE.
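As a simple illustration of how parent reconstruction errors could be turned into an abnormality decision, the sketch below applies a mean-plus-k-standard-deviations threshold estimated from the errors of normal training instances; the exact thresholding rule used in the experiments is determined empirically and is not claimed here.

```python
import numpy as np

def is_abnormal(test_errors, train_errors, k=2.0):
    """Flag test instances whose tree reconstruction error deviates strongly
    from the errors seen on normal training data. The mean + k*std rule is an
    illustrative choice; the actual threshold is tuned experimentally."""
    threshold = np.mean(train_errors) + k * np.std(train_errors)
    return np.asarray(test_errors) > threshold

# train_errors: reconstruction errors of normal training instances,
# test_errors: errors of new instances produced by the trained RAE.
train_errors = [0.8, 1.1, 0.9, 1.0, 1.2]
test_errors = [0.95, 3.7]                 # the second instance would be flagged
flags = is_abnormal(test_errors, train_errors)
```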
3.5.2. RSM Feature Merging Method
First, each one-minute time-slice is mapped onto the RSM representation. Inspired by [
15], where the words of a sentence are merged by a RAE, we treat each sensor activation as a word and each extracted RSM as a sentence. In the extracted RSM feature, each sensor activation is treated as a word, and, as in a sentence, the order of these words is important for determining the context. The sensor activations in the RSM representations are merged hierarchically by the greedy RAE, with each sensor activation represented by a one-hot encoding during the merging. The error of each RSM tree is used to decide whether that time-slice is abnormal or not. This error is computed in two ways: first, as the average error of all parents in the tree; second, as the error of the last parent. The experiments with this feature are performed in two modes following the same procedure as in [
15].
Unsupervised RAE: In the unsupervised RAE, activity labels are not used and the RAE is trained as described in
Section 3.4.1. Each sensor reading is represented by one-hot encoding and parents are constructed from the children. The error is calculated using MSE in Equation (
4).
Semi-Supervised RAE: In the semi-supervised RAE, the error at each parent node is a combination of the unsupervised RAE error (see
Section 3.4.1) and a supervised error. The supervised error is calculated in the following way. Assume that we have an RSM input extracted within a one-minute duration from the raw data (
Figure 5). The label $l$ of the activity that occurred during that minute is used as the label for all parents in the tree, while the parents are used as the features. Each parent $p$ can be seen as a feature describing the sub-tree under it. Then a softmax layer is added to each parent as follows: $d(p) = \mathrm{softmax}(W_{label}\, p)$, where $W_{label}$ is the label weight matrix. Assuming that there are $K$ labels, $d$ is a $K$-dimensional multinomial distribution and $\sum_{k=1}^{K} d_k = 1$. The softmax layer's outputs can then be used as conditional probabilities for a parent $p$, $d_k(p) = p(\text{label}=k \mid p)$. The cross-entropy (supervised) error is then $E_{cE}(p, t) = -\sum_{k=1}^{K} t_k \log d_k(p)$, where $t_k$ is the $k$th element of the multinomial target label distribution $t$ for the parent.
A weighted average of the supervised error $E_{cE}$ and the unsupervised error $E_{rec}$ (Equation (1)) is used to calculate the final error (Equation (4)), $E = \alpha E_{rec} + (1 - \alpha) E_{cE}$, where $\alpha$ is decided experimentally as a value between 0 and 1.
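A small sketch of the semi-supervised error at a single parent node is shown below; the softmax parameterisation, the label weight matrix and the direction of the weighting (the weight on the reconstruction term, following the recursive auto-encoder formulation of [15]) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 34, 11                       # parent dimension and number of activity labels
W_label = rng.normal(scale=0.1, size=(K, n))   # assumed label weight matrix

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy_error(parent, target):
    """Supervised error at a parent node: -sum_k t_k log d_k(p)."""
    d = softmax(W_label @ parent)
    return -np.sum(target * np.log(d + 1e-12))

def combined_error(rec_error, parent, target, alpha=0.2):
    """Weighted average of the reconstruction error and the cross-entropy error;
    alpha is chosen experimentally between 0 and 1."""
    return alpha * rec_error + (1 - alpha) * cross_entropy_error(parent, target)

# Toy parent vector and a one-hot activity label (e.g. "sleeping").
parent = rng.normal(size=n)
target = np.eye(K)[4]
error = combined_error(rec_error=0.7, parent=parent, target=target)
```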
5. Discussion
Although LSTMs and CNNs outperform our proposed RAE-based method, one disadvantage of these supervised models compared with our proposed method is that they require a large amount of training data. Collecting and labelling that much training data is a time-consuming and laborious task. Moreover, providing labelled data at the beginning would not be enough, since the observation of elderly people suffering from dementia in a smart home is a task that can last for years. Thus, continuous labelling of the data would be necessary. Moreover, the activity classes need to be fixed for supervised models. However, users tend to change their activity patterns over the course of years. This would require the training set to be updated and labelled again. Thus, using RAEs to model activities is more advantageous than using supervised methods.
Moreover, although the supervised methods give better AUC results than the RAE models, they require label information, which is tedious and time-consuming to obtain. In cases where obtaining a labelled training set is difficult, RAE models can be an alternative to supervised methods. Furthermore, the detection of dementia indicators is a process spanning months and perhaps years. During this time, the habits of residents may change and new activities may emerge. Thus, obtaining and labelling a training dataset once would not be sufficient, since the labelling process would need to be repeated when the activity labels change. With unsupervised methods such as RAEs, however, no activity labels are used and the model can be updated at any time. Some supervised methods, such as CRFs, take the frequency of each class into account and favour frequent classes during classification. This is a problem with imbalanced datasets such as daily activity datasets, where the detection of abnormal instances of infrequent classes is important as well. RAE models, in contrast, do not learn class-based parameters since they do not use class labels. We observed that the supervised methods used in the experiments tend to detect abnormal instances of frequent classes better than those of the other classes.
However, RAE models cannot relate one instance to another and neglect temporal information. Another problem with these representations is that they do not reflect the real status of an activity being performed. For example, people do not tend to close the room doors after they enter or leave. Once the door is open, the door sensor continues to emit 1. However, the RSM feature representation only takes the activation of the sensor into account and thus neglects the information that the door has been left open. For scenarios where the door sensor is not important, it is good that the on status is not carried forward, but for abnormality detection scenarios such as leaving the door open, the RSM feature would not be able to catch this information.