4.1. Clustering Results
All of the selected features above were clustered separately using the k-means algorithm described in the Clustering Behavioral Spectrums to Find User Habits section. To find the optimum number of clusters, or in other words the number of habits present in a set of trials, we used the elbow method employed in a previous work [29]. This method calculates the percentage of explained variation for each candidate number of clusters. As the number of clusters used in the k-means algorithm increases, a point is reached beyond which the percentage of explained variation no longer increases significantly from one number of clusters to the next. This marginal point, where the curve shows an apparent angle, is marked in the elbow graph [38]. The percentage of variation explained in the data obtained from the AAL experiments, as a function of the number of clusters, is shown in Table 1 and Figure 6A. The same analysis was performed using the HS data, and the results are presented in Table 2. Figure 6B shows the HS results for the explained variation as a function of the number of clusters using personality trait scores as features.
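The elbow computation described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this work: it assumes that the explained variation is the between-cluster sum of squares divided by the total sum of squares, and it uses a small hand-written k-means so that it has no dependencies. The function names (`kmeans`, `explained_variation`, `elbow_curve`) and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def explained_variation(points, labels, centers):
    """Percentage of total variation explained by the clustering (assumed
    definition: 1 - within-cluster SS / total SS)."""
    grand_mean = mean_point(points)
    total_ss = sum(sq_dist(p, grand_mean) for p in points)
    within_ss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 100.0 * (1.0 - within_ss / total_ss)


def elbow_curve(points, max_k):
    """List of (k, explained variation in %) for k = 1 .. max_k."""
    curve = []
    for k in range(1, max_k + 1):
        labels, centers = kmeans(points, k)
        curve.append((k, explained_variation(points, labels, centers)))
    return curve


# Toy data (hypothetical): two well-separated groups of feature vectors.
toy_points = [[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]]
toy_curve = elbow_curve(toy_points, 4)
```

Plotting the second element of each pair in `toy_curve` against k gives the elbow graph; the elbow is the value of k after which the gain per additional cluster becomes small.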
Comparing the percentage of explained variation obtained using both features to express activity information, we observed that the variability yielded a higher percentage with a lower number of clusters than the behavioral spectrum represented in its raw format. Increasing the number of traits decreases the explained variation in the obtained clusters. Moreover, we observed a considerable difference in the explained variation for a given number of clusters according to the age of the subjects. For example, using five clusters for HS (elderly), we obtained a total explained variation of 85% (all traits); however, when using the same number of clusters for the 166A dataset (younger subjects), we obtained a total explained variation of 62% (all traits).
Comparing the explained variation obtained from the activity-related features of both the AAL and HS data, we inferred that using the variability obtained from the behavioral spectrum yielded a higher explained variation for the same number of clusters. In addition, comparing the three periods of the day showed that users were more likely to perform different activities during the morning period, which was consistent with the annotation data.
Using the data provided by the behavioral spectrum can lead to situations where the percentage of explained variation does not change significantly as the number of clusters increases. A higher percentage of explained variation was obtained with few clusters using only neuroticism as a feature; the same result was obtained when using variability. This can be explained by the fact that the users did not repeat the experiments on different days; instead, data were collected in only one instance per user. Therefore, we can state that almost all the users performed different trials. The results for all traits lead to a similar conclusion: it is difficult to cluster different users according to the scores of all their personality traits. However, when analyzing only neuroticism as a feature, it is possible to classify a larger number of users into the same group. This result shows that it is difficult to find the correct number of habits and clusters in the dataset. Therefore, in this work, both the behavioral spectrum and personality trait scores were clustered into two groups.
4.2. Personality Trait Identification Results
As explained in the Correlation with User Personality Traits section, two types of information were collected from the users, from two different datasets. The information related to the activity trials of the users was represented as behavioral signals. For this case, the features used were the behavioral signal in its raw format and the data variability found. Each of the features from each dataset was clustered into two groups. The clustering results for the personality scores showed that output cluster number 1 represented the highest scores and output cluster number 2 the lowest scores. The correlation is observed for the same user, and the activity trait is classified within the same cluster group with which the personality of the user is associated. Because users with higher levels of stability usually perform their activities in a slow and organized manner, all the routines fitting the habits with higher amplitudes (the dotted red line) are associated with higher levels of stability in the N dimension based on the PEN or big five model. Higher levels of neuroticism are associated with the other two habits. All the routines found in the experiments were classified into one of the three routines; then, the behavior or user was associated with one personality trait.
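The text does not spell out how the association between the two clusterings is quantified. One plausible reading, sketched below purely as an assumption, is the percentage of users whose activity cluster and personality cluster coincide. Because k-means cluster indexes are arbitrary, the sketch takes the best agreement over all relabelings of the activity clusters. The function name `cluster_agreement` and the example labels are hypothetical.

```python
from itertools import permutations


def cluster_agreement(activity_labels, personality_labels):
    """Percentage of users assigned to matching clusters in both clusterings.

    Assumption: agreement is maximized over relabelings of the activity
    clusters, since cluster indexes carry no meaning by themselves.
    """
    if len(activity_labels) != len(personality_labels):
        raise ValueError("both label lists must cover the same users")
    if not activity_labels:
        raise ValueError("at least one user is required")
    ids = sorted(set(activity_labels) | set(personality_labels))
    best_hits = 0
    for perm in permutations(ids):
        relabel = dict(zip(ids, perm))
        hits = sum(1 for a, p in zip(activity_labels, personality_labels)
                   if relabel[a] == p)
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(activity_labels)


# Hypothetical labels for five users (one entry per user, same order).
activity = [1, 1, 2, 2, 1]
personality = [2, 2, 1, 1, 1]
agreement = cluster_agreement(activity, personality)
```

With only two clusters per feature, as used in this work, the search over relabelings reduces to comparing the labels as given and with the two indexes swapped.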
To compare with a psychological approach, we collected TIPI-J responses as described in Section 3.3. The personality scores obtained from the questionnaire were organized into three groups (all five scores, only the extroversion and neuroticism dimensions, and only the neuroticism scores).
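The three score groups can be expressed as feature subsets drawn from the same per-user score table. The snippet below is an illustrative sketch only; the trait key names, the `FEATURE_GROUPS` mapping, and the example scores are hypothetical and are not taken from the datasets.

```python
# Trait names used as dictionary keys (hypothetical naming).
TRAITS = ("extroversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness")

# The three groups of personality scores used as clustering features.
FEATURE_GROUPS = {
    "all_traits": TRAITS,
    "extroversion_neuroticism": ("extroversion", "neuroticism"),
    "neuroticism_only": ("neuroticism",),
}


def build_feature_vectors(scores_by_user, group):
    """Return {user: [score, ...]} restricted to the traits of one group."""
    names = FEATURE_GROUPS[group]
    return {user: [scores[name] for name in names]
            for user, scores in scores_by_user.items()}


# Made-up questionnaire scores for two users, for illustration only.
example_scores = {
    "user_a": {"extroversion": 5.0, "agreeableness": 4.5,
               "conscientiousness": 6.0, "neuroticism": 2.5, "openness": 4.0},
    "user_b": {"extroversion": 3.0, "agreeableness": 5.5,
               "conscientiousness": 4.0, "neuroticism": 5.0, "openness": 3.5},
}
en_features = build_feature_vectors(example_scores, "extroversion_neuroticism")
```

Each resulting set of vectors can then be clustered separately, so that the activity clusters are compared against one personality grouping at a time.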
The durations of the trials observed in the activity data in the AAL experiments were shorter (maximum 20 min) than the ones observed in the HS data. Consequently, compared to longer observation times, the behavioral spectrum obtained from the AAL trials contained more activity information. Therefore, when extroversion and neuroticism were used as features, the representation using the variability showed a slightly higher association with the user personality traits than the raw data did. However, when using all five personality traits, the raw data showed a better association than that achieved using the variability feature (see Table 3). One possible conclusion from these results is that, for a shorter period, the sequence and duration of the activities in a trial have a higher level of association with all five personality traits as a whole. In contrast, the variability in the user activity series has a higher level of association with the extroversion and neuroticism scores. Furthermore, there is no significant association between the activities and neuroticism alone.
The activities collected from the HS dataset were divided into three periods of the day, namely morning, afternoon, and night (see Table 4, Table 5, and Table 6). This division was made to observe the influence of each group of activities on the association, because the elderly normally show a distinct separation in activity frequency according to the period of the day. The results for the association observed during the morning period are presented in Table 4.
For the HS experiments, the results showed that the association between all the big five scores and activity features as behavioral signals was high. One reason for this is that the scores obtained from the elderly in terms of extroversion and neuroticism did not change much between the individuals, or at least the therapists did not report any such changes. During the morning period, the number of observed activities was higher; however, the variability feature does not have a higher correlation than the behavioral spectrum in its raw format. One reason for this is that the variability in the number of activities between the individuals does not change the difference in the personality scores observed. This became clearer when observing the results for the night period.
The activities performed after lunch and before 6 p.m. were considered as happening during the afternoon period. Their associations with the personality scores are presented in Table 5.
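The split into periods can be sketched as a simple rule on the start time of each activity. Only the 6 p.m. boundary is stated in the text; the end of lunch (set to 13:00 here) and the treatment of everything before it as morning are assumptions made for illustration, and the function names are hypothetical.

```python
from datetime import time


def period_of_day(start, afternoon_start=time(13, 0), night_start=time(18, 0)):
    """Map an activity start time to 'morning', 'afternoon', or 'night'.

    afternoon_start (end of lunch) is an assumed value; night_start follows
    the 6 p.m. boundary given in the text. Times before afternoon_start,
    including any after midnight, are counted as morning in this sketch.
    """
    if start < afternoon_start:
        return "morning"
    if start < night_start:
        return "afternoon"
    return "night"


def split_by_period(activities):
    """Group (start_time, activity_name) pairs by period of the day."""
    groups = {"morning": [], "afternoon": [], "night": []}
    for start, name in activities:
        groups[period_of_day(start)].append(name)
    return groups


# Hypothetical annotated activities for one subject.
example_day = [
    (time(7, 30), "breakfast"),
    (time(10, 0), "walking"),
    (time(14, 15), "group activity"),
    (time(19, 0), "watching TV"),
]
periods = split_by_period(example_day)
```

Each per-period list can then be turned into its own behavioral signal, so that the association with the personality scores is evaluated separately for the morning, afternoon, and night.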
The trials observed for the afternoon period were almost the same for all the subjects. During that period, as part of rehabilitation and therapy, the subjects were normally encouraged to engage in group activities. This resulted in trials involving similar activities. Only a few subjects, owing to physical limitations, performed different activities. Furthermore, as stated before, because there is no significant difference in the personality trait scores, a high association between the activities and personality traits of the users was observed for the afternoon period. Finally, the association results obtained for the night period are presented in Table 6.
After analyzing the recorded trials for the night period, we found a moderate variability in the frequency and number of activities. Older subjects usually tend to sleep very early and do not perform any of the night activities, such as watching TV or playing games. For this period, we found that neuroticism had a higher association with the activities in both feature representations (raw data and variability). In addition, the higher association found between the behavioral spectrum in its raw format and all five traits was most likely due to the same reason given earlier for the morning results: the number and frequency of activities performed by the subjects were not consistent with the different personality traits. However, for the night period, the activities were not as diverse as the ones found in the morning, thus resulting in a lower level of association for the five scores than that observed in the morning results.