**3. Proposed Solution**

In this section, we present the solution proposed to detect context-aware sociability patterns and behavioral changes. It performs incremental learning of context-sensitive sociability patterns through the combination of FPM and CEP. FPM is a computational technique that aims to discover patterns that occur with significant frequency in different data collection types, such as relational and non-relational databases, text files, and data streams [18].

The algorithm used in our solution was proposed by Lago et al. [44] to learn activity patterns in smart homes (e.g., activity sequences). We applied this algorithm to digital phenotyping of mental health through the recognition of sociability patterns. First, we present an updated formalization of this algorithm that uses unit step functions to express the logic for identifying time intervals in which social events routinely occur. We then used the formalized algorithm to implement an event processing network capable of incrementally identifying context-sensitive sociability patterns. For this purpose, we used CEP concepts [19], which provide a set of tools to process data streams efficiently and support tasks such as data aggregation and filtering, context partitioning, data windows, high-level information derivation, and pattern recognition.

The proposed solution can also detect abnormal social behaviors and changes in social routines by applying concepts from drift identification techniques. Additionally, we use fuzzy logic to model the knowledge of the mental health specialist to detect behavior change. Finally, the developed solution provides an Application Programming Interface (API) to enable the rapid implementation of strategies to identify context-aware sociability patterns and to configure the detection of behavioral changes.

Figure 1 presents an overview of the processing flow to identify context-aware sociability patterns and social behavior change. The first layer represents the generation of social events from data of online social networks and physical and virtual sensors embedded in ubiquitous devices. As examples of social events, it is possible to cite conversations identified from microphone data and interactions mediated by technology (e.g., phone calls, text messages, and social media posts). Next, the layer responsible for detecting context-aware sociability patterns supports a set of CEP rules designed to implement the algorithm to identify sociability patterns. The next layer performs tasks of detecting abnormal behaviors and routine changes. This layer also contains a Fuzzy Inference System (FIS) that models specialist knowledge needed to recognize social behavior changes. The last layer refers to client applications that receive notifications of new patterns and behavior changes emitted by the proposed solution's components.

Ubiquitous devices (e.g., smartphones, wearable sensors, IoT devices, social networks) represent valuable social data sources. Computational methods (e.g., data mining, machine learning) can process context data from physical and virtual sensors embedded in these devices to identify social situations, such as face-to-face interactions (i.e., socialization in physical environments) and device-mediated interactions (i.e., socialization in virtual environments). For example, computational methods can process data from microphones and wireless communication interfaces (e.g., WiFi, Bluetooth, NFC) to identify conversations and physical proximity [23,35]. Call logs and text messages can represent device-mediated interactions [31,33].

We emphasize that our proposed solution does not include the generation of events; instead, it focuses on processing high-level sociability events inferred by other solutions to identify patterns and behavior changes. Next, we present its components in detail.

**Figure 1.** Components of the proposed solution.

### *3.1. Learning Context-Aware Sociability Patterns*

In this section, we present the algorithm for identifying context-sensitive sociability patterns and its implementation using CEP.

### 3.1.1. Algorithm for Identifying Sociability Patterns

We consider that, if social activities are detected frequently in a specific time interval, this interval is part of the sociability pattern of the monitored individual. Thus, we define sociability patterns as *periods of the day in which the individual usually socializes, that is, the set of time intervals in which social activities habitually occur*. The algorithm processes the data stream to recognize time intervals [*Tstart*, *Tend*] in which the number of occurrences of social activities is higher than *φ* ∗ |*n*|. Here, |*n*| is the total number of observations processed within the time window defined to model social behavior, and *φ* is a manually set parameter that controls the sensitivity of the algorithm.

The algorithm input is a stream of social events that carries the start time of each social activity. The first step of the algorithm is to determine, based on the timestamp, which time frame each social event belongs to. For this, the algorithm segments the day into slots of equal size. Each slot represents a slice of the day and has a sequential identifier. To define the size of the slot (i.e., into how many periods the day should be divided), the programmer is required to specify a value for the *t* parameter, the slot length in hours, which is used to create an array of counters with one entry per slot. For example, if the programmer decides to divide the day into periods of 30 min, the parameter *t* is equal to 0.5, since 24/*t* = 48 slots. This expression determines the size of the storage structure that counts occurrences of social activities in each slot.
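The slot segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `number_of_slots` and `slot_index` are hypothetical, *t* is assumed to be expressed in hours, and slot identifiers are assumed to start at zero.

```python
from datetime import datetime


def number_of_slots(t: float) -> int:
    """Number of equal-sized slots in a day for a slot length of t hours (24 / t)."""
    slot_minutes = round(t * 60)
    if slot_minutes <= 0 or (24 * 60) % slot_minutes != 0:
        raise ValueError("t must divide the 24-hour day into equal slots")
    return (24 * 60) // slot_minutes


def slot_index(timestamp: datetime, t: float) -> int:
    """Zero-based sequential identifier of the slot that contains the timestamp."""
    slot_minutes = round(t * 60)
    minutes_since_midnight = timestamp.hour * 60 + timestamp.minute
    return minutes_since_midnight // slot_minutes


# Array of counters for 30-minute slots (t = 0.5 gives 48 slots).
counters = [0] * number_of_slots(0.5)
```

Under these assumptions, an event that starts at 14:45 falls into slot 29, which covers 14:30 to 15:00.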

After defining the size of the slots, we now describe the counting phase of the algorithm. At this stage, the algorithm uses the timestamp of each event to define its slot. After identifying the slot of a social event, the algorithm increments the counter value that represents this slot in the structure responsible for storing these statistics. Therefore, as the flow of events is processed, the frequency of social activities in each slot is updated. This approach of saving only the summary (i.e., the count) reduces the data volume, since it is not necessary to memorize the full content of the events.
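A minimal sketch of the counting phase is shown below. The class name `SlotCounter` and its attributes are hypothetical, the slot of each event is assumed to be already known, and the eviction of old observations when the time window moves is not modeled.

```python
class SlotCounter:
    """Keeps only per-slot counts, never the events themselves."""

    def __init__(self, num_slots: int) -> None:
        self.counts = [0] * num_slots  # one counter per slot of the day
        self.n = 0                     # total observations processed, |n|

    def observe(self, slot: int) -> None:
        """Counting phase for one event whose slot is already known."""
        self.counts[slot] += 1
        self.n += 1


counter = SlotCounter(num_slots=48)   # 30-minute slots
for slot in [29, 29, 30, 17]:         # hypothetical slots of four social events
    counter.observe(slot)
```

Because each event only increments two integers, memory use depends on the number of slots and not on the number of events processed.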

The next phase of the algorithm is sociability pattern discovery, which uses the summary produced by the counting phase to identify frequent intervals of sociability. At this stage, it is necessary to determine which slots have a sufficient number of observations, that is, a quantity that makes them candidate slots to form a frequent period. For this, the count of social observations in the analyzed slot must be greater than or equal to *Sth*. The algorithm uses Equation (1) to define the value of *Sth*. The *θ* parameter is set by the programmer to adjust the sensitivity of the equation.

$$S_{th} = |n| \ast \theta \ast \frac{1}{\frac{24}{t}} \tag{1}$$
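As a worked illustration of Equation (1), the sketch below computes the slot threshold, with the slot length in hours written as *t* as in the slot-segmentation step. The function name `slot_threshold` and the numeric values are assumptions chosen for the example.

```python
def slot_threshold(n: int, theta: float, t: float) -> float:
    """S_th = |n| * theta * 1 / (24 / t), where 24 / t is the number of slots."""
    return n * theta / (24 / t)


# Hypothetical values: 480 observations in the window, 30-minute slots.
average_per_slot = slot_threshold(n=480, theta=1.0, t=0.5)   # 480 / 48
stricter = slot_threshold(n=480, theta=2.0, t=0.5)           # twice the average
```

With *θ* = 1 the threshold equals the average number of observations per slot, so larger values of *θ* require a slot to be proportionally busier than average before it becomes a candidate.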

Equation (2) iterates over the slot counts and determines which slots are candidates to form a sociability pattern, assigning zero to the count of non-candidate slots in the array *Cs*. To do so, it multiplies the count of each slot (*slot*[*i*]) by the unit step function, which returns zero for negative arguments (*slot*[*i*] − *slot*\_*th* < 0) and one for non-negative arguments (*slot*[*i*] − *slot*\_*th* ≥ 0), where *slot*\_*th* is the threshold *Sth* from Equation (1). Finally, the slot array *Cs* is passed to the process that identifies frequent sociability intervals.

$$\mathbb{C}_s[i] := slot[i] \ast \mathbf{unit\_step}(slot[i] - slot\_th) \tag{2}$$
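Equation (2) can be read as an element-wise filter over the slot counts. The following sketch assumes the counts are held in a plain Python list; the names `unit_step` and `candidate_slots` are illustrative.

```python
def unit_step(x: float) -> int:
    """Unit step: 0 for negative arguments, 1 for non-negative arguments."""
    return 1 if x >= 0 else 0


def candidate_slots(slot: list[int], slot_th: float) -> list[int]:
    """C_s[i] = slot[i] * unit_step(slot[i] - slot_th) for every slot i."""
    return [count * unit_step(count - slot_th) for count in slot]


# Hypothetical counts for six slots and a threshold of 10 observations.
cs = candidate_slots([2, 10, 12, 3, 0, 11], slot_th=10)
```

Slots below the threshold are zeroed, while candidate slots keep their original counts so that the counts can be summed in the grouping step.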

Finally, after defining the requirement for a slot to be a candidate, the next step is to identify which sets of slots compose an interval in which social activities are routine for the monitored individual. Equation (3) groups the adjacent non-zero candidate slots in the array *Cs* into a sociability pattern. The unit step function verifies whether the sum of the event counts of the grouped slots (i.e., time intervals) minus *φ* ∗ |*n*| results in a positive value. If this condition is satisfied, the time interval formed by this set of adjacent slots represents a sociability pattern. In the end, the array *Ps* contains the sociability patterns, that is, the time intervals in which social activities routinely occur for the monitored individual.

$$P_s[i:i+n] := \mathbf{unit\_step}\left(\left(\sum_{j=i}^{i+n-1} \mathbb{C}_s[j]\right) - \varphi \ast |n|\right) \tag{3}$$
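One possible reading of Equation (3) is sketched below: runs of adjacent non-zero candidate slots are grouped, and a run is kept when the unit step of its total count minus *φ* ∗ |*n*| equals one. The function names are hypothetical, patterns are returned as half-open slot ranges, and runs that wrap around midnight are not merged in this sketch.

```python
def unit_step(x: float) -> int:
    """Unit step: 0 for negative arguments, 1 for non-negative arguments."""
    return 1 if x >= 0 else 0


def sociability_patterns(cs: list[int], phi: float, n: int) -> list[tuple[int, int]]:
    """Return (first_slot, end_slot_exclusive) for each frequent run of slots."""
    patterns = []
    start = None
    for i, count in enumerate(cs + [0]):  # trailing 0 closes a run at the end
        if count > 0 and start is None:
            start = i                     # a run of candidate slots begins
        elif count == 0 and start is not None:
            run_total = sum(cs[start:i])  # events counted in the grouped slots
            if unit_step(run_total - phi * n) == 1:
                patterns.append((start, i))
            start = None
    return patterns


# Hypothetical candidate array, phi = 0.2 and |n| = 100 observations.
ps = sociability_patterns([0, 10, 12, 0, 0, 11], phi=0.2, n=100)
```

In this example the run covering slots 1 and 2 totals 22 events and is kept, while the isolated slot 5 totals 11 events and is discarded. A slot range can be converted to clock times by multiplying its bounds by the slot length *t*.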

### 3.1.2. Context-Aware Sociability Patterns

So far, the algorithm identifies the individual's sociability routine by mapping the times at which social activities frequently start. However, this context-free analysis may describe social habits poorly, since the individual's behavior may vary with specific contexts, such as workdays, weekends, and rainy days. To address this, we use a strategy based on Context Attributes (CAs), which can be represented at several scales. For example, a temporal feature may use a broad scale that differentiates weekdays from weekends, or a more specific scale that distinguishes each day of the week (e.g., Monday, Tuesday, Wednesday). We inject these CAs into the stream of social observations; they can be derived directly from event properties (e.g., timestamp) or retrieved from external sources (e.g., climate APIs). With this configuration, mental health professionals can define which contexts are most suitable for each patient and treatment.

Each CA is used as a data segmentation dimension to identify behavior changes due to specific context situations. Therefore, the identification of sociability patterns is performed on the subset of data that has a particular CA. For example, all social events that occurred over the weekend (i.e., CA = Weekend) are used to identify the individual's social routine in this context condition. The algorithm needs to create a structure (e.g., a matrix) to store slot counters for each context dimension. During the counting phase, each social event increments, at the index of its slot, the values in the structures that store statistics for each CA of the processed social observation. In summary, we partition the data flow based on CAs and perform incremental learning of sociability patterns for each derived data stream.
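The per-context counting structure can be sketched as a mapping from each CA value to its own array of slot counters. This is an illustrative assumption about the data layout, not the authors' CEP implementation: the names `day_type` and `ContextCounters` are hypothetical, and CAs are represented as plain strings.

```python
from collections import defaultdict
from datetime import datetime


def day_type(timestamp: datetime) -> str:
    """Broad-scale temporal CA derived directly from the event timestamp."""
    return "Weekend" if timestamp.weekday() >= 5 else "Weekday"


class ContextCounters:
    """One array of slot counters, and one observation total, per CA value."""

    def __init__(self, num_slots: int) -> None:
        self.counts = defaultdict(lambda: [0] * num_slots)
        self.n = defaultdict(int)

    def observe(self, slot: int, context_attributes: list[str]) -> None:
        """Increment the event's slot in the counters of every CA it carries."""
        for ca in context_attributes:
            self.counts[ca][slot] += 1
            self.n[ca] += 1


counters = ContextCounters(num_slots=48)
event_time = datetime(2020, 1, 4, 14, 45)   # a Saturday afternoon
# Two scales of the same temporal feature: day type and day of the week.
counters.observe(slot=29, context_attributes=[day_type(event_time), "Saturday"])
```

Each CA value then has its own counts and its own |*n*|, so the pattern discovery phase can run independently on each partition.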
