*4.3. The Proposed Bayesian Network*

A formal definition of the BN and its nodes is as follows.

**Definition 1.** *A BN is a directed acyclic graph (DAG) with a set of nodes N, a set of edges E* = {(*N<sub>i</sub>*, *N<sub>j</sub>*)}*, and a conditional probability table (CPT) which represents the causal relationships between connected nodes. Each node represents a specific event on the sample space* Ω*, and each edge and the corresponding CPT value represent a conditional relationship between a child node and its parent nodes, P*(*C* = *c*|*P* = *p*)*. Given the BN and evidence e, the posterior probability P*(*N*|*e*) *can be calculated by the chain rule, where Pa*(*N*) *is the set of parent nodes of N [17]:*

$$P(N|e) = \prod P(N|Pa(N)) \times e = \prod P(N|Pa(N)) \prod_{e_l \in e} e_l \tag{1}$$
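Equation (1) can be illustrated on a toy two-node network. The node names and CPT values below are purely hypothetical; evidence enters as a likelihood factor on the observed node, which is then marginalized out:

```python
# Sketch of Eq. (1) on a toy network: parent node "sitting", child node
# "eating", and soft evidence e on "sitting". All probabilities are
# illustrative, not values from the proposed BN.

p_sitting = {True: 0.6, False: 0.4}                # P(Pa(N))
p_eating_given_sitting = {True: 0.7, False: 0.1}   # P(N | Pa(N))
evidence_sitting = {True: 0.9, False: 0.1}         # evidence factor e

# Multiply P(N | Pa(N)) * P(Pa(N)) * e, then sum the parent out.
unnormalized = {}
for eat in (True, False):
    total = 0.0
    for sit in (True, False):
        p_child = p_eating_given_sitting[sit] if eat else 1 - p_eating_given_sitting[sit]
        total += p_sitting[sit] * p_child * evidence_sitting[sit]
    unnormalized[eat] = total

z = sum(unnormalized.values())
posterior = {k: v / z for k, v in unnormalized.items()}  # P(eating | e)
print(posterior)
```

With this evidence, belief in "eating" rises above the prior because the observed "sitting" state makes the child more likely.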

**Definition 2.** *A set of nodes N consists of the set of query nodes Q, which represent the events the user wants to know from the BN; a set of evidence nodes V, which observe the sensor data and classify their properness; and a set of inference nodes I, which infer the probability of related contexts based on a CPT.*

Figure 4 shows the proposed BN. The proposed BN consists of *V*, *I*, and *Q*, where |*V*| = 64, |*I*| = 23, and |*Q*| = 1. Full names of sensors are described in Table 4. Nodes in *V* are set by nine types of low-level sensor data, the query node in *Q* represents the recognition result (eating or not), and each intermediate node in *I* represents a sublevel context of the target activity. By using intermediate nodes, the proposed model is more resistant to overfitting than typical learning models that mainly depend on automatically calculated statistics, such as the mean, deviation, or Fourier coefficients. For example, even if the model is trained only with eating data that uses a fork, it can approximately recognize eating with chopsticks, provided the user eats while sitting and shows a similar hand-movement pattern. Moreover, in addition to the complex composition of the eating activity itself, there can be many unexpected or omitted sensor values: the user may eat while lying down or at midnight, or take off the wrist-wearable device or smartphone, in which case the accelerometer value is omitted. A BN can deal with these issues because it recognizes each context probabilistically, so it gives an approximate answer even when some data are uncertain or missing, whereas deterministic classifiers give a wrong answer or no answer at all.
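The missing-sensor behavior described above can be sketched as follows: a missing reading becomes a vacuous evidence factor (all ones), so the node contributes no information and the posterior falls back toward the prior. The node states and likelihood values are illustrative assumptions, not the paper's actual sensor model:

```python
# Sketch: handling an omitted sensor value probabilistically.
# Hypothetical accelerometer node with states "moving"/"still".

def evidence_factor(reading):
    if reading is None:                       # device taken off: value omitted
        return {"moving": 1.0, "still": 1.0}  # vacuous evidence: no information
    # Illustrative likelihoods from a simple threshold on the reading.
    return {"moving": 0.8, "still": 0.2} if reading > 1.5 else {"moving": 0.2, "still": 0.8}

prior = {"moving": 0.5, "still": 0.5}

def posterior(reading):
    unnorm = {s: prior[s] * evidence_factor(reading)[s] for s in prior}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

print(posterior(2.0))   # evidence shifts belief toward "moving"
print(posterior(None))  # missing data: posterior equals the prior
```

A deterministic classifier would need a special case for the `None` input; the probabilistic factor handles it uniformly.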

**Figure 4.** The proposed Bayesian network.

For the structure of the proposed BN, we construct a modular BN with a tree-structured design.

**Definition 3.** *Modular Bayesian network [18]. A modular BN (MBN) consists of a set of submodular BNs M and the conditional probabilities between submodules R. Given BN submodules θ<sub>i</sub>* = (*V<sub>i</sub>*, *E<sub>i</sub>*) *and θ<sub>j</sub>* = (*V<sub>j</sub>*, *E<sub>j</sub>*)*, the link R<sub>i,j</sub>* = {⟨*θ<sub>i</sub>*, *θ<sub>j</sub>*⟩ | *i* ≠ *j*, *V<sub>i</sub>* ∩ *V<sub>j</sub>* ≠ ∅} *is created. Two submodules are connected and communicate only through shared nodes.*

The proposed MBN has one main module containing the query node and four submodules, where each leaf node of the main module (object/spatial/subject/temporal) becomes the root node of a submodule. All submodules follow a tree-structured design: each module has only one root node, which is also a shared node, and every child node has exactly one parent node. By following these design choices, the proposed model is more explainable, as the probability of each shared node can easily be calculated and can explain the probability of each context to an individual. Moreover, these choices substantially reduce the complexity of the BN to *O*(*k*<sup>3</sup>*n*<sup>*k*</sup> + *wn*<sup>2</sup> + (*wr* + *wr*<sup>*w*</sup>)*n*) by limiting *k* to 2 and minimizing *w*, where *n* is the number of nodes, *k* is the maximum number of parents, *r* is the maximum number of values for each node, and *w* is the maximum clique size.
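The modular, tree-structured design can be sketched as follows. Each submodule exposes only its root node, and the main module consumes those roots as shared nodes. The class, the CPT entries, and the naive product combination in the main module are illustrative assumptions, not the paper's exact structure:

```python
# Sketch of a tree-structured submodule that exports only its root node.

class TreeModule:
    """A submodule with a single root; every child has exactly one parent."""
    def __init__(self, name, p_root_given_children):
        self.name = name
        # P(root = True | tuple of binary child-context values) - illustrative.
        self.cpt = p_root_given_children

    def root_probability(self, child_evidence):
        # The only value exported to other modules (the shared node).
        return self.cpt[child_evidence]

# Hypothetical "spatial" submodule with two binary child contexts.
spatial = TreeModule("spatial", {
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.1,
})

# Main module combines the four shared root nodes; a naive product form
# is assumed here purely for illustration.
def eating_probability(shared):
    p = 1.0
    for v in shared.values():
        p *= v
    return p

shared = {"object": 0.8,
          "spatial": spatial.root_probability((True, False)),
          "subject": 0.7,
          "temporal": 0.9}
print(eating_probability(shared))
```

Because each submodule communicates only through its root, a submodule's internal CPTs can be retrained without touching the main module.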

**Algorithm 1.** Learning algorithm for the CPT.

```
for ∀D do                          // D is the input data
    increment numOfData by 1;
    C := class of D;
    for i = 1 to n(I) do
        if C includes I_i then
            increment num(I_i) by 1;
            if ∃ q ∈ Q s.t. q ∈ C then
                increment num(I_i ∩ Q) by 1;
for i = 1 to n(I) do
    P(I_i) := num(I_i) / numOfData;
    CPT(I_i) := P(I_i | Q) = P(I_i, Q) / P(Q) = num(I_i ∩ Q) / num(Q);
```
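Algorithm 1 can be sketched in runnable form as follows. Each training sample carries a set of class labels *C*; the label names and the use of a single query label ("eating") standing in for *Q* are illustrative assumptions:

```python
# Counting-based CPT estimation in the spirit of Algorithm 1.
# data: iterable of label sets C; inference_nodes: names of the I_i.

def learn_cpt(data, inference_nodes, query_label="eating"):
    num_of_data = 0
    num = {i: 0 for i in inference_nodes}     # num(I_i)
    num_iq = {i: 0 for i in inference_nodes}  # num(I_i ∩ Q)
    num_q = 0                                 # num(Q)

    for labels in data:                       # for ∀D
        num_of_data += 1
        if query_label in labels:
            num_q += 1
        for i in inference_nodes:
            if i in labels:                   # C includes I_i
                num[i] += 1
                if query_label in labels:     # ∃q ∈ Q s.t. q ∈ C
                    num_iq[i] += 1

    prior = {i: num[i] / num_of_data for i in inference_nodes}  # P(I_i)
    cpt = {i: (num_iq[i] / num_q if num_q else 0.0)             # P(I_i | Q)
           for i in inference_nodes}
    return prior, cpt

# Tiny illustrative training set of label sets.
data = [{"sitting", "dinnerware", "eating"},
        {"sitting", "watching_tv"},
        {"standing", "dinnerware", "eating"}]
prior, cpt = learn_cpt(data, ["sitting", "dinnerware"])
print(prior, cpt)
```

One pass over the data suffices, which matches the linear-in-data learning cost discussed below.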

To calculate the values of the CPT, the proposed BN learns from the data using a simple learning algorithm. In the training process, the training data enter *V* and *I*. For each evidence node in *V*, a simple binary decision tree learns a criterion for classification. For the inference nodes in *I*, the BN counts the number of occurrences such that *I<sub>i</sub>* ⊂ *C* for all *I<sub>i</sub>* ∈ *I* and updates the elements of the CPT, as shown in Algorithm 1. For example, if *C<sub>k</sub>* = {*sitting*} ∪ {*dinnerware*} ∪ {*eating*}, then *I*<sub>1</sub> = {*sitting*} ⊂ *C<sub>k</sub>* and *Q*<sub>1</sub> = {*eating*} ⊂ *C<sub>k</sub>*, so *num*(*I*<sub>1</sub>) and *num*(*I*<sub>1</sub> ∩ *Q*<sub>1</sub>) are incremented, and so on. With this algorithm, the proposed BN needs *O*((*M* + *N*) × *N<sub>D</sub>*) time for learning, where *N<sub>D</sub>* is the amount of data; when either the number of nodes or the amount of data is fixed, the time complexity becomes linear.
