### **1. Introduction**

The rapid development of the ICT sector has enabled scenarios in which an ever-deeper interconnection between the physical and digital worlds is proposed (the term 'Phygital' was coined in 2007 by Chris Weil). One such scenario is the home, where the interconnection between technologies and people enables a paradigm for autonomous living, guaranteeing mutual safety (people, family members, and caregivers are in contact with each other) and allowing the identification of behavioral drifts and their subsequent compensation (behavioral drift is selectively compensated with solutions targeted at the identified problem, containing costs and promoting personal autonomy) [1].

In recent years, applications of domiciliary technology systems interacting with the person have addressed issues such as activity recognition (e.g., [2–4]), health monitoring [5], security [6], and the prediction of future events [7].

The need to understand what is happening inside the dwelling requires that the phenomena to be tracked are observable; this assumes that suitable sensing systems (sensors and transducers) exist and that they are distributed more or less densely in the home. Considering environmental sensing, the various proposals combine data collected from different types of sensors, such as RFIDs (Radio-Frequency IDentification), PIRs (Passive InfraRed) [8], contact sensors, pressure-sensitive mats [9], tilt sensors [10], power meters [11], inertial sensors, infrared array sensors [12], etc. In general, sensor types can be vision-based (e.g., cameras), wearable (generally based on inertial sensors such as accelerometers, gyroscopes, etc.), or environmental (e.g., motion or door/window sensors, temperature/humidity sensors, etc.). It is worth noting that both current regulations and perceived privacy violations make it difficult to use vision-based systems (2D and 3D cameras) over other detection techniques.

Considering wearable vs. non-wearable devices, a non-intrusive monitoring system (that is, one without wearable devices) can guarantee a better trade-off between privacy and reliability, because the absence of wearable devices eliminates or reduces some important critical issues, such as routine maintenance (e.g., recharging batteries) or misuse of the device (e.g., taking it off in certain situations or forgetting to wear it). However, while non-intrusive monitoring systems can provide reliable and direct measurements for many activities in a specific environment, such as room occupancy in the house, they do not achieve high levels of accuracy for many other tasks, such as counting the number of people in a room. It is worth noting that in some scenarios, as in the case of aging, the exact count of the number of people is not particularly relevant for a home support system. What is often of interest is to detect whether the person is alone at home (and to inform the caregiver so that they pay more attention to the person) or whether a behavioral drift is present that leads the elderly person to isolate themselves and progressively reduce their degree of sociality.

In these contexts, it is necessary to identify methodologies for determining whether, when, and for how long the person is in the company of other people (possibly also estimating their number).

People-counting in smart environments with distributed sensor networks has been studied in the past, but using a large number of sensors (e.g., up to 60 sensors distributed in a single apartment). Such a large number of sensors represents an important entry barrier for many households. The challenge is to tackle this task in sparsely sensorized apartments, with sensors that can be wireless (nowadays, many apartments are not equipped to host wired solutions) and can provide only local and simple information.

In this article, we propose a method that addresses technological problems (such as the blind times of the PIRs, their insensitivity in the absence of motion, and their varying sensitivity depending on the distance and temperature of bodies), topological problems (such as possible sensor interference due to the inability to separate detection areas), and algorithmic problems. In the instrumented apartment, there is only one PIR per room and one on/off sensor on the front door. This is the minimum monitoring configuration: below this, one or more rooms are not 'observable'. The house is modeled as a DAG (Directed Acyclic Graph). The model is used to deal with the fragmentation of the data stream that the various sensors can generate. In particular, we need to manage spurious transitions between rooms when there are multiple people in the house: the DAG model can constrain the transitions to adjacent rooms, avoiding impossible 'through-the-wall' movements. However, other issues for correct detection remain. The state of each room is represented by a set of values, one for each identified person. Each value decays over time and represents the probability that a person (without specifying who) is still in the room. Because the sensors used in our setting cannot identify the number of people, the approach is based on a multi-branch inference that, over time, differentiates the movements in the apartment and estimates the number of people; the limitation is that the number of people must be less than the number of rooms in the dwelling [13].

The main contributions of our work are:


The rest of the document is organized as follows. In Section 2, previous and related work is introduced. Next, Section 3 presents the proposed approach. In Section 4, the performance is evaluated. Finally, Section 5 discusses and concludes our work.

### **2. Related Works**

Over the years, various smart environment systems have been proposed. They have been used in many scenarios, such as family houses [14], offices [15], shopping malls [16], and museums [17], and have been applied for a variety of purposes, including tracking people in buildings [18], counting people [19], recognizing human behavior [20], etc. In addition, energy consumption can be monitored, and the indoor environment can be controlled automatically by using appropriate sensors and controllers [21].

The types of detection means (e.g., sensors or transducers) that can be used in smart environments are diversified. They can be roughly divided into two main categories, wearable devices and non-wearable devices, where the latter is in turn divided into sensors–transducers and multimedia-based devices. In terms of wearable devices, Bluetooth Low Energy (BLE) [22,23], UWB, Zigbee, WiFi, and RFID technologies [24], or specialized sensors (e.g., magnetic field sensors [25]), are widely used in indoor positioning systems to track or localize people [26]. Sensors–transducers include various types of detection systems positioned in a smart environment to detect the movement of humans [27]. A non-exhaustive list of these devices includes Passive Infrared sensors [28,29], Thermal Sensors [30], and Force-Sensing Resistors (e.g., smart floors [31]). Moreover, pressure polymers, electromechanical film (EMFi), piezoelectric sensors, load cells, or WiFi can be used [31]. Finally, multimedia-based approaches can obtain rich context from the environment with videos [32] and audio [33–35].

Various approaches for people-counting in non-intrusive monitoring environments have been proposed without multimedia-based devices. Petersen et al. [36] propose an SVM-based (Support Vector Machine) method to detect the presence of visitors in the smart homes of solitary elderly people, because social activity is an important factor for assessing the health status of the elderly (social, psychological, and physical are, typically, the three dimensions of health-related quality of life). Wireless motion sensors are installed in every room, and several key features are extracted and fed to an SVM classifier, which is trained to detect multi-person events. The model has been validated on a two-subject dataset, and the results demonstrate the feasibility of visitor detection. The adopted method suffers from some critical issues: time is divided into fixed slots, with epochs of 15 min; all possible room combinations are taken into account (i.e., n × (n − 1)/2) without considering that adjacent rooms could produce sensor interference; the PIR blocking time is not mentioned (blocking time could interfere with the activation order of the PIRs); the time of day is flagged a priori to account for the 'circadian rhythm'; and, finally, the approach is supervised.

Müller et al. [37] implement two approaches for inferring the presence of multiple persons in a test lab equipped with 50 motion sensors (data from CASAS) and some contact sensors; only two persons are monitored in the laboratory. One approach is a simple statistical method to derive the number of people from the raw sensor data, while the second uses multiple hypothesis tracking (MHT) and Bayesian filtering to track the people. The first method reaches an accuracy of 90.75%; the second reaches 83.35%. The limitation of this research is that the proposed method can only distinguish whether the house hosts a single person or multiple persons, but it does not estimate the number of people. In addition, the sensors in the test smart home are dense, and the complex installation of sensors may limit the widespread use of such a system.

The authors of [38] estimate the number of people satisfying the house topology and sensor activation constraints; a Hidden Markov Model is then used to refine the result. The algorithm is validated in two smart homes and obtains highly accurate results when the smart home hosts 0 to 3 persons, but the accuracy decreases dramatically with 4 or more persons. In this work, both simulated and real data from different scenarios were used: ARAS (limitations: small rooms, few rooms, and only partial coverage), ARAS-FC (limitation: few rooms and no dataset; the authors produced some data via simulation), and House 2 (limitation: the authors produced a simulated dataset). Unfortunately, it is not clear whether they took into account the limits of the PIR sensors (for example, sensitivity and blocking time) and the criticality of the map; for instance, this could create interference between the different activations.

The work in [39] proposes an unsupervised multi-resident tracking algorithm that can also provide a rough estimate of the number of active residents in the smart home. The authors consider two datasets. The first is the dataset TM004 from CASAS, consisting of 25 ambient sensors distributed among eight rooms in a two-bedroom apartment with two older adult residents; occasionally, their child comes and stays in the house for a couple of days. The second, named Kyoto, contains a denser grid of sensors (91 sensors installed in six rooms, hallways included) with two residents; occasionally, they receive friends for a visit of a few days. The algorithm performs well, but performance decreases as the number of residents increases; moreover, the algorithm tends to generate more resident identifiers when the same resident triggers the same sensor events, and it has a higher probability of segmentation errors when tracking residents in a location where sensors are more densely deployed.

Other papers focus on the problem of recognizing multi-resident activities in a smart-home infrastructure [40,41] (using the CASAS dataset with 60 sensors and 2 residents), [42] (using the ARAS dataset), and, as a by-product, they may enable counting the number of people. However, given their main goal, the number of sensors is typically very high, and the number of residents is limited to two persons.

A recent paper [43] focuses on the recognition of some daily activities in a multi-resident family home. The recognition of daily activities is a specialized task that requires precisely identifying the resident involved. For this purpose, the authors used numerous specialized sensing devices (e.g., a sensor module for a cup and a sensor box for the fridge) to distinguish the different activities performed by the individuals, using a combined data-driven and knowledge-driven method to recognize users. The article is very interesting, but, for obvious reasons of observability of the phenomena, it requires a conspicuous number of transducers in addition to those normally used for home monitoring.

The survey in [27] focuses on the techniques for localizing and tracking people in multi-resident environments. For the counting problem, it identifies three classes of approaches: (a) binary-based techniques relying on binary sensors such as PIRs, which typically exploit snapshots or, possibly, the history of snapshots with spatial and temporal dependencies to infer the number of people [44]; (b) clustering-based techniques that identify multiple non-overlapping clusters containing one or more targets; and (c) statistical-based techniques that use statistical models to estimate the number of persons.

The work in [45] aims at identifying visitors by using different measures of entropy for the cases with/without visitors in a smart home equipped only with PIR sensors and a door contact sensor that is used to confirm the visits and their duration. An accuracy of 98–99% is obtained in a setting where a single occupant typically resides in the home and a visitor arrives.

Alternative approaches to estimating the number of people have been proposed. For example, [31] uses WiFi: movement can be detected through the analysis of the propagation effects of radio-frequency signals. Wang et al. [46] propose a method to count people by utilizing breathing traces, reaching 86% accuracy for four people. Similarly, in [47], Fiber Bragg Grating sensors are used for the detection and counting of occupants, experimented with three people. Recent techniques also include voice recording to recognize up to three persons [48].

In the literature, vision-based methods have always been considered a reliable approach to estimating the number of people, because cameras can obtain rich information. Vera et al. [49] propose a system to count people using depth cameras mounted in the zenithal position: people are detected by each camera, and the tracklets that belong to the same person are determined. Even though vision-based methods are efficient and reliable [14,50,51], they are unsuitable for smart homes for privacy reasons. Algorithms based on wearable devices are infeasible for detecting visitors, who do not wear such devices. Additionally, this intrusive method may not be acceptable for people with low compliance.

Our algorithm avoids the use of wearable devices and cameras: it adopts a system equipped with a very low number of presence detectors to realize non-intrusive monitoring, based on architectural modules of the BRIDGe project (Behavioral dRift compensation for autonomous and InDependent livinG) [52,53].

The proposed algorithm is based on minimal data about the house structure (plans of the flat) and sensor positions, such as room adjacency and possibly overlapping monitored areas (sensor interference), and can update the estimated number of people dynamically. The case study involves four inhabitants (a family with two adult children) in an apartment with a living room, an open-view kitchen, three bedrooms (a double room and two single rooms), two bathrooms, and a corridor. The apartment frequently hosts guests, especially from Friday to Sunday (typically in the evening), and the maximum number of people reached is six. The apartment is instrumented with one PIR per room (eight PIRs) and a contact sensor on the main door. The PIRs have a 2 s blocking time, a sensitivity of 2 moves (the number of moves required for the PIR sensor to report motion), and a 12 s window time (the period during which the number of moves must be detected for the PIR sensor to report motion).
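The PIR behavior just described can be sketched in code. This is a simplified model of how a commercial PIR with these parameters decides to report motion, not the actual sensor firmware; the names `PIRConfig` and `PIRSimulator` are illustrative.

```python
from dataclasses import dataclass

@dataclass
class PIRConfig:
    """PIR motion-detection parameters from the case-study apartment."""
    blocking_time: float = 2.0   # seconds after a report during which moves are ignored
    moves: int = 2               # moves required inside the window to report motion
    window_time: float = 12.0    # seconds in which `moves` moves must occur

class PIRSimulator:
    """Minimal sketch of the decision logic of the configured PIR."""
    def __init__(self, cfg: PIRConfig):
        self.cfg = cfg
        self.move_times: list[float] = []
        self.last_report: float = float("-inf")

    def on_move(self, t: float) -> bool:
        """Feed one detected move at time t; return True if the PIR reports motion."""
        if t - self.last_report < self.cfg.blocking_time:
            return False  # still inside the blocking time: the move is ignored
        # keep only the moves inside the sliding window
        self.move_times = [m for m in self.move_times if t - m <= self.cfg.window_time]
        self.move_times.append(t)
        if len(self.move_times) >= self.cfg.moves:
            self.last_report = t
            self.move_times.clear()
            return True
        return False
```

With the default parameters, two moves 1 s apart trigger a report, a move arriving 1.5 s later is swallowed by the blocking time, and an isolated move after the window has elapsed does not trigger anything.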

### **3. Proposed Method and Algorithm**

### *3.1. System Architecture*

The use-case scenario is a classical house with typical rooms: a kitchen, a living room, and one or more bedrooms and bathrooms. In each room, a PIR sensor is installed. PIR sensors are cheap and small, but they have some shortcomings: (a) the detection area of a PIR sensor is difficult to control, so PIR sensors in different rooms may have overlapping sensing ranges; (b) PIR sensors can only provide a binary response to the presence or absence of people, regardless of their number; (c) the sensitivity of a PIR is not uniform (it depends on the distance, on the width of the visibility area, such as edge zones or sectors of areas, on the speed of the subject, and on the characteristics of the subject [54]); and (d) the functioning of a PIR depends on a set of motion detection parameters (e.g., the blocking time).

Moreover, a contact sensor is installed at the entrance of the smart home. The sensor sends an activation signal when the door is opened. It is worth noting that a contact sensor (e.g., for door and window perimeter monitoring) is less critical, in terms of functioning and parameters, than a PIR.

Figure 1 shows an overview of the architecture and data flow of the proposed algorithm. The inputs of our algorithm are the stream data from the PIR and contact sensors and some information stored in the database, including the house structure and the sensor settings. The stream data are processed by the Data Processor to detect the status of the sensors, which can be active or inactive. The Data Fragment Generator reads the sensor data and groups them into fragments covering a continuous period that may represent interesting changes in the house. Then, the Event Detector detects events in the received data fragment, which may also include events coming from the door contact sensor. Based on the detected events, the status of the sensors, and the system settings, a multi-branch inference machine infers the number of people by fusing several independent inference engines that represent different possible scenarios compatible with the sequence of events. An algorithm coordinator controls all these modules and adds some functionalities that allow (a) starting the algorithm in any initial situation, without any information about the number of residents, and (b) avoiding the accumulated errors that may occur in long runs. All the modules are detailed in the next subsections.

**Figure 1.** Architecture of the proposed algorithm.

### *3.2. Fragment Generation and Event Detection*

### 3.2.1. Data Fragment Generation

Sensors produce and send data irregularly, depending on the activities that occur in the house. Typically, a series of signals is activated by the movement of a person within a small time interval. To recognize events that happen in the house, we divide the stream data into semantic fragments composed of sequences of signals that occur within a certain interval, as shown in Figure 2. Fragments are separated by periods in which no event is detected for a given time interval.

In our Data Fragment Generator, only the active signals of the PIR sensors and of the contact sensor installed at the entrance door are taken into consideration. Thus, data fragments represent events such as 'somebody moves from the bedroom and goes out passing through the living room'.

**Figure 2.** The generation of a data fragment: circles represent events that produce new data; if the time difference between two events is greater than *interval*, the new data are regarded as the beginning of a new data fragment.
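The fragmentation rule of Figure 2 can be sketched as follows; the function name and the (timestamp, sensor) tuple layout are illustrative, not from the paper.

```python
def generate_fragments(events, interval):
    """Split a time-ordered stream of (timestamp, sensor_id) events into
    fragments: a gap larger than `interval` between consecutive events
    closes the current fragment and starts a new one (minimal sketch of
    the Data Fragment Generator)."""
    fragments, current = [], []
    for t, sensor in events:
        if current and t - current[-1][0] > interval:
            fragments.append(current)  # gap too large: close the fragment
            current = []
        current.append((t, sensor))
    if current:
        fragments.append(current)      # flush the last open fragment
    return fragments
```

For example, events at t = 0, 2, 3 and 30 s with a 10 s interval yield two fragments: the first three events and the isolated last one.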

### 3.2.2. House Event Detection

The layout of the rooms in the house is modeled as a Directed Acyclic Graph (DAG) representing their adjacency. The algorithm also works in multi-floor buildings. The detection and inference of possible events is realized by finding the transitions of active signals in the generated data fragments.
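A minimal sketch of such a graph and of the adjacency check it enables is given below; the room names and the `is_valid_transition` helper are illustrative and do not describe the case-study apartment.

```python
# Rooms are nodes; directed edges encode the allowed transitions.
HOUSE_GRAPH = {
    "entrance": ["corridor"],
    "corridor": ["entrance", "living_room", "bedroom", "bathroom"],
    "living_room": ["corridor", "kitchen"],
    "kitchen": ["living_room"],
    "bedroom": ["corridor"],
    "bathroom": ["corridor"],
}

def is_valid_transition(src: str, dst: str, graph=HOUSE_GRAPH) -> bool:
    """A transition between two sensor activations is plausible only if the
    rooms are adjacent in the graph: this rules out 'through-the-wall'
    movements, so that non-adjacent activations close in time can instead
    be attributed to different people."""
    return dst in graph.get(src, [])
```

With this sketch, a corridor-to-bedroom activation pair is accepted, while a kitchen-to-bedroom pair is rejected and must be explained otherwise (e.g., by a second person).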

Besides the movement of people between adjacent rooms, there are some special events:


• **Overlap:** The detection areas of PIR sensors in different rooms may overlap. An example is shown in Figure 3. Such events need to be identified to obtain a better inference.

**Figure 3.** Overlap Example: Both sensors in Room A and B are active, but the overlap is in Room B, so this is defined as the *Overlapped True (OT)* room; instead, Room A is defined as the *Overlapped False (OF)* room.

The overlap case depends on the direction and the installation place of sensors; therefore, the possible overlap areas can be identified in advance.

If an overlapping case occurs, both sensors are active: if the difference between their timestamps is less than a predefined *overlap interval*, the detector reports an overlap event. Figure 4 shows the typical behavior.

**Figure 4.** Overlap Event Detector: The last two signals highlighted in each data fragment are compared with the list of known overlap cases.
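The comparison described in Figure 4 can be sketched as follows; the pair table `KNOWN_OVERLAPS`, the room names, and the function name are illustrative.

```python
# Known overlap pairs, identified in advance from the sensor placement:
# the key is the unordered pair of rooms, the value is the Overlapped True
# (OT) room, i.e., where the overlap area physically lies.
KNOWN_OVERLAPS = {("room_A", "room_B"): "room_B"}

def detect_overlap(fragment, overlap_interval, overlaps=KNOWN_OVERLAPS):
    """fragment: time-ordered list of (timestamp, room) activations.
    Return the OT room if the last two activations form a known overlap
    pair within `overlap_interval` seconds, else None."""
    if len(fragment) < 2:
        return None
    (t1, r1), (t2, r2) = fragment[-2], fragment[-1]
    if abs(t2 - t1) > overlap_interval:
        return None  # too far apart in time: two genuine activations
    return overlaps.get((r1, r2)) or overlaps.get((r2, r1))
```

Two activations of the pair 0.3 s apart are classified as an overlap event (the person is in the OT room); with a stricter 0.1 s interval, the same pair is treated as two independent activations.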

### *3.3. House Status Estimation*

Decayed Room Status Representation

To represent the house status, our method considers the following facts:


To estimate the number of occupants accurately, the previous data need to be taken into account besides the latest data. More than one person may be present in each room: the status of a room is represented by a set of values, one for each estimated person, that decay over time and represent the probability that the persons are still in the room. We call this representation the *Decayed Room Status Representation* ($DRSR_t$) and the status of each person *j* in each room *i* at time instant *t* the *Room Status Signal* ($RSS_{i,t}^{j}$). The value of each $RSS_{i,t}^{j}$ varies from 0 to 1: 1 means that the person has been detected, 0 that the person has left the room, and an intermediate value that the person may be in the room.

Each $RSS_{i,t}^{j}$ decays over time with a given decay ratio until it reaches a lower limit, according to Equation (1), where $\Delta t_i$ is the time difference from the last update of the *i*-th room and $n_i$ is the number of persons estimated in the room.

$$RSS_{i,\,t+\Delta t_i}^{j} = \max\{RSS_{i,t}^{j} - decay\_ratio \times \Delta t_i,\; decay\_lower\_limit\}, \quad j = 1, 2, \dots, n_i \tag{1}$$

where *decay\_ratio* defines the decay speed of $RSS_{i,t}^{j}$ and *decay\_lower\_limit* is the limit that $RSS_{i,t}^{j}$ can reach. Such a decay mechanism compensates for the shortcoming of PIR sensors of being insensitive to motionless people: when a person has been detected in a room and the adjacent rooms have not revealed any activity, we can assume that the person is still there. Therefore, the activation status can last for a certain period, until the $RSS_{i,t}^{j}$ value decays to the lower limit (*decay\_lower\_limit*). Thus, from the status $DRSR_t$ of the house, we can determine whether a room is occupied or not by comparing its $RSS_{i,t}^{j}$ values with a given threshold. The behavior of $RSS_{i,t}^{j}$ is shown in Figure 5.

**Figure 5.** The graphical representation of $RSS_{i,t}^{j}$ with the active and inactive areas, the decay lower limit, and the decay ratio.
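Equation (1) can be written directly in code as a per-room update; the parameter values below are illustrative, not the ones used in the experiments.

```python
def decay_rss(rss, dt, decay_ratio=0.01, decay_lower_limit=0.1):
    """Equation (1): after dt seconds without activations in room i, every
    per-person RSS value of that room decays linearly, clamped at the
    decay lower limit (parameter values are illustrative)."""
    return [max(v - decay_ratio * dt, decay_lower_limit) for v in rss]
```

For example, with a decay ratio of 0.01 per second, a room holding two people with RSS values 1.0 and 0.5 decays to roughly 0.8 and 0.3 after 20 s of silence, while a value already near the bottom is clamped at 0.1.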

The $RSS_{i,t}^{j}$ also has the following tunable parameters:


Because the transfer of people from one room to another can be detected from a data fragment, as described in Section 3.2.2, if the algorithm detects a transfer from room A to room B while room B already has one person in it, then the number of people in the status of room B is set to 2. For example, if the active threshold is 0.2 and the *RSS* values in the house are those shown in Table 1, then the total number of people in the house is estimated to be 3, because three $RSS_{i,t}^{j}$ values are greater than 0.2. Notice that for Room 2, two different $RSS_{2,t}^{j}$ values are available, one for each person.


**Table 1.** Example of people occupation determination.
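The threshold-based count in the example above can be sketched as follows; the room names and RSS values are illustrative, mirroring the structure of the Table 1 example rather than reproducing it.

```python
def count_people(house_rss, active_threshold=0.2):
    """Estimate the number of people in the house: each per-person RSS
    value above the active threshold counts as one (possibly still
    present) person. `house_rss` maps each room to its list of
    per-person RSS values."""
    return sum(1 for rss in house_rss.values()
                 for v in rss if v > active_threshold)
```

With one value above threshold in Room 1, two in Room 2 (one per person), and a sub-threshold value in Room 3, the estimate is 3 people.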

### *3.4. Inference Engine*

An inference engine is an entity that infers the status of the house. It has two attributes: the status of the rooms of the house, $RSS_{i,t}$, described above, and a confidence score, which changes during the inference process. The confidence score represents the consistency of the state of the house: whenever an ambiguity condition occurs, the confidence score decreases, returning to 1 when the ambiguity is resolved. For example, if the inference engine finds that a transfer from one room to another has concluded successfully, the inference process finishes and the state of the house is updated. On the contrary, if some inconsistencies are found and the process needs to continue in order to solve them, the confidence score is decreased. The number of inference engines depends on the events in the house and on the sensor data, as different branches are generated for all the possible cases that could be inferred.

When the algorithm starts, one inference engine is initialized with the confidence score set to 1. All rooms are regarded as empty, i.e., all $RSS_{i,t}$ sets are initially empty. When the system is running, the new sensor data and the events detected by the Data Fragment Generator are fed to the inference engine to update the status of the rooms. Then, the number of people is estimated based on the $RSS_{i,t}$ values.

The Inference Engine updates the status of the rooms following the main rules below:


• When the system receives an active signal, the algorithm determines whether a room transfer has occurred according to the topology of the house. If a transfer happens, a new $RSS_{i,t}^{j}$ of the target room is set to 1. At the same time, the $RSS_{i,t}^{j}$ with the minimum value in the room where the person comes from is deleted. In case the active signal just captures an activity inside the room (person movement), all $RSS_{i,t}^{j}$ values for that room are incremented through the following formula (in the current settings, *arise\_ratio* is equal to *decay\_ratio*):

$$RSS_{i,\,t+\Delta t_i}^{j} = \min\{RSS_{i,t}^{j} + arise\_ratio \times \Delta t_i,\; 1\}, \quad j = 1, 2, \dots, n_i \tag{2}$$

• If the PIR sensor does not activate for a certain time, but persons may still be in the room, the status of the room becomes inactive, and its $RSS_{i,t}^{j}$ values are decreased by the additional decay value.
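The two update rules above, in-room activity (Equation (2)) and room transfer, can be sketched as follows; the function names are illustrative, the parameter values are not those of the experiments, and the cap is written with `min()` so that the values remain valid probabilities.

```python
def arise_rss(rss, dt, arise_ratio=0.01):
    """Equation (2): when an activation is attributed to movement inside a
    room, every per-person RSS value of that room rises linearly toward 1
    (capped at 1; parameter value is illustrative)."""
    return [min(v + arise_ratio * dt, 1.0) for v in rss]

def apply_transfer(house_rss, src, dst):
    """Transfer rule sketch: a new RSS of 1 is added to the target room,
    and the minimum RSS of the source room (the person most likely to
    have left) is deleted."""
    house_rss[dst].append(1.0)
    if house_rss[src]:
        house_rss[src].remove(min(house_rss[src]))
```

For example, after a corridor-to-kitchen transfer, the kitchen gains a fresh value of 1 while the corridor loses its weakest (oldest) value; an in-room activation instead raises all values of that room toward 1.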
