*4.3. Monitoring*

IoTCrawler allows the aforementioned participants to connect their sensors to the system and make them available to a broader audience. As sensors are often deployed in environments where their operation cannot be controlled or even guaranteed, and given that many of them are battery powered and have a limited life span, it is to be expected that their reliability will fluctuate over time. It is therefore important to observe the performance of the sensors. For this, IoTCrawler has developed the Monitoring component, which is responsible for observing the incoming data streams of different sensors, detecting possible faults in the data, and, if possible, providing countermeasures to mitigate them. The proposed monitoring concept, with its different subcomponents, provides an extensive set of features for addressing issues of dynamics in IoT environments.

### 4.3.1. Fault Detection and Fault Recovery

The Fault Detection (FD) component monitors the data streams that are available to the IoTCrawler framework and follows a two-layered approach. In the first layer, the component categorises faults either as definite faults (due to hardware issues) or as anomalies, which could occur because of brief environmental factors or an unexpected behaviour detected through learned patterns. These anomalies are reclassified as faults if they persist for a longer period of time. To cater to the needs of most of the sensor streams, the FD component uses different algorithms, e.g., the Prophet algorithm [55] for time series analysis and stochastic algorithms which determine the likelihood of a value occurring based on the previous observations of the sensor. The FD component subscribes to new data streams that become available through the MDR. Through the metadata, the FD determines which approach should be used; this choice depends on how much information the metadata provides. The MDR is then notified in case of faults and triggers the recovery mechanism. To deal with faulty sensors, IoTCrawler has developed a two-stage countermeasure. The Fault Recovery (FR) mechanism is a first response that handles missing sensor observations by imputing artificially generated sensor readings. The goal here is to have a quick solution that provides uninterrupted data streams for the applications using them. For long-lasting faults, the FD can issue the deployment of a virtual sensor to replace the broken one.
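The two-layer idea can be illustrated with a minimal sketch. The snippet below is not the IoTCrawler implementation; it assumes a simple z-score likelihood test against the training history (standing in for the stochastic algorithms mentioned above) and a hypothetical `persist` threshold for escalating a run of consecutive anomalies into a fault:

```python
from statistics import mean, stdev

def classify(history, stream, z_thresh=3.0, persist=4):
    """First layer: flag readings whose z-score against the training
    history exceeds z_thresh as anomalies. Second layer: escalate to a
    fault once `persist` consecutive anomalies are seen (the
    'persists for a longer period of time' criterion)."""
    mu, sigma = mean(history), stdev(history)
    labels, run = [], 0
    for value in stream:
        anomalous = sigma > 0 and abs(value - mu) / sigma > z_thresh
        run = run + 1 if anomalous else 0
        if run >= persist:
            labels.append("fault")
        elif anomalous:
            labels.append("anomaly")
        else:
            labels.append("ok")
    return labels
```

A single outlier is thus reported only as an anomaly, while a sustained deviation is promoted to a fault that would notify the MDR.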

In the case of multiple sensors, we employ knowledge-based Bayesian Maximum Entropy (BME) for imputing an identified faulty value [56]. BME is a mapping method for spatiotemporal estimation that allows various knowledge bases to be incorporated into the modelling in a logical manner: definite rules for prior information, as well as hard (high precision) and soft (low precision) data [57]. More details about this algorithm can be found in [56].

To evaluate FD and FR, an example instance of FD is presented below. Sensors deployed in three different parking areas in the city of Murcia are integrated into the IoTCrawler framework. These sensors report the number of free parking spots in their respective parking lots with an update interval of 2 min. A model was trained on several days of data from the parking areas to learn the normal behaviour. As an instance of the results, Figure 5 shows the original data for one day from each sensor, each with one injected anomaly.

**Figure 5.** Data with injected anomalies at different instances.

The algorithm detected two anomalous patches in each instance. The first detected patch consists of the initial samples, where the values do not change; the second anomalous patch is the actual injected anomaly. The first fault in each instance is classified as a stuck-at fault, as this behaviour was not observed in the training set. The stuck-at condition is fulfilled when a sensor repeats an observation more times than was observed in the training set.
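The stuck-at criterion reduces to comparing run lengths of repeated observations. A minimal sketch, assuming the check is made over a sliding window of recent readings (function and parameter names are illustrative, not from the IoTCrawler codebase):

```python
def max_run(values):
    """Length of the longest run of identical consecutive observations."""
    longest = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def is_stuck_at(training, window):
    """A sensor is stuck-at when the current window repeats one
    observation more times than was ever seen in the training data."""
    return max_run(window) > max_run(training)
```

With a training series whose longest constant run is three samples, a window of four identical readings triggers the stuck-at condition, while three do not.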

For the stuck-at fault, an estimated value cannot be interpolated from the neighbouring sensors, as all of the sensors show the same faulty behaviour at the same time. The second anomalous patch in each sensor, in contrast, occurs while the data from the other sensors is normal at those time instances. A recovery value is then generated using data from the sensors with normal behaviour and BME as the interpolation technique (explained above). The results can be seen in Figure 6.
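The recovery step can be sketched with a much simpler spatial estimator. The snippet below uses inverse-distance weighting as a stand-in for the BME estimator of [56], which is considerably more involved; it only illustrates how readings from normally behaving neighbours are combined into one recovery value:

```python
def recover(neighbours):
    """neighbours: list of (value, distance) pairs from sensors that
    behave normally at the faulty time instant. Inverse-distance
    weighting is used here as a simplified stand-in for BME: closer
    sensors contribute more to the imputed value."""
    weights = [1.0 / d for _, d in neighbours]
    total = sum(weights)
    return sum(v * w for (v, _), w in zip(neighbours, weights)) / total
```

Unlike this sketch, BME additionally incorporates prior knowledge and the precision of each data source, which is why it is preferred in the actual FR mechanism.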

**Figure 6.** Comparison of actual and recovery values at the anomalous patch.

### 4.3.2. Virtual Sensor

To replace a faulty sensor in the longer term, IoTCrawler provides the virtual sensor component. A virtual sensor is capable of providing artificial sensor observations over a longer period, as it is trained on larger data sets with different algorithms. Because the FR mechanism serves as the first response, virtual sensors can be trained over a longer period of time, allowing the algorithms to learn more patterns and thereby also to capture data drift. The concept of the virtual sensor is that it takes historical data from a broken sensor and its correlated sensors and uses this relationship to predict values in place of the broken sensor. For instance, in the case of a broken temperature sensor, a virtual sensor can be trained to project the temperature at the failed sensor's location using nearby temperature sensors as predictors. To achieve this, the component searches for neighbouring sensors that can be used as predictors in the ML model. The correlation between the broken sensor and each candidate is calculated so that only the most promising data sets are used for training. The most promising ML model is then selected via a grid search.
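The predictor-selection step can be sketched as follows. This is an illustrative simplification, not the IoTCrawler implementation: candidates are ranked by Pearson correlation with the broken sensor's history, and the `min_corr` cut-off is a hypothetical parameter; the subsequent grid search over ML models is omitted:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_predictors(broken_history, candidates, min_corr=0.8):
    """Keep only the neighbouring sensors whose historical readings
    correlate strongly with the broken sensor; these become the
    predictors for the ML model tuned by the grid search."""
    return [name for name, hist in candidates.items()
            if abs(pearson(broken_history, hist)) >= min_corr]
```

Filtering on correlation first keeps the later grid search cheap, since only the most promising data sets enter the model-selection stage.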

To test the component, different scenarios were considered, and the results are documented in [58]. The viability of virtual sensors has been shown in different environments by considering neighbouring sensors of both the same and different sensor types as the faulty one. Models selected through grid search, along with models created through ensembling, were used to make the predictions, and both showed promising results. The results show that a fully autonomous deployment of virtual sensors is possible, although it should be mentioned that their effectiveness depends highly on the availability of correlated surrounding sensors.
