### *4.5. Indexing*

Indexing provides a means for clients to search for IoT entities efficiently. It focuses on IoT streams and sensors, where queries can be based on sensor type and absolute or relative location. To initiate the indexing process, a platform manager needs to register an MDR with the Indexing component. In turn, this triggers a subscription to sensors and streams at the registered MDR. As the metadata descriptions are updated at the MDR, the Indexing component is notified and then indexes the sensors and streams based on location. For scalability, the Indexing component can be configured so that the persistence it relies on (MongoDB) can be shared (see Figure 10).
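The registration step above triggers a subscription at the MDR broker. As a rough illustration, the subscription payload could look like the following NGSI-LD-style structure; the entity types follow the iot-stream/sosa vocabularies used in this paper, while the subscription id and notification URL are hypothetical placeholders, not the framework's actual values.

```python
# Sketch of an NGSI-LD subscription the Indexing component could register
# with an MDR broker so it is notified when sensor/stream metadata changes.
# The subscription id and callback URL are illustrative only.

def build_index_subscription(notify_url):
    return {
        "id": "urn:ngsi-ld:Subscription:indexing-001",
        "type": "Subscription",
        "entities": [
            {"type": "iot-stream:IoTStream"},
            {"type": "sosa:Sensor"},
        ],
        # Re-index whenever an entity's location attribute changes.
        "watchedAttributes": ["location"],
        "notification": {
            "endpoint": {"uri": notify_url, "accept": "application/ld+json"}
        },
    }

sub = build_index_subscription("http://indexing:8080/notify")
```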

Indexing exposes a query interface which complies with the NGSI-LD specification. Upon a client query, entities that relate specifically to sensors, IoT streams, location points or QoI are responded to directly. Otherwise, the query is forwarded to the MDR for complete query resolution. This approach enhances query resolution performance significantly, as co-located entities and common types are indexed and grouped, reducing query-processing latency.
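The routing rule just described can be reduced to a simple dispatch: indexed entity types are answered from the local index, anything else goes to the MDR. A minimal sketch, in which the type names and the two dictionary-backed stores are stand-ins for the real index and MDR endpoints:

```python
# Minimal sketch of the query-routing rule: queries for indexed entity
# types are answered from the local index; anything else is forwarded to
# the MDR for complete resolution. Backends are mocked as dictionaries.

INDEXED_TYPES = {"sosa:Sensor", "iot-stream:IoTStream", "geo:Point", "qoi:Quality"}

def resolve(query_type, local_index, mdr):
    if query_type in INDEXED_TYPES:
        return local_index.get(query_type, [])   # answered directly
    return mdr.get(query_type, [])               # forwarded to the MDR

local = {"sosa:Sensor": ["urn:ngsi-ld:Sensor:1"]}
mdr = {"sosa:Platform": ["urn:ngsi-ld:Platform:1"]}
```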

The indexing technique applied is based on a geospatial approach defined by Janeiko et al. [65]. The index is tripartite: two of the indices link iot-stream:IoTStream and sosa:Sensor entities to a geo-partition key, while the third contains the actual data and is also geo-partitioned. The partition key is determined by intersecting the location of the sensor, represented as a GeoJSON object, with predefined GeoJSON polygons representing geographical regions. The index contains the entities in the form of a graph, whereby linked entities are stored as a single entry. Here, the IoTStream entity is the root entity, with all other indexed entities nested within it; hence, any query for an entity must be linked to an IoTStream entity. This structure allows compound indices to be constructed, which accelerates nested queries. By providing its indices for search, the Indexing component addresses scalability requirement **R-1**.

**Figure 10.** Indexing sensors and IoTStreams.
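The partition-key derivation described above can be illustrated with a point-in-polygon intersection. The following sketch uses a pure-Python ray-casting test in place of a real geospatial library, and the region polygons and keys are invented for the example, not the framework's actual partitioning data.

```python
# Illustrative geo-partition key computation: a sensor's GeoJSON point is
# intersected with predefined region polygons. Regions here are made-up
# bounding boxes; real deployments would use proper country polygons.

def point_in_polygon(lon, lat, ring):
    """Ray-casting point-in-polygon test on a GeoJSON-style linear ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            if lon < (x2 - x1) * (lat - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

REGIONS = {  # hypothetical region polygons keyed by partition key
    "DE": [(5.9, 47.3), (15.0, 47.3), (15.0, 55.1), (5.9, 55.1)],
    "ES": [(-9.3, 36.0), (3.3, 36.0), (3.3, 43.8), (-9.3, 43.8)],
}

def partition_key(sensor_point):
    lon, lat = sensor_point["coordinates"]
    for region, ring in REGIONS.items():
        if point_in_polygon(lon, lat, ring):
            return region
    return "unpartitioned"
```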

The Indexing component is responsible for creating and updating the metadata indices to allow fast search and retrieval of the metadata stored in the MDR, using geospatial indexing. The initial approach to geospatially indexing IoT streams and sensors was to use geohash, whereby the location is represented by a string of characters whose predefined length reflects the granularity of the bounding box the entity is associated with. A new approach has been taken to maintain the exact location of the entity by using a quad search tree. The main applicable KPIs are latency and retrieval time. The Indexer partitions the notifications received from the MDR broker for stream or sensor locations by country. Latency and retrieval time can be measured with respect to: the data set size, i.e., the number of entities (streams and sensors); the number of countries; or the number of concurrent requests.

Therefore, the evaluation was applied to data sets of different sizes covering multiple countries. The data sets were randomly generated and covered entities located within 6 countries. In terms of hardware, the experiment was conducted on a computer with a 6-core Intel Core i7 CPU at 1.9 GHz and 32 GB RAM. Concurrency tests were performed using the Apache Bench tool. Two sets of tests were performed, each with a different number of entities stored in the Indexer. For each set, two concurrency tests were performed: one with 100 requests and a concurrency of 10 (the graphs show the total time for all requests); and one with 10,000 requests and a concurrency of 1000. Regarding the query response time, two factors were measured: the total response time, and the wait time, i.e., the time between the Indexer receiving a request and responding, irrespective of the connection time.

Across the sets of tests, the wait and total response times show a gradual increase with respect to the number of stored entities. For the last set of concurrency requests, a significant increase in delay is observed, especially in the case of requests with a concurrency of 1000. Figure 11a,b show the response times for requests with increasing concurrency. The plots have been smoothed with a moving average of 10 and 20, respectively.

**Figure 11.** Indexing response times: (**a**) 100 requests with a concurrency of 10; (**b**) 10,000 requests with a concurrency of 1000.
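The smoothing applied to these plots is a plain moving average. A minimal version, with the window sizes 10 and 20 from the figure standing in for whatever the plotting pipeline actually used:

```python
# Simple moving-average smoothing as applied to the response-time plots:
# each output point is the mean of `window` consecutive samples.

def moving_average(samples, window):
    out = []
    for i in range(len(samples) - window + 1):
        out.append(sum(samples[i:i + window]) / window)
    return out
```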

### **5. Enablers for Search and Orchestration Layer**

This section addresses the enablers for search in the IoTCrawler framework. Based on the description in Section 3, all enablers are presented in detail, including experimental results and evaluations.

### *5.1. Privacy and Security*

The IoTCrawler framework places security and privacy as a transversal pillar interacting with the different layers of its architecture (cf. Figure 1). This pillar comprises Identity Management (idM), access control management, both intra-domain and inter-domain, and finally privacy from a data point of view. Starting with idM, this component is responsible for handling the different identities registered in the IoTCrawler framework. An identity, which can be a user, device or service, comprises different attributes such as name, email, role and organisation, to name a few. These attributes are important for the definition of access control and privacy encryption policies, as we will see below. Another important function carried out by the idM is authentication: any entity registered in the system must perform the login operation via the exposed API. In our case, we have selected the FIWARE KeyRock GE (https://fiware-idm.readthedocs.io/en/latest/ accessed on 24 February 2021), which exposes an OAuth2 API.
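As a rough sketch of what authenticating against an OAuth2 API involves, the following builds the headers and body of a standard OAuth2 resource-owner-password-credentials request (RFC 6749). The credentials are placeholders, and this is generic OAuth2 rather than a verified transcript of KeyRock's exact API.

```python
import base64
from urllib.parse import urlencode

# Sketch of an OAuth2 password-grant token request such as a client would
# send to an idM exposing an OAuth2 API. Client id/secret and user
# credentials are hypothetical placeholders.

def oauth2_password_request(client_id, client_secret, username, password):
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {credentials}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
    })
    return headers, body

headers, body = oauth2_password_request("my-app", "my-secret", "alice", "pw")
```

On success, the idM responds with an access token that the client presents on subsequent API calls.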

To deal with this heterogeneous landscape, we have designed a comprehensive approach combining a distributed authorisation solution, called Distributed Capability-Based Access Control (DCapBAC), with Distributed Ledger Technology (DLT), specifically Hyperledger Fabric (https://www.hyperledger.org/use/fabric accessed on 24 February 2021) and the use of smart contracts. DCapBAC decouples traditional authorisation solutions, such as the XACML framework, into two different phases: authorisation request and access. For this, a new component, called the Capability Manager (CM), is introduced. It is the end-point for authorisation requests, and it issues an authorisation token, called a Capability Token (CT), after validating the authorisation request by communicating with the XACML framework. Regarding the access phase, the XACML Policy Enforcement Point (PEP) is moved to a proxy located close to the server where resources are stored. In this case, the CT acts as a proof of possession, which allows the PEP Proxy to validate it easily without querying any third party. The CT contains all details regarding the resources to be accessed, the access mode, among others.
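The two-phase idea can be sketched in a few lines: the CM signs a token over the authorised subject, resource and action, and the PEP Proxy later verifies that signature locally, without contacting a third party. The HMAC scheme, shared key and field names below are deliberate simplifications for illustration, not DCapBAC's actual token format.

```python
import hashlib
import hmac
import json
import time

# Illustrative Capability Token (CT) flow: the Capability Manager signs
# the token at authorisation time; the PEP Proxy validates it locally at
# access time. Key and claim names are simplified placeholders.

SECRET = b"cm-pep-shared-secret"  # hypothetical key shared between CM and PEP

def issue_capability_token(subject, resource, action, ttl=300):
    claims = {"sub": subject, "res": resource, "act": action,
              "exp": int(time.time()) + ttl}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": sig}

def pep_validate(token, resource, action):
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, token["sig"])
            and token["claims"]["res"] == resource
            and token["claims"]["act"] == action
            and token["claims"]["exp"] > time.time())
```

The point of the design is the second function: validation needs only the token and a local key, so the access path adds no round trip to the authorisation infrastructure.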

DLT provides numerous advantages in terms of resilience and traceability because of its consensus approach, whereby all nodes of the network must agree on global policies. For this reason, an additional step is taken in IoTCrawler, as showcased in Figure 12: the Blockchain is introduced as an added element in the security process, storing policies as well as CTs. Since CTs can later be revoked, the PEP Proxy must check their validity in the Blockchain.

**Figure 12.** Policy Enforcement Point (PEP) Proxy interaction diagram.

Access control components are integrated with the Blockchain to enhance security and scalability. By leveraging the Blockchain, several issues of current access control systems can be overcome:

• Untrustworthy entities: First, the Policy Administration Point (PAP) might be subject to an attack and perform malicious actions, such as updating a policy against the resource owner's will. The Blockchain helps avoid such misbehaviour of the PAP: the access control policy's integrity is checked by registering and verifying its meta-data, such as the hash value managed by the Blockchain network. Second, policy evaluation is done by the Policy Decision Point (PDP), which could be manipulated by an untrusted PAP. The Blockchain ensures that this misbehaviour is detectable.
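The integrity check above amounts to anchoring a hash of the policy on the ledger and re-hashing on every use. A minimal sketch, with a dictionary standing in for the Hyperledger Fabric ledger and invented policy identifiers:

```python
import hashlib
import json

# Sketch of the policy-integrity mechanism: the policy's hash is anchored
# on the (here mocked) Blockchain when the PAP registers it; any later
# tampering is detected by re-hashing the policy before evaluation.

ledger = {}  # stand-in for the distributed ledger

def policy_digest(policy):
    return hashlib.sha256(
        json.dumps(policy, sort_keys=True).encode()).hexdigest()

def register_policy(policy_id, policy):
    ledger[policy_id] = policy_digest(policy)  # would be a chaincode call

def verify_policy(policy_id, policy):
    return ledger.get(policy_id) == policy_digest(policy)
```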


To address the scalability requirement **R-1**, we carefully design the security components so that only critical parts are executed on-chain, while other parts can be executed off-chain. Policy and capability management operations are on-chain, while policy enforcement and identity management can be done off-chain or delegated to another service. In addition, we carefully select the consensus algorithm, which is one of the core parts of the Blockchain, so that it provides efficient throughput and latency. As a result, the security and privacy enablers provide secure-by-design access to IoT data, thanks to the DCapBAC access control model, in a privacy-preserving manner using attribute-based encryption. DCapBAC is coupled with the Blockchain to provide distributed trust among untrusted domains by agreeing on common policies and ensuring the policies' integrity. In addition, the Blockchain offers transparency, auditability and fault tolerance to access control. Our chosen Blockchain deployment, with a suitable consensus algorithm, ensures low overhead, in other words, high scalability.

For the evaluation of these components, we have measured the latency associated with each of the operations these components perform to grant authentication and authorisation, as well as the performance metrics linked to the CPU and memory consumption of these operations, increasing the number of simultaneous requests up to 2048 connections. We ran the benchmark experiment on a server with an Intel Xeon E-2146G CPU and 32 GB RAM, in a local network environment.

### 5.1.1. Identity Management and Authentication Evaluation

Starting with Keyrock, we have evaluated the latency of two different sets of operations: the first is related to the generation, information retrieval and deletion of the Identity Management token, while the second focuses on the user point of view, covering common operations related to user management. For this metric, we launched 100 executions of each operation to obtain the average latency and 95% confidence intervals, as presented in the following graphs and tables. As we can see in Figure 13a, the delay obtained for the different operations is very low, reaching up to 30 ms for the generation of the idM token. This operation is the heaviest compared to the others, because it comprises the different mathematical operations required to generate the token. The most common authentication operations, usually triggered via the web interface or REST API, are shown in Figure 13b. In light of these results, we can also state that user operations take about 30 ms, which is reasonable in terms of latency.

(**a**) Token Operations

(**b**) User Operations

**Figure 13.** Delay of Identity Management Operations (units in milliseconds).
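The averages and 95% confidence intervals reported above can be computed from the measured latencies using the normal approximation (z = 1.96). A minimal sketch; the sample values below are illustrative, not the paper's measurements:

```python
import statistics

# Mean and 95% confidence interval over repeated latency measurements,
# using the normal approximation (z = 1.96) on the standard error.

def mean_ci95(samples):
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5  # standard error
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)

latencies_ms = [29.1, 30.4, 28.7, 31.0, 30.2, 29.6, 30.8, 29.9]  # example data
mean, (lo, hi) = mean_ci95(latencies_ms)
```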

Regarding the scalability aspect, we have assessed the performance of the idM in terms of CPU and memory consumption. The objective was to identify a trend as the number of requests increases, so that we can extrapolate to a higher number of simultaneous requests. To achieve this goal, we launched different numbers of simultaneous connections: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and 2048, repeating each experiment 4 times. More specifically, we employed a query for authenticating a user. According to Figure 14a,b, the CPU resources used by the idM remain stable as the number of simultaneous requests increases. Regarding memory, up to 256 simultaneous connections the increase is less than 1.5%; from that number on, the memory consumption increases by about another 1.5%. Therefore, we can state that the idM is able to handle a large number of simultaneous communications.

(**a**) CPU Consumption vs. Number of Simultaneous Communications; (**b**) Memory Consumption vs. Number of Simultaneous Communications.

**Figure 14.** Identity Management resources consumption.
