### *5.4. Orchestrator*

The Orchestrator mediates between the IoTCrawler platform and IoT applications, which are expected to run outside the platform and interact with it through the Orchestrator's interfaces. The Orchestrator is an endpoint that forwards all metadata requests to the Ranking component and forwards subscription requests to the MDR. It also provides an endpoint for receiving notifications from the MDR. Without it, applications would have to expose their own REST endpoints, which is often not possible (e.g., for apps running on mobile devices or in private networks). The Orchestrator avoids this by providing its own endpoint (not exposed to the public) and redirecting all incoming notifications to a dedicated queue in a publicly available publish-subscribe service based on the Advanced Message Queuing Protocol (AMQP). An IoT application only needs to subscribe to a queue in the messaging service to be notified immediately. The same publish-subscribe mechanism also allows IoT applications to be notified about stream failures detected by the monitoring component.
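The notification relay described above can be illustrated with a minimal in-memory sketch. It is not the Orchestrator's actual code: the class `NotificationRelay`, its method names and the message fields are hypothetical, and Python's `queue.Queue` stands in for a queue in an AMQP-based messaging service.

```python
import queue
from collections import defaultdict


class NotificationRelay:
    """Hypothetical stand-in for the Orchestrator's private notification endpoint.

    Each subscription gets a dedicated queue. queue.Queue replaces the
    AMQP service purely for illustration.
    """

    def __init__(self):
        self._queues = defaultdict(queue.Queue)

    def on_notification(self, subscription_id, payload):
        # Called when the MDR posts a notification to the private endpoint.
        self._queues[subscription_id].put(payload)

    def on_stream_failure(self, subscription_id, reason):
        # The same channel can carry failure reports from the monitoring component.
        self._queues[subscription_id].put({"type": "StreamFailure", "reason": reason})

    def consume(self, subscription_id):
        # What an application does: read from its queue, exposing no endpoint itself.
        q = self._queues[subscription_id]
        items = []
        while not q.empty():
            items.append(q.get())
        return items


relay = NotificationRelay()
relay.on_notification("sub-1", {"type": "Notification", "data": [{"id": "urn:ngsi-ld:Sensor:1"}]})
relay.on_stream_failure("sub-1", "no data received")
messages = relay.consume("sub-1")
```

In this sketch the application only ever calls `consume`, which mirrors the point made in the text: the application needs a connection to the messaging service, not a reachable REST endpoint of its own.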

The Orchestrator implements the NGSI-LD interface and redirects incoming NGSI-LD requests to two components: the MDR and Ranking. Entity subscription requests are analysed, modified (if required) and forwarded to the MDR. Metadata/discovery requests are forwarded directly to the Ranking component, which ranks the results according to the specified ranking criteria. As a result, the Orchestrator hides two IoTCrawler components behind a single NGSI-LD interface, one of the interfaces used by IoTCrawler applications.
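The routing rule can be sketched as a small dispatch function. This is an illustrative sketch only: the function `route`, the callback address and the specific modification applied to subscriptions (pointing the notification endpoint at the Orchestrator) are assumptions, since the text does not state how subscription requests are modified. The paths follow the standard NGSI-LD resource layout.

```python
import copy

# Hypothetical address of the Orchestrator's private notification endpoint.
ORCHESTRATOR_CALLBACK = "http://orchestrator.internal/notify"


def route(method, path, body=None):
    """Return (target component, possibly modified body) for an NGSI-LD request."""
    if path.startswith("/ngsi-ld/v1/subscriptions"):
        if method == "POST" and body is not None:
            body = copy.deepcopy(body)  # leave the caller's request untouched
            # Assumed modification: deliver notifications to the Orchestrator.
            body.setdefault("notification", {})["endpoint"] = {"uri": ORCHESTRATOR_CALLBACK}
        return "MDR", body
    if path.startswith("/ngsi-ld/v1/entities"):
        # Metadata/discovery requests go to Ranking unchanged.
        return "Ranking", body
    raise ValueError(f"unsupported path: {path}")


subscription = {"type": "Subscription", "entities": [{"type": "Sensor"}]}
target_sub, forwarded = route("POST", "/ngsi-ld/v1/subscriptions", subscription)
target_query, _ = route("GET", "/ngsi-ld/v1/entities?type=Sensor")
```

Here `target_sub` is `"MDR"` and `target_query` is `"Ranking"`, so a client sees one interface while two components serve its requests.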

The evaluation of the Orchestrator component measures how its performance characteristics (throughput and latency) depend on the number of parallel connections, i.e., IoT applications running remotely. In this experiment, the Orchestrator runs on top of Djane Broker, a lightweight NGSI-LD broker that offers less functionality than Scorpio. The benchmark was conducted on a single Intel Xeon machine (4 cores, 16 GB RAM). Each value was obtained by averaging over 10 repeated experiments. Results are shown in Figure 19. The number of parallel clients varied between 64 and 1024, and each client ran both a non-intensive and an intensive workload. For the non-intensive workload (1 request by each of 64–1024 parallel clients), the maximum average throughput is around 400 requests per second, with latency below 0.2 s. For the intensive workload (100 consecutive requests by each of 64–1024 clients), the maximum average throughput rises to 1200 requests per second, with average latency increasing to 1 s.
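The measurement procedure can be sketched as follows. This is not the benchmark tool used in the experiment: `stub_request` is a placeholder for an HTTP call to the Orchestrator, the function names are hypothetical, and the small client counts are chosen only so that the sketch runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def stub_request():
    # Placeholder for a request to the Orchestrator (assumption).
    time.sleep(0.001)


def client(requests_per_client):
    # One client issues its requests consecutively and records each latency.
    latencies = []
    for _ in range(requests_per_client):
        start = time.perf_counter()
        stub_request()
        latencies.append(time.perf_counter() - start)
    return latencies


def run_once(n_clients, requests_per_client):
    # All clients run in parallel; throughput is total requests per wall-clock second.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        per_client = list(pool.map(client, [requests_per_client] * n_clients))
    elapsed = time.perf_counter() - start
    latencies = [x for lat in per_client for x in lat]
    return len(latencies) / elapsed, mean(latencies)


def benchmark(n_clients, requests_per_client, repetitions=10):
    # Average throughput and latency over repeated runs, as in the text.
    runs = [run_once(n_clients, requests_per_client) for _ in range(repetitions)]
    return mean(r[0] for r in runs), mean(r[1] for r in runs)


throughput, latency = benchmark(n_clients=8, requests_per_client=5, repetitions=3)
```

Setting `requests_per_client=1` corresponds to the non-intensive workload and `requests_per_client=100` to the intensive one.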

**Figure 19.** Orchestrator performance: (**a**) throughput (requests per second).

### **6. Application Domain Instantiation**

This section presents two application examples of how IoTCrawler is being instantiated in real-world scenarios. Other scenarios for different domains are under development and will be part of a future publication.

### *6.1. Smart Home—Semantic Integration Focus*

The smart home use case aimed to understand the challenges that smart home owners face when deploying and using their smart home devices. We implemented an energy insight dashboard and tested it in a longitudinal study with end users at an early stage of the project. The dashboard was built to give smart home users insights into their energy consumption and thereby help them reduce their energy costs and carbon footprint. This was achieved by collecting energy measurements from smart plugs and other smart energy meters. The web-based application includes various aggregated and real-time views of the energy data as well as information about the usage frequencies of appliances attached to the smart plugs.

**Evaluation:** As part of IoTCrawler, we extended the dashboard to a public test bed running 24 h a week for almost a year. More than 60 homes and 3400 devices were connected during that period. Power users have more than two hundred devices connected to a smart home. We realised that managing these devices, which includes knowing their locations and, for smart plugs, which appliance is connected to which plug, poses a considerable challenge for smart home owners. More importantly, the heterogeneity of devices with respect to their communication technologies, APIs and the gateways to which they are connected makes it hard to develop smart home applications that run seamlessly across different vendors. To tackle this challenge, we integrated an early version of an IoTCrawler feature for semantic annotation, in which we used machine learning to detect device types, their locations and connected appliances in real time [8]. We conducted a survey to validate the benefits of IoTCrawler features. Most respondents (77%) indicated that comparing and analysing energy usage is a benefit of the Energy Insights Dashboard. The second most frequently named benefit was the automatic device detection feature (41%).

Further conversations with smart home owners and application developers have shown that IoTCrawler has the potential to be an effective IoT platform. For example, smart home users will be able to keep their data on their own hardware (located in private networks) and federate it into IoTCrawler for processing by third-party analytical services. A Blockchain-based security mechanism (part of IoTCrawler) enables data owners to grant access to certain analytical services in much the same way as a smartphone user grants access to certain mobile apps. Analytical service developers are responsible for managing their processing infrastructure and federating the processing results back to IoTCrawler. The core of IoTCrawler consists of the NGSI-LD standard together with a number of semantic ontologies, which make data and metadata models more structured and understandable to independent service developers and thereby open up potential for service composition. As a result, raw data owners (smart home users) will be authorised to access the intermediate (if needed) and final processing results calculated from their data.

Encouraged by these findings, we further developed crawling and semantic annotation mechanisms to reduce the time and effort needed to integrate smart home and other IoT and stream data into IoTCrawler. As IoTCrawler provides a common, semantic abstraction for finding and accessing the respective data streams, it becomes much easier to develop smart home applications. Consequently, we developed the "What's happening at home" prototype, which is fully implemented on top of the IoTCrawler infrastructure and interacts with the Orchestrator, Search Enabler, Ranking and Security components. The application detects users' activities based on the energy consumption of appliances attached to smart plugs. Activities are modelled in terms of the Home Activity ontology (http://sensormeasurement.appspot.com/ont/home/homeActivity accessed on 24 February 2021), which is partly described in one of the GraphQL schemas (https://github.com/IoTCrawler/Search-Enabler/blob/master/src/resources/schemas/homeActivity.graphqls accessed on 24 February 2021) used by the Search Enabler. The schema allows applications to filter households by type or location of detected activities (considering privacy policies). The application prototype demonstrates the separation of functionality, and it benefits from the granularity of the IoTCrawler data model by dealing with sensors and their streams.

### *6.2. Smart Parking—Security and Privacy Focus*

Finding a free parking spot can be very cumbersome in populated cities, with the collateral effect that more vehicles circulate in the city, increasing noise and pollution. In IoTCrawler, we provide a solution to alleviate this problem by offering a parking recommendation service, which allows the user to define the destination, time of arrival and the acceptable walking distance. This solution takes advantage of IoTCrawler's homogeneous representation of information, which allows new information to be introduced without any modification to our solution. More specifically, it uses the Indexing and Ranking components to retrieve an ordered list of parking sites and parking meter information. Additionally, as an exercise to demonstrate the security capabilities of the IoTCrawler platform, we allowed data providers to specify different access policies. These policies determine which information is visible to a consumer depending on the consumer's attributes.
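The core of such a recommendation can be sketched as a filter-and-sort step. This is a minimal sketch under stated assumptions, not the SmartParking implementation: the `ParkingSite` type, the function names and the coordinates are hypothetical, ranking is done by straight-line distance only, and the time of arrival is ignored for brevity.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class ParkingSite:
    name: str
    lat: float
    lon: float
    free_spots: int


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    r = 6371000.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def recommend(sites, dest_lat, dest_lon, max_walk_m):
    """Return names of sites with free spots within walking distance, nearest first."""
    candidates = []
    for s in sites:
        d = haversine_m(dest_lat, dest_lon, s.lat, s.lon)
        if s.free_spots > 0 and d <= max_walk_m:
            candidates.append((d, s))
    candidates.sort(key=lambda t: t[0])
    return [s.name for _, s in candidates]


# Hypothetical sites placed north of a hypothetical destination.
sites = [
    ParkingSite("Site B", 37.9900, -1.1300, 12),  # about 330 m away
    ParkingSite("Site A", 37.9875, -1.1300, 3),   # about 55 m away
    ParkingSite("Site C", 37.9871, -1.1300, 0),   # closest, but full
    ParkingSite("Site D", 38.0000, -1.1300, 40),  # about 1.4 km away, too far
]
ranked = recommend(sites, 37.9870, -1.1300, max_walk_m=500)
```

With these inputs `ranked` is `["Site A", "Site B"]`: the full site and the distant site are filtered out, and the remainder is ordered by distance.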

**Evaluation:** The SmartParking Minimum Viable Product (MVP) is being tested in the City of Murcia, in the south-east of Spain. Prior to this solution, the City of Murcia had devoted research and development efforts, based on IoTCrawler, to incorporating and integrating promising solutions that would address the challenges of working with data from competing parking providers and regulated parking zones. The previous system was inspired by the University of Murcia's participation in the CPAAS.IO project, in which a parking solution was devised using technology derived from the FIWARE ecosystem: FogFlow. The FogFlow-based parking solution used small "edge" devices that were to be installed in different parking locations and were tasked with gathering data and performing local computations (such as aggregation or availability evaluation). This way, the system leveraged edge computing to enable quick and efficient data transfer, while relying on cloud resources for the heavy lifting and for centralised management of the edge workload. This solution already involved the use of NGSI interfaces for data access, which later eased the transition to the next iteration, based on IoTCrawler. Some of the difficulties faced by the FogFlow approach arose at locations that already had online systems in place. These required special interfaces and connectors to be developed in order to adapt the information and make it available to the rest of the system. In some cases, security and privacy were an issue, as providers wanted to control what was shared with the system and when, and furthermore how that information would later be accessed by different parking solutions.

Those gaps have been successfully addressed by the IoTCrawler architecture, which provides a better and broader fit to the parking scenario by introducing security through fine-grained policies that allow us to define how, and by whom, data may be accessed or produced. It also considers different ways in which data can be incorporated into the system, be it directly from NGSI-LD enabled devices connecting to the parking system, through adaptation of other devices or even by integrating entire existing systems through connectors and gateways. SmartParking leverages this security to discriminate which end-users can access certain information. This way, a user could have permission to access specific parking alternatives. In our current implementation, this functionality is only utilised by two fictitious users: "Juan", who has access to private parking, and "Pedro", who has access to both parking and regulated parking zones. In the future, it will allow us to introduce special user roles, such as medical professionals, who would have additional access to parking information for special private parking lots close to their hospital, city officials, who would have access to parking in official buildings, or students, who would have access to parking information on the city campus. Furthermore, the security components of the platform would make it easy to define other flows of information coming from the end-users themselves, beyond the classical star ratings. This could include claiming parking spaces or updating parking availability in zones with no (or poor) sensor information, and it even opens the door to future social/collaborative parking solutions, in which end-users can temporarily offer others their domestic parking space while at work.
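The effect of attribute-based visibility on the two fictitious users can be illustrated with a simplified sketch. It does not reproduce IoTCrawler's actual security components: the policy table, attribute names, entity identifiers and the function `visible_entities` are all hypothetical, and a plain attribute lookup stands in for the platform's policy evaluation.

```python
# Hypothetical policy: category of parking data -> attribute a consumer must hold.
POLICIES = {
    "private_parking": "private-parking-access",
    "regulated_zone": "regulated-zone-access",
}

# Attribute sets mirroring the two fictitious users in the text (assumed encoding).
USERS = {
    "Juan": {"private-parking-access"},
    "Pedro": {"private-parking-access", "regulated-zone-access"},
}

# Hypothetical entities published by data providers.
ENTITIES = [
    {"id": "urn:ngsi-ld:Parking:1", "category": "private_parking"},
    {"id": "urn:ngsi-ld:ParkingZone:7", "category": "regulated_zone"},
]


def visible_entities(user, entities):
    """Return the ids of entities whose policy is satisfied by the user's attributes."""
    attrs = USERS.get(user, set())  # unknown users hold no attributes
    return [e["id"] for e in entities if POLICIES[e["category"]] in attrs]


juan_view = visible_entities("Juan", ENTITIES)
pedro_view = visible_entities("Pedro", ENTITIES)
```

In this sketch `juan_view` contains only the private parking entity, while `pedro_view` contains both entities. Adding a role such as medical professional would amount to a new attribute and a new policy entry, without changing the recommendation logic.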

SmartParking, through IoTCrawler, copes with the diversity of data in the system by using semantic technologies, such as those found in the semantic web. The extensive use of the NGSI-LD standard in IoTCrawler, both for APIs and for data modelling, allows information coming from different parking providers to be represented precisely and enables successful searches over highly diverse data. The previous FogFlow solution had a local scalability strategy based on using edge devices as part of a distributed system. Similarly, the IoTCrawler solution distributes information through distributed MDRs, but it also provides a federation strategy that allows for broader and more diverse architectures, in which existing parking platforms can be integrated into IoTCrawler's framework and federated with other parking systems. This federation capability, paired with the Indexing and Ranking components of IoTCrawler as well as its security components, allows for scaling beyond the local city to higher tiers, such as regional or national levels.

Finally, IoTCrawler integrates monitoring, fault-detection and fault-recovery mechanisms, which provide useful data on the availability and reliability of the parking information contained in the system. These data can be used directly as part of the parking recommendation system with no further development needed. In short, the IoTCrawler approach for the SmartParking solution in Murcia outperforms the previous FogFlow-based solution by far (feature-wise): it accounts for the security of data access, the diversity of data and the integration of existing solutions, while allowing for greater scalability and the flexibility to adapt and adopt new strategies and ideas, making it, in a way, future-proof.

### **7. Conclusions and Future Work**

This paper presents the IoT search framework IoTCrawler, which allows for the search of data sources in the IoT. It features a domain-independent and layered design and provides solutions for crawling, indexing and searching of IoT data sources. Key enablers supporting the search process ensure privacy and security, scalability and reliability.

We started the paper by listing and analysing several issues regarding an IoT search framework to build the basis for our requirements. These requirements have been successfully addressed by the IoTCrawler framework and its components. The loosely coupled components allow for different instantiations of the framework without blocking the search process. The scalability of the discovery and search enablers has also been evaluated to fulfil requirement **R-1**. By adapting well-known ontologies and standards, an information model has been created to ensure a reliable basis for semantic annotation and context provision. This model and the integration of standardised query interfaces enable the framework to be used for machine-initiated search queries (**R-2**).

Requirement **R-3** is addressed by designing the framework in a layered approach, which allows the discovery layer to work independently from the search layer. This enables crawling and discovery of new data sources, continuous semantic enrichment and monitoring of the data sources, and the building of indices to speed up incoming search requests. In addition, it makes it possible to include existing solutions, offers interoperability and overcomes data fragmentation and heterogeneity. As data sources in the IoT are often of a private or restricted nature, security and privacy have to be considered (**R-4**). This requirement is successfully addressed by integrating an extensive security and privacy component into the architecture of the framework from design time onwards.

To showcase the capabilities and applicability of IoTCrawler, two real-world instantiations in different domains have been realised, featuring the search process in a smart home environment and in a Smart City use case. In future work, we plan to roll out the IoTCrawler framework to further use cases covering other domains. This will bring "real" results and show how the framework could increase the benefits gained from the IoT.

**Author Contributions:** Conceptualization, T.I. and M.F.; methodology, T.I., T.E., J.X.P., H.T., J.A.M., P.G.-G. and P.S. (Pavel Smirnov); software, T.I., E.B.I., M.F., P.G.-G., A.G.-V., T.E., P.S. (Pavel Smirnov), J.A.M., S.B., A.F., N.P. and R.R.; validation, M.J.B., P.S. (Parwinder Singh), A.F., R.R. and N.P.; formal analysis, T.I., E.B.I., A.G.-V., T.E., R.R. and N.P.; investigation, E.B.I., R.R., A.F., H.T., A.G.-V., J.A.M. and P.S. (Pavel Smirnov); resources and data curation, M.K. and S.H.C.; writing—original draft preparation, T.I., E.B.I., M.F., T.E., J.X.P., P.S. (Patrik Schneider), H.T., A.G.-V., P.S. (Parwinder Singh), M.J.B., J.A.M., P.G.-G. and P.S. (Pavel Smirnov); writing—review and editing, R.T. and M.S.; visualization, T.I., E.B.I., P.G.-G., A.G.-V., J.A.M., H.T., P.S. (Pavel Smirnov), J.X.P. and T.E.; supervision, M.S. and M.P.; project administration, A.F.S.; funding acquisition, A.F.S. and R.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been funded by the EU Horizon 2020 Research and Innovation program through the IoTCrawler project under grant agreement number 779852.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No new data were created in this study. Data sharing is not applicable to this article. Where available, source of data has been referenced in text.

**Conflicts of Interest:** The authors declare no conflict of interest.
