**1. Introduction**

In recent years, the Internet of Things (IoT) has grown massively, and it continues to grow, driven by the availability of cheap sensor devices and increasingly widespread IoT frameworks, which raise the number of devices and services. This opens up new use cases and scenarios in the IoT (e.g., http://www.ict-citypulse.eu/scenarios/ accessed on 24 February 2021). These scenarios range from agriculture and Industry 4.0 to smart cities and many others. A common problem across all of these domains is the search and discovery of available IoT devices, which is the main purpose of the IoTCrawler framework.

**Citation:** Iggena, T.; Bin Ilyas, E.; Fischer, M.; Tönjes, R.; Elsaleh, T.; Rezvani, R.; Pourshahrokhi, N.; Bischof, S.; Fernbach, A.; Xavier Parreira, J.; et al. IoTCrawler: Challenges and Solutions for Searching the Internet of Things. *Sensors* **2021**, *21*, 1559. https://doi.org/10.3390/s21051559

Academic Editor: Paolo Bellavista

Received: 27 January 2021 Accepted: 18 February 2021 Published: 24 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

To realize the envisaged IoT search framework, a two-layered approach is foreseen, comprising the Discovery and Processing Layer and the Search and Orchestration Layer. The term Discovery refers to the process of connecting new data sources to the framework. This may require an additional step, named Crawling, to extract further information from other databases. Processing refers to actions that ease and enhance the later search. It includes Indexing, i.e., preparing ordered references to discovered data sources for faster access, and Semantic Enrichment (SE), i.e., the deduction of new data describing either higher-level context or the data stream itself. The Search and Orchestration Layer becomes active when a search process is started. Search refers to an application finding suitable data sources in the system and includes a ranking mechanism that orders the results to best fit the specific use case. The orchestration step then enables the application to receive live observations from a data stream.

When designing an IoT search framework, there are several issues to be considered: volume (the amount of data), heterogeneity (different kinds of data sources), dynamics (changes in data sources/environments) and security and privacy (e.g., IoT data sources measuring sensitive data). By analysing these issues, a number of general requirements for an IoT search platform can be derived:


While traditional IoT middleware platforms allow users to search for particular IoT devices, they still require manual interaction to integrate data sources into a use case. As the number of IoT devices has increased profoundly in the last couple of years, many middleware platforms have emerged to offer more flexibility and functionality to IoT solution providers. Platforms like Kaa (https://www.kaaproject.org/ accessed on 24 February 2021) and SiteWhere (https://sitewhere.io/ accessed on 24 February 2021) provide features such as data storage, data analysis and device management, along with tools to analyse infrastructure and optimise computation, or additional functionality such as digital twins in Kaa. MainFlux (https://www.mainflux.com/ accessed on 24 February 2021) and OpenRemote (https://openremote.io/ accessed on 24 February 2021) employ protocol- and device-agnostic strategies to ease the connectivity of devices. Distributed Services Architecture (DSA) (http://iot-dsa.org/ accessed on 24 February 2021) provides solutions for devices to communicate in a decentralised manner. Despite all these different middleware platforms, there is still a lack of search mechanisms that facilitate both Machine-to-Machine (M2M) and Machine-to-Human (M2H) communication.

The main goal of IoTCrawler is to provide tools that answer search queries according to users' preferences, such as sensor type, location and data quality. For better M2M communication, automated context-dependent access is provided based on a machine-initiated semantic search. IoTCrawler also monitors the IoT devices and informs users about changes in data quality and the availability of new relevant sensors, providing flexibility and additional information. Moreover, IoTCrawler envisions a platform that gives any user easy access to open data while also serving private users such as industries and businesses. For this, research has been conducted to implement strategies which ensure that private data stays protected and is only provided to authenticated users.
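A search according to user preferences such as sensor type, location and data quality can be illustrated with a minimal Python sketch. This is not the IoTCrawler implementation: the names `DataSource`, `SourceIndex` and `distance_km`, the quality score in [0, 1], the ranking rule (higher quality first, nearer sources first on ties) and the example coordinates are all assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of a discovered IoT data source."""
    source_id: str
    sensor_type: str
    lat: float
    lon: float
    quality: float  # assumed quality score in [0, 1]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


class SourceIndex:
    """Toy index mapping a sensor type to the sources that offer it."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, source):
        self._by_type[source.sensor_type].append(source)

    def search(self, sensor_type, lat, lon, radius_km, min_quality=0.0):
        """Return matching sources, best quality first, nearer first on ties."""
        hits = []
        for source in self._by_type.get(sensor_type, []):
            dist = distance_km(lat, lon, source.lat, source.lon)
            if dist <= radius_km and source.quality >= min_quality:
                hits.append((source, dist))
        hits.sort(key=lambda pair: (-pair[0].quality, pair[1]))
        return [source for source, _ in hits]


# Illustrative data only; identifiers and coordinates are made up.
index = SourceIndex()
index.add(DataSource("temp-1", "temperature", 52.28, 8.05, 0.90))
index.add(DataSource("temp-2", "temperature", 52.29, 8.06, 0.60))
index.add(DataSource("temp-3", "temperature", 48.14, 11.58, 0.95))
index.add(DataSource("park-1", "parking", 52.28, 8.05, 0.80))

results = index.search("temperature", lat=52.28, lon=8.05, radius_km=10, min_quality=0.5)
result_ids = [source.source_id for source in results]
```

Here the query keeps only the two nearby temperature sources: the distant one is dropped although it has the highest quality score, and the parking sensor never enters the candidate list because the lookup is keyed by sensor type. A real deployment would presumably rank on richer, semantically annotated metadata than this single score.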

This paper provides an overview of the IoT search framework IoTCrawler, which is able to crawl IoT data sources and provides an interface for human- as well as machine-initiated search requests. The IoTCrawler framework consists of a series of loosely coupled components designed to address the identified requirements. The components can be used individually or together as a whole framework to enable fast, stable and secure search for IoT data sources.

The remainder of this paper is organised as follows. Section 2 presents related work regarding the solutions and components of the IoTCrawler framework. Section 3 describes the idea of IoTCrawler as a search framework for data sources in the IoT, while Sections 4 and 5 describe the two-layered approach and present the enablers for the discovery layer and for the search layer in detail, including solutions that address the presented requirements. Section 6 provides an overall evaluation of several IoTCrawler framework instances running for specific use cases in real-world environments. Finally, Section 7 concludes the paper.
