#### 3.2.2. Data Acquisition Layer

Data collected at the first layer (Sensing and Actuation) by the deployed sensors are sent by the RPi gateways, via MQTT and HTTP requests, to the HOLSYS platform, which is deployed on a remote server, as depicted in Figure 4. The HOLSYS platform is based on the community edition of the open-source ThingsBoard IoT platform. The services, packages, and connectors required to acquire data from all the deployed IoT devices are installed and configured on a cluster composed of one high-performance master node and three slave nodes. The nodes are HP computers with Intel Core i3 and i5 processors, each with 4 Gbytes of RAM and 500 Gbytes of storage.

**Figure 4.** Data transfer from IoT nodes to the HOLSYS platform via RPi gateways over MQTT and HTTP.

The HOLSYS acquisition layer is a set of RPis that represent the deployed nodes to the platform. An RPi is a credit-card-sized computer running Raspbian, a free Debian-based operating system optimized for the RPi's limited hardware. Three gateways have been deployed: one RPi 4 B+ with 4 Gbytes of RAM and two RPi 3 B+ with 1 Gbyte of RAM each. Node-RED runs on these RPis to acquire data from the serially connected nodes, pre-process them, and store them locally in files as a backup. Each node is represented to the platform by a unique token and identifier, which are used to secure the communication and to identify the streamed data. HTTP requests serve as a backup transfer protocol in case the MQTT broker becomes unavailable. However, HTTP is less suitable for IoT architectures because it consumes more energy and has a larger packet size. MQTT, in contrast, was designed as a lightweight messaging protocol with small packets for faster transfer. It is an IoT data transfer protocol with a publish/subscribe architecture, and it relies on a broker to which all clients, whether subscribers, publishers, or both, must connect (see Figure 5).
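The MQTT-first, HTTP-fallback behaviour of a gateway can be illustrated with a minimal sketch. It is not the Node-RED flow used in this work: the two transport functions (`mqtt_send`, `http_send`) are hypothetical callables injected by the caller, so the sketch runs without a broker, and the host and token are placeholders. The topic `v1/devices/me/telemetry` and the path `/api/v1/<token>/telemetry` follow the ThingsBoard device API conventions.

```python
import json
from typing import Callable

# ThingsBoard device API convention for telemetry uploads over MQTT.
MQTT_TELEMETRY_TOPIC = "v1/devices/me/telemetry"


def http_telemetry_url(host: str, token: str) -> str:
    """Build the HTTP telemetry endpoint for the device identified by `token`."""
    return f"http://{host}/api/v1/{token}/telemetry"


def send_telemetry(
    reading: dict,
    mqtt_send: Callable[[str, str], None],   # hypothetical: (topic, payload)
    http_send: Callable[[str, str], None],   # hypothetical: (url, payload)
    host: str,
    token: str,
) -> str:
    """Send one reading over MQTT, falling back to HTTP if MQTT fails.

    Returns the name of the transport that delivered the reading.
    """
    payload = json.dumps(reading)
    try:
        mqtt_send(MQTT_TELEMETRY_TOPIC, payload)
        return "mqtt"
    except (ConnectionError, TimeoutError):
        # Broker unreachable: use the heavier HTTP request as a backup path.
        http_send(http_telemetry_url(host, token), payload)
        return "http"
```

Injecting the transports keeps the fallback decision separate from any particular MQTT or HTTP client library, which is a design choice of this sketch rather than of the deployed gateways.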

The installed broker is the central communication point where data are exchanged between clients based on topics. A topic is a category under which a client publishes data; a subscriber, in turn, receives the data published to the topics it has subscribed to.
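The topic-based exchange can be made concrete with a small in-memory stand-in for a broker. This is only a sketch of the publish/subscribe idea: the class name `TinyBroker` and the topic names used with it are hypothetical, and features of real MQTT brokers such as topic wildcards, quality-of-service levels, and retained messages are deliberately omitted.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory stand-in for an MQTT broker (exact topic match only)."""

    def __init__(self):
        # Maps each topic to the callbacks of the clients subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register `callback(topic, payload)` for messages on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver `payload` to every subscriber of `topic`.

        Returns the number of subscribers that received the message.
        """
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)
```

A client that subscribes to, say, `building/room1/co2` receives only the messages published under that topic; a message published under a topic with no subscribers is simply not delivered to anyone.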

#### 3.2.3. Data Processing Layer

Data can be processed in real time or in batches, either on the HOLSYS platform servers or by third-party applications deployed elsewhere. Local processing can be a simple aggregation of each received tuple of data or more complex processing using the rule engine, a customizable and configurable system for complex event processing. The rule engine filters, enriches, and transforms incoming data and triggers various actions, such as notifications or communication with external systems. In the current study, a rule chain has been implemented that filters the received data according to each data source (e.g., temperature, humidity, and CO2 ranges; null values; missing data), saves them to the HOLSYS database (NoSQL Cassandra), and forwards the filtered data, via MQTT and Kafka, to external applications. This forwarding plays a key role in integrating machine learning applications. For instance, the occupancy forecast, an important input to the MPC algorithm, is performed by an external application using the filtered data coming from the platform. Apache Kafka, a high-performance real-time data streaming technology capable of handling large volumes of events, is used as a pipeline to transmit stream data between the platform and other applications (e.g., machine learning algorithms). This part is detailed in the next section. Furthermore, MQTT is used to send data to the consumers in the MATLAB MPC controller, since MATLAB supports it by default as a communication method. Figure 6 depicts the communication between the platform and the processing layer. Kafka and MQTT are the main tools that allow the processing layer to receive data for occupancy forecasting and MPC control. For instance, the forecast model receives the indoor CO2 concentration and the actual occupancy count, over MQTT and Kafka respectively, to forecast 10 steps ahead. The real-time forecast values are fed as input to the MPC controller in order to determine the required control actions, which are sent back to the HOLSYS platform for execution.
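The filtering step of the rule chain can be sketched as a plain function. This is an illustration only, written outside the platform's rule engine: the threshold values in `VALID_RANGES` and the field names are assumptions, since the actual ranges applied per data source are not listed here.

```python
from typing import Optional

# Hypothetical plausibility ranges; the actual thresholds are not specified.
VALID_RANGES = {
    "temperature": (-10.0, 50.0),  # degrees Celsius
    "humidity": (0.0, 100.0),      # percent relative humidity
    "co2": (300.0, 5000.0),        # ppm
}


def filter_reading(reading: dict, required: tuple = ()) -> Optional[dict]:
    """Return `reading` if it passes all checks, otherwise None.

    A reading is rejected when a required field is missing, or when a
    field with a known range is null, non-numeric, or out of range.
    """
    for key in required:
        if key not in reading:
            return None  # missing data
    for key, value in reading.items():
        if key not in VALID_RANGES:
            continue  # fields without a rule pass through unchanged
        if not isinstance(value, (int, float)):
            return None  # null or non-numeric value
        low, high = VALID_RANGES[key]
        if not (low <= value <= high):
            return None  # out of range (also rejects NaN)
    return reading
```

Readings that return `None` would be dropped, while accepted readings would continue to the storage and forwarding steps.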

**Figure 6.** General architecture platform for controlling ventilation system based on occupancy forecast and CO2 measurement.

#### 3.2.4. Data Storage, Visualization, and Applications Layer

This layer comprises all external services that can be connected to the platform by means of the supported data transfer protocols and technologies. Kafka, MQTT, and HTTP requests are the main means by which external third-party applications connect, produce data, and consume the available resources of the HOLSYS platform. In addition to the local NoSQL Cassandra database, which is deployed by default for storing data at the platform level, MongoDB is used to store backup data for batch processing and to serve other applications, such as training the occupancy forecast model. The main reason for using MongoDB is its document-oriented architecture, which stores data in a JSON-like binary format called BSON. Each set of data is stored as a document in a collection, and its content can be unstructured and differ from that of other documents, in line with the NoSQL principle, which does not require a tabular, relational schema. Furthermore, the collected data are broadcast over MQTT on appropriate topics for all other applications, mainly those requiring shared resources, such as weather data. As for data visualization, the Grafana tool is being integrated with the Cassandra and MongoDB databases for real-time visualization of data streams.
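The schema-free document model can be illustrated without a database driver by treating a collection as a list of Python dictionaries. The documents, field names, and values below are hypothetical and serve only to show that documents in one collection need not share the same fields.

```python
import json

# Hypothetical documents; field names and values are illustrative only.
collection = [
    {"node": "room1", "ts": "2021-03-01T10:00:00Z", "co2": 640, "occupancy": 4},
    {"node": "room1", "ts": "2021-03-01T10:05:00Z", "co2": 655},
    {"node": "weather", "ts": "2021-03-01T10:00:00Z",
     "outdoor": {"temperature": 12.3, "humidity": 71}},
]


def training_rows(docs):
    """Keep only documents carrying both fields an occupancy model would need."""
    return [(d["co2"], d["occupancy"]) for d in docs
            if "co2" in d and "occupancy" in d]


# Each document serializes on its own; no shared table schema is required.
serialized = [json.dumps(d) for d in collection]
```

A batch job preparing training data would select only the documents that contain the fields it needs and ignore the rest, as `training_rows` does here.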
