2.2.1. Datasets

Our data were obtained from the HOBOS public dataset [30], with additional measurements provided by *Connecthive* [31]. The HOBOS data originate from Wurzburg and Schwartau in Germany and cover the period from 1 January 2017 to 19 May 2019, while the Connecthive data were collected in Villefranche de Rouergue in France between February and November 2019, as shown in Table 1. This table shows, from left to right, the location, the number of hives, the observation period, and the sampling frequency. The weight of the beehive over time is measured in kg for the hives of Wurzburg and Schwartau, and in grams for the hives of Villefranche de Rouergue. Note that data from the Wurzburg hive are missing between May 2018 and October 2018 because the system was down. It is important to mention that the beehive temperature over time, the beehive humidity over time, and the number of departures from and arrivals to the beehive on a given date are also available in the HOBOS dataset, but they have not been used in the current study.

**Table 1.** Weight data measurement conditions.


#### 2.2.2. Global Collection of Information on Beehives

Several companies on the market provide tools and methods for global data collection from beehives, as shown in Table 2, for example BTS, Arnia [39], ApisProtect [40], Beeehive Scales [41], BuzzTech [42], and Solution-Bee [43]. Moreover, open data sources are also available online, as shown in Table 1. Although hive measurements performed by Arnia [39], for instance, might look similar, Connecthive [31] was chosen over comparable service providers for two reasons. First, open-source data exist on the internet only as fixed raw data that can be neither customized to the needs of our experiments nor extended. Second, the association of a nomadic scale and a fixed scale shown in Section 2.2.3, together with the RFID tag and the automatic weight recording in the database for each single hive, is unique and specific to Connecthive; no other company offers a similar approach. Moreover, the workflow approach, a specific subject of this paper, will be elaborated further in Sections 2.3.2 and 3.3.2.


**Table 2.** Similar solutions comparison.

Connecthive is a company that develops beekeeping data acquisition systems based, on the one hand, on measurements taken by on-board sensors in a few witness hives (so-called richly measured) and, on the other hand, on a nomadic acquisition system associated with a voice input interface on a tablet or smartphone. The nomadic system is moved to all the other hives (said to be weakly measured) during the beekeeper's visits. Weight measurements are smoothed to disregard variations caused by wind. Regarding the support, the hives are placed on a horizontal stand even if the ground is not level. The collected data are fed back in real time or in deferred mode (thanks to an IoT network) to a server that aggregates them. A rich measurement includes, in particular, the following quantities: instantaneous weight of the hive, internal temperature and hygrometry of the swarm, rustling of the swarm, outdoor temperature and hygrometry, wind speed, duration of sunshine, inbound and outbound bee traffic, and level of pollen collection. A weak measurement includes the following quantities: hive weight, swarm size measured by thermography, instantaneous traffic level capture (short videos), and automated varroa count (still-image processing). In addition, observed but non-metrological data supplement this weak measurement, in particular: surface area and type of brood, number of frames occupied, and possible pathological signs.

Although only continuous and discontinuous weight measurements are considered in this study, the integration of other input data will be part of future work.

As for data transmission, GSM, SIGFOX, and 4G are used for communication between the four main layers, i.e., the device layer, operation layer, backend layer, and management layer, as described in Figure 3. In the case of a network outage, the replication mechanism synchronizes the same data among the three layers as soon as the respective networks (GSM, SIGFOX, and 4G) are reachable. The data structures increase in complexity as the layer gets higher. Finally, it is essential to mention that SIGFOX is not available in Lebanon; thus, only GSM and 4G are considered there.

#### 2.2.3. Weight Measurements

As illustrated in Figures 2 and 3, continuous measurements are taken by a stationary scale connected to the cloud, whereas the discontinuous measurements are carried out with a mobile (portable) scale, which is also connected to a network. Beehives weighed with the mobile scale are automatically identified with an RFID tag and the weight data are recorded in the user's smartphone before being consolidated in the cloud.

The discretization of measurable data into events associated with a finite time is the foundation of today's computation theory [46]. Data from observed events are stored after a timestamp is appended to them. This discretization is due to the clock-based nature of computers. However, physical time is continuous, and physical quantities not acquired by the sensors (e.g., physical quantities at time instants different from the sampling times) can still be estimated by interpolation or extrapolation, thus making the collected data virtually continuous. Signal processing techniques usually consider the continuous nature of such measurements and lean towards representing signals as mathematical functions. On the other hand, discrete processing of large amounts of data might require enormous storage resources, e.g., in the case of a large number of sensors and/or data observation spanning a long period of time.
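As a minimal sketch of how sampled data can be made virtually continuous, linear interpolation with NumPy estimates the weight at instants that were never measured; the timestamps and weights below are illustrative, not actual hive data.

```python
import numpy as np

# Known samples: timestamps (hours) and hive weights (kg) at sampling instants.
t_sampled = np.array([0.0, 6.0, 12.0, 18.0, 24.0])
w_sampled = np.array([42.0, 42.3, 43.1, 42.8, 42.5])

# Estimate the weight at instants that were never measured by linearly
# interpolating between the nearest recorded samples.
t_query = np.array([3.0, 9.0, 15.0])
w_est = np.interp(t_query, t_sampled, w_sampled)
print(w_est)  # [42.15 42.7  42.95]
```

The same idea extends to extrapolation or to higher-order interpolation when the underlying signal is known to vary smoothly.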

**Figure 3.** Continuous and discontinuous weight measurements.

Discrete measurements can be considered continuous in case the sampling interval is as small as the sensor's time resolution. For instance, if a sensor is capable of providing a maximum of one sample per second, then acquiring a measurement every second is continuous relative to that sensor's performance. In this project, we exploit this type of continuous measurement, namely instantaneous weight measurements of a reference hive, coupled with discrete measurement samples, approximately a week apart, acquired from all other (non-reference) hives in the apiary.

As mentioned earlier, the proposed system relies on external weight measurements. Because continuously measuring the weight of every beehive is too expensive, the weight of one beehive per apiary is measured continuously and the weight of all the beehives is measured once a week. Therefore, the challenge resides in inferring the continuous weight variations of each beehive from the variations of the continuous measurements. To do so, two types of scale are used: a continuous (stationary) scale and a discontinuous (nomad) scale. More precisely, the discontinuous scale, a mobile sensor, is used to measure different parameters of the hives over a specified region. A fixed device can be associated with this mobile sensor under the control hive, which is later equipped with fixed sensors for weight, temperature, and video analysis. All measurements are stored in the database so that they can be used to control a hive simulator. The constant recording of the weight, temperature, and humidity of the control hive is related to the spot records made on each hive separately using the discontinuous scale. These accumulated data can then faithfully reflect the situation of the colony by giving it a unique profile using artificial intelligence (AI).
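One simple way to combine both scales, shown here only as an illustrative assumption rather than the system's actual inference method, is to offset each hive's last weekly spot measurement by the reference (control) hive's continuous variation since that measurement.

```python
def estimate_weight(spot_weight, spot_time, ref_series, t):
    """Estimate a weakly measured hive's weight at time t by adding the
    reference hive's weight variation since the hive's last spot weighing.

    ref_series maps timestamps to the reference hive's continuous weight (kg).
    This delta-offset rule is a hypothetical simplification for illustration.
    """
    delta_ref = ref_series[t] - ref_series[spot_time]
    return spot_weight + delta_ref

# Reference hive weights (kg) keyed by day index (illustrative values).
ref = {0: 40.0, 1: 40.5, 2: 41.2, 3: 41.0}

# A weakly measured hive weighed 35.0 kg on day 0; estimate its weight on day 2.
print(estimate_weight(35.0, 0, ref, 2))  # ≈ 36.2
```

The assumption behind such a rule is that hives in the same apiary experience similar external influences (nectar flow, weather), so the reference hive's variation approximates that of its neighbors between visits.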

Note that the continuous scale is connected to the IoT via SIGFOX or mobile data networks (2G–3G). It can also operate disconnected; in this case, data are uploaded to the apiarist's smartphone via Bluetooth when the apiarist is nearby. The discontinuous scale is always connected to the apiarist's smartphone. An illustration of the continuous and discontinuous weight measurements is presented in Figure 2.

#### *2.3. Build Time*

We carried out a proof of concept based on the data coming from two apiaries: Wurzburg (center of Germany) and Villefranche de Rouergue (south of France). The results of this proof of concept are not sufficient as apidological results but allow us to design the methodology of our project. After a simultaneous data collection, weight variation patterns will be identified for data labeling and event association. These events will trigger, on the predefined BPMN workflow model, a sequence of automated business rules in response to the hive's events. Weight pattern discovery and the BPMN workflow are discussed in Sections 2.3.1 and 2.3.2.

#### 2.3.1. Weight Patterns Detection and Analysis

The study is divided into four main steps: (1) provision of datasets containing the weight and ambient temperature; (2) loading, cleaning, and smoothing the data; (3) analysis of weight and temperature fluctuations; and (4) validation of the results.

• Data preprocessing

In the first phase, the datasheets containing the recorded weights with their corresponding timestamps are cleaned: all invalid date formats and implausible weight values are eliminated. To enhance the analysis, smoothing is then used to eliminate minor variations in the data and to make the spacing between samples uniform. The smoothing method averages the weight over samples whose time difference falls within the given window.
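The preprocessing step can be sketched with pandas as follows; the column names, validity bounds, and window size are assumptions for illustration, not the study's exact parameters.

```python
import pandas as pd

def preprocess(df, window=3, w_min=0.0, w_max=200.0):
    """Clean and smooth raw scale records (illustrative sketch).

    df has columns 'timestamp' (string) and 'weight' (kg). Rows with
    unparseable dates or implausible weights are dropped; the series is
    resampled to a uniform hourly grid and smoothed with a centered
    running-average window of `window` samples.
    """
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df = df.dropna(subset=["timestamp"])                    # invalid dates
    df = df[(df["weight"] > w_min) & (df["weight"] < w_max)]  # outliers

    series = df.set_index("timestamp")["weight"].resample("1h").mean()
    return series.interpolate().rolling(window, center=True,
                                        min_periods=1).mean()

raw = pd.DataFrame({
    "timestamp": ["2019-03-01 00:00", "bad date", "2019-03-01 01:00",
                  "2019-03-01 02:00", "2019-03-01 03:00"],
    "weight": [42.0, 41.9, 999.0, 42.2, 42.1],
})
smooth = preprocess(raw)
```

Here the "bad date" row and the implausible 999 kg reading are both discarded, and the gap they leave is filled by interpolation before smoothing.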

• Data analysis

The analysis is divided into two parts: a monthly and a daily analysis. In both parts, a linear regression model is fit to find the correlation between time and weight. The coefficient of determination R² and the slope are both used to interpret the variations.
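Fitting such a regression over one segment can be sketched with NumPy; the sample values below are illustrative, not actual hive data.

```python
import numpy as np

# Hours since the start of a monotonic segment and the smoothed weights (kg).
hours = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
weight = np.array([42.0, 42.4, 42.9, 43.3, 43.8, 44.2])

# Least-squares line: slope is the rate of weight change (kg/h).
slope, intercept = np.polyfit(hours, weight, 1)

# Coefficient of determination R² from the correlation coefficient.
r = np.corrcoef(hours, weight)[0, 1]
r_squared = r ** 2

# A clearly positive slope with high R² indicates a steady weight gain,
# e.g., nectar inflow; a negative slope would suggest consumption or losses.
print(round(slope, 3), round(r_squared, 3))  # 0.446 0.999
```

A high R² confirms that the segment is well described by a single linear trend, which justifies interpreting its slope as one event.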

In the monthly study, events are analyzed depending on the weight variations and the current season. The data are resampled to 12 h, and sets of monotonic samples are then analyzed. A linear regression is fit to each set, and the slope is then used to give the appropriate interpretation of the remarkable changes within a single time frame.

In the daily analysis, the parameters considered are the weight, the time of day, and the outside temperature, which is categorized into three ranges. The data are resampled to 1 h and then split into three parts: two inactive periods, from midnight until dawn and from dusk until midnight, and an active period from dawn until dusk. In each part, monotonic segments are fit to a linear regression model, where the slope is used to determine the corresponding events. As a first step, three temperature ranges are identified: too cold (below 12 °C), normal (between 12 °C and 35 °C), and too hot (over 35 °C) [47]. Then, in each range and depending on the time of day, an appropriate interpretation is generated. Finally, all detected events and patterns are displayed on daily and monthly plots that show each continuous weight variation analysis in a specific color along with its corresponding label. The results are validated on independent datasets from France and Germany.
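The daily labeling logic can be sketched as below. The temperature ranges follow the study (below 12 °C, 12–35 °C, over 35 °C [47]); the dawn/dusk hours, slope threshold, and labels are assumptions for the example only.

```python
from datetime import datetime

DAWN, DUSK = 6, 19  # assumed hours; in practice these vary with the season

def temperature_range(temp_c):
    if temp_c < 12.0:
        return "too cold"
    if temp_c > 35.0:
        return "too hot"
    return "normal"

def day_period(ts):
    # Active from dawn until dusk; inactive otherwise.
    return "active" if DAWN <= ts.hour < DUSK else "inactive"

def label(ts, temp_c, slope_kg_per_h, eps=0.01):
    """Toy interpretation rule combining period, temperature, and slope."""
    period = day_period(ts)
    trange = temperature_range(temp_c)
    if period == "active" and trange == "normal" and slope_kg_per_h > eps:
        return "likely foraging gain"
    if slope_kg_per_h < -eps:
        return "weight loss (consumption, robbing, or swarming)"
    return "stable"

print(label(datetime(2019, 6, 1, 10), 22.0, 0.2))  # likely foraging gain
print(label(datetime(2019, 6, 1, 2), 8.0, -0.05))  # weight loss (...)
```

The actual study generates a richer set of interpretations per range and period; the sketch only shows how the three inputs combine into a label.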

The data are smoothed using a running-average window. Noise resulting from beekeeper interventions or system failures is excluded from the dataset. Figure 4 below illustrates examples of detected patterns based on data collected by Connecthive and correlated with the beekeeper's observations of the hive.

**Figure 4.** Detected Patterns.

#### 2.3.2. Business Process Model and Notation (BPMN) & Workflows

Regarding apiculture best practices, two business process models were built. The first model is the result of face-to-face interviews with three different amateur beekeepers in Lebanon. All common practices were validated, and a comprehensive synthesis was built accordingly before building the model itself. The resulting BPMN model for amateur beekeepers consists of around 60 gates. The second model was built with professionals in the domain and is the outcome of many meetings with a leading company in Lebanon, l'Atelier du Miel [48]. To better organize the received information, an Excel file was created to describe the beekeeping process, taking into consideration several parameters such as the hive's status, region, and altitude. In addition, Connecthive [31] also provided us with pseudocode that illustrates some of the beekeeping processes, shown in Figure 5. The resulting BPMN model for professional beekeepers consists of more than 100 gates.

Table 3 illustrates an excerpt from the aforementioned Excel file, the result of many interviews with l'Atelier du Miel [48], correlated with data provided by Connecthive [31], France, for the spring and fall seasons, respectively. The table describes, as mentioned previously, the region, altitude, starting day, season, flower types, and hive's status. Moreover, in the table, we can distinguish two main categories of tasks with their respective actions and countermeasures: regular daily tasks and anomaly detection. Please note that this is only an excerpt of the original file; the information in the table is not exhaustive but is given merely as an example. Finally, the same type of anomalous events provided by Connecthive, shown at the bottom of the table, will be fully discussed in Section 3.3.

In this paper, to avoid complexity, only a simple model is considered, taking into consideration only the hive's weight and the season; it was built from the pseudocode shown in Figure 5. The BPMN model will be shown later in the results section (Section 3.3.2). To perform the modeling, business process model and notation (BPMN, OMG 2013) [49] is deployed. BPMN 2.0 makes available a set of modeling objects allowing any organization (industrial, health, military, etc.) to represent its business processes.
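In the spirit of such a weight-and-season model, the decision logic behind a gate can be sketched as a simple rule function; the thresholds and actions below are hypothetical illustrations, not the actual Connecthive rules of Figure 5.

```python
# Hypothetical rule sketch: each branch corresponds to one decision gate
# in a weight-and-season BPMN model. Thresholds are assumed for the example.
def recommend(season, weight_kg):
    if season == "winter" and weight_kg < 15.0:
        return "feed the colony"          # underweight before/during winter
    if season == "spring" and weight_kg > 40.0:
        return "add a honey super"        # strong nectar inflow
    return "no action"

print(recommend("winter", 12.0))  # feed the colony
print(recommend("spring", 45.0))  # add a honey super
```

In the full BPMN model, each such branch becomes an exclusive gateway whose outgoing flows trigger the corresponding beekeeping task.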

The aims of BPMN are to be "readily understandable by all business users, from the business analysts that create the initial drafts of the processes, to the technical developers responsible for implementing the technology that will perform those processes, and finally, to the businesspeople who will manage and monitor those processes". At the descriptive layer, the proposed modeling objects allow building a process model to understand what an organization does and how it works [50].

**Table 3.** Extract from the beekeeping best practices Excel file.



**Figure 5.** Simple Pseudo Code, provided by Connecthive, France.

At the analytic level, BPMN allows key performance indicators (KPIs) to be deployed in order to analyze processes and improve performance. To this purpose, first, BPMN is easily extensible, so that specific attributes, concepts, or relations can be integrated into a process model and used as analysis entities [51]. Second, the tools currently available for BPMN often integrate modules for process analysis and continuous improvement.

Lastly, at the executable level, elements such as the data used and the actors involved within the process are added to run the process. In the same vein as for the analytic level, the available tools that integrate a workflow engine are able to interpret these elements and execute the processes.

Regarding the last point, the concept of workflow is considered. The notion of workflow emerged in the late 1960s [52] in the context of the evolution of information systems [53]. Several definitions of the workflow concept have been proposed in the literature, including that of the workflow management coalition (WFMC), an organization that defines standards for the development of workflow tools [54,55]. Thus, a workflow consists of automating the flow of information in a process by providing each actor with the information necessary to carry out the activities that make up the process, following procedural rules. This is the reason why the process is modeled first (e.g., using the formal models BPMN/DMN presented hereafter) to identify the parts that can be executed by a workflow engine [56,57] in order to drive the execution of a protocol.
