In this section, we propose our SOC and capacity-based scheduler (SOCCS). It processes event requests and determines the best node in an SBC cluster to run them based on the remaining battery estimations and the CPU usage of the nodes. The proposed scheduler comprises three main elements: the SOC estimator, the monitor and the scheduler. The relations among these blocks, their communication with the orchestrator (represented by blue lines) and the communication between the orchestrator and the cluster elements (represented by green lines) are depicted in Figure 2. Note that in this representation, the set of nodes where events can be scheduled ($N$) comprises the controller node and the computing nodes. The functions of each block in the scheduling module are explained in the following subsections.
4.1. SOC Estimator Block
This element acts as an agent, since it runs in every node of the SBC cluster. Its main functions are to receive the necessary information from the measurement equipment and to calculate the SOC using the coulomb counting method [20].
The SOC estimation can depend on certain characteristics of the batteries (e.g., state of health and model), according to the estimation model used [27]. Such factors are related to the battery hardware; thus, the estimators have to be updated whenever the hardware changes. This hardware dependence is less pronounced in the coulomb counting method, as it monitors the total electric charge that a battery absorbs or releases during its charging or discharging phases.
The SOC can be estimated by dividing the electric charge released by the battery by the charge that entered it. Denoting the capacity released when the battery is completely discharged as $Q_{\text{released}}$ and the rated capacity as $Q_{\text{rated}}$, the SOC percentage can be obtained as follows:

$$\mathit{SOC} = \frac{Q_{\text{released}}}{Q_{\text{rated}}} \times 100\% \quad (4)$$
The proposed estimator block adopts the coulomb counting method because of its simple yet accurate approach. To obtain more accurate results, the denominator is obtained by considering the actual electric charge that the battery can deliver over several charge-discharge cycles. Following a methodology similar to the one proposed in [20], we can find the coulombic efficiency ($\eta$) of the rated capacity and use the maximum releasable capacity ($Q_{\max} = \eta\, Q_{\text{rated}}$). Thus, Equation (4) is adjusted to:

$$\mathit{SOC} = \frac{Q_{\text{released}}}{Q_{\max}} \times 100\% \quad (5)$$
When the battery is fully charged, the SOC is given by Equation (5). However, during a discharging phase, we need to know the percentage of the capacity relative to the $Q_{\max}$ term, denoted as the depth of discharge ($\mathit{DOD}$). The $\mathit{DOD}$ is obtained from the measured charging and discharging current ($I(t)$) over an operating period $\tau$ and then subtracted from the total SOC, as shown in the following equations:

$$\Delta \mathit{DOD} = \frac{\int_{t_0}^{t_0+\tau} I(t)\,dt}{Q_{\max}} \times 100\% \quad (6)$$

$$\mathit{DOD}(t) = \mathit{DOD}(t_0) + \Delta \mathit{DOD} \quad (7)$$

$$\mathit{SOC}(t) = 100\% - \mathit{DOD}(t) \quad (8)$$
The $\mathit{DOD}$ is an accumulated value, as shown in Equation (7), so the battery SOC can be estimated through Equation (8) at any time. The estimation process is based on the measured voltage and current: the SOC estimator block knows the battery operation mode from the value and direction of the operating current. During the discharging phase, the $\mathit{DOD}$ adds up the drained charge until reaching the $Q_{\max}$ value, at which point the battery is exhausted (i.e., $\mathit{SOC} = 0\%$). Conversely, during the charging phase, the $\mathit{DOD}$ counts down the accumulated charge until the battery is fully charged (i.e., $\mathit{SOC} = 100\%$).
The SOC estimator block uses the aforementioned procedure to estimate the battery SOC in each node where it runs. Finally, it sends the estimated value to the monitor block.
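To make the estimator concrete, the following Python sketch accumulates the depth of discharge from current measurements in the spirit of Equations (6)–(8). The class name, units and the convention that a positive current denotes discharging are our assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class CoulombCounter:
    """Minimal coulomb-counting SOC estimator (illustrative sketch).

    q_max_ah: maximum releasable capacity in ampere-hours, assumed known
    from the battery datasheet and a coulombic-efficiency calibration.
    """
    q_max_ah: float
    dod_percent: float = 0.0  # accumulated depth of discharge; 0 % when full

    def update(self, current_a: float, dt_s: float) -> float:
        """Integrate the measured current over an operating period dt_s.

        Positive current means discharging (DOD grows), negative means
        charging (DOD shrinks). Returns the estimated SOC in percent.
        """
        delta_dod = (current_a * dt_s / 3600.0) / self.q_max_ah * 100.0
        self.dod_percent = min(100.0, max(0.0, self.dod_percent + delta_dod))
        return self.soc()

    def soc(self) -> float:
        # SOC(t) = 100 % - DOD(t), as in Equation (8)
        return 100.0 - self.dod_percent
```

For a 2 Ah battery, draining 1 A for one hour releases 1 Ah, i.e., half the capacity, so the estimate drops from 100 % to 50 %.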
4.2. Monitor Block
This module is responsible for monitoring all the virtual nodes that have been created and assigned an event request by the scheduling algorithm described in
Section 4.3. It also monitors the status and usage of the physical nodes by communicating with the orchestrator. Specifically, the monitor module tracks the CPU and memory consumption of each node, as summarised in Procedure 1.
Procedure 1: Update Nodes.
This procedure updates the resource utilization of each node within the SBC cluster. It determines a node's utilization by aggregating the CPU and memory usage of its virtual nodes (line 2). Since all cluster nodes are candidates to host a newly created virtual node by default, the procedure checks whether a node's utilization has reached its defined maximum capacity (line 3). If so, the node's status is marked as unscheduled, and it is excluded from the candidate selection process in the scheduling algorithm (line 4). Line 5 checks the opposite condition, i.e., that the node's usage is below its maximum value; the node's status is then set back to scheduled in line 6 if it was previously marked as unscheduled. The update nodes procedure is used by the monitor block, whose behavior is described in Algorithm 1.
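A minimal Python sketch of Procedure 1, assuming each node is represented as a plain dictionary (in the real system these values come from the orchestrator):

```python
def update_nodes(nodes):
    """Sketch of Procedure 1: refresh each node's utilization and status.

    `nodes` is assumed to be a list of dicts, each with a `virtual_nodes`
    list (carrying `cpu` and `mem` demands), a `capacity` dict and a
    `status` flag.
    """
    for node in nodes:
        # Line 2: a node's usage is the sum over its hosted virtual nodes.
        node["cpu_used"] = sum(v["cpu"] for v in node["virtual_nodes"])
        node["mem_used"] = sum(v["mem"] for v in node["virtual_nodes"])
        over = (node["cpu_used"] >= node["capacity"]["cpu"]
                or node["mem_used"] >= node["capacity"]["mem"])
        # Lines 3-6: exclude saturated nodes, re-admit recovered ones.
        node["status"] = "unscheduled" if over else "scheduled"
    return nodes
```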
The monitor block’s procedure begins by initializing two parameters (lines 1–2) that are maintained during the whole lifetime of the system (line 3). It checks several parameters for all the created virtual nodes to verify whether certain conditions have been satisfied. If a virtual node is running, the monitor block gathers its CPU and memory usage from a metrics server (e.g., Prometheus) and records these values (lines 5–7). Otherwise, the event is determined to be in one of two possible states: succeeded or failed. If an event has completed its execution, the virtual node where it was running is marked as succeeded; if it completed past its deadline (line 9), the algorithm updates the number of deadline violations in line 10. The other state corresponds to failed virtual nodes (line 11); when this condition is satisfied, the monitor updates the number of rejected events in line 12. Afterwards, the algorithm releases the used resources and updates the respective parameters (lines 13–15). Finally, it calls the update nodes procedure in line 16. The rejected events and deadline violations parameters are later used as evaluation metrics of the proposed scheduler and are analyzed in detail in Section 5.
Finally, the monitoring block receives the battery SOC information sent by the SOC estimator block in a process that runs in parallel to Algorithm 1. The received SOC values are stored together with the node metrics; as a result, the scheduling block can obtain the utilization of a node in terms of CPU usage, memory usage and SOC by reading these stored values.
Algorithm 1: Monitor Process.
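The bookkeeping step of Algorithm 1 can be sketched in Python as follows; the dictionary layout and state names are illustrative assumptions, and the metric-recording step for running nodes is omitted:

```python
def monitor_pass(virtual_nodes, now):
    """One pass of the monitor loop (Algorithm 1), sketched.

    Each virtual node is a dict with `state` in {"running", "succeeded",
    "failed"} and a `deadline`. Returns the two counters the paper uses
    as evaluation metrics.
    """
    deadline_violations = 0
    rejected_events = 0
    finished = []
    for v in virtual_nodes:
        if v["state"] == "running":
            continue  # lines 5-7: record CPU/memory usage (omitted here)
        if v["state"] == "succeeded" and now > v["deadline"]:
            deadline_violations += 1      # lines 9-10
        elif v["state"] == "failed":
            rejected_events += 1          # lines 11-12
        finished.append(v)                # lines 13-15: release resources
    for v in finished:
        virtual_nodes.remove(v)
    return deadline_violations, rejected_events
```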
4.3. Scheduler Block
This module determines the best node where an event can run according to the SOC prediction. The scheduler block receives the event requests one after another and appends them to a priority queue. At the same time, it takes the events from the priority queue one by one and determines the node where each event will run based on the remaining battery estimations and CPU usage. The SOC prediction is obtained through a regression model [28] that is explained in Section 4.3.1, and Procedure 2 summarises the SOC prediction methodology used by Algorithm 2.
Procedure 2: SOC Prediction.
Procedure 2 uses a trained prediction model to forecast the SOC value that a node would have if a specific virtual node running an event were assigned to it. The procedure takes as inputs the current node and the virtual node to be scheduled. Then, it extracts and determines the information required by the SOC regression model to predict the SOC value. The first step is to initialize the output variable and create an empty set to store the data used by the prediction model (line 1). Next, the procedure checks whether the current node is the controller node, because the data used by the model depends on the node's type (line 2). In Section 4.3.1, we explain why we make this differentiation in the model's input data. Lines 3–6 determine and store the values required by the model in the case of the controller node. In line 3, the expected CPU usage of the node, if the virtual node were deployed to it, is calculated by adding the current CPU usage and the required CPU of the event running in the virtual node. Lines 4 and 5 determine the expected overall number of packets exchanged between the controller and the computing nodes once the virtual node is scheduled. The obtained values are then stored in the data set (line 6). In the case of computing nodes, the procedure only factors in the expected CPU usage and saves this value in the data set (lines 7–9). Finally, the prediction is obtained from the SOC regression model considering the stored data (line 10).
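The feature-building logic of Procedure 2 can be sketched as below. The dictionary keys, the feature layout and the scikit-learn-style `predict` interface are our assumptions based on the description above:

```python
def soc_prediction(node, vnode, model, is_controller):
    """Sketch of Procedure 2: build the model input and predict the SOC.

    `model` is any object exposing a `predict(rows)` method; the feature
    layout (expected CPU, plus packet counts only for the controller
    node) follows the paper's description but is otherwise illustrative.
    """
    expected_cpu = node["cpu_used"] + vnode["cpu"]           # line 3
    if is_controller:
        # Lines 4-5: expected packets exchanged with the computing nodes.
        tx = node["pkts_tx"] + vnode["pkts_tx"]
        rx = node["pkts_rx"] + vnode["pkts_rx"]
        data = [expected_cpu, tx, rx]                        # line 6
    else:
        data = [expected_cpu]                                # lines 7-9
    return model.predict([data])[0]                          # line 10
```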
Algorithm 2 is executed during the whole lifetime of the scheduling application while there are elements in the priority queue (line 1). Before analyzing any possible node to assign an event, the algorithm obtains the nodes' capacities from the monitor block (line 2). After gathering the usage of the nodes, the algorithm takes the first element of the priority queue and initializes the list of possible candidates to host the new virtual node (lines 3–5). The purpose of the following steps is to define the potential candidate SBCs where virtual nodes can be placed. Each physical node with a battery percentage above a predefined minimum and a scheduled status is added to the candidate list (lines 6–8). After that, the algorithm removes the controller node from the list if it is present and there is at least one computing node available (lines 9–10). Thus, the algorithm increases the controller's longevity and avoids the extra processing associated with deploying an event on it.
If the previous condition is not met (line 9), we analyze two possibilities with the same outcome (line 11). The first corresponds to the case when no node can host a virtual node. The second is more involved: the only available node is the controller, and the event to schedule would run during the whole lifetime of the system. In both cases, the associated event (i.e., service or task) is rejected in lines 12–22, and the rejected events metric is updated. Accordingly, the virtual node created for that event is removed to release its resources, and the priority queue and the related parameters are updated. In the case of a service event, the algorithm checks each of its constituent network functions: if a function has already been deployed, it is deleted to release the associated resources, and if a function is still in the priority queue, it is also removed so that it is not considered by the scheduler, saving further resources. When neither of the aforementioned conditions (lines 9 and 11) holds, we have at least one node in the candidate list (line 23). Notice that the controller node can remain in the candidate list when there are no more available computing nodes and the events to be placed have a specified running time. In this way, the proposed scheduler reduces the number of rejected events, as demonstrated in Section 5.
The process of Algorithm 2 continues, and in the next step the first element of the candidate list is taken as the best node (line 24). The remaining candidates are then analyzed to determine whether any of them has a better score than the current best node (lines 25–32). In line 26, the algorithm verifies whether the SOC predictor model exists. If it does, the predictor model calculates the SOC of the current candidate node using Procedure 2 (line 27). The output of the SOC prediction procedure is used in Algorithm 2 to calculate the node score through Equation (9) (line 28). With this equation, the algorithm tries to maximize the node score by selecting the node with the highest SOC and the minimum CPU usage.
In Equation (9), $w$ is an adjustable positive weight with values between 0 and 1. The SOC term can correspond either to a predicted value (i.e., using the SOC regression model) or to a real measured value from the SOC estimator block. The expected CPU usage represents the new CPU usage that the analyzed node would have if a virtual node running an event were scheduled to it; this value is calculated by adding the current CPU usage of the node and the required CPU usage of the event. When the SOC predictor model is not available, the steps are similar to those previously described, but the node score is calculated using the node's current SOC (lines 29–30). After calculating the score, Algorithm 2 checks whether this value is higher than the best node score (line 31); if it is, the current node is taken as the new best node (line 32). Finally, the scheduler block communicates with the orchestrator to bind the virtual node to the selected node (line 33).
Algorithm 2: Event Scheduler.
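The scoring and best-node selection of lines 24–32 can be sketched as below. The exact functional form of the score is our assumption: it rewards a high SOC and a low expected CPU usage with a weight $w$ in [0, 1], as the text describes, but it is not the paper's Equation (9) verbatim:

```python
def node_score(soc_percent, expected_cpu_percent, w=0.5):
    """Assumed candidate score: favour high SOC and low expected CPU.

    `w` in [0, 1] balances battery state against load; both inputs are
    percentages in [0, 100].
    """
    return w * soc_percent + (1.0 - w) * (100.0 - expected_cpu_percent)

def pick_best_node(candidates, w=0.5):
    """Return the candidate dict with the highest score (lines 24-32)."""
    return max(candidates,
               key=lambda n: node_score(n["soc"], n["expected_cpu"], w))
```

A usage example: with $w = 0.5$, a node at 90 % SOC and 20 % expected CPU beats a node at 50 % SOC and 10 % expected CPU.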
Algorithm 3 represents the main process of the scheduler application, which initializes the other processes, i.e., the monitor process, the event scheduler and the regression model handler (see Section 4.3.1) (lines 5–7). The algorithm begins by initializing the set of virtual nodes, which can be updated by both the scheduler and the monitor block, and the event lists (lines 1–3). Additionally, it initializes the SOC prediction model used in Algorithm 2 and the data set used to train the model in Algorithm 4 (line 4); the latter is updated in each cycle of Algorithm 1. Algorithm 3 waits for any event request while the scheduling application is running (line 8). When a request arrives, the algorithm adds it to the corresponding list in line 9. Then, it creates a virtual node with the requirements for the event and adds it to the set of virtual nodes (line 10). After the creation of the virtual node, the event's ranking in the priority queue is determined (lines 11–13) by a process that considers two factors: the slack time and the waiting time. The former represents the amount of time that the scheduler can delay the execution of an event without missing its deadline (see Equation (10)). The latter is the waiting time of the event before being processed by the scheduler (see Equation (11)). Both equations involve the current system time. Note that the smaller the slack time, the sooner the created virtual node must be executed.
Based on the previous definitions, we calculate the ranking score for the virtual node where an event runs by combining the two factors through an adjustable positive weight with values between 0 and 1 (Equation (12)). A virtual node with the lowest ranking must be executed first. Thus, the algorithm updates the priority list and sorts the queue taking into account the calculated ranking of the virtual node (lines 14–15).
Algorithm 3: Main Process.
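The ranking step can be sketched with a binary heap. The combination of slack and waiting time below is an assumed form of Equation (12), chosen so that a smaller value means "schedule sooner"; the slack and waiting-time expressions follow the verbal definitions of Equations (10) and (11):

```python
import heapq

def ranking(slack, wait, w=0.5):
    """Assumed ranking score: smaller means execute sooner.

    Tight deadlines (small slack) and long waits both push an event
    toward the front of the queue; `w` in [0, 1] balances the two.
    """
    return w * slack - (1.0 - w) * wait

def push_event(queue, vnode_id, deadline, exec_time, arrival, now, w=0.5):
    slack = deadline - now - exec_time   # Equation (10): tolerable delay
    wait = now - arrival                 # Equation (11): time spent waiting
    heapq.heappush(queue, (ranking(slack, wait, w), vnode_id))
```

For example, an event whose deadline is nearly due is popped before one with ample slack, even if both arrived at the same time.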
4.3.1. SOC Regression Model
In contrast to the methodology described in Section 4.1, which falls under the umbrella of direct calculation and model-based methods, data-driven methods do not require an equivalent circuit or an electrochemical model to describe battery behavior. Instead, they estimate the battery SOC from sampled data by finding a relation between the data and the SOC measurements. Methods of this kind include the autoregressive moving average (ARMA), artificial neural networks (ANN), support vector regression (SVR) and others [29]. These methods can impose a large computational burden when the training data is large. Additionally, they must be trained in an initial state before their hyper-parameters can be adjusted. Thus, these methods might not be feasible for use cases where a battery-powered SBC cluster must also process service and task requests.
In the case of regression models, the model coefficients are determined from the available training data by minimizing the root mean square error (RMSE) between the predicted and real values. The RMSE represents the standard deviation of the prediction errors, thus showing how concentrated the data is around the line of best fit [28]. In general, regression models can be classified into two types: polynomial and linear regression models. The former may include higher powers of one or more predictor variables, as in Equation (13):

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_k x^k + \epsilon \quad (13)$$

The latter may include the interaction effects of two or more variables; an example of a multiple linear regression model is given in Equation (14) [28]:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon \quad (14)$$
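As an illustration of how such coefficients are obtained, the following sketch fits the simplest member of this family, a one-variable linear model, by ordinary least squares and computes the RMSE used as the update criterion. This is a pedagogical example, not the paper's multi-variable model:

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

def rmse(ys, preds):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
```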
In this paper, we adopt a regression model for SOC estimation, since the limited computational capabilities of SBCs might not support complex algorithms such as ARMA, ANN and SVR. The regression model is trained with an initial dataset formed by metrics collected during a defined period. To improve its accuracy, the model is updated when the RMSE metric rises above a pre-defined threshold. The model coefficients are calculated and updated through Algorithm 4.
Algorithm 4: Regression Model Handler.
This algorithm checks the SOC regression model after waiting a pre-defined time (line 2). It first verifies whether the regression model has not been trained yet (line 3); in that case, the model is trained with the existing training dataset, and the dataset is re-initialized to gather new data for future model examinations (lines 4–5). If a model already exists (line 6), the algorithm studies the model's coefficients to determine whether they must be updated (lines 7–11). Using the gathered data, the algorithm makes several SOC predictions with the trained model (line 7) and determines the RMSE between these predictions and the measured SOC values (line 8). In line 9, the calculated RMSE is compared against a pre-defined threshold; if it is above the threshold, Algorithm 4 updates the SOC regression model using the existing dataset (line 10). After this process, the training dataset is restarted, and newly gathered data is added through the monitoring process (line 11).