Figure 1.
Coordination model.
Figure 2.
Gossip pattern: a combination of the aggregation and spreading patterns (in the red dotted box), inspired by bio-inspired design patterns [36]. The bottom layer, called basic patterns, contains the four elementary patterns implemented in the current version of the SAPERE-derived middleware, in the form of coordination mechanisms.
Figure 3.
General diagram of a topology with instances of the coordination platform communicating with each other.
Figure 5.
Using coordination mechanisms to implement a gossip ensemble learning approach. This figure illustrates the use of different types of learning models depending on the node. In this approach, each node manages distribution and aggregation of predictions.
Figure 6.
The interface “” defined in the coordination middleware.
Figure 7.
Aggregation of 3 different properties: TOTAL, MODEL (node model), and CL_MODEL (cluster model).
Figure 8.
Application of a single aggregation specification (contained in the synthetic property “AGGREGATION” of the local LSA). This processing involves three steps: (step 1) it retrieves the objects to be aggregated from the LSAs received as input and from the local LSA (the value of the property in each LSA); (step 2) it executes the aggregation method defined in the corresponding class; and (step 3) it stores the result in the “aggregatedValue” attribute of this property.
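The three steps of Figure 8 can be summarised by the following minimal Python sketch. All names (Lsa, Property, AggregationSpec, apply_aggregation) are hypothetical placeholders and do not correspond to the actual middleware classes; the sketch only illustrates the flow described in the caption.

```python
# Hypothetical sketch of the three steps of Figure 8 (names are illustrative,
# not the middleware's actual API).
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Property:
    value: Any = None
    aggregated_value: Any = None          # filled in by step 3

@dataclass
class Lsa:
    properties: Dict[str, Property] = field(default_factory=dict)

@dataclass
class AggregationSpec:
    property_name: str                     # property holding the objects to aggregate
    operator: Callable[[List[Any]], Any]   # aggregation function (e.g. sum, weighted average)

def apply_aggregation(spec: AggregationSpec, local: Lsa, received: List[Lsa]) -> None:
    # Step 1: retrieve the objects to aggregate from the received LSAs and from the local LSA.
    values = [lsa.properties[spec.property_name].value
              for lsa in [local, *received]
              if spec.property_name in lsa.properties]
    # Step 2: execute the aggregation operator defined by the specification.
    result = spec.operator(values)
    # Step 3: store the result in the "aggregatedValue" attribute of the local property.
    local.properties[spec.property_name].aggregated_value = result
```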
Figure 10.
Accuracies of node predictions obtained using the Markov chains model, for each aggregator.
Figure 11.
Accuracies of node predictions obtained by the LSTM model, for each aggregator used. Averages obtained for each model appear as dashed lines.
Figure 12.
Accuracies of cluster predictions obtained by the two models (Markov chains and LSTM) and for each aggregator used. Averages obtained for each model appear as dashed lines.
Figure 13.
Comparison of the accuracies obtained on cluster-level predictions for (1) each approach (GEL, GFDL with the Markov chains model, and GFDL with the LSTM model), and (2) each aggregator used in both the GEL and GFDL approaches (sampling_nb, dist_power_hist, and no aggregator). Averages obtained for each approach appear as dashed lines.
Figure 14.
The same comparisons obtained on node-level predictions: by approach (GEL, GFDL with MC, GFDL with LSTM) and by aggregator. Averages obtained for each approach appear as dashed lines.
Figure 15.
Accuracies obtained on cluster-level predictions with ensemble learning (Markov chains and LSTM) and for each aggregator used.
Table 1.
Comparison of centralised and decentralised federated learning.
| | Centralised Federated Learning | Decentralised Federated Learning |
|---|---|---|
| Model training | Executed by the nodes | Executed by the nodes |
| Constraints regarding models | All nodes must have a similar type of model and similar behaviours to predict | All nodes must have a similar type of model and similar behaviours to predict |
| Model distribution | Managed by the central server | Managed by the nodes |
| Risk of network overhead | Important around the central server (all the nodes exchange data with it). | Can be reduced, depending on the density of the network graph (each node exchanges data only with its direct neighbours). |
| Confidentiality of the model dataset | Yes (nodes only exchange model weights) | Yes (nodes only exchange model weights) |
Table 2.
Role of each component in the coordination model.
| Component | Role | Layer |
|---|---|---|
| Digital twin | Intelligent agent that represents a real device and interacts with other digital twins via the coordination platform to perform specific tasks (exchanging energy, training a learning model and generating predictions). | Intermediary (between the end user and the middleware layer). |
| Coordination platform | System that executes the coordination model within a node device. It interacts with the coordination platforms located on other nodes through the spreading coordination law. | Lower (coordination middleware). |
| Tuple space | Virtual environment shared between the different digital twins, on which the coordination laws are executed. | Lower (coordination middleware). |
| Coordination law | Bio-inspired mechanism operating in the tuple space, which transforms or diffuses the data exchanged by digital twins. The gossip mechanism, used to implement the GFDL and GEL approaches, belongs to this category and combines the elementary aggregation and spreading mechanisms. | Lower (coordination middleware). |
| LSA | Tuple structure containing the properties of a digital twin. It is submitted by a digital twin, transformed and disseminated by the coordination laws, then retrieved by other digital twins. | Lower (coordination middleware). |
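As a reading aid for Table 2, the sketch below shows how these components could relate to each other in code. The class names and method signatures (TupleSpace, CoordinationPlatform, DigitalTwin, submit, step) are assumptions made for illustration, not the interfaces of the SAPERE-derived middleware.

```python
# Illustrative (assumed) relationships between the components of Table 2.
from typing import Callable, Dict, List

class TupleSpace:
    """Shared virtual environment holding the LSAs of the local node."""
    def __init__(self) -> None:
        self.lsas: List[Dict] = []

    def submit(self, lsa: Dict) -> None:
        self.lsas.append(lsa)

class CoordinationPlatform:
    """Executes the coordination laws on the local tuple space."""
    def __init__(self, space: TupleSpace, laws: List[Callable[[TupleSpace], None]]) -> None:
        self.space = space
        self.laws = laws          # e.g. aggregation, spreading

    def step(self) -> None:
        for law in self.laws:
            law(self.space)

class DigitalTwin:
    """Agent representing a real device; it interacts only through LSAs."""
    def __init__(self, name: str, space: TupleSpace) -> None:
        self.name = name
        self.space = space

    def publish_state(self, state: Dict) -> None:
        self.space.submit({"owner": self.name, **state})
```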
Table 3.
The 6 electrical power variables that characterise the state of the node.
| Variable | Description |
|---|---|
| requested | total power of demands |
| produced | total generated power |
| consumed | total power of satisfied demands |
| supplied | total generated power used for supplies |
| missing | total power of not satisfied demands |
| available | total generated power not used for supplies |
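For illustration, the six variables of Table 3 can be grouped into a simple record. The class name and the consistency check below are assumptions (they follow from reading the definitions as a partition of demands and of generation), not part of the platform.

```python
# Hypothetical container for the node state of Table 3.
from dataclasses import dataclass

@dataclass
class NodePowerState:
    requested: float   # total power of demands
    produced: float    # total generated power
    consumed: float    # total power of satisfied demands
    supplied: float    # total generated power used for supplies
    missing: float     # total power of not satisfied demands
    available: float   # total generated power not used for supplies

    def is_consistent(self, tol: float = 1e-6) -> bool:
        # If the definitions partition demands and generation, then
        # requested = consumed + missing and produced = supplied + available.
        return (abs(self.requested - (self.consumed + self.missing)) < tol
                and abs(self.produced - (self.supplied + self.available)) < tol)
```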
Table 4.
Synthesis of distributed learning approaches (centralised versus decentralised and the 3 types of data exchanged). The two approaches we are implementing are highlighted in bold.
| Type of Shared Data → | Training Dataset | Model Weights | Predictions |
|---|---|---|---|
| Centralised distribution | Centralised Learning | Federated Learning (FDL) | Ensemble Learning (EL) |
| Decentralised distribution (Gossip) | Not implemented | **Gossip Federated Learning (GFDL)** | **Gossip Ensemble Learning (GEL)** |
Table 5.
Similarities between GFDL and GEL.
| | Gossip Federated Learning and Gossip Ensemble Learning |
|---|---|
| General mechanism | Both use the generic gossip mechanism of the coordination platform. |
| Type of distributed learning | Both are decentralised approaches. |
| Dataset confidentiality | Both guarantee confidentiality, as datasets are not disseminated. |
| Requirements for predicted behaviour | In both cases, the cluster must contain nodes with similar behaviours, so that the exchange of knowledge about these behaviours is beneficial. |
Table 6.
Differences between GFDL and GEL.
| | Gossip Federated Learning | Gossip Ensemble Learning |
|---|---|---|
| Data shared | Learning model weights | Predictions |
| Requirements for model structures | All the learning models disseminated must have the same structure. | No requirement. |
| Possibilities for implementing aggregation operators | A wider variety of aggregation functions can be defined on learning models. This offers more possibilities to increase the gains in accuracy provided by aggregation. | Much more limited possibilities, as prediction objects have simpler structures and fewer functions. |
| Implementation effort required to integrate a new type of model | Implement and test new aggregation functions for the new type of learning model. | Nothing to be implemented. |
| Memory and bandwidth consumption | High or very high, depending on the structure of the learning model (number and size of layers, matrices). This may require a significant reduction in the frequency of gossip. | Limited risk of network or memory overhead, as the prediction results take up much less memory. |
| Sensitivity to an aggregation operator not suited to the situation | The impact is less immediate, as it concerns the model weights and not necessarily the classification results. | The impact is direct, because the aggregation coefficients are applied directly to the prediction results. |
Table 7.
LSA properties aggregated in Figure 7.
| Variable | Class | Operator | Description |
|---|---|---|---|
| TOTAL | NodeTotal | sum | The sum operator defined in the class consists of summing each power variable contained in the class, over the set of nodes. It returns a class instance which represents the whole cluster of nodes. |
| MODEL | | sampling_nb | The sampling_nb operator defined in the class calculates a weighted average of the various LSTM models, using a weighting proportional to the number of samples in each LSTM model. |
| CL_MODEL | | power_loss | The power_loss operator defined in the class calculates a weighted average of the different Markov chain models, using a weighting proportional to the model’s accuracy evaluated with a local test dataset. |
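As an example of the operators listed in Table 7, the sketch below implements a sampling_nb-style weighted average: each model receives a coefficient proportional to its number of training samples. The flat weight-vector representation and the function name are assumptions made for illustration, not the actual operator implementation.

```python
# Assumed sketch of a sampling_nb-style aggregation: weighted average of model
# weights, with coefficients proportional to each model's number of samples.
from typing import List, Sequence

def sampling_nb_aggregate(weight_vectors: List[Sequence[float]],
                          sample_counts: List[int]) -> List[float]:
    total = sum(sample_counts)
    coeffs = [n / total for n in sample_counts]     # normalised: the coefficients sum to 1
    length = len(weight_vectors[0])
    return [sum(c * w[i] for c, w in zip(coeffs, weight_vectors))
            for i in range(length)]

# Example: two models trained on 300 and 100 samples get coefficients 0.75 and 0.25.
aggregated = sampling_nb_aggregate([[0.2, 0.4], [0.6, 0.0]], [300, 100])  # -> [0.3, 0.3]
```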
Table 8.
Main properties of a learning model instance.
| Property | Description |
|---|---|
| Model configuration | Contains the model type, perimeter (node or cluster), aggregator, incremental learning frequency, aggregation frequency, etc. |
| Time of last weight update | This information allows the model to check whether any changes have been made recently, and whether it is therefore necessary to send the model content to its LSA for spreading (see step 11 in Algorithm 1). |
| Test dataset | Recent history of the power values and corresponding states (classification), for each of the 6 powers to be predicted. This data constitutes a test set for evaluating models from different neighbouring nodes, as it provides the states observed, which will then be compared with the states predicted by the model to be evaluated, as part of the aggregation of the different models received. |
| Aggregation weights | Table of weighted coefficients assigned to each neighbouring node model (including the local node) in the last aggregation. For each variable, the sum of the coefficients linked to each node is equal to 1, and each weight (also called coefficient) indicates the relative importance attributed to the sub-model of a node k during this aggregation calculation. |
Table 9.
Main services implemented by a learning model instance.
| Service | Description |
|---|---|
| Model training | Incremental training, including the latest dataset updates. |
| Prediction | Single prediction calculation at a given date-time and horizon. |
| Series of predictions | Set of predictions generated for a given set of date-times and horizons. This returns a list of prediction results for each date-time and horizon requested. |
| Model compaction | Returns the compacted format to be submitted in the LSA (to reduce communication overhead). |
| Weights copy | Operation for copying weights from another model: this operation is used when the local model needs to recover the weights from the last aggregation. |
| Number of samples | Returns the number of samples in the model dataset (for the “quantitative” aggregator). |
| Aggregation results | Returns the detailed results of the last aggregation, with the weights obtained for each model received and each power variable. This result is displayed in the web application: it can be used to check the coefficient assignments during the last aggregation. |
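The services of Table 9 can be summarised as an abstract interface. The method names and signatures below are illustrative guesses, not the actual interface of the middleware.

```python
# Assumed abstract interface corresponding to the services of Table 9.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

class LearningModelInstance(ABC):
    @abstractmethod
    def train_incremental(self) -> None:
        """Model training: incremental training including the latest dataset updates."""

    @abstractmethod
    def predict(self, date_time: Any, horizon: int) -> Any:
        """Prediction: single prediction at a given date-time and horizon."""

    @abstractmethod
    def predict_series(self, requests: List[Tuple[Any, int]]) -> List[Any]:
        """Series of predictions for a list of (date-time, horizon) requests."""

    @abstractmethod
    def compact(self) -> bytes:
        """Model compaction: compacted format to be submitted in the LSA."""

    @abstractmethod
    def copy_weights_from(self, other: "LearningModelInstance") -> None:
        """Weights copy: recover the weights resulting from the last aggregation."""

    @abstractmethod
    def sample_count(self) -> int:
        """Number of samples in the model dataset (used by the sampling_nb aggregator)."""

    @abstractmethod
    def last_aggregation_results(self) -> Dict[str, Dict[str, float]]:
        """Aggregation results: weights obtained per received model and per power variable."""
```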
Table 10.
Common variables used in aggregation operators.
| Variable | Description |
|---|---|
| N | index of the local node |
| | set of node indexes of the models to be aggregated: each index identifies the node location of a model to be aggregated. This set includes the index of the local node (N) and of all its direct neighbours. |
| k, i | node indexes belonging to the set above |
| | learning model of node k |
| | training dataset of node k |
| | test dataset of node k |
| | weight assigned by the operator to the model of node k, which represents its relative importance compared with the other models. This coefficient is normalised: the weights over all aggregated models sum to 1. |
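Since the symbols of Table 10 are not legible in this version, the following formalisation uses placeholder notation (𝒦 for the set of node indexes, M_k for the model of node k, w_k for its weight). It only restates the normalisation constraint and the weighted combination underlying the averaging operators of Table 7.

```latex
% Placeholder notation; the original symbols are not legible in the source.
\mathcal{K} = \{N\} \cup \mathrm{neighbours}(N), \qquad
\sum_{k \in \mathcal{K}} w_k = 1, \qquad
M_{\mathrm{agg}} = \sum_{k \in \mathcal{K}} w_k \, M_k .
```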
Table 11.
Adaptations used to reduce bandwidth. The ‘Usages’ column indicates the types of objects to which these adaptations apply.
| Adaptation | Description | Usages |
|---|---|---|
| Spreading frequency | Reduces the frequency at which an object is broadcast in the context of the gossip application. | Learning models, predictions. |
| Synchronisation of spreading | Delays the start of the broadcast of information from a node until the direct neighbouring nodes are also ready to broadcast the same information, in the context of the gossip application. | Learning models. |
| Compression of exchanged objects | Reduces the size of data exchanged between LSAs by converting an object into a compressed format just before it is physically sent. | Learning models. |
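Two of the adaptations in Table 11, the spreading-frequency limit and the compression of exchanged objects, are easy to picture with a short sketch. The class and function names below are hypothetical, and the serialisation choice (pickle + zlib) is only one possible realisation, not the middleware's actual mechanism.

```python
# Assumed sketch of a spreading-frequency gate and of payload compression.
import pickle
import time
import zlib
from typing import Any

class SpreadingGate:
    """Allows a new broadcast only once the configured period has elapsed."""
    def __init__(self, period_s: float) -> None:
        self.period_s = period_s
        self._last = float("-inf")

    def ready(self) -> bool:
        now = time.monotonic()
        if now - self._last >= self.period_s:
            self._last = now
            return True
        return False

def compress_payload(obj: Any) -> bytes:
    # Convert the object into a compressed byte format just before it is physically sent.
    return zlib.compress(pickle.dumps(obj))

def decompress_payload(blob: bytes) -> Any:
    return pickle.loads(zlib.decompress(blob))
```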
Table 12.
Parameters set for each classifier used.
| Classifier | Learning Frequency | Gossip Frequency | Batch Size | Learning Rate | Other Hyperparameters |
|---|---|---|---|---|---|
| Markov Chains | 1/min | 1/5 min | N/A | N/A | sliding window: 100 days |
| LSTM | 1/20 min | 1/20 min | 32 | ( for the 1st training) | |
Table 13.
Accuracy results, broken down by aggregator and by prediction perimeter. The ‘CLUSTER’ and ‘NODE’ lines correspond to the averages obtained for each prediction perimeter, and the ‘TOTAL’ line corresponds to the overall averages obtained for all predictions. The left half of the table shows the accuracies obtained using the Markov chains model, while the right half shows the predictions obtained using the LSTM model. Accuracies are also calculated based on non-trivial predictions.
| | Markov Chains: Predictions Nb | Markov Chains: Reliability % | Markov Chains: Reliability (Non-Trivial) % | LSTM: Predictions Nb | LSTM: Reliability % | LSTM: Reliability (Non-Trivial) % |
|---|---|---|---|---|---|---|
| TOTAL | 77,094 | 80.26% | 75.84% | 59,020 | 92.85% | 91.24% |
| CLUSTER | 25,896 | 75.05% | 74.68% | 17,834 | 88.39% | 88.39% |
| CLUSTER none | 6942 | 74.75% | 74.02% | 6474 | 86.44% | 86.44% |
| CLUSTER power_loss | 7470 | 74.59% | 74.59% | 2196 | 89.03% | 89.03% |
| CLUSTER min_loss | 3726 | 75.42% | 75.42% | 2216 | 90.03% | 90.03% |
| CLUSTER sampling_nb | 3840 | 75.29% | 75.29% | 3864 | 89.08% | 89.08% |
| CLUSTER dist_power_hist | 3918 | 75.88% | 74.66% | 3084 | 89.98% | 89.98% |
| NODE | 47,580 | 83.54% | 76.92% | 41,186 | 94.78% | 92.92% |
| NODE none | 6840 | 83.32% | 77.29% | 14,322 | 95.48% | 93.82% |
| NODE power_loss | 21,678 | 85.51% | 80.05% | 4362 | 94.25% | 92.31% |
| NODE min_loss | 3726 | 88.78% | 77.56% | 4380 | 93.10% | 90.76% |
| NODE sampling_nb | 7572 | 78.18% | 71.13% | 12,038 | 94.35% | 92.42% |
| NODE dist_power_hist | 7764 | 80.94% | 72.63% | 6084 | 95.58% | 93.91% |
Table 14.
Accuracy results obtained using the ensemble learning approach, broken down by aggregator and by prediction perimeter. The ‘CLUSTER’ and ‘NODE’ lines correspond to the averages obtained for each prediction perimeter, and the ‘TOTAL’ line corresponds to the overall averages obtained for all predictions. Accuracies are also calculated based on non-trivial predictions.
| Gossip Ensemble Learning | Predictions Nb | Reliability % | Reliability (Non-Trivial) % |
|---|---|---|---|
| TOTAL | 233,364 | 86.26% | 87.80% |
| CLUSTER | 115,980 | 86.19% | 86.05% |
| CLUSTER none | 29,340 | 81.19% | 81.06% |
| CLUSTER dist_power_last | 28,698 | 88.52% | 88.52% |
| CLUSTER dist_power_hist | 29,346 | 88.32% | 88.19% |
| CLUSTER sampling_nb | 28,596 | 86.81% | 86.71% |
| NODE | 117,384 | 86.33% | 90.17% |
| NODE none | 29,334 | 92.52% | 90.54% |
| NODE dist_power_last | 29,796 | 93.93% | 91.63% |
| NODE dist_power_hist | 30,450 | 92.37% | 89.53% |
| NODE sampling_nb | 27,804 | 65.05% | 88.96% |