Article

Gossip Coordination Mechanism for Decentralised Learning

by Philippe Glass *,† and Giovanna Di Marzo Serugendo *,†
Centre Universitaire d’Informatique, University of Geneva, 1205 Geneva, Switzerland
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Energies 2025, 18(8), 2116; https://doi.org/10.3390/en18082116
Submission received: 24 February 2025 / Revised: 14 April 2025 / Accepted: 15 April 2025 / Published: 20 April 2025
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Abstract:
In smart grids, renewable energies play a predominant role, but they produce more and more data, which are volatile by nature. As a result, predicting electrical behaviours has become a real challenge and requires solutions that involve all microgrid entities more fully in the learning process. This research proposes the design of a coordination model that integrates two decentralised approaches to distributed learning applied to a microgrid: the gossip federated learning approach, which consists of exchanging learning models between neighbouring nodes, and the gossip ensemble learning approach, which consists of exchanging prediction results between neighbouring nodes. The experiments, based on real data collected in a living laboratory, show that the combination of a coordination model and intelligent digital twins makes it possible to implement and operate these two purely decentralised learning approaches. The prediction results confirm that the two implemented approaches can improve the efficiency of learning at the scale of a microgrid, while reducing the congestion caused by data exchanges. In addition, the generic gossip mechanism offers the flexibility to easily define different variants of an aggregation operator, which can help to maximise the performance obtained.

1. Introduction

Now that the integration and management of renewable energies have become a major priority, the need for energy distribution systems between private individuals has intensified.
Smart grids (see Definition 1) play an essential role in the distribution and regulation of energy at a local level in a microgrid structure that can be connected to the power grid.
These new requirements on energy regulation heighten the need to find the most sustainable possible balance between electricity consumption and production in a microgrid system, so as to ensure the stability of such a system [1]. As a result, machine learning algorithms need to be put in place to learn the electrical behaviour of systems incorporating renewable energy, which is by nature highly volatile [2].
In addition, microgrids—which are part of the new models of distributed, dense, and ubiquitous computing systems [3]—provide huge quantities of electricity data generated in different places and at different times. According to Zhang et al.’s review on data analysis in smart grids [4], in addition to lower-level data such as electrical status measurements collected by sensors, smart grid systems also generate higher-level data such as control instructions, fault information, and analysis data that provide insight into previous information, all of which contribute to a significant increase in the total volume of data. In this context, a purely centralised learning framework could quickly find itself at a loss to manage such a volume of data, the sources of which may vary over time. This creates a bottleneck: firstly, in the central server IT unit, which must manage and coordinate all the units in the micro-network, and secondly in the data network, since all the information flows converge on the central server. These limitations cover coordination, control, fault management, and machine learning tasks on electrical behaviours. Moreover, centralised control systems take much longer to adapt to a disruption to a distributed energy resource. More generally, they are much less scalable in relation to any changes that may occur to a distributed energy resource (addition, modification, or deletion of a distributed energy resource) [5]. Furthermore, the data characterising users’ electrical power is confidential and could be hacked by third parties during the transfer [6]: such data should not be sent as-is to a centralised unit which coordinates all the tasks.
To meet these challenges, we propose to use purely decentralised models which are suitable for distributed systems such as microgrids, where each computational unit generates consumption and production data. Coordination models [7] could fill this gap as they allow the design of distributed systems capable of adapting to permanent changes and in which autonomous agents—which represent the smart grid entities—interact with each other. Such systems have been used, for example, to coordinate different medical and transport services during a humanitarian crisis [8].
The work presented in this paper consists of designing and implementing a coordination model equipped with a gossip mechanism (see Definition 8) that enables the exchange of knowledge between computing nodes, making it possible to execute two variants of decentralised learning frameworks: the gossip federated learning approach, based on exchanges of model weights among nodes, and the gossip ensemble learning approach, based on exchanges of prediction results among nodes. To make the gossip mechanism as generic as possible, we propose to define the knowledge aggregation method on the fly, i.e., outside the middleware that defines the mechanism (this method being specified as an input parameter). In summary, this paper proposes to integrate into a coordination model a generic gossip mechanism for executing decentralised learning approaches among the nodes of a microgrid. This work is also part of the LASAGNE (https://eranet-smartenergysystems.eu/global/images/cms/Content/Fact%20Sheets/2020/ERANetSES_ProjectFactSheet_JC2020_LASAGNE.pdf, accessed on 14 April 2025) research project [9], initiated by several consortia in Switzerland and Sweden, which aims at developing self-adaptive applications for smart grid energy management. These applications cover aspects such as energy negotiation or peak shaving. To address this goal, the LASAGNE project develops edge-to-cloud and edge-to-edge infrastructures [10], relying on a digital platform providing specific services for these applications. This platform instantiates a coordination model, provides digital twins, and supports distributed collaborative machine learning (ML) methods. In the rest of the paper, we present the research context, our approach, and its experimentation. The paper is structured as follows. In Section 2, we present existing approaches to distributed collaborative learning and our contribution, which consists of using the gossip mechanism of a coordination model to implement decentralised collaborative learning with two variants: gossip federated learning, which applies gossip to learning models, and gossip ensemble learning, which applies gossip to predictions. In Section 3, we present the experiment and the evaluation method applied to the two gossip variants, and we compare the different results obtained in terms of accuracy. In Section 4, we discuss the gains in accuracy brought by the two gossip approaches. In Section 5, we summarise the contribution, its added value and limitations, and the scope for improvement.

2. Materials and Method

In this contribution, we propose to implement the gossip mechanism in a coordination model and to apply it to distribute, in a decentralised way, the knowledge acquired by each node: the principle is to strengthen this knowledge by pooling the learning experiences. First, let us look at the existing approaches to decentralised learning.

2.1. Related Works

Before starting this section, we define the notion of smart grid.
Definition 1.
A smart grid (also called microgrid) is an electricity network that uses computer technologies to adjust the flow of electricity between suppliers and consumers. It is made up of several interconnected nodes containing computing units, which are generally connected to electrical devices that consume or produce electricity—except for the highest-level nodes. By collecting information on the state of the network, a smart grid contributes to the balance between production, distribution, and consumption [11].
In a microgrid structure, a node—also called an edge device—is a computational entity able to store and process data and software. Typically, the coordination platform and the digital twins [12] execute within a node (e.g., an edge device). In this section, we present existing machine learning distribution approaches to help edge devices improve their knowledge using a distribution framework at the microgrid (i.e., global) level. In fact, we need to distinguish between the learning model itself—which a node in the network runs at its level to predict electrical behaviours based on data observed locally—and the framework used at the microgrid level to federate the learning done by the individual nodes. These two orthogonal elements constitute the learning techniques implemented throughout a microgrid. In what follows, we focus on frameworks used to federate learning between different computational units; the different types of learning models used by a microgrid computational unit are also important but are not the main focus of this exploration of research work.

2.1.1. Distributed Collaborative Machine Learning Frameworks

In terms of frameworks, we present here two main families of approaches, one called federated learning, which uses a centralised model, and the other called decentralised federated learning, which uses a purely decentralised model.

Centralised Federated Learning: The General Approach

Centralised federated learning, more commonly known as federated learning (FDL) [13], is a paradigm in which multiple machines collaboratively train an artificial learning model while keeping their training data at the local level. The principle is that different devices distributed in different nodes located in the same network all participate in learning a variable whose behaviour is homogeneous across all the nodes. In this approach, each device participates in the learning of this common behaviour by training an instance of the learning model using its local data. A central server manages the distribution of the learning model to each node device (called ‘client’) in a cyclic manner. Periodically, each client is therefore allocated a version of the model, which it will re-train with its dataset and then return to the central server. The local data used to train the model is not sent to other nodes; in this way, it remains protected. As represented in Figure A1, the data streams containing learning models all converge on the central server, forming a star graph topology. The central server aggregates the model weights received from the clients (for example, by applying an average) and assigns the new model version based on the aggregated model weights. The federated learning paradigm contrasts with purely centralised learning, in which all machines send their data to a central server, which is the only entity to execute the learning model.
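As an illustration of the central server's aggregation step, the following sketch (in Python with NumPy) averages the clients' layer weights. The function and variable names are ours, and the use of per-client sample counts as aggregation weights is an assumption in the spirit of federated averaging, not necessarily the scheme used by the works cited below.

```python
import numpy as np

def federated_average(client_weights, client_sample_counts):
    """Aggregate client model weights with a sample-size-weighted average.

    client_weights: one list of layer weight arrays per client (identical shapes).
    client_sample_counts: local training sample count per client, used here
    as aggregation coefficients (FedAvg-style assumption).
    """
    total = float(sum(client_sample_counts))
    coefficients = [n / total for n in client_sample_counts]
    n_layers = len(client_weights[0])
    return [sum(c * w[layer] for c, w in zip(coefficients, client_weights))
            for layer in range(n_layers)]

# Example: three clients, each holding a single 2x2 weight matrix.
clients = [[np.full((2, 2), v)] for v in (1.0, 2.0, 3.0)]
print(federated_average(clients, [100, 200, 100])[0])  # weighted mean = 2.0
```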
In the energy domain, Savi et al. [14] have experimented with the federated learning approach coupled with the LSTM model to predict the short-term effective load at different node locations in a smart grid. Ibrahem et al. [15] have proposed a federated learning approach that uses a privacy-preserving aggregation scheme for energy theft detection in smart grids. They have designed an aggregator server that receives only the encrypted form of model parameters from individual nodes. This protects the network from external intrusions and preserves the node’s privacy.

Decentralised Federated Learning Approach

A variant of the federated learning approach, called decentralised federated learning [16], involves the various devices to a greater extent, in that each node device itself manages the distribution of model weights to the surrounding node devices, unlike the traditional approach where a single server is responsible for distributing model weights to all the devices. We consider this approach to be totally decentralised in that the data flows are not all linked to a single central server, as in a star graph topology, but are distributed throughout the network. Similarly, the aggregation and communication processing are not concentrated on a single computing unit but spread over a set of local units located in the various nodes. The decentralised federated learning approach uses the gossip mechanism (see Definition 8), which is a composition of aggregation and spreading mechanisms. Figure A2 represents decentralised federated learning as an application of the gossip mechanism to a set of nodes that train a learning model. This mechanism, also called gossip federated learning, is a framework in which each node performs all actions locally. More precisely, each node alternates between the computation phase (which consists of training the learning model with local data) and the inter-node communication phase (which consists of exchanging learning models with neighbours and merging the local model with those of neighbouring nodes). Table 1 summarises the two approaches presented, highlighting the points of convergence and divergence.
According to Liu et al. [16], this approach reduces the communication traffic among nodes and prevents disclosing sensitive local data to other nodes. During the inter-node communication phase, a node only sends the computed weights of its learning model (and not training data). An empirical study [17] provides a systematic comparison of federated learning and gossip federated learning approaches, using experimental scenarios including a real unsubscribe trace collected on cell phones under different conditions: communication regularity (continuous or in bursts), bandwidth, network size, and compression techniques. Another approach has experimented with gossip federated learning on an ultra-dynamic configuration, namely a network of electric vehicles moving in an urban environment [18]. To predict the vehicles’ trajectories, this experiment applied gossip federated learning to a fleet of moving cars, with each vehicle device locally executing the LSTM learning model. The results confirmed that in this type of highly dynamic configuration, gossip federated learning significantly improves the accuracy of vehicles with little learning experience, by taking advantage of better-trained models moving in their vicinity. As the decentralised federated learning approach is new, there are still very few applications in the field of energy.
Giuseppi [19] has proposed a decentralised federated learning algorithm for non-intrusive load monitoring (NILM) applied to a network of energy communities. NILM is a process for analysing the voltage and current variations that occur in a house; for different types of devices, the evaluations confirmed that the decentralised version tends to perform better than the centralised version. Moussa et al. [20] have experimented with a decentralised federated learning framework on a microgrid network composed of grid edge devices that each use an LSTM learning model locally. In this approach, each node applies a variant of the gossip mechanism conditioned by the performance of its current predictions: if the latter are good, it sends its weights to neighbouring nodes; otherwise, it asks neighbouring nodes for their weights to improve its performance.

Ensemble Learning Approach

So far, we have presented approaches that combine several independent models sharing the same structure. The literature also contains a multitude of approaches for combining models of different types, with many ways of combining them [21]. This is the very general ensemble learning approach, which builds a new higher-level classifier from several classifiers.
The underlying idea of ensemble learning is that the union of several basic models, which are trained separately and may be of different types, can produce more reliable results than a single model. This draws inspiration from Condorcet’s jury theorem [22], which formally expresses the fact that a majority vote improves the reliability of the decision. According to Sagi’s review on ensemble learning methods [23], different sub-approaches to ensemble learning use different ways of combining learning models, such as input manipulation (which gave rise to the AdaBoost and bagging classifiers), output manipulation (which gave rise to gradient boosting), ensemble hybridisation (which gave rise to random forest), manipulated learning (rotation forest), and partitioning.
Ensemble learning approaches are also used in the field of electricity. Wang et al. [24] have built an ensemble learning approach for load forecasting which combines LSTM models as first-level models and fully connected cascade (FCC) neural networks as second-level models. The LSTM models are executed on different portions of data, which are separated using the HDBSCAN clustering algorithm [25]. Kumar et al. [26] have also experimented with an ensemble learning approach to forecast power consumption. This approach uses a voting process to choose a prediction from among the predictions generated by 5 different base models: the choice consists of applying these models to the evaluation data and selecting the one that minimises the calculated error. With the aim of securing a microgrid and gaining users’ trust, Ali et al. [27] have proposed an ensemble learning model to filter prosumers willing to exchange energy in a microgrid, based on their predicted reliability in terms of packet loss rate, response time, responsiveness, integrity, and consistency. The proposed ensemble learning approach combines LSTM and LGBoost classifiers and applies data sampling manipulation algorithms using SMOTE and PCA. It is important to note that the ensemble learning implementations we have just mentioned all combine different elementary models in a centralised manner, i.e., the mechanisms that combine the models are executed by a single process, although the models are driven by independent processes. Other approaches have already explored decentralised mechanisms for combining models, for example, the approach of Yu et al. [28], which involves each agent in the process of combining data samples from neighbouring agents, and the approach of Magureanu et al. [29], which defines a probabilistic consensus protocol (called Slush) based on peer-to-peer exchanges. This purely decentralised version offers advantages such as reduced communication overhead and robustness against intrusions. To date, such decentralised ensemble learning approaches have not yet been implemented on microgrid applications.

2.1.2. The Identified Gap and Research Question

Considering the various existing works, we can identify the absence of a generic coordination mechanism that can ensure purely decentralised learning with an exchange of knowledge between microgrid nodes in different forms—either prediction results or model weights—and with the possibility of defining the aggregation operator as a parameter injected into the coordination mechanism. We can formulate this knowledge gap in a more general way by means of the following research question: how can we integrate a gossip mechanism into a coordination model that can be used for decentralised learning, with an aggregator that the client layer can define on the fly? The following sections present our approach, which seeks to respond to this research gap.

2.2. Using a Coordination Model on Microgrid Nodes

In this section, we describe the coordination model, which is the cornerstone of our contribution. A coordination model manages the mechanisms that allow different digital twins (also known as agents) to interact with each other, not only to exchange and regulate energy, but also to distribute knowledge in the form of model weights or prediction results. This paper focuses on the second point, which involves the gossip mechanism for decentralised learning frameworks.

2.2.1. Coordination Model: Key Concepts

It is worth noting that the different concepts listed in this section have already been presented in our previous paper [30], as the coordination model and the digital twins have been used since the beginning of our research to simulate energy exchanges among producer and consumer devices.
Definition 2.
A coordination model provides a coordination medium for sharing data among the coordinated entities, together with coordination laws applying to the shared data (transformation, spreading) [31]. We consider coordination laws as bio-inspired self-organising mechanisms. The coordination model uses the stigmergy principle [32], which enables the exchange of asynchronous information between the coordinated entities through the coordination medium. Each coordinated service has its own logic, controlled by an autonomous software agent (also called an intelligent digital twin; see Definition 4).
The coordination model used derives from the SAPERE coordination model and SAPERE middleware [33]. Ben Mahfoudh [34] provided the most recent extension. In a previous contribution [30], we already used this model to exchange electrical energy and manage peak shaving (based on the interaction among digital twins). Figure 1 represents the coordination model in the form currently used, which draws inspiration from biochemistry. It contains a shared virtual environment, called tuple space (see Definition 3), in which digital twins can share data by submitting or retrieving data properties. Digital twins can thus submit and retrieve LSAs (see Definition 6). LSAs contain tuples of properties provided from other digital twins or generated by the coordination law [35] (see Definition 7) at any time.
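A minimal, purely illustrative sketch of the submit/retrieve interaction just described: the class and method names (TupleSpace, Lsa, submit, retrieve) are ours and do not reproduce the actual SAPERE API.

```python
class Lsa:
    """Live semantic annotation: a tuple of named properties owned by a digital twin."""
    def __init__(self, owner, **properties):
        self.owner = owner
        self.properties = dict(properties)

class TupleSpace:
    """Shared space of one node: digital twins submit and retrieve LSAs asynchronously."""
    def __init__(self):
        self._lsas = []

    def submit(self, lsa):
        self._lsas.append(lsa)

    def retrieve(self, property_name):
        # Return all LSAs exposing the requested property (stigmergic read).
        return [l for l in self._lsas if property_name in l.properties]

space = TupleSpace()
space.submit(Lsa("producer_twin", produced=3.2))
space.submit(Lsa("consumer_twin", requested=1.5))
print([l.owner for l in space.retrieve("produced")])
```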
Definition 3.
A tuple space is a shared virtual space containing all tuples of a node. There is a shared space for each coordination platform (see Definition 5).
Definition 4.
A digital twin is a software agent that interacts on the one hand with end-user applications (or with consumption/production edge devices), and on the other hand, coordinates its actions with other digital twins through the coordination platform. To do so, it submits LSAs into the tuple space or retrieves LSAs from the tuple space. Digital twins serve various purposes:
1. 
Interaction with the coordination platform.
2. 
Coordination of tasks among digital twins.
3. 
Real-time representation of their physical twin counterparts (e.g., edge consumption device data).
Definition 5.
A coordination platform is a system that implements and runs the coordination model. The coordination platform executes within a grid edge device and communicates through the network with other coordination platforms in other grid edge devices (thanks to the spreading coordination eco-law, a specific type of broadcasting communication described below). The software agents generate new data on the fly by coordinating their activities through the coordination platform; they submit and retrieve data from the platform. The data submitted corresponds to the following types:
1. 
Data generated by an actual physical object linked to the environment (e.g., consumption or production level).
2. 
Data exchanged among digital twins for managing energy (e.g., requests or offers of energy, peak shaving schedules).
3. 
Data provided by other services (possibly processed or transformed dynamically by self-organisation mechanisms).
Definition 6.
A live semantic annotation (LSA) is the tuple of data containing the digital twin properties that travel and evolve in the tuple space. Digital twins can thus submit and retrieve LSAs (tuples of properties) provided from other digital twins or generated by the coordination laws at any time.
Definition 7.
A coordination law is a mechanism that applies to entities in the environment. Coordination laws are modelled with bio-inspired design patterns [36] and classified into three hierarchical levels, as shown in Figure A3:
1. 
The bottom layer (called basic patterns), which contains the atomic mechanisms.
2. 
The middle layer (called composed patterns), which contains the mechanisms using those of the lower layer.
3. 
The upper layer (called high level patterns), which contains the mechanisms using those of the two lower levels.

2.2.2. The Gossip Mechanisms

Our approach is to use the bio-inspired mechanisms provided by the platform to implement decentralised learning, enabling multiple nodes to share knowledge. As shown in Figure 2 and explained in Definition 8, the gossip pattern uses the elementary patterns of aggregation and spreading. Each of them belongs to the basic patterns layer and is materialised in the coordination model by a corresponding coordination law; combining these two laws makes it possible to implement a gossip mechanism, which is applied to learning models in the framework of gossip federated learning.
Definition 8.
Gossip is a mechanism for obtaining shared agreement on information among a set of digital twins in a decentralised manner. All the digital twins work together to gradually reach this agreement by aggregating their own knowledge with that of their neighbours. The gossip mechanism consists of two elementary mechanisms: aggregation, which consists of enriching information by merging its values received from surrounding digital twins, using a specific operator (e.g., sum or average); and spreading, which consists of sending information from one digital twin to its surrounding neighbours.
We now describe how the gossip mechanism gradually achieves consensus. Let us consider a specific piece of information on which a cluster of nodes applies gossip. At start-up, each node contains a different value of the same information, but at each gossip application cycle, this information is broadcast to neighbouring nodes, and each node applies an aggregation operator that takes as input its own value as well as those received from its neighbours. A received value may already have been aggregated previously by indirect neighbours. Regardless of the operator used, the aggregation in each node will end up directly or indirectly integrating the versions of all the other nodes. As a result, gossip makes it possible to arrive progressively at a convergent result, which corresponds to a consensus. The speed of convergence and the obtained consensus value can vary depending on the operator used (for example, min, max, average, or another type of operator). In some cases, convergence is delayed if a node modifies its local version before the information has had time to converge. Indeed, in this context of dynamic adaptation, the digital twins attached to the nodes regularly modify their properties and, as a result, new values regularly arrive in each aggregator. Section 2.5.1 and Section 2.5.2 describe the implementation of these two elementary mechanisms within the SAPERE-derived middleware.
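To make the convergence argument concrete, here is a toy simulation (ours, not the middleware code) of synchronous gossip cycles on a ring of four nodes with an average aggregation operator; the values progressively converge to a common consensus.

```python
def gossip_round(values, neighbours, operator):
    """One synchronous gossip cycle: each node aggregates its own value
    with the values received from its direct neighbours."""
    return [operator([values[i]] + [values[j] for j in neighbours[i]])
            for i in range(len(values))]

def mean(xs):
    return sum(xs) / len(xs)

# Four nodes in a ring, each starting with a different local value.
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = [10.0, 0.0, 4.0, 6.0]
for cycle in range(10):
    values = gossip_round(values, neighbours, mean)
print(values)  # all values converge towards a common consensus
```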

2.2.3. Overview of Coordination Model Components

Table 2 summarises the different elements involved in the functioning of a coordination model. In this software architecture, the digital twins are part of the intermediate level layer, while the other components are an integral part of the lower-level layer (the coordination middleware).

2.2.4. Defining the Microgrid Topology

The coordination model used ensures interaction between digital twins located at different nodes. The general topology of a network is defined as shown in Figure 3. An instance of the coordination platform runs on each node, which is directly linked to several neighbouring nodes. All nodes can communicate with each other either directly or indirectly. We can define any mesh topology between the nodes, the only requirement being to run an instance of the coordination platform on each node (which may or may not be physically on different hosts). At each execution cycle, the spreading coordination law propagates the LSA data located at the current node to all direct neighbour nodes (and thus to the entire network after a certain number of cycles). We should note that each node receives LSAs from neighbouring nodes on a TCP-IP socket server, in the form of serialised objects. Furthermore, a node defines its neighbours in emission only. Consequently, neighbours can be different in transmission and reception. A node propagates LSAs to nodes in its own neighbour list, but it can itself receive LSAs from other nodes not included in its neighbour list.
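For illustration only, a node’s outgoing neighbour list could be declared as in the following hypothetical configuration sketch (the actual platform configuration format is not described in this paper); it also shows how transmission and reception neighbourhoods may differ.

```python
# Hypothetical neighbour configuration for a 4-node cluster (emission lists only).
# Each entry maps a node to the nodes it spreads its LSAs to over TCP-IP.
topology = {
    "N1": {"host": "10.0.0.11", "port": 9001, "send_to": ["N2", "N4"]},
    "N2": {"host": "10.0.0.12", "port": 9001, "send_to": ["N3"]},
    "N3": {"host": "10.0.0.13", "port": 9001, "send_to": ["N4"]},
    "N4": {"host": "10.0.0.14", "port": 9001, "send_to": ["N1"]},
}
# N1 sends to N2 and N4, but N4 sends only to N1: transmission and
# reception neighbourhoods are therefore allowed to differ.
```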

2.2.5. Scalability to Suit Different Environmental Topologies

By definition, the coordination model implemented can be deployed on networks with any type of topology and number of nodes. In addition, it can easily integrate (even in real time) any changes to the topology, such as the addition/removal of a node or the attachment/removal of a digital twin at a node. It can also be deployed on a larger scale, as was demonstrated at the Vienna Marathon to track runners in real time [37]. This type of deployment has not yet been carried out in the context of decentralised learning; it would require a more extensive deployment of the two approaches (gossip federated/ensemble learning) in an environment with around a hundred nodes. Furthermore, limitations may arise if a very large number of digital twins are attached to the same node. These may be physical limitations related to the computing unit (available memory, bandwidth) or related to the execution of coordination laws (limitation on the number of properties that can be submitted at the same time on LSAs, or on the number of times coordination laws can be executed at each cycle at node level). In summary, our approach is scalable and adaptable to the topology and number of nodes at network scale, whereas scalability at the scale of a single node is limited by the physical capacity of that node.

2.3. Defining the Variables to Be Predicted

The aim of this study is to be able to predict the electrical behaviour of the various computational nodes that make up a microgrid, using a decentralised learning mechanism between the nodes. We then extend this prediction perimeter to the microgrid corresponding to a cluster of nodes.

2.3.1. Defining Variables at Node Level

First, we define the node of a microgrid as follows:
Definition 9.
A node (also called a GED: grid edge device) is a local computing device of the environment, identified by a network address and linked to a set of direct neighbouring nodes, as well as to a set of local electrical devices with which it interacts. In the context of this paper, a node is associated with a building.
Definition 10.
We call node state the set of 6 power variables that characterise the global current electric state of the node, i.e., the 6 following variables summed over all the electrical devices connected to the node’s computing devices: wattage requested, produced, consumed, supplied, missing, and available (see Table 3).
In terms of classification, the range of possible values is broken down into 7 intervals, as follows: the first slice corresponds to values close to 0 (less than 1% of the maximum authorised power), the next 5 correspond to the 5 slices of 20% of the maximum power (located between the first slice and the maximum), and the last corresponds to all values above the maximum authorised power (for example, for the “produced” variable, these are overproduction values). Each model therefore tries to predict the power class at the horizon time, which corresponds to one of these 7 intervals. To keep the data volume and computation time reasonable, we limit ourselves to 7 states, each state representing an interval of values that the variable to be predicted can take. This compromise gives a correct order of magnitude for the electrical power without overusing the computing units. Furthermore, the division of classes into regular intervals does not necessarily reflect the actual distribution of values for each power variable. Improving this would require, on the one hand, a more in-depth study of the actual distribution and, on the other hand, a breakdown of classes specific to each variable, which is not done in this version, given that this contribution focuses more on the learning distribution mechanisms than on the classifier implementations themselves. In this contribution, we propose that each node tries to predict each of the 6 variable classes at the local node level. These variables are predicted at different horizons to manage peak shaving: the principle is to anticipate any overproduction or overconsumption with the aim of avoiding it proactively. Each node’s learning twin runs a machine learning model that trains on the variable’s history and regularly generates predictions. As explained in the next paragraph, these variables are also extended to a group of nodes (cluster): the model is then duplicated for the 6 variables associated with the perimeter of the cluster.
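The classification rule can be sketched as follows; the function name is ours, and the boundaries of the 20% slices are approximated (only the 1% threshold and the seven classes follow the description above).

```python
def power_class(value, max_power):
    """Map a power value to one of the 7 classes described above.

    Class 0: below 1% of the maximum authorised power.
    Classes 1-5: five slices of roughly 20% of the maximum power.
    Class 6: above the maximum authorised power (e.g., overproduction).
    """
    if value < 0.01 * max_power:
        return 0
    if value > max_power:
        return 6
    # Approximate 20% slices of the maximum power, classes 1 to 5.
    return min(5, 1 + int((value / max_power) * 5))

print(power_class(0.5 * 1000, 1000))   # 3: third 20% slice
print(power_class(1.2 * 1000, 1000))   # 6: above the authorised maximum
```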

2.3.2. Defining Variables at Cluster Level

We consider a cluster to be a group of several nodes which share prediction models or prediction results depending on the decentralised learning framework used. It can be the complete microgrid when it does not contain many nodes, which is the case in our experiments. In the same way as for a node, we can study the electrical behaviour at cluster level; each of the 6 variables that we defined earlier can be aggregated at the level of the whole cluster by simply adding up its values across the different nodes in the cluster (as shown in Figure A7). To calculate the cluster total, each node applies a sum aggregator to all 6 variables from neighbouring nodes (which are disseminated by the spreading coordination law).
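A minimal sketch of the sum aggregator at cluster level; the NodeState structure and its field names are illustrative stand-ins for the node state described in Table 3.

```python
from dataclasses import dataclass, fields

@dataclass
class NodeState:
    """The 6 power variables characterising a node's electrical state (in watts)."""
    requested: float = 0.0
    produced: float = 0.0
    consumed: float = 0.0
    provided: float = 0.0
    missing: float = 0.0
    available: float = 0.0

def cluster_total(local_state, neighbour_states):
    """Sum aggregator: add each power variable over the local node and its neighbours."""
    total = NodeState()
    for state in [local_state] + list(neighbour_states):
        for f in fields(NodeState):
            setattr(total, f.name, getattr(total, f.name) + getattr(state, f.name))
    return total

local = NodeState(produced=2.0, consumed=1.0)
others = [NodeState(produced=1.5), NodeState(requested=3.0)]
print(cluster_total(local, others).produced)  # 3.5
```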
In this paper, we therefore define the prediction perimeter (otherwise known as the scope), which corresponds to either the node or the cluster. This amounts to replicating each prediction model for each perimeter. For example, in a cluster of 4 nodes from N1 to N4, each node will try to predict its own production as well as that of the cluster, and the same for the 5 other variables. These learning models are also replicated on each node.

2.4. Two Different Ways of Using the Gossip Mechanism

Decentralised knowledge sharing between different nodes in a network can be achieved in different ways, depending on whether the learning models have the same structure or not. Either the different entities manage learning models with the same structure, in which case it is possible to exchange model weight parameters to make each node’s model “more efficient” in predicting electrical behaviour at a local level or cluster level. In this case, we can try to adopt a federated learning strategy based on the gossip mechanism. Alternatively, the entities have heterogeneous structural models, and exchanging model parameters makes no sense. In this case, it is possible to benefit from the learning experience of other nodes by exchanging the prediction data generated by the different nodes. It is always possible to aggregate prediction results, whatever the learning model used by the nodes. In this situation, we can adopt an ensemble learning strategy, based on a gossip mechanism. In both cases, we are talking about decentralised learning using the gossip mechanism, but in the first case, we are talking about federated learning and applying aggregation to learning models, while in the second case, we are talking about ensemble learning and applying aggregation to the predictions themselves.

2.4.1. The Gossip Federated Learning Approach

First, let us define the key concepts. In machine learning, federated learning is a learning paradigm in which several machines collaboratively train an artificial intelligence model while keeping their data local. Gossip federated learning corresponds to a paradigm in which agents communicate directly with each other and disseminate their local models peer-to-peer. We then propose to adapt this definition to the use of the bio-inspired gossip pattern that we are using in this research.
Definition 11.
In the context of this contribution, we define gossip federated learning as a purely decentralised approach to sharing and optimising learning across a cluster of nodes by using the gossip coordination mechanism implemented in each node. The gossip mechanism consists of spreading learning models from one node to another and merging models by aggregating them. This approach assumes that all nodes try to predict the same type of behaviours, using machine learning models with a similar structure. The different nodes are also assumed not to exchange their training data samples, which remain private.
This latter definition is based on the bio-inspired pattern of gossip [36]. Indeed, if we apply the gossip pattern to machine learning model weights, we can implement a purely decentralised federated learning approach where each node plays a full role in learning and knowledge distribution. Firstly, it contributes to improving knowledge of a behaviour (by training its model with its own local data); secondly, it benefits from the knowledge acquired by its neighbours (by receiving their learning models); and finally, it participates in the distribution of models (by sending its model content to all its direct neighbours at each gossip cycle). This gossip federated learning approach contrasts with the centralised federated learning approach, where a central server manages all aggregation and distribution to the “client” entity (which trains the model with its own local data). As shown in Figure 4, the principle is that each node manages the aggregation and spreading of models at its own level. Periodically, each node does the following:
  • Trains iteratively its own learning model using its private local data updated with the latest observations (see step number 1 on the upper figure).
  • Receives model weights from surrounding nodes which are directly linked (see step number 2 on the bottom figure).
  • Aggregates the models received into one merged model. Its own model is also included in the aggregation (see step number 3 on the bottom figure).
  • Spreads its updated model to the surrounding nodes which are directly linked (see step number 4 on the bottom figure).
Figure 4. Using coordination mechanisms to implement a gossip federated learning approach. The upper figure shows phase 1, which consists of training the learning model from local data, while the bottom figure shows the next 3 phases, which are to receive models from neighbouring nodes, aggregate them with the local model, and then spread the new model obtained to neighbouring nodes.
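The four phases above can be illustrated by the following self-contained toy (Python/NumPy, written by us): each node holds a small weight vector, local training is stubbed as a random update, and aggregation is a plain average; it mimics the cycle, not the platform implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # toy 4-node ring
weights = {n: rng.normal(size=4) for n in neighbours}       # one weight vector per node
inbox = {n: [] for n in neighbours}

for cycle in range(5):
    # Phase 1: local training, stubbed here as a small random weight update.
    for n in neighbours:
        weights[n] = weights[n] - 0.1 * rng.normal(size=4)
    # Phases 2-3: each node aggregates the models received from its neighbours
    # with its own model (plain average in this toy).
    for n in neighbours:
        received, inbox[n] = inbox[n], []
        if received:
            weights[n] = np.mean(received + [weights[n]], axis=0)
    # Phase 4: spread the updated model to directly linked neighbours.
    for n, targets in neighbours.items():
        for t in targets:
            inbox[t].append(weights[n].copy())
```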
It is worth noting that the frequency of gossip application can be adjusted so that the accuracy obtained by the model is optimal. It should be at least equal to the model update frequency (at the risk of unnecessarily broadcasting model weights that have already been sent) but can be less than the relearning frequency. We can also note that the physical constraints of the electrical network (represented in red in Figure 3), which have an impact on the transmission of energy, are not directly dealt with in the gossip federated learning approach. In fact, these constraints impact the values of the power variables rather than their transmission between the node computing units, since this is not a latency problem on the data network. The power values we retrieve from the smart meters in the living lab already take this loss into account. If these transmission problems have a recurring impact on the values of power variables at particular time slots, the learning models will be able to integrate these changes in behaviour into the weight matrices and therefore into the generation of predictions. On the other hand, data transmission constraints have a direct impact on the way the gossip federated learning approach works. As explained in more detail in Section 2.6.3, given the high volume of data exchanged between the nodes, we had to reduce the gossip frequency on the learning models and use a mechanism for compressing learning model contents when broadcasting them to other nodes. In this way, we were able to achieve fluidity in the exchanges between nodes. However, given the varying distance between neighbouring nodes, the update dates of the models injected into the gossip mechanism will always be a few seconds apart. This difference is nevertheless very small compared to the period of gossip application or model retraining (which can be counted in minutes, or even tens of minutes, depending on the type of model).

2.4.2. The Gossip Ensemble Learning Approach

As mentioned in Section “Ensemble Learning Approach”, ensemble learning offers an alternative to federated learning for sharing the knowledge acquired by different independent entities. Ensemble learning is defined as a common group-wide learning paradigm, using different machine learning models and combining them, with the aim of improving the reliability of the predictions made by the cluster of nodes.
However, there are many ensemble learning approaches with different ways of combining models. In our approach, we propose to combine only the predictions generated by each of the models; this seems to us to be the simplest and most natural way, as it does not involve sending data samples (which remain private for each node) and limits the number of bytes to be exchanged. Unlike federated learning, the different nodes can use different types of learning models; but, as in federated learning, the different models must predict the same type of behaviour and are assumed to keep their local dataset private. In the same way, we propose to define a purely decentralised ensemble learning approach based on the use of the bio-inspired gossip pattern.
Definition 12.
In the remainder of this paper, we define gossip ensemble learning as a purely decentralised approach to sharing and optimising predictions of common behaviours across a cluster of nodes using the gossip coordination mechanism, by spreading prediction results from one node to another and merging prediction results, by aggregating them.
This approach has the advantage over gossip federated learning of being able to integrate nodes with different models, and of requiring less bandwidth to spread to neighbouring nodes, as the prediction data is far less voluminous than the weight data of a learning model. However, as this approach only shares the results of predictions (and not the content of the models themselves), it cannot apply an aggregator that evaluates and compares the performance of models, such as the “power_loss” aggregator.
Figure 5 describes the gossip ensemble learning approach, which consists of applying the aggregation and spreading mechanisms to the predictions generated by learning models (and not to learning models, as in the case of federated learning through gossip).
It should be noted that despite this difference, the general mechanism remains the same. Indeed, the general stages (1, 2, 3, 4) which involve local updates, reception, aggregation, and spreading are similar to those of gossip federated learning. The difference is that these operations are applied to predictions rather than learning models.
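As an illustration of prediction aggregation, the following sketch merges the class predicted for one variable by majority vote across neighbours (a Condorcet-style operator; the operator choice and the function name are ours, given as one possible aggregation operator).

```python
from collections import Counter

def majority_vote(local_prediction, neighbour_predictions):
    """Aggregate class predictions (e.g., one of the 7 power classes) by majority vote.

    Ties are broken in favour of the local prediction.
    """
    votes = Counter([local_prediction] + list(neighbour_predictions))
    best_count = max(votes.values())
    winners = {cls for cls, count in votes.items() if count == best_count}
    return local_prediction if local_prediction in winners else winners.pop()

# Example: the local node predicts class 3, neighbours predict 4, 4 and 3.
print(majority_vote(3, [4, 4, 3]))  # tie between 3 and 4 -> keep the local class 3
```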

2.4.3. Overview of Implemented Approaches

Table 4 situates the two approaches implemented in our contribution (displayed in bold) in relation to the main existing approaches to distributed learning. The type of distribution is either centralised or decentralised, while the data exchanged can be training datasets, model weights, or model predictions. In our contributions, we deal exclusively with decentralised distributions, with exchanges of model weights and predictions. It should be noted that training dataset exchanges are not addressed, as we consider these data to be private and should not be disseminated to other nodes.
Table 5 highlights the similarities, while Table 6 highlights the differences between the two approaches, gossip federated learning and gossip ensemble learning, based on various criteria. The result is that the gossip ensemble learning approach is less restrictive, so it is preferable when few resources, little bandwidth, or little integration time are available. On the other hand, the gossip federated learning approach can offer more possibilities in terms of accuracy optimisation by playing on the different aggregation operators. We will see in Section 2.6.3 that it is possible to reduce the bandwidth required by using some of the adaptations implemented.

2.5. Integration of the Gossip Pattern into the Coordination Middleware

The implementation of the gossip mechanism uses the implementations of the aggregation and spreading mechanisms, as shown in Figure 2. The gossip mechanism, which combines these two elementary mechanisms, is executed as a higher-level pattern. Algorithm 1 represents the gossip process loop, which invokes the aggregation and dissemination mechanisms at regular intervals. Gossip is applied to the variable nodeObject, which represents the object to be aggregated and disseminated; this is a learning model (in the gossip federated learning approach) or a prediction (in the gossip ensemble learning approach). The processing loop involves the following main steps, represented in Algorithm 1:
  • Local update of nodeObject: see step 6.
  • Aggregation of nodeObject and received objects: see step 9.
  • Dissemination of nodeObject to neighbouring nodes: see step 12.
It should be noted that the local update frequency and aggregation frequency are lower than the coordination law frequency; for this reason, local update and aggregation are not systematically applied at each iteration (see the IF conditions in steps 5 and 8).
Algorithm 1 Gossip processing loop applied on an object called nodeObject (learning model or prediction). The two constants UPDATE_PERIOD and GOSSIP_PERIOD correspond to the inverse of the update and gossip frequencies.
1: nodeObject ← localUpdate()    ▹ first local update: model training for GFDL or prediction calculation for GEL.
2: while True do    ▹ iterative processing of the coordination laws.
3:     waitSec(1)    ▹ wait for 1 second (time elapsed between 2 coordination law cycles).
4:     reception.apply()    ▹ reception of the LSAs from neighbouring nodes.
5:     if currentTime() >= nodeObject.lastUpdateTime() + UPDATE_PERIOD then    ▹ check whether it is time to apply a new local update.
6:         nodeObject ← localUpdate()    ▹ iterative local update.
7:     end if
8:     if currentTime() >= nodeObject.lastAggregationTime() + GOSSIP_PERIOD then    ▹ check whether it is time to apply a new aggregation.
9:         aggregation.apply(nodeObject.property)    ▹ call for aggregation.
10:     end if
11:     if nodeObject.lastUpdateTime() >= lsa.lastSpreadingTime() then    ▹ check whether the node object is updated.
12:         nodeObject.activateSpreading()    ▹ activate the diffusion to neighbouring nodes.
13:     end if
14: end while

2.5.1. Implementation of a Generic Aggregation Mechanism

The aggregation mechanism implemented draws inspiration from a previous implementation of the SAPERE middleware, which applied an aggregation to an LSA directly and not to a particular property of the LSA. This version was not suitable, as all the LSAs to be aggregated were deleted and replaced by the LSA resulting from the aggregation, which had the effect of deleting all the prosumer agents concerned. Furthermore, it was not possible to define the aggregator outside the middleware library, and the aggregation functions only applied to simple objects like numbers, strings, or dates, using “standard” aggregators such as the min/max or average functions.
In addition to this, the need arose to execute aggregations on several different properties of the same LSA (i.e., several different aggregators attached to the LSA). The aggregation mechanism has therefore been completely overhauled, so that it can be applied to any LSA properties and provide a degree of flexibility.

General Algorithm for Aggregation

Algorithm 2 represents the general aggregation process, which comprises 3 stages:
  • Retrieval of the objects to be aggregated; see steps 1 and 2.
  • Execution of the aggregation operator defined in the object class. It should be noted that the aggregation operator is determined on the fly by the digital twin requesting aggregation; see step 3.
  • Update of the ‘aggregatedValue’ attribute in the corresponding LSA property; see step 4.
Algorithm 2 Aggregation on an object called nodeObject (learning model or prediction).
1: listObjToAggregate ← lsa.getReceivedObjects(nodeObject.property.name)    ▹ collect all values to be aggregated received from neighbours.
2: listObjToAggregate.add(nodeObject)    ▹ add the value of the current node.
3: result ← aggregation.apply(operator, listObjToAggregate)    ▹ apply the aggregation and save the result in the value of the current node.
4: (lsa.property[nodeObject.property.name]).setAggregatedValue(result)    ▹ update the aggregatedValue attribute of the LSA property containing the object.

Using Polymorphism to Make Aggregation Generic

The new aggregation mechanism is capable, on the one hand, of aggregating any class of objects (e.g., apples, pears, or walnuts) and, on the other hand, of aggregating a class of objects in different ways (for example, by summing the weights, applying the maximum of the weights, or calculating an arithmetic average). We thus use the polymorphism principle to define the aggregation behaviour of each “aggregatable” object: the aggregation method must be defined in the upper-level class, which implements the IAggregatable interface. The idea is to make the aggregation coordination law (defined in the coordination model middleware) independent of the aggregate method applied to each object. This latter method is in the specific class of the objects to aggregate, with the possibility of defining different aggregation variants (one variant per operator value, to be defined in the aggregate method). For example, an aggregatable class that aggregates a power value can define different aggregation operators: one that sums the power values, another that applies the maximum of the power values, and another that applies the arithmetic mean of the power values. Figure 6 illustrates the use of the IAggregatable interface with 4 specific classes that implement the aggregate method. The NodeTotal class models the state of a node, containing the 6 power variables. The PredictionData class models predictions made at different times, with different horizons: it contains the predicted and actual states at the different horizons, as well as at each regular time between the initial date and the last horizon. The MarkovChainsModel and LstmModel classes model the contents of Markov chains and LSTM learning models with all their parameters. For Markov chains, these are the transition matrices, while for LSTM, these are the different layers of the recurrent neural network, each layer containing 3 different weight matrices (w, u, and b).
These 4 classes are used in this contribution to integrate federated learning and ensemble learning. The two classes MarkovChainsModel and LstmModel model the structure of the two base learning models we have integrated: Markov chains and LSTM.
Figure 7 illustrates the aggregation mechanism applied to 3 property objects (listed in Table 7) whose aggregations are independent of each other: the 3 aggregators apply to different properties and different classes, and each uses its own aggregation operator. In this example, the objects are of different classes, but we could very well use two independent aggregations which apply to the same object class (for example, LstmModel) but which do not use the same operator.
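The polymorphic design can be sketched as follows (a Python analogue written by us; the actual middleware code is not shown in this paper, and NodeTotal is reduced here to a single power variable for brevity). The coordination law only manipulates the IAggregatable contract, while each concrete class decides how each operator value is interpreted.

```python
from abc import ABC, abstractmethod

class IAggregatable(ABC):
    """Any object that the aggregation coordination law can merge, whatever its class."""
    @abstractmethod
    def aggregate(self, operator, others):
        """Return the aggregation of self with the received objects, for the given operator."""

class NodeTotal(IAggregatable):
    # Only one of the 6 power variables is kept here for brevity.
    def __init__(self, produced):
        self.produced = produced

    def aggregate(self, operator, others):
        values = [self.produced] + [o.produced for o in others]
        if operator == "sum":
            return NodeTotal(sum(values))
        if operator == "max":
            return NodeTotal(max(values))
        if operator == "avg":
            return NodeTotal(sum(values) / len(values))
        raise ValueError(f"unknown operator: {operator}")

# The coordination law only needs the IAggregatable contract:
def apply_aggregation(operator, objects):
    return objects[0].aggregate(operator, objects[1:])

merged = apply_aggregation("sum", [NodeTotal(2.0), NodeTotal(3.5), NodeTotal(1.5)])
print(merged.produced)  # 7.0
```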

2.5.2. Implementation of Spreading Mechanism

The spreading coordination law disseminates all present LSAs at the current node tuple space to all neighbouring nodes and repeats this operation at each execution cycle. Since all the nodes repeat the same operation every cycle, we can easily deduce that every LSA will eventually be routed to all the nodes in the cluster, in a direct or indirect way. For each LSA present in the tuple space, the process for spreading LSAs involves the following two steps:
  • It identifies the addresses to target; these are the addresses of the connected nodes where the LSA has not yet been present.
  • For each address targeted, it duplicates the LSA and sends the copy to the target address (via TCP-IP).
Figure A5 illustrates an LSA spreading cycle performed by a node. The spreading eco-law sends the LSA to its direct neighbours (except for the originating node, which does not need to receive its own send). Each time an LSA is sent, the eco-law stores the various destination addresses in the synthetic LSA property “SENDING”. In this way, when the LSA is received another time by the same node, it will not be sent again to the same destinations, and the broadcasting process of this LSA will be stopped in this node. The physical sending process serialises the content of an LSA object and transmits it via the TCP/IP protocol. Then, when the LSA is received at a destination node, its content is deserialised by the LSA server of the destination node.
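The spreading step can be sketched as follows (an illustrative Python sketch written by us; pickle stands in for the actual object serialisation, and the LSA is reduced to a plain dictionary). The SENDING bookkeeping mirrors the description above: destinations already recorded are skipped when the LSA is received again.

```python
import pickle

def spread_lsa(lsa, neighbours, send_bytes):
    """Spread one LSA to directly linked neighbours, skipping the origin and any
    destination already recorded in its SENDING property."""
    already_sent = set(lsa.setdefault("SENDING", []))
    for target in neighbours:
        if target == lsa.get("origin") or target in already_sent:
            continue
        lsa["SENDING"].append(target)           # remember this destination in the LSA itself
        send_bytes(target, pickle.dumps(lsa))   # serialised copy, e.g., over a TCP-IP socket

# Toy transport: the "destination" just deserialises and prints what it receives.
def fake_send(target, payload):
    print(target, pickle.loads(payload))

spread_lsa({"origin": "N1", "value": 42}, ["N1", "N2", "N3"], fake_send)
```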

2.6. Using Gossip Pattern for Gossip Federated Learning

The generic aggregation mechanism therefore allows an aggregation to be applied to any class of objects, and to different properties of an LSA at the same time. In the context of gossip federated learning, this will involve aggregating objects containing learning models (used for federated learning aggregation) and node state instances (used to aggregate the 6 power variables at the microgrid level).

2.6.1. Template of a Generic Learning Model That Can Be Aggregated

We have modelled a common template (in the form of an abstract class and a generic interface) to define the common properties and services that must be included in any type of learning model implementation used by the learning digital twin. To date, we have implemented 2 types of models, which are Markov chains [38] and LSTM [39], but in the future this architecture will allow the integration of other types of models, by implementing the methods required in the common template. A concrete model will notably have to implement the train process and prediction of each power variable (including produced, requested, consumed, provided, missing, available). A model is structured into several sub-models, each of which is associated with one of the 6 power variables to predict. In the following, we use the term “sub-model” to designate the sub-part of the model dedicated to the prediction of a single variable. In each model implementation, the parameters of each sub-model are well separated. Table 8 and Table 9 list the main properties and services implemented in a “concrete” learning model:

2.6.2. Definition of Several Operators for Learning Model Aggregation

Figure 8 shows a view of the aggregation mechanism applied to one property. The aggregation method follows the common parameter signature shown. The learning model instances to aggregate correspond to the objects $obj_1, obj_2, \ldots, obj_n$ in this diagram, and the aggregation method is executed in the $obj_1$ instance (the local model). The result of the aggregation, $obj_{aggr}$, is then stored in $LSA_1$, which is the LSA containing the local model.
As represented in Figure 9, a model aggregation takes place in two stages:
  • First, for each power variable, we extract the associated sub-model and assign it coefficients (also called weights) according to the aggregation operator. A coefficient assigned to a model from one node defines the relevance of that model in relation to the models from other nodes. Each operator corresponds to a different approach and therefore to a different way of assigning the weighting coefficients. The obtained coefficients are then stored in the learning model instance.
  • Secondly, for each sub-model, we apply the linear combination based on the coefficients previously assigned to the entire model structure. This combination is applied to each sub-model, and each sub-model then applies it recursively to its data structures, layer by layer, matrix by matrix, or vector by vector, depending on how the layers are structured.
Figure 9. Aggregation of a learning model in two steps: (1) assignment of model coefficients according to the approach used (among the 4 shown in the figure, in 1a, 1b, 1c, 1d); (2) application of a linear combination of models using the assigned coefficients.
The first step (i.e., assigning sub-model weights) is implemented in the abstract learning model class, as this class contains enough common information and services to handle the various weight calculations. This has the advantage, on the one hand, of preserving a certain level of generalisation in the aggregation code and, on the other, of saving aggregation-related programming when implementing a new learning model.
The second step, which consists of applying a linear combination, is specific to each type of learning model. This is because each sub-part of a learning model's data structure is specific to that type of model and requires adaptation when computing a linear combination. It should be noted that, from an algorithmic point of view, this second step is relatively straightforward, as it simply applies linear combinations to objects formed from matrices and vectors of decimal numbers.
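This split between the two steps can be sketched as follows in Java; the class and method names are assumptions, and the operator bodies are only outlined here (they are detailed in the following sub-sections).

```java
import java.util.List;

/** Sketch of the two-stage aggregation split (illustrative names, not the platform's exact code). */
abstract class AggregatableModel {

    /** Step 1: assign one coefficient per model according to the chosen operator.
     *  In the design described above, this step is written once, in the abstract model class. */
    double[] assignCoefficients(List<AggregatableModel> models, String operator) {
        switch (operator) {
            case "sampling_nb":     return coefFromDatasetSizes(models);
            case "power_loss":      return coefFromLoss(models);
            case "min_loss":        return coefFromMinLoss(models);
            case "dist_power_hist": return coefFromProfileDistances(models);
            default: throw new IllegalArgumentException("Unknown aggregation operator: " + operator);
        }
    }

    /** Step 2: each concrete model type combines its own weight structures layer by layer. */
    abstract void applyLinearCombination(List<AggregatableModel> models, double[] coefficients);

    // Operator-specific coefficient computations (sketched in the following sub-sections;
    // in the actual design they can be concrete methods of the abstract class).
    abstract double[] coefFromDatasetSizes(List<AggregatableModel> models);
    abstract double[] coefFromLoss(List<AggregatableModel> models);
    abstract double[] coefFromMinLoss(List<AggregatableModel> models);
    abstract double[] coefFromProfileDistances(List<AggregatableModel> models);
}
```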
In what follows, we specifically describe each coefficient evaluation of each aggregation operator. Each of them is therefore treated in a “sub-block” following one of the 4 different approaches represented in Figure 9. Each sub-block calls a sub-method which implements the corresponding aggregation operator in the abstract class of learning models. When the time comes to implement a new aggregation approach, a new operator will be added to this class. For all the aggregation operators described in this section, we consider the common variables used for the calculation, which are listed in Table 10:

Quantitative Approach (“Sampling_nb”)

The principle is to assign a model a weight proportional to the number of samples it contains in its training dataset. The underlying idea is that a model trained on more samples should be more accurate. Compared to the other aggregation operators, this one is relatively simple, as it requires very little computation and no real-time evaluation. This purely quantitative approach may be considered a “raw” aggregation, because it does not take into account whether or not a neighbouring node has similar behaviours. As a result, this approach does not consider the relevance of each model in the aggregation. Equation (1) shows the calculation of this coefficient based on the dataset size:
$$coef_k = \frac{\mathrm{size}(dataset_k)}{\sum_{i \in K_N} \mathrm{size}(dataset_i)} \qquad (1)$$
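A minimal Java sketch of this operator, assuming the dataset sizes of the local and received models are already available as an array:

```java
/** Sketch of the "sampling_nb" operator: weights proportional to the training dataset sizes (Equation (1)). */
final class SamplingNbOperator {

    /** datasetSizes[k] = number of training samples held by model k (local model included). */
    static double[] coefficients(long[] datasetSizes) {
        long total = 0;
        for (long size : datasetSizes) total += size;
        double[] coef = new double[datasetSizes.length];
        for (int k = 0; k < datasetSizes.length; k++) {
            coef[k] = (double) datasetSizes[k] / total;   // normalised share of the total sample count
        }
        return coef;
    }
}
```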

Approach Based on Model Loss (Power_loss)

The principle is to evaluate in real time the degree of similarity between the prediction of a node learning model and the actual values contained in a validation dataset.
To apply this approach, we evaluate each model received from neighbouring nodes separately (as well as the local model). First, the local node generates predictions with each of these models on the local test dataset (found in the model instance, as described above). The results of each prediction are then compared with the observed states contained in this dataset. The cross-entropy distance, called “loss”, is then calculated between the predicted states and the observed states. This loss calculation is performed for each model, and the coefficient assignment applies a decreasing exponential to the calculated loss, as shown in Equation (2). In this way, the better a model performs, the smaller its loss and the greater its weight.
$$coef_k = \frac{10^{-LOSS_{k,N}}}{\sum_{i \in K_N} 10^{-LOSS_{i,N}}} \qquad (2)$$
Equation (3) shows the loss computation, which applies a cross-entropy (https://en.wikipedia.org/wiki/Cross-entropy, accessed on 14 April 2025) calculation to the following two vectors: the first input, $predictions_i$, corresponds to the classes predicted by $model_i$; the second input, $testdataset_N.true\_classes()$, corresponds to the true classes extracted from the local node's test dataset.
$$LOSS_{i,N} = \mathrm{cross\_entropy}\big(predictions_i,\; testdataset_N.\mathrm{true\_classes}()\big) \qquad (3)$$
The cross-entropy of two vectors p and q of the same size is calculated as follows:
$$\mathrm{cross\_entropy}(p, q) = -\sum_{x \in p.\mathrm{indexes}} p(x)\,\log\big(q(x)\big)$$
Equation (4) shows the prediction calculation performed by the evaluated model ($model_i$) using the local node's test dataset ($testdataset_N$), resulting in the output vector $predictions_i$.
$$predictions_i = model_i.\mathrm{compute\_predictions}(testdataset_N) \qquad (4)$$
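The following Java sketch puts Equations (2) to (4) together: it computes a cross-entropy loss for each evaluated model on the local test dataset and turns the losses into normalised weights. The clamping of predicted probabilities is an assumption added here to avoid log(0); none of this reproduces the platform's actual code.

```java
/** Sketch of the "power_loss" operator: weights decrease exponentially with the cross-entropy loss. */
final class PowerLossOperator {

    /** Cross-entropy between a true-class distribution p and a predicted distribution q. */
    static double crossEntropy(double[] p, double[] q) {
        double h = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0) {
                h -= p[x] * Math.log(Math.max(q[x], 1e-12));   // clamp to avoid log(0)
            }
        }
        return h;
    }

    /** losses[k] = LOSS_{k,N}: loss of model k evaluated on the local node's test dataset (Equation (3)). */
    static double[] coefficients(double[] losses) {
        double[] weights = new double[losses.length];
        double sum = 0.0;
        for (int k = 0; k < losses.length; k++) {
            weights[k] = Math.pow(10.0, -losses[k]);           // decreasing exponential, Equation (2)
            sum += weights[k];
        }
        for (int k = 0; k < weights.length; k++) weights[k] /= sum;  // normalise so the weights sum to 1
        return weights;
    }
}
```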

Approach Based on the Minimal Model Loss (“Min_loss”)

This approach, algorithmically close to the previous one, consists of selecting only the best-performing model, in the sense of minimising the loss function based on the cross-entropy between predictions and observed states. The underlying idea is that an aggregated model combining the different received models does not perform as well as the single model that minimises the loss value. As shown in Equation (5), only the model that outperforms the others is assigned a non-zero coefficient (1 in this case).
$$coef_k = \begin{cases} 1 & \text{if } k = k_{min} \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
Equation (6) shows the calculation of the index that minimises the cross-entropy between the predictions computed by $model_i$ and the true classes extracted from the test dataset. The cross-entropy resulting from each $model_i$ instance is calculated in the same way as for the ‘power_loss’ variant (see Equation (3)).
$$k_{min} = \underset{i \in K_N}{\arg\min}\;\Big\{ \mathrm{cross\_entropy}\big(predictions_i,\; testdataset_N.\mathrm{true\_classes}()\big) \Big\} \qquad (6)$$

Approach Based on the Power Profile Similarities

We define the power profile as the history of power values contained in the dataset for the considered variable. This approach focuses on the similarity of the different states taken by the variable to be predicted. It assumes that a model coming from a node with a more similar power profile should be more relevant, irrespective of the actual performance that this model could achieve on the local node's data. From an algorithmic point of view, this approach requires less computation and time, because it only compares recent state histories (contained in the learning model) and does not require evaluating models in real time. This distance calculation is performed for each node profile, and the coefficient assignment applies a decreasing exponential to the calculated distance, as shown in Equation (7). Thus, the more similar the profile of a neighbouring node, the smaller the distance and the greater its weight.
$$coef_k = \frac{10^{-DIST_{k,N}}}{\sum_{i \in K_N} 10^{-DIST_{i,N}}} \qquad (7)$$
Equation (8) shows the calculation of the profile distance $DIST_{i,j}$ between two nodes $i$ and $j$; this is the average of the differences in power values between $node_i$ and $node_j$, obtained for each instant having a corresponding power value in the datasets of both $node_i$ and $node_j$.
$$DIST_{i,j} = \frac{1}{\mathrm{size}(time\_indexes_{i,j})} \sum_{time \in time\_indexes_{i,j}} \big|\, dataset_i.\mathrm{value}(time) - dataset_j.\mathrm{value}(time) \,\big| \qquad (8)$$
In Equation (9), the set of time indexes $time\_indexes_{i,j}$ corresponds to the intersection of the indexes of $dataset_i$ and $dataset_j$: it contains all the indexes for which a power value is present in both datasets. This set is used in Equation (8) to collect the power values involved in the profile distance $DIST_{i,j}$ between $node_i$ and $node_j$.
$$time\_indexes_{i,j} = dataset_i.\mathrm{indexes} \,\cap\, dataset_j.\mathrm{indexes} \qquad (9)$$
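A possible Java implementation of Equations (7) to (9), under the assumption that each dataset is exposed as a map from time index to power value; the class and method names are illustrative.

```java
import java.util.Map;

/** Sketch of the profile-similarity operator: weights decrease exponentially with the profile distance. */
final class PowerProfileOperator {

    /** Equation (8): mean absolute difference over the time indexes present in both datasets (Equation (9)). */
    static double profileDistance(Map<Long, Double> datasetI, Map<Long, Double> datasetJ) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> entry : datasetI.entrySet()) {
            Double valueJ = datasetJ.get(entry.getKey());      // keeps only the intersection of the index sets
            if (valueJ != null) {
                sum += Math.abs(entry.getValue() - valueJ);
                count++;
            }
        }
        return count == 0 ? Double.MAX_VALUE : sum / count;
    }

    /** Equation (7): normalised decreasing exponential of the distances to the local node N. */
    static double[] coefficients(double[] distancesToLocalNode) {
        double[] weights = new double[distancesToLocalNode.length];
        double sum = 0.0;
        for (int k = 0; k < weights.length; k++) {
            weights[k] = Math.pow(10.0, -distancesToLocalNode[k]);
            sum += weights[k];
        }
        for (int k = 0; k < weights.length; k++) weights[k] /= sum;
        return weights;
    }
}
```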

Possible Improvements for Aggregators

We have implemented and tested 4 different aggregation operators applied to learning models, but it is always possible to develop a new operator better adapted to a given situation and thereby improve the accuracy obtained; the generic aggregation mechanism makes it easy to integrate such a new operator, which is an advantage. However, the current implementation assigns the same aggregator to all 6 variables and does not allow a different aggregator per variable. For example, a given aggregation operator may be effective for consumption but much less so for production; the ability to choose the most appropriate aggregator for each variable would further improve overall accuracy.

2.6.3. Adaptations Made to Reduce the Bandwidth Used

During our experiments, we observed that spreading learning model objects between nodes consumes a lot of bandwidth. On the one hand, each learning model object is large on its own (it has a complex structure and a large volume of data); on the other hand, the aggregation and spreading mechanisms working together tend to multiply the number of objects sent between nodes, even if not all nodes directly receive data from all their neighbours. This number tends to grow polynomially as the number of nodes and links increases. As a result, this large flow of data can quickly overload the available bandwidth, and in some cases we observed delays in data reception due to spreading, which are detrimental to energy exchange, among other things. To overcome this problem, we made the adaptations described below; some affect the spreading frequency, while others reduce the size of the sent objects.

Adapting the Spreading Frequency of Learning Models

To reduce the volume of data exchanged, we first reduced the spreading frequency of the learning models by ceasing to submit their content to the LSA (for spreading) when this is not necessary. The idea is to set the gossip frequency in the launch parameters of the coordination platform; this parameter is added to the aggregation configuration of each learning model (the one applied to the node state prediction and the one applied to the cluster state prediction). It indicates the minimum number of minutes between two aggregations. After each aggregation, a node blocks the dissemination of its learning model by simply removing the property concerned from its LSA. It then waits until the last model update date is later than the last aggregation date plus N minutes (N being the value of the minimum waiting-time parameter between 2 aggregations). It should be noted that not only is spreading unnecessary when the model has not been updated, but the optimum prediction accuracy is not necessarily obtained when the aggregation frequency is at its maximum. We could go even further in this direction by preventing aggregation as long as there is no significant cumulative change in the local model since the last aggregation, compared with the modifications caused by that aggregation.
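The gating condition can be sketched as follows; the class name and fields are illustrative, not the platform's actual configuration objects.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of the gossip-frequency gate: block model spreading until the minimum delay has elapsed. */
class ModelSpreadingGate {
    private final Duration minDelayBetweenAggregations;   // the "N minutes" launch parameter
    private Instant lastAggregationTime = Instant.EPOCH;

    ModelSpreadingGate(int minutesBetweenAggregations) {
        this.minDelayBetweenAggregations = Duration.ofMinutes(minutesBetweenAggregations);
    }

    /** The model is (re)submitted to the LSA only if it was updated after lastAggregation + N minutes. */
    boolean shouldSubmitModel(Instant lastModelUpdateTime) {
        Instant earliestNextAggregation = lastAggregationTime.plus(minDelayBetweenAggregations);
        return lastModelUpdateTime.isAfter(earliestNextAggregation);
    }

    /** Called once an aggregation has been performed; the MODEL property is then removed from the LSA. */
    void onAggregationPerformed(Instant when) {
        lastAggregationTime = when;
    }
}
```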

Synchronisation of Spreading Start-Ups

At this point, a node does not submit its model immediately, as the other nodes may not yet be ready to send theirs: the idea is to ensure that all the direct neighbours are also ready to start aggregation. Synchronising the spreading in this way avoids sending voluminous data unnecessarily. At this stage, the node indicates to the others that it is ready by submitting the indicator in its “MODEL_READY” LSA property. Aggregation with the sum operator is applied to this property, so that all nodes know the exact number of nodes ready to propagate their model. A node can then send its model as soon as all neighbouring nodes are ready (e.g., when the number of ready nodes equals the number of direct neighbours). This confirmation reaches all the nodes at approximately the same time, to within a few seconds, depending on the distance between the nodes in terms of the minimal number of links.
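A minimal sketch of this synchronisation logic, under the assumption that the aggregated “MODEL_READY” counter is simply compared with the number of direct neighbours:

```java
/** Sketch of the spreading start-up synchronisation based on the "MODEL_READY" property. */
class SpreadingSynchroniser {

    /** Each node contributes 1 to the "MODEL_READY" property, which is aggregated with the sum operator. */
    int publishReadyFlag(int aggregatedReadyCount) {
        return aggregatedReadyCount + 1;   // the local node declares itself ready
    }

    /** A node releases its (voluminous) model once the count reaches the number of its direct neighbours. */
    boolean canStartSpreading(int aggregatedReadyCount, int directNeighbourCount) {
        return aggregatedReadyCount >= directNeighbourCount;
    }
}
```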

Using a Compression Mechanism to Reduce the Size of Objects Exchanged

Another way we reduce bandwidth is to use a more “compact” class to exchange learning model weights between nodes. This class uses a much simpler, more elementary structure so that it occupies less memory once the object is serialised for spreading. The compacted form of a model contains the entire hierarchy of weight matrices in “flattened” form: it includes a hash table whose keys identify each matrix and whose values contain the compressed contents of the matrix in the form of a buffer. The compacted class is used only for data exchange between nodes and cannot be aggregated directly. The principle is to be able to compact or de-compact the contents of such an object at any time. Figure A6 shows the aggregation mechanism applied to a set of compacted models: in a first step, each received model is decompressed (as the compressed form cannot be aggregated directly), and in a second step, the aggregation is performed on the “complete” models. Conversely, before the learning digital twin submits its learning model to its LSA, it calls the compression operator (converting the complete form to the compressed form).
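The compacted form can be sketched as follows; java.util.zip is used here as a stand-in for whatever compression codec the platform actually relies on, and the matrix key format is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch of a "compacted" model: each weight matrix is flattened and compressed into a byte buffer. */
class CompactedModel {
    // Key identifies a matrix (e.g. "consumed/layer0/kernel"); value is its compressed content.
    final Map<String, byte[]> compressedMatrices = new HashMap<>();

    /** Flatten a weight matrix into bytes and compress it before spreading. */
    static byte[] compress(double[] flattenedMatrix) {
        ByteBuffer raw = ByteBuffer.allocate(flattenedMatrix.length * Double.BYTES);
        for (double w : flattenedMatrix) raw.putDouble(w);
        Deflater deflater = new Deflater();
        deflater.setInput(raw.array());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) out.write(chunk, 0, deflater.deflate(chunk));
        return out.toByteArray();
    }

    /** Recover the "complete" matrix so that it can be aggregated. */
    static double[] decompress(byte[] compressed, int matrixLength) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[matrixLength * Double.BYTES];
        int offset = 0;
        while (!inflater.finished() && offset < raw.length) {
            offset += inflater.inflate(raw, offset, raw.length - offset);
        }
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        double[] matrix = new double[matrixLength];
        for (int i = 0; i < matrixLength; i++) matrix[i] = buffer.getDouble();
        return matrix;
    }
}
```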

Synthesis

Table 11 summarises the different adaptations made to reduce bandwidth. The ‘Usages’ column shows that 2 of these adaptations (synchronisation of spreading and object compression) are applied only to learning models: indeed, learning model objects tend to use more bandwidth than prediction objects.
During the experiments, we found that these three adaptations enabled us to overcome the difficulties associated with the physical limitations of the data network. To achieve this, we looked for a good compromise between the sending frequency and the stability of the communication between the nodes. The compression mechanism requires more processing, as it involves two object conversions: compression when sending to the tuple space and decompression when retrieving from it. For this reason, we only use it when necessary, i.e., for learning models. As mentioned above, we could further reduce the aggregation frequency by checking that there is a significant cumulative change in the local model since the last aggregation, compared with the modifications caused by that previous aggregation.

2.7. Using Gossip Pattern for Decentralised Ensemble Learning

In Section 2.6, we presented the use of the gossip mechanism provided by the coordination model to implement the gossip federated learning approach. In a similar way, this section describes the use of the gossip mechanism to implement the gossip ensemble learning approach.

2.7.1. Implementation of Aggregation on Prediction Data

The ensemble learning approach, as defined in Definition 12, consists of aggregating the predictions received from a set of nodes. Given the uniqueness of the class modelling predictions (as opposed to the classes modelling learning models) and the relatively simple structure of a prediction result, the aggregation implementation is simplified. In addition, a prediction object is much smaller than a learning model, so there is no need to implement a data compression (and decompression) mechanism to send it to the tuple space.

Structure of Predictions

The structure is defined in the PredictionData class, which includes a mapping table containing one prediction result (modelled by the PredictionResult class) for each horizon instant and each variable name (the 6 variables which characterise the power state). Each prediction result contains the starting state and the list of transition probabilities to the different possible states (as a list of real numbers). Aggregating a prediction then amounts to applying the aggregation to each prediction result contained (in a PredictionResult instance) for each horizon date and each variable. Like the learning model instances, a prediction instance also contains the recent history of the node's state; this data is used by the aggregator based on similarities in power profiles.
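A simplified Java sketch of these two classes and of the aggregation of a single prediction result; the field names and the helper method are assumptions made for illustration, not the actual class definitions.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch of a prediction result (field names are illustrative). */
class PredictionResult {
    String startingState;              // state observed when the prediction was generated
    double[] transitionProbabilities;  // probability of reaching each possible state at the horizon

    /** Aggregating one result boils down to a weighted sum of the probability vectors. */
    static double[] aggregate(double[][] probabilityVectors, double[] coefficients) {
        double[] aggregated = new double[probabilityVectors[0].length];
        for (int k = 0; k < probabilityVectors.length; k++) {
            for (int state = 0; state < aggregated.length; state++) {
                aggregated[state] += coefficients[k] * probabilityVectors[k][state];
            }
        }
        return aggregated;
    }
}

/** One PredictionResult per (variable name, horizon instant) pair, for the 6 power variables. */
class PredictionData {
    final Map<String, Map<Long, PredictionResult>> resultsByVariableAndHorizon = new HashMap<>();
    final Map<Long, Double> recentStateHistory = new HashMap<>();  // used by the profile-based aggregator
}
```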

Definition of Several Operators for Prediction Aggregation

In the same way as for a learning model aggregation, the aggregation method of a prediction follows the parameter signature shown in Figure 8. As represented in Figure A8, a prediction aggregation also takes place in two stages:
  • First, for each power variable, we extract the associated prediction results and assign a coefficient (also called a weight) to each of them according to the aggregation operator. A coefficient assigned to a prediction result from one node defines its relevance relative to the prediction results from other nodes.
  • Secondly, for each prediction result, we apply the linear combination based on the coefficients previously assigned to the entire prediction structure. This combination is applied to each prediction result, which amounts to applying the linear combination to the probability vectors (stored as arrays of real numbers).
In what follows, we describe the coefficient evaluation of each aggregation operator; each is treated in a sub-block following one of the 3 approaches represented in Figure A8 (a sketch of the third one is given below). The quantitative and ‘power profile’ approaches are already used in gossip federated learning; they are explained in Section 2.6.2. Regarding the quantitative approach, we should note that the dataset size comes from the model that generated the prediction; this information is simply copied into the prediction result when it is generated. The ‘last power’ approach is quite close to the ‘power profile’ approach; the underlying idea is that a prediction from a node with a more similar power value should be more accurate. This approach assumes that the last power value is the most relevant representation of a node's profile, which implies that it is not necessary to go through the history of the node's states. Unlike the approach based on the power profile, the distance between two nodes only considers the most recent power state (i.e., the last recorded one).
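A minimal sketch of the ‘last power’ distance, for contrast with the profile-based distance of Equation (8); the names are illustrative.

```java
/** Sketch of the "dist_power_last" operator: distance based only on the most recent power value. */
final class LastPowerOperator {

    /** Distance between node k and the local node, using only the last recorded power states. */
    static double lastPowerDistance(double lastPowerOfNodeK, double lastPowerOfLocalNode) {
        return Math.abs(lastPowerOfNodeK - lastPowerOfLocalNode);
    }
    // The coefficients are then obtained with the same normalised decreasing exponential
    // as for the power-profile operator (see Equation (7)).
}
```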

3. Results

3.1. Prediction Assessment Method

To assess the impact of the gossip mechanism on the quality of predictions, we run the coordination model with the living-lab scenario, which corresponds to actual household production and consumption measured in the Les Vergers eco-district located near Geneva. The principle of this evaluation is to compare the state predicted at $t_1$ with the actual state of the node at the prediction target time ($t_2 = t_1 +$ prediction horizon). The learning digital twin launches the prediction calculations and stores the results in the database. Each prediction record also includes the actual state at the horizon, which is completed a posteriori by the learning twin after the horizon time ($t_2$); this state corresponds to the state observed at the horizon ($t_2$).
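The assessment principle can be summarised by the following sketch, in which a prediction record pairs the state predicted at $t_1$ with the state observed a posteriori at $t_2$; the record structure is an assumption made for illustration.

```java
import java.util.List;

/** Sketch of the accuracy assessment: compare the state predicted at t1 with the state observed at t2. */
final class PredictionAssessment {

    static final class PredictionRecord {
        final String predictedState;          // state predicted at t1 for the horizon t2
        final String observedStateAtHorizon;  // state actually observed at t2, filled in a posteriori
        PredictionRecord(String predicted, String observed) {
            this.predictedState = predicted;
            this.observedStateAtHorizon = observed;
        }
    }

    /** Accuracy = share of records whose predicted state matches the state observed at the horizon. */
    static double accuracy(List<PredictionRecord> records) {
        if (records.isEmpty()) return 0.0;
        long correct = records.stream()
                .filter(r -> r.predictedState.equals(r.observedStateAtHorizon))
                .count();
        return (double) correct / records.size();
    }
}
```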
We carried out comparative tests over the same time slot, between 9 a.m. and 1 p.m. on the same day, so that the different variants were tested with the same level of difficulty. At this time of day, the prediction difficulty is medium: there is some solar energy production, with some volatility, but not as much as in the afternoon. For the network configuration, we used chained network topologies with 2 and 4 nodes to limit traffic between nodes. We also compare each aggregation operator, as well as the version without aggregation, which corresponds to local learning (in this case, the aggregator is designated ‘None’ in the results tables). We have integrated the Markov chains and LSTM models, which operate very differently. The Markov chains model was integrated first, as it is relatively easy and quick to implement. This stochastic model uses transition matrices (one per hour) to define the transition probabilities of all possible states. It is suitable for time series that fluctuate over time, thanks to the use of a sliding window (set to 100 days). The LSTM model was added later; it appears to be a good alternative to the Markov chains model. LSTM, which belongs to the family of recurrent neural networks, can memorise both short-term and longer-term behaviour and is known to perform very well when learning electrical power time series. The LSTM instances we have integrated use one epoch, a batch size of 32, and a learning rate of $10^{-3}$ for the first training and $10^{-4}$ for the subsequent training sessions, which are iterative.

3.2. Description of the Experiment

The principle of the experiment is to simulate energy exchanges between prosumer digital twins, based on a real power history that corresponds to the power values collected in the Les Vergers living lab. The dataset used corresponds to the electrical power data collected by the smart meters in Les Vergers since May 2022. The start date of the scenario is chosen arbitrarily as a fictitious date that must fall within the range of the data collected (between 1 May 2022 and today; for example, 15 January 2023 in the experiment).
We use a cluster of 4 nodes with a chained graph topology (shown in Figure A4), where a node corresponds to a building in the living lab’s eco-district. At start-up, the dataset is initialised with the power values collected in the living lab, and then the coordination platform simulates the energy exchanges between the prosumer twins. The 6 power variables attached to each node vary according to the new energy supplies generated by prosumer twins (and the same applies to the 6 variables attached to the cluster level). These new samples of power values are added to the model datasets every minute, and the learning models are re-trained (depending on the frequency chosen for the learning model type), taking these new samples into account. Table 12 shows the different parameters defined for each type of learning model (Markov chains and LSTM).

3.2.1. Initial Start-Up Training

For the first training of the learning models, each node's coordination platform retrieves, from the living lab dataset, the data corresponding to its assigned node over the few days (generally 7) preceding the fictitious starting date and time. Given that values are collected every minute by the smart meters at each location, this gives 1440 records per day, per node and per variable, i.e., more than 60,000 records for one week and one node across the 6 variables. This data is then transmitted to the learning models. The time taken for initial learning varies greatly depending on the learning model used: a few seconds for the Markov chains model, or more than one minute for LSTM (for all 6 variables of one node). This learning process can be carried out in parallel for each node.

3.2.2. Incremental Training

The coordination platforms then iteratively re-train their models at regular intervals to take account of new observations. LSTM models are much more computationally intensive; as a result, they are re-trained less frequently (every 20 min) than Markov models, which are re-trained every time new data is received (every minute). We have assigned gossip frequencies that are less than or equal to the re-training frequencies: every 5 min for Markov chains and every 20 min for LSTM. A higher frequency would risk sending the same version of a model more than once, which would use up bandwidth unnecessarily. In this study, we have not yet explored the impact of drastically reducing this frequency on the accuracies obtained by the models.

3.3. Assessing Gossip Federated Learning

Table 13 displays the average accuracies obtained using the Markov chains and LSTM models, broken down by aggregator and by prediction perimeter (node/cluster). In addition, we identified the non-trivial prediction periods, for which the horizon falls within a period of higher volatility and therefore of increased entropy of the state variables (which increases the prediction difficulty). In the results, we recalculated the accuracies by keeping only the non-trivial predictions (see the ‘Reliability Non-Trivial’ columns).

3.3.1. Node State Predictions

Figure 10 and Figure 11 show the accuracies obtained in the form of histograms, only for the prediction applied to the local node perimeter and for each model separately (Markov chains/LSTM).

Using the Markov Chains Model

With the Markov chains model, the accuracies obtained ranged from 78% to 88%, with an average of 83.5%. The min_loss aggregator tends to outperform the other aggregators as well as local learning, while the sampling_nb aggregator tends to underperform local learning. Given that the difference in accuracy obtained between min_loss and sampling_nb is greater than 10%, these results highlight the importance of the choice of aggregation operator used. This also confirms the idea that the sampling_nb aggregator only considers the quantitative aspect of a model (since it is based on the size of its dataset) whereas the min_loss aggregator considers the real-time performance of models on a local dataset. The power_loss aggregator, which is also based on model performance, outperforms local learning, but to a lesser extent. In this experiment, selecting a single high-performance model rather than combining several proved to be more efficient. It should be noted that increasing accuracy comes at a cost in terms of CPU and memory usage, as the min_loss and power_loss aggregators are very computationally intensive (each model received is re-evaluated at each aggregation).
For non-trivial predictions, we can observe a significant decrease in accuracy. In node-level predictions, the proportion of trivial predictions is high (close to 50%) and, given that entropy increases considerably in these non-trivial prediction ranges, this tends to drastically reduce the accuracy obtained.

Using the LSTM Model

With the LSTM model, the accuracies obtained are between 93% and 96%, with an average of 94.78%. In general, we observe significantly better performance, including for local prediction, which clearly confirms the effectiveness of the LSTM model compared with Markov chains for node state predictions. This is consistent with LSTM often being cited as one of the most suitable models for predicting behaviour that fluctuates significantly over time, such as electricity production; its ability to take into account both short-term and longer-term behavioural memory may explain this efficiency. Furthermore, we note that the relative performance ranking of the aggregators is reversed: only the dist_power_hist aggregator, which is based on node history similarity, outperforms local learning. It is also worth noting that the accuracies obtained are less disparate than those obtained with Markov chains. Most of the aggregators actually reduce accuracy; this can be explained by differences in behaviour between nodes (for example, some nodes have no producers while others produce energy). When aggregating models, a model from a neighbour with quite different behaviour is of little relevance. This also explains why the dist_power_hist aggregator, which is based precisely on the similarity of historical behaviours, tends to outperform the others.

3.3.2. Cluster State Predictions

Figure 12 displays the histograms of accuracy obtained with predictions applied to the cluster perimeter. It should be noted that all cluster predictions are non-trivial. This can be explained by the fact that the cluster state, which aggregates all the node states, tends to be more volatile than the states of the most stable nodes (generally nodes with zero production, since they have no attached producers).
We observe a significant difference between the two models (on average, LSTM outperforms Markov chains by +13% in accuracy). Local learning has the lowest accuracy; all aggregators thus tend to improve accuracy. This can easily be explained by the similarity of the behaviour to be predicted across nodes: each node tries to predict the state of the cluster, which is common to all nodes (each belonging to the same cluster). All the nodes therefore predict the same behaviour, even though each node has a different sample of data. For both models (Markov chains and LSTM), the aggregators show very little difference between them. The min_loss aggregator slightly exceeds dist_power_hist with the LSTM model, and vice versa with the Markov chains model. Given that they provide equivalent accuracy, we tend to prefer the dist_power_hist aggregator, which requires much less computation than min_loss.

3.4. Assessing Gossip Ensemble Learning

In the same way as for the evaluation of gossip federated learning, we used the living lab scenario, in the same time slot between 9 a.m. and 1 p.m. and using a chain network topology with 2 and 4 nodes. The cluster of nodes uses different types of learning models to reproduce a relevant case study of ensemble learning; half of the nodes use the Markov chains model, while the other half use the LSTM model. Table 14 shows the average accuracies obtained, for each perimeter (cluster/node) and for each aggregator.

3.4.1. Comparison with the Gossip Federated Learning Approach

Figure 13 depicts the results obtained with the ensemble learning approach for cluster predictions (in orange), compared with the gossip federated learning (GFDL) approach applied with the Markov chains model and with the LSTM model. The ensemble learning approach did not allow us to implement as many aggregators, so only 2 of them are compared here, sampling_nb and dist_power_hist, alongside the local learning version, which uses no aggregator. The average obtained for ensemble learning is 86.26%, which is significantly higher than that obtained for GFDL with the Markov chains model, but slightly lower than that obtained for GFDL with LSTM. The same cannot be said for the node perimeter predictions (see Figure 14), for which the average obtained (around 85%) barely exceeds that of GFDL with Markov chains and is far below that of GFDL with LSTM.
The accuracies of node-level predictions displayed in Figure 14 confirm that the ensemble learning approach performs less well when each node tries to predict more heterogeneous behaviours, which is the case for node perimeter predictions. This drop is mainly due to the poor results (around 65%) obtained by the sampling_nb aggregator, which is based on the size of the training datasets. Indeed, some variables, such as the power produced, differ hugely from one node to another: it therefore makes no sense to aggregate their predictions according to the dataset sizes. The negative impact of this aggregator is even more direct and significant than for the GFDL approach, which aggregates model weights instead of prediction results; aggregating a prediction result has a more direct impact on accuracy. The impact is so great that even non-trivial predictions, which are less affected by these differences in behaviour between nodes, obtain better results.

3.4.2. Comparing the Performance of Each Aggregator

Figure 15 depicts the comparison of each aggregator used in the ensemble learning approach.
We observed that, for cluster perimeter predictions, the 3 aggregators clearly tend to improve the accuracy obtained, which is not the case for node perimeter predictions, where only the dist_power_last aggregator performs slightly better than local learning. As with federated learning, this can be explained by the increased relevance of knowledge sharing when the different nodes predict the same behaviour. The two aggregators dist_power_hist and dist_power_last are based on similarities between electrical states, but the first computes the difference by averaging over the recent history, whereas the second relies solely on the last recorded state. We might expect the first to perform better, but the results show that the one based on the latest state performs slightly better. This may reflect a high level of volatility in the state variables, making the most recent states more relevant than older ones. As we saw in Section 3.4.1, the sampling_nb aggregator, which is based on the size of the training dataset, is less effective than the other two. However, this difference is smaller for the cluster perimeter, as it involves predicting identical behaviour, which is less the case for the node perimeter.
More generally, these results confirm that the ensemble learning approach is a good alternative to gossip federated learning, in that it provides comparable (or almost comparable) accuracy while offering greater integration flexibility and reducing the use of computing resources and bandwidth. This approach brings greater gains for cluster-level predictions, which all relate to identical behaviour.

4. Discussion

First, we observed a significant difference in accuracy between the Markov chains and LSTM classifiers, independent of the use of the gossip mechanism; Figure 12 shows this very clearly. However, this very significant accuracy gain brought by LSTM is not free: LSTM consumes much more memory and computing time and requires more integration effort to be callable by the learning digital twins running in the coordination platform (see Table A1). One might think that the gossip ensemble learning approach could alleviate the poorer performance of Markov chains by mixing predictions, but the same experiment without any LSTM node gives worse results. Since we can adjust the re-training and gossip frequencies, we can limit the resource over-consumption of LSTM; as a result, using LSTM remains largely beneficial.
These results also show that the relative performance of an aggregator fluctuates greatly depending on the learning model used, the model hyper-parameters, the prediction perimeter, and other parameters linked to the experimental context. This confirms the need to carefully test and compare the impact of each aggregator on accuracy when implementing a new model or modifying the perimeter or certain hyper-parameters. Depending on the situation and the model used, gossip does not bring systematic gains, but the choice of an appropriate aggregator is decisive in improving accuracy. For example, changing the learning model or extending the prediction perimeter to the entire cluster rather than just the local node tends to upset the relative performance of each aggregator. In all the test cases, at least one of the implemented aggregators outperformed local learning, and depending on the situation, the benefits of the most appropriate aggregator can be considerable.
More generally, we observed that the gain from gossip is more systematic when the nodes are trying to predict similar behaviour (i.e., behaviour at cluster level in the context of this study), which confirms the added value of the gossip mechanism in such a case. Furthermore, the relative gains linked to gossip are greater overall for LSTM than for Markov chains: the gossip mechanism barely compensates for the difficulties encountered by the Markov chains model and further accentuates the relative performance of LSTM. Choosing the right model and the right aggregator is therefore doubly beneficial. The gossip ensemble learning variant that we implemented is an equally interesting alternative: it provides comparable (or almost comparable) accuracy, while imposing far fewer integration constraints. Similarly, the gains associated with gossip depend on the similarities between behaviours and may be greater when the basic learning model tends to be more accurate even without gossip.

5. Conclusions

In this study, we have integrated into a coordination model a gossip federated learning mechanism, with an aggregator that the client layer can define on the fly. To achieve this, we implemented the bio-inspired gossip pattern [36] as a fully generic mechanism that can be applied to any class of object and to any aggregation operator defined in the aggregated object's class. We applied this new gossip mechanism to the learning model weights to integrate the gossip federated learning approach, in which each node of a microgrid participates not only in learning but also in the distribution of model weights to the surrounding nodes. We also integrated the gossip ensemble learning approach by applying the same mechanism to the prediction results generated by the microgrid nodes. Our tests with a multi-node cluster confirmed the resilience of the gossip approaches and the improved prediction accuracy when an appropriate aggregator is chosen.
Although this work has addressed several aspects of the gossip federated learning approach, we have identified other aspects that have not yet been considered or that can be developed further to meet certain recurring needs. Future work could focus on the following improvements:
Firstly, we could enrich the learning models by incorporating meteorological variables into their features.
Regarding the implementation of the generic class of learning models, we could define a distribution of classes specific to each variable. This would allow the classifiers to better adapt to the distribution of each variable and therefore improve the results obtained. We could also allow a different aggregator to be assigned to each of the 6 defined variables, which would maximise the accuracy obtained for a specific variable; for example, the power produced could have a different aggregator from the power consumed, which is not currently possible.
Regarding the experiments, we could carry out more extensive comparative tests by modifying the gossip frequency applied to the learning models and checking whether certain values improve the performance already obtained. In the same direction, we could also implement and test a variant that further reduces the gossip frequency by applying it only when the latest changes in the model weights (or prediction results) since the last application of gossip become significant. In addition, we could test the gossip mechanism with other types of learning models used for microgrids. Furthermore, we could use other real datasets to test the models on other case studies and highlight the influence of each aggregator on the new results obtained.
Finally, we could test the model on a larger scale in an architecture comprising at least several dozen nodes.

Author Contributions

P.G. has designed and implemented the algorithms, scenarios, and needed databases. G.D.M.S. has conceived the idea of using a coordination platform and digital twins to manage gossip federated learning and gossip ensemble learning. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the LASAGNE project funded by ERA-NET 108767, which involves various stakeholders from Sweden and Switzerland such as living labs, energy suppliers, smart meter suppliers, digital platform providers, municipalities, and academic institutions.

Data Availability Statement

Restrictions apply to the availability of these data. Data was obtained from HES-SO Genève and is available from the authors with the permission of HES-SO Genève and the municipality of Meyrin, where the living lab “Les Vergers” is located.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that might appear to influence the work reported in this document.

Abbreviations

The following abbreviations are used in this paper:
EL      Ensemble Learning
FDL     Federated Learning
GED     Grid Edge Device
GEL     Gossip Ensemble Learning
GFDL    Gossip Federated Learning
kWh     kilowatt-hour
LSA     Live semantic annotation (see Definition 6)
LSTM    Long Short-Term Memory
SAPERE  Self-Aware Pervasive Service Ecosystems

Appendix A

Appendix A.1. Federated Learning Approach

Appendix A.1.1. Centralised Federated Learning Approach

Figure A1. Centralised federated learning approach, inspired from [14]: the central server, represented in the middle, manages the distribution and collection of model weights that the subordinate nodes (also called “clients”) update. Each client node is represented by a different colour.

Appendix A.1.2. Decentralised Federated Learning Approach Based on Gossip Mechanism

Figure A2. Decentralised federated learning approach, inspired from Liu et al. [16]. Each node is represented by a different colour; the long dotted arrows correspond to the retraining of the node's local model, while the short dotted arrows correspond to the diffusion of the local model to neighbouring nodes.

Appendix A.2. Bio-Inspired Design Patterns

Figure A3. Description and composition of bio-inspired design patterns according to Fernandez-Marquez et al. [36].
Figure A4. Chained graph topology used in the experimentation.

Appendix A.3. Spreading of LSAs

Figure A5. LSA spreading to direct neighbour nodes.
Figure A6. Aggregation of compacted learning models. This operation is carried out in three stages: (1) unpacking the compacted models received, (2) aggregating the “complete” models, (3) compacting the aggregated model.
Table A1. Comparison of Markov chains and LSTM. (*): The last column shows the possibilities of reducing consumption in terms of resources and computation by adjusting the training frequency.

| Learning Model | Accuracy | Computing Time | Resource Consumption | Integration Effort | Possibility to Reduce Consumption (*) |
|---|---|---|---|---|---|
| Markov Chains | 80.26% | Very low (few seconds) | Very low | Low | Yes |
| LSTM | 92.85% | High (few minutes) | High | High | Yes |
Figure A7. Calculation of the cluster total by aggregating the totals obtained for each node. This figure shows only the two power variables “produced” and “consumed” out of the 6 power variables.

Appendix A.4. Aggregation of a Prediction

Figure A8. Aggregation of a prediction in two steps: (1) assignment of prediction coefficients according to the approach used (among the 3 shown in the figure, in 1a, 1b, 1c) and (2) application of a linear combination of predictions using the assigned coefficients.

References

  1. Holjevac, N.; Capuder, T.; Kuzle, I. Adaptive control for evaluation of flexibility benefits in microgrid systems. Energy 2015, 92, 487–504. [Google Scholar] [CrossRef]
  2. Sayed, E.T.; Olabi, A.G.; Alami, A.H.; Radwan, A.; Mdallal, A.; Rezk, A.; Abdelkareem, M.A. Renewable energy and energy storage systems. Energies 2023, 16, 1415. [Google Scholar] [CrossRef]
  3. Marzal, S.; Salas, R.; González-Medina, R.; Garcerá, G.; Figueres, E. Current challenges and future trends in the field of communication architectures for microgrids. Renew. Sustain. Energy Rev. 2018, 82, 3610–3622. [Google Scholar] [CrossRef]
  4. Zhang, Y.; Huang, T.; Bompard, E.F. Big data analytics in smart grids: A review. Energy Inform. 2018, 1, 8. [Google Scholar] [CrossRef]
  5. Zuo, K.; Wu, L. A review of decentralized and distributed control approaches for islanded microgrids: Novel designs, current trends, and emerging challenges. Electr. J. 2022, 35, 107138. [Google Scholar] [CrossRef]
  6. Liu, J.; Xiao, Y.; Li, S.; Liang, W.; Chen, C.P. Cyber security and privacy issues in smart grids. IEEE Commun. Surv. Tutor. 2012, 14, 981–997. [Google Scholar] [CrossRef]
  7. Ciatto, G.; Mariani, S.; Di Marzo Serugendo, G.; Louvel, M.; Omicini, A.; Zambonelli, F. Twenty years of coordination technologies: COORDINATION contribution to the state of art. J. Log. Algebr. Methods Program. 2020, 113, 100531. [Google Scholar] [CrossRef]
  8. Ben Mahfoudh, H.; Di Marzo Serugendo, G.; Naja, N.; Abdennadher, N. Learning-based coordination model for spontaneous self-composition of reliable services in a distributed system. Int. J. Softw. Tools Technol. Transf. 2020, 22, 417–436. [Google Scholar] [CrossRef]
  9. Abdennadher, N. Eranet-Smartenergysystems.eu. Available online: https://eranet-smartenergysystems.eu/global/images/cms/Content/Fact%20Sheets/2020/ERANetSES_ProjectFactSheet_JC2020_LASAGNE.pdf (accessed on 14 April 2025).
  10. Abdennadher, N. Towards a Distributed Continuum Computing Platform for Federated Learning Based Self-Adaptive IoT Applications. In Intelligent Environments 2024: Combined Proceedings of Workshops and Demos & Videos Session; IOS Press: Amsterdam, The Netherlands, 2024; p. 5. [Google Scholar]
  11. El-Hawary, M.E. The smart grid—State-of-the-art and future trends. Electr. Power Compon. Syst. 2014, 42, 239–250. [Google Scholar] [CrossRef]
  12. Grieves, M.W. Digital Twins: Past, Present, and Future. In The Digital Twin; Springer: Berlin/Heidelberg, Germany, 2023; pp. 97–121. [Google Scholar]
  13. Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
  14. Savi, M.; Olivadese, F. Short-term energy consumption forecasting at the edge: A federated learning approach. IEEE Access 2021, 9, 95949–95969. [Google Scholar] [CrossRef]
  15. Ibrahem, M.I.; Mahmoud, M.; Fouda, M.M.; ElHalawany, B.M.; Alasmary, W. Privacy-preserving and efficient decentralized federated learning-based energy theft detector. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 287–292. [Google Scholar]
  16. Liu, W.; Chen, L.; Zhang, W. Decentralized federated learning: Balancing communication and computing costs. IEEE Trans. Signal Inf. Process. Netw. 2022, 8, 131–143. [Google Scholar] [CrossRef]
  17. Hegedus, I.; Danner, G.; Jelasity, M. Decentralized learning works: An empirical comparison of gossip learning and federated learning. J. Parallel Distrib. Comput. 2021, 148, 109–124. [Google Scholar] [CrossRef]
  18. Dinani, M.A.; Holzer, A.; Nguyen, H.; Marsan, M.A.; Rizzo, G. Gossip learning of personalized models for vehicle trajectory prediction. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), Nanjing, China, 29 March–1 April 2021; pp. 1–7. [Google Scholar]
  19. Giuseppi, A.; Manfredi, S.; Menegatti, D.; Pietrabissa, A.; Poli, C. Decentralized federated learning for nonintrusive load monitoring in smart energy communities. In Proceedings of the 2022 30th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 28 June–1 July 2022; pp. 312–317. [Google Scholar]
  20. Moussa, M.; Abdennadher, N.; Couturier, R.; Serugendo, G.M. A generic-based federated learning model for smart grid and renewable energy. In Proceedings of the 2023 22nd International Symposium on Parallel and Distributed Computing (ISPDC), Bucharest, Romania, 10–12 July 2023. [Google Scholar]
  21. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
  22. Boland, P.J. Majority systems and the Condorcet jury theorem. J. R. Stat. Soc. Ser. D Stat. 1989, 38, 181–189. [Google Scholar] [CrossRef]
  23. Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
  24. Wang, L.; Mao, S.; Wilamowski, B.M.; Nelms, R. Ensemble learning for load forecasting. IEEE Trans. Green Commun. Netw. 2020, 4, 616–628. [Google Scholar] [CrossRef]
  25. McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
  26. Kumar, J.; Gupta, R.; Saxena, D.; Singh, A.K. Power consumption forecast model using ensemble learning for smart grid. J. Supercomput. 2023, 79, 11007–11028. [Google Scholar] [CrossRef]
  27. Ali, W.; Din, I.U.; Almogren, A.; Rodrigues, J.J. GreenTrust: Trust Assessment Using Ensemble Learning in Internet of Microgrid Things. IEEE Internet Things J. 2024. [Google Scholar] [CrossRef]
  28. Yu, Y.; Deng, J.; Tang, Y.; Liu, J.; Chen, W. Decentralized ensemble learning based on sample exchange among multiple agents. In Proceedings of the 2019 ACM International Symposium on Blockchain and Secure Critical Infrastructure, Auckland, New Zealand, 8 July 2019; pp. 57–66. [Google Scholar]
  29. Magureanu, H.; Usher, N. Consensus learning: A novel decentralised ensemble learning paradigm. arXiv 2024, arXiv:2402.16157. [Google Scholar]
  30. Glass, P.; Di Marzo Serugendo, G. Coordination Model and Digital Twins for Managing Energy Consumption and Production in a Smart Grid. Energies 2023, 16, 7629. [Google Scholar] [CrossRef]
  31. COORDINATION: International Conference on Coordination Models and Languages. Lecture Notes in Computer Science. Springer. Available online: https://link.springer.com/conference/coordination (accessed on 14 April 2025).
  32. Omicini, A. Nature-Inspired Coordination Models: Current Status and Future Trends. Int. Sch. Res. Not. 2013, 2013, 384903. [Google Scholar] [CrossRef]
  33. Castelli, G.; Mamei, M.; Rosi, A.; Zambonelli, F. Engineering pervasive service ecosystems: The SAPERE approach. ACM Trans. Auton. Adapt. Syst. (TAAS) 2015, 10, 1. [Google Scholar] [CrossRef]
  34. Ben Mahfoudh, H. Learning-Based Coordination Model for Spontaneous. Ph.D. Thesis, Université de Genève, Geneva, Switzerland, 2020. [Google Scholar]
  35. Viroli, M.; Nardini, E.; Castelli, G.; Mamei, M.; Zambonelli, F. Towards a coordination approach to adaptive pervasive service ecosystems. In Proceedings of the 2011 IEEE Fifth International Conference on Self-Adaptive and Self-Organizing Systems, Ann Arbor, MI, USA, 3–7 October 2011; pp. 223–224. [Google Scholar]
  36. Fernandez-Marquez, J.L.; Di Marzo Serugendo, G.; Montagna, S.; Viroli, M.; Arcos, J.L. Description and composition of bio-inspired design patterns: A complete overview. Nat. Comput. 2012, 12, 43–67. [Google Scholar] [CrossRef]
  37. Zambonelli, F.; Omicini, A.; Anzengruber, B.; Castelli, G.; De Angelis, F.L.; Serugendo, G.D.M.; Dobson, S.; Fernandez-Marquez, J.L.; Ferscha, A.; Mamei, M.; et al. Developing pervasive multi-agent systems with nature-inspired coordination. Pervasive Mob. Comput. 2015, 17, 236–252. [Google Scholar] [CrossRef]
  38. Whittaker, J.A.; Thomason, M.G. A Markov chain model for statistical software testing. IEEE Trans. Softw. Eng. 1994, 20, 812–824. [Google Scholar] [CrossRef]
  39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Figure 1. Coordination model.
Figure 1. Coordination model.
Energies 18 02116 g001
Figure 2. Gossip pattern: combination of aggregation and spreading patterns (in the red dotted box), inspired from bio-inspired design patterns [36]. The bottom layer, called basic patterns, contains the four elementary patterns currently implemented in the current version of the SAPERE derived middleware, in the form of coordination mechanisms.
Figure 2. Gossip pattern: combination of aggregation and spreading patterns (in the red dotted box), inspired from bio-inspired design patterns [36]. The bottom layer, called basic patterns, contains the four elementary patterns currently implemented in the current version of the SAPERE derived middleware, in the form of coordination mechanisms.
Energies 18 02116 g002
Figure 3. General diagram of a topology with instances of the coordination platform communicating with each other.
Figure 3. General diagram of a topology with instances of the coordination platform communicating with each other.
Energies 18 02116 g003
Figure 5. Using coordination mechanisms to implement a gossip ensemble learning approach. This figure illustrates the use of different types of learning models depending on the node. In this approach, each node manages distribution and aggregation of predictions.
Figure 5. Using coordination mechanisms to implement a gossip ensemble learning approach. This figure illustrates the use of different types of learning models depending on the node. In this approach, each node manages distribution and aggregation of predictions.
Energies 18 02116 g005
Figure 6. The interface “ I A g g r e g a t a b l e ” defined in the coordination middleware.
Figure 6. The interface “ I A g g r e g a t a b l e ” defined in the coordination middleware.
Energies 18 02116 g006
Figure 7. Aggregation of 3 different properties: TOTAL, MODEL (node model), and CL_MODEL (cluster model).
Figure 7. Aggregation of 3 different properties: TOTAL, MODEL (node model), and CL_MODEL (cluster model).
Energies 18 02116 g007
Figure 8. Application of a single aggregation specification (contained in the synthetic property “AGGREGATION” of the local LSA, LSA1). The processing follows three steps: (step 1) it retrieves the objects to be aggregated from the LSAs received as input and from the local LSA (value of the propName property in each LSA); (step 2) it executes the obj1.aggregate method defined in the class of obj1; and (step 3) it stores the result in the “aggregatedValue” attribute of this property.
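These three steps can be summarised by the following sketch, in which the Lsa and Aggregatable types are hypothetical stand-ins for the middleware classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the three steps described in Figure 8. The Lsa and
// Aggregatable types are hypothetical stand-ins for the middleware classes.
public class AggregationSpecSketch {

    interface Aggregatable {
        Aggregatable aggregate(String operatorName, List<Aggregatable> others);
    }

    static class Lsa {
        final Map<String, Object> properties = new HashMap<>();
    }

    static void applyAggregation(Lsa localLsa, List<Lsa> receivedLsas,
                                 String propName, String operatorName) {
        // Step 1: retrieve the objects to be aggregated from the local LSA and
        // from the LSAs received as input (value of propName in each LSA).
        Aggregatable obj1 = (Aggregatable) localLsa.properties.get(propName);
        List<Aggregatable> others = new ArrayList<>();
        for (Lsa lsa : receivedLsas) {
            others.add((Aggregatable) lsa.properties.get(propName));
        }
        // Step 2: execute the aggregate method defined in the class of obj1.
        Aggregatable aggregated = obj1.aggregate(operatorName, others);
        // Step 3: store the result in the "aggregatedValue" attribute of the property.
        localLsa.properties.put(propName + ".aggregatedValue", aggregated);
    }
}
```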
Figure 10. Accuracies of node predictions obtained using the Markov chains model, for each aggregator.
Figure 11. Accuracies of node predictions obtained by the LSTM model, for each aggregator used. Averages obtained for each model appear as dashed lines.
Figure 12. Accuracies of cluster predictions obtained by the two models (Markov chains and LSTM) and for each aggregator used. Averages obtained for each model appear as dashed lines.
Figure 13. Comparison of the accuracies obtained on cluster-level predictions for (1) each approach (GEL, GFDL with the Markov chains model, and GFDL with the LSTM model), and (2) each aggregator used in both the GEL and GFDL approaches (sampling_nb, dist_power_hist, and no aggregator). Averages obtained for each approach appear as dashed lines.
Figure 14. The same comparisons obtained on node-level predictions: by approach (GEL, GFDL with MC, GFDL with LSTM) and by aggregator. Averages obtained for each approach appear as dashed lines.
Figure 15. Accuracies obtained on cluster-level predictions with ensemble learning (Markov chains and LSTM) and for each aggregator used.
Table 1. Comparison of centralised and decentralised federated learning.
Criterion | Centralised Federated Learning | Decentralised Federated Learning
Model training | Executed by the nodes | Executed by the nodes
Constraints regarding models | All nodes must have a similar type of model and similar behaviours to predict | All nodes must have a similar type of model and similar behaviours to predict
Model distribution | Managed by the central server | Managed by the nodes
Risk of network overhead | High around the central server (all the nodes exchange data with it) | Can be reduced, depending on the density of the network graph (each node exchanges data only with its direct neighbours)
Confidentiality of the model dataset | Yes (nodes only exchange model weights) | Yes (nodes only exchange model weights)
Table 2. Role of each component in the coordination model.
Component | Role | Layer
Digital twins | Intelligent agents that represent real devices and interact with other digital twins via the coordination platform to perform specific tasks (exchanging energy, training a learning model and generating predictions). | Intermediary (between the end user and the middleware layer).
Coordination platform | System that executes the coordination model within a node device. It interacts with the coordination platforms located on other nodes through the spreading coordination law. | Lower (coordination middleware).
Tuple space | Virtual environment shared between the different digital twins, on which the coordination laws are executed. | Lower (coordination middleware).
Coordination law | Bio-inspired mechanism operating in the tuple space, which transforms or diffuses data exchanged by digital twins. The gossip mechanism used to implement the GFDL and GEL approaches belongs to this category and relies on the elementary aggregation and spreading mechanisms. | Lower (coordination middleware).
LSA | Tuple structure containing the properties of a digital twin. It is submitted by a digital twin, transformed and disseminated by the coordination laws, then retrieved by other digital twins. | Lower (coordination middleware).
Table 3. The 6 electrical power variables that characterise the state of the node.
Variable | Description
requested | total power of demands
produced | total generated power
consumed | total power of satisfied demands
supplied | total generated power used for supplies
missing | total power of unsatisfied demands
available | total generated power not used for supplies
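As a simple illustration of this node state, the hypothetical class below groups the six variables; the consistency checks encode relations implied by the definitions above (requested = consumed + missing and produced = supplied + available), which is an assumption rather than a statement from the article, as is the use of watts as the unit.

```java
// Hypothetical container for the six power variables of Table 3 (unit assumed to be watts).
public class NodePowerState {
    double requested;  // total power of demands
    double produced;   // total generated power
    double consumed;   // total power of satisfied demands
    double supplied;   // total generated power used for supplies
    double missing;    // total power of unsatisfied demands
    double available;  // total generated power not used for supplies

    // Consistency checks implied by the definitions (assumption, up to rounding).
    boolean isConsistent(double epsilon) {
        return Math.abs(requested - (consumed + missing)) < epsilon
            && Math.abs(produced - (supplied + available)) < epsilon;
    }
}
```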
Table 4. Synthesis of distributed learning approaches (centralised versus decentralised distribution, and the three types of data exchanged). The two approaches we implement, GFDL and GEL, appear in the bottom row.
Type of distribution | Shared data: training dataset | Shared data: model weights | Shared data: predictions
Centralised distribution | Centralised Learning | Federated Learning (FDL) | Ensemble Learning (EL)
Decentralised distribution (Gossip) | Not implemented | Gossip Federated Learning (GFDL) | Gossip Ensemble Learning (GEL)
Table 5. Similarities between GFDL and GEL.
Criterion | Common point (GFDL and GEL)
General mechanism | Both use the generic gossip mechanism of the coordination platform.
Type of distributed learning | Both are decentralised approaches.
Dataset confidentiality | Both guarantee confidentiality, as datasets are not disseminated.
Requirements for predicted behaviour | In both cases, the cluster must contain nodes with similar behaviours, so that the exchange of knowledge about these behaviours is beneficial.
Table 6. Differences between GFDL and GEL.
Criterion | Gossip Federated Learning | Gossip Ensemble Learning
Data shared | Learning model weights | Predictions
Requirements for model structures | All the learning models disseminated must have the same structure. | No requirement.
Possibilities for implementing aggregation operators | A wider variety of aggregation functions can be defined on learning models, which offers more possibilities to increase the accuracy gains provided by aggregation. | Much more limited, as prediction objects have simpler structures and fewer functions.
Implementation effort required to integrate a new type of model | Implement and test new aggregation functions for the new type of learning model. | Nothing to implement.
Memory and bandwidth consumption | High or very high, depending on the structure of the learning model (number and size of layers and matrices); this may require a significant reduction in the gossip frequency. | Limited risk of network or memory overhead, as the prediction results take up much less memory.
Sensitivity to an aggregation operator not suited to the situation | The impact is less immediate, as it concerns the model weights and not necessarily the classification results. | The impact is direct, because the aggregation coefficients are applied directly to the prediction results.
Table 7. LSA properties aggregated in Figure 7.
Variable | Class | Operator | Description
TOTAL | NodeTotal | sum | The sum operator defined in the NodeTotal class sums each power variable contained in the NodeTotal class over the set of nodes. It returns a NodeTotal instance that represents the whole cluster of nodes.
MODEL | LstmModel | sampling_nb | The sampling_nb operator defined in the LstmModel class calculates a weighted average of the various LSTM models, using a weighting proportional to the number of samples in each LSTM model.
CL_MODEL | MarkovChainsModel | power_loss | The power_loss operator defined in the MarkovChainsModel class calculates a weighted average of the different Markov chain models, using a weighting proportional to each model's accuracy evaluated on a local test dataset.
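To illustrate how such an operator can be written, the sketch below implements a sampling_nb-style weighted average over flat weight vectors, with coefficients proportional to each model's number of samples. This is a simplification: the actual operator in LstmModel works on the model's layer matrices, and the ModelSnapshot type is hypothetical.

```java
import java.util.List;

// Simplified sketch of a sampling_nb-style operator: weighted average of flat
// weight vectors, with coefficients proportional to each model's sample count.
public class SamplingNbSketch {

    record ModelSnapshot(double[] weights, int samplingNb) {}  // hypothetical structure

    static double[] aggregate(List<ModelSnapshot> models) {
        int dim = models.get(0).weights().length;
        double totalSamples = models.stream().mapToInt(ModelSnapshot::samplingNb).sum();
        double[] aggregated = new double[dim];
        for (ModelSnapshot model : models) {
            double coef = model.samplingNb() / totalSamples;  // normalised: coefficients sum to 1
            for (int i = 0; i < dim; i++) {
                aggregated[i] += coef * model.weights()[i];
            }
        }
        return aggregated;
    }

    public static void main(String[] args) {
        List<ModelSnapshot> received = List.of(
                new ModelSnapshot(new double[]{1.0, 2.0}, 300),   // local model
                new ModelSnapshot(new double[]{3.0, 4.0}, 100));  // neighbour model
        double[] result = aggregate(received);                    // {1.5, 2.5}
        System.out.println(result[0] + ", " + result[1]);
    }
}
```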
Table 8. Main properties of a learning model instance.
Property | Description
Model configuration | Contains the model type, perimeter (node or cluster), aggregator, incremental learning frequency, aggregation frequency, etc.
Time of last weight update | Allows the model to check whether any changes have been made recently, and therefore whether the model content must be sent to its LSA for spreading (see step 11 in Algorithm 1).
Test dataset | Recent history of the power values and corresponding states (classification) for each of the 6 powers to be predicted. These data constitute a test set for evaluating the models received from neighbouring nodes: the observed states they contain are compared with the states predicted by the model under evaluation, as part of the aggregation of the different models received.
Aggregation weights | Table of the weighted coefficients assigned to each neighbouring node model (including the local node) in the last aggregation. For each variable, the coefficients linked to the nodes sum to 1, and each weight (also called a coefficient) indicates the relative importance attributed to the sub-model of node k in this aggregation.
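A hedged sketch of how these properties could be grouped in a class is given below; the field names and types are assumptions, not the authors' implementation.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical grouping of the properties listed in Table 8; names and types are assumed.
public class LearningModelState {
    ModelConfiguration configuration;        // model type, perimeter, aggregator, frequencies, ...
    Instant lastWeightUpdate;                // used to decide whether to resubmit the model for spreading
    TestDataset testDataset;                 // recent history of power values and observed states
    Map<String, Double> aggregationWeights;  // coefficient per node model in the last aggregation
                                             // (simplified to one variable; coefficients sum to 1)

    static class ModelConfiguration {}       // placeholders for the sketch
    static class TestDataset {}
}
```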
Table 9. Main services implemented by a learning model instance.
Service | Description
Model training | Incremental training, including the latest dataset updates.
Prediction | Single prediction calculation at a given date-time and horizon.
Series of predictions | Set of predictions generated for a given set of date-times and horizons. Returns a list of prediction results for each date-time and horizon requested.
Model compaction | Returns the compacted format to be submitted in the LSA (to reduce communication overhead).
Weights copy | Operation for copying weights from another model; it is used when the local model needs to recover the weights resulting from the last aggregation.
Number of samples | Returns the number of samples in the model dataset (for the “quantitative” aggregator).
Aggregation results | Returns the detailed results of the last aggregation, with the weights obtained for each model received and each power variable. This result is displayed in the web application, where it can be used to check the coefficient assignments made during the last aggregation.
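The same services can be summarised as a Java-style interface, sketched below; the method names, signatures, and nested result types are illustrative and not the middleware's actual API.

```java
import java.time.LocalDateTime;
import java.util.List;

// Illustrative interface for the services of Table 9; names and signatures are assumptions.
public interface LearningModelServices {
    void train();                                              // incremental training on the latest dataset updates
    Prediction predict(LocalDateTime dateTime, int horizon);   // single prediction at a given date-time and horizon
    List<Prediction> predictSeries(List<LocalDateTime> dateTimes, List<Integer> horizons);
    byte[] compact();                                          // compacted format submitted in the LSA
    void copyWeightsFrom(LearningModelServices other);         // recover the weights from the last aggregation
    int samplingNb();                                          // number of samples in the model dataset
    AggregationResult lastAggregationResult();                 // weights obtained per model and per power variable

    record Prediction(String predictedState, double confidence) {}  // placeholder types for the sketch
    record AggregationResult() {}
}
```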
Table 10. Common variables used in aggregation operators.
Variable | Description
N | index of the local node
K_N | set of node indexes of the models to be aggregated: each index identifies the node hosting a model to be aggregated. This set includes the index of the local node (N) and of all its direct neighbours: K_N = {N} ∪ {k : node_k ∈ neighbourhood(node_N)}
k, i | node indexes: k ∈ K_N and i ∈ K_N
model_k | learning model of node_k
dataset_k | training dataset of model_k
testdataset_k | test dataset of model_k
coef_k | weight assigned by the operator to model_k, which represents its relative importance compared with the other models. This coefficient is normalised: ∑_{k ∈ K_N} coef_k = 1
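With this notation, defining an aggregation operator essentially amounts to choosing the coefficients coef_k. A sketch of the generic weighted aggregation, and of the sampling_nb weighting, is given below; θ_k stands for the weights of model_k and n_k for its number of samples, both of which are assumed notations not used in the article.

```latex
% Generic weighted aggregation of the models of the neighbourhood (sketch).
\theta_{\mathrm{agg}} = \sum_{k \in K_N} coef_k \, \theta_k ,
\qquad \sum_{k \in K_N} coef_k = 1 .

% Example: sampling_nb weighting, proportional to the number of samples n_k of model_k.
coef_k = \frac{n_k}{\sum_{i \in K_N} n_i} .
```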
Table 11. Adaptations used to reduce bandwidth. The ‘Usages’ column indicates the types of objects to which these adaptations apply.
Adaptation | Description | Usages
Spreading frequency | Reduces the frequency at which an object is broadcast in the context of the gossip application. | Learning models, predictions.
Synchronisation of spreading | Delays the start of the broadcast of information from a node until the direct neighbouring nodes are also ready to broadcast the same information, in the context of the gossip application. | Learning models.
Compression of exchanged objects | Reduces the size of data exchanged between LSAs by converting an object into a compressed format just before it is physically sent. | Learning models.
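The "compression of exchanged objects" adaptation can be illustrated by the minimal sketch below, which serialises an object and GZIP-compresses it just before it is physically sent; standard Java serialisation is only one possible encoding and is assumed here for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Minimal sketch of the "compression of exchanged objects" adaptation:
// the object is serialised and GZIP-compressed just before being physically sent.
public final class LsaPayloadCompressor {

    static byte[] compress(Serializable exchangedObject) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(exchangedObject);
        }
        return buffer.toByteArray();   // compact byte array placed in the outgoing LSA
    }
}
```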
Table 12. Parameters set for each classifier used.
Classifier | Learning Frequency | Gossip Frequency | Batch Size | Learning Rate | Other Hyperparameters
Markov chains | 1/min | 1/5 min | N/A | N/A | sliding window: 100 days
LSTM | 1/20 min | 1/20 min | 32 | 10⁻⁴ (10⁻³ for the 1st training) | -
Table 13. Accuracy results, broken down by aggregator and by prediction perimeter. The ‘CLUSTER’ and ‘NODE’ lines correspond to the averages obtained for each prediction perimeter, and the ‘TOTAL’ line corresponds to the overall averages obtained for all predictions. The left half of the table shows the accuracies obtained using the Markov chains model, while the right half shows the accuracies obtained using the LSTM model. Accuracies are also calculated based on non-trivial predictions only.
Perimeter / aggregator | Predictions NB (MC) | Reliability % (MC) | Non-trivial reliability % (MC) | Predictions NB (LSTM) | Reliability % (LSTM) | Non-trivial reliability % (LSTM)
TOTAL | 77,094 | 80.26% | 75.84% | 59,020 | 92.85% | 91.24%
CLUSTER | 25,896 | 75.05% | 74.68% | 17,834 | 88.39% | 88.39%
CLUSTER none | 6942 | 74.75% | 74.02% | 6474 | 86.44% | 86.44%
CLUSTER power_loss | 7470 | 74.59% | 74.59% | 2196 | 89.03% | 89.03%
CLUSTER min_loss | 3726 | 75.42% | 75.42% | 2216 | 90.03% | 90.03%
CLUSTER sampling_nb | 3840 | 75.29% | 75.29% | 3864 | 89.08% | 89.08%
CLUSTER dist_power_hist | 3918 | 75.88% | 74.66% | 3084 | 89.98% | 89.98%
NODE | 47,580 | 83.54% | 76.92% | 41,186 | 94.78% | 92.92%
NODE none | 6840 | 83.32% | 77.29% | 14,322 | 95.48% | 93.82%
NODE power_loss | 21,678 | 85.51% | 80.05% | 4362 | 94.25% | 92.31%
NODE min_loss | 3726 | 88.78% | 77.56% | 4380 | 93.10% | 90.76%
NODE sampling_nb | 7572 | 78.18% | 71.13% | 12,038 | 94.35% | 92.42%
NODE dist_power_hist | 7764 | 80.94% | 72.63% | 6084 | 95.58% | 93.91%
Table 14. Accuracy results, broken down by aggregator and by prediction perimeter, obtained using the ensemble learning approach. The ‘CLUSTER’ and ‘NODE’ lines correspond to the averages obtained for each prediction perimeter, and the ‘TOTAL’ line corresponds to the overall averages obtained for all predictions. Accuracies are also calculated based on non-trivial predictions only.
Perimeter / aggregator | Predictions NB | Reliability % | Non-trivial reliability %
TOTAL | 233,364 | 86.26% | 87.80%
CLUSTER | 115,980 | 86.19% | 86.05%
CLUSTER none | 29,340 | 81.19% | 81.06%
CLUSTER dist_power_last | 28,698 | 88.52% | 88.52%
CLUSTER dist_power_hist | 29,346 | 88.32% | 88.19%
CLUSTER sampling_nb | 28,596 | 86.81% | 86.71%
NODE | 117,384 | 86.33% | 90.17%
NODE none | 29,334 | 92.52% | 90.54%
NODE dist_power_last | 29,796 | 93.93% | 91.63%
NODE dist_power_hist | 30,450 | 92.37% | 89.53%
NODE sampling_nb | 27,804 | 65.05% | 88.96%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
