2.2.2. Energy Sharing Mechanism

Once the price sequences of buyers and sellers intersect, i.e., *price<sup>s</sup>*<sub>1</sub> < *price<sup>b</sup>*<sub>1</sub>, the microgrids whose bidding prices lie within the overlapping interval are chosen to enter the energy sharing process. Because price intersections are uncertain and complex, a layering method is combined with a price-prioritized, quantity-weighted sharing rule to solve the energy sharing problem.

The numbers of selected buyer and seller microgrids are *n<sup>b</sup><sub>share</sub>* and *n<sup>s</sup><sub>share</sub>*, respectively. Starting from the highest bidding price among the sellers, the buyer microgrids whose bidding prices are higher than that price (the price of seller *n<sup>s</sup><sub>share</sub>*) and all of the seller microgrids are combined into a sharing layer. These buyer microgrids have priority to trade with the seller microgrids because they are willing to pay a higher price per unit of energy. Deals are made within this layer, and the microgrids involved are removed from the sharing list depending on the outcome. The layering method is applied repeatedly until no buyer microgrid remains in the sharing list or all the energy of the seller microgrids is sold. The detailed layering process is presented below:

• (1) Form a bidding layer according to the above-mentioned method and proceed to (2).


Take the situation in Figure 3 as an example. Two buyer microgrids (*p<sup>b</sup>*<sub>1</sub> and *p<sup>b</sup>*<sub>2</sub>) and three seller microgrids (*p<sup>s</sup>*<sub>1</sub>, *p<sup>s</sup>*<sub>2</sub> and *p<sup>s</sup>*<sub>3</sub>) are selected to form Layer 1, as shown in Figure 3a. After energy allocation in Layer 1, all of the seller microgrids still have surplus energy, so *p<sup>b</sup>*<sub>1</sub> and *p<sup>b</sup>*<sub>2</sub> are removed from the sharing list because their energy demands are satisfied. *p<sup>s</sup>*<sub>3</sub> is also removed from the list because no remaining buyer microgrid bids a price higher than its own. Afterwards, Layer 2 is formed, containing one buyer microgrid (*p<sup>b</sup>*<sub>3</sub>) and two seller microgrids (*p<sup>s</sup>*<sub>1</sub> and *p<sup>s</sup>*<sub>2</sub>), as shown in Figure 3b. The sharing process ends after the energy allocation in this layer.

**Figure 3.** Layering methods in the proposed sharing mechanism.
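The layer-forming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Bid` class, the `form_layer` function, and all prices and quantities are hypothetical and were chosen to mirror the Figure 3 narrative. The removal of microgrids between layers is applied by hand here, following the worked example rather than a general rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """One microgrid's bid; the field names are illustrative assumptions."""
    name: str
    price: float     # bidding price per unit of energy
    quantity: float  # energy requested (buyer) or offered (seller)


def form_layer(buyers, sellers):
    """Group all remaining sellers with the buyers that outbid every seller."""
    if not buyers or not sellers:
        return [], []
    highest_seller_price = max(s.price for s in sellers)
    layer_buyers = [b for b in buyers if b.price > highest_seller_price]
    return layer_buyers, list(sellers)


# Hypothetical bids chosen to mirror the Figure 3 narrative.
buyers = [Bid("b1", 10.0, 2.0), Bid("b2", 9.0, 1.0), Bid("b3", 7.0, 3.0)]
sellers = [Bid("s1", 5.0, 4.0), Bid("s2", 6.0, 4.0), Bid("s3", 8.0, 4.0)]

layer1_buyers, layer1_sellers = form_layer(buyers, sellers)

# Removal applied by hand, following the example: b1 and b2 are satisfied,
# and s3 is dropped because the remaining buyer b3 does not outbid it.
remaining_buyers = [b for b in buyers if b.name == "b3"]
remaining_sellers = [s for s in sellers if s.name != "s3"]
layer2_buyers, layer2_sellers = form_layer(remaining_buyers, remaining_sellers)
```

With these assumed bids, Layer 1 contains buyers `b1` and `b2` with all three sellers, and Layer 2 contains buyer `b3` with sellers `s1` and `s2`.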

For each layer in the energy sharing process, without loss of generality, we propose a price-prioritized quantity-weighted sharing rule that covers two situations. Figure 4 gives example sharing results for both situations, in which the bidding price and quantity of each deal are given below and above the figure, respectively. The energy quantities of buyers in a layer are sorted by quoted price in descending order, while those of sellers are sorted in ascending order. This rule ensures that buyers with higher bid prices get priority access to lower-priced energy. In Figure 4a, for the sharing process in round *n*, when ∑ *q<sup>b</sup><sub>i</sub>* ≥ ∑ *q<sup>s</sup><sub>j</sub>*, every seller sells out its energy; the excess demand is cut and participates in the next round of bidding in the energy market. However, when ∑ *q<sup>b</sup><sub>i</sub>* < ∑ *q<sup>s</sup><sub>j</sub>*, as shown in Figure 4b, the sellers have to share the excess supply fairly. The trading quantity of seller microgrid *j* is calculated as follows:

$$q_j^n = \begin{cases} q_j^n & \text{if } \sum q_i^b \ge \sum q_j^s \\ q_j^n - q_{cut,j}^n & \text{if } \sum q_i^b < \sum q_j^s \end{cases} \tag{5}$$

$$q_{cut,j}^n = \left(\sum q_j^s - \sum q_i^b\right) \cdot \frac{q_j^n}{\sum q_j^s}. \tag{6}$$

In Equation (6), *q<sup>n</sup><sub>cut,j</sub>* represents the quantity cut from seller microgrid *j* in round *n*. The oversupply is shared among the seller microgrids in proportion to their offered quantities and deducted from their energy supply. This sharing rule guarantees that each seller microgrid sells a non-negative quantity, which is fairer than an equal-sharing mechanism. After the sharing layers and trading quantities are determined, the DNO can choose any suitable price within the interval [*p<sup>s</sup><sub>j</sub>*, *p<sup>b</sup><sub>i</sub>*] as the trading price in this time slot for buyer microgrid *i* and seller microgrid *j*. We assume both sides of the transaction agree to trade at a price *p<sub>ij</sub>* = *θ* · (*p<sup>b</sup><sub>i</sub>* + *p<sup>s</sup><sub>j</sub>*), where *θ* ∈ (0, 1) is a predefined constant. Without loss of fairness, *θ* is set to 0.5 in this paper.
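As a concrete illustration of Equations (5) and (6) and of the trading price rule, the following Python sketch computes the traded quantity of each seller in one layer and the price of one deal. The function names and the sample numbers are assumptions made for this example; quantities are assumed to be non-negative.

```python
def seller_trade_quantities(buyer_quantities, seller_quantities):
    """Apply Equations (5) and (6) to the sellers of one layer.

    If demand covers supply, every seller sells its full offer.
    Otherwise the oversupply is cut from each seller in proportion
    to the quantity it offered.
    """
    total_demand = sum(buyer_quantities)
    total_supply = sum(seller_quantities)
    if total_demand >= total_supply:
        return list(seller_quantities)
    oversupply = total_supply - total_demand
    # Equation (6): seller j's cut is oversupply * q_j / total_supply.
    return [q - oversupply * q / total_supply for q in seller_quantities]


def trading_price(buyer_price, seller_price, theta=0.5):
    """Price of a deal, theta * (p_b + p_s); theta = 0.5 gives the midpoint."""
    return theta * (buyer_price + seller_price)


# Hypothetical layer: demand 6 against supply 10, so 4 units are cut.
traded = seller_trade_quantities([4.0, 2.0], [3.0, 5.0, 2.0])
price = trading_price(10.0, 6.0)
```

In this assumed layer the sellers trade approximately 1.8, 3.0 and 1.2 units, which together match the total demand of 6, and a deal between a buyer bidding 10 and a seller bidding 6 settles at 8.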

The proposed energy sharing mechanism ensures that buyer microgrids with higher bidding prices and seller microgrids with lower bidding prices have priority in reaching a deal. In addition, fairness in the traded energy quantities is achieved by the weighted sharing rule.

**Figure 4.** Sharing price and quantity under two situations.

#### **3. A Q-Cube Framework for Q-Learning in a Continuous Double Auction Energy Trading Market**

In a standard Q-learning algorithm, an agent learns from environment information and interacts with relevant agents. By observing states and collecting rewards, an agent selects appropriate actions to maximize future profits. The agents are independent of one another in both acting and learning. However, the particular nature of the energy trading market creates a complex energy economic system. Non-cooperative trading patterns, personalized MGO preferences and time-varying market conditions make it difficult for market participants to select bidding strategies. As a model-free algorithm, Q-learning is capable of modeling the MGOs' bidding behaviors in a continuous double auction energy trading market. In this paper, a Q-cube framework for the Q-learning algorithm is proposed specifically for this multi-microgrid non-cooperative bidding problem, and it addresses the exploitation–exploration issue.
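For readers unfamiliar with Q-learning, the sketch below shows the textbook tabular update together with an ε-greedy action choice, which is one common way to balance exploitation and exploration. It is not the Q-cube framework proposed in this paper. The state and action labels, the learning rate `alpha`, the discount factor `gamma` and the function names are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def choose_action(q_table, state, actions, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update for one observed transition."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical bidding actions and state labels.
actions = ["raise_price", "lower_price"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q_table, "slot_0", "raise_price", 1.0, "slot_1", actions)
greedy = choose_action(q_table, "slot_0", actions, epsilon=0.0)
```

With `epsilon=0.0` the choice is purely greedy, so after the single rewarded update the agent prefers `raise_price` in state `slot_0`.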

## *3.1. Basic Definitions for Q-Learning*

We base the Q-cube framework on an MDP consisting of a tuple ⟨*S*, *A*, *S*′, *r*⟩. These variables are introduced in detail as follows.
