**5. Conclusions**

To better capture the characteristics of the future electricity market, this paper developed a non-cooperative continuous double auction (CDA) mechanism that accounts for the coupling between bidding price and quantity, in order to facilitate energy trading among microgrids in the distribution network. By exploiting the potential capacity of the BESS, the proposed energy trading mechanism performs an alternative form of 'demand response', which extends the concept of demand response from time-based to multi-agent-based. The Q-learning algorithm was introduced into the CDA mechanism as the decision-making method for each microgrid. To address the existing shortcomings in applying the Q-learning algorithm to power systems, a non-tabular framework of Q-values that considers the two dimensions of the bidding action was proposed as a Q-cube. In addition, the corresponding parameter settings and state-action architecture were designed to better reflect the microgrids' individual bidding preferences and to support rational decisions according to the real-time status of the networked microgrids. Simulations on a realistic case from Hongfeng Lake, Guizhou Province, China demonstrate the efficiency and applicability of the proposed CDA mechanism and Q-cube framework. All of the microgrids are able to respond appropriately in negotiation to the global real-time supply and demand relationship without disclosing private information. In these simulations, the QLCDA mechanism increased the overall profit of the distribution network by 65.7% and 10.9% compared with the traditional energy trading mechanism and the P2P energy trading mechanism, respectively. In addition, the Q-value distribution in the proposed Q-cube reflects the microgrids' bidding behaviors and preferences well in both the theoretical analysis and the simulation results.
As demonstrated in this paper, the proposed Q-cube framework of the Q-learning algorithm for a continuous double auction mechanism could be applied to other energy trading markets in the future EI.

The proposed Q-cube framework still has some limitations. First, the interaction between bidding price and quantity should be described more accurately, since many other factors could influence this coupling relationship. Second, it remains difficult to summarize the microgrids' energy bidding preferences with the existing parameters. Third, power flow calculation should be carried out alongside the trading process, since the traded energy quantities might cause safety issues in the distribution network. In future work, a two-layer energy bidding architecture could be studied that considers both QLCDA among microgrids and internal coordinated dispatch inside each microgrid; the interaction between these two layers is worth studying. Power transmission limits should also be considered to ensure the safety of the energy market. In addition, further extensions will address time-varying settings of the QL parameters and a more appropriate description of the reward function.

**Author Contributions:** Conceptualization, N.W. and W.X.; Methodology, N.W.; Software, N.W.; Validation, N.W.; Formal Analysis, N.W.; Investigation, N.W. and Z.X.; Data Curation, Z.X.; Writing—original draft preparation, N.W.; Writing—review and editing, W.X., Z.X. and W.S.; Visualization, N.W.; Supervision, W.X.; Project Administration, W.X.; Funding Acquisition, W.X. All of the authors have read and approved the final manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China under Grant Nos. 61773292 and 71841036.

**Acknowledgments:** The authors thank Ke Sun and Yifan Cheng for their careful reading and many helpful suggestions for improving the presentation of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.
