**7. Conclusions**

This paper builds an OCECF model to simultaneously optimize the carbon emissions and energy losses of power grids, and proposes a new MCR-Q(λ) learning algorithm to solve it. The work has the following four contributions/novelties:


To further improve the operating benefit of power grids, future work can focus on optimal power flow based on a carbon trading system and on Pareto-based multi-objective learning methods. Decentralized optimization will also be studied to achieve high operational privacy and reliability.

**Author Contributions:** H.C. established the model, implemented the simulation and wrote this article; C.G. guided and revised the paper and refined the language; X.H. collected references; Y.L. guided the research; T.Y. assisted in writing algorithms. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Technical Projects of China Southern Power Grid, grant number [GDK JXM20173256].

**Conflicts of Interest:** The authors declare no conflict of interest.
