Article

Reinforcement-Learning-Based Multi-Objective Differential Evolution Algorithm for Large-Scale Combined Heat and Power Economic Emission Dispatch

School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
*
Author to whom correspondence should be addressed.
Energies 2023, 16(9), 3753; https://doi.org/10.3390/en16093753
Submission received: 16 February 2023 / Revised: 23 April 2023 / Accepted: 25 April 2023 / Published: 27 April 2023

Abstract

As social and environmental issues become increasingly serious, both fuel costs and environmental impacts should be considered in the cogeneration process. In recent years, combined heat and power economic emission dispatch (CHPEED) has become a crucial optimization problem in power system management. In this paper, a novel reinforcement-learning-based multi-objective differential evolution (RLMODE) algorithm is proposed to deal with the CHPEED problem for large-scale systems. In RLMODE, a Q-learning-based technique is adopted to automatically adjust the control parameters of the multi-objective algorithm. Specifically, the Pareto domination relationship between the offspring solution and the parent solution is used to determine the action reward, and the most suitable algorithm parameter values for the environment model are adjusted through the Q-learning process. The proposed RLMODE was applied to solve four CHPEED problems with 5, 7, 100, and 140 generating units. The simulation results showed that, compared with four well-established multi-objective algorithms, the RLMODE algorithm achieved the smallest cost and smallest emission values for all four CHPEED problems. In addition, the RLMODE algorithm acquired better Pareto-optimal frontiers in terms of convergence and diversity. The superiority of RLMODE was particularly significant for the two large-scale CHPEED problems.

1. Introduction

Traditional thermal power plants cannot efficiently convert thermal energy into electrical energy, and a large amount of thermal energy is wasted as heat [1]. Today, heat supply is an indispensable part of our lives, and therefore, utilizing waste heat has become a new production trend. Combined heat and power (CHP) generation technology collects the waste heat of the power generation process and uses it for heat supply. Compared with pure power generation plants, CHP plants can reach an energy utilization efficiency of more than 90% and can save 10% to 40% of the power generation costs. In addition, CHP plants can reduce pollutant gas emissions by about 13% to 18% [2]. In order to realize the sustainable development of the power industry, the application of CHP in the power system has become a global development trend [3].
Combined heat and power economic dispatch (CHPED) is an effective way to achieve optimal production in the CHP production system. CHPED refers to the optimization of electrical and heat production for three types of generating units, i.e., power-only (PO) units, CHP units, and heat-only (HO) units, with the goal of minimizing fuel costs. CHPED is a quite complex optimization task, which should be solved by efficient optimization methods.

1.1. Literature Review

Early methods for the CHPED problem included dual-quadratic programming (DQP) [4], Lagrange relaxation programming (LRP) [5], and the branch-and-bound method (BABM) [6]. However, these methods have the limitations of high initial sensitivity and low solution accuracy for non-convex problems.
Swarm and evolutionary optimization algorithms (SEOAs) are global optimizers that do not require the optimization problem to be convex or differentiable. Therefore, many SEOAs have been devised to solve the CHPED problem during the past ten years, such as the improved genetic algorithm [7], multi-player-based harmony search [2], Kho–Kho optimizer [8], niching differential evolution [9], migrating-variables-based differential evolution [10], collective information particle swarm optimization [11], the amalgamated heap and jellyfish optimizer [12], the hybrid chameleon swarm algorithm [13], hybrid grasshopper optimization [14], hybrid crow search [15], and adaptive cuckoo search [16]. However, all the above works only considered the economic production objective while ignoring the environmental pollution objective, and CHPED was solved using a single objective optimization framework.
With the increasingly serious social and environmental problems, both fuel costs and environmental impacts should be considered in the production process. As a result, combined heat and power economic emission dispatch (CHPEED) [17] is established with two conflicting goals, i.e., minimizing the fuel costs and reducing pollutant gas emissions. CHPEED is a non-linear, non-convex, and multi-objective optimization problem with multiple constraints. Recently, the research on CHPEED has become a hot topic in academia and industry.
Elaiw et al. [18] presented a hybrid DE-SQP method to solve the dynamic CHPEED problem. In the hybrid algorithm, DE acts as a global optimizer for the base-level search and SQP is used for fine-tuning of the final solution. Ahmadi et al. [19] used the normal boundary intersection (NBI) method to handle the CHPEED problem. The NBI was applied to find the Pareto-optimal solutions, and the TOPSIS decision-making approach was adopted to obtain the tradeoff solution. Anand et al. [20] put forward a civilized swarm optimization (CSO) algorithm to solve the CHPEED problem. CSO is a synthetic technique based on particle swarm optimization and the society civilization algorithm. Sadeghian et al. [21] solved the CHPEED problem based on double-Benders decomposition (DBD). The DBD method consists of the external BD and the internal BD. For the external BD, the on/off state of generation units is determined by the master problem, and for the internal BD, the economic dispatch is solved through the sub-problem. Alomoush [22] applied stochastic fractal search (SFS) to solve the CHPEED problem. By using a compromise programming method, the fuel cost and gas emission were coupled into an aggregate objective function, and the approximate global optimal solution was obtained by the SFS algorithm. Jdoun et al. [23] proposed a dynamic control whale optimization (DCWOA) algorithm to solve the CHPEED problem. DCWOA adds a dynamically controlled constriction function to the traditional WOA. Note that most of these works transform the multi-objective CHPEED problem into a single-objective optimization problem and obtain the Pareto-optimal solutions by executing the single-objective optimization algorithm many times.
Pareto-based multi-objective optimization algorithms have also been proposed for the CHPEED problem, which can obtain the Pareto-optimal solutions in one run. Niknam et al. [24] solved the reserve constrained dynamic CHPEED problem based on a multi-objective-enhanced firefly algorithm. Basu [25] recommended the nondominated sorting genetic algorithm-II (NSGA-II) to solve the CHPEED problem. NSGA-II employs fast nondominated sorting (FNS) and crowding distance (CD) comparison to select better individuals. Shi et al. [26] developed a multi-objective line-up competition algorithm (MLCA) to deal with the CHPEED problem with power transmission loss. An efficient diversity preservation mechanism was employed in the MLCA to produce the uniformly distributed Pareto-optimal solutions. Shaabani et al. [27] introduced a time-varying accelerated multi-objective particle swarm optimization (TV-MOPSO) algorithm to optimize the CHPEED solution. In TV-MOPSO, the acceleration coefficients are dynamically changed during the optimization process. Li et al. [17] proposed a two-stage approach to solve the CHPEED problem, which combines the multi-objective optimization algorithm θ -DEA and an integrated decision-making strategy. Sun et al. [28] put forward an indicator- and crowding-distance-based evolutionary algorithm (IDBEA) for the CHPEED problem. Sundaram [29] proposed a hybrid multi-objective algorithm based on NSGA-II and MOPSO (NSGAII-MOPSO) for the CHPEED problem. Sundaram [30] implemented a multi-objective multi-verse optimization (MOMVO) algorithm for the solution of the CHPEED problem. In MOMVO, a chaotic opposition strategy is used for the initial population generation, and it explores the search space extensively. Xiong et al. [31] proposed an improved bare bones MOPSO (IMOBBPSO) algorithm to solve three CHPEED problems. 
In this algorithm, an adaptive particle update strategy is added to automatically adjust the weights of the personal and global best positions, and an external archiving strategy is established to improve the swarm diversity.

1.2. Contributions of This Work

Despite the above research works, two limitations remain in the existing CHPEED research. Firstly, the existing CHPEED methods do not make use of advanced machine learning technology, although integrating machine learning techniques into multi-objective optimization algorithms may improve their efficiency at solving the CHPEED problem. Secondly, most of the existing works only considered small-scale CHPEED problems with fewer than 10 units, and large-scale CHPEED problems with more than 100 units have not been considered. Based on these considerations, this paper devises a reinforcement-learning-based multi-objective differential evolution (RLMODE) algorithm to deal with the CHPEED problem for large-scale systems.
Multi-objective differential evolution (MODE) is a multi-objective evolutionary optimization technique. Due to its advantages of simple implementation, good stability, and robustness [32,33], MODE has been applied to solve many real-world multi-objective problems, including power dispatch problems [34,35]. On the other hand, reinforcement learning (RL) is an important machine learning technique. RL studies how an agent learns through interaction with the external environment. RL does not need any prior data; the agent only needs to accumulate rewards based on the information it learns from the external environment, with the aim of finally obtaining the maximum reward [36].
In this paper, using the RL technique, a novel reinforcement-learning-based multi-objective differential evolution (RLMODE) algorithm is proposed to solve the CHPEED problem.
The main contributions of this paper are listed as follows:
  • A novel reinforcement-learning-based multi-objective differential evolution (RLMODE) algorithm is developed.
  • The RLMODE algorithm uses RL to automatically adjust the control parameters, which enhances the search ability and stability.
  • The RLMODE algorithm was utilized to solve four CHPEED problems including two large-scale CHPEED problems with more than 100 generating units.
  • The superiority of the RLMODE algorithm was verified by comparing with well-established multi-objective optimization algorithms.
The rest of the article is structured as follows. Section 2 introduces the mathematical model of CHPEED. Section 3 describes the proposed RLMODE algorithm in detail. Section 4 states the implementation of RLMODE for solving CHPEED. In Section 5, RLMODE is applied to solve four CHPEED problems and compared with other algorithms. Section 6 draws the conclusions.

2. Mathematical Formulation of CHPEED Problem

2.1. Objective Function

2.1.1. Fuel Cost

The total fuel cost $F_C$ is composed of the fuel costs of the PO, CHP, and HO units [25]. The fuel cost objective function is described as follows:
$$\min F_C = \sum_{i=1}^{N_P} C_i(P_i) + \sum_{j=1}^{N_C} C_j(P_j^C, H_j^C) + \sum_{k=1}^{N_H} C_k(H_k) \tag{1}$$
where $C_i(P_i)$, $C_j(P_j^C, H_j^C)$, and $C_k(H_k)$ represent the fuel cost of the $i$th PO unit, $j$th CHP unit, and $k$th HO unit, respectively; $P_i$, $P_j^C$, $H_j^C$, and $H_k$ are the power and heat outputs of the three types of units; $N_P$, $N_C$, and $N_H$ represent the numbers of PO, CHP, and HO units, respectively.
The fuel cost functions of the PO, CHP, and HO units are formulated as follows:
$$C_i(P_i) = a_i P_i^2 + b_i P_i + c_i + \left| d_i \sin\left( e_i \left( P_i^{\min} - P_i \right) \right) \right| \tag{2}$$
$$C_j(P_j^C, H_j^C) = f_j \left(P_j^C\right)^2 + g_j P_j^C + l_j + h_j \left(H_j^C\right)^2 + m_j H_j^C + n_j P_j^C H_j^C \tag{3}$$
$$C_k(H_k) = o_k H_k^2 + p_k H_k + q_k \tag{4}$$
where $a_i$, $b_i$, $c_i$, $d_i$, $e_i$, $f_j$, $g_j$, $h_j$, $l_j$, $m_j$, $n_j$, $o_k$, $p_k$, and $q_k$ represent the cost coefficients of the PO, CHP, and HO units. The sinusoidal term in Equation (2) represents the valve point effect [37] of the PO unit, which is shown in Figure 1.
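As a rough illustration, the three fuel cost functions above can be sketched in Python as follows. The function names, the parameter names `hh` and `p_coef` (standing in for the coefficients of the CHP heat term and the HO linear term), and the numbers in the usage line are assumptions made here for illustration; they are not taken from the test systems of this paper.

```python
import math


def po_cost(p, a, b, c, d, e, p_min):
    """Fuel cost of a power-only unit, including the valve-point term."""
    return a * p**2 + b * p + c + abs(d * math.sin(e * (p_min - p)))


def chp_cost(p, h, f, g, l, hh, m, n):
    """Fuel cost of a CHP unit; hh is the coefficient of the squared heat term."""
    return f * p**2 + g * p + l + hh * h**2 + m * h + n * p * h


def ho_cost(h, o, p_coef, q):
    """Fuel cost of a heat-only unit; p_coef is the linear coefficient."""
    return o * h**2 + p_coef * h + q


# Made-up coefficients, for illustration only.
print(po_cost(p=60.0, a=0.01, b=2.0, c=10.0, d=5.0, e=0.1, p_min=50.0))
```

At the lower power limit the sine argument is zero, so the valve-point term vanishes and only the quadratic part of the PO cost remains.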

2.1.2. Gas Emissions

The pollutant gases generated during power generation include NO$_x$, SO$_2$, and CO$_2$. The gas emission objective function $F_E$ is composed of the gas emissions of the PO, CHP, and HO units [22], which is described as follows:
$$\min F_E = \sum_{i=1}^{N_P} E_i(P_i) + \sum_{j=1}^{N_C} E_j(P_j^C) + \sum_{k=1}^{N_H} E_k(H_k) \tag{5}$$
where $E_i(P_i)$, $E_j(P_j^C)$, and $E_k(H_k)$ represent the gas emission of the $i$th PO unit, $j$th CHP unit, and $k$th HO unit, respectively.
The gas emission functions of the PO, CHP, and HO units are formulated as follows:
$$E_i(P_i) = \alpha_i P_i^2 + \beta_i P_i + \gamma_i + \delta_i e^{\epsilon_i P_i} \tag{6}$$
$$E_j(P_j^C) = \zeta_j P_j^C \tag{7}$$
$$E_k(H_k) = \eta_k H_k \tag{8}$$
where $\alpha_i$, $\beta_i$, $\gamma_i$, $\delta_i$, $\epsilon_i$, $\zeta_j$, and $\eta_k$ represent the emission coefficients of the PO, CHP, and HO units.
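The emission objective can be sketched in the same way. The tuple layouts used to pass unit data below are an assumption chosen for brevity, and the numbers in the usage line are made up.

```python
import math


def po_emission(p, alpha, beta, gamma, delta, eps):
    """Emission of a power-only unit: quadratic part plus exponential term."""
    return alpha * p**2 + beta * p + gamma + delta * math.exp(eps * p)


def total_emission(po_units, chp_units, ho_units):
    """Sum the emissions of all units.

    po_units:  iterable of (P, alpha, beta, gamma, delta, eps)
    chp_units: iterable of (P_C, zeta)
    ho_units:  iterable of (H, eta)
    """
    e_po = sum(po_emission(*u) for u in po_units)
    e_chp = sum(zeta * p for p, zeta in chp_units)
    e_ho = sum(eta * h for h, eta in ho_units)
    return e_po + e_chp + e_ho


# Made-up data: one unit of each type.
print(total_emission([(0.0, 1.0, 1.0, 2.0, 3.0, 0.5)], [(10.0, 0.1)], [(20.0, 0.05)]))
```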

2.2. Constraints

2.2.1. Power Balance Constraint

The total power generated by all PO and CHP units should be equal to the total power demand $P_D$ plus the transmission loss $P_L$:
$$\sum_{i=1}^{N_P} P_i + \sum_{j=1}^{N_C} P_j^C = P_D + P_L \tag{9}$$
The transmission loss $P_L$ can be calculated by Kron's loss formula:
$$P_L = \sum_{i=1}^{N_P+N_C} \sum_{j=1}^{N_P+N_C} \bar{P}_i B_{ij} \bar{P}_j + \sum_{i=1}^{N_P+N_C} \bar{P}_i B_{0i} + B_{00} \tag{10}$$
where $B_{ij}$, $B_{0i}$, and $B_{00}$ are the coefficients of the B-matrix.
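A minimal sketch of Kron's loss formula in plain Python is given below. It assumes that the argument `p` is the concatenated list of PO and CHP power outputs and that `B` is a square nested list of matching size; the B-coefficients in the usage line are made up.

```python
def kron_loss(p, B, B0, B00):
    """Transmission loss from Kron's formula: p^T B p + B0^T p + B00."""
    n = len(p)
    quadratic = sum(p[i] * B[i][j] * p[j] for i in range(n) for j in range(n))
    linear = sum(p[i] * B0[i] for i in range(n))
    return quadratic + linear + B00


# Made-up two-unit example.
print(kron_loss([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], 0.5))
```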

2.2.2. Heat Balance Constraint

The total heat generated by all CHP and HO units should be equal to the total heat demand $H_D$:
$$\sum_{j=1}^{N_C} H_j^C + \sum_{k=1}^{N_H} H_k = H_D \tag{11}$$

2.2.3. Capacity Constraint of the PO Units

The capacity constraint of the PO units is:
$$P_i^{\min} \le P_i \le P_i^{\max}, \quad i = 1, \ldots, N_P \tag{12}$$
where $P_i^{\min}$ and $P_i^{\max}$ are the lower and upper limits of the $i$th PO unit, respectively.

2.2.4. Capacity Constraint of the CHP Units

The power and heat produced by the CHP units are coupled to each other and confined to a polygonal region called the feasible operation region, as illustrated in Figure 2. Therefore, the upper and lower power limits of the $j$th CHP unit are determined by its heat $H_j^C$, and the upper and lower heat limits of the $j$th CHP unit are determined by its power $P_j^C$:
$$\begin{aligned} P_j^{C,\min}(H_j^C) \le P_j^C \le P_j^{C,\max}(H_j^C), &\quad j = 1, \ldots, N_C \\ H_j^{C,\min}(P_j^C) \le H_j^C \le H_j^{C,\max}(P_j^C), &\quad j = 1, \ldots, N_C \end{aligned} \tag{13}$$
where $P_j^{C,\min}(H_j^C)$ and $P_j^{C,\max}(H_j^C)$ are the functions of the lower and upper power limits of the CHP unit. Similarly, $H_j^{C,\min}(P_j^C)$ and $H_j^{C,\max}(P_j^C)$ are the functions of the lower and upper heat limits of the CHP unit, as shown in Figure 2.

2.2.5. Capacity Constraint of the HO Units

The capacity constraint of the HO units is:
$$H_k^{\min} \le H_k \le H_k^{\max}, \quad k = 1, \ldots, N_H \tag{14}$$
where $H_k^{\min}$ and $H_k^{\max}$ are the lower and upper limits of the $k$th HO unit, respectively.

3. Proposed RLMODE Algorithm

3.1. MODE Algorithm

3.1.1. Initialization

At the beginning, MODE randomly initializes $N$ candidate solutions $\{X_i^0, i = 1, \ldots, N\}$ as follows:
$$X_i^0 = X_L + \mathrm{rand} \cdot (X_U - X_L) \tag{15}$$
where $X_L$ and $X_U$ are the lower and upper bounds, respectively; $\mathrm{rand} \in [0, 1]^D$ is a vector of random real values; $D$ is the number of optimization variables.
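The random initialization above can be sketched as follows. Passing an explicit `random.Random` instance is a design choice made here so that runs are reproducible; the bounds in the usage line are made up.

```python
import random


def init_population(n, lower, upper, rng):
    """Draw n candidate solutions uniformly inside the box [lower, upper]."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]


rng = random.Random(0)
population = init_population(4, [0.0, 10.0], [1.0, 20.0], rng)
print(len(population), len(population[0]))
```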

3.1.2. Mutation

The differential mutation is the key production operator, which is used to generate the mutant solutions. The classic mutation strategy DE/rand/1 is described as follows:
$$V_i^G = X_{r_1}^G + F_i \cdot \left( X_{r_2}^G - X_{r_3}^G \right) \tag{16}$$
where $V_i^G = [V_{i,1}^G, V_{i,2}^G, \ldots, V_{i,D}^G]$ is the mutant solution; $G$ is the generation number; $r_1, r_2, r_3 \in \{1, 2, \ldots, N\}$ are three random indices with $r_1 \ne r_2 \ne r_3 \ne i$; $F_i$ is the scale factor for the $i$th individual, which is used for scaling the difference vector.
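A minimal sketch of the DE/rand/1 mutation is shown below. The population is assumed to be a list of equal-length lists with at least four members, so that three distinct indices different from `i` can be drawn; the function name and signature are illustrative.

```python
import random


def de_rand_1(pop, i, F, rng):
    """DE/rand/1 mutant for individual i: X_r1 + F * (X_r2 - X_r3)."""
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)  # three distinct indices, none equal to i
    return [
        pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
        for d in range(len(pop[i]))
    ]


rng = random.Random(0)
pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
print(de_rand_1(pop, 0, 0.5, rng))
```

With `F = 0` the mutant collapses to the base vector, which makes the role of the scale factor easy to see.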

3.1.3. Crossover

The crossover operator aims at increasing the population diversity of the algorithm. The binomial crossover operator is described as follows:
$$U_{i,j}^G = \begin{cases} V_{i,j}^G, & \text{if } rand(0,1) \le CR \text{ or } j = j_{\mathrm{rand}} \\ X_{i,j}^G, & \text{otherwise} \end{cases} \tag{17}$$
where $U_i^G = [U_{i,1}^G, \ldots, U_{i,j}^G, \ldots, U_{i,D}^G]$ is the offspring solution; $rand(0,1) \in [0, 1]$ is a random real number; $j_{\mathrm{rand}} \in [1, D]$ is a random integer; $CR$ is the crossover rate within [0, 1].
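The crossover rule above can be sketched as follows, using zero-based indices. The forced index `j_rand` guarantees that the offspring inherits at least one component from the mutant.

```python
import random


def binomial_crossover(x, v, CR, rng):
    """Mix parent x and mutant v component by component."""
    D = len(x)
    j_rand = rng.randrange(D)  # this position always comes from the mutant
    return [
        v[j] if (rng.random() <= CR or j == j_rand) else x[j]
        for j in range(D)
    ]


rng = random.Random(0)
print(binomial_crossover([0, 0, 0, 0], [1, 1, 1, 1], 0.5, rng))
```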

3.1.4. Selection

After the crossover operator, the offspring solutions and parent solutions are merged into one large group. Then, the fast nondominated sorting (FNS) and crowding distance (CD) operators are used to select the better solutions for the next generation [25]. The FNS approach is shown in Figure 3a. The FNS approach divides the merged population into several frontiers according to the dominance relationship, where the solutions in the frontier S1 are the best level, the solutions in the frontier S2 are the second-best level, and so on. To estimate the density of the individuals in the same frontier, the CD operator is used, as shown in Figure 3b. For the boundary solutions, the CD value is set to infinity; for the other solutions, the CD value of the $i$th solution is the mean side length of the rectangle formed by the $(i-1)$th and $(i+1)$th solutions.
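The two building blocks of this selection step, Pareto dominance and crowding distance, can be sketched as follows for minimization problems. Normalizing each side length by the objective range of the frontier follows the usual NSGA-II convention and is an assumption here, since the text above does not state whether the objectives are normalized.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(front):
    """Mean normalized side length of the box spanned by each point's neighbors."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    m = len(front[0])
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        # Boundary solutions always receive an infinite distance.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            side = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[i] += side / (hi - lo) / m
    return dist


print(crowding_distance([(0.0, 4.0), (1.0, 2.0), (4.0, 0.0)]))
```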

3.2. RLMODE Algorithm

3.2.1. Reinforcement Learning Technique

RL is an important machine learning technique mainly including five elements, namely the environment, agent, state, action, and reward [36]. After the agent executes an action, the environment will turn into a new state. For the impact (positive or negative) caused by the new environmental state, a reward (positive or negative) will be sent to the agent. Then, the agent performs a new action based on the reward and the new state from the environmental feedback, as shown in Figure 4.
The Q-learning technique is a representative value-based RL model [38]. Q-learning is simple in structure and does not require any prior knowledge, because the agent learns while performing its task. The Q-learning framework is shown in Algorithm 1.
The formula for updating the Q value is:
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{18}$$
where $Q(s_t, a_t)$ is the Q value for state $s_t$ and action $a_t$, $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r_t$ is the reward of the current generation, and $\max_a Q(s_{t+1}, a)$ is the maximum Q value over the actions in the next state $s_{t+1}$.
Algorithm 1 Pseudocode for Q-learning.
Require: State $s_t$, action $a_t$, discount factor $\gamma$, learning rate $\alpha$, reward $R$.
Ensure: Final state $s$.
 1: Initialize the Q table.
 2: Randomly initialize the current state $s_t$.
 3: while $FES \le maxFES$ do
 4:     Choose the best action $a_t$ based on the Q table;
 5:     Perform action $a_t$, and obtain a reward $r_t$;
 6:     Obtain the maximum Q value of the next state $s_{t+1}$;
 7:     Update the Q table by Equation (18);
 8:     Set the current state $s_t = s_{t+1}$;
 9:     $FES = FES + 1$
10: end while
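The Q-table update used inside the loop of Algorithm 1 can be sketched as a single function. The Q table is assumed to be a nested list indexed as `Q[state][action]` with zero-based indices; the parameter values in the usage line are made up.

```python
def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q[s][a]."""
    best_next = max(Q[s_next])  # max over actions in the next state
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]


Q = [[0.0, 0.0, 0.0] for _ in range(3)]  # 3 states x 3 actions
print(q_update(Q, s=0, a=1, r=1.0, s_next=2, alpha=0.1, gamma=0.9))
```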

3.2.2. Q-Learning Parameter Adjustment

In the proposed RLMODE algorithm, Q-learning is employed to adjust the control parameter (i.e., the scale factor $F_i$). The Q table is used to record the values of the (state, action) pairs. As shown in Figure 5, for each individual, the agent has three types of states and three types of actions for each state. The probability that the agent selects each action in each state is determined according to the values in the Q table.
Three states are defined in RLMODE, i.e.:
  • State $S = 1$: the offspring solution dominates its own parent solution, indicating that the mutation operator achieves success, and a positive reward value $R = 1$ is assigned;
  • State $S = 2$: the offspring solution does not dominate its own parent solution, but dominates one of the other parent solutions, indicating that the mutation operator is relatively successful, and a middle reward value $R = 0.5$ is assigned;
  • State $S = 3$: the offspring solution dominates neither its own parent solution nor any of the other parent solutions, which indicates that the mutation operator fails, and no reward is assigned ($R = 0$).
Three actions are used to adjust the scale factor: (1) $dF = -0.1$; (2) $dF = 0$; and (3) $dF = 0.1$.
The probability that an agent selects action $a_j$ in state $s_i$ is determined by the softmax strategy:
$$\pi(s_i, a_j) = \frac{e^{Q(s_i, a_j)/T}}{\sum_{j=1}^{n} e^{Q(s_i, a_j)/T}} \tag{19}$$
where $\pi(s_i, a_j)$ is the selection probability for the agent, $T$ is the temperature parameter, and $n$ is the number of actions.
After selecting the action, the agent adjusts its scale factor $F_i$ as follows:
$$F_i = F_i + dF_i \tag{20}$$
In RLMODE, each individual has an independent Q table, and therefore, there are in total $N$ Q tables. Each individual updates its Q table independently during the iterative process.
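The state and reward assignment, the softmax action selection, and the scale factor adjustment described above can be sketched together as follows. Several details are assumptions made for this sketch and are not stated in the text: the clipping of the scale factor to `[f_min, f_max]`, the default bound values, the zero-based action indices, and the subtraction of the row maximum before exponentiation (a standard numerical safeguard that leaves the probabilities unchanged).

```python
import math
import random

ACTIONS = (-0.1, 0.0, 0.1)  # candidate changes dF of the scale factor


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def state_and_reward(child, parent, other_parents):
    """Map the dominance outcome of an offspring to (state, reward)."""
    if dominates(child, parent):
        return 1, 1.0
    if any(dominates(child, other) for other in other_parents):
        return 2, 0.5
    return 3, 0.0


def softmax_probs(q_row, T):
    """Selection probabilities for one state; T is the temperature."""
    top = max(q_row)
    weights = [math.exp((q - top) / T) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_row, T, rng):
    """Sample an action index according to the softmax probabilities."""
    return rng.choices(range(len(q_row)), weights=softmax_probs(q_row, T), k=1)[0]


def adjust_scale_factor(F, action, f_min=0.1, f_max=1.0):
    """Add the chosen dF and clip (the clipping bounds are hypothetical)."""
    return min(max(F + ACTIONS[action], f_min), f_max)


rng = random.Random(0)
action = choose_action([0.2, 0.0, 0.5], T=1.0, rng=rng)
print(action, adjust_scale_factor(0.5, action))
```

A larger temperature makes the three actions closer to equally likely, whereas a small temperature concentrates the probability on the action with the largest Q value.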

3.2.3. Elite-Guided Mutation

In order to enhance the convergence speed, an elite-guided mutation operator is employed in the RLMODE algorithm. The elite-guided mutation operator is defined as follows:
$$V_i^G = X_i^G + F_i \cdot \left( pBest_i - X_i^G \right) + F_i \cdot \left( X_{r_1}^G - X_{r_2}^G \right) \tag{21}$$
where $pBest_i$ is one of the top 10% of individuals in the population after the fast nondominated sorting and crowding distance operators.
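A minimal sketch of the elite-guided mutation is given below. It assumes that the caller supplies `elite_indices`, the indices of the top 10% of individuals after sorting, and that the two random indices are distinct and different from `i`; the latter restriction is an assumption carried over from DE/rand/1 rather than something stated above.

```python
import random


def elite_guided_mutation(pop, i, F, elite_indices, rng):
    """Mutant X_i + F * (pBest - X_i) + F * (X_r1 - X_r2)."""
    p_best = pop[rng.choice(elite_indices)]
    r1, r2 = rng.sample([k for k in range(len(pop)) if k != i], 2)
    x = pop[i]
    return [
        x[d] + F * (p_best[d] - x[d]) + F * (pop[r1][d] - pop[r2][d])
        for d in range(len(x))
    ]


rng = random.Random(0)
pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(elite_guided_mutation(pop, 3, 0.5, elite_indices=[0], rng=rng))
```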

3.2.4. Pseudocode of RLMODE Algorithm

By using the reinforcement learning technique, the detailed pseudocode of RLMODE is shown in Algorithm 2.
Algorithm 2 Pseudocode of the RLMODE algorithm.
Require: Population size $N$, crossover rate $CR$, discount factor $\gamma$, learning rate $\alpha$.
Ensure: The Pareto-optimal solutions.
 1: // ====== Initialization ====== //
 2: Initialize action matrix $a_0$, state matrix $s_0$, reward matrix $R$, Q tables;
 3: Set $FES = 0$, $G = 0$;
 4: Initialize the population $\{X_i^0, i = 1, \ldots, N\}$ according to Equation (15);
 5: Evaluate the fitness of the population;
 6: Sort the population using fast nondominated sorting (FNS) and crowding distance (CD) operators;
 7: $FES = FES + N$;
 8: while $FES < maxFES$ do
 9:     // ====== Mutation and crossover ====== //
10:     for $i = 1$ to $N$ do
11:         Generate $V_i^G$ using the elite-guided mutation operator according to Equation (21);
12:         Generate $U_i^G$ using the crossover operator according to Equation (17);
13:     end for
14:     // ====== Q-learning-based parameter adjustment ====== //
15:     for $i = 1$ to $N$ do
16:         Calculate the action selection probability for the $i$th agent according to Equation (19);
17:         Choose the action to adjust the value $dF_i$;
18:         Update the action of the $i$th agent;
19:         Evaluate the fitness of offspring $U_i^G$;
20:         if $U_i^G \prec X_i^G$ then
21:             Set the reward $R_i = 1$, and state $S_i = 1$;
22:         else if $U_i^G \prec$ other $X_j^G$ $(j \ne i)$ then
23:             Set the reward $R_i = 0.5$, and state $S_i = 2$;
24:         else
25:             Set the reward $R_i = 0$, and state $S_i = 3$;
26:         end if
27:         Update the Q table for the $i$th agent;
28:         Adjust the scale factor $F_i$ for $X_i$ according to Equation (20);
29:     end for
30:     // ====== Pareto selection ====== //
31:     Merge the parent and offspring into a large population with $2N$ solutions;
32:     Sort the population using FNS and CD operators and choose the best $N$ solutions for the next generation;
33:     $FES = FES + N$; $G = G + 1$
34: end while

4. Implementation of RLMODE for Solving CHPEED

The individual $X_i$ is composed of the power and heat outputs of the PO, CHP, and HO units:
$$X_i = \left[ P_{i,1}, \ldots, P_{i,N_P}, P_{i,1}^C, \ldots, P_{i,N_C}^C, H_{i,1}^C, \ldots, H_{i,N_C}^C, H_{i,1}, \ldots, H_{i,N_H} \right], \quad i = 1, \ldots, N \tag{22}$$
The constraint repair techniques are as follows:
(1) For the power balance constraint, the power output vector is composed of the PO and CHP units, i.e., $X_i^P = [P_{i,1}, \ldots, P_{i,N_P}, P_{i,1}^C, \ldots, P_{i,N_C}^C]$. The difference between the power demand (including the transmission loss) and the power production is defined as:
$$dif_P = P_D + P_L - \sum_{j=1}^{N_P} P_{i,j} - \sum_{j=1}^{N_C} P_{i,j}^C \tag{23}$$
If $|dif_P| > ep$ ($ep$ is a very small positive value), then randomly select a dimension variable $X_{i,j}$ from $X_i^P$, and $X_{i,j}$ is repaired as follows:
$$X_{i,j} = \begin{cases} \min\left( X_{i,j} + dif_P,\; P_j^{\max} \right), & \text{if } dif_P > 0 \text{ and } X_{i,j} \le P_j^{\max} \\ \max\left( X_{i,j} - |dif_P|,\; P_j^{\min} \right), & \text{if } dif_P < 0 \text{ and } X_{i,j} \ge P_j^{\min} \\ X_{i,j}, & \text{otherwise} \end{cases} \tag{24}$$
After repairing $X_{i,j}$, re-calculate the value of $dif_P$. If $|dif_P| > ep$, then select another dimension variable $X_{i,k}$ from $X_i^P$ that was not previously selected and continue to repair $X_{i,k}$ using Equation (24). The above repair process is repeated until $|dif_P| \le ep$.
(2) For the heat balance constraint, the heat output vector is composed of the CHP and HO units, i.e., $X_i^H = [H_{i,1}^C, \ldots, H_{i,N_C}^C, H_{i,1}, \ldots, H_{i,N_H}]$. The difference between the heat demand and the heat production is defined as:
$$dif_H = H_D - \sum_{j=1}^{N_C} H_{i,j}^C - \sum_{j=1}^{N_H} H_{i,j} \tag{25}$$
If $|dif_H| > ep$, then randomly select a dimension variable $X_{i,j}$ from $X_i^H$, and $X_{i,j}$ is repaired as follows:
$$X_{i,j} = \begin{cases} \min\left( X_{i,j} + dif_H,\; H_j^{\max} \right), & \text{if } dif_H > 0 \text{ and } X_{i,j} \le H_j^{\max} \\ \max\left( X_{i,j} - |dif_H|,\; H_j^{\min} \right), & \text{if } dif_H < 0 \text{ and } X_{i,j} \ge H_j^{\min} \\ X_{i,j}, & \text{otherwise} \end{cases} \tag{26}$$
After repairing $X_{i,j}$, re-calculate the value of $dif_H$. If $|dif_H| > ep$, then select another dimension variable $X_{i,k}$ from $X_i^H$ that was not previously selected, and continue to repair $X_{i,k}$ using Equation (26). The above repair process is repeated until $|dif_H| \le ep$.
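(3) The power output of the PO units is repaired as:
$$P_{i,j} = \begin{cases} P_j^{\min}, & \text{if } P_{i,j} \le P_j^{\min} \\ P_j^{\max}, & \text{if } P_{i,j} \ge P_j^{\max} \\ P_{i,j}, & \text{otherwise} \end{cases} \tag{27}$$
(4) The output of the CHP units is repaired as:
$$P_{i,j}^C = \begin{cases} P_j^{C,\min}(H_{i,j}^C), & \text{if } P_{i,j}^C \le P_j^{C,\min}(H_{i,j}^C) \\ P_j^{C,\max}(H_{i,j}^C), & \text{if } P_{i,j}^C \ge P_j^{C,\max}(H_{i,j}^C) \\ P_{i,j}^C, & \text{otherwise} \end{cases} \tag{28}$$
$$H_{i,j}^C = \begin{cases} H_j^{C,\min}(P_{i,j}^C), & \text{if } H_{i,j}^C \le H_j^{C,\min}(P_{i,j}^C) \\ H_j^{C,\max}(P_{i,j}^C), & \text{if } H_{i,j}^C \ge H_j^{C,\max}(P_{i,j}^C) \\ H_{i,j}^C, & \text{otherwise} \end{cases} \tag{29}$$
(5) The heat output of the HO units is repaired as:
$$H_{i,j} = \begin{cases} H_j^{\min}, & \text{if } H_{i,j} \le H_j^{\min} \\ H_j^{\max}, & \text{if } H_{i,j} \ge H_j^{\max} \\ H_{i,j}, & \text{otherwise} \end{cases} \tag{30}$$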
In total, the constraint repair process is shown in Figure 6.
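The balance repair and the bound repair described in steps (1) to (5) can be sketched with two small helpers. This is a simplified illustration under stated assumptions: the target (demand plus loss) is treated as a fixed number, whereas in the full problem the transmission loss changes with the power outputs; the bounds are fixed numbers, whereas the CHP bounds depend on the operating point; and the variables are visited in a random order without repetition.

```python
import random


def clip(values, lower, upper):
    """Bound repair: push each value back into [lower, upper]."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(values, lower, upper)]


def repair_balance(values, lower, upper, target, ep=1e-6, rng=None):
    """Shift randomly chosen variables until sum(values) is within ep of target."""
    rng = rng or random.Random()
    x = list(values)
    order = list(range(len(x)))
    rng.shuffle(order)  # each variable is selected at most once
    for j in order:
        dif = target - sum(x)
        if abs(dif) <= ep:
            break
        if dif > 0 and x[j] <= upper[j]:
            x[j] = min(x[j] + dif, upper[j])
        elif dif < 0 and x[j] >= lower[j]:
            x[j] = max(x[j] - abs(dif), lower[j])
    return x


rng = random.Random(0)
repaired = repair_balance([10.0, 20.0, 30.0], [0.0] * 3, [50.0] * 3, target=100.0, rng=rng)
print(repaired, sum(repaired))
```

If the target cannot be reached within the bounds, the loop ends with a remaining imbalance, which is why a constraint violation measure is still needed afterwards.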
Some constraints may still not be strictly satisfied after the constraint repair technique has been applied. Therefore, the total constraint violation degree of each individual $X_i$ is calculated as follows:
$$\begin{aligned} V(X_i) ={}& V_{PB} + V_{HB} + V_P + V_{CHP} + V_H \\ ={}& \left| P_D + P_L - \sum_{j=1}^{N_P} P_{i,j} - \sum_{j=1}^{N_C} P_{i,j}^C \right| + \left| H_D - \sum_{j=1}^{N_C} H_{i,j}^C - \sum_{j=1}^{N_H} H_{i,j} \right| \\ &+ \sum_{j=1}^{N_P} \left[ \max\left( P_{i,j} - P_j^{\max}, 0 \right) + \max\left( P_j^{\min} - P_{i,j}, 0 \right) \right] \\ &+ \sum_{j=1}^{N_C} \left[ \max\left( P_{i,j}^C - P_j^{C,\max}(H_{i,j}^C), 0 \right) + \max\left( P_j^{C,\min}(H_{i,j}^C) - P_{i,j}^C, 0 \right) \right] \\ &+ \sum_{j=1}^{N_C} \left[ \max\left( H_{i,j}^C - H_j^{C,\max}(P_{i,j}^C), 0 \right) + \max\left( H_j^{C,\min}(P_{i,j}^C) - H_{i,j}^C, 0 \right) \right] \\ &+ \sum_{j=1}^{N_H} \left[ \max\left( H_{i,j} - H_j^{\max}, 0 \right) + \max\left( H_j^{\min} - H_{i,j}, 0 \right) \right] \end{aligned} \tag{31}$$
where $V_{PB}$ and $V_{HB}$ are the violation degrees of the power balance and heat balance constraints, respectively; $V_P$, $V_{CHP}$, and $V_H$ are the violation degrees of the PO, CHP, and HO capacity constraints, respectively.
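The total violation degree can be sketched as a sum of balance residuals and bound violations. In this sketch, the CHP limits are passed as hypothetical callables that return a `(lower, upper)` pair for a given heat or power value; this interface and all numbers in the usage example are assumptions made for illustration.

```python
def bound_violation(x, lo, hi):
    """Amount by which x lies outside [lo, hi]; zero when inside."""
    return max(x - hi, 0.0) + max(lo - x, 0.0)


def total_violation(p, pc, hc, h, p_demand, p_loss, h_demand,
                    p_lim, chp_p_lim, chp_h_lim, h_lim):
    """Sum of the power balance, heat balance, and capacity violations."""
    v_pb = abs(p_demand + p_loss - sum(p) - sum(pc))
    v_hb = abs(h_demand - sum(hc) - sum(h))
    v_p = sum(bound_violation(x, lo, hi) for x, (lo, hi) in zip(p, p_lim))
    v_chp = 0.0
    for pj, hj, p_bounds, h_bounds in zip(pc, hc, chp_p_lim, chp_h_lim):
        v_chp += bound_violation(pj, *p_bounds(hj))  # power limits depend on heat
        v_chp += bound_violation(hj, *h_bounds(pj))  # heat limits depend on power
    v_h = sum(bound_violation(x, lo, hi) for x, (lo, hi) in zip(h, h_lim))
    return v_pb + v_hb + v_p + v_chp + v_h


# Made-up single-unit example: PO = 10, CHP = (20, 5), HO = 5.
print(total_violation(
    [10.0], [20.0], [5.0], [5.0],
    p_demand=35.0, p_loss=0.0, h_demand=10.0,
    p_lim=[(0.0, 8.0)],
    chp_p_lim=[lambda heat: (0.0, 50.0)],
    chp_h_lim=[lambda power: (0.0, 4.0)],
    h_lim=[(0.0, 10.0)],
))
```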
Now, both the objective function and constraint violation degree are obtained. When applying the RLMODE algorithm to handle the CHPEED problem, the constraint domination principle (CDP) [39] is also adopted.
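The constraint domination principle is commonly stated as three rules: a feasible solution beats an infeasible one, the smaller violation wins between two infeasible solutions, and Pareto dominance decides between two feasible solutions. The sketch below follows this common formulation; whether [39] handles ties or tolerances in exactly this way is not confirmed here.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_dominates(f_a, v_a, f_b, v_b):
    """Compare two solutions by violation degree first, then by objectives."""
    if v_a == 0 and v_b > 0:
        return True   # feasible beats infeasible
    if v_a > 0 and v_b == 0:
        return False
    if v_a > 0 and v_b > 0:
        return v_a < v_b  # smaller violation wins
    return dominates(f_a, f_b)  # both feasible: plain Pareto dominance


print(constrained_dominates((5.0, 5.0), 0.0, (1.0, 1.0), 0.3))
```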
The flowchart of RLMODE for handling CHPEED is shown in Figure 7.

5. Simulation Results

The RLMODE algorithm was utilized to solve four CHPEED problems with 5, 7, 100, and 140 generating units. The effectiveness of the RLMODE algorithm was verified by comparing it with four representative multi-objective optimization algorithms, namely TV-MOPSO [27], GDE3 [40], NSGA-II-DE [41], and MODE-RMO [42]. The parameter settings of these multi-objective algorithms are given in Table 1. All the multi-objective algorithms were run 30 times independently.

5.1. Case 1: Five-Unit CHPEED Problem

The first case was a five-unit CHPEED problem chosen from [25]. It consists of 1 PO unit, 3 CHP units, and 1 HO unit. The power requirement and heat requirement were 300 MW and 150 MWth, respectively. The computational budget, i.e., the maximum number of function evaluations, was set to $maxFES = 1000$.
Table 2 presents the results of economic dispatch (EcD), emission dispatch (EmD), and economic emission dispatch (EED) for Case 1. From Table 2, it can be seen that:
  • In the case of EcD, the costs of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were USD 13,686.49, 13,712.33, 13,700.49, 13,675.28, and 13,674.70, respectively. Therefore, RLMODE achieved the smallest cost among the five algorithms.
  • In the case of EmD, the emissions of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were 1.21 kg, 1.24 kg, 1.23 kg, 1.23 kg, and 1.21 kg, respectively. Therefore, RLMODE and TV-MOPSO achieved the smallest emission.
  • In the case of EED, the best compromise solutions of the five algorithms are given. The cost and emission of RLMODE were USD 14,856.36 and 6.09 kg, which were smaller than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. Therefore, RLMODE achieved the best compromise solution. Owing to the added complexity of the RLMODE algorithm, however, its simulation time and memory usage were not the best among the compared algorithms.
The Pareto-optimal frontier (POF) obtained by TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE is plotted in Figure 8.
To quantitatively compare the POFs obtained by these algorithms, three performance metrics were employed: the diversity metric (DM) [43], hypervolume (HV) [44], and inverted generational distance (IGD) [45]. DM measures the diversity of the POF, and a larger DM value means better diversity. HV measures both the convergence and diversity of the POF, and a larger HV value indicates better performance. IGD also measures both convergence and diversity, and a smaller IGD value indicates better performance.
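For two objectives, HV and IGD can be computed with a few lines of code, as sketched below. The choice of the reference point for HV and of the reference front for IGD is an assumption left to the caller, and the sketch follows the textbook definitions of both metrics rather than the exact settings of [44,45]. DM is omitted because its definition in [43] is not reproduced here.

```python
import math


def hypervolume_2d(front, ref):
    """Area dominated by a two-objective front and bounded by ref (minimization)."""
    points = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in points:
        if f2 < prev_f2:  # skip points dominated by an earlier one
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv


def igd(reference_front, obtained_front):
    """Mean distance from each reference point to its nearest obtained point."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sum(
        min(distance(r, s) for s in obtained_front) for r in reference_front
    ) / len(reference_front)


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(4.0, 4.0)), igd([(0.0, 0.0)], [(3.0, 4.0)]))
```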
Table 3 presents the statistical results of the DM, HV, and IGD metrics including the minimum, mean, maximum values, and standard deviation (Std) based on 30 independent runs. Meanwhile, the Wilcoxon rank sum test was performed, where “+” and “=” mean RLMODE is significantly better than or similar to the comparison algorithm, respectively. As can be seen from Table 3:
  • Concerning DM, the minimum, mean, and maximum values and standard deviation of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO.
  • Regarding HV, the minimum, mean, and maximum values of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The standard deviation of RLMODE was the second-best after TV-MOPSO.
  • Considering IGD, the mean and maximum values and standard deviation of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The minimum IGD of RLMODE was the second-best after TV-MOPSO.
  • Based on the Wilcoxon test, RLMODE was notably better than GDE3, NSGA-II-DE, and MODE-RMO in terms of DM, HV, and IGD. RLMODE was notably better than TV-MOPSO in terms of HV and similar to TV-MOPSO in terms of DM and IGD.
From the above analysis, the RLMODE algorithm achieved the overall best performance in terms of convergence and diversity for Case 1.

5.2. Case 2: Seven-Unit CHPEED Problem

The second case was the seven-unit CHPEED problem selected from [25]. It consisted of 4 PO units, 2 CHP units, and 1 HO unit. The power requirement and heat requirement were 600 MW and 150 MWth, respectively. The maximum number of function evaluations was set to $maxFES = 2000$.
Table 4 presents the results of the EcD, EmD, and EED for Case 2. From Table 4, it can be seen that:
  • In the case of EcD, the costs of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were USD 10,261.88, 10,298.40, 10,222.16, 10,249.37, and 10,212.26, respectively. Therefore, RLMODE achieved the smallest cost among the five algorithms.
  • In the case of EmD, the emissions of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were 7.75 kg, 7.88 kg, 7.74 kg, 7.59 kg, and 7.54 kg, respectively. Therefore, RLMODE achieved the smallest emission among the five algorithms.
  • In the case of EED, the cost and emission of RLMODE were USD 12,000.28 and 18.42 kg. The cost was the smallest among the five algorithms, and the emission equaled the lowest value, which was also reached by TV-MOPSO (18.42 kg). Therefore, RLMODE achieved the best compromise solution.
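To make the notion of a compromise solution concrete, the sketch below picks one point from a set of (cost, emission) pairs using a fuzzy-membership rule that is common in the dispatch literature. This rule is an assumption: the section does not restate how the paper selects the compromise solution, and the function name is hypothetical. The three points are the RLMODE EcD, EED, and EmD results from Table 4.

```python
def compromise_solution(front):
    """Return the point with the largest summed fuzzy membership (minimization)."""
    n_obj = len(front[0])
    lo = [min(p[k] for p in front) for k in range(n_obj)]
    hi = [max(p[k] for p in front) for k in range(n_obj)]

    def membership(p):
        # Each objective scores 1 at its best (smallest) value and 0 at its worst.
        return sum(
            (hi[k] - p[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 1.0
            for k in range(n_obj)
        )

    return max(front, key=membership)


# (cost in USD, emission in kg) for RLMODE in Case 2, taken from Table 4.
points = [(10212.26, 28.75), (12000.28, 18.42), (17640.14, 7.54)]
print(compromise_solution(points))
```

With only the two extreme points and one interior point, the interior point wins because it gives up a moderate amount on each objective rather than everything on one of them.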
The Pareto-optimal frontier (POF) obtained by TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE is plotted in Figure 9.
Table 5 presents the statistical results of the performance metrics based on 30 independent runs. As can be seen from Table 5:
  • Concerning DM, the minimum, mean, and maximum values of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO.
  • Regarding HV, the minimum and mean values and standard deviation of RLMODE were better than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO, and its maximum value (0.2827) equaled that of the best competitor, NSGA-II-DE.
  • Considering IGD, the minimum and mean values of RLMODE were better than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The maximum IGD of RLMODE was the second-best after NSGA-II-DE, and its standard deviation was the second-best after TV-MOPSO.
  • Based on the Wilcoxon test, RLMODE was notably better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO in terms of DM, HV, and IGD.
From the above analysis, the RLMODE algorithm achieved the overall best performance in the aspects of convergence and diversity for Case 2.

5.3. Case 3: 100-Unit CHPEED Problem

The third case was a 100-unit CHPEED problem, which was established by duplicating the five-unit Case 1 20 times. It consisted of 20 PO units, 60 CHP units, and 20 HO units. The power requirement and heat requirement were 6000 MW and 3000 MWth, respectively. The maximum computational resource $maxFES$ = 20,000 was used.
Table 6 presents the cost and emission results of the EcD, EmD, and EED for Case 3. Due to the large size of 100 units, the detailed dispatch results of the PO, HO, and CHP units by the other algorithms are given in Tables S1–S3 in the Supplementary File. From Table 6, it can be seen that:
  • In the case of EcD, the costs of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were USD 284,998.66, 280,781.47, 278,648.30, 278,670.12, and 278,102.84, respectively. Therefore, RLMODE achieved the smallest cost.
  • In the case of EmD, the emissions of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were 45.49 kg, 33.93 kg, 26.39 kg, 30.99 kg, and 25.56 kg, respectively. Therefore, RLMODE achieved the smallest emission.
  • In the case of EED, the cost and emission of RLMODE were USD 292,647.89 and 153.57 kg, which were smaller than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. Therefore, RLMODE achieved the best compromise solution.
The Pareto-optimal frontier (POF) obtained by TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE is plotted in Figure 10.
Table 7 presents the statistical results of the performance metrics based on 30 independent runs. As can be seen from Table 7:
  • Concerning DM, the minimum and mean values and standard deviation of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The maximum DM of RLMODE was the second-best after NSGA-II-DE.
  • Regarding HV, the minimum, mean, and maximum values and standard deviation of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO.
  • Considering IGD, the minimum, mean, and maximum values and standard deviation of RLMODE were better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO.
  • Based on the Wilcoxon test, RLMODE was notably better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO in terms of DM, HV, and IGD.
Therefore, the RLMODE algorithm achieved the overall best performance among the five algorithms in the aspects of convergence and diversity for the large-scale Case 3.

5.4. Case 4: 140-Unit CHPEED Problem

The fourth case was a 140-unit CHPEED problem, which was established by duplicating the seven-unit Case 2 20 times. It consisted of 80 PO units, 40 CHP units, and 20 HO units. The power requirement and heat requirement were 12,000 MW and 3000 MWth, respectively. The maximum computational resource $maxFES$ = 30,000 was used.
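The two large-scale systems are built by replicating the small ones, so their unit counts and demands follow by simple scaling. The short sketch below checks this arithmetic. The Case 1 unit mix and demands are inferred by dividing the Case 3 description by 20 rather than quoted from this section, and the dictionary layout and function name are hypothetical.

```python
from collections import Counter

# Base systems. Case 1 values are inferred (Case 3 divided by 20), not quoted.
BASE_CASES = {
    "Case 1": {"units": Counter(PO=1, CHP=3, HO=1), "power_mw": 300, "heat_mwth": 150},
    "Case 2": {"units": Counter(PO=4, CHP=2, HO=1), "power_mw": 600, "heat_mwth": 150},
}


def replicate(case, copies):
    """Scale unit counts and demands of a base system by the number of copies."""
    return {
        "units": Counter({kind: n * copies for kind, n in case["units"].items()}),
        "power_mw": case["power_mw"] * copies,
        "heat_mwth": case["heat_mwth"] * copies,
    }


case3 = replicate(BASE_CASES["Case 1"], 20)
case4 = replicate(BASE_CASES["Case 2"], 20)
for name, case in (("Case 3", case3), ("Case 4", case4)):
    print(name, sum(case["units"].values()), dict(case["units"]),
          case["power_mw"], case["heat_mwth"])
```

The scaled values reproduce the 100-unit and 140-unit configurations described for Cases 3 and 4, including the shared heat requirement of 3000 MWth.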
Table 8 presents the cost and emission results of the EcD, EmD, and EED for Case 4. Due to the large size of the 140 units, the detailed dispatch results of the PO, HO, and CHP units by the other algorithms are given in Tables S4–S6 in the Supplementary File. From Table 8, it can be seen that:
  • In the case of EcD, the costs of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were USD 237,703.69, 224,936.75, 239,690.11, 225,670.28, and 216,483.24, respectively. Therefore, RLMODE achieved the smallest cost.
  • In the case of EmD, the emissions of TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE were 194.38 kg, 201.67 kg, 180.39 kg, 191.32 kg, and 172.18 kg, respectively. Therefore, RLMODE achieved the smallest emission.
  • In the case of EED, the cost and emission of RLMODE were USD 239,690.11 and 391.68 kg, which were smaller than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. Therefore, RLMODE achieved the best compromise solution.
The Pareto-optimal frontier (POF) obtained by TV-MOPSO, GDE3, NSGA-II-DE, MODE-RMO, and RLMODE is plotted in Figure 11.
Table 9 presents the statistical results of the performance metrics based on 30 independent runs. As can be seen from Table 9:
  • Concerning DM, the minimum, mean, and maximum values of RLMODE were better than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The standard deviation of RLMODE was the second-best after NSGA-II-DE.
  • Regarding HV, the minimum, mean, and maximum values of RLMODE were better than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The standard deviation of RLMODE was the second-best after NSGA-II-DE.
  • Considering IGD, the minimum and mean values of RLMODE were better than those of TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO. The maximum IGD and standard deviation of RLMODE were the second-best after NSGA-II-DE.
  • Based on the Wilcoxon test, RLMODE was notably better than TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO in terms of DM, HV, and IGD.
Therefore, the RLMODE algorithm achieved the overall best performance in the aspects of convergence and diversity for the large-scale Case 4.
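To put the Case 4 margins in relative terms, the following sketch computes how far RLMODE falls below its closest rival, using the EcD costs and EmD emissions listed in Table 8. The helper names are hypothetical, and the calculation is only a reading aid, not part of the reported analysis.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` undercuts `baseline`."""
    return 100.0 * (baseline - value) / baseline


def best_rival(results, ours="RLMODE"):
    """Name of the best (smallest) competing result."""
    rivals = {name: v for name, v in results.items() if name != ours}
    return min(rivals, key=rivals.get)


# Values copied from Table 8 (140-unit problem).
ECD_COST_USD = {"TV-MOPSO": 237703.69, "GDE3": 224936.75, "NSGA-II-DE": 239690.11,
                "MODE-RMO": 225670.28, "RLMODE": 216483.24}
EMD_EMISSION_KG = {"TV-MOPSO": 194.38, "GDE3": 201.67, "NSGA-II-DE": 180.39,
                   "MODE-RMO": 191.32, "RLMODE": 172.18}

for label, results in (("EcD cost", ECD_COST_USD), ("EmD emission", EMD_EMISSION_KG)):
    rival = best_rival(results)
    gain = relative_reduction(results[rival], results["RLMODE"])
    print(f"{label}: RLMODE is {gain:.2f}% below {rival}")
```

In both single-objective cases the margin over the closest rival is a few percent, which is much wider than the corresponding gaps in the small-scale cases.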

6. Conclusions

In this paper, a reinforcement-learning-based multi-objective differential evolution (RLMODE) algorithm was devised to deal with the CHPEED problem considering large-scale systems with more than 100 units. In RLMODE, a reinforcement learning technique called Q-learning was employed to adjust the scale factor parameters. The constraint repair technique and constraint domination principle were employed to deal with complex operating constraints in CHPEED. The suggested RLMODE was applied to solve four CHPEED problems with 5, 7, 100, and 140 units and compared with well-established multi-objective algorithms. The main findings are summarized below:
  • For the two small-scale CHPEED problems with 5 and 7 units, the proposed RLMODE achieved better results in the cases of economic dispatch (EcD), emission dispatch (EmD), and economic emission dispatch (EED). The costs and emissions of RLMODE were lower than those of the four compared algorithms, TV-MOPSO, GDE3, NSGA-II-DE, and MODE-RMO.
  • For two large-scale CHPEED problems with 100 and 140 units, the proposed RLMODE also achieved the best results in the case of EcD, EmD, and EED. The costs and emissions of RLMODE were the smallest among the compared algorithms.
  • Considering the performance metrics of the Pareto-optimal frontier (i.e., DM, HV, and IGD), the suggested RLMODE obtained better results than the compared algorithms, and the Wilcoxon rank sum test indicated that the superiority was significant.
  • The Pareto-optimal frontiers obtained by RLMODE were better than those of the compared algorithms, as shown in Figures 8–11. The superiority was especially obvious for the two large-scale CHPEED problems with 100 and 140 units.
The proposed RLMODE proved effective for the CHPEED problem, and its good performance benefited from the reinforcement-learning-based parameter adjustment technique. Several directions are promising for future work. First, multi-region power systems are of great practical importance, so extending the RLMODE algorithm to the multi-region CHPEED problem is a promising task. In addition, CHP unit commitment has seldom been studied in the existing research, and developing efficient optimization methods for the CHP unit commitment problem is also worth investigating.
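The parameter adjustment summarized above can be sketched as a tabular Q-learning loop in which the reward depends on whether the offspring dominates its parent. The learning rate α = 0.1 and discount factor γ = 0.5 are taken from Table 1; everything else here (the candidate scale factors, the state encoding, the +1/−1/0 reward values, and the epsilon-greedy policy) is an assumption for illustration and not necessarily the authors' design.

```python
import random

ALPHA, GAMMA = 0.1, 0.5        # learning rate and discount factor (Table 1)
ACTIONS = [0.3, 0.5, 0.7]      # hypothetical candidate scale factors F


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def reward(parent, offspring):
    """Assumed reward: +1 if the offspring dominates, -1 if dominated, else 0."""
    if dominates(offspring, parent):
        return 1.0
    if dominates(parent, offspring):
        return -1.0
    return 0.0


def q_update(q, state, action, r, next_state):
    """Standard tabular Q-learning update."""
    target = r + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice of an action index (assumed policy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[state]
    return row.index(max(row))


# One illustrative step: the state is taken to be the previous action index.
q_table = [[0.0] * len(ACTIONS) for _ in ACTIONS]
state, action = 0, 1
r = reward(parent=(10.0, 5.0), offspring=(9.0, 5.0))  # (cost, emission)
q_update(q_table, state, action, r, next_state=action)
print(ACTIONS[choose_action(q_table, state, epsilon=0.0)])
```

After a rewarded step, the greedy policy prefers the scale factor that produced the dominating offspring, which is the mechanism by which the parameter values adapt during the search.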

Supplementary Materials

The following Supporting Information can be downloaded at https://www.mdpi.com/article/10.3390/en16093753/s1. Table S1: Detailed results of economic dispatch for 100-unit CHPEED problem; Table S2: Detailed results of emission dispatch for 100-unit CHPEED problem; Table S3: Detailed results of economic emission dispatch for 100-unit CHPEED problem; Table S4: Detailed results of economic dispatch for 140-unit CHPEED problem; Table S5: Detailed results of emission dispatch for 140-unit CHPEED problem; Table S6: Detailed results of economic emission dispatch for 140-unit CHPEED problem; Table S7: All variables in the article.

Author Contributions

Conceptualization, X.C. and S.F.; methodology, X.C. and K.L.; software, S.F.; validation, X.C., S.F. and K.L.; formal analysis, X.C.; investigation, S.F.; resources, X.C.; data curation, S.F.; writing—original draft preparation, S.F.; writing—review and editing, X.C.; visualization, S.F.; supervision, X.C.; project administration, X.C. and K.L.; funding acquisition, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (61873114) and the Youth Program of the Faculty of Agricultural Equipment in Jiangsu University (NZXB20210211).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, W.; Peng, Z.; Yang, Z.; Guo, Y.; Chen, X. An enhanced exploratory whale optimization algorithm for dynamic economic dispatch. J. Abbr. 2021, 7, 7015–7029.
  2. Nazari-Heris, M.; Mohammadi, I.B.; Asadi, S.; Geem, Z. Large-scale combined heat and power economic dispatch using a novel multi-player harmony search method. Appl. Therm. Eng. 2019, 154, 493–504.
  3. Chen, X.; Shen, A. Self-adaptive differential evolution with Gaussian–Cauchy mutation for large-scale CHP economic dispatch problem. Neural Comput. Appl. 2022, 34, 11769–11787.
  4. Rooijers, F.J.; van Amerongen, A.R.A. Static economic dispatch for co-generation systems. IEEE Trans. Power Syst. 1994, 9, 1392–1398.
  5. Wong, K.P.; Algie, C. Evolutionary programming approach for combined heat and power dispatch. IEEE Trans. Power Syst. 2002, 61, 227–232.
  6. Rong, A.; Lahdelma, R. An efficient envelope-based Branch and Bound algorithm for non-convex combined heat and power production planning. Eur. J. Oper. Res. 2007, 183, 412–431.
  7. Zou, D.; Li, S.; Kong, X.; Ouyang, H.; Li, Z. Solving the combined heat and power economic dispatch problems by an improved genetic algorithm and a new constraint handling strategy. Appl. Energy 2019, 237, 646–670.
  8. Srivastava, A.; Das, D.K. A new Kho-Kho optimization Algorithm: An application to solve combined emission economic dispatch and combined heat and power economic dispatch problem. Eng. Appl. Artif. Intell. 2020, 94, 103763.
  9. Liu, D.; Hu, Z.; Su, Q.; Liu, M. A niching differential evolution algorithm for the large-scale combined heat and power economic dispatch problem. Appl. Soft Comput. 2021, 113, 108017.
  10. Zou, D.; Gong, D. Differential evolution based on migrating variables for the combined heat and power dynamic economic dispatch. Energy 2022, 238, 121664.
  11. Chen, X.; Li, K. Collective information-based particle swarm optimization for multi-fuel CHP economic dispatch problem. Knowl.-Based Syst. 2022, 248, 108902.
  12. Shaheen, A.M.; El-Sehiemy, R.A.; Elattar, E.; Ginidi, A.R. An Amalgamated Heap and Jellyfish Optimizer for economic dispatch in Combined heat and power systems including N-1 Unit outages. Energy 2022, 246, 123351.
  13. Rizk-Allah, R.M.; Hassanien, A.E.; Snášel, V. A hybrid chameleon swarm algorithm with superiority of feasible solutions for optimal combined heat and power economic dispatch problem. Energy 2022, 254, 124340.
  14. Ramachandran, M.; Mirjalili, S.; Nazari-Heris, M.; Parvathysankar, D.S.; Sundaram, A.; Gnanakkan, C.A.R.C. A hybrid grasshopper optimization algorithm and Harris hawks optimizer for combined heat and power economic dispatch problem. Eng. Appl. Artif. Intell. 2022, 111, 104753.
  15. Ramachandran, M.; Mirjalili, S.; Ramalingam, M.M.; Gnanakkan, C.A.R.C.; Parvathysankar, D.S.; Sundaram, A. A ranking-based fuzzy adaptive hybrid crow search algorithm for combined heat and power economic dispatch. Expert Syst. Appl. 2022, 197, 116625.
  16. Yang, Q.; Liu, P.; Zhang, J.; Dong, N. Combined heat and power economic dispatch using an adaptive cuckoo search with differential evolution mutation. Appl. Energy 2022, 307, 118057.
  17. Li, Y.; Wang, J.; Zhao, D.; Li, G.; Chen, C. A two-stage approach for combined heat and power economic emission dispatch: Combining multi-objective optimization with integrated decision making. Energy 2018, 162, 237–254.
  18. Elaiw, A.; Xia, X.; Shehata, A. Combined heat and power dynamic economic dispatch with emission limitations using hybrid DE-SQP method. Abstr. Appl. Anal. Hindawi 2013, 2013, 1–10.
  19. Ahmadi, A.; Moghimi, H.; Nezhad, A.E.; Agelidis, V.G.; Sharaf, A.M. Multi-objective economic emission dispatch considering combined heat and power by normal boundary intersection method. Electr. Power Syst. Res. 2015, 129, 32–43.
  20. Anand, H.; Narang, N. Civilized swarm optimization for combined heat and power economic emission dispatch. In Proceedings of the 2016 7th India International Conference on Power Electronics (IICPE), Patiala, India, 17–19 November 2016; pp. 1–6.
  21. Sadeghian, H.; Ardehali, M. A novel approach for optimal economic dispatch scheduling of integrated combined heat and power systems for maximum economic profit and minimum environmental emissions based on Benders decomposition. Energy 2016, 102, 10–23.
  22. Alomoush, M.I. Application of the stochastic fractal search algorithm and compromise programming to combined heat and power economic–emission dispatch. Eng. Optim. 2020, 52, 1992–2010.
  23. Jadoun, V.K.; Prashanth, G.R.; Joshi, S.S.; Narayanan, K.; Malik, H.; Márquez, F.P.G.A. Optimal fuzzy based economic emission dispatch of combined heat and power units using dynamically controlled Whale Optimization Algorithm. Appl. Energy 2022, 315, 119033.
  24. Niknam, T.; Azizipanah-Abarghooee, R.; Roosta, A.; Amiri, B. A new multi-objective reserve constrained combined heat and power dynamic economic emission dispatch. Energy 2012, 42, 530–545.
  25. Basu, M. Combined heat and power economic emission dispatch using nondominated sorting genetic algorithm-II. Int. J. Electr. Power Energy Syst. 2013, 53, 135–141.
  26. Shi, B.; Yan, L.; Wu, W. Multi-objective optimization for combined heat and power economic dispatch with power transmission loss and emission reduction. Energy 2013, 56, 135–143.
  27. ali Shaabani, Y.; Seifi, A.R.; Kouhanjani, M.J. Stochastic multi-objective optimization of combined heat and power economic/emission dispatch. Energy 2017, 141, 1892–1904.
  28. Sun, J.; Deng, J.; Li, Y. Indicator & crowding distance-based evolutionary algorithm for combined heat and power economic emission dispatch. Appl. Soft Comput. 2020, 90, 106158.
  29. Sundaram, A. Combined heat and power economic emission dispatch using hybrid NSGA II-MOPSO algorithm incorporating an effective constraint handling mechanism. IEEE Access 2020, 8, 13748–13768.
  30. Sundaram, A. Multiobjective multi-verse optimization algorithm to solve combined economic, heat and power emission dispatch problems. Appl. Soft Comput. 2020, 91, 106195.
  31. Xiong, G.; Shuai, M.; Hu, X. Combined heat and power economic emission dispatch using improved bare-bone multi-objective particle swarm optimization. Energy 2022, 244, 123108.
  32. Storn, R.; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
  33. Wang, X.; Dong, Z.; Tang, L. Multiobjective differential evolution with personal archive and biased self-adaptive mutation selection. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 5338–5350.
  34. Basu, M. Economic environmental dispatch using multi-objective differential evolution. Appl. Soft Comput. 2011, 11, 2845–2853.
  35. Qiao, B.; Liu, J.; Hao, X. A multi-objective differential evolution algorithm and a constraint handling mechanism based on variables proportion for dynamic economic emission dispatch problems. Appl. Soft Comput. 2021, 108, 107419.
  36. Hu, Z.; Gong, W.; Li, S. Reinforcement learning-based differential evolution for parameters extraction of photovoltaic models. Energy Rep. 2021, 7, 916–928.
  37. Chen, X.; Tang, G. Solving static and dynamic multi-area economic dispatch problems using an improved competitive swarm optimization algorithm. Energy 2022, 238, 122035.
  38. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
  39. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
  40. Marek, M.; Kadlec, P. Another evolution of generalized differential evolution: Variable number of dimensions. Eng. Optim. 2022, 7, 61–80.
  41. Li, H.; Zhang, Q. Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evol. Comput. 2008, 13, 284–302.
  42. Chen, X.; Du, W.; Qian, F. Multi-objective differential evolution with ranking-based mutation operator and its application in chemical process optimization. Chemom. Intell. Lab. Syst. 2014, 136, 85–96.
  43. Deb, K.; Jain, S. Running performance metrics for evolutionary multi-objective optimization. In Proceedings of the Fourth Asia-Pacific Conference on Simulated Evolution and Learning (SEAL02), Singapore, 18–22 November 2002; pp. 13–20.
  44. Zitzler, E.; Thiele, L. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. 1999, 3, 257–271.
  45. Coello, C.A.C.; Cortés, N.C. Solving multiobjective optimization problems using an artificial immune system. Genet. Program. Evolvable Mach. 2005, 6, 163–190.
Figure 1. Cost curve with valve point effect.
Figure 2. Capacity constraint of two typical CHP units.
Figure 3. Fast nondominated sorting and crowding distance.
Figure 4. Agent–environment interaction of learning.
Figure 5. Q table for one agent.
Figure 6. Constraint repair process for individual $X_i$.
Figure 7. Flowchart of RLMODE for solving CHPEED.
Figure 8. Comparison of Pareto-optimal frontier for the 5-unit CHPEED problem.
Figure 9. Comparison of Pareto-optimal frontier for the 7-unit CHPEED problem.
Figure 10. Comparison of Pareto-optimal frontier for the 100-unit CHPEED problem.
Figure 11. Comparison of Pareto-optimal frontier for the 140-unit CHPEED problem.
Table 1. Parameter settings for the multi-objective algorithms.

| Algorithm | Parameters |
|---|---|
| TV-MOPSO [27] | Population size $N = 100$, weight coefficient $\omega_{min} = 0.1$, $\omega_{max} = 0.9$, acceleration coefficient $C_{1f} = C_{2i} = 0.5$, $C_{1i} = C_{2f} = 2$ |
| GDE3 [40] | $N = 100$, scale factor $F = 0.5$, crossover rate $CR = 0.5$ |
| NSGA-II-DE [41] | $N = 100$, polynomial mutation rate $\eta = 20$, $F = 0.5$, $CR = 0.5$ |
| MODE-RMO [42] | $N = 100$, $F = 0.5$, $CR = 0.5$ |
| RLMODE | $N = 100$, $CR = 0.5$, $\alpha = 0.1$, $\gamma = 0.5$ |
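Table 2. Results of EcD, EmD, and EED for the 5-unit CHPEED problem.

| Case | Output | TV-MOPSO | GDE3 | NSGA-II-DE | MODE-RMO | RLMODE |
|---|---|---|---|---|---|---|
| EcD | $P_1$ (MW) | 135 | 135 | 135 | 135 | 135 |
| EcD | $P_1^C$ (MW) | 44.92 | 48.15 | 51.53 | 40.48 | 41.58 |
| EcD | $P_2^C$ (MW) | 16.52 | 16.30 | 10.30 | 19.52 | 18.43 |
| EcD | $P_3^C$ (MW) | 103.56 | 100.55 | 103.17 | 105 | 105 |
| EcD | $H_1^C$ (MWth) | 68.72 | 69.87 | 74.44 | 75.41 | 76.36 |
| EcD | $H_2^C$ (MWth) | 42.79 | 41.03 | 39.29 | 41.63 | 40.08 |
| EcD | $H_3^C$ (MWth) | 2.39 | 6.71 | 2.73 | 0 | 0 |
| EcD | $H_1$ (MWth) | 36.10 | 32.39 | 33.54 | 32.96 | 33.56 |
| EcD | Cost (USD) | 13,686.49 | 13,712.33 | 13,700.49 | 13,675.28 | 13,674.70 |
| EcD | Emission (kg) | 12.05 | 12.04 | 12.04 | 12.04 | 12.04 |
| EmD | $P_1$ (MW) | 35 | 35 | 35 | 35 | 35 |
| EmD | $P_1^C$ (MW) | 116.87 | 118.76 | 115.75 | 118.71 | 114.19 |
| EmD | $P_2^C$ (MW) | 48.57 | 48.51 | 55.14 | 45.47 | 46.57 |
| EmD | $P_3^C$ (MW) | 99.56 | 97.73 | 94.11 | 100.83 | 104.24 |
| EmD | $H_1^C$ (MWth) | 91.45 | 78.98 | 98.89 | 79.33 | 102.35 |
| EmD | $H_2^C$ (MWth) | 41.92 | 40.98 | 12.57 | 36.08 | 28.83 |
| EmD | $H_3^C$ (MWth) | 4.22 | 0 | 17.44 | 6.95 | 0 |
| EmD | $H_1$ (MWth) | 12.41 | 30.05 | 21.11 | 27.63 | 18.82 |
| EmD | Cost (USD) | 12.41 | 30.05 | 21.11 | 27.63 | 18.82 |
| EmD | Emission (kg) | 1.21 | 1.24 | 1.23 | 1.23 | 1.21 |
| EED | $P_1$ (MW) | 94.19 | 94.38 | 95.04 | 94.84 | 94.36 |
| EED | $P_1^C$ (MW) | 73.89 | 67.14 | 70.56 | 62.71 | 72.60 |
| EED | $P_2^C$ (MW) | 26.92 | 34.55 | 30.82 | 41.62 | 28.78 |
| EED | $P_3^C$ (MW) | 105 | 103.93 | 103.58 | 100.83 | 104.26 |
| EED | $H_1^C$ (MWth) | 72.64 | 92.75 | 75 | 79.76 | 71.84 |
| EED | $H_2^C$ (MWth) | 25.71 | 0 | 48.92 | 35.25 | 39.95 |
| EED | $H_3^C$ (MWth) | 0 | 0 | 1.20 | 0 | 0 |
| EED | $H_1$ (MWth) | 51.66 | 57.25 | 24.88 | 34.99 | 38.21 |
| EED | Cost (USD) | 14,860.23 | 14,889.75 | 14,859.34 | 14,881.14 | 14,856.36 |
| EED | Emission (kg) | 6.09 | 6.13 | 6.15 | 6.15 | 6.09 |
| | CPU time (s) | 3.0 | 2.3 | 2.4 | 2.2 | 2.5 |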
Table 3. Statistical results of the performance metrics for the 5-unit CHPEED problem.

| Metric | Algorithm | Min | Mean | Max | Std | Sig. |
|---|---|---|---|---|---|---|
| DM | TV-MOPSO | 0.7424 | 0.8003 | 0.8457 | 0.0260 | = |
| DM | GDE3 | 0.7183 | 0.7594 | 0.8050 | 0.0273 | + |
| DM | NSGA-II-DE | 0.6759 | 0.7731 | 0.8096 | 0.0283 | + |
| DM | MODE-RMO | 0.7060 | 0.7711 | 0.8181 | 0.0224 | + |
| DM | RLMODE | 0.7863 | 0.8131 | 0.8884 | 0.0229 | |
| HV | TV-MOPSO | 0.1926 | 0.1931 | 0.1934 | 0.0002 | + |
| HV | GDE3 | 0.1914 | 0.1921 | 0.1929 | 0.0004 | + |
| HV | NSGA-II-DE | 0.1906 | 0.1918 | 0.1926 | 0.0005 | + |
| HV | MODE-RMO | 0.1914 | 0.1923 | 0.1929 | 0.0004 | + |
| HV | RLMODE | 0.1927 | 0.1932 | 0.1937 | 0.0003 | |
| IGD | TV-MOPSO | 9.8452 | 11.6058 | 13.1767 | 0.7173 | = |
| IGD | GDE3 | 12.5343 | 14.2121 | 16.4390 | 1.0600 | + |
| IGD | NSGA-II-DE | 11.8049 | 13.8877 | 17.4370 | 1.4504 | + |
| IGD | MODE-RMO | 11.9793 | 13.5678 | 16.6436 | 1.0189 | + |
| IGD | RLMODE | 10.0634 | 11.3487 | 12.6692 | 0.6079 | |
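Table 4. Results of EcD, EmD, and EED for the 7-unit CHPEED problem.

| Case | Output | TV-MOPSO | GDE3 | NSGA-II-DE | MODE-RMO | RLMODE |
|---|---|---|---|---|---|---|
| EcD | $P_1$ (MW) | 65.99 | 65.79 | 61.36 | 63.41 | 52.75 |
| EcD | $P_2$ (MW) | 91.23 | 99.49 | 99.91 | 90.82 | 92.99 |
| EcD | $P_3$ (MW) | 109.65 | 100.13 | 102.37 | 109.88 | 112.84 |
| EcD | $P_4$ (MW) | 201.91 | 203.08 | 206.46 | 204.62 | 217.86 |
| EcD | $P_1^C$ (MW) | 98.71 | 98.80 | 97.56 | 98.80 | 91.24 |
| EcD | $P_2^C$ (MW) | 40.11 | 40.36 | 40 | 40.07 | 40 |
| EcD | $H_1^C$ (MWth) | 0.51 | 0 | 7.29 | 0 | 44.54 |
| EcD | $H_2^C$ (MWth) | 73.41 | 69.63 | 75 | 74.02 | 75 |
| EcD | $H_1$ (MWth) | 76.08 | 80.37 | 67.71 | 75.98 | 30.46 |
| EcD | Cost (USD) | 10,261.88 | 10,298.40 | 10,222.16 | 10,249.37 | 10,212.26 |
| EcD | Emission (kg) | 27.05 | 27.18 | 27.52 | 27.19 | 28.75 |
| EmD | $P_1$ (MW) | 42.55 | 36.59 | 33.85 | 36.48 | 46.41 |
| EmD | $P_2$ (MW) | 31.66 | 38.32 | 53.65 | 44.45 | 52.59 |
| EmD | $P_3$ (MW) | 80.83 | 68.96 | 59.37 | 73.65 | 64.99 |
| EmD | $P_4$ (MW) | 83.10 | 99.71 | 96.54 | 85.49 | 76.57 |
| EmD | $P_1^C$ (MW) | 247 | 246.97 | 246.99 | 247 | 245.49 |
| EmD | $P_2^C$ (MW) | 122.60 | 117.15 | 117.36 | 120.68 | 121.79 |
| EmD | $H_1^C$ (MWth) | 0 | 0 | 0 | 0 | 2.68 |
| EmD | $H_2^C$ (MWth) | 53.56 | 69.63 | 88.24 | 66.27 | 53.20 |
| EmD | $H_1$ (MWth) | 96.44 | 80.37 | 61.76 | 83.73 | 94.11 |
| EmD | Cost (USD) | 17,638.83 | 17,329.12 | 17,345.52 | 17,553.38 | 17,640.14 |
| EmD | Emission (kg) | 7.75 | 7.88 | 7.74 | 7.59 | 7.54 |
| EED | $P_1$ (MW) | 61.41 | 73.43 | 75 | 65.39 | 75 |
| EED | $P_2$ (MW) | 89.41 | 93.62 | 78.87 | 76.39 | 80.07 |
| EED | $P_3$ (MW) | 102.93 | 114.29 | 99.22 | 121.91 | 105.95 |
| EED | $P_4$ (MW) | 136.29 | 107.57 | 139.33 | 125.01 | 129.74 |
| EED | $P_1^C$ (MW) | 176.91 | 176.97 | 174.48 | 178.74 | 176.19 |
| EED | $P_2^C$ (MW) | 40.55 | 41.63 | 40.59 | 40 | 40.54 |
| EED | $H_1^C$ (MWth) | 0.16 | 0.93 | 24.05 | 0 | 6.87 |
| EED | $H_2^C$ (MWth) | 75.47 | 76.41 | 75 | 75 | 75 |
| EED | $H_1$ (MWth) | 74.36 | 72.66 | 50.95 | 75 | 68.13 |
| EED | Cost (USD) | 12,047.79 | 12,027.75 | 12,131.14 | 12,049.32 | 12,000.28 |
| EED | Emission (kg) | 18.42 | 18.67 | 18.52 | 18.51 | 18.42 |
| | CPU time (s) | 5.5 | 4.5 | 4.9 | 4.6 | 5.0 |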
Table 5. Statistical results of the performance metrics for the 7-unit CHPEED problem.

| Metric | Algorithm | Min | Mean | Max | Std | Sig. |
|---|---|---|---|---|---|---|
| DM | TV-MOPSO | 0.6859 | 0.7325 | 0.7765 | 0.0225 | + |
| DM | GDE3 | 0.7326 | 0.7657 | 0.8033 | 0.0194 | + |
| DM | NSGA-II-DE | 0.7032 | 0.7849 | 0.8396 | 0.0339 | + |
| DM | MODE-RMO | 0.6949 | 0.7624 | 0.8148 | 0.0227 | + |
| DM | RLMODE | 0.7632 | 0.8048 | 0.8516 | 0.0251 | |
| HV | TV-MOPSO | 0.2767 | 0.2781 | 0.2796 | 0.0007 | + |
| HV | GDE3 | 0.2785 | 0.2804 | 0.2818 | 0.0009 | + |
| HV | NSGA-II-DE | 0.2776 | 0.2808 | 0.2827 | 0.0012 | + |
| HV | MODE-RMO | 0.2779 | 0.2806 | 0.2826 | 0.0010 | + |
| HV | RLMODE | 0.2805 | 0.2821 | 0.2827 | 0.0005 | |
| IGD | TV-MOPSO | 31.8580 | 36.3950 | 43.7990 | 2.9514 | + |
| IGD | GDE3 | 30.1660 | 37.6670 | 49.0670 | 4.9670 | + |
| IGD | NSGA-II-DE | 25.2580 | 33.6990 | 42.3560 | 4.3386 | + |
| IGD | MODE-RMO | 29.7410 | 36.8890 | 56.5780 | 5.5591 | + |
| IGD | RLMODE | 25.0030 | 29.6970 | 42.9080 | 4.0536 | |
Table 6. Results of EcD, EmD, and EED for the 100-unit CHPEED problem.

| Case | Output | TV-MOPSO | GDE3 | NSGA-II-DE | MODE-RMO | RLMODE |
|---|---|---|---|---|---|---|
| EcD | Cost (USD) | 284,998.66 | 280,781.47 | 278,648.30 | 278,670.12 | 278,102.84 |
| EcD | Emission (kg) | 204.75 | 227.54 | 232.20 | 230.31 | 238.49 |
| EmD | Cost (USD) | 330,327.51 | 336,643.25 | 341,869.59 | 338,879.12 | 342,104.18 |
| EmD | Emission (kg) | 45.49 | 33.93 | 26.39 | 30.99 | 25.56 |
| EED | Cost (USD) | 292,904.09 | 292,934.82 | 293,398.89 | 293,113.78 | 292,647.89 |
| EED | Emission (kg) | 157.50 | 160.30 | 156.81 | 155.89 | 153.57 |
| | CPU time (s) | 47.4 | 48.0 | 48.0 | 47.9 | 54.9 |
Table 7. Statistical results of the performance metrics for the 100-unit CHPEED problem.

| Metric | Algorithm | Min | Mean | Max | Std | Sig. |
|---|---|---|---|---|---|---|
| DM | TV-MOPSO | 0.6320 | 0.6918 | 0.7543 | 0.0274 | + |
| DM | GDE3 | 0.6956 | 0.7747 | 0.8456 | 0.0371 | + |
| DM | NSGA-II-DE | 0.7633 | 0.8098 | 0.8764 | 0.0235 | + |
| DM | MODE-RMO | 0.7633 | 0.8073 | 0.8474 | 0.0206 | + |
| DM | RLMODE | 0.8168 | 0.8414 | 0.8758 | 0.0144 | |
| HV | TV-MOPSO | 0.1698 | 0.1733 | 0.1767 | 0.0017 | + |
| HV | GDE3 | 0.1769 | 0.1814 | 0.1836 | 0.0015 | + |
| HV | NSGA-II-DE | 0.1801 | 0.1836 | 0.1852 | 0.0010 | + |
| HV | MODE-RMO | 0.1804 | 0.1828 | 0.1845 | 0.0009 | + |
| HV | RLMODE | 0.1861 | 0.1869 | 0.1879 | 0.0004 | |
| IGD | TV-MOPSO | 909.1800 | 1234 | 1788.3000 | 206.8600 | + |
| IGD | GDE3 | 270.9400 | 487.7000 | 1047.2000 | 166.0400 | + |
| IGD | NSGA-II-DE | 210.3000 | 279.2600 | 442.5800 | 52.9940 | + |
| IGD | MODE-RMO | 252.5700 | 331.1600 | 449.7500 | 55.0550 | + |
| IGD | RLMODE | 169.3400 | 200.3200 | 224.1500 | 11.8440 | |
Table 8. Results of EcD, EmD, and EED for the 140-unit CHPEED problem.

| Case | Output | TV-MOPSO | GDE3 | NSGA-II-DE | MODE-RMO | RLMODE |
|---|---|---|---|---|---|---|
| EcD | Cost (USD) | 237,703.69 | 224,936.75 | 239,690.11 | 225,670.28 | 216,483.24 |
| EcD | Emission (kg) | 466.50 | 526.37 | 391.68 | 554.75 | 544.62 |
| EmD | Cost (USD) | 330,651.70 | 337,670.10 | 347,284.96 | 340,838.48 | 347,112.22 |
| EmD | Emission (kg) | 194.38 | 201.67 | 180.39 | 191.32 | 172.18 |
| EED | Cost (USD) | 242,778.96 | 243,338.27 | 242,231.62 | 243,210.60 | 239,690.11 |
| EED | Emission (kg) | 423.76 | 428.76 | 418.54 | 425.72 | 391.68 |
| | CPU time (s) | 77.6 | 75.6 | 75.2 | 76.4 | 84.6 |
Table 9. Statistical results of the performance metrics for the 140-unit CHPEED problem.

| Metric | Algorithm | Min | Mean | Max | Std | Sig. |
|---|---|---|---|---|---|---|
| DM | TV-MOPSO | 0.6343 | 0.6923 | 0.7416 | 0.0259 | + |
| DM | GDE3 | 0.6008 | 0.6918 | 0.7473 | 0.0343 | + |
| DM | NSGA-II-DE | 0.7542 | 0.7980 | 0.8465 | 0.0215 | + |
| DM | MODE-RMO | 0.6396 | 0.7175 | 0.7635 | 0.0360 | + |
| DM | RLMODE | 0.7660 | 0.8144 | 0.8541 | 0.0228 | |
| HV | TV-MOPSO | 0.2251 | 0.2278 | 0.2316 | 0.0017 | + |
| HV | GDE3 | 0.2225 | 0.2263 | 0.2297 | 0.0018 | + |
| HV | NSGA-II-DE | 0.2335 | 0.2361 | 0.2391 | 0.0013 | + |
| HV | MODE-RMO | 0.2257 | 0.2284 | 0.2318 | 0.0015 | + |
| HV | RLMODE | 0.2488 | 0.2518 | 0.2553 | 0.0015 | |
| IGD | TV-MOPSO | 2110.5000 | 3065.3000 | 3751.2000 | 387.1000 | + |
| IGD | GDE3 | 795.6700 | 1180.7000 | 1913.3000 | 257.3000 | + |
| IGD | NSGA-II-DE | 445.6000 | 555.7200 | 738.6900 | 62.6100 | + |
| IGD | MODE-RMO | 680.3500 | 1001 | 1611.9000 | 243.1500 | + |
| IGD | RLMODE | 376.3600 | 482.7200 | 783.0600 | 92.9610 | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, X.; Fang, S.; Li, K. Reinforcement-Learning-Based Multi-Objective Differential Evolution Algorithm for Large-Scale Combined Heat and Power Economic Emission Dispatch. Energies 2023, 16, 3753. https://doi.org/10.3390/en16093753
