*2.2. Dynamic Programming (DP)*

The basic idea of dynamic programming is to treat the current stage separately from the future stages while still weighing the current benefit together with the future benefit. The optimal decision at each stage is therefore chosen with the whole process in mind, and it generally differs from the decision that would be optimal for that stage alone [41]. Concretely, dynamic programming divides a multi-stage decision-making problem into several stages according to time or other characteristics, and each stage has several states and candidate decisions [42]. The system moves from one stage to the next according to a fixed rule, and the goal is to find the strategy that is optimal over all stages combined [43]. Equation (3) below is the state transition formula, which is the most important part of dynamic programming.

$$S_{j} = T(S_{j-1},\ x_{j-1}), \quad j = 1, 2, \dots, l \tag{3}$$

where *S*<sub>*j*</sub> stands for the state variable at stage *j*, with *l* stages in total, *x*<sub>*j*−1</sub> represents the decision variable at stage *j* − 1, and *T*(*S*<sub>*j*−1</sub>, *x*<sub>*j*−1</sub>) is the state transition function [44].
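
To make the state transition in Equation (3) concrete, the following is a minimal sketch of a three-stage resource-allocation problem solved by backward recursion. It is an illustration under stated assumptions, not a method taken from the cited works: the reward table `REWARD`, the budget `INITIAL_STATE`, the transition rule *T*(*S*, *x*) = *S* − *x*, and all function names are hypothetical. A myopic solver that maximizes only the current-stage benefit is included for comparison.

```python
from functools import lru_cache

# Hypothetical data: REWARD[j][x] is the benefit of spending x units at stage j.
REWARD = [
    [0, 5, 6, 6],   # stage 0
    [0, 1, 8, 9],   # stage 1
    [0, 4, 5, 12],  # stage 2
]
NUM_STAGES = len(REWARD)   # l in Equation (3)
INITIAL_STATE = 3          # S_0: units of resource available at the start


def transition(state, decision):
    """Assumed state transition T(S, x) = S - x: spending x units leaves S - x."""
    return state - decision


def feasible_decisions(stage, state):
    """Decisions allowed at this stage: spend between 0 and `state` units."""
    return range(min(state, len(REWARD[stage]) - 1) + 1)


def solve_dp(initial_state):
    """Backward recursion: weigh the current benefit together with the future benefit."""

    @lru_cache(maxsize=None)
    def best(stage, state):
        if stage == NUM_STAGES:          # no stages left, no further benefit
            return 0, ()
        best_value, best_plan = float("-inf"), ()
        for decision in feasible_decisions(stage, state):
            future_value, future_plan = best(stage + 1, transition(state, decision))
            value = REWARD[stage][decision] + future_value
            if value > best_value:       # ties keep the smallest decision
                best_value, best_plan = value, (decision,) + future_plan
        return best_value, best_plan

    return best(0, initial_state)


def solve_greedy(initial_state):
    """Myopic baseline: maximize only the current-stage benefit."""
    state, total, plan = initial_state, 0, []
    for stage in range(NUM_STAGES):
        decision = max(feasible_decisions(stage, state),
                       key=lambda x: REWARD[stage][x])
        total += REWARD[stage][decision]
        plan.append(decision)
        state = transition(state, decision)
    return total, tuple(plan)


def state_trajectory(initial_state, plan):
    """Apply S_j = T(S_{j-1}, x_{j-1}) along a plan and return S_0, ..., S_l."""
    states = [initial_state]
    for decision in plan:
        states.append(transition(states[-1], decision))
    return states


if __name__ == "__main__":
    dp_value, dp_plan = solve_dp(INITIAL_STATE)
    print("DP:", dp_value, dp_plan, state_trajectory(INITIAL_STATE, dp_plan))
    print("Greedy:", solve_greedy(INITIAL_STATE))
```

With these made-up numbers, the myopic rule spends two units at the first stage and reaches a total benefit of 7, whereas backward recursion selects the plan (1, 2, 0) with a total benefit of 13 and the state sequence 3, 2, 0, 0. The first-stage decision that is best overall is therefore not the one that is best for that stage alone.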
