### *3.4. Knowledge Transfer*

Knowledge transfer [32] approximates the optimal knowledge matrices of a new task by exploiting the optimal knowledge matrices already obtained for previous tasks. In this study, the previous task most similar to the new task is selected as the source, and its optimal knowledge matrices are blended with the initial knowledge matrices of the new task, weighted by their similarity:

$$\mathbf{Q}_{i}^{\rm n0} = r \cdot \mathbf{Q}_{i}^{\rm s*} + (1 - r) \cdot \mathbf{Q}_{i}^{\rm initial} \tag{11}$$

where $\mathbf{Q}_{i}^{\rm n0}$ denotes the approximated optimal knowledge matrices of the *i*th controllable variable of the new task; $\mathbf{Q}_{i}^{\rm s*}$ denotes the optimal knowledge matrices of the *i*th controllable variable of the most similar previous task; $\mathbf{Q}_{i}^{\rm initial}$ denotes the initial knowledge matrices of the new task without knowledge transfer; and *r* is the similarity between the most similar previous task and the new task, with 0 ≤ *r* ≤ 1. With *r* = 1 the new task inherits the previous task's knowledge entirely, whereas with *r* = 0 it starts from its initial knowledge matrices.
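As an illustration, Eq. (11) is an element-wise convex combination of two matrices of the same shape. The following is a minimal sketch, not the authors' implementation: it assumes the similarity values *r* of all previous tasks are already computed (this section does not specify the measure), stores each knowledge matrix as a plain list of rows, and uses hypothetical task names and matrix values. Only one controllable variable *i* is shown.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one knowledge matrix stored as a list of rows


def pick_most_similar(similarities: Dict[str, float]) -> str:
    """Return the previous task with the largest similarity r.

    Assumption: the similarity values are precomputed; how they are measured
    is not fixed by Section 3.4.
    """
    return max(similarities, key=lambda task: similarities[task])


def transfer_knowledge(q_similar: Matrix, q_initial: Matrix, r: float) -> Matrix:
    """Eq. (11), element-wise: Q_n0 = r * Q_s* + (1 - r) * Q_initial."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("similarity r must lie in [0, 1]")
    if len(q_similar) != len(q_initial) or any(
        len(a) != len(b) for a, b in zip(q_similar, q_initial)
    ):
        raise ValueError("knowledge matrices must have the same shape")
    return [
        [r * s + (1.0 - r) * q0 for s, q0 in zip(row_s, row_0)]
        for row_s, row_0 in zip(q_similar, q_initial)
    ]


# Hypothetical example: two previous tasks, one 2x2 knowledge matrix for variable i.
similarities = {"task_A": 0.25, "task_B": 0.75}
q_star = {
    "task_A": [[1.0, 0.0], [0.0, 1.0]],
    "task_B": [[2.0, 4.0], [6.0, 8.0]],
}
q_initial = [[1.0, 1.0], [1.0, 1.0]]

source = pick_most_similar(similarities)  # → "task_B"
q_new = transfer_knowledge(q_star[source], q_initial, similarities[source])
print(q_new)  # → [[1.75, 3.25], [4.75, 6.25]]
```

In a full system the same blending would be repeated for the knowledge matrices of every controllable variable *i*, all taken from the single most similar task.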

### **4. TRL Design of PV Systems for MPPT**

### *4.1. Control Variable and Action Space*

To obtain the GMPP of a PV system, the output voltage $V_{\rm pv}$ is chosen as the control variable, and its entire search space is decomposed into four layers. In each layer, the search range is uniformly discretized into ten actions between that layer's lower and upper bounds.
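The layered action space can be sketched as follows. Four layers and ten uniformly spaced actions per layer come from the text; everything else is an assumption made for illustration: both bounds are treated as actions (spacing (upper − lower)/9), the 0 to 45 V range is hypothetical, and `refine_bounds`, which narrows the next layer's range to the neighbours of the action chosen in the current layer, is a hypothetical rule, since this section does not state how the bounds of successive layers are derived. The chosen indices would come from the learning agents, not from this code.

```python
from typing import List, Tuple

N_LAYERS = 4    # from the text: the search space of V_pv has four layers
N_ACTIONS = 10  # from the text: ten actions per layer


def layer_actions(lower: float, upper: float, n: int = N_ACTIONS) -> List[float]:
    """Uniformly discretize [lower, upper] into n candidate voltages.

    Assumption: both bounds are actions, so the spacing is (upper - lower) / (n - 1).
    """
    if n < 2 or upper <= lower:
        raise ValueError("need n >= 2 and upper > lower")
    step = (upper - lower) / (n - 1)
    return [lower + k * step for k in range(n)]


def refine_bounds(actions: List[float], chosen: int) -> Tuple[float, float]:
    """Hypothetical rule: the next layer searches between the neighbours of the chosen action."""
    lower = actions[max(chosen - 1, 0)]
    upper = actions[min(chosen + 1, len(actions) - 1)]
    return lower, upper


def layered_action_sets(v_min: float, v_max: float, choices: List[int]) -> List[List[float]]:
    """Action set of every layer, given the index picked in each of the first N_LAYERS - 1 layers."""
    if len(choices) != N_LAYERS - 1:
        raise ValueError(f"expected {N_LAYERS - 1} choices, got {len(choices)}")
    lower, upper = v_min, v_max
    sets: List[List[float]] = []
    for layer in range(N_LAYERS):
        actions = layer_actions(lower, upper)
        sets.append(actions)
        if layer < N_LAYERS - 1:
            lower, upper = refine_bounds(actions, choices[layer])
    return sets


# Hypothetical 0-45 V search range and hypothetical action choices per layer.
sets = layered_action_sets(0.0, 45.0, [6, 0, 9])
print(sets[0])  # layer 1: 0.0, 5.0, ..., 45.0 (5 V spacing)
print(sets[1][0], sets[1][-1])  # layer 2 spans roughly 25 V to 35 V
```

Under this assumed refinement each layer shrinks the range to two step widths, so four layers of ten actions resolve $V_{\rm pv}$ far more finely than a single layer of ten actions would.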
