**1. Introduction**

Over the past decade, the price of photovoltaic (PV) modules has declined continuously around the world, thanks to advances in materials and manufacturing as well as growing attention to greenhouse gas emissions [1,2]. As a consequence, solar energy has rapidly become a promising renewable power source in the global energy market. Technologically, PV systems offer easy installation, high safety, abundant solar resources, nearly maintenance-free operation, and environmental friendliness [3–5]. To date, large-scale PV systems have been widely installed owing to their short-term and long-term economic prospects [6,7].

In practice, stochastic variations in environmental conditions, e.g., changes in solar radiation and fluctuations in temperature, usually cause the power–voltage (*P*–*V*) curve to exhibit highly nonlinear and time-varying behaviour. Hence, accurately determining the output characteristics of PV cells, as well as the maximum possible output of PV systems under various weather conditions, becomes a very challenging issue. This task is often referred to as maximum power point tracking (MPPT) [8]. To achieve MPPT, a power converter (DC–DC converter and/or inverter) is usually connected to the PV system. Conventional MPPT techniques, e.g., hill climbing [9], perturb and observe (P&O) [10], and incremental conductance (INC) [11], have been developed further so that the output power of recent PV systems can be dynamically adjusted under different environmental conditions. All of these schemes assume that the PV cells within a module, and the modules within an array, are exposed to the same temperature and solar irradiation, so that only one maximum power point (MPP) exists. Although they have a simple structure and can efficiently seek the MPP under uniform solar irradiation, a persistent oscillation around the MPP is inevitable, which causes a long-lasting loss of solar energy. Besides, offline MPPT approaches such as fractional short circuit current (FSCC) [12] and fractional open circuit voltage (FOCV) [13] have been adopted for PV systems, offering relatively low complexity and inexpensive implementation. Nevertheless, a common deficiency of these methods is that they are not applicable when solar irradiation changes rapidly.
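
The steady-state oscillation of P&O mentioned above can be illustrated with a minimal sketch. It assumes a simplified single-diode module model without series or shunt resistance; the parameter values (`ISC`, `VOC`, `A`), the step size, and all function names are hypothetical choices for illustration and are not taken from the cited works.

```python
import math

# Hypothetical module parameters (illustrative only, not from the paper).
ISC = 5.0    # short-circuit current in A
VOC = 20.0   # open-circuit voltage in V
A = 1.5      # lumped diode voltage scale in V (assumed)
I0 = ISC / (math.exp(VOC / A) - 1.0)  # saturation current chosen so that I(VOC) = 0


def pv_current(v):
    """Output current of the simplified module at terminal voltage v."""
    return ISC - I0 * (math.exp(v / A) - 1.0)


def pv_power(v):
    """Output power at terminal voltage v."""
    return v * pv_current(v)


def perturb_and_observe(v_start=10.0, step=0.2, iterations=200):
    """Fixed-step P&O: keep the perturbation direction while power rises,
    reverse it as soon as power falls. Returns the visited (v, p) pairs."""
    v = v_start
    p_prev = pv_power(v)
    direction = 1.0
    history = []
    for _ in range(iterations):
        v += direction * step
        p = pv_power(v)
        if p < p_prev:
            direction = -direction
        p_prev = p
        history.append((v, p))
    return history


def true_mpp(resolution=0.001):
    """Reference MPP found by a fine grid search; returns (power, voltage)."""
    n = int(VOC / resolution)
    return max((pv_power(k * resolution), k * resolution) for k in range(n + 1))


if __name__ == "__main__":
    tail = perturb_and_observe()[-30:]
    p_mpp, v_mpp = true_mpp()
    volts = [v for v, _ in tail]
    mean_p = sum(p for _, p in tail) / len(tail)
    print(f"reference MPP: {p_mpp:.2f} W at {v_mpp:.2f} V")
    print(f"P&O steady state: {min(volts):.2f} to {max(volts):.2f} V, mean {mean_p:.2f} W")
```

In this sketch the operating point never settles: once the peak is reached, the voltage keeps cycling over three neighbouring values around the MPP, so the mean harvested power stays slightly below the reference maximum. A smaller step reduces this loss at the cost of slower tracking.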

Furthermore, when solar irradiation is distributed unequally among PV modules, partial shading conditions (PSC) arise. For example, shadows cast by surroundings such as buildings, trees, clouds, birds, and dirt may cause each PV module to receive a different level of solar radiation [14]. Under this circumstance, and in the presence of bypass diodes, the output *P*–*V* curve is usually highly nonlinear; that is, it contains multiple local maximum power points (LMPPs) and a single global MPP (GMPP). Generally speaking, a PV system operating at an LMPP settles at a low-quality optimum, and the aforementioned methods are easily trapped there; thus, they are inadequate to fully exploit solar energy under PSC. To handle this obstacle, a great number of approaches have been introduced. For example, reference [15] developed a fuzzy logic controller (FLC) for which an approximately optimal design of the membership functions and control rules was found by a genetic algorithm (GA). In addition, to track the GMPP rapidly under PSC, an improved particle swarm optimization (PSO) algorithm based on a variable sampling time strategy was proposed [16]. In [17], an artificial bee colony (ABC) algorithm was proposed to accomplish MPPT under different environmental conditions and PSC; it requires only a few parameters, and its convergence does not depend on the initial conditions. In [18], the bio-inspired cuckoo search algorithm (CSA) was adopted to tackle PSC effectively, using Lévy flights to achieve fast convergence. Moreover, a socially inspired algorithm named teaching–learning-based optimization (TLBO) was adopted to track the GMPP accurately under PSC; its advantages are a simple structure and fast convergence [19].
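
The multi-peak behaviour under PSC described in this paragraph can be reproduced with a short sketch. It assumes two series-connected modules, each protected by an ideal bypass diode (zero forward drop), and the same simplified single-diode model without series or shunt resistance; the photocurrent values and all names are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical module parameters (illustrative only, not from the paper).
A = 1.5                                 # lumped diode voltage scale in V (assumed)
I0 = 5.0 / (math.exp(20.0 / A) - 1.0)   # saturation current of a 5 A / 20 V module


def module_voltage(i, iph):
    """Voltage of one module carrying string current i with photocurrent iph.
    If i exceeds iph, the ideal bypass diode conducts and the voltage is zero."""
    if i >= iph:
        return 0.0
    return A * math.log((iph - i) / I0 + 1.0)


def pv_curve(photocurrents, samples=1000):
    """Sweep the string current and return (voltage, power) pairs."""
    i_max = max(photocurrents)
    curve = []
    for k in range(samples + 1):
        i = k * i_max / samples
        v = sum(module_voltage(i, iph) for iph in photocurrents)
        curve.append((v, v * i))
    return curve


def power_peaks(curve):
    """Return every sampled point whose power exceeds both neighbours."""
    return [curve[k] for k in range(1, len(curve) - 1)
            if curve[k][1] > curve[k - 1][1] and curve[k][1] > curve[k + 1][1]]


if __name__ == "__main__":
    for label, iph in (("uniform", [5.0, 5.0]), ("shaded", [5.0, 2.0])):
        peaks = power_peaks(pv_curve(iph))
        summary = ", ".join(f"{p:.1f} W at {v:.1f} V" for v, p in peaks)
        print(f"{label}: {len(peaks)} peak(s): {summary}")
```
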
Furthermore, the generalized pattern search (GPS) optimization algorithm [20] was devised to handle PSC, offering high convergence speed, excellent dynamic and steady-state efficiencies, and simple operation. In reference [21], an ant colony optimization (ACO) algorithm combined with a novel pheromone updating strategy was developed for MPPT, which can effectively improve tracking speed, accuracy, stability, and robustness under various weather conditions and different partial shading patterns. However, when used independently for MPPT under various scenarios, all of these meta-heuristic algorithms have two main deficiencies, as follows:


The rapid development of artificial intelligence in recent years, highlighted by Google DeepMind's AlphaGo [22], which defeated two world champions in two world-renowned Go matches in 2016 and 2017, respectively, has stimulated a wave of interest in the field. In fact, model-free reinforcement learning (RL) is one of the core algorithms of AlphaGo; it can rapidly construct an optimal action policy at each state according to its current knowledge or experience [23]. Motivated by this outstanding characteristic, this paper proposes a new transfer reinforcement learning (TRL) method with space decomposition for MPPT of PV systems under PSC. In comparison with the aforementioned meta-heuristic algorithms, TRL has the following two advantages:


### **2. Modelling of PV Systems under PSC**
