1. Introduction
In recent years, the consequences of climate change for the availability of resources such as fresh water have become increasingly evident [1]. Moreover, a growing population has significantly increased global food demand, which places greater pressure on water resources and directly affects irrigated agriculture [2]. To maximize the efficiency of this water use, many irrigated areas have undergone modernization processes in the last two decades [3], in which open channels have been replaced by pressurized networks [4]. However, these modernization processes have led to high energy consumption, so measures to optimize the use of this resource are also necessary [5]. Together with the rising price of conventional electricity, this can make some farms unfeasible [6], mainly in irrigation areas that rely on underground resources, where the extraction cost alone can reach 70% of the total energy cost [7].
Renewable energy resources, which rely on natural resources to provide a sustainable, non-polluting and practically inexhaustible supply of energy, are a promising alternative to conventional energy resources and have gained significant importance in recent decades as a way to overcome energy shortages [8].
Photovoltaic solar energy has emerged as the most popular of the renewable energy resources [9,10], mainly in areas where, among other things, the number of sunlight hours is very high, for example in Mediterranean areas [11]. Consequently, irrigating crops with photovoltaic pumping systems has attracted increasing interest, and such systems have already been established in many parts of the world, for example, the USA [12], India [13], Algeria [14], Turkey [15] and Spain [4,16]. Photovoltaic energy can be used in standalone or grid-connected systems to supply power to pumping stations in irrigated agriculture [17]. Although this energy source has many environmental and economic benefits [18,19,20], its dynamic and highly weather-dependent nature makes its management a major challenge. Variations in solar irradiance can cause serious problems in balancing power generation and demand at the pumping station. Thus, solar PV power forecasting is essential for the efficient management of PV systems integrated in commercial irrigated farms.
Fully modelling such PV systems is a hard task. It requires many mathematical models to analyse the factors that affect performance, for example, geographical location, solar irradiance, the multiplicity of PV-system components on the market and the complexity of the permutations of these components, their types, efficiencies, and their different performance indicators. Moreover, the varying methods used to design PV systems often lead to significantly different results because of differing assumptions [21]. Several methods are detailed in [22,23]. Thus, the efficient integration of photovoltaic (PV) production in energy systems is conditioned by the capacity to anticipate its variability, that is, the capacity to provide accurate forecasts [24]. Nevertheless, these accurate methods can present some limitations for users in their regions, such as the difficulty and specificity of the method, or the need for accurate input and output data [25], although the latter is not a serious problem given the current capacity to monitor these types of systems.
However, methods based on Artificial Neural Networks (ANNs) have proven to be a useful tool for modelling different engineering systems under real-world conditions without having to solve these mathematical models [26,27]. ANNs exhibit excellent characteristics such as high-speed information processing, mapping capabilities, fault tolerance, adaptivity, generalization and robustness. Therefore, ANNs are a powerful and smart tool for modelling, predicting, and optimizing the management of PV energy systems. Many researchers have summarized the use of ANNs in applications of PV energy systems. For example, multilayer perceptron (MLP) networks and radial basis function (RBF) networks have been widely applied to forecast electrical efficiency, energy performance or PV yield [28,29,30]. Others have used ANNs for the electrical modelling of a photovoltaic module [31], for MPPT based on artificial intelligence techniques in photovoltaic systems [32], or even for the prediction of solar radiation [33].
However, the most important and easily managed operating variable for the managers of these irrigation networks is the power available at the outlet of the pumping station, i.e., the power available to the irrigation system. None of the previous models considers this essential aspect of managing PV systems in irrigation. Nor do these works consider the temporal influence of the input variables of the model or the degradation of the PV systems over time. Thus, in this work, a new method based on Big Data techniques and Artificial Intelligence has been developed to forecast the power available to the irrigation system (at the outlet of the pumping station) when PV energy systems are used as the energy source. The forecasting model is based on a deep recurrent artificial neural network known as deepLSTM (Long Short-Term Memory) [35], optimized by a genetic algorithm (NSGA-II) [34]. The developed model, PREPOSOL (PREdiction of POwer in SOLar installations), treats each input of the forecasting model as a time series and analyses the seasonality and trend of each input variable. In addition, thanks to its memory capacity, the model can address the degradation of the PV system by adapting over time to the real system conditions. PREPOSOL was developed in MATLAB® (Mathworks Inc., Natick, MA, USA) and was tested on a real farm in the southeast of Albacete province (Spain).
The paper is structured as follows. Section 2.1 describes the case study and all the measuring equipment used. Section 2.2 details the process to calculate the cloudiness and the power loss in the cables leading to the pumping station. Section 2.3 explains a new approach to forecast the available power at the pumping station fed by PV systems. Section 2.4 and Section 2.5 outline the equations used to build the PREPOSOL model and its optimization process. Section 3 presents the input variables of the PREPOSOL model. Section 3.1 presents the Pareto front for the last generation and the evolution of the best individual in the model. Section 3.2 analyses the results obtained from the PREPOSOL model against real data. Section 4 is the discussion section and, finally, Section 5 presents our main conclusions.
2. Materials and Methods
2.1. Case Study and Data Source
In the present work, a commercial farm called “Peruelos” was used to develop the model. The farm is located in the southeast of Albacete province (Spain), at 38.994° latitude and −1.859° longitude. The crop was almond trees with a plantation framework of 7 × 7 m² in a total irrigation area of approximately 90 ha. The photovoltaic system provided energy for a subsurface drip irrigation system composed of 20 subunits with a highly irregular shape and topography, with large elevation differences (up to 60 m). A detailed description of the installation appears in [36,37].
The photovoltaic system was composed of 152 polycrystalline silicon photovoltaic modules. The module model was SM6610P 265 (Astronergy/Chint Solar, Frankfurt, Germany), with 60 solar cells and a unit capacity of 265 Wp, installed with south orientation and a slope of 8.5°. The PV generator had a capacity of 40 kWp and was composed of eight parallel lines with 19 photovoltaic modules per line.
The variable frequency drive (VFD) model was 3G3RX-A4220-E1F (Omron Europe B.V., Hoofddorp, the Netherlands), with a nominal power of 30 kW, an output nominal current of 57 A and an overvoltage protection of 800 V. The VFD efficiency, according to the manufacturer, is 89.7% at 25% load and 95% at 100% load.
The irrigation pump was connected to the photovoltaic generator mainly by two different lines of cables. The first line, from the VFD to the borehole inlet, was a buried aluminium cable XZ1 0.6/1 kV with a section of 150 mm² and a length of 470 m. The second line, from the borehole inlet to the submersible motor, was a copper cable RVK 0.6/1 kV with a section of 25 mm² and a length of 225 m (Figure 1).
The equipment installed on the farm to generate the PREPOSOL model was a Middleton EP07/134 calibrated pyranometer (Middleton Solar, Melbourne, Australia), which measured the irradiance on the horizontal surface (W·m⁻²), and an agroclimatic station SICO WS-600 (SICO Control Systems, Madrid, Spain), which measured wind speed (m·s⁻¹). Both were located in a corner of the photovoltaic generator (Figure 1).
An AR5 electrical network analyser (CIRCUTOR, Barcelona, Spain), which had an accuracy better than 1.5%, was used to measure the generated AC power.
The monitoring equipment was programmed to record measurements at 10 min intervals during 2016, 2017, and 2018; the evolution of these measurements is presented in Section 3.
2.2. Objective Photovoltaic Power
The presence of clouds reduces the energy generated by photovoltaic installations. Thus, an algorithm to calculate the clear-sky conditions was used [38,39]. Comparing the clear-sky forecasts with the irradiance measured on a horizontal surface then allowed the level of cloudiness (%) to be calculated at all times of the day.
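The comparison between clear-sky and measured irradiance can be illustrated with a minimal sketch. The exact expression used with the algorithm of [38,39] is not given in the text, so the formula below (relative shortfall of the measured irradiance with respect to the clear-sky value, clipped to 0–100%) is an assumption, as are the function name `cloudiness_percent` and the `min_clear_sky` cut-off used to skip night-time records.

```python
import numpy as np

def cloudiness_percent(ghi_measured, ghi_clear_sky, min_clear_sky=50.0):
    """Cloudiness (%) as the relative shortfall of measured vs. clear-sky irradiance.

    Assumed definition (not taken from the paper):
        cloudiness = 100 * (1 - measured / clear_sky), clipped to [0, 100].
    Records whose clear-sky irradiance is below `min_clear_sky` (W/m^2) are
    returned as NaN because the ratio is unstable at night and at very low sun.
    """
    measured = np.asarray(ghi_measured, dtype=float)
    clear = np.asarray(ghi_clear_sky, dtype=float)
    cloudiness = np.full(measured.shape, np.nan)
    valid = clear >= min_clear_sky
    cloudiness[valid] = 100.0 * (1.0 - measured[valid] / clear[valid])
    return np.clip(cloudiness, 0.0, 100.0)

# Hypothetical 10 min records (W/m^2): night, cloudy, slightly hazy, clear
measured = [0.0, 300.0, 450.0, 900.0]
clear_sky = [0.0, 600.0, 500.0, 880.0]
print(cloudiness_percent(measured, clear_sky))
```

Clipping keeps measurements that slightly exceed the clear-sky estimate from producing negative cloudiness.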
The objective was to obtain the photovoltaic power at the inlet of the machine that requires it, in this case, an irrigation pump.
Figure 1 shows two lines of buried alternating current (AC) cables between the outlet of the VFD and the inlet of the irrigation pump; these can produce significant power losses that must be calculated. The power losses in long cables were calculated from the cable resistance, obtained according to the temperature of the cable (Equation (1)).
where CLPOW is the power loss in the cable (kW), $I_{max}$ is the AC current in the cable (A), N is the number of conductors, L is the length of the cable (m), and R is the resistance at the temperature reached (Ω).
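As an illustration of a resistance-based loss calculation of this kind, the sketch below uses the cable data from Section 2.1. It is not the authors' Equation (1): the Joule-loss form N·I²·R, the linear temperature correction of resistivity, the handbook resistivity and temperature coefficients, and the example current and cable temperature are all assumptions made for the example.

```python
# Typical handbook values at 20 °C (assumptions, not taken from the paper)
RHO_20 = {"copper": 0.0172, "aluminium": 0.0282}   # resistivity, ohm·mm²/m
ALPHA = {"copper": 0.00393, "aluminium": 0.00403}  # temperature coefficient, 1/°C

def cable_resistance(material, length_m, section_mm2, temp_c):
    """Resistance (ohm) of one conductor, linearly corrected for temperature."""
    rho_t = RHO_20[material] * (1.0 + ALPHA[material] * (temp_c - 20.0))
    return rho_t * length_m / section_mm2

def cable_power_loss_kw(current_a, n_conductors, resistance_ohm):
    """Joule losses (kW) in n_conductors carrying the same current."""
    return n_conductors * current_a ** 2 * resistance_ohm / 1000.0

# Cable data from Section 2.1; current and temperature are hypothetical
r_al = cable_resistance("aluminium", 470.0, 150.0, temp_c=40.0)  # VFD to borehole inlet
r_cu = cable_resistance("copper", 225.0, 25.0, temp_c=40.0)      # borehole inlet to motor
loss_kw = cable_power_loss_kw(current_a=40.0, n_conductors=3, resistance_ohm=r_al + r_cu)
print(f"R_al = {r_al:.4f} ohm, R_cu = {r_cu:.4f} ohm, loss = {loss_kw:.2f} kW")
```

Because the two lines are in series, their per-conductor resistances are added before computing the loss.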
2.3. Problem Approach
The available power for the irrigation system, mainly when PV systems are implemented as energy sources, is a key variable for scheduling precision irrigation and managing the pumping station efficiently. Thus, in this work, a new approach based on recurrent neural networks with memory, optimized by genetic algorithms, was developed to forecast the available power at the pumping station (fed by PV systems), considering both clear and cloudy days in a single model. The developed model can memorise the most relevant information over time. This made it possible to optimise the model’s forecasts and to adapt dynamically to the natural evolution (degradation, efficiency losses, etc.) of the PV systems. The model was trained to forecast the power available at the pumping station up to 3 h in advance, which is enough time to carry out the optimal scheduling of a precision irrigation system. The first phase of the model building process was to identify the main input variables. There are many techniques to achieve this input space reduction. Principal component analysis and partial least squares cardinal components are two widely used techniques. However, when the significant input variables resulting from these techniques are used in nonlinear models, very poor results are usually obtained [40]. Therefore, an adaptation of the methodology developed by [40], which applies fuzzy curves and fuzzy surfaces, was used to identify the significant input variables automatically. A stacked LSTM (deepLSTM) model was then designed and optimized by the NSGA-II genetic algorithm.
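To make the forecasting set-up concrete, the sketch below shows one common way to turn 10 min records into input/target sequences for a recurrent model. With the 10 min recording interval of Section 2.1, a 3 h horizon corresponds to 18 steps. The function name `make_windows`, the 6 h lookback (36 steps) and the random example data are hypothetical; the paper does not state how its sequences were built.

```python
import numpy as np

def make_windows(features, target, lookback, horizon):
    """Slice a multivariate series into (past window, future target) pairs.

    features: array of shape (n_records, n_variables)
    target:   array of shape (n_records,)
    Returns X with shape (n_samples, lookback, n_variables) and
            y with shape (n_samples, horizon).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(target) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = np.stack([target[i + lookback:i + lookback + horizon] for i in range(n_samples)])
    return X, y

# Hypothetical data: 3 input variables (e.g. irradiance, wind speed, cloudiness)
rng = np.random.default_rng(0)
inputs = rng.random((500, 3))
power_kw = rng.random(500) * 30.0
X, y = make_windows(inputs, power_kw, lookback=36, horizon=18)  # 6 h in, 3 h out
print(X.shape, y.shape)
```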
2.4. LSTM Cell
The LSTM network belongs to the class of recurrent neural networks (RNNs) and was specially designed for sequence problems. The recurrent connection of this kind of neural network adds state, or memory, to the model and allows it to learn and exploit the ordered nature of the sensor measurements in the input sequence.
Figure 2 shows the architecture of an LSTM cell, which is the elemental unit of stacked or deep LSTM models. An LSTM cell consists of a set of recurrently connected blocks, known as memory blocks. These blocks can be thought of as a differentiable version of the memory chips in a digital computer. Each of these blocks contains one or more recurrently connected memory cells and three multiplicative units (the input, output and forget gates) that provide continuous analogues of write, read, and reset operations for the cells.
In all RNNs, there is feedback that considers the output from the previous time step ($h_{t-1}$). Unlike other ANN architectures, RNNs have a feedback loop at every node, which allows information to move in both directions and so to learn temporal patterns of widely separated events. In addition to this feedback loop, the LSTM cell has some extra gates, namely the input, forget, cell and output gates, which are used to decide which information is forwarded to another node. The input gate ($i_t$) is defined according to Equation (2).
$$i_t = \sigma\left(x_t U_i + h_{t-1} W_i\right) \quad (2)$$
where $\sigma$ represents the sigmoid function; $x_t$ is the input vector of the model at time step $t$; $U_i$ is the weight matrix that connects the inputs to the hidden layer; $h_{t-1}$ is the hidden state from the previous time step $t-1$; and $W_i$ is the recurrent connection between the previous hidden layer and the current hidden layer.
Similarly, $f_t$ defines the forget gate (Equation (3)), which decides what to forget by means of a sigmoid function.
$$f_t = \sigma\left(x_t U_f + h_{t-1} W_f\right) \quad (3)$$
where $U_f$ is the weight matrix that connects the inputs to the forget layer and $W_f$ is the recurrent connection between the previous forget layer and the current forget layer.
The cell state ($C_t$) represents the “memory” of the LSTM network (Equation (4)); it lets information from earlier time steps travel to later time steps, reducing the effect of short-term memory.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (4)$$
where $\tilde{C}_t$ is the candidate hidden state, computed from the current input and the previous hidden state (Equation (5)); $C_{t-1}$ represents the internal memory at time step $t-1$; and $\odot$ denotes element-wise multiplication.
$$\tilde{C}_t = \tanh\left(x_t U_g + h_{t-1} W_g\right) \quad (5)$$
where tanh is the hyperbolic tangent function; $U_g$ is the weight matrix that connects the inputs to the candidate hidden layer; and $W_g$ is the recurrent connection between the previous candidate hidden layer and the current candidate hidden layer.
Finally, the output gate ($o_t$) and the hidden state at time step $t$ ($h_t$), which depends on the new cell state ($C_t$), are computed according to Equations (6) and (7).
$$o_t = \sigma\left(x_t U_o + h_{t-1} W_o\right) \quad (6)$$
$$h_t = \tanh\left(C_t\right) \odot o_t \quad (7)$$
where $U_o$ is the weight matrix that connects the inputs to the output gate; $W_o$ is the recurrent connection between the previous hidden layer and the current hidden layer at the output gate; and $\odot$ denotes element-wise multiplication.
After the training process of this kind of ANN, the weight matrices (U) and the recurrent connections (W) are optimized to minimize the forecasting errors. Thus, at the end of this step, the forget gate decides what is relevant to keep from the prior steps, the input gate decides what information is relevant to add from the current step, and the output gate determines what the next hidden state should be.
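The gate mechanics described in this section can be sketched as a single forward step of one LSTM cell in NumPy. This is a minimal illustration using the standard LSTM formulation without bias terms, with the input weights U and recurrent weights W named as in the text. The dictionary layout, the layer sizes and the random weights are assumptions for the example; it is not the MATLAB implementation used for PREPOSOL.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W):
    """One forward step of an LSTM cell (standard formulation, no biases).

    U[k]: input weights, shape (n_inputs, n_hidden); W[k]: recurrent weights,
    shape (n_hidden, n_hidden); k is one of 'i', 'f', 'g', 'o'.
    """
    i_t = sigmoid(x_t @ U["i"] + h_prev @ W["i"])   # input gate
    f_t = sigmoid(x_t @ U["f"] + h_prev @ W["f"])   # forget gate
    g_t = np.tanh(x_t @ U["g"] + h_prev @ W["g"])   # candidate state
    c_t = f_t * c_prev + i_t * g_t                  # new cell state (memory)
    o_t = sigmoid(x_t @ U["o"] + h_prev @ W["o"])   # output gate
    h_t = np.tanh(c_t) * o_t                        # new hidden state
    return h_t, c_t

# Hypothetical sizes: 3 input variables, 4 hidden units, 6 time steps
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
U = {k: rng.normal(scale=0.1, size=(n_in, n_hidden)) for k in "ifgo"}
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "ifgo"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x_t, h, c, U, W)   # state is carried across time steps
print(h.shape)
```

Carrying `h` and `c` from one step to the next is what gives the cell its memory; in a stacked model, the sequence of `h` values of one layer would serve as the input sequence of the next.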
2.5. Building and Optimizing the DeepLSTM Model
The deep or stacked LSTM is a model with multiple hidden LSTM layers, each containing multiple LSTM cells. Stacking LSTM hidden layers makes the model deeper, more accurately earning its description as a deep learning technique [41]. The multiple layers can recombine the representations learned by prior layers and create new representations at higher levels of abstraction; e.g., some groups of layers specialised in cloudy days and others in clear days (all of them within the same model). However, the optimal architecture of a deepLSTM model is hard to find, and it is frequently adjusted by trial and error, leading to unskilful or unstable models. Consequently, the architecture of the PREPOSOL model (based on deepLSTM) was automatically optimized by the multi-objective genetic algorithm known as NSGA-II [34].
Figure 3 shows the flowchart of the PREPOSOL development and optimization.
In the first step, an initial population of nPop chromosomes was randomly generated. Every chromosome or individual (chr) was made up of nDec decision variables, or genes, that represent each hyperparameter of the PREPOSOL model (deepLSTM model) to be optimized. Thus, every individual of the genetic algorithm defined one deepLSTM model to be trained.
Table 1 shows the five decision variables (genes) considered in this work, as well as the range of values that each one takes during the optimization process. The first two decision variables, or genes (nDec1 and nDec2), define the dimensions of the inner gates and the cell state activation function of the LSTM cell, respectively. Consequently, these first two genes define the architecture of the LSTM cell. The third gene (nDec3) establishes the depth of the deepLSTM model and thus its capacity to adapt to the problem addressed. The gene nDec4 was designed to find the best way to train the deepLSTM model for this problem, i.e., the best training function. Finally, gene nDec5 was responsible for finding the best way to measure the difference between the forecasted and the actual values of the available power at the pumping station.
Once the initial population was created, the deepLSTM model defined by each chromosome was trained and tested with two different datasets (training and testing datasets). After that, the initial population had to be sorted according to its aptitude, i.e., its forecast capacity. However, to ensure good generalization of the PREPOSOL model, the aptitude of every individual was measured over the testing set rather than the training set. Thus, the initial population was sorted by its aptitude on the testing set, which was defined by two objective functions, F1 and F2. In this work, F1 (Equation (8)) is the coefficient of determination (R²) of the testing dataset, while F2 (Equation (9)) measures the standard error of prediction (SEP) [48] of this dataset.
where $n_t$ is the total number of samples of the testing dataset; $\widehat{APower}$ is the estimated available power at the pumping station, kW; $\overline{\widehat{APower}}$ is the average of the estimated APower of the testing dataset, kW; APower is the observed available power at the pumping station, kW; and $\overline{APower}$ is the average of the observed APower of the testing dataset, kW.
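The two objective functions can be sketched as follows. The exact expressions of Equations (8) and (9) are not reproduced here; the sketch assumes commonly used forms consistent with the variables listed above: R² as the squared correlation between observed and estimated values (which uses both averages), and SEP as the root mean square error expressed as a percentage of the observed mean. The function names and sample values are hypothetical.

```python
import numpy as np

def r_squared(observed, estimated):
    """Assumed F1: squared Pearson correlation between observed and estimated."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    d_obs = observed - observed.mean()
    d_est = estimated - estimated.mean()
    return float((d_obs @ d_est) ** 2 / ((d_obs @ d_obs) * (d_est @ d_est)))

def sep_percent(observed, estimated):
    """Assumed F2: RMSE as a percentage of the mean observed value."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    rmse = np.sqrt(np.mean((observed - estimated) ** 2))
    return float(100.0 * rmse / observed.mean())

# Hypothetical testing samples (kW)
apower_obs = [14.2, 18.9, 22.5, 26.1, 19.7]
apower_est = [15.0, 18.1, 21.9, 24.8, 20.3]
print(r_squared(apower_obs, apower_est), sep_percent(apower_obs, apower_est))
```

Under these definitions a better model has a higher F1 and a lower F2, which matches the maximize/minimize directions used in the optimization.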
Thus, the NSGA-II algorithm optimizes the decision variables by maximizing the F1 values and minimizing the F2 values simultaneously. In the remaining stages, the individuals were modified (crossover and mutation), and the top nPop were selected based on their objective function values. The process was repeated for several generations (nGEN). Finally, the set of nPop optimal individuals (optimal deepLSTM models) obtained in the last generation defines the Pareto front.
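The notion of a Pareto front with one objective maximized and the other minimized can be illustrated with a small non-dominated filter. This is only the dominance test at the core of multi-objective selection, not the full NSGA-II procedure (which also ranks successive fronts and uses crowding distance); the function names and the (F1, F2) pairs are hypothetical.

```python
def dominates(a, b):
    """True if individual a dominates b.

    Each individual is a pair (f1, f2) with f1 maximized (e.g. R²) and
    f2 minimized (e.g. SEP): a must be no worse on both and better on one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(population):
    """Return the individuals that no other individual dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]

# Hypothetical (F1, F2) values for five trained models
population = [(0.90, 10.0), (0.92, 9.0), (0.91, 8.5), (0.88, 12.0), (0.92, 9.5)]
print(pareto_front(population))
```

In this example two models survive: one has the highest F1 and the other the lowest F2, so neither dominates the other.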
3. Results
The input variables used in the PREPOSOL model are representative of the “Peruelos” case study and are shown in Figure 4. The solar irradiance on the horizontal surface (W·m⁻²), measured by the pyranometer, shows the normal evolution according to the time of year, which is reflected in the power achieved. The mean wind speed was 2.78 m·s⁻¹, and the main cloudiness values were recorded in the winter months. However, there were many cases of cloudiness below 10% which, because of the location in a highly irregular topography, could be related to fog, atmospheric turbidity and airborne dust, and which mainly occurred in the early morning and late afternoon.
3.1. Evolution of the PREPOSOL Optimization
The population of the last generation in the evolutionary optimization process was composed of 100 individuals, each with five genes that represented different characteristics of the ANN.
During the optimization process, the initial training of each ANN was run with a time limit. Later, the best individuals from this initial training were retrained, this time with no time limit on the number of steps. The training time was essential because it allowed more or less dense ANNs to be obtained, which implied a slower or faster training process, respectively, and whose effect was reflected in the cost functions with a closer or looser approximation. Consequently, Figure 5 represents the Pareto front for the last generation and the evolution of the best individual with and without a time limit. The individuals are not separated into different groups in the graph and, with some exceptions, the relationship is broadly linear: as the F1 values increase, the F2 values do too.
In this case, the best individual (I2) was obtained without a time limit. Compared with the time-limited optimization process, it showed a very small change in the R² value but a clear improvement in the SEP value, which results in an increase in the accuracy of the model.
3.2. Optimal PREPOSOL Model
To analyse the precision of the PREPOSOL model, a statistical analysis was performed based on the root mean square error (RMSE), the relative error (RE), and the coefficient of determination (R²). The values of R², RMSE and RE were 0.9157, 1.61 and 8.10, respectively (Figure 6a).
Figure 6b compares the measured and predicted values. In the 13–18 kW range, which occurs mainly in autumn-winter, the predicted values were slightly higher than the measured values. However, for high power, between approximately 22 and 28 kW, the predicted values were below the measured values, mainly in summer, when this high level of power is more frequent.
Despite that, good accuracy was observed in the power range of 19–22 kW, with few differences. Before developing the model, the initial values were carefully filtered to obtain quality data. However, due to the large number of input parameters for the calculations, these results could be influenced by the lack of representative values in all periods (summer, autumn, winter, spring). Thus, the model should learn better with more representative, better-quality data for all periods. Although the differences between the measured and predicted values varied over time, the statistical analysis shows good values of the main statistical indicators.
The generated model can be applied in any system that uses photovoltaic solar energy to feed its equipment. However, its application in irrigation pumping systems is a great advantage, especially when the crop irrigation requirements are high and irrigation with this type of system is the only solution. For that, accurate knowledge of the irrigation area is essential in order to collect quality parameters of complex irrigation systems [37]. In this case, the hydraulic information was calibrated and validated on the same real farm, which made it possible to plan irrigation for the system.
For a demonstration, two irrigation strategies were used. Table 2 shows the strategy with an individual subunit and Table 3 shows the strategy with several simultaneous subunits. The subunits analysed were 3 and 11 of the 20 that make up the farm; depending on the selected strategy, this made it possible to obtain quality irrigation parameters and, accordingly, the inlet power of the pump [37].
Moreover, Table 4 shows a short example for 6 July 2018, with different hours of the day and their predicted power obtained with the PREPOSOL model. For quality irrigation of subunit 3 (emission uniformity (EU) = 85%), 14 kW was necessary at the inlet of the pump; thus, according to Table 4, the farmer could irrigate subunit 3 individually at any hour. However, for simultaneous quality irrigation of subunits 3 and 11 (EU = 85%), 19 kW was necessary at the inlet of the pump; thus, according to Table 4, the farmer could only irrigate subunits 3 and 11 from 13:20:00 to 15:20:00.
This was a small demonstration of the possibilities offered by the generated model for irrigation scheduling.
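The scheduling decision in this example amounts to finding the periods in which the forecast power meets the power required by each strategy. A minimal sketch of that check is given below. The thresholds of 14 kW and 19 kW come from the text, but the forecast values and time stamps are hypothetical (they are not the values of Table 4), and the function name `feasible_windows` is an assumption.

```python
from datetime import datetime, timedelta

def feasible_windows(forecast, required_kw):
    """Group consecutive forecast records that meet the required power.

    forecast: list of (timestamp, predicted_kw) sorted by time.
    Returns a list of (first_timestamp, last_timestamp) windows.
    """
    windows = []
    start = last = None
    for timestamp, power_kw in forecast:
        if power_kw >= required_kw:
            if start is None:
                start = timestamp
            last = timestamp
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows

# Hypothetical forecast for one afternoon, one record every 30 min
t0 = datetime(2018, 7, 6, 12, 0)
predicted_kw = [15.0, 17.0, 20.0, 21.0, 19.5, 16.0]
forecast = [(t0 + timedelta(minutes=30 * k), p) for k, p in enumerate(predicted_kw)]

print(feasible_windows(forecast, required_kw=14.0))  # subunit 3 alone
print(feasible_windows(forecast, required_kw=19.0))  # subunits 3 and 11 together
```

With these made-up values the single subunit can be irrigated throughout, while the simultaneous strategy is restricted to the records at or above 19 kW.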
4. Discussion
The monitoring of power generation installations is not always considered [49]. One of the essential factors in obtaining accurate predictions for PV installations is the cloudiness, which is produced by the variability of weather conditions and has a direct effect on the level of irradiance reached. Some researchers have published on this issue, e.g., [50,51]. Implementing the cloudiness parameter in the PREPOSOL model made it possible to demonstrate its good performance. In this case, the cloudiness parameter was calculated from irradiance values obtained with a pyranometer located at the study site, but users could calculate it from official sources of information that are easier to use. Moreover, current low-cost monitoring systems for photovoltaic installations can be a useful technology [52]. Thus, the PREPOSOL model was shown to predict available photovoltaic power efficiently using a small number of input variables that are commonly monitored.
A data acquisition system is very important for obtaining information in real time, and it is especially important in solar pumping systems because it allows their behaviour to be described under variable weather conditions [53]. There has been much research on photovoltaic power prediction using artificial neural networks [49,54]. However, it is remarkable that the PREPOSOL model has an associative and adaptive memory, which means that, as the data improve, its memory will allow it to adapt better to the conditions in which it is working. Thus, the methodology implemented could be applied on any other farm, with adjustments of the model for each specific case.
The PREPOSOL model considers the temporal influence of the input variables and the degradation of the PV systems over time. Neither aspect has been considered previously, which can result in a significant lack of precision, since PV-module manufacturers only guarantee that the power drop stays below 20% within the warranty period [55,56]. Moreover, the most important and easily manageable operating variable for the managers of these irrigation networks is the power available at the outlet of the pumping station, i.e., the power available to the irrigation system. None of the previous models has considered this essential aspect of managing PV systems in irrigation.
The adequate management of a solar pumping system is key for farmers to obtain food of sufficient quality and quantity, save water and energy, be competitive in the agricultural sector and generate profits on their farms. The high variability of weather conditions requires efficient management of the energy resources so that they are applied appropriately in the irrigation system, especially on farms with water scarcity. In this work, predicting power up to 3 h in advance provides enough time for the decision-making process and allows easy management of the pumping station or modifications to the irrigation scheduling, as demonstrated in the example presented for the irrigation system.
5. Conclusions
The utility of the PREPOSOL model for predicting available power in photovoltaic installations up to 3 h in advance was demonstrated. Thus, energy production can be matched to the demands of the system in order to use resources rationally, increase the competitiveness and productivity of the activity, plan and schedule specific situations, and obtain other benefits that depend on the activity. Although the generated model can be applied in any system that uses photovoltaic solar energy to feed its equipment, in this work it was applied to a solar pumping system.
Thus, the generated model predicted photovoltaic power with a high level of precision (RE = 8.10%) with respect to the measured photovoltaic power, using a reduced number of input variables, which could be a great advantage in reducing the cost of monitoring systems for photovoltaic installations. Moreover, the temporal influence of the input variables and the degradation of the PV systems over time were considered, both fundamental aspects that have not been considered in other works.
Another aspect considered in this work, and the most important and easily manageable operating variable for the managers of these irrigation networks, is the power available at the outlet of the pumping station, i.e., the power available to the irrigation system.
Data of sufficient quantity and quality are essential for accurate approximations. Although the statistical analysis shows good results, and because the algorithms implemented in the generated model have associative and adaptive memory, the accuracy could be further improved if the data were better.
Thus, the PREPOSOL model made it possible to optimize the use of the generated energy through accurate predictions of photovoltaic power and, in this case, to optimize the use of water for quality irrigation with high emission uniformity (85%). Water is a key resource for the viability of many farms, especially those with water scarcity.
The PREPOSOL model could be used to generate alarms for malfunctioning equipment in the system: if the monitored photovoltaic power values are below the values predicted by the model, this would indicate technical problems and the need for immediate checking. In this way, the efficiency of the installation can be safeguarded, with the possibility of saving costs and avoiding more expensive repairs.
Although the generated model can be applied in other types of systems fed by solar energy, in this case it was applied to photovoltaic pumping systems, where the high variability of solar irradiation and the high irrigation requirements of the crops for food production demand accurate management of the system. The small representative example showed the possibilities available to the farmer in the irrigation decision-making process for a better use of the available resources and of the investment.