#### 1.3.3. SkyImager in the Canary Islands

Six insular power grids comprise the electrical network in the Canary Islands. Conventional generation costs more here than PV technologies, and the savings can be shared between PV system owners and the Spanish utility ENDESA. Penetration of renewables varies among these grids, from a high of 60% wind-energy penetration in El Hierro (once the Gorona del Viento hydro-wind power plant is fully operational) to single-digit renewable integration in Lanzarote-Fuerteventura, a consequence of strong environmental regulations, a weak power grid, and an unstable Spanish regulatory environment for renewable energy infrastructure during 2011-15. However, new regulatory policies will provide a more attractive framework for investment. The Canary Islands Government plans to avoid ground-based renewable facilities with a large environmental footprint in favor of smaller rooftop plants close to electricity users. As in Hawaii, the existing distribution grid is not prepared for a large penetration of residential PV systems, with the resulting reverse flows, voltage and frequency instabilities, and voltage drops at the ends of long lines. ENDESA has built a testbed smart grid in a village on La Graciosa, at the north end of the Fuerteventura-Lanzarote insular power system.

La Graciosa is the smallest island in the Canary Archipelago, with a surface area of 29 km². It lies in a marine nature reserve north of Lanzarote and is home to about 700 people in the island capital of Caleta de Sebo. Average global irradiation is 5.157 kWh/kW·day (1883 kWh/kW·yr), while the average monthly high temperature is 20.8 °C. Located a few kilometers from the African coast, its proximity to the Sahara Desert gives La Graciosa particularly stable atmospheric characteristics due to a quasi-permanent subsidence thermal inversion. Constant northerly trade winds, along with the high content of aerosols and dust in the atmosphere, have a large influence on cloud dynamics and therefore on irradiance in the region. As shown in Figure 9, the La Graciosa grid is supplied by three 20/0.4 kV transformers (600, 400, and 400 kVA) and tied by a 20 kV seabed cable to Lanzarote. The island has two PV generation plants (5 kW and 30 kW), but PV capacity was recently increased, enhancing the attractiveness of a smart grid energy management system.

**Figure 9.** Microgrid in the Canary Islands.

One of the main differences between the La Graciosa project and the two prior deployments in the USA is that a system composed of two sky imagers was installed on the island, as can be seen in Figure 9. The reason was to give the forecasting system the ability to estimate cloud base height (CBH) using stereoscopic techniques, as in [46,47]. This adds functionality and provides extra data to incorporate in later steps of the image processing and forecasting pipeline. A recent paper comparing different instruments for measuring CBH concluded that a pair of inexpensive cameras was the most cost-effective alternative compared with other methods such as a ceilometer or LIDAR [48]. In fact, cloud base height is essential for estimating shadow positions if a ray-tracing approach is taken, and it can also be included as a feature if machine learning methods are preferred, since it correlates with the position of the clouds in the image and the recorded irradiance or PV production.

In this case, the device departs from the Internet of Things (IoT) concept, since two cameras are involved and some computation must be done either on one of the devices or (as was done in this project) on a dedicated server. There are advantages and drawbacks to either method, but we found the connection between the sky imagers and the server particularly easy, and we could exploit the higher computational capabilities of the dedicated server. The main requirement for working this way is robust internet access, which fortunately was granted by the owners of the buildings where the sky imagers were installed. Network speed can also influence operation, as it can act as a bottleneck in the data stream (due to the relatively large size of images compared with other types of files). In the future, machine learning algorithms could also run on the server, which is expected to perform better than computing directly on the device.

#### **2. Materials and Methods**

In the original configuration of the SkyImager, a security camera enclosure housed a Raspberry Pi single board computer with a programmable Pi camera. The enclosure contained a small circuit board with a heater and fan that runs off a supplied 24 V AC power supply, standard with many security cameras. The 12 V AC output from this board feeds an AC-DC converter that supplies 12 V DC to a TOBSON converter, which in turn supplies 5 V DC at 3 A for the Raspberry Pi.

#### *2.1. SkyImager Hardware*

Figure 10a displays the original SkyImager hardware. At NREL it was found necessary to add an extra SBC for increased computational power: the Odroid C1 by Hardkernel. Heat dissipation is an issue for SBCs in Texas summers; one C1 was destroyed by heat, and as a result a cooling fan was added to the design. The newer Odroid C2 has a heat sink that eliminates the overheating issue. Acquiring and fusing the 3-exposure images could be done with just the Raspberry Pi 2. The Pi 3 model is 50% faster than its predecessor; careful optimization of the workflow will allow acquisition, processing, and forecasting with just a Pi 3. This would reduce cost and greatly simplify network connections. Figure 10b shows this configuration with a single Pi 3, plastic case, cooling fan, camera, and WeatherBoard. Images could be pushed to the cloud for processing; however, the necessary bandwidth would be substantial. The sky imagers in La Graciosa are built upon a Raspberry Pi 3 Model B with no ancillary boards and a super-wide fish-eye lens (field of view over 180°). An inexpensive mini PV module was added to record irradiance at the camera locations there.


**Figure 10.** (**a**) Original configuration Raspberry Pi 2/Odroid C1, (**b**) Current hardware with a single Pi 3 Model B.

Figure 11 shows the equipment deployed on the MET tower at Ft. Sam: a Kipp & Zonen CMP11 pyranometer, the UTSA SkyImager, and a Vaisala WXT520 weather transmitter. Each of the commercial devices costs several thousand dollars; this cost prompted us to search for low-cost substitutes.

**Figure 11.** CMP11 pyranometer, UTSA SkyImager, Vaisala Weather Station.

Several inexpensive alternatives to a commercial pyranometer exist [49]; devices can be added to the GPIO pins on the Raspberry Pi. The Hardkernel WeatherBoard 2, shown in Figure 12a, not only takes temperature, humidity, and pressure readings (BME280 application-specific integrated circuit, ASIC) but also measures light in the visible, infrared, and ultraviolet bands (SI1132 ASIC). There is a Python interface for data retrieval. After calibration and conversion of lux to W/m², this provides irradiance measurements and limited weather data in real time (a conversion sketch follows Figure 12). Another ancillary device that is useful during initial deployment of the SkyImager is a GPS locator. At \$20, it looks like a small mouse for a desktop computer and plugs into a USB port. The Linux programs gpsmon and cgps can be installed on Raspbian and used to read the exact position using the latest GPS satellite data. These are just two of many environmental sensors that can be connected to the Raspberry Pi. Figure 12b shows the new PiNoIR camera, which captures infrared light as well as visible. This allows for increased contrast between low-level cumulus and high-level cirrus clouds composed of ice crystals. The SkyImager can also be used for additional tasks such as air quality monitoring; Figure 12c shows the \$25 MQ-131 ozone detection sensor, for example.

**Figure 12.** (**a**) Hardkernel's WeatherBoard, (**b**) PiNoIR camera, and (**c**) MQ 131 Ozone Sensor.
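A minimal sketch of the lux-to-irradiance conversion mentioned above, assuming a hypothetical `read_visible_lux()` stand-in for the WeatherBoard's Python data-retrieval call and a purely illustrative calibration factor:

```python
# Hypothetical stand-in for the WeatherBoard/SI1132 Python interface;
# replace with the vendor-supplied data-retrieval call.
def read_visible_lux():
    return 54000.0  # example mid-day reading, in lux

# Illustrative calibration factor (W/m^2 per lux); in practice it would be
# determined by regressing WeatherBoard readings against a reference pyranometer.
K_CAL = 0.0079

def lux_to_ghi(lux, k=K_CAL):
    """Convert a calibrated visible-light reading to approximate GHI."""
    return k * lux

lux = read_visible_lux()
print(f"{lux:.0f} lux -> approx. {lux_to_ghi(lux):.0f} W/m^2")
```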

#### *2.2. Image Processing Pipeline*

Several additional external inputs were required for our forecasting algorithms: distortion parameters for the fish-eye lens, zenith angle, True North, and, most importantly, the cloud base height (CBH). These inputs are used in the image processing pipeline (Figure 13) to output real-time GHI forecasts for the MGMS. A summary of the pipeline is included here; for details see [8]. (1) Distortion removal (due to the fish-eye lens); (2) cropping and masking; (3) calculation of the "red-to-blue ratio" (RBR); (4) application of a median filter to remove impulsive noise; (5) thresholding to determine cloud presence; (6) computation of the cloud cover percentage (clear/moderately cloudy/overcast); (7) projection of clouds to the CBH; (8) optical flow to move clouds forward in time; (9) ray tracing to locate cloud shadows; and finally (10) calculation of GHI using the shadow locations. Although physically correct, step (9), ray tracing, is mathematically an inverse problem and hence ill-posed: small errors in locating shadows can produce significant errors in the forecast irradiance. To address this issue, we investigated using artificial intelligence and neural networks to predict GHI values directly from forecast cloud locations. A sketch of steps (3)-(6) follows.
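A minimal sketch of steps (3)-(6) using OpenCV, assuming the fused all-sky image is available as a file; the RBR threshold of 0.85 and the filename are illustrative, not the project's tuned values:

```python
import cv2
import numpy as np

def cloud_mask(bgr_img, rbr_threshold=0.85):
    b, g, r = cv2.split(bgr_img.astype(np.float32) + 1.0)  # +1 avoids divide-by-zero
    rbr = r / b                                            # step (3): red-to-blue ratio
    rbr = cv2.medianBlur(rbr, 5)                           # step (4): remove impulsive noise
    return rbr > rbr_threshold                             # step (5): cloud pixels

img = cv2.imread("allsky_fused.jpg")       # fused HDR all-sky image (assumed path)
mask = cloud_mask(img)
cloud_cover = 100.0 * mask.mean()          # step (6): percent of cloudy pixels
print(f"Cloud cover: {cloud_cover:.1f}%")
```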

At NREL a new raw image was acquired every 15 seconds. The pipeline described above must be fine-tuned if the processing SBC is to achieve the necessary throughput. An SBC is much more limited than a desktop server as regards CPU speed and available memory/storage, which is provided by a 32 GB micro-SD card. The usual tradeoffs between keeping a large array in memory versus writing it to storage are still present even though there is no disk. Efficient programming constructs are required if the goal of low cost is to be achieved. The EMS may run on a military-grade RuggedCom server, but the SkyImager software is constrained to run on an ARM architecture. Steps in the pipeline that have little effect on overall forecast accuracy can be eliminated, and profiling/timing runs on the optical flow algorithms will reveal bottlenecks that can be addressed. This is important whether a ray-tracing approach or a machine learning strategy is employed.

The goal in intra-hour solar forecasting is real-time PV power prediction. Those forecasts result from a two-step process: predicting cumulus cloud locations 15 min in the future and using the projected cloud locations to forecast irradiance. Each step introduces errors. Work is ongoing for *Step 1 - Optical Flow*: compute error metrics of the 15-min-ahead image versus the actual image. *Step 2 - Machine Learning* takes the predicted image and computes GHI. This approach separates optical flow [50] from machine learning (ML) and allows *GHI to be predicted directly from the image itself*. The SkyImager can thus also be used as a pyranometer for measuring irradiance. A sketch of the optical-flow step follows.
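A minimal sketch of Step 1 using OpenCV's Farneback dense optical flow, with illustrative filenames; the warp simply advects the latest frame one step along the estimated flow field:

```python
import cv2
import numpy as np

prev = cv2.imread("frame_t0.jpg", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.jpg", cv2.IMREAD_GRAYSCALE)

# Dense per-pixel motion field between the two consecutive all-sky frames.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

# Advect clouds forward one time step: predicted(x, y) ~ curr(x - u, y - v).
h, w = curr.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x - flow[..., 0]).astype(np.float32)
map_y = (grid_y - flow[..., 1]).astype(np.float32)
predicted = cv2.remap(curr, map_x, map_y, cv2.INTER_LINEAR)
cv2.imwrite("frame_t2_predicted.jpg", predicted)
```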

The training datasets for the neural networks are region-specific, if not site-specific, and require many all-sky images taken on moderately cloudy days. The weights determined for Golden, CO, will have to be fine-tuned for deployment in the San Antonio area, for example. Training is computationally expensive, whereas inference or prediction is very fast and can be handled by the Pi. Research on massive deep learning networks requires graphics processing units (GPUs) for training and software such as Theano, Keras, or Tensorflow.

**Figure 13.** Image Processing Pipeline.

#### *2.3. Machine Learning for Irradiance Forecasting*

Machine Learning (ML) is now ubiquitous in all areas of engineering and data science. It has been used in many different ways to help solve the solar power forecasting problem, as described in [51,52]. Another area where ML is widely used is forecasting building load [53,54], where methods may be physics-based, statistics-based (Gaussian process, linear regression), or machine learning-based (artificial neural networks, support vector machines, deep learning). Classic references for deep learning include [55–57] and, for convolutional neural networks, [58].

AI software for data mining has evolved dramatically over the last few years. In data analytics Python is the premier programming language [59], which fully validated our decision to use it for the SkyImager project. Rapidminer (Version 8.1.001) [60] is a machine learning platform with a point-and-click interface. As shown in Figure 14, a data flow pipeline is established that permits the user to input a dataset, select attributes to analyze, determine target and predictor variable roles, partition the data into training, validation, and sometimes testing subsets, create a logical fork to apply different subprocess models such as Random Forests or Deep Learning, run the model(s), and assess error metrics and overall performance. It is a proprietary package, but a version with somewhat reduced functionality is available for educational use. For some models Rapidminer utilizes the H2O machine learning modules (Version 3.8.2.6) [61,62]; specifically, our Deep Learning (DL) model is found in H2O as the Python function H2ODeepLearningEstimator(), sketched after Figure 14. The open source Python package Scikit-Learn (Version 0.19.0) [63] allows a user to prototype and compare a variety of classification, clustering, and regression models. Neural networks and deep learning have seen the evolution of specialized software such as Keras, Theano, and Google's Tensorflow, which recently became open source. The computational demands of training networks on big data are extreme, and this has resulted in a hardware evolution from central processing units (CPUs) to graphics processing units (GPUs) to special-purpose tensor processing units (TPUs).

**Figure 14.** Point and click GUI for Rapidminer allows easy model development.
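A minimal sketch of the H2O deep learning workflow named above, assuming a CSV with the GHI label in the first column; the filename and layer sizes are illustrative:

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
df = h2o.import_file("skyimager_features.csv")      # assumed layout: GHI, then features
response, predictors = df.columns[0], df.columns[1:]

train, test = df.split_frame(ratios=[0.7], seed=42)  # 70/30 split
model = H2ODeepLearningEstimator(hidden=[195, 195, 195], epochs=500)
model.train(x=predictors, y=response, training_frame=train)

perf = model.model_performance(test)
print(perf.rmse(), perf.r2())
```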

Before predicting irradiance for cloudy days, consider the much simpler problem of forecasting on a clear day. The Haurwitz analytic model performs well on days with no clouds. Using physics, one can derive a closed-form functional relationship [64]: GHI<sub>clr</sub> = 1098 cos *θ<sub>z</sub>* exp(−0.057/cos *θ<sub>z</sub>*), where *θ<sub>z</sub>* is the solar zenith angle. Can neural networks learn this relationship, given a large enough dataset on which to train? The actual forecasting problem on a cloudy day is of course much harder. Information in each all-sky image is used to locate and track low-level cumulus clouds as they move between the sun and PV arrays. It is the difference between using ML to recognize machine-written (or even hand-written) digits versus recognizing and identifying faces in a crowd of people moving down the street. Work is ongoing to identify the best features to extract from the images, to efficiently solve the intra-hour solar forecasting problem, and to predict very short-term ramp events.
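A minimal sketch of the Haurwitz relationship as given above; the clip guards against the sun being at or below the horizon:

```python
import numpy as np

def haurwitz_ghi(zenith_deg):
    """Clear-sky GHI (W/m^2): 1098 * cos(theta_z) * exp(-0.057 / cos(theta_z))."""
    cz = np.cos(np.radians(np.asarray(zenith_deg, dtype=float)))
    cz = np.clip(cz, 1e-6, None)  # sun at/below horizon -> GHI ~ 0
    return 1098.0 * cz * np.exp(-0.057 / cz)

print(haurwitz_ghi(30.0))  # roughly 890 W/m^2 at a 30-degree zenith angle
```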

A critical component of any machine learning strategy is deciding which features or input variables are most strongly correlated with the labels or output variables. A second aspect involves finding a representation of the data that is compressed or sparse in some basis. This dimensionality reduction [65] can be achieved through principal component analysis (PCA) or by the simple process of discarding unimportant features in the inputs. In our studies, the label or target variables were scalars: GHI values measured atop the ESIF Building at NREL. Other choices are possible, such as the value GHI<sub>clr</sub> − GHI<sub>mea</sub>, the deviation from the clear-sky value.

Each input or example is a 3-channel RGB image from the Pi camera. In the current configuration, 3 images taken 5 seconds apart at low, medium, and high exposure times are fused using the Mertens algorithm into one raw image. As mentioned, this approach reduces over-exposure and washout in the circumsolar region. The 1024 × 768 JPEG forms the basic input measurement for both the optical flow and machine learning algorithms. Low-level cumulus clouds between the sun and the PV arrays have the greatest effect on the DNI, and hence on GHI. For that reason, and to satisfy the need for dimensionality reduction, the first preprocessing step is to locate the area in the image that surrounds the sun and extract a 128 × 128 subimage.
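A minimal sketch of the Mertens exposure-fusion step with OpenCV; the three input filenames are illustrative, standing in for the Pi camera's low/medium/high exposures:

```python
import cv2
import numpy as np

exposures = [cv2.imread(f) for f in ("low.jpg", "med.jpg", "high.jpg")]
merge = cv2.createMergeMertens()
fused = merge.process(exposures)                    # float image, roughly in [0, 1]
fused8 = np.clip(fused * 255, 0, 255).astype("uint8")
cv2.imwrite("fused_raw.jpg", fused8)                # the raw fused all-sky image
```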

Several approaches to locating the sun in the image are possible. Calculating the zenith angle from the SOLPOS program, finding True North, and then mapping physical to pixel coordinates would require extensive calibration. It was decided to use a simple, robust image processing approach that finds the maximum intensity in the image. On a clear day this always locates the sun, but occasionally, when the sun is totally obscured by broken clouds, the brightest point in the picture is actually sunlight reflected off a nearby cloud. This can be observed in a time-lapse video clip in which, for a few frames, the sun is not at the center of the sub-image. While this causes some transient errors, it never lasts long and does not happen when the sun is totally obscured by a uniform cloud deck without breaks. Figure 15 shows the low-exposure image used to locate and center the sun and the resulting raw fused image that will ultimately become the input to the neural networks. Lastly, the subimage is resized to a point (8 × 8) where the neural networks will train in a reasonable amount of time; a sketch of this localization and cropping step follows Figure 15. In supervised learning, the neural networks require labeled training examples: ordered pairs (*x*, *y*) where *x* is the input vector, in this case the extracted subimage *img*, and *y* is the measured irradiance in units of W/m² at the time the picture was taken. Since new images are fused every 15 seconds, GHI values were treated as constant on a 60 second interval for the purposes of assigning labels for the neural network images.

**Figure 15.** Sub-images of circumsolar region: (**a**) low exposure and (**b**) raw fused.
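A minimal sketch of the localization and cropping step, assuming the low-exposure and fused frames are available as files; the Gaussian blur is an assumption, added to suppress isolated hot pixels before taking the maximum:

```python
import cv2
import numpy as np

low = cv2.imread("low.jpg", cv2.IMREAD_GRAYSCALE)
fused = cv2.imread("fused_raw.jpg")

# Brightest point in the low-exposure image marks the sun (on clear days).
_, _, _, (sx, sy) = cv2.minMaxLoc(cv2.GaussianBlur(low, (11, 11), 0))

# Crop a 128x128 window around the sun, clamped to the image borders.
half = 64
h, w = low.shape
x0 = int(np.clip(sx - half, 0, w - 2 * half))
y0 = int(np.clip(sy - half, 0, h - 2 * half))
sub = fused[y0:y0 + 2 * half, x0:x0 + 2 * half]

# Downsample to 8x8 for the neural-network input, as described in the text.
sub8 = cv2.resize(sub, (8, 8), interpolation=cv2.INTER_AREA)
```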

Many different metrics are used in data analysis and ML for evaluating model performance. Table 1 lists common ones for solar forecasting. *A<sub>t</sub>* is the actual value, *F<sub>t</sub>* the forecast value, and μ the mean; the summation over *t* can be over all observations in the ML context or over the values of a time series. The metrics provide a posteriori error bounds upon which utilities can make economic decisions. MAPE is the relative L1 error, normalized by the number of observations and converted to percent. When the denominator of a fraction is close to zero, μ is used instead of *A<sub>t</sub>*, a common practice when predicting spot electricity prices. Some metrics, such as L1, are more robust (less sensitive to outliers) than the classical RMS error norms. A sketch of these metrics follows Table 1.


**Table 1.** Regression Metrics: A actual, F forecast, μ mean.
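A minimal sketch of the Table 1 metrics, including the fallback to μ when an actual value is near zero; normalizing RMSE by the mean is one common choice:

```python
import numpy as np

def metrics(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    mu = a.mean()
    mae = np.abs(a - f).mean()
    denom = np.where(np.abs(a) < 1e-6, mu, a)        # guard near-zero actuals
    mape = 100.0 * np.abs((a - f) / denom).mean()
    rmse = np.sqrt(((a - f) ** 2).mean())
    nrmse = rmse / mu                                # normalized by the mean
    r2 = 1.0 - ((a - f) ** 2).sum() / ((a - mu) ** 2).sum()
    return {"MAE": mae, "MAPE": mape, "nRMSE": nrmse, "R2": r2}

print(metrics([500, 650, 800], [480, 700, 790]))
```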

#### *2.4. Stereographic Method for CBH Estimation*

Obtaining CBH from paired sky images has been done by several authors in the past. For instance, the method proposed in [46] generated blocks of clouds that were used to compute a 3D reconstruction of the clouds; the authors then obtained the height of the clouds by geometric computation. On the other hand, the method in [66] used the cross correlation of non-projected saturation images to find all possible combinations that yield feasible heights, selecting the most correlated one (or the one with the minimum error).

In the GRACIOSA project, a purely geometric method based on the relative position of the sky imagers and the clouds was implemented. First, the algorithm looks for the same cloud feature in the images coming from the two sky imagers. This task can be extremely difficult due to the chaotic nature of clouds and slight changes in image properties such as luminosity; fortunately, the Scale-Invariant Feature Transform (SIFT) algorithm [67] can handle this situation. SIFT performs excellently at identifying features in a constantly changing shape (such as clouds), since it accounts for possible changes in scale and orientation. SIFT is applied to pairs of simultaneous images from both cameras (Figure 16), as sketched after the figure. Once the features from both images have been paired up, the best matches are selected to continue the calculations. Valid features are then transformed from image coordinates (pixels) to real space (azimuth and zenith). With real-space coordinates defined, the projection matrix of the lenses known, and the geographical position of each camera given, geometric computation is used to obtain the length of the vectors containing each feature, from which the height of the evaluated feature can be derived.

**Figure 16.** Feature matching by SIFT algorithm in a pair of images from both cameras.
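A minimal sketch of the SIFT matching step with OpenCV, using Lowe's ratio test to keep the best matches; filenames are illustrative:

```python
import cv2

img1 = cv2.imread("cam_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("cam_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher()
pairs = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in pairs if m.distance < 0.7 * n.distance]  # ratio test

# Matched pixel coordinates in each camera feed the geometric CBH computation.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
```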

#### **3. Results**

Almost a terabyte of image data was collected at NREL from 15 October 2015 through 16 April 2016. The measured DNI values were recorded using NREL's CHP1-L pyrheliometer in units of W/m². Days were grouped into three categories: (1) Clear Sky, consisting of predominantly clear days with little or no cloud cover; (2) Overcast, with large masses of clouds that obscure the sun for most of the day; and (3) Moderately Cloudy, characterized by large variation in irradiance values and multiple ramp events. For Clear Sky and Overcast conditions there is no forecasting to do: persistence cannot be beaten. Other cloud cover classifications are possible; one could use unsupervised ML clustering algorithms on the raw irradiance data to find other breakdowns. Standard METAR cloud classification separates clouds into low, middle, and high levels. All clouds affect measured irradiance [68]; we focus on cumulus clouds because they have the greatest effect on ramp events.

Prediction of intra-hour GHI can be partitioned into several distinct sub-tasks: (1) acquiring a time series of all-sky images, where every 15 s a new raw image is fused [69] from three different exposure times to allow for high dynamic range (HDR) [70]; (2) using the recent past images and optical flow to extrapolate cloud locations 15 min into the future (the algorithm can be enhanced, but it must not hamper production of real-time forecasts); and (3) using the predicted image and the weights from training the neural network to output a predicted GHI value to the microgrid management system.

In the original configuration, our software used optical flow to track the movement of cumulus clouds and then ray tracing to predict cloud shadow locations. A better methodology utilizes artificial intelligence (AI) to classify the reduction in GHI that will result when cumulus clouds are predicted in the circumsolar region. It is expensive to train neural networks, but this is done offline for a given location. Once optimal weights are determined, the calculation of a single GHI value is very fast, amounting to an inner product. If the optical flow calculation requires too much time, a second single board computer can be added to the hardware, as was done at NREL.

Details of our research on machine learning to predict solar irradiance are described in [9]; our intent here is to provide the reader with a concise summary of that research. A critical outcome was verification that the SkyImager with the Pi camera can measure GHI in real time. While lacking the accuracy of an expensive pyranometer, this approach uses the image sequence already acquired for the forecasting problem to simultaneously estimate GHI. This data could be incorporated with readings from the WeatherBoard to provide additional inputs to the MGMS using MQTT or DDS protocols, as sketched below. Variables that are usually considered deterministic should be treated as stochastic random variables. Convolutional neural networks [58], which preserve spatial information, offer the best performance for image datasets. Expensive offline training of the networks is normally done one time, but it is possible to do continuous learning, where the networks use feedback in the form of newly acquired data to refine the learned weights.
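A minimal sketch of publishing a GHI estimate over MQTT with the paho-mqtt client (1.x API); the broker address and topic are illustrative assumptions:

```python
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()                       # paho-mqtt 1.x constructor
client.connect("mgms-broker.local", 1883)    # assumed broker address

payload = json.dumps({"ts": time.time(), "ghi_wm2": 612.0, "source": "skyimager"})
client.publish("microgrid/skyimager/ghi", payload, qos=1)
client.disconnect()
```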

In the field of machine learning, there are standard ways of visualizing both the input dataset, consisting of vectors in a high-dimensional space, and the targets and predicted outputs. The UTSA SkyImager collected all images in this study on the ESIF building rooftop at NREL as part of the INTEGRATE project. During 147 days from October 2015 until May 2016, there were 14 days with no data collected (for technical reasons) and 27 days of partial data collection. The remaining 106 days with no missing data formed the inputs for training and testing the neural networks, yielding 156,495 observations (examples or rows). We used the standard 70%-30% split of the data into training and testing subsets: 109,547 examples for training and 46,948 for testing.

#### *3.1. Comparing 4 Different ML Models*

Each input example is uniquely associated with one of the SkyImager pictures taken every 15 s. The normalized pixel values for the red-green-blue (RGB) channels of an 8 × 8 resized subimage centered about the sun are flattened into one row vector. Note that other color spaces such as HSV or HSL could also be used. Resizing provided dimensionality reduction and reduced runtimes, but with substantial computer resources the 128 × 128 sub-images could be used for training. Average values of each channel were included as additional features for a total of 3 × 64 + 3 = 195 features. The first entry in each row is the measured GHI in W/m². To show how well random variables *X* and *Y* are correlated, one uses a scatter diagram. Figure 17 shows scatter diagrams of measured GHI versus predicted GHI for four ML models: Multi-Layer Perceptron (MLP), Random Forest (RF), Deep Learning (DL), and Gradient Boosted Trees (GBT). While all models perform well (tight clustering around *y* = *x*, perfect correlation), DL and GBT have fewer outliers and visually outperform the other two models.

Note that the MLP and RF models were run using the Scikit-Learn ML software package, while DL and GBT were run on the Rapidminer platform. This validated our results on different ML packages: results should depend on the algorithms, not the platform on which they are implemented. Currently, Scikit-Learn does not offer a deep learning model. While using *Rapidminer* is very convenient on a powerful desktop PC, our ultimate goal is to use the trained weights on a Raspberry Pi 3 computer for real-time forecasting; Scikit-Learn with its open source Python interface should prove valuable for that task. A minimal Scikit-Learn sketch follows.
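A minimal Scikit-Learn sketch of the MLP and RF side of the comparison, assuming the 195-feature CSV layout described above; the filename is illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

data = np.loadtxt("skyimager_features.csv", delimiter=",")
y, X = data[:, 0], data[:, 1:]                       # GHI label, then 195 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("MLP", MLPRegressor(hidden_layer_sizes=(195, 195), max_iter=500)),
                    ("RF", RandomForestRegressor(n_estimators=100))]:
    model.fit(X_tr, y_tr)
    print(f"{name}: R^2 = {model.score(X_te, y_te):.3f}")
```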

MAE, MAPE, nRMSE, and *R*<sup>2</sup> error metrics are given in Table 2, along with run times in minutes. The explained variance (*R*<sup>2</sup>) in the last column is very significant: both DL and GBT achieve values of 0.87, while MLP and RF are 0.1 less. The other error metrics closely track the *R*<sup>2</sup> values. DL buys its extra accuracy at the expense of much longer run times, whereas GBT reaches the same accuracy and is very fast: 10 min faster than RF.


**Table 2.** Evaluation of four machine learning models.

**Figure 17.** GHI actual vs GHI predicted on 24 October 2015 using different AI models.

A time series is another approach to visualizing the results: GHI values are plotted on the *y*-axis and time on the *x*-axis. Figure 18 compares measured and predicted GHI for one day, 3 October 2015. Although there are differences between the two curves, they track each other well. More illuminating is the time series for the entire group of testing days shown in Figure 19, where actual GHI is blue and forecast values are red; it is difficult to distinguish the two curves because they track each other so closely. For both figures the deep learning model was used for prediction. Observe in the center of Figure 19 a group of five consecutive clear-sky days that are easy to predict, as are completely overcast days.

**Figure 18.** Time series of actual versus predicted GHI for single day of data.

The two curves differ most on moderately cloudy days with air mass cumulus clouds present.

**Figure 19.** Time series of measured and predicted GHI for all testing days.

#### *3.2. Different Deep Learning Model Results*

Machine learning algorithms have many hyper-parameters that can be optimized to improve accuracy and reduce run times. Table 3 shows how changing the number of hidden layers in the DL model, the nodes per layer, and the number of epochs (complete passes through the training dataset) affects results. Model 1 has 2 hidden layers, each with 50 nodes; it requires ten epochs to train, with a run time of ~2 min and *R*<sup>2</sup> = 0.815. Model 2 also has two hidden layers (195, 195) and 10 epochs, but runs 3 times longer and only improves *R*<sup>2</sup> to 0.824. To achieve *R*<sup>2</sup> = 0.871, Model 3 (195, 195, 195) needs 500 epochs and ~45 min. A point of diminishing returns is reached: Model 4 (195, 195, 97, 195, 195) takes more than an hour to run on a desktop PC. Further improvements in accuracy would require larger input images or tuning of the DL parameters.


**Table 3.** Comparison of 4 Deep Learning Models.

#### *3.3. Cloudy Versus Clear Sky Days*

From each 1024 × 768 fused raw image, the algorithm extracts a 128 × 128 pixel subimage centered on the sun. Using the *transform.resize* function from Skimage, this is resized to 8 × 8 pixels to achieve dimensionality reduction, as sketched below. This idea is critical to successful machine learning: each example in the dataset is a vector in a high-dimensional space, and there are many examples. Principal Component Analysis and Linear Discriminant Analysis are other techniques for reduction, but our approach is simple and has proved effective.
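A minimal sketch of this reduction with *transform.resize*, using a random array as a stand-in for the cropped subimage; it also shows how the 3 × 64 + 3 = 195 feature row from Section 3.1 is assembled:

```python
import numpy as np
from skimage.transform import resize

sub = np.random.rand(128, 128, 3)                 # stand-in for the cropped subimage
sub8 = resize(sub, (8, 8), anti_aliasing=True)    # channels are preserved: (8, 8, 3)

# 192 flattened pixel values plus 3 per-channel means = 195 features per example.
row = np.concatenate([sub8.reshape(-1), sub8.mean(axis=(0, 1))])
assert row.size == 195
```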

In the following case study, ten days of SkyImager data acquired at NREL were used to assemble two datasets. Five moderately cloudy days, 16-20 October 2015, comprised the first dataset; five clear-sky days, 10-14 November 2015, made up the second. The neural networks were fed 32 × 32 pixel resized images, and 4 ML models from Scikit-Learn were compared: Generalized Linear Regression Model (GLM), Multi-Layer Perceptron (MLP), Random Forest Regressor (RFR), and Gradient Boosted Trees (GBT). Table 4 and Figure 20 show the results. GLM and GBT have much shorter runtimes in both cases, while MLP and RFR achieve higher *R*<sup>2</sup> values. Maximum accuracy is achieved with MLP, but at the cost of increased runtimes. The extreme accuracy of the *R*<sup>2</sup> values for clear-sky days (0.97, 1.0, 1.0, 0.99) indicates the networks are learning the analytic form of the Haurwitz clear-sky GHI model well.


**Table 4.** Results for two datasets of 32 × 32 resized sub-images.

**Figure 20.** Scatterplots of Actual GHI versus Predicted GHI for 4 ML Models on moderately cloudy days (top row) and clear sky days (bottom row).

Using still finer sampling for the resized sub-images should yield better results at the cost of larger data files and runtimes; at some point, however, statistics suggests diminishing returns. In addition to detailed descriptions of ML models and software, our article [9] presents another case study. It uses only one moderately cloudy day (17 October 2015) of observations and runs the four ML models with 8 × 8, 32 × 32, and 64 × 64 pixel sub-images. The size of the CSV input data file increases quickly (3 MB, 111 MB, and 442 MB), as do the runtimes (139 s, 444 s, and 1309 s). Accuracies improve, but not beyond a certain point.

#### *3.4. JBSA Microgrid Data*

The JBSA microgrid was built as a testbed for the CPS Energy Grid Modernization Laboratory. Management and control of a microgrid must address many factors, including cybersecurity, data acquisition, data management, real-time computation, storage, bandwidth, interoperability, and usability requirements. In addition to the data acquired by the UTSA equipment (SkyImager, Vaisala WXT520 weather station, and pyranometer), the available data include 36-hour-ahead hourly weather forecasts scraped from the web, the day-ahead load/PV forecasts, battery state-of-charge (SOC) readings, actual load for the base library building, and control data from the Siemens MGMS. The goal is to use all available data to refine site-specific solar irradiance forecasts for improved operation and control of the microgrid. Non-UTSA data from the JBSA microgrid were acquired from Itron MV-90 xi meters, a system used for the collection and management of interval data consisting of time-stamped readings taken every *x* minutes, where *x* can be 5, 15, 30, or 60. A large electric utility may acquire a billion interval readings in a single year and use them in a variety of ways, including billing (demand response, real-time pricing, curtailable rates), open market operations, and load/market research.

Analyses of the JBSA MV90 data, all of which were taken at 15 min intervals, demonstrated that much finer temporal resolution is required to capture the details of ramp events and provide accurate irradiance forecasts to the MGMS. While 1 min resolution was provided by the UTSA equipment, such as the pyranometer and Vaisala weather station, the cost of this equipment precluded widespread deployment in a distributed environment. Similar equipment at the UTSA solar testbed provided a wealth of 1 min data, but we envisioned a network of hundreds of low-cost SkyImagers spread across the city of San Antonio, and for this scenario low cost was an essential requirement. As previously discussed, there is a plethora of low-cost sensors that can be connected to the GPIO pins of an SBC, including the WeatherBoard 2 (WB2), air quality and ozone sensors, and even devices to measure dust. Costing tens of dollars, they provide a cost-effective means of environmental sensing that can easily be incorporated with the SkyImager. Figure 21 shows observations of irradiance, temperature, humidity, and pressure taken every minute on 26 November 2018 with the WB2 sensor mounted on a SkyImager.

**Figure 21.** Data from the WeatherBoard sensor on the UTSA SkyImager.

The mini photovoltaic module used at ULL is an off-the-shelf c-Si PV module with an open-circuit voltage of 6 V, a short-circuit current of 200 mA, and a maximum DC output of 1.1 W. The module was connected to a resistive load to dissipate the heat, and the data were registered by an INA219 DC current sensor, able to measure very small currents, attached to the Raspberry Pi 3 Model B. A sketch of such a logging loop follows.
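A minimal logging sketch for an INA219 on the Raspberry Pi, using the Adafruit CircuitPython driver as one possible interface; the project's actual acquisition code is not reproduced here:

```python
import time
import board
from adafruit_ina219 import INA219

i2c = board.I2C()
sensor = INA219(i2c)             # default I2C address 0x40

while True:
    v = sensor.bus_voltage       # volts across the resistive load
    i = sensor.current           # milliamps from the minimodule
    print(f"{time.time():.0f}, {v:.3f} V, {i:.1f} mA")
    time.sleep(1)                # 1 s sampling, matching the La Graciosa data
```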

#### *3.5. One Second Minimodule Data from La Graciosa*

The Universidad de La Laguna in the Canary Islands provided 1 s data from the minimodule at the La Graciosa microgrid operated by ENDESA. How does the spectral content of the voltage signal change when moving from 1 s to 5 s, 15 s, and 60 s sampling? Figure 22 shows the effect of sub-sampling on the time series; a small sketch of the experiment follows the figure. Some of the noise present in the data might be due to voltage fluctuations or seagulls (the location was the Fisherman's Guild building). Certainly the area of the minimodule is very small, so it behaves as a point measurement where small occluding objects can drastically affect the irradiance measurements. Still, the analysis strongly suggests that to resolve the frequency and voltage swings that occur during a sudden ramp event, PMU measurements may be a necessity.

**Figure 22.** Effect of subsampling on the 1-sec minimodule data from La Graciosa.
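A minimal sketch of the subsampling experiment, assuming the 1 s voltages are in a one-column CSV (filename illustrative); it reports how much step-to-step variability survives each sampling rate:

```python
import numpy as np

v = np.loadtxt("minimodule_1s.csv")            # 1 s minimodule voltage samples
for step in (1, 5, 15, 60):
    sub = v[::step]                            # decimate to 5/15/60 s sampling
    ramp = np.abs(np.diff(sub)).max()
    print(f"{step:>2} s sampling: max step-to-step change = {ramp:.3f} V")
```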

#### *3.6. CBH Estimations*

The results of the CBH estimations are presented in Figure 23. The left boxplot shows the statistical distribution of the heights obtained with the stereographic method, while the right boxplot shows the distribution reported by a weather station located in Arrecife, Lanzarote, which belongs to the network of the University of Wyoming.

**Figure 23.** Statistical distribution of the heights obtained by two-camera stereographic method (**left**) and from a weather station located in Arrecife, Lanzarote (**right**).

There are several reasons for the apparent mismatch in the data. First, the weather station is located 30 km south of the cameras, which undoubtedly has a significant effect given how the atmospheric conditions develop in the region (with the level of the thermal inversion rising steadily with distance from the Sahara Desert). Second, the weather station data come from a point measurement such as LIDAR, while the CBH estimations from the cameras cover a larger area of the sky (mainly the central part of the fish-eye image, since the distortion at the borders makes it almost impossible to compute the height there). Finally, the temporal resolution of the weather station data is as coarse as 1 h, while the stereographic approach estimates the CBH every minute. Probably the most important conclusion is that the estimations made by the two systems are coherent with prior knowledge of the atmospheric conditions: a stable thermal inversion ranging from 600 to 2000 m, depending on the season, prevents clouds from rising above a certain height.

#### **4. Discussion**

Our future research efforts will be directed toward several areas. Optical flow is critical to the success of intra-hour solar forecasting. IoT and cyber-security also form a critical component. Solving the AC optimal power flow equations with GHI forecasts from the SkyImager will be important for the energy storage and microgrid control problems. Using the Raspberry Pi additionally as a multiple-sensor platform will be investigated. Environmental studies of the effects of dust and bird feces on solar panels may well utilize SkyImager technology.

It was on the INTEGRATE project that a synergism developed between researchers and engineers at the national lab, universities, utilities, and private industry that continued after the project ended. Management styles are quite different in academia and private industry, with national labs somewhere in between. Software version control was critical, as was careful documentation of all work; this was very important both for the utility, where engineers would use the hardware/software, and for the university, where graduate students and faculty would move on to other projects. Our decision to use Python was an excellent one. Increasingly, documentation, tutorials, and example programs are being delivered in the form of IPython notebooks (.ipynb files), as with Google's Tensorflow. Even a package such as Open Computer Vision [71,72], which is written in C++ for efficiency, has Python bindings that allow easy access to routines for image fusion and optical flow.

#### *4.1. Lessons Learned at the 3 Deployment Locations*

#### 4.1.1. SkyImager at NREL

The NREL microgrid was located at a specialized research facility, but every attempt was made to simulate conditions at a utility. The communications network was well established, there was abundant state-of-the-art ancillary equipment such as pyranometers and an on-site weather station, and the process of deployment went relatively smoothly. Still, there were important lessons to be learned. The SkyImager was initially configured with a single Raspberry Pi 2, which proved insufficient for both acquiring images and processing them through the pipeline to produce irradiance forecasts. This problem was solved by adding an Odroid C1, but this made the design more complex and required bridging the two SBCs with a USB internet connection. The plethora of operating systems for both the SBCs and EMS servers provided still another challenge: there are differences in the way open source packages such as OpenCV and Mosquitto install and operate on Raspbian, Ubuntu 14, and Ubuntu 16. In some cases compiled binaries are available; in others, software must be compiled from source. From a solar forecasting perspective, NREL was where the best, most complete data were acquired: over six months of daily images and ground-truth pyranometer observations. It took researchers over a year to analyze the data, and the process is ongoing.

#### 4.1.2. SkyImager at San Antonio, TX, USA

Two critical applications of islanded microgrids are remote installations in developing countries and power systems for the military that must be entirely stand-alone. While this made a military base the perfect site for testing a microgrid EMS, it also meant that obtaining base access for UTSA researchers was an issue. For safety reasons, at least two people are required to lower the 10 m MET tower. Beyond the initial installation, access to the tower is required every month to inspect the instruments and clean the surface of the plastic dome that covers the camera enclosure. The cost of the pyranometer and weather station exceeded that of the SkyImager by a factor of 20, which prompted adding low-cost sensors to the SBC for measuring temperature, humidity, and light.

Occasional loss of power to the SBC as the battery went through its initial testing phase was a minor issue that required tinkering so the forecasting software would come back online immediately after a reboot. Initially the Mosquitto MQTT broker was chosen for the UTSA software, but because of compatibility issues with the MyRio software, HiveMQ proved to be a better choice. Direct internet access to our SBC from outside the corporate network was available only through a WebEx session that required close coordination between utility and university personnel. Lesson learned: when initially transitioning hardware/software from a research environment to a production one, it is imperative to have physical/cyber access to the equipment and network. Several Odroids were damaged by high temperatures in the enclosure boxes and had to be replaced. MV90 meter readings every 15 min are clearly insufficient for the intra-hour forecasting problem. We are currently working with a group in Austin to add inexpensive Phasor Measurement Units (PMU) to acquire detailed frequency and phase information in conjunction with weather, irradiance, and sky images. Cloud computing [73] and 5G networks offer unique opportunities to move most of the computations off the Raspberry Pi to the cloud.
