In this section, we present the evaluation results of the studied CNN models and our image processing method on the task of image regression for irradiance estimation and forecasting. We study the performance of the VGG11 [38] and ResNet-50 [39] models, for which similar comparative studies have been conducted on various applications [26,40]. We also study two CNN models that target edge devices with limited resources, namely, MobileNetV2 [11] and SqueezeNet [12]. With this selection of models, we cover a wide range of model sizes and numbers of operations, as shown in Table 1. The models and all training and evaluation processes were implemented in Python using PyTorch on a Linux workstation with an Intel(R) Core(TM) i7-9700K CPU @ 3.60 GHz and an NVIDIA GeForce RTX 3080 GPU. The edge FPGA deployment and evaluation of the models were performed on a Xilinx ZCU104 FPGA board using the Xilinx Vitis 2021.2, Vivado 2021.2, and Vitis AI 2.0 tools.
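For reference, the model sizes compared in Table 1 can be reproduced directly from the reference PyTorch implementations of the four architectures. The snippet below is a minimal sketch that assumes the standard torchvision variants (vgg11, resnet50, mobilenet_v2, squeezenet1_1); the exact model definitions used in this work (e.g., the single-output regression head) may differ slightly.

```python
from torchvision import models

# Minimal sketch: count the trainable parameters of the four studied architectures.
# The specific torchvision variants used here are assumptions for illustration.
architectures = {
    "VGG11": models.vgg11(),
    "ResNet-50": models.resnet50(),
    "MobileNetV2": models.mobilenet_v2(),
    "SqueezeNet": models.squeezenet1_1(),
}

for name, model in architectures.items():
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{name}: {n_params / 1e6:.1f} M parameters")
```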
6.1. Image Regression Models Training and Performance Evaluation
First, we trained the four models of the study to perform irradiance estimation from sky images. The training dataset included the years 2015 and 2016 (522,320 samples), while the test dataset consisted of the entire year 2014 (240,944 samples). During training, we used the Mean Square Error (MSE) loss function, which is suitable for image regression tasks. The hyperparameters were tuned based on the RMSE results of the trained models on the entire test dataset. Tuning was performed for the ResNet-50 model, and the hyperparameters were kept the same for all training procedures in this work in order to keep the experimental environment consistent. All models were trained for ten epochs, which was identified as sufficient because all models showed overfitting after only a few epochs. This can be attributed to the high number of training steps per epoch due to the large size of the training dataset. The batch size and the image size were limited to 16 and , respectively, due to GPU memory limitations and the required training time. The learning rate was initialized to and automatically tuned by a scheduler that reduced it by a factor of 0.75 whenever the validation loss plateaued for five epochs.
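A minimal sketch of this training configuration is shown below. The data loaders, the single-output regression head, the choice of optimizer (Adam), and the placeholder initial learning rate are assumptions for illustration, since those details are not restated here; the scheduler mirrors the described behaviour via PyTorch's ReduceLROnPlateau with factor 0.75 and patience 5.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch of the described training setup. train_loader / val_loader are placeholders
# for data loaders built on the 2015-2016 training subset (batch size 16).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50()
model.fc = nn.Linear(model.fc.in_features, 1)   # single-output regression head (assumed)
model = model.to(device)

criterion = nn.MSELoss()
init_lr = 1e-3                                   # placeholder; the actual initial value is set elsewhere
optimizer = torch.optim.Adam(model.parameters(), lr=init_lr)  # optimizer choice is an assumption
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.75, patience=5)

best_val_loss = float("inf")
for epoch in range(10):                          # ten training epochs, as described
    model.train()
    for images, irradiance in train_loader:
        images, irradiance = images.to(device), irradiance.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), irradiance)
        loss.backward()
        optimizer.step()

    # Validation after every epoch; the checkpoint with minimum validation loss is kept.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    scheduler.step(val_loss)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```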
The training dataset was split into a training subset and a validation subset for evaluating the models during training. Instead of a random split, we selected all the samples of one random day from each consecutive month of the training dataset and added them to the validation subset until we reached the desired 80–20% training–validation split ratio. In this way, the validation subset included representative samples of the entire dataset in terms of the yearly periodic phenomena in the sky images. Furthermore, we avoided placing samples in the validation subset that were only 1 min apart from almost identical samples in the training subset. We evaluated the performance of the models after every training epoch and selected the saved model with the minimum validation loss to avoid overfitting.
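The day-based split can be sketched as follows; the code assumes the dataset is indexed by a pandas DatetimeIndex at 1 min resolution, which is an assumption made for illustration rather than the exact data structures used in this work.

```python
import numpy as np
import pandas as pd

def day_based_split(df: pd.DataFrame, val_fraction: float = 0.2, seed: int = 0):
    """Sketch of the day-based train/validation split: whole randomly chosen
    days (one per month per pass) are moved to the validation subset until
    roughly val_fraction of the samples is reached. Assumes a DatetimeIndex."""
    rng = np.random.default_rng(seed)
    days = df.index.normalize()                  # calendar day of each sample
    months = days.to_period("M")
    target = int(len(df) * val_fraction)

    val_days, n_val = [], 0
    while n_val < target:
        added = False
        for month in months.unique():
            if n_val >= target:
                break
            candidates = days[(months == month) & ~days.isin(val_days)].unique()
            if len(candidates) == 0:
                continue
            day = candidates[rng.integers(len(candidates))]
            val_days.append(day)
            n_val += int((days == day).sum())
            added = True
        if not added:                            # no more whole days available
            break

    val_mask = days.isin(val_days)
    return df[~val_mask], df[val_mask]
```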
In Table 2, we present the performance evaluation results of the four models on the original dataset as well as on the dataset with the images enhanced with the SunMask channel. The ResNet-50 model yields the lowest error metrics, with an RMSE of  W/m². It can also be observed that our SunMask generation method consistently improves the performance of all models. The MobileNetV2 model benefits the most from the SunMask generation method, with its RMSE decreasing from  W/m² to  W/m², an improvement of .
In order to quantify the effect of the SunMask generation method on different distributions of sky image phenomena, we evaluated the MobileNetV2 model, which shows the largest improvement with the SunMask, on individual months. The RMSE and nRMSE results are presented in Figure 9. We note here that we focus on the nRMSE metric in order to directly compare the performance of the models across different months, since the distribution of sky image features and irradiance values differs from month to month. Thus, a month with a smaller RMSE can correspond to a larger nRMSE, and vice versa; this is apparent in months such as February, May, and December. The nRMSE plot indicates that, in general, the model tends to perform better in months with less complex cloud effects obstructing the sun in the image, such as June and August. Furthermore, the performance improvement provided by the SunMask is larger in months with complex sky image phenomena. In particular, MobileNetV2 shows the largest nRMSE improvement with the SunMask in February (4.1%), May (3.9%), March (3.7%), April (3.7%), and January (3.6%). The current work focuses on the dataset generated at the location of Folsom, CA, USA ( , ). However, it would be worth exploring the performance of image regression CNNs for irradiance estimation and of the proposed SunMask generation method in other geographic areas. These could include regions with significantly different sky image feature distributions, such as higher-latitude regions where the sun stays close to the horizon. Of course, this would require an extensive, publicly available, high-quality dataset for the particular area. The availability of such extensive, annotated, and public datasets is currently an open issue in the field of Deep Learning-based irradiance forecasting and in Machine Learning in general.
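The per-month metrics of Figure 9 can be computed as sketched below; normalizing the RMSE by the monthly mean measured irradiance is an assumption made here for illustration, as other nRMSE normalizations (e.g., by the irradiance range) are also common.

```python
import numpy as np
import pandas as pd

def monthly_rmse_nrmse(timestamps, y_true, y_pred):
    """Sketch: per-month RMSE and nRMSE of irradiance predictions.
    nRMSE is taken here as the RMSE normalized by the monthly mean
    measured irradiance (assumption for illustration)."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred},
                      index=pd.DatetimeIndex(timestamps))
    results = {}
    for month, group in df.groupby(df.index.month):
        rmse = np.sqrt(np.mean((group["y_pred"] - group["y_true"]) ** 2))
        nrmse = rmse / group["y_true"].mean()
        results[month] = {"RMSE [W/m^2]": rmse, "nRMSE": nrmse}
    return pd.DataFrame(results).T
```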
Following the results for irradiance estimation, we selected the ResNet-50 model, which had the best performance, to perform standalone irradiance forecasting. For this purpose, we shifted the irradiance values of the dataset backwards in time by the forecast horizon, as explained in Section 3. We formulated the dataset in this way for three different forecast horizons: 5, 10, and 15 min ahead. We trained the ResNet-50 model similarly to before, both with the plain RGB input and with the 12-channel stacked SunMask images described in Section 4. The results of the ResNet-50 model on image regression for irradiance forecasting, together with those of the persistence model, are shown in Table 3. It can be observed that the ResNet-50 model achieves a forecast skill that increases with the forecast horizon, while having slightly worse forecasting performance than the persistence model for the 5 min horizon. When ResNet-50 is trained to operate on the proposed stacked SunMask images, it achieves consistently improved forecast skill for all forecast horizons. Using this method, the ResNet-50 model surpasses the persistence model even in the very short-term forecast horizon of 5 min, adding  to its forecast skill.
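The re-labelling of the dataset for forecasting can be sketched as follows, assuming samples at 1 min resolution stored in a pandas DataFrame with an 'irradiance' column; the exact data handling in this work may differ.

```python
import pandas as pd

def make_forecast_targets(df: pd.DataFrame, horizon_min: int) -> pd.DataFrame:
    """Sketch: pair each sky-image sample with the irradiance measured
    horizon_min minutes later, dropping samples without a future value.
    Assumes df has a 1 min DatetimeIndex and an 'irradiance' column."""
    shifted = df["irradiance"].shift(freq=pd.Timedelta(minutes=-horizon_min))
    out = df.copy()
    out["target"] = shifted.reindex(df.index)
    return out.dropna(subset=["target"])

# Forecast datasets for the three studied horizons (5, 10, and 15 min ahead):
# datasets = {h: make_forecast_targets(df, h) for h in (5, 10, 15)}
```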
6.2. Edge FPGA Porting and Acceleration Results
Following the performance evaluation of the CNN models, including those trained with our proposed image processing method, in this subsection we present the results of our implementation flow on the edge-oriented Xilinx MPSoC FPGA. The first step towards implementing the CNN models of the IRM on the FPGA is quantization. The Python quantization application that we developed utilizes several different quantization functionalities of the Xilinx Vitis AI 2.0 Quantizer. The results of the quantization process for the four models of our study are presented in Table 4. First, we performed Post-Training Quantization (PTQ) using a batch of unlabeled images. It can be observed that PTQ has a very significant effect of  W/m² increased RMSE compared to the performance of the original floating-point VGG11 in Table 2. After performing an additional Fast Fine-tuning (FF) step using 1000 unlabeled images, the effect was reduced to  W/m². For the ResNet-50 model, PTQ results in a slight RMSE increase of  W/m², which is reduced to  W/m² with FF. MobileNetV2 suffers a performance loss that cannot be corrected even with FF, resulting in a loss of  W/m². Finally, the quantized SqueezeNet model has a large performance degradation from its original floating-point model, with an RMSE increase of  W/m²; with FF, the increase becomes  W/m². The SqueezeNet model architecture allows us to perform an additional Quantization-Aware Training (QAT) step instead of FF. We trained the SqueezeNet model for one additional epoch using the QAT capabilities of the Vitis AI Quantizer. After the QAT step, the performance of SqueezeNet is restored to a level similar to that of the original floating-point model, with an RMSE increase of only  W/m².
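The quantization flow can be sketched with the Vitis AI PyTorch quantizer (pytorch_nndct) as shown below. This is a simplified sketch following the documented Vitis AI 2.0 workflow rather than the exact application developed in this work; model, calib_loader, test_loader, and evaluate_rmse() are placeholders, and API details may vary between Vitis AI versions.

```python
import torch
from pytorch_nndct.apis import torch_quantizer  # Vitis AI PyTorch quantizer

# Sketch of the PTQ / Fast Fine-tuning flow; model, calib_loader, test_loader and
# evaluate_rmse() are placeholders for objects defined elsewhere in the application.
dummy_input = torch.randn(1, 3, 224, 224)        # placeholder input shape

# --- Calibration (Post-Training Quantization) ---
quantizer = torch_quantizer("calib", model, (dummy_input,))
quant_model = quantizer.quant_model
quantizer.fast_finetune(evaluate_rmse, (quant_model, calib_loader))  # optional FF step
evaluate_rmse(quant_model, calib_loader)         # forward passes collect calibration statistics
quantizer.export_quant_config()

# --- Evaluation of the quantized model and export for the DPU ---
quantizer = torch_quantizer("test", model, (dummy_input,))
quant_model = quantizer.quant_model
quantizer.load_ft_param()                        # reload the fast fine-tuned parameters
rmse = evaluate_rmse(quant_model, test_loader)
quantizer.export_xmodel(deploy_check=False)      # xmodel later compiled for the DPU target
```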
After quantization of the CNN models, we implemented the FPGA processing architecture described in Section 5 on the Xilinx ZCU104 FPGA board using the Xilinx Vitis and Vivado 2021.2 tools. The dual-core DPU IP was operated at ; the entire design consumed 15.585 W, as measured by the Vivado power analysis tool. In Table 5, we present the resource utilization of the PL of the FPGA. We observe that the dual-core DPU IP consumes a very significant amount of resources, especially the Digital Signal Processing (DSP) slices responsible for performing most of the computations. It is worth noting that for applications where processing throughput is not critical, the developer can configure the DPU with a single processing core, reducing utilization by about half for most resource types.
In order to showcase the real-time capabilities of the edge FPGA on the image regression task, we benchmarked the four CNNs on the DPU IP core. We evaluated their throughput in terms of Frames per Second (FPS) on both cores of the DPU using multithreading; the results are presented in Figure 10. The results show that VGG11, which has the highest number of parameters and operations, has the lowest throughput of 46 FPS on a single core of the DPU. The lightweight MobileNetV2 and SqueezeNet models showcase significantly higher throughput than the VGG11 and ResNet-50 models. In particular, SqueezeNet has the highest throughput of 1028 FPS, even though its original floating-point model requires more operations, with a lower number of parameters, than MobileNetV2. When utilizing both cores of the DPU, all the models achieve a little less than double the single-core throughput. For the ResNet-50, MobileNetV2, and SqueezeNet models, the achieved throughput rates can be considered to satisfy real-time requirements, e.g., for a sky imager providing video at 60 FPS, leaving headroom for additional algorithms to complete more PV-related processing.
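The throughput benchmark on the DPU can be sketched with the Vitis AI Runtime (VART) Python API as shown below. This is a simplified single-runner sketch (the multithreaded benchmark runs one such loop per DPU core); the xmodel path, the dummy input data, and the omitted input/output scaling are placeholders rather than the exact benchmarking code used in this work.

```python
import time
import numpy as np
import vart
import xir

def benchmark_dpu(xmodel_path: str, n_frames: int = 1000) -> float:
    """Sketch: measure FPS of a compiled xmodel on one DPU core using VART."""
    graph = xir.Graph.deserialize(xmodel_path)
    dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                     if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]
    runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")

    in_tensor = runner.get_input_tensors()[0]
    out_tensor = runner.get_output_tensors()[0]
    # Dummy frames; real code would apply the image preprocessing and fixed-point scaling.
    in_data = np.zeros(tuple(in_tensor.dims), dtype=np.int8)
    out_data = np.zeros(tuple(out_tensor.dims), dtype=np.int8)

    start = time.time()
    for _ in range(n_frames):
        job_id = runner.execute_async([in_data], [out_data])
        runner.wait(job_id)
    return n_frames / (time.time() - start)

# Dual-core measurement: launch two threads, each with its own runner, e.g. via
# threading.Thread(target=benchmark_dpu, args=("model.xmodel",)) per core.
```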