Energy Performance Analysis and Output Prediction Pipeline for East-West Solar Microgrids

Nguyen, Khanh; Koch, Kevin; Chandna, Swati; Vu, Binh

doi:10.3390/j7040025

Applied Data Science and Analytics, SRH University Heidelberg, 69123 Heidelberg, Germany

* Author to whom correspondence should be addressed.

J 2024, 7(4), 421-438; https://doi.org/10.3390/j7040025

Submission received: 13 August 2024 / Revised: 14 October 2024 / Accepted: 18 October 2024 / Published: 21 October 2024

(This article belongs to the Section Computer Science & Mathematics)

Abstract:

Local energy networks, known as microgrids, can operate independently or in conjunction with the main grid, offering numerous benefits such as enhanced reliability, sustainability, and efficiency. This study focuses on analyzing the factors that influence energy performance in East-West microgrids, which have the unique advantage of capturing solar radiation from both directions, maximizing energy production throughout the day. A predictive pipeline was also developed to assess the performance of various machine learning models in forecasting energy output. Key input data for the models included solar radiation levels, photovoltaic (DC) energy, and the losses incurred during the conversion from DC to AC energy. One of the study’s significant findings was that the east side of the microgrid received higher radiation and experienced fewer losses compared to the west side, illustrating the importance of orientation for efficiency. Another noteworthy result was the predicted total energy supplied to the grid, valued at €15,423. This demonstrates that the optimized energy generation not only meets grid demand but also generates economic value by enabling the sale of excess energy back to the grid. The machine learning models—Random Forest, Extreme Gradient Boosting, and Recurrent Neural Networks—showed superior performance in energy prediction, with mean squared errors of 0.000318, 0.000104, and 0.000081, respectively. The research concludes that East-West microgrids have substantial potential to generate significant energy and economic benefits. The developed energy prediction pipeline can serve as a useful tool for optimizing microgrid operations and improving their integration with the main grid.

Keywords:

East-West microgrids; energy prediction pipeline; energy output analysis; solar energy; grid-connected mode; renewable energy; machine learning models; global radiation; energy forecasting

1. Introduction

In the context of growing energy demands and an increasing focus on renewable energy sources, microgrids are playing an increasingly crucial role in modern energy systems. These local energy networks are designed to operate independently or in conjunction with the main grid, covering a wide range of energy supplies, including renewable sources like solar and wind, conventional generators, and energy storage systems [1]. The goal of a microgrid is to provide dependable and environmentally responsible energy to communities, industrial parks, and other locations while optimizing performance and reducing energy consumption [2].

Microgrids represent a fundamental transformation in local energy generation, distribution, and consumption. They provide a flexible, efficient, and environmentally friendly alternative to traditional power systems, particularly in areas prone to outages or lacking robust infrastructure [3]. By integrating renewable energy sources and employing advanced technologies, microgrids enhance energy reliability, sustainability, and economic viability, thereby paving the way for a cleaner energy future [4,5].

Over the past decade, significant research has been conducted in the field of microgrids, with a particular focus on areas such as microgrid planning, predictive maintenance, real-time energy management, and energy output prediction. The primary objective behind predicting the energy output of microgrids is to enhance system performance and stability. This predictive capability plays a pivotal role in several aspects, including the efficient management of energy from both renewable and conventional sources, optimization of storage and generator control, forecasting of renewable resources, grid stability enhancement, economic optimization, and increased utilization of renewable energy resources. Simultaneously, it contributes to risk reduction and heightened availability within the microgrid system [6].

2. Related Work

In the context of machine learning and energy systems, predicting energy output involves using historical or real-time data with machine learning models to forecast future energy generation. Various machine-learning techniques have been employed for this purpose. A previous study compared multiple regression, support vector machine regression, and Gaussian regression models, concluding that multiple linear regression performed the best. However, the study focused heavily on model selection, overlooking the potential effects of external environmental factors that could influence energy predictions in real-world scenarios. Additionally, the study highlighted the importance of optimal location and parameter selection for model development but did not explore these factors in depth [7].

Another review examined photovoltaic (PV) power prediction in a semi-arid climate, introducing a stacked machine learning model and comparing it with Random Forest, Extreme Gradient Boosting, and linear regression. While the stacked model demonstrated superior accuracy, the study’s limitation lay in its narrow focus on semi-arid climates, which limited its applicability to other regions with different climatic conditions. Furthermore, while feature engineering significantly enhanced prediction accuracy, the study did not explore potential improvements in model robustness under varying conditions [8].

Rosero et al. investigated the use of cloud computing and machine learning for energy management in a microgrid cluster, showcasing the potential of these technologies to optimize energy distribution and efficiency. A notable disadvantage, however, was the study’s limited discussion of the scalability of such systems in larger or more complex grids, as well as the lack of real-world validation beyond simulations [9].

In another study, a multilayer feedforward neural network (MLFFNN) trained with the Levenberg–Marquardt algorithm was applied to data from a real PV power plant. Although the model achieved a low mean squared error (MSE) of 0.0053, indicating its high accuracy, the study did not address the challenges of computational complexity and the potential need for real-time application in larger, more diverse datasets [10].

Arafat et al. provided a comprehensive review of machine learning applications in microgrid predictive maintenance. While their work offered valuable insights into integrating sensor data with operational drivers, it lacked a detailed discussion on the real-time deployment of these frameworks in dynamic environments. Furthermore, the study acknowledged several challenges, including data quality and integration issues, but did not propose concrete solutions for overcoming these hurdles [11].

Despite the advancements in these studies, many have focused on one-sided microgrids, with relatively few studies implementing the East-West model. This model has the potential to leverage global radiation from both sides, offering advantages for energy generation throughout the day. One-sided solar farms have a uniform orientation among all solar panels, whereas two-sided solar farms have two distinct orientations, such as east- and west-facing panels. Figure 1 illustrates an example of a microgrid in a simulated environment, as developed by SRH Berlin University of Applied Sciences.

This study contributes significantly to the understanding and development of east-west microgrids by analyzing the factors that influence their performance and quantifying the energy that can be generated and fed into the main grid. Additionally, it assesses the potential revenue streams from selling this energy, providing essential insights for energy planners and operators. A key hypothesis tested in this research is that there is no significant difference in energy production and losses between the east and west sides of the microgrid, which aids in optimizing panel orientation. Furthermore, this study introduces an innovative energy output prediction pipeline that integrates critical factors such as solar radiation, photovoltaic energy (DC), and conversion losses to enhance the accuracy of energy output predictions. A comparative analysis of microgrid operation modes—grid-tied (connected to the local utility) versus islanded (operating independently)—is also conducted to determine which mode offers greater energy optimization benefits. Through these contributions, the study addresses existing research gaps and provides actionable insights for the effective implementation and management of east-west microgrids in practical applications.

3. Materials and Methods

3.1. Data Understanding and Preprocessing

To ensure the greatest possible stability, three different datasets were used to train the models. The initial dataset comprises data from a real-time-designed microgrid model with east- and west-oriented photovoltaic (PV) modules and maximum power point (MPP) inverters. The system was designed to be grid-connected and does not include an energy storage system. The model was developed by SRH Berlin University of Applied Sciences using an online PVsol tool [12]. A total of four inverters, each with two maximum power points, were evaluated across 270 columns. An inverter is a system that converts direct current (DC) to alternating current (AC) energy through transformer switching and control circuits. Two inverters were oriented east, while two were oriented west. For each inverter-MPP combination, total irradiation, various losses, and grid feed-in were measured. The Global PV Energy dataset provides a comprehensive overview of the total amount of solar energy reaching a specific area of the Earth’s surface over a defined time period, comprising hourly values for the entire year.

The second dataset is structured similarly to the first; however, it contains ten inverters instead of four. The final dataset comprises weather forecasts with a 15-min resolution, including solar radiation and air temperature data, among other variables. These forecasts were created by the Laboratory of Climatology at the University of Liège using the regional climate model MAR. The data were collected from 10 May 2019, to 18 June 2019 [13].

The primary dataset was the solar farm dataset, which included a total of four inverters. To ensure an accurate analysis between east and west inverters and to construct a reliable model for grid feed-in prediction, several preprocessing steps were essential. One of the initial steps in the process was to split the datasets for each inverter/MPP combination into different rows. This allowed for more efficient analysis between the east and west orientations. Another step involved removing all columns with null values and irrelevant columns from the dataset. The irrelevant columns included Deviation from the Standard Spectrum, Ground Reflection (Albedo), and Orientation and Inclination of the Module Surface. The resulting dataset was then reduced to 19 columns (see Table 1).

The final step was to ensure that the time column was formatted correctly. The format was adjusted as follows: “yyyy-mm-dd HH:MM”.

3.2. Feature Engineering

To address the second use case—predicting grid feed-in energy—it is essential to identify the features required for constructing the model. A feature is a specific data characteristic used to make predictions. Features are crucial inputs for machine learning models, enabling them to identify patterns in the data [14]. In addition to the provided columns containing open circuit voltage, MPP voltage, and global PV radiation, 21 additional features were incorporated.

From the provided time column, several temporal features were created, including the day of the week and the week of the year. Furthermore, twelve additional features were added using lagged data, which are also referred to as delayed features in time-series analysis [15]. These features carry the values recorded in previous hours into the current row. Additionally, the area and inverter label columns were encoded, resulting in two new feature columns.
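
The temporal and lagged features described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the column names `time` and `feed_in`, and the choice of lags are assumptions, since the text does not state which columns were lagged or by how many hours.

```python
from datetime import datetime

def add_time_and_lag_features(rows, target="feed_in", lags=(1, 2)):
    """Add calendar and lag features to hourly rows sorted by time.

    rows: list of dicts with a 'time' string ('yyyy-mm-dd HH:MM') and a
    numeric target column. Column names and lags are hypothetical.
    """
    out = []
    for i, row in enumerate(rows):
        ts = datetime.strptime(row["time"], "%Y-%m-%d %H:%M")
        feat = dict(row)
        feat["day_of_week"] = ts.weekday()          # Monday = 0
        feat["week_of_year"] = ts.isocalendar()[1]  # ISO week number
        for k in lags:
            # Value of the target k rows (hours) earlier; None if unavailable.
            feat[f"{target}_lag_{k}"] = rows[i - k][target] if i >= k else None
        out.append(feat)
    return out

# Toy hourly data (invented values, for illustration only).
sample = [
    {"time": "2024-01-01 00:00", "feed_in": 0.0},
    {"time": "2024-01-01 01:00", "feed_in": 0.5},
    {"time": "2024-01-01 02:00", "feed_in": 1.2},
]
features = add_time_and_lag_features(sample)
```

Rows at the start of the series have no history, so their lag features are left empty here; in practice such rows are usually dropped or imputed before training.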

The following 24 features (see Table 2) were utilized in the creation of the models.

To guarantee the integrity of the outcome, an outlier detection process was implemented. Potential outliers were identified using the interquartile range (IQR), calculated by subtracting the first quartile (Q1) from the third quartile (Q3). Data points falling below Q1 minus 1.5 times the IQR or above Q3 plus 1.5 times the IQR were considered potential outliers.
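
The IQR rule can be sketched in a few lines. The quartile convention is an assumption: the text does not say how Q1 and Q3 were computed, so this sketch uses the "inclusive" method of Python's `statistics.quantiles`, and the function names are illustrative.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the fences."""
    low, high = iqr_bounds(values, k)
    return [v < low or v > high for v in values]

# Toy data (invented): nine regular readings and one extreme value.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
bounds = iqr_bounds(readings)
flags = flag_outliers(readings)
```

Flagged points are only candidates: in solar data, a legitimate midday peak can exceed the upper fence, so flagged values are normally inspected before removal.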

3.3. Use Case 1: Data Analysis in Microgrids

To provide an initial overview of the columns and their interdependencies, a correlation matrix was constructed. The degree of dependency between two variables is measured on a scale from −1 to 1. A value of −1 indicates a strong negative dependency, while a value of 1 indicates a strong positive dependency. If the value remains around 0, no dependency can be assumed [14].
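
A correlation matrix of this kind can be sketched as below. The sketch assumes the Pearson coefficient, which the text does not name explicitly; the column names and values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """columns: dict name -> list of values. Returns a nested dict of coefficients."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy columns (invented): DC energy rises with radiation, while a loss
# column stored as negative numbers falls with it.
data = {
    "global_radiation": [0.0, 1.0, 2.0, 3.0, 4.0],
    "pv_energy_dc": [0.0, 0.9, 1.8, 2.7, 3.6],
    "loss": [0.0, -0.1, -0.2, -0.3, -0.4],
}
matrix = correlation_matrix(data)
```

In this toy case the radiation/DC-energy entry is close to 1 and the radiation/loss entry is close to −1, matching the interpretation of the scale given above.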

The bar chart was the primary tool used for comparing east- and west-oriented solar panels [16]. Bar charts are effective for comparing categories such as inverters or MPPs. Additionally, scatter plots were employed to investigate values such as global PV radiation or PV energy (DC) for potential dependencies in greater detail [17].

Once the data had been preprocessed, a two-way ANOVA test was conducted to ascertain which factors were statistically significant in relation to the east and west sides (see Figure 2). ANOVA is an effective method for comparing means between groups and assessing the statistical significance of differences among factors, helping to identify distinctions between the two sides. Furthermore, ANOVA can examine the interactions between factors, identifying dependencies that affect the analysis. In this research project, the factors analyzed included global radiation, open circuit voltage, MPP voltage, and losses in the process [18].
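
For reference, the standard textbook form of a two-way ANOVA model with interaction is given below. Mapping the first factor to the side of the solar farm and the second to a further factor such as the loss type is an assumption made here for illustration; the exact factor definitions used in the study are those reported in its tables.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)

F_{\text{effect}} = \frac{MS_{\text{effect}}}{MS_{\text{error}}}
```

Here $\mu$ is the overall mean, $\alpha_i$ the effect of level $i$ of the first factor (for example, east or west), $\beta_j$ the effect of level $j$ of the second factor, $(\alpha\beta)_{ij}$ their interaction, and $\varepsilon_{ijk}$ the residual. Each effect is judged significant when the p-value of its F statistic falls below the chosen threshold.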

3.4. Use Case 2: Energy Output Prediction

In this section, a pipeline was established for the purpose of predicting energy output. Following testing of the pipeline on our original dataset, it was applied to two additional microgrid datasets. One dataset was collected in Berlin, Germany, using a simulation environment. This dataset comprised information from ten inverters, distributed between the east and west sides. The second dataset was provided by Liège University in Belgium and included a weather forecast produced by the university’s Laboratory of Climatology, based on the MAR regional climate model [19].

To identify the most effective models for energy prediction, five machine-learning methods were executed and compared. In each model, the grid search technique was employed to find the optimal hyperparameter combination. Subsequently, the metrics used for comparison were mean square error (MSE) and processing time. The stability of the models was also assessed during testing on the three datasets to ensure optimal performance. The initial model employed was linear regression, which is a fundamental method for understanding the relationship between the target and predictive variables. It is a rapid processing method and highly effective for linear relationships. However, it is not well-suited for non-linear data and is susceptible to noise.

In the field of machine learning, Support Vector Regression (SVR) is a widely used method that employs support vectors to predict the value of a dependent variable. The objective is to create an optimal boundary (support vector line) around the data and apply a specialized loss function to enable the model to learn from the dataset. This model’s key strengths are its efficiency with complex data structures and its capacity to control overfitting. However, this approach requires careful parameter fine-tuning, which may not be suitable for large datasets. Two crucial parameters in this model are the cost value and the kernel, as outlined in [20]. The cost value determines the balance between achieving a low training error and a low testing error, while the kernel calculates the degree of similarity between data points in a high-dimensional feature space. The parameters utilized in this research were cost values (0.1, 1, 10) and kernels (linear, polynomial, radial basis function).

The next models introduced were tree-based machine learning methods. One of the most popular methods is Random Forest Regression, which combines many decision trees to create more accurate predictions. It helps reduce overfitting and enhances the accuracy of the model, particularly when there are multiple independent variables. However, the model is somewhat complex and difficult to explain in detail [21]. This study tested three different numbers of trees: 50, 100, and 200. Additionally, four different maximum depths for each decision tree within the Random Forest were implemented: None, 10, 20, and 30.

Extreme Gradient Boosting (XGBoost) is a robust boosting algorithm that is highly effective in machine learning, both for prediction and classification. Similar to Random Forest, it employs multiple decision trees to improve accuracy and establish more robust predictions by learning from the shortcomings of previous trees. It functions effectively even on datasets with a high level of noise but requires intricate parameter tuning, which can result in overfitting if not managed with precision. This model requires multiple parameters to be tuned and focuses on the tree-based boosting algorithm within the model [22]. The number of boosting rounds or trees to be constructed in the ensemble was set at 100, 500, and 1000. Furthermore, the maximum depth of each decision tree in the ensemble was set to 3, 4, and 5, while the step size at each iteration towards minimizing the loss function was set to 0.01, 0.1, and 0.2.

The final model utilized in this project was the Recurrent Neural Network (RNN), which is designed to process sequential data, including text, audio, and time series data. This model’s distinguishing feature is its capacity to retain past data. It is effective with complex datasets but requires significant time for training and may experience gradient vanishing. Hyperparameter tuning is a systematic process of searching for the optimal combination of hyperparameters to enhance the RNN’s performance [23]. The number of Long Short-Term Memory (LSTM) units or neurons in each LSTM layer was set at 128 and 256, while the number of LSTM layers in the RNN architecture was set to 2. Additionally, the activation function utilized within the LSTM units was the Rectified Linear Unit (ReLU). ReLU was selected for this model due to its effectiveness in mitigating the vanishing gradient issue and its computational efficiency, which contributes to enhanced training and performance outcomes. The batch size, or the number of data samples processed together in each forward and backward pass during training, was set to 32. The number of epochs, or the number of times the entire training dataset was passed forward and backward through the neural network during training, was set to 10, 20, 100, and 200.
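
The hyperparameter grids described for the four tuned models can be written down and enumerated as in the sketch below. The values are taken from the text; the dictionary keys, the helper names, and the scoring function are illustrative assumptions, and the sketch enumerates combinations rather than training any model.

```python
from itertools import product

# Values from the text; key names are hypothetical labels.
GRIDS = {
    "svr": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf"]},
    "random_forest": {"n_estimators": [50, 100, 200],
                      "max_depth": [None, 10, 20, 30]},
    "xgboost": {"n_estimators": [100, 500, 1000],
                "max_depth": [3, 4, 5],
                "learning_rate": [0.01, 0.1, 0.2]},
    "rnn_lstm": {"units": [128, 256], "layers": [2],
                 "epochs": [10, 20, 100, 200]},
}

def expand_grid(grid):
    """Return every combination of the grid as a list of parameter dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]

def grid_search(grid, score_fn):
    """Return (best_params, best_score) for the combination minimising score_fn."""
    best = min(expand_grid(grid), key=score_fn)
    return best, score_fn(best)

counts = {name: len(expand_grid(grid)) for name, grid in GRIDS.items()}

# Stand-in for "train the model and return validation MSE" (invented).
def toy_score(params):
    return abs(params["n_estimators"] - 100) + (0 if params["max_depth"] == 10 else 1)

best_params, best_score = grid_search(GRIDS["random_forest"], toy_score)
```

Enumerating the grids makes the tuning cost explicit: 9 combinations for SVR, 12 for Random Forest, 27 for XGBoost, and 8 for the LSTM-based RNN, each of which requires at least one full training run.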

Regarding the tools used, the models were trained on a local machine (MacBook Air 2017, Model A1466, EMC 3178; designed by Apple in Cupertino, CA, USA, assembled in China) equipped with an Intel Core i5 CPU, which has 2 cores and a base frequency of 1.8 GHz, with a turbo boost of up to 2.9 GHz. The data were split into 80% for training and 20% for testing to ensure robust model evaluation.
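
An 80/20 split and the MSE metric can be sketched as follows. The text does not state whether the split was random or chronological; this sketch assumes a chronological split, which is a common choice for time series because it keeps future observations out of the training set. The function names are illustrative.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples: the first 80% for training, the rest for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example (invented values).
train, test = chronological_split(list(range(10)))
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because MSE is expressed in squared units of the target, values are only comparable between datasets whose targets share the same scale.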

4. Results

4.1. Use Case 1: Data Analysis in Microgrids

4.1.1. Data Exploring

A review of the correlation matrix revealed key insights into the relationships between various columns. Notably, Global PV Radiation, PV Energy (DC), and Feed-in Energy showed a strong positive correlation. As solar radiation increased during periods of sunlight, feed-in energy also rose, resulting in greater grid load.

However, it is important to account for losses, which were inversely related to sunlight levels. These losses include the following:

Losses due to STC Conversion (Rated Efficiency of Module)
Losses due to mismatch (Manufacturer Information)
Losses due to DC/AC conversion
Losses due to input voltage deviate from the rated voltage

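These losses exhibited a strong negative correlation with sunlight, approaching a value of −1, as shown in Figure 3. Because the losses are recorded as negative values, this means that their magnitude grows with irradiation: for example, higher global PV radiation led to larger losses due to STC conversion.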

To better understand the results and their implications for grid feed-in predictions, a time trend analysis was conducted. Figure 4 shows that the winter months (November to February) experienced lower irradiance levels. Additionally, the days with any sunlight during this period also showed reduced irradiance. In contrast, data from April to September revealed a different trend. During peak hours, irradiance ranged between 5 and 8 kilowatts, with more precise records of days with high irradiance.

4.1.2. Overview

The initial use case focuses on analyzing solar data from east-oriented and west-oriented panels. To gain an initial understanding of the input, losses, and output, all rows of the dataset were grouped by orientation (east and west). Upon initial review, it became evident that the eastern area exhibited a higher level of solar radiation than the western side. This indicates that the solar panels received greater solar radiation in the morning compared to the afternoon. Consequently, fluctuations were observed in both the loss values and grid feed-in, as illustrated in Figure 5.

As previously established in the correlation matrix, this relationship was also verified with another chart. Figure 6 illustrates the correlation between global PV radiation and PV energy (DC). With the exception of a few outliers, a one-to-one relationship is confirmed, exhibiting a linear growth pattern.

Figure 7 below illustrates the linear correlation between PV energy (DC) and MPP voltage in this microgrid. The results indicate that controlling the MPP voltage in a photovoltaic system can help achieve specific energy generation targets.

4.1.3. Losses

Losses play a critical role in the generation of grid feed-in energy. In some cases, losses account for over 80 percent of global PV radiation. The greatest loss of energy occurs due to STC (Standard Test Conditions) conversion. STC conversion is a standardized test condition that ensures comparability by measuring the energy generated by solar panels. It should be noted that losses can occur in various ways, including variations in temperature and issues related to cable installation [24].

To identify discrepancies between the eastern and western regions, all losses were aggregated and classified by area. The results demonstrate notable discrepancies in the following categories:

Losses due to mismatch (configuration/shading)
Losses due to module-specific partial shading
Losses due to deviation from the nominal module temperature

The most significant discrepancy between the eastern and western systems is the loss incurred due to configuration and shading mismatches. A configuration mismatch occurs when individual solar panels are not optimally aligned with one another. For instance, this can happen when different models of panels are installed together. Such discrepancies typically manifest as differences in voltage or current. A shading mismatch occurs when a solar panel casts a shadow on itself, most notably during the morning and evening hours when the sun is at a relatively low angle. In this case, the panels receive a reduced level of irradiation, resulting in a corresponding decrease in energy generation [18].

The east side experienced a total loss of 12.68 kWh, while the west side saw a total loss of 20.18 kWh over the course of the year (see Figure 8).

Additionally, a quantifiable reduction was observed in the impact of module-specific partial shading. Losses due to module-specific partial shading occur under conditions of cloud cover. The impact of tree and building shadows on the panels was also quantified in this context [25]. Once again, the east side was found to have significantly less loss, at −118.45 kWh, while the west side experienced a loss of −141.39 kWh over the course of the year (see Figure 9).

The last notable discrepancy was identified in the loss incurred due to a deviation from the specified nominal module temperature. This column quantifies the loss of energy resulting from a deviation in operating temperature from the specified nominal temperature of 25 degrees Celsius. The standard temperature of 25 degrees is used as the nominal temperature because it is the temperature at which solar panels are tested under standard conditions. It is important to note that if the temperature exceeds 25 degrees, the losses will be higher; conversely, losses are lower when the temperature is below 25 degrees [26]. This phenomenon was also evident in the dataset, with some values exhibiting a positive range. In terms of quantitative data, the eastern side exhibited a total loss due to deviation from the nominal module temperature of −66.43 kWh, while the western side demonstrated a loss of −79.72 kWh (see Figure 10).

4.1.4. Output

The solar farm is typically operated in island mode. In the event that the solar farm produces more energy than can be consumed, the system will switch to grid-connected mode, allowing any excess energy to be sold to the main grid. In addition to grid feed-in energy, the PV energy output serves as a key performance indicator, indicating the amount of energy input to the inverter. Each inverter has a maximum power point, which is critical for the efficiency of a solar system. The maximum power point is reached when the voltage and current are optimally matched, resulting in maximum power generation [27].

A review of the PV energy (DC) across all inverter/MPP combinations revealed that all MPPs could be reached without significant loss, as illustrated in Table 3.

The second key figure is the grid feed-in energy. Similar to the global PV radiation data, the findings indicate that the greatest amount of energy could be fed into the main grid between the months of April and September, as illustrated in Figure 11. Based on an average price of 46.27 cents per kilowatt-hour, the total revenue for the year would be €15,423.47, as shown in Figure 12.

ANOVA provided a systematic method for evaluating whether the observed differences in losses were statistically significant or likely to have occurred by random chance. Furthermore, it allowed for the analysis of interactions between variables, such as the type of losses and the side of the solar farm. The results of the two-way ANOVA tests indicated statistically significant differences between the east and west sides regarding losses due to module-specific partial shading and losses due to mismatch (configuration/shading). The p-value was less than 0.05, as illustrated in Table 4 and Table 5.

4.2. Use Case 2: Energy Output Prediction Pipeline

In this section, a pipeline for predicting energy output was established. Following comprehensive testing of the original dataset, the pipeline was deployed to two additional microgrid datasets. One of the datasets was gathered through a simulation environment based in Berlin, Germany, comprising data from 10 inverters, five located on the east side and five on the west side. The second dataset was sourced from Liège University in Belgium and included weather forecasts produced by the Laboratory of Climatology, based on the MAR regional climate model [13].

4.2.1. Comparison of 5 Models

As illustrated in Figure 13, the mean squared errors (MSEs) for Linear Regression and Support Vector Regression were 0.002213 and 0.002744, respectively. In contrast, Random Forest Regression and XGBoost offered the best trade-off in this use case, combining low MSEs with short processing times. The MSEs for Random Forest Regression and XGBoost were 0.000318 and 0.000104, respectively. The Recurrent Neural Network (RNN) yielded the lowest error, with an MSE of 0.000081 for the configuration using 200 epochs and 256 LSTM units, and 0.000210 for the configuration using 100 epochs and 128 LSTM units (see Figure 14).

In terms of processing time, Linear Regression and Support Vector Regression were the most efficient of the five models, requiring only 1 s and 5 s, respectively. The difference in processing time between Random Forest Regression and XGBoost was minimal, with averages of 29 s and 24 s, respectively (see Figure 15). It is important to note that the Recurrent Neural Network (RNN) required a significantly longer training period, especially when adjusting multiple parameters. The processing time was four hours for the configuration with 200 epochs and 256 LSTM units, and two hours for the configuration with 100 epochs and 128 LSTM units.
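
The trade-off between error and processing time reported for the original dataset can be made explicit with a small ranking sketch. The MSE and time values are those quoted in this section (four hours converted to 14,400 s), under the assumption that all five error values are on the same MSE scale; the helper names and the 60 s time budget are hypothetical.

```python
# (MSE, processing time in seconds) as reported for the original dataset.
REPORTED = {
    "Linear Regression": (0.002213, 1),
    "Support Vector Regression": (0.002744, 5),
    "Random Forest Regression": (0.000318, 29),
    "XGBoost": (0.000104, 24),
    "RNN (200 epochs, 256 LSTM units)": (0.000081, 4 * 3600),
}

def rank_by_mse(results):
    """Model names ordered from lowest to highest MSE."""
    return sorted(results, key=lambda name: results[name][0])

def best_within_time_budget(results, max_seconds):
    """Lowest-MSE model whose processing time fits the budget, or None."""
    eligible = {n: v for n, v in results.items() if v[1] <= max_seconds}
    if not eligible:
        return None
    return min(eligible, key=lambda n: eligible[n][0])

ranking = rank_by_mse(REPORTED)
fast_choice = best_within_time_budget(REPORTED, max_seconds=60)
```

Ranked by error alone, the RNN comes first; once a processing-time limit on the order of a minute is imposed, XGBoost becomes the preferred model.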

4.2.2. Comparison of Three Datasets

On the 10-inverter dataset with east and west sides, Random Forest Regression and XGBoost again performed best, as they had on the 4-inverter dataset, with MSEs of 0.000096 and 0.000070 and processing times of 60 and 59 s, respectively. For Linear Regression, the MSE was 0.000745 and the processing time was 4 s. The Support Vector Regression model was computed in 14 s but had the highest MSE among the five models, at 0.002056. The Recurrent Neural Network produced an MSE of 0.001186 and required 39,600 s for processing. It therefore combined a high MSE with a long processing time on this dataset, in contrast to the original dataset, where it achieved the lowest MSE (see Figure 16).

The results obtained from the Liège University dataset differed significantly from those of the other two datasets, as this dataset included only 40 days of data rather than a full year’s worth. The two best-performing models, Random Forest Regression and Support Vector Regression, achieved the lowest MSE values of 67.5601 and 68.5681, with processing times of just 10 and 1 s, respectively. The next best model was XGBoost, which achieved an MSE of 70.1264 and took three seconds to process. The Linear Regression model completed its computations in two seconds yet yielded one of the highest MSE values among the five models, at 72.0166. The Recurrent Neural Network produced the highest MSE, 72.4784, and required a substantial processing time of 1500 s, again combining high error with extended processing duration (see Figure 17).

Extensive testing on the two full-year datasets revealed that Random Forest Regression and XGBoost consistently outperformed the other models. The Random Forest Regression model performed best when the number of trees was set to 200. Similarly, the XGBoost model performed best with 1000 boosting rounds (trees), a maximum tree depth of 4, and a step size (learning rate) of 0.1. While the Recurrent Neural Network (RNN) performed well on the original dataset, where it was configured with 200 epochs and 256 Long Short-Term Memory (LSTM) units, its performance was not consistent across the other datasets.

4.2.3. Final Pipeline

The end-to-end pipeline shows how energy output is predicted from raw time series data (see Figure 18). It is presented after the introduction of the dataset, data preprocessing, feature engineering, and model building, so that each step has been described before the steps are combined. The pipeline comprises four stages: data preprocessing, feature engineering, model establishment and hyperparameter optimization, and model comparison and selection. The input dataset must include time series data with weather conditions, and the pipeline is best suited to full-year time series datasets.
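The four stages can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the two model families (`moving_average`, `damped_trend`) are hypothetical stand-ins for the regression models of the study, the grids are arbitrary, and the input series is made up.

```python
from statistics import fmean


def preprocess(series):
    """Stage 1: drop missing readings (None) from the raw series."""
    return [x for x in series if x is not None]


def make_lag_rows(series, n_lags):
    """Stage 2: turn a series into (lag window, target) pairs."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def mse(model, rows):
    """Mean squared error of a model over (lag window, target) rows."""
    return fmean((model(lags) - target) ** 2 for lags, target in rows)


# Stage 3: hypothetical stand-in model families, each with one hyperparameter.
def moving_average(k):
    return lambda lags: fmean(lags[-k:])


def damped_trend(alpha):
    return lambda lags: lags[-1] + alpha * (lags[-1] - lags[-2])


FAMILIES = {
    "moving_average": (moving_average, [1, 2, 3]),
    "damped_trend": (damped_trend, [0.25, 0.5, 1.0]),
}


def tune(factory, grid, train_rows):
    """Pick the grid value with the lowest MSE on the training rows."""
    return min(grid, key=lambda p: mse(factory(p), train_rows))


def run_pipeline(raw, n_lags=3, test_fraction=0.25):
    rows = make_lag_rows(preprocess(raw), n_lags)
    split = int(len(rows) * (1 - test_fraction))  # chronological split, no shuffling
    train_rows, test_rows = rows[:split], rows[split:]
    scores = {}
    for name, (factory, grid) in FAMILIES.items():
        best_param = tune(factory, grid, train_rows)
        scores[name] = (best_param, mse(factory(best_param), test_rows))
    # Stage 4: compare on the held-out tail and select the lowest test MSE.
    winner = min(scores, key=lambda name: scores[name][1])
    return winner, scores


raw = [0.0, 0.1, None, 0.4, 0.9, 1.3, 1.4, 1.2, 0.8, 0.3, 0.0, 0.0, 0.1, 0.5, 1.0, 1.3]
winner, scores = run_pipeline(raw)
print(winner, scores)
```

The split is chronological rather than random so that the held-out rows always lie after the rows used for tuning, which is the usual precaution for time series.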

5. Discussion

As illustrated in Figure 11 (Grid Feed-In), 333,340 kilowatt-hours could be fed into the main grid over the year. In aggregate, this exceeds the total grid consumption of 27,710 kilowatt-hours. However, because the microgrid has no battery storage system, surplus energy was transferred to the main grid as it was generated and could not be held for later use. Consequently, energy had to be imported from the main grid whenever generation fell short, as this was the only way to cover the constant energy demand. The microgrid is therefore unable to operate in pure island mode and depends on the main grid.
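The point that an annual surplus does not imply island-mode operation can be illustrated with an hourly balance. The sketch below uses a hypothetical 24-hour generation profile and a constant load; all values are invented for illustration and are not taken from the study's data.

```python
# Hypothetical 24-hour profile in kWh per hour (illustrative, not study data).
generation = [0, 0, 0, 0, 0, 1, 4, 10, 18, 26, 32, 35,
              35, 32, 26, 18, 10, 4, 1, 0, 0, 0, 0, 0]
demand = [3.0] * 24  # constant load


def balance_without_storage(generation, demand):
    """Hourly settlement with no battery: surplus is exported, deficit imported."""
    exported = sum(max(g - d, 0) for g, d in zip(generation, demand))
    imported = sum(max(d - g, 0) for g, d in zip(generation, demand))
    return exported, imported


exported, imported = balance_without_storage(generation, demand)
print(f"generated={sum(generation)} kWh, demanded={sum(demand)} kWh")
print(f"exported={exported} kWh, imported={imported} kWh")
```

Even though total generation is several times the total demand in this toy profile, imports remain necessary in every hour where generation is below the load, which mirrors the dependence on the main grid described above.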

The initial hypothesis that the east and west sides would produce the same amount of energy was rejected, as there are notable differences between the two sides. The analysis of global PV radiation against PV energy (DC) identified some outliers. Although these outliers could not be attributed entirely to one side, they still indicate differences between the two sides.

There was a significant discrepancy in energy losses between the east and west sides. The east-facing solar panels showed considerably lower energy losses than the west-facing panels, even though they received more global PV radiation. This can be attributed to a better configuration: on the east side, the individual solar panels are better matched in terms of voltage and current. The analysis also revealed higher energy losses due to shading of the solar panels on the west side, which may indicate obstructions such as buildings or trees on that side of the solar farm. As a result, the solar farm is subject to shading, particularly in the evening.

The final noteworthy discrepancy was observed in the losses resulting from deviations from the nominal module temperature. The west side has significantly higher losses of this kind, which reflects weather conditions in Germany: during the summer months, temperatures are typically lower in the morning than in the afternoon, so the west side, which is mainly irradiated in the afternoon, experiences significantly higher losses than the east side. The shading losses, in turn, depend on the direction of shading imposed by the solar module layout and on the timing of shading. These significant differences contradict the hypothesis that there are no differences in energy production and losses between the east and west sides. The results were not anticipated, as it was assumed that both sides would have the same modules and, therefore, the same conditions for energy production. In practice, however, the solar farm is exposed to the prevailing weather conditions and temperatures, which affect the two sides differently. These findings provide a foundation for optimizing the solar farm. Nevertheless, since these losses did not significantly affect the overall performance of the modules, there was no statistically significant difference in total energy output between the two sides.

A key strength of this study was its large sample size, comprising 270 columns and 8760 data points. From this, a train-test set with 35,400 data points was created, allowing the models to learn full-year trends and perform better in time series forecasting. Moreover, the pipeline was adaptable and did not require a large number of features in the original dataset, making it suitable for a range of scenarios. When tested on different datasets, the models yielded favorable outcomes with full-year data but were less stable with the Liège University data, which covered only 40 days. This is a potential limitation when applying the pipeline to other datasets with fewer data points.

One of the primary insights from the case study was that the solar panels on the east and west sides do not consistently perform at the same level of efficiency. The outcome depends on a number of variables. Controllable factors include the matching of the solar panels to one another and the shading caused by nearby objects, both of which can be adjusted to achieve better results. External factors, such as weather conditions or temperature, are beyond our control.

Moreover, the models performed consistently across the two full-year datasets, with Random Forest Regression and XG Boosting outperforming the other models in terms of both MSE and processing time. The Recurrent Neural Network (RNN), by contrast, performed remarkably well on the initial dataset but fluctuated when run in different computational environments. The established pipeline can be readily applied to other time series datasets containing a full year's worth of data.

One significant challenge associated with this microgrid is the absence of an energy storage system. Therefore, any excess energy must be sold to the main grid. However, at night, energy must be imported back to the microgrid from the main grid to meet demand. As a result, significant financial losses are incurred. Another area for improvement is the analysis of energy consumption, which can be addressed in the future. This will enable a more accurate analysis to be made and greater efficiency to be achieved in energy management.

Author Contributions

Conceptualization, K.N., K.K., S.C. and B.V.; methodology, K.N. and K.K.; data analysis, K.N. and K.K.; prediction pipeline, K.N. and K.K.; writing—original draft preparation, K.N. and K.K.; writing—review and editing, K.N. and K.K.; visualization, K.N. and K.K.; supervision, S.C. and B.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon request.

Acknowledgments

The authors would like to thank Islam Saiful for his technical expertise and guidance throughout the project, and Sina Mehraeen and Kamellia Reshadi, whose technical advice, suggestions, and comments greatly enriched our work and helped us achieve our research objectives.

Conflicts of Interest

The authors declare no conflicts of interest.

References and Note

1. Hirsch, A.; Parag, Y.; Guerrero, J. Microgrids: A review of technologies, key drivers, and outstanding issues. Renew. Sustain. Energy Rev. 2018, 90, 402–411.
2. Ton, D.T.; Smith, M.A. The U.S. Department of Energy’s Microgrid Initiative. Electr. J. 2012, 25, 84–94.
3. Parhizi, S.; Lotfi, H.; Khodaei, A.; Bahramirad, S. State of the Art in Research on Microgrids: A Review. IEEE Access 2015, 3, 890–925.
4. Hatziargyriou, N.; Asano, H.; Iravani, R.; Marnay, C. Microgrids. IEEE Power Energy Mag. 2007, 5, 78–94.
5. Lasseter, R.H. MicroGrids. In Proceedings of the 2002 IEEE Power Engineering Society Winter Meeting. Conference Proceedings (Cat. No.02CH37309), New York, NY, USA, 27–31 January 2002; Volume 1, pp. 305–308.
6. Rao, S.N.V.B.; Yellapragada, V.P.K.; Padma, K.; Pradeep, D.J.; Reddy, C.P.; Amir, M.; Refaat, S.S. Day-Ahead Load Demand Forecasting in Urban Community Cluster Microgrids Using Machine Learning Methods. Energies 2022, 15, 6124.
7. Alrawi, O.; Bayram, I.S.; Al-Ghamdi, S.G.; Koc, M. High-Resolution Household Load Profiling and Evaluation of Rooftop PV Systems in Selected Houses in Qatar. Energies 2019, 12, 3876.
8. Yin, L.; Xiao, T.; Zhao, X.; Li, B. Improving the Stability and Optimizing the Feature Engineering of Machine Learning Models for Photovoltaic Power Prediction. Energies 2021, 14, 1669.
9. Rosero, D.G.; Díaz, N.L.; Trujillo, C.L. Cloud and machine learning experiments applied to the energy management in a microgrid cluster. Appl. Energy 2021, 304, 117663.
10. Sharkawy, A.-N.; Ali, M.; Mousa, H.; Ali, A.; Abdel-Jaber, G. Machine Learning Method for Solar PV Output Power Prediction. SVU-Int. J. Eng. Sci. Appl. 2022, 3, 2.
11. Arafat, M.Y.; Hossain, M.J.; Alam, M.M. Machine learning scopes on microgrid predictive maintenance: Potential frameworks, challenges, and prospects. Renew. Sustain. Energy Rev. 2024, 190, 113736.
12. SRH Berlin University of Applied Science. A microgrid in Simulated Environment.
13. Dumas, J.; Dakir, S.; Liu, C.; Cornélusse, B. Coordination of operational planning and real-time optimization in microgrids. Electr. Power Syst. Res. 2021, 190, 106696.
14. Wagavkar, S. Introduction to the Correlation Matrix. Built In. 2023. Available online: https://builtin.com/data-science/correlation-matrix (accessed on 10 August 2024).
15. Montgomery, D.C.; Jennings, C.L.; Kulahci, M. Introduction to Time Series Analysis and Forecasting, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015.
16. Mitchell, C. What Is a Bar Graph? Investopedia, 25 August 2015.
17. Knaflic, C.N. Storytelling with Data: A Data Visualization Guide for Business Professionals; Wiley: Hoboken, NJ, USA, 2015.
18. Kiernan, D. Natural Resources Biometrics; Open SUNY: New York, NY, USA, 2014.
19. Kumar, B.; Pratap, B.; Shrivastava, V. Artificial Intelligence for Solar Photovoltaic Systems: Approaches, Methodologies and Technologies; CRC Press Taylor & Francis Group: Boca Raton, FL, USA, 2023.
20. Klopfenstein, Q.; Vaiter, S. Linear Support Vector Regression with Linear Constraints. Mach. Learn. 2021, 110, 1939–1974.
21. Shahhosseini, M.; Hu, G. Improved Weighted Random Forest for Classification Problems. In Progress in Intelligent Decision Science; Allahviranloo, T., Salahshour, S., Arica, N., Eds.; Springer: Cham, Switzerland, 2021; Volume 1301, pp. 33–43.
22. Guo, R.; Zhao, Z.; Wang, T.; Liu, G.; Zhao, J.; Gao, D. Degradation State Recognition of Piston Pump Based on ICEEMDAN and XGBoost. Appl. Sci. 2020, 10, 6593.
23. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
24. Abou Jieb, Y.; Hossain, E. Photovoltaic Systems: Fundamentals and Applications; Springer: Cham, Switzerland, 2022.
25. Prasad, D.K.; Snow, M. Designing with Solar Power: A Source Book for Building Integrated Photovoltaics (BiPV); Earthscan: Abingdon, UK, 2013.
26. Crawley, G.M. Solar Energy; World Scientific Publishing Co., Pte Ltd.: Singapore, 2016.
27. Eltamaly, A.M.; Abdelaziz, A.Y. Modern Maximum Power Point Tracking Techniques for Photovoltaic Energy Systems; Springer: Cham, Switzerland, 2020.

Figure 1. A microgrid in simulated environment [12].

Figure 2. Formula for two-way ANOVA test [18].

Figure 3. Correlation Matrix between the columns.

Figure 4. Feed-in energy over the year (kWh).

Figure 5. Overview of the energy.

Figure 6. Relationship between the global PV radiation and the PV energy (DC).

Figure 7. Relationship between PV Energy (DC) and MPP Voltage.

Figure 8. Comparison of the losses due to mismatch (configuration/shading) between East and West.

Figure 9. Comparison of the losses due to Module-specific Partial Shading between East and West.

Figure 10. Comparison of the losses due to Deviation from the nominal temperature between East and West.

Figure 11. Grid feed-in over the year.

Figure 12. Price of the fed-in energy over the year.

Figure 13. MSE for Different Regression Models.

Figure 14. Parameter Tuning of RNN Results.

Figure 15. MSE and Time Processing for Different Models.

Figure 16. 10-Inverter Dataset—MSE and Time Processing for Different Models.

Figure 17. Liège Dataset—MSE and Time Processing for Different Models.

Figure 18. Energy Output Prediction Pipeline.

Table 1. Columns for Analysis.

No.	Description
1	Time
2	Area
3	Inverter
4	Open Circuit Voltage (V)
5	MPP Voltage (V)
6	Global PV Radiation (kWh)
7	Losses due to STC Conversion (Rated Efficiency of Module) (kWh)
8	Losses due to Module-specific Partial Shading (kWh)
9	Losses due to Low-light performance (kWh)
10	Losses due to Deviation from the nominal module temperature (kWh)
11	Losses due to Mismatch (Manufacturer Information) (kWh)
12	Losses due to Mismatch (Configuration/Shading) (kWh)
13	Diodes (kWh)
14	Failing to reach the DC start output (kWh)
15	Losses due to MPP Matching (kWh)
16	PV energy (DC) (kWh)
17	Losses due to DC/AC Conversion (kWh)
18	Losses due to Input voltage deviations from rated voltage (kWh)
19	Feed-in energy (kWh)

Table 2. Features for Prediction Models.

No.	Description
1	Open Circuit Voltage (V)
2	MPP Voltage (V)
3	Global PV Radiation (kWh)
4	Hour
5	Day of week
6	Quarter
7	Month
8	Day of year
9	Day of month
10	Week of year
11	Lag 1 (kWh)
12	Lag 2 (kWh)
13	Lag 3 (kWh)
14	Lag 4 (kWh)
15	Lag 5 (kWh)
16	Lag 6 (kWh)
17	Lag 7 (kWh)
18	Lag 8 (kWh)
19	Lag 9 (kWh)
20	Lag 10 (kWh)
21	Lag 11 (kWh)
22	Lag 12 (kWh)
23	Area encoded
24	Inverter encoded
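The calendar and lag features in Table 2 can be derived from a timestamped series with the standard library alone. The sketch below is one possible construction, not the authors' code: it assumes that the lags are taken over the target energy, that Monday is day 0, and that the week of the year follows the ISO definition; the start date and energy values are placeholders.

```python
from datetime import datetime, timedelta


def calendar_features(ts):
    """Calendar features from Table 2 for one timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),          # Monday = 0 (assumed convention)
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "day_of_year": ts.timetuple().tm_yday,
        "day_of_month": ts.day,
        "week_of_year": ts.isocalendar()[1],  # ISO week number (assumed)
    }


def add_lags(values, n_lags=12):
    """Lag 1..n_lags for each position; None where history is missing."""
    rows = []
    for i in range(len(values)):
        rows.append({f"lag_{k}": values[i - k] if i - k >= 0 else None
                     for k in range(1, n_lags + 1)})
    return rows


start = datetime(2023, 1, 1, 0)  # hypothetical start of an hourly series
timestamps = [start + timedelta(hours=h) for h in range(24)]
energy = [float(h % 6) for h in range(24)]  # placeholder target values (kWh)

features = [{**calendar_features(ts), **lags}
            for ts, lags in zip(timestamps, add_lags(energy))]
print(features[13])
```

Rows whose lag window reaches before the start of the series contain `None` and would normally be dropped before training.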

Table 3. Comparison of the PV energy (DC) between all inverter/MPP combinations.

Inverter/MPP	kWh
Inverter 1 MPP1	4534.13
Inverter 2 MPP1	4533.34
Inverter 3 MPP1	4467.45
Inverter 4 MPP1	4467.34
Inverter 1 MPP2	4531.43
Inverter 2 MPP2	4531.48
Inverter 3 MPP2	4465.44
Inverter 4 MPP2	4465.67

Table 4. ANOVA Test Result for Losses Due to Module-Specific Partial Shading.

Column	sum_sq	df	F	PR(>F)
Inverter	0.007523	3.0	7.111227	0.000090
MPP	0.002203	1.0	6.247672	0.012438
Residual	24.710690	70075.0	NaN	NaN

Table 5. ANOVA Test Result for Losses due to Mismatch (Configuration/Shading).

Column	sum_sq	df	F	PR(>F)
Inverter	0.000802	3.0	15.291898	0.070826 × 10⁻¹⁰
MPP	0.000245	1.0	14.010321	1.819551 × 10⁻⁴
Residual	1.225334	70075.0	NaN	NaN
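As a consistency check on Tables 4 and 5, the F values can be recomputed from the tabulated sums of squares and degrees of freedom, since each F statistic is the factor mean square divided by the residual mean square. The sketch below does this; the function and dictionary names are illustrative, and small deviations from the tabulated F values are expected because the sums of squares are rounded.

```python
def f_statistic(sum_sq, df, resid_sum_sq, resid_df):
    """F = (factor mean square) / (residual mean square)."""
    return (sum_sq / df) / (resid_sum_sq / resid_df)


# Factor rows as (sum_sq, df) and the residual as (sum_sq, df), from Tables 4 and 5.
TABLES = {
    "Table 4 (module-specific partial shading)": {
        "rows": {"Inverter": (0.007523, 3), "MPP": (0.002203, 1)},
        "residual": (24.710690, 70075),
    },
    "Table 5 (mismatch, configuration/shading)": {
        "rows": {"Inverter": (0.000802, 3), "MPP": (0.000245, 1)},
        "residual": (1.225334, 70075),
    },
}

for table, data in TABLES.items():
    for factor, (sum_sq, df) in data["rows"].items():
        f_value = f_statistic(sum_sq, df, *data["residual"])
        print(f"{table} | {factor}: F = {f_value:.3f}")
```

The recomputed values agree with the F column of both tables to within about 0.01.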

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Nguyen, K.; Koch, K.; Chandna, S.; Vu, B. Energy Performance Analysis and Output Prediction Pipeline for East-West Solar Microgrids. J 2024, 7, 421-438. https://doi.org/10.3390/j7040025

