1. Introduction
In the context of growing energy demands and an increasing focus on renewable energy sources, microgrids are playing an increasingly crucial role in modern energy systems. These local energy networks are designed to operate independently or in conjunction with the main grid, covering a wide range of energy supplies, including renewable sources like solar and wind, conventional generators, and energy storage systems [1]. The goal of a microgrid is to provide dependable and environmentally responsible energy to communities, industrial parks, and other locations while optimizing performance and reducing energy consumption [2].
Microgrids represent a fundamental transformation in local energy generation, distribution, and consumption. They provide a flexible, efficient, and environmentally friendly alternative to traditional power systems, particularly in areas prone to outages or lacking robust infrastructure [3]. By integrating renewable energy sources and employing advanced technologies, microgrids enhance energy reliability, sustainability, and economic viability, thereby paving the way for a cleaner energy future [4,5].
Over the past decade, significant research has been conducted in the field of microgrids, with a particular focus on areas such as microgrid planning, predictive maintenance, real-time energy management, and energy output prediction. The primary objective behind predicting the energy output of microgrids is to enhance system performance and stability. This predictive capability plays a pivotal role in several aspects, including the efficient management of energy from both renewable and conventional sources, optimization of storage and generator control, forecasting of renewable resources, grid stability enhancement, economic optimization, and increased utilization of renewable energy resources. Simultaneously, it contributes to risk reduction and heightened availability within the microgrid system [6].
2. Related Work
In the context of machine learning and energy systems, predicting energy output involves using historical or real-time data with machine learning models to forecast future energy generation. Various machine-learning techniques have been employed for this purpose. A previous study compared multiple regression, support vector machine regression, and Gaussian regression models, concluding that multiple linear regression performed the best. However, the study focused heavily on model selection, overlooking the potential effects of external environmental factors that could influence energy predictions in real-world scenarios. Additionally, the study highlighted the importance of optimal location and parameter selection for model development but did not explore these factors in depth [7].
Another study examined photovoltaic (PV) power prediction in a semi-arid climate, introducing a stacked machine learning model and comparing it with Random Forest, Extreme Gradient Boosting, and linear regression. While the stacked model demonstrated superior accuracy, the study’s narrow focus on semi-arid climates limited its applicability to regions with different climatic conditions. Furthermore, while feature engineering significantly enhanced prediction accuracy, the study did not explore potential improvements in model robustness under varying conditions [8].
Rosero et al. investigated the use of cloud computing and machine learning for energy management in a microgrid cluster, showcasing the potential of these technologies to optimize energy distribution and efficiency. A notable disadvantage, however, was the study’s limited discussion of the scalability of such systems in larger or more complex grids, as well as the lack of real-world validation beyond simulations [9].
In another study, a multilayer feedforward neural network (MLFFNN) trained with the Levenberg–Marquardt algorithm was applied to data from a real PV power plant. Although the model achieved a low mean squared error (MSE) of 0.0053, indicating its high accuracy, the study did not address the challenges of computational complexity and the potential need for real-time application in larger, more diverse datasets [10].
Arafat et al. provided a comprehensive review of machine learning applications in microgrid predictive maintenance. While their work offered valuable insights into integrating sensor data with operational drivers, it lacked a detailed discussion on the real-time deployment of these frameworks in dynamic environments. Furthermore, the study acknowledged several challenges, including data quality and integration issues, but did not propose concrete solutions for overcoming these hurdles [11].
Despite the advancements in these studies, many have focused on one-sided microgrids, with relatively few studies implementing the East-West model. This model has the potential to leverage global radiation from both sides, offering advantages for energy generation throughout the day. One-sided solar farms have a uniform orientation among all solar panels, whereas two-sided solar farms have two distinct orientations, such as east- and west-facing solar panels.
Figure 1 illustrates an example of a microgrid in a simulated environment, as developed by SRH Berlin University of Applied Sciences.
This study contributes significantly to the understanding and development of east-west microgrids by analyzing the factors that influence their performance and quantifying the energy that can be generated and fed into the main grid. Additionally, it assesses the potential revenue streams from selling this energy, providing essential insights for energy planners and operators. A key hypothesis tested in this research is that there is no significant difference in energy production and losses between the east and west sides of the microgrid, which aids in optimizing panel orientation. Furthermore, this study introduces an innovative energy output prediction pipeline that integrates critical factors such as solar radiation, photovoltaic energy (DC), and conversion losses to enhance the accuracy of energy output predictions. A comparative analysis of microgrid operation modes—grid-tied (connected to the local utility) versus islanded (operating independently)—is also conducted to determine which mode offers greater energy optimization benefits. Through these contributions, the study addresses existing research gaps and provides actionable insights for the effective implementation and management of east-west microgrids in practical applications.
3. Materials and Methods
3.1. Data Understanding and Preprocessing
To ensure the greatest possible stability, three different datasets were used to train the models. The initial dataset comprises data from a real-time-designed microgrid model with east- and west-oriented photovoltaic (PV) modules and maximum power point (MPP) inverters. The system was designed to be grid-connected and does not include an energy storage system. The model was developed by SRH Berlin University of Applied Sciences using an online PVsol tool [12]. A total of four inverters, each with two maximum power points, were evaluated across 270 columns. An inverter is a system that converts direct current (DC) to alternating current (AC) energy through transformer switching and control circuits. Two inverters were oriented east, while two were oriented west. For each inverter-MPP combination, total irradiation, various losses, and grid feed-in were measured. The Global PV Energy dataset provides a comprehensive overview of the total amount of solar energy reaching a specific area of the Earth’s surface over a defined time period, comprising hourly values for the entire year.
The second dataset is structured similarly to the first; however, it contains ten inverters instead of four. The final dataset comprises weather forecasts with a 15-min resolution, including solar radiation and air temperature data, among other variables. These forecasts were created by the Laboratory of Climatology at the University of Liège using the regional climate model MAR. The data were collected from 10 May 2019 to 18 June 2019 [13].
The primary dataset was the solar farm dataset, which included a total of four inverters. To ensure an accurate comparison between east and west inverters and to construct a reliable model for grid feed-in prediction, several preprocessing steps were essential. One of the initial steps was to split the data for each inverter/MPP combination into separate rows. This allowed for more efficient analysis between the east and west orientations. Another step involved removing all columns with null values and irrelevant columns from the dataset. The irrelevant columns included Deviation from the Standard Spectrum, Ground Reflection (Albedo), and Orientation and Inclination of the Module Surface. The dataset was thereby reduced to 19 columns (see Table 1).
The final step was to ensure that the time column was formatted correctly. The format was adjusted as follows: “yyyy-mm-dd HH:MM”.
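The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the input timestamp format, the sample year, and the inverter-to-side mapping are hypothetical and are not taken from the original dataset.

```python
from datetime import datetime

# Hypothetical wide-format rows: one column per inverter/MPP measurement.
wide_rows = [
    {"time": "01.01.2023 09:00",
     "inv1_mpp1_feedin": 1.2, "inv1_mpp2_feedin": 1.1,
     "inv3_mpp1_feedin": 0.4, "inv3_mpp2_feedin": None},
]

# Assumed orientation of each inverter (two east, two west).
ORIENTATION = {"inv1": "east", "inv2": "east", "inv3": "west", "inv4": "west"}


def drop_null_columns(rows):
    """Remove every column that contains at least one null value."""
    bad = {col for row in rows for col, val in row.items() if val is None}
    return [{c: v for c, v in row.items() if c not in bad} for row in rows]


def to_long(rows):
    """Split each inverter/MPP column into its own row and reformat the time."""
    long_rows = []
    for row in rows:
        # Assumed input format dd.mm.yyyy HH:MM; output is yyyy-mm-dd HH:MM.
        stamp = datetime.strptime(row["time"], "%d.%m.%Y %H:%M")
        time_str = stamp.strftime("%Y-%m-%d %H:%M")
        for column, value in row.items():
            if column == "time":
                continue
            inverter, mpp, measure = column.split("_", 2)
            long_rows.append({
                "time": time_str,
                "inverter": inverter,
                "mpp": mpp,
                "side": ORIENTATION[inverter],
                measure: value,
            })
    return long_rows


long_rows = to_long(drop_null_columns(wide_rows))
```

Each resulting row carries the side label, so east and west measurements can be grouped and compared directly.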
3.2. Feature Engineering
To address the second use case, predicting grid feed-in energy, it is essential to identify the features required for constructing the model. A feature is a specific data characteristic used to make predictions. Features are crucial inputs for machine learning models, enabling them to identify patterns in the data [14]. In addition to the provided columns containing open circuit voltage, MPP voltage, and global PV radiation, 21 additional features were incorporated.
From the provided time column, several temporal features were created, including the day of the week and the week of the year. Furthermore, twelve additional features were added using lagged data, which are also referred to as delayed features in time-series analysis [15]. These features carry the values observed in previous hours. Additionally, the area and inverter label columns were encoded, resulting in two new feature columns.
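A minimal sketch of how temporal and lagged features of this kind could be derived is given below. The field name `grid_feed_in`, the choice of lagged column, and the number of lags are assumptions for illustration; the source does not state which columns were lagged or over which horizons.

```python
from datetime import datetime


def add_features(records, n_lags=3):
    """Add day-of-week, week-of-year and lag features.

    `records` is a time-sorted list of dicts with a 'time' string
    (yyyy-mm-dd HH:MM) and a hypothetical 'grid_feed_in' value.
    """
    out = []
    for i, rec in enumerate(records):
        if i < n_lags:
            continue  # not enough history to fill every lag
        stamp = datetime.strptime(rec["time"], "%Y-%m-%d %H:%M")
        row = dict(rec)
        row["day_of_week"] = stamp.weekday()          # Monday = 0
        row["week_of_year"] = stamp.isocalendar()[1]  # ISO week number
        for lag in range(1, n_lags + 1):
            # Value observed `lag` time steps earlier.
            row[f"lag_{lag}"] = records[i - lag]["grid_feed_in"]
        out.append(row)
    return out


# Hypothetical hourly sample starting on a Monday.
sample = [{"time": f"2023-01-02 {h:02d}:00", "grid_feed_in": float(h)}
          for h in range(5)]
features = add_features(sample, n_lags=3)
```

Rows without a complete lag history are dropped here; padding them with zeros or the first observed value would be an equally plausible choice.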
The 24 features listed in Table 2 were utilized in the creation of the models.
To guarantee the integrity of the outcome, an outlier detection process was implemented. The interquartile range (IQR) was calculated by subtracting the first quartile (Q1) from the third quartile (Q3). Data points falling below Q1 minus 1.5 times the IQR or above Q3 plus 1.5 times the IQR were considered potential outliers.
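The IQR rule described above can be illustrated as follows. The quartile convention (the "inclusive" method of Python's `statistics.quantiles`) and the sample values are assumptions; other quartile definitions give slightly different bounds.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical readings with one implausibly high value.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = flag_outliers(readings)
```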
3.3. Use Case 1: Data Analysis in Microgrids
To provide an initial overview of the columns and their interdependencies, a correlation matrix was constructed. The correlation coefficient measures the degree of linear dependency between two variables on a scale from −1 to 1. A value of −1 indicates a perfect negative dependency, while a value of 1 indicates a perfect positive dependency. If the value remains around 0, no linear dependency can be assumed [14].
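As an illustration, a correlation matrix can be computed as sketched below. The source does not name the coefficient used; the Pearson coefficient is assumed here, and the column names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name to values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical columns: energy rises with radiation, losses fall.
data = {
    "radiation": [0.0, 1.0, 2.0, 3.0],
    "pv_energy_dc": [0.0, 2.0, 4.0, 6.0],
    "loss": [3.0, 2.0, 1.0, 0.0],
}
matrix = correlation_matrix(data)
```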
The bar chart was the primary tool used for comparing east- and west-oriented solar panels [16]. Bar charts are effective for comparing categories such as inverters or MPPs. Additionally, scatter plots were employed to investigate values such as global PV radiation or PV energy (DC) for potential dependencies in greater detail [17].
Once the data had been preprocessed, a two-way ANOVA test was conducted to ascertain which factors were statistically significant in relation to the east and west sides (see Figure 2). ANOVA is an effective method for comparing means between groups and assessing the statistical significance of differences among factors, helping to identify distinctions between the two sides. Furthermore, ANOVA can examine the interactions between factors, identifying dependencies that affect the analysis. In this research project, the factors analyzed included global radiation, open circuit voltage, MPP voltage, and losses in the process [18].
3.4. Use Case 2: Energy Output Prediction
In this section, a pipeline was established for the purpose of predicting energy output. Following testing of the pipeline on our original dataset, it was applied to two additional microgrid datasets. One dataset was collected in Berlin, Germany, using a simulation environment. This dataset comprised information from ten inverters, distributed between the east and west sides. The second dataset was provided by Liège University in Belgium and included a weather forecast produced by the university’s Laboratory of Climatology, based on the MAR regional climate model [19].
To identify the most effective models for energy prediction, five machine-learning methods were executed and compared. In each model, the grid search technique was employed to find the optimal hyperparameter combination. The metrics used for comparison were mean squared error (MSE) and processing time. The stability of the models was also assessed during testing on the three datasets to ensure optimal performance. The initial model employed was linear regression, which is a fundamental method for understanding the relationship between the target and predictive variables. It is fast to train and highly effective for linear relationships. However, it is not well-suited for non-linear data and is susceptible to noise.
In the field of machine learning, Support Vector Regression (SVR) is a widely used method that employs support vectors to predict the value of a dependent variable. The objective is to fit an optimal boundary around the data, defined by the support vectors, and to apply a specialized loss function that enables the model to learn from the dataset. This model’s key strengths are its efficiency with complex data structures and its capacity to control overfitting. However, this approach requires careful parameter tuning and may not be suitable for large datasets. Two crucial parameters in this model are the cost value and the kernel, as outlined in [20]. The cost value determines the balance between achieving a low training error and a low testing error, while the kernel calculates the degree of similarity between data points in a high-dimensional feature space. The parameters utilized in this research were cost values (0.1, 1, 10) and kernels (linear, polynomial, radial basis function).
The next models introduced were tree-based machine learning methods. One of the most popular methods is Random Forest Regression, which combines many decision trees to create more accurate predictions. It helps reduce overfitting and enhances the accuracy of the model, particularly when there are multiple independent variables. However, the model is somewhat complex and difficult to explain in detail [21]. This study tested three different numbers of trees: 50, 100, and 200. Additionally, four different maximum depths for each decision tree within the Random Forest were implemented: None, 10, 20, and 30.
Extreme Gradient Boosting (XGBoost) is a robust boosting algorithm that is highly effective in machine learning, both for regression and classification. Similar to Random Forest, it employs multiple decision trees to improve accuracy, but it builds them sequentially so that each new tree learns from the shortcomings of the previous ones. It functions effectively even on datasets with a high level of noise but requires intricate parameter tuning, which can result in overfitting if not managed with precision. This model requires multiple parameters to be tuned and focuses on the tree-based boosting algorithm within the model [22]. The number of boosting rounds or trees to be constructed in the ensemble was set at 100, 500, and 1000. Furthermore, the maximum depth of each decision tree in the ensemble was set to 3, 4, and 5, while the step size at each iteration towards minimizing the loss function was set to 0.01, 0.1, and 0.2.
The final model utilized in this project was the Recurrent Neural Network (RNN), which is designed to process sequential data, including text, audio, and time series data. This model’s distinguishing feature is its capacity to retain past data. It is effective with complex datasets but requires significant time for training and may suffer from vanishing gradients. Hyperparameter tuning is a systematic process of searching for the optimal combination of hyperparameters to enhance the RNN’s performance [23]. The number of Long Short-Term Memory (LSTM) units or neurons in each LSTM layer was set at 128 and 256, while the number of LSTM layers in the RNN architecture was set to 2. Additionally, the activation function utilized within the LSTM units was the Rectified Linear Unit (ReLU). ReLU was selected for this model due to its effectiveness in mitigating the vanishing gradient issue and its computational efficiency, which contributes to enhanced training and performance outcomes. The batch size, or the number of data samples processed together in each forward and backward pass during training, was set to 32. The number of epochs, or the number of times the entire training dataset was passed forward and backward through the neural network during training, was set to 10, 20, 100, and 200.
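The hyperparameter grids described in this section can be enumerated as sketched below. The values are those reported above; the key names follow common scikit-learn and XGBoost conventions and are an assumption, as are the helper functions `expand` and `grid_search`. Settings held constant for the LSTM (2 layers, ReLU activation, batch size 32) are left out of the grid.

```python
from itertools import product

# Grids from the text; key names are assumed, not taken from the source.
GRIDS = {
    "svr": {"C": [0.1, 1, 10], "kernel": ["linear", "poly", "rbf"]},
    "random_forest": {"n_estimators": [50, 100, 200],
                      "max_depth": [None, 10, 20, 30]},
    "xgboost": {"n_estimators": [100, 500, 1000],
                "max_depth": [3, 4, 5],
                "learning_rate": [0.01, 0.1, 0.2]},
    "lstm": {"units": [128, 256], "epochs": [10, 20, 100, 200]},
}


def expand(grid):
    """List every combination of the values in a parameter grid."""
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in product(*(grid[k] for k in keys))]


def grid_search(grid, score):
    """Return the combination with the lowest score (for example, MSE).

    `score` is a caller-supplied function that trains a model with the
    given parameters and returns its validation error.
    """
    return min(expand(grid), key=score)


# Number of candidate models per method.
sizes = {name: len(expand(grid)) for name, grid in GRIDS.items()}
```

Under these grids, the search covers 9 SVR, 12 Random Forest, 27 XGBoost, and 8 LSTM configurations, which helps explain why processing time was tracked alongside MSE.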
Regarding the tools used, the models were trained on a local machine (MacBook Air 2017, Model A1466, EMC 3178; Apple, Cupertino, CA, USA) equipped with an Intel Core i5 CPU with 2 cores, a base frequency of 1.8 GHz, and a turbo boost of up to 2.9 GHz. The data were split into 80% for training and 20% for testing to ensure robust model evaluation.
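The 80/20 split and the MSE metric can be sketched as follows. The source does not state whether the split was random or chronological; a chronological split, which avoids training on data from after the test period, is assumed here for illustration.

```python
def split_ordered(rows, train_fraction=0.8):
    """Split time-ordered rows into a leading train part and a trailing test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: ten time steps, eight for training, two for testing.
train, test = split_ordered(list(range(10)))
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```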
5. Discussion
As illustrated in Figure 11 (Grid Feed-In), there is potential to feed 333,340 kilowatt-hours into the main grid throughout the year. In total, this would have covered the grid consumption of 27,710 kilowatt-hours. However, because the microgrid has no battery storage system, the surplus energy was transferred to the main grid as it was produced. Consequently, energy had to be imported from the main grid whenever generation fell short, as this was the only way to cover the constant energy demand. It is, therefore, evident that the microgrid would be unable to operate in pure island mode and is dependent on the main grid.
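The reasoning above can be made concrete with a small hourly balance. The annual feed-in is roughly 12 times the grid consumption (333,340 / 27,710 ≈ 12.0), yet without storage each hour must be balanced on its own. The daily profile below is purely hypothetical and only illustrates the mechanism.

```python
def grid_exchange(generation_kwh, demand_kwh):
    """Return (exported, imported) energy for a system without storage."""
    exported = imported = 0.0
    for gen, dem in zip(generation_kwh, demand_kwh):
        if gen >= dem:
            exported += gen - dem   # surplus goes to the main grid
        else:
            imported += dem - gen   # shortfall is drawn from the main grid
    return exported, imported


# Hypothetical day: constant 3 kWh demand per hour, PV output only
# between 06:00 and 18:00.
demand = [3.0] * 24
generation = [0.0] * 6 + [40.0] * 12 + [0.0] * 6
exported, imported = grid_exchange(generation, demand)
```

In this example the exported energy far exceeds the imported energy, but the import is still non-zero: a large annual surplus does not remove the dependency on the main grid during hours without generation.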
The initial hypothesis that the east and west sides would produce the same amount of energy was rejected. There are notable differences between the two sides. As demonstrated in the analysis between global PV radiation and PV energy (DC), some outliers were identified. Although it was not possible to assign these outliers 100 percent to one side, they still indicate differences between the two sides.
There was a significant discrepancy in energy losses between the east and west sides. The east-facing solar panels demonstrated considerably lower energy loss than the west-facing panels, despite receiving more global PV radiation. This can be attributed to a superior configuration, as the individual solar panels are better matched in terms of voltage and current on the east side, which is crucial for optimizing the configuration. The analysis also revealed an increase in energy loss due to the shading of the solar panels on the west side. This may suggest the presence of obstructions, such as buildings or trees, on the west side of the solar farm. As a result, the solar farm is subject to shadowing, particularly in the evening.
The final noteworthy discrepancy was observed in the losses resulting from deviations from the nominal temperature. The west side has significantly higher losses of this kind, which are influenced by weather conditions in Germany. During the summer months, temperatures are typically lower in the morning than in the afternoon. Consequently, the west side experiences significantly higher afternoon losses than the east side. These significant differences contradict the hypothesis that there are no differences in energy production and losses between the east and west sides. These results were not anticipated, as it was assumed that both sides would have the same modules and, therefore, the same conditions for energy production. However, the solar farm is sensitive to the prevailing weather conditions and temperatures, and the two sides differ in the direction and timing of shading that results from the solar module layout. These findings provide a foundation for optimizing the solar farm. Nevertheless, since these losses did not significantly impact the overall performance of the modules, there was no statistically significant difference in energy output between the two sides.
A key strength of this study was its large sample size, comprising 270 columns and 8760 data points. From that, a train-test set with 35,400 data points was created, allowing the models to learn full-year trends and perform better in time series forecasting. Moreover, the pipeline was adaptable and did not require a substantial number of features in the original dataset, making it suitable for a range of scenarios. When tested on the different datasets, the models yielded favorable outcomes with full-year data. However, they exhibited less stability with the Liège University data, which covered only 40 days. This may pose a limitation when applying this pipeline to other datasets with fewer data points.
One of the primary insights from the case study was that the solar panels on the east and west sides do not consistently perform at the same level of efficiency. The outcome is contingent upon a number of variables. Examples of controllable factors include the adjustment of solar panels to one another and the shading caused by nearby objects. These can be adjusted to achieve optimal results. However, it should be noted that external factors, such as weather conditions or temperature, are beyond our control.
Moreover, the models demonstrated consistent performance across two full-year datasets, with Random Forest Regression and XGBoost consistently outperforming other models in terms of both MSE and processing time. It is worth noting, however, that the Recurrent Neural Network (RNN) demonstrated remarkable performance in the initial dataset but exhibited fluctuations when subjected to various computational environments. This established pipeline can be readily applied to other time series datasets containing a full year’s worth of data.
One significant challenge associated with this microgrid is the absence of an energy storage system. Therefore, any excess energy must be sold to the main grid. However, at night, energy must be imported back to the microgrid from the main grid to meet demand. As a result, significant financial losses are incurred. Another area for improvement is the analysis of energy consumption, which can be addressed in the future. This will enable a more accurate analysis to be made and greater efficiency to be achieved in energy management.