Article

Energy Management System for an Industrial Microgrid Using Optimization Algorithms-Based Reinforcement Learning Technique

Faculty of Information Technology, Engineering and Economics, Østfold University College, Kobberslagerstredet 5, 1671 Fredrikstad, Norway
*
Author to whom correspondence should be addressed.
Energies 2024, 17(16), 3898; https://doi.org/10.3390/en17163898
Submission received: 10 June 2024 / Revised: 6 July 2024 / Accepted: 31 July 2024 / Published: 7 August 2024
(This article belongs to the Section A1: Smart Grids and Microgrids)

Abstract
The climate crisis necessitates a global shift toward a secure, sustainable, and affordable energy system and a green energy transition that reaches climate neutrality by 2050. As a result, renewable energy sources have come to the forefront, and research interest in microgrids that rely on distributed generation and storage systems has grown rapidly. Furthermore, many new markets for energy trading, ancillary services, and frequency reserves offer attractive investment opportunities in exchange for balancing the supply and demand of electricity. Artificial intelligence can be utilized to locally optimize energy consumption, trade energy with the main grid, and participate in these markets. Reinforcement learning (RL) is one of the most promising approaches to achieve this goal because it enables an agent to learn optimal behavior in a microgrid by executing actions that maximize a long-term reward signal. The study focuses on testing two optimization approaches: logic-based optimization and reinforcement learning. This paper builds on the existing research framework by combining Proximal Policy Optimization (PPO) with machine learning-based load forecasting to produce an optimal solution for an industrial microgrid in Norway under different pricing schemes, including day-ahead pricing and peak pricing. It addresses the peak shaving and price arbitrage challenges by feeding historical data into the algorithm and making decisions according to the energy consumption pattern, battery characteristics, PV production, and energy price. The RL-based approach is implemented in Python based on real data from the site and combined with MATLAB-Simulink to validate its results. The application of the RL algorithm achieved an average monthly cost saving of 20% compared with logic-based optimization. These findings contribute to the digitalization and decarbonization of energy technology and support the fundamental goals and policies of the European Green Deal.

1. Introduction

Clean energy sources such as hydropower, wind energy, and solar energy are gradually replacing conventional energy sources based on fossil fuels and coal. This shift is driven by the environmental imperative to become sustainable and reduce carbon emissions, as well as by economic and technological progress. As the world moves toward more sustainable solutions, the significance of microgrids and distributed generation, especially those that incorporate renewable energy sources, has grown. There has been a significant shift in how the power system operates, and microgrids have emerged as a new method of managing distributed generation. One definition of the term “microgrid”, according to the US Department of Energy, is “a group of interconnected loads and distributed energy resources within clearly defined electrical boundaries that act as a single controllable entity with respect to the grid” [1]. Industrial microgrids (IMGs) are made up of industrial loads, energy storage systems (ESSs), and renewable energy sources, and have different operational requirements compared with residential microgrids [2,3]. Such microgrids help lower long-distance power transmission losses while simultaneously reducing the pollution from heavy industry [4]. IMGs are an effective instrument for adapting to diverse energy requirements. A battery energy storage system (BESS), for example, may be controlled by a microgrid to provide backup power and enhance the reliability of the IMG [5].
An energy management system (EMS) is used to optimally coordinate the power exchange throughout the IMG and with the main grid, reducing energy costs while improving flexibility and energy efficiency [6,7,8]. Designing and developing EMS algorithms for day-ahead and real-time scheduling is challenging because of the complexity of the microgrid, intermittent nature of DERs, and unpredictable load requirements [6,9]. Battery energy storage systems (BESSs) can be effectively utilized to balance these demands and trade energy with the main grid based on the renewable production and price of electricity.
Energy optimization in industrial microgrids has been extensively studied in the literature. The authors of [10] developed a day-ahead multi-objective optimization framework for industrial plant energy management, assuming that the facility had installed RESs. Ref. [11] created an optimal energy management method for the industrial sector to minimize the total electricity cost with renewable generation, energy storage, and day-ahead pricing using a state task network and mixed-integer linear programming, while [12] presented a demand response strategy to reduce energy costs for industrial facilities using energy storage and distributed generation, investigated under day-ahead, time-of-use, and peak pricing schemes. These studies utilized basic optimization approaches and did not utilize forecasting. On the other hand, ref. [13] introduced an online energy management system (EMS) for an industrial facility equipped with energy storage. The optimization employed a rolling horizon strategy and used an artificial neural network model to forecast electricity prices and minimize their uncertainty. The system solved a mixed-integer optimization problem based on the most recent forecast results for each sliding window, which helped in scheduling responsive demands. Additionally, ref. [14] presented a real-time EMS that used a data distribution service and incorporated an online optimization scheme for microgrids, using residential energy consumption and irradiance data from Florida. It utilized a feed-forward neural network to predict the power consumption and renewable energy generation. A review of energy optimization in industrial microgrids utilizing distributed energy resources is presented in [15]. Furthermore, resource efficiency and resiliency are important aspects of microgrid design as they affect the overall system performance. A comparison between the microgrid stage efficiencies for each mode of operation is presented in [16], while resilience analysis and methods of improving resiliency in microgrids can be found in [17].
Reinforcement learning has emerged as a method to solve complex problems with large state spaces. An RL agent starts with a random policy, which is a mapping between the observations (inputs) and the actions (outputs), and then incrementally learns to update and improve its policy using a reward signal given by the environment as an evaluation of the quality of the action performed. The goal of the agent is to maximize the reward signal over time. This can be achieved using a variety of methods, but there are generally two high-level approaches: learning through the value function and learning through the policy. A value function is an estimate of the future rewards obtained by taking an action and then following a specific policy. RL agents can learn by optimizing the value function, the policy, or both [18]. Actor–critic approaches make use of both the policy and the value function: the critic estimates the value function, and the actor updates the policy guided by the critic’s estimate.
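As background, and purely for reference (this is the standard textbook definition following [18], not an expression specific to the present work), the state-value function that value-based and actor–critic methods estimate can be written as:

```latex
% State-value function of a policy \pi with discount factor \gamma \in [0,1),
% i.e., the expected discounted sum of future rewards when starting in state s.
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right]
```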
Reinforcement learning has been used to optimize microgrids in the residential sector because it has proven to be a viable strategy for optimizing complex dynamic systems [19,20]. For instance, a microgrid in Belgium saw a reduction in cost and an increase in efficiency when the Deep Q-Network (DQN) technique was implemented in [21], assuming a fixed price of electricity. The method suggested in [22] generated three distinct consumption profiles according to the needs of the customers and used the Deep Deterministic Policy Gradient (DDPG) algorithm to produce a very profitable scheme. However, the results were only observed over a few weeks, and, for one of the plans, the battery was simply discharged at the end, which does not demonstrate how the trained algorithm would function over an extended period of time. Several distinct reinforcement learning algorithms were compared over a ten-day period using data from Finland, and an enhanced version of the Advantage Actor–Critic algorithm (A3C++) achieved the best performance [23]. In another instance, ref. [24] reduced the operational cost by 20.75% using RL. More recently, Proximal Policy Optimization (PPO) has emerged as a powerful RL algorithm and was utilized in [25,26] to optimize energy costs in a microgrid with promising results; however, load forecasting was not included.
This paper builds on the existing research framework by combining PPO with machine learning-based load forecasting to produce an optimal solution for an industrial microgrid in Norway under different pricing schemes, including day-ahead pricing and peak pricing. It addresses the peak shaving and price arbitrage challenges by feeding historical data into the algorithm and making decisions according to the pattern of energy consumption, battery characteristics, PV production, and energy price.
The rest of the paper is organized as follows. The microgrid architecture and the components of the microgrid at the industrial site in Norway are discussed in Section 2. In Section 3, the design and workflow of the EMS algorithms are discussed. The results from the algorithms are presented in Section 4, while Section 5 concludes the paper.

2. Microgrid Architecture

The microgrid at the industrial site in Norway is a grid-connected system with 200 kWp of PV generation, a 1.1 MWh battery storage system, a 360 kW electric vehicle charger, and two types of loads. The overall system diagram can be seen in Figure 1. There are several smart meters (denoted by SM) installed to record the energy flow. Load 1 and load 2 are the main electricity loads, where load 1 is an industrial load and load 2 is a smaller load from an existing old building.
The 1.1 MWh battery energy storage system (BESS) is used for backup energy supply and storage. The stored energy can also be sold back to the grid when electricity prices are high. The 360 kW electric vehicle (EV) charger at the facility is used to charge electric lorries and trucks.

2.1. PV System

The PV system is distributed over three buildings. The south building has a facade-mounted configuration with 44 panels of 310 W each, while the southeast building is equipped with 96 roof-mounted modules with an 11° inclination. Similarly, the northwest building is configured with 74 solar panels with an 11° inclination facing northwest. The PV system also contains three inverters to couple it with the IMG. Based on the irradiance in the area, the anticipated PV energy production throughout 2024 was calculated using PVSOL software (Version: PVSOL premium 2024 (R2)), and the results are displayed in Figure 2. Table 1 shows the general parameters of the PV system.

2.2. Battery Energy Storage System

The BESS used is a 1.1 MWh container unit equipped with bidirectional inverters, also called a Power Conversion System (PCS). It is outfitted with high-precision sensors to monitor all its internal parameters such as temperature, humidity, voltage, and current, and protect against overcharging, flooding, or fire. This is achieved using a series of logical interlocks and a mix of hardware and software safeguards. The battery and inverter specifications are given in Table 2.
An essential component of controlling the energy transfer between the battery storage system and the electrical grid is the bidirectional inverter or Power Conversion System (PCS). Its primary job is to charge the batteries by converting alternating current (AC) from the IMG into direct current (DC), and vice versa. For applications such as peak shaving, where excess energy is kept during low demand times and released at peak demand to sustain grid operations, this bidirectional capability is essential.
The inverter or PCS can operate in both grid-tied and off-grid modes. The system is adaptable to a range of energy storage requirements since it can handle a broad battery voltage range of 600 V to 900 V, generate up to 500 kW of nominal power, and support up to eight battery strings. It has an efficiency above 97%. For efficient thermal management, the PCS unit uses forced-air cooling, which ensures peak performance even at full load. The inverter specifications are displayed in Table 3.
In addition to the main components, the system also contains other IoT devices, smart meters, a GPC (Grid Power Controller), etc. These devices function as a gateway to the battery system so that it can be controlled through software. They run Linux (Ubuntu 22.04.4 LTS) and use the Modbus TCP protocol [27] to communicate with local or remote servers and to send data to the cloud, as shown in Figure 3.
Figure 1 illustrates the four smart meters in the industrial microgrid, out of which SM1 is a virtual smart meter while SM2, SM3, and SM4 are the physically present meters connected to the loads and DERs. These smart meters measure apparent power, active power, and reactive power using the true RMS value measurement (TRMS) up to the 63rd harmonic in all four quadrants [28].

3. Energy Management System

The basic block diagram of the energy management system is shown in Figure 4. It receives the measurements from the IMG, processes all the data, and uses different optimization algorithms to produce energy dispatch commands that are sent back to the IMG. These algorithms are explained in the following sections.

3.1. Data Acquisition and Processing

The EMS development steps are shown in Figure 5. The first step to developing an energy management system is to collect data from different components such as PV, battery storage system, grid, etc. The data can be collected using various sources such as smart meters, data loggers, a database or cloud system, or publicly available API services. The PV irradiance data were taken from PVSOL simulation software. Other important data to be read for the EMS development were the consumption data from the loads present at the industrial site and the grid import. Since the area is primarily a manufacturing site, the majority of its load or consumption is from heavy machines used for manufacturing. The load values and grid import values were collected using the Phoenix Contact smart meter [28].
The energy price data were collected from the ‘www.hvakosterstrommen.no’ website [29] (accessed on 15 February 2024). This website provides an open and free API to retrieve Norwegian electricity prices along with historical data. The prices are collected from ENTSO-E in euros and converted to the local currency using the latest exchange rate [30]. The ENTSO-E Transparency Platform centralizes and publishes data and information on electricity generation, transmission, and consumption for the benefit of the whole European market [30].
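For illustration, a minimal sketch of retrieving hourly prices from this API is shown below. The URL pattern and the "NOK_per_kWh"/"time_start" field names are assumptions based on the site’s publicly documented API at the time of writing, and the price area "NO1" is only an example; they are not taken from the paper itself.

```python
# Hedged sketch: fetch one day of Norwegian day-ahead prices from hvakosterstrommen.no.
import requests

def fetch_hourly_prices(year: int, month: int, day: int, area: str = "NO1"):
    url = (
        f"https://www.hvakosterstrommen.no/api/v1/prices/"
        f"{year}/{month:02d}-{day:02d}_{area}.json"
    )
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Each entry is assumed to hold the hourly price in NOK/kWh and its start time.
    return [(entry["time_start"], entry["NOK_per_kWh"]) for entry in response.json()]

prices = fetch_hourly_prices(2024, 2, 15)
print(prices[:3])
```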

3.2. Data Analysis and Forecasting

The data analysis and forecasting part consisted of four main steps: data preparation and feature engineering, model training, forecasting and adjustment, and compilation and output. The initial step of this process was taking the historical data and arranging them in a specific format, removing outliers and missing values, etc. The data were collected on an hourly basis and were aggregated from different sources including PV production, battery state-of-charge, grid power import, and site load values, as well as the hourly electricity prices. The forecasting process begins with loading historical data, preparing time-based features, and defining features and target variables. A Random Forest Regressor model [31] is trained using these data.
The Random Forest Regressor is a meta-estimator that fits several decision tree regressors on different subsamples of the dataset and averages their predictions to increase accuracy and control over-fitting. The Random Forest structure can be represented conceptually by Equation (1) as follows [32]:
f(X) = \frac{1}{B} \sum_{b=1}^{B} T_b(X;\, \Theta_b) \qquad (1)
where:
  • f(X) is the prediction function of the Random Forest.
  • B is the number of trees.
  • T_b(X; \Theta_b) represents a single decision tree indexed by b, which is a function of the features, X, and random parameters, \Theta_b.
Predictions are adjusted based on PV production before making a forecast for a specific month. Finally, the results are compiled and saved, completing the process.
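A minimal sketch of this forecasting step is given below. The file name and the column names ("hour", "weekday", "month", "pv_kw", "load_kw") are hypothetical placeholders; the actual feature set and hyperparameters used in this work may differ.

```python
# Hedged sketch: train a Random Forest Regressor on hourly site data and forecast load.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("site_history.csv", parse_dates=["timestamp"])  # hypothetical data file

# Time-based feature engineering on the hourly records.
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.weekday
df["month"] = df["timestamp"].dt.month

features = ["hour", "weekday", "month", "pv_kw"]
target = "load_kw"

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(df[features], df[target])

# Forecast using the most recent 24 hourly feature rows (illustrative only).
next_day_features = df[features].tail(24)
load_forecast = model.predict(next_day_features)
print(load_forecast)
```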
Figure 6 shows the forecasted grid import (blue line) and site load (green line) of the site. The grid import becomes negative as the year progresses because PV production increases after March, and the surplus energy is supplied to the grid.

3.3. Logic-Based Optimization

A logic-based optimization algorithm was developed to serve as a benchmark; the flowchart of the algorithm is displayed in Figure 7. The energy price and battery SOC play an important role in the optimization process. The system starts by measuring the following parameters: the power generated by the PV system (P_PV), the load/consumption (P_load), the cost of energy (E_cost), the power imported from the grid (G_i), and the initial state of charge of the battery (SOC_init). The viability of using stored energy versus grid energy is then evaluated based on economic factors, such as whether the current cost of energy is less than a predetermined minimum (E_min).
If the cost of energy is unfavorable, the system will not charge the battery, even when the SOC is below the maximum threshold (SOC_max), to avoid expensive energy purchases. Upon reaching a certain power threshold (P_thres), the system determines whether it is necessary to use the grid to satisfy the energy requirements. Battery health is preserved by keeping the battery SOC above a minimum allowable level (SOC_min); the algorithm will only discharge the battery if the SOC is above SOC_min. To maximize both economic and energy efficiency, the system additionally incorporates logic to use energy from the PV system directly for the load or to charge the battery with any excess generation.
For peak shaving, the algorithm uses an energy management technique called “dynamic peak shaving”, which is used to lower the greatest power demand or load in the system throughout the day. By setting a peak shaving threshold, the power demand, or grid import per hour, is kept below a certain level. This is accomplished using a battery storage system to supplement the grid supply during times of high demand. Dynamic peak shaving aims to minimize energy expenses, prevent peak demand charges, and lessen the burden on the electrical system. The peak shaving threshold is dynamically determined using the maximum load estimate for each day. This algorithm is intended to run on a daily basis.
The given equations describe the battery’s charging and discharging operations. The charging equation limits the charge added to the battery by either the maximum charge rate or the remaining capacity adjusted for efficiency. Similarly, the discharging equation limits the energy discharged by either the maximum discharge rate or the current storage level adjusted for efficiency. These equations ensure the battery operates within its physical and efficiency constraints, optimizing its performance.
\text{Charge}_t = \min\left(\text{Charge Rate},\ \frac{\text{Max Capacity} - \text{Battery Storage}_t}{\text{Efficiency}}\right) \qquad (2)
\text{Discharge}_t = \min\left(\text{Discharge Rate},\ \text{Battery Storage}_t \times \text{Efficiency}\right) \qquad (3)
The algorithm determines the battery’s charge, discharge, or hold state each hour based on the site load and projected energy price. It charges the battery when prices are low, ensuring it does not exceed the maximum SOC, and discharges when prices are high or when the site load surpasses the dynamic peak shaving level, maintaining the SOC above the minimum. If neither condition is met, the battery remains in the “hold” state. The algorithm adjusts the battery’s SOC and power output based on these decisions, ensuring the SOC stays within operating limits and optimizing battery usage for cost and load needs. This process preserves battery efficiency and lifespan while managing energy flow.
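As an illustration, a simplified sketch of this hourly decision rule is given below. The threshold names and values (PRICE_LOW, PEAK_THRESHOLD_KW, SOC_MIN, SOC_MAX, MAX_STEP_KWH) are hypothetical placeholders and not the exact parameters used at the site.

```python
# Hedged sketch of the hourly charge/discharge/hold decision in the logic-based EMS.
PRICE_LOW = 0.40           # NOK/kWh below which charging is considered cheap (illustrative)
PEAK_THRESHOLD_KW = 150.0  # dynamic peak shaving level for the day (illustrative)
SOC_MIN, SOC_MAX = 20, 95  # allowed SOC window in percent (illustrative)
MAX_STEP_KWH = 250.0       # energy moved per hour at full power (illustrative)

def decide(price_nok_kwh: float, site_load_kw: float, soc_percent: float):
    """Return ('charge' | 'discharge' | 'hold', energy in kWh for this hour)."""
    # Charge when energy is cheap and there is headroom in the battery.
    if price_nok_kwh < PRICE_LOW and soc_percent < SOC_MAX:
        return "charge", MAX_STEP_KWH
    # Discharge when the load exceeds the peak shaving level or prices are high,
    # provided the battery stays above its minimum SOC.
    if (site_load_kw > PEAK_THRESHOLD_KW or price_nok_kwh > 2 * PRICE_LOW) \
            and soc_percent > SOC_MIN:
        return "discharge", MAX_STEP_KWH
    return "hold", 0.0

print(decide(price_nok_kwh=0.25, site_load_kw=120, soc_percent=60))  # ('charge', 250.0)
```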

3.4. Reinforcement Learning Algorithm

The reinforcement learning algorithm was developed using the same parameters so that its cost savings could be compared with the results of the logic-based optimization. The RL agent was specifically designed to minimize the costs associated with energy and peak load charges. It leverages a reinforcement learning (RL) algorithm [33], Proximal Policy Optimization (PPO) [34], implemented through the Stable Baselines3 library. The first step in developing the RL agent with the PPO algorithm was to build a custom environment on top of the OpenAI Gymnasium framework, a standard for developing and comparing reinforcement learning algorithms. This environment simulates the microgrid and allows the agent to control the battery storage system. The action space is discrete (battery charging, discharging, and holding), and the observation space is continuous, where the state includes normalized values of the forecasted site load, grid import, PV production, and battery SOC.
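A condensed sketch of such an environment is given below. The observation layout, SOC bounds, reward shaping, and step size are simplified illustrations of the idea rather than the exact environment used in this work.

```python
# Hedged sketch of a Gymnasium environment for battery control in the microgrid.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MicrogridEnv(gym.Env):
    """State: [load, grid_import, pv, soc] (normalized). Actions: 0=hold, 1=charge, 2=discharge."""

    def __init__(self, prices, loads, pv, step_kwh=0.1):
        super().__init__()
        self.prices, self.loads, self.pv = prices, loads, pv
        self.step_kwh = step_kwh  # normalized energy moved per hourly step
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.5
        return self._obs(), {}

    def _obs(self):
        # Grid import is left as a placeholder here; in practice it is derived from the data.
        return np.array([self.loads[self.t], 0.0, self.pv[self.t], self.soc], dtype=np.float32)

    def step(self, action):
        delta = {0: 0.0, 1: self.step_kwh, 2: -self.step_kwh}[action]
        self.soc = float(np.clip(self.soc + delta, 0.2, 1.0))
        # Charging costs money at the current price; discharging earns it back.
        reward = -delta * self.prices[self.t]
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), float(reward), terminated, False, {}
```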
The Proximal Policy Optimization algorithm iteratively improves its policy while avoiding large, destabilizing updates. PPO limits undesirable policy changes through the clipped surrogate objective function given in Equation (4) [35]; a small numerical illustration of this objective is given after the notation list below.
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right] \qquad (4)
where:
  • r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} is the probability ratio of the current policy, \pi_\theta, to the old policy, \pi_{\theta_{old}}.
  • \hat{A}_t is an estimator of the advantage function at timestep t.
  • \epsilon is a small value (e.g., 0.1 or 0.2) that defines the clipping range to keep the updates stable [35].
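The following tiny numerical example is written directly from Equation (4) (it is not taken from the paper’s code) and shows how clipping caps the incentive to move the policy ratio far from 1:

```python
# Clipped surrogate term of PPO for a single timestep.
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    return np.minimum(ratio * advantage, np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# With a positive advantage, pushing the ratio above 1+eps yields no extra objective value.
print(clipped_surrogate(ratio=1.5, advantage=2.0))   # 2.4 instead of 3.0
# With a negative advantage, the objective does not reward pushing the ratio below 1-eps.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))  # -0.8 instead of -0.5
```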
The objective of the RL agent is to identify the optimal strategy that reduces power costs while respecting operational limitations such as battery SOC and capacity. The RL agent is then trained for 50 million timesteps. During this process, the environment provides feedback in the form of rewards, which are designed to encourage cost-cutting behavior. For example, the agent is rewarded when it takes advantage of hours with cheap energy prices to charge, and when it minimizes grid usage by discharging during peak costs. It eventually learns how to maximize battery utilization for cost optimization by iteratively improving its policy.
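A hedged sketch of the training and evaluation loop using Stable Baselines3’s PPO implementation is shown below. The environment class refers to the sketch in this section, the data arrays (prices, loads, pv) are placeholders, and the hyperparameters and the shortened timestep budget are illustrative rather than the values used to produce the paper’s results.

```python
# Hedged sketch: train and evaluate a PPO agent on the microgrid environment.
import numpy as np
from stable_baselines3 import PPO

prices = np.random.uniform(0.2, 1.0, size=8760)  # placeholder hourly data
loads = np.random.uniform(0.1, 0.9, size=8760)
pv = np.random.uniform(0.0, 0.8, size=8760)

env = MicrogridEnv(prices, loads, pv)            # the environment sketched above
model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=1_000_000)           # the paper trains for 50 million timesteps
model.save("ppo_microgrid")

# Roll out the learned policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(int(action))
    done = terminated or truncated
```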
Figure 8 shows the general diagram of the PPO algorithm. The actions are evaluated based on the rewards that are generated, to minimize costs and maximize efficiency, and the system iteratively improves its decision-making strategy through continuous training episodes. During training, the agent is used in a simulation to calculate the best course of action (charge, discharge, or hold) at various points in time, given the site load, PV production, grid import, and electricity price as state inputs. A reward signal is calculated from the performance of the RL agent by comparing the operational expenses with and without battery optimization. To quantify the economic advantages of strategic battery management, the costs are computed using the agent’s actions and the current power prices. After the action is carried out and the reward is assigned, the model updates and improves its internal policy by observing the reward and the altered condition of the environment (the next state). This cycle continues until an episode ends, either by reaching a predetermined terminal state or by completing the sequence of states. The agent then resets, moves on to the next episode, and keeps learning until the training session is finalized. Through these recurrent cycles, the RL agent is ultimately intended to learn a policy that reduces energy expenses and earns as much profit as possible by selling the excess energy.

3.5. Grid Pricing Scheme

The pricing scheme of the main grid is taken from Nordpool, the pan-European power exchange market [36]. Two pricing schemes were tested: the normal pricing scheme, in which the hourly price is taken from Nordpool data without any additional costs, and the peak hour pricing scheme, in which, in addition to the normal hourly price, a penalty is charged each month for the highest power consumption in kW. The peak hour pricing information is given in Table 4, where NOK stands for Norwegian krone.
Therefore, the total energy cost depends not only on the consumption in kWh but also on the highest peak in kW per month as it will be added to the cost, as shown in Equation (5).
\text{Total Cost} = E_{kWh} \times \text{Price}_{NOK/kWh} + \text{Peak}_{kW} \times \text{Peak Price}_{NOK/kW/month} \qquad (5)
To help illustrate this point, take for instance the two power profiles displayed in Figure 9. Even though the total energy consumption is the same for both profiles (area under the curve), the cost of the blue profile is higher than the red profile because of the higher peak power consumption that results in additional penalties under the peak pricing scheme.
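A worked example of Equation (5) with illustrative numbers (not taken from the site data): 10,000 kWh consumed at an average of 0.50 NOK/kWh, with a 200 kW monthly peak priced at the summer rate of 35 NOK/kW/month from Table 4.

```python
# Worked example of the peak pricing cost in Equation (5), with illustrative values.
energy_kwh = 10_000
avg_price_nok_per_kwh = 0.50
peak_kw = 200
peak_price_nok_per_kw_month = 35  # summer rate from Table 4

total_cost = energy_kwh * avg_price_nok_per_kwh + peak_kw * peak_price_nok_per_kw_month
print(total_cost)  # 12000.0 NOK: 5000 NOK for energy plus a 7000 NOK peak penalty
```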

3.6. Simulation Approach

The simulation approach involved several steps. Initially, the data were acquired from smart meters, PV production sources, and battery energy storage systems (BESSs). Next, they were processed by removing outliers and handling missing values, standardizing the data and implementing load forecasting. Subsequently, Phasor models and complex models were developed using MATLAB-Simulink (Version: R2023b) to test the energy management system (EMS). Additionally, a Python environment model based on the Markov Decision Process (MDP) was created to train reinforcement learning (RL) agents. Hyperparameter tuning and training took place within the Python environment. Finally, the RL agents were evaluated by testing them both in the Python environment and through co-simulation with MATLAB-Simulink. The overall process is summarized in Figure 10.
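One possible illustration of the co-simulation step is sketched below, assuming the MATLAB Engine API for Python is installed. The Simulink model name "img_model" and the workspace variable names are hypothetical placeholders; data exchange between the RL agent and Simulink is shown via the MATLAB workspace only as one of several possible mechanisms.

```python
# Hedged sketch of co-simulation between Python and MATLAB-Simulink.
import matlab.engine

eng = matlab.engine.start_matlab()
eng.workspace["battery_setpoint_kw"] = 250.0       # action proposed by the trained RL agent
eng.eval("simout = sim('img_model');", nargout=0)  # run the (hypothetical) Simulink model
grid_power = eng.workspace["grid_power_kw"]        # read back a logged result (hypothetical)
eng.quit()
```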

4. Results and Discussion

4.1. Battery Scheduling with Peak Shaving

The peak shaving algorithm is used to obtain an automatic battery charging and discharging schedule. This schedule enables the EMS to control the BESS in a structured, automated way and can be communicated directly to the battery system.
Figure 11 shows the grid import, site load, energy price, battery power, and SOC for a day in July, obtained from the logic-based algorithm for automatic scheduling with peak shaving. Here, dynamic peak shaving logic is used to determine the peak shaving value for each day. Based on the highest anticipated load for a particular day, the dynamic peak shaving algorithm determines the threshold for controlling peak power consumption. The highest anticipated electricity consumption for the day is captured by the variable ‘daily-max-load’, and the peak shaving threshold is then dynamically adjusted using this value. If the daily maximum load is 200 kW or more, the threshold is set at 150 kW; if it is 150 kW or less, the threshold is set at 100 kW. For load projections that fall between these values, the threshold is kept at 150 kW. This approach enables a flexible response to changing load conditions while utilizing the battery storage to its full potential to minimize peak power charges; the rule is summarized in the short sketch below.
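The sketch below is a direct transcription of this threshold rule; the function name is illustrative, and 'daily_max_load' stands for the highest forecasted load (in kW) for the day.

```python
# Dynamic peak shaving threshold, transcribed from the rule described above.
def peak_shaving_threshold(daily_max_load: float) -> float:
    if daily_max_load >= 200:
        return 150.0
    if daily_max_load <= 150:
        return 100.0
    return 150.0  # loads between 150 kW and 200 kW keep the 150 kW threshold

print(peak_shaving_threshold(210))  # 150.0
print(peak_shaving_threshold(140))  # 100.0
```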
It can be observed from Figure 11 that the battery charges when energy prices are low during the day and discharges under two conditions: first, when the consumption or site load exceeds the threshold value, and second, when the energy prices are at their maximum for the day. The algorithm also ensures that the SOC does not fall below 20%.
This shows that the algorithm properly applies demand response by accurately using the battery when the consumption exceeds the threshold value. After this, the battery discharges, and the grid import is restricted to that value for that hour.

4.2. Results Comparison from Algorithms

The developed algorithms were tested and implemented with the hourly data of each month from February 2024 to July 2024. The bar chart in Figure 12 shows the cost savings achieved by the RL algorithm and logic-based optimization algorithm during the six-month period. The RL approach (shown in red) consistently outperforms the logic-based approach (shown in blue).
When comparing the cost reductions achieved between February and July, the RL algorithm produced savings that were, on average, 20% greater than those of the logic-based algorithm. This substantial improvement in cost efficiency highlights the strength of RL as an optimization approach and was achieved through a combination of peak shaving and price arbitrage, both dynamically managed by reinforcement learning. Peak shaving minimizes penalties by reducing the peak load during high-cost periods, while price arbitrage optimizes energy costs by charging the battery during low-cost periods and discharging during high-cost periods. The RL algorithm enhances efficiency by continuously learning and adapting to energy price fluctuations and load demand, ensuring optimal battery operation. These strategies collectively contribute to the significant cost savings.

4.3. Economic Optimization Based on a Peak Pricing Scheme

The peak pricing scenario presents a complex optimization challenge for the reinforcement learning (RL) agent. In this scenario, the agent must not only manage power exchanges with the main grid but also carefully control the peak power drawn from the grid to avoid significant cost increases. This dual objective makes the optimization problem more intricate than under the normal pricing scheme. Two RL algorithms were employed to tackle this challenge: Proximal Policy Optimization (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3). TD3, an enhanced version of the traditional DDPG, incorporates three key improvements: it uses two Q-functions instead of one (“twin”), it updates the policy less frequently than the Q-functions (“delayed”), and it smooths the target policy by adding clipped random noise to the target action [37].
The performance of both algorithms was evaluated through co-simulation with MATLAB-Simulink and a Python-based mathematical model. Figure 13 shows the normalized spot price, SOC, and battery power for both the TD3 and the PPO agent. The results reveal distinct behavioral patterns for each algorithm:
  • TD3 Algorithm:
  • Exhibits more aggressive behavior (higher peaks in Figure 14a,c).
  • Responds quickly to price fluctuations.
  • Discharges rapidly when prices start to increase (Figure 14d).
  • Achieves lower positive peaks in grid power, indicating effective peak shaving (Figure 14c).
  • PPO Algorithm:
  • Demonstrates less aggressive behavior (lower peaks in Figure 14a,c).
  • Responds more smoothly to price fluctuations (Figure 14b).
  • Focuses more on spot price trading rather than peak shaving.
Both algorithms show similar general trends, such as discharging during price peaks (e.g., at 10 and 40 h) and charging during troughs (e.g., at 25 h), as illustrated in Figure 13. However, TD3’s more aggressive approach seems to yield better overall performance, particularly in managing peak power draw from the grid.
The financial flow in this peak pricing scheme can be summarized using the following key points:
  • Spot price trading: both algorithms attempt to capitalize on price differentials by charging when prices are low and discharging when prices are high.
  • Peak penalty avoidance: TD3, in particular, appears to prioritize reducing peak power draw from the grid, which helps minimize the monthly peak penalty.
  • Battery utilization: the algorithms must balance the costs of battery degradation against the potential savings from energy arbitrage and peak shaving.
  • Long-term vs. short-term optimization: the agents must weigh immediate gains from spot price trading against long-term benefits of peak shaving.
The superior performance of TD3 can be attributed to its ability to better balance these competing financial objectives. Its more aggressive behavior allows it to capitalize on short-term price fluctuations while simultaneously managing long-term peak power costs. However, it is important to note that this complex trade-off between spot price trading and peak penalty management requires longer training sessions to optimize effectively. Future research could focus on extending training periods and fine-tuning hyperparameters to further improve performance in this challenging scenario.

5. Conclusions

In this paper, the optimization of an industrial microgrid using logic-based and RL-based algorithms was performed. Load forecasting and simulation validation were carried out, and two algorithms were benchmarked against one another. Notably, the RL algorithm achieved an average monthly cost reduction of 20% compared with logic-based optimization. The RL algorithm effectively manages battery energy storage systems (BESSs) by dynamically adapting peak shaving logic to varying load projections. Battery charging and discharging respond to energy prices and load conditions, ensuring efficient operation. Future research directions include investigating scalability for larger microgrids, and testing robustness under diverse scenarios.

Author Contributions

Conceptualization, S.U. and L.M.-P.; Methodology, S.U. and L.M.-P.; Software, S.U. and I.A.; Formal analysis, I.A. and L.M.-P.; Writing—original draft, S.U. and I.A.; Writing—review & editing, L.M.-P.; Supervision, L.M.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by EEA and Norway Grants financed by Innovation Norway in DOITSMARTER project, Ref. 2022/337335.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BESS    Battery energy storage system
EMS     Energy management system
RES     Renewable energy sources
PCS     Power Conversion System
GPC     Grid Power Controller
PPO     Proximal Policy Optimization
TD3     Twin Delayed Deep Deterministic Policy Gradient
DR      Demand response
MG      Microgrid
IMG     Industrial microgrid
PV      Photovoltaics
ESS     Energy storage system
DERs    Distributed energy resources
RL      Reinforcement learning
EV      Electric vehicle
IoT     Internet of Things
API     Application Programming Interface
TCP     Transmission Control Protocol

References

  1. Department of Energy, Office of Electricity Delivery and Energy Reliability. Summary Report: 2012 DOE Microgrid Workshop. 2012. Available online: https://www.energy.gov/oe/articles/2012-doe-microgrid-workshop-summary-report-september-2012 (accessed on 24 May 2022).
  2. Lu, R.; Bai, R.; Ding, Y.; Wei, M.; Jiang, J.; Sun, M.; Xiao, F.; Zhang, H.T. A hybrid deep learning-based online energy management scheme for industrial microgrid. Appl. Energy 2021, 304, 117857. [Google Scholar] [CrossRef]
  3. Wang, C.; Yan, J.; Marnay, C.; Djilali, N.; Dahlquist, E.; Wu, J.; Jia, H. Distributed Energy and Microgrids (DEM). Appl. Energy 2018, 210, 685–689. [Google Scholar] [CrossRef]
  4. Brem, A.; Adrita, M.M.; O’Sullivan, D.T.; Bruton, K. Industrial smart and micro grid systems—A systematic mapping study. J. Clean. Prod. 2020, 244, 118828. [Google Scholar] [CrossRef]
  5. Mehta, R. A microgrid case study for ensuring reliable power for commercial and industrial sites. In Proceedings of the 2019 IEEE PES GTD Grand International Conference and Exposition Asia (GTD Asia), Bangkok, Thailand, 19–23 March 2019; pp. 594–598. [Google Scholar]
  6. Roslan, M.; Hannan, M.; Ker, P.J.; Begum, R.; Mahlia, T.I.; Dong, Z. Scheduling controller for microgrids energy management system using optimization algorithm in achieving cost saving and emission reduction. Appl. Energy 2021, 292, 116883. [Google Scholar] [CrossRef]
  7. Roslan, M.; Hannan, M.; Ker, P.J.; Uddin, M. Microgrid control methods toward achieving sustainable energy management. Appl. Energy 2019, 240, 583–607. [Google Scholar] [CrossRef]
  8. Pourmousavi, S.A.; Nehrir, M.H.; Colson, C.M.; Wang, C. Real-time energy management of a stand-alone hybrid wind-microturbine energy system using particle swarm optimization. IEEE Trans. Sustain. Energy 2010, 1, 193–201. [Google Scholar] [CrossRef]
  9. Marzband, M.; Sumper, A.; Ruiz-Alvarez, A.; Domínguez-García, J.L.; Tomoiagă, B. Experimental evaluation of a real time energy management system for stand-alone microgrids in day-ahead markets. Appl. Energy 2013, 106, 365–376. [Google Scholar] [CrossRef]
  10. Choobineh, M.; Mohagheghi, S. A multi-objective optimization framework for energy and asset management in an industrial Microgrid. J. Clean. Prod. 2016, 139, 1326–1338. [Google Scholar] [CrossRef]
  11. Ding, Y.M.; Hong, S.H.; Li, X.H. A demand response energy management scheme for industrial facilities in smart grid. IEEE Trans. Ind. Inform. 2014, 10, 2257–2269. [Google Scholar] [CrossRef]
  12. Gholian, A.; Mohsenian-Rad, H.; Hua, Y. Optimal industrial load control in smart grid. IEEE Trans. Smart Grid 2015, 7, 2305–2316. [Google Scholar] [CrossRef]
  13. Huang, X.; Hong, S.H.; Li, Y. Hour-ahead price based energy management scheme for industrial facilities. IEEE Trans. Ind. Inform. 2017, 13, 2886–2898. [Google Scholar] [CrossRef]
  14. Youssef, T.A.; El Hariri, M.; Elsayed, A.T.; Mohammed, O.A. A DDS-based energy management framework for small microgrid operation and control. IEEE Trans. Ind. Inform. 2017, 14, 958–968. [Google Scholar] [CrossRef]
  15. Gutiérrez-Oliva, D.; Colmenar-Santos, A.; Rosales-Asensio, E. A review of the state of the art of industrial microgrids based on renewable energy. Electronics 2022, 11, 1002. [Google Scholar] [CrossRef]
  16. Correia, A.F.; Moura, P.; de Almeida, A.T. Technical and economic assessment of battery storage and vehicle-to-grid systems in building microgrids. Energies 2022, 15, 8905. [Google Scholar] [CrossRef]
  17. Hussain, A.; Bui, V.H.; Kim, H.M. Microgrids as a resilience resource and strategies used by microgrids for enhancing resilience. Appl. Energy 2019, 240, 56–72. [Google Scholar] [CrossRef]
  18. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  19. Arwa, E.O.; Folly, K.A. Reinforcement learning techniques for optimal power control in grid-connected microgrids: A comprehensive review. IEEE Access 2020, 8, 208992–209007. [Google Scholar] [CrossRef]
  20. Mughees, N.; Jaffery, M.H.; Mughees, A.; Ansari, E.A.; Mughees, A. Reinforcement learning-based composite differential evolution for integrated demand response scheme in industrial microgrids. Appl. Energy 2023, 342, 121150. [Google Scholar] [CrossRef]
  21. François-Lavet, V.; Taralla, D.; Ernst, D.; Fonteneau, R. Deep reinforcement learning solutions for energy microgrids management. In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2016), Barcelona, Spain, 3–4 December 2016. [Google Scholar]
  22. Chen, P.; Liu, M.; Chen, C.; Shang, X. A battery management strategy in microgrid for personalized customer requirements. Energy 2019, 189, 116245. [Google Scholar] [CrossRef]
  23. Nakabi, T.A.; Toivanen, P. Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain. Energy Grids Netw. 2021, 25, 100413. [Google Scholar] [CrossRef]
  24. Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning. Energies 2019, 12, 2291. [Google Scholar] [CrossRef]
  25. Lee, S.; Seon, J.; Sun, Y.G.; Kim, S.H.; Kyeong, C.; Kim, D.I.; Kim, J.Y. Novel architecture of energy management systems based on deep reinforcement learning in microgrid. IEEE Trans. Smart Grid 2023, 15, 1646–1658. [Google Scholar] [CrossRef]
  26. Ahmed, I.; Pedersen, A.; Mihet-Popa, L. Smart Microgrid Optimization using Deep Reinforcement Learning by utilizing the Energy Storage Systems. In Proceedings of the 2024 4th International Conference on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 8–10 January 2024; pp. 1–7. [Google Scholar]
  27. ProSoft Technology. Introduction to Modbus TCP/IP; Acromag, Inc.: Wixom, MI, USA, 2024. [Google Scholar]
  28. EEM-MA771—Measuring Instrument. 2024. Available online: https://www.phoenixcontact.com/en-no/products/measuring-instrument-eem-ma771-2908286 (accessed on 10 March 2024).
  29. Hva Koster Strømmen (What Does the Electricity Cost?). 2024. Available online: https://www.hvakosterstrommen.no/ (accessed on 15 March 2024).
  30. ENTSOE. Entso-e Transparency Platform. 2024. Available online: https://transparency.entsoe.eu/ (accessed on 13 March 2024).
  31. RandomForestRegressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 30 May 2024).
  32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  33. Amazon Web Services. What Is Reinforcement Learning? Available online: https://aws.amazon.com/what-is/reinforcement-learning/ (accessed on 30 May 2024).
  34. OpenAI. Proximal Policy Optimization. Available online: https://spinningup.openai.com/en/latest/algorithms/ppo.html (accessed on 30 May 2024).
  35. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
  36. Nordpool Market Data. Available online: https://www.nordpoolgroupqa.com/en/trading/Market-data1/Intraday/Market-data1/Market-data1/Overview/ (accessed on 10 January 2023).
  37. Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596. [Google Scholar]
Figure 1. Overall microgrid system diagram.
Figure 2. Forecasted PV power generation throughout 2024.
Figure 3. Schematic of IoT device communication.
Figure 4. Overview of the energy management system.
Figure 5. EMS development steps.
Figure 6. Forecasted graph of grid import and site load.
Figure 7. Flowchart of the logic-based optimization algorithm.
Figure 8. Workflow of PPO algorithm.
Figure 9. An example illustrating the cost difference under the peak pricing scheme for two consumption profiles with the same total consumed energy (area).
Figure 10. Overview of the simulation approach and steps followed.
Figure 11. Grid import, site load, energy price, battery power, and SOC for a day in July.
Figure 12. Monthly savings results comparison of both algorithms.
Figure 13. Normalized results for the spot price, SOC, and battery. (a) Results with PPO; (b) results with TD3.
Figure 14. Comparison of P_battery, electricity cost, P_grid, and the SOC between TD3 and PPO. (a) Comparison of P_battery; (b) comparison of the electricity cost; (c) comparison of P_grid; (d) comparison of the SOC.
Table 1. General PV system parameters.

Parameters                  Values
PV Generator Output         200.88 kWp
PV Generator Surface        1059.6 m²
Number of PV Modules        648
Number of Inverters         3
PV Module Used              JAM60S01-310/PR
Speculated Annual Yield     87,594 kWh/kWp
Table 2. Battery system specifications.

Parameters                      Values
Battery Type                    LFP lithium-ion
Battery Capacity                1105 kWh
Rated Battery Voltage           768 Vdc
Battery Voltage Range           672–852 Vdc
Max. Charge/Discharge Current   186 A
Max. Charge/Discharge Power     1000 kW
Table 3. Inverter specifications.

Parameters            Values
Rated Voltage         400 V (L-L)
Rated Frequency       50/60 Hz
AC Connection         3 W + N
Rated Power           2 × 500 kW
Rated Current Imax    2 × 721.7 A
Power Factor          0.8–1 (leading or lagging, load-dependent)
Table 4. Summary of the microgrid peak hour pricing scheme.

Peak hour pricing scheme (taken from the highest peak in the month):
  Winter: November–March (84 NOK/kW/month)
  Summer: April–October (35 NOK/kW/month)
Peak hour pricing scheme for reactive power (taken from the highest peak in the month):
  Winter: November–March (35 NOK/kVAr/month)
  Summer: April–October (15 NOK/kVAr/month)
