1. Introduction
Time is organized in terms of years, months, days, and hours. Each unit is distinguished by its scale, and together they form the multiscale phenomena that are part of our daily life. This is an outcome of multiscale dynamics in the solar system [1]. Conventionally, most modeling approaches take a mono-scale perspective. When the macroscale behavior of a system is the focal point, the microscale is modeled through constitutive relations. Conversely, if the microscale is the subject matter, one assumes that nothing of interest occurs at the macroscale. Multiscale modeling helps to overcome the limitations of both approaches by targeting the efficiency of macroscale modeling while preserving the precision of microscale modeling. However, the simultaneous use of several scales and levels of description leads to more complex modeling approaches [1].
Integrating the decision levels of a supply chain is extremely important for improving investment returns. Planning and scheduling are generally performed separately even though they are interdependent. The integration of planning (e.g., design) and scheduling (e.g., operation) improves decision-level management, resulting in lower net costs. Yet integrating different time scales produces large-scale problems that are typically computationally intractable. To address this problem, various modeling and solution approaches have been proposed (e.g., Yilmaz et al. 2019 [2]). The majority are problem-specific and are only valid for short timeframes. Clustering therefore arises as an effective and appropriate approach to this type of problem, since it groups similar inputs such as supply, demand, and price. Input parameters are typically made up of multiple attributes, such as concurrent electricity and heat demand. This approach can significantly reduce the model size and improve computational performance while maintaining solution accuracy.
The task of clustering is to discover the structure in sets of unlabeled data by grouping them into uniform groups. In particular, the dissimilarity between objects within a group (i.e., within one cluster) is minimized while the dissimilarity between groups (i.e., between different clusters) is maximized. An effective clustering is therefore characterized by high similarity/uniformity among the objects within a group and high dissimilarity/heterogeneity between groups. Objects (i.e., events, measurements) are commonly characterized by elements or vectors in a multidimensional space where every dimension identifies a specific quantifiable attribute (variable) that describes the object. Thus, an arranged set of objects can be represented conceptually as an m × n matrix, where m and n denote the number of rows (one per object) and columns (one per attribute), respectively.
Energy hub models can be applied to different spatial scales, (e.g., from a single building to a large geographic region) as well as time scales. Particularly, energy hub modeling could be applied to different time scales from long-term planning (e.g., designing and sizing the energy conversion and storage units) to medium- or short-term planning (scheduling and operation).
Many studies have addressed the optimal scheduling and planning of energy hub systems. For example, the authors of [3] studied the daily scheduling of an energy hub including different generation and storage technologies. Ma et al. [4] proposed a deterministic mixed-integer linear programming (MILP) model aiming to minimize the daily operating costs (including electricity/gas and carbon emissions costs) of an energy hub. Lu et al. [5] proposed a multi-objective optimization framework for the optimal operation of energy hub components considering uncertain household behaviors. Taqvi et al. [6] carried out a study to determine the rooftop renewable energy potential for the optimal design of an EV charging infrastructure using a multi-energy hub approach, but this study did not consider energy storage or a multiscale approach for size reduction. A recent study [7] investigated the effect of demand response and the impact of carbon trading on a multi-objective optimization model that considered the operation cost, energy utilization efficiency, and consumption rate of renewable energy.
In addition, energy hub network operation has been explored in several studies; however, few studies have addressed the design and operation of urban energy systems based on the energy hub concept. Koltsaklis et al. in 2014 [8] studied the optimal design and operation of distributed energy systems (DESs), but the study did not consider renewable energy technologies. Maroufmashat, Sattari, et al. [9] proposed a deterministic model for the design and operation of DESs in urban areas including renewable energy sources. Economic and environmental considerations were investigated, but renewable resource uncertainties were not addressed. Mukherjee et al. [10] proposed a stochastic programming approach for the planning and operation of a power-to-gas energy hub. The study focused on assessing the benefits of power-to-gas energy storage while accounting for the uncertainty of fuel cell vehicles, refueling hydrogen, and the electricity price. Kotzur et al. [11] employed time series aggregation (based on clustering algorithms) to cluster the demand, wind speed, and solar irradiance. The study shows that the application of clustering methods in energy hub optimization significantly reduces the model complexity. A mixed-integer linear programming model for offshore energy hubs was developed in [12]. The authors used different aggregation, clustering, and time sampling methods to address the multi-timescale aspects without reducing the actual demand data; however, the uncertainty of offshore wind was not considered. In 2023, Amry et al. [13] developed a strategy for the optimal sizing of an EV workplace charging station considering PV and flywheel energy storage systems; nevertheless, the work did not consider the uncertainty associated with solar energy.
The aforementioned studies did not consider uncertainty in the distributed renewable energy sources (DRESs), which could lead to inaccurate decisions. Zhang et al. [14] developed a two-stage stochastic model for the optimal design and operation of combined cooling, heat, and power (CCHP) units. The study considered the uncertainties of loads and solar irradiation in different seasons, but it did not include energy storage systems (ESSs). The study in [15] developed a stochastic model for the operation and scheduling of energy hubs considering the uncertainties of the DRESs. The stochastic scenarios were generated using a Monte Carlo simulation and were reduced using the k-means clustering algorithm. However, only three scenarios per uncertain feature were considered in the optimization.
1.1. Research Gap and Contribution
The current literature covers several aspects of energy hub modeling, such as design, operation, greenhouse gas (GHG) emissions minimization, DRESs, ESSs, renewable energy uncertainty, and demand size reduction. However, no study has combined all of these features into a single model formulation. Accordingly, there is a knowledge gap that demands the development of a comprehensive stochastic optimization model that considers design and operation planning, ESSs, and the uncertainties of the DRESs. In addition, there is a need to apply an efficient size-reduction method to the large multi-attribute demand data used as an input to stochastic energy hub modeling. In other words, the main novelty of this work is to combine all of these research aspects into one model and to present a general and practical framework for solving different types of multiscale energy systems with multiple attributes.
Table 1 summarizes the research aspect of previous energy hubs, the research gap, and the current research contribution.
1.2. Research Problem
Typically, planning and scheduling are performed separately even though they are interdependent. However, integrating these different time scales is key to improving efficiency and profit margins, since the integration of planning (e.g., design) and scheduling (i.e., operation) improves decision-level management, which results in lower net costs. The resulting problem, however, is often computationally intractable. For example, a very large and intractable problem is formed if the different time scales of the multiscale energy hub model are converted to the shortest planning period (i.e., detailed scheduling over a long duration). Relaxing some constraints, employing surrogate models, or using an averaging method can reduce the problem size, but these approaches might lead to infeasible operations (i.e., since detailed schedules cannot be obtained to meet the planned production targets) or an inaccurate system design [11,21].
An energy hub modeling approach can establish controllable and flexible energy supply due to its ability to integrate different types of DRESs and energy storage systems (ESSs) [22]. Using DRESs in the energy sector can alleviate the impact of greenhouse gas emissions from other sectors (e.g., agriculture, industry, transportation) and play a key role in pathways towards deep decarbonization [23]. However, due to the uncertainty and variability of DRESs (e.g., solar photovoltaics and wind turbines), the ability of energy hub systems to supply flexible power could be limited [24]. Therefore, modeling the energy hub while considering the uncertainties associated with these sources is crucial.
1.3. Research Motivations and Goals
This study attempts to address the following challenges associated with energy hubs (multiscale decision making, and uncertainty and variability in DRESs) by:
Applying unique general mathematical programming-based clustering methods to reduce the multiple attribute demand data size that has the ability to attain normal and sequence clustering, change the internal clustering measure, and tune attribute weights.
Proposing a statistical method that models the uncertain behavior of renewable energy sources.
Formulating the energy hub system as a two-stage stochastic optimization.
The first goal of the present work was to overcome the problem associated with integrating different scales of an energy hub model by adopting a generic clustering approach. The goal of the clustering approach was to represent the days in a year that exhibit a similar trajectory with a reduced-size typical day (i.e., a representative) of the operating year. With a sufficient number of representative curves, the representatives provide a solution close enough to that of the full-size (high-accuracy) model while maintaining solution tractability.
The second goal of this work was to develop a two-stage stochastic optimization model for the design and operation of an energy hub system with hydrogen storage. A hydrogen storage system was selected due to its flexibility in offering different energy recovery pathways. For instance, hydrogen can be used to produce electricity through a fuel cell, supplied to hydrogen vehicles to meet hydrogen demand, or injected into and distributed through the existing natural gas infrastructure. Two case studies are considered to optimally design and operate the energy hub model: one without restrictions on greenhouse gas (GHG) emissions, and another in which GHG emissions are restricted. A Weibull distribution was used to generate stochastic wind speed scenarios from wind speed data. To test the clustering efficiency, the cluster results were applied to the developed energy hub model and compared with the energy hub model solved with the whole set of data. Finally, the efficiency of the stochastic approach was assessed.
2. Methods: Clustering Algorithms and Stochastic Scenario Generation
The first aim of this work was to tackle the problem arising from the integration of different supply chain decision levels by developing a generic clustering method.
The goal was to decrease the model size by replacing the days of the year with typical days that are characteristic of the operational year. Even though clustering has been used extensively in several applications, the clustering of demand patterns has received little analysis. Demand patterns are particularly complex due to their multidimensional nature, which comprises shape (e.g., the trajectory of hourly demand curves) and time, and they regularly exhibit diverse attributes (e.g., the energy hub’s parallel heat and electricity demand).
Figure 1 shows a conceptual schematic of the proposed clustering approach application to the multiscale decision-making problem. This analysis is based on a mathematical programming approach. For instance, the clustering algorithm for multidimensional attributes is formulated by applying mixed-integer programming (MIP) methods. Due to the complexity of the MILP clustering model, a heuristic size reduction derived from the full-scale MILP clustering approach was proposed to tackle computational issues.
The clustering algorithm proposed in the present work operates on time-series data. Clustering has drawn significant attention given its potential applications in big data processing. The algorithm clusters demand data by considering the time trajectories and shape similarity at once. Hence, time-series data clustering could help reduce the processing difficulties of multiscale modeling. The least absolute value method, i.e., the L1-norm [25,26,27,28,29], was applied to quantify the similarity and preserve the model’s linearity while showcasing the generality of the algorithm.
The proposed clustering approach is an extension of the work previously proposed in [21] that includes the clustering of multiple attributes as an alternative to the traditional single attribute. The weighting method was selected as the multi-objective optimization approach [30] to cope with the problem’s multiple-attribute nature (see Figure 2, which illustrates the Pareto front of a bi-objective problem).
2.1. General Algorithm Formulation
The clustering approach aims at allocating days to clusters with the least dissimilarity. Accordingly, the set of load curves, given over D days and H hours, is grouped into C clusters. Multiple attributes, denoted by index a, are considered within the model formulation. This can be expressed in the following general form:
where the objective is the multi-objective performance criterion to be minimized, and IAE (Integral Absolute Error) denotes the L1-norm employed to measure similarity in the ath attribute. Equation (1) expresses the multi-objective performance criterion over the attributes a as a weighted sum, with one weighting factor per attribute a. Equation (2), in turn, represents the day-assignment constraint, whereby every single day of the year must be allocated to a cluster c. The binary assignment variable indicates whether the load curve of day d joins cluster c. The IAE formulation can be expressed as follows:
where l and c represent the load and clustered curves, respectively. Furthermore, Equation (4) results from applying the trapezoidal rule to Equation (3) as follows:
where the error term denotes the absolute difference between the clustered curve c and the load curve l for the hth hour of day d for attribute a, which can be defined as follows:
Here, the first term denotes the demand load of attribute a for hour h of the dth day, and the second term is the representative demand of attribute a for the hth hour in cluster c. It is worth noting that the model can be adapted to different performance criteria. For instance, the L2 norm can easily be implemented in place of the L1 norm by incorporating the Euclidean distance in Equation (3).
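As a rough illustration of the structure described for Equations (1)–(5), the formulation can be sketched as below. The symbols are assumptions introduced only for this sketch: $w_a$ for the weighting factor of attribute $a$, $y_{d,c}$ for the binary day-assignment variable, $l_{a,d,h}$ for the demand load, $r_{a,c,h}$ for the representative (cluster) demand, $\hat{l}_{a,d}(t)$ for the clustered curve assigned to day $d$, and $e_{a,d,h}$ for the absolute difference. The weight normalization and the unit hourly step in the trapezoidal rule are likewise assumed.

```latex
% Sketch in assumed notation; requires amsmath.
\begin{align}
\min \; z &= \sum_{a} w_a \, \mathrm{IAE}_a,
  \qquad w_a \ge 0, \quad \sum_{a} w_a = 1 \tag{1}\\
\sum_{c=1}^{C} y_{d,c} &= 1 \quad \forall d,
  \qquad y_{d,c} \in \{0,1\} \tag{2}\\
\mathrm{IAE}_a &= \sum_{d=1}^{D} \int_{0}^{H}
  \bigl| l_{a,d}(t) - \hat{l}_{a,d}(t) \bigr| \, dt \tag{3}\\
\mathrm{IAE}_a &\approx \sum_{d=1}^{D} \sum_{h=1}^{H-1}
  \frac{e_{a,d,h} + e_{a,d,h+1}}{2} \tag{4}\\
e_{a,d,h} &= \Bigl| \, l_{a,d,h} - \sum_{c=1}^{C} y_{d,c}\, r_{a,c,h} \Bigr| \tag{5}
\end{align}
```

In this sketch, the product $y_{d,c}\, r_{a,c,h}$ selects the representative curve of the single cluster to which day $d$ is assigned.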
Additionally, the demand data could be sequentially clustered by incorporating a set of constraints based on the string property concept [31]. Sequence clustering could help maintain flexible operations, for instance where similar continuous operations are preferred in order to minimize mode-change or set-up costs. Consequently, the following sets of constraints are included to integrate time-dimension sequencing into the clustering model:
Equations (6)–(8) control the initial, intermediate, and final sequence clusters, respectively.
The proposed general formulation provides the same platform for sequence and normal clustering because both are built on the same algorithmic structure. Nevertheless, the model is a mixed-integer nonlinear program (MINLP) because of the product of two decision variables (shown in Equation (5)) and the absolute value. Hence, the absolute value function is linearized by applying standard linearization methods [32]. Furthermore, the bilinear term is linearized by introducing an extra continuous variable, called the relaxation variable, through a set of constraints [33]. Further details on the linearization approach can be found in [21]. In summary, the model for normal clustering is made up of (1)–(3), whereas sequence clustering is given by (1)–(3) and (6)–(8).
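For orientation, a common textbook way to carry out both linearizations is sketched below. It is not necessarily the exact set of constraints used in [21], [32], or [33]. The sketch assumes that the bilinear term is the product of a binary assignment variable $y_{d,c}$ and a bounded continuous cluster-curve variable $0 \le r_{a,c,h} \le M_a$; the symbols $z_{a,d,c,h}$ (relaxation variable) and $M_a$ (upper bound on the demand of attribute $a$) are assumed names.

```latex
% Sketch in assumed notation; requires amsmath.
\begin{align}
% Absolute value: two inequalities suffice because e is minimized.
e_{a,d,h} &\ge l_{a,d,h} - \sum_{c} z_{a,d,c,h}, &
e_{a,d,h} &\ge \sum_{c} z_{a,d,c,h} - l_{a,d,h} \\
% Bilinear term z = y r, with y binary and 0 <= r <= M_a.
z_{a,d,c,h} &\le M_a \, y_{d,c}, &
z_{a,d,c,h} &\le r_{a,c,h} \\
z_{a,d,c,h} &\ge r_{a,c,h} - M_a \, (1 - y_{d,c}), &
z_{a,d,c,h} &\ge 0
\end{align}
```

When $y_{d,c} = 1$ the constraints force $z_{a,d,c,h} = r_{a,c,h}$, and when $y_{d,c} = 0$ they force $z_{a,d,c,h} = 0$, so the product is represented exactly with linear constraints.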
2.2. Multiple Attributes Heuristic Algorithm for Size-Reduction
Given the complexity of the proposed clustering approach, this subsection proposes a heuristic size-reduction algorithm to handle the issue. The present MILP model can also be applied to long-timeframe planning, including multiple attributes. The heuristic modeling framework retains the linearity and mathematical programming basis of the full-scale clustering model. As shown, the clustering model is mainly composed of two variable types: continuous variables and a discrete variable (the binary day-assignment variable). Accordingly, the algorithm decomposes the original problem into a master problem and a subproblem. The master problem is an MIP that solves for the complicating variables, namely the day assignment, and fixes them to feasible integer values. The subproblem is a linear program (LP) that solves the resulting continuous cluster-curve problem using the fixed integer values from the master problem.
In the master problem, the initial guess clusters were fixed while the optimization algorithm solved for the day assignment. The master problem’s solution was then used to initialize the subproblem, which solved for the cluster curves; with the assignment fixed, the problem reduced to a simple linear program. The algorithm iterated, comparing the upper and lower bound solutions until their difference was within the acceptable range. This structure has been used previously and is a suitable solution method for large-scale mathematical models [34,35]. It is worth mentioning that the upper and lower bounds of the objective function were obtained from solving the master problem and the subproblem, respectively.
Figure 3a,b shows the multiple attributes heuristic algorithm flow diagram. For instance,
Figure 3a illustrates the algorithm execution for a given weight factor combination and each number of clusters. In addition, it shows the execution of the proposed algorithm for a given weight factor to construct the Pareto frontier. After every scenario of a given weight factor has been considered, the procedure moves to the following weight factor repeating the steps until all weight factors are considered.
On the other hand,
Figure 3b shows the execution of the proposed algorithm for each initial guess scenario. This procedure is given as follows:
Step 1. Initialization: Set the number of initial guess scenarios N.
Step 2. Generate random initial guess cluster scenarios: The scenarios are generated using a random uniform distribution between the minimum and maximum demand of each hour for every attribute over the entire demand curve.
Step 3. Initial scenario: Consider scenario n = 1.
Step 4. Master problem solution: Solve for the day assignment given fixed cluster curves to obtain the upper bound of the objective function.
Step 5. Subproblem solution: Solve for the cluster curves given the fixed day assignment from the master problem and obtain the lower bound of the objective function.
Step 6. Convergence check: If the upper and lower bounds agree within the acceptable range, go to Step 7. Otherwise, use the cluster curves obtained from the subproblem as the starting point to solve the master problem again. Repeat Steps 4–6 until convergence is achieved.
Step 7. Next scenario: Record the solution of scenario n and consider the next scenario. Repeat Steps 4–6 until all scenarios have been considered. Then move to the next step.
Step 8. Scenario with minimum objective function: The cluster solution is the one corresponding to the minimum objective function value over all scenarios.
The model can be used for both normal clustering (1)–(2) and (4)–(5), or sequence clustering (1)–(2), (4)–(5), and (6)–(8). The common formulation can be applied to multiple-attribute problems. The models were formulated in the General Algebraic Modeling System (GAMS) [36]. The total numbers of continuous and binary variables in the full-scale clustering model with 2 attributes, of binary variables in the master problem, and of continuous variables in the subproblem are all functions of the total number of days and the total number of clusters.
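The iteration described in the steps above can be illustrated with a minimal Python sketch for normal clustering. It is not the authors' GAMS implementation. It rests on two loudly stated assumptions: with the cluster curves fixed, the master MIP decomposes into choosing the nearest curve for each day; and with the assignment fixed, the L1-optimal cluster curve is the hourly median of the member days. These closed forms stand in for the MIP and LP solves. All function and variable names are hypothetical.

```python
import random
from statistics import median


def l1_distance(day, curve, weights):
    """Weighted L1 dissimilarity between one day's load and one cluster curve.
    Both are dicts mapping attribute name -> list of hourly values."""
    return sum(
        weights[a] * sum(abs(x - r) for x, r in zip(day[a], curve[a]))
        for a in weights
    )


def assign_days(days, curves, weights):
    """Master-problem stand-in (assumption): with curves fixed, each day is
    assigned to its nearest curve."""
    return [
        min(range(len(curves)), key=lambda c: l1_distance(day, curves[c], weights))
        for day in days
    ]


def update_curves(days, assignment, curves, weights):
    """Subproblem stand-in (assumption): with the assignment fixed, the hourly
    median of the member days minimizes the L1 error. Empty clusters keep
    their previous curve."""
    new_curves = []
    for c, old in enumerate(curves):
        members = [day for day, k in zip(days, assignment) if k == c]
        if not members:
            new_curves.append(old)
            continue
        new_curves.append({
            a: [median(m[a][h] for m in members) for h in range(len(old[a]))]
            for a in weights
        })
    return new_curves


def total_error(days, assignment, curves, weights):
    """Objective value: weighted L1 error summed over all days."""
    return sum(l1_distance(day, curves[k], weights) for day, k in zip(days, assignment))


def heuristic_clustering(days, n_clusters, weights, n_scenarios=25, tol=1e-9, seed=0):
    """Run the alternating heuristic from several random initial guesses and
    return (objective, assignment, curves) of the best scenario."""
    rng = random.Random(seed)
    attrs = list(weights)
    hours = len(days[0][attrs[0]])
    lo = {a: [min(d[a][h] for d in days) for h in range(hours)] for a in attrs}
    hi = {a: [max(d[a][h] for d in days) for h in range(hours)] for a in attrs}
    best = None
    for _ in range(n_scenarios):
        # Random initial curves between the hourly minimum and maximum demand.
        curves = [
            {a: [rng.uniform(lo[a][h], hi[a][h]) for h in range(hours)] for a in attrs}
            for _ in range(n_clusters)
        ]
        while True:
            assignment = assign_days(days, curves, weights)          # upper bound
            upper = total_error(days, assignment, curves, weights)
            curves = update_curves(days, assignment, curves, weights)  # lower bound
            lower = total_error(days, assignment, curves, weights)
            if upper - lower <= tol:
                break
        if best is None or lower < best[0]:
            best = (lower, assignment, curves)
    return best
```

A call such as `heuristic_clustering(days, n_clusters=4, weights={"elec": 0.5, "heat": 0.5})` returns the best objective value, the day-to-cluster assignment, and the cluster curves. Sequence clustering would need the additional ordering constraints and is not covered by this sketch.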
The solution quality and time of the full-scale general clustering and size-reduction heuristic algorithms for multiple attributes were examined. The full-scale clustering could not handle the entire year’s heat and electricity demand data using normal clustering: for one year of demand data, no solution was returned after 48 h of CPU time. On the other hand, when using sequence clustering, the solution time was reasonable for the one-year data. In sequence clustering, the extra sets of constraints reduced the size of the feasible region, which resulted in shorter solution times.
For comparison purposes, the two proposed algorithms were tested using reduced datasets (i.e., 20 days) of one year demand data. The runs included 4, 5, and 6 clusters using 20-day demand data for the normal and 365-day demand data for the sequence clustering, respectively (i.e., 6 runs in total). For all runs, the weight factor was set to 0.5 in both attributes. Twenty-five initial guess scenarios were generated in each of the heuristic formulation runs.
Table 2 shows the optimal objective function value and solution time for both clustering methods.
As shown in Table 2, the heuristic approach significantly reduced the CPU time compared with the full-scale model solution. As the number of clusters increased, the heuristic algorithm did not exactly match the full-scale model’s objective function value. Nonetheless, the difference was negligible. For example, for 6 clusters, the IAE difference between the heuristic and full-scale model was less than 1%. Therefore, the heuristic approach outperformed the full-scale model in terms of solution time, especially for large datasets, while remaining close to the full-scale model’s solution.
Accordingly, we implemented the heuristic approach to generate cluster curves for the energy hub case study presented in
Section 3.
2.3. Uncertainty of Wind Speed Modeling
Unfortunately, wind speed is notoriously variable, varying substantially throughout a day, from season to season, and even from year to year. Nonetheless, the Weibull distribution was favorable for describing the wind speed fluctuation at any time interval using two parameters [
37]. This statistical tool reflected how often the winds of different speeds could be seen at a location. This was a widely used method in both industry and academia. Therefore, the Weibull distribution was used to fit one-year wind speed data [
36]. The wind data corresponded to the measured wind speeds from the Waterloo region in 2018, collected from the National Solar Radiation Data Base [
38]. The maximum likelihood method (MLM) was used to fit a Weibull distribution to measured wind speed data [
39]. Accordingly, the best-fit Weibull distribution of the available data was achieved.
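Figure 4 shows the best-fit Weibull distribution and variable wind speed probability. The probability can be estimated as follows:
$$f(v) = \frac{k}{\lambda}\left(\frac{v}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{v}{\lambda}\right)^{k}\right]$$
where $v$ is the wind speed, $k$ denotes the shape, and $\lambda$ is the scale.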
The Weibull distribution allows for estimating the probability of occurrence of a given wind speed. Accordingly, stochastic scenarios can be generated, each with its own occurrence probability (see Figure 5). The probability that the wind speed lies between two points is given as follows:
where the cumulative distribution function is evaluated at the upper and lower wind speed limits of each stochastic scenario. The inverse cumulative Weibull distribution returns the wind speed at a given probability of occurrence. Additionally, the upper and lower wind speed limits of each stochastic scenario can be calculated as follows:
In order to obtain scenarios with equivalent probabilities (i.e., matching areas under the probability density function curve), each scenario is assigned a probability equal to 1/S, where S represents the total number of scenarios. The upper and lower limits of the wind speeds for each scenario can be calculated as follows:
To avoid infinite values, Equation (13) is used to impose a 99% confidence interval, given that the inverse cumulative distribution tends to infinity as the cumulative probability approaches one.
Figure 5 illustrates the fitted Weibull distribution curve, where each shaded area under the probability density function represents a single stochastic scenario with a probability of 1/S.
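The equal-probability scenario construction can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' code. It assumes the standard two-parameter Weibull cumulative distribution, takes the limits of scenario s at cumulative probabilities (s − 1)/S and s/S, and caps the last upper limit at a cumulative probability of 0.99. The function names, the `cap` parameter, and the tuple layout are hypothetical.

```python
import math


def weibull_cdf(v, shape, scale):
    """Cumulative probability that the wind speed is at most v."""
    return 1.0 - math.exp(-((v / scale) ** shape))


def weibull_inverse_cdf(p, shape, scale):
    """Wind speed at cumulative probability p (0 <= p < 1)."""
    if p >= 1.0:
        raise ValueError("the inverse CDF is unbounded at p = 1")
    if p <= 0.0:
        return 0.0
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def equal_probability_scenarios(shape, scale, n_scenarios, cap=0.99):
    """Return a list of (lower, upper, probability) tuples, one per scenario.

    Assumption: scenario s spans cumulative probabilities (s-1)/S to s/S,
    and the last upper limit is capped at `cap` to avoid an infinite speed.
    """
    if (n_scenarios - 1) / n_scenarios >= cap:
        raise ValueError("too many scenarios for the chosen cap")
    scenarios = []
    for s in range(1, n_scenarios + 1):
        p_low = (s - 1) / n_scenarios
        p_high = s / n_scenarios if s < n_scenarios else cap
        lower = weibull_inverse_cdf(p_low, shape, scale)
        upper = weibull_inverse_cdf(p_high, shape, scale)
        # Each scenario keeps the nominal probability 1/S.
        scenarios.append((lower, upper, 1.0 / n_scenarios))
    return scenarios


if __name__ == "__main__":
    # Illustrative parameters only; not the fitted values from the study.
    for lower, upper, prob in equal_probability_scenarios(2.0, 6.0, 5):
        print(f"{lower:6.2f} to {upper:6.2f} m/s, probability {prob:.2f}")
```

How a single representative wind speed is chosen inside each interval (for example the midpoint or the conditional mean) is left open here, since the text does not specify it.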
3. Case Study: Multiple Attributes Clustering Application to Energy Hubs
The present case study illustrates the application of the clustering algorithms to energy demand data including multiple attributes, and data-driven statistical methods to represent the intermittent behavior of uncertain wind speed data. Additionally, the energy hub design and operation were formulated as a multiscale model with multiple attributes. This was done by agglomerating demand data with similar profiles and generating stochastic scenarios for a two-stage stochastic model considering uncertainty in the wind data. In addition, the impact of clustering on the solution accuracy was investigated. It has previously been determined that clustering considerably helps to reduce the computation time.
The present case study evaluated the outputs of the proposed sequence and normal clustering algorithms against the full-size energy hub design and operating model under wind speed uncertainty with multiple demand attributes. The energy hub is a strategic (long-term) and medium-term decision-level problem. The aim was to minimize the total annual cost of designing (installing and sizing) and operating the energy hub while meeting the energy demands. There are numerous models available in the literature for the energy hub problem, from mathematical programming to heuristics. The present case study adopts the MILP model of [40].
The present energy hub system aimed to minimize the annual operational and maintenance cost, as well as the capital cost while meeting electricity and heat demands within the units’ operating capacities and physical constraints.
In this paper, the authors consider both DRESs (distributed renewable energy sources) and DERs (distributed energy resources) based on fossil fuels. The current energy hub system includes a variety of conventional energy conversion technologies powered by natural gas, such as combined heat and power (CHP) units and boilers, and a non-conventional energy conversion technology (i.e., wind turbines) powered by renewable energy resources. Additionally, it uses a system that produces hydrogen from electricity and stores it, consisting of an electrolyzer, hydrogen tank, and fuel cell, as the ESS. We chose a hydrogen storage system because it can play a role in both storing energy and supplying the hydrogen demand of hydrogen vehicles.
Figure 6 shows the energy hub layout with the considered energy technologies and input data handling (wind speed, electricity, and heat demand).
The optimization program decides the number of each unit and the respective capacity within the energy hub system, as well as the operating points for the electrolyzers, hydrogen tanks, fuel cells, boilers, and CHP generators at each time point. Particularly, in this paper, the discrete size of each technology was considered in the optimization, which made this work more realistic. The number and type of each technology chosen was a design decision variable while the operating variables were related to how the energy hub units were running. The main outputs of the optimization model can be summarized as follows:
Type and number of energy conversion and storage technologies within the hub.
Design and operation of optimal energy hub under intermittent wind energy availability, and based on multiple attributes aggregated demand data or full-size demand data.
Economic cost of the system including capital, operation and maintenance, and fuel consumption.
Environmental impact of the system through the GHG emissions.
Natural gas is the fuel for both the boilers and the CHP units. As illustrated, the electricity demand is met by the CHP generators, wind turbines, and fuel cell, whereas the heat demand is met by the boilers and CHP units. The energy generation technologies and their technical and economic properties are listed in Table 3. This model is a general framework for microgrid/energy hub systems in which different technologies can easily be added or removed, according to the problem that needs to be solved.
The mathematical model was formulated as a two-stage stochastic program with recourse, where the first-stage decisions determined the design of the system, including the number of each unit and the respective capacity within the microgrid, while the second-stage decisions planned the operation of the system, including the operating points for the electrolyzer, fuel cell, CHP units, and boilers at each time point. The two-stage stochastic recourse formulation (referred to as the recourse problem, RP) was essentially a bi-level optimization formulation whose inner optimization problem mimicked the second-stage planning process. Due to this special structure, two-stage stochastic programs can be naturally reformulated into an equivalent single-level optimization problem. Therefore, the single-level optimization formulation of the RP for the design and operation of the energy hub system could be directly written as follows:
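As a generic illustration of this reformulation, the textbook form of a two-stage recourse problem and its single-level equivalent is sketched below. The notation is assumed and is not that of the model adopted from [40]: $x$ collects the first-stage design decisions, $y_s$ the second-stage operating decisions under wind scenario $s$, $\pi_s$ the scenario probability, and $c$, $q$, $T_s$, $W$, $h_s$ are hypothetical cost vectors and constraint data.

```latex
% Sketch in assumed notation; requires amsmath and amssymb.
\begin{align}
% Nested (two-stage) form: design first, then operate under scenario xi.
\min_{x \in X} \;& c^{\top} x + \mathbb{E}_{\xi}\bigl[ Q(x,\xi) \bigr],
\qquad
Q(x,\xi) = \min_{y \in Y} \bigl\{ q^{\top} y \;:\; W y \ge h(\xi) - T(\xi)\, x \bigr\} \\
% Single-level equivalent over a finite scenario set s = 1,...,S.
\min_{x,\, y_1,\dots,y_S} \;& c^{\top} x + \sum_{s=1}^{S} \pi_s \, q^{\top} y_s \\
\text{s.t.} \;& T_s\, x + W y_s \ge h_s, \quad y_s \in Y \quad \forall s,
\qquad x \in X
\end{align}
```

With equal-probability wind scenarios, $\pi_s = 1/S$ for every $s$.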
Table 3.
Technical and economic information about the energy conversion and storing technologies.
Unit | Rated Capacity (kW) | Input Energy Form | Output Energy Form | Efficiency | Capital Cost | Operating and Maintenance Cost ($/kW) |
--- | --- | --- | --- | --- | --- | --- |
Boiler [9] | 530 | kW fuel HHV | kW heat | 0.82 | 100 ($/kW) | 0.027 |
Boiler [9] | 300 | kW fuel HHV | kW heat | 0.9 | 120 ($/kW) | 0.027 |
Boiler [9] | 100 | kW fuel HHV | kW heat | 0.8 | 150 ($/kW) | 0.027 |
CHP [9] | 300 | kW fuel HHV | kW power | 0.26 | 900 ($/kW) | 0.016 |
CHP [9] | 300 | kW fuel HHV | kW heat | 0.44 | | |
CHP [9] | 100 | kW fuel HHV | kW power | 0.35 | 1080 ($/kW) | 0.016 |
CHP [9] | 100 | kW fuel HHV | kW heat | 0.5 | | |
CHP [9] | 60 | kW fuel HHV | kW power | 0.31 | 1200 ($/kW) | 0.0111 |
CHP [9] | 60 | kW fuel HHV | kW heat | 0.56 | | |
Wind turbine [41] | 20 | kW available by air | kW power | 0.4 | 2200 ($/kW) | 0.008 |
Wind turbine [41] | 30 | kW available by air | kW power | 0.42 | 1906 ($/kW) | 0.008 |
Storing units | | | | | | |
Electrolyzer [9] | 290 | kW power | kg of H2 | 0.0193 | 155,051 ($/unit) | 0.06 |
Fuel Cell [42] | 250 | kg/h of H2 | kW power | 16.5 | 210,630 ($/unit) | 0.06 |
Key case study inputs, such as the heat and electricity demand and the number of cluster curves, were generated previously and can be found in Section S1 of the Supplementary Information. Moreover, the objective cost function was multiplied by a frequency parameter (see Equation (14) for details), which allowed the original demand dataset (i.e., one-year time horizon) to be compared with the clustered cases. This parameter denotes the number of repetitions (frequency) of the dth day. It is equal to one when the original demand data is used, and equal to the number of days that a cluster represents when the representative cluster curves are used. For instance, if cluster 1 represents 45 days, its corresponding frequency parameter is equal to 45.
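The role of the frequency parameter can be illustrated with a minimal sketch. The daily costs and cluster sizes below are hypothetical values chosen only for illustration, and the function name `annual_cost` is not taken from the paper; the sketch simply weights each representative day's cost by the number of days it stands for.

```python
# Hypothetical operating costs ($/day) of three representative days and the
# number of original days each cluster stands for (assumed values).
daily_cost = [1200.0, 950.0, 1430.0]
frequency = [45, 200, 120]  # adds up to the 365 days of the year


def annual_cost(daily_cost, frequency):
    """Sum each representative day's cost weighted by its repetitions."""
    if len(daily_cost) != len(frequency):
        raise ValueError("one frequency is needed per representative day")
    return sum(f * c for f, c in zip(frequency, daily_cost))


# Clustered case: each representative day is repeated 'frequency' times.
clustered_total = annual_cost(daily_cost, frequency)

# Original data: every day appears exactly once, so each frequency is 1.
unit_frequency_total = annual_cost(daily_cost, [1] * len(daily_cost))
```

With a frequency of one for every day, the same expression reduces to the plain sum over the days of the original dataset, which is what makes the clustered and original objective values comparable.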
For future reference, the energy hub with hydrogen storage considering hourly electricity and heat demand loads for an entire year (i.e., full-size) was designated as the original model. On the other hand, the energy hub with hydrogen storage considering 4, 5, and 6 hourly load clusters (i.e., clusters regarded as days) was designated as the clustered model (see Section S1 of the Supplementary Information for more details). Moreover, Table 4 lists the weight factor combinations employed to construct the Pareto frontier.
This case study comprised four different scenarios. Each scenario considered a particular operating or environmental constraint under which the proposed clustering algorithms and data-driven methods were tested and evaluated. Accordingly, the following four subsections present the results and discussions of each of these scenarios.
3.1. Baseline Scenario
This scenario considered the energy hub operation under unconstrained GHG emissions. Figure 7 shows a comparison of the objective function values, along with their corresponding relative errors, for the clustered and original (optimal) models. As shown in the figure, all clustered cases underestimated the objective function value compared with the original case. The objective function values were closer to the original model in normal clustering than in sequence clustering. The relative error of the clustered model's objective function (i.e., compared with the original model) ranged between −4% and −10%. Moreover, the higher the number of clusters, the better the quality of the solution for both sequence and normal clustering; that is, the solution gap between the clustered and original cases became smaller.
Regarding the relative errors, the average relative error over all weight factors (see Table 4) is included in Figure 7. The errors decreased as the number of clusters increased. The clustered model's objective function values did not vary significantly as a function of the weight factors because the electricity and heat demands featured similar symmetry. The bar chart (logarithmic y-axis) in Figure 7 displays the average solution time over all weight factors for each clustered case (4, 5, and 6 clusters) versus the original model solution time. As shown in the figure, the clustered model significantly outperformed the original model in terms of the solution time. The clustered model's average solution time was two orders of magnitude smaller than that of the original model (i.e., ~7000 s).
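As a brief illustration of how such relative errors and their average over the weight factors can be computed, consider the sketch below. The objective values are hypothetical placeholders (the actual values appear in Figure 7), and the sign convention, negative when the clustered model underestimates, is assumed from the text.

```python
def relative_error(clustered, original):
    """Signed relative error in percent; negative means underestimation."""
    return 100.0 * (clustered - original) / original


# Hypothetical objective values ($/year), for illustration only.
original_objective = 1_000_000.0
clustered_by_weight_factor = [930_000.0, 940_000.0, 950_000.0]

errors = [relative_error(c, original_objective)
          for c in clustered_by_weight_factor]

# Average relative error over the weight factors for one cluster count.
average_error = sum(errors) / len(errors)
```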
In addition, we examined the effect of applying the multiscale clustering approach to the demand data on the energy hub design. Figure 8 showcases the design decision variables of each energy hub unit and the total installed heat generation and electrical power capacity. This is presented for the clustered and original models for the weight factors 1, 4, and 8, using sequence and normal clustering. The figure illustrates that the higher the number of clusters was, the closer the clustered model's design decision variables were to those of the original model. Likewise, the installed generation capacities followed similar trends. It is worth noting that, overall, the weight factor 1 showed slightly better results for normal clustering because it tended to align better with the heat demand. Due to the high fluctuations in the heat demand, an improved design (i.e., closer to the original model) was attainable by prioritizing the heat demand, which minimized the errors caused by cluster variability.
As shown in Figure 8, the clustered cases' installed capacities for power and heat generation were generally underestimated. Specifically, the power capacity was underestimated by a lower margin than the heat capacity compared with the original model. This was because the total heat production rate was allowed to exceed the demand (if necessary), whereas an equality constraint was imposed on the power balance to satisfy the electricity demand. Generally, changing the weight factors had a steady effect on the installed capacity of power and heat as the priority switched from heat to electricity. This could be the result of the heat and electricity demands featuring similar symmetry throughout the whole horizon.
3.2. Environmental Scenario (CO2 Emissions Regulation)
The previous results showed that the optimization favored non-renewable energy sources, and not a single wind turbine or storing unit (e.g., electrolyzer and/or fuel cell) was installed. This was the result of the traditionally higher costs of renewables compared with fossil fuels. Nonetheless, renewables are cleaner alternatives which can be integrated with current energy hubs and microgrid systems to mitigate GHG emissions (i.e., CO2, CH4, NOx). To analyze the energy hub design under CO2 emission regulations, a carbon constraint was introduced in the energy hub mathematical model as follows:
In this constraint, the annual CO2 emission mass generated by the energy hub, computed with Ontario's natural gas emission factor (0.187 kg/kWh), is bounded by the CO2 emissions limit. Only emissions from the operation of fossil fuel units (boilers and CHP) were considered, whereas the operating emissions associated with renewable units (i.e., wind turbines) and storage units were considered negligible by comparison.
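A minimal sketch of how such an emissions limit can be evaluated is given below. Only the emission factor of 0.187 kg/kWh comes from the text; the function names, the fuel consumption figures, and the frequency weighting of representative days are assumptions made for illustration and do not reproduce the paper's own constraint.

```python
EMISSION_FACTOR = 0.187  # kg CO2 per kWh of natural gas (value quoted in the text)


def annual_emissions(fuel_kwh_per_day, frequency):
    """Annual CO2 mass (kg) from fossil units (boilers and CHP).

    Each representative day's fuel use is weighted by the number of
    days it stands for (assumed weighting, hypothetical data).
    """
    total_fuel = sum(f * q for f, q in zip(frequency, fuel_kwh_per_day))
    return EMISSION_FACTOR * total_fuel


def within_limit(fuel_kwh_per_day, frequency, limit_kg):
    """True when the annual emissions do not exceed the imposed limit."""
    return annual_emissions(fuel_kwh_per_day, frequency) <= limit_kg


# Hypothetical natural gas use (kWh/day) of two representative days.
fuel = [10_000.0, 8_000.0]
days = [165, 200]

unconstrained = annual_emissions(fuel, days)
# A limit set 20% below the unconstrained emissions is violated by this
# operating plan, so the optimizer would have to change the design.
feasible = within_limit(fuel, days, 0.8 * unconstrained)
```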
A sensitivity analysis was performed to assess the effect of introducing carbon emissions regulations. The analysis consisted of fixing the allowable annual CO2 emissions of the energy hub. Accordingly, Figure 9 shows the energy hub's total annual cost and installed wind turbine units as a function of the CO2 emissions. The figure was generated using the clustered model for 6 clusters and a weight factor of 4 (i.e., equal emphasis on heat and electricity data) using normal and sequence clustering. Both conditions better represented the entire year demand data as they featured lower IAE. The upper CO2 emission limit coincided with the lowest annual cost. At this point, the emission constraint was inactive and no wind turbines were installed.
On the other hand, when the emissions limit was reduced, the objective function (total annual cost) increased and the optimization selected wind turbines. Accordingly, the greater the CO2 emissions reduction, the higher the number of installed wind turbines and the objective function value. It is worth noting that the trends for normal and sequence clustering were nearly equivalent as a function of the emissions limit. Nevertheless, at higher CO2 emission levels, sequence clustering tended to feature slightly lower objective function values and fewer wind turbines. Conversely, at lower CO2 emission levels, sequence clustering chose to install higher numbers of wind turbines, leading to total costs exceeding those of the normal clustering solution.
Likewise, at lower CO2 emission levels, sequence clustering recommended a greater number of storing units than normal clustering. Emission reductions at high carbon levels only slightly impacted the objective function value. Conversely, further emission reductions at already low CO2 emission levels came with moderate cost increases. These additional costs arose from the extra storage units required to help dispatch wind power more efficiently.
3.3. GHG Emissions Constrained Scenario
This scenario assumed a 20% CO2 emissions reduction from the upper carbon limit (i.e., the baseline scenario, in which the emissions constraint is inactive). The effects of the weight factors and the number of clusters on the objective function value, as well as the relative errors, are illustrated in Figure 10. For instance, when the clusters placed more emphasis on the heat demand (at weight factor 1), the clustered model's objective function values were much closer to those of the original model. This was because, among the utilities, the heat demand exhibited the higher degree of variability. When the clusters prioritized the electricity demand in normal clustering (at weight factor 8), the highest deviation (relative error) with respect to the original model results occurred. The proposed weighted clustering method allows tuning and generating clusters that prioritize some attributes over others.
There was no clear relationship between the weight factors 1 to 8 and solution quality. Nonetheless, sequence clustering exhibited less variability as the priority switched from heat to electricity. In addition, the average relative errors tended to converge. Normal clustering showed slightly lower average errors than sequence clustering. The average solution time over all weight factors for each cluster run (4, 5, and 6 clusters), along with the original energy hub solution time, is displayed in Figure 10. From the bar chart (logarithmic y-axis), it is observed that the time required to solve the clustered cases was notably lower than that of the original model. For instance, the original model's solution time (65,137 s) exceeded the clustered model's average solution time (i.e., 50 to 100 s) by two to three orders of magnitude. It is worth noting that solving the original problem without the CO2 emissions regulation (see Figure 7 for details) was faster by about one order of magnitude.
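The speed-ups quoted above can be checked with a short calculation. The solution times are those reported in the text (65,137 s, 50 to 100 s, and approximately 7000 s); the helper `orders_of_magnitude` is a hypothetical name introduced here only for illustration.

```python
import math


def orders_of_magnitude(slow_seconds, fast_seconds):
    """Base-10 logarithm of the ratio between two solution times."""
    return math.log10(slow_seconds / fast_seconds)


original_constrained = 65137.0    # s, original model with the GHG constraint
original_unconstrained = 7000.0   # s, approximate baseline value from the text
clustered_fast, clustered_slow = 50.0, 100.0  # s, clustered model range

# Gap between the original and clustered models under the GHG constraint.
gap_low = orders_of_magnitude(original_constrained, clustered_slow)
gap_high = orders_of_magnitude(original_constrained, clustered_fast)

# Slow-down of the original model caused by the emissions constraint.
constraint_penalty = orders_of_magnitude(original_constrained,
                                         original_unconstrained)
```

The resulting values are roughly 2.8 and 3.1 for the clustered comparison and slightly below 1 for the effect of the emissions constraint on the original model.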
On the other hand, once the GHG emissions constraint was active, the optimization algorithm chose storing and wind turbine units to keep emissions within the desired target. Accordingly, additional non-zero variables (e.g., continuous variables associated with power flow to or from storing units, hydrogen flow rates, and power directed from wind turbines, as well as binary on/off variables for charging and discharging storing units) were handled by the optimization algorithm. This significantly increased the degree of complexity of the original model. By comparison, there was no notable difference in solution time between the clustered cases with and without the environmental constraint.
The effects of the number of clusters and the weight factors on the design decision variables under a GHG emissions constraint are displayed in Figure 11, Figure 12 and Figure 13. For example, Figure 11 shows a comparison between the fossil fuel units' design variable values of the original and clustered models as a function of the optimization runs for weight factors 1, 4, and 8. As illustrated in the figure, the weight factors had no significant effect on the design decision variable values. As in the previous scenarios, the higher the number of clusters, the closer the solutions were to the original model (see Section S3.2 of the Supplementary Information for more details). Furthermore, all optimized clustered scenarios avoided the selection of CHP300 and boiler530 units. This aligned with the original model's results, as these units are the largest in size and the greatest carbon emitters.
Figure 12 shows the number of wind turbines versus the clustered runs for the weight factors 1, 4, and 8 (along with the original model results). As illustrated in the figure, the larger the number of clusters was, the smaller the gap was in the number of wind turbines between the clustered and original model results. It is also worth noting that at the weight factor 8 for 4- and 5-normal clustering, the number of wind turbines was overestimated by a larger margin. This was consistent with the high error in the objective function value illustrated in Figure 10.
Figure 13 shows the storing units' design decision variable values under the GHG emissions constraint using both the clustered and original data. The bar chart shows that for most clustered scenarios, the storing units were in very good agreement with the original model. However, some of the sequence clustering results overestimated the number of hydrogen tanks needed.
This scenario also examined the quality of the operational decisions obtained with the proposed multiscale clustering approach for the demand data. Figure 14 and Figure 15 illustrate the energy hub's utility production rates (clustered model with weight factors 1, 4, and 8, and original model) by fossil-powered and wind/storage units, respectively. Each unit's utility production was estimated by adding its corresponding production rates over the year, that is, by summing the production in each stochastic scenario weighted by the scenario's probability.
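The probability-weighted annual production described above can be sketched as follows. The data layout, the numbers, and the weighting of representative days by their frequency are assumptions made for illustration; only the idea of summing scenario results weighted by their probabilities is taken from the text.

```python
def expected_annual_production(rates, frequency, probability):
    """Probability-weighted annual production of one unit.

    rates[s][d][h] is the production rate (kW) in stochastic scenario s,
    representative day d, and hour h (hourly steps, so kW sums to kWh).
    """
    total = 0.0
    for p, scenario in zip(probability, rates):
        for f, day in zip(frequency, scenario):
            total += p * f * sum(day)
    return total


# Hypothetical data: 2 wind scenarios, 2 representative days, and only
# 3 hours per day to keep the example short.
rates = [
    [[10.0, 20.0, 30.0], [0.0, 10.0, 10.0]],   # scenario 0
    [[20.0, 20.0, 20.0], [10.0, 10.0, 20.0]],  # scenario 1
]
frequency = [100, 265]      # days represented by each cluster
probability = [0.25, 0.75]  # scenario probabilities, summing to one

production = expected_annual_production(rates, frequency, probability)
```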
Figure 14 shows that the clustered model’s utility production rates are in very good agreement with those of the full-size original model. There was no significant variation in the CHP units’ heat and electricity rates between the two models. Nevertheless, the boilers’ heat rate relative error was high in sequence clustering.
Figure 15 shows that there was a large degree of error in the electricity rates from wind turbines and fuel cells when comparing the clustered and original model results. This deviation from the original model was accentuated in sequence clustering. Despite these errors, the proposed clustering approach could still be considered a powerful size-reduction tool that reduced the computational time considerably. For example, the design decision variable values of both models were close. Similarly, the heat and electricity production rates (see Figure S13 in the Supplementary Material) from the clustered model were in very good agreement with the original model results, and their corresponding relative errors did not exceed 20%.
3.4. Stochastic Energy Hub Formulation Assessment
The present section assesses the potential benefits of the proposed energy hub model formulation to store energy under uncertain wind speed scenarios. Accordingly, Figure 16 and Figure 17 illustrate the average power transferred to electrolyzers (i.e., charging power) and received from fuel cells in each stochastic scenario, respectively. Both figures illustrate the results of the energy hub model with 6 normal clusters and a weight factor of 4. For simplicity, the average hourly power over a year per scenario is displayed (i.e., the power flow at each hour averaged over all days of the year, per scenario). An increase in the scenario number denoted an increase in the wind speed. As one could expect, Figure 16 demonstrates that the rate of charging (i.e., power transferred to electrolyzers to produce hydrogen) was larger at higher wind speeds. This means that more energy can be stored when more wind energy is available. Additionally, at relatively low demand times, the optimization algorithm decided to store more energy. Conversely, Figure 17 shows that the fuel cells' discharge rate decreased as the wind speed scenario number increased. Most of the power was discharged at peak demand.
To examine the efficiency of the stochastic programming method, the value of the stochastic solution (VSS) was calculated following [43]. The VSS helped in determining whether it was advantageous to fix the stochastic model's first-stage decision variables based on the expected value problem (EV) solution. In other words, the VSS represented the extra cost the decision maker must pay for not considering uncertainty (stochasticity) in the analysis. To estimate the VSS, the solution to the EV must be determined first. In the present study, the EV was the deterministic energy hub model solved with the uncertain parameter (wind speed) fixed at its mean value. In the next step, the first-stage decision variables (design decisions) obtained from the EV were fixed and used as input parameters in the two-stage stochastic energy hub recourse problem (RP). The resulting problem was then solved, which yielded the expected result of using the EV solution (EEV). The EEV provided the second-stage decision variables' solution once the first-stage decision variables were fixed. Accordingly, the VSS was defined as the difference between the EEV and the RP.
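The EV, EEV, RP, and VSS calculation can be illustrated on a deliberately small toy problem. Everything in the sketch below is hypothetical (one design variable, three wind scenarios, made-up costs, and brute-force enumeration instead of a mathematical programming solver); it is not the energy hub model, and it only mirrors the sequence of steps described above.

```python
# Toy data (all hypothetical): choose the number of wind turbines n, then
# buy any power the turbines cannot supply once the wind is known.
DEMAND = 100.0             # power demand to be met
CAPITAL = 10.0             # first-stage cost per turbine
PRICE = 10.0               # second-stage cost per unit of purchased power
WIND = [5.0, 10.0, 15.0]   # output per turbine in each wind scenario
PROB = [0.25, 0.5, 0.25]   # scenario probabilities
DESIGNS = range(0, 41)     # candidate first-stage decisions


def second_stage(n, wind):
    """Recourse cost: purchase whatever the n turbines cannot supply."""
    return PRICE * max(DEMAND - n * wind, 0.0)


def expected_total(n, winds, probs):
    """First-stage cost plus probability-weighted recourse cost."""
    return CAPITAL * n + sum(p * second_stage(n, w)
                             for p, w in zip(probs, winds))


def solve(winds, probs):
    """Enumerate all designs and return the best one with its cost."""
    best = min(DESIGNS, key=lambda n: expected_total(n, winds, probs))
    return best, expected_total(best, winds, probs)


# EV: deterministic problem with the wind fixed at its mean value.
mean_wind = sum(p * w for p, w in zip(PROB, WIND))
n_ev, ev = solve([mean_wind], [1.0])

# EEV: fix the EV design and evaluate it against all scenarios.
eev = expected_total(n_ev, WIND, PROB)

# RP: optimize the design against all scenarios simultaneously.
n_rp, rp = solve(WIND, PROB)

vss = eev - rp  # extra cost of ignoring the uncertainty
```

In this toy case the deterministic design installs fewer turbines than the stochastic one, so its expected cost under the real scenarios (EEV) is higher than the RP cost and the VSS is positive, while the EV itself remains the lowest of the three values.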
Table 5 shows the EV, EEV, and RP solutions to the energy hub problem with and without the GHG emissions constraint. The results were obtained using 6 normal clusters with a weight factor of 4 for the year demand data (i.e., a better representation of the data, as indicated by the lower IAE). The table shows that when there was no environmental consideration, no benefit was gained from using stochastic programming (i.e., VSS = 0). As previously discussed, when the environmental constraint was inactive, neither wind turbines nor storing units were selected for installation; hence, the solutions of all stochastic scenarios were identical.
In contrast, when the emission constraint was active, the VSS was estimated to be 14,832 $/year (VSS = EEV − RP). The positive VSS value indicated that considering uncertainty in the energy hub modeling is beneficial. Although the EV (deterministic solution) featured the lowest objective function value, deterministic modeling solutions are insufficient because they rely heavily on a relatively small segment of information (e.g., the average wind speed). This information does not sufficiently explain real events, such as the wind speed behavior; therefore, it cannot be considered truly representative of real data (e.g., annual wind data). As a result, it could be stated that the wind speed uncertainty has a significant effect on the optimization solution once environmental constraints are considered in the model's formulation, as indicated by the VSS estimate.
4. Conclusions
The present work targeted the integrated supply chain problem using a clustering approach. Because employing shorter time periods (e.g., hours) to achieve optimal decisions leads to larger, intractable integrated models, this work aimed to decrease the model size by replacing the days of the year with a small set of typical days representative of the operating year. Accordingly, a mathematical programming approach was considered to model the clustering problem with multiple attributes. Distinct attributes featured varying scales or units, which turned the problem into a multi-objective optimization program, and the weighting method was used to deal with such a problem. The present clustering algorithm featured a unique characteristic that enabled attaining normal and sequence clustering employing the same similarity measure. The proposed weighted clustering method allowed tuning and generating clusters that prioritized some attributes over others.
Although the developed approach is conceptually simple, the proposed clustering algorithm is computationally demanding. A heuristic clustering approach was therefore proposed to address the computational tractability of the full-scale clustering model. It was found that the larger the number of clusters employed was, the closer the solutions of the clustered model were to those of the original model. Additionally, the consideration of uncertainty in the energy hub modeling was shown to be beneficial, particularly when the environmental constraints were included in the formulation. In addition, the heuristic approach remarkably outperformed the full-scale model in terms of the solution time, usually by several orders of magnitude, particularly for large datasets. Accordingly, it can be stated that the clustering approach is an effective tool to reduce the model size while maintaining reasonably accurate results. The proposed multiscale clustering method represents a trade-off between the computational effort and data accuracy.
Future work can include the application of the proposed clustering approach to different multiscale planning problems. The stochastic energy hub planning model can be extended to include capacity expansion planning decisions to satisfy the multi-attribute demand. It would also be interesting to use forecasted demand data to plan energy hub systems, as this case study was limited to implementing historical demand data in multiscale modeling. Forecasting techniques can therefore be employed to predict future demands, and the clustering approach can then be applied to reduce the size of these multi-attribute demands so that they can be used as an input to the energy hub planning or capacity expansion model. Another direction for future work is to apply the multiscale clustering approach to a superstructure modeling approach to design new chemical or power plants. Instead of solving the superstructure model for a 1-day profile that represents the whole year, it can be solved for several representative days that are more likely to reflect the real behavior of demand.