## **About the Editor**

**Sung-Yoon Huh** is an Assistant Professor in the Department of Energy Policy at Seoul National University of Science & Technology, where he has been a faculty member since 2017. Dr. Huh completed his Ph.D. and undergraduate studies at Seoul National University, followed by two years of postdoctoral research at the University of California, Berkeley. His research interests lie in the area of energy and environmental economics, ranging from theory to actual policy practice. Dr. Huh has published on a wide range of topics, including economic valuation, social acceptance, and innovation diffusion of energy and environmental projects and technologies. His articles have appeared in renowned international journals such as *Renewable and Sustainable Energy Reviews*, *Applied Energy*, *Energy Policy*, *Energy Economics*, *Sustainability*, and *Energies*. He has also actively collaborated with researchers in several other disciplines to provide better insights into the energy sector.

## **Preface to "End-Users' Perspectives on Energy Policy and Technology"**

The global energy market is changing rapidly, and several megatrends can be identified in its transition. Among them, with the wider application of new energy technologies such as renewable energy, there is a possibility of a shift from a conventional centralized energy supply system to a more distributed energy production system. In addition, as people's overall economic level, education level, and living standards have improved, public interest and participation in energy policy and technology are steadily increasing. In many countries, the rejection of energy projects due to the opposition of local residents is a representative example of the importance of public opinion in the process of energy policy implementation. In such a situation, it is very important to understand the social needs and public preferences regarding energy policy and technology and to reflect them fully in future energy policy and technology development. In this Special Issue, different kinds of theoretical and empirical studies analyzing the general public's perceptions and behavior regarding new energy technology and policy are presented. Through these papers, it is possible to predict future changes in the energy market as well as in the behavior patterns of end-users.

> **Sung-Yoon Huh** *Editor*

## *Article* **A Machine Learning Solution for Data Center Thermal Characteristics Analysis**

#### **Anastasiia Grishina 1, Marta Chinnici 2,\*, Ah-Lian Kor 3, Eric Rondeau 4 and Jean-Philippe Georges 4**


Received: 13 July 2020; Accepted: 20 August 2020; Published: 25 August 2020

**Abstract:** The energy efficiency of Data Center (DC) operations heavily relies on DC ambient temperature as well as on the performance of its IT and cooling systems. A reliable and efficient cooling system is necessary to produce a persistent flow of cold air to cool servers that are subjected to constantly increasing computational load due to the advent of smart cloud-based applications. Consequently, the increased demand for computing power will inadvertently increase server waste heat creation in data centers. To improve a DC thermal profile, which undeniably influences the energy efficiency and reliability of IT equipment, it is imperative to explore the thermal characteristics analysis of an IT room. This work encompasses the employment of an unsupervised machine learning technique for uncovering weaknesses of a DC cooling system based on real DC monitoring thermal data. The findings of the analysis result in the identification of areas for thermal management and cooling improvement that further feed into DC recommendations. With the aim of identifying overheated zones in a DC IT room and the corresponding servers, we analyzed the thermal characteristics of the IT room. The experimental dataset includes measurements of ambient air temperature in the hot aisle of the IT room in the ENEA Portici research center hosting the CRESCO6 computing cluster. We use machine learning clustering techniques to identify overheated locations and categorize computing nodes based on surrounding air temperature ranges abstracted from the data. This work employs principles and approaches replicable for the analysis of thermal characteristics of any DC, thereby fostering transferability. This paper demonstrates how best practices and guidelines can be applied for thermal analysis and profiling of a commercial DC based on real thermal monitoring data.

**Keywords:** data center; thermal characteristics analysis; machine learning; energy efficiency; clustering; unsupervised learning

#### **1. Introduction**

Considerable efforts have been made by Data Centers in terms of their energy efficiency, reliability and sustainable operation over the past decade. A continuous increase in computing and power demands has spurred DCs to respond and upgrade their facilities in terms of size and stability [1,2]. The rapid growth of the information technology (IT) industry and the advent of IoT and AI technologies require an exponentially expanding amount of data to be stored and processed. Consequently, smart DC management is on the rise to meet this demand. If a data center experiences a system failure or outage, it becomes challenging to ensure a stable and continuous provision of IT services, particularly for smart businesses, social media, etc. If such a situation occurs on a large scale, it could be detrimental to the businesses and public sectors that rely on DC services, for example, health systems, manufacturing, entertainment, etc. In other words, a data center has emerged as a mission-critical infrastructure [3] for the survival of public and business sectors enabled by smart technologies. Therefore, it warrants an exceptional necessity for backup system management and uninterruptible power supply (UPS) systems so that computing system stability can be maintained even in emergency situations. Overall, thermal management involves the reduction of excess energy consumption by cooling systems, servers, and their internal fans. This encompasses the compliance of the IT room environment with the requirements stipulated in IT equipment specifications and standards, which ensures better reliability, availability, and overall improved IT equipment performance.

The mission-critical facility management for the stable operation of a DC leads to huge cost increases, and careful reviews must be performed starting from the initial planning stage [4,5]. Moreover, IT servers require uninterruptible supplies of not only power but also cooling [6,7]. Undeniably, a significant increase in power density has led to a greater cooling challenge [8]. For this purpose, the design of a central cooling system in a liquid cooling architecture includes cooling buffer tanks. This design provides a chilled water supply in emergency situations. During cooling system outages, the interruption of chillers triggers the activation of emergency power and cooling supplies to restore cooling services. However, such emergency situations are infrequent on the scale of a DC life-cycle. In addition, recent developments in IT equipment have increased servers' tolerance to challenging operational conditions compared to the past. Consequently, the operating times and capacities of chilled water storage tanks could be considerably reduced. The same is true for the energy and water consumption of the liquid cooling system. Undeniably, it is imperative to explore the thermal specifications of the DC IT equipment. They are expressed as (but not limited to) different admissible ranges for temperature, humidity, periods of overheating before automatic power off, etc.

Given that IT devices might have different recommended specifications for their operation, maintaining healthy operational conditions is a complex task. Undoubtedly, covert thermal factors tend to affect the health of IT and power equipment systems in a negative way. Such factors comprise recirculation, bypass, and (partial) server rack overheating, all of which jeopardize the stable operation that is critical for a DC [9]. For example, in the case where an IT room is divided into cold and hot aisles, ineffective partitioning of the aisles (e.g., poor physical separation of the aisles) may result in leaked air causing recirculation of hot air (note: this is the mixing of hot and cold air) or cold air bypass (note: this happens if the cold air passes by the server and does not cool it properly; it occurs when the speed of air-flow is too high or the cold air is unevenly directed at the hot servers) [10]. Consequently, such emerging challenges have to be addressed to optimize thermal conditions within a DC facility. Undeniably, an increase in ambient temperature could lead to an increase in the power usage of IT equipment and, in turn, to hardware degradation [8]. Thus, it is necessary to address the issue of server waste heat dissipation with the ultimate goal of attaining an even thermal distribution within the premises. This is possible by taking appropriate measures to prevent the emergence of heat islands that result in individual server overheating [11].

This work explores IT room thermal characteristics using data mining techniques for the purpose of relevant and essential knowledge discovery. The primary goal is to use an unsupervised machine learning technique to uncover inadequacies in the DC cooling system based on real monitored thermal data. Analysis in this research leads to the identification of areas for improved thermal management and cooling that will feed into DC recommendations. The proposed methodology includes statistical analysis of IT room thermal characteristics and identification of individual servers that are contributors to hotspots (i.e., overheated areas) [12]. These areas emerge when individual servers do not receive adequate cooling. The reliability of the analysis has been enhanced due to the availability of a dataset of ambient air temperature in the hot aisle of an ENEA Portici CRESCO6 computing cluster [9].

In brief, clustering techniques have been used for hotspots localization as well as server categorization based on surrounding air temperature ranges. The principles and approaches employed in this work are replicable for thermal analysis of DC servers and thus, foster transferability. This work showcases the applicability of best practices and guidelines in the context of a real commercial DC that transcends the typical set of existing metrics for DC thermal characteristics assessment. Overall, this paper aims to raise DC thermal awareness and formulate recommendations for enhanced thermal management. This aim is supported by the following list of research objectives [9]:

RO.1. To identify a clustering (grouping) algorithm that is appropriate for the purpose of this research;

RO.2. To determine the criteria for feature selection in the analysis of DC IT room thermal characteristics;

RO.3. To determine the optimal number of clusters for the analysis of thermal characteristics;


In summary, hotspots (and rack or cluster failures) in data centers are typically identified through the deployment of heatmaps (e.g., by Facebook in [13]), which comprise discrete scene snapshots of the premises (and not of individual compute nodes) under study. However, our novel contribution is the employment of an Artificial Intelligence approach (i.e., a Machine Learning clustering technique) based on 'continuous' data center environmental monitoring data, which provides insights into the qualitative measure of the 'duration' a particular compute node spends in a particular temperature range (i.e., low, medium, hot). Additionally, our proposed approach has an edge over typical heatmaps due to the low-level granularity (i.e., more detailed) information it provides for each node rather than aggregated information about the nodes (in clusters), so that targeted as well as effective corrective actions or interventions can be appropriately taken by data center owners.

The remainder of the paper is organized as follows: Section 2 focuses on the background of the problem and related work; Section 3 presents the methodology used in this work; Section 4 provides discussion of experimental results and analysis; Section 5 concludes the paper with a summary and recommendation for future work.

#### **2. Background and Related Work**

In recent years, a number of theoretical and practical studies have been conducted on DC thermal management to better understand ways to mitigate inefficiencies of the cooling systems. This includes DC thermal and energy performance evaluation and load distribution optimization. Ineffective thermal management could be the primary contributor to DC IT infrastructure unreliability due to hardware degradation.

Existing DC-related thermal management research highlights the primary challenges of cooling systems in high power density DCs [14]; recommends a list of thermal management strategies based on energy consumption awareness [2,15]; explores the effect of different cooling approaches on power usage effectiveness (PUE) using direct air with a spray system that evaporates water to cool and humidify incoming air [16]; investigates the thermal performance of air-cooled data centers with raised and non-raised floor configurations [17]; studies various thermofluid mechanisms using cooling performance metrics [18]; proposes thermal models for joint cooling and workload management [19], while other strains of research explore thermal-aware job scheduling, dynamic resource provisioning, and cooling [20]. In addition, server-related thermal information, such as inlet/outlet air temperature and air mover speed, is utilized to create thermal and power maps with the ultimate goal to monitor the real-time status of a DC [21].

A majority of the previously listed research work focuses on simulations or numerical modeling [2,16–20] as well as on empirical studies involving R&D or small-scale data centers [16,21]. Thus, there is a need for more empirical research involving real thermal-related data for large-scale data centers. Undeniably, it is tremendously beneficial to identify hotspots and the air dynamics (particularly its negative effects) within a DC IT room. Such useful evidence-based information will help DC operators improve DC thermal management and ensure uninterrupted, steady computing system operations. This is made possible when affected servers continue to perform graceful degradation-related computations or enter the 'switch off' mode once the temperature threshold is breached. Thermal management could be improved in a number of ways based on evidence-based analysis. For example, some corrective actions could be: identify cold air leakages and erect isolating panels; adjust the speed, volume, and direction of the cold air stream; apply free cooling wherever possible or adjust the humidity levels. An exhaustive guideline for DC thermal management improvement can be found in [7].

A crucial step forward in DC thermal management related research could be adherence to the recommended thermal management framework [22] at varying DC granularity levels. As a part of the framework, thermal metrics have been created by research and enterprise DC communities [10]. Employment of the metrics aims to reveal the underlying causes of thermal-related issues within a DC IT room and to assess the overall thermal conditions of the room. A recently proposed holistic DC assessment method is based on biomimicry [23]. This integrates data on energy consumption for powering and cooling ICT equipment.

This paper is an extension of the authors' previous work [10,11,24–28], which focuses on real DC thermal monitoring data. In detail, this current research focuses on the analysis of DC IT room thermal characteristics to uncover ways to render a more effective cooling system, as well as to explore possibilities of employing machine learning techniques to address this issue. Appropriate data analytics techniques have been applied to real server-level sensor data to identify potential risks caused by the possible existence of negative covert physical processes related to the cooling strategy [2]. In summary, this work is based on the analysis of DC thermal characteristics using machine learning (ML) techniques. ML has generally been employed for virtual machine allocation, global infrastructure management, prediction of electricity consumption, and availability of renewable energy [29]. Thus far, there is work on ML for thermal characteristics assessment and weather conditions prediction, but only limited available work on thermal management. Typically, Computational Fluid Dynamics (CFD) techniques have been employed for the exploration of DC thermal management; their drawbacks are high computational power and memory requirements. Therefore, the added value of this research is an evidence-based cooling recommendation for more targeted temperature management, obtained through thermal characteristics analysis that localizes overheated areas in the DC IT room.

#### **3. Methodology**

This section discusses the thermal characteristics analysis of the ENEA R.C. Portici cluster CRESCO6. An ML clustering technique was chosen for a more in-depth analysis of overheated servers' localization based on an available dataset of CRESCO6 server temperature measurements. The terms "server" and "node" are used interchangeably in this work, while "hotspot" indicates an overheated area next to a server in the IT room that results in an overheated or undercooled server. The drawback of a typical analysis of temperature measurements is that it cannot locate the specific nodes that cause rack overheating. Hence, to address this issue, we have applied node clustering to localize potentially harmful hotspots. To identify overheated areas in the CRESCO6 group of nodes, we sequentially grouped the nodes into clusters characterized by higher or lower surrounding air temperature [9]. The term "group of nodes" stands for the DC "cluster"; the latter term is avoided here to prevent confusion with "cluster" as the outcome of running an ML clustering algorithm.

#### *3.1. Cluster and Dataset Description*

Thermal analysis was based on monitoring data of the CRESCO6 cluster on the premises of the ENEA-Portici Research Center (R.C.). The data collected were the cluster power consumption of IT equipment (servers) and measurements of ambient air temperature. This cluster has been up and running since May 2018. It is used to augment the computing resources of the CRESCO4 system, already installed and still operating in the Portici Research Center. The reason for the augmentation is the rise in demand for analytic activities. Thus, with the addition of the CRESCO6 cluster, the overall computing capability of ENEA R.C. has increased up to seven-fold. The cluster comprises 418 Lenovo nodes housed in a total of 5 racks. Each node includes two CPUs, each with 24 cores (for a total of 20,064 cores). This pool of resources aims to support Research and Development activities in the ENEA Research Center [9].

Measurements of ambient air temperature were recorded during two phases. The first phase took place during cluster set up and performance tuning (subject to thermal control strategies) and other indicator specifications during the months of May–July 2018. Subsequently, end-users were allowed to submit and run their jobs, and the relevant parameters were monitored and measured for approximately 9 months (September 2018–February 2019). The measurements were paused in August 2018, as shown in Figure 1 [9]. The data collected encompassed all 216 nodes, out of which 214–215 nodes were consistently monitored, while the remaining 1–2 nodes had missing values or were turned off. The monitoring system consisted of an energy meter; a power meter recording CPU, RAM, and computing power utilization of every node; and CPU temperature readings for both processing units of each node, with thermal sensors installed at the inlet and exhaust air locations in the cold and hot aisles, respectively (i.e., placed in the front and rear parts of every node).

**Figure 1.** Period of available measurement data in May–December 2018 and January–February 2019.

#### *3.2. Data Analytics*

Variation of the air temperature was captured and analyzed for different areas of the IT room. The variability of the thermal data and the uncertainty in defining temperature thresholds for overheated areas provided the justification for the use of an unsupervised learning technique. Hence, a k-means algorithm was employed to address the limitations of typical statistical techniques and to cluster the servers according to their surrounding air temperature. The silhouette metric and the within-cluster sum of squares were first used to determine the number of clusters. Available thermal characteristics (i.e., exhaust air temperature and readings of CPU temperature) served as inputs to the k-means algorithm. The set of surrounding air temperature measurements for the nodes was clustered as many times as the number of measurements taken for the batch of all the nodes. Subsequently, the resulting series of cluster labels were intersected to unravel the nodes (distinguishable by their IDs) that frequently occurred in the high-temperature cluster.

An adapted data lifecycle methodology was employed for this work, as depicted in Figure 2. The methodology comprises several data preprocessing steps and data analysis, followed by the interpretation of the results and their exploitation in the form of recommendations for the DC under consideration [9]. A detailed discussion of all data analytics stages represented in Figure 2 is found in the ensuing section.

**Figure 2.** Data analysis lifecycle methodology adapted to sequential clustering of DC servers based on their thermal characteristics.

The data preprocessing step consisted of data cleansing of zero and missing values and formatting. The dataset was organized as shown in Table 1. This table summarizes the results of monitoring of the overall number of nodes in a computing cluster, *N*. In addition, data preprocessing involved timestamps formatting for further exploitation. In detail, the system was configured so that the monitoring system recorded the thermal data for every node with an interval of around 15 min, including a slight latency between each pair of consecutive readings of temperature sensors around the nodes. The readings resulted in a set of *N* rows with the information for every node ID.


**Table 1.** Dataset used for the clustering analysis.

The data analysis step included several substages. The sequential clustering substage encompassed the investigation into the optimal number of clusters, followed by server clustering into groups (with three possible levels: low, medium, and high) of the surrounding air temperature. The results were further consolidated to ascribe final cluster labels to each server (i.e., low, medium, or high temperature) based on the frequency of occurrences of each node label in the sequence of results [9]. Clustering was performed *M* times, where *M* is the overall number of time labels at which measurements were taken for all cluster nodes. Each new set of monitoring system readings was labeled with a time label *t*1, ... , *tM*. The exact timestamp for the extracted information was marked with *ti* + δ*tij* for every node *j*. Depending on the available dataset, a number of relevant features described the thermal state of every node, and their different combinations could be used as a basis for clustering (RO.2 is considered in more detail in Section 4). Thus, we introduce *base* in the last column of Table 1, which denotes the basis for clustering, i.e., a combination of temperature measurements taken as k-means input (see Results and Discussions for details). The indicator *base* also corresponds to the temperature of the cluster centroid [9].

In this work, k-means was chosen as a clustering algorithm due to the following reasons (RO.1):


The number of clusters, *K*, is an unknown parameter that also defines the number of temperature ranges *C<sup>base</sup><sub>range</sub>*. It is estimated for each of the three combinations or *bases* using two metrics separately: the average silhouette coefficient and the within-cluster sum of squares (WCSS) metric [9,30,31] (RO.3). The application of these two indices to derive the suitable number of thermal ranges or clusters is shown in Appendix A. In brief, the silhouette coefficient was computed for each clustered sample of size *N* and showed the degree of isolation of the clusters, thus indicating the quality of clustering. A silhouette value of +1 for a specific number of clusters, *K*, indicated high cluster density, −1 indicated incorrect clustering, and 0 stood for overlapping clusters. Therefore, we focused on local maxima of this coefficient. WCSS is used in the Elbow method of determining the number of clusters and was employed here to support the decision obtained from the silhouette coefficient estimation. It measured the compactness of the clusters, and the optimal value of *K* was the one that resulted in the "turning point" or the "elbow" of the WCSS(*K*) graph. In other words, increasing the number of clusters beyond the elbow point does not result in a significant improvement in cluster compactness. Although it could be argued that other criteria could additionally be used for determining the number of clusters, the combination of the two aforementioned methods converged on the same values of *K* for our chosen *bases*, which was assumed to be sufficient for this current research.
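The following minimal sketch (not the authors' code) illustrates how the two indices can be computed with scikit-learn, assuming `temps` is an (N, d) array of per-node temperature features for one *base*:

```python
# Estimate the number of clusters K with the average silhouette
# coefficient and WCSS (inertia) over a range of candidate values.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_k(temps: np.ndarray, k_range=range(2, 10)):
    """Return silhouette and WCSS curves over candidate values of K."""
    silhouettes, wcss = {}, {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(temps)
        silhouettes[k] = silhouette_score(temps, km.labels_)
        wcss[k] = km.inertia_  # within-cluster sum of squares
    return silhouettes, wcss

# Choose K at a local maximum of the silhouette curve and check that it
# coincides with the "elbow" of the WCSS curve, as described above.
```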

Once we obtained the optimal parameter *K*, we performed the clustering procedure for the chosen *bases*. For a fixed *base*, the sequence of cluster labels was examined for every node. Based on the frequency of occurrences of each cluster label (low, medium, or hot air temperature), the node was ascribed its most frequently occurring cluster label *C<sup>base</sup><sub>range</sub>* in the sequence and was assigned to the corresponding set of nodes *N<sup>base</sup><sub>range</sub>*. Furthermore, we took the intersection of the sets of nodes in the hot air range for every *base*. Thus, we obtained the IDs of the nodes that were most frequently labeled as nodes in "danger" or hot areas of the IT rack by the three clustering algorithms: *N<sub>hot</sub>* = ∩<sub>*bases*</sub> *N<sup>base</sup><sub>hot range</sub>* (RO.4). A code sketch of this sequential procedure follows. In the next section, we will discuss the results of k-means sequential clustering and identify the nodes that occurred in overheated areas more frequently than others.
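The sketch below is a hedged illustration (not the authors' implementation) of the sequential clustering and majority-label assignment, assuming `readings` is an (M, N, d) array of M measurement times, N nodes, and d temperature features for one *base*:

```python
# Cluster all N nodes at each of the M timestamps, order cluster labels
# by centroid temperature so "hot" is comparable across timestamps, and
# return each node's most frequently occurring temperature-range label.
import numpy as np
from sklearn.cluster import KMeans

def majority_labels(readings: np.ndarray, k: int) -> np.ndarray:
    M, N, _ = readings.shape
    counts = np.zeros((N, k), dtype=int)
    for t in range(M):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(readings[t])
        order = np.argsort(km.cluster_centers_.mean(axis=1))  # cold .. hot
        remap = np.empty(k, dtype=int)
        remap[order] = np.arange(k)         # original label -> temperature rank
        counts[np.arange(N), remap[km.labels_]] += 1
    return counts.argmax(axis=1)            # final label per node (k-1 = hot)

# The hot sets of the three bases can then be intersected, per RO.4:
# nodes_hot = set.intersection(*(set(np.flatnonzero(lbl == k - 1))
#                                for lbl in labels_per_base))
```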

#### **4. Results and Discussions**

For every combination of measured thermal data, the results of server clustering into cold, medium, and hot temperature ranges were further analyzed to calculate the frequency of occurrences of each node in each cluster and to determine its final frequency label (i.e., cluster label or temperature range). These labels were further intersected with the labels obtained for the other bases. Each set of *N* = 216 nodes was clustered at once, followed by temperature-based clustering of the same set of nodes using measurements taken at the next timestamp. This process is referred to here as sequential clustering. The indicator *base* of the input temperature data was used in three possible combinations of the available thermal data: exhaust air (*base* 1), CPU (*base* 2), and exhaust air and CPU temperature measurements (*base* 3) (RO.2).

The dataset contained *M* = 15,569 sets of temperature measurements. Each of the *M* sets consisted of 216 node-level temperature measurements of the sensor data. The sensors were installed in different locations with respect to the node: in the front (inlet) and rear (exhaust) of every node, and two sensors inside each node (CPU temperature). The optimal number of clusters *K* was influenced by the *base* chosen for clustering. Using the silhouette metric and WCSS, we obtained the optimal number of clusters *K* for *bases* 1–3 (exhaust, CPU, and exhaust and CPU measurements); the *K* values equaled three, five, and three clusters, respectively [9].

During sequential clustering, each node was labeled with a particular temperature range cluster. Since clustering was repeated for each set of measurements grouped by time label, every node was repeatedly clustered several times and tagged with different labels while the algorithm was in progress (RO.4). Figure 3a–c shows the results of sequential clustering, where one set of three or five vertically aligned graphs represents the result of sequential clustering using one input *base*. In detail, Figure 3 compares how frequently each of the 216 nodes is labeled with the low, medium, or hot temperature range. Every vertical graph corresponds to the proportion of occurrences (in %) of low, medium, or hot temperature labels for one node. This figure indirectly implies the incidence rate, "duration", or tendency of a particular node experiencing a certain temperature range (see the legend in Figure 3a–c). Here, the medium temperature range label occurred most frequently for the majority of the nodes for all cluster *bases*. We also observe that some nodes remain in the hot range for more than 50% of clustering iterations. This information could alternatively be obtained if the temperature levels (lower and upper bounds of the low, medium, and hot temperature ranges) were preset by an expert, but for this research, such an estimation is not available. Therefore, the consideration of the clustering results is crucial for a DC in situations where temperature ranges are unavailable. Figure 3 provides a means for asynchronous assessment of the thermal state of the IT room and unravels relevant thermal trends.

**Figure 3.** The ratio of nodes labeled by different temperature ranges (low, medium, and hot) based on different thermal data input (*base*): (**a**) Exhaust temperature, (**b**) Exhaust and CPU temperature, (**c**) CPU temperature. The legend includes coordinates of cluster centroids.

In cases where nodes remain in the hot range for a prolonged period or frequently fall in this range, it implies that they are overheated (and the cooling appears to be ineffective). Consequently, this could cause hardware degradation where the nodes have reduced reliability and accessibility as they automatically switch to lower power mode when overheated. Therefore, we continued with the analysis to identify the actual node IDs that had most frequently been clustered within the hot air temperature ranges. Table 2 provides an insight into the ratio of nodes with the highest frequency of occurrences in cold, medium, or hot air temperature range (RO.5). Depending on the cluster base, 50% to 86% of all nodes had the highest frequency of occurrence in the medium range. The hot range encompassed 11%–37% of all nodes, and only 0.5%–4% had been clustered within the cold range.


**Table 2.** Ratio of cluster sizes and intersection of node labels from three thermal data combinations (*bases*).

Finally, the results of sequential clustering with the three bases were intersected for cross-validation purposes. This resulted in intersections of the node sets labeled as cold, medium, and hot surrounding air ranges. One node (equal to 0.5% of nodes) was labeled with the cold air temperature range for all three *bases*. The cluster characterized by the medium air temperature range had the largest intersection among the *bases* (i.e., more than 90%), while 8% (or 18 nodes) were binned in the hot air temperature range most frequently using all three *bases*. The highlight of this exploration is that we were able to identify the IDs of the most frequently overheated nodes. DC operators could further exploit this evidence-based information to improve thermal conditions in the cluster IT room. A possible corrective action to mitigate overheating of the localized nodes could be to improve the existing natural convection cooling by directing the cold air to the hottest nodes. In addition, DC operators could update the load scheduling and decrease the workload of the identified nodes to indirectly prevent overheating (RO.6). In summary, DC operators could improve resource allocation policies and cooling strategies to effectively address this issue.

This current paper has contributed to thermal characteristics awareness for a real DC cluster and addressed the issue of server overheating. This has two positive effects in terms of sustainability. Firstly, local overheating could be considered an IT room thermal design pitfall. It leads to a high risk of hardware degradation for servers that are frequently and/or for a long time exposed to high surrounding air temperature. From this perspective, the localization of hot regions of the IT room performed in this study (via an ML technique) is crucial for providing a better overview of the thermal distribution around the servers, which could be fed into better thermal control and management strategies. In other words, future thermal management improvements could be aligned to the direction provided in this study with the aim of mitigating the abovementioned risk. Secondly, the clustering technique used in this phase requires fewer computational resources compared to heatmaps, computational fluid dynamics modeling, and/or simulations performed with existing simulation packages [9]. Therefore, this work evidences the benefit of less computationally intensive analytical techniques (in yielding sufficient information) to incentivize the improvement of thermal conditions (through even thermal distribution) in data centers. In summary, the conclusions that can be drawn from this research are that a majority of the nodes were located in the medium and hot air temperature ranges. Joint results of the three clustering algorithms showed that 8% of cluster servers were most frequently characterized as having hot surrounding air temperature. Based on this evidence, we have formulated a list of recommendations (see the subsequent section) to address the problem of repeated or prolonged overheating of servers (RO.6).

#### **5. Conclusions and Future Work**

Analysis of IT and cooling systems is necessary for the investigation of DC operations-related energy efficiency. A reliable cooling system is essential to produce a persistent flow of cold air to cool servers that could be overheated due to the increasing demand for computation-intensive applications. To reiterate, Patterson [8] maintained that DC ambient temperature has an impact on the energy consumption of IT equipment and systems. However, in this paper, the focus is on the thermal characteristics analysis of an IT room. The research methodology discussed in this paper includes statistical analysis of IT room thermal characteristics and the identification of individual servers that frequently occur in the overheated areas of the IT room (using a machine learning algorithm). Clustering techniques are used for hotspot localization as well as for the categorization of nodes (i.e., servers) based on surrounding air temperature ranges. This methodology has been applied to an available dataset with thermal characteristics of the ENEA Portici CRESCO6 computing cluster. In summary, this paper has presented a proposed methodology for the IT room thermal characteristics assessment of an air-cooled DC cluster located in a geographical region where free air cooling is unavailable. The steps involved for evidence-based targeted temperature management (to be recommended for air-cooled DCs) are as follows:


To reiterate, the approaches covered in this work are transferable for thermal characteristics analysis in any air-cooled DC context enabled with a thermal monitoring system. This study illustrates the applicability of best practices and guidelines to a real DC and uses an ML approach to perform IT room thermal characteristics assessment. This work could be extended by integrating thermal management with existing energy efficiency policies-related research (e.g., energy awareness [15]; job scheduling using AI [32], temporal-based job scheduling [33], workload-aware scheduling [34], and queue theory [35]; resource utilization [36] of multiple applications using annealing and particle swarm optimization [37]). Another possible direction is energy efficiency policies and waste heat utilization [26].

**Author Contributions:** Conceptualization A.G., M.C., A.-L.K., E.R. and J.-P.G.; methodology, A.G., M.C., A.-L.K.; data analysis, A.G.; data curation, M.C.; writing—original draft preparation, A.G., M.C., A.-L.K. and J.-P.G.; writing—review and editing, A.G., M.C., A.-L.K.; project administration, E.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by EMJMD PERCCOM Project [38].

**Acknowledgments:** This work was supported by EMJMD PERCCOM Project [38]. During the work on the project, the author A.G. was affiliated with ENEA-R.C. Casaccia, Rome, Italy. Moreover, the authors extend their gratitude to the research HPC group at the ENEA-R.C. Portici for the discussion of ENEA-Data Center control and modeling.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Several approaches are widely used by data scientists to identify the optimal number of clusters. However, it is worth noting that none of these approaches is considered definitive; instead, they provide a suggestion for the number of clusters. The current study applies two indices: WCSS, used in the so-called elbow method, and the average Silhouette Index [32]. We apply a systematic approach and calculate these metrics for different numbers of clusters, *K*. Subsequently, an optimal value is chosen based on the values of the indices. WCSS characterizes the compactness of the clusters and is defined as follows:

$$\text{WCSS}(K) = \sum\_{j=1}^{K} \sum\_{\mathbf{x} \in \mathcal{C}\_j} \|\mathbf{x} - \boldsymbol{\mu}\_j\|^2 \tag{1}$$

where *K* is the number of clusters, *C* is the set of clusters (*C*1, *C*2, ... , *CK*), and μ*j* represents the sample mean of cluster *Cj*. The value of this metric should be minimized. In practice, the optimal number of clusters corresponds to the point where the metric continues to decrease but at a significantly slower rate than for smaller numbers of clusters; it is visually associated with an "elbow" of the graph. The justification for choosing the elbow point is that, beyond it, increasing the number of clusters decreases the metric only slightly, while computations become increasingly more intensive.

The second method applied in this paper is the average silhouette. It is computed as the average of the silhouette index over all data points (or cluster members). It estimates within-cluster consistency and should be maximized to achieve an effective cluster split. The formula for the Silhouette index is as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{b(i), a(i)\}},\tag{2}$$

where *a*(*i*) is the mean distance between cluster member *i* and all other members of the same cluster, and *b*(*i*) is the mean distance between member *i* and all points in the nearest neighboring cluster. We depict the results of these metric calculations in Figure A1, which shows the metrics for one step of sequential clustering based on server exhaust air temperature. At *K* = 3, we observe a local maximum of the silhouette index and an elbow point of the WCSS graph.

**Figure A1.** (**a**) WCSS; (**b**) Average Silhouette Index. Both indices are computed for one step of the sequential clustering procedure based on the exhaust air temperature.
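As a complement to Equation (2), the following minimal reference implementation (an illustrative sketch, not the authors' code) computes the silhouette value of a single point, assuming `X` is an (n, d) data array, `labels` the corresponding cluster assignment, and every cluster has at least two members:

```python
# Silhouette index s(i) for one point, per Equation (2).
import numpy as np

def silhouette_of_point(i: int, X: np.ndarray, labels: np.ndarray) -> float:
    dists = np.linalg.norm(X - X[i], axis=1)  # distances from point i
    same = labels == labels[i]
    same[i] = False                            # exclude point i from a(i)
    a = dists[same].mean()                     # mean intra-cluster distance a(i)
    b = min(dists[labels == c].mean()          # b(i): mean distance to the
            for c in np.unique(labels)         # nearest neighboring cluster
            if c != labels[i])
    return (b - a) / max(a, b)
```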

#### **References**




© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Stochastic Modeling of the Levelized Cost of Electricity for Solar PV**

**Chul-Yong Lee 1 and Jaekyun Ahn 2,\***


Received: 25 April 2020; Accepted: 10 June 2020; Published: 11 June 2020

**Abstract:** With the development of renewable energy, a key measure for reducing greenhouse gas emissions, interest in the levelized cost of electricity (LCOE) is increasing. Although the input variables used in the LCOE calculation, such as the capacity factor, capital expenditure, annual power plant operations and maintenance cost, discount and interest rate, and economic life, vary according to region and project, most existing studies estimate the LCOE using a deterministic methodology. In this study, a stochastic approach was used to estimate the LCOE for solar photovoltaic (PV) in South Korea. In addition, compared to existing studies, this study contributes more realistic analysis results by securing actual data generated in solar PV projects. The results indicate that the LCOE for commercial solar power ranged from KRW 115 (10 cents)/kWh to KRW 197.4 (18 cents)/kWh at a confidence level of 95%. The median was estimated at KRW 160.03 (15 cents)/kWh. The LCOE for residential solar power ranged from KRW 109.7 (10 cents)/kWh to KRW 194.1 (18 cents)/kWh at a 95% confidence level, with a median value of KRW 160.03 (15 cents)/kWh. A sensitivity analysis shows that capital expenditure has the most significant impact on the LCOE for solar power, followed by the discount rate and corporate tax. This study proposes that policymakers implement energy policies to reduce solar PV hardware and soft costs.

**Keywords:** LCOE; stochastic; solar PV; South Korea; renewable energy

#### **1. Introduction**

Since the Paris Agreement came into effect in November 2016, the issue of reducing greenhouse gas (GHG) emissions has gained traction globally. As a result, most countries are required to submit and implement Nationally Determined Contributions in an effort to address this issue. For example, South Korea is expected to reduce GHG emissions by 37% from business-as-usual levels by 2030. However, the reduction targets submitted by each country currently fall short of the global target of maintaining a temperature rise below 2 °C above pre-industrial levels for this century [1]. Further GHG reduction targets for each country are to be discussed in order to meet the universal target, which is therefore likely to become a major constraint on the global economy.

A transition to renewable energy is one of the key measures for reducing GHG emissions. Solar photovoltaic (PV) is the fastest-growing of numerous renewable energy sources, leading to a sharp reduction in cost and an increase in demand. Therefore, it is essential to accurately estimate the cost of solar PV and to compare it with other energy sources. To do so, it is necessary to compare the costs incurred for producing equivalent amounts of power. For this reason, many studies have introduced the levelized cost of electricity (LCOE) [2–6].

This study makes three contributions relative to previous studies. First, it considers more sophisticated input variables. Most existing studies consider the capacity factor, capital expenditure (CAPEX), annual power plant operations and maintenance (O&M) costs, discount rate, and economic life as input variables [2–5]. In this study, a more realistic analysis is attempted by additionally considering the project's corporate tax, debt cost, inflation rate, and loan interest rate. Second, existing LCOE-related studies analyzed the LCOE from a deterministic point of view. Since the input variables used in the LCOE calculation vary according to region and project, simulation techniques are useful to account for these variations. This study aims to stochastically estimate the LCOE based on a Monte Carlo simulation to consider the variation of the input variables. Third, while existing stochastic approach studies subjectively assume the input variables, this study derives the optimal distribution using actual data in the case of the capacity factor. The distribution is analyzed using the actual generation data of solar PV projects and the Kolmogorov–Smirnov statistical test.

The target of the analysis is solar PV in South Korea, which is growing rapidly. As a result of the stochastic approach using Monte Carlo simulation, key statistical values such as the reference value, average, median, standard deviation, minimum, and maximum are derived. The methodology proposed in this study can be applied to various energy sources in multiple countries globally. In addition, these results are expected to prove valuable in countries' energy policy development and economic analysis.

The structure of this paper is as follows. Section 2 demonstrates the current status of the global solar LCOE. Section 3 introduces the methodology used in this study. Section 4 examines the stochastic LCOE results, and Section 5 discusses the results and presents policy implications.

#### **2. Literature Review**

Bhandari and Stadler [7] compared the average LCOE of residential and commercial solar PV in Cologne, Germany, with the electricity rate to determine grid parity. By comparing the LCOE to the high electricity bills of consumers, it was estimated that grid parity would be achieved in 2013 or 2014. However, when considering low wholesale electricity prices, it was found that grid parity would be achieved in 2023. Branker et al. [8] estimated the LCOE by focusing on cases in North America and conducted a sensitivity analysis on the major variables. The LCOE responded sensitively to the variables of initial investment (installation) costs, investment methods, economic life, and debt redemption period. With developments in financing techniques and improvements in industry and technology, it was revealed that solar PV could be more cost effective than traditional energy sources, thereby reaching grid parity more efficiently. Mendicino et al. [4] suggested an appropriate contract price for the Corporate Power Purchase Agreement (CPPA) using the LCOE. A CPPA is a contract between an electricity consumer and a renewable power generator. The results show that the appropriate price range is between EUR 75/MWh and EUR 100/MWh.

Rhodes et al. [9] calculated the LCOE for 12 plant technologies by county in the United States. For some technologies, the average cost increased when internalizing the cost of carbon and air pollutants. When including a cost of USD 62/tCO2 for CO2 emissions, combined cycle gas turbines, wind, and nuclear power showed the lowest LCOE. Clauser and Ewert [10] analyzed the LCOE of geothermal energy and other primary energy sources. The LCOE was calculated by varying the conditions of geothermal energy, and as a result of these cost comparisons, it was concluded that geothermal energy could be converted into electrical energy at an attractive cost, particularly for steam use in natural or engineered geological reservoirs.

Chadee and Clarke [11] conducted a technical and economic assessment to determine the level of the LCOE of wind power in the Caribbean islands. The assessment was conducted on two sites with eight wind turbines ranging from 20 to 3050 kW. Mundada et al. [12] calculated the LCOE for a hybrid system of solar PV, batteries, and combined heat and power (CHP). A sensitivity analysis of these hybrid systems' LCOE was performed on the capital costs of the three energy subsystems, the capacity factor of PV and CHP, the efficiency of CHP, natural gas rates, and the fuel consumption of CHP. The sensitivity results have provided decision makers with a clear guide to the distributed generation LCOE of off-grid PV + battery + CHP systems. Nissen and Harfst [13] proposed an 'energy price adjusted LCOE' that allows for more accurate LCOE calculations in consideration of rising energy prices.

As such, the LCOE methodology is useful for a variety of countries and for various energy sources. However, the existing studies analyzed the LCOE from a deterministic point of view and thus did not reflect much of the uncertainty about the input variables. Therefore, this study contributes beyond previous studies in that it analyzes the LCOE using a stochastic approach. In addition, in assuming the distributions of the input variables, in the case of the capacity factor, the distribution was derived using actual data. Lastly, this study contributes to deriving a more realistic LCOE in that it considers more specific input variables, such as corporate tax, debt cost, inflation rate, and loan interest rate, than previous studies.

#### **3. Methodology**

#### *3.1. Levelized Cost of Electricity*

The LCOE is the average cost per unit of electricity generated by a particular plant. It is calculated by dividing the present value of the total generation cost of the facility by the present value of total power generation. The LCOE allows the evaluation of the costs in relation to the generated amount of power during the economic lifetime of a plant and across the entire energy generation process, including initial construction capital, operations, and maintenance [14,15].

The total cost incurred in the generation of energy comprises the initial CAPEX and annual O&M costs. More specifically, the initial CAPEX can be separated into hardware and soft costs. Hardware costs refer to equipment and materials, civil engineering, power generation equipment, and annex buildings, while soft costs include design, permits and authorizations, and construction supervision services. O&M costs cover the annual operations and maintenance costs of the power plant and financial services fees, such as insurance premiums [16,17].

The LCOE is affected by construction costs, operations and maintenance costs, the lifespan of the power plant, power generation technology, energy efficiency, system degradation rates, inflation and interest rates, and corporate taxes. The formula for calculating the LCOE may be defined as follows [10]:

$$\text{LCOE}\_t = \frac{\text{CAPEX}\_t + \sum\_{n=1}^{T} \frac{\text{OM}\_n + \text{FC}\_n}{(1+r)^n}}{\sum\_{n=1}^{T} \frac{(1-d)^n \times \text{CF} \times 365(days) \times 24(hours) \times \text{Capacity}}{(1+r)^n}} \tag{1}$$

In the above formula, *CAPEXt* refers to the initial investment (facilities) at time *t*, which includes equipment and materials, the construction of structures, grid connection, permits, design, supervision, and inspection; *OMn* denotes the O&M costs at time *n*; *FCn*, the finance costs at time *n*, i.e., the interest cost due to debt, which is the finance cost considered in this study; *r*, the discount rate; *d*, the degradation rate; *CF*, the capacity factor; Capacity, the energy generating capacity of the power plant; and *T*, the operation period of the power plant. The capacity factor is the rate at which the power generator operates over one year. For example, the capacity factor of Korea's solar PV is about 14.78%, which means that it produces only 14.78% of its rated capacity per year. The discount rate is affected by the inflation rate and the interest rate of safe assets in a country. The LCOE can also be interpreted as the recovery of the costs disbursed during the lifetime of a generation facility, discounted at rate *r*, in the form of an equal amount paid annually.
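As a concrete reading of Equation (1), the following minimal sketch (illustrative names and values, not the authors' code) computes the LCOE assuming constant annual O&M and finance costs:

```python
# Direct transcription of Equation (1): discounted lifetime costs
# divided by discounted lifetime generation.
def lcoe(capex, om, fc, r, d, cf, capacity_kw, T):
    """LCOE in currency per kWh."""
    costs = capex + sum((om + fc) / (1 + r) ** n for n in range(1, T + 1))
    energy = sum((1 - d) ** n * cf * 365 * 24 * capacity_kw / (1 + r) ** n
                 for n in range(1, T + 1))
    return costs / energy

# Illustrative call only; the inputs are placeholders, not Table 1 values:
# lcoe(capex=1.3e8, om=1.5e6, fc=1.0e6, r=0.045, d=0.005,
#      cf=0.1478, capacity_kw=100, T=20)
```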

#### *3.2. Stochastic Approach*

Economic analysis methodologies can be categorized as deterministic and stochastic models depending on the relationship between the input and output variables. The deterministic model assumes certain relations between the input and output variables and allows an analytical solution. However, the model contains three weaknesses. First, it excludes potential future alternatives, as it sets long-term variables at fixed values. Second, if all the scenarios with possible variables are combined, the number of cases increases exponentially, hindering the application of sensitivity analysis. Third, the model does not allow for the reflection of correlations between variables.

The stochastic model does not yield an analytical solution; instead, a confidence interval must be identified in the results by incorporating the probabilistic characteristics of the input variables using a simulation technique that generates random numbers [18]. The advantages of the stochastic simulation technique are as follows. First, it enables the estimation of a solution for a comparatively difficult mathematical question. Second, for uncertain variables, it allows for the establishment of correlations between the probability distributions of the variables. However, a disadvantage is that the estimate produced through stochastic simulation is an approximate value calculated by repeated sampling, thus requiring a statistical interpretation [19]. Among the stochastic simulation techniques, the Monte Carlo simulation is the method used universally. Assuming that the input variable is a probabilistic variable, an adequate probability distribution is selected, and random numbers that follow the relevant distribution are produced [20–23].

To explain the value of the Monte Carlo simulation, suppose the probability variable *X* has a probability density function *fX*(*x*), and consider an arbitrary function *g*(*x*). The expected value of *g*(*X*) is as follows:

$$E(\mathcal{g}(X)) = \int\_{x \in \mathcal{X}} \mathcal{g}(\mathbf{x}) f\_{\mathbf{x}}(\mathbf{x}) d\mathbf{x} \tag{2}$$

To estimate the expected value of *g*(*X*) as per the above, *n* samples (*x*1, ... , *xn*) are extracted from the distribution of the probability variable *X*, and the average of *g*(*x*) is calculated as below:

$$\overline{g}\_n(X) = \frac{1}{n} \sum\_{i=1}^{n} g(x\_i) \tag{3}$$

Here, ḡ*n*(*X*) is the Monte Carlo estimator of *E*(*g*(*X*)), which is based on the law of large numbers. When the weak law of large numbers is expressed as the formula below, it can be concluded that as the number *n* of samples tends to infinity, the sample average of *g*(*x*) converges to *E*(*g*(*X*)) [24].

$$\lim\_{n \to \infty} P\left(\left| \overline{g}\_n(X) - E(g(X)) \right| \ge \varepsilon\right) = 0 \tag{4}$$

As a result, ḡ*n*(*X*) satisfies the identity below and is an unbiased estimator of *E*(*g*(*X*)).

$$E(\overline{g}\_n(X)) = E\left(\frac{1}{n}\sum\_{i=1}^n g(x\_i)\right) = \frac{1}{n}\sum\_{i=1}^n E(g(x\_i)) = E(g(X)) \tag{5}$$

If *g*(*x*) is given as a complex function, the integral calculation becomes problematic, and so does finding a solution. However, if the Monte Carlo simulation is used, the expected value of the function can be estimated without having to conduct a complex calculation process [25,26].
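The following minimal illustration (not from the paper) of the estimator in Equations (3)–(5) averages *g* over *n* samples drawn from *fX*; the choice of a standard normal and *g*(*x*) = *x*² is purely for demonstration:

```python
# Monte Carlo estimation of E(g(X)) by sample averaging.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(0.0, 1.0, size=n)  # samples from f_X (here standard normal)

def g(x):
    return x ** 2                  # an arbitrary function g(x)

estimate = g(x).mean()             # the estimator of Equation (3)
print(estimate)                    # close to E(g(X)) = 1 for this choice
```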

In the Monte Carlo simulation, the value and size of the extracted sample have an absolute effect on the results. Therefore, the method used to generate the random numbers that follow the given probability distribution is highly important. Figure 1 shows the analysis procedure of the Monte Carlo simulation [27]. An appropriate distribution can be selected based on available data, on the judgment of a knowledgeable expert, or on a combination of data and judgment. Factors for judgment include whether the variable is discrete or continuous, bounded or unbounded, its number of modes, and whether it is symmetric or skewed. The size of the extracted sample depends on the estimated standard deviation, the desired margin of error, and the critical value of the normal distribution for the significance level [28]. As a general rule of thumb, 10,000 iterations are used [29].

**Figure 1.** Procedure for the Monte Carlo simulation analysis.

For the purpose of this study, the LCOE analysis of solar PV, the major variables that determine the economics of power generation are the amount of power generated and the cost. However, these two variables fluctuate significantly and contain uncertain factors related to the solar radiation amount, technological advancement, and market conditions. When applying the deterministic technique to the economic analysis of photovoltaics, with both volatile and uncertain variables, it becomes difficult to reveal the characteristics of the variables, since uncertain future factors are simplified. However, the stochastic simulation method can produce results by considering input variables that are uncertain or volatile through an adequate probability distribution and could therefore be the more appropriate method.

#### *3.3. Sensitivity Analysis*

The variance contribution analysis of the solar LCOE distribution enables a comparative analysis of the direction and extent of the impact of each probability variable on the LCOE. The variance contribution is calculated as follows.

In the first phase, the samples of the input variables extracted through the simulation and the corresponding results are rank-ordered, and the rank correlation between each input variable's samples and the results is identified.

In the second phase, as per Equation (6), the variance contribution ($w_i$) of input variable *i* is the proportion of its squared rank correlation coefficient ($R_i^2$) to the sum of the squared rank correlation coefficients of all *N* input variables.

$$w_i = \frac{R_i^2}{\sum_{i=1}^{N} R_i^2} \tag{6}$$

The contribution retains the negative (−) or positive (+) sign of the rank correlation coefficient ($R_i$): when the coefficient is negative (−), the resulting value decreases as the input variable increases, whereas when it is positive (+), the resulting value increases.
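A minimal Python sketch of this two-phase calculation is given below; it is not the authors' code. The Spearman rank correlation stands in for the ordinal correlation coefficient, and the variable names and synthetic samples are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def variance_contributions(inputs, output):
    """Signed variance contribution of each input variable, per Equation (6):
    R_i^2 / sum_j(R_j^2), carrying the sign of the rank correlation R_i."""
    r = {name: spearmanr(x, output).correlation for name, x in inputs.items()}
    total = sum(v ** 2 for v in r.values())
    return {name: np.sign(v) * v ** 2 / total for name, v in r.items()}

# Hypothetical usage with synthetic samples:
rng = np.random.default_rng(0)
capex = rng.normal(1600, 160, 10_000)
discount_rate = rng.triangular(4.5, 5.5, 7.5, 10_000)
lcoe = 0.08 * capex + 2.0 * discount_rate + rng.normal(0, 5, 10_000)
print(variance_contributions({"CAPEX": capex, "discount rate": discount_rate}, lcoe))
```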

#### **4. Empirical Results**

#### *4.1. Data*

In this section, we apply the above-mentioned stochastic simulation technique: probability distributions are established for the variables subject to uncertainty, random samples are extracted from the relevant distributions, and the distribution of the resulting LCOE is estimated by repeatedly conducting the LCOE calculation.

Table 1 shows the input variables required for the LCOE analysis. The subjects of the analysis are 100 kW commercial facilities installed on the ground and 3 kW residential facilities installed on buildings. The random variables can be classified as internal and external factors. The internal factors consist of costs and facility characteristics, with the costs comprising CAPEX and O&M costs. In terms of facility characteristics, the capacity factor (which determines the amount of power generated) and the system degradation rate are treated as random variables. The external factors include the discount rate and corporate tax. Other variables, such as the debt ratio, loan interest rate, inflation, and lifespan, have been assigned fixed values in consideration of their relatively low significance and limited fluctuation.


**Table 1.** Input variables for the solar levelized cost of electricity (LCOE).

When sufficient data samples were available, the most appropriate probability distribution was determined through statistical verification. Conversely, when data were lacking or insufficient, probability distribution cases presented in preceding studies were referenced. A representative study on probability distribution estimation is Spooner [30], which estimated the cost distribution of a construction project and proposed application methods for the normal, log-normal, triangular, beta, and uniform distributions. The uniform distribution is used when there is an insufficient amount of data and the fluctuation range is relatively small. The triangular distribution is used when the most likely estimate (MLE) is certain and the information about the maximum and minimum values is considered concrete. However, when the possibility of a variable decreasing is very low, the minimum value does not have to be designated. This study therefore regards it as reasonable to estimate the probability distributions of the relevant variables using triangular distributions, which can be skewed. The probability simulation analysis software Crystal Ball (version 11) was used.

#### 4.1.1. Capacity Factor

The capacity factor of solar PV is based on photovoltaic energy generation data provided by the Korea Energy Agency. A goodness-of-fit test was conducted to establish the probability distribution for the capacity factor. The capacity factor samples totaled 106,654, and the Kolmogorov–Smirnov (K–S) test was used. As per Equation (7), this test computes the statistic ($D_n$) as the maximum absolute difference between the empirical cumulative distribution of the actual measurement data, $F_n(x)$, and the cumulative distribution function of the fitted distribution, $F(x)$. The smaller the statistic, the better the fit.

$$D_n = \max_{x} \left| F_n(x) - F(x) \right| \tag{7}$$

The goodness-of-fit test was conducted with a total of 14 probability distributions, from the logistic distribution to the exponential distribution. Table 2 lists the test results for each probability distribution. The results show little difference in the K–S statistic between the logistic distribution and the Student's t distribution. In this study, the logistic distribution (mean of 14.78%, scale of 0.22%), which had the smallest K–S statistic, was selected as the probability distribution for the capacity factor.


**Table 2.** Probability distribution test results.
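The distribution comparison of Table 2 could be reproduced along the following lines with scipy. This is a sketch under stated assumptions: the 106,654 measured values are not reproduced here and are replaced by synthetic stand-in data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 106,654 measured capacity-factor values (%);
# the real data are not publicly reproduced here.
rng = np.random.default_rng(1)
capacity_factor = rng.logistic(loc=14.78, scale=0.22, size=106_654)

# Fit each candidate distribution by maximum likelihood, then compute the
# K-S statistic D_n of Equation (7) against the fitted CDF.
candidates = {"logistic": stats.logistic, "student t": stats.t, "normal": stats.norm}
for name, dist in candidates.items():
    params = dist.fit(capacity_factor)
    result = stats.kstest(capacity_factor, dist.cdf, args=params)
    print(f"{name:10s} D_n = {result.statistic:.5f}")
```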

The probability density function (PDF) of a logistic distribution is presented as follows [31]:

$$f(x) = \frac{e^{-\frac{x-\mu}{s}}}{s\left(1+e^{-\frac{x-\mu}{s}}\right)^{2}} = \frac{1}{4s}\operatorname{sech}^{2}\!\left(\frac{x-\mu}{2s}\right) \tag{8}$$

The distribution has a mean of μ and a variance of $s^2\pi^2/3$. A notable feature of the logistic distribution is its bell-shaped curve, similar to a normal distribution but with heavier tails, whose spread is controlled by the scale parameter (*s*). Figure 2 illustrates the selected logistic distribution, with a mean of 14.78% and a scale of 0.22%, fitted to the actual measurement data for the capacity factor.

**Figure 2.** Probability distribution for capacity factor (logistic distribution).
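As a quick check of these properties, the fitted logistic distribution can be instantiated with scipy (a minimal sketch using the study's parameters):

```python
from math import pi
from scipy import stats

mu, s = 14.78, 0.22                  # fitted location and scale (%), per Table 2
dist = stats.logistic(loc=mu, scale=s)

print(dist.mean())                   # 14.78, equal to mu
print(dist.var(), s**2 * pi**2 / 3)  # both ~0.159, confirming variance s^2*pi^2/3
```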

#### 4.1.2. Discount Rate

The triangular distribution was used to account for the uncertainty in the discount rate. This distribution is applicable when the most likely estimate is accurate and information on the maximum and minimum is certain. The PDF of the triangular distribution is provided as follows [32]:

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(m-a)(b-a)}, & a \le x \le m \\ \dfrac{2(b-x)}{(b-m)(b-a)}, & m < x \le b \end{cases} \tag{9}$$

In the above formula, *a* refers to the minimum value; *m*, to the mode; and *b*, to the maximum value. The mean (μ) and variance ($\sigma^2$) of this distribution are as follows:

$$\mu = \frac{a+m+b}{3} \tag{10}$$

$$\sigma^2 = \frac{a^2 + b^2 + m^2 - ab - am - bm}{18} \tag{11}$$

According to the study conducted by Choi and Park [33], the discount rate reached a maximum of 7.5% in 2001, currently stands at 5.5%, and is expected to fall by a further 1 percentage point in the future. Therefore, this study applied a triangular distribution with a minimum of 4.5%, a mode of 5.5%, and a maximum of 7.5%. Figure 3 shows the triangular distribution generated based on these assumptions.
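A hedged sketch of sampling this discount-rate distribution with numpy (parameters from the text; not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Triangular distribution for the discount rate (%): min 4.5, mode 5.5, max 7.5
discount_rate = rng.triangular(left=4.5, mode=5.5, right=7.5, size=10_000)

# The sample mean should approach the closed-form mean of Equation (10)
print(discount_rate.mean(), (4.5 + 5.5 + 7.5) / 3)   # both ~5.83
```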

#### 4.1.3. O&M Costs

The preceding study by the International Energy Agency (IEA) [34] was referenced in establishing the probability distribution for O&M costs, for which a normal distribution was adopted. This distribution is applicable when the probability concentrated around the most likely estimate is high. The PDF of the normal distribution is as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{12}$$

In the above formula, μ refers to the mean, while σ represents the standard deviation. In this study, the surveyed average O&M cost was KRW 37,365 (USD 33.97; USD 1 = KRW 1,100 is applied throughout this study)/kW for commercial facilities and KRW 11,667 (USD 10.61)/kW for residential facilities. The standard deviation was assumed to be 5% of the average, identical to the study by the IEA [34], giving standard deviations for commercial and residential facilities of KRW 1,868 (USD 1.70)/kW and KRW 583 (USD 0.53)/kW, respectively. Figures 4 and 5 show the normal distributions based on these assumptions.

**Figure 3.** Probability distribution for discount rate (triangular distribution).

**Figure 4.** Probability distribution for commercial operations and maintenance (O&M) costs (normal distribution).

**Figure 5.** Probability distribution for residential O&M costs (normal distribution).
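A minimal sketch of drawing these O&M cost distributions with numpy, assuming the means and the 5%-of-mean standard deviations stated above:

```python
import numpy as np

rng = np.random.default_rng(3)

# O&M costs (KRW/kW): surveyed means, std assumed as 5% of the mean
om_commercial = rng.normal(loc=37_365, scale=0.05 * 37_365, size=10_000)
om_residential = rng.normal(loc=11_667, scale=0.05 * 11_667, size=10_000)

print(round(om_commercial.std()))   # ~1,868 KRW/kW, matching the stated deviation
```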

#### 4.1.4. CAPEX

A normal distribution is also assumed as the probability distribution for CAPEX, as for the O&M costs above. One difference is the applied standard deviation: the surveyed averages were used as-is, but the standard deviation was assumed to be 10% of the average. The reason is the greater uncertainty of CAPEX compared to O&M costs, caused by cost reductions resulting from technological advances or by factors causing unexpected cost increases.

The CAPEX for commercial facilities recorded an average of KRW 1.6 million (USD 1,454.55)/kW, while the standard deviation totaled KRW 160,000 (USD 145.45)/kW. As for residential facilities, the average reached KRW 1.8 million (USD 1,636.36)/kW, while the standard deviation amounted to KRW 180,000 (USD 163.64)/kW. Figures 6 and 7 display the normal distributions for commercial and residential facilities.

**Figure 6.** Probability distribution for commercial facility capital expenditure (CAPEX) (normal distribution).

**Figure 7.** Probability distribution for residential facility CAPEX (normal distribution).

#### 4.1.5. System Degradation Rate

The triangular distribution is assumed for the system degradation rate, following the preceding study by the IEA [34]. The technological threshold for photovoltaic energy generation is clear, so the use of a triangular distribution seems reasonable. The IEA [34] applied a mode of 0.5%, whereas this study applied the surveyed value of 0.7%. The minimum and maximum values were set at 0% and 0.8%, respectively, identical to the study by the IEA [34]. Figure 8 shows the triangular distribution generated based on these assumptions.

**Figure 8.** System degradation rate probability distribution (triangular distribution).

#### 4.1.6. Corporate Tax

Corporate tax is a policy variable, and the following scenario was hypothesized: the corporate tax will decline from its current level until full exemption is reached. The probability distribution suited to this hypothesis is the triangular distribution, whose thresholds can be set arbitrarily.

This study assumes a triangular distribution in which the mode and maximum value were set at the current corporate tax level of 24.2% (including residence tax) and the minimum value at 0% (the full-exemption scenario). As shown in Figure 9, this produces a right-triangular distribution in which the probability density is highest at 24.2% and declines gradually toward 0%.

**Figure 9.** Corporate tax probability distribution (triangular distribution).
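This right-triangular shape corresponds to a triangular distribution whose mode coincides with its maximum, e.g., with numpy (a sketch, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)

# Right-triangular distribution for corporate tax (%): the mode equals the
# maximum (current 24.2% rate), with 0% as the full-exemption minimum.
corporate_tax = rng.triangular(left=0.0, mode=24.2, right=24.2, size=10_000)
print(corporate_tax.mean())          # ~16.1, i.e., (0 + 24.2 + 24.2) / 3
```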

#### *4.2. Results of Stochastic Simulation*

This study uses the Monte Carlo simulation technique, selecting probability distributions that capture the uncertainty and variability of the variables characterizing solar PV generation. Random numbers were generated within each distribution, and by repeating this process 10,000 times, this study estimates a meaningful range for the solar LCOE with a 95% confidence interval.
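The overall procedure can be sketched as follows. This is a simplified illustration only: the LCOE expression below is a plain discounted-cost-over-discounted-energy placeholder that omits the corporate tax, debt, and inflation terms of the full model, and the 20-year lifespan is an assumption made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10_000                                          # iterations, as in the study

# Input distributions for the commercial (100 kW) case, from Section 4.1
capacity_factor = rng.logistic(14.78, 0.22, N) / 100        # fraction
discount_rate = rng.triangular(4.5, 5.5, 7.5, N) / 100
capex = rng.normal(1_600_000, 160_000, N)                   # KRW/kW
om_cost = rng.normal(37_365, 0.05 * 37_365, N)              # KRW/kW/year
degradation = rng.triangular(0.0, 0.7, 0.8, N) / 100

# Simplified placeholder LCOE: discounted lifetime cost / discounted output;
# omits corporate tax, debt structure, and inflation; 20-year life assumed.
years = np.arange(1, 21)
disc = 1.0 / (1.0 + discount_rate[:, None]) ** years
energy = 8760 * capacity_factor[:, None] * (1.0 - degradation[:, None]) ** years
lcoe = (capex + (om_cost[:, None] * disc).sum(axis=1)) / (energy * disc).sum(axis=1)

lo, hi = np.percentile(lcoe, [2.5, 97.5])
print(f"mean {lcoe.mean():.1f} KRW/kWh, 95% interval [{lo:.1f}, {hi:.1f}]")
```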

In the case of commercial solar energy, the LCOE recorded an average of KRW 159.49 (14 cents)/kWh with a standard deviation of KRW 13.31 (1 cent)/kWh. The 95% confidence interval was between KRW 133.60 (12 cents)/kWh and KRW 186.52 (17 cents)/kWh. Figure 10 shows the probability distribution for commercial solar energy generation, while Table 3 provides the relevant statistics.

**Figure 10.** Probability distribution for commercial solar LCOE.


**Table 3.** LCOE statistics for commercial solar energy generation.

In the case of the residential solar LCOE, an average of KRW 137.15 (12 cents)/kWh was recorded with a standard deviation of KRW 14.80 (1 cent)/kWh. The confidence interval at the 5% significance level showed a minimum of KRW 109.67 (10 cents)/kWh and a maximum of KRW 167.35 (15 cents)/kWh. Figure 11 and Table 4 illustrate the probability distribution and statistics for the residential solar LCOE, respectively.

**Table 4.** Residential solar LCOE statistics.


**Figure 11.** Probability distribution for the residential solar LCOE.

#### *4.3. Sensitivity Analysis Results*

Figures 12 and 13 show the results of the analysis of the variance contributions to the LCOE of solar PV. The factor with the greatest impact on the LCOE of both commercial and residential solar PV is CAPEX, whose contribution was 57.3% and 74.8% for commercial and residential facilities, respectively, an absolute majority in both cases. This implies that CAPEX is the most significant common factor to consider when seeking to improve the economics of solar power generation for both commercial and residential facilities. Therefore, identifying factors that can reduce these costs is essential.

**Figure 12.** Sensitivity of the commercial solar LCOE.

The second most influential factor is the discount rate, which accounted for 18% of the variance contribution for commercial solar energy generation and, similarly, 17.7% for residential generation facilities. This implies that policies aimed at reducing the burden of financial costs may be effective.

**Figure 13.** Sensitivity of the residential solar LCOE.

#### **5. Discussion**

The stochastic LCOE analysis methodology proposed in this study was applied to solar PV for residential and commercial use in South Korea. The deterministic LCOE analyses performed in most existing studies do not capture the uncertainty of the input variables. Although sensitivity analysis within the deterministic methodology can illustrate input-variable uncertainty, the number of scenarios it covers is limited. This study is meaningful in that confidence intervals for the results are derived by reflecting the stochastic characteristics of the input variables instead of producing a single deterministic LCOE. This study also contributes realistic analysis results by using actual data generated in solar PV projects.

Among the input variables, the capacity factor was fitted to the optimal distribution function using actual solar PV power generation data. From the statistical analysis, key statistics such as the reference value, mean, median, standard deviation, minimum, and maximum were derived. The commercial solar LCOE was estimated to range between KRW 115 (10 cents)/kWh and KRW 197.4 (18 cents)/kWh at a 95% confidence level, with a median of KRW 160.03 (15 cents)/kWh. The residential solar LCOE was estimated to range between KRW 109.7 (10 cents)/kWh and KRW 194.1 (18 cents)/kWh at a 95% confidence level, with a median of KRW 160.03 (15 cents)/kWh. The sensitivity analysis showed that CAPEX had the most significant impact on the solar LCOE, followed by the discount rate and corporate tax.

Therefore, to reduce the LCOE of solar PV, efforts to reduce CAPEX are necessary. The increasing dissemination of solar PV could itself lower PV CAPEX. The reasons behind the high hardware and soft costs of solar PV facilities in Korea are twofold: the country's particular economic environment and its lack of experience. Germany, for example, has continuously lowered costs by steadily deploying solar PV systems across the country. An important question is how much the solar LCOE could be reduced if Korea followed suit and accumulated comparable learning experience. To answer this question, we assumed that Korea holds a level of knowledge and experience equivalent to Germany's. Specifically, we assumed that Korea's hardware costs, including the costs of modules, inverters, connection bands, electric wiring, structures, and installation and construction, are reduced to German levels. We assumed the same for soft costs, including supervisory, design, and general management costs, as well as for O&M costs (parts replacement, safety management, etc.), excluding land leasing rates. Table 5 shows the cost breakdown of a 100 kW solar system in Korea.


**Table 5.** Cost breakdown of a 100 kW solar system.

Figure 14 indicates how Korea's solar LCOE could be reduced by lowering the domestic installation costs of solar energy generation facilities to German levels. Reducing hardware costs alone would lower Korea's solar LCOE by KRW 26.6 (2 cents)/kWh. Similarly, reducing soft costs and O&M costs to German levels would lower the solar LCOE by KRW 17 (1.5 cents)/kWh and KRW 6.3 (0.6 cents)/kWh, respectively. If Korea's overall installation costs fell to German levels, the solar LCOE would therefore decrease to KRW 97.2 (9 cents)/kWh. This is even less than Germany's current solar LCOE of KRW 122 (11 cents)/kWh, owing to Korea's advantages in capacity factor and corporate tax. However, such cost reductions cannot be achieved within a limited period of time; Germany reduced costs through experience gained over years of installing solar energy generation facilities. If Korea continues to deploy solar energy systems, a solar LCOE below KRW 100 (9 cents)/kWh can therefore be achieved.

**Figure 14.** Commercial solar LCOE adjustment in South Korea after lowering costs to the level of Germany.

The higher cost of domestic modules and inverters in Korea also indicates that increased efforts are required to lower costs in the manufacturing sector. In addition, the high installation and construction costs are attributable to the country's high labor costs and extended construction periods. With limited scope to lower labor costs, efforts should focus on shortening construction periods. Because civil complaints often prolong construction projects, a policy to increase public acceptance of solar energy generation facilities is also required.

Next, we examine license and permit costs. License and permit costs paid to local governments in Korea are 10 times higher than in Germany and 50 times higher than in China. The greatest challenge for solar energy generators in Korea is securing licenses and permits for development activities from local governments. This urgently calls for policies to lower license and permit costs and to streamline the related procedures. In addition, domestic grid connection costs in Korea are more than four times higher than in China, highlighting the need to lower these costs. Domestic general management costs also exceed those of other countries by more than 10 times; cost reductions must be achieved by systematizing domestic projects.

This study also recommends the reduction of value-added tax on the installation of solar energy generation facilities. Recently, the National Energy Administration of China announced a policy to reduce taxes for solar generators, introducing a refund of 50% on value-added taxes for supplies associated with solar energy and the lowering of taxes on the use of farmlands. This policy will be adopted by 2020 to promote solar PV in the country. The reduction of corporate tax on the profits of solar energy generators should also be considered. The United States is accelerating the deployment of renewable energy sources through an investment tax credit and production tax credit, serving as adequate reference points.

Policy efforts to lower discount rates are also required. Discount rates comprise the cost of debt and the cost of equity, both of which can be lowered strategically. For instance, the cost of debt can be lowered by offering preferential interest rates on loans to renewable energy generators. The promotion and support of renewable energy project financing (PF) should additionally be considered: the government's active promotion of PF and support for the development of new investment products would help establish an industry ecosystem for renewable energy and further lower financing costs.

The promotion of renewable energy is an absolute necessity, considering the need to reduce GHG emissions and improve air quality. However, if the promotion of renewable energy becomes a burden to consumers in the form of excessive electricity bills, the deployment of renewable energy could be restricted. Should the cost of solar PV be reduced based on the results of this study, it would lead to a decrease in the unit cost of solar energy generation. Consequently, a reduction in power generation-related payments and, ultimately, relief in terms of electricity bills for consumers will follow.

The methodology proposed in this study is expected to be applicable to solar and other energy sources in other countries. A stochastic approach could be generalized if the methodology of this study is widely used in future studies. In addition, meaningful implications can be drawn from comparing countries and energy sources.

**Author Contributions:** C.-Y.L. designed the study, reviewed the related literature, and interpreted the results. J.A. outlined the methodology and developed the model. All authors provided substantial writing contributions and significant comments on numerous drafts. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Korea Energy Economics Institute (KEEI) grant funded by the South Korean Prime Minister's Office.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
