1. Introduction
In recent years, the electricity demands of residential customers have drastically increased due to the high usage of household appliances and the charging of electric vehicles [1]. There have been many recommendations made by researchers on how to manage the electricity demands of residential customers, which mainly include increasing power generation facilities, developing new storage technologies and efficiently managing available electricity through the use of smart grid technology [2]. Smart grids are considered to be the best solution for managing the high and irregular electricity demands of residential consumers. Smart grids eliminate the limitations of conventional grids by integrating new technologies that allow better performance and a holistic approach in the power domain. In addition, smart grids interact with end-use consumers by analyzing energy requirements and electricity pricing, in order to provide suitable demand response programs [3]. Demand response (DR) programs encourage residential consumers to change their regular electricity usage patterns; in return, utility companies provide incentives which facilitate low electricity consumption during peak load hours, as well as maintaining network stability. Demand response program modeling can be carried out for all types of consumers, including residential, commercial and industrial consumers [1,2,3].
Among the DR programs, incentive-based DR programs motivate the customer to voluntarily participate in the program and reduce their electricity consumption for a specified time period, as requested by the utility; there exists a contract between the users and the utility company [4]. The primary objective of consumers in an incentive-based DR program is to reduce electricity costs, whereas the utility companies in these schemes intend to effectively manage and distribute the available electricity at low cost [5], as improper appliance scheduling may have adverse effects on either the customer or the utility company [6,7]. Much of the research in this domain aims to enhance the performance of DR programs, including the optimal selection of DR programs, the use of structured and load grouping-based system control approaches, as well as consumer flexibility and awareness.
A cost-effective incentive-based DR program model has been proposed for day-ahead and intraday electricity markets and implemented on a 200-unit residential building on the Spanish electricity market [8]. As a first step, a mathematical relationship between electricity costs and energy demand was proposed, using a fitting process based on historical data [8]; in addition, a dynamic incentive pricing scheme was adopted based on peak-hour load requirements. In the work proposed in [9], an incentive-based DR scheme with a long short-term memory (LSTM) network was established, in order to analyze and predict user behavior in response to participation in the DR program; the approach proposed in [9] delivered high prediction accuracy and better performance in several environments. The work presented in [10] focused on the learning paradigm for higher prediction accuracies, using a deep neural network for the incentive-scheme DR modeling. In this model, the deep neural network established the demand response interaction and developed an incentive scheme.
In that study of demand response modeling for forecasting dynamic pricing and electricity demand, a deep neural network technique was employed to account for future energy price fluctuations, and reinforcement learning was then performed to identify the ideal incentive pricing for various types of customers, ensuring benefits for both residential customers and the utility company. Similarly, in [11], reinforcement learning was applied to determine the effective incentive rate on an hourly basis, which could provide gains for both customers and service providers; in addition, a modified deep learning model based on a recurrent neural network was developed to capture uncertainty in the near future and predict day-ahead electricity costs. A quasiconvex cost function-based game theory approach was utilized to forecast the base load price, and the DR scheme was then adopted in order to alter customer electricity load and reduce their costs [12]; the simulation results proved that the proposed pricing and billing scheme was capable of controlling the energy consumption of residential customers during the peak-load condition. In [13], a novel data analytical demand response (DADR) scheme was suggested to manage peak load demand. The proposed DADR scheme accounts for various parameters collected from customers, including appliance usage factors, appliance priority indices and appliance run-time factors; by considering all these parameters, different algorithms are developed from the consumer's and utility's perspectives, in order to make DR decisions during the peak-load condition. A demand-side energy management method (DMPC) is proposed in [14], which considers different types of smart loads, renewable energy resources and energy storage systems for the DR program; in addition, the user can choose an ideal schedule according to their own preferences based on various factors, including the power reduction of flexible loads, the start time of shiftable loads and the charge/discharge routine.
In contrast to the aforementioned DR schemes, machine learning-based DR strategies have gained more attention among researchers in recent years, due to their high performance [15]. In particular, game theory-based DR strategies may be utilized in the smart grid environment for making decisions regarding the management of load balancing. A two-stage DR scheme is proposed in [16]; in the first stage, a noisy inverse optimization technique predicts the load requirement and, in the second stage, a one-leader multiple-follower stochastic Stackelberg gaming model is applied in order to make a final decision on load balancing. Here, the game model is supported by a hybrid dual decomposition-gradient descent (HDDGD) algorithm; its practical implementation in China and a distribution test system both demonstrate the effectiveness of the proposed methodology. In addition, to further improve dynamic electricity pricing predictions [17,18,19], the missing data problem is resolved with the help of an advanced deep learning method called generative adversarial networks (GAN), in which the GAN model is frequently updated in order to complete the data required for decision making when data are missing [20]. A real-time DR scheme considering customer requirements, dynamic electricity prices and outdoor temperatures, for the efficient scheduling of home appliances, has been implemented in [21], using deep reinforcement learning (DRL). However, while the aforementioned incentive-based DR approaches are very effective at balancing the interests of residential customers and utility companies, they also have many limitations, including: (i) the low performance and accuracy of existing DR schemes; (ii) the consideration of only the basic home appliance load in DR decision-making, and not the electric vehicle charging load, even though it consumes more energy; (iii) the lack of accounting for customer privacy in the structuring of DR schemes; and (iv) the failure to consider customer comfort and willingness to participate from a wider perspective.
Demonstrating the benefits of demand response modeling with data on the electricity usage of sets of residential consumers can lead to the disclosure of various patterns, which may result in security issues and privacy concerns, as discussed in [22]. Accordingly, customer privacy is one of the major challenges for incentive-based DR programs, since they have direct control over customer appliances. Recently, there have been several research studies investigating customer privacy in incentive-based DR schemes in smart grid environments [23,24]. A simple hopping scheme-based demand response program is proposed in [25] to ensure customer privacy, by scheduling appliances with an adjustable schedule that excludes information on individual appliance power consumption. In [26], a mathematical programming framework is suggested in order to balance the customer objectives of privacy and low electricity prices with the utility company's objective of managing the load demand during peak hours; in this work [26], customer privacy is protected by a privacy-protection strategy that maintains minimal changes in power between consecutive time slots in a smart grid environment. An online privacy-preserving DR program that uses a tree-based noise aggregation method has been used to protect both the privacy of customer information and the utility company's data, as described in [27]; the results obtained in [27] indicate that the proposed approach is able to manage large-scale datasets efficiently and balance this with privacy preservation. A novel encryption-based DR scheme is introduced in [28], involving major entities such as smart meters and utility providers. For each user, the information about their power consumption and identity is encrypted and sent to the utility center; in order to verify the identity of a specific customer, the administrator can access trusted parties which disclose the data gathered from the smart meter. The results show the feasibility of practical implementation. In [29], an identity-based signature encryption algorithm is proposed for providing authentication and integrity in the DR scheme, and the results indicate that the proposed technique outperforms others.
Besides implementing an efficient demand response program with customer privacy protection, identifying an appropriate strategy for handling the high volume of data collected from customer devices (sensors, smart meters and actuators) is another major challenge in a smart grid environment. These large amounts of data must be preserved in a cloud environment, which consists of resources such as hardware, software, processors and storage, for efficient management of the smart grid. However, the increasing number of users in smart grids leads to inefficient management of cloud resources when handling large numbers of user requests at the same time. Fog–cloud computing frameworks, as shown in Figure 1, allow management of the smart grid environment, due to their key characteristics of minimum latency, ease of accessibility, ability to connect to large numbers of devices, customer privacy protection, location awareness, etc. The literature dealing with the use of the fog–cloud computing concept in the efficient management of smart grids is very limited [30]. A detailed review of cloud–fog computing architecture, its applications and security-related challenges in relation to various applications is provided in [31,32].
A hierarchical cloud–fog-based model is proposed by [33], in order to provide different types of computing services at various levels and for the efficient management of resources in the smart grid; furthermore, three algorithms, throttled, round-robin and particle swarm optimization, may be used to manage load balancing, as shown in [33]. Similarly, in [34], a fog–cloud-based framework was implemented with the objective of attaining maximum performance from the smart grid resources, along with a different load-balancing algorithm. In [35], to achieve minimal response and processing times in the smart grid, fog–cloud computing was utilized, with a hybrid of particle swarm optimization and simulated annealing algorithms for load balancing; in addition, a novel server broker policy was introduced for the selection of possible fog data centers, in order to quickly respond to customer requests [35]. A few papers in the literature [36,37,38,39] have shown the importance of this environment for the effective management of smart grids, and for achieving the best processing times and the optimum usage of resources within the timeframes allotted to processes. The entire focus of this work centers on the different modules of a two-step process. The first step in the process involves organizing sets of residential users into hubs; each unit, considered as a set of users under a smart hub that coordinates the scheduling of appliances, is where the strategy profiles are developed using a discounted stochastic game.
The next step in the process is ensuring the privacy of the users, by using a load-balancing framework in a cloud–fog environment. In this module, hubs are created with users as well as a virtual manager, and the privacy of the user data is secured within the cluster, with a ratio level of 75%. The following step in the process is the GAN Q-learning module: this machine learning model is used to enable residential users to follow the pattern of the best strategy profile. Following the development of this model, users will be able to receive incentive schemes that ensure the privacy of their scheduling patterns.
The contributions of this paper are outlined below. Although the various issues and challenges involved in incentive-based DR schemes in smart grids have been comprehensively analyzed in existing studies, the high volume of data produced by smart grid entities still poses many challenges and requires more attention. Therefore, a privacy-protection-based demand response analysis using machine learning within a cloud–fog-based smart grid environment is proposed. Our proposed real-time DR model consists of two steps: in the first step, the scheduling of appliances is carried out using an optimal DR strategy developed through a discounted stochastic game on the residential side, and hubs are set up in a cloud–fog-based smart grid environment; in the second step, the DR strategy analysis for residential consumers, with privacy concerns and consideration of EVs, is performed using a GAN Q-learning model in a cloud computing environment.
The remainder of the paper is organized as follows: Section 2 explains the new approach to machine learning with cloud–fog computing analysis used in the proposed model; Section 3 presents the mathematical analysis of the model at every stage, with privacy concerns, using the GAN Q-learning model and the DR game-approach analysis; Section 4 covers the results of the privacy-concern model at different stages with the mathematical approach; Section 5 contains the conclusions of this work and discusses the future aspects and practical considerations of the proposed model in a smart grid scenario.
2. Proposed Methodology
Demand response modeling in smart grids plays an important role in understanding the key factors related to user profiles and customer interactions with the utility. This paper focuses on DR analysis at the residential consumer profile level, which is considered as the smart hub of every residential unit in the cloud–fog environment. The major new adaptations of this system are the specific DR strategy analysis used in this process and the use of cloud–fog computing models to bring data integrity to users; this model confirms that all users receive the same information at the receiver end. The first analysis in this work shows the process of handling the incentives provided to consumers based on the scheduling of appliances on a daily basis. The demand response modeling is performed on the residential side, based on the scheduling of appliances, and creates an incentive scheme using a discounted stochastic game model. Generally, stochastic games involve different stages of the game, and every user in the game is responsible for their actions; based on a set of suitable actions chosen by the user, a suitable payoff is given. The second analysis is made with respect to the incentive scheme modeling resulting from the discounted stochastic game. In this analysis, optimal solutions with privacy concerns are developed in the cloud–fog environment by using GAN Q-learning, a distributed reinforcement learning process based on generative adversarial networks.
Figure 2 exhibits the overall proposed architecture of DR modeling. In this environment, the first step handles the proper scheduling of appliances required for the optimal DR strategy, which is created using a discounted stochastic game with a set of residential consumers as the players in the model. With consideration of the scheduling of appliances based on the discounted stochastic game in the cloud–fog environment, this two-step process has been formulated. In this process, a set of payoff vectors is created based on the limited set of equilibrium payoffs. In the sequence of strategy profiles created, the payoffs of the players are based on the initial state of scheduling, S, considering the timeframe to be t = 1 to t = 24, and these elements must satisfy the equilibrium conditions of the game strategy. Scheduling pairs are chosen for each state, S, which exhibit a sequence of mixed-action combinations at every stage of the scheduling sequence. For every discounted game, each scheduling state, S, must be initiated, which is done based on a selection from the probability measure E(s); this is done in order to obtain the final equilibrium payoff. Therefore, for every scheduling set made, recommended scheduling strategy profiles are obtained from the formulated process, so that for every state the best scheduling strategy is recommended. In each set, sequences are created based on the scheduling parameters, with each sequence corresponding to a strategy profile and its defining scheduling parameters; in this process, equilibrium strategy profiles are created as sequences over the scheduling horizon. The discounted stochastic game algorithm is created at the cloud–fog level of the smart grid environment, which is the first step in the process of executing the module that determines the strategy. On the basis of the discounted stochastic game, a measurable selection of Nash equilibrium values is determined, with consideration of payoffs based on one-shot game scenarios played by the residential users in the environment. This model enables effective scheduling between the utilities and the users; a dedicated framework for this discounted stochastic game is used to create the best strategy profiles. The game in this environment is set based on the elements N, P_x and E(s), where N represents the set of users in the required state-space environment, in accordance with the cloud–fog setup, P_x represents the payoff vectors and E(s) is the probability measure over the initial scheduling states. For each user in a set, the actions performed in the scheduling environment are represented as Y*. For each set of users in the environment, there are two conditions based on the scheduling strategy performed. The chosen scheduling strategy for each set of users in the residential units is applied based on two sets of expressions, where each element corresponds to one of the available best scheduling strategy profiles created in this process.
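To make the one-shot game behind this construction concrete, the sketch below finds a pure Nash equilibrium for two residential users choosing appliance start slots by best-response iteration. The prices, incentive values, congestion penalty and user names are illustrative assumptions, not the paper's data or its exact game formulation.

```python
# Minimal sketch of the one-shot scheduling game underlying the discounted
# stochastic game: each residential user picks a start slot for a shiftable
# appliance, and a pure Nash equilibrium is found by best-response iteration.
# Prices, incentives and the user set are assumptions for illustration only.

SLOTS = list(range(18, 24))                                   # candidate start hours, 18..23
BASE_PRICE = {t: 0.30 if t <= 21 else 0.12 for t in SLOTS}    # peak vs. off-peak tariff
CONGESTION = 0.05                                             # extra cost when users overlap
INCENTIVE = 0.10                                              # utility incentive for off-peak slots

def payoff(own_slot: int, other_slot: int) -> float:
    """Incentive earned minus electricity cost for running a 1 kWh load."""
    cost = BASE_PRICE[own_slot] + (CONGESTION if own_slot == other_slot else 0.0)
    reward = INCENTIVE if own_slot > 21 else 0.0
    return reward - cost

def best_response(other_slot: int) -> int:
    """Slot that maximizes this user's payoff, given the other user's choice."""
    return max(SLOTS, key=lambda s: payoff(s, other_slot))

# Sequential best-response iteration until neither user wants to deviate.
schedule = {"user_1": SLOTS[0], "user_2": SLOTS[0]}
for _ in range(20):
    changed = False
    for user, other in (("user_1", "user_2"), ("user_2", "user_1")):
        br = best_response(schedule[other])
        if br != schedule[user]:
            schedule[user], changed = br, True
    if not changed:
        break

print("Equilibrium schedule (pure Nash of the one-shot game):", schedule)
```

In the full discounted game described above, such stage equilibria are combined across the scheduling states S and discounted over t = 1 to t = 24 to obtain the recommended strategy profiles.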
The second base expression for reaching the state of Nash equilibrium is based on the payoff vectors, P_x, required to reach the equilibrium payoffs. By this method, the discounted strategy is based within the frame of the best scheduling option and provides probability measures for the best strategy patterns that evolve. With this requirement of the state space with strategy S performed in the environment, the chosen strategy profile for each user, j, is based on the initial state in the structure and the probability measure E(s). A strategy vector is created based on the scheduled states, S, for per-day calculations in this state. A strategy is performed based on the set of all residential profiles in the dataset created in the scheduling period (t = 1 to t = 24). For every initial state s, the strategy profile creates a parameter value with a probability measure over H, which shows the outcome of Nash equilibrium in the strategy game. The final scheduled strategy pattern is based on the probability measure E(s). The resulting expected discounted payoff for each user, j, in the set of N users with initial state s is expressed in Equation (1).
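The body of Equation (1) was not recoverable from the source; as a hedged sketch, the expected discounted payoff of user j from initial state s under a strategy profile typically takes the standard form below, where the strategy profile σ, discount factor λ and stage utility u_j are notation assumed for this sketch rather than the paper's own symbols.

```latex
% Hedged sketch of a standard expected discounted payoff; \sigma, \lambda and
% u_j are assumed notation, not necessarily the paper's exact Equation (1).
\gamma_j(s,\sigma) \;=\; \mathbb{E}_{\sigma}\!\left[\,\sum_{t=1}^{24}
    \lambda^{\,t-1}\, u_j(s_t, x_t) \;\middle|\; s_1 = s \right],
\qquad 0 < \lambda < 1 .
```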
The utility function for the set of residential profiles obtained from the strategy game analysis is evaluated based on the scheduling of appliances for the set of consumers and on R′, and the payoff function used to obtain the incentive game process is based on this utility. In this final analysis, the strategy equilibrium is determined for every residential user, with the strategy, S_j, that they created, and for every initial state, in accordance with the discounted game analysis. For each selected strategy profile and for finite intervals of time, the required incentive-based strategy is announced. This analysis is continued with a learning paradigm in a cloud–fog environment with GAN approaches, after determination of the best strategy profile for creating the incentive-based structure in the smart grid environment.
3. Mathematical Analysis and the Learning Paradigm with Cloud–Fog Computing
Cloud–fog architecture is involved in this demand response modeling, with privacy concerns, for the purposes of efficient energy management. The total set of user requests handled in the framework ranges from 0 to R, and each request is handled on a different virtual machine (VM).
Figure 3 represents a detailed framework for cloud–fog architecture in a smart grid environment. Significant amounts of processing occur and response times are calculated based on the set of user requests, starting from 0 to R. Each set in the created smart hubs is supervised at the cluster level with a virtual manager, as well as an intermediate broker, according to the assignment made in the smart hub.
The processing time was calculated based on the set of user requests given to the virtual machines, as indicated by Equation (2), in which the total processing time, T_p, is expressed in terms of the per-VM processing loads, L_pk. The main analysis in the architecture of a smart grid environment aims to minimize the time required for the residential user requests which have been assigned to the virtual machines. This helps to ensure better privacy for the residential users, as every unit is managed by the smart hub with a virtual manager, where privacy can be handled more effectively. The mathematical formulation of the objective is to minimize RU_x, where RU_x represents the set of user requests handled by the virtual manager, expressed in terms of the working load initiated in the VMs by the user requests and the total efficiency and computation power of the VMs. The next aspect to consider is the response time, which is the time taken for the users to receive a response to their requests from the clusters of the corresponding smart hubs for the residential units. For the entire framework, an algorithm based on calculations of the response time is performed with mathematical analysis, as shown in Table 1.
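As an illustrative sketch of the processing- and response-time quantities above (not the paper's exact Equation (2)), the per-request processing time can be taken as the assigned load divided by the capacity of its VM, with the response time adding network latency; all variable names below are assumptions.

```python
# Illustrative sketch of per-VM processing and response times in a fog cluster.
# The names (request_loads, vm_capacities, net_latency) are assumptions for
# illustration; the paper's exact Equation (2) may differ.
from typing import List

def processing_times(request_loads: List[float], vm_capacities: List[float]) -> List[float]:
    """Processing time of each request: assigned load / capacity of its VM."""
    return [load / cap for load, cap in zip(request_loads, vm_capacities)]

def response_times(proc_times: List[float], net_latency: float) -> List[float]:
    """Response time seen by the user: processing time plus network latency."""
    return [tp + net_latency for tp in proc_times]

if __name__ == "__main__":
    loads = [120.0, 300.0, 80.0]      # work per request (arbitrary units)
    caps = [100.0, 150.0, 100.0]      # capacity of the VM each request runs on
    tp = processing_times(loads, caps)
    tr = response_times(tp, net_latency=0.05)
    print("total processing time:", sum(tp))
    print("average response time:", sum(tr) / len(tr))
```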
The steps in the algorithm focus on three major criteria. Each cloud to be managed with a virtual manager is fixed and initiated. The created managers are set with counts for load balancing. In this paper, the algorithm and the cloud–fog environment are based on a load-balancing framework. The algorithm explains the step-by-step procedure for privacy-based secured management in the environment, based on the virtual managers in the load-balancing network. In this analysis, privacy is ensured by this environment, as each cluster of the smart hub protects the information of the residential unit data through virtual managers. In the sets of residential user environments, scheduling policies have to be created based on the best strategy games performed by the users. Therefore, finding an optimal energy policy that considers the multiple tasks in the given scenario is carried out by a GAN Q-learning algorithm. The main motivation for developing a GAN Q-learning model for the SG environment is to perform data analysis on different residential classes. Among generative models, Markov processes and Boltzmann machines are typically used to analyze the data for this process. However, in this algorithm the main analysis is performed with a generator network, Q_G, in order to determine the GAN models of the optimal energy policy in state- and action-based distributions. The discriminator network, Q_D, is established to differentiate between the real strategy performed and the strategy resulting from the main generator network, Q_G.
The main network improves its overall performance based on the information obtained from the discriminator network. The process starts at every timestep: Q_G obtains a state which characterizes the input and develops the required strategy for generating the policy, as per the use case. Every strategy obtained from the set of estimates is represented as Y(s, x). In the next step, the selection of a strategy, x*, is made on the basis of these estimates.
Figure 4 depicts the analytical approach of GAN Q-learning, based on the strategy requirements of the cloud–fog environment. The required action in this case involves the best strategy obtained for maintaining the energy policy. Once performed, this action receives an incentive and moves to the next state, depending on the best strategy created in the model. The intermediate agent applies the necessary scheduling strategy, x*, so that the user with the best strategy receives an incentive reward, I_R, and moves to the next state, s′. A transition tuple, (s, x, I_R, s′), is then formed in accordance with the strategy profile, which results in an equivalent change of state. The next step involves moving to the discriminator side and updating the generative network accordingly. The entire GAN Q-learning model involves the generator and discriminator sides, and updates at the next change are performed through the main objective function of the learning process.
Figure 4 also elaborates on the distribution learned by the GAN Q-learning process based on the two states. This process trains an agent with a set of strategies for the residential profiles. The epoch states emphasize the number of users involved in the learning process and adaptations, and clearly indicate the different states in the user profiles which have been developed. Depending on the learning process, epochs and states are generated within a time frame for better strategies. This creates a policy which may be utilized as an objective by other residential users for their scheduling patterns. Updates to the discriminator network of the algorithm are made after every set of learned patterns is obtained in the complete Q-learning process. The network then determines the optimal policy from the created strategies through the objective function; this is obtained via an intermediate agent from the original set of strategy data profiles.
The final stage of obtaining the new optimal policy is based on the set of best strategy profiles created in this use case of residential consumers with the GAN Q-learning algorithm. At every epoch, each set is created by updating Q_G and Q_D in accordance with the profiles generated; the objective function for obtaining them through the learning model is given by Equation (3). Equation (3) provides the updating parameters for the epochs of the two-state network in this algorithm and creates the optimal policy for the training dataset given by the residential consumers. This is the final step in the model; it measures the effectiveness of the algorithm for every set of epochs created, and Q_G and Q_D continue to be updated until the required effectiveness is achieved.
Table 2 outlines the GAN Q-learning process for the purpose of obtaining the optimal policy created by a learning process based in a cloud–fog environment, with states and epochs. In this algorithm there are two steps involved, based on the learning rate and the agent. The first step is determining Q_G and Q_D for a set of scheduling parameters for different appliances, for t = 1 to 24. For every set, a learning pattern is formed, which is represented as x_t. After determining the learning process for a set of strategy profiles, the discriminator network and its weights are updated based on the learning rate; the learning rates are initiated at 0.5. Depending on each strategy profile, the learning state is updated so that the consumers learn them in an effective pattern and a policy for demand response is maintained. In the last step, the generator network is also updated during the Q-learning process.
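A heavily simplified sketch of this procedure is given below: a generator, Q_G, proposes a scheduling strategy x_t for each slot t = 1 to 24, the incentive reward I_R is observed, a discriminator-style score Q_D is updated, and Q_G is corrected with a Q-learning-style target, using the stated initial learning rate of 0.5. The toy environment, reward shape and exact update rules are assumptions for illustration and not the paper's precise Table 2 algorithm.

```python
# Simplified sketch of a GAN Q-learning-style loop for appliance scheduling,
# assuming a toy two-state environment and a 24-slot horizon (t = 1..24).
# The reward model and update rules are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 2, 4, 24
LEARNING_RATE = 0.5            # initial learning rate, as stated in the text

# Generator Q_G estimates Y(s, x); discriminator-style score Q_D tracks how
# "real" (incentive-consistent) each generated strategy looks.
Q_G = rng.normal(scale=0.1, size=(N_STATES, N_ACTIONS))
Q_D = np.zeros(N_ACTIONS)

def incentive_reward(state: int, action: int) -> float:
    """Toy incentive I_R: off-peak actions in state 0 earn more than others."""
    return 1.0 if (state == 0 and action < 2) else 0.2

for epoch in range(50):
    state = int(rng.integers(N_STATES))
    for t in range(HORIZON):
        # Generator proposes a strategy x_t from its current estimates Y(s, x).
        x_t = int(np.argmax(Q_G[state] + rng.normal(scale=0.05, size=N_ACTIONS)))
        i_r = incentive_reward(state, x_t)
        next_state = int(rng.integers(N_STATES))

        # Discriminator-side update: pull its score towards the observed incentive.
        Q_D[x_t] += LEARNING_RATE * (i_r - Q_D[x_t])

        # Generator-side update: one-step Q-learning-style correction, nudged by
        # the discriminator-informed score.
        target = i_r + 0.9 * np.max(Q_G[next_state]) + 0.1 * Q_D[x_t]
        Q_G[state, x_t] += LEARNING_RATE * (target - Q_G[state, x_t])
        state = next_state

print("Recommended strategy per state:", np.argmax(Q_G, axis=1))
```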
4. Results and Discussion
Cloud Analyst, which is based on CloudSim, is considered a suitable simulation tool for cloud–fog architecture frameworks and for smart hubs set up in residential environments. In this paper, three regions are analyzed as a set of three clusters, with respect to each smart hub in the environment. As the user requests are sent to each set of intermediators and the central broker, there is a need to determine which smart hub is most appropriate, based on the user requests given by the residential units. The entire module includes steps that involve creating smart hubs for sets of residential units; for each residential unit, effective scheduling of appliances is determined using a discounted stochastic game, and the outcome of the game gives the best scheduling method for the users in each residential unit. The cloud–fog environment is used in the process in order to maintain privacy among the different smart hubs created; therefore, privacy is handled effectively for the users in this process, and the data of the best scheduling policy can be maintained among the users, with the environment set up as a load-balancing network. Next, a GAN Q-learning model is used in order to obtain the best optimal energy policy in this DR model, and the best strategy developed receives incentives. This is mainly achieved through dynamic pricing schemes, where the cost on a per-day basis is reduced based on the scheduling pattern.
The entire architecture is handled based on the smart energy hubs, created as E_1 to E_n according to the residential buildings. In this analysis, we have considered a total of E = 11 energy hubs that are in direct communication with the utility provider. These users are grouped under the corresponding energy hubs, in the form of clusters or regions. Each fog node in a region contains m virtual machines and storage.
Figure 5 depicts the analysis of the different clusters with regard to their average response times, which have been calculated by the load-balancing network in this environment. Figure 6 shows the processing times with respect to the set of epochs created in the residential environment. From the analysis of the average response and processing times, it appears that the processing time improves as more clusters are created. For experimental verification of the mathematical modeling, a performance analysis was carried out for each smart energy hub created within a cluster region, using the load-balancing algorithm, which enables privacy for the users of each cluster; for each smart energy hub, the performance of one fog and one cluster/region was calculated based on the load-balancing algorithm.
Table 3 represents the response and process times involved in calculating the performance of the algorithm.
Average response and process times are used to understand load balancing on an hourly basis, from t = 1 to t = 24. Regarding the energy hubs in the cloud–fog clusters, eight hubs are formulated in this process to efficiently analyze the processing time in the layers. This analysis is performed with smart grid integration, to enhance the energy consumption of the buildings and to enable the use of the load-balancing algorithm. The simulations are carried out on the Java platform with CloudSim infrastructure. The algorithm uses the virtual manager to allocate the workload, and the fog layer receives the user requests so that the virtual manager can keep allocations to a minimum. The algorithm for the training process obtains the optimal policy based on the intermediate state actions of every strategy developed from the residential consumers' user profiles in the discounted game.
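The virtual-manager allocation described above can be illustrated with a throttled-style policy in which each incoming request is routed to the VM with the lowest relative load. The class and variable names below are assumptions for illustration, not the Cloud Analyst/CloudSim implementation.

```python
# Illustrative sketch of a virtual-manager load balancer that assigns each
# incoming user request to the least-loaded VM (a throttled-style policy).
# The request/VM model and all names are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualManager:
    vm_capacities: List[float]                      # capacity of each VM in the fog
    vm_loads: List[float] = field(default_factory=list)

    def __post_init__(self):
        self.vm_loads = [0.0] * len(self.vm_capacities)

    def assign(self, request_load: float) -> int:
        """Route the request to the VM with the lowest relative utilisation."""
        utilisation = [l / c for l, c in zip(self.vm_loads, self.vm_capacities)]
        vm = utilisation.index(min(utilisation))
        self.vm_loads[vm] += request_load
        return vm

if __name__ == "__main__":
    manager = VirtualManager(vm_capacities=[100.0, 150.0, 100.0])
    for load in [30.0, 50.0, 20.0, 40.0, 10.0]:
        print("request of load", load, "-> VM", manager.assign(load))
```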
At the beginning of every training process, a learning distribution parameter is created, based on a two-state environment model with an incentive policy. This learning algorithm creates a transition model for every pair (s, x) by exploring different sets of strategies created at this level, based on the actions developed by the users, and contributes to the development of privacy in this model. The performance study is based on the allocation time during the learning process, which is carried out in an OpenAI Gym CartPole environment with two states, using a one-hidden-layer GAN Q-learning model. The developed model uses two dense layers with 130 units, which are used to learn the reward.
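The text mentions both a one-hidden-layer model and two dense layers of 130 units; the sketch below assumes the latter: a small PyTorch network producing action-value estimates Y(s, x) for a CartPole-style task with a 4-dimensional observation and two actions. The framework choice and layer details are assumptions, not the authors' exact architecture.

```python
# Sketch of a generator Q-network with two hidden dense layers of 130 units,
# for an OpenAI Gym CartPole-style task. PyTorch and the exact layer choices
# are assumptions for illustration only.
import torch
import torch.nn as nn

class GeneratorQNetwork(nn.Module):
    def __init__(self, obs_dim: int = 4, n_actions: int = 2, hidden: int = 130):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),   # first dense layer, 130 units
            nn.Linear(hidden, hidden), nn.ReLU(),    # second dense layer, 130 units
            nn.Linear(hidden, n_actions),            # one value estimate per action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

if __name__ == "__main__":
    q_g = GeneratorQNetwork()
    sample_obs = torch.zeros(1, 4)       # CartPole observation: 4 features
    print(q_g(sample_obs))               # action-value estimates Y(s, x)
```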
Figure 7 shows the epochs needed to determine the reward policy given to the set of users, based on the time at every instant of scheduling performed. The epochs are formed on the basis of the clusters, which are set up to ensure the privacy of the sets of residential units; among the set of epochs created from the generator and discriminator models, the stages of the learning process are represented as epochs. Figure 8 portrays the optimal strategy profile determined through GAN Q-learning, using OpenAI Gym, and explains the need for GAN Q-learning in creating the optimal strategy profile for the sets of residential consumers from the 11 clusters. Figure 8 also represents the rewards: incentive approaches were used for the set of consumers in the residential units, to adopt the best strategy profiles from the GAN Q-learning process, with the epochs generated from the profile.
The computational time involved in processing this analysis has a major impact on the scheduling patterns created by the fog clusters. The clusters are required for the privacy of the residential units and for preserving the data of the clusters, and the information is only shared with the common data center in the smart grid. This information is analyzed based on the scheduling patterns and the inference time is calculated so that data privacy is maintained within the clusters in a well-established manner. The parameters considered for the learning algorithm include the training period, average amount of energy saved and the inference time. The epochs at each step help to understand the analysis of the demand response, in training processes in two-state environments.
Table 4 displays the performance and computational analysis of the learning algorithm; these results present the analysis across the different epochs created, in order to establish the need for the approximation.
Table 5 exhibits the performance analysis with respect to the learning algorithm; it states the calculations carried out with the set of parameters in the two-state environment. In this process, the reward distribution is shown with the privacy model created in the cloud–fog environment.