Data Storage Optimization Model Based on Improved Simulated Annealing Algorithm

Wang, Qiang; Yu, Dong; Zhou, Jinyu; Jin, Chaowu

doi:10.3390/su15097388


by Qiang Wang ¹,†, Dong Yu ²,*,†, Jinyu Zhou ¹ and Chaowu Jin ³

¹ Information Construction and Management Center, Jinling Institute of Technology, Nanjing 211169, China
² School of Electrical Engineering, Southeast University, Sipailou No. 2, Nanjing 210096, China
³ Institute of Electrical and Mechanical, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Sustainability 2023, 15(9), 7388; https://doi.org/10.3390/su15097388

Submission received: 2 March 2023 / Revised: 19 April 2023 / Accepted: 26 April 2023 / Published: 28 April 2023

(This article belongs to the Section Energy Sustainability)


Abstract

The smart grid information transmission network suffers from a longitudinal and horizontal penetration problem between multi-level data centers. To address it, this paper proposes a data storage optimization model for smart grids based on the Hadoop architecture and an improved Simulated Annealing algorithm. Combining the characteristics of distributed storage in cloud computing, the smart grid data are treated as a task-oriented data set, and the smart grid information platform is flattened into a collection of multiple distributed data centers. The smart grid data over a period of time were counted to derive the dependencies between task sets and data sets. According to these dependencies, a mathematical model was established in combination with the actual data transmission of the power grid, and the optimal transmission correspondence between each data set and the data center was calculated. An improved Simulated Annealing algorithm solves the longitudinal and horizontal penetration problem between multi-level data centers; when a new solution is generated, the Grey Wolf algorithm provides the direction for finding the optimal solution. This paper integrated the existing business data and computational storage resources in the smart grid to establish a mathematical model of the affiliation between data centers and data sets. The optimal distribution of the data sets was calculated, and the optimally distributed data sets were stored on distributed physical disks. Numerical examples were used to analyze the efficiency and stability of several algorithms, and simulation confirmed the effectiveness of the improved algorithm.

Keywords:

smart grid; improved simulated annealing algorithm; Grey Wolf algorithm; data storage optimization model

1. Introduction

1.1. Background

The development of smart grids is accompanied by a massive increase in grid information data [1,2,3]. Traditional data processing mainly uses parallel or distributed computing, which requires expensive storage and computing resources and effective and reasonable allocation and partitioning of various tasks of grid data [4,5]. From the smart grid construction and operation, it is found that the various types of information data generated in the grid have a wide range of distribution, different performance requirements, and processing frequencies. In the face of these problems, how to efficiently and orderly call, calculate, analyze, simulate, optimize, design, decide and store these power system data has become a challenge for developing a smart grid [6,7,8]. To further improve the efficiency and quality of information management and storage, the cloud-based smart grid information platform should be carefully studied, and its specific design and construction should be optimized [9,10]. For the smart grid information transmission network, there is a longitudinal and horizontal penetration problem between multi-level data centers [11,12]. According to the characteristics of the data solution in the power grid, based on the improved Simulated Annealing algorithm and the Grey Wolf algorithm, this paper studies the data storage optimization model of the smart grid based on Hadoop architecture.

1.2. Literature Review

Cloud computing technology uses distributed storage to meet the storage requirements of massive information data in the grid and assists with redundant storage to improve data storage reliability [13,14]. For different types of information data in the smart grid system, cloud computing data management technology can manage the data efficiently and does not need to consider the difference between its type and performance requirements [15,16]. Based on virtualization technology, cloud computing provides different service forms with various abstract resources and fully integrates the power grid’s computing resources and multiple data services [17,18]. Authors of [19] provide accurate and detailed data support for the operation control and business system through resource integration and sharing and improve the operation efficiency of the power grid. In addition, based on the cloud computing information platform, the hidden trouble of the power grid operation is timely found and restored to ensure the safe and reliable operation of the smart grid. Based on the current Internet-leading Kubernetes cluster management system and Docker containerization deployment technology, authors of [20] propose a cluster system architecture that can be dynamically extended in real-time. An efficient, reliable, and green shared computing platform has been built, and the deployment has been realized in the General Dispatching of the South China power grid. Authors of [21] combine edge computing with cloud computing to build a cloud-edge collaborative computing framework. A new centralized-distributed joint control model is proposed to establish the physical architecture of power Internet of Things based on cloud-edge collaboration. Authors of [22] analyze the flexibility sources of the data centers from different temporal and spatial dimensions. Authors of [23] propose a Hadoop-based framework for synchronous harmonic extensive data analysis of the distribution network. 
The proposed framework facilitates harmonic data processing related to storage and advanced research based on big data technologies. This framework deploys a large matrix multiplication algorithm for MapReduce programming model solutions based on solving harmonic state estimation problems during the data processing phase. The MapReduce-based harmonic distortion calculations are implemented as an advanced analysis of the harmonics. Through comprehensive numerical research, the characteristics of the proposed algorithm are analyzed, and the effectiveness of the proposed framework is verified. An essential part of cloud computing technology is cloud storage, which can consume excessive storage space while ensuring data reliability. Additionally, improper data storage location can increase network latency and give users a terrible service experience.

For the longitudinal and horizontal penetration problem between multi-level data centers in the smart grid information transmission network, cloud-based smart grid management algorithms offer a solution [24,25]. The First In First Out algorithm maintains the user-submitted tasks in a Job Queue on the Hadoop platform and assigns them through the Job Tracker [26]. The algorithm is simple to implement and has low overhead, but it does not support priority preemption and tends to block lower-level tasks. With the Fair Scheduler algorithm, a single task in the cluster can use the whole resource, and when other tasks appear, part of the resource space in the cluster is vacated for them. Each task in the cluster can occupy roughly the same amount of CPU time, so resources are shared equally among tasks [27]. The algorithm can also set priorities for tasks so that the time occupied by small and large tasks is reasonably arranged. However, it generates a large amount of intermediate data when processing many tasks in a cluster, which seriously affects system performance. The Capacity Scheduler algorithm is a task scheduling algorithm suited to multi-user shared cluster environments. Each task is submitted to a queue, a certain percentage of computational resources is allocated to each queue, and the resources of a queue are shared by all tasks in it [28]. The Genetic algorithm is among the more commonly used algorithms today. It simulates the biological evolution process and the genetic mechanism of genes, and it is a stochastic search method with highly parallel population search capability and good scalability [29]. 
The stochastic nature of the Simulated Annealing algorithm can add a great deal of computational effort to the solution process when the amount of data is relatively large, including the evaluation of the difference between the current solution and the better solution [30,31]. Using the Grey Wolf algorithm to generate new solutions significantly reduces the number of iterations required in the solution process and improves the solution speed [32,33]. A hybrid algorithm combining the Simulated Annealing and Genetic algorithms, namely the Simulated Annealing Genetic algorithm, is designed in [34]. An Ant Colony system-improved Grey Wolf Optimization is designed to solve the model in [35], where the Grey Wolf Optimization is improved through the convergence factor and the proportional weight. Authors of [36] propose a metaheuristic optimizer, the Colony Search Optimization algorithm, which mimics the social behavior of early humans. A bi-level Whale Optimization algorithm is designed and compared with the original Whale Optimization algorithm and the Moth-flame Optimization [37]. Authors of [38] propose a novel two-level Particle Swarm Optimization to solve the credit portfolio management problem. This paper uses an improved Simulated Annealing algorithm to solve the longitudinal and horizontal penetration problem between multi-level data centers. When generating a new solution, the Grey Wolf algorithm provides direction for finding the optimal solution. Table 1 includes the main features of the literature on the cloud-based smart grid management algorithm.

The power grid data in the platform grow exponentially. In addition, the data are characterized by many types, large scale, fast change, low value density, and geographically dispersed locations. How to store these massive data inexpensively, reliably, and efficiently, and how to access and analyze them quickly to meet the needs of various smart grid applications, is an urgent problem. For the longitudinal and horizontal penetration problem between multi-level data centers in the smart grid information transmission network, this paper proposes a data storage optimization model for the smart grid based on the Hadoop architecture, the improved Simulated Annealing algorithm, and the Grey Wolf algorithm. The effectiveness of the improved algorithm is verified by simulation. The novelty of this paper compared with existing works and its contributions can be summarized as follows:

(1) Based on the improved Simulated Annealing algorithm and the Grey Wolf algorithm, this paper proposes a data storage optimization model for the smart grid based on Hadoop architecture. An improved Simulated Annealing algorithm solves the longitudinal and horizontal penetration problem between multi-level data centers. The Grey Wolf algorithm generates new solutions, providing the direction to search for the optimal solution.
(2) The smart grid data are counted over time to derive the dependencies between task sets and data sets. According to these dependencies, the mathematical model is established in combination with the actual data transmission of the power grid. The optimal transmission correspondence between each data set and the data center is calculated.
(3) This paper integrates the existing business data and computational storage resources in the smart grid to establish a mathematical model of the affiliation between data centers and data sets. The optimal distribution of the data sets is calculated, and the optimally distributed data sets are stored on distributed physical disks.

The remainder of this paper is organized as follows: Section 2 introduces the materials and methods. In Section 3, the results are discussed. Section 4 concludes this study.

2. Materials and Methods

2.1. Integration of Heterogeneous Resources

The processing of heterogeneous resources is the first task of building the information platform because, in the functional scope of data collection of the platform, the main target is heterogeneous resources [43]. Heterogeneous resources are generated in various modules of the smart grid system, whether between different subsidiaries or different types of equipment or devices of the same type [44]. These factors lead to significant data dispersion and decentralization, and the heterogeneous situation is complex and challenging to share and connect. In response to these problems, the information platform scheme proposed in this paper can focus all kinds of distributed system resources on the same data platform by its robust data management system and then process and analyze the relevant data efficiently and reliably according to the service requirements provided by the system and the user’s interactive equipment. The integration framework of heterogeneous resources is shown in Figure 1.

2.2. Improved Algorithm

Hadoop is a dedicated high-speed data processing, analysis, and cloud computing storage tool. It involves a complete range of data processing subprojects and can even leave the cloud computing system with task processing [45,46]. As seen in Figure 2, the whole platform system collects, transmits, stores, and processes power grid data all the time. Combined with the features of distributed storage of cloud computing, this paper proposes a data storage optimization model for smart grid, which is divided into four stages:

(1) Treat the smart grid data as a task-oriented data set;
(2) Flatten the smart grid information platform and treat it as a collection of multiple distributed data centers;
(3) Most of the mapping relationships in the smart grid are relatively stable, so the smart grid data over a period can be counted to obtain the dependency between the task set and the data set;
(4) According to the dependency between the task set and the data set, a mathematical model is established in combination with the actual data transmission of the power grid, and an intelligent algorithm calculates the optimal transmission correspondence between each data set and the data center.

The smart grid information transmission network has a longitudinal and horizontal penetration problem between multi-level data centers. According to the characteristics of the data solutions in the power grid, this paper chooses the improved Simulated Annealing algorithm to solve the problem. It uses the Grey Wolf algorithm to produce new solutions. The randomness of the Simulated Annealing algorithm will increase the computational workload in the solution process in the case of relatively large data amounts. After the Grey Wolf algorithm produces the new solution, in the process of solving, the number of iterations is reduced, and the speed of solving is improved.

Suppose the Hadoop system has m queues, indexed by b, and each queue holds n jobs, indexed by a. Let t_(a,b) denote the running time of job a in queue b. The time required to complete the task is as follows:

T_{a} = \min \left[ \sum_{a = 1}^{n} t_{(a, b)} \right], \quad b = 1, 2, \dots, m

(1)

where m is the number of queues. Two fitness criteria are introduced into the algorithm: a short average task time and a short total task completion time. The first fitness function, the average time of the completed jobs in a queue, is given by:

P (t) = \frac{\sum_{a = 1}^{n} t_{(a, b)}}{n}

(2)

where n is the number of jobs.

The second fitness function is as follows:

Q (t) = \frac{1}{t_{(a, b)}}

(3)

The objective function is then:

S = α P (t) + β Q (t)

(4)

where α and β are the adaptive adjustment coefficients.
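Formulas (1)–(4) can be illustrated with a minimal Python sketch. It assumes that each queue is given as a list of job running times, that Formula (2) is the mean job time of one queue, and that Q(t) in Formula (3) is evaluated on the total running time of that queue; the source does not state the last point, so it is an assumption. The function names are hypothetical.

```python
from typing import List


def completion_time(queues: List[List[float]]) -> float:
    """Formula (1): smallest per-queue sum of job times t_(a,b) over all queues b."""
    return min(sum(times) for times in queues)


def average_job_time(times: List[float]) -> float:
    """Formula (2), read as the mean running time of the jobs in one queue."""
    return sum(times) / len(times)


def objective(times: List[float], alpha: float, beta: float) -> float:
    """Formula (4): S = alpha * P(t) + beta * Q(t) for one queue."""
    p = average_job_time(times)
    # ASSUMPTION: Q(t) = 1 / (total running time of the queue).
    q = 1.0 / sum(times)
    return alpha * p + beta * q


if __name__ == "__main__":
    queues = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]  # illustrative values only
    print(completion_time(queues))
    print(objective(queues[0], alpha=0.5, beta=12.0))
```

The weights α and β are passed in explicitly here; how they are adapted during the search is not specified in the text.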

In this study, the exponential annealing schedule, the most widely used and easily understood cooling method in the annealing algorithm, is adopted, as shown in the following formula:

T_{k} = μ T_{k - 1}

(5)

where T_k is the temperature at the kth iteration, and μ is the cooling coefficient.
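As a rough illustration of Formula (5), the sketch below generates an exponential cooling sequence. It assumes μ is a constant between 0 and 1. The acceptance function is the standard Metropolis rule of Simulated Annealing, which the text does not spell out, so it is included as an assumption rather than as the authors' exact rule. Function names are hypothetical.

```python
import math
import random
from typing import List


def cooling_schedule(t0: float, mu: float, steps: int) -> List[float]:
    """Return [T_0, T_1, ..., T_steps] with T_k = mu * T_(k-1)."""
    temps = [t0]
    for _ in range(steps):
        temps.append(mu * temps[-1])
    return temps


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule (assumed): always accept an improvement, otherwise
    accept a worse solution with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


if __name__ == "__main__":
    print(cooling_schedule(t0=100.0, mu=0.9, steps=3))
    print(accept(delta=1.0, temperature=10.0, rng=random.Random(0)))
```

A value of μ closer to 1 cools more slowly and explores longer; a smaller value converges faster but accepts fewer worse solutions.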

The Grey Wolf algorithm is used to generate new solutions and thereby speed up the random search. The grey wolves are divided into four types: wolf α corresponds to the best solution, wolf β to the second-best solution, wolf δ to the third-best solution, and wolf ω to the remaining solutions. The predation process of the wolves represents the optimization process of the algorithm, which mainly includes three behaviors: encircling, hunting, and attacking [47,48]. Finally, the global optimal solution is found. The specific mathematical description is as follows:

The encircling process can be expressed by Formulas (6)–(8):

\begin{array}{l} D = | C \cdot X_{P} (t) - X (t) | \\ X (t + 1) = X_{P} (t) - A \cdot D \end{array}

(6)

a = 2 - 2 \cdot \frac{t}{t_{\max}}

(7)

\begin{array}{l} A = 2 a \cdot r_{1} - a \\ C = 2 r_{2} \end{array}

(8)

In the formulas, X_P(t) represents the position vector of the prey, and X(t) represents the position vector of an individual grey wolf. A and C are coefficient vectors. a is the convergence factor. r₁ and r₂ are random vectors with components in [0, 1]. t_max is the maximum number of iterations.
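Formulas (6)–(8) can be sketched in a few lines of Python. The sketch assumes positions are plain lists of floats and that fresh random numbers are drawn per component; both are illustrative choices, and the function names are hypothetical.

```python
import random
from typing import List


def convergence_factor(t: int, t_max: int) -> float:
    """Formula (7): decreases linearly from 2 to 0 as t goes from 0 to t_max."""
    return 2.0 - 2.0 * t / t_max


def encircle(x: List[float], x_prey: List[float], a: float,
             rng: random.Random) -> List[float]:
    """Formulas (6) and (8): move a wolf at position x relative to the prey."""
    new_x = []
    for xi, pi in zip(x, x_prey):
        big_a = 2.0 * a * rng.random() - a   # A = 2a * r1 - a
        big_c = 2.0 * rng.random()           # C = 2 * r2
        d = abs(big_c * pi - xi)             # D = |C * X_P(t) - X(t)|
        new_x.append(pi - big_a * d)         # X(t + 1) = X_P(t) - A * D
    return new_x


if __name__ == "__main__":
    rng = random.Random(0)
    a = convergence_factor(t=10, t_max=100)
    print(encircle([4.0, -3.0], [0.0, 0.0], a, rng))
```

When the convergence factor reaches 0, A is 0 and the wolf lands exactly on the prey position, which matches the shrinking search range described in the text.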

The hunting process is guided by wolves α, β, δ. The location of wolf ω is updated according to the locations of wolves α, β, δ. The hunting process can be expressed by Formulas (9)–(11):

\begin{array}{l} D_{α} = | C_{1} \cdot X_{α} - X (t) | \\ D_{β} = | C_{2} \cdot X_{β} - X (t) | \\ D_{δ} = | C_{3} \cdot X_{δ} - X (t) | \end{array}

(9)

\begin{array}{l} X_{1} = X_{α} - A_{1} \cdot D_{α} \\ X_{2} = X_{β} - A_{2} \cdot D_{β} \\ X_{3} = X_{δ} - A_{3} \cdot D_{δ} \end{array}

(10)

X (t + 1) = \frac{X_{1} + X_{2} + X_{3}}{3}

(11)

where X(t) and X(t + 1) are the positions of the gray wolf after the iteration of t and the iteration of t + 1, respectively. X_α, X_β, X_δ are the positions of wolves α, β, δ.
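The hunting update of Formulas (9)–(11) can be sketched as follows, assuming list-based positions and independent random coefficients A and C for each leader and each component (an illustrative choice). The name `hunt_step` is hypothetical.

```python
import random
from typing import List, Sequence


def hunt_step(x: List[float], leaders: Sequence[List[float]], a: float,
              rng: random.Random) -> List[float]:
    """Update one wolf from the positions of the three leaders.

    leaders is (X_alpha, X_beta, X_delta); a is the convergence factor.
    """
    new_x = []
    for k, xk in enumerate(x):
        candidates = []
        for leader in leaders:
            big_a = 2.0 * a * rng.random() - a
            big_c = 2.0 * rng.random()
            d = abs(big_c * leader[k] - xk)          # Formula (9)
            candidates.append(leader[k] - big_a * d)  # Formula (10)
        new_x.append(sum(candidates) / 3.0)           # Formula (11)
    return new_x


if __name__ == "__main__":
    leaders = ([0.0, 0.0], [3.0, 3.0], [6.0, 6.0])
    print(hunt_step([10.0, -10.0], leaders, a=1.0, rng=random.Random(0)))
```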

When the prey stops moving, the grey wolves collectively attack the prey and complete the hunt. This process is simulated by the linear decrease of the convergence factor a. When |A| ≤ 1, the grey wolves attack the prey, which corresponds to the local search ability of the algorithm. When |A| > 1, the grey wolves search for other possible positions of the prey, which corresponds to the global search ability of the algorithm. The basic steps of the Grey Wolf algorithm are as follows:

(1) Algorithm initialization. A grey wolf population X_i (i = 1, 2, …, N) of size N is randomly generated in the parameter space. Set the iteration counter to t = 0 and the maximum number of iterations to t_max;
(2) Calculate the target function value of each individual. In the initial grey wolf population, all individuals are ranked by their target function values, and the positions of the three individuals with the best, second-best, and third-best values are recorded as X_α, X_β, X_δ;
(3) According to Formulas (9)–(11), calculate the distances between the other grey wolves and X_α, X_β, X_δ, and then update the position of each grey wolf;
(4) Update the parameters (a, A, C) according to Formulas (6)–(8);
(5) Calculate the current target function value of each grey wolf, and determine the new X_α, X_β, X_δ according to the target function;
(6) Set t = t + 1. When t < t_max, go to step (3) and continue iterating. When t = t_max, the iteration ends and the optimal value X_α is output;
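The steps above can be put together in a compact Python sketch of a plain Grey Wolf loop. It is not the paper's hybrid with Simulated Annealing. Two details are assumptions that the text does not state: positions are clamped to the box [lower, upper], and the best position found so far is tracked separately so that the returned value never gets worse. The name `grey_wolf_minimize` is hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def grey_wolf_minimize(objective: Callable[[Vector], float], dim: int,
                       n_wolves: int, t_max: int, lower: float, upper: float,
                       seed: int = 0) -> Tuple[Vector, float]:
    """Minimize objective with a basic Grey Wolf loop (needs n_wolves >= 3)."""
    rng = random.Random(seed)
    # Step (1): random initial population.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best = list(min(wolves, key=objective))
    best_val = objective(best)
    for t in range(t_max):
        # Steps (2) and (5): rank the wolves and pick alpha, beta, delta.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        # Step (4): convergence factor, Formula (7).
        a = 2.0 - 2.0 * t / t_max
        # Step (3): move every wolf using Formulas (9)-(11).
        new_wolves = []
        for w in wolves:
            new_w = []
            for k in range(dim):
                total = 0.0
                for leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    d = abs(big_c * leader[k] - w[k])
                    total += leader[k] - big_a * d
                # ASSUMPTION: keep positions inside the search box.
                new_w.append(min(upper, max(lower, total / 3.0)))
            new_wolves.append(new_w)
        wolves = new_wolves
        # ASSUMPTION: remember the best position seen so far.
        candidate = min(wolves, key=objective)
        candidate_val = objective(candidate)
        if candidate_val < best_val:
            best, best_val = list(candidate), candidate_val
    return best, best_val


if __name__ == "__main__":
    def sphere(x: Vector) -> float:
        return sum(v * v for v in x)

    print(grey_wolf_minimize(sphere, dim=2, n_wolves=10, t_max=50,
                             lower=-5.0, upper=5.0, seed=1))
```

In the improved algorithm described in this paper, a position update of this kind would supply the candidate solution inside the annealing loop instead of a purely random perturbation.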
The flow chart of the improved algorithm is shown in Figure 3:

2.3. Data Optimization Model

The solutions obtained from the improved Simulated Annealing algorithm described above are then optimized. Assume that there are n data centers in total, which together constitute the set A = {A_p|p = 1, 2, …, n}. Each data set is denoted d_i, and the complete set of data sets forming the data flow is denoted D. T_i is the set of tasks that use data set d_i, and it belongs to the entire task set T = {T_i|i = 1, 2, …, m}. s_i is the size of the data set d_i. For data sets d_i and d_j, the interdependency can be defined as

Y_{i, j} = | T_{i} \cap T_{j} |, \quad 1 \leq i, j \leq m

(12)

In the formula, Y_{i,j} is the number of tasks that call the data sets d_i and d_j simultaneously.
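Formula (12) amounts to counting shared tasks. The sketch below assumes each T_i is available as a Python set of task identifiers; the identifiers and the function name are made up for illustration.

```python
from typing import List, Set


def dependency_matrix(task_sets: List[Set[str]]) -> List[List[int]]:
    """Return Y with Y[i][j] = |T_i ∩ T_j| for every pair of data sets."""
    m = len(task_sets)
    return [[len(task_sets[i] & task_sets[j]) for j in range(m)]
            for i in range(m)]


if __name__ == "__main__":
    # Hypothetical example: three data sets and the tasks that call them.
    task_sets = [{"t1", "t2", "t3"}, {"t2", "t3"}, {"t4"}]
    for row in dependency_matrix(task_sets):
        print(row)
```

The diagonal entry Y[i][i] is simply the number of tasks that use data set d_i.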

For the n data centers obtained after processing, the transmission bandwidth between A_p and A_q is defined as b_pq, and the bandwidth matrix between the data centers can be expressed as

B = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1 n} \\ b_{21} & b_{22} & \dots & b_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n 1} & b_{n 2} & \dots & b_{n n} \end{bmatrix}

(13)

The transmission time of the data set d_i between the data centers A_p and A_q is

\mathrm{Time} (d_{i}, A_{p}, A_{q}) = \frac{s_{i}}{b_{p q}}, \quad 1 \leq i \leq m, \ 1 \leq p, q \leq n

(14)

The expression for the spatial distance of two different data sets, d_i and d_j, is as follows:

R (d_{i}, d_{j}) = \begin{cases} \frac{1}{Y_{i, j}}, & Y_{i, j} \neq 0, \ 1 \leq i, j \leq m \\ 0, & Y_{i, j} = 0, \ 1 \leq i, j \leq m \end{cases}

(15)
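A small sketch of Formula (15) follows, taking the second branch to cover the case Y_{i,j} = 0 and assuming the dependency counts are supplied as a nested list. The function names are hypothetical.

```python
from typing import List


def spatial_distance(y_ij: int) -> float:
    """Formula (15): 1 / Y_ij when the data sets share tasks, otherwise 0."""
    return 1.0 / y_ij if y_ij != 0 else 0.0


def distance_matrix(y: List[List[int]]) -> List[List[float]]:
    """Apply Formula (15) to every entry of the dependency matrix Y."""
    return [[spatial_distance(v) for v in row] for row in y]


if __name__ == "__main__":
    y = [[3, 2, 0], [2, 2, 0], [0, 0, 1]]  # illustrative counts only
    for row in distance_matrix(y):
        print(row)
```

Data sets that are called together by many tasks therefore get a small distance.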

For data sets with the number n, the optimal distribution of data can be generated by the optimization algorithm. The task set T_i is divided into n subsets, {T_{i}^{1}, T_{i}^{2}, …, T_{i}^{p}, …, T_{i}^{n}}. Assuming all tasks in the subset T_{i}^{p} need to transfer the data sets d_i from C_des to C_p, the overall time of transmission is as follows:

\mathrm{Time}_{i, z} = \sum_{p = 1}^{n} | T_{i}^{p} | \, \mathrm{Time} (d_{i}, C_{des}, C_{p})

(16)
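Formulas (14) and (16) can be sketched as follows. The bandwidth matrix is a nested list, and the number of tasks per data center, |T_i^p|, is passed as a list of counts. In the usage example the diagonal bandwidth is set to infinity so that a local access costs zero time. That is a modeling assumption for illustration, not something the text specifies, and the function names are hypothetical.

```python
from typing import List


def transfer_time(size: float, bandwidth: List[List[float]],
                  p: int, q: int) -> float:
    """Formula (14): time to move a data set of the given size from p to q."""
    return size / bandwidth[p][q]


def total_transfer_time(size: float, bandwidth: List[List[float]],
                        source: int, task_counts: List[int]) -> float:
    """Formula (16): sum over data centers p of |T_i^p| * Time(d_i, source, p).

    task_counts[p] is the number of tasks at data center p that need d_i.
    Terms with a zero count are skipped because they contribute nothing.
    """
    return sum(count * transfer_time(size, bandwidth, source, p)
               for p, count in enumerate(task_counts) if count > 0)


if __name__ == "__main__":
    inf = float("inf")  # ASSUMPTION: local access is treated as free
    bandwidth = [[inf, 10.0, 5.0],
                 [10.0, inf, 20.0],
                 [5.0, 20.0, inf]]
    print(total_transfer_time(100.0, bandwidth, source=0,
                              task_counts=[2, 3, 1]))
```

Evaluating this total for every candidate source data center gives the per-placement costs that the selection rule in the text compares.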

For a data set d_i, the affiliated data center should have the following conditions:

ξ = \arg \left[ \min_{1 \leq p \leq n} \left[ R (d_{i}, e_{p}) \left[ 1 - λ \frac{\min_{1 \leq j \leq m} \{ \mathrm{Time}_{j, z} \}}{\mathrm{Time}_{i, z}} \right] \right] \right]

(17)

In the formula, e_p is the geometric center of the data center.

The analysis of computational complexity is as follows: Integrate the existing business data and computing storage resources in the smart grid, put the mobile computing-related data set into the task scheduling center, and classify the data set with data association. The data center controls and analyzes the subordination relationship between the data center and the data set, calls the intelligent optimization algorithm, and calculates the optimal distribution of the data set.

3. Results and Discussion

3.1. Discussion of the Algorithm

The improved algorithm proposed in this paper was analyzed, and its feasibility, efficiency, and stability were verified by comparison with six other Hadoop-based algorithms. An experimental simulation platform was built from five PCs, one serving as the master and the other four as slaves. Each stand-alone machine runs Windows 10 with 8 GB of memory and uses the Java programming language, and Hadoop version 20.20.2 is adopted. Before installing Hadoop, Cygwin was installed on each PC to simulate the Unix operating system.

The process of building the Hadoop platform is now briefly described as follows:

(1) Install the operating system on each of the five PCs, connect them through a switch, and configure password-free login between them;
(2) Configure the /etc/hosts file and the IP address on each PC to ensure network connectivity;
(3) Install Hadoop, configure the above algorithms in the background, and run job tasks.

On this platform, the overall performance (feasibility, stability, and efficiency) of the improved Simulated Annealing algorithm is analyzed and compared. The total running time and job waiting time are measured on simulated tasks and compared with those of the First In First Out algorithm, the Fair Scheduler algorithm, and the Capacity Scheduler algorithm. According to the different power grid data under various conditions, 10 tasks are selected for analysis. At the beginning of the experiment, the 10 subtasks are marked and numbered 1–10. Each scheduler is configured for one of the four algorithms, and the tasks are processed to obtain the corresponding results. To maintain the fairness and accuracy of the experiment, the algorithms are set not to support priority control.

3.1.1. Efficiency Verification of the Algorithm

First, verify the First In–First Out algorithm. It can be seen from Figure 4 that the total running time of 10 tasks is obtained by processing according to the numbering sequence, with a total time of 49 s. Among them, five sub-tasks take a long time, namely task 2, task 3, task 5, task 7, and task 8, resulting in the compression of the running time of other tasks. At the same time, it needs to wait for a long time, and in terms of resource allocation, it falls into local optimization, reducing users’ overall satisfaction.

Figure 5 shows the running time of the task using the Fair Scheduler algorithm. The algorithm determines the execution order according to the task capacity. The total running time of all tasks is 33 s, which is lower than the First In First Out algorithm. In terms of resource allocation, there is an unreasonable resource allocation problem due to the difference in the actual resource requirements of tasks. This leads to a long-running time, a waste of resources, and a local optimization problem.

Figure 6 shows the running time of the tasks using the Capacity Scheduler algorithm. Regarding resource allocation, the order in which tasks are executed is determined by their computing capacity. The total running time of all tasks is 30 s. The algorithm allocates idle resources to the most demanding tasks to meet their computing needs. Compared with the First In First Out algorithm and the Fair Scheduler algorithm, the Capacity Scheduler algorithm improves resource allocation efficiency and algorithm performance. However, this algorithm struggles to obtain the global optimal solution.

Figure 7 shows the running time of the tasks using the Genetic algorithm. After executing 10 tasks, the total running time is 31 s. As can be seen from the figure, during execution the algorithm allocates idle resources to the more demanding tasks to meet their computational needs. However, the programming is more complex, the parameters must be set empirically, and the result depends strongly on the quality of the initial population.

Figure 8 shows the running time of the task using the Ant Colony algorithm. After executing 10 tasks, the total running time is 37 s. Ant Colony algorithm has the same initial pheromone value and tends to select the next node randomly. Although random selection can explore a larger task space and help to find potential global optimal solutions, it takes a long time to play the role of positive feedback, resulting in a slower initial convergence speed of the algorithm.

Figure 9 shows the running time of the task using the Particle Swarm Optimization algorithm. After executing 10 tasks, the total running time is 30 s. The advantages of the Particle Swarm Optimization algorithm are fast search speed, high efficiency, a simple algorithm, and suitability for real value processing. Disadvantages: Poor handling of discrete optimization problems, prone to falling into local optimization.

Figure 10 shows the running time of the task using the improved Simulated Annealing algorithm. After executing 10 tasks, the total running time is 28 s. Compared with the other algorithms, this algorithm improves the full running time and reduces task waiting time. At the same time, the algorithm can achieve the optimal global solution.

3.1.2. Stability Verification of the Algorithm

To verify the stability of the four algorithms, 10 experiments were also conducted. The total running time of each task of algorithms is shown in Figure 11. The analysis shows that the First In–First Out algorithm has good stability, but its running time is the longest, and the processing efficiency is lower than the other three types. The total running time of the improved Simulated Annealing algorithm is lower than that of the First In–First Out algorithm. In terms of stability, the improved Simulated Annealing algorithm is better than the Capacity Scheduler and Fair Scheduler algorithms. Considering comprehensively, the improved Simulated Annealing algorithm has reasonable practicability.

From the above analysis, it can be seen that the optimal solution obtained by the improved Simulated Annealing algorithm is independent of the value of the set initial solution. Improved Simulated Annealing algorithm has high parallelism and fast solving speed. However, the algorithm also has some disadvantages, such as slow convergence of the optimal solution, and the algorithm’s performance is related to the initial value.

3.2. Discussion of the Optimization Results

The method is verified by simulation. The experimental environment is as follows: the CPU is an Intel(R) Core(TM) i3-3120M, the operating system is Windows 7, and the software is Matlab 7.0. The initial temperature of the improved Simulated Annealing algorithm is set to t₀, the range of r₁ and r₂ is [0, 2], the range of A is [−1, 1], and the range of C is [0, 1]. The Grey Wolf algorithm program is run in Matlab. After about 90 iterations, the optimal value is obtained.

We selected the number of data sets d as 20, 50, 100, and 200 for simulation, and compared the algorithm’s running time.

As seen from Figure 12, for the same number of data sets, the transmission time with storage optimization was shorter than without it. This advantage becomes more evident as the number of data sets increases. For example, when the number of data sets was 20, the transmission time was 453 s without storage optimization and 221 s with storage optimization. When the number of data sets increased to 50, the transmission time without storage optimization increased by about 5 times, whereas with storage optimization it increased by only about 2.1 times.
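For the 20-data-set case, the reported figures correspond to the following relative saving and speed-up (simple arithmetic on the two values quoted above):

```latex
\frac{453 - 221}{453} = \frac{232}{453} \approx 0.512,
\qquad
\frac{453}{221} \approx 2.05
```

That is, storage optimization cuts the transmission time by roughly half in this case.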

From the above analysis, it can be concluded that the number of iterations is reduced when the Grey Wolf algorithm is used to generate new solutions. Using the Simulated Annealing algorithm avoids being trapped in a locally optimal solution. When a new solution is generated, its bandwidth is constrained, and the Grey Wolf algorithm does not need to judge the constraint conditions of the new solution. Thus, the performance of the optimization algorithm and the solution speed are both improved.

4. Conclusions

The smart grid information transmission network has a longitudinal and horizontal penetration problem between multi-level data centers. According to the characteristics of the data solution in the power grid, based on the improved Simulated Annealing algorithm and the Grey Wolf algorithm, this paper studied the data storage optimization model of the smart grid based on Hadoop architecture. The conclusions are as follows:

The smart grid data are treated as equivalent to a task-oriented data set, and the flattened smart grid information platform is treated as equivalent to a collection of multiple distributed data centers. Most of the mapping relationships in the smart grid are relatively stable. Therefore, the smart grid data over a period of time can be counted to obtain the dependency between the task set and the data set;
According to the dependency between the task set and the data set, a mathematical model is established in combination with the actual data transmission of the power grid, and the optimal transmission correspondence between each data set and the data center is calculated;
The existing business data and computing storage resources in the smart grid are integrated, the data sets related to mobile computing are put into the task scheduling center, and data sets with data associations are classified. The data center controls and analyzes the subordination relationship between the data center and the data set, calls the intelligent optimization algorithm, and calculates the optimal distribution of the data sets. The optimally distributed data sets are stored on distributed physical disks;
An improved Simulated Annealing algorithm is used to solve the longitudinal and horizontal penetration problem between multi-level data centers, and the Grey Wolf algorithm is used when generating new solutions, providing the direction of the search for the optimal solution. The optimal solution is independent of the initial values. With new solutions generated by the Grey Wolf algorithm, the annealing process is shortened: the number of iterations and the time required to solve are both reduced.
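The first three conclusions can be made concrete with a small sketch: count how often each task uses each data set over an observation period, then score a placement of data sets on data centers by the volume that has to cross between centers. This is a deliberately simplified stand-in for the paper's mathematical model. The function names, the task log, the data set sizes, the data center names, and the cost definition (usage count times size, ignoring bandwidth) are all hypothetical.

```python
from collections import defaultdict

def count_dependencies(task_log):
    """Count how often each task used each data set in the observed period."""
    counts = defaultdict(int)
    for task, dataset in task_log:
        counts[(task, dataset)] += 1
    return dict(counts)

def transfer_cost(placement, task_site, dependencies, size):
    """Volume moved between data centers for a given data set placement."""
    total = 0.0
    for (task, dataset), uses in dependencies.items():
        # Data only crosses the network when it is stored at a different
        # data center from the one running the task.
        if placement[dataset] != task_site[task]:
            total += uses * size[dataset]
    return total

# Hypothetical observation period: (task, data set) access records.
task_log = [("t1", "d1"), ("t1", "d1"), ("t1", "d2"),
            ("t2", "d2"), ("t2", "d3")]
task_site = {"t1": "dcA", "t2": "dcB"}     # where each task runs
size = {"d1": 4.0, "d2": 1.0, "d3": 2.0}   # data set sizes, arbitrary units

deps = count_dependencies(task_log)
placement_a = {"d1": "dcB", "d2": "dcA", "d3": "dcA"}
placement_b = {"d1": "dcA", "d2": "dcA", "d3": "dcB"}

print(transfer_cost(placement_a, task_site, deps, size))  # 11.0
print(transfer_cost(placement_b, task_site, deps, size))  # 1.0
```

An optimizer such as the improved Simulated Annealing algorithm would search over placements to minimize a cost of this kind. Here the second placement is cheaper simply because it stores each data set next to the task that uses it most.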

In response to various service needs, smart grid data optimization and storage should also evolve with the times, requiring attention in the following areas:

Active research into service forms suitable for the future diversity of smart grid data;
In the current information age, the number of data-generating devices is increasing and the amount of data generated is unlimited, whereas the equipment for storing and processing data is limited, so more promising data storage and processing technologies need to be developed.

Author Contributions

Conceptualization, D.Y. and Q.W.; methodology, D.Y. and Q.W.; software, Q.W. and J.Z.; validation, D.Y. and C.J.; formal analysis, Q.W. and J.Z.; investigation, J.Z.; data curation, Q.W.; writing—original draft preparation, Q.W. and J.Z.; writing—review and editing, D.Y.; visualization, D.Y. and J.Z.; supervision, D.Y.; project administration, J.Z. The authors Q.W. and D.Y. have contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, funding number 52075232.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, Z.; Deng, R.; Yau, D.K.Y.; Cheng, P. Zero-Parameter-Information Data Integrity Attacks and Countermeasures in IoT-Based Smart Grid. IEEE Internet Things J. 2021, 8, 6608–6623.
2. Shi, J.; Foggo, B.; Yu, N. Power System Event Identification Based on Deep Neural Network With Information Loading. IEEE Trans. Power Syst. 2021, 36, 5622–5632.
3. Oskouei, M.Z.; Mohammadi-Ivatloo, B.; Abapour, M.; Shafiee, M.; Anvari-Moghaddam, A. Privacy-preserving mechanism for collaborative operation of high-renewable power systems and industrial energy hubs. Appl. Energy 2021, 283, 116338.
4. Oprea, S.-V.; Bâra, A.; Marales, R.C.; Florescu, M.-S. Data Model for Residential and Commercial Buildings. Load Flexibility Assessment in Smart Cities. Sustainability 2021, 13, 1736.
5. Ansari, S.; Ayob, A.; Lipu, M.S.H.; Saad, M.H.M.; Hussain, A. A Review of Monitoring Technologies for Solar PV Systems Using Data Processing Modules and Transmission Protocols: Progress, Challenges and Prospects. Sustainability 2021, 13, 8120.
6. Alzahrani, A.; Sajjad, K.; Hafeez, G.; Murawwat, S.; Khan, S.; Khan, F.A. Real-time energy optimization and scheduling of buildings integrated with renewable microgrid. Appl. Energy 2023, 335, 566–577.
7. Guo, C.; Luo, F.; Cai, Z.; Dong, Z.Y. Integrated energy systems of data centers and smart grids: State-of-the-art and future opportunities. Appl. Energy 2021, 301, 117474.
8. Hafeez, G.; Alimgeer, K.S.; Wadud, Z.; Khan, I.; Usman, M.; Qazi, A.B.; Khan, F.A. An Innovative Optimization Strategy for Efficient Energy Management With Day-Ahead Demand Response Signal and Energy Consumption Forecasting in Smart Grid Using Artificial Neural Network. IEEE Access 2020, 8, 84415–84433.
9. Omidvar Tehrani, S.; Shahrestani, A.; Yaghmaee, M.H. Online electricity theft detection framework for large-scale smart grid data. Electr. Power Syst. Res. 2022, 208, 107895.
10. Ponnusamy, V.K.; Kasinathan, P.; Madurai Elavarasan, R.; Ramanathan, V.; Anandan, R.K.; Subramaniam, U.; Ghosh, A.; Hossain, E. A Comprehensive Review on Sustainable Aspects of Big Data Analytics for the Smart Grid. Sustainability 2021, 13, 13322.
11. Fragkos, G.; Johnson, J.; Tsiropoulou, E.E. Dynamic Role-Based Access Control Policy for Smart Grid Applications: An Offline Deep Reinforcement Learning Approach. IEEE Trans. Hum.-Mach. Syst. 2022, 52, 761–773.
12. Mohiuddin, S.M.; Qi, J. Optimal Distributed Control of AC Microgrids With Coordinated Voltage Regulation and Reactive Power Sharing. IEEE Trans. Smart Grid 2022, 13, 1789–1800.
13. Su, W.; Shi, Y. Distributed energy sharing algorithm for Micro Grid energy system based on cloud computing. IET Smart Cities 2023.
14. Wójcicki, K.; Biegańska, M.; Paliwoda, B.; Górna, J. Internet of Things in Industry: Research Profiling, Application, Challenges and Opportunities—A Review. Energies 2022, 15, 1806.
15. Ahammed, M.T.; Khan, I. Ensuring power quality and demand-side management through IoT-based smart meters in a developing country. Energy 2022, 250, 123747.
16. Chehri, A.; Fofana, I.; Yang, X. Security Risk Modeling in Smart Grid Critical Infrastructures in the Era of Big Data and Artificial Intelligence. Sustainability 2021, 13, 3196.
17. Dabbaghjamanesh, M.; Kavousi-Fard, A.; Dong, Z.Y. A Novel Distributed Cloud-Fog Based Framework for Energy Management of Networked Microgrids. IEEE Trans. Power Syst. 2020, 35, 2847–2862.
18. Rosero, D.G.; Díaz, N.L.; Trujillo, C.L. Cloud and machine learning experiments applied to the energy management in a microgrid cluster. Appl. Energy 2021, 304, 117770.
19. Ma, F.; Luo, X.; Litvinov, E. Cloud Computing for Power System Simulations at ISO New England—Experiences and Challenges. IEEE Trans. Smart Grid 2016, 7, 2596–2603.
20. Su, Y.; Zhou, T.; Zheng, W.; Zhao, L.; Xue, H.; Huang, G. Construction of Power System Computing and Analysis Platform Based on Cloud Computing. South. Power Syst. Technol. 2022, 16, 67–75.
21. Si, Y.; Tan, Y.; Wang, F.; Kang, W.; Liu, S. Cloud-Edge Collaborative Structure Model for Power Internet of Things. Proc. CSEE 2020, 40, 7973–7979.
22. Cao, Y.; Ding, Z.; Wang, P.; Zhang, S.; Liu, J.; Liu, W.; Cheng, M. Coordinated Operation for Data Center and Power System in the Context of Energy Internet (II): Opportunities and Challenges. Proc. CSEE 2022, 42, 3512–3526.
23. Cao, Z.; Lin, J.; Wan, C.; Song, Y.; Taylor, G.; Li, M. Hadoop-based framework for big data analysis of synchronised harmonics in active distribution network. IET Gener. Transm. Distrib. 2017, 11, 3930–3937.
24. Gao, J.; Ding, X.; Song, X.; Fan, L. Routing Algorithm for Information Transmission in Neighborhood Area Network towards Smart Grid. Int. J. Signal Process. Syst. 2016, 4, 344–348.
25. Yu, B.; Yin, X.; Chen, X.; Zhang, Z.; Jiang, L. Hybrid Hierarchical Communication Network Optimal Placement for Transmission Line Online Monitoring in Smart Grid. J. Commun. 2016, 11, 798–804.
26. Shi, X.; Li, X. Operations Design of Modular Vehicles on an Oversaturated Corridor with First-in, First-out Passenger Queueing. Transp. Sci. 2021, 55, 1187–1205.
27. Wei, J.; Ren, R.; Juarez, E.; Pescador, F. A linux implementation of the energy-based fair queuing scheduling algorithm for battery-limited mobile systems. IEEE Trans. Consum. Electron. 2014, 60, 267–275.
28. Pham, T.D.; Hong, W.-K. Genetic algorithm using probabilistic-based natural selections and dynamic mutation ranges in optimizing precast beams. Comput. Struct. 2022, 258, 106681.
29. El-Nemr, M.; Afifi, M.; Rezk, H.; Ibrahim, M. Finite Element Based Overall Optimization of Switched Reluctance Motor Using Multi-Objective Genetic Algorithm (NSGA-II). Mathematics 2021, 9, 576.
30. Hao, X.; Liu, J.; Zhang, Y.; Sanga, G. Mathematical model and simulated annealing algorithm for Chinese high school timetabling problems under the new curriculum innovation. Front. Comput. Sci. 2021, 15, 151309.
31. Suanpang, P.; Jamjuntr, P.; Jermsittiparsert, K.; Kaewyong, P. Tourism Service Scheduling in Smart City Based on Hybrid Genetic Algorithm Simulated Annealing Algorithm. Sustainability 2022, 14, 16293.
32. Xie, Q.; Guo, Z.; Liu, D.; Chen, Z.; Shen, Z.; Wang, X. Optimization of heliostat field distribution based on improved Gray Wolf optimization algorithm. Renew. Energy 2021, 176, 447–458.
33. Qasim, O.S.; Algamal, Z.Y. A gray wolf algorithm for feature and parameter selection of support vector classification. Int. J. Comput. Sci. Math. 2021, 13, 93.
34. Lu, F.; Bi, H.; Huang, M.; Duan, S. Simulated Annealing Genetic Algorithm Based Schedule Risk Management of IT Outsourcing Project. Math. Probl. Eng. 2017, 2017, 6916575.
35. Lu, F.; Feng, W.; Gao, M.; Bi, H.; Wang, S.; Mauriello, F. The Fourth-Party Logistics Routing Problem Using Ant Colony System-Improved Grey Wolf Optimization. J. Adv. Transp. 2020, 2020, 8831746.
36. Wen, H.; Wang, S.X.; Lu, F.Q.; Feng, M.; Wang, L.Z.; Xiong, J.K.; Si, M.C. Colony search optimization algorithm using global optimization. J. Supercomput. 2021, 78, 6567–6611.
37. Lu, F.; Yan, T.; Bi, H.; Feng, M.; Wang, S.; Huang, M. A bilevel whale optimization algorithm for risk management scheduling of information technology projects considering outsourcing. Knowl.-Based Syst. 2022, 235, 107600.
38. Lu, F.-Q.; Huang, M.; Ching, W.-K.; Siu, T.K. Credit portfolio management using two-level particle swarm optimization. Inf. Sci. 2013, 237, 162–175.
39. Jia, Y.H.; Mei, Y.; Zhang, M. A Bilevel Ant Colony Optimization Algorithm for Capacitated Electric Vehicle Routing Problem. IEEE Trans. Cybern. 2022, 52, 10855–10868.
40. Yuan, S.; Xu, Y.; Mu, B.; Zhang, L.; Ren, J.; Ma, S.; Duan, W. An Improved Continuous Tabu Search Algorithm with Adaptive Neighborhood Radius and Increasing Search Iteration Times Strategies. Int. J. Artif. Intell. Tools 2021, 30, 2150001.
41. Dagal, I.; Akın, B.; Akboy, E. Improved salp swarm algorithm based on particle swarm optimization for maximum power point tracking of optimal photovoltaic systems. Int. J. Energy Res. 2022, 46, 8742–8759.
42. Zhou, S.; Xing, L.; Zheng, X.; Du, N.; Wang, L.; Zhang, Q. A Self-Adaptive Differential Evolution Algorithm for Scheduling a Single Batch-Processing Machine With Arbitrary Job Sizes and Release Times. IEEE Trans. Cybern. 2021, 51, 1430–1442.
43. Zhao, X.; Li, L.; Tao, Y.; Lai, S.; Zhou, X.; Qiu, J. Aggregated operation of heterogeneous small-capacity distributed energy resources in peer-to-peer energy trading. Int. J. Electr. Power Energy Syst. 2022, 141, 108162.
44. González, I.; Calderón, A.J.; Portalo, J.M. Innovative Multi-Layered Architecture for Heterogeneous Automation and Monitoring Systems: Application Case of a Photovoltaic Smart Microgrid. Sustainability 2021, 13, 2234.
45. Ramdane, Y.; Boussaid, O.; Boukraà, D.; Kabachi, N.; Bentayeb, F. Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance. Parallel Comput. 2022, 111, 102918.
46. Huang, Y.F.; Wu, H.Y. Image retrieval based on ASIFT features in a Hadoop clustered system. IET Image Proc. 2020, 14, 138–146.
47. Komathi, C.; Umamaheswari, M.G. Erratum to “Design of Gray Wolf Optimizer Algorithm-Based Fractional Order PI Controller for Power Factor Correction in SMPS Applications”. IEEE Trans. Power Electron. 2020, 35, 5543.
48. Sun, X.; Hu, C.; Lei, G.; Guo, Y.; Zhu, J. State Feedback Control for a PM Hub Motor Based on Gray Wolf Optimization Algorithm. IEEE Trans. Power Electron. 2020, 35, 1136–1146.

Figure 1. Integration of heterogeneous resources.

Figure 2. The overall framework of the information platform.

Figure 3. The flow chart of the improved algorithm.

Figure 4. First In–First Out algorithm task runtime.

Figure 5. Fair Scheduler algorithm task running time.

Figure 6. Capacity Scheduler algorithm task running time.

Figure 7. Genetic algorithm task running time.

Figure 8. Ant Colony algorithm task running time.

Figure 9. Particle Swarm Optimization algorithm task running time.

Figure 10. Improved Simulated Annealing algorithm task runtime.

Figure 11. Comparison of the four scheduler algorithms.

Figure 12. Comparison of storage optimization data.

Table 1. The main features of the literature on the cloud-based smart grid management algorithm.

Literature	Algorithms	Design Concept	Advantages	Disadvantages
Ref. [26]	First In First Out algorithm	Schedules tasks according to the time and priority of the user’s submission	The algorithm is relatively simple and easy to implement	Suitable only for single-user, single-task jobs; unsuitable when multiple users share different types of jobs
Ref. [27]	Fair Scheduler algorithm	Enables all jobs to receive the same amount of resources	Parallel execution of multiple types of jobs in a computable framework. Addresses underutilization of resources	Generates a large amount of intermediate data when processing many tasks in a cluster, which seriously affects the performance of the system
Ref. [28]	Capacity Scheduler algorithm	Resources are allocated according to the different needs of each queue, ensuring that each job has the resources it needs	Improved system throughput. Supports multi-user, multi-type jobs	Need to configure the queue parameters for the job manually
Ref. [29]	Genetic algorithm	It works based on simulating the biological evolution process and genetic mechanism of genes	High parallel population search capability and good scalability	The programming is more complex. The setting of parameters needs to be determined empirically. Stronger dependence on the merits of the initial population
Refs. [30,31]	Simulated Annealing algorithm	Derived from the solid annealing principle, it is a probability-based algorithm	The computational process is simple, general and robust, suitable for parallel processing, and can be used to solve complex non-linear optimization problems	Slow convergence, long execution time, algorithm performance related to initial values, and parameter sensitivity
Refs. [32,33]	Grey Wolf algorithm	An optimized search method inspired by the prey-hunting activities of the grey wolf	Strong convergence performance, few parameters, easy to implement	The location update equation has a strong exploitation capability and a weak exploration capability. Global search accuracy is slightly poor
Ref. [39]	Ant Colony Optimization algorithm	Arises from the study of ant colony behavior	Easy to find the optimal global solution	Slow convergence rate. The contradiction between population diversity and convergence rate
Ref. [40]	Tabu Search algorithm	Simulates the human characteristic of having a memory function	Diversifying search paths used to jump out of the optimal local solution	With weak global search capability, the search result depends entirely on the initial solution and the mapping relationship of the neighborhood
Ref. [41]	Particle Swarm Optimization algorithm	Inspired by the foraging behavior of birds	Highly versatile. Simple in principle and easy to implement. Fast convergence	Poor handling of discrete optimization problems, prone to falling into local optimum solutions
Ref. [42]	Differential Evolution algorithm	A stochastic search algorithm optimized for functions of actual variables	The overall optimal solution can be found with a high probability, which is suitable for large-scale parallel distribution processing	The variability between the individuals in the later stages of the algorithm decreases, and it is easy to fall into a local optimum
This paper	Improved Simulated Annealing algorithm and Grey Wolf algorithm	A double fitness constraint equivalent is introduced into the improved Simulated Annealing algorithm. The Grey Wolf algorithm is used to generate a new solution	The optimal solution is independent of the value of the initial set. After drawing on the Grey Wolf algorithm to create new solutions, the number of iterations is reduced	The Grey Wolf algorithm and the Simulated Annealing algorithm need to be modeled mathematically separately

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
