Design of an Energy Supply and Demand Forecasting System Based on Web Crawler and a Grey Dynamic Model

Lin, Gang; Liang, Yanchun; Tavares, Adriano

doi:10.3390/en16031431


by Gang Lin ¹,², Yanchun Liang ¹,* and Adriano Tavares ²

¹ School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China

² Department of Industrial Electronics, School of Engineering, University of Minho, 4800-058 Guimarães, Portugal

* Author to whom correspondence should be addressed.

Energies 2023, 16(3), 1431; https://doi.org/10.3390/en16031431

Submission received: 4 December 2022 / Revised: 13 January 2023 / Accepted: 23 January 2023 / Published: 1 February 2023


Abstract

An energy supply and demand forecasting system can help decision-makers grasp more comprehensive information, make accurate decisions and even plan a carbon-neutral future when adjusting the energy structure, developing alternative energy resources and so on. This paper presents a hierarchical design of an energy supply and demand forecasting system based on a web crawler and a grey dynamic model called GM(1,1), which covers the whole process of data collection, data analysis and data prediction. It mainly consists of three services, namely the Crawler Service (CS), the Algorithm Service (AS) and the Data Service (DS). The architecture of multiple loosely coupled services makes the system flexible enough to accommodate more data and more advanced prediction algorithms in future energy forecasting work. To achieve higher prediction accuracy with GM(1,1), this paper illustrates some basic enhanced methods and their combinations with adaptable variable weights. The system was implemented and tested with models set up separately for coal, oil and natural gas; the enhanced GM was better than the original GM, with a relative error of about 9.18% on validation data between 2010 and 2020. All results are available for reference when adjusting the energy structure and developing alternative energy resources.

Keywords: energy supply and demand forecasting system; crawler service; algorithm service; data service; GM(1,1)

1. Introduction

An energy supply and demand forecasting system is often used for energy demand forecasting, which is essential for initiating development plans, making operational and maintenance decisions, and performing strategic energy planning [1]. A complete energy supply and demand forecasting system includes data collection, data handling, data presentation and so on. Therefore, the architecture, data sources and algorithms should be the focus of the system design. In fact, energy is so important that the National Bureau of Statistics releases energy data at regular intervals on the Internet. In addition, some enterprises also record energy data daily, weekly, monthly or yearly on their LAN. Therefore, historical energy data can be collected from the Internet or an intranet. To obtain energy data dynamically, a web crawler model is considered. A web crawler is recursive at its core: it retrieves the page contents for a URL, examines that page for another URL, and retrieves that page in turn [2]. The first-generation tools were general search engines that try to search all pages [3]. A topical crawler, or focused crawler, differs from a general search engine in that it only focuses on pages associated with a given set of topics [4]. To collect and analyze topic-specific web information more efficiently without downloading many irrelevant pages, heuristic algorithms, context graphs, machine learning, page features and other techniques have been studied [5,6,7,8,9].

The web crawler model is the basis of data acquisition for generating a report, especially for an enterprise whose existing website systems provide neither API support nor approved database access rights [10]. In general, it is easier to obtain data from a stable API than from an unstable website, since an upgrade of the website may change the style of a data table, the position of data, the HTML tag names, the data format or other details under the same URL. Furthermore, a website on a server is usually upgraded more frequently than an API that is part of a business service. The design of the web crawler model therefore has to account for the instability of websites and support dynamically configured parse rules. Designing a dynamic web crawler that adapts to different environments is a key issue.

Grey system theory, abbreviated to GST or GS, was pioneered by Deng Julong in 1982 and is suitable for solving uncertain problems with greyness such as few data, few samples, incomplete information and lack of experience [11]. GS has been successfully applied to many different fields [11,12,13,14,15,16,17,18,19,20,21,22], including energy forecasting tasks such as coal and gas outburst prediction [14], gas well productivity [16], electricity consumption prediction [22] and so on. The basic Grey Model GM(1,1) hides the law of grey cause and white effect, and its expression is shown as Formula (1), where x⁽⁰⁾(k) is a certain quantity as a white output, b is a grey input equivalent, a is called the developing coefficient, and z⁽¹⁾ is the MEAN series of the Accumulated Generating Operation (AGO) series x⁽¹⁾. z⁽¹⁾(k) is obtained by averaging adjoining data in the AGO series, as shown in Formula (2).

x⁽⁰⁾(k) + a · z⁽¹⁾(k) = b (k = 1, 2,…, n)

(1)

z⁽¹⁾(k) = (x⁽¹⁾(k − 1) + x⁽¹⁾(k))/2 (k = 2, 3,…, n)

(2)
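
Formulas (1) and (2) can be illustrated with a short Python sketch. It assumes the usual least-squares estimate of a and b over k = 2,…, n, which is not spelled out in the text above, and the function names are illustrative only.

```python
def ago(x0):
    """Accumulated Generating Operation: x1(k) = x0(1) + ... + x0(k)."""
    total, x1 = 0.0, []
    for value in x0:
        total += value
        x1.append(total)
    return x1


def mean_sequence(x1):
    """Formula (2): z1(k) = (x1(k-1) + x1(k)) / 2 for k = 2..n."""
    return [(x1[k - 1] + x1[k]) / 2 for k in range(1, len(x1))]


def estimate_parameters(x0):
    """Least-squares estimate of (a, b) in x0(k) + a*z1(k) = b, k = 2..n.

    Assumption: ordinary least squares of x0(k) against z1(k).
    """
    z = mean_sequence(ago(x0))
    y = x0[1:]
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)  # slope of x0 against z1 equals -a
    intercept = (sy - slope * sz) / n                  # intercept equals b
    return -slope, intercept


a, b = estimate_parameters([1.0, 2.0, 3.0])
```

At least three raw values are needed, so that the regression has two or more points.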

There are two stages in the development of grey modeling, namely the smoothing stage and the non-smoothing stage: Smo (Smooth) GM(1,1), introduced in 1982, belongs to the former, while Steep GM(1,1) and Undulating GM(1,1) (UGM), introduced in 2001, belong to the latter [11]. When using GM(1,1), the features of the raw data series need to be identified before the form or stage is determined. The expressions of Steep GM(1,1) and UGM are shown in Formulas (3) and (4), respectively.

x⁽⁰⁾(k) + a · z⁽¹⁾(k − t) = b² (k = t + 1, t + 2,…, n)

(3)

x⁽⁰⁾(k) + atan((k − t) · p · z⁽¹⁾(k − t)) = b sin((k − t) · p) (k = t + 1, t + 2,…, n)

(4)

If z⁽¹⁾(k) is replaced with x⁽¹⁾(k) in Formula (1), the result is the primitive form of GM(1,1), as shown in Formula (5).

x⁽⁰⁾(k) + a · x⁽¹⁾(k) = b (k = 1, 2,…, n)

(5)

In 2014, Liu et al. defined four basic models of GM(1,1), namely the Even Grey Model (EGM), the Original Difference Grey Model (ODGM), the Even Difference Grey Model (EDGM), and the Discrete Grey Model (DGM) [23]. Both EGM and EDGM are derived from the mean/even form in Formula (1), ODGM is derived from the primitive form in Formula (5), and DGM is the discrete form of GM(1,1), as shown in Formula (6).

x⁽¹⁾(k + 1) = β₁ · x⁽¹⁾(k) + β₂ (k = 1, 2,…, n−1)

(6)
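
A minimal Python sketch of the discrete form in Formula (6) follows. It assumes that β₁ and β₂ are estimated by ordinary least squares and that x⁽¹⁾(1) = x⁽⁰⁾(1) is used as the initial value; both are common choices but are assumptions here, and the function names are hypothetical.

```python
def fit_dgm(x0):
    """Least-squares estimate of (beta1, beta2) in x1(k+1) = beta1*x1(k) + beta2."""
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    u, v = x1[:-1], x1[1:]          # pairs (x1(k), x1(k+1))
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(p * p for p in u)
    suv = sum(p * q for p, q in zip(u, v))
    beta1 = (n * suv - su * sv) / (n * suu - su * su)
    beta2 = (sv - beta1 * su) / n
    return beta1, beta2


def forecast_dgm(x0, steps):
    """Iterate Formula (6) from x1(1) = x0(1), then difference back to x0.

    Returns (fitted values for the known years, predicted values for `steps` further years).
    """
    beta1, beta2 = fit_dgm(x0)
    x1_hat = [x0[0]]
    for _ in range(len(x0) + steps - 1):
        x1_hat.append(beta1 * x1_hat[-1] + beta2)
    restored = [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x1_hat))]
    return restored[:len(x0)], restored[len(x0):]


fitted, predicted = forecast_dgm([2.0, 4.0, 8.0, 16.0], steps=2)
```

A geometric raw series such as the one in the usage line is reproduced exactly by this model, which makes it a convenient sanity check.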

In 2021, Xu et al. summarized some revised models of GM(1,1) to improve prediction accuracy, including the application of the segment modeling method, cubic spline function, Fourier series, particle swarm algorithm and so on [24].

When a forecasting system adopts GM(1,1), it faces several challenges: handling the data streams of external data, internally stored data and data generated by multiple algorithms, as well as achieving sufficient prediction accuracy. This paper makes two main contributions: one is the system design of three services able to accommodate different data sources and algorithms, and the other is the implementation of two basic enhanced methods and their combination for GM(1,1) to improve the prediction accuracy. Firstly, the CS is introduced to collect the energy supply and demand data available online; it allows different URL sources to be configured, and the data are stored after validation into text files, a relational database or a NoSQL database through the DS. Secondly, this paper illustrates the application based on GM(1,1), including two basic enhanced methods and their combination with adaptable variable weights for higher accuracy. Owing to the loosely coupled design between the AS and the other services, further algorithms can be added to the system once they implement the specified interfaces. Finally, the system was implemented and tested according to the design, with models set up separately for coal, oil and natural gas in the form of values and percentages.

2. Design on Energy Supply and Demand Forecasting System

The final design of the proposed software stack is shown in Figure 1. There are five main parts: the Client, the Crawler Service (CS), the Data Service (DS), the Data Base, and the Algorithm Service (AS). The CS obtains data about energy supply and demand from websites or services through the Internet or an intranet. The DS stores data into databases or files according to settings customized in advance. The AS generates the associated prediction data series with different instances of GMs, such as smoothing GM(1,1) or non-smoothing ones. The Client can combine both the CS and the AS, and offers an API for external applications such as websites, desktop applications, smartphone applications, etc. In fact, the Client, CS, DS and AS could be merged into a unified service such as a Data Handling Service (DHS), which would be simpler. However, this would reduce the overall flexibility of the software. For example, within the design of five parts, the CS can be published as a scheduled service task that runs, say, at twelve o’clock, so that all the raw data are prepared in advance; when the AS is called, these data are fetched as the input to the selected algorithm. That is to say, the CS and the AS share the data in the Data Base through the DS: the CS adds raw data, and the AS generates prediction data based on the raw data.

2.1. Crawler Service and Data Service

Primary data are crawled from the configured websites by the CS and then stored by the DS into different databases and files according to customized settings, such as the relational database DM8, the NoSQL database HBase, JSON files or XML files. Firstly, users perform the necessary data initialization, including the target website URL, the parse rules and so on. Secondly, the CS generates the request to access the target URL. Thirdly, as soon as the response is returned from the target website, the CS parses it according to the configured parse rules. The whole logic is shown in Figure 2. Both the CS and the AS can be accessed by the Client, but no Client is allowed to access the database or data files directly without calling the DS, since the DS is the only legal access point for data files and databases. The DS provides two main functions, data storing and data retrieval. The CS mainly calls the data storing function of the DS.
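
The configure, request and parse steps can be sketched as follows. This is only an illustration under stated assumptions: the configuration keys (`url`, `row_tag`, `cell_tag`), the class and function names, and the example URL are all hypothetical, and the HTTP request is passed in as a `fetch` callable so that the sketch runs without network access.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects cell texts grouped by row, using configurable tag names as parse rules."""

    def __init__(self, row_tag="tr", cell_tag="td"):
        super().__init__()
        self.row_tag, self.cell_tag = row_tag, cell_tag
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == self.row_tag:
            self._row = []
        elif tag == self.cell_tag and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == self.cell_tag and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == self.row_tag and self._row is not None:
            if self._row:                       # skip rows without matching cells
                self.rows.append(self._row)
            self._row, self._cell = None, None


def crawl(config, fetch):
    """Request the configured URL through `fetch`, then parse the response."""
    html = fetch(config["url"])
    parser = TableRowParser(config["row_tag"], config["cell_tag"])
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical configuration and a canned response standing in for the target website.
config = {"url": "https://example.org/energy-table", "row_tag": "tr", "cell_tag": "td"}
sample = ("<table><tr><th>Year</th><th>Coal</th></tr>"
          "<tr><td>1990</td><td>76.2</td></tr>"
          "<tr><td>1991</td><td>76.1</td></tr></table>")
rows = crawl(config, fetch=lambda url: sample)
```

Keeping the tag names in the configuration rather than in the code reflects the point made above: when a website changes its markup, only the parse rules need to be updated.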

2.2. Algorithm Service and Data Service

Energy supply and demand forecasting is the estimation of future energy consumption based on the various data and information available [1], with the existing energy data as the input and the prediction data generated by the system as the output. To validate the model performance without waiting for the future, the existing energy data can be split into training and validation data sets. Generally, the raw data lie outside the system, or form part of an input from beyond the system. When the data source URL and parse rules are configured in advance, as described in Section 2.1, the input can, from the perspective of the whole system, be regarded as this configured information instead.

Unlike the CS, which obtains data from external URLs, the DS is mainly responsible for internal data access. The DS offers a data service that includes storing data into databases or files and retrieving data from them. After the CS is called, the raw data are collected for coal, gas, oil and so on. To illustrate the value of these data, several different kinds of algorithms were implemented in the AS. Although the current AS implementation focuses on GM(1,1), its design is generic and open to other algorithms: once a new algorithm is implemented, it can be embedded in the AS. The AS generates prediction results based on the raw data from the CS. It calls both the data retrieval service and the data storing service of the DS: the retrieval service picks up the raw data collected by the CS, and the data storing service stores the prediction results produced by the AS. The whole logic is shown in Figure 3.
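
The relationship between the AS, the DS and pluggable algorithms might look like the following sketch. All class and method names are hypothetical, the DS is replaced by an in-memory stand-in, and the registered algorithm is a trivial placeholder rather than a grey model.

```python
from abc import ABC, abstractmethod


class ForecastAlgorithm(ABC):
    """Hypothetical interface an algorithm implements to be embedded in the AS."""

    name: str

    @abstractmethod
    def predict(self, series, steps):
        """Return `steps` predicted values for the given raw series."""


class InMemoryDataService:
    """Stand-in for the DS: the only access point for stored data."""

    def __init__(self):
        self._store = {}

    def store(self, key, values):
        self._store[key] = list(values)

    def retrieve(self, key):
        return list(self._store[key])


class LastValueAlgorithm(ForecastAlgorithm):
    """Placeholder algorithm (not a grey model): repeats the last observation."""

    name = "last-value"

    def predict(self, series, steps):
        return [series[-1]] * steps


class AlgorithmService:
    def __init__(self, data_service):
        self._ds = data_service
        self._algorithms = {}

    def register(self, algorithm):
        self._algorithms[algorithm.name] = algorithm

    def run(self, algorithm_name, raw_key, steps):
        raw = self._ds.retrieve(raw_key)                         # data retrieval via the DS
        result = self._algorithms[algorithm_name].predict(raw, steps)
        self._ds.store(f"{raw_key}/{algorithm_name}", result)    # data storing via the DS
        return result


ds = InMemoryDataService()
ds.store("coal", [71.5, 71.6])      # in the described system the CS would add this raw data
service = AlgorithmService(ds)
service.register(LastValueAlgorithm())
prediction = service.run("last-value", "coal", steps=2)
```

Because the AS only depends on the `predict` method, a new algorithm can be added by registering another class without touching the CS or the DS.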

Firstly, the raw data are picked up as soon as a task starts. A task can be periodic or ad hoc: for example, a task can be scheduled every year, or a user can trigger a task by clicking a button or link after the basic task configuration is set. Secondly, the raw data must be validated, as only valid raw data proceed to the next step. This validation makes sure that the type of each data value is correct and that the amount of data reaches the basic minimum threshold. Thirdly, the data are validated again before each algorithm is actually executed, namely to check how well the data match the algorithm. Unlike the first validation, which is general for all algorithms, the second validation is specific to one algorithm. Different algorithms have different validation rules, such as level ratio detection for GM(1,1). Fourthly, the AS builds the model from the raw data and generates the prediction results. Finally, these prediction results are stored by calling the DS. In fact, level ratio detection does not always succeed. When the result is tagged as a failure, the logic of step 6.3 in Figure 3 applies, which is detailed in Figure 4.
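
The two validation stages can be sketched in Python as below. The sketch assumes the admissible level ratio interval (e^(−2/(n+1)), e^(2/(n+1))) that is commonly quoted for GM(1,1); the text above does not state the interval, and the minimum count of four values is likewise an illustrative threshold.

```python
import math


def general_validation(series, min_count=4):
    """First check: enough values, each of them a finite number.

    `min_count` is a hypothetical threshold chosen for illustration.
    """
    if len(series) < min_count:
        return False
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) and math.isfinite(v)
        for v in series
    )


def level_ratio_check(series):
    """GM(1,1)-specific check: every level ratio x(k-1)/x(k) lies in the admissible interval.

    Assumption: the interval is (exp(-2/(n+1)), exp(2/(n+1))) for n data items.
    """
    n = len(series)
    if any(v <= 0 for v in series):
        return False
    low, high = math.exp(-2 / (n + 1)), math.exp(2 / (n + 1))
    return all(low < series[k - 1] / series[k] < high for k in range(1, n))


coal_share = [76.2, 76.1, 75.7, 74.7]   # first four coal values of Table 1
ok = general_validation(coal_share) and level_ratio_check(coal_share)
```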

2.3. API and Application

The Client offers an API for external applications. Each of the AS, CS and DS components exposes its own interface, and the call relationship between them is shown in Figure 5.

3. Data Analysis and Enhanced Algorithm

Energy supply and demand data are available online, e.g., global standardized data on the world’s energy markets can be found in the BP Statistical Review of World Energy on the official website of BP. The official website of the National Bureau of Statistics of China provides more detailed energy data for China through the China Statistical Yearbook 2021. In this work, raw data are obtained directly from the China Statistical Yearbook without any changes and cover the period from 1978 to 2020. However, between 1978 and 1989 data exist for only three years, namely 1978, 1980 and 1985. Therefore, these three years of data are identified as invalid by the validation rules and are ignored in the later process. In addition, to verify the model performance, the data between 1990 and 2009 are classified as the training set, which is used to build the algorithm model, and the data after 2009 are classified as the test set, which is employed to check the model. Five kinds of energy consumption and component data, for coal, oil, gas, power and amount, are crawled from the official website of the National Bureau of Statistics of China. The fitted value, predicted value, residual error and absolute relative error differ only slightly between the four kinds of GM(1,1), namely EGM, ODGM, EDGM and DGM, no matter whether the data are for coal, oil, gas or power. We take the data analysis of the proportion of coal consumption as an example, as shown in Table 1, where FV, MV, PV, RV, RE, and ARE are abbreviated forms of fitted value, mean value, predicted value, real value, residual error and absolute relative error, respectively. According to the yearly values in Table 1, the MV of the absolute relative error over 2010 to 2014 (1.77~1.79%) is much lower than that over 2015 to 2020 (13.01~13.06%), i.e., the prediction accuracy of the first five validation years is higher than that of the last six.
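
The error measures reported in Table 1 can be computed with a few lines of Python. This sketch assumes RE = RV − FV and ARE = |RE| / RV, which is consistent with the 1991 row of Table 1 although the definitions are not written out above; the function names are illustrative.

```python
def residual_error(real, fitted):
    """RE = RV - FV (or RV - PV on the validation years)."""
    return real - fitted


def absolute_relative_error(real, fitted):
    """ARE = |RV - FV| / |RV|, returned as a fraction rather than a percentage."""
    return abs(real - fitted) / abs(real)


def mean_are(real_values, fitted_values):
    """Arithmetic mean of the ARE over a period (the MV rows of Table 1)."""
    ares = [absolute_relative_error(r, f) for r, f in zip(real_values, fitted_values)]
    return sum(ares) / len(ares)


# 1991 row of Table 1 for EGM: RV = 76.1, FV = 74.296
re_1991 = residual_error(76.1, 74.296)
are_1991 = absolute_relative_error(76.1, 74.296)
```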

The enhanced method is designed after analyzing the relation between the real value and the predicted value. Although there is no distinct direct linear relation between RV and PV, a linear relation between the arithmetic means of different periods is found for coal, oil and amount, and a quadratic relation between the residual error and the sequence number is found for gas and power. Two basic enhanced algorithms and a common combined one are proposed accordingly.

The first enhanced algorithm, namely e1, groups the data and fits a linear relation. Firstly, a constant value k is defined as the number of years in each group, where k is greater than or equal to one. When k is set to one, each group degenerates into a single item. Secondly, for each group i, there are two different calculation styles for e1. The first style, namely the Intranet Mean Value Style, calculates the arithmetic mean of the values of RV or FV within each group, and the second style, namely the Simple Mean Value Style, calculates the arithmetic mean of the values with indices from one to the end of group i, as shown in Formulas (7) and (8), respectively, where x can be RV or FV. It is proposed that k should be set to one of the factors of the number of data items n to avoid uneven grouping, so that n/k is the number of groups.

u(i) = (x(i · k − k + 1) + x(i · k − k + 2) +… + x(i · k))/k (i = 1, 2,…, n/k)

(7)

u(i) = (x(1) + x(2) + … + x(i · k))/(i · k) (i = 1, 2,…, n/k)

(8)
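
Formulas (7) and (8) translate directly into Python. The function names below are illustrative, and the sketch rejects values of k that do not divide n, following the recommendation above.

```python
def group_means(x, k):
    """Formula (7): arithmetic mean within each group of k consecutive items."""
    if len(x) % k != 0:
        raise ValueError("k should be a factor of the number of data items")
    return [sum(x[i * k:(i + 1) * k]) / k for i in range(len(x) // k)]


def cumulative_means(x, k):
    """Formula (8): arithmetic mean of all items from the first one to the end of group i."""
    if len(x) % k != 0:
        raise ValueError("k should be a factor of the number of data items")
    return [sum(x[:(i + 1) * k]) / ((i + 1) * k) for i in range(len(x) // k)]


u_within = group_means([1.0, 2.0, 3.0, 4.0], k=2)           # [1.5, 3.5]
u_cumulative = cumulative_means([1.0, 2.0, 3.0, 4.0], k=2)  # [1.5, 2.5]
```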

Thirdly, a linear function is fitted between the u values for RV and FV, as shown in Formula (9). Finally, the fitted linear function with the highest goodness of fit of the trend line (R²) is used to adjust PV, as shown in Formula (10), and the delta expression is shown as Formula (11).

u_RV(i) = a · u_FV(i) + b

(9)

PV_e1 = a · PV + b

(10)

∆_e1 = (a − 1) · PV + b

(11)
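
A minimal sketch of Formulas (9) to (11) in Python, assuming an ordinary least-squares fit and the usual definition of R²; the function names are hypothetical. The usage line applies the DGM coefficients listed in Table 2 for the Simple Mean Value Style with k = 2 to the DGM value for 2010 from Table 1.

```python
def fit_line(u_fv, u_rv):
    """Least-squares fit of u_RV = a*u_FV + b (Formula (9)); also returns R^2."""
    n = len(u_fv)
    mx, my = sum(u_fv) / n, sum(u_rv) / n
    sxx = sum((x - mx) ** 2 for x in u_fv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(u_fv, u_rv))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(u_fv, u_rv))
    ss_tot = sum((y - my) ** 2 for y in u_rv)
    return a, b, 1 - ss_res / ss_tot


def adjust_e1(pv, a, b):
    """Formula (10): PV_e1 = a*PV + b."""
    return a * pv + b


def delta_e1(pv, a, b):
    """Formula (11): delta_e1 = (a - 1)*PV + b, so that PV_e1 = PV + delta_e1."""
    return (a - 1) * pv + b


a, b, r2 = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])   # exact line y = 2x + 1
adjusted_2010 = adjust_e1(69.5275, 1.5653, -41.294)      # coefficients taken from Table 2
```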

Table 2 shows the linear relation between RV and FV of the coal consumption percentage for the two different styles. Only the item with the maximum R² is selected, and it is marked with “O” before the number of years per group in the “k” column. Table 3 lists the final result of enhanced algorithm I (e1); it achieves a mean absolute relative error of 5.38%, compared with 7.90% without the enhancement.

The second enhanced algorithm, namely e2, fits a quadratic relation between the residual error and the sequence number. Firstly, the first year to predict is assigned the sequence number 0, the years before it are assigned negative sequence numbers, and the years after it positive ones, such as the years from 1990 to 2020 with sequence numbers from −19 to 9. Overall, consecutive sequence numbers are used. Secondly, a quadratic function is fitted between the residual error and the sequence number, as shown in Formula (12), where RE_i is the residual error, i is the sequence number, and a, b, and c are parameters to be determined by fitting. Finally, the fitted parabola is used to adjust PV, as shown in Formula (13), and the delta expression is shown in Formula (14).

RE_i = a · i² + b · i + c

(12)

PV_e2 = PV + ∆_e2

(13)

∆_e2 = a · i² + b · i

(14)
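
The quadratic fit of Formula (12) and the delta of Formula (14) can be sketched in pure Python as follows. The sketch assumes an ordinary least-squares fit solved through the normal equations; the helper names are illustrative.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_quadratic(indices, residuals):
    """Least-squares fit of RE_i = a*i^2 + b*i + c (Formula (12))."""
    s = [sum(i ** p for i in indices) for p in range(5)]                  # sums of i^0 .. i^4
    t = [sum(r * i ** p for i, r in zip(indices, residuals)) for p in range(3)]
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    a, b, c = solve_linear_system(matrix, [t[2], t[1], t[0]])
    return a, b, c


def delta_e2(i, a, b):
    """Formula (14): delta_e2 = a*i^2 + b*i, added to PV as in Formula (13)."""
    return a * i ** 2 + b * i


idx = list(range(-3, 4))
a, b, c = fit_quadratic(idx, [2 * i * i + 3 * i + 1 for i in idx])
```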

In addition, there is a combined enhanced algorithm based on e1 and e2, namely ea. Both e1 and e2 are combined with different weights to form ea, as shown in Formula (15), where W^T is the coefficient row vector, whose weights sum to 1, and ∆^T is the row vector of delta expressions for the different enhanced algorithms. Formula (15) is common to multiple enhanced algorithms, e.g., if only two enhanced algorithms exist, W^T contains only two weights, such as [1,0] or [0,1], and ∆^T is [∆_e1, ∆_e2]. The algorithm e1 is used when W^T is set to [1,0], and e2 is used when W^T is set to [0,1]. Table 4 lists the relative error report for the enhanced algorithms.

PV_ea = PV + W^T ∙ ∆^T

(15)
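
Formula (15) amounts to adding a weighted sum of the deltas to PV. A minimal Python sketch, with an illustrative function name and made-up delta values:

```python
def combine(pv, weights, deltas):
    """Formula (15): PV_ea = PV + sum_j w_j * delta_j, with the weights summing to 1."""
    if len(weights) != len(deltas):
        raise ValueError("one weight is needed per enhanced algorithm")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights must sum to 1")
    return pv + sum(w * d for w, d in zip(weights, deltas))


# Illustrative numbers only: PV = 70.0, delta_e1 = -2.0, delta_e2 = -3.0
only_e1 = combine(70.0, [1, 0], [-2.0, -3.0])
only_e2 = combine(70.0, [0, 1], [-2.0, -3.0])
blended = combine(70.0, [0.5, 0.5], [-2.0, -3.0])
```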

The enhanced Grey Dynamic Model can support the prediction of coal, oil, gas, power and amount with average absolute relative error about 23.34%, the enhanced Grey Dynamic Model is better with relative error about 9.18%.

4. Discussions and Results

In this paper, we propose the hierarchical design of an energy supply and demand forecasting system that includes a Client offering an API for external applications, the Crawler Service (CS), the Algorithm Service (AS), the Data Service (DS), and the Data Base. The AS is open to other algorithms, provided that they are implemented and embedded into the AS according to the interface definition.

In order to improve the prediction accuracy of GM(1,1), we additionally propose two basic enhanced algorithms and a combination of them. The enhanced algorithm improves the prediction accuracy by more than ten percent in some cases. Two main goals guided the design of the enhanced algorithms: lower absolute relative error values and sufficient simplicity. The first basic enhanced algorithm, e1, is based on grouping, while the second one, e2, focuses on the sequence order of the series. One shortcoming is that it is sometimes difficult to find the globally optimal weight values. We make two proposals: one is to generate a limited pool of weight values, and the other is to stop the execution of the enhanced algorithms once the prediction accuracy reaches a threshold value configured in advance.
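
Both proposals can be combined in a small search routine. The sketch below is one possible reading, not the implemented system: it assumes two enhanced algorithms, an evenly spaced pool of weights (w, 1 − w), and the mean absolute relative error as the accuracy measure; the function name, the pool size and the threshold are hypothetical.

```python
def search_weights(pv, rv, deltas_e1, deltas_e2, steps=10, threshold=None):
    """Try weights (w, 1 - w) from a limited pool and keep the best mean ARE.

    Stops early once the best mean ARE is at or below `threshold` (if given).
    """
    best_w, best_err = None, float("inf")
    for j in range(steps + 1):
        w = j / steps
        adjusted = [p + w * d1 + (1 - w) * d2
                    for p, d1, d2 in zip(pv, deltas_e1, deltas_e2)]
        err = sum(abs(r - q) / abs(r) for r, q in zip(rv, adjusted)) / len(rv)
        if err < best_err:
            best_w, best_err = w, err
        if threshold is not None and best_err <= threshold:
            break
    return best_w, best_err


# Illustrative numbers only
best_w, best_err = search_weights([10.0, 10.0], [9.0, 9.0], [-2.0, -2.0], [0.0, 0.0])
early_w, early_err = search_weights([10.0, 10.0], [9.0, 9.0], [-2.0, -2.0], [0.0, 0.0],
                                    threshold=0.1)
```

With a threshold the search may return a weight that is good enough rather than the best one in the pool, which is the trade-off the second proposal accepts.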

In the future, we would like to add more Grey Models and other algorithms for more situations, such as machine learning methods [25], the seasonal grey model (SGM) [26] and so on [27,28]. Additionally, more features can be included, such as energy consumer behavior, whose modelling will be one of the major aspects of future research [29], and the uncertainty of renewable sources, demands, energy market spot prices, etc., whose modelling will be a promising line of future research on the Smart Energy Hub (SEH) [30].

Author Contributions

Conceptualization and methodology, G.L. and Y.L.; software, G.L.; validation, A.T.; data curation, G.L.; writing—original draft preparation, G.L.; writing—review and editing, Y.L. and A.T.; funding acquisition, Y.L. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSFC grant number 61972174, Guangdong Science and Technology Planning Project grant number 2020A0505100018, Guangdong Universities’ Innovation Team Project grant number 2021KCXTD015, Guangdong Key Disciplines Project grant number 2021ZDJS138, and 2021 University-level Teaching Quality Project grant number ZLGC20210203.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ARE	Absolute Relative Error
AS	Algorithm Service
CS	Crawler Service
DGM	Discrete Grey Model
DHS	Data Handling Service
DS	Data Service
EDGM	Even Difference Grey Model
EGM	Even Grey Model
FV	Fitted Value
GM	Grey Model
MV	Mean Value
ODGM	Original Difference Grey Model
PV	Predicted Value
RE	Residual Error
RV	Real Value
SEH	Smart Energy Hub
SGM	Seasonal Grey Model

References

1. Islam, M.A.; Che, H.S.; Hasanuzzaman, M.; Rahim, N.A. Energy demand forecasting. In Energy for Sustainable Development; Hasanuzzaman, M., Rahim, N.A., Eds.; Academic Press: Cambridge, MA, USA, 2019; pp. 105–123.
2. Mitchell, R. Web Scraping with Python, 2nd ed.; No Starch Press: Sebastopol, CA, USA, 2018; pp. 33–48.
3. McBryan, O.A. GENVL and wwww: Tools for Taming the Web. In Proceedings of the First World-Wide Web Conference, Geneva, Switzerland, 25–27 May 1994.
4. Chakrabarti, S. Focused Crawling: A New Approach for Topic-Specific Resource Discovery. In Proceedings of the 8th International World Wide Web Conference (www8), Toronto, ON, Canada, 11–14 May 1999.
5. Mukherjea, S. WTMS: A System for Collecting and Analyzing Topic-specific Web Information. Comput. Netw. 2000, 33, 457–471.
6. Diligenti, M.; Coetzee, F.M.; Lawrence, S.; Giles, C.L.; Gori, M. Focused Crawling Using Context Graphs. In Proceedings of the 26th International Conference on Very Large Data Bases, Palo Alto, CA, USA, 10 September 2000; pp. 527–534.
7. Mccallum, A.; Nigam, K.; Rennie, J.; Seymore, K. Building Domain Specific Search Engines with Machine Learning Techniques. In Proceedings of the AAAI Spring Symposium on Intelligent Agents in Cyberspace, Stanford, CA, USA, 22–24 March 1999; pp. 28–39.
8. Menczer, F.; Belew, R.K. Adaptive Retrieval Agents: Internalizing Local Context and Scaling Up to the Web. Mach. Learn. 2000, 39, 203–242.
9. Uemura, Y.; Itokawa, T.; Kitasuka, T.; Aritsugi, M. An Effectively Focused Crawling System. In Innovations in Intelligent Machines—2: Intelligent Paradigms and Applications; Watanabe, T., Jain, C.L., Eds.; Springer: Berlin, Germany, 2012; Volume 376, pp. 61–76.
10. Lin, G.; Liang, Y.; Fu, X.; Chen, G.; Cai, S. Design of a Daily Brief Business Report Generator Based on Web Scraping with KNN Algorithm. J. Phys. Conf. Ser. 2019, 1345, 052064.
11. Deng, J. The Primary Methods of Grey System Theory; Huazhong University of Science and Technology Press: Wuhan, China, 2005; pp. 1–4.
12. Liu, S.; Yang, Y.; Xie, N.; Forrest, J. New progress of Grey System Theory in the New Millennium. Grey Syst. Theory Appl. 2016, 6, 2–31.
13. Qiao, G.; Zhang, W.; Xue, S. Speed Control Based on Fuzzy PID Control with Grey Prediction in the Deep Sea Stepping System. J. China Coal Soc. 2009, 34, 1550–1553.
14. Fang, X.; Chen, Y.; Li, S. Application of Multidimensional Grey Evaluation Methods in Coal and Gas Outburst Prediction. Ind. Saf. Environ. Prot. 2012, 38, 81–83.
15. Wu, Z.; Xu, B.; Gu, C.; Li, Z.C. Comprehensive Evaluation Methods for Dam Service Status. Sci. China Technol. Sci. 2012, 55, 2300–2312.
16. Tang, K.; Zhou, N.; Fan, X. Analysis on the Factors Influencing the Gas Well Productivity of S2 Gas Pool in Permian of Zizhou Gas Field. Comput. Technol. Geophys. Geochem. Explor. 2012, 34, 723–728.
17. Liang, C.; Gu, D.; Bichindaritz, I. Integrating Gray System Theory and Logistic Regression into Case-based Reasoning for Safety Assessment of Thermal Power Plants. Expert Syst. Appl. 2012, 39, 5154–5167.
18. Sheng, B.; Li, J. Measurement and Analysis of the Level of Development of New Urbanization in Jinan City: Based on GM(1,1) Model. J. Xichang Univ. (Nat. Sci. Ed.) 2022, 36, 48–51+70.
19. Li, Z. Application of Kalman-GM(1,1) Combined Model in Settlement Prediction of the Top of the Enclosure Wall of the Foundation Pit. Urban Geotech. Investig. Surv. 2022, 205–208.
20. Xiao, J.; He, T. Forecast of Railway Freight Volume Based on Improved Gray GM(1,1) Model. J. Lanzhou Jiaotong Univ. 2021, 40, 40–45.
21. Kumar, T.S.; Rao, K.V.; Balaji, M.; Murthy, P.B.G.S.N.; Kumar, D.V. Online monitoring of crack depth in fiber reinforced composite beams using optimization Grey model GM(1,N). Eng. Fract. Mech. 2022, 271, 108666.
22. Du, X.; Wu, D.; Yan, Y. Prediction of electricity consumption based on GM(1,Nr) model in Jiangsu province, China. Energy 2023, 262, 125439.
23. Liu, S.F.; Zeng, B.; Liu, J.; Xie, N.M. Several Basic Models of GM(1,1) and Their Applicable Bound. Syst. Eng. Electron. 2014, 36, 501–508.
24. Xu, N.; Ding, S.; Gong, Y. Advances in Grey GM(1,1) Forecasting Model and Its Extension. Math. Pract. Theory 2021, 51, 52–59.
25. Es, H.A. Monthly natural gas demand forecasting by adjusted seasonal grey forecasting model. Energy Sources Part A Recovery Util. Environ. Eff. 2021, 43, 54–69.
26. Chou, J.-S.; Hsu, S.-M. Automated prediction system of household energy consumption in cities using web crawler and optimized artificial intelligence. Int. J. Energy Res. 2022, 46, 319–339.
27. Filippova, S.P.; Malakhova, V.A.; Veselov, F.V. Long-Term Energy Demand Forecasting Based on a Systems Analysis. Therm. Eng. 2021, 68, 881–894.
28. Salhein, K.; Kobus, C.J.; Zohdy, M. Forecasting Installation Capacity for the Top 10 Countries Utilizing Geothermal Energy by 2030. Thermo 2022, 2, 334–351.
29. Fodstad, M.; Granado, P.D.; Hellemo, L.; Knudsen, B.R.; Pisciella, P.; Silvast, A.; Bordin, C.; Schmidt, S.; Straus, J. Next frontiers in energy system modelling: A review on challenges and the state of the art. Renew. Sustain. Energy Rev. 2022, 160, 112246.
30. Lasemi, M.A.; Arabkoohsar, A.; Hajizadeh, A.; Mohammadi-Ivatloo, B. A comprehensive review on optimization challenges of smart energy hubs under uncertainty factors. Renew. Sustain. Energy Rev. 2022, 160, 112320.

Figure 1. Components of Energy Supply and Demand Forecasting System.

Figure 2. The final algorithmic design of the Crawler Service.

Figure 3. Final algorithm of the Algorithm Service.

Figure 4. Flow Chart of Level Ratio Detection.

Figure 5. Call relationship between interfaces of AS, CS, DS.

Table 1. Relative error report about coal consumption percentage based on GM(1,1).

Year	Coal RV * (%)	EGM * FV *	EGM RE *	EGM ARE *	ODGM * FV	ODGM RE	ODGM ARE	EDGM * FV	EDGM RE	EDGM ARE	DGM * FV	DGM RE	DGM ARE
1990	76.2	76.2	0	0.00%	76.2	0	0.00%	76.2	0	0.00%	76.2	0	0.00%
1991	76.1	74.296	1.804	2.37%	74.2881	1.8119	2.38%	74.2962	1.8038	2.37%	74.3043	1.7957	2.36%
1992	75.7	74.038	1.662	2.20%	74.0306	1.6694	2.21%	74.0377	1.6623	2.20%	74.0449	1.6551	2.19%
1993	74.7	73.78	0.92	1.23%	73.7739	0.9261	1.24%	73.7802	0.9198	1.23%	73.7864	0.9136	1.22%
1994	75	73.523	1.477	1.97%	73.5182	1.4818	1.98%	73.5235	1.4765	1.97%	73.5288	1.4712	1.96%
1995	74.6	73.268	1.332	1.79%	73.2634	1.3366	1.79%	73.2678	1.3322	1.79%	73.2721	1.3279	1.78%
1996	73.5	73.013	0.487	0.66%	73.0094	0.4906	0.67%	73.0129	0.4871	0.66%	73.0163	0.4837	0.66%
1997	71.4	72.759	−1.359	1.90%	72.7563	−1.3563	1.90%	72.7589	−1.3589	1.90%	72.7614	−1.3614	1.91%
1998	70.9	72.506	−1.606	2.27%	72.5041	−1.6041	2.26%	72.5058	−1.6058	2.26%	72.5074	−1.6074	2.27%
1999	70.6	72.253	−1.653	2.34%	72.2528	−1.6528	2.34%	72.2535	−1.6535	2.34%	72.2542	−1.6542	2.34%
2000	68.5	72.002	−3.502	5.11%	72.0024	−3.5024	5.11%	72.0022	−3.5022	5.11%	72.002	−3.502	5.11%
2001	68	71.752	−3.752	5.52%	71.7528	−3.7528	5.52%	71.7517	−3.7517	5.52%	71.7506	−3.7506	5.52%
2002	68.5	71.502	−3.002	4.38%	71.5041	−3.0041	4.39%	71.5021	−3.0021	4.38%	71.5001	−3.0001	4.38%
2003	70.2	71.253	−1.053	1.50%	71.2562	−1.0562	1.50%	71.2534	−1.0534	1.50%	71.2505	−1.0505	1.50%
2004	70.2	71.005	−0.805	1.15%	71.0092	−0.8092	1.15%	71.0055	−0.8055	1.15%	71.0018	−0.8018	1.14%
2005	72.4	70.758	1.642	2.27%	70.7631	1.6369	2.26%	70.7585	1.6415	2.27%	70.7539	1.6461	2.27%
2006	72.4	70.512	1.888	2.61%	70.5178	1.8822	2.60%	70.5123	1.8877	2.61%	70.5069	1.8931	2.61%
2007	72.5	70.267	2.233	3.08%	70.2733	2.2267	3.07%	70.267	2.233	3.08%	70.2608	2.2392	3.09%
2008	71.5	70.023	1.477	2.07%	70.0297	1.4703	2.06%	70.0226	1.4774	2.07%	70.0155	1.4845	2.08%
2009	71.6	69.779	1.821	2.54%	69.787	1.813	2.53%	69.779	1.821	2.54%	69.771	1.829	2.55%
MV * 1991~2009				2.47%			2.47%			2.47%			2.47%
2010	69.2	69.536	−0.336	0.49%	69.5451	−0.3451	0.50%	69.5363	−0.3363	0.49%	69.5275	−0.3275	0.47%
2011	70.2	69.294	0.906	1.29%	69.304	0.896	1.28%	69.2944	0.9056	1.29%	69.2847	0.9153	1.30%
2012	68.5	69.053	−0.553	0.81%	69.0638	−0.5638	0.82%	69.0533	−0.5533	0.81%	69.0428	−0.5428	0.79%
2013	67.4	68.813	−1.413	2.10%	68.8244	−1.4244	2.11%	68.8131	−1.4131	2.10%	68.8018	−1.4018	2.08%
2014	65.8	68.574	−2.774	4.22%	68.5858	−2.7858	4.23%	68.5737	−2.7737	4.22%	68.5616	−2.7616	4.20%
2015	63.8	68.335	−4.535	7.11%	68.3481	−4.5481	7.13%	68.3351	−4.5351	7.11%	68.3223	−4.5223	7.09%
2016	62.2	68.097	−5.897	9.48%	68.1112	−5.9112	9.50%	68.0974	−5.8974	9.48%	68.0837	−5.8837	9.46%
2017	60.6	67.86	−7.26	11.98%	67.8751	−7.2751	12.01%	67.8605	−7.2605	11.98%	67.846	−7.246	11.96%
2018	59	67.624	−8.624	14.62%	67.6398	−8.6398	14.64%	67.6245	−8.6245	14.62%	67.6092	−8.6092	14.59%
2019	57.7	67.389	−9.689	16.79%	67.4053	−9.7053	16.82%	67.3892	−9.6892	16.79%	67.3732	−9.6732	16.76%
2020	56.8	67.155	−10.355	18.23%	67.1717	−10.372	18.26%	67.1548	−10.355	18.23%	67.1379	−10.338	18.20%
MV 2010~2014				1.78%			1.79%			1.78%			1.77%
MV 2015~2020				13.03%			13.06%			13.04%			13.01%
MV 2010~2020				7.92%			7.94%			7.92%			7.90%

* RV, FV, RE, ARE, MV, EGM, ODGM, EDGM and DGM are abbreviations for real value, fitted value, residual error, absolute relative error, mean value, Even Grey Model, Original Difference Grey Model, Even Difference Grey Model and Discrete Grey Model, respectively.

Table 2. Linear relation between RV and FV of coal consumption percentage.

Style	k	EGM * a	EGM b	EGM R²	ODGM * a	ODGM b	ODGM R²	EDGM * a	EDGM b	EDGM R²	DGM * a	DGM b	DGM R²
Intranet Mean Value Style	1	1.0939	−6.8909	0.448	1.0939	−6.8909	0.4479	1.0939	−6.8909	0.448	1.0939	−6.8909	0.4481
	2	1.0321	−2.3155	0.4589	1.0351	−2.5329	0.4591	1.0321	−2.316	0.4589	1.0291	−2.104	0.4587
	4	1.0254	−1.8324	0.502	1.0286	−2.0656	0.5023	1.0254	−1.8309	0.502	1.0222	−1.6019	0.5017
Simple Mean Value Style	1	1.3838	−27.9914	0.8816	1.3838	−27.9914	0.8809	1.3838	−27.9914	0.8816	1.3838	−27.9914	0.8824
	2	1.5679	−41.478	0.9333	1.5704	−41.659	0.9331	1.5679	−41.478	0.9333	1.5653	−41.294	0.9334
	4	1.6066	−44.272	0.9283	1.6105	−44.553	0.9285	1.6066	−44.273	0.9283	1.6027	−43.989	0.9282

* EGM, ODGM, EDGM and DGM are abbreviations for Even Grey Model, Original Difference Grey Model, Even Difference Grey Model and Discrete Grey Model, respectively.
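The a, b and R² columns of Table 2 describe a straight-line relation between real and fitted values. The following is a minimal sketch of how such a fit could be computed by ordinary least squares, assuming the relation has the form RV = a·FV + b (the direction of the regression is an assumption). The function name `linear_fit` is hypothetical, and the sample data are the 2010~2020 EGM rows of Table 1. The mean value styles and the parameter k used for Table 2 are not reproduced here, so the output is not expected to match the coefficients in Table 2.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


# EGM fitted values (FV) and real values (RV) for 2010~2020, copied from Table 1.
FV_EGM = [69.536, 69.294, 69.053, 68.813, 68.574, 68.335,
          68.097, 67.86, 67.624, 67.389, 67.155]
RV = [69.2, 70.2, 68.5, 67.4, 65.8, 63.8, 62.2, 60.6, 59.0, 57.7, 56.8]

if __name__ == "__main__":
    a, b, r2 = linear_fit(FV_EGM, RV)
    print(f"a = {a:.4f}, b = {b:.4f}, R^2 = {r2:.4f}")
```

A slope well above 1 on these rows would indicate that the real coal share falls faster than the fitted curve, which is the kind of systematic gap a linear correction of the fitted values is meant to absorb.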

Table 3. Relative error report about coal consumption percentage for e1.

Year	Coal RV * (%)	EGM * FV *	EGM RE *	EGM ARE *	ODGM * FV	ODGM RE	ODGM ARE	EDGM * FV	EDGM RE	EDGM ARE	DGM * FV	DGM RE	DGM ARE
2010	69.2	69.536	−0.336	0.49%	69.5451	−0.3451	0.50%	69.5363	−0.3363	0.49%	69.5275	−0.3275	0.47%
2011	70.2	69.294	0.906	1.29%	69.304	0.896	1.28%	69.2944	0.9056	1.29%	69.2847	0.9153	1.30%
2012	68.5	69.053	−0.553	0.81%	69.0638	−0.5638	0.82%	69.0533	−0.5533	0.81%	69.0428	−0.5428	0.79%
2013	67.4	68.813	−1.413	2.10%	68.8244	−1.4244	2.11%	68.8131	−1.4131	2.10%	68.8018	−1.4018	2.08%
2014	65.8	68.574	−2.774	4.22%	68.5858	−2.7858	4.23%	68.5737	−2.7737	4.22%	68.5616	−2.7616	4.20%
2015	63.8	68.335	−4.535	7.11%	68.3481	−4.5481	7.13%	68.3351	−4.5351	7.11%	68.3223	−4.5223	7.09%
2016	62.2	68.097	−5.897	9.48%	68.1112	−5.9112	9.50%	68.0974	−5.8974	9.48%	68.0837	−5.8837	9.46%
2017	60.6	67.86	−7.26	11.98%	67.8751	−7.2751	12.01%	67.8605	−7.2605	11.98%	67.846	−7.246	11.96%
2018	59	67.624	−8.624	14.62%	67.6398	−8.6398	14.64%	67.6245	−8.6245	14.62%	67.6092	−8.6092	14.59%
2019	57.7	67.389	−9.689	16.79%	67.4053	−9.7053	16.82%	67.3892	−9.6892	16.79%	67.3732	−9.6732	16.76%
2020	56.8	67.155	−10.355	18.23%	67.1717	−10.372	18.26%	67.1548	−10.355	18.23%	67.1379	−10.338	18.20%
MV * 2010~2014				1.78%			1.79%			1.78%			1.77%
MV 2015~2020				13.03%			13.06%			13.04%			13.01%
MV 2010~2020				7.92%			7.94%			7.92%			7.90%

* RV, FV, RE, ARE, MV, EGM, ODGM, EDGM and DGM are abbreviations for real value, fitted value, residual error, absolute relative error, mean value, Even Grey Model, Original Difference Grey Model, Even Difference Grey Model and Discrete Grey Model, respectively.
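The error columns of Table 3 can be recomputed from the RV and FV columns. The sketch below assumes RE = RV − FV and ARE = |RE| / RV, which is consistent with the tabulated rows (for 2010, 69.2 − 69.536 = −0.336 and 0.336 / 69.2 ≈ 0.49%). The function names are hypothetical, and only the EGM columns are used.

```python
# (year, real value RV, EGM fitted value FV) copied from Table 3.
ROWS = [
    (2010, 69.2, 69.536), (2011, 70.2, 69.294), (2012, 68.5, 69.053),
    (2013, 67.4, 68.813), (2014, 65.8, 68.574), (2015, 63.8, 68.335),
    (2016, 62.2, 68.097), (2017, 60.6, 67.86), (2018, 59.0, 67.624),
    (2019, 57.7, 67.389), (2020, 56.8, 67.155),
]


def residual_error(rv, fv):
    """Residual error RE = RV - FV (assumed sign convention)."""
    return rv - fv


def absolute_relative_error(rv, fv):
    """Absolute relative error ARE = |RV - FV| / RV, in percent."""
    return abs(rv - fv) / rv * 100.0


def mean_are(rows, first_year, last_year):
    """Mean ARE (percent) over the rows with first_year <= year <= last_year."""
    selected = [absolute_relative_error(rv, fv)
                for year, rv, fv in rows
                if first_year <= year <= last_year]
    if not selected:
        raise ValueError("no rows in the requested year range")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    for year, rv, fv in ROWS:
        print(year, round(residual_error(rv, fv), 3),
              f"{absolute_relative_error(rv, fv):.2f}%")
    print(f"MV 2010~2020: {mean_are(ROWS, 2010, 2020):.2f}%")
```

Averaging the per-year ARE values over 2010~2020 in this way gives about 7.92%, in line with the last MV row of the EGM column.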

Table 4. Mean value of absolute relative error report about energy consumption data for enhanced algorithms during 2010~2020.

Item	Algorithm	Enhanced EGM * ARE *	Enhanced ODGM * ARE	Enhanced EDGM * ARE	Enhanced DGM * ARE	Original EGM ARE	Original ODGM ARE	Original EDGM ARE	Original DGM ARE
Coal	e1	11.80%	11.82%	11.80%	11.78%	18.21%	18.36%	18.23%	18.10%
Oil	e2	2.39%	2.39%	2.39%	2.39%	8.61%	8.59%	8.60%	8.61%
Gas	e2	15.65%	15.64%	15.65%	15.67%	42.99%	42.85%	42.97%	43.11%
Power	e2	8.40%	8.40%	8.40%	8.41%	23.57%	23.49%	23.54%	23.59%
Amount	0.9e1 + 0.1e2	7.67%	6.80%	6.79%	6.77%	8.62%	8.69%	8.63%	8.57%
Mean Value		9.18%	9.01%	9.01%	9.00%	20.40%	20.40%	20.39%	20.40%

* EGM, ODGM, EDGM and DGM are abbreviations for Even Grey Model, Original Difference Grey Model, Even Difference Grey Model and Discrete Grey Model, respectively.
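The "Amount" row of Table 4 uses the combination 0.9e1 + 0.1e2, and the "Mean Value" row averages the five items in each column. The sketch below illustrates both steps under stated assumptions: the combination is taken to be a pointwise weighted sum of the two enhanced forecasts, the function names are hypothetical, and the two short forecast series are made-up numbers used only to show the arithmetic. The ARE values are the Enhanced EGM column of Table 4.

```python
def combine_forecasts(forecasts, weights):
    """Pointwise weighted sum of several forecast series of equal length."""
    if len(forecasts) != len(weights) or not forecasts:
        raise ValueError("need exactly one weight per forecast series")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    length = len(forecasts[0])
    if any(len(series) != length for series in forecasts):
        raise ValueError("all forecast series must have the same length")
    return [sum(w * series[i] for w, series in zip(weights, forecasts))
            for i in range(length)]


def column_mean(values):
    """Arithmetic mean of one column of ARE values (percent)."""
    return sum(values) / len(values)


# Hypothetical forecasts from two enhanced algorithms (illustrative numbers only).
E1_FORECAST = [100.0, 110.0]
E2_FORECAST = [120.0, 90.0]

# Enhanced EGM ARE for Coal, Oil, Gas, Power and Amount, copied from Table 4.
ENHANCED_EGM_ARE = [11.80, 2.39, 15.65, 8.40, 7.67]

if __name__ == "__main__":
    print(combine_forecasts([E1_FORECAST, E2_FORECAST], [0.9, 0.1]))
    print(f"Mean Value (Enhanced EGM): {column_mean(ENHANCED_EGM_ARE):.2f}%")
```

Keeping the weights as an explicit argument makes it straightforward to try other splits between e1 and e2 without changing the combination code.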

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lin, G.; Liang, Y.; Tavares, A. Design of an Energy Supply and Demand Forecasting System Based on Web Crawler and a Grey Dynamic Model. Energies 2023, 16, 1431. https://doi.org/10.3390/en16031431

