Article

An Expanded Spatial Durbin Model with Ordinary Kriging of Unobserved Big Climate Data

1 Post Doctoral Program, Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Sumedang 45363, Indonesia
2 Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Sumedang 45363, Indonesia
3 Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Sumedang 45363, Indonesia
4 Research Center for Climate and Atmosphere, National Research and Innovation Agency (BRIN), Jakarta Pusat 10340, Indonesia
5 Research Center for Artificial Intelligence and Cyber Security, National Research and Innovation Agency (BRIN), Bandung 40135, Indonesia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2447; https://doi.org/10.3390/math12162447
Submission received: 30 June 2024 / Revised: 26 July 2024 / Accepted: 30 July 2024 / Published: 7 August 2024
(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

Abstract

Spatial models are essential for predicting climate phenomena because they can model the complex relationships between different locations. In this study, we discuss an expanded spatial Durbin model with ordinary kriging at unobserved locations (ESDMOK) to predict rainfall patterns on Java Island. The classical spatial Durbin model was expanded to obtain parameter estimates for each location, and we combined it with ordinary kriging because data were not available at some locations. The data were taken from the National Aeronautics and Space Administration Prediction of Worldwide Energy Resources (NASA POWER) website. Because climate data are big data, we implemented a big data analytics approach, namely the data analytics life cycle method. Air temperature, humidity, solar irradiation, wind speed, and surface pressure were used as exogenous variables. We developed R-Shiny web applications to implement the proposed technique. The proposed technique produced accurate climate data predictions, with a mean absolute percentage error (MAPE) of 1.956%. Surface pressure had the greatest effect on rainfall, and wind speed had the smallest.

1. Introduction

Climate is the long-term average of weather and atmospheric conditions in a specific region, typically measured over 30 years or more, both regionally and globally [1]. Climate phenomena significantly affect many aspects of human life, including agriculture, health, coastal ecosystems, and water quality [2]. One notable example is La Niña, which increases rainfall in the Western Pacific region. Empirical data from BMKG (the Indonesian Agency for Meteorology, Climatology, and Geophysics) indicate that La Niña can increase rainfall on Java Island by 20% to 70% [3]. Extreme rainfall causes natural disasters such as landslides and floods [4,5]. These disasters impose heavy financial and health burdens on the community, so exceptionally high rainfall should be warned of before it occurs. People whose livelihoods depend on rainfall conditions, such as farmers and fishermen, particularly need rainfall predictions. Rainfall is also closely linked to Sustainable Development Goal (SDG) 13, "Climate Action", which emphasizes urgent action to combat climate change and its impacts [4]. Understanding and managing rainfall patterns is therefore crucial for achieving SDG 13: effective climate action requires addressing the impacts of changing rainfall on various sectors, enhancing resilience to extreme weather events, and ensuring sustainable water and land management.
Spatial models are effective tools for predicting rainfall and climate. Spatial modelling involves building statistical or computational models that represent spatial relationships or patterns in data [5], and such models are widely used to analyse patterns, relationships, and phenomena that have a spatial dimension [6]. Rainfall is closely related to other climate elements, such as air temperature, humidity, solar irradiation, wind speed, and surface pressure, which vary from region to region [7]; these variables are observed and modelled using spatial analysis. Falah et al. (2023) developed a hybrid Spatial Autoregressive Exogenous (SAR-X) model using Casetti's expansion approach to predict rainfall in West Java, Indonesia [8]. A weakness of the SAR-X model is that it accounts for spatial dependence only in the response variable, not in the exogenous variables. In spatial data, however, dependence may exist in both the response and the exogenous variables, so a model that accommodates both is needed. To address this, the expanded spatial Durbin model (ESDM) was used to capture spatial dependence in the exogenous variables [9].
In many phenomena, values are unknown at some locations of interest. Ordinary kriging (OK), a linear interpolation method, can be used to predict these values at unobserved locations [10]. OK is a spatial interpolation method that uses the spatial variability of the data to estimate values at unobserved locations [11]. Its advantage is that it produces optimal estimates by taking into account the spatial information and covariance structure of the data [12], yielding the most accurate linear unbiased predictions at unobserved locations [13]. OK predictions often produce smooth maps without sharp spikes or dips between observation points [14].
The continuing growth of data volumes poses a challenge to the prediction of climate and rainfall patterns. Based on the review above, we identified a research gap: rainfall prediction using the ESDM combined with OK for climate data. The climate data were collected for 119 districts/cities on Java Island from the National Aeronautics and Space Administration Prediction of Worldwide Energy Resources (NASA POWER) website. In this research, the data analytics life cycle was adopted as the big data methodology for predicting rainfall from big climate data; within this life cycle, rainfall description and prediction are treated in dedicated, more detailed stages. The application study is supported by an integrated R script for the ESDM, deployed as an R-Shiny web application to facilitate the prediction process.

2. Materials and Methods

2.1. Experimental Semivariogram

The OK method was used to analyse geostatistical data and to interpolate values from observed data. The method is named after D. G. Krige, a South African mining engineer who first introduced it for gold mining. OK relies on the assumption that observed data are spatially correlated as a function of the distance between them [15]. To perform the interpolation, the method uses a semivariogram, which quantifies the differences in values between all pairs of observed data as a function of their separation distance; the interpolation weights are derived from it. The experimental semivariogram at distance h, computed from the N(h) pairs of observations separated by h, can be expressed as follows [16]:
$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(s_i + h) - Z(s_i) \right]^2 \tag{1}$$
where:
  • $\hat{\gamma}(h)$: experimental semivariogram value at distance $h$;
  • $Z(s_i)$: observed value at location $s_i$;
  • $Z(s_i + h)$: observed value at location $s_i + h$;
  • $N(h)$: number of data pairs separated by distance $h$;
  • $h$: distance.
The distances between all pairs of locations were calculated using the Euclidean distance:
$$|h| = \sqrt{\left(s_i(u_i) - s_j(u_j)\right)^2 + \left(s_i(v_i) - s_j(v_j)\right)^2}$$
The resulting distance $|d_{ij}|$ between locations $s_i$ and $s_j$, expressed in degrees, was converted to kilometres as $|d_{ij}| \times 111.319$, where 111.319 km is the length of 1 degree of longitude at the equator [17]. A location is denoted $s_{ij}(u_{ij}, v_{ij})$, where $s_{ij}$ refers to locations $i$ and $j$ with $i = 1, 2, 3, \ldots, N$ and $j = 1, 2, 3, \ldots, N$, and $u$ and $v$ are the latitude and longitude coordinates.
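The distance conversion and Equation (1) can be illustrated with a short NumPy sketch. It assumes that "pairs with the same distance h" are approximated by distance classes of fixed width (lag bins), a common practical choice that the text does not specify; the function names and the `bin_width_km`/`n_bins` parameters are hypothetical.

```python
import numpy as np

# Conversion factor from the text: km per degree (1 degree of longitude at the equator).
KM_PER_DEGREE = 111.319


def pairwise_km(lat, lon):
    """Euclidean distance between all location pairs in degrees, converted to km."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    d_deg = np.sqrt((lat[:, None] - lat[None, :]) ** 2 + (lon[:, None] - lon[None, :]) ** 2)
    return d_deg * KM_PER_DEGREE


def experimental_semivariogram(lat, lon, z, bin_width_km, n_bins):
    """Binned estimate of gamma_hat(h) = 1/(2 N(h)) * sum [Z(s_i + h) - Z(s_i)]^2.

    Pairs are grouped into distance classes (0, w], (w, 2w], ... (an assumed
    lag-binning scheme). Returns the mean lag distance, the semivariance and the
    pair count N(h) for each non-empty class.
    """
    z = np.asarray(z, dtype=float)
    d = pairwise_km(lat, lon)
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair counted once
    h = d[i, j]
    sq_diff = (z[i] - z[j]) ** 2
    lags, gammas, counts = [], [], []
    for b in range(n_bins):
        in_bin = (h > b * bin_width_km) & (h <= (b + 1) * bin_width_km)
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue
        lags.append(h[in_bin].mean())
        gammas.append(sq_diff[in_bin].sum() / (2 * n_pairs))
        counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)
```

For three points 1 degree apart along the equator with values 0, 1, and 3, and 150 km bins, the first class holds the two neighbouring pairs (semivariance (1 + 4)/4 = 1.25) and the second holds the outer pair (9/2 = 4.5).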

2.2. Theoretical Semivariogram Model

Three theoretical semivariogram models are commonly used in kriging: the spherical, Gaussian, and exponential models [16]. Their equations are given in Table 1, where h is the distance between samples, c is the sill, and a is the range [18]. A plot of the theoretical semivariogram models is shown in Figure 1.
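Table 1 is not reproduced in this text. For reference, commonly used textbook forms of the three models without a nugget effect are given below, with sill c and range a as defined above. Table 1 may use a different scaling (for example, 3h/a in the exponential model so that a is the practical range), so these are representative forms rather than the paper's exact equations.

$$\gamma_{\mathrm{sph}}(h) = \begin{cases} c\left[\dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right], & 0 \le h \le a \\ c, & h > a \end{cases}$$
$$\gamma_{\mathrm{exp}}(h) = c\left[1 - \exp\left(-\frac{h}{a}\right)\right]$$
$$\gamma_{\mathrm{gau}}(h) = c\left[1 - \exp\left(-\frac{h^{2}}{a^{2}}\right)\right]$$

All three start at 0 for h = 0 and approach the sill c; the spherical model reaches it exactly at h = a, while the exponential and Gaussian models approach it asymptotically, the Gaussian one with a parabolic (smoother) behaviour near the origin.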

2.3. Ordinary Kriging (OK) Method

Kriging is a prediction method that provides a best linear unbiased estimator (BLUE) of point values or averages at unobserved locations. It uses a semivariogram that represents the spatial differences in value between all pairs of data samples. According to [19], the kriging estimator $\hat{Z}(s)$ at an unsampled location $s$ is a linear combination of random variables, as formulated in the following equation:
$$\hat{Z}(s) - m(s) = \sum_{i=1}^{n} \lambda_i \left[ Z(s_i) - m(s_i) \right] \tag{6}$$
with:
  • $s$: prediction location;
  • $s_i$: $i$-th data location near the prediction location;
  • $m(s)$: expected (mean) value of $Z(s)$;
  • $m(s_i)$: expected (mean) value of $Z(s_i)$;
  • $n$: number of observations used for prediction;
  • $\lambda_i$: weight at the $i$-th location.
The objective of kriging is to determine the weights $\lambda_i$ that minimize the estimator variance while keeping the estimator unbiased. The estimator variance is [20]:
$$\sigma_e^2(s) = \mathrm{Var}\left[\hat{Z}(s) - Z(s)\right]$$
while the requirement for an unbiased estimator is:
$$E\left[\hat{Z}(s) - Z(s)\right] = 0 \tag{8}$$
OK is the kriging variant that assumes a constant but unknown mean. If $E[Z(s)] = m(s)$ and $E[Z(s_i)] = m(s_i)$ with $m(s) = m(s_i) = m$, then Equation (6) becomes:
$$\hat{Z}(s) - m = \sum_{i=1}^{n} \lambda_i \left[ Z(s_i) - m \right]$$
$$\hat{Z}(s) = m + \sum_{i=1}^{n} \lambda_i \left[ Z(s_i) - m \right]$$
$$\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(s_i) - m\left(\sum_{i=1}^{n} \lambda_i - 1\right)$$
Since $m$ is unknown, the term involving $m$ must vanish, which requires $\sum_{i=1}^{n} \lambda_i = 1$. The OK estimator is therefore $\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(s_i)$ subject to $\sum_{i=1}^{n} \lambda_i = 1$.
The BLUE (best linear unbiased estimator) properties of the OK estimator are shown as follows:
  • Linear
    The OK estimator, obtained from the $n$ observations, is linear in the data:
    $$\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(s_i)$$
  • Unbiased
    The OK estimator is unbiased if it satisfies Equation (8):
    $$E\left[\hat{Z}(s) - Z(s)\right] = E\left[\sum_{i=1}^{n} \lambda_i Z(s_i) - Z(s)\right] = \sum_{i=1}^{n} \lambda_i E\left[Z(s_i) - Z(s)\right] = 0$$
    where the second equality uses $\sum_{i=1}^{n} \lambda_i = 1$. Because the mean is assumed constant, $E[Z(s_i) - Z(s)] = 0$, so the OK estimator is unbiased.
  • Best
Best means that the OK estimator has the minimum error variance. The error variance of the OK estimator is:
$$\sigma_{OK}^2 = \mathrm{Var}\left[\hat{Z}(s) - Z(s)\right]$$
$$\sigma_{OK}^2 = \mathrm{Var}\left[\hat{Z}(s)\right] + \mathrm{Var}\left[Z(s)\right] - 2\,\mathrm{Cov}\left[\hat{Z}(s), Z(s)\right] \tag{10}$$
To expand $\mathrm{Var}[\hat{Z}(s)]$ in Equation (10), substitute $\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(s_i)$:
$$\mathrm{Var}\left[\hat{Z}(s)\right] = \mathrm{Var}\left[\sum_{i=1}^{n} \lambda_i Z(s_i)\right]$$
$$\mathrm{Var}\left[\hat{Z}(s)\right] = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] \tag{11}$$
To expand $\mathrm{Cov}[\hat{Z}(s), Z(s)]$ in Equation (10):
$$\begin{aligned}
\mathrm{Cov}\left[\hat{Z}(s), Z(s)\right] &= E\left[\hat{Z}(s) Z(s)\right] - E\left[\hat{Z}(s)\right] E\left[Z(s)\right] \\
&= E\left[\left(\sum_{i=1}^{n} \lambda_i Z(s_i)\right) Z(s)\right] - E\left[\sum_{i=1}^{n} \lambda_i Z(s_i)\right] E\left[Z(s)\right] \\
&= \sum_{i=1}^{n} \lambda_i E\left[Z(s_i) Z(s)\right] - \sum_{i=1}^{n} \lambda_i E\left[Z(s_i)\right] E\left[Z(s)\right]
\end{aligned}$$
$$\mathrm{Cov}\left[\hat{Z}(s), Z(s)\right] = \sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] \tag{12}$$
Letting $\mathrm{Var}[Z(s)] = \sigma^2$ and substituting Equations (11) and (12) into Equation (10) gives:
$$\sigma_{OK}^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] \tag{13}$$
subject to $\sum_{i=1}^{n} \lambda_i = 1$.
To minimize the estimator variance in Equation (13) subject to this constraint, the Lagrange multiplier (LM) method was used with multiplier $\mu$. The Lagrangian is:
$$F(\lambda_i, \mu) = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] + 2\mu\left[\sum_{i=1}^{n} \lambda_i - 1\right] \tag{14}$$
Differentiating Equation (14) with respect to each $\lambda_i$ gives:
$$\frac{\partial F(\lambda_i, \mu)}{\partial \lambda_i} = 2\sum_{j=1}^{n} \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] - 2\,\mathrm{Cov}\left[Z(s_i), Z(s)\right] + 2\mu = 0$$
Setting this derivative to zero yields, for $i = 1, \ldots, n$:
$$\sum_{j=1}^{n} \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] = \mathrm{Cov}\left[Z(s_i), Z(s)\right] - \mu \tag{15}$$
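Differentiating Equation (14) with respect to $\mu$ gives:
$$\frac{\partial F(\lambda_i, \mu)}{\partial \mu} = 2\left[\sum_{i=1}^{n} \lambda_i - 1\right] = 0$$
Setting this derivative to zero yields:
$$2\left[\sum_{i=1}^{n} \lambda_i - 1\right] = 0 \;\Rightarrow\; \sum_{i=1}^{n} \lambda_i - 1 = 0$$
$$\sum_{i=1}^{n} \lambda_i = 1 \tag{16}$$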
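Equations (15) and (16) form the OK system. Writing $C_{ij} = \mathrm{Cov}[Z(s_i), Z(s_j)]$ and $C_{i0} = \mathrm{Cov}[Z(s_i), Z(s)]$, the system is:
$$\begin{aligned}
\lambda_1 C_{11} + \lambda_2 C_{12} + \lambda_3 C_{13} + \cdots + \lambda_n C_{1n} + \mu &= C_{10} \\
\lambda_1 C_{21} + \lambda_2 C_{22} + \lambda_3 C_{23} + \cdots + \lambda_n C_{2n} + \mu &= C_{20} \\
\lambda_1 C_{31} + \lambda_2 C_{32} + \lambda_3 C_{33} + \cdots + \lambda_n C_{3n} + \mu &= C_{30} \\
&\;\;\vdots \\
\lambda_1 C_{n1} + \lambda_2 C_{n2} + \lambda_3 C_{n3} + \cdots + \lambda_n C_{nn} + \mu &= C_{n0} \\
\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_n + 0 &= 1
\end{aligned}$$
or, in matrix form:
$$\begin{pmatrix}
C_{11} & C_{12} & C_{13} & \cdots & C_{1n} & 1 \\
C_{21} & C_{22} & C_{23} & \cdots & C_{2n} & 1 \\
C_{31} & C_{32} & C_{33} & \cdots & C_{3n} & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
C_{n1} & C_{n2} & C_{n3} & \cdots & C_{nn} & 1 \\
1 & 1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \\ \mu \end{pmatrix}
=
\begin{pmatrix} C_{10} \\ C_{20} \\ C_{30} \\ \vdots \\ C_{n0} \\ 1 \end{pmatrix}$$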
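The weights of the observed points for predicting the unobserved point are then obtained as:
$$\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \\ \mu \end{pmatrix}
=
\begin{pmatrix}
C_{11} & C_{12} & C_{13} & \cdots & C_{1n} & 1 \\
C_{21} & C_{22} & C_{23} & \cdots & C_{2n} & 1 \\
C_{31} & C_{32} & C_{33} & \cdots & C_{3n} & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
C_{n1} & C_{n2} & C_{n3} & \cdots & C_{nn} & 1 \\
1 & 1 & 1 & \cdots & 1 & 0
\end{pmatrix}^{-1}
\begin{pmatrix} C_{10} \\ C_{20} \\ C_{30} \\ \vdots \\ C_{n0} \\ 1 \end{pmatrix}$$
$$\boldsymbol{\lambda} = \mathbf{C}_{nn}^{-1} \mathbf{C}_{n0}$$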
where:
  • $\boldsymbol{\lambda}$: the vector of weights $\lambda_1, \ldots, \lambda_n$ augmented with $\mu$;
  • $\mathbf{C}_{nn}$: the (augmented) variance-covariance matrix of the sampled variable among the $n$ sampled locations;
  • $\mathbf{C}_{n0}$: the vector of covariances between the sampled variable at the $n$ sampled locations and the variable at the prediction location, augmented with 1;
  • $\mu$: the Lagrange multiplier.
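To obtain the variance of the OK estimator, substitute Equation (15) into Equation (13):
$$\begin{aligned}
\sigma_{OK}^2 &= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] \\
&= \sum_{i=1}^{n} \lambda_i \sum_{j=1}^{n} \lambda_j \mathrm{Cov}\left[Z(s_i), Z(s_j)\right] + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] \\
&= \sum_{i=1}^{n} \lambda_i \left(\mathrm{Cov}\left[Z(s_i), Z(s)\right] - \mu\right) + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right]
\end{aligned}$$
Using $\sum_{i=1}^{n} \lambda_i = 1$, this simplifies to:
$$\sigma_{OK}^2 = \sigma^2 - \sum_{i=1}^{n} \lambda_i \mathrm{Cov}\left[Z(s_i), Z(s)\right] - \mu$$
$$\sigma_{OK}^2 = \sigma^2 - \left(\lambda_1 C_{10} + \lambda_2 C_{20} + \lambda_3 C_{30} + \cdots + \lambda_n C_{n0}\right) - \mu$$
$$\sigma_{OK}^2 = \sigma^2 - \begin{pmatrix} C_{10} & C_{20} & C_{30} & \cdots & C_{n0} & 1 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \\ \mu \end{pmatrix}$$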
This minimum error variance is known as the OK (kriging) variance; hence, the OK estimator satisfies the "best" property.
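The OK system and the variance formula above can be sketched with NumPy for a single prediction location. The sketch assumes a second-order stationary field whose covariance is derived from a fitted semivariogram as C(h) = c - γ(h); the exponential covariance helper and all function names are illustrative assumptions, not the authors' R-Shiny code.

```python
import numpy as np


def exponential_cov(sill, range_km):
    """Covariance C(h) = sill * exp(-h / range_km), i.e. sill - gamma(h) for an
    exponential semivariogram without nugget (an assumed model choice)."""
    return lambda h: sill * np.exp(-np.asarray(h, dtype=float) / range_km)


def ordinary_kriging(coords, z, target, cov):
    """Solve the OK system for one prediction location.

    coords: (n, 2) observed coordinates, z: (n,) values, target: (2,) location,
    cov: function mapping distances to covariances.
    Returns (prediction, OK variance, weights lambda, Lagrange multiplier mu).
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)

    # Augmented (n+1) x (n+1) matrix: C_ij block, a column/row of ones, 0 in the corner.
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = cov(d)
    K[n, n] = 0.0
    rhs = np.append(cov(d0), 1.0)  # (C_10, ..., C_n0, 1)

    sol = np.linalg.solve(K, rhs)  # (lambda_1, ..., lambda_n, mu)
    lam, mu = sol[:n], sol[n]
    prediction = lam @ z
    sigma2 = float(cov(0.0))  # Var[Z(s)] = C(0), the sill
    ok_variance = sigma2 - rhs @ sol  # sigma^2 - (sum_i lambda_i C_i0 + mu)
    return prediction, ok_variance, lam, mu
```

Two properties follow directly from the system: the weights always sum to 1, and predicting at an observed location returns the observed value with zero OK variance (kriging is an exact interpolator when there is no nugget). For a target midway between two observations, symmetry gives equal weights of 0.5.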

2.4. Expanded Spatial Durbin Model (ESDM)

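The ESDM was used to capture spatial dependence in the exogenous variables. It is formulated as follows [9]:
$$\mathbf{y} = \rho \mathbf{W}\mathbf{y} + \alpha \mathbf{1}_n + \mathbf{X}\mathbf{Z}\mathbf{J}\boldsymbol{\beta}_0 + \mathbf{W}\tilde{\mathbf{X}}\boldsymbol{\theta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \text{iid } N(\mathbf{0}, \sigma^2 \mathbf{I})$$
Letting $\mathbf{A} = \mathbf{X}\mathbf{Z}\mathbf{J}$, it follows that:
$$\mathbf{y} = \rho \mathbf{W}\mathbf{y} + \alpha \mathbf{1}_n + \mathbf{A}\boldsymbol{\beta}_0 + \mathbf{W}\tilde{\mathbf{X}}\boldsymbol{\theta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \text{iid } N(\mathbf{0}, \sigma^2 \mathbf{I})$$
$$\mathbf{y} = \rho \mathbf{W}\mathbf{y} + \mathbf{U}\boldsymbol{\delta} + \boldsymbol{\varepsilon}$$
where:
$$\mathbf{U} = \begin{bmatrix} \mathbf{1}_n & \mathbf{A} & \mathbf{W}\tilde{\mathbf{X}} \end{bmatrix}, \qquad \boldsymbol{\delta} = \begin{bmatrix} \alpha \\ \boldsymbol{\beta}_0 \\ \boldsymbol{\theta} \end{bmatrix},$$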
with:
  • $\mathbf{y}$: vector of dependent variables of size $(n \times 1)$;
  • $\tilde{\mathbf{X}}$: matrix of independent variables of size $(n \times k)$;
  • $\mathbf{X}$: matrix of independent variables of size $(n \times nk)$;
  • $\rho$: spatial lag coefficient of the dependent variable;
  • $\alpha$: constant parameter;
  • $\mathbf{W}$: spatial weight matrix of size $(n \times n)$;
  • $\mathbf{Z}$: location information matrix with elements $Z_{x_i}, Z_{y_i}$, $i = 1, \ldots, n$, representing the latitude and longitude of each observation, of size $(nk \times 2nk)$;
  • $\mathbf{J}$: expansion of the identity matrix, of size $(2nk \times 2k)$;
  • $\boldsymbol{\beta}$: vector of size $(nk \times 1)$ containing the parameter estimates for all $k$ explanatory variables at each observation;
  • $\boldsymbol{\beta}_0$: parameter vector composed of $\beta_{\mathrm{latitude}}$ and $\beta_{\mathrm{longitude}}$, of size $(2k \times 1)$;
  • $\boldsymbol{\theta}$: spatial lag parameter vector of the covariates, of size $(k \times 1)$;
  • $\otimes$: Kronecker product;
  • $\boldsymbol{\varepsilon}$: error vector of size $(n \times 1)$;
  • $s_i$: location matrix with $i = 1, \ldots, n$.
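The text gives the sizes of X, Z, and J but not their layout. One construction consistent with those sizes, and with Casetti's expansion in which each coefficient varies linearly with latitude and longitude, is sketched below. The block layout (X holding each location's covariate row in its own block, Z block diagonal with blocks I_k ⊗ [u_i, v_i], and J = 1_n ⊗ I_2k) and the ordering of β_0 as (β_1^lat, β_1^lon, ..., β_k^lat, β_k^lon) are assumptions. Under them, β = ZJβ_0 gives location-specific coefficients, and A = XZJ reduces to products of each covariate with latitude and longitude.

```python
import numpy as np


def expansion_matrices(x, lat, lon):
    """Build X (n x nk), Z (nk x 2nk) and J (2nk x 2k) under an assumed layout.

    Assumptions (not stated in the text): X places the covariate row x_i of
    location i in its own block of k columns; Z is block diagonal with blocks
    kron(I_k, [u_i, v_i]); and J = kron(1_n, I_2k) stacks identity blocks.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    X = np.zeros((n, n * k))
    Z = np.zeros((n * k, 2 * n * k))
    for i in range(n):
        X[i, i * k:(i + 1) * k] = x[i]
        Z_i = np.kron(np.eye(k), np.array([[lat[i], lon[i]]]))  # k x 2k
        Z[i * k:(i + 1) * k, 2 * i * k:2 * (i + 1) * k] = Z_i
    J = np.kron(np.ones((n, 1)), np.eye(2 * k))  # 2nk x 2k
    return X, Z, J


def local_coefficients(Z, J, beta0, n, k):
    """beta = Z J beta_0, reshaped so that row i holds the k coefficients of location i."""
    return (Z @ J @ np.asarray(beta0, dtype=float)).reshape(n, k)
```

With this layout, coefficient j at location i equals u_i β_j^lat + v_i β_j^lon, which is how one set of 2k parameters yields different estimates for every district/city.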

2.5. Mean Absolute Percentage Error (MAPE)

To evaluate the model’s performance, the mean absolute percentage error (MAPE) is calculated as follows:
$$\mathrm{MAPE} = \left(\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y(s_i) - \hat{y}(s_i)}{y(s_i)}\right|\right) \times 100\%$$
with:
  • $y(s_i)$: actual value at location $s_i$;
  • $\hat{y}(s_i)$: predicted value at location $s_i$;
  • $n$: number of observation locations.
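The MAPE formula in this section translates directly into a few lines of Python. This is a minimal sketch; raising an error when an actual value is zero is a design choice added here, since the ratio is undefined in that case.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, as defined in Section 2.5."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((y - y_hat) / y)
    return total / len(actual) * 100.0
```

For actual values 100 and 200 predicted as 110 and 180, both absolute percentage errors are 10%, so the MAPE is 10%.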
According to Lawrence's criteria (2009) [21], MAPE values can be categorized as shown in Table 2.

2.6. Data Analytics Life Cycle

The data analytics life cycle was created in response to the challenges of big data and data science: large data volumes, diverse data architectures, and rapid data growth. The life cycle has six stages, some of which may run concurrently. The analysis can proceed both forward and backward, enabling an iterative process that incorporates new information as it becomes available [22]. This makes it possible to revisit problems and repeat the procedure, which also makes research objectives easier to operationalize. The life cycle defines best practices for the analytical process, from discovery to completion of the research work. Its six stages are summarized as follows [23]:
  • Discovery (Problem Formulation): At this point, a literature review was conducted to prepare for the research problem analysis phase. This phase required assembling resources, including data, technology, references, and time. Developing a problem framework as an analytical task to be tackled in the following stage and developing preliminary hypotheses to investigate and evaluate the data were crucial tasks in this phase.
  • Data Preparation: Initial data analysis was carried out as part of data pre-processing at this stage. Before building the model, the data were prepared and stored in a database repository through procedures such as data cleaning, extraction, transformation, and integration.
  • Model Planning: This stage focuses on planning the model by determining the methods, techniques, and research flow to be followed during the model-building stage.
  • Model Building: At this stage, datasets for training and testing were created and the output models were built. Whether the available hardware could run the model efficiently, for example through fast processing and parallel computation, was also considered.
  • Communicating Results: This step entailed testing the data model and any modifications with the user or in an experimental environment to ascertain whether the output complied with the development criteria. Should the model fail to satisfy the specifications, an assessment was carried out, and the procedure could revert to the earlier phase for further improvement.
  • Operationalizing: At this point, the final report, instructions, code, and technical documents had to be submitted. To support wider application, this stage can also include implementing the model as a pilot project.
If further enhancements are needed, phases 1 through 5 of the data analytics life cycle can be repeated. The dotted lines from stage 6 back to stage 1 indicate evaluation of the modelling process, highlighting that stages can be revisited if the modelling results do not meet the desired criteria.

3. Results

3.1. Data Description

The objective of this study was to predict big climate data on Java Island, Indonesia, using ESDMOK. The model was applied to secondary data obtained from the National Aeronautics and Space Administration (NASA) Prediction of Worldwide Energy Resources (POWER) project. The POWER project provides solar and meteorological data generated by NASA to support renewable energy, building energy efficiency, and agricultural needs. It started in 2003 as an outgrowth of the Surface meteorology and Solar Energy (SSE) project. NASA-generated satellite data are essential in supporting researchers and the public in studying Earth’s climate and climate processes [24]. The POWER project provides long-term climatological mean estimates of meteorological data and surface solar energy flux data; in addition to these long-term averages, daily time series are also available. Solar data are based on satellite observations, and meteorological data are derived from the MERRA-2 assimilation model. These satellite- and model-based products have proven to be quite accurate in providing reliable solar and meteorological resource data in regions where surface measurements are sparse or non-existent. The uncertainty estimates of POWER data are based on comparisons with measurement data [25,26,27].
In addition, POWER provides high-resolution precipitation data derived from the Integrated Multi-satellite Retrievals for GPM (IMERG) product of NASA’s Global Precipitation Measurement (GPM) mission, on a 0.5° × 0.625° latitude-longitude grid (approximately 50 km) [28]. NASA POWER data can be downloaded free of charge at https://power.larc.nasa.gov/ (accessed on 5 July 2023). Data retrieval begins with inputting the latitude and longitude coordinates of a location and specifying the time interval. In this study, daily data were retrieved for the period from 1 January 1982 to 5 July 2023. The selected climate variables were rainfall, air temperature, humidity, wind speed, solar irradiation, and surface pressure, along with the latitude and longitude coordinates. The output data were stored as comma-separated value (.csv) files.

3.2. Data Analytics Life Cycle for the ESDMOK

In this research, rainfall prediction with ESDMOK follows the data analytics life cycle methodology shown in Figure 2. The process begins with formulating the research problem, including natural disasters caused by rainfall, problem identification, the climate variables affecting rainfall, and initial hypotheses based on theories that support ESDMOK. The data preparation stage then determines the source of the climate data to be analysed and performs data pre-processing. Data collection begins with inputting the location coordinates of 119 districts/cities on Java Island, setting the observation interval to daily data, and selecting the climate variables; the data were retrieved through the application programming interface (API) using the “pynasapower” package in Python. Data pre-processing includes removing missing values, aggregating daily data to monthly data, and selecting the climate variables. The model planning stage integrates location data with the climate variables to provide spatial input data for ESDMOK. At the model building stage, because the NASA POWER grid resolution is 0.5° × 0.625° latitude-longitude (approximately 50 km), 55 locations had data identical to those of other locations. The locations were therefore split into 64 observed and 55 unobserved locations, and prediction was performed using ESDMOK. The communicating results stage includes model evaluation, which measures accuracy with the MAPE; post-processing, which produces spatial maps of the rainfall predictions; and interpretation of the results to gain knowledge that can be used as recommendations. The last stage, operationalization, involves documenting the research results and disseminating them as scientific papers in journals.
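The pre-processing steps described here (dropping missing values, which NASA POWER codes as −999 according to Section 3.3, and aggregating daily records to monthly values) can be sketched in plain Python. The text does not say whether each variable is summed or averaged per month; summing rainfall and averaging the other variables is an assumption, exposed through the hypothetical `how` parameter. Retrieval itself is not shown, since the calls of the “pynasapower” package are not given in the text.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

MISSING = -999.0  # NASA POWER fill value for missing data


def daily_to_monthly(records: Iterable[Tuple[date, float]], how: str = "sum") -> Dict[Tuple[int, int], float]:
    """Drop fill values and aggregate daily (date, value) records per (year, month).

    how="sum" (assumed for rainfall totals) or how="mean" (assumed for the
    other climate variables).
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    groups = defaultdict(list)
    for day, value in records:
        if value is None or value == MISSING:
            continue  # remove missing values
        groups[(day.year, day.month)].append(value)
    monthly = {}
    for key in sorted(groups):
        values = groups[key]
        monthly[key] = sum(values) if how == "sum" else sum(values) / len(values)
    return monthly
```

A month in which every day is missing simply does not appear in the output, so downstream steps can detect and handle it explicitly.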

3.3. Framework ESDMOK for Prediction

Based on the gap analysis, ESDMOK is a spatial model that can predict values at unobserved locations while accounting for spatial dependence in the exogenous variables. Figure 3 shows the framework diagram for ESDMOK as part of the model building stage. As described for the data preparation stage, the process starts with inputting climate data sourced from NASA POWER. The data pre-processing stage then removes missing values (coded as −999), aggregates daily data into monthly data, and selects duplicate data. In this study, the NASA POWER climate data were pre-processed using an R-Shiny web application available at https://annisanurfalah.shinyapps.io/Pre-ProcessingData/ (accessed on 26 September 2023). The pre-processed data were split into observed and unobserved locations, as listed in Appendix A. The observed locations were used as input to the OK method to predict climate data at the unobserved locations; these predictions were calculated using an R-Shiny web application available at https://annisanurfalah.shinyapps.io/Ordinary-Point-Kriging/ (accessed on 7 May 2024). The combined climate data at observed and unobserved locations were used to construct an inverse distance weight matrix, and spatial autocorrelation was assessed using Moran's index and the Moran scatterplot. If spatial autocorrelation is detected, the process continues with the ESDM, estimated by maximum likelihood estimation (MLE). The ESDM parameter estimates are then calculated, and prediction accuracy is evaluated using the mean absolute percentage error (MAPE). The ESDM predictions were computed using an R-Shiny web application available at https://andriyanafalah.shinyapps.io/SDM-Expansion/ (accessed on 11 June 2024).
The prediction results are then visualized as web-application-based spatial maps, including choropleth maps, and interpreted to gain valuable insights.
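The framework builds an inverse distance weight matrix and checks spatial autocorrelation with Moran's index before fitting the ESDM. A minimal sketch follows, assuming weights 1/d with distance exponent 1, a zero diagonal, and row standardization (the authors' exponent and standardization are not stated), together with the standard global Moran's I. The `moran_scatter` helper returns the (z, Wz) pairs that a Moran scatterplot displays.

```python
import numpy as np


def inverse_distance_weights(coords, power=1.0):
    """Row-standardized inverse distance weight matrix with a zero diagonal."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d ** power, 0.0)  # no self-neighbours
    return w / w.sum(axis=1, keepdims=True)


def morans_i(x, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    n = len(x)
    s0 = W.sum()  # equals n for a row-standardized W
    return (n / s0) * (z @ W @ z) / (z @ z)


def moran_scatter(x, W):
    """Points of the Moran scatterplot: centred values z against their spatial lag W z."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return z, W @ z
```

Values well above the expectation under no autocorrelation, −1/(n − 1), indicate positive spatial autocorrelation; in practice a permutation or normal-approximation test would be used to judge significance.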

3.4. Prediction Result of OK Method at Unobserved Locations

Semivariogram values were calculated for all possible pairs of locations using the Euclidean distance h between them; the semivariogram describes how the difference between values of a variable changes as a function of h. Equation (1) was used to compute the semivariogram values and the number of distance pairs. The observed data consisted of 64 districts/cities for six climate variables: rainfall, air temperature, humidity, wind speed, solar irradiation, and surface pressure. The experimental semivariogram values are shown in Table 3.
The experimental semivariogram was used to fit the theoretical semivariogram models. Following Table 1, each theoretical model was fitted to the experimental semivariogram values of each climate variable with varying sill and range values; the fitted curves are plotted in Appendix B. The sum of squared errors (SSE) was used to select the best theoretical model. The SSE values for the spherical, exponential, and Gaussian models are shown in Table 4, where bold numbers mark the lowest SSE, and thus the selected model, for each of rainfall, air temperature, humidity, solar irradiation, wind speed, and surface pressure.
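The SSE-based model selection can be sketched as a grid search over sill and range for each candidate model. The model forms below are common textbook versions without a nugget and may be scaled differently from Table 1; the default grids and the function names are illustrative assumptions. A least-squares optimizer could replace the grid search, which is used here only because it is transparent and dependency-free.

```python
import numpy as np


def spherical(h, c, a):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)  # equals the sill c beyond the range
    return c * (1.5 * r - 0.5 * r ** 3)


def exponential(h, c, a):
    return c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))


def gaussian(h, c, a):
    return c * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))


MODELS = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}


def fit_by_sse(lags, gamma_hat, sills=None, ranges=None):
    """Grid-search sill c and range a for each model and pick the lowest SSE.

    Returns (best model name, {name: (sse, sill, range)}).
    """
    lags = np.asarray(lags, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    if sills is None:
        sills = np.linspace(0.1, 2.0, 96) * gamma_hat.max()
    if ranges is None:
        ranges = np.linspace(0.05, 2.0, 79) * lags.max()
    results = {}
    for name, model in MODELS.items():
        best = (np.inf, None, None)
        for c in sills:
            for a in ranges:
                sse = float(np.sum((gamma_hat - model(lags, c, a)) ** 2))
                if sse < best[0]:
                    best = (sse, c, a)
        results[name] = best
    best_name = min(results, key=lambda m: results[m][0])
    return best_name, results
```

Applied to each climate variable's experimental semivariogram, the returned SSE values play the role of one row of Table 4, and the selected model and its (c, a) feed the kriging step.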
Based on Table 4, the Gaussian model had the minimum SSE for rainfall and air temperature, whereas the exponential model had the minimum SSE for humidity, wind speed, solar irradiation, and surface pressure. These models were therefore used as inputs to the OK predictions at the 55 unobserved districts/cities. The combined results for the observed (64 districts/cities) and unobserved locations are visualized as spatial maps in Figure 4, covering rainfall, air temperature, humidity, wind speed, solar irradiation, and surface pressure for the 119 districts/cities on Java Island, Indonesia. Values are encoded as colours and bars, using either several colours or a single colour at different intensities, to show the level of each variable. The visualization is primarily explanatory, conveying the values of the climate variables; to add interactive features and ease of use, it was built as a web application using JavaScript, HyperText Markup Language (HTML), and Cascading Style Sheets (CSS). JavaScript handles data processing, data structures, and display updates, while HTML and CSS define the page content, layout, and structure. The Leaflet library provides the base map and map features such as overlays, zoom in, zoom out, and pan, and the Highcharts library renders the bar charts.

3.5. Prediction Result of the ESDM

The ESDM parameters were estimated using an R-Shiny web application. The estimate $\hat{\rho} = 0.999$ is positive ($\hat{\rho} > 0$), indicating spatial lag dependence: rainfall at adjacent locations within Java Island influences the predicted rainfall. Rainfall on Java Island thus exhibits positive spatial autocorrelation, meaning that districts/cities with high rainfall tend to be surrounded by districts/cities with similarly high rainfall. The parameter estimates $\hat{\boldsymbol{\beta}}_0$ and $\hat{\boldsymbol{\theta}}$ are shown in Table 5.
In Table 5, $\hat{\boldsymbol{\beta}}_0$ measures the direct impact of the exogenous variables on rainfall in the same region, and $\hat{\boldsymbol{\theta}}$ captures the spillover effects of the exogenous variables on rainfall. From $\hat{\boldsymbol{\beta}}_0$, the location-specific estimates $\hat{\boldsymbol{\beta}}$ are obtained, giving different parameter estimates for each exogenous variable in each of the 119 districts/cities. This is because the expansion of the exogenous variable matrix incorporates the latitude and longitude of each location. Surface pressure has the highest effect on rainfall, and humidity the lowest. The ESDM equation for each location is given in Appendix C. The rainfall predictions for the 119 districts/cities of Java Island are visualized in Figure 5.
Based on Figure 5, the highest predicted monthly rainfall, above 220 mm, occurs in West Java, for example in Ciamis, Tasikmalaya, and Pangandaran. The lowest, below 140 mm, occurs in East Java, for example in Probolinggo, Probolinggo City, Situbondo, and Bondowoso. The proposed model achieved a MAPE of 1.956%, which falls in the "very accurate prediction" category of Table 2 (MAPE ≤ 10%).
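For reference, the MAPE is defined as follows:

$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$

The sketch below computes it and maps the result onto the accuracy scale of Table 2. The function names are illustrative, and MAPE is only defined when every actual value is non-zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent: (100/n) * sum |(y - y_hat) / y|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def mape_category(value):
    """Map a MAPE value (percent) onto the accuracy scale of Table 2."""
    if value <= 10:
        return "Very accurate prediction"
    if value <= 20:
        return "Good prediction"
    if value <= 50:
        return "Reasonable prediction"
    return "Inaccurate prediction"


if __name__ == "__main__":
    observed = [180.0, 150.0, 210.0]   # hypothetical monthly rainfall (mm)
    predicted = [176.0, 153.0, 214.0]
    score = mape(observed, predicted)
    print(f"MAPE = {score:.3f}% -> {mape_category(score)}")
    print(mape_category(1.956))        # the MAPE reported for the ESDMOK
```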

4. Discussion

The data analytics life cycle methodology, which consists of six stages, was used to analyze big climate data sourced from NASA POWER. The first step was formulating the research problem regarding the impact of rainfall on natural disasters. Problem identification covered the spatial dependencies among climate variables that affect rainfall on the island of Java, Indonesia. It also covered initial hypotheses based on the theory underlying ESDMOK. Data preparation began with collecting climate variable data containing location coordinates and setting the observation time interval to daily data. Six climate variables were selected: rainfall, air temperature, humidity, wind speed, solar irradiation, and surface pressure. The data pre-processing stage then included cleaning missing values, aggregating daily data to monthly data, and handling duplicate records. Model planning integrated the location data with the climate variables as inputs to ESDMOK. Model development is one of the main objectives of our proposed technique for rainfall prediction. In the model-building stage, the climate variables were predicted at 55 unobserved locations using the OK method based on 64 observed locations. The OK method fits a theoretical semivariogram model to the experimental semivariogram. The Gaussian and exponential models gave the minimum SSE and were selected. Furthermore, the spatial autocorrelation test showed spatial dependency in both rainfall and the exogenous variables, which contributed significantly to the accuracy of the ESDM predictions. In the communication-of-results stage, the predictions were very accurate, with a MAPE of 1.956%. The surface pressure variable has the largest influence on rainfall, and wind speed has the smallest. Finally, the post-processing stage consisted of spatial mapping of the rainfall predictions. It also included interpreting the results to gain knowledge that can be used as a recommendation.
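The pre-processing stage can be sketched in plain Python. It covers cleaning missing values, removing duplicates, and aggregating daily data to monthly data. Several details are assumptions for illustration:

- the record layout;
- the −999 missing-value flag, which is assumed to be the NASA POWER fill value;
- the choice to sum daily rainfall but average the other variables.

The section does not specify the aggregation rule.

```python
import math
from collections import defaultdict

FILL_VALUE = -999.0  # assumed missing-value flag (taken here to be the NASA POWER fill value)


def is_missing(value):
    """True for None, the fill value, or NaN."""
    return value is None or value == FILL_VALUE or (isinstance(value, float) and math.isnan(value))


def daily_to_monthly(records, how="sum"):
    """Aggregate (location, 'YYYY-MM-DD', value) records to {(location, 'YYYY-MM'): value}.

    Drops missing values, keeps only the first valid record for a duplicated (location, day),
    then sums (e.g. rainfall totals) or averages (e.g. temperature) the daily values per month.
    """
    seen = set()
    by_month = defaultdict(list)
    for location, day, value in records:
        if is_missing(value) or (location, day) in seen:
            continue
        seen.add((location, day))
        by_month[(location, day[:7])].append(value)  # 'YYYY-MM' prefix of the date
    if how == "sum":
        return {key: sum(vals) for key, vals in by_month.items()}
    if how == "mean":
        return {key: sum(vals) / len(vals) for key, vals in by_month.items()}
    raise ValueError("how must be 'sum' or 'mean'")


if __name__ == "__main__":
    daily = [("Bogor City", "2023-01-01", 12.0), ("Bogor City", "2023-01-01", 12.0),  # duplicate
             ("Bogor City", "2023-01-02", -999.0), ("Bogor City", "2023-01-03", 4.5)]
    print(daily_to_monthly(daily))                # monthly rainfall total
    print(daily_to_monthly(daily, how="mean"))    # monthly mean, e.g. for temperature
```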
The ESDMOK was applied to predict rainfall in 119 districts/cities in Java Island, Indonesia, using climate variables as exogenous variables. Rainfall prediction is important in the context of climate change, in line with Goal 13 of the Sustainable Development Goals (SDGs), which concerns climate action. This research shows that the rainfall level in each of the 119 districts/cities in Java Island, Indonesia, was significantly influenced by other climate variables, such as air temperature, humidity, solar irradiation, wind speed, and surface pressure [7]. Given the complexity of the data, future work should consider incorporating deep learning approaches and making further use of big data to improve the prediction and analysis of rainfall in the study region.

5. Conclusions

In conclusion, this study proposes an ESDM combined with interpolation by the OK method to calculate predictions at unobserved locations. ESDMOK can be used for rainfall prediction when rainfall depends spatially on exogenous variables. The proposed technique identifies spatial rainfall patterns, captures spatial dependence between observation units within the region, and incorporates relevant exogenous variables. Together, these features improve rainfall prediction accuracy. Among the exogenous variables, surface pressure has the largest influence on rainfall and wind speed the smallest.
The results of this model support disaster mitigation, water resources management, and the development of infrastructure that is resilient to natural disasters. The ESDMOK predictions for all districts and cities in Java Island can serve as recommendations for the Meteorology, Climatology, and Geophysics Agency (BMKG), Indonesia, for agribusiness companies, and for the general public. They can help improve agricultural planning and the timing of planting seasons, and provide public climate information, especially on rainfall in areas with a monsoonal pattern.

6. Patents

Granted copyright: Copyright for Computer Program, number 000484474.
Entitled “Application of RShiny Program for Ordinary Point Kriging Method on Rainfall Data in West Java”, Ministry of Law and Human Rights of the Republic of Indonesia (Falah, A. N., Ruchjana, B. N., Abdullah, A. S., Rejito, J.), 2023. https://annisanurfalah.shinyapps.io/Ordinary-Point-Kriging/ (accessed on 7 May 2024).

Author Contributions

Conceptualization, A.N.F. and Y.A.; methodology, A.N.F., Y.A., B.N.R. and E.H.; software, A.N.F., E.M. and S.B.S.; validation, T.H., R. and H.S.; formal analysis, A.N.F., Y.A., B.N.R., E.H., T.H., E.M., R., H.S. and S.B.S.; investigation, E.H., T.H., R. and H.S.; resources, A.N.F., E.H., T.H. and E.M.; data curation, E.H., T.H., R. and H.S.; writing—original draft preparation, A.N.F., Y.A. and B.N.R.; writing—review and editing, A.N.F., Y.A., B.N.R., E.H., T.H. and E.M.; visualization, A.N.F., E.M. and S.B.S.; supervision, Y.A., B.N.R. and E.H.; project administration, A.N.F. and Y.A.; funding acquisition, Y.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully thank Universitas Padjadjaran for financial support through the Postdoctoral Research Grant scheme (contract number 2413/UN6.3.1/PT.00/2024) and the Fundamental Research Grant (contract number 4039/UN6.3.1/PT.00/2024) from the Ministry of Research, Technology and Higher Education Indonesia (Kemendikbudristek).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We also thank the National Research and Innovation Agency (BRIN), the Academic Leadership Grant Unpad 2024, and the RISE_SMA project of the European Union 2019–2024 for their assistance in conducting this research. We thank Atje Setiawan Abdullah for the discussion and all reviewers for their valuable comments and suggestions on this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Observed Locations (64 districts/cities) in Java Island, Indonesia.
Locations | Districts/Cities | Latitude | Longitude | Locations | Districts/Cities | Latitude | Longitude
1 | Serang City | −6.11493 | 106.152694 | 33 | Pemalang | −6.89214 | 109.378232
2 | Pandeglang | −6.3092 | 106.1047 | 34 | Brebes | −6.86278 | 109.037757
3 | Tangerang City | −6.17139 | 106.640556 | 35 | Wonosobo | −7.36793 | 109.900846
4 | Tangerang Selatan City | −6.28578 | 106.712261 | 36 | Magelang City | −7.48054 | 110.217695
5 | Kepulauan Seribu | −5.6629 | 106.5683 | 37 | Banjarnegara | −7.43482 | 109.566706
6 | Jakarta Pusat | −6.17 | 106.82 | 38 | Cilacap | −7.69801 | 109.024769
7 | Bekasi City | −6.24159 | 106.992416 | 39 | Sleman | −7.68167 | 110.32333
8 | Bogor City | −6.59763 | 106.799568 | 40 | Yogyakarta City | −7.80046 | 110.39128
9 | Indramayu | −6.32758 | 108.324936 | 41 | Kulon Progo | −7.8596 | 110.1579
10 | Karawang | −6.32273 | 107.337579 | 42 | Situbondo | −7.71667 | 114.05
11 | Kuningan | −7.01381 | 108.570064 | 43 | Probolinggo | −7.7353 | 113.4717
12 | Kota Cirebon | −6.73725 | 108.550659 | 44 | Sumenep | −7.02 | 113.87
13 | Majalengka | −6.83638 | 108.227373 | 45 | Jember | −8.1689 | 113.7022
14 | Sumedang | −6.83812 | 107.927532 | 46 | Bondowoso | −7.9404 | 113.9834
15 | Garut | −7.22791 | 107.908699 | 47 | Banyuwangi | −8.21861 | 114.366944
16 | Cianjur | −6.82076 | 107.14296 | 48 | Pasuruan City | −7.63333 | 112.9
17 | Sukabumi | −7.06667 | 106.7 | 49 | Probolinggo City | −7.75 | 113.216667
18 | Bandung | −7.02525 | 107.51976 | 50 | Lumajang | −8.13 | 113.22
19 | Bandung City | −6.91486 | 107.608238 | 51 | Sampang | −7.05 | 113.25
20 | Pangandaran | −7.61506 | 108.498827 | 52 | Mojokerto City | −7.47222 | 112.433611
21 | Tasikmalaya City | −7.31956 | 108.202972 | 53 | Surabaya City | −7.2458 | 112.7378
22 | Sragen | −7.42028 | 111.023247 | 54 | Blitar | −8.13333 | 112.25
23 | Karanganyar | −7.60692 | 110.984515 | 55 | Malang City | −7.98 | 112.62
24 | Pati | −6.74905 | 111.037719 | 56 | Nganjuk | −7.6 | 111.9333
25 | Wonogiri | −7.79826 | 110.940606 | 57 | Tuban | −6.9 | 112.1
26 | Blora | −6.96874 | 111.418254 | 58 | Bojonegoro | −7.15 | 111.88
27 | Kudus | −6.80739 | 110.840369 | 59 | Tulungagung | −8.0667 | 111.9
28 | Semarang City | −7.00223 | 110.434226 | 60 | Blitar City | −8.09861 | 112.165278
29 | Surakarta City | −7.58104 | 110.826678 | 61 | Kediri City | −7.81661 | 112.011917
30 | Jepara | −6.57941 | 110.678479 | 62 | Madiun City | −7.63 | 111.5231
31 | Batang | −6.90668 | 109.733927 | 63 | Pacitan | −8.13333 | 111.16667
32 | Kendal | −6.93268 | 110.203074 | 64 | Ponorogo | −7.8686 | 111.4619
Table A2. Unobserved Locations (55 districts/cities) in Java Island, Indonesia.
Locations | Districts/Cities | Latitude | Longitude | Locations | Districts/Cities | Latitude | Longitude
1 | Serang | −6.15 | 106 | 29 | Pekalongan | −6.89032 | 109.677
2 | Cilegon City | −6.0204 | 106.0541 | 30 | Tegal | −6.87027 | 109.1602
3 | Lebak | −6.65 | 106.2167 | 31 | Pekalongan City | −6.88981 | 109.6738
4 | Tangerang | −6.3 | 106.5 | 32 | Tegal City | −6.86728 | 109.1379
5 | Jakarta Barat | −6.16717 | 106.7657 | 33 | Magelang | −7.47986 | 110.2176
6 | Jakarta Selatan | −6.25 | 106.8 | 34 | Purworejo | −7.71297 | 110.01
7 | Jakarta Timur | −6.2248 | 106.9011 | 35 | Temanggung | −7.31343 | 110.1693
8 | Jakarta Utara | −6.15225 | 106.8755 | 36 | Banyumas | −7.47536 | 109.1615
9 | Bekasi | −6.24667 | 107.1083 | 37 | Kebumen | −7.6708 | 109.6614
10 | Bogor | −6.59504 | 106.8166 | 38 | Purbalingga | −7.38559 | 109.3617
11 | Kota Depok | −6.38559 | 106.8307 | 39 | Bantul | −7.88461 | 110.3341
12 | Purwakarta | −6.53868 | 107.4499 | 40 | Gunung Kidul | −7.96668 | 110.6026
13 | Subang | −6.57159 | 107.7587 | 41 | Pamekasan | −7.0667 | 113.5
14 | Cirebon | −6.8 | 108.5667 | 42 | Pasuruan | −7.73333 | 112.8333
15 | Sukabumi City | −6.9237 | 106.9287 | 43 | Jombang | −7.47 | 112.23
16 | Bandung Barat | −6.8333 | 107.4833 | 44 | Mojokerto | −7.55 | 112.5
17 | Cimahi City | −6.89954 | 107.5339 | 45 | Sidoarjo | −7.45303 | 112.7173
18 | Banjar City | −7.37459 | 108.5582 | 46 | Bangkalan | −7.02919 | 112.7461
19 | Ciamis | −7.32622 | 108.3293 | 47 | Gresik | −7.1933 | 112.553
20 | Tasikmalaya | −7.5 | 108.1333 | 48 | Lamongan | −7.12 | 112.42
21 | Rembang | −6.70915 | 111.3421 | 49 | Malang | −7.96688 | 112.6331
22 | Demak | −6.89228 | 110.637 | 50 | Batu City | −7.86667 | 112.5167
23 | Grobogan | −7.02424 | 110.9187 | 51 | Madiun | −7.61667 | 111.65
24 | Semarang | −7.2486 | 110.4689 | 52 | Trenggalek | −8.05 | 111.72
25 | Boyolali | −7.51847 | 110.5932 | 53 | Kediri | −7.83333 | 112.1667
26 | Klaten | −7.74432 | 110.6678 | 54 | Magetan | −7.65 | 111.37
27 | Sukoharjo | −7.68017 | 110.8326 | 55 | Ngawi | −7.4019 | 111.445
28 | Salatiga City | −7.33102 | 110.51 | | | |
28Salatiga City−7.33102110.51

Appendix B

Theoretical semivariogram plot for climate variables, including rainfall, air temperature, humidity, solar irradiation, wind speed, and surface pressure.
Figure A1. Theoretical semivariogram plot for rainfall.
Figure A2. Theoretical semivariogram plot for air temperature.
Figure A3. Theoretical semivariogram plot for humidity.
Figure A4. Theoretical semivariogram plot for solar irradiation.
Figure A5. Theoretical semivariogram plot for wind speed.
Figure A6. Theoretical semivariogram plot for surface pressure.

Appendix C

Table A3. The ESDM equations for predicting rainfall in 119 districts/cities of Java Island, Indonesia.
No | Locations | ESDM Equation for Predicting Rainfall
1 | Serang City | $\hat{y}(s_1) = 0.999\sum_{i=1}^{64} w_{1i}\,y(s_i) + 300.639 \times 1(s_1) + 2.903X_1(s_1) - 0.340X_2(s_1) + 16.948X_3(s_1) - 5.394X_4(s_1) - 10.201X_5(s_1) - 9.262\sum_{i=1}^{64} w_{1i}X_1(s_i) - 3.725\sum_{i=1}^{64} w_{1i}X_2(s_i) - 9.497\sum_{i=1}^{64} w_{1i}X_3(s_i) - 0.499\sum_{i=1}^{64} w_{1i}X_4(s_i) + 3.015\sum_{i=1}^{64} w_{1i}X_5(s_i)$
2 | Pandeglang | $\hat{y}(s_2) = 0.999\sum_{i=1}^{64} w_{2i}\,y(s_i) + 300.639 \times 1(s_2) - 2.135X_1(s_2) + 4.572X_2(s_2) + 4.572X_3(s_2) + 2.859X_4(s_2) + 22.605X_5(s_2) - 9.262\sum_{i=1}^{64} w_{2i}X_1(s_i) - 3.725\sum_{i=1}^{64} w_{2i}X_2(s_i) - 9.497\sum_{i=1}^{64} w_{2i}X_3(s_i) - 0.499\sum_{i=1}^{64} w_{2i}X_4(s_i) + 3.015\sum_{i=1}^{64} w_{2i}X_5(s_i)$
3 | Tangerang City | $\hat{y}(s_3) = 0.999\sum_{i=1}^{64} w_{3i}\,y(s_i) + 300.639 \times 1(s_3) + 8.362X_1(s_3) - 4.063X_2(s_3) + 1.187X_3(s_3) + 5.062X_4(s_3) + 5.351X_5(s_3) - 9.262\sum_{i=1}^{64} w_{3i}X_1(s_i) - 3.725\sum_{i=1}^{64} w_{3i}X_2(s_i) - 9.497\sum_{i=1}^{64} w_{3i}X_3(s_i) - 0.499\sum_{i=1}^{64} w_{3i}X_4(s_i) + 3.015\sum_{i=1}^{64} w_{3i}X_5(s_i)$
⋮ | ⋮ | ⋮
119 | Ponorogo | $\hat{y}(s_{119}) = 0.999\sum_{i=1}^{64} w_{119,i}\,y(s_i) + 300.639 \times 1(s_{119}) + 3.881X_1(s_{119}) + 22.807X_2(s_{119}) - 1.230X_3(s_{119}) - 7.267X_4(s_{119}) + 1.912X_5(s_{119}) - 9.262\sum_{i=1}^{64} w_{119,i}X_1(s_i) - 3.725\sum_{i=1}^{64} w_{119,i}X_2(s_i) - 9.497\sum_{i=1}^{64} w_{119,i}X_3(s_i) - 0.499\sum_{i=1}^{64} w_{119,i}X_4(s_i) + 3.015\sum_{i=1}^{64} w_{119,i}X_5(s_i)$

References

  1. NASA. Overview: Weather, Global Warming, and Climate Change. 2022. Available online: https://science.nasa.gov/climate-change/what-is-climate-change/ (accessed on 28 February 2024).
  2. Ditjenppi. Dampak dan Fenomena Perubahan Iklim [Impacts and Phenomena of Climate Change]. 2022. Available online: http://ditjenppi.menlhk.go.id/kcpi/index.php/info-iklim/dampak-fenomena-perubahan-iklim (accessed on 7 March 2024).
  3. BMKG. Analisis Dinamika Atmosfer Dasarian III Mei 2022 [Analysis of Atmospheric Dynamics, Third Dekad of May 2022]. Available online: https://www.bmkg.go.id/iklim/dinamika-atmosfir.bmkg (accessed on 5 April 2024).
  4. SDGs Indonesia. Sustainable Development Goals (SDGs)-Tujuan 13 [Goal 13]. 2021. Available online: https://indonesia.un.org/id/sdgs/13/key-activities (accessed on 5 April 2024).
  5. Hatfield, G. Spatial statistics. In Practical Mathematics for Precision Farming; Wiley: Hoboken, NJ, USA, 2018; pp. 75–104.
  6. Stohlgren, T.J. Spatial Analysis and Modeling. In Measuring Plant Diversity: Lessons from the Field; Oxford University Press: Oxford, UK, 2007; pp. 254–270.
  7. Hermawan, E.; Lubis, S.W.; Harjana, T.; Purwaningsih, A.; Risyanto; Ridho, A.; Andarini, D.F.; Ratri, D.N.; Widyaningsih, R. Large-Scale Meteorological Drivers of the Extreme Precipitation Event and Devastating Floods of Early-February 2021 in Semarang, Central Java, Indonesia. Atmosphere 2022, 13, 1092.
  8. Falah, A.N.; Ruchjana, B.N.; Abdullah, A.S.; Rejito, J. The Hybrid Modeling of Spatial Autoregressive Exogenous Using Casetti’s Model Approach for the Prediction of Rainfall. Mathematics 2023, 11, 3783.
  9. Andriyana, Y.; Falah, A.N.; Ruchjana, B.N.; Sulaiman, A.; Hermawan, E. Spatial Durbin Model with Expansion Using Casetti’s Approach: A Case Study for Rainfall Prediction in Java Island, Indonesia. Mathematics 2024, 12, 2304.
  10. Abdullah, A.S.; Matoha, S.; Lubis, D.A.; Falah, A.N.; Jaya, I.G.N.M.; Hermawan, E.; Ruchjana, B.N. Implementation of Generalized Space Time Autoregressive (GSTAR)-Kriging model for predicting rainfall data at unobserved locations in West Java. Appl. Math. Inf. Sci. 2018, 12, 607–615.
  11. Falah, A.N.; Abdullah, A.S.; Parmikanti, K.; Ruchjana, B.N. Prediction of cadmium pollutant with ordinary point kriging method using Gstat-R. AIP Conf. Proc. 2017, 1827, 020019.
  12. Ruchjana, B.N.; Falah, A.N.; Abdullah, A.S. Application of the ordinary kriging method for prediction of the positive spread of COVID-19 in West Java. J. Phys. Conf. Ser. 2021, 1722, 012026.
  13. Gunawan, A.A.S.; Falah, A.N.; Faruk, A.; Lutero, D.S.; Ruchjana, B.N.; Abdullah, A.S. Spatial data mining for predicting of unobserved zinc pollutant using ordinary point Kriging. In Proceedings of the 2016 International Workshop on Big Data and Information Security (IWBIS), Jakarta, Indonesia, 18–19 October 2016; pp. 83–88.
  14. Gharaibeh, M.A.; Albalasmeh, A.A.; Moos, N.; Mohawesh, O.; Pratt, C.; El Hanandeh, A. A comparative analysis to forecast salinity and sodicity distributions using empirical Bayesian and disjunctive kriging in irrigated soils of the Jordan valley. Environ. Earth Sci. 2024, 83, 238.
  15. Falah, A.N.; Subartini, B.; Ruchjana, B.N. Application of universal kriging for prediction pollutant using GStat R. J. Phys. Conf. Ser. 2017, 893, 012022.
  16. Youkuo, C.; Yongguo, Y.; Wangwen, W. Coal seam thickness prediction based on least squares support vector machines and kriging method. Electron. J. Geotech. Eng. 2015, 20, 167–176.
  17. Maria, E.; Budiman, E.; Taruk, M. Measure distance locating nearest public facilities using Haversine and Euclidean Methods. J. Phys. Conf. Ser. 2020, 1450, 012080.
  18. Montero, J.M.; Fernández-Avilés, G.; Mateu, J. Spatial and Spatio-Temporal Geostatistical Modeling and Kriging; Wiley: Hoboken, NJ, USA, 2012; ISBN 9781118762387.
  19. Falah, A.N.; Hamid, N.; Rusyaman, E.; Abdullah, A.S.; Ruchjana, B.N. Implementation of Ordinary Co-Kriging method for prediction of coal quality variable at unobserved locations. J. Phys. Conf. Ser. 2021, 1722, 012076.
  20. Abdullah, A.S.; Hamid, N.; Falah, A.N.; Ruchjana, B.N. Prediction of spread shear strength of rock with ordinary point kriging method using GStat-R. Appl. Math. Inf. Sci. 2019, 13, 393–399.
  21. Lawrence, K.D.; Klimberg, R.K.; Lawrence, S.M. Fundamentals of Forecasting Using Excel; Industrial Press Inc.: New York, NY, USA, 2009; ISBN 083113335X.
  22. Rahul, K.; Banyal, R.K. Data Life Cycle Management in Big Data Analytics. Procedia Comput. Sci. 2020, 173, 364–371.
  23. Munandar, D.; Ruchjana, B.N.; Abdullah, A.S.; Pardede, H.F. Literature Review on Integrating Generalized Space-Time Autoregressive Integrated Moving Average (GSTARIMA) and Deep Neural Networks in Machine Learning for Climate Forecasting. Mathematics 2023, 11, 2975.
  24. Stackhouse, P.J. NASA POWER Data Methodology. 2020. Available online: https://power.larc.nasa.gov/ (accessed on 24 June 2024).
  25. White, J.W.; Hoogenboom, G.; Stackhouse, P.W.; Hoell, J.M. Evaluation of NASA satellite- and assimilation model-derived long-term daily temperature data over the continental US. Agric. For. Meteorol. 2008, 148, 1574–1584.
  26. White, J.W.; Hoogenboom, G.; Wilkens, P.W.; Stackhouse, P.W.; Hoell, J.M. Evaluation of satellite-based, modeled-derived daily solar radiation data for the continental United States. Agron. J. 2011, 103, 1242–1251.
  27. Bai, J.; Chen, X.; Dobermann, A.; Yang, H.; Cassman, K.G.; Zhang, F. Evaluation of NASA satellite- and model-derived weather data for simulation of maize yield potential in China. Agron. J. 2010, 102, 9–16.
  28. Huffman, G.J.; Bolvin, D.T.; Braithwaite, D.; Hsu, K.; Joyce, R.; Xie, P. NASA Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG). Algorithm Theoretical Basis Document (ATBD) Version 4.4. Natl. Aeronaut. Sp. Adm. 2014, 1–31. Available online: https://gpm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V5.2_0.pdf; https://pmm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V4.4.pdf (accessed on 27 June 2024).
Figure 1. Theoretical semivariogram model plot.
Figure 2. Data analytics life cycle methodology of the ESDMOK.
Figure 3. Framework diagram of ESDMOK for prediction.
Figure 4. Spatial mapping of climate data in 119 districts/cities in Java Island, Indonesia: (a) rainfall, (b) air temperature, (c) humidity, (d) wind speed, (e) solar irradiation, (f) surface pressure.
Figure 5. Spatial mapping of rainfall prediction in 119 districts/cities of Java Island.
Table 1. Theoretical semivariogram model.
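Model | Function
Spherical | $\gamma(h) = \begin{cases} c\left[\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right], & h \le a \\ c, & h > a \end{cases}$ (3)
Exponential | $\gamma(h) = \begin{cases} c\left[1 - \exp\left(-\dfrac{h}{a}\right)\right], & h \le a \\ c, & h > a \end{cases}$ (4)
Gaussian | $\gamma(h) = \begin{cases} c\left[1 - \exp\left(-\left(\dfrac{h}{a}\right)^2\right)\right], & h \le a \\ c, & h > a \end{cases}$ (5)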
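The three models in Table 1 can be written directly as functions. A model can then be chosen by fitting each one to the experimental semivariogram and keeping the fit with the smallest SSE. The sketch below uses a coarse grid search over the sill c and range a, as a stand-in for whatever fitting routine the authors used. Applied to the rainfall column of Table 3, its raw SSE values are on the scale of the semivariances and are not expected to match Table 4.

```python
import math


def spherical(h, c, a):
    """Spherical model from Table 1: c[3h/(2a) - h^3/(2a^3)] for h <= a, c for h > a."""
    return c if h > a else c * (1.5 * h / a - 0.5 * (h / a) ** 3)


def exponential(h, c, a):
    """Exponential model from Table 1: c[1 - exp(-h/a)] for h <= a, c for h > a."""
    return c if h > a else c * (1.0 - math.exp(-h / a))


def gaussian(h, c, a):
    """Gaussian model from Table 1: c[1 - exp(-(h/a)^2)] for h <= a, c for h > a."""
    return c if h > a else c * (1.0 - math.exp(-((h / a) ** 2)))


MODELS = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}


def sse(model, c, a, lags, gammas):
    """Sum of squared errors between experimental and theoretical semivariances."""
    return sum((g - model(h, c, a)) ** 2 for h, g in zip(lags, gammas))


def fit_by_sse(lags, gammas, steps=40):
    """Grid-search (c, a) for each model; return {model name: (SSE, c, a)}."""
    g_max, h_max = max(gammas), max(lags)
    results = {}
    for name, model in MODELS.items():
        best = None
        for i in range(1, steps + 1):
            c = 2.0 * g_max * i / steps          # candidate sills in (0, 2 * max gamma]
            for j in range(1, steps + 1):
                a = 2.0 * h_max * j / steps      # candidate ranges in (0, 2 * max lag]
                candidate = (sse(model, c, a, lags, gammas), c, a)
                if best is None or candidate[0] < best[0]:
                    best = candidate
        results[name] = best
    return results


if __name__ == "__main__":
    # Rainfall column of Table 3: lag distances and experimental semivariances.
    lags = [16986.956, 32999.848, 54226.021, 74327.202, 96082.271, 117949.815, 138289.058,
            159416.873, 181276.821, 202391.4, 223437.78, 244803.453, 268191.079, 287619.435,
            309682.512]
    gammas = [12.612, 68.209, 72.565, 90.908, 168.067, 159.668, 169.586, 219.595, 265.071,
              335.187, 377.226, 371.607, 510.885, 534.357, 581.690]
    fits = fit_by_sse(lags, gammas)
    for name, (err, c, a) in fits.items():
        print(f"{name:12s} SSE={err:12.1f}  c={c:8.2f}  a={a:10.1f}")
    print("minimum-SSE model:", min(fits, key=lambda k: fits[k][0]))
```

A finer grid or a nonlinear least-squares routine would refine the fitted parameters. The selection rule itself, keeping the model with the minimum SSE, is unchanged.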
Table 2. MAPE score scale.
Scale MAPE | Accuracy Score
MAPE ≤ 10% | Very accurate prediction
10% < MAPE ≤ 20% | Good prediction
20% < MAPE ≤ 50% | Reasonable prediction
MAPE > 50% | Inaccurate prediction
Table 3. Experimental semivariogram values.
No. | Number of Data Points with the Same Distance | Distance | Rainfall | Air Temperature | Humidity | Wind Speed | Solar Irradiation | Surface Pressure
1 | 9 | 16,986.956 | 12.612 | 0.865 | 1.904 | 0.186 | 0.033 | 2.442
2 | 40 | 32,999.848 | 68.209 | 1.389 | 4.282 | 0.358 | 0.074 | 3.495
3 | 76 | 54,226.021 | 72.565 | 1.429 | 5.398 | 0.232 | 0.107 | 3.252
4 | 99 | 74,327.202 | 90.908 | 1.789 | 5.820 | 0.285 | 0.144 | 4.465
5 | 96 | 96,082.271 | 168.067 | 2.390 | 8.202 | 0.400 | 0.133 | 5.657
6 | 109 | 117,949.815 | 159.668 | 2.880 | 9.046 | 0.327 | 0.180 | 7.131
7 | 94 | 138,289.058 | 169.586 | 2.650 | 8.612 | 0.386 | 0.204 | 5.904
8 | 90 | 159,416.873 | 219.595 | 2.116 | 9.571 | 0.299 | 0.295 | 4.560
9 | 86 | 181,276.821 | 265.071 | 3.191 | 10.495 | 0.388 | 0.297 | 7.804
10 | 76 | 202,391.4 | 335.187 | 2.184 | 9.989 | 0.399 | 0.264 | 4.687
11 | 83 | 223,437.78 | 377.226 | 2.491 | 12.527 | 0.346 | 0.340 | 5.015
12 | 72 | 244,803.453 | 371.607 | 2.246 | 11.461 | 0.280 | 0.419 | 4.565
13 | 84 | 268,191.079 | 510.885 | 1.918 | 12.129 | 0.429 | 0.482 | 3.515
14 | 55 | 287,619.435 | 534.357 | 3.059 | 18.205 | 0.300 | 0.424 | 5.342
15 | 63 | 309,682.512 | 581.690 | 2.848 | 14.276 | 0.359 | 0.488 | 6.496
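Values like those in Table 3 are typically computed with the classical (Matheron) estimator:

$\hat{\gamma}(h) = \frac{1}{2N(h)}\sum_{N(h)}\left[z(s_i) - z(s_i + h)\right]^2$

Here N(h) is the number of pairs in a distance class. A minimal binned version is sketched below. The lag width, the number of lags, and the use of planar distances are hypothetical parameters. Each output row mirrors the columns of Table 3: lag number, pair count, mean distance, and semivariance.

```python
import math
from collections import defaultdict


def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Binned estimator: gamma(h) = sum of (z_i - z_j)^2 over pairs in a lag class / (2 N(h)).

    Returns rows (lag number, N(h), mean pair distance, gamma) for non-empty classes.
    Distances are planar, which is an assumption of this sketch.
    """
    sq_diff = defaultdict(float)
    count = defaultdict(int)
    dist_sum = defaultdict(float)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            k = int(d // lag_width)  # lag class index
            if k >= n_lags:
                continue  # ignore pairs beyond the last lag class
            sq_diff[k] += (values[i] - values[j]) ** 2
            count[k] += 1
            dist_sum[k] += d
    return [(k + 1, count[k], dist_sum[k] / count[k], sq_diff[k] / (2 * count[k]))
            for k in sorted(count)]


if __name__ == "__main__":
    # Hypothetical projected coordinates (km) and values, for illustration only.
    pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 12.0), (15.0, 14.0), (30.0, 5.0)]
    z = [150.0, 170.0, 160.0, 210.0, 230.0]
    for row in experimental_semivariogram(pts, z, lag_width=10.0, n_lags=4):
        print(row)
```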
Table 4. Theoretical semivariogram for OK method.
SSE | Spherical | Exponential | Gaussian
Rainfall | 0.0001093 | 0.0001135 | 8.86 × 10⁻⁵
Air Temperature | 6.96 × 10⁻⁹ | 8.56 × 10⁻⁹ | 8.93 × 10⁻⁹
Humidity | 1.35 × 10⁻⁷ | 8.59 × 10⁻⁸ | 6.28 × 10⁻⁸
Wind Speed | 1.92 × 10⁻⁸ | 5.35 × 10⁻¹⁰ | 4.47 × 10⁻¹⁰
Solar Irradiation | 6.76 × 10⁻¹⁰ | 8.82 × 10⁻¹¹ | 4.27 × 10⁻¹¹
Surface Pressure | 7.11 × 10⁻⁸ | 7.82 × 10⁻⁸ | 7.33 × 10⁻⁸
Table 5. Estimated parameter values of the SDM.
Coefficient | Parameter | $\hat{\beta}_0$ | $\hat{\theta}$
$X_1$ (air temperature) | $\hat{\beta}_{\text{latitude}}$ | 1.602 | −9.262
 | $\hat{\beta}_{\text{longitude}}$ | 2.963 |
$X_2$ (humidity) | $\hat{\beta}_{\text{latitude}}$ | 0.197 | −3.725
 | $\hat{\beta}_{\text{longitude}}$ | 0.427 |
$X_3$ (wind speed) | $\hat{\beta}_{\text{latitude}}$ | 0.838 | −9.497
 | $\hat{\beta}_{\text{longitude}}$ | 0.143 |
$X_4$ (solar irradiation) | $\hat{\beta}_{\text{latitude}}$ | 6.598 | −0.499
 | $\hat{\beta}_{\text{longitude}}$ | −0.416 |
$X_5$ (surface pressure) | $\hat{\beta}_{\text{latitude}}$ | −5.551 | 3.015
 | $\hat{\beta}_{\text{longitude}}$ | 10.844 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Falah, A.N.; Andriyana, Y.; Ruchjana, B.N.; Hermawan, E.; Harjana, T.; Maryadi, E.; Risyanto; Satyawardhana, H.; Sipayung, S.B. An Expanded Spatial Durbin Model with Ordinary Kriging of Unobserved Big Climate Data. Mathematics 2024, 12, 2447. https://doi.org/10.3390/math12162447

AMA Style

Falah AN, Andriyana Y, Ruchjana BN, Hermawan E, Harjana T, Maryadi E, Risyanto, Satyawardhana H, Sipayung SB. An Expanded Spatial Durbin Model with Ordinary Kriging of Unobserved Big Climate Data. Mathematics. 2024; 12(16):2447. https://doi.org/10.3390/math12162447

Chicago/Turabian Style

Falah, Annisa Nur, Yudhie Andriyana, Budi Nurani Ruchjana, Eddy Hermawan, Teguh Harjana, Edy Maryadi, Risyanto, Haries Satyawardhana, and Sinta Berliana Sipayung. 2024. "An Expanded Spatial Durbin Model with Ordinary Kriging of Unobserved Big Climate Data" Mathematics 12, no. 16: 2447. https://doi.org/10.3390/math12162447
