Article

Modeling the Level of Drinking Water Clarity in Surabaya City Drinking Water Regional Company Using Combined Estimation of Multivariable Fourier Series and Kernel

1
Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2
Department of Statistics, Halu Oleo University, Kendari 93132, Indonesia
*
Author to whom correspondence should be addressed.
Sustainability 2022, 14(20), 13663; https://doi.org/10.3390/su142013663
Submission received: 2 August 2022 / Revised: 31 August 2022 / Accepted: 3 September 2022 / Published: 21 October 2022

Abstract

The purpose of this study is to propose an appropriate model for predicting the chemical composition used during water purification at the Regional Water Company (PDAM) Surabaya, in order to meet drinking water standards. Drinking water treatment is very expensive, so the model serves as a basis for determining the composition of the chemicals used in the purification process at PDAM Surabaya. This study examines a model of the relationship between the level of clarity of drinking water and the composition of the chemicals used. The government can use the forecasting model to formulate policies for the company. One objective of developing the estimation method is to determine efficiently the chemical composition used in the water purification process, which will inform the financing and control of water quality. In the semiparametric regression, we used a multivariable linear approach for the parametric components, and a multivariable Fourier Series approach and a multivariable Kernel approach for the nonparametric components. Using the penalized least squares (PLS) approach, a mixed estimator of the Fourier Series and Kernel was obtained for the semiparametric regression. The smoothing parameters were selected using generalized cross-validation (GCV). The performance of this technique was evaluated using the Gaussian Kernel and a Fourier Series with trend on drinking water clarity data obtained from PDAM Surabaya. The findings showed that this technique performed well, so we recommend that the government conduct an in-depth analysis to determine the correct chemical composition so that the cost of water treatment can be minimized.

1. Introduction

In the adult human body, 70% of body weight is in the form of liquid. Drinking water is therefore a nutritional element as important as carbohydrates, proteins, fats, and vitamins. Consuming good and sufficient mineral water can help the digestive process, regulate metabolism, regulate food substances in the body, and maintain body balance, provided that the water is of adequate quality, assessed here in terms of clarity [1]. Obtaining drinking water that is fit for consumption involves costly processing, so careful planning is needed to minimize costs. One approach that can be used is a semiparametric regression model, as proposed in this paper.
Semiparametric regression is a regression analysis technique alongside parametric and nonparametric regression; it integrates both parametric and nonparametric features. The nonparametric features used in the semiparametric regression in this paper are the Fourier Series and Kernel, and the parametric feature is linear regression. These three estimators are combined in the estimate. Linear regression is the simplest and most efficient of the three. Kernel is useful for unpatterned data [2] and has a relatively faster convergence speed than polynomial, Fourier Series, or spline estimators [3]. Many nonparametric and semiparametric regression estimators have been developed. For data whose pattern does not depend on knot locations, nonparametric regression without mixed estimators has been used, such as smoothing spline [4], penalized spline [5], B-spline [6], and weighted partial spline [7]. For data whose pattern changes at certain sub-intervals, truncated splines are used [8,9].
Semiparametric regression estimators have been developed by several authors, including [10,11]. The authors of [12,13,14] used the Kernel technique, and the authors of [13,14] and [15,16] developed approaches involving Fourier Series.
Researchers designing nonparametric or semiparametric regression models usually assume that every predictor in the nonparametric component follows the same pattern. In practice, however, different predictors are likely to show different relationship patterns with the response. Mixed estimators for nonparametric and semiparametric regression have been developed to estimate the regression curve according to the data pattern. Studies on mixed estimators for nonparametric regression have previously been conducted on the mixed Kernel and Fourier Series estimator [17], on the mixed Fourier Series and truncated spline estimator [13,18,19], and on the mixed Kernel and smoothing spline estimator [20]. For mixed semiparametric regression estimators, ref. [16] developed an approach that combines truncated spline and Kernel approaches, while [12,14] constructed a model combining Kernel estimators and Fourier Series. The authors of [20] developed an approach that combines smoothing spline and Fourier Series. The smoothing parameter controls the trade-off between the goodness of fit and the penalty.
This paper proposes a semiparametric regression estimator that combines the multivariable Kernel estimator and the multivariable Fourier Series, for data in which some nonparametric components follow a repeating pattern and others are unpatterned. Semiparametric regression using a mixed estimator of Kernel and Fourier Series was studied in [13], but that work does not handle data in which the parametric, Kernel, and Fourier Series components are all multivariable. This paper presents the combined estimation of the multivariable Kernel estimator and the multivariable Fourier Series using PLS, whose criterion combines a goodness-of-fit term and a penalty term.
The optimal smoothing parameters are selected using the GCV method. A very small smoothing parameter produces a very rough Fourier Series estimate, whereas a very large smoothing parameter produces a very smooth estimate that fails to follow the data pattern. Similarly, an optimal bandwidth is required because a very small bandwidth results in a very rough Kernel estimate, and a very wide bandwidth produces a Kernel estimate that is overly smooth and does not match the data pattern. The GCV method has been developed extensively in previous nonparametric and semiparametric regression research. Among studies of GCV in nonparametric regression, the GCV technique was found to be superior to an unbiased risk approach [17]. The authors of [15] investigated semiparametric regression using the GCV method. The selection of smoothing parameters in this study builds on the optimal smoothing parameter selection for the combined multivariable Fourier Series and Kernel estimator in semiparametric regression [13].
The estimated model is intended to be used in modeling the level of clarity of drinking water at PDAM Surabaya. The parametric components are approximated by a multivariable linear function, the Kernel components by Gaussian Kernels, and the Fourier Series components by a Fourier Series with trend. The estimated model can be used to predict the optimal composition of chemicals while taking into account the threshold for drinking water. This information is expected to provide reference material for planning drinking water management. As each chemical is expensive and its price varies, it is hoped that, using this model, PDAM Surabaya can manage the costs of producing clean water efficiently.

2. Materials and Techniques

Given $n$ paired observations $(t_{1i}, \ldots, t_{pi}, x_{1i}, \ldots, x_{qi}, z_{1i}, \ldots, z_{ri}, y_i)$, $i = 1, 2, \ldots, n$, where $y_i$ is the response variable, the multivariable semiparametric regression model is:
$$y_i = \eta(t_{1i}, \ldots, t_{pi}, x_{1i}, \ldots, x_{qi}, z_{1i}, \ldots, z_{ri}) + \varepsilon_i \tag{1}$$
where $\varepsilon_i$ is a random error with $\varepsilon_i \sim \mathrm{IIDN}(0, \sigma^2)$. Assuming that the regression curve $\eta$ is additive, it may be expressed as:
$$y_i = \sum_{j=1}^{p} g_j(t_{ji}) + \sum_{k=1}^{q} m_k(x_{ki}) + \sum_{l=1}^{r} h_l(z_{li}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{2}$$
The part $\sum_{j=1}^{p} g_j(t_{ji})$ is a parametric component that can be approximated by a multivariable linear function; $\sum_{k=1}^{q} m_k(x_{ki})$ is a nonparametric component that can be approximated by a multivariable Kernel function; and $\sum_{l=1}^{r} h_l(z_{li})$ is a nonparametric component with $h_l$ assumed to be smooth and contained in the space of continuous functions $C(0, \pi)$, so that $h_l(z_{li})$ may be approximated by the Fourier Series [21] $H(z_l)$ with
$$H(z_i) = b z_i + \frac{1}{2} a_0 + \sum_{s=1}^{S} a_s \cos s z_i$$
where $b, a_0, a_s$, $s = 1, 2, \ldots, S$, are model parameters.
The estimator of $\eta$ is obtained from the following PLS optimization:
$$\min_{h_l \in C(0,\pi),\ \beta \in \mathbb{R}^{p+1},\ \gamma \in \mathbb{R}^{r(S+2)}} \left\{ n^{-1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j t_{ji} - \sum_{k=1}^{q} m_k(x_{ki}) - \sum_{l=1}^{r} \left( b_l z_{li} + \frac{1}{2} a_{0l} + \sum_{s=1}^{S} a_{sl} \cos s z_{li} \right) \right)^2 + \sum_{l=1}^{r} \lambda_l \int_0^{\pi} \frac{2}{\pi} \left( h_l''(z_l) \right)^2 dz_l \right\} \tag{3}$$
where $\lambda_l$ are the smoothing parameters.
The function that measures goodness of fit makes up the first component of Equation (3), and the function that measures the penalty makes up the second.

3. Results and Discussion

3.1. Mixed Kernel Model and Multivariable Fourier Series in Semiparametric Regression

Several lemmas are needed to derive the mixed Kernel and multivariable Fourier Series estimator for the semiparametric regression in Equation (2). Lemma 1 gives the matrix form of the parametric components, Lemma 2 that of the Kernel components, Lemma 3 that of the Fourier Series components, Lemma 4 the goodness-of-fit term, and Lemma 5 the penalty term of Equation (3).
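Lemma 1.
If the multivariable linear parametric component $\sum_{j=1}^{p} g_j(t_{ji})$ in Equation (2) is approximated by a multivariable linear function, then it can be written in matrix form as $X\beta$, where $X$ is a matrix of size $n \times (p+1)$ and $\beta$ is a vector of size $(p+1) \times 1$.
Proof of Lemma 1.
If the function $g_j(t_{ji})$ is approximated by a linear function, then $g_j(t_{ji}) = \beta_{0j} + \beta_{1j} t_{ji}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$.
For $i = 1, 2, \ldots, n$, the following is obtained:
$$\begin{bmatrix} g_j(t_{j1}) \\ \vdots \\ g_j(t_{jn}) \end{bmatrix} = \begin{bmatrix} \beta_{0j} + \beta_{1j} t_{j1} \\ \vdots \\ \beta_{0j} + \beta_{1j} t_{jn} \end{bmatrix}$$
so that:
$$\begin{bmatrix} \sum_{j=1}^{p} (\beta_{0j} + \beta_{1j} t_{j1}) \\ \vdots \\ \sum_{j=1}^{p} (\beta_{0j} + \beta_{1j} t_{jn}) \end{bmatrix} = \begin{bmatrix} \beta_{01} + \beta_{11} t_{11} + \beta_{02} + \beta_{12} t_{21} + \cdots + \beta_{0p} + \beta_{1p} t_{p1} \\ \vdots \\ \beta_{01} + \beta_{11} t_{1n} + \beta_{02} + \beta_{12} t_{2n} + \cdots + \beta_{0p} + \beta_{1p} t_{pn} \end{bmatrix}, \quad \beta_0^* = \beta_{01} + \cdots + \beta_{0p}$$
In matrix form, this can be written as:
$$\sum_{j=1}^{p} g_j(t_{ji}) = X\beta \tag{4}$$
with
$$X = \begin{bmatrix} 1 & t_{11} & \cdots & t_{p1} \\ \vdots & \vdots & & \vdots \\ 1 & t_{1n} & \cdots & t_{pn} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0^* & \beta_{11} & \cdots & \beta_{1p} \end{bmatrix}^T$$
where $\beta_0^* = \beta_{01} + \cdots + \beta_{0p}$ and $\beta$ contains the linear function parameters. □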
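The matrix form in Lemma 1 can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the paper; the sketch only assumes that $X$ has a leading column of ones followed by the $p$ parametric predictors.

```python
def design_matrix(t_columns):
    """Build X (n rows, p + 1 columns) with a leading column of ones.

    t_columns is a list of p predictor columns, each of length n.
    """
    n = len(t_columns[0])
    if any(len(col) != n for col in t_columns):
        raise ValueError("all predictor columns must have the same length")
    return [[1.0] + [float(col[i]) for col in t_columns] for i in range(n)]


def linear_component(X, beta):
    """Return the product X beta as a list of n fitted values."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Hypothetical toy data: n = 3 observations, p = 2 parametric predictors.
X = design_matrix([[1.0, 2.0, 3.0], [0.5, 0.0, -0.5]])
fitted = linear_component(X, [1.0, 2.0, 4.0])  # beta = (beta_0*, beta_11, beta_12)
```

Here the first entry of `beta` plays the role of the pooled intercept $\beta_0^*$, which is why a single column of ones is enough even though each $g_j$ has its own intercept.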
Lemma 2.
If the Kernel component $\sum_{k=1}^{q} m_k(x_{ki})$ in Equation (2) is approximated by a multivariable Kernel function, the Nadaraya–Watson estimator [2], then it can be written in matrix form as:
$$\sum_{k=1}^{q} m_k(x_{ki}) = \Omega y \tag{5}$$
where $\Omega$ is a matrix of size $n \times n$ and $y$ is a vector of size $n \times 1$.
Proof of Lemma 2.
With the Nadaraya–Watson estimator [2], each Kernel curve $m_k$ is estimated by
$$\hat{m}_{\varphi_k}(x_k) = n^{-1} \sum_{i=1}^{n} W_{\varphi_k i}(x_k)\, y_i$$
with:
$$W_{\varphi_k i}(x_k) = \frac{K_{\varphi_k}(x_k - x_{ki})}{n^{-1} \sum_{j=1}^{n} K_{\varphi_k}(x_k - x_{kj})},$$
$$K_{\varphi_k}(x_k - x_{ki}) = \frac{1}{\varphi_k} K\!\left( \frac{x_k - x_{ki}}{\varphi_k} \right)$$
where $K$ is a Kernel function and $\varphi_k$ is a bandwidth parameter.
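Evaluating the estimator at the observed points $x_{k1}, \ldots, x_{kn}$ gives:
$$\begin{aligned} \hat{m}_{\varphi_k}(x_{k1}) &= n^{-1} \sum_{i=1}^{n} W_{\varphi_k i}(x_{k1})\, y_i = n^{-1} \left[ W_{\varphi_k 1}(x_{k1}) y_1 + W_{\varphi_k 2}(x_{k1}) y_2 + \cdots + W_{\varphi_k n}(x_{k1}) y_n \right] \\ &\ \ \vdots \\ \hat{m}_{\varphi_k}(x_{kn}) &= n^{-1} \sum_{i=1}^{n} W_{\varphi_k i}(x_{kn})\, y_i = n^{-1} \left[ W_{\varphi_k 1}(x_{kn}) y_1 + W_{\varphi_k 2}(x_{kn}) y_2 + \cdots + W_{\varphi_k n}(x_{kn}) y_n \right] \end{aligned}$$
and, for $k = 1, 2, \ldots, q$:
$$\begin{bmatrix} \hat{m}_{\varphi_1}(x_{1i}) \\ \vdots \\ \hat{m}_{\varphi_q}(x_{qi}) \end{bmatrix} = \begin{bmatrix} n^{-1} \sum_{j=1}^{n} W_{\varphi_1 j}(x_{1i})\, y_j \\ \vdots \\ n^{-1} \sum_{j=1}^{n} W_{\varphi_q j}(x_{qi})\, y_j \end{bmatrix} = \begin{bmatrix} n^{-1} \left[ W_{\varphi_1 1}(x_{1i}) y_1 + \cdots + W_{\varphi_1 n}(x_{1i}) y_n \right] \\ \vdots \\ n^{-1} \left[ W_{\varphi_q 1}(x_{qi}) y_1 + \cdots + W_{\varphi_q n}(x_{qi}) y_n \right] \end{bmatrix}$$
Each Kernel component can therefore be written as:
$$\hat{m}_{\varphi_k}(x_k) = P_k(\varphi_k)\, y \tag{6}$$
where $y = (y_1, y_2, \ldots, y_n)^T$ and
$$P_k(\varphi_k) = n^{-1} \begin{bmatrix} W_{\varphi_k 1}(x_{k1}) & W_{\varphi_k 2}(x_{k1}) & \cdots & W_{\varphi_k n}(x_{k1}) \\ \vdots & \vdots & & \vdots \\ W_{\varphi_k 1}(x_{kn}) & W_{\varphi_k 2}(x_{kn}) & \cdots & W_{\varphi_k n}(x_{kn}) \end{bmatrix}$$
As a result of Equation (6), the following is obtained:
$$\sum_{k=1}^{q} m_k(x_{ki}) = \Omega y,$$
where $\Omega = \sum_{k=1}^{q} P_k(\varphi_k)$. □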
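The Nadaraya–Watson weights in Lemma 2 can be sketched for a single Kernel predictor as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names and toy data are hypothetical, a Gaussian Kernel is assumed, and only one predictor is handled (in the paper, $\Omega$ is built from the per-predictor matrices $P_k$).

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian Kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_weight_matrix(x, bandwidth):
    """Return the n x n matrix whose row i holds the weights n^{-1} W_j(x_i).

    Each row is normalised by the sum of Kernel values, so it sums to one.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    rows = []
    for xi in x:
        k = [gaussian_kernel((xi - xj) / bandwidth) / bandwidth for xj in x]
        total = sum(k)
        rows.append([kj / total for kj in k])
    return rows


def smooth(P, y):
    """Return P y, the Kernel fit at the observed points."""
    return [sum(p * yj for p, yj in zip(row, y)) for row in P]


# Hypothetical toy data (not from the paper).
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 2.0]
rough = smooth(nw_weight_matrix(x, 0.05), y)  # tiny bandwidth: fit follows each point
flat = smooth(nw_weight_matrix(x, 50.0), y)   # huge bandwidth: fit is close to the mean of y
```

The two bandwidths illustrate the trade-off described in the Introduction: a very small bandwidth reproduces the data almost exactly, while a very wide one flattens the fit toward the overall mean.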
Lemma 3.
If the Fourier Series component $\sum_{l=1}^{r} h_l(z_{li})$ in Equation (2) is approximated by a Fourier Series function $H_l(z_{li})$, assuming $h_l \in C(0, \pi)$, $l = 1, 2, \ldots, r$, then it can be written as:
$$\sum_{l=1}^{r} h_l(z_{li}) = D\gamma \tag{8}$$
where $D$ is a matrix of size $n \times r(S+2)$ and $\gamma$ is a vector of size $r(S+2) \times 1$.
Proof of Lemma 3.
If the Fourier Series component $\sum_{l=1}^{r} h_l(z_{li})$ is approximated by the Fourier Series function $H_l(z_{li})$, with $h_l \in C(0, \pi)$, $l = 1, 2, \ldots, r$, and $S$ the oscillation parameter [21], then the following is obtained:
$$H_l(z_{li}) = b_l z_{li} + \frac{1}{2} a_{0l} + \sum_{s=1}^{S} a_{sl} \cos s z_{li}, \quad i = 1, 2, \ldots, n, \quad l = 1, 2, \ldots, r$$
For $i = 1, 2, \ldots, n$, the following is obtained:
$$\begin{bmatrix} H_l(z_{l1}) \\ \vdots \\ H_l(z_{ln}) \end{bmatrix} = \begin{bmatrix} b_l z_{l1} + \frac{1}{2} a_{0l} + \sum_{s=1}^{S} a_{sl} \cos s z_{l1} \\ \vdots \\ b_l z_{ln} + \frac{1}{2} a_{0l} + \sum_{s=1}^{S} a_{sl} \cos s z_{ln} \end{bmatrix} \tag{10}$$
Summing Equation (10) over $l = 1, 2, \ldots, r$, the result can be presented in matrix form as follows:
$$\sum_{l=1}^{r} h_l(z_{li}) = D\gamma$$
where:
$$D = \begin{bmatrix} z_{11} & 1 & \cos z_{11} & \cdots & \cos S z_{11} & \cdots & z_{r1} & 1 & \cos z_{r1} & \cdots & \cos S z_{r1} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ z_{1n} & 1 & \cos z_{1n} & \cdots & \cos S z_{1n} & \cdots & z_{rn} & 1 & \cos z_{rn} & \cdots & \cos S z_{rn} \end{bmatrix}$$
and
$$\gamma = \begin{bmatrix} b_1 & \frac{1}{2} a_{01} & a_{11} & \cdots & a_{S1} & \cdots & b_r & \frac{1}{2} a_{0r} & a_{1r} & \cdots & a_{Sr} \end{bmatrix}^T.$$
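The layout of $D$ and $\gamma$ in Lemma 3 can be made concrete with a short sketch. The helper names and the toy values are hypothetical; the sketch assumes the column order $(z, 1, \cos z, \ldots, \cos Sz)$ per Fourier Series predictor, matching $\gamma = (b_l, \tfrac{1}{2}a_{0l}, a_{1l}, \ldots, a_{Sl})$.

```python
import math


def fourier_row(z_values, S):
    """One row of D: [z, 1, cos z, ..., cos S z] repeated for each predictor."""
    row = []
    for z in z_values:
        row.append(z)
        row.append(1.0)
        row.extend(math.cos(s * z) for s in range(1, S + 1))
    return row


def fourier_design(z_columns, S):
    """Build D with n rows and r(S + 2) columns from r predictor columns."""
    n = len(z_columns[0])
    return [fourier_row([col[i] for col in z_columns], S) for i in range(n)]


def fourier_component(D, gamma):
    """Return D gamma, the Fourier Series part of the fit."""
    return [sum(d * g for d, g in zip(row, gamma)) for row in D]


# Hypothetical toy data: one Fourier predictor (r = 1), S = 2 cosine terms.
z = [0.5, 1.0, 2.0]
D = fourier_design([z], S=2)
trend_only = fourier_component(D, [1.0, 0.0, 0.0, 0.0])  # b = 1, all other terms zero
```

With only the trend coefficient $b$ switched on, the fitted component reduces to $z$ itself, which is a quick way to check the column ordering.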
Lemma 4.
If the semiparametric regression model is as in Equation (2), where the linear regression curve is given in Lemma 1, the Kernel curve in Lemma 2, and the Fourier Series curve in Lemma 3, then the goodness of fit in Equation (3) is as follows:
$$\left( (I - \Omega) y - X\beta - D\gamma \right)^T \left( (I - \Omega) y - X\beta - D\gamma \right)$$
Proof of Lemma 4.
If the parametric component is approximated by a linear function, so that $\sum_{j=1}^{p} g_j(t_{ji}) = X\beta$ (Lemma 1), the Kernel component by the Nadaraya–Watson Kernel, so that $\sum_{k=1}^{q} m_k(x_{ki}) = \Omega y$ (Lemma 2), and the Fourier Series component by the Bilodeau Fourier Series, so that $\sum_{l=1}^{r} h_l(z_{li}) = D\gamma$ (Lemma 3), then the model in Equation (2) can be written in matrix form as $y = X\beta + \Omega y + D\gamma + \varepsilon$. Hence $\varepsilon = (I - \Omega) y - X\beta - D\gamma$, and the goodness of fit in Equation (3) is $\varepsilon^T \varepsilon$, as follows:
$$\left( (I - \Omega) y - X\beta - D\gamma \right)^T \left( (I - \Omega) y - X\beta - D\gamma \right)$$
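Lemma 5.
If the penalty component is given as $\sum_{l=1}^{r} \lambda_l \int_0^{\pi} \frac{2}{\pi} \left( h_l''(z_l) \right)^2 dz_l$ from Equation (3), then:
$$\sum_{l=1}^{r} \lambda_l \int_0^{\pi} \frac{2}{\pi} \left( h_l''(z_l) \right)^2 dz_l = \gamma^T P \gamma \tag{12}$$
where $\lambda_l$ are smoothing parameters.
Proof of Lemma 5.
Let the penalty component be $\sum_{l=1}^{r} \lambda_l \int_0^{\pi} \frac{2}{\pi} \left( h_l''(z_l) \right)^2 dz_l$, with $h_l(z_{li})$ assumed to be smooth and contained in the space of continuous functions $C(0, \pi)$, so that $h_l(z_{li})$ may be approximated by $b_l z_{li} + \frac{1}{2} a_{0l} + \sum_{s=1}^{S} a_{sl} \cos s z_{li}$. Its second derivative is $h_l''(z_l) = -\sum_{s=1}^{S} s^2 a_{sl} \cos s z_l$. As a result, for a single component (dropping the index $l$),
$$\int_0^{\pi} \frac{2}{\pi} \left( h''(z) \right)^2 dz = \sum_{s=1}^{S} s^4 a_s^2 = \gamma^T P \gamma$$
with $\gamma = \left( b, \tfrac{1}{2} a_0, a_1, \ldots, a_S \right)^T$ and $P = \mathrm{diag}\left( 0, 0, 1^4, 2^4, \ldots, S^4 \right)$. In Equation (12), $P$ is then block diagonal, with the $l$-th block equal to $\lambda_l\, \mathrm{diag}\left( 0, 0, 1^4, \ldots, S^4 \right)$. □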
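The closed form in Lemma 5 can be checked numerically. The sketch below is illustrative only: the function names and the coefficient values are hypothetical, and the integral is approximated with a midpoint rule. It relies on the fact that the cosine terms are orthogonal on $(0, \pi)$, each with $\int_0^{\pi} \cos^2 sz \, dz = \pi/2$.

```python
import math


def penalty_closed_form(a):
    """Sum of s^4 * a_s^2 for s = 1..S, where a[s - 1] holds a_s."""
    return sum((s ** 4) * (a_s ** 2) for s, a_s in enumerate(a, start=1))


def penalty_numeric(a, steps=20000):
    """Midpoint-rule value of the integral of (2/pi) * h''(z)^2 over (0, pi)."""
    width = math.pi / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * width
        # Second derivative of the Fourier Series with trend: the b*z and a_0 terms vanish.
        second = -sum((s ** 2) * a_s * math.cos(s * z) for s, a_s in enumerate(a, start=1))
        total += second * second
    return (2.0 / math.pi) * total * width


# Hypothetical cosine coefficients a_1, a_2, a_3 (not estimates from the paper).
coefficients = [0.5, -0.2, 0.1]
closed = penalty_closed_form(coefficients)
numeric = penalty_numeric(coefficients)
```

The trend coefficient $b$ and the constant $a_0$ do not appear because their second derivatives are zero, which is why the first two diagonal entries of $P$ are zero.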
Theorem 1 provides a detailed explanation of the mixed model of Kernel and multivariable Fourier Series, where the linear curve is provided in Equation (4), the Kernel curve in Equation (5), the Fourier Series curve in Equation (8), and the penalty in Equation (12).
Theorem 1.
Let the semiparametric regression model be as in Equation (1), with the linear regression curve given in Equation (4), the Kernel curve in Equation (5), the Fourier Series curve in Equation (8), and the penalty in Equation (12). Minimizing the PLS in Equation (3) gives the mixed multivariable estimator:
$$\hat{\eta}_{(\beta, \varphi, \lambda, s)}(t, x, z) = M^* y$$
where $y$ is a vector of size $n \times 1$ and $M^*$ is a matrix of size $n \times n$.
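Proof of Theorem 1.
Based on Lemma 4 and Lemma 5, the optimization in Equation (3) can be expressed as:
$$\min_{\gamma, \beta} \zeta(\gamma, \beta) = \min_{\gamma, \beta} \left[ n^{-1} \left[ (I - \Omega) y - X\beta - D\gamma \right]^T \left[ (I - \Omega) y - X\beta - D\gamma \right] + \gamma^T P \gamma \right] \tag{13}$$
where $\gamma^T P \gamma$ is the penalty component.
In Equation (13), let $A = (I - \Omega) y - X\beta$. To obtain the estimator of $\gamma$, $\zeta(\gamma, \beta)$ is differentiated partially with respect to $\gamma$ and the result is set equal to zero (here the factor $n$ on the penalty is taken to be absorbed into the smoothing parameters in $P$):
$$\frac{\partial \zeta(\gamma, \beta)}{\partial \gamma} = n^{-1} \frac{\partial}{\partial \gamma} \left[ (A^T - \gamma^T D^T)(A - D\gamma) + \gamma^T P \gamma \right] = 0 \;\Rightarrow\; \left[ 2 D^T D + 2 P \right] \gamma - 2 D^T A = 0 \;\Rightarrow\; \hat{\gamma} = \left[ D^T D + P \right]^{-1} D^T A$$
so that
$$\hat{\gamma} = B \left( (I - \Omega) y - X \hat{\beta} \right), \quad \text{where } B = \left[ D^T D + P \right]^{-1} D^T \tag{14}$$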
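Next, to obtain the estimator $\hat{\beta}$, $\zeta(\gamma, \beta)$ is differentiated partially with respect to $\beta$ and the result is set equal to zero, as follows:
$$\frac{\partial \zeta(\gamma, \beta)}{\partial \beta} = n^{-1} \frac{\partial}{\partial \beta} \left[ \left( y^T (I - \Omega)^T - \beta^T X^T - \gamma^T D^T \right) \left( (I - \Omega) y - X\beta - D\gamma \right) + \gamma^T P \gamma \right] = n^{-1} \left[ 2 X^T X \hat{\beta} - 2 X^T (I - \Omega) y + 2 X^T D \gamma \right] = 0$$
$$\hat{\beta} = (X^T X)^{-1} X^T \left[ (I - \Omega) y - D \hat{\gamma} \right] \tag{15}$$
Substituting Equation (14) into Equation (15), we obtain:
$$\hat{\beta} = T y, \quad \text{where } T = \left( I - (X^T X)^{-1} X^T D B X \right)^{-1} (X^T X)^{-1} X^T (I - D B)(I - \Omega) \tag{16}$$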
Next, $\hat{\gamma}$ is obtained by substituting Equation (16) into Equation (14):
$$\hat{\gamma} = B \left( (I - \Omega) y - X \hat{\beta} \right) = B (I - \Omega - X T) y$$
where $\hat{\beta} = T y$.
The above equation can be written in matrix form as follows:
$$\hat{\gamma} = C y, \quad \text{where } C = B (I - \Omega - X T)$$
Here $y$ is a vector of size $n \times 1$ and $C$ is a matrix of size $r(S+2) \times n$.
Substituting $\hat{\beta}$ from Equation (16), we obtain the estimator for the parametric component:
$$\hat{g}_{(\beta, \varphi, \lambda, s)}(t, x, z) = X \hat{\beta} = X T y \tag{17}$$
where $X T$ is a matrix of size $n \times n$ and $T = \left( I - (X^T X)^{-1} X^T D B X \right)^{-1} (X^T X)^{-1} X^T (I - D B)(I - \Omega)$, as in Equation (16).
After the parametric estimator is obtained, the next step is to obtain the multivariable Kernel component estimator, as follows:
$$\hat{m}_{(\beta, \varphi, \lambda, s)}(t, x, z) = \sum_{k=1}^{q} \hat{m}_{(\beta, \varphi, \lambda, s) k}(t, x, z) = \Omega y \tag{18}$$
where $\Omega = \sum_{k=1}^{q} P_k(\varphi_k) = P_1(\varphi_1) + P_2(\varphi_2) + \cdots + P_q(\varphi_q)$ is a matrix of size $n \times n$.
The estimator for the multivariable Fourier Series components is:
$$\sum_{l=1}^{r} \hat{h}_{(\beta, \varphi, \lambda, s) l}(t, x, z) = D \hat{\gamma} = D C y, \quad \text{with } \hat{\gamma} = C y \tag{19}$$
Because $\hat{g}_{(\beta, \varphi, \lambda, s)}(t, x, z)$ is given in Equation (17), $\hat{m}_{(\beta, \varphi, \lambda, s)}(t, x, z)$ in Equation (18), and $\hat{h}_{(\beta, \varphi, \lambda, s)}(t, x, z)$ in Equation (19), the mixed estimator of the Kernel and the multivariable Fourier Series in the semiparametric regression, where $x = (x_1, x_2, \ldots, x_q)^T$, $t = (t_1, t_2, \ldots, t_p)^T$, and $z = (z_1, z_2, \ldots, z_r)^T$, is $\hat{\eta}_{(\beta, \varphi, \lambda, s)}(t, x, z) = (X T + \Omega + D C) y$.
Furthermore, the above equation can be presented as follows:
$$\hat{\eta}_{(\beta, \varphi, \lambda, s)}(t, x, z) = M^* y$$
where $M^* = X T + \Omega + D C$, $y$ is a vector of size $n \times 1$, and $M^*$ is a matrix of size $n \times n$. □
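The substitution step between Equations (14), (15), and (16) is compressed in the proof. One way to spell it out, using only the symbols already defined and assuming that the matrix $I - (X^T X)^{-1} X^T D B X$ is invertible, is the following worked derivation:

```latex
\begin{aligned}
\hat{\beta}
  &= (X^T X)^{-1} X^T \left[ (I - \Omega) y - D \hat{\gamma} \right] \\
  &= (X^T X)^{-1} X^T \left[ (I - \Omega) y - D B \left( (I - \Omega) y - X \hat{\beta} \right) \right] \\
  &= (X^T X)^{-1} X^T (I - D B)(I - \Omega) y + (X^T X)^{-1} X^T D B X \hat{\beta}, \\[4pt]
\left[ I - (X^T X)^{-1} X^T D B X \right] \hat{\beta}
  &= (X^T X)^{-1} X^T (I - D B)(I - \Omega) y, \\[4pt]
\hat{\beta}
  &= \left[ I - (X^T X)^{-1} X^T D B X \right]^{-1} (X^T X)^{-1} X^T (I - D B)(I - \Omega) y = T y, \\[4pt]
\hat{\gamma}
  &= B \left( (I - \Omega) y - X T y \right) = B (I - \Omega - X T) y = C y, \\[4pt]
\hat{\eta}
  &= X \hat{\beta} + \Omega y + D \hat{\gamma} = (X T + \Omega + D C) y .
\end{aligned}
```

Collecting the $\hat{\beta}$ terms on the left-hand side is what produces the minus sign inside the inverse in $T$.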

3.2. Smoothing Parameter Selection

Semiparametric regression using the mixed estimator of Kernel and multivariable Fourier Series depends strongly on the selection of the optimal smoothing, bandwidth, and oscillation parameters. Following [22,23], the smoothing parameters in semiparametric regression with the combined multivariable Fourier Series and Kernel estimator are selected using GCV:
$$GCV(\beta, \varphi, \lambda, s) = \frac{n^{-1} \left\lVert (I - M^*)\, y \right\rVert^2}{\left( n^{-1}\, \mathrm{trace}(I - M^*) \right)^2}, \quad \text{with } M^* = X T + \Omega + D C \tag{20}$$
The smallest value of $GCV(\beta, \varphi, \lambda, s)$ gives the optimal smoothing parameters, oscillation parameters, and bandwidths.
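The GCV criterion in Equation (20) can be sketched for any $n \times n$ smoother matrix. The function names, the candidate matrices, and the toy response below are hypothetical; in the paper the candidates would be the matrices $M^*$ obtained for different bandwidths, smoothing parameters, and oscillation parameters.

```python
def gcv(M, y):
    """GCV = n^{-1} ||(I - M) y||^2 / (n^{-1} trace(I - M))^2 for an n x n matrix M."""
    n = len(y)
    residuals = [y[i] - sum(M[i][j] * y[j] for j in range(n)) for i in range(n)]
    mse = sum(r * r for r in residuals) / n
    trace_term = sum(1.0 - M[i][i] for i in range(n)) / n
    if trace_term == 0:
        raise ZeroDivisionError("trace(I - M) is zero, so GCV is undefined")
    return mse / trace_term ** 2


def select_by_gcv(candidates, y):
    """Return the (label, M) pair with the smallest GCV score."""
    return min(candidates, key=lambda item: gcv(item[1], y))


# Hypothetical toy response and two hypothetical candidate smoothers.
y = [1.0, 2.0, 3.0]
n = len(y)
mean_smoother = [[1.0 / n] * n for _ in range(n)]  # every fitted value is the mean of y
zero_smoother = [[0.0] * n for _ in range(n)]      # every fitted value is zero
best_label, _ = select_by_gcv([("mean", mean_smoother), ("zero", zero_smoother)], y)
```

A smoother that interpolates the data exactly has $\mathrm{trace}(I - M) = 0$, so the criterion is undefined there; the explicit check makes that case visible instead of returning a misleading score.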

4. Modeling Data

In this section, the combined multivariable Kernel and Fourier Series model in semiparametric regression is applied to the drinking water clarity level (TKAM) data at PDAM Surabaya. Drinking water is very important for human life and must be used wisely with future generations in mind [24]. Because of water pollution, purification processes are necessary; these are very costly [25] and require careful planning, including an assessment of the composition of the chemical substances needed to obtain drinking water that meets the required standards.
An initial study of the TKAM data at PDAM Surabaya showed that the data patterns between each predictor variable and the response variable differed: some followed a Fourier Series pattern, others a Kernel pattern, and some a linear pattern. These data were then applied to the model in Equation (19) using R software with library(pracma), library(MASS), library(lmtest) and library(gtools). The response variable $y$ was the level of clarity of drinking water. The predictor variables thought to affect the level of water clarity were aluminum sulfate ($x_1$), liquid chlorine ($x_2$), cupric sulfate ($x_3$), chlorine ($x_4$), Dukem 108A ($x_5$), and the turbidity of the water after deposition ($x_6$), where $x_2$, $x_3$, and $x_6$ are parametric components, $x_1$ and $x_5$ are Kernel components, and $x_4$ is a Fourier Series component. The smallest value of the GCV criterion [23,26] from Equation (20) was 0.00209. The following estimation model was obtained:
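$$\begin{aligned} \hat{y}_i ={}& 0.28893 + 17.8774\, x_{2i} - 0.33519\, x_{3i} - 0.02863\, x_{6i} \\ &+ \frac{\frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{0.47351 - x_{1i}}{0.33068} \right)^2 \right]}{\sum_{i=1}^{30} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{0.47351 - x_{1i}}{0.33068} \right)^2 \right]} + \frac{\frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{0.50129 - x_{5i}}{0.00177} \right)^2 \right]}{\sum_{i=1}^{30} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{0.50129 - x_{5i}}{0.00177} \right)^2 \right]} \\ &- 0.52630\, x_{4i} - 0.57788 \cos x_{4i} - 4.51577 \times 10^{-9} \cos 2 x_{4i} - 2.00623 \cos 3 x_{4i} \end{aligned}$$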
An overview of the real data and the estimated results is presented in Figure 1.
The semiparametric regression model combining the multivariable Kernel and the multivariable Fourier Series has $R^2 = 88.2\%$ when estimating the level of clarity of drinking water at PDAM Surabaya. That is, the predictor variables explain 88.2% of the variance in the response variable. This shows that the model is suitable for modeling the TKAM data from PDAM Surabaya [19].
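The coefficient of determination reported above is computed from the sum of squared errors (SSE) and the total sum of squares (SST) listed in the Notations. A minimal sketch follows, with a hypothetical function name and toy numbers that are not the PDAM Surabaya data:

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("R^2 is undefined when all responses are equal")
    return 1.0 - sse / sst


# Hypothetical observed and fitted values.
observed = [1.0, 2.0, 3.0, 4.0]
fitted_values = [1.5, 2.0, 3.0, 3.5]
r2 = r_squared(observed, fitted_values)
```

For these toy values SSE is 0.5 and SST is 5, so the fitted values account for nine tenths of the variation in the response.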

5. Conclusions

This paper presents an estimation technique for semiparametric regression using PLS. We combined the multivariable linear parametric estimator, the multivariable Kernel, and the multivariable Fourier Series to estimate the regression curve for data whose pattern is partly parametric, partly Kernel, and partly Fourier Series, each multivariable. The model was selected based on the smallest GCV. We considered the outcomes using various types of Kernels, while examining the features of the Fourier Series and Kernel estimators in semiparametric regression. The model obtained was more adequate than that of [19], which had a coefficient of determination $R^2$ of 84%. Using the mixed estimator of Kernel and multivariable Fourier Series, the coefficient of determination $R^2$ was found to be 88.4%.

Author Contributions

Conceptualization, A.T.A.; data curation, A.T.A. and I.Z.; formal analysis A.T.A. and I.N.B.; funding acquisition, I.N.B.; investigation, I.N.B. and I.Z.; methodology, A.T.A. and I.N.B.; software, A.T.A. and I.Z.; supervision and validation, I.N.B.; visualization, A.T.A. and I.Z.; writing—original draft, A.T.A. and I.N.B.; writing—review and editing, A.T.A., I.N.B. and I.Z. All authors have read and agreed to the published version of the manuscript.

Funding

In accordance with the project plan of the Publication Writing and IPR Incentive Program (PPHKI), the authors gratefully acknowledge financial support from the Institut Teknologi Sepuluh Nopember for this project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the study's conclusions are available in the dissertation of Pane, R., Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia, 2019.

Conflicts of Interest

The authors declare no conflict of interest.

Notations

| Symbol | Meaning |
| --- | --- |
| $y_i$ | Response variable for the $i$-th subject |
| $x_{pi}$ | The $p$-th predictor variable on the parametric component $x$ for the $i$-th subject |
| $t_{ki}$ | The $k$-th predictor variable on the nonparametric component $t$ for the $i$-th subject |
| $z_{ri}$ | The $r$-th predictor variable on the nonparametric component $z$ for the $i$-th subject |
| $a_0, a_S, b$ | Fourier Series parameters |
| $\beta$ | Parametric component parameter vector (regression coefficient vector) |
| $m(x)$ | Parametric function for parametric components |
| $g(t)$ | Kernel functions for nonparametric components |
| $h(z)$ | Fourier Series functions for nonparametric components |
| $\varepsilon$ | Random error vector with $\varepsilon \sim N(0, I\sigma^2)$ |
| $\sigma^2$ | Error variance |
| $\varphi$ | Bandwidth |
| $a$ | Vector containing the parameters of the Fourier Series, of size $(S+2) \times 1$ |
| $T$ | Matrix containing the coefficients of the multivariable Fourier Series, of size $n \times (S+2)$ |
| $\lambda$ | Smoothing parameter |
| $S$ | Oscillation parameter of a Fourier Series function |
| $SSE$ | Sum of squared errors |
| $SST$ | Total sum of squares |
| $A^{-1}$ | Inverse of the matrix $A$ |
| $K(t)$ | Kernel function |
| $\mathrm{tr}(A)$ | Trace of the matrix $A$ |
| $D$ | Matrix $D^*$ |
| $E[\cdot]$ | Expectation |
| $\mathrm{GCV}(\cdot)$ | Generalized cross-validation |
| $I$ | Identity matrix |
| $R^2$ | Coefficient of determination |
| $D$ | Matrix containing the coefficients of the multivariable Fourier Series, of size $n \times r(S+2)$ |
| $\gamma$ | Multivariable Fourier Series parameter vector of size $(S+2) \times 1$ |
| $A$ | Goodness-of-fit component function of the mixed Kernel and Fourier Series semiparametric regression |
| $B$ | Penalty component function of the mixed Kernel and Fourier Series semiparametric regression |
| $L$ | The integral of a function $L$ |
| $M$ | The integral of a function $M$ |
| $P$ | Matrix containing Kernel weights, of size $n \times n$ |
| $T$ | Matrix containing multivariable Fourier Series coefficients, of size $n \times p(k+2)$ |
| $\Omega$ | Matrix containing the matrices $P$ |
| $X$ | Parametric component predictor variable matrix, of size $n \times (r+1)$ |
| $Z$ | Matrix containing univariable Fourier Series coefficients, of size $n \times (k+2)$ |
| $\lambda_j$ | $j$-th smoothing parameter |
| $\hat{\beta}$ | Vector containing parameter estimates of parametric components |
| $\hat{\mu}$ | Parameter vector containing parameter estimates of parametric and nonparametric components |
| $\mathbb{R}$ | Real numbers |
| $\lVert \cdot \rVert$ | Norm (vector length) |

References

1. The Regulation of the Minister of Health of the Republic of Indonesia no. 416/Menkes/per/IX/1990. 1990. Available online: https://baristandsamarinda.kemenperin.go.id/download/PerMenKes416(1990)-Syarat&Pengawasan_Kualitas_Air.pdf (accessed on 17 May 2022).
2. Okumura, H.; Naito, K. Non-parametric Kernel Regression for Multinomial Data. J. Multivar. Anal. 2006, 97, 2009–2022.
3. Nadaraya, E.A.; Watson, G.S. On Estimating Regression. Theory Probab. Its Appl. 1964, 9, 141–142.
4. Eubank, R.L. Spline Smoothing and Nonparametric Regression; Marcel Dekker: New York, NY, USA, 1988.
5. Wood, S.N. On Confidence Intervals for Generalized Additive Models Based on Penalized Regression Spline. Aust. N. Z. J. Stat. 2006, 48, 445–464.
6. Iqbal, A.; Abd Hamid, N.N.; Ismail, A.I.M.; Abbas, M. Galerkin approximation with quintic B-spline as basis and weight functions for solving second order coupled nonlinear Schrödinger equations. Math. Comput. Simul. 2021, 187, 1–16.
7. Kim, T.W.; Kvasov, B. A shape-preserving approximation by weighted cubic splines. J. Comput. Appl. Math. 2012, 236, 4383–4397.
8. Ruppert, D. Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. 2002, 11, 735–757.
9. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
10. Nisa, K.; Budiantara, I.N.; Tuty, A. Multivariable Semiparametric Regression Model with Combined Estimator of Fourier Series and Kernel. IOP Conf. Ser. Earth Environ. Sci. 2017, 58, 012028.
11. Bhattacharya, P.K.; Zhao, P.L. Semiparametric Inference in a Partial Linear Model. Ann. Stat. 1997, 1, 244–262.
12. Smith, M.; Kohan, R.; Mathur, S.K. Bayesian Semiparametric Regression: An Exposition and Application to Print Advertising Data. J. Bus. Res. 2000, 49, 229–244.
13. Cheng, M.Y.; Paige, R.L.; Sun, S.; Yan, K. Variance Reduction for Kernel Estimations in Clustered/Longitudinal Data Analysis. J. Stat. Plan. Inference 2010, 140, 1389–1397.
14. Manzana, S.; Zerom, D. Kernel Estimation of a Partially Linear Additive Model. Stat. Probab. Lett. 2005, 72, 313–322.
15. Amato, U.; Antoniadis, A.; De Feis, I. Fourier Series Approximation of Separable Models. J. Comput. Appl. Math. 2002, 146, 459–479.
16. Morton, J.; Silverberg, L. Fourier Series of Half-Range Functions by Smooth Extension. Appl. Math. Model. 2009, 33, 812–821.
17. Mardianto, M.F.F.; Kartiko, S.H.; Utami, H. Forecasting Trend-Seasonal Data Using Nonparametric Regression with Kernel and Fourier Series Approach. In Proceedings of the Third International Conference on Computing, Mathematics and Statistics (ICMS2017); Springer: Singapore, 2019; pp. 343–349.
18. Nisa, K. Semiparametrik Regression Model with Combined Estimator of Truncated Spline and Fourier Series (Case Study: Life Expectancy of East Java Province). Master’s Thesis, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia, 2017. Available online: https://core.ac.uk/download/pdf/291461498.pdf (accessed on 27 June 2022).
19. Pane, R.; Budiantara, I.N.; Zain, I.; Otok, B.W. Parametric and Nonparametric Estimators in Fourier Series Semiparametric Regression and Their Characteristics. Appl. Math. Sci. 2014, 8, 5053–5064.
20. Hardle, W. Applied Nonparametric Regression; Humboldt Universität zu Berlin: Berlin, Germany, 1994.
21. Bilodeau, M. Fourier Smoother and Additive Models. Can. J. Stat. 1992, 3, 257–269.
22. Kayri, M.; Zirhoglu, G. Kernel Smoothing Function and Choosing Bandwidth for Nonparametric Regression Techniques. Ozean J. Appl. Sci. 2009, 2, 49–60.
23. Lin, Y.; Zhang, H.H. Component Selection and Smoothing in Multivariate Nonparametric Regression. Ann. Stat. 2006, 34, 2272–2297.
24. Hefni, E. Water Quality Review: For Management of Aquatic Resources and Environment; Kanisius: Yogyakarta, Indonesia, 2003.
25. Sutrisno, T. Clean Water Supply Technology; PT Rineka Cipta: Jakarta, Indonesia, 2004.
26. Aydin, D.; Memmedli, M.; Omay, R.E. Smoothing Parameter Selection for Nonparametric Regression Using Smoothing Spline. Eur. J. Pure Appl. Math. 2013, 6, 222–238.
Figure 1. Comparison between real data and TKAM estimation results at PDAM Surabaya.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ampa, A.T.; Budiantara, I.N.; Zain, I. Modeling the Level of Drinking Water Clarity in Surabaya City Drinking Water Regional Company Using Combined Estimation of Multivariable Fourier Series and Kernel. Sustainability 2022, 14, 13663. https://doi.org/10.3390/su142013663
