Article

The Application of Mixed Smoothing Spline and Fourier Series Model in Nonparametric Regression

by Ni Putu Ayu Mirah Mariati 1,2,*, I. Nyoman Budiantara 1,* and Vita Ratnasari 1
1 Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2 Universitas Mahasaraswati Denpasar, Bali 80233, Indonesia
* Authors to whom correspondence should be addressed.
Symmetry 2021, 13(11), 2094; https://doi.org/10.3390/sym13112094
Submission received: 1 October 2021 / Revised: 20 October 2021 / Accepted: 25 October 2021 / Published: 4 November 2021

Abstract

In daily life, mixed data patterns are often found, namely, those that change at certain sub-intervals or that follow a repeating pattern along a certain trend. To handle this kind of data, a mixed estimator of a Smoothing Spline and a Fourier Series has been developed. This paper describes a simulation study of the estimator in nonparametric regression and its application to the case of poor households. The minimum Generalized Cross Validation (GCV) was used to select the best model. The simulation study used predictor data generated from a Uniform distribution and random errors generated from a symmetrical Normal distribution. The results of the simulation study show that the larger the sample size n, the better the mixed estimator performs as a model of nonparametric regression for all error variances, and that the smaller the variance, the better the model for all sample sizes n. Very poor households are characterized by a predominance of carbohydrate consumption over fat and protein consumption. The results of this study suggest that the assistance distributed to poor households should not be the same for all households, because in certain groups there are poor households that consume more carbohydrates, and some households may consume more fats.

1. Introduction

The Smoothing Spline has been used in nonparametric regression analysis [1]. It can provide better results because it accommodates data patterns that show sharp increases and decreases while still producing a relatively smooth curve [2]. The advantages of the Smoothing Spline are that it has unique statistical properties, enables visual interpretation, handles smooth data and functions, and readily handles data that change at certain sub-intervals [3,4,5,6]. In addition to the Spline estimator, the Fourier Series estimator is a popular estimation technique in nonparametric regression. Among nonparametric regression models, the Fourier Series provides good statistical and visual interpretation, and its advantage is that it can handle data that follow a repeating pattern at a certain trend interval [7].
Alongside the development of research on nonparametric regression, mixed estimators in nonparametric regression have recently been developed. A mixed estimator of the Fourier Series and the Truncated Spline was developed by Sudiarsa et al. [8]. A mixture of the Local Polynomial and the Truncated Spline was developed by Suparti et al. [9]. In daily life, mixed data patterns often appear; in particular, data patterns can fluctuate at certain sub-intervals or follow repeating patterns along a certain trend. Thus, a mixed estimator of the Smoothing Spline and the Fourier Series is applied to handle this kind of data. In the current study, a simulation study was carried out to test the performance of the mixed Smoothing Spline and Fourier Series estimator as a model of nonparametric regression. In this paper, we discuss how the sample size and the error variance affect the suitability of this model. After the simulation study was completed, the estimator was applied to real data on poor households in Bali Province.
The province of Bali consists of nine regencies/cities. Among these nine regencies/cities, Karangasem Regency has the highest poverty rate [10]. This is reflected in its high percentage of poor people, 6.28%. One of the most frequently used indicators of the level of poverty in a community is the level of household spending. Therefore, a comprehensive approach is needed to alleviate poverty through the identification of the factors that influence the expenditure of poor households in Karangasem Regency. The poor household data used in this study were household expenditure, wage/salary, and carbohydrate, fat, and protein consumption per capita per month. According to the World Bank, poverty can be reflected in household expenditure and income [11]. Household expenditure is influenced by income and consumption factors [12,13,14]. In addition to the income factor, the consumption factor is also very influential on the expenditure of poor households [15]. Among these factors, the consumption of fat per capita per month is expected to be suitable for analysis using the Fourier Series. This is because the higher the wage/salary, the higher the expenditure; however, at a certain interval, as the wage/salary decreases, the expenditure also decreases. In contrast, the other factors are examined using the Smoothing Spline. Thus, the problem of poor households in Karangasem Regency was examined with a mixed Smoothing Spline and Fourier Series estimator in nonparametric regression.

2. Mixed Smoothing Spline and Fourier Series Estimator in Nonparametric Multivariable

The mixed Smoothing Spline and Fourier Series estimator in multivariable nonparametric regression combines two methods with different characteristics: the Smoothing Spline suits data that change at sub-intervals, whereas the Fourier Series suits data that have repeating patterns. The estimator is described as follows. Given paired data $(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi}, y_i)$, $i = 1, 2, \ldots, n$, where $y_i$ is the response variable and $n$ is the number of observations, the relationship between the response variable $y_i$ and the predictor variables $(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi})$ follows the multivariable nonparametric regression model:
$$y_i = \mu(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi}) + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $\mu$ is the regression curve and $\varepsilon_i$ is the random error, which is assumed to be independent and identically distributed with a symmetrical Normal distribution having zero mean and variance $\sigma^2$. The shape of the regression curve $\mu$ is assumed to be unknown and additive, so the regression model is:
$$\mu(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi}) = \sum_{j=1}^{p} g_j(x_{ji}) + \sum_{k=1}^{q} h_k(t_{ki}),$$
where $\sum_{j=1}^{p} g_j(x_{ji})$ is the Smoothing Spline component and $\sum_{k=1}^{q} h_k(t_{ki})$ is the Fourier Series component. Each component regression curve $g_j(x_{ji})$, $j = 1, 2, \ldots, p$, is assumed to be smooth in the sense that it is contained in the Sobolev space, namely:
$$g_j \in W_2^m[a_j, b_j], \quad j = 1, 2, \ldots, p.$$
The function $\tilde{g}$ has unknown shape and is assumed to be smooth in the sense that it lies in the Sobolev space $W$. The Sobolev space $W$ can be decomposed into a direct sum of two mutually perpendicular spaces $W_0$ and $W_1$, namely $W = W_0 \oplus W_1$ with $W_0 \perp W_1$. Next, each component regression curve $h_k(t_{ki})$, $k = 1, 2, \ldots, q$, is approximated using the Fourier Series function:
$$h_k(t_{ki}) = b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K} \alpha_{uk} \cos u t_{ki}, \quad k = 1, 2, \ldots, q.$$
The mixed Smoothing Spline and Fourier Series estimator in nonparametric regression can be obtained by solving the following Penalized Least Squares (PLS) optimization [16]:
$$\min_{g_j \in W_2^m[a_j, b_j]} \left\{ n^{-1} \sum_{i=1}^{n} \left( y_i - \left[ \sum_{j=1}^{p} g_j(x_{ji}) + \sum_{k=1}^{q} \left( b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K} \alpha_{uk} \cos u t_{ki} \right) \right] \right)^2 + \sum_{j=1}^{p} \lambda_j \int_{a_j}^{b_j} \left[ g_j^{(m)}(x_j) \right]^2 dx_j \right\}, \quad 0 < \lambda_j < \infty. \tag{1}$$
Equation (1) becomes:
$$Q(\tilde{c}, \tilde{d}) = n^{-1} (\tilde{Z} - \tilde{g})'(\tilde{Z} - \tilde{g}) + \sum_{j=1}^{p} \lambda_j \int_{a_j}^{b_j} \left( g_j^{(m)}(x_j) \right)^2 dx_j. \tag{2}$$
The explanation of the transformation from Equation (1) to Equation (2) is described in Appendix A.
Thus, Equation (2) becomes:
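$$Q(\tilde{c}, \tilde{d}) = n^{-1} \left( \tilde{Z}'\tilde{Z} - \tilde{Z}'U\tilde{c} - \tilde{Z}'V_{\tilde{\tau}}\tilde{d} - \tilde{c}'U'\tilde{Z} + \tilde{c}'U'U\tilde{c} + \tilde{c}'U'V_{\tilde{\tau}}\tilde{d} - \tilde{d}'V_{\tilde{\tau}}\tilde{Z} + \tilde{d}'V_{\tilde{\tau}}U\tilde{c} + \tilde{d}'V_{\tilde{\tau}}V_{\tilde{\tau}}\tilde{d} \right) + \lambda\, \tilde{d}'V_{\tilde{\tau}}\tilde{d}. \tag{3}$$
Differentiating Equation (3) with respect to $\tilde{d}$, multiplying by $n$, and setting the result to zero gives:
$$n\,\frac{\partial Q(\tilde{c}, \tilde{d})}{\partial \tilde{d}} = -2V_{\tilde{\tau}}\tilde{Z} + 2V_{\tilde{\tau}}U\tilde{c} + 2V_{\tilde{\tau}}V_{\tilde{\tau}}\hat{\tilde{d}} + 2n\lambda V_{\tilde{\tau}}\hat{\tilde{d}} = 0, \qquad -\tilde{Z} + U\tilde{c} + \left[ V_{\tilde{\tau}} + n\lambda I \right]\hat{\tilde{d}} = 0. \tag{4}$$
Letting $M = V_{\tilde{\tau}} + n\lambda I$, Equation (4) becomes:
$$\hat{\tilde{d}} = M^{-1}(\tilde{Z} - U\tilde{c}). \tag{5}$$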
Next,
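$$n\,\frac{\partial Q(\tilde{c}, \tilde{d})}{\partial \tilde{c}} = -2U'\tilde{Z} + 2U'U\tilde{c} + 2U'V_{\tilde{\tau}}\tilde{d} = 0. \tag{6}$$
Substituting Equation (5) into Equation (6) gives:
$$-2U'\tilde{Z} + 2U'U\hat{\tilde{c}} + 2U'V_{\tilde{\tau}}M^{-1}(\tilde{Z} - U\hat{\tilde{c}}) = 0. \tag{7}$$
Since $M = V_{\tilde{\tau}} + n\lambda I$, we have $V_{\tilde{\tau}} = M - n\lambda I$. As a result, the following equation is obtained:
$$V_{\tilde{\tau}}M^{-1} = (M - n\lambda I)M^{-1} = I - n\lambda M^{-1}. \tag{8}$$
By substituting Equation (8) into Equation (7) and then solving, we obtain:
$$\hat{\tilde{c}} = (U'M^{-1}U)^{-1}U'M^{-1}\tilde{Z}. \tag{9}$$
Substituting Equation (9) into Equation (5) then gives:
$$\hat{\tilde{d}} = M^{-1}\left( I - U(U'M^{-1}U)^{-1}U'M^{-1} \right)\tilde{Z}. \tag{10}$$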
Thus,
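$$\hat{\tilde{g}} = U\hat{\tilde{c}} + V_{\tilde{\tau}}\hat{\tilde{d}} = \left[ U(U'M^{-1}U)^{-1}U'M^{-1} + V_{\tilde{\tau}}M^{-1}\left( I - U(U'M^{-1}U)^{-1}U'M^{-1} \right) \right]\tilde{Z},$$
$$\hat{\tilde{g}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = D(\lambda, K)\tilde{y} - D(\lambda, K)\tilde{h}, \tag{11}$$
where $D(\lambda, K)$ denotes the matrix in square brackets and $\tilde{Z} = \tilde{y} - \tilde{h}$.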
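The linear algebra behind Equations (8) to (11) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `spline_smoother` and the toy matrices `U` and `V` are illustrative assumptions and are not taken from the paper (in particular, `V` here is a random positive semi-definite matrix rather than a spline kernel matrix).

```python
import numpy as np

def spline_smoother(U, V, lam):
    """Return (c_map, d_map, D) with c_hat = c_map @ Z, d_hat = d_map @ Z
    and g_hat = D @ Z, following the form of Equations (9)-(11)."""
    n = U.shape[0]
    I = np.eye(n)
    M = V + n * lam * I                        # M = V + n*lambda*I
    M_inv = np.linalg.inv(M)
    UtMi = U.T @ M_inv                         # U' M^{-1}
    c_map = np.linalg.solve(UtMi @ U, UtMi)    # (U'M^{-1}U)^{-1} U'M^{-1}
    d_map = M_inv @ (I - U @ c_map)            # M^{-1}(I - U(U'M^{-1}U)^{-1}U'M^{-1})
    D = U @ c_map + V @ d_map                  # bracketed matrix of Equation (11)
    return c_map, d_map, D

# Toy inputs (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 8
U = np.column_stack([np.ones(n), rng.uniform(size=n)])
G = rng.normal(size=(n, n))
V = G @ G.T                                    # symmetric positive semi-definite
lam = 0.1
c_map, d_map, D = spline_smoother(U, V, lam)
```

Two properties are useful as sanity checks on such an implementation: the smoother reproduces the unpenalized part (`D @ U` equals `U`), and the identity of Equation (8) holds for any `lam > 0`.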
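Next, we substitute Equation (11) into the equation $\tilde{y} = \tilde{g} + \tilde{h} + \tilde{\varepsilon}$ with $\tilde{h} = t\tilde{\alpha}$, thus obtaining (writing $D$ for $D(\lambda, K)$):
$$\tilde{\varepsilon}'\tilde{\varepsilon} = \tilde{y}'\tilde{y} - 2\tilde{y}'D\tilde{y} + 2\tilde{\alpha}'t'D\tilde{y} - 2\tilde{\alpha}'t'\tilde{y} - 2\tilde{\alpha}'t'D'D\tilde{y} + 2\tilde{\alpha}'t'D'\tilde{y} - 2\tilde{\alpha}'t'Dt\tilde{\alpha} + \tilde{y}'D'D\tilde{y} + \tilde{\alpha}'t'D'Dt\tilde{\alpha} + \tilde{\alpha}'t't\tilde{\alpha}. \tag{12}$$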
The steps to obtain Equation (12) from Equation (11) can be seen in Appendix B.
The next step is to obtain the derivative of Equation (12):
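$$\frac{\partial(\tilde{\varepsilon}'\tilde{\varepsilon})}{\partial\tilde{\alpha}} = 2t'D\tilde{y} - 2t'\tilde{y} - 2t'D'D\tilde{y} + 2t'D'\tilde{y} - 4t'Dt\hat{\tilde{\alpha}} + 2t'D'Dt\hat{\tilde{\alpha}} + 2t't\hat{\tilde{\alpha}} = 0, \tag{13}$$
where $D$ abbreviates $D(\lambda, K)$ and the symmetry of $D$ has been used. Collecting terms, and noting that $I - D - D' + D'D = (I - D)'(I - D)$, gives:
$$\hat{\tilde{\alpha}} = \left[ t'(I - D)'(I - D)\,t \right]^{-1} t'(I - D)'(I - D)\,\tilde{y}. \tag{14}$$
The equation $\hat{\tilde{h}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = t\hat{\tilde{\alpha}}$ becomes:
$$\hat{\tilde{h}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = t\left[ t'(I - D)'(I - D)\,t \right]^{-1} t'(I - D)'(I - D)\,\tilde{y} = B(\tilde{\lambda}, \tilde{K})\tilde{y}. \tag{15}$$
After obtaining $\hat{\tilde{h}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = B(\tilde{\lambda}, \tilde{K})\tilde{y}$, substituting it into Equation (11) gives:
$$\hat{\tilde{g}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = D\left( I - t\left[ t'(I - D)'(I - D)\,t \right]^{-1} t'(I - D)'(I - D) \right)\tilde{y} = A(\tilde{\lambda}, \tilde{K})\tilde{y}. \tag{16}$$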
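Based on Equations (15) and (16), the mixed Smoothing Spline and Fourier Series regression model, denoted by $\hat{\tilde{\mu}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t})$, is as follows:
$$\hat{\tilde{\mu}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = \hat{\tilde{g}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) + \hat{\tilde{h}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t}) = C(\tilde{\lambda}, \tilde{K})\tilde{y}. \tag{17}$$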
In which:
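$$C(\tilde{\lambda}, \tilde{K})\tilde{y} = \left( A(\tilde{\lambda}, \tilde{K}) + B(\tilde{\lambda}, \tilde{K}) \right)\tilde{y},$$
$$A(\tilde{\lambda}, \tilde{K})\tilde{y} = D\left( I - t\left[ t'(I - D)'(I - D)\,t \right]^{-1} t'(I - D)'(I - D) \right)\tilde{y},$$
$$B(\tilde{\lambda}, \tilde{K})\tilde{y} = t\left[ t'(I - D)'(I - D)\,t \right]^{-1} t'(I - D)'(I - D)\,\tilde{y},$$
$$D = D(\tilde{\lambda}, \tilde{K}) = U(U'M^{-1}U)^{-1}U'M^{-1} + V_{\tilde{\tau}}M^{-1}\left( I - U(U'M^{-1}U)^{-1}U'M^{-1} \right),$$
$$M = V_{\tilde{\tau}} + n\lambda I; \quad \lambda_j = \frac{\lambda}{\tau_j},\ j = 1, 2, \ldots, p; \quad V_{\tilde{\tau}} = \tau_1 V_1 + \tau_2 V_2 + \cdots + \tau_p V_p,$$
$$t = \begin{bmatrix}
t_{11} & 1/2 & \cos t_{11} & \cos 2t_{11} & \cdots & \cos K t_{11} & \cdots & t_{q1} & 1/2 & \cos t_{q1} & \cos 2t_{q1} & \cdots & \cos K t_{q1} \\
t_{12} & 1/2 & \cos t_{12} & \cos 2t_{12} & \cdots & \cos K t_{12} & \cdots & t_{q2} & 1/2 & \cos t_{q2} & \cos 2t_{q2} & \cdots & \cos K t_{q2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots \\
t_{1n} & 1/2 & \cos t_{1n} & \cos 2t_{1n} & \cdots & \cos K t_{1n} & \cdots & t_{qn} & 1/2 & \cos t_{qn} & \cos 2t_{qn} & \cdots & \cos K t_{qn}
\end{bmatrix},$$
  • $\tilde{\lambda}$: smoothing parameters,
  • $\tilde{K}$: oscillation parameters,
  • $U$: matrix of size $n \times mp$,
  • $V_{\tilde{\tau}}$: variance-covariance matrix $\tau_1 V_1 + \tau_2 V_2 + \cdots + \tau_p V_p$ of size $n \times n$.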
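The construction of the matrix $t$ and of the hat matrices $A$, $B$, and $C$ can be sketched as follows. This is a minimal illustration assuming NumPy; the function names `fourier_design` and `mixed_hat_matrices` and the toy averaging smoother `D` are hypothetical and not from the paper. One deliberate substitution: a pseudo-inverse replaces the plain inverse in $B$, because the constant column $1/2$ of $t$ is annihilated by $I - D$ whenever $D$ reproduces constants, which makes $t'(I - D)'(I - D)t$ singular.

```python
import numpy as np

def fourier_design(T, K):
    """Build the n x q(K+2) matrix t from an (n, q) array of predictors.

    For each predictor the columns are: t, 1/2, cos(t), cos(2t), ..., cos(Kt).
    """
    T = np.asarray(T, dtype=float)
    n, q = T.shape
    blocks = []
    for k in range(q):
        tk = T[:, k]
        cols = [tk, np.full(n, 0.5)] + [np.cos(u * tk) for u in range(1, K + 1)]
        blocks.append(np.column_stack(cols))
    return np.hstack(blocks)

def mixed_hat_matrices(D, t):
    """Return (A, B, C) with h_hat = B @ y, g_hat = A @ y, mu_hat = C @ y."""
    n = D.shape[0]
    I = np.eye(n)
    G = (I - D).T @ (I - D)
    # Pseudo-inverse (an assumption of this sketch) guards against singularity.
    B = t @ np.linalg.pinv(t.T @ G @ t, rcond=1e-10) @ t.T @ G
    A = D @ (I - B)
    return A, B, A + B

# Toy example (assumed values): one Fourier-type predictor, K = 2.
rng = np.random.default_rng(1)
n, K = 10, 2
T = rng.uniform(size=(n, 1))
t = fourier_design(T, K)
D = np.full((n, n), 1.0 / n)   # toy smoother: plain averaging
A, B, C = mixed_hat_matrices(D, t)
```

With these definitions the combined hat matrix reproduces every column of `t`, so a response that lies exactly in the span of the Fourier design is fitted without error.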
The estimator of the mixture is highly dependent on the smoothing parameter and the oscillation parameter. These parameters are selected using the Generalized Cross Validation method. The selection of the optimal smoothing and oscillation parameters in the mixed Smoothing Spline and Fourier Series estimator requires the following lemma:
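Lemma 1.
If the mixed Smoothing Spline and Fourier Series estimator is given by Equation (17), then the mean square error (MSE) of the mixed Smoothing Spline and Fourier Series estimator is given by:
$$MSE(\tilde{\lambda}, \tilde{K}) = n^{-1} \left\| \left( I - C(\tilde{\lambda}, \tilde{K}) \right)\tilde{y} \right\|^2$$
Proof. 
If $\hat{\tilde{\mu}}_{(\tilde{\lambda}, \tilde{K})}(\tilde{x}, \tilde{t})$ is the mixed Smoothing Spline and Fourier Series estimator, then the MSE of this model is given by:
$$\begin{aligned} MSE(\tilde{\lambda}, \tilde{K}) &= n^{-1}(\tilde{y} - \hat{\tilde{y}})'(\tilde{y} - \hat{\tilde{y}}) \\ &= n^{-1}\left( \tilde{y} - C(\tilde{\lambda}, \tilde{K})\tilde{y} \right)'\left( \tilde{y} - C(\tilde{\lambda}, \tilde{K})\tilde{y} \right) \\ &= n^{-1}\left( (I - C(\tilde{\lambda}, \tilde{K}))\tilde{y} \right)'\left( (I - C(\tilde{\lambda}, \tilde{K}))\tilde{y} \right) \\ &= n^{-1}\left\| \left( I - C(\tilde{\lambda}, \tilde{K}) \right)\tilde{y} \right\|^2. \end{aligned}$$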
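Lemma 2.
If the mixed Smoothing Spline and Fourier Series estimator is given by Equation (17) and $MSE(\tilde{\lambda}, \tilde{K})$ is as in Lemma 1, then the Generalized Cross Validation (GCV) function is given by:
$$GCV(\tilde{\lambda}, \tilde{K}) = \frac{n^{-1}\left\| \left( I - C(\tilde{\lambda}, \tilde{K}) \right)\tilde{y} \right\|^2}{\left[ n^{-1}\,\mathrm{trace}\left( I - C(\tilde{\lambda}, \tilde{K}) \right) \right]^2}$$
Proof. 
The GCV function can be written as:
$$GCV(\tilde{\lambda}, \tilde{K}) = \frac{MSE(\tilde{\lambda}, \tilde{K})}{\left[ n^{-1}\,\mathrm{trace}\left( I - C(\tilde{\lambda}, \tilde{K}) \right) \right]^2}$$
Based on Lemma 1, we obtain:
$$GCV(\tilde{\lambda}, \tilde{K}) = \frac{MSE(\tilde{\lambda}, \tilde{K})}{\left[ n^{-1}\,\mathrm{trace}\left( I - C(\tilde{\lambda}, \tilde{K}) \right) \right]^2} = \frac{n^{-1}\left\| \left( I - C(\tilde{\lambda}, \tilde{K}) \right)\tilde{y} \right\|^2}{\left[ n^{-1}\,\mathrm{trace}\left( I - C(\tilde{\lambda}, \tilde{K}) \right) \right]^2}.$$
The optimal smoothing parameter ($\lambda$) and the optimal oscillation parameter ($K$) are obtained by minimizing the function $GCV(\tilde{\lambda}, \tilde{K})$, i.e.,
$$GCV(\tilde{\lambda}_{optimal}, K_{optimal}) = \min_{\tilde{\lambda}, \tilde{K}} \frac{n^{-1}\left\| \left( I - C(\tilde{\lambda}, \tilde{K}) \right)\tilde{y} \right\|^2}{\left[ n^{-1}\,\mathrm{trace}\left( I - C(\tilde{\lambda}, \tilde{K}) \right) \right]^2}.$$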
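Parameter selection by minimum GCV can be sketched in a few lines. This is an illustrative sketch assuming NumPy and the standard GCV form with a squared denominator; the helper names `mse`, `gcv`, and `select_by_gcv`, and the two toy hat matrices, are assumptions made for the example and are not from the paper.

```python
import numpy as np

def mse(C, y):
    """MSE = n^{-1} ||(I - C) y||^2 for a hat matrix C."""
    r = y - C @ y
    return float(r @ r) / len(y)

def gcv(C, y):
    """GCV = MSE / [n^{-1} trace(I - C)]^2."""
    n = len(y)
    return mse(C, y) / (np.trace(np.eye(n) - C) / n) ** 2

def select_by_gcv(candidates, y):
    """candidates maps a parameter pair (lam, K) to its hat matrix C.

    Returns the pair with the smallest GCV and the dictionary of all scores.
    """
    scores = {key: gcv(C, y) for key, C in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy candidates (assumed): a "fit nothing" matrix and an averaging smoother.
y = np.array([1.0, 2.0, 3.0, 4.0])
candidates = {
    (1.0, 1): np.zeros((4, 4)),
    (0.1, 1): np.full((4, 4), 0.25),
}
best, scores = select_by_gcv(candidates, y)
```

In practice the dictionary would hold one hat matrix $C(\tilde{\lambda}, \tilde{K})$ per grid point of the smoothing and oscillation parameters.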

3. Simulation Study for Mixed Smoothing Spline and Fourier Series

A simulation study for the mixed Smoothing Spline and Fourier Series estimator in multivariable nonparametric regression was conducted. The simulation was carried out to describe the ability of the Smoothing Spline estimator and the Fourier Series in estimating the multivariable nonparametric regression curve. The steps taken are as follows.
(1)
Make a nonparametric regression model with two predictor variables: $y_i = g_1(x_{1i}) + h_1(t_{1i}) + \varepsilon_i$, with $i = 1, 2, \ldots, n$.
(2)
Take the variation for sample size n = 25, n = 50, n = 100, n = 200.
(3)
Generate x and t from the Uniform distribution (0,1).
(4)
Generate the random error from the symmetrical Normal distribution $N(0, \sigma^2)$ with $\sigma^2 = 0.25$, $\sigma^2 = 0.5$, and $\sigma^2 = 1$. These values of $\sigma^2$ are chosen to examine the behavior of the mixed estimator in response to an increase in the error variance.
(5)
Define the regression curve for the Smoothing Spline component of the following function: Polynomial Function: g 1 ( x 1 i ) = 2 x 1 i 4 ( x 1 i 0.45 ) .
(6)
Define the regression curve for the Fourier Series component as the following trigonometric function: $h_1(t_{1i}) = 0.1\,t_{1i} + \frac{1}{2} + \cos\left( \frac{2\pi t_{1i}}{n} \right)$.
(7)
Design a nonparametric regression equation using a mixture of the Smoothing Spline and Fourier Series functions: y i = 2 x 1 i 4 ( x 1 i 0.45 ) + 0.1 t 1 i + 1 2 + cos ( 2 π t 1 i n ) + ε i ,
(8)
Model the data using a mixture of the Smoothing Spline and Fourier Series estimator.
(9)
Determine the value of the λ with K = 1, 2, 3.
(10)
Determine the value of the minimum GCV.
(11)
Determine the values of MSE and $R^2$ based on the minimum GCV value obtained in step (10).
(12)
Repeat one hundred times for each scenario.
(13)
Perform variations of $n$ and $\sigma^2$ for each experimental function group.
(14)
Compare the results in each scenario using the GCV, MSE, and $R^2$ criteria.
(15)
Form a conclusion about the ability of the mixed Smoothing Spline and Fourier Series estimator to estimate the multivariable nonparametric regression curve.
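The steps above can be sketched end to end for a single replication. This is a rough illustration under several stated assumptions, not the authors' implementation: `g1` is a hypothetical stand-in polynomial, `h1` is an assumed reading of the trigonometric function in step (6), the spline matrices use the standard reproducing kernel of a cubic smoothing spline (m = 2) on [0, 1], the grid of $\lambda$ values is arbitrary, and a pseudo-inverse is substituted for the inverse in the Fourier part to handle the non-identifiable constant.

```python
import numpy as np

def g1(x):
    # Hypothetical stand-in for the polynomial of step (5).
    return 2.0 * x**4 * (x - 0.45)

def h1(t, n):
    # Assumed reading of step (6): trend + constant + one cosine term.
    return 0.1 * t + 0.5 + np.cos(2.0 * np.pi * t / n)

def spline_matrices(x):
    """U and V for a cubic smoothing spline (m = 2) on [0, 1]."""
    k1 = x - 0.5
    k2 = (k1**2 - 1.0 / 12.0) / 2.0
    w = np.abs(x[:, None] - x[None, :]) - 0.5
    k4 = (w**4 - w**2 / 2.0 + 7.0 / 240.0) / 24.0
    U = np.column_stack([np.ones_like(x), k1])
    V = np.outer(k2, k2) - k4
    return U, V

def hat_matrix(x, t, lam, K):
    """Combined hat matrix C = A + B for one spline and one Fourier predictor."""
    n = len(x)
    I = np.eye(n)
    U, V = spline_matrices(x)
    M_inv = np.linalg.inv(V + n * lam * I)
    P = U @ np.linalg.solve(U.T @ M_inv @ U, U.T @ M_inv)
    D = P + V @ M_inv @ (I - P)
    F = np.column_stack([t, np.full(n, 0.5)] + [np.cos(u * t) for u in range(1, K + 1)])
    G = (I - D).T @ (I - D)
    B = F @ np.linalg.pinv(F.T @ G @ F, rcond=1e-10) @ F.T @ G
    return D @ (I - B) + B

def simulate_once(n, sigma2, rng, lams=(0.001, 0.01, 0.1), Ks=(1, 2, 3)):
    """Generate one data set and return the minimum-GCV fit summary."""
    x = rng.uniform(0.0, 1.0, n)
    t = rng.uniform(0.0, 1.0, n)
    y = g1(x) + h1(t, n) + rng.normal(0.0, np.sqrt(sigma2), n)
    best = None
    for lam in lams:
        for K in Ks:
            C = hat_matrix(x, t, lam, K)
            resid = y - C @ y
            mse = float(resid @ resid) / n
            gcv = mse / (np.trace(np.eye(n) - C) / n) ** 2
            if best is None or gcv < best["gcv"]:
                r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
                best = {"lam": lam, "K": K, "gcv": gcv, "mse": mse, "r2": r2}
    return best

rng = np.random.default_rng(0)
result = simulate_once(n=50, sigma2=0.25, rng=rng)
```

Calling `simulate_once` one hundred times for each combination of `n` and `sigma2`, and averaging the returned GCV, MSE, and $R^2$ values, would mirror the repetition and comparison steps of the procedure.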
Step (10) refers to selecting the optimal smoothing parameter ($\lambda$) and oscillation parameter ($K$) using the Generalized Cross Validation (GCV) method.
Using the mixed Smoothing Spline and Fourier Series estimator in Equation (1), this simulation varies two conditions: the sample size, n = 25, 50, 100, and 200, and the error variance, $\sigma^2$ = 0.25, 0.5, and 1. The sample size n was varied to show whether increasing the sample size improves the model estimate. The sample sizes n = 25, 50, 100, and 200 are considered sufficient to represent small and large sample sizes. Similarly, the error variance was varied to determine whether a smaller variance results in a better model estimate. The error variances $\sigma^2$ = 0.25, 0.5, and 1 were deemed sufficient to represent small and large variances. The combinations for the simulation model can be seen in Table 1 below.
The GCV value is then computed for every simulation scenario in Table 1. The regression equation designed for this simulation study is as follows:
$$y_i = g_1(x_{1i}) + h_1(t_{1i}) + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$
Furthermore, the error $\varepsilon_i$ is generated from the distribution $N(0, 1)$, with $x_i \sim U(0, 1)$ and $t_i \sim U(0, 1)$. The scatterplot of the simulation data is shown in Figure 1.
Figure 1 shows the scatterplot for a sample size of n = 100 and an error variance of $\sigma^2 = 0.25$. It indicates that the pattern of the relationship between y and x tends to change in certain sub-intervals. The changes can be seen at several points, namely 0.6, 0.7, and 0.8: from 0 to 0.6 the data pattern tends to decrease, from 0.6 to 0.7 it tends to increase, from 0.7 to 0.8 it increases, and from 0.8 to 1 it decreases again. This data pattern is in accordance with the characteristics of the Smoothing Spline. The plot of the relationship between y and t shows no clear pattern, or a repeating pattern that follows an upward trend line. Based on this description, this pattern is characteristic of the Fourier Series.
Based on Figure 1, it can be said that, in the use of the two predictors, there are two different data patterns. The pattern of the relationship between y and x tends to change at certain sub-intervals, whereas the relationship between y and t follows a repeating pattern on a trend. Thus, there is a different pattern between the predictors, so that the data was modeled using a mixed model of the Smoothing Spline and Fourier Series. The best model was selected based on the optimal value of the smoothing parameter and the oscillation parameter. The optimal values of the smoothing parameter and the oscillation parameter were selected based on the minimum GCV value. Table 2 presents the selection of the smoothing parameters and optimal oscillation parameters based on the minimum GCV value.
Table 2 shows that, among the various combinations of smoothing parameters and oscillation parameters used for the Smoothing Spline and Fourier Series mixed model, the optimal smoothing parameter value obtained was 0.042 and the optimal oscillation parameter was 1. These values give the minimum GCV value of 0.0000184. For this model, $R^2$ = 86.439% and the MSE was 0.257. The obtained values of $R^2$ and MSE demonstrate the suitable performance of the Smoothing Spline and Fourier Series mixed model. Figure 2 presents a scatter plot of the value of $y$ against the value of $\hat{y}$.
In Figure 2, the estimated data plot is very close to the original data, so this model can be used to make accurate predictions.
Furthermore, with the same procedure, one hundred repetitions were carried out for scenario 4. The minimum GCV, R2, and MSE values for one hundred repetitions are summarized in Table 3.
To make it easier to compare the effect of the error variance, Table 4 presents a comparison of the $R^2$ values for one hundred replications with n = 100.
Whether the different error variance values lead to a significant difference was tested by comparing the group averages under the following hypotheses:
Null hypothesis (H0): 
There is no significant difference between the averages of
$R^2(\sigma^2 = 0.25)$, $R^2(\sigma^2 = 0.5)$, $R^2(\sigma^2 = 1)$
Alternative hypothesis (H1): 
There is a significant difference between the averages of
$R^2(\sigma^2 = 0.25)$, $R^2(\sigma^2 = 0.5)$, $R^2(\sigma^2 = 1)$
The output presented in Appendix C shows that the p-value is below 0.05 (reported as 0.000). Therefore, it was concluded that H0 should be rejected, meaning that there is a difference on average between $R^2(\sigma^2 = 0.25)$, $R^2(\sigma^2 = 0.5)$, and $R^2(\sigma^2 = 1)$.
Furthermore, the Tukey test was used to compare the averages of all pairs of groups. The output presented in Appendix B shows that all the error variance values are in different groups, which means that, on average, there is a difference between the values of $R^2(\sigma^2 = 0.25)$, $R^2(\sigma^2 = 0.5)$, and $R^2(\sigma^2 = 1)$. Based on this output, it can also be seen that $R^2(\sigma^2 = 0.25)$ gives better results than the other options, followed by $R^2(\sigma^2 = 0.5)$ and $R^2(\sigma^2 = 1)$. Thus, for n = 100, the smaller the value of $\sigma^2$, the better the results obtained. The same steps were carried out for all 12 scenarios, and the whole experiment is summarized in Table 5.
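The comparison of group averages can be illustrated with the F statistic of a one-way analysis of variance. The sketch below uses only the Python standard library; the function name `one_way_anova_f` and the $R^2$ values are made up for illustration and are not results from this study. Obtaining a p-value or the Tukey grouping would additionally require an F distribution or studentized range routine from a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up R^2 values for three error variances (illustration only).
r2_by_variance = [
    [0.93, 0.94, 0.92, 0.95],   # sigma^2 = 0.25
    [0.88, 0.87, 0.89, 0.86],   # sigma^2 = 0.5
    [0.79, 0.81, 0.78, 0.80],   # sigma^2 = 1
]
f_stat, df_b, df_w = one_way_anova_f(r2_by_variance)
```

A large F statistic relative to the F distribution with `(df_b, df_w)` degrees of freedom corresponds to rejecting H0.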
To make it easier to compare the simulation results, scatter diagrams are presented in Figure 3 and Figure 4 for the GCV and $R^2$ values of the Polynomial-Trigonometric experimental function for each sample size n.
Table 5 shows that the obtained results improve as the sample size increases. This is indicated by the resulting GCV value: the larger the sample size, the smaller the GCV value. The GCV value of the Smoothing Spline and Fourier Series mixed model for the sample size n = 200 is smaller than that for the sample size n = 100. Similarly, for the sample size n = 100, the resulting GCV value was smaller than that for the sample size n = 50, etc. In addition to the GCV criterion, we also used the MSE and $R^2$ criteria to determine the suitability of the model. The simulation results show that a sample size of n = 200 gives better results, based on the highest $R^2$ value and the lowest MSE, for all given error variance values.
Table 5 also shows that the smaller the error variance, the better the model estimate obtained for all sample sizes. This is indicated by the fact that the smaller the error variance, the smaller the GCV value. As shown in Table 5, at a sample size of n = 25, the error variance $\sigma^2 = 1$ gives a GCV value of 0.01741598, which decreases at $\sigma^2 = 0.5$ to 0.00378041 and decreases again at $\sigma^2 = 0.25$ to 0.00185812. This can also be seen for the sample size n = 50, where the error variance $\sigma^2 = 1$ gives a GCV value of 0.00046407, which decreases at $\sigma^2 = 0.5$ to 0.00019175 and decreases again at $\sigma^2 = 0.25$ to 0.00012196. Similar results were seen for n = 100 and n = 200; for the sample size n = 100, for variance values of $\sigma^2 = 1$, $\sigma^2 = 0.5$, and $\sigma^2 = 0.25$, the GCV results were 0.00006150, 0.00002668, and 0.00001073, respectively, whereas for the sample size n = 200, the GCV results were respectively 0.00000629, 0.00000243, and 0.0000040. Furthermore, to determine the power of the mixed Smoothing Spline and Fourier Series, a comparison was made between the nonparametric regression of the Smoothing Spline, the Fourier Series, and the mixed Smoothing Spline and Fourier Series. In this simulation, the sample size was n = 200 and the error variance was $\sigma^2 = 0.25$, and 100 repetitions were undertaken. The selection of the best model was determined by the minimum GCV value. A comparison for n = 200 and error variance $\sigma^2 = 0.25$ is presented in Table 6.
The results show that the mixed Smoothing Spline and Fourier Series in nonparametric regression performed better than the individual Smoothing Spline or Fourier Series. This can be seen based on the minimum GCV value in Table 6. After performing the simulation, real-life applications were carried out regarding the use of the mixed method of the Smoothing Spline and Fourier Series in multivariable nonparametric regression.

4. Application on Poor Household Data in Karangasem, Bali

Linearity tests and scatter plots can be used for modeling the pattern of relationships between response variables and predictor variables, and for examining the behavior of data patterns. This is explained in the following. A non-linearity test was undertaken based on the Terasvirta Neural Network Test to identify data patterns using a significance level of 5%. This identification was used to compare the data patterns between using a parametric regression or a nonparametric regression. The non-linearity test results are presented in Table 7.
Based on Table 7, it can be concluded that the relationship between the variables of wages/salaries per capita per month, consumption of carbohydrates per capita per month, consumption of fat per capita per month, and consumption of protein per capita per month, with household expenditure variables per capita per month, was non-linear. Because the form of this non-linear function was not known, it was modeled using multivariable nonparametric regression. This was clarified using a scatter plot, as explained below.
The presentation of data in the form of a scatter plot aims to identify the behavioral pattern of the distribution of the data, and whether the data has a pattern that changes in sub-intervals or follows a repeating pattern. Figure 5a shows that the relationship between the response variable of household expenditure per capita per month and the predictor variable of wages/salaries does not follow a certain pattern. The data plot indicates that the relationship does not show a clear pattern or that there is a tendency to exhibit a change in behavior at a certain interval. Wages/salaries increased and household expenditure per capita per month also increased; this can be seen in the interval of 100,000 IDR until 250,000 IDR, in which there was an increase in household expenditure per capita per month. By comparison, for wages/salaries in the interval of 250,000 IDR until 292,500 IDR, monthly household expenditure per capita experienced a slight decrease from the previous interval. Based on the results of this description analysis, the pattern of the relationship between the variable wage/salary and household expenditure per capita a month followed the Smoothing Spline approach.
Figure 5b shows that the pattern of the relationship between the response variable of household expenditure per capita per month and the predictor variable of carbohydrate consumption per capita per month changes at certain sub-intervals. Carbohydrate consumption at the interval of 1200–1300 g increased, and carbohydrate consumption at the interval of 1300–1400 g decreased slightly, and increased in the interval of 1400–1700 g. If the consumption of carbohydrates is high, the expenditure will also increase. Carbohydrate consumption in the interval of 1700–1800 g decreased. Based on this illustration, the relationship pattern between the variable consumption of carbohydrates per capita per month and household expenditure per capita per month was approached using the Smoothing Spline. The third variable that is thought to influence household expenditure per capita per month is the consumption of fat per capita per month.
Figure 5c shows that the pattern of the relationship between the response variable of household expenditure per capita per month and the predictor variable of fat consumption per capita per month does not follow a certain pattern, or leads to repeating changes in data behavior that follow an upward trend line. Based on this description, the pattern of the relationship between the consumption of fat per capita per month and household expenditure per capita per month was approached using the Fourier Series. Figure 5d shows that the pattern of the relationship between the response variable of household expenditure per capita per month and the predictor variable of protein consumption per capita per month does not follow a certain pattern. The data plot indicates that it does not show a clear pattern or a tendency to change behavior at a certain interval. Consumption of protein per capita per month in the interval of 80–110 g increases slowly. Then, the consumption of protein per capita per month in the interval of 110–125 g increased together with household expenditure per capita per month. Based on the results of this descriptive analysis, the relationship between the response variable of household expenditure per capita per month and the predictor variable of protein consumption per capita per month was approached using the Smoothing Spline.
Based on the data and information obtained from the data exploration, the non-linearity test and the scatter plots showed that the mixed Smoothing Spline and Fourier Series estimator is an appropriate method in nonparametric regression. This is because the relationship between the response variable and some predictor variables tends to change its behavior in sub-intervals, whereas the relationship between the response variable and another predictor variable repeats itself following an upward trend. The general model of the mixed Smoothing Spline and Fourier Series regression for the case study of monthly household expenditure per capita is as follows:
$$y_i = g_1(x_{1i}) + h_1(t_{1i}) + g_2(x_{2i}) + g_3(x_{3i}) + \varepsilon_i, \quad i = 1, 2, \ldots, 579.$$
In matrix form this can be written as $\tilde{y} = \tilde{\mu} + \tilde{\varepsilon}$.
When the predictor variables of wages/salaries, carbohydrate consumption per capita per month, and protein consumption per capita per month are modeled using the Smoothing Spline component, and the predictor variable of fat consumption per capita per month is modeled with the Fourier Series component, the minimum GCV value is sought. Table 8 shows the smallest GCV value.
The smallest GCV value obtained from the model was 1 . 42 × 1 0 13 . In addition, the value of R2 indicated that the model described 98.99% of the relationship between x 1 i , x 2 i , t 1 i , x 3 i and y i , and the Mean Square Error (MSE) was 8 . 66 × 1 0 5 . The best model can be given in the following equation:
y ^ = 7.72 × 10 4 + 5.38 × 10 1 x 1 i 7.71 × 10 4 + 2.33 × 10 1 x 2 i 7.72 × 10 4 + 2.77 x 3 i + V d ˜ ^ + 7.41 × 10 13 t 1 i 3.01 × 10 11 + 8.42 × 10 11 cos t 1 i
with:
V = [ 3.72 × 10 9   5 . 03 × 10 9   3.705 × 10 9 2.84 × 10 9 ] , d ^ ˜ = [ 4.49 × 10 7 ; 4.06 × 10 7 ; ; 6.61 × 10 7 ]
Oscillation parameters for K = 1 and the smoothing parameter were λ 1 = 0.03 ;   λ 2 = 0.03 ;   λ 3 = 0.03 . From the model above, we obtained a plot of the data estimation results using the Smoothing Spline and Fourier Series mixed estimators. The plot of the goodness of the mixed Smoothing Spline and Fourier Series model can be seen in Figure 6.
Figure 6 shows that the data pattern approach using a mixed estimate of the Smoothing Spline and Fourier Series in nonparametric regression is good. This is because the estimation curve of mixed Smoothing Spline and Fourier Series estimator is in visual accordance with the data, or close to the actual data. Furthermore, based on the mixed model of the Smoothing Spline and the Fourier Series shown above, we obtained the scatter plot between the y value and the value y ^ shown in Table 7.
Figure 7 shows that the predicted value of monthly household expenditure per capita ($\hat{y}$) was relatively close to the actual monthly expenses ($y$) for each poor household in Karangasem Regency, Bali Province. In addition, the mixed model of the Smoothing Spline and Fourier Series had a relatively large coefficient of determination, namely 98.99%. This shows that the mixed model of the Smoothing Spline and Fourier Series is suitable to be used in modeling the monthly per capita household expenditure of poor households in Karangasem Regency, Bali Province.
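The two summary measures quoted above can be computed directly from the observed and fitted values. The following is a minimal sketch, assuming the usual definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and MSE as the mean of squared residuals; the function names and the toy data are hypothetical and are not the study's data.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mse(y, y_hat):
    """Mean of squared residuals (assumed divisor: n)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


# Toy illustration only (hypothetical numbers).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_obs, y_fit), mse(y_obs, y_fit))
```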
Based on Figure 6 and the best model, the following can be concluded:
  • At wages/salaries above Rp. 250,000, the monthly expenditure of poor households decreases; these households characteristically consume more carbohydrates and fat than protein.
  • At fat consumption below 90 g, the expenditure of poor households decreases; these households consume more carbohydrates than fat and protein, and have a wage/salary income below Rp. 120,000 per month.
  • At carbohydrate consumption above 1700 g, the monthly expenditure of poor households decreases; these households characteristically consume protein and have an income above Rp. 270,000.
  • To ensure household consumption is nutritious, safe, and balanced, and that the consumption of carbohydrates, fats, and proteins is in accordance with the recommendations of the PUGS of the Ministry of Health, poor households must increase their carbohydrate consumption to 5280.75 g, fat consumption to 1429.89 g, and protein consumption to 1542.32 g. Income (wages/salaries) of 441,000 IDR per month will support the expenditure of poor households of 436,113 IDR, thus positioning the poor households above the poverty line. In efforts to meet the nutritional intake of poor households, the government must provide assistance in the form of basic food items that can meet the community’s nutrition needs, in terms of carbohydrates, proteins, and fats.
Based on the results of this study, the best model was obtained using the smoothing parameters $\lambda_1 = 0.03$, $\lambda_2 = 0.03$, and $\lambda_3 = 0.03$, and the oscillation parameter $K = 1$. It was found that, in certain positions, greater consumption of fat does not always increase the expenditure because, in this position, very poor households receive assistance from the government. Very poor households are characterized by consuming more carbohydrates than fat and protein. The results of this study suggest that the level of assistance provided to poor households should not be the same because in certain groups there are poor households that consume more carbohydrates, more fat, or more protein. In addition, in an effort to increase the income of poor households, the government can provide assistance to create jobs by opening small and medium businesses, and provide counseling to the community by employing scholars to construct villages. The Karangasem district government can create community empowerment programs, such as the independent National Community Empowerment Program, the Simantri Program, and the Joint Business Group Program. These schemes also include the Savings and Loans program implemented by women and funded by the independent National Community Empowerment Program, which is often referred to as the Women’s Savings and Loans Program. Poverty alleviation programs are referred to as the Integrated Village Development Movement. Furthermore, programs classified as productive include People’s Business Credit, Unsecured Credit, and the Bali Mandara Credit Guarantee.
A poverty alleviation program can not only reduce the number of poor people, but also reduce the number of unemployed people around places of business. This program is said to be a truly productive program that aims to increase the income of the poor. Several government policies that relate to poverty alleviation programs can be implemented to address poverty in Karangasem Regency, Bali Province. These include reducing the expenditures of the poor, as implemented by other parties such as the government or other communities, and increasing the income of the poor so that they can escape from poverty. Several policies aimed at reducing expenditures for the poor include the Raskin Program, the Direct Cash Assistance Program, basic food assistance, the Bali Mandara Health Insurance Program, the Community Health Insurance, and poor scholarships, namely, scholarships provided to children or students who have dropped out of school.

5. Conclusions

The simulation study shows that the larger the sample size n, the better the mixed Smoothing Spline and Fourier Series estimator performs as a nonparametric regression model, for all variances. Likewise, the smaller the variance, the better the mixed estimator performs for all sample sizes n. Using a sample size of 100, this model was found to be suitable. In the simulation study, a comparison was made between the mixed estimator of the Smoothing Spline and Fourier Series, and the individual Smoothing Spline and Fourier Series estimators. The mixed estimator of the Smoothing Spline and Fourier Series was found to be better than the individual Smoothing Spline and Fourier Series estimators based on the minimum GCV value. When the model was applied to the monthly household expenditure per capita of poor households in Karangasem Regency, Bali Province, with monthly wage/salary, carbohydrate consumption, fat consumption, and protein consumption per capita as predictor variables, a relatively large coefficient of determination ($R^2$) was obtained, namely 98.99%. This shows that the mixed estimator of the Smoothing Spline and Fourier Series is suitable to be used in modeling the monthly expenditure of poor households in Karangasem Regency, Bali Province.
The best model was obtained using the smoothing parameters of 0.03, 0.03, and 0.03, and the oscillation parameter of K = 1. It was found that, in a certain position, the greater consumption of fat does not always increase the household’s expenditure because, in this position, very poor households receive assistance from the government. Very poor households are characterized by consuming more carbohydrates than fat and protein. Based on the results of this study, it is suggested that the distribution of assistance to poor households should not be the same, because some poor households consume a higher level of carbohydrates, and others consume a higher level of fats.

Author Contributions

Conceptualization, N.P.A.M.M., I.N.B. and V.R.; methodology, N.P.A.M.M. and I.N.B.; software, N.P.A.M.M. and V.R.; validation, N.P.A.M.M., I.N.B. and V.R.; formal analysis, N.P.A.M.M. and I.N.B.; investigation, N.P.A.M.M.; data curation, N.P.A.M.M.; writing original draft preparation, N.P.A.M.M.; writing review and editing, N.P.A.M.M. and I.N.B.; visualization, N.P.A.M.M.; supervision, I.N.B.; project administration, I.N.B. and N.P.A.M.M.; All authors have read and agreed to the published version of the manuscript.

Funding

The authors thank the Directorate of Higher Education (DIKTI), the Ministry of Education and Culture of the Republic of Indonesia, for funding research through a Penelitian Disertasi Doktor (PDD) grant in 2021, Funding number: 3/E1/KP.PTNBH/2021.

Acknowledgments

The authors thank the editor and the reviewers for their constructive and helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

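If the goodness of fit is given as follows:
$$n^{-1}\sum_{i=1}^{n}\left(y_i - \left[\sum_{j=1}^{p} g_j(x_{ji}) + \sum_{k=1}^{q}\left(b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K}\alpha_{uk}\cos u t_{ki}\right)\right]\right)^{2},$$
then the goodness of fit, as a function of $(\tilde{c}, \tilde{d})$, can be written as:
$$n^{-1}(\tilde{Z} - \tilde{g})^{\top}(\tilde{Z} - \tilde{g}).$$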
Proof. 
Let $g_j$, $j = 1, 2, \ldots, p$, be any function contained in the space $W$. The space $W$ can be decomposed into a direct sum of two mutually orthogonal spaces $W_0$ and $W_1$ [17,18], namely $W = W_0 \oplus W_1$ with $W_0 \perp W_1$. Thus, $g_j$ can be expressed as $g_j = u_j + v_j$ with $u_j \in W_0$ and $v_j \in W_1$.
If the basis of the space $W_0$ is $\{\theta_{j1}, \theta_{j2}, \ldots, \theta_{jm}\}$, where $m$ is the order of the spline polynomial, and the basis of the space $W_1$ is $\{\psi_{j1}, \psi_{j2}, \ldots, \psi_{jn}\}$, where $n$ is the number of observations, then each function $u_j \in W_0$ can be written as follows:
$$u_j = c_{j1}\theta_{j1} + c_{j2}\theta_{j2} + \cdots + c_{jm}\theta_{jm} = \sum_{k=1}^{m} c_{jk}\theta_{jk} = \tilde{\theta}_j \tilde{c}_j,$$
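and each function $v_j \in W_1$ as follows:
$$\begin{aligned}
v_j &= \varsigma_{j1}\psi_{j1} + \varsigma_{j2}\psi_{j2} + \cdots + \varsigma_{jn}\psi_{jn} \\
&= (d_1\tau_j)\psi_{j1} + (d_2\tau_j)\psi_{j2} + \cdots + (d_n\tau_j)\psi_{jn} \\
&= \sum_{i=1}^{n} (d_i\tau_j)\psi_{ji} = \tau_j \tilde{\psi}_j \tilde{d},
\end{aligned}$$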
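where the $c_{jk}$ and $d_i$ are constants. Thus, each function $g_j \in W$ can be written as follows:
$$\begin{aligned}
g_j &= u_j + v_j \\
&= \sum_{k=1}^{m} c_{jk}\theta_{jk} + \sum_{i=1}^{n} (d_i\tau_j)\psi_{ji} \\
&= \tilde{\theta}_j \tilde{c}_j + \tau_j \tilde{\psi}_j \tilde{d}, \quad j = 1, 2, \ldots, p,
\end{aligned}$$
with $\tilde{\theta}_j = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jm})$, $\tilde{c}_j = (c_{j1}, c_{j2}, \ldots, c_{jm})^{\top}$, $\tilde{\psi}_j = (\psi_{j1}, \psi_{j2}, \ldots, \psi_{jn})$, and $\tilde{d} = (d_1, d_2, \ldots, d_n)^{\top}$.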
Based on the Riesz Representation Theorem, let $\mathcal{L}_x$ be a bounded linear functional on the space $W$ and let $g_j \in W$; then Equation (A1) can be written as follows:
$$\begin{aligned}
\mathcal{L}_x g_j &= \mathcal{L}_x (u_j + v_j) \\
&= \mathcal{L}_x u_j + \mathcal{L}_x v_j \\
&= u_j(x_{ji}) + v_j(x_{ji}) \\
&= g_j(x_{ji}).
\end{aligned}$$
Because $\mathcal{L}_x$ is a bounded linear functional on the space $W$, there is a unique representer $\eta_{ji} \in W$ of $\mathcal{L}_x$ that satisfies the equation:
$$\mathcal{L}_x g_j = \langle \eta_{ji}, g_j \rangle,$$
where $\langle \cdot, \cdot \rangle$ is the inner product.
Based on the properties of the inner product, Equation (A2) can be written as:
$$g_j(x_{ji}) = \langle \eta_{ji}, g_j \rangle = \langle \eta_{ji}, \tilde{\theta}_j \tilde{c}_j \rangle + \langle \eta_{ji}, \tau_j \tilde{\psi}_j \tilde{d} \rangle.$$
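Based on Equation (A3), for $i = 1$ it can be stated as follows:
$$\begin{aligned}
g_j(x_{j1}) &= \langle \eta_{j1}, \tilde{\theta}_j \tilde{c}_j \rangle + \langle \eta_{j1}, \tau_j \tilde{\psi}_j \tilde{d} \rangle \\
&= \left\langle \eta_{j1}, \begin{pmatrix} \theta_{j1} & \theta_{j2} & \cdots & \theta_{jm} \end{pmatrix} \begin{pmatrix} c_{j1} \\ c_{j2} \\ \vdots \\ c_{jm} \end{pmatrix} \right\rangle + \left\langle \eta_{j1}, \begin{pmatrix} \tau_j\psi_{j1} & \tau_j\psi_{j2} & \cdots & \tau_j\psi_{jn} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix} \right\rangle \\
&= c_{j1}\langle \eta_{j1}, \theta_{j1} \rangle + c_{j2}\langle \eta_{j1}, \theta_{j2} \rangle + \cdots + c_{jm}\langle \eta_{j1}, \theta_{jm} \rangle \\
&\quad + d_1\tau_j\langle \eta_{j1}, \psi_{j1} \rangle + d_2\tau_j\langle \eta_{j1}, \psi_{j2} \rangle + \cdots + d_n\tau_j\langle \eta_{j1}, \psi_{jn} \rangle.
\end{aligned}$$
If the process continues in the same way, then for $i = n$ we obtain:
$$\begin{aligned}
g_j(x_{jn}) &= c_{j1}\langle \eta_{jn}, \theta_{j1} \rangle + c_{j2}\langle \eta_{jn}, \theta_{j2} \rangle + \cdots + c_{jm}\langle \eta_{jn}, \theta_{jm} \rangle \\
&\quad + d_1\tau_j\langle \eta_{jn}, \psi_{j1} \rangle + d_2\tau_j\langle \eta_{jn}, \psi_{j2} \rangle + \cdots + d_n\tau_j\langle \eta_{jn}, \psi_{jn} \rangle.
\end{aligned}$$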
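Based on Equation (A4), the vector $\tilde{g}_j(x_j)$ can be expressed in the form:
$$\begin{aligned}
\tilde{g}_j(x_j) &= \begin{pmatrix} g_j(x_{j1}) \\ g_j(x_{j2}) \\ \vdots \\ g_j(x_{jn}) \end{pmatrix} \\
&= \begin{pmatrix}
\langle \eta_{j1}, \theta_{j1} \rangle & \langle \eta_{j1}, \theta_{j2} \rangle & \cdots & \langle \eta_{j1}, \theta_{jm} \rangle \\
\langle \eta_{j2}, \theta_{j1} \rangle & \langle \eta_{j2}, \theta_{j2} \rangle & \cdots & \langle \eta_{j2}, \theta_{jm} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \eta_{jn}, \theta_{j1} \rangle & \langle \eta_{jn}, \theta_{j2} \rangle & \cdots & \langle \eta_{jn}, \theta_{jm} \rangle
\end{pmatrix}
\begin{pmatrix} c_{j1} \\ c_{j2} \\ \vdots \\ c_{jm} \end{pmatrix} \\
&\quad + \begin{pmatrix}
\tau_j\langle \eta_{j1}, \psi_{j1} \rangle & \tau_j\langle \eta_{j1}, \psi_{j2} \rangle & \cdots & \tau_j\langle \eta_{j1}, \psi_{jn} \rangle \\
\tau_j\langle \eta_{j2}, \psi_{j1} \rangle & \tau_j\langle \eta_{j2}, \psi_{j2} \rangle & \cdots & \tau_j\langle \eta_{j2}, \psi_{jn} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\tau_j\langle \eta_{jn}, \psi_{j1} \rangle & \tau_j\langle \eta_{jn}, \psi_{j2} \rangle & \cdots & \tau_j\langle \eta_{jn}, \psi_{jn} \rangle
\end{pmatrix}
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix}, \\
\tilde{g}_j &= U_j \tilde{c}_j + \tau_j V_j \tilde{d}.
\end{aligned}$$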
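When $W = W_2^m[a_j, b_j]$, $j = 1, 2, \ldots, p$, then:
$$\langle \eta_{ji}, \theta_{jk} \rangle = \mathcal{L}_{x_j}\theta_{jk} = \frac{x_{ji}^{k-1}}{(k-1)!}, \quad i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, m.$$
Thus, $U_j$ can be written as:
$$U_j = \begin{pmatrix}
1 & x_{j1} & \cdots & \dfrac{x_{j1}^{m-1}}{(m-1)!} \\
1 & x_{j2} & \cdots & \dfrac{x_{j2}^{m-1}}{(m-1)!} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{jn} & \cdots & \dfrac{x_{jn}^{m-1}}{(m-1)!}
\end{pmatrix}.$$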
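The matrix $U_j$ can be assembled directly from the entries $x_{ji}^{k-1}/(k-1)!$ given above. The following is a minimal sketch, assuming the observations of one predictor are held in a one-dimensional array; the function name `build_U` is hypothetical.

```python
import math

import numpy as np


def build_U(x, m):
    """Return the n x m matrix with entry (i, k) equal to x_i**(k-1) / (k-1)!."""
    x = np.asarray(x, dtype=float)
    powers = np.arange(m)  # exponents k-1 = 0, 1, ..., m-1
    factorials = np.array([math.factorial(int(p)) for p in powers], dtype=float)
    # Broadcasting: (n, 1) ** (m,) -> (n, m), then divide column-wise.
    return x[:, None] ** powers / factorials


# Toy illustration with m = 3 (columns: 1, x, x^2 / 2).
U = build_U([1.0, 2.0, 3.0], m=3)
print(U)
```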
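Furthermore, because $W_0 \perp W_1$,
$$\langle \eta_{ji}, \psi_{jt} \rangle = \langle \theta_{jk} + \psi_{ji}, \psi_{jt} \rangle = \langle \psi_{ji}, \psi_{jt} \rangle.$$
Then the matrix $V_j$ can be written as:
$$V_j = \begin{pmatrix}
\langle \psi_{j1}, \psi_{j1} \rangle & \langle \psi_{j1}, \psi_{j2} \rangle & \cdots & \langle \psi_{j1}, \psi_{jn} \rangle \\
\langle \psi_{j2}, \psi_{j1} \rangle & \langle \psi_{j2}, \psi_{j2} \rangle & \cdots & \langle \psi_{j2}, \psi_{jn} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \psi_{jn}, \psi_{j1} \rangle & \langle \psi_{jn}, \psi_{j2} \rangle & \cdots & \langle \psi_{jn}, \psi_{jn} \rangle
\end{pmatrix},$$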
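so that the form of the spline estimator can be expressed as follows:
$$\begin{aligned}
\tilde{g}(x_1, x_2, \ldots, x_p) &= \sum_{j=1}^{p} \tilde{g}_j(x_j) \\
&= \sum_{j=1}^{p} U_j \tilde{c}_j + \sum_{j=1}^{p} \tau_j V_j \tilde{d} \\
&= U\tilde{c} + V_{\tau}\tilde{d},
\end{aligned}$$
where
$$U = \begin{pmatrix} U_1 & U_2 & \cdots & U_p \end{pmatrix}, \quad
\tilde{c} = \begin{pmatrix} \tilde{c}_1 \\ \tilde{c}_2 \\ \vdots \\ \tilde{c}_p \end{pmatrix}, \quad
V_{\tau} = \begin{pmatrix} \tau_1 V_1 & \tau_2 V_2 & \cdots & \tau_p V_p \end{pmatrix}, \quad
\tilde{d} = \begin{pmatrix} \tilde{d}_1 \\ \tilde{d}_2 \\ \vdots \\ \tilde{d}_n \end{pmatrix}.$$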
Furthermore, the regression curve components $h_k(t_{ki})$, $k = 1, 2, \ldots, q$, have an unknown shape and are assumed to be contained in the space of continuous functions $C(0, \pi)$. Each component is approximated by a Fourier series function as follows:
$$\begin{aligned}
h_k(t_{ki}) &= b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K} \alpha_{uk}\cos u t_{ki}, \quad k = 1, 2, \ldots, q, \\
&= b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \alpha_{1k}\cos t_{ki} + \alpha_{2k}\cos 2t_{ki} + \cdots + \alpha_{Kk}\cos K t_{ki}.
\end{aligned}$$
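The Fourier series regression equation can be written as follows:
$$y_i = \sum_{k=1}^{q} h_k(t_{ki}) + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$
Equation (A6) can be written in the form:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
t_{k1} & 1/2 & \cos t_{k1} & \cos 2t_{k1} & \cdots & \cos K t_{k1} \\
t_{k2} & 1/2 & \cos t_{k2} & \cos 2t_{k2} & \cdots & \cos K t_{k2} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
t_{kn} & 1/2 & \cos t_{kn} & \cos 2t_{kn} & \cdots & \cos K t_{kn}
\end{bmatrix}
\begin{bmatrix} b_k \\ \alpha_{0k} \\ \alpha_{1k} \\ \alpha_{2k} \\ \vdots \\ \alpha_{Kk} \end{bmatrix}
= t_k \tilde{\alpha}_k.$$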
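The design matrix $t_k$ for a single Fourier predictor has the columns $t$, $1/2$, $\cos t$, $\cos 2t$, ..., $\cos Kt$. The following is a minimal sketch of how it could be built and used to evaluate one component $h_k$; the function names `fourier_design` and `fourier_component` and the toy coefficients are hypothetical.

```python
import numpy as np


def fourier_design(t, K):
    """Return the n x (K + 2) matrix with columns [t, 1/2, cos t, ..., cos K t]."""
    t = np.asarray(t, dtype=float)
    columns = [t, np.full_like(t, 0.5)]
    columns += [np.cos(u * t) for u in range(1, K + 1)]
    return np.column_stack(columns)


def fourier_component(t, b, alpha):
    """Evaluate h(t) = b*t + alpha[0]/2 + sum_{u=1..K} alpha[u]*cos(u*t)."""
    K = len(alpha) - 1
    coef = np.concatenate(([b], alpha))  # ordering (b, alpha_0, ..., alpha_K)
    return fourier_design(t, K) @ coef


# Toy illustration with K = 1, b = 2, alpha_0 = 4, alpha_1 = 3.
t_values = np.array([0.0, np.pi / 2, np.pi])
T = fourier_design(t_values, K=1)
h = fourier_component(t_values, b=2.0, alpha=[4.0, 3.0])
print(T)
print(h)
```

Stacking the blocks for $k = 1, \ldots, q$ side by side would give the full matrix $t$ used in the subsequent equations.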
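Thus, in Equation (A5), the Fourier Series function in the nonparametric regression component with $q$ predictors can be expressed in the following form:
$$\tilde{h} = \sum_{k=1}^{q} h_k(t_{ki}) = \sum_{k=1}^{q} t_k \tilde{\alpha}_k = t\tilde{\alpha},$$
where:
$$t = \begin{bmatrix}
t_{11} & 1/2 & \cos t_{11} & \cos 2t_{11} & \cdots & \cos K t_{11} & \cdots & t_{q1} & 1/2 & \cos t_{q1} & \cos 2t_{q1} & \cdots & \cos K t_{q1} \\
t_{12} & 1/2 & \cos t_{12} & \cos 2t_{12} & \cdots & \cos K t_{12} & \cdots & t_{q2} & 1/2 & \cos t_{q2} & \cos 2t_{q2} & \cdots & \cos K t_{q2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots \\
t_{1n} & 1/2 & \cos t_{1n} & \cos 2t_{1n} & \cdots & \cos K t_{1n} & \cdots & t_{qn} & 1/2 & \cos t_{qn} & \cos 2t_{qn} & \cdots & \cos K t_{qn}
\end{bmatrix},$$
$$\tilde{\alpha} = \begin{bmatrix} b_1 & \alpha_{01} & \alpha_{11} & \alpha_{21} & \cdots & \alpha_{K1} & \cdots & b_q & \alpha_{0q} & \alpha_{1q} & \alpha_{2q} & \cdots & \alpha_{Kq} \end{bmatrix}^{\top}.$$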
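The goodness of fit, as a function of $(\tilde{c}, \tilde{d})$, can be written as:
$$\begin{aligned}
&n^{-1}\sum_{i=1}^{n}\left(y_i - \left[\sum_{j=1}^{p} g_j(x_{ji}) + \sum_{k=1}^{q}\left(b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K}\alpha_{uk}\cos u t_{ki}\right)\right]\right)^{2} \\
&\quad = n^{-1}\lVert \tilde{y} - \tilde{g} - \tilde{h} \rVert^{2} \\
&\quad = n^{-1}\lVert \tilde{Z} - \tilde{g} \rVert^{2} \\
&\quad = n^{-1}(\tilde{Z} - \tilde{g})^{\top}(\tilde{Z} - \tilde{g}),
\end{aligned}$$
where $\tilde{Z} = \tilde{y} - \tilde{h}$.
Thus, it is proven that the goodness of fit can be written as:
$$n^{-1}\sum_{i=1}^{n}\left(y_i - \left[\sum_{j=1}^{p} g_j(x_{ji}) + \sum_{k=1}^{q}\left(b_k t_{ki} + \frac{1}{2}\alpha_{0k} + \sum_{u=1}^{K}\alpha_{uk}\cos u t_{ki}\right)\right]\right)^{2} = n^{-1}(\tilde{Z} - \tilde{g})^{\top}(\tilde{Z} - \tilde{g}).$$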
Based on Equation (A8), Equation (1) can be written as Equation (2). □
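The identity established in this proof can also be checked numerically. The following is a minimal sketch, assuming arbitrary random vectors stand in for the observations $\tilde{y}$, the summed spline components $\tilde{g}$, and the summed Fourier components $\tilde{h}$; the variable names are hypothetical and no data from the study are used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

y = rng.normal(size=n)  # stand-in for the response vector
g = rng.normal(size=n)  # stand-in for the summed spline components
h = rng.normal(size=n)  # stand-in for the summed Fourier components

# Element-wise form: n^{-1} * sum_i (y_i - [g_i + h_i])^2
sum_form = np.sum((y - (g + h)) ** 2) / n

# Matrix form: n^{-1} * (Z - g)^T (Z - g) with Z = y - h
Z = y - h
matrix_form = (Z - g) @ (Z - g) / n

print(sum_form, matrix_form)
```

Both expressions agree up to floating-point rounding, which is all the proof asserts.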

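Appendix B

$$\begin{aligned}
\tilde{y} &= \tilde{g} + \tilde{h} + \tilde{\varepsilon} \\
&= D(\lambda, K)(\tilde{y} - \tilde{h}) + \tilde{h} + \tilde{\varepsilon}, \\
\tilde{\varepsilon} &= \tilde{y} - D(\lambda, K)\tilde{y} + D(\lambda, K)\tilde{h} - \tilde{h}.
\end{aligned}$$
Substitute Equation (A7) into Equation (A9):
$$\tilde{\varepsilon}^{\top}\tilde{\varepsilon} = \left(\tilde{y} - D(\lambda, K)\tilde{y} + D(\lambda, K)t\tilde{\alpha} - t\tilde{\alpha}\right)^{\top}\left(\tilde{y} - D(\lambda, K)\tilde{y} + D(\lambda, K)t\tilde{\alpha} - t\tilde{\alpha}\right).$$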
Thus, it becomes Equation (12).
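As a small worked step that is not part of the original derivation, the residual vector can be factored, which makes the quadratic form easier to read. Here $I$ denotes the $n \times n$ identity matrix and $D$ abbreviates $D(\lambda, K)$; both are shorthand introduced only for this sketch.

```latex
\begin{aligned}
\tilde{\varepsilon}
  &= \tilde{y} - D\tilde{y} + D t\tilde{\alpha} - t\tilde{\alpha} \\
  &= (I - D)\tilde{y} - (I - D)\, t\tilde{\alpha} \\
  &= (I - D)(\tilde{y} - t\tilde{\alpha}), \\
\tilde{\varepsilon}^{\top}\tilde{\varepsilon}
  &= (\tilde{y} - t\tilde{\alpha})^{\top} (I - D)^{\top} (I - D) (\tilde{y} - t\tilde{\alpha}).
\end{aligned}
```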

Appendix C

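Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 25786.3 | 12893.1 | 1027.27 | 0.000 |
| Error | 297 | 3727.6 | 12.6 | | |
| Total | 299 | 29513.9 | | | |

Tukey Pairwise Comparisons

| Factor | N | Mean | Grouping |
|---|---|---|---|
| 0.25 | 100 | 90.279 | A |
| 0.50 | 100 | 80.468 | B |
| 1.00 | 100 | 67.636 | C |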
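The entries of the analysis of variance table are related by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and the F statistic is the ratio of the two mean squares. The following is a minimal sketch that reproduces the table's MS and F values from its SS and DF columns; the function name `anova_one_way` is hypothetical.

```python
def anova_one_way(ss_factor, df_factor, ss_error, df_error):
    """Return (MS_factor, MS_error, F) from sums of squares and degrees of freedom."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error


# SS and DF values taken from the Analysis of Variance table above.
ms_factor, ms_error, f_stat = anova_one_way(25786.3, 2, 3727.6, 297)
print(round(ms_factor, 1), round(ms_error, 1), round(f_stat, 2))
```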

References

  1. Eubank, R.L. Nonparametric Regression and Spline Smoothing, 2nd ed.; Marcel Dekker, Inc.: New York, NY, USA, 1999.
  2. Akram, T.; Abbas, M.; Iqbal, A.; Baleanu, D.; Asad, J.H. Novel numerical approach based on modified extended cubic B-spline functions for solving non-linear time-fractional telegraph equation. Symmetry 2020, 12, 1154.
  3. Becher, H.; Kauermann, G.; Khomski, P. Using penalized splines to model age and season of birth dependent effects of childhood mortality risk factors in rural Burkina Faso. Biom. J. 2009, 51, 110–122.
  4. Wang, X.; Shen, J.; Ruppert, D. On the asymptotics of penalized spline smoothing. Electron. J. Stat. 2011, 5, 1–17.
  5. Cox, D.D.; Sullivan, F. Penalized type estimator for generalized nonparametric regression. J. Multivar. Anal. 1996, 56, 185–206.
  6. Aydin, D.; Memmedli, M.; Omay, R. Smoothing parameter selection for nonparametric regression using smoothing spline. Eur. J. Pure Appl. Math. 2013, 6, 222–238. Available online: https://www.ejpam.com/index.php/ejpam/article/view/1362/296 (accessed on 15 February 2021).
  7. Bilodeau, M. Fourier smoother and additive models. Can. J. Stat. 1992, 20, 257–269.
  8. Sudiarsa, I.W.; Budiantara, I.N.; Purnami, S.W. Combined estimator fourier series and spline truncated in multivariable nonparametric regression. Appl. Math. Sci. 2015, 9, 4997–5010.
  9. Prahutama, A.; Santoso, R. Mix local polynomial and spline truncated: The development of nonparametric regression model. J. Phys. Conf. Ser. 2018, 1025, 012102.
  10. BPS. Indikator Kesejahteraan Rakyat Provinsi Bali; Bhinneka, C.V., Ed.; Badan Pusat Statistik Provinsi Bali: Denpasar, Indonesia, 2017.
  11. World Bank. Era Baru Dalam Pengentasan Kemiskinan di Indonesia Ikhtisar; The World Bank: Jakarta, Indonesia, 2006.
  12. Leamer, E.E. Macroeconomic Patterns and Stories; Springer: Berlin/Heidelberg, Germany, 2009.
  13. Sukirno, S. Makro Ekonomi Teori Pengantar; Raja Grafindo Persada: Jakarta/Depok, Indonesia, 2012.
  14. Sudibia, I.K.; Marhaeni, A.A.I.N. Beberapa strategi pengentasan kemiskinan di kabupaten Karangasem provinsi Bali. J. Kependud. Dan Pengemb. Sumber Daya Mns. 2012, 9, 1–14.
  15. Wijantari, N.M.W.; Bendesa, K.G. Kemiskinan di provinsi Bali studi komparatif Kabupaten/Kota di provinsi Bali. J. Bul. Stud. Ekon. 2016, 21, 13–25.
  16. Mariati, N.P.A.M.; Budiantara, I.N.; Ratnasari, V. Combination estimation of smoothing spline and fourier series in nonparametric regression. J. Math. 2020, 1–10.
  17. Abbas, I.A.; Marin, M. Analytical solutions of a two-dimensional generalized thermoelastic diffusions problem due to laser pulse. Iran. J. Sci. Technol.-Trans. Mech. Eng. 2018, 42, 57–71.
  18. Marin, M.; Othman, M.I.A.; Seadawy, A.R.; Carstea, C. A domain of influence in the Moore–Gibson–Thompson theory of dipolar bodies. J. Taibah Univ. Sci. 2020, 14, 653–660.
Figure 1. Plot of the simulation data.
Figure 2. Plot of actual and predicted data with smoothing parameters and oscillation parameters.
Figure 3. Plot of the comparison of the GCV value of the experimental function.
Figure 4. (a) Plot of the comparison of the $R^2$ value of the experimental function; (b) plot of the comparison of the MSE value of the experimental function.
Figure 5. (a) Plot of comparison of total household expenditure (HE) with wage/salary (WS) per capita per month; (b) plot of comparison of total household expenditure (HE) with consumption of carbohydrates (CC) per capita per month; (c) plot of comparison of total household expenditure (HE) with fat consumption (FC) per capita per month; (d) plot of comparison of total household expenditure (HE) with protein consumption (PC) per capita per month.
Figure 6. Plot for Data Pattern Estimation.
Figure 7. Scatter plot of $y$ and $\hat{y}$.
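Table 1. Simulation type model.

| Repeat | Function | n | $\sigma^2$ |
|---|---|---|---|
| 1 | Polynomial and Trigonometry | 200 | 0.25 |
| 2 | | 200 | 0.5 |
| 3 | | 200 | 1 |
| 4 | | 100 | 0.25 |
| 5 | | 100 | 0.5 |
| 6 | | 100 | 1 |
| 7 | | 50 | 0.25 |
| 8 | | 50 | 0.5 |
| 9 | | 50 | 1 |
| 10 | | 25 | 0.25 |
| 11 | | 25 | 0.5 |
| 12 | | 25 | 1 |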
Table 2. GCV value with various types of smoothing and oscillation parameters.

| No | Smoothing Parameter | Oscillation Parameter | GCV |
|---|---|---|---|
| 1 | 0.038 | 2 | 0.0000191649 |
| 2 | 0.047 | 1 | 0.0000189786 |
| 3 | 0.070 | 3 | 0.0000187579 |
| 4 | 0.153 | 1 | 0.0000185500 |
| 5 | 0.420 | 1 | 0.0000184495 * |
| 6 | 0.154 | 2 | 0.0000185493 |
| 7 | 0.096 | 3 | 0.0000186508 |
| 8 | 0.057 | 2 | 0.0000188576 |
| 9 | 0.040 | 3 | 0.0000191030 |
| 10 | 0.034 | 1 | 0.0000192730 |

* GCV minimum.
Table 3. The optimal GCV value from each replication of the experiment simulation model scenario 4.

| Repeat | Minimum GCV | $R^2$ | MSE |
|---|---|---|---|
| 1 | 0.00001840 | 86.439 | 0.256 |
| 2 | 0.00000838 | 92.621 | 0.172 |
| 3 | 0.00001420 | 89.061 | 0.228 |
| 4 | 0.00001300 | 87.493 | 0.215 |
| 5 | 0.00000994 | 89.149 | 0.188 |
| 6 | 0.00000972 | 91.290 | 0.192 |
| 7 | 0.00001460 | 88.814 | 0.227 |
| 8 | 0.00000814 | 91.846 | 0.170 |
| 9 | 0.00000992 | 91.437 | 0.188 |
| 10 | 0.00001340 | 85.799 | 0.218 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 100 | 0.00001070 | 90.437 | 0.199 |

Information: $R^2$ unit in %.
Table 4. $R^2$ value with various errors of the experimental function for n = 100.

| Repeat | $\sigma^2 = 0.25$ | $\sigma^2 = 0.5$ | $\sigma^2 = 1$ |
|---|---|---|---|
| 1 | 86.440 | 79.525 | 65.356 |
| 2 | 92.621 | 84.574 | 60.949 |
| 3 | 89.061 | 83.914 | 65.290 |
| 4 | 87.493 | 78.308 | 68.861 |
| 5 | 89.149 | 83.755 | 70.094 |
| 6 | 91.290 | 80.517 | 61.168 |
| 7 | 88.815 | 78.494 | 67.647 |
| 8 | 91.847 | 80.195 | 76.263 |
| 9 | 91.438 | 86.781 | 72.427 |
| 10 | 85.800 | 80.212 | 67.902 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 100 | 90.438 | 79.755 | 67.505 |
| Average | 90.279 | 80.468 | 67.636 |

Information: $R^2$ unit in %.
Table 5. Average values of GCV, $R^2$, and MSE.

| n | $\sigma^2$ | GCV | $R^2$ | MSE |
|---|---|---|---|---|
| 25 | 0.25 | 0.00185812 | 74.642 | 0.440 |
| | 0.5 | 0.00378041 | 64.363 | 0.635 |
| | 1 | 0.01741598 | 58.392 | 0.786 |
| 50 | 0.25 | 0.00012196 | 81.851 | 0.293 |
| | 0.5 | 0.00019175 | 75.551 | 0.367 |
| | 1 | 0.00046407 | 64.167 | 0.572 |
| 100 | 0.25 | 0.00001073 | 90.279 | 0.196 |
| | 0.5 | 0.00002668 | 80.468 | 0.309 |
| | 1 | 0.00006150 | 67.638 | 0.470 |
| 200 | 0.25 | 0.00000040 | 98.075 | 0.080 |
| | 0.5 | 0.00000243 | 89.925 | 0.199 |
| | 1 | 0.00000629 | 79.028 | 0.320 |

Information: $R^2$ unit in %.
Table 6. Minimum GCV value for n = 200.

| Repeat | Smoothing Spline | Fourier Series | Mixed Smoothing Spline and Fourier Series |
|---|---|---|---|
| 1 | 0.2634640 | 0.02880402 | 0.0000011100 |
| 2 | 0.2529623 | 0.02765305 | 0.0000003890 |
| 3 | 0.2268487 | 0.02476862 | 0.0000003730 |
| 4 | 0.2726902 | 0.02981654 | 0.0000003930 |
| 5 | 0.2375965 | 0.02594959 | 0.0000004590 |
| 6 | 0.2770116 | 0.03029766 | 0.0000003300 |
| 7 | 0.2315053 | 0.02531397 | 0.0000003590 |
| 8 | 0.2340391 | 0.02558606 | 0.0000003860 |
| 9 | 0.2311505 | 0.02522883 | 0.0000003710 |
| 10 | 0.2531990 | 0.02767562 | 0.0000003620 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 100 | 0.2316181 | 0.02530232 | 0.000000353 |
| Average Minimum GCV | 0.24518846 | 0.02672 | 0.000000398 * |

* GCV minimum.
Table 7. Terasvirta Neural Network Test.

| Correlation | $\chi^2$ | P value | Conclusion |
|---|---|---|---|
| x1 to y | 502.34 | $< 2.2 \times 10^{-16}$ | Nonlinear |
| x2 to y | 203.94 | $< 2.2 \times 10^{-16}$ | Nonlinear |
| x3 to y | 327.05 | $< 2.2 \times 10^{-16}$ | Nonlinear |
| x4 to y | 35.943 | $1.567 \times 10^{-8}$ | Nonlinear |
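Table 8. Selection of optimal smoothing parameters and oscillation parameters.

| No | K | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ | GCV |
|---|---|---|---|---|---|
| 1 | 3 | 0.07 | 0.07 | 0.07 | $5.70 \times 10^{13}$ |
| 2 | 2 | 0.05 | 0.1 | 0.05 | $6.54 \times 10^{13}$ |
| 3 | 1 | 0.05 | 0.1 | 0.03 | $4.99 \times 10^{13}$ |
| 4 | 2 | 0.1 | 0.05 | 0.1 | $5.48 \times 10^{13}$ |
| 5 | 1 | 0.03 | 0.03 | 0.03 | $1.421 \times 10^{13}$ * |
| 6 | 3 | 0.05 | 0.03 | 0.03 | $1.66 \times 10^{13}$ |
| 7 | 2 | 0.03 | 0.03 | 0.05 | $1.72 \times 10^{13}$ |
| 8 | 1 | 0.05 | 0.03 | 0.05 | $1.98 \times 10^{13}$ |
| 9 | 1 | 0.1 | 0.03 | 0.05 | $2.52 \times 10^{13}$ |
| 10 | 2 | 0.05 | 0.05 | 0.03 | $2.58 \times 10^{13}$ |

* GCV minimum.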
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Mariati, N.P.A.M.; Budiantara, I.N.; Ratnasari, V. The Application of Mixed Smoothing Spline and Fourier Series Model in Nonparametric Regression. Symmetry 2021, 13, 2094. https://doi.org/10.3390/sym13112094
