1. Introduction
Unquestionably, the current era of economic disruption has a negative side that is felt most strongly by middle- to low-income individuals. Disruptions to the economy can eliminate the growth momentum generated by the demographic bonus. Numerous jobs previously performed by humans are being replaced by technological innovation and various forms of artificial intelligence. This replacement reduces the demand for labour and creates new inequality issues, ultimately affecting the welfare of the community. To prevent economic disruption from aggravating existing welfare issues in Indonesia, particularly for vulnerable communities and households, the government must implement optimal policies.
The welfare of the population in an area can be described through several indicators, one of which is household consumption expenditure (Sekhampu and Niyimbanira [1]; Irawan et al. [2]). The household consumption expenditure data produced by the Central Bureau of Statistics (BPS) through the National Socio-Economic Survey (Susenas) currently support the estimation of population parameters at the national, provincial, and district/city levels and need to be extended to smaller areas. The survey is not designed to estimate population parameters in smaller areas, such as sub-districts or villages, because the sample size is insufficient. The government now requires data presented at a more detailed and accurate regional level in order to conduct development planning and evaluation and to address population welfare and inequality issues in a targeted and effective manner. Because information at the subregional level is lacking, policymaking and implementation by local governments are less than optimal.
In addition, progress on the sustainable development goals (SDGs) targeted by the United Nations (UN) must be reported by each member country, including Indonesia. Meeting the SDG targets requires estimates for smaller geographical areas such as districts/cities, sub-districts, and even villages. However, the limited number of samples in surveys conducted by BPS results in inadequate precision for parameter estimates in small areas because the variance of the resulting estimates is large. One possible remedy is to allocate a larger budget to increase the number of samples and survey officers, so that the existing survey design can provide direct estimates in small areas with adequate precision, including estimates of average household expenditure.
These data are one of the key components required in calculating the poverty rate of a region. Estimates of average household expenditure down to the sub-district level can later be used as an indicator for grouping the sub-districts of a region by expenditure level. In addition, the estimates can be used to rank regions in order to identify those that will be targeted by poverty alleviation or community welfare improvement programs of the regional government. The need for information down to the small-area level, combined with the limitations of existing resources, makes it necessary for BPS to apply a statistical method capable of handling these problems. According to Notodiputro and Kurnia [3], one possible solution is the indirect estimator, known as small area estimation (SAE).
Rao and Molina [4] explained that SAE works by borrowing strength from auxiliary variables associated with the response variable, that is, the variable being estimated. This allows SAE to be employed to improve the effectiveness of survey samples collected by BPS. Several estimation methods are available in SAE, including Best Linear Unbiased Prediction (BLUP), Empirical Best Linear Unbiased Prediction (EBLUP), Hierarchical Bayes (HB), and Empirical Bayes (EB). In general, the choice among these methods depends on the type of data in the response variable. EB and HB methods are generally used for binary or count response variables, while BLUP and EBLUP methods are more appropriate for continuous response variables (Rao and Molina [4]). The EBLUP method is a form of General Linear Mixed Model (GLMM) used when the variance parameters are unknown and is considered to have several advantages over other models (Ghosh and Rao [5]). Fay and Herriot [6] pioneered the use of the EBLUP estimation method in area-level SAE to estimate the logarithm of the per capita income of the United States population. This model is therefore known as the Fay–Herriot model.
Many research variables, including variables generated from BPS surveys, have strong correlations. One example is the correlation between average household food expenditure and average household non-food expenditure (Nurizza [7]). These strongly correlated variables can be estimated together using the Multivariate EBLUP SAE method and are expected to have a more efficient estimation value than Univariate EBLUP SAE (Datta, Fay, and Ghosh [8]). The Multivariate Fay–Herriot or Multivariate EBLUP model was then developed by Benavent and Morales [9] by presenting four different estimation models based on the structure of the covariance matrix.
In the March 2020 National Socio-Economic Survey (Susenas) data for Central Java Province, 573 of the 576 sub-districts were included in the sample. Three sub-districts were not selected as Susenas samples (BPS [10]). Because not all sub-districts were sampled, the problem is how to estimate the parameters for the unsampled sub-districts. EBLUP estimation for unsampled areas usually relies on a global synthetic model. Rao [11] stated that a synthetic estimator is an unbiased estimator for a large area that is used to obtain an indirect estimator for a small area, under the assumption that the small area has the same characteristics as the large area. The synthetic estimator ignores the random area effect, since no information on the random area effect exists for an unsampled area (Saei and Chambers [12]), so the estimates for unsampled sub-districts may be biased.
Some studies using the EBLUP method add cluster information when estimating unsampled areas. Among them, Ginanjar [13] estimated per capita expenditure at the sub-district level in Jambi Province, including unsampled sub-districts, using the univariate EBLUP method with added cluster information. With the same method, Anisa et al. [14] added the mean of the estimated random area effects in each cluster to the prediction model to estimate the unsampled areas. Meanwhile, with the Multivariate Fay–Herriot model, Nuryadin [15] applied cluster information to predict the average per capita expenditure per village for food and non-food in unsampled villages. These studies conclude that models that are first clustered provide better predictions than models without clustering. To date, no research has compared the EBLUP-FH Multivariate method with K-Medoids Cluster information against the direct estimation method on actual data.
The clustering technique commonly used by researchers is K-Means Cluster. However, K-Means Cluster is highly sensitive to outliers in large data sets, so the K-Medoids Cluster technique is a better alternative in this condition because it is more robust to outliers (Patel and Singh [16]; Sangga [17]). Based on this explanation, this study compares the direct estimation method and the EBLUP-FH Multivariate method in estimating the average household expenditure on food and non-food at the sub-district level in Central Java Province. In addition, this research estimates the average household expenditure on food and non-food in non-sampled areas (sub-districts) using the EBLUP-FH Multivariate method with K-Medoids Cluster information. The K-Medoids technique was chosen because of the large amount of data and the presence of outliers in the auxiliary variables used.
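The robustness argument can be illustrated with a minimal Python sketch. It uses a simple alternating k-medoids scheme (assign each point to its nearest medoid, then move each medoid to the cluster member with the smallest total distance), which is a simplification and not necessarily the algorithm used in this study. The function name `k_medoids`, the toy points, and the initial medoids are all hypothetical.

```python
import math

def k_medoids(points, init_medoids, max_iter=100):
    """Alternating k-medoids on a list of coordinate tuples.

    init_medoids holds indices into `points`. Points are assumed distinct,
    so every medoid stays in its own cluster and no cluster becomes empty.
    """
    medoids = list(init_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: the new medoid minimizes total distance to its cluster.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Hypothetical toy data: two tight groups plus one outlier at (30, 30).
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (5, 5), (5.1, 4.8), (4.9, 5.2), (30, 30)]
medoids, labels = k_medoids(points, init_medoids=[0, 3])

# Compare the medoid of the second cluster with the mean of the same cluster.
members = [i for i, lab in enumerate(labels) if lab == 1]
mean_x = sum(points[i][0] for i in members) / len(members)
print("medoid:", points[medoids[1]], "mean x-coordinate:", mean_x)
```

In this toy case the cluster centre defined by the medoid remains an actual observation inside the dense group, whereas the cluster mean is pulled towards the outlier.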
2. Materials and Methods
Table 1 below presents a summary of the materials and methods used in this research. The detailed explanation will be presented in the following subsections.
2.1. Average of Household Expenditures
BPS [10] defines average household expenditure as the monthly costs incurred for all household members’ consumption, divided by the number of households. Household consumption can be divided into food and non-food consumption and is restricted to spending on household necessities only, without consideration of sources. The forms of consumption expenditures include purchases, gifts, and items generated by the household (excluding expenditures used for business purposes or those given to other parties).
The average household expenditure in the i-th area can be formulated as follows:
$$\bar{y}_i = \frac{y_i}{n_i}$$
where:
$\bar{y}_i$: average monthly household expenditure in the i-th area (rupiah)
$y_i$: total household expenditure in a month in the i-th area (rupiah)
$n_i$: number of households in the i-th area
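As a small worked example of this definition, the following Python sketch divides a total monthly expenditure by the number of households. The function name and the figures are hypothetical and are not taken from Susenas.

```python
def average_household_expenditure(total_expenditure, n_households):
    """Average monthly household expenditure (rupiah) for one area."""
    if n_households <= 0:
        raise ValueError("number of households must be positive")
    return total_expenditure / n_households

# Hypothetical area: 500 households spending IDR 1.5 billion in total per month.
avg = average_household_expenditure(1_500_000_000, 500)
print(avg)
```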
2.2. Related Research on Determining Auxiliary Variables
Rao [18] states that in indirect estimation, the choice of auxiliary variables strongly affects the accuracy of the resulting estimates. Per capita expenditure has been estimated with small area estimation (SAE) in a number of studies in Indonesia. Desiyanti et al. [19] used the EBLUP Univariate method to estimate average per capita expenditure at the sub-district level in West Sumatra. However, their estimates for unsampled sub-districts still relied on synthetic estimators. The auxiliary variables used in the indirect estimation were the number of non-electricity user families, the number of non-PLN electricity user families, the number of polyclinics/medical centers, the number of minimarkets/supermarkets, the number of SD/MI, and the number of doctor’s practices.
In Amaliana and Lestari’s research [20] on the application of the EBLUP Univariate method to the Fay–Herriot SAE model, the following auxiliary variables had a significant effect in indirectly estimating per capita expenditure in Jember District: the percentage of agricultural households, the number of Insurance for the Indigence recipients, the number of State Electricity Company (PLN) electricity users, the number of Elementary Schools (SD), Junior High Schools (SMP), High Schools (SMA), and Universities (PT), the number of families living in slums, the number of Certificate of Indigence (SKTM) owners, the number of educational and skills institutions, and the number of Indonesian migrant workers (TKI).
Furthermore, Nurizza and Ubaidillah [21] used the multivariate SAE approach to estimate food and non-food per capita expenditure in Indonesia. Their results show that, for the indirect estimation of per capita food expenditure, the following variables had a significant effect: the number of non-PLN electricity users, riverbank settlements, migrant workers, elementary schools, vocational schools, universities, auxiliary health centers, polyclinics, doctor’s offices, village maternity clinics, integrated health posts, medium and small industries (IMK), restaurants, and inns. Meanwhile, for the indirect estimation of non-food per capita expenditure in Indonesia, the variables with a significant effect were the number of PLN electricity users, non-PLN users, migrant workers, elementary schools, midwife practice sites, doctor practice sites, village maternity clinics, integrated health posts, community health centers without inpatient care, auxiliary community health centers, polyclinics, pharmacies, and restaurants.
Small-area estimation of per capita expenditure at the subdistrict level was also conducted by Ginanjar [13] using the EBLUP method in Jambi Province. In this study, there were eight auxiliary variables or predictor variables that significantly influenced per capita expenditure at the subdistrict level in Jambi Province, namely population, number of universities, the ratio of school facilities, number of polyclinics/health centers, coverage of doctors, coverage of health workers, coverage of people with disabilities, and the ratio of midwives.
2.3. Small Area Estimation
An area is considered large if the sample drawn from it is large enough to yield a direct estimate with sufficient precision. Conversely, an area or domain is considered small if the domain-specific sample is not large enough to support direct estimation with sufficient precision or accuracy (Rao and Molina [4]). Small area estimation (SAE) is an indirect estimation technique for small areas that borrows strength from related areas and/or periods to increase the effective sample size and decrease the standard error, allowing the estimation results to have sufficient precision (Rao and Molina [4]).
The main problems in SAE are how to produce reasonably good parameter estimates in an area with a relatively small sample size and how to estimate the mean square error (MSE) of the resulting parameter estimates (Pfeffermann [22]). Both problems can be addressed by borrowing additional information from within the area, outside the area, or outside the survey (auxiliary variables), which can usually be obtained from census or administrative data.
Based on the availability of auxiliary variables, SAE can be classified into two types (Rao and Molina [4]).
2.3.1. Basic Unit-Level Model
The unit-level small area estimation model is an SAE model in which auxiliary variables corresponding to the response variable are observed down to the unit level. Auxiliary variables $x_{ij} = (x_{ij1}, \ldots, x_{ijp})^T$ are assumed to be available for each j-th element in the i-th area. The variable of interest $y_{ij}$ is assumed to be related to $x_{ij}$ through the following equation:
$$y_{ij} = x_{ij}^T\beta + v_i + e_{ij}, \quad j = 1, \ldots, n_i, \quad i = 1, \ldots, m$$
Area random effects are denoted by $v_i$, random variables that are assumed to be independent and identically distributed. The unit-level errors are $e_{ij} = k_{ij}\tilde{e}_{ij}$, where $k_{ij}$ is a known constant and the $\tilde{e}_{ij}$ are random variables that are mutually independent, identically distributed, and independent of $v_i$. In addition, $v_i$ and $e_{ij}$ are generally assumed to have a normal probability distribution.
2.3.2. Area-Level Model (Basic Area-Level)
The area-level SAE model introduced by Fay and Herriot in 1979 is a special case of the General Linear Mixed Model (GLMM). It is built on the availability of predictor variables and direct estimates at a certain area level. Suppose there are $m$ small areas ($i = 1, \ldots, m$), with the auxiliary variable data available for the i-th small area being $x_i = (x_{i1}, \ldots, x_{ip})^T$ and the parameter to be estimated being $\theta_i$. The parameter $\theta_i$ is assumed to be linearly related to $x_i$ through the following equation (Ubaidillah [23]):
$$\theta_i = x_i^T\beta + b_i v_i, \quad i = 1, \ldots, m \tag{1}$$
where:
$\beta$: vector of regression coefficients of size p × 1
$b_i$: known positive constant
$v_i$: small area random effects, assumed to be independent and identically distributed (iid) with $E(v_i) = 0$ and $\mathrm{Var}(v_i) = \sigma_v^2$.
If $\hat{\theta}_i$ is assumed to be an unbiased direct estimator of $\theta_i$ that contains the sampling error $e_i$, then the sampling model can be formulated as follows:
$$\hat{\theta}_i = \theta_i + e_i, \quad i = 1, \ldots, m \tag{2}$$
where $e_i$ is a sampling error that is assumed to be independent across areas with known variance $\psi_i$, that is, $E(e_i \mid \theta_i) = 0$ and $\mathrm{Var}(e_i \mid \theta_i) = \psi_i$.
Combining Equations (1) and (2) results in the General Linear Mixed Model for area-level small area estimation known as the Fay–Herriot model, namely:
$$\hat{\theta}_i = x_i^T\beta + b_i v_i + e_i, \quad i = 1, \ldots, m \tag{3}$$
In the model Equation (3) above, the variation of the response variable in a small area is assumed to be explained by the relationship between the response variable and the auxiliary variables, which is called the fixed effect model. In addition, this model also contains a small area random effect component, which is a small area-specific variation component that cannot be explained by the auxiliary variables. The combination of these two assumptions (the fixed effect model and the random effect model) forms a linear mixed model.
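How the fixed and random components combine in the area-level predictor can be sketched in Python under simplifying assumptions that are not taken from this study: a single auxiliary variable plus an intercept, $b_i = 1$, and a known random-effect variance (so this is a BLUP rather than an EBLUP). The function name `fh_blup` and the data are hypothetical. The predictor is a weighted average of the direct estimate and the regression-synthetic estimate, with weight $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$ on the direct estimate.

```python
def fh_blup(y, x, psi, sigma2_v):
    """BLUP for the univariate Fay-Herriot model with intercept and one covariate.

    y: direct estimates, x: auxiliary variable, psi: known sampling variances,
    sigma2_v: random-effect variance (assumed known here, with b_i = 1).
    """
    # Weighted least squares with weights 1 / (sigma2_v + psi_i).
    w = [1.0 / (sigma2_v + p) for p in psi]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    estimates = []
    for yi, xi, p in zip(y, x, psi):
        gamma = sigma2_v / (sigma2_v + p)   # weight on the direct estimate
        synthetic = b0 + b1 * xi            # regression-synthetic part
        estimates.append(gamma * yi + (1.0 - gamma) * synthetic)
    return (b0, b1), estimates

# Hypothetical data: four areas with equal sampling variances.
(b0, b1), est = fh_blup(y=[3, 5, 8, 9], x=[1, 2, 3, 4], psi=[1, 1, 1, 1], sigma2_v=1.0)
print(b0, b1, est)
```

Areas with a large sampling variance $\psi_i$ receive a small $\gamma_i$, so their predictions lean more heavily on the regression part.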
2.4. Multivariate Fay–Herriot Models
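The Multivariate Fay–Herriot model extends the Univariate Fay–Herriot model to more than one response variable (Ubaidillah [23]). Suppose the population is partitioned into $D$ areas. Let $\mu_d = (\mu_{d1}, \ldots, \mu_{dR})^T$ be the vector of the $R$ variables of interest in area $d$, with $d = 1, \ldots, D$. The vector of the $R$ direct estimators of $\mu_d$ is denoted by $y_d = (y_{d1}, \ldots, y_{dR})^T$. The vector $\mu_d$ is assumed to be related to $p$ area-specific auxiliary variables $X_d$ through a linear model (Ubaidillah [23]):
$$\mu_d = X_d\beta + u_d, \quad u_d \sim N(0, V_{ud}), \quad d = 1, \ldots, D \tag{4}$$
where:
$u_d$: vector of area random effects of size $R \times 1$
$V_{ud}$: covariance matrix of area random effects of size $R \times R$
$X_d = \mathrm{diag}(x_{d1}, \ldots, x_{dR})$: $d$-th matrix of area-specific auxiliary variables of size $R \times p$, with $x_{dr}$ of size $1 \times p_r$ and $p = \sum_{r=1}^{R} p_r$
$\beta = (\beta_1^T, \ldots, \beta_R^T)^T$: vector of regression coefficients of size $p \times 1$, with $\beta_r$ of size $p_r \times 1$
The sampling model can be formulated as follows:
$$y_d = \mu_d + e_d, \quad e_d \sim N(0, V_{ed}), \quad d = 1, \ldots, D \tag{5}$$
where $e_d$ is the vector of sampling errors and $V_{ed}$ is a known covariance matrix of size $R \times R$. By combining Equations (4) and (5), the Multivariate Fay–Herriot model is obtained as follows:
$$y_d = X_d\beta + u_d + e_d, \quad d = 1, \ldots, D \tag{6}$$
where $u_d$ and $e_d$ are independent.
The model in Equation (6) can be written in matrix form as follows (Benavent and Morales [9]):
$$y = X\beta + Zu + e \tag{7}$$
where $u \sim N(0, V_u)$ and $e \sim N(0, V_e)$ are mutually independent. $Z = I_{DR}$ is a matrix of random effect constants that are assumed to be known. The matrix $X = \mathrm{col}_{1 \le d \le D}(X_d)$ is a matrix of auxiliary variables of size $DR \times p$. The vector $y = \mathrm{col}_{1 \le d \le D}(y_d)$ is the $DR \times 1$ vector of direct estimators of the variables of interest, with $u = \mathrm{col}_{1 \le d \le D}(u_d)$ and $e = \mathrm{col}_{1 \le d \le D}(e_d)$ defined in the same way. The $\mathrm{col}$ operator stacks its arguments into a single column. The matrix $V_u = I_D \otimes V_{ud}$ is the covariance matrix of the area random effects, where $I_D$ is the identity matrix of size $D \times D$ and $\otimes$ denotes the Kronecker product. The matrix $V_e = \mathrm{diag}_{1 \le d \le D}(V_{ed})$ is the sampling covariance matrix of size $DR \times DR$, which is assumed to be known and is obtained from the sampling error of the survey.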
Empirical Best Linear Unbiased Prediction (EBLUP) Multivariate
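Under the model in Equation (7), it holds that $E(y) = X\beta$ and $\mathrm{var}(y) = V = ZV_uZ^T + V_e$. The best linear unbiased prediction (BLUP) of $\mu$, where $\mu = X\beta + Zu$, is:
$$\tilde{\mu} = X\tilde{\beta} + ZV_uZ^TV^{-1}(y - X\tilde{\beta}) \tag{8}$$
where $\tilde{\beta} = (X^TV^{-1}X)^{-1}X^TV^{-1}y$ is the best linear unbiased estimator (BLUE) of $\beta$.
Since the random effect variance component $V_u$ is unknown, it must be estimated from empirical data when modeling parameters using the EBLUP-Fay–Herriot approach. Several estimation methods are available for the random effect variance component, such as the Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML) methods based on the normal likelihood (Patterson and Thompson [24]).
As stated earlier, the multivariate BLUP estimator (8) depends on the variance parameter $\theta$ of $V_{ud}$, where $\theta = (\theta_1, \ldots, \theta_q)^T$. The variance parameter $\theta$ is unknown and is estimated using the REML approach. The restricted log-likelihood of the joint probability density of $y$, expressed as a function of $\theta$, is given as follows (Benavent and Morales [9]):
$$l_{REML}(\theta) = -\frac{DR - p}{2}\log 2\pi + \frac{1}{2}\log|X^TX| - \frac{1}{2}\log|V| - \frac{1}{2}\log|X^TV^{-1}X| - \frac{1}{2}y^TPy \tag{9}$$
where $P = V^{-1} - V^{-1}X(X^TV^{-1}X)^{-1}X^TV^{-1}$. Taking the partial derivative of Equation (9) with respect to the $a$-th element of $\theta$, where $a = 1, \ldots, q$, gives the score vector $S(\theta) = (S_1, \ldots, S_q)^T$, where:
$$S_a = \frac{\partial l_{REML}}{\partial \theta_a} = -\frac{1}{2}\mathrm{tr}(PV_a) + \frac{1}{2}y^TPV_aPy$$
where $V_a = \partial V / \partial \theta_a$ is the partial derivative of $V$ with respect to the $a$-th element of $\theta$. Taking the second-order partial derivative of Equation (9) with respect to the $a$-th and $b$-th elements of $\theta$, changing sign, and taking expectations gives the Fisher information matrix with elements:
$$F_{ab} = \frac{1}{2}\mathrm{tr}(PV_aPV_b), \quad a, b = 1, \ldots, q$$
The $k$-th iteration of the Fisher-scoring algorithm for REML estimation of $\theta$ is:
$$\theta^{(k+1)} = \theta^{(k)} + F^{-1}(\theta^{(k)})\,S(\theta^{(k)}) \tag{10}$$
Furthermore, the Empirical Best Linear Unbiased Prediction (EBLUP) estimator for the Multivariate Fay–Herriot model is obtained by plugging the REML estimate $\hat{\theta}$ into $V_u$ and $V$ of Equation (8) as follows:
$$\hat{\mu} = X\hat{\beta} + Z\hat{V}_uZ^T\hat{V}^{-1}(y - X\hat{\beta}) \tag{11}$$
where $\hat{\beta} = (X^T\hat{V}^{-1}X)^{-1}X^T\hat{V}^{-1}y$ is the estimated Best Linear Unbiased Estimator (BLUE) of $\beta$ with covariance matrix $(X^T\hat{V}^{-1}X)^{-1}$.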
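As a rough illustration of a Fisher-scoring iteration for REML, the Python sketch below treats only the simplest special case, which is an assumption made for brevity and not the model fitted in this study: one response variable and an intercept-only mean, so that $V$ is diagonal and every matrix expression (with $P = V^{-1} - V^{-1}X(X^TV^{-1}X)^{-1}X^TV^{-1}$) reduces to sums over areas. The function names, starting value, and data are hypothetical, and the code is not the msaeDB implementation.

```python
def reml_score_and_info(sigma2, y, psi):
    """REML score and Fisher information for the random-effect variance.

    Univariate Fay-Herriot model with X a column of ones and dV/dsigma2 = I:
      score = -0.5 * tr(P) + 0.5 * y'PPy,   info = 0.5 * tr(PP).
    """
    w = [1.0 / (sigma2 + p) for p in psi]          # diagonal of V^{-1}
    s1 = sum(w)
    s2 = sum(wi ** 2 for wi in w)
    s3 = sum(wi ** 3 for wi in w)
    beta = sum(wi * yi for wi, yi in zip(w, y)) / s1   # GLS estimate of the mean
    py = [wi * (yi - beta) for wi, yi in zip(w, y)]    # P y = V^{-1}(y - beta)
    trace_p = s1 - s2 / s1
    score = -0.5 * trace_p + 0.5 * sum(v * v for v in py)
    info = 0.5 * (s2 - 2.0 * s3 / s1 + (s2 / s1) ** 2)
    return score, info, beta

def fit_reml(y, psi, start=1.0, tol=1e-10, max_iter=100):
    """Fisher scoring: sigma2 <- sigma2 + score / info, truncated at zero."""
    sigma2 = start
    for _ in range(max_iter):
        score, info, _ = reml_score_and_info(sigma2, y, psi)
        new_sigma2 = max(sigma2 + score / info, 0.0)
        if abs(new_sigma2 - sigma2) < tol:
            return new_sigma2
        sigma2 = new_sigma2
    return sigma2

# Hypothetical direct estimates and known sampling variances for eight areas.
y = [10, 14, 9, 16, 12, 7, 15, 11]
psi = [1.0, 1.5, 0.8, 1.2, 1.0, 0.9, 1.1, 1.3]
sigma2_hat = fit_reml(y, psi)
print(sigma2_hat)
```

In the multivariate case the same update applies to the whole vector of variance parameters, with the score vector and the Fisher information matrix replacing the two scalars.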
2.5. Direct Estimation
Estimation of population parameters in a region based only on sample data from that region is said to be direct estimation (Rao and Molina [4]). This direct estimation method is design-based, that is, it depends on the sampling design used. The March 2020 National Socio-Economic Survey (Susenas) results were used in this study to directly estimate the response variables, namely average household expenditure on food and non-food.
2.6. Selection of Auxiliary Variables
The auxiliary variables used in SAE must be related to the response variable. The auxiliary variables used in this study were taken from the variables used in related studies and then grouped into variable groups with the following details:
Several methods can be used to select auxiliary variables, including forward, backward, and stepwise selection. The stepwise method modifies forward selection by combining it with backward elimination: each time a new variable is added, all variables already in the model are checked again to see whether they are still significant. If a variable becomes insignificant at the specified significance level, it is removed (backward step). The stepwise method therefore uses two significance levels, one for adding variables to the model and one for removing them.
2.7. Multivariate EBLUP Method with Added Cluster Information
The EBLUP method is generally used to estimate an area that contains a sample. Unsampled areas can usually be estimated using a synthetic model. The problem with the synthetic model is that it does not consider the random area effect, because no information on this effect is available for an area that was not sampled. This can lead to an estimate with a large bias. Therefore, adding cluster information to the EBLUP method should improve estimates for unsampled areas. Clustering is conducted based on auxiliary variables so that all areas, both with and without samples, are assigned to a cluster.
The addition of cluster information is based on the assumption that an area has a pattern of close relationships with other areas. The random area effect has a similarity pattern between areas, allowing it to be analyzed using cluster techniques applied to the auxiliary variables in each small area. In estimating an unsampled area, the random area effect is often ignored due to the absence of such information. The resulting (synthetic) EBLUP estimator for unsampled areas can be written as follows:
$$\hat{\mu}_{d^*} = X_{d^*}\hat{\beta}$$
where $d^*$ indexes the unsampled subdistricts in this study (Padureso sub-district, Batuwarno sub-district, and Lebakbarang sub-district).
The sampled and unsampled sub-districts are grouped based on the auxiliary variables so that the cluster of each sub-district can be identified. The auxiliary variables used are selected variables that have first met the assumptions of sample adequacy and non-multicollinearity. The next step, for the sampled sub-districts, is to average the estimated random area effects within each cluster. The average random area effect per cluster is then entered into the prediction model as the estimator of the random area effect. The average random area effect per cluster is given by the following equation:
$$\bar{\hat{u}}_{(k)} = \frac{1}{m_k}\sum_{d=1}^{m_k}\hat{u}_{d(k)} \tag{12}$$
with
$m_k$: number of sub-districts sampled in the $k$-th cluster
$\bar{\hat{u}}_{(k)}$: the average random area effect in the $k$-th cluster
$\hat{u}_{d(k)}$: random area effect of the $d$-th sampled sub-district in the $k$-th cluster
The average random area effect is used as additional information for the areas without samples in the corresponding cluster. Thus, the EBLUP estimator for unsampled areas can be formulated as follows:
$$\hat{\mu}_{d^*(k)} = X_{d^*(k)}\hat{\beta} + \bar{\hat{u}}_{(k)} \tag{13}$$
where $d^*(k)$ indexes the unsampled subdistricts in the $k$-th cluster and $\bar{\hat{u}}_{(k)}$ is the average random area effect in the $k$-th cluster.
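The cluster-based prediction for unsampled areas can be sketched in Python as follows. For brevity the sketch handles a single response variable; the function names, cluster labels, coefficients, and random effects are hypothetical values chosen only to show the arithmetic of averaging the estimated random effects per cluster and adding that average to the regression-synthetic part.

```python
def cluster_mean_effects(effects, clusters):
    """Average the estimated random area effects of sampled areas per cluster."""
    sums, counts = {}, {}
    for u, k in zip(effects, clusters):
        sums[k] = sums.get(k, 0.0) + u
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def predict_unsampled(x, beta, cluster, mean_effects):
    """Synthetic part x'beta plus the mean random effect of the area's cluster.

    Falls back to the purely synthetic estimate if the cluster contains
    no sampled area (an assumption of this sketch).
    """
    synthetic = sum(b * xi for b, xi in zip(beta, x))
    return synthetic + mean_effects.get(cluster, 0.0)

# Hypothetical estimated random effects of four sampled areas in two clusters.
mean_effects = cluster_mean_effects([0.2, -0.1, 0.5, 0.3], [1, 1, 2, 2])

# Hypothetical unsampled area in cluster 2 with intercept and one covariate.
prediction = predict_unsampled(x=[1.0, 2.0], beta=[10.0, 3.0], cluster=2,
                               mean_effects=mean_effects)
print(mean_effects, prediction)
```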
The quality of the resulting estimates can be evaluated based on the Relative Standard Error (RSE) value. The RSE value for the Multivariate EBLUP method is the ratio of the square root of the MSE to the estimated value of the response variable, expressed as a percentage, according to the following formula:
$$\mathrm{RSE}(\hat{\mu}_{dr}) = \frac{\sqrt{\mathrm{MSE}(\hat{\mu}_{dr})}}{\hat{\mu}_{dr}} \times 100\% \tag{14}$$
According to BPS (2020), an estimate with RSE ≤ 25% is considered accurate (and can be used), an estimate with 25% < RSE ≤ 50% should be used with caution, and an estimate with RSE > 50% is considered inaccurate. The greater the RSE value, the further the estimate may lie from the true parameter value.
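The RSE calculation and the accuracy categories can be sketched in Python as follows. The sketch reads the thresholds as RSE ≤ 25% (accurate), 25% < RSE ≤ 50% (use with caution), and RSE > 50% (inaccurate); the function names and the example figures are hypothetical.

```python
import math

def rse_percent(estimate, mse):
    """Relative standard error: sqrt(MSE) divided by the estimate, in percent."""
    return 100.0 * math.sqrt(mse) / abs(estimate)

def classify_rse(rse):
    """Accuracy category for an RSE value given in percent."""
    if rse <= 25.0:
        return "accurate"
    if rse <= 50.0:
        return "use with caution"
    return "inaccurate"

# Hypothetical estimate of IDR 1,500,000 with a root MSE of IDR 90,000.
rse = rse_percent(1_500_000, 90_000 ** 2)
print(rse, classify_rse(rse))
```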
2.8. Research Stages
The stages of research using the Multivariate EBLUP method and with the addition of cluster information are as follows:
1. Prepare the response variable data from the March 2020 National Socio-Economic Survey (Susenas) and the auxiliary variable data from the 2020 Village Potential (Podes) data for each sub-district in Central Java Province.
2. Prepare the direct estimates of average household food and non-food expenditure obtained from processing the March 2020 National Socio-Economic Survey (Susenas), which cover 573 of the 576 sub-districts in Central Java Province.
3. Test the correlation between the response variables, average household expenditure on food and average household expenditure on non-food, using the Pearson correlation.
   The hypotheses of the Pearson correlation test are as follows:
   $$H_0: \rho = 0 \quad \text{versus} \quad H_1: \rho \neq 0$$
   with the Pearson correlation coefficient given by:
   $$r = \frac{\sum_{d=1}^{n}(y_{d1} - \bar{y}_1)(y_{d2} - \bar{y}_2)}{\sqrt{\sum_{d=1}^{n}(y_{d1} - \bar{y}_1)^2 \sum_{d=1}^{n}(y_{d2} - \bar{y}_2)^2}} \tag{15}$$
   To test the significance of the correlation, the t-test is used with the following formula:
   $$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{16}$$
   where:
   $r$: Pearson correlation coefficient
   $n$: number of sampled sub-districts
   $y_{d1}$, $y_{d2}$: direct estimates of average household expenditure on food and non-food in the $d$-th sub-district, with means $\bar{y}_1$ and $\bar{y}_2$
   The t statistic from Equation (16) is compared to the t-table value with $n - 2$ degrees of freedom. $H_0$ is rejected if $|t| > t_{\alpha/2,\,n-2}$ or, equivalently, if the p-value is less than $\alpha$, which is set at 0.05.
4. In the sampled areas, build the area-level SAE model and estimate its parameters with the Multivariate EBLUP method, namely by:
   a. Estimating the random effect variance components using the REML method through the Fisher scoring iteration procedure, according to Equation (10). The estimation was conducted with the open-source R software version 4.1.3, using the package "msaeDB".
   b. Estimating the regression coefficients $\hat{\beta}$, where $\hat{\beta} = (X^T\hat{V}^{-1}X)^{-1}X^T\hat{V}^{-1}y$.
   c. Selecting the auxiliary variables using the stepwise method.
   d. Estimating the average household expenditure on food and non-food in each sampled sub-district using the selected auxiliary variables, according to Equation (11).
   e. Calculating the RSE values of the EBLUP-FH Multivariate estimates of average household expenditure on food and non-food for each sub-district, according to Equation (14).
5. Estimate the non-sampled sub-districts using the Multivariate EBLUP method with cluster information added through the K-Medoids technique, preceded by the following steps:
   a. Checking the assumption of sample adequacy (KMO value) and detecting multicollinearity.
   b. Standardizing the auxiliary variables used in the clustering procedure with the Z-score approach.
   c. Determining the optimum number of clusters using the silhouette method.
   d. In the sampled areas, averaging the estimated random area effects within each cluster according to Equation (12).
   e. Estimating the average household expenditure on food and non-food in the non-sampled areas using the EBLUP-FH Multivariate method with added cluster information, according to Equation (13). The estimation uses R software with the "msaeDB" package and the "msaefhns" function.
6. Analyze the estimates of average household expenditure on food and non-food at the sub-district level in Central Java Province.
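The correlation test in step 3 can be sketched in Python as follows. The sketch computes the Pearson correlation coefficient and the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n - 2$ degrees of freedom; the critical value is left to a t-table or a statistics library. The function name and the toy data are hypothetical and unrelated to the Susenas figures.

```python
import math

def pearson_t_test(a, b):
    """Return the Pearson correlation r, the t statistic, and the degrees of freedom."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    r = s_ab / math.sqrt(s_aa * s_bb)
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return r, t, n - 2

# Hypothetical paired observations (for example, food and non-food expenditure).
r, t, df = pearson_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r, t, df)
```

The null hypothesis of zero correlation is rejected when the absolute value of `t` exceeds the two-sided critical value of the t distribution with `df` degrees of freedom at the chosen significance level.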
2.9. Data Source
This research uses secondary data from the Central Bureau of Statistics (BPS) as follows:
1. Average monthly household expenditure on food and non-food data for 573 sub-districts in Central Java Province, sourced from the March 2020 National Socio-Economic Survey (Susenas) raw data using the direct estimation method. These data are used as response variables.
2. Data on facilities, infrastructure, and other auxiliary variables available in each sub-district in Central Java Province, sourced from processing the Village Potential (Podes) 2020 raw data.
This research is a case study for all sub-districts (576 sub-districts) in the districts/cities of Central Java Province in 2020. The National Socio-Economic Survey (Susenas) and Podes data used are aggregated data for each sub-district in Central Java Province. The processing in this study was carried out using the open-source software R version 4.1.3.
2.10. Research Variables
The variables used in this study include response variables and auxiliary variables. The response variables are the average monthly household consumption expenditure on food and non-food in the i-th sub-district, sourced from the March 2020 National Socio-Economic Survey (Susenas) data. Meanwhile, the auxiliary variables for each sub-district are obtained from the 2020 Podes data for Central Java Province. The auxiliary variables in this study were chosen based on factors that affect average household food and non-food consumption expenditure. The 40 candidate auxiliary variables are shown in Appendix A, Table A1. Meanwhile, the significant auxiliary variables included in the model are presented in Table 2.
4. Discussion
The estimates of average household expenditure on food and non-food at the sub-district level in Central Java from the EBLUP-Fay–Herriot Multivariate model show lower variability than the direct estimation results. This can be seen by comparing the Relative Standard Error (RSE) values of direct estimation and the EBLUP Multivariate model for each sub-district in Central Java. Many outliers are still found in the box plot of the RSE values of the direct estimates, with RSE values greater than 25 percent. Meanwhile, the EBLUP-Fay–Herriot Multivariate SAE results substantially reduce the number of outliers in the RSE values. In fact, there are no outliers at all in the RSE values of the EBLUP Multivariate estimates of average household expenditure on food in each sub-district. This result is in line with studies of EBLUP Multivariate that show the effectiveness of the method in producing estimates down to the smallest area level (sub-district). The EBLUP Multivariate method outperforms direct estimation based on the survey design.
For the three sub-districts that were not sampled in the March 2020 National Socio-Economic Survey (Susenas), the average household expenditure on food and non-food was estimated by adding cluster information to the EBLUP-Fay–Herriot Multivariate. Table 6 shows that the estimated average household expenditure on food in Padureso sub-district is IDR 1,550,241, in Batuwarno sub-district it is IDR 1,574,599, and in Lebakbarang sub-district it is IDR 1,540,425. These three sub-districts are all members of cluster 1 for the average food expenditure variable group. Meanwhile, the estimated average household expenditure on non-food items in the three non-sampled sub-districts is lower than the value of food expenditure, namely IDR 1,470,115 in Padureso, IDR 1,455,223 in Batuwarno, and IDR 1,416,978 in Lebakbarang.
The Multivariate EBLUP estimation with the addition of cluster information can be used to estimate average household expenditure down to the sub-district level, which can then be used as an indicator to categorize the sub-districts of a region by expenditure group. The estimates can also serve as a reference for identifying priority regions to be targeted by programs for reducing poverty or improving community welfare. Direct estimation under the current survey design cannot produce statistics on average household expenditure down to the sub-district level, because the BPS survey has a limited budget and a limited number of survey officers. This issue can be addressed with small area estimation using the EBLUP Multivariate approach, with cluster information added for areas not sampled in the survey. As a result, local government activities can be more effective and focused, since data are available down to the small area (subdistrict) level.
For future research, the use of the EBLUP Fay–Herriot Multivariate model can be applied to other data that has a strong correlation. If the research is conducted in areas that have different geographical characteristics, researchers can also develop the Fay–Herriot Multivariate model by adding spatial and time aspects. The auxiliary variables used can be differentiated in each research area because the influence of variables can be different in different areas, so it is expected that the estimation model formed will be better and more accurate. In addition, other clustering methods can also be used as alternatives in estimating unsampled areas, such as the Fuzzy K-Means non-hierarchical cluster method, Fuzzy K-Medoids, or hierarchical cluster methods.
5. Conclusions
The EBLUP-Fay–Herriot Multivariate method can improve on the parameter estimates generated by the direct estimation method, since it yields lower relative standard errors (RSE) when estimating average household expenditure on food and non-food at the sub-district level for the sampled sub-districts in Central Java Province, Indonesia. For the sub-districts in Central Java Province that were not sampled in the March 2020 Susenas, the EBLUP-Fay–Herriot Multivariate method with added K-Medoids cluster information can be used to estimate the average household expenditure on food and non-food at the sub-district level. The RSE values of all sub-districts from the EBLUP-Fay–Herriot Multivariate estimation are also below 25 percent, so the estimation results are reliable and sufficiently precise.
This research is expected to contribute significantly to multivariate modeling in area-level small area estimation. Additionally, it is envisaged that regional governments will use the sub-district-level information on average household expenditure obtained with the Multivariate EBLUP-FH approach to design and implement programs relating to welfare and poverty. Because of the limited number of samples and budget, BPS, as the official statistics provider, is unable to provide these data down to the sub-district level through direct estimation.